Merge or join two data sets

34 visualizaciones (últimos 30 días)
Frank Olsen
Frank Olsen el 17 de Abr. de 2020
Comentada: ncyu el 3 de Nov. de 2022
I'm working on a project where I have to conduct an event study. The study is based on two timeseries data set:
1) sentiments from news posts (negative, neutral and positive), and
2) futures returns on specific asset and a relevant index.
In the news data I have isolated the date and sentiment in a table, with a huge number of rows since there're several news posts each day (also weekends). The data is further split up to negative, neutral and positive to conduct the event study and examine them seperately. The pic below is the negative.
In the returns data set I have price developments in returns for the asset and its index in a timetable. One price each trading day (not weekends and holidays).
To perform the event study I need to find a way to remove all rows that do not match a trading day and duplicate dates, so I'm left with one row for each trading day. Is that possible?
Can it be done by merging/joining the two data set, since the return data only consists of observations on trading days
Note: The return data goes back further in time since I need to compute the "normal return" based on returns from -205 days to -5 days prior to the news sentiment.
Thanks a lot for the help! I really appreciate any suggestions!
Frank
  2 comentarios
Mohammad Sami
Mohammad Sami el 17 de Abr. de 2020
you can use groupsummary to summarise your sentiment data by day first. For example it will count how many negative news item appeared on that day. Then you can merge it with your returns table.
Star Strider
Star Strider el 17 de Abr. de 2020
Explore the join function and its friends.

Iniciar sesión para comentar.

Respuesta aceptada

Peter Perkins
Peter Perkins el 27 de Abr. de 2020
You have an irregular intraday table (sentiments) and a daily timetable (returns). It sounds like you want to turn the sentiment data into a daily series? I'm gonna guess that maybe you want, on each trading day, a count of negative, neutral, and positive posts. Posts on weekends I guess you want to ignore.
Let's say you have this:
posts =
10×1 timetable
time sentiment
____________________ _________
27-Apr-2020 02:28:13 Negative
27-Apr-2020 06:01:41 Neutral
27-Apr-2020 09:56:51 Neutral
27-Apr-2020 13:57:48 Negative
27-Apr-2020 21:09:31 Positive
28-Apr-2020 00:31:11 Neutral
28-Apr-2020 02:26:17 Negative
28-Apr-2020 09:59:27 Neutral
28-Apr-2020 18:51:41 Negative
28-Apr-2020 19:01:19 Neutral
There are a bunch of ways to get the sentiment counts as separate daily count variables; here I'll show groupcountsto first get counts. In less recent releases of MATLAB, you can use groupsummary, or varfun. groupcounts has a nice way to compute daily counts, but here you want daily-by-sentiment counts, so discretize the times to dates before calling groupcounts.
>> posts.time = dateshift(posts.time,'start','day');
>> posts2 = groupcounts(posts,["time" "sentiment"])
posts2 =
5×3 table
time sentiment GroupCount
___________ _________ __________
27-Apr-2020 Negative 2
27-Apr-2020 Neutral 2
27-Apr-2020 Positive 1
28-Apr-2020 Negative 2
28-Apr-2020 Neutral 3
Now you need to make separate variables for each sentiment; that's unstack.
>> posts3 = unstack(posts2,'GroupCount','sentiment')
posts3 =
2×4 table
time Negative Neutral Positive
___________ ________ _______ ________
27-Apr-2020 2 2 1
28-Apr-2020 2 3 NaN
The NaN is a bit annoying; it's because unstack uses sum by default for aggregation. In the R2020a version of MATLAB that's just out, you can work around that by specifying @numel, but it's also easy to use replacemissing.
>> posts4 = fillmissing(posts3,'constant',0,'DataVariables',["Negative" "Neutral" "Positive"])
posts4 =
2×4 table
time Negative Neutral Positive
___________ ________ _______ ________
27-Apr-2020 2 2 1
28-Apr-2020 2 3 0
Now you are in business. Figure out the weekdays during the period you care about, create a daily datetime vector, and synchronize your posts and your returns timetables to that time vector.

Más respuestas (0)

Categorías

Más información sobre Startup and Shutdown en Help Center y File Exchange.

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by