Creating a single vector for EEG data that has timing overlap issues

I am trying to create a single vector from EEG sleep data recorded at 256 Hz that lists each sleep stage and event as they occur in chronological order. The goal is to be able to process the sleep data stage by stage, while removing the times when the individual is awake (so Wake and Arousal events need to be coded so they can be excluded from analyses).
My problem is that my EEG program gives me events and stages in an overlapping format. For example, my readout of scored events lists Wake as starting at 0 and lasting 3270000000 µs. Then stage N1 might occur next and last for some number of 30-second epochs (30000000 µs each). In the middle of this time period we might have a microarousal that occurs at a very particular moment for a certain period of time; it is listed after the sleep stage it falls inside, and then the next sleep stage or event is listed.
Example of data in the CSV file (screenshot not reproduced here):
You can see that the date and start time occur in order, and that the start times for sleep stages (marked by the Hypnogram type) all begin at 0, with the relative start time increasing incrementally according to the duration. The durations of all sleep stages are in 30 s increments (expressed in microseconds). For example, in the screenshot N2 lasts 360000000 µs (360 s, i.e. twelve 30 s epochs); the relative times show N2 starting at 20040000000 and N3 starting at 20400000000, exactly 360000000 µs later. The problem is the arousals: they occur during N2 sleep with their own start times and durations.
So, I don't know how to effectively insert all the time points chronologically into a single vector when there are overlapping events. Can anyone help me, please?
% Input CSV file
infile = "C:\Users\jerom\OneDrive\Desktop\PSG_Scoring\101_C_Scoring.csv";

% Read the CSV file (xlsread returns the raw cells in its third output)
[~, ~, table] = xlsread(infile);
start_musec    = cell2mat(table(2:end, 5));   % relative start times (µs)
duration_musec = cell2mat(table(2:end, 6));   % durations (µs)
events         = table(2:end, 10);            % event / stage labels

% Define event time courses
event_name = {'Wake' 'Lights' 'N1' 'N2' 'N3' 'REM' 'Arousal'};
sfreq_Hz   = 256;

% Use the latest end time over all rows: because arousals are listed after
% the stage they interrupt, the last row is not necessarily the latest event.
tmax_samp = ceil(sfreq_Hz * max(start_musec + duration_musec) / 1e6);

event_vectors = struct();
event_vectors.time_sec = ((1:tmax_samp) - 1) / sfreq_Hz;
time_musec = event_vectors.time_sec * 1e6;

% One 0/1 vector per event type, set to 1 wherever that event is active
for evt = 1:numel(event_name)
    event_vectors.(event_name{evt}) = zeros(1, tmax_samp);
    for line = 1:numel(start_musec)
        if contains(events{line}, event_name{evt})
            % half-open interval so a sample on an epoch boundary is
            % counted only once
            is_in_current_event = (time_musec >= start_musec(line)) & ...
                (time_musec < start_musec(line) + duration_musec(line));
            event_vectors.(event_name{evt})(is_in_current_event) = 1;
        end
    end
end
I was able to convert the time points into samples at the 256 Hz sampling rate, and I was expecting to be able to connect this into data points that could be concatenated into a single vector, but I end up with a struct of separate per-event vectors instead.
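In case it helps, here is a minimal sketch of one way to collapse those per-event vectors into a single chronological vector. It assumes the event_vectors struct, tmax_samp and event_name from the code above, and it assumes that an arousal should take precedence over the stage it interrupts and then be excluded, along with Wake, from the final vector:

% Assign each sample a single numeric stage code. Looping over event_name in
% order lets later entries overwrite earlier ones, so Arousal (listed last)
% overrides the stage it interrupts. 0 = unscored.
stage_code = zeros(1, tmax_samp);
for evt = 1:numel(event_name)
    stage_code(event_vectors.(event_name{evt}) == 1) = evt;
end

% Keep only samples scored as sleep: drop Wake, Lights, Arousal and unscored.
sleep_codes     = find(ismember(event_name, {'N1' 'N2' 'N3' 'REM'}));
keep            = ismember(stage_code, sleep_codes);
sleep_only_code = stage_code(keep);              % one chronological stage vector
sleep_only_time = event_vectors.time_sec(keep);  % matching time stamps

sleep_only_code then holds one value per retained sample (3 = N1, 4 = N2, 5 = N3, 6 = REM with the event_name ordering above), and the same keep mask can be applied to the raw EEG samples to pull out the sleep-only data.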

Related

Efficient way to compile NC4 file information from separate files in R

I am currently trying to compile temperature information from the WFDE5 dataset, which is quite large, and I am struggling to find an efficient way to meet my goals. My main goals are to:
Determine the max temperature for individual days for each individual grid cell
Change the time step from hourly to daily and from UTC to MST.
The dataset is stored in monthly NC4 files and contains the temperature data in a 3-dimensional matrix (time, lat, lon). My main question is whether there is an efficient way to compile this data to meet my goals, or to manipulate the NC4 files so they are easier to work with (somehow merge the monthly files into one mega file?).
I have tried two rather convoluted ways to catch the holes between months (for example, due to the time conversion some dates end up spanning two months, which requires me to read in the next file and continue reading the data from there).
My first try was to read one month/file at a time, using pmax() to get the max value of each grid cell, comparing time steps over 24 hours, and then repeating the process. I have been using ncvar_get() with start and count to read only one time step at a time. To catch days that span two months, I wrote a convoluted function that merges the two by calculating the number of 1-hour periods left in one month and how many are needed from the next.
My second try still involved pmax(), but I tried a different method to fill in the holes between months: I built a date vector from the time variable for each hourly time step and matched on the same day. While this seems better, it still has to read multiple NC4 files, which gets very convoluted compared to just reading one NC4 file with all the needed information.
In the end, I tested a few cases and both solutions seem to work, but they run extremely slowly and seem very overcomplicated to me. I was wondering if anyone had suggestions on how to better set up the NC4 files for reading and time conversion.
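For what it's worth, here is a rough sketch of the overall approach in R with the ncdf4 and abind packages. The folder name, the variable name "Tair", the dimension order (lon, lat, time) and the time origin are assumptions to be adapted to the actual files, and it assumes the concatenated array fits in memory (otherwise the same logic can be applied to start/count windows):

library(ncdf4)
library(abind)

# Hypothetical layout: one NC4 file per month in a "wfde5" folder.
files <- list.files("wfde5", pattern = "\\.nc$", full.names = TRUE)

read_month <- function(f) {
  nc   <- nc_open(f)
  tair <- ncvar_get(nc, "Tair")   # assumed dims: lon x lat x time
  time <- ncvar_get(nc, "time")   # assumed units: hours since an origin (check nc$dim$time$units)
  nc_close(nc)
  list(tair = tair, time = time)
}
months <- lapply(files, read_month)

# Concatenate all months along the time dimension, so days that straddle a
# month boundary are handled without any special casing.
tair <- do.call(abind, c(lapply(months, `[[`, "tair"), list(along = 3)))
time <- unlist(lapply(months, `[[`, "time"))

# Shift the time axis from UTC to MST (UTC-7) and bucket by local calendar day.
origin   <- as.POSIXct("1900-01-01", tz = "UTC")   # replace with the origin in the file
datetime <- origin + time * 3600 - 7 * 3600
day      <- as.Date(datetime, tz = "UTC")

# Daily maximum per grid cell: reduce each day's hourly slices with pmax().
daily_max <- lapply(split(seq_along(day), day), function(idx) {
  Reduce(pmax, lapply(idx, function(i) tair[, , i]))
})

daily_max is then a named list with one lon × lat matrix of daily maxima per MST day; hours belonging to a day that starts in the previous file are picked up automatically because the months are concatenated before splitting by day.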

Take profit function of a timeseries

I am having great difficulty with this topic and I could use assistance from some experts.
I have a standard time series, I can have it as an XTS or as a dataframe with numeric inputs. The headers are typical: DATE, OPEN, HIGH, LOW, CLOSE, SMA20, SMA50, CROSSOVER.
SMA20 refers to the 20-period simple moving average. CROSSOVER refers to whether the 20-period SMA is above or below the SMA50: if it's positive, the SMA20 is above; if it's negative, it's below.
Here is the problem I am trying to solve: how do I create a separate column to track profits or losses? I want a take-profit at 100 pips and a stop-loss at 100 pips away from the entry point. The entry point will always be the open of the next day, once the crossover happens.
What I am thinking so far is to use the open price, then look at the high and the low for the day: if the difference between the high and the open is greater than 100 pips, the trade gets closed; if it is less, the trade stays open. I have no idea how to begin coding this.
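As a starting point, here is a rough sketch of that loop on a plain data frame. It assumes columns named OPEN, HIGH, LOW and CROSSOVER as described, that 1 pip = 0.0001 price units, and that only the long side (crossover turning positive) is handled; the short side would be symmetric:

# Hypothetical data frame `df` with columns DATE, OPEN, HIGH, LOW, CLOSE,
# SMA20, SMA50, CROSSOVER (CROSSOVER > 0 means SMA20 is above SMA50).
pip    <- 0.0001
target <- 100 * pip
df$PROFIT <- NA_real_

# Days on which the crossover turns positive; the trade is entered at the
# open of the following day.
entries <- which(diff(sign(df$CROSSOVER)) > 0) + 1

for (e in entries) {
  if (e + 1 > nrow(df)) next
  entry_price <- df$OPEN[e + 1]
  for (d in (e + 1):nrow(df)) {
    if (df$HIGH[d] - entry_price >= target) {   # take profit hit
      df$PROFIT[e] <- 100
      break
    }
    if (entry_price - df$LOW[d] >= target) {    # stop loss hit
      df$PROFIT[e] <- -100
      break
    }
  }
}

One ambiguity this glosses over: on a day where the high and the low are both 100 pips away from the entry, the order of the two checks decides which level is assumed to have been hit first.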

RRDTOOL one second logging, values missing

I have spent more than two months with RRDTool trying to work out how to store data and visualize it on a graph. I'm very close to my goal now, but I don't understand why some of my data ends up being treated as NaN.
I am counting lines in gigabyte-sized log files and feeding the result into an RRD database to visualize how often the events occur. The step of the database is 60 seconds, and the data is inserted with second-resolution timestamps whenever it is available, so there is no guarantee that the next timestamp will be within the heartbeat or within the step. Sometimes there is no data for minutes.
When the gaps are that large, most of my data is considered to be NaN.
b1_5D.rrd
1420068436:1
1420069461:1
1420073558:1
1420074583:1
1420076632:1
1420077656:1
1420079707:1
1420080732:1
1420082782:1
1420083807:1
1420086881:1
1420087907:1
1420089959:1
1420090983:1
1420094055:1
1420095080:1
1420097132:1
1420098158:1
1420103284:1
1420104308:1
1420107380:1
1420108403:1
1420117622:1
1420118646:1
1420121717:1
1420122743:1
1420124792:1
1420125815:1
1420131960:1
1420134007:1
1420147326:1
1420148352:1
rrdtool create b1_4A.rrd --start 1420066799 --step 60 DS:Value:GAUGE:120:0:U RRA:AVERAGE:0.5:1:1440 RRA:AVERAGE:0.5:10:1008 RRA:AVERAGE:0.5:30:1440 RRA:AVERAGE:0.5:360:1460
The above gives me an empty graph for the input above.
If I extend the heartbeat, then it fills the time gaps with the same data. I've tried inserting zero values instead, but that averages out the counts and brings the results down to tiny fractions.
Maybe I am getting something wrong about how RRDTool works.
It would be great if someone could explain what I am doing wrong.
Thank you.
It sounds as if your data - which is event-based at irregular timings - is not suitable for an RRD structure. RRD prefers to have its data at constant, regular intervals, and will coerce the incoming data to match its requirements.
Your RRD is defined to have a 60s step, and a 120s heartbeat. This means that it expects one sample every 60s, and no further apart than 120s.
Your DS is a gauge, and so the values you enter (all of them '1' in your example) will be the values stored, after any time normalisation.
If you increase the heartbeat, then a value received within this time will be used to make a linear approximation to fill in all samples since the last one. This is why doing so fills the gaps with the same data.
Since your step is 60 s, the smallest sample time width will be 1 minute.
Since you are always storing '1's, your graph will therefore show either '1' (when the sample was received within the heartbeat window) or Unknown (when the heartbeat expired).
In other words, your graph is showing exactly what you gave it. Your data are being coerced into a regular set of numerical values at a 1-minute step, each being 1 or Unknown.
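For reference, the heartbeat is the fourth field of the DS definition in the create command, so a variant of your command with, say, a two-hour heartbeat would look like the line below; as explained above, samples arriving within that window are then interpolated across the gap rather than left as Unknown:
rrdtool create b1_4A.rrd --start 1420066799 --step 60 DS:Value:GAUGE:7200:0:U RRA:AVERAGE:0.5:1:1440 RRA:AVERAGE:0.5:10:1008 RRA:AVERAGE:0.5:30:1440 RRA:AVERAGE:0.5:360:1460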

R - Cluster x number of events within y time period

I have a dataset with 59k entries recorded over 63 years, and I need to identify clusters of events, the criterion being:
6 or more events within 6 hours
Each event has a unique ID, a time (HH:MM:SS) and a date (DD:MM:YY). Ideally the output would have a cluster ID, the events that took place within each cluster, and the start and finish time and date of each cluster.
Thinking about the problem in R, we would need to look at every date/time and count the number of events in the following 6 hours; if the number is 6 or greater, save the event IDs, otherwise move on to the next date and repeat. I have taken a data extract that just contains EventID, Date, Time and Year.
https://dl.dropboxusercontent.com/u/16400709/StackOverflow/DataStack.csv
If I come up with anything in the meantime I will post below.
Update: Having taken a break to think about the problem I have a new approach.
Add 6 hours to the date/time of each event, then count the number of events that fall between the start and end time; if there are 6 or more, take their event IDs and assign them a cluster ID. Then move on to the next event and repeat, as a loop over all 59k events.
Don't use clustering. It's the wrong tool, and the wrong term. You are not looking for abstract "clusters" but for something much simpler and much better defined. In particular, your data is 1-dimensional, which makes things a lot easier than the multivariate case that is omnipresent in clustering.
Instead, sort your data and use a sliding window.
If your data is sorted, and time[x+5] - time[x] < 6 hours, then these events satisfy your condition.
Sorting is O(n log n), but highly optimized. The remainder is O(n) in a single pass over your data. This will beat every single clustering algorithm, because they don't exploit your data characteristics.
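A minimal sketch of that sorted sliding-window check in R might look like this; it assumes a data frame ev (hypothetical name) with an EventID column and a POSIXct DateTime column built from your Date and Time fields:

# Sort chronologically and work on numeric seconds.
ev <- ev[order(ev$DateTime), ]
t  <- as.numeric(ev$DateTime)
window <- 6 * 3600          # 6 hours in seconds
n  <- nrow(ev)

# An event belongs to a qualifying window if it is one of 6 consecutive
# (sorted) events spanning less than 6 hours.
in_cluster <- logical(n)
for (i in 1:(n - 5)) {
  if (t[i + 5] - t[i] < window) {
    in_cluster[i:(i + 5)] <- TRUE
  }
}

# Label each run of consecutive flagged events with a cluster ID.
run_start    <- in_cluster & !c(FALSE, head(in_cluster, -1))
ev$ClusterID <- ifelse(in_cluster, cumsum(run_start), NA)

Start and finish times per cluster then follow from taking the minimum and maximum DateTime within each ClusterID.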

Graphite: append a "current time" point to the end of a series

I have a "succeeded" metric that is just the timestamp. I want to see the time between successive successes (this is how long the data is stale for). I have
derivative(Success)
but I also want to know how long it has been between the last success time and the current time. Since derivative transforms xs[n] into xs[n+1] - xs[n], the "last" delta doesn't exist. How can I do this? Something like:
derivative(append(Success, now()))
I don't see any graphite functions for appending series, and I don't see any user-defined graphite functions.
The general problem is to be alerted when the data is stale, via graphite monitoring. There may be a better solution than the one I'm thinking about.
identity is a function whose value at any given time is the timestamp of that time.
keepLastValue is a function that takes a series and replicates data points forward over gaps in the data.
So then diffSeries(identity("now"), keepLastValue(Success)) will be a "sawtooth" series that climbs steadily while Success isn't updated, and jumps down to zero (or close to it — there might be some time skew) every time Success has a data point. If you use graphite monitoring to get the current value of that expression and compare it to some threshold, it will probably do what you want.
