How to create a state sequence from an event sequence in TraMineR? - r

I've created a state sequence using the code:
comp.seq <- seqdef(comp, NULL, states=comp.scodes, labels=comp.labels, alphabet=comp.alphabet, right="Z", left="Z")
Then I created an event sequence from that using:
comp.seqe <- seqecreate(comp.seq, tevent="state", use.labels=FALSE)
Then I searched for subsequences using:
subs <- seqefsub(comp.seqe, strsubseq=c("(A)-(C)-(A)"))
Now all I want to do is create some plots of the resulting sequences. However, I found that there are no plotting functions like seqplot for event sequences, so I'd like to convert the resulting event sequences into state sequences. Is that possible? I tried seqdef() with the subs object but wasn't successful. Is it the appropriate function?
Thanks

Look at this answer for how to convert event sequences in time stamped event (TSE) format to state sequences. And here you will find a solution for putting the outcome of seqefsub into TSE form.
Note that plots for state sequences may not be suited for rendering the outcome of seqefsub. The returned subsequences have no time stamps, so aligning them as states would have no sound meaning.
Why not simply use plot(subs), or the seqpcplot function if you are interested in the order of the events? seqpcplot directly accepts event sequence objects as input, and the outcome of seqefsub is such an object.

Related

Is there a more efficient way to subset time series data frame on irregular, repeated binary trigger column?

I am working with a time-series data stream from an experiment. We record multiple data channels, including a trigger channel ('X7.ramptrig' in linked data: Time-Series Data Example), that indicates when a relevant event occurs in other channels.
I am trying to create subsets of the next n rows (e.g. 15,000) of the time series (time steps are 0.1 ms) that occur after the onset of a trigger ('1'). That column has multiple triggers ('1') interspersed at irregular intervals. Every other time step is a '0', indicating no new event.
I would like to know whether there is a more efficient way to directly subset the n rows that follow each detected trigger, instead of the indirect (possibly inflexible) solution I have come up with.
Link to simple example data:
https://gtvault-my.sharepoint.com/:t:/g/personal/shousley6_gatech_edu/EZZSVk6pPpJPvE0fXq1W2KkBhib1VDoV_X5B0CoSerdjFQ?e=izlkml
I have a working solution that creates an index from the trigger channel and splits the dataset on that index. Because triggers have variability in placement in time, the subsequent data frame subsets are not consistent and there are occasionally 'extra' subsets that precede 'important' ones ('res$0' in example). Additionally, I need to have the subsets be matched for total time and aligned for trigger onset.
My current solution 'cuts' the lists of data frames to the same size (in the example to the first 15,000 rows). While this technically works it seems clunky. I also tried to translate a SQL solution using FETCH NEXT but those functions are not available in the SQLite supported in R.
I am completely open to alternatives so please be unconstrained by my current solution.
library(dplyr)  # top_n() comes from dplyr
## create an index that increments whenever an event trigger occurs
idx <- c(0, cumsum(diff(Time_Series_Data_Example$X7.ramptrig) > 0))
## split the original data frame on event triggers
split1 <- split(Time_Series_Data_Example, idx)
## cut the data frames down to 1.5 s
res <- lapply(split1, function(x) {
  top_n(x, -15000)
})
Here is an example of the data output: 'head(res[["1"]])'
For the example data and code provided, the output is 4 subsets, 3 of which are 'important' and time synced to the trigger. The first 'res$0' is a throw away subset.
Thanks in advance and please let me know how I can improve my question asking (this is my first attempt).
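One possible alternative, sketched in base R: locate each trigger onset and take the n rows starting there, which aligns every subset to its trigger and avoids the leading throw-away group. The toy data frame, its column names (time, trig), and the window length n = 4 are assumptions for illustration; with the real data the trigger column would be X7.ramptrig and n would be 15000.

```r
# Toy data standing in for the real recording (names and values are made up)
df <- data.frame(
  time = seq_len(20),
  trig = c(0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0)
)
n <- 4  # rows to keep per window; 15000 in the question

# Row numbers where the trigger rises from 0 to 1
onsets <- which(diff(c(0, df$trig)) > 0)

# Drop onsets that are too close to the end to give a full window
onsets <- onsets[onsets + n - 1 <= nrow(df)]

# One data frame of exactly n rows per trigger, each starting at its onset
windows <- lapply(onsets, function(i) df[i:(i + n - 1), ])
names(windows) <- paste0("trigger_", seq_along(windows))
```

Because every window starts at an onset row and has exactly n rows, the subsets are matched for total time and aligned to the trigger without a separate trimming step.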

Backtest over specific dates in quantstrat (R)

How do I backtest over specific dates, for example 2008::2010, in quantstrat?
I want to load symbols for 2001::2017, but I only want to backtest over a subset of dates (rather than reloading the symbols every time for specific date ranges).
There is no built-in way to do this in quantstrat. In fact, there is a comment at the beginning of the apply* functions that says:
#TODO add Date subsetting
(patches welcome)
There are a number of possible ways to do this with the existing code though.
Probably the simplest way is to load all your market data into an environment, and then subset your market data into the .GlobalEnv before each call to applyStrategy.
Indicators and signals should use vectorized functions, and should take (at most) seconds to apply to the entire series. So the simplest thing is probably to run applyIndicators and applySignals manually over the entire series, and then call applyRules with just the subset you want.
You could also add a signal function that does understand subsets. This signal function would be last in the strategy specification, and would filter all your other signals to 0 outside of your preferred date range.
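As a rough sketch of the first suggestion, assuming the market data are xts objects kept in a separate environment: the environment name mkt, the symbol SPY, and the synthetic monthly series are all made up for illustration, and the applyStrategy call itself is left out.

```r
library(xts)

# Hypothetical store holding the full history for every symbol
mkt <- new.env()
idx <- seq(as.Date("2007-01-01"), as.Date("2011-12-01"), by = "month")
assign("SPY", xts(seq_along(idx), order.by = idx), envir = mkt)

# Copy only the backtest window into .GlobalEnv before running the strategy
backtest_range <- "2008/2010"
for (sym in ls(mkt)) {
  assign(sym, get(sym, envir = mkt)[backtest_range], envir = .GlobalEnv)
}
```

The full series stays untouched in mkt, so a different window only needs a new backtest_range and another pass through the loop.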

Using grep() or sub() on sequence objects?

I want to summarize certain patterns in an event sequence object. The reason I want to do this is that my sequences are too long (several hundred events) and this makes computation extremely difficult. I have identified frequent subsequences, and now I want to replace certain frequent subsequences with markers that denote a full subsequence (as if it were a single event).
For example, I may have a pattern that I want to replace, say FA-FA. In the sequence
FA-FA-EX-EX-FA (5 event markers)
this would now be:
FAFA_pattern-EX-EX-FA (4 event markers)
I tried something along the lines of:
library(TraMineR)
data(actcal.tse)
actcal.seqe <- seqecreate(id = actcal.tse$id,
                          timestamp = actcal.tse$time, event = actcal.tse$event)
actcal.seqe2 <- sub("(LowPartTime)-1-(Stop)", "replaced_pattern", actcal.seqe)
This seems to work fine; however, it converts the sequence into a text string, and the result no longer functions as a sequence object. Is there a way to carry out such replacements while keeping the result a sequence object?
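One way to sidestep the problem, sketched here in base R, is to do the replacement on the raw event vector before the sequence object is built, and only then pass the collapsed events to seqecreate. The helper collapse_pair and the marker name are hypothetical, and the sketch ignores timestamps; with TSE data one would presumably keep the timestamp of the first event of each replaced pair.

```r
# Replace each non-overlapping occurrence of `first` immediately followed by
# `second` with a single `marker` event (hypothetical helper, not TraMineR API)
collapse_pair <- function(events, first, second, marker) {
  out <- character(0)
  i <- 1
  while (i <= length(events)) {
    is_pair <- i < length(events) &&
      events[i] == first && events[i + 1] == second
    if (is_pair) {
      out <- c(out, marker)
      i <- i + 2  # skip both events of the matched pair
    } else {
      out <- c(out, events[i])
      i <- i + 1
    }
  }
  out
}

ev <- c("FA", "FA", "EX", "EX", "FA")
collapsed <- collapse_pair(ev, "FA", "FA", "FAFA_pattern")
# collapsed is c("FAFA_pattern", "EX", "EX", "FA")
```

Applied per id to a TSE table, this would yield a shorter event list that can be turned into a regular event sequence object afterwards.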

Find specific patterns in sequences

I'm using R package TraMineR to make some academic research on sequence analysis.
I want to find a pattern defined as someone being in the target company, then going out, then coming back to the target company.
(Simplified) I've defined state A as the target company, B as a company outside the industry, and C as a company inside the industry.
So what I want to do is find sequences with the specific patterns A-B-A or A-C-A.
After looking at this question (Strange number of subsequences?) and reading the user guide, especially the following passages:
4.3.3 Subsequences
A sequence u is a subsequence of x if all successive elements u_i of u appear in x in the same
order, which we simply denote by u ⊂ x. According to this definition, unshared states can appear
between those common to both sequences u and x. For example, u = S, M is a subsequence of
x = S, U, M, MC.
and
7.3.2 Finding sequences with a given subsequence
The seqpm() function counts the number of sequences that contain a given subsequence and collects
their row index numbers. The function returns a list with two elements. The first element, MTab,
is just a table with the number of occurrences of the given subsequence in the data. Note that
only one occurrence is counted per sequence, even when the sub-sequence appears more than one
time in the sequence. The second element of the list, MIndex, gives the row index numbers of
the sequences containing the subsequence. These index numbers may be useful for accessing the
concerned sequences (example below). Since it is easier to search a pattern in a character string,
the function first translates the sequence data in this format when using the seqconc function with
the TRUE option.
I concluded that seqpm() was the function I needed to get the job done.
So I have sequences like:
A-A-A-A-A-B-B-B-B-B-A-A-A-A-A
From the definition of subsequences that I found in the sources mentioned, I figured I could find that kind of sequence by using:
seqpm(sequence,"ABA")
But that does not happen. In order to find that example sequence i need to input
seqpm(sequence,"ABBBBBA")
which is not very useful for what I need.
So do you see where I might have missed something?
How can I retrieve all the sequences that go from A to B and back to A?
Is there a way to find sequences that go from A to anything else and then back to A?
Thanks a lot!
The title of the seqpm help page is "Find substring patterns in sequences", and this is what the function actually does. It searches for sequences that contain a given substring (not a subsequence). There seems to be a wording error in the user's guide.
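The difference can be seen on a plain character string with base R regular expressions. This only illustrates the distinction between the two notions; it is not how seqpm is implemented.

```r
s <- "AAAAABBBBBAAAAA"

# Substring: the characters must be contiguous, so "ABA" is not found
is_substring <- grepl("ABA", s, fixed = TRUE)     # FALSE

# Subsequence: A, then B, then A in that order, with gaps allowed
is_subsequence <- grepl("A.*B.*A", s)             # TRUE

# A, then one or more non-A states, then A again
leaves_and_returns <- grepl("A[^A]+A", s)         # TRUE
```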
A solution for finding the sequences that contain given subsequences is to convert the state sequences into event sequences with seqecreate, and then use the seqefsub and seqeapplysub functions. I illustrate using the actcal data that ships with TraMineR.
library(TraMineR)
data(actcal)
actcal.seq <- seqdef(actcal[,13:24])
## displaying the first state sequences
head(actcal.seq)
## transforming into event sequences
actcal.seqe <- seqecreate(actcal.seq, tevent = "state", use.labels=FALSE)
## displaying the first event sequences
head(actcal.seqe)
## now searching for the subsequences
subs <- seqefsub(actcal.seqe, strsubseq=c("(A)-(D)","(D)-(B)"))
## and identifying the sequences that contain the subsequences
subs.pres <- seqeapplysub(subs, method="presence")
head(subs.pres)
## we can now, for example, count the sequences that contain (A)-(D)
sum(subs.pres[,1])
## or list the sequences that contain (A)-(D)
rownames(subs.pres)[subs.pres[,1]==1]
Hope this helps.

Summarizing attributes across sequences in a single sequence object?

I'm using TraMineR to analyze sets of sequences. Each coherent set of sequences may contain 100 work processes from a single project for a single period of time. Using TraMineR I can easily calculate descriptive statistics for each sequence, however I'm more interested in descriptive statistics of the sequence object itself - subsuming all the smaller sequences within.
For example, to get state frequencies, I run:
seqstatd(sequences.sts)
However, this gives me the state frequencies for each sequence within my sequence object. I want to access the frequencies of states across all sequences inside of my sequence object. How can I accomplish this?
I am not sure I understand your question, since seqstatd() returns the cross-sectional frequencies at each successive position, and NOT the state frequencies for each sequence. The latter are returned by seqistatd().
Assuming you are referring to the outcome of seqistatd(), you would get the mean time spent in each state with seqmeant(sequences.sts).
For other summaries you can use the apply function. For instance, you get the variance of the time spent in each state with
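library(TraMineR)
data(mvad)
mvad.seq <- seqdef(mvad, 17:86)  # mvad example data shipped with TraMineR
tab <- seqistatd(mvad.seq)       # time spent in each state, per sequence
vart <- apply(tab, 2, var)       # variance across sequences, per state
head(vart)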
Hope this helps.
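As a small sketch of how such per-sequence counts can be pooled over the whole object, the following uses a made-up matrix in place of the seqistatd() output (one row per sequence, one column per state); the state names and counts are assumptions for illustration.

```r
# Toy stand-in for seqistatd() output: time spent in each state, per sequence
tab <- matrix(
  c(3, 1, 0,
    2, 2, 0,
    0, 1, 3),
  nrow = 3, byrow = TRUE,
  dimnames = list(NULL, c("A", "B", "C"))
)

# Overall share of each state across all sequences and positions
overall_prop <- colSums(tab) / sum(tab)

# Mean time spent in each state per sequence
mean_time <- colMeans(tab)
```

With real data, replacing the toy matrix by seqistatd(sequences.sts) should give the pooled frequencies for the whole sequence object.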
