TraMineR: extract events between equal states from SPELL-based sequence data - r

Context
This question concerns sequence analysis using the TraMineR package. The package offers automatic transformation of temporal sequences (statuses in time) to event sequences (changes between statuses in time). One recurrent issue in my analyses is how to distinguish events of change between equal statuses.
Question-specific example
Suppose we have sequences of employment statuses, e.g. work, unemployment, inactivity, retirement. The analysis is focused on career transitions, distinguishing between stable and transitional careers. All kinds of transitions are relevant, from work to unemployment, inactivity to work, but also (and most importantly) from work to work!
Question
For TraMineR, an event takes place when the status in a sequence changes. For instance, the respondent had 3 years of work and then 1 in unemployment: Work-Work-Work-Unemployment (assuming an annual interval). This is the STS format, representing statuses in time. However, in SPELL format we have additional information, e.g.:
Status Time1 Time2
Work 1 2
Work 2 3
Work 3 3
Unemployment 3 4
From the table above we can clearly see that two work-to-work transition events have occurred (otherwise there would be just one line: Work from 1 to 3). The question is whether there is any convenient way to extract an event object from the sequence object based on these data.
Data
My data contains work-related respondent statuses in the SPELL format (status, begin & end time), like this:
to.SO <- structure(list(ID = c(10, 11, 11, 12, 13, 13, 13, 13, 14, 14,
14, 14, 14, 14, 14, 14, 14, 15, 15, 15, 15, 15, 15, 15), status = c(1,
1, 1, 1, 1, 1, 1, 1, 2, 3, 1, 2, 3, 2, 3, 1, 1, 1, 3, 1, 3, 3,
1, 3), time1 = c(1, 1, 104, 1, 1, 60, 109, 121, 1, 42, 47, 54,
64, 72, 78, 85, 116, 1, 29, 39, 69, 74, 78, 88), time2 = c(125,
104, 125, 125, 60, 109, 121, 125, 42, 47, 54, 64, 72, 78, 85,
116, 125, 29, 39, 69, 74, 78, 88, 125)), .Names = c("ID", "status",
"time1", "time2"), row.names = 10:33, class = "data.frame")
What I have tried
As per this post I must convert SPELL to STS first, then define sequences:
library(TraMineR)
## Convert SPELL to STS, then define the state sequences
sts.data <- seqformat(data=to.SO, from="SPELL", to="STS",
                      id="ID", begin="time1", end="time2", status="status",
                      limit=125, process=FALSE)
sts.seq <- seqdef(sts.data, right="DEL")
## Label the three states
state.labels <- c("Work","Study","Unemployed")
alphabet(sts.seq) <- state.labels
The information I require is already lost at this step, but until the bug (see link) is resolved there is no other way. It still shows what I want to achieve:
sts.seqe <- seqecreate(sts.seq) # creating events
sts.seqe
My results
Here, the first four event sequences are identical. If you look at the SPELL data (to.SO), it is apparent that there are multiple work-to-work transitions involved for respondents with id 11 and 13. In my other article I solve this by ascribing different statuses to job-1, job-2 and so forth. It is a less desirable strategy, however, since (1) it explodes the number of statuses, making subsequent dissimilarity analysis difficult, and (2) it is not theoretically important which job in the career it is; the employment status alone should cover it.
Thanks
I imagine this goes beyond the existing package capabilities, but perhaps I am missing something. Thanks in advance for reading this long post (at least) and for having any suggestions.

We could indeed imagine a solution which creates the event sequences from the spell data as you suggest. TraMineR does not offer this for now (but see Matthias' solution).
A workaround, which you have already given in your question, is to distinguish the successive jobs as job1, job2, ...
I understand that this is less desirable, but you can use this strategy just for defining the event sequences, assigning the same event, e.g. "start new job", to each transition from job i to job i+1. To do so, you need to specify a matrix (tmat) of size a x a, where a is the size of your state alphabet, which lists in each cell (i, j) the events occurring on a transition from state i to state j. For example, at the intersection of row job1 and column job2 you would give "start new job", and since switching from job2 to job1 should not be possible, you would just leave the corresponding cell empty. The diagonal cells tmat(i, i) define the start event when the state sequence starts in the corresponding state i.
Once you have defined the matrix (tmat) giving the events assigned to each possible transition, you create the event sequence object as follows (sts2.seq here stands for the state sequence object defined with the distinct job states):
seqe <- seqecreate(sts2.seq, tevent=tmat)
And you can still use your original sts.seq for state sequence analysis with a single work status.
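The job1, job2, ... recoding described in this answer can be derived from the spell data itself. Below is a minimal base-R sketch, assuming the spell layout of to.SO (status 1 = Work) and rows sorted by ID and begin time. The data frame `spells` and the names `state_labels`, `job_no` and `state2` are illustrative only and are not part of TraMineR.

```r
# A few rows copied from to.SO (IDs 11, 13 and the first three spells of 14)
spells <- data.frame(
  ID     = c(11, 11, 13, 13, 13, 13, 14, 14, 14),
  status = c(1, 1, 1, 1, 1, 1, 2, 3, 1),
  time1  = c(1, 104, 1, 60, 109, 121, 1, 42, 47)
)
state_labels <- c("Work", "Study", "Unemployed")

# Running count of Work spells within each individual
is_work <- spells$status == 1
job_no  <- ave(as.integer(is_work), spells$ID, FUN = cumsum)

# Work spells become job1, job2, ...; other states keep their label
state2 <- ifelse(is_work, paste0("job", job_no), state_labels[spells$status])
print(data.frame(spells, state2))
```

The recoded spells could then be converted into a second state sequence object used only for building the events, while sts.seq keeps the single Work status.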
Hope this helps.

'seqecreate' accepts different kinds of input. One of them is a state sequence object (as produced by seqdef). But you can also build an event sequence object by providing data in TSE format. For this, you should specify three vectors: id, timestamp, and event.
The spell format can be viewed as data in the TSE format (if you ignore the end of the period). The begin column gives the time at which the event in the status column occurred.
Therefore, we can use the following code:
## Start by giving some labels to the status vector
to.SO$event <- factor(to.SO$status, levels=1:3, labels=c("Work","Study","Unemployed"))
## Now, we can build the event sequences using seqecreate
## You may want to use timestamp=(to.SO$time1-1) instead. Event sequences start at time=0
seqe <- seqecreate(id=to.SO$ID, timestamp=to.SO$time1, event=to.SO$event)
seqe
Now the fourth individual has the correct event sequence.
If you want to analyze the "Work>Work" transition, then you need to recode your data.
## New vector holding our recoded events
event2 <- as.character(to.SO$event)
## For each row in the TSE data
for (i in 2:nrow(to.SO)) {
  if (to.SO[i-1, "ID"] == to.SO[i, "ID"]) {  ## If we have the same ID (individual)
    if (to.SO[i-1, "event"] == "Work" && to.SO[i, "event"] == "Work") {  ## Check
      event2[i] <- "Work>Work"
    }
  }
}
## More general case
event3 <- as.character(to.SO$event)
## For each row in the TSE data
for (i in 2:nrow(to.SO)) {
  if (to.SO[i-1, "ID"] == to.SO[i, "ID"]) {  ## If we have the same ID (individual)
    event3[i] <- paste(to.SO[i-1, "event"], to.SO[i, "event"], sep=">")
  }
}
By adapting this code, you can specify the transitions you are interested in.
seqe2 <- seqecreate(id=to.SO$ID, timestamp=to.SO$time1-1, event=event2)
seqe2
OR
seqe3 <- seqecreate(id=to.SO$ID, timestamp=to.SO$time1-1, event=event3)
seqe3
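The row-by-row recoding loops can also be written without an explicit loop. The following is a sketch in base R only, assuming the rows are sorted by ID and begin time; the data frame `spells` and the helper vector `prev_event` are illustrative names, not TraMineR objects.

```r
# A few spells copied from the question's data (IDs 10, 11 and the start of 14)
spells <- data.frame(
  ID    = c(10, 11, 11, 14, 14, 14),
  event = c("Work", "Work", "Work", "Study", "Unemployed", "Work"),
  stringsAsFactors = FALSE
)

# Previous event within each individual (NA for an individual's first spell)
prev_event <- ave(spells$event, spells$ID,
                  FUN = function(x) c(NA_character_, head(x, -1)))

# First spells keep their label; later spells become "previous>current"
event3 <- ifelse(is.na(prev_event), spells$event,
                 paste(prev_event, spells$event, sep = ">"))
print(event3)
```

The resulting vector can be passed as the event argument of seqecreate in the same way as the loop-based event3.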

Related

Find the best combination of items based on multiple ordered criteria in Lua

I'm trying to make an algorithm in Lua to find the optimal combination of items to meet multiple ordered criteria.
Constraints are:
Find the combination of items closest to the criteria (each total is the sum of the same variable over all slots), with exactly 1 item per slot
15 slots
Maximum 5 criteria, minimum 1 criterion
All values are positive integers
Criteria are prioritized such as A > B > C ...
Number of possible items per slot is theoretically between 0 and 15
For example I have a list of :
{slot: 1, valueA: 10, valueB: 20, valueC: 0}
{slot: 1, valueA: 10, valueB: 20, valueC: 16}
{slot: 2, valueA: 10, valueB: 40, valueD: 29}
{slot: 2, valueA: 30, valueB: 460, valueK: 47}
{slot: 2, valueA: 40, valueB: 50, valueC: 32}
{slot: 3, valueA: 55, valueB: 0, valueJ: 50}
With criteria such as : TotalA = 50, TotalB = 20, TotalC = 90
I want to get the best combination of items to meet TotalA and then TotalB and finally TotalC.
I tried to brute force this using loops but it takes too much time to solve this.
I've found a few discussions about the Knapsack problem and how to solve it using dynamic programming or an ILP solver (I didn't find one in Lua, however), but I'm not good enough at mathematics to figure out a working solution.
There's also a missing dimension in the Knapsack problem, the ordering of the criteria.
If someone can guide me with some simple words, pseudo code or Lua it would be awesome.

R removing duplicate values based on two columns, with different "tolerance" values for each column

I feel like I'm really close to getting this right, but can't seem to take that last step.
I have a data frame that looks like below:
df <- data.frame("Time" = c(1, 2, 2.01, 3, 4, 4, 5, 5.5, 6, 6.05, 7),
"Speed" = c(10, 20, 23, 30, 40, 49, 50, 52, 60, 62, 70))
I'm trying to remove duplicates from both Time and Speed, but with different thresholds.
With Time, anything that has a difference of <=0.05 is considered a duplicate - so (2, 2.01), (4, 4) and (6, 6.05) are duplicates.
With Speed, anything that has a difference of <=5 is considered a duplicate - so (20, 23) and (60, 62) are duplicates.
Since (2, 2.01) and (6, 6.05) Time pairs also have duplicate Speed values, I would like to keep one row and remove the other duplicate. But since the (4, 4) Time pair has a different Speed value ((40, 49) is over my tolerance value 5), I'd like to completely get rid of that row.
So my final data frame would look like:
df2 <- data.frame("Time" = c(1, 2, 3, 5, 5.5, 6, 7),
"Speed" = c(10, 20, 30, 50, 52, 60, 70))
I found a way to remove duplicates for (2, 2.01) and (6, 6.05) pairs with the line of code below:
df[!duplicated(round(df$Time, 1)),]
But what's returned from the line above still includes Time 4.0 and Speed 40 in the 4th row, and I can't find a way to completely get rid of this row. Does anyone have any suggestions?
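One way to express the rule described above is to compare each row with the row before it. This is only a sketch under two assumptions that the question does not state explicitly: the data are sorted by Time, and only adjacent rows need to be compared. The `eps` guard and the `drop` vector are illustrative additions.

```r
df <- data.frame(Time  = c(1, 2, 2.01, 3, 4, 4, 5, 5.5, 6, 6.05, 7),
                 Speed = c(10, 20, 23, 30, 40, 49, 50, 52, 60, 62, 70))

eps <- 1e-9  # guards against floating-point error in the differences
time_dup  <- diff(df$Time) <= 0.05 + eps      # row i+1 is close in Time to row i
speed_dup <- abs(diff(df$Speed)) <= 5 + eps   # row i+1 is close in Speed to row i

drop <- rep(FALSE, nrow(df))
drop[which(time_dup) + 1] <- TRUE             # drop the second row of every Time pair
drop[which(time_dup & !speed_dup)] <- TRUE    # drop the first row too if Speed disagrees

df2 <- df[!drop, ]
print(df2)
```

With the example data this keeps Time 1, 2, 3, 5, 5.5, 6, 7, which matches the desired df2; chains of more than two near-duplicate rows would need a more careful grouping.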

R: Add column with what number of appearance of x the current row is

(Probably answered before, but I can't find it; I'm probably wording it wrong.)
I have a dataset where each row is a different video entry, but the same videos appear over and over again.
Every time the video_id (character) of the same video appears, I need a column that says which appearance this is.
Try this:
df <- data.frame(video_id=c('abc', rep('def', 2), rep('abc', 2), 'ghi', 'abc'),
views=c(100, 50, 70, 120, 150, 300, 150))
df$appearance <- ave(df$views, df$video_id, FUN=seq_along)

how do you count the number of results

Write a program that reads a series of numbers, ending with 0, and then tells you how
many numbers you have keyed in (other than the last 0). For example, if you keyed in
the numbers 5, -10, 50, 22, -945, 12, 0 it would output ‘You have entered 6 numbers.’.
I'm doing my homework and can't get this one to work.
What stumps me: I understand adding the numbers to get the sum total, but what do I call the number of numbers ...
Thanks
Python has a very simple method that could be used here, str.count(). If the numbers are held in a single string and separated by commas, you can count the commas to get the number of numbers (not including the final 0, which doesn't have a comma after it). An example of this in use would be
numbers = "5, -10, 50, 22, -945, 12, 0"
number_of_numbers = numbers.count(',')

How to find positive integers of any numbers

How many positive two-digit integers are factors of (2^24 - 1)?
Can anyone tell me the formula or some shortcuts to find positive integers?
There is no easy way to find the factors of a number other than calculating them. You have to iterate over the two-digit integers and check each one with a modulo calculation.
There appear to be 12 divisors in all: 13, 15, 17, 21, 35, 39, 45, 51, 63, 65, 85 and 91.
See: http://www.wolframalpha.com/input/?i=factorize+2%5E24+-+1
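The modulo check described in this answer is short enough to write out. A minimal sketch in R (the variable names are illustrative):

```r
n <- 2^24 - 1          # 16777215, exactly representable as a double
candidates <- 10:99    # all two-digit positive integers

# Keep the candidates that divide n without remainder
two_digit_factors <- candidates[n %% candidates == 0]
print(two_digit_factors)
print(length(two_digit_factors))
```

This reproduces the 12 divisors listed above.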
There actually is a trick that I know! First you need to prime factorize the number. Let's say that my number is 56. Its prime factors are 7, 2, 2, 2. Since I have three 2s, I write 2^3. Since there is only one 7, I write 7^1. Then add one to each of the exponents and multiply the results: (1+1) * (3+1) = 2 * 4 = 8.
So for this example, the answer is 8!! Have fun!
from a fifth grader
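The exponent trick can be sketched as a small R function. Note that it counts all positive divisors of a number, not only the two-digit ones, so it does not answer the original question on its own. The function name `count_divisors` is illustrative, and trial division is just one simple way to obtain the prime exponents.

```r
count_divisors <- function(n) {
  total <- 1
  p <- 2
  while (p * p <= n) {
    e <- 0
    while (n %% p == 0) {   # strip out every factor p and count the exponent
      n <- n / p
      e <- e + 1
    }
    total <- total * (e + 1)  # (exponent + 1) choices for this prime
    p <- p + 1
  }
  if (n > 1) total <- total * 2  # one remaining prime with exponent 1
  total
}

print(count_divisors(56))        # 56 = 2^3 * 7^1, so (3+1) * (1+1) = 8
print(count_divisors(2^24 - 1))
```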
