Handling SPELL data with exact dates (feature request?) - r

I'm learning TraMineR and have used different types of longitudinal data. My original data is SPELL data with id, start time, end time and status, where the start and end times are exact dates, so my subsequences have varying lengths.
With seqformat() I can chop the data (automatically) into 1 year pieces and convert it into STS format, e.g. where the first variable is the first date, the second variable is the first date + 1 year, and so on.
What I would like to do is adjust the conversion so that I could use half year or one month time periods.
Here I have converted the dates into years with decimals with decimal.date():
  id    start      end status
1  1 1965.138 1965.974      1
2  1 1968.714 1987.237      1
3  1 1985.667 2003.933      2
4  1 1988.499 1988.665      1
5  1 1996.652 1996.878      1
The sequence object that is created automatically has the data in one year subsequences:
$ y1960.16803278689
$ y1961.16803278689
$ y1962.16803278689
$ y1963.16803278689
So for data with exact dates I would like the option of using subsequence lengths shorter than 1 year. I understand that with seqgranularity() the opposite is possible.
Alternatively I'm interested to know if there's some way in R outside TraMineR to handle the SPELL data to create certain length subsequences.
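One possible way to do this outside TraMineR, as a rough base R sketch rather than a TraMineR feature: sample the state on a regular grid of decimal years, with the grid step setting the period length. The names spell, step, grid, state_at and sts are made up for this illustration, and the rule for overlapping spells (the later listed spell wins) is an assumption.

```r
# The SPELL data from the question (dates as decimal years)
spell <- data.frame(
  id     = c(1, 1, 1, 1, 1),
  start  = c(1965.138, 1968.714, 1985.667, 1988.499, 1996.652),
  end    = c(1965.974, 1987.237, 2003.933, 1988.665, 1996.878),
  status = c(1, 1, 2, 1, 1)
)

step <- 1 / 12  # period length in years; 0.5 would give half-year periods
grid <- seq(floor(min(spell$start)), ceiling(max(spell$end)), by = step)

# State at time t: status of the last listed spell covering t, NA if none
# (assumption: when spells overlap, the later listed one wins)
state_at <- function(t, s) {
  hit <- which(s$start <= t & t < s$end)
  if (length(hit) == 0) NA else s$status[max(hit)]
}

# One row per id, one column per grid point (STS-like layout)
sts <- t(sapply(split(spell, spell$id), function(s) sapply(grid, state_at, s = s)))
colnames(sts) <- paste0("y", round(grid, 3))
```

The resulting matrix has one column per month and could then be inspected or passed on to other tools; periods not covered by any spell are left as NA.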

Related

Turn date column into days from beginning integer Rstudio

Hi everyone, I am currently plotting time series graphs in RStudio. I have created a nice time series graph, but I would like the x axis to show not the date but an integer counting the days from the starting date of the graph.
Time Series Graph
For example, instead of seeing 01/01/2021 I want to see day 100, as in it's the 100th day of recording data.
Do I need to create another column converting all the days into numerical values and then plot this?
If so, how do I do this? At the moment all I have is a Date column and the column of values I am plotting.
Column Data
Thanks
Assuming you want 01/01/2021 as the first day, you could use it as a reference, calculate the number of days passed since the first day of recording, and plot that. This gives you an integer counting from the starting date.
Not sure what your data frame looks like so hopefully this helps.
Using lubridate
library(lubridate)
df
Date
1 01/01/2021
2 02/01/2021
3 03/01/2021
4 04/01/2021
df$days <- yday(dmy(df$Date)) -1
Output:
Date days
1 01/01/2021 0
2 02/01/2021 1
3 03/01/2021 2
4 04/01/2021 3
Which is indeed a numeric
str(df$days)
num [1:4] 0 1 2 3
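Note that yday() restarts at 1 every January, so the line above only works while all dates fall in the same year. A minimal base R sketch that also handles a series crossing a year boundary, assuming day/month/year strings (the sample dates here are made up):

```r
# Hypothetical dates spanning a year boundary, in day/month/year format
df <- data.frame(
  Date = c("30/12/2020", "31/12/2020", "01/01/2021", "02/01/2021"),
  stringsAsFactors = FALSE
)

d <- as.Date(df$Date, format = "%d/%m/%Y")

# Days elapsed since the earliest date in the column
df$days <- as.numeric(d - min(d))
```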
This is a simulation of dates:
date.simulation = as.Date(1:100, "2001-01-01")
factor(date.simulation-min(date.simulation))
You just subtract the minimum date from each date, and convert the result to a factor for plotting purposes.

R: Function to repeat counting words in strings with different arguments

I am counting the sum of words of strings with specific arguments e.g. for weeks (week 1 = 1, week 2 = 2 and so on) with the following command:
sum(data[which(data[,17]==1), 19])
Column 17 of the data frame holds the numeric week, which has to be 1 for week 1.
Column 19 of the data frame holds the number of words in each string.
I have 31 weeks and 228,000 strings and I do not want to execute the command for each week separately, so I am looking for a function that does this automatically for weeks 1-31 and gives me the results.
Thanks for helping!
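One way this could be done without a loop is tapply(), which applies a function to a vector split by a grouping vector. A minimal sketch on a made-up stand-in for the real data (the week and words columns here are hypothetical):

```r
# Hypothetical stand-in: one row per string, with its week and word count
data <- data.frame(
  week  = c(1, 1, 2, 3, 3, 3),
  words = c(10, 5, 7, 1, 2, 3)
)

# Sum of words for every week in one call
words_per_week <- tapply(data$words, data$week, sum)
```

With the column positions from the question, the equivalent call would presumably be tapply(data[, 19], data[, 17], sum).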

intersect interval of time by a character variable

I have a dataset of worker curricula with:
2 time variables: df$from and df$to
1 name variable: df$name
1 bureau variable: df$bureau
I would like to generate a variable df$varable which adds 1 point every time there is an overlap in time between different people in the same bureau.
I am able to do intersect(df$from, df$to) but not to condition on df$bureau.
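A rough sketch of one way to condition on the bureau, assuming that two intervals overlap when each starts no later than the other ends. The sample rows and the overlaps column name are made up for illustration.

```r
# Hypothetical sample data using the column names from the question
df <- data.frame(
  name   = c("Ann", "Bob", "Cid", "Dee"),
  bureau = c("A", "A", "A", "B"),
  from   = as.Date(c("2010-01-01", "2010-06-01", "2012-01-01", "2010-03-01")),
  to     = as.Date(c("2010-12-31", "2011-05-31", "2012-12-31", "2010-09-30")),
  stringsAsFactors = FALSE
)

# For each row, count other people in the same bureau whose interval overlaps
df$overlaps <- sapply(seq_len(nrow(df)), function(i) {
  same_bureau <- df$bureau == df$bureau[i] & df$name != df$name[i]
  overlapping <- df$from <= df$to[i] & df$to >= df$from[i]
  sum(same_bureau & overlapping)
})
```

This compares every row with every other row, so it may be slow on large data sets.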

Time Series Analysis By Minute over a Year: How To Create Frame in R

I have a .csv of 1,052,640 rows. Each row is a reading of activity within a 1 minute interval for 2 years (7/1/2014 to 6/30/2016)
Using R, I imported the data into a dataframe like so:
uri = 'summary.csv'
df.visits <- read.csv(uri, header=FALSE)
names(df.visits) <- c("DateTime", "Visits")
df.visits <- data.frame(df.visits)
head(df.visits)
with the output
DateTime Visits
1 7/1/2014 12:00:00 AM 0
2 7/1/2014 12:01:00 AM 0
3 7/1/2014 12:02:00 AM 0
I am trying to push that dataframe into a time series structure like this:
ts.visits <- ts(df.visits,frequency=525960, start=c(2014,7,1))
head(ts.visits)
and the output is:
DateTime Visits
[1,] 788041 0
[2,] 788043 0
[3,] 788045 0
[4,] 788047 0
My question - is 525960 the correct value to use for frequency? What happens if there is a leap year? Are the dateTime values ('788041') correct? I want to do seasonality analysis by time of day, day of week, and month of year.
In R, ts objects are for time series with fixed seasonal period. If you want to consider the fact that there are a varying number of seconds in a year because of leap years, you have to use something else. The package xts is an alternative for arbitrary observation times.
Also, the column DateTime in your ts object (actually, an mts) is NOT the set of times that the object uses internally. Its values are treated as the observations of another time series. The actual times can be obtained with time(ts.visits).
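Whichever time series class is used, the DateTime strings first have to be parsed into real date-times. A minimal base R sketch, assuming the 12-hour format shown in the question and an English-style AM/PM locale; the third sample row is made up, and the when vector could then serve as the index of an xts object.

```r
# Sample rows in the format shown in the question (third row is hypothetical)
df.visits <- data.frame(
  DateTime = c("7/1/2014 12:00:00 AM", "7/1/2014 12:01:00 AM", "7/1/2014 1:30:00 PM"),
  Visits   = c(0, 2, 5),
  stringsAsFactors = FALSE
)

# Parse 12-hour clock strings; %p relies on the locale's AM/PM markers
when <- as.POSIXct(df.visits$DateTime, format = "%m/%d/%Y %I:%M:%S %p", tz = "UTC")

# Grouping keys for seasonality by time of day
hour_of_day <- as.integer(format(when, "%H"))

# Total visits per hour of day
visits_by_hour <- tapply(df.visits$Visits, hour_of_day, sum)
```

The same pattern with format(when, "%u") or format(when, "%m") would group by day of week or month of year.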

Selecting Specific Dates in R

I am wondering how to create a subset of data in R based on a list of dates, rather than by a date range.
For example, I have the following data set data which contains 3 years of 6-minute data.
date zone month day year hour minute temp speed gust dir
1 09/06/2009 00:00 PDT 9 6 2009 0 0 62 2 15 156
2 09/06/2009 00:06 PDT 9 6 2009 0 6 62 13 16 157
I have used breeze<-subset(data, ws>=15 & wd>=247.5 & wd<=315, select=date:dir) to select the rows which meet my criteria for a sea breeze, which is fine, but what I want to do is create a subset of the days which contain those times that meet my criteria.
I have used...
as.character(breeze$date)
trimdate<-strtrim(breeze$date, 10)
breezedate<-as.Date(trimdate, "%m/%d/%Y")
breezedate<-format(breezedate, format="%m/%d/%Y")
...to extract the dates from each row that meets my criteria so I have a variable called breezedate that contains a list of the dates that I want (not the most eloquent coding to do this, I'm sure). There are about two-hundred dates in the list. What I am trying to do with the next command is in my original dataset data to create a subset which contains only those days which meet the seabreeze criteria, not just the specific times.
breezedays<-(data$date==breezedate)
I think one of my issues here is that I am comparing one value to a list of values, but I am not sure how to make it work.
Let's assume your breezedate list looks like this and data$date is a simple string:
breezedate <- as.Date(c("2009-09-06", "2009-10-01"))
This is probably what you want (note the comma, which selects rows):
breezedays <- data[as.Date(data$date, '%m/%d/%Y') %in% breezedate, ]
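A self-contained sketch of the %in% approach on a few made-up rows, assuming data$date holds month/day/year strings followed by a time:

```r
# Hypothetical sample rows in the date format from the question
data <- data.frame(
  date  = c("09/06/2009 00:00", "09/06/2009 00:06",
            "09/07/2009 00:00", "10/01/2009 12:00"),
  speed = c(2, 13, 4, 20),
  stringsAsFactors = FALSE
)

# Days that met the sea breeze criteria
breezedate <- as.Date(c("2009-09-06", "2009-10-01"))

# Calendar day of every row; the trailing time part is ignored when parsing
day <- as.Date(data$date, format = "%m/%d/%Y")

# Keep every row whose day appears in breezedate
breezedays <- data[day %in% breezedate, ]
```

Unlike ==, which compares element by element and recycles the shorter vector, %in% tests each value on the left against the whole set on the right.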
The intersect() function compares two vectors and returns the values they have in common. Note that it returns the unique shared values rather than the matching rows of data, and both vectors must hold the dates in the same format.
To use it, run the following:
breezedays <- intersect(data$date, breezedate) # values shared between data$date and breezedate
