R - how to create lagged variables by id, day and assessment number, specifying the interval

Maybe someone here can help me out!
What I need to do in R is:
create lags for multiple variables considering id, day and day_nr (as I have multiple assessments for each participant on each day, and no lags should be created overnight, meaning no lag of the first assessment in the morning against the last observation of the previous day)
I tried several options, for example this, but didn't manage to group by more than id:
library(data.table)
data[, lag.value:=c(NA, value[-.N]), by=id]
Furthermore, I have now included the specific time of day of each assessment, and lags should only be created for observations with an interval of less than 3 hours between them, as the number of assessments per day is irregular. Any idea how I could do this in R?
Thanks a lot!!
Tine
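A minimal sketch of one way this could look with data.table: the column names day and beep_time (a POSIXct assessment timestamp) are assumptions standing in for whatever the real data uses. The idea is to lag within id and day, then blank out lags whenever the gap to the previous assessment is 3 hours or more.
library(data.table)
setDT(data)
setorder(data, id, day, beep_time)
# lag within person and day, so no lag is carried over from the previous day
data[, lag.value := shift(value), by = .(id, day)]
# hours since the previous assessment of the same person on the same day
data[, gap_hours := as.numeric(difftime(beep_time, shift(beep_time), units = "hours")), by = .(id, day)]
# drop lags where the gap between assessments is 3 hours or more
data[gap_hours >= 3, lag.value := NA]
data[, gap_hours := NULL]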

Related

Analyzing disparate time series in R

Are there tools in R that simplify analysis of lagged and disparate time series? For example:
Daily values that only occur on weekdays (no entry on weekends or holidays)
vs
Bi-annual values
What I'm seeking is ways to:
Complete the missing daily values (with interpolated, or last value rolled forward, etc.)
Look for correlation between daily values and the bi-annual value (only the values that came before the bi-annual event)
As an example:
10-year treasury note interest rate (daily on non-holiday weekdays) as "X" and i-bond fixed rate as "Y" (set May 1/Nov 1)
Any suggestions appreciated.
I've built a test dataset manually for "x" and used functions in zoo to populate the missing values (interpolated), but I'm hoping for a less "brute-force" method for analyzing the disparate time series. I've used lag functions in the past, but those were on matching-interval time series.
What Jon commented is what I had in mind:
expand a weekday time series to full week using missing value function(s) in zoo
Sample the daily value - say April 15 for the May 1 value, Oct 15 for Nov 1
Ideally be able to automate - say loop through April 1-30, Oct 1-30 to look for the highest R-squared for the model of choice (linear, polynomial, etc.)
Not have to build discrete datasets for each of the above - but if that is what is required I can do it programmatically - I've done that with stock data in the past. I was looking for a more efficient means of selecting the datasets ad hoc during the analysis.
I don't have code to post, because I'm clueless as to the feature/function that would make the date selection I'm after possible (at least in R).
Thanks for the input so far. It has already been useful in helping me look at alternative methods to achieve what I'm after.
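A minimal sketch of the first two steps with zoo, assuming x_daily is a zoo series indexed by Date; the May 1 / Nov 1 reset dates and the 15-day sampling offset are illustrative, not taken from the question's data:
library(zoo)
# expand the weekday series to every calendar day, rolling the last value forward
full_idx <- seq(start(x_daily), end(x_daily), by = "day")
x_full <- na.locf(merge(x_daily, zoo(, full_idx)))
# (use na.approx() instead of na.locf() for linear interpolation)
# sample the daily series at a fixed offset before each bi-annual reset date
reset_dates <- as.Date(c("2015-05-01", "2015-11-01"))
x_sample <- x_full[reset_dates - 15]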

Creating new datasets from unique dates in R

I have a dataset of 2015 with every day of the year. In this dataset, there are actions that happen on any given day. Some days have more actions than others, therefore some days have many more entries than others.
I am trying to create a function that will create an individual dataset per day of the year without having to code 365 of these:
df <- subset(dataset, date== "2015-01-01")
I have looked at dplyr's group_by(), however I do not want a summary per day; it is important that I get to see the whole observation on any given day for graphing purposes.
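A minimal sketch with base R's split(), assuming the data frame is called dataset and the column is date, as in the subset() call above:
# one data frame per unique date, named by the date
daily_list <- split(dataset, dataset$date)
# same rows as subset(dataset, date == "2015-01-01")
daily_list[["2015-01-01"]]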

Average after 2 group_by's in R

I am new to R and can't find the right syntax for a specific average I need. I have a large Fitbit dataset of heart rate per second for 30 people, for a month each. I want an average of heart rate per day per person to make the data easier to manage and join with other Fitbit data.
[screenshot: first few lines of the data]
The columns I have are Id (person Id#), Time (Date-Time), and Value (Heartrate). I already separated Time into two columns, one for date and one for time only. My idea is to group the information by person, then by date, and get one average number per person per day. But my code is not doing that.
hr_avg <- hr_per_second %>% group_by(Id) %>% group_by(Date) %>% summarize(mean(Value))
As a result I get an average by date only. I can't do this manually because the dataset is so big that Excel can't open it. And I can't upload it to BigQuery either, which is the database I learned to use during my data analysis course. Thanks.
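A minimal sketch of one fix: chaining two group_by() calls replaces the first grouping with the second by default, so grouping by both columns in a single call (column names Id, Date and Value as described above) gives one mean per person per day:
library(dplyr)
hr_avg <- hr_per_second %>%
  group_by(Id, Date) %>%
  summarize(avg_hr = mean(Value), .groups = "drop")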

Calculate rolling yearly differences in R with xts

I would like to calculate rolling yearly differences based on a daily time series in R with an xts object. However, I currently see two issues:
The number of trading days per year is not constant.
There could be holes in the time series, e.g. one year could be missing in-between.
Are there functions available in the library to take such rolling differences without constant lags (e.g. a lag of 260 days could sometimes be off by 10 days)? Or would the correct approach here be to search, for each date, for the same date one year before (minus one or two days to account for weekends)?
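A minimal sketch of the second approach (looking up, for each date, the last available observation roughly one year earlier), assuming x is a single-column daily xts series; the 365-day offset and object names are illustrative:
library(xts)
idx <- index(x)
target <- idx - 365                  # roughly the same date one year earlier
pos <- findInterval(target, idx)     # position of the last observation on or before the target date
pos[pos == 0] <- NA                  # no observation a year earlier exists
yearly_diff <- xts(as.numeric(x) - as.numeric(x)[pos], order.by = idx)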

Compute average over sliding time interval (7 days ago/later) in R

I've seen a lot of solutions for working with groups of times or dates, like aggregate to sum daily observations into weekly observations, or other solutions to compute a moving average, but I haven't found a way to do what I want, which is to pluck relative dates out of data keyed by an additional variable.
I have daily sales data for a bunch of stores. So that is a data.frame with columns
store_id date sales
It's nearly complete, but there are some missing data points, and those missing data points are having a strong effect on our models (I suspect). So I used expand.grid to make sure we have a row for every store and every date, but at this point the sales data for those missing data points are NAs. I've found solutions like
dframe[is.na(dframe)] <- 0
or
dframe$sales[is.na(dframe$sales)] <- mean(dframe$sales, na.rm = TRUE)
but I'm not happy with the RHS of either of those. I want to replace missing sales data with our best estimate, and the best estimate of sales for a given store on a given date is the average of the sales 7 days prior and 7 days later. E.g. for Sunday the 8th, the average of Sunday the 1st and Sunday the 15th, because sales is significantly dependent on day of the week.
So I guess I can use
dframe$sales[is.na(dframe$sales)] <- my_func(dframe)
where my_func(dframe) replaces every store's missing sales data with the average of that store's sales 7 days prior and 7 days later (ignoring for the first go-around the situation where one of those data points is also missing), but I have no idea how to write my_func in an efficient way.
How do I match up the store_id and the dates 7 days prior and future without using a terribly inefficient for loop? Preferably using only base R packages.
Something like:
with(
  dframe,
  ave(sales, store_id, FUN = function(x) {
    # assumes rows within each store are sorted by date with one row per day
    # (as produced by expand.grid), so a shift of 7 positions is 7 days
    naw <- which(is.na(x))
    prev <- naw - 7
    prev[prev < 1] <- NA  # guard against indexing before the first day
    x[naw] <- rowMeans(cbind(x[naw + 7], x[prev]))
    x
  })
)
