Creating new datasets from unique dates in R - r

I have a dataset covering every day of 2015. Actions can occur on any given day, and some days have many more entries than others.
I am trying to write a function that creates an individual dataset per day of the year, without having to code 365 of these:
df <- subset(dataset, date == "2015-01-01")
I have looked at dplyr's group_by(); however, I do not want a summary per day. It is important that I can see every observation for any given day, for graphing purposes.
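One possible approach, sketched here under assumptions, is base R's split(), which returns a named list with one data frame per distinct date and keeps every row. The sample data and the `action` column are hypothetical; only `dataset` and `date` come from the question.

```r
# Hypothetical sample data: a few days with an uneven number of entries.
dataset <- data.frame(
  date = as.Date(c("2015-01-01", "2015-01-01", "2015-01-02",
                   "2015-01-03", "2015-01-03", "2015-01-03")),
  action = c("a", "b", "c", "d", "e", "f")  # assumed column, for illustration
)

# One data frame per unique date, with all observations kept.
by_day <- split(dataset, dataset$date)

# Each day's data is then available by name.
jan01 <- by_day[["2015-01-01"]]

# Number of observations per day.
rows_per_day <- sapply(by_day, nrow)
```

Keeping the per-day data frames in one list, rather than in 365 separate variables, means lapply() can apply the same plotting code to each day.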

Related

Average after 2 group_by's in R

I am new to R and can't find the right syntax for a specific average I need. I have a large Fitbit dataset of heart rate per second for 30 people, for a month each. I want the average heart rate per day per person, to make the data easier to manage and to join with other Fitbit data.
First few lines of Data
The columns I have are Id (person Id#), Time (Date-Time), and Value (Heartrate). I already separated Time into two columns, one for date and one for time only. My idea is to group the information by person, then by date and get one average number per person per day. But, my code is not doing that.
hr_avg <- hr_per_second %>% group_by(Id) %>% group_by(Date) %>% summarize(mean(Value))
As a result, I get an average by date only. I can't do this manually because the dataset is so big that Excel can't open it, and I can't upload it to BigQuery either (the database I learned to use during my data analysis course). Thanks.
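A likely cause is that, by default, a second group_by() call in dplyr replaces the grouping set by the first one, so only Date remains. Listing both columns in one call, group_by(Id, Date), groups by both. The sketch below shows the same per-person, per-day average with base R's aggregate(); the sample values are made up for illustration.

```r
# Hypothetical sample data using the column names from the question.
hr_per_second <- data.frame(
  Id = c(1, 1, 1, 1, 2, 2),
  Date = as.Date(c("2016-04-12", "2016-04-12", "2016-04-13",
                   "2016-04-13", "2016-04-12", "2016-04-12")),
  Value = c(60, 70, 80, 90, 100, 110)
)

# One mean per combination of Id and Date.
# The dplyr equivalent groups by both columns in a single call:
#   hr_per_second %>% group_by(Id, Date) %>% summarize(mean(Value))
hr_avg <- aggregate(Value ~ Id + Date, data = hr_per_second, FUN = mean)
```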

Converting daily .csv data in R to aggregate weekly data using R

I'm very new to R, so I'm sorry if some of the terminology I use is incorrect. I have a large .csv file of daily visits, with one column for the date (in D/M/Y format) and a separate column for the number of visits that day. The dates run from 05/01/20 to 06/11/20. I've plotted a daily time series of this data, and now I'm trying to plot a weekly time series, where the daily visits are summed into a weekly total for each week, starting Monday and ending Sunday. I have looked through other similar questions on this site, and came across this code:
Week <- as.Date(cut(DF$Date, "week"))
aggregate(Frequency ~ Week, DF, sum)
However, I can't seem to get it to work. I would prefer to keep it as simple as possible. I also have the forecast package and zoo package installed if that helps.
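One common reason this code fails is that a date column read from a .csv arrives as text, while cut(..., "week") needs Date values. The sketch below assumes that is the problem and converts the D/M/Y strings first. The sample rows are hypothetical; `DF`, `Date` and `Frequency` are taken from the code above.

```r
# Hypothetical sample rows; dates are D/M/Y text, as they would be after read.csv().
DF <- data.frame(
  Date = c("05/01/20", "06/01/20", "12/01/20", "13/01/20"),
  Frequency = c(3, 4, 5, 6),
  stringsAsFactors = FALSE
)

# Convert the text to Date objects, stating the day/month/two-digit-year format.
DF$Date <- as.Date(DF$Date, format = "%d/%m/%y")

# cut() on Date values labels each row with the first day of its week;
# weeks start on Monday by default.
DF$Week <- as.Date(cut(DF$Date, "week"))

# Sum the daily visits within each week.
weekly <- aggregate(Frequency ~ Week, DF, sum)
```

Here 05/01/20 is a Sunday, so it falls in the week starting Monday 2019-12-30, while 06/01/20 and 12/01/20 share the week starting 2020-01-06.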

R - how to create lagged variables by id, day, assessment nr and specifying the interval

Maybe someone here can help me out!
What I need to do in R is:
create lags for multiple variables considering id, day and day_nr (I have multiple assessments for each participant on each day, and no lags should be created overnight, meaning the first assessment in the morning should not be lagged by the last observation of the previous day)
I tried several options, for example this, but didn't manage to include more than id:
library(data.table)
data[, lag.value:=c(NA, value[-.N]), by=id]
Furthermore, I have now included the specific time of day of each assessment, and lags should only be created for observations with an interval of less than 3 hours between them, as the number of assessments per day is irregular. Any idea how I could do this in R?
Thanks a lot!!
Tine
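A minimal base R sketch of the lag described above, under assumptions: grouping by id and day together stops any lag from crossing midnight, and a second step blanks lags where the gap to the previous assessment is 3 hours or more. The `time` column name and the sample values are hypothetical; `data`, `id`, `value` and `lag.value` follow the question.

```r
# Hypothetical sample data; `time` is an assumed name for the assessment timestamp.
data <- data.frame(
  id = c(1, 1, 1, 1, 2, 2),
  day = c(1, 1, 1, 2, 1, 1),
  time = as.POSIXct(c("2020-01-01 09:00", "2020-01-01 10:30",
                      "2020-01-01 15:00", "2020-01-02 09:00",
                      "2020-01-01 09:00", "2020-01-01 11:00"), tz = "UTC"),
  value = c(10, 11, 12, 13, 20, 21)
)

# Rows must be in time order within each participant and day.
data <- data[order(data$id, data$day, data$time), ]

# One group per participant per day, so no lag carries over from the previous day.
grp <- interaction(data$id, data$day, drop = TRUE)

# Shift a vector down by one position, padding with NA.
lag1 <- function(x) c(NA, x[-length(x)])

data$lag.value <- ave(data$value, grp, FUN = lag1)

# Hours since the previous assessment in the same group.
prev_time <- ave(as.numeric(data$time), grp, FUN = lag1)
gap_hours <- (as.numeric(data$time) - prev_time) / 3600

# Drop lags where the interval is 3 hours or more.
data$lag.value[!is.na(gap_hours) & gap_hours >= 3] <- NA
```

The same lag can be repeated for further variables by applying lag1 to each column with the same `grp`.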

Summarizing data depending on time and date

I have a workspace containing hourly weather measurements for the past year (temperature, CO2 and so on).
What I need to do is split the whole workspace by date (because I have several rows for 2009-01-01 etc.) and, in the next step, summarize the data for each day separately (I'm looking for a summary of every variable for every day).
I searched for a suitable function and found one that is almost right. Splitting by day works quite well, but the summary is really bad.
df <- data.frame(date=rep(seq.POSIXt(as.POSIXct("2009-01-01"), by="day", length.out=31), each=1))
summary(split(df, as.Date(df$date),AM19))
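Calling summary() on the result of split() summarizes the list itself rather than each day's data. One way to get a summary per day, sketched under assumptions, is to apply summary() to every element of the list with lapply(). The `temperature` and `co2` columns and their values are hypothetical.

```r
# Hypothetical hourly data covering two days; column names are assumptions.
df <- data.frame(
  date = seq(as.POSIXct("2009-01-01 00:00", tz = "UTC"),
             by = "hour", length.out = 48),
  temperature = rep(c(1, 3), each = 24),
  co2 = rep(c(400, 410), each = 24)
)

# Split the measurement columns by calendar day (UTC is stated explicitly
# so the date-time to date conversion does not shift any rows).
by_day <- split(df[c("temperature", "co2")], as.Date(df$date, tz = "UTC"))

# summary() applied to each day's data frame, one result per day.
daily_summary <- lapply(by_day, summary)

# A single statistic per day can be pulled out the same way.
daily_mean_temp <- sapply(by_day, function(d) mean(d$temperature))
```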

Compute average over sliding time interval (7 days ago/later) in R

I've seen a lot of solutions for working with groups of times or dates, like using aggregate to sum daily observations into weekly observations, or other solutions to compute a moving average, but I haven't found a way to do what I want, which is to pluck relative dates out of data keyed by an additional variable.
I have daily sales data for a bunch of stores. So that is a data.frame with columns
store_id date sales
It's nearly complete, but there are some missing data points, and those missing data points are having a strong effect on our models (I suspect). So I used expand.grid to make sure we have a row for every store and every date, but at this point the sales data for those missing data points are NAs. I've found solutions like
dframe[is.na(dframe)] <- 0
or
dframe$sales[is.na(dframe$sales)] <- mean(dframe$sales, na.rm = TRUE)
but I'm not happy with the RHS of either of those. I want to replace missing sales data with our best estimate, and the best estimate of sales for a given store on a given date is the average of the sales 7 days prior and 7 days later. E.g. for Sunday the 8th, the average of Sunday the 1st and Sunday the 15th, because sales is significantly dependent on day of the week.
So I guess I can use
dframe$sales[is.na(dframe$sales)] <- my_func(dframe)
where my_func(dframe) replaces every store's missing sales data with the average of that store's sales 7 days prior and 7 days later (ignoring, for the first go-around, the situation where one of those data points is also missing), but I have no idea how to write my_func in an efficient way.
How do I match up the store_id and the dates 7 days prior and future without using a terribly inefficient for loop? Preferably using only base R packages.
Something like:
with(
  dframe,
  ave(sales, store_id, FUN = function(x) {
    naw <- which(is.na(x))
    x[naw] <- rowMeans(cbind(x[naw + 7], x[naw - 7]))
    x
  })
)
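The idea above can be sketched with one extra guard: within the first or last 7 rows of a store, `naw - 7` or `naw + 7` falls outside the vector, so the version below treats an out-of-range neighbor as missing and averages whatever is available. It assumes rows are sorted by date within each store and that every store has a row for every date, as expand.grid guarantees. The helper name `impute_week_neighbors` and the sample data are hypothetical.

```r
# Hypothetical complete grid: 2 stores x 21 days, with two sales values missing.
dframe <- expand.grid(
  date = seq(as.Date("2015-01-01"), by = "day", length.out = 21),
  store_id = c("A", "B")
)
dframe$sales <- as.numeric(seq_len(nrow(dframe)))
dframe$sales[c(8, 30)] <- NA

# The positional +/- 7 lookup relies on date order within each store.
dframe <- dframe[order(dframe$store_id, dframe$date), ]

# Replace each NA with the mean of the values 7 positions before and after it.
impute_week_neighbors <- function(x) {
  n <- length(x)
  naw <- which(is.na(x))
  if (length(naw) == 0) return(x)
  # Neighbors outside the vector count as missing.
  before <- ifelse(naw - 7 >= 1, x[pmax(naw - 7, 1)], NA_real_)
  after <- ifelse(naw + 7 <= n, x[pmin(naw + 7, n)], NA_real_)
  # na.rm = TRUE uses the one available neighbor when the other is missing;
  # the result is NaN when both are missing.
  x[naw] <- rowMeans(cbind(before, after), na.rm = TRUE)
  x
}

# Apply the helper separately to each store's sales.
dframe$sales <- ave(dframe$sales, dframe$store_id, FUN = impute_week_neighbors)
```

Because ave() passes each store's sales to the helper as one vector, no explicit loop over stores or dates is needed, and only base R is used.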

Resources