I'm very new to R, so I'm sorry if some of the terminology I use is incorrect. I have a large .csv file of daily visits, with the date (in D/M/Y format) and the number of visits that day in separate columns. The dates run from 05/01/20 to 06/11/20. I've plotted a daily time series of this data, and now I'm trying to plot a weekly time series, with the daily visits summed into a weekly total for each week running Monday to Sunday. I have looked through other similar questions on this site and came across this code:
Week <- as.Date(cut(DF$Date, "week"))
aggregate(Frequency ~ Week, DF, sum)
However, I can't seem to get it to work. I would prefer to keep things as simple as possible. I also have the forecast and zoo packages installed, if that helps.
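For reference, here is a minimal sketch of how that approach can work end to end (assuming the CSV is read into a data frame DF with a Date column holding d/m/y text and a Frequency column holding the daily visits, as in the snippet above; the file name is made up):

DF <- read.csv("visits.csv", stringsAsFactors = FALSE)
DF$Date <- as.Date(DF$Date, format = "%d/%m/%y")   # parse the d/m/y text first; cut() needs real Dates
DF$Week <- as.Date(cut(DF$Date, "week"))           # cut(..., "week") bins each date by its week's Monday
weekly  <- aggregate(Frequency ~ Week, DF, sum)    # Monday-to-Sunday weekly totals
plot(weekly$Week, weekly$Frequency, type = "l")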
I have a dataset of 2015 with every day of the year. In this dataset, there are actions that happen on any given day. Some days have more actions than others, therefore some days have many more entries than others.
I am trying to create a function that will create an individual dataset per day of the year without having to code 365 of these:
df <- subset(dataset, date== "2015-01-01")
I have looked at dplyr's group_by(), but I do not want a summary per day; it is important that I can see the whole set of observations on any given day for graphing purposes.
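split() does exactly this without any loop: it returns a named list with one complete data frame per day. A minimal sketch, using the dataset and date names from the question:

daily_list <- split(dataset, dataset$date)    # one data frame per day, all columns kept
jan01 <- daily_list[["2015-01-01"]]           # the full set of observations for 1 Jan 2015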
I would like to calculate rolling yearly differences based on a daily time series in R with an xts object. However, I currently see two issues:
The number of trading days per year is not constant.
There could be holes in the time series, e.g. one year could be missing in-between.
Are there functions available in the library to take such rolling differences without a constant lag (e.g. a lag of 260 days could be off by 10 days sometimes)? Or would the correct approach be to search, for each date, for the same date one year earlier (minus one or two days to account for weekends)?
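I'm not aware of a ready-made xts function for a variable lag, but one sketch (using only xts/zoo, on a made-up example series) is to shift the observations forward by roughly a year, merge them back onto the original dates, and carry the last observation forward so each date is compared with the closest observation about one year earlier; the 365-day offset is an approximation that ignores leap years:

library(xts)

dates <- seq(as.Date("2015-01-01"), as.Date("2018-12-31"), by = "day")
dates <- dates[!format(dates, "%u") %in% c("6", "7")]          # made-up series on weekdays only
x <- xts(cumsum(rnorm(length(dates))), order.by = dates)
colnames(x) <- "value"

shifted <- x                                                    # same values, dated ~one year later
index(shifted) <- index(shifted) + 365
colnames(shifted) <- "year_ago"

m <- merge(x, shifted)
m$year_ago <- na.locf(m$year_ago, na.rm = FALSE)                # nearest observation on/before a year back
yearly_diff <- (m$value - m$year_ago)[index(x)]                 # differences on the original dates only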
I have daily rainfall data for 36 years. I want to analyze the time series, but my data is still a data frame; how do I convert the data frame into a time series? Also, the year, month and day are stored separately; how do I combine them so the date is in a single column?
You could use a time series package for that, such as fpp, i.e. install.packages('fpp'). Since you don't give example code, I can't really help you with the specifics, but it's quite easy.
ts(your_data, start = , frequency = ). At start = you put the year (and period) where the series starts, and at frequency = you put the number of observations per seasonal cycle, e.g. 365 for daily data with a yearly cycle (not the number of years).
You might want to check out https://robjhyndman.com/. He has an online (free) book available that walks you through the use of his package as well as providing useful information with respect to time series analysis.
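If the year, month and day are in separate columns, you can paste them together into one Date column before building the series. A small sketch (the column names year, month, day and rain are assumptions about your data):

df$date <- as.Date(paste(df$year, df$month, df$day, sep = "-"))    # single date column
rain_ts <- ts(df$rain, start = c(df$year[1], 1), frequency = 365)  # ~365 observations per yearly cycle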
Hope this helps.
I have a workspace with hourly weather measurements for the past year (temperature, CO2 and so on).
What I need to do is split the whole data set by date (because I have several rows for 2009-01-01, etc.) and then summarize the data for each day separately (I'm looking for a summary of every variable for every day).
I was searching for some kind of function and have one which is almost right: splitting by day works well, but the summary is really bad.
df <- data.frame(date=rep(seq.POSIXt(as.POSIXct("2009-01-01"), by="day", length.out=31), each=1))
summary(split(df, as.Date(df$date)))
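A sketch of the usual pattern: split once by calendar day, then lapply summary() over the pieces, which gives a per-variable summary for every day (the temp and co2 columns are made up):

df <- data.frame(date = seq.POSIXt(as.POSIXct("2009-01-01"), by = "hour", length.out = 31 * 24),
                 temp = rnorm(31 * 24, 10), co2 = rnorm(31 * 24, 400))
daily <- split(df, as.Date(df$date))        # one data frame per day
daily_summaries <- lapply(daily, summary)   # summary of every variable, day by day
daily_summaries[["2009-01-15"]]             # look at a single day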
I've seen a lot of solutions for working with groups of times or dates, like aggregate to sum daily observations into weekly observations, or solutions to compute a moving average, but I haven't found a way to do what I want, which is to pluck relative dates out of data keyed by an additional variable.
I have daily sales data for a bunch of stores. So that is a data.frame with columns
store_id date sales
It's nearly complete, but there are some missing data points, and those missing data points are having a strong effect on our models (I suspect). So I used expand.grid to make sure we have a row for every store and every date, but at this point the sales data for those missing data points are NAs. I've found solutions like
dframe[is.na(dframe)] <- 0
or
dframe$sales[is.na(dframe$sales)] <- mean(dframe$sales, na.rm = TRUE)
but I'm not happy with the RHS of either of those. I want to replace missing sales data with our best estimate, and the best estimate of sales for a given store on a given date is the average of the sales 7 days prior and 7 days later. E.g. for Sunday the 8th, the average of Sunday the 1st and Sunday the 15th, because sales is significantly dependent on day of the week.
So I guess I can use
dframe$sales[is.na(dframe$sales)] <- my_func(dframe)
where my_func(dframe) replaces every store's sales data with the average of the store's sales 7 days prior and 7 days later (ignoring, for the first go-around, the situation where one of those data points is also missing), but I have no idea how to write my_func in an efficient way.
How do I match up the store_id and the dates 7 days prior and future without using a terribly inefficient for loop? Preferably using only base R packages.
Something like:
dframe$sales <- with(
  dframe,   # assumes rows are sorted by date within each store, one row per store per day
  ave(sales, store_id, FUN = function(x) {
    naw <- which(is.na(x))
    # safe lookup: returns NA when the index falls outside the series instead of erroring
    at <- function(i) ifelse(i >= 1 & i <= length(x), x[pmin(pmax(i, 1), length(x))], NA)
    # fill each gap with the mean of the values 7 days before and after; na.rm drops a missing neighbour
    x[naw] <- rowMeans(cbind(at(naw - 7), at(naw + 7)), na.rm = TRUE)
    x
  })
)