I'm doing a forecast in R using the forecast package.
I have a time series with daily data (download the .CSV here):
library(forecast)
data <- read.csv('daily_electricity.csv')
time_series <- ts(data$value, start=c(2007,1,1), frequency=365.25)
fit <- stlf(time_series) # uses STL decomposition
plot(fit)
forecast(fit, h=365)
But when I issue the last forecast command to get predictions for the next 365 days, the output not only appears to skip days, but the time index is shown as decimal years instead of regular dates:
2012.687 2480489
2012.689 2411931
2012.692 2582997
2012.695 2190245
2012.697 2603242
2012.700 2413211
How can I get forecasts for the next 365 days, with each value formatted with the correct date, with no missing days?
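The first thing you have to do is convert the time index (which comes through as row names, not as a column) to an actual column. The forecast object has to be converted to a data.frame first:
library(data.table)
fc <- as.data.table(as.data.frame(fit), keep.rownames = TRUE)
Then you can use date_decimal() from the lubridate package to convert the decimal years in the new rn column to proper dates:
library(lubridate)
fc$dates <- as.Date(date_decimal(as.numeric(fc$rn)))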
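An alternative that avoids decimal years altogether is to build the date index yourself. This is only a minimal sketch: the last observed date is a hypothetical value, and the placeholder numbers stand in for the real point forecasts (with the forecast package these would be as.numeric(fit$mean)).

```r
# Hypothetical last observation date; in practice take it from the CSV.
last_obs <- as.Date("2012-09-07")
h <- 365

# Placeholder point forecasts; with the forecast package these would be
# as.numeric(fit$mean) from a forecast made with h = 365.
point_forecast <- rep(2400000, h)

# One calendar date per forecast step, starting the day after the data end.
fc_dates <- seq(last_obs + 1, by = "day", length.out = h)

fc_table <- data.frame(date = fc_dates, forecast = point_forecast)
head(fc_table)
```

Because the dates come from seq() rather than from the ts index, every day is present exactly once, regardless of the 365.25 frequency.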
Good day,
I am building an auto.arima forecast in R. I was able to complete the forecast successfully, however the results do not display the date.
(Screenshots of the forecast result, the plot, and the data are not reproduced here.)
If you look at the x-axis of the plot, it displays the years as periods. I would like to be able to export this data with actual dates.
I use
library("tseries")
library("forecast")
library("xts")
The code:
Pulsedata$date <- as.Date(Pulsedata$date,format = "%d-%b-%y")
PD_ts <- msts(Pulsedata$Call_volume, start = c(2016, 01), end = c(2018,
365), seasonal.periods=c(365))
DPD_ts <- decompose(PD_ts, "multiplicative")
AA <- auto.arima(ts(PD_ts,frequency=365),D=1)
Myforecast <- forecast(AA,h=365)
plot(Myforecast)
I have tried:
Anydate
sweep
as.date
lubridate
setDT
I have a CSV file containing data as follows-
date, group, integer_value
The date starts from 01-January-2013 to 31-October-2015 for the 20 groups contained in the data.
I want to create a time series for each of the 20 groups. But the dates are not continuous and have sporadic gaps, so
group4series <- ts(group4, frequency = 365.25, start = c(2013,1,1))
works from a programming point of view but is not correct because of the gaps in the data.
How can I use the 'date' column of the data to create the time series instead of the usual 'frequency' parameter of 'ts()' function?
Thanks!
You could use zoo::zoo instead of ts.
Since you don't provide sample data, let's generate daily data, and remove some days to introduce "gaps".
set.seed(2018)
dates <- seq(as.Date("2015/12/01"), as.Date("2016/07/01"), by = "1 day")
dates <- sort(dates[sample(length(dates), 100)]) # keep 100 random days, in date order
We construct a sample data.frame
df <- data.frame(
dates = dates,
val = cumsum(runif(length(dates))))
To turn df into a zoo timeseries, you can do the following
library(zoo)
ts <- with(df, zoo(val, dates))
Let's plot the timeseries
plot.zoo(ts)
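If a later step needs a regular daily grid, the gaps can also be made explicit as NA. A sketch in base R, using a small made-up data frame rather than the df above:

```r
# Made-up irregular daily data with gaps (illustrative values only).
obs <- data.frame(
  dates = as.Date(c("2016-01-01", "2016-01-02", "2016-01-05", "2016-01-07")),
  val   = c(1.0, 1.5, 2.5, 4.0)
)

# Full daily calendar spanning the observed range.
full <- data.frame(dates = seq(min(obs$dates), max(obs$dates), by = "day"))

# Left join: missing days get NA instead of being silently skipped.
regular <- merge(full, obs, by = "dates", all.x = TRUE)
regular
```

Whether to leave the NA values, interpolate them, or model the irregular series directly depends on the data; this only shows where the gaps are.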
I have a CSV file with the format
ref_date;wings;airfoil;turbines
2015-03-31;123,22;22,77;99,0
2015-04-30;123,22;28,77;99,0
2015-05-31;123,22;22,177;02,0
2015-06-30;56,288;22,77;99,0
and I want to use the forecast package to predict the next values of this time series. The forecast package only accepts a ts object, but so far all my attempts to create one failed. I tried to
1) Use zoo package
df = read.zoo(data_file, sep=';', dec=',', format="%Y-%m-%d", header=T)
but the data is truncated at the decimal point.
2) Use the zoo package with xts
df = read.zoo(datafile, sep=';', dec=',', format="%Y-%m-%d", header=T)
df_ts = ts(df)
The dates are nowhere to be seen, the index is just a sequence of numbers, like
1 123.22 22.77 99
3) Use read.csv and ts
df = read.zoo(datafile, sep=';', dec=',', format="%Y-%m-%d", header=T)
df_ts = ts(df)
4) Try using xts
df = read.csv(data_file, sep=';', header=T, dec=',')
tt = as.xts(df[,-1],order.by = as.Date(as.character(df[,1]), format = "%Y-%m-%d"))
forecast(tt)
Error in `tsp<-`(`*tmp*`, value = tsp.y) :
invalid time series parameters specified
The result loses all information about the date, including the ref_date column, and now the forecast package gives nonsense as a result.
What is the correct approach to create the object that the forecast package expects, so that it can generate a forecast while keeping the dates, including in the plots?
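I have been wrestling CSV data into zoo/xts objects and sympathize; it is painful.
I suggest using as_xts() from the tidyquant package. Because the file is semicolon-delimited with decimal commas, read it with read_csv2() from readr rather than read_csv():
as_xts(read_csv2(file), ref_date)
You may need to coerce the resulting coredata() in the xts object back to numeric.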
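For month-end data like the sample in the question, another option is a plain ts with frequency = 12, which the forecast package accepts. This is a sketch that assumes the series is strictly monthly with no gaps; the inline text stands in for the file.

```r
# The four sample rows from the question, inlined in place of the file.
csv_text <- "ref_date;wings;airfoil;turbines
2015-03-31;123,22;22,77;99,0
2015-04-30;123,22;28,77;99,0
2015-05-31;123,22;22,177;02,0
2015-06-30;56,288;22,77;99,0"

# read.csv2() defaults to sep = ";" and dec = ",".
df <- read.csv2(text = csv_text, stringsAsFactors = FALSE)
df$ref_date <- as.Date(df$ref_date)

# Start at the year and month of the first row; monthly data has frequency 12.
first <- as.POSIXlt(df$ref_date[1])
wings_ts <- ts(df$wings, start = c(first$year + 1900, first$mon + 1), frequency = 12)
wings_ts
```

Printing or plotting wings_ts then shows calendar months on the time axis. Four observations are of course far too few for a meaningful forecast; the sample is only there to show the conversion.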
I have one netCDF file (.nc) with 16 years (1998-2014) worth of daily precipitation (5844 layers). The 3 dimensions are time (size 5844), latitude (size 19) and longitude (size 20).
Is there a straightforward approach in R to compute for each rastercell:
Monthly & yearly average
A cumulative comparison (e.g. Jan-Mar compared to the average of all Jan-Mar periods)
So far I have:
library(ncdf4)
library(raster)
Rname <- 'F:/extracted_rain.nc'
rainfall <- nc_open(Rname)
readRainfall <- ncvar_get(rainfall, "rain") #"rain" is float name
raster_rainfall <- raster(Rname, varname = "rain") # also tried brick()
asdatadates <- as.Date(rainfall$dim$time$vals/24, origin='1998-01-01') #The time interval is per 24 hours
My first challenge will be the computation of monthly averages for each raster cell. I'm not sure how best to proceed while keeping the ultimate goal (cumulative comparison) in mind. How can I easily access only days from a certain month?
raster(readRainfall[,,500]) # doesn't seem like a straightforward approach
Hopefully I made my question clear, a first push in the right direction would be much appreciated.
Sample data here
The question asked for a solution in R, but in case anyone is looking to do this task and wants a simple alternative command-line solution, these kinds of statistics are the bread and butter of CDO.
Monthly averages:
cdo monmean in.nc monmean.nc
Annual averages:
cdo yearmean in.nc yearmean.nc
Make the average of all the Jan, Feb etc:
cdo ymonmean in.nc ymonmean.nc
The monthly anomaly relative to the long-term annual cycle (use ymonsub rather than sub, since the two files have different numbers of time steps):
cdo ymonsub monmean.nc ymonmean.nc monanom.nc
If you want a specific month, just select it with selmon or seldate.
You can call these commands from R using the system() function.
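Calling CDO from R might look like the sketch below. It uses system2(), a variant of system() that takes the arguments as a vector; the file names are the placeholder names from above, and the call is skipped if cdo is not installed.

```r
# Placeholder file names; cdo must be installed and on the PATH.
cdo_args <- c("monmean", "in.nc", "monmean.nc")

if (nzchar(Sys.which("cdo"))) {
  # system2() returns the exit status of the command; 0 means success.
  status <- system2("cdo", args = cdo_args)
} else {
  message("cdo not found on PATH; skipping")
}
```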
Here is one approach using the zoo-package:
### first read the data
library(ncdf4)
library(raster)
library(zoo)
### use stack() instead of raster
stack_rainfall <- stack(Rname, varname = "rain")
### i renamed your "asdatadates" object for simplicity
dates <- as.Date(rainfall$dim$time$vals/24, origin='1998-01-01')
In your example dataset you only have 18 layers, all coming from January 1998. However, the following should also work with more layers (months).
First, we will build a function that operates on one vector of values (i.e. a pixel time series): it converts the input to a zoo object using dates and then calculates the monthly mean using aggregate. The function returns a vector whose length equals the number of months in dates.
monthly_mean_stack <- function(x) {
require(zoo)
pixel.ts <- zoo(x, dates)
out <- as.numeric(aggregate(pixel.ts, as.yearmon, mean, na.rm=TRUE))
out[is.nan(out)] <- NA
return(out)
}
Then, depending on whether you want the output to be a vector / matrix / data frame or want to stay in the raster format, you can either apply the function over the cell values after retrieving them with getValues, or use the calc function from the raster package to create a raster output (this will be a raster stack with as many layers as there are months in your data):
v <- getValues(stack_rainfall) # every row displays one pixel (-time series)
# this should give you a matrix with ncol = number of months and nrow = number of pixel
means_matrix <- t(apply(v, 1, monthly_mean_stack))
means_stack <- calc(stack_rainfall, monthly_mean_stack)
When you're working with large raster datasets you can also apply your functions in parallel using the clusterR function. See ?clusterR
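The same per-pixel aggregation, extended to the comparison against the long-term monthly mean, can be sketched without raster or zoo on a plain matrix with one row per pixel and one column per day. The values and dates below are simulated, not taken from the rainfall file, and the long-term mean is computed as the mean of the monthly means.

```r
set.seed(1)
# Simulated stand-in: 3 pixels, daily values for Jan-Feb in two years.
dates <- c(seq(as.Date("1998-01-01"), as.Date("1998-02-28"), by = "day"),
           seq(as.Date("1999-01-01"), as.Date("1999-02-28"), by = "day"))
v <- matrix(runif(3 * length(dates)), nrow = 3)

# Group label per day: "1998-01", "1998-02", ...
year_month <- format(dates, "%Y-%m")

# Mean per pixel (row) within each year-month; result is pixels x months.
monthly <- t(apply(v, 1, function(x) tapply(x, year_month, mean, na.rm = TRUE)))

# Long-term mean per calendar month (all Januaries, all Februaries).
month_of <- substr(colnames(monthly), 6, 7)
clim <- t(apply(monthly, 1, function(x) tapply(x, month_of, mean)))

# Anomaly: each monthly mean minus its calendar-month long-term mean.
anom <- monthly - clim[, month_of]
dim(monthly)
```

With real data, v would come from getValues(stack_rainfall) and dates from the time dimension, as above.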
I think it is easiest to convert to a raster brick and then into a data.frame.
You can then pull stats quite easily with general code such as DF$weeklymean <- rowMeans(DF[, ])
I am looking to loop over my R data frame, which is in year-quarter format, and run a rolling regression across every year-quarter. I then use the coefficients from this model to fit values that are 1 quarter ahead. I would like to use a quarterly date format in R.
I had a similar issue in Stata (see the Stata question "Stata year-quarter for loop"), but I am revisiting it in R. Does R have a notion of year-quarters that can easily be used in a loop? For example, one possibly roundabout way is
months.list <- c("03","06","09","12")
years.list <- c(1992:2007)
## Loop over the month and years
for(yidx in years.list)
{
for(midx in months.list)
{
}
}
I see that the zoo package has some functions, but I am not sure which one fits my case. Something along the following lines would be ideal:
for (yqidx in 1992Q1:2007Q4){
z <- lm(y ~ x, data = mydata <= yqidx )
}
When I do the look-ahead, I need to handle it so that the predicted value is computed for the next quarter, that is yqidx + 1, so that 2000Q4 moves to 2001Q1.
If all you need help on is how to generate quarters,
require(data.table)
require(zoo)
months.list <- c("03","06","09","12")
years.list <- c(1992:2007)
#The next line of code generates all the month-year combinations.
df<-expand.grid(year=years.list,month=months.list)
#Then, we paste together the year and month with a day so that we get dates like "2007-03-01". Pass that to as.Date, and pass the result to as.yearqtr.
df$Date=as.yearqtr(as.Date(paste0(df$year,"-",df$month,"-01")))
df<-df[order(df$Date),]
Then you can use loops if you'd like. I'd personally consider using data.table like so:
require(data.table)
require(zoo)
DT<-data.table(expand.grid(year=years.list,month=months.list))
DT<-DT[order(year,month)]
DT[,Date:=as.yearqtr(as.Date(paste0(year,"-",month,"-01")))]
#Generate fake x values.
DT[,X:=rnorm(64)]
#Generate time index.
DT[,t:=1:64]
#Generate fake y values.
DT[,Y:=X+rnorm(64)+t]
#Get rid of the year and month columns -unneeded.
DT[,c("year","month"):=NULL]
#Create a second data.table to hold all your models.
Models<-data.table(Date=DT$Date,Index=1:64)
#Generate your (rolling) models. I am assuming you want to use all past observations in each model.
Models[,Model:=list(list(lm(data=DT[1:Index],Y~X+t))),by=Index]
#You can access an individual model thusly:
Models[5,Model]
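The look-ahead step from the question can be sketched in base R: fit on all quarters up to i, then predict quarter i + 1. The data below is simulated, the quarter labels are plain strings, and the minimum window of 8 quarters is an arbitrary assumption.

```r
set.seed(42)
# Quarter labels 1992 Q1 ... 2007 Q4 in chronological order.
quarters <- paste0(rep(1992:2007, each = 4), " Q", 1:4)
n <- length(quarters)                      # 64 quarters

# Simulated data standing in for the real y and x.
mydata <- data.frame(quarter = quarters, x = rnorm(n))
mydata$y <- 2 * mydata$x + rnorm(n)

min_window <- 8                            # assumed minimum sample size
pred <- rep(NA_real_, n)

for (i in min_window:(n - 1)) {
  # Fit on every quarter up to and including quarter i.
  fit <- lm(y ~ x, data = mydata[1:i, ])
  # Predict the next quarter (i + 1) from its own x value.
  pred[i + 1] <- predict(fit, newdata = mydata[i + 1, , drop = FALSE])
}

head(data.frame(quarter = quarters, actual = mydata$y, predicted = pred), 12)
```

Because the rows are in chronological order, moving from i to i + 1 rolls over year boundaries automatically (2000 Q4 is followed by 2001 Q1). With zoo, a yearqtr value can be advanced the same way by adding 1/4.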