I have data on player counts over time, spanning several years, with some player counts missing. I'm trying to fill in (predict) the missing counts over the different intervals.
Data available here: https://1drv.ms/u/s!AvEZ_QPY7OZuhJAlKJN89rH185SUhA
I'm following the instructions below, which use KalmanRun to impute the missing values. I've tried 3 different approaches to transforming the data: using an xts object, and 2 approaches to converting it into time series data.
https://stats.stackexchange.com/questions/104565/how-to-use-auto-arima-to-impute-missing-values
require(forecast)
library(xts)
library(anytime)
library(DescTools)
df_temp = read.csv("r_share.csv")
df_temp[['DateTime']] <- as.Date(strptime(df_temp[['DateTime']], format='%Y-%m-%d %H:%M:%S'))
Three approaches to converting the data; xts seems to work best, returning non-zero, interpretable data.
#Convert df_temp to TimeSeries object
df_temp = xts(df_temp$Players, df_temp$DateTime)
#df_temp = as.ts(log(df_temp$Players), start = start(df_temp$DateTime), end = end(df_temp$DateTime), frequency = 365)
#df_temp = ts(df_temp$Players, start = c(2013, 02, 02), end = c(2016, 01, 31), frequency = 365)
Fitting and plotting:
fit <- auto.arima(df_temp, seasonal = TRUE)
id.na <- which(is.na(df_temp))
kr <- KalmanRun(index(df_temp), fit$model, update = FALSE)
#?KalmanRun$tol
for (i in id.na)
df_temp[i] <- fit$model$Z %*% kr$states[i,]
plot(df_temp)
The expected output is data that mimics the variability seen in the actual data and is different for each interval, whereas the actual output is relatively stationary and unchanging (both intervals have nearly the same prediction).
Does it need to be an arima() model?
You could try another model, Prophet, developed by Facebook.
Here you can find the guide and GitHub page.
If I understood correctly, you want something like this:
# Import library
library(prophet)
# Read data
df = read.csv("C:/Users/Downloads/r_share.csv",sep = ";")
# Transform to date
df["DateTime"] = as.Date(df$DateTime,format = "%d/%m/%Y")
# Change names for the model
colnames(df) = c("ds","y")
# call model
m = prophet(df)
# make "future" just one day greater than past
future = make_future_dataframe(m,periods = 1)
# predict the points
forecast = predict(m,future)
# plot results
plot(m,forecast)
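For comparison with the Kalman approach in the question, here is a minimal sketch of state-space imputation using only base R's stats package. It assumes a synthetic random-walk series and a hand-picked ARIMA(1,1,0) model (both are illustrative assumptions, not taken from the question's data). Note that it passes the series itself, rather than its time index, to the Kalman routine; passing the index may be part of the problem in the question's call. A model with differencing and no mean term is used so that no intercept has to be added back by hand.

```r
# Synthetic series with two gaps (illustrative data only)
set.seed(1)
y <- cumsum(rnorm(200))
id_na <- c(50:60, 120:135)
y[id_na] <- NA

# Fit an ARIMA model; stats::arima() handles NAs through its state-space form
fit <- arima(y, order = c(1, 1, 0), method = "ML")

# Kalman smoother: the first argument is the series itself, not its index
ks <- KalmanSmooth(y, fit$model)

# The smoothed observation at time i is Z' * state_i; fill only the gaps
filled <- y
filled[id_na] <- as.numeric(ks$smooth[id_na, , drop = FALSE] %*% fit$model$Z)
```

The filled values are conditional means, so they will generally look smoother than the observed data. Reproducing the observed variability would require simulating from the fitted model instead.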
I have a question about the forecast() function from the forecast package.
I am using this function to forecast the closing price of a stock with an ARIMAX model (with xreg). My question is: when forecasting, does the closing price at time t depend on the external regressors at time t-1, or on the external regressors at time t?
In other words, today I don't yet know the high price (for example), so today's closing price cannot depend on today's high price, only on yesterday's.
Does the function work like that, or in a different way?
I hope I have been clear. Thanks!
Yes, you can set up the function to work like this, though there are some steps to take:
1. Lag the regressor, since you want yesterday's value to explain today's.
2. Drop the observations without a regressor value (the first value of the time series has no lagged regressor, because its own regressor value is used for the second observation).
3. Build the regressor for prediction.
4. Fit the model and predict.
Below I put something together from a few links that shows how it can be done, and thus should explain how prediction with a regressor works with forecast in your case:
library(quantmod)
library(forecast)
library(dplyr)
# get some finance data to play with
quantmod::getSymbols("AAPL", from = '2017-01-01',
to = "2018-03-01",warnings = FALSE,
auto.assign = TRUE)
# I prefer working with df and then convert to ts objects later
new_AAPL <- as.data.frame(AAPL)%>%
# select close values and lag high values
dplyr::transmute(AAPL.Close,
AAPL.High = lag(AAPL.High)) %>%
# keep only complete values
dplyr::filter(dplyr::if_all(dplyr::everything(), ~ !is.na(.x)))
# set up new time series, regressor (watch the starting points)
AAPL.Close <- ts(new_AAPL$AAPL.Close, start = as.Date("2017-01-04"), frequency = 365)
AAPL.High <- ts(new_AAPL$AAPL.High, start = as.Date("2017-01-04"), frequency = 365)
# set up the future regressor (last value of the original high values)
AAPL.futureg <- ts(as.data.frame(AAPL)$AAPL.High[291], start = as.Date("2018-03-02"), frequency = 365)
# I will use an ARIMA model here
modArima <- forecast::auto.arima(AAPL.Close, xreg=AAPL.High)
# forecast with regressor
forecast::forecast(modArima, h = 1, xreg = AAPL.futureg)
Here is where I got the information from:
https://www.codingfinance.com/post/2018-03-27-download-price/
https://stats.stackexchange.com/questions/41070/how-to-setup-xreg-argument-in-auto-arima-in-r
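The lag-then-forecast steps above can also be sketched without downloading any data, using only base R's stats package. This is a hedged illustration on synthetic data: the series `x` and `y`, the coefficient 0.8, and the AR(1) order are all assumptions made for the example, and `arima()`/`predict()` stand in for `auto.arima()`/`forecast()`.

```r
set.seed(42)
n <- 120
x <- 100 + cumsum(rnorm(n))            # regressor, e.g. a daily high (synthetic)
# The response at time t is driven by the regressor at time t-1
y <- c(NA, 0.8 * x[-n]) + rnorm(n, sd = 0.5)

# Lag the regressor and drop the first row, which has no lagged value
x_lag <- c(NA, x[-n])
keep <- !is.na(x_lag)
fit <- arima(y[keep], order = c(1, 0, 0), xreg = x_lag[keep], method = "ML")

# One-step-ahead forecast: the "future" regressor is the last observed x
fc <- predict(fit, n.ahead = 1, newxreg = matrix(x[n], nrow = 1))
fc$pred
```

The key point is the alignment: the model is fitted on pairs (y at t, x at t-1), so the value supplied for the forecast at t+1 is the regressor observed at t, which is already known.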
First of all, I have already consulted this article and this one, but couldn't get it to work.
I have daily data starting from 28-03-2015 till 27-02-2017.
My TS object looks like this:
bvg11_prod_ts <- ts(bvg11_data$MA_PROD, freq=365, start=c(2015, 87), end=c(2017, 58))
The graph below shows the daily values:
autoplot(bvg11_prod_ts)
I have also tried creating the daily ts object by:
bvg11_prod_ts <- ts(bvg11_data$MA_PROD, freq=7, start=c(2015, 3), end=c(2017, 02))
autoplot(bvg11_prod_ts)
which results in this plot:
As you can see, the two graphs are completely different; the first one is more accurate.
Now when I try to run bvg11_prodsTSHoltWinter <- HoltWinters(bvg11_prod_ts), it gives this error:
Error in decompose(ts(x[1L:wind], start = start(x), frequency = f), seasonal) : time series has no or less than 2 periods
What is wrong?
The error message is pretty clear: with a frequency of 365 you'll need at least 2*365 = 730 data points.
x_err = ts(runif(729), freq = 365)
# this gives an error
fit = HoltWinters(x_err)
# this will work
x = ts(runif(730), freq = 365)
fit = HoltWinters(x)
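To check this against the date range given in the question (28-03-2015 to 27-02-2017), a small sketch in base R can count the daily observations and compare them with the two-period minimum. It assumes one observation per calendar day with no gaps, which the question does not state explicitly.

```r
first_day <- as.Date("2015-03-28")
last_day  <- as.Date("2017-02-27")

# Inclusive number of calendar days in the range
n_obs <- as.integer(last_day - first_day) + 1L
freq <- 365

n_obs                          # 703
enough <- n_obs >= 2 * freq    # FALSE: fewer than the 730 points needed
enough
```

Under that assumption the series is 27 days short of two full yearly periods, which is consistent with the error.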
I have a problem with performing statistical analyses of longitudinal data after imputing missing values with mice. After imputing the missing values in the wide data format, I convert the extracted data to the long format. Because the data are longitudinal, participants have duplicate rows (3 timepoints), and this causes problems when converting the long-format data set into a mids object.
Does anyone know how to create a mids object, or something else appropriate, after the imputation? I want to use lmer or lme to obtain pooled fixed effects afterwards.
I tried a lot of different things, but still can't figure it out.
Thanks in advance; see the code below:
# minimal reproducible example
## Make up some data
set.seed(2)
# ID Variable, Group, 3 Timepoints outcome measure (X1-X3)
Data <- data.frame(
ID = sort(sample(1:100)),
GROUP = sample(c(0, 1), 100, replace = TRUE),
matrix(sample(c(1:5,NA), 300, replace=T), ncol=3)
)
# install.packages("mice")
library(mice)
# Impute the data in wide format
m.out <- mice(Data, maxit = 5, m = 2, seed = 9, pred=quickpred(Data, mincor = 0.0, exclude = c("ID","GROUP"))) # ignore group here for easiness
# mids object?
is.mids(m.out) # TRUE
# Extract imputed data
imp_data <- complete(m.out, action = "long", include = TRUE)[, -2]
# Converting data into long format
# install.packages("reshape")
library(reshape)
imp_long <- melt(imp_data, id=c(".imp","ID","GROUP"))
# sort data
imp_long <- imp_long[order(imp_long$.imp, imp_long$ID, imp_long$GROUP),]
row.names(imp_long)<-NULL
# save as.mids
as.mids(imp_long,.imp=1, .id=2) # doesnt work
as.mids(imp_long) # doesnt work
Best,
Julian
I hope I can answer your question with this small example. I don't really see why conversion back to the mids class is necessary. Usually when I use mice, I convert the imputed data to a list of completed datasets, then analyse that list using lapply.
library(mice)
library(reshape)
library(lme4)
Data <- data.frame(
ID = sort(sample(1:100)),
GROUP = sample(c(0, 1), 100, replace = TRUE),
matrix(sample(c(1:5,NA), 300, replace=T), ncol=3)
)
# impute
m.out <- mice(Data, pred=quickpred(Data, mincor=0, exclude=c("ID","GROUP")))
# complete
imp.data <- as.list(1:5)
for(i in 1:5){
imp.data[[i]] <- complete(m.out, action=i)
}
# reshape
imp.data <- lapply(imp.data, melt, id=c("ID","GROUP"))
# analyse
imp.fit <- lapply(imp.data, FUN=function(x){
lmer(value ~ as.numeric(variable)+(1|ID), data=x)
})
imp.res <- sapply(imp.fit, fixef)
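The code above stops at a matrix of per-imputation fixed effects. One way to combine them is Rubin's rules, sketched below in base R. The helper `pool_rubin()` and the numbers fed to it are hypothetical and purely illustrative; with the fits above, the estimates could be `imp.res` and the variances could presumably be obtained with something like `sapply(imp.fit, function(f) diag(as.matrix(vcov(f))))`.

```r
# Hypothetical helper: pool estimates across imputations with Rubin's rules.
# est, u: matrices with one row per coefficient and one column per imputation
# (u holds the squared standard errors).
pool_rubin <- function(est, u) {
  m <- ncol(est)
  q_bar <- rowMeans(est)              # pooled estimate
  u_bar <- rowMeans(u)                # within-imputation variance
  b <- apply(est, 1, var)             # between-imputation variance
  t_var <- u_bar + (1 + 1 / m) * b    # total variance
  data.frame(estimate = q_bar, se = sqrt(t_var))
}

# Illustrative numbers only (2 coefficients, 5 imputations)
est <- rbind(intercept = c(3.01, 2.95, 3.10, 2.98, 3.05),
             time      = c(0.02, 0.05, -0.01, 0.03, 0.01))
u <- rbind(intercept = rep(0.04, 5),
           time      = rep(0.01, 5))
pooled <- pool_rubin(est, u)
pooled
```

The pooled standard error is never smaller than the average within-imputation one, because the between-imputation spread is added on top.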
Keep in mind, however, that single-level imputation is not a good idea when you're interested in relationships of variables that vary at different levels.
For these tasks you should use procedures that maintain the two-level variation and do not suppress it as mice does in this configuration.
There are workarounds for mice, but for example Mplus and the pan package in R are specifically designed for two-level MI.
Not sure how relevant my answer is, since you asked the question a long time ago, but in any case: in this slide deck, toward the end, on the slide titled "Method POST", the author uses the function long2mids():
imp1 <- mice(boys)
long <- complete(imp1, "long", inc = TRUE)
long$whr <- with(long, wgt / (hgt / 100))
imp2 <- long2mids(long)
However, long2mids() has been deprecated in favor of as.mids() since version 2.22.
as.mids() from the miceadds package will work here
I am new to R and slowly getting acquainted. My question refers to the following piece of code.
I am creating a zoo object with the following headers and then filtering by date. On the filtered dates I am subtracting two columns (Tom from Elena). Everything works fine until here.
Code below:
b <- read.zoo(b1, header = TRUE, index.column = 1, format = "%d/%m/%Y")
startDate = "2013/11/02"
endDate = "2013/12/20"
dates <- seq(as.Date(startDate), as.Date(endDate), by=1)
TE = b[dates]$Tom - b[dates]$Elena
I then regress the result of the subtraction (TE above) on Elena. However, I get an error message every time I try to run this regression:
TE$model <- lm(TE ~ b[dates]$Elena)
Error in $<-.zoo(*tmp*, "model", value = list(coefficients = c(-0.0597128230859905, :
not possible for univariate zoo series
I have tried creating a data frame and then doing the regression, but to no avail. Any help would be appreciated. Thanks.
You cannot add the outcome of a regression (a list of class lm) to a time series of class zoo.
I recommend saving the model in a separate object, e.g.,
fit <- lm(TE ~ b[dates]$Elena)
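As a small self-contained illustration of why the assignment fails, the sketch below fits the same kind of regression on a synthetic data frame (the data and the column values are made up for the example) and inspects what lm() returns.

```r
set.seed(3)
# Synthetic stand-ins for the two columns in the question
d <- data.frame(Tom = rnorm(30, mean = 10), Elena = rnorm(30, mean = 10))
d$TE <- d$Tom - d$Elena

fit <- lm(TE ~ Elena, data = d)   # the model lives in its own object
class(fit)                        # "lm"
is.list(fit)                      # TRUE
coef(fit)                         # intercept and slope for Elena
```

Because the fitted model is a list with many components rather than a numeric vector, it cannot be stored as a column of a univariate series; individual results such as `coef(fit)` or `residuals(fit)` can be extracted from it afterwards.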
I have hourly snapshots of an event from 2012-05-15-0700 to 2013-05-17-1800. How can I create a time series from this data and apply HoltWinters to it?
I tried the following
EventData<-ts(Eventmatrix$X20030,start=c(2012,5,15),frequency=8000)
HoltWinters(EventData)
But I got the following error:
Error in decompose(ts(x[1L:wind], start = start(x), frequency = f), seasonal) : time series has no or less than 2 periods
What value should I use for frequency?
I think you should consider using ets from the forecast package to perform exponential smoothing. Read this post for a comparison between HoltWinters and ets.
require(xts)
require(forecast)
time_index <- seq(from = as.POSIXct("2012-05-15 07:00"),
to = as.POSIXct("2012-05-17 18:00"), by = "hour")
set.seed(1)
value <- rnorm(n = length(time_index))
eventdata <- xts(value, order.by = time_index)
ets(eventdata)
Now, if you want to know more about the syntax of ets, check the help page of this function and Rob Hyndman's online book (chap. 7, section 6).
Please take a look at the following post which might answer the question:
Decompose xts hourly time series
It explains how you can create an xts object using POSIXct objects. This xts object can have its frequency attribute set manually, and you will probably then be able to use HoltWinters.
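On the question of which frequency to use, here is a minimal sketch in base R. It assumes that the dominant seasonal pattern in the hourly data is daily, so one period is 24 observations; that is an assumption about the data, and a weekly pattern would instead suggest 24 * 7 = 168. The values are synthetic.

```r
set.seed(1)
n_days <- 14
hours <- seq_len(24 * n_days)

# Synthetic hourly values with a daily cycle (illustrative data only)
value <- 50 + 10 * sin(2 * pi * hours / 24) + rnorm(length(hours))

# frequency = 24: one seasonal period is one day of hourly observations
event_ts <- ts(value, frequency = 24)

fit <- HoltWinters(event_ts)
pred <- predict(fit, n.ahead = 24)   # forecast the next day
```

Whatever frequency is chosen, the series must cover at least two full periods, which is why a frequency of 8000 fails on roughly one year of hourly data.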