R HTS package: combinef and aggts not working with gts object

I'm trying to apply the combinef and aggts functions from the R hts package to a time series matrix in order to obtain an optimized set of forecasts across a hierarchy. I've run the same code every month without issue, and am now seeing errors after upgrading to hts package v4.5.
Reproducible example (I can share data file offline if needed)
#Read in forecast data for all levels of hierarchy#
fcast<-read.csv("SampleHierarchyForecast.csv", header = TRUE, check.names = FALSE)
#Convert to time series#
fcast<-ts(fcast, start = as.numeric(2010.25) + (64)/12, end = as.numeric(2010.25) + (75)/12, f= 12)
#Create time series of only the bottom level of the hierarchy#
index<-c()
fcastBottom<-fcast
for (i in 1:length(fcastBottom [1,]))
{
if(nchar(colnames(fcastBottom)[i])!=28)
index[i]<-i
else
index[i]<-0
}
fcastBottom<-fcastBottom[,-index]
#Create grouped time series from the bottom level forecast #
GtsForecast <- gts(fcastBottom, characters = list(c(12,12), c(4)), gnames = c("Category", "Item", "Customer", "Category-Customer"))
#Use combinef function to optimally combine the full hierarchy forecast using the groups from the full hierarchy gts#
combo <- combinef(fcast, groups = GtsForecast$groups)
Warning message:
In mapply(rep, as.list(gnames), times, SIMPLIFY = FALSE) :
longer argument not a multiple of length of shorter
traceback()
2: stop("Argument fcasts requires all the forecasts.")
1: combinef(fcast, groups = GtsForecast$groups)

There was a small bug in how the combinef() function calls gts(). I've now fixed it on GitHub, so your code above should run without any changes once you update to the development version.
Alternatively, if you don't want to install the newest version, you need to tweak your code a bit:
combo <- combinef(fcast, groups = GtsForecast$groups, keep = "bottom")
combo <- ts(combo, start = as.numeric(2010.25) + (64)/12,
end = as.numeric(2010.25) + (75)/12, f = 12)
colnames(combo) <- colnames(fcastBottom)
newGtsForecast <- gts(combo, characters = list(c(12,12), c(4)),
gnames = c("Category", "Item", "Customer",
"Category-Customer"))
Aggregate <- aggts(newGtsForecast)
Hope it helps.

Related

Error in if (ncol(spc1$amp) > ncol(spc2$amp)) { : argument is of length zero

I am using warbleR in R to do some acoustic analyses. As freq_range couldn't detect all the bottom frequencies very well, I created a data frame manually with the correct bottom frequencies, loaded it into R and turned it into a selection table. track_freq_contour, compare.methods and freq_DTW all work fine, although freq_DTW does give a warning message:
Warning message: In (0:(n - 1)) * f : NAs produced by integer overflow
However, when I run the cross_correlation function, I get the following error:
Error in if (ncol(spc1$amp) > ncol(spc2$amp)) { :
argument is of length zero
I do not get this error with a selection table whose bottom and top frequencies were added by the freq_range function rather than manually. What could be the issue here? The selection tables both look similar:
This is the selection table partly made by R through freq_range:
And this is the one with the bottom frequencies added manually (which has more sound files than the one before):
This is part of the code I use:
#Comparing methods for quantitative analysis of signal structure
compare.methods(X = stnew, flim = c(0.6,2.5), bp = c(0.6,2.5), methods = c("XCORR", "dfDTW"))
#Measure acoustic parameters with spectro_analysis
paramsnew <- spectro_analysis(stnew, bp = c(0.6,2), threshold = 20)
write.csv(paramsnew, "new_acoustic_parameters.csv", row.names = FALSE)
#Remove parameters derived from fundamental frequency
paramsnew <- paramsnew[, grep("fun|peakf", colnames(paramsnew), invert = TRUE)]
#Dynamic time warping
dm <- freq_DTW(stnew, length.out = 30, flim = c(0.6,2), bp = c(0.6,2), wl = 300, img = TRUE)
str(dm)
#Spectrographic cross-correlation
xcnew <- cross_correlation(stnew, wl = 300, na.rm = FALSE)
str(xcnew)
Any idea what I'm doing wrong?

Problems formatting data (biomod2)

I keep running into an error while trying to run the BIOMOD_FormatingData() function.
I have checked all the arguments and removed any NA values. The explanatory variables are the same for both the testing and training datasets (independent datasets), and I've generated pseudo-absence data for the evaluation dataset (included in eval.resp.var).
Has anyone run into this error before and, if so, what was the cause? This is my first time using biomod2 for ensemble modelling and I've run out of ideas as to what could be causing this error!
Here is my script and the subsequent error:
library(biomod2)
geranium_data <-
BIOMOD_FormatingData(
resp.var = SG.occ.train['Geranium.lucidum'],
resp.xy = SG.occ.train[, c('Longitude', 'Latitude')],
expl.var = SG.variables,
resp.name = "geranium_data",
eval.resp.var = SG.test.data['Geranium.lucidum'],
eval.expl.var = SG.variables,
eval.resp.xy = SG.test.data[, c('Longitude', 'Latitude')],
PA.nb.rep = 10,
PA.nb.absences = 4650,
PA.strategy = 'random',
na.rm = TRUE
)
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= geranium_data Data Formating -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Response variable name was converted into geranium.data
> Pseudo Absences Selection checkings...
> random pseudo absences selection
> Pseudo absences are selected in explanatory variables
Error in `names<-`(`*tmp*`, value = c("calibration", "validation")) : incorrect number of layer names

Loop in R through variable names with values as endings and create new variables from the result

I have 24 variables called empl_1 to empl_24 (e.g. empl_2, empl_3, ...).
I would like to write a loop in R that takes these values 1-24 and puts them in the respective places, so that the corresponding variables are either called or created with i = 1-24. The sample below shows what I would like to have within the loop (e.g. ye1 - ye24, ipw_atet_1 - ipw_atet_24, and so on).
ye1_ipw <- empl$empl_1[insample==1]
ipw_atet_1 <- treatweight(y=ye1_ipw, d=treat_ipw, x=x1_ipw, ATET =TRUE, trim=0.05, boot = 2)
ipw_atet_1
ipw_atet_1$se
ye2_ipw <- empl$empl_2[insample==1]
ipw_atet_2 <- treatweight(y=ye2_ipw, d=treat_ipw, x=x1_ipw, ATET =TRUE, trim=0.05, boot = 2)
ipw_atet_2
ipw_atet_2$se
ye3_ipw <- empl$empl_3[insample==1]
ipw_atet_3 <- treatweight(y=ye3_ipw, d=treat_ipw, x=x1_ipw, ATET =TRUE, trim=0.05, boot = 2)
ipw_atet_3
ipw_atet_3$se
Coming from a Stata environment, I tried:
for (i in seq_anlong(empl_list)){
ye[i]_ipw <- empl$empl_[i][insample==1]
ipw_atet_[i]<-treatweight(y=ye[i]_ipw, d=treat_ipw, x=x1_ipw, ATET=TRUE, trim=0.05, boot =2
}
However this does not work at all. Do you have any idea how to approach this problem by writing a nice loop? Thank you so much for your help =)

You can try lapply:
result <- lapply(empl[paste0('empl_', 1:24)], function(x)
treatweight(y = x[insample==1], d = treat_ipw,
x = x1_ipw, ATET = TRUE, trim = 0.05, boot = 2))
result is a list holding the output for all 24 variables in a single object, which is easier to manage and process than 24 separate objects.
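To show how such a list can be used afterwards, here is a minimal sketch on toy data. Everything in it is made up for illustration: the random empl data frame, the insample vector, and toy_estimator(), which is only a hypothetical stand-in for treatweight() that returns a list with an se element like the one accessed in the question.

```r
# Toy data standing in for the question's objects (all hypothetical).
set.seed(1)
empl <- as.data.frame(matrix(rnorm(50 * 24), ncol = 24))
names(empl) <- paste0("empl_", 1:24)
insample <- rep(c(1, 0), length.out = 50)

# Hypothetical stand-in for treatweight(): returns a list with an
# estimate and a standard error, so the access pattern is the same.
toy_estimator <- function(y) {
  list(effect = mean(y), se = sd(y) / sqrt(length(y)))
}

# One call per column; the list elements keep the column names.
result <- lapply(empl[paste0("empl_", 1:24)], function(x) {
  toy_estimator(x[insample == 1])
})

# Look up a single outcome by name instead of a separate ipw_atet_3 object.
result[["empl_3"]]$se

# Collect one component from every element into a named numeric vector.
se_all <- vapply(result, function(r) r$se, numeric(1))
```

Because the names carry over from the data frame, result[["empl_3"]] plays the role that ipw_atet_3 had in the copy-pasted version.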

how to interpolate data within groups in R using seqtime?

I am trying to use seqtime (https://github.com/hallucigenia-sparsa/seqtime) to analyze time-series microbiome data, as follows:
meta = data.table::data.table(day=rep(c(15:27),each=3), condition =c("a","b","c"))
meta<- meta[order(meta$day, meta$condition),]
meta.ts<-as.data.frame(t(meta))
otu=matrix(1:390, ncol = 39)
oturar<-rarefyFilter(otu, min=0)
rarotu<-oturar$rar
time<-meta.ts[1,]
interp.otu<-interpolate(rarotu, time.vector = time,
method = "stineman", groups = meta$condition)
The interpolation returns the following error:
[1] "Processing group a"
[1] "Number of members 13"
intervals
0
12
[1] "Selected interval: 1"
[1] "Length of time series: 13"
[1] "Length of time series after interpolation: 1"
Error in stinepack::stinterp(time.vector, as.numeric(x[i, ]), xout = xout, :
The values of x must strictly increasing
I tried to change method to "hyman", but it returns the error below:
Error in interpolateSub(x = x, time.vector = time.vector, method = method) :
Time points must be provided in chronological order.
I am using R version 3.6.1 and I am a bit new to R.
Can anyone please tell me what I am doing wrong and how to get around these errors?
Many thanks!

I spent quite some time stumbling around trying to figure this out. It all comes down to the data structure of meta and the resulting time variable used as input for the time.vector parameter.
Transposing meta, which mixes numeric and character columns, gives a character matrix. When that matrix is converted to a data frame, all strings are automatically converted to factors (the default in R versions before 4.0, including your 3.6.1), and this includes day.
To fix this, you can change your code to the following:
library(seqtime)
meta <- data.table::data.table(day=rep(c(15:27),each=3), condition =c("a","b","c"))
meta <- meta[order(meta$day, meta$condition),]
meta.ts <- as.data.frame(t(meta), stringsAsFactors = FALSE) # Set stringsAsFactors = FALSE
otu <- matrix(1:390, ncol = 39)
oturar <- rarefyFilter(otu, min=0)
rarotu <- oturar$rar
time <- as.integer(meta.ts[1,]) # Now 'day' is character, so convert to integer
interp.otu <- interpolate(rarotu, time.vector = time,
method = "stineman", groups = meta$condition)
As a bonus, read this blog post for information on the stringsAsFactors parameter. Strings being automatically converted to factors is a common source of confusion.
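The pitfall can be reproduced in base R without seqtime. The sketch below uses a small made-up table with three days. It forces stringsAsFactors = TRUE so that it behaves the same on any R version, which is an assumption made for the demonstration and not part of the original code.

```r
# Small hypothetical metadata table with one numeric and one character column.
meta <- data.frame(day = c(15, 20, 27), condition = c("a", "b", "c"))

# Transposing mixed column types yields a character matrix.
m <- t(meta)

# Old default behaviour: every column becomes a factor.
with_factors <- as.data.frame(m, stringsAsFactors = TRUE)

# Converting a factor to integer returns level codes, not the day values.
# Each column holds one day and one condition, and the day sorts first,
# so every code in the "day" row is 1.
codes <- vapply(with_factors["day", ], as.integer, integer(1))

# Reading the days from the character matrix keeps the real values.
days <- as.integer(m["day", ])
```

Here codes is 1 for every column while days holds 15, 20 and 27. A time vector of identical values would be consistent with the "must strictly increasing" error in the question, although that link is an inference and has not been checked against the seqtime source.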

Please select a longer horizon when the forecasts are first computed in forecast package in r

When I run the following code, I do NOT get this error:
## https://www.dataiku.com/learn/guide/code/r/time_series.html
library(readxl)
library(forecast)
library(dplyr)
library(prophet)
library(rstan)
library(Hmisc)
library(caret)
data<-read_excel("Time Series/Items.xlsx", col_types = c("text", "numeric"))
Nper=0.75
stmodels=c("meanf","naive","snaive","rwf","croston","stlf","ses","holt","hw","splinef","thetaf","ets","auto.arima","tbats","prophet")
gkuniforecast = function(data, Np, Ncolumn, tsfreq, model) {
## Preparation
N = ceiling(Np*nrow(data))
## Models
if (model=="prophet"){
df=data
names(df)=c("ds","y")
df$ds=as.Date(paste(df$ds,"-01",sep=""), "%Y-%b-%d")
train.df = df[1:N,]
na.df=data.frame(ds=rep(NA, N),y=rep(NA, N))
test.df <- rbind(na.df, df[(N+1):nrow(data),])
m <- prophet(train.df)
future <- make_future_dataframe(m, periods = nrow(data)-N, freq = 'month')
pro_forecast <- predict(m, future)
plot(m, pro_forecast)
##prophet_plot_components(m, forecast)
acc=matrix(rep(NA, 16),nrow=2,ncol=8,dimnames=list(c("Training set", "Test set"),c("ME","RMSE","MAE","MPE","MAPE","MASE","ACF1","Theil's U")))
acc["Test set","RMSE"]=sqrt(mean((pro_forecast$yhat - test.df)^2, na.rm = TRUE))
}else{
x=pull(data,Ncolumn)
train.x = ts(x[1:N], frequency=tsfreq)
test.x <- ts(c(rep(NA, N), x[(N+1):NROW(x)]), frequency=tsfreq)
str1=paste0("m_",model," = ",model,"(train.x)")
if (Np==1) {str2=paste0("f_",model," = forecast(m_",model,", h=NROW(x))")
} else {str2=paste0("f_",model," = forecast(m_",model,", h=NROW(x)-N)")}
str3=paste0("plot(f_",model,")")
str4="lines(test.x)"
str5=paste0("acc=accuracy(f_",model,",test.x)")
str=paste0(str1,";",str2,";",str3,";",str4,";",str5)
eval(parse(text=str))
}
return(acc)
}
acc = lapply(stmodels, gkuniforecast, data=data, Np=Nper, Ncolumn=2,tsfreq=12)
But when I run this code, I do:
##Forecast data prep
tsfreq=5
x=pull(data,1)
train.x = ts(x[1:N], frequency=tsfreq)
test.x <- ts(c(rep(NA, N), x[(N+1):NROW(x)]), frequency=tsfreq)
stmodels=c("meanf","naive","snaive","rwf","croston","stlf","ses","holt","hw"##,"splinef"
,"thetaf","ets","auto.arima","tbats")
for (i in 1:length(stmodels)){
str1=paste0("m_",stmodels[i]," = ",stmodels[i],"(train.x)")
str2=paste0("f_",stmodels[i]," = forecast(m_",stmodels[i],", h=NROW(x)-N)")
str3=paste0("plot(f_",stmodels[i],")")
str4="lines(test.x)"
str5=paste0('acc[["',stmodels[i],'"]]=accuracy(f_',stmodels[i],',test.x)')
str=paste0(str1,";",str2,";",str3,";",str4,";",str5)
eval(parse(text=str))
}
There seems to be a problem with 'hw' (splinef is commented out because it gives me a different error), but I do not understand why I get no errors with the first dataset and do get one with the second. The other difference is the frequency.
Again, the error is:
Please select a longer horizon when the forecasts are first computed

You are mixing functions that create forecasts directly (like meanf()) with functions that generate models (like ets()). For functions that generate forecasts directly, you need to specify the forecast horizon when you call the function. See https://otexts.org/fpp2/the-forecast-package-in-r.html for a list of functions that produce forecasts directly.
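A minimal sketch of the distinction, assuming the forecast package is installed and using the built-in AirPassengers series in place of the question's data. The helper make_forecast() and the direct vector are hypothetical names introduced here; the vector lists the functions from the question that, per the page linked above, return forecasts directly.

```r
library(forecast)  # assumes the forecast package is installed

# Built-in monthly series used as stand-in training data.
train <- window(AirPassengers, end = c(1958, 12))
h <- 24

# Functions from the question that return forecasts directly.
direct <- c("meanf", "naive", "snaive", "rwf", "croston", "stlf",
            "ses", "holt", "hw", "splinef", "thetaf")

# Hypothetical helper: pass h up front for direct functions,
# otherwise fit the model first and then call forecast() on it.
make_forecast <- function(model, y, h) {
  f <- match.fun(model)
  if (model %in% direct) {
    f(y, h = h)
  } else {
    forecast(f(y), h = h)
  }
}

fc <- lapply(c(meanf = "meanf", hw = "hw", ets = "ets"),
             make_forecast, y = train, h = h)

# Every element now has the requested horizon.
vapply(fc, function(f) length(f$mean), integer(1))
```

Looking the function up with match.fun() also avoids building code strings for eval(parse()), which makes this kind of mismatch easier to spot.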