How can I use summary() with lqmm and formula objects - r

I've got a list of formula objects to fit Linear Quantile Mixed Models with lqmm::lqmm().
I cannot use summary() to return model coefficients with standard errors etc. from the produced models.
d <- structure(list(DID = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), pain = c(4L, 2L, 6L, 3L, 3L,
4L, 3L, 3L, 4L, 5L, 4L, 4L, 5L, 3L, 4L, 3L, 2L, 6L, 5L, 7L, 6L,
3L, 5L, 1L, 5L, 3L, 4L, 4L, 6L, 5L, 5L, 6L, 5L, 6L, 5L, 6L, 6L,
5L, 6L, 7L, 4L, 5L, 6L, 6L, 5L, 6L, 4L, 5L, 6L, 7L), wound = c(4L,
3L, 3L, 3L, 4L, 5L, 4L, 3L, 4L, 4L, 3L, 3L, 3L, 3L, 3L, 4L, 3L,
4L, 4L, 3L, 3L, 3L, 4L, 3L, 3L, 4L, 5L, 3L, 8L, 7L, 7L, 7L, 7L,
9L, 8L, 8L, 8L, 6L, 7L, 6L, 8L, 7L, 6L, 8L, 7L, 6L, 7L, 8L, 7L,
7L), mobility = c(2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L,
3L, 3L, 2L, 1L, 1L, 2L, 2L, 3L, 2L, 3L, 1L, 2L, 2L, 3L, 2L, 3L,
3L, 6L, 5L, 6L, 6L, 5L, 6L, 5L, 5L, 5L, 5L, 6L, 5L, 6L, 5L, 5L,
5L, 6L, 5L, 5L, 3L, 5L, 6L)), row.names = c(NA, 50L), class = "data.frame")
library(lqmm)
x <- as.formula("pain ~ wound + mobility")
m1 <- lqmm(x,
random = ~ 1,
group = DID,
data = d)
summary(m1)
Error: object of type 'symbol' is not subsettable
I tried using eval(x) as suggested here, but got a recursion error.
m2 <- lqmm(eval(x),
random = ~ 1,
group = DID,
data = d)
summary(m2)
Error: evaluation nested too deeply: infinite recursion / options(expressions=)?
Error during wrapup: evaluation nested too deeply: infinite recursion / options(expressions=)?
Error: no more error handlers available (recursive errors?); invoking 'abort' restart
Any ideas on how to extract model parameters?
Full sample data was taken from here.

Run this like below, It should work:
x <- as.formula('pain ~ wound + mobility')
m1 <- lqmm(x,
random = ~ 1,
group = DID,
data = d)
## Fixing the call fixed here.
m1$call$fixed <- x
summary(m1)
Output:
> m1$call$fixed <- x
> summary(m1)
Call: lqmm(fixed = pain ~ wound + mobility, random = ~1, group = DID,
data = d)
Quantile 0.5
Fixed effects:
Value Std. Error lower bound
(Intercept) 2.765900 1.294809 0.163883
wound 0.052025 0.077028 -0.102770
mobility 0.469649 0.127371 0.213687
upper bound Pr(>|t|)
(Intercept) 5.3679 0.0376887 *
wound 0.2068 0.5025982
mobility 0.7256 0.0005675 ***
---
Signif. codes:
0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
AIC:
[1] 166.1 (df = 5)
There is catch after little debug, I realised objects returned by below two approaches are not similar hence I have manipulated one of them like above:
m2 <- lqmm(pain ~ wound + mobility,
random = ~ 1,
group = DID,
data = d)
m1 <- lqmm(x,
random = ~ 1,
group = DID,
data = d)
If we closely observe m1$call and m2$call,(m1 is working well with summary) however, both are different objects and hence leading to the error which OP has encountered, I think its a bug but please let me know if there is any other explanation. Also while running all.equal(m1, m2) it tells me there is indeed a difference. So, after fiddling it with the given info, I have resetted the fixed element of list to original x (which is formula), which it seems to be working for now:
> all.equal(m1, m2)
[1] "Component “call”: target, current do not match when deparsed"

Related

Error when running a fixed effect in a GAM with an offset in R with gratia::draw()

I have a data set that looks like this:
structure(list(n = c(236896L, 73258L, 75570L, 5684L, 10242L,
2037L, 74194L, 41764L, 288115L, 6728L, 18964L, 5395L, 23192L,
12575L, 39591L, 12566L, 44458L, 126957L, 47316L, 152175L, 92913L,
81229L, 29622L, 1708L, 8526L, 52117L, 95385L, 22480L, 30521L,
51660L, 74320L, 273107L, 58L, 59686L, 77454L, 51471L, 66610L,
232321L, 53435L, 45270L), name = structure(c(9L, 9L, 6L, 5L,
2L, 5L, 6L, 9L, 6L, 4L, 4L, 4L, 4L, 2L, 9L, 2L, 6L, 1L, 4L, 6L,
4L, 9L, 2L, 5L, 3L, 4L, 6L, 2L, 9L, 2L, 4L, 4L, 7L, 9L, 1L, 6L,
6L, 6L, 8L, 2L), .Label = c("Ami", "Cho", "Fal", "For", "Ric",
"Sam", "Taw", "Tex", "Tol"), class = "factor"), change2 = c(0.0753607803884176,
-0.08058465598786, -0.00410425493512865, -0.0220964428266722,
0.0629320532004209, -0.0797306134519322, 0.0660481799732004,
-0.0572995403797303, -0.00713582946272, 0.00756646981276647,
0.032732914683994, -0.00632056690250293, 0.050358229187504, 0.0265162711945312,
0.0218803226963826, -0.0508818612242459, 0.00485925918649957,
0.0315158006542641, -0.0315622434590242, -0.0602515470219345,
-0.0409479919129347, 0.111224942380013, 0.00704490808823113,
0.0236731452544392, -0.0811686305416274, -0.0274692750452023,
0.00160881330548216, -0.0211269729894635, -0.0377625466699325,
-0.0311273993307701, -0.0118001904995042, 0.0023179680499073,
0.0263453251509878, 0.0767020512037913, -0.0113771665605732,
-0.0428469659333539, 0.0714746847470087, 0.10720066191237, 0.0153144105362596,
-0.109538998188302), Season = structure(c(2L, 1L, 4L, 3L, 3L,
1L, 4L, 3L, 2L, 4L, 4L, 4L, 1L, 4L, 3L, 1L, 3L, 4L, 1L, 1L, 3L,
2L, 2L, 2L, 3L, 3L, 3L, 1L, 3L, 4L, 1L, 2L, 3L, 2L, 2L, 3L, 4L,
2L, 3L, 2L), .Label = c("fall", "spring", "summer", "winter"), class = "factor"),
off = c(230915, 57957, 85583, 10526, 35316.6, 4851, 87287,
48226, 198700, 42050.6, 46252.8, 29974, 56566, 20959, 43175,
10385, 56997, 208126, 100672, 80516, 244507, 128730, 38470,
5177, 22435.6, 121202, 114234, 26140, 24693, 53812.6, 124281,
666114, 583, 76915, 140824.7, 91912, 78828, 219171, 95419,
33783.9)), row.names = c(NA, -40L), class = "data.frame")
I am running a GAM that looks like this:
gam1<-gam(n~Season+s(change2, by=Season, k=5)+
s(name, bs="re")+
offset(log(off)),
data=data,family=nb,method="REML")
With a random effect for the name variable and a fixed effect for the Season variable.
I am able to get all the outputs using the summary() command for this model, however when I try to display the partial effect plots with the gratia::draw() command, I get the following error:
library(gratia)
draw(gam1)
Error in eval(predvars, data, env) : object 'off' not found
In addition: Warning message:
In predict.gam(object, newdata = pred_data, type = "terms", terms = term, :
not all required variables have been supplied in newdata!
The partial effect plots show up with the plot.gam function, so I am wondering why this model will not work for the draw command? Also the model will run if I take the offset out, however this information is crucial to the analysis. Is there a reason why I can't run a fixed effect with an offset?
I can't reproduce this error with version 0.7.1 (on GitHub, but about to be submitted to CRAN) and I don't think anything of relevance to the reported problem changed between 0.7.0 (the current CRAN version) and this patched version.
With your data in tmp:
r$> gam1 <- gam(n ~ Season + s(change2, by = Season, k = 5)+
s(name, bs = "re") +
offset(log(off)),
data = tmp, family = nb, method = "REML")
r$> draw(gam1)
I get:

Why is 'ï..' at the front of the first colname when imported a csv into r?

Why is 'ï..' at the front of the first colname when imported a csv into r?
This happens whenever I have a csv saved via excel and then do read.csv() in r. What causes this behavior?
Dput:
structure(list(ï..Species = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L,
9L, 1L, 2L, 4L, 5L, 6L, 7L, 1L, 3L, 1L, 2L, 2L, 4L, 4L, 3L, 2L,
5L, 4L, 7L, 2L, 2L), Country = c(1L, 1L, 1L, 1L, 2L, 2L, 3L,
4L, 5L, 6L, 1L, 3L, 2L, 1L, 2L, 5L, 4L, 4L, 2L, 3L, 3L, 5L, 4L,
5L, 5L, 3L, 7L, 4L, 5L)), class = "data.frame", row.names = c(NA,
-29L))
Please review previous answer here regarding weird characters when importing data from Excel: Weird characters added to first column name after reading a toad-exported csv file

`non-finite value supplied` in ggstatsplot

I am working with ggstatsplot to get visual representations of my statistical analyses.
I have numerous datasets, all very similar in make-up. Some work just fine, while others don't. data1 is a working example, and data2 doesn't work.
data1 <- structure(list(
treatment = structure(c(1L, 1L, 1L, 1L, 1L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 5L, 5L, 5L,
5L, 5L, 5L, 5L, 5L, 5L, 5L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L,
6L),
.Label = c("negative_ctrl", "positive_ctrl", "treatmentA", "treatmentB", "treatmentC", "treatmentD"), class = "factor"),
value = c(1.74501, 2.04001, 1.89501, 1.84001,
1.89501, 9.75001, 8.50001, 8.80001, 11.50001, 10.25001, 7.90001,
9.25001, 11.45001, 7.75001, 7.75001, 7.55001, 8.70001, 8.20001,
6.95001, 6.60001, 7.40001, 7.15001, 8.25001, 9.20001, 8.95001,
6.45001, 6.05001, 5.40001, 7.95001, 6.80001, 4.65001, 6.40001,
6.40001, 6.70001, 5.40001, 3.20001, 2.70001, 4.30001, 4.10001,
3.60001, 4.00001, 3.00001, 4.70001, 3.10001, 3.50001, 6.45001,
5.45001, 4.90001, 7.25001, 4.55001, 4.70001, 6.25001, 5.65001,
6.00001, 5.10001)),
row.names = c(NA, -55L), class = c("tbl_df", "tbl", "data.frame"))
data2 <- structure(list(
treatment = structure(c(1L, 1L, 1L, 1L, 1L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 5L, 5L,
5L, 5L, 5L, 5L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L),
.Label = c("negative_ctrl", "positive_ctrl", "treatmentA", "treatmentB", "treatmentC", "treatmentD"), class = "factor"),
value = c(1.00001, 1.00001, 1.00001, 1.00001, 1.00001, 6.77501,
5.68751, 5.99201, 8.24501, 7.01251, 4.79501, 5.99126, 8.26276,
5.35376, 5.38751, 4.60251, 5.38901, 4.85201, 4.44401, 5.20501,
6.20701, 5.77001, 4.05201, 3.65126, 3.02401, 4.68351, 3.90001,
2.56951, 3.70001, 3.61901, 3.96401, 2.93601, 1.53901, 1.40801,
2.05601, 2.08501, 1.89701, 1.79501, 1.50001, 2.09151, 1.53551,
1.57501, 3.88851, 3.09151, 2.75501, 4.40626, 2.42001, 2.60951,
3.83501, 3.37151, 3.70001, 2.92701)),
row.names = c(NA, -52L), class = c("tbl_df", "tbl", "data.frame"))
I call the most basic analysis for both datasets:
library(Rmpfr)
library(ggstatsplot)
ggstatsplot::ggbetweenstats(
data = data1,
x = treatment,
y = value,
messages = FALSE )
ggstatsplot::ggbetweenstats(
data = data2,
x = treatment,
y = value,
messages = FALSE )
For data1 I get this:
for data2 I get:
> Error in stats::optim(par = 1.1 * rep(lambda, 2), fn = function(x) { : non-finite value supplied by optim
At first I thought the issue might be a few zeros that I passed on in the negative control, but I first upped them by a tiny amount and then by 1 to make sure the range of the values is not an issue. The only discrepancy I can see is that I only have 7 instead of 10 measurements for treatmentA (level 3) in data2 but 10 in data1 (had to remove a few NAs due to sample failure). However, in both cases the negative control (level 1) only has 5 values, and I don't think that in this type of analysis there is an issue with different sample sizes between the groups.
It's a good idea to try basic plots out in these cases eg isolate the boxplots:
So comparing the two datasets:
boxplot(value ~ treatment, data=data1)
boxplot(value ~ treatment, data=data2)
data2 has a treatment with no variability ("negative_ctrl"), 0 SD. I'm guessing this function is doing some tests that require variation. You will need to read the documentation for the function to see if this is brought up but you can get views either by removing these treatments, or forcing a very small amount of variation eg
# run without negative_ctrl
ggstatsplot::ggbetweenstats(
data = data2[data2$treatment != "negative_ctrl",],
x = treatment,
y = value,
messages = FALSE )
# add some tiny fake variation to force it through (this is a hack)
data3 <- data2
data3[data3$treatment=="negative_ctrl",][1,][["value"]] <- 1.0001
ggstatsplot::ggbetweenstats(
data = data3,
x = treatment,
y = value,
messages = FALSE )

Sankey diagram, alluvial, ggalluvial in R – Three data blocks: Baseline-Flow (many time points)-Outcome

We would like to present the change in muscle mass due to the exercise of different age group and the final performance/outcome at the competition at the end of the study.
We have several time points at which the muscle mass was measured. In this example I only show three time points, however, the study compromises 12 time points.
To present the change in muscle mass and deviation from the average I was able to use geom_flow(). However, it becomes very tricky to add the age groups on the left of the chart as well as the performance on the right side. These data are located in different variables.
Please help us to find a great way to present the data. Thanks.
Data Structure:
ID Age_at_start Month Deviation_muscle Performance
1 36 3 59 Outstanding
1 36 6 104 Outstanding
1 36 9 200 Outstanding
2 29 3 -40 average
2 29 6 -109 average
2 29 9 -30 average
3 22 3 310 above average
library(ggplot2)
library(ggalluvial)
df.san$age<-factor(df.san$age)
df.san$age<-factor(df.san$age, levels=c(1,2,3,4), labels=c("20 to 24 years","25 to 29 years","30 to 34 years","35 to 39 years"))
df.san$dev_group <-factor(df.san$dev_group,levels=c(1,2,3,4,5,6,7),labels=c("≥250g","≥150 to <250g","≥50 to <150g","> -50 to <50g","> -150 to ≤ -50","> -250 to ≤ -150", "≤ -250g"))
df.san$month <- factor(df.san$month,labels=c("1mo","2mo","3mo"))
df.san$perform<-factor(df.san$perform,levels=c(1,2,3,4),labels=c("outstanding "," above average "," average "," below average"))
ggplot(df.san,aes(x = month,stratum = dev_group, alluvium = ID, fill = dev_group,label = dev_group)) +
scale_fill_brewer(type = "qual", palette = "Set2") +
geom_flow(stat = "alluvium", lode.guidance = "rightleft", color = "darkgray") +
geom_stratum() +
theme(legend.position = "bottom") +
ggtitle("Effect of Exercice on Muscle Growth on Performance in 4 Different Age Groups ")
Data for df.san:
structure(list(ID = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 4L, 4L, 4L, 5L, 5L, 5L, 6L, 6L, 6L, 7L, 7L, 7L, 8L, 8L, 8L, 9L, 9L, 9L, 10L, 10L, 10L, 11L, 11L, 11L, 12L, 12L, 12L, 13L, 13L, 13L, 14L, 14L, 14L, 15L, 15L, 15L), age = c(2L, 3L, 3L, 1L, 3L, 1L, 2L, 3L, 4L, 1L, 1L, 3L, 1L, 4L, 4L, 3L, 4L, 3L, 4L, 2L, 2L, 1L, 2L, 4L, 1L, 1L, 4L, 1L, 3L, 1L, 2L, 3L, 4L, 4L, 2L, 2L, 2L, 2L, 4L, 2L, 2L, 4L, 3L, 3L, 2L), month = c(2L, 4L, 6L, 2L, 4L, 6L, 2L, 4L, 6L, 2L, 4L, 6L, 2L, 4L, 6L, 2L, 4L, 6L, 2L, 4L, 6L, 2L, 4L, 6L, 2L, 4L, 6L, 2L, 4L, 6L, 2L, 4L, 6L, 2L, 4L, 6L, 2L, 4L, 6L, 2L, 4L, 6L, 2L, 4L, 6L), dev_muscle = c(-109.3, -236.2, -275.4, -44.5, -202.6, -436, 3, -115.8, -136.2, -142.1, -429, -561.4, -49, -248.8, -232.6, -15.9, -171.5, -391.6, -5.8, -21.7, -104.1, 12.6, -33.4, -25.4, -57.3, -50.7, -103.6, -124, -221.4, -457.2, 22.1, -126.9, -79.5, -76.8, -113.2, -129.7, -86.1, -126, -82.9, -10.8, -2.8, 88.3, 41.6, 0.2, 184.7), perform = c(1L, 2L, 1L, 2L, 4L, 1L, 1L, 4L, 3L, 4L, 2L, 4L, 4L, 4L, 2L, 2L, 4L, 3L, 3L, 4L, 1L, 2L, 1L, 1L, 2L, 3L, 2L, 2L, 2L, 1L, 2L, 3L, 2L, 1L, 2L, 4L, 3L, 2L, 1L, 3L, 2L, 1L, 1L, 4L, 4L), dev_group = c(5L, 6L, 7L, 4L, 6L, 7L, 4L, 5L, 5L, 5L, 7L, 7L, 4L, 6L, 6L, 4L, 6L, 7L, 4L, 4L, 5L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 6L, 7L, 4L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 4L, 4L, 3L, 4L, 4L, 2L)), class = "data.frame", row.names = c(NA, -45L))

R: applying multiple functions by group of columns on time series data

I have a time series data at hourly level. I am trying to build a forecast for that data. The following is sample of data:
sample <-
structure(list(group_type = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), .Label = c("Group 1",
"Group 2", "Group 5"), class = "factor"), sub_group_type = structure(c(1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L), .Label = c("Sub Group 1", "Sub Group 2", "Sub Group 3"),
class = "factor"), date = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), .Label = c("1/1/17",
"1/2/17", "1/3/17"), class = "factor"), hour = c(6L, 7L, 8L, 9L, 10L, 11L, 12L,
6L, 7L, 8L, 9L, 10L, 11L, 12L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 6L, 7L, 8L, 9L,
10L, 11L, 12L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 6L,
7L, 8L, 9L, 10L, 11L, 12L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 6L, 7L, 8L, 9L, 10L,
11L, 12L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 6L, 7L,
8L, 9L, 10L, 11L, 12L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 6L, 7L, 8L, 9L, 10L, 11L,
12L, 6L, 7L, 8L, 9L, 10L, 11L, 12L), weekday = structure(c(2L, 2L, 2L, 2L, 2L,
2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 3L, 3L, 3L, 3L, 3L, 3L, 3L),
.Label = c("Monday", "Sunday", "Tuesday"), class = "factor"), total = c(9L, 9L,
10L, 6L, 2L, 14L, 3L, 11L, 12L, 12L, 0L, 10L, 8L, 13L, 14L, 17L, 12L, 5L, 9L, 7L,
10L, 13L, 23L, 11L, 3L, 6L, 10L, 11L, 14L, 16L, 13L, 2L, 3L, 4L, 14L, 11L, 16L,
8L, 12L, 7L, 6L, 13L, 13L, 22L, 12L, 7L, 9L, 8L, 14L, 9L, 16L, 15L, 6L, 7L, 6L,
12L, 13L, 14L, 7L, 3L, 13L, 11L, 6L, 8L, 15L, 11L, 3L, 10L, 9L, 7L, 12L, 10L, 10L,
3L, 14L, 8L, 12L, 10L, 20L, 5L, 4L, 8L, 12L, 3L, 0L, 4L, 5L, 1L, 6L, 7L, 0L, 3L,
1L, 1L, 0L, 2L, 2L, 0L, 2L, 0L, 3L, 7L, 6L, 2L, 1L)), .Names = c("group_type",
"sub_group_type", "date", "hour", "weekday", "total"), class = "data.frame",
row.names = c(NA, -105L))
I am applying the following functions to the above data:
models <- function(x){
x <- msts(x, seasonal.periods=c(24,168))
mod_exp <- ets(x, ic='aicc', restrict=T)
mod_hwa <- HoltWinters(x,seasonal = "additive")
mod_hwm <- HoltWinters(x,seasonal = "multiplicative")
mod_neural <- nnetar(x, p=7, size=25)
mod_tbats <- tbats(x, ic='aicc', seasonal.periods=7)
mod_bats <- bats(x, ic='aicc', seasonal.periods=7)
mod_stl <- stlm(x, s.window=7, ic='aicc', robust=TRUE, method='ets')
mod_sts <- StructTS(x)
}
test <- by(sample,list(sample$group_type,sample$sub_group_type,sample$date, sample$hour
),models)
However, I am getting the following error:
Error in ets(x, ic = "aicc", restrict = T) : y should be a univariate time series
If I split the data as follows as apply ets() function, I am able to run it without any issues. But, this splitting of data is not a very feasible option for me as the number of Groups and Sub Groups are too many and each of them has a different time series pattern:
sub_sample_1 <- sample[sample$group_type == "Group 1" & sample$sub_group_type == "Sub Group 1",6]
x <- msts(sub_sample_1, seasonal.periods=24)
mod_arima <- auto.arima(x, ic='aicc', stepwise=F)
mod_exp <- ets(x, ic='aicc', restrict=T)
mod_hwa <- HoltWinters(x,seasonal = "additive")
mod_hwm <- HoltWinters(x,seasonal = "multiplicative")
mod_neural <- nnetar(x, p=24, size=10)
mod_tbats <- tbats(x, ic='aicc', seasonal.periods=24)
mod_bats <- bats(x, ic='aicc', seasonal.periods=24)
mod_stl <- stlm(x, s.window=24, ic='aicc', robust=TRUE, method='ets')
mod_sts <- StructTS(x)
Is there any work around so that I can apply the models by group of columns with out encountering any errors?
Also, not all models are working for all the groups. For the sub_sample_1 data, HoltWinters, neuralnet, bats and stl are giving me error and others are working
> mod_hwa <- HoltWinters(x,seasonal = "additive")
Error in decompose(ts(x[1L:wind], start = start(x), frequency = f), seasonal) :
time series has no or less than 2 periods
> mod_hwm <- HoltWinters(x,seasonal = "multiplicative")
Error in HoltWinters(x, seasonal = "multiplicative") :
data must be non-zero for multiplicative Holt-Winters
> mod_bats <- bats(x, ic='aicc', seasonal.periods=24)
Error in optim(par = param.vector$vect, fn = calcLikelihoodNOTransformed, :
function cannot be evaluated at initial parameters
I can understand why these models are not working for my data. How do I exclude them when they give errors when I apply the function?
Thanks in advance for the help!
This question is similar (extension maybe) to my other question here
Several issues arise from your current setup:
Functions return the last line if no return() is specified. So your first attempt will lose all lines except for mod_sts which will be the value test is assigned for each subset of by.
In your subset code, you are actually passing the 6th column (an atomic vector) whereas you pass all columns of dataframe in your first code attempt. This may be the reason for your error where input should be per the msts docs:
A numeric vector, ts object, matrix or data frame. It is intended that
the time series data is univariate, otherwise treated the same as
ts().
Your by is receiving four groupings, group_type, sub_group_type, date, and hour unlike your second subset code of two. Unless your data is very large, these many groupings may result with few rows or no rows, and hence not enough data points for model procedures as your last code block seems to suggest.
With that said, consider the following adjustment in returning a list of named elements by first two groupings, specifying the 6th column. And because by takes a combinations of factors which in subsetting dataframe may yield no rows, below uses tryCatch to capture any errors and return empty lists to be filtered out in final line.
models <- function(x){
x <- msts(x, seasonal.periods=c(24,168))
list(
mod_exp = ets(x, ic='aicc', restrict=T),
mod_hwa = HoltWinters(x,seasonal = "additive"),
mod_hwm = HoltWinters(x,seasonal = "multiplicative"),
mod_neural = nnetar(x, p=7, size=25),
mod_tbats = tbats(x, ic='aicc', seasonal.periods=7),
mod_bats = bats(x, ic='aicc', seasonal.periods=7),
mod_stl = stlm(x, s.window=7, ic='aicc', robust=TRUE, method='ets'),
mod_sts = StructTS(x)
)
}
# TRY/CATCH TO CAPTURE ERRORS AND RETURN EMPTY LIST
test <- by(sample[,6], list(sample$group_type, sample$sub_group_type),
function(x) tryCatch({ models(x)
}, error=function(e) return(list(NA))))
# TO REMOVE NULLs AND NAs (EMPTY ITEMS)
test <- Filter(function(i) length(i) > 0, test)

Resources