Getting error message while calculating rmse in a time series analysis - r

I am trying to replicate this example of time series analysis in R using Keras (see here), and unfortunately I receive an error message when computing the first average RMSE:
coln <- colnames(compare_train)[4:ncol(compare_train)]
cols <- map(coln, quo(sym(.)))
rsme_train <-
map_dbl(cols, function(col)
rmse(
compare_train,
truth = value,
estimate = !!col,
na.rm = TRUE
)) %>% mean()
rsme_train
Error message:
Error in is_symbol(x) : object '.' not found
There are some helpful comments at the bottom of the post, but installing a newer version of dplyr does not really help. Any suggestions on how to get around this?

I stumbled upon the same problem, so here's a solution that is close to the original code.
The transformation for cols is not necessary, because !! works with the character vector already. You can change the code to
coln <- colnames(compare_train)[4:ncol(compare_train)]
rsme_train <-
  map_df(coln, function(col)
    rmse(
      compare_train,
      truth = value,
      estimate = !!col,
      na.rm = TRUE
    )) %>%
  pull(.estimate) %>%
  mean()
rsme_train
You might also want to check for updates of tidyverse, just to be sure.
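For reference, the same quantity can be sketched in base R without yardstick. This is only an illustration under stated assumptions: the data frame is made up, the column names `pred_1` and `pred_2` are hypothetical, and missing values are dropped in the same spirit as `na.rm = TRUE` above.

```r
# Toy stand-in for compare_train: hypothetical data, not from the original post
compare_train <- data.frame(
  index  = 1:4,
  key    = "actual",
  value  = c(1, 2, 3, 4),
  pred_1 = c(1, 2, 3, 6),   # one prediction is off by 2
  pred_2 = c(NA, 2, 3, 4)   # exact wherever it is not missing
)

# Prediction columns start at column 4, as in the question
coln <- colnames(compare_train)[4:ncol(compare_train)]

# RMSE of one prediction column against `value`, ignoring missing pairs
rmse_col <- function(col) {
  err <- compare_train[[col]] - compare_train$value
  sqrt(mean(err^2, na.rm = TRUE))
}

rmse_by_col <- sapply(coln, rmse_col)  # one RMSE per prediction column
rmse_train  <- mean(rmse_by_col)       # average RMSE across columns
```

Passing column names as plain strings and indexing with `[[` avoids quosures entirely, which is why no `quo()` or `sym()` step is needed here.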

Related

Problem `.x` is empty in pammtools packages

I am trying to replicate the example code in Bender and Scheipl for pammtools (Piece-wise exponential Additive Mixed Modeling tools), specifically a survival exercise with time-varying effects.
https://arxiv.org/pdf/1806.01042.pdf
library(dplyr); library(tidyr); library(purrr); library(ggplot2)
library(survival); library(mgcv); library(pammtools)
data("pbc", package="survival")
# event time information
pbc <- pbc %>%
filter(id <= 312) %>%
mutate(status = ifelse(status==0,0,1) )%>%
select(id:status, trt:sex, bili, protime)
pbc %>% slice(1:6)
pbc_ped <- as_ped(
data = list(pbc, pbcseq),
formula = Surv(pbc$time, pbc$status)~sex|concurrent(bili, protime, tz_var = "day"),
id = "id")
I always get the error
Error: .x is empty, and no .init supplied
I installed and checked Rtools, and I tried different (older) versions of purrr, which is sometimes related to this error. I also tried running the code on https://rdrr.io/snippets/.
Any ideas? Thank you very much.
You have not used the code in that vignette. You also added pbc$ to the arguments in Surv(), a common mistake but generally not a productive strategy.
# Need to narrow the material from pbcseq
pbcseq <- pbcseq %>% select(id, day, bili, protime)
# I would have given it a different name
#------ Error when using "|" rather than "+"
pbc_ped <- as_ped(
data = list(pbc, pbcseq),
formula = Surv(time, status)~sex|concurrent(bili, protime, tz_var = "day"),
id = "id")
#Error: `.x` is empty, and no `.init` supplied
#________________
pbc_ped <- as_ped(
data = list(pbc, pbcseq),
formula = Surv(time, status)~sex + concurrent(bili, protime, tz_var = "day"),
id = "id") # No error
I think there may be an error in the vignette. I don't see any examples using the construct ...
Surv(time,status)~ variates | special(.)
They all use a "+" sign for adding the time-dependent covariates. If you go to https://adibender.github.io/pammtools//articles/data-transformation.html you see them using a "+" rather than a "|". I think there is some sloppiness in that package's documentation. But your additions only made the problem worse.

lm function is giving a warning that it is dropping rows

My question is:
Do developers that make more games charge higher prices?
My code:
dev_data <- steam_data_final %>%
group_by(developer) %>%
summarize(num_dev = n(), avg_price = mean(price, na.rm = TRUE)) %>%
arrange(desc(num_dev))
dev_data
But this model isn't working; I get the warning "Dropping 3038 rows with missing values":
mod_dev <- lm(num_dev ~ avg_price, data = dev_data)
Check whether any column contains NA values, using summary() or is.na(). If it does, that is why lm() gives you the warning message.
Also, it seems you should use lm(avg_price ~ num_dev, data = dev_data) rather than lm(num_dev ~ avg_price, data = dev_data): the dependent variable should be avg_price, not num_dev. (This depends on your research question.)
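The check described above might look like the following sketch. The data frame is hypothetical and only mimics the shape of dev_data. Note that a developer whose prices are all missing gets NaN from mean(price, na.rm = TRUE), and is.na() counts NaN as missing too.

```r
# Hypothetical per-developer summary; the NA mimics a developer without prices
dev_data <- data.frame(
  developer = c("a", "b", "c", "d", "e"),
  num_dev   = c(1, 2, 3, 4, 5),
  avg_price = c(10, 13, NA, 15, 19)
)

# Count missing values in each column
na_counts <- colSums(is.na(dev_data))

# With the default na.action (na.omit), lm() silently drops incomplete rows
mod_dev <- lm(avg_price ~ num_dev, data = dev_data)

# Number of rows actually used in the fit
n_used <- nobs(mod_dev)
```

Comparing n_used with nrow(dev_data) shows how many rows were dropped because of missing values.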

Getting different estimates of the mean with or without lapply, even when specifying na.rm = T in all cases

I have spent several hours trying to troubleshoot this issue, without success.
I have over a hundred outcome variables whose means I am trying to estimate. When I compute them individually (i.e., not in a loop or in vectorized form), I get the correct estimate. Once I use lapply or similar code, I get slightly different estimates.
Here is the stand-alone code that gave the right answer:
prop.table(table(x$polytobacco))
mean(x$polytobacco, na.rm = T)
However, below is the apply statement that gave the wrong estimate:
varlist = c("polytobacco, ....") # over 100 variables
meanestimates = as.matrix(round(sapply(y[,varlist], function(x) mean(as.numeric(x), na.rm = T)), 1)) %>% as.data.frame() %>% t
Thanks in advance.
Ter.
I guess less is more. I simplified the code, and it now gives the correct results.
#Overall
pooledov = lapply(x[,outcomes], function(x) mean(x, na.rm = T)) %>% as.data.frame()
rownames(pooledov) = "Total"
colnames(pooledov) = outcomes
#analyses for subgroups
means = foreach(o = outcomes) %:%
  foreach (p = predictors) %do% {
    tapply(x[,o], x[,p], mean, na.rm =T) %>% as.data.frame()
  }
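Two possible causes of the original discrepancy can be sketched as follows. This is a guess, since the data are not shown: if a column is stored as a factor, as.numeric() returns the internal level codes rather than the labels, and round(..., 1) changes the reported estimate further. The vector below is hypothetical.

```r
# Hypothetical 0/1 outcome stored as a factor, with one missing value
polytobacco <- factor(c("0", "1", "0", "0", NA, "1", "0"))

# as.numeric() on a factor returns the level codes (1, 2), not the labels (0, 1)
mean_codes <- mean(as.numeric(polytobacco), na.rm = TRUE)

# Converting through character recovers the original values
mean_values <- mean(as.numeric(as.character(polytobacco)), na.rm = TRUE)

# Rounding to one decimal place alters the estimate again
mean_rounded <- round(mean_values, 1)
```

It is also worth noting that the stand-alone code reads from x while the sapply() call reads from y, so the two may not be using the same data.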

define r square depending on 1 factor

My data contain one factor, ref_fruit. This is my script to get the R-squared for each level of the factor (with MF depending on heure):
models <- dlply(P1, "ref_fruit", function(df)
lm(MF ~ heure, data = df))
ldply(models, coef)
l_ply(models, summary, .print = TRUE)
The problem is that the R-squared values I get are really high each time: around 0.998. This is not what I observed with Excel.
The other problem is that I get this message after executing:
ldply(models, coef)
Error in fs[[i]](x, ...) : attempt to apply non-function.
Could someone help me, please?
Your problem is almost identical to this example for dplyr shown here. You didn't provide any data but it should be...
require(dplyr)
by_fruit <- group_by(df, ref_fruit)
models <- by_fruit %>% do(mod = lm(MF ~ heure, data = .))
models
summarise(models, rsq = summary(mod)$r.squared)
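The same per-group fit can also be sketched in base R, without plyr or dplyr. The data below are invented for illustration; only the column names ref_fruit, heure and MF come from the question.

```r
# Hypothetical data: two fruits, MF measured at four hours (heure) each
P1 <- data.frame(
  ref_fruit = rep(c("A", "B"), each = 4),
  heure     = rep(1:4, times = 2),
  MF        = c(10, 8.5, 6, 4.5,   # fruit A
                10, 9,   5, 4)     # fruit B
)

# Fit one linear model per level of ref_fruit
models <- lapply(split(P1, P1$ref_fruit), function(d) lm(MF ~ heure, data = d))

# Extract R-squared from each fitted model
rsq <- sapply(models, function(m) summary(m)$r.squared)
```

Here rsq is a named numeric vector with one entry per fruit, which makes it easy to compare against values computed elsewhere (for example in Excel).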

Comparing models with dplyr and broom::glance: How to continue if error is produced?

I would like to run each variable in a dataset as a univariate glmer model using the lme4 package in R. I would like to prepare the data with the dplyr/tidyr packages, and organize the results from each model with the broom package (i.e. do(glance(glmer(...)))). I would most appreciate help that sticks within that framework. I'm not that great at R, but I was able to produce a dataset that throws an error and has the same structure as the data I'm using:
library(lme4)
library(dplyr)
library(tidyr)
library(broom)
Bird<-c(rep(c(0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0),10))
Stop<-c(rep(seq(1,10), 20))
Count<-c(rep(c(rep(c(1,2), each=10)), each=10))
Route<-c(rep(seq(1,10), each=20))
X1<-rnorm(200, 50, 10)
X2<-rnorm(200, 10, 1)
X3<-c(rep(c(0),200))#trouble maker variable
Data<-data.frame(cbind(Bird, Stop, Count, Route, X1, X2, X3))
Data%>%
gather(Variable, Value, 5:7)%>%
group_by(Variable)%>%
do(glance(glmer(Bird~Value+Stop+(1+Stop|Route/Count), data=., family=binomial)))
The last variable produces an error, so there is no output. What I would like is for it to produce NA values in the output if this occurs, or just skip that variable. I've tried using 'try' to blow past the troublemaker variable:
do(try(glance(glmer(Bird~Value+Stop+(1+Stop|Route/Count), data=., family=binomial))))
which it does, but still an output is not produced because it can't coerce a 'try-error' to a data.frame. Unfortunately there is no tryharder function. I've tried some if statements which make sense to me but not the computer. I'm sure I'm not doing it right, but if for example I use:
try(glance(glmer(Bird~Value+Stop+(1+Stop|Route/Count), data=., family=binomial)))->mod
if(is.data.frame(mod)){do(mod)}
I get subscript out of bounds errors. Thanks very much for any input you can provide!
Use tryCatch before the call to glance:
zz = Data %>%
  gather(Variable, Value, 5:7) %>%
  group_by(Variable) %>%
  do(aa = tryCatch(glmer(Bird~Value+Stop+(1+Stop|Route/Count), data=.,
                         family=binomial), error = function(e) data.frame(NA)))
zz %>%
  glance(aa)
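The general pattern (catch the error, return a placeholder row, keep going) can be sketched in base R as below. This is a simplified illustration, not the glmer workflow itself: fit_one() is a hypothetical helper that uses lm() and deliberately fails on a constant predictor, standing in for the model that errors on X3.

```r
# Hypothetical data; x3 is constant, like the troublemaker variable above
set.seed(1)
dat <- data.frame(y = rnorm(20), x1 = rnorm(20), x2 = rnorm(20), x3 = 0)

# Stand-in fitting function (an assumption): errors on constant predictors
fit_one <- function(v) {
  if (var(dat[[v]]) == 0) stop("predictor is constant")
  m <- lm(reformulate(v, response = "y"), data = dat)
  data.frame(variable = v, r.squared = summary(m)$r.squared)
}

# Wrap the fit so that an error yields a one-row data frame holding NA
safe_fit <- function(v) {
  tryCatch(
    fit_one(v),
    error = function(e) data.frame(variable = v, r.squared = NA_real_)
  )
}

# One row per variable; the failing variable is kept with NA
results <- do.call(rbind, lapply(c("x1", "x2", "x3"), safe_fit))
```

Returning a data frame with the same columns from the error handler is what lets the rows bind together cleanly, which addresses the "can't coerce a 'try-error' to a data.frame" problem described in the question.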
