LME error in model.frame.default ... variable lengths differ - r

I am trying to run a random effects model with lme. It is part of a larger function, and I want it to be flexible so that I can pass the fixed (and ideally random) effects variable names to the lme function as variables. get() worked great for this when I started with lm, but with lme it only seems to throw the ambiguous "Error in model.frame.default(formula = ~var1 + var2 + ID, data = list( : variable lengths differ (found for 'ID')." I'm stumped: the data are the same lengths, there are no NAs in this data or the real data, ...
set.seed(12345) #because I got scolded for not doing this previously
var1="x"
var2="y"
exdat<-data.frame(ID=c(rep("a",10),rep("b",10),rep("c",10)),
x = rnorm(30,100,1),
y = rnorm(30,100,2))
#exdat<-as.data.table(exdat) #because the data are actually in a dt, but that doesn't seem to be the issue
Works great
lm(log(get(var1))~log(get(var2)),data=exdat)
lme(log(y)~log(x),random=(~1|ID), data=exdat)
Does not work
lme(log(get(var1,pos=exdat))~log(get(var2)),random=(~1|ID), data=exdat)
Does not work, but throws a new error code: "Error in model.frame.default(formula = ~var1 + var2 + rfac + exdat, data = list( : invalid type (list) for variable 'exdat'"
rfac="ID"
lme(log(get(var1))~log(get(var2)),random=~1|get(rfac,pos=exdat), data=exdat)

Part of the problem seems to be with the nlme package. If you can consider using lme4, the desired results can be obtained with:
lme4::lmer(log(get(var1)) ~ log(get(var2)) + (1 | ID),
data = exdat)
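If staying with nlme is preferred, one common workaround is to build the formulas from the variable names as strings instead of calling get() inside the formula. The following is a minimal sketch under that assumption, not a confirmed fix from the original thread. The names response, predictor, fixed_formula and random_formula are hypothetical, and the model reproduces the lme(log(y) ~ log(x), ...) call that the question reports as working.

```r
library(nlme)

# Same example data as in the question
set.seed(12345)
exdat <- data.frame(ID = rep(c("a", "b", "c"), each = 10),
                    x = rnorm(30, 100, 1),
                    y = rnorm(30, 100, 2))

# Variable names held as strings (hypothetical names for illustration)
response  <- "y"
predictor <- "x"
rfac      <- "ID"

# Paste the names into formula text, then convert to formula objects
fixed_formula  <- as.formula(paste0("log(", response, ") ~ log(", predictor, ")"))
random_formula <- as.formula(paste0("~ 1 | ", rfac))

# lme receives ordinary formulas that refer to columns of exdat
fit <- lme(fixed = fixed_formula, random = random_formula, data = exdat)
fixef(fit)
```

Because the stored call refers to fixed_formula rather than the expanded formula, functions that re-evaluate the call (such as update()) may need those objects to still exist.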

Related

How do you compute average marginal effects for glm.cluster models?

I am looking for a way to compute average marginal effects with clustered standard errors, which I seem to be having a few problems with. My model is as follows:
cseLogit <- miceadds::glm.cluster(data = data_long,
formula = follow ~ f1_distance + f2_distance + PolFol + MediaFol,
cluster = "id",
family = binomial(link = "logit"))
Where the dependent variable is binary (0/1) and all explanatory variables are numeric. I've tried two different ways of getting average marginal effects. The first one is:
marginaleffects <- margins(cseLogit, vcov = your_matrix)
Which gives me the following error:
Error in find_data.default(model, parent.frame()) :
'find_data()' requires a formula call
I've also tried this:
marginaleffects <- with(cseLogit, margins(glm_res, vcov=vcov))
which gives me this error:
Error in eval(predvars, data, env) :
object 'f1_distance' was not found
In addition: warnings:
1: In dydx.default(X[[i]], ...) :
Class of variable, f1_distance, is unrecognized. Returning NA.
2: In dydx.default(X[[i]], ...) :
Class of variable, f2_distance, is unrecognized. Returning NA.
Can you tell me what I'm doing wrong? If I haven't provided enough information, please let me know. Thanks in advance.

Unable to plot p values when using facet.by from ggsurvplot. Error message: "variable lengths differ"

I have a problem that I don't know how to solve, and it seems to be related to my data set (or is it?). I am able to plot different p values with facet.by when I use your example from issue#205 with the "colon" data set. However, it does not work with my data set, which is available on my Github profile here (https://github.com/CroixJeremy2/Data-frame-for-stack-overflow.git).
Expected behavior
I would like to be able to plot different p values as in issue#205 with my data set.
Actual behavior
I am only able to plot the curves via facet.by. But I can't plot the p values that should be automatically calculated for a log-rank test. Instead, an error message is returned in my R console saying:
"Error in model.frame.default(formula = Survival ~ Sex, data = list(ID = c(147L, :"
"les longueurs des variables diffèrent (trouvé pour 'Sex')"
# the last line translated from French to English means:
"variable lengths differ (found for Sex)"
Steps to reproduce the problem
library(survival)
Survival = Surv(time = D$Age, event = D$outcome)
library(survminer)
fit = survfit(data = D, formula = Survival ~ Sex + Genotype)
ggsurvplot(fit = fit, data = D, pval = TRUE, facet.by = 'Genotype') #error message
ggsurvplot(fit = fit, data = D, facet.by = 'Genotype') #curves can be plotted
Remarks
Note that the survdiff() function works perfectly on my data sets in order to calculate p values from log-rank tests. Therefore, I do not know if I am doing something wrong in ggsurvplot() (most likely hypothesis) or if there is something wrong in the ggsurvplot() function itself (unlikely).
survdiff(data = D, subset = D$Ctrl, formula = Survival ~ Sex)
survdiff(data = D, subset = D$nKO, formula = Survival ~ Sex)
survdiff(data = D, subset = D$CRE_Ctrl, formula = Survival ~ Sex)
#works fine, p value returned, no message/warning/error returned.
Moreover, the variable lengths seem equal, and I don't have any "NA" values in my data frame, so I really don't understand why I get this error message.
sapply(D,function(x) length(x))
# length = 298 for all my variables...
Thanks in advance for your response and help,
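One possible cause, which the thread itself does not confirm, is that Survival is built outside the data frame. If the data are split into subsets (as faceting presumably does) and the formula is re-evaluated on each subset, the full-length Survival object no longer matches the shorter Sex column. The sketch below illustrates that mechanism with survdiff() and the lung data set shipped with the survival package. The subset on age > 60 and the object names S, sub, bad and ok are assumptions made for illustration only.

```r
library(survival)

# Response built outside the data frame: always full length (228 rows)
S <- Surv(lung$time, lung$status)

# A subset of the data, standing in for one facet panel
sub <- lung[lung$age > 60, ]

# S has 228 rows but sub$sex is shorter, so model.frame() stops with
# "variable lengths differ (found for 'sex')"
bad <- try(survdiff(S ~ sex, data = sub), silent = TRUE)

# Building the response inside the formula evaluates it within 'sub'
ok <- survdiff(Surv(time, status) ~ sex, data = sub)
ok
```

Under this assumption, writing the fit as survfit(Surv(Age, outcome) ~ Sex + Genotype, data = D) would let each subset build a response of matching length.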

Error when applying lsmeans to model

I have a dataset (data1) as below:
T,D,P,R
1,0,1,1
2,1,1,3
3,0,1,7
1,0,2,2
2,1,2,3
3,1,2,7
1,0,3,1
2,1,3,4
3,1,3,7
1,1,4,1
2,1,4,3
3,0,4,7
To start with, all variables except the response (which is R) are specified as factors (so D, T and P are factors). I then fit the model:
model1 <- lme(R~D+T,random=~1|P, method="REML",data=data1)
Then I want to use lsmeans to average across factor levels of D and P.
To do this I type:
lsm_model <- lsmeans(model1,~T)
Doing this, I get two error messages:
"Error in [.data.frame(tbl, , vars, drop = FALSE) :
undefined columns selected
And
Error in ref.grid(object = list(modelStruct = list(reStruct = list(Plot = -11.7209195395097)), :
Perhaps a 'data' or 'params' argument is needed"
Why can't I fit lsmeans to this model? I have used lsmeans many times without any problems.

Back-transforming contrast lstrends results

I calculated a linear mixed model using the packages lme4 and lsmeans with the lmer-function, where I have one dependent variable rv and the interacting factors treatment, time, age, and race. I'm interested in the response variable change over time, that's why I use the lstrends-function. So far so good. The problem is, I have to square root the response variable in order to fit the model properly. But the pairs-function only gives out a response to the square root of the rv, hard to interpret!
So I tried to back-transform the response variable after pairs:
model.lmer <- lmer(sqrt(rv) ~ treat*time*age*race + (1|individual), data=mydata)
model.lst <- lstrends(model.lmer, ~treat | age*race , var = "time", type="response")
pairs(model.lst, type="response")
This obviously doesn't work, as stated by the package itself:
# Transformed response
sqwarp.rg <- ref.grid(update(warp.lm, sqrt(breaks) ~ .))
summary(sqwarp.rg)
# Back-transformed results - compare with summary of 'warp.rg'
summary(sqwarp.rg, type = "response")
# But differences of sqrts can't be back-transformed
summary(pairs(sqwarp.rg, by = "wool"), type = "response")
# We can do it via regrid
sqwarp.rg2 <- regrid(sqwarp.rg)
summary(sqwarp.rg2) # same as for sqwarp.rg with type = "response"
pairs(sqwarp.rg2, by = "wool")
It could look like the following code:
summary(pairs(lsmeans(rg.regrid, ~ treat | race*age, trend="time")), type="response")
The problem is, I can't alter the reference grid for lstrends, just for lsmeans, because the first argument in lstrends (or in lsmeans with trend="time") requires the linear mixed effect model (model.lmer) instead of just the reference grid, as in lsmeans without the trend argument... That's probably why I can't back-transform the data with
This here sums up my problem pretty well:
model.sqrt <- lmer(sqrt(rv) ~ time*treat*race*age, data=mydata)
rg <- ref.grid(model.sqrt)
rg.regrid <- regrid(rg)
summary(pairs(lsmeans(rg.regrid, ~treat | race*age*time), type = "response"))
Works perfectly.
summary(pairs(lsmeans(rg.regrid, ~treat | race*age, trend="time"), type = "response"))
Gives the following error:
Error in summary(pairs(lsmeans(rg.regrid, ~vns | gen * age, trend = "time"), :
error in evaluating the argument 'object' in selecting a method for function 'summary': Error in data[[var]] : subscript out of bounds
How to avoid the error and still be able to back-transform my data?
It does NOT seem to be possible at all: the back-transformation would be a complicated procedure without any obvious pattern. That's what the creator of the package said.

Subsetting data breaks GLM

I have a GLM Logit regression that works correctly, but when I add a subset argument to the GLM command, I get the following error:
invalid type (list) for variable '(weights)'.
So, the following command works:
glm(formula = A ~ B + C,family = "binomial",data = Data)
But the following command yields the error:
glm(formula = A ~ B + C,family = "binomial",data = Data,subset(Data,D<10))
(I realize that it may be difficult to answer this without seeing my data, but any general help on what may be causing my problem would be greatly appreciated)
Try subset=D<10 instead (you don't need to specify Data again, it is implicitly used as the environment for the subset argument). Because you haven't named the argument, R is interpreting it as the weights argument (which is the next argument after data).
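A minimal sketch of the named-argument form, using simulated data since the original Data is not shown. The column names follow the question, but the values and the sample size are made up for illustration.

```r
# Simulated stand-in for the question's data (values are assumptions)
set.seed(1)
Data <- data.frame(B = rnorm(200),
                   C = rnorm(200),
                   D = sample(1:20, 200, replace = TRUE))
Data$A <- rbinom(200, 1, plogis(0.5 * Data$B - 0.3 * Data$C))

# Name the argument so it is not matched positionally to 'weights';
# D is looked up inside Data, so there is no need to write Data$D
fit <- glm(A ~ B + C, family = binomial, data = Data, subset = D < 10)

# The model uses only the rows where D < 10
nobs(fit)
sum(Data$D < 10)
```

Passing subset(Data, D < 10) as the data argument instead would give the same fit, since both approaches drop the same rows before estimation.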
