I have a function that computes a table and a model (and more...):
fun <- function(x, y, formula = y ~ x, data = NULL) {
out <- list()
out$tab <- table(x, y)
out$mod <- glm(formula = formula,
family = binomial,
data = data)
out
}
In the formula, I need to use x and y as provided in the function call (e.g. x = DF1$x and y = DF1$y) and variables from another data frame (e.g. a and b from DF2). It fails with my naive function:
fun(x = DF1$x,
y = DF1$y,
formula = y ~ x + a + b,
data = DF2)
# Error in eval(predvars, data, env) : object 'y' not found
How can I make glm search x and y from the function environment? I guess this issue is related to non-standard evaluation and/or scoping, but I have no idea how to fix it.
Data for the example:
smp <- function(x = c(TRUE, FALSE),
size = 1e2) {
sample(x = x,
size = size,
replace = TRUE)
}
DF1 <- data.frame(x = smp(),
y = smp())
DF2 <- data.frame(a = smp(x = LETTERS),
b = smp(x = LETTERS))
Why not just add x and y into data in the function?
fun <- function(x, y, formula = y ~ x, data = NULL) {
# data may be NULL, so check its row count only when it is supplied
if (length(x) != length(y) ||
(!is.null(data) && nrow(data) != length(x))) stop("x, y and data need to be the same length.\n")
data$x <- x
data$y <- y
out <- list()
out$tab <- table(x, y)
out$mod <- glm(formula = formula,
family = binomial,
data = data)
out
}
fun(x = DF1$x,
y = DF1$y,
formula = y ~ x + a + b,
data = DF2)
# $tab
# y
# x FALSE TRUE
# FALSE 27 29
# TRUE 21 23
#
# $mod
# Call: glm(formula = formula, family = binomial, data = data)
#
# Coefficients:
# (Intercept) xTRUE aB aC aD aE aF aG aH aI aJ
# 3.2761 -1.8197 0.3409 -93.9103 -2.0697 20.6813 -41.5963 -1.1078 18.5921 -1.0857 -36.5442
# aK aL aM aN aO aP aQ aR aS aT aU
# -0.5730 -92.5513 -3.0672 22.8989 -53.6200 -0.9450 0.4626 -3.0672 0.3570 -22.8857 1.8867
# aV aW aX aY aZ bB bC bD bE bF bG
# 2.5307 19.5447 -90.5693 -134.0656 -2.5943 -1.2333 20.7726 110.6790 17.1022 -0.5279 -1.2537
# bH bI bJ bK bL bM bN bO bP bQ bR
# -21.7750 114.0199 20.3766 -42.5031 41.1757 -24.3553 -2.0310 -25.9223 -2.9145 51.2537 70.2707
# bS bT bU bV bW bX bY bZ
# -4.7728 -3.7300 -2.0333 -0.3906 -0.5717 -4.0728 0.8155 -4.4021
#
# Degrees of Freedom: 99 Total (i.e. Null); 48 Residual
# Null Deviance: 138.5
# Residual Deviance: 57.73 AIC: 161.7
#
# Warning message:
# glm.fit: fitted probabilities numerically 0 or 1 occurred
#
@DaveArmstrong's answer, which has already been accepted, is correct. This answer explains why the original version of the code raised an error.
@Thomas quoted the docs in a comment:
If not found in data, the variables are taken from environment(formula), typically the environment from which glm is called.
The word "typically" is key here. The exact rule is that the environment attached to the formula is the one where the formula expression is first evaluated, because ~ is actually a function. It attaches the evaluation environment to the formula object, and that's the one that stays with it as you pass the object around.
If you run glm(y ~ x), the formula is evaluated wherever you call that, so that's the "typical" case.
In your example, you created the formula object when you called
fun(x = DF1$x,
y = DF1$y,
formula = y ~ x + a + b,
data = DF2)
That means the global environment (where you made this call) is attached to the formula, and there's no y there, so you got the error.
If you had used the default formula = y ~ x by calling
fun(x = DF1$x,
y = DF1$y,
data = DF2)
with no formula argument, it would work, because default arguments are evaluated in the evaluation frame of the function that uses them. Since fun() has local variables x and y created by the arguments, that would be fine.
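The rule can be checked directly with a small sketch. The names make_formula, f_here and f_local are made up for illustration; the only claim being tested is that ~ stores the environment in which it was evaluated.

```r
make_formula <- function() {
  y <- c(TRUE, FALSE)  # exists only in this function's frame
  y ~ x                # `~` records the frame it is evaluated in
}

here    <- environment()   # the environment this code is running in
f_here  <- y ~ x           # created here, so it carries `here`
f_local <- make_formula()  # created inside the function

# The formula created here carries this environment
same_as_here  <- identical(environment(f_here), here)             # TRUE
# The formula created in the function carries the function's frame
local_differs <- !identical(environment(f_local), here)           # TRUE
# and that frame is where the local y lives
local_has_y   <- exists("y", envir = environment(f_local),
                        inherits = FALSE)                         # TRUE
```

This mirrors the two calls above: a formula typed in the call to fun() behaves like f_here, while the default argument behaves like f_local.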
You also asked why data = NULL would work in @DaveArmstrong's function. He added x and y to it using
data$x <- x
data$y <- y
If you start with data = NULL, the first line changes it to a list containing x and the second line adds a y component, so you end up with a list containing x and y and that's fine for data in glm().
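A quick way to see this for yourself (a minimal sketch with made-up values, not the data from the question):

```r
data <- NULL
data$x <- c(TRUE, FALSE, TRUE)   # NULL becomes a list with component x
data$y <- c(FALSE, FALSE, TRUE)  # a second component y is added

is_list    <- is.list(data)
components <- names(data)
```

After these assignments, data is a plain list (not a data frame) with components x and y.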
I fitted a lasso logistic model with interaction terms. Then I wanted to visualize those interactions using an interaction plot.
I tried to find an R function that plots interactions for glmnet models, but I couldn't find one.
Is there an R package that plots interactions for the lasso?
Since I couldn't find any, I tried to do it manually by plotting the predicted values, but I am getting an error.
My code is as follows:
require(ISLR)
require(glmnet)
y <- Smarket$Direction
x <- model.matrix(Direction ~ Lag1 + Lag4* Volume, Smarket)[, -1]
lasso.mod <- cv.glmnet(x, y, alpha=1,family="binomial",nfolds = 5, type.measure="class",
lambda = seq(0.001,0.1,by = 0.001))
lasso.mod$lambda.min
pred = expand.grid(Lag1 = median(Smarket$Lag1),
Lag4 = c(-0.64,0.0385,0.596750),
Volume = seq(min(Smarket$Volume), max(Smarket$Volume), length=100))
lasso.mod1 <- glmnet(x, y, alpha=1,family="binomial",
lambda = lasso.mod$lambda.min)
pred$Direction = predict(lasso.mod1, newx=pred,
type="response", s= lasso.mod$lambda.min)
I am getting this error:
Error in cbind2(1, newx) %*% nbeta :
not-yet-implemented method for <data.frame> %*% <dgCMatrix>
Can anyone suggest a way to fix this?
Thank you
The predict.glmnet documentation says newx must be a matrix, and you need to supply the interaction values yourself.
library(dplyr)
pred = expand.grid(Lag1 = median(Smarket$Lag1),
Lag4 = c(-0.64,0.0385,0.596750),
Volume = seq(min(Smarket$Volume), max(Smarket$Volume), length=100)) %>%
mutate(`Lag4:Volume` = Lag4 * Volume) # preparing interaction values
pred$Direction = predict(lasso.mod1, newx = as.matrix(pred), # convert to matrix
type = "link", s= lasso.mod$lambda.min)
[EDITED]
I overlooked a more general and better way: build newx with model.matrix, which computes the interaction column for you.
pred = expand.grid(Lag1 = median(Smarket$Lag1),
Lag4 = c(-0.64,0.0385,0.596750),
Volume = seq(min(Smarket$Volume), max(Smarket$Volume), length=100))
pred$Direction = predict(lasso.mod1,
newx = model.matrix( ~ Lag1 + Lag4* Volume, pred)[, -1],
type="response", s= lasso.mod$lambda.min)
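To see what model.matrix produces here without fitting anything, here is a small base-R sketch. The grid values are made up for illustration; only the formula is taken from the code above.

```r
# A tiny prediction grid with the same column names as the model
grid <- expand.grid(Lag1 = 0.5,
                    Lag4 = c(-1, 1),
                    Volume = c(1, 2))

# Same right-hand side as the fitted model; [, -1] drops the intercept column
newx <- model.matrix(~ Lag1 + Lag4 * Volume, grid)[, -1]

is_mat     <- is.matrix(newx)   # a matrix, which is what predict() needs
cols       <- colnames(newx)    # includes the "Lag4:Volume" column
# The interaction column is the product of its two parent columns
inter_okay <- all.equal(unname(newx[, "Lag4:Volume"]),
                        grid$Lag4 * grid$Volume)
```

Because the columns come from the same formula as the training matrix, they line up with the coefficients without any manual bookkeeping.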
I'm new to R programming and can't find a solution to an error that occurs when I use the nls function.
I am trying to fit the data from an ecdf (the values are extracted and saved in y) to this model function with four parameters:
fitsim <- nls(y ~ exp(-(((a-Abfluss)/(c*(Abfluss-b)))^d)),
start = list( a=max(Abfluss), b=min(Abfluss),
c=3, d=1))
When I run nls, this error occurs:
Error in numericDeriv(form[[3L]], names(ind), env) :
Missing value or an infinity produced when evaluating the model
My vectors Abfluss and y have the same length. The aim is to estimate the parameters.
Maybe the problem is that the model only works under these conditions:
c>0, d>0, b<=Abfluss<=a.
I already tried adding na.rm = TRUE. Then another error appears:
Error in model.frame.default(formula = ~y + Abfluss, na.rm = TRUE) :
variable lengths differ (found for '(na.rm)')
I would appreciate any help or advice.
For a better understanding, I attach my whole code with all the data:
time<-c(1851:2013)
Abfluss<- c(4853,4214,5803,3430,4645,4485,3100,4797,4030,3590,5396,9864,3683,4485,4064,3420,5396,
4895,3931,4238,3790,3520,4263,5474,3790,4700,5109,4525,4007,6340,4993,6903,8160,3600,3480,3540,
3540,4565,3333,7764,
4755,7940,3112,3169,4435,5365,9422,3150,10500,4512,3790,4618,6126,3769,3704,
5938,5669,4552,5458,5854,4867,6057,4783,5753,5736,4618,6091,5820,5007,7984, 4435,
4645,7465,5820,5988,6022,4300,6062,3302,4877,4586,5275,4410,3174,4966,4939,4638,
5541,5760,6495,5435,4952,4912,6092,5182,5820,5129,6436,6648,3063,5550,5160,4400,
9600,6400,6380,6300,6180,6899,4360,5550,4580,3894,5277,7520,6780,5100,5430,4550,
6620,4050,4560,5290,6610,8560,4943,6940,4744,6650,5700,7440,6200,4597,3697,7300,
4644,5456,6302,3741,5398,9500,6296,5279,5923,6412,6559,6559,5891,5737,5010,5790,
10300,4150,4870,6740,7560,8010,5120,8170,7430, 7330,5900, 11150)
#EV4-Distribution
dEV4 <- function(x, a, b, c,d) {
# use the argument x rather than the global Abfluss
m<-exp(-(((a-x)/(c*(x-b)))^d))
return(m)
}
#Simulation example
Sim<-dEV4(Abfluss,a=max(Abfluss),b=min(Abfluss), c=3, d=1)
dEV4cdf<-cbind(Abfluss,Sim)
#Empirical cdf
p = ecdf(Abfluss)
y<- p(Abfluss) #Extracting of cumulated probabilities
m<-cbind(Abfluss,y)
#plot EV4 and ecdf
plot(dEV4cdf, type="p",main="EV4")
plot(ecdf(Abfluss), add=T)
#Fitting EV4 nls
fitsim <- nls(y ~ exp(-(((a-Abfluss)/(c*(Abfluss-b)))^d)),
start = list( a=max(Abfluss), b=min(Abfluss),
c=3, d=1), na.rm=TRUE)
Do not use starting values that lie on the boundary of the feasible region, and try nlxb from the nlmrt package instead (it takes the same arguments, except that data = is not optional):
library(nlmrt)
fitsim <- nlxb(y ~ exp(-(((a - Abfluss) / (c * (Abfluss - b))) ^ d)),
data = data.frame(y, Abfluss),
start = list(a = 2 * max(Abfluss), b = min(Abfluss) / 2, c = 3, d = 1))
plot(y ~ Abfluss, pch = 20)
o <- order(Abfluss)
fit <- y - fitsim$resid
lines(fit[o] ~ Abfluss[o], col = "red")
giving:
nlmrt class object: x
residual sumsquares = 0.02908 on 163 observations
after 5001 Jacobian and 6060 function evaluations
name coeff SE tstat pval gradient JSingval
a 20047.7 NA NA NA 1.119e-07 3251
b -1175384 NA NA NA 1.432e-09 0.1775
c 0.0129414 NA NA NA -0.1296 5.808e-06
d 12.146 NA NA NA -2.097e-06 6.798e-11
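The point about boundary starting values can be illustrated with base R alone. This is a sketch under assumptions: ev4 is a hypothetical helper that copies the model expression from the question, x holds three values taken from Abfluss, and the nudge of b stands in for the small parameter perturbations that, as far as I understand, nls uses to approximate derivatives (hence numericDeriv in the error message).

```r
# Model expression from the question, written as a function of x
ev4 <- function(x, a, b, c, d) exp(-(((a - x) / (c * (x - b)))^d))

x <- c(3063, 5000, 11150)  # smallest, a middle, and largest value of Abfluss

# Exactly at the boundary start (a = max, b = min) the values are still finite
at_start <- ev4(x, a = max(x), b = min(x), c = 3, d = 1)

# Nudging b slightly above min(x) makes x - b negative for the smallest x,
# the exponent becomes hugely positive, and exp() overflows to Inf
nudged <- ev4(x, a = max(x), b = min(x) * (1 + 1e-8), c = 3, d = 1)

# Starting values strictly inside the feasible region survive the same nudge
interior <- ev4(x, a = 2 * max(x), b = (min(x) / 2) * (1 + 1e-8), c = 3, d = 1)

finite_at_start <- all(is.finite(at_start))
finite_nudged   <- all(is.finite(nudged))
finite_interior <- all(is.finite(interior))
```

Here finite_nudged is FALSE while the other two are TRUE, which is consistent with the "missing value or an infinity" error when starting on the boundary.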
I am fitting a multivariate mixed model in R using the nlme package. Suppose that x and y are response variables for longitudinal data, and that the within-group errors are assumed to be correlated. The residual error matrix is presented as:
So my question is: how do I include the correlation in the lme function?
I tried corr = corComSymm(from =~ 1 | x) and corr = corAR1(from =~ 1 | x), but neither worked.
Here is an example:
# visiting time by months
time = rep(c(0,3,6,9), times = 50)
# subjects
subject = rep(1:50, each = 4)
# first response variable "identity"
x = c(rep(0, 100), rep(1,100))
# second response variable "identity"
y = c(rep(1, 100), rep(0,100))
# values of both response variables (x_1, x_2)
value = c(rnorm(100,20,1),rnorm(100,48,1))
# indicator of which response variable (x_1, x_2) each value belongs to
variable = factor(c(rep(0,150),rep(1,50)), label=c("X","Y"))
df = data.frame(subject , time, x,y,value, variable)
library(nlme)
# fit the model that each response variable has intercept and slope (time) for each random and fixed effects
# as well as fixed effects slopes for sex and lesion, and each response has different variance
f= lme(value ~ -1 + x + y + x:time + y:time , random = ~ -1 + (x + y) + time:( x + y)|subject ,
weights = varIdent(form=~1| x),corr = corAR1(from = ~ 1|x), control=lmeControl(opt="optim"), data =df)
Error in corAR1(from = ~1 | x) : unused argument (from = ~1 | x)
Any suggestions?
I found the website below helpful; I am posting it here in case someone has this problem in the future.
https://rpubs.com/bbolker/3336
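Separately from the linked write-up, the error message itself points at a spelling issue: the argument of the nlme correlation constructors is form, not from, and the compound-symmetry constructor is corCompSymm. A minimal sketch, assuming (my assumption, not something stated in the question) that the correlation should be grouped by subject:

```r
library(nlme)  # recommended package, normally installed alongside R

# Argument names accepted by corAR1(): `form` is there, `from` is not
ar1_args <- names(formals(corAR1))

# AR(1) correlation within each subject (rows assumed ordered by time)
ar1 <- corAR1(form = ~ 1 | subject)

# Compound symmetry within each subject
cs <- corCompSymm(form = ~ 1 | subject)
```

Either object can then be passed as the correlation argument of lme. Whether the model in the question converges with it is a separate matter that this sketch does not address.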
I have written a function to run phylogenetic generalized least squares, and everything looks like it should work fine, but for some reason, a specific variable which is defined in the script (W) keeps coming up as undefined. I have stared at this code for hours and cannot figure out where the problem is.
Any ideas?
myou <- function(alpha, datax, datay, tree){
data.frame(datax[tree$tip.label,],datay[tree$tip.label,],row.names=tree$tip.label)->dat
colnames(dat)<-c("Trait1","Trait2")
W<-diag(vcv.phylo(tree)) # Weights
fm <- gls(Trait1 ~ Trait2, data=dat, correlation = corMartins(alpha, tree, fixed = TRUE),weights = ~ W,method = "REML")
return(as.numeric(fm$logLik))
}
corMartins2<-function(datax, datay, tree){
data.frame(datax[tree$tip.label,],datay[tree$tip.label,],row.names=tree$tip.label)->dat
colnames(dat)<-c("Trait1","Trait2")
result <- optimize(f = myou, interval = c(0, 4), datax=datax,datay=datay, tree = tree, maximum = TRUE)
W<-diag(vcv.phylo(tree)) # Weights
fm <- gls(Trait1 ~ Trait2, data = dat, correlation = corMartins(result$maximum, tree, fixed =T),weights = ~ W,method = "REML")
list(fm, result$maximum)}
#test
require(nlme)
require(phytools)
simtree<-rcoal(50)
as.data.frame(fastBM(simtree))->dat1
as.data.frame(fastBM(simtree))->dat2
corMartins2(dat1,dat2,tree=simtree)
returns "Error in eval(expr, envir, enclos) : object 'W' not found"
even though W is specifically defined!
Thanks!
The error occurs in the gls calls in myou and corMartins2: you have to pass W as a column of dat, because gls looks for it there (when you write weights = ~ W as a formula, it looks for dat$W and cannot find it).
Just change data=dat to data=cbind(dat,W=W) in both functions.
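The same behaviour can be tried on a toy problem without any phylogenetic packages. This is only a sketch: fit_weighted, toy and w are made-up names, and the model is a plain weighted gls rather than the corMartins fit from the question.

```r
library(nlme)  # recommended package, normally installed alongside R

fit_weighted <- function(d, w, attach_weights) {
  W <- w                                    # local variable, as in the question
  if (attach_weights) d <- cbind(d, W = W)  # make W a column of the data
  gls(y ~ x, data = d, weights = ~ W, method = "REML")
}

set.seed(42)
toy   <- data.frame(x = 1:20)
toy$y <- 2 + 0.5 * toy$x + rnorm(20)
w     <- runif(20, 0.5, 2)                  # positive variance weights

# Without the column the fit is expected to fail as in the question;
# tryCatch stores the error message instead of stopping the script
without_col <- tryCatch(fit_weighted(toy, w, FALSE),
                        error = function(e) conditionMessage(e))

# With W bound into the data, gls can find the weights
with_col <- fit_weighted(toy, w, TRUE)
```

Inspecting without_col shows what happened when W was only a local variable, and with_col is the fitted model once W is a column of the data.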
The example is not reproducible for me, as lowerB and upperB are not defined. However, the following may work for you, cbinding dat with W:
myou <- function(alpha, datax, datay, tree){
data.frame(datax[tree$tip.label,],datay[tree$tip.label,],row.names=tree$tip.label)->dat
colnames(dat)<-c("Trait1","Trait2")
W<-diag(vcv.phylo(tree)) # Weights
### cbind W to dat
dat <- cbind(dat, W = W)
fm <- gls(Trait1 ~ Trait2, data=dat, correlation = corMartins(alpha, tree, fixed = TRUE),weights = ~ W,method = "REML")
return(as.numeric(fm$logLik))
}
corMartins2<-function(datax, datay, tree){
data.frame(datax[tree$tip.label,],datay[tree$tip.label,],row.names=tree$tip.label)->dat
colnames(dat)<-c("Trait1","Trait2")
result <- optimize(f = myou, interval = c(lowerB, upperB), datax=datax,datay=datay, tree = tree, maximum = TRUE)
W<-diag(vcv.phylo(tree)) # Weights
### cbind W to dat
dat <- cbind(dat, W = W)
fm <- gls(Trait1 ~ Trait2, data = dat, correlation = corMartins(result$maximum, tree, fixed =T),weights = ~ W,method = "REML")
list(fm, result$maximum)}
#test
require(nlme) # provides gls
require(phytools)
simtree<-rcoal(50)
as.data.frame(fastBM(simtree))->dat1
as.data.frame(fastBM(simtree))->dat2
corMartins2(dat1,dat2,tree=simtree)