Cross-validation of polynomial lm in R - error: lengths differ

I found other questions on this topic, such as this, but I keep getting the error message
Error in xy.coords(x, y, xlabel, ylabel, log) : 'x' and 'y' lengths
differ
Below is the code I am using:
library(DAAG)
attach(ultrasonic)
g.poly = lm(UR ~ poly(MD, 3), data = ultrasonic)
cv.poly <- cv.lm(ultrasonic, g.poly ,m=3, plotit=TRUE, printit=TRUE, dots=FALSE, seed=29)
Of course, the lengths are the same:
> length(UR)
[1] 214
> length(MD)
[1] 214
Note that in the same script, I perform another linear regression with crossvalidation, which works.
library(DAAG)
g.lin = lm(log(UR) ~ MD, data = ultrasonic)
cv.lin <- cv.lm(ultrasonic, g.lin ,m=3, plotit=TRUE, printit=TRUE, dots=FALSE, seed=29)
Any idea why the polynomial regression crossvalidation does not work?
EDIT
To get the data:
install.packages('nlsmsn')
library('nlsmsn')
data(Ultrasonic)
# Names differ: I am using a local copy named ultrasonic (lower-case u) with different column names, but the data are identical.
#UR = y
#MD = x

DAAG:::cv.lm obviously does not support everything you can do with lm, e.g., it does not support functions in the formula. You need to take an intermediate step.
mf <- as.data.frame(model.matrix(y ~ poly(x), data = Ultrasonic))
mf$y <- Ultrasonic$y
mf$`(Intercept)` <- NULL
#sanitize names
names(mf) <- make.names(names(mf))
#[1] "poly.x." "y"
g.poly.san <- lm(y ~ ., data = mf)
cv.poly <- cv.lm(mf, g.poly.san, m=3, plotit=TRUE, printit=TRUE, dots=FALSE, seed=29)
#works
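For comparison, here is a minimal sketch of 3-fold cross-validation written by hand in base R, which sidesteps cv.lm entirely and therefore allows poly(x, 3) directly in the formula. The data are simulated stand-ins (the column names x and y, the curve, and the noise level are assumptions, not the Ultrasonic data), so the resulting error value is illustrative only.

```r
# Simulated stand-in for the Ultrasonic data (column names x and y are assumed).
set.seed(29)
n <- 214
dat <- data.frame(x = runif(n, 0.5, 6))
dat$y <- 200 * exp(-0.6 * dat$x) + rnorm(n, sd = 3)

k <- 3
# Randomly assign each row to one of k folds of near-equal size.
fold <- sample(rep(seq_len(k), length.out = n))

cv_pred <- numeric(n)
for (i in seq_len(k)) {
  train <- dat[fold != i, ]
  test  <- dat[fold == i, ]
  # The polynomial basis is estimated on the training rows only;
  # predict() reuses that basis for the held-out rows.
  fit <- lm(y ~ poly(x, 3), data = train)
  cv_pred[fold == i] <- predict(fit, newdata = test)
}

# Cross-validated mean squared prediction error.
cv_mse <- mean((dat$y - cv_pred)^2)
cv_mse
```

This gives up cv.lm's plotting and printed fold summaries, but it works with any formula that lm() and predict() accept.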

Related

Invalid graphics state model4you

Following the model-based recursive partitioning in https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6015941/ I want to replicate the following code:
sim_data <- function(n=2000){
  x1 <- rnorm(n)
  x2 <- rbinom(n,1,0.3)
  x3 <- runif(n)
  x4 <- rnorm(n)
  t <- rbinom(n,1,0.5)
  z <- 1-x2+x1+2*(x1>=0)*x2*t-2*(x1<0)*x2*t
  pr <- 1/(1+exp(-z))
  y <- as.factor(rbinom(n,1,pr))
  data.frame(x1,x3,x2=as.factor(x2),x4, t=factor(t,labels=c("C","A")),y,z)
}
dt <- sim_data()
dt.num = as.data.frame(sapply(dt, as.numeric))
dt.num$y <- dt.num$y-1 #only to convert outcome 1,2 into 0,1
mbase <- glm(y~t, data=dt.num,
             family = binomial())
round(summary(mbase)$coefficients,3)
library("model4you")
pmtr <- pmtree(mbase, zformula = ~. ,
               data = dt.num,
               control = ctree_control(minbucket = 250))
plot(pmtr, terminal_panel = node_pmterminal(pmtr,
                                            plotfun = binomial_glm_plot,
                                            confint = TRUE))
However, the following inexplicable error occurs:
Error in .Call.graphics(C_palette2, .Call(C_palette2, NULL)) :
invalid graphics state
I looked for a solution in the post "Persistent invalid graphics state error when using ggplot2", but the problem persists.
Any clue?
Thank you in advance
When I tried to replicate this, I got a different error:
plot(pmtr, terminal_panel = node_pmterminal(pmtr, plotfun = binomial_glm_plot, confint = TRUE))
## Waiting for profiling to be done...
## Error in plotfun(mod = list(coefficients = c(`(Intercept)` = -0.16839363929017, :
## Plotting currently only works for models with a single factor covariate.
## We recommend using partykit or ggparty plotting functionalities!
The reason for this is that the panel function expects both the response and the treatment to be binary factors (as in dt). When you use binary numeric variables instead (as in dt.num) the model estimation in glm() leads to equivalent output but the plot() functionality is confused.
When I refit both the glm() and the pmtree() with dt rather than dt.num, everything works as intended for me and the plot is drawn.
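The difference between the two codings can be checked in base R without model4you. The sketch below uses a simplified simulation (only a treatment and a response; the names dt_fac, dt_num, m_fac and m_num are illustrative, not from the question) to show that glm() estimates the same treatment effect either way, while the column classes, which the plotting code relies on, differ.

```r
set.seed(1)
n <- 2000
t_num <- rbinom(n, 1, 0.5)
y_num <- rbinom(n, 1, plogis(-0.2 + 0.5 * t_num))

# Factor coding, as in dt.
dt_fac <- data.frame(t = factor(t_num, labels = c("C", "A")),
                     y = as.factor(y_num))
# Numeric coding, as in dt.num: the treatment becomes 1/2, the response 0/1.
dt_num <- data.frame(t = t_num + 1, y = y_num)

m_fac <- glm(y ~ t, data = dt_fac, family = binomial())
m_num <- glm(y ~ t, data = dt_num, family = binomial())

# The treatment coefficient agrees; only its name ("tA" vs "t") and the intercept differ.
coef(m_fac)
coef(m_num)

# The classes are what differ between the two data frames.
sapply(dt_fac, class)
sapply(dt_num, class)
```

So the fitted models are equivalent, and the data frame with factor columns is the one to pass to both glm() and pmtree() when the panel plot is wanted.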

R nls function, error in lhs - rhs non-numeric argument to binary operator

I am trying to use the nls (nonlinear least squares) function on some covid data. I think the error is in my "sigfunction", but I can't seem to figure out how to make it work.
covid <- read.csv("covid19.csv")
plot(covid$Algeria ~ covid$days,xlab = "Time (days)",ylab="Cases")
I know that the line should follow this function.
s = width, m = middle, a = height
sigfunction <- function(a,x,m,s){a*exp(((x-m)/s)^2)}
mod <- nls(y ~ sigfunction, start=list(m=70,s=60,a=30), trace=TRUE)
but the nls command gives me the error:
Error in lhs - rhs : non-numeric argument to binary operator
2. nlsModel(formula, mf, start, wts)
1. nls(y ~ sigfunction, start = list(m = 70, s = 60, a = 30), trace = TRUE)
Thanks to G. Grothendieck I fixed one problem in the function, but now when I run it I get this error:
Error in qr(.swts * gr) : dims [product 4] do not match the length of object [146]
I changed the last two lines to:
Algeria <- covid$Algeria
sigfunction <- function(a,x,m,s){a*exp(((x-m)/s)^2)}
mod <- nls(Algeria ~ sigfunction(a,x,m,s), start=list(x=75,m=70,s=60,a=30), trace=TRUE)
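The second error appears consistent with x being listed in start: nls() treats every name in start as a parameter to estimate, so x becomes a single number instead of the vector of days. A minimal sketch of the call with x taken from the data follows. It runs on simulated data (covid19.csv is not available here), the function name bell and the data frame d are hypothetical, and the minus sign in the exponent is an assumption made so that the curve has a peak; the function in the question has no minus sign.

```r
set.seed(42)
days <- 1:146

# Hypothetical bell-shaped curve: a = height, m = middle, s = width.
# The minus sign in the exponent is an assumption, not the original function.
bell <- function(a, x, m, s) a * exp(-((x - m) / s)^2)

# Simulated cases standing in for covid$Algeria.
cases <- bell(a = 30, x = days, m = 70, s = 20) + rnorm(length(days), sd = 0.5)
d <- data.frame(days = days, cases = cases)

# x is supplied from the data (days); only a, m and s appear in start.
mod <- nls(cases ~ bell(a, days, m, s), data = d,
           start = list(a = 25, m = 65, s = 25))
coef(mod)
```

With real data the starting values need to be reasonably close to the shape of the curve, otherwise nls() may fail to converge.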

Variables length differ on Step function r

I fitted a model using the lmer() function (it works well). I have 11 explanatory variables. Three of them, if present in the model, cause the step() function (from package lmerTest) to return the error: "Variables length differ (found on "...")" where "..." is the formula call.
I don't have any NA values in the data: there are 600 rows and all three of the problematic variables (H, I, J) are factors.
My code is:
library(purrr) ## for rdunif()
library(lmerTest)
data2 = as.data.frame(matrix(c(rdunif(600*7,1,5),
rdunif(600*3,0,1),
rdunif(600,1,9),
rep(c("a","b"),300)),
nrow = 600), byrow = FALSE)
names(data2) = c("A","B","C","D", "E","F","G","H","I","J","Z","M")
data2[,7:10] = lapply(data2[,7:10],factor)
data2[,c(1:6,11)] = lapply(data2[,c(1:6,11)],as.numeric)
mod1 = lmer(Z ~ A+B+C+D+E+F+G+
#H+
#I+
#J+
(1|M),data2)
step.mod1 = lmerTest::step(mod1) #it works
#
mod2 = lmer(Z ~ A+B+C+D+E+F+G+H+
#I+
#J+
(1|M),data2)
step.mod2 = lmerTest::step(mod2) #it does not work and returns: Variables length differ (found on "A+B+C+D+E+F+G+")
mod3 = lmer(Z ~ A+B+C+D+E+F+G+H+I+J+
(1|M),data2)
step.mod3 = lmerTest::step(mod3) #it does not work and returns: Variables length differ (found on "A+B+C+D+E+F+G+H+I+")
I know that this error is common when there are NAs, but what is the error in this case? How can I fix it?

Can the boxTidwell function handle binary outcome variables?

I initially wanted to run a boxTidwell() (found in the "car" package) analysis on my prospective Logistic Regression model (BinaryOutcomeVar ~ ContinuousPredVar + ContinuousPredVar^2 + ContinuousPredVar^3). I ran into issues:
Error in x - xbar : non-numeric argument to binary operator
In addition: Warning message:
In mean.default(x) : argument is not numeric or logical: returning NA
So, I created a reproducible example to demonstrate the error:
Doesn't work:
boxTidwell(formula = Treatment ~ uptake, other.x = ~ poly(x = colnames(CO2)[c(1,2,4)], degree = 2), data = CO2)
boxTidwell(y = CO2$Treatment, x = CO2$uptake)
Works:
boxTidwell(formula = prestige ~ income + education, other.x = ~ poly(x = women , degree = 2), data = Prestige)
I've been goofing around with the other.x parameter and am guessing that's the issue.
Question
So, does anyone know (1) whether the boxTidwell() function works with binary outcome variables, and (2) the logic behind other.x? I can't get my dummy example to work either.
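On the other.x part, the error text in the question matches what poly() does when it is given character data: colnames(CO2)[c(1,2,4)] is a vector of column names, not the column values. The short sketch below reproduces that in isolation with base R. It says nothing about whether boxTidwell() accepts a binary outcome.

```r
# colnames() returns a character vector, not the column values.
x_names <- colnames(CO2)[c(1, 2, 4)]
x_names

# poly() on character data fails when it tries to centre the values.
res <- suppressWarnings(try(poly(x_names, degree = 2), silent = TRUE))
inherits(res, "try-error")

# Referring to a numeric column itself works.
p <- poly(CO2$conc, degree = 2)
dim(p)
```

In a formula the numeric column would be referred to by its bare name, as in the Prestige example that works.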
After further searching, it looks like the car:::boxTidwell can't handle the binary outcome variable in the formula, but it can be hand coded:
require(MASS)
require(car)
# replace the file path and the two column names below with your own
d1<-read.csv("path for your csv file",sep=',',header=TRUE)
x<-d1$`explanatory variable name`
y<-d1$`dependent variable name`
#FIT IS DONE USING THE glm FUNCTION
m1res <- glm(y ~ x,family=binomial(link = "logit"))
coeff1<- coefficients(summary(m1res))
lnx<-x*log(x)
m2res <- glm(y ~ x+lnx ,family=binomial(link = "logit"))
coeff2<- coefficients(summary(m2res))
alpha0<-1.0
pvalue<-coeff2[3,4]
pvalue
beta1<-coeff1[2,1]
beta2<-coeff2[3,1]
iter<-0
err<-1
alpha<-alpha0 # so that alpha is defined even if the loop body never runs
# the iteration cap is a safeguard against a loop that never terminates
while (pvalue<0.1 && iter<100) {
  alpha <-(beta2/beta1)+alpha0
  err<-abs(alpha-alpha0)
  alpha0<-alpha
  mx<-x^alpha
  m1res <- glm(y ~ mx,family=binomial(link = "logit"))
  coeff1<- coefficients(summary(m1res))
  mlnx<-mx*log(x)
  m2res <- glm(y ~ mx+mlnx ,family=binomial(link = "logit"))
  coeff2<- coefficients(summary(m2res))
  pvalue<-coeff2[3,4]
  beta1<-coeff1[2,1]
  beta2<-coeff2[3,1]
  iter<- iter+1
}
# PRINT THE POWER TO CONSOLE
alpha
The above code is taken from:
https://sites.google.com/site/ayyalaprem/box-tidwelltransform
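To try the idea without a CSV file, the check that the loop starts from can be run on simulated data: add an x*log(x) term to the logistic model and look at its p-value. This is only a sketch; the sample size, the coefficients and the square-root relationship are made up for illustration.

```r
set.seed(7)
n <- 500
x <- runif(n, 0.5, 5)                           # strictly positive, as log(x) requires
y <- rbinom(n, 1, plogis(-2 + 1.5 * sqrt(x)))   # simulated outcome depends on sqrt(x)

# Add the x*log(x) term to the linear-in-x logistic model.
xlogx <- x * log(x)
fit <- glm(y ~ x + xlogx, family = binomial(link = "logit"))

# Wald p-value of the added term; a small value suggests the logit is not linear in x.
p_value <- coef(summary(fit))["xlogx", "Pr(>|z|)"]
p_value
```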

Bayes predict, subscript out of bounds

I'm having some problems with the predict function when using bayesglm. I've read some posts that say this problem may arise when the out of sample data has more levels than the in sample data, but I'm using the same data for the fit and predict functions. Predict works fine with regular glm, but not with bayesglm. Example:
control <- y ~ x1 + x2
# this works fine:
glmObject <- glm(control, myData, family = binomial())
predicted1 <- predict.glm(glmObject , myData, type = "response")
# this gives an error:
bayesglmObject <- bayesglm(control, myData, family = binomial())
predicted2 <- predict.bayesglm(bayesglmObject , myData, type = "response")
Error in X[, piv, drop = FALSE] : subscript out of bounds
# Edit... I just discovered this works.
# Should I be concerned about using these results?
# Not sure why it fails when I specify the dataset
predicted3 <- predict(bayesglmObject, type = "response")
Can't figure out how to predict with a bayesglm object. Any ideas? Thanks!
One reason could be the default setting of the "drop.unused.levels" parameter in the bayesglm command. By default, this parameter is set to TRUE, so any unused factor levels are dropped during model building. However, the predict function still uses the original data, in which the unused levels are still present in the factor variable. This causes a difference in levels between the data used for model building and the data used for prediction (even if it is the same data frame, myData in your case). I have given an example below:
n <- 100
x1 <- rnorm (n)
x2 <- as.factor(sample(c(1,2,3),n,replace = TRUE))
# Replacing 3 with 2 makes the level = 3 as unused
x2[x2==3] <- 2
y <- as.factor(sample(c(1,2),n,replace = TRUE))
myData <- data.frame(x1 = x1, x2 = x2, y = y)
control <- y ~ x1 + x2
# this works fine:
glmObject <- glm(control, myData, family = binomial())
predicted1 <- predict.glm(glmObject , myData, type = "response")
# this gives an error - this uses default drop.unused.levels = TRUE
bayesglmObject <- bayesglm(control, myData, family = binomial())
predicted2 <- predict.bayesglm(bayesglmObject , myData, type = "response")
Error in X[, piv, drop = FALSE] : subscript out of bounds
# this works fine - value of drop.unused.levels is set to FALSE
bayesglmObject <- bayesglm(control, myData, family = binomial(),drop.unused.levels = FALSE)
predicted2 <- predict.bayesglm(bayesglmObject , myData, type = "response")
I think a better way would be to use droplevels to drop the unused levels from the data frame beforehand and use it for both model building and prediction.
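A minimal sketch of the droplevels approach in base R is shown below. It rebuilds the example data, removes the unused level once, and reuses the cleaned data frame (the name myDataClean is illustrative) for both fitting and prediction. The plain glm() call stands in for the model so that the sketch runs without the arm package.

```r
set.seed(1)
n <- 100
x1 <- rnorm(n)
x2 <- as.factor(sample(c(1, 2, 3), n, replace = TRUE))
x2[x2 == 3] <- 2                 # level "3" is now unused but still declared
y <- as.factor(sample(c(1, 2), n, replace = TRUE))
myData <- data.frame(x1 = x1, x2 = x2, y = y)

levels(myData$x2)                # the unused level is still listed

# Drop unused levels once, then use the same data frame everywhere.
myDataClean <- droplevels(myData)
levels(myDataClean$x2)

fit <- glm(y ~ x1 + x2, data = myDataClean, family = binomial())
pred <- predict(fit, newdata = myDataClean, type = "response")
length(pred)
```

The same myDataClean would then be passed to bayesglm() and to its predict call, so that both see identical factor levels.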
