I fitted a model using the lmer() function (it works well). I have 11 explanatory variables. Three of them, if present in the model, cause the step() function (from the lmerTest package) to return the error "Variables length differ (found on "...")", where "..." is the formula call.
I don't have any NA values in the data: there are 600 rows and all three of the problematic variables (H, I, J) are factors.
My code is:
library(purrr) ## for rdunif()
library(lmerTest)
data2 = as.data.frame(matrix(c(rdunif(600*7, 1, 5),
                               rdunif(600*3, 0, 1),
                               rdunif(600, 1, 9),
                               rep(c("a", "b"), 300)),
                             nrow = 600, byrow = FALSE))
names(data2) = c("A","B","C","D", "E","F","G","H","I","J","Z","M")
data2[,7:10] = lapply(data2[,7:10],factor)
data2[,c(1:6,11)] = lapply(data2[,c(1:6,11)],as.numeric)
mod1 = lmer(Z ~ A+B+C+D+E+F+G+
              #H+
              #I+
              #J+
              (1|M), data2)
step.mod1 = lmerTest::step(mod1) #it works
#
mod2 = lmer(Z ~ A+B+C+D+E+F+G+H+
              #I+
              #J+
              (1|M), data2)
step.mod2 = lmerTest::step(mod2) #it does not work and returns: Variables length differ (found on "A+B+C+D+E+F+G+")
mod3 = lmer(Z ~ A+B+C+D+E+F+G+H+I+J+
              (1|M), data2)
step.mod3 = lmerTest::step(mod3) #it does not work and returns: Variables length differ (found on "A+B+C+D+E+F+G+H+I+")
I know this error is common when there are NAs, but what is causing it in this case, and how can I fix it?
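A quick way to double-check that there really are no missing values in data2 (a simple sanity check with base R):
anyNA(data2)                 # should be FALSE if there are no NAs anywhere
sum(!complete.cases(data2))  # number of rows containing at least one NA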
I am writing a subroutine to return the output of longitudinal mixed-effects models. I want to be able to pass elements from lists of variables into lme/lmer as the outcome and predictor variables. I would also like to specify contrasts within these mixed-effects models; however, I am having trouble getting the contrasts argument to recognise the strings as the variable names referred to in the model specification within the same lme/lme4 call.
Here's some toy data,
set.seed(345)
A0 <- rnorm(4,2,.5)
B0 <- rnorm(4,2+3,.5)
A1 <- rnorm(4,6,.5)
B1 <- rnorm(4,6+2,.5)
A2 <- rnorm(4,10,.5)
B2 <- rnorm(4,10+1,.5)
A3 <- rnorm(4,14,.5)
B3 <- rnorm(4,14+0,.5)
score <- c(A0,B0,A1,B1,A2,B2,A3,B3)
id <- rep(1:8,times = 4, length = 32)
time <- factor(rep(0:3, each = 8, length = 32))
group <- factor(rep(c("A","B"), times =2, each = 4, length = 32))
df <- data.frame(id = id, group = group, time = time, score = score)
Now the following call to lme works just fine, with contrasts specified (I know these are the defaults, so this is purely pedagogical).
mod <- lme(score ~ group*time, random = ~1|id, data = df,
           contrasts = list(group = contr.treatment(2), time = contr.treatment(4)))
The following also works, passing strings as variable names into lme using the reformulate() function.
t <- "time"
g <- "group"
dv <- "score"
mod1R <- lme(reformulate(paste0(g,"*",t), response = "score"), random = ~1|id, data = df)
But if I want to specify contrasts, as in the first example, it doesn't work:
mod2R <- lme(reformulate(paste0(g,"*",t), response = "score"), random = ~1|id, data = df,
             contrasts = list(g = contr.treatment(2), t = contr.treatment(4)))
# Error in `contrasts<-`(`*tmp*`, value = contrasts[[i]]) : contrasts apply only to factors
How do I get lme to recognise that the strings specified in the contrasts argument refer to the variables passed into the reformulate() function?
You should be able to use setNames() on the list of contrasts to apply the full names to the list:
# Using a %>% pipe so need to load magrittr
library(magrittr)
mod2R <- lme(reformulate(paste0(g,"*",t), response = "score"),
             random = ~1|id,
             data = df,
             contrasts = list(g = contr.treatment(2), t = contr.treatment(4)) %>%
               setNames(c(g, t)))
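If you prefer to avoid the magrittr dependency, the same renaming can be done with base R's setNames() directly inside the call (an equivalent sketch):
mod2R <- lme(reformulate(paste0(g, "*", t), response = "score"),
             random = ~1|id,
             data = df,
             contrasts = setNames(list(contr.treatment(2), contr.treatment(4)), c(g, t)))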
I found other questions regarding this topic, such as this one; however, I keep getting the error message
Error in xy.coords(x, y, xlabel, ylabel, log) : 'x' and 'y' lengths differ
Below is the code I am using:
library(DAAG)
attach(ultrasonic)
g.poly = lm(UR ~ poly(MD, 3), data = ultrasonic)
cv.poly <- cv.lm(ultrasonic, g.poly ,m=3, plotit=TRUE, printit=TRUE, dots=FALSE, seed=29)
Of course, the lengths are the same:
> length(UR)
[1] 214
> length(MD)
[1] 214
Note that in the same script, I perform another linear regression with cross-validation, which works.
library(DAAG)
g.lin = lm(log(UR) ~ MD, data = ultrasonic)
cv.lin <- cv.lm(ultrasonic, g.lin ,m=3, plotit=TRUE, printit=TRUE, dots=FALSE, seed=29)
Any idea why the polynomial regression cross-validation does not work?
EDIT
To get the data:
install.packages('nlsmsn')
library('nlsmsn')
data(Ultrasonic)
# Names differ: I am using a copy on my local machine with a lower-case u (ultrasonic) and different column names, but the data are identical.
#UR = y
#MD = x
DAAG:::cv.lm obviously does not support everything you can do with lm, e.g., it does not support functions in the formula. You need to take an intermediate step.
mf <- as.data.frame(model.matrix(y ~ poly(x), data = Ultrasonic))
mf$y <- Ultrasonic$y
mf$`(Intercept)` <- NULL
#sanitize names
names(mf) <- make.names(names(mf))
#[1] "poly.x." "y"
g.poly.san <- lm(y ~ ., data = mf)
cv.poly <- cv.lm(mf, g.poly.san, m=3, plotit=TRUE, printit=TRUE, dots=FALSE, seed=29)
#works
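The same intermediate step should carry over to the cubic fit from the question; a sketch, assuming the degree-3 polynomial from the original g.poly is what you want:
mf3 <- as.data.frame(model.matrix(y ~ poly(x, 3), data = Ultrasonic))
mf3$y <- Ultrasonic$y
mf3$`(Intercept)` <- NULL
names(mf3) <- make.names(names(mf3))  # sanitize the poly(x, 3) column names
g.poly3 <- lm(y ~ ., data = mf3)
cv.poly3 <- cv.lm(mf3, g.poly3, m = 3, plotit = TRUE, printit = TRUE, dots = FALSE, seed = 29)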
I'm working on building a predictive model for breast cancer data using R. After performing gcrma normalization, I generated the potential predictor variables. Now, when I run the RF algorithm, I encounter the following error:
rf_output=randomForest(x=pred.data, y=target, importance = TRUE, ntree = 25001, proximity=TRUE, sampsize=sampsizes)
Error in randomForest.default(x = pred.data, y = target, importance = TRUE, : Can not handle categorical predictors with more than 53 categories.
code:
library(randomForest)
library(ROCR)
library(Hmisc)
library(genefilter)
setwd("E:/kavya's project_work/final")
datafile<-"trainset_gcrma.txt"
clindatafile<-read.csv("mod clinical_details.csv")
outfile="trainset_RFoutput.txt"
varimp_pdffile="trainset_varImps.pdf"
MDS_pdffile="trainset_MDS.pdf"
ROC_pdffile="trainset_ROC.pdf"
case_pred_outfile="trainset_CasePredictions.txt"
vote_dist_pdffile="trainset_vote_dist.pdf"
data_import=read.table(datafile, header = TRUE, na.strings = "NA", sep="\t")
clin_data_import=clindatafile
clincaldata_order=order(clin_data_import[,"GEO.asscession.number"])
clindata=clin_data_import[clincaldata_order,]
data_order=order(colnames(data_import)[4:length(colnames(data_import))])+3 #Order data without first three columns, then add 3 to get correct index in original file
rawdata=data_import[,c(1:3,data_order)] #grab first three columns, and then remaining columns in order determined above
header=colnames(rawdata)
X=rawdata[,4:length(header)]
ffun=filterfun(pOverA(p = 0.2, A = 100), cv(a = 0.7, b = 10))
filt=genefilter(2^X,ffun)
filt_Data=rawdata[filt,]
#Get potential predictor variables
predictor_data=t(filt_Data[,4:length(header)])
predictor_names=c(as.vector(filt_Data[,3])) #gene symbol
colnames(predictor_data)=predictor_names
target= clindata[,"relapse"]
target[target==0]="NoRelapse"
target[target==1]="Relapse"
target=as.factor(target)
tmp = as.vector(table(target))
num_classes = length(tmp)
min_size = tmp[order(tmp,decreasing=FALSE)[1]]
sampsizes = rep(min_size,num_classes)
rf_output=randomForest(x=pred.data, y=target, importance = TRUE, ntree = 25001, proximity=TRUE, sampsize=sampsizes)
error: "Error in randomForest.default(x = pred.data, y = target, importance = TRUE, : Can not handle categorical predictors with more than 53 categories."
As I'm new to machine learning, I'm unable to proceed. Any help would be appreciated.
Thanks in advance.
It is hard to say without knowing the data. Run class or summary on all your predictor variables to ensure that they are not accidentally interpreted as characters or factors. If you really do have more than 53 levels, you will have to convert them to binary variables. Example:
mtcars$automatic <- mtcars$am == 0
mtcars$manual <- mtcars$am == 1
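For the first suggestion, a quick sketch of how you might inspect the predictors (assuming pred.data is the object passed to randomForest above):
pred.df <- as.data.frame(pred.data)
table(sapply(pred.df, class))  # how many columns are numeric vs. character/factor
# any factor columns with more than 53 levels?
names(pred.df)[sapply(pred.df, function(v) is.factor(v) && nlevels(v) > 53)]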
I am doing multinomial logistic regression:
Here is the important code:
ml$Category2 <- relevel(ml$Category, ref = "NON-CRIMINAL")
test <- multinom(Category2 ~ DayOfWeek +PdDistrict+liquor+district+MoonPhase,data = ml)
b<- predict(test, type="class")
The problem arises when I try to make a classification table. After I run the code:
factcrime<- factor(b, levels=levels(ml$Category))
ctab<- xtabs(~ Category +factcrime, data=ml)
addmargins(ctab)
Then I received the error message:
Error in model.frame.default(formula = ~Category + factcrime, data = ml) : variable lengths differ (found for 'factcrime')
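A first diagnostic (a sketch that only checks the symptom the error reports) is to compare the lengths of the two variables being cross-tabulated:
length(factcrime)    # built from the predictions b
length(ml$Category)  # observed outcome in ml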
I initially wanted to run a boxTidwell() analysis (from the "car" package) on my prospective logistic regression model (BinaryOutcomeVar ~ ContinuousPredVar + ContinuousPredVar^2 + ContinuousPredVar^3). I ran into issues:
Error in x - xbar : non-numeric argument to binary operator
In addition: Warning message:
In mean.default(x) : argument is not numeric or logical: returning NA
So, I created a reproducible example to demonstrate the error:
Doesn't work:
boxTidwell(formula = Treatment ~ uptake, other.x = ~ poly(x = colnames(CO2)[c(1,2,4)], degree = 2), data = CO2)
boxTidwell(y = CO2$Treatment, x = CO2$uptake)
Works:
boxTidwell(formula = prestige ~ income + education, other.x = ~ poly(x = women , degree = 2), data = Prestige)
I've been goofing around with the other.x parameter and am guessing that's the issue.
Question
So, does anyone know (1) whether the boxTidwell() function works with binary outcome variables, and (2) the logic behind other.x? I can't get my dummy example to work either.
After further searching, it looks like car::boxTidwell can't handle a binary outcome variable in the formula, but the procedure can be hand-coded:
require(MASS)
require(car)
d1 <- read.csv("path for your csv file", sep = ',', header = TRUE)
x <- d1$explanatory_variable_name   # replace with the name of your explanatory variable
y <- d1$dependent_variable_name     # replace with the name of your binary dependent variable
# FIT IS DONE USING THE glm FUNCTION
m1res <- glm(y ~ x, family = binomial(link = "logit"))
coeff1 <- coefficients(summary(m1res))
lnx <- x * log(x)
m2res <- glm(y ~ x + lnx, family = binomial(link = "logit"))
coeff2 <- coefficients(summary(m2res))
alpha0 <- 1.0
pvalue <- coeff2[3, 4]
pvalue
beta1 <- coeff1[2, 1]
beta2 <- coeff2[3, 1]
iter <- 0
err <- 1
while (pvalue < 0.1) {
  alpha <- (beta2 / beta1) + alpha0
  err <- abs(alpha - alpha0)
  alpha0 <- alpha
  mx <- x^alpha
  m1res <- glm(y ~ mx, family = binomial(link = "logit"))
  coeff1 <- coefficients(summary(m1res))
  mlnx <- mx * log(x)
  m2res <- glm(y ~ mx + mlnx, family = binomial(link = "logit"))
  coeff2 <- coefficients(summary(m2res))
  pvalue <- coeff2[3, 4]
  beta1 <- coeff1[2, 1]
  beta2 <- coeff2[3, 1]
  iter <- iter + 1
}
# PRINT THE POWER TO CONSOLE
alpha
The above code is taken from:
https://sites.google.com/site/ayyalaprem/box-tidwelltransform
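As a usage sketch, the hand-coded procedure above could be applied to the CO2 example from the question (assuming Treatment == "chilled" is the event coded as 1):
y <- as.numeric(CO2$Treatment == "chilled")  # binary outcome; the "chilled" = 1 coding is an assumption
x <- CO2$uptake                              # positive, so log(x) is defined
# ...then run the glm fits and the while loop above with these x and y.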