LOOP in R: Error: variable lengths differ

I tried to build this loop so that I can test two outcomes at the same time. However, it produced an error message: "Error in model.frame.default(formula = ~outcome + centered.predictor1 + : variable lengths differ (found for 'centered.predictor1')"
But when I tested each outcome separately, the code (without the loop) didn't produce any errors.
Thanks in advance for your help!
library(nlme)  # for lme()

n1 <- rnorm(n = 2000, mean = 0, sd = 1)
n2 <- rnorm(n = 2000, mean = 0, sd = 1)
Z_familism <- rnorm(n = 2000, mean = 0, sd = 1)
Z_avoidance <- rnorm(n = 2000, mean = 0, sd = 1)
Country <- rnorm(n = 2000, mean = 0, sd = 1)
Z_anxiety <- rnorm(n = 2000, mean = 0, sd = 1)
data01 <- data.frame(n1, n2, Z_familism, Z_avoidance, Country, Z_anxiety)

outcome <- c('n1', 'n2')

for (n in outcome) {
  rsa.data <- data.frame(predictor1 = data01$Z_familism,
                         predictor2 = data01$Z_avoidance,
                         nest = as.factor(data01$Country),
                         control = data01$Z_anxiety,
                         multilevel = data01$Country,
                         outcome = data01[n])
  rsa.data <- within.data.frame(rsa.data, {
    centered.predictor1 <- predictor1 - 0   # Center predictor 1
    centered.predictor2 <- predictor2 - 0   # Center predictor 2
    squared.predictor1 <- centered.predictor1 * centered.predictor1  # Create squared term
    squared.predictor2 <- centered.predictor2 * centered.predictor2  # Create squared term
    interaction <- centered.predictor1 * centered.predictor2         # Create interaction term
  })
  mlm.model <- lme(outcome ~ centered.predictor1 + centered.predictor2 + squared.predictor1 + interaction + squared.predictor2 + control,
                   data = rsa.data,
                   random = ~ 1 | multilevel,  # Replace "multilevel" with the name of your nesting variable
                   na.action = "na.omit")
  summary(mlm.model)                      # View model
  intervals(mlm.model, which = "fixed")
  vcov(mlm.model)                         # View covariance of model
}

The problem is in how you create the rsa.data data frame inside the loop, specifically the outcome column. Instead of data01[n], which returns a (one-column) data frame, use data01[, n], which returns a numeric vector. That way all of your data has the same length.
rsa.data <- data.frame(predictor1 = data01$Z_familism,
                       predictor2 = data01$Z_avoidance,
                       nest = as.factor(data01$Country),
                       control = data01$Z_anxiety,
                       multilevel = data01$Country,
                       outcome = data01[, n])
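To see the difference, compare the two indexing forms directly (a quick sketch using the simulated data01 from the question):

class(data01["n1"])    # "data.frame" -- single-bracket with one index keeps a one-column data frame
class(data01[, "n1"])  # "numeric"    -- a plain vector of length 2000
class(data01[["n1"]])  # "numeric"    -- double brackets also return the bare vector

With the vector form, outcome in rsa.data is an ordinary numeric column that the model formula can find alongside the other variables.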

Related

R simex with nls + power mean method

I am attempting to analyze a dataset using nonlinear least squares that has both measurement error and heteroscedasticity. I was able to fit the model using nls() in R and adjust for heteroscedasticity by weighting the observations by a power function of the fitted values (a technique known as the power mean method). However, when I additionally try to correct for measurement error using the simex package in R, I get the following error:
Error: measurement.error is constant 0 in column(s) 1
Which is strange because I have specified a nonzero measurement error. I have pasted example code below which reproduces this error.
library(simex)

set.seed(123456789)
x <- runif(n = 1000, min = 1, max = 3.6)
x_err <- x + rnorm(n = 1000, mean = 0, sd = 0.1)
y_mean <- 100/(1 + 10^(log10(100) - x)*0.75)
y_het <- y_mean + rnorm(n = 1000, mean = 0, sd = 10*x^-2)
y_het <- ifelse(y_het > 0, y_het, 0)
w <- (100/(1 + 10^(log10(100) - x_err)*0.75))^-2

nls_fit <- nls(y_het ~ 100/(1 + 10^((log10(k) - x_err)*h)),
               start = list("k" = 100, "h" = 0.75),
               weights = 1/w)
simex(nls_fit, SIMEXvariable = "x_err", measurement.error = 0.1, asymptotic = FALSE)

Vectors for simulations

The code I have so far is written below. This is a simulation, so there is no actual data. I have two questions:
1. I have two vectors (treat and cont) that I need to put into one single vector, which I did (vect). However, I also need another vector that codes for treatment vs. control. How do I do that?
2. For my model (model) I need to fit a linear model testing for a treatment effect, but I don't know how to add that effect to what I have, or is that what the code I have is already testing?
library(car)

treat <- rnorm(3, mean = 460, sd = 110)
treat
cont <- rnorm(3, mean = 415, sd = 110)
cont
vect <- c(treat, cont)
vect

nsims <- 1000
p.value.saved <- coeff.saved <- vector()
for (i in 1:nsims) {
  treat <- rnorm(3, mean = 460, sd = 110)
  cont <- rnorm(3, mean = 415, sd = 110)
  vect <- c(treat, cont)
  model <- glm(treat ~ cont, family = poisson)
  p.value.saved[i] <- Anova(model)$P[1]
  coeff.saved[i] <- coef(model)
}
Thank you!
Something like this? (Note that you'll get a bunch of warnings for running a Poisson regression on continuous data.)
n <- 3
nsims <- 10

do.call(
  rbind,
  lapply(1:nsims, function(.) {
    treat <- rnorm(n, mean = 460, sd = 110)
    cont <- rnorm(n, mean = 415, sd = 110)
    # Instead of vect
    df <- data.frame(
      y = c(treat, cont),
      x = rep(c("treat", "cont"), each = n)
    )
    # Model the values vs treatment indicator
    model <- glm(y ~ x, data = df, family = poisson)
    # Extract the model's p-value and coefficient of treatment.
    data.frame(p = car::Anova(model)$P, coef = coef(model)[2])
  })
)
The first line creates the label vector and the second combines it with your data. In your example both vectors are length 3, hence the 3 repetitions in rep("trt", 3).
treat_lab <- c(rep("control", 3), rep("trt", 3))
treatment <- cbind(treat_lab, c(treat, cont))
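One caveat with the cbind() approach: because treat_lab is a character vector, cbind() coerces the whole result to a character matrix, so the response values become strings. If you want to keep the numbers numeric (e.g. for glm()), a data frame preserves each column's type; a minimal sketch along the same lines (the column names y and group are just illustrative):

treatment <- data.frame(
  y     = c(treat, cont),                      # numeric outcome
  group = c(rep("control", 3), rep("trt", 3))  # treatment indicator
)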

MLE error: initial value in 'vmmin' is not finite

We simulated a data set and created a model.
set.seed(459)

# seed mass
n <- 1000
seed.mass <- round(rnorm(n, mean = 250, sd = 75), digits = 1)

## Setting up the deterministic function
detFunc <- function(a, b, x){
  return(exp(a + b*x)) / (1 + exp(a + b*x))
}

# logit link function for the binomial
inv.link <- function(z){
  p <- 1/(1 + exp(-z))
  return(p)
}

# setting a and b values
a <- -2.109
b <- 0.02

# Simulating data
germination <- (rbinom(n = n, size = 10,
                       p = inv.link(detFunc(x = seed.mass, a = a, b = b))))/10

## make data frame
mydata <- data.frame("predictor" = seed.mass, "response" = germination)

# plotting the data
tmp.x <- seq(0, 1e3, length.out = 500)
plot(germination ~ seed.mass,
     xlab = "seed mass (mg)",
     ylab = "germination proportion")
lines(tmp.x, inv.link(detFunc(x = tmp.x, a = a, b = b)), col = "red", lwd = 2)
When we check the model we created and infer the parameters, we get an error:
Error in optim(par = c(a = -2.109, b = 0.02), fn = function (p) : initial value in 'vmmin' is not finite
library(bbmle)
mod1 <- mle2(response ~ dbinom(size = 10,
                               p = inv.link(detFunc(x = predictor, a = a, b = b))),
             data = mydata,
             start = list("a" = -2.109, "b" = 0.02))
We're stumped and can't figure out why we're getting this error.
Your problem is that you're trying to fit a binomial outcome (which must be an integer) to a proportion.
You can use round(response*10) as your response (to put the proportion back on the count scale; round() is needed because (a/b)*b is not always exactly equal to a in floating-point math ...). Specifically, with your setup
mod1 <- mle2(round(response*10) ~ dbinom(size = 10,
                                         p = inv.link(detFunc(x = predictor, a = a, b = b))),
             data = mydata,
             start = list(a = -2.109, b = 0.02))
works fine. coef(mod1) is {-1.85, 0.018}, plausibly close to the true values you started with (we don't expect to recover the true values exactly, except as the average of many simulations [and even then MLE is only asymptotically unbiased, i.e. for large data sets ...]).
The proximal problem is that trying to evaluate dbinom() with a non-integer value gives NA. The full output from your model fit would have been:
Error in optim(par = c(a = -2.109, b = 0.02), fn = function (p) :
initial value in 'vmmin' is not finite
In addition: There were 50 or more warnings (use warnings() to see the first 50)
It's always a good idea to check those additional warnings ... in this case they are all of the form
1: In dbinom(x = c(1, 1, 1, 0.8, 1, 1, 1, 1, 1, 1, 1, 0.8, ... :
non-integer x = 0.800000
which might have given you a clue ...
PS you can use qlogis() and plogis() from base R for your link and inverse-link functions ...
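For instance, plogis() is exactly the inverse logit the question codes by hand, and qlogis() is the corresponding logit (link) function; a quick sketch to confirm the equivalence:

inv.link <- function(z) 1/(1 + exp(-z))   # hand-rolled inverse logit from the question
all.equal(inv.link(0.5), plogis(0.5))     # TRUE: plogis() is the inverse link
all.equal(qlogis(plogis(0.5)), 0.5)       # TRUE: qlogis() is the link (logit)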

Simple Logistic Regression in a Loop?

I have a bunch of features for a multivariate logistic regression, but I want to test each feature individually with multiple univariate logistic regressions.
I'm trying to do a loop like so:
features <- c("f1","f2","f3","f4")
out <- list()
for (f in features) {
mod <- train(form = positive ~ f,
data = training,
method = "glm",
metric = "ROC",
family = "binomial")
out <- append(out,mod)
}
I'm getting an error saying variable lengths differ (found for 'f'). I think it's not recognizing f as the column name? How can I fix this?
For future reference, here is an answer with a reprex that uses the same solution proposed by @Rorschach:
library(caret)

x <- runif(50, min = 0, max = 100)
z <- runif(50, min = 0, max = 100)
a <- runif(50, min = 0, max = 100)
b <- runif(50, min = 0, max = 100)
positive <- rbinom(50, 1, 0.4)

training <- as.data.frame(cbind(x, z, a, b, positive = positive))
training$positive <- factor(training$positive)

features <- c("x", "z", "a", "b")
out <- list()
for (f in features) {
  mod <- train(form = as.formula(paste("positive ~ ", f)),
               data = training,
               method = "glm",
               family = "binomial")
  out <- append(out, mod)
}
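One small caveat: because a caret train object is itself a list, append(out, mod) splices the components of each fit into out rather than storing one element per model. Keeping each fit under its feature name is a variant that makes the results easier to pull back out:

out <- list()
for (f in features) {
  out[[f]] <- train(form = as.formula(paste("positive ~", f)),
                    data = training,
                    method = "glm",
                    family = "binomial")
}
summary(out[["x"]]$finalModel)  # e.g. the usual glm summary for the univariate fit on x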

How do I generate 5000 synthetic Gaussian data sets in R with 1000 observations in each?

For each data set, set σ² = 10 and μ_j = j, where j = 1, ..., 5000 is the index of the data set.
We can use lapply to loop through 1 to 5000 with a simple function that passes the index to rnorm as the mean.
lapply(1:5000, function(x) rnorm(n = 1000, mean = x, sd = sqrt(10)))
You can use purrr::map().
map(1:5000, ~ rnorm(n = 1000, mean = .x, sd = sqrt(10)))
If you want to iterate over two different arguments to rnorm:
n_arg <- c(rep(10000, 2500), rep(20000, 2500))
map2(1:5000, n_arg, ~ rnorm(n = .y, mean = .x, sd = sqrt(10)))
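Either way you end up with a list of 5,000 numeric vectors; a quick sanity check on the first approach (sets is just an illustrative name):

sets <- lapply(1:5000, function(x) rnorm(n = 1000, mean = x, sd = sqrt(10)))
length(sets)       # 5000 data sets
length(sets[[3]])  # 1000 observations each
mean(sets[[3]])    # approximately 3  (the index j)
var(sets[[3]])     # approximately 10 (the target variance)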
