Undefined columns error when referencing element of data.frame - r

I am trying to plot many graphs, and I am having an error referencing elements of a data.frame.
Rather than manually change the variable names I would like to loop through and reference the specific variable names.
When I do this I get the "undefined columns selected" error.
When I run this code, I get the correct plot:
xy <- lm(Unfairness_Scale ~ OS_ImpCoreV_A * ImpCoreV_A, data =
branch_annual)
with(branch_annual, interact_plot(xy, pred = OS_ImpCoreV_A, modx =
ImpCoreV_A))
When I run this code, I get the "undefined columns selected" error:
xy <- lm(branch_annual$Unfairness_Scale ~ branch_annual$OS_ImpCoreV_A *
branch_annual$ImpCoreV_A, data = branch_annual)
with(branch_annual, interact_plot(xy, pred = branch_annual$OS_ImpCoreV_A,
modx = branch_annual$ImpCoreV_A))
I have tried several different methods to reference the elements of the data frame but I keep getting the same error. What am I not understanding correctly?
Thanks,
Sebastian

You can build the formula from character strings inside your loop with as.formula() and pass the result to lm(), so the model refers to column names in branch_annual rather than to branch_annual$... vectors.
xy <- lm(as.formula(paste('Unfairness_Scale', '~', 'OS_ImpCoreV_A', '*', 'ImpCoreV_A')), data = branch_annual)
with(branch_annual, interact_plot(xy, pred = 'OS_ImpCoreV_A', modx = 'ImpCoreV_A'))
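For the looping case described in the question, here is a minimal sketch of the same idea using only base R. The built-in mtcars data stands in for branch_annual, and the list called pairs and its variable names are hypothetical, not taken from the original data.

```r
# Hypothetical predictor/moderator pairs; mtcars stands in for branch_annual
pairs <- list(c("wt", "hp"), c("disp", "qsec"))

fits <- lapply(pairs, function(p) {
  # Build e.g. mpg ~ wt * hp from strings, then fit against the data frame
  f <- as.formula(paste("mpg ~", p[1], "*", p[2]))
  lm(f, data = mtcars)
})

# Each model has an intercept, two main effects and one interaction term
sapply(fits, function(m) names(coef(m)))
```

Because every term is a plain column name, any function that later looks the variables up in the data frame should be able to find them.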

Related

In R, `Error in f(arg, ...) : NA/NaN/Inf in foreign function call (arg 1)` but there are no Infs, no NaNs, no `char`s, etc

I am trying to use the lqmm package in R and receiving the error Error in f(arg, ...) : NA/NaN/Inf in foreign function call (arg 1). I can successfully use it for a version of my data in which a variable called cluster_name is averaged over.
I've tried to verify that there are no NaNs or infinite values in my dataset this way:
na_data = mydata
new_DF <- na_data[rowSums(is.na(mydata)) > 0,] # yields a dataframe with no observations
is.na(na_data) <- sapply(na_data, is.infinite)
new_DF <- na_data[rowSums(is.na(mydata)) > 0,] # still a dataframe with no observations
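Note that the second check above still indexes rows by is.na(mydata), not by the modified copy na_data, so it cannot report infinite values. A minimal sketch of a check that covers NA, NaN and Inf in one pass, on a hypothetical toy data frame standing in for mydata:

```r
# Hypothetical toy data; in the question this would be mydata
toy <- data.frame(
  a = c(1, Inf, 3),
  b = c(NA, 2, 3),
  g = factor(c("x", "y", "z"))
)

# is.finite() is FALSE for NA, NaN, Inf and -Inf, but only applies to numbers
num_cols <- vapply(toy, is.numeric, logical(1))
bad_numeric <- Reduce(`|`, lapply(toy[num_cols], function(col) !is.finite(col)))

# complete.cases() additionally catches NA in non-numeric columns such as factors
bad_rows <- bad_numeric | !complete.cases(toy)
which(bad_rows)
```

This only rules out non-finite values in the input data; values produced during the fit itself would not show up here.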
There are no variables in my dataframe that are type char -- every such variable has been converted to a factor.
When I run my model
m1 = lqmm(std_brain ~ std_beh*type*taught, random = ~1, group=subject, data = begin_data, tau=.5, na.action=na.exclude)
on the first 12,528 lines of my dataset, the model works fine. Line 12,529 looks totally normal.
Similarly, if I run tail(mydata, 11943) I get a dataframe that runs without error, but tail(mydata, 11944) gives me a dataframe that generates the error. I can also run a subset from 9990:21825 without error, but extending the dataframe on either side generates the error. The whole dataframe is 29450 observations, and thus this middle slice contains the supposedly problematic observations.
I tried making a smaller version of my dataset that contained just the borders of problems, and some observations around them, and I can see that 3/4 cases involve the same subject (7645), but I don't know what to make of that. I don't see how to make this reproducible without providing the whole dataframe (in case you were wondering, the small dataset doesn't cause any error). So here is the csv file I used.
Here is the function that gets the dataframe ready for analysis:
prep_data_set <- function(data_file, brain_var = 'beta', beh_var = 'accuracy') {
  data = read.csv(data_file)
  data$subject <- factor(data$subject)
  data$type <- factor(data$type)
  data$type <- relevel(data$type, ref = "S")
  data$taught <- factor(data$taught)
  data <- subset(data, data$run_num < 13)
  data$run = factor(data$run_num)
  brain_mean <- mean(data[[brain_var]])
  brain_sd <- sd(data[[brain_var]])
  beh_mean <- mean(data[[beh_var]])
  beh_sd <- sd(data[[beh_var]])
  data <- subset(data, data$cluster_name != "")
  data$cluster_name <- factor(data$cluster_name)
  data$mean_centered_brain <- data[[brain_var]]
  data$std_brain <- data$mean_centered_brain/brain_sd
  data$mean_centered_beh <- data[[beh_var]]
  data$std_beh <- data$mean_centered_beh/beh_sd
  return(data)
}
I run
mydata = prep_data_set(file.path(resdir, 'robust0005', 'pos_rel_con__all_clusters.csv'))
m1 = lqmm(std_brain ~ std_beh*type*taught, random = ~1, group=subject, data = mydata, tau=.5, na.action=na.exclude)
to generate the error.
By comparison
regular_model = lmer(std_brain ~ type*taught*std_beh + (1|subject/run) +
(1|subject:cluster_name), data = mydata)
runs fine.
I hope there is something interesting and generalizable in this question; I know it's kind of annoying to post to Stack Overflow with some idiosyncratic problem in a ~30000 line dataset.

What does "invalid type (closure) for variable 'variable1'" mean and how do I fix it?

I am trying to write a function in R that calls a function from another package. The code works perfectly outside a function.
I am guessing it might have something to do with the package I am using (survey).
A self-contained code example:
#activating the packages
library(survey)
library(foreign) # provides read.spss()
#getting the dataset into R
tm <- read.spss("tm.sav", to.data.frame = TRUE, max.value.labels = 5)
# creating svydesign object (it basically contains the weights to adjust the variables (~persgew: also a column variable contained in the tm-dataset))
tm_w <- svydesign(ids=~0, weights = ~persgew, data = tm)
#getting overview of the welle-variable
#this variable is part of the tm-dataset. it is needed to execute the following steps
table(tm$welle)
# data manipulation as in: taking the v12d_gr-variable as well as the welle-variable and the svydesign-object to create a longitudinal variable which is transformed into a data frame that can be passed to ggplot
t <- svytable(~v12d_gr+welle, tm_w)
tt <- round(prop.table(t,2)*100, digits=0)
v12d <- tt[2,]
v12d <- as.data.frame(v12d)
This is the code outside the function, and it works perfectly. Since I have to transform quite a few variables in exactly the same way, I want to write a function to save some time.
The following function is supposed to take a variable that will be transformed as an argument (v12sd2_gr).
#making sure the survey-object is loaded
tm_w <- svydesign(ids=~0, weights = ~persgew, data = tm)
#trying to write a function containing the code from above
ltd_zsw <- function(variable1){
  t <- svytable(~variable1+welle, tm_w)
  tt <- round(prop.table(t,2)*100, digits=0)
  var_ltd_zsw <- tt[2,]
  var_ltd_zsw <- as.data.frame(var_ltd_zsw)
  return(var_ltd_zsw)
}
Calling the function:
#as v12d has been altered already, I am trying to transform another variable v12sd2_gr
v12sd2 <- ltd_zsw(v12sd2_gr)
Console output:
Error in model.frame.default(formula = weights ~ variable1 + welle, data = model.frame(design)) :
invalid type (closure) for variable 'variable1'
Called from: model.frame.default(formula = weights ~ variable1 + welle, data = model.frame(design))
How do I fix it? And what does it mean to dynamically build a formula, or to reformulate?
PS: I hope this is the appropriate way to respond to the feedback in the comments.
Update: I think I was able to trace the problem back to the argument I am passing (variable1), and I am guessing it has something to do with the fact that I use a formula within the function. But when I try to call svytable with as.formula(svytable(~variable1+welle, tm_w)) it still doesn't work.
What to do?
I have found a solution to the problem.
Here is the tested and working function:
ltd_test <- function (var, x, string1="con", string2="pro") {
  print (table (var))
  x$w12d_gr <- ifelse(as.numeric(var)>2,1,0)
  x$w12d_gr <- factor(x$w12d_gr, levels = c(0,1), labels = c(string1,string2))
  print (table (x$w12d_gr))
  x_w <- svydesign(ids=~0, weights = ~persgew, data = x)
  t <- svytable(~w12d_gr+welle, x_w)
  tt <- round(prop.table(t,2)*100, digits=0)
  w12d <- tt[2,]
  w12d <- as.data.frame(w12d)
}
The problem appeared to be caused by the svydesign() function. Its output is an object which is then used by the formula in svytable(). That's why it is imperative to first create the x_w object with svydesign() and then use svytable() to create the t object.
Within the code snippet I posted originally in the question the tm_w-object has been created and stored globally.
Thanks for the help to everyone. I hope this is gonna be of use to someone one day!
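On the question of what dynamically building a formula means: a formula is not evaluated when it is written, so ~variable1+welle refers to something literally named variable1, not to the column that was passed in. A minimal sketch using base R's reformulate() and xtabs() on the built-in mtcars data, as a stand-in for svytable() and the survey design. The function name table_by and the column names are hypothetical.

```r
# Hypothetical helper: column names are passed as strings
table_by <- function(var_name, by_name, data) {
  # reformulate() builds the one-sided formula ~var_name + by_name from strings
  f <- reformulate(c(var_name, by_name))
  tab <- xtabs(f, data = data)
  # Column percentages, rounded, as in the question
  round(prop.table(tab, 2) * 100, digits = 0)
}

pct <- table_by("cyl", "gear", mtcars)
pct
```

With the survey package, a formula built this way could presumably be passed as the first argument of svytable() in the same manner; that has not been tested here.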

R nlstools regression, preview function doesn't take variables

I'm quite new to R but wanted to use "nls" and the "nlstools" package, since they offer nice tools for analysis and evaluation.
The code I use is:
conB1_2015 = read.csv("C:\\Path_to_File\\conB1_2015.csv")
conB1_2015 = na.omit(conB1_2015)
tRef <- mean(conB1_2015$Mean_Soil_Temp_V2..C., na.rm=TRUE)
rRef <- conB1_2015$Lin_Flux..mymol.m.2.s.1.[which.min(abs(conB1_2015$Mean_Soil_Temp_V2..C.-tRef))]
rMax <- max(conB1_2015$Lin_Flux..mymol.m.2.s.1., na.rm=TRUE)
half <- rMax/2
half_SM <- conB1_2015$Soil_Moist_V3[which.min(abs(conB1_2015$Lin_Flux..mymol.m.2.s.1.-half))]
form <- as.formula(Lin_Flux..mymol.m.2.s.1. ~ (rRef)*a*exp(b*Mean_Soil_Temp_V2..C.)*Soil_Moist_V3/(half_SM)+Soil_Moist_V3)
preview(form, data = conB1_2015, start = c(a = -1.98, b = -0.05), variable = 1)
The problem is that I get this error when running the code:
Error in data.frame(value, row.names = rn, check.names = FALSE) :
row names supplied are of the wrong length
When I replace the variables with constants, changing
form <- as.formula(Lin_Flux..mymol.m.2.s.1. ~ (rRef)*a*exp(b*Mean_Soil_Temp_V2..C.)*Soil_Moist_V3/(half_SM)+Soil_Moist_V3)
to
form <- as.formula(Lin_Flux..mymol.m.2.s.1.~(rRef<-4.41)*a*exp(b*Mean_Soil_Temp_V2..C.)*Soil_Moist_V3/(half_SM<-7.19)+Soil_Moist_V3)
the function works fine.
I wanted to automate the script to run over several csv's to test different models on different data. Is it really not possible to pass variables into the preview function or am I missing something? There can't be a problem with headers or the data table since it's working fine in the second example.
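Since the version with numeric constants written into the formula works, one possible workaround is to substitute the computed values into the formula before passing it on. The sketch below uses base R's bquote() for that. The column names flux, temp and moist are hypothetical shortened stand-ins for the real ones, and this has not been tested against preview() itself.

```r
# Values that would normally be computed from each csv file
rRef <- 4.41
half_SM <- 7.19

# .(x) inside bquote() is replaced by the current value of x;
# evaluating the resulting call to `~` yields an ordinary formula
form <- eval(bquote(
  flux ~ .(rRef) * a * exp(b * temp) * moist / .(half_SM) + moist
))

form
# Only the data columns and the parameters a and b remain as names
all.vars(form)
```

The resulting formula no longer depends on rRef or half_SM existing in any environment, which is the property the working hard-coded version has.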

R won't recognize column names as an object

I'm trying to build a histogram of residual values, and the first step I'm taking is to fit a linear model. R will not recognize the column name as an object.
The first three lines of code run fine. The last two give me an error saying the object area_ha can't be found, although it is one of eight column titles in my data. Any advice on creating a linear model and a histogram of the residuals would also be very helpful.
dat<-read.csv("/Users/sara/Desktop/birdsinforest.csv", header=TRUE)
linearmodel=lm(abundance ~ area_ha, data = dat)
summary(linearmodel)
area_ha$abundance_predicted = predict(linearmodel)
area_ha$residual = area_ha$abundance - area_ha$abundance_predicted
This is the error I get after running the last two lines of code:
Error in area_ha$abundance_predicted = predict(linearmodel) :
object 'area_ha' not found
In the above code, area_ha is a variable (a column name) and not a data.frame, since you're using it as a predictor in the linear model. The data frame is dat, so the last two lines should be written as below:
dat$abundance_predicted <- predict(linearmodel)
dat$residual <- dat$abundance - dat$abundance_predicted
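For the histogram itself, a minimal sketch on the built-in mtcars data, with mpg and wt standing in for abundance and area_ha (the birds data is not available here, so these names are stand-ins):

```r
# mtcars stands in for the birds data: mpg ~ wt instead of abundance ~ area_ha
dat <- mtcars
fit <- lm(mpg ~ wt, data = dat)

# Store predictions and residuals as new columns of the data frame
dat$mpg_predicted <- predict(fit)
dat$residual <- dat$mpg - dat$mpg_predicted

# Histogram of the residuals; resid(fit) would give the same values directly
hist(dat$residual, main = "Residuals of mpg ~ wt", xlab = "Residual")
```

If the data contains missing values, lm() drops those rows by default and predict(fit) returns fewer values than the data frame has rows, so the column assignment would fail; in that case the rows need to be filtered first.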

VAR with exogenous variables

I am trying to fit a VAR model in R with an exogenous variable, using this data:
VARM <- data.frame(y,x1,x2,x3) #x3 is the exogenous variable
First, I want to choose the correct lag order by using VARselect
VARselect(VARM, lag.max = 6, type = "const" , exogen=x3)
I then get the following error : "different row size of y and exogen"
I can't figure out what's causing this. When I view the data frame I can confirm that the row counts are the same and there are no missing observations. I've tried various ways of passing the x3 variable, but the closest I could get is this message when VARselect runs:
"No column names supplied in exogen, using: exo1 , instead"
It seems that you were almost there. The details of VARselect say: "providing a matrix object for exogen". If, in addition, you want to avoid the warning (it is not an error) "No column names supplied in exogen, using: exo1 , instead", you should provide a named matrix. For example:
library(vars) # provides VARselect()
df <- data.frame(x1 = rnorm(50), x2 = rnorm(50))
model <- VARselect(df, exogen = cbind(x3 = rnorm(50)))
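The difference between a plain vector and the named matrix that cbind() produces can be checked in base R alone. This is only a sketch of the shape of the two objects and makes no claim about how VARselect inspects them internally:

```r
x3 <- rnorm(50)

# A plain numeric vector has no dimensions, so nrow() returns NULL
is.null(nrow(x3))

# cbind() with a named argument gives a 50 x 1 matrix with a column name
exo <- cbind(x3 = x3)
dim(exo)
colnames(exo)
```

The object exo is what the answer above passes as exogen.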
