In R, `Error in f(arg, ...) : NA/NaN/Inf in foreign function call (arg 1)` but there are no Infs, no NaNs, no `char`s, etc - r

I am trying to use the lqmm package in R and receiving the error Error in f(arg, ...) : NA/NaN/Inf in foreign function call (arg 1). I can successfully use it for a version of my data in which a variable called cluster_name is averaged over.
I've tried to verify that there are no NaNs or infinite values in my dataset this way:
na_data = mydata
new_DF <- na_data[rowSums(is.na(mydata)) > 0,] # yields a dataframe with no observations
is.na(na_data) <- sapply(na_data, is.infinite)
new_DF <- na_data[rowSums(is.na(mydata)) > 0,] # still a dataframe with no observations
There are no variables in my dataframe that are type char -- every such variable has been converted to a factor.
When I run my model
m1 = lqmm(std_brain ~ std_beh*type*taught, random = ~1, group=subject, data = begin_data, tau=.5, na.action=na.exclude)
on the first 12,528 lines of my dataset, the model works fine. Line 12,529 looks totally normal.
Similarly, if I run tail(mydata, 11943) I get a dataframe that runs without error, but tail(mydata, 11944) gives me a dataframe that generates the error. I can also run a subset from 9990:21825 without error, but extending the dataframe on either side generates the error. The whole dataframe is 29450 observations, and thus this middle slice contains the supposedly problematic observations. I tried making a smaller version of my dataset that contained just the borders of problems, and some observations around them, and I can see that 3/4 cases involve the same subject (7645), but I don't know what to make of that. I don't see how to make this reproducible without providing the whole dataframe (in case you were wondering, the small dataset doesn't cause any error). So here is the csv file I used.
Here is the function that gets the dataframe ready for analysis:
prep_data_set <- function(data_file, brain_var = 'beta', beh_var = 'accuracy') {
data = read.csv(data_file)
data$subject <- factor(data$subject)
data$type <- factor(data$type)
data$type <- relevel(data$type, ref = "S")
data$taught <- factor(data$taught)
data <- subset(data, data$run_num < 13)
data$run = factor(data$run_num)
brain_mean <- mean(data[[brain_var]])
brain_sd <- sd(data[[brain_var]])
beh_mean <- mean(data[[beh_var]])
beh_sd <- sd(data[[beh_var]])
data <- subset(data, data$cluster_name != "")
data$cluster_name <- factor(data$cluster_name)
data$mean_centered_brain <- data[[brain_var]]
data$std_brain <- data$mean_centered_brain/brain_sd
data$mean_centered_beh <- data[[beh_var]]
data$std_beh <- data$mean_centered_beh/beh_sd
return(data)
}
I run
mydata = prep_data_set(file.path(resdir, 'robust0005', 'pos_rel_con__all_clusters.csv'))
m1 = lqmm(std_brain ~ std_beh*type*taught, random = ~1, group=subject, data = mydata, tau=.5, na.action=na.exclude)
to generate the error.
By comparison
regular_model = lmer(std_brain ~ type*taught*std_beh + (1|subject/run) +
(1|subject:cluster_name), data = mydata)
runs fine.
I hope there is something interesting and generalizable in this question; I know it's kind of annoying to post to Stack Overflow with some idiosyncratic problem in a ~30000 line dataset.

Related

Error in `$<-.data.frame`(`*tmp*`, "P_value", value = 9.66218350888067e-05) : replacement has 1 row, data has 0

Can some-one help me with my code, i have a code which is calculating a lot of logistic regression at the same time. i used this code also for a lm model and then it worked quite wel, however i tried to adapt it to a glm model but it does not work anymore.
Output_logistic <- data.frame()
glm_output = glm(test[,1] ~ test_2[,1], family = binomial ('logit'))
Output_2 <- data.frame(R_spuared = summary(glm_output)$r.squared)
Output_2$P_value <- summary(glm_output)$coefficients[2,4]
Output_2$Variabele <- paste(colnames(test))
Output_2$Variabele_1 <- paste(colnames(test_2))
Output_2$N_NA <- length(glm_output$na.action)
Output_2$df <- paste(glm_output$df.residual)
Output_logistic <- rbind(Output_logistic,Output_2)
running this code gives the next error:
Error in $<-.data.frame(*tmp*, "P_value", value = 9.66218350888067e-05) :
replacement has 1 row, data has 0
does anybody know what i have to adapt so that the code will work?
Thanks in advance
Your Output_2 is an empty data.frame (it has no rows) because summary(glm_output)$r.squared does not exist, because glm doesn’t report this value.
If you need the R-squared value you’ll have to calculate it yourself. But to fix the error you can simply change your code to construct the data-frame from the existing data in the summary:
output_2 = data.frame(
P_value = summary(glm_output)$coefficients[2, 4],
Variable = colnames(test),
# … etc.
)

R nsltools Regression, preview function doesn't take variables

im quite new to R but wanted to use the packages "nls" and "nlstools" since it has nice tools for analysis and evaluation.
the code I use is:
conB1_2015 = read.csv("C:\\Path_to_File\\conB1_2015.csv")
conB1_2015 = na.omit(conB1_2015)
tRef <- mean(conB1_2015$Mean_Soil_Temp_V2..C., na.rm=TRUE)
rRef <- conB1_2015$Lin_Flux..mymol.m.2.s.1.[which.min(abs(conB1_2015$Mean_Soil_Temp_V2..C.-tRef))]
rMax <- max(conB1_2015$Lin_Flux..mymol.m.2.s.1., na.rm=TRUE)
half <- rMax/2
half_SM <- conB1_2015$Soil_Moist_V3[which.min(abs(conB1_2015$Lin_Flux..mymol.m.2.s.1.-half))]
form <- as.formula(Lin_Flux..mymol.m.2.s.1. ~ (rRef)*a*exp(b*Mean_Soil_Temp_V2..C.)*Soil_Moist_V3/(half_SM)+Soil_Moist_V3)
preview(form, data = conB1_2015, start = c(a = -1.98, b = -0.05), variable = 1)
The Problem is, that i get this Error running this code:
Error in data.frame(value, row.names = rn, check.names = FALSE) :
row names supplied are of the wrong length
When i change the variables in form <- as.formula(Lin_Flux..mymol.m.2.s.1. ~ (rRef)*a*exp(b*Mean_Soil_Temp_V2..C.)*Soil_Moist_V3/(half_SM)+Soil_Moist_V3)
to form <- as.formula(Lin_Flux..mymol.m.2.s.1.~(rRef<-4.41)*a*exp(b*Mean_Soil_Temp_V2..C.)*Soil_Moist_V3/(half_SM<-7.19)+Soil_Moist_V3)
the function works fine.
I wanted to automate the script to run over several csv's to test different models on different data. Is it really not possible to pass variables into the preview function or am I missing something? There can't be a problem with headers or the data table since it's working fine in the second example.

"Error in model.frame.default(data = train, formula = cost ~ .) : variable lengths differ", but all variables are length 76?

I'm modeling burrito prices in San Diego to determine whether some burritos are over/under priced (according to the model). I'm attempting to use regsubsets() to determine the best linear model, using the BIC, on a data frame of 76 observations of 14 variables. However, I keep getting an error saying that variable lengths differ, and thus a linear model doesn't work.
I've tried rounding all the observations in the data frame to one decimal place, I've used the length() function on each variable in the data frame to make sure they're all the same length, and before I made the model I used na.omit() on the data frame to make sure no NAs were present. By the way, the original dataset can be found here: https://www.kaggle.com/srcole/burritos-in-san-diego. I cleaned it up a bit in Excel first, removing all the categorical variables that appeared after the "overall" column.
burritos <- read.csv("/Users/Jack/Desktop/R/STOR 565 R Projects/Burritos.csv")
burritos <- burritos[ ,-c(1,2,5)]
burritos <- na.exclude(burritos)
burritos <- round(burritos, 1)
library(leaps)
library(MASS)
yelp <- burritos$Yelp
google <- burritos$Google
cost <- burritos$Cost
hunger <- burritos$Hunger
tortilla <- burritos$Tortilla
temp <- burritos$Temp
meat <- burritos$Meat
filling <- burritos$Meat.filling
uniformity <- burritos$Uniformity
salsa <- burritos$Salsa
synergy <- burritos$Synergy
wrap <- burritos$Wrap
overall <- burritos$overall
variable <- sample(1:nrow(burritos), 50)
train <- burritos[variable, ]
test <- burritos[-variable, ]
null <- lm(cost ~ 1, data = train)
full <- regsubsets(cost ~ ., data = train) #This is where error occurs

Error in panel regression in case of different independent variable r

I am trying to run Fama Macbeth regression by the following code:
require(foreign)
require(plm)
require(lmtest)
fpmg <- pmg(return~max_1,df_all_11, index=c("yearmonth","firms" ))
Fama<-fpmg
coeftest(Fama)
It is working when I regress the data using the independent variable named 'max_1'. However when I change it and use another independent variable named 'ivol_1' the result is showing an error. The code is
require(foreign)
require(plm)
require(lmtest)
fpmg <- pmg(return~ivol_1,df_all_11, index=c("yearmonth","firms" ))
Fama<-fpmg
coeftest(Fama)
the error message is like this:
Error in pmg(return ~ ivol_1, df_all_11, index = c("yearmonth", "firms")) :
Insufficient number of time periods
or sometimes the error is like this
Error in model.frame.default(terms(formula, lhs = lhs, rhs = rhs, data = data, :
object is not a matrix
For your convenience, I am sharing my data with you. The data link is
data frame
I am wondering why this is happening in case of the different variable in the same data frame. I would be grateful if you can solve this problem.
This problem can be solved by mice function
library(mice)
library(dplyr)
require(foreign)
require(plm)
require(lmtest)
df_all_11<-read.csv("df_all_11.csv.part",sep = ",",header = TRUE,stringsAsFactor = F)
x<-data.frame(ivol_1=df_all_11$ivol_1,month=df_all_11$Month)
imputed_Data <- mice(x, m=3, maxit =5, method = 'pmm', seed = 500)
completeData <- complete(imputed_Data, 3)
df_all_11<-mutate(df_all_11,ivol_1=completeData$ivol_1)
fpmg2 <- pmg(return~ivol_1,df_all_11, index=c("yearmonth","firms"))
coeftest(fpmg2)
this problem because the variable ivol_1 have a lots of NA so you should impute the NA first then run the pmg function.

How to use predict from a model stored in a list in R?

I have a dataframe dfab that contains 2 columns that I used as argument to generate a series of linear models as following:
models = list()
for (i in 1:10){
models[[i]] = lm(fc_ab10 ~ (poly(nUs_ab, i)), data = dfab)
}
dfab has 32 observations and I want to predict fc_ab10 for only 1 value.
I thought of doing so:
newdf = data.frame(newdf = nUs_ab)
newdf[] = 0
newdf[1,1] = 56
prediction = predict(models[[1]], newdata = newdf)
First I tried writing newdf as a dataframe with only one position, but since there are 32 in the dataset on which the model was built, I thought I had to provide at least 32 points as well. I don't think this is necessary though.
Every time I run that piece of code I am given the following error:
Error: variable 'poly(nUs_ab, i) was fitted with type “nmatrix.1” but type “numeric” was supplied.
In addition: Warning message:
In Z/rep(sqrt(norm2[-1L]), each = length(x)) :
longer object length is not a multiple of shorter object length
I thought all I need to use predict was a LM model, predictors (the number 56) given in a column-named dataframe. Obviously, I am mistaken.
How can I fix this issue?
Thanks.
newdf should be a data.frame with column name nUs_ab, otherwise R won't be able to know which column to operate upon (i.e., generate the prediction design matrix). So the following code should work
newdf = data.frame(nUs_ab = 56)
prediction = predict(models[[1]], newdata = newdf)

Resources