Error in `$<-.data.frame`(`*tmp*`, "P_value", value = 9.66218350888067e-05) : replacement has 1 row, data has 0 - r

Can some-one help me with my code, i have a code which is calculating a lot of logistic regression at the same time. i used this code also for a lm model and then it worked quite wel, however i tried to adapt it to a glm model but it does not work anymore.
Output_logistic <- data.frame()
glm_output = glm(test[,1] ~ test_2[,1], family = binomial ('logit'))
Output_2 <- data.frame(R_spuared = summary(glm_output)$r.squared)
Output_2$P_value <- summary(glm_output)$coefficients[2,4]
Output_2$Variabele <- paste(colnames(test))
Output_2$Variabele_1 <- paste(colnames(test_2))
Output_2$N_NA <- length(glm_output$na.action)
Output_2$df <- paste(glm_output$df.residual)
Output_logistic <- rbind(Output_logistic,Output_2)
running this code gives the next error:
Error in $<-.data.frame(*tmp*, "P_value", value = 9.66218350888067e-05) :
replacement has 1 row, data has 0
does anybody know what i have to adapt so that the code will work?
Thanks in advance

Your Output_2 is an empty data.frame (it has no rows) because summary(glm_output)$r.squared does not exist, because glm doesn’t report this value.
If you need the R-squared value you’ll have to calculate it yourself. But to fix the error you can simply change your code to construct the data-frame from the existing data in the summary:
output_2 = data.frame(
P_value = summary(glm_output)$coefficients[2, 4],
Variable = colnames(test),
# … etc.
)

Related

In R, `Error in f(arg, ...) : NA/NaN/Inf in foreign function call (arg 1)` but there are no Infs, no NaNs, no `char`s, etc

I am trying to use the lqmm package in R and receiving the error Error in f(arg, ...) : NA/NaN/Inf in foreign function call (arg 1). I can successfully use it for a version of my data in which a variable called cluster_name is averaged over.
I've tried to verify that there are no NaNs or infinite values in my dataset this way:
na_data = mydata
new_DF <- na_data[rowSums(is.na(mydata)) > 0,] # yields a dataframe with no observations
is.na(na_data) <- sapply(na_data, is.infinite)
new_DF <- na_data[rowSums(is.na(mydata)) > 0,] # still a dataframe with no observations
There are no variables in my dataframe that are type char -- every such variable has been converted to a factor.
When I run my model
m1 = lqmm(std_brain ~ std_beh*type*taught, random = ~1, group=subject, data = begin_data, tau=.5, na.action=na.exclude)
on the first 12,528 lines of my dataset, the model works fine. Line 12,529 looks totally normal.
Similarly, if I run tail(mydata, 11943) I get a dataframe that runs without error, but tail(mydata, 11944) gives me a dataframe that generates the error. I can also run a subset from 9990:21825 without error, but extending the dataframe on either side generates the error. The whole dataframe is 29450 observations, and thus this middle slice contains the supposedly problematic observations. I tried making a smaller version of my dataset that contained just the borders of problems, and some observations around them, and I can see that 3/4 cases involve the same subject (7645), but I don't know what to make of that. I don't see how to make this reproducible without providing the whole dataframe (in case you were wondering, the small dataset doesn't cause any error). So here is the csv file I used.
Here is the function that gets the dataframe ready for analysis:
prep_data_set <- function(data_file, brain_var = 'beta', beh_var = 'accuracy') {
data = read.csv(data_file)
data$subject <- factor(data$subject)
data$type <- factor(data$type)
data$type <- relevel(data$type, ref = "S")
data$taught <- factor(data$taught)
data <- subset(data, data$run_num < 13)
data$run = factor(data$run_num)
brain_mean <- mean(data[[brain_var]])
brain_sd <- sd(data[[brain_var]])
beh_mean <- mean(data[[beh_var]])
beh_sd <- sd(data[[beh_var]])
data <- subset(data, data$cluster_name != "")
data$cluster_name <- factor(data$cluster_name)
data$mean_centered_brain <- data[[brain_var]]
data$std_brain <- data$mean_centered_brain/brain_sd
data$mean_centered_beh <- data[[beh_var]]
data$std_beh <- data$mean_centered_beh/beh_sd
return(data)
}
I run
mydata = prep_data_set(file.path(resdir, 'robust0005', 'pos_rel_con__all_clusters.csv'))
m1 = lqmm(std_brain ~ std_beh*type*taught, random = ~1, group=subject, data = mydata, tau=.5, na.action=na.exclude)
to generate the error.
By comparison
regular_model = lmer(std_brain ~ type*taught*std_beh + (1|subject/run) +
(1|subject:cluster_name), data = mydata)
runs fine.
I hope there is something interesting and generalizable in this question; I know it's kind of annoying to post to Stack Overflow with some idiosyncratic problem in a ~30000 line dataset.

Error in array, regression loop using "plyr"

Good morning,
I´m currently trying to run a truncated regression loop on my dataset. In the following I will give you a reproducible example of my dataframe.
library(plyr)
library(truncreg)
df <- data.frame("grid_id" = rep(c(1,2), 6),
"htcm" = rep(c(160,170,175), 4),
stringsAsFactors = FALSE)
View(df)
Now I tried to run a truncated regression on the variable "htcm" grouped by grid_id to receive only coefficients (intercept such as sigma), which I then stored into a dataframe. This code is written based on the ideas of #hadley
reg <- dlply(df, "grid_id", function(.)
truncreg(htcm ~ 1, data = ., point = 160, direction = "left")
)
regcoef <- ldply(reg, coef)
As this code works for one of my three datasets, I receive error messages for the other two ones. The datasets do not differ in any column but in their absolute length
(length(df1) = 4,000; length(df2) = 100,000; length(df3) = 13,000)
The error message which occurs is
"Error in array(x, c(length(x), 1L), if (!is.null(names(x))) list(names(x), : 'data' must be of type vector, was 'NULL'
I do not even know how to reproduce an example where this error code occurs, because this code works totally fine with one of my three datasets.
I already accounted for missing values in both columns.
Does anyone has a guess what I can fix to this code?
Thanks!!
EDIT:
I think I found the origin of error in my code, the problem is most likely about that in a truncated regression model, the standard deviation is calculated which automatically implies more than one observation for any group. As there are also groups with only n = 1 observations included, the standard deviation equals zero which causes my code to detect a vector of length = NULL. How can I drop the groups with less than two observations within the regression code?

Error in panel regression in case of different independent variable r

I am trying to run Fama Macbeth regression by the following code:
require(foreign)
require(plm)
require(lmtest)
fpmg <- pmg(return~max_1,df_all_11, index=c("yearmonth","firms" ))
Fama<-fpmg
coeftest(Fama)
It is working when I regress the data using the independent variable named 'max_1'. However when I change it and use another independent variable named 'ivol_1' the result is showing an error. The code is
require(foreign)
require(plm)
require(lmtest)
fpmg <- pmg(return~ivol_1,df_all_11, index=c("yearmonth","firms" ))
Fama<-fpmg
coeftest(Fama)
the error message is like this:
Error in pmg(return ~ ivol_1, df_all_11, index = c("yearmonth", "firms")) :
Insufficient number of time periods
or sometimes the error is like this
Error in model.frame.default(terms(formula, lhs = lhs, rhs = rhs, data = data, :
object is not a matrix
For your convenience, I am sharing my data with you. The data link is
data frame
I am wondering why this is happening in case of the different variable in the same data frame. I would be grateful if you can solve this problem.
This problem can be solved by mice function
library(mice)
library(dplyr)
require(foreign)
require(plm)
require(lmtest)
df_all_11<-read.csv("df_all_11.csv.part",sep = ",",header = TRUE,stringsAsFactor = F)
x<-data.frame(ivol_1=df_all_11$ivol_1,month=df_all_11$Month)
imputed_Data <- mice(x, m=3, maxit =5, method = 'pmm', seed = 500)
completeData <- complete(imputed_Data, 3)
df_all_11<-mutate(df_all_11,ivol_1=completeData$ivol_1)
fpmg2 <- pmg(return~ivol_1,df_all_11, index=c("yearmonth","firms"))
coeftest(fpmg2)
this problem because the variable ivol_1 have a lots of NA so you should impute the NA first then run the pmg function.

Errors while performing caret tuning in R

I am building a predictive model with caret/R and I am running into the following problems:
When trying to execute the training/tuning, I get this error:
Error in if (tmps < .Machine$double.eps^0.5) 0 else tmpm/tmps :
missing value where TRUE/FALSE needed
After some research it appears that this error occurs when there missing values in the data, which is not the case in this example (I confirmed that the data set has no NAs). However, I also read somewhere that the missing values may be introduced during the re-sampling routine in caret, which I suspect is what's happening.
In an attempt to solve problem 1, I tried "pre-processing" the data during the re-sampling in caret by removing zero-variance and near-zero-variance predictors, and automatically inputting missing values using a carets knn automatic imputing method preProcess(c('zv','nzv','knnImpute')), , but now I get the following error:
Error: Matrices or data frames are required for preprocessing
Needless to say I checked and confirmed that the input data set are indeed matrices, so I dont understand why I get this second error.
The code follows:
x.train <- predict(dummyVars(class ~ ., data = train.transformed),train.transformed)
y.train <- as.matrix(select(train.transformed,class))
vbmp.grid <- expand.grid(estimateTheta = c(TRUE,FALSE))
adaptive_trctrl <- trainControl(method = 'adaptive_cv',
number = 10,
repeats = 3,
search = 'random',
adaptive = list(min = 5, alpha = 0.05,
method = "gls", complete = TRUE),
allowParallel = TRUE)
fit.vbmp.01 <- train(
x = (x.train),
y = (y.train),
method = 'vbmpRadial',
trControl = adaptive_trctrl,
preProcess(c('zv','nzv','knnImpute')),
tuneGrid = vbmp.grid)
The only difference between the code for problem (1) and (2) is that in (1), the pre-processing line in the train statement is commented out.
In summary,
-There are no missing values in the data
-Both x.train and y.train are definitely matrices
-I tried using a standard 'repeatedcv' method in instead of 'adaptive_cv' in trainControl with the same exact outcome
-Forgot to mention that the outcome class has 3 levels
Anyone has any suggestions as to what may be going wrong?
As always, thanks in advance
reyemarr
I had the same problem with my data, after some digging i found that I had some Inf (infinite) values in one of the columns.
After taking them out (df <- df %>% filter(!is.infinite(variable))) the computation ran without error.

MXnet odd error

This is my first ANN so I imagine that there might be a lot of things done wrong here. I don't follow
I'm trying to predict species of flowers from iris data set provided in R language but I get following error:
Error in `dimnames<-.data.frame`(`*tmp*`, value = list(n)) :
invalid 'dimnames' given for data frame
My code:
require(mxnet)
train <- iris[1:130,]
test <- iris[131:150,]
train.data <- as.data.frame(train[-5])
train.label <- data.frame(model.matrix(data=train,object =~Species-1))
test.data <- as.data.frame(test[-5])
test.label <- data.frame(model.matrix(data=test,object =~Species-1))
var1 <- mx.symbol.Variable("data")
layer0 <- mx.symbol.FullyConnected(var1, num.hidden=3)
cat.out <- mx.symbol.SoftmaxOutput(layer0)
net.model <- mx.model.FeedForward.create(cat.out,
array.layout = "auto",
X=train.data,
y=train.label,
eval.data = list(data=test.data,label=test.label),
num.round = 20,
array.batch.size = 20,
learning.rate=0.1,
momentum=0.9,
eval.metric = mx.metric.accuracy)
UPDATE:
I managed to get rid of this error by specifying column to use in labels(traning.label[,1]and test.label[,1]).
However now I'm training my net to predict just one of my binary variables while I have 3 (one for each species).
I had the same problem, turned out that:
train.data should be a matrix
train.label should be a numeric vector
Check these two and hopefully it should work.
I had a similar problem but during the prediction step. It turns out that my features were in a Data Frame which was causing the issue. Once I converted the data frame into a matrix, the issue went away.
pred.values = stats::predict(model,as.matrix(features))
instead of
pred.values = stats::predict(model,features)
So, the features need to be a matrix both during training and during the process of making predictions.

Resources