Unclear error with mice package - r

I'm using the mice package to impute some missing values. I've used mice successfully in many cases without any problem. However, I'm now facing a new problem: after the first iteration I get the following error:
mice(my_data)
iter imp variable
1 1 sunlight
Error in cor(xobs[, keep, drop = FALSE], use = "all.obs") : 'x' is empty
I have looked in the documentation but cannot find anything useful. I searched for the error online and found this thread https://stat.ethz.ch/pipermail/r-help/2015-December/434914.html but it does not answer the problem described.
Sadly, I cannot provide a working example, since my_data contains private data that I do not own and therefore cannot make publicly available. my_data is a dplyr data frame (tibble), but there appears to be no difference between using a tibble and a base data frame.
Could anyone please explain to me what is happening and (possibly) how to fix it? Thank you.
EDIT: added some more info on traceback:
cor(xobs[, keep, drop = FALSE], use = "all.obs")
4 remove.lindep(x, y, ry, ...)
3 sampler(p, data, m, imp, r, visitSequence, c(from, to), printFlag,
...)
2 mice::mice(my_data)

Very possibly, some columns in the input data are so highly correlated that certain imputation methods are not applicable.
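One way to check for this in advance, as a sketch in base R (no mice needed; find_problem_columns and the 0.999 cutoff are my own illustration, not part of mice's API): flag constant columns and near-perfectly correlated pairs, which are the kind of linear dependencies that remove.lindep() tries to drop before calling cor().

```r
# Sketch: flag constant columns and near-perfectly correlated column pairs
# that can trip up the imputation machinery. Threshold is illustrative.
find_problem_columns <- function(df, threshold = 0.999) {
  num <- df[sapply(df, is.numeric)]
  # columns that are all-NA or have zero variance among observed values
  constant <- names(num)[sapply(num, function(x) {
    v <- x[!is.na(x)]
    length(v) == 0 || isTRUE(sd(v) == 0)
  })]
  # pairwise correlations on the observed values
  cc <- suppressWarnings(cor(num, use = "pairwise.complete.obs"))
  cc[!lower.tri(cc)] <- 0                 # keep each pair only once
  idx <- which(abs(cc) > threshold, arr.ind = TRUE)
  pairs <- data.frame(var1 = rownames(cc)[idx[, 1]],
                      var2 = colnames(cc)[idx[, 2]])
  list(constant = constant, correlated = pairs)
}
```

Columns flagged this way can be excluded from the data (or from the predictor matrix) before calling mice().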

SuperLearner Error in R - Object 'All' not found

I am trying to fit a model with the SuperLearner package. However, I can't even get past the stage of experimenting with the package to get comfortable with it.
I use the following code:
superlearner <- SuperLearner::SuperLearner(Y = y, X = as.data.frame(data_train[1:30]), family = binomial(), SL.library = list("SL.glmnet"), obsWeights = weights)
y is a numeric vector with one element per row of my dataframe "data_train", containing the correct labels with 9 different classes. The dataframe "data_train" contains 30 columns of numeric data.
When I run this, I get the error:
Error in get(library$screenAlgorithm[s], envir = env) :
Objekt 'All' not found
I don't really know what the problem could be, and I can't really wrap my head around the source code. Please note that the variable obsWeights in the function contains a numeric vector of the same length as my data, with weights I calculated for the model. This shouldn't be the problem, as it doesn't work either way.
Unfortunately I can't really share my data on here, but maybe someone has had this error before...
Thanks!
This seems to happen if you do not attach SuperLearner; you can fix it via library(SuperLearner).
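A base-R sketch of the mechanism (the environments below are illustrative, not SuperLearner's actual internals): the package looks up its default screening wrapper by the name "All", and that lookup only succeeds when the name is reachable from the lookup environment, which attaching the package via library(SuperLearner) ensures.

```r
# Illustrative only: a name lookup succeeds when the defining environment
# is reachable, and fails when it is not.
pkg_env <- new.env()
assign("All", function(X, ...) rep(TRUE, ncol(X)), envir = pkg_env)

found    <- exists("All", envir = pkg_env)                       # reachable
notfound <- exists("All", envir = new.env(parent = emptyenv()))  # unreachable
```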

Getting a subset error I did not get two months ago when running logistic regression (svyglm) on survey data (SPSS dataset)

I re-ran a script that worked with no errors about two months ago.
I used the haven package to load a (non-public and proprietary) SPSS dataset and the survey package to analyze the complex survey data.
Now, however, when I run even a simple logistic regression where both variables are dummies (coded 0 for no and 1 for yes)...something like this...
f <- read_sav("~/data.sav")
fsd <- svydesign(ids=~1, data=f, weights=~f$weight)
model <- svyglm(exclhlth~male,design=fsd,family=quasibinomial())
...I get the following errors:
Error: Must subset elements with a valid subscript vector.
x Subscript has the wrong type `omit`.
ℹ It must be logical, numeric, or character.
Run `rlang::last_error()` to see where the error occurred.
> rlang::last_error()
<error/vctrs_error_subscript_type>
Must subset elements with a valid subscript vector.
x Subscript has the wrong type `omit`.
ℹ It must be logical, numeric, or character.
Backtrace:
1. survey::svyglm(exclhlth ~ male, design = fsd, family = quasibinomial())
2. survey:::svyglm.survey.design(...)
4. survey:::`[.survey.design2`(design, -nas, )
5. base::`[.data.frame`(x$variables, i, ..1, drop = FALSE)
7. vctrs:::`[.vctrs_vctr`(xj, i)
8. vctrs:::vec_index(x, i, ...)
9. vctrs::vec_slice(x, i)
Run `rlang::last_trace()` to see the full context.
I've tried running it where I set male as a factor, and where both are set as factors. I get the same errors.
Since two months ago, I have updated R, RStudio, and both the haven and survey packages. So I'm guessing that something changed, but I am not sure what to do.
I only started transitioning from SPSS to R late last year, so I thank you in advance for any guidance and apologize in advance for newbie mistakes!
Ok, your problem seems to be that the RStudio data import functions are creating classes that hijack the subscript ([) operation. This has happened before, when RStudio switched from creating data.frame to tbl objects, but then it was sufficient to use as.data.frame() before calling svydesign().
Until a new version of the survey package is available, can you try using foreign::read.spss instead of haven::read_sav?
(Also, if you could come up with a less-confidential example and send it to the maintainer, I'm fairly sure he'd be grateful.)
Update: the issue is that the output of na.omit has class omit, and some of the variables have class haven_labelled, and the subsetting operator for haven_labelled is very fussy about the class of its arguments: it has to be plain integer or logical, without a class.
The help for the labelled class suggests using haven::as_factor or haven::zap_labels to coerce these labelled vectors to a standard R class.
Further update: I filed a github issue for the haven package, which was moved to the vctrs package, so this behaviour is likely to be changed.
Further further update: This has been fixed in the development version of vctrs
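The failure mode described above can be reproduced in a few lines of base R (no survey or haven needed; this is a sketch of the mechanism, not the packages' actual code): na.omit() records the dropped positions in an attribute of class "omit", and a subscript that keeps that class can be rejected by a strict "[" method, while stripping the class gives a plain integer subscript.

```r
x <- c(1, NA, 3)
nas <- attr(na.omit(x), "na.action")  # positions of the dropped values
class(nas)                            # "omit" -- a classed subscript
plain <- as.integer(nas)              # stripping the class gives plain 2L
x[-plain]                             # plain integer subscript: c(1, 3)
```

This is why coercing the labelled columns with haven::zap_labels() or haven::as_factor() before svydesign() sidesteps the error.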

KSVM (in r) giving - Error in indexes[[j]] : subscript out of bounds

I have been running into this error every time I try to implement ksvm.
My code:
Train11<- read.csv('Train.csv', head=TRUE)
Train11 <- (sapply(Train11, as.numeric)) #convert all data to numeric
Train11 <- as.data.frame(Train11)
ModelV2<-ksvm(CityAssessment~., data=Train11, type= "C-svc", kernel="vanilladot", C=0.1,prob.model=TRUE)
Setting default kernel parameters
Error in indexes[[j]] : subscript out of bounds
I am not sure where I am going wrong. The dimensions of the dataset are 686 x 72. There aren't any NA values in the dataset (I've checked!) and no infinite values either.
Many thanks!
I had the same problem; it turned out I had only one class in my target vector.
For anyone reading this in the future: I had the same problem.
This is likely due to the way the kernlab package handles class probabilities (prob.model = TRUE) internally. If n is small or the classes are severely imbalanced, the internal 3-fold CV fails, probably for the reason user2173836 described.
Solutions:
1.) Set ksvm(..., prob.model = FALSE)
or
2.) Only run models with a large enough n and class balance. For my problem, running many single SVMs as baseline comparison to MTL-SVM, I could just skip over these "bad" tasks.
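A small guard in that spirit, as a sketch (safe_prob_model and the min_per_class = 3 threshold are my own illustration, not a documented kernlab requirement): only enable prob.model when every class has enough observations to survive the internal 3-fold CV.

```r
# Sketch: decide whether prob.model = TRUE looks safe for a target vector y.
safe_prob_model <- function(y, min_per_class = 3) {
  counts <- table(y)
  length(counts) >= 2 && all(counts >= min_per_class)
}

# usage sketch, with the ksvm call from the question:
# ModelV2 <- ksvm(CityAssessment ~ ., data = Train11, type = "C-svc",
#                 kernel = "vanilladot", C = 0.1,
#                 prob.model = safe_prob_model(Train11$CityAssessment))
```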

ImpulseDE2, matrix counts contains non-integer elements

Possibly it's a stupid question (but be patient, I'm a beginner in R's world)... I'm working with ImpulseDE2, a package designed for RNA-seq data analysis across different time points (see the article for more information).
The main function (runImpulseDE2) requires a count matrix and an annotation data frame. I've created both, but this error message appears:
Error in checkCounts(matCountData, "matCountData"): ERROR: matCountData contains non-integer elements. Requires count data.
I have tried several things and nothing seems to work (and I've not found any solution on the Internet):
as.matrix(data)
(data + 1), since there aren't any NAs or zero values that could cause this error (which(is.na(data)) and which(data < 1) both return integer(0))
as.numeric(data), which produces another error: ERROR: [Rownames of matCountData] was not given as input.
I think there's something I'm not seeing, but I'm totally stuck. Every tip will be welcome!
And here is the (silly) solution! The function does not accept floating-point numbers... so applying a simple round() is enough to resolve the error.
Thanks for your help!
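For completeness, a sketch of that fix which also keeps the rownames that runImpulseDE2() checks (the matrix here is made up; note that as.numeric() on a matrix drops the dimnames, which would explain the second error about rownames):

```r
# Hypothetical count matrix with non-integer values, e.g. from an
# estimated-counts pipeline.
mat <- matrix(c(1.2, 3.7, 0.4, 5.9), nrow = 2,
              dimnames = list(c("geneA", "geneB"), c("s1", "s2")))

mat_int <- round(mat)                 # round to whole counts
storage.mode(mat_int) <- "integer"    # true integer storage, dimnames kept
```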

R Error in `row.names<-.data.frame`(`*tmp*`, value = value) while using tell of the sensitivity package

I am conducting a sensitivity study using the sensitivity package. When trying to calculate the sensitivity indices with the output data of the external model, I get the error specified in the title.
The output is a three-column table stored in a CSV file, which I read in as follows:
day1 <- read.csv("day_1_outputs.csv",header=FALSE)
Now, when I try to calculate sensitivity indices with the output of the first column:
tell(sob.pars,day1[,1])
I get:
Error in `row.names<-.data.frame`(`*tmp*`, value = value) :
invalid 'row.names' length
At first I thought I should use a matrix-like object, because in another study I generated the output from a raster image read in as a matrix, which worked fine; but that didn't help.
The help page for tell says to use a vector for the model results, but even if I extract the column from the dataframe into a vector before calling tell, the problem persists.
I guess my main problem is that I don't understand the error message in conjunction with the tell function: sob.pars is an object returned by one of the sensitivity analysis constructors from the same package, so I don't know which row names of that object the message is referring to.
Any hint is appreciated.
Finally found out what the problem was. The error is kind of misleading.
The problem was not the row names, since these were identical; that's what irritated me in the first place. There was obviously nothing wrong with them.
The actual problem was the column names in sob.pars: these were missing. Once I added them, everything worked fine. Thanks rawr anyway (I only now noticed someone had commented on the question; I thought I would be notified when this happens, but I guess not).