R ffdf Error: is.null(names(x)) is not TRUE

I'm trying to understand what a particular error message means in R.
I'm using an ff data frame (ffdf) to build an rpart model. That works fine.
However, when I try to apply the predict function (or any function) using ffdfdply, I get a cryptic error message that I can't seem to crack. I'm hoping someone here can shed light on its meaning.
PredictedData <- ffdfdply(x = TrainingData, split = TrainingData$id,
  FUN = function(x) {
    x$Predicted <- predict(Model1, newdata = x)
    x
  })
If I've thought about this correctly, ffdfdply will take the TrainingData table, split it into chunks based on TrainingData$id, and apply the predict function to each chunk using the model Model1. The function then returns each chunk (it's labelled x in the FUN argument), and ffdfdply combines the chunks back together into the table PredictedData. PredictedData should be the same as TrainingData, except with an additional column called "Predicted".
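For readers without ff at hand, the split, apply, combine flow described above can be sketched in memory with base R. This is only an analogue under stated assumptions: the toy data, the columns x1 and y, and the lm model standing in for the rpart model are all made up, and ffdfdply itself is not called.

```r
# Toy stand-in for TrainingData; the columns x1 and y are hypothetical.
TrainingData <- data.frame(
  id = rep(1:3, each = 4),
  x1 = 1:12,
  y  = 2 * (1:12) + rep(c(0.1, -0.1), 6)
)

# lm is used here only as a convenient stand-in for the rpart model.
Model1 <- lm(y ~ x1, data = TrainingData)

# Split by id, add the prediction to each chunk, then stack the chunks again.
chunks <- split(TrainingData, TrainingData$id)
scored <- lapply(chunks, function(x) {
  x$Predicted <- predict(Model1, newdata = x)
  x
})
PredictedData <- do.call(rbind, scored)
```

The result has the same rows and columns as the input plus the Predicted column, which is the shape the ffdfdply call is expected to produce.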
However, when I run this, I get a rather unhelpful error message.
2014-07-16 21:16:17, calculating split sizes
2014-07-16 21:16:36, building up split locations
2014-07-16 21:17:02, working on split 1/30, extracting data in RAM of 32 split elements, totalling, 0.07934 GB, while max specified data specified using BATCHBYTES is 0.07999 GB
Error: is.null(names(x)) is not TRUE
In addition: Warning message:
In ffdfdply(x = TrainingData, split = TrainingData$id, FUN = function(x) { :
split needs to be an ff factor, converting using as.character.ff to an ff factor
Yes, every column has a name. Those names contain just alphanumeric characters plus periods. But the error message makes me think that the columns should not have names? I guess I'm confused about what this means exactly.
I appreciate any hints anyone can provide and I'll be happy to provide more detail.

I think I found the solution to this.
Turns out I had periods in my column names. When I removed those periods, this worked perfectly.
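A minimal sketch of that rename, assuming underscores are an acceptable replacement. It uses a small made-up in-memory data frame with hypothetical column names; whether the identical assignment behaves the same on an ffdf is an assumption I have not verified here.

```r
# Toy stand-in for TrainingData; the dotted column names are hypothetical.
TrainingData <- data.frame(
  id = 1:3,
  cust.age = c(31, 45, 27),
  acct.balance = c(10.5, 0, 99)
)

# fixed = TRUE treats "." as a literal period rather than the regex "any character".
clean_names <- gsub(".", "_", names(TrainingData), fixed = TRUE)
names(TrainingData) <- clean_names

names(TrainingData)
```

Since predict matches variables by name, the model would also need to be refit on the renamed columns.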

Related

SuperLearner Error in R - Object 'All' not found

I am trying to fit a model with the SuperLearner package. However, I can't even get past the stage of playing with the package to get comfortable with it....
I use the following code:
superlearner<-SuperLearner::SuperLearner(Y=y, X=as.data.frame(data_train[1:30]), family =binomial(), SL.library = list("SL.glmnet"), obsWeights = weights)
y is a numeric vector of the same length as my dataframe "data_train", containing the correct labels with 9 different classes. The dataframe "data_train" contains 30 columns with numeric data.
When I run this, I get the error:
Error in get(library$screenAlgorithm[s], envir = env) :
object 'All' not found
I don't really know what the problem could be and I can't really wrap my head around the source code. Please note that the variable obsWeights in the function contains a numeric vector of the same length as my data, with weights I calculated for the model. This shouldn't be the problem, as it doesn't work either way.
Unfortunately I can't really share my data on here, but maybe someone has had this error before...
Thanks!
This seems to happen if you do not attach SuperLearner. You can fix it via library(SuperLearner).
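The difference between calling a function through :: and attaching a package can be seen with a package that ships with R. In this sketch parallel merely stands in for SuperLearner (an assumption, made so the example runs without extra installs). The point is that pkg::fun() does not put the package's other exports on the search path, which is presumably why the lookup of 'All' fails.

```r
# Helper (hypothetical name) that reports whether parallel is attached.
on_path <- function() "package:parallel" %in% search()

before <- on_path()                 # usually FALSE in a fresh session
n_cores <- parallel::detectCores()  # works through :: without attaching
after_colon <- on_path()            # the :: call did not attach anything

library(parallel)                   # attaching puts the exports on the search path
after_library <- on_path()
found <- exists("mclapply")         # an exported function can now be found by name
```

With SuperLearner the equivalent step is calling library(SuperLearner) before the model call.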

In mclapply : scheduled cores 9 encountered errors in user code, all values of the jobs will be affected

I went through the existing Stack Overflow questions regarding this error, but none of the solutions given there work (and some of the questions don't have solutions either).
Here is the problem I am facing:
I run Arima models in parallel using mclapply from the parallel package. The sample data is split by key across different cores and the results are combined using do.call + rbind (the server I run the script on has 20 CPU cores, and that number is passed to the mc.cores argument).
Below is my mclapply code:
print('Before lapply')
data_sub <- do.call(rbind, mclapply(ds,predict_function,mc.cores=num_cores))
print('After lapply')
I get multiple sets of values like the one below as output of 'predict_function'
So basically, I get the data as given above from multiple cores to be sent to rbind. The code works perfectly for some part of the data. Now I get another set of data, the same as above with the same data type in each column, but different values in column 2.
The data type of each column is given in the column name above.
For the second case, I get below error:
simpleError in charToDate(x): character string is not in a standard unambiguous format
Warning message:
In mclapply(ds, predict, mc.cores = num_cores) :
scheduled cores 9 encountered errors in user code, all values of the jobs will be affected
I don't see the print('After lapply') output in the second case, but it is visible in the first case.
I checked the date column in the above data frame; it's in Date format. When I tried unique(df$DATE), it returned all valid values in the format given above.
What is the cause of the error here? Is it the first error that prevents mclapply from being able to rbind the values? Is the warning something we need to understand better?
Any advice would be greatly appreciated.
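One way to find out which split triggers the error is to catch failures inside the worker and inspect them before calling rbind. The sketch below is a guess at the setup, not the poster's code: predict_function here is a hypothetical stand-in, the two toy chunks are made up, and mc.cores = 1L is used only so the example runs anywhere.

```r
library(parallel)

# Hypothetical stand-in for the poster's predict_function.
predict_function <- function(chunk) {
  # as.Date() on a character string raises an error when the string
  # cannot be parsed with its default formats, e.g. "July 16, 2014".
  chunk$DATE <- as.Date(chunk$DATE)
  chunk
}

# Return the error condition as a value instead of letting the job fail.
safe_predict <- function(chunk) {
  tryCatch(predict_function(chunk), error = function(e) e)
}

# Toy input: one chunk with a parseable date string, one without.
ds <- list(
  a = data.frame(KEY = "a", DATE = "2014-07-16", stringsAsFactors = FALSE),
  b = data.frame(KEY = "b", DATE = "July 16, 2014", stringsAsFactors = FALSE)
)

results <- mclapply(ds, safe_predict, mc.cores = 1L)

# Flag the jobs that returned an error and keep their messages for inspection.
failed <- vapply(results, inherits, logical(1), what = "error")
failed_messages <- vapply(results[failed], conditionMessage, character(1))

# Bind only the jobs that succeeded.
data_sub <- do.call(rbind, results[!failed])
```

The names of the failed elements point at the keys whose data needs a closer look, for example a date column that is stored as character in one split.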

ImpulseDE2, matrix counts contains non-integer elements

Possibly it's a stupid question (but be patient, I'm a beginner in R's world)... I'm working with ImpulseDE2, a package designed for RNA-seq data analysis across time points (see article for more information).
The main function (runImpulseDE2) requires a count matrix and an annotation data frame. I've created both, but this error message appears:
Error in checkCounts(matCountData, "matCountData"): ERROR: matCountData contains non-integer elements. Requires count data.
I have tried some solutions and nothing seems to work (and I've not found any solution in the Internet)...
as.matrix(data)
(data + 1) > there are no NAs or zero values that could cause this error (which(is.na(data)) and which(data < 1) both return integer(0))
as.numeric(data) > another error appears: ERROR: [Rownames of matCountData] was not given as input.
I think there's something I'm not seeing, but I'm totally stuck. Any tip will be welcome!
And here is the (silly) solution! This function does not seem to accept floating-point numbers... so applying a simple round is enough to solve this error.
Thanks for your help!
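A small base R sketch of why round helps where as.numeric did not, using a made-up count matrix with hypothetical gene and sample names. ImpulseDE2 itself is not called here; the sketch only shows what each function does to the matrix.

```r
# Toy count matrix with fractional values; all names are hypothetical.
counts <- matrix(
  c(10, 3.7, 0.2, 8),
  nrow = 2,
  dimnames = list(c("geneA", "geneB"), c("sample1", "sample2"))
)

# as.numeric() flattens the matrix, so the row names are lost.
flat <- as.numeric(counts)

# round() keeps the matrix shape and the dimnames.
rounded <- round(counts)

# Check that every value is now a whole number.
whole <- all(rounded %% 1 == 0)
```

Whether rounding is appropriate depends on how the fractional counts were produced, so this is a workaround to check against the package documentation rather than a general recommendation.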

R lapply-split-rbindlist - does subset cause problems?

I'm sure this will be very easy as I'm still an R beginner but here goes...
I've started with a data frame which I've successfully put through lapply-split followed by rbindlist to regenerate as a dataframe.
From this same data set, I've subset some data and performed lapply-split followed by rbindlist and get the following error:
"Error in rbindlist(df) : Item 1 of list input is not a data.frame,
data.table or list"
This is confusing since it's the same (sub)set of data being split by the same parameter.
When I call:
df[1]
I get:
$SWS1Ami
[1] 13451.02
which is the mean value I wanted to calculate for the SWS1Ami group (so it seems to have done the lapply split correctly). When I call:
typeof(df[1])
I see it tells me this element(?) type is a list.
Two questions:
(1) What could cause rbindlist to not work after doing lapply-split? Why does this seem to sometimes work and sometimes not work?
(2) Is there a quick litmus test to tell if your dataframe is in the "right" setup to undergo lapply-split-rbindlist?
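A hedged sketch of what may be going on, using made-up data (only the group name SWS1Ami comes from the question). Single brackets on a list always return a list, so typeof(df[1]) says "list" even when the item inside is a bare number, and a bare number is not something rbindlist accepts as an item. Checking each item with is.list is one possible litmus test.

```r
# Toy data; the column names and values are hypothetical.
dat <- data.frame(
  group = c("SWS1Ami", "SWS1Ami", "Other"),
  value = c(1, 3, 10)
)
pieces <- split(dat, dat$group)

# Returning a bare number per group gives a list of numeric vectors.
means_bare <- lapply(pieces, function(p) mean(p$value))

# Returning a one-row data frame per group gives a list of data frames.
means_df <- lapply(pieces, function(p) data.frame(mean = mean(p$value)))

type_single <- typeof(means_bare[1])    # single brackets: still a list
type_double <- typeof(means_bare[[1]])  # double brackets: the item itself

# Litmus test: every item should itself be a list (data frames are lists).
ok_bare <- all(vapply(means_bare, is.list, logical(1)))
ok_df   <- all(vapply(means_df, is.list, logical(1)))

# Only run rbindlist when data.table is installed.
if (requireNamespace("data.table", quietly = TRUE)) {
  combined <- data.table::rbindlist(means_df, idcol = "group")
}
```

If the subset step changed what the function inside lapply returns, for example a number instead of a data frame, that would explain why the same pipeline works on one data set and not on the other.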

R: partimat function doesn't recognize my classes

I am a relatively novice R user and am attempting to use the partimat() function within the klaR package to plot decision boundaries for a linear discriminant analysis, but I keep encountering the same error. I have tried inputting the arguments in multiple different ways according to the manual, but keep getting the following error:
Error in partimat.default(x, grouping, ...) :
at least two classes required
Here is an example of the input I've given:
partimat(sources1[,c(3:19)],grouping=sources1[,2],method="lda",prec=100)
where my data table is loaded in under the name "sources1" with columns 3 through 19 containing the explanatory variables and column 2 containing the classes. I have also tried doing it by entering the formula like so:
partimat(sources1$group~sources1$tio2+sources1$v+sources1$cr+sources1$co+sources1$ni+sources1$rb+sources1$sr+sources1$y+sources1$zr+sources1$nb+sources1$la+sources1$gd+sources1$yb+sources1$hf+sources1$ta+sources1$th+sources1$u,data=sources1)
with these being the column headings.
I have successfully run an LDA on this same data set without issue so I'm not quite sure what is wrong.
The source code of the partimat.default function, which you can view with getAnywhere(partimat.default), contains
if (nlevels(grouping) < 2)
stop("at least two classes required")
Therefore maybe you haven't defined your grouping column as a factor variable. If you try summary(sources1[,2]) what do you get? If it's not a factor, try
sources1[,2] <- as.factor(sources1[,2])
Or in method 2 try removing the "sources1$" from each of your variable names in the formula, since you already specify the data frame in which to look for these variable names in the data argument. I think you are effectively specifying the data frame twice and it might be looking, for instance, for
"sources1$sources1$groups"
Rather than
"sources1$groups"
Without further error messages or a reproducible example (i.e. include some data in your post) it's hard to say really.
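Both suggestions can be tried on a toy data frame. This is a sketch under assumptions: the values are made up, only two of the explanatory columns are included, and the grouping column is assumed to be stored as character. klaR is not loaded here, so partimat itself is not run.

```r
# Toy stand-in for sources1; values are made up, column names follow the question.
sources1 <- data.frame(
  id = 1:4,
  group = c("A", "B", "A", "B"),
  tio2 = c(0.8, 1.2, 0.9, 1.1),
  v = c(120, 95, 130, 90),
  stringsAsFactors = FALSE
)

# A character column has no levels, so the nlevels() check in the source fails.
levels_before <- nlevels(sources1[, 2])

# After conversion to a factor the two classes are visible.
sources1[, 2] <- as.factor(sources1[, 2])
levels_after <- nlevels(sources1[, 2])

# Formula built from bare column names; the data frame is then given once via data.
f <- reformulate(c("tio2", "v"), response = "group")
```

With klaR attached, a call along the lines of partimat(f, data = sources1, method = "lda") would be the thing to try next.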
HTH
