R won't train my data set

I am currently trying to train my data, but I can't seem to get R to work the way I want.
The data consist of 400 handwritten digits, where each digit is represented by 18x18 = 324 extracted pixel values. In total that gives 400 x 324 data points as training data.
> class(train_data)
[1] "data.frame"
> str(train_data)
'data.frame': 400 obs. of 324 variables:
The code used for training is this:
control = trainControl(method = "cv",
                       number = 1,
                       repeats = 0,
                       p = 0.9,
                       preProcOptions = list(thresh = 0.8))
knnFit = train(x = train_data,
               y = factor(testClass[1:400]),
               method = 'knn',
               trControl = control,
               preProcess = c('PCA'))
The problem is that when I run the training, I get an error message that I am not able to decipher. The error message is:
Error in train.default(x = train_data, y = factor(testClass[1:400]), method = "knn", :
Stopping
In addition: Warning message:
In nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
There were missing values in resampled performance measures.

Just judging from the error, one would assume that there are NAs in the training set.
If you run sum(is.na(train_data)), you can see whether there are missing values in your set. If there are, you can apply the same check column by column to figure out where they're coming from.
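For example, a minimal sketch in base R for locating them (train_data as in the question):

sum(is.na(train_data))                        # total number of NAs
colSums(is.na(train_data))                    # NA count per column
names(which(colSums(is.na(train_data)) > 0))  # columns that contain NAs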

Related

randomForest sometimes predict()s NA on a training dataset

I'm getting strange behaviour from randomForest: I sometimes get NAs predicted on my training dataset! It's seemingly random; see the two runs below getting different results!
> rf <- randomForest(formula(rfFormula), data = df2, ntree = 20, keep.forest = TRUE)
> pr <- predict(rf, type = "response")
> any(is.na(pr))
[1] TRUE
> which(is.na(pr))
1283
1001
>
> rf <- randomForest(formula(rfFormula), data = df2, ntree = 20, keep.forest = TRUE)
> pr <- predict(rf, type = "response")
> any(is.na(pr))
[1] FALSE
> which(is.na(pr))
named integer(0)
There are no NAs in my dataset:
> any(is.na(df2))
[1] FALSE
So why is that? Is it a bug in randomForest? Or some trouble related to OOB predictions?
1) Note that I use 119 variables in the formula.
2) Note that I use predict(rf, type = "response") instead of predict(rf, df2, type = "response"); the latter would be a mistake here, since I need the first form to get the OOB predictions :-)
It was exactly for the reason mentioned by joran. With only 20 trees, it can happen by chance that some observation ends up in the bootstrap sample of all 20 trees, leaving no tree to provide an OOB (out-of-bag) prediction for that observation.
Setting ntree = 100 fixed it.
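A rough back-of-envelope check of this explanation (a sketch, using the standard ~0.632 probability that a given row is in-bag in one bootstrap sample):

# P(row is in-bag in ALL trees) = P(no tree leaves it out-of-bag) ~ 0.632^ntree
p_no_oob <- function(ntree) 0.632^ntree
p_no_oob(20)    # ~1e-04: across thousands of rows and repeated runs, this occasionally happens
p_no_oob(100)   # ~1e-20: effectively never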
PS: The irony is that I actually set ntree = 20 for debugging purposes, to be able to quickly flush out all the errors in my script, and it generated a new, very tough error that wouldn't normally appear :-D So this is how being too diligent can turn out to be counterproductive :-)

Can elastic net in R work with a small data set

I have a small data set with 33 rows and 7 columns. Sorry, I can't share my data as it's client data. I need to build a regression model using this data set and thought I'd use elastic net. I don't have any prior experience implementing elastic net in R, so I referred to this link http://www.sthda.com/english/articles/37-model-selection-essentials-in-r/153-penalized-regression-essentials-ridge-lasso-elastic-net/#elastic-net and followed the steps. My elastic net code is given below:
model <- train(
  dataReg$sales ~ ., data = dataReg, method = "glmnet",
  trControl = trainControl("cv", number = 10),
  tuneLength = 10
)
But I'm getting this error message:
Error in `[.data.frame`(data, , all.vars(Terms), drop = FALSE) :
undefined columns selected
In addition: Warning message:
In nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
There were missing values in resampled performance measures.
I checked my data set: dataReg$sales gives me 33 records, and my data does not contain any missing values. Can you please guide me to resolve the issue?
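For what it's worth, the "undefined columns selected" error typically comes from using the dataReg$ prefix inside the formula: with data = dataReg supplied, caret ends up looking for a column literally named dataReg. A minimal sketch of the corrected call, keeping everything else the same:

library(caret)

model <- train(
  sales ~ ., data = dataReg, method = "glmnet",  # bare column name, no dataReg$ prefix
  trControl = trainControl("cv", number = 10),
  tuneLength = 10
)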

predict.MCMCglmm errors in R

I'm getting back into an old dataset where I had multinomial MCMCglmm models. Understanding the posteriors for these models requires converting the formulas into predictions. I had been using predict() in R in 2015-2016, and at that time it was working in the following format:
pred <- predict.MCMCglmm(model, marginal = model$Random$formula, interval="prediction")
I'm now finding that the same line of code throws an error. To be specific:
Error in if (!grepl("hu|zi|za|multinomial", object$Residual$family[i])) { :
argument is of length zero
I found an old communication indicating that this is because the family value is null, but when I use the suggested fix, model$Residual$family <- rep("gaussian", nrow(model$X)), I still get an error. This time it's:
Error in if (nat == 0) { : argument is of length zero
I can't get anything useful out of the traceback, but here it is just in case:
4. simulate.MCMCglmm(object = object, nsim = nrow(object$Sol), newdata = newdata, marginal = marginal, type = type, it = it, posterior = posterior, verbose = verbose)
3. simulate(object = object, nsim = nrow(object$Sol), newdata = newdata, marginal = marginal, type = type, it = it, posterior = posterior, verbose = verbose)
2. t(simulate(object = object, nsim = nrow(object$Sol), newdata = newdata, marginal = marginal, type = type, it = it, posterior = posterior, verbose = verbose))
1. predict.MCMCglmm(m1.Ed.all, marginal = m1.Ed.all$Random$formula, interval = "prediction")
I'm clearly missing something, but reading back through the documentation and googling have gotten me no further. I'm not even sure which argument is being referred to as length zero here, because the family value is no longer blank; although the fact that the error message changes suggests it does have to do with that value.
As an aside: if there's now a better, smoother way to plot predictions from a multinomial MCMCglmm model, I would be overjoyed to hear about it.
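One way to narrow down which argument has length zero is to inspect what the Residual slot actually contains, since both error paths check fields of object$Residual. A purely diagnostic sketch (m1.Ed.all is the fitted model object from the traceback above):

str(m1.Ed.all$Residual)             # list every field stored in the Residual slot
length(m1.Ed.all$Residual$family)   # 0 here would explain "argument is of length zero"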

brms model not converging

I have this brms model
library(brms)
library(dplyr)
x = rep(c(-20:20, -20:20), 5)
y = c(x[1:41]^2, (x[42:82] + 5)^2)
group = c(rep("A", 41), rep("B", 41))
data = data.frame(x = x, y = y, group = group)
f = brm(y ~ gp(x, cov = "exp_quad") + (1 | group), data = data, control = list(adapt_delta = .95))
f
and the model does not converge. I get these warnings:
Warning messages:
1: The model has not converged (some Rhats are > 1.1). Do not analyse the results!
We recommend running more iterations and/or setting stronger priors.
2: There were 1644 divergent transitions after warmup. Increasing adapt_delta above 0.95 may help.
See http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
Any idea how to get this to fit?
Brian is most likely correct: you have created test data that does not have any variance (noise). Assuming this was a toy dataset for the example and you are working with a real dataset, you need to follow the directions in the warning messages. I would try calling brm with the changes I am making here:
f = brm(y ~ gp(x, cov = "exp_quad") + (1 | group), data = data, control = list(adapt_delta = .99), iter = 6000)
adapt_delta is always a value between 0 and 1, so if you get a warning that you need to set it higher than 0.99, you could try 0.999. You didn't specify the number of iterations in your call, so it went with the default, which I believe is 2000; I have tripled that. Also, if you have multiple cores on your computer, you should set cores = 4 in your call so that each chain can run on its own core, as in the sketch below.
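Putting those suggestions together, a sketch of the full call might look like this (the specific values are illustrative, not definitive):

f = brm(y ~ gp(x, cov = "exp_quad") + (1 | group),
        data = data,
        control = list(adapt_delta = 0.99),  # raise toward 0.999 if divergences persist
        iter = 6000,                         # default is 2000
        cores = 4)                           # one core per chain (4 chains by default)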

Errors while performing caret tuning in R

I am building a predictive model with caret/R and I am running into the following problems:
When trying to execute the training/tuning, I get this error:
Error in if (tmps < .Machine$double.eps^0.5) 0 else tmpm/tmps :
missing value where TRUE/FALSE needed
After some research it appears that this error occurs when there are missing values in the data, which is not the case in this example (I confirmed that the data set has no NAs). However, I also read somewhere that missing values may be introduced during the resampling routine in caret, which I suspect is what's happening here.
In an attempt to solve problem 1, I tried pre-processing the data during the resampling in caret by removing zero-variance and near-zero-variance predictors and automatically imputing missing values with caret's knn imputation method, preProcess(c('zv','nzv','knnImpute')), but now I get the following error:
Error: Matrices or data frames are required for preprocessing
Needless to say, I checked and confirmed that the input data sets are indeed matrices, so I don't understand why I get this second error.
The code follows:
x.train <- predict(dummyVars(class ~ ., data = train.transformed), train.transformed)
y.train <- as.matrix(select(train.transformed, class))
vbmp.grid <- expand.grid(estimateTheta = c(TRUE, FALSE))
adaptive_trctrl <- trainControl(method = 'adaptive_cv',
                                number = 10,
                                repeats = 3,
                                search = 'random',
                                adaptive = list(min = 5, alpha = 0.05,
                                                method = "gls", complete = TRUE),
                                allowParallel = TRUE)
fit.vbmp.01 <- train(
  x = x.train,
  y = y.train,
  method = 'vbmpRadial',
  trControl = adaptive_trctrl,
  preProcess(c('zv','nzv','knnImpute')),
  tuneGrid = vbmp.grid)
The only difference between the code for problems (1) and (2) is that in (1), the pre-processing line in the train statement is commented out.
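One thing worth checking: in the call above, preProcess(c('zv','nzv','knnImpute')) is passed positionally, which calls the preProcess() function directly on a character vector instead of passing an option to train(); that by itself would trigger the "Matrices or data frames are required for preprocessing" error. A sketch of the named-argument form:

fit.vbmp.01 <- train(
  x = x.train,
  y = y.train,
  method = 'vbmpRadial',
  trControl = adaptive_trctrl,
  preProcess = c('zv', 'nzv', 'knnImpute'),  # named argument, not a function call
  tuneGrid = vbmp.grid)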
In summary:
- There are no missing values in the data.
- Both x.train and y.train are definitely matrices.
- I tried using a standard 'repeatedcv' method instead of 'adaptive_cv' in trainControl, with the exact same outcome.
- Forgot to mention that the outcome class has 3 levels.
Does anyone have any suggestions as to what may be going wrong?
As always, thanks in advance
reyemarr
I had the same problem with my data. After some digging I found that I had some Inf (infinite) values in one of the columns.
After taking them out (df <- df %>% filter(!is.infinite(variable))) the computation ran without error.
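If you don't know which column holds the Inf values, a quick sketch for scanning every numeric column (base R only; df stands for your data frame):

sapply(Filter(is.numeric, df), function(col) sum(is.infinite(col)))  # Inf count per numeric column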
