Can elastic net in R work with a small data set? - r

I have a small data set with 33 rows and 7 columns. Sorry, I'm not able to share my data as it's client data. I need to build a regression model using this data set, and I thought of using elastic net. I don't have any prior experience implementing elastic net in R, so I referred to this link http://www.sthda.com/english/articles/37-model-selection-essentials-in-r/153-penalized-regression-essentials-ridge-lasso-elastic-net/#elastic-net and followed the steps. My elastic net code is given below:
model <- train(
  dataReg$sales ~ ., data = dataReg, method = "glmnet",
  trControl = trainControl("cv", number = 10),
  tuneLength = 10
)
But I'm getting the error message
Error in `[.data.frame`(data, , all.vars(Terms), drop = FALSE) :
undefined columns selected
In addition: Warning message:
In nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
There were missing values in resampled performance measures.
I checked my data set: dataReg$sales gives me 33 records, and my data does not contain any missing values. Can you please guide me on how to resolve the issue?
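A minimal sketch of how this call is usually written, assuming the response column in dataReg is literally named sales: with data = dataReg supplied, the formula should reference the bare column name rather than dataReg$sales, which is a common source of the "undefined columns selected" error.
# Sketch only: the response is referenced by its bare column name because
# data = dataReg already tells train() where to look for it.
library(caret)

set.seed(123)  # any seed, just so the CV folds are reproducible
model <- train(
  sales ~ ., data = dataReg, method = "glmnet",
  trControl = trainControl("cv", number = 10),
  tuneLength = 10
)
model$bestTune  # best alpha/lambda pair found over the tuning grid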

Related

error when using causalweights package in R

I was trying to estimate a causal effect using inverse probability weighting from the causalweights package. However, I keep running into the following error message:
Error in model.frame.default(formula = d ~ x, drop.unused.levels = TRUE) :
variable lengths differ (found for 'x')
I want to estimate the causal effect while taking into account a matrix of multiple control variables. When using a single control from the data set, R manages to generate an estimate, but when I try to use the matrix including all my control variables, I receive the above-mentioned error message.
My code is as follows; it generates estimates when a single control is used instead of my predefined matrix of multiple controls:
attach(data_clean2)
controls <- cbind(marits_1, nationality1, mother_tongue1, educ1,
                  lastj_fct1, child_subsidies, contr_2y,
                  unempl_r, gdp_gr, insured_earn)
ipw_atet <- treatweight(y = duration_ue2,  # take initial data
                        d = treatment,
                        x = controls,
                        ATET = TRUE,       # if = FALSE, estimates ATE (default)
                        trim = (1 - pscore_max0),
                        boot = 2)
Has anyone encountered similar problems and found a solution?
Thanks in advance
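The "variable lengths differ" error usually means y, d, and x were not built from exactly the same rows, for example because attach() picks up objects of a different length or because some controls contain NAs. A hedged sketch that builds everything from one complete-case subset of data_clean2 (column names and treatweight() arguments taken from the code above):
# Sketch only: construct y, d and the control matrix from the same rows so
# their lengths are guaranteed to match.
vars <- c("duration_ue2", "treatment", "marits_1", "nationality1",
          "mother_tongue1", "educ1", "lastj_fct1", "child_subsidies",
          "contr_2y", "unempl_r", "gdp_gr", "insured_earn")
cc <- data_clean2[complete.cases(data_clean2[, vars]), vars]

# data.matrix() turns any factor columns into their integer codes,
# mirroring what cbind() did in the original code.
controls <- data.matrix(cc[, setdiff(vars, c("duration_ue2", "treatment"))])

ipw_atet <- treatweight(y = cc$duration_ue2,
                        d = cc$treatment,
                        x = controls,
                        ATET = TRUE,
                        trim = (1 - pscore_max0),  # pscore_max0 as defined elsewhere in the original script
                        boot = 2)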

Spatial panel model in R (spml): error with balanced spatial panel data

I looked at many posts on similar issues but didn't find a solution for my case.
I have balanced panel data for the CBSAs in the US from 2005-2015. When I run a fixed effects model with the following command, I get reasonable results and the panel is shown to be balanced.
plm(formula = fm, data = pCBSA_balanced, effect = "twoway", model = "within")
Balanced Panel: n = 938, T = 11, N = 10318
(actual results omitted...)
However, when I proceed to run the same model with a spatial lag using the following command (and its variations),
spml(formula = fm, data = pCBSA_balanced, listw = newCBSA_nb.lw,
     model = "within", effect = "twoways", lag = TRUE,
     zero.policy = T, na.action = na.pass)
I get the following error...
Error in if (!balanced) stop("Estimation method unavailable for unbalanced panels") : missing value where TRUE/FALSE needed
I made sure there are no missing values in my panel data, ordered the panel to have id variable CBSAFP and year as my first and second variables, then sorted by year as suggested by other posts in similar situations, but the error persisted :(
I suspect the error might have something to do with the 42 non-neighboring CBSAs in the weighting matrix. This is how I defined my listw, with the zero.policy option:
newCBSA_nb <- poly2nb(CBSA4p, queen = TRUE, row.names = CBSA4p$CBSAFP)
newCBSA_nb.lw <- nb2listw(newCBSA_nb, style="W", zero.policy=T)
I'm not willing to just get rid of the non-neighboring CBSAs for my analysis; does anyone have any suggestions on what I should do next?
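One way to check whether those zero-neighbour CBSAs are what trips the balance check is to look at the neighbour cardinalities directly; a small diagnostic sketch using spdep and plm, with the object names taken from the code above:
library(spdep)

# How many CBSAs have no neighbours at all under the queen definition?
no_nbs <- which(card(newCBSA_nb) == 0)
length(no_nbs)           # expected to be 42 if the suspicion above is right
CBSA4p$CBSAFP[no_nbs]    # identifiers of the isolated CBSAs

# For comparison: how does plm itself see the panel dimensions?
library(plm)
pdim(pCBSA_balanced)     # prints n, T, N and the balanced flag (if this is a pdata.frame)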

Errors while performing caret tuning in R

I am building a predictive model with caret/R and I am running into the following problems:
When trying to execute the training/tuning, I get this error:
Error in if (tmps < .Machine$double.eps^0.5) 0 else tmpm/tmps :
missing value where TRUE/FALSE needed
After some research it appears that this error occurs when there are missing values in the data, which is not the case in this example (I confirmed that the data set has no NAs). However, I also read somewhere that the missing values may be introduced during the re-sampling routine in caret, which I suspect is what's happening.
In an attempt to solve problem 1, I tried "pre-processing" the data during the re-sampling in caret by removing zero-variance and near-zero-variance predictors and automatically imputing missing values using caret's kNN imputation method, preProcess(c('zv','nzv','knnImpute')), but now I get the following error:
Error: Matrices or data frames are required for preprocessing
Needless to say, I checked and confirmed that the input data sets are indeed matrices, so I don't understand why I get this second error.
The code follows:
x.train <- predict(dummyVars(class ~ ., data = train.transformed), train.transformed)
y.train <- as.matrix(select(train.transformed, class))

vbmp.grid <- expand.grid(estimateTheta = c(TRUE, FALSE))

adaptive_trctrl <- trainControl(method = 'adaptive_cv',
                                number = 10,
                                repeats = 3,
                                search = 'random',
                                adaptive = list(min = 5, alpha = 0.05,
                                                method = "gls", complete = TRUE),
                                allowParallel = TRUE)

fit.vbmp.01 <- train(
  x = (x.train),
  y = (y.train),
  method = 'vbmpRadial',
  trControl = adaptive_trctrl,
  preProcess(c('zv','nzv','knnImpute')),
  tuneGrid = vbmp.grid)
The only difference between the code for problem (1) and (2) is that in (1), the pre-processing line in the train statement is commented out.
In summary:
- There are no missing values in the data.
- Both x.train and y.train are definitely matrices.
- I tried using a standard 'repeatedcv' method instead of 'adaptive_cv' in trainControl, with the same exact outcome.
- I forgot to mention that the outcome class has 3 levels.
Does anyone have any suggestions as to what may be going wrong?
As always, thanks in advance
reyemarr
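For the second error specifically: in the train() call above, preProcess(c('zv','nzv','knnImpute')) is evaluated as a call to the preProcess() function on a character vector rather than being passed as an argument, which would explain the "Matrices or data frames are required for preprocessing" message. A hedged sketch of the usual way to hand pre-processing options to train(), reusing the objects defined in the question:
# Sketch only: preProcess is a named argument of train(), not a separate
# function call inside it.
fit.vbmp.01 <- train(
  x = x.train,
  y = y.train,
  method = 'vbmpRadial',
  trControl = adaptive_trctrl,
  preProcess = c('zv', 'nzv', 'knnImpute'),
  tuneGrid = vbmp.grid)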
I had the same problem with my data; after some digging I found that I had some Inf (infinite) values in one of the columns.
After taking them out (df <- df %>% filter(!is.infinite(variable))), the computation ran without error.
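A small sketch extending that check, to see which columns (if any) hold infinite values before filtering; df stands for the training data frame, as in the answer above:
# Count Inf/-Inf per column; non-numeric columns are reported as 0.
sapply(df, function(x) if (is.numeric(x)) sum(is.infinite(x)) else 0L)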

R won't train my data set

I am at the moment trying to train my data, but I can't seem to get R to work as I want it to.
The data consist of 400 handwritten digits, where for each handwritten digit an 18x18 pixel image is extracted. So in total there are 400 x 324 data points as training data.
> class(train_data)
[1] "data.frame"
> str(train_data)
'data.frame': 400 obs. of 324 variables:
The code used for training is this
control = trainControl(method="cv",
number = 1,
repeats=0,
p = 0.9,
preProcOptions = list(thresh = 0.8),
)
knnFit = train(x=train_data,
y=factor(testClass[1:400]),
method ='knn',
trControl = control,
preProcess = c('PCA')
)
The problem is that when I perform the training, I get an error message from which I am not able to decipher what the problem is.
The error message is:
Error in train.default(x = train_data, y = factor(testClass[1:400]), method = "knn", :
Stopping
In addition: Warning message:
In nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
There were missing values in resampled performance measures.
Just judging from the error, one would assume that there are NAs in the training set.
If you run sum(is.na(train_data)), you should be able to see whether there are missing values in your set. If there are, then you could use the same command column-by-column to figure out where they're coming from.
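A short sketch of that column-by-column check, using the train_data object from the question:
colSums(is.na(train_data))                          # NA count for each pixel column
names(train_data)[colSums(is.na(train_data)) > 0]   # columns that actually contain NAs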

Issues with predict function when building a CART model via CrossValidation using the train command

I am trying to build a CART model via cross validation using the train function of "caret" package.
My data is a 4500 x 110 data frame, where all the predictor variables (except the first two, UserId and YOB (Year of Birth), which I am not using for model building) are factors with 2 levels, except the dependent variable, which is of type integer (although it has only two values, 1 and 0). Gender is one of the independent variables.
When I ran the rpart command to get a CART model (using the package "rpart"), I didn't have any problem with the predict function. However, I wanted to improve the model via cross validation, and so I used the train function from the package "caret" with the following command:
tr = train(y ~ ., data = subImpTrain, method = "rpart", trControl = tr.control, tuneGrid = cp.grid)
This built the model with the following warning:
Warning message:
In nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
There were missing values in resampled performance measures.
But it did give me a final model (best.tree). However, when I try to run the predict function using the following command:
best.tree.pred = predict(best.tree, newdata = subImpTest)
on the test data, it is giving me the following error:
Error in eval(expr, envir, enclos) : object 'GenderMale' not found
The Gender variable has two values: Female, Male
Can anybody help me understand this error?
As #lorelai suggested, caret dummy-codes your variables if you supply it a formula. An alternative is to provide it the variables themselves, like so:
tr = train(y = subImpTrain$y,
           x = subImpTrain[, names(subImpTrain) != "y"],  # all columns except the outcome
           method = "rpart", trControl = tr.control, tuneGrid = cp.grid)
More importantly, however, you shouldn't use predict.rpart and instead use predict.train, like so:
predict(tr, subImpTest)
In which case it would work just fine with the formula interface.
I have had a similar problem in the past, although concerning another algorithm.
Basically, some algorithms transform the factor variables into dummy variables and rename them accordingly.
My solution was to create my own dummies and leave them in numerical format.
I read that decision trees manage to work properly even so.
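A minimal sketch of that manual-dummies idea, assuming subImpTrain holds the outcome y plus the factor predictors and subImpTest holds only the predictors:
# model.matrix expands the factor predictors into numeric 0/1 dummy columns;
# applying the same right-hand side to the test set keeps the column names identical.
X_train <- model.matrix(y ~ ., data = subImpTrain)[, -1]   # drop the intercept column
X_test  <- model.matrix(~ ., data = subImpTest)[, -1]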
