I am using the Amelia package in R to handle missing values. I get the error below when I try to train a random forest on the imputed data. I am not sure how to convert the amelia object to a data frame that would be the right input for the randomForest function in R.
train_data<-read.csv("train.csv")
sum(is.na(train_data))
impute<- amelia(x=train_data,m=5,idvars=c("X13"), interacs=FALSE)
impute<= as.data.frame(impute)
for(i in 1:impute$m) {
model <- randomForest(Y ~X1+X2+X3+X4+X5+X6,
data= as.data.frame(impute))
}
Error in as.data.frame.default(impute) :
cannot coerce class "amelia" to a data.frame
If I pass impute$imputations[[i]] to randomForest as the input, I get the error below:
model <- randomForest(Y ~X1+X2+X3+X4+X5+X6,
impute$imputations[[i]])
Error: $ operator is invalid for atomic vectors
Can anyone suggest how I can solve this problem? It would be a great help.
So, I think the first problem is this line here:
impute<= as.data.frame(impute)
Should be:
impute <- as.data.frame(impute)
That line will still throw an error, though, because an amelia object cannot be coerced to a data frame directly.
Multiple imputation replaces the data with multiple datasets, each with different replacements for the missing values. This reflects the uncertainty in the predictions of those missing values. By turning the Amelia object into a data frame, you are trying to make one data frame out of 5 data frames, and it's not obvious how to do this.
You might want to look into simpler forms of imputation (like imputing by the mean).
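As a rough illustration of what imputing by the mean could look like, here is a minimal base R sketch. The data frame df and the helper impute_mean are made up for the example and merely stand in for train_data. Keep in mind that mean imputation ignores the uncertainty that multiple imputation is designed to capture.

```r
# Toy data frame with missing values (hypothetical; stands in for train_data)
df <- data.frame(X1 = c(1, NA, 3, 4), X2 = c(10, 20, NA, 40))

# Replace each NA with the mean of the observed values in the same column
impute_mean <- function(col) {
  col[is.na(col)] <- mean(col, na.rm = TRUE)
  col
}

# Apply the helper column by column and rebuild the data frame
df_imputed <- as.data.frame(lapply(df, impute_mean))
```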
This is happening because you are trying to train on the object returned by amelia(), which holds information about the imputation you did rather than a single data set you can train on. You need to extract a completed data set from it first, for example with the function complete, which combines the imputed values with the observed data.
impute <- amelia(x=train_data,m=5,idvars=c("X13"), interacs=FALSE)
impute <- complete(impute,1)
impute <- as.data.frame(impute)
After this you won't have trouble training or predicting the data.
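If you would rather keep all of the imputations, another option is to fit one model per completed data set. The sketch below assumes that the amelia object keeps its completed data sets in the imputations list (the impute$imputations[[i]] element mentioned in the question). The built-in airquality data, m = 3 and the formula are only stand-ins for the real train_data.

```r
library(Amelia)
library(randomForest)

set.seed(42)

# airquality (built in) has missing values in Ozone and Solar.R;
# p2s = 0 suppresses Amelia's progress output
a.out <- amelia(x = airquality, m = 3, p2s = 0)

# Each element of a.out$imputations is a completed data set,
# so it can be passed to randomForest() directly
models <- lapply(a.out$imputations, function(d) {
  randomForest(Temp ~ Ozone + Solar.R + Wind, data = d)
})
```

Predictions from the individual models can then be averaged, which keeps some of the between-imputation variability that a single completed data set would hide.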
Related
I am looking at some code that prepares a dataframe for several prediction models to be tested later on. The general idea is to predict NormSec based on all the other columns.
Not sure what predict(dummies,newdata=data) does in this case.
I know that predict is used to predict based on an already trained fit. Why is it used in this case? The code works, just trying to understand it.
data<-read.csv(file="datatable.csv")
attach(data)
#selecting the useful columns from data table:
data<-data.frame(NormSec, Rivalry,Stars,NormFB,SeasonPart,FootballSeason,LeBron,Weekend,LastSeasonWins
,Holiday,BigGame,OverUnders,DaysSinceLast,DaysUntilNext, Weekday, Monthday, NewArena)
dummies <- dummyVars(NormSec~., data = data)
attach(dummies)
#Here is the function I don't get:
dataDescr<-predict(dummies,newdata=data)
dataDescr<-data.frame(dataDescr)
attach(dataDescr)
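dummies is a dummyVars object that only stores how the predictors should be encoded. predict() applies that encoding to the data, so dataDescr (the output of predict()) is the original data frame without the NormSec column, with any factor columns expanded into dummy (0/1) columns.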
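A small sketch may make this easier to see. The toy data frame below is invented for the example (y plays the role of NormSec); only dummyVars() and its predict() method come from caret.

```r
library(caret)

# Hypothetical toy data: numeric response y, one numeric and one factor predictor
toy <- data.frame(
  y     = c(1.2, 3.4, 2.2, 5.1),
  x     = c(10, 20, 30, 40),
  color = factor(c("red", "blue", "green", "red"))
)

# dummyVars() only records the encoding (which factors, which levels)
dummies <- dummyVars(y ~ ., data = toy)

# predict() applies the encoding: the factor becomes one 0/1 column per level,
# numeric predictors pass through, and the response y is left out
encoded <- data.frame(predict(dummies, newdata = toy))
```

So predict() here does not predict a response; it is just the generic name caret uses for "apply this stored transformation to new data".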
From the multiple imputation output (e.g., object of class mids for mice) I want to extract several imputed values for some of the imputed variables into a single dataset that also includes original data with the missing values.
Here are sample dataset and code:
library("mice")
nhanes
tempData <- mice(nhanes, seed = 23109)
Using the code below I can extract these values for each variable into separate datasets:
age_imputed<-as.data.frame(tempData$imp$age)
bmi_imputed<-as.data.frame(tempData$imp$bmi)
hyp_imputed<-as.data.frame(tempData$imp$hyp)
chl_imputed<-as.data.frame(tempData$imp$chl)
But I want to extract several variables into a single dataset so that the order of the rows is preserved for further analysis.
I would appreciate any help.
Use the complete function from the mice package to extract the complete data set, including the imputations:
complete(tempData, action = 1)
The action argument takes either an imputation number or a format name such as "all" or "long" if you need every imputation at once. Refer to the R documentation for details.
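For the case in the question (several variables, original data included, row order preserved), the "long" format is probably the closest fit. A minimal sketch, where the choice of bmi and chl is only an example:

```r
library(mice)

tempData <- mice(nhanes, seed = 23109, printFlag = FALSE)

# Stack the original data (.imp == 0) on top of every imputed data set
# (.imp == 1, ..., m); .id identifies the original row
long <- complete(tempData, action = "long", include = TRUE)

# Keep only the variables of interest together with the two index columns
subset_long <- long[, c(".imp", ".id", "bmi", "chl")]
```

Rows with .imp equal to 0 still contain the missing values, and the remaining rows hold the imputed versions in the same row order.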
I have a short question:
I imputed item data using multiple imputation with the MICE package.
After imputation, I would like to sum items to a total score.
However, my data is now in a mids object, and I can't figure out how to do this simple task.
Does anyone have experience with this "problem"?
Best, Leonhard
I figured it out:
1. Create an object that contains all imputed datasets and the original dataset
2. Apply rowSums()
3. Reconstruct the .mids object
Example code:
# load .mids object
library("miceadds")
Dmi<-load.Rdata2("imp.Rdata",paste(getwd(),"imp",sep=""))
# create object that contains all imputed datasets and the original dataset
D<-complete(Dmi,"long",include=TRUE)
# use rowSums
D$T<-rowSums(D[2:11])
# reconstruct .mids object
Dmi<-as.mids2(D)
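For anyone who wants to try the same three steps without the saved imp.Rdata file, here is a self-contained sketch. It swaps in the nhanes data that ships with mice and uses as.mids() from mice in place of as.mids2(). The column name total and the choice of bmi and chl as the "items" are just for illustration.

```r
library(mice)

imp <- mice(nhanes, seed = 1, printFlag = FALSE)

# 1. Long format with the original data included (.imp == 0)
long <- complete(imp, action = "long", include = TRUE)

# 2. Sum the items; rows of the original data with missing items stay NA
long$total <- rowSums(long[, c("bmi", "chl")])

# 3. Turn the long data back into a mids object
imp2 <- as.mids(long)
```

The new mids object can then be used with with() and pool() like the original one.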
I have fitted a multivariate polynomial using the lm() and step() functions in R. My data has a dependent variable Y and some independent variables X1 through Xn. I formatted the formula to fit as follows: Y ~ I(X1^1)+I(X1^2)+I(X2^1)+... etc. When I use the predict() function on the original data everything works, even on the validation points which weren't used for the fit. However, I have to use the predict() function on some simulated data I produced. I made sure the simulated data is in a data.frame and all the elements are of type double, like the original data. I copied the column names from the original data (X1, ... ,Xn) to the simulated data. Now when I use the predict() function I get the following error:
Error: variables ‘I(X1^1)’, ‘I(X1^2)’, ‘I(X2^1)’ were specified with different types from the fit
I really don't get it. The column names are the same, the types are the same and both original and simulated data are in a data.frame. What is happening here?
Thanks in advance!!
Sorry for not providing a reproducible example. But I've found a solution. It's not very elegant, but here it is. When I coerce the data.frame with the original data to a matrix and then straight back to a data.frame, some attributes and other extras are stripped from the original data. If I use this data.frame for the fitting process, the predict() function also works on the simulated data. The simulated data was in matrix format first and was then converted to a data.frame. It's still not clear to me whether there is a more elegant way to get rid of the attributes, dimensions and other extras in the data.frame of the original data. I've tried unname(), but that didn't do the job.
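One possible explanation, and it is only a guess since the original data isn't shown: if a column of the training data is stored as a one-column matrix (for example because it came from scale()), lm() records a matrix type for it, and a plain numeric column in the new data no longer matches. The sketch below reproduces that situation with made-up data and strips the extra attributes with as.vector() instead of the round trip through a matrix.

```r
set.seed(1)
x <- rnorm(50)

# Assumed cause: X1 is stored as a one-column matrix (scale() returns one)
train <- data.frame(Y = 2 * x + x^2 + rnorm(50, sd = 0.1))
train$X1 <- scale(x)

fit_matrix <- lm(Y ~ I(X1^1) + I(X1^2), data = train)

# Simulated data with a plain numeric X1 column
sim <- data.frame(X1 = rnorm(10))

# tryCatch() returns the error message if predict() fails on the mismatch
msg <- tryCatch(predict(fit_matrix, newdata = sim),
                error = function(e) conditionMessage(e))

# Strip dim and other attributes from every column, then refit
clean <- train
clean[] <- lapply(train, as.vector)
fit_clean <- lm(Y ~ I(X1^1) + I(X1^2), data = clean)
pred <- predict(fit_clean, newdata = sim)
```

Checking str(train) before fitting is a quick way to spot such columns, since they show up with dimensions or attributes rather than as plain num vectors.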
I am new to R and I am using the rms package for handling imputations. I noticed that the aregImpute function (from Hmisc, which rms loads) only returns imputed values for the columns that have NA values.
impute <- aregImpute(Y~X1+X2+X3+X4+X5+X6,data= train_data, n.impute=5, nk=0)
impute$imputed$Y
When I tried to find the target value in the imputed data set using impute$imputed$Y, it returned NULL. As I understand it, I got NULL because the target variable had no NA values. My question is how I can combine the imputed values with the original dataset so that I have a complete data set with no NA values. I want to try out various algorithms, such as decision trees and random forests, on the imputed data. Does anyone have a suggestion? It would be a great help.
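One approach that may work is Hmisc's impute.transcan(), which can fill the values from a chosen imputation back into the original data. This is only a sketch: the built-in airquality data and its formula stand in for train_data, and imputation = 1 picks just the first of the five imputations.

```r
library(Hmisc)

set.seed(1)

# airquality stands in for train_data; Ozone and Solar.R contain NAs
imp <- aregImpute(~ Ozone + Solar.R + Wind + Temp, data = airquality,
                  n.impute = 5, nk = 0, pr = FALSE)

# Fill in the values from imputation 1; list.out = TRUE returns a list of columns
filled <- impute.transcan(imp, imputation = 1, data = airquality,
                          list.out = TRUE, pr = FALSE, check = FALSE)

# Overwrite the corresponding columns in a copy of the original data
completed <- airquality
completed[names(filled)] <- filled
```

Repeating the last two steps for imputation = 1, ..., 5 gives five completed data sets, each of which can be passed to a decision tree or random forest.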