Replacing NAs in a variable? - r

Good evening,
I have a dataset in which one variable, Gender, has missing data. Could anyone please help me replace these NAs using R packages? I have tried the mice package, but it does not replace the NAs; they still exist in the Gender column. I have provided sample data below along with my code. Thanks in advance for the support.
Dataset sample: [image]
R code used: [image]
Regards,
Kumar

That's too much code. You can try imputing the missing data in mice with either method="rf" or method="cart", although in my experience the cart method tends to be more accurate.
You could also use the preProcess function from the caret package with either medianImpute or knnImpute, which gives pretty good results for this kind of imputation. Note that these caret methods only impute numeric columns, so they will not fill in a factor such as Gender directly.
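Example with mice:
library(mice)
test_imp <- mice(df, m = 5, method = "cart", printFlag = FALSE)
test_imputed <- complete(test_imp, 3)  # selects the third imputed dataset
Example with preProcess (caret package):
library(caret)
test_1 <- preProcess(testdf, method = "medianImpute")
test_imputed <- predict(test_1, testdf)  # applies the fitted imputation to testdf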
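A likely reason the NAs seem to survive (an assumption, since the question's code was only a screenshot) is that mice() does not modify the data frame it is given: it returns a mids object, and the filled-in values only appear in the data frame returned by complete(). The sketch below uses made-up data (the columns Age, Income and Gender are hypothetical), stores Gender as a factor, and compares the NA count before and after. The imputation step only runs when mice is installed.

```r
# Hypothetical example data: Gender is a factor with a few missing values
set.seed(1)
n <- 40
df <- data.frame(
  Age    = round(runif(n, 20, 60)),
  Income = round(rnorm(n, 50, 10), 1),
  Gender = factor(sample(c("F", "M"), n, replace = TRUE))
)
df$Gender[c(2, 9, 15, 27, 33)] <- NA

cat("NAs in Gender before imputation:", sum(is.na(df$Gender)), "\n")

if (requireNamespace("mice", quietly = TRUE)) {
  # mice() returns a 'mids' object; df itself is left unchanged
  imp <- mice::mice(df, m = 5, method = "cart", printFlag = FALSE, seed = 123)
  # complete() extracts one of the m completed datasets (here the first)
  df_imputed <- mice::complete(imp, 1)
  cat("NAs in Gender after complete():", sum(is.na(df_imputed$Gender)), "\n")
}
```

If the second count is still non-zero in real data, checking imp$loggedEvents and making sure Gender is a factor rather than a character column are reasonable next steps.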

Related

How to describe cases included in analyses in R?

I'm very new to R and pretty basic with analyses generally. I successfully ran a regression in R, but a lot of my data are missing. I'm fine with that, because R just ignores the missing observations in the analyses and shows me the degrees of freedom (df) in the summary. My problem is that I'd like to look more closely at the observations that are included in the analyses, but I'm not sure how to do that.
I tried to do na.omit, but R created a dataset with far fewer observations than it used in the regressions, so I think that takes it too far.
Basically, I'm trying to get the ages for the respondents that were included in the final analyses, not just the ages of the entire sample, many of whom were not included in the analyses.
Any advice you can give me would be very appreciated!! Please let me know if you need more information.
Thank you!
Edited to include a screenshot of the data.
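A minimal sketch with base R and made-up data (the columns age, x, y and notes are hypothetical): lm() drops rows with a missing value in any variable of the model formula, and model.frame() returns exactly the rows that were used, keeping their original row names. na.omit() on the whole data set also drops rows that are missing only in columns the model never uses, which would explain why it removed more observations than the regression did.

```r
# Hypothetical data: x and y have some NAs; 'notes' is unused by the model but also has NAs
dat <- data.frame(
  age   = c(25, 34, 41, 52, 60, 29),
  x     = c(1.2, NA, 3.1, 4.0, 5.5, 2.2),
  y     = c(2.0, 3.1, NA, 5.2, 6.1, 2.9),
  notes = c(NA, "a", "b", NA, "c", "d")
)

fit <- lm(y ~ x, data = dat)  # default na.action drops incomplete rows

# Rows actually used by the model, looked up by their original row names
used_rows <- rownames(model.frame(fit))
ages_used <- dat[used_rows, "age"]
summary(ages_used)

# Equivalent check: rows complete on the model variables only
identical(dat$age[complete.cases(dat[, c("x", "y")])], ages_used)

# na.omit() on the whole data frame also drops rows missing only 'notes'
nrow(na.omit(dat))
```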

Running an ICC analysis

Cannot run an ICC analysis in R
I have loaded my data from an Excel spreadsheet and have tried the following:
ICC(CMI)
I have removed my row names. I am not sure whether I need to convert my columns or use a different approach. I have loaded the psych package.
This is my code: ICC(Test)
This is what comes back:
Error in stack.data.frame(x) : no vector columns were selected
I'm not sure what this means or how to fix it. Thanks in advance for any help; I really appreciate it.
I had the same problem with a dataset. I suggest you try:
ICC(as.matrix(Test))
This worked for me. Otherwise, type help(ICC), look at the example, and compare the procedure used there with your data.
Good luck!
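One possible cause of this error (an assumption; the question does not show how the spreadsheet was read in) is that the imported columns are not plain numeric vectors, for example numbers stored as text. The sketch below uses hypothetical ratings (one row per subject, one column per rater) and converts them into a numeric matrix before they are passed to ICC(), in line with the as.matrix() suggestion in the answer.

```r
# Hypothetical ratings stored as text, as can happen after a spreadsheet import
Test <- data.frame(
  rater1 = c("4", "3", "5", "2", "4"),
  rater2 = c("4", "2", "5", "3", "4"),
  rater3 = c("5", "3", "4", "2", "3"),
  stringsAsFactors = FALSE
)

# Convert every column to numeric, then build a plain numeric matrix
ratings <- as.matrix(as.data.frame(lapply(Test, as.numeric)))
str(ratings)

# With the psych package loaded, the ICC call would then be:
# library(psych)
# ICC(ratings)
```

Any cell that cannot be parsed as a number becomes NA (with a warning), which is a quick way to spot stray text in the spreadsheet.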

Not losing observations when faced with missing data

I have a dataset on which I've fitted a linear model, and I've tried to use the step function on this model. I get the error message "number of rows in use has changed: remove missing values?".
I noticed that a few of the observations (not many) in my dataset had NA values for one variable. I've seen similar questions which suggest using na.omit(), but when I do this I lose those observations. I want to keep them, however, because they contain useful information for the other variables. Is there a way to use step and avoid losing the observations?
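You can call the nobs function to check that the number of observations is unchanged between models. Its use.fallback argument only lets it guess the number of observations for model classes without a nobs method; it does not fill in missing values. The R documentation for step recommends removing the missing values before running step, so that every candidate model is fitted to the same rows.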
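A minimal sketch of that documented approach with simulated data (x1, x2, x3 and y are hypothetical): subset to the rows that are complete on every variable of the largest model before fitting, so that step() compares all candidate models on the same observations. Rows missing x3 are still dropped, which is the trade-off the next answer warns about.

```r
set.seed(1)
n <- 50
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
dat$y <- 1 + 2 * dat$x1 + rnorm(n)
dat$x3[c(3, 17, 28)] <- NA  # a few missing values in one predictor

# Keep only the rows that are complete on all variables of the full model
vars <- c("y", "x1", "x2", "x3")
cc <- dat[complete.cases(dat[, vars]), vars]

full <- lm(y ~ x1 + x2 + x3, data = cc)
reduced <- step(full, trace = 0)  # every candidate model now uses the same rows

formula(reduced)
c(full = nobs(full), reduced = nobs(reduced))
```

Fitting to dat directly would let a model without x3 use all 50 rows, which is what triggers the "number of rows in use has changed" error.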
I would discourage you from simply omitting the missing values if they are genuinely missing. You can use multiple imputation via the Amelia package to impute the data so that you have a full dataset.
See here: https://cran.r-project.org/web/packages/Amelia/Amelia.pdf
I would also recommend reviewing the book "Statistical Analysis with Missing Data" by R. Little and D.B. Rubin.

Princomp error in R : covariance matrix is not non-negative definite

I have this script which does a simple PCA on a number of variables and, at the end, attaches two coordinates and two other columns (presence, NZ_Field) to the output file. I have done this many times before, but now it's giving me this error:
covariance matrix is not non-negative definite
I understand that it means there are negative eigenvalues. I looked at similar posts which suggest using na.omit, but it didn't work.
I have uploaded the "biodata.Rdata" file here:
https://www.dropbox.com/s/1ex2z72lilxe16l/biodata.rdata?dl=0
I am pretty sure it is not because of missing values in the data, because I have used the same data with different "presence" and "NZ_Field" columns.
Any help is highly appreciated.
load("biodata.rdata")
#save data separately
coords=biodata[,1:2]
biovars=biodata[,3:21]
presence=biodata[,22]
NZ_Field=biodata[,23]
#Do PCA
bpc=princomp(biovars ,cor=TRUE)
#re-attach data with auxiliary data..coordinates, presence and NZ location data
PCresults=cbind(coords, bpc$scores[,1:3], presence, NZ_Field)
write.table(PCresults,file= "hlb_pca_all.txt", sep= ",",row.names=FALSE)
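This does appear to be an issue with missing data, so there are a few ways to deal with it. One way is to do listwise deletion manually before running the PCA. To keep the coordinates, presence and NZ_Field aligned with the PCA scores, drop the incomplete rows from biodata itself, before splitting it into biovars and the other columns:
biodata <- biodata[complete.cases(biodata[, 3:21]), ]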
The other option is to use another package: psych seems to work well here, and you can use principal(biovars). While the output is a bit different, it uses pairwise deletion, so basically it comes down to whether you want to use pairwise or listwise deletion. Thanks!
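A minimal sketch of the listwise-deletion route with a small simulated stand-in for biodata (the column names lon, lat and v1 to v4 are hypothetical; the real file has 19 variables in columns 3 to 21). The incomplete rows are removed from the whole data frame first, so the coordinates, presence and NZ_Field columns still line up with the PCA scores when they are bound back together.

```r
set.seed(42)
# Hypothetical stand-in: 2 coordinates, 4 PCA variables, 2 auxiliary columns
biodata <- data.frame(
  lon = runif(30), lat = runif(30),
  v1 = rnorm(30), v2 = rnorm(30), v3 = rnorm(30), v4 = rnorm(30),
  presence = rbinom(30, 1, 0.5), NZ_Field = rbinom(30, 1, 0.5)
)
biodata$v2[c(4, 11)] <- NA

# Drop incomplete rows from the whole data frame so every piece stays aligned
keep <- complete.cases(biodata[, 3:6])
bd <- biodata[keep, ]

bpc <- princomp(bd[, 3:6], cor = TRUE)
PCresults <- cbind(bd[, 1:2], bpc$scores[, 1:3],
                   presence = bd$presence, NZ_Field = bd$NZ_Field)
dim(PCresults)
```

Subsetting only biovars, by contrast, would leave coords, presence and NZ_Field with more rows than the scores, and the final cbind would fail.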

Error thrown while imputing values using regression FNN package in R

I am trying to impute missing values using regression. I have searched thoroughly online, but it hasn't been of much help. I read the FNN package documentation for the knn.reg function and find it difficult to interpret. I have a column of missing values in the test data which I want to predict using my training data, and my code looks like this:
regress<-knn.reg(data.train[data.train[,4]==1,][c(1,2,3)],test=data.test[c(1,2,3)],data.test[c(2)],5)
But I get the following error:
Error in get.knnx(train, test, k, algorithm) : Data include NAs
The column which contains the missing values is column 2. When I exclude that column, i.e.
regress<-knn.reg(data.train[data.train[,4]==1,][c(1,2,3)],test=data.test[c(1,3)],data.test[c(2)],5)
I get this error:
Error in get.knnx(train, test, k, algorithm) : Number of columns must be same!
Please help!
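Both errors come from how the arguments are set up: knn.reg(train, test, y, k) expects train and test to contain the same predictor columns with no NAs, and y to be the response of the training rows (here column 2 of the training data, not of the test data). The sketch below shows that setup in base R with simulated data rather than calling FNN itself; the columns a, target and b and the helper knn_impute() are hypothetical. It predicts each missing test value as the mean target of the k nearest training rows, measuring distance on the predictor columns only.

```r
# Simulated data: column 2 ("target") is observed in training data, missing in test data
set.seed(7)
data.train <- data.frame(a = rnorm(40), target = 0, b = rnorm(40))
data.train$target <- 3 * data.train$a - data.train$b + rnorm(40, sd = 0.1)
data.test <- data.frame(a = rnorm(5), target = NA_real_, b = rnorm(5))

# Hypothetical helper: k-nearest-neighbour regression with Euclidean distance
knn_impute <- function(train_x, train_y, test_x, k = 5) {
  train_x <- as.matrix(train_x)
  test_x <- as.matrix(test_x)
  apply(test_x, 1, function(row) {
    d <- sqrt(colSums((t(train_x) - row)^2))  # distance to every training row
    mean(train_y[order(d)[seq_len(k)]])       # mean response of the k nearest rows
  })
}

features <- c("a", "b")  # same predictors in train and test; the target is excluded
data.test$target <- knn_impute(data.train[, features], data.train$target,
                               data.test[, features], k = 5)
data.test

# With FNN installed, the corresponding call would be:
# FNN::knn.reg(train = data.train[, features], test = data.test[, features],
#              y = data.train$target, k = 5)$pred
```

Because distances are computed on the raw values, predictors on very different scales should usually be standardised first.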
You might want to consider the mice package (and read part of the paper).
Using the standard settings, which have proven to be a good starting point:
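library(mice)
mi <- mice(dataset)  # by default creates m = 5 imputed datasets
mi.reg <- with(data = mi, expr = glm(y ~ x + z))  # fits the model on each imputed dataset
summary(pool(mi.reg))  # combines the m fits into one set of estimates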
Here, simply calling mice() on your data fills in each NA value, producing several (by default five) completed datasets. Finer tuning is of course possible (and needed if it takes too long to converge, or if you have reason to believe the imputations are not accurate). Many different types of imputation are possible; they are listed on page 16.
