How to use kNN function in R

I need to run kNN on my data set. (I tried to use dput() to share the data set, but its output doesn't look like the summary() format, so I'm unsure how to share it.)
I have used the following code:
library(caret)
library(class)
set.seed(123)
ind <- createDataPartition(user_col$Nscore, p=0.7,list=FALSE)
training_data <- user_col[1:942,,1 ]
testing_data <- user_col[943:1884,,1 ]
model <- knn(training_data, testing_data, training_data,k=1)
predictions <- as.factor(model)
confusionMatrix(predictions, testing_data[,5])
It stops running at the model <- knn(...) line with this error:
Error in knn(training_data, testing_data, training_data, k = 1) : 'train' and 'class' have different lengths
I have checked the environment, and training_data and testing_data are the same size, so I'm not sure where the error comes from.
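The error comes from the third argument of class::knn(): cl must be the vector of true class labels for the training rows, one label per row of train. Passing training_data (a data frame) as cl makes length(cl) equal to its number of columns rather than its number of rows, so knn() stops. The sketch below is a minimal illustration, assuming the label sits in column 5 (as testing_data[, 5] suggests) and using the built-in iris data as a stand-in for user_col, which is not available here. It also keeps the label column out of the predictors and splits with a row index instead of fixed row ranges. (If user_col is a data frame, the trailing 1 in user_col[1:942,,1] is matched to the drop argument of [, not to a third dimension.)

```r
library(class)  # provides knn()

set.seed(123)
# Stand-in data: iris, whose 5th column (Species) plays the role of the
# label column that the question indexes as testing_data[, 5]
df <- iris
label_col <- 5

# Random 70/30 split by row index (caret's createDataPartition() would give a
# comparable index; base sample() is used here to avoid the dependency)
ind <- sample(seq_len(nrow(df)), size = floor(0.7 * nrow(df)))
train_x <- df[ind, -label_col]    # predictors only, no label column
test_x  <- df[-ind, -label_col]
train_y <- df[ind, label_col]     # one label per training row (a factor)
test_y  <- df[-ind, label_col]

# cl must have length nrow(train_x); a whole data frame here would have
# length equal to its number of columns and trigger the reported error
pred <- knn(train = train_x, test = test_x, cl = train_y, k = 1)

table(predicted = pred, actual = test_y)
```

With caret loaded, confusionMatrix(pred, test_y) should then work as in the question, since both arguments are factors with the same levels.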

Related

Error : 'data' must be a data.frame, environment, or list

#define training and testing sets
set.seed(555)
train <- df2[1:800, c("charges")]
y_test <- df2[801:nrow(df2), c("charges")]
test <- df2[801:nrow(df2), c("age","bmi","children","smoker")]
#use model to make predictions on a test set
model <- pcr(charges~age+bmi+children+smoker, data = train, scale=TRUE, validation="CV")
pcr_pred <- predict(model, test, ncomp = 4)
#calculate RMSE
sqrt(mean((pcr_pred - y_test)^2))
I don't know why I get this error. I have already tried a number of things, but I'm still stuck.
When you executed:
train <- df2[1:800, c("charges")]
You created an atomic vector, not a data frame: when a single column is selected, [ drops the result to a vector by default. The result stays a data frame (and therefore a list) only if you also pass drop = FALSE:
train <- df2[1:800, c("charges"), drop=FALSE]
That should fix this particular error, although without any data none of us can tell whether further errors will follow. Actually, I'm fairly sure you did not want train to be a single column, since your model formula expects other columns as well. Try this instead:
library(pls)  # provides pcr()
set.seed(555)
train <- df2[1:800, ]
test <- df2[801:nrow(df2), ]
y_test <- test$charges
# use the model to make predictions on the test set
model <- pcr(charges~age+bmi+children+smoker, data = train, scale=TRUE, validation="CV")
pcr_pred <- predict(model, test, ncomp = 4)
# calculate RMSE
sqrt(mean((pcr_pred - y_test)^2))
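The drop behaviour described in the answer can be checked on a small hypothetical data frame (the values below are made up and only borrow the column names from the question): selecting a single column with [ returns a bare vector by default, while drop = FALSE keeps a one-column data frame, which is a list and is therefore acceptable as data.

```r
# Hypothetical stand-in for df2 with made-up values
df2 <- data.frame(
  charges = c(1200.5, 3400.0, 560.75, 980.0),
  age     = c(25L, 47L, 19L, 33L)
)

# Default: a single selected column is dropped to an atomic vector
v <- df2[1:3, c("charges")]
class(v)        # "numeric"

# drop = FALSE keeps the data frame structure
d <- df2[1:3, c("charges"), drop = FALSE]
class(d)        # "data.frame"
is.list(d)      # TRUE

# Selecting rows only, as in the answer, keeps every column for the formula
train <- df2[1:3, ]
names(train)    # "charges" "age"
```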

Error in panel regression when using a different independent variable

I am trying to run a Fama-MacBeth regression with the following code:
require(foreign)
require(plm)
require(lmtest)
fpmg <- pmg(return~max_1,df_all_11, index=c("yearmonth","firms" ))
Fama<-fpmg
coeftest(Fama)
It works when I regress the data on the independent variable max_1. However, when I switch to another independent variable, ivol_1, I get an error. The code is:
require(foreign)
require(plm)
require(lmtest)
fpmg <- pmg(return~ivol_1,df_all_11, index=c("yearmonth","firms" ))
Fama<-fpmg
coeftest(Fama)
The error message is:
Error in pmg(return ~ ivol_1, df_all_11, index = c("yearmonth", "firms")) :
Insufficient number of time periods
or sometimes this one:
Error in model.frame.default(terms(formula, lhs = lhs, rhs = rhs, data = data, :
object is not a matrix
For your convenience, I am sharing my data. The data is available at this link:
data frame
I am wondering why this happens with a different variable from the same data frame. I would be grateful if you could help me solve this problem.
This problem can be solved with the mice() function:
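library(mice)
library(dplyr)
require(foreign)
require(plm)
require(lmtest)
df_all_11 <- read.csv("df_all_11.csv.part", sep = ",", header = TRUE, stringsAsFactors = FALSE)
# impute the missing ivol_1 values by predictive mean matching, with Month as a helper variable
x <- data.frame(ivol_1 = df_all_11$ivol_1, month = df_all_11$Month)
imputed_Data <- mice(x, m = 3, maxit = 5, method = 'pmm', seed = 500)
# take the third of the m = 3 completed data sets
completeData <- complete(imputed_Data, 3)
df_all_11 <- mutate(df_all_11, ivol_1 = completeData$ivol_1)
# rerun the Fama-MacBeth regression on the completed data
fpmg2 <- pmg(return ~ ivol_1, df_all_11, index = c("yearmonth", "firms"))
coeftest(fpmg2)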
This problem occurs because the variable ivol_1 has a lot of NAs, so you should impute the missing values first and then run the pmg function.
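Before imputing, it may help to confirm the diagnosis by counting, for each yearmonth, how many rows have both return and the regressor present. With index = c("yearmonth", "firms"), pmg() should fit one regression per yearmonth, so a month left with very few complete rows after NA removal is a plausible trigger for the "Insufficient number of time periods" message; this reading of plm's behaviour is an assumption, not something checked against its source. The toy panel below is hypothetical and only mimics the column names from the question, and the min_obs threshold is an arbitrary illustrative choice.

```r
# Hypothetical toy panel standing in for df_all_11
df_all_11 <- data.frame(
  yearmonth = rep(c(201001, 201002, 201003), each = 4),
  firms     = rep(c("A", "B", "C", "D"), times = 3),
  return    = c(0.01, -0.02, 0.03, 0.00, 0.02, 0.01, -0.01, 0.04, 0.00, 0.01, 0.02, -0.03),
  max_1     = c(0.05, 0.04, 0.06, 0.03, 0.07, 0.02, 0.05, 0.06, 0.04, 0.03, 0.05, 0.02),
  ivol_1    = c(0.10, NA, 0.12, 0.09, NA, NA, NA, 0.11, 0.08, NA, 0.10, 0.13)
)

# Count, per yearmonth, the rows where both the response and the regressor are present
complete_per_period <- function(data, regressor) {
  ok <- complete.cases(data[, c("return", regressor)])
  tapply(ok, data$yearmonth, sum)
}

counts_max  <- complete_per_period(df_all_11, "max_1")
counts_ivol <- complete_per_period(df_all_11, "ivol_1")
counts_ivol   # 3, 1 and 3 complete rows for the three months

# Flag months with too few complete rows (threshold chosen only for illustration)
min_obs <- 3
months_too_small <- names(counts_ivol)[counts_ivol < min_obs]
months_too_small   # "201002"
```

Running the same check on the real df_all_11 shows whether ivol_1 leaves some months nearly empty while max_1 does not, which would match the answer's explanation.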

Error in KNN 'train' and 'class' have different lengths, cl length =1

Good day
I'm getting an error when trying to fit a kNN model. I created the test and train data sets as follows:
train <- bank_norm[1:32950, ]
test <- bank_norm[32951:41188, ]
train_labels <- bank[1:32950, 1]
bpredict <- knn(train, test, train_labels, 203)
To replicate, the two files are on this link:
Box Folder with two datasets to replicate
library(readr)  # read_csv()
library(class)  # knn()
bank <- read_csv("~/R/bank.csv")
bank_norm <- read_csv("~/R/bank_norm.csv")
Then run the code above to create train and test and fit the model.
Any idea what could be wrong? Thanks in advance.
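A likely cause: read_csv() from readr returns a tibble, and single-bracket subsetting of a tibble does not drop to a vector, so bank[1:32950, 1] is a one-column tibble. length() of a data frame or tibble counts columns, which is why knn() reports a cl length of 1. Extracting the column with [[ ]] gives a plain vector with one label per training row. The sketch below uses a shuffled, normalised iris as a hypothetical stand-in for bank and bank_norm, imitates the tibble behaviour with drop = FALSE on an ordinary data frame, and uses a smaller k than the question's 203 because the toy data is small.

```r
library(class)  # provides knn()

# Hypothetical stand-in for bank / bank_norm: shuffled iris, label in column 1
set.seed(42)
shuffled  <- iris[sample(nrow(iris)), ]
bank      <- shuffled[, c(5, 1:4)]                    # Species first, like bank[, 1]
bank_norm <- as.data.frame(scale(shuffled[, 1:4]))    # normalised predictors only

n_train <- 100
train <- bank_norm[1:n_train, ]
test  <- bank_norm[(n_train + 1):nrow(bank_norm), ]

# What a tibble gives for bank[1:n_train, 1]: a one-column table, not a vector
labels_tbl <- bank[1:n_train, 1, drop = FALSE]
length(labels_tbl)      # 1 (counts columns, not rows)

# Extract the column as a vector instead; [[ ]] behaves the same on tibbles
train_labels <- bank[[1]][1:n_train]
length(train_labels)    # 100

bpredict <- knn(train, test, cl = train_labels, k = 11)
```

In the question's code this would read train_labels <- bank[[1]][1:32950] (dplyr::pull(bank, 1) is an equivalent way to get the column as a vector). The labels must also be free of NA values, because knn() refuses missing values.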

Error with svydesign using imputed data sets

I am analyzing an imputed dataset using svydesign but I am getting an error. Below is the code:
library(mitools)
library(survey)
library(mice)
data(nhanes)
nhanes$hyp <- as.factor(nhanes$hyp)
imp <- mice(nhanes,method=c("polyreg","pmm","logreg","pmm"), seed = 23109)
des<-svydesign(id=~1, strat=~age, data=imputationList(imp))
Error in as.data.frame.default(data, optional = TRUE) : cannot coerce class '"call"' to a data.frame
I am following the tutorial from this page:
http://r-survey.r-forge.r-project.org/survey/svymi.html
How do I modify the code so that it works?
EDIT:
I changed data=imputationList(imp) to data=complete(imp,1) and was able to make the code work. However, this is not efficient, since I would have to do this for every imputed set. Is there something wrong with using imputationList?
mice() returns a mids object, not a list of data frames, so imputationList(imp) cannot use it directly. imputationList() needs a list holding every completed data.frame (five by default, since mice() uses m = 5), and mice::complete() is what constructs each of them:
library(mitools)
library(survey)
library(mice)
data(nhanes)
nhanes$hyp <- as.factor(nhanes$hyp)
imp <- mice(nhanes,method=c("polyreg","pmm","logreg","pmm"), seed = 23109)
# one completed data frame per imputation (imp$m is 5 by default)
imp_list <- lapply(seq_len(imp$m), function(n) complete(imp, action = n))
des<-svydesign(id=~1, strat=~age, data=imputationList(imp_list))

Error in predict.randomForest

I was hoping someone could help me with an issue I am having with the predict function of the randomForest package in R. I keep getting the same error when I try to predict on my test data.
Here's my code so far:
extractFeatures <- function(RCdata) {
  features <- c(4, 9:13, 17:20)
  fea <- RCdata[, features]
  fea$Week <- as.factor(fea$Week)
  fea$Age_Range <- as.factor(fea$Age_Range)
  fea$Race <- as.factor(fea$Race)
  fea$Referral_Source <- as.factor(fea$Referral_Source)
  fea$Referral_Source_Category <- as.factor(fea$Referral_Source_Category)
  fea$Rehire <- as.factor(fea$Rehire)
  fea$CLFPR_.HS <- as.factor(fea$CLFPR_.HS)
  fea$CLFPR_HS <- as.factor(fea$CLFPR_HS)
  fea$Job_Openings <- as.factor(fea$Job_Openings)
  fea$Turnover <- as.factor(fea$Turnover)
  return(fea)
}
gp <- runif(nrow(RCdata))
RCdata <- RCdata[order(gp), ]
train <- RCdata[1:4600, ]
test <- RCdata[4601:6149, ]
rf <- randomForest(extractFeatures(train), suppressWarnings(as.factor(train$disposition_category)), ntree=100, importance=TRUE)
testpredict <- predict(rf, extractFeatures(test))
Error in predict.randomForest(rf, extractFeatures(test)) :
  Type of predictors in new data do not match that of the training data.
I have also tried the following line and still receive the same error:
testpredict <- predict(rf, extractFeatures(test), type="prob")
I found that the source of the error is that the training data has a level or two that do not appear in the test data. However, when I tried another suggestion I found online, setting the levels of the test data to those of the training data, I get NULL for the levels of the fields I am using, in both the training and test sets:
levels(test$Referral)
NULL
However, I can see the levels when I wrap the column in as.factor():
levels(as.factor(test$Referral))
So I then tried the same suggestion of setting the levels of the test data equal to those of the training data, using the following line, and received an error:
levels(as.factor(test$Referral)) -> levels(as.factor(train$Referral))
Error in `levels<-.factor`(`*tmp*`, value = c(... :
number of levels differs
I am sure there is something simple I am missing (I am still very new to R), so any insight you can provide would be unbelievably helpful. Thanks!
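Two things seem to be going on. It looks like levels() returns NULL because the columns of train and test themselves are not factors (the as.factor() conversions happen only on the copies returned by extractFeatures()), and because extractFeatures() converts each split separately, the same column can end up with a different set of levels in each. The line levels(as.factor(test$Referral)) -> levels(as.factor(train$Referral)) also assigns in the reverse direction, since a -> b means b <- a. Below is a minimal base-R sketch, assuming you want the test factors to reuse the training levels; align_levels() is a hypothetical helper, not part of randomForest, and any test value that never occurs in training becomes NA.

```r
# Hypothetical helper: re-encode each factor column of `test` with the
# levels that the same column has in `train`
align_levels <- function(train, test) {
  for (col in intersect(names(train), names(test))) {
    if (is.factor(train[[col]])) {
      # values missing from the training levels become NA
      test[[col]] <- factor(as.character(test[[col]]), levels = levels(train[[col]]))
    }
  }
  test
}

# Made-up example: the test split happens to lack level "C"
train <- data.frame(Race = factor(c("A", "B", "C")), Week = c(1, 2, 3))
test  <- data.frame(Race = factor(c("B", "A")), Week = c(4, 5))

levels(test$Race)            # "A" "B"
test_aligned <- align_levels(train, test)
levels(test_aligned$Race)    # "A" "B" "C"

# In the question's pipeline (not run here):
# test_fea    <- align_levels(extractFeatures(train), extractFeatures(test))
# testpredict <- predict(rf, test_fea)
```

An alternative that avoids the helper is to convert the columns to factors on the full RCdata before splitting it into train and test; row subsets keep the factor's full level set, so both splits then share the same levels.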