So I was working on a project in R and ran into an issue fitting a KNN model to some data. I was getting different results from knn in the class library and kNN in the DMwR library. I tried the Weekly data from the ISLR package and saw the same discrepancy. The confusion matrices for the two fits differ significantly, as does a straight-up comparison between the predictions.
I am not sure why these two functions are returning different results. Maybe someone can review my sample code and let me know what is going on.
library(ISLR)
WTrain <- subset(Weekly, Year <= 2008)
WTest <- subset(Weekly, Year >= 2009)
library(caret)
library(class)
fitClass <- knn(train = data.matrix(WTrain$Lag2), test = data.matrix(WTest$Lag2), cl=WTrain$Direction, k=5)
confusionMatrix(data = fitClass, reference = WTest$Direction)
library(DMwR)
fitDMwR <- kNN(Direction~Lag2,train = WTrain, test = WTest, norm=FALSE, k=5)
confusionMatrix(table(fitDMwR == 'Down', WTest$Direction =='Down'))
results <- cbind(fitClass,fitDMwR)
head(results)
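One thing worth ruling out before comparing the two fits is the comparison itself: the first confusion matrix above is built from factors, the second from logical vectors, and cbind() on factors shows integer codes rather than labels. A minimal base R sketch of a like-for-like comparison, using made-up vectors (predA, predB and truth are hypothetical stand-ins for fitClass, fitDMwR and WTest$Direction):

```r
# Hypothetical stand-ins for the two prediction vectors and the true labels
lv <- c("Down", "Up")
truth <- factor(c("Down", "Up", "Up", "Down", "Up", "Up"), levels = lv)
predA <- factor(c("Down", "Up", "Down", "Down", "Up", "Up"), levels = lv)
predB <- factor(c("Up", "Up", "Down", "Down", "Up", "Down"), levels = lv)

# Build both confusion tables the same way, with the same level order
confA <- table(predicted = predA, actual = truth)
confB <- table(predicted = predB, actual = truth)

# Cross-tabulate the two sets of predictions against each other
agreement <- table(A = predA, B = predB)

# Share of test rows where the two models agree
agree_rate <- mean(predA == predB)

# Keep the labels when viewing predictions side by side
results <- data.frame(A = predA, B = predB)
head(results)
```

If the two tables still differ once they are built identically, the difference is in the models rather than in the bookkeeping.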
Related
I am doing a simple change point analysis using the mcp R package, but my results still vary each time I rerun my code even after including set.seed. Appreciate any help on this! (Below is my code, thank you!)
library(rjags)
library(mcp)
model = list(y~1+x,~0+x,~0+x,~0+x,~0+x,~0+x)
set.seed(42)
fit_mcp = mcp(model, data=hosp_df)
summary(fit_mcp)
plot(fit_mcp)
newdata = data.frame(x = c(2023:2040))
prediction <- fitted(fit_mcp, newdata = newdata)
I am running a Bidirectional LSTM for multiclass text classification in R using Keras. I have run my model and I need to create a confusion matrix. I tried using predict_classes() but my RStudio threw an error that predict_classes() was deprecated. I tried to use this bit of code that I found on the RStudio Keras website:
prediction1 <- model %>%
predict(x.test) %>%
k_argmax(axis = -1)
NOTE: x.test is my matrix that contains the text features.
I am not sure how to use it + I have not found any examples of how to use it online so I am quite confused. I would appreciate any help that anyone could provide!
Thanks
You can use 'caret' library to achieve that.
#Install required packages
install.packages('caret')
#Import required library
library(caret)
#Creates vectors having data points
expected_value <- factor(c(1,0,1,0,1,1,1,0,0,1))
predicted_value <- factor(c(1,0,0,1,1,1,0,0,0,1))
#Creating confusion matrix
example <- confusionMatrix(data=predicted_value, reference = expected_value)
#Display results
example
Or the table function:
pred <- model %>% predict(x_test, batch_size = batch_size)
y_pred = round(pred)
# Confusion matrix
confusion_matrix = table(y_pred, y_test)
For the 'caret' example:
https://www.journaldev.com/46732/confusion-matrix-in-r
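For the multiclass case in the question, rounding is probably not enough, since a softmax model would return one column of probabilities per class. A rough sketch in base R, assuming pred is such a matrix and the true labels are zero-based integers (the small matrix and y_test below are made up for illustration):

```r
# Made-up class probabilities: 5 samples, 3 classes (rows sum to 1)
pred <- matrix(c(0.7, 0.2, 0.1,
                 0.1, 0.8, 0.1,
                 0.2, 0.3, 0.5,
                 0.6, 0.3, 0.1,
                 0.1, 0.2, 0.7),
               ncol = 3, byrow = TRUE)
y_test <- c(0, 1, 2, 1, 2)  # hypothetical true labels, zero-based

# Column index of the largest probability per row, shifted to zero-based labels
y_pred <- max.col(pred, ties.method = "first") - 1

# Shared factor levels keep the table square even if a class is never predicted
classes <- 0:(ncol(pred) - 1)
confusion_matrix <- table(predicted = factor(y_pred, levels = classes),
                          actual = factor(y_test, levels = classes))
confusion_matrix
```

The same two factors can also be handed to confusionMatrix() from caret if per-class statistics are wanted.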
This is somewhat similar to the question I asked here. However, that question has zero answers and I think this question might be more fruitful in getting a response.
What I am trying to do is remove some features from an mlr created model, without having to fit the model again. For example, if we take the Boston data from the MASS library and create an mlr model, like so:
library(mlr)
library(MASS)
# Using the mlr package to train the data:
bTask <- makeRegrTask(data = Boston, target = "medv")
bLearn <- makeLearner("regr.randomForest")
bMod <- train(bLearn, bTask)
And then I use the task and trained model in some function, for example:
someFunc <- function(task, model){
pred <- predict(model, task)
pred <- pred$data$response
head(pred,10)
}
someFunc(bTask,bMod)
Everything works fine. But I'm wondering if it's possible to remove some variables from bMod, without having to fit the mlr trained model again?
I know it's possible to drop features from the task using dropFeatures(), for example:
bTask1 <- dropFeatures(bTask, c("zn", "chas", "rad"))
But if I try to mix bTask1 and bMod like so:
pred1 <- predict(bMod, bTask1)
I get the sensible error:
Error in predict.randomForest(.model$learner.model, newdata =
.newdata, : variables in the training data missing in newdata
Is there a way of dropping some features from the mlr created model (i.e, bMod) without fitting it again?
This is my first attempt using a machine learning paradigm in R. I'm using a planet data set (url: https://www.kaggle.com/mrisdal/open-exoplanet-catalogue) and I simply want to predict a planet's size based on the size of its Sun. This is the code I currently have, using nnet():
library(nnet)
#Organize data:
cols_to_keep = c(1,4,21)
full_data <- na.omit(read.csv('Planet_Data.csv')[, cols_to_keep])
#Split data:
# Sample the training rows once and give every remaining row to the test set
train_rows <- sample(nrow(full_data), round(nrow(full_data)/2))
train_data <- full_data[train_rows,]
rownames(train_data) <- 1:nrow(train_data)
test_data <- full_data[-train_rows,]
rownames(test_data) <- 1:nrow(test_data)
#nnet
nnet_attempt <- nnet(RadiusJpt~HostStarRadiusSlrRad, data=train_data, size=0, linout=TRUE, skip=TRUE, MaxNWts=10000, trace=FALSE, maxit=1000, decay=.001)
nnet_newdata <- predict(nnet_attempt, newdata=test_data)
nnet_newdata
When I print nnet_newdata I get a value for each row in my data, but I don't really understand what these values mean. Is this a proper way to use the nnet() package to predict a simple regression?
Thanks
When predict is called for an object with class nnet you will get, by default, the raw output from the nnet model applied to your new dataset. If, instead, yours is a classification problem, you can use type = "class".
See here.
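To make the "raw output" concrete: with linout = TRUE it is on the scale of the response, so each value is a predicted RadiusJpt for the matching test row. A small sketch on simulated data (x, y and the coefficients are invented), where a skip-layer-only network with size = 0 is effectively a linear model and should land close to lm():

```r
library(nnet)

set.seed(1)
# Simulated regression data; the names and coefficients are made up
sim <- data.frame(x = runif(200, 0, 2))
sim$y <- 1.5 + 0.8 * sim$x + rnorm(200, sd = 0.1)
train_sim <- sim[1:100, ]
test_sim <- sim[101:200, ]

# size = 0 with skip = TRUE leaves only direct input-to-output weights
fit_nnet <- nnet(y ~ x, data = train_sim, size = 0, skip = TRUE,
                 linout = TRUE, trace = FALSE, maxit = 1000)
fit_lm <- lm(y ~ x, data = train_sim)

# predict() returns a one-column matrix: one predicted y per test row
pred_nnet <- predict(fit_nnet, newdata = test_sim)
pred_lm <- predict(fit_lm, newdata = test_sim)

# Compare against lm() and compute a test error on held-out rows
max_diff <- max(abs(pred_nnet[, 1] - pred_lm))
rmse <- sqrt(mean((test_sim$y - pred_nnet[, 1])^2))
```

Comparing the predictions with the held-out response, as rmse does here, is one way to judge whether the fit is reasonable.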
I am running an ordinary logistic regression using the caret package in R. I have a binary response variable coded 1 or 0, called SALE_FLAG, and 140 predictor variables that I transformed to dummy variables using the dummyVars function.
data <- dummyVars(~., data = data_2, fullRank=TRUE,sep="_",levelsOnly = FALSE )
dummies<-(predict(data, data_2))
model_data<- as.data.frame(dummies)
This gives me a data frame to work with. All of the variables are numeric. Next I split into training and testing:
trainIndex <- createDataPartition(model_data$SALE_FLAG, p = .80,list = FALSE)
train <- model_data[ trainIndex,]
test <- model_data[-trainIndex,]
Time to train my model using the train function:
model <- train(SALE_FLAG ~ ., data = train, method = "glm")
Everything runs fine and I get a model. But when I run the predict function it does not give me what I need:
predict(model, newdata =test,type="prob")
and I get an ERROR:
Error in dimnames(out)[[2]] <- modelFit$obsLevels :
length of 'dimnames' [2] not equal to array extent
On the other hand, when I replace "prob" with "raw" for type inside the predict function I get predictions, but I need probabilities so that I can code them into a binary variable given my threshold.
Not sure why this happens. I did the same thing without using the caret package and it worked how it should:
model2 <- glm(SALE_FLAG ~ ., family = binomial(logit), data = train)
predict(model2, newdata =test, type="response")
I spent some time looking at this but I am not sure what is going on, and it seems very weird to me. I have tried many variations of the train function: I skipped the formula interface and passed x and y instead, and I also tried method = 'bayesglm', which gave me the same error. I hope someone can help me out. I don't strictly need the train function to get what I need, but caret is a good package with lots of tools and I would like to figure this out.
Show us str(train) and str(test). I suspect the outcome variable is numeric, which makes train think that you are doing regression. That should also be apparent from printing model. Make it a factor if you want to do classification.
Max
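A minimal sketch of that suggestion, on simulated data rather than the poster's (the data frame, the predictor names and the "no"/"yes" labels are invented for illustration). With a factor outcome, train() should fit a classifier and type = "prob" should return one column per class level:

```r
library(caret)

set.seed(1)
# Simulated data standing in for the dummy-coded predictors
n <- 200
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
dat$SALE_FLAG <- rbinom(n, 1, plogis(0.5 * dat$x1 - dat$x2))

# Convert the 0/1 outcome to a factor so caret treats this as classification
dat$SALE_FLAG <- factor(dat$SALE_FLAG, levels = c(0, 1), labels = c("no", "yes"))

idx <- createDataPartition(dat$SALE_FLAG, p = 0.8, list = FALSE)
train_df <- dat[idx, ]
test_df <- dat[-idx, ]

model <- train(SALE_FLAG ~ ., data = train_df, method = "glm")

# One probability column per class level
probs <- predict(model, newdata = test_df, type = "prob")

# Apply a custom threshold to the probability of the "yes" class
threshold <- 0.3
flag <- ifelse(probs$yes > threshold, 1, 0)
```

Labels such as "no" and "yes" are used here instead of "0" and "1" because they are valid R names, which keeps the probability columns easy to reference.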