RStudio - object not found - kohonen pack - r

I'm trying to write a script for a SOM (self-organising map), based on this tutorial. My problem is that the script does not run in RStudio. I have this code:
require(kohonen)
# Create a training data set (rows are samples, columns are variables)
# Here I am selecting a subset of my variables available in "data"
data_train <- data[, c(2,4,5,8)]
# Change the data frame with training data to a matrix
# Also center and scale all variables to give them equal importance during
# the SOM training process.
data_train_matrix <- as.matrix(scale(data_train))
# Create the SOM Grid - you generally have to specify the size of the
# training grid prior to training the SOM. Rectangular and hexagonal
# topologies are possible
som_grid <- somgrid(xdim = 20, ydim=20, topo="hexagonal")
# Finally, train the SOM, options for the number of iterations,
# the learning rates, and the neighbourhood are available
som_model <- som(data_train_matrix,
                 grid = som_grid,
                 rlen = 500,
                 alpha = c(0.05, 0.01),
                 keep.data = TRUE)
plot(som_model, type="changes")
If I try to run this script, I get this error:
Error in supersom(list(X), ...) : object 'data_train_matrix' not found
> plot(som_model, type="changes")
Error in plot(som_model, type = "changes") : object 'som_model' not found
I don't understand this. What does it mean that there is no data_train_matrix? I defined data_train_matrix a few lines earlier. When I run just the first three lines of code (up to data_train_matrix <- as.matrix(scale(data_train))), I get this error:
data_train_matrix <- as.matrix(scale(data_train))
Error in scale(data_train) : object 'data_train' not found
and when I run just the first two lines, I get:
data_train <- data[, c(2,4,5,8)]
Error in data[, c(2, 4, 5, 8)] :
object of type 'closure' is not subsettable
How is it possible that this code works in the tutorial while I get so many errors using the same code?

It looks like the error comes from having no original data-frame-like object. The variable "data_train", a subset of "data", was never successfully assigned, so every later line that depends on it fails as well.
The first step to get right is the commented line that creates the training data set:
# Create a training data set (rows are samples, columns are variables
# Here I am selecting a subset of my variables available in "data"
data_train <- data[, c(2,4,5,8)]
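R also has a built-in function named "data". Because no data frame of that name exists in your session, the function is what R finds when it evaluates the code. A function (a "closure") is not subsettable, like most functions in R.
If you create some data at the very start, everything should work. Use enough rows: a 20 x 20 grid has 400 units, and training may fail to initialise with fewer rows than units.
# 500 rows and 10 columns of made-up values, for illustration only
data <- data.frame(matrix(rnorm(5000), ncol = 10))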
data_train <- data[, c(2,4,5,8)]
# the rest of the script as written
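To see where the "closure" message comes from, here is a minimal sketch using base R only. It assumes a fresh session in which nothing named `data` has been created yet; the data frame at the end holds made-up values.

```r
# In a fresh session the name `data` resolves to the function utils::data()
was_function <- is.function(data)
kind <- typeof(data)   # functions written in R have type "closure"

# Subsetting a function fails; capture the message instead of stopping
msg <- tryCatch(data[, c(2, 4, 5, 8)],
                error = function(e) conditionMessage(e))

# Once a data frame is bound to the same name, the same subset works
data <- data.frame(matrix(rnorm(40), nrow = 4))   # made-up values
subset_ok <- data[, c(2, 4, 5, 8)]
dim(subset_ok)
```

Checking `exists("my_object")` or `class(my_object)` before the first subsetting line is a quick way to catch this kind of problem early.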

Related

Error in panel regression in case of different independent variable r

I am trying to run Fama Macbeth regression by the following code:
require(foreign)
require(plm)
require(lmtest)
fpmg <- pmg(return~max_1,df_all_11, index=c("yearmonth","firms" ))
Fama<-fpmg
coeftest(Fama)
It works when I regress on the independent variable named 'max_1'. However, when I switch to another independent variable named 'ivol_1', I get an error. The code is
require(foreign)
require(plm)
require(lmtest)
fpmg <- pmg(return~ivol_1,df_all_11, index=c("yearmonth","firms" ))
Fama<-fpmg
coeftest(Fama)
The error message is:
Error in pmg(return ~ ivol_1, df_all_11, index = c("yearmonth", "firms")) :
Insufficient number of time periods
or sometimes:
Error in model.frame.default(terms(formula, lhs = lhs, rhs = rhs, data = data, :
object is not a matrix
For your convenience, I am sharing my data with you. The data link is
data frame
I am wondering why this happens for a different variable in the same data frame. I would be grateful if you could help me solve this problem.
This problem can be solved with the mice() function from the mice package:
library(mice)
library(dplyr)
require(foreign)
require(plm)
require(lmtest)
df_all_11 <- read.csv("df_all_11.csv.part", sep = ",", header = TRUE, stringsAsFactors = FALSE)
x<-data.frame(ivol_1=df_all_11$ivol_1,month=df_all_11$Month)
imputed_Data <- mice(x, m=3, maxit =5, method = 'pmm', seed = 500)
completeData <- complete(imputed_Data, 3)
df_all_11<-mutate(df_all_11,ivol_1=completeData$ivol_1)
fpmg2 <- pmg(return~ivol_1,df_all_11, index=c("yearmonth","firms"))
coeftest(fpmg2)
The problem occurs because the variable ivol_1 has a lot of NA values, so you should impute the NAs first and then run the pmg function.
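Before imputing, it may help to confirm how many values are actually missing. The sketch below uses base R on a small made-up panel; the data frame `toy` and its values are hypothetical stand-ins for df_all_11, and only the column names are taken from the question.

```r
# Hypothetical toy panel standing in for df_all_11 (values are made up)
toy <- data.frame(
  yearmonth = rep(c("2001-01", "2001-02", "2001-03"), each = 4),
  firms     = rep(c("A", "B", "C", "D"), times = 3),
  max_1     = c(0.1, 0.2, 0.3, 0.4, 0.2, 0.1, 0.5, 0.3, 0.4, 0.2, 0.1, 0.6),
  ivol_1    = c(NA, 0.2, NA, NA, 0.1, NA, NA, 0.3, NA, NA, NA, 0.2)
)

# Number of missing values in each candidate regressor
na_counts <- colSums(is.na(toy[, c("max_1", "ivol_1")]))

# Non-missing observations of ivol_1 available in each period
usable_per_period <- tapply(!is.na(toy$ivol_1), toy$yearmonth, sum)
```

If a regressor leaves only a handful of complete rows per period, that would be consistent with the "Insufficient number of time periods" message, although the exact check inside pmg() is not shown in this thread.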

Convert matrix to factor

I am trying to apply the table function but I get this error. I think it is because the test labels are a factor and the prediction is a matrix:
Error in table(rfe_nB_test_folds[, 7], rfe_nB_predict) :
all arguments must have the same length
So I need to convert the prediction result to a factor so that I can use it in the table function, but then I get the error below. I think it is related to the 10-fold cross-validation, because when I try it without cross-validation it works:
Error in `[.default`(rfe_nB_predict, , 2) :
incorrect number of dimensions
My code:
set.seed(100)
rfe_nB_folds <- createFolds(BC_bind$outcome, k = 10) # create folds
rfe_nB_fun <- lapply(rfe_nB_folds, function(x) {
  rfe_nB_traing_folds <- BC_bind[-x, ]
  rfe_nB_test_folds <- BC_bind[x, ]
  # build the model
  rfe_nB_model <- naiveBayes(outcome ~ ., data = rfe_nB_traing_folds)
  # test the model
  rfe_nB_predict <- predict(rfe_nB_model, rfe_nB_test_folds[-7], type = "raw")
  rfe_nB_predict <- as.factor(rfe_nB_predict)
  CR <- roc.curve(rfe_nB_test_folds[, 7], rfe_nB_predict[, 2])
  print(CR)
  rfe_nB_table <- table(rfe_nB_test_folds[, 7], rfe_nB_predict)
  # to see the confusion matrix of each fold
  rfe_nB_confusionMatrix <- confusionMatrix(rfe_nB_table, positive = "R")
  return(rfe_nB_confusionMatrix$table)
})
I used specific columns, so I saved them in BC_bind as shown in the code.
Top_6featurs <- wpdc[, c(33, 11, 10, 32, 29, 12)] # column numbers of the top 6 features
BC_bind <- data.frame(cbind(Top_6featurs, wpdc$outcome))
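Both error messages are consistent with as.factor() being applied to a probability matrix. The following is a minimal base R sketch of that effect, not a tested fix for the code above: the matrix `probs` is made up, and the class name "N" is an assumption (only "R" appears in the question).

```r
# Hypothetical class probabilities for three test rows, shaped like a
# two-column matrix with one column per class
probs <- matrix(c(0.9, 0.1,
                  0.3, 0.7,
                  0.6, 0.4),
                ncol = 2, byrow = TRUE,
                dimnames = list(NULL, c("N", "R")))

# as.factor() flattens the matrix: one element per cell, no dimensions left,
# so flat[, 2] fails and table() sees a length mismatch
flat <- as.factor(probs)
length(flat)
dim(flat)

# Keep the matrix for the probabilities and derive one label per row instead
prob_R <- probs[, "R"]
pred_class <- factor(colnames(probs)[max.col(probs, ties.method = "first")],
                     levels = colnames(probs))
```

With one label per test row, `pred_class` has the same length as the true labels, which is what table() needs; `prob_R` is the kind of numeric score an ROC function would normally take.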

Testing Recommendation systems: How to specify how many items were given for the prediction. `calcPredictionAccuracy` function

I am trying to test a binary recommendation system I created with the recommenderlab package. When I run the calcPredictionAccuracy function I get the following error:
Error in .local(x, data, ...) :
You need to specify how many items were given for the prediction!
I have performed numerous searches and can't seem to find any solution to this issue. If I try to add the given argument the error changes to:
error.ubcf<-calcPredictionAccuracy(p.ubcf, getData(test_index, "unknown", given=3))
Error in .local(x, ...) : unused argument (given = 3)
Here is a quick look at my code:
# my data set is binary.watch.ratings
affinity.matrix <- as(binary.watch.ratings, "binaryRatingMatrix")
test_index <- evaluationScheme(affinity.matrix[1:1000], method = "split",
                               train = 0.9, given = 1)
# creation of recommender model based on ubcf
Rec.ubcf <- Recommender(getData(test_index, "train"), "UBCF")
# creation of recommender model based on ibcf for comparison
Rec.ibcf <- Recommender(getData(test_index, "train"), "IBCF")
# making predictions on the test data set
p.ubcf <- predict(Rec.ubcf, getData(test_index, "known"), type="topNList")
# making predictions on the test data set
p.ibcf <- predict(Rec.ibcf, getData(test_index, "known"), type="topNList")
# obtaining the error metrics for both approaches and comparing them
##error occurs with the following two lines
error.ubcf<-calcPredictionAccuracy(p.ubcf, getData(test_index, "unknown"))
error.ibcf<-calcPredictionAccuracy(p.ibcf, getData(test_index, "unknown"))
error <- rbind(error.ubcf,error.ibcf)
rownames(error) <- c("UBCF","IBCF")
This produces the following error:
error.ubcf<-calcPredictionAccuracy(p.ubcf, getData(test_index, "unknown"))
Error in .local(x, data, ...) :
You need to specify how many items were given for the prediction!
My question is: at what point in my code must I specify how many items are given for prediction? Is this issue related to the fact that my data is binary?
Thanks
Robert
For a topNList, you must specify the number of items you want back. You do this in the predict() function call:
# making predictions on the test data set
p.ubcf <- predict(Rec.ubcf, getData(test_index, "known"), type="topNList", n=10)
# making predictions on the test data set
p.ibcf <- predict(Rec.ibcf, getData(test_index, "known"), type="topNList", n=10)
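By varying n, you will be able to see how it impacts your TP/FP/TN/FN accuracy measures, as well as precision/recall. Note also that the "unused argument" error in the question arises because given=3 was placed inside the getData() call. The given argument belongs to calcPredictionAccuracy() itself, so the call would look like calcPredictionAccuracy(p.ubcf, getData(test_index, "unknown"), given=1), with given matching the value used in evaluationScheme(). The calculation methodology for these values is at the bottom of this page: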
https://github.com/mhahsler/recommenderlab/blob/master/R/calcPredictionAccuracy.R

MXnet odd error

This is my first ANN so I imagine that there might be a lot of things done wrong here. I don't follow
I'm trying to predict the species of flowers from the iris data set provided in R, but I get the following error:
Error in `dimnames<-.data.frame`(`*tmp*`, value = list(n)) :
invalid 'dimnames' given for data frame
My code:
require(mxnet)
train <- iris[1:130,]
test <- iris[131:150,]
train.data <- as.data.frame(train[-5])
train.label <- data.frame(model.matrix(data=train,object =~Species-1))
test.data <- as.data.frame(test[-5])
test.label <- data.frame(model.matrix(data=test,object =~Species-1))
var1 <- mx.symbol.Variable("data")
layer0 <- mx.symbol.FullyConnected(var1, num.hidden=3)
cat.out <- mx.symbol.SoftmaxOutput(layer0)
net.model <- mx.model.FeedForward.create(cat.out,
                                         array.layout = "auto",
                                         X = train.data,
                                         y = train.label,
                                         eval.data = list(data = test.data, label = test.label),
                                         num.round = 20,
                                         array.batch.size = 20,
                                         learning.rate = 0.1,
                                         momentum = 0.9,
                                         eval.metric = mx.metric.accuracy)
UPDATE:
I managed to get rid of this error by specifying a single column to use in the labels (train.label[,1] and test.label[,1]).
However, now I'm training my net to predict just one of my binary variables, while I have 3 (one for each species).
I had the same problem. It turned out that:
train.data should be a matrix
train.label should be a numeric vector
Check these two and hopefully it should work.
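As a rough sketch of those two conversions in base R, using the same iris split as the question: the names `train.x` and `train.y` are illustrative, and the zero-based class coding is an assumption about what the softmax output expects.

```r
# Same training split as in the question
train <- iris[1:130, ]

# Features: a numeric matrix instead of a data frame
train.x <- data.matrix(train[, -5])

# Labels: one numeric vector of class indices instead of three dummy columns
# Assumption: classes are coded 0, 1, 2 (zero-based)
train.y <- as.numeric(train$Species) - 1

str(train.x)
table(train.y)
```

A single vector of class indices also covers all three species at once, which addresses the concern in the update about predicting only one binary column.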
I had a similar problem, but during the prediction step. It turned out that my features were in a data frame, which was causing the issue. Once I converted the data frame into a matrix, the issue went away.
pred.values = stats::predict(model,as.matrix(features))
instead of
pred.values = stats::predict(model,features)
So, the features need to be a matrix both during training and during the process of making predictions.

Error in predict.randomForest

I was hoping someone would be able to help me out with an issue I am having with the prediction function of the randomForest package in R. I keep getting the same error when I try to predict my test data:
Here's my code so far:
extractFeatures <- function(RCdata) {
  features <- c(4, 9:13, 17:20)
  fea <- RCdata[, features]
  fea$Week <- as.factor(fea$Week)
  fea$Age_Range <- as.factor(fea$Age_Range)
  fea$Race <- as.factor(fea$Race)
  fea$Referral_Source <- as.factor(fea$Referral_Source)
  fea$Referral_Source_Category <- as.factor(fea$Referral_Source_Category)
  fea$Rehire <- as.factor(fea$Rehire)
  fea$CLFPR_.HS <- as.factor(fea$CLFPR_.HS)
  fea$CLFPR_HS <- as.factor(fea$CLFPR_HS)
  fea$Job_Openings <- as.factor(fea$Job_Openings)
  fea$Turnover <- as.factor(fea$Turnover)
  return(fea)
}
gp <- runif(nrow(RCdata))
RCdata <- RCdata[order(gp), ]
train <- RCdata[1:4600, ]
test <- RCdata[4601:6149, ]
rf <- randomForest(extractFeatures(train), suppressWarnings(as.factor(train$disposition_category)), ntree=100, importance=TRUE)
testpredict <- predict(rf, extractFeatures(test))
"Error in predict.randomForest(rf, extractFeatures(test)) :
Type of predictors in new data do not match that of the training data."
I have tried adding in the following line to the code, and still receive the same error:
testpredict <- predict(rf, extractFeatures(test), type="prob")
I found that the source of the error is that the training data has a level or two that is not found in the test data. I then tried a suggestion I found online, which is to adjust the levels of the test data to match those of the training data, but I keep getting NULL for the levels of the fields I am using in both the training and test sets.
levels(test$Referral)
NULL
I can see the levels when I use the function, however.
levels(as.factor(test$Referral))
So then I tried the same suggestion, adjusting the levels of the test data to equal those of the training data, using the following line, and received an error:
levels(as.factor(test$Referral)) -> levels(as.factor(train$Referral))
Error in `levels<-.factor`(`*tmp*`, value = c(... :
number of levels differs
I am sure there is something simple I am missing (I am still very new to R), so any insight you can provide would be unbelievably helpful. Thanks!
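One common way to align factor levels is to rebuild the test column with the training levels. The sketch below is base R only and uses made-up vectors; `train_ref`, `test_ref_raw` and their values are hypothetical and not taken from RCdata.

```r
# Hypothetical referral values (made up for illustration)
train_ref    <- factor(c("web", "agency", "friend", "web"))
test_ref_raw <- c("web", "friend", "web")   # plain character vector

# levels() on a non-factor returns NULL, which matches the NULL seen above
levels(test_ref_raw)

# Rebuild the test column as a factor that carries the training levels
test_ref <- factor(test_ref_raw, levels = levels(train_ref))

levels(test_ref)
identical(levels(test_ref), levels(train_ref))
```

Any test value that does not occur in the training levels becomes NA under this approach, so it is worth checking `anyNA(test_ref)` afterwards.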
