Error in eval(expr, envir, enclos) : object 'accueil' not found - r

I am trying to create a random forest model in R for sentiment analysis.
Here is the code:
data = as.data.frame(as.matrix(dtm_train), stringsAsFactors = T)
>data
accueil bon depuis banque client très service conseiller agence a plus je
634 0 0 0 0 0 0 0 0 0 0 0 0
3802 0 0 0 0 0 0 0 0 0 0 0 0
16739 0 0 0 0 0 1 0 0 0 0 0 1
20992 0 0 0 0 0 0 0 0 0 0 0 0
4742 0 0 0 0 0 0 0 0 0 0 0 0
5104 0 0 1 0 0 0 0 0 0 0 0 0
6978 1 1 0 1 0 0 0 0 0 0 0 0
21630 0 2 0 0 0 0 1 0 0 0 0 0
13606 0 0 0 0 0 0 0 0 0 0 0 0
21910 0 0 0 0 0 0 0 1 0 0 0 0
8184 0 0 0 0 0 0 0 0 0 0 0 0
...
Note = train[['Note.Reco']]
> Note
[1] 9 10 9 0 10 8 10 7 10 10 5 5 8 8 2 9 8 0 10 10 8 0 8 7 7 6 9 10 8 9 5 10 10 0 5 3 2 8 8 1 7 6 0 8 9 0 5 5 8 6 8
[52] 8 7 8 9 9 9 10 5 4 5 8 8 8 9 9 10 9 8 4 10 9 8 8 8 8 5 0 9 8 7 5 3 2 10 8 10 9 0 10 6 10 8 5 9 10 1 8 9 1
reviews.test = test$reason
[1] "Pas assez service......"
[2] "Pour résidant s....."
[3] " emails, réponses ...."
[4] "Même ...."
review.test_DF = as.data.frame(reviews.test,stringsAsFactors = T)
reviews.svm = randomForest(Note~., data)
pred.svm = predict(reviews.svm, review.test_DF, type="class")
I get this error:
> pred.svm = predict(reviews.svm, review.test_DF, type="class")
Error in eval(expr, envir, enclos) : object 'accueil' not found
Can you help me resolve this problem?
Thank you in advance.
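The error message suggests that predict() looks for the training columns (accueil, bon, ...) in the new data, while review.test_DF holds a single column of raw text. Below is a minimal base-R sketch of building a test data frame with the same term columns as the training one. It assumes simple tokenisation on whitespace and punctuation, and texts_to_features is a hypothetical helper name, not part of any package.

```r
# Hypothetical helper: count how often each training term occurs in each text.
# `terms` is assumed to be the column names of the training data frame.
texts_to_features <- function(texts, terms) {
  rows <- lapply(texts, function(txt) {
    # Assumption: lower-case the text and split on whitespace/punctuation
    tokens <- strsplit(tolower(txt), "[[:space:][:punct:]]+")[[1]]
    # Fixing the factor levels keeps the columns in training order and
    # gives a zero count to terms that do not occur in this text
    as.integer(table(factor(tokens, levels = terms)))
  })
  out <- as.data.frame(do.call(rbind, rows))
  names(out) <- terms
  out
}

features <- texts_to_features(c("Bon accueil, bon service", "rien"),
                              c("accueil", "bon", "service"))
print(features)
```

With the objects from the question, the call would presumably look like predict(reviews.svm, texts_to_features(reviews.test, names(data))). In a real pipeline the test texts should go through the same preprocessing that produced dtm_train.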

Related

Error in running glinternet : a statistical function for automatic model selection using interaction terms by Stanford's professor T. Hastie

glinternet is an R package, and a function of the same name, that implements an algorithm developed by Trevor Hastie, the Stanford professor known for his work on statistical learning, and his former PhD student. glinternet() detects interaction terms automatically, which makes it very useful for building a model when there are many variables and the number of possible combinations is enormous.
When I run glinternet I get an error message which I reproduce here using the mtcars base R dataset:
library(data.table)  # setDT()
library(dplyr)       # glimpse()
library(glinternet)
data(mtcars)
setDT(mtcars)
glimpse(mtcars)
x = as.matrix(mtcars[, -c("am"), with = FALSE])
class(x)
y <- mtcars$am
class(y)
glinter_fit <- glinternet(x , y, numLevels = 2)
Error: pCat + pCont == ncol(X) is not TRUE
Your advice will be appreciated.
The documentation is not very clear on this, but numLevels must be a vector with one element per predictor column, each element giving the number of categories in that column (1 for a continuous column).
In your example every column of x is continuous, so we do:
glinternet(x,y,numLevels=rep(1,ncol(x)))
Call: glinternet(X = x, Y = y, numLevels = rep(1, ncol(x)))
lambda objValue cat cont catcat contcont catcont
1 0.068900 0.1210 0 0 0 0 0
2 0.062800 0.1200 0 1 0 0 0
3 0.057100 0.1180 0 1 0 0 0
4 0.052000 0.1160 0 1 0 0 0
5 0.047300 0.1130 0 2 0 0 0
6 0.043100 0.1100 0 2 0 0 0
7 0.039200 0.1060 0 3 0 0 0
8 0.035700 0.1020 0 3 0 0 0
9 0.032500 0.0983 0 3 0 0 0
10 0.029600 0.0944 0 3 0 0 0
11 0.026900 0.0904 0 3 0 0 0
12 0.024500 0.0866 0 3 0 0 0
13 0.022300 0.0829 0 3 0 0 0
14 0.020300 0.0794 0 3 0 0 0
15 0.018500 0.0760 0 3 0 0 0
16 0.016800 0.0728 0 3 0 1 0
17 0.015300 0.0698 0 4 0 1 0
18 0.014000 0.0668 0 4 0 1 0
19 0.012700 0.0638 0 4 0 2 0
20 0.011600 0.0608 0 4 0 2 0
21 0.010500 0.0579 0 3 0 2 0
22 0.009580 0.0551 0 3 0 2 0
23 0.008720 0.0523 0 3 0 2 0
24 0.007940 0.0497 0 3 0 2 0
25 0.007230 0.0472 0 3 0 3 0
26 0.006580 0.0448 0 5 0 3 0
27 0.005990 0.0425 0 5 0 3 0
28 0.005450 0.0403 0 5 0 3 0
29 0.004960 0.0382 0 5 0 3 0
30 0.004520 0.0361 0 4 0 3 0
31 0.004110 0.0342 0 4 0 3 0
32 0.003740 0.0324 0 4 0 4 0
33 0.003410 0.0307 0 4 0 5 0
34 0.003100 0.0291 0 4 0 6 0
35 0.002820 0.0275 0 3 0 6 0
36 0.002570 0.0261 0 3 0 6 0
37 0.002340 0.0247 0 3 0 8 0
38 0.002130 0.0234 0 3 0 7 0
39 0.001940 0.0221 0 3 0 7 0
40 0.001760 0.0210 0 3 0 7 0
41 0.001610 0.0199 0 3 0 8 0
42 0.001460 0.0188 0 3 0 8 0
43 0.001330 0.0178 0 4 0 10 0
44 0.001210 0.0168 0 4 0 10 0
45 0.001100 0.0159 0 4 0 12 0
46 0.001000 0.0149 0 4 0 12 0
47 0.000914 0.0140 0 4 0 12 0
48 0.000832 0.0132 0 4 0 12 0
49 0.000757 0.0123 0 3 0 13 0
50 0.000689 0.0115 0 2 0 13 0
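For data that mixes continuous and categorical predictors, the numLevels vector can be derived from the column types. The sketch below assumes that categorical predictors are stored as factors and, as far as I know from the glinternet documentation, that categorical columns must be coded as 0, 1, 2, ... in the matrix. prepare_glinternet_input is a hypothetical helper name.

```r
# Hypothetical helper: build the predictor matrix and the numLevels vector
# from a data frame whose categorical predictors are factors (assumption).
prepare_glinternet_input <- function(df) {
  # 1 for continuous columns, number of levels for factor columns
  num_levels <- vapply(df, function(col) {
    if (is.factor(col)) nlevels(col) else 1L
  }, integer(1))
  # Factors become integer codes starting at 0; other columns stay numeric
  cols <- lapply(df, function(col) {
    if (is.factor(col)) as.numeric(col) - 1 else as.numeric(col)
  })
  list(X = do.call(cbind, cols), numLevels = num_levels)
}

toy <- data.frame(mpg = c(21, 22.8, 18.7), gear = factor(c(4, 4, 3)))
inp <- prepare_glinternet_input(toy)
print(inp$numLevels)
```

The result would then be passed as glinternet(inp$X, y, numLevels = inp$numLevels). When every column is numeric this reduces to rep(1, ncol(x)).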

R: Print omitted 0's in table() - contingency tables [duplicate]

I am using the following R code to produce a confusion matrix comparing the true labels of some data to the output of a neural network.
t <- table(as.factor(test.labels), as.factor(nnetpredict))
However, sometimes the neural network doesn't predict any of a certain class, so the table isn't square (as, for example, there are 5 levels in the test.labels factor, but only 3 levels in the nnetpredict factor). I want to make the table square by adding in any factor levels necessary, and setting their counts to zero.
How should I go about doing this?
Example:
> table(as.factor(a), as.factor(b))
1 2 3 4 5 6 7 8 9 10
1 1 0 0 0 0 0 0 1 0 0
2 0 1 0 0 0 0 0 0 1 0
3 0 0 1 0 0 0 0 0 0 1
4 0 0 0 1 0 0 0 0 0 0
5 0 0 0 0 1 0 0 0 0 0
6 0 0 0 0 0 1 0 0 0 0
7 0 0 0 0 0 0 1 0 0 0
You can see in the table above that there are 7 rows, but 10 columns, because the a factor only has 7 levels, whereas the b factor has 10 levels. What I want to do is to pad the table with zeros so that the row labels and the column labels are the same, and the matrix is square. From the example above, this would produce:
1 2 3 4 5 6 7 8 9 10
1 1 0 0 0 0 0 0 1 0 0
2 0 1 0 0 0 0 0 0 1 0
3 0 0 1 0 0 0 0 0 0 1
4 0 0 0 1 0 0 0 0 0 0
5 0 0 0 0 1 0 0 0 0 0
6 0 0 0 0 0 1 0 0 0 0
7 0 0 0 0 0 0 1 0 0 0
8 0 0 0 0 0 0 0 0 0 0
9 0 0 0 0 0 0 0 0 0 0
10 0 0 0 0 0 0 0 0 0 0
The reason I need to do this is two-fold:
For display to users/in reports
So that I can use a function to calculate the Kappa statistic, which requires a table formatted like this (square, same row and col labels)
EDIT - round II to address the additional details in the question. I deleted my first answer since it wasn't relevant anymore.
This has produced the desired output for the test cases I've given it, but I definitely advise testing thoroughly with your real data. The approach here is to find the full list of levels for both inputs into the table and set that full list as the levels before generating the table.
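squareTable <- function(x, y) {
  x <- factor(x)
  y <- factor(y)
  # Union of the levels of both inputs. Note that levels are character
  # strings, so sort() orders them as text ("10" comes before "2").
  commonLevels <- sort(unique(c(levels(x), levels(y))))
  # Re-create both factors on the shared levels so the table is square
  x <- factor(x, levels = commonLevels)
  y <- factor(y, levels = commonLevels)
  table(x, y)
}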
Two test cases:
> #Test case 1
> set.seed(1)
> x <- factor(sample(0:9, 100, TRUE))
> y <- factor(sample(3:7, 100, TRUE))
>
> table(x,y)
y
x 3 4 5 6 7
0 2 1 3 1 0
1 1 0 2 3 0
2 1 0 3 4 3
3 0 3 6 3 2
4 4 4 3 2 1
5 2 2 0 1 0
6 1 2 3 2 3
7 3 3 3 4 2
8 0 4 1 2 4
9 2 1 0 0 3
> squareTable(x,y)
y
x 0 1 2 3 4 5 6 7 8 9
0 0 0 0 2 1 3 1 0 0 0
1 0 0 0 1 0 2 3 0 0 0
2 0 0 0 1 0 3 4 3 0 0
3 0 0 0 0 3 6 3 2 0 0
4 0 0 0 4 4 3 2 1 0 0
5 0 0 0 2 2 0 1 0 0 0
6 0 0 0 1 2 3 2 3 0 0
7 0 0 0 3 3 3 4 2 0 0
8 0 0 0 0 4 1 2 4 0 0
9 0 0 0 2 1 0 0 3 0 0
> squareTable(y,x)
y
x 0 1 2 3 4 5 6 7 8 9
0 0 0 0 0 0 0 0 0 0 0
1 0 0 0 0 0 0 0 0 0 0
2 0 0 0 0 0 0 0 0 0 0
3 2 1 1 0 4 2 1 3 0 2
4 1 0 0 3 4 2 2 3 4 1
5 3 2 3 6 3 0 3 3 1 0
6 1 3 4 3 2 1 2 4 2 0
7 0 0 3 2 1 0 3 2 4 3
8 0 0 0 0 0 0 0 0 0 0
9 0 0 0 0 0 0 0 0 0 0
>
> #Test case 2
> set.seed(1)
> xx <- factor(sample(0:2, 100, TRUE))
> yy <- factor(sample(3:5, 100, TRUE))
>
> table(xx,yy)
yy
xx 3 4 5
0 4 14 9
1 14 15 9
2 11 11 13
> squareTable(xx,yy)
y
x 0 1 2 3 4 5
0 0 0 0 4 14 9
1 0 0 0 14 15 9
2 0 0 0 11 11 13
3 0 0 0 0 0 0
4 0 0 0 0 0 0
5 0 0 0 0 0 0
> squareTable(yy,xx)
y
x 0 1 2 3 4 5
0 0 0 0 0 0 0
1 0 0 0 0 0 0
2 0 0 0 0 0 0
3 4 14 11 0 0 0
4 14 15 11 0 0 0
5 9 9 13 0 0 0
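Since the question mentions the Kappa statistic as one reason for wanting a square table, here is a minimal sketch of unweighted Cohen's kappa computed from such a table. cohen_kappa is a hypothetical name, and the sketch assumes that rows and columns are in the same label order.

```r
# Hypothetical helper: unweighted Cohen's kappa from a square table whose
# rows and columns carry the same labels in the same order (assumption).
cohen_kappa <- function(tab) {
  stopifnot(nrow(tab) == ncol(tab))
  n <- sum(tab)
  observed <- sum(diag(tab)) / n                       # observed agreement
  expected <- sum(rowSums(tab) * colSums(tab)) / n^2   # chance agreement
  (observed - expected) / (1 - expected)
}

# Both factors share the same explicit levels, so the table is square
truth <- factor(c(1, 1, 2, 2, 3, 3), levels = 1:3)
pred  <- factor(c(1, 1, 2, 2, 2, 2), levels = 1:3)
print(cohen_kappa(table(truth, pred)))
```

Without the shared levels, table(truth, pred) would have three rows and only two columns, and the diagonal would no longer line up with matching labels.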

R: Algorithm for setting missing values faster

I have a problem with setting missing values in a data frame. The first 3 columns hold the product ID, the store ID, and the week number. Columns 4 to 31 hold the sales for the last 28 days (the last 7 of those days belong to the current week). I want to fill in the missing values by comparing two records that have the same first and second columns but different week numbers.
corrections <- function(x, y) {
  # Changes vector y if the difference between weeks is not greater than 3
  if (x[1] == y[1] && x[2] == y[2] && -(x[3] - y[3]) <= 3) {
    t <- y[3] - x[3]
    t <- as.integer(t)
    # Overlapping days: the end of x's window and the start of y's window
    a <- x[(4 + (t * 7)):31]
    b <- y[4:(31 - (t * 7))]
    c <- a - b
    for (i in 1:(28 - (t * 7))) {
      if (is.na(c[i])) {
        if (!(is.na(a[i]) && is.na(b[i]))) {
          if (is.na(b[i]))
            b[i] <- a[i]
          else
            a[i] <- b[i]
        }
      }
    }
    y[4:(31 - t * 7)] <- b
  }
  return(y)
}
for (i in 2:nrow(salesTraining)) {
  salesTraining[i, ] <- corrections(salesTraining[i - 1, ], salesTraining[i, ])
}
The loop takes about 1 minute per 1,000 records, so with 212,000 records it will take roughly 3.5 hours (assuming linear complexity). Is there an error, or is there a faster way to do this?
Example of data frame:
productID storeID weekInData dailySales1 dailySales2 dailySales3 dailySales4 dailySales5
1 1 1 37 0 0 0 0 0
2 1 1 38 0 0 0 0 0
3 1 1 39 0 0 0 0 0
4 1 1 40 0 NA 0 NA 2
5 1 1 41 NA 0 NA 0 0
6 1 1 42 0 0 0 NA 0
7 1 1 43 0 0 NA 0 NA
8 1 1 44 0 2 1 NA 0
9 1 1 45 NA 0 0 NA 0
10 1 1 46 NA 0 0 NA NA
dailySales6 dailySales7 dailySales8 dailySales9 dailySales10 dailySales11 dailySales12 dailySales13
1 NA NA 0 NA 0 0 0 0
2 0 NA NA 0 0 0 0 0
3 0 NA 0 0 0 NA 2 NA
4 0 NA 0 NA 0 NA 0 0
5 0 0 NA 0 0 0 0 0
6 NA 0 NA 0 0 0 0 0
7 0 0 0 2 NA 0 0 0
8 0 NA 0 NA 0 NA 0 1
9 1 0 0 0 0 0 1 0
10 0 0 0 NA 0 NA 0 0
dailySales14 dailySales15 dailySales16 dailySales17 dailySales18 dailySales19 dailySales20
1 0 0 0 0 0 0 0
2 0 0 0 0 5 2 NA
3 0 0 0 0 0 0 0
4 0 0 0 0 0 0 0
5 0 0 0 0 0 0 0
6 0 0 2 1 0 0 NA
7 0 0 0 0 0 0 1
8 0 0 0 0 0 1 0
9 0 0 -1 0 0 0 0
10 0 0 0 0 0 0 0
dailySales21 dailySales22 dailySales23 dailySales24 dailySales25 dailySales26 dailySales27
1 NA 0 0 0 5 2 0
2 0 0 0 0 0 0 0
3 0 0 0 0 0 0 0
4 0 0 0 0 0 0 0
5 0 0 NA 1 0 0 0
6 0 0 0 0 0 0 1
7 0 0 0 0 0 1 0
8 0 0 NA 0 0 0 0
9 NA 0 0 0 NA 0 0
10 0 1 0 0 0 0 0
dailySales28 daysStoreClosed_series daysStoreClosed_target dayOfMonth dayOfYear weekOfYear month
1 0 5 2 23 356 51 12
2 0 6 2 30 363 52 12
3 0 6 1 6 5 1 1
4 0 6 1 13 12 2 1
5 0 6 1 19 18 3 1
6 0 5 1 26 25 4 1
7 0 4 1 2 32 5 2
8 0 4 1 9 39 6 2
9 0 4 1 16 46 7 2
10 0 4 1 23 53 8 2
quarter
1 4
2 4
3 1
4 1
5 1
6 1
7 1
8 1
9 1
10 1
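Much of the time in a loop like the one above probably goes into row-wise data frame indexing and assignment rather than into the arithmetic. The sketch below keeps the sequential loop but runs it on a numeric matrix. It assumes the layout described in the question (keys in columns 1 and 2, week in column 3, daily sales in columns 4 to 31) and adds a guard that the week difference lies between 0 and 3, which the original code does not check. Because corrections() only returns y, the sketch reproduces just the part that fills NAs in the current row from the previous one. fill_from_previous is a hypothetical name.

```r
# Hypothetical helper: fill NAs in each row from the previous row where the
# two 28-day windows overlap. Column positions are assumptions taken from
# the question's description.
fill_from_previous <- function(sales, key_cols = 1:2, week_col = 3,
                               day_cols = 4:31) {
  n <- nrow(sales)
  n_days <- length(day_cols)
  m <- as.matrix(sales[, day_cols])                 # fast element access
  keys <- as.matrix(sales[, key_cols, drop = FALSE])
  week <- sales[[week_col]]
  # TRUE where a row has the same product and store as the row before it
  same_key <- c(FALSE, rowSums(keys[-1, , drop = FALSE] !=
                               keys[-n, , drop = FALSE]) == 0)
  shift <- c(NA, diff(week))                        # week difference t
  # Assumption: only shifts of 0..3 weeks are meaningful
  ok <- same_key & !is.na(shift) & shift >= 0 & shift <= 3
  for (i in which(ok)) {                            # ascending, so row i - 1
    t <- shift[i]                                   # is already corrected
    len <- n_days - 7 * t
    prev <- m[i - 1, (7 * t + 1):n_days]            # end of previous window
    cur <- m[i, 1:len]                              # start of current window
    gap <- is.na(cur)
    cur[gap] <- prev[gap]
    m[i, 1:len] <- cur
  }
  sales[, day_cols] <- m
  sales
}

toy <- as.data.frame(rbind(c(1, 1, 1, 1:28),
                           c(1, 1, 2, rep(NA, 28)),
                           c(1, 2, 3, rep(NA, 28))))
filled <- fill_from_previous(toy)
print(filled[2, 4:10])
```

The result should be compared against the original loop on a small sample before it is trusted on the full data.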

Extend table by adding missing values [duplicate]

I need to extend a table in R.
result 3 4 5 6 7 8
5 6 29 295 104 6 0
6 1 9 112 238 66 5
7 0 0 5 29 40 6
Should be extended to
result 1 2 3 4 5 6 7 8 9 10
1 0 0 0 0 0 0 0 0 0 0
2 0 0 0 0 0 0 0 0 0 0
3 0 0 0 0 0 0 0 0 0 0
4 0 0 0 0 0 0 0 0 0 0
5 0 0 6 29 295 104 6 0 0 0
6 0 0 1 9 112 238 66 5 0 0
7 0 0 0 0 5 29 40 6 0 0
8 0 0 0 0 0 0 0 0 0 0
9 0 0 0 0 0 0 0 0 0 0
10 0 0 0 0 0 0 0 0 0 0
So I need to add zeros for the missing values. Alternatively, a 10x10 matrix containing the same data would also be fine.
EDIT:
table(factor(x, levels = 1:10), factor(y, levels = 1:10)) worked perfectly.
As the commenters mentioned, converting both vectors to factors with explicit levels works:
table(factor(x, levels = 1:10), factor(y, levels = 1:10))
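A small runnable illustration of the same idea, with made-up vectors x and y (they are not the asker's data). Every level from 1 to 10 gets a row and a column, including those with no observations.

```r
# Made-up example data; only some of the values 1..10 occur
x <- c(5, 6, 7, 5)
y <- c(3, 8, 4, 3)

# Explicit levels force a 10 x 10 table with zero counts where needed
tab <- table(factor(x, levels = 1:10), factor(y, levels = 1:10))
print(dim(tab))

# The same counts as a plain matrix, keeping the row and column labels
mat <- matrix(tab, nrow = nrow(tab), dimnames = dimnames(tab))
print(mat["5", "3"])
```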

