Arima loop vs Arima function [R]

I am trying to build a function that estimates a lot of ARIMA models with a for loop.
I got the for loops running and producing my desired output, but as soon as I wrap the code in a function I get errors.
This is the loop:
library(forecast)  # accuracy()
library(dplyr)     # select()

model_acc <- data.frame()
for (p in 0:4) {
  for (d in 0:2) {
    for (q in 0:4) {
      model <- arima(data, order = c(p, d, q), method = "ML")
      acc <- accuracy(model)
      acc <- as.data.frame(acc)
      acc_ext <- data.frame(acc,
                            loglikeli = logLik(model),
                            AIC = AIC(model),
                            BIC = BIC(model),
                            order = paste(p, d, q, sep = ","))
      acc_ext <- select(acc_ext,
                        ME, RMSE, MAE, MAPE, loglikeli, AIC, BIC, order)
      model_acc <- rbind(model_acc, acc_ext)
    }
  }
}
I am aware that some models cannot be computed with maximum likelihood, due to the constraints in the optimization. But this loop gets me 61 models out of 75 (though only with method = "CSS"); I get the models that could be computed.
The parameters I'd like to pass to the function are: data, p_max, d_max, q_max, and method.
So the function goes like this:
which_arima <- function(data, p_max, d_max, q_max, method){
  model_acc <- data.frame()
  for (p in 0:p_max) {
    for (d in 0:d_max) {
      for (q in 0:q_max) {
        model <- arima(data, order = c(p, d, q), method = method)
        acc <- accuracy(model)
        acc <- as.data.frame(acc)
        acc_ext <- data.frame(acc,
                              loglikeli = logLik(model),
                              AIC = AIC(model),
                              BIC = BIC(model),
                              order = paste(p, d, q, sep = ","))
        acc_ext <- select(acc_ext,
                          ME, RMSE, MAE, MAPE, loglikeli, AIC, BIC, order)
        model_acc <- rbind(model_acc, acc_ext)
      }
    }
  }
  return(model_acc)
}
a <- which_arima(data, p_max=4, d_max=2, q_max=4, method="ML")
But when I execute it, I get this error (referring to the models that could not be computed) and no output at all, whereas with the bare for loop I at least got the models that could be computed:
Error in optim(init[mask], armafn, method = optim.method, hessian = TRUE, :
non-finite finite-difference value [4]
In addition: Warning messages:
1: In arima(data, order = c(p, d, q), method = method) :
possible convergence problem: optim gave code = 1
2: In arima(data, order = c(p, d, q), method = method) :
possible convergence problem: optim gave code = 1
3: In log(s2) : NaNs produced
4: In log(s2) : NaNs produced
Called from: optim(init[mask], armafn, method = optim.method, hessian = TRUE,
control = optim.control, trans = as.logical(transform.pars))
Browse[1]> Q
What is going wrong? Outside the function the same code works "fine". And more importantly, how can I solve this?
Thanks in advance!!
Here is the data:
> dput(data)
structure(c(1.04885832686158, 1.06016074629379, 1.0517956106758,
1.02907998600003, 1.05054370620123, 1.07261670636915, 1.0706491823234,
1.0851355199628, 1.08488055975672, 1.08085233559646, 1.081489249884,
1.08587205516048, 1.07249155362154, 1.05497731364761, 1.05675866316574,
1.06428371643968, 1.06065865122313, 1.05621234529568, 1.05339905298902,
1.05787030302435, 1.0658034000068, 1.08707776713932, 1.08626056161822,
1.10238697375394, 1.11390088086972, 1.12120513732074, 1.11937921359653,
1.10341241626668, 1.1156190247407, 1.12376155972358, 1.12411603174635,
1.12183475077377, 1.12994175229071, 1.12956170931204, 1.12199732095331,
1.11645064755987, 1.12481242467782, 1.13066151473637, 1.13028712061827,
1.12694056065497, 1.12382226475179, 1.12352013167586, 1.13391069257413,
1.14763982976838, 1.14481816405703, 1.14852949174863, 1.14182560351963,
1.14086563926171, 1.14491904045717, 1.14897189333479, 1.14616964486707,
1.15074750127031, 1.14681353487065, 1.11151754535415, 1.10497749493861,
1.10963378437214, 1.12415745716768, 1.17507535290893, 1.20285968503846,
1.22784769136553, 1.23940795216891, 1.254741010879, 1.29442450660416,
1.30428779451896, 1.31314618462517, 1.32544236970695, 1.33728107423435,
1.34408499591568, 1.34199331033196, 1.34027541040719, 1.33616830504407,
1.33360421057602, 1.33332422301893, 1.34717794252774, 1.3502492092262,
1.35168291803248, 1.35827816606688, 1.36772644852242, 1.36755741578293,
1.36926148542701, 1.37264481021763, 1.37322962601678, 1.37643913938007,
1.37906284181634, 1.37644362054554, 1.38911039237937, 1.39412557349575,
1.40094895608589, 1.40630864159528, 1.40823485306921, 1.4138446752069,
1.42340582796496, 1.43641264727375, 1.43605231080207, 1.44839810240334,
1.45451041581127, 1.46166006472498, 1.46774816064695, 1.46930608347752,
1.47885183796249, 1.49059366171423, 1.49849145403671, 1.51209667142067,
1.5250141727637, 1.5392257264567, 1.55144303632514, 1.56488453313021,
1.58308777691125, 1.59737589266492, 1.60896279958586, 1.62553339664661,
1.63594174408691, 1.65233080464302, 1.67114336171075, 1.6897476078746,
1.71673790971729, 1.74453973794979, 1.76317526009814, 1.79187692264759,
1.84186982937622, 1.9460629324144, 2.05986108970089, 2.06767436493269,
2.0783176148561, 2.08271855277262, 2.09358626977224, 2.09674958523685,
2.11582742548029, 2.12810020369675, 2.13596929171732, 2.13972610568317,
2.14456803530813, 2.15013985201827, 2.16007349878874, 2.17165498940627,
2.18057666565755, 2.19162746118342, 2.20308765886345, 2.21304799942168,
2.22367586966847, 2.23629862083737, 2.24751866055731, 2.26100586740225,
2.40972893063106, 2.60366275683037, 2.68572993101095, 2.70501080420283,
2.6676315643757, 2.6479269687206, 2.64641010174172, 2.69966594490103,
2.69665303568271, 2.71396750774502, 2.71900427132191, 2.72876269360869,
2.76276620421252, 2.76620189252239, 2.74632816231219, 2.74196673817286,
2.72905831066292, 2.75190757584346, 2.77801573354251, 2.84089580821293,
2.85681823660541, 2.84754572013613, 2.85858396073969, 2.86184353545653,
2.86958309986952, 2.94279115543111, 2.98631808884879, 3.00648449252989,
3.00620698598987, 3.15207693676406, 3.27614511764022, 3.32011714920345,
3.39367422894347, 3.64822360464499, 3.61835354049394, 3.59374251055335,
3.63237359915986, 3.62209957896007, 3.64554153297999, 3.71611226971083,
3.76031231050606, 3.80307769833913, 3.77959145461296, 3.74772344909971,
3.95072671083008, 4.03652777624058, 4.06630193640976, 4.08838169421096,
4.09074775372752, 4.09286687677964, 4.11466378890098, 4.14350067096966,
4.18153835521181, 4.21299240125327, 4.23975062689892, 4.26683207875595,
4.29265054707555, 4.31835343358436, 4.34946580314932, 4.37865522989399,
4.41294135451665), .Dim = c(204L, 1L), .Dimnames = list(NULL,
"price"), .Tsp = c(2004, 2020.91666666667, 12), class = "ts")

We could add a tryCatch inside the function:
which_arima <- function(data, p_max, d_max, q_max, method){
  model_acc <- data.frame()
  for (p in 0:p_max) {
    for (d in 0:d_max) {
      for (q in 0:q_max) {
        tryCatch({
          model <- arima(data, order = c(p, d, q), method = method)
          acc <- accuracy(model)
          acc <- as.data.frame(acc)
          acc_ext <- data.frame(acc,
                                loglikeli = logLik(model),
                                AIC = AIC(model),
                                BIC = BIC(model),
                                order = paste(p, d, q, sep = ","))
          acc_ext <- select(acc_ext,
                            ME, RMSE, MAE, MAPE, loglikeli, AIC, BIC, order)
          model_acc <- rbind(model_acc, acc_ext)
        }, error = function(e) NA)  # skip any order whose estimation fails
      }
    }
  }
  return(model_acc)
}
Testing:
a <- which_arima(data, p_max=4, d_max=2, q_max=4, method="ML")
Output:
> a
ME RMSE MAE MAPE loglikeli AIC BIC order
Training set 3.916595e-14 1.00150757 0.84665890 47.3734354 -289.77077 583.54155 590.17779 0,0,0
Training set1 1.507413e-03 0.50685119 0.42608540 23.8330904 -153.49920 312.99840 322.95276 0,0,1
Training set2 1.477754e-03 0.27462038 0.23150111 12.9162286 -31.20907 70.41814 83.69062 0,0,2
Training set3 1.349691e-03 0.16826013 0.13265807 7.3234273 67.17326 -124.34652 -107.75592 0,0,3
Training set4 1.205197e-03 0.12347033 0.09404764 5.1708085 132.56865 -253.13729 -233.22857 0,0,4
Training set5 1.649574e-02 0.03945226 0.02063318 0.9365795 367.68785 -733.37570 -730.06250 0,1,0
Training set6 1.103986e-02 0.03456075 0.01736215 0.7957414 394.41586 -784.83172 -778.20531 0,1,1
Training set7 1.033720e-02 0.03443721 0.01713550 0.7848747 395.13798 -784.27595 -774.33634 0,1,2
Training set8 9.546954e-03 0.03417545 0.01651661 0.7683963 396.69035 -785.38071 -772.12788 0,1,3
Training set9 8.268413e-03 0.03353547 0.01710311 0.7855244 400.43015 -790.86030 -774.29427 0,1,4
Training set10 1.081905e-04 0.03982073 0.01849307 0.8429273 363.50025 -725.00049 -721.69223 0,2,0
Training set11 2.800510e-03 0.03429965 0.01750163 0.8103622 392.52320 -781.04639 -774.42986 0,2,1
Training set12 2.920421e-03 0.03214346 0.01515181 0.7129633 405.66898 -805.33795 -795.41315 0,2,2
Training set13 2.915234e-03 0.03206868 0.01541923 0.7234715 406.11610 -804.23221 -790.99914 0,2,3
Training set14 2.915216e-03 0.03206786 0.01543761 0.7239875 406.11852 -802.23704 -785.69571 0,2,4
Training set15 1.609540e-02 0.03954680 0.02075934 0.9489873 365.76961 -725.53923 -715.58487 1,0,0
Training set16 1.067822e-02 0.03464237 0.01747532 0.8057485 392.50610 -777.01221 -763.73973 1,0,1
Training set17 7.714409e-03 0.03500020 0.01712196 0.8100354 390.85979 -771.71958 -755.12898 1,0,2
Training set18 9.510129e-03 0.03417676 0.01653561 0.7702435 398.64834 -785.29668 -765.38796 1,0,3
Training set19 9.299540e-03 0.03407723 0.01644942 0.7661016 399.22596 -784.45192 -761.22508 1,0,4
Training set20 8.521452e-03 0.03440107 0.01658612 0.7665062 395.36364 -786.72729 -780.10088 1,1,0
Training set21 9.502976e-03 0.03434348 0.01673934 0.7705014 395.69269 -785.38538 -775.44577 1,1,1
Training set22 3.638516e-03 0.03220174 0.01508764 0.7126483 408.13770 -808.27541 -795.02259 1,1,2
Training set23 3.626362e-03 0.03212825 0.01534293 0.7227711 408.58054 -807.16108 -790.59505 1,1,3
Training set24 8.353323e-03 0.03319389 0.01722817 0.8063780 402.38983 -792.77965 -772.90042 1,1,4
Training set25 1.322429e-04 0.03862934 0.01853910 0.8452931 369.60607 -735.21213 -728.59560 1,2,0
Training set26 2.950783e-03 0.03271462 0.01554742 0.7271035 402.15143 -798.30287 -788.37806 1,2,1
Training set27 2.918645e-03 0.03207500 0.01535616 0.7214170 406.08191 -804.16382 -790.93075 1,2,2
Training set28 2.915432e-03 0.03206844 0.01542446 0.7236258 406.11678 -802.23356 -785.69222 1,2,3
Training set29 2.892408e-03 0.03184546 0.01585528 0.7398747 407.44682 -802.89365 -783.04404 1,2,4
Training set30 3.275778e-02 0.06802502 0.03811120 1.7010099 257.85907 -507.71814 -494.44566 2,0,0
Training set31 9.458801e-03 0.03434793 0.01677640 0.7737224 397.64430 -785.28860 -768.69800 2,0,1
Training set32 1.041047e-02 0.03449857 0.01757479 0.8092751 393.34656 -774.69312 -754.78440 2,0,2
Training set33 1.036041e-02 0.03438249 0.01712067 0.7851881 397.17474 -780.34949 -757.12265 2,0,3
Training set34 9.291907e-03 0.03413569 0.01668650 0.7780739 395.47305 -774.94611 -748.40115 2,0,4
Training set35 8.657322e-03 0.03439622 0.01656361 0.7657220 395.39192 -784.78384 -774.84422 2,1,0
Training set36 8.975188e-03 0.03415064 0.01646538 0.7625588 396.82841 -785.65683 -772.40401 2,1,1
Training set37 3.623756e-03 0.03213180 0.01528195 0.7207391 408.54688 -807.09376 -790.52773 2,1,2
Training set38 3.632392e-03 0.03218922 0.01509295 0.7124041 408.20813 -804.41627 -784.53703 2,1,3
Training set39 3.593594e-03 0.03190425 0.01582339 0.7407521 409.90942 -805.81883 -782.62639 2,1,4
Training set40 2.046999e-04 0.03534316 0.01743186 0.8004223 387.39069 -768.78138 -758.85658 2,2,0
Training set41 2.900379e-03 0.03229554 0.01543942 0.7252999 404.68622 -801.37243 -788.13936 2,2,1
Training set42 2.912130e-03 0.03206051 0.01549744 0.7258632 406.16110 -802.32220 -785.78086 2,2,2
Training set43 2.748199e-03 0.03106662 0.01710118 0.8027724 411.95382 -811.90764 -792.05804 2,2,3
Training set44 2.757572e-03 0.03048849 0.01571731 0.7454319 413.92678 -813.85355 -790.69568 2,2,4
Training set45 8.190706e-03 0.03447649 0.01665674 0.7750253 393.49946 -776.99891 -760.40831 3,0,0
Training set46 8.485733e-03 0.03422971 0.01656490 0.7726290 394.93100 -777.86199 -757.95327 3,0,1
Training set47 9.212683e-03 0.03436951 0.01678990 0.7781612 393.54762 -773.09523 -749.86839 3,0,2
Training set48 8.721991e-03 0.03406162 0.01638032 0.7597073 399.31535 -782.63070 -756.08574 3,0,3
Training set49 -1.095108e-03 0.03200273 0.01626031 0.8020173 407.59681 -797.19361 -767.33053 3,0,4
Training set50 6.642458e-03 0.03334238 0.01646485 0.7579776 401.61737 -795.23474 -781.98192 3,1,0
Training set51 3.614071e-03 0.03235398 0.01536258 0.7247132 407.14878 -804.29756 -787.73153 3,1,1
Training set52 3.626052e-03 0.03212051 0.01541776 0.7251026 408.62481 -805.24962 -785.37038 3,1,2
Training set53 3.434470e-03 0.03112232 0.01708791 0.8047847 414.43876 -814.87751 -791.68507 3,1,3
Training set54 3.429177e-03 0.03037721 0.01633882 0.7892697 417.28525 -818.57050 -792.06486 3,1,4
Training set55 2.343659e-04 0.03506668 0.01740936 0.7937255 388.95388 -769.90777 -756.67470 3,2,0
Training set56 2.921378e-03 0.03207232 0.01556489 0.7249596 406.11547 -802.23095 -785.68961 3,2,1
Training set57 2.923439e-03 0.03200307 0.01554917 0.7264361 406.53973 -801.07945 -781.22984 3,2,2
Training set58 2.772438e-03 0.03033715 0.01644824 0.7949022 414.69079 -815.38158 -792.22370 3,2,3
Training set59 2.758142e-03 0.03032083 0.01638087 0.7893286 414.81461 -813.62923 -787.16309 3,2,4
Training set60 6.105981e-03 0.03341497 0.01657335 0.7683822 399.75129 -787.50258 -767.59386 4,0,0
Training set61 7.918597e-03 0.03223310 0.01733992 0.8410424 404.54182 -793.08364 -766.53868 4,0,2
Training set62 3.580192e-03 0.03210903 0.01545304 0.7266964 410.69767 -803.39533 -773.53225 4,0,3
Training set63 9.682367e-03 0.03234607 0.01684835 0.8031973 407.24757 -794.49514 -761.31394 4,0,4
Training set64 6.558516e-03 0.03333914 0.01646740 0.7571758 401.63677 -793.27354 -776.70751 4,1,0
Training set65 6.614327e-03 0.03334172 0.01646714 0.7576681 401.62115 -791.24231 -771.36307 4,1,1
Training set66 3.601945e-03 0.03225054 0.01523192 0.7204685 407.38418 -800.76837 -777.57593 4,1,2
Training set67 3.435674e-03 0.03038226 0.01636894 0.7939351 417.16875 -818.33749 -791.83184 4,1,3
Training set68 3.441183e-03 0.03057401 0.01590164 0.7605592 416.07385 -814.14770 -784.32885 4,1,4
Training set69 2.783446e-04 0.03453279 0.01699742 0.7813553 391.99285 -773.98571 -757.44437 4,2,0
Training set70 2.922130e-03 0.03191875 0.01548585 0.7279757 407.03673 -802.07347 -782.22386 4,2,1
Training set71 2.921712e-03 0.03191246 0.01550785 0.7286895 407.07153 -800.14306 -776.98519 4,2,2
Training set72 2.757144e-03 0.03032018 0.01638662 0.7906756 414.79253 -813.58506 -787.11892 4,2,3
Training set73 2.776647e-03 0.03052505 0.01588780 0.7626634 413.36293 -808.72586 -778.95146 4,2,4
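Once the table is returned you can rank the candidate specifications directly; note that orders which failed to converge (e.g. 4,0,1 above) are simply missing from the output. A minimal follow-up sketch using the a from the test call:

a[which.min(a$AIC), ]  # best model by AIC
a[which.min(a$BIC), ]  # best model by BIC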

Related

Finding the precision, recall and the f1 in R

I want to run models in a loop and then store the performance metrics in a table. I do not want to use the confusionMatrix function in caret; I want to compute the precision, recall and F1 myself and then store those in a table. Please assist, edits to the code are welcome.
My attempt is below.
library(MASS)  # will load our biopsy data
library(caret)
data("biopsy")
biopsy$ID <- NULL
names(biopsy) <- c('clump thickness','uniformity cell size','uniformity cell shape',
                   'marginal adhesion','single epithelial cell size','bare nuclei',
                   'bland chromatin','normal nuclei','mitosis','class')
sum(is.na(biopsy))
biopsy <- na.omit(biopsy)
sum(is.na(biopsy))
head(biopsy, 5)
set.seed(123)
inTraining <- createDataPartition(biopsy$class, p = .75, list = FALSE)
training <- biopsy[ inTraining,]
testing  <- biopsy[-inTraining,]
# Run algorithms using 10-fold cross validation
control <- trainControl(method="repeatedcv", number=10, repeats=5, verboseIter=F, classProbs=T)
# CHANGING THE CHARACTERS INTO FACTOR VARIABLES
training <- as.data.frame(unclass(training), stringsAsFactors = TRUE)
testing  <- as.data.frame(unclass(testing), stringsAsFactors = TRUE)
models <- c("svmRadial", "rf")
results_table <- data.frame(models = models, stringsAsFactors = F)
for (i in models){
  model_train <- train(class~., data=training, method=i,
                       trControl=control, metric="Accuracy")
  predictions <- predict(model_train, newdata=testing)
  precision_ <- posPredValue(predictions, testing)
  recall_ <- sensitivity(predictions, testing)
  f1 <- (2*precision_*recall_)/(precision_+recall_)
  # put that in the results table
  results_table[i, "Precision"] <- precision_
  results_table[i, "Recall"] <- recall_
  results_table[i, "F1score"] <- f1
}
However, I get an error that says Error in posPredValue.default(predictions, testing) : inputs must be factors. I do not know where I went wrong, and any edits to my code are welcome.
I know that I could get precision, recall and F1 by just using the code below (B); however, this is a tutorial question where I am required not to use approach (B):
(B)
for (i in models){
  model_train <- train(class~., data=training, method=i,
                       trControl=control, metric="Accuracy")
  predictions <- predict(model_train, newdata=testing)
  print(confusionMatrix(predictions, testing$class, mode="prec_recall"))
}
A few things need to happen.
1) You have to change the function calls for posPredValue and sensitivity. For both, change testing to testing$class.
2) For results_table, i is the model name (a string), not a row number, so you're assigning results_table["rf", "Precision"] <- precision_, which creates a new row whose row name is "rf".
Here is your for statement, with the changes to the functions mentioned in 1) and a modification to address the issue in 2).
for (i in models){
  model_train <- train(class~., data = training, method = i,
                       trControl = control, metric = "Accuracy")
  assign("fit", model_train)
  predictions <- predict(model_train, newdata = testing)
  precision_ <- posPredValue(predictions, testing$class)
  recall_ <- sensitivity(predictions, testing$class)
  f1 <- (2*precision_ * recall_) / (precision_ + recall_)
  # put that in the results table
  results_table[results_table$models %in% i, "Precision"] <- precision_
  results_table[results_table$models %in% i, "Recall"] <- recall_
  results_table[results_table$models %in% i, "F1score"] <- f1
}
This is what it looks like for me.
results_table
# models Precision Recall F1score
# 1 svmRadial 0.9722222 0.9459459 0.9589041
# 2 rf 0.9732143 0.9819820 0.9775785
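As a sanity check, the same metrics can be computed by hand from a confusion table. A sketch for whichever model was fitted last in the loop, assuming the first factor level ("benign" in biopsy) is the positive class, which matches the posPredValue and sensitivity defaults:

tab <- table(predictions, testing$class)  # rows = predicted, columns = actual
precision_manual <- tab[1, 1] / sum(tab[1, ])
recall_manual    <- tab[1, 1] / sum(tab[, 1])
f1_manual <- (2 * precision_manual * recall_manual) / (precision_manual + recall_manual)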

Plot a Neural Net Curve Using neuralnet and ROCR package

I have a classification task and need to use the neuralnet and ROCR packages. The problem is that I get an error message when I use the prediction() function.
Here is my code:
#load packages
require(neuralnet)
library(ROCR)
#create data set
train<-read.table(file="train.txt",header=TRUE,sep=",")
test<- read.table(file="test.txt",header=TRUE,sep=",")
#build model and make predictions
nn.sag <- neuralnet(Type ~ Area+Perimeter+Compactness+Length+Width+Asymmetry+Groove, data = train, hidden = 5, algorithm = "sag", err.fct = "sse", linear.output = FALSE)
prob = compute(nn.sag, test[, -ncol(test)] )
prob.result <- prob$net.result
nn.pred = prediction(prob.result, test$Type)
pref <- performance(nn.pred, "tpr", "fpr")
plot(pref)
And here is the error message I get from the prediction() function:
'$ operator is invalid for atomic vectors'
The dataset looks like (only training dataset here):
Area,Perimeter,Compactness,Length,Width,Asymmetry,Groove,Type
14.8,14.52,0.8823,5.656,3.288,3.112,5.309,1
14.79,14.52,0.8819,5.545,3.291,2.704,5.111,1
14.99,14.56,0.8883,5.57,3.377,2.958,5.175,1
19.14,16.61,0.8722,6.259,3.737,6.682,6.053,0
15.69,14.75,0.9058,5.527,3.514,1.599,5.046,1
14.11,14.26,0.8722,5.52,3.168,2.688,5.219,1
13.16,13.55,0.9009,5.138,3.201,2.461,4.783,1
16.16,15.33,0.8644,5.845,3.395,4.266,5.795,0
15.01,14.76,0.8657,5.789,3.245,1.791,5.001,1
14.11,14.1,0.8911,5.42,3.302,2.7,5,1
17.98,15.85,0.8993,5.979,3.687,2.257,5.919,0
21.18,17.21,0.8989,6.573,4.033,5.78,6.231,0
14.29,14.09,0.905,5.291,3.337,2.699,4.825,1
14.59,14.28,0.8993,5.351,3.333,4.185,4.781,1
11.42,12.86,0.8683,5.008,2.85,2.7,4.607,1
12.11,13.47,0.8392,5.159,3.032,1.502,4.519,1
15.6,15.11,0.858,5.832,3.286,2.725,5.752,0
15.38,14.66,0.899,5.477,3.465,3.6,5.439,0
18.94,16.49,0.875,6.445,3.639,5.064,6.362,0
12.36,13.19,0.8923,5.076,3.042,3.22,4.605,1
14.01,14.29,0.8625,5.609,3.158,2.217,5.132,1
17.12,15.55,0.8892,5.85,3.566,2.858,5.746,0
15.78,14.91,0.8923,5.674,3.434,5.593,5.136,1
16.19,15.16,0.8849,5.833,3.421,0.903,5.307,1
14.43,14.4,0.8751,5.585,3.272,3.975,5.144,1
13.8,14.04,0.8794,5.376,3.155,1.56,4.961,1
14.46,14.35,0.8818,5.388,3.377,2.802,5.044,1
18.59,16.05,0.9066,6.037,3.86,6.001,5.877,0
18.75,16.18,0.8999,6.111,3.869,4.188,5.992,0
15.49,14.94,0.8724,5.757,3.371,3.412,5.228,1
12.73,13.75,0.8458,5.412,2.882,3.533,5.067,1
13.5,13.85,0.8852,5.351,3.158,2.249,5.176,1
14.38,14.21,0.8951,5.386,3.312,2.462,4.956,1
14.86,14.67,0.8676,5.678,3.258,2.129,5.351,1
18.45,16.12,0.8921,6.107,3.769,2.235,5.794,0
17.32,15.91,0.8599,6.064,3.403,3.824,5.922,0
20.2,16.89,0.8894,6.285,3.864,5.173,6.187,0
20.03,16.9,0.8811,6.493,3.857,3.063,6.32,0
18.14,16.12,0.8772,6.059,3.563,3.619,6.011,0
13.99,13.83,0.9183,5.119,3.383,5.234,4.781,1
15.57,15.15,0.8527,5.92,3.231,2.64,5.879,0
16.2,15.27,0.8734,5.826,3.464,2.823,5.527,1
20.97,17.25,0.8859,6.563,3.991,4.677,6.316,0
14.16,14.4,0.8584,5.658,3.129,3.072,5.176,1
13.45,14.02,0.8604,5.516,3.065,3.531,5.097,1
15.5,14.86,0.882,5.877,3.396,4.711,5.528,1
16.77,15.62,0.8638,5.927,3.438,4.92,5.795,0
12.74,13.67,0.8564,5.395,2.956,2.504,4.869,1
14.88,14.57,0.8811,5.554,3.333,1.018,4.956,1
14.28,14.17,0.8944,5.397,3.298,6.685,5.001,1
14.34,14.37,0.8726,5.63,3.19,1.313,5.15,1
14.03,14.16,0.8796,5.438,3.201,1.717,5.001,1
19.11,16.26,0.9081,6.154,3.93,2.936,6.079,0
14.52,14.6,0.8557,5.741,3.113,1.481,5.487,1
18.43,15.97,0.9077,5.98,3.771,2.984,5.905,0
18.81,16.29,0.8906,6.272,3.693,3.237,6.053,0
13.78,14.06,0.8759,5.479,3.156,3.136,4.872,1
14.69,14.49,0.8799,5.563,3.259,3.586,5.219,1
18.85,16.17,0.9056,6.152,3.806,2.843,6.2,0
12.88,13.5,0.8879,5.139,3.119,2.352,4.607,1
12.78,13.57,0.8716,5.262,3.026,1.176,4.782,1
14.33,14.28,0.8831,5.504,3.199,3.328,5.224,1
19.46,16.5,0.8985,6.113,3.892,4.308,6.009,0
19.38,16.72,0.8716,6.303,3.791,3.678,5.965,0
15.26,14.85,0.8696,5.714,3.242,4.543,5.314,1
20.24,16.91,0.8897,6.315,3.962,5.901,6.188,0
19.94,16.92,0.8752,6.675,3.763,3.252,6.55,0
20.71,17.23,0.8763,6.579,3.814,4.451,6.451,0
16.17,15.38,0.8588,5.762,3.387,4.286,5.703,0
13.02,13.76,0.8641,5.395,3.026,3.373,4.825,1
16.53,15.34,0.8823,5.875,3.467,5.532,5.88,0
13.89,14.02,0.888,5.439,3.199,3.986,4.738,1
18.98,16.57,0.8687,6.449,3.552,2.144,6.453,0
17.08,15.38,0.9079,5.832,3.683,2.956,5.484,1
15.03,14.77,0.8658,5.702,3.212,1.933,5.439,1
16.14,14.99,0.9034,5.658,3.562,1.355,5.175,1
18.65,16.41,0.8698,6.285,3.594,4.391,6.102,0
20.1,16.99,0.8746,6.581,3.785,1.955,6.449,0
17.99,15.86,0.8992,5.89,3.694,2.068,5.837,0
15.88,14.9,0.8988,5.618,3.507,0.7651,5.091,1
13.22,13.84,0.868,5.395,3.07,4.157,5.088,1
18.3,15.89,0.9108,5.979,3.755,2.837,5.962,0
19.51,16.71,0.878,6.366,3.801,2.962,6.185,0
A prediction() function is available in both the neuralnet and the ROCR packages, so do not load both packages together. First load neuralnet, train your model, then detach it using detach(), and then load the ROCR package. Try the following code:
#load packages
require(neuralnet)
#create data set
train<-read.table(file="train.txt",header=TRUE,sep=",")
test<- read.table(file="test.txt",header=TRUE,sep=",")
#build model and make predictions
nn.sag <- neuralnet(Type ~ Area+Perimeter+Compactness+Length+Width+Asymmetry+Groove, data = train, hidden = 5, algorithm = "sag", err.fct = "sse", linear.output = FALSE)
prob = compute(nn.sag, test[, -ncol(test)] )
prob.result <- prob$net.result
detach(package:neuralnet,unload = T)
library(ROCR)
nn.pred = prediction(prob.result, test$Type)
pref <- performance(nn.pred, "tpr", "fpr")
plot(pref)
Or just simply use ROCR::prediction(prob.result, test$Type) to select the right package explicitly.
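To see why the error happened, note that both packages export a function named prediction(), and whichever package is attached last masks the other. A quick sketch, reusing the variables from the question's code:

library(neuralnet)
library(ROCR)
environment(prediction)  # prints the namespace that currently owns prediction()
nn.pred <- ROCR::prediction(prob.result, test$Type)  # explicit namespace, no detach() needed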

How to solve "The data cannot have more levels than the reference" error when using confusionMatrix?

I'm using R.
I divided the data into train & test sets to check prediction accuracy.
This is my code:
library("tree")
credit<-read.csv("C:/Users/Administrator/Desktop/german_credit (2).csv")
library("caret")
set.seed(1000)
intrain<-createDataPartition(y=credit$Creditability,p=0.7,list=FALSE)
train<-credit[intrain, ]
test<-credit[-intrain, ]
treemod<-tree(Creditability~. , data=train)
plot(treemod)
text(treemod)
cv.trees<-cv.tree(treemod,FUN=prune.tree)
plot(cv.trees)
prune.trees<-prune.tree(treemod,best=3)
plot(prune.trees)
text(prune.trees,pretty=0)
install.packages("e1071")
library("e1071")
treepred<-predict(prune.trees, newdata=test)
confusionMatrix(treepred, test$Creditability)
The following error occurs in confusionMatrix:
Error in confusionMatrix.default(rpartpred, test$Creditability) : the data cannot have more levels than the reference
The credit data can be downloaded from this site:
http://freakonometrics.free.fr/german_credit.csv
If you look carefully at your plots, you will see that you are training a regression tree and not a classification tree.
If you run credit$Creditability <- as.factor(credit$Creditability) after reading in the data and use type = "class" in the predict function, your code should work.
Code:
credit <- read.csv("http://freakonometrics.free.fr/german_credit.csv" )
credit$Creditability <- as.factor(credit$Creditability)
library(caret)
library(tree)
library(e1071)
set.seed(1000)
intrain <- createDataPartition(y = credit$Creditability, p = 0.7, list = FALSE)
train <- credit[intrain, ]
test <- credit[-intrain, ]
treemod <- tree(Creditability ~ ., data = train)
cv.trees <- cv.tree(treemod, FUN = prune.tree)
plot(cv.trees)
prune.trees <- prune.tree(treemod, best = 3)
plot(prune.trees)
text(prune.trees, pretty = 0)
treepred <- predict(prune.trees, newdata = test, type = "class")
confusionMatrix(treepred, test$Creditability)
I had the same issue in classification. It turned out that there were ZERO observations in a specific group, which is why I got the error "the data cannot have more levels than the reference".
Make sure all the groups in your test set also appear in your training set.
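A quick pre-check along those lines (a sketch using the variables from the accepted answer; the setdiff should come back empty):

table(train$Creditability)  # class counts in the training set
table(test$Creditability)   # class counts in the test set
setdiff(unique(test$Creditability), unique(train$Creditability))  # classes in test but not in train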

Different results with "xgboost" official package vs. xgboost from "caret" package in R

I am new to the R programming language and I need to run "xgboost" for some experiments. The problem is that I need to cross-validate the model and get the accuracy, and I found two ways that give me different results:
With "caret" using:
library(mlbench)
library(caret)
library(caretEnsemble)
dtrain <- read.csv("student-mat.csv", header=TRUE, sep=";")
formula <- G3~.
dtrain$G3<-as.factor(dtrain$G3)
control <- trainControl(method="cv", number=10)
seed <- 10
metric <- "Accuracy"
fit.xgb <- train(formula, data=dtrain, method="xgbTree", metric=metric, trControl=control, nthread =4)
fit.xgb
fit.xgbl <- train(formula, data=dtrain, method="xgbLinear", metric=metric, trControl=control, nthread =4)
fit.xgbl
Using the "xgboost" package and the following code:
library(xgboost)
printArray <- function(label, array){
  cat(paste(label, paste(array, collapse = ", "), sep = ": \n"), "\n\n")
}
setwd("D:\\datasets")
dtrain <- read.csv("moodle7original(AtributosyNotaNumericos).csv", header=TRUE, sep=",")
label <- as.numeric(dtrain[[33]])
data <- as.matrix(sapply(dtrain, as.numeric))
croosvalid <-
  xgb.cv(
    data = data,
    nfold = 10,
    nround = 10,
    label = label,
    prediction = TRUE,
    objective = "multi:softmax",
    num_class = 33
  )
print(croosvalid)
printArray("Actual classes", label[label != croosvalid$pred])
printArray("Predicted classes", croosvalid$pred[label != croosvalid$pred])
correctlyClassified <- length(label[label == croosvalid$pred])
incorrectlyClassified <- length(label[label != croosvalid$pred])
accurancy <- correctlyClassified * 100 / (correctlyClassified + incorrectlyClassified)
print(paste("Accurancy: ", accurancy))
But the results differ very much on the same dataset. I usually get 99% accuracy on the student performance dataset with the second snippet of code and ~63% with the first one...
I set the same seed on both of them.
Am I wrong with the second? Please tell me why if so!
Two things differ between the two code snippets; the first one is the most serious:
When you call label <- as.numeric(dtrain[[11]]) and data <- as.matrix(sapply(dtrain, as.numeric)), the 11th column in data is actually label. Of course you'll get a high accuracy: the label itself is in the data! That's grave leakage; you should instead use data <- as.matrix(sapply(dtrain[,-11L], as.numeric)).
A minor difference is that you are using objective = "multi:softmax" in the second snippet, while caret implements objective = "multi:softprob" for multiclass classification. I don't know how much of a difference that makes, but it is different between the two snippets. Check it.
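A minimal sketch of the fix for the first point, keeping the answer's column index (adjust 11L to wherever your label column actually lives):

label <- as.numeric(dtrain[[11L]])
data  <- as.matrix(sapply(dtrain[, -11L], as.numeric))  # features only; label column dropped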

How to incorporate logLoss in caret

I'm attempting to incorporate logLoss as the performance measure used when tuning randomForest (and other classifiers) by way of caret (instead of the default options of Accuracy or Kappa).
The first R script executes without error using defaults. However, I get:
Error in { : task 1 failed - "unused argument (model = method)"
when using the second script.
The call logLoss(predict(rfModel,test[,-c(1,95)],type="prob"),test[,95]) works when applied to a separately trained randomForest model.
The dataframe has 100+ columns and 10,000+ rows. All elements are numeric outside of the 9-level categorical "target" at col=95. A row id is located in col=1.
Unfortunately, I'm not correctly grasping the guidance provided by http://topepo.github.io/caret/training.html, nor having much luck with Google searches.
Your help is greatly appreciated.
Working R script:
fitControl = trainControl(method = "repeatedcv",number = 10,repeats = 10)
rfGrid = expand.grid(mtry=c(1,9))
rfFit = train(target ~ ., data = train[,-1],method = "rf",trControl = fitControl,verbose = FALSE,tuneGrid = rfGrid)
Not working R script:
logLoss = function(data, lev = NULL, method = NULL) {
  lLoss = 0
  epp = 10^-15
  for (i in 1:nrow(data)) {
    index = as.numeric(lev[i])
    p = max(min(data[i, index], 1 - epp), epp)
    lLoss = lLoss - log(p)
  }
  lLoss = lLoss / nrow(data)
  names(lLoss) = c("logLoss")
  lLoss
}
fitControl = trainControl(method = "repeatedcv",number = 10,repeats = 10,summaryFunction = logLoss)
rfGrid = expand.grid(mtry=c(1,9))
rfFit = train(target ~ ., data = trainBal[,-1],method = "rf",trControl = fitControl,verbose = FALSE,tuneGrid = rfGrid)
I think you should set summaryFunction=mnLogLoss in trainControl and metric="logLoss" in train (I found it here). Like this:
# load libraries
library(caret)
# load the dataset
data(iris)
# prepare resampling method
control <- trainControl(method="cv", number=5, classProbs=TRUE, summaryFunction=mnLogLoss)
set.seed(7)
fit <- train(Species~., data=iris, method="rf", metric="logLoss", trControl=control)
# display results
print(fit)
Your argument name is not correct (hence the "unused argument (model = method)" error). The webpage says that the last function argument should be called model, not method.
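For completeness, here is a sketch of the question's custom summary rewritten with the required signature function(data, lev, model). Two further fixes are folded in: lev holds the class levels (not one entry per row), so the true-class probability is looked up via data$obs instead of lev[i], and classProbs = TRUE is needed in trainControl so the probability columns exist:

logLossSummary <- function(data, lev = NULL, model = NULL) {
  epp <- 1e-15
  probs <- pmax(pmin(as.matrix(data[, lev]), 1 - epp), epp)  # clip class probabilities
  idx <- cbind(seq_len(nrow(data)), match(data$obs, lev))    # probability of the observed class per row
  out <- -mean(log(probs[idx]))
  names(out) <- "logLoss"
  out
}

Recent versions of caret treat a metric named "logLoss" as one to minimize by default, so metric = "logLoss" in train() should then behave as it does with mnLogLoss.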
