Caret error: "all the Accuracy metric values are missing" - r

I'm getting the following error and I don't know what may have gone wrong.
I'm using RStudio with R 3.1.3 on Windows 8.1 and the caret package for data mining.
I have the following training data:
str(training)
'data.frame': 212300 obs. of 21 variables:
$ FL_DATE_MDD_MMDD : int 101 101 101 101 101 101 101 101 101 101 ...
$ FL_DATE : int 1012013 1012013 1012013 1012013 1012013 1012013 1012013 1012013 1012013 1012013 ...
$ UNIQUE_CARRIER : Factor w/ 13 levels "9E","AA","AS",..: 11 10 2 5 8 9 11 10 10 10 ...
$ DEST : Factor w/ 150 levels "ABE","ABQ","ALB",..: 111 70 82 8 8 31 110 44 53 80 ...
$ DEST_CITY_NAME : Factor w/ 148 levels "Akron, OH","Albany, NY",..: 107 61 96 9 9 29 106 36 97 78 ...
$ ROUNDED_TIME : int 451 451 551 551 551 551 551 551 551 551 ...
$ CRS_DEP_TIME : int 500 520 600 600 600 600 600 600 602 607 ...
$ DEP_DEL15 : Factor w/ 2 levels "0","1": 1 2 1 1 1 1 1 2 1 1 ...
$ CRS_ARR_TIME : int 746 813 905 903 855 815 901 744 901 841 ...
$ Conditions : Factor w/ 28 levels "Blowing Snow",..: 2 2 2 2 2 2 2 2 2 2 ...
$ Dew.PointC : num -14.4 -14.4 -14.4 -14.4 -14.4 -14.4 -14.4 -14.4 -14.4 -14.4 ...
$ Events : Factor w/ 10 levels "","Fog","Fog-Rain",..: 1 1 1 1 1 1 1 1 1 1 ...
$ Gust.SpeedKm.h : num NA NA NA NA NA NA NA NA NA NA ...
$ Humidity : int 68 68 71 71 71 71 71 71 71 71 ...
$ Precipitationmm : num NA NA NA NA NA NA NA NA NA NA ...
$ Sea.Level.PressurehPa: num 1021 1021 1022 1022 1022 ...
$ TemperatureC : num -9.4 -9.4 -10 -10 -10 -10 -10 -10 -10 -10 ...
$ VisibilityKm : num 16.1 16.1 16.1 16.1 16.1 16.1 16.1 16.1 16.1 16.1 ...
$ Wind.Direction : Factor w/ 18 levels "Calm","East",..: 9 9 7 7 7 7 7 7 7 7 ...
$ WindDirDegrees : int 320 320 330 330 330 330 330 330 330 330 ...
$ Wind.SpeedKm.h : num 20.4 20.4 13 13 13 13 13 13 13 13 ...
- attr(*, "na.action")=Class 'omit' Named int [1:22539] 3 32 45 87 94 325 472 548 949 1333 ...
.. ..- attr(*, "names")= chr [1:22539] "3" "32" "45" "87" ...
and when I execute the following command:
ldaModel <- train(DEP_DEL15~.,data=training,method="lda",preProc=c("center","scale"),na.remove=TRUE)
I get:
Something is wrong; all the Accuracy metric values are missing:
Accuracy Kappa
Min. : NA Min. : NA
1st Qu.: NA 1st Qu.: NA
Median : NA Median : NA
Mean :NaN Mean :NaN
3rd Qu.: NA 3rd Qu.: NA
Max. : NA Max. : NA
NA's :1 NA's :1
Error in train.default(x, y, weights = w, ...) : Stopping

It is probably due to having an outcome factor with levels "0" and "1".
There is a specific warning issued when this happens: "At least one of the class levels are not valid R variables names; This may cause errors if class probabilities are generated because the variables names will be converted to: X0, X1".
It seems that people uniformly ignore warnings, so I'm going to make this throw an error in the next version.
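If that is the cause, a minimal sketch of the fix is to give the outcome factor syntactically valid level names before calling train() (the "No"/"Yes" labels below are just an illustrative choice):

levels(training$DEP_DEL15) <- c("No", "Yes")   # old levels were "0" and "1"
# or, more generically, let R generate valid names ("X0", "X1"):
# levels(training$DEP_DEL15) <- make.names(levels(training$DEP_DEL15))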

If the variables Gust.SpeedKm.h and Precipitationmm contain only NAs, try omitting them from your data before running the model. If they contain partial NAs and you think they could have predictive value as features, use imputation. The caret pre-processing documentation covers the available imputation methods.
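A hedged sketch of both options, assuming Gust.SpeedKm.h and Precipitationmm are the only problem columns; "medianImpute" is one of several imputation methods preProcess() supports:

# Drop any columns that are entirely NA (they carry no information)
all_na   <- sapply(training, function(x) all(is.na(x)))
training <- training[, !all_na]

# Or keep partially missing columns and let caret impute them;
# na.action = na.pass makes sure the NA rows reach preProcess()
ldaModel <- train(DEP_DEL15 ~ ., data = training,
                  method    = "lda",
                  preProc   = c("medianImpute", "center", "scale"),
                  na.action = na.pass)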

Related

Agricolae, tapply, error: arguments must have same length

I am new to R and I am having issues moving forward with the data analysis. My Excel data has a lot of NAs and I tried troubleshooting this error. Here's my code if anyone can help, and a link to a sample of my data:
file:///C:/Users/steph/Documents/DLI%20ANOVA%20Sample.htm
Some of my variables have 4 reps instead of all 8 reps, so I have a lot of NAs in the Excel file. I keep getting this error after I try tapply:
Error in tapply(X = data1$gi..m3., INDEX = data1$cultivar, FUN = mean, :
arguments must have same length
library(agricolae)
data1=read.csv("DLI ANOVA Sample.csv", header=T, as.is=T)
#setting factors
block = as.factor(data1$block)
treatmentt = as.factor(data1$trt)
cultivar<-factor(data1$cv,c("CR", "LB","RF","RR","S","SNS","SNY","SSJ","YC"))
str(data1)
#Summary statistics
tapply(X = data1$growth.index, INDEX = data1$cultivar, FUN = mean, na.rm=T)
tapply(X = data1$growth.index, INDEX = data1$treatment, FUN = mean, na.rm=T)
'data.frame': 288 obs. of 24 variables:
$ block : int 1 1 2 2 3 3 4 4 1 1 ...
$ trt : chr "HL-L" "HL-L" "HL-L" "HL-L" ...
$ cv : chr "CR" "CR" "CR" "CR" ...
$ rep : int 1 2 3 4 5 6 7 8 1 2 ...
$ height : int 23 20 25 19 23 19 22 19 19 24
$ growth.index : num 0.0221 0.0258 0.0276 0.0227 0.0209
$ number.of.mature.fruit : int 34 30 35 34 28 25 40 24 12 16 ...
$ mature.fruit.fw : num 163 163 186 152 169 ...
$ number.of.immature.fruit : int 38 28 40 27 35 37 44 48 20 30 ...
$ immature.fruit.fw : num 77.4 66.6 87.6 43.4 81.3 ...
$ Total.number.of.fruit : num 72 58 75 61 63 62 84 72 32 46 ...
$ Total.fruit.fw : num 241 230 273 195 250 ...
$ Fruit.Water.Content..g. : num NA 209 NA 176 NA ...
$ Brix.. : num 4.9 NA 5.6 NA 4.7 NA 5.1 NA 5.6 NA ...
$ pH : num 4.17 NA 4.3 NA 4.1 ...
$ EC.uS.mL : num 4.46 NA 9.19 NA 8.24 ...
$ X..citric.Acid : num 0.704 NA 0.397 NA 0.653 ...
$ Sugar.Acid.Ratio : num 6.96 NA 14.11 NA 7.2 ...
$ oedema.injury.level..1.6. : int 3 3 1 2 1 1 1 2 2 1 ...
$ Stomatal.conductance : num NA 365 NA 422 NA ...
$ spad : num NA NA NA 64.3 NA 65.5 NA 68.7 NA 55.6 ...
$ Irrigation.Events : int NA 14 NA 12 NA 13 NA 16 NA 13 ...
$ WUE : num NA 0.00584 NA 0.00693 NA ...
$ transpiration..g.H2O.lost..g.dry.biomass.: num NA 117 NA 111 NA ...
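A likely cause, judging from the str() output above: data1 has columns cv and trt but no cultivar or treatment, so data1$cultivar and data1$treatment are NULL and tapply() receives a zero-length INDEX. A sketch of the fix is to index by the factors created earlier:

cultivar  <- factor(data1$cv, c("CR", "LB", "RF", "RR", "S", "SNS", "SNY", "SSJ", "YC"))
treatment <- as.factor(data1$trt)

# Group means by cultivar and by treatment, ignoring the NAs
tapply(X = data1$growth.index, INDEX = cultivar,  FUN = mean, na.rm = TRUE)
tapply(X = data1$growth.index, INDEX = treatment, FUN = mean, na.rm = TRUE)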

leaflet, Error: cannot allocate vector of size 177.2 Mb

I have tried everything I can think of to fix this error but have not been able to figure it out. I am on a 32-bit machine, trying to build a choropleth. The data file is pretty basic: some municipal IDs with population figures associated with them. The shape file is taken from here: www.ontario.ca/data/municipal-boundaries
library('tmap')
library('leaflet')
library('magrittr')
library('rio')
library('plyr')
library('scales')
library('htmlwidgets')
library('tmaptools')
setwd("C:/Users/rdhasa/desktop")
datafile <- "shapefiles2/Population - 2014.csv"
Pop2014 <- rio::import(datafile)
Pop2014$Population <- as.factor(Pop2014$Population)
str(Pop2014)
'data.frame': 454 obs. of 9 variables:
$ MUNID : int 20002 18000 18013 18001 18005 18017 18009 18039 18020 18029 ...
$ YEAR : int 2015 2015 2015 2015 2015 2015 2015 2015 2015 2015 ...
$ MAH CODE : int 1106 10000 10101 10102 10401 10402 10404 10601 10602 10603 ...
$ V4 : int 1999 1800 1813 1801 1805 1817 1809 1839 1820 1829 ...
$ Municipality: chr "Toronto C" "Durham R" "Oshawa C" "Pickering C" ...
$ Tier : chr "ST" "UT" "LT" "LT" ...
$ A : int 11 11 11 11 11 11 11 11 11 11 ...
$ B : chr "a" "a" "a" "a" ...
$ Population : Factor w/ 438 levels "-","1,006","1,026",..: 160 359 117 432 86 419 97 73 179 171 ...
mnshape <- "shapefiles2/MUNICIPAL_BOUNDARY_LOWER_AND_SINGLE_TIER.shp"
mngeo2 <- read_shape(file=mnshape)
str(mngeo2@data)
'data.frame': 683 obs. of 13 variables:
$ MUNID : int 1002 1002 1002 1009 1009 1009 1016 1016 1016 1026 ...
$ MAH_CODE : int 71616 71616 71616 71618 71618 71618 71614 71614 71614 71613 ...
$ SGC_CODE : int 1005 1005 1005 1011 1011 1011 1020 1020 1020 1030 ...
$ ASSESSMENT: int 101 101 101 406 406 406 506 506 506 511 ...
$ LEGAL_NAME: Factor w/ 414 levels "CITY OF BARRIE",..: 369 369 369 370 370 370 96 96 96 334 ...
$ STATUS : Factor w/ 2 levels "LOWER TIER","SINGLE TIER": 1 1 1 1 1 1 1 1 1 1 ...
$ EXTENT : Factor w/ 3 levels "ISLANDS","LAND",..: 1 2 3 1 2 3 1 2 3 2 ...
$ MSO : Factor w/ 4 levels "CENTRAL","EASTERN",..: 2 2 2 2 2 2 2 2 2 2 ...
$ NAME_PREFI: Factor w/ 8 levels "-","CITY OF",..: 6 6 6 6 6 6 4 4 4 6 ...
$ UPPER_TIER: Factor w/ 30 levels "BRUCE","DUFFERIN",..: 27 27 27 27 27 27 27 27 27 27 ...
$ NAME : Factor w/ 413 levels "ADDINGTON HIGHLANDS",..: 339 339 339 342 342 342 337 337 337 259 ...
$ Shape_Leng: num 0.115 1.622 1.563 0.551 1.499 ...
$ Shape_Area: num 2.32e-05 6.95e-02 7.51e-03 5.63e-04 5.09e-02 ...
mnmap <- append_data(mngeo2, Pop2014, key.shp = "MUNID", key.data="MUNID")
minPct <- min(c(mnmap@data$Population))
maxPct <- max(c(mnmap@data$Population))
paletteLayers <- colorBin(palette = "RdBu", domain = c(minPct, maxPct), bins = c(0, 50000,200000 ,500000, 1000000, 2000000) , pretty=FALSE)
rm(mngeo2)
rm(Pop2014)
rm(mnshape)
rm(datafile)
rm(maxPct)
rm(minPct)
gc()
leaflet(mnmap) %>%
  addProviderTiles("CartoDB.Positron") %>%
  addPolygons(stroke = TRUE,
              smoothFactor = 0.2,
              weight = 1,
              fillOpacity = 0.6)
Error: cannot allocate vector of size 177.2 Mb
Is there a way I could maybe save space by simplifying the shape file? If so, how would I go about doing this efficiently?
Thanks
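One way to cut memory use is to simplify the polygons before handing them to leaflet. A sketch, assuming the rmapshaper package is available (tmaptools::simplify_shape is an alternative):

library('rmapshaper')

# Keep roughly 5% of the original vertices; raise `keep` if the boundaries get too coarse
mnmap_small <- ms_simplify(mnmap, keep = 0.05, keep_shapes = TRUE)

leaflet(mnmap_small) %>%
  addProviderTiles("CartoDB.Positron") %>%
  addPolygons(stroke = TRUE, smoothFactor = 0.2, weight = 1, fillOpacity = 0.6)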

How to deal with "rank-deficient fit may be misleading" in R?

I'm trying to predict the values of a test data set based on a train data set. It predicts values (no errors), but the predictions deviate A LOT from the original values, even producing values around -356 although none of the original values exceeds 200 (and there are no negative values). The warning is bugging me, as I suspect the predictions deviate so much because of it.
Warning message:
In predict.lm(fit2, data_test) :
prediction from a rank-deficient fit may be misleading
Is there any way I can get rid of this warning? The code is simple:
fit2 <- lm(runs~., data=train_data)
prediction<-predict(fit2, data_test)
prediction
I searched a lot but, to be honest, I couldn't understand much about this error.
Here is the str of the test and train data sets in case someone needs them:
> str(train_data)
'data.frame': 36 obs. of 28 variables:
$ matchid : int 57 58 55 56 53 54 51 52 45 46 ...
$ TeamName : chr "South Africa" "West Indies" "South Africa" "West Indies" ...
$ Opp_TeamName : chr "West Indies" "South Africa" "West Indies" "South Africa" ...
$ TeamRank : int 4 3 4 3 4 3 10 7 5 1 ...
$ Opp_TeamRank : int 3 4 3 4 3 4 7 10 1 5 ...
$ Team_Top10RankingBatsman : int 0 1 0 1 0 1 0 0 2 2 ...
$ Team_Top50RankingBatsman : int 4 6 4 6 4 6 3 5 4 3 ...
$ Team_Top100RankingBatsman: int 6 8 6 8 6 8 7 7 7 6 ...
$ Opp_Top10RankingBatsman : int 1 0 1 0 1 0 0 0 2 2 ...
$ Opp_Top50RankingBatsman : int 6 4 6 4 6 4 5 3 3 4 ...
$ Opp_Top100RankingBatsman : int 8 6 8 6 8 6 7 7 6 7 ...
$ InningType : chr "1st innings" "2nd innings" "1st innings" "2nd innings" ...
$ Runs_OverAll : num 361 705 348 630 347 ...
$ AVG_Overall : num 27.2 20 23.3 19.1 24 ...
$ SR_Overall : num 128 121 120 118 118 ...
$ Runs_Last10Matches : num 118.5 71 102.1 71 78.6 ...
$ AVG_Last10Matches : num 23.7 20.4 20.9 20.4 23.2 ...
$ SR_Last10Matches : num 120 106 114 106 116 ...
$ Runs_BatingFirst : num 236 459 230 394 203 ...
$ AVG_BatingFirst : num 30.6 23.2 24 21.2 27.1 ...
$ SR_BatingFirst : num 127 136 123 125 118 ...
$ Runs_BatingSecond : num 124 262 119 232 144 ...
$ AVG_BatingSecond : num 25.5 18.3 22.8 17.8 22.8 ...
$ SR_BatingSecond : num 125 118 112 117 114 ...
$ Runs_AgainstTeam2 : num 88.3 118.3 76.3 103.9 49.3 ...
$ AVG_AgainstTeam2 : num 28.2 23 24.7 22.1 16.4 ...
$ SR_AgainstTeam2 : num 139 127 131 128 111 ...
$ runs : int 165 168 231 236 195 126 143 141 191 135 ...
> str(data_test)
'data.frame': 34 obs. of 28 variables:
$ matchid : int 59 60 61 62 63 64 65 66 69 70 ...
$ TeamName : chr "India" "West Indies" "England" "New Zealand" ...
$ Opp_TeamName : chr "West Indies" "India" "New Zealand" "England" ...
$ TeamRank : int 2 3 5 1 4 8 6 2 10 1 ...
$ Opp_TeamRank : int 3 2 1 5 8 4 2 6 1 10 ...
$ Team_Top10RankingBatsman : int 1 1 2 2 0 0 1 1 0 2 ...
$ Team_Top50RankingBatsman : int 5 6 4 3 4 2 5 5 3 3 ...
$ Team_Top100RankingBatsman: int 7 8 7 6 6 5 7 7 7 6 ...
$ Opp_Top10RankingBatsman : int 1 1 2 2 0 0 1 1 2 0 ...
$ Opp_Top50RankingBatsman : int 6 5 3 4 2 4 5 5 3 3 ...
$ Opp_Top100RankingBatsman : int 8 7 6 7 5 6 7 7 6 7 ...
$ InningType : chr "1st innings" "2nd innings" "2nd innings" "1st innings" ...
$ Runs_OverAll : num 582 618 470 602 509 ...
$ AVG_Overall : num 25 21.8 20.3 20.7 19.6 ...
$ SR_Overall : num 113 120 123 120 112 ...
$ Runs_Last10Matches : num 182 107 117 167 140 ...
$ AVG_Last10Matches : num 37.1 43.8 21 24.9 27.3 ...
$ SR_Last10Matches : num 111 153 122 141 120 ...
$ Runs_BatingFirst : num 319 314 271 345 294 ...
$ AVG_BatingFirst : num 23.6 17.8 20.6 20.3 19.5 ...
$ SR_BatingFirst : num 116.9 98.5 118 124.3 115.8 ...
$ Runs_BatingSecond : num 264 282 304 256 186 ...
$ AVG_BatingSecond : num 28 23.7 31.9 21.6 16.5 ...
$ SR_BatingSecond : num 96.5 133.9 129.4 112 99.5 ...
$ Runs_AgainstTeam2 : num 98.2 95.2 106.9 75.4 88.5 ...
$ AVG_AgainstTeam2 : num 45.3 42.7 38.1 17.7 27.1 ...
$ SR_AgainstTeam2 : num 125 138 152 110 122 ...
$ runs : int 192 196 159 153 122 120 160 161 70 145 ...
In simple words, how can I get rid of this warning so that it doesn't affect my predictions? The fitted coefficients look like this:
(Intercept) matchid TeamNameBangladesh
1699.98232628 -0.06793787 59.29445330
TeamNameEngland TeamNameIndia TeamNameNew Zealand
347.33030177 -499.40074338 -179.19192936
TeamNamePakistan TeamNameSouth Africa TeamNameSri Lanka
-272.71610614 -3.54867488 -45.27920191
TeamNameWest Indies Opp_TeamNameBangladesh Opp_TeamNameEngland
-345.54349798 135.05901017 108.04227770
Opp_TeamNameIndia Opp_TeamNameNew Zealand Opp_TeamNamePakistan
-162.24418387 -60.55364436 -114.74599364
Opp_TeamNameSouth Africa Opp_TeamNameSri Lanka Opp_TeamNameWest Indies
196.90856999 150.70170068 -6.88997714
TeamRank Opp_TeamRank Team_Top10RankingBatsman
NA NA NA
Team_Top50RankingBatsman Team_Top100RankingBatsman Opp_Top10RankingBatsman
NA NA NA
Opp_Top50RankingBatsman Opp_Top100RankingBatsman InningType2nd innings
NA NA 24.24029455
Runs_OverAll AVG_Overall SR_Overall
-0.59935875 20.12721378 -13.60151334
Runs_Last10Matches AVG_Last10Matches SR_Last10Matches
-1.92526750 9.24182916 1.23914363
Runs_BatingFirst AVG_BatingFirst SR_BatingFirst
1.41001672 -9.88582744 -6.69780509
Runs_BatingSecond AVG_BatingSecond SR_BatingSecond
-0.90038727 -7.11580086 3.20915976
Runs_AgainstTeam2 AVG_AgainstTeam2 SR_AgainstTeam2
3.35936312 -5.90267210 2.36899131
You can have a look at this detailed discussion:
predict.lm() in a loop. warning: prediction from a rank-deficient fit may be misleading
In general, multicollinearity can lead to a rank-deficient design matrix, whether in linear or logistic regression.
You can try applying PCA to tackle the multicollinearity and then fit the regression on the components afterwards.
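A sketch of how to see which terms cause the rank deficiency in this particular fit: any coefficient reported as NA above is aliased, i.e. perfectly collinear with the other predictors, and dropping those columns (or otherwise shrinking the predictor set, since there are only 36 rows) removes the warning:

# Terms whose coefficients came back NA are linearly dependent on the rest
aliased <- names(coef(fit2))[is.na(coef(fit2))]
aliased

# One option: refit without the aliased rank/ranking-count columns shown above
drop_cols <- c("TeamRank", "Opp_TeamRank",
               "Team_Top10RankingBatsman", "Team_Top50RankingBatsman",
               "Team_Top100RankingBatsman", "Opp_Top10RankingBatsman",
               "Opp_Top50RankingBatsman", "Opp_Top100RankingBatsman")
fit3 <- lm(runs ~ ., data = train_data[, !(names(train_data) %in% drop_cols)])
prediction <- predict(fit3, data_test)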

"length of 'dimnames' [1] not equal to array extent" error in linear regression summary in r

I'm running a straightforward linear regression model fit on the following dataframe:
> str(model_data_rev)
'data.frame': 128857 obs. of 12 variables:
$ ENTRY_4 : num 186 218 208 235 256 447 471 191 207 250 ...
$ ENTRY_8 : num 724 769 791 777 707 237 236 726 773 773 ...
$ ENTRY_12: num 2853 2989 3174 3027 3028 ...
$ ENTRY_16: num 2858 3028 3075 2992 3419 ...
$ ENTRY_20: num 7260 7188 7587 7560 7165 ...
$ EXIT_4 : num 70 82 105 114 118 204 202 99 73 95 ...
$ EXIT_8 : num 1501 1631 1594 1576 1536 ...
$ EXIT_12 : num 3862 3923 4158 3970 3895 ...
$ EXIT_16 : num 1559 1539 1737 1681 1795 ...
$ EXIT_20 : num 2145 2310 2217 2330 2291 ...
$ DAY : Ord.factor w/ 7 levels "Sun"<"Mon"<"Tues"<..: 2 3 4 5 6 7 1 2 3 4 ...
$ MONTH : Ord.factor w/ 12 levels "Jan"<"Feb"<"Mar"<..: 3 3 3 3 3 3 3 3 3 3 ...
I split the data in to training and test sets as follows using the caret package:
split<-createDataPartition(y = model_data_rev$EXIT_20, p = 0.7, list = FALSE)
d_training = model_data_rev[split,]
d_test = model_data_rev[-split,]
I train the model using the train function in the caret package:
ctrl<-trainControl(method = 'cv',number = 5)
lmCVFit<-train(EXIT_20 ~ ., data = d_training, method = 'lm', trControl = ctrl, metric='Rsquared')
summary(lmCVFit)
When I run summary(lmCVFit) I get the following error:
Error in summary.lm(object$finalModel, ...) :
length of 'dimnames' [1] not equal to array extent
In addition: Warning message:
In cbind(est, se, tval, 2 * pt(abs(tval), rdf, lower.tail = FALSE)) :
number of rows of result is not a multiple of vector length (arg 1)
I thought it might be related to my initial dataframe above. Specifically, I thought it could have to do with the factor variables, so I dropped them (not shown), ran everything again, and got the same error.
I also ran the regression without CV using the lm function in R and got the same error when I ran summary().
Has anyone seen this and can anyone help? I can't find anything online that speaks to this error in the context of regression.
Thanks in advance.
EDIT
I converted the ordered factor variables to standard factors. The structure now looks like this:
> str(model_data_rev)
'data.frame': 128857 obs. of 12 variables:
$ ENTRY_4 : num 186 218 208 235 256 447 471 191 207 250 ...
$ ENTRY_8 : num 724 769 791 777 707 237 236 726 773 773 ...
$ ENTRY_12: num 2853 2989 3174 3027 3028 ...
$ ENTRY_16: num 2858 3028 3075 2992 3419 ...
$ ENTRY_20: num 7260 7188 7587 7560 7165 ...
$ EXIT_4 : num 70 82 105 114 118 204 202 99 73 95 ...
$ EXIT_8 : num 1501 1631 1594 1576 1536 ...
$ EXIT_12 : num 3862 3923 4158 3970 3895 ...
$ EXIT_16 : num 1559 1539 1737 1681 1795 ...
$ EXIT_20 : num 2145 2310 2217 2330 2291 ...
$ DAY : Factor w/ 7 levels "Friday","Monday",..: 2 6 7 5 1 3 4 2 6 7 ...
$ MONTH : Factor w/ 12 levels "April","August",..: 8 8 8 8 8 8 8 8 8 8 ...
I still get the error when running summary() after fitting the model.
It is also important to emphasize that the model fitting itself works without throwing an error; it is summary() that throws the error.
Thanks.
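Without the data it is hard to pin the error down, but since the warning says the coefficient table and its row names disagree in length, a diagnostic sketch (assuming the plain-lm fit is called fit_lm) is to inspect the pieces that summary() combines:

fit_lm <- lm(EXIT_20 ~ ., data = d_training)

length(coef(fit_lm))               # number of coefficients the model reports
sum(is.na(coef(fit_lm)))           # aliased (NA) coefficients, if any
anyDuplicated(names(d_training))   # duplicated column names can corrupt dimnames
alias(fit_lm)                      # which terms, if any, are perfectly collinear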

Error in evalSummaryFunction Caret R

I have this data set
'data.frame': 212300 obs. of 19 variables:
$ FL_DATE_MDD_MMDD : int 101 101 101 101 101 101 101 101 101 101 ...
$ FL_DATE : int 1012013 1012013 1012013 1012013 1012013 1012013 1012013 1012013 1012013 1012013 ...
$ UNIQUE_CARRIER : Factor w/ 13 levels "9E","AA","AS",..: 11 10 2 5 8 9 11 10 10 10 ...
$ DEST : Factor w/ 150 levels "ABE","ABQ","ALB",..: 111 70 82 8 8 31 110 44 53 80 ...
$ DEST_CITY_NAME : Factor w/ 148 levels "Akron, OH","Albany, NY",..: 107 61 96 9 9 29 106 36 97 78 ...
$ ROUNDED_TIME : int 451 451 551 551 551 551 551 551 551 551 ...
$ CRS_DEP_TIME : int 500 520 600 600 600 600 600 600 602 607 ...
$ DEP_DEL15 : Factor w/ 2 levels "No","Yes": 1 2 1 1 1 1 1 2 1 1 ...
$ CRS_ARR_TIME : int 746 813 905 903 855 815 901 744 901 841 ...
$ Conditions : Factor w/ 28 levels "Blowing Snow",..: 2 2 2 2 2 2 2 2 2 2 ...
$ Dew.PointC : num -14.4 -14.4 -14.4 -14.4 -14.4 -14.4 -14.4 -14.4 -14.4 -14.4 ...
$ Events : Factor w/ 10 levels "","Fog","Fog-Rain",..: 1 1 1 1 1 1 1 1 1 1 ...
$ Humidity : int 68 68 71 71 71 71 71 71 71 71 ...
$ Sea.Level.PressurehPa: num 1021 1021 1022 1022 1022 ...
$ TemperatureC : num -9.4 -9.4 -10 -10 -10 -10 -10 -10 -10 -10 ...
$ VisibilityKm : num 16.1 16.1 16.1 16.1 16.1 16.1 16.1 16.1 16.1 16.1 ...
$ Wind.Direction : Factor w/ 18 levels "Calm","East",..: 9 9 7 7 7 7 7 7 7 7 ...
$ WindDirDegrees : int 320 320 330 330 330 330 330 330 330 330 ...
$ Wind.SpeedKm.h : num 20.4 20.4 13 13 13 13 13 13 13 13 ...
- attr(*, "na.action")=Class 'omit' Named int [1:22539] 3 32 45 87 94 325 472 548 949 1333 ...
.. ..- attr(*, "names")= chr [1:22539] "3" "32" "45" "87" ...
and when I execute the following in caret:
plsFit3x10cv <-train(DEP_DEL15~., data=training3, method="pls",trControl=ctrl,metric="ROC",preProc=c("center","scale"))
I get the error:
Error in evalSummaryFunction(y, wts = weights, ctrl = trControl, lev = classLevels, :
train()'s use of ROC codes requires class probabilities. See the classProbs option of trainControl()
The answer to your question is in the error message: "train()'s use of ROC codes requires class probabilities. See the classProbs option of trainControl()". So you need to set classProbs = TRUE in trainControl() and, of course, summaryFunction = twoClassSummary (if you have not already done so).
ctrl <- trainControl(method = "repeatedcv",
repeats = 3,
classProbs = TRUE,
summaryFunction = twoClassSummary)
plsFit3x10cv <-train(DEP_DEL15~.,
data=training3,
method="pls",
preProc=c("center","scale"),
metric="ROC",
trControl=ctrl)
