How to perform sensitivity analysis using Lek's profile in R? - r

I am trying to do sensitivity analysis using R. My data set has few continuous explanatory variables and a categorical response variable (7 categories).
I tried to run the below mentioned code.
model=train(factor(mode)~Time+Cost+Age+Income,
method="nnet",
preProcess("center","scale"),
data=train,
verbose=F,
trControl=trainControl(method='cv', verboseIter = F),
tuneGrid=expand.grid(.size=c(1:20), .decay=c(0,0.001,0.01,0.1)))
After getting the output through this code, I tried to develop Lek's profile using the below mentioned code.
Lekprofile(model)
However, I got the error stating "Errors in xvars[, x_names]: subscript out of bound"
Please help me to resolve the error.

It doesn't work for a classification model , for example, if we use a regression model:
library(caret)
library(NeuralNetTools)
library(mlbench)
data(BostonHousing)
str(BostonHousing)
'data.frame': 506 obs. of 14 variables:
$ crim : num 0.00632 0.02731 0.02729 0.03237 0.06905 ...
$ zn : num 18 0 0 0 0 0 12.5 12.5 12.5 12.5 ...
$ indus : num 2.31 7.07 7.07 2.18 2.18 2.18 7.87 7.87 7.87 7.87 ...
$ chas : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
$ nox : num 0.538 0.469 0.469 0.458 0.458 0.458 0.524 0.524 0.524 0.524 ...
$ rm : num 6.58 6.42 7.18 7 7.15 ...
$ age : num 65.2 78.9 61.1 45.8 54.2 58.7 66.6 96.1 100 85.9 ...
$ dis : num 4.09 4.97 4.97 6.06 6.06 ...
$ rad : num 1 2 2 3 3 3 5 5 5 5 ...
$ tax : num 296 242 242 222 222 222 311 311 311 311 ...
$ ptratio: num 15.3 17.8 17.8 18.7 18.7 18.7 15.2 15.2 15.2 15.2 ...
$ b : num 397 397 393 395 397 ...
$ lstat : num 4.98 9.14 4.03 2.94 5.33 ...
$ medv : num 24 21.6 34.7 33.4 36.2 28.7 22.9 27.1 16.5 18.9 ...
We train the model, exclude the categorical chas:
model = train(medv ~ .,data=BostonHousing[,-4],method="nnet",
trControl=trainControl(method="cv",number=10),
tuneGrid=data.frame(size=c(5,10,20),decay=0.1))
lekprofile(model)
You can see the y-axis is meant to be continuous. We can try to discretize our response variable medv and you can see it crashes:
BostonHousing$medv = cut(BostonHousing$medv,4)
model = train(medv ~ .,data=BostonHousing[,-4],method="nnet",
trControl=trainControl(method="cv",number=10),
tuneGrid=data.frame(size=c(5,10,20),decay=0.1))
lekprofile(model)
Error in `[.data.frame`(preds, , ysel, drop = FALSE) :
undefined columns selected

Related

"'groups' must be a factor" on Shapiro-Wilk test on Rcmdr

I am trying to run a shapiro-wilk normality test on R (Rcmdr to be more accurate) by going to "Statistics=>Summary=>Descriptive statistics" and then selecting one of my dependent variable and choosing "summary by group".
Rcmdr automatically triggers the following code :
normalityTest(Algometre.J0 ~ Modalite, test="shapiro.test",
data=Dataset)
And I am getting the following error message :
'groups' must be a factor.
I have already categorized my independant variable as a factor (I swear, I did !)
Any idea what's wrong ?
Thanx in advance
Here is what str(Dataset) shows :
'data.frame': 76 obs. of 11 variables:
$ Modalite : chr "C" "C" "C" "C" ...
$ Angle.J0 : num 20.1 20.5 21 22.5 19.1 ...
$ Angle.J1 : num 21.7 22.6 22.8 23.3 20.5 ...
$ Angle.J2 : num 22.3 23 23.9 24.2 21 ...
$ Epaisseur.J0: num 1.97 1.54 1.76 1.89 1.53 1.87 1.54 2 1.79 1.41 ...
$ Epaisseur.J1: num 2.07 1.49 1.87 1.91 1.54 1.9 1.51 2.03 1.71 1.48 ...
$ Epaisseur.J2: num 2.08 1.69 1.77 2 1.61 1.99 1.38 2.06 1.86 1.53 ...
$ Algometre.J0: num 45 40 105 165 66.3 ...
$ Algometre.J1: num 32.7 39.7 91.7 124 63.7 ...
$ Algometre.J2: num 51.3 58.7 101 138 60.3 ...
$ ObsNumber : int 1 2 3 4 5 6 7 8 9 10 ...
What does that mean ?

How to normalize all variables in an R dataframe (except for the one variable that's a factor)

I'm having difficulty applying the max-min normalize function to the predictor variables (30 of them) in my data frame without excluding the diagnosis variable (as it is a factor and not subject to the function) from the data frame.
```{r}
cancer_data <- as.data.frame(lapply(cancer_data, normalize))
```
This won't run bc it will prompt an error message referencing the factor column, but I don't want the new data frame to be created without that column. I would just like to apply the normalize function I created to the 30 predictor variables.
Here is the structure of my data frame if it provides helpful context at all:
str(cancer_data)
## 'data.frame': 569 obs. of 31 variables:
## $ diagnosis : Factor w/ 2 levels "Benign","Malignant": 1 1 1 1 1 1 1 2 1 1 ...
## $ radius_mean : num 12.3 10.6 11 11.3 15.2 ...
## $ texture_mean : num 12.4 18.9 16.8 13.4 13.2 ...
## $ perimeter_mean : num 78.8 69.3 70.9 73 97.7 ...
## $ area_mean : num 464 346 373 385 712 ...
## $ smoothness_mean : num 0.1028 0.0969 0.1077 0.1164 0.0796 ...
## $ compactness_mean : num 0.0698 0.1147 0.078 0.1136 0.0693 ...
## $ concavity_mean : num 0.0399 0.0639 0.0305 0.0464 0.0339 ...
## $ points_mean : num 0.037 0.0264 0.0248 0.048 0.0266 ...
## $ symmetry_mean : num 0.196 0.192 0.171 0.177 0.172 ...
## $ dimension_mean : num 0.0595 0.0649 0.0634 0.0607 0.0554 ...
## $ radius_se : num 0.236 0.451 0.197 0.338 0.178 ...
## $ texture_se : num 0.666 1.197 1.387 1.343 0.412 ...
## $ perimeter_se : num 1.67 3.43 1.34 1.85 1.34 ...
## $ area_se : num 17.4 27.1 13.5 26.3 17.7 ...
## $ smoothness_se : num 0.00805 0.00747 0.00516 0.01127 0.00501 ...
## $ compactness_se : num 0.0118 0.03581 0.00936 0.03498 0.01485 ...
## $ concavity_se : num 0.0168 0.0335 0.0106 0.0219 0.0155 ...
## $ points_se : num 0.01241 0.01365 0.00748 0.01965 0.00915 ...
## $ symmetry_se : num 0.0192 0.035 0.0172 0.0158 0.0165 ...
## $ dimension_se : num 0.00225 0.00332 0.0022 0.00344 0.00177 ...
## $ radius_worst : num 13.5 11.9 12.4 11.9 16.2 ...
## $ texture_worst : num 15.6 22.9 26.4 15.8 15.7 ...
## $ perimeter_worst : num 87 78.3 79.9 76.5 104.5 ...
## $ area_worst : num 549 425 471 434 819 ...
## $ smoothness_worst : num 0.139 0.121 0.137 0.137 0.113 ...
## $ compactness_worst: num 0.127 0.252 0.148 0.182 0.174 ...
## $ concavity_worst : num 0.1242 0.1916 0.1067 0.0867 0.1362 ...
## $ points_worst : num 0.0939 0.0793 0.0743 0.0861 0.0818 ...
## $ symmetry_worst : num 0.283 0.294 0.3 0.21 0.249 ...
## $ dimension_worst : num 0.0677 0.0759 0.0788 0.0678 0.0677 ...
Assuming you already have normalize function in your environment. You can get the numeric variables in your data and apply the function to selected columns using lapply.
cols <- sapply(cancer_data, is.numeric)
cancer_data[cols] <- lapply(cancer_data[cols], normalize)
Or without creating cols.
cancer_data[] <- lapply(cancer_data, function(x)
if(is.numeric(x)) normalize(x) else x)
If you want to exclude only 1st column, you can also use :
cancer_data[-1] <- lapply(cancer_data[-1], normalize)
This should work, but do look into tidymodels
Thanks to akrun for the new shorter answer.
library(tidyverse)
cancer_data <-cancer_data %>% mutate_if(negate(is.factor), normalize)

How to create a loop to run 2 variable generalized regression models?

I have 19 variables and I want to run 19 different regressions that consist of 2 independent variables from my dataset.
*Update -This is my dataset's structure:
$ Failure_Response_Var_Yr: num 0 0 0 0 0 0 0 0 0 0 ...
$ exp_var_nocorr_2 : num 4.61 5.99 6.13 3.17 4.4 ...
$ exp_var_nocorr_3 : num 4.16 5.46 5.24 2.86 3.72 ...
$ exp_var_nocorr_4 : num 0.00191 2.23004 0.5613 1.07986 0.99836 ...
$ exp_var_nocorr_5 : num 0.709 2.79 6.846 15.478 11.418 ...
$ exp_var_nocorr_6 : num 0.724 0.497 1.782 0.156 2.525 ...
$ exp_var_nocorr_7 : num 0 168.17 92.041 0.584 265.338 ...
$ exp_var_nocorr_8 : num -38.64 4.89 1.5 24.8 16.56 ...
$ exp_var_nocorr_9 : num 116 88.3 56.4 60.6 57.6 ...
$ exp_var_nocorr_10 : num 0 10.3 0 93.7 0 ...
$ exp_var_nocorr_11 : num 1.02 1.23 1.31 2.06 1.33 ...
$ exp_var_nocorr_12 : num 60 140 124 275 203 ...
$ exp_var_nocorr_13 : num 10.835 5.175 1.838 0.347 0.783 ...
$ exp_var_nocorr_14 : num 59 60.2 87.2 42.2 84.2 ...
$ exp_var_nocorr_15 : num 61.9 68.3 99 50.2 103.9 ...
$ exp_var_nocorr_16 : num 4.4 11.24 8.23 6.9 8.84 ...
$ exp_var_nocorr_17 : num 6.43 18.62 10.72 15.62 10.35 ...
I wrote this code:
col17 <- names(my.sample)[-c(1:9,26:29)]
Such that now dput(col17) gives out:
c("exp_var_nocorr_2", "exp_var_nocorr_3", "exp_var_nocorr_4", "exp_var_nocorr_5", "exp_var_nocorr_6", "exp_var_nocorr_7", "exp_var_nocorr_8", "exp_var_nocorr_9", "exp_var_nocorr_10", "exp_var_nocorr_11", "exp_var_nocorr_12", "exp_var_nocorr_13", "exp_var_nocorr_14", "exp_var_nocorr_15", "exp_var_nocorr_16", "exp_var_nocorr_17" )
`logit.test2 <- vector("list", length(col17))
#start of loop #
for(i in seq_along(col17)){
for(k in seq_along(col17)){
logit.test2[i] <- glm(reformulate(col17[i]+col17[k], "Failure_Response_Var_Yr"),
family=binomial(link='logit'), data=my.sample)
}
}`
# end of loop #
but it printed out this problem:
"Error in col17[i] + col17[k] : non-numeric argument to binary operator"
Can anybody hand me out a code that can fix this problem?

Error in r: undefined columns selected

I was trying to do a partition plot, and I used the following codes:
install.packages('klaR')
library(klaR)
partimat(Type~. , data = training, method = "lda")
partimat('Type'~. , data = training, method = "qda")
R gave me this error code:
Error in `[.data.frame`(m, xvars) : undefined columns selected
and my data is like this
Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 178 obs. of 13 variables:
$ Alcohol : num 14.2 13.2 13.2 14.4 13.2 ...
$ Malic acid : num 1.71 1.78 2.36 1.95 2.59 1.76 1.87 2.15 1.64 1.35 ...
$ Ash : num 2.43 2.14 2.67 2.5 2.87 2.45 2.45 2.61 2.17 2.27 ...
$ Alcalinity of ash : num 15.6 11.2 18.6 16.8 21 15.2 14.6 17.6 14 16 ...
$ Magnesium : int 127 100 101 113 118 112 96 121 97 98 ...
$ Total phenols : num 2.8 2.65 2.8 3.85 2.8 3.27 2.5 2.6 2.8 2.98 ...
$ Flavanoids : num 3.06 2.76 3.24 3.49 2.69 3.39 2.52 2.51 2.98 3.15 ...
$ Nonflavanoid phenols: num 0.28 0.26 0.3 0.24 0.39 0.34 0.3 0.31 0.29 0.22 ...
$ Proanthocyanins : num 2.29 1.28 2.81 2.18 1.82 1.97 1.98 1.25 1.98 1.85 ...
$ Color intensity : num 5.64 4.38 5.68 7.8 4.32 6.75 5.25 5.05 5.2 7.22 ...
$ Hue : num 1.04 1.05 1.03 0.86 1.04 1.05 1.02 1.06 1.08 1.01 ...
$ Proline : int 1065 1050 1185 1480 735 1450 1290 1295 1045 1045 ...
$ Type : int 1 1 1 1 1 1 1 1 1 1 ...
Please let me know how to solve it!
There is no Type variable in the UCI Machine Learning Wine data set. The classification variable is class, and it is the first column in the data set.
# data source: UCI ML Repository Wine data
# https://archive.ics.uci.edu/ml/datasets/wine
library(klaR)
colNames <- c("class","alcohol","malicAcid","ash","acalinityOfAsh",
"magnesium","totalPhenols","flavanoids","nonflavanoidPhenols",
"proanthocyanins","colorIntensity","hue","od280.od315OfDilutedWines",
"proline")
wine <- read.csv("./data/wine.csv",header=FALSE,col.names=colNames)
wine$class <- as.factor(wine$class)
partimat(class ~ alcohol + malicAcid, data=wine, method="lda",plot.matrix=FALSE)
...and the output:
I had the same problem and I could fix it by changing the name of my varibles. In my data set I had a variable whose name had a blank space at the beginning. The program could not recognize it and that triggered the error. I removed that blank space and the problem disappeared.

Error using IRMI imputation

I've been trying to use irmi from VIM package to replace NA's.
My data looks something like this:
> str(sub_mex)
'data.frame': 21 obs. of 83 variables:
$ pH : num 7.2 7.4 7.4 7.36 7.2 7.82 7.67 7.73 7.79 7.7 ...
$ Cond : num 1152 1078 1076 1076 1018 ...
$ CO3 : num NA NA NA NA NA ...
$ Mg : num 25.8 24.9 24.3 24.8 23.4 ...
$ NO3 : num 49.7 25.6 27.1 39.6 52.8 ...
$ Cd : num 0.0088 0.0104 0.0085 0.0092 0.0086 ...
$ As_H : num 0.006 0.0059 0.0056 0.0068 0.0073 ...
$ As_F : num 0.0056 0.0058 0.0057 0.0066 0.0065 0.004 0.004 0.004 0.0048 0.0078 ...
$ As_FC : num NA NA NA NA NA NA NA NA NA 0.0028 ...
$ Pb : num 0.0097 0.0096 0.0092 0.01 0.0093 0.0275 0.024 0.0255 0.031 0.024 ...
$ Fe : num 0.39 0.26 0.27 0.28 0.32 0.135 0.08 NA 0.13 NA ...
$ No_EPT : int 0 0 0 0 0 0 0 0 0 0 ...
I've subset my sub_mex dataset to analyze observations separately, so i have sub_t dataset. Which look something like this
> str(sub_t)
'data.frame': 5 obs. of 83 variables:
$ pH : num 7.82 7.67 7.73 7.79 7.7
$ CO3 : num 45 NA 37.2 41.9 40.3
$ Mg : num 41.3 51.4 47.7 51.8 53
$ NO3 : num 47.1 40.7 39.9 42.1 37.6
$ Cd : num 0.0173 0.0145 0.016 0.016 0.0154
$ As_H : num 0.00949 0.01009 0.00907 0.00972 0.00954
$ As_F : num 0.004 0.004 0.004 0.0048 0.0078
$ As_FC : num NA NA NA NA 0.0028
$ Pb : num 0.0275 0.024 0.0255 0.031 0.024
$ Fe : num 0.135 0.08 NA 0.13 NA
$ No_EPT : int 0 0 0 0 0
I impute NA's of the sub_mex dataset using:
imp_mexi <- irmi(sub_mex) which works fine
However when I try to impute the subset sub_t I got the following error message:
> imp_t <- irmi(sub_t)
Error in indexNA2s[, variable[j]] : subscript out of bounds
Does anyone have an idea of how to solve this? I want to impute my data sub_t and I don't want to use a subset of the ìmp_mexi imputed dataset.
Any help will be deeply appreciated.
I had a similar issue and discovered that one of the columns in my dataframe was entirely missing- hence the out of bounds error.

Resources