Model Prediction Partial Least Square Model - r

I am following the procedure explained in Hair et al. (2021) to run a partial least squares model with seminr.
So far, it has worked well. However, when I call the predict function, I get the following error:
Parallel encountered this ERROR:
Must subset columns with a valid subscript vector.
x Subscript endogenous_items must be a simple vector, not a matrix.
Error in summary.connection(connection) : invalid connection
I have only recently started working with R. My dataset is an Excel table with 579 obs. of 47 variables. How can I solve these problems? Thank you very much in advance.
That's my code:
# Create measurement model
final_mm_ext <- constructs(
  composite("EA", multi_items("EA_", 1:3)),
  composite("DB", multi_items("DB_", 1:5)),
  composite("LTB", multi_items("LTB_", 1:5)),
  composite("SN", multi_items("SN_", 1:3)),
  composite("PBC", multi_items("PBC_", 1:3)),
  composite("SE", multi_items("SE_", 1:8)),
  composite("INT", multi_items("INT_", 1:2)),
  composite("B", multi_items("B_", 1:2)))
# Create structural model
final_sm_ext <- relationships(
  paths(from = c("DB", "LTB", "SN", "PBC", "SE", "INT"), to = c("B")),
  paths(from = c("EA"), to = c("INT")))
# Estimate the model
bike_final_model_ext <- estimate_pls(data = PLSdata,
  measurement_model = final_mm_ext,
  structural_model = final_sm_ext,
  inner_weights = path_weighting,
  missing = mean_replacement,
  missing_value = "-99")
summary_bike_final_model_ext <- summary(bike_final_model_ext)
# Generate predictions (this is the call that raises the error)
predict_bike_final_model_ext <- predict_pls(model = bike_final_model_ext,
  technique = predict_DA, noFolds = 10, reps = 10)
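The wording "Must subset columns with a valid subscript vector" is how tibbles report an invalid column index. One possible cause, which is an assumption and not confirmed anywhere in the question, is that PLSdata was imported from Excel as a tibble instead of a base data frame. A minimal sketch of converting it before estimation, using a hypothetical stand-in for PLSdata:

```r
# Hypothetical stand-in for the imported Excel sheet. The extra classes mimic
# what a tibble carries; the tibble package itself is not needed here.
PLSdata <- structure(
  data.frame(EA_1 = c(1, 2, 3), EA_2 = c(2, 3, 4)),
  class = c("tbl_df", "tbl", "data.frame")
)

# as.data.frame() drops the tibble classes and leaves the columns unchanged.
PLSdata_df <- as.data.frame(PLSdata)
class(PLSdata_df)
```

If this assumption holds, passing the converted data frame as the data argument of estimate_pls and re-running predict_pls would be the next thing to try. The "invalid connection" message is not addressed by this sketch.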

Related

External Cluster Validation - Categorical Data R

I've recently been trying to evaluate the output of k-modes (a cluster label) against a so-called true cluster label (the column named 'class' below).
In other words, I've been trying to externally validate the clustering output. However, when I tried the external validation measures from the 'fpc' package, I was unsuccessful (the error is posted below the script).
I've attached my code for the mushroom dataset. I would appreciate it if anyone could show me how to successfully run these external validation measures on categorical data.
Any help is appreciated.
# LIBRARIES
install.packages('klaR')
install.packages('fpc')
library(klaR)
library(fpc)
#MUSHROOM DATA
mushrooms <- read.csv(file = "https://raw.githubusercontent.com/miachen410/Mushrooms/master/mushrooms.csv", header = FALSE)
names(mushrooms) <- c("edibility", "cap-shape", "cap-surface", "cap-color",
"bruises", "odor", "gill-attachment", "gill-spacing",
"gill-size", "gill-color", "stalk-shape", "stalk-root",
"stalk-surface-above-ring", "stalk-surface-below-ring",
"stalk-color-above-ring", "stalk-color-below-ring", "veil-type",
"veil-color", "ring-number", "ring-type", "spore-print-color",
"population", "habitat")
names(mushrooms)[names(mushrooms)=="edibility"] <- "class"
indexes <- apply(mushrooms, 2, function(x) any(is.na(x) | is.infinite(x)))
colnames(mushrooms)[indexes]
table(mushrooms$class)
str(mushrooms)
#REMOVING CLASS VARIABLE
mushroom.df <- subset(mushrooms, select = -c(class))
#KMODES ANALYSIS
result.kmode <- kmodes(mushroom.df, 2, iter.max = 50, weighted = FALSE)
#EXTERNAL VALIDATION ATTEMPT
mushrooms$class <- as.factor(mushrooms$class)
class <- as.numeric(mushrooms$class)
clust_stats <- cluster.stats(d = dist(mushroom.df),
class, result.kmode$cluster)
#ERROR TERM
Error in silhouette.default(clustering, dmatrix = dmat) :
NA/NaN/Inf in foreign function call (arg 1)
In addition: Warning message:
In dist(mushroom.df) : NAs introduced by coercion
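The warning above indicates that dist() tried to coerce the categorical columns to numbers and produced NAs. A minimal base R sketch of two ways around that, under stated assumptions: the toy data, truth and cluster vectors are hypothetical stand-ins, and simple_matching_dist is an illustrative helper, not part of klaR or fpc.

```r
# Hypothetical toy data standing in for mushroom.df (all columns categorical).
toy <- data.frame(
  cap  = c("x", "x", "b", "b"),
  odor = c("n", "n", "f", "f"),
  ring = c("p", "e", "p", "e")
)
truth   <- c("e", "e", "p", "p")  # stands in for mushrooms$class
cluster <- c(1, 1, 2, 2)          # stands in for result.kmode$cluster

# Simple matching distance: share of attributes on which two rows differ.
simple_matching_dist <- function(df) {
  m <- as.matrix(df)
  n <- nrow(m)
  d <- matrix(0, n, n)
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      d[i, j] <- mean(m[i, ] != m[j, ])
    }
  }
  as.dist(d)
}
d <- simple_matching_dist(toy)

# External validation that needs no distance at all:
# cross-tabulate the two labelings and compute cluster purity.
tab <- table(truth, cluster)
purity <- sum(apply(tab, 2, max)) / sum(tab)
```

A dist object like d could be passed as the d argument of cluster.stats in place of dist(mushroom.df). The double loop is only practical for small data; for the full mushroom table a packaged dissimilarity for categorical data, such as daisy from the cluster package with metric = "gower" on factor columns, is likely the better choice.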

Debug error in frame$yval2[where, 1L + nclass + 1L:nclass, drop = FALSE]: subscript out of bounds

I'm using rpart library to build a regression tree, with the following code:
skillcraft <- read.csv("https://archive.ics.uci.edu/ml/machine-learning-databases/00272/SkillCraft1_Dataset.csv", header = T, sep =",")
skillcraft$LeagueIndex <- factor(skillcraft$LeagueIndex)
skillcraft <- skillcraft[-1]
skillcraft$Age <- as.numeric(levels(skillcraft$Age))[skillcraft$Age]
skillcraft$TotalHours <- as.numeric(
levels(skillcraft$TotalHours))[skillcraft$TotalHours]
skillcraft$HoursPerWeek <- as.numeric(
levels(skillcraft$HoursPerWeek))[skillcraft$HoursPerWeek]
skillcraft <- skillcraft[complete.cases(skillcraft),]
library(caret)
set.seed(133)
skillcraft_sampling_vector <- createDataPartition(
skillcraft$LeagueIndex, p = 0.8, list = F)
skillcraft_train <- skillcraft[skillcraft_sampling_vector,]
skillcraft_test <- skillcraft[-skillcraft_sampling_vector,]
library(rpart)
regtree <- rpart(LeagueIndex ~., data = skillcraft_train)
regtree_predictions <- predict(regtree, skillcraft_test)
The last line of this code is throwing the error:
Error in frame$yval2[where, 1L + nclass + 1L:nclass, drop = FALSE] :
subscript out of bounds
The message isn't very clear. I've checked that both data frames (train and test) have the same structure, and now I'm having trouble finding a way to debug this code.
Can anyone help?
Thanks in advance!
My best guess is that the problem lies in the LeagueIndex factor. This variable was provided as ordinal data (from Bronze to Professional) and converted to a character factor "1", "2", "3", etc. up to "8".
It looks like in addition to your error with rpart, you get a warning when partitioning the data based on this factor:
In createDataPartition(skillcraft$LeagueIndex, p = 0.8, list = F) :
Some classes have no records ( 8 ) and these will be ignored
Apparently there are no records with a LeagueIndex of 8. This happens after you keep only the complete cases here:
skillcraft <- skillcraft[complete.cases(skillcraft),]
All of the LeagueIndex = 8 cases are removed because they have missing data for Age, HoursPerWeek, and TotalHours (the "?" entries are coerced to NA when converted via as.numeric).
skillcraft[which(skillcraft$LeagueIndex == 8), c("Age", "HoursPerWeek", "TotalHours")]
Age HoursPerWeek TotalHours
3341 ? ? ?
3342 ? ? ?
3343 ? ? ?
...
Assuming you still want a factor, I believe this will work if you drop the unused factor level with:
skillcraft$LeagueIndex <- droplevels(skillcraft$LeagueIndex)
before partitioning the data. (You could do this on the training set only in this example, but you would want the same factor levels in your test and train sets.)
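The effect of the unused level can be seen in a small base R sketch. The vectors below are hypothetical stand-ins for skillcraft$LeagueIndex and one of the columns with missing values:

```r
# Toy stand-in for skillcraft$LeagueIndex, with level "8" present.
league <- factor(c("1", "2", "8", "2", "1", "8"))
hours  <- c(10, 20, NA, 30, 40, NA)  # the "8" rows have missing values

# Keeping only complete cases removes every "8" row...
keep <- complete.cases(hours)
league_kept <- league[keep]

# ...but the level itself is still attached to the factor.
levels(league_kept)
table(league_kept)   # shows a zero count for "8"

# droplevels() removes levels that no longer occur in the data.
league_clean <- droplevels(league_kept)
levels(league_clean)
```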

Error running neural net

library(nnet)
set.seed(9850)
train1<- sample(1:155,110)
test1 <- setdiff(1:110,train1)
ideal <- class.ind(hepatitis$class)
hepatitisANN = nnet(hepatitis[train1,-20], ideal[train1,], size=10, softmax=TRUE)
j <- predict(hepatitisANN, hepatitis[test1,-20], type="class")
hepatitis[test1,]$class
table(predict(hepatitisANN, hepatitis[test1,-20], type="class"),hepatitis[test1,]$class)
confusionMatrix(hepatitis[test1,]$class, j)
Error:
Error in nnet.default(hepatitis[train1, -20], ideal[train1, ], size = 10, :
NA/NaN/Inf in foreign function call (arg 2)
In addition: Warning message:
In nnet.default(hepatitis[train1, -20], ideal[train1, ], size = 10, :
NAs introduced by coercion
The hepatitis variable holds the hepatitis dataset available from the UCI repository.
You get this error message because there are character values in your data.
Try reading the hepatitis dataset with na.strings = "?". The "?" marker for missing values is documented in the dataset description on the UCI page.
headers <- c("Class","AGE","SEX","STEROID","ANTIVIRALS","FATIGUE","MALAISE","ANOREXIA","LIVER BIG","LIVER FIRM","SPLEEN PALPABLE","SPIDERS","ASCITES","VARICES","BILIRUBIN","ALK PHOSPHATE","SGOT","ALBUMIN","PROTIME","HISTOLOGY")
hepatitis <- read.csv("https://archive.ics.uci.edu/ml/machine-learning-databases/hepatitis/hepatitis.data", header = FALSE, na.strings = "?")
names(hepatitis) <- headers
library(nnet)
set.seed(9850)
train1<- sample(1:155,110)
test1 <- setdiff(1:155, train1)  # rows not sampled for training (sample() above draws from 1:155)
ideal <- class.ind(hepatitis$Class)
# will give error due to missing values
# 1st column of hepatitis dataset is the class variable
hepatitisANN <- nnet(hepatitis[train1,-1], ideal[train1,], size=10, softmax=TRUE)
This code will not give your error, but it will give an error about missing values. You will need to address those before you can continue.
Also be aware that the class variable is the first variable in the dataset when it comes straight from the UCI data repository.
Edit based on comments:
The na.action only works if you use the formula notation of nnet.
So in your case:
hepatitisANN <- nnet(class.ind(Class)~., hepatitis[train1,], size=10, softmax=TRUE, na.action = na.omit)
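The effect of na.strings can be checked without downloading anything. The following base R sketch uses a few made-up rows (not taken from the real hepatitis file) and shows that a "?" entry turns a whole column into non-numeric data unless it is declared as a missing-value marker:

```r
# Made-up rows in the same style as the UCI file: "?" marks a missing value.
raw <- "2,30,?\n1,50,1.2\n2,?,0.9"

# Without na.strings, columns containing "?" are not read as numbers.
bad <- read.csv(text = raw, header = FALSE)
sapply(bad, class)

# With na.strings = "?", the same columns are numeric with NA entries.
good <- read.csv(text = raw, header = FALSE, na.strings = "?")
sapply(good, class)

# One simple way to handle the missing values: keep complete rows only.
complete <- na.omit(good)
nrow(complete)
```

Dropping rows is only one option; whether it is acceptable depends on how many rows of the real dataset contain missing values.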

Error: object not found - cor.ci

I'm trying to use cor.ci to obtain polychoric correlations with significance tests, but it keeps giving me an error message. Here is the code:
install.packages("Hmisc")
library(Hmisc)
mydata <- spss.get("S-IAT for R.sav", use.value.labels=TRUE)
install.packages('psych')
library(psych)
poly.example <- cor.ci(mydata(nvar = 10,n = 100)$items,n.iter = 10,poly = TRUE)
poly.example
print(corr.test(poly.example$rho), short=FALSE)
Here is the error message it gives:
> library(psych)
> poly.example <- cor.ci(mydata(nvar = 10,n = 100)$items,n.iter = 10,poly = TRUE)
Error in cor.ci(mydata(nvar = 10, n = 100)$items, n.iter = 10, poly = TRUE) :
could not find function "mydata"
> poly.example
Error: object 'poly.example' not found
> print(corr.test(poly.example$rho), short=FALSE)
Error in is.data.frame(x) : object 'poly.example' not found
How can I make it recognize mydata and/or select certain variables from this dataset for the analysis? I got the above code from here:
Polychoric correlation matrix with significance in R
Thanks!
You have several problems.
1) As previously commented upon, you are treating mydata as a function, but you need to treat it as a data.frame. Thus the call should be
poly.example <- cor.ci(mydata,n.iter = 10,poly = TRUE)
If you are trying to use just the first 100 cases and the first 10 variables, put the rows first and the columns second:
poly.example <- cor.ci(mydata[1:100, 1:10], n.iter = 10, poly = TRUE)
2) Then, you do not want to run corr.test on the resulting correlation matrix. corr.test should be run on the data.
print(corr.test(mydata[1:100, 1:10]), short = FALSE)
Note that corr.test is testing the Pearson correlation.
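Since a data frame is indexed as [rows, columns], selecting cases and variables can be checked on its own before calling cor.ci. A small base R sketch with a hypothetical stand-in for mydata:

```r
# Hypothetical stand-in for mydata: 200 cases of 15 five-point items.
set.seed(1)
mydata_demo <- as.data.frame(
  matrix(sample(1:5, 200 * 15, replace = TRUE), nrow = 200)
)

# Rows (cases) come first, columns (variables) second.
subset_demo <- mydata_demo[1:100, 1:10]
dim(subset_demo)

# Variables can also be selected by name (V1..V15 are the default names here).
by_name <- mydata_demo[, c("V1", "V3", "V5")]
names(by_name)
```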

how to solve negative subscript error in R?

I am trying to normalize the data frame before prediction but I get this error :
Error in seq_len(nrows)[i] :
only 0's may be mixed with negative subscripts
Called from: top level
Here is my code :
library('caret')
load(file = "some dataset path here")
DummyDataSet = data
attach(DummyDataSet)
foldCount = 10
classifyLabels = DummyDataSet$ClassLabel
folds = createFolds(classifyLabels,k=foldCount)
for (foldIndex in 1:foldCount){
  cat("----- Start Fold -----\n")
  #holding out samples of one fold in each iteration
  testFold = DummyDataSet[folds[[foldIndex]],]
  testLabels = classifyLabels[folds[[foldIndex]]]
  trainFolds = DummyDataSet[-folds[[foldIndex]],]
  trainLabels = classifyLabels[-folds[[foldIndex]]]
  #Zero mean unit variance normalization to ONLY numerical data
  for (k in 1:ncol(trainFolds)){
    if (!is.integer(trainFolds[,k])){
      params = meanStdCalculator(trainFolds[,k])
      trainFolds[,k] = sapply(trainFolds[,k], function(x) (x - params[1])/params[2])
      testFold[,k] = sapply(testFold[,k], function(x) (x - params[1])/params[2])
    }
  }
  meanStdCalculator = function(data){
    Avg = mean(data)
    stdDeviation = sqrt(var(data))
    return(c(Avg,stdDeviation))
  }
  cat("----- Start Fold -----\n")
}
where trainFolds is a fold created by the caret package and its type is data.frame.
I have already read these links:
R Debugging
Subset
Negative Subscripts
but I couldn't find out what is wrong with the indexes.
Can anybody help me?
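The question does not show enough to pin down where the index goes wrong, so the following is only a sketch of the same per-fold normalization written so that each step can be checked. It rests on several assumptions: the toy data frame is hypothetical, the folds are built in base R instead of with createFolds, the helper (here called mean_sd) is defined before the loop, and columns are selected with is.numeric, because !is.integer is also TRUE for factor and character columns such as a class label.

```r
# Helper defined before the loop so it exists when first called.
mean_sd <- function(x) c(mean = mean(x), sd = sd(x))

# Hypothetical toy data standing in for DummyDataSet.
set.seed(42)
toy <- data.frame(
  height = rnorm(20, mean = 170, sd = 10),
  weight = rnorm(20, mean = 70, sd = 8),
  ClassLabel = factor(rep(c("a", "b"), each = 10))
)

# Base R stand-in for caret::createFolds: a list of positive row indices.
fold_count <- 5
folds <- split(sample(seq_len(nrow(toy))),
               rep(seq_len(fold_count), length.out = nrow(toy)))

for (fold_index in seq_len(fold_count)) {
  test_idx <- folds[[fold_index]]
  train <- toy[-test_idx, ]   # all indices negative: allowed
  test  <- toy[test_idx, ]    # all indices positive: allowed
  # Scale numeric columns only, using the training fold's parameters.
  for (k in which(sapply(train, is.numeric))) {
    p <- mean_sd(train[[k]])
    train[[k]] <- (train[[k]] - p[["mean"]]) / p[["sd"]]
    test[[k]]  <- (test[[k]]  - p[["mean"]]) / p[["sd"]]
  }
}

# Mixing positive and negative indices in one vector is what R rejects.
mixed <- tryCatch((1:5)[c(-1, 2)], error = function(e) "error")
```

Printing folds[[foldIndex]] right before the failing line in the original loop, and checking that it contains only positive integers, is one way to see whether the index vector itself is the problem.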
