Error in plsm... manifest variables must be contained in data - r

I am trying to make a PLS-SEM model and I am using the plsm() function in R from the semPLS package. However, at first I got an error saying:
The latent variables are not allowed to coincide with names of observed variables.
I understood it, but after going through my input and even in my measurement model matrix adding single-factor constructs (directly measured variables) I now get the following:
mod <- plsm(data = survey, strucmod = smin, measuremod = mmin)
Error in plsm(data = survey, strucmod = smin, measuremod = mmin) :
The manifest variables must be contained in the data.
I am at a loss as to how I should proceed. It seems that whenever I "fix" one problem, it directly causes another. Does anyone have any examples aside from the standard mobi example from the package where I could see how it's done when I have both latent and directly measured variables?
Found the code for the function, but now I'm even more confused.
https://github.com/cran/semPLS/blob/master/R/plsm.R
Could anyone explain in a simple manner how I am supposed to name my df columns, and the measurement model to avoid this problem?

don't know if you ever solved this, but i just had a similar issue, and seemed to be the only other person. i ended up getting this solved through some trial and error.
I created three tables:
structmodel: SM - column names: Source|Target
measurement model: MM - Column names: Source|Target
Data: Column names - Measurement headers
I converted the sm and mm tables to a matrix
datamatrix_SM = as.matrix(SM)
datamatrix_MM = as.matrix(MM)

Related

How do I use prodlim function with a non-binary variable in formula?

I am trying to (eventually) plot data by groups, using the prodlim function.
I'm adjusting and adapting code that someone else (not available for questions) has written, and I'm not very familiar with the prodlim library/function. There are definitely other ways to do what I'd like to, but I'm trying to keep it consistent with what the previous person did.
I have code that works, when dividing the data into 2 groups, but when I try to adjust for a 4 group situation, I get an error.
Of note, the data is coming over from SAS using StatTransfer, which has been working fine.
I am new to coding, but I have compared the dataframes I'm trying to work with. The second is just a subset of the first (where the code does work), with all the same variables, and both of the variables I'm trying to group by are integer values.
Hist(medpop$dz_time, medpop$dz_status) works just fine, so the problem must be with the prodlim function, and I haven't understood much of what I've looked up about it, sadly :/ But it the documentation seems to indicate it supports continuous or categorical variables, and doesn't seem limited to binary either. None of the options seem applicable as I understand them.
this works:
M <- prodlim(Hist(dz_time, dz_status)~med, data=pop)
where med is a binary value =1 when a member of this population is taking it, and dz is a disease that some portion develop.
this does not:
(either of these get the error as below)
N <- prodlim(Hist(dz_time, dz_status)~strength, data=medpop)
N <- prodlim(Hist(dz_time, dz_status)~strength, data=pop, subset=pop$med==1)
medpop = the subset of the original population taking the med,
strength = categorical variable ("1","2","3","4")
For the line that does work, the next step is just plot(M), giving a plot with two lines, med==0 and med==1 (showing cumulative incidence of dz_status by dz_time).
For the other line, I get an error saying
Error in KernSmooth::dpik(cumtabx/N, kernel = "box") :
scale estimate is zero for input data
I don't know what that means or how to fix it.. :/

Anova Test- Separate Groups with Two Factor Comparison

Good morning,
I am trying to run some ANOVA tests on my dataset (using R) and I keep getting errors. I am trying to compare the average percentage of correct responses, as a factor of what "group" the subjects were in and what session/day it was. However, I have two separate conditions that I need to analyze separately.
So essentially, I need to compare PctCorrect in Condition 1, between groups and sessions and then do the same thing for condition 2.
I attempted using this code:
aov(ext$Pct.Correct[ext$Condition=="NC-EXT"]~ext$Group*ext$Session, data=ext)
and I got the following error:
Error in model.frame.default(formula = ext$Pct.Correct[ext$Condition
== : variable lengths differ (found for 'ext$Group')
I ran this code to make sure that all of my values were even:
mytable <- table(ext$Session, ext$Group, ext$Condition)
ftable(mytable)
And they were all the same value (which was to be expected), so I am not sure what's wrong.
I am very new to R so it's entirely possible that I am doing this completely wrong. Any help would be greatly appreciated.
You are filtering the left side of the equation and not filtering the right side, thus the "variable length error".
You could try filter your dataframe in the data= option like this:
aov(Pct.Correct ~ Group* Session, data=ext[ext$Condition=="NC-EXT",])

PLS-DA in mdatools package in R

I have been trying to use the mdatools package to run a pls-da using the plsda() function. I have data with 9000 variables and around 30 observations. Each of the 30 rows are patients; the first column contains the clinical status for each patient (disease or control) and the remaining 8999 columns contain numerical data on the patients. I used the following code to run the plsda:
plsda(data[,2:9000], data[,1], ncomp = 8999, coeffs.ci = 'jk')
When the code finally compiles, it returns an error, saying"Error in selectCompNum.pls(model,ncomp): wrong number of selected components!"
I chose ncomp =8999 as the total number of numbers from 2 to 9000...and the strange thing is, this worked well with a low number of components. For example, when I tried
plsda(data[,2:10], data[,1], ncomp = 9, coeffs.ci = 'jk')
No error message is returned.
Perhaps I am misunderstanding how to select the right number of components? I would greatly appreciate any help. Thank you very much in advance!
I am developer of the mdatools package and just came across your question accidentally. Number of components in PLS/PLS-DA is a number of latent variables. Every latent variable is a linear combination of original variables. Normally, you need much less components than the original variables, depending on the type of data number of components can be from 1-2 to 10-20. I recommend you to look at PLS part of the tutorial and ask me directly (e.g. by email) if you still have any questions or issues with the package.

h2o Actual column must contain binary labels

I keep on getting the following error in version:3.8.0.3 when trying to predict on a frame either in R or on the website.
I get this error even if I try to run a subset of the training set.
Error evaluating cell
Error calling POST
/3/Predictions/models/DeepLearning_model_R_14596238744_1/frames/t2
with opts {"predictions_frame":"prediction-b0eb96...
ERROR MESSAGE: Actual column must contain binary class labels, but
found cardinality 1!
Even getting this error when I use a subset of the data frame I used to train the model:
t2 <- training_set[1:5,]
I had this happen to me while including the (true/oracle) classification label to the feature data frame used in the predict function.
Anecdotally, this specific data.frame was entirely composed of observations from one class.
Two suggestions :
Try to remove the label/target column from your frame
or ensure that all the classes are represented

R: partimat function doesn't recognize my classes

I am a relatively novice r user and am attempting to use the partimat() function within the klaR package to plot decision boundaries for a linear discriminant analysis but I keep encountering the same error. I have tried inputing the arguments multiple different ways according to the manual, but keep getting the following error:
Error in partimat.default(x, grouping, ...) :
at least two classes required
Here is an example of the input I've given:
partimat(sources1[,c(3:19)],grouping=sources1[,2],method="lda",prec=100)
where my data table is loaded in under the name "sources1" with columns 3 through 19 containing the explanatory variables and column 2 containing the classes. I have also tried doing it by entering the formula like so:
partimat(sources1$group~sources1$tio2+sources1$v+sources1$cr+sources1$co+sources1$ni+sources1$rb+sources1$sr+sources1$y+sources1$zr+sources1$nb+sources1$la+sources1$gd+sources1$yb+sources1$hf+sources1$ta+sources1$th+sources1$u,data=sources1)
with these being the column heading.
I have successfully run an LDA on this same data set without issue so I'm not quite sure what is wrong.
From the source code of the partimat.default function getAnywhere(partimat.default) it states
if (nlevels(grouping) < 2)
stop("at least two classes required")
Therefore maybe you haven't defined your grouping column as a factor variable. If you try summary(sources1[,2]) what do you get? If it's not a factor, try
sources1[,2] <- as.factor(sources1[,2])
Or in method 2 try removing the "sources1$"on each of your variable names in the formula as you specify the data frame in which to look for these variable names in the data argument. I think you are effectively specifying the dataframe twice and it might be looking, for instance, for
"sources1$sources1$groups"
Rather than
"sources1$groups"
Without further error messages or a reproducible example (i.e. include some data in your post) it's hard to say really.
HTH

Resources