R: partimat function doesn't recognize my classes - r

I am a relatively novice r user and am attempting to use the partimat() function within the klaR package to plot decision boundaries for a linear discriminant analysis but I keep encountering the same error. I have tried inputing the arguments multiple different ways according to the manual, but keep getting the following error:
Error in partimat.default(x, grouping, ...) :
at least two classes required
Here is an example of the input I've given:
partimat(sources1[,c(3:19)],grouping=sources1[,2],method="lda",prec=100)
where my data table is loaded in under the name "sources1" with columns 3 through 19 containing the explanatory variables and column 2 containing the classes. I have also tried doing it by entering the formula like so:
partimat(sources1$group~sources1$tio2+sources1$v+sources1$cr+sources1$co+sources1$ni+sources1$rb+sources1$sr+sources1$y+sources1$zr+sources1$nb+sources1$la+sources1$gd+sources1$yb+sources1$hf+sources1$ta+sources1$th+sources1$u,data=sources1)
with these being the column heading.
I have successfully run an LDA on this same data set without issue so I'm not quite sure what is wrong.

From the source code of the partimat.default function getAnywhere(partimat.default) it states
if (nlevels(grouping) < 2)
stop("at least two classes required")
Therefore maybe you haven't defined your grouping column as a factor variable. If you try summary(sources1[,2]) what do you get? If it's not a factor, try
sources1[,2] <- as.factor(sources1[,2])
Or in method 2 try removing the "sources1$"on each of your variable names in the formula as you specify the data frame in which to look for these variable names in the data argument. I think you are effectively specifying the dataframe twice and it might be looking, for instance, for
"sources1$sources1$groups"
Rather than
"sources1$groups"
Without further error messages or a reproducible example (i.e. include some data in your post) it's hard to say really.
HTH

Related

Mann Whitney U test in R

I want to perform a Mann Whitney U test on a small data set of non-parametric data in R, can anyone help me find the right code?
I have been trying to use the following code, to do the test straight from my data set, but I keep getting the error message:
Error: Column.Heading.1 not found.
wilcox.test(Column.Heading.1, Column.Heading.2, data=DATA)
I'm new to stats so not sure if I'm missing something here.
Thanks in advance!
You might be assuming that the wilcox.test() will be able to find Column.Heading.1 and Column.Heading.2 inside DATA.
Unfortunately, this does not happen. The argument data only plays a role if you are giving a formula to the first element, i.e. Column.Heading.1~Column.Heading.2
If you want to use the x,y configuration that you are using you have to write the column in full, as if you trying to see the values on the console. For example.
wilcox.test(DATA$Column.Heading.1, DATA$Column.Heading.2)
Note that the formula and x,y configuration have different meanings. the formula assumes that Column.Heading.2 is a factor grouping the numbers (like "group1" or "group2"). the x, y configuration expects that both columns are filled with numbers.
If you want to specify the data argument to wilcox.test then you need to use the formula interface, which means you will need a dataset in long format with a single column for the response and other for the group.
Otherwise to compare two columns of the same data frame you will need to refer to their locations explicitly. That is if we make a data frame:
dat <- data.frame(x=c(1.1,2.2,3.3),y=c(0.9,2.1,2.8))
Then we can run a Wilcoxon test either using:
wilcox.test(dat$x, dat$y)
or
with(dat , wilcox.test(x,y))

Error in plsm... manifest variables must be contained in data

I am trying to make a PLS-SEM model and I am using the plsm() function in R from the semPLS package. However, at first I got an error saying:
The latent variables are not allowed to coincide with names of observed variables.
I understood it, but after going through my input and even in my measurement model matrix adding single-factor constructs (directly measured variables) I now get the following:
mod <- plsm(data = survey, strucmod = smin, measuremod = mmin)
Error in plsm(data = survey, strucmod = smin, measuremod = mmin) :
The manifest variables must be contained in the data.
I am at a loss as to how I should proceed. It seems that whenever I "fix" one problem, it directly causes another. Does anyone have any examples aside from the standard mobi example from the package where I could see how it's done when I have both latent and directly measured variables?
Found the code for the function, but now I'm even more confused.
https://github.com/cran/semPLS/blob/master/R/plsm.R
Could anyone explain in a simple manner how I am supposed to name my df columns, and the measurement model to avoid this problem?
don't know if you ever solved this, but i just had a similar issue, and seemed to be the only other person. i ended up getting this solved through some trial and error.
I created three tables:
structmodel: SM - column names: Source|Target
measurement model: MM - Column names: Source|Target
Data: Column names - Measurement headers
I converted the sm and mm tables to a matrix
datamatrix_SM = as.matrix(SM)
datamatrix_MM = as.matrix(MM)

h2o Actual column must contain binary labels

I keep on getting the following error in version:3.8.0.3 when trying to predict on a frame either in R or on the website.
I get this error even if I try to run a subset of the training set.
Error evaluating cell
Error calling POST
/3/Predictions/models/DeepLearning_model_R_14596238744_1/frames/t2
with opts {"predictions_frame":"prediction-b0eb96...
ERROR MESSAGE: Actual column must contain binary class labels, but
found cardinality 1!
Even getting this error when I use a subset of the data frame I used to train the model:
t2 <- training_set[1:5,]
I had this happen to me while including the (true/oracle) classification label to the feature data frame used in the predict function.
Anecdotally, this specific data.frame was entirely composed of observations from one class.
Two suggestions :
Try to remove the label/target column from your frame
or ensure that all the classes are represented

Atomic vectors in R and applying function to them

So I have a data set that I'm using from UC Irvine's website("Wine Quality" dataset) and I want to take a look at a plot of the residuals of the data set. The reason I'm doing this is to look to see if there is a an increase in variance so I could run a log based regression. To look at the residuals I apply this code:
residuals(white.wine)
white.wine is how I named my dataframe. However I get this error thrown at me, "NULL". If I want to look at the residuals of a specific predictor variable like Fixed Acidity I get this error:
Error: $ operator is invalid for atomic vectors.
Any way around this? Thanks for any help!
#Hugh was right in saying that "residuals" must be used against a model, but I think your question was also asking about how to apply something over a data frame. In case you just want the variance of each predictor variable, you might want something like:
apply(white.wine, 2, var)
As the ?apply documentation says, you need to provide the data, the margin, and the function. The margin refers to applying over rows or columns, with 1 signaling to apply a function over the rows, and 2 indicating that the function should be applied over columns. I'm assuming you have predictor variables in columns, so I used a 2 in the code above.

convert period in stata to NA in r

I have a dataset in stata and I want to take it to R, but there are some missing values in state and they are represented using a period. I want to get the data into R which I do by loading the foreign package and then I use read.table() function. How do I convert the periods in state which are genuinely missing to NA in R?
If i understand you correctly, you first load the Foreign-Package for loading a .dta-File, correct?
library("foreign")
Then you would read in your Data by using:
myRFile <- read.dta(file="someStataFile.dta")
You are asking for a way that the missing operator from Stata, often denoted by a dot ., is converted to the missing operator in R, NA, also correct?
One thing to know here is, that Stata handles missing values "behind the scenes" in multiple ways. There are actually about 27 different missing operators in Stata, which are usually not distinguishable for the user. You do not need to know them for you problem though, because read.dta() handles them itself.
To learn how you can tackle a simple problem like this yourself in the future, you always need to check the help file for your function first:
help(read.dta)
Here you see, that the function handles the extensive missing-data types from Stata automatically and correctly.
If you want to have information about which type of missing operator was recognized, you can set the argument missing.type=TRUE, by using:
myRFile <- read.dta(file="someStataFile.dta", missing.type=TRUE)
Then, according to the help file, the following will happen:
If missing.type is TRUE a separate list is created with the same
variable names as the loaded data. For string variables the list value
is NULL. For other variables the value is NA where the observation is
not missing and 0–26 when the observation is missing. This is attached
as the "missing" attribute of the returned value.

Resources