h2o Actual column must contain binary labels - r

I keep getting the following error in version 3.8.0.3 when trying to predict on a frame, either in R or on the website.
I get this error even if I try to run a subset of the training set.
Error evaluating cell
Error calling POST
/3/Predictions/models/DeepLearning_model_R_14596238744_1/frames/t2
with opts {"predictions_frame":"prediction-b0eb96...
ERROR MESSAGE: Actual column must contain binary class labels, but
found cardinality 1!
I even get this error when I use a subset of the data frame I used to train the model:
t2 <- training_set[1:5,]

I had this happen to me when I included the (true/oracle) classification label in the feature data frame passed to the predict function.
In my case, that data.frame consisted entirely of observations from one class.
Two suggestions:
Try removing the label/target column from your frame,
or ensure that all the classes are represented.
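Both suggestions can be sketched in base R. This is only an illustration: the data frame contents and the column name `label` are assumptions, not taken from the question, and the h2o call at the end is left as a comment because it needs a running cluster and a trained `model`.

```r
# Hypothetical training data: the column name "label" is an assumption.
training_set <- data.frame(
  x1    = c(0.1, 0.4, 0.3, 0.9, 0.7, 0.2, 0.8, 0.6),
  x2    = c(1.2, 0.7, 1.1, 0.3, 0.5, 1.4, 0.2, 0.6),
  label = factor(c("no", "no", "no", "no", "no", "yes", "yes", "yes"))
)

# The first five rows happen to contain only one class.
t2 <- training_set[1:5, ]
n_classes <- length(unique(t2$label))
print(n_classes)

# Suggestion 1: drop the label column before predicting.
t2_features <- t2[, setdiff(names(t2), "label"), drop = FALSE]

# Suggestion 2: take a subset in which every class is represented
# (here: the first two row indices of each class).
idx <- unlist(lapply(split(seq_len(nrow(training_set)), training_set$label),
                     head, n = 2))
t2_balanced <- training_set[idx, ]
print(table(t2_balanced$label))

# With h2o loaded and a trained `model`, prediction would then look like:
# pred <- h2o.predict(model, as.h2o(t2_features))
```

Checking the number of distinct labels in the frame before predicting is a quick way to see whether the "cardinality 1" situation applies.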

Related

I can't figure out why R thinks that my dataset contains duplicate element pairs

When I run my fixed effects regression as shown below I get an error saying that I have duplicates within my dataset. This is the error: "Error in pdim.default(index[1], index[[2]]) : duplicate couples (id-time)"!
fe_distance <- plm(total_trip_distance ~ apparentTemperature+summary+AREA+POPULATION+bar+nightclub+hospital+social_facility, data = regression1, model= "within", index=c("ZIPCODE", "time"))
To form the dataset I grouped by ZIPCODE and time, so I don't understand how I could possibly get duplicates. I thought it could be because of the type the variable was stored as, but changing that doesn't seem to solve my error.
Any recommendations would be very appreciated!
You can check for any duplicate pairings between zipcode and time via sum(duplicated(regression1[,c('ZIPCODE','time')])). If this is greater than 0, then you do have duplicates, and should check to see if there is anything unexpected in your original data (or any errors that could have resulted in the grouping step).
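As a small sketch of that check, the following uses a made-up `regression1` (the values are hypothetical; only the column names come from the question) and lists every row involved in a duplicated ZIPCODE/time pair, not just the count.

```r
# Hypothetical panel data with one repeated (ZIPCODE, time) pair.
regression1 <- data.frame(
  ZIPCODE = c("10001", "10001", "10002", "10002", "10001"),
  time    = c(1, 2, 1, 2, 2),
  total_trip_distance = c(3.1, 2.8, 4.0, 3.7, 2.9)
)

key <- regression1[, c("ZIPCODE", "time")]

# Number of rows that repeat an earlier (ZIPCODE, time) pair.
n_dup <- sum(duplicated(key))
print(n_dup)

# Show every row involved in a duplicate, not only the later copies.
dup_rows <- regression1[duplicated(key) | duplicated(key, fromLast = TRUE), ]
print(dup_rows)
```

Looking at the offending rows side by side usually shows whether the grouping step missed a variable or whether the key columns differ only in formatting.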

Error in plsm... manifest variables must be contained in data

I am trying to make a PLS-SEM model and I am using the plsm() function in R from the semPLS package. However, at first I got an error saying:
The latent variables are not allowed to coincide with names of observed variables.
I understood it, but after going through my input and even in my measurement model matrix adding single-factor constructs (directly measured variables) I now get the following:
mod <- plsm(data = survey, strucmod = smin, measuremod = mmin)
Error in plsm(data = survey, strucmod = smin, measuremod = mmin) :
The manifest variables must be contained in the data.
I am at a loss as to how I should proceed. It seems that whenever I "fix" one problem, it directly causes another. Does anyone have any examples aside from the standard mobi example from the package where I could see how it's done when I have both latent and directly measured variables?
Found the code for the function, but now I'm even more confused.
https://github.com/cran/semPLS/blob/master/R/plsm.R
Could anyone explain in a simple manner how I am supposed to name my df columns, and the measurement model to avoid this problem?
I don't know if you ever solved this, but I just had a similar issue and seemed to be the only other person with it. I ended up solving it through some trial and error.
I created three tables:
Structural model: SM - column names: Source|Target
Measurement model: MM - column names: Source|Target
Data: column names - measurement headers
I converted the SM and MM tables to matrices:
datamatrix_SM = as.matrix(SM)
datamatrix_MM = as.matrix(MM)
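Before calling plsm(), the naming can be checked by hand in base R. The sketch below assumes that the two error messages correspond to checks of this kind: latent names (everything in the structural model) must not appear among the data's column names, and every other name in the measurement model must. That reading is based on the error text and has not been verified against the package source; all variable and column names here are hypothetical.

```r
# Hypothetical structural model: paths between latent variables.
SM <- data.frame(Source = c("Quality", "Quality"),
                 Target = c("Satisfaction", "Loyalty"))

# Hypothetical measurement model: each latent variable points at its indicators.
MM <- data.frame(Source = c("Quality", "Quality", "Satisfaction", "Loyalty"),
                 Target = c("q1", "q2", "sat1", "loy1"))

# Hypothetical survey data; "loy1" is left out on purpose.
survey <- data.frame(q1 = 1:3, q2 = 3:1, sat1 = c(2, 2, 3))

datamatrix_SM <- as.matrix(SM)
datamatrix_MM <- as.matrix(MM)

# Latent names: everything that appears in the structural model.
latent <- unique(as.vector(datamatrix_SM))

# Manifest names: everything in the measurement model that is not latent.
manifest <- setdiff(as.vector(datamatrix_MM), latent)

# Latent names that clash with data columns (should be empty).
clashing <- intersect(latent, colnames(survey))

# Manifest names with no matching data column (should be empty).
missing_mv <- setdiff(manifest, colnames(survey))

print(clashing)
print(missing_mv)
```

If `missing_mv` is non-empty, the listed names are either misspelled in the measurement model or absent from the data frame's columns.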

R - skmeans with zeros

I'm a total R beginner and am trying to cluster user data using the function skmeans.
I always get the error message:
"Error in if (!all(row_norms(x) > 0)) stop("Zero rows are not allowed.") :
missing value where TRUE/FALSE needed".
There already is a topic about this error message explaining that zeros are not allowed in rows.
However, my blueprint for what I'm trying to do is an example based on a data set which is also full of zeros. Working with this example, the error message does not appear and the function works fine. The error message only occurs when I apply the same procedure to my data set which doesn't seem different from the blueprint's data set.
Here's the function used for the kmeans:
weindaten.clusters <- skmeans(wendaten.tr, 5, method="genetic")
And here's the data set:
For my own data set, I used this function
kunden.cluster<- skmeans(test4, 5, method="genetic")
for this data set:
Could somebody please help me understand what the difference between the two data sets is (vector vs. something else, maybe) and how I can change my data to be able to use skmeans?
You cannot use spherical k-means on this data.
Spherical k-means uses angles for similarity. But the all-zero row cannot be used in angular computations.
Choose a different algorithm, unless you can treat the all-zero row specially (for example, on text this would be an empty document).
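To find the problematic rows, the row norms can be computed by hand in base R. The matrix below is hypothetical. Note also that "missing value where TRUE/FALSE needed" is what R reports when an if() condition evaluates to NA, so it may be worth checking for NA values as well as all-zero rows.

```r
# Hypothetical data: row 2 is all zeros, row 4 contains a missing value.
test4 <- rbind(c(1, 0, 2),
               c(0, 0, 0),
               c(0, 3, 1),
               c(1, NA, 0))

# Euclidean norm of each row (computed by hand here).
norms <- sqrt(rowSums(test4^2))

# Rows with zero length have no direction, so no angle can be computed.
zero_rows <- which(!is.na(norms) & norms == 0)

# Rows whose norm is NA would make an if() condition fail with
# "missing value where TRUE/FALSE needed".
na_rows <- which(is.na(norms))

# Keep only rows that have a usable direction.
usable <- test4[!is.na(norms) & norms > 0, , drop = FALSE]

print(zero_rows)
print(na_rows)
print(nrow(usable))
```

Whether dropping such rows is acceptable depends on what they represent in the data; the filter above only shows where they are.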

Error while using rarecurve() in R

I am using vegan::rarecurve on community data.
lac.com.data<-wisconsin(lac.com.data)
rarecurve(lac.com.data)
Unfortunately, I am getting an error and cannot figure out how to fix it.
Error in seq.default(1, tot[i], by = step) : wrong sign in 'by' argument
I tried
rarecurve(lac.com.data,step=1)
to no avail.
I already generated a tabasco() graph and performed a Wisconsin standardization on the data frame without any problem.
There is no reproducible example. However, your usage is wrong. Function rarecurve needs input data of counts: it samples individuals from each sampling unit (row), and therefore you must have data on individuals. The error is caused by the use of wisconsin(lac.com.data): after that all rowSums(lac.com.data) will be 1, and your data are non-integers. You cannot use rarecurve for wisconsin() transformed data or any other non-integer data. Here the error manifests because the estimated numbers of individuals (rowSums of transformed data which are all 1) are lower than the number of species (>1).
Obviously we need to check the input in rarecurve. We assumed that people would know what kind of input is needed, but we were wrong.
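Until such a check exists, it can be done by hand. The sketch below is base R only: the helper `is_count_data` is hypothetical (not part of vegan), the community matrix is made up, and a plain division by row totals stands in for a standardization that makes every row sum to 1.

```r
# Hypothetical community matrix of counts: rows are sites, columns are species.
counts <- matrix(c(5, 0, 3,
                   2, 7, 1),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("site1", "site2"), c("spA", "spB", "spC")))

# A stand-in for a standardization that makes every row sum to 1.
standardized <- counts / rowSums(counts)

# Hypothetical helper: TRUE only for finite, non-negative whole numbers.
is_count_data <- function(x) {
  x <- as.matrix(x)
  all(is.finite(x)) && all(x >= 0) && all(x == round(x))
}

print(is_count_data(counts))
print(is_count_data(standardized))
print(rowSums(standardized))
```

Running the check on the data right before the rarecurve() call makes it clear whether the untransformed counts or a transformed copy is being passed in.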

R: partimat function doesn't recognize my classes

I am a relatively novice R user and am attempting to use the partimat() function from the klaR package to plot decision boundaries for a linear discriminant analysis, but I keep encountering the same error. I have tried inputting the arguments in several different ways according to the manual, but keep getting the following error:
Error in partimat.default(x, grouping, ...) :
at least two classes required
Here is an example of the input I've given:
partimat(sources1[,c(3:19)],grouping=sources1[,2],method="lda",prec=100)
where my data table is loaded in under the name "sources1" with columns 3 through 19 containing the explanatory variables and column 2 containing the classes. I have also tried doing it by entering the formula like so:
partimat(sources1$group~sources1$tio2+sources1$v+sources1$cr+sources1$co+sources1$ni+sources1$rb+sources1$sr+sources1$y+sources1$zr+sources1$nb+sources1$la+sources1$gd+sources1$yb+sources1$hf+sources1$ta+sources1$th+sources1$u,data=sources1)
with these being the column headings.
I have successfully run an LDA on this same data set without issue so I'm not quite sure what is wrong.
The source code of partimat.default (see getAnywhere(partimat.default)) contains:
if (nlevels(grouping) < 2)
stop("at least two classes required")
Therefore maybe you haven't defined your grouping column as a factor variable. If you try summary(sources1[,2]) what do you get? If it's not a factor, try
sources1[,2] <- as.factor(sources1[,2])
Or, in method 2, try removing the "sources1$" prefix from each of your variable names in the formula, since the data argument already specifies the data frame in which to look for them. I think you are effectively specifying the data frame twice, and it might be looking, for instance, for
"sources1$sources1$groups"
Rather than
"sources1$groups"
Without further error messages or a reproducible example (i.e. include some data in your post) it's hard to say really.
HTH
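The factor check can be illustrated in base R with a made-up stand-in for sources1 (the values are hypothetical, and only one explanatory column is included). A character column reports zero levels, which would trip the nlevels(grouping) < 2 test quoted above.

```r
# Hypothetical stand-in for the question's data; group is stored as character.
sources1 <- data.frame(
  id    = 1:6,
  group = c("A", "A", "B", "B", "C", "C"),
  tio2  = c(0.8, 0.9, 1.4, 1.5, 2.1, 2.0),
  stringsAsFactors = FALSE
)

# Character vectors have no levels, so nlevels() returns 0.
levels_before <- nlevels(sources1[, 2])

# Convert the grouping column to a factor.
sources1[, 2] <- as.factor(sources1[, 2])
levels_after <- nlevels(sources1[, 2])

print(levels_before)
print(levels_after)
print(table(sources1[, 2]))
```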
