I am facing this problem in R, AOV function

This is my code:
> av = aov(r ~ tf);av
r = matrix with numerical data
tf= factor data
This is the error:
Error in model.frame.default(formula = r ~ tf, drop.unused.levels = TRUE) :
variable lengths differ (found for 'tf')
What could be wrong? I am very new to this. I have checked my previous steps and everything seems right. Please let me know if you need any additional information.

The number of rows of the matrix 'r' must equal the length of the vector 'tf'. If it does not, R raises the "variable lengths differ" error. The code below works because 'r' has 10 rows and 'tf' has length 10:
r <- matrix(rnorm(5 * 10), 10, 5)
tf <- factor(sample(letters[1:3], 10, replace = TRUE))
aov(r ~ tf)
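To check this before fitting, one option is to compare nrow(r) with length(tf) directly. The sketch below is a minimal illustration using made-up data; tf_short is a hypothetical, deliberately truncated factor used only to reproduce the error message.

```r
set.seed(42)

# 10 observations (rows) and 5 response columns
r  <- matrix(rnorm(5 * 10), nrow = 10, ncol = 5)
# 10 group labels, one per row of r
tf <- factor(rep(letters[1:3], length.out = 10))

# The check suggested above: rows of the response vs. length of the factor
lengths_match <- nrow(r) == length(tf)

# Hypothetical mismatch: a factor with only 8 entries
tf_short <- tf[1:8]

# Capture the error message instead of stopping the script
err_msg <- tryCatch(
  {
    aov(r ~ tf_short)
    NA_character_
  },
  error = function(e) conditionMessage(e)
)

print(lengths_match)
print(err_msg)
```

If lengths_match is FALSE for your own data, the usual next step is to find where rows were dropped or added in the earlier steps (for example by filtering one object but not the other).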

Related

SMOTE target variable not found in R

Why is it when I run the smote function in R, an error appears saying that my target variable is not found? I am using the smotefamily package to run this smote function.
tv_smote <- SMOTE(tv_smote, Churn, K = 5, dup_size = 0)
Error in table(target) : object 'Churn' not found
Generally, you should include the minimum code and data needed to reproduce the problem. This saves time and gives you more chance of getting an answer. However, try this...
tv_smote <- SMOTE(tv_smote, tv_smote$Churn, K = 5, dup_size = 0)
I can get the same error by doing this:
library(smotefamily)
df <- data.frame(x = 1:8, y = 8:1)
df_smote <- SMOTE(df, y, K = 3, dup_size = 0)
It appears to me that SMOTE doesn't know y is a column name. In the documentation, ?SMOTE, it says that the target argument is "A vector of a target class attribute corresponding to a dataset X." My interpretation of this is that you need to supply a vector, not a name. Changing it to df_smote <- SMOTE(df, df$y, K = 3, dup_size = 0) gets past that part.
I am not familiar with the SMOTE function, but from testing it would appear that SMOTE cannot accept dataframes with any factor type columns. I get Error in get.knnx(data, query, k, algorithm) : Data non-numeric when using as.factor.
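The difference between passing a bare column name and passing a vector can be seen without the smotefamily package. The sketch below uses a hypothetical stand-in function, count_classes, which simply calls table() on its target argument, mirroring the table(target) call in the error message. It is not the SMOTE implementation.

```r
df <- data.frame(x = 1:8, y = rep(c(0, 1), times = 4))

# Hypothetical stand-in: uses `target` as an ordinary value,
# so it cannot look up a column name inside X on its own
count_classes <- function(X, target) {
  table(target)
}

# Bare name: R looks for an object called y in the workspace and fails
bare_name <- tryCatch(
  {
    count_classes(df, y)
    "no error"
  },
  error = function(e) conditionMessage(e)
)

# Vector: the column is extracted first, then passed in
by_vector <- count_classes(df, df$y)

print(bare_name)
print(by_vector)
```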

Error: negative length vectors are not allowed

I have a relatively big dataframe in R called df which is about 2.9 GB in size, with dimensions 3701578 rows and 94 columns. I am trying to run the following command with the package pls to perform a principal component regression (pcr):
set.seed(1)
y_cols = tail(colnames(df),1) # select last column as dependent variable
x_cols = colnames(df)[-c(1, 2, 93, 94)] # PCA applied only to columns from 3 to 92, whose components become the regressors for pcr
formula = as.formula(
paste0("`",y_cols,"`", " ~ ", paste(paste0("`", x_cols, "`"), collapse = " + "))
) # to ease the writing down the formula
model <- pcr(formula=formula, data=df[df$date<19801231,], scale=FALSE, center=FALSE)
I get the following error:
Error in array(0, dim = c(npred, nresp, ncomp)): negative length vectors are not allowed
Traceback:
1. pcr(formula = formula, data = df[df$date < 19801231, ], scale = FALSE,
. center = FALSE)
2. eval(cl, parent.frame())
3. eval(cl, parent.frame())
4. pls::mvr(formula = formula, data = df[df$date < 19801231, ],
. scale = FALSE, center = FALSE, method = "svdpc")
5. fitFunc(X, Y, ncomp, Y.add = Y.add, center = center, ...)
6. array(0, dim = c(npred, nresp, ncomp))
Slicing the dataframe as in the call above gives a smaller dataframe of 751024 rows × 94 columns. At first I thought (based on similar cases I found online) that this could be due to a memory limit, but I have around 1000 GB of RAM available, so that is definitely not the case. Oddly, I have no problem if I run the same command on the entire dataframe df. Creating a new object, e.g. new <- df[df$date < 19801231, ], and then running the code does not help either. I managed to get it running by setting the (relatively few) missing values in new to zero. However, if I keep the missing data, the pcr command runs smoothly on the entire (bigger) df. Does anybody have an idea about this behavior?
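Since zeroing the missing values made the call run, one thing that may be worth checking is how many rows of the subset are still complete once rows with missing values are set aside. This is only a diagnostic sketch under that assumption, not a confirmed explanation of the error. The data frame toy and its column names are made up for illustration.

```r
# Hypothetical toy data standing in for df (column names are invented)
toy <- data.frame(
  date = c(19791231, 19800630, 19810101, 19820101),
  x1   = c(NA, 1.5, 2.0, 3.1),
  x2   = c(0.2, NA, 1.1, 0.7),
  y    = c(1, 2, 3, 4)
)

model_cols <- c("x1", "x2", "y")

# Same kind of date filter as in the question
sub <- toy[toy$date < 19801231, ]

# Rows with no missing value in any model column
n_complete_sub <- sum(complete.cases(sub[, model_cols]))
n_complete_all <- sum(complete.cases(toy[, model_cols]))

print(c(subset = n_complete_sub, full = n_complete_all))
```

In this toy case the subset has no complete rows while the full data has two, which is the kind of difference between the subset and the full data that would be worth ruling out.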

What does 'not enough x observations' imply in R?

I'm trying to run a t-test in RStudio and I keep getting this error:
Error in t.test.default(x = subset(mydata$InfMort, subset = mydata$SubSahCountryvariable == : not enough 'x' observations
Here's the code:
with(mydata,
t.test(x = subset(mydata$InfMort, subset = mydata$SubSahCountryvariable == 1),
y = subset(mydata$InfMort, subset = mydata$ArabCountryvariable == 1),
alternative = "two.sided"))
Does anyone have any idea what's going on? I'm a beginner in R.
This is because the first group has fewer than the minimum of 2 observations. For example:
> t.test(c(1), c(3,4))
Error in t.test.default(c(1), c(3, 4)) : not enough 'x' observations
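A quick way to confirm this is to count the non-missing observations in each group before calling t.test. The sketch below uses made-up data. Only the column names are taken from the question.

```r
# Hypothetical data; column names follow the question
mydata <- data.frame(
  InfMort               = c(55, NA, 30, 28, 35),
  SubSahCountryvariable = c(1, 1, 0, 0, 0),
  ArabCountryvariable   = c(0, 0, 1, 1, 1)
)

x <- mydata$InfMort[mydata$SubSahCountryvariable == 1]
y <- mydata$InfMort[mydata$ArabCountryvariable == 1]

# t.test drops missing values, so count only the non-missing ones
n_x <- sum(!is.na(x))
n_y <- sum(!is.na(y))

if (n_x >= 2 && n_y >= 2) {
  print(t.test(x, y, alternative = "two.sided"))
} else {
  message("Too few observations: n_x = ", n_x, ", n_y = ", n_y)
}
```

Here the first group has two rows but only one non-missing value, so the test cannot be run. A group can look large enough and still fail for this reason.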

R segmented package "variable lengths differ"

I'm having trouble getting started with the segmented package in R.
When running the basic example below I get the error:
Error in model.frame.default(formula = y ~ x + U1.x + psi1.x, data = mfExt, :
variable lengths differ (found for 'x')
I was expecting segmented to return a piecewise linear model with 2 segments. I'm clearly making a mistake in my call, but am unable to work out from the error message and the documentation what my mistake is. Help would be appreciated.
require(segmented)
test.df = data.frame(x = c(1:100),
y = c(c(1:50),seq(from = 52, by = 2, length = 50)))
test.mod = lm(y ~ x,
test.df)
segmented(test.mod,
seg.Z = ~ x,
psi = list(x = 40))
It turns out that I had an object in my workspace called 'x'. After removing this object the call to segmented gave expected results.
I can replicate the error any time I have an object of length 1 called x regardless of whether that object is a list or a vector.
If the object has length greater than 1, the error disappears and segmented behaves as expected.
Weird. Thanks @Pascal for your input.
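The sketch below does not show what segmented does internally. It only illustrates, with plain lm, how a stray workspace object can produce the same error message: when a variable in a formula is not found in the data, R falls back to the surrounding environment. The data frame d is a made-up example.

```r
# Data frame with NO column called x
d <- data.frame(y = 1:10)

# Stray length-1 object left over in the workspace
x <- 1

# y comes from d (length 10), x from the workspace (length 1)
msg <- tryCatch(
  {
    lm(y ~ x, data = d)
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)

# Removing the stray object, as described above
rm(x)
print(exists("x", inherits = FALSE))
```

Calling ls() before fitting is a simple way to spot leftover objects whose names collide with column names.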

KNN in R: 'train and class have different lengths'?

Here is my code:
train_points <- read.table("kaggle_train_points.txt", sep="\t")
train_labels <- read.table("kaggle_train_labels.txt", sep="\t")
test_points <- read.table("kaggle_test_points.txt", sep="\t")
#uses package 'class'
library(class)
knn(train_points, test_points, train_labels, k = 5);
dim(train_points) is 42000 x 784
dim(train_labels) is 42000 x 1
I don't see the issue, but I'm getting the error :
Error in knn(train_points, test_points, train_labels, k = 5) :
'train' and 'class' have different lengths.
What's the problem?
Without access to the data, it's really hard to help. However, I suspect that train_labels should be a vector. So try
cl = train_labels[,1]
knn(train_points, test_points, cl, k = 5)
Also double check:
dim(train_points)
dim(test_points)
length(cl)
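The suspicion can be checked on a built-in dataset. The sketch below uses iris as a stand-in for the Kaggle files (an assumption, since the original data is not available) and shows that a one-column data frame has length 1, which is what knn compares against the number of training rows.

```r
library(class)  # recommended package, normally installed with R

idx_train <- seq(1, 150, by = 2)
idx_test  <- seq(2, 150, by = 2)

train_points <- iris[idx_train, 1:4]
test_points  <- iris[idx_test, 1:4]

# One-column data frame, like the result of read.table on a labels file
train_labels <- data.frame(label = iris$Species[idx_train])

# length() of a data frame counts columns, not rows
print(length(train_labels))

# Passing the data frame reproduces the error
df_msg <- tryCatch(
  {
    knn(train_points, test_points, train_labels, k = 5)
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(df_msg)

# Extracting the column as a vector works
cl <- train_labels[, 1]
pred <- knn(train_points, test_points, cl, k = 5)
print(length(pred))
```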
I had the same issue when applying knn to the Wisconsin breast cancer diagnosis dataset. The issue was that the cl argument needs to be a factor vector. My mistake was to write cl = labels; I thought this was the vector to be predicted, but it was in fact a data frame with one column. The solution was to use the following syntax: knn(train, test, cl = labels$diagnosis, k = 21), where diagnosis is the header of the one-column data frame labels. It worked well.
Hope this helps!
I have recently encountered a very similar issue.
I wanted to use only a single column as a predictor. When selecting a single column in such cases, remember to set the drop argument to FALSE. The knn() function accepts only matrices or data frames as the train and test arguments, not vectors.
knn(train = trainSet[, 2, drop = FALSE], test = testSet[, 2, drop = FALSE], cl = trainSet$Direction, k = 5)
Try converting the data into a data frame using as.data.frame(). I was having the same problem, and afterwards it worked fine:
train_pointsdf <- as.data.frame(train_points)
train_labelsdf <- as.data.frame(train_labels)
test_pointsdf <- as.data.frame(test_points)
Simply set drop = TRUE when extracting cl from the data frame. This drops the extra dimension of a result that has only one column, so you get a vector:
cl = train_labels[,1, drop = TRUE]
knn(train_points, test_points, cl, k = 5)
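The effect of drop when selecting a single column from a data frame can be seen in a few lines. This is a small sketch with made-up labels. Note that for a plain data frame, [, 1] already drops to a vector by default, so drop = TRUE only makes that explicit.

```r
train_labels <- data.frame(V1 = c("a", "b", "a"))

# Default and drop = TRUE: a plain vector, suitable for cl
as_vector <- train_labels[, 1, drop = TRUE]

# drop = FALSE: still a one-column data frame, suitable for train/test
as_frame <- train_labels[, 1, drop = FALSE]

print(is.data.frame(as_vector))  # vector, not a data frame
print(length(as_vector))         # number of labels
print(is.data.frame(as_frame))   # still a data frame
print(length(as_frame))          # number of columns
```

So the two answers above are consistent: use a vector for cl, and keep the data frame shape (drop = FALSE) for a single-column train or test set.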
I had a similar error when reading the data into a tibble (read_csv). When I switched to read.csv, the code worked.
I followed the code as given in the book, but it shows an error due to mismatched lengths (one argument is a data frame, the other is the vector returned by knn). None of the answers here worked exactly, but they gave me the idea that vectors are needed for the comparison.
This throws an error:
gmodels::CrossTable(x = wbcd_test_labels, # actuals
y = wbcd_test_pred, # predicted
prop.chisq = FALSE)
The following works:
gmodels::CrossTable(x = wbcd_test_labels$diagnosis, # actuals
y = wbcd_test_pred, # predicted
prop.chisq = FALSE)
Using $ for x makes it a vector, so the lengths match.
Additionally, when running knn, the cl parameter should also be a vector. Save the labels in a vector, or there will be a length mismatch, or use labelDF$Class_label:
wbcd_test_pred <- knn(train = wbcd_train,
test = wbcd_test,
cl =wbcd_train_labels$diagnosis, #note this
k = 21)
Hope this helps beginners like me.
Uninstall previous versions of R and install a version of R newer than 4.0. It will work.
