What does 'not enough x observations' imply in R? - r

I'm trying to run a t-test in RStudio and I keep getting this error:
Error in t.test.default(x = subset(mydata$InfMort, subset = mydata$SubSahCountryvariable == : not enough 'x' observations
Here's the code:
with(mydata,
t.test(x = subset(mydata$InfMort, subset = mydata$SubSahCountryvariable == 1),
y = subset(mydata$InfMort, subset = mydata$ArabCountryvariable == 1),
alternative = "two.sided"))
Does anyone have any idea what's going on? I'm a beginner in R.

This is because the first group (x) has fewer than the minimum of 2 non-missing observations that the default (Welch) two-sample t-test requires. For example:
> t.test(c(1), c(3,4))
Error in t.test.default(c(1), c(3, 4)) : not enough 'x' observations
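A quick way to confirm this is to count the non-missing observations in each group before calling t.test(). The sketch below uses a small made-up data frame that only borrows the column names from the question; the values are assumptions for illustration, not the asker's data.

```r
# Hypothetical data; only the column names come from the question.
mydata <- data.frame(
  InfMort               = c(55, NA, 20, 25, 30),
  SubSahCountryvariable = c(1, 1, 0, 0, 0),
  ArabCountryvariable   = c(0, 0, 1, 1, 1)
)

x <- mydata$InfMort[mydata$SubSahCountryvariable == 1]
y <- mydata$InfMort[mydata$ArabCountryvariable == 1]

# t.test() drops NAs before counting, so count non-missing values only.
n_x <- sum(!is.na(x))
n_y <- sum(!is.na(y))

# Capture the error message instead of stopping the script.
msg <- tryCatch(t.test(x, y), error = function(e) conditionMessage(e))
```

If n_x comes back as 0 or 1, the subsetting condition or missing values in InfMort are the likely places to look.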

Related

SMOTE target variable not found in R

Why is it when I run the smote function in R, an error appears saying that my target variable is not found? I am using the smotefamily package to run this smote function.
tv_smote <- SMOTE(tv_smote, Churn, K = 5, dup_size = 0)
Error in table(target) : object 'Churn' not found
[Screenshots omitted: code chunk, df data structure, first few rows of df]
Generally, you should include the minimum code and data needed to reproduce the problem. This saves time and gives you a better chance of getting an answer. However, try this:
tv_smote <- SMOTE(tv_smote, tv_smote$Churn, K = 5, dup_size = 0)
I can get the same error by doing this:
library(smotefamily)
df <- data.frame(x = 1:8, y = 8:1)
df_smote <- SMOTE(df, y, K = 3, dup_size = 0)
It appears to me that SMOTE doesn't know y is a column name. In the documentation, ?SMOTE, it says that the target argument is "A vector of a target class attribute corresponding to a dataset X." My interpretation of this is that you need to supply a vector, not a name. Changing it to df_smote <- SMOTE(df, df$y, K = 3, dup_size = 0) gets past that part.
I am not familiar with the SMOTE function, but from testing it would appear that SMOTE cannot accept dataframes with any factor type columns. I get Error in get.knnx(data, query, k, algorithm) : Data non-numeric when using as.factor.
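Putting the two observations above together, a minimal sketch is to pass only the numeric attribute columns as X and the class column itself as the target vector. The data frame below is made up for illustration (column names other than Churn are hypothetical), and the SMOTE() call only runs if smotefamily is installed.

```r
# Hypothetical churn data: two numeric features plus a factor target.
set.seed(42)
tv <- data.frame(
  tenure  = c(rnorm(14, mean = 40, sd = 5), rnorm(6, mean = 10, sd = 3)),
  charges = c(rnorm(14, mean = 50, sd = 8), rnorm(6, mean = 90, sd = 8)),
  Churn   = factor(rep(c("No", "Yes"), times = c(14, 6)))
)

# X: numeric attribute columns only; target: the class vector, not a bare name.
X <- tv[, sapply(tv, is.numeric), drop = FALSE]
target <- tv$Churn

# Guarded call so the sketch still runs without the package.
if (requireNamespace("smotefamily", quietly = TRUE)) {
  res <- smotefamily::SMOTE(X, target, K = 3, dup_size = 0)
  print(dim(res$data))
}
```

Keeping the factor column out of X is what avoids the "Data non-numeric" error mentioned above.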

I am facing this problem in R, AOV function

This is my code:
> av = aov(r ~ tf);av
where r is a matrix of numerical data and tf is a factor.
This is the error:
Error in model.frame.default(formula = r ~ tf, drop.unused.levels = TRUE) :
variable lengths differ (found for 'tf')
What could be wrong? I am very new to this. I have checked my previous steps and everything seems right. Please let me know if you need any additional information.
The number of rows of the matrix must equal the length of the vector 'tf'. If it does not, you get the "variable lengths differ" error. The code below works because 'r' has 10 rows and 'tf' has length 10:
r <- matrix(rnorm(5 * 10), 10, 5)
tf <- factor(sample(letters[1:3], 10, replace = TRUE))
aov(r ~ tf)
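To see the failure case side by side, the sketch below compares the two sizes and then deliberately triggers the error with a factor that is too short. The name tf_bad is hypothetical and exists only for this illustration.

```r
set.seed(1)
r <- matrix(rnorm(5 * 10), 10, 5)

# A factor with only 8 values, so it cannot line up with the 10 rows of r.
tf_bad <- factor(sample(letters[1:3], 8, replace = TRUE))

# Compare the sizes before fitting.
sizes_match <- nrow(r) == length(tf_bad)

# Capture the error message instead of stopping the script.
msg <- tryCatch(
  {
    aov(r ~ tf_bad)
    "no error"
  },
  error = function(e) conditionMessage(e)
)
```

Checking nrow(r) against length(tf) in your own session should show where the mismatch was introduced.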

Error in array, regression loop using "plyr"

Good morning,
I'm currently trying to run a truncated regression loop on my dataset. Below is a reproducible example of my dataframe.
library(plyr)
library(truncreg)
df <- data.frame("grid_id" = rep(c(1,2), 6),
"htcm" = rep(c(160,170,175), 4),
stringsAsFactors = FALSE)
View(df)
I then tried to run a truncated regression on the variable "htcm", grouped by grid_id, to obtain only the coefficients (the intercept and sigma), which I then stored in a dataframe. This code is based on the ideas of @hadley:
reg <- dlply(df, "grid_id", function(.)
truncreg(htcm ~ 1, data = ., point = 160, direction = "left")
)
regcoef <- ldply(reg, coef)
This code works for one of my three datasets, but I receive error messages for the other two. The datasets have the same columns and differ only in their number of rows
(df1: 4,000; df2: 100,000; df3: 13,000).
The error message which occurs is
Error in array(x, c(length(x), 1L), if (!is.null(names(x))) list(names(x), : 'data' must be of type vector, was 'NULL'
I do not even know how to reproduce an example where this error code occurs, because this code works totally fine with one of my three datasets.
I already accounted for missing values in both columns.
Does anyone have an idea what I should fix in this code?
Thanks!!
EDIT:
I think I found the origin of the error in my code. A truncated regression model estimates the standard deviation, which requires more than one observation in every group. Some of my groups contain only n = 1 observation, so the standard deviation cannot be estimated for them, and this seems to be what produces the NULL in the error message. How can I drop the groups with fewer than two observations within the regression code?
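One possible approach to the question in the edit, sketched in base R, is to count the rows per grid_id and keep only groups with at least two rows before handing the data to dlply(). The data frame and the names n_per_group and df_ok are made up for this illustration.

```r
# Hypothetical data: grid_id 2 has a single observation.
df <- data.frame(
  grid_id = c(1, 1, 1, 2, 3, 3),
  htcm    = c(160, 170, 175, 165, 172, 180)
)

# For every row, the number of rows sharing its grid_id.
n_per_group <- ave(seq_len(nrow(df)), df$grid_id, FUN = length)

# Keep only rows from groups with two or more observations.
df_ok <- df[n_per_group >= 2, ]

# df_ok can then replace df in the dlply() call from the question, e.g.
# reg <- dlply(df_ok, "grid_id", function(.) truncreg(htcm ~ 1, data = ., ...))
```

This only removes single-observation groups; whether every remaining group can be fitted still depends on the data in that group.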

Error in seq.default(from = min(k), to = max(k), length = nBreaks + 1) : 'from' must be a finite number. WISH-R package

I have a list of pre-filtered genomic regions (based on previous GWAS and some enrichment analysis performed on GSEA) and I am looking for interesting gene-gene interactions.
I have a binary phenotype, so I used glm=T in the model.
I have followed in detail the WISH-R guide - https://github.com/QSG-Group/WISH - and generated the correlations matrix without issues.
I am now struggling to use the generate.modules function, so I am writing here for some help.
I have tried several times to run generate.modules(correlations,values="Coefficients",thread=2).
Before that, I also ran the following, as suggested:
correlations$Coefficients[(is.na(correlations$Coefficients))]<-0
correlations$Pvalues[(is.na(correlations$Pvalues))]<-1
This is my R code:
library(WISH)
library(data.table)
ped <- fread("D:/Dati/GWAS_ITALIAN_PBC_Mike_files/EPISTASI/epistasi_all SNPs_all_TF/file_epistasi_per_wish/all_snp_tf_recoded.ped", data.table=F)
tped <- fread("D:/Dati/GWAS_ITALIAN_PBC_Mike_files/EPISTASI/epistasi_all SNPs_all_TF/file_epistasi_per_wish/all_snp_tf_recoded.tped", data.table=F)
pval <- fread("D:/Dati/GWAS_ITALIAN_PBC_Mike_files/EPISTASI/epistasi_all SNPs_all_TF/file_epistasi_per_wish/ALL_SNP_TF_p.txt", data.table=F)
id <- fread("D:/Dati/GWAS_ITALIAN_PBC_Mike_files/EPISTASI/epistasi_all SNPs_all_TF/file_epistasi_per_wish/ALL_SNP_TF_id.txt", data.table=F)
genotype <-generate.genotype(ped,tped,snp.id=id, pvalue=0.005,id.select=NULL,gwas.p=pval,major.freq=0.95,fast.read=T)
LD_genotype<-LD_blocks(genotype)
genotype <- LD_genotype$genotype
pheno<-fread("D:/Dati/GWAS_ITALIAN_PBC_Mike_files/EPISTASI/epistasi_all SNPs_all_TF/file_epistasi_per_wish/pheno.txt",data.table=F)
pheno<-ifelse(pheno=="1","0","1")
pheno<-as.numeric(pheno)
correlations<-epistatic.correlation(pheno, genotype,threads = 2 ,test=F,glm=T)
genome.interaction(tped,correlations,quantile = 0.9)
correlations$Coefficients[(is.na(correlations$Coefficients))]<-0
correlations$Pvalues[(is.na(correlations$Pvalues))]<-1
generate.modules(correlations,values="Coefficients",thread=2)
I get the following error:
Error in seq.default(from = min(k), to = max(k), length = nBreaks + 1) :
'from' must be a finite number.
Do you have any hints for debugging this error?
What is the main issue here?
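One check that might help narrow this down, offered as an assumption rather than a diagnosis: is.na() catches NA and NaN but not Inf or -Inf, so a matrix can still contain non-finite values after the replacement step shown above, and min() or max() of such a matrix is not finite. The sketch below uses a small made-up matrix named coefs as a stand-in for correlations$Coefficients; it does not call WISH.

```r
# Hypothetical stand-in for correlations$Coefficients.
coefs <- matrix(c(0.2, NA, Inf, -0.5, NaN, 0.1, -Inf, 0.3, 0), nrow = 3)

# Same replacement as in the question: handles NA and NaN only.
coefs[is.na(coefs)] <- 0

# Count what is still non-finite and locate it.
n_nonfinite <- sum(!is.finite(coefs))
where_nonfinite <- which(!is.finite(coefs), arr.ind = TRUE)

# The range shows whether min/max would be finite.
coef_range <- range(coefs)
```

Running the same counts on the real Coefficients and Pvalues matrices would show whether this applies; if both are fully finite, the cause lies elsewhere.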

argument "x" is missing, with no default in ezANOVA

I am getting a weird problem with ezANOVA. When I try to execute code below it says that some data is missing, but when I look at the data, nothing is missing.
model_acc <- ezANOVA(data = nback_acc_summary[complete.cases(nback_acc_summary),],
dv = Stimulus1.ACC,
wid = Subject,
within = c(ExperimentName, Target),
between = Group,
type = 3,
detailed = T)
When I run these lines I get an error message that says:
Error in ezANOVA_main(data = data, dv = dv, wid = wid, within = within, :
One or more cells is missing data. Try using ezDesign() to check your data.
Then I run
ezDesign(nback_acc_summary)
And get the message:
Error in as.list(c(x, y, row, col)) :
argument "x" is missing, with no default
I am not sure what to change in the code, because I can't really figure out what the problem is. I've researched the issue online, and it seems like quite a lot of users have encountered it before, but there is a very limited amount of solutions posted. I would be grateful for any kind of help.
Thanks!
For an ANOVA model you must have observations in all conditions created by the design of your model.
For example, if ExperimentName, Target, and Group have two levels each, you have 2 x 2 x 2 = 8 conditions, and each condition requires multiple observations. On top of that, your model is a repeated-measures design, which means that each Subject within a level of your between factor Group must have observations for all of the within conditions (i.e., ExperimentName x Target = 2 x 2 = 4).
The first error suggests you have fallen short of having enough data in the conditions suggested by your model.
The following should produce a plot to help identify which conditions are missing data:
ezDesign(
data = nback_acc_summary[complete.cases(nback_acc_summary), ],
x = Target,
y = Subject,
row = ExperimentName,
col = Group
)
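As a non-graphical complement to the plot, the empty cells can also be listed by cross-tabulating Subject against the within factors. This is a minimal base R sketch on made-up data; only the column names are taken from the question, and the factor levels are assumptions.

```r
# Hypothetical balanced design: 4 subjects x 2 experiments x 2 targets.
nback <- expand.grid(
  Subject        = factor(1:4),
  ExperimentName = c("A", "B"),
  Target         = c("yes", "no")
)
nback$Group <- ifelse(nback$Subject %in% c("1", "2"), "g1", "g2")

# Remove one cell to simulate the missing data.
drop_row <- nback$Subject == "3" & nback$ExperimentName == "B" & nback$Target == "no"
nback <- nback[!drop_row, ]

# Count observations per Subject x ExperimentName x Target cell.
cells <- with(nback, table(Subject, ExperimentName, Target))

# List the combinations that have no observations.
missing_cells <- subset(as.data.frame(cells), Freq == 0)
```

Any row in missing_cells identifies a subject who lacks data for one within condition, which is the situation the first ezANOVA error describes.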
