argument "x" is missing, with no default in ezANOVA - r

I am getting a weird problem with ezANOVA. When I try to execute the code below, it says that some data is missing, but when I look at the data, nothing is missing.
model_acc <- ezANOVA(data = nback_acc_summary[complete.cases(nback_acc_summary), ],
                     dv = Stimulus1.ACC,
                     wid = Subject,
                     within = c(ExperimentName, Target),
                     between = Group,
                     type = 3,
                     detailed = TRUE)
When I run these lines I get an error message that says:
Error in ezANOVA_main(data = data, dv = dv, wid = wid, within = within, :
One or more cells is missing data. Try using ezDesign() to check your data.
Then I run
ezDesign(nback_acc_summary)
And get the message:
Error in as.list(c(x, y, row, col)) :
argument "x" is missing, with no default
I am not sure what to change in the code, because I can't really figure out what the problem is. I've researched the issue online, and it seems quite a lot of users have encountered it before, but very few solutions have been posted. I would be grateful for any kind of help.
Thanks!

For an ANOVA model you must have observations in all conditions created by the design of your model.
For example, if ExperimentName, Target, and Group each have two levels, you have 2 x 2 x 2 = 8 conditions, each of which requires multiple observations. Add to this that your model is repeated measures, which means each Subject within a level of your between factor Group must have observations for all of the within conditions (i.e., ExperimentName x Target = 2 x 2 = 4).
The first error suggests you have fallen short of having enough data in the conditions implied by your model.
The following should produce a plot to help identify which conditions are missing data:
ezDesign(
  data = nback_acc_summary[complete.cases(nback_acc_summary), ],
  x = Target,
  y = Subject,
  row = ExperimentName,
  col = Group
)
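If ezDesign itself errors out, a plain base-R cross-tabulation exposes the same information. A minimal sketch on toy data (the column names mirror the question; in the real data you would extend the formula with ExperimentName and Group, and rows with Freq == 0 mark the design cells that break the repeated-measures model):

```r
# Toy data: subject s2 never saw Target "B", so that design cell is empty
d <- data.frame(Subject = c("s1", "s1", "s2"),
                Target  = c("A", "B", "A"))

# Count observations per design cell, then keep the empty ones
cell_counts <- as.data.frame(xtabs(~ Subject + Target, data = d))
empty_cells <- subset(cell_counts, Freq == 0)
empty_cells  # one row: Subject s2, Target B
```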

Error: cannot subset columns that don't exist: predictY in hlme

I ran into a strange problem when I tried to get the predicted means to draw a trajectory plot.
I used the results from the following 3-class model:
WK$TIME_25 <- (WK$TIME_QRT_R - 25) / 10  # TIME_QRT_R is the original time variable
m3a <- hlme(GFR_QRT_MEAN ~ poly(TIME_25, degree = 3, raw = TRUE) + GENDER + AGE_BL + HOSP + PROTEINURIA_bl,
            random = ~ 1,
            mixture = ~ poly(TIME_25, degree = 3, raw = TRUE),
            subject = "ID", data = WK, ng = 3, B = m1)
After I got the results, I tried to create a new dataset to draw the trajectory mean plot. My code is below:
datnew <- data.frame(TIME_QRT_R = seq(0, 27, length = 100))
datnew$TIME_25 <- (datnew$TIME_QRT_R - 25) / 10
datnew$GENDER <- 1
datnew$AGE_BL <- 64
datnew$HOSP <- 1  # 1 stands for hospitalization
datnew$PROTEINURIA_bl <- 1  # 1 stands for the presence of renal damage
mon_p <- predictY(m3a, datnew, var.time = 'TIME_QRT_R', draws = TRUE)
To my surprise, I got the following error message after running the 'predictY' statement above:
"Error: cannot subset columns that don't exist. x column 'TIME_25' doesn't exist"
I created 'TIME_25' in the 'datnew' data, and the hlme model also includes this variable. Why did I get this error? Could you let me know how to fix it? Thank you!
Sincerely,
Liang Feng

Error in array, regression loop using "plyr"

Good morning,
I'm currently trying to run a truncated regression loop on my dataset. Below is a reproducible example of my data frame.
library(plyr)
library(truncreg)
df <- data.frame("grid_id" = rep(c(1, 2), 6),
                 "htcm" = rep(c(160, 170, 175), 4),
                 stringsAsFactors = FALSE)
View(df)
Now I tried to run a truncated regression on the variable "htcm", grouped by grid_id, to obtain only the coefficients (intercept as well as sigma), which I then stored in a data frame. This code is written based on the ideas of #hadley
reg <- dlply(df, "grid_id", function(.)
  truncreg(htcm ~ 1, data = ., point = 160, direction = "left")
)
regcoef <- ldply(reg, coef)
While this code works for one of my three datasets, I receive error messages for the other two. The datasets do not differ in any column, only in their number of rows
(df1: 4,000; df2: 100,000; df3: 13,000).
The error message which occurs is
"Error in array(x, c(length(x), 1L), if (!is.null(names(x))) list(names(x), : 'data' must be of type vector, was 'NULL'
I do not even know how to reproduce an example where this error occurs, because the code works totally fine with one of my three datasets.
I already accounted for missing values in both columns.
Does anyone have a guess as to what I can fix in this code?
Thanks!!
EDIT:
I think I found the origin of the error in my code. The problem is most likely that a truncated regression model estimates a standard deviation, which implies more than one observation per group. As there are also groups with only n = 1 observation, the standard deviation equals zero, which causes my code to end up with a NULL vector. How can I drop the groups with fewer than two observations within the regression code?
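One way to drop those groups before the loop is a base-R sketch using ave() to count rows per grid_id (a dplyr filter, or a check inside the dlply function, would work just as well):

```r
# Toy data: grid 2 has only a single observation
df <- data.frame(grid_id = c(1, 1, 2, 3, 3, 3),
                 htcm    = c(161, 170, 175, 162, 165, 170))

# Keep only groups with at least two observations,
# then run the dlply() regression loop on df_ok instead of df
df_ok <- df[ave(seq_along(df$grid_id), df$grid_id, FUN = length) >= 2, ]
unique(df_ok$grid_id)  # grid 2 is gone
```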

"Input datasets must be dataframes" error in kamila package in R

I have a mixed-type data set (one continuous variable and eight categorical variables), so I wanted to try kamila clustering. It gives me an error when I use one continuous variable, but when I use two continuous variables it works.
library(kamila)
data <- read.csv("mixed.csv", header = FALSE, sep = ";")
conInd <- 9
conVars <- data[, conInd]
conVars <- data.frame(scale(conVars))
catVarsFac <- data[, c(1, 2, 3, 4, 5, 6, 7, 8)]
catVarsFac[] <- lapply(catVarsFac, factor)
kamRes <- kamila(conVars, catVarsFac, numClust = 5, numInit = 10,
                 calcNumClust = "ps", numPredStrCvRun = 10, predStrThresh = 0.5)
Error in kamila(conVar = conVar[testInd, ], catFactor =
catFactor[testInd, : Input datasets must be dataframes
I think the problem is that the function assumes that you have at least two of both data types (i.e. >= 2 continuous variables, and >= 2 categorical variables). It looks like you supplied a single column index (conInd = 9, just column 9), so you have only one continuous variable in your data. Try adding another continuous variable to your continuous data.
I had the same problem (with categoricals) and this approach fixed it for me.
I think the ultimate source of the error in the program is at around line 170 of the source code. Here's the relevant snippet...
numObs <- nrow(conVar)
numInTest <- floor(numObs / 2)
for (cvRun in 1:numPredStrCvRun) {
  for (ithNcInd in 1:length(numClust)) {
    testInd <- sample(numObs, size = numInTest, replace = FALSE)
    testClust <- kamila(conVar = conVar[testInd, ],
                        catFactor = catFactor[testInd, ],
                        numClust = numClust[ithNcInd],
                        numInit = numInit, conWeights = conWeights,
                        catWeights = catWeights, maxIter = maxIter,
                        conInitMethod = conInitMethod, catBw = catBw,
                        verbose = FALSE)
When the code partitions your data into a training set, it's selecting rows from a one-column data.frame, but that returns a vector by default in that case. So you end up with "not a data.frame" even though you did supply a data.frame. That's where the error comes from.
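The drop behavior is easy to see in a few lines of base R; drop = FALSE (or the data.frame() wrapper) preserves the shape:

```r
one_col <- data.frame(x = 1:5)

class(one_col[1:2, ])                # "integer"    -- rows of a one-column frame drop to a vector
class(one_col[1:2, , drop = FALSE])  # "data.frame" -- drop = FALSE preserves the frame
class(data.frame(one_col[1:2, ]))    # "data.frame" -- the wrapping fix
```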
If you can't dig up another variable to add to your data, you could edit the code such that the calls to kamila in the cvRun for loop wrap the data.frame() function around any subsetted conVar or catFactor, e.g.
testClust <- kamila(conVar = data.frame(conVar[testInd, ]),
                    catFactor = data.frame(catFactor[testInd, ]), ...)
and just save that as your own version of the function called say, my_kamila, which you could use instead.
Hope this helps.

"Error in leaps.setup(x, y, wt = weights, nbest = nbest, nvmax = nvmax, : y and x different lengths"

I am trying to do Program Evaluation for my data set, "growth" (basically the GDP of each country).
I have already ruled out all of the countries that do not have a complete set of data for the time period 1980-2000 so I am sure that I do not have any missing data.
After importing the data into R, I followed the instructions in the paper:
An R Package for the Panel Approach Method for Program Evaluation: pampe by Ainhoa Vega-Bayo
This is a screenshot of what my data looks like in RStudio.
> time.pretr<-c("1980", "1983")
> time.tr<-c("1984","1994")
> treated <-"SOUTHAFRICA"
> econ.integ<-pampe(time.pretr = time.pretr,time.tr = time.tr,treated = treated,data = growth)
Error in leaps.setup(x, y, wt = weights, nbest = nbest, nvmax = nvmax, :
y and x different lengths
What could I do to resolve this error and what does it mean when it says "y and x different lengths"?
Please note, this is my first time using R/RStudio so I am not familiar with the terms. If anyone could offer some help, it would be greatly appreciated.
My solution was the following:
x has to be created using as.matrix(your_data), and
y has to be created as your_data$target_column.
It appears leaps is very particular about the data structures it accepts. As Александр Ласкорунский wrote, he had to use those particular structures. I did some experimenting, and it seems the x entry HAS to be a matrix: not a data frame or a tibble, but specifically a matrix. The y argument MUST be a vector. I'm sure individuals more versed in this area could expand on this.
EDIT: It seems y must be not just a vector but an atomic vector.
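A minimal sketch of those two structures (the country columns are made up for illustration; judging by the error, pampe hands the data to leaps internally, which expects a numeric matrix of predictors and an atomic vector response):

```r
# Made-up growth-style data: one column per country, one row per year
growth <- data.frame(USA = rnorm(21), UK = rnorm(21), SOUTHAFRICA = rnorm(21))

x <- as.matrix(growth[, c("USA", "UK")])  # must be a matrix, not a data frame or tibble
y <- growth$SOUTHAFRICA                   # $ extraction yields an atomic vector

stopifnot(is.matrix(x), is.atomic(y), nrow(x) == length(y))
```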

Path Analysis with plspm

I am attempting to build a Partial Least Squares Path Model using 'plspm'. After reading through the tutorial and formatting my data I am getting hung up on an error:
"Error in if (w.dif < tol || itermax == iter) break : missing value where TRUE/FALSE needed".
I assume that this error is the result of missing values for some of the latent variables; for example, Soil_Displaced has a lot of NAs because it was only measured in a subset of the replicates in the experiment. Is there a way to get around this error and work with variables that have many missing values? I am attaching my code and dataset here; the dataset can also be found in this Dropbox folder: https://www.dropbox.com/sh/51x08p4yf5qlbp5/-al2pwdCol
this is my code for now:
# inner model matrix
warming        = c(0, 0, 0, 0, 0, 0)
Treatment      = c(0, 0, 0, 0, 0, 0)
Soil_Displaced = c(1, 1, 0, 0, 0, 0)
Mass_Lost_10mm = c(1, 1, 0, 0, 0, 0)
Mass_Lost_01mm = c(1, 1, 0, 0, 0, 0)
Daily_CO2      = c(1, 1, 0, 1, 0, 0)
Path_inner = rbind(warming, Treatment, Soil_Displaced, Mass_Lost_10mm, Mass_Lost_01mm, Daily_CO2)
innerplot(Path_inner)
# develop the outer model
Path_outter = list(3, 4:5, 6, 7, 8, 9)
# modes: "A" designates a reflective model
Path_modes = rep("A", 6)
# Run it: plspm(Data, inner matrix, outer list, modes)
Path_pls = plspm(data.2011, Path_inner, Path_outter, Path_modes)
Any input on this issue would be helpful. Thanks!
plspm does work with missing values, to a limited extent; you have to set the scaling to numeric.
For your example the code looks as follows:
example_scaling = list(c("NUM"),
                       c("NUM", "NUM"),
                       c("NUM"),
                       c("NUM"),
                       c("NUM"),
                       c("NUM"))
Path_pls = plspm(data.2011, Path_inner, Path_outter, Path_modes, scaling = example_scaling)
But here's the limitation: if your dataset contains an observation where all indicators of a latent variable are missing, this won't work.
First case: e.g. the latent variable "Treatment" has two indicators; if only one of them is NA, it works fine.
Second case: if there is even one observation where both indicators are NA, it won't work.
Since you're measuring the other five latent variables with just one indicator each, and you say your data contains lots of missing values, the second case is likely what you're running into.
PLSPM will not work with missing values, so I had to interpolate some of the missing values from known observations. When this was done, the code above worked great!
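The interpolation step mentioned above can be sketched in base R with approx(), shown here on a toy vector (zoo::na.approx does the same for whole columns):

```r
# Fill interior NAs by linear interpolation between known observations
# (approx drops NA points by default and evaluates at every original position)
v <- c(1, NA, 3, NA, 5)
filled <- approx(seq_along(v), v, xout = seq_along(v))$y
filled  # 1 2 3 4 5
```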
