R Error in mclogit::mblogit(): solve.default(X[[i]], ...) : 'a' (4 x 1) must be square

I am trying to use a multinomial logistic regression model to determine how different factors influence the likelihood of several behavioral states in two species of shark.
Nineteen individual animals, comprising two distinct species, were each tracked for ~100 days, and one behavioral state was identified for each day on which data were collected.
I would like to code individual shark as a categorical random effect (19 levels) nested within species, which is a categorical fixed effect (2 levels).
With this general idea, the code that I am currently trying to run is:
mclogit::mblogit(cluster ~ species, random = ~1|individual %in% species, data = df, method = "MQL")
The model appears to run normally but produces the error message:
Error in *tmp*[[k]] : subscript out of bounds
Reversing the order of the terms in the nested random effect produces a different error message. Now the code reads:
mclogit::mblogit(cluster ~ species, random = ~1|species %in% individual, data = df, method = "MQL")
And produces the error:
Error in solve.default(X[[i]], ...) : 'a' (6 x 1) must be square
Here is a sample of the raw data with which I am trying to fit my model:
df <- data.frame(
Date = c("2015-11-25", "2016-01-24", "2016-02-27", "2016-03-27", "2017-12-02", "2017-12-06", "2015-10-30", "2015-10-31"),
cluster = factor(c(3,3,4,6,3,1,3,2)),
species = factor(c("I.oxyrinchus", "I.oxyrinchus", "I.oxyrinchus", "I.oxyrinchus", "P.glauca", "P.glauca", "P.glauca", "P.glauca")),
individual = factor(c("141257", "141257", "141254", "141254", "141256", "141256", "141255", "141255")))
Attempting to run the code with this reduced dataset produces only the second of the two error messages.
My questions are twofold:
What are the meanings of these two error messages, and how might I address one or both of them?
Why might the order of the terms in the random effect portion of the model formula produce two different results?
Thank you.
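Two quick base-R checks may help with the first question. The first confirms, on the sample data above, that each individual ID occurs in only one species, so the nesting is already implied by the IDs themselves; whether mblogit then behaves with the simpler grouping `random = ~1|individual` is an assumption, not something verified here. The second shows what the second message means in general: `solve()` can only invert a square matrix, so it fails when handed a 6 x 1 (or 4 x 1) matrix, which suggests that mblogit built a matrix of the wrong shape from the `%in%` term.

```r
# Sample data from the question
df <- data.frame(
  Date = c("2015-11-25", "2016-01-24", "2016-02-27", "2016-03-27",
           "2017-12-02", "2017-12-06", "2015-10-30", "2015-10-31"),
  cluster = factor(c(3, 3, 4, 6, 3, 1, 3, 2)),
  species = factor(c("I.oxyrinchus", "I.oxyrinchus", "I.oxyrinchus", "I.oxyrinchus",
                     "P.glauca", "P.glauca", "P.glauca", "P.glauca")),
  individual = factor(c("141257", "141257", "141254", "141254",
                        "141256", "141256", "141255", "141255"))
)

# 1. Is 'individual' already nested in 'species'?
#    Count how many species each individual ID appears in.
ind_by_species <- table(df$individual, df$species)
species_per_individual <- rowSums(ind_by_species > 0)
print(species_per_individual)  # every entry is 1 for this sample
print(all(species_per_individual == 1))

# 2. What "'a' (6 x 1) must be square" means: solve() inverts only square matrices.
msg <- tryCatch(solve(matrix(1, nrow = 6, ncol = 1)),
                error = function(e) conditionMessage(e))
print(msg)
```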

Related

Can't run glm due to the following error: "variable lengths differ (found for 'data')"

I am trying to run a regression using the glm function, but I keep getting the same error message: "variable lengths differ (found for 'data')". I can't see how my variables could differ in length, since I use a sample of 1000 for both my dependent and independent variables. I take a sample of my data because I have more than a million observations and want to check first that the model works properly (running it with all the data takes a very long time). This is the code I use:
sample = sample(1:nrow(agg), 1000, replace = FALSE)
y=agg$TO_DEFAULT_IN_12M_INDICATOR[sample]
test <- glm(as.factor(y) ~., data = as.factor(agg[sample,]), family = binomial)
#coef(full.model)
Here agg contains all my data, and y is an indicator variable of 0's and 1's. Does anyone know how I could fix this problem?
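A likely cause, judging from the code alone: as.factor() is applied to the whole data frame agg[sample,], so data is no longer a data frame with 1000 rows, and model.frame() presumably turns it into a single column named data whose length differs from y. A minimal sketch of the usual pattern keeps the response inside the sampled data frame and converts only that column to a factor. The simulated agg, its size, and the predictor names x1 and x2 are hypothetical stand-ins for the real data.

```r
set.seed(42)

# Hypothetical stand-in for 'agg': a 0/1 indicator plus two numeric predictors
n <- 5000
agg <- data.frame(
  TO_DEFAULT_IN_12M_INDICATOR = rbinom(n, size = 1, prob = 0.2),
  x1 = rnorm(n),
  x2 = runif(n)
)

# Sample row indices, then keep the response *inside* the sampled data frame
idx <- sample(nrow(agg), 1000, replace = FALSE)
agg_sample <- agg[idx, ]
agg_sample$TO_DEFAULT_IN_12M_INDICATOR <- factor(agg_sample$TO_DEFAULT_IN_12M_INDICATOR)

# '.' now means "every other column of agg_sample", all of length 1000
test <- glm(TO_DEFAULT_IN_12M_INDICATOR ~ ., data = agg_sample, family = binomial)
summary(test)
```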

Weighting does not work in the 'aov' function of R

I am having trouble using a weighted dataset with the aov function in R.
For example, my dataset "data_file" has a target variable "Y" and four independent variables (treat, V1, V2, V3).
Assuming:
V1 (2 groups) & treat (3 groups) --> categorical,
V2 and V3 --> continuous.
I want to check baseline comparisons of the independent variables among the treat groups.
I ran an aov test for this purpose, for example:
base_V2_aov <- aov(data_file$V2 ~ data_file$treat)
base_V2_anov <- anova(base_V2_aov)
print(base_V2_anov)
It worked and showed a significant difference in V2 among the "treat" groups, while the other variables were non-significant, so I decided to weight my data based on V2 and run the aov test on the weighted data.
I used the mnps function in the twang package for weighting.
mnps.data <- mnps(treat ~ V2, data = data_file, estimand = "ATE", stop.method = "es.mean", n.trees = 5000, verbose = FALSE)
data_file$weight <- get.weights(mnps.data, stop.method = "es.mean")
I have read in one Stack Overflow answer that the survey package does not support weighting for a one-way ANOVA test, but the aov function does.
So I ran this code:
base_V2_aov <- aov(data_file$V2 ~ data_file$treat, weights(data_file$weight))
base_V2_anov <- anova(base_V2_aov)
print(base_V2_anov)
It shows an error:
Error: $ operator is invalid for atomic vectors
I tried:
base_V2_aov <- aov(data_file$V2 ~ data_file$treat, weights(weight))
It could not find object "weight".
I also tried this:
base_V2_aov <- aov(data_file$V2 ~ data_file$treat, weights(data_file))
It did not show an error, but the results were exactly the same as without weighting (I expected them to change, given the significant difference without weighting).
What is the appropriate object to pass as weights in the aov function?
It seems you should use:
weights = your weighting variable
in the aov arguments (aov passes it on to lm).
I replicated something like your dataset, and after using the above code the results of the comparisons between groups were different, which showed that the weighting method had worked.
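The three attempts in the question can be explained from the signature aov(formula, data = NULL, ...): the second unnamed argument is data, not weights, and weights() is an extractor for fitted models. So weights(data_file$weight) fails with the $ error, weights(weight) cannot find a free-standing object called weight, and weights(data_file) returns NULL, which silently becomes data = NULL and leaves the fit unweighted. A minimal sketch with simulated data follows; the group means and the random weights are hypothetical stand-ins for the real data_file and the twang weights.

```r
set.seed(123)
n <- 90

# Hypothetical stand-in for data_file: 3 treatment groups and a continuous V2
data_file <- data.frame(
  treat = factor(rep(c("A", "B", "C"), each = n / 3)),
  V2 = rnorm(n, mean = rep(c(10, 10.5, 11), each = n / 3))
)
# Hypothetical weights standing in for get.weights(mnps.data, stop.method = "es.mean")
data_file$weight <- runif(n, min = 0.2, max = 3)

# weights() extracts weights from a fitted model; on a data frame without a
# 'weights' component it returns NULL, which is why the third attempt was unweighted
print(weights(data_file))

unweighted <- aov(V2 ~ treat, data = data_file)
weighted   <- aov(V2 ~ treat, data = data_file, weights = weight)

print(anova(unweighted))
print(anova(weighted))
```

Naming data = data_file lets weights = weight be looked up as a column, instead of relying on data_file$ prefixes inside the formula.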

Plot how the estimated survival depends upon the value of a covariate of interest. Problems with relevel

I want to plot how the estimated survival from a Cox model depends on the value of a covariate of interest, while the rest of the variables are fixed at their average values (if they are continuous) or at their lowest values (if they are dummies). Following this example http://www.sthda.com/english/wiki/cox-proportional-hazards-model , I have constructed a new data frame with three rows, one for each value of my variable of interest, with the other covariates fixed. Among these covariates are two factor vectors. I created the new dataset and then passed it to survfit() via the newdata argument.
When I pass the data frame to survfit(), I obtain the following error message: Error in relevel.default(occupation) : 'relevel' only for factors. Where is the source of the problem? If it is related to the factor vectors, how can I solve it? Below is an example of the code. Unfortunately, I cannot share the data or find a dataset that produces the same error message:
I have transformed the factor variables into integer vectors in both the Cox model and the new dataset; it did not work.
I have deleted all the factor variables, and then it works.
I have tried to implement this strategy, but it did not work: Plotting predicted survival curves for continuous covariates in ggplot
fit <- coxph(Surv(entry, exit, event == 1) ~ status_plot +
exp_national + relevel(occupation, 5) + age + gender + EDUCATION , data = data)
data_rank <- with(data,
data.frame(status_plot = c(1,2,3), # factor vector of interest
exp_national=rep(mean(exp_national, na.rm = TRUE), 3),
occupation = c(5,5,5), # factor with 6 categories, number 5 is the category of reference in the cox model
age=rep(mean(age, na.rm = TRUE), 3),
gender = c(1,1,1),
EDUCATION=rep(mean(EDUCATION, na.rm = TRUE), 3) ))
surv.fin <- survfit(fit, newdata=data_rank) # this produces the error
Looking at the code, it appears you probably attempted to take the mean of a factor. So do post at least str(data) as an edit to the body of your question. You should also realize that you can give a single value to a column in a data.frame call and have it recycled to the correct length, so all the means could be entered as a single item rather than rep-ing them.
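One further cause is visible in the posted code: survfit() re-evaluates every term of the model formula on newdata, including relevel(occupation, 5), but data_rank creates occupation (and gender) as plain numbers, and relevel() only accepts factors. A sketch of a pattern that avoids this sets the reference level once in the data, outside the formula, and builds the newdata factor columns with the same levels as in the fit. It uses the lung data shipped with the survival package; ecog is a hypothetical factor built from ph.ecog to play the role of occupation, and single values are recycled by data.frame() as suggested above.

```r
library(survival)

# 'lung' ships with survival; keep ECOG scores 0-2 and drop missing values
lung2 <- subset(lung, !is.na(ph.ecog) & ph.ecog < 3)

# Set the reference level once, in the data, instead of relevel() in the formula
lung2$ecog <- relevel(factor(lung2$ph.ecog), ref = "1")
lung2$sex  <- factor(lung2$sex, levels = c(1, 2), labels = c("male", "female"))

fit <- coxph(Surv(time, status == 2) ~ ecog + age + sex, data = lung2)

# newdata: one row per value of the covariate of interest; factor columns
# use the same levels as the fitted data, and single values are recycled
nd <- data.frame(
  ecog = factor(c("0", "1", "2"), levels = levels(lung2$ecog)),
  age  = mean(lung2$age),
  sex  = factor("male", levels = levels(lung2$sex))
)

surv_fit <- survfit(fit, newdata = nd)

# One curve per row of nd, in row order
plot(surv_fit, col = 1:3, xlab = "Days", ylab = "Survival probability")
legend("topright", legend = paste("ecog =", nd$ecog), col = 1:3, lty = 1)
```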

Getting a warning using predict function in R

I have a data set of 400 observations which I divided into two separate sets, one for training (300 observations) and one for testing (100 observations). I am trying to fit a step function regression; the problem is that once I try to use the model to predict values from the test set, I get a warning:
Warning message: 'newdata' had 100 rows but variables found have 300 rows
The variable I am trying to predict is Income and the explanatory variable is called Age.
This is the code:
fit=lm(Income~cut(training$Age, 4), data=training)
predict(fit,test)
Instead of getting 100 predictions based on the test data I get a warning sign and 300 predictions based on the training data.
I have read about other people having the same problem, and usually the answer is that the variable name differs between the data set and the model. I don't think that is the problem here, because with a regular simple regression I don't get a warning:
lm.fit=lm(Income~Age,data = training)
predict(lm.fit,test)
There are a number of problems here, so it will take several steps to get to a good answer. You did not provide data, so I am going to use other data that produce the same kind of warning. The built-in data set iris has 4 continuous variables. I will arbitrarily select two for use here, then apply code just like yours:
MyData = iris[,3:4]
set.seed(2017) # for reproducibility
T = sample(150, 100)
training = MyData[ T, ]
test = MyData[-T, ]
fit=lm(Petal.Width ~ cut(training$Petal.Length, 4), data=training)
predict(fit,test)
Warning message:
'newdata' had 50 rows but variables found have 100 rows
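So I am getting the same type of warning.
cut is changing the continuous variable Petal.Length into a factor with 4 levels. You built your model on the factor, but when you try to predict the new values, you just passed in test, which still has the continuous values (Age in your data; Petal.Length in mine). Because the formula refers to training$Petal.Length explicitly, predict() evaluates that expression instead of looking up Petal.Length in test, which is why it finds 100 rows and warns. Even if the formula used cut(Petal.Length, 4), evaluating the predict statement would mean evaluating cut(test$Petal.Length, 4) as part of the process. Look at what that means.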
C1 = cut(training$Petal.Length, 4)
C2 = cut(test$Petal.Length, 4)
levels(C1)
[1] "(0.994,2.42]" "(2.42,3.85]" "(3.85,5.28]" "(5.28,6.71]"
levels(C2)
[1] "(1.09,2.55]" "(2.55,4]" "(4,5.45]" "(5.45,6.91]"
The levels are completely different. There is no way that your model can be used on these different levels. You can see the bin boundaries for C1 so it is tempting to just use those boundaries and partition the test data.
levels(C1)
[1] "(0.994,2.42]" "(2.42,3.85]" "(3.85,5.28]" "(5.28,6.71]"
CutPoints = c(0.994, 2.42, 3.85, 5.28, 6.71)
C2 = cut(test$Petal.Length, breaks=CutPoints, include.lowest=TRUE)
But on careful examination, you will see that this did not work. Just print out a relevant piece of the data:
C2[42:46]
[1] (5.28,6.71] (5.28,6.71] <NA> (3.85,5.28] (3.85,5.28]
C2[44] is undefined. Why? One of the values in the test set fell outside the range of values for the training set, so it does not belong in any bin.
test$Petal.Length[44]
[1] 6.9
So what you really need to do is impose no lower or upper limit: replace the outer cut points with -Inf and Inf.
## cut the training data to get cut points
C1 = cut(training$Petal.Length, 4)
levels(C1)
[1] "(0.994,2.42]" "(2.42,3.85]" "(3.85,5.28]" "(5.28,6.71]"
CutPoints = c(-Inf, 2.42, 3.85, 5.28, Inf)
It may be easiest to just make new data.frames with the binned data:
Binned.training = training
Binned.training$Petal.Length = cut(training$Petal.Length, CutPoints)
Binned.test = test
Binned.test$Petal.Length = cut(test$Petal.Length, CutPoints)
fit=lm(Petal.Width ~ Petal.Length, data=Binned.training)
predict(fit,Binned.test)
## No errors
This will work for your test data and any data that you get in the future.
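Typing the cut points by hand copies the rounded labels (2.42 rather than the exact break). A small hypothetical helper, make_breaks(), can compute the same equal-width interior breaks that cut(x, 4) uses from the training range and open up both ends with -Inf and Inf, so the training and test data share identical levels. It reuses the iris split from above, with the index renamed idx so it does not mask T (the shorthand for TRUE).

```r
# Hypothetical helper: equal-width bins from the training range,
# with open-ended outer bins so any new value falls in some bin
make_breaks <- function(x, n_bins) {
  r <- range(x, na.rm = TRUE)
  edges <- seq(r[1], r[2], length.out = n_bins + 1)
  c(-Inf, edges[-c(1, n_bins + 1)], Inf)  # keep only the interior edges
}

MyData <- iris[, 3:4]
set.seed(2017)
idx <- sample(150, 100)
training <- MyData[idx, ]
test <- MyData[-idx, ]

CutPoints <- make_breaks(training$Petal.Length, 4)

Binned.training <- training
Binned.training$Petal.Length <- cut(training$Petal.Length, CutPoints)
Binned.test <- test
Binned.test$Petal.Length <- cut(test$Petal.Length, CutPoints)

fit <- lm(Petal.Width ~ Petal.Length, data = Binned.training)
pred <- predict(fit, Binned.test)
length(pred)  # one prediction per test row
```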

How would I run an ANOVA in R on this long form data?

My factors are constraint (high or low), picture type (a, b, c, d), and electrode (29 values).
My dependent variable is the amplitude measured at the electrodes, so the design is 2 x 4 x 29. How can I run the ANOVA so that R does not treat each electrode as a separate measurement but as a level of the 'electrode' factor?
This is what I have tried so far, but I get an error:
anova1 <- ezANOVA(data=dat, dv=n_400, wid=subject, within=.(constraint, ending, electrode), type="III")
Error in ezANOVA_main(data = data, dv = dv, wid = wid, within = within, :
One or more cells is missing data. Try using ezDesign() to check your data.
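The message means that at least one subject has no observation for at least one combination of constraint, ending and electrode; a fully within-subject ANOVA needs every subject in every cell. Besides ezDesign(), a base-R cross-tabulation can list the empty cells. The small simulated dat below (3 subjects, with one row deliberately removed) is a hypothetical stand-in for the real data; the column names follow the ezANOVA call above.

```r
set.seed(1)

# Hypothetical long-format data: 3 subjects x 2 constraint x 4 ending x 29 electrodes
dat <- expand.grid(
  subject    = factor(1:3),
  constraint = factor(c("high", "low")),
  ending     = factor(c("a", "b", "c", "d")),
  electrode  = factor(paste0("E", 1:29))
)
dat$n_400 <- rnorm(nrow(dat))

# Drop one row to simulate a missing cell (subject 1, high, a, E1)
dat <- dat[-1, ]

# Count observations per subject within every combination of the within factors
counts <- xtabs(~ subject + constraint + ending + electrode, data = dat)
missing_cells <- as.data.frame(counts)
missing_cells <- missing_cells[missing_cells$Freq == 0, ]
print(missing_cells)
```

Any subject listed there has an empty cell; those cells have to be filled, or the subject dropped, before ezANOVA will run the full within-subject design.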
