Using clmm with a random factor: how do I read the output? - r

I have a data set of trees from different populations (Pop_ID) that I scored based on leaf senescence (Score) at different moments in time (Date).
I want to use a mixed model for ordinal data with Gen_ID (genotype) as a random factor. I use the ordinal package.
fit <- clmm(Score ~ Date*Pop_ID + (1|Gen_ID), data = subset)
Here is what my data looks like:
ID2 Gen_ID Pop_Id Country Score Datum Date
1 2 FR213 FR2 France 2 23okt 1
2 4 PO148 PO2 Poland 3 23okt 1
3 5 FR218 FR6 France 2 23okt 1
4 6 NE91 NE4 The Netherlands 2 23okt 1
5 8 FR21 FR1 France 2 23okt 1
6 9 D68 DU2 Germany 2 23okt 1
I don't understand the different options that clmm uses, such as link or threshold.
I am also unsure how to read the output. I would like to do an anova but I get the following error:
Error in (function (object, ...) : anova is not implemented for a
single "clm" object
Anova(fit) also gives me an error:
Error in vcov.clm(object, method = "Cholesky") : Cannot compute vcov:
Hessian is not positive definite
I would also like to use lsmeans, but I am not sure how:
lsmeans(fit, ~Datum2, mode="linear.predictor")
This gives me the following error:
Error in vcov.clm(object, method = "Cholesky") : Cannot compute vcov:
Hessian is not positive definite
How can I solve this?
When would you use Hess=T in your model?
I would appreciate the help. Thank you!
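For what it's worth, a minimal sketch of where the link, threshold, and Hess arguments go, assuming the column names shown above and that Score is coded as an ordered factor; it does not by itself fix the non-positive-definite Hessian, and the emmeans lines are only one common route to marginal means:
library(ordinal)

subset$Score <- factor(subset$Score, ordered = TRUE)   # clmm needs a factor response

fit <- clmm(Score ~ Date * Pop_ID + (1 | Gen_ID), data = subset,
            link = "logit",          # link for the cumulative probabilities (the default)
            threshold = "flexible",  # freely estimated cut-points between categories (the default)
            Hess = TRUE)             # keep the Hessian so summary()/vcov() can give standard errors

# anova() is not defined for a single clmm fit (hence the error above), but it does
# compare two nested fits with a likelihood-ratio test:
fit_main <- clmm(Score ~ Date + Pop_ID + (1 | Gen_ID), data = subset, Hess = TRUE)
anova(fit_main, fit)

# For marginal means, the emmeans package (the successor of lsmeans) supports clmm:
# library(emmeans)
# emmeans(fit, ~ Pop_ID | Date, mode = "linear.predictor")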

Related

Inputs of two separate predict() calls return the same set of fitted values

Confession: I attempted to ask this question yesterday, but used a sample dataset congruent with my "real" data in hopes this would be more convenient for readers here. One issue was resolved, but another remains that I cannot get past.
My objective is to create two predicted vectors, "yC.hat" and "yT.hat", from a linear model; they are meant to project average effects on pri2000v as a function of the average poverty level I(avgpoverty^2) under control (treatment = 0) and treatment (treatment = 1) conditions.
While I appear to have no issues running the regression itself, the inputs of my data argument have no effect on predict(), and only the object itself affects the output. As a result, treatment = 0 and treatment = 1 in the data argument result in the same fitted values. In fact, I can plug ANY value into the data argument and it makes no difference. So I suspect my failure to understand the issue starts here.
Here is my code:
q6rega <- lm(pri2000v ~ treatment + I(log(pobtot1994)) + I(avgpoverty^2)
#interactions
+ treatment:avgpoverty + treatment:I(avgpoverty^2), data = pga)
## predicted PRI support under the Treatment condition
q6.yT.hat <- predict(q6rega,
data = data.frame(I(avgpoverty^2) = 9:25, treatment = 1))
## predicted PRI support rate under the Control condition
q6.yC.hat <- predict(q6rega,
data = data.frame(I(avgpoverty^2) = 9:25, treatment = 0))
q6.yC.hat == q6.yT.hat
TRUE[417]
dput(pga) has been posted on my GitHub, if needed.
EDIT: There were a few things wrong with my code above, but not specifying pobtot1994 somehow resulted in R treating it as if newdata had been omitted. Since I'm fairly new to statistics, I confused fitted values with the prediction output that I was actually trying to achieve. I would have expected an unexpected input to produce an error instead.
I'm surprised you are able to run a prediction when the new data frame is missing a variable (pobtot1994) required by your model.
Anyway, you need to create a new data frame containing, in untransformed form, the three variables used in the model. Since you are interested in comparing the fitted values for avgpoverty 3 to 5 under treatment 1 and 0, you need to hold the third variable, pobtot1994, constant. I use the mean of pobtot1994 here for simplicity.
newdat <- expand.grid(avgpoverty=3:5, treatment=factor(c(0,1)), pobtot1994=mean(pga$pobtot1994))
avgpoverty treatment pobtot1994
1 3 0 2037.384
2 4 0 2037.384
3 5 0 2037.384
4 3 1 2037.384
5 4 1 2037.384
6 5 1 2037.384
The prediction will show you the different values for the two conditions.
newdat$fitted <- predict(q6rega, newdata=newdat)
avgpoverty treatment pobtot1994 fitted
1 3 0 2037.384 38.86817
2 4 0 2037.384 50.77476
3 5 0 2037.384 55.67832
4 3 1 2037.384 51.55077
5 4 1 2037.384 49.03148
6 5 1 2037.384 59.73910
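If a picture of the two conditions is more useful than the table, the same newdat object can be plotted directly; a small sketch, assuming ggplot2 is installed (the model formula applies the log and squared transformations itself, so newdat only needs the raw variables):
library(ggplot2)

ggplot(newdat, aes(x = avgpoverty, y = fitted, colour = treatment)) +
  geom_point() +
  geom_line() +
  labs(x = "avgpoverty", y = "predicted pri2000v")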

Errors using powerSim and powerCurve for a clmm in R

I'm new to clmm and have run into the following problem:
I want to obtain the optimal sample size for my study with R using powerSim and powerCurve. Because my data is ordinal, I'm using a clmm. Study participants (VPN) should evaluate three sentence types (SH1, SM1, and SP1) on a 5-point Likert scale (evaluation.likert). I need to account for my participants as a random factor, while the sentence types and the evaluation are my fixed factors.
Here's a glimpse of my data (count of VPN goes up to 40 for each of the parameters, I just shortened it here):
VPN parameter evaluation.likert
1 1 SH1 2
2 2 SH1 4
3 3 SH1 5
4 4 SH1 3
...
5 1 SM1 4
6 2 SM1 2
7 3 SM1 2
8 4 SM1 5
...
9 1 SP1 1
10 2 SP1 1
11 3 SP1 3
12 4 SP1 5
...
Now, with some help I created the following model:
clmm(likert~parameter+(1|VPN), data=dfdata)
With this model, I'm doing the simulation:
ps1 <- powerSim(power, test=fixed("likert:parameter", "anova"), nsim=40)
Warning:
In observedPowerWarning(sim) :
This appears to be an "observed power" calculation
print(ps1)
Power for predictor 'likert:parameter', (95% confidence interval):
0.00% ( 0.00, 8.81)
Test: Type-I F-test
Based on 40 simulations, (0 warnings, 40 errors)
alpha = 0.05, nrow = NA
Time elapsed: 0 h 0 m 0 s
nb: result might be an observed power calculation
In the above example, I tried it with 40 participants, but I also already ran a simulation with 2,000,000 participants to check whether I just need a huge number of people. The result was the same though: 0.0%.
lastResult()$errors tells me that I'm using a method which is not applicable for clmm:
no applicable method for 'simulate' applied to an object of class "clmm"
But besides the anova I'm doing here, I've also already tried z, t, f, chisq, lr, sa, kr, pb. (And instead of test=fixed, I've also already tried test=compare, test=fcompare, test=rcompare, and even test=random())
So I guess there must be something wrong with my model? Or are none of these methods really applicable to clmms?
Many thanks in advance, your help is already very much appreciated!
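The error from lastResult() suggests that powerSim() ultimately needs a simulate() method, which the ordinal package does not provide for clmm objects, so one fallback is to write the simulation loop by hand: generate ordinal responses under an assumed effect, refit the model, and count how often a likelihood-ratio test is significant. The sketch below is only an illustration; the effect sizes, cut-points, and random-intercept SD are invented and would need to be replaced by values plausible for the actual study:
library(ordinal)

simulate_once <- function(n_vpn = 40,
                          effect = c(SH1 = 0, SM1 = 0.5, SP1 = 1)) {   # assumed effects
  dat <- expand.grid(VPN = factor(seq_len(n_vpn)),
                     parameter = factor(names(effect)))
  # latent scale: fixed effect + participant intercept + logistic noise
  b_vpn <- rnorm(n_vpn, sd = 0.5)                                      # assumed random-intercept SD
  latent <- effect[as.character(dat$parameter)] +
            b_vpn[as.integer(dat$VPN)] + rlogis(nrow(dat))
  # cut the latent variable into an ordered 5-point response (cut-points invented)
  dat$evaluation.likert <- cut(latent, breaks = c(-Inf, -1, 0, 1, 2, Inf),
                               labels = 1:5, ordered_result = TRUE)
  full <- clmm(evaluation.likert ~ parameter + (1 | VPN), data = dat)
  null <- clmm(evaluation.likert ~ 1 + (1 | VPN), data = dat)
  anova(null, full)$"Pr(>Chisq)"[2]                                    # LRT p-value for parameter
}

p_vals <- replicate(50, simulate_once())   # use many more simulations in practice
mean(p_vals < 0.05)                        # estimated power under these assumptions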

Adjusted survival curve based on weighted Cox regression

I'm trying to make an adjusted survival curve based on a weighted Cox regression performed on a case-cohort data set in R, but unfortunately, I can't make it work. I was therefore hoping that some of you may be able to figure out why it isn't working.
In order to illustrate the problem, I have used (and adjusted a bit) the example from the "Package 'survival'" document, which means I'm working with:
data("nwtco")
subcoh <- nwtco$in.subcohort
selccoh <- with(nwtco, rel==1|subcoh==1)
ccoh.data <- nwtco[selccoh,]
ccoh.data$subcohort <- subcoh[selccoh]
ccoh.data$age <- ccoh.data$age/12 # Age in years
fit.ccSP <- cch(Surv(edrel, rel) ~ stage + histol + age,
data =ccoh.data,subcoh = ~subcohort, id=~seqno, cohort.size=4028, method="LinYing")
The data set is looking like this:
seqno instit histol stage study rel edrel age in.subcohort subcohort
4 4 2 1 4 3 0 6200 2.333333 TRUE TRUE
7 7 1 1 4 3 1 324 3.750000 FALSE FALSE
11 11 1 2 2 3 0 5570 2.000000 TRUE TRUE
14 14 1 1 2 3 0 5942 1.583333 TRUE TRUE
17 17 1 1 2 3 1 960 7.166667 FALSE FALSE
22 22 1 1 2 3 1 93 2.666667 FALSE FALSE
Then, I'm trying to illustrate the effect of stage in an adjusted survival curve, using the ggadjustedcurves function from the survminer package:
library(survminer)
ggadjustedcurves(fit.ccSP, variable = ccoh.data$stage, data = ccoh.data)
#Error in survexp(as.formula(paste("~", variable)), data = ndata, ratetable = fit) :
# Invalid rate table
But unfortunately, this is not working. Can anyone figure out why? And can this somehow be fixed or done in another way?
Essentially, I'm looking for a way to graphically illustrate the effect of a continuous variable in a weighted Cox regression performed on a case-cohort data set, so I would also be interested in hearing whether there are alternatives to adjusted survival curves.
There are two reasons it is throwing errors:
The ggadjustedcurves function is not being given a coxph object, which its help page indicates is the intended first argument.
The specification of the variable argument is incorrect. The correct way to specify a column is with a length-1 character vector that matches one of the names in the formula. You gave it a vector of length 1154 instead.
This code succeeds:
fit.ccSP <- coxph(Surv(edrel, rel) ~ stage + histol + age,
data =ccoh.data)
ggadjustedcurves(fit.ccSP, variable = 'stage', data = ccoh.data)
It might not give you everything you want, but it does answer the "why the error" part of your question. You might want to review the methods used by Therneau, Cynthia S. Crowson, and Elizabeth J. Atkinson in their paper on adjusted curves:
https://cran.r-project.org/web/packages/survival/vignettes/adjcurve.pdf
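For the broader question about a continuous covariate, a sketch of one alternative is to predict survival curves from the coxph fit above at a few fixed values while holding the other covariates constant; the reference values for age, stage, and histol below are arbitrary choices, not part of the original question:
library(survival)

newdat <- expand.grid(age = c(1, 3, 5),   # ages in years to compare (arbitrary)
                      stage = 3,          # held fixed (arbitrary)
                      histol = 1)         # held fixed (arbitrary)

sf <- survfit(fit.ccSP, newdata = newdat)            # one curve per row of newdat
plot(sf, col = 1:3, xlab = "Time (days)", ylab = "Survival probability")
legend("bottomleft", legend = paste("age =", c(1, 3, 5)), col = 1:3, lty = 1)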

Fixing an error in R: "Incorrect number of dimensions" in the Dunn test

I am trying to use the Dunn test for a comparison, but I am getting an error: "Error in Psort[1, i] : incorrect number of dimensions"
The data I am trying to use looks something like this (but the sample size is bigger):
Frequency Height
1 10
2 11
1 9
1 8
2 15
1 9
2 11
2 13
The code I used was:
dunnTest(Height ~ Frequency,
data=Data,
method="bh")
Is my problem that my Frequency is only split into two groups? For another factor, where my Frequency had three groups, it worked fine. If this is the problem, is there another test I can do that will perform a similar or the same function?
Thanks!
The Dunn test is equivalent to the Wilcoxon test (wilcox.test) if you adjust the input parameters (disable the exact calculation of the p-value and disable the continuity correction; more here). For your data, one obtains:
> wilcox.test(df$Frequency, df$Height, correct = FALSE, exact = FALSE)
Wilcoxon rank sum test
data: df$Frequency and df$Height
W = 0, p-value = 0.0006346
alternative hypothesis: true location shift is not equal to 0
I think you are using the dunnTest function from the FSA package. This function fails for two groups.
Data
df <- read.table(text="Frequency Height
1 10
2 11
1 9
1 8
2 15
1 9
2 11
2 13", header=TRUE)

cv.MclustDA() error: Error in data[-folds[[i]], , drop = FALSE] : incorrect number of dimensions

I'm using the mclust package for R to classify a univariate dataset, then assign classifications to new data using the Discriminant Analysis functionality. When trying to calculate the cross-validation error rate using cv.MclustDA(), I keep getting an error. Below are the code and the error. The model object works fine, but something isn't right when doing the cross-validation on that model object. Can anyone shed any light on what this error is? It's obviously failing based on the nfolds= argument, but changing the number doesn't help.
> DA_mclust_AmazData_3group=MclustDA(data=Combined_AmazData[,4], class=Combined_AmazData[,13])
> summary(DA_mclust_AmazData_3group)
------------------------------------------------
Gaussian finite mixture model for classification
------------------------------------------------
MclustDA model summary:
log.likelihood n df BIC
-Inf 29 18 -Inf
Classes n Model G
1 12 E 4
2 8 X 1
3 9 E 4
Training classification summary:
Predicted
Class 1 2 3
1 12 0 0
2 0 8 0
3 0 0 9
Training error = 0
>
> cv.MclustDA(DA_mclust_AmazData_3group)
cross-validating...
|
| 0%Error in data[-folds[[i]], , drop = FALSE] :
incorrect number of dimensions
The maintainer of the mclust package got in touch with me. This was a bug in the handling of the univariate case and will be fixed in the next version, which should be posted to CRAN soon.
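Until the fixed release lands on CRAN, one workaround that may help (untested, and purely an assumption on my part) is to keep the univariate predictor as a one-column data frame, so that the two-dimensional indexing inside cv.MclustDA() has something matrix-like to subset:
library(mclust)

# keep the single predictor column two-dimensional with drop = FALSE
DA_mclust_AmazData_3group <- MclustDA(data  = Combined_AmazData[, 4, drop = FALSE],
                                      class = Combined_AmazData[, 13])
cv.MclustDA(DA_mclust_AmazData_3group)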
