I am doing an AICc analysis of my insect biomass per hour data in R to find which of the environmental predictors I measured influence biomass the most. I am fitting GLMMs with a Gamma distribution and a "log" link function for my model competition. All my models converge except my null model, and I am still struggling to understand why this is happening. Does anybody have an idea? Here is my code in R:
What my data look like:
> str(insectnona)
'data.frame': 76 obs. of 28 variables:
$ TIME : Factor w/ 7 levels "2016_6","2016_7",..: 4 6 7 3 2 4 6 7 2 3 ...
$ JULIAN : Factor w/ 28 levels "147","148","149",..: 3 9 23 24 16 2 11 20 10 19 ...
$ SITE : Factor w/ 8 levels "1","3","5","12",..: 1 1 1 1 2 3 3 3 4 4 ...
$ HABITAT : Factor w/ 3 levels "C","E","F": 1 1 1 1 1 1 1 1 1 1 ...
$ TEMP_CIVIL : num 17.8 18.9 21.1 15 16 ...
$ BIO_ZONE : Factor w/ 3 levels "ESSFwh3","ICHdw1",..: 2 2 2 2 2 2 2 2 1 1 ...
$ AGE_CLASS : Factor w/ 3 levels "6","7","8": 2 2 2 2 2 1 1 1 3 3 ...
$ RICHNESS : int 4 9 9 9 8 6 8 8 3 2 ...
$ ARANEAE_Btot: num 0 0.1 0.1 6.9 3.73 ...
$ COL_Btot : num 2152.4 66.8 88.4 6.9 80.4 ...
$ DIP_Btot : num 72.8 39.6 17.7 20.9 132.4 ...
$ EPH_Btot : num 0 0 0 10.2 0.0333 ...
$ HEM_Btot : num 0 0.1 18.5 0 0 ...
$ HOM_Btot : num 0 14.9 30 6.2 0 ...
$ HYM_Btot : num 40.9 65.6 36.5 38 36.7 ...
$ LEP_Btot : num 161 2625 696 390 869 ...
$ NEU_Btot : num 0 0.1 3 15.5 10.6 ...
$ ORT_Btot : num 0 24.8 0 0 0 0 0 0 0 0 ...
$ PSO_Btot : num 0 0 0 0 0 0 9.3 0 0 0 ...
$ THY_Btot : num 0 0 0 0 0 0 0 0 0 0 ...
$ TRI_Btot : num 0 0 34.5 20.3 4.4 ...
$ BIOMASS_tot : num 2427 2837 924 515 1138 ...
$ OTHER_Btot : num 114 145 140 118 188 ...
$ COL_bhr : num 321.254 10.603 10.914 0.843 11.518 ...
$ LEP_bhr : num 24.1 416.7 85.9 47.7 124.5 ...
$ BIOMASS_hr : num 362.3 450.3 114.1 62.9 162.9 ...
$ RICHNESS_hr : num 0.597 1.429 1.111 1.1 1.146 ...
$ sTEMP_CIVIL : num 0.6228 0.8304 1.263 0.0796 0.2736 ...
My model competition:
library(lme4)        # glmer()
library(AICcmodavg)  # aictab()
modl <- list()
modl[[1]] <- glmer(BIOMASS_hr ~ AGE_CLASS + HABITAT + (1|SITE) + (1|TIME), data = insectnona, family = Gamma(link = "log"))
modl[[2]] <- glmer(BIOMASS_hr ~ HABITAT + (1|SITE) + (1|TIME), data = insectnona, family = Gamma(link = "log"))
modl[[3]] <- glmer(BIOMASS_hr ~ AGE_CLASS + (1|SITE) + (1|TIME), data = insectnona, family = Gamma(link = "log"))
modl[[4]] <- glmer(BIOMASS_hr ~ BIO_ZONE + (1|SITE) + (1|TIME), data = insectnona, family = Gamma(link = "log"))
modl[[5]] <- glmer(BIOMASS_hr ~ sTEMP_CIVIL + (1|SITE) + (1|TIME), data = insectnona, family = Gamma(link = "log"))
modl[[6]] <- glmer(BIOMASS_hr ~ AGE_CLASS + HABITAT + sTEMP_CIVIL + (1|SITE) + (1|TIME), data = insectnona, family = Gamma(link = "log"))
modl[[7]] <- glmer(BIOMASS_hr ~ HABITAT + sTEMP_CIVIL + (1|SITE) + (1|TIME), data = insectnona, family = Gamma(link = "log"))
modl[[8]] <- glmer(BIOMASS_hr ~ HABITAT + BIO_ZONE + (1|SITE) + (1|TIME), data = insectnona, family = Gamma(link = "log"))
modl[[9]] <- glmer(BIOMASS_hr ~ AGE_CLASS + HABITAT + BIO_ZONE + (1|SITE) + (1|TIME), data = insectnona, family = Gamma(link = "log"))
modl[[10]] <- glmer(BIOMASS_hr ~ 1 + (1|SITE) + (1|TIME), data = insectnona, family = Gamma(link = "log"))
aictab(modl)
And then I get this warning message only for the null model (model 10):
Warning message: In checkConv(attr(opt, "derivs"), opt$par, ctrl =
control$checkConv, : Model failed to converge with max|grad| =
0.0169244 (tol = 0.001, component 1)
Thanks in advance for your help!
In my data, Hemoglobin_group is a categorical variable with 6 levels. When I run the below code, I can't see Hemoglobin_group levels in the output. How can I solve this problem?
fit <- coxph(Surv(Time,Status)~Hemoglobin_group, data)
fit
My output:
coef exp(coef) se(coef) z p
Hemoglobin_group -0.06585 0.93627 0.01874 -3.514 0.000441
str(data)
tibble [2,021 x 21] (S3: tbl_df/tbl/data.frame)
$ status : num [1:2021] 1 1 1 1 1 1 0 0 0 0 ...
$ id : num [1:2021] 1 1 1 1 1 1 2 2 2 2 ...
$ Time : num [1:2021] 20.3 20.3 20.3 20.3 20.3 ...
$ t1 : num [1:2021] 0 1 2 3 4 5 0 1 2 3 ...
$ t2 : num [1:2021] 1 2 3 4 5 ...
$ sex_string : chr [1:2021] "MALE" "MALE" "MALE" "MALE"...
$ sex : num [1:2021] 1 1 1 1 1 1 1 1 1 1 ...
$ age : num [1:2021] 89 89 89 89 89 89 77 77 77 77 ...
$ hemoglobin : num [1:2021] 9.71 10.22 11.3 11.8 11.2 ...
$ Diabet : num [1:2021] 1 1 1 1 1 1 2 2 2 2 ...
$ Hemoglobin_group : num [1:2021] 4 4 4 4 4 4 5 5 5 5 ...
$ Kreatinin : num [1:2021] 7.19 7.19 7.19 7.19 7.19 ...
$ fosfor : num [1:2021] 4.14 4.14 4.14 4.14 4.14 ...
$ Kalsiyum : num [1:2021] 8.5 8.5 8.5 8.5 8.5 ...
$ CRP : num [1:2021] 1.33 1.33 1.33 1.33 1.33 ...
$ Albumin : num [1:2021] 4.19 4.19 4.19 4.19 4.19 ...
$ Ferritin : num [1:2021] 428 428 428 428 428 ...
$ months : num [1:2021] 1 2 3 4 5 6 1 2 3 4 ...
This is what I see when I run str(data). I thought I had converted these variables to factors with the code below, but the conversion apparently did not take effect, and I do not understand why.
The code I wrote to convert them to factors was as follows:
sex<-as.factor(sex)
is.factor(sex)
Diabet<-as.factor(Diabet)
is.factor(Diabet)
Status<-as.factor(Status)
is.factor(Status)
months<-as.factor(months)
is.factor(months)
Hemoglobin_group<-as.factor(Hemoglobin_group)
is.factor(Hemoglobin_group)
When I run this code, the R console shows:
> sex<-as.factor(sex)
> is.factor(sex)
[1] TRUE
>
> Diabet<-as.factor(Diabet)
> is.factor(Diabet)
[1] TRUE
>
>
> Status<-as.factor(Status)
> is.factor(Status)
[1] TRUE
>
> months<-as.factor(months)
> is.factor(months)
[1] TRUE
>
> Hemoglobin_group<-as.factor(Hemoglobin_group)
> is.factor(Hemoglobin_group)
[1] TRUE
In this case, don't the categorical variables in my data turn into factors?
Your variable Hemoglobin_group is probably being treated as a numeric value. Try:
Hemoglobin_groupF <- factor(Hemoglobin_group)
fit <- coxph(Surv(Time,Status) ~ Hemoglobin_groupF, data)
fit
The reference group will be the first factor level. You can easily change the reference level with the function relevel.
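One thing worth checking: a call such as Hemoglobin_group <- as.factor(Hemoglobin_group) creates a separate object in the workspace and leaves the column inside data unchanged, which would explain why str(data) still reports num. Below is a minimal sketch of converting the column in place and then changing the reference level. It uses the lung data set shipped with survival as a stand-in, with ph.ecog playing the role of Hemoglobin_group; the names d and ph.ecog_f are only illustrative.

```r
library(survival)

# Stand-in data: ph.ecog is stored as a number in `lung`
d <- lung[!is.na(lung$ph.ecog), ]

# Numeric predictor: a single slope coefficient
fit_num <- coxph(Surv(time, status) ~ ph.ecog, data = d)

# Convert the column inside the data frame, then refit
d$ph.ecog_f <- factor(d$ph.ecog)
fit_fac <- coxph(Surv(time, status) ~ ph.ecog_f, data = d)
names(coef(fit_fac))   # one coefficient per non-reference level

# Change the reference level with relevel() and refit
d$ph.ecog_f <- relevel(d$ph.ecog_f, ref = "1")
fit_ref <- coxph(Surv(time, status) ~ ph.ecog_f, data = d)
names(coef(fit_ref))
```

The same pattern applied to the question would be data$Hemoglobin_group <- factor(data$Hemoglobin_group) before calling coxph.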
I'm trying to run cross-validation for boosted regression/classification trees using the function gbm.step() from the R package dismo, but it returns an empty output and I can't figure out why. This is the code I'm using:
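DFbrt_df <- DFbrt@data   # assuming DFbrt is an S4 object with a data slot
DFbrt_df2 <- na.omit(DFbrt_df)
# Look up the column indices only after DFbrt_df2 exists
ColIndexCov <- match(names(myRS), colnames(DFbrt_df2))
ColIndexResp <- match("HasRes", colnames(DFbrt_df2))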
myBRT = gbm.step(data=DFbrt_df2,
gbm.x = ColIndexCov,
gbm.y = ColIndexResp,
tree.complexity = 3,
learning.rate = 10^(-8),
n.trees = 50,
family = "bernoulli",
n.folds = 4,
fold.vector = DFbrt_df2$Region.num,
step.size = 50,
verbose = F,
silent = T
)
str(DFbrt_df2)
'data.frame': 560845 obs. of 18 variables:
$ Nsamples : num 310 310 310 310 310 310 310 310 310 310 ...
$ cluster : num 39 39 39 39 39 39 39 39 39 39 ...
$ R : num 44.9 44.9 44.9 44.9 44.9 ...
$ P50 : num 0.565 0.544 0.609 0.605 0.593 ...
$ regions : Factor w/ 6 levels "China_east","China_middlesouth",..: 1 1 1 1 1 1 1 1 1 1 ...
$ HasRes : num 1 0 1 0 0 0 1 1 0 0 ...
$ use : num 10.02 9.75 0 9.38 8.77 ...
$ acc : num 0 0 0.4103 0.0769 0.0779 ...
$ tmp : num 2.46 2.46 2.46 2.46 2.45 ...
$ irg : num 1.788 0.399 1.205 1.836 1.841 ...
$ PgExt : num 3.11 0 3.7 3.11 3.18 ...
$ PgInt : num 4.69 2.76 0 3.99 2.22 ...
$ ChExt : num 3.74 0 4.33 3.74 3.81 ...
$ ChInt : num 5.01 5.99 5.35 4.88 4.97 ...
$ Ca : num 0 0 2.71 0 2.8 ...
$ veg : num 0 0 0 0 0 0 0 0 0 0 ...
$ Region.num: num 4 4 4 4 4 4 4 4 4 4 ...
$ Region : num 4 4 4 4 4 4 4 4 4 4 ...
- attr(*, "na.action")= 'omit' Named int 1 2 3 4 5 6 7 8 9 10 ...
..- attr(*, "names")= chr "1" "2" "3" "4" ...
The response variable is HasRes and the covariates are use, acc, tmp, irg, PgExt, PgInt, ChExt, ChInt, Ca and veg.
library(survival)
etime <- with(mgus2, ifelse(pstat==0, futime, ptime))
event <- with(mgus2, ifelse(pstat==0, 2*death, 1))
event <- factor(event, 0:2, labels=c("censor", "pcm", "death"))
mfit2 <- survfit(Surv(etime, event) ~ sex, data=mgus2)
plot(mfit2, col=c(1,2,1,2), lty=c(2,2,1,1),
mark.time=FALSE, lwd=2, xscale=12,
xlab="Years post diagnosis", ylab="Probability in State")
legend(240, .6, c("death:female", "death:male", "pcm:female", "pcm:male"),
col=c(1,2,1,2), lty=c(1,1,2,2), lwd=2, bty='n')
This is a reproducible example. How can I extract these data from 'mfit2' so that they can be plotted with ggplot2?
You can extract the data from the summary of the fitted object using lapply.
sfit <- summary(mfit2)
str(sfit)
List of 24
$ n : int [1:2] 631 753
$ time : num [1:359] 1 2 3 4 5 6 7 8 9 10 ...
$ n.risk : int [1:359, 1:3] 631 610 599 595 588 587 581 580 573 569 ...
$ n.event : int [1:359, 1:3] 0 0 0 0 0 0 0 0 0 0 ...
$ n.censor : num [1:359] 1 0 0 0 0 0 0 0 0 1 ...
$ pstate : num [1:359, 1:3] 0.968 0.951 0.944 0.933 0.932 ...
$ p0 : num [1:2, 1:3] 1 1 0 0 0 0
..- attr(*, "dimnames")=List of 2
.. ..$ : chr [1:2] "sex=F" "sex=M"
.. ..$ : chr [1:3] "(s0)" "pcm" "death"
$ strata : Factor w/ 2 levels "sex=F","sex=M": 1 1 1 1 1 1 1 1 1 1 ...
...
I think the columns you need are time, pstate and strata, but some others, such as the numbers at risk, may also be useful.
cols <- lapply(c(2:6, 8, 16, 17), function(x) sfit[x])
Then combine these columns into a data frame with do.call:
data <- do.call(data.frame, cols)
str(data)
'data.frame': 359 obs. of 21 variables:
$ time : num 1 2 3 4 5 6 7 8 9 10 ...
$ n.risk.1 : int 631 610 599 595 588 587 581 580 573 569 ...
$ n.risk.2 : int 0 0 0 0 0 0 0 0 0 0 ...
$ n.risk.3 : int 0 0 0 0 0 0 0 0 0 0 ...
$ n.event.1: int 0 0 0 0 0 0 0 0 0 0 ...
$ n.event.2: int 0 2 0 1 0 1 0 0 2 1 ...
$ n.event.3: int 20 9 4 6 1 5 1 7 2 2 ...
$ n.censor : num 1 0 0 0 0 0 0 0 0 1 ...
$ pstate.1 : num 0.968 0.951 0.944 0.933 0.932 ...
$ pstate.2 : num 0 0.00317 0.00317 0.00476 0.00476 ...
$ pstate.3 : num 0.0317 0.046 0.0523 0.0619 0.0634 ...
$ strata : Factor w/ 2 levels "sex=F","sex=M": 1 1 1 1 1 1 1 1 1 1 ...
$ lower.1 : num 0.955 0.934 0.927 0.914 0.912 ...
$ lower.2 : num NA 0.000796 0.000796 0.00154 0.00154 ...
$ lower.3 : num 0.0206 0.0322 0.0375 0.0456 0.047 ...
$ upper.1 : num 0.982 0.968 0.963 0.953 0.952 ...
$ upper.2 : num NA 0.0127 0.0127 0.0147 0.0147 ...
$ upper.3 : num 0.0488 0.0656 0.0729 0.0838 0.0856 ...
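How do.call(data.frame, ...) flattens matrix components can be seen on a toy list (the names parts and wide are illustrative, not from the answer above). Each matrix column becomes its own data frame column with a numeric suffix, which is where names such as n.risk.1 and pstate.3 come from.

```r
# A list holding one vector and one two-column matrix,
# mimicking the mix of components in summary(mfit2)
parts <- list(time = 1:3, p = matrix(1:6, ncol = 2))

# do.call passes the list elements as named arguments to data.frame()
wide <- do.call(data.frame, parts)

names(wide)   # the matrix `p` is split into p.1 and p.2
```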
These data are in wide format; it is best to reshape them to long format for the graph.
library(tidyr)   # pivot_longer()
library(dplyr)   # filter(), mutate(), %>%
mgus3 <- data %>%
  pivot_longer(cols = -c(time, strata, n.censor),
               names_to = c(".value", "state"),
               names_pattern = "(.+)\\.(.+)") %>%   # split each name at its last dot
  filter(state != 1) %>% # Exclude the censored state
  mutate(state = factor(state, labels = c("pcm", "death")),
         group = interaction(strata, state))
Then plot it.
library(ggplot2)
mgus3 %>%
ggplot(aes(x=time, y=pstate, col=group)) +
geom_line(aes(linetype=group)) +
ylab("Probability in State") +
theme_bw()
You should be able to add confidence bands and make it prettier.
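As a rough sketch of the confidence bands, the long table can also be built by name rather than by position and drawn with geom_ribbon. This assumes that summary(mfit2) exposes pstate, lower and upper as three-column matrices in the state order shown in p0 above, which may differ between survival versions; state_labels, long and p are illustrative names.

```r
library(survival)
library(ggplot2)

# Same multi-state fit as in the question
etime <- with(mgus2, ifelse(pstat == 0, futime, ptime))
event <- with(mgus2, ifelse(pstat == 0, 2 * death, 1))
event <- factor(event, 0:2, labels = c("censor", "pcm", "death"))
mfit2 <- survfit(Surv(etime, event) ~ sex, data = mgus2)
sfit <- summary(mfit2)

# Stack the three state columns (column-major order) into long format
state_labels <- c("(s0)", "pcm", "death")   # assumed column order
n <- length(sfit$time)
long <- data.frame(
  time   = rep(sfit$time, times = 3),
  strata = rep(sfit$strata, times = 3),
  state  = rep(state_labels, each = n),
  pstate = as.vector(sfit$pstate),
  lower  = as.vector(sfit$lower),
  upper  = as.vector(sfit$upper)
)
long <- subset(long, state != "(s0)")        # drop the initial state
long$group <- interaction(long$strata, long$state)

# Lines for the estimates, semi-transparent ribbons for the intervals
p <- ggplot(long, aes(x = time / 12, y = pstate, colour = group, fill = group)) +
  geom_ribbon(aes(ymin = lower, ymax = upper), alpha = 0.2, colour = NA) +
  geom_line() +
  labs(x = "Years post diagnosis", y = "Probability in State") +
  theme_bw()
p
```

Dividing time by 12 mirrors the xscale=12 argument of the base plot, so the axis is in years.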
I've read my .CSV and then converted the file to a data frame using several methods including:
df<-read.csv('cdSH2015Fall.csv', dec = ".", na.strings = c("na"), header=TRUE,
row.names=NULL, stringsAsFactors=F)
df<-as.data.frame(lapply(df, unlist)) # converted .csv to a data.frame
str(df) # provides the structure of df.
'data.frame': 72 obs. of 16 variables:
$ trtGroup : Factor w/ 68 levels "AANN","AAPN",..: 5 7 14 18 20 23 27 33 37 48 ...
$ cd : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
$ PreviousExp : Factor w/ 2 levels "Empty","Enriched": 2 1 2 2 2 2 1 1 1 1 ...
$ treatment : Factor w/ 2 levels "NN","PN": 1 1 1 1 1 1 1 1 1 1 ...
$ total.Area.DarkBlue.: num 827 1037 663 389 983 ...
$ numberOfGroups : int 1 1 1 1 1 1 1 1 1 1 ...
$ totalGroupArea : num 15.72 2.26 9.45 11.57 9.73 ...
$ averageGrpArea : num 15.72 2.26 9.45 11.57 9.73 ...
$ proximityToPlants : num 5.65 16.05 2.58 9.65 4.74 ...
$ latFeed : num 2 0.5 0 1 0 0 1 0.5 2 1 ...
$ latBalloon : num 6 2 2 NA 0 0.1 3 0.5 1 0.7 ...
$ countChases : int 5 8 16 4 16 21 18 11 14 28 ...
$ chases : int 95 87 67 923 636 96 1210 571 775 816 ...
$ grpDiameter : num 16.8 23.3 19.5 11.2 29.9 ...
$ grpActiv : num 4908 5164 4197 5263 5377 ...
$ NND : num 0 11.88 8.98 3.6 9.8 ...
I then run my model two ways:
First option.
fit = t.test(df$proximityToPlants[which(df$cd == 1 & df$treatment == 'PN')],
             df$proximityToPlants[which(df$cd == 0 & df$treatment == 'PN')])
Second option trying to ensure I have a proper data frame.
Subset the data and then create a matrix.
cdProximityToPlantsPN<-cdSH2015Fall$proximityToPlants[which (cdSH2015Fall$cd==1 & cdSH2015Fall$treatment == 'PN')]
H2ProximityToPlantsPN<-cdSH2015Fall$proximityToPlants[which (cdSH2015Fall$cd==0 & cdSH2015Fall$treatment == 'PN')]
cdProximityToPlantsNN<-cdSH2015Fall$proximityToPlants[which (cdSH2015Fall$cd==1 & cdSH2015Fall$treatment == 'NN')]
H2ProximityToPlantsNN<-cdSH2015Fall$proximityToPlants[which (cdSH2015Fall$cd==0 & cdSH2015Fall$treatment == 'NN')]
Creating a matrix
df<-
cbind(cdProximityToPlantsPN,H2ProximityToPlantsPN,cdProximityToPlantsNN,
H2ProximityToPlantsNN)
mat <- sapply(df,unlist)
fit=t.test(mat[,1],mat[,2], paired = F, var.equal = T)
Yet, I still get errors when assessing outliers using the following:
outlierTest(fit) # Bonferroni p-value for most extreme obs
Error in UseMethod("outlierTest") :
no applicable method for 'outlierTest' applied to an object of class
"htest"
qqPlot(fit, main="QQ Plot") #qq plot for studentized resid
Error in order(x[good]) : unimplemented type 'list' in 'orderVector1'
leveragePlots(fit) # leverage plots
Error in formula.default(model) : invalid formula
I know the issue must be with my data structure. Any ideas on how to fix it?
I'm following up an old question addressed here:
calculate x-value of curve maximum of a smooth line in R and ggplot2
How could I calculate the Y-value of curve maximum?
Cheers
It would seem to me that code changes of "x" to "y" and 'vline' to 'hline' and "xintercept" to "yintercept" would be all that were needed:
gb <- ggplot_build(p1)
exact_y_value_of_the_curve_maximum <- gb$data[[1]]$y[which(diff(sign(diff(gb$data[[1]]$y)))==-2)+1]
p1 + geom_hline( yintercept =exact_y_value_of_the_curve_maximum)
exact_y_value_of_the_curve_maximum
I don't think I would call these "exact" since they are only numerical estimates. The other way to get that value would be
max(gb$data[[1]]$y)
The $data element of that build object can be examined:
> str(gb$data)
List of 2
$ :'data.frame': 80 obs. of 7 variables:
..$ x : num [1:80] 1 1.19 1.38 1.57 1.76 ...
..$ y : num [1:80] -123.3 -116.6 -109.9 -103.3 -96.6 ...
..$ ymin : num [1:80] -187 -177 -166 -156 -146 ...
..$ ymax : num [1:80] -59.4 -56.5 -53.5 -50.3 -46.9 ...
..$ se : num [1:80] 29.3 27.6 25.9 24.3 22.8 ...
..$ PANEL: int [1:80] 1 1 1 1 1 1 1 1 1 1 ...
..$ group: int [1:80] 1 1 1 1 1 1 1 1 1 1 ...
$ :'data.frame': 16 obs. of 4 variables:
..$ x : num [1:16] 1 2 3 4 5 6 7 8 9 10 ...
..$ y : num [1:16] -79.6 -84.7 -88.4 -74.1 -29.6 ...
..$ PANEL: int [1:16] 1 1 1 1 1 1 1 1 1 1 ...
..$ group: int [1:16] 1 1 1 1 1 1 1 1 1 1 ...
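To see what the diff(sign(diff(...))) expression picks out, here is a small sketch on a made-up vector (y and local_max_idx are illustrative names). It flags every local maximum, whereas max() returns only the overall largest value, so the two agree only when the smoothed curve has a single interior peak.

```r
# Made-up values with two interior peaks (7 and 9)
y <- c(1, 3, 7, 6, 2, 4, 9, 5)

# The slope sign changes from +1 to -1 at a local maximum, so the
# second difference of the signs equals -2 one position before it
local_max_idx <- which(diff(sign(diff(y))) == -2) + 1

y[local_max_idx]   # values at every local maximum
max(y)             # overall maximum only
```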