Issue with gbm.step() function in R

I'm trying to run cross-validation for boosted regression/classification trees using the function gbm.step() from the R package dismo, but it returns an empty output and I can't figure out why. This is the code I'm using:
library(dismo)
DFbrt_df <- DFbrt@data            # @data slot (assuming DFbrt is an sp Spatial*DataFrame)
DFbrt_df2 <- na.omit(DFbrt_df)    # drop rows with missing values
ColIndexCov <- match(names(myRS), colnames(DFbrt_df2))   # covariate column indices
ColIndexResp <- match("HasRes", colnames(DFbrt_df2))     # response column index
myBRT <- gbm.step(data = DFbrt_df2,
                  gbm.x = ColIndexCov,
                  gbm.y = ColIndexResp,
                  tree.complexity = 3,
                  learning.rate = 10^(-8),
                  n.trees = 50,
                  family = "bernoulli",
                  n.folds = 4,
                  fold.vector = DFbrt_df2$Region.num,
                  step.size = 50,
                  verbose = FALSE,
                  silent = TRUE
)
str(DFbrt_df2)
'data.frame': 560845 obs. of 18 variables:
$ Nsamples : num 310 310 310 310 310 310 310 310 310 310 ...
$ cluster : num 39 39 39 39 39 39 39 39 39 39 ...
$ R : num 44.9 44.9 44.9 44.9 44.9 ...
$ P50 : num 0.565 0.544 0.609 0.605 0.593 ...
$ regions : Factor w/ 6 levels "China_east","China_middlesouth",..: 1 1 1 1 1 1 1 1 1 1 ...
$ HasRes : num 1 0 1 0 0 0 1 1 0 0 ...
$ use : num 10.02 9.75 0 9.38 8.77 ...
$ acc : num 0 0 0.4103 0.0769 0.0779 ...
$ tmp : num 2.46 2.46 2.46 2.46 2.45 ...
$ irg : num 1.788 0.399 1.205 1.836 1.841 ...
$ PgExt : num 3.11 0 3.7 3.11 3.18 ...
$ PgInt : num 4.69 2.76 0 3.99 2.22 ...
$ ChExt : num 3.74 0 4.33 3.74 3.81 ...
$ ChInt : num 5.01 5.99 5.35 4.88 4.97 ...
$ Ca : num 0 0 2.71 0 2.8 ...
$ veg : num 0 0 0 0 0 0 0 0 0 0 ...
$ Region.num: num 4 4 4 4 4 4 4 4 4 4 ...
$ Region : num 4 4 4 4 4 4 4 4 4 4 ...
- attr(*, "na.action")= 'omit' Named int 1 2 3 4 5 6 7 8 9 10 ...
..- attr(*, "names")= chr "1" "2" "3" "4" ...
The response variable is HasRes, and the covariates are use, acc, tmp, irg, PgExt, PgInt, ChExt, ChInt, Ca and veg.
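Because gbm.x and gbm.y are built with match(), a quick sanity check before calling gbm.step() is to confirm that no index came back NA and that the fold ids line up with n.folds. The sketch below uses a hypothetical toy data frame (`toy`, random values) that only borrows the column names from the str() output. The fold check rests on an assumption, namely that gbm.step expects fold.vector values to run from 1 to n.folds, and the six region ids are also assumed, so treat the result as a hint rather than a diagnosis.

```r
# Hypothetical toy stand-in for DFbrt_df2: column names from str() above, random values
covars <- c("use", "acc", "tmp", "irg", "PgExt", "PgInt", "ChExt", "ChInt", "Ca", "veg")
set.seed(1)
toy <- as.data.frame(matrix(runif(12 * length(covars)), ncol = length(covars),
                            dimnames = list(NULL, covars)))
toy$HasRes <- rep(c(0, 1), 6)
toy$Region.num <- rep(1:6, 2)   # assumed: six region ids, like the 6-level 'regions' factor

# Column indices for gbm.x / gbm.y, looked up by name
ColIndexCov  <- match(covars, colnames(toy))
ColIndexResp <- match("HasRes", colnames(toy))
stopifnot(!anyNA(ColIndexCov), !anyNA(ColIndexResp))

# match() silently returns NA for a misspelled name, e.g. "ca" instead of "Ca"
bad <- match(c("use", "ca"), colnames(toy))   # c(1, NA)

# Do the fold ids cover exactly 1..n.folds? (assumption about how gbm.step uses them)
n_folds  <- 4
folds_ok <- setequal(unique(toy$Region.num), seq_len(n_folds))
folds_ok   # FALSE: six distinct region ids but n.folds = 4
```

setequal() is used instead of identical() because Region.num is stored as a double, and identical(c(1, 2, 3, 4), 1:4) is FALSE even though the values agree.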

Related

In Cox regression, categorical levels are not shown in my output

In my data, Hemoglobin_group is a categorical variable with 6 levels. When I run the code below, I can't see the Hemoglobin_group levels in the output. How can I solve this problem?
fit <- coxph(Surv(Time,Status)~Hemoglobin_group, data)
fit
My output:
coef exp(coef) se(coef) z p
Hemoglobin_group -0.06585 0.93627 0.01874 -3.514 0.000441
str(data)
tibble [2,021 x 21] (S3: tbl_df/tbl/data.frame)
$ status : num [1:2021] 1 1 1 1 1 1 0 0 0 0 ...
$ id : num [1:2021] 1 1 1 1 1 1 2 2 2 2 ...
$ Time : num [1:2021] 20.3 20.3 20.3 20.3 20.3 ...
$ t1 : num [1:2021] 0 1 2 3 4 5 0 1 2 3 ...
$ t2 : num [1:2021] 1 2 3 4 5 ...
$ sex_string : chr [1:2021] "MALE" "MALE" "MALE" "MALE"...
$ sex : num [1:2021] 1 1 1 1 1 1 1 1 1 1 ...
$ age : num [1:2021] 89 89 89 89 89 89 77 77 77 77 ...
$ hemoglobin : num [1:2021] 9.71 10.22 11.3 11.8 11.2 ...
$ Diabet : num [1:2021] 1 1 1 1 1 1 2 2 2 2 ...
$ Hemoglobin_group : num [1:2021] 4 4 4 4 4 4 5 5 5 5 ...
$ Kreatinin : num [1:2021] 7.19 7.19 7.19 7.19 7.19 ...
$ fosfor : num [1:2021] 4.14 4.14 4.14 4.14 4.14 ...
$ Kalsiyum : num [1:2021] 8.5 8.5 8.5 8.5 8.5 ...
$ CRP : num [1:2021] 1.33 1.33 1.33 1.33 1.33 ...
$ Albumin : num [1:2021] 4.19 4.19 4.19 4.19 4.19 ...
$ Ferritin : num [1:2021] 428 428 428 428 428 ...
$ months : num [1:2021] 1 2 3 4 5 6 1 2 3 4 ...
This is what str(data) shows. I thought the following code converted these variables to factors in my data, but apparently it did not, and I don't understand why.
The code I wrote to convert them to factors was:
sex<-as.factor(sex)
is.factor(sex)
Diabet<-as.factor(Diabet)
is.factor(Diabet)
Status<-as.factor(Status)
is.factor(Status)
months<-as.factor(months)
is.factor(months)
Hemoglobin_group<-as.factor(Hemoglobin_group)
is.factor(Hemoglobin_group)
When I run this code, the R console shows:
> sex<-as.factor(sex)
> is.factor(sex)
[1] TRUE
>
> Diabet<-as.factor(Diabet)
> is.factor(Diabet)
[1] TRUE
>
>
> Status<-as.factor(Status)
> is.factor(Status)
[1] TRUE
>
> months<-as.factor(months)
> is.factor(months)
[1] TRUE
>
> Hemoglobin_group<-as.factor(Hemoglobin_group)
> is.factor(Hemoglobin_group)
[1] TRUE
In this case, why don't the categorical variables in my data become factors?
Your variable Hemoglobin_group is probably being treated as numeric. Try:
library(survival)
Hemoglobin_groupF <- factor(Hemoglobin_group)
fit <- coxph(Surv(Time,Status) ~ Hemoglobin_groupF, data)
fit
The reference group will be the first factor level. You can change the reference level with the function relevel().
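Most likely the earlier as.factor() calls had no effect because sex <- as.factor(sex) creates a new object in the workspace; it does not modify the column inside data, and when a formula is evaluated with data = data, columns of data take precedence over workspace objects with the same name. The sketch below illustrates this with survival's built-in veteran data, where `group` is a hypothetical numeric stand-in for Hemoglobin_group (it is simply celltype recoded as 1 to 4).

```r
library(survival)

# Hypothetical stand-in: veteran$celltype recoded to numeric codes 1..4,
# mimicking a numeric Hemoglobin_group column
d <- veteran
d$group <- as.numeric(d$celltype)

# Converting a *separate* object does not touch the column inside d
group <- as.factor(d$group)
is.factor(group)     # TRUE
is.factor(d$group)   # FALSE

# The formula finds d$group first, so it is still fitted as a single numeric slope
fit_num <- coxph(Surv(time, status) ~ group, data = d)

# Convert the column inside the data frame: one coefficient per non-reference level
d$group <- factor(d$group)
fit_fac <- coxph(Surv(time, status) ~ group, data = d)

# relevel() changes which level is the reference (here level "2")
d$group <- relevel(d$group, ref = "2")
fit_ref <- coxph(Surv(time, status) ~ group, data = d)
names(coef(fit_ref))   # "group1" "group3" "group4"
```

The same pattern applied to the question's data would be data$Hemoglobin_group <- factor(data$Hemoglobin_group) before calling coxph().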

Loop through list programmatically

I have a list in R and I want to loop through all of its elements.
This is the structure of the object:
> str(AAPL.OPT[c])
List of 1
$ jun.12.2020:List of 2
..$ calls:'data.frame': 52 obs. of 7 variables:
.. ..$ Strike: num [1:52] 180 185 200 210 240 ...
.. ..$ Last : num [1:52] 123 118 131 120 85 ...
.. ..$ Chg : num [1:52] 0 0 7.61 9.48 0 ...
.. ..$ Bid : num [1:52] 149 144 129 119 89 ...
.. ..$ Ask : num [1:52] 153.3 148.5 133.5 123.7 93.5 ...
.. ..$ Vol : int [1:52] NA 15 16 2 1 1 3 36 1 2 ...
.. ..$ OI : int [1:52] 0 15 25 4 50 3 4 36 6 10 ...
..$ puts :'data.frame': 56 obs. of 7 variables:
.. ..$ Strike: num [1:56] 150 165 170 180 185 190 195 200 205 210 ...
.. ..$ Last : num [1:56] 0.05 0.02 0.14 0.05 0.03 0.02 0.01 0.02 0.01 0.01 ...
.. ..$ Chg : num [1:56] 0 0 0 0 0 0 0 0 0 0 ...
.. ..$ Bid : num [1:56] NA 0 0 0 0 0 0 0 0 0 ...
.. ..$ Ask : num [1:56] 2.13 0.11 0.11 1.8 1.87 0.01 1.88 0.5 1.88 2.13 ...
.. ..$ Vol : int [1:56] NA 1 1 2 1 16 1 17 1 21 ...
.. ..$ OI : int [1:56] 1 10 7 9 76 201 113 314 92 264 ...
I cannot access the next level of the object programmatically (by index).
I want to do something like this:
AAPL.OPT[c][1]
instead of this
AAPL.OPT[c]$jun.12.2020
Sample data from AAPL.OPT[c]:
$`jun.12.2020`$`calls`
Strike Last Chg Bid Ask Vol OI
AAPL200612C00180000 180.0 123.29 0.00000000 149.00 153.35 NA 0
AAPL200612C00185000 185.0 117.60 0.00000000 144.00 148.50 15 15
AAPL200612C00200000 200.0 131.15 7.60999300 129.00 133.50 16 25
AAPL200612C00210000 210.0 119.95 9.47999600 119.30 123.65 2 4
....
AAPL.OPT[c] returns a list of length 1 whose only element is itself a list of two data frames (calls and puts). Using [[c]] instead returns that inner list of length 2. To access each data frame, subset further with [[: AAPL.OPT[[c]][[1]] and AAPL.OPT[[c]][[2]].
Alternatively, we can use
AAPL.OPT[c][[1]]
which returns the same object as AAPL.OPT[c]$jun.12.2020.
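To illustrate the difference between [ and [[ and how to loop over every table by position, here is a small sketch on a hypothetical mock, `opt`, that has the same shape as AAPL.OPT[c] and reuses a few Strike/Last values from the sample above.

```r
# Hypothetical mock with the same shape as AAPL.OPT[c]: a named list of expirations,
# each holding a 'calls' and a 'puts' data frame
opt <- list(
  jun.12.2020 = list(
    calls = data.frame(Strike = c(180, 185), Last = c(123.29, 117.60)),
    puts  = data.frame(Strike = c(150, 165, 170), Last = c(0.05, 0.02, 0.14))
  )
)

# [ keeps the list wrapper; [[ extracts the element itself
length(opt[1])        # 1: a list containing one expiration
length(opt[[1]])      # 2: the calls/puts list
nrow(opt[[1]][[2]])   # 3: the puts data frame

# Loop over every expiration and every table by position, no names needed
for (i in seq_along(opt)) {
  for (j in seq_along(opt[[i]])) {
    tbl <- opt[[i]][[j]]
    cat(names(opt)[i], names(opt[[i]])[j], nrow(tbl), "rows\n")
  }
}

# Or collect a per-table summary with lapply/sapply
rows <- lapply(opt, function(e) sapply(e, nrow))
```

The nested for loop is the most explicit form; the lapply version returns a named result that is easier to reuse.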

R: extract data from survfit

library(survival)
etime <- with(mgus2, ifelse(pstat==0, futime, ptime))
event <- with(mgus2, ifelse(pstat==0, 2*death, 1))
event <- factor(event, 0:2, labels=c("censor", "pcm", "death"))
mfit2 <- survfit(Surv(etime, event) ~ sex, data=mgus2)
plot(mfit2, col=c(1,2,1,2), lty=c(2,2,1,1),
mark.time=FALSE, lwd=2, xscale=12,
xlab="Years post diagnosis", ylab="Probability in State")
legend(240, .6, c("death:female", "death:male", "pcm:female", "pcm:male"),
col=c(1,2,1,2), lty=c(1,1,2,2), lwd=2, bty='n')
This is a reproducible example. How can I extract the data from mfit2 so that it can be plotted with ggplot2?
You can extract the data from the summary of the fitted object using lapply:
sfit <- summary(mfit2)
str(sfit)
List of 24
$ n : int [1:2] 631 753
$ time : num [1:359] 1 2 3 4 5 6 7 8 9 10 ...
$ n.risk : int [1:359, 1:3] 631 610 599 595 588 587 581 580 573 569 ...
$ n.event : int [1:359, 1:3] 0 0 0 0 0 0 0 0 0 0 ...
$ n.censor : num [1:359] 1 0 0 0 0 0 0 0 0 1 ...
$ pstate : num [1:359, 1:3] 0.968 0.951 0.944 0.933 0.932 ...
$ p0 : num [1:2, 1:3] 1 1 0 0 0 0
..- attr(*, "dimnames")=List of 2
.. ..$ : chr [1:2] "sex=F" "sex=M"
.. ..$ : chr [1:3] "(s0)" "pcm" "death"
$ strata : Factor w/ 2 levels "sex=F","sex=M": 1 1 1 1 1 1 1 1 1 1 ...
...
I think the columns you need are time, pstate and strata, but some others, such as the numbers at risk, may also be useful.
cols <- lapply(c(2:6, 8, 16, 17), function(x) sfit[x])
Then combine these columns into a data frame with do.call
data <- do.call(data.frame, cols)
str(data)
'data.frame': 359 obs. of 21 variables:
$ time : num 1 2 3 4 5 6 7 8 9 10 ...
$ n.risk.1 : int 631 610 599 595 588 587 581 580 573 569 ...
$ n.risk.2 : int 0 0 0 0 0 0 0 0 0 0 ...
$ n.risk.3 : int 0 0 0 0 0 0 0 0 0 0 ...
$ n.event.1: int 0 0 0 0 0 0 0 0 0 0 ...
$ n.event.2: int 0 2 0 1 0 1 0 0 2 1 ...
$ n.event.3: int 20 9 4 6 1 5 1 7 2 2 ...
$ n.censor : num 1 0 0 0 0 0 0 0 0 1 ...
$ pstate.1 : num 0.968 0.951 0.944 0.933 0.932 ...
$ pstate.2 : num 0 0.00317 0.00317 0.00476 0.00476 ...
$ pstate.3 : num 0.0317 0.046 0.0523 0.0619 0.0634 ...
$ strata : Factor w/ 2 levels "sex=F","sex=M": 1 1 1 1 1 1 1 1 1 1 ...
$ lower.1 : num 0.955 0.934 0.927 0.914 0.912 ...
$ lower.2 : num NA 0.000796 0.000796 0.00154 0.00154 ...
$ lower.3 : num 0.0206 0.0322 0.0375 0.0456 0.047 ...
$ upper.1 : num 0.982 0.968 0.963 0.953 0.952 ...
$ upper.2 : num NA 0.0127 0.0127 0.0147 0.0147 ...
$ upper.3 : num 0.0488 0.0656 0.0729 0.0838 0.0856 ...
These data are in wide format; it is best to reshape them to long format for the graph.
library(dplyr)
library(tidyr)
mgus3 <- data %>%
  pivot_longer(cols=-c(time, strata, n.censor),
               names_to=c(".value","state"),
               names_pattern="(.+)\\.(.+)") %>%   # split e.g. "pstate.2" into "pstate" and "2"
  filter(state!=1) %>%   # Exclude state 1, the initial state "(s0)"
  mutate(state=factor(state, labels=c("pcm","death")),
         group=interaction(strata, state))
Then plot it.
library(ggplot2)
mgus3 %>%
ggplot(aes(x=time, y=pstate, col=group)) +
geom_line(aes(linetype=group)) +
ylab("Probability in State") +
theme_bw()
You should be able to add confidence bands and make it prettier.
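As a sketch of the confidence bands, the code below rebuilds the long data frame directly from the matrices in summary(mfit2), picking columns by state name through mfit2$states instead of by list position, and draws the bands with geom_ribbon(). Using geom_step() rather than geom_line() is a choice made here because the estimated state probabilities change in steps; the axis label assumes, as the original xscale = 12 implies, that mgus2 times are in months.

```r
library(survival)
library(ggplot2)

# Multi-state fit from the question (mgus2 ships with survival)
etime <- with(mgus2, ifelse(pstat == 0, futime, ptime))
event <- with(mgus2, ifelse(pstat == 0, 2 * death, 1))
event <- factor(event, 0:2, labels = c("censor", "pcm", "death"))
mfit2 <- survfit(Surv(etime, event) ~ sex, data = mgus2)
sfit  <- summary(mfit2)

# Column positions of the states of interest, looked up by name
states <- c("pcm", "death")
idx    <- match(states, mfit2$states)

# One block of rows per state, stacked into long format
long <- do.call(rbind, lapply(seq_along(states), function(k) {
  data.frame(time   = sfit$time,
             strata = sfit$strata,
             state  = states[k],
             pstate = sfit$pstate[, idx[k]],
             lower  = sfit$lower[, idx[k]],
             upper  = sfit$upper[, idx[k]])
}))
long$group <- interaction(long$strata, long$state)

p <- ggplot(long, aes(x = time, y = pstate, colour = group)) +
  geom_ribbon(aes(ymin = lower, ymax = upper, fill = group),
              alpha = 0.2, colour = NA) +
  geom_step() +
  labs(x = "Months post diagnosis", y = "Probability in State") +
  theme_bw()
# print(p) draws the plot
```

Some lower/upper values are NA at the first time points (see lower.2 above), so ggplot2 may warn about removed rows in the ribbon.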

Null model fails to converge in an AICc analysis

I am doing an AICc analysis of my insect biomass-per-hour data in R to find which of the environmental predictors I've measured influence the biomass the most. For my model competition I am fitting GLMMs with a Gamma distribution and a log link. All my models converge except the null model, and I am struggling to understand why this is happening. Does anybody have an idea? Here is my R code:
What my data look like:
> str(insectnona)
'data.frame': 76 obs. of 28 variables:
$ TIME : Factor w/ 7 levels "2016_6","2016_7",..: 4 6 7 3 2 4 6 7 2 3 ...
$ JULIAN : Factor w/ 28 levels "147","148","149",..: 3 9 23 24 16 2 11 20 10 19 ...
$ SITE : Factor w/ 8 levels "1","3","5","12",..: 1 1 1 1 2 3 3 3 4 4 ...
$ HABITAT : Factor w/ 3 levels "C","E","F": 1 1 1 1 1 1 1 1 1 1 ...
$ TEMP_CIVIL : num 17.8 18.9 21.1 15 16 ...
$ BIO_ZONE : Factor w/ 3 levels "ESSFwh3","ICHdw1",..: 2 2 2 2 2 2 2 2 1 1 ...
$ AGE_CLASS : Factor w/ 3 levels "6","7","8": 2 2 2 2 2 1 1 1 3 3 ...
$ RICHNESS : int 4 9 9 9 8 6 8 8 3 2 ...
$ ARANEAE_Btot: num 0 0.1 0.1 6.9 3.73 ...
$ COL_Btot : num 2152.4 66.8 88.4 6.9 80.4 ...
$ DIP_Btot : num 72.8 39.6 17.7 20.9 132.4 ...
$ EPH_Btot : num 0 0 0 10.2 0.0333 ...
$ HEM_Btot : num 0 0.1 18.5 0 0 ...
$ HOM_Btot : num 0 14.9 30 6.2 0 ...
$ HYM_Btot : num 40.9 65.6 36.5 38 36.7 ...
$ LEP_Btot : num 161 2625 696 390 869 ...
$ NEU_Btot : num 0 0.1 3 15.5 10.6 ...
$ ORT_Btot : num 0 24.8 0 0 0 0 0 0 0 0 ...
$ PSO_Btot : num 0 0 0 0 0 0 9.3 0 0 0 ...
$ THY_Btot : num 0 0 0 0 0 0 0 0 0 0 ...
$ TRI_Btot : num 0 0 34.5 20.3 4.4 ...
$ BIOMASS_tot : num 2427 2837 924 515 1138 ...
$ OTHER_Btot : num 114 145 140 118 188 ...
$ COL_bhr : num 321.254 10.603 10.914 0.843 11.518 ...
$ LEP_bhr : num 24.1 416.7 85.9 47.7 124.5 ...
$ BIOMASS_hr : num 362.3 450.3 114.1 62.9 162.9 ...
$ RICHNESS_hr : num 0.597 1.429 1.111 1.1 1.146 ...
$ sTEMP_CIVIL : num 0.6228 0.8304 1.263 0.0796 0.2736 ...
My model competition:
library(lme4)        # glmer()
library(AICcmodavg)  # aictab()
modl <- list()
modl[[1]]=glmer(BIOMASS_hr~AGE_CLASS + HABITAT + (1|SITE) + (1|TIME), data=insectnona,family="Gamma"(link="log") )
modl[[2]]=glmer(BIOMASS_hr~HABITAT + (1|SITE) + (1|TIME), data=insectnona,family="Gamma"(link="log") )
modl[[3]]=glmer(BIOMASS_hr~AGE_CLASS + (1|SITE) + (1|TIME), data=insectnona,family="Gamma"(link="log") )
modl[[4]]=glmer(BIOMASS_hr~BIO_ZONE + (1|SITE) + (1|TIME), data=insectnona,family="Gamma"(link="log") )
modl[[5]]=glmer(BIOMASS_hr~sTEMP_CIVIL + (1|SITE) + (1|TIME), data=insectnona,family="Gamma"(link="log") )
modl[[6]]=glmer(BIOMASS_hr~AGE_CLASS + HABITAT + sTEMP_CIVIL + (1|SITE) + (1|TIME), data=insectnona,family="Gamma"(link="log") )
modl[[7]]=glmer(BIOMASS_hr~HABITAT + sTEMP_CIVIL + (1|SITE) + (1|TIME),data=insectnona,family="Gamma"(link="log") )
modl[[8]]=glmer(BIOMASS_hr~HABITAT + BIO_ZONE + (1|SITE) + (1|TIME),data=insectnona,family="Gamma"(link="log") )
modl[[9]]=glmer(BIOMASS_hr~AGE_CLASS + HABITAT + BIO_ZONE + (1|SITE) +(1|TIME),data=insectnona,family="Gamma"(link="log") )
modl[[10]]=glmer(BIOMASS_hr~1 + (1|SITE) + (1|TIME),data=insectnona,family="Gamma"(link="log"))
aictab(modl)
And then I get this warning message only for the null model (model 10):
Warning message:
In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
  Model failed to converge with max|grad| = 0.0169244 (tol = 0.001, component 1)
Thanks in advance for your help!
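For what it's worth, lme4's ?convergence help page suggests checking whether such a warning is a false positive by, for example, restarting the fit from its own estimates or refitting with a different optimizer and comparing the results. The sketch below does that on simulated data (`d` is a hypothetical stand-in for insectnona with only SITE, TIME and BIOMASS_hr), so its numbers mean nothing; with the real data, near-identical estimates across the refits would suggest the warning for model 10 is not a serious problem.

```r
library(lme4)

# Simulated stand-in for insectnona: 8 sites x 7 sampling times, Gamma-distributed biomass
set.seed(42)
d <- expand.grid(SITE = factor(1:8), TIME = factor(1:7))
site_eff <- rnorm(8, sd = 0.5)
time_eff <- rnorm(7, sd = 0.3)
mu <- exp(5 + site_eff[as.integer(d$SITE)] + time_eff[as.integer(d$TIME)])
d$BIOMASS_hr <- rgamma(nrow(d), shape = 2, rate = 2 / mu)   # mean = mu

# Null model, same structure as modl[[10]]
m0 <- glmer(BIOMASS_hr ~ 1 + (1 | SITE) + (1 | TIME), data = d,
            family = Gamma(link = "log"))

# 1) Restart from the previous estimates with a larger iteration budget
ss <- getME(m0, c("theta", "fixef"))
m_restart <- update(m0, start = ss,
                    control = glmerControl(optCtrl = list(maxfun = 2e4)))

# 2) Refit with a single, different optimizer
m_bobyqa <- update(m0, control = glmerControl(optimizer = "bobyqa"))

# Compare fixed effects and covariance parameters (theta) across the fits
rbind(original = fixef(m0), restart = fixef(m_restart), bobyqa = fixef(m_bobyqa))
rbind(original = getME(m0, "theta"), restart = getME(m_restart, "theta"))
```

The same help page also describes allFit(), which refits a model with every available optimizer for a broader comparison.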

'Incorrect number of dimensions' when running Zelig 'arima' on imputed data

I'm getting an error when I try to run an ARIMA model with the Zelig package. I'm using multiply imputed (MI) data with 20 imputations created with Amelia. Here is a short summary of my ID and response variables:
$ imp20:'data.frame': 442 obs. of 50 variables:
..$ region : Factor w/ 4 levels "Central Africa",..: 3 3 3 3 3 3 3 3 3 3 ...
..$ subregionid : Factor w/ 4 levels "FC","FE","FS",..: 3 3 3 3 3 3 3 3 3 3 ...
..$ country : Factor w/ 34 levels "Angola","Benin",..: 1 1 1 1 1 1 1 1 1 1 ...
..$ ISO2 : Factor w/ 34 levels "AO","BF","BJ",..: 1 1 1 1 1 1 1 1 1 1 ...
..$ ISO3 : Factor w/ 34 levels "AGO","BEN","BFA",..: 1 1 1 1 1 1 1 1 1 1 ...
..$ year : num [1:442] 2002 2003 2004 2005 2006 ...
..$ cap.lat : num [1:442] -8.5 -8.5 -8.5 -8.5 -8.5 -8.5 -8.5 -8.5 -8.5 -8.5 ...
..$ cap.long : num [1:442] 13.2 13.2 13.2 13.2 13.2 ...
..$ NGDP_RPCH : num [1:442] 14.53 5.25 10.88 18.26 20.73 ...
..$ NGDPD : num [1:442] 3.18 3.31 3.38 3.44 3.48 ...
..$ NGDPDPC : num [1:442] 2.68 2.69 2.72 2.75 2.78 ...
..$ NGSD_NGDP : num [1:442] 10.62 7.77 12.63 26.98 40.94
...
..$ PIKE.regional : num [1:442] 0.225 0.295 0.287 0.358 0.357 ...
..$ Definite.Probable : num [1:442] 36 36 36 36 36.1 ...
..$ Elephant.range : num [1:442] 406006 433613 511662 456046 459418 ...
..$ Change.by.year : num [1:442] 0.000463 0.000463 0.000463 0.000463 0.000463 ...
..$ Diff.from.expected : num [1:442] -0.0415 -0.0415 -0.0415 -0.0415 -0.0415 ...
Diff.from.expected is my response variable. Here is the code I ran, along with the error I get:
> z1 <- zarima$new()
> z1$zelig(Diff.from.expected~GNI, order=c(1,0,1), model="arima",
+ data = a.coVarsTrans.more, ts="year", cs="country")
Error in data[, cs] : incorrect number of dimensions
So it appears that there is an issue with the cs = "country" argument, but I'm not sure what it is. I plan to add more independent variables, but I want to make sure a basic model works first, which it clearly doesn't.
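One thing worth checking: "incorrect number of dimensions" is the message base R gives when [, cs]-style indexing is applied to an object with no dimensions, such as a plain list. An Amelia result stores its imputed data sets as a list of data frames (the str() output above shows one of them, imp20), so if data ends up being that list rather than a single data frame, data[, cs] fails this way. The minimal sketch below reproduces the message with a hypothetical two-row list; it does not show what Zelig does internally.

```r
# Hypothetical mini version of an imputation list: named data frames inside a plain list
imps <- list(imp1 = data.frame(country = c("Angola", "Benin"),
                               year    = c(2002, 2002),
                               stringsAsFactors = FALSE))

# A list has no dim attribute, so two-index subsetting fails like data[, cs]
is.null(dim(imps))   # TRUE
err <- tryCatch(imps[, "country"], error = function(e) conditionMessage(e))
err                  # "incorrect number of dimensions"

# Each element is a data frame, where [, "country"] works
ok <- imps[[1]][, "country"]
```

Checking class() and dim() of the object passed as data is a quick way to see which case applies.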
Here is the link to my saved Amelia .Rdata file.
