mlr3proba surv.xgboost is not producing distr output + documentation link unstable - r

Versions used: R 3.6.3, mlr3 0.4.0-9000, mlr3proba 0.1.6.9000 and xgboost 0.90.0.2 (as stated on RStudio Package Manager).
When I train and predict with surv.xgboost, no distr output is produced, contrary to the documentation at https://mlr3proba.mlr-org.com/reference/LearnerSurvXgboost.html; only crank and lp are returned.
Please also note that the documentation link above is unstable: sometimes it redirects to the new mlr3proba version 0.2.0 and throws a 404 error, while at other times it works and shows the documentation for surv.xgboost as of mlr3proba 0.1.6.
Please let me know if you would like me to provide any further details concerning the issue. Thank you in advance for your time.

Hi, thanks for using mlr3proba! Good spot on the documentation problem, I will get that fixed asap. xgboost does not natively predict distr; this is a mistake in the documentation. You can check this with LearnerSurvXgboost$new()$predict_types. However, it is easy to get a distribution prediction:
library(mlr3); library(mlr3proba); library(mlr3pipelines)
learn = distrcompositor(lrn("surv.xgboost"), estimator = "kaplan", form = "ph")
You could change the form and estimator arguments, but since xgboost assumes a PH form, these are the most sensible options.
Let me know if the code chunk doesn't work for some reason, and if it does, please mark as answered :)
Raphael
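To illustrate what the estimator = "kaplan", form = "ph" combination above computes, here is a minimal base R sketch. It is not mlr3proba's actual implementation. It assumes the PH composition S(t | x) = S0(t)^exp(lp), with the baseline S0 taken from a Kaplan-Meier fit; the lung data from the survival package and the lp values are placeholders chosen only for illustration.

```r
library(survival)

# Kaplan-Meier estimate of the baseline survival S0(t), ignoring covariates.
# `lung` ships with the survival package and is only a stand-in dataset.
km <- survfit(Surv(time, status) ~ 1, data = lung)
s0 <- km$surv

# Hypothetical linear predictors, e.g. the lp output of a fitted learner.
lp <- c(-0.5, 0, 0.5)

# PH form: S(t | x) = S0(t)^exp(lp).
# Result has one row per observation and one column per event time.
surv <- outer(exp(lp), s0, function(e, s) s^e)
```

In this sketch, lp = 0 reproduces the baseline curve, and a larger lp gives a lower survival curve at every time point.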

Related

Problem in model fit() in Keras R Package for a simple DNN

I've been trying to fit a simple house price prediction model, defined as follows:
Everything runs fine till the fit part, where I get the following error:
I've tried installing reticulate and RTools manually and using online notebooks like Kaggle, but could not figure out how to get this working. Can somebody help me, please? Thanks in advance!
I've tried using the KerasR library as well but couldn't get it running either, though for a different reason: invalid 'dimnames' given for data set. That is something for another post.

How to use weights with the package "crosstable" for R

The crosstable package gives me exactly what I need for some exploratory work on a data set composed of answers to a survey. But I need to weight the cross-tabulation to get results that are representative of the population I'm studying. Any ideas how I could use weights with this package?
So far I have used the "survey" package to do that, but it lacks presentation tools for producing publication-ready tables.
Thanks.
I'm the dev of the crosstable package, and unfortunately it does not support weights yet.
I would love to implement this as a feature one day, so you should definitely open a Feature Request on GitHub.
As I've never had to do a weighted description myself, please add a simplified version of your use case so that I can make something useful to everyone.
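Until weights are supported in crosstable, a weighted cross-tabulation can be sketched in base R with xtabs(), which sums a weight column within each cell. The data frame and its column names below are hypothetical, and the result is a plain table rather than a publication-ready crosstable.

```r
# Hypothetical survey data: two categorical answers and a sampling weight.
survey_df <- data.frame(
  sex    = c("F", "F", "M", "M", "F"),
  answer = c("yes", "no", "yes", "yes", "yes"),
  w      = c(1.2, 0.8, 1.5, 0.5, 1.0)
)

# Weighted counts: each cell holds the sum of weights, not the number of rows.
tab <- xtabs(w ~ sex + answer, data = survey_df)

# Weighted row percentages (each row sums to 1).
prop <- prop.table(tab, margin = 1)
```

This only covers weighted counts and proportions; design-based standard errors still need something like the survey package mentioned in the question.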

Error in 'ts' function when using 'zoib' R package for beta regression

I am working with the R package 'zoib' for performing beta regression in R. I am trying to replicate the example included on page 41 in the paper the package authors published in The R Journal:
Liu F and Kong Y. 2015. zoib: An R Package for Bayesian Inference for Beta Regression and Zero/One Inflated Beta Regression. The R Journal 7(2).
I believe I am using the exact same data and code that they use:
library(zoib)
data("GasolineYield", package="zoib")
GasolineYield$batch <- as.factor(GasolineYield$batch)
d <- GasolineYield
eg1.fixed <- zoib(yield ~ temp + as.factor(batch) | 1, data=GasolineYield, joint=FALSE,
                  random=0, EUID=1:nrow(d), zero.inflation=FALSE, one.inflation=FALSE,
                  n.iter=1050, n.thin=5, n.burn=50)
# posterior samples of the regression coefficients (an MCMC object for coda)
sample1 <- eg1.fixed$coeff
traceplot(sample1)
autocorr.plot(sample1)
gelman.diag(sample1)
However, I am getting an error when I try to do the diagnostic plots on the samples. This is the error message:
Error in ts(seq(from = start(x), to = end(x), by = thin(x)), start = start(x), :
invalid time series parameters specified
I cannot understand why the code isn't working or what I can do to fix the problem. I can trace the error to the time function which is called by zoib, and it seems like maybe it is a problem that the sample object does not have a tsp attribute, but the zoib package authors make it clear that their model output is meant to be used with coda, so I am very confused. I don't have much experience working with MCMC or time series objects, so maybe I am just missing something obvious. Can anyone explain why the example provided by the package authors is failing, and what the solution is?
I e-mailed the package author (Fang Liu) and she informed me that there was in fact a bug in the version of the package I have, but that the bug is fixed in the most recent version of zoib (Version 1.4.2). Using the most recent version, the code now works.
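Before re-running the example, it can help to confirm that the installed version is at least the fixed one. Below is a minimal sketch; needs_update is a hypothetical helper written for this post, not part of zoib, and "1.4.1" is only an illustrative older version number.

```r
# Hypothetical helper: TRUE when the installed version predates the fixed one.
# Both arguments are version strings such as "1.4.2".
needs_update <- function(installed, fixed) {
  package_version(installed) < package_version(fixed)
}

needs_update("1.4.1", "1.4.2")

# With zoib installed, the real check would be:
# needs_update(as.character(packageVersion("zoib")), "1.4.2")
```

If the check returns TRUE, updating the package with install.packages("zoib") should pick up the fix described above.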

Arch modeling Python

I have been using Python to fit an ARCH model to the monthly return series of Intel stock from 1989-2010. I have used the ARCH library written by Kevin Sheppard. Now, when cross-checking with R, the coefficients of my volatility model are slightly different from what R reports. I am wondering, why are there so many differences in results across packages? Which one is correct: R's fGarch package or Kevin Sheppard's package? The main problem is that the p-values from the two are completely different. I'm confused about which language to use to get the correct results. I have attached the link to my work below. If you scroll down, you will see my Python implementation, where I'm trying to fit an ARCH(3) model, and likewise the R implementation. If someone can explain where the difference is coming from and which package to trust, I would highly appreciate it.
Thanks
http://nbviewer.ipython.org/gist/mrajancsr/96a19065794c8c0bd850
Fixed in 95ccc3e on August 6, 2015
https://github.com/bashtage/arch/commit/95ccc3e94d408d92c6d0d8635a62ff2a26243f45

R package "spatstat": How do you get standard errors for non-smooth terms in a poisson process model (function: ppm) when use.gam=TRUE?

In the R package spatstat (I am using the current version, 1.31-0), there is an option use.gam. When you set this to TRUE, you can include smooth terms in the linear predictor, the same way you do with the R package mgcv. For example,
g <- ppm(nztrees, ~1+s(x,y), use.gam=TRUE)
Now, if I want a confidence interval for the intercept, I can usually use summary or vcov. This works when use.gam is not set but fails when it is:
vcov(g)
which gives the error message
Error in model.frame.default(formula = fmla, data =
list(.mpl.W = c(7.09716796875, :invalid type (list) for variable 's(x, y)'
I am aware that this standard error approximation here is not justified when you use gam, but this is captured by the warning message:
In addition: Warning message: model was fitted by gam();
asymptotic variance calculation ignores this
I'm not concerned about this - I am prepared to justify the use of these standard errors for the purpose I'm using them - I just want the numbers and would like to avoid "writing-my-own" to do so.
The error message I got above does not seem to depend on the data set I'm using. I used the nztrees example here because I know it comes pre-loaded with spatstat. It seems like it's complaining about the variable itself, but the model clearly understands the syntax since it fits the model (and the predicted values, for my own dataset, look quite good, so I know it's not just pumping out garbage).
Does anybody have any tips or insights about this? Is this a bug? To my surprise, I've been unable to find any discussion of this online. Any help or hints are appreciated.
Edit: Although I have definitively answered my own question here, I will not accept my answer for the time being. That way, if someone is interested and willing to put in the effort to find a "workaround" for this without waiting for the next edition of spatstat, I can award the bounty to him/her. Otherwise, I'll just accept my own answer at the end of the bounty period.
I have contacted one of the package authors, Adrian Baddeley, about this. He promptly responded and let me know that this is indeed a bug with the software and, apparently, I am the first person to encounter it. Fortunately, it only took him a short time to track down the issue and correct it. The fix will be included in the next release of spatstat, 1.31-1.
Edit: The updated version of spatstat has been released and does not have this bug anymore:
g <- ppm(nztrees, ~1+s(x,y), use.gam=TRUE)
sqrt( vcov(g)[1,1] )
[1] 0.1150982
Warning message:
model was fitted by gam(); asymptotic variance calculation ignores this
See the spatstat website for other release notes. Thanks to all who read and participated in this thread!
I'm not sure you can specify the trend the way you have, which is possibly what is causing the error. It doesn't seem to make sense according to the documentation:
The default formula, ~1, indicates the model is stationary and no trend is to be fitted.
But instead you can specify the model like so:
g <- ppm(nztrees, ~x+y, use.gam=TRUE)
# Then extract the coefficients:
> coef(g)
  (Intercept)             x             y
-5.0346019490  0.0013582470 -0.0006416421
# And calculate their standard errors:
vc <- vcov(g)
se <- sqrt(diag(vc))
> se
(Intercept)           x           y
0.264854030 0.002244702 0.003609366
Does this make sense, and is it the result you expected? I know that the package authors are very active on the r-sig-geo mailing list, as they have helped me in the past. You may also want to post your question to that mailing list, but you should reference your question here when you do.
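Since the original question asked for a confidence interval for the intercept, here is a minimal sketch of a Wald-type 95% interval built from the estimate and standard error printed above. It assumes the usual normal approximation, which is an assumption of this sketch and carries the same caveat as the vcov warning when gam is involved.

```r
est <- -5.0346019490  # intercept from coef(g) above
se  <- 0.264854030    # its standard error from sqrt(diag(vcov(g))) above

# Wald interval: estimate +/- z * se, with z the 97.5% normal quantile.
z  <- qnorm(0.975)
ci <- c(lower = est - z * se, upper = est + z * se)
ci
```

The interval is on the scale of the linear predictor (the log intensity), so exp(ci) would give the corresponding interval for the baseline intensity.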
