Frailty estimates in coxph object - r

If one uses obj=coxph(... + frailty(id) ), then the object also returns (log)frailty estimates for each individual, which can be extracted with obj$frail.
Does anybody knows how these estimates are being obtained? Are they Empirical Bayes estimates?
Thanks!
Theodor

The default distribution for frailty can be seen in the ?frailty page to be "gamma". If you look at the frailty function (which is not hidden) you see that it simply pastes the name of the distribution onto "frailty." and uses get() to retrieve the proper function. So look at frailty.gamma (also not hidden) to find the answers to your question. Looking back at the help page again, you can see that I should have been able to figure all that out without looking at the code, since it's right up at the top of the page. But there are many routes to knowledge with R. (They are ML, not "empirical Bayes", estimates.)
The help page suggests to me that the author (Therneau) expects you to consult Therneau and Grambsch for further details not obvious from reading the code. If you are doing serious work with survival models in R that is a very useful book to have. It's very clear and helpful in understanding the underpinnings of the 'survival'-package.

Related

How to code a fixed-effects Tobit panel data model in R?

We want to run regressions on panel data, and use fixed-effects (to alleviate the problem of a firm- or time-effect) and censor the model at zero. However, I find little when googling the model, in terms of R code. Does anyone perhaps have any insights?
I will post a screenshot of our data, and of. the intended outcome.
We have tried the code described on this site: http://vps58738.lws-hosting.com/Rpkg/plm/reference/pldv.html
But, we are slightly unsure on what to code for all the specification (or parameters), and experience errors in the result. We are also unsure if this method described on the site is appropriate - it was just the only close thing we found.

R: Evaluate Gradient Boosting Machines (GBM) for Regression

Which are the best metrics to evaluate the fit of a GBM algorithm in R (metrics, graphs, ratios)? And how interpret them?
I think maybe you are overthinking this one! Take a step back and think about what matters... the error. You have forecasted values and you have observed values. the difference tells you most of what you need to know when comparing across models. Basic measures like MSE, MPE, etc. should do fine. If you are looking to refine within a given model, I would recommend taking a look at the gbm documentation. For example, you can pass your gbm model object to summary(), to get the relative influence of each of your variables. Additionally, you can find a lot of information in the documentation, so if you haven't taken a look, I would recommend doing so! I have posted the link at the bottom.
-Carmine
gbm_documentation

Can ensemble classifiers underperform the best single classifier?

I have recently run an ensemble classifier in MLR (R) of a multicenter data set. I noticed that the ensemble over three classifiers (that were trained on different data modalities) was worse than the best classifier.
This seemed to be unexpected to me. I was using logistic regressions (without any parameter optimization) as simple classifier and a Partial Least Squares (PLS) Discriminant Analysis as a superlearner, since the base-learner predictions ought to be correlated. I also tested different superlearners like NB, and logistic regression. The results did not change.
Here are my specific questions:
1) Do you know, whether this can in principle occur?
(I also googled a bit and found this blog that seems to indicate that it can:
https://blogs.sas.com/content/sgf/2017/03/10/are-ensemble-classifiers-always-better-than-single-classifiers/)
2) Especially, if you are as surprised as I was, do you know of any checks I could do in mlr to make sure, that there isnt a bug. I have tried to use a different cross-validation scheme (originally I used leave-center-out CV, but since some centers provided very little data, I wasnt sure, whether this might lead to weird model fits of the super learner), but it still holds. I also tried to combine different data modalities and they give me the same phenomenon.
I would be grateful to hear, whether you have experienced this and if not, whether you know what the problem could be.
Thanks in advance!
Yes, this can happen - ensembles do not always guarantee a better result. More details regarding cases where this can happen are discussed also in this cross-validate question

How does r calculate the p-values in logistic regression

What type of p-values do R calculate in a binomial logistic regression, and where is this documented?
When i read the documentation for ?glm() I find no reference to the calculation of the p-values.
The p-values are calculated by the function summary.glm. See ?summary.glm for a (very brief) bit about how those are calculated.
For more information, look at the source code by typing
summary.glm
at the R command prompt. There you will find the lines of code where an object pvalue is created. Follow the code back to see how the components of the p-value calculation are (conditionally) calculated.
The authors of R wrote the help system with several principles in mind: compactness (don't write more than is needed, it's not a textbook), accuracy, and a curious and well-educated audience. It really was written for other statisticians. The "curious" part of that opening sentence was included to raise the question why you did not also follow the various links in the ?glm page: to summary.glm where you would have found one answer to your ambiguous question or to anova.glm where you would have found another possible answer. The help-authors do expect that you will follow those links and read the whole page and execute the examples. You will notice that even after you get to summary.glm that there is no mention of "binary logistic regression" since they pretty much assume that you are well-grounded in statistics and have copy of McCullagh and Nelder handy, or if not that you will go read the references.
The other principle: sometimes it is the code itself (given the open-source nature of R) that performs the documentation. Technically glm doesn't print anything and print.glm doesn't print p-values. It would be print.summary.glm or print.anova.glm that would be doing any printing. Part of learning R is learning that the results printed to the console will have gone through a eval-print loop and that output can be tailored with object-class-specific functions.
These assumptions are just part of what many people see as a "steep learning curve for R" (although I would have called it a shallow curve if plotted with time/effort on x-axis.)

R script - nls function

Can anyone give me a good explanation for what the parameter "algorithm" does in the nls function in R?
Also, how does the formula work? I know it uses a tilda, but I can't really find a down-to-earth explanation of it.
Also, how important are the start values? Do I need to try multiple start values, or can I still have a guarantee that nls will find the correct parameters regardless of the start values I use?
In brief:
nls() is going to vary parameters to try to minimize the square error between your model and your data. There's several good methods it can try to find the minimum. Reading the details about "method" in ?optim will provide some good info and references.
In general, for nonlinear models, your results can be sensitive to initial guess. You should try several different guesses to make sure that the outputs are close. If your results are very sensitive to your guess, you can try re-parameterizing, using a different algorithm, or rethinking your model.
As for the formula, I'd echo the previous answer. Work through the examples in the bottom of ?nls and then try to ask a more specific question.

Resources