I am producing ensemble forecasts for a quantity, with around 20 forecast values at each observation point. The event will be defined by a threshold of x%, say 95% of the highest observed value. I am trying to construct an ROC curve using R:
Is ROCR a good package for computing an ROC score from probabilistic forecasts?
Can you provide an example of how to construct this ROC curve?
Just assume a fake dataset.
I have been reading all sorts of papers, but I am still very confused as to how to calculate the forecast probabilities.
I would encourage you to look at the caret package. It's wonderful for ensemble learning. It will tune your parameters for you based on RMSE, ROC (AUC), etc. by resampling (cross-validation or the bootstrap). That is, it repeatedly splits your data into training and held-out samples, fits many models while tuning parameters, and gives you back the best model.
The vignette (listed on the package page) here is excellent, and it includes examples of plotting ROC curves.
However, if what you're looking for is a simple method to calculate an ROC score from predictions and held-out data, check out page 11 of this pdf.
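For the fake-dataset case asked about in the question, one common convention (an assumption here, not something stated in the thread) is to take the forecast probability of the event as the fraction of ensemble members that exceed the event threshold. The sketch below uses base R only; the data, the noise level, and the 50% threshold factor are all made up for illustration.

```r
set.seed(1)
n_obs     <- 200   # observation points (hypothetical)
n_members <- 20    # ensemble members per point, as in the question

# Fake observations and a fake ensemble (rows = observation points).
obs <- rgamma(n_obs, shape = 2, rate = 1)
ens <- matrix(obs + rnorm(n_obs * n_members, sd = 1), nrow = n_obs)

# Event definition: x% of the highest observed value.
# x = 0.5 is used here (instead of 0.95) only so the fake data has enough events.
x         <- 0.5
threshold <- x * max(obs)
event     <- obs >= threshold

# Forecast probability = fraction of members forecasting the event.
fc_prob <- rowMeans(ens >= threshold)

# ROC points: sweep a decision threshold over the forecast probabilities.
cutoffs <- sort(unique(c(0, fc_prob, 1)), decreasing = TRUE)
hit_rate         <- sapply(cutoffs, function(p) mean(fc_prob[event]  >= p))
false_alarm_rate <- sapply(cutoffs, function(p) mean(fc_prob[!event] >= p))

# Area under the curve via the rank (Mann-Whitney) formulation.
n1  <- sum(event)
n0  <- sum(!event)
auc <- (sum(rank(fc_prob)[event]) - n1 * (n1 + 1) / 2) / (n1 * n0)

plot(false_alarm_rate, hit_rate, type = "l",
     xlab = "False alarm rate", ylab = "Hit rate")
abline(0, 1, lty = 2)
```

If ROCR is installed, the same two vectors can be passed to it instead: pred <- ROCR::prediction(fc_prob, event), then ROCR::performance(pred, "tpr", "fpr") for the curve and ROCR::performance(pred, "auc") for the area.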
Related
I have a data set where observations come from highly distinct groups. Each group may have a wildly different distribution, so I am trying to find the best distribution using fitdist from fitdistrplus, then use gamlssML from the gamlss package to find the best parameters.
My issue is with transforming the data after this step. For some of the distributions, like the Box-Cox t, I can find the equation for normalizing the data using the BCT coefficients, but for many of these distributions I cannot.
Does gamlss have a function that normalizes the data after fitting? Their documentation only provides the transformations for a small number of distributions: https://www.gamlss.com/wp-content/uploads/2018/01/DistributionsForModellingLocationScaleandShape.pdf
Thanks a lot
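The normalised data values (for any distribution) are exactly equal to the residuals from a gamlss fit (these are normalised quantile residuals). After fitting a model, for example
m1 <- gamlss(y ~ 1, family = BCT)
the residuals can be accessed by
residuals(m1) or
m1$residuals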
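To see why the residuals serve as the normalised values: for a continuous distribution, a normalised quantile residual is the standard normal quantile of the fitted CDF evaluated at the observation, z = qnorm(F(y)). The sketch below shows that transformation in base R for a gamma distribution. The fake data and the method-of-moments estimates are assumptions that stand in for a real gamlss fit. With gamlss, the CDF would be the p-function of the fitted family (for example pBCT for the Box-Cox t) evaluated at the fitted parameters.

```r
set.seed(123)
y <- rgamma(500, shape = 2, rate = 0.5)   # fake, skewed data (hypothetical)

# Stand-in for the fitting step: method-of-moments gamma estimates.
shape_hat <- mean(y)^2 / var(y)
rate_hat  <- mean(y) / var(y)

# Normalising transformation: fitted CDF, then standard normal quantile.
u <- pgamma(y, shape = shape_hat, rate = rate_hat)
z <- qnorm(u)

# z should look approximately standard normal if the fit is adequate.
c(mean = mean(z), sd = sd(z))
```

A normal Q-Q plot of z (qqnorm(z); qqline(z)) is a quick check of whether the chosen distribution fits a given group.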
I want to build a survival model and then calculate the X-year (e.g. 10-year) survival probability, or equivalently the X-year risk of the event.
Is there a way to do this using coxph or survreg? Is this possible using random survival forest (e.g. ranger)?
P.S. not sure if important but data is wide (~100 features - mostly continuous) and 17k samples.
For anyone else trying to do the same: if you build a Cox model with survival::coxph or rms::cph, you can use the function pec::predictSurvProb.
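For a Cox model, the survival package on its own can also give fixed-horizon survival probabilities, through survfit() with newdata. A minimal sketch, assuming the lung data that ships with survival (time is in days, so 365 stands in for the X-year horizon) and two hypothetical patients:

```r
library(survival)

# Cox model on the lung data shipped with the survival package.
fit <- coxph(Surv(time, status) ~ age + sex, data = lung)

# Hypothetical new patients to predict for.
new_patients <- data.frame(age = c(50, 70), sex = c(1, 1))

# Predicted survival curves, one per row of new_patients.
sf <- survfit(fit, newdata = new_patients)

# Survival probability at the fixed horizon (365 days), and the matching risk.
surv_1yr <- as.numeric(summary(sf, times = 365)$surv)
risk_1yr <- 1 - surv_1yr

data.frame(new_patients, surv_1yr, risk_1yr)
```

The horizon passed to times must lie within the observed follow-up. Beyond the last observed time the curve is not estimated.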
In order to compare two survival curves at a fixed point in time, essentially performing a two-sample test, I need to extract the variance of the survival estimate at that point in time.
For an object created with the svykm function from Thomas Lumley's survey package in R, this should be accessible in the varlog list. Do the entries in this list constitute the transformed variances on the log scale or the untransformed variances?
I have read the documentation provided for the survey package, but could not reach a firm conclusion. I note that confidence intervals are computed on the log(survival) scale, following the default in the survival package, and their bounds are given as exp(log(x$surv)+1.96*sqrt(x$varlog)) and exp(log(x$surv)-1.96*sqrt(x$varlog)) in the R package documentation.
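They are variances on the log scale, that is, variances of log(survival). This is why the documented intervals add and subtract 1.96*sqrt(x$varlog) around log(x$surv) before exponentiating.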
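Given a variance on the log scale, the delta method gives an approximate variance on the untransformed scale, Var(S) ≈ S^2 * Var(log S). The fixed-time comparison can also be done directly on the log scale. The sketch below takes plain numbers as input: the helper names and the example values are hypothetical. The z-test assumes the two estimates are independent, which should be checked against the survey design.

```r
# Delta method: variance of S from the variance of log(S).
surv_var_from_log <- function(surv, varlog) {
  surv^2 * varlog
}

# Two-sample z-test at a fixed time on the log(survival) scale.
# Assumes the two groups' estimates are independent.
compare_at_time <- function(surv1, varlog1, surv2, varlog2) {
  z <- (log(surv1) - log(surv2)) / sqrt(varlog1 + varlog2)
  c(z = z, p_value = 2 * pnorm(-abs(z)))
}

# Made-up survival estimates and log-scale variances at the chosen time.
surv_var_from_log(0.80, 0.0009)
compare_at_time(0.80, 0.0009, 0.70, 0.0016)
```

The surv and varlog inputs would be the entries of x$surv and x$varlog at the chosen time for each group's svykm object.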
I have some data that has two variables: spend and outcomes and they are given at a weekly frequency.
I would like to model the relationship between the two at a yearly level, but do not have enough years' worth of data to build a model. I do have about 3 years' worth of weekly data, however, and would like to simulate several more weeks of data points (spend and outcomes) based on a bivariate probability density of spend and outcomes, which I could then roll up to a yearly frequency.
Is there a package in R that can take two variables and estimate their joint density function, which I could then use to simulate many more data points?
Thanks so much!
The simulate_kde function in the package simukde fits a kernel density estimate internally and draws samples from it.
Alternatively, the MASS package has the kde2d function to obtain a bivariate kernel density.
You could then sample from that, as for instance described in this post.
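Sampling from a Gaussian kernel density estimate does not require evaluating the density on a grid. It is equivalent to resampling the observed pairs with replacement and adding kernel noise to each draw (a smoothed bootstrap). The sketch below does this in base R. The fake weekly data, the helper name simulate_from_kde, and the choice of a product Gaussian kernel with per-variable bw.nrd0 bandwidths are all assumptions for illustration. It does not reproduce what simukde or kde2d do internally.

```r
set.seed(42)

# Fake weekly data (hypothetical): about 3 years of spend and outcomes.
n_weeks  <- 156
spend    <- rgamma(n_weeks, shape = 5, rate = 0.01)
outcomes <- 0.3 * spend + rnorm(n_weeks, sd = 30)

# Draw n points from a product-Gaussian KDE of (x, y):
# pick observed pairs at random, then jitter each coordinate
# with sd equal to that variable's bandwidth.
simulate_from_kde <- function(x, y, n) {
  idx <- sample(length(x), n, replace = TRUE)
  data.frame(
    x = x[idx] + rnorm(n, sd = bw.nrd0(x)),
    y = y[idx] + rnorm(n, sd = bw.nrd0(y))
  )
}

# Simulate 10 more years of weekly points.
sim <- simulate_from_kde(spend, outcomes, n = 520)
names(sim) <- c("spend", "outcomes")
head(sim)
```

Because pairs are resampled together, the simulated points keep the dependence between spend and outcomes. The draws are independent of each other, so any week-to-week autocorrelation in the original series is not carried over.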
I want to extract the per-tree predictions for each observation from the rfsrc object. In other words, for i trees and j observations, I want to extract an i x j matrix of the predictions. My goal is to calculate the prediction confidence intervals using the R code found at https://github.com/swager/randomForestCI. My analysis requires a competing risks random forest; otherwise I would have used the randomForest package, which makes the per-tree predictions more straightforward to extract.
I appreciate any assistance.
EDIT: I am attempting to follow the procedure outlined here: http://blog.revolutionanalytics.com/2016/03/confidence-intervals-for-random-forest.html