fitting a distribution graphically - r

I am running some tests to try to determine what distribution my data follows. From the look of the density of my data, I thought it resembled a logistic distribution. I then used the package MASS to estimate the parameters of the distribution. However, when I graph them together, the logistic fit is better than the normal but still not very good. Is there a way to find which distribution would fit better? Thank you for the help!
library(quantmod)
getSymbols("^NDX",src="yahoo", from='1997-6-01', to='2012-6-01')
daily<- allReturns(NDX) [,c('daily')]
dailySerieTemporel<-ts(data=daily)
x<-na.omit(dailySerieTemporel)
library(MASS)
(xFit<-fitdistr(x,"logistic"))
# location scale
# 0.0005210570 0.0106366354
# (0.0002941922) (0.0001444678)
xFitEst<-coef(xFit)
plot(density(x))
set.seed(125)
lines(density(rlogis(length(x), xFitEst['location'], xFitEst['scale'])), col=3)
lines(density(rnorm(length(x), mean(x), sd(x))), col=2)

This is elementary R: plot() creates a new plotting canvas by default, and you should use a command such as lines() to add to an existing plot.
This works for your example:
plot(density(x))
lines(density(rlogis(length(x), location = 0.0005210570,
scale = 0.0106366354)), col="blue")
as it adds the estimated logistic fit in blue to your existing plot.
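On the question of which distribution fits better, one possible approach is to fit each candidate with fitdistr and compare AIC values, overlaying the fitted density functions (dnorm, dlogis) rather than densities of simulated draws. The sketch below uses simulated heavy-tailed data as a stand-in for the returns; the scaled t sample and the names fit_norm, fit_logis and aic are assumptions made for illustration only.

```r
library(MASS)  # fitdistr()

set.seed(125)
# Hypothetical stand-in for the daily returns in the question
x <- rt(2000, df = 4) * 0.01

# Maximum-likelihood fits of two candidate distributions
fit_norm  <- fitdistr(x, "normal")
fit_logis <- fitdistr(x, "logistic")

# Empirical density with the fitted theoretical densities on top
plot(density(x), main = "Empirical density with fitted curves")
curve(dnorm(x, fit_norm$estimate[["mean"]], fit_norm$estimate[["sd"]]),
      add = TRUE, col = 2)
curve(dlogis(x, fit_logis$estimate[["location"]], fit_logis$estimate[["scale"]]),
      add = TRUE, col = 3)

# AIC works because MASS supplies a logLik method for fitdistr objects;
# the lower value indicates the better trade-off of fit and complexity
aic <- AIC(fit_norm, fit_logis)
print(aic)
```

A lower AIC only ranks the candidates you tried; a Q-Q plot against the preferred candidate is still a useful check of the tails.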

Related

Residual Plot for multivariate regression in Time Series, with time on X axis in R

I have a dataframe which is a time series. I am using the function lm to build a multivariate regression model.
linearmodel <- lm(Y~X1+X2+X3, data = data)
I want to plot the residuals of this linearmodel on the y-axis and time on the x-axis using a simple function, with the lm() object as the input.
Standard residual plotting functions like the one in the car package (car::residualPlot) give residuals on the Y-axis and fitted values on the X-axis.
Ideally, I need the residuals on the Y-axis and the timescale on the X-axis. But I understand that the function lm() is time agnostic. So I can live with the residuals on the Y-axis in the same order as the data input and nothing on the X-axis.
Is there a plotting function which I can use by passing the linearmodel object into the function (not something where I extract the residuals and use ggplot2)? So for example: plot <- plotresidualsinorder(linearmodel) should give me the residuals on the Y-axis in the same order as the data input.
I want to use this plot in R-shiny ultimately.
My research led me to car package, which is wonderful in its own right, but doesn't have the function to solve my problem.
Many thanks in advance for the help.
You can plot the residuals directly. For the proposed solution, we apply the lm function to a formula that describes your Y variable in terms of X1+X2+X3, and save the linear regression model in a new linearmodel variable. Finally, we compute the residuals with the resid function. In your case, the following solution should address your problem.
Proposed solution:
linearmodel <- lm(Y~X1+X2+X3, data = data)
lm_resid <- resid(linearmodel)
# Plot the residuals in the order of the input rows
plot(seq_along(lm_resid), lm_resid,
ylab="Residuals", xlab="Observation order",
main="Data")
abline(0, 0)
For help on how the resid function works, you can try:
help(resid)
Calisto's solution will work, but there is a simpler and more straightforward one. The lm object already contains the regression residuals, so you can simply call:
plot(XTime, linearmodel$residuals, main = "Residuals")
XTime is the date variable of your dataset; you may need to convert it with the POSIX functions: https://www.rdocumentation.org/packages/base/versions/3.6.2/topics/as.POSIX*
Add parameters as needed to use it in R-shiny.
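Since the question asks for a single function that takes the lm object, the two answers above could be wrapped as in the sketch below. The function name plot_residuals_in_order and its time argument are hypothetical (they do not come from car or any other package), and the data frame is simulated for illustration. It assumes no rows were dropped for missing values, so that the residuals line up with the input rows.

```r
# Hypothetical helper: plots residuals of a fitted lm in input order,
# or against a user-supplied time vector of the same length
plot_residuals_in_order <- function(model, time = NULL,
                                    xlab = "Observation order", ...) {
  res <- resid(model)
  if (is.null(time)) time <- seq_along(res)
  stopifnot(length(time) == length(res))
  plot(time, res, xlab = xlab, ylab = "Residuals", ...)
  abline(h = 0, lty = 2)
  # Return the plotted values invisibly so they can be reused
  invisible(data.frame(time = time, residual = unname(res)))
}

# Simulated example data (assumption, stands in for the real time series)
set.seed(1)
data <- data.frame(X1 = rnorm(50), X2 = rnorm(50), X3 = rnorm(50))
data$Y <- 1 + 2 * data$X1 - data$X2 + rnorm(50)

linearmodel <- lm(Y ~ X1 + X2 + X3, data = data)
out <- plot_residuals_in_order(linearmodel, main = "Residuals")
```

In a Shiny app, a call like this would typically sit inside renderPlot(); passing a Date or POSIXct vector as time puts the actual timescale on the X-axis.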

R generalized hyperbolic distribution plot line in hist

I have the following code:
x.grid <- seq(-0.05,to=0.05,by=0.001)
hist(df$dataset.d.return.aex, breaks=100, main='daily returns')
Which results in the following plot:
Now, I want to draw the implied marginal distributions in histograms of the daily returns, where I use a generalized hyperbolic distribution. I tried to do it with the following code:
ghypuv <- fit.ghypuv(data = returns[,"d.return.aex"], symmetric = TRUE)
lines(ghypuv, col="blue")
However, this results in the following plot:
So, my question is, how to properly draw the implied marginal distributions (using a generalized hyperbolic distribution) in histograms of the daily returns?
Try hist(..., freq=FALSE) to use density rather than number of counts on the y-axis.
(This is probably a duplicate ...)
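To illustrate why freq=FALSE matters: a density curve integrates to 1, so it only lines up with a histogram whose bars are also on the density scale. The sketch below uses simulated returns and a normal density purely as stand-ins (both are assumptions); a fitted generalized hyperbolic density evaluated on x.grid would be drawn with lines() in the same way.

```r
set.seed(1)
# Hypothetical stand-in for df$dataset.d.return.aex
returns <- rnorm(1000, mean = 0, sd = 0.012)

x.grid <- seq(-0.05, to = 0.05, by = 0.001)

# freq = FALSE puts density, not counts, on the y-axis
h <- hist(returns, breaks = 100, freq = FALSE, main = "daily returns")

# Any density evaluated on x.grid can now be overlaid on the same scale;
# the normal density here is only a placeholder for the fitted model
lines(x.grid, dnorm(x.grid, mean(returns), sd(returns)), col = "blue")

# The bar areas sum to 1, which is what makes the overlay comparable
sum(h$density * diff(h$breaks))
```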

Specificity of ROC curve plotting in reverse direction

I wish to plot the ROC curve for a SVM classifier I have built but when I plot my data, the x axis (specificity) is plotting from 1.0 -> -1.0, see the image below.
In order to plot this I used the following:
plot(roc(predictor = fit.down.Kernel$pred$Overshooting, response = fit.down.Kernel$pred$obs))
where fit.down.Kernel is my model, Overshooting is the target feature I wish to predict.
Obviously I have gone about this the wrong way, can anyone point me in the right direction please?
Ultimately I have a bunch of models which I have trained using a variety of different datasets (upsampled, downsampled...) and I wish to visually compare their performance using the ROC curve. I guess I need to get the axis working properly before proceeding to multiple plots.
You can use the ROCR package in R. Adapt the code below to your predictions and actual results.
prob.mod1, prob.mod2 and prob.mod3 are the predictions from the various models, and y.test is your actual Overshooting.
Use the prediction function from ROCR:
library(ROCR)
prediction.mod1 <- prediction(prob.mod1, y.test)
prediction.mod2 <- prediction(prob.mod2, y.test)
prediction.mod3 <- prediction(prob.mod3, y.test)
Calculating AUC
auc.mod1 <- performance(prediction.mod1, "auc")@y.values
auc.mod2 <- performance(prediction.mod2, "auc")@y.values
auc.mod3 <- performance(prediction.mod3, "auc")@y.values
Plot ROC curves
plot(performance(prediction.mod1, "tpr", "fpr"))
plot(performance(prediction.mod2, "tpr", "fpr"), col=2, add=TRUE)
plot(performance(prediction.mod3, "tpr", "fpr"), col=3, add=TRUE)

How to extract average ROC curve predictions using ROCR?

The ROCR library in R offers the ability to plot an average ROC curve (right from the ROCR reference manual):
library(ROCR)
# plot ROC curves for several cross-validation runs (dotted
# in grey), overlaid by the vertical average curve and boxplots
# showing the vertical spread around the average.
data(ROCR.xval)
pred <- prediction(ROCR.xval$predictions, ROCR.xval$labels)
perf <- performance(pred,"tpr","fpr")
plot(perf,col="grey82",lty=3)
plot(perf,lwd=3,avg="vertical",spread.estimate="boxplot",add=TRUE)
Lovely. Unfortunately, there's seemingly no ability to obtain the average ROC curve itself as an object/dataframe/etc. for further statistical testing (say, with pROC). I did do some research (albeit perhaps after the fact), and I found this post:
Global variables in R
Looking through ROCR's code reveals the following lines for passing a result to a plot:
performance_plots.R, (starting at line 451)
## compute average curve
perf.avg <- perf.sampled
perf.avg@x.values <- list( rowMeans( data.frame( perf.avg@x.values)))
perf.avg@y.values <- list(rowMeans( data.frame( perf.avg@y.values)))
perf.avg@alpha.values <- list( alpha.values )
So, using the trace function I looked up here (General suggestions for debugging in R):
trace(.performance.plot.horizontal.avg, edit=TRUE)
I added the following line to the performance_plots.R after the lines listed above:
perf.rocr.avg <<- perf.avg # note the double `<<`
A horrible hack, yet it works as I can plot perf.rocr.avg without a problem. Unfortunately, when using pROC, I can't compare my averaged ROC curve because it requires a pROC roc object. That's fine, but the catch is that the pROC roc object requires the original prediction and reference data to create. As far as I can tell, ROCR is averaging the ROC curves themselves and not the predictions, so it seems I can't get what I want out of ROCR.
Is there a way to reverse-engineer the predictions from the averaged ROC curve created by ROCR?
I ran into the same problem. As I see it, the average ROC generated by the ROCR package is just a set of numeric values; other statistical attributes (e.g. confidence intervals) are missing. That means statistics computed on the average ROC may not be meaningful, and that is why a roc object can't be generated from a (tpr, fpr) list in the pROC package. However, I found a paper that addresses this problem, i.e., the comparison between average ROCs. The title is "The average area under correlated receiver operating characteristic curves: a nonparametric approach based on generalized two-sample Wilcoxon statistics". I hope that's helpful.
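If the goal is only to have the averaged curve as a data frame (without patching ROCR through trace), vertical averaging can be sketched in base R as below. This is not ROCR's internal code: roc_points, the simulated folds and the 0.01 grid are all assumptions for illustration. Each fold's TPR is interpolated onto a common FPR grid and the interpolated values are averaged. As noted above, the result is a curve and not a set of predictions, so it still cannot be turned into a pROC roc object.

```r
# ROC points for one fold: sweep the threshold over the sorted scores
roc_points <- function(scores, labels) {
  labels <- labels[order(scores, decreasing = TRUE)]
  list(
    fpr = c(0, cumsum(labels == 0) / sum(labels == 0)),
    tpr = c(0, cumsum(labels == 1) / sum(labels == 1))
  )
}

# Simulated cross-validation folds (assumption, stands in for real runs)
set.seed(42)
folds <- lapply(1:5, function(i) {
  labels <- rbinom(200, 1, 0.5)
  scores <- labels + rnorm(200)
  roc_points(scores, labels)
})

# Common FPR grid; at tied FPR values keep the highest TPR
fpr_grid <- seq(0, 1, by = 0.01)
tpr_mat <- sapply(folds, function(r) {
  approx(r$fpr, r$tpr, xout = fpr_grid, ties = max)$y
})

# Vertical average: mean TPR across folds at each grid FPR
avg_roc <- data.frame(fpr = fpr_grid, tpr = rowMeans(tpr_mat))

plot(avg_roc$fpr, avg_roc$tpr, type = "l", lwd = 3,
     xlab = "False positive rate", ylab = "Average true positive rate")
```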

How to Adjust restricted cubic spline cox model using rms package?

I am trying to plot a restricted cubic spline model using the rms package. However I don't find any way to adjust my cox proportional hazard model, I can only get the unadjusted fit.
Here is my code:
library(survival)
library(rms)
dd <- datadist(Cox9)
options(datadist="dd")
fit_vino <- cph(Surv(follcox,evento) ~ rcs(G_VINO,3))
plot(Predict(fit_vino), lty=1, lwd=3, ylim=c(-0.5,1.0),xlim = c(0,50), col="white")
With this coding I get the unadjusted spline model.
I wondered how can I add the confounding variables to adjust the model.
I tried:
fit_vino_adj <- cph(Surv(follcox,evento) ~rcs(G_VINO+edad0+actfis+energia))
plot(Predict(fit_vino_adj), lty=2, lwd=2)
But that gives me the spline model of each variable separately. Does anyone have an idea how I can adjust my model?
Since you failed to include the data in Cox9 or show how one might construct a similar dataframe or show any output, we can only guess at what happened and respond in generalities. It appears that you are bundling the variables within the rcs function. That is unlikely to succeed, or if it does succeed seems likely that the results will be incorrect. Instead you should construct this fit and then plot only the adjusted fit of the curve you are interested in by naming the variable of focus in the Predict-call.
fit_vino_adj <- cph(Surv(follcox,evento) ~ rcs(G_VINO, 3)+edad0+actfis+energia)
plot(Predict(fit_vino_adj, name="G_VINO"), lty=2, lwd=2)
Or perhaps (assuming these are all continuous measurements) make the very slightly modified plotting call after:
fit_vino_adj2 <- cph(Surv(follcox,evento) ~ rcs(G_VINO, 3)+rcs(edad0, 3) +
rcs(actfis, 3) + rcs(energia, 3) )
plot(Predict(fit_vino_adj2), lty=2, lwd=2) # to see form of all variable fits.
If you want to have two or more rcs splines in the model, then you need to wrap rcs around the other variables separately. I did not think that the rcs function was like the ^ function, which has a formula expansion method. (Although your claim that you got separate output from that second model makes me wonder if I have completely kept up with that package.) If you wanted a complex surface for what I call "crossed-splines", then you would use the * operator between two rcs calls. Crossing with a factor variable will construct individual rcs-spline fits for each level of the factor.
