I'm trying to draw a worm plot of the residuals of a model fitted with the gamlss function from the gamlss package. The graph I'm interested in looks like the one below:
Below is my code using the wormplot_gg function from the childsds package, applied to a dataset that ships with R. However, the result does not look like the example shown above.
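library(ggplot2)
library(gamlss)
library(childsds)
head(Orange)
Dados <- Orange
# Normal (NO) model of tree circumference as a function of age
Model <- gamlss(circumference ~ age, family = NO, data = Dados); Model
# Worm plot from gamlss
wp(Model)
# Worm plot from childsds
wormplot_gg(m = Model)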
Below is the standard output of the wp function from the gamlss package.
Finally, here is the output of wormplot_gg from the childsds package. As described above, it does not have the visual structure of the first figure, which is what I am after.
You can use qqplotr (https://aloy.github.io/qqplotr/index.html) with the detrend = TRUE option:
library(ggplot2)
library(qqplotr)
set.seed(1)
df <- data.frame(z = rnorm(50))
ggplot(df, aes(sample = z)) +
  stat_qq_point(detrend = TRUE) +
  stat_qq_band(detrend = TRUE, color = 'black', fill = NA, size = 0.5)
You can also add geom_hline(yintercept = 0) to draw the zero reference line.
Edit:
To use this with a gamlss model, you first have to extract the normalized (randomized) quantile residuals from the model, which for gamlss is done simply with the residuals function. So you can just do, e.g., df <- data.frame(z = residuals(Model)) and then continue with the rest of the code.
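For readers who want to see what detrend = TRUE does, here is a minimal base-R sketch. A worm plot is a detrended normal Q-Q plot: each residual's deviation from the identity line is plotted against its unit normal quantile. The helper name worm_coords is hypothetical, z = rnorm(50) stands in for residuals(Model), and the dashed bands use the usual asymptotic standard error of a sample quantile, which is similar in spirit to, but not necessarily identical to, the bands drawn by wp() or qqplotr.

```r
# Hypothetical helper: worm-plot (detrended normal Q-Q) coordinates plus
# approximate pointwise bands for a vector of residuals z.
worm_coords <- function(z, level = 0.95) {
  qq <- qqnorm(z, plot.it = FALSE)                   # x: N(0,1) quantiles, y: the residuals
  p <- pnorm(qq$x)                                   # probabilities matching those quantiles
  se <- sqrt(p * (1 - p) / length(z)) / dnorm(qq$x)  # asymptotic s.e. of a sample quantile
  half <- qnorm((1 + level) / 2) * se
  out <- data.frame(theoretical = qq$x,
                    deviation = qq$y - qq$x,         # detrend: subtract the identity line
                    lower = -half,
                    upper = half)
  out[order(out$theoretical), ]                      # sort so the bands draw as lines
}

set.seed(1)
z <- rnorm(50)        # stand-in for residuals(Model)
wc <- worm_coords(z)

plot(wc$theoretical, wc$deviation,
     ylim = range(c(wc$deviation, wc$lower, wc$upper)),
     xlab = "Unit normal quantile", ylab = "Deviation")
lines(wc$theoretical, wc$lower, lty = 2)
lines(wc$theoretical, wc$upper, lty = 2)
abline(h = 0, col = "grey")
```

Points that wander outside the dashed bands, or show a systematic U or S shape, are the patterns a worm plot is meant to reveal.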
I am trying to write a function that produces a KM survival curve. I am going to use it in a Shiny app, which is why I want a function: I can then easily pass in arguments from a dropdown menu (the selection arrives as a string for the strata argument). Here is a simplified version of what I need:
survival_function <- function(data_x, strata_x="1"){
survFormula <- Surv(data_x$time, data_x$status)
my_survfit <- survfit(data=data_x, as.formula(paste("survFormula~", {{strata_x}})))
ggsurvplot(my_survfit, data = data_x, pval=T)
}
survival_function(inputdata, "strata_var")
I get an error:
Error in paste("survFormula1~", { : object 'strata_x' not found
I'm at a loss because
as.formula(paste("~", {{arg}}))
has worked in other functions I've written to produce plots using ggplot to easily change variables to facet by, but this doesn't even seem to recognize strata_x as an argument.
Your function needs a couple of tweaks to get it working with ggsurvplot. It would be best to create the Surv object as a new column in the data frame and use this column in your formula. You also need to make sure you have an actual symbolic formula as the $call$formula member of the survfit object, otherwise ggsurvplot will fail to work due to non-standard evaluation deep within its internals.
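library(survival)
library(survminer)
survival_function <- function(data_x, strata_x) {
  # store the Surv object as a column so the formula can refer to it by name
  data_x$s <- Surv(data_x$time, data_x$status)
  # build the formula from the strata string, e.g. "sex" gives s ~ sex
  survFormula <- as.formula(paste("s ~", strata_x))
  my_survfit <- survfit(survFormula, data = data_x)
  # put the evaluated formula into the stored call so ggsurvplot can use it
  my_survfit$call$formula <- survFormula
  ggsurvplot(my_survfit, data = data_x)
}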
We can test this on the included lung data set:
survival_function(lung, "sex")
Created on 2022-08-03 by the reprex package (v2.0.1)
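A note on {{ }}: the "embrace" operator only has special meaning inside functions that use rlang's tidy evaluation (such as ggplot2's aes()); inside base paste() it is just two pairs of braces, so {{strata_x}} simply evaluates to strata_x. The error comes instead from the unevaluated expression that survfit() stores in its $call. The sketch below reproduces that failure mode with lm() so it runs in base R, assuming (as the answer above explains) that ggsurvplot re-evaluates parts of the stored call outside your function. make_fit and make_fit_fixed are hypothetical names, and reformulate() is a base-R alternative to paste() plus as.formula().

```r
# Hypothetical functions; lm() stands in for survfit() so this runs in base R.
make_fit <- function(data, rhs) {
  # the call stored in fit$call keeps this expression unevaluated
  lm(as.formula(paste("y ~", rhs)), data = data)
}

make_fit_fixed <- function(data, rhs) {
  f <- reformulate(rhs, response = "y")   # e.g. rhs = "g" gives y ~ g
  fit <- lm(f, data = data)
  fit$call$formula <- f                   # store the evaluated formula in the call
  fit
}

set.seed(2)
d <- data.frame(y = rnorm(20), g = rep(c("a", "b"), 10))

broken <- make_fit(d, "g")
print(broken$call$formula)   # as.formula(paste("y ~", rhs))
# Re-evaluating the stored expression outside make_fit() fails: rhs no longer exists
msg <- tryCatch(eval(broken$call$formula, envir = globalenv()),
                error = function(e) conditionMessage(e))
print(msg)

fixed <- make_fit_fixed(d, "g")
print(eval(fixed$call$formula, envir = globalenv()))   # a real formula: y ~ g
```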
I am computing PCA components with preProcess() from caret in R and getting numeric results:
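library(caret)
# PCA on every column except the last (the outcome); keep components up to 95% of the variance
dataPCA <- preProcess(data[1:(ncol(data) - 1)], method = "pca", thresh = 0.95)
print(dataPCA)
print(dataPCA$rotation)
PCATrain <- predict(dataPCA, dataTrain[, 1:(ncol(dataTrain) - 1)])
PCATest <- predict(dataPCA, dataTest[, 1:(ncol(dataTest) - 1)])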
However, I'd like to plot the components and the variance explained, as you would with prcomp via plot(pca, type = "l"). Is this possible with preProcess(), or should I run prcomp()?
preProcess() doesn't store that information, so I'd use prcomp().
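A minimal sketch of the prcomp() route, using the built-in USArrests data as a stand-in for your predictors. Centering and scaling are turned on on the assumption that this mirrors what caret's "pca" step does (I believe preProcess centers and scales before PCA), and the 95% cut-off mirrors thresh = 0.95.

```r
# Sketch: variance explained with prcomp(); USArrests stands in for your
# predictors (all numeric, outcome column removed).
pca <- prcomp(USArrests, center = TRUE, scale. = TRUE)

var_explained <- pca$sdev^2 / sum(pca$sdev^2)   # proportion of variance per component
cum_var <- cumsum(var_explained)                # cumulative proportion
n_comp <- which(cum_var >= 0.95)[1]             # fewest components reaching 95%

print(summary(pca))        # standard deviations and proportions of variance
plot(pca, type = "l")      # scree plot of the component variances
plot(cum_var, type = "b", ylim = c(0, 1),
     xlab = "Component", ylab = "Cumulative proportion of variance")
abline(h = 0.95, lty = 2)

# Component scores, keeping only the first n_comp columns (like PCATrain above)
scores <- predict(pca, newdata = USArrests)[, seq_len(n_comp), drop = FALSE]
```

For your split, you would fit prcomp() on the training predictors and call predict(pca, newdata = ...) on the test predictors, so both use the training rotation.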
This is my first attempt at machine learning in R. I'm using a planet dataset (URL: https://www.kaggle.com/mrisdal/open-exoplanet-catalogue) and simply want to predict a planet's size from the size of its host star. This is the code I currently have, using nnet():
library(nnet)
#Organize data:
cols_to_keep <- c(1, 4, 21)
full_data <- na.omit(read.csv('Planet_Data.csv')[, cols_to_keep])
#Split data: sample half of the rows for training, keep the rest for testing
train_idx <- sample(nrow(full_data), round(nrow(full_data)/2))
train_data <- full_data[train_idx, ]
test_data <- full_data[-train_idx, ]
rownames(train_data) <- NULL
rownames(test_data) <- NULL
#nnet
nnet_attempt <- nnet(RadiusJpt ~ HostStarRadiusSlrRad, data = train_data, size = 0, linout = TRUE, skip = TRUE, maxNWts = 10000, trace = FALSE, maxit = 1000, decay = .001)
nnet_newdata <- predict(nnet_attempt, newdata = test_data)
nnet_newdata
When I print nnet_newdata I get a value for each row of my test data, but I don't really understand what these values mean. Is this a proper way to use nnet() for a simple regression?
Thanks
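When predict is called on an object of class nnet you get, by default (type = "raw"), the raw output of the fitted network applied to your new dataset. Because you set linout = TRUE, the output unit is linear, so these values are the predicted RadiusJpt for each row of test_data. If yours were instead a classification problem, you could use type = "class".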
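To check that reading of the output, here is a sketch on simulated data (the data frame d and its columns are hypothetical). With size = 0 and skip = TRUE the network has no hidden layer, only direct input-to-output weights plus a bias, and with linout = TRUE it is a linear model fitted by least squares, so its raw predictions should agree closely with lm(). The question's decay = .001 adds a small weight-decay penalty, which would shrink the fit slightly relative to lm().

```r
library(nnet)

set.seed(42)
n <- 200
d <- data.frame(x = runif(n, 0.5, 2))          # simulated, hypothetical data
d$y <- 0.3 + 0.8 * d$x + rnorm(n, sd = 0.05)

# No hidden units (size = 0) plus skip-layer weights and a linear output:
# the network is y = b + w * x, fitted by least squares.
fit_nn <- nnet(y ~ x, data = d, size = 0, skip = TRUE, linout = TRUE,
               trace = FALSE, maxit = 1000)
fit_lm <- lm(y ~ x, data = d)

newd <- data.frame(x = c(0.75, 1.5))
pred_nn <- predict(fit_nn, newdata = newd)     # type = "raw": a one-column matrix
pred_lm <- predict(fit_lm, newdata = newd)

print(cbind(nnet = pred_nn[, 1], lm = pred_lm))
```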
See here.
I want to compare GWR fits produced by spgwr and mgcv, but I get an error from the gam function of mgcv. Here is an example:
require(spgwr)
require(mgcv)
require(R2BayesX)
data(columbus)
col.bw <- gwr.sel(crime ~ income + housing, data=columbus,verbose=F,
coords=cbind(columbus$x, columbus$y))
col.gauss <- gwr(crime ~ income + housing, data=columbus,
coords=cbind(columbus$x, columbus$y),
bandwidth=col.bw, hatmatrix=TRUE)
#gwr fitting with Intercept
col.gam<-gam(crime ~s(x,y)+s(x,y)*income+s(x,y)*housing, data=columbus)#mgcv ERROR
b1<-bayesx(crime ~sx(x,y)+sx(x,y)*income+sx(x,y)*housing, data=columbus)#R2Bayesx ERROR
Questions:
1. How can I fit the same GWR model with the gam and bayesx functions (using smooth functions of location)?
2. How can I control the parameters so that the fits are as similar as possible, including the optimal bandwidth?
The mgcv error comes from the fact that you are specifying "interactions" between the spatial smooth and the variables income and housing. Read ?gam.models for details on using by terms. I think for this you need
col.gam <- gam(crime ~s(x,y, k = 5) + s(x,y, by = income, k = 5) +
s(x,y, by = housing, k = 5), data=columbus)
In this example, as there are only 49 observations, you need to restrict the dimensions of the basis functions, which I do here with k = 5, but you should investigate whether you need to vary these a little, within the constraints of the data.
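To see that this by-term model behaves like a GWR with a local intercept and local slopes, here is a sketch on simulated data (all names and the true coefficient surface are hypothetical). According to ?s, a smooth with a numeric by variable is usually not centred, so income is not added as a separate main effect. Note that smoothness here is chosen by mgcv's own criterion (GCV by default), not by a kernel bandwidth as in spgwr, and k = 30 is an arbitrary choice for n = 300.

```r
library(mgcv)

# Simulated, hypothetical data: locations (x, y), one covariate and a
# spatially varying income coefficient true_slope(x, y)
set.seed(10)
n <- 300
d <- data.frame(x = runif(n), y = runif(n), income = rnorm(n))
true_slope <- with(d, sin(2 * pi * x) * cos(pi * y))
d$crime <- with(d, 2 + x - y + true_slope * income + rnorm(n, sd = 0.2))

# Local intercept s(x, y) plus a local income slope s(x, y, by = income)
fit <- gam(crime ~ s(x, y, k = 30) + s(x, y, by = income, k = 30), data = d)

# The fitted local slope at each location is the change in prediction when
# income goes from 0 to 1 with (x, y) held fixed.
local_slope <- as.numeric(predict(fit, transform(d, income = 1)) -
                          predict(fit, transform(d, income = 0)))
print(cor(local_slope, true_slope))
```

The local_slope values play the same role as the per-location income coefficients reported by gwr(), which makes them a natural thing to compare across the two fits.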
By the looks of the error from bayesx, you have the same issue of specifying the model incorrectly. I'm not familiar with bayesx(), but it looks like it uses the same s() function as supplied with mgcv, so the model specification should be the same as I show above.
As for question 2, can you expand on what you mean? Do you want gam() and bayesx() to be comparable with each other, or one or both of them to be comparable with the spgwr model?
Sorry if this question is trivial, but I'm trying to figure out how to plot a certain type of natural cubic spline (NCS) in R and it's completely eluded me.
In a previous question I learned how to plot the NCS generated by the ns() command in ggplot, but I'm interested in plotting a slightly different NCS generated by the smooth.Pspline command in the pspline package. As far as I know this is the only package that automatically selects the proper smoothing penalty by CV for a given dataset.
Ideally I would be able to provide smooth.Pspline as a method to a stat_smooth layer in ggplot2. My current code is like:
library(ggplot2)
library(splines)  # provides ns()
plot <- ggplot(data_plot, aes(x = age, y = wOBA, color = playerID, group = playerID))
plot <- plot + stat_smooth(method = lm, formula = y ~ ns(x, 4), se = FALSE)
I'd like to replace the lm method with smooth.Pspline's functionality. I did a little googling and found a solution for the very similar B-spline function smooth.spline, written by Hadley, but I haven't been able to adapt it perfectly to smooth.Pspline. Does anyone have experience with this?
Thanks so much!
You simply need to inspect how predict.smooth.Pspline returns the predicted values.
Internally, stat_smooth calls predictdf to create the smoothed line. predictdf is an internal (non-exported) function of ggplot2 (it is defined here), and it is a standard S3 generic.
sm.spline returns an object of class smooth.Pspline, so for stat_smooth to work you need to create a predictdf method for class smooth.Pspline.
As such the following will work.
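library(ggplot2)
library(pspline)
# Fitting function passed to stat_smooth(method = ...): builds the model frame
# from the formula and fits a penalized spline with sm.spline()
smP <- function(formula, data, ...){
  M <- model.frame(formula, data)
  sm.spline(x = M[, 2], y = M[, 1])
}
# an S3 method for predictdf (called within stat_smooth)
predictdf.smooth.Pspline <- function(model, xseq, se, level) {
  pred <- predict(model, xseq)
  data.frame(x = xseq, y = c(pred))
}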
An example, with a P-spline fitted using mgcv::gam for comparison. mgcv is awesome and gives great flexibility in fitting methods and smoothing spline choices (although it offers GCV/UBRE/REML/ML rather than ordinary CV):
d <- ggplot(mtcars, aes(qsec, wt))
# red: pspline fit via smP and the predictdf method; blue: P-spline via mgcv::gam
d + geom_point() + stat_smooth(method = smP, se = FALSE, colour = 'red', formula = y ~ x) +
  stat_smooth(method = 'gam', colour = 'blue', formula = y ~ s(x, bs = 'ps'))
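If the main requirement is a smoothing penalty chosen by cross-validation, base R's smooth.spline() can also do this with cv = TRUE (ordinary leave-one-out CV; the default is GCV). This is a swapped-in technique, a cubic smoothing spline rather than smooth.Pspline, offered only as a sketch. The hypothetical helper smooth_grid fits one curve per group on simulated data and returns the same x/y data frame shape that the predictdf method above produces, so the curves can be precomputed instead of going through stat_smooth.

```r
# Hypothetical helper: fit a cubic smoothing spline whose penalty is chosen by
# ordinary leave-one-out CV (cv = TRUE) and evaluate it on a regular grid.
smooth_grid <- function(x, y, n = 80) {
  fit <- smooth.spline(x, y, cv = TRUE)
  xseq <- seq(min(x), max(x), length.out = n)
  data.frame(x = xseq, y = predict(fit, xseq)$y, lambda = fit$lambda)
}

# Simulated grouped data (hypothetical), one curve per group, like playerID
set.seed(3)
d <- data.frame(group = rep(c("a", "b"), each = 60), x = runif(120, 20, 40))
d$y <- sin(d$x / 4) + ifelse(d$group == "a", 0, 0.5) + rnorm(120, sd = 0.1)

curves <- do.call(rbind, lapply(split(d, d$group), function(s) {
  cbind(group = s$group[1], smooth_grid(s$x, s$y))
}))
head(curves)
```

With ggplot2 loaded, the result can be drawn with geom_line(data = curves, aes(x, y, colour = group)). smooth.spline() warns when cross-validating with tied x values, so real data with repeated ages may need a different approach.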