smooth.Pspline wrapper for stat_smooth (in ggplot2)

Sorry if this question is trivial, but I'm trying to figure out how to plot a certain type of natural cubic spline (NCS) in R and it's completely eluded me.
In a previous question I learned how to plot the NCS generated by the ns() command in ggplot, but I'm interested in how to plot a slightly different NCS generated by the smooth.Pspline command in the pspline package. As far as I know, this is the only package that automatically selects the proper smoothing penalty by CV for a given dataset.
Ideally I would be able to provide smooth.Pspline as a method to a stat_smooth layer in ggplot2. My current code looks like this:
plot <- ggplot(data_plot, aes(x=age, y=wOBA, color=playerID, group=playerID))
plot <- plot + stat_smooth(method = lm, formula = y~ns(x,4),se=FALSE)
I'd like to replace the lm method with smooth.Pspline's functionality. I did a bit of googling and found a solution, written by Hadley, for the very similar B-spline function smooth.spline, but I haven't managed to adapt it perfectly to smooth.Pspline. Does anyone have experience with this?
Thanks so much!

You simply need to inspect how predict.smooth.Pspline returns the predicted values.
In the internal workings of stat_smooth, predictdf is called to create the smoothed line. predictdf is an internal (non-exported) function of ggplot2 (it is defined in the ggplot2 source), and it is a standard S3 generic.
sm.spline returns an object of class smooth.Pspline, so for stat_smooth to work you need to create a predictdf method for class smooth.Pspline.
The following will work:
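library(ggplot2)
library(pspline)
# wrapper used as the `method` of stat_smooth: fits a P-spline with sm.spline()
# extra arguments passed by stat_smooth land in `...` and are ignored here
smP <- function(formula, data, ...){
  M <- model.frame(formula, data)   # column 1 = response (y), column 2 = predictor (x)
  sm.spline(x = M[,2], y = M[,1])
}
# an S3 method for predictdf (called within stat_smooth)
predictdf.smooth.Pspline <- function(model, xseq, se, level) {
  pred <- predict(model, xseq)        # predict.smooth.Pspline returns a matrix
  data.frame(x = xseq, y = c(pred))   # flatten to a vector; se and level are not used
}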
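To see the S3 dispatch on its own, without ggplot2 or pspline installed, here is a base-R sketch. predict_df is a hypothetical stand-in for ggplot2's internal predictdf generic, and the method is written for base R's smooth.spline class instead of smooth.Pspline; the main difference is that predict.smooth.spline returns a list, so the method pulls out $y rather than flattening a matrix with c().

```r
# Hypothetical generic mimicking ggplot2's internal predictdf(model, xseq, se, level)
predict_df <- function(model, xseq, se, level) UseMethod("predict_df")

# Method for objects of class "smooth.spline" (from stats::smooth.spline).
# predict.smooth.spline() returns list(x = ..., y = ...), so take $y.
predict_df.smooth.spline <- function(model, xseq, se, level) {
  pred <- predict(model, xseq)
  data.frame(x = xseq, y = pred$y)
}

# Fit a smoothing spline (penalty chosen by GCV, the default) on a built-in dataset
fit <- smooth.spline(cars$speed, cars$dist)

# Grid of x values, like the one stat_smooth builds internally
xseq <- seq(min(cars$speed), max(cars$speed), length.out = 80)

# UseMethod() dispatches on class(fit), i.e. "smooth.spline"
out <- predict_df(fit, xseq, se = FALSE, level = 0.95)
head(out)
```

The same lookup is what lets a predictdf.smooth.Pspline defined in your workspace be picked up for sm.spline fits.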
An example, with a P-spline fitted using mgcv::gam for comparison. mgcv is awesome and gives great flexibility in fitting methods and smoothing-spline choices (although not CV, only GCV/UBRE/REML/ML):
d <- ggplot(mtcars, aes(qsec, wt))
d + geom_point() + stat_smooth(method = smP, se= FALSE, colour='red', formula = y~x) +
stat_smooth(method = 'gam', colour = 'blue', formula = y~s(x,bs='ps'))
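Outside ggplot2, the smoothness-selection criterion mentioned above is chosen through the method argument of mgcv::gam. A small sketch on the same mtcars variables, comparing the default "GCV.Cp" with "REML"; the comparison is only illustrative and no particular result is implied.

```r
library(mgcv)

# Same P-spline basis (bs = "ps") as the blue curve, two selection criteria
fit_gcv  <- gam(wt ~ s(qsec, bs = "ps"), data = mtcars, method = "GCV.Cp")
fit_reml <- gam(wt ~ s(qsec, bs = "ps"), data = mtcars, method = "REML")

# Effective degrees of freedom of the smooth term under each criterion
edf <- c(GCV = summary(fit_gcv)$edf, REML = summary(fit_reml)$edf)
edf
```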

Related

Time-dependent covariates: is there something wrong with this code? (R program)

I am checking the proportional hazards assumption of a few of my multivariate Cox regression analyses with time-dependent covariates, using the survival package. The question looks at survival in groups with different ADAMTS13 levels (a type of enzyme).
Could I check if something is wrong with my code itself? It keeps saying Error in tt(TMAdata$ADAMTS13level.f) : could not find function "tt" . Why?
Notably, ADAMTS13level.f is a factor variable.
cox_multivariate_survival_ADAMTS13 <- coxph(Surv(TMAdata$Daysalive, TMAdata$'Dead=1')
~TMAdata$ADAMTS13level.f
+TMAdata$`Age at diagnosis`
+TMAdata$CCIwithoutage
+TMAdata$Gender.f
+TMAdata$`Peak Creatinine`
+TMAdata$DICorcrit.f,
tt(TMAdata$ADAMTS13level.f),
tt = function(x, t, ...)
{mtrx <- model.matrix(~x)[,-1]
mtrx * log(t)})
Thanks. I'd like to start with the fundamentals (my actual code, or typos); I have tried different permutations, to no avail so far.
@Limey was on the right track!
The time-transformed version of ADAMTS13level.f needs to be added to the model formula, instead of being passed as a separate argument to coxph(...).
The form of the coxph call for testing time-dependent categorical variables is described in "How to use the timeSplitter" by Max Gordon.
Other helpful documentation:
coxph - fit proportional hazards regression model
cox_multivariate_survival_ADAMTS13 <-
coxph(
Surv(
Daysalive,
`Dead=1`
) ~
ADAMTS13level.f
+ `Age at diagnosis`
+ CCIwithoutage
+ Gender.f
+ `Peak Creatinine`
+ DICorcrit.f
+ tt(ADAMTS13level.f),
tt = function(x, t, ...) {
mtrx <- model.matrix(~x)[,-1]
mtrx * log(t)
},
data = TMAdata
)
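The same pattern can be tried on data that ships with the survival package. This sketch substitutes the veteran dataset (columns time, status, age and the four-level factor celltype) for TMAdata, so the variable names are stand-ins, not the original study's.

```r
library(survival)

# tt(celltype) sits inside the formula; the tt function turns the factor into
# dummy columns (reference level dropped) and multiplies them by log(t)
fit_tt <- coxph(
  Surv(time, status) ~ celltype + age + tt(celltype),
  tt = function(x, t, ...) {
    mtrx <- model.matrix(~x)[, -1]
    mtrx * log(t)
  },
  data = veteran
)
summary(fit_tt)
```

Significant tt(...) coefficients would point to a hazard ratio for that cell type that changes with log(time).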
P.S. With the original data there was also a problem: Daysalive included a zero (0) value, which eventually resulted in an 'infinite predictor' error from coxph, most likely because the tt function transforms the data with log(t), and log(0) is -Inf. (https://rdrr.io/github/therneau/survival/src/R/coxph.R)
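A minimal pre-fit check, assuming the survival times sit in a numeric vector (the values below are made up for illustration), flags non-positive times before they reach log(t). What to do with them, such as dropping or shifting them, is a modelling choice the answer does not address.

```r
# Illustrative survival times, including a zero like the one in the original data
Daysalive <- c(0, 3, 15, 42, 120)

log(Daysalive)                 # the first element is -Inf

# Indices of times that would make log(t) non-finite
bad <- which(Daysalive <= 0)
if (length(bad) > 0) {
  message("Non-positive survival times at rows: ", paste(bad, collapse = ", "))
}
```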

Worm plot residuals graph in ggplot2

I'm trying to draw a worm plot of the residuals of a model fitted with the gamlss function from the gamlss package. The graph I'm after looks like the one below:
Below is my code, which uses the wormplot_gg function from the childsds package on a dataset that ships with R. However, the result does not look like the example shown above.
library(ggplot2)
library(gamlss)
library(childsds)
head(Orange)
Dados <- Orange
Model <- gamlss(circumference~age, family=NO,data=Dados); Model
wp(Model)
wormplot_gg(m = Model)
Below are the traditional results via the wp function in the gamlss package.
And finally, here are the results from the wormplot_gg function in the childsds package. However, as already described, this one does not look the way I want, that is, with the visual structure of the first figure.
Use qqplotr (https://aloy.github.io/qqplotr/index.html) with the detrend = TRUE option:
library(ggplot2)
library(qqplotr)
set.seed(1)
df <- data.frame(z = rnorm(50))
ggplot(df, aes(sample = z)) +
stat_qq_point(detrend = TRUE) +
stat_qq_band(detrend = TRUE, color = 'black', fill = NA, size = 0.5)
You can also add geom_hline(yintercept = 0) as a reference line.
Edit:
To use this with a gamlss model, you first have to extract the normalized (randomized) quantile residuals from the model, which for gamlss is done simply with the function residuals. So you can just do, e.g., df <- data.frame(z = residuals(Model)) and then continue with the rest of the code.
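For reference, a worm plot is essentially a detrended normal QQ plot: each residual's sample quantile minus its theoretical normal quantile is plotted against the theoretical quantile, which is what detrend = TRUE does in qqplotr. A base-R sketch of those coordinates, using simulated residuals in place of residuals(Model):

```r
set.seed(1)
z <- rnorm(50)                      # stand-in for residuals(Model)

# qqnorm returns x = theoretical quantiles, y = the sample values (original order)
qq <- qqnorm(z, plot.it = FALSE)

worm <- data.frame(
  theoretical = qq$x,
  deviation   = qq$y - qq$x         # vertical distance from the identity line
)

# Draw only in an interactive session (avoids writing Rplots.pdf in scripts)
if (interactive()) {
  plot(worm$theoretical, worm$deviation,
       xlab = "Unit normal quantile", ylab = "Deviation")
  abline(h = 0, lty = 2)
}
```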

R One-Class SVM - Get Probabilistic outputs

I am trying to find a way to derive probabilistic outputs when predicting from a one-class SVM in R. I know this is not supported in libsvm, and I also know this question has been asked before, a couple of years ago here on SO, but packages were not available at that time. I'm hoping things have changed now! This question is also still valid because no approach implemented in R was given as a solution.
I could not find a package to do this so I tried two approaches myself to get around this:
Get the decision values and transform them through the use of the sigmoid activation function. This is described in this paper. Note the paragraph:
Furthermore, SVMs can also produce class probabilities as output instead of class labels. This can be done by an improved implementation (Lin, Lin, and Weng 2001) of Platt’s a posteriori probabilities (Platt 2000) where a sigmoid function is fitted to the decision values f of the binary SVM classifiers, A and B being estimated by minimizing the negative log-likelihood function
Use a logistic regression function on the predicted output and derive the probabilities from it. This approach was first described by Platt and an approach is outlined here
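Platt's method fits P(y = 1 | f) = 1 / (1 + exp(A f + B)) to the decision values f against the true class labels, which is a one-predictor logistic regression. A minimal base-R sketch under simplifying assumptions: the decision values are simulated, the labels are true classes (not SVM predictions), and Platt's target smoothing and held-out fitting are omitted.

```r
set.seed(42)
n <- 200
y <- rbinom(n, 1, 0.5)                            # true labels (0/1)
f <- ifelse(y == 1, 1, -1) + rnorm(n, sd = 1.2)   # noisy simulated decision values

# Logistic regression of true label on decision value: logit P(y=1) = a + b*f
platt <- glm(y ~ f, family = binomial())

# In Platt's notation, 1 / (1 + exp(A*f + B)) with A = -b and B = -a
A <- -coef(platt)[["f"]]
B <- -coef(platt)[["(Intercept)"]]

f_new <- c(-2, 0, 2)
p_new <- 1 / (1 + exp(A * f_new + B))
p_new
```

The key design point is the single predictor: the sigmoid is fitted to the decision values, not to the original features.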
My problem is checking whether either of my two solutions is plausible. To do that, I tested both approaches on a two-class SVM problem, because e1071 (using libsvm) gives probabilities for two-class problems, so those were taken as the 'truth'. I found that neither of my approaches aligned closely with libsvm.
Here are three graphs showing the resulting probabilities versus the known decision values.
I think my Platt approach is theoretically more sound but, as can be seen from the graph, it appears the logistic regression was somehow too good, the probabilities associated with either classification being extremely close to 1 for positive and 0 for negative.
My code for the Platt implementation is
platt_scale <- function(oc_svm, X){
# Get SVM predictions
y_pred <- predict(oc_svm$best.model,X)
#y_pred <- as.factor(ifelse(y_pred==T,"pos","neg"))
# Train using logistic regression with cross-validation
require(caret)
model <- train(x = X,
y = y_pred,
method = "glm",
family=binomial(),
trControl = trainControl(method = "cv",
number = 5),
control = list(maxit = 50) #BROUGHT IN TO STOP WARNING MESSAGES
)
return(predict(model,
newdata = X,
type = "prob")[,1])
}
I get the following warning when this runs
glm.fit: fitted probabilities numerically 0 or 1 occurred
So I am clearly doing something wrong! I feel like fixing this function is probably the best approach but I don't see where I have gone wrong? I am following the approach I mentioned earlier, here
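One likely source of that warning: the glm is fitted to y_pred, which is itself a deterministic function of X, and setosa vs versicolor are linearly separable in the iris features, so the logistic regression can separate the labels perfectly and push the fitted probabilities to 0 or 1. A tiny base-R reproduction of perfect separation on toy data (not the iris fit):

```r
x <- 1:10
y <- as.integer(x > 5)    # label is a step function of x: perfectly separable

warns <- character(0)
fit <- withCallingHandlers(
  glm(y ~ x, family = binomial()),
  warning = function(w) {
    warns <<- c(warns, conditionMessage(w))   # record the warning text
    invokeRestart("muffleWarning")
  }
)
warns
range(fitted(fit))        # fitted probabilities pushed to ~0 and ~1
```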
I get the sigmoid of the decision values as follows
sig_mult <-e1071::sigmoid(decision_values)
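Note that a plain sigmoid of the decision values involves no fitting: e1071::sigmoid(x) computes 1 / (1 + exp(-x)), which is Platt's curve with A fixed at -1 and B at 0, so nothing is calibrated to the data. In base R the same transform is plogis(); a small check:

```r
dec <- c(-3, -1, 0, 1, 3)                 # example decision values

sig_manual <- 1 / (1 + exp(-dec))         # logistic sigmoid, as in e1071::sigmoid
sig_plogis <- plogis(dec)                 # base R logistic CDF: the same function

# Platt form 1 / (1 + exp(A*f + B)) with fixed A = -1, B = 0
A <- -1; B <- 0
sig_platt <- 1 / (1 + exp(A * dec + B))

all.equal(sig_manual, sig_plogis)
all.equal(sig_manual, sig_platt)
```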
The examples use the iris dataset; the full code is below:
library(e1071)
data(iris)
two_class<-iris[iris$Species %in% c("setosa","versicolor"),]
#Make Two-class SVM
svm_mult<-e1071::tune(svm,
train.x = two_class[,1:4],
train.y = factor(two_class[,5],levels=c("setosa", "versicolor")),
type="C-classification",
kernel="radial",
gamma=0.05,
cost=1,
probability = T,
tunecontrol = tune.control(cross = 5))
#Get related decision values
dec_vals_mult <-attr(predict(svm_mult$best.model,
two_class[,1:4],
decision.values = T #use decision values to get score
), "decision.values")
#Get related probabilities
prob_mult <-attr(predict(svm_mult$best.model,
two_class[,1:4],
probability = T #request class probabilities
), "probabilities")[,1]
#transform decision values using sigmoid
sig_mult <-e1071::sigmoid(dec_vals_mult)
#Use Platt Implementation function to derive probabilities
platt_imp<-platt_scale(svm_mult,two_class[,1:4])
require(ggplot2)
data2<-as.data.frame(cbind(dec_vals_mult,sig_mult))
names(data2)<-c("Decision.Values","Sigmoid.Decision.Values(Prob)")
sig<-ggplot(data=data2,aes(x=Decision.Values,
y=`Sigmoid.Decision.Values(Prob)`,
colour=ifelse(Decision.Values<0,"neg","pos")))+
geom_point()+
ylim(0,1)+
theme(legend.position = "none")
data3<-as.data.frame(cbind(dec_vals_mult,prob_mult))
names(data3)<-c("Decision.Values","Probabilities")
actual<-ggplot(data=data3,aes(x=Decision.Values,
y=Probabilities,
colour=ifelse(Decision.Values<0,"neg","pos")))+
geom_point()+
ylim(0,1)+
theme(legend.position = "none")
data4<-as.data.frame(cbind(dec_vals_mult,platt_imp))
names(data4)<-c("Decision.Values","Platt")
plat_imp<-ggplot(data=data4,aes(x=Decision.Values,
y=Platt,
colour=ifelse(Decision.Values<0,"neg","pos")))+
geom_point()+
ylim(0,1)
require(ggpubr)
ggarrange(actual, plat_imp, sig,
labels = c("Actual", "Platt Implementation", "Sigmoid Transformation"),
ncol = 3,
label.x = -.05,
label.y = 1.001,
font.label = list(size = 8.5, color = "black", face = "bold", family = NULL),
common.legend = TRUE, legend = "bottom")

Package ‘neuralnet’ in R, rectified linear unit (ReLU) activation function?

I am trying to use activation functions other than the pre-implemented "logistic" and "tanh" in the R package neuralnet. Specifically, I would like to use rectified linear units (ReLU) f(x) = max{x,0}. Please see my code below.
I believe I can use custom functions if defined by (for example)
custom <- function(x) {x*2}
but if I set max(x,0) instead of x*2 then R tells me that 'max is not in the derivatives table', and same for '>' operator. So I am looking for a sensible workaround as I am thinking numerical integration of max in this case wouldn't be an issue.
nn <- neuralnet(
as.formula(paste("X",paste(names(Z[,2:10]), collapse="+"),sep="~")),
data=Z[,1:10], hidden=5, err.fct="sse",
act.fct="logistic", rep=1,
linear.output=TRUE)
Any ideas? I am a bit confused as I didn't think the neuralnet package would do analytical differentiation.
The internals of the neuralnet package will try to differentiate any function provided to act.fct. You can see the source code here.
At line 211 you will find the following code block:
if (is.function(act.fct)) {
act.deriv.fct <- differentiate(act.fct)
attr(act.fct, "type") <- "function"
}
The differentiate function is a more complex use of the deriv function, which you can also see in the source code above. Therefore, it is currently not possible to provide max(0,x) to act.fct. It would require an exception in the code that recognizes the ReLU and knows its derivative. It would be a great exercise to get the source code, add this in and submit it to the maintainers for inclusion (but that may be a bit much).
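The limitation comes from stats::deriv, which only knows a fixed table of functions. A quick check in base R, independent of neuralnet, shows it rejecting max() while differentiating the softplus expression without trouble; the derivative of softplus is the logistic function plogis(x).

```r
# deriv() has no rule for max(), so symbolic differentiation fails
err <- tryCatch(
  deriv(~ max(x, 0), "x"),
  error = function(e) conditionMessage(e)
)
err

# softplus log(1 + exp(x)) is built from log and exp, which deriv() supports
d_soft <- deriv(~ log(1 + exp(x)), "x", function.arg = TRUE)
x <- c(-2, 0, 2)
grad <- attr(d_soft(x), "gradient")   # n x 1 matrix of dy/dx
cbind(grad, plogis(x))                # analytic derivative is plogis(x)
```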
However, regarding a sensible workaround, you could use softplus function which is a smooth approximation of the ReLU. Your custom function would look like this:
custom <- function(x) {log(1+exp(x))}
You can view this approximation in R as well:
softplus <- function(x) log(1+exp(x))
relu <- function(x) pmax(0, x)
x <- seq(from=-5, to=5, by=0.1)
library(ggplot2)
library(reshape2)
fits <- data.frame(x=x, softplus = softplus(x), relu = relu(x))
long <- melt(fits, id.vars="x")
ggplot(data=long, aes(x=x, y=value, group=variable, colour=variable))+
geom_line(linewidth = 1) +
ggtitle("ReLU & Softplus") +
theme(plot.title = element_text(size = 26)) +
theme(legend.title = element_blank()) +
theme(legend.text = element_text(size = 18))
You can approximate the max function with a differentiable function, such as:
custom <- function(x) {x/(1+exp(-2*k*x))}
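The constant k determines the accuracy of the approximation (larger k gives a sharper kink at 0); it has to be defined, e.g. k <- 10, before the function is called.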
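To get a feel for k, the sketch below measures the largest gap between x / (1 + exp(-2kx)) and max(0, x) on a grid for a few values of k; the error shrinks roughly in proportion to 1/k. The grid and the k values are arbitrary choices, and k is passed as an argument here instead of being a free variable.

```r
relu <- function(x) pmax(0, x)
smooth_relu <- function(x, k) x / (1 + exp(-2 * k * x))   # same formula, explicit k

x <- seq(-5, 5, by = 0.001)
ks <- c(1, 5, 25)

# Largest absolute difference from the exact ReLU for each k
max_err <- sapply(ks, function(k) max(abs(smooth_relu(x, k) - relu(x))))
data.frame(k = ks, max_abs_error = max_err)
```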
Other approximations can be derived from equations in section "Analytic approximations": https://en.wikipedia.org/wiki/Heaviside_step_function
A bit belated, but in case anyone else is still looking for an answer: here's how to incorporate the non-approximated ReLU function, by loading it from a package.
Note that while you could technically define the relu function yourself (with max() or if(x<0) etc.), this wouldn't work in the neural net package because it needs a differentiable function.
First, load the relu function from the sigmoid package, which is differentiable:
install.packages('sigmoid')
library(sigmoid)
relu   # print the function definition
Second, insert in your code
nn <- neuralnet(
as.formula(paste("X",paste(names(Z[,2:10]), collapse="+"),sep="~")),
data=Z[,1:10],
hidden=5, err.fct="sse",
act.fct=relu,
rep=1,
linear.output=TRUE)
I found this solution in another post, but can't for the life of me remember which one, so credits to unknown.

Editing variable labels for dotplot of (g)lmer object with R/Lattice

I'm using the glmer() function from the lme4 package to estimate complex mixed effects models with multiple random effects. After the model is estimated, I'm using the dotplot() function from the lattice package to create a dotplot of the random effects, which include varying slopes. My question concerns how to edit the variable labels in the dotplot.
A simple reproducible example of my problem goes as follows.
library(lattice)
library(lme4)
data(sleepstudy)
sleepstudy$x <- rnorm(180)
M1 <- lmer(Reaction ~ Days + x + (Days + x | Subject), sleepstudy)
dotplot(ranef(M1, condVar=TRUE), ylab="Levels", main=FALSE,
scales = list(x =list(relation = 'free')))[["Subject"]]
This produces the following dotplot, which approximates what I want.
What I would like to do is edit the variable labels. Namely, I'd like to remove the parentheses from "(Intercept)", and change the labels for both varying slopes to be something other than the variable names called into glmer(). Is that possible? I'm sure it is and that it's likely a "strip" option, though I'm not sure what exactly it would be.
I'd be open to a ggplot2 solution as well. Either/or is fine as long as it gets to what I want. In fact, a ggplot2 solution might be better if it allows me to adjust the bounds on the conditional variance to something like 1.645*se. I don't think lattice will give me that option.
Thanks for any help.
One simple way to do this, assuming there is no native renaming functionality, is to assign ranef(M1, condVar = TRUE) to an intermediate variable, which I call model. Then you can use colnames() to rename the labels.
Solution:
library(lattice)
library(lme4)
data(sleepstudy)
sleepstudy$x <- rnorm(180)
M1 <- lmer(Reaction ~ Days + x + (Days + x | Subject), sleepstudy)
model <- ranef(M1,condVar=TRUE)
colnames(model[[1]]) <- c("Intercept","Days","x") # Add your labels here
dotplot(model, ylab="Levels", main=FALSE,
scales = list(x =list(relation = 'free')))[["Subject"]]
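For the ggplot2 route the question mentions: in recent lme4 versions, as.data.frame() on a ranef object returns one row per group and term, with columns grpvar, term, grp, condval and condsd. The sketch below builds a small data frame of that shape by hand (the numbers are made up) so it runs without lme4, relabels the terms with hypothetical labels, and draws condval ± 1.645 × condsd; with a real model you would start from as.data.frame(ranef(M1, condVar = TRUE)) instead.

```r
# Toy stand-in for as.data.frame(ranef(M1, condVar = TRUE)); values are made up
re <- data.frame(
  grpvar  = "Subject",
  term    = factor(rep(c("(Intercept)", "Days", "x"), each = 3)),
  grp     = factor(rep(c("308", "309", "310"), times = 3)),
  condval = c(5, -10, 3, 1, -2, 0.5, 0.2, -0.1, 0.3),
  condsd  = c(8, 8, 8, 1.5, 1.5, 1.5, 0.6, 0.6, 0.6)
)

# Relabel terms: old name -> new label (labels are hypothetical examples)
new_labels <- c("(Intercept)" = "Intercept",
                "Days" = "Days of sleep deprivation",
                "x" = "Noise covariate")
re$term <- factor(unname(new_labels[as.character(re$term)]),
                  levels = unname(new_labels))

# 90% intervals: condval +/- 1.645 * conditional SD
re$lower <- re$condval - 1.645 * re$condsd
re$upper <- re$condval + 1.645 * re$condsd

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(re, aes(x = condval, y = grp)) +
    geom_segment(aes(x = lower, xend = upper, yend = grp)) +
    geom_point() +
    facet_wrap(~ term, scales = "free_x") +
    labs(x = NULL, y = "Levels")
  if (interactive()) print(p)
}
```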
Alternatively, you could coerce this to a data.frame() or data.table() and use this in ggplot2 as you mentioned.
