Change options of GAM plots individually? - r

Consider the code below to fit a GAM:
library(mgcv)
x1=runif(200)
x2=runif(200)
y=sin(x1)+x2^2+rnorm(200)
m = gam(y~s(x1)+s(x2))
Calling plot(m) draws each smooth term in a separate plot, so to put them on one page I found this code:
par(mfrow=c(1,2))
plot(m)
The plotted graph looks like this:
However, I cannot change the options of each plot individually; e.g. setting main="plot" changes the titles of both plots, and I need to title each plot differently. How can I set the options of each plot separately?

You can use the select argument of plot.gam:
select: Allows the plot for a single model term to be selected for
printing. e.g. if you just want the plot for the second
smooth term set select=2.
For example:
library(mgcv)
df <- data.frame(x1 = runif(200), x2 = runif(200))
df <- transform(df, y = sin(x1) + x2^2 + rnorm(200))
m <- gam(y~s(x1)+s(x2), data = df)
layout(matrix(1:2, ncol = 2))
plot(m, select = 1, main = "First smooth")
plot(m, select = 2, main = "Second smooth")
layout(1)
The resulting plot is shown below
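If there are more than a couple of smooths, the same idea can be wrapped in a loop. The following is only a sketch: the titles and shading flags are made-up illustrative values, the seed is added purely for reproducibility, and it assumes the fitted object stores one entry per smooth in m$smooth.

```r
library(mgcv)

set.seed(42)  # assumption: seed added only so the sketch is reproducible
df <- data.frame(x1 = runif(200), x2 = runif(200))
df$y <- sin(df$x1) + df$x2^2 + rnorm(200)
m <- gam(y ~ s(x1) + s(x2), data = df)

# Hypothetical per-term options, one entry per smooth, in model order
titles <- c("Smooth of x1", "Smooth of x2")
shade_flags <- c(TRUE, FALSE)

op <- par(mfrow = c(1, length(m$smooth)))  # remember the old settings
for (i in seq_along(m$smooth)) {
  # select = i draws only the i-th smooth, so each call can get its own options
  plot(m, select = i, main = titles[i], shade = shade_flags[i])
}
par(op)  # restore the previous graphics settings
```

Any other argument accepted by plot.gam can be varied per term in the same way.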

Related

How to customize x-axis range with shapviz

I'm plotting the shap values of my variables using the shapviz package. Specifically, I'm plotting a beeswarm plot using the sv_importance command, and dependence plots for the most important variables using sv_dependence. However, to make the results of different models more easily comparable, I would like to customize the x-axis range to make it equal for every plot. Do you have any suggestions about how to customize the axis range for shapviz objects? Here is a reproducible example:
library(shapviz)
set.seed(1)
X_train <- data.matrix(`colnames<-`(replicate(26, rnorm(100)), LETTERS))
dtrain <- xgboost::xgb.DMatrix(X_train, label = rnorm(100))
fit <- xgboost::xgb.train(data = dtrain, nrounds = 50)
shp <- shapviz(fit, X_pred = X_train)
p <- sv_importance(shp, kind = "beeswarm", show_numbers = TRUE, max_display = 15)
p
d <- sv_dependence(shp, v="I")
d
From here, how can I make the x-axis range in the plot p equal to [-2.0, 2.0] instead of [-1.0, 0.5] (as it is by default)?
p is a ggplot object, so you can add whatever scale you like. Just be aware that the "x" axis is actually the y axis, since coord_flip is used internally:
library(ggplot2)
p + scale_y_continuous(limits = c(-2, 2))
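To see why the y scale is the one to change, here is a small ggplot2-only sketch. It uses made-up toy data rather than real shapviz output, and it only illustrates how a flipped plot behaves; it does not reproduce shapviz internals.

```r
library(ggplot2)

set.seed(1)
# Toy stand-in for SHAP values (hypothetical data, not shapviz output)
toy <- data.frame(
  feature = rep(c("A", "B"), each = 50),
  shap = runif(100, min = -1, max = 0.5)
)

p_toy <- ggplot(toy, aes(x = feature, y = shap)) +
  geom_point(alpha = 0.5) +
  coord_flip()  # the y aesthetic is now drawn along the horizontal axis

# The horizontal axis is still controlled by the y scale
p_fixed <- p_toy + scale_y_continuous(limits = c(-2, 2))
print(p_fixed)
```

Note that scale limits discard values falling outside the range, which does not matter when the range is being widened as here.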

R: Survminer double graph

I am involved in a project where we are plotting survival curves for an event with a pretty low incidence, and the Kaplan-Meier curves (plotted using survminer) are pretty flat. I do not want to simply zoom in on the Y-axis as I think the incidence rates may then be misinterpreted by the reader. One way to show both the 'true' rate and zoom in on eventual small differences is to do it as NEJM does it:
https://www.nejm.org/na101/home/literatum/publisher/mms/journals/content/nejm/2011/nejm_2011.364.issue-9/nejmoa1007432/production/images/img_large/nejmoa1007432_f1.jpeg.
I have, however, not found a way to do this directly in survminer. For reproducibility's sake, I would like to avoid involving any Adobe software.
Does anyone know a way to get a small, zoomed in version included on top of the original graph? I would like to accomplish this with survminer but tips on any other good ggplot-based KM packages are appreciated.
Small example:
library(survival)
library(survminer)
df <- genfan
df$treat<-sample(c(0,1),nrow(df),replace=TRUE)
fit <- survfit(Surv(hours, status) ~ treat, data = df)
p <- ggsurvplot(fit, data = df, risk.table = TRUE, fun = 'event', ylim = c(0, 1))
p # Normal flat, singular graph
There are a few ways to do this, but one suggestion is to make the two plots and arrange them with grid.arrange. First make the two plots. Then pull out the risk table and the plot separately from the first one (you cannot put a ggsurvplot object in grid.arrange). Nest the second plot in the first with annotation_custom. Finally, use layout_matrix to specify the dimensions of your plot and put it back together with grid.arrange.
library(survival)
library(survminer)
library(grid)
library(gridExtra)
df <- genfan
df$treat<-sample(c(0,1),nrow(df),replace=TRUE)
fit <- survfit(Surv(hours, status) ~ treat, data = df)
p <- ggsurvplot(fit, data = df, risk.table = TRUE, fun = 'event', ylim = c(0, 1))
#zoomed plot and remove risk table
g <- ggsurvplot(fit, data = df, risk.table = FALSE, fun = 'event', ylim = c(0, .5))
risktab <- p$table
justplot <- p$plot
p2 <- justplot +
annotation_custom(grob = ggplotGrob(g$plot+
theme(legend.position = "none")),
xmin = 60,xmax=Inf,ymin = .5,ymax = Inf)
lay <- rbind(c(1,1),
c(1,1),
c(2,2))
gridExtra::grid.arrange(p2, risktab,
#use layout matrix to set sizes
layout_matrix=lay
)

I want to create the empirical cumulative distribution function for two samples and put the plots in the same plot [R]

I am using this code to generate the empirical cumulative distribution function for the two samples (you can put any numerical values in them). I would like to put them in the same plot, but if you run the following commands everything overlaps really badly [see picture 1]. Is there any way to do it like this [see picture 2]? (Also, I want the symbols to disappear and to have a line, as in picture 2.)
plot(ecdf(sample[,1]),pch = 1)
par(new=TRUE)
plot(ecdf(sample[,2]),pch = 2)
picture 1:https://www.dropbox.com/s/sg1fr8jydsch4xp/vanboeren2.png?dl=0
picture 2:https://www.dropbox.com/s/erhgla34y5bxa58/vanboeren1.png?dl=0
Update: I am doing this
df1 <- data.frame(x = sample[,1])
df2 <- data.frame(x = sample[,2])
ggplot(df1, aes(x, colour = "g")) + stat_ecdf()
+geom_step(data = df2)
scale_x_continuous(limits = c(0, 5000))
which is very close (in terms of shape), but I still cannot put them in the same plot.
Try this with base graphics:
df1 <- data.frame(x = runif(200,1,5))
df2 <- data.frame(x = runif(200,3,8))
plot(ecdf(df1[,1]),pch = 1, xlim=c(0,10), main=NULL)
par(new=TRUE)
plot(ecdf(df2[,1]),pch = 2, xlim=c(0,10), main=NULL)
Both graphs now have the same xlim (try removing it to see the two curves superimposed incorrectly). Setting main=NULL removes the title.
Result:
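As an alternative sketch in base graphics, the second curve can be added with lines() instead of par(new=TRUE), so the axes are drawn only once. Setting do.points = FALSE drops the symbols and verticals = TRUE joins the steps into a continuous line. The data, colours, and legend labels below are made up for illustration.

```r
set.seed(1)
# Illustrative samples (assumed data, standing in for sample[,1] and sample[,2])
a <- runif(200, 1, 5)
b <- runif(200, 3, 8)
Fa <- ecdf(a)
Fb <- ecdf(b)

# First ECDF sets up the axes; xlim covers both samples
plot(Fa, do.points = FALSE, verticals = TRUE,
     xlim = range(a, b), col = "blue",
     main = "", xlab = "x", ylab = "ECDF")
# Second ECDF is added to the existing plot
lines(Fb, do.points = FALSE, verticals = TRUE, col = "red")
legend("bottomright", legend = c("sample 1", "sample 2"),
       col = c("blue", "red"), lty = 1)
```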

How to do a 3D plot using R?

I want to plot a 3D plot using R. My data set is independent, which means the values of x, y, and z are not dependent on each other. The plot I want is given in this picture:
This plot was drawn by someone using MATLAB. How can I do the same kind of plot using R?
Since you posted your image file, it appears you are not trying to make a 3d scatterplot, rather a 2d scatterplot with a continuous color scale to indicate the value of a third variable.
Option 1: For this approach I would use ggplot2
library(ggplot2)
# make data
mydata <- data.frame(x = rnorm(100, 10, 3),
y = rnorm(100, 5, 10),
z = rpois(100, 20))
ggplot(mydata, aes(x,y)) + geom_point(aes(color = z)) + theme_bw()
Which produces:
Option 2: To make a 3d scatterplot, use the cloud function from the lattice package.
library(lattice)
# make some data
x <- runif(20)
y <- rnorm(20)
z <- rpois(20, 5) / 5
cloud(z ~ x * y)
I usually do these kinds of plots with the base plotting functions and some helper functions for the color levels and color legend from the sinkr package (you need the devtools package to install from GitHub).
Example:
#library(devtools)
#install_github("marchtaylor/sinkr")
library(sinkr)
# example data
grd <- expand.grid(
x=seq(nrow(volcano)),
y=seq(ncol(volcano))
)
grd$z <- c(volcano)
# plot
COL <- val2col(grd$z, col=jetPal(100))
op <- par(no.readonly = TRUE)
layout(matrix(1:2,1,2), widths=c(4,1), heights=4)
par(mar=c(4,4,1,1))
plot(grd$x, grd$y, col=COL, pch=20)
par(mar=c(4,1,1,4))
imageScale(grd$z, col=jetPal(100), axis.pos=4)
mtext("z", side=4, line=3)
par(op)
Result:

Panel functions in Lattice using differing data

I am working with a data frame called d in R. I want to plot a scatter plot using two of the columns, include a best-fit regression line, and also plot binned means.
I have calculated the centers of the bins and binned means, and included those as columns in the data frame.
I can make the scatter plot and regression line work, but cannot get the binned means to show up. Using the code below I get no errors, but the points from panel.points do not show up.
scatter.Epsilon <- xyplot(Epsilon ~ data.subset.UpdatedVS30.091015,
data = d,
grid = TRUE,
scales = list(x = list(log = 10)),
xlab = "Vs30 (m/s)",
ylab = "Epsilon",
ylim = c(-4, 3),
xlim = c(10^2,10^3.4),
subscripts = TRUE,
panel=function(x,y,subscripts,...) {
panel.xyplot(x,y)
panel.abline(mod <- lm(y ~ x), col = 'black')
panel.points(d$bin.ep[subscripts], d$means.ep[subscripts],
col = 'red')})
scatter.Epsilon
A simplified data set would be:
dist <- rnorm(10,4,100)
x <- seq(1,100)
bin <-rep(50,100)
mean <- rep(mean(dist),100)
d <- data.frame(x,dist,bin,mean)
where dist ~ x is the scatterplot component, and mean represents the binned mean for data points between 1-100, and bin is the bin's center (at 50). I want to add one point at (bin, mean) on top of dist ~ x. My real data set has multiple bins and means based on data.subset.UpdatedVS30.091015 that I want to add on top of Epsilon ~ data.subset.UpdatedVS30.091015.
I think you might be trying to do too much work in the call to panel.points. Using your example data, this code works fine:
scatter.Epsilon <- xyplot(dist ~ x,
data = d,
grid = TRUE,
subscripts = TRUE,
panel=function(x,y,subscripts,...) {
panel.xyplot(x,y)
panel.abline(mod <- lm(y ~ x), col = 'black')
panel.points(bin,mean,col = 'red')})
and plots a red point right where it should be. Have you tried just the following?
panel.points(bin.ep,means.ep,col='red')
There is no grouping variable in your formula, so no need for subscripts.
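For several bins, one option is to keep the binned means in their own small data frame and refer to it explicitly inside the panel function. This is a sketch under assumptions: the bins data frame, its column names, and the two-bin split are invented for illustration and are not taken from the real data set.

```r
library(lattice)

set.seed(1)
# Illustrative data (assumed): 100 points, split into two bins by x
d <- data.frame(x = 1:100, dist = rnorm(100, 4, 100))

# Hypothetical summary table: one row per bin (centre and mean)
bins <- data.frame(
  center = c(25, 75),
  avg = c(mean(d$dist[d$x <= 50]), mean(d$dist[d$x > 50]))
)

p <- xyplot(dist ~ x, data = d, grid = TRUE,
  panel = function(x, y, ...) {
    panel.xyplot(x, y, ...)                 # scatter (and grid)
    panel.abline(lm(y ~ x), col = "black")  # best-fit line
    # binned means come from the separate summary table
    panel.points(bins$center, bins$avg, col = "red", pch = 19)
  })
print(p)
```

One thing worth checking in the original call: with scales = list(x = list(log = 10)), the panel function receives log-transformed x values, so bin centres given on the raw scale would probably land outside the panel and may need log10() applied first.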
