I'm plotting the shap values of my variables using the shapviz package. Specifically, I'm plotting a beeswarm plot using the sv_importance command, and dependence plots for the most important variables using sv_dependence. However, to make the results of different models more easily comparable, I would like to customize the x-axis range to make it equal for every plot. Do you have any suggestions about how to customize the axis range for shapviz objects? Here is a reproducible example:
library(shapviz)
set.seed(1)
X_train <- data.matrix(`colnames<-`(replicate(26, rnorm(100)), LETTERS))
dtrain <- xgboost::xgb.DMatrix(X_train, label = rnorm(100))
fit <- xgboost::xgb.train(data = dtrain, nrounds = 50)
shp <- shapviz(fit, X_pred = X_train)
p <- sv_importance(shp, kind = "beeswarm", show_numbers = TRUE, max_display = 15)
p
d <- sv_dependence(shp, v="I")
d
From here, how can I make the x-axis range in the plot p equal to [-2.0, 2.0] instead of [-1.0, 0.5] (as it is by default)?
p is a ggplot object, so you can add whatever scale you like. Just be aware that the "x" axis is actually the y axis, since coord_flip is used internally:
library(ggplot2)
p + scale_y_continuous(limits = c(-2, 2))
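To get identical limits across several models, one option is to derive a single symmetric range from all SHAP values first and reuse it for every plot. The sketch below is only an illustration: `shared_limits` is a hypothetical helper, and the two matrices are random stand-ins for the SHAP matrices of two fitted models.

```r
# Hypothetical helper: symmetric limits that cover every SHAP value passed in,
# widened by a small padding fraction
shared_limits <- function(..., pad = 0.05) {
  m <- max(abs(unlist(list(...))), na.rm = TRUE)
  c(-1, 1) * m * (1 + pad)
}

set.seed(1)
shap_a <- matrix(rnorm(20, sd = 0.3), ncol = 4)  # stand-in for model A
shap_b <- matrix(rnorm(20, sd = 0.8), ncol = 4)  # stand-in for model B

lims <- shared_limits(shap_a, shap_b)
lims
```

The resulting vector can then be passed as `limits` to `scale_y_continuous()` on each plot. Keep in mind that a continuous scale drops points that fall outside its limits, so the range should cover all values.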
I want to plot the clustering coefficient and the average shortest-path as a function of the parameter p of the Watts-Strogatz model as follows:
And this is my code:
library(igraph)
library(ggplot2)
library(reshape2)
library(pracma)
p <- #don't know how to generate this?
trans <- -1
path <- -1
for (i in p) {
ws_graph <- watts.strogatz.game(1, 1000, 4, i)
trans <-c(trans, transitivity(ws_graph, type = "undirected", vids = NULL,
weights = NULL))
path <- c(path,average.path.length(ws_graph))
}
#Remove auxiliary values
trans <- trans[-1]
path <- path[-1]
#Normalize them
trans <- trans/trans[1]
path <- path/path[1]
x = data.frame(v1 = p, v2 = path, v3 = trans)
plot(p,trans, ylim = c(0,1), ylab='coeff')
par(new=T)
plot(p,path, ylim = c(0,1), ylab='coeff',pch=15)
How should I proceed to make this x-axis?
You can generate the values of p using code like the following:
p <- 10^(seq(-4,0,0.2))
You want your x values to be evenly spaced on a log10 scale. This means you need to take evenly spaced values as the exponent for the base 10, because the log10 scale takes the log10 of your x values, which is the exact opposite operation.
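A quick check of this claim, using only base R: the values are unevenly spaced on the linear scale, but their log10 differences are constant.

```r
p <- 10^(seq(-4, 0, 0.2))

# Spacing on the log10 scale is constant (0.2 between neighbours)
log_steps <- diff(log10(p))
range(log_steps)

# Spacing on the linear scale grows with p
lin_steps <- diff(p)
head(lin_steps, 3)
```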
With this, you are already most of the way there. You don't need par(new=TRUE); you can simply call plot followed by points, since the latter does not redraw the whole plot. Use the argument log = 'x' to tell R you need a logarithmic x axis. This only needs to be set in the plot call; points and all other low-level plotting functions (those that add to the plot rather than replace it) respect this setting:
plot(p,trans, ylim = c(0,1), ylab='coeff', log='x')
points(p, path, pch=15)
EDIT: If you want to replicate the log-axis look of the above plot, you have to calculate the tick positions yourself. Search the internet for 'R log10 minor ticks' or similar. Below is a simple function which can calculate the appropriate positions for log axis major and minor ticks:
log10Tck <- function(side, type){
  # par('usr') holds the log10 of the axis limits when the axis is logarithmic
  lim <- switch(side,
    x = par('usr')[1:2],
    y = par('usr')[3:4],
    stop("side argument must be 'x' or 'y'"))
  # ceiling() is base R (ceil() would need the pracma package)
  at <- floor(lim[1]) : ceiling(lim[2])
  return(switch(type,
    minor = outer(1:9, 10^(min(at):max(at))),
    major = 10^at,
    stop("type argument must be 'major' or 'minor'")
  ))
}
After you have defined this function, by using the above code, you can call the function inside the axis(...) function, which draws axes. As a suggestion: save the function away in its own R script and import that script at the top of your calculation using the function source. By this means, you can reuse the function in future projects. Prior to drawing the axes, you have to prevent plot from drawing default axes, so add the parameter axes = FALSE to your plot call:
plot(p,trans, ylim = c(0,1), ylab='coeff', log='x', axes=F)
Then you may generate the axes, using the tick positions generated by the new function:
axis(1, at=log10Tck('x','major'), tcl= 0.2) # bottom
axis(3, at=log10Tck('x','major'), tcl= 0.2, labels=NA) # top
axis(1, at=log10Tck('x','minor'), tcl= 0.1, labels=NA) # bottom
axis(3, at=log10Tck('x','minor'), tcl= 0.1, labels=NA) # top
axis(2) # normal y axis
axis(4) # normal y axis on right side of plot
box()
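If you would rather compute the tick positions without relying on an open plot and par('usr'), the same idea can be written as a function of the data limits. This is a sketch, not part of the answer above: `log10_ticks` is a hypothetical name and it takes the limits on the original (non-log) scale.

```r
# Hypothetical variant: major and minor log10 tick positions for given
# axis limits (on the data scale, both > 0)
log10_ticks <- function(lim) {
  e <- floor(log10(lim[1])):ceiling(log10(lim[2]))
  major <- 10^e
  list(major = major,
       minor = as.vector(outer(1:9, major)))  # 1..9 times each power of ten
}

tk <- log10_ticks(c(2e-4, 0.5))
tk$major
```

The returned vectors can be passed to the at argument of axis() in the same way as the output of log10Tck.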
As a third option, as you are importing ggplot2 in your original post: The same, without all of the above, with ggplot:
# Your data needs to be in the so-called 'long format' or 'tidy format'
# so that ggplot can make sense of it. Google 'Wickham tidy data' or similar.
# You may also use the function 'gather' of the package 'tidyr' for this
# task, which I find simpler to use.
d2 <- reshape2::melt(x, id.vars = c('v1'), measure.vars = c('v2','v3'))
ggplot(d2) +
aes(x = v1, y = value, color = variable) +
geom_point() +
scale_x_log10()
I want to plot a 3D plot using R. My data set is independent, which means the values of x, y, and z are not dependent on each other. The plot I want is given in this picture:
This plot was drawn by someone using MATLAB. How can I do the same kind of plot using R?
Since you posted your image file, it appears you are not trying to make a 3d scatterplot, rather a 2d scatterplot with a continuous color scale to indicate the value of a third variable.
Option 1: For this approach I would use ggplot2
library(ggplot2)
# make data
mydata <- data.frame(x = rnorm(100, 10, 3),
y = rnorm(100, 5, 10),
z = rpois(100, 20))
ggplot(mydata, aes(x,y)) + geom_point(aes(color = z)) + theme_bw()
Which produces:
Option 2: To make a 3d scatterplot, use the cloud function from the lattice package.
library(lattice)
# make some data
x <- runif(20)
y <- rnorm(20)
z <- rpois(20, 5) / 5
cloud(z ~ x * y)
I usually do these kinds of plots with the base plotting functions and some helper functions for the color levels and color legend from the sinkr package (you need the devtools package to install from GitHub).
Example:
#library(devtools)
#install_github("marchtaylor/sinkr")
library(sinkr)
# example data
grd <- expand.grid(
x=seq(nrow(volcano)),
y=seq(ncol(volcano))
)
grd$z <- c(volcano)
# plot
COL <- val2col(grd$z, col=jetPal(100))
op <- par(no.readonly = TRUE)
layout(matrix(1:2,1,2), widths=c(4,1), heights=4)
par(mar=c(4,4,1,1))
plot(grd$x, grd$y, col=COL, pch=20)
par(mar=c(4,1,1,4))
imageScale(grd$z, col=jetPal(100), axis.pos=4)
mtext("z", side=4, line=3)
par(op)
Result:
I want to plot a matrix of z values with x rows and y columns as a surface similar to this graph from MATLAB.
Surface plot:
Code to generate matrix:
# Parameters
shape<-1.849241
scale<-38.87986
x<-seq(from = -241.440, to = 241.440, by = 0.240)# 2013 length
y<-seq(from = -241.440, to = 241.440, by = 0.240)
matrix_fun<-matrix(data = 0, nrow = length(x), ncol = length(y))
# Generate two dimensional travel distance probability density function
for (i in 1:length(x)) {
for (j in 1:length(y)){
dxy<-sqrt(x[i]^2+y[j]^2)
prob<-1/(scale^(shape)*gamma(shape))*dxy^(shape-1)*exp(-(dxy/scale))
matrix_fun[i,j]<-prob
}}
# Rescale 2-d pdf to sum to 1
a<-sum(matrix_fun)
matrix_scale<-matrix_fun/a
I am able to generate surface plots using a couple methods (persp(), persp3d(), surface3d()) but the colors aren't displaying the z values (the probabilities held within the matrix). The z values only seem to display as heights not as differentiated colors as in the MATLAB figure.
Example of graph code and graphs:
library(rgl)
persp3d(x=x, y=y, z=matrix_scale, color=rainbow(25, start=min(matrix_scale), end=max(matrix_scale)))
surface3d(x=x, y=y, z=matrix_scale, color=rainbow(25, start=min(matrix_scale), end=max(matrix_scale)))
persp(x=x, y=y, z=matrix_scale, theta=30, phi=30, col=rainbow(25, start=min(matrix_scale), end=max(matrix_scale)), border=NA)
Image of the last graph
Any other tips to recreate the image in R would be most appreciated (i.e. legend bar, axis tick marks, etc.)
So here's a ggplot solution which seems to come a little closer to the MATLAB plot:
# Parameters
shape<-1.849241
scale<-38.87986
x<-seq(from = -241.440, to = 241.440, by = 2.40)
y<-seq(from = -241.440, to = 241.440, by = 2.40)
df <- expand.grid(x=x,y=y)
df$dxy <- with(df,sqrt(x^2+y^2))
df$prob <- dgamma(df$dxy,shape=shape,scale=scale)
df$prob <- df$prob/sum(df$prob)
library(ggplot2)
library(colorRamps) # for matlab.like(...)
library(scales) # for labels=scientific
ggplot(df, aes(x,y))+
geom_tile(aes(fill=prob))+
scale_fill_gradientn(colours=matlab.like(10), labels=scientific)
BTW: You can generate your data frame of probabilities much more efficiently using the built-in dgamma(...) function, rather than calculating it yourself.
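As a quick sanity check, the hand-written density from the question and dgamma(...) agree, and outer(...) can replace the double loop. This is a sketch on a deliberately coarse grid (the step of 24 is an arbitrary choice to keep it fast); shape and scale are taken from the question.

```r
shape <- 1.849241
scale <- 38.87986

# Manual gamma density vs. the built-in one at a few distances
d <- c(0.5, 10, 50, 200)
manual  <- 1 / (scale^shape * gamma(shape)) * d^(shape - 1) * exp(-d / scale)
builtin <- dgamma(d, shape = shape, scale = scale)
same <- isTRUE(all.equal(manual, builtin))

# Vectorised replacement for the nested for-loops, rescaled to sum to 1
x <- seq(-240, 240, by = 24)
prob <- outer(x, x, function(a, b) dgamma(sqrt(a^2 + b^2), shape = shape, scale = scale))
prob <- prob / sum(prob)
dim(prob)
```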
In line with alexis_laz's comment, here is an example using filled.contour. You might want to increase your by to 2.40 since the finer granularity increases the time it takes to generate the plot by a lot but doesn't improve quality.
filled.contour(x = x, y = y, z = matrix_scale, color = terrain.colors)
# terrain.colors is in the base grDevices package
If you want something closer to your color scheme above, you can fiddle with the rainbow function:
filled.contour(x = x, y = y, z = matrix_scale,
color = (function(n, ...) rep(rev(rainbow(n/2, ...)[1:9]), each = 3)))
Finer granularity:
filled.contour(x = x, y = y, z = matrix_scale, nlevels = 150,
color = (function(n, ...)
rev(rep(rainbow(50, start = 0, end = 0.75, ...), each = 3))[5:150]))
I am using rCharts to create an interactive scatter plot in R. The following code works just fine:
library(rCharts)
test.df <- data.frame(x=sample(1:100,size=30,replace=T),
y=sample(10:1000,size=30,replace=T),
g=rep(c('a','b'),each=15))
n1 <- nPlot(y ~ x, group="g", data=test.df, type='scatterChart')
n1
What I need is to use a log-scale for both X- and Y-axis. How can I specify this in rCharts without hacking the produced html/javascript?
Update 1:
A more realistic and static version of the plot I am trying to get plotted with rCharts:
set.seed(2935)
y_nbinom <- c(rnbinom(n=20, size=10, mu=90), rnbinom(n=20, size=20, mu=1282), rnbinom(n=20, size=30, mu=12575))
x_nbinom <- c(rnbinom(n=20, size=30, mu=100), rnbinom(n=20, size=40, mu=1000), rnbinom(n=20, size=50, mu=10000))
x_fixed <- c(rep(100,20), rep(1000,20), rep(10000,20))
realp <- rep(0:2, each=2) * 20 + sample(1:20, size=6, replace=F)
tdf <- data.frame(y = c(y_nbinom,y_nbinom,y_nbinom[realp]), x=c(x_fixed,x_nbinom,x_nbinom[realp]), type=c(rep(c('fixed','nbinom'),each=60), rep('real',6)))
with(tdf, plot(x, y, col=type, pch=19, log='xy'))
I think this question is a bit old, but I had a similar problem and solved it using the info in this post:
rCharts nvd3 library force ticks
Here is my solution for a base-10 log-scaled stacked area chart; it shouldn't be too different for a scatter plot.
df<-data.frame(x=rep(10^seq(0,5,length.out=24),each=4),
y=round(runif(4*24,1,50)),
var=rep(LETTERS[1:4], 4))
df$x<-log(df$x,10)
p <- nPlot(y ~ x, group = 'var', data = df,
type = 'stackedAreaChart', id = 'chart')
p$xAxis(tickFormat = "#!function (x) {
tickformat = [1,10,100,1000,10000,'100k'];
return tickformat[x];}!#")
p
I am trying to plot 4 ecdf functions on one plot but can't seem to figure out the proper syntax.
If I have 4 functions "A, B, C, D", what would be the proper syntax in R to get them plotted on the same chart with different colors? Thanks!
Here is one way (for three of them, works for four the same way):
set.seed(42)
ecdf1 <- ecdf(rnorm(100)*0.5)
ecdf2 <- ecdf(rnorm(100)*1.0)
ecdf3 <- ecdf(rnorm(100)*2.0)
plot(ecdf3, verticals=TRUE, do.points=FALSE)
plot(ecdf2, verticals=TRUE, do.points=FALSE, add=TRUE, col='brown')
plot(ecdf1, verticals=TRUE, do.points=FALSE, add=TRUE, col='orange')
Note that I am using the fact that the third has the widest range, and use that to initialize the canvas. Otherwise you need xlim=c(...), since the y range of an ecdf is always 0 to 1.
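If it is not known in advance which sample spans the widest range, the x-axis limits can be computed from all samples and the curves added in a loop. A sketch for four functions in base R; the sample values and colours are made up for illustration.

```r
set.seed(42)
samples <- list(A = rnorm(100) * 0.5,
                B = rnorm(100),
                C = rnorm(100) * 2,
                D = rnorm(100) + 1)
cols <- c("black", "brown", "orange", "steelblue")

# Common x range so that no curve is cut off
xlim <- range(unlist(samples))

# First curve sets up the canvas, the rest are added on top
plot(ecdf(samples[[1]]), verticals = TRUE, do.points = FALSE,
     xlim = xlim, col = cols[1], main = "ECDFs")
for (i in seq_along(samples)[-1]) {
  plot(ecdf(samples[[i]]), verticals = TRUE, do.points = FALSE,
       add = TRUE, col = cols[i])
}
legend("bottomright", legend = names(samples), col = cols, lty = 1)
```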
The package latticeExtra provides the function ecdfplot.
library(lattice)
library(latticeExtra)
set.seed(42)
vals <- data.frame(r1=rnorm(100)*0.5,
r2=rnorm(100),
r3=rnorm(100)*2)
ecdfplot(~ r1 + r2 + r3, data=vals, auto.key=list(space='right'))
Here is an approach using ggplot2 (using the ecdf objects from [Dirk's answer](https://stackoverflow.com/a/20601807/1385941)):
library(ggplot2)
# create a data set containing the range you wish to use
d <- data.frame(x = c(-6,6))
# create a list of calls to `stat_function` with the colours you wish to use
ll <- Map(f = stat_function, colour = c('red', 'green', 'blue'),
fun = list(ecdf1, ecdf2, ecdf3), geom = 'step')
ggplot(data = d, aes(x = x)) + ll
A simpler way is to use ggplot and have the variable that you want to plot as a factor. In the example below, I have Portfolio as a factor and plotting the distribution of Interest Rates by Portfolio.
# select a palette
myPal <- c( 'royalblue4', 'lightsteelblue1', 'sienna1')
# plot the Interest Rate distribution of each portfolio
# make an ecdf of each category in Portfolio which is a factor
g2 <- ggplot(mortgage, aes(x = Interest_Rate, color = Portfolio)) +
scale_color_manual(values = myPal) +
stat_ecdf(lwd = 1.25, geom = "line")
g2
You can also set geom = "step", geom = "point" and adjust the line width lwd in the stat_ecdf() function. This gives you a nice plot with the legend.
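The mortgage data frame is not shown above, so here is a self-contained sketch with simulated data. The column names Portfolio and Interest_Rate follow the snippet; the portfolio labels and rate distributions are invented purely for illustration.

```r
library(ggplot2)

set.seed(42)
# Simulated stand-in for the mortgage data (values are made up)
mortgage <- data.frame(
  Portfolio = factor(rep(c("A", "B", "C"), each = 100)),
  Interest_Rate = c(rnorm(100, 3, 0.5), rnorm(100, 4, 0.7), rnorm(100, 5, 1))
)

myPal <- c("royalblue4", "lightsteelblue1", "sienna1")

# One ECDF per level of the Portfolio factor, drawn as a step function
g2 <- ggplot(mortgage, aes(x = Interest_Rate, color = Portfolio)) +
  scale_color_manual(values = myPal) +
  stat_ecdf(geom = "step")
g2
```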