In R, how can I tell if the scales on a ggplot object are log or linear? - r

I have many ggplot objects where I wish to print some text (which varies from plot to plot) in the same relative position on each plot, regardless of scale. To keep it simple, what I have come up with is:
1. Define a rescale function (call it sx) that takes the relative position I want and returns that position on the plot's x axis.
sx <- function(pct, range=xr){
position <- range[1] + pct*(range[2]-range[1])
}
2. Make the plot without the text (call it plt).
3. Use the ggplot_build function to find the x scale's range:
xr <- ggplot_build(plt)$layout$panel_params[[1]]$x.range
4. Then add the text to the plot:
plt <- plt + annotate("text", x=sx(0.95), ....)
This works well for me, though I'm sure there are other solutions folks have derived. I like the solution because I only need to add one step (step 3) to each plot. And it's a simple modification to the annotate command (x goes to sx(x)).
If someone has a suggestion for a better method I'd like to hear about it. There is one thing about my solution though that gives me a little trouble and I'm asking for a little help:
My problem is that I need a separate function for log scales, (call it lx). It's a bit of a pain because every time I want to change the scale I need to modify the annotate commands (change sx to lx) and occasionally there are many. This could easily be solved in the sx function if there was a way to tell what the type of scale was. For instance, is there a parameter in ggplot_build objects that describe the log/lin nature of the scale? That seems to be the best place to find it (that's where I'm pulling the scale's range) but I've looked and can not figure it out. If there was, then I could add a command to step 3 above to define the scale type, and add a tag to the sx function in step 1. That would save me some tedious work.
So, just to reiterate: does anyone know how to tell the type of scale (log or linear) of a ggplot object, for example from the object returned by ggplot_build?

Suppose we have a list of pre-built plots:
linear <- ggplot(iris, aes(Sepal.Width, Sepal.Length, colour = Species)) +
geom_point()
log <- linear + scale_y_log10()
linear <- ggplot_build(linear)
log <- ggplot_build(log)
plotlist <- list(a = linear, b = log)
We can grab information about their position scales in the following way:
out <- lapply(names(plotlist), function(i) {
# Grab plot, panel parameters and scales
plot <- plotlist[[i]]
params <- plot$layout$panel_params[[1]]
scales <- plot$plot$scales$scales
# Only keep (continuous) position scales
keep <- vapply(scales, function(x) {
inherits(x, "ScaleContinuousPosition")
}, logical(1))
scales <- scales[keep]
# Grab relevant transformations
out <- lapply(scales, function(scale) {
data.frame(position = scale$aesthetics[1],
# And now for the actual question:
transformation = scale$trans$name,
plot = i)
})
out <- do.call(rbind, out)
# Grab relevant ranges
ranges <- params[paste0(out$position, ".range")]
out$min <- sapply(ranges, `[`, 1)
out$max <- sapply(ranges, `[`, 2)
out
})
out <- do.call(rbind, out)
Which will give us:
out
  position transformation plot       min      max
1        x       identity    a 1.8800000 4.520000
2        y       identity    a 4.1200000 8.080000
3        y         log-10    b 0.6202605 0.910835
4        x       identity    b 1.8800000 4.520000
Or if you prefer a straightforward answer:
log$plot$scales$scales[[1]]$trans$name
[1] "log-10"
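Building on the transformation object found above, the relative-position helper from the question can be made scale-aware. This is only a sketch: scale_trans_name and rel_pos are hypothetical names, and $plot$scales, $trans and $layout$panel_params are internal ggplot2 fields whose names have changed between releases, so check them against your installed version. The idea is that x.range is reported in transformed units (log10 units for a log scale, as the table above shows), so the transformation's own inverse function maps the interpolated value back to data units for any scale type.

```r
library(ggplot2)

# Hypothetical helper: name of the transformation on a position axis ("x" or "y")
scale_trans_name <- function(plt, axis = "x") {
  built <- ggplot_build(plt)
  built$plot$scales$get_scales(axis)$trans$name
}

# Hypothetical helper: data-unit coordinate at relative position pct (0..1).
# The panel range is in transformed units, so interpolate there and map back.
rel_pos <- function(plt, pct, axis = "x") {
  built <- ggplot_build(plt)
  rng <- built$layout$panel_params[[1]][[paste0(axis, ".range")]]
  trans <- built$plot$scales$get_scales(axis)$trans
  trans$inverse(rng[1] + pct * diff(rng))
}

plt <- ggplot(iris, aes(Sepal.Width, Sepal.Length)) +
  geom_point() +
  scale_x_log10()

# Same annotate() call whether the axis is linear or log
plt_labelled <- plt +
  annotate("text", label = "note", hjust = 1,
           x = rel_pos(plt, 0.95, "x"),
           y = rel_pos(plt, 0.95, "y"))
```

With this, switching a plot between linear and log scales should not require touching the annotate calls, because the helper looks the transformation up each time.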

Related

Setting equal xlim and ylim in plot function

Is there a way to get the plot function to generate equal xlim and ylim automatically?
I do not want to define a fixed range beforehand; I want the plot function to decide on the range itself. However, I expect it to pick the same range for x and y.
A possible solution is to define a wrapper to the plot function:
plot.Custom <- function(x, y, ...) {
.limits <- range(x, y)
plot(x, y, xlim = .limits, ylim = .limits, ...)
}
One way is to manipulate interactively and then choose the right one. A slider will appear once you run the following code.
library(manipulate)
manipulate(
plot(cars, xlim=c(x.min,x.max)),
x.min=slider(0,15),
x.max=slider(15,30))
I'm not aware of any way to do this using plot (which doesn't mean there isn't one). ggplot might be the way to go; it lends itself more to being changed retroactively, since it is designed around a layer system.
library(ggplot2)
#Creating our ggplot object
loop_plot <- ggplot(cars, aes(x = speed, y = dist)) +
geom_point()
#pulling out the 'auto' x & y axis limits
#(recent ggplot2 versions store them under $layout$panel_params; older ones used $panel$ranges)
built <- ggplot_build(loop_plot)
rangepull <- rbind(
built$layout$panel_params[[1]]$x.range,
built$layout$panel_params[[1]]$y.range)
#taking the max and min(so we don't cut out data points)
newrange <- list(cor.min = min(rangepull[,1]), cor.max = max(rangepull[,2]))
#changing our plot size to be nice and symmetric
loop_plot <- loop_plot +
xlim(newrange$cor.min, newrange$cor.max) +
ylim(newrange$cor.min, newrange$cor.max)
Note that the loop_plot object is of the ggplot class and won't actually print until it's called.
I used the cars dataset in the code above to show what's going on, but just sub in your data set[s] and then do whatever postmortem your end goal is.
You'll also be able to add in titles and the like based off of the dataset name et cetera which will likely end up producing a clearer visualization out of your loop.
Hopefully this works for your needs.
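One caveat with xlim() and ylim() is that they set scale limits, so the scale's default expansion is applied again on top of the range that was pulled out, and points outside the limits would be dropped. A minimal sketch of an alternative, assuming a single-panel plot with continuous axes, is to set the limits on the coordinate system instead. equalize_limits is a hypothetical helper name, and $layout$panel_params is an internal field that may differ in other ggplot2 versions.

```r
library(ggplot2)

# Hypothetical helper: give both axes the same range, taken from the
# automatically computed panel ranges of the first panel
equalize_limits <- function(p) {
  pp <- ggplot_build(p)$layout$panel_params[[1]]
  lims <- range(pp$x.range, pp$y.range)
  # expand = FALSE keeps the limits exactly as given
  p + coord_cartesian(xlim = lims, ylim = lims, expand = FALSE)
}

p <- ggplot(cars, aes(x = speed, y = dist)) + geom_point()
p_equal <- equalize_limits(p)
```

Because coord_cartesian only zooms, no data points are removed before statistics are computed.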

Get width of plot area in ggplot2

Is there any way to get the width of the plot area in the grid window? It grows or shrinks, for instance, if plot.margin is changed or if the y-axis labels' font-size is increased. Is it hidden somewhere in str(p)?
Any size measure would work. I need to be able to measure the relative change in the width of the plot area in different scenarios such as change of y-axis labels' font-size.
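library(ggplot2)
library(reshape2) # melt()
library(grid) # unit()
library(gridExtra) # grid.arrange(), used further down
df = data.frame(x = (1:3),One=c(12, 8, 13),Two=c(13, 7, 11),Three=c(11, 9, 11))
df.melt = melt(df, id.vars="x")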
p = ggplot(df.melt, aes(x=x, y=value, color=variable)) +
geom_line() +
coord_cartesian(xlim=c(min(df.melt$x),max(df.melt$x))) +
theme(legend.position="none", plot.margin = unit(c(1, 4, 1, 1), "cm"))
p
UPDATE – To clarify: Please help me calculate a/b.
p = ggplot(df.melt, aes(x=x, y=value, color=variable)) +
geom_line() + coord_cartesian(xlim=c(min(df.melt$x),max(df.melt$x))) +
theme(legend.position="none")
p1 = p + theme(plot.margin=unit(c(1,1,1,1),"cm"), axis.text.y=element_text(size=10))
p2 = p + theme(plot.margin=unit(c(1,1,1,2),"cm"), axis.text.y=element_text(size=30))
grid.arrange(p1, p2, ncol=2)
The plot in ggplot2 uses grid graphics. A graphical scene that has been produced
using the grid graphics package consists of grobs and viewports.
You can use the gridDebug package to inspect the grobs.
showGrob shows the locations and names of the grobs used to draw the scene:
showGrob()
Get the gPath of the grob:
sceneListing <- grid.ls(viewports=TRUE, print=FALSE)
do.call("cbind", sceneListing)
name gPath
[1,] "ROOT" ""
[2,] "GRID.gTableParent.45019" ""
[3,] "background.1-5-6-1" "GRID.gTableParent.45019"
[4,] "spacer.4-3-4-3" "GRID.gTableParent.45019"
[5,] "panel.3-4-3-4" "GRID.gTableParent.45019"
[6,] "grill.gTree.44997" "GRID.gTableParent.45019::panel.3-4-3-4"
Retrieve the grob:
h <- grid.get(gPath="GRID.gTableParent.45019")
Get h's properties, e.g.:
h$layoutvp$width
Application:
grid.get('x',grep=TRUE,global=T)
(polyline[panel.grid.minor.x.polyline.21899], polyline[panel.grid.major.x.polyline.21903], gTableChild[axis-l.3-3-3-3], gTableChild[axis-b.4-4-4-4], gTableChild[xlab.5-4-5-4])
> grid.get('x',grep=TRUE,global=T)[[3]]
gTableChild[axis-l.3-3-3-3]
> xx <- grid.get('x',grep=TRUE,global=T)[[3]]
> grobWidth(xx)
[1] sum(1grobwidth, 0.15cm+0.1cm)
This intrigued me enough to look into it deeper. I was hoping that the grid.ls function would give the information to navigate to the correct viewports to get the information, but for your example there are a bunch of the steps that get replaced with '...' and I could not see how to change that to give something that is easily worked with. However using grid.ls or other tools you can see the names of the different viewports. The viewports of interest are both named 'panel.3-4-3-4' for your example, below is some code that will navigate to the 1st, find the width in inches, navigate to the second and find the width of that one in inches.
grid.ls(view=TRUE,grob=FALSE)
current.vpTree()
seekViewport('panel.3-4-3-4')
a <- convertWidth(unit(1,'npc'), 'inch', TRUE)
popViewport(1)
seekViewport('panel.3-4-3-4')
b <- convertWidth(unit(1,'npc'), 'inch', TRUE)
a/b
I could not figure out an easy way to get to the second panel without popping the first one. This works and gives the information that you need; unfortunately, since it pops the 1st panel off the list, you cannot go back to it to find additional information or modify it. But this does give the info you asked for, which could be used in future plots.
Maybe someone else knows how to navigate to the second panel without popping the first, or getting the full vpPath of each of them to navigate directly.
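A different route that avoids viewport navigation altogether is to work on the gtable that ggplot2 builds. This is a sketch under some assumptions: the plot has a single panel, the panel column is the only one with a flexible ("null") width, and grid converts such "null" widths to zero, so summing the converted column widths gives the space taken by everything except the panel. panel_width_in is a hypothetical helper name, and the device width has to be supplied by the caller.

```r
library(ggplot2)
library(grid)

# Hypothetical helper: panel width in inches for a given device width.
# Fixed-size columns (margins, axis text, titles) convert to inches;
# the flexible panel column converts to 0 and gets the remainder.
panel_width_in <- function(p, device_width_in) {
  g <- ggplotGrob(p)
  fixed <- sum(convertWidth(g$widths, "in", valueOnly = TRUE))
  device_width_in - fixed
}

pdf(NULL, width = 7, height = 5) # null device, only used to measure text
base <- ggplot(mtcars, aes(wt, mpg)) + geom_point()
a <- panel_width_in(base + theme(axis.text.y = element_text(size = 10)), 7)
b <- panel_width_in(base + theme(axis.text.y = element_text(size = 30)), 7)
ratio <- a / b
invisible(dev.off())
```

Here ratio plays the role of a/b from the question: the relative change in panel width when only the y-axis label size differs.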
This answer is mainly in reply to comments by @java_xof. The reply is too long and includes code so it will not fit in a comment. However, it may help with the original question as well (or at least give a starting place).
Here is a function and some code using it (it requires the tcltk and tkrplot packages):
library(ggplot2)
library(tkrplot)
TkPlotLocations <- function(FUN) {
require(tkrplot)
cl <- substitute(FUN)
replot <- function() eval(cl)
tt <- tktoplevel()
img <- tkrplot(tt, replot, vscale=1.5, hscale=1.5)
tkpack(img)
tkpack(xfr <- tkframe(tt), side='left')
tkpack(yfr <- tkframe(tt), side='left')
xndc <- tclVar()
yndc <- tclVar()
xin <- tclVar()
yin <- tclVar()
tkgrid(tklabel(xfr, text='x ndc'), tklabel(xfr, textvariable=xndc))
tkgrid(tklabel(yfr, text='y ndc'), tklabel(yfr, textvariable=yndc))
tkgrid(tklabel(xfr, text='x inch'), tklabel(xfr, textvariable=xin))
tkgrid(tklabel(yfr, text='y inch'), tklabel(yfr, textvariable=yin))
iw <- as.numeric(tcl("image","width", tkcget(img, "-image")))
ih <- as.numeric(tcl("image","height",tkcget(img, "-image")))
cc <- function(x,y) {
# as.real() is defunct in current R; as.numeric() is its replacement
x <- (as.numeric(x)-1)/iw
y <- 1-(as.numeric(y)-1)/ih
c(x,y)
}
mm <- function(x, y) {
xy <- cc(x,y)
tclvalue(xndc) <- xy[1]
tclvalue(yndc) <- xy[2]
tclvalue(xin) <- grconvertX(xy[1], from='ndc', to='inches')
tclvalue(yin) <- grconvertY(xy[2], from='ndc', to='inches')
}
tkbind( img, "<Motion>", mm)
invisible()
}
x <- runif(25)
y <- rnorm(25, x, 0.25)
plot(x,y)
par()$pin
par()$plt
TkPlotLocations(plot(x,y))
qplot(x,y)
par()$pin
par()$plt
TkPlotLocations(print(qplot(x,y)))
qplot(x,y) + xlab('Multi\nline\nx\nlabel')
par()$pin
par()$plt
TkPlotLocations(print(qplot(x,y) + xlab('Multi\nline\nx\nlabel')))
After defining the function above, running the lines that follow it will produce 3 plots of the same random data. You can see that the results of par()$pin and par()$plt (and other parameters) are exactly the same for the 3 plots even though the plotting regions differ between the plots.
There will also be 3 new windows that have popped up, in the windows you can move the mouse pointer over the graph and at the bottom of the window you will see the current location of the pointer in normalized device coordinates and in inches (both from the bottom left corner of the device region). You can hover the mouse pointer over the corners of the graph (or any other part) to see the values and compare between the 3 graphs.
This may be enough to answer at least part of the original question (just not programmatically, which would be more useful). The function can be modified to print out other measurements as well. I may expand this and include it in a package in the future if others would be interested.

Can I tell ggpairs to use log scales?

Can I provide a parameter to the ggpairs function in the GGally package to use log scales for some, not all, variables?
You can't provide the parameter as such (a reason is that the function creating the scatter plots is predefined without scale, see ggally_points), but you can change the scale afterward using getPlot and putPlot. For instance:
library(GGally)
custom_scale <- ggpairs(data.frame(x=exp(rnorm(1000)), y=rnorm(1000)),
upper=list(continuous='points'), lower=list(continuous='points'))
subplot <- getPlot(custom_scale, 1, 2) # retrieve the chart in row 1, column 2
subplotNew <- subplot + scale_y_log10() # change the scale to log
subplotNew$type <- 'logcontinuous' # otherwise ggpairs comes back to a fixed scale
subplotNew$subType <- 'logpoints'
custom_scale <- putPlot(custom_scale, subplotNew, 1, 2) # write the modified chart back
This is essentially the same answer as Jean-Robert's but looks much simpler (more approachable). I don't know if it is a new feature, but it doesn't look like you need to use getPlot or putPlot anymore.
custom_scale[1,2]<-custom_scale[1,2] + scale_y_log10() + scale_x_log10()
Here is a function to apply it across a big matrix. Supply the number of rows in the plot and the name of the plot.
scalelog2 <- function(x=2, g){
  for (i in 2:x){ #for below diagonal
    for (j in 1:(i-1)) {
      g[i,j] <- g[i,j] + scale_x_continuous(trans='log2') +
        scale_y_continuous(trans='log2')
    }
  }
  for (i in 1:x){ #for the bottom row
    g[(x+1),i] <- g[(x+1),i] + scale_y_continuous(trans='log2')
  }
  for (i in 1:x){ #for the diagonal
    g[i,i] <- g[i,i] + scale_x_continuous(trans='log2')
  }
  return(g)
}
It's probably better to use a linear scale and log-transform variables as appropriate before supplying them to ggpairs, because this avoids ambiguity in how the correlation coefficients have been computed (before or after the log transform).
This can be easily achieved e.g. like this:
library(tidyverse)
log10_vars <- vars(ends_with(".Length")) # define variables to be transformed
iris %>% # use standard R example dataframe
mutate_at(log10_vars, log10) %>% # log10 transform selected columns
rename_at(log10_vars, sprintf, fmt="log10 %s") %>% # rename variables accordingly
GGally::ggpairs(aes(color=Species))
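The same pre-transformation can also be written with across() and rename_with(), which newer dplyr releases offer in place of the superseded mutate_at() and rename_at(). This is a sketch of the data preparation only; iris_log is a name introduced here for illustration, and the ggpairs call is left as a comment so that the snippet runs without GGally installed.

```r
library(dplyr)

# Log10-transform the *.Length columns, then rename them so the
# transformation is visible in the plot labels
iris_log <- iris %>%
  mutate(across(ends_with(".Length"), log10)) %>%
  rename_with(~ paste("log10", .x), ends_with(".Length"))

# Then plot as before:
# GGally::ggpairs(iris_log, ggplot2::aes(color = Species))
```

Since the values are transformed before plotting, any correlation coefficients shown by ggpairs are unambiguously computed on the log10 values.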

How to plot a violin scatter boxplot (in R)?

I just came by the following plot:
And wondered how it can be done in R (or other software)?
Update 10.03.11: Thank you to everyone who participated in answering this question - you gave wonderful solutions! I've compiled all the solutions presented here (as well as some others I've come by online) in a post on my blog.
Make.Funny.Plot does more or less what I think it should do. To be adapted according to your own needs, and might be optimized a bit, but this should be a nice start.
Make.Funny.Plot <- function(x){
unique.vals <- length(unique(x))
N <- length(x)
N.val <- min(N/20,unique.vals)
if(unique.vals>N.val){
x <- ave(x,cut(x,N.val),FUN=min)
x <- signif(x,4)
}
# construct the outline of the plot
outline <- as.vector(table(x))
outline <- outline/max(outline)
# determine some correction to make the V shape,
# based on the range
y.corr <- diff(range(x))*0.05
# Get the unique values
yval <- sort(unique(x))
plot(c(-1,1),c(min(yval),max(yval)),
type="n",xaxt="n",xlab="")
for(i in 1:length(yval)){
n <- sum(x==yval[i])
x.plot <- seq(-outline[i],outline[i],length=n)
y.plot <- yval[i]+abs(x.plot)*y.corr
points(x.plot,y.plot,pch=19,cex=0.5)
}
}
N <- 500
x <- rpois(N,4)+abs(rnorm(N))
Make.Funny.Plot(x)
EDIT : corrected so it always works.
I recently came upon the beeswarm package, that bears some similarity.
The bee swarm plot is a one-dimensional scatter plot like "stripchart", but with closely-packed, non-overlapping points.
Here's an example:
library(beeswarm)
data(breast) # example data shipped with the package
beeswarm(time_survival ~ event_survival, data = breast,
method = 'swarm', # called 'smile' in early beeswarm releases
pch = 16, pwcol = as.numeric(ER),
xlab = '', ylab = 'Follow-up time (months)',
labels = c('Censored', 'Metastasis'))
legend('topright', legend = levels(breast$ER),
title = 'ER', pch = 16, col = 1:2)
(source: eklund at www.cbs.dtu.dk)
I have come up with code similar to Joris's, though I think this is more than a stem plot; by this I mean that the y value in each series is the absolute value of the distance to the in-bin mean, and the x value is more about whether the value is lower or higher than the mean.
Example code (sometimes throws warnings but works):
px<-function(x,N=40,...){
x<-sort(x);
#Cutting in bins
cut(x,N)->p;
#Calculate the means over bins
sapply(levels(p),function(i) mean(x[p==i]))->meansl;
means<-meansl[p];
#Calculate the mins over bins
sapply(levels(p),function(i) min(x[p==i]))->minl;
mins<-minl[p];
#Each dot is one value.
#X is an order of a value inside bin, moved so that the values lower than bin mean go below 0
X<-rep(0,length(x));
for(e in levels(p)) X[p==e]<-(1:sum(p==e))-1-sum((x-means)[p==e]<0);
#Y is the bin minimum + absolute value of the difference between a value and its bin mean
plot(X,mins+abs(x-means),pch=19,cex=0.5,...);
}
Try the vioplot package:
library(vioplot)
vioplot(rnorm(100))
(with awful default color ;-)
There is also wvioplot() in the wvioplot package, for weighted violin plot, and beanplot, which combines violin and rug plots. They are also available through the lattice package, see ?panel.violin.
Since this hasn't been mentioned yet, there is also ggbeeswarm, a relatively new R package based on ggplot2. It adds another geom to ggplot to be used instead of geom_jitter or the like.
In particular, geom_quasirandom (see the second example below) produces really good results, and I have in fact adopted it as my default plot.
Also noteworthy is the package vipor (VIolin POints in R), which produces plots using standard R graphics and is in fact also used by ggbeeswarm behind the scenes.
set.seed(12345)
install.packages('ggbeeswarm')
library(ggplot2)
library(ggbeeswarm)
ggplot(iris,aes(Species, Sepal.Length)) + geom_beeswarm()
ggplot(iris,aes(Species, Sepal.Length)) + geom_quasirandom()
#compare to jitter
ggplot(iris,aes(Species, Sepal.Length)) + geom_jitter()

add labels to lattice barchart

I would like to place the value for each bar in barchart (lattice) at the top of each bar. However, I cannot find any option with which I can achieve this. I can only find options for the axis.
Create a custom panel function, e.g.
library("lattice")
p <- barchart((1:10)^2~1:10, horiz=FALSE, ylim=c(0,120),
panel=function(...) {
args <- list(...)
panel.text(args$x, args$y, args$y, pos=3, offset=1)
panel.barchart(...)
})
print(p)
I would have suggested using the new directlabels package, which can be used with both lattice and ggplot (and makes life very easy for these labeling problems), but unfortunately it doesn't work with barcharts.
Since I had to do this anyway, here's a code sample that is close enough to figure it out, along the lines of what @Alex Brown suggests (scores is a 2D array of some sort, which will get turned into a grouped vector):
barchart(scores, horizontal=FALSE, stack=FALSE,
xlab='Sample', ylab='Mean Score (max of 9)',
auto.key=list(rectangles=TRUE, points=FALSE),
panel=function(x, y, box.ratio, groups, errbars, ...) {
# We need to specify groups because it's not actually the 4th
# parameter
panel.barchart(x, y, box.ratio, groups=groups, ...)
x <- as.numeric(x)
nvals <- nlevels(groups)
groups <- as.numeric(groups)
box.width <- box.ratio / (1 + box.ratio)
for(i in unique(x)) {
ok <- x == i
width <- box.width / nvals
locs <- i + width * (groups[ok] - (nvals + 1)/2)
panel.arrows(locs, y[ok] + 0.5, scores.ses[,i], ...)
}
} )
I haven't tested this, but the important bits (the parts determining the locs etc. within the panel function) do work. That's the hard part to figure out. In my case, I was actually using panel.arrows to make errorbars (the horror!). But scores.ses is meant to be an array of the same dimension as scores.
I'll try to clean this up later - but if someone else wants to, I'm happy for it!
If you are using the groups parameter you will find the labels in @rcs's code all land on top of each other. This can be fixed by extending panel.text to work like panel.barchart, which is easy enough if you know R.
I can't post the code of the fix here for licencing reasons, sorry.
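For readers who want a starting point, here is a minimal sketch of one way such a fix could look. It is not the code referred to above: it simply reuses the offset arithmetic from the errorbar panel function shown earlier to place each label over its own bar. grouped_bar_centres and panel.barchart.labelled are hypothetical names, and the sketch assumes vertical, unstacked bars and a factor for groups.

```r
library(lattice)

# Hypothetical helper: x centres of side-by-side grouped bars, using the
# same offsets as the `locs` computation in the earlier panel function
grouped_bar_centres <- function(x, g, nvals, box.ratio) {
  box.width <- box.ratio / (1 + box.ratio)
  x + (box.width / nvals) * (g - (nvals + 1) / 2)
}

# Sketch of a panel function: draw the bars, then one label per bar
panel.barchart.labelled <- function(x, y, box.ratio = 1, groups, subscripts, ...) {
  panel.barchart(x, y, box.ratio = box.ratio, groups = groups,
                 subscripts = subscripts, ...)
  g <- as.numeric(groups[subscripts]) # group index of each bar in this panel
  locs <- grouped_bar_centres(as.numeric(x), g, nlevels(groups), box.ratio)
  panel.text(locs, y, labels = y, pos = 3)
}

# Made-up example data
d <- data.frame(sample = factor(rep(c("s1", "s2"), each = 2)),
                grp = factor(rep(c("a", "b"), times = 2)),
                score = c(3, 5, 4, 6))
p <- barchart(score ~ sample, data = d, groups = grp, horizontal = FALSE,
              ylim = c(0, 8), panel = panel.barchart.labelled)
```

Printing p should then show each value above its own bar instead of all labels stacked at the category centre.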
