Create a Scatterplot of Raster Images in R - r

I am not entirely sure if this kind of question is allowed on SO, as I have no reproducible data at the moment.
My question is in regards to how one might go about creating a scatterplot of raster images in R. I am not familiar with any packages that allow you to do this. This is the only example I have come across so far in my search. Essentially, this is what I would like to do, however, I am wondering if it's possible for R to simply take the input data and plot the image rather than be fed coordinates in my plot area.
My end goal is to create raster image scatterplots of sports teams using their logos instead of labels. My first thought is to create a data frame including team name, X variable, Y variable, and .png image URL location.
Here is an example of what I am ultimately hoping to do. I'm not sure what program the OP uses, but obviously I would like to do something like this in R.
UPDATE
With the help of Greg Snow's suggestion, I was able to reproduce his example with my own logos.

The my.symbols and ms.image functions in the TeachingDemos package are one possible starting place. There is an example on the help page for ms.image that shows how to use the R logo as the plotting symbol. Currently it only does one image at a time, so you could either start with a blank plot and loop through the set of images, or write a wrapper function that takes a list of images and an indicator of which to plot. Here is a first stab at a wrapper function:
ms.image2 <- function(imgs, transpose=TRUE,
which=1, ...) {
ms.image(imgs[[which]], transpose=transpose, ...)
}
Then we can create a list of images with code like:
require(png)
img1 <- readPNG(system.file("img", "Rlogo.png", package="png"))
logos <- list( img1, img1[76:1,,], img1[,100:1,],
img1[76:1,100:1,], img1[,,c(3:1,4)])
These are all variations on the logo, but for your example you could pass a vector of file names of the .png files to lapply to produce a similar list.
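As a rough sketch of that lapply step: the file vector below is a hypothetical stand-in for your own logo paths (it just lists the R logo bundled with the png package twice so the code runs anywhere), and the team names are made up.

```r
library(png)

# Hypothetical stand-in for your own logo files: the bundled R logo is
# listed twice so the example runs wherever the png package is installed.
logo_files <- rep(system.file("img", "Rlogo.png", package = "png"), 2)

# Read each file into an array (height x width x channels)
logos <- lapply(logo_files, readPNG)
names(logos) <- c("team_a", "team_b")  # hypothetical team names
```

With real data, logo_files would simply be the column of .png paths from your data frame.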
Now we can run my.symbols like this (though obviously you will use real data rather than random numbers for the locations):
my.symbols( runif(10), runif(10), ms.image2,
MoreArgs=list(imgs=logos), which=rep(1:5,2),
inches=0.3, symb.plots=TRUE, add=FALSE)
And that produces a plot along the lines of your example:
Edit
For a speed-up you can use rasterImage. Here is some new code that ran in about half the time of the above (compared using microbenchmark):
ms.rasterImage <- function(imgs, which=1, ...) {
rasterImage(imgs[[which]], -1, -1, 1, 1)
}
logos2 <- list(as.raster(img1), as.raster(img1[76:1,,]),
as.raster(img1[,100:1,]),
as.raster(img1[76:1,100:1,]),
as.raster(img1[,,c(3:1,4)])
)
my.symbols( runif(10), runif(10), ms.rasterImage,
MoreArgs=list(imgs=logos2), which=rep(1:5,2),
inches=0.3, symb.plots=TRUE, add=FALSE)
And here is some code using ggplot2 based on the link in the comment above, but using the list of logos:
ggplot(mtcars, aes(mpg, wt)) +
mapply(function(xx, yy, i)
annotation_raster(logos[[i]], xmin=xx-1, xmax=xx+1, ymin=yy-0.2, ymax=yy+0.2),
mtcars$mpg, mtcars$wt, mtcars$gear-2)
And mainly for my curiosity, here are the timings:
> microbenchmark(
+ my.symbols( mtcars$mpg, mtcars$wt, ms.image2,
+ MoreArgs=list(imgs=logos), which=mtcars$gear-2,
+ inches=0.3, symb.plots=TRUE, add=FALSE),
+ my.symbols( mtcars$mpg, mtcars$wt, ms.rasterImage,
+ MoreArgs=list(imgs=logos2), which=mtcars$gear-2,
+ inches=0.3, symb.plots=TRUE, add=FALSE),
+ plot(ggplot(mtcars, aes(mpg, wt)) +
+ mapply(function(xx, yy, i)
+ annotation_raster(logos[[i]], xmin=xx-1, xmax=xx+1, ymin=yy-0.2, ymax=yy+0.2),
+ mtcars$mpg, mtcars$wt, mtcars$gear-2) )
+ )
Unit: milliseconds
           expr      min       lq     mean   median        uq       max neval cld
       ms.image 518.9137 530.5549 661.9333 545.3890  751.7116 1737.7430   100  b
 ms.rasterImage 158.7097 162.4493 244.6673 171.6103  381.6499  544.1656   100 a
        ggplot2 478.3005 606.3831 896.8793 772.7210 1359.8888 1714.5647   100   c

Related

In R, how can I tell if the scales on a ggplot object are log or linear?

I have many ggplot objects where I wish to print some text (varies from plot to plot) in the same relative position on each plot, regardless of scale. To keep it simple, I came up with the following steps:
1. Define a rescale function (call it sx) that takes the relative position I want and returns that position on the plot's x axis.
sx <- function(pct, range=xr){
range[1] + pct*(range[2]-range[1])
}
2. Make the plot without the text (call it plt).
3. Use the ggplot_build function to find the x scale's range.
xr <- ggplot_build(plt)$layout$panel_params[[1]]$x.range
4. Then add the text to the plot.
plt <- plt + annotate("text", x=sx(0.95), ....)
This works well for me, though I'm sure there are other solutions folks have derived. I like the solution because I only need to add one step (step 3) to each plot. And it's a simple modification to the annotate command (x goes to sx(x)).
If someone has a suggestion for a better method I'd like to hear about it. There is one thing about my solution though that gives me a little trouble and I'm asking for a little help:
My problem is that I need a separate function for log scales, (call it lx). It's a bit of a pain because every time I want to change the scale I need to modify the annotate commands (change sx to lx) and occasionally there are many. This could easily be solved in the sx function if there was a way to tell what the type of scale was. For instance, is there a parameter in ggplot_build objects that describe the log/lin nature of the scale? That seems to be the best place to find it (that's where I'm pulling the scale's range) but I've looked and can not figure it out. If there was, then I could add a command to step 3 above to define the scale type, and add a tag to the sx function in step 1. That would save me some tedious work.
So, just to reiterate: does anyone know how to tell the scaling (type of scale: log or linear) of a ggplot object? such as using the ggplot_build command's object?
Suppose we have a list of pre-built plots:
linear <- ggplot(iris, aes(Sepal.Width, Sepal.Length, colour = Species)) +
geom_point()
log <- linear + scale_y_log10()
linear <- ggplot_build(linear)
log <- ggplot_build(log)
plotlist <- list(a = linear, b = log)
We can grab information about their position scales in the following way:
out <- lapply(names(plotlist), function(i) {
# Grab plot, panel parameters and scales
plot <- plotlist[[i]]
params <- plot$layout$panel_params[[1]]
scales <- plot$plot$scales$scales
# Only keep (continuous) position scales
keep <- vapply(scales, function(x) {
inherits(x, "ScaleContinuousPosition")
}, logical(1))
scales <- scales[keep]
# Grab relevant transformations
out <- lapply(scales, function(scale) {
data.frame(position = scale$aesthetics[1],
# And now for the actual question:
transformation = scale$trans$name,
plot = i)
})
out <- do.call(rbind, out)
# Grab relevant ranges
ranges <- params[paste0(out$position, ".range")]
out$min <- sapply(ranges, `[`, 1)
out$max <- sapply(ranges, `[`, 2)
out
})
out <- do.call(rbind, out)
Which will give us:
out
  position transformation plot       min      max
1        x       identity    a 1.8800000 4.520000
2        y       identity    a 4.1200000 8.080000
3        y         log-10    b 0.6202605 0.910835
4        x       identity    b 1.8800000 4.520000
Or if you prefer a straightforward answer:
log$plot$scales$scales[[1]]$trans$name
[1] "log-10"
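With the transformation name in hand, a single rescaling helper could choose the right formula itself, so the annotate calls would not need to change when the scale does. The sketch below is one possible shape for that. The name rel_pos and its arguments are hypothetical, and it assumes (as the ranges in the table above suggest) that a log-10 scale reports its range in log10 units.

```r
# Hypothetical helper: map a relative position (0..1) to data units.
# `range` is the panel range as reported by ggplot_build();
# `trans_name` is the scale's transformation name, e.g. "identity" or "log-10".
rel_pos <- function(pct, range, trans_name = "identity") {
  pos <- range[1] + pct * (range[2] - range[1])
  # Assumption: for log-10 scales the range is in log10 units, so back-transform
  if (identical(trans_name, "log-10")) 10^pos else pos
}

rel_pos(0.95, c(1.88, 4.52))                     # linear axis
rel_pos(0.95, c(0.6202605, 0.910835), "log-10")  # log axis, result in data units
```

Other transformations (for example sqrt) would need their own back-transform; only identity and log-10 are handled here.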

Extracting the exact coordinates of a mouse click in an interactive plot

In short: I'm looking for a way to get the exact coordinates of a series of mouse positions (on-clicks) in an interactive x/y scatter plot rendered by ggplot2 and ggplotly.
I'm aware that plotly (and several other interactive plotting packages for R) can be combined with Shiny, where a box or lasso select can return a list of all data points within the selected subspace. This list will be HUGE in most of the datasets I'm analysing, however, and I need to be able to do the analysis reproducibly in an R markdown format (writing a few, mostly fewer than 5-6, point coordinates is much more readable). Furthermore, I have to know the exact positions of the clicks to be able to extract points within the same polygon of points in a different dataset, so a list of points within the selection in one dataset is not useful.
The grid.locator() function from the grid package does almost what I'm looking for (the one wrapped in, e.g., gglocator). However, I hope there is a way to do the same within an interactive plot rendered by plotly (or maybe something else that I don't know of?), as the data sets are often HUGE (see the plot below) and being able to zoom in and out interactively is very much appreciated during several iterations of analysis.
Normally I have to rescale the axes several times to simulate zooming in and out which is exhausting when doing it MANY times. As you can see in the plot above, there is a LOT of information in the plots to explore (the plot is about 300MB in memory).
Below is a small reprex of how I'm currently doing it using grid.locator on a static plot:
library(ggplot2)
library(grid)
library(magrittr) # provides the %>% pipe used inside locator()
p <- ggplot(mtcars, aes(wt, mpg)) +
geom_point()
locator <- function(p) {
# Build ggplot object
ggobj <- ggplot_build(p)
# Extract coordinates (recent ggplot2 stores the ranges under panel_params; older releases used panel_ranges)
xr <- ggobj$layout$panel_params[[1]]$x.range
yr <- ggobj$layout$panel_params[[1]]$y.range
# Variable for selected points
selection <- data.frame(x = as.numeric(), y = as.numeric())
colnames(selection) <- c(ggobj$plot$mapping$x, ggobj$plot$mapping$y)
# Detect and move to plot area viewport
suppressWarnings(print(ggobj$plot))
panels <- unlist(current.vpTree()) %>%
grep("panel", ., fixed = TRUE, value = TRUE)
p_n <- length(panels)
seekViewport(panels, recording=TRUE)
pushViewport(viewport(width=1, height=1))
# Select point, plot, store and repeat
for (i in 1:10){
tmp <- grid.locator('native')
if (is.null(tmp)) break
grid.points(tmp$x,tmp$y, pch = 16, gp=gpar(cex=0.5, col="darkred"))
selection[i, ] <- as.numeric(tmp)
}
grid.polygon(x= unit(selection[,1], "native"), y= unit(selection[,2], "native"), gp=gpar(fill=NA))
#return a data frame with the coordinates of the selection
return(selection)
}
locator(p)
and from here use the point.in.polygon function to subset the data based on the selection.
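As a rough sketch of that last step, assuming the function meant here is point.in.polygon from the sp package: the polygon below is made up for illustration and stands in for the data frame returned by locator(p).

```r
library(sp)

# Hypothetical polygon, standing in for the data frame returned by locator(p);
# the corner coordinates are made up for illustration.
selection <- data.frame(wt = c(2, 4, 4, 2), mpg = c(15, 15, 25, 25))

# point.in.polygon() returns 0 for points strictly outside the polygon
# and a positive code for points inside or on the boundary
inside <- point.in.polygon(mtcars$wt, mtcars$mpg,
                           selection$wt, selection$mpg) > 0

selected <- mtcars[inside, ]
```

Because the polygon is stored as plain coordinates, the same selection can be applied to a different dataset by swapping in its x and y columns.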
A possible solution could be to add, say 100x100, invisible points to the plot and then use the plotly_click feature of event_data() in a Shiny app, but this is not at all ideal.
Thanks in advance for your ideas or solutions, I hope my question was clear enough.
-- Kasper
I used ggplot2. Besides the materials at https://shiny.rstudio.com/articles/plot-interaction.html, I'd like to mention the following:
Firstly, when you create the plot, don't use "print( )" within "renderPlot( )", or the coordinates would be wrong. For instance, if you have the following in UI:
plotOutput("myplot", click = "myclick")
The following in the Server would work:
output$myplot <- renderPlot({
p = ggplot(data = mtcars, aes(x=mpg, y=hp)) + geom_point()
p
})
But the clicking coordinates would be wrong if you do:
output$myplot <- renderPlot({
p = ggplot(data = mtcars, aes(x=mpg, y=hp)) + geom_point()
print(p)
})
Then, you could store the coordinates by adding to the Server:
mydata = reactiveValues(x_values = c(), y_values = c())
observeEvent(input$myclick, {
mydata$x_values = c(mydata$x_values, input$myclick$x)
mydata$y_values = c(mydata$y_values, input$myclick$y)
})
In addition to the X-Y coordinates, when you use facets with ggplot2, you can refer to the clicked facet panel by
input$myclick$panelvar1
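Putting the pieces above together, a minimal complete app might look like the sketch below. The "clicks" table output is an addition for illustration (it is not part of the snippets above); everything else reuses the same ids and reactive values.

```r
library(shiny)
library(ggplot2)

ui <- fluidPage(
  plotOutput("myplot", click = "myclick"),
  tableOutput("clicks")  # illustrative extra: shows the stored coordinates
)

server <- function(input, output, session) {
  # Accumulates the coordinates of every click
  mydata <- reactiveValues(x_values = numeric(0), y_values = numeric(0))

  # Return the plot object rather than print()-ing it
  output$myplot <- renderPlot({
    ggplot(mtcars, aes(x = mpg, y = hp)) + geom_point()
  })

  observeEvent(input$myclick, {
    mydata$x_values <- c(mydata$x_values, input$myclick$x)
    mydata$y_values <- c(mydata$y_values, input$myclick$y)
  })

  output$clicks <- renderTable({
    data.frame(x = mydata$x_values, y = mydata$y_values)
  })
}

app <- shinyApp(ui, server)
# runApp(app)  # uncomment to launch the app interactively
```

The coordinates shown in the table are in data units, so they can be copied into an R markdown document as a short, readable list.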

Setting equal xlim and ylim in plot function

Is there a way to get the plot function to generate equal xlim and ylim automatically?
I do not want to define a fixed range beforehand; I want the plot function to decide on the range itself. However, I expect it to pick the same range for x and y.
A possible solution is to define a wrapper to the plot function:
plot.Custom <- function(x, y, ...) {
.limits <- range(x, y)
plot(x, y, xlim = .limits, ylim = .limits, ...)
}
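A variation on the same idea, shown with the built-in cars data, computes the shared limits separately so they can be reused or inspected. The helper name equal_limits is hypothetical, and dropping missing or infinite values is an added assumption about what you want.

```r
# Hypothetical helper: one shared range for both axes,
# ignoring NA and infinite values
equal_limits <- function(x, y) {
  range(x, y, na.rm = TRUE, finite = TRUE)
}

lims <- equal_limits(cars$speed, cars$dist)
plot(cars$speed, cars$dist, xlim = lims, ylim = lims)
```

If the units should also have the same physical length on screen, asp = 1 can be passed to plot, though R may then extend one axis beyond the requested limits to fit the device.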
One way is to manipulate interactively and then choose the right one. A slider will appear once you run the following code.
library(manipulate)
manipulate(
plot(cars, xlim=c(x.min,x.max)),
x.min=slider(0,15),
x.max=slider(15,30))
I'm not aware of any way to do this using plot (which doesn't mean there isn't one). ggplot might be the way to go; it lends itself more to being changed retroactively, since it is designed around a layer system.
library(ggplot2)
#Creating our ggplot object
loop_plot <- ggplot(cars, aes(x = speed, y = dist)) +
geom_point()
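#pulling out the 'auto' x & y axis limits
#(recent ggplot2 stores them under $layout$panel_params; older releases used $panel$ranges)
built <- ggplot_build(loop_plot)
rangepull <- rbind(
built$layout$panel_params[[1]]$x.range,
built$layout$panel_params[[1]]$y.range)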
#taking the max and min(so we don't cut out data points)
newrange <- list(cor.min = min(rangepull[,1]), cor.max = max(rangepull[,2]))
#changing our plot size to be nice and symmetric
loop_plot <- loop_plot +
xlim(newrange$cor.min, newrange$cor.max) +
ylim(newrange$cor.min, newrange$cor.max)
Note that the loop_plot object is of class ggplot, and won't actually be drawn until it's printed.
I used the cars dataset in the code above to show what's going on, but just sub in your data set[s] and then do whatever postmortem your end goal requires.
You'll also be able to add in titles and the like based off of the dataset name et cetera which will likely end up producing a clearer visualization out of your loop.
Hopefully this works for your needs.

R ggplot: geom_tile lines in pdf output

I'm constructing a plot that uses geom_tile and then outputting it to .pdf (using pdf("filename",...)). However, when I do, the .pdf result has tiny lines (striations, as one person put it) running through it. I've attached an image showing the problem.
Googling led to this thread, but the only real advice in there was to try passing size=0 to geom_tile, which I did with no effect. Any suggestions on how I can fix these? I'd like to use this as a figure in a paper, but it's not going to work like this.
Minimal code:
require(ggplot2)
require(scales)
require(reshape)
volcano3d <- melt(volcano)
names(volcano3d) <- c("x", "y", "z")
v <- ggplot(volcano3d, aes(x, y, z = z))
pdf("mew.pdf")
print(v + geom_tile(aes(fill=z)) + stat_contour(size=2) + scale_fill_gradient("z"))
dev.off() # close the device so the file is written completely
This happens because the default colour of the tiles in geom_tile seems to be white.
To fix this, you need to map the colour to z in the same way as fill.
print(v +
geom_tile(aes(fill=z, colour=z), size=1) +
stat_contour(size=2) +
scale_fill_gradient("z")
)
Try to use geom_raster:
pdf("mew.pdf")
print(v + geom_raster(aes(fill=z)) + stat_contour(size=2) + scale_fill_gradient("z"))
dev.off()
This gives good quality in my environment.
I cannot reproduce the problem on my computer (Windows 7), but I remember it was a problem discussed on the list for certain configurations. Brian Ripley (if I remember) recommended
CairoPDF("mew.pdf") # Package Cairo
to get around this
In the interests of skinning this cat, and going into waaay too much detail, this code decomposes the R image into a mesh of quads (as used by rgl), and then shows the difference between a raster plot and a "tile" or "rect" plot.
library(raster)
im <- raster::raster(volcano)
## this is the image in rgl corner-vertex form
msh <- quadmesh::quadmesh(im)
## manual labour for colour scaling
dif <- diff(range(values(im)))
mn <- min(values(im))
scl <- function(x) (x - mn)/dif
This is the traditional R 'image', which draws a little tile or 'rect()' for every pixel.
list_image <- list(x = xFromCol(im), y = rev(yFromRow(im)), z = t(as.matrix(im)[nrow(im):1, ]))
image(list_image)
It's slow, and though it calls the source of 'rect()' under the hood, we can't also set the border colour. Use 'useRaster = TRUE' to use 'rasterImage' for more efficient drawing time, control over interpolation, and ultimately - file size.
Now let's plot the image again, but by explicitly calling rect for every pixel. ('quadmesh' is probably not the easiest way to demonstrate this; it's just fresh in my mind.)
## worker function to plot rect from vertex index
rectfun <- function(x, vb, ...) rect(vb[1, x[1]], vb[2,x[1]], vb[1,x[3]], vb[2,x[3]], ...)
## draw just the borders on the original, traditional image
apply(msh$ib, 2, rectfun, msh$vb, border = "white")
Now try again with 'rect'.
## redraw the entire image, with rect calls
##(not efficient, but essentially the same as what image does with useRaster = FALSE)
cols <- heat.colors(12)
## just to clear the plot, and maintain the plot space
image(im, col = "black")
for (i in seq(ncol(msh$ib))) {
rectfun(msh$ib[,i], msh$vb, col = cols[scl(im[i]) * (length(cols)-1) + 1], border = "dodgerblue")
}

How to plot a violin scatter boxplot (in R)?

I just came by the following plot:
And wondered how it can be done in R (or other software)?
Update 10.03.11: Thank you to everyone who participated in answering this question - you gave wonderful solutions! I've compiled all the solutions presented here (as well as some others I've come by online) in a post on my blog.
Make.Funny.Plot does more or less what I think it should do. To be adapted according to your own needs, and might be optimized a bit, but this should be a nice start.
Make.Funny.Plot <- function(x){
unique.vals <- length(unique(x))
N <- length(x)
N.val <- min(N/20,unique.vals)
if(unique.vals>N.val){
x <- ave(x,cut(x,N.val),FUN=min)
x <- signif(x,4)
}
# construct the outline of the plot
outline <- as.vector(table(x))
outline <- outline/max(outline)
# determine some correction to make the V shape,
# based on the range
y.corr <- diff(range(x))*0.05
# Get the unique values
yval <- sort(unique(x))
plot(c(-1,1),c(min(yval),max(yval)),
type="n",xaxt="n",xlab="")
for(i in 1:length(yval)){
n <- sum(x==yval[i])
x.plot <- seq(-outline[i],outline[i],length=n)
y.plot <- yval[i]+abs(x.plot)*y.corr
points(x.plot,y.plot,pch=19,cex=0.5)
}
}
N <- 500
x <- rpois(N,4)+abs(rnorm(N))
Make.Funny.Plot(x)
EDIT : corrected so it always works.
I recently came upon the beeswarm package, that bears some similarity.
The bee swarm plot is a one-dimensional scatter plot like "stripchart", but with closely-packed, non-overlapping points.
Here's an example:
library(beeswarm)
beeswarm(time_survival ~ event_survival, data = breast,
method = 'smile',
pch = 16, pwcol = as.numeric(ER),
xlab = '', ylab = 'Follow-up time (months)',
labels = c('Censored', 'Metastasis'))
legend('topright', legend = levels(breast$ER),
title = 'ER', pch = 16, col = 1:2)
(source: eklund at www.cbs.dtu.dk)
I have come up with code similar to Joris's; still, I think this is more than a stem plot. Here I mean that the y value in each series is the absolute value of the distance to the in-bin mean, and the x value is more about whether the value is lower or higher than the mean.
Example code (sometimes throws warnings but works):
px<-function(x,N=40,...){
x<-sort(x);
#Cutting in bins
cut(x,N)->p;
#Calculate the means over bins
sapply(levels(p),function(i) mean(x[p==i]))->meansl;
means<-meansl[p];
#Calculate the mins over bins
sapply(levels(p),function(i) min(x[p==i]))->minl;
mins<-minl[p];
#Each dot is one value.
#X is an order of a value inside bin, moved so that the values lower than bin mean go below 0
X<-rep(0,length(x));
for(e in levels(p)) X[p==e]<-(1:sum(p==e))-1-sum((x-means)[p==e]<0);
#Y is a bin minimum + absolute value of a difference between value and its bin mean
plot(X,mins+abs(x-means),pch=19,cex=0.5,...);
}
Try the vioplot package:
library(vioplot)
vioplot(rnorm(100))
(with awful default color ;-)
There is also wvioplot() in the wvioplot package, for weighted violin plot, and beanplot, which combines violin and rug plots. They are also available through the lattice package, see ?panel.violin.
Since this hasn't been mentioned yet, there is also ggbeeswarm, a relatively new R package based on ggplot2.
It adds another geom to ggplot that can be used instead of geom_jitter or the like.
In particular, geom_quasirandom (see the second example below) produces really good results, and I have in fact adopted it as my default plot.
Noteworthy is also the package vipor (VIolin POints in R) which produces plots using the standard R graphics and is in fact also used by ggbeeswarm behind the scenes.
set.seed(12345)
install.packages('ggbeeswarm')
library(ggplot2)
library(ggbeeswarm)
ggplot(iris,aes(Species, Sepal.Length)) + geom_beeswarm()
ggplot(iris,aes(Species, Sepal.Length)) + geom_quasirandom()
#compare to jitter
ggplot(iris,aes(Species, Sepal.Length)) + geom_jitter()

Resources