Add text to a ggpairs() scatterplot? - r

Dumb but maddening question: How can I add text labels to my scatterplot points in a ggpairs(...) plot? ggpairs(...) is from the GGally library. The normal geom_text(...) function doesn't seem to be an option, as it takes x,y arguments and ggpairs creates an NxN matrix of differently-styled plots.
Not showing data, but imagine I have a column called "ID" with the IDs of each point that's displayed in the scatterplots.
Happy to add data if it helps, but not sure it's necessary. And maybe the answer is simply that it isn't possible to add text labels to ggpairs(...)?
library(ggplot2)
library(GGally)
ggpairs(hwWrld[, c(2,6,4)], method = "pearson")
Note: Adding labels is for my personal reference. So no need to tell me it would look like an absolute mess. It will. I'm just looking to identify my outliers.
Thanks!

It is most certainly possible. Looking at the documentation for ?GGally::ggpairs, there are three arguments, upper, lower and diag, which the details section of the documentation describes as follows:
Upper and lower are lists that may contain the variables 'continuous', 'combo', 'discrete' and 'na'. Each element of the list may be a function or a string
... (more description)
If a function is supplied as an option to upper, lower, or diag, it should implement the function api of function(data, mapping, ...){#make ggplot2 plot}. If a specific function needs its parameters set, wrap(fn, param1 = val1, param2 = val2) the function with its parameters.
Thus a way to "make a label" would be to overwrite the default value of a plot. For example if we wanted to write "hello world" in the upper triangle we could do something like:
library(ggplot2)
library(GGally)
#' Plot continuous upper function, by adding text to the standard plot
#' text is placed straight in the middle, over anything already residing there!
continuous_upper_plot <- function(data, mapping, text, ...){
p <- ggally_cor(data, mapping, ...)
if(!is.data.frame(text))
text <- data.frame(text = text)
lims <- layer_scales(p)
p + geom_label(data = text, aes(x = mean(lims$x$range$range),
y = mean(lims$y$range$range),
label = text),
inherit.aes = FALSE)
}
ggpairs(iris, upper = list(continuous = wrap(continuous_upper_plot,
text = 'hello world')))
with the end result being:
There are 3 things to note here:
I've decided to add the text in the function itself. If your text is part of your existing data, simply using the mapping (aes) argument when calling the function will suffice. And this is likely also better, as you are looking to add text to specific points.
If you have any additional arguments to a function (outside data and mapping) you will need to use wrap to add these to the call.
The function documentation specifically says that arguments should be data, mapping rather than the standard for ggplot2 which is mapping, data. As such for any of the ggplot functions a small wrapper switching their positions will be necessary to overwrite the default arguments for ggpairs.
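To make the first note concrete for the original question (labelling points with an ID column), here is a minimal sketch. It assumes the ID column is present in the data frame passed to ggpairs(); the mtcars-based data, the column name ID and the helper name labelled_points are made up for illustration and are not part of GGally.

```r
library(ggplot2)
library(GGally)

# Hypothetical example data: three numeric columns plus an ID column
df <- mtcars[, c("mpg", "hp", "wt")]
df$ID <- rownames(mtcars)

# Panel function following the function(data, mapping, ...) API:
# draws the points, then labels each one with its ID
labelled_points <- function(data, mapping, ...) {
  ggplot(data = data, mapping = mapping) +
    geom_point() +
    geom_text(aes(label = ID), size = 2, vjust = -0.6)
}

# Only the three numeric columns are plotted; ID is used for labels only
p <- ggpairs(df, columns = 1:3,
             lower = list(continuous = labelled_points))
print(p)
```

Because the labels are added inside the panel function, only the lower-triangle scatterplots are affected; the diagonal and upper panels keep their defaults.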

Related

R, ggplot2: How can I visualize the inheritance or flow of data, attributes, and other components of a graphical object in ggplot2?

I am often unsure of exactly which elements of data, attributes, and other components of a graphical object in ggplot2 are inherited by which other elements, and where the defaults that flow down to, e.g., geoms, originate. In particular cases these questions can generally be answered by close reading of Hadley's ggplot2 book. But I would find it useful to have some sort of visualization of the overall flow of inheritance in ggplot2, and I wonder if anyone has seen, or created, or knows how to create, such a thing. In the same vein, a compact list of default values which arise in one level of specification (like the aes or a theme) and are inherited by another level (like a geom or scale) would be very useful to me, and I suspect to many people learning how to use ggplot2.
I would accept any of the following as an answer:
An inheritance visualization (perhaps as a network?) or a pointer
to same.
Code to construct an inheritance visualization, or a pointer to
same.
An alternative, non-visual approach that makes it easy, or at least
easier, to understand and remember such inheritance and answer
specific questions about it.
A list specifically of argument defaults showing where they arise and
which subsidiary functions inherit them, or code to produce such a
list.
This question seems to be about multiple levels of the ggplot package at once, but I'll try my best to give some information. It's almost impossible to describe the entire inheritance system of ggplot in one stack overflow answer, but pointing at the right functions might help get you started.
At the top level, both data and aesthetic mappings are inherited from the main ggplot call. In code below, geom_point() inherits the mapping and data:
ggplot(iris, aes(Sepal.Width, Sepal.Length)) +
geom_point()
Unless you explicitly provide an alternative mapping and set the inheritance to false:
ggplot(iris, aes(Sepal.Width, Sepal.Length)) +
geom_point(aes(Petal.Width, Petal.Length), inherit.aes = FALSE)
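One way to check which mapping a layer actually ended up with is to look at the data ggplot2 computes for it. The following is a small sketch using layer_data(); the object names inherited and overridden are just illustrative.

```r
library(ggplot2)

# Layer 1 inherits the plot-level mapping; layer 2 replaces it entirely
p <- ggplot(iris, aes(Sepal.Width, Sepal.Length)) +
  geom_point() +
  geom_point(aes(Petal.Width, Petal.Length), inherit.aes = FALSE)

inherited  <- layer_data(p, 1)  # x/y come from Sepal.Width / Sepal.Length
overridden <- layer_data(p, 2)  # x/y come from Petal.Width / Petal.Length

range(inherited$x)
range(overridden$x)
```

Comparing these ranges with range(iris$Sepal.Width) and range(iris$Petal.Width) shows which columns each layer used.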
Next, at the level of individual layers, certain defaults are inherited from either the stats, geoms or positions. Consider the following plot:
df <- reshape2::melt(volcano)
ggplot(df, aes(Var1, Var2)) +
geom_raster()
The raster will be a dark grey colour, because we haven't specified a fill mapping. You can get a sense of what defaults a geom / stat has by looking at their ggproto objects:
> GeomRaster$default_aes
Aesthetic mapping:
* `fill` -> "grey20"
* `alpha` -> NA
> StatDensity$default_aes
Aesthetic mapping:
* `y` -> `stat(density)`
* `fill` -> NA
Another key ingredient for understanding how layers are given their parameters, is looking at the layer() code. Specifically, this bit here (abbreviated for clarity):
function (geom = NULL, stat = NULL, data = NULL, mapping = NULL,
position = NULL, params = list(), inherit.aes = TRUE, check.aes = TRUE,
check.param = TRUE, show.legend = NA, key_glyph = NULL,
layer_class = Layer)
{
...
aes_params <- params[intersect(names(params), geom$aesthetics())]
geom_params <- params[intersect(names(params), geom$parameters(TRUE))]
stat_params <- params[intersect(names(params), stat$parameters(TRUE))]
...
ggproto("LayerInstance", layer_class, geom = geom, geom_params = geom_params,
stat = stat, stat_params = stat_params, data = data,
mapping = mapping, aes_params = aes_params, position = position,
inherit.aes = inherit.aes, show.legend = show.legend)
}
Wherein you can see that whatever parameters you give, they are checked against valid parameters of the stat/geom/position and distributed to the appropriate part of the layer. As you can see from the last call, a Layer ggproto object is created. The parent of this class is not exported but you can still inspect the functions inside that object. For example, if you are curious about how aesthetics are evaluated, you can type:
ggplot2:::Layer$compute_aesthetics
Wherein you can see that some defaults of the scales are incorporated here as well. Of course, it doesn't make much sense what these layers do if you don't understand the order of operations in which these layer functions are called. For that we can have a look at the plot builder (also abbreviated for clarity):
> ggplot2:::ggplot_build.ggplot
function (plot)
{
...
data <- by_layer(function(l, d) l$setup_layer(d, plot))
...
data <- by_layer(function(l, d) l$compute_aesthetics(d, plot))
data <- lapply(data, scales_transform_df, scales = scales)
...
data <- layout$map_position(data)
data <- by_layer(function(l, d) l$compute_statistic(d, layout))
data <- by_layer(function(l, d) l$map_statistic(d, plot))
scales_add_missing(plot, c("x", "y"), plot$plot_env)
data <- by_layer(function(l, d) l$compute_geom_1(d))
data <- by_layer(function(l, d) l$compute_position(d, layout))
...
data <- by_layer(function(l, d) l$compute_geom_2(d))
data <- by_layer(function(l, d) l$finish_statistics(d))
...
structure(list(data = data, layout = layout, plot = plot),
class = "ggplot_built")
}
From this, you can see that layers are set up first, then aesthetics are computed, then scale transformations are applied, then statistics are computed, then a part of the geom is computed, then the position, and finally the rest of the geom.
What this means is that the statistical transformations you put in are going to be affected by scale transformations, but not by coord transformations (which are applied later, elsewhere).
If you go through the code, you'll find that almost nothing theme-related is evaluated up until this point (except for some theme evaluation with the facets). As you can see, the building function returns an object of the class ggplot_built, which is still not graphical output. The interpretation of theme elements and the actual translation of the geoms into grid graphics happens in the following function:
ggplot2:::ggplot_gtable.ggplot_built
After this function, you'll have a gtable object that can be interpreted by grid::grid.draw() which will output to your graphics device.
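The two stages described above can also be run by hand through the exported functions ggplot_build() and ggplot_gtable(). This is only a sketch of the hand-off between them, not a description of everything print() does internally.

```r
library(ggplot2)
library(grid)

p <- ggplot(iris, aes(Sepal.Width, Sepal.Length)) + geom_point()

# Stage 1: compute layer data, scales and layout (no graphics yet)
built <- ggplot_build(p)

# Stage 2: apply theme elements and turn geoms into grid grobs
gt <- ggplot_gtable(built)

# Stage 3: draw the gtable on the current graphics device
grid.newpage()
grid.draw(gt)
```

Inspecting built and gt between the steps is a convenient way to see what has been resolved at each stage.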
Unfortunately, I'm not very well-versed in the inheritance of theme elements, but as Jon Spring pointed out in the comments, a good place to start is the documentation.
Hopefully, I've pointed out the functions where you can look for inheritance patterns in ggplot.

Mosaic plot and text values

I created a structable from the Titanic dataset and used the mosaic function on it. Everything worked great; however, I also wanted to label each box of the mosaic plot with the number of Titanic passengers given their Class, Survival and Sex. As it turns out, I am not able to do that. I know I need to use labeling_cells to achieve that; however, I am not able to use it (and I wasn't able to find any example) in combination with structable and the code below.
library("vcd")
struct <- structable(~ Class + Survived + Sex, data = Titanic)
mosaic(struct, data = Titanic, shade = TRUE, direction = "v")
If I understand your question correctly, then the last example in ?labeling_cells is pretty close to what you want to do. Using your example, the labeling_cells() can be added afterwards provided that the viewport tree is not popped. The only aspect that is somewhat awkward is that the struct object has to be a regular table again for the labeling. I have to ask David, the main author, whether this could be handled automatically.
mosaic(struct, shade = TRUE, direction = "v", pop = FALSE)
labeling_cells(text = as.table(struct), margin = 0)(as.table(struct))
Fixed in upstream in vcd 1.4-4, but note that you can simply use
mosaic(struct, labeling = labeling_values)

Change of colors in compare.matrix command in r

I'm trying to change the colors for the compare.matrix command in r, but the error is always the same:
Error in image.default(x = mids, y = mids, z = mdata, col = c(heat.colors(10)[10:1]), :
formal argument "col" matched by multiple actual arguments
My code is very simple:
compare.matrix(current,ech_b1,nbins=40)
and some of my attempts are:
compare.matrix(current,ech_b1,nbins=40,col=c(grey.colors(5)))
compare.matrix(current,ech_b1,nbins=40,col=c(grey.colors(10)[10:1]))
Assuming you're using compare.matrix() from the SDMTools package, the color arguments appear to be hard-coded into the function, so you'll need to redefine the function in order to make them flexible:
# this shows you the code in the console
SDMTools::compare.matrix
function(x,y,nbins,...){
#---- preceding code snipped ----#
suppressWarnings(image(x=mids, y=mids, z=mdata, col=c(heat.colors(10)[10:1]),...))
#overlay contours
contour(x=mids, y=mids, z=mdata, col="black", lty="solid", add=TRUE,...)
}
So you can make a new one like so, but bummer, there are two functions using the ellipsis that have a col argument predefined. If you'll only be using extra args to image() and not to contour(), this is cheap and easy.
my.compare.matrix <- function(x,y,nbins,...){
#---- preceding code snipped ----#
suppressWarnings(image(x=mids, y=mids, z=mdata,...))
#overlay contours
contour(x=mids, y=mids, z=mdata, col="black", lty="solid", add=TRUE)
}
If, however, you want to use ... for both internal calls, then the only way I know of to avoid confusion about redundant argument names is to do something like:
my.compare.matrix <- function(x,y,nbins,
image.args = list(col=c(heat.colors(10)[10:1])),
contour.args = list(col="black", lty="solid")){
#---- preceding code snipped ----#
# index the argument lists by name ("x", "y", "z"), not by the data objects x and y
contour.args[["x"]] <- contour.args[["y"]] <- image.args[["x"]] <- image.args[["y"]] <- mids
contour.args[["z"]] <- image.args[["z"]] <- mdata
suppressWarnings(do.call(image, image.args))
#overlay contours
do.call(contour, contour.args)
}
Decomposing this change: instead of ... make a named list of arguments, where the previous hard codes are now defaults. You can then change these items by renaming them in the list or adding to the list. This could be more elegant on the user side, but it gets the job done. Both of the above modifications are untested, but should get you there, and this is all prefaced by my above comment. There may be some other problem that cannot be detected by SO Samaritans because you didn't specify the package or the data.
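Since the snippets above are untested and depend on SDMTools internals, here is a self-contained sketch of the same argument-list pattern using only base graphics. The function name plot_with_contours and the use of the built-in volcano matrix are illustrative assumptions; this is not the SDMTools code.

```r
# Hypothetical helper: image() plus contour(), each with its own argument list
plot_with_contours <- function(z, image.args = list(), contour.args = list()) {
  # Former hard-coded values become defaults that the caller can override
  image.defaults   <- list(col = heat.colors(10)[10:1])
  contour.defaults <- list(col = "black", lty = "solid", add = TRUE)
  image.args   <- modifyList(image.defaults, image.args)
  contour.args <- modifyList(contour.defaults, contour.args)

  # Both calls draw the same matrix
  image.args$z   <- z
  contour.args$z <- z

  do.call(image, image.args)
  do.call(contour, contour.args)

  # Return the arguments actually used, for inspection
  invisible(list(image = image.args, contour = contour.args))
}

# Override only the image colours; the contour defaults stay in place
res <- plot_with_contours(volcano, image.args = list(col = grey.colors(10)))
```

Using modifyList() means the caller only has to supply the entries they want to change, which addresses the "could be more elegant on the user side" caveat to some extent.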

How to draw loess estimation in GGally using ggpairs?

I tried the GGally package a little bit, especially the ggpairs function. However, I cannot figure out how to use loess instead of lm when plotting the smooth. Any ideas?
Here is my code:
require(GGally)
diamonds.samp <- diamonds[sample(1:dim(diamonds)[1],200),]
ggpairs(diamonds.samp[,c(1,5)],
lower = list(continuous = "smooth"),
params = c(method = "loess"),
axisLabels = "show")
Thanks!
P.S. Compared with the plotmatrix function, ggpairs is much, much slower... As a result, most of the time, I just use plotmatrix from ggplot2.
Often it is best to write your own function for it to use. Adapted from this answer to a similar question.
library(GGally)
diamonds_sample = diamonds[sample(1:dim(diamonds)[1],200),]
# Function to return points and geom_smooth
# allow for the method to be changed
custom_function = function(data, mapping, method = "loess", ...){
p = ggplot(data = data, mapping = mapping) +
geom_point() +
geom_smooth(method=method, ...)
p
}
# test it
ggpairs(diamonds_sample,
lower = list(continuous = custom_function)
)
Produces this:
Well, the documentation doesn't say, so use the source, Luke.
1. You can dig deeper into the source with:
ls('package:GGally')
GGally::ggpairs
... and browse every function it references ...
It seems like the args get mapped into ggpairsPlots and then into plotMatrix, which then gets called.
So apparently selecting the smoother is not explicitly supported; you can only select continuous = "smooth". If it behaves like ggplot2::geom_smooth, it internally figures out which of the supported smoothers to call (loess for <1000 data points, gam for >=1000).
You might like to step through it in the debugger to see what's happening inside your plot. I tried to follow the source, but my eyes glazed over.
2. Or browse the source at https://github.com/ggobi/ggally/blob/master/R/ggpairs.r [4/14/2013]:
#' upper and lower are lists that may contain the variables 'continuous',
#' 'combo' and 'discrete'. Each element of the list is a string implementing
#' the following options: continuous = exactly one of ('points', 'smooth',
#' 'density', 'cor', 'blank') , ...
#'
#' diag is a list that may only contain the variables 'continuous' and 'discrete'.
#' Each element of the diag list is a string implmenting the following options:
#' continuous = exactly one of ('density', 'bar', 'blank');
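The documentation quoted above is from 2013. As a hedged note for later readers: more recent GGally releases list 'smooth_loess' among the continuous options, so, assuming your installed version provides it, the loess smoother can be requested by name without a custom function.

```r
library(ggplot2)
library(GGally)

set.seed(1)
diamonds_sample <- diamonds[sample(nrow(diamonds), 200), ]

# Assumes a GGally version that ships the "smooth_loess" option
p <- ggpairs(diamonds_sample, columns = c(1, 5),
             lower = list(continuous = "smooth_loess"),
             axisLabels = "show")
# print(p) to draw the plot matrix
```

If the option is not available in your version, the custom-function approach shown earlier still applies.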

How to only change parameters for "lower" plots in the ggpairs function from GGally package

I have the following example
data(diamonds, package="ggplot2")
diamonds.samp <- diamonds[sample(1:dim(diamonds)[1],200),]
ggpairs(diamonds.samp, columns=8:10,
upper=list(continuous='cor'),
lower=list(continuous = 'points'),
diag=list(continuous='density'),
axisLabels='show'
)
Resulting in a really nice figure:
But my problem is that in the real dataset I have too many points, so I would like to change the parameters for the point geom. I want to reduce the dot size and use a lower alpha value. However, I cannot do this with the "params" option, because it applies to all plots, not just the lower ones:
ggpairs(diamonds.samp, columns=8:10,
upper=list(continuous='cor'),
lower=list(continuous = 'points'),
diag=list(continuous='density'),
params=c(alpha=1/10),
axisLabels='show'
)
resulting in this plot:
Is there a way to apply parameters to only "lower" plots - or do I have to use the ability to create custom plots as suggested in the topic How to adjust figure settings in plotmatrix?
In advance - thanks!
There doesn't seem to be any elegant way to do it, but you can bodge it by writing a function to get back the existing subchart calls from the ggpairs() object and then squeezing the params in before the last bracket. [Not very robust; it'll only work if the graphs are already valid.]
diamonds.samp <- diamonds[sample(1:dim(diamonds)[1],200),]
g<-ggpairs(diamonds.samp, columns=8:10,
upper=list(continuous='cor'),
lower=list(continuous = 'points'),
diag=list(continuous='density'),
axisLabels='show'
)
add_p<-function(g,i,params){
side=length(g$columns) # get number of cells per side
lapply(i,function(i){
s<-as.character(g$plots[i]) # get existing call as a template
l<-nchar(s)
p<-paste0(substr(s,1,l-1),",",params,")") # append params before last bracket
r<-i%/%side+1 # work out the position on the grid
c<-i%%side
array(c(p,r,c)) # return the sub-plot and position data
})
}
rep_cells<-c(4,7,8)
add_params<-"alpha=0.3, size=0.1, color='red'"
ggally_data<-g$data # makes sure the internal parameter picks up your data (it always calls its data 'ggally_data')
calls<-add_p(g,rep_cells,params=add_params) #call the function
for(i in 1:length(calls)){g<-putPlot(g,calls[[i]][1],as.numeric(calls[[i]][2]),as.numeric(calls[[i]][3]))}
g # call the plot
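As a hedged alternative for readers on a newer GGally: the params argument was later replaced by wrap(), which attaches parameters to a single panel function. Assuming a version that provides wrap(), the alpha and size settings can be limited to the lower panels like this:

```r
library(ggplot2)
library(GGally)

set.seed(1)
diamonds.samp <- diamonds[sample(nrow(diamonds), 200), ]

# wrap() binds alpha and size to the lower "points" panels only
p <- ggpairs(diamonds.samp, columns = 8:10,
             upper = list(continuous = "cor"),
             lower = list(continuous = wrap("points", alpha = 0.3, size = 0.1)),
             diag = list(continuous = "density"),
             axisLabels = "show")
# print(p) to draw the plot matrix
```

The upper and diagonal panels are left untouched because the wrapped function is only supplied in the lower list.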

Resources