How to draw loess estimation in GGally using ggpairs? - r

I have been trying out the GGally package, especially the ggpairs function. However, I cannot figure out how to use loess instead of lm when plotting the smoother. Any ideas?
Here is my code:
require(GGally)
diamonds.samp <- diamonds[sample(1:dim(diamonds)[1],200),]
ggpairs(diamonds.samp[,c(1,5)],
lower = list(continuous = "smooth"),
params = c(method = "loess"),
axisLabels = "show")
Thanks!
P.S. Compared with the plotmatrix function, ggpairs is much, much slower. As a result, most of the time I just use plotmatrix from ggplot2.

Often it is best to write your own function for ggpairs to use. Adapted from this answer to a similar question.
library(ggplot2)
library(GGally)
diamonds_sample = diamonds[sample(1:dim(diamonds)[1], 200), ]
# Function to return points and geom_smooth
# allow for the method to be changed
custom_function = function(data, mapping, method = "loess", ...){
  p = ggplot(data = data, mapping = mapping) +
    geom_point() +
    geom_smooth(method = method, ...)
  p
}
# test it
ggpairs(diamonds_sample,
  lower = list(continuous = custom_function)
)
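Produces this: [figure: pairs plot of the sampled diamonds data, with points and loess curves in the lower panels]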
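Because the custom function forwards its `...` to geom_smooth, further smoother settings can be supplied through GGally's wrap() helper. A minimal sketch, assuming a GGally version that provides wrap(); the name points_with_smooth, the seed, the column subset and the span value are illustrative choices, not part of the answer above:

```r
library(ggplot2)
library(GGally)

set.seed(1)  # arbitrary seed, only to make the sample reproducible
diamonds_sample <- diamonds[sample(nrow(diamonds), 200),
                            c("carat", "depth", "price")]

# Points plus a smoother; extra arguments are forwarded to geom_smooth()
points_with_smooth <- function(data, mapping, method = "loess", ...) {
  ggplot(data = data, mapping = mapping) +
    geom_point() +
    geom_smooth(method = method, formula = y ~ x, ...)
}

# wrap() pre-fills arguments: span and se end up in geom_smooth()
p <- ggpairs(
  diamonds_sample,
  lower = list(continuous = wrap(points_with_smooth, span = 0.9, se = FALSE))
)
p
```

Keeping the smoother arguments in wrap() rather than hard-coding them means the same function can be reused with method = "lm" or a different span.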

Well, the documentation doesn't say, so use the source, Luke.
1. You can dig deeper into the source with:
ls('package:GGally')
GGally::ggpairs
... and browse every function it references ...
It seems the arguments get mapped into ggpairsPlots and then passed to plotMatrix, which then gets called.
So apparently selecting the smoother is not explicitly supported; you can only select continuous = "smooth". If it behaves like ggplot2::geom_smooth, it automatically picks one of the supported smoothers internally (loess for <1000 data points, gam for >=1000).
You might like to step through it in the debugger to see what's happening inside your plot. I tried to follow the source, but my eyes glazed over.
2. Or browse the source at https://github.com/ggobi/ggally/blob/master/R/ggpairs.r [4/14/2013]:
#' upper and lower are lists that may contain the variables 'continuous',
#' 'combo' and 'discrete'. Each element of the list is a string implementing
#' the following options: continuous = exactly one of ('points', 'smooth',
#' 'density', 'cor', 'blank') , ...
#'
#' diag is a list that may only contain the variables 'continuous' and 'discrete'.
#' Each element of the diag list is a string implmenting the following options:
#' continuous = exactly one of ('density', 'bar', 'blank');
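The "use the source" approach can be sketched in a few lines. This assumes a reasonably recent GGally, where the string options such as "smooth" correspond to exported functions prefixed with ggally_; that naming is an assumption about current versions, not something the 2013 source quoted above confirms:

```r
library(GGally)

# Argument names of ggpairs(); upper, lower and diag are among them
names(formals(ggpairs))

# Exported building blocks whose names start with "ggally_"
grep("^ggally_", ls("package:GGally"), value = TRUE)

# Uncomment to step through a single ggpairs() call interactively
# debugonce(ggpairs)
```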

Related

Add text to a ggpairs() scatterplot?

Dumb but maddening question: how can I add text labels to my scatterplot points in a ggpairs(...) plot? ggpairs(...) is from the GGally library. The normal geom_text(...) function doesn't seem to be an option, as it takes x,y arguments and ggpairs creates an NxN matrix of differently-styled plots.
Not showing data, but imagine I have a column called "ID" with id's of each point that's displayed in the scatterplots.
Happy to add data if it helps, but not sure it's necessary. And maybe the answer is simply that it isn't possible to add text labels to ggpairs(...)?
library(ggplot2)
library(GGally)
ggpairs(hwWrld[, c(2,6,4)], method = "pearson")
Note: Adding labels is for my personal reference. So no need to tell me it would look like an absolute mess. It will. I'm just looking to identify my outliers.
Thanks!
It is most certainly possible. Looking at the documentation for ?GGally::ggpairs, there are three arguments, upper, lower and diag, which the details section of the documentation describes as follows:
Upper and lower are lists that may contain the variables 'continuous', 'combo', 'discrete' and 'na'. Each element of the list may be a function or a string
... (more description)
If a function is supplied as an option to upper, lower, or diag, it should implement the function api of function(data, mapping, ...){#make ggplot2 plot}. If a specific function needs its parameters set, wrap(fn, param1 = val1, param2 = val2) the function with its parameters.
Thus a way to "make a label" would be to overwrite the default value of a plot. For example if we wanted to write "hello world" in the upper triangle we could do something like:
library(ggplot2)
library(GGally)
#' Plot continuous upper function, by adding text to the standard plot
#' text is placed straight in the middle, over anything already residing there!
continuous_upper_plot <- function(data, mapping, text, ...){
p <- ggally_cor(data, mapping, ...)
if(!is.data.frame(text))
text <- data.frame(text = text)
lims <- layer_scales(p)
p + geom_label(data = text, aes(x = mean(lims$x$range$range),
y = mean(lims$y$range$range),
label = text),
inherit.aes = FALSE)
}
ggpairs(iris, upper = list(continuous = wrap(continuous_upper_plot,
text = 'hello world')))
with the end result being: [figure: iris pairs plot with a "hello world" label in each continuous upper-triangle panel]
There are 3 things to note here:
1. I've decided to add the text in the function itself. If your text is part of your existing data, simply using the mapping (aes) argument when calling the function will suffice. This is likely also better, as you are looking to add text to specific points.
2. If you have any additional arguments to a function (besides data and mapping), you will need to use wrap to add these to the call.
3. The function documentation specifically says that the arguments should be data, mapping, rather than mapping, data, which is the order used by ggplot2 layer functions such as geom_point. As such, for any of those ggplot2 functions a small wrapper switching the positions will be necessary to overwrite the default arguments for ggpairs.
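For the original goal of labelling individual points, the same function API can be used with geom_text. A minimal sketch, in which the mtcars data, the ID column built from its row names, and the function name labelled_points are all stand-ins for the asker's unseen data; it assumes ggpairs hands the full data frame to each panel function, so the ID column is available even though it is not plotted:

```r
library(ggplot2)
library(GGally)

# Hypothetical stand-in data: three numeric columns plus an ID column
dat <- mtcars[, c("mpg", "disp", "hp")]
dat$ID <- rownames(mtcars)

# Scatterplot with a text label next to every point;
# label_col names the column that holds the labels
labelled_points <- function(data, mapping, label_col = "ID", ...) {
  ggplot(data = data, mapping = mapping) +
    geom_point(...) +
    geom_text(aes(label = .data[[label_col]]), size = 2, vjust = -0.5)
}

p <- ggpairs(
  dat,
  columns = 1:3,  # plot only the numeric columns
  lower = list(continuous = wrap(labelled_points, label_col = "ID"))
)
p
```

The labels will overlap heavily on anything but small data sets, which matches the "for my personal reference" use described in the question.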

Plot a normal distribution in R with specific parameters

I'd like to plot something like this:
plot(dnorm(mean=2),from=-3,to=3)
But it doesn't work as if you do:
plot(dnorm,from=-3,to=3)
what is the problem?
The answer you received from @r2evans is excellent. You might also want to consider learning ggplot, as in the long run it will likely make your life much easier. In that case, you can use stat_function, which will plot the results of an arbitrary function along a grid of the x variable. It accepts arguments to the function as a list.
library(ggplot2)
ggplot(data = data.frame(x=c(-3,3)), aes(x = x)) +
stat_function(fun = dnorm, args = list(mean = 2))
curve(dnorm(x, mean = 2), from = -3, to = 3)
The curve function looks for the xname= variable (defaults to x) in the function call, so in dnorm(x, mean=2) the x does not reference an x in the calling environment; it is a placeholder that curve fills with the values it iterates over.
The reason plot(dnorm, ...) works as it does is that graphics::plot.function exists, and dnorm in that case is a function. When you try plot(dnorm(mean=2)), the dnorm(mean=2) is no longer a function, it is a call ... that happens to fail because it requires that x (its first argument) be provided.
Incidentally, plot.function calls curve(...), so apart from convenience (or perhaps a little code-golf) there is very little reason to use plot(dnorm, ...) over curve(dnorm(x), ...). The biggest advantage of curve is that it lets you control arbitrary arguments to the dnorm() function, whereas plot.function does not.
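The difference can be seen side by side in a short sketch. The anonymous-function variant is an addition here, not part of the answer above; it works because plot() then receives a genuine function of one variable:

```r
# curve() substitutes a grid of values for x, so extra arguments are allowed
res <- curve(dnorm(x, mean = 2), from = -3, to = 3)

# plot() needs a function; an anonymous function fixes mean = 2
plot(function(x) dnorm(x, mean = 2), from = -3, to = 3)

# curve() invisibly returns the grid it evaluated
str(res)
```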

R, ggplot2: How can I visualize the inheritance or flow of data, attributes, and other components of a graphical object in ggplot2?

I am often unsure of exactly which elements of data, attributes, and other components of a graphical object in ggplot2 are inherited by which other elements, and where the defaults that flow down to, e.g., geoms, originate. In particular cases these questions can generally be answered by close reading of Hadley's ggplot2 book. But I would find it useful to have some sort of visualization of the overall flow of inheritance in ggplot2, and I wonder if anyone has seen, or created, or knows how to create, such a thing. In the same vein, a compact list of default values which arise in one level of specification (like the aes or a theme) and are inherited by another level (like a geom or scale) would be very useful to me, and I suspect to many people learning how to use ggplot2.
I would accept any of the following as an answer:
An inheritance visualization (perhaps as a network?) or a pointer to same.
Code to construct an inheritance visualization, or a pointer to same.
An alternative, non-visual approach that makes it easy, or at least easier, to understand and remember such inheritance and answer specific questions about it.
A list specifically of argument defaults showing where they arise and which subsidiary functions inherit them, or code to produce such a list.
This question seems to be about multiple levels of the ggplot package at once, but I'll try my best to give some information. It's almost impossible to describe the entire inheritance system of ggplot in one Stack Overflow answer, but pointing at the right functions might help get you started.
At the top level, both data and aesthetic mappings are inherited from the main ggplot call. In the code below, geom_point() inherits the mapping and data:
ggplot(iris, aes(Sepal.Width, Sepal.Length)) +
geom_point()
Unless you explicitly provide an alternative mapping and set the inheritance to false:
ggplot(iris, aes(Sepal.Width, Sepal.Length)) +
geom_point(aes(Petal.Width, Petal.Length), inherit.aes = FALSE)
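A quick way to check what a layer carries itself and what it takes from the plot is to inspect the plot object. A minimal sketch combining the two examples above in one plot (the red colour is only there to tell the layers apart):

```r
library(ggplot2)

p <- ggplot(iris, aes(Sepal.Width, Sepal.Length)) +
  geom_point() +                               # layer 1: inherits from ggplot()
  geom_point(aes(Petal.Width, Petal.Length),   # layer 2: brings its own mapping
             inherit.aes = FALSE, colour = "red")

p$mapping                  # plot-level mapping
p$layers[[1]]$mapping      # nothing set on the layer itself
p$layers[[1]]$inherit.aes  # TRUE
p$layers[[2]]$mapping      # layer-level mapping
p$layers[[2]]$inherit.aes  # FALSE
```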
Next, at the level of individual layers, certain defaults are inherited from either the stats, geoms or positions. Consider the following plot:
df <- reshape2::melt(volcano)
ggplot(df, aes(Var1, Var2)) +
geom_raster()
The raster will be a dark grey colour, because we haven't specified a fill mapping. You can get a sense of what defaults a geom / stat has by looking at their ggproto objects:
> GeomRaster$default_aes
Aesthetic mapping:
* `fill` -> "grey20"
* `alpha` -> NA
> StatDensity$default_aes
Aesthetic mapping:
* `y` -> `stat(density)`
* `fill` -> NA
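The same kind of inspection works for any exported geom or stat. A small sketch; the exact default values printed depend on the installed ggplot2 version, so no output is shown here:

```r
library(ggplot2)

# Defaults used when an aesthetic is neither mapped nor set
GeomPoint$default_aes
names(GeomRaster$default_aes)

# Aesthetics that must be supplied, and everything the geom understands
GeomRaster$required_aes
GeomRaster$aesthetics()
```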
Another key ingredient for understanding how layers are given their parameters is the layer() code. Specifically, this bit here (abbreviated for clarity):
function (geom = NULL, stat = NULL, data = NULL, mapping = NULL,
position = NULL, params = list(), inherit.aes = TRUE, check.aes = TRUE,
check.param = TRUE, show.legend = NA, key_glyph = NULL,
layer_class = Layer)
{
...
aes_params <- params[intersect(names(params), geom$aesthetics())]
geom_params <- params[intersect(names(params), geom$parameters(TRUE))]
stat_params <- params[intersect(names(params), stat$parameters(TRUE))]
...
ggproto("LayerInstance", layer_class, geom = geom, geom_params = geom_params,
stat = stat, stat_params = stat_params, data = data,
mapping = mapping, aes_params = aes_params, position = position,
inherit.aes = inherit.aes, show.legend = show.legend)
}
Wherein you can see that whatever parameters you give, they are checked against valid parameters of the stat/geom/position and distributed to the appropriate part of the layer. As you can see from the last call, a Layer ggproto object is created. The parent of this class is not exported but you can still inspect the functions inside that object. For example, if you are curious about how aesthetics are evaluated, you can type:
ggplot2:::Layer$compute_aesthetics
Wherein you can see that some defaults of the scales are incorporated here as well. Of course, it doesn't make much sense what these layers do if you don't understand the order of operations in which these layer functions are called. For that we can have a look at the plot builder (also abbreviated for clarity):
> ggplot2:::ggplot_build.ggplot
function (plot)
{
...
data <- by_layer(function(l, d) l$setup_layer(d, plot))
...
data <- by_layer(function(l, d) l$compute_aesthetics(d, plot))
data <- lapply(data, scales_transform_df, scales = scales)
...
data <- layout$map_position(data)
data <- by_layer(function(l, d) l$compute_statistic(d, layout))
data <- by_layer(function(l, d) l$map_statistic(d, plot))
scales_add_missing(plot, c("x", "y"), plot$plot_env)
data <- by_layer(function(l, d) l$compute_geom_1(d))
data <- by_layer(function(l, d) l$compute_position(d, layout))
...
data <- by_layer(function(l, d) l$compute_geom_2(d))
data <- by_layer(function(l, d) l$finish_statistics(d))
...
structure(list(data = data, layout = layout, plot = plot),
class = "ggplot_built")
}
From this, you can see that layers are set up first, then aesthetics are computed, then scale transformations are applied, then statistics are computed, then a part of the geom is computed, then the position, and finally the rest of the geom.
What this means is that the statistical transformations you put in are going to be affected by scale transformations, but not coord transformations (which are applied later, elsewhere).
If you go through the code, you'll find that almost nothing theme-related is evaluated up until this point (except for some theme evaluation with the facets). As you can see, the building function returns an object of the class ggplot_built, which is still not graphical output. The interpretation of theme elements and actual interpretation of the geoms towards grid graphics happens in the following function:
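The order of operations can be checked with a toy example. A minimal sketch with made-up data: the mean is computed on log10 values when the transformation lives in the scale, but on the raw values when it lives in the coord (newer ggplot2 releases may offer coord_transform() as the replacement for coord_trans()):

```r
library(ggplot2)

# Made-up data: one group, values spread over two orders of magnitude
df <- data.frame(g = "a", y = c(1, 10, 100))

base <- ggplot(df, aes(g, y)) +
  stat_summary(fun = mean, geom = "point")

p_scale <- base + scale_y_log10()          # transform happens before the stat
p_coord <- base + coord_trans(y = "log10") # transform happens after the stat

layer_data(p_scale)$y  # 1  = mean(log10(c(1, 10, 100)))
layer_data(p_coord)$y  # 37 = mean(c(1, 10, 100))
```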
ggplot2:::ggplot_gtable.ggplot_built
After this function, you'll have a gtable object that can be interpreted by grid::grid.draw() which will output to your graphics device.
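The build, gtable and draw steps can also be run by hand through the exported generics, which is a convenient way to look at the intermediate objects. A minimal sketch using the iris plot from earlier in this answer:

```r
library(ggplot2)

p <- ggplot(iris, aes(Sepal.Width, Sepal.Length)) +
  geom_point()

# Step 1: run stats, scales and positions; no graphics yet
built <- ggplot_build(p)
head(built$data[[1]])   # computed data for the first layer

# Step 2: apply the theme and turn geoms into grid grobs
gt <- ggplot_gtable(built)

# Step 3: draw the gtable on the current graphics device
grid::grid.newpage()
grid::grid.draw(gt)
```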
Unfortunately, I'm not very well-versed in the inheritance of theme elements, but as Jon Spring pointed out in the comments, a good place to start is the documentation.
Hopefully I've pointed out the functions in which to look for inheritance patterns in ggplot.

Basic Calculations with stat_functions -- Plotting hazard functions

I am currently trying to plot some density distributions functions with R's ggplot2. I have the following code:
f <- stat_function(fun="dweibull",
args=list("shape"=1),
"x" = c(0,10))
stat_F <- stat_function(fun="pweibull",
args=list("shape"=1),
"x" = c(0,10))
S <- function() 1 - stat_F
h <- function() f / S
wei_h <- ggplot(data.frame(x=c(0,10))) +
stat_function(fun=h) +
...
Basically I want to plot hazard functions based on a Weibull Distribution with varying parameters, meaning I want to plot:
The above code gives me this error:
Computation failed in stat_function():
unused argument (x_trans)
I also tried to directly use
S <- 1 - stat_function(fun="pweibull", ...)
instead of the above "workaround" with the custom function construction. This threw another error, since I was trying to do arithmetic on a non-numeric object:
non-numeric argument for binary operator
I understand that error, but I have no idea how to solve it.
I have done some research, but without success. I feel like this should be straightforward. Also, I would like to do it "manually" as much as possible, but if there is no simple way to do this, then a packaged solution is just fine as well.
Thanks in advance for any suggestions!
PS: I basically want to recreate the graph you can find in Kiefer, 1988 on page 10 of the linked PDF file.
Three comments:
1. stat_function is a function statistic for ggplot2. You cannot divide two stat_function expressions by each other or otherwise use them in mathematical expressions, as in S <- 1 - stat_function(fun="pweibull", ...). That's a fundamental misunderstanding of what stat_function is. stat_function always needs to be added to a ggplot2 plot, as in the example below.
2. The fun argument for stat_function takes a function as an argument, not a string. You can define functions on the fly if you need ones that don't exist already.
3. You need to set up an aesthetic mapping, via the aes function.
This code works:
args = list("shape" = 1.2)
ggplot(data.frame(x = seq(0, 10, length.out = 100)), aes(x)) +
stat_function(fun = dweibull, args = args, color = "red") +
stat_function(fun = function(...){1-pweibull(...)}, args = args, color = "green") +
stat_function(fun = function(...){dweibull(...)/(1-pweibull(...))},
args = args, color = "blue")
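To compare hazards for several parameter values, the ratio can be pulled out into a named function and reused across stat_function layers. A minimal sketch; the function name weibull_hazard and the three shape values are illustrative and are not taken from Kiefer (1988). The x range starts slightly above zero because the hazard diverges at 0 for shape < 1:

```r
library(ggplot2)

# Hazard = density / survival; lower.tail = FALSE gives 1 - pweibull() directly
weibull_hazard <- function(x, shape, scale = 1) {
  dweibull(x, shape = shape, scale = scale) /
    pweibull(x, shape = shape, scale = scale, lower.tail = FALSE)
}

p <- ggplot(data.frame(x = c(0.05, 3)), aes(x)) +
  stat_function(fun = weibull_hazard, args = list(shape = 0.5),
                aes(colour = "shape = 0.5")) +
  stat_function(fun = weibull_hazard, args = list(shape = 1),
                aes(colour = "shape = 1")) +
  stat_function(fun = weibull_hazard, args = list(shape = 1.5),
                aes(colour = "shape = 1.5")) +
  labs(y = "hazard", colour = NULL)
p
```

With scale = 1 the ratio reduces to the closed form shape * x^(shape - 1), which gives a constant hazard for shape = 1 and is a handy sanity check.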

Error in stop_if_params_exist(obj$params) : 'params' is a deprecated argument [duplicate]

I am trying to replicate this simple example given in the Coursera R Regression Models course:
require(datasets)
data(swiss)
require(GGally)
require(ggplot2)
ggpairs(swiss, lower = list(continuous = "smooth", params = c(method = "loess")))
I expect to see a 6x6 pairs plot - one scatterplot with loess smoother and confidence intervals for each combination of the 6 variables in the swiss data.
However, I get the following error:
Error in display_param_error() : 'params' is a deprecated argument.
Please 'wrap' the function to supply arguments. help("wrap", package =
"GGally")
I looked through the ggpairs() and wrap() help files and have tried lots of permutations of the wrap() and wrap_fn_with_param_arg() functions.
I can get this to work as expected:
ggpairs(swiss, lower = list(continuous = wrap("smooth")))
But once I add the loess part in, it does not:
ggpairs(swiss, lower = list(continuous = wrap("smooth"), method = wrap("loess")))
I get this error when I tried the line above.
Error in value[3L] : The following ggpair plot functions
are readily available: continuous: c('points', 'smooth', 'density',
'cor', 'blank') combo: c('box', 'dot', 'facethist', 'facetdensity',
'denstrip', 'blank') discrete: c('ratio', 'facetbar', 'blank') na:
c('na', 'blank')
diag continuous: c('densityDiag', 'barDiag', 'blankDiag') diag
discrete: c('barDiag', 'blankDiag') diag na: c('naDiag', 'blankDiag')
You may also provide your own function that follows the api of
function(data, mapping, ...){ . . . } and returns a ggplot2 plot
object Ex: my_fn <- function(data, mapping, ...){ p <-
ggplot(data = data, mapping = mapping) +
geom_point(...) p } ggpairs(data, lower = list(continuous = my_fn))
Function provided: loess
Obviously I am entering loess in the wrong place. Can anyone help me understand how to add the loess part in?
Note that my problem is different to this one, as I am asking how to implement loess in ggpairs since the params argument became deprecated.
Thanks very much.
One quick way is to write your own function... the one below was edited from the one provided by the ggpairs error message in your question
library(GGally)
library(ggplot2)
data(swiss)
# Function to return points and geom_smooth
# allow for the method to be changed
my_fn <- function(data, mapping, method="loess", ...){
p <- ggplot(data = data, mapping = mapping) +
geom_point() +
geom_smooth(method=method, ...)
p
}
# Default loess curve
ggpairs(swiss[1:4], lower = list(continuous = my_fn))
# Use wrap to add further arguments; change method to lm
ggpairs(swiss[1:4], lower = list(continuous = wrap(my_fn, method="lm")))
This perhaps gives a bit more control over the arguments that are passed to each geom:
my_fn <- function(data, mapping, pts=list(), smt=list(), ...){
  ggplot(data = data, mapping = mapping, ...) +
    do.call(geom_point, pts) +
    do.call(geom_smooth, smt)
}
# Plot
ggpairs(swiss[1:4],
        lower = list(continuous =
                       wrap(my_fn,
                            pts=list(size=2, colour="red"),
                            smt=list(method="lm", se=FALSE, size=5, colour="blue"))))
Maybe you are taking the Coursera online course Regression Models and, like me, came across this error while converting the R Markdown file provided by the course to HTML.
This is what worked for me:
require(datasets); data(swiss); require(GGally); require(ggplot2)
g = ggpairs(swiss, lower = list(continuous = wrap("smooth", method = "lm")))
g
You can also try method = "loess", but the outcome looks a bit different from the one shown in the lecture. As far as I can see, method = "lm" is the closer match.
I also suspected you were taking Coursera's class, though I could not find any GitHub repo containing the ggplot examples.
Here's what I did to make it work:
gp = ggpairs(swiss, lower = list(continuous = "smooth"))
gp
This works:
ggpairs(swiss, lower = list(continuous = wrap("smooth", method = "loess")))
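Two equivalent-looking spellings are sketched below. The first repeats the wrap() form from the line above on a column subset; the second relies on the assumption that the installed GGally exports ggally_smooth_loess() and accepts the string "smooth_loess", which holds for recent versions as far as I know but should be checked against your installation:

```r
library(ggplot2)
library(GGally)
data(swiss)

# "smooth" with the method passed through wrap()
g1 <- ggpairs(swiss[1:4],
              lower = list(continuous = wrap("smooth", method = "loess")))

# Assumed shortcut: the ggally_smooth_loess() panel function, selected by name
g2 <- ggpairs(swiss[1:4],
              lower = list(continuous = "smooth_loess"))

g1
```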

Resources