R: pass a graph as a parameter to a function

I have a decent-looking graph, which I plotted using:
r <- ggplot(data=data2.Gurgaon,aes(x=createdDate,y=count))+geom_point()
Now I want to highlight a few points on the graph, say 500, 1000, 5000, etc.,
so I am trying to write a function to which I can pass the point I want to mark.
Below is the function I have written:
graphPoint <- function(graph,point) {
g <- graph
g <- g+geom_point(aes(x=createdDate[point],y=count[point]),pch=1,size=8,col='black')
g <- g+ geom_point(aes(x=createdDate[point],y=count[point]),pch=16,size=5,col='red')
g
}
When I pass the parameters
r <- graphPoint(r,500)
this gives the error
Error in lapply(X = x, FUN = "[", ..., drop = drop) :
object 'point' not found
I am not that great with R. I hope it's possible, but I am missing some small point. Thanks.

This is actually an extremely subtle (and annoying...) problem in ggplot, although not a bug. The aes(...) function evaluates all symbols first in the context of the default dataset (i.e. it looks for columns with that name) and, if that fails, in the global environment. It does not move up the calling chain, as you might justifiably expect it to. So in your case the symbol point is first evaluated in the context of data2.Gurgaon. Since there is no such column, it looks for point in the global environment, but not in the context of your graphPoint(...) function. Here is a demonstration:
df <- mtcars
library(ggplot2)
graphPoint <- function(graph,point) {
g <- graph
g <- g + geom_point(aes(x=wt[point],y=mpg[point]),pch=1,size=8,col='black')
g <- g + geom_point(aes(x=wt[point],y=mpg[point]),pch=16,size=5,col='red')
g
}
ggp <- ggplot(df, aes(x=wt, y=mpg)) + geom_point()
point=10
graphPoint(ggp, 10)
This works only because I defined point in the global environment; the point variable inside the function is being ignored. You can demonstrate that by calling the function with something other than 10: you'll get the same plot.
The correct way around this is by subsetting the data=... argument, as shown in the other answer.
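The lookup order described above can be imitated in base R with eval(), which takes a data frame to search first and an enclosing environment to fall back on. This is only a sketch of the scoping rule, not ggplot2's actual implementation; lookup_demo is a made-up name, and newer ggplot2 releases (3.0.0 onward, as far as I know) capture the environment in which aes() is called, so the behaviour there may differ.

```r
# Hypothetical helper: evaluate wt[point] against mtcars with two
# different fallback environments.
lookup_demo <- function(point) {
  expr <- quote(wt[point])
  here <- environment()  # this function's own environment

  # Fallback = global environment: the argument 'point' is not visible.
  from_global <- tryCatch(
    eval(expr, mtcars, globalenv()),
    error = function(e) conditionMessage(e)
  )

  # Fallback = this function's environment: the argument 'point' is found.
  from_local <- eval(expr, mtcars, here)

  list(from_global = from_global, from_local = from_local)
}

res <- lookup_demo(10)
res$from_local   # same value as mtcars$wt[10]
res$from_global  # an error message, unless a usable global 'point' exists
```

In both calls the column wt is found in the data frame; only the place where point is looked up differs.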

You cannot select a subset of the data within the aesthetics part of a ggplot call, as you are trying to do. However, you can achieve this by extracting the original data from the ggplot object, subsetting it, and using the subset in the rest of the function.
r <- ggplot(data=mtcars,aes(x=cyl,y=drat))+geom_point()
graphPoint <- function(graph,point) {
g <- graph
data_subset <- g$data[point, ]
g <- g+geom_point(data = data_subset,
aes(x=cyl,y=drat),pch=1,size=8,col='black')
g <- g+ geom_point(data = data_subset,
aes(x=cyl,y=drat),pch=16,size=5,col='red')
g
}
graphPoint(r, point = 2)
PS: for future posts I would advise you to make a reproducible example using data that is generally accessible, like the mtcars data. This would make it easier to help you out.

Related

In R, how can I tell if the scales on a ggplot object are log or linear?

I have many ggplot objects where I wish to print some text (varies from plot to plot) in the same relative position on each plot, regardless of scale. What I have come up with to make it simple is to:
1. Define a rescale function (call it sx) to take the relative position I want and return that position on the plot's x axis.
sx <- function(pct, range=xr){
position <- range[1] + pct*(range[2]-range[1])
}
2. Make the plot without the text (call it plt).
3. Use the ggplot_build function to find the x scale's range:
xr <- ggplot_build(plt)$layout$panel_params[[1]]$x.range
4. Then add the text to the plot:
plt <- plt + annotate("text", x=sx(0.95), ....)
This works well for me, though I'm sure there are other solutions folks have derived. I like the solution because I only need to add one step (step 3) to each plot. And it's a simple modification to the annotate command (x goes to sx(x)).
If someone has a suggestion for a better method I'd like to hear about it. There is one thing about my solution though that gives me a little trouble and I'm asking for a little help:
My problem is that I need a separate function for log scales (call it lx). It's a bit of a pain because every time I want to change the scale I need to modify the annotate commands (change sx to lx), and occasionally there are many. This could easily be solved in the sx function if there was a way to tell what the type of scale was. For instance, is there a parameter in ggplot_build objects that describes the log/lin nature of the scale? That seems to be the best place to find it (that's where I'm pulling the scale's range), but I've looked and cannot figure it out. If there was, then I could add a command to step 3 above to define the scale type, and add a tag to the sx function in step 1. That would save me some tedious work.
So, just to reiterate: does anyone know how to tell the scaling (type of scale: log or linear) of a ggplot object? such as using the ggplot_build command's object?
Suppose we have a list of pre-built plots:
linear <- ggplot(iris, aes(Sepal.Width, Sepal.Length, colour = Species)) +
geom_point()
log <- linear + scale_y_log10()
linear <- ggplot_build(linear)
log <- ggplot_build(log)
plotlist <- list(a = linear, b = log)
We can grab information about their position scales in the following way:
out <- lapply(names(plotlist), function(i) {
# Grab plot, panel parameters and scales
plot <- plotlist[[i]]
params <- plot$layout$panel_params[[1]]
scales <- plot$plot$scales$scales
# Only keep (continuous) position scales
keep <- vapply(scales, function(x) {
inherits(x, "ScaleContinuousPosition")
}, logical(1))
scales <- scales[keep]
# Grab relevant transformations
out <- lapply(scales, function(scale) {
data.frame(position = scale$aesthetics[1],
# And now for the actual question:
transformation = scale$trans$name,
plot = i)
})
out <- do.call(rbind, out)
# Grab relevant ranges
ranges <- params[paste0(out$position, ".range")]
out$min <- sapply(ranges, `[`, 1)
out$max <- sapply(ranges, `[`, 2)
out
})
out <- do.call(rbind, out)
Which will give us:
out
  position transformation plot       min      max
1        x       identity    a 1.8800000 4.520000
2        y       identity    a 4.1200000 8.080000
3        y         log-10    b 0.6202605 0.910835
4        x       identity    b 1.8800000 4.520000
Or if you prefer a straightforward answer:
log$plot$scales$scales[[1]]$trans$name
[1] "log-10"
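Once the transformation name is known, the separate sx and lx helpers from the question could be folded into one. A minimal sketch, assuming the panel range reported by ggplot_build is in transformed units (which the log-10 row of the output above suggests) and handling only the "identity" and "log-10" names; rel_pos and its arguments are hypothetical names:

```r
# Hypothetical helper: map a relative position (0..1) to an axis position
# in data units, given the panel range and the scale's transformation name.
rel_pos <- function(pct, range, trans_name = "identity") {
  # Interpolate inside the panel range (transformed units).
  pos <- range[1] + pct * (range[2] - range[1])
  switch(trans_name,
    "identity" = pos,
    "log-10"   = 10^pos,  # back-transform to data units
    stop("unsupported transformation: ", trans_name)
  )
}

rel_pos(0.95, c(1.88, 4.52))                      # linear x axis
rel_pos(0.95, c(0.6202605, 0.910835), "log-10")   # log10 y axis
```

The annotate calls then stay the same when the scale changes; only the transformation name passed in differs.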

Save plots as R objects and displaying in grid

In the following reproducible example I try to create a function that makes a ggplot distribution plot and saves it as an R object, with the intention of displaying two plots in a grid.
ggplothist<- function(dat,var1)
{
if (is.character(var1)) {
var1 <- which(names(dat) == var1)
}
distribution <- ggplot(data=dat, aes(dat[,var1]))
distribution <- distribution + geom_histogram(aes(y=..density..),binwidth=0.1,colour="black", fill="white")
output<-list(distribution,var1,dat)
return(output)
}
Call to function:
set.seed(100)
df <- data.frame(x = rnorm(100, mean=10),y =rep(1,100))
output1 <- ggplothist(dat=df,var1='x')
output1[1]
All fine until now.
Then I want to make a second plot (note mean=100 instead of the previous 10):
df2 <- data.frame(x = rep(1,1000),y = rnorm(1000, mean=100))
output2 <- ggplothist(dat=df2,var1='y')
output2[1]
Then I try to replot the first distribution, with mean 10:
output1[1]
I get the same distribution as before?
If, however, I use the information contained inside the function, return it and reset it as global variables, it works.
var1=as.numeric(output1[2]);dat=as.data.frame(output1[3]);p1 <- output1[1]
p1
If anyone can explain why this happens I would like to know. It seems that in order to draw the intended distribution I have to reset the data.frame and variable to what was used to draw the plot. Is there a way to save the plot as an object without having to do this? Luckily I can replot the first distribution,
but I can't plot them both at the same time:
var1=as.numeric(output2[2]);dat=as.data.frame(output2[3]);p2 <- output2[1]
grid.arrange(p1,p2)
ERROR: Error in gList(list(list(data = list(x = c(9.66707664902549, 11.3631137069225, :
only 'grobs' allowed in "gList"
The answer to "Grid of multiple ggplot2 plots which have been made in a for loop" suggests using a list to contain the plots:
ggplothist<- function(dat,var1)
{
if (is.character(var1)) {
var1 <- which(names(dat) == var1)
}
distribution <- ggplot(data=dat, aes(dat[,var1]))
distribution <- distribution + geom_histogram(aes(y=..density..),binwidth=0.1,colour="black", fill="white")
plot(distribution)
pltlist <- list()
pltlist[["plot"]] <- distribution
output<-list(pltlist,var1,dat)
return(output)
}
output1 <- ggplothist(dat=df,var1='x')
p1<-output1[1]
output2 <- ggplothist(dat=df2,var1='y')
p2<-output2[1]
output1[1]
Will produce the distribution with mean=100 again instead of mean=10
and:
grid.arrange(p1,p2)
will produce the same Error
Error in gList(list(list(plot = list(data = list(x = c(9.66707664902549, :
only 'grobs' allowed in "gList"
As a last attempt I tried to use recordPlot() to record everything about the plot into an object. The following is now inside the function.
ggplothist<- function(dat,var1)
{
if (is.character(var1)) {
var1 <- which(names(dat) == var1)
}
distribution <- ggplot(data=dat, aes(dat[,var1]))
distribution <- distribution + geom_histogram(aes(y=..density..),binwidth=0.1,colour="black", fill="white")
plot(distribution)
distribution<-recordPlot()
output<-list(distribution,var1,dat)
return(output)
}
This function produces the same errors as before: it depends on resetting the dat and var1 variables to what is needed for drawing the distribution, and similarly the result can't be put inside a grid.
I've tried similar things like arrangeGrob() from the question "R saving multiple ggplot2 plots as R-object in list and re-displaying in grid", but with no luck.
I would really like a solution that creates an R object containing the plot, that can be redrawn by itself and can be used inside a grid without having to reset the variables used to draw the plot each time it is done. I would also like to understand why this is happening, as I don't consider it intuitive at all.
The only solution I can think of is to draw the plot as a png file, save it somewhere and then have the function return the path so that it can be reused. Is that what other people are doing?
Thanks for reading, and sorry for the long question.
Found a solution in the question "How can I reference the local environment within a function, in R?".
Inserting
localenv <- environment()
and referencing that in the ggplot call
distribution <- ggplot(data=dat, aes(dat[,var1]),environment = localenv)
made it all work, even with grid.arrange!
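Separately from the environment issue, the "only 'grobs' allowed" error is consistent with passing output1[1] to grid.arrange: single brackets return a one-element list rather than the plot itself. A small base-R sketch of the difference, using a stand-in object in place of a real ggplot (make_output and the fake_plot class are hypothetical names):

```r
# Hypothetical stand-in for ggplothist(): returns a list whose first
# element is a "plot-like" object with its own class.
make_output <- function(dat, var1) {
  fake_plot <- structure(list(data = dat), class = "fake_plot")
  list(fake_plot, var1, dat)
}

out <- make_output(data.frame(x = 1:3), "x")

wrapped  <- out[1]    # single brackets: still a list, of length 1
plot_obj <- out[[1]]  # double brackets: the stored object itself

class(wrapped)   # "list"
class(plot_obj)  # "fake_plot"
```

With real plots this would suggest extracting with output1[[1]] and output2[[1]] before handing the results to grid.arrange.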

Inverse of ggplotGrob?

I have a function which manipulates a ggplot object, by converting it to a grob and then modifying the layers. I would like the function to return a ggplot object not a grob. Is there a simple way to convert a grob back to gg?
The documentation on ggplotGrob is awfully sparse.
Simple example:
P <- ggplot(iris) + geom_bar(aes(x=Species, y=Petal.Width), stat="identity")
G <- ggplotGrob(P)
... some manipulation to G ...
## DESIRED:
P2 <- inverse_of_ggplotGrob(G)
such that, we can continue to use basic ggplot syntax, ie
P2 + ylab ("The Width of the Petal")
UPDATE:
To answer the question in the comment, the motivation here is to modify the colors of facet labels programmatically, based on the value of label name in each facet. The functions below work nicely (based on input from baptise in a previous question).
I would like for the return value from colorByGroup to be a ggplot object, not simply a grob.
Here is the code, for those interested
library(ggplot2)    # ggplotGrob()
library(data.table) # data.table() and :=
get_grob_strips <- function(G, strips=grep(pattern="strip.*", G$layout$name)) {
if (inherits(G, "gg"))
G <- ggplotGrob(G)
if (!inherits(G, "gtable"))
stop ("G must be a gtable object or a gg object")
strip.type <- G$layout[strips, "name"]
## I know this works for a simple
strip.nms <- sapply(strips, function(i) {
attributes(G$grobs[[i]]$width$arg1)$data[[1]][["label"]]
})
data.table(grob_index=strips, type=strip.type, group=strip.nms)
}
refill <- function(strip, colour){
strip[["children"]][[1]][["gp"]][["fill"]] <- colour
return(strip)
}
colorByGroup <- function(P, colors, showWarnings=TRUE) {
## The names of colors should match to the groups in facet
G <- ggplotGrob(P)
DT.strips <- get_grob_strips(G)
groups <- names(colors)
if (is.null(groups) || !is.character(groups)) {
groups <- unique(DT.strips$group)
if (length(colors) < length(groups))
stop ("not enough colors specified")
colors <- colors[seq(groups)]
names(colors) <- groups
}
## 'groups' should match the 'group' in DT.strips, which came from the facet_name
matched_groups <- intersect(groups, DT.strips$group)
if (!length(matched_groups))
stop ("no groups match")
if (showWarnings) {
if (length(wh <- setdiff(groups, DT.strips$group)))
warning ("values in 'groups' but not a facet label: \n", paste(wh, collapse=", "))
if (length(wh <- setdiff(DT.strips$group, groups)))
warning ("values in facet label but not in 'groups': \n", paste(wh, collapse=", "))
}
## identify the indices to the grob and the appropriate color
DT.strips[, color := colors[group]]
inds <- DT.strips[!is.na(color), grob_index]
cols <- DT.strips[!is.na(color), color]
## Fill in the appropriate colors, using refill()
G$grobs[inds] <- mapply(refill, strip = G$grobs[inds], colour = cols, SIMPLIFY = FALSE)
G
}
I would say no. ggplotGrob is a one-way street. grob objects are drawing primitives defined by grid. You can create arbitrary grobs from scratch. There's no general way to turn a random collection of grobs back into a function that would generate them (it's not invertible because it's not 1:1). Once you go grob, you never go back.
You could wrap a ggplot object in a custom class and overload the plot/print commands to do some custom grob manipulation, but that's probably even more hack-ish.
You can try the following:
p = ggplotify::as.ggplot(g)
For more info, see https://cran.r-project.org/web/packages/ggplotify/vignettes/ggplotify.html
It involves a little bit of a cheat, annotation_custom(as.grob(plot),...), so it may not work in all circumstances: https://github.com/GuangchuangYu/ggplotify/blob/master/R/as-ggplot.R
Have a look at the ggpubr package: it has a function as_ggplot(). If your grob is not too complex it might be a solution!
I would also advise having a look at the patchwork package, which combines ggplots nicely. It is likely not what you are looking for, but have a look.

Save heatmap.2 in variable and plot again

I use heatmap.2 from gplots to make a heatmap:
library(gplots)
# some fake data
m = matrix(c(0,1,2,3), nrow=2, ncol=2)
# make heatmap
hm = heatmap.2(m)
When I do 'heatmap.2' directly I get a plot that I can output to a device. How can I make the plot again from my variable 'hm'? Obviously this is a toy example, in real life I have a function that generates and returns a heatmap which I would like to plot later.
There are several alternatives, although none of them are particularly elegant. It depends on if the variables used by your function are available in the plotting environment. heatmap.2 doesn't return a proper "heatmap" object, although it contains the necessary information for plotting the graphics again. See str(hm) to inspect the object.
If the variables are available in your environment, you could just re-evaluate the original plotting call:
library(gplots)
# some fake data (adjusted a bit)
set.seed(1)
m = matrix(rnorm(100), nrow=10, ncol=10)
# make heatmap
hm = heatmap.2(m, col=rainbow(4))
# Below fails if all variables are not available in the global environment
eval(hm$call)
I assume this won't be the case though, as you mentioned that you are calling the plot command from inside a function and I think you're not using any global variables. You could just re-construct the heatmap drawing call from the fields available in your hm-object. The problem is that the original matrix is not available, but instead we have a re-organized $carpet-field. It requires some tinkering to obtain the original matrix, as the projection has been:
# hm2$carpet = t(m[hm2$rowInd, hm2$colInd])
At least in the case when the data matrix has not been scaled, the below should work. Add extra parameters according to your specific plotting call.
func <- function(mat){
h <- heatmap.2(mat, col=rainbow(4))
h
}
# eval(hm2$call) does not work, 'mat' is not available
hm2 <- func(m)
# here hm2$carpet = t(m[hm2$rowInd, hm2$colInd])
# Finding the projection back can be a bit cumbersome:
revRowInd <- match(c(1:length(hm2$rowInd)), hm2$rowInd)
revColInd <- match(c(1:length(hm2$colInd)), hm2$colInd)
heatmap.2(t(hm2$carpet)[revRowInd, revColInd], Rowv=hm2$rowDendrogram, Colv=hm2$colDendrogram, col=hm2$col)
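The match() calls above compute the inverse of each permutation. A small check on made-up indices, independent of heatmap.2 (the index vectors here are invented for illustration, not taken from a real heatmap object):

```r
set.seed(1)
m <- matrix(rnorm(12), nrow = 3)

# Invented row/column orderings, standing in for hm2$rowInd and hm2$colInd.
rowInd <- c(3, 1, 2)
colInd <- c(2, 4, 1, 3)

# Same relation as in the comment above: carpet = t(m[rowInd, colInd])
carpet <- t(m[rowInd, colInd])

# Inverse permutations: the position of each original index in the ordering.
revRowInd <- match(seq_along(rowInd), rowInd)
revColInd <- match(seq_along(colInd), colInd)

restored <- t(carpet)[revRowInd, revColInd]
identical(restored, m)  # TRUE
```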
Furthermore, I think you may be able to work your way to evaluating hm$call in the function's environment. Perhaps the with() function would be useful.
You could also make mat available by attaching it to the global environment, but I think this is considered bad practice, as too eager use of attach can result in problems. Notice that in my example every call to func creates the original plot.
I would do some functional programming:
create_heatmap <- function(...) {
plot_heatmap <- function() heatmap.2(...)
}
data = matrix(rnorm(100), nrow = 10)
show_heatmap <- create_heatmap(x = data)
show_heatmap()
Pass all of the arguments you need to send to heatmap.2 through the ... argument of create_heatmap. The outer function call sets up an environment in which the inner function looks first for its arguments. The inner function is returned as an object and is now completely portable. This should produce the exact same plot each time!
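One caveat with forwarding ... directly is that R evaluates arguments lazily, so they are only computed when the inner function first runs. A variation that captures the arguments at creation time is sketched below; make_replayer and fun are hypothetical names, and sum stands in for heatmap.2 so the example runs without gplots:

```r
# Hypothetical variant of create_heatmap(): store the evaluated arguments
# and replay the same call on demand.
make_replayer <- function(fun, ...) {
  args <- list(...)               # evaluates the arguments now
  function() do.call(fun, args)   # replays the same call every time
}

x <- c(1, 2, 3)
replay <- make_replayer(sum, x)

x <- c(100, 200)  # later changes to x do not affect the stored call
replay()          # 6
```

With gplots loaded, something like make_replayer(heatmap.2, m, col = rainbow(4)) would be the analogous call.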

How to draw lines on a plot in R?

I need to draw lines from the data stored in a text file.
So far I am only able to draw points on a graph, and I would like to have them as lines (line graph).
Here's the code:
pupil_data <- read.table("C:/a1t_left_test.dat", header=T, sep="\t")
max_y <- max(pupil_data$PupilLeft)
plot(NA,NA,xlim=c(0,length(pupil_data$PupilLeft)), ylim=c(2,max_y));
for (i in 1:(length(pupil_data$PupilLeft) - 1))
{
points(i, y = pupil_data$PupilLeft[i], type = "o", col = "red", cex = 0.5, lwd = 2.0)
}
Please help me change this line of code:
points(i, y = pupil_data$PupilLeft[i], type = "o", col = "red")
to draw lines from the data.
Here is the data in the file:
PupilLeft
3.553479
3.539469
3.527239
3.613131
3.649437
3.632779
3.614373
3.605981
3.595985
3.630766
3.590724
3.626535
3.62386
3.619688
3.595711
3.627841
3.623596
3.650569
3.64876
By default, R will plot a single vector as the y coordinates, and use a sequence for the x coordinates. So to make the plot you are after, all you need is:
plot(pupil_data$PupilLeft, type = "o")
You haven't provided any example data, but you can see this with the built-in iris data set:
plot(iris[,1], type = "o")
This does in fact plot the points as lines. If you are actually getting points without lines, you'll need to provide a working example with your data to figure out why.
EDIT:
Your original code doesn't work because of the loop. You are in effect asking R to plot a line connecting a single point to itself each time through the loop. The next time through the loop R doesn't know that there are other points that you want connected; if it did, this would break the intended use of points, which is to add points/lines to an existing plot.
Of course, the line connecting a point to itself doesn't really make sense, and so it isn't plotted (or is plotted too small to see, same result).
Your example is most easily done without a loop:
PupilLeft <- c(3.553479 ,3.539469 ,3.527239 ,3.613131 ,3.649437 ,3.632779 ,3.614373
,3.605981 ,3.595985 ,3.630766 ,3.590724 ,3.626535 ,3.62386 ,3.619688
,3.595711 ,3.627841 ,3.623596 ,3.650569 ,3.64876)
plot(PupilLeft, type = 'o')
If you really do need to use a loop, then the coding becomes more involved. One approach would be to use a closure:
makeaddpoint <- function(firstpoint){
## firstpoint is the y value of the first point in the series
lastpt <- firstpoint
lastptind <- 1
addpoint <- function(nextpt, ...){
pts <- rbind(c(lastptind, lastpt), c(lastptind + 1, nextpt))
points(pts, ... )
lastpt <<- nextpt
lastptind <<- lastptind + 1
}
return(addpoint)
}
myaddpoint <- makeaddpoint(PupilLeft[1])
plot(NA,NA,xlim=c(0,length(PupilLeft)), ylim=c(2,max(PupilLeft)))
for (i in 2:(length(PupilLeft)))
{
myaddpoint(PupilLeft[i], type = "o")
}
You can then wrap the myaddpoint call in the for loop with whatever testing you need to decide whether or not you will actually plot that point. The function returned by makeaddpoint will keep track of the plot indexing for you.
This is normal programming for Lisp-like languages. If you find it confusing you can do this without a closure, but you'll need to handle incrementing the index and storing the previous point value 'manually' in your loop.
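A sketch of the closure-free variant mentioned above, where the loop itself remembers the previous point; it uses the first few values from the question and lines() to join each pair of consecutive points (prev_i and prev_y are names chosen for illustration):

```r
PupilLeft <- c(3.553479, 3.539469, 3.527239, 3.613131, 3.649437)
n <- length(PupilLeft)

# Empty plot sized to the data.
plot(NA, NA, xlim = c(1, n), ylim = range(PupilLeft),
     xlab = "Index", ylab = "PupilLeft")

prev_i <- 1             # index of the last point drawn
prev_y <- PupilLeft[1]  # value of the last point drawn

for (i in 2:n) {
  # A test deciding whether to draw point i could wrap the next three lines.
  lines(c(prev_i, i), c(prev_y, PupilLeft[i]), type = "o", col = "red")
  prev_i <- i
  prev_y <- PupilLeft[i]
}
```

Because the previous index is stored explicitly, skipping a point simply joins the next drawn point to the last one that was plotted.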
There is a strong aversion among experienced R coders to using for-loops when not really needed. This is an example of a loop-less use of a vectorized function named segments that takes 4 vectors as arguments: x0,y0, x1,y1
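npups <- length(pupil_data$PupilLeft)
# segments() adds to an existing plot, so set up an empty one first
plot(pupil_data$PupilLeft, type = "n")
segments(1:(npups-1), pupil_data$PupilLeft[-npups], # the starting points
2:npups, pupil_data$PupilLeft[-1] ) # the ending points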

Resources