Refresh plot after changing the underlying dataframe - r

I have several large R scripts in which I construct complex plots. At the end, I want to output the plots as PDF and TikZ files. It looks something like this:
mydata <- ...
p <- ggplot(mydata, ...)
p <- p + ... # many
p <- p + ... # modifications
p <- p + ... # to the plot
ggsave("plot.pdf")
ggsave("plot.tex", device=tikz)
Now, I want to change the names of the factor levels between the two calls to ggsave, since I want to include some fancy LaTeX stuff in the level names for the TikZ version:
ggsave("plot.pdf")
mydata$myfactor <- revalue(mydata$myfactor, c(small="S", medium="M"))
ggsave("plot.tex", device=tikz)
The problem here is that the change in mydata is not "propagated" to the plot. The TikZ version still uses the old level names. Is there any command to "refresh" the plot from mydata?
I'm aware of some workarounds, e.g., after renaming the factor levels, I could duplicate the whole plot construction. That works, but is inelegant. I think some kind of refresh-plot-from-data command would be most elegant, so that I don't have to repeat the plot specifications.

You haven't given a reproducible example, but I think the %+% operator (which is primarily intended for replacing the internally stored data set with a new, different one) should work to replace the internally stored data set with an updated version.
ggsave("plot.pdf",plot=p)
mydata$myfactor <- revalue(mydata$myfactor, c(small="S", medium="M"))
p <- p %+% mydata
ggsave("plot.tex", plot=p, device=tikz)
(I'm using an explicit plot= specification here for clarity.)
If that doesn't work, I would wrap your plot-construction code in a function, so that you could just call p <- build_plot(mydata) every time you needed to.
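A minimal sketch of that fallback, under some assumptions: build_plot and the toy mydata below are illustrative stand-ins (not taken from the question), and the levels are renamed with base R rather than plyr's revalue so that the snippet only needs ggplot2.

```r
library(ggplot2)

# Hypothetical data standing in for the question's `mydata`
mydata <- data.frame(
  myfactor = factor(c("small", "medium", "large"),
                    levels = c("small", "medium", "large")),
  value = c(1, 2, 3)
)

# All plot construction lives in one place (build_plot is an illustrative name)
build_plot <- function(d) {
  ggplot(d, aes(x = myfactor, y = value)) +
    geom_point()
}

# Plot built from the original level names (for the PDF)
p_pdf <- build_plot(mydata)

# Rename two levels with base R (stand-in for plyr::revalue)
levels(mydata$myfactor)[levels(mydata$myfactor) == "small"] <- "S"
levels(mydata$myfactor)[levels(mydata$myfactor) == "medium"] <- "M"

# Rebuild from the updated data (for the TikZ version)
p_tex <- build_plot(mydata)

levels(p_pdf$data$myfactor)  # still the old names
levels(p_tex$data$myfactor)  # the renamed levels
```

Each ggsave call can then be handed its own plot object through plot=, so neither output depends on which plot was displayed last.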

Related

Assigned variable is changing when object is modified - ggplot [duplicate]

I'm trying to copy a ggplot object and then change some properties of the new copied object as, for instance, the colour line to red.
Assume this code:
df = data.frame(cbind(x=1:10, y=1:10))
a = ggplot(df, aes(x=x, y=y)) + geom_line()
b = a
Then, if I change the colour of line of variable a
a$layers[[1]]$geom_params$colour = "red"
it also changes the colour of b
> b$layers[[1]]$geom_params$colour
[1] "red" # why it is not "black"?
I wish I could have two different objects a and b with different characteristics. To do this the correct way, I would need to build the plot again for b using b = ggplot(df, aes(x=x, y=y)) + geom_line(). However, at this point in the algorithm, there is no way to know the plot command ggplot(df, aes(x=x, y=y)) + geom_line().
Do you know what's wrong with this? Are ggplot objects treated in a different manner?
Thanks!
The issue here is that ggplot uses the proto library to mimic OO-style objects. The proto library relies on environments to collect variables for objects. Environments are passed by reference which is why you are seeing the behavior you are (and also a reason no one would probably recommend changing the properties of a layer that way).
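To see the reference semantics in isolation, here is a small base R sketch (no ggplot or proto involved; the names e1, e2, l1 and l2 are made up for illustration). An environment assigned to a second name is shared, while a list behaves like an independent copy:

```r
# Environments: both names refer to the same object
e1 <- new.env()
e1$colour <- "black"
e2 <- e1
e1$colour <- "red"
e2$colour   # "red", because e2 is the same environment as e1

# Lists: assignment behaves like a copy
l1 <- list(colour = "black")
l2 <- l1
l1$colour <- "red"
l2$colour   # "black", because l2 is unaffected by the change to l1
```

This is the same pattern as a and b in the question: anything stored in an environment is shared by every copy of the enclosing object.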
Anyway, adapting an example from the proto documentation, we can try to make a deep copy of the layers of the ggplot object. This should "disconnect" them. Here's such a helper function:
duplicate.ggplot<-function(x) {
require(proto)
r<-x
r$layers <- lapply(r$layers, function(x) {
as.proto(as.list(x), parent=x)
})
r
}
so if we run
df = data.frame(cbind(x=1:10, y=1:10))
a = ggplot(df, aes(x=x, y=y)) + geom_line()
b = a
c = duplicate.ggplot(a)
a$layers[[1]]$geom_params$colour = "red"
then plot all three, the result shows that we can change "c" independently from "a".
Ignoring the specifics of ggplot, there's a simple trick to make a deep copy of (almost) any object in R:
obj_copy <- unserialize(serialize(obj, NULL))
This serializes the object to a binary representation suitable for writing to disk and then reconstructs the object from that representation. It's equivalent to saving the object to a file and then loading it again (i.e. saveRDS followed by readRDS), only it never actually saves to a file. It's probably not the most efficient solution, but it should work for just about any object that can be saved to a file.
You can define a deepcopy function using this trick:
deepcopy <- function(p) {
unserialize(serialize(p, NULL))
}
This seems to successfully break the links between related ggplots.
Obviously, this will not work for objects that cannot be serialized, such as big matrices from the bigmemory package.
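As a quick check of the trick on a plain environment (a stand-in for environment-backed objects in general, not an actual ggplot), the sketch below compares an ordinary assignment with deepcopy. The variable names are illustrative.

```r
# Same helper as above, repeated so the snippet runs on its own
deepcopy <- function(p) {
  unserialize(serialize(p, NULL))
}

e <- new.env()
e$colour <- "black"

shallow <- e          # ordinary assignment: still the same environment
deep <- deepcopy(e)   # round trip through serialize(): a new environment

e$colour <- "red"

shallow$colour   # "red", follows the original
deep$colour      # "black", no longer linked to the original
```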

How can you use ggplot to superimpose many plots of related functions in an automatic way?

I have a family of functions that are all the same except for one adjustable parameter, and I want to plot all these functions superimposed on one set of axes. For instance, this could be sin(n*x), with various values of n, say 1:30, and I don't want to have to type out each command individually -- I figure there should be some way to do it programmatically.
library(ggplot2)
define trig functions as a function of frequency: sin(x), sin(2x), sin(3x) etc.
trigf <- function(i)(function(x)(sin(i*x)))
Superimpose two function plots -- this works manually of course
ggplot(data.frame(x=c(0,pi)), aes(x)) + stat_function(fun=trigf(1)) + stat_function(fun=trigf(2))
now try to generalize -- my idea was to make a list of the stat_functions using lapply
plotTrigf <- lapply(1:5, function(i)(stat_function(fun=function(x)(sin(i*x))) ))
try using the elements of the list manually but it doesn't really work -- only the i=5 plot is shown and I'm not sure why when that's not what I referenced
ggplot(data.frame(x=c(0,pi)), aes(x)) +plotTrigf[[1]] + plotTrigf[[2]]
I thought Reduce might handle the 'generalized sum' to add to a ggplot(), but it doesn't work -- it complains of a non-numeric argument to binary operator
Reduce("+", plotTrigf)
So I'm kind of stuck both in executing this strategy, or perhaps there's some other way to do this.
Are you using an R version older than 3.2? The problem is that you need to force evaluation of your i parameter inside the lapply call. Right now it is left as a promise and is not evaluated until you try to plot, at which point i has the last value it had in the lapply loop, which is 5. Use:
plotTrigf <- lapply(1:5, function(i) {force(i);stat_function(fun=function(x)(sin(i*x))) })
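The promise behaviour can be shown without ggplot or lapply. The following base R sketch uses made-up names (make_sin, make_sin_forced, n): the argument i is only looked up when the returned function is first called, unless force(i) pins it down earlier.

```r
# Lazy version: i stays an unevaluated promise until the closure is first called
make_sin <- function(i) function(x) sin(i * x)

# Forced version: i is evaluated as soon as make_sin_forced() runs
make_sin_forced <- function(i) {
  force(i)
  function(x) sin(i * x)
}

n <- 1
f <- make_sin(n)
g <- make_sin_forced(n)
n <- 2   # changed after the closures were created, before they are called

f(pi / 2)   # promise evaluated now, so i is 2: sin(pi), numerically about 0
g(pi / 2)   # i was fixed at 1: sin(pi / 2), which is 1
```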
You can't just add stat_function calls together; even without Reduce() you get the error
stat_function(fun=sin) + stat_function(fun=cos)
# Error in stat_function(fun = sin) + stat_function(fun = cos) :
# non-numeric argument to binary operator
You need to add them to a ggplot object. You can do this with Reduce() if you just specify the init= parameter
Reduce("+", plotTrigf, ggplot(data.frame(x=c(0,pi)), aes(x)))
And actually the special + operator for ggplot objects allows you to add a list of objects so you don't even need the Reduce at all (see code for ggplot2:::add_ggplot)
ggplot(data.frame(x=c(0,pi)), aes(x)) + plotTrigf
The final result is a single plot with all five curves superimposed.
You need to use force to make sure the parameter is evaluated at the right time. It's a very useful technique, and lazy evaluation is a common source of confusion in loops; you should read about it in Hadley's book: http://adv-r.had.co.nz/Functions.html
To solve your question: you just need to add force(i) when defining all the plots, inside the lapply function, before making the call to stat_function. Then you can use Reduce or any other method to combine them. Here's a way to combine the plots using lapply (note that I'm using the <<- operator which is discouraged)
p <- ggplot(data.frame(x=c(0,pi)), aes(x))
lapply(plotTrigf, function(x) {
p <<- p + x
return()
})

Misplaced points in ggplot

I'm reading in a file like so:
genes<-read.table("goi.txt",header=TRUE, row.names=1)
control<-log2(1+(genes[,1]))
experiment<-log2(1+(genes[,2]))
And plotting them as a simple scatter in ggplot:
ggplot(genes, aes(control, experiment)) +
xlim(0, 20) +
ylim(0, 20) +
geom_text(aes(control, experiment, label=row.names(genes)),size=3)
However the points are incorrectly placed on my plot (see attached image)
This is my data:
control expt
gfi1 0.189634 3.16574
Ripply3 13.752000 34.40630
atonal 2.527670 4.97132
sox2 16.584300 42.73240
tbx15 0.878446 3.13560
hes8 0.830370 8.17272
Tlx1 1.349330 7.33417
pou4f1 3.763400 9.44845
pou3f2 0.444326 2.92796
neurog1 13.943800 24.83100
sox3 17.275700 26.49240
isl2 3.841100 10.08640
As you can see, 'Ripply3' is clearly in the wrong position on the graph!
Am I doing something really stupid?
The aes() function used by ggplot looks first inside the data frame you provide via data = genes. This is why you can (and should) specify variable only by bare column names like control; ggplot will automatically know where to find the data.
But R's scoping system is such that if nothing by that name is found in the current environment, R will look in the parent environment, and so on, up to the global environment, until it finds something by that name.
So aes(control, experiment) looks for variables named control and experiment inside the data frame genes. It finds the original, untransformed control variable, but of course there is no experiment variable in genes. So it continues up the chain of environments until it hits the global environment, where you have defined the isolated variable experiment and uses that.
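A rough base R analogy of that lookup (ggplot2's own evaluation machinery differs in detail, and the two-row genes data frame here is invented for illustration): eval() with a data frame checks the columns first and only then falls back to the enclosing environment.

```r
# Toy stand-in for the question's data: columns are `control` and `expt`
genes <- data.frame(control = c(1, 3), expt = c(7, 15))

# Free-standing variables, as in the question
control <- log2(1 + genes$control)      # 1 2
experiment <- log2(1 + genes$expt)      # 3 4

env <- environment()

# `control` is a column of genes, so the untransformed column wins
x_used <- eval(quote(control), genes, env)

# `experiment` is not a column, so the free-standing (transformed) variable is used
y_used <- eval(quote(experiment), genes, env)

x_used   # 1 3  (raw values)
y_used   # 3 4  (log2-transformed values)
```

The plot therefore ends up mixing raw x values with transformed y values, which is why the labels appear misplaced.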
You meant to do something more like this:
genes$controlLog <- log2(1+(genes[,1]))
genes$exptLog <- log2(1+(genes[,2]))
followed by:
ggplot(genes, aes(controlLog, exptLog)) +
xlim(0, 20) +
ylim(0, 20) +
geom_text(aes(controlLog, exptLog, label=row.names(genes)),size=3)

ggplot2 : printing multiple plots in one page with a loop

I have several subjects for which I need to generate a plot. As I have many subjects, I'd like to have several plots on one page rather than one figure per subject.
Here is what I have done so far:
Read txt file with subjects name
subjs <- scan ("ListSubjs.txt", what = "")
Create a list to hold plot objects
pltList <- list()
for(s in 1:length(subjs))
{
setwd(file.path("C:/Users/", subjs[[s]])) #load subj directory
ifile=paste("Co","data.txt",sep="",collapse=NULL) #Read subj file
dat = read.table(ifile)
dat <- unlist(dat, use.names = FALSE) #make dat usable for ggplot2
df <- data.frame(dat)
pltList[[s]]<- print(ggplot( df, aes(x=dat)) + #save each plot with unique name
geom_histogram(binwidth=.01, colour="cyan", fill="cyan") +
geom_vline(aes(xintercept=0), # Ignore NA values for mean
color="red", linetype="dashed", size=1)+
xlab(paste("Co_data", subjs[[s]] , sep=" ",collapse=NULL)))
}
At this point I can display the single plots for example by
print (pltList[1]) #will print first plot
print(pltList[2]) # will print second plot
I'd like a solution in which several plots are displayed on the same page. I've tried something along the lines of previous posts, but I can't manage to make it work,
for example:
for (p in seq(length(pltList))) {
do.call("grid.arrange", pltList[[p]])
}
gives me the following error
Error in arrangeGrob(..., as.table = as.table, clip = clip, main = main, :
input must be grobs!
I can use more basic graphing features, but I'd like to achieve this using ggplot. Many thanks for your consideration
Matilde
Your error comes from indexing a list with [[:
consider
pl = list(qplot(1,1), qplot(2,2))
pl[[1]] returns the first plot, but do.call expects a list of arguments. You could do it with, do.call(grid.arrange, pl[1]) (no error), but that's probably not what you want (it arranges one plot on the page, there's little point in doing that). Presumably you wanted all plots,
grid.arrange(grobs = pl)
or, equivalently,
do.call(grid.arrange, pl)
If you want a selection of this list, use [,
grid.arrange(grobs = pl[1:2])
do.call(grid.arrange, pl[1:2])
Further parameters can be passed trivially with the first syntax; with do.call care must be taken to make sure the list is in the correct form,
grid.arrange(grobs = pl[1:2], ncol=3, top=textGrob("title"))
do.call(grid.arrange, c(pl[1:2], list(ncol=3, top=textGrob("title"))))
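The difference between [ and [[, and the shape of the list that do.call expects, can be checked in base R without any plotting. In this sketch count_args is a made-up helper standing in for grid.arrange, and the list holds numbers instead of plots:

```r
pl <- list(a = 1, b = 2, c = 3)

class(pl[1])     # "list": a sub-list, suitable as an argument list for do.call
class(pl[[1]])   # "numeric": the element itself

# Hypothetical stand-in for grid.arrange: reports how many arguments it received
count_args <- function(...) length(list(...))

n_all <- do.call(count_args, pl)          # every element becomes one argument
n_sub <- do.call(count_args, pl[1:2])     # only the selected elements

# Extra named parameters must be appended to the same list
n_extra <- do.call(count_args, c(pl[1:2], list(ncol = 3)))

c(n_all, n_sub, n_extra)   # 3 2 3
```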
library(gridExtra) # for grid.arrange
library(grid)
grid.arrange(pltList[[1]], pltList[[2]], pltList[[3]], pltList[[4]], ncol = 2, main = "Whatever") # say you have 4 plots
OR,
do.call(grid.arrange,pltList)
I wish I had enough reputation to comment instead of answer, but anyway, you can use the following solution to get it to work.
I would do exactly what you did to get the pltList, then use the multiplot function from this recipe. Note that you will need to specify the number of columns. For example, if you want to plot all plots in the list into two columns, you can do this:
print(multiplot(plotlist=pltList, cols=2))

Assigning "beanplot" object to variable in R

I have found that the beanplot is the best way to represent my data. I want to look at multiple beanplots together to visualize my data. Each of my plots contains 3 variables, so each one looks something like what would be generated by this code:
library(beanplot)
a <- rnorm(100)
b <- rnorm(100)
c <- rnorm(100)
beanplot(a, b ,c ,ylim = c(-4, 4), main = "Beanplot",
col = c("#CAB2D6", "#33A02C", "#B2DF8A"), border = "#CAB2D6")
(Would have just included an image but my reputation score is not high enough, sorry)
I have 421 of these that I want to put into one long PDF (EDIT: One plot per page is fine, this was just poor wording on my part). The approach I have taken was to first generate the beanplots in a for loop and store them in a list at each iteration. Then I will use the multiplot function (from the R Cookbook page on multiplot) to display all of my plots on one long column so I can begin my analysis.
The problem is that the beanplot function does not appear to be set up to assign plot objects as a variable. Example:
library(beanplot)
a <- rnorm(100)
b <- rnorm(100)
plot1 <- beanplot(a, b, ylim = c(-5,5), main = "Beanplot",
col = c("#CAB2D6", "#33A02C", "#B2DF8A"), border = "#CAB2D6")
plot1
If you then type plot1 into the R console, you will get back two of the plot parameters but not the plot itself. This means that when I store the plots in the list, I am unable to graph them with multiplot. It will simply return the plot parameters and a blank plot.
This behavior does not seem to be the case with qplot for example which will return a plot when you recall the stored plot. Example:
library(ggplot2)
a <- rnorm(100)
b <- rnorm(100)
plot2 <- qplot(a,b)
plot2
There is no equivalent to the beanplot that I know of in ggplot. Is there some sort of workaround I can use for this issue?
Thank you.
You can simply open a PDF device with pdf() and keep the default parameter onefile=TRUE. Then call all your beanplot()s, one after the other. They will all be in one PDF document, each one on a separate page. See here.
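A minimal sketch of that approach, using base boxplot() as a stand-in so it runs without the beanplot package (swap in your beanplot() call), three pages instead of 421, and a temporary file as the output path:

```r
# Output path is a temporary file here; use your own file name in practice
out <- tempfile(fileext = ".pdf")

# onefile = TRUE (the default) puts every new plot on its own page of one PDF
pdf(out, onefile = TRUE)

for (k in 1:3) {
  a <- rnorm(100)
  b <- rnorm(100)
  c <- rnorm(100)
  # Stand-in for beanplot(a, b, c, ...): each high-level plot call starts a new page
  boxplot(list(a = a, b = b, c = c), ylim = c(-4, 4),
          main = paste("Plot", k))
}

# Close the device so the file is actually written
invisible(dev.off())

file.exists(out)
```

Because the plots are drawn straight to the device, nothing needs to be stored in a list or passed to multiplot.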
