ggSave group_by df list of ggarrange'd ggplot objects - r

I've used group_by, do, and ggplot - twice - to create two simple dfs of Date (the group) and a list of the ggplot outputs, thanks hugely to help from examples on this site. Simplified example:
p1 <- df_i %>% group_by(Date) %>% do(
plots = ggplot(data = .) +
geom_line() #etc, hugely long and detailed ggplot call omitted for brevity, but it works fine
) # close do
I can then join those dfs,
p1 <- cbind(p1, p2[,2])
names(p1) <- c("Date", "Temp", "Light") #Temp & Light were both "plots" from above
And loop through the rows, saving the outputs in a 1-row (top & bottom object) ggarranged png:
for (j in 1:nrow(p1)) {
ggsave(file = paste0(p1$Date[j], ".png"),
plot = arrangeGrob(p1$Temp[[j]], p1$Light[[j]]),
device="png",scale=1.75,width=6.32,height=4,units="in",dpi=300,limitsize=TRUE)
}
So far, so good. But nature abhors a for-loop, so I was trying to do the ggsaving in a group_by, using the same ggsave parameter options, changing only what's needed given the difference in for-loop indexing vs (what I understand of) group_by subsetting:
p1 %>% group_by(Date) %>%
ggsave(file = paste0(.$Date, ".png"),
plot = arrangeGrob(Temp, Light),...) #other params hidden here for brevity
Error in grDevices::png(..., res = dpi, units = "in"): invalid
'pointsize' argument
If I add pointsize=10 it says "invalid bg value"; add bg = "white":
Error in check.options(new, name.opt = ".X11.Options", envir =
.X11env) : invalid arguments in 'grDevices::png(..., res = dpi,
units = "in")' (need named args)
(I also tried lowering dpi to no effect). Possibly I'm going about this the wrong way, e.g. swapping %>% for %$% in Vlad's suggestion from magrittr:
Error in gList(list(list(data = list(DateTimeUTCmin5 = c(915213660, 915213780, :
only 'grobs' allowed in "gList"
This gives the same error with Date and .$Date in the ggsave call. Trying to recreate the do framework:
p1 %>% group_by(Date) %>%
do(ggsave(file = paste0(.$Date, ".png"),"_", .$Date, ".png"),
plot = arrangeGrob(Temp, Light), #etc
Error in arrangeGrob(Temp, Light) : object 'Temp' not found
p1 %>% group_by(Date) %>%
do(ggsave(file = paste0(.$Date, ".png"),"_", .$Date, ".png"),
plot = arrangeGrob(.$Temp, .$Light), #etc
Error in gList(list(list(data = list(DateTimeUTCmin5 = c(915213660,
915213780, : only 'grobs' allowed in "gList"
Which gives the same error if I use %$%.
Does anyone have the connected stack of understanding of these tools such that they can see what I'm doing wrong here? It seems like I should be close, but I'm increasingly groping around in the dark. Any pointers very much appreciated. Thanks in advance!
Equally if folks recommend a different approach I'm interested too. It strikes me that I could use an lapply (or parSapply) instead of the for-loop on the p1 df. Do operations on grouped dfs outperform apply operations?
[Edit: desired final output: ggsave dumps 1 image (with 2 plots on it) per Date, into the specified folder. Essentially if I can get ggsave to work within the grouped_df, that should be that]
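A likely reason the piped attempts fail: %>% passes the whole grouped data frame to ggsave as its first unnamed argument, and group_by only changes how dplyr verbs behave, so ggsave never sees one group at a time. The sketch below is one possible alternative rather than a definitive fix. It assumes a hypothetical toy data frame standing in for df_i (the Time, Temp and Light columns and the tempdir() output folder are made up for illustration), splits it by Date, and saves one two-panel PNG per date:

```r
library(ggplot2)
library(gridExtra)

# Hypothetical toy data standing in for df_i (columns are assumptions)
df_i <- data.frame(
  Date  = rep(as.Date("2020-01-01") + 0:1, each = 5),
  Time  = rep(1:5, times = 2),
  Temp  = c(10, 11, 12, 11, 10, 9, 10, 12, 13, 11),
  Light = c(1, 3, 5, 3, 1, 2, 4, 6, 4, 2)
)

out_dir <- tempdir()                  # assumed output folder
by_date <- split(df_i, df_i$Date)     # one data frame per Date

# Build both plots for each date, stack them, and save one PNG per date
files <- vapply(names(by_date), function(d) {
  dat     <- by_date[[d]]
  p_temp  <- ggplot(dat, aes(Time, Temp)) + geom_line()
  p_light <- ggplot(dat, aes(Time, Light)) + geom_line()
  out     <- file.path(out_dir, paste0(d, ".png"))
  ggsave(out, plot = arrangeGrob(p_temp, p_light),
         device = "png", scale = 1.75, width = 6.32, height = 4,
         units = "in", dpi = 300)
  out
}, character(1))
```

Here files holds the path of each saved image. The same function body could be handed to lapply or a parallel apply if the for-loop itself is the concern.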

Related

Animating Histograms with plotly

I'm trying to create an animated demonstration of the Law of Large Numbers, where I want to show the histogram converging to the density as the sample size increases.
I can do this with R Shiny by putting a slider on the sample size, but when I try to set up a plotly animation using the sample size as the frame, I get an error deep in the bowels of ggplotly. Here is the sample code:
library(tidyverse)
library(plotly)
XXX <- rnorm(200)
plotdat <- bind_rows(lapply(25:200, function(i) data.frame(x=XXX[1:i],f=i)))
hplot <- ggplot(plotdat,aes(x,frame=f)) + geom_histogram(binwidth=.25)
ggplotly(hplot)
The last line returns the error: Error in -data$group : invalid argument to unary operator.
I'm not sure where it is supposed to be getting data$group (this value has been magically set for me in other invocations of ggplotly).
Skipping the initial ggplot and going straight to plotly, does this work for you?
plotdat %>%
plot_ly(x=~x,
type = 'histogram',
frame = ~f) %>%
layout(yaxis = list(range = c(0,50)))
Or, using your original syntax, we can add a position specification that seems to prevent the bug. This version looks better, with standard ggplot formatting and tweened animation.
hplot <- ggplot(plotdat, aes(x, frame = f)) +
geom_histogram(binwidth=.25, position = "identity")
ggplotly(hplot) %>%
animation_opts(frame = 100) # minimum ms per frame to control speed
(I don't know why this fixes it, but when I googled your error I saw a plotly issue on github that was solved by specifying the position, and it seems to fix the error here too. https://github.com/plotly/plotly.R/issues/1544)

Error in axis(side = side, at = at, labels = labels, ...) : invalid value specified for graphical parameter "pch"

I applied the DBSCAN algorithm to the built-in iris dataset in R, but I get an error when I try to visualise the output using plot().
Following is my code.
library(fpc)
library(dbscan)
data("iris")
head(iris,2)
data1 <- iris[,1:4]
head(data1,2)
set.seed(220)
db <- dbscan(data1,eps = 0.45,minPts = 5)
table(db$cluster,iris$Species)
plot(db,data1,main = 'DBSCAN')
Error: Error in axis(side = side, at = at, labels = labels, ...) :
invalid value specified for graphical parameter "pch"
How to rectify this error?
I have a suggestion below, but first I see two issues:
You're loading two packages, fpc and dbscan, both of which have different functions named dbscan(). This could create tricky bugs later (e.g. if you change the order in which you load the packages, different functions will be run).
It's not clear what you're trying to plot, either what the x- or y-axes should be or the type of plot. The function plot() generally takes a vector of values for the x-axis and another for the y-axis (although not always, consult ?plot), but here you're passing it a data.frame and a dbscan object, and it doesn't know how to handle it.
Here's one way of approaching it, using ggplot() to make a scatterplot, and dplyr for some convenience functions:
# load our packages
# note: only loading dbscan, not fpc, since we're not using it
library(dbscan)
library(ggplot2)
library(dplyr)
# run dbscan::dbscan() on the first four columns of iris
db <- dbscan::dbscan(iris[,1:4],eps = 0.45,minPts = 5)
# create a new data frame by binding the derived clusters to the original data
# this keeps our input and output in the same dataframe for ease of reference
data2 <- bind_cols(iris, cluster = factor(db$cluster))
# make a table to confirm it gives the same results as the original code
table(data2$cluster, data2$Species)
# using ggplot, make a point plot with "jitter" so each point is visible
# x-axis is species, y-axis is cluster, also coloured according to cluster
ggplot(data2) +
geom_point(mapping = aes(x=Species, y = cluster, colour = cluster),
position = "jitter") +
labs(title = "DBSCAN")
Here's the image it generates:
If you're looking for something else, please be more specific about what the final plot should look like.
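If a scatterplot of the measurements themselves is closer to what was intended, a minimal base-graphics sketch might look like the following. It calls dbscan::dbscan() explicitly to sidestep the name clash with fpc, and colours each point by its cluster; the colour mapping (cluster number plus one, so that noise points labelled 0 are drawn in black) is an illustrative choice, not something taken from the original code.

```r
library(dbscan)

# Cluster on the four numeric columns, naming the package explicitly
db <- dbscan::dbscan(iris[, 1:4], eps = 0.45, minPts = 5)

# Cluster 0 is noise; shift by one so every point gets a valid colour index
cluster_cols <- db$cluster + 1L

# Scatterplot matrix of all four measurements, coloured by cluster
pairs(iris[, 1:4], col = cluster_cols, pch = 19, main = "DBSCAN")
```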

Plotting tanglegrams subplots in R using dendextend

I am plotting Tanglegrams in R using dendextend. I am wondering if it is possible to plot multiple subplots using par(mfrow = c(2,2))?
I can't seem to figure it out.
Thanks
library(dendextend)
dend15 <- c(1:5) %>% dist %>% hclust(method = "average") %>% as.dendrogram
dend15 <- dend15 %>% set("labels_to_char")
dend51 <- dend15 %>% set("labels", as.character(5:1)) %>% match_order_by_labels(dend15)
dends_15_51 <- dendlist(dend15, dend51)
par(mfrow = c(2,2))
tanglegram(dends_15_51)
tanglegram(dends_15_51)
tanglegram(dends_15_51)
tanglegram(dends_15_51)
tl;dr: It is not possible to use par(mfrow=...) with the function tanglegram, but it is possible using layout.
Explanation: If you look closer at tanglegram (see methods(tanglegram)), you'll see that it has several methods. The one that actually draws the tanglegram is dendextend:::tanglegram.dendrogram, which is called from inside dendextend:::tanglegram.dendlist.
Inside this function, there is a call to layout:
layout(matrix(1:3, nrow = 1), widths = columns_width)
This "erases" your previous setting of par(mfrow=c(2, 2)) and changes it to c(1, 3) (just for the "time" of the function though because at the end of the function, the value is reset...).
Indeed, in the help page of layout, it says:
These functions are totally incompatible with the other mechanisms for arranging plots on a device: par(mfrow), par(mfcol) and split.screen.
Conclusion: If you want to plot several tanglegrams in the same "window" you'll need to use the layout call (with 12 subparts: 2 rows and 6 columns) ahead of the calls to tanglegram and suppress the layout call inside tanglegram using the argument just_one=FALSE.
Example of drawing several tanglegrams:
Using the code below, you can then obtain the desired plot (I put the function's default widths for the layout):
layout(matrix(1:12, nrow=2, byrow=TRUE), widths=rep(c(5, 3, 5), 2))
tanglegram.dendlist_mod(dends_15_51, just_one=FALSE)
tanglegram.dendlist_mod(dends_15_51, just_one=FALSE)
tanglegram.dendlist_mod(dends_15_51, just_one=FALSE)
tanglegram.dendlist_mod(dends_15_51, just_one=FALSE)
This was done by updating the dendextend package in which: I modified the 2 functions tanglegram.dendrogram and tanglegram.dendlist to add a just_one parameter, which defaults to TRUE and changed the line of the layout in tanglegram.dendrogram to:
if (just_one) layout(matrix(1:3, nrow = 1), widths = columns_width)
I also suppressed the reset of par parameters and of course changed the call in tanglegram.dendlist (now called tanglegram.dendlist_mod) so it calls the new modified function, incorporates the just_one parameter and passes it to the modified tanglegram.dendrogram function.
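To see what the 12-part layout above sets up, independently of dendextend, here is a small sketch that fills the same grid with placeholder base plots. The PDF path, page size, margins and panel contents are assumptions chosen only so the example runs on its own:

```r
# Draw to a temporary PDF so the sketch runs without an interactive device
pdf(file.path(tempdir(), "layout_demo.pdf"), width = 10, height = 6)

# Same layout as above: 2 rows x 6 columns, i.e. four groups of three panels
n_panels <- layout(matrix(1:12, nrow = 2, byrow = TRUE),
                   widths = rep(c(5, 3, 5), 2))

old_par <- par(mar = c(2, 2, 2, 1))   # small margins so narrow panels fit
for (i in seq_len(n_panels)) {
  plot(1:5, main = paste("panel", i)) # placeholder for one tanglegram part
}
par(old_par)
dev.off()
```

Each run of three consecutive panels (wide, narrow, wide) corresponds to the space one tanglegram occupies: left dendrogram, connecting lines, right dendrogram.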
Rather than creating a combined plot in a single graphical device, you could create multiple plots and arrange them when you put them in a document. The knitr package makes it easy to do this, by using fig.show = "hold" to hold on to multiple plots produced in a single R chunk and specifying a relevant out.width, e.g. 50% to have two plots in a row, for when the plots are placed in the document.
For example, in an R markdown (.Rmd) file you might have
```{r, fig.show = "hold", out.width = "50%", echo = FALSE}
suppressPackageStartupMessages(library(dendextend))
dend15 <- c(1:5) %>% dist %>% hclust(method = "average") %>% as.dendrogram
dend15 <- dend15 %>% set("labels_to_char")
dend51 <- dend15 %>% set("labels", as.character(5:1)) %>% match_order_by_labels(dend15)
dends_15_51 <- dendlist(dend15, dend51)
tanglegram(dends_15_51, margin_outer = 1)
plot.new()
tanglegram(dends_15_51, margin_outer = 1)
plot.new()
tanglegram(dends_15_51, margin_outer = 1)
plot.new()
tanglegram(dends_15_51, margin_outer = 1)
```
which when knitted to HTML, would look like the following:
There are a few modifications I made to the code:
Suppressed package startup messages from dendextend.
Increased default margin_outer to avoid overlapping x axis labels from neighbouring plots.
Added plot.new() in between calls to tanglegram, otherwise the next plot would be drawn on top of the previous one (this is a result of tanglegram using layout and is not needed in general when producing multiple plots).
The same approach can be used in .Rnw files. If you are compiling to PDF (via LaTeX) you can add a figure caption and subcaptions, see knitr demo #067 - Graphics Options for more detail.

ggplot2 : printing multiple plots in one page with a loop

I have several subjects for which I need to generate a plot. As I have many subjects, I'd like to have several plots on one page rather than one figure per subject.
Here is what I have done so far:
Read txt file with subjects name
subjs <- scan ("ListSubjs.txt", what = "")
Create a list to hold plot objects
pltList <- list()
for(s in 1:length(subjs))
{
setwd(file.path("C:/Users/", subjs[[s]])) #load subj directory
ifile=paste("Co","data.txt",sep="",collapse=NULL) #Read subj file
dat = read.table(ifile)
dat <- unlist(dat, use.names = FALSE) #make dat usable for ggplot2
df <- data.frame(dat)
pltList[[s]]<- print(ggplot( df, aes(x=dat)) + #save each plot with unique name
geom_histogram(binwidth=.01, colour="cyan", fill="cyan") +
geom_vline(aes(xintercept=0), # Ignore NA values for mean
color="red", linetype="dashed", size=1)+
xlab(paste("Co_data", subjs[[s]] , sep=" ",collapse=NULL)))
}
At this point I can display the single plots for example by
print (pltList[1]) #will print first plot
print(pltList[2]) # will print second plot
I'd like a solution by which several plots are displayed on the same page. I've tried something along the lines of previous posts, but I can't make it work.
For example:
for (p in seq(length(pltList))) {
do.call("grid.arrange", pltList[[p]])
}
gives me the following error
Error in arrangeGrob(..., as.table = as.table, clip = clip, main = main, :
input must be grobs!
I can use more basic graphing features, but I'd like to achieve this using ggplot. Many thanks for your consideration.
Matilde
Your error comes from indexing a list with [[:
consider
pl = list(qplot(1,1), qplot(2,2))
pl[[1]] returns the first plot, but do.call expects a list of arguments. You could do it with, do.call(grid.arrange, pl[1]) (no error), but that's probably not what you want (it arranges one plot on the page, there's little point in doing that). Presumably you wanted all plots,
grid.arrange(grobs = pl)
or, equivalently,
do.call(grid.arrange, pl)
If you want a selection of this list, use [,
grid.arrange(grobs = pl[1:2])
do.call(grid.arrange, pl[1:2])
Further parameters can be passed trivially with the first syntax; with do.call care must be taken to make sure the list is in the correct form,
grid.arrange(grobs = pl[1:2], ncol=3, top=textGrob("title"))
do.call(grid.arrange, c(pl[1:2], list(ncol=3, top=textGrob("title"))))
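When there are more plots than fit comfortably on one page, gridExtra's marrangeGrob can spread a list over several pages. The sketch below is an illustration under assumptions: the six simulated histograms stand in for pltList, and the 2 x 2 grid, page size and tempdir() output file are arbitrary choices.

```r
library(ggplot2)
library(gridExtra)

# Six hypothetical histograms standing in for pltList
set.seed(42)
pl <- lapply(1:6, function(i) {
  d <- data.frame(dat = rnorm(100))
  ggplot(d, aes(x = dat)) +
    geom_histogram(binwidth = 0.25) +
    xlab(paste("Co_data subject", i))
})

# 2 x 2 grid per page, so six plots spill onto a second page
pages <- marrangeGrob(grobs = pl, nrow = 2, ncol = 2)

# Write all pages to a single multi-page PDF
out <- file.path(tempdir(), "subjects.pdf")
ggsave(out, pages, width = 8, height = 8)
```

Note that the list holds the ggplot objects themselves, created without wrapping each one in print().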
library(gridExtra) # for grid.arrange
library(grid)
grid.arrange(pltList[[1]], pltList[[2]], pltList[[3]], pltList[[4]], ncol = 2, top = "Whatever") # say you have 4 plots; gridExtra >= 2.0 uses top where older versions used main
OR,
do.call(grid.arrange,pltList)
I wish I had enough reputation to comment instead of answer, but anyway you can use the following solution to get it to work.
I would do exactly what you did to get the pltList, then use the multiplot function from this recipe. Note that you will need to specify the number of columns. For example, if you want to plot all plots in the list into two columns, you can do this:
print(multiplot(plotlist=pltList, cols=2))

Store the result of a plot() call to a variable without sending to current graphics device

This is really one of two questions - either:
1) How do I store the result of a print() call [i.e. x <- print(something) ] without sending anything to current graphics output?
-or-
2) Is there a function or method in ggplot that will store a plot() call to a variable without calling plot() directly? ggplotGrob is in the ballpark, but a ggplotGrob object doesn't return a list with $data in it the same way you get when you store the result of print() to a variable.
I'm using a technique picked up from this SO answer to pull out the points of a geom_density curve, and then using that data to generate some annotations. I've outlined the issue below -- when I call this as a function, I get the undesired intermediate plot object in my pdf, along with the final plot. The goal is to get rid of that undesired plot; given that base hist() has a plot = FALSE option I was hopeful that someone who knows something more about R viewports would be able to fix my plot() call (solution #1), but any solution is fine, frankly.
library(ggplot2)
library(plyr)
demo <- function (df) {
p <- ggplot(
df
,aes(
x = rating
)
) +
geom_density()
#plot the object so we can access $data
render_plot <- plot(p + ggtitle("Don't want this plot"))
#grab just the DF for the density line
density_df <- render_plot$data[[1]]
#get the maximum density value
max_y <- ddply(density_df, "group", summarise, y = max(y))
#join that back to the data to find the matching row
anno <- join(density_df, max_y, type = 'inner')
#use this to annotate
p <- p + annotate(
geom = 'text'
,x = anno$x
,y = anno$y
,label = round(anno$density, 3)
) +
ggtitle('Keep this plot')
return(p)
}
#call to demo outputs an undesired plot to the graphics device
ex <- demo(movies[movies$Comedy ==1,])
plot(ex)
#this is problematic if you are trying to make a PDF
#a distinct name for the pdf to avoid filesystem issues
unq_name <- as.character(format(Sys.time(), "%X"))
unq_name <- gsub(':', '', unq_name)
pdf(paste(unq_name , '.pdf', sep=''))
p <- demo(movies[movies$Drama ==1,])
print(p)
dev.off()
Use ggplot_build:
render_plot <- ggplot_build(p + ggtitle("Don't want this plot"))
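As a self-contained illustration of that suggestion, the sketch below builds the plot data with ggplot_build instead of plot(), so nothing should be sent to the graphics device before the final plot is printed. The simulated rating column is an assumption standing in for the movies data, and which.max replaces the plyr join purely for brevity.

```r
library(ggplot2)

# Simulated ratings standing in for the movies data (an assumption)
set.seed(1)
df <- data.frame(rating = rnorm(100, mean = 6))

p <- ggplot(df, aes(x = rating)) + geom_density()

# Compute the layer data without drawing anything
built      <- ggplot_build(p)
density_df <- built$data[[1]]

# Row at the peak of the density curve
peak <- density_df[which.max(density_df$y), ]

# Annotate the original plot with the peak density value
p2 <- p +
  annotate(geom = "text", x = peak$x, y = peak$y,
           label = round(peak$density, 3)) +
  ggtitle("Keep this plot")
```

Printing p2 inside a pdf() ... dev.off() pair then yields only the annotated plot.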
