Animating Histograms with plotly - r

I'm trying to create an animated demonstration of the Law of Large Numbers, where I want to show the histogram converging to the density as the sample size increases.
I can do this with R Shiny, putting a slider on the sample size, but when I try to set up a plotly animation using the sample size as the frame, I get an error deep in the bowels of ggplotly. Here is the sample code:
library(tidyverse)
library(plotly)
XXX <- rnorm(200)
plotdat <- bind_rows(lapply(25:200, function(i) data.frame(x=XXX[1:i],f=i)))
hplot <- ggplot(plotdat,aes(x,frame=f)) + geom_histogram(binwidth=.25)
ggplotly(hplot)
The last line returns the error. Error in -data$group : invalid argument to unary operator.
I'm not sure where it is supposed to be getting data$group (this value has been magically set for me in other invocations of ggplotly).

Skipping the initial ggplot and going straight to plotly, does this work for you?
plotdat %>%
plot_ly(x=~x,
type = 'histogram',
frame = ~f) %>%
layout(yaxis = list(range = c(0,50)))
Or, using your original syntax, we can add a position specification that seems to prevent the bug. This version looks better, with standard ggplot formatting and tweened animation.
hplot <- ggplot(plotdat, aes(x, frame = f)) +
geom_histogram(binwidth=.25, position = "identity")
ggplotly(hplot) %>%
animation_opts(frame = 100) # minimum ms per frame to control speed
(I don't know why this fixes it, but when I googled your error I saw a plotly issue on github that was solved by specifying the position, and it seems to fix the error here too. https://github.com/plotly/plotly.R/issues/1544)
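Separately from the plotting bug, the convergence the animation is meant to show can be checked numerically in base R. The sketch below is an illustration, not part of either answer: hist_gap() is a hypothetical helper that takes the first n values of a sample, builds a histogram on the density scale with the same 0.25 bin width, and reports the largest gap between the bar heights and the standard normal density at the bin centres.

```r
# Numeric check of the LLN demo: as n grows, the density-scale histogram of
# the first n draws should get closer to dnorm(). hist_gap() is hypothetical.
hist_gap <- function(x, n, binwidth = 0.25) {
  xs <- x[seq_len(n)]
  # breaks on a binwidth grid between integers, so they always span the sample
  breaks <- seq(floor(min(xs)), ceiling(max(xs)), by = binwidth)
  h <- hist(xs, breaks = breaks, plot = FALSE)
  # largest absolute difference between bar heights and the true density
  max(abs(h$density - dnorm(h$mids)))
}

set.seed(1)
x <- rnorm(10000)
sizes <- c(25, 200, 10000)
gaps <- sapply(sizes, function(n) hist_gap(x, n))
round(setNames(gaps, sizes), 3)
```

The gap shrinks roughly like 1/sqrt(n), so with only 200 draws (as in the question) the histogram will still look noticeably bumpy; a larger maximum sample size may make the animation more convincing.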

Related

Is there a way to create a kissing people curve using ggplot2 in R

Is it possible to create custom graphs using ggplot2? For example, I want to create a graph of kissing people.
Simple variant
I was able to reproduce it partially: everything except the "lines of the eyes", which I'm not sure how to mark.
But how can I make a more complex graph of kissing people? In general, is it possible to somehow approximate such a curve in a more voluminous form?
Thank you for your help.
Perhaps not what you are looking for, but if you already have the image and want to reproduce it in ggplot, you can use the following method:
library(tidyverse)
library(magick)
library(terra)
# read image
im <- image_read("./data/kiss_1.png")
# convert to a black/white image
im2 <- im %>%
image_quantize(
max = 2,
colorspace = "gray" )
# get a matrix of the pixel-colors
m <- as.raster(im2) %>% as.matrix()
# extract coordinates of the black pixels
df <- as.data.frame(which(m == "#000000ff", arr.ind=TRUE))
df$row <- df$row * -1
# plot the black pixels as points
ggplot(df, aes(x = col, y = row)) + geom_point()
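To see what the pixel-extraction step does without needing magick or an image file, here is a small sketch on a hand-made colour matrix standing in for as.matrix(as.raster(im2)). The helper name black_pixels() is hypothetical, and the colour string "#000000ff" is taken from the answer above; the exact string your raster uses may differ.

```r
# Sketch of the pixel-extraction step on a tiny stand-in colour matrix.
# black_pixels() is a hypothetical helper, not from any package.
black_pixels <- function(m, colour = "#000000ff") {
  # arr.ind = TRUE returns a two-column matrix with columns "row" and "col"
  idx <- which(m == colour, arr.ind = TRUE)
  df <- as.data.frame(idx)
  # raster row 1 is the top of the image, but y increases upwards in a plot,
  # so negate the row index to keep the picture the right way up
  df$row <- -df$row
  df
}

m <- matrix("#ffffffff", nrow = 3, ncol = 4)  # all-white 3 x 4 "image"
m[1, 2] <- "#000000ff"                        # black pixel in the top row
m[3, 4] <- "#000000ff"                        # black pixel in the bottom row
black_pixels(m)
```

With a real image, plotting with something like geom_point(shape = ".") plus coord_fixed() should keep the pixels small and the aspect ratio undistorted.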

Error in axis(side = side, at = at, labels = labels, ...) : invalid value specified for graphical parameter "pch"

I have applied the DBSCAN algorithm to the built-in iris dataset in R, but I get an error when I try to visualise the output using plot().
Following is my code.
library(fpc)
library(dbscan)
data("iris")
head(iris,2)
data1 <- iris[,1:4]
head(data1,2)
set.seed(220)
db <- dbscan(data1,eps = 0.45,minPts = 5)
table(db$cluster,iris$Species)
plot(db,data1,main = 'DBSCAN')
Error: Error in axis(side = side, at = at, labels = labels, ...) :
invalid value specified for graphical parameter "pch"
How to rectify this error?
I have a suggestion below, but first I see two issues:
You're loading two packages, fpc and dbscan, both of which have different functions named dbscan(). This could create tricky bugs later (e.g. if you change the order in which you load the packages, different functions will be run).
It's not clear what you're trying to plot, either what the x- or y-axes should be or the type of plot. The function plot() generally takes a vector of values for the x-axis and another for the y-axis (although not always, consult ?plot), but here you're passing it a data.frame and a dbscan object, and it doesn't know how to handle it.
Here's one way of approaching it, using ggplot() to make a scatterplot, and dplyr for some convenience functions:
# load our packages
# note: only loading dbscan, not loading fpc since we're not using it
library(dbscan)
library(ggplot2)
library(dplyr)
# run dbscan::dbscan() on the first four columns of iris
db <- dbscan::dbscan(iris[,1:4],eps = 0.45,minPts = 5)
# create a new data frame by binding the derived clusters to the original data
# this keeps our input and output in the same dataframe for ease of reference
data2 <- bind_cols(iris, cluster = factor(db$cluster))
# make a table to confirm it gives the same results as the original code
table(data2$cluster, data2$Species)
# using ggplot, make a point plot with "jitter" so each point is visible
# x-axis is species, y-axis is cluster, also coloured according to cluster
ggplot(data2) +
geom_point(mapping = aes(x=Species, y = cluster, colour = cluster),
position = "jitter") +
labs(title = "DBSCAN")
Here's the image it generates:
If you're looking for something else, please be more specific about what the final plot should look like.
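If you would rather stay with base graphics, the point made earlier about plot() applies: give it explicit x and y vectors and use the cluster labels only for colour. The sketch below is an assumption-laden illustration; plot_clusters() is a hypothetical helper, and the choice of Petal.Length and Petal.Width as axes is arbitrary.

```r
# Base-graphics sketch: explicit x/y vectors, cluster labels used for colour.
# plot_clusters() is a hypothetical helper, not part of dbscan or fpc.
plot_clusters <- function(data, cluster, x, y, main = "DBSCAN") {
  cl <- factor(cluster)   # dbscan::dbscan() labels noise points as 0
  cols <- as.integer(cl)  # one palette colour per cluster label
  plot(data[[x]], data[[y]], col = cols, pch = 19,
       xlab = x, ylab = y, main = main)
  legend("topleft", legend = levels(cl), col = seq_along(levels(cl)),
         pch = 19, title = "cluster")
  invisible(table(cl))    # cluster sizes, handy for a quick sanity check
}

# Usage, assuming the dbscan package is installed:
# db <- dbscan::dbscan(iris[, 1:4], eps = 0.45, minPts = 5)
# plot_clusters(iris, db$cluster, "Petal.Length", "Petal.Width")
```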

ggSave group_by df list of ggarrange'd ggplot objects

I've used group_by, do, and ggplot (twice) to create two simple dfs of Date (the group) and a list of the ggplot outputs, thanks hugely to help from examples on this site. Simplified example:
p1 <- df_i %>% group_by(Date) %>% do(
plots = ggplot(data = .) +
geom_line() #etc, hugely long and detailed ggplot call omitted for brevity, but it works fine
) # close do
I can then join those dfs,
p1 <- cbind(p1, p2[,2])
names(p1) <- c("Date", "Temp", "Light") #Temp & Light were both "plots" from above
And loop through the rows, saving the outputs in a 1-row (top & bottom object) ggarranged png:
for (j in 1:nrow(p1)) {
ggsave(file = paste0(p1$Date[j], ".png"),
plot = arrangeGrob(p1$Temp[[j]], p1$Light[[j]]),
device="png",scale=1.75,width=6.32,height=4,units="in",dpi=300,limitsize=TRUE)
}
So far, so good. But nature abhors a for-loop, so I was trying to do the ggsaving in a group_by, using the same ggsave parameter options, changing only what's needed given the difference in for-loop indexing vs (what I understand of) group_by subsetting:
p1 %>% group_by(Date) %>%
ggsave(file = paste0(.$Date, ".png"),
plot = arrangeGrob(Temp, Light),...) #other params hidden here for brevity
Error in grDevices::png(..., res = dpi, units = "in"): invalid 'pointsize' argument
If I add pointsize = 10, it says "invalid bg value"; if I also add bg = "white", I get:
Error in check.options(new, name.opt = ".X11.Options", envir = .X11env) : invalid arguments in 'grDevices::png(..., res = dpi, units = "in")' (need named args)
(I also tried lowering dpi, to no effect.) Possibly I'm going about this the wrong way. Swapping %>% for magrittr's %$% (Vlad's suggestion) gives:
Error in gList(list(list(data = list(DateTimeUTCmin5 = c(915213660, 915213780, : only 'grobs' allowed in "gList"
This gives the same error with Date and .$Date in the ggsave call. Trying to recreate the do framework:
p1 %>% group_by(Date) %>%
do(ggsave(file = paste0(.$Date, ".png"),
plot = arrangeGrob(Temp, Light), #etc
Error in arrangeGrob(Temp, Light) : object 'Temp' not found
p1 %>% group_by(Date) %>%
do(ggsave(file = paste0(.$Date, ".png"),
plot = arrangeGrob(.$Temp, .$Light), #etc
Error in gList(list(list(data = list(DateTimeUTCmin5 = c(915213660, 915213780, : only 'grobs' allowed in "gList"
Which gives the same error if I use %$%.
Does anyone have the connected stack of understanding of these tools such that they can see what I'm doing wrong here? It seems like I should be close, but I'm increasingly groping around in the dark. Any pointers very much appreciated. Thanks in advance!
Equally if folks recommend a different approach I'm interested too. It strikes me that I could use an lapply (or parSapply) instead of the for-loop on the p1 df. Do operations on grouped dfs outperform apply operations?
[Edit: desired final output: ggsave dumps 1 image (with 2 plots on it) per Date, into the specified folder. Essentially if I can get ggsave to work within the grouped_df, that should be that]
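One way to drop the for-loop without group_by is to walk the rows with Map(), which calls a function once per element of several vectors in parallel. As far as I know, ggsave() saves exactly one plot per call and has no notion of groups, so piping a grouped data frame into it will not make it run once per Date; my unverified guess is that the piped data frame ends up as an extra argument, which would fit the png() errors above. In the sketch below, save_per_date() and ggsave_pair() are hypothetical helpers, and the saver is passed in as a parameter so the iteration can be tested without drawing anything.

```r
# Save one combined image per row (i.e. per Date) of a data frame whose
# list-columns Temp and Light hold plots. Both helpers are hypothetical.
save_per_date <- function(df, save_fun, dir = ".") {
  files <- file.path(dir, paste0(df$Date, ".png"))
  # Map() calls save_fun(files[i], df$Temp[[i]], df$Light[[i]]) for each row
  invisible(Map(save_fun, files, df$Temp, df$Light))
  files
}

# Real saver, assuming ggplot2 and gridExtra are installed; the ggsave
# settings are copied from the question's working for-loop
ggsave_pair <- function(file, temp, light) {
  ggplot2::ggsave(file, plot = gridExtra::arrangeGrob(temp, light),
                  device = "png", scale = 1.75, width = 6.32, height = 4,
                  units = "in", dpi = 300, limitsize = TRUE)
}

# Usage: save_per_date(p1, ggsave_pair)
```

For a handful of dates this is mostly a style choice; the time is spent rendering the PNGs, not in the loop itself, so parallelising the calls is what would make a real difference.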

Retrieve axis tick information from a plotly figure

I'm plotting a heatmap using R plotly:
set.seed(1)
df <- reshape2::melt(matrix(rnorm(100*20),100,20,dimnames = list(paste0("G",1:100),paste0("S",1:20))))
library(plotly)
library(dplyr)
plot_ly(z=c(df$value),x=df$Var2,y=df$Var1,colors=grDevices::colorRamp(c("darkblue","gray","darkred")),type="heatmap",colorbar=list(title="Scaled Value",len=0.4)) %>%
layout(yaxis=list(title=NULL),xaxis=list(tickangle=90,tickvals=10,ticktext="X-Label"))
As you can see, plotly is not showing all y-axis ticks. My question is whether it is possible, and if so how, to retrieve the y-axis tick labels plotly selected to show?
I saved the plot object:
plotly.obj <- plot_ly(z=c(df$value),x=df$Var2,y=df$Var1,colors=grDevices::colorRamp(c("darkblue","gray","darkred")),type="heatmap",colorbar=list(title="Scaled Value",len=0.4)) %>%
layout(yaxis=list(title=NULL),xaxis=list(tickangle=90,tickvals=10,ticktext="X-Label"))
Looking around, it seemed that plotly.obj$x$layoutAttrs might store this information, but it doesn't:
> plotly.obj$x$layoutAttrs
$`102ce55fd393e`
$`102ce55fd393e`$yaxis
$`102ce55fd393e`$yaxis$title
NULL
$`102ce55fd393e`$xaxis
$`102ce55fd393e`$xaxis$tickangle
[1] 90
$`102ce55fd393e`$xaxis$tickvals
[1] 10
$`102ce55fd393e`$xaxis$ticktext
[1] "X-Label"
Any idea?
I don't think you can get the ticks that are finally rendered, but you can get all the levels of the y-axis that plotly can choose from:
levels(plotly.obj$x$attrs[[1]]$y)  # same as plotly.obj$x$attrs$`2c4c148651ae`$y; the trace id differs in every session
The ticks that are finally rendered are dynamically chosen and will adapt, depending on your plot size etc.
You can also check out the attributes with plotly_json():
plot_ly(z=c(df$value),x=df$Var2,y=df$Var1,colors=grDevices::colorRamp(c("darkblue","gray","darkred")),type="heatmap",colorbar=list(title="Scaled Value",len=0.4)) %>%
layout(yaxis=list(title=NULL),xaxis=list(tickangle=90,tickvals=10,ticktext="X-Label")) %>%
plotly_json()
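If the goal is simply to know which y-axis labels are shown, a common workaround (not what either answer does) is to choose them yourself and pass them to the axis with tickmode = "array", so plotly has nothing left to decide. every_nth_level() below is a hypothetical helper; which levels you keep, and whether their order matches the plotted category order, are assumptions to check against your data.

```r
# Hypothetical helper: keep every n-th level of a factor as a tick label,
# so the displayed y-axis labels are known in advance.
every_nth_level <- function(f, n) {
  lv <- levels(factor(f))
  lv[seq(1, length(lv), by = n)]
}

# Usage with the heatmap above (assumes plotly is loaded):
# shown <- every_nth_level(df$Var1, 10)
# plotly.obj %>%
#   layout(yaxis = list(title = NULL, tickmode = "array",
#                       tickvals = shown, ticktext = shown))
```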
I got the answer from a github issue I posted on ropensci/plotly:
set.seed(1)
df <- reshape2::melt(matrix(rnorm(100*20),100,20,dimnames = list(paste0("G",1:100),paste0("S",1:20))))
library(plotly)
library(dplyr)
plot_ly(z=c(df$value),x=df$Var2,y=df$Var1,colors=grDevices::colorRamp(c("darkblue","gray","darkred")),type="heatmap",colorbar=list(title="Scaled Value",len=0.4)) %>%
layout(yaxis=list(title=NULL),xaxis=list(tickangle=90,tickvals=10,ticktext="X-Label")) %>%
htmlwidgets::onRender(
"function(el, x) {
alert(el._fullLayout.yaxis._vals.map(function(i) { return i.text; }));
}"
)
This pops up a browser alert listing the tick labels.
The question now is if this can be saved/piped to an R variable or written to a file so it can be done automatically rather than interactively. That's going to be another post.

Weird ggplot2 error: Empty raster

Why does
ggplot(data.frame(x=c(1,2),y=c(1,2),z=c(1.5,1.5)),aes(x=x,y=y,color=z)) +
geom_point()
give me the error
Error in grid.Call.graphics(L_raster, x$raster, x$x, x$y, x$width, x$height, : Empty raster
but the following two plots work
ggplot(data.frame(x=c(1,2),y=c(1,2),z=c(2.5,2.5)),aes(x=x,y=y,color=z)) +
geom_point()
ggplot(data.frame(x=c(1,2),y=c(1,2),z=c(1.5,2.5)),aes(x=x,y=y,color=z)) +
geom_point()
I'm using ggplot2 0.9.3.1
TL;DR: Check your data -- do you really want to use a continuous color scale with only one possible value for the color?
The error does not occur if you add + scale_color_continuous(guide = FALSE) to the plot, which turns off the legend:
ggplot(data.frame(x=c(1,2), y=c(1,2), z=c(1.5,1.5)), aes(x=x,y=y,color=z)) +
geom_point() + scale_color_continuous(guide = FALSE)  # newer ggplot2 versions: guide = "none"
The error seems to be triggered in cases where a continuous color scale uses only one color. The current GitHub version already includes the relevant pull request. Install it via:
devtools::install_github("hadley/ggplot2")
But more probably there is an issue with the data: why would you use a continuous color scale with only one value?
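To see why a single distinct value is awkward for a continuous scale: mapping values onto a colour gradient involves rescaling them to [0, 1] by dividing by the width of their range, and with one distinct value that width is zero. The sketch below only illustrates that arithmetic and is not ggplot2's internal code. rescale01() is a hypothetical helper that, in the same spirit as scales::rescale() (which, as far as I know, maps a zero-width range to the middle of the output range), places every value mid-scale instead of producing NaN.

```r
# Naive rescaling divides by the range width, which is 0 for a constant vector
z <- c(1.5, 1.5)
naive <- (z - min(z)) / (max(z) - min(z))  # 0 / 0 gives NaN

# Hypothetical guarded version: a zero-width range maps everything to 0.5
rescale01 <- function(z) {
  rng <- range(z, na.rm = TRUE)
  if (diff(rng) == 0) return(rep(0.5, length(z)))
  (z - rng[1]) / diff(rng)
}

naive
rescale01(z)
```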
I saw the same behaviour (the "Empty raster" error) with values other than 1.5.
Try the following:
ggplot(data.frame(x=c(1,2),y=c(1,2),z=c(0.02,0.02)),aes(x=x,y=y,color=z)) +
geom_point()
And you get the same error again (I tried both the 0.9.3.1 and 1.0.0.0 versions), so it looks like a nasty and weird bug.
This definitely sounds like an edge case better suited to a bug report, as others have mentioned, but here's some generalizable code that might be useful to somebody as a clunky workaround or for handling labels/colors. It plots a rescaled variable and uses the real values as labels.
library(ggplot2)
library(scales)
z <- c(1.5,1.5)
# rescale z to 0:1
z_rescaled <- rescale(z)
# customizable number of breaks in the legend
max_breaks_cnt <- 5
# break z and z_rescaled by quantiles determined by number of maximum breaks
# and use 'unique' to remove duplicate breaks
breaks_z <- unique(as.vector(quantile(z, seq(0,1,by=1/max_breaks_cnt))))
breaks_z_rescaled <- unique(as.vector(quantile(z_rescaled, seq(0,1,by=1/max_breaks_cnt))))
# make a color palette
Pal <- colorRampPalette(c('yellow','orange','red'))(500)
# plot z_rescaled with breaks_z used as labels
ggplot(data.frame(x=c(1,2),y=c(1,2),z_rescaled),aes(x=x,y=y,color=z_rescaled)) +
geom_point() + scale_colour_gradientn("z",colours=Pal,labels = breaks_z,breaks=breaks_z_rescaled)
This is quite off-topic but I like to use rescaling to send tons of changing variables to a function like this:
colorfunction <- gradient_n_pal(colours = colorRampPalette(c('yellow','orange','red'))(500),
values = c(0:1), space = "Lab")
colorfunction(z_rescaled)
