Weird ggplot2 error: Empty raster - r

Why does
ggplot(data.frame(x=c(1,2),y=c(1,2),z=c(1.5,1.5)),aes(x=x,y=y,color=z)) +
geom_point()
give me the error
Error in grid.Call.graphics(L_raster, x$raster, x$x, x$y, x$width, x$height, : Empty raster
but the following two plots work
ggplot(data.frame(x=c(1,2),y=c(1,2),z=c(2.5,2.5)),aes(x=x,y=y,color=z)) +
geom_point()
ggplot(data.frame(x=c(1,2),y=c(1,2),z=c(1.5,2.5)),aes(x=x,y=y,color=z)) +
geom_point()
I'm using ggplot2 0.9.3.1

TL;DR: Check your data -- do you really want to use a continuous color scale with only one possible value for the color?
The error does not occur if you add + scale_color_continuous(guide=FALSE) to the plot. (This turns off the legend; recent ggplot2 versions spell this guide = "none".)
ggplot(data.frame(x=c(1,2), y=c(1,2), z=c(1.5,1.5)), aes(x=x,y=y,color=z)) +
geom_point() + scale_color_continuous(guide = FALSE)
The error seems to be triggered in cases where a continuous color scale uses only one color. The current GitHub version already includes the relevant pull request. Install it via:
devtools::install_github("hadley/ggplot2")
But more probably there is an issue with the data: why would you use a continuous color scale with only one value?
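One way to act on this advice is to check whether the mapped variable is constant before asking for a continuous scale. The sketch below is only an illustration: it assumes the scales package is installed, and the helper name constant_z and the fallback colour "steelblue" are arbitrary choices, not part of the original answer.

```r
library(ggplot2)
library(scales)

df <- data.frame(x = c(1, 2), y = c(1, 2), z = c(1.5, 1.5))

# zero_range() reports whether a length-2 range has (numerically) zero width
constant_z <- zero_range(range(df$z))

p <- if (constant_z) {
  # Only one value: a gradient carries no information, so use a fixed colour
  ggplot(df, aes(x = x, y = y)) + geom_point(colour = "steelblue")
} else {
  ggplot(df, aes(x = x, y = y, colour = z)) + geom_point()
}
p
```

With z = c(1.5, 2.5) the same check is FALSE and the continuous scale is used as usual.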

The same behaviour (i.e. the "Empty raster" error) appeared for me with a value other than 1.5.
Try the following:
ggplot(data.frame(x=c(1,2),y=c(1,2),z=c(0.02,0.02)),aes(x=x,y=y,color=z)) +
geom_point()
You get the same error again (tried with both versions 0.9.3.1 and 1.0.0.0), so it looks like a nasty and weird bug.

As others have mentioned, this sounds like an edge case better suited to a bug report. Still, here is some generalizable code that might be useful as a clunky workaround or for handling labels and colors: it plots a rescaled variable and uses the real values as labels.
library(ggplot2)
library(scales)
z <- c(1.5,1.5)
# rescale z to 0:1
z_rescaled <- rescale(z)
# customizable number of breaks in the legend
max_breaks_cnt <- 5
# break z and z_rescaled by quantiles determined by number of maximum breaks
# and use 'unique' to remove duplicate breaks
breaks_z <- unique(as.vector(quantile(z, seq(0,1,by=1/max_breaks_cnt))))
breaks_z_rescaled <- unique(as.vector(quantile(z_rescaled, seq(0,1,by=1/max_breaks_cnt))))
# make a color palette
Pal <- colorRampPalette(c('yellow','orange','red'))(500)
# plot z_rescaled with breaks_z used as labels
ggplot(data.frame(x=c(1,2),y=c(1,2),z_rescaled),aes(x=x,y=y,color=z_rescaled)) +
geom_point() + scale_colour_gradientn("z",colours=Pal,labels = breaks_z,breaks=breaks_z_rescaled)
This is quite off-topic but I like to use rescaling to send tons of changing variables to a function like this:
colorfunction <- gradient_n_pal(colours = colorRampPalette(c('yellow','orange','red'))(500),
values = c(0:1), space = "Lab")
colorfunction(z_rescaled)
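For readers unfamiliar with rescale(), here is a minimal sketch of what it does to ordinary input and to the constant input from the question. The variable names z_spread and z_const are made up for this illustration.

```r
library(scales)

# Distinct values are spread linearly over [0, 1]
z_spread <- rescale(c(1, 2, 3))

# A constant vector has zero range, so every element maps to the
# midpoint of the target range, i.e. 0.5
z_const <- rescale(c(1.5, 1.5))

z_spread
z_const
```

This is why the workaround above ends up with a single break at 0.5 labelled with the real value 1.5.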

Related

ggtree issue with coloring both tips and branches

I'm trying to make a plot using ggtree, but I'm having some issues when I try to have both the tip points and branches colored. The tree works with each of these independently, but when I try them together the fill for the nodes is overridden by the color argument from the branch and they come out grey (or it's ignoring it altogether and defaulting to the same NA color?).
Here's the minimum code needed to produce the issue:
p <- ggtree(rerooted_tree, aes(color = support))
p <- p %<+% my_DF +
geom_tippoint(aes(fill = as.factor(domains.present)))
p
The variable domains.present is a character column in the dataframe, and works perfectly if it's color instead of fill like in the code below. In the above code, though, if domains.present isn't written as.factor within aes I get an error message saying Continuous value supplied to discrete scale.
q <- ggtree(rerooted_tree)
q <- q %<+% All.my_DF +
geom_tippoint(aes(color = domains.present), size = 1)
q
I'm hoping this is just a syntax issue, but I'm working on getting a reprex together to add if needed. This is a very similar issue to this post, but the OP there solved it without ggtree (I'd rather keep it simple if possible). Thank you in advance!
I ran into the same issue recently and having the branch colour defined outside of aes() worked for me:
p <- ggtree(rerooted_tree, color = support)
p <- p %<+% my_DF +
geom_tippoint(aes(fill = as.factor(domains.present)))
p
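A possible contributing factor, offered as an assumption rather than a confirmed diagnosis of the ggtree case: in ggplot2, only point shapes 21 to 25 have a separately fillable interior, so a fill mapping has no visible effect on the default point shape. The sketch below uses plain ggplot2 and the built-in mtcars data instead of a tree, purely for illustration; p_fill is a made-up name.

```r
library(ggplot2)

# shape = 21 is a circle with an outline (colour) and an interior (fill)
p_fill <- ggplot(mtcars, aes(wt, mpg)) +
  geom_point(aes(fill = factor(cyl)), shape = 21, colour = "black", size = 3)
p_fill
```

If the same applies to geom_tippoint(), passing shape = 21 there would be the analogous change, but that is untested here.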

Animating Histograms with plotly

I'm trying to create an animated demonstration of the Law of Large Numbers, where I want to show the histogram converging to the density as the sample size increases.
I can do this with R shiny, putting a slider on the sample size, but when I try to set up a plotly animation using the sample size as the frame, I get an error deep in the bowels of ggplotly. Here is the sample code:
library(tidyverse)
library(plotly)
XXX <- rnorm(200)
plotdat <- bind_rows(lapply(25:200, function(i) data.frame(x=XXX[1:i],f=i)))
hplot <- ggplot(plotdat,aes(x,frame=f)) + geom_histogram(binwidth=.25)
ggplotly(hplot)
The last line returns the error: Error in -data$group : invalid argument to unary operator.
I'm not sure where it is supposed to be getting data$group (this value has been magically set for me in other invocations of ggplotly).
Skipping the initial ggplot and going straight to plotly, does this work for you?
plotdat %>%
plot_ly(x=~x,
type = 'histogram',
frame = ~f) %>%
layout(yaxis = list(range = c(0,50)))
Or, using your original syntax, we can add a position specification that seems to prevent the bug. This version looks better, with standard ggplot formatting and tweened animation.
hplot <- ggplot(plotdat, aes(x, frame = f)) +
geom_histogram(binwidth=.25, position = "identity")
ggplotly(hplot) %>%
animation_opts(frame = 100) # minimum ms per frame to control speed
(I don't know why this fixes it, but when I googled your error I saw a plotly issue on GitHub that was solved by specifying the position, and it seems to fix the error here too. https://github.com/plotly/plotly.R/issues/1544)
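Both versions rely on the same data layout: each frame carries its own complete copy of the sample so far, so frame f holds the first f draws. A small base-R sketch of that layout follows, with a much shorter sample (6 draws, frames 3 to 6) chosen only to keep it readable; the names draws and frames are made up.

```r
set.seed(42)
draws <- rnorm(6)

# One block of rows per frame; frame i repeats the first i draws
frames <- do.call(rbind, lapply(3:6, function(i) {
  data.frame(x = draws[1:i], f = i)
}))

# Number of rows contributed by each frame
table(frames$f)
```

The total row count grows quadratically with the sample size, which is worth keeping in mind before animating thousands of frames.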

How to specify bin colors for plot_usmap?

I'm looking to create a heat map with a little more control over the color scale, specifically I want to have bins for ranges of values that will correspond to a specific color.
Below I provide some sample code to generate some data and make a plot. The issue seems to be how it maps the colors to the breaks: it is not a 1:1 correspondence, and when I add more percentiles to the breaks it seems to stretch the colors.
It does not appear to be a large issue here, but when I apply this to the entire US data set I'm working with the color scheme really breaks down.
library(usmap)
library(ggplot2)
fips <- seq(45001,45091,2)
value <- rnorm(length(fips),3000,10000)
data <- data.frame(fips,value)
data$value[data$value<0]=0
plot_usmap(regions='counties',data=data,values="value",include="SC") +
scale_fill_stepsn(breaks=c(as.numeric(quantile(data$value,seq(.25,1,.25)))),
colors=c("blue","green","yellow","red"))
plot_usmap(regions='counties',data=data,values="value",include="SC") +
scale_fill_stepsn(breaks=c(as.numeric(quantile(data$value,seq(0,1,.1)))),
colors=c("blue","green","yellow","red"))
#data not provided for this bit
plot_usmap(regions='counties',data=datar,values="1969",exclude=c("AK","HI")) +
scale_fill_stepsn(breaks=c(as.numeric(quantile(datar$`1969`,seq(0,1,.1)))),
colours=c("blue","green","yellow","red"))
One way would be to manually bin the percentiles and then use the factor levels for your manual breaks and labels.
I've never used this high-level function from usmap, so I don't know how to deal with the warnings it produces (shown after the code). Personally, I would prefer and recommend ggplot + geom_polygon or friends for more control.
library(usmap)
library(ggplot2)
fips <- seq(45001,45091,2)
value <- rnorm(length(fips),3000,10000)
mydat <- base::data.frame(fips,value)
mydat$value[mydat$value<0]=0
mydat$perc_cuts <- as.integer(cut(ecdf(mydat$value)(mydat$value), seq(0,1,.25)))
plot_usmap(regions='counties',
data=mydat,
values="perc_cuts",include="SC") +
scale_fill_stepsn(breaks= 1:4, limits = c(0,4), labels = seq(.25, 1, .25),
colors=c("blue","green","yellow","red"),
guide = guide_colorsteps(even.steps = FALSE))
#> Warning: Use of `map_df$x` is discouraged. Use `x` instead.
#> Warning: Use of `map_df$y` is discouraged. Use `y` instead.
#> Warning: Use of `map_df$group` is discouraged. Use `group` instead.
Created on 2020-06-27 by the reprex package (v0.3.0)
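To make the binning step concrete, here is a minimal sketch of what cut(ecdf(v)(v), ...) produces on a small vector. The values in v are invented for illustration.

```r
v <- c(10, 20, 30, 40, 50, 60, 70, 80)

# ecdf(v)(v) gives each value's empirical percentile: 1/8, 2/8, ..., 8/8
pct <- ecdf(v)(v)

# cut() assigns each percentile to a quartile interval (0, .25], (.25, .5], ...
# and as.integer() turns the resulting factor into bin numbers 1 to 4
bins <- as.integer(cut(pct, seq(0, 1, .25)))
bins
```

Each bin number then maps to exactly one colour, which is what gives the 1:1 correspondence between breaks and colours.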

Using the QQ Plot functionality in ggplot

I'm brand new to R, and have a data frame with 8 columns that has daily changes in interest rates. I can plot QQ plots for each of the 8 columns using the following code:
par(mfrow = c(2,4))
for(i in 1:length(column_names)){
qqnorm(deltaIR.df[,i],main = column_names[i], pch = 16, cex = .5)
qqline(deltaIR.df[,i],cex = .5)
}
I'd like now to use the stat_qq function in the ggplot2 package to do this more elegantly, but just can't get my arms around the syntax - I keep getting it wrong. Would someone kindly help me translate the above code to use ggplot and allow me to view my 8 QQ plots on one page with an appropriate header? Trying the obvious
ggplot(deltaIR.df) + stat_qq(sample = columns[i])
gets me only an error message
Warning: Ignoring unknown parameters: sample
Error: stat_qq requires the following missing aesthetics: sample
and adding in the aesthetics
ggplot(deltaIR.df, aes(column_names)) + stat_qq()
is no better. The error message just changes to
Error: Aesthetics must be either length 1 or the same as the data (5271)
In short, nothing I have done so far (even with Google's assistance) has got me closer to a solution. May I ask for guidance?
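A sketch of one possible approach, assuming deltaIR.df holds only the 8 numeric columns. The data below are simulated stand-ins, and the column names rate_1 to rate_8 and the plot title are made up. The idea is that sample must be mapped inside aes() to a single column, so the data are reshaped to long format and facet_wrap() draws one panel per original column. stat_qq_line() requires ggplot2 3.0.0 or later.

```r
library(ggplot2)

# Simulated stand-in for the real data: 8 columns of daily changes
set.seed(1)
deltaIR.df <- as.data.frame(matrix(rnorm(800), ncol = 8))
names(deltaIR.df) <- paste0("rate_", 1:8)

# stack() returns a long data frame with 'values' (the data)
# and 'ind' (a factor naming the source column)
long <- stack(deltaIR.df)

qq <- ggplot(long, aes(sample = values)) +
  stat_qq(size = 0.5) +
  stat_qq_line() +
  facet_wrap(~ ind, nrow = 2, scales = "free") +
  ggtitle("Normal QQ plots of daily rate changes")
qq
```

The panel strips take the place of the main = column_names[i] titles in the base-graphics loop.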

How to plot a violin scatter boxplot (in R)?

I just came by the following plot:
And wondered how it can be done in R (or other software)?
Update 10.03.11: Thank you to everyone who participated in answering this question - you gave wonderful solutions! I've compiled all the solutions presented here (as well as some others I came by online) in a post on my blog.
Make.Funny.Plot does more or less what I think it should do. To be adapted according to your own needs, and might be optimized a bit, but this should be a nice start.
Make.Funny.Plot <- function(x){
unique.vals <- length(unique(x))
N <- length(x)
N.val <- min(N/20,unique.vals)
if(unique.vals>N.val){
x <- ave(x,cut(x,N.val),FUN=min)
x <- signif(x,4)
}
# construct the outline of the plot
outline <- as.vector(table(x))
outline <- outline/max(outline)
# determine some correction to make the V shape,
# based on the range
y.corr <- diff(range(x))*0.05
# Get the unique values
yval <- sort(unique(x))
plot(c(-1,1),c(min(yval),max(yval)),
type="n",xaxt="n",xlab="")
for(i in 1:length(yval)){
n <- sum(x==yval[i])
x.plot <- seq(-outline[i],outline[i],length=n)
y.plot <- yval[i]+abs(x.plot)*y.corr
points(x.plot,y.plot,pch=19,cex=0.5)
}
}
N <- 500
x <- rpois(N,4)+abs(rnorm(N))
Make.Funny.Plot(x)
EDIT : corrected so it always works.
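The binning step inside Make.Funny.Plot, ave(x, cut(x, N.val), FUN = min), replaces every value with the minimum of its bin. A minimal sketch on a toy vector (the name vals and its contents are invented for illustration):

```r
vals <- c(1, 2, 3, 10, 11, 12)

# cut(vals, 2) splits the range into two equal-width bins;
# ave() then applies min() within each bin and returns a vector
# of the same length as the input
binned <- ave(vals, cut(vals, 2), FUN = min)
binned
```

After this step the data contain only one distinct value per bin, so table(x) in the function counts the points per row of the plot.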
I recently came upon the beeswarm package, that bears some similarity.
"The bee swarm plot is a one-dimensional scatter plot like "stripchart", but with closely-packed, non-overlapping points."
Here's an example:
library(beeswarm)
beeswarm(time_survival ~ event_survival, data = breast,
method = 'smile',
pch = 16, pwcol = as.numeric(ER),
xlab = '', ylab = 'Follow-up time (months)',
labels = c('Censored', 'Metastasis'))
legend('topright', legend = levels(breast$ER),
title = 'ER', pch = 16, col = 1:2)
(source: eklund at www.cbs.dtu.dk)
I have come up with code similar to Joris's, though I think this is more than a stem plot: the y value in each series is the absolute value of the distance to the in-bin mean, and the x value reflects whether the value is lower or higher than that mean.
Example code (sometimes throws warnings but works):
px<-function(x,N=40,...){
x<-sort(x);
#Cutting in bins
cut(x,N)->p;
#Calculate the means over bins
sapply(levels(p),function(i) mean(x[p==i]))->meansl;
means<-meansl[p];
#Calculate the mins over bins
sapply(levels(p),function(i) min(x[p==i]))->minl;
mins<-minl[p];
#Each dot is one value.
#X is an order of a value inside bin, moved so that the values lower than bin mean go below 0
X<-rep(0,length(x));
for(e in levels(p)) X[p==e]<-(1:sum(p==e))-1-sum((x-means)[p==e]<0);
#Y is a bin minimum + absolute value of a difference between value and its bin mean
plot(X,mins+abs(x-means),pch=19,cex=0.5,...);
}
Try the vioplot package:
library(vioplot)
vioplot(rnorm(100))
(with awful default color ;-)
There is also wvioplot() in the wvioplot package, for weighted violin plot, and beanplot, which combines violin and rug plots. They are also available through the lattice package, see ?panel.violin.
Since this hasn't been mentioned yet: there is also ggbeeswarm, a relatively new R package based on ggplot2.
It adds another geom to ggplot, to be used instead of geom_jitter or the like.
In particular, geom_quasirandom (see the second example below) produces really good results, and I have in fact adopted it as my default plot.
Also noteworthy is the package vipor (VIolin POints in R), which produces plots using standard R graphics and is in fact used by ggbeeswarm behind the scenes.
set.seed(12345)
install.packages('ggbeeswarm')
library(ggplot2)
library(ggbeeswarm)
ggplot(iris,aes(Species, Sepal.Length)) + geom_beeswarm()
ggplot(iris,aes(Species, Sepal.Length)) + geom_quasirandom()
#compare to jitter
ggplot(iris,aes(Species, Sepal.Length)) + geom_jitter()
