How to suppress zeroes when using geom_histogram with scale_y_log10

I'm trying to plot a histogram with a log y scale using ggplot, geom_histogram and scale_y_log10. Most regions (those with counts greater than 1) appear correct: the background is transparent and the histogram bars are filled with the default color black. But at counts of 1, the colors are inverted: black background and transparent fill of the histogram bars. This code (below) generates the example in the graph.
Can anyone explain the cause of this? I understand the problems that come with log scales, but I can't seem to find a solution to this. I'm hoping there's an easy fix, or that I overlooked something.
set.seed(1)
df <- data.frame(E=sample(runif(100), 20, TRUE))
ggplot(df,aes(E)) + geom_histogram(binwidth=0.1) + scale_y_log10(limits=c(0.1,100)) + xlim(0,1)

You can add drop=TRUE to the geom_histogram call to drop bins with zero counts (see ?stat_bin for details):
set.seed(1)
df <- data.frame(E=sample(runif(100), 20, TRUE))
ggplot(df,aes(E)) +
geom_histogram(binwidth=0.1, drop=TRUE) +
scale_y_log10(limits=c(0.1,100)) +
xlim(0,1)
EDIT: Since a log10 axis draws bars upward from 1 (where log10(1) = 0), a bar of height 1 has zero length and cannot be displayed. As mentioned in this answer, you can choose to start at a different level, but the result may be misleading. Here's the code for this anyway:
library(scales)
# Log transform shifted by `from`, so bars can start below 1 (at base^from)
mylog_trans <- function(base = exp(1), from = 0) {
  trans <- function(x) log(x, base) - from
  inv <- function(x) base^(x + from)
  trans_new("mylog", trans, inv, log_breaks(base = base),
            domain = c(base^from, Inf))
}
ggplot(df,aes(E)) +
geom_histogram(binwidth=0.1, drop=TRUE) +
scale_y_continuous(trans = mylog_trans(base=10, from=-1), limits=c(0.1,100)) +
xlim(0,1)
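The arithmetic behind this can be checked in base R. This is only a sketch of my understanding: the bars are drawn from 0 in transformed space, so what matters is where each count lands after the transform. The vector names here are made up for illustration.

```r
# Counts as a histogram might produce them
counts <- c(0, 1, 10, 100)

# Plain log10 axis: a count of 1 maps to 0 (zero bar length), a count of 0 to -Inf
plain <- log10(counts)

# Shifted transform with base = 10 and from = -1, the same arithmetic as
# mylog_trans(base = 10, from = -1): a count of 1 now maps to 1
shifted <- log10(counts) - (-1)

print(data.frame(counts, plain, shifted))
```

With the shift, bars effectively start at 0.1 instead of 1, which is why a count of 1 becomes visible, and also why the bar lengths can be misleading.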

Related

How do you fix warning message: colourbar guide needs continuous scales?

I would like to produce multiple contour plots using ggplot2 and
geom_contour_filled()
but the range of the z values is too large. To give you an idea, the values range from -2.71 to -157.28. So I thought I should change the breaks so that they cover all of these values.
The code below is not the data I work with, but it should represent the problem I have:
The data
h_axis <- 10^(seq(log10(0.1), log10(1000),
length.out = 20))
a_axis <- 10^(seq(log10(0.1), log10(1000),
length.out = 20))
comb <- expand.grid(h_axis, a_axis)
h_val <- comb$Var2
a_val <- comb$Var1
values <- seq(-2, -150, length.out = 400)
dt <- data.frame(h = h_val, a = a_val, values)
First, let's say I don't change the breaks. Then, using this code
ggplot(dt, aes(x = log10(h_val), y = log10(a_val), z = values)) +
geom_contour_filled() +
# geom_contour(color = "black", size = 0.1) +
xlab(expression(log[10](h))) +
ylab(expression(log[10](a))) +
guides(fill = guide_colorbar(title = expression('E ||'*g - hat(g)*'||'[2]*'')))
will produce the following figure:
So a lot of the area will be covered by the same colour, which is a problem since my data consists of multiple factors. Factor 1 is covered by the yellow, Factor 2 is covered by the green, and so on.
My second approach is to add
bar <- 10^(seq(log10(-min(values)), log10(-max(values)),
length.out = 100))
and put bar in the geom_contour_filled() like this
geom_contour_filled(breaks = -bar)
Then I get
which is nice! But, in both cases I get the following warning
Warning message:
colourbar guide needs continuous scales.
Also, the legend is not shown on the right side. What do I need to do to fix the warning and how can I make sure that the legend is shown?
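Try guide_legend instead of guide_colorbar. geom_contour_filled() bins z into discrete bands, so the fill scale is discrete, and a colourbar can only be drawn for a continuous scale; that is why the guide is dropped and no legend appears.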
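A minimal sketch of that change, assuming ggplot2 is installed. The grid and the z surface below are made up for illustration and are not the asker's data:

```r
library(ggplot2)

# Illustrative regular 20 x 20 grid with a smooth, all-negative z surface
grid <- expand.grid(x = seq(-1, 3, length.out = 20),
                    y = seq(-1, 3, length.out = 20))
grid$z <- -(grid$x + grid$y + 2.5)^2

# The filled contour bands are discrete, so request a legend rather than a colourbar
p <- ggplot(grid, aes(x, y, z = z)) +
  geom_contour_filled() +
  guides(fill = guide_legend(title = "z band"))

print(p)
```

The same guides() call should work unchanged when custom breaks are passed to geom_contour_filled().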

Color horizontal bars in ggplot's geom_bar

I'm having some trouble with ggplot2's geom_bar:
Here's my data:
set.seed(1)
df <- data.frame(log10.p.value = -10*log10(runif(10,0,1)), y = letters[1:10], col = rep("#E0E0FF",10), stringsAsFactors = F)
#specify color by log10.p.value
df$col[which(df$log10.p.value > 2)] <- "#EBCCD6"
df$col[which(df$log10.p.value > 4)] <- "#E09898"
df$col[which(df$log10.p.value > 6)] <- "#C74747"
df$col[which(df$log10.p.value > 8)] <- "#B20000"
#truncate bars
df$log10.p.value[which(df$log10.p.value > 10)] <- 10
As you can see each log10.p.value interval is assigned a different color and since I don't want the bars to extend beyond log10.p.value = 10 I set any such value to 10.
My ggplot command is:
p <- ggplot(df, aes(y=log10.p.value,x=y,fill=as.factor(col)))+
geom_bar(stat="identity",width=0.2)+coord_flip()+scale_y_continuous(limits=c(0,10),labels=c(seq(0,7.5,2.5)," >10"))+
theme(axis.text=element_text(size=10))+scale_fill_manual(values=df$col,guide=FALSE)
And the figure is:
The problems are:
The bar colors in the plot do not match df$col. For example, bars a and b are colored #EBCCD6 rather than #E09898.
Because I manually specify the x-axis last tick text to be ">10" an extra space is created in the plot to the right of that tick making the bars I truncated at 10 seem like they end at 10, whereas my intention was for them to go all the way to the right end of the plot.
I am unable to reproduce the graph that you generated. Running the code you've provided generates the following graph:
You can correctly specify the colours by naming the vector that you are passing to scale_fill_manual:
coloursv <- df$col
names(coloursv) <- df$col
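Why naming helps can be seen without plotting. The sketch below uses a short made-up colour column (not the data above): as.factor() sorts the distinct values, and unnamed values are matched to those sorted levels by position, whereas named values are matched by name.

```r
# Hypothetical colour column, in row order
cols <- c("#E09898", "#EBCCD6", "#E09898", "#B20000")

# as.factor() sorts the distinct values, so level order differs from row order
lev <- levels(as.factor(cols))

# Unnamed values are matched by position: level i receives values[i]
unnamed <- setNames(cols[seq_along(lev)], lev)

# Named values are looked up by level name, so each level gets its own colour
named <- setNames(unique(cols), unique(cols))[lev]

print(unnamed)
print(named)
```

In the unnamed case the names (levels) and the values disagree, which is the kind of mismatch described in the question.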
To the second part of your question - you can make sure there is no space between the bars and the edge of the graph using the expand parameter of scale_y_continuous.
Making these two changes, the code for the plot becomes:
p <- ggplot(df, aes(y=log10.p.value, x=y, fill=as.factor(col))) +
  geom_bar(stat="identity", width=0.2) +
  scale_y_continuous(limits=c(0,10),
                     breaks=seq(0,10,2.5),  # explicit breaks so the five labels always line up
                     labels=c(seq(0,7.5,2.5)," >10"),
                     expand = c(0,0)) +
  theme(axis.text=element_text(size=10)) +
  scale_fill_manual(values = coloursv, guide = "none") +  # guide = F is deprecated in recent ggplot2
  coord_flip()
The '>10' label is slightly cut off. You can increase the plot margins to make it fully visible, using:
p + theme(plot.margin=unit(c(0.1,0.5,0.1,0.1),"cm"))

Two column/row Positioning of labels in ggplot

I am making a plot in ggplot2 where on the y axis I have the indices of groups and on the x axis some information. For readability I would like to make the labels bigger but then they start overlapping. Therefore I would like to put the labels into two columns as shown in the figure so they can be bigger. Is there a way to do this in ggplot? I tried vjust and hjust but they only seem to accept 1 argument applying to all labels.
Current labels:
Objective labeling:
Well, there is no obvious parameter responsible for that, at least AFAIK.
However, for your specific goal my first thought was to add some spaces to numeric labels.
avoid_overlap <- function(x)
{
ind <- seq_along(x) %% 2 == 0
x[ind] <- paste0(x[ind], strrep(" ", 6))  # pad every second label; the number of spaces is a guess, adjust to taste
x
}
ggplot(mtcars, aes(cyl, mpg)) + geom_point() +
scale_y_continuous(breaks = 10:35, labels = avoid_overlap(10:35)) +
theme(axis.text.y = element_text(size = 32))
Play with grid lines (minor/major) via theme if the grid is too dense.
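As an alternative to padding with spaces (a different technique from the one above), more recent ggplot2 releases provide guide_axis() with an n.dodge argument that staggers neighbouring labels across several rows or columns. A minimal sketch, assuming ggplot2 3.3.0 or later; the text size is arbitrary:

```r
library(ggplot2)

# n.dodge = 2 spreads neighbouring labels over two columns on a vertical axis
p <- ggplot(mtcars, aes(cyl, mpg)) +
  geom_point() +
  scale_y_continuous(breaks = 10:35,
                     guide = guide_axis(n.dodge = 2)) +
  theme(axis.text.y = element_text(size = 20))

print(p)
```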

Label minimum and maximum of scale fill gradient legend with text: ggplot2

I have a plot created in ggplot2 that uses scale_fill_gradientn. I'd like to add text at the minimum and maximum of the scale legend. For example, at the legend minimum display "Minimum" and at the legend maximum display "Maximum". There are posts using discrete fills and adding labels with numbers instead of text (e.g. here), but I am unsure how to use the labels feature with scale_fill_gradientn to only insert text at the min and max. At present I keep getting errors:
Error in scale_labels.continuous(scale, breaks) :
Breaks and labels are different lengths
Is this text label possible within ggplot2 for this type of scale / fill?
# The example code here produces a plot for illustrative purposes only.
# create data frame, from ggplot2 documentation
df <- expand.grid(x = 0:5, y = 0:5)
df$z <- runif(nrow(df))
#plot
ggplot(df, aes(x, y, fill = z)) + geom_raster() +
scale_fill_gradientn(colours=topo.colors(7),na.value = "transparent")
For scale_fill_gradientn() you should provide both arguments, breaks= and labels=, with the same length. With the limits= argument you extend the colorbar to the minimum and maximum values you need.
ggplot(df, aes(x, y, fill = z)) + geom_raster() +
scale_fill_gradientn(colours=topo.colors(7),na.value = "transparent",
breaks=c(0,0.5,1),labels=c("Minimum",0.5,"Maximum"),
limits=c(0,1))
User Didzis Elfert's answer slightly lacks "automatism" in my opinion (but it does of course point to the core of the problem, +1 :).
Here is an option to programmatically define the minimum and maximum of your data.
Advantages:
You no longer need to hard-code the break values (which is error-prone).
You no longer need to hard-code the limits (which is also error-prone).
By passing a named vector, you don't need the labels argument (manually mapping labels to values is also error-prone).
As a side effect, you avoid the "non-matching labels/breaks" problem.
library(ggplot2)
foo <- expand.grid(x = 0:5, y = 0:5)
foo$z <- runif(nrow(foo))
myfuns <- list(Minimum = min, Mean = mean, Maximum = max)
ls_val <- unlist(lapply(myfuns, function(f) f(foo$z)))
# you only need to set the breaks argument!
ggplot(foo, aes(x, y, fill = z)) +
geom_raster() +
scale_fill_gradientn(
colours = topo.colors(7),
breaks = ls_val
)
# You can of course also replace the middle value with something else
ls_val[2] <- 0.5
names(ls_val)[2] <- 0.5
ggplot(foo, aes(x, y, fill = z)) +
geom_raster() +
scale_fill_gradientn(
colours = topo.colors(7),
breaks = ls_val
)

easiest way to discretize continuous scales for ggplot2 color scales?

Suppose I have this plot:
ggplot(iris) + geom_point(aes(x=Sepal.Width, y=Sepal.Length, colour=Sepal.Length)) + scale_colour_gradient()
What is the correct way to discretize the color scale, like the plot shown below the accepted answer here (gradient breaks in a ggplot stat_bin2d plot)?
ggplot correctly recognizes discrete values and uses discrete scales for them. My question is: if you have continuous data and want a discrete colour bar for it (with each square corresponding to a value, and the squares still coloured along a gradient), what is the best way to do it? Should the discretizing/binning happen outside of ggplot and be put in the data frame as a separate discrete-valued column, or is there a way to do it within ggplot? An example of what I'm looking for is similar to the scale shown here:
except I'm plotting a scatter plot and not something like geom_tile/heatmap.
thanks.
The solution is slightly complicated, because you want a discrete scale. Otherwise you could probably simply use round.
library(ggplot2)
bincol <- function(x,low,medium,high) {
breaks <- function(x) pretty(range(x), n = nclass.Sturges(x), min.n = 1)
colfunc <- colorRampPalette(c(low, medium, high))
binned <- cut(x,breaks(x))
res <- colfunc(length(unique(binned)))[as.integer(binned)]
names(res) <- as.character(binned)
res
}
labels <- unique(names(bincol(iris$Sepal.Length,"blue","yellow","red")))
breaks <- unique(bincol(iris$Sepal.Length,"blue","yellow","red"))
breaks <- breaks[order(labels,decreasing = TRUE)]
labels <- labels[order(labels,decreasing = TRUE)]
ggplot(iris) +
geom_point(aes(x=Sepal.Width, y=Sepal.Length,
colour=bincol(Sepal.Length,"blue","yellow","red")), size=4) +
scale_color_identity("Sepal.Length", labels=labels,
breaks=breaks, guide="legend")
You could try the following; I have modified your example code appropriately below:
#I am not so great at R, so I'll just make a data frame this way
#I am convinced there are better ways. Oh well.
df<-data.frame()
for(x in 1:10){
for(y in 1:10){
newrow<-c(x,y,sample(1:1000,1))
df<-rbind(df,newrow)
}
}
colnames(df)<-c('X','Y','Val')
#This is the bit you want
p<- ggplot(df, aes(x=X,y=Y,fill=cut(Val, c(0,100,200,300,400,500,Inf))))
p<- p + geom_tile() + scale_fill_brewer(type="seq",palette = "YlGn")
p<- p + guides(fill=guide_legend(title="Legend!"))
#Tight borders
p<- p + scale_x_continuous(expand=c(0,0)) + scale_y_continuous(expand=c(0,0))
p
Note the strategic use of cut to discretize the data followed by the use of color brewer to make things pretty.
The result looks as follows.
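To see what cut() contributes in the answer above, here is a small base-R sketch. The three input values are made up for illustration; the break points are the ones used in the plot code.

```r
# Three illustrative values binned with the same breaks as in the plot code
vals <- c(50, 150, 999)
bins <- cut(vals, c(0, 100, 200, 300, 400, 500, Inf))

# Each value is replaced by the label of the interval it falls into
print(as.character(bins))  # "(0,100]" "(100,200]" "(500,Inf]"

# The result is a factor with one level per interval, which ggplot2 treats as discrete
print(nlevels(bins))       # 6
```

Because the result is a factor, a discrete scale such as scale_fill_brewer() can be applied to it directly.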
