I'm trying to produce a histogram with ggplot's geom_histogram that colors the bars according to a gradient and uses a log10 y-axis.
Here's the code:
library(ggplot2)
set.seed(1)
df <- data.frame(id=paste("ID",1:1000,sep="."),val=rnorm(1000),stringsAsFactors=FALSE)
bins <- 10
cols <- c("darkblue","darkred")
colGradient <- colorRampPalette(cols)
cut.cols <- colGradient(bins)
df$cut <- cut(df$val,bins)
df$cut <- factor(df$cut,levels=unique(df$cut))
Then,
ggplot(data=df,aes_string(x="val",y="..count..+1",fill="cut"))+
geom_histogram(show.legend=FALSE)+
scale_color_manual(values=cut.cols,labels=levels(df$cut))+
scale_fill_manual(values=cut.cols,labels=levels(df$cut))+
scale_y_log10()
gives: [plot image not included]
whereas dropping the fill from the aesthetics:
ggplot(data=df,aes_string(x="val",y="..count..+1"))+
geom_histogram(show.legend=FALSE)+
scale_color_manual(values=cut.cols,labels=levels(df$cut))+
scale_fill_manual(values=cut.cols,labels=levels(df$cut))+
scale_y_log10()
gives: [plot image not included]
Any idea why the histogram bars differ between the two plots, and how to make the first one look like the second one?
The OP is trying to produce a histogram with ggplot's geom_histogram which colors the bars according to a gradient...
The OP has already done the binning (with 10 bins) but then calls geom_histogram(), which bins the data again on its own, using 30 bins by default (see ?geom_histogram).
When geom_bar() is used instead, together with cut instead of val,
ggplot(data = df, aes_string(x = "cut", y = "..count..+1", fill = "cut")) +
geom_bar(show.legend = FALSE) +
scale_color_manual(values = cut.cols, labels = levels(df$cut)) +
scale_fill_manual(values = cut.cols, labels = levels(df$cut)) +
scale_y_log10()
the chart becomes: [plot image not included]
Using geom_histogram() with filled bars is less straightforward, as can be seen in two answers to the question "How to fill histogram with color gradient?"
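If you want to keep geom_histogram(), one option (an alternative sketch, not taken from the answer) is to make its bins coincide with the color classes: build one vector of break points, here called brks, and pass it both to cut() and to geom_histogram(breaks = ...), so that no histogram bin straddles two classes. This assumes 10 equal-width classes and ggplot2 >= 3.3.0 for after_stat(); unlike the question's code, the factor levels stay in value order, so the gradient runs from low to high values.

```r
# Sketch: one set of breaks drives both cut() and geom_histogram(),
# so each histogram bin falls entirely inside one color class.
# Assumptions: 10 equal-width classes; ggplot2 >= 3.3.0 for after_stat().
set.seed(1)
df <- data.frame(val = rnorm(1000))
bins <- 10
# Equal-width break points spanning the data
brks <- seq(min(df$val), max(df$val), length.out = bins + 1)
# Right-closed intervals, matching geom_histogram()'s default closed = "right"
df$cut <- cut(df$val, breaks = brks, include.lowest = TRUE)
cut.cols <- colorRampPalette(c("darkblue", "darkred"))(bins)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(df, aes(x = val, y = after_stat(count + 1), fill = cut)) +
    # The same breaks as cut(), so every bar contains a single class
    geom_histogram(breaks = brks, show.legend = FALSE) +
    scale_fill_manual(values = cut.cols) +
    scale_y_log10()
}
```

The count + 1 trick from the question is kept on purpose: in each bin the nine empty classes contribute log10(1) = 0 to the stack, so the stacked height should equal the log of the one non-empty class.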
Related
I want to make a line chart in plotly whose color changes along its length, with the color given by a continuous scale. This is easy in ggplot2, but when I translate it to plotly using the ggplotly function, the variable determining the color behaves like a categorical variable.
library(dplyr)
library(ggplot2)
library(plotly)
df <- tibble(
x = 1:15,
group = rep(c(1,2,1), each = 5),
y = 1:15 + group
)
gg <- ggplot(df) +
aes(x, y, col = group) +
geom_line()
gg # ggplot2
ggplotly(gg) # plotly
ggplot2 (desired): [plot image not included]
plotly: [plot image not included]
I found one workaround, which, however, behaves oddly in ggplot2:
df2 <- df %>%
tidyr::crossing(col = unique(.$group)) %>%
mutate(y = ifelse(group == col, y, NA)) %>%
arrange(col)
gg2 <- ggplot(df2) +
aes(x, y, col = col) +
geom_line()
gg2
ggplotly(gg2)
I also couldn't find a way to do this in plotly directly. Maybe there is no solution at all. Any ideas?
It looks like ggplotly is treating group as a factor, even though it's numeric. You could use geom_segment as a workaround to ensure that segments are drawn between each pair of points:
gg2 = ggplot(df, aes(x,y,colour=group)) +
geom_segment(aes(x=x, xend=lead(x), y=y, yend=lead(y)))
gg2
ggplotly(gg2)
Regarding @rawr's (now deleted) comment, I think it makes sense to have group be continuous if you want to map line color to a continuous variable. Below is an extension of the OP's example to a continuous group column, rather than just two discrete categories.
set.seed(49)
df3 <- tibble(
x = 1:50,
group = cumsum(rnorm(50)),
y = 1:50 + group
)
Plot gg3 below uses geom_line, but I've also included geom_point. You can see that ggplotly is plotting the points. However, there are no lines, because no two points have the same value of group. If we hadn't included geom_point, the graph would be blank.
gg3 <- ggplot(df3, aes(x, y, colour = group)) +
geom_point() + geom_line() +
scale_colour_gradient2(low="red",mid="yellow",high="blue")
gg3
ggplotly(gg3)
Switching to geom_segment gives us the lines we want with ggplotly. Note, however, that line color will be based on the value of group at the first point in the segment (whether using geom_line or geom_segment), so there might be cases where you want to interpolate the value of group between each (x,y) pair in order to get smoother color gradations:
gg4 <- ggplot(df3, aes(x, y, colour = group)) +
geom_segment(aes(x=x, xend=lead(x), y=y, yend=lead(y))) +
scale_colour_gradient2(low="red",mid="yellow",high="blue")
ggplotly(gg4)
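The interpolation mentioned above can be sketched in base R, assuming linear interpolation between the observed points is acceptable: resample x, y and group on a finer grid with approx(), then build the segment end points by shifting each column by one row (a base-R stand-in for dplyr's lead()). The names n_out, dense and gg5 are hypothetical, chosen for illustration.

```r
# Sketch (assumption: linear interpolation between observed points is fine).
# Resample the path on a fine grid so each short segment gets its own
# interpolated group value, instead of the value at the segment's first point.
set.seed(49)
df3 <- data.frame(x = 1:50)
df3$group <- cumsum(rnorm(50))
df3$y <- df3$x + df3$group

n_out <- 500  # hypothetical resolution; larger values give smoother gradients
xs <- seq(min(df3$x), max(df3$x), length.out = n_out)
dense <- data.frame(
  x     = xs,
  y     = approx(df3$x, df3$y, xout = xs)$y,
  group = approx(df3$x, df3$group, xout = xs)$y
)
# Each segment runs from row i to row i + 1 (base-R equivalent of lead())
dense$xend <- c(dense$x[-1], NA)
dense$yend <- c(dense$y[-1], NA)
dense <- dense[-nrow(dense), ]  # the last row has no successor

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  gg5 <- ggplot(dense, aes(x, y, colour = group)) +
    geom_segment(aes(xend = xend, yend = yend)) +
    scale_colour_gradient2(low = "red", mid = "yellow", high = "blue")
}
```

gg5 can then be passed to ggplotly() in the same way as gg4; a larger n_out trades more segments for smoother color changes.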
I have a simple problem. How can I plot a histogram with ggplot2 using a fixed binwidth, filled with rainbow colors (or any other palette)?
Let's say I have data like this:
myData <- abs(rnorm(1000))
I want to plot a histogram using, e.g., binwidth=.1. However, that results in a different number of bins depending on the data:
ggplot() + geom_histogram(aes(x = myData), binwidth=.1)
If I knew the number of bins (e.g. n=15), I'd use something like:
ggplot() + geom_histogram(aes(x = myData), binwidth=.1, fill=rainbow(n))
But with a varying number of bins, I'm stuck on this simple problem.
If you really want the number of bins to be flexible, here is my little workaround:
library(ggplot2)
gg_b <- ggplot_build(
ggplot() + geom_histogram(aes(x = myData), binwidth=.1)
)
nu_bins <- nrow(gg_b$data[[1]])
ggplot() + geom_histogram(aes(x = myData), binwidth=.1, fill = rainbow(nu_bins))
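A different technique, not used in the answers here, sidesteps the bin count altogether: map fill to the bin center that stat_bin() computes (the x column of its output) and use a continuous palette, so the colors stretch over however many bins there turn out to be. This sketch assumes ggplot2 >= 3.3.0 for after_stat(); using rainbow(7) as gradient anchors is an arbitrary choice, and the result is a smooth gradient rather than one distinct hue per bin.

```r
# Sketch: color each bar by its bin center, so the number of bins never
# has to be known in advance. Assumes ggplot2 >= 3.3.0 for after_stat().
set.seed(1L)
myData <- abs(rnorm(1000))
binwidth <- 0.1

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(data.frame(val = myData), aes(x = val)) +
    # after_stat(x) is the center of each bin, as computed by stat_bin()
    geom_histogram(aes(fill = after_stat(x)), binwidth = binwidth) +
    # Continuous gradient through seven rainbow colors (arbitrary choice)
    scale_fill_gradientn(colours = rainbow(7)) +
    labs(x = "myData", fill = "bin center")
}
```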
In case the binwidth is fixed, here is an alternative solution that uses the internal function ggplot2:::bin_breaks_width() to get the number of bins before creating the graph. It's still a workaround, but it avoids calling geom_histogram() twice as in the other solution:
# create sample data
set.seed(1L)
myData <- abs(rnorm(1000))
binwidth <- 0.1
# create plot
library(ggplot2) # CRAN version 2.2.1 used
n_bins <- length(ggplot2:::bin_breaks_width(range(myData), width = binwidth)$breaks) - 1L
ggplot() + geom_histogram(aes(x = myData), binwidth = binwidth, fill = rainbow(n_bins))
As a third alternative, the aggregation can be done outside of ggplot2. Then geom_col() can be used instead of geom_histogram():
# start binning on multiple of binwidth
start_bin <- binwidth * floor(min(myData) / binwidth)
# compute breaks and bin the data
breaks <- seq(start_bin, max(myData) + binwidth, by = binwidth)
myData2 <- cut(myData, breaks = breaks, include.lowest = TRUE)
# geom_col() centers each bar on x, so plot the bin midpoints
ggplot() + geom_col(aes(x = head(breaks, -1L) + binwidth / 2,
y = as.integer(table(myData2)),
fill = levels(myData2))) +
ylab("count") + xlab("myData")
Note that the bin midpoints (computed from breaks) are plotted on the x-axis instead of levels(myData2) to keep the x-axis continuous. Otherwise each factor label would be plotted, which would clutter the x-axis. Also note that the built-in ggplot2 color palette is used instead of rainbow().
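The aggregation step can also be delegated to base R's hist() with plot = FALSE, which returns the counts and bin midpoints directly; the palette length is then simply the number of bins it reports. This is a sketch under the same assumption as the answer above (bins start on a multiple of binwidth); the data frame name bins is hypothetical, and scale_fill_identity() makes ggplot2 use the rainbow hex codes as literal colors.

```r
# Sketch: let hist() do the binning (without plotting), then draw one
# colored bar per bin.
set.seed(1L)
myData <- abs(rnorm(1000))
binwidth <- 0.1

# Break points on integer multiples of binwidth covering the data range
k <- floor(min(myData) / binwidth):ceiling(max(myData) / binwidth)
breaks <- k * binwidth
# plot = FALSE: hist() only counts; the result has $counts and $mids
h <- hist(myData, breaks = breaks, plot = FALSE)
bins <- data.frame(mid = h$mids, count = h$counts,
                   fill = rainbow(length(h$counts)),
                   stringsAsFactors = FALSE)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(bins, aes(x = mid, y = count, fill = fill)) +
    geom_col(width = binwidth) +
    scale_fill_identity() +  # use the hex codes in 'fill' as-is
    labs(x = "myData")
}
```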
This question was closed as a duplicate of "What is the simplest method to fill the area under a geom_freqpoly line?"
I am plotting a continuous variable on the x-axis against the corresponding counts (not the density) on the y-axis using ggplot2.
This is my code
p <- ggplot(matched.frame, aes(x = AGE, color = as.factor(DRUG_KEY))) + geom_freqpoly(binwidth=5)
p1 <- p + theme_minimal()
plot(p1)
This produces a graph like this: [plot image not included]
I want the areas under these lines to be filled with colors, with a little transparency. I know how to do this for density plots in ggplot2, but I am stuck with this frequency polygon.
Also, how do I change the legend on the right side? For example, I want 'Cases' instead of '26' and 'Controls' instead of '27', and I want the legend title to read 'Colors' instead of as.factor(DRUG_KEY).
Sample data
matched.frame <- data.frame("AGE"=c(18,19,20,21,22,23,24,25,26,26,27,18,19,20,24,23,23,23,22,30,28,89,30,20,23))
matched.frame$DRUG_KEY <- 26
matched.frame$DRUG_KEY[11:25] <- 27
You can use geom_ribbon to fill the area under the curves and scale_fill_discrete (fill color) as well as scale_color_discrete (line color) to change the legend labels:
library(ggplot2)
set.seed(1)
df <- data.frame(x = 1:10, y = runif(20), f = gl(2, 10))
ggplot(df, aes(x=x, ymin=0, ymax=y, fill=f)) +
geom_ribbon(alpha=.5) +
scale_fill_discrete(labels = c("1"="foo", "2"="bar"), name = "Labels")
With regards to your edit:
ggplot(matched.frame, aes(x=AGE, fill=as.factor(DRUG_KEY), color=as.factor(DRUG_KEY))) +
stat_bin(aes(ymax=..count..), alpha=.5, ymin=0, geom="ribbon", binwidth =5, position="identity", pad=TRUE) +
geom_freqpoly(binwidth=5, size=2) +
scale_fill_discrete(labels = c("26"="foo", "27"="bar"), name = "Labels") +
scale_color_discrete(labels = c("26"="foo", "27"="bar"), name = "Labels")
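For the legend part of the question, an alternative to passing labels to each scale is to recode DRUG_KEY once before plotting, with factor(levels = ..., labels = ...), and then give fill and colour the same legend title with labs(), so ggplot2 can merge them into a single legend. The sketch below uses the sample data; the column name Group is hypothetical, and the filled area is drawn with geom_area(stat = "bin") (a swap for the stat_bin() ribbon above) with pad = TRUE so its bins match those of geom_freqpoly().

```r
# Sketch: recode the grouping variable once, so every scale shows
# "Cases"/"Controls", and set one legend title for fill and colour.
matched.frame <- data.frame(AGE = c(18, 19, 20, 21, 22, 23, 24, 25, 26, 26, 27, 18, 19,
                                    20, 24, 23, 23, 23, 22, 30, 28, 89, 30, 20, 23))
matched.frame$DRUG_KEY <- 26
matched.frame$DRUG_KEY[11:25] <- 27
# Hypothetical column name; the labels come straight from the question
matched.frame$Group <- factor(matched.frame$DRUG_KEY, levels = c(26, 27),
                              labels = c("Cases", "Controls"))

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(matched.frame, aes(x = AGE, fill = Group, colour = Group)) +
    # Filled, semi-transparent area under each frequency polygon;
    # position = "identity" overlaps the groups instead of stacking them
    geom_area(stat = "bin", binwidth = 5, pad = TRUE,
              position = "identity", alpha = 0.5) +
    geom_freqpoly(binwidth = 5) +
    labs(fill = "Colors", colour = "Colors") +
    theme_minimal()
}
```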
Is there any way to line up the points of a line plot with the bars of a bar graph using ggplot when they have the same x-axis? Here is the sample data I'm trying to do it with.
library(ggplot2)
library(gridExtra)
data=data.frame(x=rep(1:27, each=5), y = rep(1:5, times = 27))
yes <- ggplot(data, aes(x = x, y = y))
yes <- yes + geom_point() + geom_line()
other_data = data.frame(x = 1:27, y = 50:76 )
no <- ggplot(other_data, aes(x=x, y=y))
no <- no + geom_bar(stat = "identity")
grid.arrange(no, yes)
Here is the output: [plot image not included]
The first point of the line plot is to the left of the first bar, and the last point of the line plot is to the right of the last bar.
Thank you for your time.
Extending @Stibu's post a little: to align the plots, use gtable (or see the answers to your earlier question):
library(ggplot2)
library(gtable)
data=data.frame(x=rep(1:27, each=5), y = rep(1:5, times = 27))
yes <- ggplot(data, aes(x = x, y = y))
yes <- yes + geom_point() + geom_line() +
scale_x_continuous(limits = c(0,28), expand = c(0,0))
other_data = data.frame(x = 1:27, y = 50:76 )
no <- ggplot(other_data, aes(x=x, y=y))
no <- no + geom_bar(stat = "identity") +
scale_x_continuous(limits = c(0,28), expand = c(0,0))
gYes = ggplotGrob(yes) # get the ggplot grobs
gNo = ggplotGrob(no)
plot(rbind(gNo, gYes, size = "first")) # Arrange and plot the grobs
Edit: To change the heights of the plots:
g = rbind(gNo, gYes, size = "first") # Combine the plots
panels <- g$layout$t[grepl("panel", g$layout$name)] # Get the positions for plot panels
g$heights[panels] <- unit(c(0.7, 0.3), "null") # Replace heights with your relative heights
plot(g)
I can think of (at least) two ways to align the x-axes in the two plots:
The two axes do not align because in the bar plot, the geoms cover the x-axis from 0.5 to 27.5, while in the other plot, the data only range from 1 to 27. The reason is that the bars have a width and the points don't. You can force the axes to align by explicitly specifying an x-axis range. Using the definitions from your plot, this can be achieved by
yes <- yes + scale_x_continuous(limits=c(0,28))
no <- no + scale_x_continuous(limits=c(0,28))
grid.arrange(no, yes)
limits sets the range of the x-axis. Note, though, that the alignment is still not quite perfect: the y-axis labels take up a little more space in the upper plot, because the numbers have two digits. The plot looks as follows: [plot image not included]
The other solution is a bit more complicated but it has the advantage that the x-axis is drawn only once and that ggplot makes sure that the alignment is perfect. It makes use of faceting and the trick described in this answer. First, the data must be combined into a single data frame by
all <- rbind(data.frame(other_data,type="other"),data.frame(data,type="data"))
and then the plot can be created as follows:
ggplot(all,aes(x=x,y=y)) + facet_grid(type~.,scales = "free_y") +
geom_bar(data=subset(all,type=="other"),stat="identity") +
geom_point(data=subset(all,type=="data")) +
geom_line(data=subset(all,type=="data"))
The trick is to let the facets be constructed by the variable type, which was used before to label the two data sets. But then each geom only gets the subset of the data that should be drawn with that specific geom. In facet_grid, I also used scales = "free_y" because the two y-axes should be independent. This plot looks as follows: [plot image not included]
You can change the labels of the facets by giving other names when you define the data frame all. If you want to remove them altogether, add the following to your plot:
+ theme(strip.background = element_blank(), strip.text = element_blank())
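A small variation on the first approach is to derive the shared x-range from the data instead of hard-coding c(0, 28), and to give both plots the same scale definition. The sketch assumes the bars sit at integer x positions with geom_bar()'s default width (0.9), so half a unit of padding on each side covers every bar; lims and shared_x() are hypothetical names. As noted above, differing y-axis label widths can still shift the panels slightly unless the plots are combined with gtable.

```r
# Sketch: one data-derived x-range applied to both plots.
data <- data.frame(x = rep(1:27, each = 5), y = rep(1:5, times = 27))
other_data <- data.frame(x = 1:27, y = 50:76)

# Half a unit on each side of the outermost integer positions
lims <- range(c(data$x, other_data$x)) + c(-0.5, 0.5)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  # A function, so each plot gets its own copy of the same scale definition
  shared_x <- function() scale_x_continuous(limits = lims, expand = c(0, 0))
  yes <- ggplot(data, aes(x = x, y = y)) + geom_point() + geom_line() + shared_x()
  no  <- ggplot(other_data, aes(x = x, y = y)) + geom_bar(stat = "identity") + shared_x()
}
```

The two plots can then be arranged with grid.arrange(no, yes) or with the rbind() of ggplotGrob()s shown in the other answer.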
I have a simplified dataframe
library(ggplot2)
df <- data.frame(wins=c(1,1,3,1,1,2,1,2,1,1,1,3))
ggplot(df,aes(x=wins))+geom_histogram(binwidth=0.5,fill="red")
I would like to get the final value in the sequence, 3, shown with either a different fill or alpha. One way to identify its value is
tail(df,1)$wins
In addition, I would like the histogram bars to be centered over the numbers. I tried subtracting from the wins value, without success.
You can do this with a single geom_histogram() by using aes(fill = cond).
To choose different colours, use one of the scale_fill_*() functions, e.g. scale_fill_manual(values = c("red", "blue")).
library(ggplot2)
df <- data.frame(wins=c(1,1,3,1,1,2,11,2,11,15,1,1,3))
df$cond <- df$wins == tail(df,1)$wins
ggplot(df, aes(x=wins, fill = cond)) +
geom_histogram(binwidth = 0.5) +
scale_x_continuous(breaks=df$wins+0.25, labels=df$wins) +
scale_fill_manual(values = c("red", "blue"))
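For the centering part, recent versions of ggplot2 let geom_histogram() position the bins directly through its center argument, which avoids offsetting the axis breaks. The sketch below uses the question's original data and a binwidth of 0.5; the highlight column and its labels are hypothetical, chosen so the legend reads naturally. Because center = 1 and the binwidth is 0.5, every bin should be centered on a multiple of 0.5, so the bars for 1, 2 and 3 sit directly over those numbers.

```r
# Sketch: highlight the last value and center the bins on the numbers.
df <- data.frame(wins = c(1, 1, 3, 1, 1, 2, 1, 2, 1, 1, 1, 3))
last_win <- tail(df$wins, 1)
# Hypothetical labels; every row sharing the last value is highlighted
df$highlight <- ifelse(df$wins == last_win, "last value", "other")

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(df, aes(x = wins, fill = highlight)) +
    # center = 1 puts a bin center on 1, hence on every multiple of 0.5
    geom_histogram(binwidth = 0.5, center = 1) +
    scale_fill_manual(values = c("last value" = "blue", "other" = "red")) +
    scale_x_continuous(breaks = sort(unique(df$wins)))
}
```

Switching the fill for alpha would follow the same pattern, mapping alpha = highlight and using scale_alpha_manual().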
1) To draw bins in different colors, you can use geom_histogram() on subsets.
2) To center the bars on the numbers on the x-axis, you can use scale_x_continuous(breaks=..., labels=...)
So, this code
library(ggplot2)
df <- data.frame(wins=c(1,1,3,1,1,2,11,2,11,15,1,1,3))
cond <- df$wins == tail(df,1)$wins
ggplot(df, aes(x=wins)) +
geom_histogram(data=subset(df,cond==FALSE), binwidth=0.5, fill="red") +
geom_histogram(data=subset(df,cond==TRUE), binwidth=0.5, fill="blue") +
scale_x_continuous(breaks=df$wins+0.25, labels=df$wins)
produces the plot: [plot image not included]