I think I'm very close to getting this code done, but I'm missing something here.
I want to "combine" two plots into just one like this:
The first plot has this code:
ggplot(test, aes(y=key,x=value)) +
geom_path()+
coord_flip()
And the second one has this one below:
ggplot(test, aes(x=value, fill=key)) +
geom_density() +
coord_flip()
This kind of multiple distributions plot are often seen in stats book when we read about normal distributions. The most useful link I've got so far was this one here.
Please use this code to reproduce my question:
library(tidyverse)
test <- data.frame(key = c("communication","gross_motor","fine_motor"),
value = rnorm(n=30,mean=0, sd=1))
ggplot(test, aes(x=value, fill=key)) +
geom_density() +
coord_flip()
ggplot(test, aes(y=key,x=value)) +
geom_path(size=2)+
coord_flip()
Thanks much
You might be interested in ridgeline plots from the ggridges package.
Ridgeline plots are partially overlapping line plots that create the impression of a mountain range. They can be quite useful for visualizing changes in distributions over time or space.
library(tidyverse)
library(ggridges)
set.seed(123)
test <- data.frame(
key = c("communication", "gross_motor", "fine_motor"),
value = rnorm(n = 30, mean = 0, sd = 1)
)
ggplot(test, aes(x = value, y = key)) +
geom_density_ridges(scale = 0.9) +
theme_ridges() +
NULL
#> Picking joint bandwidth of 0.525
Add median line:
ggplot(test, aes(x = value, y = key)) +
stat_density_ridges(quantile_lines = TRUE, quantiles = 2, scale = 0.9) +
coord_flip() +
theme_ridges() +
NULL
#> Picking joint bandwidth of 0.525
Simulate a rug:
ggplot(test, aes(x = value, y = key)) +
geom_density_ridges(
jittered_points = TRUE,
position = position_points_jitter(width = 0.05, height = 0),
point_shape = '|', point_size = 3, point_alpha = 1, alpha = 0.7,
) +
theme_ridges() +
NULL
#> Picking joint bandwidth of 0.525
Created on 2018-10-16 by the reprex package (v0.2.1.9000)
I think the easiest way to do this is with facet_wrap(). If you don't like the default appearance of the facets you can tweak them with theme(), e.g.:
ggplot(test, aes(x=value, fill=key)) +
geom_density() +
facet_wrap(~ key) +
coord_flip() +
theme(panel.spacing.x = unit(0, "mm"))
Result:
Related
I have an issue where I would like to calculate locations of y-axis labels in a large plot mad with facet_grid(). Let me show you what I mean using the mpg dataset.
require(tidyverse)
ggplot(mpg, aes(x = displ)) +
geom_point(aes(y = hwy)) +
facet_grid(drv ~ class, scales = "free")
You will notice that both axes use variable numbers of labels. Ignoring problems with the x-axis, I am interested in labelling only three values on the y-axis: 0, half of max value, and max value. It makes sense in my use-case, so here is what I tried.
ggplot(mpg, aes(x = displ)) +
geom_point(aes(y = hwy)) +
facet_grid(drv ~ class, scales = "free") +
geom_blank(aes(y = 0)) + # extends y-axis to 0
scale_y_continuous(expand = expansion(mult = c(0, 0.1)), # prevents ggplot2 from extending beyond y = 0
n.breaks = 3) # Three axis labels, please.
The plot correctly starts at y = 0 and labels it correctly. However, remaining labels are poorly assigned, and have labels n = 2 and n = 4 instead of n = 3 for some reason.
If I could only directly calculate the label positions!
ggplot(mpg, aes(x = displ)) +
geom_point(aes(y = hwy)) +
facet_grid(drv ~ class, scales = "free") +
geom_blank(aes(y = 0)) +
scale_y_continuous(expand = expansion(mult = c(0, 0.1)),
n.breaks = 3,
breaks = c(0, 0.5*max(hwy), 1*max(hwy))) # Maybe these formulas works?
Error in check_breaks_labels(breaks, labels) : object 'hwy' not found
I believe providing break points by this way should work, but that my syntax is bad. How do I access and work with the data underlying the plot? Alternatively, if this doesn't work, can I manually specify y-axis labels for each row of panels?
I could really use some assistance here, please.
If you want custom rules for breaks, the easiest thing is to use a function implementing those rules given the (panel) limits.
Below an example for labeling 0, the max and half-max.
library(ggplot2)
ggplot(mpg, aes(x = displ)) +
geom_point(aes(y = hwy)) +
facet_grid(drv ~ class, scales = "free") +
scale_y_continuous(expand = expansion(mult = c(0, 0.1)),
limits = c(0, NA), # <- fixed min, flexible max. replaces geom_blank
breaks = function(x){c(x[1], mean(x), x[2])})
You can remove scales free: Do you get then what you desire?
require(tidyverse)
ggplot(mpg, aes(x = displ)) +
geom_point(aes(y = hwy)) +
facet_grid(drv ~ class)
I'm struggling with the following issue:
I want to plot two histograms, but since the statistics of one of the two classes is much less than the other I need to add a second y-axis to allow a direct comparison of the values.
I report below the code I used at the moment and the result.
Thank you in advance!
ggplot(data,aes(x= x ,group=class,fill=class)) + geom_histogram(position="identity",
alpha=0.5, bins = 20)+ theme_bw()
Consider the following situation where you have 800 versus 200 observations:
library(ggplot2)
df <- data.frame(
x = rnorm(1000, rep(c(1, 2), c(800, 200))),
class = rep(c("A", "B"), c(800, 200))
)
ggplot(df, aes(x, fill = class)) +
geom_histogram(bins = 20, position = "identity", alpha = 0.5,
# Note that y = stat(count) is the default behaviour
mapping = aes(y = stat(count)))
You could scale the counts for each group to a maximum of 1 by using y = stat(ncount):
ggplot(df, aes(x, fill = class)) +
geom_histogram(bins = 20, position = "identity", alpha = 0.5,
mapping = aes(y = stat(ncount)))
Alternatively, you can set y = stat(density) to have the total area integrate to 1.
ggplot(df, aes(x, fill = class)) +
geom_histogram(bins = 20, position = "identity", alpha = 0.5,
mapping = aes(y = stat(density)))
Note that after ggplot 3.3.0 stat() probably will get replaced by after_stat().
How about comparing them side by side with facets?
ggplot(data,aes(x= x ,group=class,fill=class)) +
geom_histogram(position="identity",
alpha=0.5,
bins = 20) +
theme_bw() +
facet_wrap(~class, scales = "free_y")
I am creating a grouped boxplot with a scatterplot overlay using ggplot2. I would like to group each scatterplot datapoint with the grouped boxplot that it corresponds to.
However, I'd also like the scatterplot points to be different symbols. I seem to be able to get my scatterplot points to group with my grouped boxplots OR get my scatterplot points to be different symbols... but not both simultaneously. Below is some example code to illustrate what's happening:
library(scales)
library(ggplot2)
# Generates Data frame to plot
Gene <- c(rep("GeneA",24),rep("GeneB",24),rep("GeneC",24),rep("GeneD",24),rep("GeneE",24))
Clone <- c(rep(c("D1","D2","D3","D4","D5","D6"),20))
variable <- c(rep(c(rep("Day10",6),rep("Day20",6),rep("Day30",6),rep("Day40",6)),5))
value <- c(rnorm(24, mean = 0.5, sd = 0.5),rnorm(24, mean = 10, sd = 8),rnorm(24, mean = 1000, sd = 900),
rnorm(24, mean = 25000, sd = 9000), rnorm(24, mean = 8000, sd = 3000))
value <- sqrt(value*value)
Tdata <- cbind(Gene, Clone, variable)
Tdata <- data.frame(Tdata)
Tdata <- cbind(Tdata,value)
# Creates the Plot of All Data
# The below code groups the data exactly how I'd like but the scatter plot points are all the same shape
# and I'd like them to each have different shapes.
ln_clr <- "black"
bk_clr <- "white"
point_shapes <- c(0,15,1,16,2,17)
blue_cols <- c("#EFF2FB","#81BEF7","#0174DF","#0000FF","#0404B4")
lp1 <- ggplot(Tdata, aes(x=variable, y=value, fill=Gene)) +
stat_boxplot(geom ='errorbar', position = position_dodge(width = .83), width = 0.25,
size = 0.7, coef = 4) +
geom_boxplot( coef=1, outlier.shape = NA, position = position_dodge(width = .83), lwd = 0.3,
alpha = 1, colour = ln_clr) +
geom_point(position = position_jitterdodge(dodge.width = 0.83), size = 1.8, alpha = 0.7,
pch=15)
lp1 + scale_fill_manual(values = blue_cols) + labs(y = "Fold Change") +
expand_limits(y=c(0.01,10^5)) +
scale_y_log10(expand = c(0, 0), breaks = c(0.01,1,100,10000,100000),
labels = trans_format("log10", math_format(10^.x)))
ggsave("Scatter Grouped-Wrong Symbols.png")
#*************************************************************************************************************************************
# The below code doesn't group the scatterplot data how I'd like but the points each have different shapes
lp2 <- ggplot(Tdata, aes(x=variable, y=value, fill=Gene)) +
stat_boxplot(geom ='errorbar', position = position_dodge(width = .83), width = 0.25,
size = 0.7, coef = 4) +
geom_boxplot( coef=1, outlier.shape = NA, position = position_dodge(width = .83), lwd = 0.3,
alpha = 1, colour = ln_clr) +
geom_point(position = position_jitterdodge(dodge.width = 0.83), size = 1.8, alpha = 0.7,
aes(shape=Clone))
lp2 + scale_fill_manual(values = blue_cols) + labs(y = "Fold Change") +
expand_limits(y=c(0.01,10^5)) +
scale_y_log10(expand = c(0, 0), breaks = c(0.01,1,100,10000,100000),
labels = trans_format("log10", math_format(10^.x)))
ggsave("Scatter Ungrouped-Right Symbols.png")
If anyone has any suggestions I'd really appreciate it.
Thank you
Nathan
To get the boxplots to appear, the shape aesthetic needs to be inside geom_point, rather than in the main call to ggplot. The reason for this is that when the shape aesthetic is in the main ggplot call, it applies to all the geoms, including geom_boxplot. However, applying a shape=Clone aesthetic causes geom_boxplot to create a separate boxplot for each level of Clone. Since there's only one row of data for each combination of variable and Clone, no boxplot is produced.
That the shape aesthetic affects geom_boxplot seems counterintuitive to me, but maybe there's a reason for it that I'm not aware of. In any case, moving the shape aesthetic into geom_point solves the problem by applying the shape aesthetic only to geom_point.
Then, to get the points to appear with the correct boxplot, we need to group by Gene. I also added theme_classic to make it easier to see the plot (although it's still very busy):
ggplot(Tdata, aes(x=variable, y=value, fill=Gene)) +
stat_boxplot(geom ='errorbar', width=0.25, size=0.7, coef=4, position=position_dodge(0.85)) +
geom_boxplot(coef=1, outlier.shape=NA, lwd=0.3, alpha=1, colour=ln_clr, position=position_dodge(0.85)) +
geom_point(position=position_jitterdodge(dodge.width=0.85), size=1.8, alpha=0.7,
aes(shape=Clone, group=Gene)) +
scale_fill_manual(values=blue_cols) + labs(y="Fold Change") +
expand_limits(y=c(0.01,10^5)) +
scale_y_log10(expand=c(0, 0), breaks=10^(-2:5),
labels=trans_format("log10", math_format(10^.x))) +
theme_classic()
I think the plot would be easier to understand if you use faceting for Gene and the x-axis for variable. Putting time on the x-axis seems more intuitive, while using facetting frees up the color aesthetic for the points. With six different clones, it's still difficult (for me at least) to differentiate the point markers, but this looks cleaner to me than the previous version.
library(dplyr)
ggplot(Tdata %>% mutate(Gene=gsub("Gene","Gene ", Gene)),
aes(x=gsub("Day","",variable), y=value)) +
stat_boxplot(geom='errorbar', width=0.25, size=0.7, coef=4) +
geom_boxplot(coef=1, outlier.shape=NA, lwd=0.3, alpha=1, colour=ln_clr, width=0.5) +
geom_point(aes(fill=Clone), position=position_jitter(0.2), size=1.5, alpha=0.7, shape=21) +
theme_classic() +
facet_grid(. ~ Gene) +
labs(y = "Fold Change", x="Day") +
expand_limits(y=c(0.01,10^5)) +
scale_y_log10(expand=c(0, 0), breaks=10^(-2:5),
labels=trans_format("log10", math_format(10^.x)))
If you really need to keep the points, maybe it would be better to separate the boxplots and points with some manual dodging:
set.seed(10)
ggplot(Tdata %>% mutate(Day=as.numeric(substr(variable,4,5)),
Gene = gsub("Gene","Gene ", Gene)),
aes(x=Day - 2, y=value, group=Day)) +
stat_boxplot(geom ='errorbar', width=0.5, size=0.5, coef=4) +
geom_boxplot(coef=1, outlier.shape=NA, lwd=0.3, alpha=1, width=4) +
geom_point(aes(x=Day + 2, fill=Clone), size=1.5, alpha=0.7, shape=21,
position=position_jitter(width=1, height=0)) +
theme_classic() +
facet_grid(. ~ Gene) +
labs(y="Fold Change", x="Day") +
expand_limits(y=c(0.01,10^5)) +
scale_y_log10(expand=c(0, 0), breaks=10^(-2:5),
labels=trans_format("log10", math_format(10^.x)))
One more thing: For future reference, you can simplify your data creation code:
Gene = rep(paste0("Gene",LETTERS[1:5]), each=24)
Clone = rep(paste0("D",1:6), 20)
variable = rep(rep(paste0("Day", seq(10,40,10)), each=6), 5)
value = rnorm(24*5, mean=rep(c(0.5,10,1000,25000,8000), each=24),
sd=rep(c(0.5,8,900,9000,3000), each=24))
Tdata = data.frame(Gene, Clone, variable, value)
Using ggplot2 1.0.0, I followed the instructions in below post to figure out how to plot percentage bar plots across factors:
Sum percentages for each facet - respect "fill"
test <- data.frame(
test1 = sample(letters[1:2], 100, replace = TRUE),
test2 = sample(letters[3:8], 100, replace = TRUE)
)
library(ggplot2)
library(scales)
ggplot(test, aes(x= test2, group = test1)) +
geom_bar(aes(y = ..density.., fill = factor(..x..))) +
facet_grid(~test1) +
scale_y_continuous(labels=percent)
However, I cannot seem to get a label for either the total count or the percentage above each of the bar plots when using geom_text.
What is the correct addition to the above code that also preserves the percentage y-axis?
Staying within ggplot, you might try
ggplot(test, aes(x= test2, group=test1)) +
geom_bar(aes(y = ..density.., fill = factor(..x..))) +
geom_text(aes( label = format(100*..density.., digits=2, drop0trailing=TRUE),
y= ..density.. ), stat= "bin", vjust = -.5) +
facet_grid(~test1) +
scale_y_continuous(labels=percent)
For counts, change ..density.. to ..count.. in geom_bar and geom_text
UPDATE for ggplot 2.x
ggplot2 2.0 made many changes to ggplot including one that broke the original version of this code when it changed the default stat function used by geom_bar ggplot 2.0.0. Instead of calling stat_bin, as before, to bin the data, it now calls stat_count to count observations at each location. stat_count returns prop as the proportion of the counts at that location rather than density.
The code below has been modified to work with this new release of ggplot2. I've included two versions, both of which show the height of the bars as a percentage of counts. The first displays the proportion of the count above the bar as a percent while the second shows the count above the bar. I've also added labels for the y axis and legend.
library(ggplot2)
library(scales)
#
# Displays bar heights as percents with percentages above bars
#
ggplot(test, aes(x= test2, group=test1)) +
geom_bar(aes(y = ..prop.., fill = factor(..x..)), stat="count") +
geom_text(aes( label = scales::percent(..prop..),
y= ..prop.. ), stat= "count", vjust = -.5) +
labs(y = "Percent", fill="test2") +
facet_grid(~test1) +
scale_y_continuous(labels=percent)
#
# Displays bar heights as percents with counts above bars
#
ggplot(test, aes(x= test2, group=test1)) +
geom_bar(aes(y = ..prop.., fill = factor(..x..)), stat="count") +
geom_text(aes(label = ..count.., y= ..prop..), stat= "count", vjust = -.5) +
labs(y = "Percent", fill="test2") +
facet_grid(~test1) +
scale_y_continuous(labels=percent)
The plot from the first version is shown below.
This is easier to do if you pre-summarize your data. For example:
library(ggplot2)
library(scales)
library(dplyr)
set.seed(25)
test <- data.frame(
test1 = sample(letters[1:2], 100, replace = TRUE),
test2 = sample(letters[3:8], 100, replace = TRUE)
)
# Summarize to get counts and percentages
test.pct = test %>% group_by(test1, test2) %>%
summarise(count=n()) %>%
mutate(pct=count/sum(count))
ggplot(test.pct, aes(x=test2, y=pct, colour=test2, fill=test2)) +
geom_bar(stat="identity") +
facet_grid(. ~ test1) +
scale_y_continuous(labels=percent, limits=c(0,0.27)) +
geom_text(data=test.pct, aes(label=paste0(round(pct*100,1),"%"),
y=pct+0.012), size=4)
(FYI, you can put the labels inside the bar as well, for example, by changing the last line of code to this: y=pct*0.5), size=4, colour="white"))
I've used all of your code and came up with this. First assign your ggplot to a variable i.e. p <- ggplot(...) + geom_bar(...) etc. Then you could do this. You don't need to summarize much since ggplot has a build function that gives you all of this already. I'll leave it to you for the formatting and such. Good luck.
dat <- ggplot_build(p)$data %>% ldply() %>% select(group,density) %>%
do(data.frame(xval = rep(1:6, times = 2),test1 = mapvalues(.$group, from = c(1,2), to = c("a","b")), density = .$density))
p + geom_text(data=dat, aes(x = xval, y = (density + .02), label = percent(density)), colour="black", size = 3)
I have a ggplot2 linegraph with two lines featuring significant overlap. I'm trying to use position_jitterdodge() so that they are more visible, but I can't get the lines and points to both jitter in the same way. I'm trying to jitter the points and line horizontally only (as I don't want to suggest any change on the y-axis). Here is an MWE:
## Create data frames
dimension <- factor(c("A", "B", "C", "D"))
df <- data.frame("dimension" = rep(dimension, 2),
"value" = c(20, 21, 34, 32,
20, 21, 36, 29),
"Time" = c(rep("First", 4), rep("Second", 4)))
## Plot it
ggplot(data = df, aes(x = dimension, y = value,
shape = Time, linetype = Time, group = Time)) +
geom_line(position = position_jitterdodge(dodge.width = 0.45)) +
geom_point(position = position_jitterdodge(dodge.width = 0.45)) +
xlab("Dimension") + ylab("Value")
Which produces the ugly:
I've obviously got something fundamentally wrong here: What should I do to make the geom_point jitter follow the geom_line jitter?
Another option for horizontal only would be to specify position_dodge and pass this to the position argument for each geom.
pd <- position_dodge(0.4)
ggplot(data = df, aes(x = dimension, y = value,
shape = Time, linetype = Time, group = Time)) +
geom_line(position = pd) +
geom_point(position = pd) +
xlab("Dimension") + ylab("Value")
One solution is to manually jitter the points:
df$value_j <- jitter(df$value)
ggplot(df, aes(dimension, value_j, shape=Time, linetype=Time, group=Time)) +
geom_line() +
geom_point() +
labs(x="Dimension", y="Value")
The horizontal solution for your discrete X axis isn't as clean (it's clean under the covers when ggplot2 does it since it handles the axis and point transformations for you quite nicely) but it's doable:
df$dim_j <- jitter(as.numeric(factor(df$dimension)))
ggplot(df, aes(dim_j, value, shape=Time, linetype=Time, group=Time)) +
geom_line() +
geom_point() +
scale_x_continuous(labels=dimension) +
labs(x="Dimension", y="Value")
On July 2017, developpers of ggplot2 have added a seed argument on position_jitter function (https://github.com/tidyverse/ggplot2/pull/1996).
So, now (here: ggplot2 3.2.1) you can pass the argument seed to position_jitter in order to have the same jitter effect in geom_point and geom_line (see the official documentation: https://ggplot2.tidyverse.org/reference/position_jitter.html)
Note that this seed argument does not exist (yet) in geom_jitter.
ggplot(data = df, aes(x = dimension, y = value,
shape = Time, linetype = Time, group = Time)) +
geom_line(position = position_jitter(width = 0.25, seed = 123)) +
geom_point(position = position_jitter(width = 0.25, seed = 123)) +
xlab("Dimension") + ylab("Value")