Change the smoothness on a volcano plot - r

I have produced a volcano plot, however the underlying data has gaps, i.e. the histogram data looks like:
When I produce the volcano plot it looks a bit silly:
Is it possible to apply a smoother to the shaded area to iron out the ribbed feature; surely, it must already have a smoothness associated with it, otherwise the shadow would drop back to 0 each time?
Code:
ggplot(fly2[fly2$Region == "different",], aes(x = Probability)) +
stat_density(aes(ymax = ..density.., ymin = -..density..),
fill = "grey50", colour = "grey50",
geom = "ribbon", position = "identity") +
facet_grid(. ~ Algorithm) + xlim(0,0.3) +
coord_flip()
link to the dput file:
http://pastebin.com/ba95WEab

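Use the adjust argument of stat_density() (geom_density() accepts it too). It multiplies the default bandwidth, so values above 1 give a smoother curve.
For example, when I use adjust = 1.6, this is what I get: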
ggplot(fly2[fly2$Region == "different",], aes(x = Probability)) +
stat_density(aes(ymax = ..density.., ymin = -..density..),
fill = "grey50", colour = "grey50",
geom = "ribbon", position = "identity",
adjust=1.6) +
facet_grid(. ~ Algorithm) + xlim(0,0.3) +
coord_flip()
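To see what adjust does without drawing anything, here is a minimal sketch using base R's density(), which applies the same kind of bandwidth multiplier. The sample x is made-up data with gaps between clusters, not the fly2 data from the question.

```r
# Made-up sample with gaps between clusters (an assumption, not the fly2 data).
set.seed(42)
x <- c(rnorm(50, 0.05, 0.005),
       rnorm(50, 0.10, 0.005),
       rnorm(50, 0.15, 0.005))

d_default <- density(x)                # adjust = 1 by default
d_smooth  <- density(x, adjust = 1.6)  # bandwidth scaled by 1.6

# adjust multiplies the automatically chosen bandwidth.
bw_ratio <- d_smooth$bw / d_default$bw
print(bw_ratio)
```

A larger bandwidth spreads each observation over a wider kernel, which is what irons out the ribs. Set it too high and genuine structure in the data is blurred as well.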

Is there a way to change the symbol shape in a ggplot2 legend?

I made a plot using ggplot in R and wanted to have labels with the values that correspond with each of the points on the plot. I used the function geom_label_repel to get the labels on the plot, but it seems like it changed the legend to a letter instead of a dot, which I don't really like. Is there a way to change the shape of the symbol in the legend to a dot instead?
Here's the code for my plot:
ggplot(CI_bar_df_null, mapping = aes(x = model, y = R2, color = condition, group = condition)) +
geom_point() +
geom_line() +
geom_label_repel(
label = CI_bar_df_null$R2factor,
nudge_x = .3, nudge_y = 0) +
geom_errorbar(aes(ymin = CI_lower, ymax = CI_upper), width = .3) +
#ylim(-.06, .6)
ggtitle('Model R-Squared Values and Confidence Intervals') +
ylab('R-Squared Value') +
xlab('Model Type') +
scale_color_discrete(name = "Condition") +
scale_shape_discrete(shape = 17)
You can simply add show.legend = FALSE inside geom_label_repel(). This stops geom_label_repel from drawing its own legend key (the letter) and leaves the keys drawn by geom_point and geom_line intact.
geom_label_repel(
label = mtcars$cyl,
nudge_x = .3,
nudge_y = 0,
show.legend = FALSE)
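A self-contained sketch of the same idea follows. It uses ggplot2's built-in geom_label as a stand-in for ggrepel::geom_label_repel, on the assumption that both handle show.legend the same way. The data frame df is a made-up toy, not the asker's CI_bar_df_null.

```r
library(ggplot2)

# Toy data (hypothetical); geom_label stands in for ggrepel::geom_label_repel.
df <- data.frame(
  model     = c(1, 2, 1, 2),
  R2        = c(0.2, 0.4, 0.3, 0.5),
  condition = c("a", "a", "b", "b")
)

p <- ggplot(df, aes(x = model, y = R2, color = condition, group = condition)) +
  geom_point() +
  geom_line() +
  # Keep the label layer out of the legend so the key shows only point and line.
  geom_label(aes(label = R2), show.legend = FALSE)

# The third layer (the labels) is flagged as excluded from the legend.
label_layer_in_legend <- p$layers[[3]]$show.legend
```

Mapping the label inside aes() instead of passing df$R2 directly keeps the layer tied to the plot data, which matters if the data is later filtered or faceted.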

Difficulty adjusting color in ggplot2 using r

I am trying to match plots I make in R to plots I make in python using matplotlib.
The current code I use does not match the custom colors I want to use correctly. What can I change to get this to work correctly?
My main concern is matching the colour of the lines.
I am trying to use scale_fill_manual. This changes the colours, but not in the way I want.
ggplot(data = reactor.summarised.ci, aes(x=standard_time, y=value, group =
group, colour=group)) +
geom_line(size = 0.25)+
geom_ribbon(aes(x = standard_time, ymin = lower.ci.od, ymax = upper.ci.od),
show.legend =FALSE, alpha =0.2, colour = NA)+
ylab("O.D.")+
xlab("Time (min)")+
xlim(0, 350)+
ggtitle('OD Over Time in in Bioreactor 1.02 before adjustment')+
theme(plot.title = element_text(hjust = 0.5))+
scale_fill_manual(values=c("#1f77b4", "#ff7f0e", "#2ca02c", '#d62728'))+
newtheme
newtheme is defined as follows:
newtheme <- theme_classic()+
theme(plot.title = element_text(hjust = 0.5))
I want to assign the following colours to the figure legend, ribbon and line.
(ReactorA = "#1f77b4",
ReactorB = "#ff7f0e",
ReactorC = "#2ca02c",
ReactorD = "#d62728")
current plot generated
You are mapping group to color, not fill, so scale_fill_manual has nothing to act on. Use scale_color_manual to set the color of the lines. Try this:
ggplot(data = reactor.summarised.ci, aes(x=standard_time, y=value, group =
group, colour=group)) +
geom_line(size = 0.25)+
geom_ribbon(aes(x = standard_time, ymin = lower.ci.od, ymax = upper.ci.od),
show.legend =FALSE, alpha =0.2, colour = NA)+
ylab("O.D.")+
xlab("Time (min)")+
xlim(0, 350)+
ggtitle('OD Over Time in in Bioreactor 1.02 before adjustment')+
theme(plot.title = element_text(hjust = 0.5)) +
scale_color_manual(values=c("reactorA" = "#1f77b4", "reactorB" = "#ff7f0e", "reactorC" = "#2ca02c", "reactorD" = '#d62728'))+
newtheme
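The question also asks for the ribbon to take the same colours. One possible way, sketched below, is to map fill to group as well and let a single manual scale serve both aesthetics through its aesthetics argument (available in ggplot2 3.0.0 and later). The demo data frame and its values are invented for illustration. The level names reactorA to reactorD are assumed to match the names used in the answer above.

```r
library(ggplot2)

reactor_cols <- c(reactorA = "#1f77b4", reactorB = "#ff7f0e",
                  reactorC = "#2ca02c", reactorD = "#d62728")

# Invented stand-in for reactor.summarised.ci (an assumption).
demo <- data.frame(
  standard_time = rep(seq(0, 350, by = 50), times = 4),
  group         = rep(names(reactor_cols), each = 8)
)
demo$value       <- demo$standard_time / 350 + as.integer(factor(demo$group)) * 0.1
demo$lower.ci.od <- demo$value - 0.05
demo$upper.ci.od <- demo$value + 0.05

p <- ggplot(demo, aes(x = standard_time, y = value, group = group,
                      colour = group, fill = group)) +
  # colour = NA removes the ribbon outline; its fill still comes from the scale.
  geom_ribbon(aes(ymin = lower.ci.od, ymax = upper.ci.od),
              alpha = 0.2, colour = NA, show.legend = FALSE) +
  geom_line() +
  # One scale definition applied to both the colour and the fill aesthetic.
  scale_colour_manual(values = reactor_cols, aesthetics = c("colour", "fill"))
```

Naming the values vector ties each colour to a factor level, so the assignment does not depend on the order of the levels.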

Add standard error as shaded area instead of errorbars in geom_boxplot

I have my boxplot and I added the mean with stat_summary as a line over the box plot. I want to add the standard error, but I don't want errorbar.
Basically, I want to add the standard error as shaded area, as you can do using geom_ribbon.
I used the PlantGrowth dataset to show you briefly what I've tried.
library(ggplot2)
ggplot(PlantGrowth, aes(group, weight))+
stat_boxplot( geom='errorbar', linetype=1, width=0.5)+
geom_boxplot(fill="yellow4",colour="black",outlier.shape=NA) +
stat_summary(fun.y=mean, colour="black", geom="line", shape=18, size=1,aes(group=1))+
stat_summary(fun.data = mean_se, geom = "errorbar")
I did it using geom_errorbar in stat_summary, and tried to substitute geom_errorbar with geom_ribbon, as I saw in some other examples around the web, but it doesn't work.
Something like this one, but with the error as shaded area instead of error bars (which make it a bit confusing to see)
Layering so many geoms becomes hard to read, but here's a simplified version with a few options. Aside from just paring things down a bit to see what I was editing, I added a tile as a summary geom; tile is similar to rect, except it assumes it will be centered at whatever its x value is, so you don't need to worry about the x-axis placement that geom_rect requires. You might experiment with fill colors and opacity—I made the boxplots white just to illustrate better.
library(ggplot2)
gg <- ggplot(PlantGrowth, aes(x = group, y = weight)) +
stat_boxplot(geom = "errorbar", width = 0.5) +
geom_boxplot(fill = "white", outlier.shape = NA, width = 0.7) +
stat_summary(aes(group = 1), fun = mean, geom = "line") # fun replaces the deprecated fun.y (ggplot2 >= 3.3.0)
gg +
stat_summary(fun.data = mean_se, geom = "tile", width = 0.7,
fill = "pink", alpha = 0.6)
Based on your comments that you want a ribbon, you could instead use a ribbon with group = 1 the same as for the line.
gg +
stat_summary(aes(group = 1), fun.data = mean_se, geom = "ribbon",
fill = "pink", alpha = 0.6)
The ribbon doesn't make a lot of sense across a discrete variable, but here's an example with some dummy data for a continuous group, where this setup becomes more reasonable (though IMO still hard to read).
pg2 <- PlantGrowth
set.seed(123)
pg2$cont_group <- floor(runif(nrow(pg2), 1, 6))
ggplot(pg2, aes(x = cont_group, y = weight, group = cont_group)) +
stat_boxplot(geom = "errorbar", width = 0.5) +
geom_boxplot(fill = "white", outlier.shape = NA, width = 0.7) +
stat_summary(aes(group = 1), fun = mean, geom = "line") + # fun replaces the deprecated fun.y (ggplot2 >= 3.3.0)
stat_summary(aes(group = 1), fun.data = mean_se, geom = "ribbon",
fill = "pink", alpha = 0.6)
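For reference, the band drawn by fun.data = mean_se is the mean plus or minus one standard error. The short sketch below checks that against a hand calculation for the ctrl group of PlantGrowth. The variable names band and by_hand are only illustrative.

```r
library(ggplot2)

ctrl <- PlantGrowth$weight[PlantGrowth$group == "ctrl"]

# What stat_summary(fun.data = mean_se) computes for one group:
# a one-row data frame with columns y, ymin and ymax.
band <- mean_se(ctrl)

# The same numbers by hand: mean plus or minus sd / sqrt(n).
se <- sd(ctrl) / sqrt(length(ctrl))
by_hand <- data.frame(
  y    = mean(ctrl),
  ymin = mean(ctrl) - se,
  ymax = mean(ctrl) + se
)

print(band)
print(by_hand)
```

Because the geom only needs ymin and ymax at each x, tile, ribbon, and errorbar can all be driven by the same summary.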

Preventing wrong density plots when coloring histograms according to groups

Based on some dummy data, I created a histogram with a density plot:
set.seed(1234)
wdata = data.frame(
sex = factor(rep(c("F", "M"), each=200)),
weight = c(rnorm(200, 55), rnorm(200, 58))
)
a <- ggplot(wdata, aes(x = weight))
a + geom_histogram(aes(y = ..density..,
# color = sex
),
colour="black",
fill="white",
position = "identity") +
geom_density(alpha = 0.2,
# aes(color = sex)
) +
scale_color_manual(values = c("#868686FF", "#EFC000FF"))
The histogram of weight should be colored by sex, so I use aes(y = ..density.., color = sex) for geom_histogram():
a + geom_histogram(aes(y = ..density..,
color = sex
),
colour="black",
fill="white",
position = "identity") +
geom_density(alpha = 0.2,
# aes(color = sex)
) +
scale_color_manual(values = c("#868686FF", "#EFC000FF"))
As I want, the density plot stays the same (computed over both groups together), but the histogram bars jump up in scale (each group now seems to be normalised on its own):
How do I prevent this from happening? I need individually colored histogram bars but a joint density plot for all coloring groups.
P.S.
Using aes(color = sex) for geom_density() gets everything back to original scales - but I don't want individual density plots (like below):
a + geom_histogram(aes(y = ..density..,
color = sex
),
colour="black",
fill="white",
position = "identity") +
geom_density(alpha = 0.2,
aes(color = sex)
) +
scale_color_manual(values = c("#868686FF", "#EFC000FF"))
EDIT:
As it has been suggested, dividing by the number of groups in geom_histogram()'s aesthetics with y = ..density../2 may approximate the solution. Nevertheless, this only works with symmetric distributions like in the first output below:
a + geom_histogram(aes(y = ..density../2,
color = sex
),
colour="black",
fill="white",
position = "identity") +
geom_density(alpha = 0.2,
) +
scale_color_manual(values = c("#868686FF", "#EFC000FF"))
which yields
Less symmetric distributions, however, may cause trouble using this approach. See those below, where for 5 groups, y = ..density../5 was used. First original, then manipulation (with position = "stack"):
Since the distribution is heavy on the left, dividing by 5 underestimates on the left and overestimates on the right.
EDIT 2: SOLUTION
As suggested by Andrew, the below (complete) code solves the problem:
library(ggplot2)
set.seed(1234)
wdata = data.frame(
sex = factor(rep(c("F", "M"), each = 200)),
weight = c(rnorm(200, 55), rnorm(200, 58))
)
binwidth <- 0.25
a <- ggplot(wdata,
aes(x = weight,
# Pass binwidth to aes() so it will be found in
# geom_histogram()'s aes() later
binwidth = binwidth))
# Basic plot w/o colouring according to 'sex'
a + geom_histogram(aes(y = ..density..),
binwidth = binwidth,
colour = "black",
fill = "white",
position = "stack") +
geom_density(alpha = 0.2) +
scale_color_manual(values = c("#868686FF", "#EFC000FF")) +
# Use fixed scale for sake of comparability
scale_x_continuous(limits = c(52, 61)) +
scale_y_continuous(limits = c(0, 0.25))
# Plot w/ colouring according to 'sex'
a + geom_histogram(aes(x = weight,
# binwidth will only be found if passed to
# ggplot()'s aes() (as above)
y = ..count.. / (sum(..count..) * binwidth),
color = sex),
binwidth = binwidth,
fill="white",
position = "stack") +
geom_density(alpha = 0.2) +
scale_color_manual(values = c("#868686FF", "#EFC000FF")) +
# Use fixed scale for sake of comparability
scale_x_continuous(limits = c(52, 61)) +
scale_y_continuous(limits = c(0, 0.25)) +
guides(color = "none")
Note:
binwidth = binwidth needed to be passed to ggplot()'s aes(), otherwise the pre-specified binwidth would not be found by geom_histogram()'s aes(). Further, position = "stack" is specified, so that both versions of the histogram are comparable. Plots for dummy data and the more complex distribution below:
Solved - Thanks for your help!
I don't think you can do it using y=..density.., but you can recreate the same thing like this...
binwidth <- 0.25 #easiest to set this manually so that you know what it is
a + geom_histogram(aes(y = ..count.. / (sum(..count..) * binwidth),
color = sex),
binwidth = binwidth,
fill="white",
position = "identity") +
geom_density(alpha = 0.2) +
scale_color_manual(values = c("#868686FF", "#EFC000FF"))
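Why this works can be checked with plain arithmetic, without ggplot2. The sketch below bins the same dummy data by hand. It assumes, as the answer implies, that sum(..count..) runs over the bars of all groups. The names counts, joint_height, and per_group_height are illustrative only.

```r
set.seed(1234)
sex    <- factor(rep(c("F", "M"), each = 200))
weight <- c(rnorm(200, 55), rnorm(200, 58))

binwidth <- 0.25
breaks   <- seq(floor(min(weight)), ceiling(max(weight)), by = binwidth)

# Bin counts per group: one column per level of sex.
counts <- sapply(split(weight, sex),
                 function(w) hist(w, breaks = breaks, plot = FALSE)$counts)

# Joint scaling: divide every bar by the total count over ALL groups.
joint_height <- counts / (sum(counts) * binwidth)
joint_area   <- sum(joint_height) * binwidth

# Per-group scaling (what ..density.. does): each group is divided by its own total.
per_group_height <- sweep(counts, 2, colSums(counts) * binwidth, "/")
per_group_area   <- sum(per_group_height) * binwidth

print(c(joint = joint_area, per_group = per_group_area))
```

With the joint scaling, the bars of all groups together cover an area of 1, which matches the single overall density curve. With per-group scaling, each group covers an area of 1 on its own. Each bar is then too tall by a factor of N / n_g (total count over group count), so dividing by the number of groups is exact only when all groups are the same size.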

Try to draw a line with abline?

ggplot(G, aes(x = State, y = Score, fill = State)) +
geom_bar(stat = "identity", position = "identity", width = 0.5) +
scale_y_continuous(labels = scales::comma) +
coord_flip()
This is the code I am using. I am trying to add a line at a score of 236. How can I do that, and how can I improve the chart in general? Any edits or suggestions are welcome.
Just use:
geom_hline(yintercept = 236)
It might be worthwhile to reorder your y axis and use fill = Score. With an Assault column standing in for Score, it would make the plot look something like this:
df %>%
ggplot(aes(reorder(State, -Assault), Assault, fill = Assault)) +
geom_col(width = 0.75, aes(fill = Assault)) +
labs(x = "State") +
geom_hline(yintercept = 200, size = 1) +
coord_flip() +
theme_classic()
You can use geom_hline() to do this. Since you have so many bars, add the line after your geom_bar() so that it is drawn on top of the bars (rather than underneath, where you can only barely see it).
ggplot(G, aes(x = State, y = Score, fill = State)) +
geom_bar(stat = "identity", position = "identity", width = 0.5) +
geom_hline(yintercept=236, color="#000000", linetype="solid") +
scale_y_continuous(labels = scales::comma) +
coord_flip()
All I've done there is add the third line to your example above. ggplot2 always confuses me with horizontal and vertical, especially when you do things like coord_flip(). I think I've got it correct (it looks wrong only because of the flip), but if I'm mistaken and the line comes out horizontal, replace that third line above with this:
geom_vline(xintercept=236, color="#000000", linetype="solid") +
and note the only two changes, which are that hline becomes vline and xintercept becomes yintercept.
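A small sketch may help settle the horizontal versus vertical question. Score is mapped to y, so the reference line belongs to the y aesthetic (geom_hline with yintercept). coord_flip() then only changes where that axis is drawn, so the line appears vertical on screen. The three-row data frame G below is a hypothetical stand-in for the asker's data.

```r
library(ggplot2)

# Hypothetical stand-in for the asker's data frame G.
G <- data.frame(State = c("A", "B", "C"), Score = c(150, 236, 300))

p <- ggplot(G, aes(x = State, y = Score)) +
  geom_col(width = 0.5) +
  # Score lives on the y aesthetic, so the line is defined with yintercept.
  geom_hline(yintercept = 236) +
  # The flip only swaps the drawn axes; the mapping above is unchanged.
  coord_flip()

hline_at <- layer_data(p, 2)$yintercept
```

Defining the line in terms of the mapped aesthetic means it stays at Score = 236 whether or not coord_flip() is kept.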
