To make it clear, I am looking for a simple way of adding a 90-degree-rotated histogram or density plot whose x-axis aligns with the y-axis of the example plot given below.
library(ggplot2)
library(tibble)
x <- seq(100)
y <- rnorm(100)
my_data <- tibble(x = x, y = y)
ggplot(data = my_data, mapping = aes(x = x, y = y)) +
geom_line()
Created on 2019-01-28 by the reprex package (v0.2.1)
I'd try it with either geom_histogram or geom_density, the patchwork library, and dynamically setting limits to match the plots.
Rather than manually setting limits, get the range of y-values, set that as the limits in scale_y_continuous or scale_x_continuous as appropriate, and add some padding with expand_scale. The first plot is the line plot, and the second and third are distribution plots, with the axes flipped. All have the scales set to match.
library(ggplot2)
library(tibble)
library(patchwork)
y_range <- range(my_data$y)
p1 <- ggplot(data = my_data, mapping = aes(x = x, y = y)) +
geom_line() +
scale_y_continuous(limits = y_range, expand = expand_scale(mult = 0.1))
p2_hist <- ggplot(my_data, aes(x = y)) +
geom_histogram(binwidth = 0.2) +
coord_flip() +
scale_x_continuous(limits = y_range, expand = expand_scale(mult = 0.1))
p2_dens <- ggplot(my_data, aes(x = y)) +
geom_density() +
coord_flip() +
scale_x_continuous(limits = y_range, expand = expand_scale(mult = 0.1))
patchwork allows you to simply add plots to each other, then add the plot_layout function where you can customize the layout.
p1 + p2_hist + plot_layout(nrow = 1)
p1 + p2_dens + plot_layout(nrow = 1)
I've generally seen these types of plots where the distribution is shown in a "marginal" plot—that is, setup to be secondary to the main (in this case, line) plot. The ggExtra package has a marginal plot, but it only seems to work where the main plot is a scatterplot.
To do this styling manually, I'm setting theme arguments on each plot inline as I pass them to plot_layout. I took off the axis markings from the histogram so its left side is clean, and shrunk the margins on the sides of the two plots that meet. In plot_layout, I'm scaling the widths so the histogram appears more in the margins of the line chart. The same could be done with the density plot.
(p1 +
theme(plot.margin = margin(r = 0, unit = "pt"))
) +
(p2_hist +
theme(axis.text.y = element_blank(),
axis.ticks.y = element_blank(),
axis.title.y = element_blank(),
plot.margin = margin(l = 0, unit = "pt"))
) +
plot_layout(nrow = 1, widths = c(1, 0.2))
Created on 2019-01-28 by the reprex package (v0.2.1)
You can try using geom_histogram or geom_density, however it's a little bit complicated as you have to rotate axis for them (while keeping original orientation for geom_line). I would use geom_violin (which is a density plot, but mirrored). If you want to get only one sided violin plot you can use custom geom_flat_violin geom. It was first posted by #David Robinson on his gists.
I used this geom in different answer, however I don't think that it's a duplicate as you need to put it at the end of the plot and combine with different geom.
Final code is:
library(ggplot2)
ggplot(data.frame(x = seq(100), y = rnorm(100))) +
geom_flat_violin(aes(100, y), color = "red", fill = "red", alpha = 0.5, width = 10) +
geom_line(aes(x, y))
geom_flat_violin code:
library(dplyr)
"%||%" <- function(a, b) {
if (!is.null(a)) a else b
}
geom_flat_violin <- function(mapping = NULL, data = NULL, stat = "ydensity",
position = "dodge", trim = TRUE, scale = "area",
show.legend = NA, inherit.aes = TRUE, ...) {
layer(
data = data,
mapping = mapping,
stat = stat,
geom = GeomFlatViolin,
position = position,
show.legend = show.legend,
inherit.aes = inherit.aes,
params = list(
trim = trim,
scale = scale,
...
)
)
}
GeomFlatViolin <-
ggproto(
"GeomFlatViolin",
Geom,
setup_data = function(data, params) {
data$width <- data$width %||%
params$width %||% (resolution(data$x, FALSE) * 0.9)
# ymin, ymax, xmin, and xmax define the bounding rectangle for each group
data %>%
dplyr::group_by(.data = ., group) %>%
dplyr::mutate(
.data = .,
ymin = min(y),
ymax = max(y),
xmin = x,
xmax = x + width / 2
)
},
draw_group = function(data, panel_scales, coord)
{
# Find the points for the line to go all the way around
data <- base::transform(data,
xminv = x,
xmaxv = x + violinwidth * (xmax - x))
# Make sure it's sorted properly to draw the outline
newdata <-
base::rbind(
dplyr::arrange(.data = base::transform(data, x = xminv), y),
dplyr::arrange(.data = base::transform(data, x = xmaxv), -y)
)
# Close the polygon: set first and last point the same
# Needed for coord_polar and such
newdata <- rbind(newdata, newdata[1,])
ggplot2:::ggname("geom_flat_violin",
GeomPolygon$draw_panel(newdata, panel_scales, coord))
},
draw_key = draw_key_polygon,
default_aes = ggplot2::aes(
weight = 1,
colour = "grey20",
fill = "white",
size = 0.5,
alpha = NA,
linetype = "solid"
),
required_aes = c("x", "y")
)
You could use egg::ggarrange(). So basically what you want is this:
p <- ggplot(data=my_data, mapping=aes(x=x, y=y)) +
geom_line() + ylim(c(-2, 2))
q <- ggplot(data=my_data, mapping=aes(x=y)) +
geom_histogram(binwidth=.05) + coord_flip() + xlim(c(-2, 2))
egg::ggarrange(p, q, nrow=1)
Result
Data
set.seed(42)
my_data <- data.frame(x=seq(100), rnorm(100))
my_data1 <- count(my_data, vars=c("y"))
p1 <- ggplot(data = my_data, mapping = aes(x = x, y = y)) + geom_line()
p2 <- ggplot(my_data1,aes(x=freq,y=y))+geom_line()+theme(axis.title.y = element_blank(),axis.text.y = element_blank())
grid.draw(cbind(ggplotGrob(p1), ggplotGrob(p2), size = "last"))
I have a dataset with some outliers, such as the following
x <- rnorm(1000,0,20)
x <- c(x, 500, -500)
If we plot this on a linear x axis scale at this we see
histogram(x)
I worked out a nice way to put it on a log scale using this useful thread:
how to use a log scale for y-axis of histogram in R? :
mat <- data.frame(x)
ggplot(ee, aes(x = xx)) + geom_histogram(colour="darkblue", size=1, fill="blue") + scale_x_log10()
However, I would like the x axis labels from this 2nd example to match that of the first example, except with a kind of "negative log" - i.e. first tick (moving from the centre to the left) could be -1, then the next could be -10, the next -100, but all equidistant. Does that make sense?
I am not sure I understand your goal, but when you want a log-like transformation yet have zeroes or negative values, the inverse hyperbolic sine transformation asinh() is often a good option. It is log-like for large values and is defined for all real values. See Rob Hyndman's blog and this question on stats.stackexchange.com for discussion, details, and other options.
If this is an acceptable approach, you can create a custom scale for ggplot. The code below demonstrates how to create and use a custom scale (with custom breaks), along with a visualization of the asinh() transformation.
library(ggplot2)
library(scales)
limits <- 100
step <- 0.005
demo <- data.frame(x=seq(from=-1*limits,to=limits,by=step))
asinh_trans <- function(){
trans_new(name = 'asinh', transform = function(x) asinh(x),
inverse = function(x) sinh(x))
}
ggplot(demo,aes(x,x))+geom_point(size=2)+
scale_y_continuous(trans = 'asinh',breaks=c(-100,-50,-10,-1,0,1,10,50,100))+
theme_bw()
ggplot(demo,aes(x,x))+geom_point(size=2)+
scale_x_continuous(trans = 'asinh',breaks=c(0,1,10,50,100))+
scale_y_log10(breaks=c(0,1,10,50,100))+ # zero won't plot
xlab("asinh() scale")+ylab("log10 scale")+
theme_bw()
Realizing that the question is fairly old, I decided to answer it anyway since I ran into exactly the same problem.
I see that some answers above misunderstood your original question. I think it is a valid visualization question and I outline below my solution that will hopefully be useful for others as well.
My approach was to use ggplot and create custom log transform for x and y axis (as well as custom break generators)
library(ggplot2)
library(scales)
# Create custom log-style x axis transformer (...,-10,-3,-1,0,1,3,10,...)
custom_log_x_trans <- function()
trans_new("custom_log_x",
transform = function (x) ( sign(x)*log(abs(x)+1) ),
inverse = function (y) ( sign(y)*( exp(abs(y))-1) ),
domain = c(-Inf,Inf))
# Custom log x breaker (...,-10,-3,-1,0,1,3,10,...)
custom_x_breaks <- function(x)
{
range <- max(abs(x), na.rm=TRUE)
return (sort( c(0,
sapply(0:log10(range), function(z) (10^z) ),
sapply(0:log10(range/3), function(z) (3*10^z) ),
sapply(0:log10(range), function(z) (-10^z) ),
sapply(0:log10(range/3), function(z) (-3*10^z) )
)))
}
# Create custom log-style y axis transformer (0,1,3,10,...)
custom_log_y_trans <- function()
trans_new("custom_log_y",
transform = function (x) ( log(abs(x)+1) ),
inverse = function (y) ( exp(abs(y))-1 ),
domain = c(0,Inf))
# Custom log y breaker (0,1,3,10,...)
custom_y_breaks <- function(x)
{
max_y <- length(x)
range <- max(abs(max_y), na.rm=TRUE)
return (sort( c(0,
sapply(0:log10(range), function(z) (10^z) ),
sapply(0:log10(range/3), function(z) (3*10^z) )
)))
}
ggplot(data=mat) +
geom_histogram(aes(x=x,fill=..count..),
binwidth = 1, color="black", size=0.1) +
scale_fill_gradient("Count", low = "steelblue", high = "red") +
coord_trans(x="custom_log_x",y="custom_log_y") +
scale_x_continuous(breaks = custom_x_breaks(mat$x)) +
scale_y_continuous(breaks = custom_y_breaks(mat$x)) +
theme(axis.text.x=element_text(angle=90,hjust=1,vjust=0.5)) +
theme_bw()
which gives me the following plot.
Note that:
the plot also includes coloring scheme to show visually the absolute value of each bar.
the bins become increasingly thinner as x increases (side effect of log-transform)
In either case, the two outliers are clearly visible.
I found a way to cheat on it. I say "cheat", because it actually plot negative and positive parts of the data separately. Thus you can not compare the negative and positive data. But only can show the distribution of negative and positive parts separately.
And one of the problem is if there is zero values in your data, it will not be shown in the plot.
reverselog_trans <- function(base = exp(1)) {
trans <- function(x) -log(x, base)
inv <- function(x) base^(-x)
trans_new(paste0("reverselog-", format(base)), trans, inv,
log_breaks(base = base),
domain = c(1e-100, Inf))
}
quartz();
dist1 <- ggplot(data=df.meltFUAC) +
geom_point(alpha=1,aes(x=deltaU.deltaUltrasensitivity,y=deltaF.deltaFitness,
colour=deltaF.w_c)) +
scale_x_continuous(name = expression(Delta * S[ult]),
limits=c(1e-7,1),trans = "log10",breaks=c(1e-01,1e-03,1e-05),
labels=c("1e-01","1e-03","1e-05")) +
scale_y_continuous(name = expression(paste(Delta, " Fitness")),trans = "log10",
limits = c(1e-7,1), breaks=c(1e-01,1e-03,1e-05),
labels=c("1e-01","1e-03","1e-05")) +
theme_bw() +
theme(legend.position = "none", axis.title.x=element_blank(),strip.background=element_blank(),
panel.border=element_rect(colour = "black"),panel.grid.major=element_blank(),
panel.grid.minor=element_blank(),plot.background=element_blank(),
plot.margin=unit(c(0,0,0,-11),"mm"))
dist2 <- ggplot(data=df.meltFUAC, aes(x=-deltaU.deltaUltrasensitivity,y=deltaF.deltaFitness,
colour=deltaF.w_c)) +
geom_point(alpha=1) +
scale_x_continuous(name = expression(Delta * sqrt(S[ult] %.% S[amp])),limits=c(1,1e-7),
trans = reverselog_trans(10),breaks=c(1e-01,1e-03,1e-05),
labels=c("-1e-01","-1e-03","-1e-05")) +
scale_y_continuous(name = expression(paste(Delta, " Fitness")),trans = "log10",
limits = c(1e-7,1), breaks=c(1e-01,1e-03,1e-05),
labels=c("1e-01","1e-03","1e-05")) +
theme_bw() +
theme(legend.position = "none",strip.background=element_blank(),panel.border=element_rect(colour = "black"),
axis.text.y=element_blank(), axis.ticks.y=element_blank(), axis.title.y=element_blank(),
axis.line.y=element_line(colour="black",size=1,linetype="solid"),axis.title.x=element_blank(),
panel.grid.major=element_blank(),panel.grid.minor=element_blank(),plot.background=element_blank(),
plot.margin=unit(c(0,-8,0,2.5),"mm"))
hist0 <- ggplot(data=df.meltFUAC, aes(deltaF.deltaFitness,fill=deltaF.w_c)) +
#geom_histogram(alpha = 0.5, aes(y=..density..),position = 'identity') +
geom_density(alpha = 0.5, aes(colour=deltaF.w_c)) +
scale_x_continuous(name = expression(paste(Delta, " Fitness")),
limits=c(1e-7,1),trans = "log10",breaks=c(1e-01,1e-03,1e-05),
labels=c("1e-01","1e-03","1e-05")) +
scale_y_continuous(name = "Density", limits=c(0,0.6)) +
theme_bw() +
theme(legend.position = "none", axis.title.x=element_blank(),strip.background=element_blank(),
axis.text.y=element_blank(), axis.ticks.y=element_blank(), axis.title.y=element_blank(),
axis.text.x=element_blank(), axis.ticks.x=element_blank(), axis.title.x=element_blank(),
panel.border=element_rect(colour = "black"),panel.grid.major=element_blank(),
panel.grid.minor=element_blank(),plot.background=element_blank(),
plot.margin=unit(c(0,5,2.5,-2.5),"mm")) +
coord_flip()
hist1 <- ggplot(data=df.meltFUAC, aes(deltaU.deltaUltrasensitivity,fill=deltaF.w_c)) +
#geom_histogram(alpha = 0.5, aes(y=..density..),position = 'identity') +
geom_density(alpha = 0.5, aes(colour=deltaF.w_c)) +
scale_x_continuous(name = expression(Delta * S[ult]),
limits=c(1e-7,1),trans = "log10",breaks=c(1e-01,1e-03,1e-05),
labels=c("1e-01","1e-03","1e-05")) +
scale_y_continuous(name = "Density", limits=c(0,0.6)) +
theme_bw() +
theme(legend.position = "none", axis.title.x=element_blank(),strip.background=element_blank(),
axis.text.y=element_blank(), axis.ticks.y=element_blank(), axis.title.y=element_blank(),
axis.text.x=element_blank(), axis.ticks.x=element_blank(), axis.title.x=element_blank(),
axis.line.x=element_line(colour="black",size=1,linetype="solid"),
panel.border=element_rect(colour = "black"),panel.grid.major=element_blank(),
panel.grid.minor=element_blank(),plot.background=element_blank(),
plot.margin=unit(c(5,0,-2.5,2),"mm"))
hist2 <- ggplot(data=df.meltFUAC, aes(-deltaU.deltaUltrasensitivity,fill=deltaF.w_c)) +
#geom_histogram(alpha = 0.5, aes(y=..density..),position = 'identity') +
geom_density(alpha = 0.5, aes(colour=deltaF.w_c)) +
scale_x_continuous(name = expression(Delta * S[ult]),limits=c(1,1e-7),
trans = reverselog_trans(10),breaks=c(1e-01,1e-03,1e-05),
labels=c("-1e-01","-1e-03","-1e-05")) +
scale_y_continuous(name = "Density", limits=c(0,0.6)) +
theme_bw() +
theme(legend.position = "none",strip.background=element_blank(),panel.border=element_rect(colour = "black"),
axis.text.y=element_blank(), axis.ticks.y=element_blank(), axis.title.y=element_blank(),
axis.text.x=element_blank(), axis.ticks.x=element_blank(), axis.title.x=element_blank(),
axis.line.y=element_line(colour="black",size=1,linetype="solid"),
axis.line.x=element_line(colour="black",size=1,linetype="solid"),
panel.grid.major=element_blank(),panel.grid.minor=element_blank(),plot.background=element_blank(),
plot.margin=unit(c(5,-8,-2.5,2.5),"mm"))
grid.newpage();
pushViewport(viewport(layout = grid.layout(3, 3, widths = unit(c(4,4,2),"null"),
heights=unit(c(2,7.5,0.5),"null"))));
vplayout <- function(x, y) viewport(layout.pos.row = x, layout.pos.col = y);
print(dist2, vp = vplayout(2, 1));
print(dist1, vp = vplayout(2, 2));
print(hist2, vp = vplayout(1, 1));
print(hist1, vp = vplayout(1, 2));
print(hist0, vp = vplayout(2, 3));
grid.text(expression(Delta * Ultrasensitivity),vp = vplayout(3,1:2),x = unit(0.55, "npc"),
y = unit(0.9, "npc"),gp=gpar(fontsize=12, col="black"));
dev.copy2pdf(file=sprintf("%s/_dist/dist_hist_deltaF_deltaU_wc_01vs10.pdf", resultDir));
dev.off();
Here is the graph it got (but you need to manually to put the legend on):
Or a simpler one:
reverselog_trans <- function(base = exp(1)) {
trans <- function(x) -log(x, base)
inv <- function(x) base^(-x)
trans_new(paste0("reverselog-", format(base)), trans, inv,
log_breaks(base = base),
domain = c(1e-100, Inf))
}
quartz();
hist1 <- ggplot(deltaF, aes(deltaFitness,fill=w_c)) + guides(fill=guide_legend(title=expression(omega[c]))) + geom_histogram(alpha = 0.5, aes(y=..density..),position = 'identity') + geom_density(alpha = 0.05, aes(colour=w_c)) + scale_x_continuous(name = expression(paste(Delta, " Fitness")),trans = "log10");
hist1 <- hist1 + scale_y_continuous(name = "Density", limits=c(0,1));
#hist1 <- hist1 + theme(panel.background=element_blank(),panel.border=element_blank(),axis.line.x=element_blank(),axis.line.y=element_line(colour="black",linetype="solid",size=1),axis.title.x=element_blank(),panel.grid.major=element_blank(),panel.grid.minor=element_blank(),plot.background=element_blank(),plot.margin=unit(c(5,5,0,5),"mm"));
hist1 <- hist1 + theme_bw();
hist1 <- hist1 + theme(strip.background=element_blank(),panel.border=element_rect(colour = "black"),axis.title.x=element_blank(),panel.grid.major=element_blank(),panel.grid.minor=element_blank(),plot.background=element_blank(),plot.margin=unit(c(5,5,0,5),"mm"));
hist1 <- hist1 + scale_color_discrete(name=expression(omega[c]));# + geom_vline(xintercept=0, colour="grey", size = 1);# + geom_hline(yintercept=0, colour="grey", size = 0.5);
hist2 <- ggplot(deltaU, aes(deltaUltrasensitivity,fill=w_c)) + geom_histogram(alpha = 0.5, aes(y=..density..),position = 'identity') + geom_density(alpha = 0.05, aes(colour=w_c)) + scale_x_continuous(name = expression(paste(Delta, " Ultrasensitivity")), limits=c(1e-7,1),trans = "log10",breaks=c(1e-01,1e-03,1e-05),labels=c("1e-01","1e-03","1e-05"));
hist2 <- hist2 + scale_y_continuous(name = "Density",limits=c(0,1)) ;#+ geom_vline(xintercept=0, colour="grey", size = 1);# + geom_hline(yintercept=0, colour="grey", size = 0.5);
#hist2 <- hist2 + theme(legend.position = "none", axis.title.x=element_blank(),panel.background=element_blank(),panel.border=element_blank(),panel.grid.major=element_blank(),panel.grid.minor=element_blank(),plot.background=element_blank(),plot.margin=unit(c(0,5,0,-7.5),"mm"));
hist2 <- hist2 + theme_bw();
hist2 <- hist2 + theme(legend.position = "none", axis.title.x=element_blank(),strip.background=element_blank(),panel.border=element_rect(colour = "black"),panel.grid.major=element_blank(),panel.grid.minor=element_blank(),plot.background=element_blank(),plot.margin=unit(c(0,5,0,-7.5),"mm"));
# + ggtitle("Positive part")
hist3 <- ggplot(deltaU, aes(-deltaUltrasensitivity,fill=w_c)) + geom_histogram(alpha = 0.5, aes(y=..density..),position = 'identity') + geom_density(alpha = 0.05, aes(colour=w_c)) + scale_x_continuous(name = expression(paste(Delta, " Ultrasensitivity")), limits=c(1,1e-7),trans = reverselog_trans(10),breaks=c(1e-01,1e-03,1e-05),labels=c("-1e-01","-1e-03","-1e-05"));
hist3 <- hist3 + scale_y_continuous(name = "Density", limits=c(0,1));# + geom_hline(yintercept=0, colour="black", size = 0.5);
#hist3 <- hist3 + theme(legend.position = "none",panel.background=element_blank(),axis.text.y=element_blank(), axis.ticks.y=element_blank(), axis.title.y=element_blank(),axis.line.y=element_line(colour="black",size=1,linetype="solid"),axis.title.x=element_blank(),panel.grid.major=element_blank(),panel.grid.minor=element_blank(),plot.background=element_blank(),plot.margin=unit(c(0,-7.5,0,5),"mm"));
hist3 <- hist3 + theme_bw();
hist3 <- hist3 + theme(legend.position = "none",strip.background=element_blank(),panel.border=element_rect(colour = "black"),axis.text.y=element_blank(), axis.ticks.y=element_blank(), axis.title.y=element_blank(),axis.line.y=element_line(colour="black",size=1,linetype="solid"),axis.title.x=element_blank(),panel.grid.major=element_blank(),panel.grid.minor=element_blank(),plot.background=element_blank(),plot.margin=unit(c(0,-7.5,0,5),"mm"));
# + ggtitle("Negative part")
grid.newpage();
pushViewport(viewport(layout = grid.layout(4, 2, widths = unit(c(5,5),"null"),heights=unit(c(4.6,0.4,4.6,0.4),"null"))));
vplayout <- function(x, y) viewport(layout.pos.row = x, layout.pos.col = y);
print(hist1, vp = vplayout(1, 1:2)); # key is to define vplayout
grid.text(expression(paste(Delta, " Fitness")),vp = vplayout(2,1:2),x = unit(0.5, "npc"), y = unit(0.9, "npc"),gp=gpar(fontsize=12, col="black"));
print(hist3, vp = vplayout(3, 1));
print(hist2, vp = vplayout(3, 2));
grid.text(expression(paste(Delta, " Ultrasensitivity")),vp = vplayout(4,1:2),x = unit(0.5, "npc"), y = unit(0.9, "npc"),gp=gpar(fontsize=12, col="black"));
dev.copy2pdf(file=sprintf("%s/deltaF_deltaU_wc_01vs10.pdf", resultDir));
dev.off();
Here is the graph I got:
Why suffer with ggplot2 solution? Your first plot was done with lattice histogram function, and this is where you should stay. Just apply logarithmic transformation directly within histogram function, use nint argument to specify the number of histogram bins, and type argument to choose between "count", or "density". I think that you got everything you need there, but maybe I'm missing some crucial detail of your question...
library(lattice)
histogram(log10(x), nint=50, type="count")