Log scale on y axis but data have negative values - r

I am trying to create a boxplot with a log y axis as I have some very small values and then some much higher values which do not work well in a boxplot with a continuous y axis. However, I have negative values which obviously do not work with a log scale. I was wondering if there was a way around this so that I can display my data on a boxplot which is still easy to interpret but has a more appropriate scale on the y axis.
p <- ggplot(data = Elstow.monthly.fluxes, aes(x = Month1, y = CH4.Flux)) + stat_boxplot(geom = "errorbar", linetype = 1, width = 0.5) + geom_boxplot() +
xlab(expression("Month")) + ylab(expression(~CH[4]~Flux~(µg~CH[4]~m^{-2}~d^{-1}))) +
scale_y_continuous(breaks = seq(-5000,40000,5000), limits = c(-5000,40000))+
theme(axis.text.x = element_text(colour = "black")) + theme(axis.text.y = element_text(colour =
"black")) +
theme(panel.background = element_rect("white", "black")) +
theme(panel.border = element_rect(colour = "black", fill=NA, size=0.5)) +
theme(axis.text = element_text(size = 12))+ theme(axis.title = element_text(size = 14))+
theme(axis.title.y = element_text(margin = margin(t = 0, r = 15, b = 0, l = 0))) +
theme(axis.title.x = element_text(margin = margin(t = 15, r = 0, b = 0, l = 0))) +
geom_hline(yintercept = 0, linetype ="dashed", colour = "black")

While you could indeed use the secondary axis to get the labels you want as Zhiqiang suggests, you could also use a transformation that fits your needs.
Consider the following skewed boxplots:
df <- data.frame(
x = rep(letters[1:2], each = 500),
y = rlnorm(1000) - 2
)
ggplot(df, aes(x, y)) +
geom_boxplot()
Instead, you could use the pseudo-log transformation to visualise your data:
ggplot(df, aes(x, y)) +
geom_boxplot() +
scale_y_continuous(trans = scales::pseudo_log_trans())
Alternatively, you could make any transformation you want. I personally like the inverse hyperbolic sine transformation, which is very much like the pseudo-log:
asinh_trans <- scales::trans_new(
"inverse_hyperbolic_sine",
transform = function(x) {asinh(x)},
inverse = function(x) {sinh(x)}
)
ggplot(df, aes(x, y)) +
geom_boxplot() +
scale_y_continuous(trans = asinh_trans)
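To see why both transformations cope with negative values, the numbers can be checked without drawing anything. The sketch below is base R only. The helper pseudo_log is hypothetical: as far as I understand it, it mirrors what pseudo_log_trans() computes with its default arguments (sigma = 1, natural log base), so treat the formula as an assumption rather than the package's documented contract.

```r
# Hypothetical stand-in for the transform behind scales::pseudo_log_trans()
# (assumed formula: asinh(x / (2 * sigma)) / log(base)).
pseudo_log <- function(x, sigma = 1, base = exp(1)) {
  asinh(x / (2 * sigma)) / log(base)
}

x <- c(-1000, -10, -1, 0, 1, 10, 1000)

comparison <- data.frame(
  x          = x,
  pseudo_log = pseudo_log(x),
  asinh      = asinh(x),
  # log() gives NaN for negatives and -Inf at zero
  log        = suppressWarnings(log(x))
)
print(comparison)
```

Both curves are roughly linear near zero and roughly logarithmic far from it. For large absolute values the plain asinh differs from this pseudo-log only by a constant of about log(2).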

I have a silly solution: use the secondary axis as a trick to re-scale the y axis. I do not have your data, so I made up some numbers for the purpose of demonstration.
First convert the y values with logy = log(y + 5000). When generating the graph, transform the values back to the original scale. I borrow the secondary axis to display those values. I am pretty sure others may have more elegant ways to do this.
I did not look for the proper way to remove the primary y-axis tick labels and simply used breaks = c(0).
df<-data.frame(y = runif(33, min=-5000, max=40000),
x = rep(c("Aug", "Sep", "Oct"),33))
library(tidyverse)
df$logy = log(df$y+5000)
p <- ggplot(data = df, aes(x = x, y = logy)) +
stat_boxplot(geom = "errorbar", linetype = 1, width = 0.5) +
geom_boxplot() +
xlab(expression("Month")) +
ylab(expression(~CH[4]~Flux~(µg~CH[4]~m^{-2}~d^{-1}))) +
scale_y_continuous(sec.axis = sec_axis(~(exp(.) -5000),
breaks = c(-4000, 0, 5000, 10000, 20000, 40000)),
breaks = c(0))+
theme(axis.text.x = element_text(colour = "black")) +
theme(axis.text.y = element_text(colour = "black")) +
theme(panel.background = element_rect("white", "black")) +
theme(panel.border = element_rect(colour = "black", fill=NA, size=0.5)) +
theme(axis.text = element_text(size = 12))+
theme(axis.title = element_text(size = 14))+
theme(axis.title.y = element_text(margin = margin(t = 0, r = 15, b = 0, l = 0))) +
theme(axis.title.x = element_text(margin = margin(t = 15, r = 0, b = 0, l = 0))) +
geom_hline(yintercept = log(5000), linetype ="dashed", colour = "black")
p
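The arithmetic behind the secondary-axis trick is just a shifted log and its inverse. A minimal sketch, using the same offset of 5000 as above (the helper names to_log and from_log are made up for illustration):

```r
offset <- 5000  # must exceed the magnitude of the most negative value

to_log   <- function(y) log(y + offset)   # what gets plotted
from_log <- function(z) exp(z) - offset   # what sec_axis() uses for the labels

y <- c(-4000, 0, 5000, 10000, 20000, 40000)
z <- to_log(y)

# The round trip recovers the original values
round_trip <- from_log(z)

# The dashed zero line has to be drawn at log(offset) on the plotted scale
zero_line <- to_log(0)
```

Any value at or below -offset has no finite logarithm, so with real data it is probably safer to derive the offset from the minimum of the data than to hard-code it.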

coord_trans() is applied after the statistics are calculated, unlike a scale transformation, which is applied before them. It can be combined with pseudo_log_trans() to cope with negatives.
library(plotly)
set.seed(1234)
dat <- data.frame(cond = factor(rep(c("A","B"), each=200)), rating = c(rnorm(200),rnorm(200, mean=500)))
pseudoLog <- scales::pseudo_log_trans(base = 10)
p <- ggplot(dat, aes(x=cond, y=rating)) + geom_boxplot() + coord_trans(y=pseudoLog)
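The practical difference is easiest to see in the outlier rule. A scale transformation changes the data before the boxplot statistics are computed, whereas coord_trans() computes them on the raw values and only then bends the axis. The sketch below mimics that with base R's boxplot.stats() on made-up data, with log10() standing in for any monotone transformation. ggplot2 computes its hinges slightly differently, so this illustrates the principle rather than reproducing its numbers.

```r
x <- 2^(0:10)  # made-up, strongly right-skewed data

# coord_trans()-like: statistics computed on the raw values
out_raw <- boxplot.stats(x)$out

# scale-like: statistics computed on the transformed values
out_scaled <- boxplot.stats(log10(x))$out

length(out_raw)     # points flagged beyond the whiskers on the raw scale
length(out_scaled)  # points flagged when the data are transformed first
```

With coord_trans() the box, whiskers and outliers therefore describe the untransformed data, which may or may not be what you want for heavily skewed values.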

Related

Add an additional legend according to the colors of x axis labels

I have modified the colors of my x axis labels according to their group.
For that, I have used the following pseudocode:
library(ggsci)
library(ggplot2)
x_cols = pal_jco()(length(unique(melted_df$Group)))
names(x_cols) = unique(melted_df$Group)
ggplot(melted_df, ...) + theme(axis.text.x = element_text(colour = x_cols))
I would like to add a legend to the plot (if possible, outside the plot), that explains the colouring of the x axis labels.
melted_df dataframe looks like this:
Here is the full code:
#Generate color mapping
x_cols = pal_jco()(length(unique(melted_df$Group)))
names(x_cols) = unique(melted_df$Group)
melted_df$mycolors = sapply(as.character(melted_df$Group), function(x) x_cols[x])
#Plot
ggplot(melted_df, aes(fill=variable, y=value, x=fct_inorder(id))) +
geom_bar(position="stack", stat = "identity") + ggtitle("Barplot") +
theme_bw() +
xlab("samples") + ylab("Counts") +
theme(axis.title.y=element_text(size=10), axis.title.x=element_text(size=10),
plot.title = element_text(face = "bold", size = (15), hjust = 0.5),
axis.text.x = element_text(colour = distinct(melted_df[c("id", "mycolors")])$mycolors)) +
guides(fill=guide_legend(title="Columns"))
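For reference, the colour lookup in the code above can be reproduced on made-up data with base R alone. The sample ids, group names and palette below are assumptions for illustration (grDevices::hcl.colors() stands in for pal_jco()).

```r
# Made-up long-format data: each sample (id) appears once per variable
melted_df <- data.frame(
  id       = rep(c("s1", "s2", "s3", "s4"), times = 2),
  Group    = rep(c("ctrl", "ctrl", "treat", "treat"), times = 2),
  variable = rep(c("A", "B"), each = 4),
  value    = c(5, 3, 8, 2, 1, 4, 6, 7)
)

# One colour per group, named by group
groups <- unique(melted_df$Group)
x_cols <- setNames(hcl.colors(length(groups), "Dark 3"), groups)

# Per-row colour via a named-vector lookup
melted_df$mycolors <- unname(x_cols[as.character(melted_df$Group)])

# One colour per x-axis label, in the order the ids first appear
label_cols <- melted_df$mycolors[!duplicated(melted_df$id)]
```

Here label_cols is the vector that would be passed to element_text(colour = ...). Recent ggplot2 versions warn that vectorised input to element_text() is not officially supported, so this remains a workaround.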
In the absence of a reproducible example, here is how you might do it with the built-in iris data set:
library(ggplot2)
ggplot(iris, aes(Species, Sepal.Length)) +
stat_summary(fun = mean, geom = "col", aes(fill = Species)) +
geom_point(aes(color = Species), alpha = 0, key_glyph = draw_key_text) +
theme_bw(base_size = 20) +
labs(color = "") +
guides(color = guide_legend(override.aes = list(alpha = 1, size = 8))) +
theme(axis.text.x = element_text(color = scales::hue_pal()(3), face = 2))
I solved this with the Legend() constructor from the ComplexHeatmap package.
I first ran the full code given above, and then added the following code to draw an additional legend explaining the x-axis colouring.
lgd = Legend(labels = names(x_cols), title = "Group", labels_gp = gpar(fontsize = 8), nrow = 1, legend_gp = gpar(fill = x_cols))
draw(lgd, x = unit(1.8, "cm"), y = unit(0.3, "cm"), just = c("left", "bottom"))

Creating ggplot geom_point() with position dodge 's-shape'

I am trying to create a plot like the one below. I'd like to order the points in each category in such a way that they form an s-shape. Is it possible to do this in ggplot?
Similar data available here
What I have so far:
somatic.variants <- read.delim("data/Lawrence.S2.txt", stringsAsFactors=T)
cancer_rates <- tapply(somatic.variants$logn_coding_mutations, somatic.variants$tumor_type, median)
cancer_rates <- cancer_rates[order(cancer_rates, decreasing=F)]
somatic.variants$tumor_type <- factor(somatic.variants$tumor_type, levels = names(cancer_rates))
library(ggplot2)
library(GGally)
ggplot(data = somatic.variants,
mapping = aes(x = tumor_type,
y = log10(n_coding_mutations))) +
geom_point(position = position_dodge2()) +
scale_x_discrete(position = "top") +
scale_y_continuous(labels = c(0,10,100,1000,10000), expand = c(0,0)) +
geom_stripped_cols() +
theme_bw() +
theme(axis.title.x = element_blank(),
axis.text.x = element_text(angle = 315, hjust = 1, size = 12),
panel.grid = element_blank()) +
labs(y = "Coding mutations count") +
stat_summary(fun = median,
geom="crossbar",
size = 0.25,
width = 0.9,
group = 1,
show.legend = FALSE,
color = "#FF0000")
This can be achieved by grouping the data by the x-axis category and then arranging by the y-axis value, which ensures that the points are plotted in ascending order of value within each category.
somatic.variants <- read.delim("https://gist.githubusercontent.com/wudustan/57deecdaefa035c1ecabf930afde295a/raw/1594d51a1e3b52f674ff746caace3231fd31910a/Lawrence.S2.txt", stringsAsFactors=T)
cancer_rates <- tapply(somatic.variants$logn_coding_mutations, somatic.variants$tumor_type, median)
cancer_rates <- cancer_rates[order(cancer_rates, decreasing=F)]
somatic.variants$tumor_type <- factor(somatic.variants$tumor_type, levels = names(cancer_rates))
library(ggplot2)
library(GGally)
library(dplyr)
somatic.variants <- somatic.variants %>%
group_by(tumor_type) %>%
arrange(n_coding_mutations)
ggplot(data = somatic.variants,
mapping = aes(x = tumor_type,
y = log10(n_coding_mutations))) +
geom_point(position = position_dodge2(.9), size = .25) +
scale_x_discrete(position = "top") +
scale_y_continuous(labels = c(0,10,100,1000,10000), expand = c(0,0)) +
geom_stripped_cols() +
theme_bw() +
theme(axis.title.x = element_blank(),
axis.text.x = element_text(angle = 315, hjust = 1, size = 12),
panel.grid = element_blank()) +
labs(y = "Coding mutations count") +
stat_summary(fun = median,
geom="crossbar",
size = 0.25,
width = 0.9,
group = 1,
show.legend = FALSE,
color = "#FF0000")
#> Warning: Removed 29 rows containing non-finite values (stat_summary).
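The effect of the sorting step can be checked on made-up data. The sketch uses base R's order() in place of the dplyr pipeline. Note that dplyr's arrange() ignores grouping unless .by_group = TRUE, so in the code above it is the global sort by value that does the work, which still leaves the values ascending within each category.

```r
set.seed(1)
toy <- data.frame(
  tumor_type = rep(c("A", "B", "C"), each = 5),  # hypothetical categories
  n          = sample(1:1000, 15)                 # hypothetical counts
)

# Sort by value only, as arrange(n_coding_mutations) effectively does
toy_sorted <- toy[order(toy$n), ]

# Within every category the values are now in ascending order
ascending <- tapply(toy_sorted$n, toy_sorted$tumor_type,
                    function(v) !is.unsorted(v))
```

As far as I can tell, position_dodge2() lays out the points of each category in row order, so ascending values are what produce the rising sweep in each column.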

Add the quantiles to each line, ggplot2

I am trying to shade the 0.025 and 0.975 quantiles on this graph that has three lines. I have tried geom_area, geom_ribbon, and I cannot highlight every quantile in every line.
Please note that "y" was ignored in this density graph.
example <-data.frame(source=c("Leaflitter","Leaflitter","Leaflitter","Leaflitter",
"Leaflitter","Leaflitter","Leaflitter","Leaflitter","Leaflitter","Leaflitter",
"Biofilm","Biofilm","Biofilm","Biofilm","Biofilm","Biofilm","Biofilm","Biofilm",
"Biofilm","Biofilm","Algae","Algae","Algae","Algae","Algae","Algae","Algae","Algae",
"Algae","Algae"), n=c(1,2,3,4,5,6,7,8,9,10,1,2,3,4,5,6,7,8,9,10,1,2,3,4,5,6,7,8,9,10),
density=c(0.554786934, 0.650578421, 0.039317168, 0.53537613,0.435081982,0.904056941,0.556284164,0.855319434,
0.399169622,0.570246304,0.076722032,0.257427999,0.172736928,0.447424473,0.520976948,0.011720494,0.311348655,
0.120698996,0.016336661,0.331741377, 0.368491034,0.09199358,0.787945904,0.017199397,0.04394107,
0.084222564,0.132367181,0.023981569,0.584493716,0.098012319))
example
One subgroup and quantiles
L <- filter(QPA_G_Feb17, source == "Leaflitter")
L <-as.data.frame(L)
Lq025 <- quantile(L$density, .025)
Lq975 <- quantile(L$density, .975)
ggplot(QPA_G_Feb17, aes(x=density, color=source)) +
labs(y="Density", x="Source contribution") +
geom_density(aes(linetype = source), size=1.2) +
scale_color_manual(values=c("#31a354", "#2c7fb8", "#d95f0e")) +
scale_linetype_manual(values = c("solid", "dotted", "longdash")) +
theme_classic()+
ylim(0, 5)+
theme(axis.text.y=element_text(angle=0, size=12, vjust=0.5, color="black")) +
theme(axis.text.x =element_text(angle=0, size=12, vjust=0.5, color="black")) +
theme(axis.title.x = element_text(color="black", size=14))+
theme(axis.title.y = element_text(color="black", size=14))
I would appreciate your help: I have looked in other forums, but the information there only covers highlighting quantiles when there is a single line.
I think this data is a bit more representative of the data displayed in your plot:
set.seed(50)
QPA_G_Feb17 <- data.frame(density = c(rgamma(400, 2, 10),
rgamma(400, 2.25, 9),
rgamma(400, 5, 7)),
source = rep(c("Algae", "Biofilm", "Leaflitter"),
each = 400))
I find that when you are trying to do something complex or non-standard in ggplot, the best thing to do is calculate the data you wish to plot ahead of time. In this case, we can calculate the density curves and the cumulative densities, flag the regions below the 0.025 and above the 0.975 quantiles, and put them all in a data frame like this:
library(dplyr)  # provides %>%, group_by() and mutate()
dens <- lapply(split(QPA_G_Feb17, QPA_G_Feb17$source),
function(x) density(x$density, from = 0, to = 1))
df <- do.call(rbind, mapply(function(x, y) {
data.frame(x = x$x, y = x$y, source = y)
}, dens, names(dens), SIMPLIFY = FALSE))
df <- df %>%
group_by(source) %>%
mutate(cdf = cumsum(y * mean(diff(x))),
lower = cdf < 0.025,
upper = cdf > 0.975)
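As a sanity check on the approach above, the cumulative sum of density times step width should end close to 1, and the x positions where it crosses 0.025 and 0.975 should land near the empirical quantiles. A minimal base-R sketch on simulated data follows. The gamma parameters are borrowed from the example above, but unlike that code the curve here is not truncated to the 0-1 range.

```r
set.seed(50)
x <- rgamma(400, 2, 10)

d   <- density(x)                       # default range extends past the data
cdf <- cumsum(d$y * mean(diff(d$x)))    # Riemann-sum approximation of the CDF

# x positions where the approximate CDF first crosses each threshold
lower_cut <- d$x[which(cdf >= 0.025)[1]]
upper_cut <- d$x[which(cdf >= 0.975)[1]]

# Compare with the sample quantiles (kernel smoothing makes them differ a little)
rbind(density = c(lower_cut, upper_cut),
      sample  = quantile(x, c(0.025, 0.975)))
```

When the density is computed with from = 0 and to = 1, any mass outside that range is left out of the cumulative sum, so a curve that extends past 1 may never reach 0.975.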
Now it is easy to plot using geom_area:
ggplot(df, aes(x, y, color = source)) +
geom_area(data = df[df$lower,], aes(fill = source), alpha = 0.5,
position = "identity") +
geom_area(data = df[df$upper,], aes(fill = source), alpha = 0.5,
position = "identity") +
labs(y = "Density", x = "Source contribution") +
geom_line(aes(linetype = source), size = 1.2) +
scale_fill_manual(values = c("#31a354", "#2c7fb8", "#d95f0e")) +
scale_color_manual(values = c("#31a354", "#2c7fb8", "#d95f0e")) +
scale_linetype_manual(values = c("solid", "dotted", "longdash")) +
theme_classic() +
ylim(0, 5) +
xlim(0, 1) +
theme(axis.text.y = element_text(size = 12, vjust = 0.5),
axis.text.x = element_text(size = 12, vjust = 0.5),
axis.title.x = element_text(size = 14),
axis.title.y = element_text(size = 14))
Here, the 2.5% and 97.5% extremities of each density curve are shaded below each line. The exception is the "Leaflitter" line, which clearly extends beyond the 0-1 range that has been plotted in your example.

Label a barplot by number of values with positive and negative bars

I have a barplot (stat=identity) and would like to label the bars with N (the number of values in each bar). I have a problem with positive and negative bars. One solution could be to make the labels of the positive bars white, or to write them at the top.
ggplot(AP_Group, aes(Labels, Mean))+
geom_bar(stat = "identity") +
theme_minimal() +
ggtitle(expression(paste("Air Pressure - C_PM"[1], " - All Season"))) +
xlab("Air Pressure [hPa]") +
ylab("Shap Value") +
geom_text(aes(label=N), color="black", size=3.5, vjust = 1.8) +
theme(plot.title = element_text(color="black", size=14, margin = margin(t = 0, r = 0, b = 15, l = 0)),
axis.title.x = element_text(color="black", size=14, margin = margin(t = 15, r = 0, b = 0, l = 0)),
axis.title.y = element_text(color="black", size=14, margin = margin(t = 0, r = 15, b = 0, l = 0))) +
theme(plot.title = element_text(hjust=0.5))
One way of doing it would be to make vjust an aesthetic, like this:
df <- tibble(x = c(1, 2), y = c(-1, 2)) #sample data
df %>% ggplot(aes(x=x, y=y)) +
geom_bar(stat = "identity") +
geom_text(aes(label = y, vjust = -sign(y)))
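A gentler variant of the same idea, as a sketch: compute the vjust per bar so that labels sit just outside the bar end, above positive bars and below negative ones. The data and the offsets -0.5 and 1.5 are made up for illustration, and the plot is only built if ggplot2 is installed.

```r
# Made-up data: bar heights and the N to print on each bar
df <- data.frame(x = c("a", "b", "c"), y = c(1, 3, -5), n = c(12, 30, 7))

# vjust below 0 pushes text above its anchor, vjust above 1 pushes it below
df$vjust <- ifelse(df$y >= 0, -0.5, 1.5)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(df, aes(x, y)) +
    geom_col() +
    geom_text(aes(label = n, vjust = vjust))
  # print(p) to draw the plot
}
```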
If you really want to colour them according to their sign, here is a possible solution:
# fake df
df <- data.frame(Mean = c(1,3,-5),Labels = c("a","b","c"))
# choose white or black depending on the sign of Mean
df$color <- ifelse(df$Mean > 0, 'white','black')
library(ggplot2)
ggplot(df, aes(Labels, Mean))+
geom_bar(stat = "identity") +
theme_minimal() +
ggtitle(expression(paste("Air Pressure - C_PM", " - All Season"))) +
xlab("Air Pressure [hPa]") +
ylab("Shap Value") +
# map the label colour to the column created above
geom_text(aes(label=Mean, color=color), size=3.5, vjust = 1.8) +
# map each value of that column to an actual colour ("white" can be any colour you want)
scale_color_manual(values = c("black" = "black", "white" = "white"))+
# remove the now redundant legend
theme(legend.position="none")

How do I reverse the order in which error bars appear using geom_errorbar() in R?

I have the following example plot:
test <- data.frame("Factor" = as.factor(c("O", "C", "A")),
b = c(0.18, .34, .65, -.13, .38, .26),
lower95 = c(-.1, .09, .34, -.52, .10, -.02),
upper95 = c(.48, .58, .98, .26, .67, .56),
group = factor(c("Experiment 1","Experiment 2")))
# unique() avoids the duplicated-levels error raised by recent R versions
test$Factor <- factor(test$Factor, unique(as.character(test$Factor)))
test$group <- factor(test$group, unique(as.character(test$group)))
ggplot(test, aes(Factor, b, colour = group)) +
geom_errorbar(aes(ymin = lower95, ymax = upper95),
size = 1,
width = .5,
position = 'dodge') +
geom_hline(yintercept = 0) +
ylim(-1.25, 1.25) +
coord_flip() +
theme_bw() +
ggtitle("Title") +
theme(
axis.text=element_text(size = 20),
axis.title=element_text(size = 18),
plot.title = element_text(size = 20, face = "bold"),
axis.text.y=element_text(size = 12)
)
As you'll see, the error bars appear (from top to bottom) in the opposite order to the legend. I would like the Experiment 1 error bars to appear above the Experiment 2 error bars.
I have tried
ggplot(test, aes(Factor, b, colour = forcats::fct_rev(group)))
But this reverses the order of the group labels in the legend, not the order of the colours in the legend, which is what I actually need. I have also tried reversing the order in which I enter them in the data frame, and this does not solve the problem.
I would appreciate some help!
Re-factoring will change the order of the plot, but, as you saw, also changes the order of the legend. In addition to reversing the levels of group, you can reverse the order the legend is displayed with the reverse argument in guide_legend.
ggplot(test, aes(Factor, b, colour = forcats::fct_rev(group))) +
geom_errorbar(aes(ymin = lower95, ymax = upper95),
size = 1,
width = .5,
position = 'dodge') +
geom_hline(yintercept = 0) +
ylim(-1.25, 1.25) +
coord_flip() +
theme_bw() +
ggtitle("Title") +
theme(
axis.text=element_text(size = 20),
axis.title=element_text(size = 18),
plot.title = element_text(size = 20, face = "bold"),
axis.text.y=element_text(size = 12)
) +
guides(color = guide_legend(reverse = TRUE) )
If you are using scale_color_discrete or scale_color_manual to control other scale elements like the legend name, you can use guide_legend there instead of via guides.
+
scale_color_discrete(name = "Experiment", guide = guide_legend(reverse = TRUE) )
Do you mean something like this?
test$Factor <- factor(test$Factor, levels = rev(levels(test$Factor)))
test$group <- factor(test$group, levels = rev(levels(test$group)))
ggplot(test, aes(Factor, b, colour = group)) +
geom_errorbar(aes(ymin = lower95, ymax = upper95),
size = 1,
width = .5,
position = 'dodge') +
geom_hline(yintercept = 0) +
ylim(-1.25, 1.25) +
coord_flip() +
theme_bw() +
ggtitle("Title") +
theme(
axis.text=element_text(size = 20),
axis.title=element_text(size = 18),
plot.title = element_text(size = 20, face = "bold"),
axis.text.y=element_text(size = 12)
)
I'm not entirely clear on whether you want to reverse the ordering of test$Factor as well; just (un)comment the corresponding line depending on what you're after.
