Ordering the y axis in ggplot2 - R

I'm trying to reorder the y-axis values in a ggplot2 graph (see the example below). By default they are sorted as text, so only the leading digits are compared, and I want them in ascending numeric order.
plot <- ggplot(top.OTUs.abun.melt, aes(C, test, size = SA)) +
geom_point(aes(size = SA / 110), shape = 21) +
scale_size_identity(trans = "sqrt", breaks = c(100, 1000, 5000, 20000)) +
theme(panel.grid.major = element_line(linetype = 2, color = "black", size = 0.025),
axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.2)) +
scale_y_discrete(expand = c(0, 2.5))
plot2 <- plot + guides(colour = guide_legend(override.aes = list(size = 5)))
plot2

No, that's not the solution here. You are not plotting factors; you're plotting numerics. Convert the column before plotting:
top.OTUs.abun.melt$test <- as.numeric(as.character(top.OTUs.abun.melt$test))
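To see why the conversion goes through as.character(), here is a small sketch on a made-up factor (the vector below is a hypothetical stand-in for the `test` column, not the asker's data). It also shows an alternative that keeps a discrete axis but orders the levels numerically.

```r
# Hypothetical stand-in for the `test` column: numbers stored as a factor
test <- factor(c("4", "10", "2", "33"))

levels(test)                    # text order: "10" "2" "33" "4"
as.numeric(test)                # internal level codes, not the values: 4 1 2 3
as.numeric(as.character(test))  # the actual numbers: 4 10 2 33

# Alternative: keep the factor, but sort its levels numerically
test_ordered <- factor(test, levels = as.character(sort(as.numeric(levels(test)))))
levels(test_ordered)            # "2" "4" "10" "33"
```

With a numeric column the axis becomes continuous; with the re-levelled factor it stays discrete but is drawn in numeric order.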

ggplot2 plot points when y value is NA

I have a data frame in which a (discrete) x value is present, and I want it to appear on the x-axis of the plot even though its y value is NA.
Is there a way to do this in ggplot2?
Currently, the plot simply skips the first two rows, which have NA values.
ggplot(tChartDF()[['df']], aes(
x = factor(tChartDF()[['df']][['Rare event date']], levels = unique(tChartDF()[['df']][['Rare event date']])),
y = unlist(tChartDF()[['df']][['days_between']]),
)) +
geom_hline(yintercept = unlist(tChartDF()[['timeScaleCL']]), color = input$tChartCLColour, lwd = input$tChartCLWidth) +
geom_hline(yintercept = unlist(tChartDF()[['timeScaleUL']]), linetype = 'dashed', lwd = 1, color = 'red') +
geom_hline(yintercept = unlist(tChartDF()[['timeScaleLL']]), linetype = 'dashed', lwd = 1, color = 'red') +
scale_x_discrete(expand = c(0,0)) +
theme_classic() +
geom_line(aes(group = 1), lwd = input$tChartLineWidth, color= input$tChartLineColour) +
geom_point(size = input$tChartMarkerSize, color = input$tChartMarkerColour) +
labs(title = input$tChartPlotTitle, x = input$tChartPlotXLabel, y = input$tChartPlotYLabel) +
theme(
plot.title = element_text(size = 24, face = 'bold', family = 'Arial', hjust = 0.5),
plot.margin = margin(0, 1, 0, 0, "cm"),
axis.title = element_text(size = 20, face = 'bold', family = 'Arial'),
axis.text = element_text(size = 16, face = 'bold', family = 'Arial'),
axis.text.x = element_text(angle = as.numeric(input$tChartXOrientation), vjust = 0.5),
axis.ticks.length = unit(.25, 'cm'),
) +
coord_cartesian(clip = 'off')
As seen, the plot only starts at date 2022/12/15, ignoring the earlier rows in the table. Columns y and mr hold the NA values.
For the plot, I only care about the first two columns (Rare events and days_between). I tried selecting only those two columns and plotting but it still ignores the first two rows.
Desired result:
If we start with something like
mt <- mtcars
mt$mpg[c(3,6,9)] <- NA
and plot the line as in
ggplot(mt, aes(disp, mpg)) +
geom_line()
we don't see the missing points (not a surprise). We can add them this way:
transform(mt, mpg2 = ifelse(is.na(mpg), max(mpg, na.rm = TRUE) + 1, mpg)) |>
ggplot(aes(disp, mpg)) +
geom_line() +
geom_point(aes(y = mpg2), data = ~ subset(., is.na(mpg)), shape = 1)
This can easily be adapted to be at the bottom, using different shapes/colors, perhaps even on an explicitly-gray background (top/bottom ribbon).
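As a rough sketch of the "at the bottom" variant, assuming the same mtcars setup: the marker height (one unit below the smallest observed value), the column name `mpg_marker`, and the shape/colour are arbitrary choices for illustration.

```r
library(ggplot2)

mt <- mtcars
mt$mpg[c(3, 6, 9)] <- NA

# Put the markers just below the smallest observed value (arbitrary offset)
marker_y <- min(mt$mpg, na.rm = TRUE) - 1

# Keep only the rows whose y value is missing and give them a plottable y
missing_rows <- subset(mt, is.na(mpg))
missing_rows$mpg_marker <- marker_y

p <- ggplot(mt, aes(disp, mpg)) +
  geom_line() +
  geom_point(aes(y = mpg_marker), data = missing_rows,
             shape = 4, colour = "red")
p
```

The line layer still drops the NA rows (with a warning), while the extra point layer marks where they sit on the x axis.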

ggplot: Modify histogram plot

I made this plot using the following code:
ggplot(all, aes(x = year, color = layer)) +
geom_histogram(binwidth = 0.5, fill = "white", alpha = 0.5, position = "dodge") +
scale_x_continuous(breaks = pretty(all$year)) +
scale_color_discrete(name = "title", labels = c("A","B")) +
theme_light() +
theme(panel.grid.minor = element_blank(), panel.grid.major = element_blank(),
text = element_text(size = 20),
axis.title.x = element_text(margin = margin(t = 25, r = 0, b = 0, l = 0)),
axis.title.y = element_text(margin = margin(t = 0, r = 25, b = 0, l = 0)),
axis.text.x = element_text(angle = 50, hjust = 1, size = 18, color = "black"),
axis.text.y = element_text(size = 18, color = "black"))
I would now like to change the colors first, using colors from the viridis palette. Furthermore, there are blue and red strokes between the histograms, which I would like to remove.
Could someone help me to change the code?
Thanks in advance!
Test Data:
year <- runif(10, 2014, 2021)
year <- round(year, 0)
layer <- sample(c("A","B"), size=10, replace=T)
all <- as.data.frame(year,layer)
Seems like you want a bar plot not a histogram.
all <- data.frame(year,layer) ## fix the sample data creation
ggplot(all, aes(x = year, fill = layer)) + ## I think fill looks better...
geom_bar(position = position_dodge(preserve = "single")) + ## bar, not histogram
#scale_x_continuous(breaks = pretty(all$year)) + ## this line just confirmed defaults
scale_fill_viridis_d() +
theme_light() ## omitted the rest of the theme as irrelevant for the issue at hand
If you do want outline color, not fill, switching to geom_bar "fixes" the strokes between the bars:
ggplot(all, aes(x = year, color = layer)) +
geom_bar(position = position_dodge(preserve = "single"), fill = NA) +
scale_color_viridis_d() +
theme_light()
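If the default viridis colours are not quite what you want, the viridis scales take an `option` argument (for example "magma") plus `begin`/`end` to trim the ends of the palette. A small sketch, assuming test data built as in the question; the seed and the particular palette settings are arbitrary choices of mine.

```r
library(ggplot2)

set.seed(42)  # arbitrary seed, only so the example is reproducible
year  <- round(runif(10, 2014, 2021))
layer <- sample(c("A", "B"), size = 10, replace = TRUE)
all   <- data.frame(year, layer)

# These are the counts that geom_bar() draws, one bar per year/layer pair
counts <- table(all$year, all$layer)
counts

p <- ggplot(all, aes(x = year, fill = layer)) +
  geom_bar(position = position_dodge(preserve = "single")) +
  # pick a different viridis palette and avoid its darkest/lightest ends
  scale_fill_viridis_d(option = "magma", begin = 0.2, end = 0.8, name = "title") +
  theme_light()
p
```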
Thank you, this is helpful information!

R ggridges plot - Showing y axis ticks and labels

I am trying to generate overlay density plots over time, comparing densities of males vs. females. Here is my output:
I am following the Australian athletes height example from https://cran.r-project.org/web/packages/ggridges/vignettes/gallery.html.
Here is my code:
ggplot(math_dat, aes(x = order_math, y = time, color = gender, point_color = gender, fill = gender)) +
geom_density_ridges(
jittered_points = TRUE, scale = .95, rel_min_height = .01,
point_shape = "|", point_size = 3, size = 0.25,
position = position_points_jitter(height = 0)
) +
scale_y_discrete(expand = c(0, 0)) +
scale_x_continuous(expand = c(0, 0), name = "Rankings") +
scale_fill_manual(values = c("#D55E0050", "#0072B250"), labels = c("female", "male")) +
scale_color_manual(values = c("#D55E00", "#0072B2"), guide = "none") +
scale_discrete_manual("point_color", values = c("#D55E00", "#0072B2"), guide = "none") +
coord_cartesian(clip = "off") +
guides(fill = guide_legend(
override.aes = list(
fill = c("#D55E00A0", "#0072B2A0"),
color = NA, point_color = NA)
)
) +
ggtitle("Ranks over time") +
theme_ridges(center = TRUE)
My problem is I am unable to generate any Y axis tick values and the example doesn't display any either. Any ideas how to get Y axis tick marks to display?
Here is some sample data similar to mine:
## generating dataset
order_math<-c(1,2,1,2,3,3,1,2,3,1,2,3)
gender<-c("M","F","M","M","M","F","F","M","F","M","M","F")
time<-c(1,1,2,3,3,2,1,2,3,2,3,1)
sample<-data.frame(order_math,gender,time)
Update:
After @Tomasu's suggestions I have updated my code, but it does not run:
ggplot(math_dat, aes(x = order_math, y = time, color = gender, point_color = gender, fill = gender)) +
geom_density_ridges(
jittered_points = TRUE, scale = .95, rel_min_height = .01,
point_shape = "|", point_size = 3, size = 0.25,
position = position_points_jitter(height = 0)
) +
scale_y_reverse(limits = c(1000, 500, 100),expand = c(0, 0)) +
scale_x_continuous(expand = c(0, 0), name = "Rankings") +
scale_fill_manual(values = c("#D55E0050", "#0072B250"), labels = c("female", "male")) +
scale_color_manual(values = c("#D55E00", "#0072B2"), guide = "none") +
scale_discrete_manual("point_color", values = c("#D55E00", "#0072B2"), guide = "none") +
coord_cartesian(clip = "off") +
guides(fill = guide_legend(
override.aes = list(
fill = c("#D55E00A0", "#0072B2A0"),
color = NA, point_color = NA)
)
) +
ggtitle("Ranks over time") +
theme_ridges(center = TRUE)+
theme(
axis.ticks = element_line(size=0.5), # turn ticks back on
axis.ticks.length = grid::unit(5, "pt"), # set length
axis.ticks.y = element_line(colour = "red"), # define tick line color
axis.text.y = element_text(vjust = .4) # center text with tick
)
An easy solution to this problem would be to use a theme that includes the axis ticks, since theme_ridges() has them turned off. Removing that theme altogether and using the default ggplot2 theme achieves the desired outcome.
However, let's say we still want to use theme_ridges() and just turn ticks back on. This can be achieved with a theme() edit after the theme_ridges().
I'm using the example in the link provided as I couldn't get your sample data to work properly.
library(ggplot2)
library(ggplot2movies)
library(ggridges)
ggplot(movies[movies$year>1912,], aes(x = length, y = year, group = year)) +
geom_density_ridges(scale = 10, size = 0.25, rel_min_height = 0.03) +
theme_ridges() +
theme(
axis.ticks = element_line(size=0.5), # turn ticks back on
axis.ticks.length = grid::unit(5, "pt"), # set length
axis.ticks.y = element_line(colour = "red"), # define tick line color
axis.text.y = element_text(vjust = .4) # center text with tick
) +
scale_x_continuous(limits = c(1, 200), expand = c(0, 0)) +
scale_y_reverse(
breaks = c(2000, 1980, 1960, 1940, 1920, 1900),
expand = c(0, 0)
) +
coord_cartesian(clip = "off")
Created on 2021-05-11 by the reprex package (v1.0.0)
I think your problem is that you need to specify the group.
Related thread: geom_density_ridges requires the following missing aesthetics: y
Extending the code from user tomasu's answer (+1):
library(ggridges)
library(ggplot2)
order_math<-c(1,2,1,2,3,3,1,2,3,1,2,3)
gender<-c("M","F","M","M","M","F","F","M","F","M","M","F")
time<-c(1,1,2,3,3,2,1,2,3,2,3,1)
sample<-data.frame(order_math,gender,time)
ggplot(sample, aes(x = order_math, y = time, group = time,
color = gender, point_color = gender, fill = gender)) +
geom_density_ridges() +
theme(
axis.ticks = element_line(size=0.5), # turn ticks back on
axis.ticks.length = grid::unit(5, "pt"), # set length
axis.ticks.y = element_line(colour = "red"), # define tick line color
axis.text.y = element_text(vjust = .4) # center text with tick
)
#> Picking joint bandwidth of 0.555
Created on 2021-05-12 by the reprex package (v2.0.0)

ggplot2 Create shaded area with gradient below curve

I would like to create the plot below using ggplot.
Does anyone know of any geom that create the shaded region below the line chart?
Thank you
I think you're just looking for geom_area. However, I thought it might be a useful exercise to see how close we can get to the graph you are trying to produce, using only ggplot:
Pretty close. Here's the code that produced it:
Data
library(ggplot2)
library(lubridate)
# Data points estimated from the plot in the question:
points <- data.frame(x = seq(as.Date("2019-10-01"), length.out = 7, by = "month"),
y = c(2, 2.5, 3.8, 5.4, 6, 8.5, 6.2))
# Interpolate the measured points with a spline to produce a nice curve:
spline_df <- as.data.frame(spline(points$x, points$y, n = 200, method = "nat"))
spline_df$x <- as.Date(spline_df$x, origin = as.Date("1970-01-01"))
spline_df <- spline_df[2:199, ]
# A data frame to produce a gradient effect over the filled area:
grad_df <- data.frame(yintercept = seq(0, 8, length.out = 200),
alpha = seq(0.3, 0, length.out = 200))
Labelling functions
# Turns dates into a format matching the question's x axis
xlabeller <- function(d) paste(toupper(month.abb[month(d)]), year(d), sep = "\n")
# Format the numbers as per the y axis on the OP's graph
ylabeller <- function(d) ifelse(nchar(d) == 1 & d != 0, paste0("0", d), d)
Plot
ggplot(points, aes(x, y)) +
geom_area(data = spline_df, fill = "#80C020", alpha = 0.35) +
geom_hline(data = grad_df, aes(yintercept = yintercept, alpha = alpha),
size = 2.5, colour = "white") +
geom_line(data = spline_df, colour = "#80C020", size = 1.2) +
geom_point(shape = 16, size = 4.5, colour = "#80C020") +
geom_point(shape = 16, size = 2.5, colour = "white") +
geom_hline(aes(yintercept = 2), alpha = 0.02) +
theme_bw() +
theme(panel.grid.major.x = element_blank(),
panel.grid.minor.x = element_blank(),
panel.grid.minor.y = element_blank(),
panel.border = element_blank(),
axis.line.x = element_line(),
text = element_text(size = 15),
plot.margin = margin(20, 20, 20, 20, unit = "pt"),
axis.ticks = element_blank(),
axis.text.y = element_text(margin = margin(0,15,0,0, unit = "pt"))) +
scale_alpha_identity() + labs(x="",y="") +
scale_y_continuous(limits = c(0, 10), breaks = 0:5 * 2, expand = c(0, 0),
labels = ylabeller) +
scale_x_date(breaks = "months", expand = c(0.02, 0), labels = xlabeller)
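The gradient itself comes from a simple trick: a stack of white horizontal lines is drawn over the filled area, most opaque at the bottom and fully transparent at the top. Here is a stripped-down sketch of just that part, on a made-up sine curve (the data, colours and names such as `fade_df` are placeholders of mine). I use `linewidth` here, which assumes ggplot2 3.4 or later; older versions use `size` as in the code above.

```r
library(ggplot2)

# Made-up curve standing in for the interpolated spline
curve_df <- data.frame(x = seq(0, 10, length.out = 100))
curve_df$y <- 5 + 3 * sin(curve_df$x / 2)

# One white line per height; alpha falls from 0.3 at the bottom to 0 at the top
fade_df <- data.frame(yintercept = seq(0, 8, length.out = 200),
                      alpha      = seq(0.3, 0, length.out = 200))

p <- ggplot(curve_df, aes(x, y)) +
  geom_area(fill = "#80C020", alpha = 0.35) +
  geom_hline(data = fade_df, aes(yintercept = yintercept, alpha = alpha),
             colour = "white", linewidth = 2.5) +
  geom_line(colour = "#80C020", linewidth = 1.2) +
  scale_alpha_identity() +   # use the alpha column as-is, no legend
  theme_bw()
p
```

Note that the white lines span the whole panel, so this only looks right on a white background, and grid lines near the bottom get washed out as well.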

Log scale on y axis but data have negative values

I am trying to create a boxplot with a log y axis as I have some very small values and then some much higher values which do not work well in a boxplot with a continuous y axis. However, I have negative values which obviously do not work with a log scale. I was wondering if there was a way around this so that I can display my data on a boxplot which is still easy to interpret but has a more appropriate scale on the y axis.
p <- ggplot(data = Elstow.monthly.fluxes, aes(x = Month1, y = CH4.Flux)) + stat_boxplot(geom = "errorbar", linetype = 1, width = 0.5) + geom_boxplot() +
xlab(expression("Month")) + ylab(expression(~CH[4]~Flux~(µg~CH[4]~m^{-2}~d^{-1}))) +
scale_y_continuous(breaks = seq(-5000,40000,5000), limits = c(-5000,40000))+
theme(axis.text.x = element_text(colour = "black")) + theme(axis.text.y = element_text(colour =
"black")) +
theme(panel.background = element_rect("white", "black")) +
theme(panel.border = element_rect(colour = "black", fill=NA, size=0.5)) +
theme(axis.text = element_text(size = 12))+ theme(axis.title = element_text(size = 14))+
theme(axis.title.y = element_text(margin = margin(t = 0, r = 15, b = 0, l = 0))) +
theme(axis.title.x = element_text(margin = margin(t = 15, r = 0, b = 0, l = 0))) +
geom_hline(yintercept = 0, linetype ="dashed", colour = "black")
While you could indeed use the secondary axis to get the labels you want as Zhiqiang suggests, you could also use a transformation that fits your needs.
Consider the following skewed boxplots:
df <- data.frame(
x = rep(letters[1:2], each = 500),
y = rlnorm(1000) - 2
)
ggplot(df, aes(x, y)) +
geom_boxplot()
Instead, you could use the pseudo-log transformation to visualise your data:
ggplot(df, aes(x, y)) +
geom_boxplot() +
scale_y_continuous(trans = scales::pseudo_log_trans())
Alternatively, you could make any transformation you want. I personally like the inverse hyperbolic sine transformation, which is very much like the pseudo-log:
asinh_trans <- scales::trans_new(
"inverse_hyperbolic_sine",
transform = function(x) {asinh(x)},
inverse = function(x) {sinh(x)}
)
ggplot(df, aes(x, y)) +
geom_boxplot() +
scale_y_continuous(trans = asinh_trans)
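To get a feel for what these two transformations do to the numbers, here is a quick check outside of ggplot. As far as I understand the scales implementation, pseudo_log_trans() with its defaults (sigma = 1, natural base) computes asinh(x / 2), so it differs from plain asinh() only by a rescaling of x; the test vector is arbitrary.

```r
library(scales)

x <- c(-1000, -10, -1, 0, 1, 10, 1000)

pl     <- pseudo_log_trans(sigma = 1, base = exp(1))
pseudo <- pl$transform(x)   # pseudo-log values
ihs    <- asinh(x)          # inverse hyperbolic sine values

# Both are roughly linear near zero, log-like for large |x|,
# and symmetric around zero, so negative values are no problem
round(cbind(x, pseudo, ihs), 3)

# The inverse maps the transformed values back to the original scale
back <- pl$inverse(pseudo)
```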
I have a silly solution: use the secondary axis as a trick to re-scale the y axis. I do not have your data, so I just made up some numbers for the purpose of demonstration.
First convert the y values with logy = log(y + 5000). When generating the graph, transform the values back to the original scale. I borrow the secondary axis to display the values. I am pretty sure others may have more elegant ways to do this.
I did not look for the proper way to remove the primary y axis tick labels and just used breaks = c(0).
df<-data.frame(y = runif(33, min=-5000, max=40000),
x = rep(c("Aug", "Sep", "Oct"),33))
library(tidyverse)
df$logy = log(df$y+5000)
p <- ggplot(data = df, aes(x = x, y = logy)) +
stat_boxplot(geom = "errorbar", linetype = 1, width = 0.5) +
geom_boxplot() +
xlab(expression("Month")) +
ylab(expression(~CH[4]~Flux~(µg~CH[4]~m^{-2}~d^{-1}))) +
scale_y_continuous(sec.axis = sec_axis(~(exp(.) -5000),
breaks = c(-4000, 0, 5000, 10000, 20000, 40000)),
breaks = c(0))+
theme(axis.text.x = element_text(colour = "black")) +
theme(axis.text.y = element_text(colour = "black")) +
theme(panel.background = element_rect("white", "black")) +
theme(panel.border = element_rect(colour = "black", fill=NA, size=0.5)) +
theme(axis.text = element_text(size = 12))+
theme(axis.title = element_text(size = 14))+
theme(axis.title.y = element_text(margin = margin(t = 0, r = 15, b = 0, l = 0))) +
theme(axis.title.x = element_text(margin = margin(t = 15, r = 0, b = 0, l = 0))) +
geom_hline(yintercept = log(5000), linetype ="dashed", colour = "black")
p
coord_trans() is applied after the statistics are calculated, unlike a scale transformation, which is applied before them. It can be combined with pseudo_log_trans to cope with negative values.
library(plotly)
set.seed(1234)
dat <- data.frame(cond = factor(rep(c("A","B"), each=200)), rating = c(rnorm(200),rnorm(200, mean=500)))
pseudoLog <- scales::pseudo_log_trans(base = 10)
p <- ggplot(dat, aes(x=cond, y=rating)) + geom_boxplot() + coord_trans(y=pseudoLog)
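The practical difference between the two approaches shows up in the whiskers and outliers. Here is a small base-R sketch using boxplot.stats() as a stand-in for what geom_boxplot() computes (ggplot2's own calculation may differ in detail), with made-up skewed data and asinh() as the transformation.

```r
set.seed(1234)
y <- rlnorm(201) - 2   # skewed data with some negative values; odd n on purpose

# coord_trans()-like: statistics on the raw data, then transform the positions
raw_stats <- asinh(boxplot.stats(y)$stats)

# scale-like: transform the data first, then compute the statistics
trans_stats <- boxplot.stats(asinh(y))$stats

# Rows: lower whisker, lower hinge, median, upper hinge, upper whisker
cbind(raw_stats, trans_stats)

# Hinges and median are order statistics here, so a monotone transformation
# leaves them in the same place; the whiskers depend on 1.5 * IQR, which is
# measured on a different scale in each case, so they need not agree
hinges_match <- isTRUE(all.equal(raw_stats[2:4], trans_stats[2:4]))
hinges_match
```

So with coord_trans() the box summarises the raw data and is only drawn on a warped axis, while a scale transformation flags outliers relative to the transformed values.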
