How to label only the modal peak in a geom_col plot - r

I'd like to put a label above only the modal bar (the tallest peak) on my geom_col plot, giving the x-axis value (CAG). Here's an example, but I can only get it to label every peak.
x <- seq(-20, 20, by = .1)
y <- dnorm(x, mean = 5.0, sd = 1.0)
z <- data.frame(CAG = 1:401, height = y)
ggplot(z, aes(x=CAG, y=height)) +
geom_col() +
geom_text(aes(label = CAG))
I'd be very grateful for help with labelling only the top peak.

Just pass geom_text a subset of your dataset that keeps only the row with the maximal height, and nudge the label above the bar with vjust:
ggplot(z, aes(x=CAG, y=height)) +
geom_col() +
geom_text(data = subset(z, height == max(height)), aes(label = CAG), vjust = -0.5)
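As a minimal sketch of the same idea, the peak row can also be picked out before plotting with which.max(), which returns a single row even if several bars tie for the maximum. The name peak is introduced here for illustration only.

```r
library(ggplot2)

# Rebuild the example data from the question
x <- seq(-20, 20, by = 0.1)
z <- data.frame(CAG = seq_along(x), height = dnorm(x, mean = 5, sd = 1))

# Keep only the row holding the tallest bar (the first one if there are ties)
peak <- z[which.max(z$height), ]

p <- ggplot(z, aes(x = CAG, y = height)) +
  geom_col() +
  # vjust = -0.5 nudges the label just above the top of the bar
  geom_text(data = peak, aes(label = CAG), vjust = -0.5)
p
```

Computing the row up front keeps the plotting call short and makes it easy to inspect which bar will be labelled.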

Related

How do I add data labels to a ggplot histogram with a log(x) axis?

I am wondering how to add data labels to a ggplot showing the true value of the data points when the x-axis is in log scale.
I have this data:
date <- c("4/3/2021", "4/7/2021","4/10/2021","4/12/2021","4/13/2021","4/13/2021")
amount <- c(105.00, 96.32, 89.00, 80.84, 121.82, 159.38)
address <- c("A","B","C","D","E","F")
df <- data.frame(date, amount, address)
And I plot it in ggplot2:
plot <- ggplot(df, aes(x = log(amount))) +
geom_histogram(binwidth = 1)
plot + theme_minimal() + geom_text(label = amount)
... but I get the error
"Error: geom_text requires the following missing aesthetics: y"
I have 2 questions as a result:
Why am I getting this error with geom_histogram? Shouldn't it assume to use count as the y value?
Will this successfully show the true values of the data points from the 'amount' column despite the plot's log scale x-axis?
Perhaps like this?
ggplot(df, aes(x = log(amount), y = after_stat(count), label = after_stat(count))) +
geom_histogram(binwidth = 1) +
stat_bin(geom = "text", binwidth = 1, vjust = -0.5) +
theme_minimal()
ggplot2 layers do not (at least in any situation I can think of) reuse the summary calculations of other layers, so I think the simplest thing would be to replicate the calculation using stat_bin(geom = "text", ...).
Or perhaps simpler, you could pre-calculate the numbers:
library(dplyr)
df %>%
count(log_amt = round(log(amount))) %>%
ggplot(aes(log_amt, n, label = n)) +
geom_col(width = 1) +
geom_text(vjust = -0.5)
EDIT -- to show buckets without the log transform we could use:
df %>%
count(log_amt = round(log(amount))) %>%
ggplot(aes(log_amt, n, label = n)) +
geom_col(width = 0.5) +
geom_text(vjust = -0.5) +
scale_x_continuous(labels = ~scales::comma(exp(.)),
minor_breaks = NULL)
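On the second question (showing true values despite the log scale), one possible alternative is to leave amount untransformed in aes() and let scale_x_log10() do the transformation, so the axis is labelled in the original units. This is only a sketch: it uses base-10 logs rather than the natural log in the question, and bins = 5 is an arbitrary choice.

```r
library(ggplot2)

df <- data.frame(amount = c(105.00, 96.32, 89.00, 80.84, 121.82, 159.38))

p <- ggplot(df, aes(x = amount)) +
  geom_histogram(bins = 5) +
  # stat_bin repeats the binning and draws each bin's count as text
  stat_bin(
    geom = "text",
    aes(y = after_stat(count), label = after_stat(count)),
    bins = 5, vjust = -0.5
  ) +
  # bins are computed on log10(amount); axis labels stay in original units
  scale_x_log10()
p
```

Because both layers use the same bins argument, the text labels line up with the bars; empty bins are labelled 0.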

How do I customise the values on the Y axis

I am using ggplot2 in R to create a histogram and I would like to customise the values on the y axis. At present the values on the y axis start at 1 and increase in steps of 3. I would like to make all the values on the y axis visible, i.e. 1, 2, 3 and so on.
How do I do this?
plot_2 <-
ggplot(Tennis, aes(x=winner)) +
geom_bar(data = subset(top_wins, tournament == "French Open")) +
ggtitle("French Open")
You can use the scale_y_continuous() function. Below is an example where the y axis goes from 0 to 20 with a break at every integer.
ggplot() + geom_point(data = iris, aes(x = Petal.Width, y = Petal.Length, color = Species)) +
scale_y_continuous(limits = c(0, 20), breaks = seq(0, 20, by = 1))
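For a bar chart of counts like the one in the question, the upper end of the break sequence can be derived from the data rather than hard-coded. The sketch below uses a small made-up data frame (wins is hypothetical, standing in for the Tennis data, which is not shown).

```r
library(ggplot2)

# Hypothetical stand-in for the question's data: one row per match won
wins <- data.frame(winner = c("A", "A", "A", "B", "B", "C", "A", "B", "A"))

# Tallest bar = largest number of rows for any single winner
max_count <- max(table(wins$winner))

p <- ggplot(wins, aes(x = winner)) +
  geom_bar() +
  # one break per integer from 0 up to the tallest bar
  scale_y_continuous(breaks = seq(0, max_count, by = 1)) +
  ggtitle("French Open")
p
```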

r ggplot when two colors overlap

I have some code that generates a plot; the only problem is that there are many overlapping colors.
When two colors overlap, how do I specify the dominant color?
For example, there are 4 black points where indicator = threshold, one at each of the 4 x-axis positions. However, the black points at the "Wire" and "ACH" positions do not show up because they are covered by blue points, and the black point at the "RDFI" position barely shows up. How can I make black the dominant color when two colors overlap? Thanks in advance!
ggplot(df, aes(a-axis, y-axis), color=indicator)) +
geom_quasirandom(groupOnX=TRUE, na.rm = TRUE) +
labs(title= 'chart', x='x-axis', y= 'y-axis') +
scale_color_manual(name = 'indicator', values=c("#99ccff","#000000" ))
To specify the dominant color, you can use the function new_scale() and its aliases new_scale_color() and new_scale_fill() from the ggnewscale package. Within a plot, layers added later are drawn on top of earlier ones.
As an example, let's overlay some measurements over a contour map of topography using the beloved volcano dataset:
library(ggplot2)
library(ggnewscale)
# Equivalent to melt(volcano)
topography <- expand.grid(x = 1:nrow(volcano),
y = 1:ncol(volcano))
topography$z <- c(volcano)
# point measurements of something at a few locations
set.seed(42)
measurements <- data.frame(x = runif(30, 1, 80),
y = runif(30, 1, 60),
thing = rnorm(30))
dominant point:
ggplot(mapping = aes(x, y)) +
geom_contour(data = topography, aes(z = z, color = after_stat(level))) +
# Color scale for topography
scale_color_viridis_c(option = "D") +
# geoms below will use another color scale
new_scale_color() +
geom_point(data = measurements, size = 3, aes(color = thing)) +
# Color scale applied to geoms added after new_scale_color()
scale_color_viridis_c(option = "A")
dominant contour:
ggplot(mapping = aes(x, y)) +
geom_point(data = measurements, size = 3, aes(color = thing)) +
scale_color_viridis_c(option = "A")+
new_scale_color() +
geom_contour(data = topography, aes(z = z, color = after_stat(level))) +
scale_color_viridis_c(option = "D")
Your problem may not lie with which color is dominant. You have selected colors that will show up often. You may be losing the bottom of your y axis. The code in your example cannot possibly have produced that plot, because it contains errors.
Here is a simple example that shows one way to overcome your problem: simply overplot the threshold points after you have plotted the beeswarm.
library(dplyr)
library(ggbeeswarm)
distro <- data.frame(
'variable'=rep(c('runif','rnorm'),each=1000),
'value'=c(runif(2000, min=-3, max=3))
)
distro$indicator <- "NA"
distro[3,3] <- "Threshhold"
distro[163,3] <- "Threshhold"
ggplot2::ggplot(distro,aes(variable, value, color=indicator)) +
geom_quasirandom(groupOnX=TRUE, na.rm = TRUE, width=0.1) +
scale_color_manual(name = 'indicator', values=c("#99ccff","#000000")) +
geom_point(data = distro %>% filter(indicator == "Threshhold"))
Sort the rows of your data by the color variable (your indicator).
Basically, you want the black dots to be plotted last, so that they end up on top of the others.
df <- df[order(df$indicator, decreasing = TRUE), ]
# Tidyverse solution
df <- df %>% arrange(desc(indicator))
Depending on your levels, you may or may not have to reverse the sort.
Then you just plot.
pd <- tibble(x=rnorm(1000), y=1, indicator=sample(c("A","B"), replace=T, size = 1000))
ggplot(pd, aes(x=x,y=y,color=indicator)) + geom_point()
pd <- pd %>% arrange(indicator)
ggplot(pd, aes(x=x,y=y,color=indicator)) + geom_point()
pd <- pd %>% arrange(desc(indicator))
ggplot(pd, aes(x=x,y=y,color=indicator)) + geom_point()
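If the alphabetical order of the levels is not the order you want, a sketch that does not depend on level names is to sort on a logical test, so the rows of interest always end up last. The data and the labels "normal" and "threshold" below are made up for illustration.

```r
library(ggplot2)

set.seed(1)
# Hypothetical data: mostly "normal" points plus a few "threshold" points
pd <- data.frame(
  x = rnorm(1000),
  y = rnorm(1000),
  indicator = c(rep("normal", 996), rep("threshold", 4))
)
pd <- pd[sample(nrow(pd)), ]  # shuffle so the threshold rows are scattered

# FALSE sorts before TRUE, so threshold rows move to the end and are drawn last
pd <- pd[order(pd$indicator == "threshold"), ]

p <- ggplot(pd, aes(x = x, y = y, color = indicator)) +
  geom_point() +
  scale_color_manual(values = c(normal = "#99ccff", threshold = "#000000"))
p
```

Reordering whole rows, rather than sorting the indicator column on its own, keeps each indicator value attached to its x and y values.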

Horizontal standard error bars on bar graphs with negative values

I have a bar graph that looks like this:
I am trying to add standard error bars to it, so two error bars for each column (one for the positive Y and one for the negative N). I am aware of geom_errorbarh, but I cannot get it to work for this type of bar graph.
Here is a reproducible example with the code that I used to get a bar chart like the one above:
Dataframe
Behavior<-as.character(c("Hammock","Hammock","Climbing Trees","Climbing Trees","Structures","Structures","Grade","Grade"))
Presence<-c("Y","N","Y","N","Y","N","Y","N")
Mean<-as.numeric(c("18.5", "-6.4","3.5","-6.8","13.2","-10.1","4.7","-2.3"))
SD<-as.numeric(c("17.6","-11.9","1.2","-4.4","3.6","-6.25","1.23","-0.4"))
DF<-data.frame(Behavior,Presence,Mean,SD)
Coord Flip Geom Bar
brks <- seq(-20, 20, 2)
lbls = paste0(as.character(c(seq(-20, 0, 2), seq(2, 20, 2))), "")
ggplot(DF, aes(x = Behavior, y = Mean, fill = Presence )) +
geom_bar(data = subset(DF, Presence == "N"), stat = "identity") +
geom_bar(data = subset(DF, Presence == "Y"), stat = "identity") +
scale_y_continuous(breaks = brks,labels = lbls) +
scale_fill_manual(values=c("#0b6bb6", "#6eaf46"),name="", breaks=c("N", "Y"),labels=c("N", "Y"))+
coord_flip()+
theme_bw()+
xlab("Pen Characteristic - Behavior")+
ylab("Average Behavior per Session")
Is it possible to get the SE bars on this type of graph?
Thanks!
As @Jakub pointed out in his comment, SD values should be positive, so the negative entries in your SD column cannot be used as they are.
What you would normally do is something like this:
library(ggplot2)
set.seed(1)
Behavior <- as.character(c(
"Hammock","Hammock",
"Climbing Trees","Climbing Trees",
"Structures","Structures",
"Grade","Grade"))
Presence <- c("Y","N","Y","N","Y","N","Y","N")
Mean <- as.numeric(
c("18.5", "-6.4",
"3.5","-6.8",
"13.2","-10.1",
"4.7","-2.3"))
SD <- as.numeric(c(
"17.6","-11.9",
"1.2","-4.4",
"3.6","-6.25",
"1.23","-0.4"))
# Placeholder positive spreads for illustration, used instead of the negative SD entries
my_sd <- runif(length(Behavior))
DF <- data.frame(Behavior,Presence,Mean,SD, my_sd)
brks <- seq(-20, 20, 2)
ggplot(DF,
aes(x=Behavior, y=Mean, fill=Presence )) +
geom_col() +
scale_y_continuous(breaks = brks) +
scale_fill_manual(values=c("#0b6bb6", "#6eaf46"),
name="",
breaks=c("N", "Y"),
labels=c("N", "Y")) +
coord_flip()+
theme_bw()+
xlab("Pen Characteristic - Behavior") +
ylab("Average Behavior per Session") +
geom_errorbar(aes(ymin=Mean - my_sd, ymax=Mean + my_sd))
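If you would rather use your own SD column than random placeholder values, a minimal sketch is to take its absolute value. This assumes the minus signs in SD only mirror the sign of the mean and carry no other meaning; the column name spread is introduced here for illustration.

```r
library(ggplot2)

DF <- data.frame(
  Behavior = rep(c("Hammock", "Climbing Trees", "Structures", "Grade"), each = 2),
  Presence = rep(c("Y", "N"), times = 4),
  Mean = c(18.5, -6.4, 3.5, -6.8, 13.2, -10.1, 4.7, -2.3),
  SD = c(17.6, -11.9, 1.2, -4.4, 3.6, -6.25, 1.23, -0.4)
)
# Assumption: the sign on SD is an artefact of the data entry, so use its magnitude
DF$spread <- abs(DF$SD)

p <- ggplot(DF, aes(x = Behavior, y = Mean, fill = Presence)) +
  geom_col() +
  # one bar per row: from Mean - spread to Mean + spread
  geom_errorbar(aes(ymin = Mean - spread, ymax = Mean + spread), width = 0.2) +
  coord_flip() +
  theme_bw()
p
```

These bars show standard deviations. Turning them into standard errors would mean dividing by the square root of the number of sessions per group, which is not given in the question.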

facet_zoom can't change breaks of zoomed plot

I currently have a plot and have used facet_zoom to focus on records between 0 and 10 in the x axis. The following code reproduces an example:
require(ggplot2)
require(ggforce)
require(dplyr)
x <- rnorm(10000, 50, 25)
y <- rexp(10000)
data <- data.frame(x, y)
ggplot(data, aes(x = x, y = y)) +
geom_point() +
facet_zoom(x = dplyr::between(x, 0, 10))
I want to change the breaks on the zoomed portion of the graph to be the equivalent of:
ggplot(data, aes(x = x, y = y)) +
geom_point() +
facet_zoom(x = dplyr::between(x, 0, 10)) +
scale_x_continuous(breaks = seq(0,10,2))
But this changes the breaks of the original plot as well. Is it possible to just change the breaks of the zoomed portion whilst leaving the original plot as default?
This works for your use case:
ggplot(data, aes(x = x, y = y)) +
geom_point() +
facet_zoom(x = between(x, 0, 10)) +
scale_x_continuous(breaks = pretty)
From ?scale_x_continuous, breaks would accept the following (emphasis added):
One of:
NULL for no breaks
waiver() for the default breaks computed by the transformation object
A numeric vector of positions
A function that takes the limits as input and returns breaks as output
pretty() is one such function. It doesn't offer very fine control, but does allow you to have some leeway to specify breaks across different facets with very different scales.
For illustration, here are two examples with different desired number of breaks. See ?pretty for more details on the other arguments this function accepts.
p <- ggplot(data, aes(x = x, y = y)) +
geom_point() +
facet_zoom(x = between(x, 0, 10))
cowplot::plot_grid(
p + scale_x_continuous(breaks = function(x) pretty(x, n = 3)),
p + scale_x_continuous(breaks = function(x) pretty(x, n = 10)),
labels = c("n = 3", "n = 10"),
nrow = 1
)
Of course, you can also define your own function to convert plot limits into desired breaks (e.g. something like p + scale_x_continuous(breaks = function(x) seq(min(x), max(x), length.out = 5))), but I generally find these functions require more tweaking to get right, and pretty() is often good enough.
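As a rough sketch of such a custom function, the one below returns the requested seq(0, 10, 2) breaks when the limits it receives are narrow and falls back to pretty() otherwise. The name zoom_breaks and the width cut-off of 15 are assumptions chosen for this example; the exact limits passed to each panel may include some axis expansion.

```r
# Hypothetical helper: fine breaks for a narrow panel, pretty() breaks otherwise
zoom_breaks <- function(limits) {
  if (diff(range(limits)) <= 15) {
    # assumed to be the zoomed panel (roughly 0 to 10)
    seq(0, 10, by = 2)
  } else {
    # stand-in for the default breaks on the full-range panel
    pretty(limits)
  }
}
```

It would then be used as p + scale_x_continuous(breaks = zoom_breaks), with p defined as in the illustration above.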
