How to get total number of x displayed in ggplot? - r

Consider:
x <- rnorm(100)
qplot(x)
How do I get the total number (N = 100) of x displayed in the top right corner of my ggplot?
Actual output: [image]
Desired result, as in this example (N = 37): [image]

You can also set the location of the label programmatically, based on the data values. ggplot2 defaults to 30 bins, so the code below uses 30 bins to set the y-value for the label location:
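library(ggplot2)
set.seed(101)
x <- rnorm(100)
# Place the label at the largest x value and at the height of the fullest of 30 equal-width bins
qplot(x) +
  annotate("text", label = paste0("N = ", length(x)), x = max(x), y = max(table(cut(x, 30))))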
or
qplot(x) +
  geom_text(aes(label = paste0("N = ", length(x)), x = max(x), y = max(table(cut(x, 30)))))
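Note that cut(x, 30) does not necessarily use the same bin edges as the histogram, so the label height above is an approximation. A minimal sketch, assuming ggplot2's layer_data() helper and an explicit geom_histogram(bins = 30) in place of qplot(), reads the bin counts back from the plot itself (the names df, p and bins are only illustrative):

```r
library(ggplot2)

set.seed(101)
x <- rnorm(100)
df <- data.frame(x = x)

# Build the histogram first, then ask ggplot2 what it actually computed
p <- ggplot(df, aes(x)) + geom_histogram(bins = 30)
bins <- layer_data(p)  # one row per bin, including `count`, `xmin` and `xmax`

# Put the label at the right edge of the last bin and at the tallest bar
p + annotate("text",
             label = paste0("N = ", length(x)),
             x = max(bins$xmax), y = max(bins$count),
             hjust = 1)
```

This keeps the label position in step with the plot if the number of bins is changed later.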
UPDATE: To address your comment, let's plot with a discrete x vector. If we still want the y position of the text to be at the maximum, we once again find the category with the maximum number of counts. The data are already discrete, so we just need y=max(table(x)). For the x position, if we want the label at the maximum x value, we need the number of unique x categories, since ggplot implicitly numbers these from 1 to N (where N is the number of categories). The unique function returns a vector containing each unique category. We just need the length of this vector to get the maximum x value in the graph: x=length(unique(x)).
set.seed(101)
x <- cut(rnorm(100), 5)
qplot(x) +
  geom_text(aes(label = paste0("N = ", length(x)), x = length(unique(x)), y = max(table(x))))
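To see why these two expressions give the label position, here is a small sketch that computes them outside the plot. The names x_pos and y_pos are only illustrative; as.integer() is used to show the integer codes behind the factor, which is the numbering the answer relies on:

```r
set.seed(101)
x <- cut(rnorm(100), 5)

# Categories are placed at integer positions 1..N on a discrete axis,
# so the right-most category sits at the number of categories present
x_pos <- length(unique(x))

# The tallest bar is the largest per-category count
y_pos <- max(table(x))

# The integer codes behind the factor span the same positions
range(as.integer(x))
```

If some factor levels are empty, length(unique(x)) can be smaller than nlevels(x); which of the two is appropriate depends on whether unused levels are dropped from the axis.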

Lots of ways. geom_text is the most general tool. For a one-off label, maybe annotate:
qplot(x) +
  annotate("text", x = Inf, y = Inf, label = "N = 100", hjust = 1.5, vjust = 1.5)
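The label above is hard-coded. A sketch of the same idea with the count computed from the data, assuming a histogram built with ggplot() rather than qplot(): x = Inf and y = Inf pin the text to the top right corner of the panel, and hjust/vjust values above 1 push it inward from that corner.

```r
library(ggplot2)

set.seed(101)
x <- rnorm(100)
n_label <- paste0("N = ", length(x))  # computed rather than typed in

p <- ggplot(data.frame(x = x), aes(x)) +
  geom_histogram(bins = 30) +
  annotate("text", x = Inf, y = Inf, label = n_label,
           hjust = 1.5, vjust = 1.5)
p
```

Because the position does not depend on the data values, this works unchanged for discrete and continuous axes.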

The other answers show how you can add the text to your plot. But annotate() can also be used to add other geoms. If you want to put your annotation inside a rectangle, for instance, you can do the following:
x0 <- max(x)
y0 <- max(table(cut(x, 30)))
qplot(x) +
  annotate("rect", xmin = x0*.8, xmax = x0*1.2, ymin = y0*.95, ymax = y0*1.05,
           fill = "white", colour = "black") +
  annotate("text", label = paste0("N = ", length(x)), x = x0, y = y0)
which gives
Up to the line that starts with annotate("rect", everything is taken from the other answers to this question.
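ggplot2 also has a label geom that draws a box behind the text, so a similar result may be possible without computing the rectangle corners by hand. A sketch, assuming annotate("label", ...) is an acceptable substitute for the rect/text pair (the box size then follows the text rather than the data range):

```r
library(ggplot2)

set.seed(101)
x <- rnorm(100)
x0 <- max(x)
y0 <- max(table(cut(x, 30)))
n_label <- paste0("N = ", length(x))

p <- ggplot(data.frame(x = x), aes(x)) +
  geom_histogram(bins = 30) +
  # "label" draws the text on a filled, outlined box
  annotate("label", x = x0, y = y0, label = n_label,
           fill = "white", colour = "black")
p
```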

Like this? (code below)
# install.packages("ggplot2", dependencies = TRUE)
library(ggplot2)
set.seed(421)
x <- rnorm(100)
qplot(x) + annotate("text", x = 2, y = 15, label = paste("N =", length(x)))

Related

Indicating the maximum values and adding corresponding labels on a ggplot

ggplot(data = dat) + geom_line(aes(x=foo,y=bar)) +geom_line(aes(x=foo_land,y=bar_land))
which creates a plot like the following:
I want to try and indicate the maximum values on this plot as well as add corresponding labels to the axis like:
The data for the maximum x and y values is stored in the dat file.
I was attempting to use
geom_hline() + geom_vline()
but I couldn't get this to work. I also believe that these lines will continue through the rest of the plot, which is not what I am trying to achieve. I should note that I would like to indicate the maximum y-value and its corresponding x value. The x-value is not indicated here since it is already labelled on the axis.
Reproducible example:
library(ggplot2)
col1 <- c(1,2,3)
col2 <- c(2,9,6)
df <- data.frame(col1,col2)
ggplot(data = df) +
geom_line(aes(x=col1,y=col2))
I would like to include a line which travels up from 2 on the x-axis and horizontally to the y-axis indicating the point 9, the maximum value of this graph.
Here's a start, although it does not make the axis text red where that maximal point is:
MaxLines <- data.frame(
  col1 = c(rep(df$col1[which.max(df$col2)], 2), -Inf),
  col2 = c(-Inf, rep(max(df$col2), 2))
)
MaxLines holds the three points that define the two segments: up from the x-axis to the maximum, then across to the y-axis.
ggplot(data = df) +
  geom_line(aes(x = col1, y = col2)) +
  geom_path(data = MaxLines, aes(x = col1, y = col2),
            inherit.aes = FALSE, color = "red") +
  scale_x_continuous(breaks = c(seq(1, 3, by = 0.5), df$col1[which.max(df$col2)])) +
  scale_y_continuous(breaks = c(seq(2, 9, by = 2), max(df$col2)))
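An alternative sketch of the same guide lines, drawn as two separate segments with annotate("segment", ...) instead of a helper data frame. The name peak is only illustrative; -Inf anchors each segment at the axis, as in the answer above.

```r
library(ggplot2)

df <- data.frame(col1 = c(1, 2, 3), col2 = c(2, 9, 6))

# Row holding the maximum y value and its x value
peak <- df[which.max(df$col2), ]

p <- ggplot(df, aes(x = col1, y = col2)) +
  geom_line() +
  # vertical segment: from the x-axis up to the maximum
  annotate("segment", x = peak$col1, xend = peak$col1,
           y = -Inf, yend = peak$col2, colour = "red") +
  # horizontal segment: from the y-axis across to the maximum
  annotate("segment", x = -Inf, xend = peak$col1,
           y = peak$col2, yend = peak$col2, colour = "red")
p
```

Unlike geom_hline() and geom_vline(), these segments stop at the maximum instead of running across the whole panel.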

Add a labelling function to just first or last ggplot label

I often find myself working with data with long-tail distributions, so that a huge amount of the range in values happens in the top 1-2% of the data. When I plot the data, the upper outliers cause variation in the rest of the data to wash out, but I want to show those differences.
I know there are other ways of handling this, but I found that capping the values towards the end of the distribution and then applying a continuous color palette (i.e., in ggplot) is one way that works for me to represent the data. However, I want to ensure the legend stays accurate by adding a >= sign to the last legend label.
The picture below shows the kind of legend I'd like to achieve programmatically, with the >= sign drawn in messily in red.
I also know I can manually set breaks and labels, but I'd really like to just do something like if (it's the last label) paste0(">=", label) else label (to show with pseudocode).
Reproducible example:
(I want to alter the plot legend to prefix just the last label)
library(ggplot2)
library(dplyr)  # provides tibble() and %>%
set.seed(123)
x <- rnorm(1e3)
y <- rnorm(1e3)
z <- rnorm(1e3, mean = 50, sd = 15)
d <- tibble(x = x
            ,y = y
            ,z = z)
d %>%
  ggplot(aes(x = x
             ,y = y
             ,fill = z
             ,color = z)) +
  geom_point() +
  scale_color_viridis_c()
One option would be to pass a function to the labels argument which replaces the last element or label with your desired label like so:
library(ggplot2)
set.seed(123)
x <- rnorm(1e3)
y <- rnorm(1e3)
z <- rnorm(1e3, mean = 50, sd = 15)
d <- data.frame(
  x = x,
  y = y,
  z = z
)
ggplot(d, aes(
  x = x,
  y = y,
  fill = z,
  color = z
)) +
  geom_point() +
  scale_fill_continuous(labels = function(x) {
    # x holds the legend breaks; prefix only the last one
    x[length(x)] <- paste0(">=", x[length(x)])
    x
  }, aesthetics = c("color", "fill"))
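The labelling function can also be pulled out and tried on a plain vector of breaks. The sketch below uses a hypothetical helper name, label_last_ge, and adds one assumption beyond the answer: it targets the last non-missing break, in case the scale passes NA for breaks that fall outside the limits. The cap at 80 is likewise only an illustration of the capping described in the question.

```r
library(ggplot2)

# Hypothetical helper: prefix ">=" to the last non-missing break label
label_last_ge <- function(breaks) {
  labs <- as.character(breaks)
  idx <- which(!is.na(breaks))
  if (length(idx) > 0) {
    labs[max(idx)] <- paste0(">=", labs[max(idx)])
  }
  labs
}

# Trying it on a plain vector of breaks
label_last_ge(c(20, 40, 60, 80))

set.seed(123)
d <- data.frame(x = rnorm(1e3), y = rnorm(1e3),
                z = rnorm(1e3, mean = 50, sd = 15))
d$z_capped <- pmin(d$z, 80)  # assumed cap, for illustration only

p <- ggplot(d, aes(x, y, color = z_capped)) +
  geom_point() +
  scale_color_viridis_c(labels = label_last_ge)
p
```

Passing the function by name keeps the original viridis palette from the question while changing only the legend labels.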

Is there a programmatic way to pass specific ranges for the y-axis on a ggplot2 plot?

I've got plots that are being generated automatically based on some user inputs. Most of the time, the plots work fine. However, some users have requested to ensure that there is always an axis label on each end of the plotted data. For example, this plot:
sample_data <-
data.frame(
x = rep(LETTERS[1:3], each = 3)
, y = 1:9 + 0.5
)
ggplot(
sample_data
, aes(x = x, y = y)) +
stat_summary(
fun = "mean"
)
Has no label above the top point or below the bottom point. I can add them easily enough with expand_limits:
ggplot(
sample_data
, aes(x = x, y = y)) +
stat_summary(
fun = "mean"
) +
expand_limits(y = c(2, 10))
However, because these plots are being automatically generated, I cannot manually add the next axis point each time. I've tried passing only.loose = TRUE to labeling::extended, but that still doesn't change the displayed values (any more than entering the values that I want would):
ggplot(
sample_data
, aes(x = x, y = y)) +
stat_summary(
fun = "mean"
) +
scale_y_continuous(breaks = breaks_extended(only.loose = TRUE))
In addition, some of the plots are more complex than this (e.g., with or without confidence intervals, additional grouping, etc.), and the data is prepared for the plot using dplyr and piped directly into ggplot (with %>%). So, even something like recalculating the values is non-trivial.
In fact, even in this case, it fails because adding the expanded points to capture the next set of labels changes the labeling.
ggplot(
sample_data
, aes(x = x, y = y)) +
stat_summary(
fun = "mean"
) +
scale_y_continuous(breaks = breaks_extended(n = 5
, only.loose = TRUE)) +
expand_limits(y =
sample_data %>%
group_by(x) %>%
summarise(my_mean = mean(y)) %>%
pull(my_mean) %>%
range() %>%
{labeling::extended(.[1], .[2], 5
, only.loose = TRUE)}
)
It appears that this happens because
labeling::extended(2.5, 8.5, 5, only.loose = TRUE)
returns the range 2 to 9 by 1's, while:
labeling::extended(2, 9, 5, only.loose = TRUE)
returns the range 2 to 10 by 2's. Somehow, breaks_extended is throwing in some added variation, though whether I track it down or not doesn't change much. I could work around this by calculating the breaks first, but (again) this is for a fairly complicated set of plots.
I feel like I am missing some sort of obvious point, but it keeps eluding me.
Yes, there is a programmatic way to set the limits on y-scales: provide a function to the limits argument. The function receives the natural data limits as input, which you can then edit programmatically. The same goes for breaks, except that the function receives the limits as input.
The example below is one way to do it; exactly how the code should look depends on your specifications.
library(ggplot2)
sample_data <- data.frame(
x = rep(LETTERS[1:3], each = 3),
y = 1:9 + 0.5
)
ggplot(sample_data,
aes(x = x, y = y)) +
stat_summary(
fun = "mean"
) +
scale_y_continuous(
limits = function(x) {
lower <- floor(x[1])
lower <- ifelse(x[1] - lower < 0.5, lower - 1, lower)
upper <- ceiling(x[2])
upper <- ifelse(upper - x[2] <= 0.5, upper + 1, upper)
c(lower, upper)
},
breaks = function(x) {
scales::breaks_pretty()(x)
}
)
#> Warning: Removed 3 rows containing missing values (geom_segment).
Created on 2021-03-23 by the reprex package (v1.0.0)
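To check what the limits function does without drawing anything, it can be lifted out and called on a pair of limits directly. This is a sketch only: the name widen_limits is hypothetical, and the input c(2.5, 8.5) is the range of the group means in sample_data, assumed here to be what the scale passes in.

```r
# Same logic as the anonymous limits function above, given a name
widen_limits <- function(x) {
  lower <- floor(x[1])
  # step down one more unit if the data start close to the integer below
  lower <- ifelse(x[1] - lower < 0.5, lower - 1, lower)
  upper <- ceiling(x[2])
  # step up one more unit if the data end close to the integer above
  upper <- ifelse(upper - x[2] <= 0.5, upper + 1, upper)
  c(lower, upper)
}

widen_limits(c(2.5, 8.5))
#> [1]  2 10
```

Note the asymmetry between the two comparisons (< at the bottom, <= at the top): a value exactly 0.5 above an integer keeps that integer as the lower limit, while a value exactly 0.5 below an integer pushes the upper limit one step further.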
Inspired by teunbrand, I built a function that generates the limits, then checks that the expansion (including the 5% buffer) does not change the output of pretty:
my_lims_expand <- function(x){
prev_pass <-
range(pretty(x))
curr_pass <-
pretty(c(prev_pass[1] - 0.05 * diff(prev_pass)
, prev_pass[2] + 0.05 * diff(prev_pass)))
last_under <-
tail(which(curr_pass < min(x)), 1)
first_over <-
head(which(curr_pass > max(x)), 1)
out <-
range(curr_pass[last_under:first_over])
confirm_out <-
range(pretty(out))
while(!all(out == confirm_out)){
prev_pass <- curr_pass
curr_pass <-
pretty(c(prev_pass[1] - 0.05 * diff(prev_pass)
, prev_pass[2] + 0.05 * diff(prev_pass)))
last_under <-
tail(which(curr_pass < min(x)), 1)
first_over <-
head(which(curr_pass > max(x)), 1)
out <-
range(curr_pass[last_under:first_over])
confirm_out <-
range(pretty(out))
}
return(out)
}
Then, I can use that function for limits:
ggplot(sample_data,
aes(x = x, y = y)) +
stat_summary(
fun = "mean"
) +
scale_y_continuous(
limits = my_lims_expand
, breaks = pretty
)
to generate the desired plot:

Can I draw a horizontal line at specific number of range of values using ggplot2?

I have data (from excel) with the y-axis as ranges (also calculated in excel) and the x-axis as cell counts and I would like to draw a horizontal line at a specific value in the range, like a reference line. I tried using geom_hline(yintercept = 450) but I am sure it is quite naive and does not work that way for a number in range. I wonder if there are any better suggestions for it :)
plot.new()
library(ggplot2)
d <- read.delim("C:/Users/35389/Desktop/R.txt", sep = "\t")
head(d)
d <- cbind(row.names(d), data.frame(d), row.names=NULL)
d
g <- ggplot(d, aes(d$CTRL,d$Bin.range))+ geom_col()
g + geom_hline(yintercept = 450)
First of all, have a look at my comments.
Second, this is how I suggest you proceed: don't calculate those ranges in Excel. Let ggplot do it for you.
Say, your data is like this:
df <- data.frame(x = runif(100, 0, 500))
head(df)
#> x
#>1 322.76123
#>2 57.46708
#>3 223.31943
#>4 498.91870
#>5 155.05416
#>6 107.27830
Then you can make a plot like this:
library(ggplot2)
ggplot(df) +
geom_histogram(aes(x = x),
boundary = 0,
binwidth = 50,
fill = "steelblue",
colour = "white") +
geom_vline(xintercept = 450, colour = "red", linetype = 2, size = 1) +
coord_flip()
We don't have your data, but the following data frame is of a similar structure:
d <- data.frame(CTRL = sample(100, 10),
Bin.range = paste(0:9 * 50, 0:9 * 50 + 49.9, sep = "-"))
The first thing to note is that your y axis does not have your ranges ordered correctly. You have 50-99.9 at the top of the y axis. This is because your ranges are stored as characters and ggplot will automatically arrange these alphabetically, not numerically. So you need to reorder the factor levels of your ranges:
d$Bin.range <- factor(d$Bin.range, d$Bin.range)
When you create your plot, don't use d$Bin.range, but instead just use Bin.range. ggplot knows to look for this variable in the data frame you have passed.
g <- ggplot(d, aes(CTRL, Bin.range)) + geom_col()
If you want to draw a horizontal line, your two options are to specify the y axis label at which you want to draw the line (i.e. yintercept = "400-449.9") or, which is what I suspect you want, use a numeric value of 9.5 which will put it between the top two values:
g + geom_hline(yintercept = 9.5, linetype = 2)
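The value 9.5 can also be derived from the threshold rather than counted by hand. A sketch, assuming the labels always have the form "lower-upper" as in the example data (the names lower and line_pos are illustrative): each category sits at an integer position, so the line belongs half a step above the last bin that starts below the threshold.

```r
library(ggplot2)

set.seed(1)
d <- data.frame(CTRL = sample(100, 10),
                Bin.range = paste(0:9 * 50, 0:9 * 50 + 49.9, sep = "-"))
d$Bin.range <- factor(d$Bin.range, d$Bin.range)

# Lower bound of each bin, parsed from the text before the hyphen
lower <- as.numeric(sub("-.*", "", levels(d$Bin.range)))

# Number of bins starting below 450, plus half a step
line_pos <- sum(lower < 450) + 0.5

g <- ggplot(d, aes(CTRL, Bin.range)) + geom_col()
g + geom_hline(yintercept = line_pos, linetype = 2)
```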

Adding dummy values on axis in ggplot2 to add asymmetric distance between ticks

How to add dummy values on x-axis in ggplot2
I have 0, 2, 4, 6, 12, 14, 18, 22, 26 in my data, and I have plotted those on the x-axis. Is there a way to add the remaining even numbers, for which there is no data in the table? This would create the appropriate gaps on the x-axis.
Afterwards, the x-axis should show 0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26.
I have already tried using rbind.fill to add dummy data, but when I convert the values to a factor, the added values (8, 10, etc.) come last.
Thanks
Hope this makes sense:
library(ggplot2)
gvals <- factor(letters[1:3])
xvals <- factor(c(0,2,4,6,12,14,18,22,26), levels = seq(0, 26, by = 2))
yvals <- rnorm(10000, mean = 2)
df <- data.frame(x = sample(xvals, size = length(yvals), replace = TRUE),
y = yvals,
group = sample(gvals, size = length(yvals), replace = TRUE))
ggplot(df, aes(x = x, y = y)) + geom_boxplot(aes(fill = group)) +
scale_x_discrete(drop = FALSE)
The tricks are to make the x-variable with all levels you need and to specify drop = FALSE in scale.
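A short sketch of the first trick on its own, showing that declaring the levels up front keeps the empty categories and their numeric order (the name obs is illustrative). This is also what avoids the problem in the question of the added values ending up last.

```r
obs <- c(0, 2, 4, 6, 12, 14, 18, 22, 26)

# Declare every even number from 0 to 26 as a level, observed or not
xvals <- factor(obs, levels = seq(0, 26, by = 2))

nlevels(xvals)
#> [1] 14

# Levels with no observations; drop = FALSE keeps these on the axis
setdiff(levels(xvals), as.character(obs))
#> [1] "8"  "10" "16" "20" "24"
```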
