R ggplot annotate line plot with max value

I have a line plot created with this code:
# Create data
year <- c(2006,2007,2008,2009,2010,2011,2012,2013,2014)
sales <- c(4176,8560,6473,10465,14977,15421,14805,11183,10012)
df <- data.frame(year,sales)
# Plot
ggplot(data = df,aes(year, sales),group = 1) + geom_point() + geom_line()
I would like to annotate it with a line that "shows" the maximum value like the example below:
Is this possible with ggplot?

Yes. For your current example, try this:
ggplot(data = df,aes(year, sales),group = 1) + geom_point() + geom_line() +
geom_segment(aes(x = 2011, y = 0, xend = 2011, yend = 15421),linetype="dashed", color = "red")
Of course, for more general plotting needs, you can compute these values from the data instead of typing them in by hand.
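A minimal sketch of that more general version, assuming the line should still run from y = 0 up to the maximum. The name peak is made up for illustration, and annotate("segment", ...) is swapped in for geom_segment(aes(...)) so that only one segment is drawn rather than one per row of df:

```r
library(ggplot2)

year  <- c(2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014)
sales <- c(4176, 8560, 6473, 10465, 14977, 15421, 14805, 11183, 10012)
df <- data.frame(year, sales)

# Row of df holding the largest sales value (hypothetical helper name)
peak <- df[which.max(df$sales), ]

p <- ggplot(df, aes(year, sales)) +
  geom_point() +
  geom_line() +
  # dashed red line from the x-axis up to the maximum, taken from the data
  annotate("segment",
           x = peak$year, xend = peak$year,
           y = 0, yend = peak$sales,
           linetype = "dashed", colour = "red")

p
```

If several years tie for the maximum, which.max() only returns the first one, so this marks a single peak.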

This is close, just needs the arrow:
ggplot(data = df,aes(year, sales),group = 1) + geom_point() + geom_line() + theme_bw() +
geom_linerange(aes(ymax=sales, ymin=min(df$sales)),
data=df[which.max(df$sales),],
col="red", lty=2) +
geom_text(aes(label=sales),data=df[which.max(df$sales),], hjust=1.2, vjust=3)
It works by adding geom_linerange and geom_text geoms but setting the data for each to be the row of the original dataset corresponding to the maximum of the 'sales' column.
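For the missing arrow, something along these lines might do. This is only a sketch: the example image is not available here, so the arrow direction, its position and the offsets (2, 0.15, 2.1) are guesses chosen by eye, and peak is a made-up name:

```r
library(ggplot2)

df <- data.frame(
  year  = 2006:2014,
  sales = c(4176, 8560, 6473, 10465, 14977, 15421, 14805, 11183, 10012)
)

# Row of df holding the largest sales value (hypothetical helper name)
peak <- df[which.max(df$sales), ]

p <- ggplot(df, aes(year, sales)) +
  geom_point() +
  geom_line() +
  theme_bw() +
  # dashed line from the smallest sales value up to the peak
  annotate("segment",
           x = peak$year, xend = peak$year,
           y = min(df$sales), yend = peak$sales,
           colour = "red", linetype = "dashed") +
  # horizontal arrow pointing at the peak (offsets are assumptions)
  annotate("segment",
           x = peak$year - 2, xend = peak$year - 0.15,
           y = peak$sales, yend = peak$sales,
           arrow = arrow(length = unit(0.2, "cm"))) +
  # label with the maximum value, placed at the tail of the arrow
  annotate("text",
           x = peak$year - 2.1, y = peak$sales,
           label = peak$sales, hjust = 1)

p
```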

Related

How do I add data labels to a ggplot histogram with a log(x) axis?

I am wondering how to add data labels to a ggplot showing the true value of the data points when the x-axis is in log scale.
I have this data:
date <- c("4/3/2021", "4/7/2021","4/10/2021","4/12/2021","4/13/2021","4/13/2021")
amount <- c(105.00, 96.32, 89.00, 80.84, 121.82, 159.38)
address <- c("A","B","C","D","E","F")
df <- data.frame(date, amount, address)
And I plot it in ggplot2:
plot <- ggplot(df, aes(x = log(amount))) +
geom_histogram(binwidth = 1)
plot + theme_minimal() + geom_text(label = amount)
... but I get the error
"Error: geom_text requires the following missing aesthetics: y"
I have 2 questions as a result:
Why am I getting this error with geom_histogram? Shouldn't it assume to use count as the y value?
Will this successfully show the true values of the data points from the 'amount' column despite the plot's log scale x-axis?
Perhaps like this?
ggplot(df, aes(x = log(amount), y = ..count.., label = ..count..)) +
geom_histogram(binwidth = 1) +
stat_bin(geom = "text", binwidth = 1, vjust = -0.5) +
theme_minimal()
ggplot2 layers do not (at least in any situation I can think of) use the summary calculations of other layers, so I think the simplest thing would be to replicate the calculation using stat_bin(geom = "text", ...).
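This is also why geom_text() complained about a missing y: it uses the identity stat by default, so it never sees the counts that geom_histogram() computes in its own layer. In recent ggplot2 versions (3.3.0 and later, as far as I know) the same idea can be written with after_stat() instead of the ..count.. notation. A sketch, with bin_counts as a made-up name used only to inspect the result:

```r
library(ggplot2)

amount <- c(105.00, 96.32, 89.00, 80.84, 121.82, 159.38)
df <- data.frame(amount)

p <- ggplot(df, aes(x = log(amount))) +
  geom_histogram(binwidth = 1) +
  # repeat the binning in a text layer and label each bar with its count
  stat_bin(aes(y = after_stat(count), label = after_stat(count)),
           geom = "text", binwidth = 1, vjust = -0.5) +
  theme_minimal()

# Counts computed by the text layer (layer 2), one value per bin
bin_counts <- layer_data(p, 2)$count

p
```

Note that the labels are bin counts, not the individual amount values; a histogram has one bar per bin, so there is no single amount to print per bar.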
Or perhaps simpler, you could pre-calculate the numbers:
library(dplyr)
df %>%
count(log_amt = round(log(amount))) %>%
ggplot(aes(log_amt, n, label = n)) +
geom_col(width = 1) +
geom_text(vjust = -0.5)
EDIT -- to show buckets without the log transform we could use:
df %>%
count(log_amt = round(log(amount))) %>%
ggplot(aes(log_amt, n, label = n)) +
geom_col(width = 0.5) +
geom_text(vjust = -0.5) +
# log() is the natural logarithm, so invert it with exp() rather than 10^
scale_x_continuous(labels = ~scales::comma(exp(.)),
minor_breaks = NULL)

Creating a legend with shapes using ggplot2

I have created the following code for a graph in which four fitted lines and corresponding points are plotted. I have problems with the legend. For some reason I cannot find a way to assign the different shapes of the points to a variable name. Also, the colours do not line up with the actual colours in the graph.
y1 <- c(1400,1200,1100,1000,900,800)
y2 <- c(1300,1130,1020,970,830,820)
y3 <- c(1340,1230,1120,1070,940,850)
y4 <- c(1290,1150,1040,920,810,800)
df <- data.frame(x,y1,y2,y3,y4)
g <- ggplot(df, aes(x=x), shape="shape") +
geom_smooth(aes(y=y1), colour="red", method="auto", se=FALSE) + geom_point(aes(y=y1),shape=14) +
geom_smooth(aes(y=y2), colour="blue", method="auto", se=FALSE) + geom_point(aes(y=y2),shape=8) +
geom_smooth(aes(y=y3), colour="green", method="auto", se=FALSE) + geom_point(aes(y=y3),shape=6) +
geom_smooth(aes(y=y4), colour="yellow", method="auto", se=FALSE) + geom_point(aes(y=y4),shape=2) +
ylab("x") + xlab("y") + labs(title="overview")
geom_line(aes(y=1000), linetype = "dashed")
theme_light() +
theme(plot.title = element_text(color="black", size=12, face="italic", hjust = 0.5)) +
scale_shape_binned(name="Value g", values=c(y1="14",y2="8",y3="6",y4="2"))
print(g)
I am wondering why the colours don't match up and how I can construct such a legend that it is clear which shape corresponds to which variable name.
While you can add the legend manually via scale_shape_manual, a more suitable solution would probably be to reshape your data (try using tidyr::pivot_longer() on the y1:y4 variables) and then assign the resulting variable to the shape aesthetic (you can then manually set the colors to your liking). You would then need only a single geom_point() and geom_smooth() instead of four of each.
Also, you're missing a reproducible example (what are the values of x?), and your code emits some warnings while trying to perform loess smoothing (because there are fewer data points than it needs).
Update (2021-12-12)
Here's a reproducible example in which we reshape the original data and feed it to ggplot using its aes() function to automatically plot different geom_point and geom_smooth for each "y group". I made up the values for the x variable.
library(ggplot2)
library(tidyr)
x <- 1:6
y1 <- c(1400,1200,1100,1000,900,800)
y2 <- c(1300,1130,1020,970,830,820)
y3 <- c(1340,1230,1120,1070,940,850)
y4 <- c(1290,1150,1040,920,810,800)
df <- data.frame(x,y1,y2,y3,y4)
data2 <- df %>%
pivot_longer(y1:y4, names_to = "group", values_to = "y")
ggplot(data2, aes(x, y, color = group, shape = group)) +
geom_point(size = 3) + # increased size for increased visibility
geom_smooth(method = "auto", se = FALSE)
Run the code line by line in RStudio and inspect data2; I think it'll make more sense then. Here's the resulting output:
Another update
Freek19, in your second example you'll need to specify both the shape and color scales manually and give them the same name, so that ggplot2 considers them to be the same and merges them into a single legend, like so:
library(ggplot2)
data <- ... # from your previous example
ggplot(data, aes(x, y, shape = group, color = group)) +
geom_smooth() +
geom_point(size = 3) +
scale_shape_manual("Program type", values=c(1, 2, 3,4,5)) +
scale_color_manual("Program type", values=c(1, 2, 3,4,5))
Hope this helps.
I managed to get close to what I want, using:
library(ggplot2)
data <- data.frame(x = c(0,0.02,0.04,0.06,0.08,0.1),
y = c(1400,1200,1100,1000,910,850, #y1
1300,1130,1010,970,890,840, #y2
1200,1080,980,950,880,820, #y3
1100,1050,960,930,830,810, #y4
1050,1000,950,920,810,800), #y5
group = rep(c("5%","6%","7%","8%","9%"), each = 6))
data
Values <- ggplot(data, aes(x, y, shape = group, color = group)) + # Create line plot with default colors
geom_smooth(aes(color=group)) + geom_point(aes(shape=group),size=3) +
scale_shape_manual(values=c(1, 2, 3,4,5))+
geom_line(aes(y=1000), linetype = "dashed") +
ylab("V(c)") + xlab("c") + labs(title="Valuation")+
theme_light() +
theme(plot.title = element_text(color="black", size=12, face="italic", hjust = 0.5))+
labs(group="Program Type")
Values
I am now left with two legends. I want to change both names, because otherwise they overlap. However, I am not sure how to do this.
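One way to rename both legends at once, sketched under a few assumptions: the title "Program type" is what is wanted for both, geom_line() is swapped in for geom_smooth() to avoid the loess warnings with only six points per group, and geom_hline() replaces geom_line(aes(y = 1000)). labs(group = ...) has no effect on the legend because group has no scale; the titles belong to the colour and shape aesthetics, and ggplot2 should merge the two legends when they share the same title and labels:

```r
library(ggplot2)

data <- data.frame(
  x = rep(c(0, 0.02, 0.04, 0.06, 0.08, 0.1), times = 5),
  y = c(1400, 1200, 1100, 1000, 910, 850,   # y1
        1300, 1130, 1010,  970, 890, 840,   # y2
        1200, 1080,  980,  950, 880, 820,   # y3
        1100, 1050,  960,  930, 830, 810,   # y4
        1050, 1000,  950,  920, 810, 800),  # y5
  group = rep(c("5%", "6%", "7%", "8%", "9%"), each = 6)
)

p <- ggplot(data, aes(x, y, colour = group, shape = group)) +
  geom_line() +
  geom_point(size = 3) +
  geom_hline(yintercept = 1000, linetype = "dashed") +
  scale_shape_manual(values = c(1, 2, 3, 4, 5)) +
  # same title for both aesthetics, so the legends can be merged
  labs(x = "c", y = "V(c)", title = "Valuation",
       colour = "Program type", shape = "Program type") +
  theme_light() +
  theme(plot.title = element_text(color = "black", size = 12,
                                  face = "italic", hjust = 0.5))

p
```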

How to stop ggplot line plot adding fill

I am producing a ggplot which looks at a curve in a dataset. When I build the plot, ggplot is automatically adding fill to data which is on the negative side of the x axis. Script and plot shown below.
ggplot(df, aes(x = Var1, y = Var2)) +
geom_line() +
geom_vline(xintercept = 0) +
geom_hline(yintercept = Var2[1])
Using base R, I am able to get the plot shown below which is how it should look.
plot(x = df$Var1, y = df$Var2, type = "l",
xlab = "Var1", ylab = "Var2")
abline(v = 0)
abline(h = df$Var2[1])
If anyone could help identify why I might be getting the automatic fill and how I could make it stop, I would be very appreciative. I would like to make this work in ggplot so I can later animate the line as it is a time series that can be used to compare between other datasets from the same source.
Can add data if necessary. Data set is 1561 obs long however. Thanks in advance.
I guess you should try
ggplot(df, aes(x = Var1, y = Var2)) +
geom_path() +
geom_vline(xintercept = 0) +
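geom_hline(yintercept = df$Var2[1])
instead. geom_line() connects the points in order of the variable on the x-axis, whereas geom_path() connects them in the order in which they appear in the data. (Note the df$Var2[1]: outside aes(), a bare Var2 is not looked up in df, so it only works if a separate Var2 vector exists in your workspace.)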
Take a look at this example
dt <- data.frame(
x = c(seq(-pi/2,3*pi,0.001),seq(-pi/2,3*pi,0.001)),
y = c(sin(seq(-pi/2,3*pi,0.001)), cos(seq(-pi/2,3*pi,0.001)))
)
ggplot(dt, aes(x,y)) + geom_line()
The two points with x-coordinate -pi/2 will be connected first, creating a vertical black line. Next x = -pi/2 + 0.001 will be processed and so on. The x values will be processed in order.
Therefore you should use geom_path() to get the desired result
dt <- data.frame(
x = c(seq(-pi/2,3*pi,0.001),seq(-pi/2,3*pi,0.001)),
y = c(sin(seq(-pi/2,3*pi,0.001)), cos(seq(-pi/2,3*pi,0.001)))
)
ggplot(dt, aes(x,y)) + geom_path()
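The ordering difference can also be checked on a tiny data set. A minimal sketch with made-up data (d, line_x and path_x are names chosen for illustration); layer_data() returns the data a layer actually draws:

```r
library(ggplot2)

# Three points whose x values are deliberately out of order
d <- data.frame(x = c(1, 3, 2), y = c(1, 2, 3))

p_line <- ggplot(d, aes(x, y)) + geom_line()
p_path <- ggplot(d, aes(x, y)) + geom_path()

# geom_line() sorts by x before drawing; geom_path() keeps the row order
line_x <- layer_data(p_line)$x
path_x <- layer_data(p_path)$x

print(line_x)
print(path_x)
```

With 1561 observations that move back and forth along the x-axis, the sorted version zigzags so densely that it can look like a filled area.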

geom_area ggplot fill above threshold with data subset

I read several posts on how to fill above geom_area plots using the geom_ribbon function, but none have also dealt with subsets of data.
Consider the following data and plot. I simply want to fill above a threshold, 25 in this example (y-axis), but also fill only within a subset of days within each month, in this case between days 2 and 12. In sum, both criteria must be met in order to fill, and I'm trying to get a smooth fill.
I can improve upon my graph below by using the approx function to interpolate a lot of points on my line, but it still does not handle my subset and connects fill lines between months.
library(ggplot2)
y = sample(1:50)
x = seq(as.Date("2011-12-30"), as.Date("2012-02-17"), by="days", origin="1970-01-01")
z = format(as.Date(x), "%d")
z=as.numeric(z)
df <- data.frame(x,y,z)
plot<-ggplot(df, aes(x=x, y=y)) +
geom_area(fill="transparent") +
geom_ribbon(data=subset(df, z>=2 & z<=12), aes(ymin=25, ymax=ifelse(y< 25,20, y)), fill = "red", alpha=0.5) +
geom_line() +
geom_hline(yintercept = 25, linetype="dashed") +
labs(y="My data") +
theme_bw(base_size = 22)
plot
Figure
The data=subset(df, z>=2 & z<=12) removes rows from the data frame, so that data is 'lost' for geom_ribbon, which then connects the remaining points across the gaps between months.
Instead of subsetting, an additional condition on the y-value may get you closer to what you want to achieve:
plot<-ggplot(df, aes(x=x, y=y)) +
geom_area(fill="transparent") +
geom_ribbon(data=df, aes(ymin=25, ymax=ifelse((z>=2 & z<=12), ifelse(y < 25, 20, y), 25)), fill = "red", alpha=0.5) +
geom_line() +
geom_hline(yintercept = 25, linetype="dashed") +
labs(y="My data") +
theme_bw(base_size = 22)
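A variation on the same idea, as a sketch rather than a definitive fix: the upper edge of the ribbon is precomputed in a column (ymax, a made-up name) using pmax(), so that it never drops below the threshold and nothing is filled underneath it. The fixed seed and the threshold variable are assumptions added only to make the example reproducible and readable:

```r
library(ggplot2)

set.seed(1)  # assumption: fixed seed only for reproducibility
x <- seq(as.Date("2011-12-30"), as.Date("2012-02-17"), by = "days")
y <- sample(1:50)
z <- as.numeric(format(x, "%d"))  # day of month
df <- data.frame(x, y, z)

threshold <- 25

# Upper edge of the fill: the data value on days 2 to 12 when it is above
# the threshold, otherwise the threshold itself (a zero-height ribbon)
df$ymax <- ifelse(df$z >= 2 & df$z <= 12, pmax(df$y, threshold), threshold)

p <- ggplot(df, aes(x = x, y = y)) +
  geom_ribbon(aes(ymin = threshold, ymax = ymax), fill = "red", alpha = 0.5) +
  geom_line() +
  geom_hline(yintercept = threshold, linetype = "dashed") +
  labs(y = "My data") +
  theme_bw(base_size = 22)

p
```

The fill edges are still drawn between daily points, so they will not follow the line exactly where it crosses the threshold; interpolating extra points with approx(), as mentioned in the question, would still be needed for that.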

R: prevent break in line showing time series data using ggplot geom_line

Using ggplot2 I want to draw a line that changes colour after a certain date. I expected this to be simple, but I get a break in the line at the point the colour changes. Initially I thought this was a problem with group (as per this question; this other question also looked relevant but wasn't quite what I needed). Having messed around with the group aesthetic for 30 minutes I can't fix it so if anybody can point out the obvious mistake...
Code:
require(ggplot2)
set.seed(1111)
mydf <- data.frame(mydate = seq(as.Date('2013-01-01'), by = 'day', length.out = 10),
y = runif(10, 100, 200))
mydf$cond <- ifelse(mydf$mydate > '2013-01-05', "red", "blue")
ggplot(mydf, aes(x = mydate, y = y, colour = cond)) +
geom_line() +
scale_colour_identity(mydf$cond) +
theme()
If you set group=1, then 1 will be used as the group value for all data points, and the line will join up.
ggplot(mydf, aes(x = mydate, y = y, colour = cond, group=1)) +
geom_line() +
scale_colour_identity(mydf$cond) +
theme()
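The reason is that, when no group is given, ggplot2 derives the grouping from the discrete variables in the plot, here cond, so "blue" and "red" become two separate lines. A small sketch that makes this visible by looking at the group column of the built layer data (p_split, p_joined, groups_split and groups_joined are names made up for illustration, and scale_colour_identity() is called without arguments here):

```r
library(ggplot2)

set.seed(1111)
mydf <- data.frame(
  mydate = seq(as.Date("2013-01-01"), by = "day", length.out = 10),
  y = runif(10, 100, 200)
)
mydf$cond <- ifelse(mydf$mydate > as.Date("2013-01-05"), "red", "blue")

# Default grouping: one group per value of cond, so the line breaks
p_split <- ggplot(mydf, aes(x = mydate, y = y, colour = cond)) +
  geom_line() +
  scale_colour_identity()

# group = 1 puts every point in the same group, so the line joins up
p_joined <- ggplot(mydf, aes(x = mydate, y = y, colour = cond, group = 1)) +
  geom_line() +
  scale_colour_identity()

groups_split  <- unique(layer_data(p_split)$group)
groups_joined <- unique(layer_data(p_joined)$group)

print(groups_split)
print(groups_joined)
```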
