geom_vline for values over a threshold on Y-axis - r

I have a ggplot of temperature values plotted against time. I'd like to add vertical lines to my graph where temperature exceeds a threshold (let's say 12 degrees).
reprex:
#example data
Temp <- c(10.55, 11.02, 6.75, 12.55, 15.5)
Date <- c("01/01/2000", "02/01/2000", "03/01/2000", "04/01/2000", "05/01/2000")
#data.frame
df1 <- data.frame(Temp, Date)
#plot
df1%>%
ggplot(aes(Date, format(as.numeric(Temp))))+
geom_line(group=1)
I thought I could maybe do something with geom_hline and then rotate 90 degrees. I went about this by trying to create an object of all values (to 2dp) between 12 and 20. I would then tell geom_hline to use that object to match values and draw the lines.
Then I get a bit stuck. I don't really know how to rotate the lines or whether that's even a good idea.
Disclaimer: I know my dates are not actually dates in the reprex, but they are in my real data.

geom_vline can accept an xintercept either
in the xintercept parameter (if you want to specify it manually) or
in aes(xintercept = ...) if you want to use values from a data frame. We can use data = . %>% filter... to use the same data frame that came into ggplot, but apply some further manipulations.
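library(dplyr)
library(ggplot2)
df1 %>%
  mutate(Date = as.Date(Date, "%m/%d/%Y")) %>%
  ggplot(aes(Date, Temp)) +
  geom_line() +
  # the data argument is a function applied to the data passed to ggplot()
  geom_vline(data = . %>% filter(Temp > 12),
             aes(xintercept = Date))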
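To see what the data = . %>% filter(...) argument does, here is a minimal sketch, assuming dplyr is loaded (it supplies %>% and filter). A pipeline that starts with . builds a function instead of running straight away, and ggplot2 applies a function passed as data to the plot's own data. The name over_threshold is made up for this illustration.

```r
library(dplyr)

# example data from the question
df1 <- data.frame(
  Temp = c(10.55, 11.02, 6.75, 12.55, 15.5),
  Date = c("01/01/2000", "02/01/2000", "03/01/2000", "04/01/2000", "05/01/2000")
)

# a pipeline that starts with `.` is a function of its input
# (over_threshold is a hypothetical name, not part of any package)
over_threshold <- . %>% filter(Temp > 12)

# calling it by hand returns the rows geom_vline would draw lines for
hot <- over_threshold(df1)
as.character(hot$Date)
```

The same function could be reused in several layers, for example to add points or labels at the dates that exceed the threshold.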

If you want to have vertical lines starting from the level of 12:
ggplot(df1, aes(Date, as.numeric(Temp))) +
  geom_line(group = 1) +
  geom_segment(data = df1[df1$Temp > 12, ],
               aes(x = Date,
                   xend = Date,
                   y = 12,
                   yend = Temp),
               color = "blue", lwd = 1)

Related

How to order time in y axis

I have a data frame (a tibble) like this:
library(tidyverse)
library(lubridate)
x = tibble(date=c("2022-04-25 07:04:07", "2022-04-25 07:09:07", "2022-04-25 07:14:07", "2022-04-26 07:04:07"),
value=c("on", "off", "on", "off"))
x$day<- as.factor(day(x$date))
x$time <- paste0(str_pad(hour(x$date),2,pad="0"),":",str_pad(minute(x$date),2,pad="0"))
When I plot the data:
x %>% ggplot() + geom_col(aes(x=day,y=time, fill=value))
the times in the y axis do not follow the bars. Each time data is supposed to be side by side with each bar segment.
I tried using as.factor(time) but that didn't solve it.
I also tried to add a numeric scale:
x = tibble(date=c("2022-04-25 07:04:07", "2022-04-25 07:09:07", "2022-04-25 07:14:07", "2022-04-26 07:04:07"),
fake_y=c(1,1,1,1),
value=c("on", "off", "on", "off"))
x$day <- as.factor(day(x$date))
x %>% ggplot() + geom_col(aes(x=day,y=fake_y, fill=value))
but then the order of the on/off bars is lost.
How can I fix this?
Since you are looking for a time line, you would probably be best with geom_segment rather than geom_col. The reason is that since you might have multiple 'on' or 'off' values in a single day, it would be difficult to get these to stack correctly. You would also need to diff the on-off times to get them to stack. Furthermore, your labels would be wrong using columns if "off" represents the time of going from an on state to an off state.
When working with times in R, it is often best to keep them in time format for plotting. If you convert times to character strings before plotting, they will be interpreted as factor levels, and therefore will not be proportionately spaced correctly.
Since you want to have the day along one axis, you will need quite a bit of data manipulation to ensure that you record the state at the start of each day and the end of each day, but it can be achieved by doing:
p <- x %>%
  mutate(date = as.POSIXct(date)) %>%
  mutate(day = as.factor(day(date))) %>%
  group_by(day) %>%
  group_modify(~ add_row(.x,
                         date = floor_date(as.POSIXct(first(.x$date)), 'day'),
                         value = ifelse(first(.x$value) == 'on', 'off', 'on'),
                         .before = 1)) %>%
  group_modify(~ add_row(.x,
                         date = ceiling_date(as.POSIXct(last(.x$date)), 'day') - 1,
                         value = last(.x$value))) %>%
  mutate(ends = lead(date)) %>%
  filter(!is.na(ends)) %>%
  mutate(date = hms::as_hms(date), ends = hms::as_hms(ends)) %>%
  ggplot(aes(x = day, y = date)) +
  geom_segment(aes(xend = day, yend = ends, color = value),
               size = 20) +
  coord_cartesian(ylim = c(25120, 26500)) +
  labs(y = 'time') +
  guides(color = guide_legend(override.aes = list(size = 8)))
p
And of course, you can easily flip the co-ordinates if you wish, and apply theme elements to make the plot more appealing:
p + coord_flip(ylim = c(25120, 26500)) +
scale_color_manual(values = c('deepskyblue4', 'orange')) +
theme_light(base_size = 16)
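The ylim values used above are in seconds since midnight, which is the unit hms times are stored in. A small base R sketch of the arithmetic, in case you need a different window (the helper to_seconds is hypothetical, written only for this illustration):

```r
# convert an "HH:MM:SS" string to seconds since midnight
# (to_seconds is a made-up helper, not part of hms or lubridate)
to_seconds <- function(hhmmss) {
  parts <- as.numeric(strsplit(hhmmss, ":", fixed = TRUE)[[1]])
  sum(parts * c(3600, 60, 1))
}

first_event <- to_seconds("07:04:07")  # first time stamp in the example data
last_event  <- to_seconds("07:14:07")  # last time stamp on day 25

# both values fall inside the plotted window c(25120, 26500)
c(first_event, last_event)
```

To zoom on another part of the day, the same helper could build the limits, for example c(to_seconds("06:00:00"), to_seconds("09:00:00")).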

How to plot mixed-frequency series with NAs in ggplot?

I have the following dataframe x:
x1 <- data.frame(Date = seq(as.Date("2010-01-01"),
as.Date("2012-12-01"),
by = "month"),
TS1 = rnorm(36,0,1),
TS2 = rnorm(36,0,1),
stringsAsFactors = F)
x2 <- data.frame(Date = seq(as.Date("2010-01-01"),
as.Date("2012-12-01"),
by = "quarter"),
TS3 = rnorm(12,0,1),
stringsAsFactors = F)
x <- left_join(x1, x2, by = "Date")
x contains two monthly series, while one is quarterly.
I would like to plot all three series at the same time with ggplot. I am aware of dualplot as a way to do it. The issue with it however is that it allows you to plot only 2 mixed frequency series.
Is there anyone who can help me with this?
Thanks!
Note that ggplot requires long format, so we first use tidyr::pivot_longer.
Next, we can plot TS1 and TS2 easily, but TS3 will not plot at all as it contains missing values.
One option is to plot the line with missings with a separate geom_line call:
x2 <- x %>%
  tidyr::pivot_longer(cols = c(TS1, TS2, TS3), names_to = "TS") %>%
  mutate(TS = as.factor(TS))
ggplot(x2, aes(x = Date, y = value, group = TS, color = TS)) +
  geom_line() +
  geom_line(data = subset(x2, TS == "TS3" & !is.na(value)))
In this instance, ggplot does not have to have the data transformed into long format (although it is a nice solution, if you are familiar with transforming data, and recommended especially if there were lots of columns or separate lines to be plotted).
For simplicity, especially when learning ggplot, may I propose an alternative solution.
TS1 and TS2 can easily be plotted against date, as neither have NA values. Here, we call geom_line() twice, once for each line:
x %>%
ggplot()+
geom_line(aes(Date, TS1), colour = 'red')+
geom_line(aes(Date, TS2), colour = 'blue')
If you try and include a third geom_line() with TS3, only the original two lines are plotted due to TS3's missing values (NA). A solution is to fill in the NA values in the data before plotting, using zoo::na.approx(). As the name suggests, zoo::na.approx() is able to approximate values when you have NAs, by linear interpolation. In this instance, I assume linear interpolation between known values is appropriate for plotting (as geom_line is doing anyway). Check out ?zoo::na.approx for more details, including non-linear interpolation.
zoo::na.approx(TS3, Date, na.rm = FALSE) may be read aloud like: "We want to approximate the values of TS3 when they are missing (NA), based on the values of Date, and where a value cannot be interpolated, keep the NA in place rather than dropping it."
x %>%
mutate(
TS3 = zoo::na.approx(TS3, Date, na.rm = FALSE)
) %>%
ggplot()+
geom_line(aes(Date, TS1), colour = 'red')+
geom_line(aes(Date, TS2), colour = 'blue')+
geom_line(aes(Date, TS3), colour = 'green')
Note that the green line finishes just short (2 data points) of the other two lines. This is because by default, zoo::na.approx() doesn't interpolate when NA is not between two known data points. This is why we specified na.rm = FALSE when doing the interpolation. Look at the help page ?zoo::na.approx for alternatives (such as repeating the last known observation).
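A minimal sketch of that trailing-NA behaviour on a toy vector (the values are made up). It assumes the zoo package is installed. The rule = 2 argument belongs to stats::approx, which zoo::na.approx forwards its extra arguments to, and it repeats the closest known value at the ends.

```r
library(zoo)

ts3 <- c(1, NA, NA, 4, NA, NA)   # made-up series ending in two NAs

# default: trailing NAs are left as NA when na.rm = FALSE
kept <- na.approx(ts3, na.rm = FALSE)

# rule = 2 is passed on to approx() and carries the last known value forward
filled <- na.approx(ts3, na.rm = FALSE, rule = 2)

kept
filled
```

Whether carrying the last quarterly value forward is acceptable depends on the series, so treat the second variant as one option rather than a default.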

How can I set my own tick labels in ggplot while plotting factor values of time series?

So, I am plotting some time series in ggplot and on the x axis I got some date/time data. Data from 2008 to 2016. The problem is that dates are not continuous and for instance the last date of 2008 is
2008/05/14 19:05:12
and the next date is for 2009 something like this
2009/03/24 10:17:54
While plotting these, the result is the following
In order to get rid of the empty spaces I turn my dates into factors
dates <- factor(dates) in order to get the correct plot.
But after that I am unable to set the x tick labels as they don't change using
scale_x_continuous(breaks = c(1,1724,2283,5821,8906,10112,10156,14875 ),
labels = c("2008","2009","2010","2011","2012","2013","2014","2015"))
How can I change them?
There are a few problems this throws up, and the solution will really depend on what you're looking for. I'd suggest you post some sample data and your code so far to get a more precise answer, but here's a possibility in the meantime:
Your graph above is not showing a continuous scale (though it may look like it); it's a discrete scale with the number of levels corresponding to unique date observations. Two problems come out of this:
applying scale_x_continuous won't work, because the axis is discrete and the year breaks won't be evenly spread
your data looks like it's smoothly spread, but it isn't, which isn't a good principle for visualisation.
If what you're trying to do is show change year-by-year you could sort all of your data into yearly 'bins' and plot:
library(tidyverse)
library(lubridate)
# creating random data
df <- tibble(date = as_datetime(runif(1000, as.numeric(as_datetime("2001/01/24 09:30:43")), as.numeric(as_datetime("2006/02/24 09:30:43")))))
df["val"] <- rnorm(nrow(df), 25, 5)
# use lubridate to extract year as new variable, and plot grouped years
df %>%
mutate(year = factor(year(date))) %>%
ggplot(aes(year, val)) +
geom_point(position = "jitter")
Another possibility could be to use a colour scale to note your groupings by year, keeping all the dates in order but removing the gaps (and therefore not using a continuous x-axis scale):
df %>% # begin by simulating a data 'gap'
filter(date>as_datetime("2003/07/24 09:30:43")|date<as_datetime("2002/09/24 09:30:43")) %>%
mutate(year = factor(year(date)), # 'year' to select colour
date = factor(date)) %>%
ggplot(aes(date, val, col = year)) +
geom_point() +
theme(axis.ticks.x = element_blank(), # removes all ticks and labels, as too many unique times
axis.text.x = element_blank())
If neither of those are helpful do comment below with any clarifications of what you're looking for, and I'll see if I can help!
Edit: One last idea, you could create an invisible series of points which act as the breaks for your axis ticks:
blank_labels <- tibble(date = as_datetime(c("20020101 000000",
"20030101 000000",
"20040101 000000",
"20050101 000000",
"20060101 000000")),
col = "NA", val = 0)
df2 <- df %>%
filter(date>as_datetime("2003/07/24 09:30:43")|date<as_datetime("2002/09/24 09:30:43")) %>%
mutate(col = "black") %>%
bind_rows(blank_labels) %>%
mutate(date_fac = factor(date))
tick_values <- left_join(blank_labels, df2, by = c("date", "col"))
df2 %>%
ggplot(aes(date_fac, val, col = col)) +
geom_point() +
scale_x_discrete(breaks = tick_values$date_fac, labels = c("2002", "2003", "2004", "2005", "2006")) +
scale_color_identity()
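A variation on that last idea, as a sketch under stated assumptions: instead of adding invisible points, use the first real observation of each year as the break on the discrete axis. The data here are made up, the time stamps are assumed to be sorted, and the names toy, year_breaks and year_labels are only for illustration.

```r
library(ggplot2)

# made-up, sorted time stamps with large gaps between years
dates <- as.POSIXct(c("2008-05-13 08:00:00", "2008-05-14 19:05:12",
                      "2009-03-24 10:17:54", "2009-06-01 12:00:00",
                      "2010-02-02 09:30:00"), tz = "UTC")
toy <- data.frame(
  date_fac = factor(format(dates, "%Y-%m-%d %H:%M:%S")),
  year     = format(dates, "%Y"),
  val      = c(3, 5, 4, 6, 2)
)

# first observation of each year: its factor level becomes the axis break
first_obs   <- !duplicated(toy$year)
year_breaks <- as.character(toy$date_fac[first_obs])
year_labels <- as.character(toy$year[first_obs])

# a discrete axis takes factor levels as breaks, not row positions
p_years <- ggplot(toy, aes(date_fac, val)) +
  geom_point() +
  scale_x_discrete(breaks = year_breaks, labels = year_labels)
```

The gaps are still hidden with this approach, so the earlier caveat about the axis looking smoother than the data really are still applies.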

Density curves on multiple histograms sharing same y-axis

I need to overlay normal density curves on 3 histograms sharing the same y-axis. The curves need to be separate for each histogram.
My dataframe (example):
height <- seq(140,189, length.out = 50)
weight <- seq(67,86, length.out = 50)
fev <- seq(71,91, length.out = 50)
df <- as.data.frame(cbind(height, weight, fev))
I created the histograms for the data as:
library(ggplot2)
library(tidyr)
df %>%
gather(key=Type, value=Value) %>%
ggplot(aes(x=Value,fill=Type)) +
geom_histogram(binwidth = 8, position="dodge")
I am now stuck at how to overlay normal density curves for the 3 variables (separate curve for each histogram) on the histograms that I have generated. I won't mind the final figure showing either count or density on the y-axis.
Any thoughts on how to proceed from here?
Thanks in advance.
I believe that the code in the question is almost right; the code below just uses the answer in the link provided by @akrun.
Note that I have commented out the call to facet_wrap by placing a comment char before the last plus sign.
library(ggplot2)
library(tidyr)
df %>%
  gather(key = Type, value = Value) %>%
  ggplot(aes(x = Value, color = Type, fill = Type)) +
  geom_histogram(aes(y = ..density..),
                 binwidth = 8, position = "dodge") +
  geom_density(alpha = 0.25) #+
  #facet_wrap(~ Type)
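Note that geom_density draws a kernel density estimate, not a normal curve. If fitted normal curves are wanted, as the question asks, one possible sketch is to compute dnorm on a grid for each variable and draw the result with geom_line. This is a swapped-in technique rather than the approach above. It assumes ggplot2 3.3.0 or later for after_stat(), and the names long, curves and p_norm are made up.

```r
library(ggplot2)

# example data from the question, reshaped to long format by hand
height <- seq(140, 189, length.out = 50)
weight <- seq(67, 86, length.out = 50)
fev    <- seq(71, 91, length.out = 50)
long <- data.frame(
  Type  = rep(c("height", "weight", "fev"), each = 50),
  Value = c(height, weight, fev)
)

# one normal curve per variable, using that variable's own mean and sd
curves <- do.call(rbind, lapply(split(long, long$Type), function(d) {
  grid <- seq(min(d$Value), max(d$Value), length.out = 100)
  data.frame(Type = d$Type[1], Value = grid,
             density = dnorm(grid, mean = mean(d$Value), sd = sd(d$Value)))
}))

# histograms on the density scale, with the fitted curves drawn on top
p_norm <- ggplot(long, aes(x = Value)) +
  geom_histogram(aes(y = after_stat(density), fill = Type),
                 binwidth = 8, position = "dodge") +
  geom_line(data = curves, aes(y = density, colour = Type))
```

Adding facet_wrap(~ Type) to p_norm would give each variable its own panel if the overlapping ranges are hard to read.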

Visualizing the difference between two points with ggplot2

I want to visualize the difference between two points with a line/bar in ggplot2.
Suppose we have some data on income and spending as a time series.
We would like to visualize not only them, but the balance (=income - spending) as well.
Furthermore, we would like to indicate whether the balance was positive (=surplus) or negative (=deficit).
I have tried several approaches, but none of them produced a satisfying result. Here we go with a reproducible example.
# Load libraries and create LONG data example data.frame
library(dplyr)
library(ggplot2)
library(tidyr)
df <- data.frame(year = rep(2000:2009, times=3),
var = rep(c("income","spending","balance"), each=10),
value = c(0:9, 9:0, rep(c("deficit","surplus"), each=5)))
df
1.Approach with LONG data
Unsurprisingly, it doesn't work with LONG data,
because the geom_linerange arguments ymin and ymax cannot be specified correctly. ymin=value, ymax=value is definitely the wrong way to go (expected behaviour). ymin=income, ymax=spending is obviously wrong, too (expected behaviour).
df %>%
ggplot() +
geom_point(aes(x=year, y=value, colour=var)) +
geom_linerange(aes(x=year, ymin=value, ymax=value, colour=net))
#>Error in function_list[[i]](value) : could not find function "spread"
2.Approach with WIDE data
I almost got it working with WIDE data.
The plot looks good, but the legend for the geom_point(s) is missing (expected behaviour).
Simply adding show.legend = TRUE to the two geom_point(s) doesn't solve the problem as it overprints the geom_linerange legend. Besides, I would rather have the geom_point lines of code combined in one (see 1.Approach).
df %>%
spread(var, value) %>%
ggplot() +
geom_linerange(aes(x=year, ymin=spending, ymax=income, colour=balance)) +
geom_point(aes(x=year, y=spending), colour="red", size=3) +
geom_point(aes(x=year, y=income), colour="green", size=3) +
ggtitle("income (green) - spending (red) = balance")
3.Approach using LONG and WIDE data
Combining the 1.Approach with the 2.Approach results in yet another unsatisfying plot. The legend does not differentiate between balance and var (=expected behaviour).
ggplot() +
geom_point(data=(df %>% filter(var=="income" | var=="spending")),
aes(x=year, y=value, colour=var)) +
geom_linerange(data=(df %>% spread(var, value)),
aes(x=year, ymin=spending, ymax=income, colour=balance))
Any (elegant) way out of this dilemma?
Should I use some other geom instead of geom_linerange?
Is my data in the right format?
Try
ggplot(df[df$var != "balance", ]) +
  geom_point(
    aes(x = year, y = value, fill = var),
    size = 3, pch = 21, colour = alpha("white", 0)) +
  geom_linerange(
    aes(x = year, ymin = income, ymax = spending, colour = balance),
    data = spread(df, var, value)) +
  scale_fill_manual(values = c("green", "red"))
Output:
The main idea is that we use two different types of aesthetics for colours (fill for the points, with the appropriate pch, and colour for the lines) so that we get separate legends for each.
