Heatmap plotting time against date ggplot - r

I would like to make a heatmap with ggplot.
The results should be something like this (the y-axis needs to be reversed though):
A subset of the example data is below. In the actual application the data frame has 1000+ users instead of only 3. The fill gradient should be based on the users' values (their sum, total).
Date <- seq(
from = as.POSIXct("2016-01-01 00:00"),
to = as.POSIXct("2016-12-31 23:00"),
by = "hour"
)
user1 <- runif(length(Date), min = 0, max = 10)
user2 <- runif(length(Date), min = 0, max = 10)
user3 <- runif(length(Date), min = 0, max = 10)
example <- data.frame(Date, user1, user2, user3)
example$hour <- format(example$Date, format = "%H:%M")
example$total <- rowSums(example[,c(2:4)])
I have tried several things using the fill = total argument in combination with geom_tile, geom_raster and stat_density2d (as suggested in similar posts here). An example:
library(ggplot2)
ggplot(example, aes(Date, hour, fill = total)) +
  geom_tile() +
  scale_fill_gradient(low = "blue", high = "red")
This only shows individual points, and the y-axis is not shown as a continuous variable (scale_y_continuous did not help either), although the variable is a continuous one.
How can I create a heatmap like the example provided above?
And how can I get coarser breaks on the y-axis (e.g. every 3 hours instead of every hour)?

Because example$Date is a POSIXct object (a date plus a time of day), you won't get the desired output with your data as currently defined.
So you must map the x-axis to the day only:
library(ggplot2)
ggplot(data = example) +
  # as.Date() drops the time of day; format is not an argument for POSIXct input
  geom_raster(aes(x = as.Date(Date), y = hour, fill = total)) +
  scale_fill_gradient(low = "blue", high = "red")
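If you would rather keep the hour of day as a number, a continuous y-axis can be reversed with scale_y_reverse() and given a break every 3 hours, which also covers the reversed axis asked for in the question. This is a minimal sketch, not the answer's method: it assumes the data is regenerated in UTC (tz = "UTC" is my addition, so every day has exactly 24 rows), and day and hour_num are hypothetical helper columns.

```r
library(ggplot2)

# Regenerate the question's example data in UTC (assumption) so each day has 24 hourly rows
set.seed(1)
Date <- seq(from = as.POSIXct("2016-01-01 00:00", tz = "UTC"),
            to   = as.POSIXct("2016-12-31 23:00", tz = "UTC"),
            by   = "hour")
example <- data.frame(Date,
                      user1 = runif(length(Date), min = 0, max = 10),
                      user2 = runif(length(Date), min = 0, max = 10),
                      user3 = runif(length(Date), min = 0, max = 10))
example$total <- rowSums(example[, c("user1", "user2", "user3")])

# Hypothetical helper columns: calendar day (x) and numeric hour of day (y)
example$day      <- as.Date(example$Date, tz = "UTC")
example$hour_num <- as.integer(format(example$Date, "%H", tz = "UTC"))

p <- ggplot(example, aes(x = day, y = hour_num, fill = total)) +
  geom_tile() +
  # continuous y-axis, midnight at the top, labelled every 3 hours
  scale_y_reverse(breaks = seq(0, 21, by = 3)) +
  scale_fill_gradient(low = "blue", high = "red")
print(p)
```

Because hour_num is numeric, the axis behaves as a continuous variable, and the breaks argument controls the labelling interval independently of the hourly data.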
For your second question, you can group hours like this:
library(dplyr)
example <- example %>%
  # each = 4 puts every 4 consecutive hourly rows in one group (use each = 3 for 3-hour slots)
  group_by(grp = rep(row_number(), length.out = n(), each = 4)) %>%
  summarise(Date = as.Date(first(Date)),
            total = sum(total),
            time_slot = paste(min(hour), max(hour), sep = "-"))
library(cetcolor)  # provides cet_pal()
ggplot(data = example) +
  geom_raster(aes(x = Date, y = time_slot, fill = total)) +
  scale_fill_gradientn(colours = cet_pal(6, name = "inferno"))  # I like gradients from the "cetcolor" package
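The rep(..., each = 4) grouping assumes that the rows are strictly consecutive hours starting at midnight, which can break around daylight-saving changes when the timestamps are in a local time zone. A different technique (not the answer's) is to group by calendar day and by the hour of day integer-divided by 3, which gives the 3-hour slots asked about in the question. This is a sketch under the assumption that the data is in UTC; day, slot and slots are hypothetical names, and total stands in for the summed user values.

```r
library(dplyr)
library(ggplot2)

# Hourly stand-in data in UTC (assumption); total plays the role of the summed users
set.seed(1)
Date <- seq(as.POSIXct("2016-01-01 00:00", tz = "UTC"),
            as.POSIXct("2016-12-31 23:00", tz = "UTC"), by = "hour")
example <- data.frame(Date, total = runif(length(Date), min = 0, max = 30))

slots <- example %>%
  mutate(day  = as.Date(Date, tz = "UTC"),
         # integer division maps hours 0-2 to slot 0, 3-5 to slot 1, ..., 21-23 to slot 7
         slot = as.integer(format(Date, "%H", tz = "UTC")) %/% 3L) %>%
  group_by(day, slot) %>%
  summarise(total = sum(total), .groups = "drop") %>%
  mutate(time_slot = sprintf("%02d:00-%02d:59", slot * 3L, slot * 3L + 2L))

p <- ggplot(slots, aes(x = day, y = time_slot, fill = total)) +
  geom_tile() +
  scale_y_discrete(limits = rev) +   # first slot of the day at the top
  scale_fill_gradient(low = "blue", high = "red")
print(p)
```

Grouping on the day itself means a day with a missing or repeated hour only affects its own slots instead of shifting every later group.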

Related

Draw arrow on ggplot with dates as variable, by specifying co-ordinates rather than using the units of the x- and y-variables

I am attempting to plot a patient's blood test results as a time series. I have managed to do this and have included a reference range as a shaded band between two y-values. My problem is that annotate() and geom_segment() want me to specify coordinates in the units of my independent variable, which is, unhelpfully, a date (YYYY-MM-DD).
Is it possible to get R to ignore the units of the x- and y-axis and specify the arrow co-ordinates as if they were on a grid?
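library(ggplot2)
library(lubridate)  # ymd()
library(scales)     # date_breaks(), date_format()
library(ggprism)    # theme_prism()
library(data.table)
result <- runif(25, min = 2.0, max = 3.5)
start_date <- ymd("2021-08-16")
end_date <- ymd("2022-10-29")
date <- sample(seq(start_date, end_date, by = "days"), 25, replace = TRUE)
q <- data.table(result, date)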
ggplot(q, aes(x = date, y = result)) +
geom_line() +
geom_point(aes(x = date, y = result), shape = 21, size = 3) +
scale_x_date(limits = c(min(q$date), max(q$date)),
breaks = date_breaks("1 month"),
labels = date_format("%b %Y")) +
ylab("Corrected calcium (mmol/L)") +
xlab("Date of blood test") +
ylim(1,4)+
geom_ribbon(aes(ymin=2.1, ymax=2.6), fill="grey", alpha=0.2, colour="grey")+
geom_vline(xintercept=as.numeric(q$date[c(3, 2)]),
linetype=4, colour="black") +
theme(axis.text.x = element_text(angle = 45)) + theme_prism(base_size = 10) +
annotate("segment", x = 1, y = 2, xend = 3, yend = 4, arrow = arrow(length = unit(0.15, "cm")))
The error produced is Error: Invalid input: date_trans works with objects of class Date only.
I can confirm that:
> class(q$date)
[1] "Date"
I've just gone with test co-ordinates (1,2,3,4) for the annotate("segment"...), ideally I want to be able to get the arrow to point to a specific data point on the plot to indicate when the patient went on treatment.
Many thanks,
Sandro
You don't need to convert to points or grid coordinates; just use the actual values from your data frame. I am subsetting within annotate() using a hard-coded index (you could of course automate this), but you will need to "remind" R that you are dealing with dates, hence the added lubridate::as_date() call.
library(ggplot2)
library(lubridate)
result <- runif(25, min = 2.0, max = 3.5)
start_date <- ymd("2021-08-16")
end_date <- ymd("2022-10-29")
date <- sample(seq(start_date, end_date, by = "days"), 25, replace = TRUE)
q <- data.frame(result, date)
## I am arranging the data frame by date
q <- dplyr::arrange(q, date)
ggplot(q, aes(x = date, y = result)) +
  geom_line() +
  ## for the start, pick any x and y so the arrow starts wherever you want it to
  ## for the end, use a row of your data frame, in this case row 20
  annotate(geom = "segment",
           x = as_date(q$date[2]), xend = as_date(q$date[20]),
           y = min(q$result), yend = q$result[20],
           arrow = arrow(),
           linewidth = 2, color = "red")  # use size = 2 on ggplot2 < 3.4.0
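To automate the hard-coded index, one option is to look up the blood test closest to a given treatment date and point the arrow at that row. A minimal sketch: treatment_start, the 30-day and 0.4 mmol/L offsets for the arrow tail, and the seed are made-up values for illustration. Every x value passed to annotate() is a Date, so the date scale accepts it.

```r
library(ggplot2)

set.seed(42)
q <- data.frame(date = sort(sample(seq(as.Date("2021-08-16"), as.Date("2022-10-29"),
                                       by = "day"), 25)),
                result = runif(25, min = 2.0, max = 3.5))

treatment_start <- as.Date("2022-03-01")  # hypothetical treatment date

# index of the blood test closest in time to the treatment date
i <- which.min(abs(as.numeric(q$date - treatment_start)))

p <- ggplot(q, aes(x = date, y = result)) +
  geom_line() +
  geom_point(shape = 21, size = 3) +
  annotate("segment",
           x = q$date[i] - 30, xend = q$date[i],        # tail starts 30 days earlier
           y = q$result[i] + 0.4, yend = q$result[i],   # and 0.4 mmol/L higher
           arrow = arrow(length = unit(0.15, "cm")), colour = "red")
print(p)
```

Subtracting a number from a Date returns a Date, so the offsets stay in the scale's units without any manual conversion.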

Plotting geom_segment with position_dodge

I have a data set with information of where individuals work at over time. More specifically, I have information on the interval at which individuals work in a given workplace.
library('tidyverse')
library('lubridate')
# individual A
a_id <- c(rep('A',1))
a_start <- c(201201)
a_end <- c(201212)
a_workplace <-c(1)
# individual B
b_id <- c(rep('B',2))
b_start <- c(201201, 201207)
b_end <- c(201206, 201211)
b_workplace <-c(1, 2)
# individual C
c_id <- c(rep('C',2))
c_start <- c(201201, 201202)
c_end <- c(201204, 201206)
c_workplace <-c(1, 2)
# individual D
d_id <- c(rep('D',1))
d_start <- c(201201)
d_end <- c(201201)
d_workplace <-c(1)
# final data frame
id <- c(a_id, b_id, c_id, d_id)
start <- c(a_start, b_start, c_start, d_start)
end <- c(a_end, b_end, c_end, d_end)
workplace <- as.factor(c(a_workplace, b_workplace, c_workplace, d_workplace))
mydata <- data.frame(id, start, end, workplace)
mydata_ym <- mydata %>%
mutate(ymd_start = as.Date(paste0(start, "01"), format = "%Y%m%d"),
ymd_end0 = as.Date(paste0(end, "01"), format = "%Y%m%d"),
day_end = as.numeric(format(ymd_end0 + months(1) - days(1), format = "%d")),
ymd_end = as.Date(paste0(end, day_end), format = "%Y%m%d")) %>%
select(-ymd_end0, -day_end)
I would like a plot that shows how long each individual works at each workplace and how they move between workplaces. I tried plotting a geom_segment, since I have the start and end date of each individual's time in each workplace. Also, because the same individual may work in more than one place during the same month, I would like to use position_dodge to make overlapping workplaces visible for the same id and time. This was suggested in this post: Ggplot (geom_line) with overlaps
ggplot(mydata_ym) +
geom_segment(aes(x = id, xend = id, y = ymd_start, yend = ymd_end),
position = position_dodge(width = 0.1), size = 2) +
scale_x_discrete(limits = rev) +
coord_flip() +
theme(panel.background = element_rect(fill = "grey97")) +
labs(y = "time", title = "Work affiliation")
The problems I am having are: (i) position_dodge doesn't seem to be working, and (ii) I don't know why all the segments are colored black. I would expect each workplace to have a different color and a legend to show up.
All the segments are black because nothing is mapped to colour. If you include colour = workplace in the aes() mapping for geom_segment, you get colours, a legend and some dodging, but it doesn't work quite right: it looks like position_dodge only applies to x and not to xend. This seems like a bug, or at least an "infelicity", in position_dodge.
However, replacing geom_segment with an appropriate use of geom_linerange does seem to work:
ggplot(mydata_ym) +
  geom_linerange(aes(x = id, ymin = ymd_start, ymax = ymd_end, colour = workplace),
                 position = position_dodge(width = 0.1), linewidth = 2) +  # size = 2 on ggplot2 < 3.4.0
  scale_x_discrete(limits = rev) +
  coord_flip()
(some tangential components omitted).
A similar approach has been documented before, in a near-duplicate of your question (once the colour = mapping is taken care of).
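If you want to stay with geom_segment(), one workaround is to do the dodging yourself: convert id to a numeric position, shift each workplace by a small offset, and relabel the axis. This is a different technique from position_dodge(), sketched on a hand-typed copy of mydata_ym; y_pos, ids and the 0.2 offset are my own choices. Time goes directly on the x-axis, so coord_flip() is not needed.

```r
library(ggplot2)

# Hand-typed copy of mydata_ym from the question
mydata_ym <- data.frame(
  id        = c("A", "B", "B", "C", "C", "D"),
  workplace = factor(c(1, 1, 2, 1, 2, 1)),
  ymd_start = as.Date(c("2012-01-01", "2012-01-01", "2012-07-01",
                        "2012-01-01", "2012-02-01", "2012-01-01")),
  ymd_end   = as.Date(c("2012-12-31", "2012-06-30", "2012-11-30",
                        "2012-04-30", "2012-06-30", "2012-01-31"))
)

# Manual dodge: one row per id, workplace 1 shifted by -0.1 and workplace 2 by +0.1
# (this offset formula assumes two workplaces)
ids <- sort(unique(mydata_ym$id))
mydata_ym$y_pos <- match(mydata_ym$id, ids) +
  (as.integer(mydata_ym$workplace) - 1.5) * 0.2

p <- ggplot(mydata_ym) +
  geom_segment(aes(x = ymd_start, xend = ymd_end, y = y_pos, yend = y_pos,
                   colour = workplace)) +
  scale_y_reverse(breaks = seq_along(ids), labels = ids) +  # A at the top, like limits = rev
  labs(x = "time", y = "id", title = "Work affiliation")
print(p)
```

Since y and yend receive the same shifted value, both ends of each segment move together, which is exactly what position_dodge did not do for xend.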

Triple variable stacked line/area graph in R

I'm trying to generate a stacked line/area graph using ggplot and geom_area. As far as I can tell, my data is loaded into R correctly. Every time I generate the plot, the graph is empty (even though the axes look correct, except that the months are sorted alphabetically).
I've tried using the data.frame function to define my variables, but was still unable to generate my plot. I've also looked around Stack Overflow and other websites, but no one else seems to get an empty plot with no error messages.
Here's my data set:
Here's the code I'm using currently:
ggplot(OHV, aes(x=Month)) +
geom_area(aes(y=A+B+Unknown, fill="A")) +
geom_area(aes(y=B, fill="B")) +
geom_area(aes(y=Unknown, fill="Unknown"))
Here's the output at the end:
I have zero error messages, simply just no data being plotted on my graph.
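Your months are being interpreted as a factor (a discrete variable) rather than as dates, which is why they sort alphabetically; with a discrete x-axis each month also ends up in its own group, so geom_area has no points to connect. You must transform them into dates.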
library(tidyverse)
set.seed(1)
df <- data.frame(Month = seq(lubridate::ymd('2018-01-01'),
                             lubridate::ymd('2018-12-01'), by = '1 month'),
                 Unknown = sample(17, replace = TRUE, size = 12),
                 V1 = floor(runif(12, min = 35, max = 127)),
                 V2 = floor(runif(12, min = 75, max = 275)))
df <- df %>%
dplyr::mutate(Month = format(Month, '%b')) %>%
tidyr::gather(key = "Variable", value = "Value", -Month)
ggplot2::ggplot(df) +
geom_area(aes(x = Month, y = Value, fill = Variable),
position = 'stack')
Note that I used tidyr::gather to be able to stack the areas in an easier way.
Now, assuming your year of analysis is 2018, you need to transform the Month column into something R treats as continuous, namely a Date:
df2 <- df %>%
dplyr::mutate(Month = paste0("2018-", Month, "-01"),
Month = lubridate::parse_date_time(Month,"y-b-d"),
Month = as.Date(Month))
library(scales)
ggplot2::ggplot(df2) +
geom_area(aes(x = Month, y = Value, fill = Variable),
position = 'stack') +
scale_x_date(labels = scales::date_format("%b"))
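Both format(..., '%b') and the b order in parse_date_time() rely on the current LC_TIME locale for month names. A locale-independent alternative, assuming the months are English abbreviations as in base R's month.abb, is to look up the month number with match(). month_to_date() is a hypothetical helper, and long_df is a small stand-in for the long-format df above.

```r
library(ggplot2)

# Hypothetical helper: English month abbreviations ("Jan".."Dec") -> first day of that month
month_to_date <- function(month_abb, year = 2018) {
  as.Date(sprintf("%d-%02d-01", year, match(month_abb, month.abb)))
}

# Small stand-in for the long-format df (three variables, twelve months)
set.seed(1)
long_df <- data.frame(Month    = rep(month.abb, times = 3),
                      Variable = rep(c("Unknown", "V1", "V2"), each = 12),
                      Value    = floor(runif(36, min = 10, max = 100)))
long_df$Month <- month_to_date(long_df$Month)

p <- ggplot(long_df, aes(x = Month, y = Value, fill = Variable)) +
  geom_area(position = "stack") +
  scale_x_date(date_labels = "%b", date_breaks = "1 month")
print(p)
```

Once Month is a Date, the x-axis is continuous, the months come out in calendar order, and each Variable forms one connected area.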

Alter name of trace in legend after plot is created (by package) in ggplot2

I'm wondering if there is any easy way to change the name in a legend (given using the colour aesthetic) on a ggplot after the plot is created. I know this feels a bit hacky and would normally be changed in the data or when the plot is created, but I want to change the label on a plot that is created by another package, and there's no option in the package to change it.
I could obviously copy the function and save my own version and change it, but I just want to change one thing so it seems neater if I can just do it afterwards.
Here is an example with some dummy data. Basically, I want to relabel the Mean and Median time series that come out of fasstr's plot_daily_stats as "Modelled Mean" and "Modelled Median", so they cannot be confused with the observed mean, which I am adding manually.
library(fasstr)
library(tibble)
library(ggplot2)
#create some fake data
df <- tibble(Date = seq.Date(from = as.Date("1991-01-01"), as.Date("1997-12-31"),
by = "day"),
DayOfYear = as.numeric(format(Date, "%j")),
Value = runif(2557,0,1) + 50 + (cos((1/60)*DayOfYear)+4))
obsdf <- tibble(Date = seq.Date(from = as.Date("1900-01-01"), as.Date("1900-12-31"),
by = "day"),
DayOfYear = as.numeric(format(Date, "%j")),
Value = runif(365,0,1) + 51 + (cos((1/60)*DayOfYear)+4))
# create plot using fasstr package
plt1<- fasstr::plot_daily_stats(df)
# add my own trace. I also want to rename the trace "Mean" to
# "Modelled Mean" to avoid confusion (and same with Median)
plt1$Daily_Statistics +
geom_line(data = obsdf, aes(x = Date, y = Value, colour = "Observed Mean"))+
scale_colour_manual(values = c("red", "black","blue"))
In fasstr, the names are hard-coded:
daily_plots <- ... +
ggplot2::geom_line(ggplot2::aes(y = Median, colour = "Median")) +
ggplot2::geom_line(ggplot2::aes(y = Mean, colour = "Mean"))
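No hacking needed; just add labels to your manual scale. Unnamed labels and values are matched to the legend entries in order, which for these character values is alphabetical ("Mean", "Median", "Observed Mean").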
plt1$Daily_Statistics +
geom_line(data = obsdf, aes(x = Date, y = Value, colour = "Observed Mean"))+
scale_colour_manual(labels = c("Modelled Mean","Modelled Median","Observed Mean"),
values = c("red", "black","blue"))
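Because unnamed labels and values are matched by position, it can be safer to name them, so each colour and label is tied to the original series name regardless of ordering. A sketch on a small stand-in plot: p0 imitates a plot built elsewhere with hard-coded colour names and is not the fasstr output. If the plot already has a colour scale, adding another one replaces it, and ggplot2 prints a message saying so.

```r
library(ggplot2)

# Stand-in for a plot built by another package with hard-coded colour names
d <- data.frame(x    = rep(1:10, 2),
                y    = c(1:10, (1:10) / 2),
                stat = rep(c("Mean", "Median"), each = 10))
p0 <- ggplot(d, aes(x, y, colour = stat)) + geom_line()

obs <- data.frame(x = 1:10, y = (1:10) * 0.8)

p <- p0 +
  geom_line(data = obs, aes(x, y, colour = "Observed Mean")) +
  scale_colour_manual(
    # named values tie each colour to a series name, not to a position
    values = c("Mean" = "red", "Median" = "black", "Observed Mean" = "blue"),
    # explicit breaks fix the legend order; labels follow that order
    breaks = c("Mean", "Median", "Observed Mean"),
    labels = c("Modelled Mean", "Modelled Median", "Observed Mean")
  )
print(p)
```

The underlying data values ("Mean", "Median") are left untouched; only the legend text changes.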

R: In ggplot, how to add multiple text labels on the y-axis for each of multiple dates on the x-axis

I am making a very wide chart that, when output as a PNG file, takes up several thousand pixels along the x-axis; there are about 20 years of daily data. (This may or may not be regarded as good practice, but it is for my own use, not for publication.) Because the chart is so wide, the y-axis disappears from view as you scroll through it. Accordingly, I want to add labels to the plot at 2-yearly intervals showing the y-axis values. The resulting chart looks like the one below, except that, to keep it compact, I have used only 31 days of fake data and put labels roughly every 10th day:
This works more or less as required, but I wonder whether there is a better approach: in this chart (see code below) I have a separate column for each of the three y-axis values 120, 140 and 160. The real data has many more levels, so I would end up with 15 calls to geom_text to put everything on the plot area.
Q. Is there a simpler way to splat all 20-odd dates, with 15 labels per date, on to the chart at once?
library(ggplot2)
set.seed(12345)
mydf <- data.frame(mydate = seq(as.Date('2012-01-01'), as.Date('2012-01-31'), by = 'day'),
                   price = runif(31, min = 100, max = 200))
mytext <- data.frame(mydate = as.Date(c('2012-01-10', '2012-01-20')),
                     col1 = c(120, 120), col2 = c(140, 140), col3 = c(160, 160))
p <- ggplot(data = mydf) +
  # refer to columns by name inside aes(), not via mydf$...
  geom_line(aes(x = mydate, y = price), colour = 'red', size = 0.8) +
  geom_text(data = mytext, aes(x = mydate, y = col1, label = col1), size = 4) +
  geom_text(data = mytext, aes(x = mydate, y = col2, label = col2), size = 4) +
  geom_text(data = mytext, aes(x = mydate, y = col3, label = col3), size = 4)
print(p)
ggplot2 likes data to be in long format, so melt()ing your text into long format lets you make a single call to geom_text():
library(reshape2)
mytext.m <- melt(mytext, id.vars = "mydate")
Then your plotting command becomes:
ggplot(data = mydf) +
  geom_line(aes(x = mydate, y = price), colour = 'red', size = 0.8) +
  geom_text(data = mytext.m, aes(x = mydate, y = value, label = value), size = 4)
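If the label positions follow a regular pattern, you can also build the long-format table directly instead of melting a wide one: repeat every date once per y-level. A sketch for the toy data; for the real chart you would swap in a seq() of dates with by = "2 years" and the 15 y-levels you need. label_dates, label_levels and mytext_long are hypothetical names.

```r
library(ggplot2)

set.seed(12345)
mydf <- data.frame(mydate = seq(as.Date("2012-01-01"), as.Date("2012-01-31"), by = "day"),
                   price  = runif(31, min = 100, max = 200))

label_dates  <- as.Date(c("2012-01-10", "2012-01-20"))
label_levels <- seq(120, 160, by = 20)

# Every (date, level) pair in long format: dates vary fastest, levels repeat
mytext_long <- data.frame(mydate = rep(label_dates, times = length(label_levels)),
                          value  = rep(label_levels, each = length(label_dates)))

p <- ggplot(mydf, aes(x = mydate, y = price)) +
  geom_line(colour = "red") +
  geom_text(data = mytext_long, aes(x = mydate, y = value, label = value), size = 4)
print(p)
```

rep() keeps the Date class, so the label table can go straight onto the date axis with a single geom_text() call.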
