I have a data frame (a tibble) like this:
library(tidyverse)
library(lubridate)
x = tibble(date=c("2022-04-25 07:04:07", "2022-04-25 07:09:07", "2022-04-25 07:14:07", "2022-04-26 07:04:07"),
value=c("on", "off", "on", "off"))
x$day<- as.factor(day(x$date))
x$time <- paste0(str_pad(hour(x$date),2,pad="0"),":",str_pad(minute(x$date),2,pad="0"))
When I plot the data:
x %>% ggplot() + geom_col(aes(x=day,y=time, fill=value))
The times on the y axis do not line up with the bars: each time is supposed to appear alongside its own bar segment.
I tried using as.factor(time), but that didn't solve it.
I also tried using a constant numeric y value instead:
x = tibble(date=c("2022-04-25 07:04:07", "2022-04-25 07:09:07", "2022-04-25 07:14:07", "2022-04-26 07:04:07"),
fake_y=c(1,1,1,1),
value=c("on", "off", "on", "off"))
x$day <- as.factor(day(x$date))
x %>% ggplot() + geom_col(aes(x=day,y=fake_y, fill=value))
but then the order of the on/off bars is lost.
How can I fix this?
Since you are looking for a timeline, geom_segment is probably a better fit than geom_col. A single day may contain several 'on' and 'off' values, so it would be difficult to get columns to stack correctly, and you would also need to take the differences between successive on/off times to get them to stack at all. Furthermore, column labels would be wrong if "off" represents the moment of switching from the on state to the off state.
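To make the "on/off events become segments" idea concrete, here is a minimal sketch on a toy version of the question's data. It assumes each row marks the moment a new state begins, so `dplyr::lead()` supplies the end of each state; the column names `start`, `end` and `minutes` are illustrative.

```r
library(dplyr)

# Toy event log: each row is the moment the device switches to `value`
events <- tibble(
  date  = as.POSIXct(c("2022-04-25 07:04:07", "2022-04-25 07:09:07",
                       "2022-04-25 07:14:07"), tz = "UTC"),
  value = c("on", "off", "on")
)

# Each state lasts until the next event, so the next row's time is its end
segments <- events %>%
  mutate(start = date, end = lead(date)) %>%
  filter(!is.na(end)) %>%  # the last state has no known end yet
  mutate(minutes = as.numeric(difftime(end, start, units = "mins")))

segments
```

Each remaining row can then be drawn with `geom_segment(aes(y = start, yend = end))`, which is what the full answer does after padding every day with a start-of-day and end-of-day row.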
When working with times in R, it is usually best to keep them in a time format for plotting. If you convert times to character strings before plotting, ggplot2 treats them as discrete values, so they are evenly spaced rather than spaced in proportion to the actual time between them.
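A minimal sketch of the difference, using a few clock times like the question's `time` column (the extra "09:00" value is made up to exaggerate the gap): as a factor, each value just gets the next integer position, while as an `hms` time it is a number of seconds since midnight.

```r
library(hms)

# Clock times as strings, like the question's `time` column
times_chr <- c("07:04", "07:09", "07:14", "09:00")

# As character/factor values, ggplot2 places them at discrete positions
# 1, 2, 3, 4, whatever the real time between them
discrete_positions <- as.integer(factor(times_chr))

# As hms values they are seconds since midnight, so gaps are proportional
times_hms <- as_hms(paste0(times_chr, ":00"))
gaps_seconds <- diff(as.numeric(times_hms))
gaps_seconds
```

The last gap (1 hour 46 minutes) is more than twenty times the first two, but on a discrete axis all three would look the same.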
Since you want the day along one axis, you need a fair amount of data manipulation to record the state at the start and end of each day, but it can be done like this:
p <- x %>%
  mutate(date = as.POSIXct(date)) %>%
  mutate(day = as.factor(day(date))) %>%
  group_by(day) %>%
  # add a midnight row holding the opposite of the day's first state
  group_modify(~ add_row(.x,
                         date = floor_date(as.POSIXct(first(.x$date)), 'day'),
                         value = ifelse(first(.x$value) == 'on', 'off', 'on'),
                         .before = 1)) %>%
  # add a row one second before the next midnight carrying the last state
  group_modify(~ add_row(.x,
                         date = ceiling_date(as.POSIXct(last(.x$date)), 'day') - 1,
                         value = last(.x$value))) %>%
  # each state lasts until the time in the next row
  mutate(ends = lead(date)) %>%
  filter(!is.na(ends)) %>%
  # keep only the time of day so that all days share one y axis
  mutate(date = hms::as_hms(date), ends = hms::as_hms(ends)) %>%
  ggplot(aes(x = day, y = date)) +
  geom_segment(aes(xend = day, yend = ends, color = value),
               size = 20) +
  # zoom to 06:58:40-07:21:40 (y values are seconds since midnight)
  coord_cartesian(ylim = c(25120, 26500)) +
  labs(y = 'time') +
  guides(color = guide_legend(override.aes = list(size = 8)))
p
And of course, you can easily flip the co-ordinates if you wish, and apply theme elements to make the plot more appealing:
p + coord_flip(ylim = c(25120, 26500)) +
scale_color_manual(values = c('deepskyblue4', 'orange')) +
theme_light(base_size = 16)
In the following example, the areas are not stacked, but I would like them to be. It should be a partial stack: the intensity starts at 0.5, reaches 0.8 where the two areas overlap, and ends at 0.3.
I assume that the position argument does not work because the start and end dates of the two series differ.
Am I missing an argument that could solve this issue? Or maybe another geom?
Do I have to split the data into days to get the desired output? If so, how can I achieve that?
Thanks in advance,
# Library
library(tidyverse)
library(lubridate)
# Data
df <- tibble(date_debut = as_date(c("2022-09-28", "2022-10-05")),
intensity = c(0.5, 0.3),
duration = days(c(14, 10)),
type = (c("a", "b")))
# Adjustment
df <- df %>%
mutate(date_fin = date_debut + duration) %>%
pivot_longer(cols = c(date_debut, date_fin),
names_to = "date_type",
values_to = "date")
# Plot
df %>%
ggplot(aes(x = date, y = intensity, fill = type))+
geom_area(position = "stack")
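This is a tough data-wrangling problem. Area plots only stack where the points in the two series share the same x values. The following achieves that, starting from the original df (before the pivot_longer() step), though it is quite a profligate approach: it evaluates both series at every minute of the date range.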
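The essential step is putting both series on one shared set of x values. Here is a minimal sketch of that step on a daily grid instead of a per-minute one, assuming whole-day resolution is enough (the question's durations are whole days, so `duration` is stored here as a plain number of days rather than a lubridate period). Each type gets an intensity on every grid day, 0 outside its own interval, so stacked heights simply add up.

```r
library(dplyr)
library(tidyr)

df <- tibble(date_debut = as.Date(c("2022-09-28", "2022-10-05")),
             intensity  = c(0.5, 0.3),
             duration   = c(14, 10),  # whole days, like days(c(14, 10))
             type       = c("a", "b")) %>%
  mutate(date_fin = date_debut + duration)

# One shared daily grid covering every interval
day_grid <- seq(min(df$date_debut), max(df$date_fin), by = "1 day")

# Pair every type with every grid day; intensity is 0 outside its interval
on_grid <- expand_grid(df, date = day_grid) %>%
  mutate(intensity = ifelse(date >= date_debut & date <= date_fin,
                            intensity, 0)) %>%
  select(date, type, intensity)

# Both series now share the same x values, so stacked heights can be summed
stacked <- on_grid %>%
  group_by(date) %>%
  summarise(total = sum(intensity), .groups = "drop")

range(stacked$total)
```

`on_grid` can then be passed to `ggplot(aes(date, intensity, fill = type)) + geom_area()`, and the stack reaches 0.8 on the eight overlapping days.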
df %>%
  mutate(interval = interval(date_debut, date_debut + duration)) %>%
  group_by(type) %>%
  # reframe() (dplyr >= 1.1.0) returns many rows per group; older versions used summarize()
  reframe(time = seq(as.POSIXct(min(df$date_debut)),
                     as.POSIXct(max(df$date_debut + df$duration)), by = 'min'),
          intensity = ifelse(time %within% interval, intensity, 0)) %>%
  ggplot(aes(x = time, y = intensity, fill = type)) +
  geom_area(position = position_stack())
Allan Cameron's answer inspired me to look further into tidyr's complete().
The proposed answer solved my question, so I accepted it. However, it is indeed more complex than needed.
I solved it this way:
# Adjustment
df <- df %>%
mutate(date_fin = date_debut + duration) %>%
group_by(type) %>%
complete(date_debut = seq(min(date_debut), max(date_fin), by = "1 day")) %>%
fill(intensity) %>%
select(date_debut, intensity, type)
ggplot(df, aes(x = date_debut, y = intensity, fill = type)) +
geom_area()+
scale_x_date(date_labels = "%d",
date_breaks = "1 day")
To avoid the odd empty space, geom_col works fine for me (the question was about geom_area, so no worries).
ggplot(df, aes(x = date_debut, y = intensity, fill = type, colour = type)) +
geom_col(width = 0.95)+
scale_x_date(date_labels = "%d",
date_breaks = "1 day")
I need to work with the hflights package and make a subplot for each weekday showing the delay of the flights (cancelled flights excluded). The problem is that I'm not able to get both axes right (x: month, y: delay in min). I tried facet_wrap and facet_grid, but I'm not familiar with those functions because I don't use ggplot2 that often.
The plot will be clearer if you name the months and weekdays, put them in the correct order, and use a logarithmic scale on the y axis (note that a log scale drops zero and negative delays, i.e. flights that left on time or early). You can use facet_wrap to create a subplot for each weekday.
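The naming trick relies on indexing a lookup vector with the numeric codes and then passing the same vector as `levels`, so the factor follows calendar order instead of alphabetical order. A minimal base-R sketch with made-up codes; the 1 = Monday weekday coding is an assumption (it matches the ISO convention and the fact that 1 January 2011, a Saturday, is coded 6 in hflights), so check it against your data.

```r
# Lookup vector: position i holds the label for code i (assumes 1 = Monday)
weekday <- c('Monday', 'Tuesday', 'Wednesday', 'Thursday',
             'Friday', 'Saturday', 'Sunday')

month_codes   <- c(3, 1, 12, 1)   # hypothetical Month values
weekday_codes <- c(6, 7, 1, 6)    # hypothetical DayOfWeek values

# Indexing turns codes into labels; `levels` fixes the calendar order
# (without it, factor() would sort the labels alphabetically)
months <- factor(month.abb[month_codes], levels = month.abb)
wd     <- factor(weekday[weekday_codes], levels = weekday)

# Sanity check of the coding: ISO weekday number (%u, 1 = Monday) of 2011-01-01
iso_code <- as.integer(format(as.Date("2011-01-01"), "%u"))
weekday[iso_code]
```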
library(hflights)
library(tidyverse)
# hflights codes DayOfWeek as 1 = Monday ... 7 = Sunday
# (1 January 2011, a Saturday, is coded 6)
weekday <- c('Monday', 'Tuesday', 'Wednesday', 'Thursday',
'Friday', 'Saturday', 'Sunday')
hflights %>%
mutate(WeekDay = factor(weekday[DayOfWeek], weekday),
Month = factor(month.abb[Month], month.abb)) %>%
filter(Cancelled == 0) %>%
ggplot(aes(Month, DepDelay)) +
geom_boxplot() +
scale_y_log10() +
facet_wrap(.~WeekDay) +
labs(x = 'Month', y = 'Departure delay (log scale)')
To get a single line through each panel, you need an average for each unique combination of month and weekday. The simplest way to get this is via geom_smooth:
hflights %>%
mutate(WeekDay = factor(weekday[DayOfWeek], weekday),
Month = factor(month.abb[Month], month.abb)) %>%
filter(Cancelled == 0) %>%
ggplot(aes(Month, DepDelay, group = WeekDay)) +
geom_smooth(se = FALSE, color = 'black') +
facet_wrap(.~WeekDay) +
labs(x = 'Month', y = 'Departure delay (min)')
Alternatively, you can summarize the data yourself and use geom_line:
hflights %>%
filter(Cancelled == 0) %>%
mutate(WeekDay = factor(weekday[DayOfWeek], weekday),
Month = factor(month.abb[Month], month.abb)) %>%
group_by(Month, WeekDay) %>%
summarize(Delay = mean(DepDelay, na.rm = TRUE), .groups = 'drop') %>%
ggplot(aes(Month, Delay, group = WeekDay)) +
geom_line(color = 'black') +
facet_wrap(.~WeekDay) +
labs(x = 'Month', y = 'Mean departure delay (min)')
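To see what the summarize() step hands to geom_line(), here is a minimal sketch on a hypothetical toy table with the same column names: it yields exactly one row per Month and WeekDay combination, and na.rm = TRUE stops a single missing delay from turning a whole group's mean into NA.

```r
library(dplyr)

# Hypothetical toy flights: two months x two weekdays, one missing delay
flights <- tibble(
  Month    = c("Jan", "Jan", "Jan", "Feb", "Feb", "Feb"),
  WeekDay  = c("Monday", "Monday", "Friday", "Monday", "Friday", "Friday"),
  DepDelay = c(10, 20, 5, 8, 0, NA)
)

# One row per Month x WeekDay combination: one point per month in each panel
monthly <- flights %>%
  group_by(Month, WeekDay) %>%
  summarise(Delay = mean(DepDelay, na.rm = TRUE), .groups = "drop")

monthly
```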
Since you didn't post your code, let's assume you have saved a plot that shows everything in a single panel as a.
a + facet_grid(rows = vars(weekday))
("weekday" is the name of the column holding the weekdays; replace it if yours is named differently.)
If this isn't what you were looking for, it would be great if you could post some code.
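Suppose you want to show the mean ArrDelay; stat_summary() computes it within each month of each weekday panel:
hflights %>%
  filter(Cancelled != 1) %>%
  ggplot(aes(x = as.factor(Month), y = ArrDelay)) +
  # mean arrival delay per month, ignoring missing delays
  stat_summary(fun = mean, geom = 'col', na.rm = TRUE) +
  labs(x = 'Month', y = 'Mean arrival delay') +
  facet_wrap(~DayOfWeek)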
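A common pitfall here is computing mean() inside aes(): it is evaluated once over the whole column, so every row gets the same height and geom_col() stacks one segment per row. A minimal check on hypothetical toy data, comparing bar heights with `ggplot2::layer_data()`:

```r
library(ggplot2)

# Hypothetical toy data: month "1" has two flights, month "2" has one
d <- data.frame(Month = c("1", "1", "2"), ArrDelay = c(1, 3, 10))

# mean() inside aes() sees the whole column (14/3 for every row),
# and geom_col() stacks one segment per row
p_wrong <- ggplot(d, aes(Month, mean(ArrDelay))) + geom_col()
ld_wrong <- layer_data(p_wrong)
wrong_heights <- tapply(ld_wrong$ymax, as.numeric(ld_wrong$x), max)

# stat_summary() applies mean() within each x value instead
p_right <- ggplot(d, aes(Month, ArrDelay)) +
  stat_summary(fun = mean, geom = "col")
right_heights <- layer_data(p_right)$y
```

The "wrong" bars come out as 28/3 and 14/3 (row count times the overall mean), while the stat_summary() bars are the per-month means 2 and 10.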
I'm relatively new to R and could really use some help with some pretty basic ggplot2 work.
I'm trying to visualize the number of submissions on a graph, showing the running total as a line and the daily total as a histogram (or bar graph) on top of it. I'm not sure how to set the breaks or bins of the histogram so that it takes the submission datetime column and makes each bar a daily total.
I tried adding a column that converts the datetime into just a date and plotting based on that, but I'd really like the line graph to include the time.
Here's what I have so far:
df <- df %>%
mutate(datetime = lubridate::mdy_hm(datetime))%>%
mutate(date = lubridate::as_date(datetime))
#sort by datetime
df <- df %>%
arrange(datetime)
#add total number of submissions
df <- df %>%
mutate(total = row_number())
#ggplot
line_plus_histo <- df%>%
ggplot() +
geom_histogram(data = df, aes(x=datetime)) +
geom_line(data = df, aes(x=datetime, y=total), col = "red") +
stat_bin(data = df, aes(x=date), geom = "bar") +
labs(
title="Submissions by Day",
x="Date",
y="Submissions",
legend=NULL)
line_plus_histo
As you can see, I'm also calculating the total number of submissions by sorting by time and then adding a column with the row number, so if you can suggest a better method I'd really appreciate it.
Below is the line-plus-histogram plot of submissions over time:
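Sorting and then using row_number() is a perfectly reasonable way to get the running total. For the daily bars, a minimal sketch with dplyr::count() on hypothetical timestamps (in the question's m/d/y h:m format) gives the per-day totals directly:

```r
library(dplyr)
library(lubridate)

# Hypothetical submission times, deliberately out of order
df <- tibble(datetime = mdy_hm(c("02/01/2022 09:15", "02/01/2022 17:40",
                                 "02/02/2022 08:05", "02/01/2022 12:00")))

# Running total: sort by time, then number the rows 1, 2, 3, ...
df <- df %>%
  arrange(datetime) %>%
  mutate(total = row_number())

# Daily totals for the bars: count rows per calendar day
daily <- df %>% count(day = as_date(datetime), name = "submissions")

daily
```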
Here's the pastebin link with my data
You can extend your data manipulation like this:
library(lubridate)
df <- df |>
  mutate(datetime = mdy_hm(datetime)) |>
  arrange(datetime) |>
  # noon of each submission's day, so each daily bar is centred on its day
  mutate(midday = floor_date(datetime, unit = "day") + hours(12)) |>
  mutate(totals = row_number()) |>
  group_by(midday) |>
  mutate(N = n()) |>
  ungroup()
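The point of the `midday` column is that every timestamp from the same day maps to one value at 12:00, so geom_bar() counts them together and draws the bar centred on that day of the datetime axis. A minimal sketch of that mapping with lubridate on a few hypothetical timestamps:

```r
library(lubridate)

datetime <- ymd_hms(c("2022-02-01 09:15:00", "2022-02-01 23:59:00",
                      "2022-02-02 00:01:00"))

# Round down to midnight, then shift to noon of the same day
midday <- floor_date(datetime, unit = "day") + hours(12)

format(midday, "%Y-%m-%d %H:%M")
```

The first two timestamps share one bar position even though they are almost fifteen hours apart, while the third, two minutes later, starts a new day.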
Then use midday for the bars and datetime for the line:
df%>%
ggplot() +
geom_bar(data = df, aes(x = midday)) +
geom_line(data = df, aes(x=datetime, y=totals), col = "red") +
labs(
title="Submissions by Day",
x="Date",
y="Submissions",
legend=NULL)
PS. Sorry for the Polish locale on the x axis.
PS2. It looks much better with geom_bar.
Created on 2022-02-03 by the reprex package (v2.0.1)
I have created a bar plot in which each variable has up to four data points, and it plots successfully. The only issue is that the key (legend) is not in the order I would like: ideally it would run from best to worst, in this case from 'Excellent' to 'Not so good'.
Which part of the code would I need to change for the order to go from best to worst?
df <- read.csv("//ecfle35/STAFF-HOME$/MaxEmery/open event feedback/October/Q3.csv")
df %>%
#First the dataset needs to be long not wide
gather(review,
count,
Excellent:Not.so.good,
factor_key = T) %>%
#Let's get rid of N/A
filter(count != 'N/A') %>%
#convert count from string to number
#Remove the annoying full stop in the middle of text
mutate(count = as.integer(count),
review = gsub('\\.', ' ', review)) %>%
ggplot(aes(
x = Faculty,
y = count,
fill = review
)) +
geom_bar(position = 'dodge',
stat = 'identity') +
scale_y_continuous(breaks = seq(0, 22, by = 2)) +
labs(title = 'Teaching Staff Ratings',
x = 'Faculty',
y = 'Count') +
theme(axis.text.x = element_text(angle = 90))
Below is an image of how it currently looks:
[Image: graphic of my plot]
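When a character column is converted to a factor without specifying the levels, the levels are sorted alphabetically. Here, gather(..., factor_key = TRUE) does create review as a factor in column order, but gsub() returns a plain character vector, so that order is lost and ggplot2 falls back to alphabetical order. Add a mutate() step in the pipe that relevels the review column (for example with factor(review, levels = ...)) before sending it to ggplot().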
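One way to keep the best-to-worst order without typing every label is to build the factor in column order first and then clean up the labels on the levels themselves. A minimal sketch with forcats and pivot_longer() (the current replacement for gather()); only "Excellent" and "Not so good" come from the question, the other columns and all counts are hypothetical.

```r
library(dplyr)
library(tidyr)
library(forcats)

# Hypothetical wide survey table; middle column names are made up
wide <- tibble(Faculty = c("Arts", "Science"),
               Excellent = c(5, 7), Good = c(3, 2),
               Average = c(1, 0), Not.so.good = c(0, 1))

long <- wide %>%
  pivot_longer(Excellent:Not.so.good, names_to = "review", values_to = "count") %>%
  # fct_inorder(): levels follow first appearance, i.e. the column order;
  # fct_relabel(): replaces the dots in each label without reordering levels
  mutate(review = fct_relabel(fct_inorder(review),
                              function(x) gsub(".", " ", x, fixed = TRUE)))

levels(long$review)
```

Because `fill = review` now maps a factor whose levels run from "Excellent" to "Not so good", the legend and the dodged bars follow that order.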
I would like to know how I can rescale the y axis in this example. I do not want a transformation as such; I simply want evenly spaced breaks at values I define myself (in my real data most values are in the range 0 to 300, with one tall bar in a single category, so I would like to give the lower bars more room).
set.seed(123)
df <- data.frame(c(rnorm(20,10,2),rnorm(30,50,5),rnorm(25,5,0.5)),
c(rnorm(75,60,20)),
c(rep("A",20),rep("B",30),rep("C",25)))
colnames(df) <- c("bags","dist","source")
df %>%
mutate(bin = cut(dist, breaks = c(min(dist),18, 30,40, 50, max(dist)), include.lowest = TRUE)) %>% # specify ranges; include.lowest keeps min(dist) from becoming NA
group_by(source, bin) %>%
summarise(sum_number = sum(bags)) %>%
ungroup() %>%
ggplot(aes(bin, sum_number, fill=source))+
geom_col()+
xlab("Km")+
ylab("Number of bags")+
scale_fill_manual(values = c("#a6611a","#dfc27d","#bababa"),
labels = unique(df$source),
name = "")+
scale_y_continuous(limits = c(0,1200), breaks =
c(0,50,100,150,200,250,300,500,1000),
labels = c("0", "50", "100","150", "200","250","300", "500","1,000")) +
theme_minimal()
The code above only applies the chosen ticks; it does not rescale the y axis, which is what I need.
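Giving the low range more room is, strictly speaking, a non-linear axis transformation, but it can be a custom piecewise-linear one that leaves 0 to 300 untouched and only compresses the values above. A minimal sketch with `scales::trans_new()`; the cut point of 300 and the compression factor of 10 are illustrative assumptions to tune for the real data.

```r
library(scales)

# Piecewise-linear transform: values up to `cut` keep their scale,
# values above `cut` are compressed by a factor of `squeeze`
squash_trans <- function(cut = 300, squeeze = 10) {
  trans_new(
    name      = "squash",
    transform = function(y) ifelse(y <= cut, y, cut + (y - cut) / squeeze),
    inverse   = function(y) ifelse(y <= cut, y, cut + (y - cut) * squeeze)
  )
}

tr <- squash_trans()
tr$transform(c(0, 300, 1000))

# Usage with the plot above (sketch; the argument is `transform` in
# ggplot2 >= 3.5.0 and `trans` in older versions):
# scale_y_continuous(trans = squash_trans(), limits = c(0, 1200),
#                    breaks = c(0, 50, 100, 150, 200, 250, 300, 500, 1000))
```

With these settings the 0 to 300 range takes up 300 of the roughly 390 transformed units, so the lower bars fill most of the panel while the tall bar is still drawn to its true value.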