Ordering axes and making data more presentable - r

I am trying to order the time and date axes on my scatter plot into epochs/time periods. For example, times between 12pm-7:59pm and 9pm-11:59pm. I want to do something similar for the dates.
I am fairly new to R, so I am just looking for suggestions, or to be told whether this is even possible, and maybe some alternatives :)
This is my code so far:
library(tidyverse)
accident <- read.csv("accidents.csv", header = TRUE)
accident %>%
ggplot() +
geom_point(mapping = aes(x = Time, y = Date), alpha = 0.5)
Thank you!

Welcome to R! Here is one set of options.
library(tidyverse)
library(lubridate)
First, simulate dataset
accident <-
rnorm(n = 1000, mean = 1500000000, sd = 1000000) %>%
tibble(date_time = .) %>%
mutate(date_time = as.POSIXct(date_time, origin = "1970-01-01")) %>%
separate(date_time, into = c("date", "time"), sep = " ", remove = F)
Original plot:
accident %>%
ggplot()+
geom_point(aes(x=time, y=date), alpha=0.5)
Step 1: Collapse the x axis into smaller number of groups
accidents_per_trihour <-
accident %>%
mutate(hour = hour(date_time), # hour of day (0-23); avoids string slicing, which breaks at midnight
tri_hour = cut(hour, seq(0, 24, by = 3), right = FALSE, include.lowest = T)) %>% # bins [0,3), [3,6), ..., [21,24]
group_by(date, tri_hour) %>%
count()
Then scale dot size by number of accidents
accidents_per_trihour %>%
ggplot()+
geom_point(aes(x=tri_hour, y=date, size = n), alpha=0.5) +
labs(x = "\nTime (in three-hour groups)", y = "Day\n", size = "Accidents count")
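If you want uneven epochs like the ones in the question rather than equal three-hour bins, cut() also accepts custom breaks and labels. The following is a minimal base-R sketch; the hours and the epoch boundaries are made up for illustration, so swap in whatever periods make sense for your data.

```r
# Hypothetical hours of the day (0-23), standing in for the real hour column
hours <- c(0, 3, 7, 8, 12, 19, 20, 23)

# Assumed epoch boundaries, chosen only for illustration.
# right = FALSE makes every bin closed on the left: [0, 8), [8, 12), ...
epoch <- cut(
  hours,
  breaks = c(0, 8, 12, 20, 24),
  labels = c("12am-7:59am", "8am-11:59am", "12pm-7:59pm", "8pm-11:59pm"),
  right = FALSE
)

# cut() returns a factor whose levels follow the order of the breaks,
# so ggplot2 draws the epochs in chronological order on the axis
table(epoch)
```

Because the result is a factor, it can be mapped to x exactly like tri_hour.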
Still not great because the y axis is too expansive. So:
Step 2: Collapse the y axis into smaller number of groups
(For your data you may need to group into months for things to start to look reasonable)
accidents_per_trihour_per_week <-
accident %>%
mutate(hour = hour(date_time), # hour of day (0-23); avoids string slicing, which breaks at midnight
tri_hour = cut(hour, seq(0, 24, by = 3), right = FALSE, include.lowest = T)) %>% # bins [0,3), [3,6), ..., [21,24]
mutate(week_start = floor_date(as.Date(date), unit = "weeks"),
week = format.Date(week_start, "%Y, week %W")) %>%
group_by(week, tri_hour) %>%
count()
Should be much more readable now
We’ll improve the theme as well, just because.
if (!require(ggthemr)) devtools::install_github('cttobin/ggthemr')
ggthemr::ggthemr("flat") ## helps with pretty theming
accidents_per_trihour_per_week %>%
ggplot()+
geom_point(aes(x=tri_hour, y=week, size = n), alpha = 0.9) +
labs(x = "\nTime (in three-hour groups)", y = "Week\n", size = "Accidents count")
Could also do a tile plot
accidents_per_trihour_per_week %>%
ggplot() +
geom_tile(aes(x = tri_hour, y = week, fill = n)) +
geom_label(aes(x = tri_hour, y = week, label = n), alpha = 0.4, size = 2.5, fontface = "bold") +
labs(x = "\nTime (in three-hour groups)", y = "Week\n", fill = "Accidents count")
Created on 2021-11-24 by the reprex package (v2.0.1)
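If weeks are still too fine-grained, the same idea works for months, as hinted at in Step 2. A small base-R sketch with made-up dates (lubridate's floor_date(unit = "month") would be an alternative):

```r
# Hypothetical accident dates, for illustration only
dates <- as.Date(c("2017-07-03", "2017-07-28", "2017-08-01",
                   "2017-08-15", "2017-09-30"))

# "%Y-%m" labels sort alphabetically in chronological order,
# so the axis stays ordered without extra factor handling
month_group <- format(dates, "%Y-%m")

table(month_group)
```

You would then group by month_group instead of week before calling count().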

Related

How to get r to not remove a row in ggplot - geom_line

I'm trying to produce a graph of growth rates over time based upon the following data which has blanks in two groups.
When I try to make a growth plot of this using geom_line to join points there is no line for group c.
I'm just wondering if there is any way to fix this.
One option would be to get rid of the missing values, which prevent the points from being connected by the line.
Making use of the code from the answer I provided on your previous question, but adding tidyr::drop_na:
Growthplot <- data.frame(
Site = letters[1:4],
July = 0,
August = c(1, -1, NA, 2),
September = c(3, 2, 3, NA)
)
library(ggplot2)
library(tidyr)
library(dplyr, warn.conflicts = FALSE)
growth_df <- Growthplot %>%
pivot_longer(-Site, names_to = "Month", values_to = "Length") %>%
mutate(Month = factor(Month, levels = c("July", "August", "September"))) %>%
drop_na()
ggplot(growth_df, aes(x = Month, y = Length, colour = Site, group = Site)) +
geom_point() +
geom_line()+
labs(color = "Site", x = "Month", y = "Growth in cm") +
theme(axis.line = element_line(colour = "black", size = 0.24))
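To see why dropping the NA rows helps, here is a small base-R sketch of what drop_na() does to the long data. It uses only sites c and d from the example above; the name kept is just for illustration.

```r
# Long-format version of sites c and d from the example data
long <- data.frame(
  Site   = rep(c("c", "d"), each = 3),
  Month  = rep(c("July", "August", "September"), times = 2),
  Length = c(0, NA, 3, 0, 2, NA)
)

# Base-R equivalent of tidyr::drop_na() for a single column
kept <- long[!is.na(long$Length), ]

# Site c keeps July and September, so geom_line() has two adjacent
# points to connect instead of a gap at August
table(kept$Site)
```

Note that the line for site c then runs straight from July to September, which visually interpolates over the missing August value.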

Avoid overlap of points on a timeline (1-D repelling)

I want to create a timeline plot that roughly resembles the example below: lots of overlap at some points, not a lot of overlap at others.
What I need: overlapping images should repel each other where necessary, eliminating or reducing overlap. Ideally I'd be able to implement either a vertical or horizontal repel.
library(tidyverse)
library(ggimage)
test_img <- list.files(system.file("extdata", package="ggimage"), pattern="png", full.names=TRUE)
set.seed(123)
df <-
tibble(date = as.Date(paste0("2020-", round(runif(45, 1, 2)), "-", round(runif(45, 1, 10)))),
group = paste0("Timeline ", rep(1:9, each = 5)),
img = sample(test_img, size = 45, replace = T) )
df %>%
ggplot() +
geom_line(aes(x = date, y = group, group = group), size = 5, alpha = 0.2) +
geom_image(aes(x = date, y = group, image = img, group = group), asp = 1)
Something similar to the repelling in ggbeeswarm::geom_beeswarm or ggrepel::geom_text_repel would be nice, but those don't support images. So I think I need to pre-apply some kind of 1-dimensional packing algorithm, implementing iterative pair-wise repulsion on my vector of dates within each group, to try to find a non-overlapping arrangement.
Any ideas? Thank you so much!
Created on 2021-10-30 by the reprex package (v2.0.1)
Here is the solution I’ve been able to come up with: repurposing the circleRepelLayout function from the awesome packcircles package
into a repel_vector function that takes in your overlapping vector and a "repel_radius", and returns, if possible, a non-overlapping version.
I demonstrate the solution with the richtext geom since this is a geom I’ve always wished had repel functionality.
library(packcircles)
library(tidyverse)
library(ggtext)
library(ggimage)
repel_vector <- function(vector, repel_radius = 1, repel_bounds = range(vector)){
stopifnot(is.numeric(vector))
repelled_vector <-
packcircles::circleRepelLayout(x = data.frame(vector, ypos = 1, repel_radius),
xysizecols = c("vector", "ypos", "repel_radius"),
xlim = repel_bounds, ylim = c(0,1),
wrap = FALSE) %>%
as.data.frame() %>%
.$layout.x
return(repelled_vector)
}
overlapping_vec <- c(1, 1.1, 1.2, 10, 10.1, 10.2)
repelled_vec_default <- repel_vector(overlapping_vec)
repelled_vec_tighter <- repel_vector(overlapping_vec, repel_radius = 0.35)
ggplot() +
annotate("richtext", x = overlapping_vec, y = 3, label = "**test**", alpha = 0.5) +
annotate("richtext", x = repelled_vec_default, y = 2, label = "**test**", alpha = 0.5) +
annotate("richtext", x = repelled_vec_tighter, y = 1, label = "**test**", alpha = 0.5) +
scale_y_continuous(breaks = 1:3, labels = c("Tighter repel", "Default repel", "Overlapping points"))
In theory you could apply this to 2D repelling as well.
To solve the problem in my question, this can be applied like so:
test_img <- list.files(system.file("extdata", package="ggimage"), pattern="png", full.names=TRUE)
set.seed(123)
df <-
tibble(date = as.Date(paste0("2020-", round(runif(45, 1, 2)), "-", round(runif(45, 1, 10)))),
group = paste0("Timeline ", rep(1:9, each = 5)),
img = sample(test_img, size = 45, replace = T) ) %>%
group_by(group) %>%
mutate(repelled_date = repel_vector(as.numeric(date),
repel_radius = 4,
repel_bounds = range(as.numeric(date)) + c(-3,3)),
repelled_date = as.Date(repelled_date, origin = "1970-01-01"))
df %>%
ggplot() +
geom_line(aes(x = date, y = group, group = group), size = 5, alpha = 0.2) +
geom_image(aes(x = repelled_date, y = group, image = img, group = group), asp = 1)
Created on 2021-10-30 by the reprex package (v2.0.1)
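For comparison, if you would rather not depend on packcircles, a much cruder 1-D alternative is a single sorted sweep that enforces a minimum gap. This is not the same algorithm as repel_vector above: points are only ever pushed to the right and no bounds are respected. The function name spread_min_gap and its min_gap argument are hypothetical.

```r
# Push sorted points apart so that neighbours are at least min_gap apart.
# Points only move to the right; the leftmost point never moves.
spread_min_gap <- function(x, min_gap) {
  stopifnot(is.numeric(x), min_gap >= 0)
  ord <- order(x)
  out <- x[ord]
  for (i in seq_along(out)[-1]) {
    out[i] <- max(out[i], out[i - 1] + min_gap)
  }
  res <- numeric(length(x))
  res[ord] <- out  # restore the original element order
  res
}

overlapping_vec <- c(1, 1.1, 1.2, 10, 10.1, 10.2)
spread_vec <- spread_min_gap(overlapping_vec, min_gap = 0.5)
spread_vec
```

Dates can be handled the same way as in the answer: convert with as.numeric(), spread, then convert back with as.Date(..., origin = "1970-01-01").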

Combine text and image in a geom_label_repel in ggplot

I'm trying to do a line graph and have the last point of each series be labelled by a combination of text and image. I usually use ggrepel package for this and have no problem doing this with text only. My problem is I can't figure out how to add an image in the label.
I thought that a label like Country <img src='https://link.com/to/flag.png' width='20'/> would work and so this is what I've tried to do:
library(dplyr)
library(ggplot2)
library(ggrepel)
# example df
df <- data.frame(
Country = c(rep("France", 5), rep("United Kingdom", 5)),
Ratio = rnorm(10),
Days = c(seq(1, 5, 1), seq(4, 8, 1)),
abbr = c(rep("FR", 5), rep("GB", 5))) %>%
group_by(Country) %>%
# add "label" only to last point of the graph
mutate(label = if_else(Days == max(Days),
# combine text and img of country's flag
true = paste0(Country, " <img src='https://raw.githubusercontent.com/behdad/region-flags/gh-pages/png/", abbr, ".png' width='20'/>"),
false = NA_character_)
)
# line graph
ggplot(data = df, aes(x = Days, y = Ratio, color = Country)) +
geom_line(size = 1) +
theme(legend.position = "none") +
geom_label_repel(aes(label = label),
nudge_x = 1,
na.rm = T)
But this produces the raw label and not the country's name with its flag, as intended:
This is obviously not the way to go, can anyone please help me?
Try this approach using the ggtext function geom_richtext(). You can customize other elements if you wish. Here is the code:
library(dplyr)
library(ggplot2)
library(ggrepel)
library(ggtext)
# example df
df <- data.frame(
Country = c(rep("France", 5), rep("United Kingdom", 5)),
Ratio = rnorm(10),
Days = c(seq(1, 5, 1), seq(4, 8, 1)),
abbr = c(rep("FR", 5), rep("GB", 5))) %>%
group_by(Country) %>%
# add "label" only to last point of the graph
mutate(label = if_else(Days == max(Days),
# combine text and img of country's flag
true = paste0(Country, " <img src='https://raw.githubusercontent.com/behdad/region-flags/gh-pages/png/", abbr, ".png' width='20'/>"),
false = NA_character_)
)
# line graph
ggplot(data = df, aes(x = Days, y = Ratio, color = Country,label = label)) +
geom_line(size = 1) +
theme(legend.position = "none") +
geom_richtext(na.rm = T,nudge_x = -0.1,nudge_y = -0.1)
Output:

Multiple linear regression model along with ensemble ggplot in R?

I am trying to predict the June - September Level for the year 2020 using a multiple linear regression model. In my example below, I assume that the year 2016 conditions will repeat and use them to predict the June-Sep Level for 2020. I plot the observed Level up until May 31, shown as a solid black line, and the forecasted Level, shown as a dashed line.
library(tidyverse)
library(lubridate)
set.seed(1500)
DF <- data.frame(Date = seq(as.Date("2000-01-01"), to = as.Date("2018-12-31"), by = "days"),
Level = runif(6940, 360, 366), Flow = runif(6940, 1,10),
PCP = runif(6940, 0,25), MeanT = runif(6940, 1, 30)) %>%
mutate(Year = year(Date), Month = month(Date), Day = day(Date)) %>%
filter(between(Month, 6, 9))
Model <- lm(data = DF, Level~Flow+PCP+MeanT)
Yr_2016 <- DF %>%
filter(Year == 2016) %>%
select(c(3:5))
Pred2020 <- data.frame(Date = seq(as.Date("2020-06-01"), to = as.Date("2020-9-30"), by = "days"),
Forecast = predict(Model, Yr_2016))
Obs2020 <- data.frame(Date = seq(as.Date("2020-01-01"), to = as.Date("2020-05-31"), by = "days"),
Level = runif(152, 360, 366))
ggplot(data = Obs2020, aes(x = Date, y = Level), col = "black")+
geom_line(size = 2)+
geom_line(data = Pred2020, aes(x = Date, y = Forecast), linetype = "dashed")
My goal
I want to use the fitted model to predict June - Sep for the year 2020 assuming that each of the years in DF will repeat itself (not just the year 2016), and then have a plot where the forecasted scenarios (June - Sep) for all the years are shown in different colours - something like below
new answer
The code below should do what you are looking for (if I understood it correctly). The graph, however, is still chaotic.
library(tidyverse)
library(lubridate)
set.seed(1500)
DF <- data.frame(Date = seq(as.Date("2000-01-01"), to = as.Date("2018-12-31"), by = "days"),
Level = runif(6940, 360, 366), Flow = runif(6940, 1,10),
PCP = runif(6940, 0,25), MeanT = runif(6940, 1, 30)) %>%
mutate(Year = year(Date), Month = month(Date), Day = day(Date)) %>%
filter(between(Month, 6, 9))
Model <- lm(data = DF, Level ~ Flow + PCP + MeanT)
Obs2020 <- data.frame(Date = seq(as.Date("2020-01-01"),
to = as.Date("2020-05-31"),
by = "days"),
Level = runif(152, 362.7, 363.25))
pred_data <- DF %>%
nest_by(Year) %>%
mutate(pred_df = list(tibble(Date = seq(as.Date("2020-06-01"),
to = as.Date("2020-09-30"),
by = "days"),
Forecast = predict(.env$Model, data)))) %>%
select(Year, pred_df) %>%
unnest(pred_df)
ggplot(data = Obs2020, aes(x = Date, y = Level), col = "black") +
geom_line(size = 0.1) +
geom_line(data = pred_data,
aes(x = Date, y = Forecast, group = factor(Year), color = factor(Year)),
size = 0.1)
Created on 2020-06-20 by the reprex package (v0.3.0)
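The core of the approach above is "one prediction per historical year from a single fitted model". Here is a base-R sketch of just that step, using split() and lapply() in place of nest_by(). The data is a tiny simulated stand-in (three years, four days each, made-up coefficients), not the DF from the question.

```r
set.seed(1)

# Small simulated stand-in for DF: 3 years x 4 days (assumed values)
DF_small <- data.frame(
  Year = rep(2001:2003, each = 4),
  Flow = runif(12, 1, 10),
  PCP  = runif(12, 0, 25)
)
DF_small$Level <- 360 + 0.1 * DF_small$Flow + rnorm(12, sd = 0.1)

Model_small <- lm(Level ~ Flow + PCP, data = DF_small)

# One forecast vector per historical year, all from the same model
by_year   <- split(DF_small, DF_small$Year)
forecasts <- lapply(by_year, function(d) predict(Model_small, newdata = d))

# Stack the scenarios into one long data frame, ready for
# aes(colour = Year, group = Year)
scenario_df <- data.frame(
  Year     = rep(names(forecasts), lengths(forecasts)),
  Day      = sequence(lengths(forecasts)),
  Forecast = unlist(forecasts, use.names = FALSE)
)
head(scenario_df)
```

In the real case, Day would be replaced by the 2020 dates, as in the tibble built inside mutate() above.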

smoothed grouped proportion plot

I have the following data set:
set.seed(10)
start_date <- as.Date('2000-01-01')
end_date <- as.Date('2000-01-10')
Data <- data.frame(
id = rep((1:1000),10),
group = rep(c("A","B"), 25),
x = sample(1:100),
y = sample(c("1", "0"), 10, replace = TRUE),
date = as.Date(
sample(as.numeric(start_date):
as.numeric(end_date), 1000,
replace = T), origin = '1970-01-01')) # as.numeric() counts days since 1970-01-01
With that, I create the following plot:
Data %>% mutate(treated = factor(group)) %>%
mutate(date = as.POSIXct(date)) %>% # convert Date to POSIXct
group_by(treated, date) %>% #group
summarise(prop = sum(y=="1")/n()) %>% #calculate proportion
ggplot()+ theme_classic() +
geom_line(aes(x = date, y = prop, color = treated)) +
geom_point(aes(x = date, y = prop, color = treated)) +
geom_vline(xintercept = as.POSIXct("2000-01-05 12:00 GMT"), color = 'black', lwd = 1)
Unfortunately the plot is pretty 'jumpy' and I would like to smooth it. I tried geom_smooth() but can't get it to work. Other questions regarding smoothing didn't help me because they missed the grouping aspect and therefore had a different structure. However, the example data set is in reality part of a larger data set so I need to stick to that code.
[Edit: the geom_smooth() code I tried is geom_smooth(method = 'auto', formula = y ~ x)]
Can someone point me into the right direction?
Many thanks and all the best.
Is this what you mean by a smoothed line? You call geom_smooth with its own aesthetics in place of geom_line, not in combination with it. You can choose different smoothing methods, though the default loess for small numbers of observations is usually what people want. As an aside, I don't think this is necessarily nicer to look at than the geom_line version, and in fact it is slightly less readable. geom_smooth is best used when there are many y observations for every x, which makes patterns hard to see; geom_line is good when there is one y value per x.
EDIT: After looking at what you're doing more closely, I added a second plot that doesn't directly calculate the treatment-date means and just uses geom_smooth directly. That lets you get a more reasonable confidence interval instead of having to remove it as before.
set.seed(10)
start_date <- as.Date('2000-01-01')
end_date <- as.Date('2000-01-10')
Data <- data.frame(
id = rep((1:1000),10),
group = rep(c("A","B"), 25),
x = sample(1:100),
y = sample(c("1", "0"), 10, replace = TRUE),
date = as.Date(
sample(as.numeric(start_date):
as.numeric(end_date), 1000,
replace = T), origin = '1970-01-01')) # as.numeric() counts days since 1970-01-01
library(tidyverse)
Data %>%
mutate(treated = factor(group)) %>%
mutate(date = as.POSIXct(date)) %>% #convert date to date
group_by(treated, date) %>% #group
summarise(prop = sum(y=="1")/n()) %>% #calculate proportion
ggplot() +
theme_classic() +
geom_smooth(aes(x = date, y = prop, color = treated), se = F) +
geom_point(aes(x = date, y = prop, color = treated)) +
geom_vline(xintercept = as.POSIXct("2000-01-05 12:00 GMT"), color = 'black', lwd = 1)
#> `geom_smooth()` using method = 'loess' and formula 'y ~ x'
Data %>%
mutate(treated = factor(group)) %>%
mutate(y = ifelse(y == "0", 0, 1)) %>%
mutate(date = as.POSIXct(date)) %>% #convert date to date
ggplot() +
theme_classic() +
geom_smooth(aes(x = date, y = y, color = treated), method = "loess") +
geom_vline(xintercept = as.POSIXct("2000-01-05 12:00 GMT"), color = 'black', lwd = 1)
Created on 2018-03-27 by the reprex package (v0.2.0).
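The reason the second plot can skip the summarise step is that the mean of a 0/1 variable is the proportion of ones, so smoothing y directly targets the same quantity. A tiny base-R sketch with made-up rows (the column name hit is only for illustration):

```r
# Hypothetical rows in the same shape as Data (treated, date, y)
d <- data.frame(
  treated = c("A", "A", "B", "B", "A", "B"),
  date    = as.Date(c("2000-01-01", "2000-01-01", "2000-01-01",
                      "2000-01-01", "2000-01-02", "2000-01-02")),
  y       = c("1", "0", "1", "1", "0", "1")
)

# Recode to 0/1, as in mutate(y = ifelse(y == "0", 0, 1))
d$hit <- ifelse(d$y == "0", 0, 1)

# The group-wise mean of the 0/1 column equals sum(y == "1") / n()
prop <- aggregate(hit ~ treated + date, data = d, FUN = mean)
prop
```

With the raw 0/1 values, geom_smooth also sees how many observations sit behind each date, which is why its confidence band is more meaningful than one fitted to ten pre-computed means.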
