I'm currently trying to make my own graphical timeline like the one at the bottom of this page. I scraped the table from that link using the rvest package and cleaned it up.
Here is my code:
library(tidyverse)
library(rvest)
library(ggthemes)
library(lubridate)
URL <- "https://en.wikipedia.org/wiki/List_of_Justices_of_the_Supreme_Court_of_the_United_States"
justices <- URL %>%
read_html %>%
html_node("table.wikitable") %>%
html_table(fill = TRUE) %>%
data.frame()
# Removes weird row at bottom of the table
n <- nrow(justices)
justices <- justices[1:(n - 1), ]
# Separating the information I want
justices <- justices %>%
separate(Justice.2, into = c("name","year"), sep = "\\(") %>%
separate(Tenure, into = c("start", "end"), sep = "\n–") %>%
separate(end, into = c("end", "reason"), sep = "\\(") %>%
select(name, start, end)
# Removes wikipedia tags in start column
justices$start <- gsub('\\[[ejm]\\]', '', justices$start)
justices$start <- mdy(justices$start)
# This will replace incumbencies with NA
justices$end <- mdy(justices$end)
# Incumbent judges are still around!
justices[is.na(justices)] <- today()
# start and end are already Date objects after mdy(), so these two conversions change nothing
justices$start = as.Date(justices$start, format = "%m/%d/%Y")
justices$end = as.Date(justices$end, format = "%m/%d/%Y")
justices %>%
ggplot(aes(reorder(x = name, X = start))) +
geom_segment(aes(xend = name,
yend = start,
y = end)) +
coord_flip() +
scale_y_date(date_breaks = "20 years", date_labels = "%Y") +
theme(axis.title = element_blank()) +
theme_fivethirtyeight() +
NULL
This is the output from ggplot (I'm not worried about aesthetics yet; I know it looks terrible!):
The goal for this plot is to order the judges chronologically by start date, so the judge with the earliest start date should be at the bottom and the judge with the most recent should be at the top. As you can see, there are multiple instances where this rule is broken.
Instead of sorting chronologically, it simply lists the judges in the order they appear in the data frame, which is also the order Wikipedia has them in.
Therefore, a line segment above another segment should always start further right than the one below it.
My understanding of reorder is that it will take the X = start from geom_segment and sort that and list the names in that order.
The only help I could find for this problem is to convert the dates to a factor and order them that way, but then I get this error:
Error: Invalid input: date_trans works with objects of class Date only.
Thank you for your help!
You can make the name column a factor and use forcats::fct_reorder to reorder the names based on start date. fct_reorder takes a summary function that is applied to start within each name; you can use min() to order by the earliest start date for each justice. That way, judges with multiple start dates will be sorted according to the earliest one. It's only a two-line change: add a mutate at the beginning of the pipe, and remove the reorder inside aes.
justices %>%
mutate(name = as.factor(name) %>% fct_reorder(start, min)) %>%
ggplot(aes(x = name)) +
geom_segment(aes(xend = name,
yend = start,
y = end)) +
coord_flip() +
scale_y_date(date_breaks = "20 years", date_labels = "%Y") +
theme(axis.title = element_blank()) +
theme_fivethirtyeight()
Created on 2018-06-29 by the reprex package (v0.2.0).
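To see what the reordering does without scraping anything, here is a minimal sketch using base R's reorder(), which plays the same role here as fct_reorder(). The toy data frame and the names in it are made up for illustration. "Amy" has two start dates, and the factor levels should follow each person's earliest one.

```r
# Hypothetical toy data: "Amy" served twice, so she has two start dates
toy <- data.frame(
  name  = c("Amy", "Bob", "Zed", "Amy"),
  start = as.Date(c("1800-03-01", "1810-06-15", "1790-02-02", "1820-09-30"))
)

# Order the levels of name by the earliest start date within each name.
# as.numeric() turns the dates into day counts so min() compares plain numbers.
toy$name <- reorder(factor(toy$name), as.numeric(toy$start), FUN = min)

levels(toy$name)  # "Zed" "Amy" "Bob"
```

ggplot2 draws a discrete axis in level order, so after coord_flip() the first level ends up at the bottom of the plot.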
I would make this a comment, but I couldn't fit it.
This was an attempt I gave up on. It looks like it actually does fix the problem, but it broke several other aspects of the formatting and I've run out of time to fix it back.
justices <- justices[order(justices$start, decreasing = TRUE),]
any(diff(justices$start) > 0) # FALSE, i.e. it works
justices$id <- nrow(justices):1
ggplot(data=justices, mapping=aes(x = start, y = id)) +
scale_x_date(date_breaks = "20 years", date_labels = "%Y") +
# id is numeric, so it is labelled with a continuous scale rather than a discrete one
scale_y_continuous(breaks = justices$id, labels = justices$name) +
geom_segment(aes(xend = end, yend = id), size = 5) +
# apply the complete theme first so the axis.title tweak is not overwritten
theme_fivethirtyeight() +
theme(axis.title = element_blank())
Please also refer to this thread. GL!
I have a data frame (a tibble) like this:
library(tidyverse)
library(lubridate)
x = tibble(date=c("2022-04-25 07:04:07", "2022-04-25 07:09:07", "2022-04-25 07:14:07", "2022-04-26 07:04:07"),
value=c("on", "off", "on", "off"))
x$day<- as.factor(day(x$date))
x$time <- paste0(str_pad(hour(x$date),2,pad="0"),":",str_pad(minute(x$date),2,pad="0"))
When I plot the data:
x %>% ggplot() + geom_col(aes(x=day,y=time, fill=value))
the times on the y axis do not line up with the bars. Each time label is supposed to sit beside its bar segment.
I tried using as.factor(time), but that didn't solve it.
I also tried to add a numeric scale:
x = tibble(date=c("2022-04-25 07:04:07", "2022-04-25 07:09:07", "2022-04-25 07:14:07", "2022-04-26 07:04:07"),
fake_y=c(1,1,1,1),
value=c("on", "off", "on", "off"))
x$day <- as.factor(day(x$date))
x %>% ggplot() + geom_col(aes(x=day,y=fake_y, fill=value))
but then the order of the on/off bars is lost.
How can I fix this?
Since you are looking for a time line, you would probably be best with geom_segment rather than geom_col. The reason is that since you might have multiple 'on' or 'off' values in a single day, it would be difficult to get these to stack correctly. You would also need to diff the on-off times to get them to stack. Furthermore, your labels would be wrong using columns if "off" represents the time of going from an on state to an off state.
When working with times in R, it is usually best to keep them in a time class for plotting. If you convert times to character strings before plotting, they are treated as discrete categories and will not be spaced in proportion to the time between them.
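A small base R sketch of that difference, using made-up timestamps with deliberately unequal gaps and assuming everything is in UTC:

```r
# Three timestamps with unequal gaps: 5 minutes, then 35 minutes
stamps <- as.POSIXct(
  c("2022-04-25 07:04:07", "2022-04-25 07:09:07", "2022-04-25 07:44:07"),
  tz = "UTC"
)

# As character labels the times become discrete categories, one slot each
labels <- format(stamps, "%H:%M")
slots <- as.integer(factor(labels))   # 1 2 3: evenly spaced regardless of the gaps

# As seconds since midnight the real gaps are preserved
secs <- as.numeric(stamps) %% 86400   # valid as written because tz is UTC
diff(secs)                            # 300 2100
```

Seconds since midnight is also how hms values are stored, which is why an axis limit such as ylim = c(25120, 26500) on an hms axis covers roughly 06:59 to 07:22.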
Since you want to have the day along one axis, you will need quite a bit of data manipulation to ensure that you record the state at the start of each day and the end of each day, but it can be achieved by doing:
p <- x %>%
mutate(date = as.POSIXct(date)) %>%
mutate(day = as.factor(day(date))) %>%
group_by(day) %>%
group_modify(~ add_row(.x,
date = floor_date(as.POSIXct(first(.x$date)), 'day'),
value = ifelse(first(.x$value) == 'on', 'off', 'on'),
.before = 1)) %>%
group_modify(~ add_row(.x,
date = ceiling_date(as.POSIXct(last(.x$date)), 'day') - 1,
value = last(.x$value))) %>%
mutate(ends = lead(date)) %>%
filter(!is.na(ends)) %>%
mutate(date = hms::as_hms(date), ends = hms::as_hms(ends)) %>%
ggplot(aes(x = day, y = date)) +
geom_segment(aes(xend = day, yend = ends, color = value),
size = 20) +
coord_cartesian(ylim = c(25120, 26500)) +
labs(y = 'time') +
guides(color = guide_legend(override.aes = list(size = 8)))
p
And of course, you can easily flip the co-ordinates if you wish, and apply theme elements to make the plot more appealing:
p + coord_flip(ylim = c(25120, 26500)) +
scale_color_manual(values = c('deepskyblue4', 'orange')) +
theme_light(base_size = 16)
I'm relatively new to R and could really use some help with some pretty basic ggplot2 work.
I'm trying to visualize total number of submissions on a graph, showing the overall total in a line graph and the daily total in a histogram (or bar graph) on top of it. I'm not sure how to add breaks or bins to the histogram so that it takes the submission datetime column and makes each bar the daily total.
I tried adding a column that converts the datetime into just date and plots based on that, but I'd really like the line graph to include the time.
Here's what I have so far:
df <- df %>%
mutate(datetime = lubridate::mdy_hm(datetime))%>%
mutate(date = lubridate::as_date(datetime))
#sort by datetime
df <- df %>%
arrange(datetime)
#add total number of submissions
df <- df %>%
mutate(total = row_number())
#ggplot
line_plus_histo <- df%>%
ggplot() +
geom_histogram(data = df, aes(x=datetime)) +
geom_line(data = df, aes(x=datetime, y=total), col = "red") +
stat_bin(data = df, aes(x=date), geom = "bar") +
labs(
title="Submissions by Day",
x="Date",
y="Submissions",
legend=NULL)
line_plus_histo
As you can see, I'm also calculating the total number of submissions by sorting by time and then adding a column with the row number. So if you can help me use a better method I'd really appreciate it.
Please, find below the line plus histogram of time v. submissions:
Here's the pastebin link with my data
You can extend your data manipulation by:
library(dplyr)
library(lubridate)
df <- df |>
mutate(datetime = lubridate::mdy_hm(datetime)) |>
arrange(datetime) |>
mutate(midday = as_datetime(floor_date(as_date(datetime), unit = "day") + 0.5)) |>
mutate(totals = row_number()) |>
group_by(midday) |>
mutate(N = n())|>
ungroup()
Then use midday for the bars and datetime for the line:
df%>%
ggplot() +
geom_bar(data = df, aes(x = midday)) +
geom_line(data = df, aes(x=datetime, y=totals), col = "red") +
labs(
title="Submissions by Day",
x="Date",
y="Submissions",
legend=NULL)
PS. Sorry for Polish locales on X axis.
PS2. With geom_bar it looks much better
Created on 2022-02-03 by the reprex package (v2.0.1)
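To check the midday idea in isolation, here is a small base R sketch with made-up timestamps (not the pastebin data). It assumes everything is in UTC. Each submission is assigned to noon of its own day so that a bar can be centred on the day, while the running total keeps the exact time.

```r
# Hypothetical submission times, assumed to be in UTC
stamps <- sort(as.POSIXct(
  c("2022-02-01 09:15:00", "2022-02-01 17:40:00", "2022-02-02 08:05:00"),
  tz = "UTC"
))

# Running total of submissions, the same idea as row_number() on sorted data
totals <- seq_along(stamps)

# Noon of each submission's day: whole days since the epoch, plus 12 hours
day_index <- floor(as.numeric(stamps) / 86400)
midday <- as.POSIXct(day_index * 86400 + 43200, origin = "1970-01-01", tz = "UTC")

# Submissions per day, keyed by the midday timestamp
daily <- table(format(midday, "%Y-%m-%d %H:%M"))
daily
```

Here the first day gets a count of 2 and the second a count of 1, while totals runs 1, 2, 3 along the exact timestamps.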
I have a dataset that has individual records with ZIP Codes and Installation Dates. So far, I was able to plot records in a ZIP Code by:
1. Subset a ZIP Code
2. Sort the records by Date
3. Create a new column and assign increasing values (by 1) for each successive row
4. Plot this last field by Date
The result looks like this:
Now, what I want to do is have multiple ZIP Code geom_lines in the same figure. Each ZIP Code area has a different first record date, and I would like all of them to start at the same point on the X-axis.
Here's a failed attempt. I want these lines to start at the same point on the X-axis:
I am looking for ideas on how to proceed.
Thanks!
Let's try to roughly emulate your data structure since your question does not include any data:
library(ggplot2)
set.seed(69)
df <- data.frame(
ZIP = c(rep("A", 1000), rep("B", 687)),
Count = c(cumsum(round(runif(1000, 0, 0:999))),
cumsum(round(runif(687, 0, 0:686) * 4))),
Date = c(seq(as.POSIXct("2007-09-01"), by = "1 week", length.out = 1000),
seq(as.POSIXct("2013-08-31"), by = "1 week", length.out = 687)))
ggplot(df, aes(Date, Count, colour = ZIP)) +
geom_line() +
scale_colour_manual(values = c("blue", "red"))
Now clearly, if we want these lines to start at the same position on the x axis, the x axis can no longer reflect the absolute date, but rather the time since the first record. So we need to calculate what this would be for each group. The dplyr package can help us here:
library(dplyr)
df %>%
group_by(ZIP) %>%
mutate(Day = as.numeric(difftime(Date, min(Date), units = "days"))) %>%
ggplot(aes(Day, Count, colour = ZIP)) +
geom_line() +
labs(x = "Day since first record") +
scale_colour_manual(values = c("blue", "red"))
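If the real data has one row per installation rather than a ready-made count, the same alignment can be sketched in base R. The column names ZIP and Date are assumptions carried over from the simulated data above, and the five rows are made up. ave() is used to compute both a running count and a first date within each ZIP.

```r
# Hypothetical records: one row per installation
records <- data.frame(
  ZIP  = c("A", "B", "A", "B", "A"),
  Date = as.Date(c("2007-09-08", "2013-08-31", "2007-09-01",
                   "2013-09-14", "2007-09-15"))
)

# Sort by date so that the running count increases over time within each ZIP
records <- records[order(records$Date), ]

# Running count per ZIP: 1, 2, 3, ... in date order
records$Count <- ave(seq_len(nrow(records)), records$ZIP, FUN = seq_along)

# Days elapsed since each ZIP's own first record
first_day <- ave(as.numeric(records$Date), records$ZIP, FUN = min)
records$Day <- as.numeric(records$Date) - first_day

records[, c("ZIP", "Day", "Count")]
```

Every ZIP now starts at Day 0, so plotting Count against Day with colour = ZIP puts all lines at the same starting point.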
I have created a bar plot with each variable having up to four data points. I have managed to plot it successfully. The only issue I'm currently experiencing is the key is not in the order I would like it to be. I would ideally want the key ranging from best to worst or in this case 'Excellent' to 'Not so good'.
What part of code would I need to change for the order to go from best to worst?
df <- read.csv("//ecfle35/STAFF-HOME$/MaxEmery/open event feedback/October/Q3.csv")
df %>%
#First the dataset needs to be long not wide
gather(review,
count,
Excellent:Not.so.good,
factor_key = T) %>%
#Let's get rid of N/A
filter(count != 'N/A') %>%
#convert count from string to number
#Remove the annoying full stop in the middle of text
mutate(count = as.integer(count),
review = gsub('\\.', ' ', review)) %>%
ggplot(aes(
x = Faculty,
y = count,
fill = review
)) +
geom_bar(position = 'dodge',
stat = 'identity') +
scale_y_continuous(breaks = seq(0, 22, by = 2)) +
labs(title = 'Teaching Staff Ratings',
x = 'Faculty',
y = 'Count') +
theme(axis.text.x = element_text(angle = 90))
Below is an image of how it currently looks -
Graphic of my plot
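When a column is converted to a factor without specifying the levels (which is effectively what ggplot does with the character column that gsub() returns), the levels are ordered alphabetically. Add a mutate() to the pipe that sets the levels of the review column explicitly before sending it to ggplot().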
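A minimal sketch of the releveling in base R. Only "Excellent" and "Not so good" are named in the question; "Good" and "Average" are assumed stand-ins for the two middle ratings and should be replaced with the real column labels.

```r
# Hypothetical ratings; "Good" and "Average" are assumed names for the middle levels
review <- c("Good", "Not so good", "Excellent", "Average", "Excellent")

# Without explicit levels the order is alphabetical
levels(factor(review))

# With explicit levels the order runs from best to worst
rating_order <- c("Excellent", "Good", "Average", "Not so good")
review <- factor(review, levels = rating_order)
levels(review)
```

Inside the pipe this would be a step such as mutate(review = factor(review, levels = rating_order)) placed after the gsub() call, because gsub() returns a character vector and drops any earlier factor ordering.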
I am not very good at R and need some help.
My ggplot has a lot of dates (on the x-axis), so you can't actually read them, and I want to change the axis to months to give a better overview of the plot.
For example to something like this in the link:
Display the x-axis on ggplot as month only in R
This is the script I'm using:
r <- read.csv("xxdive.csv", header = T, sep = ";")
names(r) <- c("Date", "Number")
r <- data.frame(r)
r$Date <- factor(r$Date, ordered = T)
r[1:2, ]
Date Number
16.02.2015 97
17.02.2015 47
library(tidyverse)
ggplot(r, aes(Date, Number)) +
theme_light() +
ggtitle("16.02.15-10.02.16") +
ylab("Dives") +
geom_line(aes(group = 1), color = "blue")
This shows what kind of data I have.
I have tried using scale etc., but I can't make it work.
I hope this was understandable, and that someone can help me!! :)
I would convert the Date column to class Date
r$Date <- as.Date(r$Date, "%d.%m.%Y")
instead of converting it to a factor:
r$Date <- factor(r$Date, ordered = T)
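A short sketch of why the class matters, using the two sample dates from the question plus one later date that is made up for illustration:

```r
# The two sample dates from the question plus one later, made-up date
raw <- c("16.02.2015", "17.02.2015", "01.03.2015")

# As a factor, the levels sort as text, so 1 March lands before 16 February
levels(factor(raw))

# As Date objects, they sort chronologically
dates <- as.Date(raw, format = "%d.%m.%Y")
format(sort(dates), "%Y-%m-%d")
```

With a real Date column on the x-axis, scale_x_date() can then place the breaks by month.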
It's a little tricky without a working example, but try this.
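# install.packages("tidyverse")  # run once if the package is not installed
library(tidyverse)
# the dates look like 16.02.2015, so the date format has to be given explicitly
r <- read_delim("xxdive.csv", ";", col_types = list(col_date(format = "%d.%m.%Y"), col_integer()))
names(r) <- c("Date", "Number")
ggplot(r, aes(Date, Number)) +
geom_line(aes(group = 1), color = "blue") +
# one break per month, labelled with the abbreviated month name
scale_x_date(date_breaks = "1 month", date_labels = "%b") +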
ylab("Dives") +
ggtitle("16.02.15-10.02.16") +
theme_light()