I am trying to produce a ggplot graph where I can compare two time periods without indexing the data, so that one time window runs along the x-axis at the bottom and the other along the x-axis at the top, much like the example from the tidyverse manual page (see the graph at the bottom of the linked page).
However, I would also like a dual axis on the y-axis, something like this (I made this by copy-pasting, using the code below):
economics_long %>% filter(variable== "pce" & date > "2008-01-01" & date < "2010-01-01") %>%
ggplot(aes(date, value01, colour = variable)) + geom_line()
economics_long %>% filter(variable== "pce" & date > "1990-01-01" & date < "1992-01-01") %>%
ggplot(aes(date, value01, colour = variable)) + geom_line()
I imagine I would need to use bind_rows() to cut the two time periods out and stack them, and maybe create a new variable, like variable, with two levels such as time-window 1 and time-window 2. However, I wanted to ask here before I start manually building something crazy. Maybe others have done something similar?
I have made some first steps, like,
tw01 <- economics_long %>%
filter(variable== "pce" & date > "2008-01-01" & date < "2010-01-01")
tw02 <- economics_long %>%
filter(variable== "pce" & date > "1990-01-01" & date < "1992-01-01")
tw02$date <- tw01$date
tw <- bind_rows(tw01, tw02, .id = "time_window")
tw %>% ggplot(aes(date, value01, colour = time_window)) + geom_line()
Maybe this is what you are looking for:
For the date transformation I make use of lubridate::years. Additionally, for the transformation inside sec_axis I had to wrap it in hms::hms, as I otherwise got an error.
As I personally find secondary axes always a bit confusing, especially with both a secondary x and y axis, I colored the x and y axis labels to match the colors of the lines. If you don't like that, you can simply drop the theme() adjustments.
library(ggplot2)
library(dplyr)
library(lubridate) # interval(), ymd(), time_length(), months()
tw1_START <- "2008-01-01"; tw1_END <- "2010-01-01"
tw2_START <- "1990-01-01"; tw2_END <- "1992-01-01"
s_factor <- .52
Intv <- interval(ymd(tw2_START), ymd(tw1_START))
IntvM <- time_length(Intv, "month") # time_length(YrDis , "year")
tw01 <- economics_long %>%
filter(variable== "pce" & date > tw1_START & date < tw1_END )
tw02 <- economics_long %>%
filter(variable== "pce" & date > tw2_START & date < tw2_END) %>%
mutate(date = date + hms::hms(months(IntvM))) %>%
mutate(value01 = value01 + s_factor)
tw <- bind_rows(tw01, tw02, .id = "time_window")
tw %>%
ggplot(aes(date, value01, colour = time_window)) +
geom_line() +
scale_x_date(sec.axis = sec_axis(~ . -hms::hms(months(IntvM)))) +
scale_y_continuous(sec.axis = sec_axis(~ . - s_factor), position = "right") +
theme(axis.text.x.top = element_text(color = scales::hue_pal()(2)[2]),
axis.text.x.bottom = element_text(color = scales::hue_pal()(2)[1]),
axis.text.y.right = element_text(color = scales::hue_pal()(2)[1]),
axis.text.y.left = element_text(color = scales::hue_pal()(2)[2]))
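The idea behind the two secondary axes can be checked without plotting anything: the older window is shifted onto the newer one, and each secondary axis simply undoes that shift. Here is a minimal base-R sketch, assuming a constant offset in days rather than the month-based period used above (the names offset_days, shifted and recovered are made up for illustration):

```r
# Start dates of the two windows, taken from the question
old_start <- as.Date("1990-01-01")
new_start <- as.Date("2008-01-01")

# Assumption: a fixed number of days is an acceptable shift
offset_days <- as.numeric(new_start - old_start)

# A few monthly dates from the older window
old_dates <- seq(old_start, by = "month", length.out = 3)

# Shift them onto the newer window (what the data transformation does)
shifted <- old_dates + offset_days

# Undo the shift (what the secondary-axis formula has to do)
recovered <- shifted - offset_days

print(shifted)
print(recovered)
```

The same reasoning applies to the y-axis: whatever constant is added to value01 for the second window has to be subtracted again in the secondary-axis formula.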
I have a data frame (a tibble) like this:
library(tidyverse)
library(lubridate)
x = tibble(date=c("2022-04-25 07:04:07", "2022-04-25 07:09:07", "2022-04-25 07:14:07", "2022-04-26 07:04:07"),
value=c("on", "off", "on", "off"))
x$day<- as.factor(day(x$date))
x$time <- paste0(str_pad(hour(x$date),2,pad="0"),":",str_pad(minute(x$date),2,pad="0"))
When I plot the data:
x %>% ggplot() + geom_col(aes(x=day,y=time, fill=value))
the times in the y axis do not follow the bars. Each time data is supposed to be side by side with each bar segment.
I tried using as.factor(time) but that didn't solve it.
I also tried to add a numeric scale:
x = tibble(date=c("2022-04-25 07:04:07", "2022-04-25 07:09:07", "2022-04-25 07:14:07", "2022-04-26 07:04:07"),
fake_y=c(1,1,1,1),
value=c("on", "off", "on", "off"))
x %>% ggplot() + geom_col(aes(x=day,y=fake_y, fill=value))
but then the order of the on/off bars is lost.
How can I fix this?
Since you are looking for a time line, you would probably be best with geom_segment rather than geom_col. The reason is that since you might have multiple 'on' or 'off' values in a single day, it would be difficult to get these to stack correctly. You would also need to diff the on-off times to get them to stack. Furthermore, your labels would be wrong using columns if "off" represents the time of going from an on state to an off state.
When working with times in R, it is often best to keep them in a time format for plotting. If you convert times to character strings before plotting, they will be interpreted as factor levels, and therefore will not be spaced in proportion to the time that actually elapsed between them.
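A small base-R sketch of that difference, using made-up clock times (the vector names are only for illustration): as factor levels the times sit one step apart, whereas as seconds since midnight the gaps reflect the real intervals.

```r
# Hypothetical clock times; the last one is hours after the others
times_chr <- c("07:04", "07:09", "07:14", "12:00")

# As a factor, each label becomes the next integer position
positions <- as.integer(factor(times_chr))
print(diff(positions))

# As seconds since midnight, the gaps are proportional to elapsed time
secs <- as.numeric(as.difftime(times_chr, format = "%H:%M", units = "secs"))
print(diff(secs))
```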
Since you want to have the day along one axis, you will need quite a bit of data manipulation to ensure that you record the state at the start of each day and the end of each day, but it can be achieved by doing:
p <- x %>%
mutate(date = as.POSIXct(date)) %>%
mutate(day = as.factor(day(date))) %>%
group_by(day) %>%
group_modify(~ add_row(.x,
date = floor_date(as.POSIXct(first(.x$date)), 'day'),
value = ifelse(first(.x$value) == 'on', 'off', 'on'),
.before = 1)) %>%
group_modify(~ add_row(.x,
date = ceiling_date(as.POSIXct(last(.x$date)), 'day') - 1,
value = last(.x$value))) %>%
mutate(ends = lead(date)) %>%
filter(!is.na(ends)) %>%
mutate(date = hms::as_hms(date), ends = hms::as_hms(ends)) %>%
ggplot(aes(x = day, y = date)) +
geom_segment(aes(xend = day, yend = ends, color = value),
size = 20) +
coord_cartesian(ylim = c(25120, 26500)) +
labs(y = 'time') +
guides(color = guide_legend(override.aes = list(size = 8)))
p
And of course, you can easily flip the co-ordinates if you wish, and apply theme elements to make the plot more appealing:
p + coord_flip(ylim = c(25120, 26500)) +
scale_color_manual(values = c('deepskyblue4', 'orange')) +
theme_light(base_size = 16)
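The core of the pipeline above is the step that turns a list of state changes into segments: each row gets the next row's time as its end, and the last row, which has no end, is dropped. Here is a stripped-down sketch of just that step, with invented data and column names:

```r
library(dplyr)

# Invented state changes: seconds since midnight and the state entered
events <- tibble(
  time  = c(0, 300, 600, 900),
  value = c("off", "on", "off", "on")
)

# Each state lasts until the next change, so lead() supplies the end time
segments <- events %>%
  mutate(ends = lead(time)) %>%
  filter(!is.na(ends))

print(segments)
```

Each remaining row then describes one segment from time to ends, which is the shape geom_segment() expects.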
I am trying to do a faceted plot of a grouped dataframe with ggplot2, using geom_line(). My dataframe has a Date column and I would like to have dates on the horizontal axis. If I just use Date in aes(x=Date, ...) I get nice labels on the horizontal axis. However, the line has an almost horizontal section where the date jumps from the end of one group to the beginning of the next group. This code and chart show that:
dts <- seq.Date(as.Date("2020-01-01"), as.Date("2021-12-31"), by="day")
mos <- sapply(dts, month)
df <- data.frame(Date=dts, Month=mos)
nr <- nrow(df)
df$X <- rep(1, nr)
df %>%
group_by(Month) -> dfgrp
dfgrp %>%
group_by(Month) %>%
mutate(Time = Date[1:n()],
Z = cumsum(X)) %>%
ggplot(aes(x=Date, y=Z)) +
geom_line(color="darkgreen", size=0.5) +
facet_grid(. ~ Month, scale="free_x") +
theme(axis.text.x = element_text(angle=45, size=7))
I would not like my chart to have those almost-horizontal lines when the date changes by a large amount. I was able to generate a chart without those lines using integers on aes() as follows:
dfgrp %>%
mutate(Time = 1:n() %>% as.integer(),
Z = cumsum(X)) %>%
ggplot(aes(x=Time, y=Z)) +
geom_line(color="darkgreen", size=0.5) +
facet_grid(. ~ Month, scale="free_x") +
scale_x_continuous(breaks = seq(from=1, to=nr, by=10) %>% as.integer(),
labels = function(x) as.character(dfgrp$Date[x])) +
theme(axis.text.x = element_text(angle=45, size=7))
The line on the chart looks like I want it but the dates on the horizontal axis are not correct: they end in February 2020 in every facet while the dates in the dataframe end in December 2021 and the dates in the first chart begin and end on different months in different facets.
I tried many things but nothing worked. Any suggestions on how to have a chart with dates like in the first chart above and lines like in the second chart above?
Help will be much appreciated.
You may want to adjust the dates to be in the same year, but noting the original year as a variable:
library(lubridate)
dfgrp %>%
group_by(Month) %>%
mutate(year = year(Date),
adj_date = ymd(paste(2020, month(Date), day(Date)))) %>%
# 2020 was leap year so 2/29 won't be lost
mutate(Time = Date[1:n()],
Z = cumsum(X)) %>%
ggplot(aes(x=adj_date, y=Z, color = year, group = year)) +
geom_line(size=0.5) +
facet_grid(. ~ Month, scale="free_x") +
theme(axis.text.x = element_text(angle=45, size=7))
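To see what the adjusted date looks like in isolation, here is a small sketch with three made-up dates. It uses lubridate::make_date() instead of the ymd(paste(...)) construction above, which is a substitution on my part and not part of the original answer:

```r
library(lubridate)

# Made-up dates spanning two years, including a leap day
dates <- as.Date(c("2020-02-29", "2021-03-15", "2021-12-31"))

# Keep the original year as its own variable
orig_year <- year(dates)

# Rebuild every date inside the same (leap) year
adj_date <- make_date(2020, month(dates), day(dates))

print(data.frame(dates, orig_year, adj_date))
```

Because all adjusted dates fall in one year, lines for different years overlap on the same x-range instead of being joined across the gap.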
I'm new to using R so please bear with me as my code might not look the best. So I want to combine these two line graphs together since right now I have written code for each item that I am analyzing. This is the dataset I am using: https://github.com/rfordatascience/tidytuesday/blob/master/data/2020/2020-09-01/readme.md I used the "Arable_Land" dataset!
##USA Arable Land
plot_arable_land_USA <- arable_land %>%
filter(Code == "USA") %>%
select(c(Year, Code, `Arable land needed to produce a fixed quantity of crops ((1.0 = 1961))`)) %>%
pivot_longer(-c(Year, Code)) %>%
ggplot(aes(x = Year, y = value,color=name,group=name)) +
geom_line() +
facet_wrap(.~name,scales = 'free_y') +
theme_light() +
theme(legend.position = 'none')
ggplotly(plot_arable_land_USA)
##Canada Arable Land
plot_arable_land_CAN <- arable_land %>%
filter(Code == "CAN") %>%
select(c(Year, Code, `Arable land needed to produce a fixed quantity of crops ((1.0 = 1961))`)) %>%
pivot_longer(-c(Year, Code)) %>%
ggplot(aes(x = Year, y = value,color=name,group=name)) +
geom_line() +
facet_wrap(.~name,scales = 'free_y') +
theme_light() +
theme(legend.position = 'none')
ggplotly(plot_arable_land_CAN)
Ideally, I would like one graph showing both: one line (in purple) for the USA and another line (in brown) for Canada.
Thank you!
Try this. It is good practice to reshape the data to long format, as you did. In your case you can add filter() to choose the desired countries, then reshape to long and build the plot. The key is mapping both color and group to Code in order to obtain the desired lines. You can set the colors using scale_color_manual(), and I have kept the facet to provide the title. Here is the code:
library(plotly)
library(tidyverse)
#Code
plot_arable_land_CAN <- arable_land %>% select(-Entity) %>%
filter(Code %in% c('USA','CAN')) %>%
pivot_longer(-c(Code,Year)) %>%
ggplot(aes(x = Year, y = value,color=Code,group=Code)) +
geom_line() +
facet_wrap(.~name,scales = 'free_y') +
theme_light() +
theme(legend.position = 'none')+
scale_color_manual(values = c('brown','purple'))
#Transform
ggplotly(plot_arable_land_CAN)
Output:
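If you want the colour assignment to be explicit rather than dependent on the alphabetical order of the codes, scale_color_manual() also accepts a named vector. Here is a minimal sketch on invented numbers (the toy data frame stands in for the real dataset and its values mean nothing):

```r
library(ggplot2)

# Invented values standing in for the arable land index
toy <- data.frame(
  Year  = rep(c(1961, 1990, 2014), times = 2),
  Code  = rep(c("CAN", "USA"), each = 3),
  value = c(1.0, 0.8, 0.6, 1.0, 0.7, 0.5)
)

# Names in `values` tie each colour to a specific country code
p <- ggplot(toy, aes(x = Year, y = value, color = Code, group = Code)) +
  geom_line() +
  scale_color_manual(values = c(CAN = "brown", USA = "purple")) +
  theme_light()
```

Printing p (or passing it to ggplotly()) draws both countries in one panel.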
Hello, I need the x-axis of my ggplot to show dates in this format:
But my date column also includes the time.
sentiment_bing1 <- tidy_trump_tweets %>%
inner_join(get_sentiments("bing")) %>%
count(word, created_at, sentiment) %>%
ungroup()
p <- sentiment_bing1 %>% filter(sentiment == "positive") %>% ggplot(aes(x=created_at, y = n)) +
geom_line(stat="identity", position = "identity", color = "Blue") + scale_x_date(date_breaks ='3 months', date_labels = '%b-%Y') + stat_smooth() + theme_gdocs() +
xlab("Date") + ylab("Normalized Frequency of Positive Words in Trump's Tweets")
1 abound 11/30/17 13:05 positive 0.0
2 abuse 1/11/18 12:33 negative 0.0
3 abuse 10/27/17 1:18 negative 0.0
4 abuse 2/18/18 17:10 negative 0.0
This is what I have done to get the result. Now how do I make it look like the picture? Converting to a date doesn't help, as there are cases where tweets were posted on the same day but at different times, which then messes up the graph.
Welcome to SO!
It's hard to answer your question without seeing the data you are using and the error that your code is generating. Next time try and create a reproducible question. This will make it easier for someone to identify where your problem lies.
Based on the code and data you've provided I've created a sample data set with a (broadly) similar structure to that from the chart...
library(lubridate)
library(dplyr) # data frame verbs and the pipe used below
library(ggplot2)
library(ggthemes)
set.seed(100)
start_date <- mdy_hm("03-01-2017-12:00")
end_date <- mdy_hm("03-01-2018-12:00")
number_hours <- interval(start_date, end_date)/hours(1)
created_at <- start_date + hours(6:number_hours)
length(created_at)
word <- sample(c("abound", "abuse"), size = length(created_at), replace = TRUE,
prob=c(0.25, 0.75))
Your plotting code looks good. I could be wrong here, but from what I can tell your problem could lie in the way you are summarising the frequencies. In the code below, I've used the lubridate package to group your data by day, allowing for a daily frequency count.
test_plot <- tibble(created_at, word) %>%
mutate(sentiment =
case_when(
word == "abound" ~ "positive",
word == "abuse" ~ "negative")) %>%
filter(sentiment == "positive") %>%
mutate(created_at = date(round_date(ymd_hms(created_at), unit = "day"))) %>%
group_by(created_at) %>%
tally() %>%
ggplot() +
aes(x = created_at, y = n) +
geom_line(stat="identity", position = "identity", color = "Blue") +
geom_smooth() +
scale_x_date(date_breaks ='3 months', date_labels = '%b-%Y') +
theme_gdocs() +
xlab("Date") +
ylab("Frequency of Positive Words in Trump's Tweets")
Which gives you this...
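One detail worth checking when you collapse timestamps to days: round_date() moves anything after noon to the following day, whereas floor_date() keeps the calendar day. Which of the two you want is a judgment call; here is a minimal sketch with three made-up timestamps, assuming calendar-day counts are the goal:

```r
library(dplyr)
library(lubridate)

# Made-up timestamps: two on 30 Nov (one after noon), one on 1 Dec
created_at <- ymd_hm(c("2017-11-30 01:18", "2017-11-30 13:05", "2017-12-01 17:10"))

# Truncate each timestamp to its calendar day, then count rows per day
daily <- tibble(created_at) %>%
  mutate(day = as_date(floor_date(created_at, unit = "day"))) %>%
  count(day)

print(daily)
```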
sentiment_bing1 <- tidy_trump_tweets %>%
inner_join(get_sentiments("bing")) %>%
count(created_at, sentiment) %>%
spread(sentiment, n, fill=0) %>%
mutate(N = (negative - min(negative)) / (max(negative) - min(negative))) %>% # min-max scale the negative counts
mutate(P = (positive - min(positive)) / (max(positive) - min(positive))) %>% # min-max scale the positive counts
ungroup
sentiment_bing1$created_at <- as.Date(sentiment_bing1$created_at, "%m/%d/%y")
Using spread() helped separate the positive and negative counts, which could then be normalized to get the result I was looking for!
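The normalization used here is ordinary min-max scaling, which maps the smallest count to 0 and the largest to 1. As a standalone sketch (the helper name min_max and the example counts are made up):

```r
# Rescale a numeric vector to the range [0, 1]
min_max <- function(x) {
  (x - min(x)) / (max(x) - min(x))
}

# Made-up daily counts
counts <- c(2, 4, 6, 10)
scaled <- min_max(counts)

print(scaled)
```

Note that this divides by zero when all values are equal, so a constant column would need special handling.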
I'm currently trying to make my own graphical timeline like the one at the bottom of this page. I scraped the table from that link using the rvest package and cleaned it up.
Here is my code:
library(tidyverse)
library(rvest)
library(ggthemes)
library(lubridate)
URL <- "https://en.wikipedia.org/wiki/List_of_Justices_of_the_Supreme_Court_of_the_United_States"
justices <- URL %>%
read_html %>%
html_node("table.wikitable") %>%
html_table(fill = TRUE) %>%
data.frame()
# Removes weird row at bottom of the table
n <- nrow(justices)
justices <- justices[1:(n - 1), ]
# Separating the information I want
justices <- justices %>%
separate(Justice.2, into = c("name","year"), sep = "\\(") %>%
separate(Tenure, into = c("start", "end"), sep = "\n–") %>%
separate(end, into = c("end", "reason"), sep = "\\(") %>%
select(name, start, end)
# Removes wikipedia tags in start column
justices$start <- gsub('\\[e\\]$|\\[m\\]|\\[j\\]$$','', justices$start)
justices$start <- mdy(justices$start)
# This will replace incumbencies with NA
justices$end <- mdy(justices$end)
# Incumbent judges are still around!
justices[is.na(justices)] <- today()
justices$start = as.Date(justices$start, format = "%m/%d%/Y")
justices$end = as.Date(justices$end, format = "%m/%d%/Y")
justices %>%
ggplot(aes(reorder(x = name, X = start))) +
geom_segment(aes(xend = name,
yend = start,
y = end)) +
coord_flip() +
scale_y_date(date_breaks = "20 years", date_labels = "%Y") +
theme(axis.title = element_blank()) +
theme_fivethirtyeight() +
NULL
This is the output from ggplot (I'm not worried about aesthetics yet I know it looks terrible!):
The goal for this plot is to order the judges chronologically from their start date, so the judge with the oldest start date should be at the bottom while the judge with the most recent should be at the top. As you can see, there are multiple instances where this rule is broken.
Instead of sorting chronologically, it simply lists the judges as the order they appear in the data frame, which is also the order Wikipedia has it in.
Therefore, a line segment above another segment should always start further right than the one below it.
My understanding of reorder is that it will take the X = start from geom_segment and sort that and list the names in that order.
The only help I could find to this problem is to factor the dates and then order them that way, however I get the error
Error: Invalid input: date_trans works with objects of class Date only.
Thank you for your help!
You can make the name column a factor and use forcats::fct_reorder to reorder names based on start date. fct_reorder can take a function that's used for ordering start; you can use min() to order by the earliest start date for each justice. That way, judges with multiple start dates will be sorted according to the earliest one. It's only a two-line change: add a mutate at the beginning of the pipe, and remove the reorder inside aes.
justices %>%
mutate(name = as.factor(name) %>% fct_reorder(start, min)) %>%
ggplot(aes(x = name)) +
geom_segment(aes(xend = name,
yend = start,
y = end)) +
coord_flip() +
scale_y_date(date_breaks = "20 years", date_labels = "%Y") +
theme(axis.title = element_blank()) +
theme_fivethirtyeight()
Created on 2018-06-29 by the reprex package (v0.2.0).
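To see what fct_reorder does to the factor levels on its own, here is a tiny sketch with invented names and dates. The dates are converted to numbers first, which is a simplification of mine and not something the code above requires:

```r
library(forcats)

# Invented names; "A" appears twice with different start dates
name  <- c("B", "A", "C", "A")
start <- as.Date(c("1800-01-01", "1790-01-01", "1795-01-01", "1810-01-01"))

# Order the levels by each name's earliest start date
f <- fct_reorder(factor(name), as.numeric(start), .fun = min)

print(levels(f))
```

ggplot2 places discrete values along the axis in level order, so reordering the levels is what changes the order of the segments.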
I would make this a comment, but I couldn't fit it.
This was an attempt I gave up on. It looks like it actually does fix the problem, but it broke several other aspects of the formatting and I've run out of time to fix it back.
justices <- justices[order(justices$start, decreasing = TRUE),]
any(diff(justices$start) > 0) # FALSE, i.e. it works
justices$id <- nrow(justices):1
ggplot(data=justices, mapping=aes(x = start, y=id)) + #,color=name, color =
scale_x_date(date_breaks = "20 years", date_labels = "%Y") +
scale_y_discrete(breaks=justices$id, labels = justices$name) +
geom_segment(aes(xend = end, y = justices$id, yend = justices$id), size = 5) +
theme(axis.title = element_blank()) +
theme_fivethirtyeight()
Please also refer to this thread. GL!