How to reorder X axis date based on another variable - r

I have a text file here: https://login.filesanywhere.com/fs/v.aspx?v=8c6b67865a6370b0af67
I need to reorder my 'X' axis based on the month column of the dataset. I have tried for a while and can't find a way to do it. My graph currently plots from Jan to Dec, but I want the order to be Oct to Sep. This is what I have so far:
# A tibble: 6 x 6
# Groups: C_WY, WDAY, month, date [1]
C_WY WDAY month date boxname daily_mean
<fct> <int> <fct> <date> <chr> <dbl>
1 2001 274 Oct 2001-10-01 Confluence 22.3
2 2001 274 Oct 2001-10-01 DWSC-Yolo-CSlough 22.3
3 2001 274 Oct 2001-10-01 E_Delta 21.8
4 2001 274 Oct 2001-10-01 Lower_SaC 22.3
5 2001 274 Oct 2001-10-01 Lower_SJR 22.5
6 2001 274 Oct 2001-10-01 Marsh 23.0
library(ggplot2)
ggplot(test2, aes(date, daily_mean, colour = boxname)) +
  geom_line(size = .8) +
  scale_x_date(date_breaks = "1 month", date_labels = "%b", expand = c(0, 0.5)) +
  ggtitle("Test")

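This should work. It moves the October to December dates back one year, so they sort before January to September on the date axis; because the "%b" labels hide the year, the axis then reads Oct through Sep: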
library(tidyverse)
library(lubridate)
test2 %>%
  mutate(date = case_when(month %in% c("Oct", "Nov", "Dec") ~ date - years(1),
                          TRUE ~ date)) %>%
  ggplot(aes(date, daily_mean, colour = boxname)) +
  geom_line(size = .8) +
  scale_x_date(date_breaks = "1 month",
               date_labels = "%b", expand = c(0, 0.5)) +
  ggtitle("Test")
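A minimal base-R sketch of the same date shift, useful for checking the ordering without ggplot2. It assumes test2 covers a single calendar year (the situation the UPDATE describes); the test2 built here is made-up stand-in data, not the linked file.

```r
# Hypothetical stand-in for test2: one calendar year (2001) of daily means.
dates <- seq(as.Date("2001-01-01"), as.Date("2001-12-31"), by = "day")
test2 <- data.frame(date = dates, daily_mean = seq_along(dates))

# Base-R equivalent of case_when(month %in% c("Oct", "Nov", "Dec") ~ date - years(1)):
# rebuild the October-December dates with the year reduced by one.
yr <- as.integer(format(test2$date, "%Y"))
mo <- as.integer(format(test2$date, "%m"))
shift <- mo >= 10
test2$date[shift] <- as.Date(paste0(yr[shift] - 1, format(test2$date[shift], "-%m-%d")))

# Sorted by the shifted dates, the months now run Oct, Nov, Dec, Jan, ..., Sep.
test2 <- test2[order(test2$date), ]
month_order <- unique(as.integer(format(test2$date, "%m")))
print(month_order)
print(range(test2$date))
```

The shifted data frame can then go into the same ggplot() call as above.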

UPDATE: I went back through my dataset and code and found that I had lost the following year's data, so ggplot was seeing only one year of data instead of continuing past December. Thanks for trying.


Can't convert month number to month name in R [duplicate]

This question already has answers here: How can I use name of a month in x-axis in ggplot2 (2 answers). Closed 7 months ago.
I would like to create a ggplot of different tree types in Spain.
I used this code:
library(dplyr)
library(reshape)
set.seed(123)
library(ggplot2)
library(tidyr)
df_long <- pivot_longer(df7,
                        cols = c(Birch, Palm, Oak),
                        values_to = "m3",
                        names_to = "Trees")
# Plot
ggplot(df_long,
       aes(
         x = Month,
         y = Integral,
         color = Trees
       )) +
  geom_line() +
  ggtitle("trees in Spain") +
  xlab("Month") + scale_x_continuous(breaks = seq(1, 12, by = 1), limits = c(1, 12)) +
  ylab(" m3")
Unfortunately, the x axis shows only the month numbers, but I would like it to show the month names.
If your months are integers, you can use them to index the built-in constants month.abb and month.name:
library(dplyr)
df <- data.frame(month_nums = 1:12)
df |>
  mutate(
    month_abb = month.abb[month_nums],
    month_full = month.name[month_nums]
  )
# month_nums month_abb month_full
# 1 1 Jan January
# 2 2 Feb February
# 3 3 Mar March
# 4 4 Apr April
# 5 5 May May
# 6 6 Jun June
# 7 7 Jul July
# 8 8 Aug August
# 9 9 Sep September
# 10 10 Oct October
# 11 11 Nov November
# 12 12 Dec December
If they are dates, you can use format():
df <- data.frame(
  month = seq(from = as.Date("2020-01-01"), to = as.Date("2020-12-31"), by = "month")
)
df |>
  mutate(
    month_abb = format(month, "%b"),
    month_full = format(month, "%B")
  )
# month month_abb month_full
# 1 2020-01-01 Jan January
# 2 2020-02-01 Feb February
# 3 2020-03-01 Mar March
# 4 2020-04-01 Apr April
# 5 2020-05-01 May May
# 6 2020-06-01 Jun June
# 7 2020-07-01 Jul July
# 8 2020-08-01 Aug August
# 9 2020-09-01 Sep September
# 10 2020-10-01 Oct October
# 11 2020-11-01 Nov November
# 12 2020-12-01 Dec December
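Applied to the tree plot from the question, one option (not taken from the answer above) is to keep Month numeric and pass month.abb to scale_x_continuous() as the labels. This sketch assumes a made-up df_long with a numeric Month column and plots the m3 values created by pivot_longer(); swap in Integral if that is the column you meant.

```r
library(ggplot2)

# Hypothetical stand-in for df_long: one value per month for each tree type.
set.seed(123)
df_long <- data.frame(
  Month = rep(1:12, times = 3),
  m3    = runif(36, min = 10, max = 50),
  Trees = rep(c("Birch", "Palm", "Oak"), each = 12)
)

p <- ggplot(df_long, aes(x = Month, y = m3, color = Trees)) +
  geom_line() +
  # Numeric breaks 1..12, each labelled with the matching entry of month.abb.
  scale_x_continuous(breaks = 1:12, labels = month.abb, limits = c(1, 12)) +
  ggtitle("trees in Spain") +
  xlab("Month") +
  ylab("m3")

print(p)
```

Because month.abb is a built-in English constant, the labels do not depend on the session locale, unlike format(x, "%b").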

How do I change the origin of the x axis in ggplot so it runs from 'August to March' instead of 'Jan to March, August to December'?

I want to plot temperature data over time, with the x axis: "08-01", "09-01", "10-01", "11-01", "12-01", "01-01", "02-01", "03-01"
Rather than: "01-01", "02-01", "03-01", "08-01", "09-01", "10-01", "11-01", "12-01", which is what R is doing.
My data looks like the following. The x axis uses the Month_day column, and the first-of-month values in this column are: "08-01", "09-01", "10-01", "11-01", "12-01", "01-01", "02-01", "03-01".
> head(upstream)
Date daily_aveTempC Moving_Average_7day Year Month Day Month_day monthAbb Migration EmbryoDev
1 2007-08-01 13.49556 13.94947 2007 08 01 08-01 Aug Upstream
2 2007-08-02 13.44325 13.74864 2007 08 02 08-02 Aug Upstream
3 2007-08-03 12.93881 13.56086 2007 08 03 08-03 Aug Upstream
4 2007-08-04 12.78937 13.41106 2007 08 04 08-04 Aug Upstream
5 2007-08-05 13.13963 13.29029 2007 08 05 08-05 Aug Upstream
6 2007-08-06 13.11844 13.19651 2007 08 06 08-06 Aug Upstream
I have the following code, which plots Month_day (x axis) against Moving_Average_7day (y axis).
png(paste0(read_out_final, "Migration_Upstream_7day_MovingAve_Sal_4.png"), res = 300, width = 15, height = 8, units = "in")
# y must stay numeric for scale_y_continuous(); converting it to a factor with
# Month_day levels would turn every temperature into NA
ggplot(data = upstream, aes(x = as.factor(Month_day), y = Moving_Average_7day, color = Year, group = Year)) +
  geom_line(size = 1) +
  theme_bw() +
  scale_y_continuous(n.breaks = 20,
                     limits = c(1, 20)) +
  # label only the first day of each month
  scale_x_discrete(breaks = upstream$Month_day[grep("-01$", upstream$Month_day)]) +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1)) +
  labs(title = "Salmon Creek 4: Upstream Migration from August to March",
       x = "Date",
       y = "Temperature (7-Day Rolling Average degrees C)")
dev.off()
This orders the x axis as: "01-01", "02-01", "03-01", "08-01", "09-01", "10-01", "11-01", "12-01".
But I want it ordered as: "08-01", "09-01", "10-01", "11-01", "12-01", "01-01", "02-01", "03-01".
I've seen solutions to this issue using the plot() function, but not for ggplot.
Almost every question on ggplot2 that includes "order of ... axis" can be resolved by using factor(., levels=), and explicitly controlling the order of the levels.
dat <- data.frame(dt = seq(as.Date("2020-08-01"), as.Date("2021-04-01"), by="month"), y = 1:9)
dat$MonDay <- format(dat$dt, format = "%m-%d")
dat
# dt y MonDay
# 1 2020-08-01 1 08-01
# 2 2020-09-01 2 09-01
# 3 2020-10-01 3 10-01
# 4 2020-11-01 4 11-01
# 5 2020-12-01 5 12-01
# 6 2021-01-01 6 01-01
# 7 2021-02-01 7 02-01
# 8 2021-03-01 8 03-01
# 9 2021-04-01 9 04-01
library(ggplot2)
ggplot(dat, aes(MonDay, y)) + geom_point()
This puts "01-01" first. ggplot2 has to order the values of the x variable: numeric or integer values are easy, but character values are sorted lexicographically, so "08-01" comes after "04-01" (even though the strings were built from dates in the opposite order). Converting to a factor whose levels follow the dates fixes it:
dat$MonDay <- factor(dat$MonDay, levels = unique(dat$MonDay[order(dat$dt)]))
ggplot(dat, aes(MonDay, y)) + geom_point()
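For the upstream data itself, which crosses the New Year and may contain several seasons, the levels can be ordered by position within an August-start season rather than by one date sequence. A minimal sketch, assuming a made-up upstream with Date and Month_day columns covering two Aug to Mar seasons:

```r
# Hypothetical stand-in for `upstream`: two August-March migration seasons.
dates <- c(seq(as.Date("2007-08-01"), as.Date("2008-03-31"), by = "day"),
           seq(as.Date("2008-08-01"), as.Date("2009-03-31"), by = "day"))
upstream <- data.frame(Date = dates, Month_day = format(dates, "%m-%d"))

mo <- as.integer(format(upstream$Date, "%m"))
dy <- as.integer(format(upstream$Date, "%d"))
# Position of each month within an August-start season: Aug = 0, ..., Mar = 7.
season_month <- (mo - 8) %% 12

# Levels sorted by season position, then by day of month.
lev <- unique(upstream$Month_day[order(season_month, dy)])
upstream$Month_day <- factor(upstream$Month_day, levels = lev)

head(levels(upstream$Month_day), 3)
tail(levels(upstream$Month_day), 3)
```

Ordering by month position and day, rather than by days since 1 August, keeps "02-29" between "02-28" and "03-01" even when leap and non-leap seasons are mixed. The factor can then be used as x in the original ggplot() call.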

ggplot: Create line plot of proportions

I have a data.table of 59101 observations, and I want to make a line plot (not a barplot) of the proportions of my exposure column for each unique value in the month.year column (i.e. Jan 2014, Feb 2014, ..., Mar 2020).
library(data.table)
set.seed(1992)
DT <- data.table::data.table(ID=seq(1,59101),
exposure=as.factor(sample(letters[1:10],59101,replace = TRUE)),
index_date=sample(seq(as.Date('2014/01/01'), as.Date('2020/03/31'), by="day"),59101,replace = TRUE))
DT[,month.year:=as.factor(zoo::as.yearmon(index_date)),]
DT
> DT
ID exposure index_date month.year
1: 1 g 2017-11-26 Nov 2017
2: 2 i 2019-10-11 Oct 2019
3: 3 i 2015-02-27 Feb 2015
4: 4 f 2016-10-18 Oct 2016
5: 5 e 2019-06-06 Jun 2019
---
59097: 59097 e 2015-07-22 Jul 2015
59098: 59098 j 2017-09-04 Sep 2017
59099: 59099 a 2018-04-26 Apr 2018
59100: 59100 a 2019-12-02 Dec 2019
59101: 59101 g 2014-11-04 Nov 2014
I want my plot to look like the image I attached to my original question.
I've been able to produce plots with ggplot in the past but I found that I struggled most with prepping the data, so my code ends up being rather rudimentary and ad hoc; I would like to keep my code as simple and clean as possible.
I've seen that melting the data (reshaping it to long format) is the best approach for plotting with ggplot, but I haven't built the muscle memory to instinctively know how to 1) prep the data that way and 2) pass it through ggplot syntax to create what I need.
If anyone knows or has advice of how to do this, it would be greatly appreciated.
Here's a solution using the tidyverse. It's a lot of data for one plot; sometimes small multiples would be better.
library(data.table)
set.seed(1992)
DT <- data.table::data.table(ID=seq(1,59101),
exposure=as.factor(sample(letters[1:10],59101,replace = TRUE)),
index_date=sample(seq(as.Date('2014/01/01'), as.Date('2020/03/31'), by="day"),59101,replace = TRUE))
DT[,month.year:=as.factor(zoo::as.yearmon(index_date)),]
library(tidyverse)
DT %>% as_tibble() %>%
group_by(month.year, exposure) %>%
count() %>%
ungroup() %>%
group_by(month.year) %>%
mutate(ttl = sum(n),
pct_ttl = n / ttl,
date = lubridate::myd(month.year, truncated = 1L)) %>%
print(n = 20) %>%
ggplot(aes(date, pct_ttl, color = exposure, group = exposure)) +
geom_line() +
scale_x_date(date_breaks = "4 months", date_labels = "%b %Y") + # single call; the original wrapped this in a second scale_x_date()
scale_y_continuous(labels = scales::percent_format(accuracy = 1)) +
theme(axis.text.x = element_text(angle = 90))+
labs(y = "proportion", x = "")
#> # A tibble: 750 x 6
#> # Groups: month.year [75]
#> month.year exposure n ttl pct_ttl date
#> <fct> <fct> <int> <int> <dbl> <date>
#> 1 Jan 2014 a 66 793 0.0832 2014-01-01
#> 2 Jan 2014 b 83 793 0.105 2014-01-01
#> 3 Jan 2014 c 66 793 0.0832 2014-01-01
#> 4 Jan 2014 d 93 793 0.117 2014-01-01
#> 5 Jan 2014 e 76 793 0.0958 2014-01-01
#> 6 Jan 2014 f 71 793 0.0895 2014-01-01
#> 7 Jan 2014 g 87 793 0.110 2014-01-01
#> 8 Jan 2014 h 77 793 0.0971 2014-01-01
#> 9 Jan 2014 i 87 793 0.110 2014-01-01
#> 10 Jan 2014 j 87 793 0.110 2014-01-01
#> 11 Feb 2014 a 79 708 0.112 2014-02-01
#> 12 Feb 2014 b 66 708 0.0932 2014-02-01
#> 13 Feb 2014 c 69 708 0.0975 2014-02-01
#> 14 Feb 2014 d 69 708 0.0975 2014-02-01
#> 15 Feb 2014 e 69 708 0.0975 2014-02-01
#> 16 Feb 2014 f 78 708 0.110 2014-02-01
#> 17 Feb 2014 g 71 708 0.100 2014-02-01
#> 18 Feb 2014 h 67 708 0.0946 2014-02-01
#> 19 Feb 2014 i 65 708 0.0918 2014-02-01
#> 20 Feb 2014 j 75 708 0.106 2014-02-01
#> # ... with 730 more rows
Created on 2020-04-24 by the reprex package (v0.3.0)
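As a cross-check on the tidyverse pipeline, the same per-month proportions can be computed in base R with prop.table(). This is a swapped-in technique, not part of the answer: it rebuilds the simulated data, uses a "%Y-%m" key instead of zoo::as.yearmon(), and the names obs and long are hypothetical.

```r
set.seed(1992)
n <- 59101
obs <- data.frame(
  exposure   = factor(sample(letters[1:10], n, replace = TRUE)),
  index_date = sample(seq(as.Date("2014-01-01"), as.Date("2020-03-31"), by = "day"),
                      n, replace = TRUE)
)
# "%Y-%m" keys sort chronologically, unlike "Jan 2014"-style strings.
obs$month_year <- format(obs$index_date, "%Y-%m")

# Row-wise proportions: each exposure's share of its month's total.
props <- prop.table(table(obs$month_year, obs$exposure), margin = 1)

# Long format with a real Date column, ready for scale_x_date().
long <- as.data.frame(props, responseName = "pct_ttl", stringsAsFactors = FALSE)
names(long)[1:2] <- c("month_year", "exposure")
long$date <- as.Date(paste0(long$month_year, "-01"))
head(long)
```

The long data frame can then be plotted with ggplot(long, aes(date, pct_ttl, color = exposure)) + geom_line().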
Since the question is tagged data.table, here's a way to prep your data with data.table:
DT[,
{ n = .N
.SD[, .(rel_freq = .N / n), by = exposure]},
by = month.year]
month.year exposure rel_freq
<fctr> <fctr> <num>
1: Nov 2017 g 0.10840108
2: Nov 2017 f 0.10027100
3: Nov 2017 d 0.10162602
4: Nov 2017 i 0.09485095
5: Nov 2017 e 0.11382114
---
746: Jul 2018 f 0.10506799
747: Jul 2018 c 0.10259580
748: Jul 2018 a 0.10754017
749: Jul 2018 b 0.10135970
750: Jul 2018 g 0.11248455
Then you have a few options for the ggplot2 call.
Make a new variable
DT_relative = DT[, {n = .N; .SD[, .(rel_freq = .N / n), by = exposure]}, by = month.year]
ggplot(DT_relative, aes(x = month.year, y = rel_freq, color = exposure, group = exposure)) + geom_line()
Include the data transformation in the ggplot call
ggplot(DT[, {n = .N; .SD[, .(rel_freq = .N / n), by = exposure]}, by = month.year],
aes(x = month.year, y = rel_freq, color = exposure, group = exposure)) + geom_line()
Chain another data.table [ call and use ggplot(.SD)
DT[, {n = .N; .SD[, .(rel_freq = .N / n), by = exposure]}, by = month.year
][, ggplot(.SD, aes(x = month.year, y = rel_freq, color = exposure, group = exposure)) + geom_line()]
Attach magrittr to use the pipe
library(magrittr)
DT[,
   { n = .N
     .SD[, .(rel_freq = .N / n), by = exposure]},
   by = month.year] %>%
  ggplot(., aes(x = month.year, y = rel_freq, color = exposure, group = exposure)) + geom_line()
The other answer has nicer formatting for the ggplot() call, so see it for how to make the graph look good.
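The nested .SD call can also be written as a two-step chain: count rows with .N per month and exposure, then divide by the monthly total with a grouped :=. A sketch on the same simulated data, assuming a first-of-month Date column (month_date, a hypothetical name) in place of the zoo factor so that scale_x_date() can be used later:

```r
library(data.table)

set.seed(1992)
n <- 59101
DT <- data.table(
  exposure   = factor(sample(letters[1:10], n, replace = TRUE)),
  index_date = sample(seq(as.Date("2014-01-01"), as.Date("2020-03-31"), by = "day"),
                      n, replace = TRUE)
)
# First day of each month as a Date (avoids the zoo dependency).
DT[, month_date := as.Date(format(index_date, "%Y-%m-01"))]

# Count per month x exposure, then divide each count by its monthly total.
DT_relative <- DT[, .N, by = .(month_date, exposure)][
  , rel_freq := N / sum(N), by = month_date]

setorder(DT_relative, month_date, exposure)
head(DT_relative)
```

Any of the ggplot2 options above then works with x = month_date and y = rel_freq.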

R shift scale at x axis date with a non-continuous sequence of time

I have time-series data from two consecutive years (2017 and 2018), running from January to December of each year. I need to plot the data from Sep 2017 to Apr 2018.
I managed it with very hand-made code, but I suspect it could be done much more straightforwardly with the packages available today for handling dates in plots (scales, lubridate, etc.).
Can someone help me simplify the code for the second plot?
I would really appreciate it.
suppressWarnings(suppressMessages(library("tidyverse", quietly = T)))
dat <- tibble(
date = seq(as.Date("2017-01-01"), as.Date("2018-12-31"), by=1),
var = rgamma(length(date), shape=2, scale=2)) %>%
mutate(year = lubridate::year(date),
month = lubridate::month(date),
julian = lubridate::yday(date))
dat
#> # A tibble: 730 x 5
#> date var year month julian
#> <date> <dbl> <dbl> <dbl> <dbl>
#> 1 2017-01-01 12.9 2017 1 1
#> 2 2017-01-02 6.69 2017 1 2
#> 3 2017-01-03 6.11 2017 1 3
#> 4 2017-01-04 1.68 2017 1 4
#> 5 2017-01-05 1.22 2017 1 5
#> 6 2017-01-06 10.2 2017 1 6
#> 7 2017-01-07 5.13 2017 1 7
#> 8 2017-01-08 4.61 2017 1 8
#> 9 2017-01-09 3.79 2017 1 9
#> 10 2017-01-10 1.11 2017 1 10
#> # … with 720 more rows
dat %>%
ggplot() +
geom_line(aes(julian, var, color = factor(month), linetype=factor(year)))
dat %>%
filter((year == 2017 & month %in% c("9","10", "11", "12"))|
(year == 2018 & month %in% c("1", "2", "3"))) %>%
mutate(julian_AWS = ifelse(julian>=244, julian-243, julian+123)) %>%
ggplot() +
geom_line(aes(julian_AWS, var, color = factor(month), linetype=factor(year)))+
scale_x_continuous(breaks = c(1,#S
31,#O
61,#N
91,#D
121,#E
151,#F
181),#M
labels = c("Sep", "Oct", "Nov", "Dec", "Jan", "Feb", "Mar"))+
theme(axis.text.x=element_text(hjust=-1))
Created on 2019-05-05 by the reprex package (v0.2.1)
I don't think you need Julian days at all: filter on the date column and let scale_x_date() place the monthly breaks. See if this gets you what you need:
dat %>%
filter(date >= '2017-09-01', date < '2018-04-01') %>%
ggplot() +
geom_line(aes(date, var, color = factor(month), linetype = factor(year))) +
scale_x_date(date_breaks = "1 month", date_labels = "%b") +
theme(axis.text.x = element_text(hjust = -1))
For more info on date label formats, see ?strftime
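If you later want several Sep to Mar seasons overlaid on one axis (the answer plots a single season), one option not taken from the answer is to map every date onto one dummy season and keep using scale_x_date(). The helper names to_season_axis() and season_of() are hypothetical, and the dummy years 1999/2000 are arbitrary apart from 2000 being a leap year:

```r
# Hypothetical helper: map each date onto one dummy season running from
# 1 Sep 1999 to 31 Aug 2000, so different years share a single date axis.
# 2000 is a leap year, so 29 February always has a valid target date.
to_season_axis <- function(d, start_month = 9L) {
  mo <- as.integer(format(d, "%m"))
  dummy_year <- ifelse(mo >= start_month, 1999L, 2000L)
  as.Date(paste0(dummy_year, format(d, "-%m-%d")))
}

# Hypothetical helper: label each date with the calendar year its season starts in.
season_of <- function(d, start_month = 9L) {
  yr <- as.integer(format(d, "%Y"))
  mo <- as.integer(format(d, "%m"))
  ifelse(mo >= start_month, yr, yr - 1L)
}

d <- as.Date(c("2017-09-01", "2017-12-31", "2018-01-01", "2018-03-31"))
print(to_season_axis(d))
print(season_of(d))
```

With dat from the question, filtered to Sep to Mar, something like dat$axis_date <- to_season_axis(dat$date) and dat$season <- season_of(dat$date), followed by geom_line(aes(axis_date, var, color = factor(season))) and scale_x_date(date_breaks = "1 month", date_labels = "%b"), should give one line per season.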

ggplot replace days with month in aggregated year polar histogram

I am trying to replace the day numbers on the x axis of a histogram with month names. The data looks similar to this:
library(tidyverse)
library(lubridate)
library(okcupiddata) # the example data
df <- profiles %>% as_tibble() %>%
select(last_online) %>%
mutate(month = month(last_online, label = TRUE, abbr = FALSE),
day = yday(last_online))
# A tibble: 59,946 x 3
last_online month day
<dttm> <ord> <dbl>
1 2012-06-28 20:30:00 June 180
2 2012-06-29 21:41:00 June 181
3 2012-06-27 09:10:00 June 179
4 2012-06-28 14:22:00 June 180
5 2012-06-27 21:26:00 June 179
Now I want to create a histogram over the days of the year:
df %>%
ggplot(aes(x = day, fill = ..count..)) +
geom_histogram(bins = 365) +
scale_y_log10()
I want to label the day axis with the corresponding month instead. I tried scale_x_discrete(labels = month), but that just removes the axis labels.
I assume I need a larger transformation or some programming, but I hope there is already a function that can be applied quickly.
Ultimately, I want to create a radial plot (by adding + coord_polar()) with the months as breaks.
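Since day is continuous, one approach (an assumption, not an answer from this thread) is to keep scale_x_continuous() and put a break on the day of year where each month starts, labelled with month.abb. The sketch uses made-up day values, computes the breaks for 2012 (the year shown in the sample rows, a leap year), and drops scale_y_log10() for brevity:

```r
library(ggplot2)

# Hypothetical stand-in for the okcupid data: day-of-year values.
set.seed(1)
df <- data.frame(day = sample(1:366, 5000, replace = TRUE))

# Day of year on which each month starts in 2012; month.abb gives
# locale-independent labels for those positions.
month_starts <- seq(as.Date("2012-01-01"), by = "month", length.out = 12)
month_breaks <- as.integer(format(month_starts, "%j"))

p <- ggplot(df, aes(x = day, fill = after_stat(count))) +
  geom_histogram(bins = 366) +
  # Keep the continuous day axis; only the break positions and labels change.
  scale_x_continuous(breaks = month_breaks, labels = month.abb) +
  coord_polar()

print(p)
```

scale_x_discrete() does not fit here because day is numeric, which is why this sketch stays on a continuous scale and changes only where the labels sit.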
