ggplot replace days with month in aggregated year polar histogram - r

I am trying to replace the x-axis of a histogram with the month. The data look similar to:
library(tidyverse)
library(lubridate)
library(okcupiddata) # the example data
df <- profiles %>% as_tibble() %>%
select(last_online) %>%
mutate(month = month(last_online, label = TRUE, abbr = FALSE),
day = yday(last_online))
# A tibble: 59,946 x 3
last_online month day
<dttm> <ord> <dbl>
1 2012-06-28 20:30:00 June 180
2 2012-06-29 21:41:00 June 181
3 2012-06-27 09:10:00 June 179
4 2012-06-28 14:22:00 June 180
5 2012-06-27 21:26:00 June 179
Now I want to create a histogram over the days of the year:
df %>%
# after_stat(count) replaces the deprecated ..count.. notation
ggplot(aes(x = day, fill = after_stat(count))) +
geom_histogram(bins = 365) +
scale_y_log10()
I want to label the day axis with the corresponding month instead. I tried scale_x_discrete(labels = month), but that just removes the axis labels.
I assume this needs a larger transformation or some programming, but I hope there is already a function that can be applied quickly.
I ultimately want to create a radial plot (by adding + coord_polar()) with the months as breaks.
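A minimal sketch of one way to do this, not taken from an answer on the page: keep day on a continuous scale and use scale_x_continuous() with breaks at the day of year on which each month starts and base R's month.abb as labels. To keep the block self-contained it assumes simulated timestamps in place of okcupiddata::profiles. The month starts are computed for a non-leap reference year (2011, an arbitrary choice), so in a leap year every break after February is off by one day, which is invisible at this scale.

```r
library(ggplot2)
library(lubridate)

# Simulated stand-in for profiles$last_online (assumption: 20000 random
# timestamps spread over most of 2012; the real data is not needed here)
set.seed(1)
df <- data.frame(
  last_online = as.POSIXct("2012-01-01", tz = "UTC") + runif(20000, 0, 365 * 86400)
)
df$day <- yday(df$last_online)

# Day of year on which each month starts, using 2011 as a non-leap reference
month_starts <- yday(as.Date(sprintf("2011-%02d-01", 1:12)))

p <- ggplot(df, aes(x = day, fill = after_stat(count))) +
  geom_histogram(bins = 365) +
  scale_y_log10() +
  # Continuous x scale: breaks are day numbers, labels are month names
  scale_x_continuous(breaks = month_starts, labels = month.abb) +
  coord_polar()
```

Print p to draw the chart; with coord_polar() each month label sits at the angle where that month begins.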

Related

How to use another variable values as labels on date x-axis in ggplot?

I have created a ggplot with a date x-axis, but I would like the axis to show values from another variable instead of the dates.
library(tidyverse)
library(lubridate)
df <- read_rds("https://github.com/johnsnow09/covid19-df_stack-code/blob/main/vaccine_milestones.rds?raw=true")
df
Updated.On cr_bin days_to_next_10cr_vacc
<date> <fct> <drtn>
1 2021-04-11 10 Cr 85 days
2 2021-05-27 20 Cr 46 days
3 2021-06-24 30 Cr 28 days
4 2021-07-18 40 Cr 24 days
5 2021-08-06 50 Cr 19 days
6 2021-08-25 60 Cr 19 days
7 2021-09-07 70 Cr 13 days
8 2021-09-18 80 Cr 11 days
9 2021-10-02 90 Cr 14 days
df %>%
ggplot(aes(x = Updated.On, y = days_to_next_10cr_vacc)) +
geom_col() +
scale_x_date(aes(labels = cr_bin))
Also tried: scale_x_date(aes(labels = c("10","20","30","40","50","60","70","80","90")))
On the x-axis I would like to display the values of cr_bin (10 Cr, 20 Cr, 30 Cr, ..., 90 Cr) instead of dates.
I have tried the code above, but I am not sure what to use in place of labels to get the desired result.
You need to set breaks for the labels. I'm using unique() in case there are duplicate rows.
Also note the conversion of the difftime column to integer.
library(tidyverse)
library(lubridate)
df <- read_rds("https://github.com/johnsnow09/covid19-df_stack-code/blob/main/vaccine_milestones.rds?raw=true")
df %>%
ggplot(aes(x = Updated.On, y = as.integer(days_to_next_10cr_vacc))) +
geom_col() +
scale_x_date(breaks = unique(df$Updated.On), labels = unique(df$cr_bin))
Created on 2021-10-21 by the reprex package (v2.0.1)
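If the .rds file cannot be downloaded, the same breaks-plus-labels idea can be tried on a small data frame typed in from the printed df above. This is a sketch: vacc is a hypothetical name, and the day counts are stored as plain numbers, so no difftime conversion is needed.

```r
library(ggplot2)

# Values copied from the printed df above, so no download is needed
vacc <- data.frame(
  Updated.On = as.Date(c("2021-04-11", "2021-05-27", "2021-06-24", "2021-07-18",
                         "2021-08-06", "2021-08-25", "2021-09-07", "2021-09-18",
                         "2021-10-02")),
  cr_bin = paste(seq(10, 90, by = 10), "Cr"),
  days_to_next_10cr_vacc = c(85, 46, 28, 24, 19, 19, 13, 11, 14)
)

# One break per bar, labelled with the matching cr_bin value;
# breaks and labels must have the same length and the same order
p <- ggplot(vacc, aes(x = Updated.On, y = days_to_next_10cr_vacc)) +
  geom_col() +
  scale_x_date(breaks = vacc$Updated.On, labels = vacc$cr_bin)
```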

How to reorder X axis date based on another variable

I have a text file here: https://login.filesanywhere.com/fs/v.aspx?v=8c6b67865a6370b0af67
I need to reorder my x-axis based on the month column of the dataset. I have tried for a while and can't find a way to achieve it. The graph currently plots from Jan to Dec, but I want the order to run from Oct to Sep. This is what I have so far:
# A tibble: 6 x 6
# Groups: C_WY, WDAY, month, date [1]
C_WY WDAY month date boxname daily_mean
<fct> <int> <fct> <date> <chr> <dbl>
1 2001 274 Oct 2001-10-01 Confluence 22.3
2 2001 274 Oct 2001-10-01 DWSC-Yolo-CSlough 22.3
3 2001 274 Oct 2001-10-01 E_Delta 21.8
4 2001 274 Oct 2001-10-01 Lower_SaC 22.3
5 2001 274 Oct 2001-10-01 Lower_SJR 22.5
6 2001 274 Oct 2001-10-01 Marsh 23.0
ggplot(test2,aes(date,daily_mean,colour=boxname)) +
geom_line(size=.8) +
scale_x_date(date_breaks = "1 month",date_labels = "%b",expand=c(0,0.5)) +
ggtitle("Test")
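This should work. It shifts the October to December dates back by one year, so that on a date axis they are plotted before January: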
library(tidyverse)
library(lubridate)
test2 %>%
mutate(date = case_when(month %in% c("Oct", "Nov", "Dec") ~ date - years(1),
TRUE ~ date)) %>%
ggplot(aes(date, daily_mean, colour=boxname)) +
geom_line(size=.8) +
scale_x_date(date_breaks = "1 month",
date_labels = "%b",expand=c(0,0.5)) +
ggtitle("Test")
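To see what the case_when() shift does to the ordering, here is a small sketch on hypothetical data: one date per month of 2002 standing in for test2. It uses base R's month.abb for the month names so the result does not depend on the locale.

```r
library(dplyr)
library(lubridate)

# Hypothetical stand-in for test2: the first day of each month of 2002,
# with English month abbreviations from base R's month.abb
test2 <- tibble(date = as.Date(sprintf("2002-%02d-01", 1:12))) %>%
  mutate(month = month.abb[month(date)])

# Same case_when() as above: move Oct, Nov and Dec back one year so they sort first
shifted <- test2 %>%
  mutate(date = case_when(month %in% c("Oct", "Nov", "Dec") ~ date - years(1),
                          TRUE ~ date)) %>%
  arrange(date)

shifted$month
```

Because the shift is keyed on the month alone, it only gives an Oct to Sep order when each series covers a single calendar year.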
UPDATE: I went back through my dataset and code and found that I had lost the next year's data, so ggplot was seeing only one year of data instead of continuing past December. Thanks for trying.

How to convert week numbers into date format using R

I am trying to convert a column in my dataset that contains week numbers into weekly dates. I tried the lubridate package but could not find a solution. The dataset looks like this:
df <- tibble(week = c("202009", "202010", "202011","202012", "202013", "202014"),
Revenue = c(4543, 6764, 2324, 5674, 2232, 2323))
So I would like to create a Date column in a weekly format, e.g. 2020-03-07, 2020-03-14.
Would anyone know how to convert these week numbers into weekly dates?
Maybe there is a more automated way, but try something like this. I think it gets the right days (I checked against a 2020 calendar), but if something is off, it's a matter of adjusting the (week - 1) * 7 - 1 offset to return what you want.
This takes the first day of the year, adds (week - 1) * 7 - 1 days, and then uses ceiling_date() with week_start = 7 to round up to the next Sunday.
library(dplyr)
library(tidyr)
library(lubridate)
df %>%
separate(week, c("year", "week"), sep = 4, convert = TRUE) %>%
mutate(date = ceiling_date(ymd(paste(year, "01", "01", sep = "-")) +
(week - 1) * 7 - 1, "week", week_start = 7))
# # A tibble: 6 x 4
# year week Revenue date
# <int> <int> <dbl> <date>
# 1 2020 9 4543 2020-03-01
# 2 2020 10 6764 2020-03-08
# 3 2020 11 2324 2020-03-15
# 4 2020 12 5674 2020-03-22
# 5 2020 13 2232 2020-03-29
# 6 2020 14 2323 2020-04-05
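The same arithmetic can be wrapped in a small function for a character vector of "YYYYWW" codes, without separate(). This is a sketch; week_to_sunday is a hypothetical name, and it inherits whatever week convention the answer's offset implies, so check it against a calendar for your own data.

```r
library(lubridate)

# Hypothetical helper wrapping the arithmetic from the answer above
week_to_sunday <- function(yyyyww) {
  year <- as.integer(substr(yyyyww, 1, 4))
  week <- as.integer(substr(yyyyww, 5, 6))
  # Jan 1 of the year plus (week - 1) weeks, minus one day
  anchor <- make_date(year, 1, 1) + (week - 1) * 7 - 1
  # Round up to the next week boundary, with weeks starting on Sunday
  ceiling_date(anchor, "week", week_start = 7)
}

week_to_sunday(c("202009", "202010", "202014"))
```

For the 2020 codes this should reproduce the dates in the output above.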

How to repeat the value of the last day of February for a leap year in R?

I have a data.frame that doesn't account for leap years (i.e. all years have 365 days). The DF in my code below is a fake data set, and I intentionally remove the leap day to create DF_NoLeapday. I would like to add the leap day back to DF_NoLeapday by repeating the value of the last day of February of that leap year (in this example, the Feb 28, 2004 value). I would prefer a general solution that I can apply to data covering many years.
set.seed(55)
DF <- data.frame(date = seq(as.Date("2003-01-01"), to= as.Date("2005-12-31"), by="day"),
A = runif(1096, 0,10),
Z = runif(1096,5,15))
DF_NoLeapday <- DF[!(format(DF$date,"%m") == "02" & format(DF$date, "%d") == "29"), ,drop = FALSE]
We can use complete() on the date column, which is already of class Date, to expand the rows and fill in the missing dates:
library(dplyr)
library(tidyr)
out <- DF_NoLeapday %>%
complete(date = seq(min(date), max(date), by = '1 day'))
dim(out)
#[1] 1096 3
out %>%
filter(date >= '2004-02-28', date <= '2004-03-01')
# A tibble: 3 x 3
# date A Z
# <date> <dbl> <dbl>
#1 2004-02-28 9.06 9.70
#2 2004-02-29 NA NA
#3 2004-03-01 5.30 7.35
By default, the other columns are filled with NA. If you need a different value, supply it through the fill argument of complete().
If you need the previous day's values instead, use tidyr's fill():
out <- out %>%
fill(A, Z)
out %>%
filter(date >= '2004-02-28', date <= '2004-03-01')
# A tibble: 3 x 3
# date A Z
# <date> <dbl> <dbl>
#1 2004-02-28 9.06 9.70
#2 2004-02-29 9.06 9.70
#3 2004-03-01 5.30 7.35
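A sketch of the fill argument of complete() mentioned above, reusing the question's setup. The fill value is a named list with one entry per column; the constant 0 is an arbitrary placeholder chosen for illustration.

```r
library(dplyr)
library(tidyr)

set.seed(55)
DF <- data.frame(date = seq(as.Date("2003-01-01"), to = as.Date("2005-12-31"), by = "day"),
                 A = runif(1096, 0, 10),
                 Z = runif(1096, 5, 15))
DF_NoLeapday <- DF[!(format(DF$date, "%m") == "02" & format(DF$date, "%d") == "29"), , drop = FALSE]

# Re-insert the missing dates and give the new rows a fixed value instead of NA
out_const <- DF_NoLeapday %>%
  complete(date = seq(min(date), max(date), by = "1 day"),
           fill = list(A = 0, Z = 0))

out_const %>% filter(date == as.Date("2004-02-29"))
```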

Plotting daily summed values of data against months [duplicate]

This question already has answers here:
How to change x axis from years to months with ggplot2
I am trying to make a ggplot of solar irradiance (from a weather file) on the y-axis against time in months on the x-axis.
My data consist of hourly values for 12 months, so there are 8760 rows in total.
Now I want to make the plot so that each day contributes only a single point, the sum of that day's values, rather than plotting every hourly value. I believe geom_freqpoly() can plot this type of data, but I have not found enough examples of what I want. (Or perhaps another approach can produce this plot; I am not sure how to add up the values for each day, and writing code for each of the 365 days separately is not an option.)
My current plot shows all the readings for the year.
My code for this plot is:
library(ggplot2)
library(lubridate) # month(), mday()
library(tidyr)     # unite()
cmsaf_data <- read.csv("C://Users//MEJA03514//Desktop//main folder//Irradiation data//tmy_era_25.796_45.547_2005_2014.csv",skip=16, header=T)
time<- strptime(cmsaf_data[,2], format = "%m/%d/%Y %H:%M")
data <- cbind(time,cmsaf_data[5])
#data %>% select(time)
data <- data.frame(data, months = month(time),days = mday(time))
data <- unite(data, date_month, c(months, days), remove=FALSE, sep="-")
data <- subset(data, data[,2]>0)
GHI <- data[,2]
date_month <- data[,3]
ggplot(data, aes(date_month, GHI))+geom_line()
My data look like this:
head(data)
time Global.horizontal.irradiance..W.m2.
1 2007-01-01 00:00:00 0
2 2007-01-01 01:00:00 0
3 2007-01-01 02:00:00 0
4 2007-01-01 03:00:00 0
5 2007-01-01 04:00:00 0
6 2007-01-01 05:00:00 159
Since I want one point per day, how can I sum the values for each day so that I get 365 values for the year, and show month names on the x-axis? Perhaps something from the time and date functions can do this daily summation.
I have no idea at all of any such function or approach.
Your help will be appreciated!
Here is a solution using the tidyverse and lubridate packages. As you haven't provided complete sample data, I've generated some random data.
library(tidyverse)
library(lubridate)
data <- tibble(
time = seq(ymd_hms('2007-01-01 00:00:00'),
ymd_hms('2007-12-31 23:00:00'),
by='hour'),
variable = sample(0:400, 8760, replace = TRUE)
)
head(data)
#> # A tibble: 6 x 2
#> time variable
#> <dttm> <int>
#> 1 2007-01-01 00:00:00 220
#> 2 2007-01-01 01:00:00 348
#> 3 2007-01-01 02:00:00 360
#> 4 2007-01-01 03:00:00 10
#> 5 2007-01-01 04:00:00 18
#> 6 2007-01-01 05:00:00 227
summarised <- data %>%
mutate(date = date(time)) %>%
group_by(date) %>%
summarise(total = sum(variable))
head(summarised)
#> # A tibble: 6 x 2
#> date total
#> <date> <int>
#> 1 2007-01-01 5205
#> 2 2007-01-02 3938
#> 3 2007-01-03 5865
#> 4 2007-01-04 5157
#> 5 2007-01-05 4702
#> 6 2007-01-06 4625
summarised %>%
ggplot(aes(date, total)) +
geom_line()
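The question also asks for month names on the x-axis. A sketch that adds scale_x_date() with one tick per month to the daily sums, using the same kind of random hourly data as above (the values are not meaningful). The "%b" format gives abbreviated month names in the current locale.

```r
library(dplyr)
library(ggplot2)
library(lubridate)

# Random hourly data for 2007, as in the answer above
set.seed(1)
data <- tibble(
  time = seq(ymd_hms("2007-01-01 00:00:00"), ymd_hms("2007-12-31 23:00:00"), by = "hour"),
  variable = sample(0:400, 8760, replace = TRUE)
)

# One row per calendar day holding that day's total
summarised <- data %>%
  mutate(date = as_date(time)) %>%
  group_by(date) %>%
  summarise(total = sum(variable))

# One tick per month, labelled with the abbreviated month name
p <- ggplot(summarised, aes(date, total)) +
  geom_line() +
  scale_x_date(date_breaks = "1 month", date_labels = "%b")
```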
To get a sum for every month of every year, you need a column that identifies a specific month of a specific year (Yearmon).
Then you can group by that column and sum within each group, giving one sum for every month of every year.
Then plot the result and set the x-axis labels to your liking.
library(ggplot2)
library(dplyr)
library(zoo)
library(scales)
# Create dummy data for time column
time <- seq.POSIXt(from = as.POSIXct("2007-01-01 00:00:00"),
to = as.POSIXct("2017-01-01 23:00:00"),
by = "hour")
# Create dummy data.frame
data <- data.frame(Time = time,
GHI = rnorm(length(time)))
############################
# Add column Yearmon to the data.frame
# Group by Yearmon and summarise with sum
# This creates one sum per Yearmon
# ungroup is often not necessary; however,
# not doing this caused problems for me in the past
# Change type of Yearmon to Date for ggplot
#
df <- mutate(data,
Yearmon = as.yearmon(Time)) %>%
group_by(Yearmon) %>%
summarise(GHI_sum = sum(GHI)) %>%
ungroup() %>%
mutate(Yearmon = as.Date(Yearmon))
# Plot the chart with custom scale labels
ggplot(df, aes(Yearmon, GHI_sum))+
geom_line()+
scale_x_date(labels = date_format("%m/%y"))
I hope this helps.
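If you would rather not depend on zoo, lubridate's floor_date() can build the same month key. This swaps in a different technique from the one the answer above uses and is only a sketch; it assumes one year of dummy hourly data with UTC timestamps.

```r
library(dplyr)
library(ggplot2)
library(lubridate)

# Dummy hourly data for one year (assumption: UTC timestamps)
time <- seq(as.POSIXct("2007-01-01 00:00:00", tz = "UTC"),
            as.POSIXct("2007-12-31 23:00:00", tz = "UTC"),
            by = "hour")
set.seed(1)
data <- data.frame(Time = time, GHI = rnorm(length(time)))

# floor_date() maps every timestamp to the first instant of its month;
# converting to Date gives a key that scale_x_date() understands
monthly <- data %>%
  mutate(Month = as_date(floor_date(Time, "month"))) %>%
  group_by(Month) %>%
  summarise(GHI_sum = sum(GHI))

p <- ggplot(monthly, aes(Month, GHI_sum)) +
  geom_line() +
  scale_x_date(date_labels = "%m/%y")
```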
