Year-on-year change column in a data frame of monthly data in R

I need to add a column to a data frame of monthly data that holds the year-on-year change for every record that has a corresponding record for the same month one year earlier.
My code so far is:
library(blsAPI)
cpi <- blsAPI("CUUR0000SA0",2,TRUE)
cpi$value <- as.numeric(cpi$value)
cpi$date <- as.Date(
paste0("1 ",cpi$periodName," ",cpi$year),
format = "%d %B %Y")
cpi <- cpi[order(cpi$date),]
I would like to add a new column with the YoY change of the cpi$value column.

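Something like this, which shifts each date forward by one year (first with a fixed 365 days, then with lubridate so that leap years are handled):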
df <- data.frame(Date = c("2021-01-16", "2017-05-09"))
df |> dplyr::mutate(new = as.Date(Date) + 365)
#> Date new
#> 1 2021-01-16 2022-01-16
#> 2 2017-05-09 2018-05-09
library(lubridate)
df |>
dplyr::mutate(new = as_date(Date) %m+% years(1))
#> Date new
#> 1 2021-01-16 2022-01-16
#> 2 2017-05-09 2018-05-09
Created on 2022-02-11 by the reprex package (v2.0.1)
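The snippets above only shift dates by a year; they do not compute the change itself. A minimal sketch of one way to build the year-on-year column follows, assuming one row per month and a date/value layout like the question's cpi. The sample data frame and the yoy column name are made up for illustration; rows with no matching month a year earlier get NA.

```r
# Hypothetical stand-in for the question's cpi data: 15 consecutive months
cpi <- data.frame(
  date  = seq(as.Date("2020-01-01"), by = "month", length.out = 15),
  value = 100 + 0:14
)

# Month index (year * 12 + month), so "same month a year ago" is index - 12
key <- as.integer(format(cpi$date, "%Y")) * 12L +
  as.integer(format(cpi$date, "%m"))

# Position of the row one year earlier; NA where no such row exists
prev <- match(key - 12L, key)

# Year-on-year change in percent (NA when there is no earlier record)
cpi$yoy <- (cpi$value / cpi$value[prev] - 1) * 100
```

Matching on a month index rather than lagging by 12 rows is a design choice: it should still line up the right months if the series has gaps.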

Related

extract year and month from character date field R

I have a column in my large data set called Date. How do I extract both the year and the month from it? I would like to create a column Month with values from 1 to 12 and a column Year that runs from the first year in my data set to the last.
Thanks.
> typeof(data$Date)
[1] "character"
> head(data$Date)
[1] "2/06/2020 11:23" "12/06/2020 7:56" "12/06/2020 7:56" "29/06/2020 16:54" "3/06/2020 15:09" "25/06/2020 17:11"
dplyr and lubridate -
library(dplyr)
library(lubridate)
data <- data %>%
mutate(Date = dmy_hm(Date),
month = month(Date),
year = year(Date))
# Date month year
#1 2020-06-02 11:23:00 6 2020
#2 2020-06-12 07:56:00 6 2020
#3 2020-06-12 07:56:00 6 2020
#4 2020-06-29 16:54:00 6 2020
#5 2020-06-03 15:09:00 6 2020
#6 2020-06-25 17:11:00 6 2020
Base R -
data$Date <- as.POSIXct(data$Date, tz = 'UTC', format = '%d/%m/%Y %H:%M')
# format() returns character, so convert to integer to get Month as 1-12
data <- transform(data, Month = as.integer(format(Date, '%m')), Year = as.integer(format(Date, '%Y')))
data
data <- structure(list(Date = c("2/06/2020 11:23", "12/06/2020 7:56",
"12/06/2020 7:56", "29/06/2020 16:54", "3/06/2020 15:09", "25/06/2020 17:11"
)), class = "data.frame", row.names = c(NA, -6L))

In R, using one date column, how do I subtract the value two rows above from the current row and put the result in a new column?

I have a date column and I want to just use the information in this one column
Date
2020-01-05
2020-01-30
2020-01-20
2020-01-10
2020-01-15
2020-01-30
I create a new column
df$`3to1_difference`
The function I want to create gives me this result: each row's date minus the date two rows above it (the third row minus the first row, and so on).
Date       | 3to1_difference
2020-01-05 | N/A
2020-01-30 | N/A
2020-01-20 | 15
2020-01-10 | -20
2020-01-15 | -5
2020-01-30 | 20
library(lubridate)
library(tibble)
library(dplyr)
tbl <- tibble::tibble(date = lubridate::as_date( c("2020-01-05", "2020-01-30","2020-01-20", "2020-01-10", "2020-01-15", "2020-01-30")))
tbl %>% mutate(`3to1difference` = date - lag(date, n = 2)) ## as difference in days
tbl %>% mutate(`3to1difference` = as.numeric(date - lag(date, n = 2))) ## as numeric variable
An option with data.table
library(data.table)
# shift(date, n = 2) is the date two rows above; subtract it from the current date
setDT(tbl)[, `3to1difference` := as.numeric(date - shift(date, n = 2))]
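For comparison, a base R sketch of the same idea without any packages. It assumes the six dates from the question; the column name diff_3to1 is made up here so that it is a syntactic R name and needs no backticks.

```r
# The six dates from the question
df <- data.frame(Date = as.Date(c(
  "2020-01-05", "2020-01-30", "2020-01-20",
  "2020-01-10", "2020-01-15", "2020-01-30"
)))

# diff(lag = 2) computes x[i] - x[i - 2] in days; the first two rows have
# no row two positions above, so they are padded with NA
df$diff_3to1 <- c(NA, NA, as.numeric(diff(df$Date, lag = 2)))
```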

Getting first and last day of each month in R

I need to get the rows of the first and last day of each month in a big data frame, so that the operations I apply in a for loop cover each month exactly. Unfortunately, the data frame is not very homogeneous. Here is a reproducible example to work with:
dataframe <- data.frame(Date=c(seq.Date(as.Date("2020-01-01"),as.Date("2020-01-31"),by="day"),
seq.Date(as.Date("2020-02-01"),as.Date("2020-02-28"),by="day"),seq.Date(as.Date("2020-03-02"),
as.Date("2020-03-31"),by="day")))
We can create a grouping column by converting to yearmon and then take the first and last date in each group:
library(zoo)
library(dplyr)
dataframe %>%
group_by(yearMon = as.yearmon(Date)) %>%
summarise(First = first(Date), Last = last(Date))
# A tibble: 3 x 3
# yearMon First Last
#* <yearmon> <date> <date>
#1 Jan 2020 2020-01-01 2020-01-31
#2 Feb 2020 2020-02-01 2020-02-28
#3 Mar 2020 2020-03-02 2020-03-31
If you need the calendar first and last day of each month, irrespective of the data:
library(lubridate)
dataframe %>%
group_by(yearMon = as.yearmon(Date)) %>%
summarise(First = floor_date(first(Date), 'month'),
Last = ceiling_date(last(Date), 'month')-1)
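The question asks for the rows, not only the dates. A minimal base R sketch of getting the row positions per month follows, assuming the data frame is already sorted by Date (otherwise the smallest and largest row index in a month need not be its first and last day). The names ym, first_row, last_row and bounds are made up for illustration.

```r
# The example data from the question (March starts on the 2nd)
dataframe <- data.frame(Date = c(
  seq.Date(as.Date("2020-01-01"), as.Date("2020-01-31"), by = "day"),
  seq.Date(as.Date("2020-02-01"), as.Date("2020-02-28"), by = "day"),
  seq.Date(as.Date("2020-03-02"), as.Date("2020-03-31"), by = "day")
))

# Year-month label for each row, used as the grouping key
ym <- format(dataframe$Date, "%Y-%m")
rows <- seq_len(nrow(dataframe))

# Smallest and largest row index within each month
first_row <- tapply(rows, ym, min)
last_row  <- tapply(rows, ym, max)

bounds <- data.frame(
  month     = names(first_row),
  first_row = as.vector(first_row),
  last_row  = as.vector(last_row)
)
```

A for loop over the rows of bounds can then slice dataframe[bounds$first_row[i]:bounds$last_row[i], ] for each month.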

Converting date and time data

I have date data formatted in an odd way that I would like to clean up in R.
The dates are in format "d-Mon-y hh:mm:sec AM". For example "1-Feb-05 12:00:00 AM". The day and time are useless to me, however I would like to be able to use the month and year while also converting them to date-time format.
I cannot figure out how to do this.
Here is a way to do it with handy lubridate parsers and extractors. First convert the string into a datetime and then extract the month and the year:
library(tidyverse)
library(lubridate)
tibble(datetime = "1-Feb-05 12:00:00 AM") %>%
mutate(
datetime = dmy_hms(datetime),
year = year(datetime),
month = month(datetime)
)
#> # A tibble: 1 x 3
#> datetime year month
#> <dttm> <dbl> <dbl>
#> 1 2005-02-01 00:00:00 2005 2
Created on 2018-05-09 by the reprex package (v0.2.0).
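The same parse can be sketched in base R without lubridate. This assumes an English locale, because the %b (month abbreviation) and %p (AM/PM) format codes are locale-dependent; the variable names are made up for illustration.

```r
x <- "1-Feb-05 12:00:00 AM"

# %d day, %b abbreviated month name, %y two-digit year,
# %I hour on a 12-hour clock, %p AM/PM indicator
# (assumes English month and AM/PM names in the current locale)
dt <- as.POSIXct(x, format = "%d-%b-%y %I:%M:%S %p", tz = "UTC")

# Extract year and month as integers
yr  <- as.integer(format(dt, "%Y"))
mon <- as.integer(format(dt, "%m"))
```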

Plotting daily summed values of data against months

I am trying to make a ggplot of solar irradiance (from a weather file) on the y-axis against time in months on the x-axis.
My data consists of hourly values collected over 12 months, so there are 8760 rows of data in total.
I want each day to appear as a single point on the plot, obtained by adding up all the values for that day, rather than plotting every hourly value. I believe geom_freqpoly() can plot this type of data, but I have not found enough examples of the kind I need. Alternatively, some other approach may give me the plot I want, as I am not sure what exactly I have to do to add up the points for a day; writing code for each of the 365 days by hand would be crazy.
I want the following kind of plot
My plot is showing all the reading for a year and looks like this
My code for this plot is:
library(ggplot2)
library(lubridate) # month(), mday()
library(tidyr)     # unite()
cmsaf_data <- read.csv("C://Users//MEJA03514//Desktop//main folder//Irradiation data//tmy_era_25.796_45.547_2005_2014.csv",skip=16, header=T)
time<- strptime(cmsaf_data[,2], format = "%m/%d/%Y %H:%M")
data <- cbind(time,cmsaf_data[5])
#data %>% select(time)
data <- data.frame(data, months = month(time),days = mday(time))
data <- unite(data, date_month, c(months, days), remove=FALSE, sep="-")
data <- subset(data, data[,2]>0)
GHI <- data[,2]
date_month <- data[,3]
ggplot(data, aes(date_month, GHI))+geom_line()
My data looks like this:
head(data)
time Global.horizontal.irradiance..W.m2.
1 2007-01-01 00:00:00 0
2 2007-01-01 01:00:00 0
3 2007-01-01 02:00:00 0
4 2007-01-01 03:00:00 0
5 2007-01-01 04:00:00 0
6 2007-01-01 05:00:00 159
Since I want one point per day, how can I sum the values so that I get 365 values for the year, and show month names on the x-axis? Perhaps something in the date and time functions can do this daily addition.
I have no idea which function or approach to use.
Your help will be appreciated!
Here is a solution using the tidyverse and lubridate packages. As you haven't provided complete sample data, I've generated some random data.
library(tidyverse)
library(lubridate)
data <- tibble(
time = seq(ymd_hms('2007-01-01 00:00:00'),
ymd_hms('2007-12-31 23:00:00'),
by='hour'),
variable = sample(0:400, 8760, replace = TRUE)
)
head(data)
#> # A tibble: 6 x 2
#> time variable
#> <dttm> <int>
#> 1 2007-01-01 00:00:00 220
#> 2 2007-01-01 01:00:00 348
#> 3 2007-01-01 02:00:00 360
#> 4 2007-01-01 03:00:00 10
#> 5 2007-01-01 04:00:00 18
#> 6 2007-01-01 05:00:00 227
summarised <- data %>%
mutate(date = date(time)) %>%
group_by(date) %>%
summarise(total = sum(variable))
head(summarised)
#> # A tibble: 6 x 2
#> date total
#> <date> <int>
#> 1 2007-01-01 5205
#> 2 2007-01-02 3938
#> 3 2007-01-03 5865
#> 4 2007-01-04 5157
#> 5 2007-01-05 4702
#> 6 2007-01-06 4625
summarised %>%
ggplot(aes(date, total)) +
geom_line()
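The question also asked for month names on the x-axis. A hedged sketch of one way to get them: aggregate to daily totals (here with base R's aggregate()) and let ggplot2's scale_x_date() place a break per month labelled with the abbreviated month name. The data is random, as above, and the names hourly, daily and p are made up for illustration.

```r
library(ggplot2)

# Random hourly data for one non-leap year (8760 rows), as a stand-in
set.seed(1)
hourly <- data.frame(
  time = seq(as.POSIXct("2007-01-01 00:00:00", tz = "UTC"),
             as.POSIXct("2007-12-31 23:00:00", tz = "UTC"),
             by = "hour"),
  variable = sample(0:400, 8760, replace = TRUE)
)

# One row per calendar day holding the sum of that day's hourly values
hourly$date <- as.Date(hourly$time)
daily <- aggregate(variable ~ date, data = hourly, FUN = sum)
names(daily)[2] <- "total"

# One axis break per month, labelled with the abbreviated month name
p <- ggplot(daily, aes(date, total)) +
  geom_line() +
  scale_x_date(date_breaks = "1 month", date_labels = "%b")
```

Printing p draws the chart; "%b" can be swapped for "%B" to get full month names.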
To get a sum for every month of every year, create a column that identifies a specific month of a specific year (Yearmon).
Then group by that column and sum within each group, which gives one sum per month per year.
Finally, plot the result and set the x-axis labels to your liking.
library(ggplot2)
library(dplyr)
library(zoo)
library(scales)
# Create dummy data for time column
time <- seq.POSIXt(from = as.POSIXct("2007-01-01 00:00:00"),
to = as.POSIXct("2017-01-01 23:00:00"),
by = "hour")
# Create dummy data.frame
data <- data.frame(Time = time,
GHI = rnorm(length(time)))
############################
# Add column Yearmon to the data.frame
# Group by Yearmon and summarise with sum
# This creates one sum per Yearmon
# ungroup is often not necessary, however
# not doing this caused problems for me in the past
# Change type of Yearmon to Date for ggplot
#
df <- mutate(data,
Yearmon = as.yearmon(Time)) %>%
group_by(Yearmon) %>%
summarise(GHI_sum = sum(GHI)) %>%
ungroup() %>%
mutate(Yearmon = as.Date(Yearmon))
# Plot the chart with custom x-axis labels
ggplot(df, aes(Yearmon, GHI_sum))+
geom_line()+
scale_x_date(labels = date_format("%m/%y"))
I hope this helps.

Resources