I have a dataframe with dates from April 2020 to today. Right now they are labelled 1 to 492, with 1 being the first date I have data on. I also have a list of dates in the format I want. How can I tell R that date 1 is April 12, 2020, date 2 is April 13, 2020, and so on for each date? I'm OK with either replacing the values in the column or creating a new column called real_date next to it.
Update:
Sorry I didn't describe this very well. I ended up making a look-up table with the date number and real date, and I used the inner_join function to add the real date to my dataframe.
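A minimal sketch of the look-up-table approach described in the update. The column names date_num and value are hypothetical (the original column names are not given), and day 1 is assumed to be 2020-04-12 as stated in the question:

```r
library(dplyr)

# Hypothetical data: day numbers 1..492 plus some measurement column
df <- tibble(date_num = 1:492, value = seq_len(492) * 2)

# Look-up table mapping each day number to its calendar date
lookup <- tibble(
  date_num  = 1:492,
  real_date = seq(as.Date("2020-04-12"), by = "day", length.out = 492)
)

# inner_join keeps only rows whose date_num appears in both tables
df <- inner_join(df, lookup, by = "date_num")
head(df, 3)
```

An inner join silently drops day numbers that are missing from the look-up table, so left_join may be preferable if every row of the data must survive.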
library(tidyverse)
library(lubridate)
#Creating a sample data.frame
df <-
tibble(
dates = seq.Date(dmy("01/04/20"),today(),by = "1 day")
)
df %>%
#Format date, where: %B = full month name, %d = numeric day and %Y = four-digit year
mutate(
new_date = format(dates,"%B %d %Y")
)
*Abril is April in Portuguese; format() prints month names in the language of the session's locale.
If I have understood the question correctly, you have a dataframe which has numbers from 1 to 492, now you want to change them to dates where number 1 is 12th April 2020, number 2 is 13th April 2020 and so on.
You can use as.Date to convert these numbers to dates by passing 11th April 2020 as the origin. The origin is day 0, so number 1 becomes 12th April 2020.
df <- data.frame(date = 1:492)
df$real_date <- as.Date(df$date, origin = '2020-04-11')
head(df)
# date real_date
#1 1 2020-04-12
#2 2 2020-04-13
#3 3 2020-04-14
#4 4 2020-04-15
#5 5 2020-04-16
#6 6 2020-04-17
Just create a sequence of dates
data.frame(date = seq(as.Date('2020-04-12'), length.out = 492,
by = '1 day'), code = 1:492)
Related
I am trying to convert a column in my dataset that contains week numbers into weekly Dates. I was trying to use the lubridate package but could not find a solution. The dataset looks like the one below:
df <- tibble(week = c("202009", "202010", "202011","202012", "202013", "202014"),
Revenue = c(4543, 6764, 2324, 5674, 2232, 2323))
So I would like to create a Date column in a weekly format, e.g. (2020-03-07, 2020-03-14).
Would anyone know how to convert these week numbers into weekly dates?
Maybe there is a more automated way, but try something like this. I think this gets the right days; I looked at a 2020 calendar and counted. If something is off, it's a matter of playing with the (week - 1) * 7 - 1 component to return what you want.
This just grabs the first day of the year, adds x weeks worth of days, and then uses ceiling_date() to find the next Sunday.
library(dplyr)
library(tidyr)
library(lubridate)
df %>%
separate(week, c("year", "week"), sep = 4, convert = TRUE) %>%
mutate(date = ceiling_date(ymd(paste(year, "01", "01", sep = "-")) +
(week - 1) * 7 - 1, "week", week_start = 7))
# # A tibble: 6 x 4
# year week Revenue date
# <int> <int> <dbl> <date>
# 1 2020 9 4543 2020-03-01
# 2 2020 10 6764 2020-03-08
# 3 2020 11 2324 2020-03-15
# 4 2020 12 5674 2020-03-22
# 5 2020 13 2232 2020-03-29
# 6 2020 14 2323 2020-04-05
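As a cross-check of the arithmetic in this answer, here is a base R sketch of the same steps for the 2020 weeks in the question. It is an illustration, not the answer's code: start, days_to_sunday and sunday are names introduced here, and unlike ceiling_date() (which may push a Date that already sits on the boundary to the following week) this version leaves a start date that is already a Sunday unchanged.

```r
# Week numbers from the question (all in 2020)
week <- c(9, 10, 11, 12, 13, 14)

# First day of the year plus (week - 1) * 7 - 1 days
start <- as.Date("2020-01-01") + (week - 1) * 7 - 1

# "%u" is the ISO weekday number (1 = Monday ... 7 = Sunday);
# count the days needed to reach the next Sunday
days_to_sunday <- (7 - as.integer(format(start, "%u"))) %% 7
sunday <- start + days_to_sunday

format(sunday)
# "2020-03-01" "2020-03-08" "2020-03-15" "2020-03-22" "2020-03-29" "2020-04-05"
```

For week 9, the start date is 2020-02-25 (a Tuesday), and the next Sunday is 2020-03-01, which matches the first row of the output above.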
I have a data frame called data which looks like this:
Day Month Year
2 April 2015
5 May 2014
23 December 2017
The code is:
data <- data.frame(Day = c(2,5,23),
Month = c("April", "May", "December"),
Year = c(2015, 2014, 2017))
I want to create a new column that looks like this:
Day Month Year Date
2 April 2015 2/4/2015
5 May 2014 5/5/2014
23 December 2017 23/12/2017
To do this, I tried:
data <- data %>%
mutate(Date = as.Date(paste(Day, Month, Year, sep = "/"))) %>%
dmy()
But I got an error which says:
Error in charToDate(x) :
character string is not in a standard unambiguous format
Is there an obvious error that I'm not seeing?
Thank you so much.
We need to pass the appropriate format to as.Date, because a string like "2/April/2015" is not in a standard format it can guess. Using base R, we can do
transform(data, Date = as.Date(paste(Day, Month, Year, sep = "/"), "%d/%B/%Y"))
# Day Month Year Date
#1 2 April 2015 2015-04-02
#2 5 May 2014 2014-05-05
#3 23 December 2017 2017-12-23
Or with dplyr and lubridate
library(dplyr)
library(lubridate)
data %>% mutate(Date = dmy(paste(Day, Month, Year, sep = "/")))
You can add format(Date, "%d/%m/%Y") if you need to change the display format.
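A short sketch of that display step, starting from dates that are already parsed. Note that format() returns character strings, not Date objects, and that "%d" and "%m" are zero-padded; the second variant is one possible way to get the unpadded 2/4/2015 style shown in the question (no_zero is a name introduced here).

```r
# Dates as produced by the answers above
d <- as.Date(c("2015-04-02", "2014-05-05", "2017-12-23"))

# Zero-padded day/month/year strings
padded <- format(d, "%d/%m/%Y")
padded
# "02/04/2015" "05/05/2014" "23/12/2017"

# Unpadded variant: convert day and month to integers before pasting
no_zero <- paste(as.integer(format(d, "%d")),
                 as.integer(format(d, "%m")),
                 format(d, "%Y"),
                 sep = "/")
no_zero
# "2/4/2015" "5/5/2014" "23/12/2017"
```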
In R, how can I produce a list of dates of all 2nd to last Wednesdays of the month in a specified date range? I've tried a few things but have gotten inconsistent results for months with five Wednesdays.
To generate a regular sequence of dates you can use seq with dates for the from and to parameters. See the seq.Date documentation for more options.
Create a data frame with the date, the month and the weekday, then obtain the second to last Wednesday for each month with the help of aggregate.
day_sequence = seq(as.Date("2020/1/1"), as.Date("2020/12/31"), "day")
df = data.frame(day = day_sequence,
month = months(day_sequence),
weekday = weekdays(day_sequence))
#Filter only wednesdays
df = df[df$weekday == "Wednesday",]
result = aggregate(day ~ month, df, function(x){head(tail(x,2),1)})
tail(x,2) will return the last two Wednesdays of the month, then head(.., 1) will give you the first of these two.
Result:
month day
1 April 2020-04-22
2 August 2020-08-19
3 December 2020-12-23
4 February 2020-02-19
5 January 2020-01-22
6 July 2020-07-22
7 June 2020-06-17
8 March 2020-03-18
9 May 2020-05-20
10 November 2020-11-18
11 October 2020-10-21
12 September 2020-09-23
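To see why the head(tail(x, 2), 1) step also works for months with five Wednesdays, here is a small sketch on two months of 2020. The vector names jan and feb are introduced here for illustration only.

```r
# January 2020 has five Wednesdays (1 January 2020 is a Wednesday)
jan <- seq(as.Date("2020-01-01"), by = "week", length.out = 5)
# February 2020 has four Wednesdays
feb <- seq(as.Date("2020-02-05"), by = "week", length.out = 4)

# tail(x, 2) keeps the last two dates; head(..., 1) keeps the earlier of them
jan_pick <- head(tail(jan, 2), 1)
feb_pick <- head(tail(feb, 2), 1)

format(jan_pick)  # "2020-01-22"
format(feb_pick)  # "2020-02-19"
```

Counting from the end of the month rather than from the start is what makes the result independent of how many Wednesdays the month has.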
There are probably simpler ways of doing this, but the function below does what the question asks for. It returns a named vector of days such that:
- They are between from and to.
- They fall on weekday day, where 1 is Monday.
- They are the n-th to last such weekday of the month.
By n-th to last I mean the n-th counting from the end of the month.
whichWeekday <- function(from, to, day, n, format = "%Y-%m-%d"){
from <- as.Date(from, format = format)
to <- as.Date(to, format = format)
day <- as.character(day)
d <- seq(from, to, by = "days")
m <- format(d, "%Y-%m")
f <- c(TRUE, m[-1] != m[-length(m)])
f <- cumsum(f)
wed <- tapply(d, f, function(x){
i <- which(format(x, "%u") == day)
x[ tail(i, n)[1] ]
})
y <- as.Date(wed, origin = "1970-01-01")
setNames(y, format(y, "%Y-%m"))
}
whichWeekday("2019-01-01", "2020-03-31", 3, 2)  # 3 = Wednesday in "%u" numbering
# 2019-01 2019-02 2019-03 2019-04 2019-05
#"2019-01-23" "2019-02-20" "2019-03-20" "2019-04-17" "2019-05-22"
# 2019-06 2019-07 2019-08 2019-09 2019-10
#"2019-06-19" "2019-07-24" "2019-08-21" "2019-09-18" "2019-10-23"
# 2019-11 2019-12 2020-01 2020-02 2020-03
#"2019-11-20" "2019-12-18" "2020-01-22" "2020-02-19" "2020-03-18"
Here's my data, which has 10 years in one column and the day of the year (1 to 365) in a second column:
dat <- data.frame(year = rep(1980:1989, each = 365), doy= rep(1:365, times = 10))
I am assuming all years are non-leap years i.e. they have 365 days.
I want to create another column month which is basically month of the year the day belongs to.
library(dplyr)
dat %>%
mutate(month = as.integer(ceiling(doy/31)))
However, this solution is wrong since it assigns the wrong months to days. I am looking for a dplyr solution, if possible.
We can convert it to a datetime class by using the appropriate format (i.e. %Y %j) and then extract the month with format
dat$month <- with(dat, format(strptime(paste(year, doy), format = "%Y %j"), '%m'))
Or use $mon to extract the month and add 1
dat$month <- with(dat, strptime(paste(year, doy), format = "%Y %j")$mon + 1)
tail(dat$month)
#[1] 12 12 12 12 12 12
This should give you an integer value for the months (month() comes from the lubridate package):
library(lubridate)
dat$month.num <- month(as.Date(paste(dat$year, dat$doy), '%Y %j'))
If you want the month names:
dat$month.names <- month.name[month(as.Date(paste(dat$year, dat$doy), '%Y %j'))]
The result (only showing a few rows):
> dat[29:33,]
year doy month.num month.names
29 1980 29 1 January
30 1980 30 1 January
31 1980 31 1 January
32 1980 32 2 February
33 1980 33 2 February
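One caveat: parsing with "%Y %j" uses the real calendar, so in leap years (1980, 1984, 1988) day 60 is 29 February rather than 1 March. If the question's assumption of 365-day years should hold for every year, one possible sketch is a plain lookup vector. The names days_in_month and month_lookup are introduced here for illustration.

```r
dat <- data.frame(year = rep(1980:1989, each = 365),
                  doy  = rep(1:365, times = 10))

# Days per month in a non-leap year
days_in_month <- c(31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31)

# month_lookup[k] is the month number of day-of-year k (length 365)
month_lookup <- rep(1:12, times = days_in_month)

# Index the lookup by day of year; the year column is ignored on purpose
dat$month <- month_lookup[dat$doy]

dat$month[c(31, 32, 59, 60, 365)]
# 1 2 2 3 12
```

Because this is ordinary vector indexing, it drops straight into dplyr as mutate(month = month_lookup[doy]).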
My real goal here is to use the numeric value for a month in table 1 (i.e. January = 01, ... December =12; years are present as a separate column) and find a value in table 2 where the value returned is from one month earlier. The problem I do not know how to deal with is when the month from table 1 is January (i.e. 2014-01), how would I return the value from table 2 related to December 2013 (i.e. 2013-12)?
I'm thinking that there is a package that has a process to decrement the date/month accounting for the beginning of the year condition I describe above. I do not have an issue converting the month and year columns into actual dates to accomplish this task.
year1 <- c(2013, 2013, 2014)
year2 <- c(2013, 2013, 2014)
month1 <- c(04, 08, 01)
month2 <- c(03, 12, 08)
value1 <- c(4,6,10)
value2 <- c(6,3,8)
df1 <- data.frame(year1, month1, value1)
df2 <- data.frame(year2, month2, value2)
Given the date combination of 2014-01 from df1, the expected output from df2 would be value2 = 3 from date combination 2013-12.
Thanks in advance
I find it more convenient to work with Date objects because it's easier to add/subtract days or months (thanks to the lubridate package). So, the idea is to use the first day of a month as date field instead of separate fields for year and month.
In addition, I prefer data.table for data manipulation.
# initial data
df1 <- data.frame(year1=c(2013, 2013, 2014), month1=c(04, 08, 01), value1=c(4,6,10))
df2 <- data.frame(year2=c(2013, 2013, 2014), month2=c(03, 12, 08), value2=c(6,3,8))
library(data.table) # CRAN version 1.10.4 used
library(lubridate) # CRAN version 1.6.0 used
# coerce 1st data.frame to data.table,
# create date from year and month, skip year and month columns,
# create join date which is one month earlier
DT1 <- setDT(df1)[, .(date1 = as.Date(sprintf("%4i-%02i-01", year1, month1)),
value1)][, join.date := date1 - months(1L)]
# coerce 2nd data.frame to data.table,
# create date from year and month, skip year and month columns,
DT2 <- setDT(df2)[, .(date2 = as.Date(sprintf("%4i-%02i-01", year2, month2)),
value2)]
# right join: take all rows of DT1
DT2[DT1, on = c(date2 = "join.date")]
# date2 value2 date1 value1
#1: 2013-03-01 6 2013-04-01 4
#2: 2013-07-01 NA 2013-08-01 6
#3: 2013-12-01 3 2014-01-01 10
You can merge the dataframes (after some manipulation):
df1 <- data.frame(year1=c(2013, 2013, 2014), month1=c(04, 08, 01), value1=c(4,6,10))
df2 <- data.frame(year2=c(2013, 2013, 2014), month2=c(03, 12, 08), value2=c(6,3,8))
df1$month2 <- ifelse(df1$month1==1, 12, df1$month1 - 1)
df1$year2 <- ifelse(df1$month2==12, df1$year1-1, df1$year1)
merge(df1, df2, all.x=TRUE)
# month2 year2 year1 month1 value1 value2
# 1 3 2013 2013 4 4 6
# 2 7 2013 2013 8 6 NA
# 3 12 2013 2014 1 10 3
It's a bit of a workaround, but here is an idea that might help: instead of only subtracting 1, subtract 2, use the modulo operator and then add 1 back.
i = 1:12
((i - 2) %% 12) + 1
[1] 12 1 2 3 4 5 6 7 8 9 10 11
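The modulo trick only handles the month, so the year still has to drop by one when the month is January. A minimal sketch using the dates from df1 in the question; prev_month, prev_year and prev_key are names introduced here for illustration.

```r
# Year/month pairs from df1 in the question
year  <- c(2013, 2013, 2014)
month <- c(4, 8, 1)

# Previous month: shift to 0-based, step back one, wrap with %%, shift back to 1-based
prev_month <- ((month - 2) %% 12) + 1

# The year decreases by one only when the current month is January
prev_year <- year - (month == 1)

prev_key <- sprintf("%d-%02d", prev_year, prev_month)
prev_key
# "2013-03" "2013-07" "2013-12"
```

A key built this way can then be matched against the same year-month key computed for the second table.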