My real goal here is to use the numeric value for a month in table 1 (i.e. January = 01, ... December =12; years are present as a separate column) and find a value in table 2 where the value returned is from one month earlier. The problem I do not know how to deal with is when the month from table 1 is January (i.e. 2014-01), how would I return the value from table 2 related to December 2013 (i.e. 2013-12)?
I'm thinking that there is a package that has a process to decrement the date/month accounting for the beginning of the year condition I describe above. I do not have an issue converting the month and year columns into actual dates to accomplish this task.
year1 <- c(2013, 2013, 2014)
year2 <- c(2013, 2013, 2014)
month1 <- c(04, 08, 01)
month2 <- c(03, 12, 08)
value1 <- c(4,6,10)
value2 <- c(6,3,8)
df1 <- data.frame(year1, month1, value1)
df2 <- data.frame(year2, month2, value2)
Given the date combination of 2014-01 from df1, the expected output from df2 would be value2 = 3 from date combination 2013-12.
Thanks in advance
I find it more convenient to work with Date objects because it's easier to add/subtract days or months (thanks to the lubridate package). So, the idea is to use the first day of a month as date field instead of separate fields for year and month.
In addition, I prefer data.table for data manipulation.
# initial data
df1 <- data.frame(year1=c(2013, 2013, 2014), month1=c(04, 08, 01), value1=c(4,6,10))
df2 <- data.frame(year2=c(2013, 2013, 2014), month2=c(03, 12, 08), value2=c(6,3,8))
library(data.table) # CRAN version 1.10.4 used
library(lubridate) # CRAN version 1.6.0 used
# coerce 1st data.frame to data.table,
# create date from year and month, skip year and month columns,
# create join date which is one month earlier
DT1 <- setDT(df1)[, .(date1 = as.Date(sprintf("%4i-%02i-01", year1, month1)),
value1)][, join.date := date1 - months(1L)]
# coerce 2nd data.frame to data.table,
# create date from year and month, skip year and month columns,
DT2 <- setDT(df2)[, .(date2 = as.Date(sprintf("%4i-%02i-01", year2, month2)),
value2)]
# right join: take all rows of DT1
DT2[DT1, on = c(date2 = "join.date")]
# date2 value2 date1 value1
#1: 2013-03-01 6 2013-04-01 4
#2: 2013-07-01 NA 2013-08-01 6
#3: 2013-12-01 3 2014-01-01 10
You can merge the dataframes (after some manipulation):
df1 <- data.frame(year1=c(2013, 2013, 2014), month1=c(04, 08, 01), value1=c(4,6,10))
df2 <- data.frame(year2=c(2013, 2013, 2014), month2=c(03, 12, 08), value2=c(6,3,8))
df1$month2 <- ifelse(df1$month1 == 1, 12, df1$month1 - 1)
df1$year2 <- ifelse(df1$month1 == 1, df1$year1 - 1, df1$year1)
merge(df1, df2, all.x=TRUE)
# month2 year2 year1 month1 value1 value2
# 1 3 2013 2013 4 4 6
# 2 7 2013 2013 8 6 NA
# 3 12 2013 2014 1 10 3
It's a bit of a workaround, but here is an idea that might help: instead of only subtracting 1, subtract 2, use the modulo operator and then add 1 back.
i = 1:12
((i - 2) %% 12) + 1
[1] 12 1 2 3 4 5 6 7 8 9 10 11
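The modulo idea also handles the year if year and month are first folded into a single month counter. The following is only a minimal sketch, assuming the df1/df2 columns from the question; the helper name prev_month() is made up for illustration.

```r
# Hypothetical helper: step back one month, rolling the year over in January.
prev_month <- function(year, month) {
  idx <- year * 12 + (month - 1) - 1   # running month count, minus one month
  data.frame(year2 = idx %/% 12, month2 = idx %% 12 + 1)
}

df1 <- data.frame(year1 = c(2013, 2013, 2014), month1 = c(4, 8, 1), value1 = c(4, 6, 10))
df2 <- data.frame(year2 = c(2013, 2013, 2014), month2 = c(3, 12, 8), value2 = c(6, 3, 8))

# Attach the previous year/month to df1, then look it up in df2.
keyed <- cbind(df1, prev_month(df1$year1, df1$month1))
res <- merge(keyed, df2, by = c("year2", "month2"), all.x = TRUE)
res <- res[order(res$year1, res$month1), ]
res
```

Rows of df1 with no match in df2 keep NA in value2, the same as in the merge-based answer.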
Related
I have a dataframe with dates from April 2020 to today. Right now they are labelled 1 to 492, with 1 being the first date I have data on. I also have a list of dates in the format I want. How can I tell R that date 1 is April 12, 2020, date 2 is April 13, 2020, and so on for each date? I'm OK with either replacing the values in the column or creating a new column called real_date next to it.
Update:
Sorry I didn't describe this very well. I ended up making a look-up table with the date number and real date, and I used the inner_join function to add the real date to my dataframe.
library(tidyverse)
library(lubridate)
#Creating a sample data.frame
df <-
tibble(
dates = seq.Date(dmy("01/04/20"),today(),by = "1 day")
)
df %>%
#Format date, where: %B = full month name, %d = numeric day, %Y = four-digit year
mutate(
new_date = format(dates,"%B %d %Y")
)
*Abril is April in Portuguese (the month name produced by %B depends on your locale).
If I have understood the question correctly, you have a dataframe which has numbers from 1 to 492, now you want to change them to dates where number 1 is 12th April 2020, number 2 is 13th April 2020 and so on.
You can use as.Date to convert these numbers to date and pass the origin as 11th April.
df <- data.frame(date = 1:492)
df$real_date <- as.Date(df$date, origin = '2020-04-11')
head(df)
# date real_date
#1 1 2020-04-12
#2 2 2020-04-13
#3 3 2020-04-14
#4 4 2020-04-15
#5 5 2020-04-16
#6 6 2020-04-17
Just create a sequence of dates
data.frame(date = seq(as.Date('2020-04-12'), length.out = 492,
by = '1 day'), code = 1:492)
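As a quick cross-check, the origin-based conversion, the date sequence, and plain date arithmetic should all give the same result. This is just a sketch; the variable names are illustrative and not from the original posts.

```r
n <- 1:492
first_day <- as.Date("2020-04-12")

by_origin <- as.Date(n, origin = "2020-04-11")                 # day 0 is April 11
by_seq    <- seq(first_day, length.out = length(n), by = "1 day")
by_offset <- first_day + (n - 1)                                # plain date arithmetic

# TRUE when all three approaches agree
all(by_origin == by_seq, by_seq == by_offset)
```

The origin and offset forms also work when the day numbers are not consecutive or not sorted, whereas the sequence relies on the rows being in order with no gaps.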
I am trying to convert a column in my dataset that contains week numbers into weekly Dates. I was trying to use the lubridate package but could not find a solution. The dataset looks like the one below:
df <- tibble(week = c("202009", "202010", "202011","202012", "202013", "202014"),
Revenue = c(4543, 6764, 2324, 5674, 2232, 2323))
So I would like to create a Date column in a weekly format, e.g. (2020-03-07, 2020-03-14).
Would anyone know how to convert these week numbers into weekly dates?
Maybe there is a more automated way, but try something like this. I think this gets the right days; I checked against a 2020 calendar and counted. If something is off, it's a matter of adjusting the (week - 1) * 7 - 1 component to return what you want.
This just grabs the first day of the year, adds x weeks worth of days, and then uses ceiling_date() to find the next Sunday.
library(dplyr)
library(tidyr)
library(lubridate)
df %>%
separate(week, c("year", "week"), sep = 4, convert = TRUE) %>%
mutate(date = ceiling_date(ymd(paste(year, "01", "01", sep = "-")) +
(week - 1) * 7 - 1, "week", week_start = 7))
# # A tibble: 6 x 4
# year week Revenue date
# <int> <int> <dbl> <date>
# 1 2020 9 4543 2020-03-01
# 2 2020 10 6764 2020-03-08
# 3 2020 11 2324 2020-03-15
# 4 2020 12 5674 2020-03-22
# 5 2020 13 2232 2020-03-29
# 6 2020 14 2323 2020-04-05
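The same arithmetic can be sketched in base R, which may make the steps easier to adjust. The helper name week_to_sunday() is made up, and the sketch assumes the week column is a six-character "YYYYWW" string as in the question. It also assumes that a start date already falling on a Sunday should move to the following Sunday.

```r
# Hypothetical helper mirroring the pipeline above without lubridate.
week_to_sunday <- function(yyyyww) {
  year <- as.integer(substr(yyyyww, 1, 4))
  week <- as.integer(substr(yyyyww, 5, 6))
  # first day of the year plus (week - 1) weeks, minus one day
  start <- as.Date(sprintf("%04d-01-01", year)) + (week - 1) * 7 - 1
  wday <- as.POSIXlt(start)$wday   # 0 = Sunday, ..., 6 = Saturday
  start + (7 - wday)               # roll forward to the next Sunday
}

weekly <- week_to_sunday(c("202009", "202010", "202011", "202012", "202013", "202014"))
weekly
```

For the six sample weeks this gives the same Sundays as the table above (2020-03-01 through 2020-04-05).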
I have a 'decimal month' and a year variable:
df <- data.frame(decimal_month = c(4.75, 5, 5.25), year = c(2011, 2011, 2011))
How can I convert these variables to a Date? ("2011-04-22" "2011-05-01" "2011-05-08"). Or at least to day of the year.
You may use some nice functions from the zoo package:
as.yearmon to convert year and floor of the decimal month to class yearmon.
Then use as.Date.yearmon and its frac argument to coerce the year-month to class Date.
library(zoo)
df$date = as.Date(as.yearmon(paste(df$year, floor(df$decimal_month), sep = "-")),
frac = df$decimal_month - floor(df$decimal_month))
# decimal_month year date
# 1 4.75 2011 2011-04-22
# 2 5.00 2011 2011-05-01
# 3 5.25 2011 2011-05-08
If desired, day of year is simply format(df$date, "%j")
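If zoo is not available, the same dates can be approximated in base R by interpolating between the first and last day of the month. This is a sketch of one interpretation of the fractional part, not a description of zoo's internals.

```r
df <- data.frame(decimal_month = c(4.75, 5, 5.25), year = c(2011, 2011, 2011))

month <- floor(df$decimal_month)
frac  <- df$decimal_month - month

first <- as.Date(sprintf("%04d-%02d-01", as.integer(df$year), as.integer(month)))
# first + 31 always lands in the next month; stepping back by its day-of-month
# gives the last day of the original month.
last <- first + 31
last <- last - as.integer(format(last, "%d"))

# Interpolate between first and last day, keeping whole days only.
df$date <- first + floor(frac * as.numeric(last - first))
df$doy  <- as.integer(format(df$date, "%j"))
df
```

For the three sample rows this reproduces 2011-04-22, 2011-05-01 and 2011-05-08.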
Here's my data, which has 10 years in one column and the day of the year (1 to 365) in a second column:
dat <- data.frame(year = rep(1980:1989, each = 365), doy= rep(1:365, times = 10))
I am assuming all years are non-leap years i.e. they have 365 days.
I want to create another column month which is basically month of the year the day belongs to.
library(dplyr)
dat %>%
mutate(month = as.integer(ceiling(doy/31)))
However, this solution is wrong, since it assigns the wrong month to some days. I am looking for a dplyr solution, if possible.
We can convert it to a datetime class by using the appropriate format (i.e. %Y %j) and then extract the month with format
dat$month <- with(dat, format(strptime(paste(year, doy), format = "%Y %j"), '%m'))
Or use $mon to extract the month and add 1
dat$month <- with(dat, strptime(paste(year, doy), format = "%Y %j")$mon + 1)
tail(dat$month)
#[1] 12 12 12 12 12 12
This should give you an integer value for the months (month() here comes from the lubridate package):
library(lubridate)
dat$month.num <- month(as.Date(paste(dat$year, dat$doy), '%Y %j'))
If you want the month names:
dat$month.names <- month.name[month(as.Date(paste(dat$year, dat$doy), '%Y %j'))]
The result (only showing a few rows):
> dat[29:33,]
year doy month.num month.names
29 1980 29 1 January
30 1980 30 1 January
31 1980 31 1 January
32 1980 32 2 February
33 1980 33 2 February
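One caveat: 1980, 1984 and 1988 are leap years, so parsing the real year together with %j puts day 60 on February 29, which conflicts with the question's assumption of 365-day years. A minimal sketch that avoids this by mapping every row onto a fixed non-leap reference year (2001 is an arbitrary choice, not from the original posts):

```r
dat <- data.frame(year = rep(1980:1989, each = 365), doy = rep(1:365, times = 10))

# 2001 is an arbitrary non-leap reference year; only the month is kept.
ref <- as.Date(dat$doy - 1, origin = "2001-01-01")
dat$month <- as.integer(format(ref, "%m"))

# Day 59 is the last day of February, day 60 the first day of March.
dat[dat$year == 1980 & dat$doy %in% c(59, 60), ]
```

The same expression can be placed inside dplyr's mutate() if a pipeline is preferred.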
I have the following data set. I am trying to split the date_1 field into month and day, and then convert the month number to a month name.
date_1,no_of_births_1
1/1,1482
2/2,1213
3/23,1220
4/4,1319
5/11,1262
6/18,1271
I am using month.abb[] to convert the month number to a name. But instead of giving the correct month name for each month number, the result is wrong.
For example, month.abb[2] is generating Apr instead of Feb.
date_1 no_of_births_1 V1 V2 month
1 1/1 1482 1 1 Jan
2 2/2 1213 2 2 Apr
3 3/23 1220 3 23 May
4 4/4 1319 4 4 Jun
5 5/11 1262 5 11 Jul
6 6/18 1271 6 18 Aug
Below is the code I am using:
birthday<-read.csv("Birthday_s.csv",header = TRUE)
birthday$date_1<-as.character(birthday$date_1)
#split the data
listx<-sapply(birthday$date_1,function(x) strsplit(x,"/"))
library(base)
#convert to data frame
mat<-as.data.frame(matrix(unlist(listx),ncol = 2, byrow = TRUE))
#combine birthday and mat
birthday2<-cbind(birthday,mat)
#convert month number to month name
birthday2$month<-sapply(birthday2$V1, function(x) month.abb[as.numeric(x)])
When I run your code, I get the correct months. However, your code is more complicated than necessary. Here are two ways to extract month and day from date_1:
First, when you read the data, use stringsAsFactors=FALSE, which prevents strings from getting converted to factors.
birthday <- read.csv("Birthday_s.csv",header = TRUE, stringsAsFactors=FALSE)
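One plausible cause of the wrong month names, assuming the month column ended up as a factor (the default for character data in data frames before R 4.0.0), is that as.numeric() on a factor returns the internal level codes rather than the printed values. A small sketch with made-up values:

```r
x <- c("1", "2", "10", "12")
f <- factor(x)                               # levels sort as text: "1" "10" "12" "2"

codes  <- month.abb[as.numeric(f)]                # indexes by level codes 1 4 2 3
values <- month.abb[as.numeric(as.character(f))]  # indexes by the values 1 2 10 12

codes
values
```

Here "2" is the fourth level, so the first form maps it to Apr, while converting through as.character() first gives Feb.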
Extract month and days using date functions:
library(lubridate)
birthday$month = month(as.POSIXct(birthday$date_1, format="%m/%d"), abbr=TRUE, label=TRUE)
birthday$day = day(as.POSIXct(birthday$date_1, format="%m/%d"))
Extract month and days using Regular Expressions:
birthday$month = month.abb[as.numeric(gsub("([0-9]{1,2}).*", "\\1", birthday$date_1))]
birthday$day = as.numeric(gsub(".*/([0-9]{1,2}$)", "\\1", birthday$date_1))
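A third option, sketched here in base R only, is to split on the slash and bind the pieces into a two-column matrix. The data frame below is typed in from the sample rows in the question rather than read from Birthday_s.csv.

```r
birthday <- data.frame(
  date_1 = c("1/1", "2/2", "3/23", "4/4", "5/11", "6/18"),
  no_of_births_1 = c(1482, 1213, 1220, 1319, 1262, 1271),
  stringsAsFactors = FALSE
)

# Each element of the list is c(month, day); rbind stacks them row by row.
parts <- do.call(rbind, strsplit(birthday$date_1, "/", fixed = TRUE))
birthday$month <- month.abb[as.integer(parts[, 1])]
birthday$day   <- as.integer(parts[, 2])
birthday
```

Because the pieces stay character until as.integer() is applied, no factor conversion is involved.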