Subtract month/year to get years - r

data = data.frame("start"= c("1/2000","8/2004","99/9999"),
"stop"=c("1/2001","2/2007","09/2010"),
"WANTYEARS"= c(1,2.5,NA))
I have date in month/year format and want to subtract to get the years.
My attempt of simple data$stop - data$start did not yield the desired results. THank you.

The yearmon class represents months and years as years and fraction of a year.
Using data shown in the Note at the end:
library(zoo)
transform(data, diff = as.yearmon(stop, "%m/%Y") - as.yearmon(start, "%m/%Y"))
giving:
start stop diff
1 1/2000 1/2001 1.0
2 8/2004 2/2007 2.5
3 99/9999 09/2010 NA
Note
data = data.frame(start= c("1/2000", "8/2004", "99/9999"),
stop = c("1/2001", "2/2007", "09/2010"))

One option is to use difftime from base R. Add "01" to stop and start date to create an actual Date object and subtract those dates using difftime with unit as "weeks" and divide it by number of weeks in year to get time difference in year,
round(difftime(as.Date(paste0("01/", data$stop), "%d/%m/%Y"),
as.Date(paste0("01/", data$start), "%d/%m/%Y"), units = "weeks")/52.2857, 2)
#[1] 1.0 2.5 NA
We can do the same using any other unit component of difftime as well if we know the equivalent year conversion ratio like for example with "days"
round(difftime(as.Date(paste0("01/", data$stop), "%d/%m/%Y"),
as.Date(paste0("01/", data$start), "%d/%m/%Y"), units = "days")/365.25, 2)
#[1] 1.0 2.5 NA

One possibility involving dplyr and lubridate could be:
data %>%
mutate_at(vars(1:2), list(~ parse_date_time(., "my"))) %>%
mutate(WANTYEARS = round(time_length(stop - start, "years"), 1))
start stop WANTYEARS
1 2000-01-01 2001-01-01 1.0
2 2004-08-01 2007-02-01 2.5
3 <NA> 2010-09-01 NA

Related

How to subtract 1 year from a date object in R?

I have a date object as follows:
'2013-01'
'2013-02'
...
How to subtract 1 year from 2013 while keeping the month unchanged, for example
'2012-01'
'2012-02'
...
It can be done by converting to yearmon class and then subtract 1
library(zoo)
format(as.yearmon(str1) - 1, '%Y-%m')
#[1] "2012-01" "2012-02"
Similarly, for subtracting a month, use 1/12
format(as.yearmon(str1) - 1/12, '%Y-%m')
data
str1 <- c('2013-01', '2013-02')
Check with as.POSIXlt
s=as.POSIXlt(paste0(str1,'-01'))
s$year=s$year-1
format(s,'%Y-%m')
[1] "2012-01" "2012-02"

R convert yy-mm string to date format [duplicate]

I have a data frame (df) like the following:
Date Arrivals
2014-07 100
2014-08 150
2014-09 200
I know that I can convert the yearmon dates to the first date of each month as follows:
df$Date <- as.POSIXct(paste0(as.character(df[,1]),"-01"), format = "%Y-%m-%d")
However, given that my data is not available until the end of the month I want to index it to the end rather than the beginning, and I cannot figure it out. Any help appreciated.
If the Date variable is an actual yearmon class vector, from the zoo package, the as.Date.yearmon method can do what you want via its argument frac.
Using your data, and assuming that the Date was originally a character vector
library("zoo")
df <- data.frame(Date = c("2014-07", "2014-08", "2014-09"),
Arrivals = c(100, 150, 200))
I convert this to a yearmon vector:
df <- transform(df, Date2 = as.yearmon(Date))
Assuming this is what you have, then you can achieve what you want using as.Date() with frac = 1:
df <- transform(df, Date3 = as.Date(Date2, frac = 1))
which gives:
> df
Date Arrivals Date2 Date3
1 2014-07 100 Jul 2014 2014-07-31
2 2014-08 150 Aug 2014 2014-08-31
3 2014-09 200 Sep 2014 2014-09-30
That shows the individual steps. If you only want the final Date this is a one-liner
## assuming `Date` is a `yearmon` object
df <- transform(df, Date = as.Date(Date, frac = 1))
## or if not a `yearmon`
df <- transform(df, Date = as.Date(as.yearmon(Date), frac = 1))
The argument frac in the fraction of the month to assign to the resulting dates when converting from yearmon objects to Date objects. Hence, to get the first day of the month, rather than convert to a character and paste on "-01" as your Question showed, it's better to coerce to a Date object with frac = 0.
If the Date in your df is not a yearmon class object, then you can solve your problem by converting it to one and then using the as.Date() method as described above.
Here is a way to do it using the zoo package.
R code:
library(zoo)
df
# Date Arrivals
# 1 2014-07 100
# 2 2014-08 150
# 3 2014-09 200
df$Date <- as.Date(as.yearmon(df$Date), frac = 1)
# output
# Date Arrivals
# 1 2014-07-31 100
# 2 2014-08-31 150
# 3 2014-09-30 200
Using lubridate, you can add a month and subtract a day to get the last day of the month:
library(lubridate)
ymd(paste0(df$Date, '-01')) + months(1) - days(1)
# [1] "2014-07-31" "2014-08-31" "2014-09-30"

Aggregate ISO weeks into months with a dataset containing just ISO weeks

My data is in a dataframe which has a structure like this:
df2 <- data.frame(Year = c("2007"), Week = c(1:12), Measurement = c(rnorm(12, mean = 4, sd = 1)))
Unfortunately I do not have the complete date (e.g. days are missing) for each measurement, only the Year and the Weeks (these are ISO weeks).
Now I want to aggregate the Median of a Month's worth of measurements (e.g. the weekly measurements per month of the specific year) into a new column, Months. I did not find a convenient way to do this without having the exact day of the measurements available. Any inputs are much appreciated!
When it is necessary to allocate a week to a single month, the rule for first week of the year might be applied, although ISO 8601 does not consider this case. (Wikipedia)
For example, the 5th week of 2007 belongs to February, because the Thursday of the 5th week was the 1st of February.
I am using data.table and ISOweek packages. See the example how to compute the month of the week. Then you can do any aggregation by month.
require(data.table)
require(ISOweek)
df2 <- data.table(Year = c("2007"), Week = c(1:12),
Measurement = c(rnorm(12, mean = 4, sd = 1)))
# Generate Thursday as year, week of the year, day of week according to ISO 8601
df2[, thursday_ISO := paste(Year, sprintf("W%02d", Week), 4, sep = "-")]
# Convert Thursday to date format
df2[, thursday_date := ISOweek2date(thursday_ISO)]
# Compute month
df2[, month := format(thursday_date, "%m")]
df2
Suggestion by Uwe to compute a year-month string.
# Compute year-month
df2[, yr_mon := format(ISOweek2date(sprintf("%s-W%02d-4", Year, Week)), "%Y-%m")]
df2
And finally you can do an aggregation to the new table or by adding median as a column.
df2[, median(Measurement), by = yr_mon]
df2[, median := median(Measurement), by = yr_mon]
df2
If I understand correctly, you don't know the exact day, but only the week number and year. My answer takes the first day of the year as a starting date and then compute one week intervals based on that. You can probably refine the answer.
Based on
an answer by mnel, using the lubridate package.
library(lubridate)
# Prepare week, month, year information ready for the merge
# Make sure you have all the necessary dates
wmy <- data.frame(Day = seq(ymd('2007-01-01'),ymd('2007-04-01'),
by = 'weeks'))
wmy <- transform(wmy,
Week = isoweek(Day),
Month = month(Day),
Year = isoyear(Day))
# Merge this information with your data
merge(df2, wmy, by = c("Year", "Week"))
Year Week Measurement Day Month
1 2007 1 3.704887 2007-01-01 1
2 2007 10 1.974533 2007-03-05 3
3 2007 11 4.797286 2007-03-12 3
4 2007 12 4.291169 2007-03-19 3
5 2007 2 4.305010 2007-01-08 1
6 2007 3 3.374982 2007-01-15 1
7 2007 4 3.600008 2007-01-22 1
8 2007 5 4.315184 2007-01-29 1
9 2007 6 4.887142 2007-02-05 2
10 2007 7 4.155411 2007-02-12 2
11 2007 8 4.711943 2007-02-19 2
12 2007 9 2.465862 2007-02-26 2
using dplyr you can try:
require(dplyr)
df2 %>% mutate(Date = as.Date(paste("1", Week, Year, sep = "-"), format = "%w-%W-%Y"),
Year_Mon = format(Date,"%Y-%m")) %>% group_by(Year_Mon) %>%
summarise(result = median(Measurement))
As #djhrio pointed out, Thursday is used to determine the weeks in a month. So simply switch paste("1", to paste("4", in the code above.
This can be done relatively simply in dplyr.
library(dplyr)
df2 %>%
mutate(Month = rep(1:3, each = 4)) %>%
group_by(Month) %>%
summarise(MonthlyMedian = stats::median(Measurement))
Basically, add a new column to define your months. I'm presuming since you don't have days, you are going to allocate 4 weeks per month?
Then you just group by your Month variable and calculate the median. Very simple
Hope this helps

Converting yearmon column to last date of the month in R

I have a data frame (df) like the following:
Date Arrivals
2014-07 100
2014-08 150
2014-09 200
I know that I can convert the yearmon dates to the first date of each month as follows:
df$Date <- as.POSIXct(paste0(as.character(df[,1]),"-01"), format = "%Y-%m-%d")
However, given that my data is not available until the end of the month I want to index it to the end rather than the beginning, and I cannot figure it out. Any help appreciated.
If the Date variable is an actual yearmon class vector, from the zoo package, the as.Date.yearmon method can do what you want via its argument frac.
Using your data, and assuming that the Date was originally a character vector
library("zoo")
df <- data.frame(Date = c("2014-07", "2014-08", "2014-09"),
Arrivals = c(100, 150, 200))
I convert this to a yearmon vector:
df <- transform(df, Date2 = as.yearmon(Date))
Assuming this is what you have, then you can achieve what you want using as.Date() with frac = 1:
df <- transform(df, Date3 = as.Date(Date2, frac = 1))
which gives:
> df
Date Arrivals Date2 Date3
1 2014-07 100 Jul 2014 2014-07-31
2 2014-08 150 Aug 2014 2014-08-31
3 2014-09 200 Sep 2014 2014-09-30
That shows the individual steps. If you only want the final Date this is a one-liner
## assuming `Date` is a `yearmon` object
df <- transform(df, Date = as.Date(Date, frac = 1))
## or if not a `yearmon`
df <- transform(df, Date = as.Date(as.yearmon(Date), frac = 1))
The argument frac in the fraction of the month to assign to the resulting dates when converting from yearmon objects to Date objects. Hence, to get the first day of the month, rather than convert to a character and paste on "-01" as your Question showed, it's better to coerce to a Date object with frac = 0.
If the Date in your df is not a yearmon class object, then you can solve your problem by converting it to one and then using the as.Date() method as described above.
Here is a way to do it using the zoo package.
R code:
library(zoo)
df
# Date Arrivals
# 1 2014-07 100
# 2 2014-08 150
# 3 2014-09 200
df$Date <- as.Date(as.yearmon(df$Date), frac = 1)
# output
# Date Arrivals
# 1 2014-07-31 100
# 2 2014-08-31 150
# 3 2014-09-30 200
Using lubridate, you can add a month and subtract a day to get the last day of the month:
library(lubridate)
ymd(paste0(df$Date, '-01')) + months(1) - days(1)
# [1] "2014-07-31" "2014-08-31" "2014-09-30"

Split date data (m/d/y) into 3 separate columns

I need to convert date (m/d/y format) into 3 separate columns on which I hope to run an algorithm.(I'm trying to convert my dates into Julian Day Numbers). Saw this suggestion for another user for separating data out into multiple columns using Oracle. I'm using R and am throughly stuck about how to code this appropriately. Would A1,A2...represent my new column headings, and what would the format difference be with the "update set" section?
update <tablename> set A1 = substr(ORIG, 1, 4),
A2 = substr(ORIG, 5, 6),
A3 = substr(ORIG, 11, 6),
A4 = substr(ORIG, 17, 5);
I'm trying hard to improve my skills in R but cannot figure this one...any help is much appreciated. Thanks in advance... :)
I use the format() method for Date objects to pull apart dates in R. Using Dirk's datetext, here is how I would go about breaking up a date into its constituent parts:
datetxt <- c("2010-01-02", "2010-02-03", "2010-09-10")
datetxt <- as.Date(datetxt)
df <- data.frame(date = datetxt,
year = as.numeric(format(datetxt, format = "%Y")),
month = as.numeric(format(datetxt, format = "%m")),
day = as.numeric(format(datetxt, format = "%d")))
Which gives:
> df
date year month day
1 2010-01-02 2010 1 2
2 2010-02-03 2010 2 3
3 2010-09-10 2010 9 10
Note what several others have said; you can get the Julian dates without splitting out the various date components. I added this answer to show how you could do the breaking apart if you needed it for something else.
Given a text variable x, like this:
> x
[1] "10/3/2001"
then:
> as.Date(x,"%m/%d/%Y")
[1] "2001-10-03"
converts it to a date object. Then, if you need it:
> julian(as.Date(x,"%m/%d/%Y"))
[1] 11598
attr(,"origin")
[1] "1970-01-01"
gives you a Julian date (relative to 1970-01-01).
Don't try the substring thing...
See help(as.Date) for more.
Quick ones:
Julian date converters already exist in base R, see eg help(julian).
One approach may be to parse the date as a POSIXlt and to then read off the components. Other date / time classes and packages will work too but there is something to be said for base R.
Parsing dates as string is almost always a bad approach.
Here is an example:
datetxt <- c("2010-01-02", "2010-02-03", "2010-09-10")
dates <- as.Date(datetxt) ## you could examine these as well
plt <- as.POSIXlt(dates) ## now as POSIXlt types
plt[["year"]] + 1900 ## years are with offset 1900
#[1] 2010 2010 2010
plt[["mon"]] + 1 ## and months are on the 0 .. 11 intervasl
#[1] 1 2 9
plt[["mday"]]
#[1] 2 3 10
df <- data.frame(year=plt[["year"]] + 1900,
month=plt[["mon"]] + 1, day=plt[["mday"]])
df
# year month day
#1 2010 1 2
#2 2010 2 3
#3 2010 9 10
And of course
julian(dates)
#[1] 14611 14643 14862
#attr(,"origin")
#[1] "1970-01-01"
To convert date (m/d/y format) into 3 separate columns,consider the df,
df <- data.frame(date = c("01-02-18", "02-20-18", "03-23-18"))
df
date
1 01-02-18
2 02-20-18
3 03-23-18
Convert to date format
df$date <- as.Date(df$date, format="%m-%d-%y")
df
date
1 2018-01-02
2 2018-02-20
3 2018-03-23
To get three seperate columns with year, month and date,
library(lubridate)
df$year <- year(ymd(df$date))
df$month <- month(ymd(df$date))
df$day <- day(ymd(df$date))
df
date year month day
1 2018-01-02 2018 1 2
2 2018-02-20 2018 2 20
3 2018-03-23 2018 3 23
Hope this helps.
Hi Gavin: another way [using your idea] is:
The data-frame we will use is oilstocks which contains a variety of variables related to the changes over time of the oil and gas stocks.
The variables are:
colnames(stocks)
"bpV" "bpO" "bpC" "bpMN" "bpMX" "emdate" "emV" "emO" "emC"
"emMN" "emMN.1" "chdate" "chV" "cbO" "chC" "chMN" "chMX"
One of the first things to do is change the emdate field, which is an integer vector, into a date vector.
realdate<-as.Date(emdate,format="%m/%d/%Y")
Next we want to split emdate column into three separate columns representing month, day and year using the idea supplied by you.
> dfdate <- data.frame(date=realdate)
year=as.numeric (format(realdate,"%Y"))
month=as.numeric (format(realdate,"%m"))
day=as.numeric (format(realdate,"%d"))
ls() will include the individual vectors, day, month, year and dfdate.
Now merge the dfdate, day, month, year into the original data-frame [stocks].
ostocks<-cbind(dfdate,day,month,year,stocks)
colnames(ostocks)
"date" "day" "month" "year" "bpV" "bpO" "bpC" "bpMN" "bpMX" "emdate" "emV" "emO" "emC" "emMN" "emMX" "chdate" "chV"
"cbO" "chC" "chMN" "chMX"
Similar results and I also have date, day, month, year as separate vectors outside of the df.

Resources