I have a list of dates for company fiscal years. I would like to create a new variable that assigns every date between 1 January and 31 May to the prior year. Dates between 1 June and 31 December should keep their own year.
Example of what I want:
date year
2010-05-31 2009
2015-03-31 2014
2007-04-30 2006
2011-08-31 2011
2002-11-30 2002
Your help is much appreciated! Thank you!
You can do this in base R:
> df <- data.frame(date = as.Date(c("2010-05-31", "2015-03-31", "2007-04-30", "2011-08-31", "2002-11-30")))
> df$year <- as.numeric(format(df$date, "%Y")) - (as.numeric(format(df$date, "%m")) < 6)
> df
date year
1 2010-05-31 2009
2 2015-03-31 2014
3 2007-04-30 2006
4 2011-08-31 2011
5 2002-11-30 2002
The fiscal year is the calendar year minus 1 if the month is before June.
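The same rule can be wrapped in a small reusable function. This is a minimal sketch, not part of the original answer: `fiscal_year()` and its `start_month` parameter are hypothetical names, and it assumes a fiscal year is labelled by the calendar year in which it starts (June by default, matching the question).

```r
# Hypothetical helper: label each date with the calendar year in which its
# fiscal year starts, assuming fiscal years begin on the 1st of `start_month`.
fiscal_year <- function(date, start_month = 6L) {
  date <- as.Date(date)
  yr <- as.integer(format(date, "%Y"))
  mo <- as.integer(format(date, "%m"))
  # Months before the start month belong to the fiscal year that began
  # in the previous calendar year, so subtract 1 (TRUE counts as 1)
  yr - (mo < start_month)
}

dates <- c("2010-05-31", "2015-03-31", "2007-04-30", "2011-08-31", "2002-11-30")
fiscal_year(dates)
#> [1] 2009 2014 2006 2011 2002
```

Changing `start_month` (for example to `4L` for an April start) adapts it to other fiscal calendars without touching the logic.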
Using dplyr and lubridate:
library(dplyr)
library(lubridate)
df %>% mutate(year = year(date) - as.integer(month(date) <= 5))
# date year
#1 2010-05-31 2009
#2 2015-03-31 2014
#3 2007-04-30 2006
#4 2011-08-31 2011
#5 2002-11-30 2002
Let's say I have this example data frame, df:
Order.Date
2011-10-20
2011-12-25
2012-04-15
2012-08-23
2013-09-25
I want to extract the month and the year so that the result looks like this:
Order.Date Month Year
2011-10-20 October 2011
2011-12-25 December 2011
2012-04-15 April 2012
2012-08-23 August 2012
2013-09-25 September 2013
Any solution? I can use lubridate or anything else.
lubridate's month() and year() functions will work.
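library(dplyr)
Order.Date <- as.Date(c("2011-10-20", "2011-12-25", "2012-04-15", "2012-08-23", "2013-09-25"))
as.data.frame(Order.Date) %>%
mutate(Month = lubridate::month(Order.Date, label = FALSE),
Year = lubridate::year(Order.Date))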
Order.Date Month Year
1 2011-10-20 10 2011
2 2011-12-25 12 2011
3 2012-04-15 4 2012
4 2012-08-23 8 2012
5 2013-09-25 9 2013
If you want the month as an abbreviation such as Jan, index month.abb; for the full name such as January, index month.name:
as.data.frame(Order.Date) %>%
mutate(Month = month.abb[lubridate::month(Order.Date, label = FALSE)],
Year = lubridate::year(Order.Date))
Order.Date Month Year
1 2011-10-20 Oct 2011
2 2011-12-25 Dec 2011
3 2012-04-15 Apr 2012
4 2012-08-23 Aug 2012
5 2013-09-25 Sep 2013
as.data.frame(Order.Date) %>%
mutate(Month = month.name[lubridate::month(Order.Date, label = FALSE)],
Year = lubridate::year(Order.Date))
Order.Date Month Year
1 2011-10-20 October 2011
2 2011-12-25 December 2011
3 2012-04-15 April 2012
4 2012-08-23 August 2012
5 2013-09-25 September 2013
You can use format() with %B for the month and %Y for the year, or months() for the month.
format(as.Date("2011-10-20"), "%B")
#months(as.Date("2011-10-20")) #Alternative
#[1] "October"
format(as.Date("2011-10-20"), "%Y")
#[1] "2011"
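Putting the base R pieces together gives the full table from the question without lubridate. This is a sketch rather than the answerer's code: it reads the month number with %m and indexes the built-in month.name constant, since %B and months() return names in the session's locale, while month.name is always English.

```r
Order.Date <- as.Date(c("2011-10-20", "2011-12-25", "2012-04-15", "2012-08-23", "2013-09-25"))
df <- data.frame(Order.Date = Order.Date)

# %m gives "01".."12"; converting to integer lets us index month.name,
# which does not depend on the locale (unlike %B)
df$Month <- month.name[as.integer(format(df$Order.Date, "%m"))]
df$Year  <- as.integer(format(df$Order.Date, "%Y"))
df
```

Swap month.name for month.abb to get Oct, Dec, and so on.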
I have a simple df with a column of dates in yearmon class:
df <- structure(list(year_mon = structure(c(2015.58333333333, 2015.66666666667,
2015.75, 2015.83333333333, 2015.91666666667, 2016, 2016.08333333333,
2016.16666666667, 2016.25, 2016.33333333333), class = "yearmon")), class = c("tbl_df",
"tbl", "data.frame"), row.names = c(NA, -10L))
I'd like a simple way, preferably using base R, lubridate, or xts/zoo, to calculate the first and last days of each month.
I've seen other packages that do this, but I'd like to stick with the aforementioned if possible.
We can use dplyr, lubridate and zoo to get the day of the month of the first and last days:
library(dplyr)
library(lubridate)
library(zoo)
df %>%
mutate(firstday = day(as.Date(year_mon)), last = day(as.Date(year_mon, frac = 1)))
Once zoo is loaded (it defines the as.Date method for yearmon), you can convert the yearmon object with as.Date, which gives the first day of the month. For the last day, increment the yearmon by one month (1/12 of a year), convert it to a date, and subtract 1 day.
df$first_day <- as.Date(df$year_mon)
df$last_day <- as.Date(df$year_mon + 1/12) - 1
df
# year_mon first_day last_day
# <S3: yearmon> <date> <date>
# 1 Aug 2015 2015-08-01 2015-08-31
# 2 Sep 2015 2015-09-01 2015-09-30
# 3 Oct 2015 2015-10-01 2015-10-31
# 4 Nov 2015 2015-11-01 2015-11-30
# 5 Dec 2015 2015-12-01 2015-12-31
# 6 Jan 2016 2016-01-01 2016-01-31
# 7 Feb 2016 2016-02-01 2016-02-29
# 8 Mar 2016 2016-03-01 2016-03-31
# 9 Apr 2016 2016-04-01 2016-04-30
#10 May 2016 2016-05-01 2016-05-31
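The same "start of next month minus one day" idea also works without zoo if you start from Date values instead of yearmon. This is a minimal base-R sketch under that assumption; `month_bounds()` is a hypothetical helper name, and December is rolled over to January of the next year by hand.

```r
# Hypothetical base-R helper: first and last day of the month containing each date
month_bounds <- function(d) {
  d <- as.Date(d)
  y <- as.integer(format(d, "%Y"))
  m <- as.integer(format(d, "%m"))
  first <- as.Date(sprintf("%04d-%02d-01", y, m))
  # First day of the following month; December (12) rolls over to January (1)
  ny <- y + (m == 12L)
  nm <- m %% 12L + 1L
  next_first <- as.Date(sprintf("%04d-%02d-01", ny, nm))
  data.frame(date = d, first_day = first, last_day = next_first - 1)
}

month_bounds(c("2015-08-15", "2015-12-03", "2016-02-20"))
```

Subtracting 1 from the first of the next month handles month lengths and leap years (2016-02-29) automatically.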
Use as.Date.yearmon from zoo as shown. frac specifies the fractional amount through the month to use so that 0 is beginning of the month and 1 is the end.
The default value of frac is 0.
You must already be using zoo if you are using yearmon (since that is where the yearmon methods are defined) so this does not involve using any additional packages beyond what you are already using.
If you are using dplyr, optionally replace transform with mutate.
transform(df, first = as.Date(year_mon), last = as.Date(year_mon, frac = 1))
gives:
year_mon first last
1 Aug 2015 2015-08-01 2015-08-31
2 Sep 2015 2015-09-01 2015-09-30
3 Oct 2015 2015-10-01 2015-10-31
4 Nov 2015 2015-11-01 2015-11-30
5 Dec 2015 2015-12-01 2015-12-31
6 Jan 2016 2016-01-01 2016-01-31
7 Feb 2016 2016-02-01 2016-02-29
8 Mar 2016 2016-03-01 2016-03-31
9 Apr 2016 2016-04-01 2016-04-30
10 May 2016 2016-05-01 2016-05-31
I am trying to de-seasonalize my data by dividing my monthly totals by the average seasonality ratio for that month. I have two data frames. avgseasonality has 12 rows, one average seasonality ratio per month. The problem is that, because the ratio is averaged per month, avgseasonality has only 12 rows, while the order total data frame has 157 rows.
deseasonlize <- transform(avgseasonalityratio, deseasonlizedtotal =
df1$OrderTotal / avgseasonality$seasonalityratio)
This runs, but it does not pair the months correctly: it applies the first ratio (April) to the first OrderTotal (December).
> avgseasonality
Month seasonalityratio
1 April 1.0132557
2 August 1.0054602
3 December 0.8316988
4 February 0.9813396
5 January 0.8357475
6 July 1.1181648
7 June 1.0439899
8 March 1.1772450
9 May 1.0430667
10 November 0.9841149
11 October 0.9595041
12 September 0.8312318
> df1
# A tibble: 157 x 3
DateEntLabel OrderTotal `d$Month`
<dttm> <dbl> <chr>
1 2005-12-01 00:00:00 512758. December
2 2006-01-01 00:00:00 227449. January
3 2006-02-01 00:00:00 155652. February
4 2006-03-01 00:00:00 172923. March
5 2006-04-01 00:00:00 183854. April
6 2006-05-01 00:00:00 239689. May
7 2006-06-01 00:00:00 237638. June
8 2006-07-01 00:00:00 538688. July
9 2006-08-01 00:00:00 197673. August
10 2006-09-01 00:00:00 144534. September
# ... with 147 more rows
I need to divide each OrderTotal by the ratio for its own month. For example, for December: 512758 / 0.8316988 = 616518.864762. The result should go in a new column next to the month and OrderTotal. Any help is greatly appreciated!
The easiest way is to merge() your data first, then do the division. You can use the base R merge() function, though I will show the dplyr (tidyverse) left_join() function here. One of your columns has a strange name, d$Month; renaming it to Month will simplify the merge!
Reproducible example:
library(tidyverse)
df_1 <- data.frame(Month = c("Jan", "Feb"), seasonalityratio = c(1,2))
df_2 <- data.frame(Month = rep(c("Jan", "Feb"),each=2), OrderTotal = 1:4)
df_1 %>%
left_join(df_2, by = "Month") %>%
mutate(deseasonalizedtotal = OrderTotal / seasonalityratio)
#> Month seasonalityratio OrderTotal deseasonalizedtotal
#> 1 Jan 1 1 1.0
#> 2 Jan 1 2 2.0
#> 3 Feb 2 3 1.5
#> 4 Feb 2 4 2.0
Created on 2019-01-30 by the reprex package (v0.2.1)
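For the base R route mentioned above, here is a sketch using merge() on a two-row toy version of the question's data (values taken from the question; the data frames themselves are illustrative). It also shows the rename of the `d$Month` column and a match() alternative that keeps the original row order, since merge() sorts its result by the key.

```r
# Toy data built from two rows of the question; df1 keeps the awkward `d$Month` name
avgseasonality <- data.frame(Month = c("December", "January"),
                             seasonalityratio = c(0.8316988, 0.8357475))
df1 <- data.frame(OrderTotal = c(512758, 227449),
                  `d$Month` = c("December", "January"),
                  check.names = FALSE)

# Rename the key column so both data frames share "Month"
names(df1)[names(df1) == "d$Month"] <- "Month"

# all.x = TRUE keeps every order row (a left join); merge() sorts by Month
out <- merge(df1, avgseasonality, by = "Month", all.x = TRUE)
out$deseasonalizedtotal <- out$OrderTotal / out$seasonalityratio
out

# Alternative that preserves df1's row order: look up each month's ratio with match()
df1$seasonalityratio <- avgseasonality$seasonalityratio[match(df1$Month, avgseasonality$Month)]
df1$deseasonalizedtotal <- df1$OrderTotal / df1$seasonalityratio
df1
```

The December row reproduces the question's hand calculation, 512758 / 0.8316988.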
I have a dataset with dates in the following format:
Initial:
Jan-2015 Apr-2013 Jun-2014 Jan-2015 Jan-2016 Jan-2015 Jan-2016 Jan-2015 Apr-2012 Nov-2012 Jun-2013 Sep-2013
Final:
Feb-2014 Jan-2013 Sep-2014 Apr-2013 Sep-2014 Mar-2013 Aug-2012 Apr-2012 Oct-2012 Oct-2013 Jun-2014 Oct-2013
I would like to perform these steps:
1. Create dummy variables for Month and Year.
2. Subtract these dates from the other dates to find the duration (Final - Initial) in months.
How can I do this in R?
You could use as.yearmon from the zoo package for this.
library(zoo)
12 * (as.yearmon("Jan-2015", "%b-%Y") - as.yearmon("Feb-2014", "%b-%Y"))
# result
# [1] 11
To expand on @neilfws's answer, you can use the month() and year() functions from the lubridate package to create the month and year variables in your data frame.
Here is the code:
library(lubridate)
library(zoo)
df <- data.frame(Initial = c("Jan-2015", "Apr-2013", "Jun-2014", "Jan-2015", "Jan-2016", "Jan-2015",
"Jan-2016", "Jan-2015", "Apr-2012", "Nov-2012", "Jun-2013", "Sep-2013"),
Final = c("Feb-2014", "Jan-2013", "Sep-2014", "Apr-2013", "Sep-2014", "Mar-2013",
"Aug-2012", "Apr-2012", "Oct-2012", "Oct-2013", "Jun-2014", "Oct-2013"))
df$Initial <- as.character(df$Initial)
df$Final <- as.character(df$Final)
df$Initial <- as.yearmon(df$Initial, "%b-%Y")
df$Final <- as.yearmon(df$Final, "%b-%Y")
df$month_initial <- month(df$Initial)
df$year_initial <- year(df$Initial)
df$month_final <- month(df$Final)
df$year_final <- year(df$Final)
df$Difference <- 12*(df$Initial-df$Final)
And here is the final data.frame:
> head(df)
Initial Final month_initial year_initial month_final year_final Difference
1 Jan 2015 Feb 2014 1 2015 2 2014 11
2 Apr 2013 Jan 2013 4 2013 1 2013 3
3 Jun 2014 Sep 2014 6 2014 9 2014 -3
4 Jan 2015 Apr 2013 1 2015 4 2013 21
5 Jan 2016 Sep 2014 1 2016 9 2014 16
6 Jan 2015 Mar 2013 1 2015 3 2013 22
Hope this helps!
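If you would rather avoid zoo and lubridate, the same numbers can be computed by turning each "Mon-YYYY" string into a running month count. This is a sketch that assumes every value has an English three-letter month abbreviation followed by a four-digit year; `parse_mon_year()` is a hypothetical helper name.

```r
x_initial <- c("Jan-2015", "Apr-2013", "Jun-2014", "Jan-2015", "Jan-2016", "Jan-2015")
x_final   <- c("Feb-2014", "Jan-2013", "Sep-2014", "Apr-2013", "Sep-2014", "Mar-2013")

# Hypothetical helper: split "Mon-YYYY" into month and year columns
parse_mon_year <- function(x) {
  data.frame(month = match(substr(x, 1, 3), month.abb),  # "Jan" -> 1, ..., "Dec" -> 12
             year  = as.integer(substr(x, 5, 8)))
}

ini <- parse_mon_year(x_initial)
fin <- parse_mon_year(x_final)

# year * 12 + month is a running month count, so the difference is in months
difference <- (ini$year * 12L + ini$month) - (fin$year * 12L + fin$month)
difference
#> [1] 11  3 -3 21 16 22
```

The month and year columns of `ini` and `fin` play the role of the dummy variables, and the differences match the Difference column above.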
I have a data frame in my R environment that I would like to subset based on a specific criterion (a sort of conditional filter). My data frame is a panel dataset of daily values for each day between 2004 and 2014. Each day in the data frame is a separate observation, and every year has 366 days. I would like to subset the data so that only the leap years retain the 366th day. There are three leap years in that time range: 2004, 2008 and 2012. I have a separate column for the year and for the day of the year. In other words, I need a script that drops day 366 for every year except 2004, 2008 and 2012.
I've managed to do this as follows: I pasted my day and year columns together (e.g. "2006-366") and used dplyr's filter() to drop day 366 of each non-leap year (2005-366, 2006-366, 2007-366, 2009-366, 2010-366, 2011-366, 2013-366, 2014-366). However, this is an awfully crude method, and I was hoping someone could point me in the right direction. Here's some reproducible data along with the workflow I used.
library(dplyr)
#Create DF
year<-rep(c(2004:2014), each=366)
day<-rep(c(1:366))
df<-data.frame(day, year)
#My crude method
df$reduc<-paste(df$year, df$day, sep="-")
df <-df %>%
filter(reduc!="2005-366") %>%
filter(reduc!="2006-366") %>%
filter(reduc!="2007-366") %>%
filter(reduc!="2009-366") %>%
filter(reduc!="2010-366") %>%
filter(reduc!="2011-366") %>%
filter(reduc!="2013-366") %>%
filter(reduc!="2014-366")
Set up data:
df <- expand.grid(year=2004:2014,day=1:366)
nrow(df) ## 4026
Now exclude cases where (year is not divisible by 4) AND (day equals 366). This simple rule is correct for 2000, which is a leap year, but it would misclassify century years not divisible by 400, such as 1900 or 2100, if your data set included them.
library(dplyr)
df2 <- df %>% filter(!(year %% 4 > 0 & day==366))
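If the data might include century years, the full Gregorian rule is only slightly longer. This is a base-R sketch; `is_leap()` is a hypothetical helper name, and the filter keeps every row except day 366 of a non-leap year.

```r
# Gregorian rule: divisible by 4, except century years unless divisible by 400
is_leap <- function(year) (year %% 4 == 0 & year %% 100 != 0) | year %% 400 == 0

is_leap(c(1900, 2000, 2004, 2005, 2100))
#> [1] FALSE  TRUE  TRUE FALSE FALSE

df <- expand.grid(year = 2004:2014, day = 1:366)
# Drop day 366 only where the year is not a leap year
df2 <- df[!(df$day == 366 & !is_leap(df$year)), ]
nrow(df2)
#> [1] 4018
```

The same condition works inside dplyr as `filter(df, !(day == 366 & !is_leap(year)))`; lubridate also provides `leap_year()` for this test.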
You should derive the correct Date values for your dates. This can be done by building the January 1st string representation for each row's year, coercing to Date type, and then adding the day (minus 1) to the Date value.
df$date <- as.Date(paste0(df$year,'-01-01'))+(df$day-1L);
We will then be able to pull out the year from the Date value and check it against the input year. If they fail to match, then we know the year/day combination was invalid, and we can excise it from the data. This works because invalid leap days will translate into January 1st of the following year under the above derivation method.
df[df$year==as.integer(strftime(df$date,'%Y')),];
## day year date
## 1 1 2004 2004-01-01
## ...
## 366 366 2004 2004-12-31
## 367 1 2005 2005-01-01
## ...
## 731 365 2005 2005-12-31
## 733 1 2006 2006-01-01
## ...
## 1097 365 2006 2006-12-31
## 1099 1 2007 2007-01-01
## ...
## 1463 365 2007 2007-12-31
## 1465 1 2008 2008-01-01
## ...
## 1830 366 2008 2008-12-31
## 1831 1 2009 2009-01-01
## ...
## 2195 365 2009 2009-12-31
## 2197 1 2010 2010-01-01
## ...
## 2561 365 2010 2010-12-31
## 2563 1 2011 2011-01-01
## ...
## 2927 365 2011 2011-12-31
## 2929 1 2012 2012-01-01
## ...
## 3294 366 2012 2012-12-31
## 3295 1 2013 2013-01-01
## ...
## 3659 365 2013 2013-12-31
## 3661 1 2014 2014-01-01
## ...
## 4025 365 2014 2014-12-31
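A closely related sketch, also relying on Date arithmetic: compute each year's length as the gap between consecutive January 1st dates (365 or 366), then keep only rows whose day fits in that year. This is an illustrative variant, not the answer's exact code, and it avoids converting the dates back to strings.

```r
df <- expand.grid(year = 2004:2014, day = 1:366)

# Days in each row's year: Jan 1 of next year minus Jan 1 of this year
days_in_year <- as.integer(as.Date(paste0(df$year + 1, "-01-01")) -
                           as.Date(paste0(df$year, "-01-01")))

# Keep a row only if its day number exists in that year
df3 <- df[df$day <= days_in_year, ]
nrow(df3)
#> [1] 4018
```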