How to get every quarter of a date interval in R? [duplicate] - r

I have a data frame with many products, their prices, and the start date and end date of the period each product was online.
product startdate enddate price
1 2012-03-17 2016-09-08 10
2 2014-05-16 2015-06-29 8
3 2015-07-01 2016-04-02 9
What I want is every quarter and year in which the product was online. For example, for product 3: Q3 15, Q4 15, Q1 16, Q2 16.
I have already converted the dates to an interval class via:
library(lubridate)
interval <- interval(startdate,enddate)
interval
I searched for a way to get the quarters out of that interval but couldn't find a solution.
My overall goal is to calculate the mean of the prices of every product online for every quarter.
Any help would be appreciated. Thank you!

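If df is your data frame, the following generates the first day of every quarter between startdate and enddate, keeps the unique combinations of product and quarter, and calculates the average price. Starting each sequence at the beginning of the first quarter ensures that a quarter is included even if the product was online for only part of it.
library(lubridate)
library(dplyr)
df <- df %>%
  mutate(startdate = ymd(startdate),
         enddate = ymd(enddate))
# One vector of quarter start dates per row; SIMPLIFY = FALSE always returns a list
df$output <- mapply(function(x, y) seq(floor_date(x, "quarter"), y, by = "quarter"),
                    df$startdate,
                    df$enddate,
                    SIMPLIFY = FALSE)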
df %>%
  tidyr::unnest(output) %>%
  mutate(quarter = paste0("Q", quarter(output), " ", year(output))) %>%
  select(-output) %>%
  group_by(product, startdate, enddate, quarter) %>%
  filter(row_number(quarter) == 1) %>%
  summarise(mean(price))
Result for the first row of your data frame would be:
product startdate enddate quarter `mean(price)`
<int> <date> <date> <chr> <dbl>
1 1 2012-03-17 2016-09-08 Q1 2012 10
2 1 2012-03-17 2016-09-08 Q1 2013 10
3 1 2012-03-17 2016-09-08 Q1 2014 10
4 1 2012-03-17 2016-09-08 Q1 2015 10
5 1 2012-03-17 2016-09-08 Q1 2016 10
6 1 2012-03-17 2016-09-08 Q2 2012 10
7 1 2012-03-17 2016-09-08 Q2 2013 10
8 1 2012-03-17 2016-09-08 Q2 2014 10
9 1 2012-03-17 2016-09-08 Q2 2015 10
10 1 2012-03-17 2016-09-08 Q2 2016 10
11 1 2012-03-17 2016-09-08 Q3 2012 10
12 1 2012-03-17 2016-09-08 Q3 2013 10
13 1 2012-03-17 2016-09-08 Q3 2014 10
14 1 2012-03-17 2016-09-08 Q3 2015 10
15 1 2012-03-17 2016-09-08 Q3 2016 10
16 1 2012-03-17 2016-09-08 Q4 2012 10
17 1 2012-03-17 2016-09-08 Q4 2013 10
18 1 2012-03-17 2016-09-08 Q4 2014 10
19 1 2012-03-17 2016-09-08 Q4 2015 10
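As a dependency-free cross-check of the quarter labels, here is a minimal base R sketch. The helper name quarters_between is hypothetical (it does not come from lubridate or dplyr), and the "Q3 15" label format is assumed from the example in the question.

```r
# Hypothetical helper: label every calendar quarter touched by [start, end].
quarters_between <- function(start, end) {
  start <- as.Date(start)
  end <- as.Date(end)
  # Snap the start back to the first day of its quarter.
  start_lt <- as.POSIXlt(start)
  q_month <- (start_lt$mon %/% 3) * 3 + 1   # 1, 4, 7 or 10
  q_start <- as.Date(sprintf("%d-%02d-01", start_lt$year + 1900, q_month))
  # One date per quarter, then format each as "Q<n> <yy>".
  steps <- as.POSIXlt(seq(q_start, end, by = "quarter"))
  sprintf("Q%d %02d", steps$mon %/% 3 + 1, (steps$year + 1900) %% 100)
}

# Product 3 from the question
quarters_between("2015-07-01", "2016-04-02")
```

For product 3 this should give the four quarters listed in the question. Snapping the start date to the beginning of its quarter is the design choice that keeps short intervals which cross a quarter boundary from losing their last quarter.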

Related

Ranking of values in one quarter [duplicate]

I am trying to rank the Price values separately within each partition. My data is below:
df<-data.frame( year=c(2010,2010,2010,2010,2010,2010),
quarter=c("q1","q1","q1","q2","q2","q2"),
Price=c(10,20,30,10,20,30)
)
df
Now I want to rank within each quarter, and I expect 1 for the smallest Price and 3 for the highest Price:
df %>% group_by(quarter) %>% mutate(id = row_number(Price))
Instead of the expected result, I get a ranking that runs across both quarters rather than restarting in each quarter.
Can anybody help me solve this so that the ranks are computed separately per quarter?
You probably want rank.
transform(df, id=ave(Price, year, quarter, FUN=rank))
# year quarter Price id
# 1 2010 q1 10 1
# 2 2010 q1 20 2
# 3 2010 q1 30 3
# 4 2010 q2 10 1
# 5 2010 q2 20 2
# 6 2010 q2 30 3
With dplyr, use dense_rank
library(dplyr)
df %>%
  group_by(quarter) %>%
  mutate(id = dense_rank(Price)) %>%
  ungroup()
# A tibble: 6 × 4
year quarter Price id
<dbl> <chr> <dbl> <int>
1 2010 q1 10 1
2 2010 q1 20 2
3 2010 q1 30 3
4 2010 q2 10 1
5 2010 q2 20 2
6 2010 q2 30 3
In dplyr 1.1.0 and later, you can also use .by in mutate:
df %>%
mutate(id = dense_rank(Price), .by = 'quarter')
year quarter Price id
1 2010 q1 10 1
2 2010 q1 20 2
3 2010 q1 30 3
4 2010 q2 10 1
5 2010 q2 20 2
6 2010 q2 30 3
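Alternatively, with row_number(), which numbers rows in their existing order and so matches the ranks here only because Price is already sorted within each group: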
library(tidyverse)
df %>% group_by(year, quarter) %>% mutate(id=row_number())
Created on 2023-02-12 with reprex v2.0.2
# A tibble: 6 × 4
# Groups: year, quarter [2]
year quarter Price id
<dbl> <chr> <dbl> <int>
1 2010 q1 10 1
2 2010 q1 20 2
3 2010 q1 30 3
4 2010 q2 10 1
5 2010 q2 20 2
6 2010 q2 30 3
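The answers above give identical results on this data because there are no tied prices. As a small sketch of how the dplyr ranking helpers differ once ties appear, using hypothetical data that is not from the question:

```r
library(dplyr)

# Hypothetical data with a tie in q1, to show how the helpers differ.
prices <- data.frame(
  quarter = c("q1", "q1", "q1", "q2", "q2"),
  Price   = c(10, 10, 30, 20, 5)
)

ranked <- prices %>%
  group_by(quarter) %>%
  mutate(
    row_num = row_number(Price),  # ties broken by position
    min_rnk = min_rank(Price),    # ties share a rank, gap afterwards
    dense   = dense_rank(Price)   # ties share a rank, no gap
  ) %>%
  ungroup()

ranked
```

Which helper fits depends on whether tied prices should share an id and whether the ids need to be consecutive.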

Create incremental column year based on id and year column in R

I have the data frame below and I want to create the 'create_col' column, presumably with some kind of seq() function, using the 'year' column as the start of the sequence within each id. How can I do that?
id <- c(1,1,2,3,3,3,4)
year <- c(2013, 2013, 2015,2017,2017,2017,2011)
create_col <- c(2013,2014,2015,2017,2018,2019,2011)
Ideal result:
id year create_col
1 1 2013 2013
2 1 2013 2014
3 2 2015 2015
4 3 2017 2017
5 3 2017 2018
6 3 2017 2019
7 4 2011 2011
You can add row_number() to the minimum year in each id:
library(dplyr)
df %>%
  group_by(id) %>%
  mutate(create_col = min(year) + row_number() - 1)
# id year create_col
# <dbl> <dbl> <dbl>
#1 1 2013 2013
#2 1 2013 2014
#3 2 2015 2015
#4 3 2017 2017
#5 3 2017 2018
#6 3 2017 2019
#7 4 2011 2011
data
df <- data.frame(id, year)
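The same idea can be sketched without dplyr. This is an alternative using base R's ave(), not the method from the answer; the intermediate names offset and first_year are only illustrative.

```r
id <- c(1, 1, 2, 3, 3, 3, 4)
year <- c(2013, 2013, 2015, 2017, 2017, 2017, 2011)
df <- data.frame(id, year)

# Within each id, number the rows 0, 1, 2, ...
offset <- ave(seq_along(df$id), df$id, FUN = seq_along) - 1
# Smallest year in each id, repeated for every row of that id
first_year <- ave(df$year, df$id, FUN = min)

df$create_col <- first_year + offset
df
```

Like the dplyr version, this assumes the rows of each id are already in the order in which the years should increase.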

Group By and summaries with condition

I have data frame df. After group_by(id, Year, Month, new_used_ind) and summarise(n = n()) it looks like:
id Year Month new_used_ind n
1 2001 apr N 3
1 2001 apr U 2
2 2002 mar N 5
3 2003 mar U 3
4 2004 july N 4
4 2004 july U 2
I want the total for each id, Year and Month, and also the total of the 'N' rows from new_used_ind in a new column.
Something like this:
id Year Month Total_New total
1 2001 apr 3 5
2 2002 mar 5 8
4 2004 july 4 6
library(dplyr)
read.table(text= "id Year Month new_used_ind n
1 2001 apr N 3
1 2001 apr U 2
2 2002 mar N 5
3 2003 mar U 3
4 2004 july N 4
4 2004 july U 2", header = T) -> df
df %>%
  group_by(id, Year, Month) %>%
  mutate(total_New = sum(n * (new_used_ind == "N"))) %>%
  mutate(total_n = sum(n)) %>%
  summarise_at(c("total_New", "total_n"), mean)
#> # A tibble: 4 x 5
#> # Groups: id, Year [4]
#> id Year Month total_New total_n
#> <int> <int> <fct> <dbl> <dbl>
#> 1 1 2001 apr 3 5
#> 2 2 2002 mar 5 5
#> 3 3 2003 mar 0 3
#> 4 4 2004 july 4 6
Created on 2019-06-11 by the reprex package (v0.3.0)
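The two mutate() calls followed by a mean can also be collapsed into one conditional summarise(). The following is a sketch of that variant, assuming a dplyr version recent enough to support the .groups argument; the result name totals is only illustrative.

```r
library(dplyr)

df <- read.table(text = "id Year Month new_used_ind n
1 2001 apr N 3
1 2001 apr U 2
2 2002 mar N 5
3 2003 mar U 3
4 2004 july N 4
4 2004 july U 2", header = TRUE)

totals <- df %>%
  group_by(id, Year, Month) %>%
  summarise(
    total_New = sum(n[new_used_ind == "N"]),  # 0 when the group has no "N" rows
    total_n   = sum(n),
    .groups = "drop"
  )

totals
```

Subsetting n inside sum() puts the condition next to the column it filters, and groups without any "N" rows still appear with a total of 0.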

How to de-cumulate variable in dplyr?

I have a panel of quarterly individual data that are cumulative within each year, i.e. the value for the 1st quarter covers the 1st quarter, the value for the 2nd quarter is the sum of the 1st and 2nd, the 3rd-quarter value is the sum of the first 3 quarters, and the 4th-quarter value is the annual sum. How can I easily de-cumulate these in dplyr, grouping by id and year?
Assuming we have two years, with sales of 2 per quarter in year one and 3 per quarter in year two, the original data is:
df = data.frame(quarter = c("Q1","Q2","Q3","Q4","Q1","Q2","Q3","Q4"),
                year = c(rep(2017,4), rep(2018,4)),
                cum_tot = c(2,4,6,8,3,6,9,12))
quarter year cum_tot
1 Q1 2017 2
2 Q2 2017 4
3 Q3 2017 6
4 Q4 2017 8
5 Q1 2018 3
6 Q2 2018 6
7 Q3 2018 9
8 Q4 2018 12
Then we can get the sales per quarter as:
library(dplyr)
df %>% group_by(year) %>% mutate(original = c(cum_tot[1], diff(cum_tot)))
Or, as per GGamba's comment below:
df %>% group_by(year) %>% mutate(original = cum_tot - lag(cum_tot, default = 0))
They both result in:
quarter year cum_tot original
1 Q1 2017 2 2
2 Q2 2017 4 2
3 Q3 2017 6 2
4 Q4 2017 8 2
5 Q1 2018 3 3
6 Q2 2018 6 3
7 Q3 2018 9 3
8 Q4 2018 12 3
Hope this helps!
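For readers without dplyr, the same de-cumulation can be sketched in base R with ave(), together with a round-trip check that re-accumulating the result reproduces the input. This assumes the rows are sorted by quarter within each year; for the panel in the question, id would be passed to ave() as an additional grouping vector.

```r
df <- data.frame(
  quarter = rep(c("Q1", "Q2", "Q3", "Q4"), times = 2),
  year    = rep(c(2017, 2018), each = 4),
  cum_tot = c(2, 4, 6, 8, 3, 6, 9, 12)
)

# De-cumulate within each year: the first value stays, the rest become differences.
df$original <- ave(df$cum_tot, df$year, FUN = function(x) c(x[1], diff(x)))

# Round trip: re-accumulating per year should reproduce cum_tot.
rebuilt <- ave(df$original, df$year, FUN = cumsum)
stopifnot(isTRUE(all.equal(rebuilt, df$cum_tot)))

df
```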

replace NA with previous 2 years values

I have two data frames. df1 contains NA values in p1, which need to be replaced with the mean of Average_f1 from df2 over the previous 2 years for the same bin.
For example, row 5 of df1 has year 2015 and bin 5, so its NA should be replaced with the mean of Average_f1 for bin 5 in 2013 and 2014 from df2. For row 7, only one of the previous two years has a value, so that single value is used.
df1
year p1 bin
2013 20 1
2013 24 1
2014 10 2
2014 11 2
2015 NA 5
2016 10 3
2017 NA 7
df2
year bin_p1 Average_f1
2013 5 29.5
2014 5 16.5
2015 NA 30
2016 7 12
output
df1
year p1 bin
2013 20 1
2013 24 1
2014 10 2
2014 11 2
2015 **23** 5
2016 10 3
2017 **12** 7
Thanks in advance
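One way the replacement described above could be approached is sketched below in base R. The helper prev_two_mean is hypothetical, and the sketch assumes that bin in df1 corresponds to bin_p1 in df2 and that "previous 2 years" means the two calendar years immediately before the row's year.

```r
df1 <- data.frame(
  year = c(2013, 2013, 2014, 2014, 2015, 2016, 2017),
  p1   = c(20, 24, 10, 11, NA, 10, NA),
  bin  = c(1, 1, 2, 2, 5, 3, 7)
)
df2 <- data.frame(
  year       = c(2013, 2014, 2015, 2016),
  bin_p1     = c(5, 5, NA, 7),
  Average_f1 = c(29.5, 16.5, 30, 12)
)

# Hypothetical helper: mean of Average_f1 in df2 for the same bin over the
# two years before `yr`; returns NA when no matching rows exist.
prev_two_mean <- function(yr, b) {
  hit <- !is.na(df2$bin_p1) & df2$bin_p1 == b & df2$year %in% c(yr - 1, yr - 2)
  if (any(hit)) mean(df2$Average_f1[hit]) else NA_real_
}

# Fill only the rows where p1 is missing.
na_rows <- which(is.na(df1$p1))
df1$p1[na_rows] <- mapply(prev_two_mean, df1$year[na_rows], df1$bin[na_rows])
df1
```

On the example data this fills row 5 with 23 and row 7 with 12, matching the desired output. Rows with no matching bin in the previous two years would remain NA.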
