How to Disaggregate Monthly Data into Weekly Data in R?

I am trying to disaggregate monthly data into weekly data in two ways.
First, find the first Monday on or after the start date, create a sequence of Mondays up to the last month in the data, and spread each month's value across the Mondays that fall within that month.
Second, create a weekly sequence from the start date to the end date and spread each month's value across the weeks that fall within that month.
The data which I am working with is given below:
structure(list(`Row Labels` = c("X6", "X7", "X8", "X9"), `2022-11-01` = c(100,
200, 300, 400), `2022-12-01` = c(160, 200, 300, 400), `2023-01-01` = c(500,
550, 600, 650)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA,
-4L))
Expected output 1 spreads each monthly value across the Mondays of that month, so all the date columns are Mondays.
Expected output 2 spreads each monthly value across a weekly sequence starting from the start date.
Is this doable, or is it a bit too much to expect from R?
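For reference, the Mondays of a single month can be listed in base R (a small sketch; weekdays() is locale-dependent, so lubridate::wday(..., label = TRUE) is a more portable test):

days <- seq(as.Date("2022-11-01"), as.Date("2022-11-30"), by = "day")
days[weekdays(days) == "Monday"]
#> [1] "2022-11-07" "2022-11-14" "2022-11-21" "2022-11-28"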

For Mondays, we can create a list of Mondays between the dates in the data frame, join it with the data in long format, count the number of Mondays for each variable in each month, divide the values by that count, and pivot back to wide format:
library(dplyr)
library(tidyr)
library(lubridate)
all_dates <- as.Date(names(df1)[-1])
MON <- seq(min(floor_date(all_dates, "month")),
           max(ceiling_date(all_dates, "month")),
           by = "1 day") %>%
  .[wday(., label = TRUE) == "Mon"] %>%
  data.frame("Mondays" = .) %>%
  mutate(mmm = format(Mondays, "%Y-%m"))
df1 %>%
  pivot_longer(cols = -`Row Labels`, names_to = "dates") %>%
  mutate(dates = as.Date(dates),
         mmm = format(dates, "%Y-%m")) %>%
  right_join(MON, by = "mmm") %>%
  arrange(mmm) %>%
  group_by(`Row Labels`, dates) %>%
  mutate(value = value / n()) %>%
  ungroup() %>%
  select(`Row Labels`, Mondays, value) %>%
  pivot_wider(id_cols = `Row Labels`, names_from = "Mondays", values_from = "value")
#> # A tibble: 4 x 14
#>   `Row Labels` `2022-11-07` `2022-11-14` `2022-11-21` `2022-11-28` `2022-12-05`
#>   <chr>               <dbl>        <dbl>        <dbl>        <dbl>        <dbl>
#> 1 X6                     25           25           25           25           40
#> 2 X7                     50           50           50           50           50
#> 3 X8                     75           75           75           75           75
#> 4 X9                    100          100          100          100          100
#> # ... with 8 more variables: 2022-12-12 <dbl>, 2022-12-19 <dbl>,
#> #   2022-12-26 <dbl>, 2023-01-02 <dbl>, 2023-01-09 <dbl>, 2023-01-16 <dbl>,
#> #   2023-01-23 <dbl>, 2023-01-30 <dbl>
The same principle applies to doing it weekly:
WKLY <- seq(min(floor_date(all_dates, "month")),
            max(ceiling_date(all_dates, "month")),
            by = "week") %>%
  data.frame("Weekly" = .) %>%
  mutate(mmm = format(Weekly, "%Y-%m"))
df1 %>%
  pivot_longer(cols = -`Row Labels`, names_to = "dates") %>%
  mutate(dates = as.Date(dates),
         mmm = format(dates, "%Y-%m")) %>%
  right_join(WKLY, by = "mmm") %>%
  arrange(mmm) %>%
  group_by(`Row Labels`, dates) %>%
  mutate(value = value / n()) %>%
  ungroup() %>%
  select(`Row Labels`, Weekly, value) %>%
  pivot_wider(id_cols = `Row Labels`, names_from = "Weekly", values_from = "value")
#> # A tibble: 4 x 15
#>   `Row Labels` `2022-11-01` `2022-11-08` `2022-11-15` `2022-11-22` `2022-11-29`
#>   <chr>               <dbl>        <dbl>        <dbl>        <dbl>        <dbl>
#> 1 X6                     20           20           20           20           20
#> 2 X7                     40           40           40           40           40
#> 3 X8                     60           60           60           60           60
#> 4 X9                     80           80           80           80           80
#> # ... with 9 more variables: 2022-12-06 <dbl>, 2022-12-13 <dbl>,
#> #   2022-12-20 <dbl>, 2022-12-27 <dbl>, 2023-01-03 <dbl>, 2023-01-10 <dbl>,
#> #   2023-01-17 <dbl>, 2023-01-24 <dbl>, 2023-01-31 <dbl>
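As a quick sanity check (a sketch, assuming the weekly result above is stored in res), each month's weekly shares should sum back to the original monthly value:

res %>%
  pivot_longer(-`Row Labels`, names_to = "Weekly") %>%
  group_by(`Row Labels`, mmm = format(as.Date(Weekly), "%Y-%m")) %>%
  summarise(total = sum(value), .groups = "drop")
# e.g. X6 should give 100 for 2022-11, 160 for 2022-12 and 500 for 2023-01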
Data:
df1 <- structure(list(`Row Labels` = c("X6", "X7", "X8", "X9"),
                      `2022-11-01` = c(100, 200, 300, 400),
                      `2022-12-01` = c(160, 200, 300, 400),
                      `2023-01-01` = c(500, 550, 600, 650)),
                 class = c("tbl_df", "tbl", "data.frame"),
                 row.names = c(NA, -4L))

Related

Finding mean of variable across each month/year

I have a dataset that looks similar to this:
> dput(df)
structure(list(Date = c("3/23/21", "4/11/22", "6/30/22"), Banana_wasted = c(4L,
2L, 5L), Apple_wasted = c(6L, 0L, 3L), Orange_wasted = c(1L,
4L, 1L), Banana_ordered = c(5L, 7L, 7L), Apple_Ordered = c(9L,
8L, 9L), Orange_ordered = c(5L, 6L, 6L), Banana_eaten = c(5L,
5L, 6L), Apple_eaten = c(7L, 7L, 4L), Orange_eaten = c(8L, 8L,
8L)), class = "data.frame", row.names = c(NA, -3L))
I want to find the % of fruit wasted per month/year (in relation to how many fruits were ordered).
It should be:
(Banana_wasted + Apple_wasted + Orange_wasted) / (Banana_ordered + Apple_Ordered + Orange_ordered)
So, for March 2021, it should be:
(4 + 6 + 1) / (5 + 9 + 5) * 100 = 57.9%
I would like to do this for every month of the year.
library(tidyverse)
library(lubridate) # needed for floor_date() and mdy()

df %>%
  group_by(Date = floor_date(mdy(Date), "month")) %>%
  summarise(
    wasted = sum(across(contains("wasted"))) / sum(across(contains("ordered"))),
    wasted_eaten = sum(across(contains("wasted"))) / sum(across(contains("eaten")))
  )
# A tibble: 3 x 3
  Date       wasted wasted_eaten
  <date>      <dbl>        <dbl>
1 2021-03-01  0.579        0.579
2 2022-04-01  0.286        0.314
3 2022-06-01  0.409        0.523
library(dplyr)
library(lubridate)

df %>%
  mutate(Date = as.Date(Date, format = "%m/%d/%y"),
         pct_wasted = (Banana_wasted + Apple_wasted + Orange_wasted) /
           (Banana_ordered + Apple_Ordered + Orange_ordered) * 100) %>%
  group_by(year = year(Date), month = month(Date)) %>%
  summarize(avg_pct_wasted = mean(pct_wasted))
#> # A tibble: 3 × 3
#> # Groups:   year [2]
#>    year month avg_pct_wasted
#>   <dbl> <dbl>          <dbl>
#> 1  2021     3           57.9
#> 2  2022     4           28.6
#> 3  2022     6           40.9
Created on 2023-02-06 with reprex v2.0.2
Pivot longer to get single wasted and ordered columns across all fruits; use lubridate::floor_date() and mdy() to get months from Date; group by month; then sum and divide to get your percentages:
library(dplyr)
library(tidyr)
library(lubridate)

dat %>%
  rename(Apple_ordered = Apple_Ordered) %>% # for consistent capitalization
  pivot_longer(
    Banana_wasted:Orange_eaten,
    names_to = c("Fruit", ".value"),
    names_sep = "_"
  ) %>%
  group_by(month = floor_date(mdy(Date), "month")) %>%
  summarize(pct_wasted = sum(wasted) / sum(ordered)) %>%
  ungroup()
# # A tibble: 3 × 2
#   month      pct_wasted
#   <date>          <dbl>
# 1 2021-03-01      0.579
# 2 2022-04-01      0.286
# 3 2022-06-01      0.409
If you prefer character labels, use strftime() instead of floor_date(), and scales::percent() for the percentages:
library(scales)

dat %>%
  rename(Apple_ordered = Apple_Ordered) %>%
  pivot_longer(
    Banana_wasted:Orange_eaten,
    names_to = c("Fruit", ".value"),
    names_sep = "_"
  ) %>%
  group_by(month = strftime(mdy(Date), "%B %Y")) %>%
  summarize(pct_wasted = percent(sum(wasted) / sum(ordered), accuracy = 0.1)) %>%
  ungroup()
# # A tibble: 3 × 2
#   month      pct_wasted
#   <chr>      <chr>
# 1 April 2022 28.6%
# 2 June 2022  40.9%
# 3 March 2021 57.9%
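Note that the character labels sort alphabetically rather than chronologically. One way around this (a sketch along the same lines) is to summarize on the underlying month date, arrange by it, and only then build the label:

dat %>%
  rename(Apple_ordered = Apple_Ordered) %>%
  pivot_longer(
    Banana_wasted:Orange_eaten,
    names_to = c("Fruit", ".value"),
    names_sep = "_"
  ) %>%
  group_by(month_date = floor_date(mdy(Date), "month")) %>%
  summarize(pct_wasted = percent(sum(wasted) / sum(ordered), accuracy = 0.1)) %>%
  arrange(month_date) %>%
  mutate(month = strftime(month_date, "%B %Y")) %>%
  select(month, pct_wasted)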

How to assign values to a df$column from another df using conditions in R

I have two data frames:

df1 <- data.frame(date = as.Date(c("2021-06-01", "2021-06-02", "2021-06-03", "2021-06-04",
                                   "2021-06-05", "2021-06-06", "2021-06-07", "2021-06-08",
                                   "2021-06-09", "2021-06-10", "2021-06-11", "2021-06-12",
                                   "2021-06-13")),
                  temperature = c(17, 30, 28, 29, 16, 21, 20, 11, 28, 29, 25, 26, 19))

and

df2 <- data.frame(ID = c(1:4),
                  date.pose = as.Date(c("2021-06-01", "2021-06-03", "2021-06-06", "2021-06-10")),
                  date.withdrawal = as.Date(c("2021-06-02", "2021-06-05", "2021-06-09", "2021-06-13")))
I want to store the mean temperature for each period in df2 in a new column (df2$mean.temperature).
For ID = 1 from df2, the mean temperature would be calculated from the temperatures on 2021-06-01 and 2021-06-02, which is mean(c(17, 30)).
In other words, I want to get this:
df2 <- data.frame(ID = c(1:4),
                  date.pose = as.Date(c("2021-06-01", "2021-06-03", "2021-06-06", "2021-06-10")),
                  date.withdrawal = as.Date(c("2021-06-03", "2021-06-06", "2021-06-10", "2021-06-13")),
                  mean.Temperature = c(23.5, 24.3, 20.0, 24.8))
I'm trying to add the ID from df2 in a new column in df1. Once I do that, I could aggregate like this:

df3 <- aggregate(df1$temperature, list(df1$ID, df2$date.pose), FUN = mean)

I don't know how to add the corresponding ID to df1.
Or maybe there is a better way to do this?
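The approach described above can be sketched in base R with findInterval(), assuming the periods are sorted and non-overlapping:

# Assign each daily date the ID of the period it falls into;
# findInterval() requires df2$date.pose to be sorted.
df1$ID <- df2$ID[findInterval(df1$date, df2$date.pose)]
aggregate(temperature ~ ID, data = df1, FUN = mean)
#   ID temperature
# 1  1    23.50000
# 2  2    24.33333
# 3  3    20.00000
# 4  4    24.75000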
Here's an approach using uncount from tidyr and some joins.
df2 %>%
  mutate(days = (date.withdrawal - date.pose + 1) %>% as.integer) %>%
  tidyr::uncount(days, .id = "row") %>%
  transmute(ID, date = date.pose + row - 1) %>%
  left_join(df1, by = "date") %>%
  group_by(ID) %>%
  summarize(mean.Temperature = mean(temperature)) %>%
  right_join(df2, by = "ID")
Result
# A tibble: 4 × 4
     ID mean.Temperature date.pose  date.withdrawal
  <int>            <dbl> <date>     <date>
1     1             23.5 2021-06-01 2021-06-02
2     2             24.3 2021-06-03 2021-06-05
3     3             20   2021-06-06 2021-06-09
4     4             24.8 2021-06-10 2021-06-13
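To see what the uncount() step contributes, here is the expansion for the first period alone (a sketch):

df2 %>%
  slice(1) %>%
  mutate(days = as.integer(date.withdrawal - date.pose + 1)) %>%
  tidyr::uncount(days, .id = "row") %>%
  transmute(ID, date = date.pose + row - 1)
#   ID       date
# 1  1 2021-06-01
# 2  1 2021-06-02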
Update, thanks to @Jon Spring. Here is how we could do it.
The logic:
pivot df2 to long format and join both data frames by date;
arrange by date and fill the IDs downward;
then, after grouping by ID, use summarise() with mean();
and finally re-join with df2:
library(dplyr)
library(tidyr)

df2 %>%
  pivot_longer(-ID, values_to = "date") %>%
  full_join(df1, by = "date") %>%
  arrange(date) %>%
  fill(ID, .direction = "down") %>%
  group_by(ID) %>%
  summarise(mean_temp = mean(temperature, na.rm = TRUE)) %>%
  left_join(df2, by = "ID")
     ID mean_temp date.pose  date.withdrawal
  <int>     <dbl> <date>     <date>
1     1      23.5 2021-06-01 2021-06-02
2     2      24.3 2021-06-03 2021-06-05
3     3      20   2021-06-06 2021-06-09
4     4      24.8 2021-06-10 2021-06-13

R: Expand rows according to start and end date and calculate hours between days

My question extends this one: Generate rows between two dates into a data frame in R
I have a dataset on admissions, discharges and lengths of stay (Stay_in_days) of patients from a hospital. It looks like this:
ID Admission           Discharge           Stay_in_days
1  2020-08-20 15:25:03 2020-08-21 21:09:34        1.239
2  2020-10-04 21:53:43 2020-10-09 11:02:57        4.548
...
Dates are in POSIXct format so far.
I aim for this:
ID Date                Stay_in_days
1  2020-08-20 15:25:03        0.357
1  2020-08-21 21:09:49        1.239
2  2020-10-04 21:53:43        0.087
2  2020-10-05 00:00:00        1.087
2  2020-10-06 00:00:00        2.087
2  2020-10-07 00:00:00        3.087
2  2020-10-08 00:00:00        4.087
2  2020-10-09 11:02:57        4.548
...
What I have done so far:
M <- Map(seq, patients$Admission, patients$Discharge, by = "day")
patients2 <- data.frame(
  ID = rep.int(patients$ID, vapply(M, length, 1L)),
  Date = do.call(c, M)
)

patients2 <- patients2 %>%
  mutate(
    Date2 = as.Date(Date, format = "%Y-%m-%d"),
    Date2 = Date2 + 1,
    Diff = difftime(Date2, Date, units = "days")
  )
but this gives me:
ID Date                Date2      Diff
1  2020-08-20 17:25:03 2020-08-21 0.375
1  2020-08-21 17:25:03 2020-08-22 0.357
2  2020-10-04 23:53:43 2020-10-05 0.087
2  2020-10-05 23:53:43 2020-10-06 0.087
2  2020-10-06 23:53:43 2020-10-07 0.087
2  2020-10-07 23:53:43 2020-10-08 0.087
2  2020-10-08 23:53:43 2020-10-09 0.087
...
Strangely enough, it adds two hours to the Admission date but calculates the correct length of stay. Can someone explain?
Here is some data:
structure(list(ID = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12,
13, 14, 15, 16, 17, 18, 19, 20), Admission = structure(c(1597937103.872,
1598717768.704, 1599060521.984, 1599758087.168, 1599815496.704,
1600702198.784, 1600719631.36, 1601065923.584, 1601119400.96,
1601215476.736, 1601236710.4, 1601416934.4, 1601499640.832, 1601545647.104,
1601587328, 1601644868.608, 1601741206.528, 1601848423.424, 1601901245.44,
1601913828.352), class = c("POSIXct", "POSIXt"), tzone = "UTC"),
Discharge = structure(c(1598044189.696, 1598897337.344, 1599144670.208,
1599845118.976, 1599842366.464, 1602733683.712, 1603372135.424,
1601125168.128, 1601314173.952, 1605193905.152, 1602190259.2,
1601560720.384, 1601737143.296, 1602705634.304, 1602410460.16,
1602698425.344, 1601770566.656, 1602241377.28, 1602780476.416,
1602612048.896), class = c("POSIXct", "POSIXt"), tzone = "UTC"),
Stay_in_days = c(1.239, 2.078, 0.974, 1.007, 0.311, 23.513,
30.7, 0.686, 2.254, 46.047, 11.036, 1.664, 2.749, 13.426,
9.527, 12.194, 0.34, 4.548, 10.176, 8.081)), row.names = c(NA,
-20L), class = c("tbl_df", "tbl", "data.frame"))
Thanks in advance for your help!
It is a bit crude, but it'll work:
library(tidyverse)
library(lubridate)

df %>%
  pivot_longer(cols = -c(ID, Stay_in_days), names_to = "Event", values_to = "DATE") %>%
  group_by(ID) %>%
  mutate(dummy = case_when(Event == "Admission" ~ 0,
                           Event == "Discharge" ~ max(floor(Stay_in_days), 1))) %>%
  complete(dummy = seq(min(dummy), max(dummy), 1)) %>%
  mutate(Event = ifelse(is.na(Event), "Dummy", Event),
         DATE = if_else(is.na(DATE), first(DATE) + dummy * 24 * 60 * 60, DATE),
         Stay_in_days = case_when(
           Event == "Admission" ~ as.numeric(difftime(ceiling_date(DATE, "day"), DATE, units = "days")),
           Event == "Discharge" ~ Stay_in_days,
           TRUE ~ dummy + as.numeric(difftime(ceiling_date(first(DATE), "day"), first(DATE), units = "days"))
         )) %>%
  select(ID, DATE, Stay_in_days)
# A tibble: 199 x 3
# Groups:   ID [20]
      ID DATE                Stay_in_days
   <dbl> <dttm>                     <dbl>
 1     1 2020-08-20 15:25:03        0.358
 2     1 2020-08-21 21:09:49        1.24
 3     2 2020-08-29 16:16:08        0.322
 4     2 2020-08-30 16:16:08        1.32
 5     2 2020-08-31 18:08:57        2.08
 6     3 2020-09-02 15:28:41        0.355
 7     3 2020-09-03 14:51:10        0.974
 8     4 2020-09-10 17:14:47        0.281
 9     4 2020-09-11 17:25:18        1.01
10     5 2020-09-11 09:11:36        0.617
# ... with 189 more rows
Explanation of the logic: for the first date in every ID, Stay_in_days gives the duration from the admission date-time to the following midnight. For intermediate dates, it just adds 1 to the previous value. For the discharge date, it retains the stay value calculated prior to pivoting. Hope this is what you were after.
Explanation of the code: after pivoting longer, I used a dummy column to create the intermediate date-time objects. After that, I just mutate the columns to generate the output described above.
You can achieve this with pivot_longer from tidyr.
Edit: with comments:
df1 <- df %>%
  select(ID = ID, date1 = Admission, date2 = Discharge, Stay_in_days) %>% # prepare for pivoting
  pivot_longer(
    cols = starts_with("date"),
    names_to = "Date1",
    values_to = "Date"
  ) %>% # pivot to long format
  select(-Date1) %>% # remove temporary Date1
  relocate(Stay_in_days, .after = Date) %>% # change column order
  group_by(ID) %>%
  mutate(idgroup = row_number()) %>% # id: 1 for admission, 2 for discharge
  mutate(Stay_in_days = replace(Stay_in_days, row_number() == 1, 0)) %>% # set Admission to zero
  ungroup()

Calculation of "average sales share " with dplyr::mutate

My data concerns a company and includes Total Sales and the amount of sales in three states: CA, TX and WI.
Data :
> dput(head(WalData))
structure(list(CA = c(11047, 9925, 11322, 12251, 16610, 14696
), TX = c(7381, 5912, 9006, 6226, 9440, 9376), WI = c(6984, 3309,
8883, 9533, 11882, 8664), Total = c(25412, 19146, 29211, 28010,
37932, 32736), date = structure(c(1296518400, 1296604800, 1296691200,
1296777600, 1296864000, 1296950400), tzone = "UTC", class = c("POSIXct",
"POSIXt")), event_type = c("NA", "NA", "NA", "NA", "NA", "Sporting"
), snap_CA = c(1, 1, 1, 1, 1, 1), snap_TX = c(1, 0, 1, 0, 1,
1), snap_WI = c(0, 1, 1, 0, 1, 1)), row.names = c(NA, -6L), class = c("tbl_df",
"tbl", "data.frame"))
With the following code I am trying to calculate the average sales share of the three states in the company's total sales.
In addition, I need the same average percentages for each year, month of the year and day of the week.
install.packages("dplyr")
install.packages("lubridate")
library(dplyr)
library(lubridate)
df1 <- df %>%
  dplyr::mutate(YEAR = lubridate::year(date),
                MONTH = lubridate::month(date),
                WEEKDAY = lubridate::wday(date),
                P_CA = CA / Total,
                P_TX = TX / Total,
                P_WI = WI / Total)

# Average per Year
df1 %>%
  dplyr::group_by(YEAR) %>%
  dplyr::summarise(AV_CA = mean(P_CA, na.rm = TRUE),
                   AV_TX = mean(P_TX, na.rm = TRUE),
                   AV_WI = mean(P_WI, na.rm = TRUE))

# Average per Month
df1 %>%
  dplyr::group_by(MONTH) %>%
  dplyr::summarise(AV_CA = mean(P_CA, na.rm = TRUE),
                   AV_TX = mean(P_TX, na.rm = TRUE),
                   AV_WI = mean(P_WI, na.rm = TRUE))

# Average per Weekday
df1 %>%
  dplyr::group_by(WEEKDAY) %>%
  dplyr::summarise(AV_CA = mean(P_CA, na.rm = TRUE),
                   AV_TX = mean(P_TX, na.rm = TRUE),
                   AV_WI = mean(P_WI, na.rm = TRUE))
Output :
> df1 <- df %>%
+ dplyr::mutate(YEAR = lubridate::year(date),
+ MONTH = lubridate::month(date),
+ WEEKDAY = lubridate::wday(date),
+ P_CA = CA / Total,
+ P_TX = TX / Total,
+ P_WI = WI / Total)
Error in UseMethod("mutate_") :
no applicable method for 'mutate_' applied to an object of class "function"
> # Average per Year
> df1 %>%
+ dplyr::group_by(YEAR) %>%
+ dplyr::summarise(AV_CA = mean(P_CA, na.rm = TRUE),
+ AV_TX = mean(P_TX, na.rm = TRUE),
+ AV_WI = mean(P_WI, na.rm = TRUE))
Error in eval(lhs, parent, parent) : object 'df1' not found
It comes with an error:
Error in UseMethod("mutate_") :
  no applicable method for 'mutate_' applied to an object of class "function"
I can't figure out what's wrong; I double-checked the code and the correctness of the data. Please suggest a solution.
The issue is that df is not created as an object in the global environment, and there is a base function named df; see ?df:
df(x, df1, df2, ncp, log = FALSE)
Basically, the error comes from applying mutate to the function df rather than to a data object.
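You can check this in a clean session (a quick sketch; df() is the density of the F distribution from the stats package):

exists("df")             # TRUE even before you have defined anything
class(df)                # "function"
df(1, df1 = 2, df2 = 3)  # evaluates the F density at x = 1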
Checking on a fresh R session with no objects created
df %>%
  dplyr::mutate(YEAR = lubridate::year(date),
                MONTH = lubridate::month(date),
                WEEKDAY = lubridate::wday(date),
                P_CA = CA / Total,
                P_TX = TX / Total,
                P_WI = WI / Total)
Error in UseMethod("mutate_") :
  no applicable method for 'mutate_' applied to an object of class "function"
Now, we define 'df' as
df <- WalData

df %>%
  dplyr::mutate(YEAR = lubridate::year(date),
                MONTH = lubridate::month(date),
                WEEKDAY = lubridate::wday(date),
                P_CA = CA / Total,
                P_TX = TX / Total,
                P_WI = WI / Total)
# A tibble: 6 x 15
#      CA    TX    WI Total date                event_type snap_CA snap_TX snap_WI  YEAR MONTH WEEKDAY  P_CA  P_TX  P_WI
#   <dbl> <dbl> <dbl> <dbl> <dttm>              <chr>        <dbl>   <dbl>   <dbl> <dbl> <dbl>   <dbl> <dbl> <dbl> <dbl>
# 1 11047  7381  6984 25412 2011-02-01 00:00:00 NA               1       1       0  2011     2       3 0.435 0.290 0.275
# 2  9925  5912  3309 19146 2011-02-02 00:00:00 NA               1       0       1  2011     2       4 0.518 0.309 0.173
# 3 11322  9006  8883 29211 2011-02-03 00:00:00 NA               1       1       1  2011     2       5 0.388 0.308 0.304
# 4 12251  6226  9533 28010 2011-02-04 00:00:00 NA               1       0       0  2011     2       6 0.437 0.222 0.340
# 5 16610  9440 11882 37932 2011-02-05 00:00:00 NA               1       1       1  2011     2       7 0.438 0.249 0.313
# 6 14696  9376  8664 32736 2011-02-06 00:00:00 Sporting         1       1       1  2011     2       1 0.449 0.286 0.265

R: Add interpolated values in between columns of dataframe?

I have a data frame that looks like this
Region     2000  2001  2002  2003  2004  2005
Australia  15.6  18.4  19.2  20.2  39.1  50.2
Norway    19.05  20.2  15.3    10  10.1   5.6
and basically I need a quick way to add extra columns in-between the currently existing columns that contain interpolated values of the surrounding columns.
Think of it like this: say you don't want columns for every year, but rather columns for every quarter. Then, for every pair of years (like 2000 and 2001), we would need to add 3 extra columns in-between these years.
The values of these columns will just be interpolated values. So, for Australia, the value in 2000 is 15.6 and in 2001 it is 18.4. So we calculate (18.4 - 15.6)/4 = 0.7, and then the values should now be 15.6, 16.3, 17, 17.7, and finally 18.4.
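Base R's seq() reproduces this arithmetic for a single pair of values:

seq(15.6, 18.4, length.out = 5)
#> [1] 15.6 16.3 17.0 17.7 18.4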
I have a working solution that builds up the new data frame from scratch using a for loop. It is EXTREMELY slow. How can I speed this up?
This is how I did it when I had a similar problem. It is not the most sophisticated solution, but it works. (Note that the values after 50.2 in the output below are recycling artifacts of rbind() combining vectors of unequal length, and can be dropped.)
library(zoo)

Australia <- c(15.6, 18.4, 19.2, 20.2, 39.1, 50.2)

midpoints <- rollmean(Australia, 2)
biyearly <- c(rbind(Australia, midpoints))
midpoints <- rollmean(biyearly, 2)
quarterly <- c(rbind(biyearly, midpoints))
quarterly
#[1] 15.600 16.300 17.000 17.700 18.400 18.600 18.800 19.000 19.200 19.450 19.700
#[12] 19.950 20.200 24.925 29.650 34.375 39.100 41.875 44.650 47.425 50.200 33.600
#[23] 17.000 16.300
Here is one way with tidyverse:
library(tidyverse)

df %>%
  # get data in long format
  pivot_longer(cols = -Region) %>%
  # group by Region
  group_by(Region) %>%
  # create a 4-number sequence between every pair of values
  summarise(temp = list(unlist(map2(value[-n()], value[-1], seq, length.out = 4)))) %>%
  # get data in long format
  unnest(temp) %>%
  group_by(Region) %>%
  # create column names
  mutate(col = paste0(rep(names(df)[-c(1, ncol(df))], each = 4), "Q", 1:4)) %>%
  # spread data to wide format
  pivot_wider(names_from = col, values_from = temp)
# A tibble: 2 x 21
# Groups: Region [2]
# Region `2000Q1` `2000Q2` `2000Q3` `2000Q4` `2001Q1` `2001Q2` `2001Q3` `2001Q4` `2002Q1`
# <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#1 Austr… 15.6 16.5 17.5 18.4 18.4 18.7 18.9 19.2 19.2
#2 Norway 19.0 19.4 19.8 20.2 20.2 18.6 16.9 15.3 15.3
# … with 11 more variables: `2002Q2` <dbl>, `2002Q3` <dbl>, `2002Q4` <dbl>,
# `2003Q1` <dbl>, `2003Q2` <dbl>, `2003Q3` <dbl>, `2003Q4` <dbl>, `2004Q1` <dbl>,
# `2004Q2` <dbl>, `2004Q3` <dbl>, `2004Q4` <dbl>
data
df <- structure(list(Region = structure(1:2, .Label = c("Australia",
"Norway"), class = "factor"), `2000` = c(15.6, 19.05), `2001` = c(18.4,
20.2), `2002` = c(19.2, 15.3), `2003` = c(20.2, 10), `2004` = c(39.1,
10.1), `2005` = c(50.2, 5.6)), class = "data.frame", row.names = c(NA, -2L))
Here is a solution using dplyr. It should be more consistent and much faster than a loop:
# dummy data
df <- tibble(Region = LETTERS[1:5],
             `2000` = 1:5,
             `2001` = 3:7,
             `2002` = 10:14)

# function to calculate quarterly values
into_quarter <- function(x) x / 4

df %>%
  # create new variables that contain quarterly values
  mutate_at(vars(starts_with("200")),
            .funs = list("Q1" = into_quarter,
                         "Q2" = into_quarter,
                         "Q3" = into_quarter,
                         "Q4" = into_quarter)) %>%
  # sort them appropriately;
  # this can also be done with base R and order(names), depending on the desired result
  select(Region,
         starts_with("2000"),
         starts_with("2001"),
         starts_with("2002"),
         # in case there are other variables, so as not to lose any information
         everything())
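As a side note, mutate_at() is superseded in current dplyr; the same idea can be written with across() (a sketch; the default .names = "{.col}_{.fn}" produces columns such as 2000_Q1):

df %>%
  mutate(across(starts_with("200"),
                list(Q1 = into_quarter, Q2 = into_quarter,
                     Q3 = into_quarter, Q4 = into_quarter)))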
