How best to transform quarterly data into monthly - r

Below is the sample data, in the form I receive it. Each row is a quarter, and that quarter's three months are stored as columns within the row. I want to do some month-over-month calculations, and I think I need to reshape the data frame to do so. I suspect pivot_longer is the right tool, but I have not found anything online that does something similar. The sample data and the desired result follow.
year<-c(2018,2018,2018,2018,2019,2019,2019,2019,2020,2020,2020,2020)
qtr<-c(1,2,3,4,1,2,3,4,1,2,3,4)
avgemp <-c(3,5,7,9,11,13,15,17,19,21,23,25)
month1emp<-c(2,4,6,8,10,12,14,16,18,20,22,24)
month2emp<-c(3,5,7,9,11,13,15,17,19,21,23,25)
month3emp<-c(4,6,8,10,12,14,16,18,20,22,24,26)
sample<-data.frame(year,qtr,month1emp,month2emp,month3emp)
Desired Result
year qtr month employment
2018 1 1 2
2018 1 2 3
2018 1 3 4
2018 2 4 4
2018 2 5 5
2018 2 6 6
And so on. In 2019 the month value restarts at 1 and runs through 12.

We could use pivot_longer on the 'month' columns and set names_pattern so that the digits ((\\d+)) are captured as the 'month' and the following 'emp' becomes the .value column:
library(dplyr)
library(tidyr)
sample %>%
pivot_longer(cols = starts_with('month'),
names_to = c("month", ".value"), names_pattern = ".*(\\d+)(emp)")%>%
rename(employment = emp)
Output:
# A tibble: 36 x 4
year qtr month employment
<dbl> <dbl> <chr> <dbl>
1 2018 1 1 2
2 2018 1 2 3
3 2018 1 3 4
4 2018 2 1 4
5 2018 2 2 5
6 2018 2 3 6
7 2018 3 1 6
8 2018 3 2 7
9 2018 3 3 8
10 2018 4 1 8
# … with 26 more rows
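To see what names_pattern captures, the same regular expression can be tried on the column names with base R's sub(). This is a small sketch; perl = TRUE is an assumption here, used to mimic the greedy, backtracking regex engine that tidyr relies on. It also shows a caveat: the greedy .* leaves only the last digit for (\\d+), which is harmless for month1emp to month3emp but would break a hypothetical two-digit name such as month12emp, while anchoring on the literal "month" prefix keeps every digit.

```r
cols <- c("month1emp", "month2emp", "month3emp")

# Pattern from the answer: greedy .* then one or more digits, then "emp"
sub(".*(\\d+)(emp)", "\\1", cols, perl = TRUE)
# "1" "2" "3"

# Hypothetical two-digit column name: the greedy .* swallows the "1"
sub(".*(\\d+)(emp)", "\\1", "month12emp", perl = TRUE)
# "2"

# Anchoring on the literal "month" prefix captures all digits
sub("month(\\d+)(emp)", "\\1", "month12emp", perl = TRUE)
# "12"
```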
If the 'month' should instead increase with the 'qtr' value (running from 1 to 12 within each year), offset it by the quarter:
sample %>%
pivot_longer(cols = starts_with('month'),
names_to = c("month", ".value"), names_pattern = ".*(\\d+)(emp)")%>%
rename(employment = emp) %>%
mutate(month = as.integer(month) + c(0, 3, 6, 9)[qtr])
# A tibble: 36 x 4
year qtr month employment
<dbl> <dbl> <dbl> <dbl>
1 2018 1 1 2
2 2018 1 2 3
3 2018 1 3 4
4 2018 2 4 4
5 2018 2 5 5
6 2018 2 6 6
7 2018 3 7 6
8 2018 3 8 7
9 2018 3 9 8
10 2018 4 10 8
# … with 26 more rows

Base R solution:
# The question's data frame is called `sample`.
# Create a vector of boolean values,
# denoting whether or not the columns should
# be unpivoted: unpivot_cols => boolean vector
unpivot_cols <- startsWith(
  names(sample),
  "month"
)
# Reshape the data.frame, calculate
# the month value: rshpd_df => data.frame
rshpd_df <- transform(
  reshape(
    sample,
    direction = "long",
    varying = names(sample)[unpivot_cols],
    ids = NULL,
    timevar = "month",
    times = seq_len(sum(unpivot_cols)),
    v.names = "employment",
    new.row.names = seq_len(
      nrow(sample) * ncol(sample)
    )
  ),
  # Month within the quarter (1-3) plus 3 months per earlier quarter
  month = ((12 / 4) * (qtr - 1)) + month
)
# Order the data.frame by year and month:
# ordered_df => data.frame
ordered_df <- with(
  rshpd_df,
  rshpd_df[order(year, month), ]
)
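Since the stated goal is a month-over-month calculation, here is a minimal base R sketch that reshapes without reshape() and then takes the change from the previous month. It assumes, as in the sample, that the rows are already sorted by year and quarter; the names month_cols, long and mom_change are illustrative. Transposing the month columns makes each quarter's three values consecutive, and the month is offset by three for every earlier quarter, as in the answers above.

```r
# Sample data from the question (avgemp is not needed here)
year <- c(2018,2018,2018,2018,2019,2019,2019,2019,2020,2020,2020,2020)
qtr <- c(1,2,3,4,1,2,3,4,1,2,3,4)
month1emp <- c(2,4,6,8,10,12,14,16,18,20,22,24)
month2emp <- c(3,5,7,9,11,13,15,17,19,21,23,25)
month3emp <- c(4,6,8,10,12,14,16,18,20,22,24,26)
sample <- data.frame(year, qtr, month1emp, month2emp, month3emp)

month_cols <- c("month1emp", "month2emp", "month3emp")

# t() turns each quarter (row) into a column; as.vector() then reads
# column by column, so each quarter's three months end up consecutive
employment <- as.vector(t(as.matrix(sample[month_cols])))

long <- data.frame(
  year  = rep(sample$year, each = 3),
  qtr   = rep(sample$qtr, each = 3),
  # Month within the quarter (1-3) plus 3 months for every earlier quarter
  month = rep(1:3, times = nrow(sample)) + 3 * (rep(sample$qtr, each = 3) - 1),
  employment = employment
)

# Month-over-month change; valid only because rows are in time order
long$mom_change <- c(NA, diff(long$employment))
head(long, 6)
```

Because the rows run continuously through the years, each January is compared with the preceding December. If the difference should restart every year, the same c(NA, diff(x)) function can be applied per year with ave().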

Related

How to group non-excluding (overlapping) year ranges using a loop with dplyr

I'm new here, so my question may be hard to understand. I have some data with date information, and I need to compute the mean of the data over ranges of years. These ranges overlap: for example, the first range is 2013-2015, then 2014-2016, then 2015-2017, and so on. I think this could be done with a loop and dplyr, but I don't know how. I'd be very thankful if someone could help me.
Thank you,
Alejandro
What I tried was something like:
for (i in Year){
Year_3=c(i, i+1, i+2)
db %>% group_by(Year_3)
#....etc
}
As you note, each observation would be used in multiple groups, so one approach could be to make copies of your data accordingly:
df <- data.frame(year = 2013:2020, value = 1:8)
library(dplyr)
df %>%
tidyr::uncount(3, .id = "grp") %>%
mutate(group_start = year - grp + 1,
group_name = paste0(group_start, "-", group_start + 2)) %>%
group_by(group_name) %>%
summarise(value = mean(value),
n = n())
# A tibble: 10 × 3
group_name value n
<chr> <dbl> <int>
1 2011-2013 1 1
2 2012-2014 1.5 2
3 2013-2015 2 3
4 2014-2016 3 3
5 2015-2017 4 3
6 2016-2018 5 3
7 2017-2019 6 3
8 2018-2020 7 3
9 2019-2021 7.5 2
10 2020-2022 8 1
Or we might take a more algebraic approach: the sum over a three-year period equals the cumulative total two years ahead minus the cumulative total as of the prior year. This approach excludes the partial ranges.
df %>%
mutate(cuml = cumsum(value),
value_3yr = (lead(cuml, n = 2) - lag(cuml, default = 0)) / 3)
year value cuml value_3yr
1 2013 1 1 2
2 2014 2 3 3
3 2015 3 6 4
4 2016 4 10 5
5 2017 5 15 6
6 2018 6 21 7
7 2019 7 28 NA
8 2020 8 36 NA
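The asker's loop idea can also be written directly in base R without copying rows: go through each possible start year and average the values that fall inside its three-year window. A minimal sketch, assuming the df from the first answer and keeping only complete windows (like the cumulative-sum approach); width, starts and res are illustrative names.

```r
df <- data.frame(year = 2013:2020, value = 1:8)

width <- 3  # years per range
# Start years whose full window lies inside the data: 2013, ..., 2018
starts <- seq(min(df$year), max(df$year) - width + 1)

res <- data.frame(
  group_name = paste0(starts, "-", starts + width - 1),
  # sapply() plays the role of the for loop: one mean per window
  value = sapply(starts, function(s) {
    mean(df$value[df$year >= s & df$year <= s + width - 1])
  })
)
res
```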

How to create a previous-row difference based on a column in R

I have a data frame like this:
my_df <- data.frame(
year = c("2018","2018","2017","2017", "2016","2016"),
my_month = c(6,7,8,9,4,5),
val=c(5,9,3,2,1,1))
> my_df
year my_month val
1 2018 6 5
2 2018 7 9
3 2017 8 3
4 2017 9 2
5 2016 4 1
6 2016 5 1
I need a data frame like this:
my_df_2 <- data.frame(
year = c("2018","2018","2017","2017", "2016","2016"),
my_month = c(6,7,8,9,4,5),
val=c(5,9,3,2,1,1),
pre_month = c(NA,4,NA,-1,NA,0))
> my_df_2
year my_month val pre_month
1 2018 6 5 NA
2 2018 7 9 4
3 2017 8 3 NA
4 2017 9 2 -1
5 2016 4 1 NA
6 2016 5 1 0
Basically, the "pre_month" column is the "val" for a given month minus the "val" of the previous month within the same year. For example, for 7-2018: 9 - 5 = 4, and so on.
Thank you for your help.
Here's a solution using tidyverse.
my_df <- data.frame(
year = c("2018","2018","2017","2017", "2016","2016"),
my_month = c(6,7,8,9,4,5),
val=c(5,9,3,2,1,1))
library(tidyverse)
my_df %>%
mutate(year = as.numeric(year)) %>%
group_by(year) %>%
arrange(my_month) %>%
mutate(pre_month = c(NA, diff(val))) %>%
arrange(desc(year))
I changed year to a numeric so it could be sorted sensibly.
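For comparison, a base R sketch of the same idea with ave(), which applies a function within groups and returns the results aligned to the rows it was given. The rows are visited in year and month order via order(), so the result does not depend on how the input happens to be sorted, and the original row order is kept. ord and pre are illustrative names.

```r
my_df <- data.frame(
  year = c("2018","2018","2017","2017", "2016","2016"),
  my_month = c(6,7,8,9,4,5),
  val = c(5,9,3,2,1,1))

# Positions of the rows sorted by year, then by month
ord <- order(my_df$year, my_df$my_month)

# Within each year: NA for the first month, then val minus the previous val
pre <- ave(my_df$val[ord], my_df$year[ord], FUN = function(v) c(NA, diff(v)))

# Write the results back to the original row positions
my_df$pre_month <- NA_real_
my_df$pre_month[ord] <- pre
my_df
```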

Convert quarterly data to monthly

I need to convert my data, which is on quarterly basis, to monthly, by dividing some variable by 3.
Example dataset:
df <- data.frame(Year = c(2018,2019,2020), qtr = c(1,3,2),
amount = c(3,6,12), variable = c(5,6,7))
df
What I need is a row for every month of each quarter, i.e. the final dataset should look like this:
data.frame(Year = c(2018,2018,2018,2019,2019,2019,2020,2020,2020),
qtr = c(1,2,3,7,8,9,4,5,6),
amount = c(1,1,1,2,2,2,4,4,4),
variable = c(5,5,5,6,6,6,7,7,7))
Also, as a bonus question: how do I print data frames in this environment?
Does this work?
library(dplyr)
library(tidyr)
library(purrr)
df %>%
  mutate(qtr_start_mth = case_when(qtr == 1 ~ 1,
                                   qtr == 2 ~ 4,
                                   qtr == 3 ~ 7,
                                   qtr == 4 ~ 10),
         qtr_end_mth = case_when(qtr == 1 ~ 3,
                                 qtr == 2 ~ 6,
                                 qtr == 3 ~ 9,
                                 qtr == 4 ~ 12)) %>%
  # One list element per row holding that quarter's month numbers
  mutate(month = map2(qtr_start_mth, qtr_end_mth, `:`)) %>%
  # Expand each list element into one row per month
  unnest(month) %>%
  mutate(amount = amount / 3) %>%
  select(1, 2, 3, 4, 7)
# A tibble: 9 x 5
Year qtr amount variable month
<dbl> <dbl> <dbl> <dbl> <int>
1 2018 1 1 5 1
2 2018 1 1 5 2
3 2018 1 1 5 3
4 2019 3 2 6 7
5 2019 3 2 6 8
6 2019 3 2 6 9
7 2020 2 4 7 4
8 2020 2 4 7 5
9 2020 2 4 7 6
Data used:
> dput(df)
structure(list(Year = c(2018, 2019, 2020), qtr = c(1, 3, 2),
amount = c(3, 6, 12), variable = c(5, 6, 7)), class = "data.frame", row.names = c(NA,
-3L))
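Using base R (splitting by row, so that several quarters in the same year are also handled, and dividing amount by 3 as the question requires):
do.call(rbind,
        c(make.row.names = FALSE,
          lapply(split(df, seq_len(nrow(df))), function(i) {
            # Repeat the quarter's row once per month of that quarter
            cbind(transform(i, amount = amount / 3),
                  month = 1:3 + (i$qtr - 1) * 3,
                  row.names = NULL)
          })))
# Year qtr amount variable month
# 1 2018 1 1 5 1
# 2 2018 1 1 5 2
# 3 2018 1 1 5 3
# 4 2019 3 2 6 7
# 5 2019 3 2 6 8
# 6 2019 3 2 6 9
# 7 2020 2 4 7 4
# 8 2020 2 4 7 5
# 9 2020 2 4 7 6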
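A vectorized base R alternative repeats each quarterly row three times by index instead of splitting the data. This is a sketch assuming, as in the question, that amount is the only variable to be spread evenly across the months; monthly is an illustrative name.

```r
df <- data.frame(Year = c(2018, 2019, 2020), qtr = c(1, 3, 2),
                 amount = c(3, 6, 12), variable = c(5, 6, 7))

# Each row index appears three times: 1, 1, 1, 2, 2, 2, 3, 3, 3
monthly <- df[rep(seq_len(nrow(df)), each = 3), ]
# First month of the quarter is 3 * (qtr - 1) + 1; add 0, 1, 2 to it
monthly$month <- 3 * (monthly$qtr - 1) + rep(1:3, times = nrow(df))
# Spread the quarterly amount evenly over its three months
monthly$amount <- monthly$amount / 3
rownames(monthly) <- NULL
monthly
```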

Create multiple new dataframes based on rows in another dataframe with a for loop in r

I have a dataframe that looks like this:
df <- data.frame(ID = c(1,2,3,4,5,6), Type = c("A","A","B","B","C","C"), `2019` = c(1,2,3,4,5,6),`2020` = c(2,3,4,5,6,7), `2021` = c(3,4,5,6,7,8))
ID Type X2019 X2020 X2021
1 1 A 1 2 3
2 2 A 2 3 4
3 3 B 3 4 5
4 4 B 4 5 6
5 5 C 5 6 7
6 6 C 6 7 8
Now, I'm looking for some code that does the following:
1. Create a new data.frame for every row in df
2. Name each new data frame by combining "ID" and "Type" (A_1, A_2, ... , C_6)
The resulting new dataframes should look like this (example for A_1, A_2 and C_6):
Year Values
1 2019 1
2 2020 2
3 2021 3
Year Values
1 2019 2
2 2020 3
3 2021 4
Year Values
1 2019 6
2 2020 7
3 2021 8
A couple of things complicate the code:
1. The code should keep working in future years without changes: next year, df will no longer contain the years 2019-2021 but 2020-2022.
2. As df is only a minimal reproducible example, I need some kind of loop. The "real" data has many more rows, so many more data frames need to be created.
Unfortunately, I can't give you any code, as I have absolutely no idea how I could manage that.
While researching, I found the following code, which may help address the first problem (the changing years):
year <- as.numeric(format(Sys.Date(), "%Y"))
I also read about lists, and that it may help to work with a list in a for loop and then transform the list back into a data frame. Sorry for my limited approach; I hope someone can give me a hint or even a solution. If you need any further information, please let me know. Thanks in advance!
A somewhat similar question to mine:
Populating a data frame in R in a loop
Try this:
library(stringr)
library(dplyr)
library(tidyr)
library(magrittr)
df %>%
gather(Year, Values, 3:5) %>%
mutate(Year = str_sub(Year, 2)) %>%
select(ID, Year, Values) %>%
group_split(ID) # split(.$ID)
# [[1]]
# # A tibble: 3 x 3
# ID Year Values
# <dbl> <chr> <dbl>
# 1 1 2019 1
# 2 1 2020 2
# 3 1 2021 3
#
# [[2]]
# # A tibble: 3 x 3
# ID Year Values
# <dbl> <chr> <dbl>
# 1 2 2019 2
# 2 2 2020 3
# 3 2 2021 4
#
# [[3]]
# # A tibble: 3 x 3
# ID Year Values
# <dbl> <chr> <dbl>
# 1 3 2019 3
# 2 3 2020 4
# 3 3 2021 5
#
# [[4]]
# # A tibble: 3 x 3
# ID Year Values
# <dbl> <chr> <dbl>
# 1 4 2019 4
# 2 4 2020 5
# 3 4 2021 6
#
# [[5]]
# # A tibble: 3 x 3
# ID Year Values
# <dbl> <chr> <dbl>
# 1 5 2019 5
# 2 5 2020 6
# 3 5 2021 7
#
# [[6]]
# # A tibble: 3 x 3
# ID Year Values
# <dbl> <chr> <dbl>
# 1 6 2019 6
# 2 6 2020 7
# 3 6 2021 8
Data
df <- data.frame(ID = c(1,2,3,4,5,6), Type = c("A","A","B","B","C","C"), `2019` = c(1,2,3,4,5,6),`2020` = c(2,3,4,5,6,7), `2021` = c(3,4,5,6,7,8))
library(magrittr)
library(tidyr)
library(dplyr)
library(stringr)
names(df) <- str_replace_all(names(df), "X", "") #remove X's from year names
df %>%
gather(Year, Values, 3:5) %>%
select(ID, Year, Values) %>%
group_split(ID)
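Neither answer names the pieces as the question asks (A_1, ..., C_6) or avoids hard-coded column positions. A base R sketch that does both, assuming the year columns are exactly the ones that data.frame() renames to X followed by four digits; year_cols and per_row are illustrative names.

```r
df <- data.frame(ID = c(1,2,3,4,5,6), Type = c("A","A","B","B","C","C"),
                 `2019` = c(1,2,3,4,5,6), `2020` = c(2,3,4,5,6,7),
                 `2021` = c(3,4,5,6,7,8))

# data.frame() renames `2019` to X2019 etc. (check.names = TRUE), so pick the
# year columns by pattern rather than by position: works for any years
year_cols <- grep("^X\\d{4}$", names(df), value = TRUE)

# One small data frame per row of df
per_row <- lapply(seq_len(nrow(df)), function(i) {
  data.frame(Year = as.integer(sub("^X", "", year_cols)),
             Values = unlist(df[i, year_cols], use.names = FALSE))
})
# Name each element Type_ID: "A_1", "A_2", ..., "C_6"
names(per_row) <- paste(df$Type, df$ID, sep = "_")

per_row$A_1
```

If separate objects in the workspace are really needed, list2env(per_row, envir = globalenv()) creates A_1, A_2 and so on, although keeping them in one list is usually easier to loop over.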

R dplyr - select values from one column based on position of a specific value in another column

I am working with gait-cycle data. I have 8 events marked for each ID and gait trial. The values "LFCH" and "RFCH" occur twice in each trial, as they mark the beginning and the end of the gait cycles for the left and right legs.
Sample Data Frame:
df <- data.frame(ID = rep(1:5, each = 16),
                 Gait_nr = rep(1:2, each = 8, times = 5),
                 Frame = rep(c(1, 5, 7, 9, 10, 15, 22, 25), times = 10),
                 Marks = rep(c("LFCH", "LHL", "RFCH", "LTO", "RHL", "LFCH", "RTO", "RFCH"), times = 10))
head(df,8)
ID Gait_nr Frame Marks
1 1 1 1 LFCH
2 1 1 5 LHL
3 1 1 7 RFCH
4 1 1 9 LTO
5 1 1 10 RHL
6 1 1 15 LFCH
7 1 1 22 RTO
8 1 1 25 RFCH
I would like to create something like
Total_gait_left = Frame[The last time Marks == "LFCH"] - Frame[The first time Marks == "LFCH"]
My current code solves the problem, but depends on the position of the Frame values rather than actual values in Marks. Any individual not following the normal gait pattern will have wrong values produced by the code.
library(tidyverse)
l <- df %>% group_by(ID, Gait_nr) %>% filter(grepl("L.+", Marks)) %>%
summarize(Total_gait = Frame[4] - Frame[1],
Side = "left")
r <- df %>% group_by(ID, Gait_nr) %>% filter(grepl("R.+", Marks)) %>%
summarize(Total_gait = Frame[4] - Frame[1],
Side = "right")
# union() has no 'by' argument; bind_rows() simply stacks the two summaries
val <- bind_rows(l, r) %>% arrange(ID, Gait_nr, Side)
Can you help me make my code more stable by helping me change e.g. Frame[4] to something like Frame[Marks=="LFCH" the last time ]?
If both LFCH and RFCH occur exactly twice per trial, you can subset Frame by Marks inside summarise and take the diff:
library(dplyr)
df %>%
group_by(ID, Gait_nr) %>%
summarise(
left = diff(Frame[Marks == 'LFCH']),
right = diff(Frame[Marks == 'RFCH'])
)
# A tibble: 10 x 4
# Groups: ID [?]
# ID Gait_nr left right
# <int> <int> <dbl> <dbl>
# 1 1 1 14 18
# 2 1 2 14 18
# 3 2 1 14 18
# 4 2 2 14 18
# 5 3 1 14 18
# 6 3 2 14 18
# 7 4 1 14 18
# 8 4 2 14 18
# 9 5 1 14 18
#10 5 2 14 18
We can use first and last from the dplyr package.
library(dplyr)
df2 <- df %>%
filter(Marks %in% "LFCH") %>%
group_by(ID, Gait_nr) %>%
summarise(Total_gait = last(Frame) - first(Frame)) %>%
ungroup()
df2
# # A tibble: 10 x 3
# ID Gait_nr Total_gait
# <int> <int> <dbl>
# 1 1 1 14
# 2 1 2 14
# 3 2 1 14
# 4 2 2 14
# 5 3 1 14
# 6 3 2 14
# 7 4 1 14
# 8 4 2 14
# 9 5 1 14
# 10 5 2 14
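A base R sketch of the same last-minus-first idea, computed for both feet in one pass with aggregate(). It relies on the mark names rather than row positions and assumes, as in the sample, that LFCH and RFCH delimit each cycle; contacts and cycles are illustrative names.

```r
df <- data.frame(ID = rep(1:5, each = 16),
                 Gait_nr = rep(1:2, each = 8, times = 5),
                 Frame = rep(c(1, 5, 7, 9, 10, 15, 22, 25), times = 10),
                 Marks = rep(c("LFCH", "LHL", "RFCH", "LTO", "RHL", "LFCH", "RTO", "RFCH"), times = 10))

# Keep only the foot-contact events that open and close each gait cycle
contacts <- df[df$Marks %in% c("LFCH", "RFCH"), ]
# Sort by frame so that "first" and "last" follow time within each trial
contacts <- contacts[order(contacts$ID, contacts$Gait_nr, contacts$Frame), ]

# Last contact frame minus first contact frame, per ID, trial and foot
cycles <- aggregate(Frame ~ ID + Gait_nr + Marks, data = contacts,
                    FUN = function(f) f[length(f)] - f[1])
cycles$Side <- ifelse(cycles$Marks == "LFCH", "left", "right")
names(cycles)[names(cycles) == "Frame"] <- "Total_gait"
head(cycles)
```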
