Count number of instances above a varying threshold - r

I have the 95th-percentile temperature threshold for each country. In the example below a week is 4 days. I want to count, in a new vector or single-column data frame, how many days per week each country's temperature is above that country's threshold.
The countries' 95th-percentile temperatures are:
q95 <- c(26,21,22,20,23)
DailyTempCountry <- data.frame(Date = c("W1D1","W1D2","W1D3","W1D4","W2D1","W2D2","W2D3","W2D4",
"W1D1","W1D2","W1D3","W1D4","W2D1","W2D2","W2D3","W2D4",
"W1D1","W1D2","W1D3","W1D4","W2D1","W2D2","W2D3","W2D4",
"W1D1","W1D2","W1D3","W1D4","W2D1","W2D2","W2D3","W2D4",
"W1D1","W1D2","W1D3","W1D4","W2D1","W2D2","W2D3","W2D4"),
Country = c("AL","AL", "AL", "AL","AL","AL", "AL", "AL",
"BE","BE", "BE", "BE", "BE","BE", "BE", "BE",
"CA","CA", "CA", "CA","CA","CA", "CA", "CA",
"DE","DE", "DE", "DE","DE","DE", "DE", "DE",
"UK","UK", "UK", "UK","UK","UK", "UK", "UK"),
DailyTemp = c(27,25,20,22,20,20,27,27,
24,22,23,18,17,19,20,16,
23,23,23,23,27,26,20,26,
19,18,17,19,16,15,19,18,
20,24,24,20,19,25,19,25))
DailyTempCountry
Date Country DailyTemp
1 W1D1 AL 27
2 W1D2 AL 25
3 W1D3 AL 20
4 W1D4 AL 22
5 W2D1 AL 20
6 W2D2 AL 20
7 W2D3 AL 27
8 W2D4 AL 27
9 W1D1 BE 24
10 W1D2 BE 22
11 W1D3 BE 23
12 W1D4 BE 18
13 W2D1 BE 17
14 W2D2 BE 19
15 W2D3 BE 20
16 W2D4 BE 16
17 W1D1 CA 23
18 W1D2 CA 23
19 W1D3 CA 23
20 W1D4 CA 23
21 W2D1 CA 27
22 W2D2 CA 26
23 W2D3 CA 20
24 W2D4 CA 26
25 W1D1 DE 19
26 W1D2 DE 18
27 W1D3 DE 17
28 W1D4 DE 19
29 W2D1 DE 16
30 W2D2 DE 15
31 W2D3 DE 19
32 W2D4 DE 18
33 W1D1 UK 20
34 W1D2 UK 24
35 W1D3 UK 24
36 W1D4 UK 20
37 W2D1 UK 19
38 W2D2 UK 25
39 W2D3 UK 19
40 W2D4 UK 25
What I want is a vector/column that counts the number of days in that week above the country's threshold like this:
DaysInWeekAboveQ95 <- c(1,2,3,0,4,3,0,0,2,2)
df_right <- data.frame(Week = c("W1","W2","W1","W2","W1","W2","W1","W2","W1","W2"),
DaysInWeekAboveQ95 = c(1,2,3,0,4,3,0,0,2,2))
Week DaysInWeekAboveQ95
1 W1 1
2 W2 2
3 W1 3
4 W2 0
5 W1 4
6 W2 3
7 W1 0
8 W2 0
9 W1 2
10 W2 2
The q95% vector was
q95 <- c(26,21,22,20,23)
So in the first week AL has 1 day above its threshold of 26, and in the second week UK has 2 days above 23 (UK's threshold). The same applies to every country and every week.
I handled a similar problem where the threshold did not vary by country but was a constant 30 degrees (I divide by 7 because there are seven days in a week):
DaysAbove30perWeek <- as.data.frame(tapply(testdlong$value > 30,
ceiling(seq(nrow(testdlong))/7),sum))
Maybe a solution is to loop over countries? However, I can't figure out how to incorporate the specific loop. Other solutions are welcome.

In the revised scenario you also need to compute a new column for the week:
q95 <- c(26,21,22,20,23)
c_q95 <- data.frame(Country = unique(DailyTempCountry$Country),
threshold = q95)
library(dplyr)
DailyTempCountry %>% left_join(c_q95, by = 'Country') %>%
group_by(Country, Week = substr(Date, 1, 2)) %>%
summarise(days = sum(DailyTemp > threshold), .groups = 'drop')
# A tibble: 10 x 3
Country Week days
<chr> <chr> <int>
1 AL W1 1
2 AL W2 2
3 BE W1 3
4 BE W2 0
5 CA W1 4
6 CA W2 3
7 DE W1 0
8 DE W2 0
9 UK W1 2
10 UK W2 2
Created on 2021-05-05 by the reprex package (v2.0.0)
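For readers who prefer base R, the same join-then-count idea can be sketched with a named threshold vector instead of a lookup table. This is a minimal sketch on a two-country subset of the question's data; the names `temps`, `Above` and `res` are introduced here for illustration only.

```r
# Thresholds keyed by country code (same pairing as c_q95 above)
q95 <- c(AL = 26, BE = 21, CA = 22, DE = 20, UK = 23)

# Subset of the question's data: AL and BE only
temps <- data.frame(
  Date      = rep(c("W1D1", "W1D2", "W1D3", "W1D4",
                    "W2D1", "W2D2", "W2D3", "W2D4"), 2),
  Country   = rep(c("AL", "BE"), each = 8),
  DailyTemp = c(27, 25, 20, 22, 20, 20, 27, 27,
                24, 22, 23, 18, 17, 19, 20, 16)
)

# Look up each row's threshold by country name, then flag exceedances
temps$Above <- unname(temps$DailyTemp > q95[as.character(temps$Country)])
temps$Week  <- substr(temps$Date, 1, 2)

# Sum the flags per country and week, then order like the dplyr result
res <- aggregate(Above ~ Country + Week, data = temps, FUN = sum)
res <- res[order(res$Country, res$Week), ]
res
```

Indexing `q95` by name avoids relying on the row order of the data, which matters if countries ever appear in a different order than the thresholds.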
The OP has since mentioned that the date variable is in a different format from the one in the sample data:
time <- as.character(20000101:20000130)
> time
[1] "20000101" "20000102" "20000103" "20000104" "20000105" "20000106" "20000107" "20000108" "20000109" "20000110"
[11] "20000111" "20000112" "20000113" "20000114" "20000115" "20000116" "20000117" "20000118" "20000119" "20000120"
[21] "20000121" "20000122" "20000123" "20000124" "20000125" "20000126" "20000127" "20000128" "20000129" "20000130"
library(lubridate)
time <- ymd(time)
# Either ISO week
isoweek(time)
# or week
week(time)
> isoweek(time)
[1] 52 52 1 1 1 1 1 1 1 2 2 2 2 2 2 2 3 3 3 3 3 3 3 4 4 4 4 4 4 4
> # or week
> week(time)
[1] 1 1 1 1 1 1 1 2 2 2 2 2 2 2 3 3 3 3 3 3 3 4 4 4 4 4 4 4 5 5
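Once the dates are parsed, the week number can replace the `substr()` step in the grouping. The sketch below uses hypothetical single-country data (`daily`, with a fixed `threshold` of 26) covering 3 to 16 January 2000, and groups by `isoyear()` together with `isoweek()` so that week 52 of one year is not merged with week 52 of another.

```r
library(dplyr)
library(lubridate)

# Hypothetical data: 14 consecutive days for one country
daily <- data.frame(
  Date      = as.character(20000103:20000116),
  Country   = "AL",
  DailyTemp = c(27, 25, 20, 22, 20, 20, 27,
                27, 24, 22, 23, 18, 17, 19)
)
threshold <- 26  # assumed threshold for this single country

weekly <- daily %>%
  mutate(Date = ymd(Date),
         Year = isoyear(Date),   # ISO year that the ISO week belongs to
         Week = isoweek(Date)) %>%
  group_by(Country, Year, Week) %>%
  summarise(days = sum(DailyTemp > threshold), .groups = "drop")
weekly
```

With several countries, the `left_join()` on the threshold table from the answer above would slot in before `mutate()`.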

Related

Remove rows conditionally based on NA values in other rows

I have a data frame like this:
city year value
<chr> <dbl> <dbl>
1 la 1 NA
2 la 2 NA
3 la 3 NA
4 la 4 20
5 la 5 25
6 nyc 1 18
7 nyc 2 29
8 nyc 3 24
9 nyc 4 17
10 nyc 5 30
I would like to remove any cities that don't have a complete 5 years worth of data. So in this case, I'd like to remove all rows for city la despite the fact that there is data for years 4 and 5, resulting in the following data frame:
city year value
<chr> <dbl> <dbl>
1 nyc 1 18
2 nyc 2 29
3 nyc 3 24
4 nyc 4 17
5 nyc 5 30
Is this possible? Thanks in advance.
In Base R:
subset(df, !ave(value, city, FUN = anyNA))
city year value
6 nyc 1 18
7 nyc 2 29
8 nyc 3 24
9 nyc 4 17
10 nyc 5 30
in Tidyverse
df %>%
group_by(city) %>%
filter(!anyNA(value))
# A tibble: 5 x 3
# Groups: city [1]
city year value
<chr> <int> <int>
1 nyc 1 18
2 nyc 2 29
3 nyc 3 24
4 nyc 4 17
5 nyc 5 30
or even
df %>%
group_by(city) %>%
filter(all(!is.na(value)))
Another base R option with ave
> subset(df, !is.na(ave(value, city)))
city year value
6 nyc 1 18
7 nyc 2 29
8 nyc 3 24
9 nyc 4 17
10 nyc 5 30
or a data.table one
> library(data.table)
> setDT(df)[, .SD[!anyNA(value)], city]
city year value
1: nyc 1 18
2: nyc 2 29
3: nyc 3 24
4: nyc 4 17
5: nyc 5 30
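The answers above drop a city when any of its values is NA. If "incomplete" can also mean that some years have no row at all, counting the non-missing values per city covers both cases. This is a sketch under that assumption; the city `sf` with only three rows is hypothetical and added to show the difference.

```r
# Question's data plus a hypothetical city with only three years of rows
df <- data.frame(
  city  = rep(c("la", "nyc", "sf"), times = c(5, 5, 3)),
  year  = c(1:5, 1:5, 1:3),
  value = c(NA, NA, NA, 20, 25, 18, 29, 24, 17, 30, 11, 12, 13)
)

# Number of non-missing values per city, repeated on every row of that city
n_ok <- ave(as.numeric(!is.na(df$value)), df$city, FUN = sum)

# Keep only cities with all 5 years present and non-missing
complete <- df[n_ok == 5, ]
complete
```

Here `sf` has no NA values, so an `anyNA()` filter alone would keep it, while the count-based filter removes it.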

How to determine dry or rainy season in temporal analysis using R?

I have a time series of temperature data and would like to determine whether each date falls in the dry or the rainy season.
In my country the dry season runs from May to October, and the rainy season from November to April.
Would it be possible to create a column with this information using dplyr or another package?
My data frame is:
sample_station <-c('A','A','A','A','A','A','A','A','A','A','A','B','B','B','B','B','B','B','B','B','B','C','C','C','C','C','C','C','C','C','C','A','B','C','A','B','C')
Date_dmy <-c('01/01/2000','08/08/2000','16/03/2001','22/09/2001','01/06/2002','05/01/2002','26/01/2002','16/02/2002','09/03/2002','30/03/2002','20/04/2002','04/01/2000','11/08/2000','19/03/2001','25/09/2001','04/06/2002','08/01/2002','29/01/2002','19/02/2002','12/03/2002','13/09/2001','08/01/2000','15/08/2000','23/03/2001','29/09/2001','08/06/2002','12/01/2002','02/02/2002','23/02/2002','16/03/2002','06/04/2002','01/02/2000','01/02/2000','01/02/2000','02/11/2001','02/11/2001','02/11/2001')
Temperature <-c(17,20,24,19,17,19,23,26,19,19,21,15,23,18,22,22,23,18,19,26,21,22,23,27,19,19,21,23,24,25,26,29,30,21,25,24,23)
df<-data.frame(sample_station, Date_dmy, Temperature)
One option is to extract the month after converting to Date class, create a condition in case_when to return 'dry', 'rainy' based on the values of 'Month' column
library(dplyr)
library(lubridate)
df <- df %>%
mutate(Month = month(dmy(Date_dmy)),
categ = case_when(Month %in% 5:10 ~ 'dry', TRUE ~ 'rainy'))
Similar to akrun's solution but with ifelse:
library(dplyr)
library(lubridate)
df <- df %>%
mutate(Month = month(dmy(Date_dmy)),
categ = ifelse(Month %in% 5:10,'dry','rainy'))
Output:
sample_station Date_dmy Temperature Month categ
1 A 01/01/2000 17 1 rainy
2 A 08/08/2000 20 8 dry
3 A 16/03/2001 24 3 rainy
4 A 22/09/2001 19 9 dry
5 A 01/06/2002 17 6 dry
6 A 05/01/2002 19 1 rainy
7 A 26/01/2002 23 1 rainy
8 A 16/02/2002 26 2 rainy
9 A 09/03/2002 19 3 rainy
10 A 30/03/2002 19 3 rainy
11 A 20/04/2002 21 4 rainy
12 B 04/01/2000 15 1 rainy
13 B 11/08/2000 23 8 dry
14 B 19/03/2001 18 3 rainy
15 B 25/09/2001 22 9 dry
16 B 04/06/2002 22 6 dry
17 B 08/01/2002 23 1 rainy
18 B 29/01/2002 18 1 rainy
19 B 19/02/2002 19 2 rainy
20 B 12/03/2002 26 3 rainy
21 B 13/09/2001 21 9 dry
22 C 08/01/2000 22 1 rainy
23 C 15/08/2000 23 8 dry
24 C 23/03/2001 27 3 rainy
25 C 29/09/2001 19 9 dry
26 C 08/06/2002 19 6 dry
27 C 12/01/2002 21 1 rainy
28 C 02/02/2002 23 2 rainy
29 C 23/02/2002 24 2 rainy
30 C 16/03/2002 25 3 rainy
31 C 06/04/2002 26 4 rainy
32 A 01/02/2000 29 2 rainy
33 B 01/02/2000 30 2 rainy
34 C 01/02/2000 21 2 rainy
35 A 02/11/2001 25 11 rainy
36 B 02/11/2001 24 11 rainy
37 C 02/11/2001 23 11 rainy
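The same classification can be sketched in base R without extra packages. The first four dates below come from the question; the last two (`30/04/2002` and `31/10/2001`) are assumed values added to check the season boundaries.

```r
# Four dates from the question plus two assumed boundary dates
Date_dmy <- c("01/01/2000", "08/08/2000", "01/06/2002", "02/11/2001",
              "30/04/2002", "31/10/2001")

# Parse day/month/year strings and extract the month as an integer
month_num <- as.integer(format(as.Date(Date_dmy, format = "%d/%m/%Y"), "%m"))

# May (5) to October (10) is dry, everything else is rainy
season <- ifelse(month_num %in% 5:10, "dry", "rainy")
data.frame(Date_dmy, month_num, season)
```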

A running sum for daily data that resets when month turns

I have a 2 column table (tibble), made up of a date object and a numeric variable. There is maximum one entry per day but not every day has an entry (ie date is a natural primary key). I am attempting to do a running sum of the numeric column along with dates but with the running sum resetting when the month turns (the data is sorted by ascending date). I have replicated what I want to get as a result below.
Date score monthly.running.sum
10/2/2019 7 7
10/9/2019 6 13
10/16/2019 12 25
10/23/2019 2 27
10/30/2019 13 40
11/6/2019 2 2
11/13/2019 4 6
11/20/2019 15 21
11/27/2019 16 37
12/4/2019 4 4
12/11/2019 24 28
12/18/2019 28 56
12/25/2019 8 64
1/1/2020 1 1
1/8/2020 15 16
1/15/2020 9 25
1/22/2020 8 33
It looks like the package "runner" is possibly suited to this, but I don't really understand how to instruct it. I know I could use a join operation plus a group_by using dplyr to do this, but the data set is very large and doing so would be wildly inefficient. I could also manually iterate through the list with a loop, but that seems inelegant. The last option I can think of is selecting a unique vector of yearmon objects, cutting the original list into many shorter lists and running a plain cumsum on each, but that also feels suboptimal. I am sure this is not the first time someone has had to do this, and given how many tools there are in the tidyverse, I think I just need help finding the right one. The reason I am looking for a tool instead of using one of the methods I described above (which would take less time than writing this post) is that this code needs to be very readable by an audience that is less comfortable with code.
We can also use data.table
library(data.table)
setDT(df)[, Date := as.IDate(Date, "%m/%d/%Y")
][, monthly.running.sum := cumsum(score),format(Date, "%Y-%m")][]
# Date score monthly.running.sum
# 1: 2019-10-02 7 7
# 2: 2019-10-09 6 13
# 3: 2019-10-16 12 25
# 4: 2019-10-23 2 27
# 5: 2019-10-30 13 40
# 6: 2019-11-06 2 2
# 7: 2019-11-13 4 6
# 8: 2019-11-20 15 21
# 9: 2019-11-27 16 37
#10: 2019-12-04 4 4
#11: 2019-12-11 24 28
#12: 2019-12-18 28 56
#13: 2019-12-25 8 64
#14: 2020-01-01 1 1
#15: 2020-01-08 15 16
#16: 2020-01-15 9 25
#17: 2020-01-22 8 33
data
df <- structure(list(Date = c("10/2/2019", "10/9/2019", "10/16/2019",
"10/23/2019", "10/30/2019", "11/6/2019", "11/13/2019", "11/20/2019",
"11/27/2019", "12/4/2019", "12/11/2019", "12/18/2019", "12/25/2019",
"1/1/2020", "1/8/2020", "1/15/2020", "1/22/2020"), score = c(7L,
6L, 12L, 2L, 13L, 2L, 4L, 15L, 16L, 4L, 24L, 28L, 8L, 1L, 15L,
9L, 8L)), row.names = c(NA, -17L), class = "data.frame")
Using lubridate, you can extract the month and year from the date, group_by those values and then perform the cumulative sum as follows:
library(lubridate)
library(dplyr)
df %>% mutate(Month = month(mdy(Date)),
Year = year(mdy(Date))) %>%
group_by(Month, Year) %>%
mutate(SUM = cumsum(score))
# A tibble: 17 x 6
# Groups: Month, Year [4]
Date score monthly.running.sum Month Year SUM
<chr> <int> <int> <int> <int> <int>
1 10/2/2019 7 7 10 2019 7
2 10/9/2019 6 13 10 2019 13
3 10/16/2019 12 25 10 2019 25
4 10/23/2019 2 27 10 2019 27
5 10/30/2019 13 40 10 2019 40
6 11/6/2019 2 2 11 2019 2
7 11/13/2019 4 6 11 2019 6
8 11/20/2019 15 21 11 2019 21
9 11/27/2019 16 37 11 2019 37
10 12/4/2019 4 4 12 2019 4
11 12/11/2019 24 28 12 2019 28
12 12/18/2019 28 56 12 2019 56
13 12/25/2019 8 64 12 2019 64
14 1/1/2020 1 1 1 2020 1
15 1/8/2020 15 16 1 2020 16
16 1/15/2020 9 25 1 2020 25
17 1/22/2020 8 33 1 2020 33
An alternative is to use the floor_date function to convert each date to the first day of its month and then calculate the cumulative sum:
library(lubridate)
library(dplyr)
df %>% mutate(Floor = floor_date(mdy(Date), unit = "month")) %>%
group_by(Floor) %>%
mutate(SUM = cumsum(score))
# A tibble: 17 x 5
# Groups: Floor [4]
Date score monthly.running.sum Floor SUM
<chr> <int> <int> <date> <int>
1 10/2/2019 7 7 2019-10-01 7
2 10/9/2019 6 13 2019-10-01 13
3 10/16/2019 12 25 2019-10-01 25
4 10/23/2019 2 27 2019-10-01 27
5 10/30/2019 13 40 2019-10-01 40
6 11/6/2019 2 2 2019-11-01 2
7 11/13/2019 4 6 2019-11-01 6
8 11/20/2019 15 21 2019-11-01 21
9 11/27/2019 16 37 2019-11-01 37
10 12/4/2019 4 4 2019-12-01 4
11 12/11/2019 24 28 2019-12-01 28
12 12/18/2019 28 56 2019-12-01 56
13 12/25/2019 8 64 2019-12-01 64
14 1/1/2020 1 1 2020-01-01 1
15 1/8/2020 15 16 2020-01-01 16
16 1/15/2020 9 25 2020-01-01 25
17 1/22/2020 8 33 2020-01-01 33
A base R alternative :
df$Date <- as.Date(df$Date, "%m/%d/%Y")
df$monthly.running.sum <- with(df, ave(score, format(Date, "%Y-%m"),FUN = cumsum))
df
# Date score monthly.running.sum
#1 2019-10-02 7 7
#2 2019-10-09 6 13
#3 2019-10-16 12 25
#4 2019-10-23 2 27
#5 2019-10-30 13 40
#6 2019-11-06 2 2
#7 2019-11-13 4 6
#8 2019-11-20 15 21
#9 2019-11-27 16 37
#10 2019-12-04 4 4
#11 2019-12-11 24 28
#12 2019-12-18 28 56
#13 2019-12-25 8 64
#14 2020-01-01 1 1
#15 2020-01-08 15 16
#16 2020-01-15 9 25
#17 2020-01-22 8 33
The yearmon class represents year/month objects so just convert the dates to yearmon and accumulate by them using this one-liner:
library(zoo)
transform(df, run.sum = ave(score, as.yearmon(Date, "%m/%d/%Y"), FUN = cumsum))
giving:
Date score run.sum
1 10/2/2019 7 7
2 10/9/2019 6 13
3 10/16/2019 12 25
4 10/23/2019 2 27
5 10/30/2019 13 40
6 11/6/2019 2 2
7 11/13/2019 4 6
8 11/20/2019 15 21
9 11/27/2019 16 37
10 12/4/2019 4 4
11 12/11/2019 24 28
12 12/18/2019 28 56
13 12/25/2019 8 64
14 1/1/2020 1 1
15 1/8/2020 15 16
16 1/15/2020 9 25
17 1/22/2020 8 33
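All of the answers above group by year and month together rather than by month alone. A small sketch with hypothetical dates (`dates` and `score` are made up here) shows why: with month-only grouping, January 2020 would continue the running sum started in January 2019.

```r
# Hypothetical data spanning two Januaries
dates <- as.Date(c("2019-01-10", "2019-01-20", "2019-02-05", "2020-01-15"))
score <- c(1, 2, 3, 4)

# Month only: the 2020 January row is added to the 2019 January total
by_month_only <- ave(score, format(dates, "%m"), FUN = cumsum)

# Year and month: the sum resets whenever the calendar month turns
by_year_month <- ave(score, format(dates, "%Y-%m"), FUN = cumsum)

rbind(by_month_only, by_year_month)
```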

Calculate average number of individuals present on each date in R

I have a dataset that contains the residence period (start.date to end.date) of marked individuals (ID) at different sites. My goal is to generate a column that tells me the average number of other individuals per day that were also present at the same site (across the total residence period of each individual).
To do this, I need to determine the total number of individuals that were present per site on each date, summed across the total residence period of each individual. Ultimately, I will divide this sum by the total residence days of each individual to calculate the average. Can anyone help me accomplish this?
I calculated the total number of residence days (total.days) using lubridate and dplyr
mutate(total.days = end.date - start.date + 1)
site ID start.date end.date total.days
1 1 16 5/24/17 6/5/17 13
2 1 46 4/30/17 5/20/17 21
3 1 26 4/30/17 5/23/17 24
4 1 89 5/5/17 5/13/17 9
5 1 12 5/11/17 5/14/17 4
6 2 14 5/4/17 5/10/17 7
7 2 18 5/9/17 5/29/17 21
8 2 19 5/24/17 6/10/17 18
9 2 39 5/5/17 5/18/17 14
First of all, it is always advisable to share a sample of the data in a more convenient format using dput(yourData), so that others can easily recreate it. Here is the dput() output that would have been better to share:
> dput(dat)
structure(list(site = c(1, 1, 1, 1, 1, 2, 2, 2, 2), ID = c(16,
46, 26, 89, 12, 14, 18, 19, 39), start.date = structure(c(17310,
17286, 17286, 17291, 17297, 17290, 17295, 17310, 17291), class = "Date"),
end.date = structure(c(17322, 17306, 17309, 17299, 17300,
17296, 17315, 17327, 17304), class = "Date")), class = "data.frame", row.names = c(NA,
-9L))
To do this easily we first need to unpack the start.date and end.date to individual dates:
newDat <- data.frame()
for (i in 1:nrow(dat)){
expand <- data.frame(site = dat$site[i],
ID = dat$ID[i],
Dates = seq.Date(dat$start.date[i], dat$end.date[i], 1))
newDat <- rbind(newDat, expand)
}
newDat
site ID Dates
1 1 16 2017-05-24
2 1 16 2017-05-25
3 1 16 2017-05-26
4 1 16 2017-05-27
5 1 16 2017-05-28
6 1 16 2017-05-29
7 1 16 2017-05-30
. . .
. . .
Then we calculate the number of other individuals present in each site in each day:
individualCount = newDat %>%
group_by(site, Dates) %>%
summarise(individuals = n_distinct(ID) - 1)
individualCount
# A tibble: 75 x 3
# Groups: site [?]
site Dates individuals
<dbl> <date> <int>
1 1 2017-04-30 1
2 1 2017-05-01 1
3 1 2017-05-02 1
4 1 2017-05-03 1
5 1 2017-05-04 1
6 1 2017-05-05 2
7 1 2017-05-06 2
8 1 2017-05-07 2
9 1 2017-05-08 2
10 1 2017-05-09 2
# ... with 65 more rows
Then, we augment our data with the new information using left_join() and calculate the required average:
newDat <- left_join(newDat, individualCount, by = c("site", "Dates")) %>%
group_by(site, ID) %>%
summarise(duration = max(Dates) - min(Dates)+1,
av.individuals = mean(individuals))
newDat
# A tibble: 9 x 4
# Groups: site [?]
site ID duration av.individuals
<dbl> <dbl> <time> <dbl>
1 1 12 4 2.75
2 1 16 13 0
3 1 26 24 1.42
4 1 46 21 1.62
5 1 89 9 2.33
6 2 14 7 1.14
7 2 18 21 0.857
8 2 19 18 0.333
9 2 39 14 1.14
The final step is to add the required column to the original dataset (dat) again with left_join():
dat <- dat %>% left_join(newDat, by = c("site", "ID"))
dat
site ID start.date end.date duration av.individuals
1 1 16 2017-05-24 2017-06-05 13 days 0.000000
2 1 46 2017-04-30 2017-05-20 21 days 1.619048
3 1 26 2017-04-30 2017-05-23 24 days 1.416667
4 1 89 2017-05-05 2017-05-13 9 days 2.333333
5 1 12 2017-05-11 2017-05-14 4 days 2.750000
6 2 14 2017-05-04 2017-05-10 7 days 1.142857
7 2 18 2017-05-09 2017-05-29 21 days 0.857143
8 2 19 2017-05-24 2017-06-10 18 days 0.333333
9 2 39 2017-05-05 2017-05-18 14 days 1.142857
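As a cross-check that avoids expanding every residence period into daily rows, the average can also be computed from pairwise overlaps: for each individual, sum the days shared with every other individual at the same site and divide by its own residence days. This is a sketch that assumes each ID appears once per site; the helper `avg_others` is introduced here for illustration.

```r
# Same data as the dput() output above, written with explicit dates
dat <- data.frame(
  site = c(1, 1, 1, 1, 1, 2, 2, 2, 2),
  ID   = c(16, 46, 26, 89, 12, 14, 18, 19, 39),
  start.date = as.Date(c("2017-05-24", "2017-04-30", "2017-04-30",
                         "2017-05-05", "2017-05-11", "2017-05-04",
                         "2017-05-09", "2017-05-24", "2017-05-05")),
  end.date   = as.Date(c("2017-06-05", "2017-05-20", "2017-05-23",
                         "2017-05-13", "2017-05-14", "2017-05-10",
                         "2017-05-29", "2017-06-10", "2017-05-18"))
)

# Average number of other individuals per residence day for row i
avg_others <- function(i) {
  others <- dat$site == dat$site[i] & seq_len(nrow(dat)) != i
  # Shared days with each other resident (inclusive), 0 when periods do not overlap
  shared <- as.numeric(pmin(dat$end.date[i], dat$end.date[others]) -
                       pmax(dat$start.date[i], dat$start.date[others])) + 1
  sum(pmax(0, shared)) / (as.numeric(dat$end.date[i] - dat$start.date[i]) + 1)
}

dat$av.individuals <- vapply(seq_len(nrow(dat)), avg_others, numeric(1))
dat
```

For ID 12, for example, the four residence days have 3, 3, 3 and 2 other individuals present, which gives 11 / 4 = 2.75.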

Calculating yearly growth-rates from quarterly, long form data in r

My data takes the following form:
df <- data.frame(Sector=c(rep("A",8),rep("B",8)), Country = c(rep("USA", 16)),
Quarter=rep(1:8,2),Income=20:35)
df2 <- data.frame(Sector=c(rep("A",8),rep("B",8)), Country = c(rep("UK", 16)),
Quarter=rep(1:8,2),Income=32:47)
df <- rbind(df, df2)
What I want to do is to calculate the growth rate from the first quarter each year to the first quarter the second year, within country and sector. In the example above it would be the growth rate from quarter 1 to quarter 5. So for Sector A, in the USA, it would be (24/20)-1=0.2
I then want to append this data to the dataframe as a new column.
I looked at the solutions in:
How calculate growth rate in long format data frame?
But I didn't have the R skills to get it to work when the lag is more than one time unit. Any suggestions?
ADDITION
So what I want is the growth rate, that is (24/20)-1=0.2 in the example below, not 1-(24/20), which I first wrote. The desired output should look something like this:
Sector Country Quarter Income growth
(fctr) (fctr) (int) (int) (dbl)
1 A USA 1 20 NA
2 A USA 2 21 NA
3 A USA 3 22 NA
4 A USA 4 23 NA
5 A USA 5 24 0.2
6 A USA 6 25 0.1904
7 A USA 7 26 0.1818
I think you need something like this:
library(dplyr)
df %>%
#group by sector and country
group_by(Sector, Country) %>%
#calculate growth as (quarter / 4-period-lagged quarter) - 1
mutate(growth = Income / lag(Income, 4) - 1)
Output
Source: local data frame [32 x 5]
Groups: Sector, Country [4]
Sector Country Quarter Income growth
(fctr) (fctr) (int) (int) (dbl)
1 A USA 1 20 NA
2 A USA 2 21 NA
3 A USA 3 22 NA
4 A USA 4 23 NA
5 A USA 5 24 0.2000000
6 A USA 6 25 0.1904762
7 A USA 7 26 0.1818182
8 A USA 8 27 0.1739130
9 B USA 1 28 NA
10 B USA 2 29 NA
.. ... ... ... ... ...
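The four-quarter lag can also be sketched in base R with `ave()`. This assumes the rows are already sorted by Quarter within each Sector and Country; `lag4` is a helper defined here, not a library function, and only the USA part of the example data is used.

```r
# USA part of the question's data
df <- data.frame(Sector  = rep(c("A", "B"), each = 8),
                 Country = "USA",
                 Quarter = rep(1:8, 2),
                 Income  = 20:35)

# Shift a vector down by four positions, padding the start with NA
lag4 <- function(x) c(rep(NA, 4), x)[seq_along(x)]

# Income four quarters earlier within each Sector/Country group
prev <- ave(df$Income, df$Sector, df$Country, FUN = lag4)

# Year-on-year growth: current quarter over the same quarter a year before, minus 1
df$growth <- df$Income / prev - 1
df
```

The first four quarters of each group have no value a year earlier, so their growth is NA.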
# Base R: shift a copy forward by four quarters so each row meets the same quarter one year earlier
df3 <- df
df3$Quarter <- df3$Quarter + 4
df <- merge(df, df3, by = c('Sector','Country','Quarter'), suffixes = c('','_prev'), all.x = TRUE)
# growth relative to the same quarter of the previous year
df$growth <- df$Income / df$Income_prev - 1
> df
Sector Country Quarter Income Income_prev growth
1 A USA 1 20 NA NA
2 A USA 2 21 NA NA
3 A USA 3 22 NA NA
4 A USA 4 23 NA NA
5 A USA 5 24 20 0.20000000
6 A USA 6 25 21 0.19047619
7 A USA 7 26 22 0.18181818
8 A USA 8 27 23 0.17391304
9 A UK 1 32 NA NA
10 A UK 2 33 NA NA
11 A UK 3 34 NA NA
12 A UK 4 35 NA NA
13 A UK 5 36 32 0.12500000
14 A UK 6 37 33 0.12121212
15 A UK 7 38 34 0.11764706
16 A UK 8 39 35 0.11428571
17 B USA 1 28 NA NA
18 B USA 2 29 NA NA
19 B USA 3 30 NA NA
20 B USA 4 31 NA NA
21 B USA 5 32 28 0.14285714
22 B USA 6 33 29 0.13793103
23 B USA 7 34 30 0.13333333
24 B USA 8 35 31 0.12903226
25 B UK 1 40 NA NA
26 B UK 2 41 NA NA
27 B UK 3 42 NA NA
28 B UK 4 43 NA NA
29 B UK 5 44 40 0.10000000
30 B UK 6 45 41 0.09756098
31 B UK 7 46 42 0.09523810
32 B UK 8 47 43 0.09302326
