Calculate annual average of quarterly data in R [duplicate] - r

I have a dataframe with some TS data reported quarterly, as follows
quarter region value
2018T4 A 4
2018T3 A 2
2018T2 A 3
2018T1 A 9
2018T4 B 6
2018T3 B 2
2018T2 B 5
2018T1 B 8
2017T4 A 2
...
I want to aggregate the quarterly observations and average them to obtain an annual mean value for each year and region, as such
quarter region value
2018 A 4.5
2018 B 5.25
2017 A 2
...
What would be an appropriate approach to this?

We can strip the quarter suffix from the quarter column to get the year, then take the mean by year and region.
aggregate(value~year+region, transform(df, year = sub('T.*', '', quarter)), mean)
# year region value
#1 2017 A 2.00
#2 2018 A 4.50
#3 2018 B 5.25
The same using dplyr:
library(dplyr)
df %>%
group_by(year = sub('T.*', '', quarter), region) %>%
summarise(value = mean(value))
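The question does not give its data in a reproducible form, so here is a minimal base R sketch that rebuilds only the rows shown there (the "..." rows are left out) and checks the result against the desired output. The names quarterly and annual are made up for the illustration, and substr is swapped in for sub on the assumption that the year is always the first four characters.

```r
# Rows copied from the question; the elided "..." rows are omitted.
quarterly <- data.frame(
  quarter = c("2018T4", "2018T3", "2018T2", "2018T1",
              "2018T4", "2018T3", "2018T2", "2018T1", "2017T4"),
  region  = c(rep("A", 4), rep("B", 4), "A"),
  value   = c(4, 2, 3, 9, 6, 2, 5, 8, 2)
)

# Assumption: the year is always the first four characters of `quarter`.
quarterly$year <- as.integer(substr(quarterly$quarter, 1, 4))

# Mean of the quarterly values for each year/region pair.
annual <- aggregate(value ~ year + region, data = quarterly, FUN = mean)
print(annual)
```

With only one 2017 row available, the 2017 mean for region A is just that single value.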

Related

Six-month peak-season running average

I'm trying to implement this:
The recommendation is a peak season ozone AQG level of 60 μg/m3
(the average of daily maximum 8-hour mean ozone concentrations).
The peak season is defined as the six consecutive months of the year
with the highest six-month running-average ozone concentration.
In regions away from the equator, this period will typically be in the
warm season within a single calendar year (northern hemisphere)
or spanning two calendar years (southern hemisphere). Close to
the equator, such clear seasonal patterns may not be obvious, but a
running-average six-month peak season will usually be identifiable
from existing monitoring or modelling data.
I have:
# A tibble: 300 × 2
date value
<dttm> <dbl>
1 1997-01-01 00:00:00 NA
2 1997-02-01 00:00:00 NA
3 1997-03-01 00:00:00 NA
4 1997-04-01 00:00:00 30.2
5 1997-05-01 00:00:00 20.9
6 1997-06-01 00:00:00 10.1
7 1997-07-01 00:00:00 9.40
8 1997-08-01 00:00:00 22.4
9 1997-09-01 00:00:00 26.2
10 1997-10-01 00:00:00 32.9
# … with 290 more rows
Every year is complete (with or without NA). I found the peaks by "findpeaks" from pracma package, and get:
peaks = findpeaks(mda8_omit$value, minpeakdistance = 6,
minpeakheight = mean(mda8_omit$value))
How do I optimize this to get the best six months around each peak? For the northern hemisphere it is easier because the peak falls within a single year (summer), but in the southern hemisphere it is split across two years, and the peaks may shift depending on latitude. Any ideas on how to continue?
Assuming that:
we only use windows with 6 consecutive months of data,
the year a window falls in is determined by the last month of the window,
we compare all such windows, at most 12, within each calendar year,
calculate the rolling mean and then, grouping by year, take the row with the largest rolling mean within each year. This row is the last month of the 6-month window. The input is shown reproducibly in the Note at the end.
library(dplyr)
library(zoo)
DF %>%
mutate(date = as.yearmon(date),
peakmean = rollapplyr(value, 6, mean, fill = NA)) %>%
group_by(year = as.integer(date)) %>%
slice_max(peakmean) %>%
ungroup %>%
select(-year)
## # A tibble: 1 × 3
## date value peakmean
## <yearmon> <dbl> <dbl>
## 1 Oct 1997 32.9 20.3
Note
Lines <- "date value
1 1997-01-01T00:00:00 NA
2 1997-02-01T00:00:00 NA
3 1997-03-01T00:00:00 NA
4 1997-04-01T00:00:00 30.2
5 1997-05-01T00:00:00 20.9
6 1997-06-01T00:00:00 10.1
7 1997-07-01T00:00:00 9.40
8 1997-08-01T00:00:00 22.4
9 1997-09-01T00:00:00 26.2
10 1997-10-01T00:00:00 32.9"
DF <- read.table(text = Lines)
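To see how the same assumptions behave when the peak season straddles two calendar years, here is a base R sketch on a made-up series (not the asker's data) whose seasonal peak sits on the December/January boundary. It swaps stats::filter in for rollapplyr as the trailing mean; the names ozone, roll6, window_start and window_end are hypothetical.

```r
# Hypothetical monthly series: three years with a seasonal peak centred
# on the December/January boundary (southern-hemisphere-like).
dates <- seq(as.Date("1997-01-01"), by = "month", length.out = 36)
month <- as.integer(format(dates, "%m"))
value <- 40 + 20 * cos(2 * pi * (month - 0.5) / 12)

# Trailing 6-month mean; the first five entries are NA, and so is any
# window that contains an NA.
roll6 <- as.numeric(stats::filter(value, rep(1 / 6, 6), sides = 1))

ozone <- data.frame(
  year         = as.integer(format(dates, "%Y")),  # year of the window's last month
  window_start = dates[c(rep(NA_integer_, 5), seq_len(length(dates) - 5))],
  window_end   = dates,
  roll6        = roll6
)
ozone <- ozone[!is.na(ozone$roll6), ]

# Within each year, keep the window with the largest rolling mean.
peak <- do.call(rbind, lapply(split(ozone, ozone$year),
                              function(d) d[which.max(d$roll6), ]))
print(peak)
```

For this series the best window attributed to 1998 runs from October 1997 to March 1998, so a window that spans two calendar years is picked up without special handling. The first year only has the windows ending June to December, because earlier ones would need data from before the series starts.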

R Calculate change in Weekly values Year on Year (with additional complication)

I have a data set of daily value. It spans from Dec-1 2018 to April-1 2020.
The columns are "date" and "value". As shown here:
date <- c("2018-12-01","2018-12-02", "2018-12-03",
...
"2020-03-30","2020-03-31","2020-04-01")
value <- c(1592,1825,1769,1909,2022, .... 2287,2169,2366,2001,2087,2099,2258)
df <- data.frame(date,value)
What I would like to do is sum the values by week and then calculate the week-over-week change from the previous year to the current one.
I know that I can sum by week using the following function:
Data_week <- df%>% group_by(category ,week = cut(date, "week")) %>% mutate(summed= sum(value))
My questions are twofold:
1) How do I sum by week and then manipulate the dataframe so that I can calculate week over week change (e.g. week dec.1 2019/ week dec.1 2018).
2) How can I do that above, but using a "customized" week. Let's say I want to define a week as moving 7 days back from the latest date I have data for. Eg. the latest week I would have would be week starting on March 26th (April 1st -7 days).
We can use lag from dplyr, together with some convenience functions from lubridate, to sum by week and take the difference from the preceding row (the previous week).
library(dplyr)
library(lubridate)
df %>%
mutate(year = year(date)) %>%
group_by(week = week(date),year) %>%
summarize(summed = sum(value)) %>%
arrange(year, week) %>%
ungroup %>%
mutate(change = summed - lag(summed))
# week year summed change
# <dbl> <dbl> <dbl> <dbl>
# 1 48 2018 3638. NA
# 2 49 2018 15316. 11678.
# 3 50 2018 13283. -2033.
# 4 51 2018 15166. 1883.
# 5 52 2018 12885. -2281.
# 6 53 2018 1982. -10903.
# 7 1 2019 14177. 12195.
# 8 2 2019 14969. 791.
# 9 3 2019 14554. -415.
#10 4 2019 12850. -1704.
#11 5 2019 1907. -10943.
If you would like to define "weeks" in different ways, there is also isoweek and epiweek. See this answer for a great explanation of your options.
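For the "customized" week in the second part of the question, one possible approach is to count 7-day blocks backwards from the latest date. This is only a sketch in base R on made-up values; weeks_back and week_start are hypothetical column names.

```r
# Hypothetical daily data ending on the latest date mentioned in the question.
dates <- seq(as.Date("2020-03-01"), as.Date("2020-04-01"), by = "day")
daily <- data.frame(date = dates, value = seq_along(dates))

latest <- max(daily$date)

# 0 for the most recent 7 days, 1 for the 7 days before that, and so on.
daily$weeks_back <- as.integer(latest - daily$date) %/% 7

# First day of each 7-day block (the block ends 6 days later).
daily$week_start <- latest - 7 * daily$weeks_back - 6

custom <- aggregate(value ~ week_start, data = daily, FUN = sum)
print(custom)
```

The most recent block starts on March 26th and ends on April 1st, as described in the question. The oldest block is usually incomplete, so it may need to be dropped before comparing totals.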
Data
set.seed(1)
df <- data.frame(date = seq.Date(from = as.Date("2018-12-01"), to = as.Date("2019-01-29"), "days"), value = runif(60,1500,2500))
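To compare each week with the same week one year earlier, rather than with the preceding week, one option is to join the weekly totals to a copy of themselves shifted forward by a year. The sketch below uses base R and made-up data over the span described in the question; the week number is computed by hand as 7-day blocks counted from January 1st, which is how lubridate::week is documented to behave. The names weekly, previous and yoy are hypothetical.

```r
# Hypothetical daily data covering the span described in the question.
set.seed(1)
daily <- data.frame(date = seq(as.Date("2018-12-01"), as.Date("2020-04-01"), by = "day"))
daily$value <- runif(nrow(daily), 1500, 2500)

daily$year <- as.integer(format(daily$date, "%Y"))
# 7-day blocks counted from January 1st (day-of-year 1..7 is week 1).
daily$week <- (as.integer(format(daily$date, "%j")) - 1L) %/% 7L + 1L

weekly <- aggregate(value ~ week + year, data = daily, FUN = sum)

# Shift each total forward one year so it lines up with the following year's week.
previous <- data.frame(week = weekly$week,
                       year = weekly$year + 1L,
                       value_prev = weekly$value)

yoy <- merge(weekly, previous, by = c("week", "year"))
yoy$change <- yoy$value - yoy$value_prev
yoy$ratio  <- yoy$value / yoy$value_prev
head(yoy[order(yoy$year, yoy$week), ])
```

Weeks at the edges of the data and week 53 cover fewer than seven days, so their sums are not directly comparable with full weeks.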

Age calculation for observation data in R [duplicate]

I have a large but simply structured set of observation data, hypothetically structured as below:
> df = data.frame(ID = c("oak", "birch", rep("oak",2), "pine", "birch", "oak", rep("pine",2), "birch", "oak"),
+ yearobs = c(rep(1998,3), rep(1999,2), rep(2000,3),rep(2001,2), 2002))
> df
ID yearobs
1 oak 1998
2 birch 1998
3 oak 1998
4 oak 1999
5 pine 1999
6 birch 2000
7 oak 2000
8 pine 2000
9 pine 2001
10 birch 2001
11 oak 2002
What I want to do is calculate the age by taking the difference between the years ( max(yearobs)-min(yearobs) ) for each unique ID (tree species in this example). I have tried to work with the lubridate and dplyr packages; however, the number of observations for each unique ID varies in my data, and I want to create an age column in the fastest way possible without storing the minimum and maximum values separately (avoiding for loops, since my data is huge).
Desired output:
ID age
1 oak 4
2 birch 3
3 pine 2
Any suggestion would be appreciated.
In base R you can do:
aggregate(yearobs ~ ID, data = df, FUN = function(x) max(x) - min(x))
# ID yearobs
# 1 birch 3
# 2 oak 4
# 3 pine 2
An option is to group by 'ID' and get the difference between the min and max of 'yearobs' column
library(dplyr)
df %>%
group_by(ID) %>%
summarise(age = max(yearobs) - min(yearobs))
Also, if we need to do this fast, then data.table would be another option
library(data.table)
setDT(df)[, .(age = max(yearobs) - min(yearobs)), by = ID]
Or using base R
by(df['yearobs'], df$ID, FUN = function(x) max(x)- min(x))
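As a quick self-contained check of the numbers, here is a sketch that rebuilds the sample data from the question and uses tapply with diff(range(x)), which is just another way of writing max(x) - min(x). The names trees, age and result are made up for the illustration.

```r
# Sample data from the question.
trees <- data.frame(
  ID = c("oak", "birch", rep("oak", 2), "pine", "birch", "oak",
         rep("pine", 2), "birch", "oak"),
  yearobs = c(rep(1998, 3), rep(1999, 2), rep(2000, 3), rep(2001, 2), 2002)
)

# range() returns c(min, max), so diff() of it is max - min per group.
age <- tapply(trees$yearobs, trees$ID, function(x) diff(range(x)))

result <- data.frame(ID = names(age), age = as.vector(age))
print(result)
```

On this data pine is observed from 1999 to 2001, so its age comes out as 2.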

How to calculate/count the number of extreme precipitation events (above a "threshold") from daily rainfall data in each month per year basis

I am working on daily rainfall data and trying to evaluate the extreme events from the time series data above a certain threshold value in each month per year i.e. the number of times the rainfall exceeded a certain threshold in each month per year.
The rainfall timeseries data is from St Lucia and has two columns:
"YEARMODA" - defining the time (format- YYYYMMDD)
"PREP" - rainfall in mm (numeric)
library(readxl)
StLucia <- read_excel("C:/Users/hp/Desktop/StLuciaProject.xlsx")
The data frame I'm working on, "Precip1", has two columns:
Time (format YYYY-MM-DD)
Precipitation (numeric value)
The code is provided below:
library("imputeTS")
StLucia$YEARMODA <- as.Date(as.character(StLucia$YEARMODA), format = "%Y%m%d")
data1 <- na_ma(StLucia$PREP, k=4, weighting = "exponential")
Precip1 <- data.frame(Time= StLucia$YEARMODA, Precipitation= data1, check.rows = TRUE)
I found out the threshold value based on the 95th percentile and 99th percentile using function quantile().
I now want to count the number of "extreme events" of rainfall above this threshold in each month on per year basis.
Please help me out on this. I would be highly obliged by your help. Thank You!
If you are open to a tidyverse method, here is an example with the economics dataset that is built into ggplot2. We can use ntile to assign a percentile group to each observation. Then we group_by the year, and get a count of the values that are in the desired percentiles. Because this is monthly data the counts are pretty low, but it's easily translated to daily data.
library(tidyverse)
library(lubridate) # for year(); only attached by tidyverse from version 2.0.0
thresholds <- economics %>%
mutate(
pctile = ntile(unemploy, 100),
year = year(date)
) %>%
group_by(year) %>%
summarise(
q95 = sum(pctile >= 95L),
q99 = sum(pctile >= 99L)
)
arrange(thresholds, desc(q95))
#> # A tibble: 49 x 3
#> year q95 q99
#> <dbl> <int> <int>
#> 1 2010 12 6
#> 2 2011 12 0
#> 3 2009 10 5
#> 4 1967 0 0
#> 5 1968 0 0
#> 6 1969 0 0
#> 7 1970 0 0
#> 8 1971 0 0
#> 9 1972 0 0
#> 10 1973 0 0
#> # ... with 39 more rows
Created on 2018-06-04 by the reprex package (v0.2.0).
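The example above counts per year only. For counts per month within each year, as the question asks, a possible base R sketch is shown below. It runs on made-up daily rainfall rather than the St Lucia file, reuses the Time and Precipitation column names from the question, and takes the threshold from quantile over the whole series; rain, threshold, extreme and counts are hypothetical names.

```r
# Hypothetical daily rainfall (not the St Lucia data): two years of skewed values.
set.seed(42)
rain <- data.frame(Time = seq(as.Date("2000-01-01"), as.Date("2001-12-31"), by = "day"))
rain$Precipitation <- rgamma(nrow(rain), shape = 0.5, scale = 10)

# Threshold from the 95th percentile of the whole series.
threshold <- quantile(rain$Precipitation, 0.95, na.rm = TRUE)

rain$year    <- as.integer(format(rain$Time, "%Y"))
rain$month   <- as.integer(format(rain$Time, "%m"))
rain$extreme <- rain$Precipitation > threshold

# Summing a logical column counts the TRUE values, i.e. the exceedances.
counts <- aggregate(extreme ~ month + year, data = rain, FUN = sum)
head(counts)
```

Months without any exceedance still appear with a count of 0, because every month has rows in the daily data. Swapping 0.95 for 0.99 gives the counts for the 99th percentile threshold.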

Calculating age per animal by subtracting years in R

I am looking to calculate relative age of animals. I need to subtract sequentially each year from the next for each animal in my dataset. Because an animal can have multiple reproductive events in a year, I need the age for the remaining events in that year (i.e. all events after the first) to be the same as the initial calculation.
Update:
The dataset more resembles this:
Year ID Age
1 1975 6 -1
2 1975 6 -1
3 1976 6 -1
4 1977 6 -1
6 1975 9 -1
8 1978 9 -1
And I need it to look like this
Year ID Age
1 1975 6 0
2 1975 6 0
3 1976 6 1
4 1977 6 2
6 1975 9 0
8 1978 9 3
Apologies for the initial confusion, if I wasn't clear on what I needed to accomplish.
Any help would be greatly appreciated.
Things done "by group" are usually easiest to do using dplyr or data.table
library(dplyr)
your_data %>%
group_by(ID) %>% # group by ID
mutate(Age = Year - min(Year)) # add new column
or
library(data.table)
setDT(your_data) # convert to data table
# add new column by group
your_data[, Age := Year - min(Year), by = ID]
In base R, ave is probably easiest for adding a groupwise column to existing data:
your_data$Age = with(your_data, ave(Year, ID, FUN = function(x) x - min(x)))
but the syntax isn't as nice as the options above.
You can test on this data:
your_data = read.table(text = " Year ID Age
1 1975 6 -1
2 1975 6 -1
3 1976 6 -1
4 1977 6 -1
6 1975 9 -1
8 1978 9 -1 ", header = TRUE)
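A short self-contained check of the base R route against the desired output from the question is sketched below. One detail worth knowing: ave takes its grouping variables through "...", so the function has to be passed by name as FUN, otherwise it is treated as another grouping variable. The data frame name animals is made up for the illustration.

```r
# Sample rows from the question (Year and ID only; Age is recomputed).
animals <- data.frame(
  Year = c(1975, 1975, 1976, 1977, 1975, 1978),
  ID   = c(6, 6, 6, 6, 9, 9)
)

# Years elapsed since each animal's first recorded year; FUN must be named.
animals$Age <- ave(animals$Year, animals$ID, FUN = function(x) x - min(x))
print(animals)
```

Repeated events in the same year get the same age, since the subtraction only depends on Year and the animal's first year.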
If you're trying to figure out the relative age based on one initial birth year, 1975 (which it seems like you are), then you can just make a new column called "RelativeAge" and set it equal to the year minus 1975:
data$RelativeAge = data$Year - 1975
Then just get rid of the original "Age" column, or rename it as necessary.
