Sum daily values into monthly values - r

I am trying to sum daily rainfall values into monthly totals for a record over 100 years in length. My data takes the form:
Year Month Day Rain
1890 1 1 0
1890 1 2 3.1
1890 1 3 2.5
1890 1 4 15.2
In the example above I want R to sum all the days of rainfall in January 1890, then February 1890, March 1890.... through to December 2010. I guess what I'm trying to do is create a loop to sum values. My output file should look like:
Year Month Rain
1890 1 80.5
1890 2 72.4
1890 3 66.8
1890 4 77.2
Any easy way to do this?
Many thanks.

You can use dplyr for some pleasing syntax:
library(dplyr)
df %>%
  group_by(Year, Month) %>%
  summarise(Rain = sum(Rain))
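If you would rather not load a package, roughly the same grouping can be sketched in base R with aggregate(). This is only a minimal sketch: the data frame name rain_df and the February rows are made up for illustration (the January rows are the ones from the question).

```r
# Illustrative data: the January rows come from the question,
# the February rows are invented for this sketch.
rain_df <- data.frame(
  Year  = c(1890, 1890, 1890, 1890, 1890, 1890),
  Month = c(1, 1, 1, 1, 2, 2),
  Day   = c(1, 2, 3, 4, 1, 2),
  Rain  = c(0, 3.1, 2.5, 15.2, 1.0, 4.5)
)

# Sum Rain within every Year/Month combination
monthly <- aggregate(Rain ~ Year + Month, data = rain_df, FUN = sum)

# Put the rows in chronological order
monthly <- monthly[order(monthly$Year, monthly$Month), ]
print(monthly)
```

The formula interface drops rows where Rain is NA by default, so check how missing days should be treated before relying on the totals.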

In some cases it can be beneficial to convert it to a time-series class like xts, then you can use functions like apply.monthly().
Data:
df <- data.frame(
Year = rep(1890,5),
Month = c(1,1,1,2,2),
Day = 1:5,
rain = rexp(5)
)
> head(df)
Year Month Day rain
1 1890 1 1 0.1528641
2 1890 1 2 0.1603080
3 1890 1 3 0.5363315
4 1890 2 4 0.6368029
5 1890 2 5 0.5632891
Convert it to xts and use apply.monthly():
library(xts)
dates <- with(df, as.Date(paste(Year, Month, Day), format = "%Y %m %d"))
myXts <- xts(df$rain, dates)
> head(apply.monthly(myXts, sum))
[,1]
1890-01-03 0.8495036
1890-02-05 1.2000919
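For comparison, the same date-based grouping can be sketched without xts by building a year-month key from the dates and summing with tapply(). The data frame daily and its rain values are made up for this sketch, so treat it as an illustration rather than a replacement for apply.monthly().

```r
# Illustrative daily data (the rain values are invented)
daily <- data.frame(
  Year  = rep(1890, 5),
  Month = c(1, 1, 1, 2, 2),
  Day   = 1:5,
  rain  = c(0.5, 1.5, 2.0, 3.0, 1.5)
)

# Build proper Date objects, then a "YYYY-MM" key for each day
dates     <- as.Date(paste(daily$Year, daily$Month, daily$Day), format = "%Y %m %d")
month_key <- format(dates, "%Y-%m")

# Sum the rainfall within each year-month key
monthly_total <- tapply(daily$rain, month_key, sum)
print(monthly_total)
```

Because the key is formatted as year then month, the result is already in chronological order.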

Related

Six-month peak-season running average

I'm trying to implement this:
The recommendation is a peak season ozone AQG level of 60 μg/m3
(the average of daily maximum 8-hour mean ozone concentrations).
The peak season is defined as the six consecutive months of the year
with the highest six-month running-average ozone concentration.
In regions away from the equator, this period will typically be in the
warm season within a single calendar year (northern hemisphere)
or spanning two calendar years (southern hemisphere). Close to
the equator, such clear seasonal patterns may not be obvious, but a
running-average six-month peak season will usually be identifiable
from existing monitoring or modelling data.
I have:
# A tibble: 300 × 2
date value
<dttm> <dbl>
1 1997-01-01 00:00:00 NA
2 1997-02-01 00:00:00 NA
3 1997-03-01 00:00:00 NA
4 1997-04-01 00:00:00 30.2
5 1997-05-01 00:00:00 20.9
6 1997-06-01 00:00:00 10.1
7 1997-07-01 00:00:00 9.40
8 1997-08-01 00:00:00 22.4
9 1997-09-01 00:00:00 26.2
10 1997-10-01 00:00:00 32.9
# … with 290 more rows
Every year is complete (with or without NA). I found the peaks by "findpeaks" from pracma package, and get:
peaks = findpeaks(mda8_omit$value, minpeakdistance = 6,
minpeakheight = mean(mda8_omit$value))
How do I find the best six-month window around each peak? For the northern hemisphere this is easier because the peak falls within a single year (summer), but in the southern hemisphere the peak season is split across two years, and the peaks may shift depending on latitude. Any ideas on how to continue?
Assuming that:
- we only use windows with 6 consecutive months of data
- the year that a window falls in is determined by the last month of the window
- we compare all such windows, at most 12, within each calendar year
Calculate the rolling mean and then, grouping by year, take the row with the largest rolling mean within each year. This row is the last month of the 6-month window. The input is shown reproducibly in the Note at the end.
library(dplyr)
library(zoo)
DF %>%
mutate(date = as.yearmon(date),
peakmean = rollapplyr(value, 6, mean, fill = NA)) %>%
group_by(year = as.integer(date)) %>%
slice_max(peakmean) %>%
ungroup %>%
select(-year)
## # A tibble: 1 × 3
## date value peakmean
## <yearmon> <dbl> <dbl>
## 1 Oct 1997 32.9 20.3
Note
Lines <- "date value
1 1997-01-01T00:00:00 NA
2 1997-02-01T00:00:00 NA
3 1997-03-01T00:00:00 NA
4 1997-04-01T00:00:00 30.2
5 1997-05-01T00:00:00 20.9
6 1997-06-01T00:00:00 10.1
7 1997-07-01T00:00:00 9.40
8 1997-08-01T00:00:00 22.4
9 1997-09-01T00:00:00 26.2
10 1997-10-01T00:00:00 32.9"
DF <- read.table(text = Lines)
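To see what the trailing 6-month mean is doing, here is a minimal base R sketch on the same ten values, using stats::filter() in place of rollapplyr(). The variable names value and roll6 are just for illustration, and the sketch only covers the single year shown in the question.

```r
# The ten monthly values from the question (January to October 1997)
value <- c(NA, NA, NA, 30.2, 20.9, 10.1, 9.40, 22.4, 26.2, 32.9)

# Trailing 6-month mean: sides = 1 uses the current month and the 5 before it.
# Any window that contains an NA (or runs off the start) yields NA.
roll6 <- as.numeric(stats::filter(value, rep(1 / 6, 6), sides = 1))

# Position of the last month of the window with the highest mean
best_end <- which.max(roll6)
print(round(roll6, 1))
print(best_end)
```

Here the only complete windows end in September and October, and the October window has the larger mean, which agrees with the output above.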

r - Fill in missing years in Data frame [duplicate]

I have some data in R that looks like this.
year freq
<int> <int>
1902 2
1903 2
1905 1
1906 4
1907 1
1908 1
1909 1
1912 1
1914 1
1915 1
The data was read in using the following code.
data = read.csv("earthquakes.csv")
my_var <- c('year')
new_data <- data[my_var]
counts <- count(data, 'year')
This is 1 page of a 7 page table. I need to fill in the missing years with a count of 0 from 1900-1999. How would I go about this? I haven't been able to find an example online where year is the primary column.
We may use complete() from tidyr on the 'counts' data:
library(tidyr)
complete(counts, year = 1900:1999, fill = list(freq = 0))
1) Convert the input, shown in the Note, to zoo class and then to ts class. The latter will fill in the missing years with NA. Replace the NAs with 0, convert back to a data frame and set the names to the original names.
If a ts series is ok as output then omit the last two lines. If in addition it is ok to use NA rather than 0 then omit the last three lines.
library(zoo)
DF |>
read.zoo() |>
as.ts() |>
na.fill(0) |>
fortify.zoo() |>
setNames(names(DF))
giving:
year freq
1 1902 2
2 1903 2
3 1904 0
4 1905 1
5 1906 4
6 1907 1
7 1908 1
8 1909 1
9 1910 0
10 1911 0
11 1912 1
12 1913 0
13 1914 1
14 1915 1
2) For a base solution use merge. Omit the last line if NA is ok instead of 0.
m <- merge(DF, data.frame(year = min(DF$year):max(DF$year)), all = TRUE)
transform(m, freq = replace(freq, is.na(freq), 0))
Note
Lines <- "year freq
1902 2
1903 2
1905 1
1906 4
1907 1
1908 1
1909 1
1912 1
1914 1
1915 1"
DF <- read.table(text = Lines, header = TRUE)
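Both solutions above only fill the gaps between the first and last year present in the data. If the full 1900-1999 range from the question is wanted, the merge approach can be sketched like this (a minimal sketch, with the names all_years and filled chosen just for illustration):

```r
# Same input as in the Note
Lines <- "year freq
1902 2
1903 2
1905 1
1906 4
1907 1
1908 1
1909 1
1912 1
1914 1
1915 1"
DF <- read.table(text = Lines, header = TRUE)

# Every year that should appear in the result
all_years <- data.frame(year = 1900:1999)

# Left join: keep all 100 years, freq is NA where no count was recorded
filled <- merge(all_years, DF, by = "year", all.x = TRUE)

# Replace the NA counts with 0
filled$freq[is.na(filled$freq)] <- 0
head(filled)
```

The original counts are untouched, and years without an entry get 0.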

How to calculate the average year

I have a 20-year monthly XTS time series
Jan 1990 12.3
Feb 1990 45.6
Mar 1990 78.9
..
Jan 1991 34.5
..
Dec 2009 89.0
I would like to get the average (12-month) year, or
Jan xx
Feb yy
...
Dec kk
where xx is the average of every January, yy of every February, and so on.
I have tried apply.yearly and lapply, but these return a single value, which is the 20-year total average.
Would you have any suggestions? I appreciate it.
The lubridate package could be useful for you. I would use the functions year() and month() in conjunction with aggregate():
library(xts)
library(lubridate)
#set up some sample data
dates = seq(as.Date('2000/01/01'), as.Date('2005/01/01'), by="month")
df = data.frame(rand1 = runif(length(dates)), rand2 = runif(length(dates)))
my_xts = xts(df, dates)
#get the mean by year
aggregate(my_xts$rand1, by=year(index(my_xts)), FUN=mean)
This outputs something like:
2000 0.5947939
2001 0.4968154
2002 0.4941752
2003 0.5291211
2004 0.6631564
To find the mean for each month you can do:
#get the mean by month
aggregate(my_xts$rand1, by=month(index(my_xts)), FUN=mean)
which will output something like
1 0.5560279
2 0.6352220
3 0.3308571
4 0.6709439
5 0.6698147
6 0.7483192
7 0.5147294
8 0.3724472
9 0.3266859
10 0.5331233
11 0.5490693
12 0.4642588
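If the series is regular and monthly, the same "average year" can also be sketched in base R with a ts object, using cycle() to get the month position of each observation. The series below is made-up data (the numbers 1 to 24 over two years), and whether your xts object converts cleanly to ts is an assumption you would need to check.

```r
# Made-up regular monthly series: 24 values starting January 1990
x <- ts(1:24, start = c(1990, 1), frequency = 12)

# cycle(x) gives 1 for every January, 2 for every February, and so on;
# tapply() then averages all values that share the same month position
monthly_mean <- tapply(x, cycle(x), mean)
print(monthly_mean)
```

With two years of data, each entry is the mean of two values, for example (1 + 13) / 2 = 7 for January.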

Aggregation on 2 columns while keeping two unique R

So I have this:
Staff Result Date Days
1 50 2007 4
1 75 2006 5
1 60 2007 3
2 20 2009 3
2 11 2009 2
And I want to get to this:
Staff Result Date Days
1 55 2007 7
1 75 2006 5
2 15 2009 5
I want each combination of Staff ID and Date to appear in exactly one row, with 'Days' summed and 'Result' averaged.
I can't work out how to do this in R. I'm sure I need some kind of aggregation, but I keep getting results different from what I am aiming for.
Many thanks
The simplest way to do this is to group_by Staff and Date and summarise the results with the dplyr package:
require(dplyr)
df <- data.frame(Staff = c(1,1,1,2,2),
Result = c(50, 75, 60, 20, 11),
Date = c(2007, 2006, 2007, 2009, 2009),
Days = c(4, 5, 3, 3, 2))
df %>%
group_by(Staff, Date) %>%
summarise(Result = floor(mean(Result)),
Days = sum(Days)) %>%
data.frame
Staff Date Result Days
1 1 2006 75 5
2 1 2007 55 7
3 2 2009 15 5
You can aggregate on two variables by using a formula and then merge the two aggregates:
merge(aggregate(Result ~ Staff + Date, data=df, mean),
aggregate(Days ~ Staff + Date, data=df, sum))
Staff Date Result Days
1 1 2006 75.0 5
2 1 2007 55.0 7
3 2 2009 15.5 5
Here is another option with data.table:
library(data.table)
# df is the data frame defined above; setDT() converts it to a data.table by reference
setDT(df)[, .(Result = floor(mean(Result)), Days = sum(Days)), .(Staff, Date)]
# Staff Date Result Days
#1: 1 2007 55 7
#2: 1 2006 75 5
#3: 2 2009 15 5
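For completeness, here is a rough base R sketch that applies a different function to each column in a single pass, by splitting the data into Staff/Date groups. The helper names groups and res are just for illustration. Note that it keeps the exact mean (15.5), so wrap it in floor() if you want 15 as in the question.

```r
df <- data.frame(Staff  = c(1, 1, 1, 2, 2),
                 Result = c(50, 75, 60, 20, 11),
                 Date   = c(2007, 2006, 2007, 2009, 2009),
                 Days   = c(4, 5, 3, 3, 2))

# One data frame per Staff/Date combination that actually occurs
groups <- split(df, list(df$Staff, df$Date), drop = TRUE)

# Summarise each group: mean of Result, sum of Days
res <- do.call(rbind, lapply(groups, function(g) {
  data.frame(Staff  = g$Staff[1],
             Date   = g$Date[1],
             Result = mean(g$Result),
             Days   = sum(g$Days))
}))
rownames(res) <- NULL
print(res)
```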

Weekends in a Month in R

I am trying to prepare an xreg series for my ARIMA model, using the number of weekends in each month as the regressor. I can get the counts for a single year, but my data usually spans more than one year and I couldn't find a way to handle that. Here is what I have so far.
dates <- seq(from=as.Date("2001-01-01"), to=as.Date("2010-12-31"), by = "day")
wd <- weekdays(dates)
aylar <- months(dates[which(wd == "Sunday" | wd == "Saturday")])
table(aylar)
What I want is the number of weekends for each month of each year, not pooled by month name alone, so that the result has the same length as my original forecast series.
Here is my solution:
library(chron)
library(dplyr)
library(lubridate)
month <- months(dates[chron::is.weekend(dates)])
day <- dates[chron::is.weekend(dates)]
# create data.frame
df <- data.frame(date = day, month = month, year = chron::years(day))
df %>% group_by(year, month) %>% summarize(weekends = floor(n()/2))
# year month weekends
# <dbl> <fctr> <dbl>
#1 2001 April 4
#2 2001 August 4
#3 2001 Dezember 5
#4 2001 Februar 4
#5 2001 Januar 4
#6 2001 Juli 4
#7 2001 Juni 4
#8 2001 Mai 4
#9 2001 März 4
#10 2001 November 4
## ... with 110 more rows
I hope this is a starting point for your work.
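Since weekdays() and months() return names in the current locale (which is why the output above shows German month names), a locale-independent variant can be sketched in base R using the numeric weekday from as.POSIXlt(). This counts weekend days per year-month rather than weekends, and the variable names are only illustrative.

```r
dates <- seq(from = as.Date("2001-01-01"), to = as.Date("2010-12-31"), by = "day")

# wday is 0 for Sunday and 6 for Saturday, regardless of locale
wday       <- as.POSIXlt(dates)$wday
is_weekend <- wday %in% c(0, 6)

# "YYYY-MM" key keeps the months of different years apart and sorts chronologically
ym <- format(dates, "%Y-%m")

# Number of weekend days in each of the 120 months
weekend_days <- tapply(is_weekend, ym, sum)
head(weekend_days)
```

Dividing by 2 and rounding down, as in the answer above, turns the day counts into approximate weekend counts, and the result has one entry per month so it lines up with a monthly series over the same period.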
