R - average of monthly sums of time series

I have a long time series (zoo) of precipitation data. I know how to obtain the monthly average of the values:
library(hydroTSM)
ma <- monthlyfunction(data, mean, na.rm = TRUE)
I also know how to obtain the monthly sum of the values:
su <- monthlyfunction(data, sum, na.rm = TRUE)
but with the last one I get a single monthly sum for the whole period of the time series. I would like to get a monthly average of the sums, for example:
Jan 1980 (sum) = 150
Jan 1981 (sum) = 180
Jan 1982 (sum) = 90
expected value for January = average(150, 180, 90) = 140
Is there a function for this instead of mean and sum?

library(hydroTSM)
# this data is daily streamflow, but the idea is the same for precipitation
data(OcaEnOnaQts)
x <- OcaEnOnaQts
# in case you want monthly precipitation in "precipitation / 30 days" (which is common), you can use
monthlyfunction(x, FUN = mean, na.rm = TRUE) * 30
# in case you want the precipitation scaled by the actual number of days in each specific month, you can use
monthlyfunction(x, FUN = mean, na.rm = TRUE) * as.vector(dwi(x, out.unit = "months") * mean(dwi(x)) / sum(dwi(x)))
# or approximately
monthlyfunction(x, FUN = mean, na.rm = TRUE) * c(31, 28.25, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31)
# Added: some ways to arrive at the mean monthly precipitation
p1980 <- c(rep(0, 28), 50, 50, 50) # sum = 150
p1981 <- c(rep(0, 28), 60, 60, 60) # sum = 180
p1982 <- c(rep(0, 28), 30, 30, 30) # sum = 90
#
mean(c(sum(p1980), sum(p1981), sum(p1982))) # = 140; this is how you want it to be calculated
mean(c(p1980, p1981, p1982)) * 31 # = 140; this is how I suggested arriving at the result
# some other ways to arrive at the mean monthly precipitation
mean(c(mean(p1980), mean(p1981), mean(p1982))) * 31 # = 140
sum(c(p1980, p1981, p1982)) / 3 # = 140
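For completeness, the monthly average of the monthly sums can also be computed directly with plain zoo tools; a minimal sketch, assuming `data` is the daily zoo series from the question:
library(zoo)
# sum the daily values within each year-month
msum <- aggregate(data, as.yearmon, sum, na.rm = TRUE)
# then average those monthly sums across years, by calendar month
tapply(coredata(msum), format(index(msum), "%m"), mean, na.rm = TRUE)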

Related

financial break-even point given a stream of cashflows in R

Suppose I have a monthly income of 1200.
The interest rate is 1%, so after one year the income will increase by 1%.
I would like to find out how many years it will take for each investment to break even.
Suppose an investment costs 200,000 with a monthly income of 1200.
My first-year and subsequent years' annual income will be:
firstYear = 1200 * 12
additionalYears = (1200*12)*(1+0.01)^c(1:5)
c(firstYear, additionalYears)
14400.00 14544.00 14689.44 14836.33 14984.70 15134.54
I would like to make the "5" in the above example dynamic until it finds the break-even point.
In this example I have:
sum(c(firstYear, additionalYears))
198854.3
So the investment has not broken even yet. Adjusting it to "12" gives me the break-even point:
firstYear = 1200 * 12
additionalYears = (1200*12)*(1+0.01)^c(1:12)
sum(c(firstYear, additionalYears))
If possible I would like to determine the month of the year in which it breaks even (in this example it breaks even in month 12 of year 12, but others might break even in month 8 of year 6, etc.).
Instead of using a loop, you can use vectors. Set the maximum horizon to 100 years, create a vector of 101 incomes (years 0 through 100) and a vector of matching growth factors. Multiply the two and add the (negative) cost to the cumulative sum of the incomes. Count the number of times the sum is negative; that is the number of whole years before break-even.
cost = -200000 # negative cost
income = 1200*12 # annual income
i = 0.01 # interest rate to grow income after year 0.
# repeat 14400 101 times and multiply by (1+i)^n - R is vectorised
income100 = rep(income, 101) * ((1+i) ^ seq(0,100))
# subtract the cost from the cumulative sum of income
cumincome = cost + cumsum(income100)
# how many are negative?
yrs = sum(cumincome < 0)
# how much left to recover in next year
yrs + (-cumincome[yrs] / income100[yrs+1])
[1] 13.06991
Putting this into a function
break_even_years <- function(cost, income, interest = 0, period = "monthly"){
  if(cost >= 0) cost = -cost
  if(period == "monthly") income = income * 12
  income100 = rep(income, 101) * ((1 + interest) ^ seq(0, 100))
  cumincome = cost + cumsum(income100)
  # how many are negative?
  yrs = sum(cumincome < 0)
  # how much is left to recover in the next year
  yrs + (-cumincome[yrs] / income100[yrs + 1])
}
Using the function (cost and investment here are the vectors defined in the question code further below):
purrr::map2(
  .x = cost,
  .y = investment,
  ~ break_even_years(.x, .y, interest = 0.03, "annual"))
[[1]]
[1] 28.90435
[[2]]
[1] 23.75858
[[3]]
[1] 6.391264
[[4]]
[1] 3.505453
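To answer the month-of-year part of the question, the fractional result can be split into whole years and a month; a small sketch, assuming income accrues evenly within a year:
be <- break_even_years(200000, 1200, interest = 0.01) # 13.06991, as above
full_years <- floor(be)                               # 13 complete years
month <- ceiling((be - full_years) * 12)              # month within the following year
c(year = full_years + 1, month = month)               # breaks even in month 1 of year 14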
Benchmarking
library(microbenchmark)
microbenchmark(break_even_years(200000, 1000, 0.01),
               find_break_even_year(1000, 200000, 0.01, 100), times = 1000)
Unit: microseconds
                                          expr   min      lq      mean median      uq      max neval
         break_even_years(200000, 1000, 0.01)  50.9   87.10  257.4185  119.0  159.05  36938.0  1000
find_break_even_year(1000, 200000, 0.01, 100) 853.5 1247.05 3432.5157 1556.2 2391.35 145980.6  1000
I think this answers my question. If anybody can help with avoiding the for loop, that would be very helpful.
library(tidyverse)
investment = c(1000, 2000, 5000, 27000)
interest_rate = 0.03
cost = c(45000, 67900, 34678, 98367)
max_years = 100
find_break_even_year <- function(CF, investment, interest_rate, max_years){
  future_value <- list()  # keep the accumulator local so the function is self-contained
  for (year in 1:max_years){
    future_value[[year]] <- CF * (1 + interest_rate)^year
    future_value_sums <- sum(unlist(future_value))
    if (future_value_sums >= investment)
      return(year)
  }
}
purrr::map2(
  .x = investment,
  .y = cost,
  ~ find_break_even_year(.x, .y, interest_rate = 0.03, max_years = 100)
)

correlation coefficients between two time series calculated over windows moved forward in time by n time units

Is there a package or a simple piece of code to produce plots of
(1) correlation coefficients between two time series, calculated over windows moved forward in time by n time units,
(2) and their respective p-values calculated for each move?
library(zoo)
x = ts(rnorm(1:121), start = 1900, end = 2021)
y = ts(rnorm(1:121), start = 1900, end = 2021)
data = data.frame(x, y)
# 40-year moving window moved forward by 15 years, for example
rollapply(data, width = 40, by = 15,
          function(x) cor(x[, 1], x[, 2], method = "pearson"),
          by.column = FALSE)
[1] 0.92514750 0.5545223 -0.207100231 -0.119647462 -0.125114237 0.041334073
It would be better with Hmisc::rcorr, which also calculates p-values, but I didn't manage to integrate it into rollapply.
In the result here, the first coefficient (0.9251...) is valid for 1900:1940, the second one is valid for 1915:1955, etc.
So the question is: is there a quick way to turn this result into a staircase graph of time, r and p-value?
The output would look like:
Time   r      P
1900   0.92   0.000001
1901   0.92   0.000001
...    ...    ...
1915   0.55   0.00045
1916   0.55   0.00045
A few points:
- there are 2021 - 1900 + 1 = 122 years from 1900 to 2021 inclusive, not 121
- the 40/15 parameters do not work out evenly with 122 points, so start at 1907
- rcorr returns a list of 3 components and we want the [1, 2] element of each; we can fill in the missing values from rollapplyr using na.locf. The input and output are both mts/ts series.
library(zoo)
library(Hmisc)
set.seed(123)
tt <- ts(cbind(x = rnorm(115), y = rnorm(115)), start = 1907)
na.locf(rollapplyr(tt, width = 40, by = 15,
                   function(x) sapply(rcorr(x), `[`, 1, 2),
                   by.column = FALSE, fill = NA), fromLast = TRUE)
The above returns a series with the same number of rows as the input tt but based on computing rcorr for the following ranges of years:
rollapplyr(1907:2021, 40, by = 15, range)
## [,1] [,2]
## [1,] 1907 1946
## [2,] 1922 1961
## [3,] 1937 1976
## [4,] 1952 1991
## [5,] 1967 2006
## [6,] 1982 2021
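For the plotting part of the question, one possibility is a sketch along these lines, assuming the result of the rollapplyr call above is stored in res and that its columns keep rcorr's component names r, n and P:
res <- na.locf(rollapplyr(tt, width = 40, by = 15,
                          function(x) sapply(rcorr(x), `[`, 1, 2),
                          by.column = FALSE, fill = NA), fromLast = TRUE)
# plot the step-like r and P series over time, one panel each
plot(res[, c("r", "P")], main = "40-year rolling correlation and p-value", xlab = "Time")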

How to use growth rates with an apply function instead of a loop in R

Suppose I have a data frame 'country' with 3 columns:
year (ranging from 2000 to 2017)
GDP
Population
My objective is to grow the GDP and population for the next five years according to assumptions. I have developed the following loop:
country[19:23, 1] <- seq(2018, by = 1, length.out = 5)
for (i in 19:nrow(country)){
  country[i, "GDP"] <- country[i-1, "GDP"] * (1 + hypo_gdp/100)
  country[i, "Population"] <- country[i-1, "Population"] * (1 + hypo_pop/100)
}
Where hypo_gdp and hypo_pop are my growth assumptions.
Is there a way to use one of the apply() functions in this case?
Thanks in advance!
You do not need any apply() function in this case. This is just a simple geometric progression:
# simulate some data
country <- data.frame(year = 2000:2017,
                      GDP = rep(100, 18),
                      Population = rep(1000, 18))
country[19:23, 1] <- seq(2018, by = 1, length.out = 5)
# define gdp and pop parameters
hypo_gdp <- 5
hypo_pop <- 2
# define common ratios
b1 <- (1 + hypo_gdp/100)
b2 <- (1 + hypo_pop/100)
# calculate values for years 2018-2022
country[19:23, 2] <- country$GDP[18] * b1 ^ (1:5)
country[19:23, 3] <- country$Population[18] * b2 ^ (1:5)
tail(country)
tail(country)
# year GDP Population
# 18 2017 100.0000 1000.000
# 19 2018 105.0000 1020.000
# 20 2019 110.2500 1040.400
# 21 2020 115.7625 1061.208
# 22 2021 121.5506 1082.432
# 23 2022 127.6282 1104.081
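An equivalent formulation uses cumprod, which also generalizes to growth rates that vary year by year; a brief sketch reusing b1 and b2 from above:
# cumulative products of the growth factors reproduce the geometric series
country[19:23, 2] <- country$GDP[18] * cumprod(rep(b1, 5))
country[19:23, 3] <- country$Population[18] * cumprod(rep(b2, 5))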

r: How to connect line breaks in ggplot

Reusing the example from this question, but for a different problem:
Plot time series and forecast simultaneously using ggplot2
As you can see, there is a gap between 'my observation' and 'my forecast' (between 350 and 351).
Why is there a gap? I have a 1-day forecast, and the forecast line itself is completely missing from the chart. Please help!
It's because your last 'observation' was made when time=350.
df[df$time > 349 & df$time <= 351, ]
## time M isin
## 26 350 -0.2180864 observations
## 27 351 1.2246175 my_forecast
## 51 351 3.7502526 upper_bound
## 75 351 -1.3010176 lower_bound
You can add a data point at time = 351 with isin = "observations" if you want to connect them.
df <- rbind(df, data.frame(
  time = 351, M = 1.2246175, isin = "observations"
))
ggplot(df, aes(x = time, y = M, color = isin)) +
  geom_line()
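To avoid hard-coding the forecast value, a hypothetical variant copies the first forecast row and relabels it before plotting:
# take the forecast row at time = 351 and relabel it as an observation
first_fc <- df[df$time == 351 & df$isin == "my_forecast", ]
first_fc$isin <- "observations"
df <- rbind(df, first_fc)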

Binning data in R

I have a vector with around 4000 values. I would just need to bin it into 60 equal intervals, for which I would then have to calculate the median (for each of the bins).
v <- c(1:4000)
v is really just a vector. I read about cut, but that needs me to specify the breakpoints. I just want 60 equal intervals.
Use cut and tapply:
> tapply(v, cut(v, 60), median)
(-3,67.7] (67.7,134] (134,201] (201,268]
34.0 101.0 167.5 234.0
(268,334] (334,401] (401,468] (468,534]
301.0 367.5 434.0 501.0
(534,601] (601,668] (668,734] (734,801]
567.5 634.0 701.0 767.5
(801,867] (867,934] (934,1e+03] (1e+03,1.07e+03]
834.0 901.0 967.5 1034.0
(1.07e+03,1.13e+03] (1.13e+03,1.2e+03] (1.2e+03,1.27e+03] (1.27e+03,1.33e+03]
1101.0 1167.5 1234.0 1301.0
(1.33e+03,1.4e+03] (1.4e+03,1.47e+03] (1.47e+03,1.53e+03] (1.53e+03,1.6e+03]
1367.5 1434.0 1500.5 1567.0
(1.6e+03,1.67e+03] (1.67e+03,1.73e+03] (1.73e+03,1.8e+03] (1.8e+03,1.87e+03]
1634.0 1700.5 1767.0 1834.0
(1.87e+03,1.93e+03] (1.93e+03,2e+03] (2e+03,2.07e+03] (2.07e+03,2.13e+03]
1900.5 1967.0 2034.0 2100.5
(2.13e+03,2.2e+03] (2.2e+03,2.27e+03] (2.27e+03,2.33e+03] (2.33e+03,2.4e+03]
2167.0 2234.0 2300.5 2367.0
(2.4e+03,2.47e+03] (2.47e+03,2.53e+03] (2.53e+03,2.6e+03] (2.6e+03,2.67e+03]
2434.0 2500.5 2567.0 2634.0
(2.67e+03,2.73e+03] (2.73e+03,2.8e+03] (2.8e+03,2.87e+03] (2.87e+03,2.93e+03]
2700.5 2767.0 2833.5 2900.0
(2.93e+03,3e+03] (3e+03,3.07e+03] (3.07e+03,3.13e+03] (3.13e+03,3.2e+03]
2967.0 3033.5 3100.0 3167.0
(3.2e+03,3.27e+03] (3.27e+03,3.33e+03] (3.33e+03,3.4e+03] (3.4e+03,3.47e+03]
3233.5 3300.0 3367.0 3433.5
(3.47e+03,3.53e+03] (3.53e+03,3.6e+03] (3.6e+03,3.67e+03] (3.67e+03,3.73e+03]
3500.0 3567.0 3633.5 3700.0
(3.73e+03,3.8e+03] (3.8e+03,3.87e+03] (3.87e+03,3.93e+03] (3.93e+03,4e+03]
3767.0 3833.5 3900.0 3967.0
In the past, I've used this function
evenbins <- function(x, bin.count = 10, order = TRUE) {
  bin.size <- rep(length(x) %/% bin.count, bin.count)
  bin.size <- bin.size + ifelse(1:bin.count <= length(x) %% bin.count, 1, 0)
  bin <- rep(1:bin.count, bin.size)
  if (order) {
    bin <- bin[rank(x, ties.method = "random")]
  }
  return(factor(bin, levels = 1:bin.count, ordered = order))
}
and then I can run it with
v.bin <- evenbins(v, 60)
and check the sizes with
table(v.bin)
and see they all contain 66 or 67 elements. By default this will order the values just like cut will, so each of the factor levels will have increasing values. If you want to bin them based on their original order, use
v.bin <- evenbins(v, 60, order = FALSE)
instead. This just splits the data up in the order it appears.
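The per-bin medians the question asks for then follow directly:
tapply(v, v.bin, median)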
This result shows the 59 midpoints between consecutive break-points. Note that 60 break-points define 59 bins, so for exactly 60 bins you would use seq(1, 4000, length = 61). The bin widths are as close to equal as possible (but probably not exactly equal).
> sq <- seq(1, 4000, length = 60)
> sapply(2:length(sq), function(i) median(c(sq[i-1], sq[i])))
# [1] 34.88983 102.66949 170.44915 238.22881 306.00847 373.78814
# [7] 441.56780 509.34746 577.12712 644.90678 712.68644 780.46610
# ......
Actually, after checking, the bins are pretty darn close to being equal.
> unique(diff(sq))
# [1] 67.77966 67.77966 67.77966
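If you want the medians of the actual data over these equal-width intervals, the sq break-points can be fed back into cut; a short sketch:
tapply(v, cut(v, breaks = sq, include.lowest = TRUE), median)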
