I am trying to compute the hedging error for an options pricing model. Each day I compute the equivalent position one should take when hedging the option in the market, call it X_s, and the cash position of the hedge, call it X_0. That part presents no issues, since I can mapply() a function that calculates all the necessary partials given my parameters, stock price, etc. Where I start to run into issues is computing the hedging error for my models. Here's a subset of the data I'm looking at:
date optionid px_last r X_s_position X_0_cash mp_ba
1 2020-03-03 127117475 3003.37 0.011587702 0.642588548 -1783.881169 146.05
2 2020-03-03 131373646 3003.37 0.011587702 0.527107056 -1477.947518 105.15
3 2020-03-06 127117475 2972.37 0.008128021 0.566540143 -1558.566925 125.40
4 2020-03-09 127117475 2746.56 0.004745339 0.133284145 -332.122900 33.95
5 2020-03-10 127117475 2882.23 0.005884274 0.413389283 -1125.632994 65.85
6 2020-03-11 127117475 2741.38 0.006223502 0.131700734 -333.691757 27.35
7 2020-03-12 127117475 2480.64 0.003787032 0.003680431 -8.179825 0.95
So, let's say we're looking at optionid == 127117475. On the first observation date we won't have any hedge error, so we go to the next observation on 2020-03-06. The hedge error on that day would be
0.642588548*2972.37 + -1783.881169*exp(0.011587702*as.numeric(as.Date("2020-03-06") - as.Date("2020-03-03"))/365) - 105.15
So in row 3, the value in the new 'hedge error' column I want to create would be 20.80985. To calculate the hedge error for the next observation of optionid == 127117475, I take the previous observation's X_s_position, multiply it by the next spot price (px_last), add the previous observation's X_0_cash multiplied by exp(r*(difference in days between the two observations)/365), and then subtract the next observation's option price (mp_ba).
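For concreteness, that example calculation can be reproduced in base R (all numbers copied from the rows above):

```r
# hedge error at 2020-03-06 for optionid 127117475, built from the
# previous observation's position, cash, and rate
X_s_prev <- 0.642588548    # X_s_position on 2020-03-03
X_0_prev <- -1783.881169   # X_0_cash on 2020-03-03
r_prev   <- 0.011587702    # r on 2020-03-03
days     <- as.numeric(as.Date("2020-03-06") - as.Date("2020-03-03"))

hedge_error <- X_s_prev * 2972.37 + X_0_prev * exp(r_prev * days / 365) - 105.15
hedge_error
```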
Perhaps like so? Should the mp_ba in your example be 125.40?
library(dplyr)
df %>%
group_by(optionid) %>%
mutate(hedge_error = lag(X_s_position)*px_last + lag(X_0_cash)*exp(lag(r)*as.numeric(date - lag(date))/365) - mp_ba)
Result
# A tibble: 7 × 8
# Groups: optionid [2]
date optionid px_last r X_s_position X_0_cash mp_ba hedge_error
<date> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 2020-03-03 127117475 3003. 0.0116 0.643 -1784. 146. NA
2 2020-03-03 131373646 3003. 0.0116 0.527 -1478. 105. NA
3 2020-03-06 127117475 2972. 0.00813 0.567 -1559. 125. 0.560
4 2020-03-09 127117475 2747. 0.00475 0.133 -332. 34.0 -36.6
5 2020-03-10 127117475 2882. 0.00588 0.413 -1126. 65.8 -13.8
6 2020-03-11 127117475 2741. 0.00622 0.132 -334. 27.4 -19.7
7 2020-03-12 127117475 2481. 0.00379 0.00368 -8.18 0.95 -7.95
Related
I'm trying to add a column to a Tidyquant tibble. Here's the code:
library(tidyquant)
symbol <- 'AAPL'
start_date <- as.Date('2022-01-01')
end_date <- as.Date('2022-03-31')
prices <- tq_get(symbol,
from = start_date,
to = end_date,
get = 'stock.prices')
head(prices)
# A tibble: 6 x 8
symbol date open high low close volume adjusted
<chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 AAPL 2022-01-03 178. 183. 178. 182. 104487900 182.
2 AAPL 2022-01-04 183. 183. 179. 180. 99310400 179.
3 AAPL 2022-01-05 180. 180. 175. 175. 94537600 175.
4 AAPL 2022-01-06 173. 175. 172. 172 96904000 172.
5 AAPL 2022-01-07 173. 174. 171. 172. 86709100 172.
6 AAPL 2022-01-10 169. 172. 168. 172. 106765600 172.
Now, I'm attempting to add the change_on_day column (that's just the difference in the 'adjusted' prices between one day and the next) using the following:
prices$change_on_day <- diff(prices$adjusted)
The error message is:
Error: Assigned data `diff(prices$adjusted)` must be compatible with existing data.
x Existing data has 61 rows.
x Assigned data has 60 rows.
i Only vectors of size 1 are recycled.
How would I add this price difference column?
Thanks!
If you are trying to get the change from the previous day's value, you should be able to do that with the lag() function:
prices %>%
mutate(change_on_day=adjusted-lag(adjusted,1))
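Alternatively, the original diff() attempt works if you pad the missing first element, since diff() returns one fewer value than its input. A minimal sketch on toy values (hypothetical prices, not the actual tq_get() output):

```r
# diff() returns n - 1 values, so prepend NA to match the column length
prices <- data.frame(adjusted = c(182, 179, 175, 172))  # hypothetical prices
prices$change_on_day <- c(NA, diff(prices$adjusted))
prices$change_on_day
#> [1] NA -3 -4 -3
```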
We can use tq_transmute with quantmod::periodReturn, setting the period argument to 'daily', to calculate daily returns. Note that these are relative (percentage) returns rather than the raw price differences asked about above.
library(tidyquant)
symbol <- "AAPL"
start_date <- as.Date("2022-01-01")
end_date <- as.Date("2022-03-31")
prices <- tq_get(symbol,
from = start_date,
to = end_date,
get = "stock.prices"
)
stock_returns_monthly <- prices %>%
tq_transmute(
select = adjusted,
mutate_fun = periodReturn,
period = "daily",
col_rename = "change_on_day"
)
stock_returns_monthly
#> # A tibble: 61 × 2
#> date change_on_day
#> <date> <dbl>
#> 1 2022-01-03 0
#> 2 2022-01-04 -0.0127
#> 3 2022-01-05 -0.0266
#> 4 2022-01-06 -0.0167
#> 5 2022-01-07 0.000988
#> 6 2022-01-10 0.000116
#> 7 2022-01-11 0.0168
#> 8 2022-01-12 0.00257
#> 9 2022-01-13 -0.0190
#> 10 2022-01-14 0.00511
#> # … with 51 more rows
Created on 2022-04-18 by the reprex package (v2.0.1)
For more information check this vignette
I have a dataset with dates in tibble format from tidyverse/dplyr.
library(tidyverse)
A = seq(from = as.Date("2019/1/1"),to=as.Date("2022/1/1"), length.out = 252*3)
length(A)
x = rnorm(252*3)
d = tibble(A,x);d
Resulting in:
# A tibble: 756 x 2
A x
<date> <dbl>
1 2019-01-01 1.43
2 2019-01-02 0.899
3 2019-01-03 0.658
4 2019-01-05 -0.0720
5 2019-01-06 -1.99
6 2019-01-08 -0.743
7 2019-01-09 0.426
8 2019-01-11 0.00675
9 2019-01-12 0.967
10 2019-01-14 -0.606
# ... with 746 more rows
I also have a date of interest, say:
start = as.Date("2021/12/15");start
I want to subset the dataset from this specific date (start) back one year. But here a year is 252 observations.
I tried:
d%>%
dplyr::filter(A<start)%>%
dplyr::slice_tail(n=252)
but I don't like it, because my real dataset has more than one factor label, and this way I end up with 252 observations in total rather than per label.
I also tried:
LAST_YEAR = start - 365
d %>%
dplyr::filter(A <= start & A >= LAST_YEAR)
which works, but I want to use the 252 count. Imagine I want to go 2 years (252*2) back: how many observations do I have in that specific time interval?
Any help with how I can do that?
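One way to sketch this, assuming a hypothetical grouping column label: group first, then slice_tail() keeps the last 252*k observations before start for each label separately:

```r
library(dplyr)

# toy data with a hypothetical factor column `label`
d <- tibble(
  A     = rep(seq(as.Date("2019-01-01"), by = "day", length.out = 400), 2),
  label = rep(c("a", "b"), each = 400),
  x     = rnorm(800)
)
start <- as.Date("2021-12-15")
k <- 1  # number of 252-observation "years" to look back

res <- d %>%
  group_by(label) %>%
  filter(A < start) %>%
  slice_tail(n = 252 * k) %>%
  ungroup()
```

With k = 1 each label keeps its last 252 observations before start; slice_tail() simply returns fewer rows when a group has less history than requested.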
I have a very long dataset of numerous stocks for many years, similar to this one:
one_ticker = tq_get("AAPL", from = "2021-06-01")
one_ticker <- one_ticker %>%
mutate(day = day(date),
month = month(date),
year = year(date))
symbol date open high low close volume adjusted day month year
<chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <int> <dbl> <dbl>
1 AAPL 2021-06-01 125. 125. 124. 124. 67637100 124. 1 6 2021
2 AAPL 2021-06-02 124. 125. 124. 125. 59278900 125. 2 6 2021
3 AAPL 2021-06-03 125. 125. 123. 124. 76229200 123. 3 6 2021
4 AAPL 2021-06-04 124. 126. 124. 126. 75169300 126. 4 6 2021
5 AAPL 2021-06-07 126. 126. 125. 126. 71057600 126. 7 6 2021
6 AAPL 2021-06-08 127. 128. 126. 127. 74403800 126. 8 6 2021
7 AAPL 2021-06-09 127. 128. 127. 127. 56877900 127. 9 6 2021
8 AAPL 2021-06-10 127. 128. 126. 126. 71186400 126. 10 6 2021
9 AAPL 2021-06-11 127. 127. 126. 127. 53522400 127. 11 6 2021
10 AAPL 2021-06-14 128. 131. 127. 130. 96906500 130. 14 6 2021
I first want to calculate the biweekly adjusted price return within each month:
- first biweekly interval: days 1-15
- second biweekly interval: days 16-30
Then calculate the standard deviation of the adjusted returns within each quarter.
Here are the results (for Apple last 6 months):
1. Adjusted_biWeekly_Returns
[1] 0.043128324
[2] 0.052324355
[3] 0.081663817
[4] -0.003620508
[5] 0.026136504
[6] 0.004698278
[7] -0.022818187
[8] -0.048995111
[9] 0.0153523
[10] 0.022176775
Explanations:
[1] 129.257401/123.913231-1 = 0.043128324
(15/06/2021 adjusted price// 01/06/2021 adjusted price)
[5] 148.882721/145.090561-1 = 0.026136504
(13/08/2021 & 02/08/2021) - because there was no trading on the 15th and the 1st.
2. Quarterly Standard Deviation:
1/06/2021 - 1/09/2021 0.028944365 ([1]-[6] standard deviation)
1/09/2021 - 1/01/2022 Not available yet.
How can I calculate it in R?
*There is the tq_transmute function, which is very useful for weekly but not biweekly calculations.
You could do each step separately and make use of the to.weekly and apply.quarterly functions as done in the code below:
library(tidyverse)
library(tidyquant)
library(xts)
library(PerformanceAnalytics)
library(TTR)
one_ticker = tq_get("AAPL", from = "2021-06-01")
one_ticker <- one_ticker %>%
mutate(day = day(date),
month = month(date),
year = year(date))
aapl_adj <- xts(x = one_ticker$adjusted,
order.by = one_ticker$date)
aapl_adj_weekly <- to.weekly(aapl_adj, OHLC = F) # convert to weekly
idx <- seq_along(aapl_adj_weekly) %% 2 > 0 # create index for bi-weekly dates
aapl_adj_biweekly <- aapl_adj_weekly[idx, ] # extract bi-weekly values
aapl_adj_biweekly_returns <- TTR::ROC(aapl_adj_biweekly, type = "discrete", na.pad = F)
aapl_adj_biweekly_returns
# compute quarterly standard deviation
xts::apply.quarterly(aapl_adj_biweekly_returns, sd)
e1
2021-06-18 NA
2021-09-24 0.03159961
2021-11-16 0.02900001
If you don't need to downsample to a biweekly frequency, you could just run this in one go for each ticker. This has the advantage of better estimates of the return standard deviation too, since you use all the available data points instead of only biweekly data:
# fast version without downsampling and annualized standard deviation
aapl_adj |> TTR::ROC(type = "discrete", na.pad = F) |> xts::apply.quarterly(FUN = PerformanceAnalytics::sd.annualized)
e1
2021-06-30 0.1551537
2021-09-30 0.2063587
2021-11-16 0.1701798
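If you need the literal day 1-15 / 16-30 buckets rather than every-other-week sampling, one hedged alternative groups by month half with dplyr and lubridate (toy random prices here, not the actual AAPL series):

```r
library(dplyr)
library(lubridate)

# hypothetical daily prices standing in for the adjusted column
px <- tibble(
  date     = seq(as.Date("2021-06-01"), as.Date("2021-08-31"), by = "day"),
  adjusted = 100 + cumsum(rnorm(92))
)

biweekly <- px %>%
  mutate(half = if_else(day(date) <= 15, 1L, 2L)) %>%      # day 1-15 vs 16-31
  group_by(year = year(date), month = month(date), half) %>%
  summarise(return = last(adjusted) / first(adjusted) - 1, .groups = "drop")
```

On real data, first() and last() pick the first and last available trading day inside each half-month, which handles months where the 1st or the 15th is not a trading day.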
I am building multiple forecasts in R and I am trying to select certain columns from the forecast output. Below is what the fable looks like:
> head(forData)
# A fable: 6 x 8 [1M]
# Key: .model [1]
.model Month ABC .mean DateVar PCT Ind1 Ind2
<chr> <mth> <dist> <dbl> <dttm> <dbl> <dbl> <dbl>
1 average 2021 Jul N(0.31, 0.0017) 0.315 2021-07-01 00:00:00 3.25 0 0
2 average 2021 Aug N(0.33, 0.0024) 0.328 2021-08-01 00:00:00 3.25 0 0
3 average 2021 Sep N(0.33, 0.0029) 0.329 2021-09-01 00:00:00 3.25 0 0
4 average 2021 Oct N(0.32, 0.0038) 0.322 2021-10-01 00:00:00 3.25 0 0
5 average 2021 Nov N(0.33, 0.0044) 0.328 2021-11-01 00:00:00 3.25 0 0
6 average 2021 Dec N(0.33, 0.0051) 0.326 2021-12-01 00:00:00 3.25 0 0
When I try to use dplyr to select any columns I get the following error:
> forData %>% select(Month, .mean)
Error: Can't subset columns that don't exist.
x Column `ABC` doesn't exist.
The code below gives me a vector of both Month and .mean so I assume the names are correct but I can't understand the error it gives.
forData$Month
forData$.mean
We can select the columns after converting to a tibble:
forData %>%
as_tibble %>%
select(Month, `.mean`)
The underlying issue here (which is not clear from the error; I'll try to improve this) is that a <fable> must contain a distribution column. By selecting Month and .mean, you are removing the ABC (distribution) column, which is required. If you no longer want the distribution, you will need to convert to a different data class. There are two main options here:
- a <tsibble> with as_tsibble() (which requires the time column Month, which you still have)
- a <tibble> with as_tibble() (which has no requirements on the columns it contains)
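A minimal reproducible sketch of the tibble route (assumes the fable, tsibble, and dplyr packages; the MEAN model and toy series are stand-ins for the real forecast):

```r
library(fable)
library(tsibble)
library(dplyr)

# toy monthly series and a mean-model forecast standing in for forData
ts <- tsibble(Month = yearmonth("2021 Jan") + 0:11, y = rnorm(12), index = Month)
fc <- ts %>% model(average = MEAN(y)) %>% forecast(h = "6 months")

# selecting on the fable drops its required distribution column and errors;
# convert first, then select
fc %>% as_tibble() %>% select(Month, .mean)
```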
This should be pretty straightforward to do in R using dplyr, but I am a bit stuck as to how exactly to do this.
I have aggregated by day a set of transaction revenues, and I want to calculate the daily balance using a final balance. In Excel this would be a trivial formula to do - input the first balance and then for subsequent rows subtract the daily revenues from the previous row's balance.
I am trying to do this in dplyr and keep hitting a wall. Any advice as to how I could achieve this would be great. I'm pretty sure you need to use lag() but I can't seem to figure out exactly how.
Sample data:
library(tidyverse)
x <- tibble(date = c('2018-04-03','2018-04-02','2018-04-01','2018-03-31','2018-03-30'),
daily_spend = c(575,-5.26,-112.45,-73.5,25.6))
final_balance <- 1000
Here's what the data looks like:
# A tibble: 5 x 2
date daily_spend
<chr> <dbl>
1 2018-04-03 575.
2 2018-04-02 -5.26
3 2018-04-01 -112.
4 2018-03-31 -73.5
5 2018-03-30 25.6
What I would like to do is add an additional 'balance' column, where each row's value is the previous row's balance minus the previous row's daily spend.
Here are some expected values:
# A tibble: 5 x 3
date daily_spend end_balance
<chr> <dbl> <dbl>
1 2018-04-03 575. 1000
2 2018-04-02 -5.26 425
3 2018-04-01 -112. 430.
4 2018-03-31 -73.5 542.71
5 2018-03-30 25.6 616.21
Here's what I have been trying, which doesn't work beyond the first two rows (lag() refers to the original column values, not the ones just computed, so the recursion never propagates):
x <- x %>%
mutate(end_balance = ifelse(row_number() ==1,final_balance,0),
end_balance = ifelse(row_number()>1,lag(end_balance)-lag(daily_spend),end_balance))
The results of this method:
# A tibble: 5 x 3
date daily_spend end_balance
<chr> <dbl> <dbl>
1 2018-04-03 575. 1000.
2 2018-04-02 -5.26 425.
3 2018-04-01 -112. 5.26
4 2018-03-31 -73.5 112.
5 2018-03-30 25.6 73.5
Here you go
mutate(x, end_balance = final_balance - cumsum(daily_spend) + daily_spend)
Subtract the cumulative sum of the lagged daily_spend from final_balance:
x %>%
mutate(end_balance = final_balance - cumsum(lag(daily_spend, default = 0))) %>%
as.data.frame()
# date daily_spend end_balance
#1 2018-04-03 575.00 1000.00
#2 2018-04-02 -5.26 425.00
#3 2018-04-01 -112.45 430.26
#4 2018-03-31 -73.50 542.71
#5 2018-03-30 25.60 616.21
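Both answers give the same balances, since cumsum(lag(daily_spend, default = 0)) equals cumsum(daily_spend) - daily_spend element-wise. A quick check on the sample data:

```r
library(dplyr)

x <- tibble(date = c('2018-04-03','2018-04-02','2018-04-01','2018-03-31','2018-03-30'),
            daily_spend = c(575, -5.26, -112.45, -73.5, 25.6))
final_balance <- 1000

a <- final_balance - cumsum(x$daily_spend) + x$daily_spend        # first answer
b <- final_balance - cumsum(lag(x$daily_spend, default = 0))      # second answer
all.equal(a, b)
#> [1] TRUE
```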