RQuantLib: Apply option pricing to data list

So I am using RQuantLib's AmericanOption function. It works fine if the inputs are single numbers, for example:
AmericanOption(type = 'call', underlying = 73.59, strike = 74, dividendYield = 0,
               riskFreeRate = 0.006, maturity = 0.25, volatility = 0.2,
               timeSteps = 150, gridPoints = 149, engine = "CrankNicolson")
However, since I have time-series data containing the daily data of the same option over time, for example:
Time underlying strike ...
10-01 73 74
10-02 74 74
10-03 75 74
Obviously, I want to apply the pricing function to every single data point and output the results as a new data frame containing the result for every day.
Since I already have the underlying and strike prices as data lists, I created a sequence of 0s for the dividend yield ("dividend"), a sequence of 0.006 for the risk-free rate ("riskfree"), and a decaying maturity as:
end <- 21/252
interval <- 1/252
opmaturity <- seq(0.25, end, by=-interval)
(from 3 months to maturity to 1 month to maturity)
So of course I can't just do the following:
AmericanOption(type = 'call', underlying = RR$Stock, strike = RR$Strike.Price,
               dividendYield = dividend, riskFreeRate = riskfree, maturity = opmaturity,
               volatility = vol, timeSteps = 150, gridPoints = 149, engine = "CrankNicolson")
because the function expects single-value inputs.
So how do I properly pass the time-series data to the function?

It is enough to use the mapply function as follows:
mapply(AmericanOption, type = 'call', underlying = RR$Stock, strike = RR$Strike.Price,
       dividendYield = dividend, riskFreeRate = riskfree, maturity = opmaturity,
       volatility = vol, timeSteps = 150, gridPoints = 149, engine = "CrankNicolson")
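A minimal, self-contained sketch of the same mapply() pattern that runs without RQuantLib. bs_call() is a hypothetical stand-in (a Black-Scholes European call, not the American price AmericanOption() computes) that, like the original function, only accepts scalar inputs. The three-row RR data frame and the maturities are illustrative assumptions. Vector arguments are walked in parallel, one element per call, while constants go in MoreArgs.

```r
# Hypothetical scalar-only pricer standing in for AmericanOption():
# a Black-Scholes European call, NOT the American price RQuantLib returns.
bs_call <- function(underlying, strike, riskFreeRate, maturity, volatility) {
  stopifnot(length(underlying) == 1, length(strike) == 1)  # refuse vector input
  d1 <- (log(underlying / strike) + (riskFreeRate + volatility^2 / 2) * maturity) /
    (volatility * sqrt(maturity))
  d2 <- d1 - volatility * sqrt(maturity)
  underlying * pnorm(d1) - strike * exp(-riskFreeRate * maturity) * pnorm(d2)
}

# Illustrative daily data (assumed), one row per day
RR <- data.frame(Time = c("10-01", "10-02", "10-03"),
                 Stock = c(73, 74, 75),
                 Strike.Price = c(74, 74, 74))
# Maturity shrinks by one trading day per row
opmaturity <- 0.25 - (seq_len(nrow(RR)) - 1) / 252

# Vector arguments vary per call; MoreArgs holds the constants
RR$value <- mapply(bs_call,
                   underlying = RR$Stock,
                   strike = RR$Strike.Price,
                   maturity = opmaturity,
                   MoreArgs = list(riskFreeRate = 0.006, volatility = 0.2))
RR
```

With the real AmericanOption(), each call returns several results (the value and Greeks), so wrapping it as function(...) AmericanOption(...)$value, assuming the component is named value as in the RQuantLib documentation, is one way to keep the mapply() result a plain numeric vector.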


Run a package function multiple times with different parameter values

I have a data frame of historic option data:
StrikePrice UnderlyingPrice Days2Exp RfRate DividendRate Volatility
47 45 4 0.02 0.5 0.2
50 55 20 0.03 0.1 0.35
And I am using the package 'LSMonteCarlo' function 'AmerPutLSM';
price = AmerPutLSM(Spot = 45, sigma = 0.2, n=500, m=100, Strike = 47, r= 0.02, dr = 0.5, mT = 4)
Is there any way I can run this function without manually changing the values for the second row of my data frame? (In reality I'm dealing with a lot of rows.) Here is an example that is wrong but gets across what I want to do:
price = AmerPutLSM(Spot = dataframe$StrikePrice[1:2], sigma = dataframe$Volatility[1:2] etc, etc...)
You can use any of the apply functions here:
result <- apply(df, 1, function(x) AmerPutLSM(Spot = x['UnderlyingPrice'],
sigma = x['Volatility'], n=500, m=100, Strike = x['StrikePrice'],
r = x['RfRate'], dr = x['DividendRate'], mT = x['Days2Exp']))
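A base R sketch of both patterns on the question's two rows. put_stub() is a hypothetical stand-in for AmerPutLSM() that only returns the intrinsic value max(Strike - Spot, 0), not a Monte Carlo price. One thing worth knowing: apply() first converts the data frame to a matrix, so if the data frame had a character column every value would arrive as a string; mapply() over the columns avoids that coercion.

```r
df <- data.frame(StrikePrice = c(47, 50), UnderlyingPrice = c(45, 55),
                 Days2Exp = c(4, 20), RfRate = c(0.02, 0.03),
                 DividendRate = c(0.5, 0.1), Volatility = c(0.2, 0.35))

# Hypothetical scalar-only stand-in for AmerPutLSM(): intrinsic put value only
put_stub <- function(Spot, Strike, sigma, r, dr, mT) {
  stopifnot(length(Spot) == 1, length(Strike) == 1)
  max(Strike - Spot, 0)
}

# Row-wise: each row arrives as a named numeric vector; [[ ]] drops the name
via_apply <- apply(df, 1, function(x)
  put_stub(Spot = x[["UnderlyingPrice"]], Strike = x[["StrikePrice"]],
           sigma = x[["Volatility"]], r = x[["RfRate"]],
           dr = x[["DividendRate"]], mT = x[["Days2Exp"]]))

# Column-wise: walks the columns in parallel, no matrix coercion
via_mapply <- mapply(put_stub,
                     Spot = df$UnderlyingPrice, Strike = df$StrikePrice,
                     sigma = df$Volatility, r = df$RfRate,
                     dr = df$DividendRate, mT = df$Days2Exp)
```

With the real package, the column-wise version would presumably read mapply(AmerPutLSM, Spot = df$UnderlyingPrice, ..., MoreArgs = list(n = 500, m = 100)), using the argument names shown in the question.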

Struggling to understand the behavior of the window<-.ts function in R

Consider the following code that creates two ts time-series foo & bar:
x = 1:22
foo = ts(x, start = 1.5, end = 106.5, frequency = 0.2)
bar = ts(x, start = 2.5, end = 107.5, frequency = 0.2)
The foo and bar objects are identical except for their start and end values, which are both larger by 1 for bar. Neither series' start/end values are exact multiples of the sampling interval (1/frequency = 5), but that shouldn't pose a problem.
Indeed, for both foo and bar, a window of an arbitrary size can be successfully extracted:
stats::window(foo, start = 20, end = 30) # works fine
stats::window(bar, start = 20, end = 30) # works fine too
But if I try to assign values to these windows, only foo works:
window(foo, start = 20, end = 30) <- NA # works fine
window(bar, start = 20, end = 30) <- NA # ERROR!!!
Error in attr(y, "tsp") <- c(ystart, yend, xfreq) :
invalid time series parameters specified
Internally, window<-.ts basically calls the stats::window function, so it should work just as well as calling window() explicitly.
My understanding is that in the ts definition, 'start' and 'end' are just in arbitrary units, e.g. seconds. So ts(x, start = 1.5, end = 106.5, frequency = 0.2) may mean a series that starts at second 1.5 and ends at second 106.5, where each observation represents 5 seconds (1/frequency).
The stats::window function then just selects the values that fall within its start/end boundaries, e.g. from 20 to 30 seconds. And indeed the time() of both windows lies within those bounds, which seems to confirm this:
time(window(foo, start = 20, end = 30))
[1] 21.5 26.5
time(window(bar, start = 20, end = 30))
[1] 22.5 27.5
The fact that one series starts at 1.5s and the other at 2.5s has absolutely no impact on the windowing procedure. Yet when assigning values to it, my logic breaks.
Things get even wilder by removing one cycle from bar:
qux = ts(1:21, start = 2.5, end = 102.5, frequency = 0.2)
window(qux, start = 20, end = 30) <- NA #ERROR!!
Error in `window<-.ts`(`*tmp*`, start = 20, end = 30, value = NA) :
times to be replaced do not match
A different error! I think I am failing to understand some fundamental concept.
So what am I missing here?
As you say, window<-.ts() uses window() internally to create the new object. It's done via eval.parent(), where window() is called with extend = TRUE, and this is when the error occurs. As such, we can simplify our analysis by considering the following pair instead:
window(foo, start=20, end=30, extend=TRUE) # works fine
# Time Series:
# Start = 20
# End = 25
# Frequency = 0.2
# [1] 5 6
window(bar, start=20, end=30, extend=TRUE) # error
# Error in attr(y, "tsp") <- c(ystart, yend, xfreq) :
# invalid time series parameters specified
The attr(y, "tsp") <- c(ystart, yend, xfreq) mentioned in the error message happens at the very end of stats:::window.default. The actual values used are:
# for foo
y <- 5:6
attr(y, "tsp") <- c(20, 25, 0.2)
# for bar
y <- 5:6
attr(y, "tsp") <- c(20, 30, 0.2)
The 25/30 discrepancy is due to rounding earlier in the code, but why does one work while the other doesn't? It's the frequency. A frequency of less than one seems a bit odd to me. If you set the frequency to 12, for example, you have 12 samples per period or cycle. Maybe once a month, or maybe every second hour, but every 12th sample should have something in common. If you set the frequency to 1, you effectively have no period: you sample once a year and know of no significant multi-year cycles. What would a frequency of less than 1 mean? I guess a frequency of 0.5 could mean sampling every second year, and 0.2 every fifth year? Maybe that's informative, I don't know.
But why the error? While both 25 - 20 and 30 - 20 divide by 5 just fine, a tsp of c(20, 30, 0.2) implies (30 - 20) * 0.2 + 1 = 3 observations, while y has only 2, so it is one element short. Use y <- 5:7 instead and it works just fine.
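The arithmetic can be checked directly in base R: a tsp attribute of c(start, end, frequency) implies (end - start) * frequency + 1 observations, and R refuses to set it when the object's length disagrees. n_implied() is a hypothetical helper written only for this illustration.

```r
# Number of observations implied by tsp = c(start, end, frequency)
n_implied <- function(tsp) (tsp[2] - tsp[1]) * tsp[3] + 1

n_implied(c(20, 25, 0.2))  # matches length(5:6)
n_implied(c(20, 30, 0.2))  # one more than length(5:6)

y_foo <- 5:6
attr(y_foo, "tsp") <- c(20, 25, 0.2)   # accepted: 2 values, 2 time points

y_ok <- 5:7
attr(y_ok, "tsp") <- c(20, 30, 0.2)    # accepted: 3 values, 3 time points

# Two values for three implied time points: R signals an error
msg <- tryCatch({
  y_bar <- 5:6
  attr(y_bar, "tsp") <- c(20, 30, 0.2)
  "no error"
}, error = function(e) conditionMessage(e))
msg
```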
Why does it work for foo but not bar? Because rounding.
But why, though? Probably because no one tested the function with time series whose frequency is less than one.
A possible fix could be to use zoo class time series, instead of standard ts. It works well in this case, but I haven't done any other tests.
library(zoo)
foo.z <- as.zoo(foo)
bar.z <- as.zoo(bar)
window(foo.z, start = 20, end = 30) <- NA # works fine
window(bar.z, start = 20, end = 30) <- NA # also works fine
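If adding the zoo dependency is undesirable, a base R alternative (a different technique from window<-, sketched here as an assumption rather than a fix to window<-.ts itself) is to select observations by their time stamps with ordinary subassignment, which never has to rebuild the tsp attribute:

```r
bar <- ts(1:22, start = 2.5, end = 107.5, frequency = 0.2)

# Time stamps as a plain numeric vector
tt <- as.numeric(time(bar))
# Ordinary subassignment on the observations inside [20, 30]; ts attributes are kept
bar[tt >= 20 & tt <= 30] <- NA

which(is.na(bar))   # the observations at times 22.5 and 27.5
```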

Best function for modelling diminishing returns

I am visiting a bird sanctuary that has many different species of birds. Some species are more numerous than others. I came back to the sanctuary 9 times, and after every visit I calculate the total number of species I have observed so far. Unsurprisingly, there are diminishing returns from my visits: I see the most numerous species on every visit, but seeing them again does not increase the count of observed species. What is the best function in R to predict how many species I will have observed by my 20th visit?
Here is the data.frame
d <- structure(list(visit = 1:9,
totalNumSpeciesObserved = c(200.903, 296.329, 370.018, 431.59, 485.14, 533.233, 576.595, 616.536, 654)),
class = "data.frame", row.names = c(NA, 9L))
I expect to see a model that fits the data well and behaves in a "log-like" fashion, predicting diminishing returns.
For asking a question well, Stack Overflow has some good guidance: https://stackoverflow.com/help/how-to-ask
If you're trying to model this, I might take the approach of regressing on the square root of the independent variable, based on the data. It's kind of strange to think about it as a function of visits, though... Maybe it would make more sense if the visits were evenly spaced time periods.
d <- structure(list(visit = 1:9,
totalNumSpeciesObserved = c(200.903, 296.329, 370.018, 431.59, 485.14, 533.233, 576.595, 616.536, 654)),
class = "data.frame", row.names = c(NA, 9L))
mod <- lm(totalNumSpeciesObserved ~ I(sqrt(visit)), d)
new.df <- data.frame(visit=1:13)
out <- predict(mod, newdata = new.df)
plot(d, type = 'o',pch = 16, xlim = c(1,13), ylim = c(200,800), lwd = 2, cex = 2)
points(out, type= 'o', pch = 21, col = "blue", cex = 2)
The I() wrapper allows you to transform the independent variable on the fly, hence the use of sqrt() without needing to save a new variable.
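To address the 20th-visit question directly, here is a sketch that fits the square-root model alongside a logarithmic one (the log model is an added comparison, not part of the answer above) and predicts visit 20. Extrapolating from visit 9 to visit 20 is a long reach, and neither curve levels off, whereas a real species count must eventually plateau at the number of species in the sanctuary.

```r
d <- data.frame(visit = 1:9,
                totalNumSpeciesObserved = c(200.903, 296.329, 370.018, 431.59, 485.14,
                                            533.233, 576.595, 616.536, 654))

m_sqrt <- lm(totalNumSpeciesObserved ~ I(sqrt(visit)), data = d)
m_log  <- lm(totalNumSpeciesObserved ~ log(visit), data = d)

# Residual sum of squares on the same response scale: smaller = closer fit
c(sqrt = deviance(m_sqrt), log = deviance(m_log))

# Predicted cumulative species count on visit 20 under each model
new_visit <- data.frame(visit = 20)
pred20 <- c(sqrt = unname(predict(m_sqrt, new_visit)),
            log  = unname(predict(m_log, new_visit)))
pred20
```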
I also don't know if this helps, but you could build a simulator to test for asymptotic behaviour. For example, you could build a population:
population <- sample(size = 1e6, LETTERS[1:20],
replace = TRUE, prob = 1/(2:21)^2)
This says there are 20 species in your population, each less abundant than the last (expand as you wish).
Then you could simulate visits and information about each visit. For example, how large is the sample on each visit? During a visit you might only see 1% of the rainforest, etc.
sim_visits <- function(visits, percent_obs, population){
  species_viewed <- vector()
  unique_views <- vector()
  for(i in 1:visits){
    # individuals seen on this visit (no repeats within one visit)
    my_samp <- sample(x = population, size = round(percent_obs*length(population),0),
                      replace = FALSE)
    species_viewed <- c(species_viewed, my_samp)
    unique_views[i] <- length(unique(species_viewed))
  }
  new_observed <- unique_views - dplyr::lag(unique_views, 1, 0)
  df <- data.frame(unique_views = unique_views, new_observed)
  df$cummulative <- cumsum(unique_views)
  df
}
And then you could draw from the simulation many times and see what distribution of values you get.
sim_visits(9, percent_obs = .001, population = population)
unique_views new_observed cummulative
1 13 13 13
2 15 2 28
3 15 0 43
4 17 2 60
5 17 0 77
6 17 0 94
7 17 0 111
8 17 0 128
9 17 0 145
I don't know if this is helpful, but I find simulation a good way to conceptualise problems like these.
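Repeating the simulation many times, as suggested, gives a distribution rather than a single number. A base R sketch (no dplyr), assuming 20 species, a smaller population of 1e5 for speed, and a hypothetical n_obs = 100 individuals seen per visit:

```r
set.seed(42)
population <- sample(LETTERS[1:20], size = 1e5, replace = TRUE, prob = 1 / (2:21)^2)

# Distinct species seen after `visits` visits of `n_obs` individuals each;
# each visit samples without replacement, visits are independent of each other
species_after <- function(visits, n_obs, population) {
  seen <- unlist(lapply(seq_len(visits), function(i) sample(population, n_obs)))
  length(unique(seen))
}

# 200 simulated sequences of 20 visits
draws <- replicate(200, species_after(visits = 20, n_obs = 100, population))
table(draws)
```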

Calculating portfolio level returns

EDIT (UPDATED): I've found a great post by one of the authors of PerformanceAnalytics. It sums up the ins and outs of cumulative portfolio returns and, as the author shows, it's pretty tricky (he got it wrong too)! Here it is for reference: https://tradeblotter.wordpress.com/2014/09/25/aggregate-portfolio-contributions-through-time/
So I've run into a bit of a stumper: two numbers that should agree do not. Here's an example data frame with stock choices and their portfolio weightings for context:
stock.choices stock_weights
1 GOOG 0.150
2 AMZN 0.200
3 BA 0.250
4 FB 0.225
5 AAPL 0.175
Then I'm going to use the Return.portfolio function with wealth.index = TRUE to show the return of my portfolio.
Portfolio <- merge.xts(GOOG,AMZN,BA,FB,AAPL)
dollar_growth <- Return.portfolio(Portfolio, weights = stock_weights, wealth.index = TRUE)
I use dygraph to visualise the dollar growth.
dygraph(dollar_growth, main = "Portfolio Growth Base.$1") %>% dyAxis("y", label = "$")%>%
dyAnnotation("2017-05-01", text = May1, tooltip = "Initial Investment", width = 70, height = 18, tickHeight = -75)%>%
dyAnnotation(LastDay, text = Today, tooltip = "Percentage Increase",width = 70, attachAtBottom = TRUE) %>%
dyAxis("y", label = "Dollars USD")
For this example I'm going to use May 1st as the initial point of investment. On this portfolio I'm getting an 11.33% return from May 1st, calculated by taking the current value ($1.37) and dividing it by the May 1st value ($1.23057).
However when I use a different method I get a different answer which is strange because I would have thought this second method was the accurate way of calculating the return of a portfolio.
Firstly I create a dataframe which has the stock values at May 1st and their current values. Then I multiply both by their respective weighting in the portfolio. Here's the output:
May1 Current Stock.Weights May1C CurrentC
GOOG 912.57 926.50 0.150 136.8855 138.97500
AMZN 948.23 965.90 0.200 189.6460 193.18000
BA 182.39 238.78 0.250 45.5975 59.69500
FB 152.46 170.95 0.225 34.3035 38.46375
AAPL 146.58 158.63 0.175 25.6515 27.76025
May1C = May1 * Stock.Weights | CurrentC = Current * Stock.Weights
Now when I sum both May1C and CurrentC I get:
> sum(df$May1C)
[1] 432.084
> sum(df$CurrentC)
[1] 458.074
Which I thought would be the current value of the portfolio, as it is each stock's price multiplied by its respective weight. This yields an increase of only 6.015%.
My question is: how is the Return.portfolio function returning an 11.33% increase, whereas the second method returns 6.015%?
Edit in reply to the comments: I have found that calling Return.portfolio with verbose = TRUE returns the stock weights changing over time. This output shows the weights drifting over time in its EOP (end-of-period) and BOP (beginning-of-period) components.
For reference, here's the complete code to run the dygraph output:
library(quantmod)              # getSymbols(), periodReturn()
library(PerformanceAnalytics)  # Return.portfolio()
library(dygraphs)              # dygraph()
library(scales)                # percent()
library(magrittr)              # %>%
daily_stock_returns = function(ticker) {
  symbol <- getSymbols(ticker, src = 'google', auto.assign = FALSE, warnings = FALSE)
  symbol <- xts::last(symbol, "1 year")
  data <- periodReturn(symbol, period = 'daily', type = 'log')
  colnames(data) <- as.character(ticker)
  assign(ticker, data, .GlobalEnv)
}
# Creates GOOG, AMZN, BA, FB and AAPL in the global environment
invisible(lapply(c("GOOG", "AMZN", "BA", "FB", "AAPL"), daily_stock_returns))
Portfolio <- merge.xts(GOOG,AMZN,BA,FB,AAPL)
test <- periodReturn(Portfolio, period = 'monthly', type = 'log')
stock_weights <- c(.15, .20, .25, .225, .175)
dollar_growth <- Return.portfolio(Portfolio, weights = stock_weights, wealth.index = TRUE)
May1 <- as.numeric(dollar_growth["2017-05-01"])
format(round(May1, 3), nsmall = 2)
Today <- as.numeric(xts::last(dollar_growth, "1 day"))
Today <- ((Today/May1)-1) %>% percent()
format(round(May1, 3), nsmall = 2)
LastDay <- xts::last(dollar_growth, "1 day")
dygraph(dollar_growth, main = "Portfolio Growth Base.$1")
If you want to see the dollar value of the portfolio components and of the total portfolio, you can do the following. This assumes you want to invest in the portfolio on “DayStart (2017-01-01)” with allocation “alloc (.15, .20, .25, .225, .175)” and then, WITHOUT rebalancing, let it run its course until “DayEnd (2017-05-01)”:
Initial allocation (e.g. 1000 USD in total) for GOOG, AMZN, BA, FB, AAPL: 150, 200, 250, 225, 175
Taking your portfolio returns “Portfolio” (I took ‘discrete’ returns, not ‘log’):
startCapital <- c(150, 200, 250, 225, 175)
portDollar <- cumprod(1+Portfolio["::2017-05-01"]) * startCapital
portDollar <- cbind(portDollar,portf=rowSums(portDollar))
You can now plot the portfolio value in Dollars or convert it back to returns.
               GOOG     AMZN       BA       FB     AAPL    portf
2017-01-03 151.4052 248.5942 175.7486 201.4256 225.6790 1002.853
2017-01-04 202.0686 224.7743 152.2168 255.6943 175.3316 1010.086
2017-01-05 254.8609 180.1164 203.0709 233.9321 151.0465 1023.027
...
2017-04-27 195.9950 241.4572 262.7753 190.4188 309.3954 1200.042
2017-04-28 173.9812 303.9860 206.1689 258.2377 278.1846 1220.558
2017-05-01 233.6613 280.3763 174.3678 327.5105 220.7346 1236.650
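Two points worth checking when comparing these numbers, illustrated with only the May 1 and current prices from the question's table. First, the question's second method multiplies each price by its weight, which treats the weights as share counts, so high-priced stocks such as GOOG and AMZN dominate; treating the weights as dollar allocations instead means summing weight × (current price / May 1 price). Neither figure need equal the Return.portfolio number, whose weights had already drifted by May 1, as the verbose output in the question shows. Second, in R a matrix multiplied by a vector recycles that vector down the columns, not across them, so per-column scaling is safer with sweep().

```r
# May 1 and current prices from the question's table
prices <- rbind(
  May1    = c(GOOG = 912.57, AMZN = 948.23, BA = 182.39, FB = 152.46, AAPL = 146.58),
  Current = c(GOOG = 926.50, AMZN = 965.90, BA = 238.78, FB = 170.95, AAPL = 158.63))
w <- c(0.150, 0.200, 0.250, 0.225, 0.175)

# The question's second method: weights act as share counts
ret_shares <- sum(prices["Current", ] * w) / sum(prices["May1", ] * w) - 1

# Weights as dollar allocations of $1 on May 1, held without rebalancing
rel <- sweep(prices, 2, prices["May1", ], "/")  # price relatives, column by column
dollars <- sweep(rel, 2, w, "*")                 # dollar value of each holding
ret_alloc <- sum(dollars["Current", ]) - 1

# Pitfall: `rel * w` recycles w down the columns and mixes up the allocations
recycled_wrong <- !isTRUE(all.equal(rel * w, dollars))

round(c(shares = ret_shares, allocation = ret_alloc), 4)
```

For the xts object in the answer above, multiplying by rep(startCapital, each = nrow(...)) should give the intended per-column scaling, since that vector lines up with the column-major layout.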

Compound interest calculation on changing balance for data.table

I have a data.table which has a balance. The balance is based on deposits/withdrawals each period. Each period there is an interest rate that should be applied. However, I am not able to compound the interest on the balances, i.e. apply the interest rate to the balance and then use the updated balance in the next period to calculate the new value.
Balance_t1 = (0 + Deposit_t1)*(1+Interest_t1)
Balance_t2 = (Balance_t1 + Deposit_t2)*(1+Interest_t2)
Balance_t3 = (Balance_t2 + Deposit_t3)*(1+Interest_t3)
I have the following data.table
library(data.table)
dtCash <- data.table(
  Deposit = c(100, 100, -300, 0),
  Balance = c(100, 200, -100, -100),
  Interest = c(0.1, 0.01, 0.2, 0.1)
)
The result would be:
dtCash <- data.table(
  Deposit = c(100, 100, -300, 0),
  Balance = c(100, 200, -100, -100),
  Interest = c(0.1, 0.01, 0.2, 0.1),
  BalanceWithInterest = c(110, 212.1, -105.48, -116.028)
)
How do I update and reference the updated Balance column in every period?
It seems like you're looking for a "cumulative sum and product", which I don't know of a way to do in R (other than, for instance, with @dynamo's for loop).
That being said, this can be done efficiently with a relatively simple Rcpp solution:
library(Rcpp)
getBalance <- cppFunction(
  "NumericVector getBalance(NumericVector deposit,
                            NumericVector interest) {
     NumericVector result(deposit.size());
     double prevResult = 0.0;
     for (int i = 0; i < deposit.size(); ++i) {
       // new balance = (previous balance + this period's deposit) * (1 + rate)
       result[i] = (prevResult + deposit[i]) * (1.0 + interest[i]);
       prevResult = result[i];
     }
     return result;
   }")
Deposit <- c(100, 100, -300, 0)
Interest <- c(0.1, 0.01, 0.2, 0.1)
getBalance(Deposit, Interest)
# [1] 110.000 212.100 -105.480 -116.028
To give a sense of the efficiency improvements of Rcpp vs. base R:
# Base R solution
f2 = function(Deposit, Interest) {
  Balance <- c(0, rep(NA, length(Deposit)))
  for (i in 2:length(Balance)) {
    Balance[i] = (Balance[i-1] + Deposit[i-1]) * (1+Interest[i-1])
  }
  Balance[-1]  # drop the leading 0 so the output lines up with the deposits
}
Deposit <- runif(1000000, -1, 2)
Interest = runif(1000000, 0, 0.05)
system.time(getBalance(Deposit, Interest))
# user system elapsed
# 0.008 0.000 0.008
system.time(f2(Deposit, Interest))
# user system elapsed
# 4.701 0.008 4.730
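For what it's worth, base R can also express this recurrence without an explicit loop. Reduce() with accumulate = TRUE carries the previous balance forward. And because the recurrence is linear, there is also a closed form: with growth factors P = cumprod(1 + Interest) and P_0 = 1, each balance is B_t = P_t * sum over s <= t of D_s / P_(s-1), which is exactly a cumulative product combined with a cumulative sum. Both are sketches; the closed form divides by a running product and may lose precision over very long series.

```r
Deposit  <- c(100, 100, -300, 0)
Interest <- c(0.1, 0.01, 0.2, 0.1)

# 1) Reduce(): fold over periods, carrying the previous balance forward;
#    init = 0 is the starting balance, [-1] drops it from the output
bal_reduce <- Reduce(function(prev, i) (prev + Deposit[i]) * (1 + Interest[i]),
                     seq_along(Deposit), accumulate = TRUE, init = 0)[-1]

# 2) Closed form: B_t = P_t * cumsum(D_s / P_(s-1)), with P = cumprod(1 + r)
growth <- cumprod(1 + Interest)
bal_closed <- growth * cumsum(Deposit / c(1, head(growth, -1)))

bal_reduce
bal_closed
```

Inside a data.table, either expression can be assigned by reference, e.g. dtCash[, BalanceWithInterest := Reduce(...)].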
Not enough rep to comment yet:
Can you give an indication of what data you have at each point/ when you wish to update? Do you wish to calculate, say, balance_after_interest(t) based on interest, balance(t-1) and deposits(t)?
A somewhat messy answer:
library(data.table)
dtCash <- data.table(
  Deposit = c(100, 100, -300, 0),
  Balance = c(100, 200, -100, -100),
  Interest = c(0.1, 0.01, 0.2, 0.1)
)
# Add row for t = 0
dtCash <- rbind(data.table(Deposit = 0, Balance = 0, Interest = 0), dtCash)
# Add "dummy" column for interest-accrued balance
dtCash$Balance.1 <- c(0, rep(NA_real_, nrow(dtCash) - 1))
for ( i in seq(nrow(dtCash))[-1] ) {
  dtCash$Balance.1[i] <- (dtCash$Balance.1[i - 1] + dtCash$Deposit[i]) *
    (1 + dtCash$Interest[i])
}
# Deposit Balance Interest Balance.1
# 1: 0 0 0.00 0.000
# 2: 100 100 0.10 110.000
# 3: 100 200 0.01 212.100
# 4: -300 -100 0.20 -105.480
# 5: 0 -100 0.10 -116.028
Is this what you mean? This isn't super efficient, but it does give you what you are looking for. With some clever re-parameterisation you might be able to work around the explicit loop.
Also, if your problem size is small, you could just as well use data.frame rather than data.table. In this case, the notation would be identical. (And in this case, there is no advantage from using data.table.)
I think you need to pull out the data, work it out with lapply(), and update it. I don't think there's any vector way to do it:
balance <- dtCash$Balance
interest <- dtCash$Interest
lapply(1:(length(interest)-1),   # leave the last entry - nothing to add it to
       function(x) {balance[x+1] <<- balance[x+1] + balance[x]*interest[x]}  # remember the double arrow
)                                # because you're in a function
dtCash$rollBal <- balance
Deposit Balance Interest rollBal
1: 100 100 0.10 100.00
2: 100 200 0.01 220.00
3: -300 -100 0.20 -95.70
4: 0 -100 0.10 -138.72
