I have a data.table which has a balance. The balance is based on deposits/withdrawals each period. Each period there is an interest rate that should be applied. However, I am not able to compound the interest on the balances: that is, apply the interest rate to the balance and then use the updated balance in the next period to calculate the new value.
Balance_t1 = (0 + Deposit_t1)*(1+Interest_t1)
Balance_t2 = (Balance_t1 + Deposit_t2)*(1+Interest_t2)
Balance_t3 = (Balance_t2 + Deposit_t3)*(1+Interest_t3)
I have the following data.table
dtCash <- data.table(
Deposit = c(100, 100, -300, 0),
Balance = c(100, 200, -100, -100),
Interest=c(0.1, 0.01, 0.2, 0.1)
)
The result would be:
dtCash <- data.table(
Deposit = c(100, 100, -300, 0),
Balance = c(100, 200, -100, -100),
Interest=c(0.1, 0.01, 0.2, 0.1),
BalanceWithInterest = c(110, 212.1, -105.48, -116.028)
)
How do I update and reference the updated Balance column in every period?
It seems like you're looking for a combined "cumulative sum and product", which I don't know of a built-in way to do in R (other than, for instance, with @dynamo's for loop).
That being said, this can be done efficiently with a relatively simple Rcpp solution:
library(Rcpp)
getBalance <- cppFunction(
  "NumericVector getBalance(NumericVector deposit,
                            NumericVector interest) {
     NumericVector result(deposit.size());
     double prevResult = 0.0;
     for (int i = 0; i < deposit.size(); ++i) {
       result[i] = (prevResult + deposit[i]) * (1.0 + interest[i]);
       prevResult = result[i];
     }
     return result;
   }")
Deposit <- c(100, 100, -300, 0)
Interest <- c(0.1, 0.01, 0.2, 0.1)
getBalance(Deposit, Interest)
# [1] 110.000 212.100 -105.480 -116.028
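For reference, the same recurrence can also be written in base R with Reduce() and accumulate = TRUE. This is only a sketch using the four-element Deposit and Interest vectors above; it is still an R-level loop underneath, so it will not approach the Rcpp timing below:
balance <- Reduce(function(prev, i) (prev + Deposit[i]) * (1 + Interest[i]),
                  seq_along(Deposit), init = 0, accumulate = TRUE)[-1]
balance
# [1]  110.000  212.100 -105.480 -116.028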
To give a sense of the efficiency improvements of Rcpp vs. base R:
# Base R solution
f2 <- function(Deposit, Interest) {
  Balance <- c(0, rep(NA, length(Deposit)))
  for (i in 2:length(Balance)) {
    Balance[i] <- (Balance[i-1] + Deposit[i-1]) * (1 + Interest[i-1])
  }
  return(Balance[-1])
}
set.seed(144)
Deposit <- runif(1000000, -1, 2)
Interest = runif(1000000, 0, 0.05)
system.time(getBalance(Deposit, Interest))
# user system elapsed
# 0.008 0.000 0.008
system.time(f2(Deposit, Interest))
# user system elapsed
# 4.701 0.008 4.730
Not enough rep to comment yet:
Can you give an indication of what data you have at each point / when you wish to update? Do you wish to calculate, say, balance_after_interest(t) based on interest, balance(t-1) and deposits(t)?
A somewhat messy answer:
library(data.table)
dtCash <- data.table(
Deposit = c(100, 100, -300, 0),
Balance = c(100, 200, -100, -100),
Interest=c(0.1, 0.01, 0.2, 0.1)
)
# Add row for t = 0
dtCash <- rbind(as.list(rep(0, ncol(dtCash))), dtCash, use.names = FALSE)
# Add "dummy" column for interest-accrued balance
dtCash$Balance.1 <- c(0, rep(NA, nrow(dtCash)-1))
for ( i in seq(nrow(dtCash))[-1] ) {
dtCash$Balance.1[i] <- (dtCash$Balance.1[i - 1] + dtCash$Deposit[i]) *
(1 + dtCash$Interest[i])
}
dtCash
# Deposit Balance Interest Balance.1
# 1: 0 0 0.00 0.000
# 2: 100 100 0.10 110.000
# 3: 100 200 0.01 212.100
# 4: -300 -100 0.20 -105.480
# 5: 0 -100 0.10 -116.028
Is this what you mean? This isn't super efficient, but it does give you what you are looking for. With some clever re-parameterisation you might be able to work around the explicit loop.
Also, if your problem size is small, you could just as well use a data.frame rather than a data.table; the notation would be identical, and there would be no advantage to data.table in that case.
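For what it's worth, one such re-parameterisation is possible here: dividing the recurrence B_t = (B_{t-1} + D_t) * (1 + i_t) by the cumulative growth factor P_t = prod(1 + i_1 ... i_t) gives B_t / P_t = B_{t-1} / P_{t-1} + D_t / P_{t-1}, which is a plain cumulative sum. A sketch with the vectors from the original question:
Deposit  <- c(100, 100, -300, 0)
Interest <- c(0.1, 0.01, 0.2, 0.1)
P <- cumprod(1 + Interest)                        # cumulative growth factor
Balance <- P * cumsum(Deposit / c(1, head(P, -1)))
Balance
# [1]  110.000  212.100 -105.480 -116.028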
I think you need to pull out the data, work it out with lapply(), and update it. I don't think there's a vectorised way to do it:
interest<-dtCash[,Interest]
balance<-dtCash[,Balance]
lapply(1:(length(interest) - 1),   # leave the last entry - nothing to add it to
       function(x) {
         balance[x + 1] <<- balance[x + 1] + balance[x] * interest[x]  # remember the double arrow
       })                                                              # because you're in a function
dtCash[,rollBal:=balance]
Deposit Balance Interest rollBal
1: 100 100 0.10 100.00
2: 100 200 0.01 220.00
3: -300 -100 0.20 -95.70
4: 0 -100 0.10 -138.72
Suppose I have a monthly income of 1200.
The interest rate is 1%, so after 1 year the income will increase by 1%.
I would like to find out how many years it will take until each investment breaks even.
Suppose an investment costs 200,000 with a monthly income of 1200.
My first-year and subsequent years' annual income will be:
firstYear = 1200 * 12
additionalYears = (1200*12)*(1+0.01)^c(1:5)
c(firstYear, additionalYears)
14400.00 14544.00 14689.44 14836.33 14984.70 15134.54
I would like to make the "5" in the above example dynamic until it finds the breakeven point.
In this example I have:
sum(c(firstYear, additionalYears))
198854.3
So the investment has not broken even yet. Adjusting it to "12" gives me the breakeven point:
firstYear = 1200 * 12
additionalYears = (1200*12)*(1+0.01)^c(1:12)
sum(c(firstYear, additionalYears))
If possible I would like to determine the month of that year it will break even (so given this example it breaks even in month 12 of year 12, but others might break even in month 8 of year 6, etc.).
Instead of using a loop, you can use vectors. Set the maximum to 100 years, create a vector of incomes to grow and a vector of growth factors. Multiply the two and take the cumulative sum of the (negative) cost and the incomes. Count the number of times the sum is negative; that is your break-even year.
cost = -200000 # negative cost
income = 1200*12 # annual income
i = 0.01 # interest rate to grow income after year 0.
# repeat 14400 101 times, multiply it by (1+r)^n - R is vectorised
income100 = rep(income, 101) * ((1+i) ^ seq(0,100))
# subtract the cost from the cumulative sum of income
cumincome = cost + cumsum(income100)
# how many are negative?
yrs = sum(cumincome < 0)
# how much left to recover in next year
yrs + (-cumincome[yrs] / income100[yrs+1])
[1] 13.06991
Putting this into a function
break_even_years <- function(cost, income, interest = 0, period = "monthly"){
  if (cost >= 0) cost <- -cost
  if (period == "monthly") income <- income * 12
  income100 <- rep(income, 101) * ((1 + interest) ^ seq(0, 100))
  cumincome <- cost + cumsum(income100)
  # how many are negative?
  yrs <- sum(cumincome < 0)
  # how much left to recover in next year
  yrs + (-cumincome[yrs] / income100[yrs + 1])
}
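As a quick sanity check, calling it with the numbers from the worked example above should reproduce the same fractional-year answer:
break_even_years(200000, 1200, 0.01)
# [1] 13.06991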
Using the function
purrr::map2(
.x = cost,
.y = investment,
~ break_even_years(.x, .y, interest = 0.03, "annual"))
[[1]]
[1] 28.90435
[[2]]
[1] 23.75858
[[3]]
[1] 6.391264
[[4]]
[1] 3.505453
Benchmarking
library(microbenchmark)
microbenchmark(break_even_years(200000,1000,0.01),
find_break_even_year(1000, 200000, 0.01, 100), times = 1000)
Unit: microseconds
                                          expr   min      lq      mean median      uq      max neval
          break_even_years(200000, 1000, 0.01)  50.9   87.10  257.4185  119.0  159.05  36938.0  1000
 find_break_even_year(1000, 200000, 0.01, 100) 853.5 1247.05 3432.5157 1556.2 2391.35 145980.6  1000
I think this answers my question. If anybody can help with avoiding the for loop, that would be very helpful.
library(tidyverse)
investment = c(1000, 2000, 5000, 27000)
interest_rate = 0.03
cost = c(45000, 67900, 34678, 98367)
max_years = 100
find_break_even_year <- function(CF, investment, interest_rate, max_years){
  future_value <- list()
  for (year in 1:max_years){
    future_value[[year]] <- CF * (1 + interest_rate)^year
    future_value_sums <- sum(unlist(future_value))
    if (future_value_sums >= investment)
      return(year)
  }
}
purrr::map2(
.x = investment,
.y = cost,
~ find_break_even_year(.x, .y, interest_rate = 0.03, max_years = 100)
)
I've looked at several questions like this on SO and still cannot resolve it. I'm looking to add a constraint so that the change in volume (New Volume / Old Volume - 1) cannot be less than -10%.
Example:

Customer  Old_volume  Elasticity  Price
X         100         -0.68       15.00
#example dataset
df <- data.frame(customer = c("X"),
old_volume = c(100),
elasticity = c(-0.68),
price = c(15.00))
#function
f3 <- function(x) {
  new_vol <- 100 * (1 + (-0.68 * x))
  new_rev <- new_vol * (15.00 * (1 + x))
  new_cost <- new_vol * 11.25
  return(new_rev - new_cost)
}

n_vol <- function(x) {
  new_vol <- 100 * (1 + (-0.68 * x))
  return(new_vol)
}
#example run function
f3(0.25)
>>>[1] 622.5
#running the optimization
res <- optimize(f3, lower=0, upper=10, maximum = TRUE)
res$maximum
>>>0.6102941
res$objective
>>>[1] 754.9081
n_vol(0.6102941)
>>> 58.5
Therefore vol change % = 58.5/100-1 = -0.415, however I want to limit this to -0.1.
Now I want to add in a constraint where new_vol/old_vol - 1 >= -0.1. However, I'm not sure how to add this using optimize(), optim() or lpSolve(). I was reading through the lpSolve documentation and it seemed like the way to go, but I am confused as to what my vector of coefficients would be in objective.in. I also looked at other packages like constrOptim, but they seemed too complex for something one-dimensional.
Thanks
As an example: in Excel I would normally have a bunch of columns with these calculations and run Solver with the constraint of volume change >= -0.1. However, running a macro takes too long with more than 1,000 rows.
EDIT: Added input example
You can calculate your new boundary algebraically.
change_inv <- function(frac, oldvol = 100) { oldvol * (1 + frac) }
n_vol_inv <- function(vol) { (vol / 100 - 1) / -0.68 }
n_vol_inv(vol = change_inv(frac = -0.1))
# [1] 0.1470588
max_x <- optimize(f3,
lower = 0,
upper = n_vol_inv(vol = change_inv(-0.1)),
maximum = TRUE)
max_x
# $maximum
# [1] 0.1469922
#
# $objective
# [1] 535.9664
n_vol(max_x$maximum) / 100 - 1
# [1] -0.09995469
Unfortunately, this means that the maximum is just the boundary.
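If you would rather hand the constraint to an optimiser directly, the same algebraic bound can be passed to optim() with method = "L-BFGS-B". This is just a sketch (upper_x is only a local name; fnscale = -1 turns the minimiser into a maximiser):
upper_x <- n_vol_inv(vol = change_inv(frac = -0.1))
optim(par = 0.05, fn = f3, method = "L-BFGS-B",
      lower = 0, upper = upper_x,
      control = list(fnscale = -1))
# should land on the same boundary solution as optimize(), x ~ 0.147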
If I have a data frame of historic option data:
StrikePrice UnderlyingPrice Days2Exp RfRate DividendRate Volatility
47 45 4 0.02 0.5 0.2
50 55 20 0.03 0.1 0.35
And I am using the function 'AmerPutLSM' from the package 'LSMonteCarlo':
price = AmerPutLSM(Spot = 45, sigma = 0.2, n=500, m=100, Strike = 47, r= 0.02, dr = 0.5, mT = 4)
summary(price)
Is there any way I can run this function without manually having to change the values for the second row in my data frame? (I'm dealing with a lot of rows in reality.) An example that is wrong but gets across what I want to do:
price = AmerPutLSM(Spot = dataframe$StrikePrice[1:2], sigma = dataframe$Volatility[1:2] etc, etc...)
Thanks
You can use any of the apply family of functions here -
result <- apply(df, 1, function(x) AmerPutLSM(Spot = x['UnderlyingPrice'],
sigma = x['Volatility'], n=500, m=100, Strike = x['StrikePrice'],
r = x['RfRate'], dr = x['DividendRate'], mT = x['Days2Exp']))
result
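If you prefer to avoid apply() converting each row to a vector, the same row-wise call can be sketched with Map() over the individual columns (assuming the data frame is called df, as in the answer above):
result2 <- Map(function(spot, vol, strike, r, dr, mT)
                 AmerPutLSM(Spot = spot, sigma = vol, n = 500, m = 100,
                            Strike = strike, r = r, dr = dr, mT = mT),
               df$UnderlyingPrice, df$Volatility, df$StrikePrice,
               df$RfRate, df$DividendRate, df$Days2Exp)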
I am stuck with this problem.
I want to multiply my data, but with each portion of each observation multiplied by a different percentage.
As an example: say the first observation of my Var_1 has the value 5000.
I want to multiply the portion between 100 and 1000 by 2% (in this particular case 900 x 2%),
the portion between 1000 and 2000 by 3% (in this case 1000 x 3%),
and the portion >2000 by 5% (in this case 3000 x 5%),
and then add the sum of this process for each observation as a new variable.
Any idea how to proceed?
I have these data as an example:
library(tidyverse)
my_data <- tibble(Var_1 = c(5000, 1500, 350, 1200, 750, 1000,1250, 2500))
We can use case_when
library(dplyr)
my_data %>%
mutate(prop = case_when(between(Var_1, 100, 1000) ~ Var_1 * 0.02,
between(Var_1, 1000, 2000) ~ Var_1 * 0.03,
Var_1 > 2000 ~ Var_1 * 0.05))
You don't need to load a package just for this one operation; base R's logical indexing will do. It may be tempting to write an ifelse statement and that would probably work, but R excels when you use vectorised operations rather than loops.
# 1000 randomly selected numbers between 0 and 3000
data <- sample(c(0:3000), 1000, replace = TRUE)
# Multiply the data by a percentage based on its value.
# Logical indexing keeps this fully vectorised.
# Multiply values between 100 and 1000 by 0.02
data[data >= 100 & data < 1000] <- data[data >= 100 & data < 1000] * 0.02
# Multiply values between 1000 and 2000 by 0.03
data[data >= 1000 & data < 2000] <- data[data >= 1000 & data < 2000] * 0.03
# Multiply values greater than 2000 by 0.05
data[data >= 2000] <- data[data >= 2000] * 0.05
Here's a more generic way:
# First create a table of intervals and multipliers
bins <- data.frame(from = c(0, 100, 1000, 2000),
to = c(100, 1000, 2000, Inf),
multiplier = c(0, 0.02, 0.03, 0.05))
# Join that table to *every* row of your starting data
# Join that table to *every* row of your starting data
df <- merge(bins, my_data, all = TRUE) %>%
  mutate(
    # Calculate the overlap between each interval and your value
    interval_length = pmax(pmin(Var_1, to) - from, 0),
    # Multiply the amount of overlap by the given percentage
    amount = interval_length * multiplier
  )

# Add up the calculated amounts for each observation
df %>%
  group_by(Var_1) %>%
  summarise(total = sum(amount))
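As a check against the worked example in the question, the row for Var_1 == 5000 comes out to 900 * 0.02 + 1000 * 0.03 + 3000 * 0.05 = 198.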
I am trying to simulate a certain discrete variable depicting the "true state of the world" (say, "red", "green" or "blue") and an indicator that describes it somewhat imperfectly.
r_names <- c("R", "G", "B")
Let's say I have some prior belief about the distribution of the "reality" variable, which I will use to sample it.
r_probs <- c(0.3, 0.5, 0.2)
set.seed(100)
reality <- sample(seq_along(r_names), 10000, prob=r_probs, replace = TRUE)
Now, let's say I have a conditional probability table that stipulates the value of the indicator given each of the "realities"
ri_matrix <- matrix(c(0.7, 0.3, 0,
0.2, 0.6, 0.2,
0.05,0.15,0.8), byrow=TRUE,nrow = 3)
dimnames(ri_matrix) <- list(paste("real", r_names, sep="_"),
paste("ind", r_names, sep="_"))
ri_matrix
#>        ind_R ind_G ind_B
#> real_R  0.70  0.30   0.0
#> real_G  0.20  0.60   0.2
#> real_B  0.05  0.15   0.8
Since base::sample() is not vectorized for the prob argument, I have to:
sample_cond <- function(r, rim){
  unlist(lapply(r, function(x)
    sample(seq_len(ncol(rim)), 1, prob = rim[x, ], replace = TRUE)))
}
Now I can sample my "indicator" variable using the conditional probability matrix
set.seed(200)
indicator <- sample_cond(reality, ri_matrix)
Just to make sure the distributions turned out as expected:
prop.table(table(reality, indicator), margin = 1)
#> indicator
#> reality 1 2 3
#> 1 0.70043610 0.29956390 0.00000000
#> 2 0.19976124 0.59331476 0.20692400
#> 3 0.04365278 0.14400401 0.81234320
Is there a better (i.e. more idiomatic and/or efficient) way to sample a discrete variable conditioned on another discrete random variable?
UPDATE:
As suggested by @Mr.Flick, this is at least 50x faster, because it reuses probability vectors instead of repeatedly subsetting the conditional probability matrix.
sample_cond_group <- function(r, rim){
  il <- mapply(function(x, y) sample(seq(ncol(rim)), length(x), prob = y, replace = TRUE),
               x = split(r, r),
               y = split(rim, seq(nrow(rim))))
  unsplit(il, r)
}
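Usage mirrors sample_cond() above (the per-group draws won't reproduce the exact same indicator vector, but the conditional proportions should again match the matrix):
set.seed(200)
indicator2 <- sample_cond_group(reality, ri_matrix)
prop.table(table(reality, indicator2), margin = 1)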
You can be a bit more efficient by drawing all the random samples per group with a split/combine type strategy. That might look something like this
simFun <- function(N, r_probs, ri_matrix) {
  stopifnot(length(r_probs) == nrow(ri_matrix))
  ind <- sample.int(length(r_probs), N, prob = r_probs, replace = TRUE)
  grp <- split(data.frame(ind), ind)
  unsplit(Map(function(data, r) {
    draw <- sample.int(ncol(ri_matrix), nrow(data), replace = TRUE, prob = ri_matrix[r, ])
    data.frame(data, draw)
  }, grp, as.numeric(names(grp))), ind)
}
Then you can call it with
simFun(10000, r_probs, ri_matrix)
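The result is a data frame with the reality draw (ind) and the indicator draw (draw), so the conditional proportions can be checked the same way as above:
res <- simFun(10000, r_probs, ri_matrix)
prop.table(table(res$ind, res$draw), margin = 1)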