Discount rates & Present value in R

I have two vectors, one with different interest rates on a monthly basis, and one with cash flows:
interest<-c(0,.0448,.0452,.0428,.0428,.0452,.051,.0475,.04997)
cash_flows<-c(44273.81, 44176.68, 66849.14, 123792.30, 101141.25, 190894.12, 82724.14, 257075.63, 176920.29, 482068.00, 429030.01, 348291.50)
The first element of the interest vector is 0 because that is my present-value basis; I want to bring all values back to that point. All the other values are the interest rates of the other months, and I intend to discount the corresponding cash flows to that first period. The cash flows are also given on a monthly basis. The R procedure that I am writing is the following:
discount_vector <- (1 + interest)^-(0:(length(interest) - 1))
discount_vector * cash_flows
And in order to get the total investment amount I just call the following:
sum(discount_vector * cash_flows)
Are my conceptual reasoning and code correct?
Thanks for any attention and support.
Present value formula using discount factors (note that the rate vector must have one entry per cash flow, so i below has 12 monthly rates to match the 12 cash flows):
i<-c(0,.0448,.0448,.0452,.0428,.0428,.0428,.0452,.0451,.0475,.0475,.04997)
cash_flows<-c(44273.81, 44176.68, 66849.14, 123792.30, 101141.25, 190894.12, 82724.14, 257075.63, 176920.29, 482068.00, 429030.01, 348291.50)
discount_vector <- (1 + i)^-(0:(length(i) - 1))
discount_vector * cash_flows
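For completeness, a runnable version of the whole calculation as set up above (the first discount factor is (1 + 0)^0 = 1, so the first cash flow is already at the present-value basis):

i <- c(0, .0448, .0448, .0452, .0428, .0428, .0428, .0452, .0451, .0475, .0475, .04997)
cash_flows <- c(44273.81, 44176.68, 66849.14, 123792.30, 101141.25, 190894.12,
                82724.14, 257075.63, 176920.29, 482068.00, 429030.01, 348291.50)

# discount factor for month t is (1 + i[t])^-(t - 1)
discount_vector <- (1 + i)^-(0:(length(i) - 1))
present_values <- discount_vector * cash_flows
sum(present_values)   # total present value of all the cash flows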

Related

Trouble with Loop in R - there must be a better way?

I am new to R, and trying to get a handle on things for a school project.
The idea is to model a simple and hypothetical electricity generation/storage system which uses only solar panels and battery storage to meet energy demand. The aim is, given a predetermined storage capacity, to select the least amount of solar paneling that ensures that demand will be satisfied on every day of the year. I have a full year of daily data: solar insolation figures that determine how productive panels will be, day-time electricity demand, and night-time electricity demand. Surplus generation during the day is stored in batteries, up to the predetermined limit, and then discharged at night to meet demand.
I have written a simple R program, f(x), where x is the amount of solar paneling that is installed. Like the battery-storage parameter, it is invariant over the entire year.
After creating new variables for the total power output per day and total excess power produced per day and adding these as columns 4 and 5 to the original data frame, the program creates two new vectors "batterystartvector" and "batterymidvector," which respectively indicate the battery level at the start of each day and at the midpoint, between day and night.
The program loops over each day (row) in the data frame, and:
(1) Credits the excess power that is produced (column 5) to the storage system up to the predetermined limit (7500 Megawatt hours in my example) - which is then stored in "batterymidvector."
(2) Subtracts the night demand (column 3) from the total just registered in "batterymidvector" to determine how much energy there will be in storage at the start of the next day, and stores this figure in "batterystartvector."
(3) Repeats for all 365 days.
Ultimately, my aim is to use an optimization package, such as DEoptimR, to determine the lowest value for x that ensures that demand is satisfied on all days - that is, that no values in either "batterymidvector" or "batterystartvector" are ever negative.
Since every entry in the two battery vectors depends on prior entries, I cannot figure out how to write a program that does not use a 'for' loop. But surely there must be a simpler and less clunky way (one possible alternative is sketched below, after the code).
Here is the code:
library(DEoptimR)
setwd("C:/Users/User/Desktop/Thesis Stuffs/R Programs")
data <- read.csv("optdata1.csv", header=TRUE)

#x is pv installed and y is pumped-storage capacity
#default is that the system starts with a completely full reservoir
f <- function(x) {
  data$output <<- (data$insolation * x) / 1000
  data$daybalance <<- data$output - data$day
  #battery level at the start of each day (366 slots, so the last iteration fits)
  batterystartvector <<- vector(mode = "numeric", length = 366)
  batterystartvector[1] <<- 7500
  #battery level at the midpoint between day and night
  batterymidvector <<- vector(mode = "numeric", length = 365)
  for (i in 1:nrow(data)) {
    #charging up, capped at the storage limit
    batterymidvector[i] <<- min(batterystartvector[i] + data[i, 5], 7500)
    #depleting overnight
    batterystartvector[i + 1] <<- batterymidvector[i] - data[i, 3]
  }
}
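For reference, the same recurrence can be written without globals or manual index bookkeeping by accumulating the battery state with Reduce(); this is an untested sketch that assumes the CSV columns are named insolation, day and night (the code above refers to them only by position):

simulate_battery <- function(x, data, cap = 7500, start_level = 7500) {
  daybalance <- (data$insolation * x) / 1000 - data$day
  #one day's transition: charge up to the cap, then discharge to meet night demand
  step <- function(level, i) min(level + daybalance[i], cap) - data$night[i]
  #battery level at the start of each day (length nrow(data) + 1, starting full)
  starts <- Reduce(step, seq_len(nrow(data)), init = start_level,
                   accumulate = TRUE)
  #battery level at the day/night midpoint of each day
  mids <- pmin(head(starts, -1) + daybalance, cap)
  list(start = starts, mid = mids)
}

#feasibility check for an optimizer such as DEoptimR::JDEoptim:
#all(unlist(simulate_battery(x, data)) >= 0)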

Calculate average return of strategy

Scenario (using quantstrat, blotter and PortfolioAnalytics):
I have 10k initial equity.
I have a strategy that I want to backtest over a 3000-symbol universe (stocks).
Let's say the strategy is a simple MA crossover.
Every time I get a buy crossover I buy 10k worth of stock and close the position on the sell crossover.
For backtesting purposes the strategy can trade without any portfolio restriction, so I may be holding 100+ positions at any point in time; therefore the initial equity shouldn't be considered.
I want to know the AVERAGE return of this strategy over all trades.
In reality, if I only had 10k I would only be able to be in one trade at a time, but I would like to know statistically what the average return would be.
I then want to compare this with the stock index benchmark.
Do I SUM or MEAN the return stream of each symbol?
Is it the return of the portfolio, and does that take into account the initial equity? I don't want the return expressed as a percentage of the initial equity, or to depend on how many symbols are trading.
I'll add an example strategy when I get time, but the solution to the problem is:
#get the portfolio returns
instRets <- PortfReturns(account.st)
#for each column, set the entries where there is no return to NA, because when
#the values are averaged out you don't want 0s included in the calculation.
#If there are no signals in the strategy you would invest the money elsewhere
#rather than leaving it lying around, so you only calculate returns when the
#strategy is ACTIVE.
for (i in 1:ncol(instRets)) {
  instRets[, i][instRets[, i] == 0] <- NA
}
#this gives the average return when the strategy is active; if there are 100
#trades on, you want the average return during that period
portfRets <- xts(rowMeans(instRets, na.rm = TRUE), order.by = index(instRets))
portfRets <- portfRets[!is.na(portfRets)]
Now you can compare the strategy with a benchmark, SPY for example. If the strategy has alpha you can use a balancing rule to apply funds to the strategy when signals arise, or keep them invested in the index when there are no signals.
As far as I know, the returns analysis built into blotter uses the initial equity to work out returns, so invest the same amount in each trade as you have for initial equity: 10k initial equity, 10k per trade.
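A minimal sketch of the benchmark comparison mentioned above, using quantmod and PerformanceAnalytics; the SPY ticker and start date are assumptions, and portfRets is the active-period return series built above:

library(quantmod)
library(PerformanceAnalytics)

#daily SPY returns over the backtest window (the date range is an assumption)
getSymbols("SPY", from = "2010-01-01", auto.assign = TRUE)
spyRets <- Return.calculate(Ad(SPY), method = "discrete")

#line the benchmark up with the strategy's active-period returns
comparison <- na.omit(merge(portfRets, spyRets))
colnames(comparison) <- c("strategy", "SPY")

table.AnnualizedReturns(comparison)
charts.PerformanceSummary(comparison)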

Correlation: lower values better than higher values in R

I am trying to calculate the correlation between a vector of investment returns and a matching vector that has a number from 1 to 5 rating the quality of the company. It looks something like this (let's call this data returnrank):
company   returns   rank
at&t      0.09034   2
verizon   0.23341   1
sprint    0.03021   3
How can I make it so that when I calculate cor(returnrank$returns, returnrank$rank) it treats lower values as better and higher values as worse in the rank column (i.e., if a stock has high returns and what R would consider a low score (1), I want to see a high positive correlation, because I am treating 1 as better than 5)?
You probably just want:
cor(returnrank$returns, max(returnrank$rank) - returnrank$rank)
It may be better to just graph the data, since it's unlikely to be a linear relationship given the nature of ranks.
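A minimal reproducible sketch using the sample rows from the question:

#recreate the sample data shown above
returnrank <- data.frame(
  company = c("at&t", "verizon", "sprint"),
  returns = c(0.09034, 0.23341, 0.03021),
  rank    = c(2, 1, 3)
)

cor(returnrank$returns, returnrank$rank)                        #negative: rank 1 is best
cor(returnrank$returns, max(returnrank$rank) - returnrank$rank) #same magnitude, positive sign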

Calculating PERCENTILE in DAX

I have googled and keep ending up with formulas which are too slow. I suspect that if I split the formula into steps (creating calculated columns), I might see some performance gain.
I have a table with some numeric columns along with some which would end up as slicers. The intention is to have the 10th, 25th, 50th, 75th and 90th percentiles over some numeric columns for the selected slicer.
This is what I have for the 10th percentile over the column "Total Pd":
TotalPaid10thPercentile:=
MINX(
    FILTER(
        VALUES(ClaimOutcomes[Total Pd]),
        CALCULATE(
            COUNTROWS(ClaimOutcomes),
            ClaimOutcomes[Total Pd] <= EARLIER(ClaimOutcomes[Total Pd])
        ) > COUNTROWS(ClaimOutcomes) * 0.1
    ),
    ClaimOutcomes[Total Pd]
)
It takes several minutes and still no data shows up. I have around 300K records in this table.
I figured out a way to break the calculation down into a series of steps, which yielded a pretty fast solution.
For calculating the 10th percentile on Amount Paid in the table Data, I followed the standard out-of-the-book formula:
Calculate the ordinal rank of the 10th percentile element:
10ptOrdinalRank:=0.10*(COUNTX('Data', [Amount Paid]) - 1) + 1
This may come out as a decimal (fractional) number, such as 112.45.
Compute the decimal part:
10ptDecPart:=[10ptOrdinalRank] - TRUNC([10ptOrdinalRank])
Compute the ordinal rank of the element just below (floor):
10ptFloorElementRank:=FLOOR([10ptOrdinalRank],1)
Compute the ordinal rank of the element just above (ceiling):
10ptCeilingElementRank:=CEILING([10ptOrdinalRank], 1)
Compute the element corresponding to the floor rank:
10ptFloorElement:=MAXX(TOPN([10ptFloorElementRank], 'Data',[Amount Paid],1), [Amount Paid])
Compute the element corresponding to the ceiling rank:
10ptCeilingElement:=MAXX(TOPN([10ptCeilingElementRank], 'Data',[Amount Paid],1), [Amount Paid])
Interpolate between the two to get the percentile value:
10thPercValue:=[10ptFloorElement] + [10ptDecPart]*([10ptCeilingElement]-[10ptFloorElement])
I have found the performance remarkably faster than some other solutions I found on the net. Hope it helps someone in future.
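If you want to sanity-check the interpolated values outside DAX, this ordinal-rank-plus-linear-interpolation scheme is the same one used by R's default quantile type (type 7); a quick check against a hypothetical export of the Amount Paid column might look like:

#hypothetical CSV export of the Amount Paid column for cross-checking
amount_paid <- read.csv("amount_paid_export.csv")$Amount.Paid
#type 7 uses the same 0.10 * (N - 1) + 1 ordinal rank and linear interpolation
quantile(amount_paid, probs = c(0.10, 0.25, 0.50, 0.75, 0.90), type = 7)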

Summarized huge data, How to handle it with R?

I am working on EBS Forex market limit order book (LOB) data; here is an example of the LOB in a 100 millisecond time slice:
datetime | side (0 = Bid, 1 = Ask) | distance (1 = best price, 2 = 2nd best, etc.) | price
2008/01/28,09:11:28.000,0,1,1.6066
2008/01/28,09:11:28.000,0,2,1.6065
2008/01/28,09:11:28.000,0,3,1.6064
2008/01/28,09:11:28.000,0,4,1.6063
2008/01/28,09:11:28.000,0,5,1.6062
2008/01/28,09:11:28.000,1,1,1.6067
2008/01/28,09:11:28.000,1,2,1.6068
2008/01/28,09:11:28.000,1,3,1.6069
2008/01/28,09:11:28.000,1,4,1.6070
2008/01/28,09:11:28.000,1,5,1.6071
2008/01/28,09:11:28.500,0,1,1.6065 (I skip the rest)
To summarize the data, they apply two rules (I have changed them a bit for simplicity):
If there is no change in the LOB on the Bid or Ask side, that side is not recorded. Look at the last line of the data: the millisecond field was 000 and is now 500, which means there was no change in the LOB on either side for 100, 200, 300 and 400 milliseconds (but that information is important for any calculation).
When the last price (only the last) is removed from a given side of the order book, a single record is written with nothing in the price field; again there is no record for the rest of the LOB at that time.
Example: 2008/01/28,09:11:28.800,0,1,
I want to calculate minAsk - maxBid (1.6067 - 1.6066) or the weighted average price (using the sizes at all distances as weights; there is a size column in my real data). I want to do this for my whole data set, but as you can see the data has been summarized, so this is not routine. I have written code that reproduces the whole (unsummarized) data. This is fine for a small data set, but for a large one I end up creating a huge file. I was wondering if you have any tips on how to handle the data, and how to fill the gaps efficiently.
You did not give a great reproducible example, so this will be pseudo/untested code. Read the docs carefully and make adjustments as needed.
I'd suggest you first filter and split your data into two data.frames:
best.bid <- subset(data, side == 0 & distance == 1)
best.ask <- subset(data, side == 1 & distance == 1)
Then, for each of these two data.frames, use findInterval to compute the corresponding best ask or best bid:
best.bid$ask <- best.ask$price[findInterval(best.bid$time, best.ask$time)]
best.ask$bid <- best.bid$price[findInterval(best.ask$time, best.bid$time)]
(For this to work you might have to transform the date/time into a linear measure, e.g. time in seconds since market opening.)
Then it should be easy:
min.spread <- min(c(best.bid$ask - best.bid$price,
                    best.ask$bid - best.ask$price))
I'm not sure I understand the end-of-day particularity, but I bet you could just compute the spread at market close and add it to the final min call.
For the weighted average prices, use the same idea but instead of the two best.bid and best.ask data.frames, you should start with two weighted.avg.bid and weighted.avg.ask data.frames.
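For completeness, a minimal sketch of the timestamp conversion and best-bid/best-ask split described above, assuming the raw file is comma-separated with the fields shown in the sample (the file name and column names are assumptions):

#assumed field order, matching the sample rows: date, time, side, distance, price
lob <- read.csv("lob_sample.csv", header = FALSE,
                col.names = c("date", "time", "side", "distance", "price"))

#linear time measure: seconds (with milliseconds) since the epoch
lob$secs <- as.numeric(as.POSIXct(paste(lob$date, lob$time),
                                  format = "%Y/%m/%d %H:%M:%OS", tz = "UTC"))

best.bid <- subset(lob, side == 0 & distance == 1)
best.ask <- subset(lob, side == 1 & distance == 1)

#attach the prevailing best ask to each best-bid record; findInterval() returns 0
#for bids quoted before the first ask, so drop or handle those rows separately
best.bid$ask <- best.ask$price[findInterval(best.bid$secs, best.ask$secs)]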
