How to create a "dynamic" column in R? - r

I'm coding a portfolio analysis tool based on back-tests. In short, I want to add a balance column whose first value is the initial capital plus the result of the first trade, with every subsequent value updated by the % change of each trade, but I haven't worked out how to put that logic into code. The following code is a simplified example.
profit <- c(10, 15, -5, -6, 20)             # result of each trade
change <- profit / 1000                     # % change relative to the capital
balance <- c(1010, 1025, 1020, 1014, 1034)  # desired result, starting from 1000
data <- data.frame(profit, change, balance)
So far, the only way I can think of is to create a separate vector that increases or decreases based on the change column, but I'm not sure how to make it take the previous value into account. Doing balance = start_capital * (1 + change) gives the proportional increase relative to the same initial value every time, not relative to the previous balance plus the change of the new trade (I hope I explained myself).
Thanks,
Fernando.
EDIT
In the actual program the change value is correct, because each back-test updates its own balance with the result of each new trade. But my code combines several back-tests, and since the balance updates belong to each separate back-test rather than to the combination, the balance column is not usable once everything is combined. That is why I added the change column.

If you want to do this via the change column, we can use Reduce:
start_capital <- 1000
Reduce(function(x, y) x + x*y, data$change, init = start_capital, accumulate = TRUE)[-1]
#[1] 1010.000 1025.150 1020.024 1013.904 1034.182
Reduce with accumulate = TRUE gives the output in cumulative form, feeding the output of each iteration in as the input to the next one.
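The same running product can also be written without Reduce, using cumprod (a quick sketch, assuming the same data and start_capital as above):
start_capital * cumprod(1 + data$change)
#[1] 1010.000 1025.150 1020.024 1013.904 1034.182
Both forms compound each trade's percentage change onto the previous balance rather than onto the fixed initial capital.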

Related

Making a for loop in R

I am just getting started with R, so I am sorry if I say things that don't make sense.
I am trying to make a for loop which does the following:
l_dtest[[1]]<-vector()
l_dtest[[2]]<-vector()
l_dtest[[3]]<-vector()
l_dtest[[4]]<-vector()
l_dtest[[5]]<-vector()
all the way up to any number n. For example, if n were chosen to be 100, then it would repeat this all the way to l_dtest[[100]] <- vector().
I have made multiple attempts at this; here is one of them.
n<-4
p<-(1:n)
l_dtest<-list()
for(i in p){
print((l_dtest[i]<-vector())<-i)
}
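For reference, a minimal working version of that initialization might look like the sketch below (one option among several; replicate(n, vector(), simplify = FALSE) is a common one-liner):
n <- 100
l_dtest <- vector("list", n)   # a list with n empty (NULL) slots
for (i in seq_len(n)) {
  l_dtest[[i]] <- vector()     # [[ ]] (not [ ]) assigns an empty vector into slot i
}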
Again I am VERY new to R so I don't know what I am doing or what is wrong with this loop.
The detailed background for why I need to do this is that I need to write an R function that receives as input the size of the population "n", runs a simulation of the model below with that population size, and returns the number of generations it took to reach a MRCA (most recent common ancestor).
Here is the model,
We assume the population size is constant at n. Generations are discrete and non-overlapping. The genealogy is formed by this random process: in each generation, each individual chooses two parents at random from the previous generation. The choices are made randomly and equally likely over the n possibilities, and each individual chooses twice. All choices are made independently. Thus, for example, it is possible that, when an individual chooses his two parents, he chooses the same individual twice, so that in fact he ends up with just one parent; this happens with probability 1/n.
I don't understand the specific step at the beginning of this post or why I need to do it, but my teacher said I do. I don't know if this helps, but the next step is choosing parents for the first person and then combining the lists from the step I posted with a previous step. It looks like this:
sample(1:5, 2, replace=T)
#[1] 1 2
l_dtemp[[1]]<-union(l_dtemp[[1]], l_d[[1]]) # To my understanding, l_dtemp[[1]] now receives the list of descendants from l_d[[1]], because the latter chose l_dtemp[[1]] as its first parent
l_dtemp[[2]]<-union(l_dtemp[[2]], l_d[[1]]) # Same as above, but for l_d[[1]]'s 2nd choice, which is l_dtemp[[2]]
sample(1:5, 2, replace=T)
#[1] 1 3
l_dtemp[[1]]<-union(l_dtemp[[1]], l_d[[2]])
l_dtemp[[3]]<-union(l_dtemp[[3]], l_d[[2]])
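Putting the two steps together, one generation of the process described above might look like the following sketch (the names l_d and l_dtemp follow the question; this is an illustration, not the teacher's reference solution):
n <- 5
l_d     <- lapply(seq_len(n), function(i) i)          # each individual starts as its own descendant set
l_dtemp <- replicate(n, vector(), simplify = FALSE)   # empty descendant sets for the parent generation

for (child in seq_len(n)) {
  parents <- sample(seq_len(n), 2, replace = TRUE)    # two parents, possibly the same one (prob 1/n)
  for (p in parents) {
    l_dtemp[[p]] <- union(l_dtemp[[p]], l_d[[child]]) # the parent inherits the child's descendants
  }
}
Iterating this generation step until one descendant set contains all n individuals would give the number of generations to a MRCA.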

Calculate running total with condition in R

I want to calculate a running total in the invested-money column. My conditions are: if buy_indicator == BUY and sell_indicator == HOLD, the invested-money value should be the negative of close_price * 100, where 100 is the (constant) share volume; if buy_indicator == HOLD and sell_indicator == SELL, it should be the positive of close_price * 100; and if buy_indicator == HOLD and sell_indicator == HOLD, it should carry the previous row's value.
My dataset has buy_indicator, sell_indicator and close_price columns (it was shown as a screenshot in the original post).
You can use ifelse to generate a column of +1 or -1, which you can then multiply by 100 * close price. Example (assuming the indicators are stored as character strings):
positiveorneg <- ifelse(buy_indicator == "BUY" & sell_indicator == "HOLD", -1, 1)
moneytoinvest <- positiveorneg * 100 * closeprice
You can then use cumsum to get the hopefully positively trended line of your money.
mymoney <- cumsum(moneytoinvest)
Don't spend it all in one place.
EDIT: if you have more than one condition you can nest ifelse statements (a 0 contributes nothing to cumsum, so it carries the previous row's running total forward):
ifelse(buy_indicator == "BUY" & sell_indicator == "HOLD", -1,
       ifelse(buy_indicator == "HOLD" & sell_indicator == "SELL", 1, 0))
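Putting it together on some made-up data (column names follow the question; BUY/HOLD/SELL are assumed to be strings):
trades <- data.frame(
  buy_indicator  = c("BUY",  "HOLD", "HOLD", "BUY",  "HOLD"),
  sell_indicator = c("HOLD", "HOLD", "SELL", "HOLD", "SELL"),
  close_price    = c(10, 11, 12, 9, 10)
)
volume <- 100  # constant share volume

flow <- with(trades, ifelse(buy_indicator == "BUY" & sell_indicator == "HOLD", -1,
             ifelse(buy_indicator == "HOLD" & sell_indicator == "SELL", 1, 0)))
trades$invested <- cumsum(flow * volume * trades$close_price)
trades$invested
#[1] -1000 -1000   200  -700   300
The HOLD/HOLD rows contribute 0 to the cumulative sum, so they keep the previous value, as required.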

reading/writing data frame to google sheets using pygsheets

What is the correct program flow to write different-sized data frames to the same worksheet while ensuring only the most recently written data values are visible?
Here was my original sequence:
gc = pygsheets.authorize(outh_file=oauth_file)
sh = gc.open(sheet_name)
wks = sh.worksheet_by_title(wks_name)
wks.set_dataframe(df, (1, 1))
The problem with the above sequence is that if the 1st write was 3800 rows x 12 cols and the 2nd write was 2400 rows x 12 cols, the wks would still show data from the prior write in the rows beyond 2400.
My 2nd solution (basically a hack just to get it to work for me):
gc = pygsheets.authorize(outh_file=oauth_file)
sh = gc.open(spreadsheet_name)
wks = sh.worksheet_by_title(sheet_name)
sh.del_worksheet(wks)
sh.add_worksheet(sheet_name, rows=len(df) + 1, cols=len(df.columns))
wks = sh.worksheet_by_title(sheet_name)
wks.set_dataframe(df, (1, 1))
The above sequence basically does what I want, but I do not like having to delete the wks (I lose all my manual formatting). I know there must be a correct way to accomplish this, but I do not know the pygsheets API very well.
Would a more advanced pygsheets user please advise on the proper program flow and methods to use?
TIA,
--Rj
fit=True will basically resize the sheet to fit your data frame. So if you want to keep the sheet at the same size, you can clear the sheet before the next write; that would be easier than your second solution. Also, if you just want to clear the range you had written earlier, you can pass a range to the clear function.
wks.set_dataframe(df, (1, 1))  # first write
wks.clear()                    # wipe the old values before the next write
wks.set_dataframe(df, (1, 1))  # subsequent write; no stale rows remain

Summarized huge data: how to handle it with R?

I am working on EBS Forex market limit order book (LOB) data; here is an example of the LOB in a 100-millisecond time slice:
datetime | side (0=Bid, 1=Ask) | distance (1: best price, 2: 2nd best, etc.) | price
2008/01/28,09:11:28.000,0,1,1.6066
2008/01/28,09:11:28.000,0,2,1.6065
2008/01/28,09:11:28.000,0,3,1.6064
2008/01/28,09:11:28.000,0,4,1.6063
2008/01/28,09:11:28.000,0,5,1.6062
2008/01/28,09:11:28.000,1,1,1.6067
2008/01/28,09:11:28.000,1,2,1.6068
2008/01/28,09:11:28.000,1,3,1.6069
2008/01/28,09:11:28.000,1,4,1.6070
2008/01/28,09:11:28.000,1,5,1.6071
2008/01/28,09:11:28.500,0,1,1.6065 (I skip the rest)
To summarize the data, they apply two rules (I have changed them a bit for simplicity):
If there is no change in the LOB on the Bid or Ask side, they do not record that side. Look at the last line of the data: the millisecond field was 000 and is now 500, which means there was no change to the LOB on either side at 100, 200, 300 and 400 milliseconds (but that information is important for any calculation).
When the last price (only the last) is removed from a given side of the order book, a single record is written with nothing in the price field. Again, there will be no record for the whole LOB at that time. Example: 2008/01/28,09:11:28.800,0,1,
I want to calculate minAsk - maxBid (1.6067 - 1.6066) or a weighted average price (using the sizes at all distances as weights; there is a size column in my real data), and I want to do this for my whole data set. But as you can see, the data has been summarized, and this is not routine. I have written code to reproduce the whole data (not just the summary); this is fine for a small data set, but for a large one I end up creating a huge file. I was wondering if you have any tips on how to handle the data, i.e. how to fill the gaps efficiently.
You did not give a great reproducible example, so this will be pseudo/untested code. Read the docs carefully and make adjustments as needed.
I'd suggest you first filter and split your data into two data.frames:
best.bid <- subset(data, side == 0 & distance == 1)
best.ask <- subset(data, side == 1 & distance == 1)
Then, for each of these two data.frames, use findInterval to compute the corresponding best ask or best bid:
best.bid$ask <- best.ask$price[findInterval(best.bid$time, best.ask$time)]
best.ask$bid <- best.bid$price[findInterval(best.ask$time, best.bid$time)]
(for this to work you might have to transform date/time into a linear measure, e.g. time in seconds since market opening.)
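To make the findInterval step concrete, here is a tiny sketch with toy data (times already converted to seconds since opening):
best.bid <- data.frame(time = c(0.0, 0.5, 1.0), price = c(1.6066, 1.6065, 1.6066))
best.ask <- data.frame(time = c(0.0, 0.7),      price = c(1.6067, 1.6068))

# findInterval returns, for each bid time, the index of the latest ask time <= it,
# i.e. the ask quote that was in force when that bid was recorded
best.bid$ask <- best.ask$price[findInterval(best.bid$time, best.ask$time)]
best.bid$ask - best.bid$price
#[1] 1e-04 2e-04 2e-04
Because each side only looks up the other side's last known quote, the gaps in the summarized feed never need to be filled in explicitly.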
Then it should be easy:
min.spread <- min(c(best.bid$ask - best.bid$price,
                    best.ask$bid - best.ask$price))
I'm not sure I understand the end-of-day particularity, but I bet you could just compute the spread at market close and add it to the final min call.
For the weighted average prices, use the same idea but instead of the two best.bid and best.ask data.frames, you should start with two weighted.avg.bid and weighted.avg.ask data.frames.

R Accumulate equity data - add time and price

I have some data formatted as below. I have done some analysis on this and would like to be able to plot the price development in the same graph as the analyzed data.
This requires the data to share the same x-axis.
So I would like to aggregate the "shares" column into increments of, say, 150 shares, and attach the "finalprice" and "time" to each increment.
The aggregation should carry the latest time and price, so if an increment spans two or more rows of data, the last row should provide the price and time.
My question is how to create a new vector with 150 shares per row.
The length of the vector will equal sum(shares)/150.
Is there an easy way to do this? Thanks in advance.
Edit:
I thought about expanding the observations using rep(finalprice, shares) and then getting each 150th value of the expanded vector.
Data sample:
"date","ord","shares","finalprice","time","stock"
20120702,E,2000,99.35,540.84753333,500
20120702,E,28000,99.35,540.84753333,500
20120702,E,50,99.5,542.03073333,500
20120702,E,13874,99.5,542.29411667,500
20120702,E,292,99.5,542.30191667,500
20120702,E,784,99.5,542.30193333,500
20120702,E,13300,99.35,543.04805,500
20120702,E,16658,99.35,543.04805,500
20120702,E,42,99.5,543.04805,500
20120702,E,400,99.4,546.17173333,500
20120702,E,100,99.4,547.07,500
20120702,E,2219,99.3,549.47988333,500
20120702,E,781,99.3,549.5238,500
20120702,E,50,99.3,553.4052,500
20120702,E,1500,99.35,559.86275,500
20120702,E,103,99.5,567.56726667,500
20120702,E,1105,99.7,573.93326667,500
20120702,E,4100,99.5,582.2657,500
20120702,E,900,99.5,582.2657,500
20120702,E,1024,99.45,582.43891667,500
20120702,E,8214,99.45,582.43891667,500
20120702,E,10762,99.45,582.43895,500
20120702,E,1250,99.6,586.86446667,500
20120702,E,5000,99.45,594.39061667,500
20120702,E,20000,99.45,594.39061667,500
20120702,E,15000,99.45,594.39061667,500
20120702,E,4000,99.45,601.34491667,500
20120702,E,8700,99.45,603.53608333,500
20120702,E,3290,99.6,609.23213333,500
I think I got it solved:
expand <- rep(finalprice, shares)  # repeat each price once per share traded
Increment <- expand[seq(from = 1, to = length(expand), by = 150)]  # one price per 150 shares
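The same expand-and-slice idea carries the time column along (a sketch, assuming time is a vector aligned with shares). Note that from = 150 picks the trade that completes each 150-share bucket, which matches the requirement that the last row provide the price and time:
idx <- seq(from = 150, to = sum(shares), by = 150)  # position of each bucket's final share
buckets <- data.frame(price = rep(finalprice, shares)[idx],
                      time  = rep(time, shares)[idx])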
