Simple Data manipulation task in R


Let's say I have a data frame like the following sample data.
qty_available <- c(13500, 8500, 4600)
supply_qty <- c(0, 1000, 0)
forecast <- c(1200, 400, 3000)
demand_q <- c(100, 800, 6000)
df <- data.frame(qty_available, supply_qty, forecast, demand_q)
I am attempting to do the following manipulation: I want qty_available to equal previous qty_available + supply qty - forecast - demand quantity. I can ignore the first observation because it is irrelevant in the context of my task.
So in the second observation, we would have 13,500 + 1000 -400 -800 giving us 13,300. The third observation would then be the 13,300 + 0 - 3000 -6000 giving us 4300.
I have attempted this with dplyr as follows, but it doesn't work, because I don't think the computed values "flow through" to the next row:
df<- mutate(df, qty_available = lag(qty_available) + supply_qty - forecast - demand_q)
I am trying to work this so that the answer ends up becoming 4300 for the third observation.
I am mimicking a process in Excel through R in which the correct value is 4300. I just can't figure out how to mimic that process in R.
How would I go about doing this in R? Any help is greatly appreciated. I'm sure it's fairly simple, but I just can't seem to figure it out.

I think your code gives -500 for the third observation, because lag() reads the original second qty_available (8500) instead of the recomputed 13,300.
So it computes 8,500 + 0 - 3000 - 6000 = -500.
If the second qty_available were the recomputed value (13,300), we would get the expected 4300.
The lag function takes the previous value from the original column, so the newly computed values never carry forward to the next row.
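One way to make the values carry forward is a cumulative sum of the per-row net change, added to the starting quantity. This is a minimal base R sketch using the sample data from the question; the name net_change is introduced here for illustration, and it assumes the first row should keep its original quantity.

```r
# Sample data from the question
qty_available <- c(13500, 8500, 4600)
supply_qty <- c(0, 1000, 0)
forecast <- c(1200, 400, 3000)
demand_q <- c(100, 800, 6000)
df <- data.frame(qty_available, supply_qty, forecast, demand_q)

# Net change contributed by each row (illustrative helper name)
net_change <- df$supply_qty - df$forecast - df$demand_q

# Assumption: the first row keeps its starting quantity unchanged
net_change[1] <- 0

# Each row = starting quantity + all net changes up to and including that row
df$qty_available <- df$qty_available[1] + cumsum(net_change)

df$qty_available
# [1] 13500 13300  4300
```

Because every row is built from the running total rather than from the original column, the recomputed 13,300 feeds into the third row, as in the Excel version.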

Related

How to create a "dynamic" column in R?

I'm coding a portfolio analysis tool based off back-tests. In short, I want to add a column that starts at X value which will be the initial capital plus the result of the first trade, and have the rest of the values updated from the % change of each trade, but I haven't sorted out a way to put that logic into code. The following code is a simplified example.
profit <- c(10, 15, -5, -6, 20)
change <- profit / 1000
balance <- c(1010, 1025, 1020, 1014, 1036)
data <- data.frame(profit, change, balance)
So far, the only approach I can think of is to create a separate vector that increases or decreases based on the change column, but I'm not sure how to make it take the previous value into account. Doing balance = start_capital * (1 + change) gives the proportional increase relative to the same initial value every time, not relative to the previous balance (I hope I explained myself).
Thanks,
Fernando.
EDIT
In the actual program I have the correct change value, because each back-test updates the balance with the result of each new trade. However, my code combines several back-tests, and since the balance is updated per back-test rather than for the combined set, it is not usable once everything is combined. That's why I added the change column.

If you want to do this via the change column, you can use Reduce:
start_capital <- 1000
Reduce(function(x, y) x + x*y, data$change, init = start_capital, accumulate = TRUE)[-1]
#[1] 1010.000 1025.150 1020.024 1013.904 1034.182
Reduce with accumulate = TRUE returns every intermediate result, feeding the output of each iteration in as the input to the next one. The trailing [-1] drops the initial value (start_capital) from the result.
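Since each step multiplies the previous balance by (1 + change), the same compounding can also be written with cumprod. The sketch below is an alternative to the Reduce call, not a replacement for it; via_reduce is a name introduced here only to compare the two results.

```r
profit <- c(10, 15, -5, -6, 20)
change <- profit / 1000
start_capital <- 1000

# Compounded balance: start_capital * (1 + c1) * (1 + c2) * ...
balance <- start_capital * cumprod(1 + change)

# Same calculation with Reduce, kept only for comparison
via_reduce <- Reduce(function(x, y) x + x * y, change,
                     init = start_capital, accumulate = TRUE)[-1]

all.equal(balance, via_reduce)
# [1] TRUE
```

cumprod is vectorised, so it is usually the shorter option when the update rule is a pure multiplication; Reduce remains useful when the per-step rule is more involved.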

Calculate running total with condition in R

I want to calculate a running total in the invested money column. My conditions are: if buy_indicator == BUY and sell_indicator == HOLD, the invested money value should be the negative of close_price*100, where 100 is the volume of shares, which is constant. Else, if buy_indicator == HOLD and sell_indicator == SELL, the invested money value should be the positive of close_price*100. Otherwise, if buy_indicator == HOLD and sell_indicator == HOLD, it should contain the previous row's value.
My dataset looks like this:
(image not included)

You can use ifelse to generate a column of +1 or -1, which you can then multiply by 100 * close_price. Example:
positiveorneg <- ifelse(buy_indicator == "BUY" & sell_indicator == "HOLD", -1, 1)
moneytoinvest <- positiveorneg * 100 * close_price
You can then use cumsum to get the hopefully positively trended line of your money.
mymoney <- cumsum(moneytoinvest)
Don't spend it all in one place.
EDIT: if you have more than one condition you can nest ifelse statements, using 0 for HOLD/HOLD rows so that the running total keeps the previous row's value:
positiveorneg <- ifelse(buy_indicator == "BUY" & sell_indicator == "HOLD", -1, ifelse(buy_indicator == "HOLD" & sell_indicator == "SELL", 1, 0))
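As a quick check of the nested ifelse plus cumsum idea, here is a small self-contained sketch. The data frame trades and its values are made up for illustration (the original dataset was only shown as an image), and the column names follow the question's wording.

```r
# Hypothetical data; the real dataset was not available
trades <- data.frame(
  buy_indicator  = c("BUY", "HOLD", "HOLD", "BUY", "HOLD"),
  sell_indicator = c("HOLD", "HOLD", "SELL", "HOLD", "SELL"),
  close_price    = c(10, 11, 12, 9, 13),
  stringsAsFactors = FALSE
)
volume <- 100  # constant number of shares, as stated in the question

# -1 for a buy, +1 for a sell, 0 when both indicators say HOLD
direction <- with(trades,
  ifelse(buy_indicator == "BUY" & sell_indicator == "HOLD", -1,
  ifelse(buy_indicator == "HOLD" & sell_indicator == "SELL", 1, 0)))

# A 0 cash flow leaves the running total at the previous row's value
trades$invested_money <- cumsum(direction * volume * trades$close_price)

trades$invested_money
# [1] -1000 -1000   200  -700   600
```

The second row is a HOLD/HOLD row, and its running total simply repeats the -1000 from the row above.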

Increasing Bootstrap size in R

I have a time series of returns which is approximately 20 years long. Based on this time series, I want to compute some kind of moving bootstrap to calculate the mean return for every observation.
Let me demonstrate this with an example:
Let's say we have information starting at 01.01.1990 and I want to compute the means with bootstrap starting at 01.01.1991.
At 01.01.1991 I want to compute the mean based on the returns between 01.01.1990 and 01.01.1991.
Then, on 02.01.1991 I also want to take into account the return of 02.01.1991 and therefore want to calculate the mean with bootstrap based on the returns from 01.01.1990 to 02.01.1991.
To sum up, the data for my bootstrap should grow by one observation at each step through the time series.
I hope that you can understand what I am trying to say.
I would appreciate any help.
Cheers
Sven

So I managed to answer the question by myself.
Let's say we want to get the means calculated with bootstrap starting at 01.01.1991, which is the 300th observation in our sample.
(Overall we have 1000 observations in our time series.)
Then the code is the following:
# placeholder vector; entries before the 300th observation stay NA
h <- rep(NA_real_, 1000)
for (i in 300:1000) {
  # resample 5000 values with replacement from all returns up to observation i
  h[i] <- mean(sample(rawdata$retoil[1:i], 5000, replace = TRUE))
}
The first 299 entries of h are NA and can be deleted in the end.
Hope I could help some of you :)
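To try the expanding-window loop without the original data, the sketch below simulates a return series. Everything about the simulated rawdata (the normal distribution, its mean and standard deviation, the seed) is an assumption made only so the example runs; first_obs and n_draws are illustrative names for the 300 and 5000 used above.

```r
set.seed(42)  # only to make the simulated example reproducible

# Simulated stand-in for the real return series (assumption)
rawdata <- data.frame(retoil = rnorm(1000, mean = 0.0005, sd = 0.02))

first_obs <- 300   # first observation that gets a bootstrap mean
n_draws   <- 5000  # size of each resample

# NA placeholders for observations before first_obs
h <- rep(NA_real_, nrow(rawdata))
for (i in first_obs:nrow(rawdata)) {
  window <- rawdata$retoil[1:i]  # all returns up to and including i
  h[i] <- mean(sample(window, n_draws, replace = TRUE))
}

# Keep only the observations that were actually computed
boot_means <- h[first_obs:length(h)]
length(boot_means)
# [1] 701
```

Note that each h[i] here is the mean of a single resample. Averaging over many resamples per window would be a natural extension, but that goes beyond what the original answer does.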

Summarized huge data, How to handle it with R?

I am working on EBS, Forex market Limit Order Book(LOB): here is an example of LOB in a 100 millisecond time slice:
datetime|side(0=Bid,1=Ask)| distance(1:best price, 2: 2nd best, etc.)| price
2008/01/28,09:11:28.000,0,1,1.6066
2008/01/28,09:11:28.000,0,2,1.6065
2008/01/28,09:11:28.000,0,3,1.6064
2008/01/28,09:11:28.000,0,4,1.6063
2008/01/28,09:11:28.000,0,5,1.6062
2008/01/28,09:11:28.000,1,1,1.6067
2008/01/28,09:11:28.000,1,2,1.6068
2008/01/28,09:11:28.000,1,3,1.6069
2008/01/28,09:11:28.000,1,4,1.6070
2008/01/28,09:11:28.000,1,5,1.6071
2008/01/28,09:11:28.500,0,1,1.6065 (I skip the rest)
To summarize the data, they apply two rules (I have changed them a bit for simplicity):
If there is no change in the LOB on the Bid or Ask side, that side is not recorded. Look at the last line of the data: the millisecond value was 000 and is now 500, which means there was no change in the LOB on either side at 100, 200, 300 and 400 milliseconds (but that information is still needed for any calculation).
When the last price (only the last) is removed from a given side of the order book, a single record is written with nothing in the price field. Again, there will be no record for the whole LOB at that time.
Example:2008/01/28,09:11:28.800,0,1,
I want to calculate minAsk-maxBid (1.6067-1.6066) or a weighted average price (using the sizes at all distances as weights; there is a size column in my real data). I want to do this for my whole data set. But as you can see, the data has been summarized, so this is not routine. I have written code to reproduce the whole data (not just the summary). This is fine for a small data set, but for a large one it creates a huge file. Do you have any tips on how to handle the data, and how to fill the gaps efficiently?

You did not give a great reproducible example so this will be pseudo/untested code. Read the docs carefully and make adjustments as needed.
I'd suggest you first filter and split your data into two data.frames:
best.bid <- subset(data, side == 0 & distance == 1)
best.ask <- subset(data, side == 1 & distance == 1)
Then, for each of these two data.frames, use findInterval to compute the corresponding best ask or best bid:
best.bid$ask <- best.ask$price[findInterval(best.bid$time, best.ask$time)]
best.ask$bid <- best.bid$price[findInterval(best.ask$time, best.bid$time)]
(for this to work you might have to transform date/time into a linear measure, e.g. time in seconds since market opening.)
Then it should be easy:
min.spread <- min(c(best.bid$ask - best.bid$price,
                    best.ask$price - best.ask$bid))
I'm not sure I understand the end of day particularity but I bet you could just compute the spread at market close and add it to the final min call.
For the weighted average prices, use the same idea but instead of the two best.bid and best.ask data.frames, you should start with two weighted.avg.bid and weighted.avg.ask data.frames.
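A small runnable sketch of the findInterval idea follows. The two data frames hold made-up best-bid and best-ask updates, with time already converted to seconds since the start of the slice; the helper last_known is a name introduced here, not part of any library. findInterval returns 0 when no earlier record exists, so those positions are turned into NA before indexing.

```r
# Hypothetical best-price updates; time is in seconds (assumption)
best.bid <- data.frame(time = c(0.0, 0.5, 0.9),
                       price = c(1.6066, 1.6065, 1.6066))
best.ask <- data.frame(time = c(0.0, 0.7),
                       price = c(1.6067, 1.6068))

# Most recent price in `ref` at or before each time in `t`
last_known <- function(t, ref) {
  idx <- findInterval(t, ref$time)  # ref$time must be sorted ascending
  idx[idx == 0] <- NA               # no earlier record on that side
  ref$price[idx]
}

best.bid$ask <- last_known(best.bid$time, best.ask)
best.ask$bid <- last_known(best.ask$time, best.bid)

# Spread is always ask minus bid, whichever side triggered the update
spread <- c(best.bid$ask - best.bid$price,
            best.ask$price - best.ask$bid)
min.spread <- min(spread, na.rm = TRUE)
```

This avoids materialising a row for every 100 millisecond step: the gaps are filled implicitly by looking up the last known price on the other side.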

R Accumulate equity data - add time and price

I have some data formatted as below. I have done some analysis on this and would like to be able to plot the price development in the same graph as the analyzed data.
This requires me to have the same x-axes for the data.
So I would like to aggregate the "shares" column in say 150 increments, and add the "finalprice" and "time" to this.
The aggregation should include the latest time and price, so if the aggregation needs to occur over two or more rows of data then the last row should provide the price and time data.
My question is how to create a new vector with 150 shares per row.
The length of the vector will equal sum(shares)/150.
Is there an easy way to do this? Thanks in advance.
Edit:
I thought about expanding the observations using rep(finalprice, shares) and then getting each 150th value of the expanded vector.
Data sample:
"date","ord","shares","finalprice","time","stock"
20120702,E,2000,99.35,540.84753333,500
20120702,E,28000,99.35,540.84753333,500
20120702,E,50,99.5,542.03073333,500
20120702,E,13874,99.5,542.29411667,500
20120702,E,292,99.5,542.30191667,500
20120702,E,784,99.5,542.30193333,500
20120702,E,13300,99.35,543.04805,500
20120702,E,16658,99.35,543.04805,500
20120702,E,42,99.5,543.04805,500
20120702,E,400,99.4,546.17173333,500
20120702,E,100,99.4,547.07,500
20120702,E,2219,99.3,549.47988333,500
20120702,E,781,99.3,549.5238,500
20120702,E,50,99.3,553.4052,500
20120702,E,1500,99.35,559.86275,500
20120702,E,103,99.5,567.56726667,500
20120702,E,1105,99.7,573.93326667,500
20120702,E,4100,99.5,582.2657,500
20120702,E,900,99.5,582.2657,500
20120702,E,1024,99.45,582.43891667,500
20120702,E,8214,99.45,582.43891667,500
20120702,E,10762,99.45,582.43895,500
20120702,E,1250,99.6,586.86446667,500
20120702,E,5000,99.45,594.39061667,500
20120702,E,20000,99.45,594.39061667,500
20120702,E,15000,99.45,594.39061667,500
20120702,E,4000,99.45,601.34491667,500
20120702,E,8700,99.45,603.53608333,500
20120702,E,3290,99.6,609.23213333,500

I think I got it solved.
expand <- rep(finalprice, shares)
Increment <- expand[seq(from = 1, to = length(expand), by = 150)]
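A variant of the same expansion idea, sketched on made-up data: instead of expanding the prices, it expands row indices, so both finalprice and time can be looked up from one index vector. It also samples at the last share of each 150-share block (positions 150, 300, ...) rather than the first, on the assumption that the "latest time and price" requirement in the question means the row that completes the block. The names trades, row_of_share, block_ends and increments are illustrative.

```r
# Hypothetical trades; column names follow the data sample in the question
trades <- data.frame(shares     = c(100, 200, 150),
                     finalprice = c(10, 11, 12),
                     time       = c(1, 2, 3))
block <- 150

# One entry per share, holding the row that share came from
row_of_share <- rep(seq_len(nrow(trades)), trades$shares)

# Position of the last share in each complete 150-share block
block_ends <- seq(from = block, to = length(row_of_share), by = block)

# Price and time of the row that completes each block
increments <- trades[row_of_share[block_ends], c("finalprice", "time")]
rownames(increments) <- NULL

increments$finalprice
# [1] 11 11 12
```

Any shares left over after the last complete block are ignored here; whether to keep a partial final block is a choice the question leaves open.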
