Calculate running total with condition in R

I want to calculate a running total in the invested-money column. My conditions: if buy_indicator == BUY and sell_indicator == HOLD, the invested-money value should be the negative of close_price * 100, where 100 is the (constant) share volume. Else, if buy_indicator == HOLD and sell_indicator == SELL, the invested-money value should be the positive of close_price * 100. Else, if buy_indicator == HOLD and sell_indicator == HOLD, it should carry over the previous row's value.
My dataset looks like this (screenshot in the original post).

You can use ifelse to generate a column of +1 or -1, which you can then multiply by 100 * close price. For example:
positiveorneg <- ifelse(buy_indicator == "BUY" & sell_indicator == "HOLD", -1, 1)
moneytoinvest <- positiveorneg * 100 * closeprice
You can then use cumsum to get the hopefully positively trended line of your money.
mymoney <- cumsum(moneytoinvest)
Don't spend it all in one place.
EDIT: if you have more than one condition you can nest ifelse statements:
ifelse(buy_indicator == "BUY" & sell_indicator == "HOLD", -1,
       ifelse(buy_indicator == "HOLD" & sell_indicator == "SELL", 1, 0))
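Putting the pieces together on made-up data (the column names close_price, buy_indicator, and sell_indicator are assumed from the question, since the screenshot isn't reproduced here): the HOLD/HOLD case contributes a cash flow of 0, so cumsum carries the previous running total forward on its own.

```r
# Toy data standing in for the poster's dataset
df <- data.frame(
  close_price    = c(10, 12, 11, 13, 9),
  buy_indicator  = c("BUY", "HOLD", "HOLD", "BUY", "HOLD"),
  sell_indicator = c("HOLD", "HOLD", "SELL", "HOLD", "SELL")
)
volume <- 100  # constant number of shares

# -1 for a buy, +1 for a sell, 0 when both indicators are HOLD
direction <- with(df, ifelse(buy_indicator == "BUY" & sell_indicator == "HOLD", -1,
                      ifelse(buy_indicator == "HOLD" & sell_indicator == "SELL", 1, 0)))

# A zero cash flow leaves the running total unchanged,
# i.e. the HOLD/HOLD row repeats the previous row's value
df$invested <- cumsum(direction * volume * df$close_price)
df$invested
# -1000 -1000   100 -1200  -300
```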

Related

Finding the percentage of a specific value in the column of a data set

I have a dataset called college, and one of the columns is 'accepted'. There are two values in this column: 1 (the student was accepted) and 0 (the student was not accepted). I want to find the percentage of accepted students.
I did this...
table(college$accepted)
which gave me the frequencies of 1 and 0 (1 = 44,224 and 0 = 75,166). I then manually added those two values together (119,390) and divided 44,224 / 119,390. This is fine and gets me the value I was looking for, but I would really like to know how to do this with R code, since I'm sure there is a way I just haven't thought of.
Thanks!
Perhaps you can use prop.table like below
prop.table(table(college$accepted))["1"]
If it's a simple 0/1 column, then you only need to take the column mean.
mean_accepted <- mean(college$accepted)
You could first sum the column, then divide by the number of values in the column:
sum(college$accepted)/length(college$accepted)
To make the code more explicit and describe your intent better, I suggest using a condition to identify the cases that meet your criteria for inclusion. For example:
college$accepted == 1
Then take the mean of the logical vector to compute the proportion (between 0 and 1), and multiply by 100 to make it a percentage.
100 * mean(college$accepted == 1, na.rm = TRUE)
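A quick sketch on made-up data showing that the approaches above agree (the college dataset itself isn't reproduced here):

```r
# Toy 0/1 column standing in for college$accepted
accepted <- c(1, 0, 1, 1, 0, 1, 0, 1)

prop.table(table(accepted))["1"]   # named proportion of 1's
mean(accepted)                     # works because the column is 0/1
sum(accepted) / length(accepted)   # explicit sum / count
# each gives 0.625, i.e. 62.5%
```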

Higher/lower than column in R

I have a data frame with a column called WTP, filled with numbers from 5-30. I now want to add some new columns with price steps of 5 (i.e., buys at 5, buys at 10, ..., buys at 30). My goal is to fill these columns with a 1 if the WTP is higher than the price and a 0 if the WTP is lower than the price. What kind of function do I need to do this?
Thanks in advance!
df$buysat10 <- ifelse(df$WTP >= 10, 1, 0)
And replace with buysatX and >= X for the other cutoffs
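Rather than writing one line per cutoff, the same pattern can be looped over all price steps (a sketch with toy WTP values; the data frame and column name follow the question):

```r
# Toy data frame with a WTP column
df <- data.frame(WTP = c(5, 12, 23, 30, 8))

# One indicator column per price step of 5
for (price in seq(5, 30, by = 5)) {
  df[[paste0("buysat", price)]] <- ifelse(df$WTP >= price, 1, 0)
}
df
```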

How to create a "dynamic" column in R?

I'm coding a portfolio analysis tool based off back-tests. In short, I want to add a column that starts at some value X (the initial capital plus the result of the first trade) and has the rest of its values updated from the % change of each trade, but I haven't worked out a way to put that logic into code. The following code is a simplified example.
profit <- c(10, 15, -5, -6, 20)
change <- profit / 1000
balance <- c(1010, 1025, 1020, 1014, 1036)
data <- data.frame(profit, change, balance)
So far, the only way I can think of is to create a separate vector that increases or decreases based on the change column, but I'm not sure how to make it take the previous value into account. Doing balance = start_capital * (1 + change) gives the proportional increase relative to the same initial value every time, not the previous value plus the change of the new trade (I hope I explained myself).
Thanks,
Fernando.
EDIT
I have the correct change value in the actual program, as each back-test updates the balance with the result of each new trade, so the change column on the real data is correct. But my code combines several back-tests, and since the balance update is per separate back-test rather than for the combination, it isn't usable when combining everything; that's why I added the change column.
If you want to do this via the change column, you can use Reduce:
start_capital <- 1000
Reduce(function(x, y) x + x*y, data$change, init = start_capital, accumulate = TRUE)[-1]
#[1] 1010.000 1025.150 1020.024 1013.904 1034.182
Reduce with accumulate = TRUE gives the output in a cumulative form taking the output of the current iteration as input to the next one.
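As a base-R alternative (a sketch on the same toy data): since each step multiplies the previous balance by (1 + change), the running balance is just a cumulative product.

```r
profit <- c(10, 15, -5, -6, 20)
change <- profit / 1000
start_capital <- 1000

# Each trade scales the previous balance by (1 + change)
balance <- start_capital * cumprod(1 + change)
round(balance, 3)
# 1010.000 1025.150 1020.024 1013.904 1034.182
```

This gives the same result as the Reduce call above.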

Counting consecutive repeats, and returning the maximum value in each string of repeats if over a threshold

I am working with long strings of repeating 1's and 0's representing the presence of a phenomenon as a function of depth. If this phenomenon is flagged for over 1 m, it is deemed significant enough to use for further analyses; if not, it could be due to experimental error.
I ultimately need to get a total thickness displaying this phenomenon at each location (if over 1m).
In a dummy data set the input and expected output would look like this:
#Depth from 0m to 10m with 0.5m readings
depth <- seq(0, 10, 0.5)
#Phenomenon found = 1, not = 0
phenomflag <- c(1,0,1,1,1,1,0,0,1,0,1,0,1,0,1,1,1,1,1,0)
What I would like as output is a vector containing 4 and 5 (which get converted back to 2 m and 2.5 m).
I have attempted to solve this problem using
y <- rle(phenomflag)
z <- y$length[y$values ==1]
but once I have my count, I have no idea how to:
a) Isolate 1 maximum number from each group of consecutive repeats.
b) Restrict to consecutive strings longer than (x) - this might be easier after a.
Thanks in advance.
count posted a good solution in the comments section:
y <- rle(phenomflag)
x <- cbind(y$lengths, y$values)
x[which(x[, 1] >= 3 & x[, 2] == 1)]
This returns just the lengths of the runs of 1's that meet the threshold (here, runs of at least 3 consecutive readings).
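The same idea can be written directly against the rle output, using the dummy data from the question (threshold of 3 readings chosen to reproduce the expected output of 4 and 5):

```r
# Dummy data from the question: 0.5 m readings from 0 m to 10 m
depth <- seq(0, 10, 0.5)
phenomflag <- c(1,0,1,1,1,1,0,0,1,0,1,0,1,0,1,1,1,1,1,0)

y <- rle(phenomflag)

# Keep only runs of 1's at least 3 readings long
runs <- y$lengths[y$values == 1 & y$lengths >= 3]
runs        # 4 5
runs * 0.5  # thickness in metres: 2.0 2.5
```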

Percentile from bins of distributions

I need to find the "highest bin for 90% of samples".
I have a table like this:
my_table <- data.frame(matrix(c(122,68,2,0,30,0,0,0,5,79,23,9000), byrow=TRUE, ncol=4))
names(my_table) <- c("0-10","11-20","21-30","31-5000")
Where the bin-headers indicate minutes (time).
For the first row, 90% of samples fall in intervals lower than or equal to "11-20"; i.e., 90% of samples have a time shorter than 21 minutes.
For second row it is lower or equal to interval "0-10".
And for third row it is lower or equal to interval "31-5000".
I would like to add a column "90p-interval" where the above intervals are found automatically, resulting in the table like this:
my_table$Perc90 <- c("11-20","0-10","31-5000")
My real table is thousands and thousands of rows long.
If someone can help I'm very grateful, and also thanks to everyone contributing to this fantastic site!
/Chris
apply(my_table, 1, function(x) names(x)[
  max(which(c(0, cumsum(x)) < 0.9 * sum(x)))
])
# [1] "11-20" "0-10" "31-5000"
It's not clear from your example how you want the 90% cutoff to be determined when it's not exact, so I provided a response that matches your example. This makes sure the selected cutoff is at least 90%.
my_table$Perc90 <- apply(my_table, 1, function(x) {
  pct <- cumsum(x) / sum(x)
  return(names(x[pct >= 0.9][1]))
})
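For a table with thousands of rows, a mostly vectorised variant (a sketch using base R's max.col; ties.method = "first" picks the first bin whose cumulative proportion reaches 90%) gives the same result:

```r
my_table <- data.frame(matrix(c(122, 68,  2,    0,
                                 30,  0,  0,    0,
                                  5, 79, 23, 9000),
                              byrow = TRUE, ncol = 4))
names(my_table) <- c("0-10", "11-20", "21-30", "31-5000")

# Row-wise cumulative proportions, as a 0/1 matrix of "reached 90% yet?"
cs  <- t(apply(my_table, 1, cumsum))
hit <- (cs / rowSums(my_table) >= 0.9) * 1

# First column that hits 90% in each row
my_table$Perc90 <- names(my_table)[max.col(hit, ties.method = "first")]
my_table$Perc90
# "11-20" "0-10" "31-5000"
```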
