R: cumulative sum until certain value

I want to calculate how many values are taken until the cumulative sum reaches a certain value.
This is my vector: myvec = seq(0,1,0.1)
I started by coding the cumulative sum function:
cumsum_for <- function(x)
{
y = 1
for(i in 2:length(x)) # pardon the case where x is of length 1 or 0
{x[i] = x[i-1] + x[i]
y = y+1}
return(y)
}
Now, with the limit
cumsum_for <- function(x, limit)
{
y = 1
for(i in 2:length(x)) # pardon the case where x is of length 1 or 0
{x[i] = x[i-1] + x[i]
if(x >= limit) break
y = y+1}
return(y)
}
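which unfortunately does not work (older versions of R give the warnings below; since R 4.2.0 a condition of length > 1 in if() is an error):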
myvec = seq(0,1,0.1)
cumsum_for(myvec, 0.9)
[1] 10
Warning messages:
1: In if (x >= limit) break :
the condition has length > 1 and only the first element will be used
[...]

What about this? You can use cumsum to compute the cumulative sum and then count the number of values that are at or below a certain threshold n:
f <- function(x, n) sum(cumsum(x) <= n)
f(myvec, 4)
#[1] 9
f(myvec, 1.1)
#[1] 5
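If you need the position at which the cumulative sum first reaches the limit (rather than the count of values at or below it), a minimal sketch is to take the first match from which(). The name first_reach is hypothetical; it returns NA when the limit is never reached. Subtracting 1 gives the number of values taken before the limit is hit.

```r
# Hypothetical helper: index of the first element at which the running
# total reaches `limit`; NA if the total never gets there.
first_reach <- function(x, limit) {
  which(cumsum(x) >= limit)[1]
}

myvec <- seq(0, 1, 0.1)
first_reach(myvec, 0.9)   # the 5th value lifts the total from 0.6 to 1.0
# [1] 5
first_reach(myvec, 100)   # the total never reaches 100
# [1] NA
```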

You can put a while loop in a function. This stops further calculation of the cumsum if the limit is reached.
cslim <- function(v, l) {
s <- 0
i <- 0L
while (s < l) {
i <- i + 1L
s <- s + v[i] # running total; avoids re-summing v[1:i] on every pass
}
i - 1L
}
cslim(myvec, .9)
# [1] 4
This is especially useful for long vectors, e.g.
v <- seq(0, 3e7, 0.1)
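For comparison, the loop from the question works once the comparison uses the current running total x[i] instead of the whole vector x (comparing the whole vector is what triggers the "condition has length > 1" message). A minimal sketch that keeps the question's counting convention of starting y at 1:

```r
myvec <- seq(0, 1, 0.1)

cumsum_for <- function(x, limit) {
  y <- 1
  # seq_along(x)[-1] is 2..length(x), and is empty when length(x) < 2
  for (i in seq_along(x)[-1]) {
    x[i] <- x[i - 1] + x[i]   # x[i] now holds the running total
    if (x[i] >= limit) break  # test only the current total, a single value
    y <- y + 1
  }
  y
}

cumsum_for(myvec, 0.9)
# [1] 4
```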

Related

How to stop the counting process for the if loop in my code?

I want to stop the counting process for y after it first meets the condition y < 0 in each iteration i of the for loop. That is, the counting process for x should still continue as long as x > 0 (the condition stated in the while loop).
I tried the condition if (y < 0 & (y - 120*(16/81)*time + (z-2)) > 0 & z > 2). It makes sense to me, but the result doesn't: after I run the code, count_x is around 100 (which makes sense) but count_y is more than 1000, which doesn't make sense since the for loop only runs over 1:1000.
count_x <- 0
count_y <- 0
for (i in 1:1000){
x <- 25
y <- 25
t <- 0
while ((x > 0 | y > 0) & t < 100){
time <- rexp(1,100)
u <- runif(1,0,1)
z <- 4/((1-u)^0.2) - 4
if (z < 2){
x <- x + 110*(65/81)*time - z
} else {
y <- y + 120*(16/81)*time - (z-2)
x <- x + 110*(65/81)*time - 2
}
t <- t + time
if (x < 0){
count_x <- count_x + 1
}
if (y < 0 & (y - 120*(16/81)*time + (z-2)) > 0 & z > 2){
count_y <- count_y + 1
}
}
}
For example, during the first iteration (i = 1), the while loop starts and runs the code inside the loop. What I want is:
When y first becomes negative in that iteration, count_y is increased by one, and from then on the if block that counts y should do nothing. The while loop should still continue, because x might not have reached its first negative value yet (the while condition still holds as long as t is smaller than 100). So the loop should keep going without touching that last inner if block until either t reaches 100 or x first becomes negative. Only once the while condition is no longer satisfied should it move on to the second iteration, i = 2.
The same goes for x: counting for x should stop once x first becomes negative, but the while loop should keep working as long as its condition is still satisfied.
So there will be at most one update each for x and y per iteration (so the maximum for count_y should be 1000, since there are 1000 iterations, although it is unlikely to reach 1000), because counting for both x and y stops once each reaches its first negative value.
I'm thinking of adding a break, but I'm not sure where it should go.
I don't think there's anything wrong with the code. The y counter, to me at least, seems correct. There are two reasons why count_y is larger. Firstly, even though the outer for loop only runs 1000 times, the inner while loop can run more than 100 times in each iteration i, so the counting condition can be met more than once per iteration. Secondly, it appears that y goes below 0 before x does, which is why the x counter is smaller.
I've modified your code to include a counter of while iterations (counted while y is still positive), and two variables that store the values of x and y each time the y counting condition is met.
count_x <- 0
count_y <- 0
while_count <- 0
store_x <- 0
store_y <- 0
a <- 1
b <- 1
for (i in 1:1000){
x <- 25
y <- 25
t <- 0
count_while <- 0
while ((x > 0 | y > 0) & t < 100){
time <- rexp(1,100)
u <- runif(1,0,1)
z <- 4/((1-u)^0.2) - 4
if (z < 2){
x <- x + 110*(65/81)*time - z
} else {
y <- y + 120*(16/81)*time - (z-2)
x <- x + 110*(65/81)*time - 2
}
t <- t + time
if (x < 0){
count_x <- count_x + 1
}
if (y < 0 & (y - 120*(16/81)*time + (z-2)) > 0 & z > 2){
count_y <- count_y + 1
store_y[a] <- y
store_x[a] <- x
a <- a + 1
}
if (y > 0){
count_while <- count_while + 1
}
}
while_count[b] <- count_while
b <- b + 1
}
Checking the averages shows that x is almost exclusively greater than zero the first time y goes below zero. The while loop averages 163 iterations.
mean(store_x)
[1] 38.90333
mean(store_y)
[1] -2.035492
mean(while_count)
[1] 163.052
which(store_x < 0)
[1] 656
So your counter seems correct to me; count_x is low simply because x rarely goes below zero. In fact, if you run the loop for just one iteration by setting for (i in 1:1) and run it several times, you'll find that count_x rarely goes above zero, despite the while loop iterating over 100 times. On the other hand, y goes below zero at least once in each iteration i.
I hope this helps!
We will create variables for both x and y that store and track when each crosses zero in each iteration i. First we check whether x or y is below zero and whether its marker (neg_x for x, neg for y) is still NULL. If so, this is the first negative, and we set x_0 or y_0 to 1. Then, when counting, we check whether x_0 or y_0 equals 1 and the marker is still NULL; if so, we add to the counter and record the first negative by setting the marker. Thereafter, the condition evaluates to FALSE.
count_x <- 0
count_y <- 0
while_x <- 0
while_y <- 0
for (i in 1:1000){
x <- 25
y <- 25
t <- 0
x_0 <- 0
y_0 <- 0
neg <- NULL
neg_x <- NULL
while ((x > 0 | y > 0) & t < 100){
time <- rexp(1,100)
u <- runif(1,0,1)
z <- 4/((1-u)^0.2) - 4
if (z < 2){
x <- x + 110*(65/81)*time - z
if (x < 0 & is.null(neg_x)){
x_0 <- 1
}
} else {
y <- y + 120*(16/81)*time - (z-2)
x <- x + 110*(65/81)*time - 2
if (x < 0 & is.null(neg_x)){
x_0 <- 1
}
if (y < 0 & is.null(neg)) {
y_0 <- 1
}
}
t <- t + time
if (x_0 == 1 & is.null(neg_x)){
count_x <- count_x + 1
neg_x <- "First Negative"
x_0 <- x_0 + 1
}
if (y_0 == 1 & is.null(neg)){
count_y <- count_y + 1
neg <- "first negative"
y_0 <- y_0 + 1
}
}
}
You can also place the counter outside the while loop: if x or y is less than 0 when the while loop ends, add to the counter. This is probably a better solution. Note that the condition is evaluated only once per iteration i, after the while loop has finished, so it counts x or y if it is negative at that point; a value that dipped below zero and then recovered would not be counted.
count_x <- 0
count_y <- 0
for (i in 1:1000){
x <- 25
y <- 25
t <- 0
while ((x > 0 | y > 0) & t < 100){
time <- rexp(1,100)
u <- runif(1,0,1)
z <- 4/((1-u)^0.2) - 4
if (z < 2){
x <- x + 110*(65/81)*time - z
} else {
y <- y + 120*(16/81)*time - (z-2)
x <- x + 110*(65/81)*time - 2
}
t <- t + time
}
if(x < 0) {
count_x <- count_x + 1
}
if(y < 0){
count_y <- count_y + 1
}
}
For me, count_y comes out at 1000 in both cases. This confirms what I said previously: y goes below zero in every iteration i of the for loop.
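The "count only the first negative" idea from the first version above can be written more compactly with one logical flag per variable. This is a sketch, not the answerer's code: the function name simulate_first_negatives and its n_sims/t_max arguments are hypothetical, added so that a short run can be checked. Because each flag is set once, each counter can increase at most once per outer iteration, so neither count can exceed n_sims.

```r
simulate_first_negatives <- function(n_sims = 1000, t_max = 100) {
  count_x <- 0
  count_y <- 0
  for (i in seq_len(n_sims)) {
    x <- 25
    y <- 25
    t <- 0
    x_done <- FALSE  # becomes TRUE once x has gone negative in this run
    y_done <- FALSE  # becomes TRUE once y has gone negative in this run
    while ((x > 0 || y > 0) && t < t_max) {
      time <- rexp(1, 100)
      u <- runif(1, 0, 1)
      z <- 4 / ((1 - u)^0.2) - 4
      if (z < 2) {
        x <- x + 110 * (65 / 81) * time - z
      } else {
        y <- y + 120 * (16 / 81) * time - (z - 2)
        x <- x + 110 * (65 / 81) * time - 2
      }
      t <- t + time
      # each counter moves at most once per run, on the first negative value
      if (!x_done && x < 0) {
        count_x <- count_x + 1
        x_done <- TRUE
      }
      if (!y_done && y < 0) {
        count_y <- count_y + 1
        y_done <- TRUE
      }
    }
  }
  c(count_x = count_x, count_y = count_y)
}

set.seed(1)
res <- simulate_first_negatives(n_sims = 10)
res
```

The while loop itself is left untouched, so it still runs until both values are non-positive or t reaches t_max; only the counting is gated by the flags.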

Terms in loop contains their sum

I would like to write code that generates a 3 x 1 vector y according to the following rule (small numbers are chosen for simplicity):
Here x is a 3 x 1 vector. According to the rule, for an update of y, I need sum of all y’s.
An attempt to code this with an arbitrary x:
x <- c(2,3,1)
y <- c(0,0,0)
for(i in 1:5){
for(j in 1:3){
y[j] <- x[j] + y[j] + sum(y)
}
}
This code is not appropriate because it computes sum(y) term by term.
The inner loop indicates something like this:
y[1] = x[1] + 0 = 2
y[2] = x[2] + 2 = 5
y[3] = x[3] + 2 + 5 = 8
This is not appropriate because the sum(y) term contains one term for y[1], two terms for y[2] and three terms for y[3]. But I think sum(y) should be 2 + 5 + 8 = 15 for each of y[1], y[2] and y[3], according to the rule given above. Moreover, this procedure should be repeated a certain number of times (here 5 times, given by the outer loop). In each pass of the outer loop, sum(y) should be computed only once and used as the sum(y) term for all three iterations of the inner loop (each j).
How should I code this?
You are over-complicating this. Vectorize the inner-loop away:
> x <- c(2,3,1)
> y <- c(0,0,0)
> for(j in 1:5) y <- x + y + sum(y)
> y
[1] 682 687 677
This approach only computes sum(y) once per iteration, which is what you seem to want. As an added benefit, adding vectors in a single operation is much faster than adding them component-wise in a loop.
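If you would rather keep the explicit inner loop, the same result follows from taking a snapshot of sum(y) before the inner loop starts, so every component in one outer pass is updated with the same sum. A minimal sketch:

```r
x <- c(2, 3, 1)
y <- c(0, 0, 0)
for (i in 1:5) {
  s <- sum(y)              # one sum per outer pass, taken before any update
  for (j in seq_along(y)) {
    y[j] <- x[j] + y[j] + s
  }
}
y
# [1] 682 687 677
```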
Maybe this will work
myfun <- function(x, y, i) {
y[i] <- x[i] + sum(y)
if (i < length(x)) {
myfun(x, y, i+1)
} else {
return(y)
}
}
x <- c(2, 3, 1)
y <- rep(0, length(x))
myfun(x, y, 1)
# [1] 2 5 8
x <- c(2, 3, 1, 5)
y <- rep(0, length(x))
myfun(x, y, 1)
# [1] 2 5 8 20

Function for simulation game in R

I have a classic dice simulation problem, which I'm struggling to implement since I'm new to R syntax. The function (which I have called simu) works as follows:
Start with 0 points
Simulate n random draws of three six-sided dice
For each draw:
If sum of three dice >12 --> +1 point
If sum of three dice <6 --> -1 point
Otherwise (ie sum between 6 and 12):
If three dice have same number --> +5 points
Otherwise --> 0 points
Return total # of points obtained at the end of n simulations
Having tried a number of different methods I seem to be pretty close with:
simu <- function(n){
k <- 0
for(i in 1:n) {
a <- sample(y,1,replace=TRUE)
b <- sample(y,1,replace=TRUE)
c <- sample(y,1,replace=TRUE)
if ((a + b + c) > 12) {
k <- k+1
} else if ((a + b + c) < 6) {
k <- k-1
} else if ((a == b) & (b == c)) {
k <- k+5
} else k <- 0
}
return(k)
}
The problem seems to be that I am failing to iterate over new simulations (for a, b, c) for each "i" in the function.
I have commented out the only issue I found: the last else, which always resets k to 0. It should have been k <- k + 0, which does nothing, so the branch can simply be removed.
y <- seq(1,6) # 6-sided dice
simu <- function(n){
k <- 0
for(i in 1:n) {
a <- sample(y,1,replace=TRUE)
b <- sample(y,1,replace=TRUE)
c <- sample(y,1,replace=TRUE)
if ((a + b + c) > 12) {
k <- k+1
} else if ((a + b + c) < 6) {
k <- k-1
} else if ((a == b) & (b == c)) {
k <- k+5
} #else k <- 0
}
return(k)
}
The results look fine:
> simu(1000)
[1] 297
> simu(100)
[1] 38
If you are going to use R, then you should learn to create vectorized operations instead of 'for' loops. Here is a simulation of 1 million rolls of the dice that took less than 1 second to calculate. I am not sure how long the 'for' loop approach would have taken.
n <- 1000000 # trials
start <- proc.time() # time how long it takes
result <- matrix(0L, ncol = 6, nrow = n)
colnames(result) <- c('d1', 'd2', 'd3', 'sum', 'same', 'total')
# roll three dice for each trial
result[, 1:3] <- sample(6L, n * 3L, replace = TRUE)
# compute row sum
result[, 'sum'] <- as.integer(rowSums(result[, 1:3]))
# check for being the same
result[, 'same'] <- result[,1L] == result[, 2L] & result[, 2L] == result[, 3L]
result[, 'total'] <- ifelse(result[, 'sum'] > 12L,
1L,
ifelse(result[, 'sum'] < 6L,
-1L,
ifelse(result[, 'same'] == 1L,
5L,
0L
)
)
)
table(result[, 'total'])
-1 0 1 5
46384 680762 259083 13771
cat("simulation took:", proc.time() - start, '\n')
simulation took: 0.7 0.1 0.8 NA NA
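A shorter vectorized sketch of the same scoring rule draws all dice at once into an n x 3 matrix and sums the per-draw scores; the function name simu_vec is hypothetical. As a sanity check on the numbers above: of the 216 equally likely outcomes, 56 sum to more than 12, 10 sum to less than 6, and 3 (2-2-2, 3-3-3, 4-4-4) are triples with a sum between 6 and 12, so the expected score per draw is (56 - 10 + 5 * 3) / 216 = 61/216, about 0.28. That agrees with simu(1000) returning 297 above.

```r
simu_vec <- function(n) {
  # one row per draw, one column per die
  dice <- matrix(sample(6L, 3L * n, replace = TRUE), ncol = 3)
  s <- rowSums(dice)
  same <- dice[, 1] == dice[, 2] & dice[, 2] == dice[, 3]
  # same precedence as the loop: > 12, then < 6, then a triple, else 0
  score <- ifelse(s > 12, 1L, ifelse(s < 6, -1L, ifelse(same, 5L, 0L)))
  sum(score)
}

set.seed(1)
simu_vec(1000)
```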
I am not sure this is what you need, but you can try something like this:
# Draw the dice - returns a vector of length `dices`
draw <- function(sides = 6, dices = 3){
sample(1:sides, dices, replace = TRUE)
}
# score one draw x: returns -1, 0, 1 or 5
test <- function(x){
(sum(x) > 12)*1 + (sum(x) < 6)*(-1) + (sum(x) >= 6 &
sum(x) <= 12 &
var(x) == 0)*5
}
# simulate n draws of x dices with y sides
simu <- function(sides = 6, dices = 3, n = 100){
sum(replicate(n, test(draw(sides, dices))))
}
# for 1, 2, ..., 12-sided dice, run 100 simulations of 100 draws each (3 dice per draw)
dt <- lapply(1:12, function(side) replicate(100, simu(side, 3, 100)))
# plot the distribution of scores
par(mfrow = c(3,4))
lapply(1:length(dt), function(i) hist(dt[[i]],
main = sprintf("%i sides dice", i),
xlab = "Score"
)
)

Finding `n1` TRUEs wrapped in between two `n2` FALSEs, the whole thing wrapped in between `n3` TRUEs, etc

From a sequence of TRUEs and FALSEs, I wanted to make a function that returns TRUE if there is a run of at least n1 TRUEs somewhere in the sequence. Here is that function:
fun_1 = function(TFvec, n1){
nbT = 0
for (i in seq_along(TFvec)){ # loop over the argument, not a global x
if (TFvec[i]){
nbT = nbT + 1
if (nbT == n1){
return(T)
}
} else {
nbT = 0
}
}
return (F)
}
Test:
x = c(T,F,T,T,F,F,T,T,T,F,F,T,F,F)
fun_1(x,3) # TRUE
fun_1(x,4) # FALSE
Then I needed a function that returns TRUE if, in a given boolean vector, there is a run of at least n1 TRUEs flanked on each side by at least n2 FALSEs. Here is the function:
fun_2 = function(TFvec, n1, n2){
if (n2 == 0){
return(fun_1(TFvec, n1)) # no flanking FALSEs needed
}
nbFB = 0
nbFA = 0
nbT = 0
solution = -1
last = F
for (i in 1:length(TFvec)){
if(TFvec[i]){
if (!last) solution = -1 # a new run of TRUEs starts: drop any earlier candidate
nbT = nbT + 1
if (nbT == n1 & nbFB >= n2){
solution = i-n1+1
}
last = T
} else {
if (last){
nbFB = 0
nbFA = 0
}
nbFB = nbFB + 1
nbFA = nbFA + 1
nbT = 0
if (nbFA == n2 & solution!=-1){
return(T)
}
last = F
}
}
return(F)
}
It is probably not a very efficient function, and I haven't tested it thoroughly, but it seems to work fine!
Test:
x = c(T,F,T,T,F,F,T,T,T,F,F,T,F,F)
fun_2(x, 3, 2) # TRUE
fun_2(x, 3, 3) # FALSE
Now, believe it or not, I'd like to make a function (fun_3) that returns TRUE if the boolean vector contains at least one run of at least n1 TRUEs flanked on each side by a run of at least n2 FALSEs, where the whole thing (the three runs) is in turn flanked on each side by a run of at least n3 TRUEs. And since I am afraid I will have to take this problem even further, I am asking for help to create a function fun_n that takes two arguments, TFvec and list_n, where list_n is a list of thresholds n of any length.
Can you help me to create the function fun_n?
For convenience, record the number of thresholds
n = length(list_n)
Represent the vector of TRUE and FALSE as a run-length encoding, remembering the length of each run for convenience
r = rle(TFvec); l = r$lengths
Find possible starting locations
idx = which(l >= list_n[1] & r$values)
Make sure the starting locations are embedded enough to satisfy all tests
idx = idx[idx > n - 1 & idx + n - 1 <= length(l)]
Then check that the lengths of successively more remote runs satisfy the condition, keeping only the starting points that pass
for (i in seq_len(n - 1)) {
if (length(idx) == 0)
break # no solution
thresh = list_n[i + 1]
test = (l[idx + i] >= thresh) & (l[idx - i] >= thresh)
idx = idx[test]
}
If there are any values left in idx, then these are the indexes into the rle satisfying the condition; the starting point(s) in the initial vector are cumsum(l)[idx - 1] + 1.
Combined:
runfun = function(TFvec, list_n) {
## setup
n = length(list_n)
r = rle(TFvec); l = r$lengths
## initial condition
idx = which(l >= list_n[1] & r$values)
idx = idx[idx > n - 1 & idx + n - 1 <= length(l)]
## adjacent conditions
for (i in seq_len(n - 1)) {
if (length(idx) == 0)
break # no solution
thresh = list_n[i + 1]
test = (l[idx + i] >= thresh) & (l[idx - i] >= thresh)
idx = idx[test]
}
## starts = cumsum(l)[idx - 1] + 1
## any luck?
length(idx) != 0
}
This is fast and allows for runs >= the threshold, as stipulated in the question; for example
x = sample(c(TRUE, FALSE), 1000000, TRUE)
system.time(runfun(x, rep(2, 5)))
completes in less than 1/5th of a second.
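For the single-threshold case (fun_1 in the question), the same run-length idea reduces to a one-liner. A minimal sketch, with fun_1_rle as a hypothetical name:

```r
fun_1_rle <- function(TFvec, n1) {
  r <- rle(TFvec)
  # keep only the lengths of TRUE runs; any() of an empty vector is FALSE
  any(r$lengths[r$values] >= n1)
}

tf <- c(TRUE, FALSE, TRUE, TRUE, FALSE, FALSE, TRUE,
        TRUE, TRUE, FALSE, FALSE, TRUE, FALSE, FALSE)
fun_1_rle(tf, 3)  # TRUE
fun_1_rle(tf, 4)  # FALSE
```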
A fun generalization allows for a flexible condition, e.g., runs of exactly list_n, as in the rollapply solution:
runfun = function(TFvec, list_n, cond=`>=`) {
## setup
n = length(list_n)
r = rle(TFvec); l = r$lengths
## initial condition
idx = which(cond(l, list_n[1]) & r$values)
idx = idx[idx > n - 1 & idx + n - 1 <= length(l)]
## adjacent conditions
for (i in seq_len(n - 1)) {
if (length(idx) == 0)
break # no solution
thresh = list_n[i + 1]
test = cond(l[idx + i], thresh) & cond(l[idx - i], thresh)
idx = idx[test]
}
## starts = cumsum(l)[idx - 1] + 1
## any luck?
length(idx) != 0
}
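The commented-out starts line can be turned into a variant that returns the starting position(s) of the qualifying central run in the original vector instead of TRUE/FALSE. This is a sketch under that assumption (runfun_starts is a hypothetical name). It prepends a 0 to the cumulative run lengths so that a match in the very first run also gets a position; an empty result means no match.

```r
runfun_starts <- function(TFvec, list_n, cond = `>=`) {
  n <- length(list_n)
  r <- rle(TFvec)
  l <- r$lengths
  # candidate central runs: TRUE runs passing the first threshold
  idx <- which(cond(l, list_n[1]) & r$values)
  # a candidate needs n - 1 runs on each side
  idx <- idx[idx > n - 1 & idx + n - 1 <= length(l)]
  for (i in seq_len(n - 1)) {
    if (length(idx) == 0) break
    thresh <- list_n[i + 1]
    idx <- idx[cond(l[idx + i], thresh) & cond(l[idx - i], thresh)]
  }
  # c(0, cumsum(l))[k] counts the elements before run k, so + 1 is where
  # run k starts; the leading 0 also covers a match in the first run
  as.integer(c(0, cumsum(l))[idx] + 1)
}

tf <- c(FALSE, TRUE, TRUE, FALSE, FALSE, TRUE, TRUE,
        TRUE, FALSE, FALSE, TRUE, TRUE, TRUE, FALSE)
runfun_starts(tf, 3:1)
# [1] 6
runfun_starts(tf, 3)
# [1]  6 11
runfun_starts(tf, 1:3)
# integer(0)
```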
Create a template tpl of zeros and ones and convert it to a regex pattern pat. Then convert x to a single string of zeros and ones and use grepl to match pat against it. No packages are used.
fun_n <- function(x, lens) {
n <- length(lens)
reps <- c(rev(lens), lens[-1])
TF <- if (n == 1) 1 else if (n %% 2) 1:0 else 0:1
tpl <- paste0(rep(TF, length.out = 2 * n - 1), "{", reps, ",}") # one 0/1 symbol per element of reps
pat <- paste(tpl, collapse = "")
grepl(pat, paste(x + 0, collapse = ""))
}
# test
x <- c(F, T, T, F, F, T, T, T, F, F, T, T, T, F)
fun_n(x, 3:1)
## TRUE
fun_n(x, 1:3)
## FALSE
fun_n(x, 100)
## FALSE
fun_n(x, 3)
## TRUE
fun_n(c(F, T, F), c(1, 1))
## [1] TRUE
fun_n(c(F, T, T, F), c(1, 1))
## [1] TRUE
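To see what the template looks like, the pattern-building step can be pulled out on its own. This sketch (make_pattern is a hypothetical name) uses rep(..., length.out = 2 * n - 1) so that there is exactly one 0/1 symbol per entry of reps. For lens = 3:1 the pattern reads: at least 1 TRUE, at least 2 FALSE, at least 3 TRUE, at least 2 FALSE, at least 1 TRUE.

```r
make_pattern <- function(lens) {
  n <- length(lens)
  reps <- c(rev(lens), lens[-1])          # outermost ... innermost ... outermost
  TF <- if (n == 1) 1 else if (n %% 2) 1:0 else 0:1
  sym <- rep(TF, length.out = 2 * n - 1)  # one 0/1 symbol per entry of reps
  paste0(sym, "{", reps, ",}", collapse = "")
}

make_pattern(3:1)
# [1] "1{1,}0{2,}1{3,}0{2,}1{1,}"
make_pattern(c(1, 1))
# [1] "0{1,}1{1,}0{1,}"
```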
Run time is not as fast as runfun on the example below, but it is still quite fast: 10,000 runs of the example shown take slightly over 2 seconds on my laptop. The code is also relatively short and loop-free.
> library(rbenchmark)
> benchmark(runfun(x, 1:3), fun_n(x, 1:3), replications = 10000)[1:4]
test replications elapsed relative
2 fun_n(x, 1:3) 10000 2.29 1.205
1 runfun(x, 1:3) 10000 1.90 1.000

Euler Project #1 in R

Problem
Find the sum of all numbers below 1000 that are divisible by 3 or 5
One solution I created:
x <- c(1:999)
values <- x[x %% 3 == 0 | x %% 5 == 0]
sum(values)
The second solution, which I can't get to work and need help with, is pasted below.
I'm trying to use a loop (here, I use while() and after this I'll try for()). I am still struggling with keeping references to indexes (locations in a vector) separate from values/observations within vectors. Loops seem to make it more challenging for me to distinguish the two.
Why does this not produce the answer to Euler #1?
x <- 0
i <- 1
while (i < 100) {
if (i %% 3 == 0 | i %% 5 == 0) {
x[i] <- c(x, i)
}
i <- i + 1
}
sum(x)
And in words, line by line this is what I understand is happening:
x gets value 0
i gets value 1
while object i's value (not the index #) is < 1000
if i is divisible by 3 or 5
add that number i to the vector x
add 1 to i (to keep the loop going up to the defined limit of 1e3)
sum all items in vector x
I am guessing x[i] <- c(x, i) is not the right way to add an element to vector x. How do I fix this and what else is not accurate?
First, your loop runs until i < 100, not i < 1000.
Second, replace x[i] <- c(x, i) with x <- c(x, i) to add an element to the vector.
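Applying both fixes to the loop from the question gives the expected result. Growing x with c() copies the vector on each append, which is fine at this size.

```r
x <- 0
i <- 1
while (i < 1000) {            # visit 1..999
  if (i %% 3 == 0 | i %% 5 == 0) {
    x <- c(x, i)              # append i to x
  }
  i <- i + 1
}
sum(x)
# [1] 233168
```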
Here is a shortcut that performs this sum, which is probably more in the spirit of the problem:
3*(333*334/2) + 5*(199*200/2) - 15*(66*67/2)
## [1] 233168
Here's why this works:
In the set of integers [1,999] there are:
333 values that are divisible by 3. Their sum is 3*sum(1:333) or 3*(333*334/2).
199 values that are divisible by 5. Their sum is 5*sum(1:199) or 5*(199*200/2).
Adding these up gives a number that is too high by the sum of their intersection, i.e. the values that are divisible by 15. There are 66 such values, and their sum is 15*sum(1:66) or 15*(66*67/2).
As a function of N, this can be written:
f <- function(N) {
threes <- floor(N/3)
fives <- floor(N/5)
fifteens <- floor(N/15)
3*(threes*(threes+1)/2) + 5*(fives*(fives+1)/2) - 15*(fifteens*(fifteens+1)/2)
}
Giving:
f(999)
## [1] 233168
f(99)
## [1] 2318
And another way:
x <- 1:999
sum(which(x%%5==0 | x%%3==0))
# [1] 233168
A very efficient approach is the following:
div_sum <- function(x, n) {
# calculates twice the sum of all integers from 1 to n
# that are divisible by x
max_num <- n %/% x
(x * (max_num + 1) * max_num)
}
n <- 999
a <- 3
b <- 5
(div_sum(a, n) + div_sum(b, n) - div_sum(a * b, n)) / 2
By contrast, here is a very short version:
x=1:999
sum(x[!x%%3|!x%%5])
Here is an alternative that I think gives the same answer (using 99 instead of 999 as the upper bound):
iters <- 100
x <- rep(0, iters-1)
i <- 1
while (i < iters) {
if (i %% 3 == 0 | i %% 5 == 0) {
x[i] <- i
}
i <- i + 1
}
sum(x)
# [1] 2318
Here is the for loop mentioned in the original post (a for loop advances i by itself, so no i <- 1 or i <- i + 1 is needed):
iters <- 99
x <- rep(0, iters)
for (i in 1:iters) {
if (i %% 3 == 0 | i %% 5 == 0) {
x[i] <- i
}
}
sum(x)
# [1] 2318
