Flip a coin. On success you win 100, otherwise you lose 50. You keep playing as long as you have money in your pocket, a. How can the value of a at each iteration be stored?
a <- 100
while (a > 0) {
  if (rbinom(1, 1, 0.5) == 1) {
    a <- a + 100
  } else {
    a <- a - 50
  }
}
When the while loop ends, I would like to be able to look at the value of a for each iteration, instead of just the final result. I consulted the post on Counting the iteration in sapply, but I wasn't able to apply it to this case.
Store the initial value of a in a second vector, and append the new value of a at each iteration.
a <- pocket <- 100
while (a > 0) {
  if (rbinom(1, 1, 0.5) == 1) {
    a <- a + 100
  } else {
    a <- a - 50
  }
  pocket <- c(pocket, a)
}
Of course a vectorised approach may be more efficient, e.g.:
n <- 1000000
x <- c(100, sample(c(100, -50), n, replace=TRUE))
cumsum(x)[1:match(0, cumsum(x))]
But there's no guarantee you'll run out of money within n iterations (in which case you receive an error and can just look at x to see the realised trajectory).
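If you want to guard against that case explicitly, the same idea works with a check on match() returning NA (a small sketch of my own, not part of the original answer):
n <- 1e6
x <- c(100, sample(c(100, -50), n, replace = TRUE))
trajectory <- cumsum(x)
hit <- match(0, trajectory)  # first time the pocket hits 0, or NA if it never does
if (is.na(hit)) trajectory else trajectory[1:hit]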
EDIT
In response to concerns voiced by @Roland, the following approach avoids reallocating memory at each iteration:
n <- 1e6
a <- rep(NA_integer_, n)
a[1] <- 100L  # set initial value (integer)
i <- 1        # counter
while (a[i] > 0) {
  # first check whether our results will fit. If not, embiggenate `a`.
  if (i == length(a)) a <- c(a, rep(NA_integer_, n))
  if (rbinom(1, 1, 0.5) == 1) {
    a[i+1] <- a[i] + 100L
  } else {
    a[i+1] <- a[i] - 50L
  }
  i <- i + 1
}
a[seq_len(i)]
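The benefit of preallocating is easy to see with a toy timing comparison (an illustrative sketch only; the helper names are mine and the exact timings will vary by machine):
grow <- function(n) { v <- integer(0); for (i in seq_len(n)) v <- c(v, i); v }
prealloc <- function(n) { v <- integer(n); for (i in seq_len(n)) v[i] <- i; v }
system.time(grow(5e4))      # reallocates the vector on every iteration
system.time(prealloc(5e4))  # writes into an existing vector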
Related
I need to apply code with the following effect to a huge data set:
sum_to_limit <- function(x, limit) {
  ret <- 0
  if (length(na.omit(x)) > 0) {
    for (i in seq_along(x)) {
      if (ret + x[i] <= limit) ret <- ret + x[i]
    }
  }
  return(ret)
}
In other words, I need to sum up the elements of the supplied vector that add up to no more than limit. So, for example, sum_to_limit(c(10,10,10,10,5), 17) = 15. I have so far failed to come up with anything faster than the above, which is not really cutting it on my data. It feels like it should be possible to create a vectorized version...
You can set this up as an optimization problem. It might only be worth it if you have longer vectors rather than many small ones:
library(lpSolve)
sum_to_limit <- function(x, limit) {
  sol <- lp("max",
            objective.in = rep(1, length(x)),
            const.mat = matrix(x, nrow = 1),
            const.dir = "<=",
            const.rhs = limit,
            all.bin = TRUE)
  stopifnot(sol$status == 0)  # stop if the solver found no solution (status 0 = success)
  return(x[sol$solution == 1])
}
print(sum_to_limit(c(10,10,10,10,5), 17))
I found it more interesting to return the selected entries rather than the sum.
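If you do want the total rather than the selected entries, just wrap the call:
sum(sum_to_limit(c(10,10,10,10,5), 17))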
You can sort the vector and add the elements up while the running total stays within the limit.
sum_to_limit <- function(x, limit) {
  x <- sort(x)
  sum(x[cumsum(x) <= limit])
}
sum_to_limit(c(10,10,10,10,5), 17)
#[1] 15
sum_to_limit(c(10,10,10,10,5), 35)
#[1] 35
I am looking to sample repeatedly from a distribution with a specific condition.
I am sampling 50 values for four iterations and saving the results. However, I need each individual result from an iteration to be smaller than the result at the same position in the previous iteration.
mu.c <- c(7,6,5,3) # Means of control chains
chains.sim <- function(vector, N) {
  all.list <- list()
  for (i in 1:length(vector)) {
    Y <- MASS::rnegbin(n = N, mu = vector[i], theta = 4)
    name <- paste('position:', i, sep = '')
    all.list[[name]] <- Y
  }
  all.list
}
chains.sim(mu.c, 50)
The sampling part works fine, but the individual Y results are of course not always smaller than the results from the previous iteration ("position").
Is there a way to repeat the sampling process until the result is smaller?
I would really appreciate your help!
I would add a while loop inside your for loop which samples data sets until the condition is met.
mu.c <- c(7,6,5,3) # Means of control chains
chain.sim <- function(vector, N) {
  all.list <- list()
  all.list[[1]] <- MASS::rnegbin(n = N, mu = vector[1], theta = 4)
  for (i in 2:length(vector)) {
    is_smaller <- FALSE
    while (!is_smaller) {
      Y <- MASS::rnegbin(n = N, mu = vector[i], theta = 4)
      if (all(all.list[[i-1]] >= Y)) is_smaller <- TRUE
    }
    all.list[[i]] <- Y
  }
  all.list
}
chain.sim(mu.c, 3)
Note that I changed the condition to >=, because if a 0 is generated in any round, strictly smaller values can never be found. Also, with 50 elements this code will practically never finish, because it is extremely unlikely to draw a whole sample in which every value is smaller, let alone 4 such samples in a row.
Edit:
It can be made much faster by sampling each position individually, as you pointed out:
chain.sim <- function(vector, N) {
  all.list <- list()
  all.list[[1]] <- MASS::rnegbin(n = N, mu = vector[1], theta = 4)
  for (i in 2:length(vector)) {
    Y <- numeric(N)
    for (j in 1:N) {
      previous_value <- all.list[[i-1]][j]
      if (previous_value == 0) {
        Y[j] <- 0
        next
      }
      is_smaller <- FALSE
      while (!is_smaller) {
        val <- MASS::rnegbin(1, mu = vector[i], theta = 4)
        if (val <= previous_value) is_smaller <- TRUE
        Y[j] <- val
      }
    }
    all.list[[i]] <- Y
  }
  all.list
}
chain.sim(mu.c, 50)
chain.sim(mu.c, 50)
If 0 is encountered at a position, no further simulation is needed there, since the next value can only be 0. This makes the simulation much faster.
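As a quick sanity check that the per-position constraint holds, you can compare neighbouring chains (a small sketch; the result should be TRUE on every run):
res <- chain.sim(mu.c, 50)
all(sapply(2:length(res), function(i) all(res[[i]] <= res[[i - 1]])))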
The function dice takes a parameter n, representing the number of rolls for a single six-sided die. It returns a vector of length n whose elements are integers between 1 and 6. I have created the following code for the dice function below. It seems to run properly when I test it.
dice <- function(n) {
  x <- c(1:6)
  sample(x, size = n, replace = TRUE)  # each face 1:6 equally likely
}
The function kdice takes two parameters, n and k. The parameter n denotes the number of rolls, and k the number of dice rolled. The function should return the sum of the k dice, rolled n times. Somehow I have to use dice() within this function. Below is what I have completed thus far; however, the function returns nothing. I have an if and else statement to make sure that at least 1 die was rolled at least 1 time. The while loop is meant to output the sum of NumofDice until RollCount reaches n. I would appreciate any insights, especially on how to incorporate the function dice() in kdice().
kdice <- function(k, n){
  NumofDice <- sample(1:6, size = k, replace = TRUE)
  RollCount = 0
  if (k > 0 && n > 0) {
    while (RollCount < n) {
      RollCount = RollCount + 1
      sum(NumofDice)
    }
  } else {
    print("No number of dices were rolled")
  }
}
kdice <- function(k, n){
  if (k > 0 && n > 0) {
    replicate(n, sum(sample(c(1:6), k, replace = TRUE)))
  } else {
    print("No number of dices were rolled")
  }
}
kdice(4,2)
[1] 15 8
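If you want to reuse the dice() helper from the question rather than calling sample() directly, the same idea works (a sketch, assuming dice(k) returns k fair rolls as defined above):
kdice <- function(k, n) {
  if (k > 0 && n > 0) {
    replicate(n, sum(dice(k)))
  } else {
    print("No number of dices were rolled")
  }
}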
You can try defining kdice using replicate + colSums like below
kdice <- function(k, n) {
  tryCatch(
    colSums(matrix(replicate(n, dice(k)), nrow = k)),
    error = function(e) print("No number of dices were rolled")
  )
}
which gives results like
> kdice(4, 5)
[1] 17 14 22 13 11
> kdice(4, 0)
[1] "No number of dices were rolled"
The idea of Project Euler question 12 is to find the smallest triangular number with a specified number of divisors (https://projecteuler.net/problem=12). As an attempt to solve this problem, I wrote the following code:
# This function finds the divisors of a number and returns them as a vector.
FUN <- function(x) {
  i <- 1
  lst <- integer(0)
  while (i <= x) {
    if (x %% i == 0) {
      lst <- c(lst, i)
    }
    i <- i + 1
  }
  return(lst)
}
and
n <- 1
i <- 1
while (length(FUN(n)) < 500) {
  i <- i + 1
  n <- n + i
}
This code produces the correct answer for a few smaller test cases: length(FUN(n)) < 4 will produce 6, and length(FUN(n)) < 6 will produce 28.
However, this simple-looking code has been running for over 24 hours (and is still running) for length(FUN(n)) < 500. I understand that for a number to have 500 divisors it is probably very big, but I am wondering why it is taking so long to run.
Your FUN is much too inefficient for this task. As the first qualifying triangular number is beyond the 12,000th, with a value of about 75,000,000, and FUN runs through all these numbers, the number of iterations to perform is almost
12000 * 75000000 / 2 = 450 * 10^9
This is clearly more than R's relatively slow interpreted loops can handle in a reasonable time frame.
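Even keeping the brute-force idea, you only need to test candidate divisors up to sqrt(x), counting each divisor together with its cofactor x/i, which cuts the work per number from O(x) to O(sqrt(x)). A minimal sketch of such a counter (count_divisors is my own helper, not from the question):
count_divisors <- function(x) {
  cnt <- 0L
  i <- 1L
  while (i * i <= x) {
    if (x %% i == 0) {
      cnt <- cnt + if (i * i == x) 1L else 2L  # count i and its cofactor x/i
    }
    i <- i + 1L
  }
  cnt
}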
Better still, you can apply the divisors function from the numbers package, which uses a prime factor decomposition. The following code needs about 5-6 seconds (on my machine) to find the triangular number.
library(numbers)
t <- 0
system.time(
  for (i in 1:100000) {
    t <- t + i
    d <- length(divisors(t))
    if (d > 500) {
      cat(i, t, d, '\n')
      break
    }
  }
)
## 12375 76576500 576
## user system elapsed
## 5.660 0.000 5.658
Instead of recomputing the i-th triangular number from scratch, here i is simply added to the previous triangular number. The time saving is minimal.
Here's my attempt:
library(gmp)
library(plyr)
get_all_factors <- function(n)
{
  prime_factor_tables <- lapply(
    setNames(n, n),
    function(i)
    {
      if (i == 1) return(data.frame(x = 1L, freq = 1L))
      plyr::count(as.integer(gmp::factorize(i)))
    }
  )
  lapply(
    prime_factor_tables,
    function(pft)
    {
      powers <- plyr::alply(pft, 1, function(row) row$x ^ seq.int(0L, row$freq))
      power_grid <- do.call(expand.grid, powers)
      sort(unique(apply(power_grid, 1, prod)))
    }
  )
}
for (i in 99691200:100000) {
  if (length(get_all_factors(i)[[1]]) > 500) print(paste(i, length(get_all_factors(i)[[1]])))
  if (i %% 100000 == 0) print(paste("-", i, "-"))
}
Let it run as long as you can be bothered...
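As a spot check, feeding it the value found by the numbers-package approach above should report the same divisor count:
length(get_all_factors(76576500)[[1]])
## 576, matching the earlier output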
Having the following matrix and vector:
a <- matrix(c(1, 4, 7,
              2, 5, 8,
              3, 6, 9), nrow = 3)
b <- c(1, 1, 1)
How do I sum recursively over each row of the matrix inside a function, until a desired result is obtained, using the last result to calculate the next operation, as shown:
b<-b+a[1,]
b<-b+a[2,]
b<-b+a[3,]
b<-b+a[1,]
b<-b+a[2,]
sum(b) > 100 # Sum recursively until this result is obtained: sum(b) > 100
This operation looks similar to this answer: Multiply recursively in r. However, it uses results from previous operations to calculate the next ones.
Here's a recursive function to do what you're after,
# Sample Data
a <- matrix(c(1, 4, 7,
              2, 5, 8,
              3, 6, 9), nrow = 3)
b <- c(1, 1, 1)
We create a function that calls itself, with a row index that increments modulo the number of rows:
recAdd <- function(b, a, start = 1, size = NROW(a)) {
  if (sum(b) > 100) return(b)
  return(recAdd(b + a[start,], a, start = start %% size + 1, size))
}
> recAdd(b,a)
[1] 30 38 46
EDIT: Alternatively, here's a way with no recursion at all, which is much faster when the ratio of the target number to the sum of the matrix is large (but is slower on data of this size). Basically we get to take advantage of Euclidean division.
nonrecAdd <- function(b, a, target = 100) {
  Remaining <- target - sum(b)
  perloop <- sum(a)
  nloops <- Remaining %/% perloop
  Remaining <- Remaining %% perloop
  if (Remaining > 0) {
    cumulativeRowsums <- cumsum(rowSums(a))
    finalindex <- which((Remaining %/% cumulativeRowsums) == 0)[1]
    b + colSums(a) * nloops + colSums(a[1:finalindex,, drop = FALSE])
  } else {
    b + colSums(a) * nloops
  }
}
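On the sample data above, this reproduces the recursive result:
nonrecAdd(b, a)
## [1] 30 38 46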