I'm trying to simulate a dice game, with the following criteria:
(1) You are allowed to roll a die up to 6 times;
(2) At any time during the game, after observing the outcome of a roll, you may stop the game and win the dollar amount shown on that roll. For example, if your rolls are 5, 1, 3, 4 and you decide to stop the game, you win $4; if your rolls are 5, 1, 3, 4, 3, 2 with no decision to stop, you win $2.
The function I have at the moment is
stop_on_6 <- function() {
  nrolls <- 0
  # set.seed(0)
  n <- 1
  # generate 1 random integer from uniform distribution on [1,6] with
  # equal probability.
  while (n <= 6){
    roll <- sample(1:6, size = 1, replace = TRUE, prob = rep(1/6, 6))
    if (roll == 6) {print('A 6 was rolled')
      return (roll)}
    n <- n + 1
  }
  sprintf("You've rolled ", n, " times.")
}
The function I'm aiming for will compute your expected winnings over n game plays, assuming that you stop the game only if you get 6 on your roll.
At the moment, when I call the function, it prints either "A 6 was rolled" or "You've rolled 7 times". I'm not sure how to make the function roll up to 6 times but stop if roll == 6.
First part of the answer: you get one of those two messages because:
1- a 6 comes up very often when rolling a die 6 times (the probability of at least one 6 is 1 - (5/6)^6, about 0.665).
2- the while loop only stops once n == 7, so whenever no 6 is rolled the counter reads 7.
To fix the second case you can either print n-1 or initialise n at 0 and loop while n < 6.
I used the first option below.
stop_on_6 <- function() {
  n <- 1
  memory <- 0   # last roll seen; 0 means nothing has been rolled yet
  while (n <= 6 && memory != 6) {
    roll <- sample(1:6, size = 1, replace = TRUE, prob = rep(1/6, 6))
    if (roll == 6) {
      print('A 6 was rolled')
    }
    memory <- roll
    n <- n + 1
  }
  sprintf("You played %d times and won %d", n - 1, memory)
}
stop_on_6()
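The other fix mentioned above, starting the counter at 0 and looping while n < 6, could look like the following minimal sketch (stop_on_6_v2 is just an illustrative name). Here n counts the rolls actually made, so no n - 1 correction is needed, and break leaves the loop as soon as a 6 appears:

```r
stop_on_6_v2 <- function() {
  n <- 0                            # number of rolls made so far
  while (n < 6) {
    roll <- sample(1:6, size = 1)   # one fair die roll
    n <- n + 1
    if (roll == 6) {
      print('A 6 was rolled')
      break                         # stop the game on a 6
    }
  }
  sprintf("You played %d times and won %d", n, roll)
}
stop_on_6_v2()
```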
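Second part: to estimate the expected winnings, play many games (episodes) and average what you win in each one:
stop_on_6 <- function(episode) {
  reward <- c()
  for (i in 1:episode) {
    n <- 1
    while (n <= 6) {
      roll <- sample(1:6, size = 1, replace = TRUE, prob = rep(1/6, 6))
      reward[i] <- roll                  # the last roll made is what you win
      n <- ifelse(roll == 6, 7, n + 1)   # jump past 6 to stop on a 6
    }
  }
  return(paste0("You played ", episode, " episodes. Your expected reward is ", mean(reward)))
}
stop_on_6(1000)
gives
"You played 1000 episodes. Your expected reward is 4.944"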
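The simulated value can be checked against an exact one. Under the stop-on-6 strategy you win 6 if at least one of the six rolls is a 6; otherwise you win the sixth roll, which, given that it is not a 6, is uniform on 1 to 5 with mean 3. A short check, with a brute-force enumeration of all 6^6 equally likely roll sequences for comparison (the variable names are illustrative):

```r
p_no_six <- (5/6)^6                            # P(no 6 in six rolls)
ev_exact <- 6 * (1 - p_no_six) + 3 * p_no_six  # = 6 - 3 * (5/6)^6
ev_exact                                       # about 4.995

# Brute force: every possible sequence of six rolls, each equally likely
rolls <- as.matrix(expand.grid(rep(list(1:6), 6)))
winnings <- apply(rolls, 1, function(r) if (any(r == 6)) 6 else r[6])
mean(winnings)                                 # same value as ev_exact
```

So a simulated 4.944 over 1000 games is consistent with a true expectation of about 4.995.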
This question just nerd-sniped me, so here's a loop that gives you the optimal strategy for each roll. You should quit on a 6 at any roll, on a 5 from the 2nd roll onward, and on a 4 at the 5th roll, since staying in has a lower expected value in those cases.
dice <- 1:6
breakeven <- 0 # no value of rolls after the sixth one
for (i in 6:2) {
  next_roll_EV <- breakeven
  values_over_future_EV <- dice[dice > next_roll_EV] # stop if you get one of these
  settle_chance <- length(values_over_future_EV) / 6
  settle_EV <- mean(values_over_future_EV)
  keep_going_chance <- 1 - settle_chance
  breakeven <- settle_chance * settle_EV + keep_going_chance * next_roll_EV
  stop_rolls <- dice[dice > breakeven]
  print(paste0("roll ", i, " has EV of ", breakeven,
               ", so stop in the prior roll if you have any of ", paste(stop_rolls, collapse = ", ")))
}
[1] "roll 6 has EV of 3.5, so stop in the prior roll if you have any of 4, 5, 6"
[1] "roll 5 has EV of 4.25, so stop in the prior roll if you have any of 5, 6"
[1] "roll 4 has EV of 4.66666666666667, so stop in the prior roll if you have any of 5, 6"
[1] "roll 3 has EV of 4.94444444444444, so stop in the prior roll if you have any of 5, 6"
[1] "roll 2 has EV of 5.12962962962963, so stop in the prior roll if you have any of 6"
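The same backward induction can be pushed one step further, to roll 1, which gives the value of the whole game under the optimal strategy. In this sketch (value and play_optimal are illustrative names, not from the answer above), value[k] is the expected winnings when you are about to make roll k; after seeing a roll you keep it only if it beats value[k + 1], which is the comparison the loop above makes. The simulation at the end is just a sanity check of that threshold rule:

```r
dice <- 1:6
value <- numeric(7)   # value[7] = 0: nothing left to win after the sixth roll
for (k in 6:1) {
  # keep roll r if it beats continuing, otherwise take the value of continuing
  value[k] <- mean(pmax(dice, value[k + 1]))
}
round(value[1:6], 4)  # 5.2747 5.1296 4.9444 4.6667 4.2500 3.5000

# Sanity check by simulation: stop at roll k as soon as the roll beats value[k + 1]
# (value[7] = 0, so the sixth roll is always kept)
play_optimal <- function() {
  for (k in 1:6) {
    roll <- sample(dice, 1)
    if (roll > value[k + 1]) return(roll)
  }
}
set.seed(1)
mean(replicate(1e5, play_optimal()))
```

The optimal strategy is therefore worth about 5.27 per game, compared with about 4.995 (that is, 6 - 3 * (5/6)^6) for stopping only on a 6.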
I'm trying to adapt the function from "Sum of first n elements of a vector". Where that solution takes an argument n and sums the first n elements of the vector, I'd like an argument that is a threshold (with a default value) which the elements should sum up to (or over), with the function returning how many elements that takes.
After trying different for and/or while loops and searching Stack Overflow, I've ended up here, still unclear how to implement the threshold and set n_elements.
I have this logic, which returns 0 for the given vector. It seems the n_elements = x[i] + 1 part isn't correct.
theFunc <- function(x, threshold = 5){
  n_elements = 0
  while (sum(head(x)) < threshold){
    n_elements = x[i] + 1
  }
  return(n_elements)
}
Call:
x <- c(0, 0, 1, 1, 2, 3, 6, 7)
theFunc(x)
[1] 0
If the input is as above and the threshold is 5, the function should return 6 (the number of elements): the first five elements sum to 0+0+1+1+2 = 4, which is below the threshold, and adding the sixth gives 0+0+1+1+2+3 = 7, which is above it.
A simple function without a loop is as follows:
theFunc <- function(x, threshold = 5){
  # number of partial sums still below the threshold, plus the element that crosses it
  sum(cumsum(x) < threshold) + 1
}
x <- c(0, 0, 1, 1, 2, 3, 6, 7)
theFunc(x)
[1] 6
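Note that sum(cumsum(x) < threshold) + 1 relies on the partial sums never decreasing (true here, since x has no negative values), and it returns length(x) + 1 when the threshold is never reached. A sketch that instead returns the position of the first partial sum at or above the threshold, and NA if there is none (first_n_reaching is just an illustrative name):

```r
first_n_reaching <- function(x, threshold = 5) {
  # positions where the running total has reached the threshold;
  # [1] picks the first one, and gives NA if there are none
  which(cumsum(x) >= threshold)[1]
}

x <- c(0, 0, 1, 1, 2, 3, 6, 7)
first_n_reaching(x)        # 6
first_n_reaching(x, 100)   # NA
```

With negative values the two versions can differ: for c(3, 4, -5, 1) and a threshold of 5, this sketch returns 2, while sum(cumsum(x) < threshold) + 1 returns 4.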
I have a random vector (of numbers 1:5) of length 20. I need to count the number of runs of length 1 (i.e. a number that is not followed by the same number), 2 (i.e. 2 consecutive equal numbers), 3 and 4.
I'm trying to write a function that takes x[1] and x[2] and compares them; if they are the same, it adds 1 to a counting variable. After that, x[1] becomes x[2] and x[2] should become x[3], so it keeps on repeating. How do I make x[2] change to x[3] without assigning it again? Sorry if that doesn't make much sense.
This is my first day learning R, so please simplify as much as you can so I understand, lol.
{
  startingnumber <- x[1]
  nextnumber <- x[2]
  count <- 0
  repeat {
    if (startingnumber == nextnumber) {
      count <- count + 1
      startingnumber <- nextnumber
      nextnumber <- x[3]
    } else {
      if (startingnumber != nextnumber) {
        break
        ........
      }
    }
  }
}
As mentioned in the comments, using table() on the rle() lengths is probably the most concise solution.
E.g.:
x <- c(3, 1, 1, 3, 4, 5, 3, 1, 5, 4, 2, 4, 2, 3, 2, 3, 2, 4, 5, 4)
table(rle(x)$lengths)
#  1  2
# 18  1
# or
v <- c(1, 1, 2, 4, 5, 5, 4, 5, 5, 3, 3, 2, 2, 2, 1, 4, 4, 4, 2, 1)
table(rle(v)$lengths)
# 1 2 3
# 6 4 2
In the first example there are 18 singles and one double (the two 1s near the beginning), for a total of 1*18 + 2*1 = 20 values.
In the second example there are 6 singles, 4 doubles, and 2 triples, giving a total of 1*6 + 2*4 + 3*2 = 20 values.
But if computational speed matters more than concise code, we can do better, since both table() and rle() do computations internally that we don't really need. Instead, we can assemble a function that does only the bare minimum.
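runlengths <- function(x) {
  n <- length(x)
  r <- which(x[-1] != x[-n])       # positions where a run ends
  rl <- diff(c(0, r, n))           # length of every run
  rlu <- sort(unique(rl))          # distinct run lengths
  rlt <- tabulate(match(rl, rlu))  # how often each distinct length occurs
  names(rlt) <- rlu
  as.table(rlt)
}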
runlengths(x)
#  1  2
# 18  1
runlengths(v)
# 1 2 3
# 6 4 2
Bonus:
You already know that you can compare individual elements of a vector like this
x[1] == x[2]
x[2] == x[3]
but did you know that you can compare whole vectors with each other, and that you can select multiple elements from a vector by specifying multiple indices? Together, that means that instead of writing
x[1] == x[2]
x[2] == x[3]
# ...
x[18] == x[19]
x[19] == x[20]
we can simply write
x[1:19] == x[2:20]
# Or even
x[-length(x)] == x[-1]
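To answer the loop question directly: you don't need to reassign x[2] to x[3] yourself, because a for loop's index can move along the vector for you. Here is a minimal loop-based sketch (count_runs_loop is a hypothetical name) that compares each element with the one before it and records the length of each run as it ends; it is likely slower than rle() but shows the idea:

```r
count_runs_loop <- function(x) {
  if (length(x) == 0) return(table(integer(0)))  # no elements, no runs
  run_lengths <- integer(0)  # lengths of the runs finished so far
  current <- 1L              # length of the run we are currently in
  for (i in seq_along(x)[-1]) {                 # i takes the values 2, 3, ..., length(x)
    if (x[i] == x[i - 1]) {
      current <- current + 1L                   # same as the previous number: run continues
    } else {
      run_lengths <- c(run_lengths, current)    # run ended: store its length
      current <- 1L                             # and start a new run
    }
  }
  run_lengths <- c(run_lengths, current)        # store the final run
  table(run_lengths)
}

v <- c(1, 1, 2, 4, 5, 5, 4, 5, 5, 3, 3, 2, 2, 2, 1, 4, 4, 4, 2, 1)
count_runs_loop(v)   # same counts as table(rle(v)$lengths): 6, 4, 2
```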
I'm getting started on backtesting some trading data, in particular a very basic mean-reversion idea, and can't get my head around how to approach this concept.
How would I go about having a running 'posy' increase by 1 once DifFromFv (the deviation from fair value) reaches -10, then increase by 1 more each time DifFromFv extends by another -3 (-13, -16, -19, etc.), while having 'posy' decrease by 1 every time DifFromFv reverts +5 from the level at which 'posy' last changed? Simply put, I am buying once DifFromFv reaches -10 points and averaging in every 3 points, while taking each individual average out for 5 points of profit.
E.g.:
DifFromFv  posy
     0.00     0
   -10.00     1   # initial clip (target profit -5.00)
   -11.50     1
   -13.00     2   # avg #1 (target profit -8.00)
   -16.60     3   # avg #2 (target profit -11.00)
   -12.30     3
   -11.00     2   # taking profit on avg #2
   -14.10     2
    -8.00     1   # taking profit on avg #1
    -7.00     1
    -5.00     0   # taking profit on initial clip
It should be noted that the take-profit level for each clip is consistently set at -5, -8, -11, etc., regardless of where the averages are filled, as seen by the target profit for avg #2 being -11.00 rather than -11.60. This is both to reduce the margin of error between real-life fills and data fills and, I'm pretty sure, to make this concept a lot easier to think about.
Thanks in advance!
Next time please provide some code, even though your explanation is quite clear.
However, you didn't mention how you want to deal with large jumps in DifFromFv (for instance, if it goes from -3 to -18), so I leave it up to you.
Here is the code with comments:
library(plyr)   # for round_any()

firstPosy = FALSE
DiffFair <- c(0, -10, -11.5, -13, -16.6, -12.3, -11, -14.1, -8, -7, -5) # Your data here
posy <- c(0)
buyPrices <- c(0) # Stores the prices at which you buy your asset
targetProfit <- c(0) # Stores the target profit alongside the vector above
steps <- c(0) # Stores your multiples of -3 after -10 (-10, -13, -16...)
PNL = 0

for (i in 2:length(DiffFair)) {
  # Case where posy increments for the first time by one
  if (DiffFair[i] <= -10 & DiffFair[i] > -13 & firstPosy == FALSE) {
    firstPosy = TRUE
    posy <- c(posy, 1)
    steps <- c(steps, round_any(DiffFair[i], 10, f = ceiling))
    lastChangePosy = DiffFair[i]
    buyPrices <- c(buyPrices, DiffFair[i])
    targetProfit <- c(targetProfit, -5)
  }
  else if (DiffFair[i] <= -13 & firstPosy == FALSE) {
    firstPosy = TRUE
    lastChangePosy = DiffFair[i]
    steps <- c(steps, round_any(DiffFair[i] + 10, 3, f = ceiling) - 10)
    buyPrices <- c(buyPrices, DiffFair[i])
    targetProfit <- c(targetProfit, -5)
    posy <- c(posy, tail(posy, n=1) + (-round_any(DiffFair[i] + 10, 3, f = ceiling) / 3) + 1)
  }
  # Posy increase
  else if (tail(steps, n=1) > round_any(DiffFair[i] + 10, 3, f = ceiling) - 10 & DiffFair[i] <= -10) {
    posy <- c(posy, posy[i-1] + 1)
    steps <- c(steps, round_any(DiffFair[i] + 10, 3, f = ceiling) - 10)
    lastChangePosy = DiffFair[i]
    buyPrices <- c(buyPrices, DiffFair[i])
    targetProfit <- c(targetProfit, tail(targetProfit, n=1) - 3)
  }
  # Posy decrease
  else if (DiffFair[i] >= tail(targetProfit, n=1) & tail(posy, n=1) > 0) {
    if (tail(targetProfit, n=1) == -5) {
      posy <- c(posy, 0)
    }
    else {
      posy <- c(posy, posy[i-1] - 1)
    }
    lastChangePosy = DiffFair[i]
    # Compute PNL and delete the target profit and buy price from the vectors
    PNL = PNL + (DiffFair[i] - tail(buyPrices, n=1))
    buyPrices <- buyPrices[-length(buyPrices)]
    targetProfit <- targetProfit[-length(targetProfit)]
    steps <- steps[-length(steps)]
    # Only re-arm the first entry once the position is flat again
    # (checking DiffFair[i] > -10 would re-arm it while the initial clip is still held)
    if (tail(posy, n=1) == 0) {
      firstPosy = FALSE
    }
  }
  # Posy doesn't change
  else {
    posy <- c(posy, posy[i-1])
  }
}
print(PNL)
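As a point of comparison, here is a different, minimal sketch (not the code above; track_posy and its arguments are hypothetical) that keeps one flag per entry level (-10, -13, -16, ...) and closes each level when DifFromFv reaches that level + 5, matching the fixed take-profit targets described in the question. It assumes that a large jump fills every level it crosses, and it only tracks the first n_levels levels:

```r
track_posy <- function(dif, first = -10, step = -3, take = 5, n_levels = 20) {
  levels <- first + step * (0:(n_levels - 1))  # entry levels: -10, -13, -16, ...
  open <- rep(FALSE, n_levels)                 # TRUE = that clip is currently held
  posy <- integer(length(dif))
  for (i in seq_along(dif)) {
    # take profit on every held clip whose fixed target (level + take) is reached
    open[open & dif[i] >= levels + take] <- FALSE
    # enter every level that has been reached and is not already held
    open[!open & dif[i] <= levels] <- TRUE
    posy[i] <- sum(open)
  }
  posy
}

DiffFair <- c(0, -10, -11.5, -13, -16.6, -12.3, -11, -14.1, -8, -7, -5)
track_posy(DiffFair)   # 0 1 1 2 3 3 2 2 1 1 0, the posy column from the question
```

Because each level carries its own target, a jump from 0 straight to -18 simply opens three clips at once, with targets at -5, -8 and -11.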
The code is to be used to decide who shall be the first player in the game Wazabi (a popular game developed by Gigamic). So you understand the gist of what I'm trying to do, the following procedure decides who goes first:
Each player rolls four six-sided dice, each of which has three possible outcomes (three faces show 'W', two faces show 'C' and the remaining face shows 'G', so, for example, there is a 3/6 probability of rolling a W).
The player who rolls the most Ws wins.
In the event of more than one player rolling the highest number of Ws, the players who rolled that number re-roll the 4 dice, and then the player with the most Ws goes first (this may happen recursively).
I've designed the code below (albeit messily) in R to try to decide who this player should be, but after a few hours of attempts, with a tired brain for company, I'm hoping one of you will be able to help me out. Here's my code and its output in the R console:
who_first <- function(){
  dummylist <- c()
  playersdummy <- 1:number_of_players
  first_rolling <- function(players_left=number_of_players){
    for(i in 1:players_left){
      # Random variable where 1, 2 & 3 represents Ws, Cs and Gs respectively.
      die_poss <- c(1, 1, 1, 2, 2, 3)
      die_result <- sample(die_poss, 4, replace=T)
      dummy2 <- 0
      for(j in 1:4){
        if(die_result[j]==1){
          dummy2 <- dummy2 + 1
        }
      }
      dummy3 <- append(dummylist, dummy2, after=i)
      # dummylist stores the number of Ws rolled by each respective player,
      # i.e. first element says how many Ws 1st player still left in rolled.
      dummylist <<- dummy3
    }
    dummy4 <- 0
    for(k in 1:players_left){
      if(dummylist[k]==max(dummylist)){
        # dummy4 represents the number of players who rolled the highest number of Ws that roll.
        dummy4 <<- dummy4 + 1
      }
    }
    return(dummy4)
  }
  while(dummy4 >= 1){
    if(dummy4==1){
      playersdummy <<- playersdummy[(which(dummylist==max(dummylist))==TRUE)]
      return(playersdummy)
    }
    else if(dummy4 > 1){
      dummy5 <- c()
      for(l in 1:length(playersdummy)){
        if(any((playersdummy[l]==which(dummylist==max(dummylist)))==TRUE)){
          dummy6 <- append(dummy5, playersdummy[l], after=l)
          dummy5 <<- dummy6
        }
      }
      # playersdummy becomes the vector containing which players are left in the game, i.e. 2 represents player 2.
      playersdummy <<- dummy5
      dummylist <<- c()
      first_rolling(length(playersdummy))
    }
  }
}
who_first()
[1] 1 2 3 4 5 6 7
Warning message:
In max(dummylist) : no non-missing arguments to max; returning -Inf
number_of_players is globally defined in another function simply as the number of players in the game. It is equal to 7 in this test.
Clearly the function should return a vector of length 1 holding the number of whichever player rolled the most Ws after however many rolls and re-rolls. Also, as I'm quite new to R, I'm not exactly sure what the warning message is getting at; if someone could explain, that'd be helpful.
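The warning itself can be reproduced in isolation: max() warns and returns -Inf whenever it is given an empty vector. In who_first() as posted, the while loop appears to evaluate max(dummylist) before first_rolling() has ever been called, so at that point dummylist is still the empty c() it was initialised to:

```r
dummylist <- c()        # NULL, as in who_first() before any roll has been stored
m <- max(dummylist)     # warning: no non-missing arguments to max; returning -Inf
m                       # -Inf
which(dummylist == m)   # integer(0): no player can match
```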
Here's some code that appears to follow your algorithm but is much simpler than your attempt.
The idea is simple: since there's no difference between the C and G outcomes, you can just make them the same, valued 0. Make the W outcome 1, and sum the number of times it's been rolled. Then check if there's a winner; if not, keep only the players tied for the highest count and repeat.
It does seem that it would be much simpler to just roll the dice once and, in case of a tie, keep those and roll again. But maybe there's more to it.
who_first <- function(n) {
  players <- seq_len(n)
  found_winner <- FALSE
  die_poss <- rep(c(0, 1), each = 3)   # 0 = C or G, 1 = W
  while (!found_winner) {
    cat("Rolling for players", paste(players, collapse = ", "), "\n")
    results <- replicate(length(players), sum(sample(die_poss, 4, replace = TRUE)))
    cat("Players rolled", paste(results, collapse = ", "), "points\n")
    if (sum(max(results) == results) == 1) {
      found_winner <- TRUE
      winner <- players[which.max(results)]
      cat("Winner is player", winner, "\n")
    } else {
      players <- players[max(results) == results]
    }
  }
  invisible(winner)
}
Some runs:
> set.seed(10)
> who_first(7)
Rolling for players 1, 2, 3, 4, 5, 6, 7
Players rolled 2, 0, 3, 1, 1, 3, 2 points
Rolling for players 3, 6
Players rolled 2, 2 points
Rolling for players 3, 6
Players rolled 4, 1 points
Winner is player 3
> who_first(7)
Rolling for players 1, 2, 3, 4, 5, 6, 7
Players rolled 0, 2, 1, 1, 0, 3, 1 points
Winner is player 6
> who_first(7)
Rolling for players 1, 2, 3, 4, 5, 6, 7
Players rolled 3, 1, 2, 1, 1, 3, 2 points
Rolling for players 1, 6
Players rolled 0, 3 points
Winner is player 6
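Since each die shows W with probability 1/2 and the other faces are interchangeable, each player's W count is Binomial(4, 1/2). A compact variant of the same algorithm can therefore draw the counts with rbinom() instead of sample(); who_first_binom is a hypothetical name, and because it uses a different random stream, set.seed() will not reproduce the runs shown above:

```r
# P(0, 1, 2, 3 or 4 Ws) for one player: 1/16, 4/16, 6/16, 4/16, 1/16
dbinom(0:4, size = 4, prob = 1/2)

who_first_binom <- function(n) {
  players <- seq_len(n)
  repeat {
    # number of Ws rolled by each remaining player (4 dice, P(W) = 1/2 each)
    results <- rbinom(length(players), size = 4, prob = 1/2)
    leaders <- players[results == max(results)]
    if (length(leaders) == 1) return(leaders)
    players <- leaders   # only the tied leaders roll again
  }
}

who_first_binom(7)
```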
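I have the following function, which gives each element of v a bucket index and starts a new bucket after the element that pushes the running sum over N:
bucketIndex <- function(v, N){
  o <- rep(0, length(v))
  curSum <- 0
  index <- 1
  for(i in seq(length(v))){
    o[i] <- index
    curSum <- curSum + v[i]
    if(curSum > N){
      curSum <- 0
      index <- index + 1
    }
  }
  o
}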
> bucketIndex(c(1, 1, 2, 1, 5, 1), 3)
[1] 1 1 1 2 2 3
I'm wondering if this function is fundamentally un-vectorizable. If it is, is there some package to deal with this "class" of functions, or is the only alternative (if I want speed) to write it as a C extension?
Here's a try (does not yet arrive at bucketIndex!):
Your
curSum <- curSum + v[i]
if(curSum > N){
  curSum <- 0
  index <- index + 1
}
is almost an integer division of the running total, cumsum(v) %/% N.
But not quite: your index only counts up by 1 even if v[i] is several times larger than N, and you start at 1. We can almost take care of that by converting to a factor and back to integer.
However, I'm wondering (from the name of the function) whether this behaviour is really intended:
> bucketIndex (c(1, 1, 2, 1, 2, 1, 1, 2, 1, 5, 1), 3)
[1] 1 1 1 2 2 2 3 3 3 4 5
> bucketIndex (c(1, 1, 1, 2, 2, 1, 1, 2, 1, 5, 1), 3)
[1] 1 1 1 1 2 2 2 3 3 3 4
I.e. just exchanging two consecutive entries in v can lead to a different maximum in the result.
The other point is that you count up only after the element that causes the sum to exceed N, which means that the integer-division result needs an additional 1 at the beginning, with its last element dropped.
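Putting the integer division, the conversion to a factor and back, and the one-element shift together gives something like the following sketch (approx_bucketIndex is a hypothetical name, and this is not necessarily the code the answer had in mind). It matches bucketIndex on the example from the question but, as the second call shows, not in general, because bucketIndex throws away whatever the running sum overshoots N by:

```r
approx_bucketIndex <- function(v, N) {
  # group by integer division of the running total, renumbered 1, 2, 3, ...
  g <- as.integer(factor(cumsum(v) %/% N))
  # shift by one: a new bucket starts only *after* the element that crosses N
  c(1L, head(g, -1))
}

approx_bucketIndex(c(1, 1, 2, 1, 5, 1), 3)
# 1 1 1 2 2 3, same as bucketIndex
approx_bucketIndex(c(1, 1, 1, 2, 2, 1, 1, 2, 1, 5, 1), 3)
# 1 1 1 2 2 3 3 4 4 5 6, whereas bucketIndex gives 1 1 1 1 2 2 2 3 3 3 4
```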
You reset curSum to 0 regardless of how far it shoots over N. So for all elements with cumsum(v) > N, you'd need to subtract this value, then look for the next cumsum(v) > N, and so on. This reduces the number of loop iterations compared to your for loop, but whether it gives you a substantial improvement depends on the entries of v and on N (or, more precisely, on the max(index) : length(v) ratio). If that ratio is 50% as in your example, I don't think you can get a substantial gain. Unless there is at least an order of magnitude between them, I'd go for inline::cfunction.
I'm going to go out on a limb here and say the answer is "no." Essentially, you're changing what it is you sum over based on the results of the current sum. This means future calculations depend on the result of an intermediate calculation, which vectorized operations can't do.
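To make the sequential dependence concrete: the loop is a fold over v whose state is the pair (running sum, current bucket). The sketch below (bucketIndex_reduce is a hypothetical name) writes it with base R's Reduce(accumulate = TRUE). This only restates the loop in functional form; it should not be expected to be faster, since Reduce still calls an R function once per element:

```r
bucketIndex_reduce <- function(v, N) {
  # state = c(running sum, bucket index for the *next* element)
  step <- function(state, x) {
    s <- state[1] + x
    if (s > N) c(0, state[2] + 1) else c(s, state[2])
  }
  # states[[i]] is the state *before* element i is processed
  states <- Reduce(step, v, init = c(0, 1), accumulate = TRUE, simplify = FALSE)
  vapply(states[seq_along(v)], function(s) s[2], numeric(1))
}

bucketIndex_reduce(c(1, 1, 2, 1, 5, 1), 3)   # 1 1 1 2 2 3
```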
I don't think that this is completely vectorizable, but @cbeleites gets at one way to reduce the number of loop iterations by dealing with a whole chunk (bucket) at a time. Each iteration looks for where the cumulative sum first exceeds N, assigns the current index to that range, subtracts the cumulative sum at that point from all cumulative sums, and repeats until the vector is exhausted. The rest is bookkeeping (initializing and incrementing values).
bucketIndex2 <- function(v, N) {
  index <- 1
  cs <- cumsum(v)
  bk.old <- 0
  o <- rep(0, length(v))
  repeat {
    # first position where the (shifted) running sum exceeds N; Inf if none
    bk <- suppressWarnings(min(which(cs > N)))
    o[(bk.old+1):min(bk, length(v))] <- index
    if (bk >= length(v)) break
    cs <- cs - cs[bk]   # restart the running sum after position bk
    index <- index + 1
    bk.old <- bk
  }
  o
}
This matches your function for a variety of random inputs:
for (i in 1:200) {
  v <- sample(sample(20,1), sample(50,1)+20, replace=TRUE)
  N <- sample(10,1)
  bi <- bucketIndex(v, N)
  bi2 <- bucketIndex2(v, N)
  if (any(bi != bi2)) {
    print("MISMATCH:")
    dump("v","")
    dump("N","")
  }
}