I am trying to determine the point (x,y) where two functions intersect. The functions are step interpolations between sets of points. One function is weakly increasing (v1) and the other weakly decreasing (v2). I'm coding in R, but a general algorithm is also OK.
If it helps, this is to determine market equilibrium with sets of supply and demand points.
The two sets of points have different lengths, and their x and y values will not coincide.
Some example data:
set.seed(4)
v1 = data.frame( y = cumsum( runif(10) ) ,
x = cumsum( runif(10) ) )
v2 = data.frame( y = 5-cumsum( runif(8) ) ,
x = cumsum( runif(8) ) )
plot(y=0,x=0,type="n",xlim=c(0,5),ylim=c(0,5),xlab="x",ylab="y")
lines( y=v1$y , x=v1$x , type="S" , col="blue" )
lines( y=v1$y , x=v1$x , type="p" , col="blue" )
lines( y=v2$y , x=v2$x , type="s" , col="red" )
lines( y=v2$y , x=v2$x , type="p" , col="red" )
In this example, the intersection is at (x=2.7275363 , y=2.510405), where the x is from v2 and y is from v1.
Thanks
I was facing the same issue but needed the code to be fast, so I used the wonderful Rcpp to speed it up.
If anybody is interested, this is what I did:
library(dplyr) # for data manipulation only, not used for the algorithm!
library(ggplot2) # for data graphing only, not used for the algorithm!
# Load (i.e., Source the Cpp function)
Rcpp::sourceCpp("find_optimum.cpp")
# small helper function that plots the supply and demand as a step-function
plot_supply_demand <- function(supply, demand) {
supply_df <- supply %>%
bind_rows(data_frame(p = -Inf, q = 0)) %>%
arrange(p) %>%
mutate(agg_q = cumsum(q), side = "supply") %>%
bind_rows(data_frame(p = Inf, q = 0, agg_q = sum(supply$q), side = "supply"))
demand_df <- demand %>%
bind_rows(data_frame(p = Inf, q = 0)) %>%
arrange(desc(p)) %>%
mutate(agg_q = cumsum(q), side = "demand") %>%
bind_rows(data_frame(p = -Inf, q = 0, agg_q = sum(demand$q), side = "demand"))
ggplot(mapping = aes(x = p, y = agg_q, color = side)) +
geom_step(data = demand_df, direction = "vh") +
geom_step(data = supply_df)
}
# create two data_frames containing the disaggregated data (i.e., orders)
# by graphing the data, or by calculating it by hand we see the optimum at (10, 2)
supply_small = data_frame(p = c(8, 10),
q = c(1, 2))
demand_small = data_frame(p = c(12, 10, 8),
q = c(1, 1, 1))
plot_supply_demand(supply_small, demand_small) +
geom_point(aes(x = 10, y = 2), color = "red", size = 5)
find_optimum(supply_small$p, supply_small$q, demand_small$p, demand_small$q)
#> $price
#> [1] 10
#>
#> $quantity
#> [1] 2
Larger example
set.seed(12345678)
demand <- data_frame(p = runif(100, 80, 200), q = rnorm(100, 10, 2))
supply <- data_frame(p = runif(100, 0, 120), q = rnorm(100, 10, 2))
opt <- find_optimum(supply$p, supply$q, demand$p, demand$q)
opt
#> $price
#> [1] 102.5982
#>
#> $quantity
#> [1] 841.8772
plot_supply_demand(supply, demand) +
geom_point(aes(x = opt$price, y = opt$quantity), color = "red", size = 2)
To zoom in a bit on the optimum, we can use the following
plot_supply_demand(supply, demand) +
geom_point(aes(x = opt$price, y = opt$quantity), color = "red", size = 2) +
xlim(opt$price + c(-10, 10)) + ylim(opt$quantity + c(-50, 50))
#> Warning: Removed 92 rows containing missing values (geom_path).
#> Warning: Removed 93 rows containing missing values (geom_path).
Created on 2018-10-20 by the reprex package (v0.2.0).
Rcpp Function
And last but not least, the C++ function in find_optimum.cpp that does the heavy lifting:
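#include <Rcpp.h>
#include <iterator> // std::next
#include <map>
// C++11 is needed for 'auto' and std::next
// [[Rcpp::plugins(cpp11)]]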
// [[Rcpp::export]]
Rcpp::List find_optimum(Rcpp::NumericVector price_supply,
Rcpp::NumericVector quant_supply,
Rcpp::NumericVector price_demand,
Rcpp::NumericVector quant_demand) {
std::map<double, double> supply;
std::map<double, double> demand;
// fill the maps
for (int i = 0; i < price_supply.size(); ++i) {
supply[price_supply[i]] += quant_supply[i];
}
for (int i = 0; i < price_demand.size(); ++i) {
demand[price_demand[i]] += quant_demand[i];
}
if (supply.empty() || demand.empty())
return Rcpp::List::create(Rcpp::Named("price") = 0, Rcpp::Named("quantity") = 0);
auto sIt = supply.begin(), nextS = std::next(sIt, 1);
const auto endS = supply.end();
auto dIt = demand.rbegin(), nextD = std::next(dIt, 1);
const auto endD = demand.rend();
// quantity and prices at either side
double pS = sIt->first, pD = dIt->first;
double qS = 0, qD = 0;
// next prices
double nextPS = nextS->first, nextPD = nextD->first;
if (pD < pS)
return Rcpp::List::create(Rcpp::Named("price") = 0, Rcpp::Named("quantity") = 0);
// add the best price from each side!
qS += sIt->second;
qD += dIt->second;
while (pS < pD) {
if (nextS == endS && nextD == endD) {
pD = qD < qS ? pS : pD;
break;
}
while (qS <= qD && sIt != endS && nextS->first <= pD) {
++sIt;
++nextS;
pS = sIt->first;
qS += sIt->second;
}
if (sIt == endS) break;
if (nextD->first < pS) {
pD = qD < qS ? pS : pD;
break;
}
while (qD < qS && dIt != endD && nextD->first >= pS) {
++dIt;
++nextD;
pD = dIt->first;
qD += dIt->second;
}
if (dIt == endD) break;
}
double price = pD;
double vol = qS < qD ? qS : qD;
return Rcpp::List::create(Rcpp::Named("price") = price,
Rcpp::Named("quantity") = vol);
}
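For small inputs it can help to cross-check the C++ result against a slow but transparent version. The following is only a sketch in plain R, not the algorithm above: the name `clearing_sketch` is made up for illustration, and it assumes the optimum is the price that maximises the tradable quantity, i.e. the smaller of (supply offered at or below p) and (demand bid at or above p), with ties going to the lowest such price. That tie rule may well differ from what find_optimum() does.

```r
# Brute-force cross-check (hypothetical helper, not part of find_optimum.cpp)
clearing_sketch <- function(price_supply, quant_supply, price_demand, quant_demand) {
  # only prices that appear in an order can be a step boundary
  prices <- sort(unique(c(price_supply, price_demand)))
  # quantity offered by sellers asking at most p
  supplied <- sapply(prices, function(p) sum(quant_supply[price_supply <= p]))
  # quantity wanted by buyers bidding at least p
  demanded <- sapply(prices, function(p) sum(quant_demand[price_demand >= p]))
  traded <- pmin(supplied, demanded)
  best <- which.max(traded)   # first maximum, i.e. the lowest such price
  list(price = prices[best], quantity = traded[best])
}

res <- clearing_sketch(c(8, 10), c(1, 2), c(12, 10, 8), c(1, 1, 1))
res
```

On the small example this gives price 10 and quantity 2, the same optimum that is marked on the plot.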
You're drawing your step lines differently in each case: for v1 you change the vertical first and then the horizontal (up and across), whereas for v2 you reverse the order (across then down). Assuming this is correct, your intersection point will be at or immediately after a point in v1 where the next point along the x axis is a v2 point with a lower y coordinate (v1 is increasing, so a following point with a lower y can only come from v2). We can find that by doing:
v1$v <- 1
v2$v <- 2
v3 <- rbind(v1,v2)
v3 <- v3[order(v3$x),]
v3$diff <- c( diff(v3$y),0)
ind <- which(v3$diff < 0 & v3$v ==1)[1]
There are now two distinct cases: the intersection could be on the horizontal or the vertical arm following this point from v1. It will be the former if the immediately preceding v2 is higher than the v1 after our found one; otherwise it will be on the vertical arm. This is clear if you draw it out; I'll try to attach an image if you don't see this.
previousV2 <- tail(which(v3$v[1:ind]==2),1)
nextV1 <- which(v3$v[-(1:ind)]==1)[1] + ind
if (v3$y[previousV2] > v3$y[nextV1]) {
x <- v3$x[ind+1]
y <- v3$y[nextV1]
} else {
x <- v3$x[ind]
y <- v3$y[previousV2]
}
Worryingly, this doesn't agree with your (x=2.7275363 , y=2.510405) answer, but when I plot it, mine appears on the intersection. So either: I haven't understood what you want; you've miscalculated; or there's a different scheme regarding the order of horizontal and vertical components. The above code should be adaptable to different schemes.
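Since the answer depends on which step convention each curve uses, here is a small cross-check that makes the convention explicit. It is a sketch on made-up toy data, not the code above: it assumes the two curves share no x value, that they really cross inside their common x range, and that `type = "S"` corresponds to `f = 1` and `type = "s"` to `f = 0` in approxfun().

```r
# Toy data (hypothetical): v1 weakly increasing, v2 weakly decreasing
v1 <- data.frame(x = c(1, 2, 3, 4), y = c(1, 2, 3, 4))
v2 <- data.frame(x = c(0.5, 1.5, 2.5, 3.5), y = c(4.5, 3.5, 2.5, 1.5))

# type = "S" (up, then across): right of a knot the level is the NEXT y, so f = 1
f1 <- approxfun(v1$x, v1$y, method = "constant", f = 1)
# type = "s" (across, then down): right of a knot the level is the CURRENT y, so f = 0
f2 <- approxfun(v2$x, v2$y, method = "constant", f = 0)

# Breakpoints inside the x range covered by both curves
lo <- max(min(v1$x), min(v2$x))
hi <- min(max(v1$x), max(v2$x))
breaks <- sort(unique(c(v1$x, v2$x)))
breaks <- breaks[breaks >= lo & breaks <= hi]

# Both curves are flat between breakpoints, so compare them at interval midpoints
mids <- (head(breaks, -1) + tail(breaks, -1)) / 2
gap <- f1(mids) - f2(mids)
k <- which(gap >= 0)[1]   # first interval where v1 is no longer below v2

if (is.na(k)) {
  crossing <- c(x = NA, y = NA)
} else {
  x_star <- breaks[k]
  # If v2 jumps at x_star, v1 is flat there and supplies y; otherwise v2 does
  y_star <- if (x_star %in% v2$x) f1(mids[k]) else f2(mids[k])
  crossing <- c(x = x_star, y = y_star)
}
crossing
```

For the toy data the crossing is at x = 2.5 (a v2 knot) and y = 3 (a v1 level), i.e. the same "x from v2, y from v1" pattern as in the question. Swapping the two `f` values gives the other drawing scheme.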
I seem to have something that works, but it's a lot more complicated than I was expecting.
First, let me define a helper function
between <- function(x, a, b) {
if(missing(b)) {
if(length(a)==2) {
a<-t(a)
}
} else {
a <- unname(cbind(a,b))
}
a<-t(apply(a,1,sort))
a[,1] <= x & x <= a[,2]
}
This just helps to check whether a number lies between two others. Now I will embed the two data.frames to make sets of consecutive point pairs, and then check each possible combination for segments that overlap in just the right way. (It's important that v1 here is the "S" and v2 is the "s".)
sa<-embed(as.matrix(v1[,c("x","y")]),2)
sz<-embed(as.matrix(v2[,c("x","y")]),2)
xx<-outer(1:nrow(sa), 1:nrow(sz), function(a,z)
(between(sa[a,2], sz[z,c(2,4)]) & between(sz[z,1], sa[a,c(1,3)])) *1
+ (between(sz[z,4], sa[a,c(2,4)]) & between(sa[a,3], sz[z,c(1,3)]))*2
)
Now xx contains the matching set of points, I just need to extract the correct coordinates depending on which type of intersection occurred.
i <- which(xx!=0, arr.ind=T)
int.pt <- if(nrow(i)>0 && ncol(i)==2) {
if(xx[i]==1) {
c(sz[i[2],1], sa[i[1],2])
} else if (xx[i]==2) {
c(sa[i[1],3], sz[i[2],4])
}
} else {
c(NA,NA)
}
#optionally plot intersection
#if (all(!is.na(int.pt))) {
# points(int.pt[1],int.pt[2], pch=20, col="black")
# abline(v=int.pt[1], h=int.pt[2], lty=2)
#}
Perhaps there is a better way, but at least you have another method that seems to work to compare answers with.
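In case the embed() step above is unclear, this small illustration on made-up points shows what it produces for a two-column matrix: each row pairs a point with its predecessor, in the column order (x, y, previous x, previous y). That is why, in `sa` and `sz`, columns 1 and 3 hold the two x values of a segment and columns 2 and 4 the two y values.

```r
# Three made-up points, one per row
pts <- cbind(x = c(1, 2, 3), y = c(10, 20, 30))

# dimension = 2 pairs every point with the one before it
pairs <- embed(pts, 2)
pairs
# row 1: point 2 followed by point 1 -> 2 20 1 10
# row 2: point 3 followed by point 2 -> 3 30 2 20
```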
I had another think about the problem. A key issue is that I need to find the intersection within an optimisation routine, so it has to be fast. So, I came up with the following (included here in case others have the same problem in the future). It is a modified Bentley-Ottmann algorithm.
# create some data
supply = data.frame( p = cumsum( runif(1000) ) ,
q = cumsum( runif(1000) ) )
demand = data.frame( p = tail(supply,1)$p - cumsum( runif(1000) ) ,
q = cumsum( runif(1000) ) )
# create tables that identify coordinates of horizontal and vertical lines
demand.h = cbind( p = head(demand,-1)$p ,
q.lower = head(demand,-1)$q ,
q.upper = tail(demand,-1)$q )
supply.v = cbind( q = head(supply,-1)$q ,
p.lower = head(supply,-1)$p ,
p.upper = tail(supply,-1)$p )
demand.v = cbind( q = tail(demand,-1)$q ,
p.lower = tail(demand,-1)$p ,
p.upper = head(demand,-1)$p )
supply.h = cbind( p = tail(supply,-1)$p ,
q.lower = head(supply,-1)$q ,
q.upper = tail(supply,-1)$q )
# define a function
find.intersection = function( f.A , f.B ){
f.result = any( f.B[,2]<=f.A[1] & f.B[,3]>=f.A[1] &
f.A[2] <=f.B[,1] & f.A[3] >=f.B[,1] )
return( f.result )
}
# find the intersection
intersection.h = c( demand.h[ apply( demand.h ,
MARGIN=1 ,
FUN=find.intersection ,
supply.v ) , 1 ] ,
supply.v[ apply( supply.v ,
MARGIN=1 ,
FUN=find.intersection ,
demand.h ) , 1 ] )
intersection.v = c( supply.h[ apply( supply.h ,
MARGIN=1 ,
FUN=find.intersection ,
demand.v ) , 1 ] ,
demand.v[ apply( demand.v ,
MARGIN=1 ,
FUN=find.intersection ,
supply.h ) , 1 ] )
intersection = c( intersection.h , intersection.v )
# (optional) if you want to print the graph and intersection
plot(y=0,x=0,type="n",
xlim=c(intersection[2]-1,intersection[2]+1),
ylim=c(intersection[1]-1,intersection[1]+1),
xlab="q",ylab="p")
lines( y=supply$p , x=supply$q , type="S" , col="black" )
lines( y=supply$p , x=supply$q , type="p" , col="black" )
lines( y=demand$p , x=demand$q , type="s" , col="black" )
lines( y=demand$p , x=demand$q , type="p" , col="black" )
points(intersection[2],intersection[1], pch=20, col="red")
abline( v=intersection[2], h=intersection[1], lty=2 , col="red")
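The test inside find.intersection() is an axis-aligned segment check: a horizontal and a vertical segment cross when each one's fixed coordinate lies within the other's range. Here it is for a single pair, as a sketch; `segments_cross` is a name made up for this illustration, and each segment is assumed to be stored as c(fixed, lower, upper), like the rows of the tables above.

```r
# Each segment is c(fixed, lower, upper): 'fixed' is its constant coordinate,
# 'lower'/'upper' bound its extent along the other axis
segments_cross <- function(seg_a, seg_b) {
  seg_b[2] <= seg_a[1] && seg_a[1] <= seg_b[3] &&
    seg_a[2] <= seg_b[1] && seg_b[1] <= seg_a[3]
}

horizontal <- c(2, 1, 3)   # p = 2, q running from 1 to 3
vertical   <- c(2, 1, 3)   # q = 2, p running from 1 to 3

hit  <- segments_cross(horizontal, vertical)     # TRUE: they meet at (q = 2, p = 2)
miss <- segments_cross(horizontal, c(4, 1, 3))   # FALSE: q = 4 lies outside [1, 3]
c(hit, miss)
```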
I am trying to make a table from a while loop. Basically, I want to make a while loop where the value of r increases by 1 and repeats this until the inequality is met. But in addition to that, I want to combine these values into a table with three columns: the value of r, the value of w, and the value of rhs (rounded to 3 decimal places).
```{r}
al = 0.10; n = 30; a = 3; b = 5; r = 2; int = 8; h = (int/2); msE = 19.19
table = function(MSE, V, H, alpha = al, r = 2){
rhs = h^2*r/((V-1)*MSE)
w = qf(alpha, V-1, V*(r-1), lower.tail = FALSE)
g = data.frame(r, round(w, 3), round(rhs, 3))
while(w > rhs){
r = r+1
rhs = h^2*r/((V-1)*MSE)
w = qf(alpha, V-1, V*(r-1), lower.tail = FALSE)
g = data.frame(r, round(w, 3), round(rhs, 3))
}
rbind(g)
}
table(MSE = msE, V = a*b, H = h)
```
I figured it would go something like this, but this only prints out the last value of r before the loop ends (it ends at 26), which results in a "table" that only has one row. I would like a table with 24 rows (since it starts at r = 2).
Any help would be appreciated!
Perhaps this might help:
al = 0.10; n = 30; a = 3; b = 5; r = 2; int = 8; h = (int/2); msE = 19.19
table = function(MSE, V, H, alpha = al, r = 2){
rhs = h^2*r/((V-1)*MSE)
w = qf(alpha, V-1, V*(r-1), lower.tail = FALSE)
g = data.frame(r, round(w, 3), round(rhs, 3))
gn = data.frame(r, round(w, 3), round(rhs, 3))
while(w > rhs){
r = r+1
rhs = h^2*r/((V-1)*MSE)
w = qf(alpha, V-1, V*(r-1), lower.tail = FALSE)
g = data.frame(r, round(w, 3), round(rhs, 3))
gn <- rbind(gn,g)
}
return(gn)
}
table(MSE = msE, V = a*b, H = h)
A slightly different approach, eliminating the need for an interim data frame and for rbind(). Commented in the code.
# your parameters
al <- 0.10; n <- 30; a <- 3; b <- 5; int <- 8; h <- (int/2); msE <- 19.19
# your function definition (name changed to avoid confusion / conflict with existing R function)
tabula <- function(MSE, V, H, alpha = al, r = 2)
{
g <- data.frame( N = 0, W = 1, RHS = 0 ) # initiate data frame, values set
# so that the while condition is met
# the while function populates the data frame cell by cell,
# eliminating the need for an interim data.frame and rbind()
while( g[ r - 1, "W" ] > g[ r - 1, "RHS" ] ) # check condition in the last data frame row
{ # write values in a new row
g[ r, "N" ] <- r
g[ r, "W" ] <- round( qf( alpha, V - 1, V * ( r - 1 ), lower.tail = FALSE ), 3 )
g[ r, "RHS" ] <- round( h^2 * r / ( ( V - 1 ) * MSE ), 3 )
r <- r + 1 # increment row counter
}
return( g[ -1, ] ) # return the data frame, removing the initial row
}
tabula( MSE = msE, V = a * b, H = h )
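Another common pattern for building a table from a loop is to collect one row per iteration in a list and bind them once at the end. The sketch below only shows the pattern: `w` and `rhs` are simple stand-in formulas chosen so the loop stops quickly, not the qf()-based expressions from the question.

```r
rows <- list()
r <- 2
repeat {
  w   <- 100 / r   # stand-in for the qf() critical value
  rhs <- r^2       # stand-in for the right-hand side
  rows[[length(rows) + 1]] <- data.frame(r = r, w = round(w, 3), rhs = round(rhs, 3))
  if (w <= rhs) break   # stop once the inequality is met, keeping that last row
  r <- r + 1
}

# bind all collected rows in one go
result <- do.call(rbind, rows)
result
```

With these stand-ins the table has four rows (r = 2 to 5); the last row is the first one where `w` is no longer greater than `rhs`.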
My question is how to change the value of one of my variables (stored in LO$M in my list) once a certain time has passed.
What I am trying to achieve is that, once 20,000 seconds have passed, Lac is set to its value at time 20,000 plus 10,000.
So at t = 20,000, Lac = Lac + 10,000
The issue with my code is that the if statement tests tt >= 20000, so every value of Lac after 20,000 is increased by 10,000, when what I want is for only the FIRST value after 20,000 to be increased by 10,000.
Basically, once 20,000 seconds of my experiment have passed, I am trying to inject 10,000 more Lac into the experiment.
My code is given below:
LO = list()
LO$M = c(i = 1, ri = 0, I = 50, Lac = 20, ILac = 0, o = 1, Io = 0, RNAP = 100, RNAPo = 0, r = 0, z = 0)
LO$Pre = matrix(c(1,0,0,0,0,0,0,0,0,0,0,
0,1,0,0,0,0,0,0,0,0,0,
0,0,1,1,0,0,0,0,0,0,0,
0,0,0,0,1,0,0,0,0,0,0,
0,0,1,0,0,1,0,0,0,0,0,
0,0,0,0,0,0,1,0,0,0,0,
0,0,0,0,0,1,0,1,0,0,0,
0,0,0,0,0,0,0,0,1,0,0,
0,0,0,0,0,0,0,0,1,0,0,
0,0,0,0,0,0,0,0,0,1,0,
0,0,0,1,0,0,0,0,0,0,1,
0,1,0,0,0,0,0,0,0,0,0,
0,0,1,0,0,0,0,0,0,0,0,
0,0,0,0,1,0,0,0,0,0,0,
0,0,0,0,0,0,0,0,0,1,0,
0,0,0,0,0,0,0,0,0,0,1), ncol=11, byrow=TRUE)
LO$Post = matrix(c(1,1,0,0,0,0,0,0,0,0,0,
0,1,1,0,0,0,0,0,0,0,0,
0,0,0,0,1,0,0,0,0,0,0,
0,0,1,1,0,0,0,0,0,0,0,
0,0,0,0,0,0,1,0,0,0,0,
0,0,1,0,0,1,0,0,0,0,0,
0,0,0,0,0,0,0,0,1,0,0,
0,0,0,0,0,1,0,1,0,0,0,
0,0,0,0,0,1,0,1,0,1,0,
0,0,0,0,0,0,0,0,0,1,1,
0,0,0,0,0,0,0,0,0,0,1,
0,0,0,0,0,0,0,0,0,0,0,
0,0,0,0,0,0,0,0,0,0,0,
0,0,0,1,0,0,0,0,0,0,0,
0,0,0,0,0,0,0,0,0,0,0,
0,0,0,0,0,0,0,0,0,0,0), ncol=11, byrow=TRUE)
LO$h = function(x,t,th=c(0.02,0.1,0.005,0.1,1,0.01,0.1,0.01,0.03,0.1,1e-05,0.01,0.002,0.01,0.001))
{
with(as.list(c(x, th)), {
return(c(th[1]*i, th[2]*ri, th[3]*I*Lac, th[4]*ILac, th[5]*I*o, th[6]*Io, th[7]*o*RNAP,
th[8]*RNAPo, th[9]*RNAPo, th[10]*r, th[11]*Lac*z, th[12]*ri, th[13]*I,
th[13]*ILac, th[14]*r, th[15]*z))
})
}
gillespie1 = function (N, n, ...)
{
tt = 0
x = N$M
S = t(N$Post - N$Pre)
u = nrow(S)
v = ncol(S)
tvec = vector("numeric", n)
xmat = matrix(ncol = u, nrow = n + 1)
xmat[1, ] = x
for (i in 1:n) {
h = N$h(x, tt, ...)
tt = tt + rexp(1, sum(h))
j = sample(v, 1, prob = h)
x = x + S[, j]
tvec[i] = tt
xmat[i + 1, ] = x
if( tt >=20000){
x[4] = x[4] +10000
}
}
return(list(t = tvec, x = xmat))
}
newout = gillespie1(LO,200000)
matplot(newout$x[,4], type="l", lwd=0.25, col="grey")
I don't have a high enough reputation to attach images, but it should look something like this:
https://gyazo.com/0ffd940a22df23b2ccfdf4a17e85dca8
Sorry if this isn't clear. Thanks
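One possible way to get a one-off injection is to remember whether it has already happened. The sketch below shows the idea in isolation, with made-up event times and a plain counter instead of the full simulation; `injected`, `lac` and `event_times` are hypothetical names, not part of the code above.

```r
injected <- FALSE                                     # has the injection happened yet?
lac <- 20                                             # hypothetical starting amount of Lac
event_times <- c(5000, 15000, 20500, 26000, 30000)    # made-up event times
lac_trace <- numeric(length(event_times))

for (i in seq_along(event_times)) {
  tt <- event_times[i]
  if (!injected && tt >= 20000) {
    lac <- lac + 10000    # inject once ...
    injected <- TRUE      # ... and never again
  }
  lac_trace[i] <- lac
}
lac_trace
```

In gillespie1() the same idea would mean creating the flag next to `tt = 0` and testing it together with `tt >= 20000`.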
In this example, you have the function myTask(). When you call execMyTask(), myTask() is executed once, and after that it is executed at random intervals of between 1 and max_wait milliseconds. When you get tired, you can kill the task with tclTaskDelete().
library(tcltk2)
myTask <- function() cat("some task!\n")
id = "execMyTask"
execMyTask <- function(max_wait = 3000) {
id <- toString(match.call()[[1]])
myTask()
wait = sample(1:max_wait, 1)
cat("Waiting", wait, "milliseconds\n") # replace with your function
if (is.null(tclTaskGet(id))) {
tclTaskSchedule(wait=wait, execMyTask(), id=id, redo = TRUE)
} else {
tclTaskChange(wait=wait, execMyTask(), id=id, redo = TRUE)
}
}
execMyTask()
tclTaskDelete(id)
So far, there is one small problem with this approach: we cannot supply arguments to the function fun in tclTaskChange().
Don't be scared by my long code. What I am wondering about is the last part, the plot(stepfun(...)) call. When I enter this into RStudio I get "stepfun "x" must be ordered increasingly".
Is there anyone here who knows what I have to do to finish this correctly?
bd_process <- function(lambda, mu, initial_state = 0, steps = 100) {
time_now <- 0
state_now <- initial_state
time <- 0
state <- initial_state
for (i in 1:steps) {
if (state_now == 3) {
lambda_now <- 0
} else {
lambda_now <- lambda
}
if (state_now == 0) {
mu_now <- 0
} else {
mu_now <- mu
}
time_to_transition <- rexp(mu, rate = 1) + rexp(lambda, rate = 1)
X <- rexp(mu, rate = 1)
Y <- rexp(lambda, rate = 1)
if (X < Y) {
state_now <- state_now - 1
} else {
state_now <- state_now + 1
}
time_now <- time_now + time_to_transition
time <- c(time, time_now)
state <- c(state, state_now)
}
list(time = time, state = state) }
set.seed(19930628)
proposal1 <- bd_process(lambda = 2, mu = 10)
proposal2 <- bd_process(lambda = 6, mu = 10)
proposal3 <- bd_process(lambda = 10, mu = 10)
time1 <- proposal1$time
state1 <- proposal1$state
plot(stepfun(time1[-1], state1),
do.points = FALSE,
xlab = "Tid",
ylab = "Tillstånd",
main = "",
yaxt = "n")
axis(2, at = c(0, 1, 2, 3), las = 2)
I don't know what your code is doing but you've asked us not to worry about that. At the moment it appears that you have only constructed "time intervals" but now need to "stack them together" or "integrate" them along a proper time axis. In order to plot a simulation of a stepfunction, you should be using cumsum to construct an increasing time1 vector. Because the "time" and "state" variables are of such different lengths a quick fix to the function arguments is trimming the time1 vector so it is the correct length for the state1 variable, and you get no error with:
plot(stepfun(cumsum(time1[2:101]), state1),
do.points = FALSE,
xlab = "Tid",
ylab = "Tillstånd",
main = "",
yaxt = "n")
axis(2, at = c(0, 1, 2, 3), las = 2)
Maybe if you "march step-by-step" through the code and explain the code (to yourself and the rest of us) using comments you will figure out why you have 10 times as many time1's as you have state1's. I suspect it may have something to do with using "mu" as the first argument to rexp(mu, rate = 1). The first argument to random number generators in R is usually a positive integer that determines length (the number of random numbers) from the distribution.
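To make the two points above concrete (the first argument of rexp() is a count, and stepfun() wants increasing jump times with one more state than jumps), here is a small standalone sketch. The states are made up for illustration and the rates are arbitrary; it is not a corrected version of bd_process().

```r
set.seed(42)

n_many <- length(rexp(10, rate = 1))   # 10 draws: the first argument is the count
n_one  <- length(rexp(1, rate = 10))   # 1 draw: a single waiting time with rate 10

waits  <- rexp(5, rate = 2)            # five waiting times between transitions
times  <- cumsum(waits)                # accumulated, hence increasing, jump times
states <- c(0, 1, 2, 1, 2, 3)          # one more state than jump times (made up)

sf <- stepfun(times, states)           # no "must be ordered increasingly" error here
sf(0)                                  # state before the first jump
```

Drawing a single waiting time per step, e.g. `rexp(1, rate = ...)`, and accumulating the times would keep the time and state vectors the same length.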
I'm using Sutton & Barto's ebook Reinforcement Learning: An Introduction to study reinforcement learning. I'm having some issues trying to emulate the results (plots) on the action-value page.
More specifically, how can I simulate the greedy value for each task? The book says:
...we can plot the performance and behavior of various methods as
they improve with experience over 1000 plays...
So I guess I have to keep track of the exploratory values as better ones are found. The issue is how to do this using the greedy approach - since there are no exploratory moves, how do I know what is a greedy behavior?
Thanks for all the comments and answers!
UPDATE: See code on my answer.
I finally got this right. The eps player should beat the greedy player because of the exploratory moves, as pointed out in the book.
The code is slow and needs some optimization, but here it is:
get.testbed = function(arms = 10, plays = 500, u = 0, sdev.arm = 1, sdev.rewards = 1){
optimal = rnorm(arms, u, sdev.arm)
rewards = sapply(optimal, function(x)rnorm(plays, x, sdev.rewards))
list(optimal = optimal, rewards = rewards)
}
play.slots = function(arms = 10, plays = 500, u = 0, sdev.arm = 1, sdev.rewards = 1, eps = 0.1){
testbed = get.testbed(arms, plays, u, sdev.arm, sdev.rewards)
optimal = testbed$optimal
rewards = testbed$rewards
optim.index = which.max(optimal)
slot.rewards = rep(0, arms)
reward.hist = rep(0, plays)
optimal.hist = rep(0, plays)
pulls = rep(0, arms)
probs = runif(plays)
# TODO: vectorize
for (i in 1:plays){
## don't use ifelse() in this case
## idx = ifelse(probs[i] < eps, sample(arms, 1), which.max(slot.rewards))
idx = if (probs[i] < eps) sample(arms, 1) else which.max(slot.rewards)
reward.hist[i] = rewards[i, idx]
if (idx == optim.index)
optimal.hist[i] = 1
slot.rewards[idx] = slot.rewards[idx] + (rewards[i, idx] - slot.rewards[idx])/(pulls[idx] + 1)
pulls[idx] = pulls[idx] + 1
}
list(slot.rewards = slot.rewards, reward.hist = reward.hist, optimal.hist = optimal.hist, pulls = pulls)
}
do.simulation = function(N = 100, arms = 10, plays = 500, u = 0, sdev.arm = 1, sdev.rewards = 1, eps = c(0.0, 0.01, 0.1)){
n.players = length(eps)
col.names = paste('eps', eps)
rewards.hist = matrix(0, nrow = plays, ncol = n.players)
optim.hist = matrix(0, nrow = plays, ncol = n.players)
colnames(rewards.hist) = col.names
colnames(optim.hist) = col.names
for (p in 1:n.players){
for (i in 1:N){
play.results = play.slots(arms, plays, u, sdev.arm, sdev.rewards, eps[p])
rewards.hist[, p] = rewards.hist[, p] + play.results$reward.hist
optim.hist[, p] = optim.hist[, p] + play.results$optimal.hist
}
}
rewards.hist = rewards.hist/N
optim.hist = optim.hist/N
optim.hist = apply(optim.hist, 2, function(x)cumsum(x)/(1:plays))
### Plot helper ###
plot.result = function(x, n.series, colors, leg.names, ...){
for (i in 1:n.series){
if (i == 1)
plot.ts(x[, i], ylim = 2*range(x), col = colors[i], ...)
else
lines(x[, i], col = colors[i], ...)
grid(col = 'lightgray')
}
legend('topleft', leg.names, col = colors, lwd = 2, cex = 0.6, box.lwd = NA)
}
### Plot helper ###
#### Plots ####
require(RColorBrewer)
colors = brewer.pal(n.players + 3, 'Set2')
op <-par(mfrow = c(2, 1), no.readonly = TRUE)
plot.result(rewards.hist, n.players, colors, col.names, xlab = 'Plays', ylab = 'Average reward', lwd = 2)
plot.result(optim.hist, n.players, colors, col.names, xlab = 'Plays', ylab = 'Optimal move %', lwd = 2)
#### Plots ####
par(op)
}
To run it just call
do.simulation(N = 100, arms = 10, eps = c(0, 0.01, 0.1))
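The line that updates `slot.rewards[idx]` in play.slots() is an incremental sample mean: each new reward moves the estimate by (reward - estimate) / (pulls + 1). The short check below, on made-up rewards for a single arm, shows that this reproduces the ordinary mean without storing the reward history.

```r
rewards <- c(1.0, 0.5, 2.0, -1.0)   # made-up rewards for one arm
estimate <- 0
pulls <- 0

for (reward in rewards) {
  # same update as slot.rewards[idx] in play.slots()
  estimate <- estimate + (reward - estimate) / (pulls + 1)
  pulls <- pulls + 1
}

same <- isTRUE(all.equal(estimate, mean(rewards)))
same
```

For the greedy player (eps = 0) this estimate is all there is to go on: the arm with the highest current estimate is pulled every time, which is what `which.max(slot.rewards)` does.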
You could also make use of the R package "contextual", which aims to ease the implementation and evaluation of both context-free (as described in Sutton & Barto) and contextual (such as LinUCB) Multi-Armed Bandit policies.
The package actually offers a vignette on how to replicate all Sutton & Barto bandit plots. For example, to generate the ε-greedy plots, just simulate EpsilonGreedy policies against a Gaussian bandit:
library(contextual)
set.seed(2)
mus <- rnorm(10, 0, 1)
sigmas <- rep(1, 10)
bandit <- BasicGaussianBandit$new(mu_per_arm = mus, sigma_per_arm = sigmas)
agents <- list(Agent$new(EpsilonGreedyPolicy$new(0), bandit, "e = 0, greedy"),
Agent$new(EpsilonGreedyPolicy$new(0.1), bandit, "e = 0.1"),
Agent$new(EpsilonGreedyPolicy$new(0.01), bandit, "e = 0.01"))
simulator <- Simulator$new(agents = agents, horizon = 1000, simulations = 2000)
history <- simulator$run()
plot(history, type = "average", regret = FALSE, lwd = 1, legend_position = "bottomright")
plot(history, type = "optimal", lwd = 1, legend_position = "bottomright")
Full disclosure: I am one of the developers of the package.
This is what I have so far, based on our chat:
set.seed(1)
getRewardsGaussian <- function(arms, plays) {
## assuming each action has a normal distribution
# first generate new means
QStar <- rnorm(arms, 0, 1)
# then for each mean, generate `play`-many samples
sapply(QStar, function(u)
rnorm(plays, u, 1))
}
CalculateRewardsPerMethod <- function(arms=7, epsi1=0.01, epsi2=0.1
, plays=1000, methods=c("greedy", "epsi1", "epsi2")) {
# names for easy handling
names(methods) <- methods
arm.names <- paste0("Arm", ifelse((1:arms)<10, 0, ""), 1:arms)
# this could be different if not all actions' rewards have a gaussian dist.
rewards.source <- getRewardsGaussian(arms, plays)
# Three dimensional array to track running averages of each method
running.avgs <-
array(0, dim=c(plays, arms, length(methods))
, dimnames=list(PlayNo.=NULL, Arm=arm.names, Method=methods))
# Three dimensional array to track the outcome of each play, according to each method
rewards.received <-
array(NA_real_, dim=c(plays, 2, length(methods))
, dimnames=list(PlayNo.=seq(plays), Outcome=c("Arm", "Reward"), Method=methods))
# define the function internally to not have to pass running.avgs
chooseAnArm <- function(p) {
# Note that in a tie, which.max returns the lowest index, which is what we want
maxes <- apply(running.avgs[p, ,methods, drop=FALSE], 3, which.max)
# Note: deliberately drawing two separate random numbers and keeping this as
# two lines of code to accent that the two draws should not be related
if(runif(1) < epsi1)
maxes["epsi1"] <- sample(arms, 1)
if(runif(1) < epsi2)
maxes["epsi2"] <- sample(arms, 1)
return(maxes)
}
## TODO: Perform each action at least once, then select according to algorithm
## Starting points. Everyone starts at machine 3
choice <- c(3, 3, 3)
reward <- rewards.source[1, choice]
## First run, slightly different
rewards.received[1,,] <- rbind(choice, reward)
running.avgs[1, choice, ] <- reward # if different starting points, this needs to change like below
## HERE IS WHERE WE START PULLING THE LEVERS ##
## ----------------------------------------- ##
for (p in 2:plays) {
choice <- chooseAnArm(p)
reward <- rewards.source[p, choice]
# Note: When dropping a dim, the methods will be the columns
# and the Outcome info will be the rows. Use `rbind` instead of `cbind`.
rewards.received[p,,names(choice)] <- rbind(choice, reward)
## Update the running averages.
## For each method, the current running averages are the same as the
## previous for all arms, except for the one chosen this round.
## Thus start with last round's averages, then update the one arm.
running.avgs[p,,] <- running.avgs[p-1,,]
# The updating is the only involved part (due to lots of array-indexing)
running.avgs[p,,][cbind(choice, 1:3)] <-
sapply(names(choice), function(m)
# Update the running average for the selected arm (for the current play & method)
mean( rewards.received[ 1:p,,,drop=FALSE][ rewards.received[1:p,"Arm",m] == choice[m],"Reward",m])
)
} # end for-loop
## DIFFERENT RETURN OPTIONS ##
## ------------------------ ##
## All rewards received, in simplified matrix (dropping information on arm chosen)
# return(rewards.received[, "Reward", ])
## All rewards received, along with which arm chosen:
# return(rewards.received)
## Running averages of the rewards received by method
return( apply(rewards.received[, "Reward", ], 2, cumsum) / (1:plays) )
}
### EXECUTION (AND SIMULATION)
## PARAMETERS
arms <- 10
plays <- 1000
epsi1 <- 0.01
epsi2 <- 0.1
simuls <- 50 # 2000
methods=c("greedy", "epsi1", "epsi2")
## Single Iteration:
### we can run system time to get an idea for how long one will take
tme <- system.time( CalculateRewardsPerMethod(arms=arms, epsi1=epsi1, epsi2=epsi2, plays=plays) )
cat("Expected run time is approx: ", round((simuls * tme[["elapsed"]]) / 60, 1), " minutes")
## Multiple iterations (simulations)
rewards.received.list <- replicate(simuls, CalculateRewardsPerMethod(arms=arms, epsi1=epsi1, epsi2=epsi2, plays=plays), simplify="array")
## Compute average across simulations
rewards.received <- apply(rewards.received.list, 1:2, mean)
## RESULTS
head(rewards.received, 17)
MeanRewards <- rewards.received
## If using an alternate return method in `Calculate..` use the two lines below to calculate running avg
# CumulRewards <- apply(rewards.received, 2, cumsum)
# MeanRewards <- CumulRewards / (1:plays)
## PLOT
plot.ts(MeanRewards[, "greedy"], col = 'red', lwd = 2, ylim = range(MeanRewards), ylab = 'Average reward', xlab="Plays")
lines(MeanRewards[, "epsi1"], col = 'orange', lwd = 2)
lines(MeanRewards[, "epsi2"], col = 'navy', lwd = 2)
grid(col = 'darkgray')
legend('bottomright', c('greedy', paste("epsi1 =", epsi1), paste("epsi2 =", epsi2)), col = c('red', 'orange', 'navy'), lwd = 2, cex = 0.8)
You may also want to check this link
https://www.datahubbs.com/multi_armed_bandits_reinforcement_learning_1/
Copy of the relevant code from the above source
It does not use R but simply np.random.rand() from numpy
import numpy as np


class eps_bandit:
    '''
    epsilon-greedy k-bandit problem

    Inputs
    =====================================================
    k: number of arms (int)
    eps: probability of random action 0 < eps < 1 (float)
    iters: number of steps (int)
    mu: set the average rewards for each of the k-arms.
        Set to "random" for the rewards to be selected from
        a normal distribution with mean = 0.
        Set to "sequence" for the means to be ordered from
        0 to k-1.
        Pass a list or array of length = k for user-defined
        values.
    '''

    def __init__(self, k, eps, iters, mu='random'):
        # Number of arms
        self.k = k
        # Search probability
        self.eps = eps
        # Number of iterations
        self.iters = iters
        # Step count
        self.n = 0
        # Step count for each arm
        self.k_n = np.zeros(k)
        # Total mean reward
        self.mean_reward = 0
        self.reward = np.zeros(iters)
        # Mean reward for each arm
        self.k_reward = np.zeros(k)

        if type(mu) == list or type(mu).__module__ == np.__name__:
            # User-defined averages
            self.mu = np.array(mu)
        elif mu == 'random':
            # Draw means from probability distribution
            self.mu = np.random.normal(0, 1, k)
        elif mu == 'sequence':
            # Increase the mean for each arm by one
            self.mu = np.linspace(0, k-1, k)

    def pull(self):
        # Generate random number
        p = np.random.rand()
        if self.eps == 0 and self.n == 0:
            # Pure greedy player: the very first action is picked at random
            a = np.random.choice(self.k)
        elif p < self.eps:
            # Randomly select an action
            a = np.random.choice(self.k)
        else:
            # Take greedy action
            a = np.argmax(self.k_reward)

        reward = np.random.normal(self.mu[a], 1)

        # Update counts
        self.n += 1
        self.k_n[a] += 1

        # Update total
        self.mean_reward = self.mean_reward + (
            reward - self.mean_reward) / self.n

        # Update results for a_k
        self.k_reward[a] = self.k_reward[a] + (
            reward - self.k_reward[a]) / self.k_n[a]

    def run(self):
        for i in range(self.iters):
            self.pull()
            self.reward[i] = self.mean_reward

    def reset(self):
        # Resets results while keeping settings
        # (uses self.k and self.iters; bare k and iters are undefined here)
        self.n = 0
        self.k_n = np.zeros(self.k)
        self.mean_reward = 0
        self.reward = np.zeros(self.iters)
        self.k_reward = np.zeros(self.k)
This is my code for simulating a clinical trial in order to show the probability of the trial being successful. My problem is that, after introducing a second sample (n.2), I need to show how many samples are required to produce a value above the 90% threshold. Any help, please? I know I need to loop the code I have, but I am having trouble doing so.
calc.quant = function( n, X.1, a, b, n.2, nsim, thr, p1=0.025, p2=0.975 )
{
a.star = a + n
b.star = b + n - X.1
theta = rbeta( nsim, a.star, b.star )
X.2 = rbinom( nsim, n.2, theta )
theta.p1p2 = matrix( 0, nrow=nsim, ncol=2 )
for( j in 1:nsim ) {
theta.p1p2[j,] = qbeta( c( p1, p2 ), a.star + X.2[j], b.star + n.2 - X.2[j] )
}
return( theta.p1p2 )
}
n = 117
X.1 = 110
a = 1
b = 1
n.2 = 50
nsim = 1000
thr = .90
res = calc.quant( n, X.1, a, b, n.2, nsim, thr )
sum( res[,1] > thr ) / nsim
[This is not a complete answer, but simply to get clarification on what the OP is going for.]
Basic strategy with a for-loop:
threshold <- somevalue
for(i in someseq){
output <- somefunction(...)
if(output > threshold)
break
}
output
Basic strategy with a while-loop:
threshold <- somevalue
below.threshold <- TRUE
while(below.threshold){
output <- somefunction(...)
if(output > threshold)
below.threshold <- FALSE
}
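As a concrete instance of the for-loop strategy, here is a sketch that searches for the smallest n.2 whose output exceeds the threshold. `success_prob` is a hypothetical, deterministic stand-in chosen only so the example runs on its own; in the question it would be replaced by something like `sum(res[, 1] > thr) / nsim` computed from calc.quant() for each candidate n.2.

```r
# Hypothetical stand-in for "probability of success with n.2 extra samples"
success_prob <- function(n2) 1 - exp(-n2 / 40)

threshold <- 0.90
needed <- NA                 # stays NA if no candidate passes the threshold

for (n2 in 1:500) {          # candidate sizes for the second sample
  if (success_prob(n2) > threshold) {
    needed <- n2             # first candidate above the threshold
    break
  }
}
needed
```

With a simulated quantity the result is noisy, so the first n.2 that crosses the threshold can change from run to run; a larger nsim or a fixed seed may make the search more stable.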