Monte Carlo Simulation with Replacement Based On Sum of A Column - r

I am trying to simulate an unlikely situation in a videogame using a Monte Carlo simulation. I'm extremely new at coding and thought this would be a fun situation to simulate.
There are 3 targets and they are being attacked 8 times independently. My problem comes with how to deal with the fact that one of the columns cannot be attacked more than 6 times, when there are 8 attacks.
I would like to take any attack aimed at column 2 and redirect it to one of the other 2 columns at random, but only if column 2 has already been attacked 6 times.
Here is my attempt to simulate with 5000 repeats, for example.
#determine number of repeats
trial <- 5000
#create matrix with a row for each trial
m <- matrix(0, nrow = trial, ncol = 3)
#The first for loop is for each row
#The second for loop runs each attack independently, sampling 1:3 at random, then adding one to that position of the row.
#The function that is called by ifelse() when m[trial, 2] > 6 = TRUE is the issue.
for (trial in 1:trial){
for (attack in 1:8) {
target <- sample(1:3, 1)
m[trial, target] <- m[trial, target] + 1
ifelse(m[trial, 2] > 6, #determines if the value of column 2 is greater than 6 after each attack
function(m){
m[trial, 2] <- m[trial, 2] - 1 #subtract the value from the second column to return it to 6
newtarget <- sample(c(1,3), 1) #select either column 1 or 3 as a new target at random
m[trial, newtarget] <- m[trial, newtarget] + 1 #add 1 to indicate the new target has been selected
m}, #return the matrix after modification
m) #do nothing if the value of the second column is <= 6
}
}
For example, if I have the matrix below:
> matrix(c(2,1,5,7,1,0), nrow = 2, ncol = 3)
[,1] [,2] [,3]
[1,] 2 5 1
[2,] 1 7 0
I would like the function to look at the 2nd line of the matrix, subtract 1 from 7, and then add 1 to either column 1 or 3 to create c(2,6,0) or c(1,6,1). I would like to learn how to do this within the loop, but it could be done afterwards as well.
I think I am making a serious, fundamental error in how I use function(x) or ifelse.
Thank you.

Here's an improved version of your code:
set.seed(1)
trial <- 5000
#create matrix with a row for each trial
m <- matrix(0, nrow = trial, ncol = 3)
#The first for loop is for each row
#The second for loop runs each attack independently, sampling 1:3 at random, then adding one to that position of the row.
#An if block replaces the ifelse() call from the question.
for (i in 1:trial){
for (attack in 1:8) {
target <- sample(1:3, 1)
m[i, target] <- m[i, target] + 1
#determines if the value of column 2 is greater than 6 after each attack
if(m[i, 2] > 6){
#subtract the value from the second column to return it to 6
m[i, 2] <- m[i, 2] - 1
#select either column 1 or 3 as a new target at random
newtarget <- sample(c(1,3), 1)
#add 1 to indicate the new target has been selected
m[i, newtarget] <- m[i, newtarget] + 1
}
}
}
# Notice the largest value in column 2 is no greater than 6.
apply(m, 2, max)
set.seed is used to make the results reproducible (usually just used for testing). The ifelse function has a different purpose than the normal if-else control flow. Here's an example:
x = runif(100)
ifelse(x < 0.5, 0, x)
You'll notice that any element of x that is less than 0.5 is now zero. I changed your code to use an if block instead. Notice that m[i, 2] > 6 returns a single TRUE or FALSE, whereas in the small example above x < 0.5 returns a vector of logicals. So ifelse can take a vector of logicals, but an if block requires a single logical.
You were on the right track with using function, but it just isn't necessary in this case. Often, but not always, you'll define a function like this:
f = function(x)
x^2
But just returning the value doesn't mean what you want is changed:
x = 5
f(5) # 25
x # still 5
For more on this, look up function scope in R.
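To make the scoping point concrete, here is a minimal sketch using the 2-row matrix from the question. The helper name redirect_overflow and its cap argument are made up for illustration. The function works on a copy of the matrix, so the caller only sees the change if the returned value is assigned back:

```r
# Hypothetical helper: if column 2 of the given row exceeds the cap,
# move one hit from column 2 to column 1 or 3 (chosen at random)
redirect_overflow <- function(m, row, cap = 6) {
  if (m[row, 2] > cap) {
    m[row, 2] <- m[row, 2] - 1
    newtarget <- sample(c(1, 3), 1)
    m[row, newtarget] <- m[row, newtarget] + 1
  }
  m  # return the modified copy; the caller's matrix is untouched
}

m <- matrix(c(2, 1, 5, 7, 1, 0), nrow = 2, ncol = 3)
redirect_overflow(m, 2)       # result is printed but not stored
m[2, 2]                       # still 7
m <- redirect_overflow(m, 2)  # assign the result to keep the change
m[2, 2]                       # now 6
```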
Lastly, I changed the loop to be i in 1:trial instead of trial in 1:trial. You probably wouldn't notice any issues in your case, but it is better practice to give the loop variable a different name from the variable that defines the loop's range.
Hope this helps.
P.S. R isn't really known for its speed when looping. If you want to make things go faster, you'll typically need to vectorize your code.
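As a rough sketch of what a vectorized version might look like (the names attacks, cap, hits, excess and to_first are my own, not from the question): draw all attacks at once, count the hits per target, then redistribute any hits on column 2 beyond the cap. This should give the same distribution of outcomes as the loop, though not the same random draws for a given seed.

```r
set.seed(1)
trial <- 5000
attacks <- 8
cap <- 6

# Draw every attack for every trial at once: one row per trial, one column per attack
targets <- matrix(sample(1:3, trial * attacks, replace = TRUE), nrow = trial)

# Count the hits on each target within each trial
hits <- cbind(rowSums(targets == 1), rowSums(targets == 2), rowSums(targets == 3))

# Hits on column 2 beyond the cap are redirected,
# each one to column 1 or column 3 with equal probability
excess <- pmax(hits[, 2] - cap, 0)
to_first <- rbinom(trial, size = excess, prob = 0.5)
hits[, 1] <- hits[, 1] + to_first
hits[, 3] <- hits[, 3] + (excess - to_first)
hits[, 2] <- hits[, 2] - excess

# The largest value in column 2 should be no greater than the cap
apply(hits, 2, max)
```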

Related

Efficiently change individual elements in matrix/array in R

I am running a simulation in R, which I am trying to make more efficient.
A little bit of background: this is an abstract simulation to test the effects of mutation on a population. The population has N individuals and each individual has a genotype of M letters; each letter can be one of the twenty amino acids (which I denote as 0:19).
One of the most (computationally) expensive tasks involves taking a matrix "mat" with M rows and N columns, which initially starts as a matrix of all zeroes,
mat <- matrix(rep(0,M*N),nrow=M)
And then changing (mutating) at least one letter in the genotype of each individual. The reason I say at least is, I would ideally like to set a mutation rate (mutrate) that, if I set to 2 in my overall simulation function, it will cause 2 mutations in the matrix per individual.
I found two rather computationally expensive ways to do so. As you can see below, only the second method incorporates the mutation rate parameter mutrate (I could not easily think of how to incorporate it into the first).
#method 1
for(i in 1:N){
position <- floor(runif(N, min=0, max=M))
letter <- floor(runif(N, min=0, max=19))
mat[position[i],i] = letter[i]}
#method 2, somewhat faster and incorporates mutation rate
mat <- apply(mat,2,function(x) (x+sample(c(rep(0,M-mutrate),sample(0:19,size=mutrate))%%20)))
The second method incorporates a modulus because genotype values have to be between 0 and 19 as I mentioned.
A few additional notes for clarity:
I don't strictly need every individual to get exactly the same number of mutations. That said, the distribution should be narrow enough that, if mutrate = 2, most individuals get two mutations, some one, and some maybe three. I don't want one individual getting a huge number of mutations while many individuals get none. Notably, some mutations will change a letter into the same letter, so for a large population size N the expected average number of mutations is slightly less than the assigned mutrate.
I believe the answer has something to do with the ability to use the square-bracket subsetting method to obtain one random element from every column of the matrix mat. However, I could not find any information about how to use the syntax to isolate one random element from every column of a matrix. mat[sample(1:M),sample(1:N)] obviously gives you the whole matrix... perhaps I am missing something stupidly clear here.
Any help is greatly appreciated !
To answer your last question first: you can access a single cell in a matrix with mat[row,column], or multiple scattered cells by their sequential cell ID. Cells are numbered down the columns: cell 1,1 is the first cell, followed by 2,1, 3,1, etc.:
mat <- matrix(rep(0, 5*5), nrow=5)
mat[c(1,3,5,7,9)] = c(1,2,3,4,5)
mat
[,1] [,2] [,3] [,4] [,5]
[1,] 1 0 0 0 0
[2,] 0 4 0 0 0
[3,] 2 0 0 0 0
[4,] 0 5 0 0 0
[5,] 3 0 0 0 0
Accessing or overwriting individual cells is fast, too. The fastest way I could think of to perform your task is to first create vectors for the values we want: a vector of column indices (every column repeated mutrate times), a vector of random row indices, and a vector of random new values for these row/column combinations.
cols = rep(seq_len(N), mutrate)
rows = sample(M, N*mutrate, replace = T)
values = sample(genotypes, N*mutrate, replace = T) - 1 # -1 offset since genotypes are 0-indexed
for(i in seq_len(N*mutrate)) {
mat[rows[i],cols[i]] = values[i]
}
Instead of that for-loop to update the matrix, we can also calculate the cell-IDs so we can update all matrix cells in one go:
cols = rep(seq_len(N), mutrate)
rows = sample(M, N*mutrate, replace = T)
cellid = rows + (cols-1)*M
values = sample(genotypes, N*mutrate, replace = T) - 1 # -1 offset since genotypes are 0-indexed
mat[cellid] = values
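As an aside, R also accepts a two-column matrix of (row, column) pairs as an index, which avoids computing the cell IDs by hand. The sketch below uses small made-up sizes and compares the two forms; mat_a and mat_b are names I introduced for illustration.

```r
set.seed(42)
M <- 5; N <- 4; mutrate <- 2; genotypes <- 20  # small illustrative sizes

cols <- rep(seq_len(N), mutrate)
rows <- sample(M, N * mutrate, replace = TRUE)
values <- sample(genotypes, N * mutrate, replace = TRUE) - 1

# Linear cell IDs, computed as in the answer above
mat_a <- matrix(0, nrow = M, ncol = N)
mat_a[rows + (cols - 1) * M] <- values

# Two-column index matrix: each row is one (row, column) pair
mat_b <- matrix(0, nrow = M, ncol = N)
mat_b[cbind(rows, cols)] <- values

identical(mat_a, mat_b)
```

Both forms update all cells in one vectorized assignment, so which to use is mostly a matter of readability.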
Benchmarking the methods on a matrix with M = 10000 rows and N = 6000 columns shows how fast each one is:
N = 6000 # individuals
M = 10000 # genotype length
genotypes = 20
mutrate = 2
method1 <- function() {
mat <- matrix(rep(0,M*N),nrow=M)
for(i in 1:(N*mutrate)){
position <- sample(M, 1)
letter <- sample(genotypes, 1) - 1
mat[position,(i-1)%%N+1] = letter # map i onto columns 1..N; plain i%%N gives 0 when i is a multiple of N
}
return(mat)
}
method2 <- function() {
mat <- matrix(rep(0,M*N),nrow=M)
mat <- apply(mat,2,function(x) (x+sample(c(rep(0,M-mutrate),sample(0:19,size=mutrate))%%20)))
}
method3 <- function() {
mat <- matrix(rep(0,M*N),nrow=M)
cols = rep(seq_len(N), mutrate)
rows = sample(M, N*mutrate, replace = T)
values = sample(genotypes, N*mutrate, replace = T) - 1 # -1 offset since genotypes are 0-indexed
for(i in seq_len(N*mutrate)) {
mat[rows[i],cols[i]] = values[i]
}
return(mat)
}
method4 <- function() {
mat <- matrix(rep(0,M*N),nrow=M)
cols = rep(seq_len(N), mutrate)
rows = sample(M, N*mutrate, replace = T)
cellid = rows + (cols-1)*M
values = sample(genotypes, N*mutrate, replace = T) - 1 # -1 offset since genotypes are 0-indexed
mat[cellid] = values
return(mat)
}
benchmark <- function(func, times=10) {
begin <- as.numeric(Sys.time())
for(i in seq_len(times))
retval <- eval(parse(text=func))
end <- as.numeric(Sys.time())
cat(func, 'took', (end-begin)/times, 'seconds\n')
return(retval)
}
ret1 <- benchmark('method1()')
ret2 <- benchmark('method2()')
ret3 <- benchmark('method3()')
ret4 <- benchmark('method4()')
I've modified your first method to speed it up and make it honor mutrate. The results:
method1() took 0.8936087 seconds
method2() took 8.767686 seconds
method3() took 0.7008878 seconds
method4() took 0.6548331 seconds
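For reference, timing can also be done with base R's system.time() and a function argument instead of eval(parse(...)). This is only a sketch: benchmark_fn and fill_small are hypothetical names, and the workload is deliberately tiny so it runs quickly.

```r
# Hypothetical variant of benchmark(): takes a function rather than a string
benchmark_fn <- function(f, times = 10) {
  elapsed <- system.time(for (i in seq_len(times)) f())[["elapsed"]]
  elapsed / times  # average seconds per call
}

# Small stand-in workload, in the style of method4 above
fill_small <- function() {
  mat <- matrix(0, nrow = 100, ncol = 60)
  mat[sample(length(mat), 120, replace = TRUE)] <- sample(20, 120, replace = TRUE) - 1
  mat
}

avg <- benchmark_fn(fill_small)
cat("fill_small() took", avg, "seconds on average\n")
```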

How can I create a for-loop to count the number of values within a vector that fall between a set boundary?

I'm trying to set an upper and lower boundary of a vector by simply adding and subtracting a set value from each index. I then want to create a loop that tells me for each value (i) in the vector, how many other points within the vector falls within that boundary.
Essentially creating a pseudo-density calculation based on how many values fall within the established range.
I have my vector "v" that contains random values. I then add/subtract three to it to get the upper and lower ranges. But can't create a loop that will count how many other values from that vector fall within that.
v <- c(1, 3, 4, 5, 8, 9, 10, 54)
for (i in v){
vec2 <- (vec +3 > vec[i] & vec -3 < vec[i])
}
vec2
I get NA's from this code.
I've also tried indexing the vec +/- 3 and it also didn't work.
vec2 <- (vec[i] +3 > vec[i] & vec - 3 < vec[i))
What I want is for every "i" value in the vector, I want to know how many points fall within that value + and -3.
i.e. first value being 1: so the upper limit would be 4 and the lower would be -2. I want it to count how many values remaining in the vector, fall within this. Which would be 3 for the first index (if it includes itself).
vec2 = (3, 4, 3, . . . )
Are you looking for something like this? Your code doesn't work because your syntax is incorrect.
vec <- c(1, 3, 4, 5, 8, 9, 10, 54) #Input vector
countvalswithin <- vector() #Empty vector that will store counts of values within bounds
#For loop to cycle through values stored in input vector
for(i in 1:length(vec)){
currval <- vec[i] #Take current value
lbound <- (currval - 3) #Calculate lower bound w.r.t. this value
ubound <- (currval + 3) #Calculate upper bound w.r.t. this value
#Create vector containing all values from source vector except current value
#This will be used for comparison against current value to find values within bounds.
othervals <- subset(vec, vec != currval)
currcount <- 1 #Set to 0 to exclude self; count(er) of values within bounds of current value
#For loop to cycle through all other values (excluding current value) to find values within bounds of current value
for(j in 1:length(othervals)){
#If statement to evaluate whether compared value is within bounds of current value (lower bound exclusive, upper bound inclusive); if it is, counter updates by 1
if(othervals[j] > lbound & othervals[j] <= ubound){
currcount <- currcount + 1
}
}
countvalswithin[i] <- currcount #Append count for current value to a vector
}
df <- data.frame(vec, countvalswithin) #Input vector and respective counts as a dataframe
df
# vec countvalswithin
# 1 1 3
# 2 3 4
# 3 4 3
# 4 5 4
# 5 8 3
# 6 9 3
# 7 10 3
# 8 54 1
Edit: added comments to the code explaining what it does.
In your for loop we can loop over every element in v, create range (-3, +3) and check how many of the elements in v fall within that range and store the result in new vector vec2.
vec2 <- numeric(length = length(v))
for (i in seq_along(v)) {
vec2[i] <- sum((v >= v[i] - 3) & (v <= v[i] + 3))
}
vec2
#[1] 3 4 4 4 4 3 3 1
However, you can avoid the for loop by using mapply:
mapply(function(x, y) sum(v >= y & v <= x), v + 3, v - 3)
#[1] 3 4 4 4 4 3 3 1
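Another loop-free option, sketched here with the same inclusive bounds, is to build all pairwise differences with outer() and count those within 3 in each row. Note that this creates a length(v) by length(v) matrix, so it only suits vectors of moderate size.

```r
v <- c(1, 3, 4, 5, 8, 9, 10, 54)
# outer() builds the matrix of all pairwise differences v[i] - v[j]
counts <- rowSums(abs(outer(v, v, "-")) <= 3)
counts
#[1] 3 4 4 4 4 3 3 1
```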

Randomly select values from a given number list to add to a certain value in r

If I have a set of values such as
c(1,2,5,6,7,15,19,20)
and I want to randomly select 2 values where the sum equals 20. From the above list possible samples that I would like to see would be
[19,1], [15,5]
How do I do this in R? Any help would be greatly appreciated.
This computes all possible combinations of your input vector, so if this is very long, this might be a problem.
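getVal <- function(vec,val) {
  comb = combn(vec, 2)          # all pairs, one pair per column
  idx = colSums(comb) == val    # which pairs sum to the target
  if (sum(idx)) {
    # drop = FALSE keeps a matrix even when only one pair matches
    return(comb[, idx, drop = FALSE][, sample(sum(idx), 1)])
  }
  return(FALSE)
}
vec = (c(1,4,6,9))
val = 10
getVal(vec,val)
>>[1] 1 9
val = 11
getVal(vec,val)
>>[1] FALSE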
For a small vector of values you can do an exhaustive search by working out all the combinations of pairs in the values. Example:
> values = c(1,2,5,6,7,15,19,20)
> pairs = matrix(values[t(combn(length(values),2))],ncol=2)
That is a 2-column matrix of all pairs from values. Now sum the rows and look for the target value of 20:
> targets = apply(pairs,1,sum)==20
> pairs[targets,]
[,1] [,2]
[1,] 1 19
[2,] 5 15
The size of pairs grows quadratically: with 100 values, pairs will have 4950 rows.
You can do this with the sample() function and a while loop. It isn't the prettiest solution, but it is certainly simple to implement.
First you sample two values from the vector and store them in an object, like:
values <- c(1, 2, 5, 6, 7, 15, 19, 20)
randomTwo <- sample(values, 2)
Then you start your while loop. This loop checks whether the sum of the two sampled values modulo 10 equals 0 (I assumed you meant modulo from the examples in your question; see https://en.wikipedia.org/wiki/Modulo_operation for what it does). If the result is not 0, the loop samples two new values, and repeats until the result is 0, at which point you have your two values.
Here's what it looks like:
while (sum(randomTwo) %% 10 != 0) {
randomTwo <- sample(values, 2)
}
Now this might take more iterations than checking all combos, or it might take fewer, depending on chance. If you have just this small vector, then it's a nice solution. Good luck!
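If the goal is an exact sum of 20 rather than a multiple of 10, the same loop can test the sum directly. The sketch below does that and adds a cap on the number of attempts so the loop cannot run forever when no pair matches; target, max_tries and tries are names I introduced, and the limit of 10000 is an arbitrary assumption.

```r
set.seed(7)
values <- c(1, 2, 5, 6, 7, 15, 19, 20)
target <- 20
max_tries <- 10000  # hypothetical safety limit

randomTwo <- sample(values, 2)
tries <- 1
# Resample until the pair hits the target sum or the attempt limit is reached
while (sum(randomTwo) != target && tries < max_tries) {
  randomTwo <- sample(values, 2)
  tries <- tries + 1
}

# Return the pair if it matches, otherwise NULL
if (sum(randomTwo) == target) randomTwo else NULL
```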
Here is a way that doesn't require computing an immense matrix (way faster):
findpairs=function(a,sum,num){
list=list()
aux=1
for (i in 1:length(a)){
n=FALSE
n=which((a+a[i])==sum)
if (length(n)){
for (j in n){
if (j!=i){
list[[aux]]=c(a[i],a[j])
aux=aux+1
}
}
}
}
return(sample(list[1:(length(list)/2)],num))
}
a=c(1,2,5,6,19,7,15,20)
a=a[order(a)]
sum=20
findpairs(a,sum,2)
[[1]]
[1] 5 15
[[2]]
[1] 1 19
Edit: this originally returned each pair twice. That is solved by sampling from only the first half of the list, since the other half holds the same pairs the other way around.

How to exclude a set from a large set in R

Suppose that I have a set of 10 elements. Suppose that my code is able to choose only 3 elements at a time. Then, I would like it to choose another 3 elements, without selecting the elements that were already selected.
x <- c(4,3,5,6,-2,7,-4,10,22,-12)
Then, suppose that my condition is to select 3 elements that are less than 5. Then,
new_x <- c(4, 3, -2)
Then, I would like to select another 3 elements that are less than 5 but were not selected the first time. If fewer than 3 such elements remain, the missing elements should be zero.
Hence,
new_xx <- c(-4,-12,0)
Any help, please?
Here is an option using split
f <- function(x, max = 5, n = 3) {
x <- x[x < max]
ret <- split(x, rep(1:(length(x) / n + 1), each = n)[1:length(x)])
lapply(ret, function(w) replace(rep(0, n), 1:length(w), w))
}
f(x)
#$`1`
#[1] 4 3 -2
#
#$`2`
#[1] -4 -12 0
Explanation: We define a custom function that first selects entries < 5, then splits the resulting vector into chunks of length 3 and stores the result in a list, and finally 0-pads those list elements that are vectors of length < 3.
Sample data
x <- c(4,3,5,6,-2,7,-4,10,22,-12)
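A possible variant of the same idea, shown only as a sketch: f2 is a hypothetical name, and the grouping by ceiling(position / n) plus the empty-input guard are my additions rather than part of the answer above.

```r
# Hypothetical variant of f(): group elements by ceiling(position / n), then zero-pad
f2 <- function(x, max = 5, n = 3) {
  x <- x[x < max]
  if (length(x) == 0) return(list())  # nothing below the threshold
  chunks <- split(x, ceiling(seq_along(x) / n))
  # pad each chunk with zeros up to length n
  lapply(chunks, function(w) c(w, rep(0, n - length(w))))
}

x <- c(4, 3, 5, 6, -2, 7, -4, 10, 22, -12)
res <- f2(x)
res
```

For the sample data this gives the same two chunks as f(x): 4 3 -2 and -4 -12 0.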

fill up a matrix one random cell at a time

I am filling a 10x10 matrix (mat) randomly until sum(mat) == 100.
I wrote the following.... (i = 2 for another reason not specified here but i kept it at 2 to be consistent with my actual code)
mat <- matrix(rep(0, 100), nrow = 10)
mat[1,] <- c(0,0,0,0,0,0,0,0,0,1)
mat[2,] <- c(0,0,0,0,0,0,0,0,1,0)
mat[3,] <- c(0,0,0,0,0,0,0,1,0,0)
mat[4,] <- c(0,0,0,0,0,0,1,0,0,0)
mat[5,] <- c(0,0,0,0,0,1,0,0,0,0)
mat[6,] <- c(0,0,0,0,1,0,0,0,0,0)
mat[7,] <- c(0,0,0,1,0,0,0,0,0,0)
mat[8,] <- c(0,0,1,0,0,0,0,0,0,0)
mat[9,] <- c(0,1,0,0,0,0,0,0,0,0)
mat[10,] <- c(1,0,0,0,0,0,0,0,0,0)
i <- 2
set.seed(129)
while( sum(mat) < 100 ) {
# pick random cell
rnum <- sample( which(mat < 1), 1 )
mat[rnum] <- 1
##
print(paste0("i =", i))
print(paste0("rnum =", rnum))
print(sum(mat))
i = i + 1
}
For some reason, when sum(mat) == 99 there are several extra steps. I would assume that once i = 91 the while loop would stop, but it continues past this. Can someone explain what I have done wrong?
If I change the while condition to
while( sum(mat) < 100 & length(which(mat < 1)) > 0 )
the issue remains..
Your problem is equivalent to randomly ordering the indices of a matrix that are equal to 0. You can do this in one line with sample(which(mat < 1)). I suppose if you wanted to get exactly the same sort of output, you might try something like:
set.seed(144)
idx <- sample(which(mat < 1))
for (i in seq_along(idx)) {
print(paste0("i =", i))
print(paste0("rnum =", idx[i]))
print(sum(mat)+i)
}
# [1] "i =1"
# [1] "rnum =5"
# [1] 11
# [1] "i =2"
# [1] "rnum =70"
# [1] 12
# ...
See ?sample
Arguments:
x: Either a vector of one or more elements from which to choose,
or a positive integer. See ‘Details.’
...
If ‘x’ has length 1, is numeric (in the sense of ‘is.numeric’) and
‘x >= 1’, sampling _via_ ‘sample’ takes place from ‘1:x’. _Note_
that this convenience feature may lead to undesired behaviour when
‘x’ is of varying length in calls such as ‘sample(x)’. See the
examples.
In other words, if x in sample(x) is of length 1, sample returns a random number from 1:x. This happens towards the end of your loop, where there is just one 0 left in your matrix and one index is returned by which(mat < 1).
The iteration repeats at sum 99 because sample() behaves very differently when its first argument is a vector of length 1 than when it is longer. When it has length 1, sample() assumes you want a random number from 1 to that number. When it has length > 1, you get a random element of that vector.
Compare
sample(c(99,100),1)
and
sample(c(100),1)
Of course, this is an inefficient way of filling your matrix. As @josilber pointed out, a single call to sample could do everything you need.
The issue comes from how sample and which do the sampling when you have only a single '0' value left.
For example, do this:
mat <- matrix(rep(1, 100), nrow = 10)
Now you have a matrix of all 1's. Now let's make two numbers 0:
mat[15]<-0
mat[18]<-0
and then sample
sample(which(mat<1))
[1] 18 15
By adding a size=1 argument you get one or the other.
Now let's try this:
mat[18]<-1
sample(which(mat<1))
[1] 3 13 8 2 4 14 11 9 10 5 15 7 1 12 6
Oops, you did not get [1] 15. Instead, only a single integer (15 in this case) is passed to sample. When you do sample(x) and x is a single positive integer, it gives you the integers 1:x in random order.
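One way around this, following the resample() helper shown in the examples of ?sample, is to sample an index into the vector rather than the vector itself. The sketch below applies it to the fill loop; starting from an all-zero matrix and counting with steps are simplifications I chose for illustration.

```r
# Sample positions of x, not x itself, so a length-1 x is never expanded to 1:x
resample <- function(x, ...) x[sample.int(length(x), ...)]

resample(15, 1)  # always 15, unlike sample(15, 1)

# The fill loop, using resample() for the random pick
mat <- matrix(0, nrow = 10, ncol = 10)
steps <- 0
while (sum(mat) < 100) {
  rnum <- resample(which(mat < 1), 1)
  mat[rnum] <- 1
  steps <- steps + 1
}
steps  # one step per empty cell, with no repeats at the end
```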

Resources