Calculate probability table with exponentially increasing dimensions - r

The goal here is to calculate the expected outcomes and scoring table of a game.
The game outcomes can be described in terms of a weighted die roll:
We can set the input weights and bonuses.
Then we roll N times.
Then we apply score modifiers to the outcome table.
Then we calculate the expected score of the set of N rolls.
[roll: 1, weight: 0.24, score: 2],
[roll: 2, weight: 0.11, score: 4],
…
[roll: 19, weight: 0.05, score: 7],
[roll: 20, weight: 0.03, score: 20],
example bonus: min_set_score = rolls_per_set * 6.2
If we roll the die twice, we can manually generate the result table:
[rolls: [1,1], weight: 0.24 * 0.24, score: 2 + 2]
[rolls: [1,2], weight: 0.24 * 0.11, score: 2 + 4]
…etc
then, with the example bonus, min_set_score = 2 * 6.2 = 12.4, so the scores for [1,1] and [1,2] would actually be adjusted up to 12.4.
How would we calculate the result table for 50, 100, or 500 rolls? 50 rolls produces 20^50 possible outcome sequences, which can no longer be enumerated by our naive implementation in a reasonable time frame.
Is this something that can be done with a better library? A more suitable language or technology?
This does not need to be done in real time, we would like to use the results to inform our decisions about how the game elements should affect the weights and bonuses.
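Since only the total score of a set matters (not the ordering of rolls), the full table over 20^50 sequences never needs to be built: the distribution of the N-roll total is the N-fold convolution of the single-roll score distribution, which is cheap to compute. A minimal R sketch of that idea, using an illustrative subset of the weights rather than the real table:
# Single-roll pmf over integer scores (toy subset of the table above).
scores  <- c(2, 4, 7, 20)             # example per-roll scores
weights <- c(0.24, 0.11, 0.05, 0.03)  # example weights
weights <- weights / sum(weights)     # normalise the toy subset to sum to 1

pmf <- numeric(max(scores) + 1)       # index s + 1 holds P(score == s)
pmf[scores + 1] <- weights

# N-fold convolution: after the loop, dist[k] = P(total score == k - 1).
n_rolls <- 50
dist <- pmf
for (i in 2:n_rolls) {
  dist <- convolve(dist, rev(pmf), type = "open")
}
dist <- pmax(dist, 0)                 # clear tiny FFT rounding noise

# Apply the minimum-set-score bonus, then take the expectation.
totals   <- seq_along(dist) - 1
floor_at <- n_rolls * 6.2             # min_set_score = rolls_per_set * 6.2
expected <- sum(pmax(totals, floor_at) * dist)
expected
The work grows with N * max_score (the length of dist), not with 20^N, so 50, 100, or 500 rolls all finish quickly in plain R; no special library or language is required. If N were much larger, the loop could be replaced by exponentiation-by-squaring on the distribution, needing only O(log N) convolutions.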

Related

Converting stochastic probabilities into target numbers

Given the following stochastic transition matrix, how can I convert these probabilities to random number generator target numbers?
transmat <- structure(c(0.77, 0.561, 0.14, 0.187, 0.07, 0.063, 0, 0.063,
0.01, 0.125, 0.01, 0.001), dim = c(2L, 6L),
dimnames = list(c("0", "1"), c("0", "1", "2", "5", "8", "9")))
The aim is to output target numbers for a transition to occur in a simulation environment. This can be done manually as in the example below, which gives approximations of dice roll ranges for the sum of 3 fair six-sided dice for the first row of the transition matrix; however, an algorithm to do this is needed.
In the provided example, state 0 will remain as state 0 with a roll of 3-12, but will transition to state 1 on a roll of 13-14, state 2 with a roll of 15-16, state 8 on a roll of 17, and state 9 on a roll of 18.
  0      1       2       5   8    9
0 "3-12" "13-14" "15-16" "-" "17" "18"
Conceptually I am unsure of how to proceed. One thought was to (1) index the probabilities in the row and the range of random number generator results and (2) somehow compare the cumulative probability and the remaining probabilities with the probabilities of possible random numbers.
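That cumulative-comparison thought in (2) can be made concrete. The sketch below is my own illustration in base R (not taken from the answers): build the pmf of the 3d6 sum, then cut its CDF at the row's cumulative probabilities. The cuts are approximate in exactly the way the manual example is approximate:
# pmf of the sum of three fair d6: names "3".."18", probabilities out of 216
dice3 <- table(rowSums(expand.grid(1:6, 1:6, 1:6))) / 216

target_ranges <- function(p, pmf = dice3) {
  sums <- as.numeric(names(pmf))
  cdf  <- cumsum(pmf)
  # closest CDF cut point for each cumulative row probability
  cuts <- sapply(cumsum(p), function(q) which.min(abs(cdf - q)))
  lo <- c(1, head(cuts, -1) + 1)
  hi <- cuts
  ifelse(hi < lo, "-",
         ifelse(hi == lo, as.character(sums[hi]),
                paste(sums[lo], sums[hi], sep = "-")))
}
target_ranges(transmat[1, ])
#> [1] "3-12"  "13-14" "15-16" "-"  "17"  "18"
This reproduces the hand-built row above; rows whose probabilities sit too far from any 3d6 CDF cut would need more dice (or the ordered-dice idea in the second answer below).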
My understanding of your problem is that you'd like to generate simulations from the probabilities in your matrix "transmat".
Apart from the literal way of performing the dice-roll simulation you describe (i.e. just generate random dice rolls using floor(runif(n=1,min=1,max=7)) and generate the next state using some kind of if statement), you could use sample to generate realizations from the probabilities in "transmat" directly:
sims <- sample(x = as.numeric(colnames(transmat)),
               size = 10^4,
               replace = TRUE,
               prob = transmat[1, ])
The key here is using the probabilities from transmat as the prob argument to sample. A count of the simulated states, table(sims):
0 1 2 8 9
7681 1413 713 98 95
shows this is a simulation of the transition probabilities in the first row of your transition matrix.
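As a further check (my addition, reusing the sims vector from above), the empirical proportions can be set against the theoretical row directly:
table(sims) / length(sims)   # empirical proportions from the simulation
transmat[1, ]                # theoretical: 0.77 0.14 0.07 0.00 0.01 0.01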
Could you use different colored dice (or just roll one at a time)? That would give more states, each with the same probability. For example, there are 6^3 = 216 states with three dice if they are ordered by color (or by throw order). For the first row of transmat, it could be:
c(
"1,1,1 - 5,4,4" = 0.768518519,
"5,4,5 - 6,3,5" = 0.143518519,
"6,3,6 - 6,6,2" = 0.069444444,
" - " = 0,
"6,6,3 - 6,6,4" = 0.009259259,
"6,6,5 - 6,6,6" = 0.009259259
)
#> 1,1,1 - 5,4,4 5,4,5 - 6,3,5 6,3,6 - 6,6,2 - 6,6,3 - 6,6,4 6,6,5 - 6,6,6
#> 0.768518519 0.143518519 0.069444444 0.000000000 0.009259259 0.009259259
Each additional die/roll would increase the precision by a factor of 6.
If that's an option, I have some ideas about automating it.
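In the meantime, here is a minimal sketch of that automation (my own illustration; rounding leftovers would still need distributing, e.g. by largest remainder, so cut points can differ by one outcome from the hand-built ranges above):
p <- transmat[1, ]                 # row probabilities
k <- 3                             # number of ordered dice
counts <- round(p * 6^k)           # outcomes allotted to each state
ends   <- cumsum(counts)           # last outcome index per state
starts <- ends - counts + 1        # first outcome index per state

# Convert a 1-based outcome index to an ordered dice triple (base-6 digits).
to_dice <- function(i) {
  d <- i - 1
  paste(c(d %/% 36, (d %/% 6) %% 6, d %% 6) + 1, collapse = ",")
}

ranges <- ifelse(counts == 0, "-",
                 paste(sapply(starts, to_dice), sapply(ends, to_dice),
                       sep = " - "))
setNames(ranges, colnames(transmat))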

Why does this idea to solve an absorbing Markov chain not work?

Edit: Seems like my method works.
I encountered a programming question that required me to calculate the probability of reaching terminal states.
After a few painstaking hours trying to solve it traditionally, I googled and found that this is called an absorbing Markov chain, and there is a formula for it.
However, I am trying to figure out what is missing from my solution because it seems correct.
Pardon the crude drawing (not shown here). Basically there are 4 nodes in this graph; the black lines show the original transitions and probabilities, while the coloured lines show the paths to termination.
The steps are something like this:
(1) Trace all possible paths to a termination point and sum up the probability of every path to the termination node. That is the probability of reaching the node.
(2) Ignore cyclical paths. Meaning that the "1/3" transition from 4 to 1 is essentially ignored.
Reason for (2): because we can assume that going back will increase the probability of every possible path in such a way that they still maintain the same relative probability to each other! For example, if I were to go back to 1 from 4, then the chances of going to 2, 3 and 4 will each increase by 1/27 (1/3 * 1/3 * 1/3), keeping the relative probabilities equal to each other!
I hope the above makes sense.
(3) Calculate the probability of each node as "probability of reaching the node" / "probability of terminating", because after eliminating cyclical paths, the total probability of terminating is no longer 1.
So given the above algorithm, here are the values found:
Red path: 1/3
Green path: 1/3
Blue path: 1/3 * 2/3 = 2/9
Probability to reach 3: 1/3
Probability to reach 2: 2/9 + 1/3 = 5/9
Total probability to terminate: 1/3 + 5/9 = 8/9
Hence, final probability to reach 3:
(1/3) / (8/9) = 3/8
Final probability to reach 2:
(5/9) / (8/9) = 5/8
If you are unsure about step (2), we can try it again!
Assume that we went from 1 to 4 and back to 1 again; this has a probability of 1/9.
From here, we can follow each coloured path again, multiplied by that 1/9 probability.
When combined with the probabilities calculated earlier, this gives us:
10/27 probability to reach 3.
50/81 probability to reach 2.
Total terminating probability of 80/81.
New probability of terminating at 3 is now (10/27) / (80/81) = 3/8 (SAME)
New probability of terminating at 2 is now (50/81) / (80/81) = 5/8 (SAME)
(In fact, summing over any number of round trips gives a geometric series: the total terminating probability is (8/9) * (1 + 1/9 + 1/81 + ...) = (8/9) / (1 - 1/9) = 1, so the 3/8 and 5/8 split is preserved exactly in the limit.)
However, an algorithm I found online gave the actual probabilities as (2/5) and (3/5) for 3 and 2 respectively (there is a slim chance it is wrong though). Turns out I used the online solution wrongly.
I realised my answer is actually pretty close, and I am not sure why it is wrong?
We can represent the transitions of the Markov chain with a matrix M. In Python notation, this would look like:
M = [[  0, 1/3, 1/3, 1/3],
     [  0,   1,   0,   0],
     [  0,   0,   1,   0],
     [1/3, 2/3,   0,   0]]
And the probabilities with a vector S, initially with 100% in state 1.
S = [1, 0, 0, 0]
Multiplying S by M gives the new probabilities:
S*M = [0, 1/3, 1/3, 1/3]
S*M**2 = [1/9, 5/9, 1/3, 0]
S*M**3 = [0, 16/27, 10/27, 1/27]
S*M**4 = [1/81, 50/81, 10/27, 0]
S*M**n = [3**(-n)*((-1)**n + 1)/2,
          3**(-n)*((-1)**n + 5*3**n - 6)/8,
          3**(-n)*(-(-1)**n + 3*3**n - 2)/8,
          3**(-n)*(1 - (-1)**n)/2]
In the limit with n going to infinity (the 3**(-n) terms vanish, whatever the parity of n), this would give
[0, 5/8, 3/8, 0]
Also starting with 1, 2, 3 and 4 with equal probability:
S = [1/4, 1/4, 1/4, 1/4]
S*M = [1/12, 1/2, 1/3, 1/12]
S*M**2 = [1/36, 7/12, 13/36, 1/36]
S*M**n = [3**(-n)/4, 5/8 - 3*3**(-n)/8, 3/8 - 3**(-n)/8, 3**(-n)/4]
leading to the same limit [0, 5/8, 3/8, 0].
Starting with 1 and 4 with equal probability:
S = [1/2, 0, 0, 1/2]
S*M = [1/6, 1/2, 1/6, 1/6]
S*M**2 = [1/18, 2/3, 2/9, 1/18]
S*M**n = [3**(-n)/2, 3/4 - 3*3**(-n)/4, 1/4 - 3**(-n)/4, 3**(-n)/2]
gives another limit for n going to infinity:
[0, 3/4, 1/4, 0]
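For reference, the same limits drop out of the standard absorbing-chain formula B = (I - Q)^(-1) R. A minimal R sketch for this chain, with transient states {1, 4} and absorbing states {2, 3}:
Q <- matrix(c(0,   1/3,      # 1 -> {1, 4}
              1/3, 0),       # 4 -> {1, 4}
            nrow = 2, byrow = TRUE)
R <- matrix(c(1/3, 1/3,      # 1 -> {2, 3}
              2/3, 0),       # 4 -> {2, 3}
            nrow = 2, byrow = TRUE)
B <- solve(diag(2) - Q) %*% R   # absorption probabilities per start state
B[1, ]   # from state 1: 0.625, 0.375 = 5/8, 3/8, matching [0, 5/8, 3/8, 0]
B[2, ]   # from state 4: 0.875, 0.125; averaging both rows gives [0, 3/4, 1/4, 0]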

Family Wise Error Rate controlled by method of maximum statistics

I am flipping each coin in a bag of 50 coins 100 times, and then I want to use the method of maximum statistics to determine the family-wise error rate (FWER). However, I keep getting an FWER of 1, which feels wrong.
coins <- rbinom(50, 100, 0.5)
So I start by defining a new function where we input how many times we do randomizations, the coins themselves, and how many times we flip them.
library(purrr)  # for map_df
library(broom)  # for tidy

simulate_max <- function(n_number_of_randomizations, input_coins, N_number_of_tosses, alpha = 0.05) {
maxList <- NULL
Then we run a for loop for the number of randomizations we have specified.
for (iteration in 1:n_number_of_randomizations){
Now we shuffle the list of coins
CoinIteration <- sample(input_coins)
Now we apply the binomial test to every coin in the bag
testresults <- map_df(CoinIteration, function(x) tidy(binom.test(x,N_number_of_tosses,p=alpha)) )
Now we want to add the maximum result from every test to the max list.
thisRandMax <- max(testresults$statistic)
maxList <- c(maxList, thisRandMax)
}
Finally, we iterate through every member of the maximum list to subtract the expected number of heads (i.e. 0.5 chance * 100 tosses = 50).
for (iterator2 in 1:length(maxList)){
maxList[iterator2]<-maxList[iterator2]-(0.5*N_number_of_tosses)
}
Return the output from the function
return(data.frame(maxList))
}
Now we apply this simulation for each of the requested iterations.
repsmax = map_df(1:Nreps, ~simulate_max(Nrandomizations,coins,Ntosses))
Now we calculate the FWER by dividing the number of positive entries by the total number of cells.
fwer = sum(repsmax>0) / (Nreps*Nrandomizations)
There are some issues that I think would be good to clarify.
A FWER of ~1 seems about right to me given the parameters of your experiment. FWER relates to Type I error, and for a single test at alpha = 0.05, FWER = 1 - P(no Type I error) = 1 - 0.95 = 0.05. For two tests at alpha = 0.05, FWER = 1 - 0.95^2 = 0.0975. You have 50 coins (50 tests), so your FWER at alpha = 0.05 is 1 - 0.95^50 = 0.923. If your code instead treats the 100 tosses as 100 tests, your FWER will be 1 - 0.95^100 = 0.994 (~1).
You can control for Type I error (account for multiple testing) by using e.g. the Bonferroni correction (alpha / n). If you change your alpha to "0.05 / 50" = 0.001, you will control your FWER (reduce it) to 0.05 (1 - 0.999^50 = ~0.049). I suspect this is the answer you are looking for: if alpha = 0.001 then FWER = 0.05 and you have an acceptable chance of incorrectly rejecting the null hypothesis.
I don't know what the "maximum estimate of the effect size" is, or how to calculate it, but given that the two distributions are approximately identical, the effect size will be ~ 0. It then makes sense that controlling FWER to 0.05 (by adjusting alpha to 0.001) is the 'answer' to the question and if you can get your code to reflect that logic, I think you'll have your solution.
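To see those numbers come out of a simulation, here is a small sketch of my own (not the poster's code): every coin is fair, each is tested against p = 0.5, and we count how often at least one test in the family rejects.
set.seed(1)
fwer <- function(alpha, n_sims = 1000, n_coins = 50, n_tosses = 100) {
  mean(replicate(n_sims, {
    heads <- rbinom(n_coins, n_tosses, 0.5)   # all null hypotheses are true
    pvals <- sapply(heads, function(h) binom.test(h, n_tosses, p = 0.5)$p.value)
    any(pvals < alpha)   # TRUE if the family commits at least one Type I error
  }))
}
fwer(0.05)       # high, near the 1 - 0.95^50 = 0.92 figure (the exact binomial
                 # test is slightly conservative, so a little lower in practice)
fwer(0.05 / 50)  # at or below the nominal 0.05 after Bonferroni correction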

How to solve a matrix equation in R

My friend and I (both non-R experts) are trying to solve a matrix equation in R. We have matrix y which is defined by:
y=matrix(c(0.003,0.977,0,0,0,0,0,0,0,0,0,0.02,0,0.0117,0.957,0,0,0,0,0,0,0,0,0.03,0,0,0.0067,0.917,0,0,0,0,0,0,0,0.055,0,0,0,0.045,0.901,0,0,0,0,0,0,0.063,0,0,0,0,0.0533,0.913,0,0,0,0,0,0.035,0,0,0,0,0,0.05,0,0,0,0,0,0.922,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.01,0,0,0,0,0,0,0,0,0,0,0,0,0.023,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0),
nrow=12, ncol=12, byrow=TRUE)
This matrix simulates the way students in our school pass on to the next year. By multiplying this matrix with a vector containing the number of students in each year, we get the number of students in each year one year later.
With the function:
sumfun<-function(x,start,end){
return(sum(x[start:end]))
}
We add up the number of students in each year to get the total number of students in our school. We want to fill in the vector (which we multiply with our matrix) with the number of students currently in the school, and have the number of new students (the first number of the vector) as our variable X.
For example:
sumfun(colSums(y*c(x,200,178,180,201,172,0,0,200,194,0,0)),2,6)
We want to equate this expression to 1000, the maximum number of students our school building can house. By doing this, we can calculate how many new students can be accepted by our school. We have no idea how to do this. We would guess X is somewhere between 100 and 300. We would be very grateful if somebody can help us with this!
I'm not familiar with R but I can guide through the main process of solving this matrix equation. Assuming that your matrix is called P:
And let the current student vector be called s0:
s0 = {x, 200, 178, 180, 201, 172, 0, 0, 200, 194, 0, 0};
Note that we leave x undefined as we want to solve for this variable later. Note that even though x is unknown, we can still multiply s0 with P. We call this new vector s1.
s1 = s0.P = {0.003*x, 2.34 + 0.977*x, 192.593, 173.326, 177.355, 192.113, 0, 0, 0, 0, 0, 192.749 + 0.02*x}
We can verify that this is correct: of the student years 2-6, only year 2 is affected by the number of new students (x). So if we now sum over the years 2-6 as in your example, we find that the sum is:
s1[2:6] = 737.727 + 0.977*x
All that is left is solving the trivial equation that s1[2:6] == 1000:
s1[2:6] == 1000
737.727 + 0.977*x == 1000
x = 268.447
Let me know if this is correct! This was all done in Mathematica.
The following code shows how to do this in R:
y=matrix(c(0.003,0.977,0,0,0,0,0,0,0,0,0,0.02,0,0.0117,0.957,0,0,0,0,0,0,0,0,0.03,0,0,0.0067,0.917,0,0,0,0,0,0,0,0.055,0,0,0,0.045,0.901,0,0,0,0,0,0,0.063,0,0,0,0,0.0533,0.913,0,0,0,0,0,0.035,0,0,0,0,0,0.05,0,0,0,0,0,0.922,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.01,0,0,0,0,0,0,0,0,0,0,0,0,0.023,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0),
nrow=12, ncol=12, byrow=TRUE)
sumfun<-function(x,start,end){
return(sum(x[start:end]))
}
students <- function(x) {
  total <- sumfun(colSums(y * c(x, 200, 178, 180, 201, 172, 0, 0, 200, 194, 0, 0)), 2, 6)
  return(total - 1000)
}
uniroot(students, lower=100, upper=300)$root;
The function uniroot finds where a function is 0. So if you define a function which returns the number of students for a value x, minus 1000, uniroot will find the x for which the number of students is 1000.
Note: this only describes the short-term behavior of the total number of students. To keep the number of students at 1000 in the long term, other equations must be solved.
I would suggest probing various x values and seeing the resulting totals. From that, you can see the trend and use it to figure out the answer. Here is an example:
# Sample data
y=matrix(c(0.003,0.977,0,0,0,0,0,0,0,0,0,0.02,0,0.0117,0.957,0,0,0,0,0,0,0,0,0.03,0,0,0.0067,0.917,0,0,0,0,0,0,0,0.055,0,0,0,0.045,0.901,0,0,0,0,0,0,0.063,0,0,0,0,0.0533,0.913,0,0,0,0,0,0.035,0,0,0,0,0,0.05,0,0,0,0,0,0.922,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.01,0,0,0,0,0,0,0,0,0,0,0,0,0.023,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0),
nrow=12, ncol=12, byrow=TRUE)
# function f will return the total number of students in years 2-6 for a given 'x'
f <- function(x) {
z <- c(x,200,178,180,201,172,0,0,200,194,0,0)
sum(t(y[,2:6]) %*% z)
}
# Let's see the plot
px <- 1:1000
py <- sapply(px,f) # will calculate the total number of students for each x from 1 to 1000
plot(px,py,type='l',lty=2)
# Analyze the matrices (the analysis is not shown here) and reproduce the linear trend
lines(px,f(0)+sum(y[1,2:6])*px,col='red',lty=4)
# obtain the answer using the linear trend
Xstudents <- (1000-f(0))/sum(y[1,2:6])
floor(Xstudents)

How can I get a cumulative measure of data in an increasing fashion

I have data with two columns, like the following:
Column_1 , Column_2
1 , 0.474124203822
2 , 0.545760430686
3 , 0.614420062696
4 , 0.654518950437
5 , 0.696226415094
6 , 0.6875
For simplicity, you can consider the data like
Column_2 = Probability of success when (X=column_1)
The relationship is somewhat increasing. Now, if I just plot the data up to 30 points as a line graph, I obtain a roughly increasing curve (plot not shown here).
Now, my question is: how can I plot my data in a cumulative fashion (and using what measure), like the following simple example?
col_1(age) , col_2(Total cumulative number of people <= age)
10 , 200
20 , 1000
30 , 5000
Please let me know if my description is not clear enough or if you have additional questions.
Given your probability mass function, you can compute the cumulative mass function as follows.
# Probability mass function
pmf <- c(0.1, 0.3, 0.2, 0.1, 0.3)
# Cumulative mass function, built as a running total
cmf <- numeric(length(pmf))
cmf[1] <- pmf[1]
for (i in 2:length(pmf)) {
  cmf[i] <- cmf[i - 1] + pmf[i]
}
Now simply plot your cumulative mass function instead of your probability mass function.
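In R specifically (using the toy pmf above), the running total is also a one-liner with the built-in cumsum, and plotting it gives the cumulative curve directly:
cmf <- cumsum(pmf)                      # 0.1 0.4 0.6 0.7 1.0
plot(seq_along(cmf), cmf, type = "l",
     xlab = "Column_1", ylab = "cumulative Column_2")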
