Create functions to automatically calculate probabilities with R - r

I want to automatically calculate probabilities with R. The rule: we start with 0 points and flip a coin. If it comes up heads, we get a point. If it comes up tails, we double our current score.
The functions I want to code:
Expected score after n flips (5 flips, 15 flips, ...)
After n flips, what is the probability the score is a power of two (Express this probability as a number between 0 and 1)?
Standard deviation
The expected standard deviation of the scores?
I want my functions to adapt to rule changes. For example, a 2/3 probability of heads and a 1/3 probability of tails. What is our expected score after 10 flips?

First, you want to think about what parameters the function needs to take. It appears it just needs to take the parameter n - the number of flips.
flips <- function(n){
}
Now, you can think about what needs to happen inside the function.
start with 0 points
add 1 if heads
double if tails
You also need to be able to do this n times, so it will need to be in a loop.
flips <- function(n){
## start with 0
sum <- 0
for(i in 1:n){
# create a flip (random draw of H or T)
flip <- sample(c("H", "T"), 1)
# identify what to do if flip is H
if(flip == "H"){
# increment sum by 1
sum <- sum + 1
# identify what to do if flip is not H (i.e., it is T)
}else{
sum <- sum*2
}
}
# return the sum
sum
}
flips(10)
# [1] 28
A function like this simulates what happens after n trials. That said, it seems like the questions you're trying to answer are more theoretical than they are about coding. If you can specify the operations you need to do, then we could probably help you code it.
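For a moderate number of flips, the probabilities can also be computed exactly instead of simulated. The following is a minimal sketch, not part of the original answer: it tracks the full distribution of scores flip by flip. The names score_distribution, prob_power_of_two and p_heads are hypothetical, and it assumes that 1 = 2^0 counts as a power of two.

```r
# Exact distribution of the score after n flips.
# p_heads is the probability of heads (add 1); tails doubles the score.
score_distribution <- function(n, p_heads = 0.5) {
  scores <- 0  # possible scores so far
  probs <- 1   # probability of each score
  for (k in seq_len(n)) {
    # every current score branches into "heads" (+1) and "tails" (*2)
    new_scores <- c(scores + 1, scores * 2)
    new_probs <- c(probs * p_heads, probs * (1 - p_heads))
    # merge branches that end on the same score
    agg <- tapply(new_probs, new_scores, sum)
    scores <- as.numeric(names(agg))
    probs <- as.vector(agg)
  }
  data.frame(score = scores, prob = probs)
}

# Probability that the score is a power of two (assumption: 1 = 2^0 counts).
prob_power_of_two <- function(n, p_heads = 0.5) {
  d <- score_distribution(n, p_heads)
  sum(d$prob[d$score %in% 2^(0:n)])
}

prob_power_of_two(2)  # 0.75: outcomes HH = 2, HT = 2, TH = 1, TT = 0
```

The expected score and standard deviation follow from the same table, e.g. sum(d$score * d$prob). The number of distinct scores grows quickly with n, so this approach is only practical for fairly small n.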

You could start by building a function f like the one below, which draws a sequence of n random flips (0 denotes heads and 1 denotes tails; p is the probability of heads) and returns the final score:
f <- function(n,p) {
v <- sample(c(0,1),n,replace = TRUE,prob = c(p,1-p))
s <- 0
for (i in v) {
if (i == 1) {
s <- s*2
} else {
s <- s + 1
}
}
s
}
and then you can apply replicate to repeat the experiment, e.g.,
n <- 20
p <- 2/3
r <- replicate(1e6,f(n,p))
We will see
> mean(r)
[1] 629.074
> sd(r)
[1] 1326.681
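As a cross-check on the simulation, the mean and standard deviation can be obtained without random draws. This is a sketch under the stated rules, not the answerer's method: with S the score before a flip, the next score is S + 1 with probability p and 2S with probability 1 - p, so E[S] and E[S^2] each satisfy a simple recurrence. The name score_moments is hypothetical.

```r
# Exact mean and standard deviation of the score after n flips.
# p_heads is the probability of heads (add 1); tails doubles the score.
score_moments <- function(n, p_heads = 0.5) {
  m1 <- 0  # E[S]
  m2 <- 0  # E[S^2]
  for (k in seq_len(n)) {
    # E[(S + 1)^2] = E[S^2] + 2 E[S] + 1 and E[(2S)^2] = 4 E[S^2]
    new_m2 <- p_heads * (m2 + 2 * m1 + 1) + (1 - p_heads) * 4 * m2
    new_m1 <- p_heads * (m1 + 1) + (1 - p_heads) * 2 * m1
    m1 <- new_m1
    m2 <- new_m2
  }
  c(mean = m1, sd = sqrt(m2 - m1^2))
}

score_moments(20, 2/3)  # compare with mean(r) and sd(r) above
score_moments(10, 2/3)  # the "10 flips, 2/3 heads" case from the question
```

Because the score distribution has a long right tail, the simulated sd can differ noticeably from the exact value even with 1e6 replicates.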

Related

Inverse CDF method to simulate a random sample

I have a problem where I have written this piece of code, however I think there might be an issue with it.
This is the question:
Write an R function called pr1 that simulates a random sample of size n from the distribution with the CDF which is given as..
F_X(x) = 0 for x <= 10
F_X(x) = (x-10)^3/1000 for 10 < x < 20
F_X(x) = 1 for x >= 20
I have used the inverse CDF method here to get x = 10 * (1 + u^(1/3)), and I now want to simulate a random sample of size n from the distribution.
Here is my code:
pr1 = function(n)
{ u = runif(n,0,1)
x = 10 * ( 1 + u^(1/3))
x }
pr1(5)
#This was just to check an example with n=5
My question is: since the CDF is restricted to 10 < x < 20, will this affect my code in any way?
Thank you
Are you confusing the range of X with the sample size? The former is restricted to the range (10, 20), the latter can be any positive integer.
You can do a sanity check on your inversion by considering U = 0, which should (and does) yield the minimum of the range of X, and U = 1, which should and does yield the maximum value of the range. There is no need to range restrict your inversion, the restriction is built into the use of U(0,1)'s on the input side, combined with the fact that CDFs are monotonically non-decreasing. Thus no value of U such that 0 < U < 1 can yield an outcome outside the range 10 < X < 20.
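The sanity check described above can also be run numerically. This is a small sketch, assuming the pr1 from the question and a hypothetical helper F_X for the stated CDF: it confirms that every draw lies in the range and that the empirical CDF of a large sample stays close to F_X.

```r
set.seed(1)

# Inverse-CDF sampler from the question
pr1 <- function(n) {
  u <- runif(n, 0, 1)
  10 * (1 + u^(1/3))
}

# The target CDF, written in vectorized form (hypothetical helper)
F_X <- function(x) ifelse(x <= 10, 0, ifelse(x < 20, (x - 10)^3 / 1000, 1))

x <- sort(pr1(1e5))
range(x)                          # should lie inside (10, 20)
max_gap <- max(abs(ecdf(x)(x) - F_X(x)))
max_gap                           # should be small for a large sample
```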
Since you want to simulate a piece-wise function, your R function should contain some flow controls like if.
Here's a start:
pr1 = function(n, drawing_range){
  x <- sample(drawing_range, size = n) # random drawing of x
  # evaluate the piecewise function at every drawn x
  # (vectorized with ifelse; R has no chained comparison like 10 < x < 20)
  output <- ifelse(x <= 10, 0,
            ifelse(x < 20, (x-10)^3/1000, 1))
  output
}
n is the number of draws. drawing_range is the population from which you draw; for example it can be from [-999, 999] in which case you input -999:999.

Simulating walk around a square

I am trying to simulate a walk around a square such that the probability of walking to a vertex adjacent in the square is p/2 going left and p/2 going right, then $1-p$ going diagonally. I've written some code to simulate this and made a function to calculate the number of occurrences of a subset of vertices. When using the starting vertex as the subset, it's getting close to the value of 0.25 that theory tells me I should expect.
move_func <- function(x, n){
  pl <- c(x)
  for(i in 1:n){
    # produces matrix of vectors in proportion to probabilities
    k <- cbind(replicate(p*1000, c(1,0)), replicate(p*1000, c(0,1)), replicate((1-p)*2000, c(1,1)))
    # samples from matrix to get 2 x 1 vector representing a movement around the square
    x <- (x + k[,sample(ncol(k), 1)])%%2
    pl <- cbind(pl, x) # adds new movement to matrix of vertices visited
  }
  return(pl) # returns final matrix with all vertices visited on the walk
}
test_in_H <- function(x, H){
  sum(apply(H,2,identical, x = x)) # tests if vector x is in subset H (a matrix)
}
pl <- move_func(x, n)
# repeats the simulation 100 times and finds the average
print(mean(replicate(100, sum(apply(pl, 2, test_in_H, H = H ))/n)))
}
However, the result varies a lot around this value, even when the walk takes many steps and is repeated many times.
Does anyone know if I'm making a mistake in my code, or is it simply going to vary quite a bit because it is a simulation?
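One thing worth checking in the code as posted is that pl is computed once, before replicate, so the 100 repetitions appear to reuse the same walk rather than simulating a new one each time. The following is a minimal sketch of an alternative, not the poster's code: the names walk_square and frac_at_start are hypothetical, and the two adjacent moves are assumed to be c(1,0) and c(0,1) modulo 2, as in the question.

```r
# Simulate n steps on the square's vertices (0,0), (1,0), (0,1), (1,1).
walk_square <- function(start, n, p) {
  moves <- rbind(c(1, 0), c(0, 1), c(1, 1))  # two adjacent moves, one diagonal
  probs <- c(p / 2, p / 2, 1 - p)
  idx <- sample(3, n, replace = TRUE, prob = probs)  # which move at each step
  path <- matrix(0, nrow = n + 1, ncol = 2)
  path[1, ] <- start
  for (i in seq_len(n)) {
    path[i + 1, ] <- (path[i, ] + moves[idx[i], ]) %% 2
  }
  path
}

# Fraction of the n steps that land on the starting vertex.
frac_at_start <- function(start, n, p) {
  path <- walk_square(start, n, p)
  mean(path[-1, 1] == start[1] & path[-1, 2] == start[2])
}

set.seed(42)
# A fresh walk is simulated inside every replicate() call
est <- replicate(100, frac_at_start(c(0, 0), n = 1000, p = 0.5))
mean(est)  # should be close to the theoretical 0.25
sd(est)    # spread across the 100 independent walks
```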

R generate clustered pseudo-random numbers

In R, I am trying to figure out a way to generate vectors with values 0 or 1. Rather than drawing each 0 and 1 independently from a uniform distribution, I would like the 1s to come clustered, e.g. (1,0,0,0,0,0,1,0,1,1,1,1,0,1,0,0,0,0,1,0,0,0,...). In its simplest form, something like: "if the previous number was 1, then increase the likelihood of drawing 1". Or make the chance of drawing 1 depend on the sum of the last, say, 5 numbers drawn. Is there an efficient way of doing this, maybe even a package? It would be reminiscent of rbinom(n,1,prob) with variable prob.
You can try the following method using a loop. First you can create a variable called "x" using sample which will assign an initial value of 0 or 1.
Within the loop you can use the sample function again, but this time you assign values to the prob option. For this purpose I've set the probability to 70/30 split (ie if your previous number was a 0, there is a 70% chance that the next number will be a 0 and vice versa if your previous value was 1.)
x = sample(c(0,1),1)
for(i in 2:100){
if(x[i-1] == 0){
x[i] = sample(c(0,1),1,prob=c(0.7,0.3))
} else {
x[i] = sample(c(0,1),1,prob=c(0.3,0.7))
}
}
x[1:20]
[1] 1 1 1 0 0 0 0 0 1 1 1 0 1 0 0 0 1 1 0 0
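The loop above can be wrapped in a function so that the strength of the clustering is adjustable. This is a sketch only: the name clustered_bits and the parameter p_stay (the probability of repeating the previous value, 0.7 in the example above) are hypothetical, and n is assumed to be at least 1.

```r
# Two-state chain: each value repeats the previous one with probability p_stay.
clustered_bits <- function(n, p_stay = 0.7) {
  x <- integer(n)
  x[1] <- sample(0:1, 1)                 # initial value, 0 or 1
  for (i in seq_len(n)[-1]) {
    stay <- runif(1) < p_stay            # TRUE with probability p_stay
    x[i] <- if (stay) x[i - 1] else 1L - x[i - 1]
  }
  x
}

set.seed(1)
x <- clustered_bits(1e4, p_stay = 0.7)
mean(x[-1] == x[-length(x)])  # fraction of repeats, should be near 0.7
```

Larger values of p_stay give longer runs of identical values; p_stay = 0.5 corresponds to independent fair draws.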
So I took good inspiration from Colin Charles, and added a little adjustability. There are obviously many ways to compute prob as being influenced by prior draws. I ended up using a cutoff m of the sum of the last w draws to determine whether to use low prob p0 or high prob p1 for each 0/1 to make vector of length l.
f <- function (l, w, m, p0, p1){
v = rbinom(w,1,p0) # Initialize with p0
for (i in w:(l-1)){
v[i+1] <- ifelse(sum(v[(i-w+1):i]) > m,
rbinom(1,1,p1),
rbinom(1,1,p0))
}
return(v)
}
#Test:
set.seed(8)
plot(f(100, 5, 1, 0.1, 0.6)) #Clustered
plot(f(100, 5, 2, 0.1, 0.4)) #Less clustered
[Plots: the first vector shows clustered 1s; the second is less clustered.]

Calculating pmf and cdf for 20 sided dice in R

I would like to create two functions that would calculate the probability mass function (pmf) and cumulative distribution function (cdf) for a dice of 20 sides.
Each function would take one argument, y, for the side (a number from 1 to 20). I should be able to pass a vector and get back the value for each of its elements.
If the value entered is non-discrete, it should then return zero in the result and a warning message.
This is what I have so far for the PMF:
PMF= function(side) {
a = NULL
for (i in side)
{
a= dbinom(1, size=1, prob=1/20)
print(a)
}
}
And this is what I got for CDF:
CDF= function(side) {
a = NULL
for (i in side)
{
a= pnorm(side)
print(a)
}
}
I am currently stuck on the warning message and the zero in the result. How can I add that behaviour to the function?
Next, how can I plot these two functions on the same plot over a specific interval (for example 1 to 12)?
Did I use the right functions for calculating the CDF and PMF?
I would propose the following simplifications:
PMF <- function(side) {
x <- rep(0.05, length(side))
bad_sides <- ! side %in% 1:20 # sides that aren't in 1:20 are bad
x[bad_sides] <- 0 # set bad sides to 0
# warnings use the warning() function. See ?warning for details
if (any(bad_sides)) warning("Sides not integers between 1 and 20 have 0 probability!")
# print results is probably not what you want, we'll return them instead.
return(x)
}
For the CDF, I assume you mean the probability of rolling a number less than or equal to the side given, which is side / 20. (pnorm is the wrong function... it gives the CDF of the normal distribution.)
CDF <- function(side) {
return(pmin(1, pmax(0, floor(side) / 20)))
}
Technically, the CDF is defined for non-integer values. The CDF of 1.2 is just the same as the CDF of 1, so I use floor here. The pmin(1, ...) and pmax(0, ...) calls make it more robust: they make sure the result doesn't exceed 1 for values over 20 and isn't negative for negative values.
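A quick check of the two functions on vector input, including values outside 1:20, might look like the following. The definitions are repeated from above so the snippet runs on its own; the test inputs are arbitrary.

```r
PMF <- function(side) {
  x <- rep(0.05, length(side))
  bad_sides <- !side %in% 1:20   # sides that aren't in 1:20 are bad
  x[bad_sides] <- 0
  if (any(bad_sides)) warning("Sides not integers between 1 and 20 have 0 probability!")
  x
}

CDF <- function(side) {
  pmin(1, pmax(0, floor(side) / 20))
}

pmf_vals <- suppressWarnings(PMF(c(1, 2.5, 21)))   # 0.05 0.00 0.00
cdf_vals <- CDF(c(0, 1, 1.2, 10, 20, 25))          # 0.00 0.05 0.05 0.50 1.00 1.00

# On the valid sides, the CDF should be the running total of the PMF
all.equal(cumsum(PMF(1:20)), CDF(1:20))
```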
Plotting:
my_interval <- 1:12
plot(range(my_interval), c(0, 1), type = "n")
points(my_interval, PMF(my_interval))
lines(my_interval, CDF(my_interval), type = "s")

To find the distance between two roots in R

Suppose I have a function f(x) that is well defined on an interval I. I want to find the greatest and smallest roots of f(x), then take the difference between them. What is a good way to program this?
To be precise, f can at worst be a rational function like (1+x)/(1-x). It should be a (high-degree) polynomial most of the time. I only need to know the result numerically to some precision.
I am thinking about the following:
1. Convert f(x) into a form recognizable by R. (I can do this.)
2. Use R to list all roots of f(x) on I. (I found that the uniroot function only gives me one root.)
3. Use R to find the maximum and minimum elements in the list. (This should be possible once I have converted it to a vector.)
4. Take the difference of the two roots. (This should be trivial.)
I am stuck on step (2) and I do not know what to do. My professor gave a brute-force solution, suggesting that I:
1. Divide the interval I into one million pieces.
2. Evaluate f at each end point and find the end points where f>=0.
3. Choose the maximum and minimum elements from the set formed in step 2.
4. Take the difference between them.
I feel this way is not very efficient and might not work for all f in general, but I am having trouble implementing it even for quadratics. I do not know how to do step (2) either. So I want to ask for a hint or some toy examples.
At this point I am trying to implement the following code:
Y=rep(0,200)
dim(Y)=c(100,2)
for(i in 1:100){
X=rnorm(9,0,1)
Z=rnorm(16,0,1)
a=0.64
b=a*sum(Z^2)/sum(X^2)
root_intervals <- function(f, interval, n = 1e6) {
xvals <- seq(interval[1], interval[2], length = n)
yvals <- f(xvals)
ypos <- yvals > 0
x1 <- which(tail(ypos, -1) != head(ypos, -1))
x2 <- x1 + 1
## so all the zeroes we can see are between x1 and x2
return(cbind(xvals[x1], xvals[x2]))
}
Up to here everything is okay, but when I try to extract the roots into Y[i,1] and Y[i,2] with
Y[i,1]=(ri<-root intervals(function(x)(x/(a*x+b))^{9/2}*(1/((1-a)+a*(1-a)/b*x))^4-0.235505, c(0,40),n=1e6)[1]
I find I cannot evaluate it anymore. R keeps telling me
Error: unexpected symbol in:
"}
Y[i,1]=(ri<-root intervals"
and I got stuck. I would really appreciate everyone's help, as I am feeling lost.
I checked the function's expression many times using the plot function and it has no syntax mistakes. Also, I believe it is well defined for all x in the interval.
This should give you a good start on the brute force solution. You're right, it's not elegant, but for relatively simple univariate functions, evaluating 1 million points is trivial.
root_intervals <- function(f, interval, n = 1e6) {
xvals <- seq(interval[1], interval[2], length = n)
yvals <- f(xvals)
ypos <- yvals > 0
x1 <- which(ypos[-1] != head(ypos, -1))
x2 <- x1 + 1
## so all the zeroes we can see are between x1 and x2
return(cbind(xvals[x1], xvals[x2]))
}
This function returns a two column matrix of x values, where the function changes sign between column 1 and column 2:
f1 <- function (x) 0.05 * x^5 - 2 * x^4 + x^3 - x^2 + 1
> (ri <- root_intervals(f1, c(-10, 10), n = 1e6))
[,1] [,2]
[1,] -0.6372706 -0.6372506
[2,] 0.8182708 0.8182908
> f1(ri)
[,1] [,2]
[1,] -3.045326e-05 6.163467e-05
[2,] 2.218895e-05 -5.579081e-05
Wolfram Alpha confirms results on the specified interval.
The top and bottom rows will be the min and max intervals found. These intervals (over which the function changes sign) are precisely what uniroot wants for its interval, so you could use it to solve for the (more) exact roots. Of course, if the function changes sign twice within one interval (or any even number of times), it won't be picked up, so choose a big n!
Response to edited question:
Looks like you're trying to define a bunch of functions, but your edits have syntax errors. Here's what I think you're trying to do: (this first part might take some more work to work right)
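my_funs <- list()
Y <- matrix(0, nrow = 100, ncol = 2)
# factory so each function keeps its own a and b
# (functions defined directly in the loop would all see the last a and b)
make_fun <- function(a, b) {
  force(a); force(b)
  function(x){(x/(a*x+b))^{9/2}*(1/((1-a)+a*(1-a)/b*x))^4-0.235505}
}
for(i in 1:100){
  X <- rnorm(9,0,1)
  Z <- rnorm(16,0,1)
  a <- 0.64
  b <- a*sum(Z^2)/sum(X^2)
  my_funs[[i]] <- make_fun(a, b)
}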
Here's using the root_intervals on the first of your generated functions.
> root_intervals(my_funs[[1]], interval = c(0, 40))
[,1] [,2]
[1,] 0.8581609 0.8582009
[2,] 11.4401314 11.4401714
Notice the output, a matrix, with the roots of the function being between the first and second columns. Being a matrix, you can't assign it to a vector. If you want a single root, use uniroot using each row to set the upper and lower bounds. This is left as an exercise to the reader.
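One possible way to do that last step, sketched on the polynomial f1 from earlier rather than on the randomly generated functions: pass each row of the matrix to uniroot as its interval, then take the difference between the largest and smallest root. The grid size n = 1e5 and the tolerance are arbitrary choices for this illustration.

```r
# Brute-force bracketing, as in the answer above
root_intervals <- function(f, interval, n = 1e6) {
  xvals <- seq(interval[1], interval[2], length.out = n)
  ypos <- f(xvals) > 0
  x1 <- which(ypos[-1] != head(ypos, -1))  # sign changes between x1 and x1 + 1
  cbind(xvals[x1], xvals[x1 + 1])
}

f1 <- function(x) 0.05 * x^5 - 2 * x^4 + x^3 - x^2 + 1

ri <- root_intervals(f1, c(-10, 10), n = 1e5)

# Refine each bracketing interval (one row of ri) with uniroot
roots <- apply(ri, 1, function(iv) uniroot(f1, interval = iv, tol = 1e-10)$root)

roots               # approximately -0.6373 and 0.8183, as found above
diff(range(roots))  # distance between the smallest and largest root
```

The same pattern should apply to each element of my_funs, with the results stored row by row in Y, provided each function changes sign exactly twice on the interval.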
