Calculating the PMF and CDF for a 20-sided die in R

I would like to create two functions that calculate the probability mass function (PMF) and cumulative distribution function (CDF) for a 20-sided die.
Each function would take one argument, the side (a number from 1 to 20). I should be able to pass in a vector and get back a value for each element.
If a value entered is not a valid discrete side, the function should return zero for it and issue a warning message.
This is what I have solved so far for the PMF:
PMF <- function(side) {
  a <- NULL
  for (i in side) {
    a <- dbinom(1, size = 1, prob = 1/20)
    print(a)
  }
}
And this is what I got for CDF:
CDF <- function(side) {
  a <- NULL
  for (i in side) {
    a <- pnorm(side)
    print(a)
  }
}
I am currently stuck on the warning message and on returning zero. How can I add those to the functions?
Next, how can I plot the two functions on the same plot over a specific interval (for example 1 to 12)?
Did I use the right functions for calculating the CDF and PMF?

I would propose the following simplifications:
PMF <- function(side) {
  x <- rep(0.05, length(side))
  bad_sides <- !side %in% 1:20  # sides that aren't in 1:20 are bad
  x[bad_sides] <- 0             # set bad sides to 0
  # warnings use the warning() function. See ?warning for details
  if (any(bad_sides)) warning("Sides not integers between 1 and 20 have 0 probability!")
  # printing the results is probably not what you want; we return them instead
  return(x)
}
For the CDF, I assume you mean the probability of rolling a number less than or equal to the side given, which is side / 20. (pnorm is the wrong function... it gives the CDF of the normal distribution.)
CDF <- function(side) {
  return(pmin(1, pmax(0, floor(side) / 20)))
}
Technically, the CDF is defined for non-integer values: the CDF of 1.2 is the same as the CDF of 1, so I use floor() here. The pmin()/pmax() wrapper makes the function more robust by clamping the result to [0, 1], so it never exceeds 1 and is never negative. Alternatively, you could simply avoid passing it negative values or values over 20.
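A quick usage check (my own illustration) of the two functions above:
PMF(c(1, 4.5, 25))  # returns 0.05 0.00 0.00, with a warning about the invalid sides
CDF(c(1, 10, 20))   # returns 0.05 0.50 1.00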
Plotting:
my_interval <- 1:12
plot(range(my_interval), c(0, 1), type = "n")
points(my_interval, PMF(my_interval))
lines(my_interval, CDF(my_interval), type = "s")
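(The type = "s" argument draws the CDF as a step function, which is the appropriate shape for a discrete distribution.)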

Related

Trying to plot loglikelihood of Cauchy distribution for different values of theta in R

I am trying to plot the log-likelihood function of the Cauchy distribution for varying values of theta (location parameter). These are my observations:
obs<-c(1.77,-0.23,2.76,3.80,3.47,56.75,-1.34,4.24,3.29,3.71,-2.40,4.53,-0.07,-1.05,-13.87,-2.53,-1.74,0.27,43.21)
Here is my log-likelihood function:
ll_c <- function(theta, x_values) {
  n <- length(x_values)
  logl <- -n * log(pi) - sum(log(1 + (x_values - theta)^2))
  return(logl)
}
and I've tried making a plot using this code:
x <- seq(from = -10, to = 10, by = 0.1); length(x)
theta_null <- NULL
for (i in x) {
  theta_log <- ll_c(i, counts)
  theta_null <- c(theta_null, theta_log)
}
plot(theta_null)
The graph does not look right and for some reason the length of x and theta_null differs.
I am assuming that theta is your location parameter (the scale is set to 1 in my example). You should obtain the same result using a t-distribution with 1 df and shifting the observations by theta. I left some comments in the code as guidance.
obs <- c(1.77, -0.23, 2.76, 3.80, 3.47, 56.75, -1.34, 4.24, 3.29, 3.71, -2.40, 4.53, -0.07, -1.05, -13.87, -2.53, -1.74, 0.27, 43.21)
ll_c <- function(theta, obs) {
  # Compute the log-likelihood of obs for a given value of theta (location)
  logl <- sum(dcauchy(obs, location = theta, scale = 1, log = TRUE))
  return(logl)
}
# Loop over possible values of theta (obs given)
x <- seq(from = -10, to = 10, by = 0.1)
ll <- NULL
for (i in x) {
  ll <- c(ll, ll_c(i, obs))
}
# Plot the log-likelihood against the possible values of theta
plot(x, ll)
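As an aside (my addition, not part of the original answer), the explicit loop can be replaced by sapply, which gives the same values:
ll <- sapply(x, ll_c, obs = obs)
plot(x, ll, type = "l", xlab = "theta", ylab = "log-likelihood")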
It is hard to say exactly what you are experiencing without more info. But I'll make an educated guess.
First of all, we can simplify this a lot by using the *t family of functions for the t distribution, as the Cauchy distribution is just the t distribution with df = 1. So your calculations could have been done using
for (i in ncp)
  theta_null <- c(theta_null, sum(dt(values, 1, i, log = TRUE)))
Note that the constant -n*log(pi) term doesn't actually matter for any practical purpose. We are usually interested in minimizing/maximizing the likelihood, in which case additive constants are irrelevant.
Now if we use this approach, we can quite quickly notice something by printing the values:
print(head(theta_null))
[1] -Inf -Inf -Inf -Inf -Inf -Inf
So I am assuming what you are experiencing is that many of your values are "almost" negative infinity, and maybe these are not stored correctly in your outcome vector. I can't see that this should be the case from your code, but this would be my initial guess.

Create functions to automatically calculate probabilities with R

I want to automatically calculate the probability with R. Rule: start with 0 points. We flip a coin. If it comes up heads, we get a point. If it comes up tails, we double our current score.
The functions I want to code:
The expected score after n flips (5 flips, 15 flips, ...)
After n flips, what is the probability that the score is a power of two (expressed as a number between 0 and 1)?
The standard deviation
The expected standard deviation of the scores?
I want my functions to adapt to rule changes. For example, with a 2/3 probability of heads and a 1/3 probability of tails, what is our expected score after 10 flips?
First, you want to think about what parameters the function needs to take. It appears it just needs to take the parameter n - the number of flips.
flips <- function(n){
}
Now, you can think about what needs to happen inside the function.
start with 0 points
add 1 if heads
double if tails
You also need to be able to do this n times, so it will need to be in a loop.
flips <- function(n){
  ## start with 0
  sum <- 0
  for (i in 1:n) {
    # create a flip (random draw of H or T)
    flip <- sample(c("H", "T"), 1)
    # identify what to do if flip is H
    if (flip == "H") {
      # increment sum by 1
      sum <- sum + 1
      # identify what to do if flip is not H (i.e., it is T)
    } else {
      sum <- sum * 2
    }
  }
  # return the sum
  sum
}
flips(10)
# [1] 28
A function like this simulates what happens after n trials. That said, it seems like the questions you're trying to answer are more theoretical than they are about coding. If you can specify the operations you need to do, then we could probably help you code it.
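As a quick Monte Carlo sketch (my addition, reusing the flips() function above; the number of replications is arbitrary), the expected score and its standard deviation after 10 flips can be estimated like this:
scores <- replicate(1e4, flips(10))
mean(scores)  # Monte Carlo estimate of the expected score after 10 flips
sd(scores)    # Monte Carlo estimate of the standard deviation of the score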
Maybe you can start by building a function f like the one below, which produces a series of random variables, where 0 and 1 denote heads and tails respectively:
f <- function(n, p) {
  v <- sample(c(0, 1), n, replace = TRUE, prob = c(p, 1 - p))
  s <- 0
  for (i in v) {
    if (i == 1) {
      s <- s * 2
    } else {
      s <- s + 1
    }
  }
  s
}
and then you can apply replicate to repeat the experiment, e.g.,
n <- 20
p <- 2/3
r <- replicate(1e6,f(n,p))
We will see
> mean(r)
[1] 629.074
> sd(r)
[1] 1326.681
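Reusing r and n from above, the probability that the final score is a power of two can also be estimated from the same simulations (a sketch of my own; note that a score of 0 is not counted as a power of two):
powers <- 2^(0:(n - 1))  # one head followed by repeated doubling reaches at most 2^(n - 1)
mean(r %in% powers)      # Monte Carlo estimate of P(score is a power of two)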

How to calculate Kullback-Leibler divergence of kernel estimates in R

I used kernel estimation to get a nonparametric probability density function. Now I want to compare the tail 'distance' between two kernel density estimates of continuous variables, using the Kullback-Leibler divergence. I have tried the following code:
kl_l <- function(x, y) {
  integrand <- function(x, y) {
    f.x <- fitted(density(x, bw = "nrd0"))
    f.y <- fitted(density(y, bw = "nrd0"))
    return((log(f.x) - log(f.y)) * f.x)
  }
  return(integrate(integrand, lower = -Inf, upper = quantile(density(x, bw = "nrd0"), 0.25))$value)
  # the Kullback-Leibler equation
}
When I run kl_l(a, b) for a, b = 19 continuous variables, it returns an error:
Error in density(y, bw = "nrd0") : argument "y" is missing, with no default
Is there any way to calculate this?
(If anyone wants to see the actual equation: https://www.bankofengland.co.uk/-/media/boe/files/working-paper/2019/attention-to-the-tails-global-financial-conditions-and-exchange-rate-risks.pdf page 13.)
In short, I think you just need to move the f.x and f.y outside the integrand (and possibly replace fitted with approxfun):
kl_l <- function(x, y) {
  f.x <- approxfun(density(x, bw = "nrd0"))
  f.y <- approxfun(density(y, bw = "nrd0"))
  integrand <- function(z) {
    return((log(f.x(z)) - log(f.y(z))) * f.x(z))
  }
  return(integrate(integrand, lower = -Inf, upper = quantile(density(x, bw = "nrd0"), 0.25))$value)
  # the Kullback-Leibler equation
}
Expanding a little:
Looking at the paper you referenced, it appears as though you need to first create the two fitted distributions f and g. So if your variable a contains observations under the 1-standard-deviation increase in global financial conditions, and b contains the observations under average global financial conditions, you can create two functions as in your example:
f <- approxfun(density(a))
g <- approxfun(density(b))
Then define the integrand:
integrand <- function(x) log(f(x) / g(x)) * f(x)
The upper bound:
upper <- quantile(density(b, bw = "nrd0"), 0.25)
And finally do the integration on x within the specified bounds. Note that each value of x in the numerical computation has to go into both f and g; in your function kl_l, x and y went into the integrand separately, which I think is incorrect, and in any case integrate only operates on the first argument of the integrand.
integrate(integrand, lower = -Inf, upper = upper)$value
One thing to check for is that approxfun returns NA for values outside the range specified in the density, which can mess up your operation, so you'll need to adjust for those (if you expect the density to go to zero, for example).
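One way to handle that (a sketch of my own, assuming the densities can be treated as zero outside the estimated range, with a and b as above; I also use the empirical lower quartile of b as the upper limit and the lower end of the density grid as the lower limit, rather than -Inf):
f <- approxfun(density(a), yleft = 0, yright = 0)  # zero outside the estimated range
g <- approxfun(density(b), yleft = 0, yright = 0)
integrand <- function(x) {
  fx <- f(x)
  gx <- g(x)
  ifelse(fx > 0 & gx > 0, log(fx / gx) * fx, 0)  # simplification: drop terms where either density vanishes
}
lower <- min(density(a)$x)
upper <- quantile(b, 0.25)  # empirical lower quartile of b
integrate(integrand, lower = lower, upper = upper)$value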

pth Quantile of Standard Normal Distribution - R

I'm learning statistics and R from a book called "Discovering Statistics using R"... Although it's very informative, it seems to skip over areas even though it suggests no prior knowledge of statistics or R is needed. So to the problem:
How can you calculate, in R, the pth quantile of the standard normal distribution using the dichotomy (division in halves) method, assuming no use of qnorm()? That is, solve:
pnorm(x) = p
pnorm(x) - p = 0
f(x) = 0
Update:
Dichotomy is a method where you take an interval [a,b] at whose endpoints the function takes values of different signs and which contains a single root x. You then halve it and evaluate f(x1); if f(x1) != 0, this gives you [a,x1] and [x1,b]... and the sequence x1, x2, ... converges to the root.
Clumsy, but this works:
tolerance <- 1e-6
interval <- c(-1000, 1000)
quantile <- 0.2
while (interval[2] - interval[1] > tolerance) {
  cat('current interval: ', interval, '\n')
  interval.left <- c(interval[1], mean(interval))
  interval.right <- c(mean(interval), interval[2])
  if (sum(sign(pnorm(interval.left) - quantile)) == 0) {
    interval <- interval.left
  } else {
    interval <- interval.right
  }
}
mean(interval)
qnorm(quantile)
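The same idea can be wrapped in a small reusable function (a sketch; the function name, default interval, and tolerance are my own choices):
qnorm_bisect <- function(p, lower = -10, upper = 10, tol = 1e-8) {
  while (upper - lower > tol) {
    mid <- (lower + upper) / 2
    if (pnorm(mid) < p) lower <- mid else upper <- mid
  }
  (lower + upper) / 2
}
qnorm_bisect(0.2)  # compare with qnorm(0.2)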

How do I best simulate an arbitrary univariate random variate using its probability function?

In R, what's the best way to simulate an arbitrary univariate random variate if only its probability density function is available?
Here is a (slow) implementation of the inverse cdf method when you are only given a density.
den <- dnorm  # replace with your own density
# calculates the cdf by numerical integration
cdf <- function(x) integrate(den, -Inf, x)[[1]]
# inverts the cdf
inverse.cdf <- function(x, cdf, starting.value = 0) {
  lower.found <- FALSE
  lower <- starting.value
  while (!lower.found) {
    if (cdf(lower) >= (x - .000001))
      lower <- lower - (lower - starting.value)^2 - 1
    else
      lower.found <- TRUE
  }
  upper.found <- FALSE
  upper <- starting.value
  while (!upper.found) {
    if (cdf(upper) <= (x + .000001))
      upper <- upper + (upper - starting.value)^2 + 1
    else
      upper.found <- TRUE
  }
  uniroot(function(y) cdf(y) - x, c(lower, upper))$root
}
# generates 1000 random variables of distribution 'den'
vars <- apply(matrix(runif(1000)), 1, function(x) inverse.cdf(x, cdf))
hist(vars)
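For example (my own illustration, reusing cdf and inverse.cdf from above), swapping in a bimodal density:
den <- function(x) 0.5 * dnorm(x, mean = -2) + 0.5 * dnorm(x, mean = 2)
cdf <- function(x) integrate(den, -Inf, x)[[1]]
vars <- sapply(runif(1000), inverse.cdf, cdf = cdf)
hist(vars, breaks = 50)  # should show two modes, near -2 and 2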
To clarify the "use Metropolis-Hastings" answer above:
Suppose ddist() is your probability density function. Then you can do something like:
n <- 10000
cand.sd <- 0.1
init <- 0
vals <- numeric(n)
vals[1] <- init
oldprob <- 0
for (i in 2:n) {
  newval <- rnorm(1, mean = vals[i-1], sd = cand.sd)
  newprob <- ddist(newval)
  if (runif(1) < newprob/oldprob) {
    vals[i] <- newval
  } else vals[i] <- vals[i-1]
  oldprob <- newprob
}
Notes:
completely untested
efficiency depends on candidate distribution (i.e. value of cand.sd).
For maximum efficiency, tune cand.sd to an acceptance rate of 25-40%
results will be autocorrelated ... (although I guess you could always
sample() the results to scramble them, or thin)
may need to discard a "burn-in", if your starting value is weird
The classical approach to this problem is rejection sampling (see e.g. Press et al Numerical Recipes)
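Since no code for that approach appears here, a minimal rejection-sampling sketch (my own illustration, using an arbitrary uniform envelope on [-5, 5]; it assumes the target density den is bounded by M * dunif(x, -5, 5) on that interval and is negligible outside it):
den <- dnorm  # replace with your own (bounded) density
M <- 4        # envelope constant: den(x) <= M * dunif(x, -5, 5) on [-5, 5]
rejection_sample <- function(n) {
  out <- numeric(0)
  while (length(out) < n) {
    x <- runif(n, -5, 5)  # proposals from the uniform envelope
    u <- runif(n)
    keep <- u < den(x) / (M * dunif(x, -5, 5))  # accept with probability den(x) / (M * q(x))
    out <- c(out, x[keep])
  }
  out[seq_len(n)]
}
hist(rejection_sample(1e4), breaks = 50)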
Use the cumulative distribution function (http://en.wikipedia.org/wiki/Cumulative_distribution_function) and then its inverse.
See http://en.wikipedia.org/wiki/Normal_distribution for a picture.
That means: pick a random number from [0, 1], treat it as a value of the CDF, and find the corresponding x.
The inverse CDF is also called the quantile function.
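As a concrete one-line illustration of this idea (my addition, using the normal distribution, whose quantile function is qnorm):
u <- runif(1000)  # uniform draws on [0, 1]
x <- qnorm(u)     # apply the inverse CDF (quantile function)
hist(x)           # approximately standard normal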
This is a comment, but I don't have enough reputation to drop a comment on Ben Bolker's answer.
I am new to Metropolis, but IMHO this code is wrong because:
a) newval is drawn from a normal distribution, whereas in other codes it is drawn from a uniform distribution; this value must be drawn from the range covered by the random variable. For example, for a Gaussian distribution this should be something like runif(1, -5, +5).
b) the prob value must be updated only on acceptance.
Hope this helps, and I hope that someone with reputation can correct this answer (especially mine, if I am wrong).
# the distribution
ddist <- dnorm
# number of random numbers
n <- 100000
# the center of the range is taken as init
init <- 0
# the following should go into a function
vals <- numeric(n)
vals[1] <- init
oldprob <- 0
for (i in 2:n) {
  newval <- runif(1, -5, +5)
  newprob <- ddist(newval)
  if (runif(1) < newprob/oldprob) {
    vals[i] <- newval
    oldprob <- newprob
  } else vals[i] <- vals[i-1]
}
# Final view
hist(vals, breaks = 100)
# and comparison
hist(rnorm(length(vals)), breaks = 100)
