How would you count the number of elements that are TRUE in a vector? - R

The PDF is f_R(r) = 1/(1+r)^2 for r > 0, and Rsample = Xsample/Ysample, where X and Y are independent exponential distributions with rate 0.001. Xsample is 100 values stored in x, and Ysample is 100 values stored in y.
Find the CDF F_R(r) corresponding to the PDF and evaluate it at r ∈ {0.1, 0.2, 0.25, 0.5, 1, 2, 4, 5, 10}. Find the proportions of values in Rsample less than each of these values of r and plot the proportions against F_R(0.1), F_R(0.2), ..., F_R(5), F_R(10). What does this plot show?
I know that the CDF is the integral of the PDF, but wouldn't this give me negative values? Also, for the proportions section, how would you count the number of elements that are TRUE, that is, the number of elements for which Rsample is less than each element of r?
r=c(0.1,0.2,0.2,0.5,1,2,4,5,10)
prop=c(1:9)
for(i in 1:9)
{
x=Rsample<r[i]
prop[i]=c(TRUE,FALSE)
}
sum(prop[i])
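As a direct answer to the title question, here is a small sketch (the Rsample values below are made up for illustration): a logical TRUE coerces to 1, so sum() over a comparison counts the matching elements, and mean() gives the proportion.

```r
# Counting TRUE elements of a logical vector (illustrative values)
Rsample <- c(0.05, 0.3, 0.8, 2.5)
flags <- Rsample < 1  # logical vector: TRUE where the condition holds
sum(flags)            # TRUE counts as 1, so this counts the TRUEs: 3
mean(flags)           # and this gives the proportion: 0.75
```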

You've made a few different errors here. The solution should look something like this.
Start by defining your variables and drawing your samples from the exponential distribution using rexp(100, 0.001):
r <- c(0.1, 0.2, 0.25, 0.5, 1, 2, 4, 5, 10)
set.seed(69) # Make random sample reproducible
x <- rexp(100, 0.001) # 100 random samples from exponential distribution
y <- rexp(100, 0.001) # 100 random samples from exponential distribution
Rsample <- x/y
The tricky part is getting the proportion of Rsample that is less than each value of r. For this we can use sapply instead of a loop (the argument is named v to avoid clashing with the x defined above):
props <- sapply(r, function(v) mean(Rsample < v)) # the mean of a logical vector is the proportion TRUE
We get the CDF from the PDF by integrating from 0: F_R(r) = ∫₀^r 1/(1+s)^2 ds = 1 - 1/(1+r) = r/(1+r). This is 0 at r = 0 and increases towards 1, so no negative values appear.
cdf_at_r <- r/(1 + r) # CDF of the PDF 1/(1+r)^2, evaluated at the values of r above
And we can see what happens when we plot the proportions against the CDF:
plot(cdf_at_r, props)
# The points lie close to the 45-degree line: the sample proportions approximate the theoretical CDF
lines(c(0, 1), c(0, 1), lty = 2, col = "red")

This is how you can count the number of elements for which Rsample is less than each element of r:
r <- c(0.1, 0.2, 0.25, 0.5, 1, 2, 4, 5, 10)
counts <- integer(9)
for (i in 1:9) {
counts[i] <- sum(Rsample < r[i]) # Rsample < r[i] is a logical vector; sum() counts its TRUEs
}
counts

Related

Monte Carlo simulation from a pdf using runif

I'm given a pdf for X where f(x) = 2x when x is between 0 and 1, and f(x) = 0 otherwise. In class we learned to sample from a uniform distribution and transform the data to solve for y; however, I'm unsure how to apply that here, because if I generate data from a uniform distribution then most of it will be between 0 and 1.
Am I doing these steps in the wrong order? It just seems weird to have a PDF that will lead to most of the data just being multiplied by 2.
I will use R's convention of naming PDFs with an initial d, CDFs with an initial p, and quantile functions with an initial q.
It is very simple. Integrate dmydist(x) = 2*x to get the CDF pmydist(x) = x^2, then invert it to get the quantile function qmydist(u) = sqrt(u). The associated RNG follows immediately by inverse transform sampling.
dmydist <- function(x) {
ifelse(x >= 0 & x <= 1, 2*x, 0)
}
qmydist <- function(u) {
sqrt(u) # inverse of the CDF pmydist(x) = x^2, for u in [0, 1]
}
rmydist <- function(n) qmydist(runif(n))
set.seed(1234)
x <- rmydist(10000)
hist(x, prob = TRUE)
lines(seq(0, 1, by = 0.01), dmydist(seq(0, 1, by = 0.01)))
There are many ways to do this. One is rejection sampling (https://en.wikipedia.org/wiki/Rejection_sampling). Simply put:
Sample a point on the x-axis from the proposal distribution.
Draw a vertical line at this x-position, up to the curve of the proposal distribution.
Sample uniformly along this line, from 0 to the maximum of the probability density function. If the sampled value is greater than the value of the desired density at this x-position, reject and return to step 1; otherwise accept x.
n <- 1e5
x <- runif(n)       # proposals from the uniform distribution on [0, 1]
t <- runif(n)       # uniform heights, scaled by the maximum density 2
hist(x[2*t < 2*x])  # keep x where the scaled height falls under f(x) = 2x

Sample from a custom likelihood function

I have the following likelihood function which I used in a rather complex model (in practice on a log scale):
library(plyr)
dcustom=function(x,sd,L,R){
R. = (log(R) - log(x))/sd
L. = (log(L) - log(x))/sd
ll = pnorm(R.) - pnorm(L.)
return(ll)
}
df=data.frame(Range=seq(100,500),sd=rep(0.1,401),L=200,U=400)
df=mutate(df, Likelihood = dcustom(Range, sd,L,U))
with(df,plot(Range,Likelihood,type='l'))
abline(v=200)
abline(v=400)
In this function, the sd is predetermined and L and R are "observations" (very much like the endpoints of a uniform distribution), so all three are given. The function returns a likelihood close to 1 if the model estimate x (a derived parameter) lies between L and R, a smooth decrease (between 0 and 1) near the bounds (whose sharpness depends on the sd), and 0 if x is too far outside.
This function works very well to obtain estimates of x, but now I would like to do the inverse: draw a random x from the above function. If I would do this many times, I would generate a histogram that follows the shape of the curve plotted above.
The ultimate goal is to do this in C++, but I think it would be easier for me if I could first figure out how to do this in R.
There's some useful information online that helps me start (http://matlabtricks.com/post-44/generate-random-numbers-with-a-given-distribution, https://stats.stackexchange.com/questions/88697/sample-from-a-custom-continuous-distribution-in-r) but I'm still not entirely sure how to do it and how to code it.
I presume (not sure at all!) the steps are:
transform likelihood function into probability distribution
calculate the cumulative distribution function
inverse transform sampling
Is this correct and if so, how do I code this? Thank you.
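Yes, those steps are essentially right. A minimal sketch of them, using a grid-based (numeric) CDF rather than a closed-form one; the grid bounds and the helper name draw_x are assumptions, and dcustom is the likelihood from the question:

```r
# Step 1 + 2: normalize the likelihood on a grid and cumulate it into a CDF.
# Step 3: invert the CDF by table lookup (inverse transform sampling).
dcustom <- function(x, sd, L, R) {
  pnorm((log(R) - log(x)) / sd) - pnorm((log(L) - log(x)) / sd)
}
xs   <- seq(100, 500, by = 0.1)                  # grid covering the support
dens <- dcustom(xs, sd = 0.1, L = 200, R = 400)  # unnormalized density values
cdf  <- cumsum(dens) / sum(dens)                 # numeric CDF on the grid
draw_x <- function(n) {
  u <- runif(n)
  xs[findInterval(u, cdf) + 1]                   # invert the CDF by lookup
}
set.seed(1)
hist(draw_x(10000), breaks = 50)  # histogram follows the likelihood curve
```

The grid resolution trades accuracy for memory; a finer grid gives draws closer to the continuous distribution.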
One idea might be to use the Metropolis-Hastings algorithm to obtain a sample from the distribution, given all the other parameters and your likelihood.
# Metropolis-Hastings algorithm
set.seed(2018)
n_sample <- 100000
posterior_sample <- rep(NA, n_sample)
x <- 300 # starting value: I chose 300 based on your likelihood plot
for (i in 1:n_sample){
lik <- dcustom(x = x, sd = 0.1, L = 200, R = 400)
# propose a value for x (you can adjust the step size with the sd)
x.proposed <- x + rnorm(1, 0, sd = 20)
lik.proposed <- dcustom(x = x.proposed, sd = 0.1, L = 200, R = 400)
r <- lik.proposed/lik # acceptance ratio (the random-walk proposal is symmetric)
# accept the new value with probability min(1, r)
if (runif(1) < r) {
x <- x.proposed
}
# record the current state every iteration, repeating it on rejection
# (discarding rejected iterations would bias the sample)
posterior_sample[i] <- x
}
# plotting the density
approximate_distr <- na.omit(posterior_sample)
d <- density(approximate_distr)
plot(d, main = "Sample from distribution")
abline(v=200)
abline(v=400)
# If you now want to sample just a few values (for example, 5) you could use
sample(approximate_distr, 5)

Creating a histogram from iterations of a binomial distribution in R

Here are the instructions:
Create 10,000 iterations (N = 10,000) of
rbinom(50,1, 0.5) with n = 50 and your guess of p0 = 0.50 (hint: you will need to
construct a for loop). Plot a histogram of the results of the sample. Then plot your
pstar on the histogram. If pstar is not in the extreme region of the histogram, you would
assume your guess is correct and vice versa. Finally calculate the probability that
p0 < pstar (this is a p value).
I know how to create the for loop and use the rbinom function, but I am unsure how to transfer this information to a histogram, and how to plot a custom point (my guess value) on top of it.
I'm not doing your homework for you, but this should get you started. You don't say what pstar is supposed to be, so I am assuming you are interested in the (distribution of the) maximum likelihood estimates for p.
You create 10,000 samples of size n = 50 (there is no need for a for loop):
sample <- lapply(seq(10^4), function(i) rbinom(50, 1, 0.5))
The ML estimates for p are then
phat <- sapply(sample, function(x) sum(x == 1) / length(x))
Inspect the distribution
library(ggplot2)
ggplot(data.frame(phat = phat), aes(phat)) + geom_histogram(bins = 30)
and calculate the probability that p0 < phat.
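That last probability is again a proportion of TRUEs. A self-contained sketch with p0 = 0.5 as the guess (the variable names here are illustrative, not from the assignment):

```r
# Estimate Pr(p0 < phat) for p0 = 0.5 from 10,000 simulated samples
set.seed(1)
samples <- lapply(seq(10^4), function(i) rbinom(50, 1, 0.5))
phat <- sapply(samples, mean)  # ML estimate of p for each sample
mean(0.5 < phat)               # proportion of estimates above the guess
```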
Edit 1
If you insist, you can also use a for loop to generate your samples.
sample <- list()
for (i in 1:10^4) {
sample[[i]] <- rbinom(50, 1, 0.5)
}

Extract approximate probability density function (pdf) in R from random sampling

I have n > 2 independent continuous random variables (RVs). For example, say I have 4 uniform RVs with different upper and lower bounds.
W~U[-1,5], X~U[0,1], Y~U[0,2], Z~U[0.5,2]
I am trying to find the approximate PDF for the sum of these RVs, i.e. for T = W+X+Y+Z. As I don't need a closed-form solution, I have sampled 1 million points from each of them to get 1 million samples of T. Is it possible in R to get an approximate PDF, or a way to get the approximate probability P(T > t), from the samples I have drawn? For example, is there an easy way to calculate P(T > 0.5) in R? My priority is to get the probability first, even if getting the density function is not possible.
Thanks
Consider the ecdf function:
set.seed(123)
W <- runif(1e6, -1, 5)
X <- runif(1e6, 0, 1)
Y <- runif(1e6, 0, 2)
Z <- runif(1e6, 0.5, 2)
T <- Reduce(`+`, list(W, X, Y, Z))
cdfT <- ecdf(T)
1 - cdfT(0.5) # Pr(T > 0.5)
# [1] 0.997589
See How to calculate cumulative distribution in R? for more details.
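If you also want an approximate PDF rather than just probabilities, a kernel density estimate from the same samples works; a sketch (T_sum and pdf_hat are names introduced here, and the sample size is reduced for speed):

```r
# Kernel density estimate of the PDF of T, made callable with approxfun
set.seed(123)
W <- runif(1e5, -1, 5)
X <- runif(1e5, 0, 1)
Y <- runif(1e5, 0, 2)
Z <- runif(1e5, 0.5, 2)
T_sum <- W + X + Y + Z
d <- density(T_sum)  # kernel density estimate from the samples
pdf_hat <- approxfun(d$x, d$y, yleft = 0, yright = 0)  # zero outside the range
pdf_hat(3)           # approximate density at t = 3
```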

Generating random numbers in a specific interval

I want to generate some Weibull random numbers in a given interval. For example 20 random numbers from the Weibull distribution with shape 2 and scale 30 in the interval (0, 10).
The rweibull function in R produces random numbers from a Weibull distribution with given shape and scale values. Can someone please suggest a method? Thank you in advance.
Use the distr package. It makes this kind of thing very easy.
require(distr)
#we create the distribution
d<-Truncate(Weibull(shape=2,scale=30),lower=0,upper=10)
# The d object has four slots - d, r, p, q - that correspond to the [dpqr] prefixes of standard R distributions
# This extracts 10 random numbers
d@r(10)
# Get a histogram
hist(d@r(10000))
Using base R, you can generate random numbers, keep those that fall into the target interval, and generate more if you end up with fewer than you need.
rweibull_interval <- function(n, shape, scale = 1, min = 0, max = 10) {
weib_rnd <- rweibull(10*n, shape, scale)
weib_rnd <- weib_rnd[weib_rnd > min & weib_rnd < max]
if (length(weib_rnd) < n)
return(c(weib_rnd, rweibull_interval(n - length(weib_rnd), shape, scale, min, max))) else
return(weib_rnd[1:n])
}
set.seed(1)
rweibull_interval(20, 2, 30, 0, 10)
[1] 9.308806 9.820195 7.156999 2.704469 7.795618 9.057581 6.013369 2.570710 8.430086 4.658973
[11] 2.715765 8.164236 3.676312 9.987181 9.969484 9.578524 7.220014 8.241863 5.951382 6.934886
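Another base R option, sketched here as an alternative to oversampling: map uniforms into the CDF range of the interval and apply the quantile function (inverse-CDF sampling restricted to (0, 10)).

```r
# Truncated Weibull(shape = 2, scale = 30) draws on (0, 10) via the inverse CDF
set.seed(1)
lo <- pweibull(0, shape = 2, scale = 30)   # CDF at the lower bound (= 0)
hi <- pweibull(10, shape = 2, scale = 30)  # CDF at the upper bound
u <- runif(20, lo, hi)                     # uniforms inside (F(0), F(10))
qweibull(u, shape = 2, scale = 30)         # 20 draws confined to (0, 10)
```

This avoids the recursion and wasted draws, since every generated value lands in the interval by construction.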
