"sample" and "rbinom" functions in R - r

I guess it has been asked before, but I'm still a bit rusty about "sample" and "rbinom" functions in R, and would like to ask the following two simple questions:
a) Let's say we have:
rbinom(n = 5, size = 1, prob = c(0.9,0.2,0.3))
So "n" = 5, but "prob" is given for only three of them. What values does R assign to the remaining two draws?
b) Let's say we have:
sample(x = 1:3, size = 1, prob = c(.5,0.2,0.9))
According to R-help (?sample):
The optional prob argument can be used to give a vector of weights
for obtaining the elements of the vector being sampled.
They need not sum to one, but they should be non-negative and not all zero.
The question would be: why does "prob" not need to sum to one?
Any answers would be much appreciated: thank you!

From the documentation for rbinom:
The numerical arguments other than n are recycled to the length of the result.
This means that in your example the prob vector you pass in will be recycled until it reaches the required length (here 5, the value of n). So the vector which will be used is:
c(0.9, 0.2, 0.3, 0.9, 0.2)
As for the sample function, as @thelatemail pointed out, the probabilities do not have to sum to 1. It appears that the prob vector is normalized internally so that it sums to 1.
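Both behaviours can be checked directly. The following is a minimal sketch rather than a definitive test: it uses probabilities of 0 and 1 (values chosen here purely for illustration) so that the recycled pattern is visible without any randomness, and then compares the frequencies observed from sample() with the weights divided by their sum.

```r
# Recycling: prob values of 1 and 0 make each draw deterministic,
# so the recycled vector c(1, 0, 1, 1, 0) shows up in the result.
recycled <- rbinom(n = 5, size = 1, prob = c(1, 0, 1))
print(recycled)

# Normalization: weights that do not sum to one should behave like w / sum(w).
set.seed(1)  # arbitrary seed, only for reproducibility
w <- c(0.5, 0.2, 0.9)
draws <- sample(x = 1:3, size = 1e5, replace = TRUE, prob = w)
observed <- as.numeric(table(factor(draws, levels = 1:3))) / length(draws)
expected <- w / sum(w)
print(round(rbind(observed, expected), 3))
```

With this many draws the observed and expected rows should agree to about two decimal places.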

Related

Incorrect number of probabilities

Arrivals <- sample(c(0,1,2,3,4), size=1, prob = c(.15,.25,.3,.2,.1),replace = TRUE)
Buyers <- sample(Arrivals, size=1, prob = .6, replace = TRUE)
I want to take a sample of a sample.
Here Arrivals gives me back a single integer. Yet I still get the error
Error in sample.int(x, size, replace, prob) :
incorrect number of probabilities
I found many answers on here saying that x and prob need to be the same length, and that a mismatch is the typical reason for the error.
But X (Arrivals) and the Prob are the same length and I still get the error.
Any idea why?
If you pass a single numeric value x (with x >= 1) into sample(), it assumes you want to sample from 1:x. That's why it is telling you that you have the wrong number of probabilities in your second sample() call for Buyers.
For example, if Arrivals is 2, then calling sample(Arrivals) says "I want to sample from c(1, 2)". But you provide only one probability instead of two, which is why you get the error.
set.seed(123)
Arrivals <- sample(c(0,1,2,3,4), size=1, prob = c(.15,.25,.3,.2,.1), replace = TRUE) # returns 2
Buyers <- sample(Arrivals, size=1, prob = c(.6, .4), replace = TRUE) # runs without error
From the sample documentation:
If x has length 1, is numeric (in the sense of is.numeric) and x >= 1, sampling via sample takes place from 1:x. Note that this convenience feature may lead to undesired behaviour when x is of varying length in calls such as sample(x). See the examples.
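One way to avoid the convenience feature is to sample indices instead of values, which is the idea behind the resample helper shown in the examples of ?sample. The sketch below also includes one possible reading of the question, namely that each arrival independently buys with probability 0.6; that interpretation is an assumption, not something the question states.

```r
# Index-based wrapper (same idea as in the examples of ?sample):
# it always samples from the elements of x, even when length(x) == 1.
resample <- function(x, ...) x[sample.int(length(x), ...)]

sample(5, size = 3)                  # draws from 1:5, not from the single value 5
only_value <- resample(5, size = 1)  # can only return 5

# Assumed intent: each of the Arrivals buys independently with probability 0.6,
# so the number of buyers is a binomial draw with size = Arrivals.
set.seed(123)  # arbitrary seed, only for reproducibility
Arrivals <- sample(0:4, size = 1, prob = c(.15, .25, .3, .2, .1))
Buyers <- rbinom(n = 1, size = Arrivals, prob = 0.6)
```

If Arrivals happens to be 0, rbinom with size = 0 simply returns 0, so no special case is needed.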

Gibbs sampling in R: Error The number of variables in initial values does NOT match no_var

I have a sample of 50 random variables that follow a gamma(5, 5) distribution. I have stored the values in a vector y, i.e. y<-c(5.888242, 4.828093,...
Now, the random variables are of two types, type 1 and type 2. I have stored the types in a vector s, i.e. s<-c(1, 2, 1, 1,...
The probability of obtaining a variable of type 1 is 0.7, implying that if the length of y is 50, then I would expect to have 35 variables of type 1.
I am trying to use the Gibbs sampling technique to obtain a sample of length 35 belonging to type 1, as well as another sample of length 15 belonging to type 2. I have the following code for obtaining a sample of type 1:
library(gibbs.met)
log_gamma<-dgamma(y,5, 5, log = TRUE)
gibbs_met(log_f = log_gamma,no_var = 35,
ini_value = 0,iters = 500,
stepsizes_met = 0.5)
When I run the above code, I get an error:
The number of variables in initial values does NOT match no_var
Kindly assist on how to go about it.
Regards.
First of all, I'm not familiar with gibbs_met, so actually I don't know what I'm doing here. But there are a few things to change in your code to make it work:
gibbs_met(log_f=dgamma,
no_var = 1,
ini_value = 1,
iters = 500,
iters_met = 2,
stepsizes_met = 0.5,
shape=5, rate=5, log=TRUE)
First, gibbs_met needs log_f to be a function, so you have to provide an actual function, not a vector of values computed from a given distribution. The arguments of dgamma (shape, rate, log) are passed on through gibbs_met as shown above.
Second, ini_value is a vector of length no_var. So either no_var = 1 (for ini_value being a single number), or ini_value = rep([startvalue],35) or ini_value = c( ... ) with length 35 for no_var = 35.
The ini_value needs to have probability > 0, so [startvalue] has to be > 0.
Third, the argument iters_met is missing, so I set it to 2.
Please take a closer look at the Reference manual since I don't know what I'm doing here.

How should I specify argument "prob" when using sample() for resampling?

In short
I'm trying to better understand the argument prob as part of the function sample in R. In what follows, I both ask a question, and provide a piece of R code in connection with my question.
Question
Suppose I have generated 10,000 standard normal values with rnorm. I then want to draw a sample of size 5 from this "mother" set of 10,000 values.
How should I set the prob argument within the sample such that the probability of drawing these 5 numbers from the mother rnorm considers that the middle areas of the mother rnorm are denser but tail areas are thinner (so in drawing these 5 numbers it would draw from the denser areas more frequently than the tail areas)?
x = rnorm(1e4)
sample( x = x, size = 5, replace = TRUE, prob = ? ) ## what should be "prob" here?
# OR I leave `prob` to be the default by not using it:
sample( x = x, size = 5, replace = TRUE )
Overthinking is devil.
You want to resample these samples, following the original distribution or an empirical distribution. Think about how an empirical CDF is obtained:
plot(sort(x), 1:length(x)/length(x))
In other words, the empirical PDF is just
plot(sort(x), rep(1/length(x), length(x)))
So, we want prob = rep(1/length(x), length(x)) or simply, prob = rep(1, length(x)) as sample normalizes prob internally. Or, just leave it unspecified as equal probability is default.
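As a rough check of this argument, one can resample with the default equal weights and compare a simple summary of the resample with the same summary of the original values. This is only a sketch: the seed, the resample size and the "within one unit of zero" summary are arbitrary choices made here for illustration.

```r
set.seed(42)  # arbitrary seed, only for reproducibility
x <- rnorm(1e4)

# Equal weights (the default): every stored value is equally likely,
# so dense regions are hit more often simply because they hold more values.
resampled <- sample(x, size = 1e5, replace = TRUE)

# Share of values within one unit of zero, original vs resampled.
share_x <- mean(abs(x) < 1)
share_resampled <- mean(abs(resampled) < 1)
print(c(original = share_x, resampled = share_resampled))
```

The two shares should be close, which is the sense in which unweighted resampling already follows the density of the original draws.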

How to vectorise sampling from non-identically distributed Bernoulli random variables?

Given a sequence of independent but not identically distributed Bernoulli trials with success probabilities given by a vector, e.g.:
x <- seq(0, 50, 0.1)
prob <- - x*(x - 50)/1000 # trial probabilities for trials 1 to 501
What is the most efficient way to obtain a random variate from each trial? I am assuming that vectorisation is the way to go.
I know of two functions that give Bernoulli random variates:
rbernoulli from the package purrr, which does not accept a vector of success probabilities as an input. In this case it may be possible to wrap the function in an apply type operation.
rbinom with arguments size = 1 gives Bernoulli random variates. It also accepts a vector of probabilities, so that:
rbinom(n = length(prob), size = 1, prob = prob)
gives an output with the right length. However, I am not entirely sure that this is actually what I want. The bits in the helpfile ?rbinom that seem relevant are:
The length of the result is determined by n for rbinom, and is the
maximum of the lengths of the numerical arguments for the other
functions.
The numerical arguments other than n are recycled to the length of the
result. Only the first elements of the logical arguments are used.
However, n is a parameter with no default, so I am not sure what the first sentence means. I presume the second sentence means that I get what I want, since only size = 1 should be recycled. However this thread seems to suggest that this method does not work.
This blog post gives some other methods as well. One commentator mentions my suggested idea using rbinom.
Another way to test that rbinom is vectorised for prob, taking advantage of the fact that the sum of N bernoulli random variables is a binomial random variable with denominator N:
x <- seq(0, 50, 0.1)
prob <- -x*(x - 50)/1000
n <- rbinom(prob, size=1000, prob)
par(mfrow=c(1, 2))
plot(prob ~ x)
plot(n ~ x)
If you don't trust random strangers on the internet and do not understand documentation, maybe you can convince yourself by testing. Just set the random seed to get reproducible results:
x <- seq(0, 50, 0.1)
prob <- - x*(x - 50)/1000
#501 separate draws of 1 random number
set.seed(42)
res1 <- sapply(prob, rbinom, n = 1, size = 1)
#501 "simultaneous" (vectorized) draws
set.seed(42)
res2 <- rbinom(501, 1, prob)
identical(res1, res2)
#[1] TRUE

Generate random number with given probability

I have a question which is basically the vectorized R solution to the following matlab problem:
Generate random number with given probability matlab
I'm able to generate the random event outcome based on random uniform number and the given probability for each single event (summing to 100% - only one event may happen) by:
sum(runif(1,0,1) >= cumsum(wdOff))
However the function only takes a single random uniform number, whereas I want it to take a vector of random uniform numbers and output the corresponding events for these entries.
So basically I'm looking for the R solution to Oleg's vectorized solution in matlab (from the comments to the matlab solution):
"Vectorized solution: sum(bsxfun(@ge, r, cumsum([0, prob]),2) where r is a column vector and prob a row vector. – Oleg"
Tell me if you need more information.
You could just do a weighted random sample, without worrying about your cumsum method:
sample(c(1, 2, 3), size = 100, replace = TRUE, prob = c(0.5, 0.1, 0.4))
If you already have the numbers, you could also do:
x <- runif(10, 0, 1)
as.numeric(cut(x, breaks = c(0, 0.5, 0.6, 1)))
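If the cumulative-sum approach from the question is preferred, base R's findInterval gives a vectorised version of it. The sketch below is one possible way to write it, with illustrative probabilities; it uses only the interior breakpoints of the cumulative sum so that the result always lies in 1 to length(p), and it returns 1-based event numbers.

```r
p <- c(0.5, 0.1, 0.4)            # event probabilities (illustrative values)
upper <- cumsum(p)[-length(p)]   # interior breakpoints: 0.5 and 0.6

# Fixed uniforms so the mapping is easy to follow:
# 0.1 falls in [0, 0.5), 0.55 in [0.5, 0.6), 0.9 in [0.6, 1).
u <- c(0.1, 0.55, 0.9)
events <- findInterval(u, upper) + 1L
print(events)

# The same call works on a whole vector of random uniforms.
set.seed(1)  # arbitrary seed, only for reproducibility
events_random <- findInterval(runif(10), upper) + 1L
```

Unlike cut, findInterval returns plain integers, so no conversion from a factor is needed.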
