Find first increasing value in vector - r

I draw a random sample from the uniform distribution with
u <- runif(1000, 0, 1)
Now I want to calculate the value of this random variable:
$N = \min\{n : u_n > u_{n-1}\}$
Edit
Let's say I draw a random sample of size 10.
So, I have u= (u_1,u_2,u_3,...,u_10). Now I want to find minimum n for which u_n > u_{n-1}

If you take the differences (using diff), then you're looking for where the difference is greater than 0. We search for the first time that happens:
u <- c(.5, .4, .3, .6)
min(which(diff(u) > 0))
This gives us 3, which is close to what we want but not exactly. diff(u)[i] is u[i+1] - u[i], so the index is off by one: this returns 1 if the first difference is greater than 0. What we really want is to add 1 to the result:
min(which(diff(u) > 0)) + 1
which should give what we want. This will give a warning (and return Inf) if your sequence never increases, e.g. if it is strictly decreasing, since it can't find a value that meets the criterion. We could code in some tests and decide on the appropriate output in that case, but I'll leave that as an exercise for the reader.
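A warning-free variant of the same idea is sketched below, assuming you would rather get NA than Inf when the sequence never increases. It relies on match(TRUE, ...), which returns the position of the first TRUE or NA if there is none; the helper name first_increase is hypothetical. The last lines estimate E[N] by simulation: N > n exactly when u_1 > u_2 > ... > u_n, which has probability 1/n!, so E[N] = sum over n >= 0 of 1/n! = e, and the simulated mean should land near 2.718.

```r
# Index of the first n with u[n] > u[n - 1]; NA if the sequence never increases.
first_increase <- function(u) {
  # diff(u)[i] is u[i + 1] - u[i], so shift the matched position by one
  match(TRUE, diff(u) > 0) + 1
}

first_increase(c(.5, .4, .3, .6))  # 4
first_increase(c(3, 2, 1))         # NA, and no warning

# Monte Carlo estimate of E[N]. With 20 draws per sample, an NA result
# (probability 1/20!) is practically impossible.
set.seed(1)
n_samples <- replicate(10000, first_increase(runif(20)))
mean(n_samples)  # should be close to exp(1)
```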

Related

R: How to sample from an interrupted upside-down bell curve

I've asked a related question before, which received an answer. Now I want to sample values from an upside-down bell curve, but exclude a range of values that fall in the middle of it, as shown in the picture below:
I currently have this working code:
min <- 1
max <- 20
q <- min + (max-min)*rbeta(10000, 0.5, 0.5)
How may I adapt it to achieve the desired output?
Say you want a sample of 10,000 from your distribution but don't want any numbers between 5 and 15 in your sample. Why not just do:
q <- min + (max-min)*rbeta(50000, 0.5, 0.5);
q <- q[!(q > 5 & q < 15)][1:10000]
Which gives you this:
hist(q)
But still has the correct size:
length(q)
#> [1] 10000
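One caveat: if fewer than 10,000 of the 50,000 draws survive the filter, [1:10000] pads the result with NA. A minimal sketch that keeps drawing in batches until enough values are accepted is below; the function name r_gapped_beta and its defaults (range 1 to 20, gap 5 to 15, Beta(0.5, 0.5)) are illustrative assumptions copied from the example above.

```r
# Rescaled Beta(shape1, shape2) draws on (lo, hi), rejecting anything in (gap_lo, gap_hi).
# Draws in batches until exactly n values have been accepted.
r_gapped_beta <- function(n, lo = 1, hi = 20, gap_lo = 5, gap_hi = 15,
                          shape1 = 0.5, shape2 = 0.5, batch = 2 * n) {
  out <- numeric(0)
  while (length(out) < n) {
    q <- lo + (hi - lo) * rbeta(batch, shape1, shape2)
    out <- c(out, q[!(q > gap_lo & q < gap_hi)])  # drop values inside the gap
  }
  out[seq_len(n)]  # trim the surplus from the last batch
}

set.seed(42)
q <- r_gapped_beta(10000)
length(q)            # 10000
any(q > 5 & q < 15)  # FALSE
```

Drawing in batches keeps the vectorized speed of the original answer while guaranteeing the requested size, whatever the acceptance rate.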
An "upside-down bell curve" (an inverted version of the normal density), with a certain interval excluded, can be sampled using the following algorithm. I've written it in pseudocode because I'm not familiar with R. I adapted it from another answer I just posted.
Notice that this sampler samples in a truncated interval (here, the interval [x0, x1], with the exclusion of [x2, x3]) because it's not possible for an upside-down bell curve extended to infinity to integrate to 1 (which is one of the requirements for a probability density).
In the pseudocode, RNDU01() is a uniform(0, 1) random number.
x0pdf = 1-exp(-(x0*x0))
x1pdf = 1-exp(-(x1*x1))
ymax = max(x0pdf, x1pdf)
while true
    # Choose a random x-coordinate
    x=RNDU01()*(x1-x0)+x0
    # Choose a random y-coordinate
    y=RNDU01()*ymax
    # Return x if y falls within PDF
    if (x<x2 or x>x3) and y < 1-exp(-(x*x)): return x
end
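For R users, the pseudocode translates fairly directly into a loop, with runif() playing the role of RNDU01(). This is a sketch rather than a tested port: r_inverted_bell is a hypothetical name, and the example bounds (-3 to 3, excluding -1 to 1) are arbitrary. The curve 1 - exp(-x^2) dips at 0, so for a range like 1 to 20 you would need to shift and rescale x first.

```r
# Rejection sampler for a density proportional to 1 - exp(-x^2) on [x0, x1],
# excluding [x2, x3]; mirrors the pseudocode step by step.
r_inverted_bell <- function(n, x0, x1, x2, x3) {
  f <- function(x) 1 - exp(-x^2)
  ymax <- max(f(x0), f(x1))  # f grows with |x|, so its maximum is at an endpoint
  out <- numeric(n)
  k <- 0
  while (k < n) {
    x <- runif(1, x0, x1)   # random x-coordinate
    y <- runif(1, 0, ymax)  # random y-coordinate
    # accept x if it lies outside the gap and the point is under the curve
    if ((x < x2 || x > x3) && y < f(x)) {
      k <- k + 1
      out[k] <- x
    }
  }
  out
}

set.seed(1)
s <- r_inverted_bell(1000, x0 = -3, x1 = 3, x2 = -1, x3 = 1)
```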

Generate random numbers with rbinom but exclude 0s from the range

I need to generate random numbers with rbinom, but I need to exclude 0 from the range.
How can I do it?
I would like something similar to:
k <- seq(1, 6, by = 1)
binom_pdf = dbinom(k, 322, 0.1, log = FALSE)
but I need the actual random values, not just their probabilities, because if I do the following:
binom_ran = rbinom(100, 322, 0.1)
I get values from 0 to 100.
Is there any way I can get around this?
Thanks
Let's suppose that we have the fixed parameters:
n: number of generated values
s: the size of the experiment
p: the probability of a success
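# Fixed parameters (example values from the question)
n <- 100   # number of generated values
s <- 322   # size of the experiment
p <- 0.1   # probability of a success
# Generate initial values
U<-rbinom(n,s,p)
# Number and location of zero values
k<-sum(U==0)
which.k<-which(U==0)
# While there is still a zero, generate new numbers at those positions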
while(k!=0){
U[which.k]<-rbinom(k,s,p)
k<-sum(U==0)
which.k<-which(U==0)
# Print how many zeroes are still there
print(k)
}
# Print U (without zeroes)
U
In addition to the hit and miss approach, if you want to sample from the conditional distribution of a binomial given that the number of successes is at least one, you can compute the conditional distribution then directly sample from it.
It is easy to work out that if $X$ is binomial with parameters $p$ and $n$, then $P(X > 0) = 1 - P(X = 0) = 1 - (1-p)^n$, so
$P(X = x \mid X > 0) = \dfrac{P(X = x)}{1 - (1-p)^n}, \quad x = 1, \dots, n$
Hence the following function will work:
rcond.binom <- function(k, n, p){
  # conditional probabilities P(X = x | X > 0) for x = 1, ..., n
  probs <- dbinom(1:n, n, p) / (1 - (1 - p)^n)
  sample(1:n, k, replace = TRUE, prob = probs)
}
If you are going to call the above function numerous times with the same n and p then you can just precompute the vector probs and simply use the last line of the function whenever you need it.
I haven't benchmarked it, but I suspect that the hit-and-miss approach is preferable when k is small, p is not too close to 0, and n is large, whereas for larger k, p closer to 0, and smaller n the above might be preferable.
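Following the precomputation suggestion, a sketch that computes probs once and returns a reusable sampler is below. make_rcond_binom is a hypothetical name. It uses pbinom(0, n, p, lower.tail = FALSE) for P(X > 0), which equals 1 - (1-p)^n but stays accurate when p is tiny. Because sample() rescales prob anyway, the normalization only matters if you want probs to hold actual probabilities.

```r
# Returns a function that draws k values from Binomial(n, p) conditioned on X > 0,
# computing the conditional probabilities only once.
make_rcond_binom <- function(n, p) {
  probs <- dbinom(1:n, n, p) / pbinom(0, n, p, lower.tail = FALSE)  # P(X = x | X > 0)
  function(k) sample(1:n, k, replace = TRUE, prob = probs)
}

draw <- make_rcond_binom(322, 0.1)  # parameters from the question
x <- draw(100)                      # reuse draw() as often as needed
```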

R: draw from a vector using custom probability function

Forgive me if this has been asked before (I feel it must have, but could not find precisely what I am looking for).
How can I draw one element of a vector of whole numbers (from 1 through, say, 10) using a probability function that gives the elements different chances? If I want equal probabilities, I use runif() to get a number between 1 and 10:
ceiling(runif(1,0,10))
How do I similarly sample from e.g. the exponential distribution to get a number between 1 and 10 (such that 1 is much more likely than 10), or a logistic probability function (if I want a sigmoid increasing probability from 1 through 10).
The only "solution" I can come up with is to first draw 1e6 numbers from, say, the sigmoid distribution and then scale the min and max to 1 and 10, but this looks clumsy.
UPDATE:
This awkward solution (and I don't feel it is very "correct") would go like this:
#Draw enough from a distribution, here exponential
x <- rexp(1e3)
#Scale values to e.g. 1-10
scaler <- function(vector, min, max){
(((vector - min(vector)) * (max - min))/(max(vector) - min(vector))) + min
}
x_scale <- scaler(x,1,10)
#And sample once (and round it)
round(sample(x_scale,1))
Are there not better solutions around ?
I believe sample() is what you are looking for, as @HubertL mentioned in the comments. You can specify an increasing function (e.g. the logistic function, i.e. the inverse logit) and apply it to the vector v you want to sample from. You can then use the output of that function as the vector of probabilities p; sample() rescales p, so it does not need to sum to 1. See the code below.
logistic <- function(x) {
  exp(x) / (exp(x) + 1)  # inverse logit; base R's plogis(x) computes the same thing
}
v <- 1:10
p <- logistic(v)
sample(v, 1, prob = p)
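The same sample(prob = ...) idea covers the exponential case from the question: evaluate a density at the candidate values and use the results as weights. A sketch follows; rate = 0.5 is an arbitrary assumption, and larger rates concentrate more of the mass on small values.

```r
v <- 1:10

# Exponential-shaped weights: 1 is the most likely value, 10 the least likely.
p_exp <- dexp(v, rate = 0.5)

set.seed(1)
draws <- sample(v, 1000, replace = TRUE, prob = p_exp)
table(draws)
```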

Confusion Between 'sample' and 'rbinom' in R

Why are these not equivalent?
#First generate 10 numbers between 0 and .5
set.seed(1)
x <- runif(10, 0, .5)
These are the two statements I'm confused by:
#First
sample(rep(c(0,1), length(x)), size = 1, prob = c(rbind(1-x,x)), replace = F)
#Second
rbinom(length(x), size = 1, prob=x)
I was originally trying to use 'sample'. What I thought I was doing was generating ten (0,1) pairs, then assigning the probability that each would return either a 0 or a 1.
The second one works and gives me the output I need (trying to run a sim). So I've been able to solve my problem. I'm just curious as to what's going on under the hood with 'sample' so that I can understand R better.
The first area of difference is where the length of the output vector is specified in the parameter list. The argument name size has a different meaning in each of these two functions. (I hadn't thought about that source of confusion before, and I'm sure I have made this error myself many times.)
The random number generators (whose names start with r followed by a distribution suffix) take the output length as their first parameter, n, whereas sample takes it as its second parameter, size. So the second call returns a vector of length 10 and the first a vector of length 1. In sample, the draw is from the values in the first argument, and size is the length of the vector to create. In rbinom, n is the length of the vector to create, while size is the number of trials, i.e. the number of items hypothetically drawn from an urn whose composition is determined by prob. Each returned value is the number of "ones" (successes) among those trials. Try:
rbinom(length(x), size = 10, prob=x)
Regarding the argument to prob: I don't think you need the c().
The difference between the two functions is quite simple.
Think of a pack of shuffled cards, and choose a number of cards from it. That is exactly the situation that sample simulates.
This code,
> set.seed(123)
> sample(1:40, 5)
[1] 12 31 16 33 34
randomly extracts five numbers from the 1:40 vector of numbers.
In your example, you set size = 1. It means you choose only one element from the pool of possible values. If you set size = 10 you will get ten values as you desire.
set.seed(1)
x <- runif(10, 0, .5)
> sample(rep(c(0,1), length(x)), size = 10, prob = c(rbind(1-x,x)), replace = F)
[1] 0 0 0 0 0 0 0 1 0 1
By contrast, the rbinom function simulates events with "discrete" outcomes, such as coin flips. Its parameters are the number of values to generate, the number of trials behind each value, and the probability of success on each trial, e.g. 0.5 for a fair coin. Here we simulate 100 flips. If you think the coin could be loaded to favor one specific outcome, you can simulate that by setting the probability to 0.8, as in the example below.
> set.seed(123)
> table(rbinom(100, 1, prob = 0.5))
0 1
53 47
> table(rbinom(100, 1, prob = 0.8))
0 1
19 81
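One more point about the sample() call: even with size = 10, it draws 10 of the 20 pooled values without replacement, so the draws are not independent and do not correspond to one draw per (0, 1) pair. A sketch of what the question intended, one weighted sample() call per element of x, is below. It has the same distribution as rbinom(length(x), size = 1, prob = x), although the actual values differ because the two approaches consume the random number stream differently.

```r
set.seed(1)
x <- runif(10, 0, .5)

# One weighted draw per pair: element i is 1 with probability x[i], else 0.
per_pair <- vapply(x, function(xi) sample(c(0, 1), 1, prob = c(1 - xi, xi)),
                   numeric(1))

# Same distribution, drawn directly as independent Bernoulli(x[i]) variables.
bern <- rbinom(length(x), size = 1, prob = x)
```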

Draw random numbers from distribution within a certain range

I want to draw a number of random variables from a series of distributions. However, the values returned have to be no higher than a certain threshold.
Let’s say I want to use the gamma distribution, the threshold is 10, and I need n=100 random numbers. I now want 100 random numbers between 0 and 10. (Say scale and shape are 1.)
Getting 100 random variables is obviously easy...
rgamma(100, shape = 1, rate = 1)
But how can I make these values range from 0 to 10?
EDIT
To make my question clearer: the 100 values drawn should be scaled between 0 and 10, so that the highest drawn value is 10 and the lowest 0. Sorry if this was not clear...
EDIT No2
To add some context to the random numbers I need: I want to draw "system repair times" that follow certain distributions. However, within the system simulation there is a binomial probability of repairs being "simple" (i.e. short repair time) or "complicated" (i.e. long repair time). I now need a function that provides "short repair times" and one that provides "long repair times". The threshold would be the boundary between short and long repair times. Again, I hope this makes my question a little clearer.
This is not possible with a gamma distribution.
The support of a distribution determines the range of sample data drawn from it.
As the support of the gamma distribution is (0, ∞), this is not possible (see https://en.wikipedia.org/wiki/Gamma_distribution).
If you really want a gamma distribution, take a rejection sampling approach as Alex Reynolds suggests.
Otherwise, look for a distribution with bounded (finite) support (see https://en.wikipedia.org/wiki/List_of_probability_distributions),
e.g. uniform or binomial.
Well, you can fill the vector by rejection sampling (untested code):
v <- rep(-1.0, 100)
k <- 1
while (TRUE) {
  q <- rgamma(1, shape=1, rate=1)
  # keep only draws inside (0, 10); gamma draws are always > 0
  if (q > 0.0 && q < 10) {
    v[k] <- q
    k <- k+1
    if (k>100)
      break
  }
}
I'm not sure you can keep the properties of the original distribution while imposing additional conditions, but something like this will do the job:
Filter(function(x) x < 10, rgamma(1000,1,1))[1:100]
For the scaling - beware, the outcome will not follow the original distribution (but there's no way to do it, as the other answers pointed out):
# rescale numeric vector into (0, 1) interval
# clip everything outside the range
rescale <- function(vec, lims=range(vec), clip=c(0, 1)) {
# find the coefficients of the linear transformation
# that maps the lims range to (0, 1)
slope <- (1 - 0) / (lims[2] - lims[1])
intercept <- - slope * lims[1]
xformed <- slope * vec + intercept
# do the clipping
xformed[xformed < 0] <- clip[1]
xformed[xformed > 1] <- clip[2]
xformed
}
# this is the requested data
10 * rescale(rgamma(100,1,1))
Use the truncdist package. It truncates any distribution between upper and lower bounds.
Hope that helped.
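If you would rather avoid a dependency, a base-R alternative is inverse-transform sampling: draw uniforms between F(lower) and F(upper), where F is the gamma CDF, and map them through the gamma quantile function. This is a sketch of that general technique, not truncdist's API; rtrunc_gamma is a hypothetical name. Unlike rescaling, the values keep the shape of the gamma density within the bounds, which would suit separate short and long repair times split at the threshold.

```r
# Draw n values from Gamma(shape, rate) truncated to (lower, upper)
# by inverse-transform sampling with pgamma() / qgamma().
rtrunc_gamma <- function(n, shape, rate, lower = 0, upper = Inf) {
  u <- runif(n, pgamma(lower, shape, rate), pgamma(upper, shape, rate))
  qgamma(u, shape, rate)
}

set.seed(1)
short_repairs <- rtrunc_gamma(100, shape = 1, rate = 1, upper = 10)  # below the threshold
long_repairs  <- rtrunc_gamma(100, shape = 1, rate = 1, lower = 10)  # above the threshold
```

For bounds far out in a tail, working with upper-tail probabilities (lower.tail = FALSE in pgamma and qgamma) would be more accurate numerically.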
