In R: I am trying to figure out a way to generate vectors with values 0 or 1. Rather than drawing each 0 and 1 independently from a uniform distribution, I would like the 1s to come clustered, e.g. (1,0,0,0,0,0,1,0,1,1,1,1,0,1,0,0,0,0,1,0,0,0,...). In its simplest form, something like: "if the previous number was 1, then increase the likelihood of drawing 1". Or make the chance of drawing 1 depend on the sum of the last, say, 5 numbers drawn. Is there an efficient way of doing this, maybe even a package? It would be reminiscent of rbinom(n,1,prob) with variable prob.
You can try the following method using a loop. First, create a variable called "x" using sample, which assigns an initial value of 0 or 1.
Within the loop you can use the sample function again, but this time you pass values to the prob argument. Here I've set the probabilities to a 70/30 split (i.e. if your previous number was a 0, there is a 70% chance that the next number will be a 0, and vice versa if your previous value was 1).
x = sample(c(0,1),1)
for(i in 2:100){
  if(x[i-1] == 0){
    x[i] = sample(c(0,1),1,prob=c(0.7,0.3))
  } else {
    x[i] = sample(c(0,1),1,prob=c(0.3,0.7))
  }
}
x[1:20]
[1] 1 1 1 0 0 0 0 0 1 1 1 0 1 0 0 0 1 1 0 0
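The loop above is a two-state Markov chain. A minimal sketch that wraps it in a function, assuming a hypothetical name `markov_binary` and a single `p_stay` parameter (neither comes from the answer above), and uses `rle()` to look at the run lengths:

```r
# Hypothetical helper: two-state chain that repeats the previous value
# with probability p_stay (p_stay > 0.5 produces clustering). Assumes n >= 1.
markov_binary <- function(n, p_stay = 0.7) {
  x <- integer(n)
  x[1] <- rbinom(1, 1, 0.5)               # first value: fair coin
  for (i in seq_len(n)[-1]) {
    stay <- rbinom(1, 1, p_stay) == 1     # keep the previous value?
    x[i] <- if (stay) x[i - 1] else 1L - x[i - 1]
  }
  x
}

set.seed(1)
x <- markov_binary(100, p_stay = 0.7)
runs <- rle(x)
mean(runs$lengths[runs$values == 1])      # average length of the runs of 1s
```

In this sketch run lengths are geometric, so the average run should be near 1 / (1 - p_stay), i.e. a little over 3 for p_stay = 0.7. Raising p_stay gives longer clusters.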
So I took good inspiration from Colin Charles, and added a little adjustability. There are obviously many ways to make prob depend on prior draws. I ended up comparing the sum of the last w draws against a cutoff m to decide whether to use the low prob p0 or the high prob p1 for each 0/1 draw, building a vector of length l.
f <- function (l, w, m, p0, p1){
  v = rbinom(w,1,p0) #Initialize with p0
  for (i in w:(l-1)){
    v[i+1] <- ifelse(sum(v[(i-w+1):i]) > m,
                     rbinom(1,1,p1),
                     rbinom(1,1,p0))
  }
  return(v)
}
#Test:
set.seed(8)
plot(f(100, 5, 1, 0.1, 0.6)) #Clustered
plot(f(100, 5, 2, 0.1, 0.4)) #Less clustered
Gives (two plots, not reproduced here): the first clustered, the second less clustered.
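A hard cutoff is only one way to let earlier draws influence prob. As a rough sketch of an alternative, the probability of a 1 could rise linearly with the number of 1s in the last w draws. The function name `clustered_binary` and the linear rule are assumptions for illustration, not part of the answer above:

```r
# Hypothetical variant: the probability of a 1 moves linearly from p0 to p1
# as the share of 1s among the previous w draws goes from 0 to 1.
clustered_binary <- function(l, w, p0, p1) {
  stopifnot(l > w)
  v <- integer(l)
  v[seq_len(w)] <- rbinom(w, 1, p0)          # initialise the first window with p0
  for (i in (w + 1):l) {
    frac <- sum(v[(i - w):(i - 1)]) / w      # share of 1s in the last w draws
    v[i] <- rbinom(1, 1, p0 + (p1 - p0) * frac)
  }
  v
}

set.seed(8)
v <- clustered_binary(100, w = 5, p0 = 0.1, p1 = 0.8)
runs <- rle(v)
runs$lengths[runs$values == 1]               # lengths of the runs of 1s
```

Looking at run lengths with `rle()` is a quick numeric check of clustering when a plot is not at hand.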
Related
For example, I have a vector of probabilities:
myprob <- c(0.58, 0.51, 0.48, 0.46, 0.62)
I want to sample a series of 1s and 0s, using the probabilities c(1-myprob, myprob) for each draw in turn,
which means the first number in the series is sampled from 1 and 0 with probabilities (0.42, 0.58), the second with (0.49, 0.51), and so on.
How can I generate the 5 numbers with sample?
The syntax
Y <- sample(c(1,0), 1, replace=F, prob=c(1-myprob, myprob))
gives an incorrect number of probabilities, and would output only 1 number even if I could specify prob;
while the syntax
Y <- sample(c(1,0), 5, replace=F, prob=c(1-myprob, myprob))
seems to make the probabilities focus only on 0.62 (or not, I am not sure, but the results do not seem correct at all).
Thanks in advance for any reply!
If myprob is the probability of drawing 1 for each iteration, then you can use rbinom, with n = 5 and size = 1 (5 iterations of a 1-0 draw).
set.seed(2)
rbinom(n = 5, size = 1, prob = myprob)
[1] 1 0 1 0 0
Maël already proposed a great solution sampling from a binomial distribution. There are probably many more alternatives and I just wanted to suggest two of them:
runif()
as.integer(runif(5) < myprob)
This first generates 5 uniformly distributed random numbers between 0 and 1, then checks which of them fall below the corresponding value of myprob (which happens with probability myprob), and converts the logical values TRUE/FALSE to 1/0.
vapply(sample())
vapply(myprob, function(p) sample(0:1, 1, prob = c(1-p, p)), integer(1))
This is what you may have been looking for in the first place. It runs sample() once for each value of myprob (as p), drawing 0 with probability 1-p and 1 with probability p, and returns the 5 draws as a vector.
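One way to convince yourself that a given approach pairs draw i with myprob[i] is to repeat the 5-draw series many times and compare the observed frequency of 1s at each position with the target. A small sketch, where `n_rep` and the matrix layout are illustrative choices:

```r
myprob <- c(0.58, 0.51, 0.48, 0.46, 0.62)
n_rep  <- 1e5                      # number of repetitions of the 5-draw series

set.seed(2)
# Each column is one series; myprob is recycled down the columns.
draws_binom <- matrix(rbinom(5 * n_rep, size = 1, prob = myprob), nrow = 5)
draws_unif  <- matrix(as.integer(runif(5 * n_rep) < myprob), nrow = 5)

# Row i holds every draw made with myprob[i], so row means estimate myprob.
round(rbind(target = myprob,
            rbinom = rowMeans(draws_binom),
            runif  = rowMeans(draws_unif)), 3)
```

If a method had the comparison or the probability order reversed, its row would come out near 1 - myprob instead of myprob.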
I have a series composed of 0s and 1s, and the 0s show up in no specific order (as far as I can tell). How can I decide whether the 0s are randomly distributed?
Please find a toy sample for reference:
library(magrittr)
s1 <- (runif(10)*10) %>% mod(10) %>% round(0) %>% `>`(5) %>% ifelse(1,0)
s2 <- c(0,0,1,0,1,1,1,0,1,0)
The runs test is what you want:
The Wald–Wolfowitz runs test (or simply runs test), named after
statisticians Abraham Wald and Jacob Wolfowitz is a non-parametric
statistical test that checks a randomness hypothesis for a two-valued
data sequence. More precisely, it can be used to test the hypothesis
that the elements of the sequence are mutually independent.
It is implemented in the snpar package.
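For a sense of what the test computes, here is a minimal base-R sketch using the usual normal approximation. The helper name `runs_test` is hypothetical and this is not the snpar implementation; it only counts runs and compares the count with its expectation under independence:

```r
# Wald-Wolfowitz runs test for a 0/1 vector (normal approximation).
# Hypothetical helper for illustration, not the snpar function.
runs_test <- function(x) {
  n1 <- sum(x == 1)
  n0 <- sum(x == 0)
  n  <- n1 + n0
  runs   <- 1 + sum(diff(x) != 0)                 # observed number of runs
  mu     <- 2 * n1 * n0 / n + 1                   # expected runs under independence
  sigma2 <- 2 * n1 * n0 * (2 * n1 * n0 - n) / (n^2 * (n - 1))
  z <- (runs - mu) / sqrt(sigma2)
  list(runs = runs, expected = mu, z = z, p.value = 2 * pnorm(-abs(z)))
}

s2 <- c(0, 0, 1, 0, 1, 1, 1, 0, 1, 0)
runs_test(s2)
```

For s2 this counts 7 runs against an expected 6, which gives a p-value of about 0.5, so there is no evidence against independence. With only 10 values the normal approximation is rough, so treat the p-value as indicative.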
Are you looking for rbinom? This function simulates a Bernoulli process with a chance of success (1) equal to some probability p. Otherwise, the result is 0.
The usage of rbinom is rbinom(n, size, prob), where n is the number of random numbers to generate, size is the number of trials, and prob is the probability of getting a success. So to generate a bunch of binomial random numbers with equal probability of 1 or 0, use:
set.seed(100) # for reproducibility
rbinom(n = 10, size = 1, prob = 0.5)
[1] 0 0 1 0 0 0 1 0 1 0
I want to calculate these probabilities automatically with R. The rule: we start with 0 points and flip a coin. If it comes up heads, we get a point. If it comes up tails, we double our current score.
The functions I want to code:
Expected score after n flips (5 flips, 15 flips, ...)
After n flips, what is the probability that the score is a power of two (expressed as a number between 0 and 1)?
Standard deviation
The expected standard deviation of the scores?
I want my functions to adapt to rule changes, for example a 2/3 probability of heads and a 1/3 probability of tails. What is our expected score after 10 flips?
First, you want to think about what parameters the function needs to take. It appears it just needs to take the parameter n - the number of flips.
flips <- function(n){
}
Now, you can think about what needs to happen inside the function.
start with 0 points
add 1 if heads
double if tails
You also need to be able to do this n times, so it will need to be in a loop.
flips <- function(n){
  ## start with 0
  sum <- 0
  for(i in 1:n){
    # create a flip (random draw of H or T)
    flip <- sample(c("H", "T"), 1)
    # identify what to do if flip is H
    if(flip == "H"){
      # increment sum by 1
      sum <- sum + 1
    # identify what to do if flip is not H (i.e., it is T)
    }else{
      sum <- sum*2
    }
  }
  # return the sum
  sum
}
flips(10)
# [1] 28
A function like this simulates what happens after n flips. That said, it seems like the questions you're trying to answer are more theoretical than they are about coding. If you can specify the operations you need to perform, we could probably help you code them.
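As a rough sketch of how the simulation could answer the questions by Monte Carlo, including a changed heads probability: the names `simulate_score`, `is_power_of_two` and the `p_heads` parameter below are assumptions for illustration, not part of the answer above.

```r
# Hypothetical helper: play n flips with P(heads) = p_heads, return the score.
simulate_score <- function(n, p_heads = 0.5) {
  heads <- rbinom(n, 1, p_heads) == 1      # TRUE = heads, FALSE = tails
  score <- 0
  for (h in heads) {
    score <- if (h) score + 1 else score * 2
  }
  score
}

# TRUE for 1, 2, 4, 8, ...; a score of 0 does not count as a power of two.
is_power_of_two <- function(s) s > 0 & 2^round(log2(s)) == s

set.seed(1)
scores <- replicate(1e4, simulate_score(10, p_heads = 2/3))
c(mean = mean(scores), sd = sd(scores), p_pow2 = mean(is_power_of_two(scores)))
```

The three numbers are estimates, so they move a little with the seed and tighten as the number of replications grows.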
Maybe you can start by building a function f like the one below, which simulates the flips as a series of random values, where 0 and 1 denote heads and tails respectively and p is the probability of heads, and returns the final score
f <- function(n,p) {
  v <- sample(c(0,1),n,replace = TRUE,prob = c(p,1-p))
  s <- 0
  for (i in v) {
    if (i == 1) {
      s <- s*2
    } else {
      s <- s + 1
    }
  }
  s
}
and then you can apply replicate to repeat the experiment, e.g.,
n <- 20
p <- 2/3
r <- replicate(1e6,f(n,p))
We will see
> mean(r)
[1] 629.074
> sd(r)
[1] 1326.681
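The simulated mean can be cross-checked exactly. With p the probability of heads, one flip turns an expected score E into p(E + 1) + (1 - p)(2E) = (2 - p)E + p, starting from 0. A short sketch of that recurrence, where the function name `expected_score` is an illustrative choice:

```r
# Exact expected score after n flips, where p is the probability of heads.
expected_score <- function(n, p) {
  e <- 0
  for (k in seq_len(n)) {
    # heads (prob p): score + 1; tails (prob 1 - p): score * 2
    e <- (2 - p) * e + p
  }
  e
}

expected_score(20, 2/3)
```

For n = 20 and p = 2/3 this comes to roughly 628.7, in line with the simulated mean of 629.074. The same recurrence answers the "rule change" question directly, without any simulation.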
Why are these not equivalent?
#First generate 10 numbers between 0 and .5
set.seed(1)
x <- runif(10, 0, .5)
These are the two statements I'm confused by:
#First
sample(rep(c(0,1), length(x)), size = 10, prob = c(rbind(1-x,x)), replace = F)
#Second
rbinom(length(x), size = 1, prob=x)
I was originally trying to use 'sample'. What I thought I was doing was generating ten (0,1) pairs, then assigning the probability that each would return either a 0 or a 1.
The second one works and gives me the output I need (trying to run a sim). So I've been able to solve my problem. I'm just curious as to what's going on under the hood with 'sample' so that I can understand R better.
The first difference is where the length of the result is specified in the argument list. The name size has different meanings in these two functions. (I hadn't thought about that source of confusion before, and I'm sure I have made this error myself many times.)
The random number generators (starting with r and having a distribution suffix) have that choice as the first parameter, whereas sample has it as the second parameter. So the length of the second one is 10 and the length of the first is 1. In sample the draw is from the values in the first argument, while 'size' is the length of the vector to create. In the rbinom function, n is the length of the vector to create, while size is the number of items to hypothetically draw from a theoretical urn having a distribution determined by 'prob'. The result returned is the number of "ones". Try:
rbinom(length(x), size = 10, prob=x)
Regarding the argument to prob: I don't think you need the c().
The difference between the two functions is quite simple.
Think of a pack of shuffled cards, and draw a number of cards from it. That is exactly the situation that sample simulates.
This code,
> set.seed(123)
> sample(1:40, 5)
[1] 12 31 16 33 34
randomly extracts five numbers from the 1:40 vector of numbers.
In your example, you set size = 1. It means you choose only one element from the pool of possible values. If you set size = 10 you will get ten values as you desire.
set.seed(1)
x <- runif(10, 0, .5)
> sample(rep(c(0,1), length(x)), size = 10, prob = c(rbind(1-x,x)), replace = F)
[1] 0 0 0 0 0 0 0 1 0 1
Instead, the goal of the rbinom function is to simulate events whose results are "discrete", such as the flip of a coin. It takes as a parameter the probability of success on a trial, e.g. 0.5 for a fair coin flip. Here we simulate 100 flips. If you think the coin could be loaded to favor one specific outcome, you can simulate this behaviour by setting the probability to 0.8, as in the example below.
> set.seed(123)
> table(rbinom(100, 1, prob = 0.5))
0 1
53 47
> table(rbinom(100, 1, prob = 0.8))
0 1
19 81
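To see why the sample call from the question is not ten independent 0/1 draws, it can help to treat the 20 values as tickets in an urn. The sketch below reuses x from the question; the names `pool`, `w` and `one_draw` are illustrative:

```r
set.seed(1)
x <- runif(10, 0, 0.5)
pool <- rep(c(0, 1), length(x))     # 0, 1, 0, 1, ... (20 tickets)
w    <- c(rbind(1 - x, x))          # weight attached to each ticket

# Drawing all 20 tickets without replacement is just a weighted shuffle:
# the result always contains exactly ten 0s and ten 1s.
table(sample(pool, size = 20, prob = w, replace = FALSE))

# With size = 10, output position i is still not tied to x[i]: any ticket
# can land in any position, so these are not ten Bernoulli(x[i]) trials.
one_draw <- sample(pool, size = 10, prob = w, replace = FALSE)

# rbinom pairs element i of the output with probability x[i].
rbinom(length(x), size = 1, prob = x)
```

So sample treats prob as weights over one shared pool, while rbinom treats prob as one success probability per output element.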
How can one choose a number with a specific probability p?
Say we must choose between {0, 1} and the probability p stands for choosing 1.
So when p=0.8 we choose 1 with 80% and 0 with 20%.
Is there a simple solution in R for this?
Take a look at the sample function.
> set.seed(1)
> sample(c(0,1), size=10, replace=TRUE, prob=c(0.2,0.8))
[1] 1 1 1 0 1 0 0 1 1 1
From the helpfile you can read:
sample takes a sample of the specified size from the elements of x using either with or without replacement.
and the argument prob in sample acts as ...
A vector of probability weights for obtaining the elements of the vector being sampled.