R: How to randomly select from two different probabilities?

I want to work with two weighted probabilities (0.3 and 0.7). On each iteration (in a loop?) I want R to randomly select one of the two, and I then need the results output in a data frame.
I've tried a number of approaches with no success; I am still a beginner.
Here is my beginner code, which doesn't work. I tried to sample the columns randomly.
Thanks.
roll_it<-c(1:100)
n70<- 1; p70<-7/10
n30<- 1; p30<-3/10
for( i in roll_it){
result <- c(rbinom(n30, 1, p30), rbinom(n70, 1, p70))
print(result)}
sample(result(1:2,), size=2, replace = F)

If I understand your goal correctly, you don't need a loop at all. Just create two samples and then sample from the resulting mixture.
set.seed(1)
cc <- cbind(rbinom(100, 1, 0.3), rbinom(100, 1, 0.7))
colMeans(cc)
# [1] 0.32 0.70
sample(cc, 2)
# [1] 0 0

Use sample().
roll_it <- 1:100
n70 <- 1
n30 <- 1
probabilities <- c(0.3, 0.7)
for (i in roll_it) {
  # rbinom()'s first argument is n (the number of draws), not x
  result <- rbinom(n = n30, size = 1, prob = sample(probabilities, 1))
  print(result)
}
I'm also not sure why you are using both n30 and n70. Could you just use one of them?
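The question also asks for the results in a data frame. A minimal loop-free sketch, assuming that each of the two probabilities is equally likely to be picked on every draw (the names n_draws, chosen_p and results are made up for illustration):

```r
set.seed(42)                      # arbitrary seed, only for reproducibility
n_draws <- 100
probabilities <- c(0.3, 0.7)

# Pick one of the two probabilities at random for each draw
# (assumption: both are equally likely to be picked)
chosen_p <- sample(probabilities, n_draws, replace = TRUE)

# One 0/1 draw per row; rbinom() is vectorised over prob
outcome <- rbinom(n_draws, size = 1, prob = chosen_p)

results <- data.frame(draw = seq_len(n_draws), prob = chosen_p, outcome = outcome)
head(results)
```

Keeping the chosen probability as its own column makes it easy to check afterwards, for example with tapply(results$outcome, results$prob, mean).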

Related

How to generate a series of numbers with sample() in R, given a different probability for each draw?

For example, I have a vector of probabilities:
myprob <- c(0.58, 0.51, 0.48, 0.46, 0.62)
I want to sample a series of 1s and 0s, each draw using the probabilities c(1-myprob, myprob),
which means that for the first number in the series the function samples 1 and 0 with probabilities (0.42, 0.58), for the second with (0.49, 0.51), and so on.
How can I generate the 5 numbers with sample?
The syntax
Y <- sample(c(1,0), 1, replace=F, prob=c(1-myprob, myprob))
gives an incorrect number of probabilities and only 1 number as output if I specify prob,
while the syntax
Y <- sample(c(1,0), 5, replace=F, prob=c(1-myprob, myprob))
seems to focus the probabilities on only 0.62 (I am not sure, but the results do not seem correct at all).
Thanks in advance for any reply!
If myprob is the probability of drawing 1 for each iteration, then you can use rbinom, with n = 5 and size = 1 (5 iterations of a 1-0 draw).
set.seed(2)
rbinom(n = 5, size = 1, prob = myprob)
[1] 1 0 1 0 0
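One way to convince yourself that rbinom() really uses a different probability for each draw is to repeat the 5 draws many times and compare the observed frequencies with myprob. This is only a rough sketch; the number of repetitions (n_rep) and the seed are arbitrary choices:

```r
set.seed(1)
myprob <- c(0.58, 0.51, 0.48, 0.46, 0.62)
n_rep <- 10000                                  # arbitrary number of repetitions

# prob is recycled, and the matrix is filled column by column,
# so row i only contains draws made with myprob[i]
draws <- matrix(rbinom(n_rep * length(myprob), size = 1, prob = myprob),
                nrow = length(myprob))

# Observed frequency of 1s per position, to compare with myprob
round(rowMeans(draws), 2)
```

The row means should land close to the corresponding entries of myprob.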
Maël already proposed a great solution sampling from a binomial distribution. There are probably many more alternatives and I just wanted to suggest two of them:
runif()
as.integer(runif(5) > myprob)
This first generates 5 uniformly distributed random numbers between 0 and 1, then compares that vector against myprob and converts the logical values TRUE/FALSE to 1/0.
vapply(sample())
vapply(myprob, function(p) sample(1:0, 1, prob = c(1-p, p)), integer(1))
This is what you may have been looking for in the first place. It runs sample() once for each value of myprob (as p) and returns the 5 draws as a vector. Note that both alternatives follow the ordering used in the question, so each draw is 1 with probability 1 - myprob; if myprob is meant to be the probability of drawing 1, use runif(5) < myprob and sample(0:1, ...) instead.

How would you count the number of elements that are TRUE in a vector?

The PDF is fR(r) = 1/(1+r)^2, and Rsample = Xsample/Ysample, where X and Y are independent exponential random variables with rate = 0.001. Xsample is 100 values stored in x, and Ysample is 100 values stored in y.
Find the CDF FR(r) corresponding to the PDF and evaluate it at r ∈ {0.1, 0.2, 0.25, 0.5, 1, 2, 4, 5, 10}. Find the proportions of values in Rsample less than each of these values of r and plot the proportions against FR(0.1), FR(0.2), ..., FR(5), FR(10). What does this plot show?
I know that the CDF is the integral of the PDF, but wouldn't this give me negative values? Also, for the proportions part, how would you count the number of elements that are TRUE, that is, the number of elements of Rsample that are less than each element of r?
r=c(0.1,0.2,0.2,0.5,1,2,4,5,10)
prop=c(1:9)
for(i in 1:9)
{
x=Rsample<r[i]
prop[i]=c(TRUE,FALSE)
}
sum(prop[i])
You've made a few different errors here. The solution should look something like this.
Start by defining your variables and drawing your samples from the exponential distribution using rexp(100, 0.001):
r <- c(0.1, 0.2, 0.25, 0.5, 1, 2, 4, 5, 10)
set.seed(69) # Make random sample reproducible
x <- rexp(100, 0.001) # 100 random samples from exponential distribution
y <- rexp(100, 0.001) # 100 random samples from exponential distribution
Rsample <- x/y
The tricky part is getting the proportion of Rsample that is less than each value of r. For this we can use sapply instead of a loop.
props <- sapply(r, function(ri) mean(Rsample < ri)) # mean of a logical vector = proportion TRUE
We get the CDF from the PDF by integrating from 0 to r (not shown), which gives 1 - 1/(1+r) = r/(1+r):
cdf_at_r <- r/(1 + r) # Integral of 1/(1+t)^2 from 0 to r
And we can see what happens when we plot the sample proportions against the CDF:
plot(cdf_at_r, props)
# What do we notice?
lines(c(0, 1), c(0, 1), lty = 2, col = "red")
Created on 2020-03-05 by the reprex package (v0.3.0)
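On the worry about negative values: the CDF is the integral of the PDF from 0 to r, which is 1 - 1/(1+r) = r/(1+r) and always lies between 0 and 1. The bare antiderivative -1/(1+r) differs from it by the constant 1, which is where negative values come from. A quick numerical check with integrate(), as a sketch (pdf_R, cdf_closed and cdf_numeric are names made up here):

```r
r <- c(0.1, 0.2, 0.25, 0.5, 1, 2, 4, 5, 10)

pdf_R <- function(t) 1 / (1 + t)^2        # density given in the question
cdf_closed <- r / (1 + r)                 # closed form, equal to 1 - 1/(1 + r)

# Numerically integrate the density from 0 up to each value of r
cdf_numeric <- sapply(r, function(ri) integrate(pdf_R, lower = 0, upper = ri)$value)

round(cbind(r, cdf_closed, cdf_numeric), 4)
```

The two columns should agree to within the numerical integration error.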
This is how you can count the number of elements for which R-sample is less than each element of r:
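r <- c(0.1, 0.2, 0.25, 0.5, 1, 2, 4, 5, 10)
less <- numeric(length(r))
for (i in seq_along(r)) {
  # Rsample < r[i] is a logical vector; sum() counts its TRUE values
  less[i] <- sum(Rsample < r[i])
}
less                   # counts for each value of r
less / length(Rsample) # the corresponding proportions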

Split the data in R, split into percentage

I have a dataset made up of different types of data. How can I split it for the following case?
The data should be split as follows: 1) First case: 15% train data and 5% test data.
How do I write this correctly?
Without createDataPartition, an easy way will be as follows.
Suppose you want train_prop as training set and test_prop as test set from the dataset my_dataset. Ideally, their sum will be 1, or 1-val_prop, but here you want 15% and 5% for some reason. So you'll need 0.15 and 0.05 respectively.
indices <- sample(x = rep.int(x = c(0, 1, 2),
times = round(nrow(my_dataset) * c(1 - train_prop - test_prop, train_prop, test_prop))))
train_set <- my_dataset[indices == 1,]
test_set <- my_dataset[indices == 2,]
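Because of the rounding, the label vector built with rep.int can end up one element longer or shorter than nrow(my_dataset). A variant that avoids this by shuffling row numbers is sketched below; my_dataset here is a toy data frame invented for the example, and the row counts are simply rounded:

```r
set.seed(123)
my_dataset <- data.frame(id = 1:200, value = rnorm(200))  # hypothetical toy data
train_prop <- 0.15
test_prop <- 0.05

n <- nrow(my_dataset)
shuffled <- sample.int(n)                 # random permutation of the row numbers
n_train <- round(n * train_prop)
n_test <- round(n * test_prop)

# Take the first n_train shuffled rows for training, the next n_test for testing
train_rows <- shuffled[seq_len(n_train)]
test_rows <- shuffled[n_train + seq_len(n_test)]

train_set <- my_dataset[train_rows, ]
test_set <- my_dataset[test_rows, ]
c(train = nrow(train_set), test = nrow(test_set))
```

The two sets cannot overlap, because they are taken from disjoint stretches of the same permutation.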

Generate random numbers with rbinom but exclude 0s from the range

I need to generate random numbers with rbinom but I need to exclude 0 within the range.
How can I do it?
I would like something similar to:
k <- seq(1, 6, by = 1)
binom_pdf = dbinom(k, 322, 0.1, log = FALSE)
but I need to get all the relative dataset, because if I do the following:
binom_ran = rbinom(100, 322, 0.1)
I get values from 0 to 100.
Is there any way I can get around this?
Thanks
Let's suppose that we have the following fixed parameters:
n: number of generated values
s: the size of the experiment
p: the probability of a success
# Generate initial values
U<-rbinom(n,s,p)
# Number and location of zero values
k<-sum(U==0)
which.k<-which(U==0)
# While there is still a zero, . . . generate new numbers
while(k!=0){
U[which.k]<-rbinom(k,s,p)
k<-sum(U==0)
which.k<-which(U==0)
# Print how many zeroes are still there
print(k)
}
# Print U (without zeroes)
U
In addition to the hit and miss approach, if you want to sample from the conditional distribution of a binomial given that the number of successes is at least one, you can compute the conditional distribution then directly sample from it.
It is easy to work out that if X is binomial with parameters p and n, then P(X > 0) = 1 - (1-p)^n, so
P(X = x | X > 0) = P(X = x)/(1 - (1-p)^n) for x = 1, ..., n
Hence the following function will work:
rcond.binom <- function(k,n,p){
probs <- dbinom(1:n,n,p)/(1 - (1-p)^n) # conditional probabilities given X > 0
sample(1:n,k,replace = TRUE,prob = probs)
}
If you are going to call the above function numerous times with the same n and p then you can just precompute the vector probs and simply use the last line of the function whenever you need it.
I haven't benchmarked it, but I suspect that the hit-and-miss approach is preferable when k is small, p is not too close to 0, and n is large, whereas for larger k, p closer to 0, and smaller n the above might be preferable.
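A quick sanity check of the direct-sampling idea, assuming the conditional probabilities are P(X = x)/(1 - (1-p)^n): with a small n and p, where zeros would otherwise be common, the draws should contain no zeros and their mean should be close to n*p/(1 - (1-p)^n). The values n = 5, p = 0.2 and the seed are arbitrary choices for this sketch:

```r
# Sample k values from a binomial(n, p) conditioned on at least one success
rcond.binom <- function(k, n, p) {
  probs <- dbinom(1:n, n, p) / (1 - (1 - p)^n)
  sample(1:n, k, replace = TRUE, prob = probs)
}

set.seed(7)
n <- 5
p <- 0.2
draws <- rcond.binom(1e5, n, p)

# Mean of the zero-truncated binomial: E[X] / P(X > 0)
theoretical_mean <- n * p / (1 - (1 - p)^n)
c(min = min(draws), empirical = mean(draws), theoretical = theoretical_mean)
```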

Sampling Distribution from a data-set with one column

I want to create a sampling distribution for a mean. I have a variable x with at least ten thousand values. I want to take 500 samples (n=10) and then show the distribution of the sample means in a histogram. I think it worked with the following, but can anyone check whether this does what I intended and tell me what the 2 within the apply function stands for?
x <- rnorm(10000, 7.5, 1.5)
draws = sample(x, size = 10 * 500, replace = TRUE)
draws = matrix(draws, 10)
drawmeans = apply(draws, 2, mean)
hist(drawmeans)
Any help would be sincerely appreciated!
You could do this using replicate if you wanted; it is one of many different ways. For a data frame df with a column Scores:
out = replicate(500, mean(sample(df$Scores,10)))
hist(out)
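As for the 2 in apply(draws, 2, mean): it is the MARGIN argument, where 1 means "apply the function to each row" and 2 means "apply it to each column". Since the matrix has 10 rows and 500 columns, each column is one sample of size 10, so the call returns 500 sample means. A small sketch showing the equivalence with colMeans() (the seed is arbitrary):

```r
set.seed(10)
x <- rnorm(10000, 7.5, 1.5)

# 10 rows x 500 columns: every column is one sample of size 10
draws <- matrix(sample(x, size = 10 * 500, replace = TRUE), nrow = 10)
dim(draws)

drawmeans <- apply(draws, 2, mean)        # MARGIN = 2: mean() of each column
all.equal(drawmeans, colMeans(draws))     # colMeans() computes the same thing
length(drawmeans)
```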
