How can I get a cumulative measure of data in an increasing fashion (graph)?

I have data with two columns, like the following:
Column_1 , Column_2
1 , 0.474124203822
2 , 0.545760430686
3 , 0.614420062696
4 , 0.654518950437
5 , 0.696226415094
6 , 0.6875
For simplicity, you can think of the data as
Column_2 = probability of success when X = Column_1
The relationship is roughly increasing. If I just plot the data up to 30 points as a line graph, I obtain the following:
My question is: how can I plot my data in a cumulative fashion (and using what measure), like in the following simple example?
col_1(age) , col_2(Total cumulative number of people <= age)
10 , 200
20 , 1000
30 , 5000
Please let me know if my description is not clear enough or if you have additional questions.

Given your probability mass function, you can compute the cumulative mass function as follows.
// Probability mass function
pmf = [0.1, 0.3, 0.2, 0.1, 0.3]
// Cumulative mass function
cmf = [0, 0, 0, 0, 0]
cmf[0] = pmf[0]
for i = 1, 2, 3, 4
    cmf[i] = cmf[i - 1] + pmf[i]
Now simply plot your cumulative mass function instead of your probability mass function.
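The loop above is a running sum. As a minimal sketch in R (an assumption, since the question does not name a language), the built-in cumsum() does the same thing in one call. The pmf values are the illustrative ones from the answer, not the asker's data.

```r
# Illustrative probability mass function from the answer above
pmf <- c(0.1, 0.3, 0.2, 0.1, 0.3)

# Running sum: cmf[i] = pmf[1] + ... + pmf[i]
cmf <- cumsum(pmf)

# Step plot of the cumulative mass function
plot(seq_along(cmf), cmf, type = "s",
     xlab = "x", ylab = "Cumulative probability")
```

A running sum only ends at 1 when the inputs sum to 1. The six Column_2 values shown in the question already sum to more than 3, so they would first have to be turned into counts or normalised weights before being accumulated this way.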

Related

How to generate a series of number by function of sample in R, with given different probability in each try?

For example, I have a vector of probabilities:
myprob <- c(0.58, 0.51, 0.48, 0.46, 0.62)
I want to sample a series of 1s and 0s, drawing each number with the probabilities c(1-myprob, myprob),
which means the first number in the series is sampled from 1 and 0 with probabilities (0.42, 0.58), the second with (0.49, 0.51), and so on.
How can I generate the 5 numbers with sample?
The syntax
Y <- sample(c(1,0), 1, replace=F, prob=c(1-myprob, myprob))
gives an incorrect number of probabilities and outputs only 1 number if I specify prob,
while the syntax
Y <- sample(c(1,0), 5, replace=F, prob=c(1-myprob, myprob))
seems to focus the probabilities only on 0.62 (I am not sure, but the results do not seem correct at all).
Thanks in advance for any reply!
If myprob is the probability of drawing 1 for each iteration, then you can use rbinom, with n = 5 and size = 1 (5 iterations of a 1-0 draw).
set.seed(2)
rbinom(n = 5, size = 1, prob = myprob)
[1] 1 0 1 0 0
Maël already proposed a great solution sampling from a binomial distribution. There are probably many more alternatives and I just wanted to suggest two of them:
runif()
as.integer(runif(5) > myprob)
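This first generates a series of 5 uniformly distributed random numbers between 0 and 1, then compares that vector against myprob and converts the logical values TRUE/FALSE to 1/0. Note that runif(5) > myprob is TRUE with probability 1 - myprob, which matches the (1-myprob, myprob) weights for c(1, 0) in the question; use runif(5) < myprob instead if myprob should be the probability of drawing 1, as in the rbinom answer.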
vapply(sample())
vapply(myprob, function(p) sample(1:0, 1, prob = c(1-p, p)), integer(1))
This is what you may have been looking for in the first place. This executes the sample() command by iterating over the values of myprob as p and returns the 5 draws as a vector.
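One way to check which convention each approach follows is to repeat the draw many times and look at how often each position comes up 1. The sketch below is only an empirical check; the seed and the number of repetitions (n_rep) are arbitrary choices, not part of the answers above.

```r
myprob <- c(0.58, 0.51, 0.48, 0.46, 0.62)
set.seed(42)    # arbitrary seed, for reproducibility only
n_rep <- 20000  # arbitrary number of repetitions

# Each call returns 5 draws; replicate() stacks them as a 5 x n_rep matrix
draws_rbinom <- replicate(n_rep, rbinom(n = 5, size = 1, prob = myprob))
draws_runif  <- replicate(n_rep, as.integer(runif(5) > myprob))

# Share of 1s at each of the 5 positions
freq_rbinom <- rowMeans(draws_rbinom)  # should be close to myprob
freq_runif  <- rowMeans(draws_runif)   # should be close to 1 - myprob

round(rbind(myprob, freq_rbinom, freq_runif), 2)
```

With rbinom the frequency of 1s tracks myprob, while the `>` comparison tracks 1 - myprob, so the two are mirror images of each other rather than interchangeable.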

Simulation in R for multistage sampling

I am trying to simulate a multistage sampling scheme as follows:
1. I have a population of size N = 90.
2. I need to choose a sample of 20 out of N = 90, without replacement and with equal probability.
3. Then I need to choose a sample of 17 out of those 20 (so my previous sample becomes the population, i.e. N = 20), without replacement and with equal probability.
4. Then I need to take the remaining 3 (20 - 17) units and choose a sample of 2 out of those 3 (here N = 3), without replacement and with equal probability.
I need to replicate the whole procedure 1000 times and calculate the mean for step 2, step 3, and step 4.
replicate(1000, sample(1:90, 20, replace = F, prob = NULL))
I am trying to use this, but I am not getting results.
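The scheme described above could be sketched by wrapping one full pass in a function and handing that function to replicate(). This is a sketch under assumptions: the population is taken to be the labels 1 to 90, "mean" is read as the mean of the sampled labels at each step, and one_run is a hypothetical helper name.

```r
set.seed(1)  # arbitrary seed, for reproducibility only

# One pass through the scheme; returns the mean of the sampled labels per step.
# Assumption: the population consists of the labels 1..N.
one_run <- function(N = 90) {
  s2   <- sample(seq_len(N), 20)  # step 2: 20 out of N, equal probability
  s3   <- sample(s2, 17)          # step 3: 17 out of those 20
  rest <- setdiff(s2, s3)         # the 3 units not chosen in step 3
  s4   <- sample(rest, 2)         # step 4: 2 out of the remaining 3
  c(step2 = mean(s2), step3 = mean(s3), step4 = mean(s4))
}

# 3 x 1000 matrix: one column per replication
res <- replicate(1000, one_run())

# Average of each step's mean over the 1000 replications
rowMeans(res)
```

sample() draws without replacement and with equal probability by default, so replace = F and prob = NULL do not need to be spelled out.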

How would you count the number of elements that are TRUE in a vector?

The PDF is f_R(r) = 1/(1+r)^2 and Rsample = Xsample/Ysample, where X and Y are independent exponential random variables with rate = 0.001. Xsample is 100 values stored in x, and Ysample is 100 values stored in y.
Find the CDF F_R(r) corresponding to the PDF and evaluate it at r ∈ {0.1, 0.2, 0.25, 0.5, 1, 2, 4, 5, 10}. Find the proportions of values in Rsample less than each of these values of r and plot the proportions against F_R(0.1), F_R(0.2), ..., F_R(5), F_R(10). What does this plot show?
I know that the CDF is the integral of the PDF, but wouldn't this give me negative values? Also, for the proportions section, how would you count the number of elements that are TRUE, that is, the number of elements for which Rsample is less than each element of r?
r=c(0.1,0.2,0.2,0.5,1,2,4,5,10)
prop=c(1:9)
for(i in 1:9)
{
x=Rsample<r[i]
prop[i]=c(TRUE,FALSE)
}
sum(prop[i])
You've made a few different errors here. The solution should look something like this.
Start by defining your variables and drawing your samples from the exponential distribution using rexp(100, 0.001):
r <- c(0.1, 0.2, 0.25, 0.5, 1, 2, 4, 5, 10)
set.seed(69) # Make random sample reproducible
x <- rexp(100, 0.001) # 100 random samples from exponential distribution
y <- rexp(100, 0.001) # 100 random samples from exponential distribution
Rsample <- x/y
The tricky part is getting the proportion of Rsample that is less than each value of r. For this we can use sapply instead of a loop.
props <- sapply(r, function(x) length(which(Rsample < x))/length(Rsample))
We get the cdf from the pdf by integrating from 0 to r (not shown), which gives 1 - 1/(1+r) = r/(1+r):
cdf_at_r <- r/(1 + r) # Integral of 1/(1+t)^2 from 0 to r, at the above values of r
And we can see what happens when we plot the proportions of the sample that are less than each value of r against the cdf:
plot(cdf_at_r, props)
# What do we notice?
lines(c(0, 1), c(0, 1), lty = 2, col = "red")
Created on 2020-03-05 by the reprex package (v0.3.0)
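The integration step that the answer leaves out can be written explicitly. Since R = X/Y is non-negative, the lower limit of the integral is 0, and that is what keeps the CDF from going negative:

```latex
F_R(r) = \int_0^r \frac{1}{(1+t)^2}\,dt
       = \left[-\frac{1}{1+t}\right]_0^r
       = 1 - \frac{1}{1+r}
       = \frac{r}{1+r}, \qquad r \ge 0 .
```

This differs from the bare antiderivative -1/(1+r) by the constant 1. As a quick check, F_R(0) = 0, F_R(1) = 1/2, and F_R(r) approaches 1 as r grows.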
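This is how you can count the number of elements for which Rsample is less than each element of r, keeping the loop from the question:
r <- c(0.1, 0.2, 0.25, 0.5, 1, 2, 4, 5, 10)
prop <- numeric(length(r))
for (i in seq_along(r)) {
  # sum() of a logical vector counts its TRUE values
  less <- sum(Rsample < r[i])
  # Convert the count to a proportion of the sample
  prop[i] <- less / length(Rsample)
}
prop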

R How to sample from an interrupted upside down bell curve

I asked a related question before, which received a successful answer. Now I want to sample values from an upside-down bell curve but exclude a range of values that fall in the middle of it, as shown in the picture below:
I have this code currently working:
min <- 1
max <- 20
q <- min + (max-min)*rbeta(10000, 0.5, 0.5)
How may I adapt it to achieve the desired output?
Say you want a sample of 10,000 from your distribution but don't want any numbers between 5 and 15 in your sample. Why not just do:
q <- min + (max-min)*rbeta(50000, 0.5, 0.5);
q <- q[!(q > 5 & q < 15)][1:10000]
Which gives you this:
hist(q)
But still has the correct size:
length(q)
#> [1] 10000
An "upside-down bell curve" compared to the normal distribution, with the exclusion of a certain interval, can be sampled using the following algorithm. I write it in pseudocode because I'm not familiar with R. I adapted it from another answer I just posted.
Notice that this sampler samples in a truncated interval (here, the interval [x0, x1], with the exclusion of [x2, x3]) because it's not possible for an upside-down bell curve extended to infinity to integrate to 1 (which is one of the requirements for a probability density).
In the pseudocode, RNDU01() is a uniform(0, 1) random number.
x0pdf = 1-exp(-(x0*x0))
x1pdf = 1-exp(-(x1*x1))
ymax = max(x0pdf, x1pdf)
while true
    # Choose a random x-coordinate
    x=RNDU01()*(x1-x0)+x0
    # Choose a random y-coordinate
    y=RNDU01()*ymax
    # Return x if y falls within PDF
    if (x<x2 or x>x3) and y < 1-exp(-(x*x)): return x
end
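A possible R translation of the pseudocode above, vectorised so that it draws candidates in batches instead of one at a time. The function name sample_inverted_bell and the interval values in the usage lines are illustrative assumptions. The curve 1 - exp(-x^2) is the unnormalised shape used in the pseudocode, which is all a rejection sampler needs.

```r
# Rejection sampler for the unnormalised density 1 - exp(-x^2) on [x0, x1],
# with the interval [x2, x3] excluded. Hypothetical helper, not a library call.
sample_inverted_bell <- function(n, x0, x1, x2, x3) {
  shape <- function(x) 1 - exp(-x^2)
  # The shape grows with |x|, so its maximum on [x0, x1] is at an endpoint
  ymax <- max(shape(x0), shape(x1))
  out <- numeric(0)
  while (length(out) < n) {
    x <- runif(n, x0, x1)   # candidate x-coordinates
    y <- runif(n, 0, ymax)  # candidate y-coordinates
    # Keep x if it lies outside the gap and the point falls under the curve
    keep <- (x < x2 | x > x3) & y < shape(x)
    out <- c(out, x[keep])
  }
  out[seq_len(n)]
}

set.seed(1)  # arbitrary seed
# Illustrative values: sample on [-3, 3] while skipping [-0.5, 0.5]
q <- sample_inverted_bell(10000, x0 = -3, x1 = 3, x2 = -0.5, x3 = 0.5)
hist(q, breaks = 50)
```

Each batch keeps only the accepted candidates, and the loop repeats until at least n values are collected, so the result always has exactly n elements.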

Extract approximate probability density function (pdf) in R from random sampling

I have n > 2 independent continuous random variables (RVs). For example, say I have 4 uniform RVs with different lower and upper bounds:
W~U[-1,5], X~U[0,1], Y~U[0,2], Z~U[0.5,2]
I am trying to find the approximate PDF of the sum of these RVs, i.e. of T=W+X+Y+Z. As I don't need a closed-form solution, I have sampled 1 million points for each of them to get 1 million samples of T. Is it possible in R to get the approximate PDF, or a way to get the approximate probability P(t<T), from the samples I have drawn? For example, is there an easy way to calculate P(0.5<T) in R? My priority is to get the probability first, even if getting the density function is not possible.
Thanks
Consider the ecdf function:
set.seed(123)
W <- runif(1e6, -1, 5)
X <- runif(1e6, 0, 1)
Y <- runif(1e6, 0, 2)
Z <- runif(1e6, 0.5, 2)
T <- Reduce(`+`, list(W, X, Y, Z))
cdfT <- ecdf(T)
1 - cdfT(0.5) # Pr(T > 0.5)
# [1] 0.997589
See How to calculate cumulative distribution in R? for more details.
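For the density part of the question, one option (not taken from the answer above, so treat it as a sketch) is a kernel density estimate from density(), wrapped with approxfun() so it can be evaluated at any point. The sample size is reduced to 1e5 here to keep it quick, and total and pdf_hat are illustrative names.

```r
set.seed(123)
n <- 1e5  # smaller than the 1e6 used above, only to keep the sketch fast
total <- runif(n, -1, 5) + runif(n, 0, 1) + runif(n, 0, 2) + runif(n, 0.5, 2)

# Kernel density estimate on a grid of x values (default settings)
dens <- density(total)

# Interpolate the grid into a function; return 0 outside the estimated range
pdf_hat <- approxfun(dens$x, dens$y, yleft = 0, yright = 0)

# Approximate density at a point of interest
pdf_hat(4.75)

# Sanity check: trapezoidal area under the estimate should be close to 1
area <- sum(diff(dens$x) * (head(dens$y, -1) + tail(dens$y, -1)) / 2)
area

# Tail probability straight from the sample, as an alternative to ecdf()
mean(total > 0.5)
```

The kernel estimate smooths the edges of the support slightly, so for probabilities the empirical proportion or ecdf() is the more direct route, and the density estimate is mainly useful for plotting or for evaluating the approximate PDF at a point.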
