I'm trying to understand a proposed solution for a University test.
Let me assume that we have created a random variable with
set.seed(123)
R <- 5
X <- rexp(R, 2)
So the content of X is
0.42172863 0.28830514 0.66452743 0.01578868 0.02810549
In the solutions of the problem I find
Y <- rpois(R, exp(X / 4))
where the content of exp(X / 4) is
1.111191 1.074737 1.180729 1.003955 1.007051
where, contrary to my expectations, the second argument is a vector rather than a scalar.
If I calculate
print(rpois(R, 1.111191))
print(rpois(R, 1.074737))
print(rpois(R, 1.180729))
print(rpois(R, 1.003955))
print(rpois(R, 1.007051))
I get
2 1 1 3 1
1 1 0 2 0
0 1 3 3 2
1 4 1 1 1
1 0 0 3 2
while for rpois(R, exp(X / 4)) I get
1 2 0 1 2
How are the two results related?
It's a behaviour I can't find explained anywhere.
R makes its functions vectorized wherever it's reasonable to do so.
In particular, in the function call rpois(R, lambda), R specifies the number of samples to take, and lambda is the vector of means, which is recycled to match R. In other words, if lambda is a single value then the same mean will be used for each Poisson draw; if it is a vector of length R, then each element of the vector will be used for the corresponding Poisson draw.
So the equivalent of Y <- rpois(R, exp(X / 4)) would be
Y <- c(
  rpois(1, exp(X[1]/4)),
  rpois(1, exp(X[2]/4)),
  rpois(1, exp(X[3]/4)),
  ...
)
We could also do this with a for loop:
Y <- numeric(R) ## allocate a length-R numeric vector
for (i in seq(R)) {
  Y[i] <- rpois(1, exp(X[i]/4))
}
Using the vectorized version whenever it's available is best practice; it's faster and requires less code (therefore easier to understand).
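To see the recycling itself (a small sketch, not part of the original answer): when lambda is shorter than the number of draws requested, its elements are reused in order.
set.seed(1)          ## arbitrary seed
rpois(6, c(1, 100))  ## draws 1, 3, 5 use mean 1; draws 2, 4, 6 use mean 100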
Related
I am trying to write a function that creates a vector that counts up and back based on the number given, c(1:n, (n-1):1). When 3 is entered, however, I want the vector to display as 1,1,1,2,2,2,3,3,3 instead of 1,2,3,2,1. I have tried using if(n==3), but when I try to run it I get an error that says "n cannot be found", and I can't quite understand why. Any help is very much appreciated! Here is what I have tried:
vector<-function(n)
c(1:n, (n-1):1)
if(n==3)
c(rep(1,3),rep(2,3),rep(3,3))
Problems
There are several problems with the code in the question:
the { ... } are missing from the function, so only the first line after the function line would actually be regarded as part of the function.
a function returns the value of the last statement executed; once the braces are added, the last statement executed is the if (or the body of the if), so the c(1:n, (n-1):1) statement is computed but its value can never be returned.
also, if n = 1 then c(1:n, (n-1):1) gives 1,0,1, which is not likely what you want.
c(rep(1,3),rep(2,3),rep(3,3)) is not wrong in terms of the result it gives, but rep can be used in a more compact manner.
normally x:y is not used in programming because if y < x then it unexpectedly gives values descending from x to y. In this case the if statements exclude such a possibility, but you might want to replace the colon with the appropriate seq anyway. The Alternatives for second leg of last if section below provides such an alternative.
Solution
Instead try this. It first checks if n is less than 1 and if so returns a zero length vector; otherwise, the remaining if is run with two legs, one leg for the n = 1 or n = 3 case and one leg for the remaining cases.
(If you are willing to only have this work for n > 0 then we could omit the first if. If you are willing to only have this work n > 1 then we could omit the n==1 part of the condition in the last if too.)
myfun <- function(n) {
  if (n < 1) integer(0)
  else if (n == 1 || n == 3) rep(1:n, each = n)
  else c(1:n, (n-1):1)
}
giving:
myfun(-1)
## integer(0)
myfun(0)
## integer(0)
myfun(1)
## [1] 1
myfun(2)
## [1] 1 2 1
myfun(3)
## [1] 1 1 1 2 2 2 3 3 3
myfun(4)
## [1] 1 2 3 4 3 2 1
Alternatives for first leg of last if
Here are some alternatives for the first leg, shown here for n = 3.
rep(1:n, each = n)
## [1] 1 1 1 2 2 2 3 3 3
c(outer(rep(1, n), 1:n))
## [1] 1 1 1 2 2 2 3 3 3
c(col(diag(n)))
## [1] 1 1 1 2 2 2 3 3 3
Alternatives for second leg of last if
and here are some alternatives for the second leg. The first assumes n > 1 and the others assume n > 0. In the code in the Solution section we handle n = 1 in the n = 3 leg, so any of the following could be used. As the first alternative below does not handle n = 1, it relies on the first leg of the last if handling n = 1; the remaining alternatives handle n = 1 correctly, so they could be used even if the first leg only handled n = 3.
c(1:n, (n-1):1) # only works for n > 1
c(seq_len(n), rev(seq_len(n-1)))
pmin(seq(2*n - 1), seq(2*n-1, 1))
n - abs((n-1):-(n-1))
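As a quick sanity check (a sketch using the snippets above), all of the alternatives agree with the original up-and-back construction for n > 1:
## compare the second-leg alternatives for a few values of n
for (n in 2:6) {
  a <- c(1:n, (n-1):1)
  b <- c(seq_len(n), rev(seq_len(n - 1)))
  d <- pmin(seq(2*n - 1), seq(2*n - 1, 1))
  e <- n - abs((n-1):-(n-1))
  stopifnot(all(a == b), all(a == d), all(a == e))
}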
Try this one, it's working:
vector <- function(n)
{
  if(n==3)
    rep(1:3, each=3)
  else
    c(1:n, (n-1):1)
}
I assume you first ran the function as a one-liner and it worked, and then you added the conditional statement.
Try this
vector <- function(n){
  m <- c(1:n, (n-1):1)
  if(n==3) m <- c(rep(1,3), rep(2,3), rep(3,3))
  m
}
Another way to do it
Vector2 <- function(n){
  if(n == 3){
    return(c(rep(1,3), rep(2,3), rep(3,3)))
  } else {
    return(c(1:n, (n-1):1))
  }
}
I would like to generate N random positive integers that sum to M. I would like the random positive integers to be selected around a fairly normal distribution whose mean is M/N, with a small standard deviation (is it possible to set this as a constraint?).
Finally, how would you generalize the answer to generate N random positive numbers (not just integers)?
I found other relevant questions, but couldn't determine how to apply their answers to this context:
https://stats.stackexchange.com/questions/59096/generate-three-random-numbers-that-sum-to-1-in-r
Generate 3 random number that sum to 1 in R
R - random approximate normal distribution of integers with predefined total
Normalize.
rand_vect <- function(N, M, sd = 1, pos.only = TRUE) {
  vec <- rnorm(N, M/N, sd)
  if (abs(sum(vec)) < 0.01) vec <- vec + 1  # nudge away from a near-zero sum before dividing by it
  vec <- round(vec / sum(vec) * M)          # scale to sum M, then round
  deviation <- M - sum(vec)                 # rounding error left to distribute
  for (. in seq_len(abs(deviation))) {
    vec[i] <- vec[i <- sample(N, 1)] + sign(deviation)  # the RHS assigns i before the LHS uses it
  }
  if (pos.only) while (any(vec < 0)) {
    negs <- vec < 0
    pos <- vec > 0
    vec[negs][i] <- vec[negs][i <- sample(sum(negs), 1)] + 1
    vec[pos][i] <- vec[pos ][i <- sample(sum(pos ), 1)] - 1
  }
  vec
}
For a continuous version, simply use:
rand_vect_cont <- function(N, M, sd = 1) {
  vec <- rnorm(N, M/N, sd)
  vec / sum(vec) * M
}
Examples
rand_vect(3, 50)
# [1] 17 16 17
rand_vect(10, 10, pos.only = FALSE)
# [1] 0 2 3 2 0 0 -1 2 1 1
rand_vect(10, 5, pos.only = TRUE)
# [1] 0 0 0 0 2 0 0 1 2 0
rand_vect_cont(3, 10)
# [1] 2.832636 3.722558 3.444806
rand_vect(10, -1, pos.only = FALSE)
# [1] -1 -1 1 -2 2 1 1 0 -1 -1
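As a quick sanity check (a sketch using the functions above), the integer version always hits the target sum exactly, and the continuous version does so up to floating-point error:
set.seed(42)  ## arbitrary seed for reproducibility
stopifnot(sum(rand_vect(5, 100)) == 100)
stopifnot(abs(sum(rand_vect_cont(5, 100)) - 100) < 1e-9)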
Just came up with an algorithm to generate N random numbers greater than or equal to k whose sum is S, in a uniformly distributed manner. I hope it will be of use here!
First, generate N-1 random numbers between k and S - k(N-1), inclusive. Sort them in descending order. Then, for all x_i with i <= N-2, apply x'_i = x_i - x_{i+1} + k, and set x'_{N-1} = x_{N-1} (use two buffers). The Nth number is just S minus the sum of all the obtained quantities. This has the advantage of giving the same probability to all the possible combinations. If you want positive integers, k = 0 (or maybe 1?). If you want reals, use the same method with a continuous RNG. If your numbers are to be integer, you may care about whether they can or can't be equal to k. Best wishes!
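Here is one possible R rendering of this recipe for the continuous case (my own sketch, not the answerer's code; rand_sum_unif is a made-up name):
rand_sum_unif <- function(N, S, k = 0) {
  ## N-1 draws between k and S - k(N-1), sorted in descending order
  x <- sort(runif(N - 1, min = k, max = S - k * (N - 1)), decreasing = TRUE)
  ## x'_i = x_i - x_{i+1} + k for i <= N-2; x'_{N-1} = x_{N-1}
  out <- c(x[-(N - 1)] - x[-1] + k, x[N - 1])
  ## the Nth number is S minus the sum of the rest
  c(out, S - sum(out))
}
v <- rand_sum_unif(4, 10)
sum(v)  ## 10, with every element >= 0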
Explanation: by taking out one of the numbers, all the combinations of values which allow a valid Nth number form a simplex when represented in (N-1)-space, which lies at one vertex of an (N-1)-cube (the (N-1)-cube described by the range of the random values). After generating them, we have to map all points in the (N-1)-cube to points in the simplex. For that purpose, I have used one method of triangulation which involves all possible permutations of coordinates in descending order. By sorting the values, we are mapping all (N-1)! simplices to only one of them. We also have to translate and scale the numbers vector so that all coordinates lie in [0, 1], by subtracting k and dividing the result by S - kN. Let us name the new coordinates y_i.
Then we apply the transformation by multiplying the inverse matrix of the original basis, something like this:
    / 1 1 1 \            / 1 -1  0 \
B = | 0 1 1 | ,  B^-1 =  | 0  1 -1 | ,  Y' = B^-1 Y
    \ 0 0 1 /            \ 0  0  1 /
which gives y'_i = y_i - y_{i+1}. When we rescale the coordinates, we get:
x'_i = y'_i (S - kN) + k = y_i (S - kN) - y_{i+1} (S - kN) + k = (x_i - k) - (x_{i+1} - k) + k = x_i - x_{i+1} + k, hence the above formula. This is applied to all elements except the last one.
Finally, we should take into account the distortion that this transformation introduces into the probability distribution. Actually, and please correct me if I'm wrong, the transformation applied to the first simplex to obtain the second should not alter the probability distribution. Here is the proof.
The probability increase at any point is the increase in the volume of a local region around that point as the size of the region tends to zero, divided by the total volume increase of the simplex. In this case, the two volumes are the same (just take the determinants of the basis vectors). The probability distribution will be the same if the linear increase of the region volume is always equal to 1. We can calculate it as the determinant of the transpose matrix of the derivative of a transformed vector V' = B^-1 V with respect to V, which, of course, is B^-1.
Calculation of this determinant is quite straightforward, and it gives 1, which means that the points are not distorted in any way that would make some of them more likely to appear than others.
I figured out what I believe to be a much simpler solution. You first generate random integers from your minimum to maximum range, count them up and then make a vector of the counts (including zeros).
Note that this solution may include zeros even if the minimum value is greater than zero.
Hope this helps future r people with this problem :)
rand.vect.with.total <- function(min, max, total) {
  # generate `total` random draws from min:max
  x <- sample(min:max, total, replace=TRUE)
  # count how many times each value appeared
  sum.x <- table(x)
  # convert counts to positions 1..length(min:max),
  # looking up each actual value so that min need not be 1
  vals <- min:max
  out <- vector()
  for (i in seq_along(vals)) {
    out[i] <- sum.x[as.character(vals[i])]
  }
  out[is.na(out)] <- 0
  return(out)
}
rand.vect.with.total(0, 3, 5)
# [1] 3 1 1 0
rand.vect.with.total(1, 5, 10)
#[1] 4 1 3 0 2
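A quick check of the counting behaviour (a sketch; the exact counts depend on the random draws):
set.seed(7)  ## arbitrary seed for reproducibility
v <- rand.vect.with.total(0, 3, 5)
length(v)  ## 4, one count per value in 0:3
sum(v)     ## 5, the counts always sum to `total`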
We have a big for loop in R for simulating various data, where in some iterations the data are generated in such a way that a quantity becomes 0 inside the loop. This is not desirable, so we should skip that step of data generation. But at the same time we also need to increase the number of iterations by one for each such skip, otherwise we will end up with fewer observations than required.
For example, while running the following code, we get z=0 in iterations 1, 8 and 9.
rm(list=ls())
n <- 10
z <- NULL
for(i in 1:n){
  set.seed(i)
  a <- rbinom(1,1,0.5)
  b <- rbinom(1,1,0.5)
  z[i] <- a+b
}
z
[1] 0 1 1 1 1 2 1 0 0 1
We desire to skip these steps so that we do not get any z=0, but we also want a vector z of length 10. It may be done in many ways, but what I particularly want to see is how we can stop the current iteration when z=0 is encountered, skip it, and go on to the next step, ultimately obtaining 10 observations for z.
Normally we do this via a while loop, as the number of iterations required is unknown beforehand.
n <- 10L
z <- integer(n)
m <- 1L; i <- 0L
while (m <= n) {
  set.seed(i)
  z_i <- sum(rbinom(2L, 1, 0.5))
  if (z_i > 0L) {z[m] <- z_i; m <- m + 1L}
  i <- i + 1L
}
Output:
z
# [1] 1 1 1 1 1 2 1 1 1 1
i
# [1] 14
So we sample 14 times, 4 of which are 0 and the other 10 are retained.
More efficient vectorized method
set.seed(0)
n <- 10L
z <- rbinom(n, 1, 0.5) + rbinom(n, 1, 0.5)
m <- length(z <- z[z > 0L])    ## retained (non-zero) samples
p <- m / n                     ## estimated acceptance probability
k <- round(1.5 * (n - m) / p)  ## extra draws, inflated so we expect at least (n - m) non-zero ones
z_more <- rbinom(k, 1, 0.5) + rbinom(k, 1, 0.5)
z <- c(z, z_more[which(z_more > 0)[seq_len(n - m)]])
Some probability theory of the geometric distribution has been used here. Initially we take n samples, m of which are retained, so the estimated probability of accepting a sample is p <- m/n. By the theory of the geometric distribution, on average we need 1/p draws to observe one success; therefore we should sample at least (n-m)/p more times to expect (n-m) successes. The 1.5 is just an inflation factor: by sampling 1.5 times as many we can hopefully ensure (n-m) successes.
By the law of large numbers, the estimate of p is more precise when n is large, so this approach is stable for large n.
If you feel that 1.5 is not large enough, use 2 or 3. But my feeling is that it is sufficient.
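For this particular example the true acceptance probability is actually known, so we can check the estimate (a quick aside, not part of the original answer): z = 0 only when both Bernoulli draws are 0, which happens with probability 0.5^2 = 0.25, so about 75% of draws are accepted.
## empirical acceptance rate should be close to 1 - 0.5^2 = 0.75
set.seed(1)  ## arbitrary seed
mean(rbinom(1e5, 1, 0.5) + rbinom(1e5, 1, 0.5) > 0)
## ~ 0.75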
I am a complete statistical noob and new to R, hence the question. I've tried to find an implementation of the Rao score test for the particular case where the data are binary and each observation has a Bernoulli distribution. I stumbled upon anova in R but failed to understand how to use it. Therefore, I tried implementing the Rao score test for this particular case myself:
rao.score.bern <- function(data, p0) {
  # assume `data` is a list of 0s and 1s
  y <- sum(data)
  n <- length(data)
  phat <- y / n
  z <- (phat - p0) / sqrt(p0 * (1 - p0) / n)
  p.value <- 2 * (1 - pnorm(abs(z)))
}
I am pretty sure that there is a bug in my code because it produces only two distinct p-values in the following scenario:
p0 <- 1 / 4
p <- seq(from=0.01, to=0.5, by=0.01)
n <- seq(from=5, to=70, by=1)
g <- expand.grid(n, p)
data <- apply(g, 1, function(x) rbinom(x[1], 1, x[2]))
p.values <- sapply(data, function(x) rao.score.bern(x[[1]], p0))
Could someone please show me where the problem is? Could you perhaps point me to a built-in solution in R?
First test, then debug.
Test
Does rao.score.bern work at all?
rao.score.bern(c(0,0,0,1,1,1), 1/6)
This returns...nothing! Fix it by replacing the ultimate line by
2 * (1 - pnorm(abs(z)))
This eliminates the unnecessary assignment.
rao.score.bern(c(0,0,0,1,1,1), 1/6)
[1] 0.02845974
OK, now we're getting somewhere.
Debug
Unfortunately, the code still doesn't work. Let's debug by yanking the call to rao.score.bern and replacing it by something that shows us the input. Don't apply it to the large input you created! Use a small piece of it:
sapply(data[1:5], function(x) x[[1]])
[1] 0 0 0 0 0
That's not what you expected, is it? It's returning just one zero for each element of data. What about this?
sapply(data[1:5], function(x) x)
[[1]]
[1] 0 0 0 0 0
[[2]]
[1] 0 0 0 0 0 0
...
[[5]]
[1] 0 0 0 0 0 0 0 0 0
Much better! The variable x in the call to sapply refers to the entire vector, which is what you want to pass to your routine. Whence
p.values <- sapply(data, function(x) rao.score.bern(x, p0)); hist(p.values)
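As for a built-in alternative (an aside worth double-checking): prop.test without continuity correction performs the score test for a single proportion, so its chi-squared statistic equals z^2 from rao.score.bern and the p-values agree.
x <- c(0, 0, 0, 1, 1, 1)
prop.test(sum(x), length(x), p = 1/6, correct = FALSE)$p.value
## [1] 0.02845974, matching rao.score.bern(x, 1/6)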