In R, I have a vector of integers. I would like to randomly reduce the value of each element so that the sum of the resulting vector is a given percentage of the initial sum.
In this example, I would like to reduce the vector "x" to a vector "y", where each element has been randomly reduced so that the elements sum to 50% of the initial sum.
The resulting vector should contain values that are non-negative and no larger than the original values.
set.seed(1)
perc<-50
x<-sample(1:5,10,replace=TRUE)
xsum<-sum(x) # sum is 33
toremove<-floor(xsum*perc*0.01)
x # 2 2 3 5 2 5 5 4 4 1
y<-magicfunction(x,perc)
y # 0 2 1 4 0 3 2 1 2 1
sum(y) # sum is 16 (rounded half of 33)
Can you think of a way to do it? Thanks!
Assuming that x is long enough (and regular enough in certain other ways), we may rely on an appropriate law of large numbers. For that purpose we multiply each element of x by an independent draw of another random variable Z that takes values in [0,1] and has mean perc, so that the sum shrinks to approximately perc times the original sum.
set.seed(1)
perc <- 50 / 100
x <- sample(1:10000, 1000)
sum(x)
# [1] 5014161
x <- round(x * rbeta(length(x), perc / 3 / (1 - perc), 1 / 3))
sum(x)
# [1] 2550901
sum(x) * 2
# [1] 5101802
sum(x) * 2 / 5014161
# [1] 1.017479 # About 1.7% deviation
Here I chose for Z a beta distribution with mean perc, but you could pick another distribution. The lower its variance, the closer the resulting sum is to the target. For instance, the following works much better, because the beta distribution chosen above is in fact bimodal (both shape parameters are below 1) and therefore has a large variance:
set.seed(1)
perc <- 50 / 100
x <- sample(1:1000, 100)
sum(x)
# [1] 49921
x <- round(x * rbeta(length(x), 100 * perc / (1 - perc), 100))
sum(x)
# [1] 24851
sum(x) * 2
# [1] 49702
sum(x) * 2 / 49921
# [1] 0.9956131 # Less than 0.5% deviation!
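The two snippets above can be wrapped into a small helper. This is only a sketch: the name reduce_by_beta and the conc argument are my own (hypothetical), with conc playing the role of the constant 100 used above. Note that the sum only lands near the target, it is not guaranteed to hit it exactly.

```r
# Sketch: scale every element by an independent Beta draw with mean `perc`.
# `reduce_by_beta` and `conc` are illustrative names, not from the answer.
reduce_by_beta <- function(x, perc, conc = 100) {
  stopifnot(perc > 0, perc < 1, conc > 0)
  # shape1 / (shape1 + shape2) = perc, so E[Z] = perc;
  # a larger conc gives a lower variance
  z <- rbeta(length(x), conc * perc / (1 - perc), conc)
  round(x * z)  # z lies in [0, 1], so 0 <= y <= x
}

set.seed(1)
x <- sample(1:1000, 1000, replace = TRUE)
y <- reduce_by_beta(x, perc = 0.5)
sum(y) / sum(x)  # close to 0.5, but not exactly 0.5
```

Raising conc tightens the result around the target at the cost of making the per-element reductions look less random.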
An alternative solution is this function, which downsamples the original vector by a random fraction proportional to the vector element size. Then it checks that elements don't fall below zero, and iteratively approaches an optimal solution.
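removereads<-function(x,perc){
  xsum<-sum(x)
  toremove<-floor(xsum*perc) # total amount to remove
  toremove2<-toremove # amount still left to remove
  irem<-1
  y<-x # start from the original and keep reducing the running result
  while(toremove2>(toremove*0.01)){
    message("Downsampling iteration ",irem)
    # pick units to remove, proportionally to the current element sizes
    tmp<-sample(seq_along(y),toremove2,prob=y,replace=TRUE)
    tmp2<-table(tmp)
    common<-as.numeric(names(tmp2))
    y[common]<-y[common]-as.vector(tmp2)
    y[y<0]<-0 # clip elements that were reduced below zero
    toremove2<-toremove-(xsum-sum(y)) # amount that clipping left unremoved
    irem<-irem+1
  }
  return(y)
}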
set.seed(1)
x<-sample(1:1000,10000,replace=TRUE)
perc<-0.9
y<-removereads(x,perc)
plot(x,y,xlab="Before reduction",ylab="After reduction")
abline(0,1)
The resulting plot (not reproduced here) shows y against x, with the line y = x for reference.
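If the sum has to be hit exactly rather than approximately, one option (not taken from the answers here, just a sketch) is to treat every element as a pile of units and draw the units to remove without replacement. The name remove_exact is hypothetical; perc is the fraction to remove, as in removereads. The units vector has sum(x) entries, so this assumes the total is small enough to fit in memory.

```r
# Sketch: remove exactly floor(sum(x) * perc) units, chosen uniformly at random
# without replacement, so no element can lose more than it has.
# `remove_exact` is an illustrative name; `perc` is a fraction, as in removereads().
remove_exact <- function(x, perc) {
  toremove <- floor(sum(x) * perc)
  # one entry per unit, labelled by the element it belongs to
  units <- rep(seq_along(x), times = x)
  # index into units instead of calling sample(units, ...) directly
  picked <- units[sample.int(length(units), toremove)]
  # subtract the per-element removal counts
  x - tabulate(picked, nbins = length(x))
}

set.seed(1)
x <- sample(1:5, 10, replace = TRUE)
y <- remove_exact(x, 0.5)
sum(x) - sum(y) == floor(sum(x) * 0.5)  # TRUE
```

Because units are drawn without replacement, larger elements still tend to lose more, and no clipping or iteration is needed.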
Here's a solution which uses draws from the Dirichlet distribution:
set.seed(1)
x = sample(10000, 1000, replace = TRUE)
magic = function(x, perc, alpha = 1){
# sample from the Dirichlet distribution
# sum(p) == 1
# lower values should reduce by less than larger values
# larger alpha means the result will have more "randomness"
p = rgamma(length(x), x / alpha, 1)
p = p / sum(p)
# scale p up an amount so we can subtract it from x
# and get close to the desired sum
reduce = round(p * (sum(x) - sum(round(x * perc))))
y = x - reduce
# No negatives
y = pmax(y, 0)
return (y)
}
alpha = 500
perc = 0.7
target = sum(round(perc * x))
y = magic(x, perc, alpha)
# Hopefully close to 1
sum(y) / target
# [1] 1.000048
# Measure of the "randomness"
sd(y / x)
# [1] 0.1376637
Basically, it tries to figure out how much to reduce each element by while still getting close to the sum you want. You can control how "random" you want the new vector by increasing alpha.
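The first two lines of magic rely on a standard construction: independent gamma draws with rate 1, divided by their sum, follow a Dirichlet distribution whose parameters are the gamma shapes. A minimal sketch of just that step (rdirichlet_sketch is a hypothetical name, and the small x is made up for illustration):

```r
# Sketch: one Dirichlet draw via normalised gamma variates.
# `rdirichlet_sketch` is an illustrative name, not an existing function.
rdirichlet_sketch <- function(shape) {
  g <- rgamma(length(shape), shape = shape, rate = 1)
  g / sum(g)
}

set.seed(1)
x <- c(100, 200, 700)
alpha <- 10
p <- rdirichlet_sketch(x / alpha)
sum(p)      # 1, up to floating-point error
x / sum(x)  # the expected proportions: 0.1 0.2 0.7
```

Dividing x by a larger alpha shrinks the shape parameters, which spreads the draws further around these expected proportions.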
I want to generate a vector of a given length, e.g., n = 5. Each value in the vector should be a proportion (i.e., a value between 0 and 1) so that across n elements they sum up to 1.
Unfortunately, I have two vectors: one (mymins) defines the allowed lower boundaries of each proportion and the other (mymaxs) defines the allowed top boundaries of each proportion.
In my example below the desired proportion for the first element is allowed to fall anywhere between 0.3 and 0.9. And for the last element, the desired proportion is allowed to fall between 0.05 and 0.7.
mymins <- c(0.3, 0.1, 0, 0.2, 0.05)
mymaxs <- c(0.9, 1, 1, 1, 0.7)
Let's assume that mymins are always 'legitimate' (i.e., their sum is never larger than 1).
How could I find a set of 5 proportions such that they all sum to 1 but lie within the boundaries?
Here is what I tried:
n = 5
mydif <- mymaxs - mymins # possible range for each proportion
myorder <- rank(mydif) # order those differences from smallest to largest
mytarget <- sum(mydif) # sum up the 5 ranges
x <- sort(runif(n))[myorder] # generate 5 random values and sort them in the order of mydif
x2 <- mymins + x / sum(x) * mytarget # rescale random values to sum up to mytarget and add them to mymins
x3 <- x2/sum(x2) # rescale x2 to sum up to 1
As you can see, I am not very far - because after rescaling some values are outside of their allowed boundaries.
I should probably also mention that I need this operation to be fast - because I am using it in an optimization loop.
I also tried to find a solution using optim. However, the problem is that it always finds the same solution, and I need to generate a DIFFERENT solution every time I compute the proportions:
myfun <- function(x) {
x <- round(x, 4)
abovemins <- x - mymins
n_belowmins <- sum(abovemins < 0)
if (n_belowmins > 0) return(100000)
belowmax <- x - mymaxs
n_abovemax <- sum(belowmax > 0)
if (n_abovemax > 0) return(100000)
mydist <- abs(sum(x) - 1)
return(mydist)
}
myopt <- optim(par = mymins + 0.01, fn = myfun)
myopt$par
sum(round(myopt$par, 4))
Thank you very much for your suggestions!
Perhaps it's better to think of this in a different way. Your samples actually need to sum to 0.35 (which is 1 - sum(mymins)) and are then added on to the minimum values:
constrained_sample <- function(mymins, mymaxs)
{
  sizes <- mymaxs - mymins
  # one draw per element, rather than a hard-coded 5
  samp <- (runif(length(mymins)) * sizes)
  samp/sum(samp) * (1 - sum(mymins)) + mymins
}
It works like this:
constrained_sample(mymins, mymaxs)
#> [1] 0.31728333 0.17839397 0.07196067 0.29146744 0.14089459
We can test this works by running the following loop, which will print a message to the console if any of the criteria aren't met:
for(i in 1:1000)
{
test <- constrained_sample(mymins, mymaxs)
  if(!all(test > mymins) || !all(test < mymaxs) || abs(sum(test) - 1) > 1e-6) cat("failure")
}
This prints nothing, since the criteria are always met. However, as @GregorThomas points out, the bounds aren't realistic in this case. We can see the range of solutions allowed by your conditions using a boxplot:
library(ggplot2)
# 1000 samples, one per row
samp <- t(replicate(1000, constrained_sample(mymins, mymaxs)))
df <- data.frame(val = as.vector(samp),
                 index = factor(rep(1:5, each = 1000)))
ggplot(df, aes(x = index, y = val)) + geom_boxplot()
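One caveat: the rescaling in constrained_sample guarantees the lower bounds and the unit sum, but for other bound vectors an element can end up above its maximum. A hedged way to cover that case is to redraw until the sample passes the check. In the sketch below the name constrained_sample_checked, the max_tries argument and the two-element bounds are all made up for illustration:

```r
# Sketch: redraw until every element is within its bounds (rejection sampling).
# `constrained_sample_checked` and `max_tries` are illustrative names.
constrained_sample_checked <- function(mins, maxs, max_tries = 1000) {
  stopifnot(sum(mins) <= 1, sum(maxs) >= 1)
  for (i in seq_len(max_tries)) {
    samp <- runif(length(mins)) * (maxs - mins)
    out <- samp / sum(samp) * (1 - sum(mins)) + mins
    if (all(out >= mins & out <= maxs)) return(out)
  }
  stop("no valid sample found in ", max_tries, " tries")
}

set.seed(1)
# without the check, the first element could exceed its maximum of 0.3 here
p <- constrained_sample_checked(c(0, 0), c(0.3, 1))
p
sum(p)
```

The max_tries cap keeps the function from spinning forever when the bounds make valid samples rare.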
Because you need 5 random numbers to sum to 1, you really only have 4 independent numbers and one dependent number.
mymins <- c(0.3, 0.1, 0, 0.2, 0.05)
mymaxs <- c(0.9, 1, 1, 1, 0.7)
set.seed(42)
iter <- 1000
while(iter > 0 &&
(
(1 - sum(x <- runif(4, mymins[-5], mymaxs[-5]))) < mymins[5] ||
(1 - sum(x)) > mymaxs[5]
)
) iter <- iter - 1
if (iter < 1) {
# failed
stop("unable to find something within 1000 iterations")
} else {
x <- c(x, 1-sum(x))
}
sum(x)
# [1] 1
all(mymins <= x & x <= mymaxs)
# [1] TRUE
x
# [1] 0.37732330 0.21618036 0.07225311 0.24250359 0.09173965
The reason I use iter there is to make sure you don't take an "infinite" amount of time to find something. If your mymins and mymaxs combination make this mathematically infeasible (as your first example was), then you don't need to spin forever. If it is mathematically improbable to find it in a reasonable amount of time, you need to weigh how long you want to do this.
One reason this takes so long is that we are iteratively pulling entropy. If you expect this to go for a long time, then it is generally better to pre-calculate as much as you think you'll need (overall) and run things as a matrix.
set.seed(42)
n <- 10000
m <- matrix(runif(prod(n, length(mymins)-1)), nrow = n)
m <- t(t(m) * (mymaxs[-5] - mymins[-5]) + mymins[-5])
remainders <- (1 - rowSums(m))
ind <- mymins[5] <= remainders & remainders <= mymaxs[5]
table(ind)
# ind
# FALSE TRUE
# 9981 19
m <- cbind(m[ind,,drop=FALSE], remainders[ind])
nrow(m)
# [1] 19
rowSums(m)
# [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
head(m)
# [,1] [,2] [,3] [,4] [,5]
# [1,] 0.3405821 0.1306152 0.05931363 0.2199362 0.24955282
# [2,] 0.3601376 0.1367465 0.20235704 0.2477507 0.05300821
# [3,] 0.4469526 0.1279795 0.02265618 0.2881733 0.11423845
# [4,] 0.5450527 0.1029903 0.07503371 0.2052423 0.07168103
# [5,] 0.3161519 0.1469783 0.15290720 0.3268470 0.05711557
# [6,] 0.4782448 0.1185735 0.01664063 0.2178225 0.16871845
all(
mymins[1] <= m[,1] & m[,1] <= mymaxs[1],
mymins[2] <= m[,2] & m[,2] <= mymaxs[2],
mymins[3] <= m[,3] & m[,3] <= mymaxs[3],
mymins[4] <= m[,4] & m[,4] <= mymaxs[4],
mymins[5] <= m[,5] & m[,5] <= mymaxs[5]
)
# [1] TRUE
This time it took 10000 attempts to produce 19 valid combinations. It might take more or fewer attempts depending on the random draws, so your mileage may vary with regard to how much you need to pre-generate.
If your example bounds are realistic, we can refine them quite a bit, narrowing the range of possibilities. For the current version of the question with:
mymins = c(0.3, 0.1, 0, 0.2, 0.05)
mymaxs = c(0.9, 1, 1, 1, 0.7)
What's the max for x[1]? Well, if x[2:5] take on minimum values, they will add up to 0.1 + 0 + 0.2 + 0.05 = 0.35, so based on the other mins only we know that max value for x[1] is 1 - 0.35 = 0.65. The 0.9 in mymaxs is way too high.
We can calculate the actual max values taking the minimum of the max values based on the minimums and the mymaxs vector:
new_max = pmin(mymaxs, 1 - (sum(mymins) - mymins))
new_max
# [1] 0.65 0.45 0.35 0.55 0.40
We can similarly revise the min bounds, though in this case even the revised max bounds new_max are high enough that doing so has no impact on the minimums.
new_min = pmax(mymins, 1 - (sum(new_max) - new_max))
new_min
# [1] 0.30 0.10 0.00 0.20 0.05
With these adjustments, we should be able to see easily whether any solutions are possible (all(new_min <= new_max)). Generating random numbers as in r2evans's answer should then go much quicker using the new bounds.
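Putting the two ideas together could look roughly like this. It is a sketch only: sample_tight and max_iter are hypothetical names, and the loop is the same reject-and-retry idea as in r2evans's answer, just run on the tightened bounds. For the example bounds about 1 draw in 24 should be accepted, since the first four proportions each get a range of width 0.35 and their offsets must sum to at most 0.35.

```r
# Sketch: tighten the bounds, then rejection-sample the first n - 1 proportions.
# `sample_tight` and `max_iter` are illustrative names.
sample_tight <- function(mins, maxs, max_iter = 10000) {
  n <- length(mins)
  new_max <- pmin(maxs, 1 - (sum(mins) - mins))
  new_min <- pmax(mins, 1 - (sum(new_max) - new_max))
  stopifnot(all(new_min <= new_max))
  for (i in seq_len(max_iter)) {
    x <- runif(n - 1, new_min[-n], new_max[-n])
    last <- 1 - sum(x)  # the dependent proportion
    if (new_min[n] <= last && last <= new_max[n]) return(c(x, last))
  }
  stop("unable to find a valid sample within ", max_iter, " iterations")
}

mymins <- c(0.3, 0.1, 0, 0.2, 0.05)
mymaxs <- c(0.9, 1, 1, 1, 0.7)
set.seed(42)
p <- sample_tight(mymins, mymaxs)
sum(p)
all(mymins <= p & p <= mymaxs)
```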
Is there a method to generate random integers in R such that any two consecutive numbers are different? It is probably along the lines of x[k+1] != x[k] but I can't work out how to put it all together.
Not sure if there is a function available for that. Maybe this function can do what you want:
# n = number of elements
# sample_from = draw random numbers from this range
random_non_consecutive <- function(n=10,sample_from = seq(1,5))
{
y=c()
while(length(y)!=n)
{
y= c(y,sample(sample_from,n-length(y),replace=T))
y=y[!c(FALSE, diff(y) == 0)]
}
return(y)
}
Example:
random_non_consecutive(20,c(2,4,6,8))
[1] 6 4 6 2 6 4 2 8 4 2 6 2 8 2 8 2 8 4 8 6
Hope this helps.
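The function above has no fixed bound on its runtime, because it keeps resampling until enough values survive the filtering. We can make the runtime predictable with, for example, the following implementation, which draws each element exactly once:
# n = number of elements
# sample_from = draw random numbers from this range
random_non_consecutive <- function(n=10,sample_from = seq(1,5))
{
  y = rep(NA, n)
  prev = NA # no previous value yet, so nothing is excluded on the first draw
  for(i in seq_len(n)){
    candidates = setdiff(sample_from, prev)
    # index into candidates: sample(candidates, 1) would draw from 1:candidates
    # when only a single numeric candidate is left
    y[i] = candidates[sample.int(length(candidates), 1)]
    prev = y[i]
  }
  return(y)
}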
Another approach is to over-sample and remove the disqualifying ones as follows:
# assumptions
n <- 5 # population size
sample_size <- 1000
# answer
mu <- sample_size * 1/n
vr <- sample_size * 1/n * (1 - 1/n)
addl_draws <- round(mu + vr, 0)
index <- seq_len(n)
sample_index <- sample(index, sample_size + addl_draws, replace = TRUE)
qualified_sample_index <- sample_index[which(diff(sample_index) != 0)]
qualified_sample_index <- qualified_sample_index[1:sample_size]
# In the very unlikely event the number of qualified samples < sample size,
# NA's will fill the vector. This will print those N/A's
print(which(is.na(qualified_sample_index) == TRUE))
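Yet another option, not from the answers above, avoids both the loop and the over-sampling: pick a random starting position, then move by a random non-zero offset at every step and wrap around. Given the previous value, the next one is then uniform over all the other values. This is a sketch with a hypothetical function name, and it assumes sample_from holds at least two distinct values.

```r
# Sketch: pick a random start, then move by a non-zero offset at every step.
# `random_non_consecutive_vec` is an illustrative name; it assumes that
# sample_from holds at least two distinct values.
random_non_consecutive_vec <- function(n = 10, sample_from = seq(1, 5)) {
  vals <- unique(sample_from)
  k <- length(vals)
  stopifnot(n >= 1, k >= 2)
  start <- sample.int(k, 1)
  # offsets in 1..(k-1) are never 0 modulo k, so two neighbours always differ
  steps <- sample.int(k - 1, n - 1, replace = TRUE)
  # wrap the running positions back into 1..k
  idx <- (cumsum(c(start, steps)) - 1) %% k + 1
  vals[idx]
}

set.seed(1)
y <- random_non_consecutive_vec(20, c(2, 4, 6, 8))
any(diff(y) == 0)  # FALSE
```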
I would like to generate N random positive integers that sum to M. I would like the random positive integers to be selected around a fairly normal distribution whose mean is M/N, with a small standard deviation (is it possible to set this as a constraint?).
Finally, how would you generalize the answer to generate N random positive numbers (not just integers)?
I found other relevant questions, but couldn't determine how to apply their answers to this context:
https://stats.stackexchange.com/questions/59096/generate-three-random-numbers-that-sum-to-1-in-r
Generate 3 random number that sum to 1 in R
R - random approximate normal distribution of integers with predefined total
Normalize.
rand_vect <- function(N, M, sd = 1, pos.only = TRUE) {
vec <- rnorm(N, M/N, sd)
if (abs(sum(vec)) < 0.01) vec <- vec + 1
vec <- round(vec / sum(vec) * M)
deviation <- M - sum(vec)
for (. in seq_len(abs(deviation))) {
vec[i] <- vec[i <- sample(N, 1)] + sign(deviation)
}
if (pos.only) while (any(vec < 0)) {
negs <- vec < 0
pos <- vec > 0
vec[negs][i] <- vec[negs][i <- sample(sum(negs), 1)] + 1
vec[pos][i] <- vec[pos ][i <- sample(sum(pos ), 1)] - 1
}
vec
}
For a continuous version, simply use:
rand_vect_cont <- function(N, M, sd = 1) {
vec <- rnorm(N, M/N, sd)
vec / sum(vec) * M
}
Examples
rand_vect(3, 50)
# [1] 17 16 17
rand_vect(10, 10, pos.only = FALSE)
# [1] 0 2 3 2 0 0 -1 2 1 1
rand_vect(10, 5, pos.only = TRUE)
# [1] 0 0 0 0 2 0 0 1 2 0
rand_vect_cont(3, 10)
# [1] 2.832636 3.722558 3.444806
rand_vect(10, -1, pos.only = FALSE)
# [1] -1 -1 1 -2 2 1 1 0 -1 -1
Just came up with an algorithm to generate N random numbers greater or equal to k whose sum is S, in an uniformly distributed manner. I hope it will be of use here!
First, generate N-1 random numbers between k and S - k(N-1), inclusive. Sort them in descending order. Then, for all x_i with i <= N-2, apply x'_i = x_i - x_{i+1} + k, and set x'_{N-1} = x_{N-1} (use two buffers). The Nth number is just S minus the sum of all the obtained quantities. This has the advantage of giving the same probability to all possible combinations. If you want positive integers, use k = 1 (k = 0 would also allow zeros). If you want reals, use the same method with a continuous RNG. If your numbers are to be integers, you may care about whether or not they can be equal to k. Best wishes!
Explanation: by taking out one of the numbers, all the combinations of values which allow a valid Nth number form a simplex when represented in (N-1)-space, which lies at one vertex of an (N-1)-cube (the (N-1)-cube described by the range of the random values). After generating them, we have to map all points in the (N-1)-cube to points in the simplex. For that purpose, I have used a method of triangulation which involves all possible permutations of coordinates in descending order. By sorting the values, we are mapping all (N-1)! simplices to only one of them. We also have to translate and scale the vector of numbers so that all coordinates lie in [0, 1], by subtracting k and dividing the result by S - kN. Let us name the new coordinates y_i.
Then we apply the transformation by multiplying the inverse matrix of the original basis, something like this:
$$B = \begin{pmatrix} 1 & 1 & 1 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{pmatrix}, \quad B^{-1} = \begin{pmatrix} 1 & -1 & 0 \\ 0 & 1 & -1 \\ 0 & 0 & 1 \end{pmatrix}, \quad Y' = B^{-1} Y$$
Which gives y'_i = y_i - y_{i+1}. When we rescale the coordinates, we get:
x'_i = y'_i(S - kN) + k = y_i(S - kN) - y_{i+1}(S - kN) + k = (x_i - k) - (x_{i+1} - k) + k = x_i - x_{i+1} + k, hence the above formula. This is applied to all elements except the last one.
Finally, we should take into account the distortion that this transformation introduces into the probability distribution. Actually, and please correct me if I'm wrong, the transformation applied to the first simplex to obtain the second should not alter the probability distribution. Here is the proof.
The probability increase at any point is the increase in the volume of a local region around that point as the size of the region tends to zero, divided by the total volume increase of the simplex. In this case, the two volumes are the same (just take the determinants of the basis vectors). The probability distribution will be the same if the linear increase of the region volume is always equal to 1. We can calculate it as the determinant of the transpose matrix of the derivative of a transformed vector V' = B^-1 V with respect to V, which, of course, is B^-1.
Calculation of this determinant is quite straightforward, and it gives 1, which means that the points are not distorted in any way that would make some of them more likely to appear than others.
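For the real-valued case, the steps above can be sketched in R as follows. The name rand_sum_min is hypothetical, and the code only follows the recipe as described; it does not by itself verify the uniformity claim.

```r
# Sketch of the steps described above for real-valued output.
# `rand_sum_min` is an illustrative name.
rand_sum_min <- function(N, S, k = 0) {
  stopifnot(N >= 2, S >= k * N)
  # N - 1 uniform draws in [k, S - k(N - 1)], sorted in descending order
  u <- sort(runif(N - 1, min = k, max = S - k * (N - 1)), decreasing = TRUE)
  # x'_i = x_i - x_{i+1} + k for i <= N - 2;
  # appending k makes the last term reduce to x'_{N-1} = x_{N-1}
  first <- u - c(u[-1], k) + k
  # the Nth number is whatever is left
  c(first, S - sum(first))
}

set.seed(1)
v <- rand_sum_min(5, 100, k = 2)
sum(v)  # 100, up to floating-point error
min(v)  # not below k = 2, up to floating-point error
```

The first N-1 values telescope to x_1 + k(N-2), and x_1 is at most S - k(N-1), which is why the last value cannot drop below k.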
I figured out what I believe to be a much simpler solution. You first generate random integers from your minimum to maximum range, count them up and then make a vector of the counts (including zeros).
Note that this solution may include zeros even if the minimum value is greater than zero.
Hope this helps future r people with this problem :)
rand.vect.with.total <- function(min, max, total) {
  # generate random numbers
  x <- sample(min:max, total, replace=TRUE)
  # count occurrences of every value in min:max, including values never drawn;
  # fixing the factor levels keeps the counts aligned even when min is not 1
  out <- as.vector(table(factor(x, levels = min:max)))
  return(out)
}
rand.vect.with.total(0, 3, 5)
# [1] 3 1 1 0
rand.vect.with.total(1, 5, 10)
#[1] 4 1 3 0 2
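Counting how often each value is drawn is a multinomial draw with equal probabilities, so rmultinom can produce the counts directly. If zeros are not acceptable, one hedged workaround is to give every element a fixed minimum first and distribute only the remainder. In this sketch rand_counts, n_bins and min_each are hypothetical names:

```r
# Sketch: give every bin `min_each`, then spread the rest with one multinomial draw.
# `rand_counts`, `n_bins` and `min_each` are illustrative names.
rand_counts <- function(n_bins, total, min_each = 0) {
  rest <- total - n_bins * min_each
  stopifnot(rest >= 0)
  # rmultinom returns an n_bins x 1 matrix of counts that sum to `rest`
  extra <- rmultinom(1, size = rest, prob = rep(1, n_bins))
  min_each + as.vector(extra)
}

set.seed(1)
v <- rand_counts(5, 10, min_each = 1)
sum(v)       # 10
all(v >= 1)  # TRUE
```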
We have a big for loop in R that simulates various data. In some iterations the generated data make a certain quantity equal to 0 inside the loop, which is not desirable, so we would like to skip that data-generation step. At the same time, we need to add one iteration for each skipped step; otherwise we end up with fewer observations than required.
For example, when running the following code, we get z=0 in iterations 1, 8 and 9.
rm(list=ls())
n <- 10
z <- NULL
for(i in 1:n){
set.seed(i)
a <- rbinom(1,1,0.5)
b <- rbinom(1,1,0.5)
z[i] <- a+b
}
z
[1] 0 1 1 1 1 2 1 0 0 1
We desire to skip these steps so that we do not have any z=0 but we also want a vector z of length 10. It may be done in many ways. But what I particularly want to see is how we can stop the iteration and skip the current step when z=0 is encountered and go to the next step, ultimately obtaining 10 observations for z.
Normally we do this via a while loop, as the number of iterations required is unknown beforehand.
n <- 10L
z <- integer(n)
m <- 1L; i <- 0L
while (m <= n) {
set.seed(i)
z_i <- sum(rbinom(2L, 1, 0.5))
if (z_i > 0L) {z[m] <- z_i; m <- m + 1L}
i <- i + 1L
}
Output:
z
# [1] 1 1 1 1 1 2 1 1 1 1
i
# [1] 14
So we sample 14 times; 4 of the draws are 0 and the remaining 10 are retained.
More efficient vectorized method
set.seed(0)
n <- 10L
z <- rbinom(n, 1, 0.5) + rbinom(n, 1, 0.5)
m <- length(z <- z[z > 0L]) ## filtered samples
p <- m / n ## estimated success probability
k <- round(1.5 * (n - m) / p) ## further number of samples to ensure successful (n - m) non-zero samples
z_more <- rbinom(k, 1, 0.5) + rbinom(k, 1, 0.5)
z <- c(z, z_more[which(z_more > 0)[seq_len(n - m)]])
Some probability theory about the geometric distribution is used here. Initially we draw n samples, m of which are retained, so the estimated probability of accepting a sample is p <- m/n. According to the theory of the geometric distribution, we need 1/p samples on average to observe one success. Therefore, we should draw at least (n-m)/p more samples to expect (n-m) successes. The 1.5 is just an inflation factor: by drawing 1.5 times as many samples, we can hopefully ensure (n-m) successes.
According to the law of large numbers, the estimate of p is more precise when n is large. Therefore, this approach is stable for large n.
If you feel that 1.5 is not large enough, use 2 or 3. But my feeling is that it is sufficient.
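If "hopefully" is not good enough, the vectorized draw can be put inside a loop that tops up until n non-zero values have been collected. This is a sketch with hypothetical names (sample_nonzero, inflate); it uses the fact that for this particular example the acceptance probability is known, P(a + b > 0) = 1 - 0.25 = 0.75, instead of estimating it.

```r
# Sketch: draw in batches until n non-zero values are available.
# `sample_nonzero` and `inflate` are illustrative names.
sample_nonzero <- function(n, inflate = 1.5) {
  z <- integer(0)
  while (length(z) < n) {
    need <- n - length(z)
    # P(a + b > 0) = 0.75 for two fair Bernoulli draws
    k <- ceiling(inflate * need / 0.75)
    draw <- rbinom(k, 1, 0.5) + rbinom(k, 1, 0.5)
    z <- c(z, draw[draw > 0L])  # keep only the accepted values
  }
  z[seq_len(n)]  # drop any surplus
}

set.seed(0)
z <- sample_nonzero(10)
length(z)  # 10
all(z > 0) # TRUE
```

With the inflation factor the loop body usually runs once, but the loop guarantees the length even when a batch comes up short.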