How to get row & column subscripts of matched elements from a distance matrix

I have an integer vector vec1 and I am generating a distance matrix from it using the dist function. I want to get the coordinates (row and column) of elements with a certain value in the distance matrix; essentially, the pairs of elements that are d apart. For example:
vec1 <- c(2,3,6,12,17)
distMatrix <- dist(vec1)
#    1  2  3  4
#2   1
#3   4  3
#4  10  9  6
#5  15 14 11  5
Say I am interested in the pairs of elements that are 5 units apart. I want coord1 to hold the rows and coord2 the columns of the matching entries of the distance matrix. In this toy example, I would expect
coord1
# [1] 5
coord2
# [1] 4
I am wondering if there is an efficient way to get these values that doesn't involve converting the dist object to a matrix or looping through the matrix?

A distance matrix is a lower triangular matrix in packed format: the lower triangle is stored column by column in a 1D vector. You can check this via
str(distMatrix)
# Class 'dist' atomic [1:10] 1 4 10 15 3 9 14 6 11 5
# ...
Even if we call dist(vec1, diag = TRUE, upper = TRUE), the underlying vector is still the same; only the printing style changes. That is, no matter how you call dist, you always get a vector.
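A quick check of the packed layout (the element count and the last element follow from the str output above):
length(distMatrix)
# [1] 10    # n * (n - 1) / 2 elements for n = 5
distMatrix[10]
# [1] 5     # the single element of the last column of the lower triangle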
This answer focuses on how to transform between the 1D and 2D indices, so that you can work with a "dist" object without first expanding it to a complete matrix using as.matrix. If you do want to make it a matrix, use the dist2mat function defined in as.matrix on a distance object is extremely slow; how to make it faster?.
R functions
It is easy to write vectorized R functions for these index transforms. We only need some care with "out-of-bounds" indices, for which NA should be returned.
## 2D index to 1D index
f <- function (i, j, dist_obj) {
  if (!inherits(dist_obj, "dist")) stop("please provide a 'dist' object")
  n <- attr(dist_obj, "Size")
  valid <- (i >= 1) & (j >= 1) & (i > j) & (i <= n) & (j <= n)
  k <- (2 * n - j) * (j - 1) / 2 + (i - j)
  k[!valid] <- NA_real_
  k
}
## 1D index to 2D index
finv <- function (k, dist_obj) {
  if (!inherits(dist_obj, "dist")) stop("please provide a 'dist' object")
  n <- attr(dist_obj, "Size")
  valid <- (k >= 1) & (k <= n * (n - 1) / 2)
  k_valid <- k[valid]
  j <- rep.int(NA_real_, length(k))
  j[valid] <- floor(((2 * n + 1) - sqrt((2 * n - 1) ^ 2 - 8 * (k_valid - 1))) / 2)
  i <- j + k - (2 * n - j) * (j - 1) / 2
  cbind(i, j)
}
These functions are extremely cheap in memory usage, as they work with indices instead of matrices.
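For example, element (5, 4) of the lower triangle should map to the 10th (and last) element of the packed vector, and finv should invert f:
f(5, 4, distMatrix)
# [1] 10
finv(f(5, 4, distMatrix), distMatrix)
#      i j
# [1,] 5 4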
Applying finv to your question
You can use
vec1 <- c(2,3,6,12,17)
distMatrix <- dist(vec1)
finv(which(distMatrix == 5), distMatrix)
# i j
#[1,] 5 4
Generally speaking, a distance matrix contains floating point numbers, and it is risky to use == to test whether two floating point numbers are equal. Read Why are these numbers not equal? for background and possible strategies.
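For example, comparing with a tolerance instead of == (the tolerance 1e-8 is an arbitrary choice here):
finv(which(abs(distMatrix - 5) < 1e-8), distMatrix)
#      i j
# [1,] 5 4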
Alternative with dist2mat
Using the dist2mat function given in as.matrix on a distance object is extremely slow; how to make it faster?, we may use which(..., arr.ind = TRUE).
library(Rcpp)
sourceCpp("dist2mat.cpp")
mat <- dist2mat(distMatrix, 128)
which(mat == 5, arr.ind = TRUE)
# row col
#5 5 4
#4 4 5
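Since the full matrix is symmetric, each pair shows up twice; keeping only the hits with row > col drops the mirrored duplicates:
hits <- which(mat == 5, arr.ind = TRUE)
hits[hits[, "row"] > hits[, "col"], , drop = FALSE]
# row col
#5 5 4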
Appendix: Markdown (needs MathJax support) for the picture
## 2D index to 1D index
The lower triangular looks like this: $$\begin{pmatrix} 0 & 0 & \cdots & 0\\ \times & 0 & \cdots & 0\\ \times & \times & \cdots & 0\\ \vdots & \vdots & \ddots & 0\\ \times & \times & \cdots & 0\end{pmatrix}$$ If the matrix is $n \times n$, then there are $(n - 1)$ elements ("$\times$") in the 1st column, and $(n - j)$ elements in the j<sup>th</sup> column. Thus, for element $(i,\ j)$ (with $i > j$, $j < n$) in the lower triangular, there are $$(n - 1) + \cdots + (n - (j - 1)) = \frac{(2n - j)(j - 1)}{2}$$ "$\times$" in the previous $(j - 1)$ columns, and it is the $(i - j)$<sup>th</sup> "$\times$" in the $j$<sup>th</sup> column. So it is the $$\left\{\frac{(2n - j)(j - 1)}{2} + (i - j)\right\}^{\textit{th}}$$ "$\times$" in the lower triangular.
----
## 1D index to 2D index
Now for the $k$<sup>th</sup> "$\times$" in the lower triangular, how can we find its matrix index $(i,\ j)$? We take two steps: (1) find $j$; (2) obtain $i$ from $k$ and $j$.
The first "$\times$" of the $j$<sup>th</sup> column, i.e., $(j + 1,\ j)$, is the $\left\{\frac{(2n - j)(j - 1)}{2} + 1\right\}^{\textit{th}}$ "$\times$" of the lower triangular, thus $j$ is the maximum value such that $\frac{(2n - j)(j - 1)}{2} + 1 \leq k$. This is equivalent to finding the max $j$ so that $$j^2 - (2n + 1)j + 2(k + n - 1) \geq 0.$$ The LHS is a quadratic polynomial, and it is easy to see that the solution is the integer no larger than its first root (i.e., the root on the left side): $$j = \left\lfloor\frac{(2n + 1) - \sqrt{(2n-1)^2 - 8(k-1)}}{2}\right\rfloor.$$ Then $i$ can be obtained from $$i = j + k - \left\{\frac{(2n - j)(j - 1)}{2}\right\}.$$

If the vector is not too large, the best way is probably to wrap the output of dist in as.matrix and to use which with the option arr.ind = TRUE. The only disadvantage of this standard method of retrieving index numbers from a dist matrix is increased memory usage, which may become important for very large vectors passed to dist: converting the packed lower triangular matrix returned by dist into a regular, dense matrix effectively doubles the amount of stored data.
An alternative is to convert the dist object into a list, such that each column of the lower triangular matrix underlying dist becomes one member of the list. The index of a list member and the position of an element within that member can then be mapped to the column and row numbers of the dense N x N matrix, without ever generating that matrix.
Here is one possible implementation of this list-based approach:
distToList <- function(x) {
  idx <- sum(seq(length(x) - 1)) - rev(cumsum(seq(length(x) - 1))) + 1
  listDist <- unname(split(dist(x), cumsum(seq_along(dist(x)) %in% idx)))
  # splitting technique: http://stackoverflow.com/a/16358095/4770166
  listDist
}
findDistPairs <- function(vec, theDist) {
  listDist <- distToList(vec)
  inList <- lapply(listDist, is.element, theDist)
  matchedCols <- which(sapply(inList, sum) > 0)
  found <- length(matchedCols) > 0
  if (found) {
    matchedRows <- sapply(matchedCols, function(x) which(inList[[x]]) + x)
  } else {
    matchedRows <- integer(length = 0)
  }
  matches <- cbind(col = rep(matchedCols, sapply(matchedRows, length)),
                   row = unlist(matchedRows))
  return(matches)
}
vec1 <- c(2, 3, 6, 12, 17)
findDistPairs(vec1, 5)
# col row
#[1,] 4 5
The parts of the code that might be somewhat unclear concern the mapping of positions within the list to column / row numbers of the N x N matrix: the list index corresponds to the column number, and the row number is the position within the list member plus the column number.
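To see the mapping concretely, here is the list for the toy vector; member j holds column j of the lower triangle:
distToList(vec1)
#[[1]]
#[1]  1  4 10 15
#
#[[2]]
#[1]  3  9 14
#
#[[3]]
#[1]  6 11
#
#[[4]]
#[1] 5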
A comment in the code points to the StackOverflow answer that was used here to split a vector into a list. The loops (sapply, lapply) should be unproblematic in terms of performance, since their range is of order O(N). The memory usage is largely determined by the storage of the list, which holds the same data as the dist object and should therefore occupy a similar amount of memory.
The dist object is calculated and transformed into a list inside distToList(). Because of the dist calculation, which is required in any case, this function could be time-consuming for large vectors. If the goal is to find several pairs with different distance values, it may be better to calculate listDist only once for a given vector and to store the resulting list, e.g., in the global environment.
Long story short
The usual way to treat such problems is simple and fast:
distMatrix <- as.matrix(dist(vec1)) * lower.tri(diag(vec1))
which(distMatrix == 5, arr.ind = TRUE)
# row col
#5 5 4
I suggest using this method by default. More complicated solutions may become necessary in situations where memory limits are reached, i.e., in the case of very large vectors vec1. The list-based approach described above could then provide a remedy.

Related

Generate non-negative (or positive) random integers that sum to a fixed value

I would like to randomly assign positive integers to G groups, such that they sum up to V.
For example, if G = 3 and V = 21, valid results may be (7, 7, 7), (10, 6, 5), etc.
Is there a straightforward way to do this?
Editor's notice (from 李哲源):
If values are not restricted to integers, the problem is simple and has been addressed in Choosing n numbers with fixed sum.
For integers, there is a previous Q & A, Generate N random integers that sum to M in R, but it appears more complicated and is hard to follow. The loop-based solution over there is also not satisfying.
non-negative integers
Let n be sample size:
x <- rmultinom(n, V, rep.int(1 / G, G))
is a G x n matrix, where each column is a multinomial sample that sums up to V.
By passing rep.int(1 / G, G) to argument prob I assume that each group has equal probability of "success".
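For example, with the G = 3, V = 21 case from the question (each column is one sample, and every column sums to V by construction):
G <- 3; V <- 21
x <- rmultinom(5, V, rep.int(1 / G, G))
colSums(x)
# [1] 21 21 21 21 21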
positive integers
As Gregor mentions, a multinomial sample can contain 0. If such samples are undesired, they should be rejected. As a result, we sample from a truncated multinomial distribution.
In How to generate target number of samples from a distribution under a rejection criterion I suggested an "over-sampling" approach to achieve "vectorization" of truncated sampling. Simply put: knowing the acceptance probability, we can estimate the expected number of trials M until the first "success" (a sample free of zeros). If we draw, say, 1.25 * M samples, there will almost surely be at least one "success" among them, and we randomly return one as the output.
The following function implements this idea to generate truncated multinomial samples without 0.
positive_rmultinom <- function (n, V, prob) {
  ## input validation
  G <- length(prob)
  if (G > V) stop("'G > V' causes 0 in a sample for sure!")
  if (any(prob < 0)) stop("'prob' can not contain negative values!")
  ## normalization
  sum_prob <- sum(prob)
  if (sum_prob != 1) prob <- prob / sum_prob
  ## minimal probability
  min_prob <- min(prob)
  ## expected number of trials to get a "success" on the group with min_prob
  M <- round(1.25 * 1 / min_prob)
  ## sampling
  N <- n * M
  x <- rmultinom(N, V, prob)
  keep <- which(colSums(x == 0) == 0)
  x[, sample(keep, n)]
}
Now let's try
V <- 76
prob <- c(53, 13, 9, 1)
Directly drawing samples with rmultinom can occasionally produce samples that contain 0:
## number of samples that contain 0 in 1000 trials
sum(colSums(rmultinom(1000, V, prob) == 0) > 0)
#[1] 355 ## or some other value greater than 0
But there is no such issue by using positive_rmultinom:
## number of samples that contain 0 in 1000 trials
sum(colSums(positive_rmultinom(1000, V, prob) == 0) > 0)
#[1] 0
There is probably a less expensive way, but this seems to work. Note that expand.grid enumerates all V^G combinations (9261 for G = 3, V = 21), so this brute-force approach only scales to small G and V.
G <- 3
V <- 21
m <- data.frame(matrix(rep(1:V, G), V, G))
tmp <- expand.grid(m)                  # all possibilities
out <- tmp[which(rowSums(tmp) == V), ] # pluck those that sum to 'V'
out[sample(1:nrow(out), 1), ]          # randomly select one row
Not sure how to do this with runif.
I figured out what I believe to be a much simpler solution. You first generate random integers from your minimum to maximum range, count them up and then make a vector of the counts (including zeros).
Note that this solution may include zeros even if the minimum value is greater than zero.
Hope this helps future r people with this problem :)
rand.vect.with.total <- function(min, max, total) {
  # generate random numbers
  x <- sample(min:max, total, replace = TRUE)
  # count numbers
  sum.x <- table(x)
  # convert counts to positions, looking up each value by name rather than
  # by loop index so that ranges not starting at 1 are counted correctly
  vals <- min:max
  out <- vector()
  for (i in seq_along(vals)) {
    out[i] <- sum.x[as.character(vals[i])]
  }
  out[is.na(out)] <- 0
  return(out)
}
rand.vect.with.total(0, 3, 5)
# [1] 3 1 1 0
rand.vect.with.total(1, 5, 10)
#[1] 4 1 3 0 2
Note, I also posted this here Generate N random integers that sum to M in R, but this answer is relevant to both questions.

split a numeric value into unequal size but summation of them is equal to total value [duplicate]

I would like to generate N random positive integers that sum to M. I would like the random positive integers to be selected around a fairly normal distribution whose mean is M/N, with a small standard deviation (is it possible to set this as a constraint?).
Finally, how would you generalize the answer to generate N random positive numbers (not just integers)?
I found other relevant questions, but couldn't determine how to apply their answers to this context:
https://stats.stackexchange.com/questions/59096/generate-three-random-numbers-that-sum-to-1-in-r
Generate 3 random number that sum to 1 in R
R - random approximate normal distribution of integers with predefined total
Normalize.
rand_vect <- function(N, M, sd = 1, pos.only = TRUE) {
  vec <- rnorm(N, M/N, sd)
  if (abs(sum(vec)) < 0.01) vec <- vec + 1  # avoid dividing by a near-zero sum
  vec <- round(vec / sum(vec) * M)
  deviation <- M - sum(vec)
  # nudge random elements by 1 until the total is exactly M
  # (note: `i <- sample(N, 1)` on the right picks the element to nudge)
  for (. in seq_len(abs(deviation))) {
    vec[i] <- vec[i <- sample(N, 1)] + sign(deviation)
  }
  if (pos.only) while (any(vec < 0)) {
    # move one unit from a random positive element to a random negative one
    negs <- vec < 0
    pos <- vec > 0
    vec[negs][i] <- vec[negs][i <- sample(sum(negs), 1)] + 1
    vec[pos][i] <- vec[pos][i <- sample(sum(pos), 1)] - 1
  }
  vec
}
For a continuous version, simply use:
rand_vect_cont <- function(N, M, sd = 1) {
  vec <- rnorm(N, M/N, sd)
  vec / sum(vec) * M
}
Examples
rand_vect(3, 50)
# [1] 17 16 17
rand_vect(10, 10, pos.only = FALSE)
# [1] 0 2 3 2 0 0 -1 2 1 1
rand_vect(10, 5, pos.only = TRUE)
# [1] 0 0 0 0 2 0 0 1 2 0
rand_vect_cont(3, 10)
# [1] 2.832636 3.722558 3.444806
rand_vect(10, -1, pos.only = FALSE)
# [1] -1 -1 1 -2 2 1 1 0 -1 -1
Just came up with an algorithm to generate N random numbers greater than or equal to k whose sum is S, in a uniformly distributed manner. I hope it will be of use here!
First, generate N-1 random numbers between k and S - k(N-1), inclusive. Sort them in descending order. Then, for all $x_i$ with $i \leq N-2$, apply $x'_i = x_i - x_{i+1} + k$, and $x'_{N-1} = x_{N-1}$ (use two buffers). The Nth number is just S minus the sum of all the obtained quantities. This has the advantage of giving the same probability to all possible combinations. If you want positive integers, k = 0 (or maybe 1?). If you want reals, use the same method with a continuous RNG. If your numbers are to be integer, you may care about whether they can or can't be equal to k. Best wishes!
Explanation: by taking out one of the numbers, all the combinations of values which allow a valid Nth number form a simplex when represented in (N-1)-space, lying at one vertex of an (N-1)-cube (the (N-1)-cube described by the range of the random values). After generating them, we have to map all points of the (N-1)-cube onto points of the simplex. For that purpose, I have used one method of triangulation which involves all possible permutations of coordinates in descending order. By sorting the values, we map all (N-1)! simplices onto just one of them. We also have to translate and scale the numbers vector so that all coordinates lie in [0, 1], by subtracting k and dividing the result by S - kN. Let us name the new coordinates $y_i$.
Then we apply the transformation by multiplying by the inverse matrix of the original basis, something like this:
$$B = \begin{pmatrix} 1 & 1 & 1\\ 0 & 1 & 1\\ 0 & 0 & 1 \end{pmatrix}, \qquad B^{-1} = \begin{pmatrix} 1 & -1 & 0\\ 0 & 1 & -1\\ 0 & 0 & 1 \end{pmatrix}, \qquad Y' = B^{-1} Y$$
This gives $y'_i = y_i - y_{i+1}$. When we rescale the coordinates, we get: $$x'_i = y'_i(S - kN) + k = y_i(S - kN) - y_{i+1}(S - kN) + k = (x_i - k) - (x_{i+1} - k) + k = x_i - x_{i+1} + k,$$ hence the above formula. This is applied to all elements except the last one.
Finally, we should take into account the distortion that this transformation introduces into the probability distribution. Actually, and please correct me if I'm wrong, the transformation applied to the first simplex to obtain the second should not alter the probability distribution. Here is the proof.
The probability increase at any point is the increase in the volume of a local region around that point as the size of the region tends to zero, divided by the total volume increase of the simplex. In this case, the two volumes are the same (just take the determinants of the basis vectors). The probability distribution will be the same if the linear increase of the region volume is always equal to 1. We can calculate it as the determinant of the transpose of the derivative of a transformed vector $V' = B^{-1}V$ with respect to $V$, which, of course, is $B^{-1}$.
Calculation of this determinant is quite straightforward, and it gives 1, which means that the points are not distorted in any way that would make some of them more likely to appear than others.
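Here is a minimal R sketch of this procedure for the continuous case (the function name rand_sum_unif is mine; for the integer version, draw with sample(k:(S - k * (N - 1)), N - 1, replace = TRUE) instead of runif):
rand_sum_unif <- function(N, S, k = 0) {
  stopifnot(N >= 2, S >= k * N)
  ## N - 1 draws in [k, S - k(N - 1)], sorted in descending order
  u <- sort(runif(N - 1, min = k, max = S - k * (N - 1)), decreasing = TRUE)
  ## successive differences plus k; the smallest draw is kept as-is
  x <- c(u[-(N - 1)] - u[-1] + k, u[N - 1])
  ## the Nth number takes up the remainder, so the total is exactly S
  c(x, S - sum(x))
}
sum(rand_sum_unif(5, 10))
# [1] 10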

If...else within a for loop

I am writing a function to perform bit inversion on each row of a binary matrix, based on a predefined value n. The value n determines the number of 1 bits each row of the matrix should have.
set.seed(123)
## generate a random 5 by 10 binary matrix
init <- t(replicate(5, {i <- sample(3:6, 1); sample(c(rep(1, i), rep(0, 10 - i)))}))
n <- 3
## init_1 is used to explain my problem (a single-row matrix)
init_1 <- t(replicate(1, {i <- sample(3:6, 1); sample(c(rep(1, i), rep(0, 10 - i)))}))
The bit_inversion function does a few things:
If the selected row has fewer 1's than n, it randomly selects difference indices among the 0's and inverts them (0 to 1).
Else if the selected row has more 1's than n, it randomly selects difference indices among the 1's and inverts them (1 to 0).
Else it does nothing (the row already has exactly n 1's).
Below is the function I implemented:
bit_inversion <- function(pop){
  for(i in 1:nrow(pop)){
    difference <- abs(sum(pop[i,]) - n)
    ## checking condition where there are more bits being turned on than n
    if(sum(pop[i,]) > n){
      ## determine position of 1's
      bit_position_1 <- sample(which(pop[i,]==1), difference)
      ## bit inversion
      for(j in 1:length(bit_position_1)){
        pop[bit_position_1[j]] <- abs(pop[i,][bit_position_1[j]] - 1)
      }
    } else if (sum(pop[i,]) < n){
      ## determine position of 0's
      bit_position_0 <- sample(which(pop[i,]==0), difference)
      ## bit inversion
      for(j in 1:length(bit_position_0)){
        pop[bit_position_0[j]] <- abs(pop[bit_position_0[j]] - 1)
      }
    }
  }
  return(pop)
}
Outcome:
call <- bit_inversion(init)
> rowSums(call) ## supposed to be all 3
[1] 3 4 5 4 3
But when using init_1 (a single row matrix), the function seems to work fine.
Outcome:
call_1 <- bit_inversion(init_1)
> rowSums(call_1)
[1] 3
Is there a mistake in my for and if...else loop?
Change this line in the 'j' for loop
pop[bit_position_1[j]] <- abs(pop[i,][bit_position_1[j]] - 1)
into
pop[i,bit_position_1[j]] <- abs(pop[i,][bit_position_1[j]] - 1)
You forgot the row index.
And, here is a more compact version of your for loop:
for(i in 1:nrow(pop)){
  difference <- abs(sum(pop[i,]) - n)
  logi <- sum(pop[i,]) > n
  pop[i, sample(which(pop[i,] == logi), difference)] <- !logi
}
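As a quick check, the corrected loop makes every row sum to n (a sketch reusing init and n from the question; the inverted positions are random, but the row sums are deterministic):
pop <- init
for(i in 1:nrow(pop)){
  difference <- abs(sum(pop[i,]) - n)
  logi <- sum(pop[i,]) > n
  pop[i, sample(which(pop[i,] == logi), difference)] <- !logi
}
rowSums(pop)
# [1] 3 3 3 3 3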

Is there a general algorithm to identify a numeric series?

I am looking for a general purpose algorithm to identify short numeric series from lists with a max length of a few hundred numbers. This will be used to identify series of masses from mass spectrometry (ms1) data.
For instance, given the following list, I would like to identify that 3 of these numbers fit the series N + 1, N +2, etc.
426.24 <= N
427.24 <= N + 1/x
371.10
428.24 <= N + 2/x
851.47
451.16
The series are all of the format: N, N+1/x, N+2/x, N+3/x, N+4/x, etc, where x is an integer (in the example x=1). I think this constraint makes the problem very tractable. Any suggestions for a quick/efficient way to tackle this in R?
This routine generates candidate series using x from 1 to 10 (you could increase it) and checks how many of their terms are contained in the original list of numbers.
N = c(426.24, 427.24, 371.1, 428.24, 851.24, 451.16)
N0 = N[1]
x = list(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
L = 20
Series = lapply(x, function(x){seq(from = N0, by = 1/x, length.out = L)})
countCoincidences = lapply(Series, function(x){sum(x %in% N)})
Result:
unlist(countCoincidences)
[1] 3 3 3 3 3 3 3 3 3 2
As you can see, x = 1 yields 3 coincidences, and the same goes for all x up to x = 9. Here you have to decide which x is the one you want.
Since you're looking for an arithmetic sequence, the difference k between consecutive terms is constant. Thus, you can loop over the vector and subtract each value from it. If a sequence is present, subtracting its second term from the vector will produce the values -k, 0, and k, so you can find the sequence by looking for matches between vector - value and its opposite, value - vector:
x <- c(426.24, 427.24, 371.1, 428.24, 851.47, 451.16)
unique(lapply(x, function(y){
  s <- (x - y) %in% (y - x)
  if(sum(s) > 1){x[s]}
}))
# [[1]]
# NULL
#
# [[2]]
# [1] 426.24 427.24 428.24
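Note that %in% tests exact floating-point equality; for real mass-spec data a tolerance-based variant may be safer (a sketch; the tolerance tol is an assumed parameter):
tol <- 1e-8
unique(lapply(x, function(y){
  d <- x - y
  # a value belongs to the sequence if its offset from y cancels, within
  # tolerance, against the offset of some other value
  s <- sapply(d, function(di) any(abs(di + d) < tol))
  if(sum(s) > 1){x[s]}
}))
# identical output here, but robust to rounding error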

