Identify positions at which a vector of cumulative sums exceeds - r

I have a named vector of cumulative sums:
x <- sort(runif(20, 1, 10), decreasing = TRUE)
names(x) <- LETTERS[1:20]
cumsums <- cumsum(x)
head(cumsums)
A B C D E F
9.902633 19.240766 28.531703 37.537920 46.065978 54.380480
How can I identify the first position at which cumsums exceeds each of several defined thresholds (e.g. 25, 50, 75, 90)?

For a single threshold at a time, the following should work:
which(cumsums > 25)[1]
Unlike a which.max-based solution (e.g. which.max(cumsums > 25), which returns 1 when no element is TRUE), it returns NA if no element of cumsums is greater than the threshold.
Of course, if your vector is very large or you need to look for multiple thresholds simultaneously, this may not be the most efficient solution.
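For several thresholds at once, the same idea can be wrapped in sapply. This is only a minimal sketch on a small hand-made vector; the values and the names thresholds and first_exceed are illustrative and not taken from the question.

```r
# Small, deterministic stand-in for the cumulative sums in the question
cumsums <- cumsum(c(A = 10, B = 9, C = 8, D = 7, E = 6))  # 10 19 27 34 40
thresholds <- c(25, 30, 50)

# For each threshold, take the first index whose cumulative sum exceeds it;
# which(...)[1] is NA when no element qualifies
first_exceed <- sapply(thresholds, function(th) which(cumsums > th)[1])
first_exceed <- unname(first_exceed)
first_exceed
# [1]  3  4 NA
```

Because cumulative sums of positive values are already sorted, findInterval(thresholds, cumsums) + 1 should give the same positions in one vectorized call, with the caveat that it returns length(cumsums) + 1 instead of NA when no element exceeds the threshold.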

Related

R: Accessing elements of 2D matrix with vectors of indices

Suppose I have a 3 X 15 matrix "phi", and I want to create a vector of entries from phi corresponding to an i,j combination, where i is a length 900 vector of numbers 1:3, and j is a length 900 vector of numbers 1:15. In other words, I want a length 900 vector of phi values, where the first element is phi[i[1], j[1]], the second element would be phi[i[2], j[2]], etc.
My initial thought was phi_list <- phi[i, j], but that appears to give back every combination of i,j values. So, how would I go about constructing such a vector?
Thanks for any help!
In this case, we can index with a two-column matrix, with i as the row index and j as the column index:
phi[cbind(i, j)]
#[1] 6 18 35
If we instead pass the vectors i and j as separate row and column indices (phi[i, j]), R returns a matrix made up of every selected row crossed with every selected column, instead of picking the elements at the matching (i, j) locations.
data
set.seed(24)
phi <- matrix(1:50, 5, 10)
i <- c(1, 3, 5)
j <- c(2, 4, 7)
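To make the difference concrete, here is a small sketch using the same data. The names picked, block and linear are just for illustration; the linear-index variant is an alternative I am adding, not part of the answer above.

```r
phi <- matrix(1:50, 5, 10)
i <- c(1, 3, 5)
j <- c(2, 4, 7)

# Matrix indexing: one element per (i[k], j[k]) pair
picked <- phi[cbind(i, j)]

# Separate row/column indexing: the full 3 x 3 cross product of rows and columns
block <- phi[i, j]

# The wanted elements sit on the diagonal of that cross product
stopifnot(identical(picked, diag(block)))

# Equivalent linear index into the column-major storage
linear <- phi[(j - 1) * nrow(phi) + i]
stopifnot(identical(picked, linear))
```

With vectors of length 900, phi[i, j] would build a 900 x 900 matrix only to use its diagonal, so the cbind form is also the cheaper one.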

How to set rowSums(x) to equal some number, as opposed to colSums(x)

I have seen functions where colSums(x) can be set to equal some number when creating a matrix, in R, but I am wondering how to do the same with rowSums(x)? Specifically, I am trying to create a data table (matrix?) with 30 rows and 4 columns, where the sum of the values in each row is less than or equal to 100.
I have used replicate with runif.
x <- replicate(4, diff(c(0, sort(runif(30)), 2)))
I get a matrix where the colSums(x) is 2, but the rowSums(x) is not set to equal 100.
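One possible way to get what is described, sketched under the assumption that non-negative values with row sums of at most 100 are wanted: apply the same sort/diff trick once per row rather than once per column, and scale by 100. The helper name row_parts and the seed are my own and purely illustrative.

```r
set.seed(42)

# Split the interval [0, max(cuts)] into k non-negative pieces and scale them;
# the pieces sum to total * max(cuts), which is below total
row_parts <- function(k, total) {
  cuts <- sort(runif(k))
  diff(c(0, cuts)) * total
}

# replicate() returns one column per draw, so transpose to get 30 rows x 4 columns
x <- t(replicate(30, row_parts(4, 100)))

dim(x)
all(rowSums(x) <= 100)
```

If the rows should sum to exactly 100 instead, dividing each row by its sum and multiplying by 100 (x / rowSums(x) * 100) is a simple alternative.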

find all unique combinations of n numbers between 1 and k

I want a list of all possible sets of five (or n) numbers between 1 and 63 (or more generalizably 1 and k)
If computing time wasn't an issue, I could do something like
#Get all combinations of numbers between 1 and 63
indexCombinations <- expand.grid(1:63, 1:63, 1:63, 1:63, 1:63)
#Throw out the rows that have more than one of the same number in them
allDifferent <- apply(indexCombinations, 1, function(x){
length(x) == length(unique(x))
} # function
) # apply
indexCombinationsValid <- indexCombinations[allDifferent,]
# And then just take the unique values
indexCombinationsValidUnique <- unique(indexCombinationsValid)
I am concerned that finding the unique values is going to be prohibitively slow. Furthermore, I end up creating a bunch of rows that I never use. I was wondering if anyone has a more elegant and efficient way of getting a data frame or matrix of the unique combinations of five numbers (or n numbers) between one and some upper limit.
Credit to @SymbolixAU for a very elegant solution, which I re-post here as an answer:
n <- 1:63; x <- combn(n, m = 5)
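A scaled-down sketch of the same call shows the shape of the result. combn returns one combination per column, so it is transposed here to get one per row; k, n and combos are illustrative names.

```r
k <- 6  # upper limit (63 in the question)
n <- 3  # size of each set (5 in the question)

# combn(k, n) enumerates all n-element subsets of 1:k, one per column
combos <- t(combn(k, n))

head(combos, 3)
nrow(combos) == choose(k, n)  # 20 combinations for k = 6, n = 3
```

For the original sizes, choose(63, 5) is 7,028,847, so the full result has about 7 million combinations; that is large but far smaller than the 63^5 rows produced by expand.grid.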

create an incidence matrix with restrictions in r (i.graph)

I would like to create a (N*M)-Incidence Matrix for a bipartite graph (N=M=200).
However, the following restrictions have to be considered:
each column i (1, ..., 200) has a column sum of g = 10
each row has a row sum of h = 10
no multi-edges (the values in the incidence matrix only take on the values 0 or 1)
So far I have
M <- 200 # number of rows
N <- 200 # number of columns
g <- 10
I <- matrix(sample(0:1, M*N, replace = TRUE, prob = c(1-g/N, g/N)), M, N)
Does anybody have a solution?
Here's one way to do what you want. First the algorithm idea, then its implementation in R.
Two step Algorithm Idea
You want a matrix of 0's and 1's, with each row adding up to be 10, and each column adding up to be 10.
Step 1: First, create a trivial solution as follows:
The first 10 rows have 1's for the first 10 elements, then 190 zeros.
The second set of ten rows has 1's from the 11th to the 20th element, and so on.
In other words, a feasible solution is a 200x200 matrix of all 0's, with dense 10x10 blocks of 1's embedded along the diagonal, 20 times.
Step 2: Shuffle entire rows and entire columns.
Permuting whole rows or whole columns only reorders the entries, so every row sum and every column sum is maintained.
Implementation in R
I use a smaller 16x16 matrix to demonstrate. In this case, let's say we want each row and each column to add up to 4. (The row/column sum has to divide the dimension of the full square matrix evenly.)
n <- 4 #size of the smaller square
i <- c(1,1,1,1) #dense matrix of 1's
z <- c(0,0,0,0) #dense matrix of 0's
#create a feasible solution to start with:
m <- matrix(c(rep(c(i,z,z,z),n),
              rep(c(z,i,z,z),n),
              rep(c(z,z,i,z),n),
              rep(c(z,z,z,i),n)), 16,16)
#shuffle (Run the two lines following as many times as you like)
m <- m[sample(16), ] #shuffle rows
m <- m[ ,sample(16)] #shuffle columns
#verify that the sum conditions are not violated
colSums(m); rowSums(m)
#solution
print(m)
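The same two steps can be scaled up to the 200x200 case from the question. This is a sketch, not the only way to build the starting matrix: it uses base R's kronecker and diag to lay out the diagonal blocks, and the seed and the name blocks are my own additions.

```r
set.seed(1)
N <- 200  # rows and columns
g <- 10   # required row and column sum; must divide N
blocks <- N / g

# Step 1: block-diagonal start, 20 dense g x g blocks of 1's
m <- kronecker(diag(blocks), matrix(1, g, g))

# Step 2: shuffle whole rows, then whole columns
m <- m[sample(N), ]
m <- m[, sample(N)]

# Every row and column still sums to g, and entries are only 0 or 1
stopifnot(all(rowSums(m) == g), all(colSums(m) == g), all(m %in% c(0, 1)))
```

Note that shuffling only reaches matrices that are row/column permutations of the block-diagonal start, so this does not sample from all valid incidence matrices.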
Hope that helps you move forward with your bipartite igraph.

R: smallest distance between an element of vector a and an element of vector b

a and b are two vectors of real numbers.
They do not necessarily have the same length.
The distance between the ith element of a and the jth element of b is defined as abs(a[i] - b[j])
How would you compute the smallest distance between any element of a and any element of b without explicit loops?
Here is what I did: min(sapply(X=1:length(b), FUN=function(x) abs(a - b[x]))).
However, I have the feeling there is something better to do...
I'd use the dist function to create a distance matrix, and then find the minimum distance in that. This is probably much faster than an explicit loop in R (including sapply).
a <- runif(23)
b <- runif(10)
# distances between all elements of c(a, b); keep only the a-versus-b block
d_matrix <- as.matrix(dist(c(a, b)))[seq_along(a), length(a) + seq_along(b)]
min(d_matrix)
Note that dist also computes the distances within a and within b, which are then thrown away. So this is probably not optimal, but for vectors of moderate size it is still much faster than an explicit loop.
And to find which pair of elements has this distance (rows index a, columns index b):
which(d_matrix == min(d_matrix), arr.ind = TRUE)
Here's an attempt:
a <- c(9,5,6); b <- c(6,9)
# a
#[1] 9 5 6
# b
#[1] 6 9
combos <- sapply(b,function(x) abs(x-a))
# or an alternative
combos <- abs(outer(a,b,FUN="-"))
You could then get the minimum distance with:
min(combos)
If you wanted to get the respective indexes of the minimum values you could do:
which(combos==min(combos),arr.ind=TRUE)
# each matrix row has the 2 indexes for the minimums
# first column is 'a' index, second is 'b' index
# row col
# [1,] 3 1
# [2,] 1 2
One-liner should work here: min(abs(outer(a, b, "-")))
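The outer approach builds a length(a) x length(b) matrix, which can get large for long vectors. As a hedged alternative (a different technique from the answers above), one can sort b and look only at the nearest neighbours of each element of a with findInterval. The function name min_dist_sorted is hypothetical.

```r
# Smallest |a[i] - b[j]| without building the full distance matrix
min_dist_sorted <- function(a, b) {
  b_sorted <- sort(b)
  # pos[k] = number of b values <= a[k] (0 if a[k] is below all of b)
  pos <- findInterval(a, b_sorted)
  # nearest b value at or below, and nearest above, clamped to valid indices
  lower <- b_sorted[pmax(pos, 1)]
  upper <- b_sorted[pmin(pos + 1, length(b_sorted))]
  min(pmin(abs(a - lower), abs(a - upper)))
}

a <- c(9, 5, 6); b <- c(6, 9)
min_dist_sorted(a, b)
# [1] 0
```

On the small example this agrees with min(abs(outer(a, b, "-"))); the trade-off is one sort of b instead of a full matrix, at the cost of not returning the indices of the closest pair directly.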
