R: looking for a more efficient way of running rbinom() taking probabilities from a large matrix

I have a 20000x90 matrix M of probabilities ranging between 0 and 1. I want to use these probabilities to return a 20000x90 matrix of 1's and 0's (i.e. for each probability p in the matrix I want to run rbinom(1,1,p)).
This is my current solution which works:
apply(M,1:2,function(x) rbinom(1,1,x))
However, it is quite slow and I am wondering if there is a faster way of achieving the same thing.

rbinom is vectorised, so you can pass the vector (or matrix) of probabilities; you just need to change n (the number of observations) from 1 to the number of values in M.
An example:
# with loop
set.seed(74309281)
M = matrix(runif(20000*90), nr=20000, nc=90)
a1 = apply(M,1:2,function(x) rbinom(1,1,x))
# without loop
set.seed(74309281)
M = matrix(runif(20000*90), nr=20000, nc=90)
a2 = matrix(rbinom(length(M),1,M), nr=nrow(M), nc=ncol(M))
all.equal(a1, a2)
# [1] TRUE
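If you want to check the speed difference yourself, a rough, machine-dependent comparison (illustrative only):
system.time(apply(M, 1:2, function(x) rbinom(1, 1, x)))
system.time(matrix(rbinom(length(M), 1, M), nr=nrow(M), nc=ncol(M)))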

Related

How to calculate sum over term including rising factorial?

I am new to programming and R and would like to compute the following sum: the sum over i = 0, ..., k of (-1)^i * choose(k, i) * ((-i*sigma))_n, where (x)_n denotes the rising factorial (Pochhammer symbol).
I used pochMpfr() from the Rmpfr package for the rising factorial and a for loop in order to compute the sum.
B=rep(1,k+1)
for (i in 0:k) {
  B[(i+1)] = (-1)^i * choose(k,i) * pochMpfr((-i)*sigma, n)
}
sum(B)
Doing so, I get the results as a list (each element an mpfr object) and thus cannot compute the sum.
Is there a way to get the results directly as a matrix, or to convert the list to a vector containing only the relevant elements?
The solution is probably quite easy but I haven't found it while looking through the forums.
There is no need to use a for loop; this should work:
library(Rmpfr)
# You do not define these in your question,
# so I just take some arbitrary values
k <- 10
n <- 3
sigma <- 0.3
i <- 0:k
B <- (-1)^i *choose(k,i)*pochMpfr((-i)*sigma, n)
sum(B)
## 1 'mpfr' number of precision 159 bits
## [1] 6.2977401071861993597462780570563107354142915151e-14
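If you then need an ordinary double rather than an mpfr number (giving up the extra precision), converting the result with as.numeric() should work:
as.numeric(sum(B))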

The sum of the first n odd integers

I am trying to create a function that computes the sum of the first n odd integers, i.e. the summation from i=1 to n of (2i-1).
If n = 1 it should output 1
If n = 2 it should output 4
I'm having problems using a for loop, which only outputs the nth term:
n <-2
for (i in 1:n)
{
  y <- ((2*i)-1)
}
y
In R programming we try to avoid for loops:
cumsum ( seq(1,2*n, by=2) )
Or just use 'sum' if you don't want the series of partial sums.
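For example, a small sketch wrapping that in a function (the function name here is just illustrative):
sum_first_n_odd <- function(n) sum(seq(1, 2*n, by=2))
sum_first_n_odd(2)
# [1] 4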
There's actually no need to use a loop or to construct the sequence of the first n odd numbers here -- this is an arithmetic series, so we know the sum of the first n elements in closed form:
sum.first.n.odd <- function(n) n^2
sum.first.n.odd(1)
[1] 1
sum.first.n.odd(2)
[1] 4
sum.first.n.odd(100)
[1] 10000
This should be a good deal more efficient than any solution based on for or sum because it never computes the elements of the sequence.
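A quick sanity check of the closed form against the explicit sum (illustrative only):
all(sapply(1:100, sum.first.n.odd) == sapply(1:100, function(n) sum(seq(1, 2*n, by=2))))
# [1] TRUE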
[[Just seeing the title -- the OP apparently knows the analytic result and wanted something else...]]
Try this:
sum = 0   # running total
n = 2
for (i in seq(1, 2*n, 2)) {
  sum = sum + i
}
sum
# [1] 4
But, of course, R is rather slow when working with loops. That's why one should avoid them.

create an incidence matrix with restrictions in r (i.graph)

I would like to create a (N*M)-Incidence Matrix for a bipartite graph (N=M=200).
However, the following restrictions have to be considered:
Each column i (1, ..., 200) has a column sum of g = 10
Each row has a row sum of h = 10
No multi-edges (the values in the incidence matrix only take on the values 0 or 1)
So far I have
M <- 200; # number of rows
N <- 200; # number of columns
g <- 10
I <- matrix(sample(0:1, M*N, repl=T, prob= c(1-g/N,g/N)), M, N);
Does anybody have a solution?
Here's one way to do what you want. First the algorithm idea, then its implementation in R.
Two step Algorithm Idea
You want a matrix of 0's and 1's, with each row adding up to 10 and each column adding up to 10.
Step 1: First, create a trivial solution as follows:
The first 10 rows have 1's for the first 10 elements, then 190 zeros.
The second set of ten rows has 1's from the 11th to the 20th element, and so on.
In other words, a feasible solution is to have a 200x200 matrix of all 0's, with dense matrices of 10x10 1's embedded diagonally, 20 times.
Step 2: Shuffle entire rows and entire columns.
In this shuffle, the rowSum and columnSums are maintained.
Implementation in R
I use a smaller 16x16 matrix to demonstrate. In this case, let's say we want each row and each column to add up to 4. (The dimension of the larger square matrix has to be an integer multiple of this column sum.)
n <- 4 #size of the smaller square
i <- c(1,1,1,1) # a run of four 1's
z <- c(0,0,0,0) # a run of four 0's
#create a feasible solution to start with:
m <- matrix(c(rep(c(i,z,z,z),n),
              rep(c(z,i,z,z),n),
              rep(c(z,z,i,z),n),
              rep(c(z,z,z,i),n)), 16, 16)
#shuffle (Run the two lines following as many times as you like)
m <- m[sample(16), ] #shuffle rows
m <- m[ ,sample(16)] #shuffle columns
#verify that the sum conditions are not violated
colSums(m); rowSums(m)
#solution
print(m)
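To scale the same idea up to the 200x200 case from the question, a sketch (assuming N = M = 200 and g = h = 10, and using kronecker() to build the block-diagonal starting matrix in one step):
N <- 200; g <- 10
m <- kronecker(diag(N/g), matrix(1, g, g)) # twenty 10x10 blocks of 1's on the diagonal
m <- m[sample(N), ]  # shuffle rows
m <- m[ , sample(N)] # shuffle columns
all(rowSums(m) == g); all(colSums(m) == g)
# [1] TRUE
# [1] TRUE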
Hope that helps you move forward with your bipartite igraph.

R looping over two vectors

I have created two vectors in R, using statistical distributions to build the vectors.
The first is a vector of locations on a string of length 1000. That vector has around 10 values and is called mu.
The second vector is a list of numbers, each one representing the number of features at each location mentioned above. This vector is called N.
What I need to do is generate a random distribution for all features (N) at each location (mu).
After some fiddling around, I found that this code works correctly:
for (i in 1:length(mu)) {
  a <- rnorm(N[i], mu[i], 20)
  feature.location <- c(feature.location, a)
}
This produces the right output - a list of numbers of length sum(N), and each number is a location figure which correlates with the data in mu.
I found that this only worked when I used concatenate to get the values into a vector.
My question is; why does this code work? How does R know to loop sum(N) times but for each position in mu? What role does concatenate play here?
Thanks in advance.
To try and answer your question directly, c(...) is not "concatenate", it's "combine". That is, it combines its argument list into a vector. So c(1,2,3) is a vector with 3 elements.
Also, rnorm(n,mu,sigma) is a function that returns a vector of n random numbers sampled from the normal distribution. So at each iteration, i,
a <- rnorm(N[i],mu[i],20)
creates a vector a containing N[i] random numbers sampled from Normal(mu[i],20). Then
feature.location <- c(feature.location,a)
adds the elements of that vector to the vector from the previous iteration. So at the end, you have a vector with sum(N) elements.
I guess you're sampling from a series of locations, each a variable number of times.
I'm guessing your data looks something like this:
set.seed(1) # make reproducible
N <- ceiling(10*runif(10))
mu <- sample(seq(1000), 10)
> N;mu
[1] 3 4 6 10 3 9 10 7 7 1
[1] 206 177 686 383 767 496 714 985 377 771
Now you want to take a sample from rnorm of length N[i], with mean mu[i] and sd=20, and store all the results in a vector.
The method you're using (growing the vector) is not recommended, as it will be re-copied in memory each time an element is added. (See Circle 2 of The R Inferno, although for small examples like this, it's not so important.)
First, initialize the storage vector:
f.l <- NULL
for (i in 1:length(mu)) {
  a <- rnorm(n=N[i], mean=mu[i], sd=20)
  f.l <- c(f.l, a)
}
Then, each time, a stores your sample of length N[i] and c() combines it with the existing f.l by adding it to the end.
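If you want to avoid growing the vector, one option (a sketch) is to preallocate the full result and fill it in place:
f.l <- numeric(sum(N))  # preallocate the result vector
pos <- 0
for (i in 1:length(mu)) {
  f.l[pos + seq_len(N[i])] <- rnorm(n=N[i], mean=mu[i], sd=20)
  pos <- pos + N[i]
}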
A more efficient approach is
unlist(mapply(rnorm, N, mu, MoreArgs=list(sd=20)))
which vectorizes the loop. unlist() is used because mapply() returns a list of vectors of varying lengths here.
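Another fully vectorized option is a single rnorm() call, relying on the fact that rnorm() recycles its mean argument element-wise:
rnorm(sum(N), mean=rep(mu, times=N), sd=20)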

How to improve performance on counting columns in a matrix which are below a threshold?

In my code I am subtracting one column of a matrix from every other column of the same matrix.
Then I count how many of the new columns have only elements that are smaller than r.
I'm doing this for each column of the matrix. You can see my code below. I left out the part where I put values into the matrix.
Is there any way to improve the performance of this code? I can't seem to figure out a way to make it faster.
B = matrix(NA,(m),(window_step))
B_m_r = c(1:(window_step))
for (i in 1:(window_step)) {
  B_m_r[i] = sum(apply(abs(B[,-i] - B[,i]), 2, function(x) max(x) < r))
}
Solution
B = matrix(NA,(m),(window_step))
B_m_r = c(1:(window_step))
buffer_B = matrix(NA,(window_step-1),(window_step-1))
for (i in 1:(window_step-2)) {
  buffer_B[i, c(i:(window_step-1))] = apply(abs(B[,-c(1:i)] - B[,i]), 2, function(x) max(x) < r)
  B_m_r[i] = (sum(buffer_B[i, c(i:(window_step-1))]) + sum(buffer_B[1:i, i]))
}
B_m_r[window_step] = sum(buffer_B[1:(window_step-1),(window_step-1)])
B_m_r[window_step-1] = sum(buffer_B[1:(window_step-2),(window_step-2)])
OK, so based on the help from Яaffael I found a solution that doesn't calculate the differences twice.
Instead I save the results of the comparison with r from previous loop iterations in the matrix buffer_B and reuse them in later iterations to count the columns whose differences stay below r.
Now the code takes only half the time to finish.
Thanks!
You can for example reduce the calculation time by 50% by only checking "< r" for half of the column differences because they are effectively symmetric.
You are calculating abs(first of B - last of B) and abs(last of B - first of B).
Plus, you can precalculate the whole difference matrix instead of using a for loop to set it up step by step.
# I am using single-row matrices to keep it simple
> A <- matrix(1:4,ncol=4)
> A[,1:ceiling(ncol(A)/2)]
[1] 1 2
> A[,ncol(A):(floor(ncol(A)/2)+1)]
[1] 4 3
> A <- matrix(1:5,ncol=5)
> A[,1:ceiling(ncol(A)/2)]
[1] 1 2 3
> A[,ncol(A):(floor(ncol(A)/2)+1)]
[1] 5 4 3
> abs(A[,1:ceiling(ncol(A)/2)] - A[,ncol(A):(floor(ncol(A)/2)+1)])
[1] 4 2 0
When you want to speed up code in R, the first thing you should try is to turn all loops into vectorized expressions using R functions. A loop runs inside the R interpreter, whereas vectorized function calls allow R to execute essentially compiled C code.
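As an illustration of that advice applied to this problem, here is a sketch (not the poster's exact code, and it does not exploit the symmetry trick above): it builds the full matrix of pairwise maximum absolute column differences once and then thresholds it. It assumes B already holds numeric values and that r > 0:
D <- sapply(seq_len(ncol(B)), function(i) apply(abs(B - B[, i]), 2, max))
# D[j, i] holds max(abs(B[, j] - B[, i])); the diagonal is 0, so subtract 1
# to exclude each column's comparison with itself
B_m_r <- colSums(D < r) - 1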
