I'm trying, in R, to build a "grid" in a matrix based on two input vectors. The idea is to avoid a nested loop like this:
inputVector1 = 1:4
inputVector2 = 1:4
grid = NULL
for (i in inputVector1) {
  line = NULL
  for (j in inputVector2) {
    cellValue = i + j # Instead of i+j it can be anything, like taking a value from a data frame
    line = cbind(line, cellValue)
  }
  grid = rbind(grid, line)
}
Is there a dedicated function in R to do this kind of job faster and more simply? I know there are apply-family functions, but I didn't find a proper way to do it (without combining multiple apply-family functions). Thank you for the help.
Loops are kind of simple and they are not necessarily slow. However, it depends on how you use them. In your code (I call your approach L.GUEGAN(), for further reference), for instance, you don't exploit the fact that you know the size of your ultimate grid, and you keep expanding vectors and matrices. That slows things down. A very simple alternative would be
niceFor <- function() {
  grid <- matrix(0, nrow = length(inputVector1), ncol = length(inputVector2))
  for (i in seq_along(inputVector1))
    for (j in seq_along(inputVector2))
      grid[i, j] <- i + j
  grid
}
where the essential difference is predefining the grid object and updating its values, rather than creating new objects.
Yes, you may say that there is a dedicated function for this:
outer(inputVector1, inputVector2, `+`)
However, one needs to keep in mind that the function in the third argument needs to be vectorized, which is the case in this situation. That is, vectors are allowed when using addition
1:2 + 3:4
# [1] 4 6
`+`(1:2, 3:4)
# [1] 4 6
However, some other functions are not vectorized. E.g.,
seq(3:4, 6:7)
# Error in seq.default(3:4, 6:7) : 'from' must be of length 1
In that case, if you use outer, take a look at ?Vectorize.
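A minimal sketch of that route, reusing the seq() example above (lengthSeq is a made-up helper name): seq() only accepts scalar endpoints, but its Vectorize() wrapper can be handed to outer().
# seq() fails with vector endpoints, so wrap a scalar helper with Vectorize()
lengthSeq <- function(from, to) length(seq(from, to))
outer(3:4, 6:7, Vectorize(lengthSeq))
#      [,1] [,2]
# [1,]    4    5
# [2,]    3    4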
Certain operations have even "more direct" dedicated functions. E.g., if we had
grid[i, j] <- i * j
Then you should use
inputVector1 %*% t(inputVector2)
as it would be faster and cleaner than both loops and outer.
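A quick check with toy vectors that the two approaches produce the same grid:
v1 <- 1:4
v2 <- 1:4
# the matrix product of a column and a row vector is the multiplication grid
all.equal(v1 %*% t(v2), outer(v1, v2, `*`))
# [1] TRUE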
A comparison of the three approaches mentioned before:
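(Here funOuter() is assumed to be a thin wrapper around the outer() call above, with L.GUEGAN() wrapping the original nested loop:)
# assumed wrapper used in the timings below
funOuter <- function() outer(inputVector1, inputVector2, `+`)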
microbenchmark(L.GUEGAN(), niceFor(), funOuter(), times = 2000)
# Unit: microseconds
# expr min lq mean median uq max neval cld
# L.GUEGAN() 24.354 33.8645 38.933968 35.6315 40.878 295.661 2000 c
# niceFor() 4.011 4.7820 6.576742 5.4050 7.697 29.547 2000 a
# funOuter() 4.928 6.1935 8.701545 7.3085 10.619 74.449 2000 b
So, the nice for loop seems even to be superior if speed matters. Notice that you could further improve it by exploiting the symmetry of your grid: you could compute only half of the matrix manually and then use your results to fill the other triangle, as in the sketch below.
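A minimal sketch of that symmetry idea, assuming the two input vectors are identical (so that grid[i, j] equals grid[j, i]):
symFor <- function(v = inputVector1) {
  n <- length(v)
  grid <- matrix(0, n, n)
  # fill the upper triangle (including the diagonal) only
  for (i in 1:n)
    for (j in i:n)
      grid[i, j] <- v[i] + v[j]
  # mirror it into the lower triangle
  grid[lower.tri(grid)] <- t(grid)[lower.tri(grid)]
  grid
}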
Thanks to #hrbrmstr, this is what I was looking for:
outer(1:4, 1:4, function(a, b) { mapply(FUN = function(x, y) { return(x + y) }, a, b) })
I have seen many variations on the "split vector X into Y chunks in R" question on here. See, for example, here and here for just two. So, when I realized I needed to split a vector into Y chunks of random size, I was surprised to find that the randomness requirement might be "new"; I couldn't find a way to do this on here.
So, here's what I've drawn up:
library(magrittr) # for the %>% pipe

k.chunks = function(seq.size, n.chunks) {
  break.pts = sample(1:seq.size, n.chunks, replace = F) %>% sort() # Get a set of break points chosen from along the length of the vector, without replacement so there are no duplicate selections.
  groups = rep(NA, seq.size) # Set up the empty output vector.
  groups[1:break.pts[1]] = 1 # Set the first group's affiliations, because it has a unique start point of 1.
  for (i in 2:(n.chunks)) { # For all other chunks...
    groups[break.pts[i-1]:break.pts[i]] = i # Set the respective group affiliations.
  }
  groups[break.pts[n.chunks]:seq.size] = n.chunks # Set the last group's affiliation, because it has a unique end point of seq.size.
  return(groups)
}
My question is: Is this inelegant or inefficient somehow? It will get called thousands of times in the code I plan to write, so efficiency is important to me. It'd be especially nice to avoid the for loop, or to avoid having to set both the first and last groups "manually." My other question: Are there logical inputs that could break this? I recognize that n.chunks cannot exceed seq.size, so I mean other than that.
That should be pretty quick for smaller numbers. But here is a more concise way:
k.chunks2 = function(seq.size, n.chunks) {
  break.pts <- sort(sample(1:seq.size, n.chunks - 1, replace = FALSE))
  break.len <- diff(c(0, break.pts, seq.size))
  groups <- rep(1:n.chunks, times = break.len)
  return(groups)
}
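For instance, a quick illustration:
k.chunks2(10, 3)
# e.g. 1 1 1 2 2 2 2 3 3 3  (chunk sizes vary from draw to draw)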
If you really get a huge number of groups, I think the sort will start to cost you execution time. So you can do something like this (it can probably be tweaked to be even faster) to split based on proportions. I am not sure how I feel about this, because as n.chunks gets very large, the proportions will get very small. But it is faster.
k.chunks3 = function(seq.size, n.chunks) {
  props <- runif(n.chunks)
  grp.props <- props / sum(props)
  chunk.size <- floor(grp.props[-n.chunks] * seq.size)
  break.len <- c(chunk.size, seq.size - sum(chunk.size))
  groups <- rep(1:n.chunks, times = break.len)
  return(groups)
}
Running a benchmark, I think any of these will be fast enough (unit is microseconds).
n <- 1000
y <- 10
microbenchmark::microbenchmark(k.chunks(n, y),
k.chunks2(n, y),
k.chunks3(n, y))
Unit: microseconds
expr min lq mean median uq max neval
k.chunks(n, y) 49.9 52.05 59.613 53.45 58.35 251.7 100
k.chunks2(n, y) 46.1 47.75 51.617 49.25 52.55 107.1 100
k.chunks3(n, y) 8.1 9.35 11.412 10.80 11.75 44.2 100
But as the numbers get larger, you will notice a meaningful speedup (note the unit is now milliseconds).
n <- 1000000
y <- 100000
microbenchmark::microbenchmark(k.chunks(n, y),
k.chunks2(n, y),
k.chunks3(n, y))
Unit: milliseconds
expr min lq mean median uq max neval
k.chunks(n, y) 46.9910 51.38385 57.83917 54.54310 56.59285 113.5038 100
k.chunks2(n, y) 17.2184 19.45505 22.72060 20.74595 22.73510 69.5639 100
k.chunks3(n, y) 7.7354 8.62715 10.32754 9.07045 10.44675 58.2093 100
All said and done, I would probably use my k.chunks2() function.
Random sampling is probably inefficient, but it would seem expected that it should be so: "random" suggests all the input elements should also be random. So, considering a desired random selection from a vector Y, it would seem the effort should be applied to an index of Y, and successive Y(s), that would be or seem random. With sufficient sets of Y(s) it can be discerned how far from completely random the indexing is, but maybe that is not material, or perhaps merely thousands of repetitions are insufficient to demonstrate it.
Nonetheless, my sense is that both inputs to sample need to be 'random' in some way, as certainty in one reduces the randomness of the other.
my_vector <- 1:100000
sample_1 <- sample(my_vector, 50, replace = FALSE)
sample_2 <- sample(my_vector, 80, replace = FALSE)
full_range <- c(1, sort(unique(c(sample_1, sample_2))), 100000)
starts <- full_range[c(TRUE, FALSE)] # odd positions, as in https://stackoverflow.com/questions/33257610/how-to-return-the-elements-in-the-odd-position
ends <- full_range[c(FALSE, TRUE)]   # even positions
!unique(diff(full_range))
And absent setting a seed, I think non-reproducible is as close as you get to a random selection upon Y(s). This answer is just to suggest an approach to indexing Y. The use of the indices thereafter might follow #Adam's approach. And, of course, I could be completely wrong about all of this. Clearer random thinkers than me might well weigh in...
Basically I am helping someone write some code for their research, but my usual time-saving tactics have not reduced the run time of her algorithm enough for it to be reasonable. I was hoping someone else might know a better way to make a function run quickly, based on an example I have written to avoid including details about the research.
The object in the example is smaller than the one she is using (but can easily be made larger). For the actual algorithm, this piece takes about 3 minutes in a small case, but might take 8-10 in the full case, and needs to run probably 1,000-10,000 times. This is the reason I need to seriously reduce the run time.
How I am currently doing this (hopefully with enough comments to make my thought process obvious):
example <- array(rnorm(100000), dim = c(5, 25, 40, 20))
observation <- array(rnorm(600), dim = c(5, 5, 12))

calc.err <- function(value, observation){
  #' This creates the squared error for each observation, and each point in
  #' the example array, across the five values in the first dimension of each
  sqError <- (value - observation)^2
  #' the apply function here sums up the squared error for each observation
  #' and point. This is the value returned
  return(apply(sqError, c(2,3), function(x) sum(x)))
}

run <- apply(example, c(2,3,4), function(x) calc.err(x, observation))
#' It isn't returned in the right format (small problem) but reformatting is fast
format <- array(run, dim = c(5, 12, 25, 40, 20))
Will clarify if necessary.
edit:
The data.table package appears to be very helpful. I will have to learn that package, but the preliminaries seem to be much faster. I guess I was working with arrays because the code she gave me to speed up had the objects formatted that way. I didn't even think about changing it.
Here are a couple of simple refactors, along with the timings:
calc.err2 <- function(value, observation){
  #' This creates the squared error for each observation, and each point in
  #' the example array, across the five values in the first dimension of each
  sqError <- (value - observation)^2
  #' getting rid of the anonymous function
  apply(sqError, c(2,3), sum)
}

calc.err3 <- function(value, observation){
  #' This creates the squared error for each observation, and each point in
  #' the example array, across the five values in the first dimension of each
  sqError <- (value - observation)^2
  #' replacing apply with colSums
  colSums(sqError)
}
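As a quick sanity check that colSums over the first dimension really matches the apply() call (toy array, not the example data):
a <- array(rnorm(5 * 5 * 12), dim = c(5, 5, 12))
# colSums on a 3-d array sums over the first dimension, returning a 5 x 12 matrix
all.equal(colSums(a), apply(a, c(2, 3), sum))
# [1] TRUE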
R> microbenchmark(times = 8, apply(example, 2:4, calc.err, observation),
+                 apply(example, 2:4, calc.err2, observation),
+                 apply(example, 2:4, calc.err3, observation))
Unit: milliseconds
                                        expr         min          lq         mean       median           uq         max neval
  apply(example, 2:4, calc.err, observation) 2284.350162 2321.875878 2349.7524509 2336.6661645 2393.3452420 2409.894876     8
 apply(example, 2:4, calc.err2, observation) 2194.316755 2257.007572 2301.7896566 2298.9346090 2362.5479790 2383.020177     8
 apply(example, 2:4, calc.err3, observation)  645.004808  652.567611  681.3176878  667.9070175  720.7049605  723.177516     8
colSums is way faster than the corresponding apply.
I am trying to write R code which acts as a "moving window", just with memory (state). I have figured out (thanks to this question) how to apply a function to subsequent tuples of elements. For example, if I wish to write a (simple) moving average with a typical period 4, I would do the following:
mapply(myfunc, x[1:(length(x)-3)], x[2:(length(x)-2)], x[3:(length(x)-1)], x[4:length(x)])
Where myfunc is a function with 4 arguments which calculates their mean (I cannot use mean, as it expects a single argument, and I don't know how to combine the 4 arguments into a single vector).
That's quite cumbersome, though, and if the typical period is 100, say, I am not sure how to do it.
So here's my first question: how do I generalize this?
But here's another issue: suppose I wish the applied function to be able to save state. A simple example would be to keep record of how many values it was applied on so far. Another example is the exponential moving average (EMA), which is not really a window function, but instead a function which works on single values but which keeps state (the last resulted mean).
How can I write a function which when applied to a vector, works on its values one by one, returning a vector of the same length, which is able to retain its last output every time, or save any other "state" during its calculations? In Python, for example, I'd use classes for that, but that's quite difficult in R.
Important note: I am not interested in auxiliary R packages like zoo or TTR to do the work for me. I am trying to learn R, and in any case the functions I wish to write, while having similarities with MA or EMA, are custom, and do not exist in any of these packages.
Regarding your first question,
n <- length(x)
k <- 4
# embed() builds a matrix whose rows are lagged copies of x; keeping the
# first k rows and reversing the columns leaves one row per window offset
r <- embed(x, n - k)[1:k, seq(n - k, 1)]
# split r by row, so mapply() receives k parallel vectors, one per argument
do.call("mapply", c("myfunc", split(r, 1:k)))
Regarding the second question, Reduce can be used to iterate over a vector saving state.
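For example, a minimal sketch of an exponential moving average via Reduce: with accumulate = TRUE every intermediate result is returned, so the "state" (the previous EMA value) is carried from one element to the next.
ema <- function(x, alpha = 0.5) {
  # each step combines the new value with the previous EMA (the state)
  Reduce(function(prev, new) alpha * new + (1 - alpha) * prev,
         x[-1], init = x[1], accumulate = TRUE)
}
ema(1:5)
# [1] 1.0000 1.5000 2.2500 3.1250 4.0625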
For things like this you should consider using a plain for loop:
x <- runif(10000)
k <- 100
n <- length(x)
res <- numeric(n - k)
library(microbenchmark)
microbenchmark(times = 5,
  for(i in (k + 1):n) res[i - k] <- sum(x[(i - k + 1):i]),
  {
    r <- embed(x, n - k)[1:k, seq(n - k, 1)]
    gg <- do.call("mapply", c("sum", split(r, 1:k)))
  },
  flt <- filter(x, rep(1, k))
)
Produces:
Unit: milliseconds
min lq median uq max neval
for 163.5403 164.4929 165.2543 166.6315 167.0608 5
embed/mapply 1255.2833 1307.3708 1338.2748 1341.5719 1405.1210 5
filter 6.7101 6.7971 6.8073 6.8161 6.8991 5
Now, the results are not identical and I don't pretend to understand exactly what G. Grothendieck is doing with embed, but generally speaking for loops are just as fast as *apply functions so long as you initialize your result vectors first. Windowed calculations don't lend themselves well to vectorization, so you might as well use a for loop.
EDIT: as several have pointed out in the comments, there is a base function (filter) that does this and is quite a bit faster, so that seems to be the best option (though you should confirm it actually does what you want, as, again, the results are not exactly identical and I am not personally familiar with the function; in its default configuration it appears to do a rolling weighted sum, or a plain sum if the weights are all 1, with a centered window).
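A quick illustration of that point (stats::filter, to distinguish it from dplyr's filter): sides = 1 gives a trailing window, while the default sides = 2 centers it; unit weights give a rolling sum, and weights of 1/k give a rolling mean.
# trailing rolling sum of width 3; the first two entries have no full window
stats::filter(1:10, rep(1, 3), sides = 1)
# [1] NA NA  6  9 12 15 18 21 24 27   (returned as a ts object)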
TL;DR
What is the fastest method in R for reading and writing a subset of columns from a very large matrix? I attempt a solution with data.table, but need a fast way to extract a sequence of columns.
Short Answer: The expensive part of the operation is assignment. Thus the solution is to stick with a matrix and use Rcpp and C++ to modify the matrix in place. There are two excellent answers below with examples. (For those applying this to other problems, be sure to read the disclaimers in the solutions!) Scroll to the bottom of the question for some more lessons learned.
This is my first Stack Overflow question; I greatly appreciate your time in taking a look, and I apologize if I've left anything out. I'm working on an R package where I have a performance bottleneck from subsetting and writing to portions of a matrix (NB for statisticians: the application is updating sufficient statistics after processing each data point). The individual operations are incredibly fast, but the sheer number of them requires the whole thing to be as fast as possible. The simplest version of the idea is a matrix of dimension K by V, where K is generally between 5 and 1000 and V can be between 1000 and 1,000,000.
set.seed(94253)
K <- 100
V <- 100000
mat <- matrix(runif(K*V),nrow=K,ncol=V)
We then end up performing a calculation on a subset of the columns and adding the result into the full matrix. Thus, naively, it looks like
Vsub <- sample(1:V, 20)
toinsert <- matrix(runif(K*length(Vsub)), nrow=K, ncol=length(Vsub))
mat[,Vsub] <- mat[,Vsub] + toinsert
library(microbenchmark)
microbenchmark(mat[,Vsub] <- mat[,Vsub] + toinsert)
because this is done so many times it can be quite slow as a result of R's copy-on-change semantics (but see the lessons learned below; modification can actually happen in place in some circumstances).
For my problem the object need not be a matrix (and I'm sensitive to the difference, as outlined here: Assign a matrix to a subset of a data.table). I always want the full column, and so the list structure of a data frame is fine. My solution was to use Matthew Dowle's awesome data.table package. The write can be done extraordinarily quickly using set(). Unfortunately, getting the values is somewhat more complicated: we have to extract the columns with with=FALSE, which dramatically slows things down.
library(data.table)
DT <- as.data.table(mat)
set(DT, i=NULL, j=Vsub,DT[,Vsub,with=FALSE] + as.numeric(toinsert))
Within the set() function, using i=NULL to reference all rows is incredibly fast, but (presumably due to the way things are stored under the hood) there is no comparable option for j. #Roland notes in the comments that one option would be to convert to a triple representation (row number, column number, value) and use data.table's binary search to speed retrieval. I tested this manually, and while it is quick, it approximately triples the memory requirements for the matrix. I would like to avoid this if possible.
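For illustration, here is a hedged sketch of what I take that triple representation to mean (my guess, not #Roland's code; it builds on mat, Vsub, and toinsert from above):
library(data.table)
# one row per cell of mat, keyed on the column number for binary search
trip <- data.table(row = rep(1:K, times = V),
                   col = rep(1:V, each = K),
                   val = as.vector(mat), # column-major, so it lines up with row/col
                   key = "col")
# update join: binary search on col, then add the new values in place
trip[.(Vsub), val := val + as.vector(toinsert)]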
Following the question here: Time in getting single elements from data.table and data.frame objects, Hadley Wickham gave an incredibly fast solution for a single index:
Vone <- Vsub[1]
toinsert.one <- toinsert[,1]
set(DT, i=NULL, j=Vone,(.subset2(DT, Vone) + toinsert.one))
However, since .subset2(DT, i) is just DT[[i]] without the method dispatch, there is no way (to my knowledge) to grab several columns at once, although it certainly seems like it should be possible. As in the previous question, it seems that since we can overwrite the values quickly, we should be able to read them quickly.
Any suggestions? Also, please let me know if there is a better solution than data.table for this problem. I realize it's not really the intended use case in many respects, but I'm trying to avoid porting the whole series of operations to C.
Here is a sequence of timings of the elements discussed: the first two use all columns, the second two just one column.
microbenchmark(mat[,Vsub] <- mat[,Vsub] + toinsert,
set(DT, i=NULL, j=Vsub,DT[,Vsub,with=FALSE] + as.numeric(toinsert)),
mat[,Vone] <- mat[,Vone] + toinsert.one,
set(DT, i=NULL, j=Vone,(.subset2(DT, Vone) + toinsert.one)),
times=1000L)
Unit: microseconds
expr min lq median uq max neval
Matrix 51.970 53.895 61.754 77.313 135.698 1000
Data.Table 4751.982 4962.426 5087.376 5256.597 23710.826 1000
Matrix Single Col 8.021 9.304 10.427 19.570 55303.659 1000
Data.Table Single Col 6.737 7.700 9.304 11.549 89.824 1000
Answer and Lessons Learned:
Comments identified the most expensive part of the operation as the assignment process. Both solutions give answers based on C code that modifies the matrix in place, breaking R's convention of not modifying the arguments to a function but providing a much faster result.
Hadley Wickham stopped by in the comments to note that the matrix modification actually is done in place as long as the object mat is not referenced elsewhere (see http://adv-r.had.co.nz/memory.html#modification-in-place). This points to an interesting and subtle point. I was performing my evaluations in RStudio. RStudio, as Hadley notes in his book, creates an additional reference to every object not within a function. Thus, while in the performance case of a function the modification would happen in place, at the command line it was producing a copy-on-change effect. Hadley's package pryr has some nice functions for tracking references and memory addresses.
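A small base-R check of this point (a sketch; run it at a plain R terminal rather than in RStudio, for the reason just described):
m <- matrix(0, 2, 2)
tracemem(m)  # prints m's address; any later "tracemem[...]" line would mean a copy
m[1, 1] <- 1 # in recent R, no copy message appears: the matrix is modified in place
untracemem(m)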
Fun with Rcpp:
You can use Eigen's Map class to modify an R object in place.
library(RcppEigen)
library(inline)

incl <- '
using Eigen::Map;
using Eigen::MatrixXd;
using Eigen::VectorXi;
typedef Map<MatrixXd> MapMatd;
typedef Map<VectorXi> MapVeci;
'

body <- '
MapMatd A(as<MapMatd>(AA));
const MapMatd B(as<MapMatd>(BB));
const MapVeci ix(as<MapVeci>(ind));
const int mB(B.cols());
for (int i = 0; i < mB; ++i)
{
    A.col(ix.coeff(i) - 1) += B.col(i);
}
'

funRcpp <- cxxfunction(signature(AA = "matrix", BB = "matrix", ind = "integer"),
                       body, "RcppEigen", incl)
set.seed(94253)
K <- 100
V <- 100000
mat2 <- mat <- matrix(runif(K*V),nrow=K,ncol=V)
Vsub <- sample(1:V, 20)
toinsert <- matrix(runif(K*length(Vsub)), nrow=K, ncol=length(Vsub))
mat[,Vsub] <- mat[,Vsub] + toinsert
invisible(funRcpp(mat2, toinsert, Vsub))
all.equal(mat, mat2)
#[1] TRUE
library(microbenchmark)
microbenchmark(mat[,Vsub] <- mat[,Vsub] + toinsert,
funRcpp(mat2, toinsert, Vsub))
# Unit: microseconds
# expr min lq median uq max neval
# mat[, Vsub] <- mat[, Vsub] + toinsert 49.273 49.628 50.3250 50.8075 20020.400 100
# funRcpp(mat2, toinsert, Vsub) 6.450 6.805 7.6605 7.9215 25.914 100
I think this is basically what #Joshua Ulrich proposed. His warnings regarding breaking R's functional paradigm apply.
I do the addition in C++, but it is trivial to change the function to only do assignment.
Obviously, if you can implement your whole loop in Rcpp, you avoid repeated function calls at the R level and will gain performance.
Here's what I had in mind. This could probably be much sexier with Rcpp and friends, but I'm not as familiar with those tools.
#include <R.h>
#include <Rinternals.h>
#include <Rdefines.h>

SEXP addCol(SEXP mat, SEXP loc, SEXP matAdd)
{
    int i, nr = nrows(mat), nc = ncols(matAdd), ll = length(loc);

    if(ll != nc)
        error("length(loc) must equal ncol(matAdd)");
    if(TYPEOF(mat) != TYPEOF(matAdd))
        error("mat and matAdd must be the same type");
    if(nr != nrows(matAdd))
        error("mat and matAdd must have the same number of rows");
    if(TYPEOF(loc) != INTSXP)
        error("loc must be integer");

    int *iloc = INTEGER(loc);
    switch(TYPEOF(mat)) {
    case REALSXP:
        for(i = 0; i < ll; i++)
            memcpy(&(REAL(mat)[(iloc[i]-1)*nr]),
                   &(REAL(matAdd)[i*nr]), nr*sizeof(double));
        break;
    case INTSXP:
        for(i = 0; i < ll; i++)
            memcpy(&(INTEGER(mat)[(iloc[i]-1)*nr]),
                   &(INTEGER(matAdd)[i*nr]), nr*sizeof(int));
        break;
    default:
        error("unsupported type");
    }
    return R_NilValue;
}
Put the above function in addCol.c, then run R CMD SHLIB addCol.c. Then in R:
addColC <- dyn.load("addCol.so")$addCol
.Call(addColC, mat, Vsub, mat[,Vsub]+toinsert)
The slight advantage to this approach over Roland's is that this only does the assignment. His function does the addition for you, which is faster, but also means you need a separate C/C++ function for every operation you need to do.
I'm analyzing large sets of data using the following script:
M <- c_alignment

c_check <- function(x){
  if (x == c_1) {
    1
  } else {
    0
  }
}

both_c_check <- function(x){
  if (x[res_1] == c_1 && x[res_2] == c_1) {
    1
  } else {
    0
  }
}

variance_function <- function(x, y){
  sqrt(x*(1-x))*sqrt(y*(1-y))
}

frames_total <- nrow(M)
cols <- ncol(M)
c_vector <- apply(M, 2, max)
freq_vector <- matrix(nrow = sum(c_vector))
co_freq_matrix <- matrix(nrow = sum(c_vector), ncol = sum(c_vector))
insertion <- 0
res_1_insertion <- 0

for (res_1 in 1:cols){
  for (c_1 in 1:c_vector[res_1]){
    res_1_insertion <- res_1_insertion + 1
    insertion <- insertion + 1
    res_1_subset <- sapply(M[,res_1], c_check)
    freq_vector[insertion] <- sum(res_1_subset)/frames_total
    res_2_insertion <- 0
    for (res_2 in 1:cols){
      if (is.na(co_freq_matrix[res_1_insertion, res_2_insertion + 1])){
        for (c_2 in 1:max(c_vector[res_2])){
          res_2_insertion <- res_2_insertion + 1
          both_res_subset <- apply(M, 1, both_c_check)
          co_freq_matrix[res_1_insertion, res_2_insertion] <- sum(both_res_subset)/frames_total
          co_freq_matrix[res_2_insertion, res_1_insertion] <- sum(both_res_subset)/frames_total
        }
      }
    }
  }
}

covariance_matrix <- (co_freq_matrix - crossprod(t(freq_vector)))
variance_matrix <- matrix(outer(freq_vector, freq_vector, variance_function), ncol = length(freq_vector))
correlation_coefficient_matrix <- covariance_matrix/variance_matrix
A model input would be something like this:
1 2 1 4 3
1 3 4 2 1
2 3 3 3 1
1 1 2 1 2
2 3 4 4 2
What I'm calculating is the binomial covariance for each state found in M[,i] with each state found in M[,j]. Each row is the state found for that trial, and I want to see how the states of the columns co-vary.
Clarification: I'm finding the covariance of two multinomial distributions, but I'm doing it through binomial comparisons.
The input is a 4200 x 510 matrix, and the c value for each column is about 15 on average. I know for loops are terribly slow in R, but I'm not sure how I can use the apply function here. If anyone has a suggestion as to properly using apply here, I'd really appreciate it. Right now the script takes several hours. Thanks!
I thought of writing a comment, but I have too much to say.
First of all, if you think apply goes faster, look at Is R's apply family more than syntactic sugar?. It might be, but it's far from guaranteed.
Next, please don't grow matrices as you move through your code; that slows your code down incredibly. Preallocate the matrix and fill it up; that can increase your code's speed more than tenfold. You're growing different vectors and matrices throughout your code, and that's insane (forgive me the strong language).
Then, look at the help page of ?subset and the warning given there:
This is a convenience function intended for use interactively. For
programming it is better to use the standard subsetting functions like
[, and in particular the non-standard evaluation of argument subset
can have unanticipated consequences.
Always. Use. Indices.
Further, you recalculate the same values over and over again. freq_res_2, for example, is calculated for every res_2 and state_2 as many times as you have combinations of res_1 and state_1. That's just a waste of resources. Move what you don't need to recalculate out of your loops, and save it in matrices you can simply access again.
Heck, now that I'm at it: please use vectorized functions. Think again and see what you can drag out of the loops. This is what I see as the core of your calculation:
cov <- (freq_both - (freq_res_1)*(freq_res_2)) /
(sqrt(freq_res_1*(1-freq_res_1))*sqrt(freq_res_2*(1-freq_res_2)))
As I see it, you can construct matrices freq_both, freq_res_1 and freq_res_2 and use them as input for that one line. And that will be the whole covariance matrix (don't call it cov; cov is a function). Exit loops. Enter fast code.
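To make that concrete, a minimal sketch with toy inputs (freq and freq_both are stand-in names for the marginal and joint frequency objects, not variables from the post):
set.seed(1)
p <- 10                        # number of (residue, state) pairs
freq <- runif(p, 0.1, 0.9)     # toy marginal frequencies
freq_both <- outer(freq, freq) # toy joint frequencies
# the whole correlation matrix in three vectorized lines, no loops
covariance  <- freq_both - outer(freq, freq)
std_dev     <- sqrt(freq * (1 - freq))
correlation <- covariance / outer(std_dev, std_dev)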
Given the fact that I have no clue what's in c_alignment, I'm not going to rewrite your code for you, but you should definitely get rid of the C way of thinking and start thinking R.
Let this be a start: The R Inferno
It's not really the 4-way nested loops but the way your code grows memory on each iteration. That happens 4 times, where I've placed # ** on the cbind and rbind lines. Standard advice in R (and Matlab and Python) in situations like this is to allocate in advance and then fill in. That's what the apply functions do: they allocate a list as long as the known number of results, assign each result to its slot, and then merge all the results together at the end. In your case you could just allocate the correct-size matrix in advance and assign into it at those 4 points (roughly speaking). That should be as fast as the apply family, and you might find it easier to code.