I want to create polygons inside an apply and want to do this as quickly as possible from a matrix of coordinates. I have some code and realized that creating the polygons is one of its slowest parts. How can I do this efficiently? I tried two different approaches:
Approach 1
library(sp)
library(terra)
t0 <- Sys.time()
poly_list <- apply(matrix(1:10000), 1, function(idx){
# set coordinates
coords <- cbind(rnorm(100), rnorm(100))
# create polygon
Polygons(list(Polygon(coords)), idx)
})
# convert to terra polygons
poly_terra <- vect(SpatialPolygons(poly_list))
# show time passed
print(Sys.time() - t0)
# Time difference of 2.082166 secs
Approach 2
t0 <- Sys.time()
poly_list <- apply(matrix(1:10000), 1, function(idx){
# set coordinates
coords <- cbind(rnorm(100), rnorm(100))
# create polygon
vect(coords, type = "polygon")
})
# convert to terra polygons
poly_terra <- vect(poly_list)
print(Sys.time() - t0)
# Time difference of 16.38044 secs
Why is it faster to create sp polygons and convert them afterwards than directly creating terra polygons? The code with vect(SpatialPolygons(Polygons(list(Polygon(coords)), idx))) seems somewhat complicated. Is there a faster or at least more elegant way?
Edit: Currently my fastest option, although it feels illegal:
t0 <- Sys.time()
dummy <- Polygons(list(Polygon(cbind(rep(0,4), rep(0,4)))), "0")
poly_list <- apply(matrix(1:10000), 1, function(idx){
# set coordinates
coords <- cbind(rnorm(100), rnorm(100))
# create polygon
new <- dummy
new@ID <- as.character(idx)
new@Polygons[[1]]@coords <- coords
return(new)
})
# convert to terra polygons
poly_terra <- vect(SpatialPolygons(poly_list))
print(Sys.time() - t0)
# Time difference of 0.7147191 secs
This is faster than your examples
t0 <- Sys.time()
p <- lapply(1:10000, function(idx){
cbind(id=idx, x=rnorm(100), y=rnorm(100))
})
p <- do.call(rbind, p)
v <- vect(p, "polygons")
print(Sys.time() - t0)
#Time difference of 0.483578 secs
This uses lapply and you state that you want to use apply; but in the context of your example apply does not seem to be a good choice.
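To illustrate: apply() over a one-column matrix only hands the row index to the function, so it is an indirect way of writing sapply()/lapply() over a plain vector. A small sketch (not part of the benchmark above):
# apply over a one-column matrix just loops over the row indices,
# so these two calls do the same work
a <- apply(matrix(1:5), 1, function(idx) idx^2)
b <- sapply(1:5, function(idx) idx^2)
identical(a, b)
#> [1] TRUE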
I do not see much performance difference between your two sp approaches. Below I use a streamlined version of the one you say is fastest and benchmark it with my approach:
with_terra <- function() {
p <- lapply(1:10000, function(idx){
cbind(id=idx, x=rnorm(100), y=rnorm(100))
})
p <- do.call(rbind, p)
vect(p, "polygons")
}
with_sp <- function() {
dummy <- Polygons(list(Polygon(cbind(rep(0,4), rep(0,4)))), "0")
poly_list <- apply(matrix(1:10000), 1, function(idx){
dummy@ID <- as.character(idx)
dummy@Polygons[[1]]@coords <- cbind(rnorm(100), rnorm(100))
dummy
})
vect(SpatialPolygons(poly_list))
}
bm <- microbenchmark::microbenchmark(
sp = with_sp(),
terra = with_terra(),
times = 10
)
bm
#Unit: milliseconds
# expr min lq mean median uq max neval
# sp 836.8434 892.8411 930.5261 935.3788 968.2724 1039.2840 10
# terra 261.2191 276.0770 298.3603 282.7462 296.3674 437.0505 10
I'm not really sure if this will be of help, but I had a good time experimenting and fine-tuning and thought I'd share my preliminary results at least.
Foremost, let me share some input for further reading:
This article was the starting point for some code optimization I worked on some time ago: FasteR! HigheR! StrongeR!
Maybe it's just me, but I prefer for-loops over apply and this approach does not seem to be slower (c.f. here). On the contrary, the median execution time of your first approach was ~0.12 s faster on my machine after I used a loop instead, but maybe there is another reason for you using apply here.
If you choose to go for a loop, here is another great guide on how to reduce execution time.
Calling functions via explicit namespaces (pkg::fun) actually slows down your code (c.f. here), so better attach the packages you are going to use - like you did.
The native pipe does not seem to have any overhead (c.f. here), so this might be a great way to un-nest your functions and make them look tidier without penalties.
For timing purposes, I came across {tictoc} some time ago as a nice wrapper around the Sys.time() - t0 pattern; for actual benchmarking, a package like {microbenchmark} is great.
Noam Ross suggests finding better (= faster) packages for your purpose. You already noticed that {sp} operates faster than {terra} in your example. Let me present a third option:
library(sp)
library(terra)
library(sf)
# first node has to be equal to the last node for a polygon to be closed
coords <- cbind(rnorm(99), rnorm(99))
coords <- rbind(coords, coords[1, ])
mbm <- microbenchmark::microbenchmark(
sp = Polygon(coords) |> list() |> Polygons(1),
terra = vect(coords, type = "polygons"),
sf = list(coords) |> st_polygon(),
times = 100
)
ggplot2::autoplot(mbm)
If your target object has to be of class SpatVector, you may consider applying terra::vect() once as a final step. However, what exactly is your goal once you have created your polygon objects? This might affect which package / workflow to use. E.g. if you only need geometries in a specific order, you might drop attributes etc.
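For illustration, a rough sf-based sketch of that idea, assuming the 10000 random polygons from the question; only the very last call converts to a SpatVector:
library(sf)
library(terra)
# sketch only: build closed rings, collect them in a single sfc, convert once
coords_list <- lapply(1:10000, function(idx) {
  xy <- cbind(rnorm(100), rnorm(100))
  rbind(xy, xy[1, ])                               # st_polygon() needs closed rings
})
sfc_polys <- st_sfc(lapply(coords_list, function(xy) st_polygon(list(xy))))
poly_terra <- vect(st_sf(geometry = sfc_polys))    # single conversion to SpatVector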
Considering your third approach: it should not feel illegal from my point of view - pre-allocating objects and exchanging some slots seems like a smart move. A condensed version might combine loops, pipes and maybe the {sf} package; I have not managed to fit {sf} in yet, but at least I have not slowed your code down so far:
# your take on this
illegal_approach_a <- function() {
dummy <- Polygons(list(Polygon(cbind(rep(0,4), rep(0,4)))), "0")
poly_list <- apply(matrix(1:10000), 1, function(idx){
# set coordinates
coords <- cbind(rnorm(100), rnorm(100))
# create polygon
new <- dummy
new@ID <- as.character(idx)
new@Polygons[[1]]@coords <- coords
return(new)
})
# convert to terra polygons
poly_terra <- vect(SpatialPolygons(poly_list))
}
# my take on this
illegal_approach_b <- function() {
dummy <- cbind(rep(0, 4), rep(0, 4)) |> Polygon() |> list() |> Polygons("0")
poly_list <- list()
for (i in 1:10000) {
# set coordinates
coords <- cbind(rnorm(100), rnorm(100))
# create polygon
new <- dummy
new@ID <- as.character(i)
new@Polygons[[1]]@coords <- coords
poly_list[[i]] <- new
}
# convert to terra polygons
poly_terra <- SpatialPolygons(poly_list) |> vect()
}
mbm <- microbenchmark::microbenchmark(
your_take = illegal_approach_a(),
my_take = illegal_approach_b(),
times = 100
)
ggplot2::autoplot(mbm)
I have a list which contains 4438 dataframes of different sizes. I am not sure how to make a reproducible example, but the way I obtained the list is by using the expand.grid function to get a dataframe with all the possible combinations of elements:
citation <- citation %>%
map_depth(., 1, expand.grid)
(Screenshots in the original post show the list before and after applying expand.grid.)
What I want to achieve is, for each dataframe, to count the number of unique values per row and then find the minimum of those counts in the dataframe.
First, I write the function below:
fun1 <- function(res){
min(apply(res,1,function(x) length(unique(x))))
}
Then, apply the function to each dataframe:
library(furrr)
plan(multisession, workers = 4)
min_set <- c()
min_set <- citation %>% future_map_dbl(fun1)
However, the calculation is super slow, taking almost 30 minutes to complete. I would like to find a way to accelerate the performance. Looking forward to hearing your solutions. Thank you in advance.
To speed up the current approach of enumerating the combinations, use rowTabulate from the Rfast package (or rowTabulates from the matrixStats package).
However, it will be much faster to get the desired results with the setcover function in the adagio package, which solves the set cover problem directly (i.e., without the use of expand.grid) via integer linear programming with lp from the lpSolve package.
library(Rfast) # for the rowTabulate function
library(adagio) # for the setcover function
# reproducible example data
set.seed(1141593349)
citation1 <- list(
lapply(c(5,2,8,12,6,38), function(size) sample(50, size)),
lapply(c(5,2,8,12,7), function(size) sample(50, size))
)
# get all combinations of the indices of the unique values for each list in citation1
citation2 <- lapply(citation1, function(x) expand.grid(lapply(x, match, table = unique(unlist(x)))))
# original solution
fun1 <- function(res) min(apply(res, 1, function(x) length(unique(x))))
# faster version of the original solution
fun2 <- function(res) min(rowsums(rowTabulate(as.matrix(res)) > 0L))
# linear programming solution (uses citation1 rather than citation2)
fun3 <- function(res) {
v <- unlist(res)
m <- matrix(0L, max(v), length(res))
m[cbind(v, rep.int(seq_along(res), lengths(res)))] <- 1L
setcover(m)$objective
}
microbenchmark::microbenchmark(fun1 = sapply(citation2, fun1),
fun2 = as.integer(sapply(citation2, fun2)),
fun3 = as.integer(sapply(citation1, fun3)),
times = 10,
check = "identical")
#> Unit: milliseconds
#> expr min lq mean median uq max
#> fun1 1110.4976 1162.003601 1217.049501 1204.608151 1281.121601 1331.057001
#> fun2 101.5173 113.123501 142.265371 145.964502 165.788700 187.196301
#> fun3 1.4038 1.461101 1.734781 1.850701 1.870801 1.888702
I would like to find the most memory- and time-efficient way to calculate Euclidean distances on a large matrix. I ran the small benchmark below comparing a few packages I know: parallelDist, geodist, fields and stats. I also considered a customized function that combines Rcpp and bigmemory. Here are the results I found (reprex below), but I'd like to know whether there are other efficient packages / solutions for this task:
Results
benchmrk
#> package time alloc
#>1: parDist 0.298 5.369186e-04
#>2: fields 1.079 9.486198e-03
#>3: rcpp 54.422 2.161113e+00
#>4: stats 0.770 5.788603e+01
#>5: geodist 2.513 1.157635e+02
# plot
ggplot(benchmrk, aes(x=alloc , y=time, color= package, label=package)) +
geom_label(alpha=.5) +
coord_trans(x="log10", y="log10") +
theme(legend.position = "none")
Reprex
library(parallelDist)
library(geodist)
library(fields)
library(stats)
library(bigmemory)
library(Rcpp)
library(lineprof)
library(geobr)
library(sf)
library(ggplot2)
library(data.table)
# data input
df <- geobr::read_weighting_area()
gc(reset = T)
# convert projection to UTM
df <- st_transform(df, crs = 3857)
# get spatial coordinates
coords <- suppressWarnings(st_coordinates( st_centroid(df) ))
# prepare customized rcpp function
sourceCpp("euc_dist.cpp")
bigMatrixEuc <- function(bigMat){
zeros <- big.matrix(nrow = nrow(bigMat)-1,
ncol = nrow(bigMat)-1,
init = 0,
type = typeof(bigMat))
BigArmaEuc(bigMat@address, zeros@address)
return(zeros)
}
### Start tests
perf_fields <- lineprof(dist_fields <- fields::rdist(coords) )
perf_geodist <- lineprof(dist_geodist <- geodist::geodist(coords, measure = "cheap") )
perf_stats <- lineprof(dist_stats <- stats::dist(coords) )
perf_parDist <- lineprof(dist_parDist <- parallelDist::parDist(coords, method = "euclidean") )
perf_rcpp <- lineprof(dist_rcpp <- bigMatrixEuc( as.big.matrix(coords) ) )
perf_fields$package <- 'fields'
perf_geodist$package <- 'geodist'
perf_stats$package <- 'stats'
perf_parDist$package <- 'parDist'
perf_rcpp$package <- 'rcpp'
# gather results
benchmrk <- rbind(perf_fields, perf_geodist, perf_stats , perf_parDist, perf_rcpp)
benchmrk <- setDT(benchmrk)[, .(time =sum(time), alloc = sum(alloc)), by=package][order(alloc)]
benchmrk
Here, I try to propose an answer 'theoretically'.
I think a combination of the Rcpp approach (here) and parDist (here) might allow for working on very large data sets while keeping execution times at an acceptable level.
Unfortunately, I have not worked with Rcpp, RcppParallel or RcppArmadillo yet. But it seems parDist and the Rcpp/big.matrix approach build upon the same 'infrastructure'.
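As a minimal illustration of that direction (assuming the coords matrix from the question's reprex; parDist is already backed by multithreaded C++ via RcppParallel, and the number of threads can be set explicitly - 4 is only an example value):
library(parallelDist)
# parDist() runs its distance computation in parallel C++ code
dist_par <- parDist(coords, method = "euclidean", threads = 4)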
Maybe some more experienced users will take up the challenge.
I have a large raster dataset (X) with dimensions of 32251*51333. The values of X are repetitions of another array (Y), which has a size of 3*10^6.
Now I want to change the values of X by matching them against each value of Y; for example, I can program it like this:
for (i in 1:length(Y)){
X[X==Y[i]] = Z[i] #Z is just another array with the same size as Y
}
The problem is that, first, the indexed replacement X[X==Y[i]] = Z[i] does not work because X is too large: after a few minutes the program just stops with the error "Error: cannot allocate vector of size 6.2 Gb".
Second, looping from 1 to length(Y), with Y having on the order of 10^6 elements, may take "forever" to complete.
One approach that came to my mind is to split X into small chunks and then do the index matching for each chunk. But I feel this would still take a lot of time.
Is there a better way to achieve the above goal?
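For reference, a rough sketch of that chunking idea with the raster block functions, assuming X is a Raster layer loaded with the raster package (untested at this scale; note that cells whose value does not occur in Y would become NA with this lookup):
library(raster)
# process X block by block; match() replaces the loop over Y with one vectorised lookup
bs  <- blockSize(X)
out <- raster(X)
out <- writeStart(out, filename = tempfile(fileext = ".grd"))
for (b in seq_len(bs$n)) {
  v   <- getValues(X, row = bs$row[b], nrows = bs$nrows[b])
  out <- writeValues(out, Z[match(v, Y)], bs$row[b])
}
out <- writeStop(out)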
1st Update:
Thanks to the example provided by @Lyngbakr, I will elaborate on this question further. Because the raster I'm working with is very large (32251*51333), it does not seem possible to upload it. The example given by @Lyngbakr is very similar to what I want, except that the raster created is too small. Following that example, I ran two tests with a much larger raster with dimensions of 3000*2700. See the code below.
#Method 1: Use subs
start_time <- Sys.time()
Y <- 1:9
Z <- 91:99
X <- raster(matrix(rep(Y, 3), nrow=3000,ncol = 2700))
df <- data.frame(Y, Z)
X <- subs(X, df)
end_time <- Sys.time()
end_time - start_time
#Time difference of 2.248908 mins
#Method 2: Use for loop
start_time <- Sys.time()
Y <- 1:9
Z <- 91:99
X <- raster(matrix(rep(Y, 3), nrow=3000,ncol = 2700))
for (i in 1:length(Y)){
X[X==Y[i]]=Z[i] #this indexing of R seems not efficient if X becomes large
}
end_time <- Sys.time()
end_time - start_time
#Time difference of 10.22717 secs
As you can see, a simple for loop is even more efficient than the subs function. Remember, the raster shown in the example is still about two orders of magnitude smaller than the one I work with, and the array Y in the example is very small. Now the question becomes: how can Method 2, which is just a simple for loop, be sped up?
You're looking for the subs function. I don't know if it works with large rasters, but here's how you'd try it.
I load the raster package and create some dummy data. (It would be really helpful if you provided data in your question.) Then, I plot the results.
# Load library
library(raster)
#> Loading required package: sp
# Z holds values that will replace Y
Y <- 1:9
Z <- 91:99
# Create dummy raster
X <- raster(matrix(rep(Y, 3), ncol = 9))
# Examine raster
plot(X)
As you can see, X is just a bunch of Y vectors patched together. Next, I bind Y and Z together into a data frame df.
# Combine y & z into a data frame
df <- data.frame(Y, Z)
Finally, I use subs to replace Y values with Z values.
# Substitute Z for Y in X
X <- subs(X, df)
A quick look at the raster shows that the values have been replaced correctly.
# Examine raster
plot(X)
Created on 2019-06-25 by the reprex package (v0.2.1.9000)
Update
Rcpp is really helpful when performance is an issue. Below, I compare three methods:
Looping in R (from the question)
Using subs from the raster package
Looping in C++ using Rcpp
By the way, Sys.time() isn't a great way to examine performance, so I'd recommend microbenchmark instead.
# Load library
library(raster)
# Define vectors and raster
Y <- 1:9
Z <- 91:99
X <- raster(matrix(rep(Y, 3), nrow = 3000, ncol = 2700))
method_1 is the subs function.
# Using subs function
method_1 <- function(){
df <- data.frame(Y, Z)
X <- subs(X, df)
}
method_2 is your original looping approach.
# Using R loop
method_2 <- function(){
for (i in 1:length(Y)){
X[X==Y[i]]=Z[i]
}
X
}
method_3 is the looping approach implemented in C++.
# Using Rcpp loops
src <-
"Rcpp::NumericMatrix subs_cpp(Rcpp::NumericMatrix X, Rcpp::NumericVector Y, Rcpp::NumericVector Z){
for(int i = 0; i < Y.length(); ++i){
for(int j = 0; j < X.ncol(); ++j){
for(int k = 0; k < X.nrow(); ++k){
if(X(k, j) == Y(i)){
X(k, j) = Z(i);
}
}
}
}
return X;
}"
Rcpp::cppFunction(src)
method_3 <- function(){
subs_cpp(as.matrix(X), Y, Z)
}
And here I benchmark the approaches.
# Run benchmarking
microbenchmark::microbenchmark(method_1(), method_2(), method_3(), times = 10)
# Unit: milliseconds
# expr min lq mean median uq max neval
# method_1() 16861.5447 17737.2124 19321.5674 18628.8573 20117.0159 25506.208 10
# method_2() 671.2223 677.6029 1111.3935 738.6216 1657.0542 2163.137 10
# method_3() 316.9810 319.1484 481.3548 320.2337 326.7133 1477.454 10
As you can see, the Rcpp approach is by far the fastest.
You can also compare the output to ensure they produce the same result using a smaller raster.
# Examine all three outputs with smaller raster
X <- raster(matrix(rep(Y, 3), ncol = 9))
plot(method_1(), main = "Method 1")
plot(method_2(), main = "Method 2")
plot(raster(method_3()), main = "Method 3") # Needs to converted into a raster
And they all look alike. Note that for the third method, the result needs to be converted back to a raster from a matrix.
Well, I need to handle large dense graphs in R. These graphs may also have quite long vertex names. I use the igraph package and in general I have the following two possibilities:
store the original long names in the name attribute
store them, for example, in a realname attribute and fill the name attribute with ids
The thing is that igraph can access a vertex by its name, so I am concerned about performance. For example, a longer name may require more overhead. However, I found the opposite using the code below.
Idea: measure random access to graph elements depending on the size of the name attribute
Results:
Name: Character Names >> Time difference of 3.435584 mins
Name: Character Ids >> Time difference of 4.507384 mins
Question: why do I get such a counterintuitive result, and what is the best way to store vertex names?
Thanks in advance!
Code:
library(igraph)
n <- 50000
charIds <- c()
verNames <- c()
g1 <- make_ring(n)
g2 <- make_ring(n)
for (i in 1:n) {
verNames[i] <- paste("VeryVeryVeryLongAndStrangeName:", i, sep = "")
charIds[i] <- as.character(i)
}
V(g1)$name <- verNames
V(g2)$name <- charIds
# -------- Test1: Begin -------- #
start1 <- Sys.time()
for (i in 1:n) {
ranNumber <- as.integer(runif(1, 1, n))
try(V(g1)[verNames[ranNumber]])
}
end1 <- Sys.time()
cat("Name: Character Names >> ")
print (end1 - start1)
# -------- Test1: End -------- #
# -------- Test2: Begin -------- #
start2 <- Sys.time()
for (i in 1:n) {
ranNumber <- as.integer(runif(1, 1, n))
try(V(g2)[charIds[ranNumber]])
}
end2 <- Sys.time()
cat ("Name: Character Ids >> ")
print (end2 - start2)
# -------- Test2: End -------- #
I believe the difference you are getting has more to do with the way the code is timed than with a significant difference in actual times.
The problem seems to be the granularity of Sys.time(). It is not the best way to measure the running time of R code; much better is to use a package such as microbenchmark.
In what follows I have changed your code a little, starting by reducing the number of vertices from 50000 to 500.
library(igraph)
library(ggplot2)
library(microbenchmark)
f <- function(g, charVec, rn){
for (i in seq_along(rn)) {
try(V(g)[charVec[ rn[i] ]])
}
}
set.seed(9707)
n <- 500
g1 <- make_ring(n)
g2 <- make_ring(n)
verNames <- paste("VeryVeryVeryLongAndStrangeName:", 1:n, sep = "")
charIds <- as.character(1:n)
ranNumber <- sample.int(n, n, TRUE)
V(g1)$name <- verNames
V(g2)$name <- charIds
mb <- microbenchmark(
long = f(g1, verNames, ranNumber),
short = f(g2, charIds, ranNumber),
times = 100
)
mb
#Unit: seconds
# expr min lq mean median uq max neval
# long 1.48296 1.560941 1.612141 1.599182 1.647836 1.892210 100
# short 1.48385 1.550512 1.629363 1.598392 1.642355 4.287886 100
As you can see, the times are essentially the same, except for an outlier that shows up in the short-name graph access. And since a picture is worth a thousand words, microbenchmark includes a function to plot its results. (You will need package ggplot2.)
autoplot(mb)
I am attempting to build a large (~200 MM line) dataframe in R. Each entry in the dataframe will consist of approximately 10 digits (e.g. 1234.12345). The code is designed to walk through a list, subtracting the item in position [i] from every item after [i], but not from the items before [i] (if I were putting the output into a matrix it would be a triangular matrix). The code is simple and works fine on smaller lists, but I am wondering if there is a faster or more efficient way to do this? I assume the first part of the answer is going to entail "don't use a nested for loop," but I am not sure what the alternatives are.
The idea is that this will be an edge list for a social network analysis graph. Once I have outlist I will reduce the number of edges based on some criteria (<, >, ==) so the final list (and graph) won't be quite so ponderous (see the sketch after the code below).
#Fake data of same approximate dimensions as real data
dlist<-sample(1:20,20, replace=FALSE)
#purge the output list before running the loop
rm(outlist)
outlist<-data.frame()
for(i in 1:(length(dlist)-1)){
for(j in (i+1):length(dlist)){
outlist<-rbind(outlist, c(dlist[i],dlist[j], dlist[j]-dlist[i]))
}
}
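A rough sketch of the filtering/graph step mentioned above (the cutoff value and the column positions are placeholders, since outlist is built without column names):
library(igraph)
# hypothetical thresholding: keep pairs whose difference exceeds some cutoff,
# then build the graph from the first two columns as an edge list
edges <- outlist[outlist[, 3] > 0.5, ]
g <- graph_from_data_frame(edges[, 1:2], directed = FALSE)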
IIUC your final dataset will be ~200 million rows by 3 columns, all of type numeric, which takes a total space of:
200e6 (rows) * 3 (cols) * 8 (bytes) / (1024 ^ 3)
# ~ 4.5GB
That's quite a lot of data, where it's essential to avoid copies wherever possible.
Here's a method that uses the data.table package's unexported (internal) vecseq function (written in C, fast and memory efficient) and its assign-by-reference operator := to avoid copies.
fn1 <- function(x) {
require(data.table) ## 1.9.2
lx = length(x)
vx = as.integer(lx * (lx-1)/2)
# R v3.1.0 doesn't copy on doing list(.) - so should be even more faster there
ans = setDT(list(v1 = rep.int(head(x,-1L), (lx-1L):1L),
v2=x[data.table:::vecseq(2:lx, (lx-1L):1, vx)]))
ans[, v3 := v2-v1]
}
Benchmarking:
I'll benchmark with functions from other answers on your data dimensions. Note that my benchmark is on R v3.0.2, but fn1() should give better performance (both speed and memory) on R v3.1.0 because list(.) doesn't result in a copy anymore.
fn2 <- function(x) {
diffmat <- outer(x, x, "-")
ss <- which(upper.tri(diffmat), arr.ind = TRUE)
data.frame(v1 = x[ss[,1]], v2 = x[ss[,2]], v3 = diffmat[ss])
}
fn3 <- function(x) {
idx <- combn(seq_along(x), 2)
out2 <- data.frame(v1=x[idx[1, ]], v2=x[idx[2, ]])
out2$v3 <- out2$v2-out2$v1
out2
}
set.seed(45L)
x = runif(20e3L)
system.time(ans1 <- fn1(x)) ## 18 seconds + ~8GB (peak) memory usage
system.time(ans2 <- fn2(x)) ## 158 seconds + ~19GB (peak) memory usage
system.time(ans3 <- fn3(x)) ## 809 seconds + ~12GB (peak) memory usage
Note that fn2() due to use of outer requires quite a lot of memory (peak memory usage was >=19GB) and is slower than fn1(). fn3() is just very very slow (due to combn, and unnecessary copy).
Another way to create that data is
#Sample Data
N <- 20
set.seed(15) #for reproducibility
dlist <- sample(1:N,N, replace=FALSE)
we could do
idx <- combn(1:N,2)
out2 <- data.frame(i=dlist[idx[1, ]], j=dlist[idx[2, ]])
out2$dist <- out2$j-out2$i
This uses combn to create all pairs of indices in the data set rather than doing loops. This allows us to build the data.frame all at once rather than adding a row at a time.
We compare that to
out1 <- data.frame()
for(i in 1:(length(dlist)-1)){
for(j in (i+1):length(dlist)){
out1<-rbind(out1, c(dlist[i],dlist[j], dlist[j]-dlist[i]))
}
}
we see that
all(out1==out2)
# [1] TRUE
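For the benchmark below, the two approaches are assumed to be wrapped into functions along these lines (names chosen to match the output):
library(microbenchmark)
# hypothetical wrappers around the two chunks above, used only for timing
combdata <- function() {
  idx <- combn(1:N, 2)
  out2 <- data.frame(i = dlist[idx[1, ]], j = dlist[idx[2, ]])
  out2$dist <- out2$j - out2$i
  out2
}
loops <- function() {
  out1 <- data.frame()
  for (i in 1:(length(dlist) - 1)) {
    for (j in (i + 1):length(dlist)) {
      out1 <- rbind(out1, c(dlist[i], dlist[j], dlist[j] - dlist[i]))
    }
  }
  out1
}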
Plus, if we compare with microbenchmark we see that
microbenchmark(loops(), combdata())
# Unit: microseconds
# expr min lq median uq max neval
# loops() 30888.403 32230.107 33764.7170 34821.2850 82891.166 100
# combdata() 684.316 800.384 873.5015 940.9215 4285.627 100
The method that doesn't use loops is much faster.
You can always start with a triangular matrix and then make your dataframe directly from that:
vec <- 1:10
diffmat <- outer(vec,vec,"-")
ss <- which(upper.tri(diffmat),arr.ind = TRUE)
data.frame(one = vec[ss[,1]],
two = vec[ss[,2]],
diff = diffmat[ss])
You need to preallocate your output list; this will significantly increase the speed of your code. By preallocating I mean creating an output structure that already has the desired size, but filled with, for example, NAs.
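A minimal sketch of that idea for this example, filling a preallocated matrix and converting to a data frame once at the end (which is cheaper than growing a data frame row by row):
n <- length(dlist)
m <- n * (n - 1) / 2                        # number of pairs
outmat <- matrix(NA_real_, nrow = m, ncol = 3)
k <- 1
for (i in 1:(n - 1)) {
  for (j in (i + 1):n) {
    outmat[k, ] <- c(dlist[i], dlist[j], dlist[j] - dlist[i])
    k <- k + 1
  }
}
outlist <- as.data.frame(outmat)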