I have to calculate the following in R, where k, p and c are constants:

res = sum over i of log( sum from j = 1 to i-1 of k / (x[i] - x[j] + c)^p )

One way of doing this is:
xfun <- function(x, k, p, c) {
  ghhh <- numeric()
  for (i in 1:length(x)) {
    ghhh[i] <- sum(k / (x[i] - x[1:(i-1)] + c)^p)
  }
  res <- sum(log(ghhh))
  return(res)
}
But can I calculate this using "outer", so that it becomes faster?
The data is like:
t <- numeric(2000)
t[1] <- 0
for (i in 2:2000) {
  t[i] <- t[i-1] + rexp(1, 0.2)
}
k = 0.5; p = 1.2; c = 0.02
Your equation is a bit confusing. I'm not sure what should happen in the inner sum if i == 1: a sum from 1 to zero?
Based on some guessing (if I guessed wrong, you need to adjust the following), I suspect your function should be corrected to this:
xfun <- function(x, k, p, c) {
  ghhh <- numeric()  # it would be better practice to use numeric(length(x) - 1)
  for (i in 1:(length(x) - 1)) {
    ghhh[i] <- sum(k / (x[i+1] - x[1:i] + c)^p)
  }
  res <- sum(log(ghhh))
  return(res)
}
t <- numeric(2000)
t[1] <- 0
set.seed(42)
for (i in 2:2000) {
  t[i] <- t[i-1] + rexp(1, 0.2)
}
k = 0.5; p = 1.2; c = 0.02
xfun(t, k, p, c)
#[1] -1526.102
Rewritten with outer:
xfun1 <- function(x, k, p, c) {
  o <- outer(seq_along(x), seq_along(x), function(i, j) {
    res <- numeric(length(i))
    test <- j < i
    res[test] <- k / (x[i[test]] - x[j[test]] + c)^p
    res
  })
  sum(log(rowSums(o)[-1]))
}
xfun1(t, k, p, c)
#[1] -1526.102
Benchmarking:
library(microbenchmark)
microbenchmark(loop = xfun(t, k, p, c),
               outer = xfun1(t, k, p, c),
               times = 10)
#Unit: milliseconds
# expr min lq mean median uq max neval cld
# loop 186.0454 186.2375 188.9567 187.4005 189.0597 196.6992 10 a
# outer 263.4137 274.6610 346.4505 344.6918 423.3651 425.2885 10 b
As you see, the solution with outer is not faster for data of this size. The main reason is that R needs to allocate memory for a 2000 x 2000 matrix and work on this large object; in addition, your simple loop is already optimized by the JIT bytecode compiler.
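To put a rough number on that allocation (a back-of-the-envelope figure, not from the original answer; it counts only the 2000 x 2000 double matrix returned by outer, ignoring the index vectors and other intermediates):
2000^2 * 8 / 2^20   # bytes of one 2000 x 2000 double matrix, expressed in MiB
# [1] 30.51758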
If you want to be faster, you need to switch to a compiled language. Luckily, this is rather easy with Rcpp:
library(Rcpp)
library(inline)
cppFunction(
'double xfun2(const NumericVector x, const double k, const double p, const double c) {
  int n = x.length() - 1;
  NumericVector ghhh(n);
  for (int i = 0; i < n; ++i) {
    for (int j = 0; j <= i; ++j) {
      ghhh(i) += k / pow(x(i + 1) - x(j) + c, p);
    }
  }
  ghhh = log(ghhh);
  double res = sum(ghhh);
  return res;
}')
xfun2(t, k, p, c)
#[1] -1526.102
microbenchmark(loop = xfun(t, k, p, c),
               outer = xfun1(t, k, p, c),
               Rcpp = xfun2(t, k, p, c),
               times = 10)
#Unit: milliseconds
# expr min lq mean median uq max neval cld
# loop 186.0395 188.7875 189.7487 189.9298 191.6967 192.7213 10 b
# outer 408.4452 416.7730 432.3356 419.7510 422.4000 559.4279 10 c
# Rcpp 136.1496 136.1606 136.1929 136.1762 136.2129 136.3089 10 a
As you see, speed gains are minimal for data of this size (JIT compilation is truly marvelous). I suggest staying with your R loop.
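For reference (a side note, not part of the original answer), you can byte-compile the function explicitly with compiler::cmpfun, but since the JIT byte-compiler has been enabled by default since R 3.4.0 this usually brings no extra gain over the plain loop:
library(compiler)
xfun_c <- cmpfun(xfun)   # explicit byte-compilation of the corrected R function
xfun_c(t, k, p, c)
# [1] -1526.102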
Assuming that the logic you implement is the correct one, you could try R's parallel functionality:
library(foreach)
library(doParallel)
xfun2 <- function(x, k, p, c) {
  no_cores <- detectCores() - 1
  cl <- makeCluster(no_cores)
  registerDoParallel(cl)
  ghhh <- foreach(i = 1:length(x), .combine = c) %dopar%
    sum(k/(x[i] - x[1:(i-1)] + c)^p)
  stopCluster(cl)   # shut the workers down when done
  res <- sum(log(ghhh))
  return(res)
}
I ran it with x <- rnorm(100000, 1, 0.5) and the parallel version was almost twice as fast.
You can read more about this in the doParallel package vignette.
Related
How could I do something like this but in an optimal (vectorized) way in R?
N=10000
f <- 1.005
S0 <- 100
p <- 1/10

n <- seq(3, N)
S <- c(f*S0, f^2*S0, f^3*S0)
P <- c(0, 0, p*(f-1)*f^2*S0)
for (i in n) {
  R <- tail(S, 1) - tail(P, 1)
  S <- c(S, f*R)
  P <- c(P, p*(f-1)*R)
}
The final desired output is of course S and P (all the way up to row N+1). This computes a sequential time series row by row (each row above row 3 is a function of the previous row's values).
I tried to use lapply, but it's difficult to get a function to return two changes in the global environment (and the resulting table is also badly formatted).
The simplest step to speed up your code is to pre-allocate the vectors. Start S and P at their final lengths, rather than "growing" them each iteration of the loop. This results in a more than 100x speed-up of your code:
N <- 10000
f <- 1.005
S0 <- 100
p <- 1/10
original = function(N, f, S0, p) {
  n <- seq(3, N)
  S <- c(f*S0, f^2*S0, f^3*S0)
  P <- c(0, 0, p*(f-1)*f^2*S0)
  for (i in n) {
    R <- tail(S, 1) - tail(P, 1)
    S <- c(S, f*R)
    P <- c(P, p*(f-1)*R)
  }
  return(list(S, P))
}

pre_allocated = function(N, f, S0, p) {
  n <- seq(3, N)
  S <- c(f*S0, f^2*S0, f^3*S0, rep(NA, N - 3))
  P <- c(0, 0, p*(f-1)*f^2*S0, rep(NA, N - 3))
  for (i in n) {
    R <- S[i] - P[i]
    S[i + 1] <- f*R
    P[i + 1] <- p*(f-1)*R
  }
  return(list(S, P))
}
## Check that we get the same result
identical(original(N, f, S0, p), pre_allocated(N, f, S0, p))
# [1] TRUE
## See how fast it is
microbenchmark::microbenchmark(original(N, f, S0, p), pre_allocated(N, f, S0, p), times = 10)
# Unit: milliseconds
# expr min lq mean median uq max neval
# original(N, f, S0, p) 414.3610 419.9241 441.26030 426.01610 454.6002 538.0523 10
# pre_allocated(N, f, S0, p) 2.3306 2.6478 2.92908 3.05785 3.1198 3.2885 10
It's possible that a vectorized solution, perhaps using a function like cumprod, would be even faster, but I don't see a clear way to do it. If you can write out your result mathematically as a cumulative sum or product, that would make it clearer and possibly reveal a solution.
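Following that suggestion, here is a sketch of what writing the result out mathematically could look like (not part of the original answer; the name vectorised and the derivation are mine). From the loop, R_new = f*R - p*(f-1)*R = (f - p*(f-1)) * R, so the successive R values form a geometric sequence and S and P can be written down directly. The result is numerically equal to the loop versions, though not bit-identical, because powers replace repeated products:
vectorised <- function(N, f, S0, p) {
  g <- f - p * (f - 1)            # common ratio of the successive R values
  i <- 3:N
  R <- f^2 * S0 * g^(i - 2)       # R as computed at loop iteration i
  S <- c(f * S0, f^2 * S0, f^3 * S0, f * R)
  P <- c(0, 0, p * (f - 1) * f^2 * S0, p * (f - 1) * R)
  list(S, P)
}
all.equal(original(N, f, S0, p), vectorised(N, f, S0, p))
# TRUE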
I am implementing an algorithm, and as part of that I need to generate exponential random variables. Unfortunately, I can't really avoid looping, as each generated random variable depends on the previous one, so I think vectorisation is out of the question. There are some calculations that I do around the generation, but the bottleneck (at present) is the generation itself. At this point I am assuming N will be large (N >= 1,000,000).
Here is some example code:
N <- 1e7
# Preallocate
x <- rep(0, times = N)
# Set a starting seed
x[1] <- runif(1)
for (i in 2:N) {
  # Do some calculations
  x[i] <- x[i-1] + rexp(1, x[i-1])  # Bottleneck
  # Do some more calculations
}
How can I speed this up? I've tried implementing in Rcpp, but it doesn't seem to do much in this case. Is there another clever way I can get around the rexp() call in each iteration?
We can use the fact that if X ~ Exp(λ) then kX ~ Exp(λ/k) (source: Wikipedia) to speed up the code. This way we can do all the random draws with rate = 1 up front and then just divide within the loop to scale them appropriately.
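As a quick sanity check of that scaling property (an illustrative aside, not part of the original answer; r is an arbitrary rate): dividing rate-1 draws by r should give draws with rate r, so all three values below should be close to 1/r:
set.seed(1)
r <- 0.7
c(scaled = mean(rexp(1e6, rate = 1) / r),
  direct = mean(rexp(1e6, rate = r)),
  theory = 1 / r)
# all approximately 1.43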
draws = rexp(N, rate = 1)
x <- rep(0, times = N)
x[1] <- runif(1)
for (i in 2:N) {
  # Do some calculations
  x[i] <- x[i-1] + draws[i] / x[i-1]
  # Do some more calculations
}
A microbenchmark with N = 1e6 values shows this to be about 14x faster:
N <- 1e6
draws = rexp(N, rate = 1)
x <- rep(0, times = N)
x[1] <- runif(1)

microbenchmark::microbenchmark(
  draw_up_front = {
    draws = rexp(N, rate = 1)
    for (i in 2:N)
      x[i] <- x[i - 1] + draws[i] / x[i - 1]
  },
  draw_one_at_time = {
    for (i in 2:N)
      x[i] <- x[i - 1] + rexp(1, x[i - 1])
  },
  times = 10
)
# Unit: milliseconds
# expr min lq mean median uq max neval cld
# draw_up_front 153.9547 156.6552 159.9622 160.1901 161.9803 167.2831 10 a
# draw_one_at_time 2207.1997 2212.0460 2280.1265 2236.5197 2332.9913 2478.5104 10 b
A brute-force Rcpp solution:
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
NumericVector genExp(int N) {
NumericVector res(N);
double prev;
res[0] = unif_rand();
for (int i = 1; i < N; i++) {
prev = res[i-1];
res[i] = prev + exp_rand() / prev;
}
return res;
}
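Assuming the C++ code above is saved to a file (genExp.cpp is a hypothetical name), it can be compiled with Rcpp::sourceCpp; since it uses R's own RNG via unif_rand() and exp_rand(), set.seed() controls the draws as usual:
library(Rcpp)
sourceCpp("genExp.cpp")   # hypothetical filename; compiles and exports genExp()
set.seed(1)
x <- genExp(1e6)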
Microbenchmark with N = 1e6:
Unit: milliseconds
expr min lq mean median uq max neval
draw_up_front 167.17031 168.57345 170.62292 170.18072 171.73782 175.46868 20
draw_one_at_time 1415.01898 1465.57139 1510.81220 1502.15753 1550.07829 1623.70831 20
rcpp 28.25466 29.33682 33.52528 29.89636 30.74908 94.38009 20
With N = 1e7:
Unit: milliseconds
expr min lq mean median uq max neval
draw_up_front 1698.730 1708.739 1737.8633 1716.1345 1752.3276 1923.3940 20
rcpp 297.142 319.794 338.6755 327.6626 364.6308 398.1554 20
Is there a way to calculate the 4th column in the data.table (timeout) without a for loop? Every i-th row of that column uses the (i-1)-th row, so it takes a lot of time to generate as I increase the number of rows.
library(data.table)
dt <- data.table(
  id = 1:200,
  timein = cumsum(runif(200, 1, 6)),
  servtime = runif(200, 3, 4))
dt[, "timeout"] <- dt$timein  # initialisation of timeout column
# update column timeout
for (i in 2:200) {
  dt$timeout[i] <- max(dt$timein[i], dt$timeout[i-1]) + dt$servtime[i]
}
I don't see an easy way in base R to use vectorized operators to speed this up, but you could use Rcpp to speed up the operation:
library(Rcpp)
get.timeout <- cppFunction("
  NumericVector getTimeout(NumericVector timein, NumericVector servtime) {
    const int n = timein.size();
    NumericVector timeout(n);
    timeout[0] = timein[0];
    for (int i = 1; i < n; ++i) {
      timeout[i] = fmax(timein[i], timeout[i-1]) + servtime[i];
    }
    return timeout;
  }")
This is quicker than the solution with a for loop:
for.loop <- function(timein, servtime) {
  timeout <- timein  # use the function argument, not the global dt
  n <- length(timeout)
  for (i in 2:n) {
    timeout[i] <- max(timein[i], timeout[i-1]) + servtime[i]
  }
  return(timeout)
}
all.equal(for.loop(dt$timein, dt$servtime), get.timeout(dt$timein, dt$servtime))
# [1] TRUE
library(microbenchmark)
microbenchmark(for.loop(dt$timein, dt$servtime), get.timeout(dt$timein, dt$servtime))
# Unit: microseconds
# expr min lq mean median uq max neval
# for.loop(dt$timein, dt$servtime) 414.040 429.5315 438.68765 435.4000 445.1185 506.162 100
# get.timeout(dt$timein, dt$servtime) 22.432 23.9305 28.54934 27.9135 28.6670 97.259 100
The advantage will likely increase for larger inputs.
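As a usage sketch (not part of the original answer), the computed vector can be written back into the data.table by reference with the usual := syntax:
dt[, timeout := get.timeout(timein, servtime)]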
The JSD matrix is a similarity matrix of distributions based on the Jensen-Shannon divergence.
Given a matrix m whose rows represent distributions, we would like to find the JSD distance between each pair of distributions. The resulting JSD matrix is a square matrix with dimensions nrow(m) x nrow(m). It is a triangular matrix where each element contains the JSD value between two rows of m.
JSD can be calculated by the following R function:
JSD<- function(x,y) sqrt(0.5 * (sum(x*log(x/((x+y)/2))) + sum(y*log(y/((x+y)/2)))))
where x, y are rows in matrix m.
I experimented with different JSD matrix calculation algorithms in R to figure out the quickest one. To my surprise, the algorithm with two nested loops performs faster than the various vectorized versions (parallelized or not). I'm not happy with the results. Could you point me to better solutions than the ones I came up with?
library(parallel)
library(plyr)
library(doParallel)
library(foreach)
library(Matrix)   # for sparseMatrix(), used in dist.JSD_4 and dist.JSD_5 below
nodes <- detectCores()
cl <- makeCluster(4)
registerDoParallel(cl)
m <- runif(24000, min = 0, max = 1)
m <- matrix(m, 24, 1000)
prob_dist <- function(x) t(apply(x, 1, prop.table))
JSD<- function(x,y) sqrt(0.5 * (sum(x*log(x/((x+y)/2))) + sum(y*log(y/((x+y)/2)))))
m <- t(prob_dist(m))
m[m==0] <- 0.000001
Algorithm with two nested loops:
dist.JSD_2 <- function(inMatrix) {
  matrixColSize <- ncol(inMatrix)
  resultsMatrix <- matrix(0, matrixColSize, matrixColSize)
  for (i in 2:matrixColSize) {
    for (j in 1:(i-1)) {
      resultsMatrix[i,j] <- JSD(inMatrix[,i], inMatrix[,j])
    }
  }
  return(resultsMatrix)
}
Algorithm with outer:
dist.JSD_3 <- function(inMatrix) {
  matrixColSize <- ncol(inMatrix)
  resultsMatrix <- outer(1:matrixColSize, 1:matrixColSize,
                         FUN = Vectorize(function(i, j) JSD(inMatrix[,i], inMatrix[,j])))
  return(resultsMatrix)
}
Algorithm with combn and apply:
dist.JSD_4 <- function(inMatrix) {
  matrixColSize <- ncol(inMatrix)
  ind <- combn(matrixColSize, 2)
  out <- apply(ind, 2, function(x) JSD(inMatrix[,x[1]], inMatrix[,x[2]]))
  a <- rbind(ind, out)
  resultsMatrix <- sparseMatrix(a[1,], a[2,], x = a[3,],
                                dims = c(matrixColSize, matrixColSize))
  return(resultsMatrix)
}
Algorithm with combn and aaply:
dist.JSD_5 <- function(inMatrix) {
  matrixColSize <- ncol(inMatrix)
  ind <- combn(matrixColSize, 2)
  out <- aaply(ind, 2, function(x) JSD(inMatrix[,x[1]], inMatrix[,x[2]]))
  a <- rbind(ind, out)
  resultsMatrix <- sparseMatrix(a[1,], a[2,], x = a[3,],
                                dims = c(matrixColSize, matrixColSize))
  return(resultsMatrix)
}
Performance test:
library(microbenchmark)
mbm <- microbenchmark(
  two_loops = dist.JSD_2(m),
  outer = dist.JSD_3(m),
  combn_apply = dist.JSD_4(m),
  combn_aaply = dist.JSD_5(m),
  times = 10
)
ggplot2::autoplot(mbm)
> summary(mbm)
         expr      min       lq     mean   median       uq      max neval cld
1   two_loops 18.30857 18.68309 23.50231 18.77303 18.87891 65.34197    10 a
2       outer 38.93112 40.98369 42.44783 42.16858 42.85978 48.82437    10 b
3 combn_apply 20.45740 20.90747 21.49122 21.35042 22.06277 22.98803    10 a
4 combn_aaply 55.61176 56.77545 59.37358 58.93953 62.26417 64.77407    10 c
This is my implementation of your dist.JSD_2:
dist0 <- function(m) {
  ncol <- ncol(m)
  result <- matrix(0, ncol, ncol)
  for (i in 2:ncol) {
    for (j in 1:(i-1)) {
      x <- m[,i]; y <- m[,j]
      result[i, j] <-
        sqrt(0.5 * (sum(x * log(x / ((x + y) / 2))) +
                    sum(y * log(y / ((x + y) / 2)))))
    }
  }
  result
}
The usual steps are to replace iterative calculations with vectorized versions. I moved sqrt(0.5 * ...) from inside the loops, where it is applied to each element of result, to outside the loop, where it is applied to the vector result.
I realized that sum(x * log(x / ((x + y) / 2))) could be written as sum(x * log(2 * x)) - sum(x * log(x + y)). The first sum is calculated once for each entry, but could be calculated once for each column. It too comes out of the loops, with the vector of values (one element for each column) calculated as colSums(m * log(2 * m)).
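The rewrite rests on the identity log(x / ((x + y) / 2)) = log(2 * x) - log(x + y); a quick numerical spot check (illustrative only, not from the original answer):
x <- runif(5); y <- runif(5)
all.equal(sum(x * log(x / ((x + y) / 2))),
          sum(x * log(2 * x)) - sum(x * log(x + y)))
# TRUE (up to floating-point tolerance)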
The remaining term inside the inner loop is sum((x + y) * log(x + y)). For a given value of i, we can trade off space for speed by vectorizing this across all relevant y columns as a matrix operation
j <- seq_len(i - 1L)
xy <- m[, i] + m[, j, drop=FALSE]
xylogxy[i, j] <- colSums(xy * log(xy))
The end result is
dist4 <- function(m) {
  ncol <- ncol(m)
  xlogx <- matrix(colSums(m * log(2 * m)), ncol, ncol)
  xlogx2 <- xlogx + t(xlogx)
  xlogx2[upper.tri(xlogx2, diag=TRUE)] <- 0

  xylogxy <- matrix(0, ncol, ncol)
  for (i in seq_len(ncol)[-1]) {
    j <- seq_len(i - 1L)
    xy <- m[, i] + m[, j, drop=FALSE]
    xylogxy[i, j] <- colSums(xy * log(xy))
  }

  sqrt(0.5 * (xlogx2 - xylogxy))
}
Which produces results that are numerically equal (though not exactly identical) to the original
> all.equal(dist0(m), dist4(m))
[1] TRUE
and about 2.25x faster (dist.JSD_cpp2, included for comparison, is the Rcpp implementation from the answer below):
> microbenchmark(dist0(m), dist4(m), dist.JSD_cpp2(m), times=10)
Unit: milliseconds
expr min lq mean median uq max neval
dist0(m) 48.41173 48.42569 49.26072 48.68485 49.48116 51.64566 10
dist4(m) 20.80612 20.90934 21.34555 21.09163 21.96782 22.32984 10
dist.JSD_cpp2(m) 28.95351 29.11406 29.43474 29.23469 29.78149 30.37043 10
You'll still be waiting for about 10 hours, though, which seems to imply a very large problem. The algorithm seems to be quadratic in the number of columns, but the number of columns here was small (24) compared to the number of rows, so I wonder what the actual size of the data being processed is? There are ncol * (ncol - 1) / 2 distances to be calculated.
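For a sense of scale, here is that count for ncol = 24 (this example), 1000 (the parallel timing below) and 50000 (the company count mentioned later); a back-of-the-envelope calculation, not from the original answer:
nc <- c(24, 1000, 50000)
nc * (nc - 1) / 2
# [1]        276     499500 1249975000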
A crude approach to further performance gain is parallel evaluation, which the following implements using parallel::mclapply()
dist4p <- function(m, ..., mc.cores=detectCores()) {
  ncol <- ncol(m)
  xlogx <- matrix(colSums(m * log(2 * m)), ncol, ncol)
  xlogx2 <- xlogx + t(xlogx)
  xlogx2[upper.tri(xlogx2, diag=TRUE)] <- 0

  xx <- mclapply(seq_len(ncol)[-1], function(i, m) {
    j <- seq_len(i - 1L)
    xy <- m[, i] + m[, j, drop=FALSE]
    colSums(xy * log(xy))
  }, m, ..., mc.cores=mc.cores)

  xylogxy <- matrix(0, ncol, ncol)
  xylogxy[upper.tri(xylogxy, diag=FALSE)] <- unlist(xx)
  sqrt(0.5 * (xlogx2 - t(xylogxy)))
}
My laptop has 8 nominal cores, and for 1000 columns I have
> system.time(xx <- dist4p(m1000))
user system elapsed
48.909 1.939 8.043
This suggests that I get 48s of processor time in 8s of clock time. The algorithm is still quadratic, so this might reduce the overall computation time to about 1h for the full problem. Memory might become an issue on a multicore machine, where all processes compete for the same memory pool; it might be necessary to choose mc.cores smaller than the number available.
With large ncol, the way to get better performance is to avoid calculating the complete set of distances. Depending on the nature of the data, it might make sense to filter for duplicate columns, or to filter for informative columns (e.g., with greatest variance), or... An appropriate strategy requires more information on what the columns represent and what the goal is for the distance matrix. The question 'how similar is company i to other companies?' can be answered without calculating the full distance matrix, just a single row, so if the number of times the question is asked is small relative to the total number of companies, then maybe there is no need to calculate the full distance matrix. Another strategy might be to reduce the number of companies to be clustered by (1) simplifying the 1000 rows of measurement using principal components analysis, (2) k-means clustering of all 50k companies to identify, say, 1000 centroids, and (3) using the interpolated measurements and the Jensen-Shannon distance between these for clustering.
I'm sure there are better approaches than the following, but your JSD function itself can trivially be converted to an Rcpp function by just swapping sum and log for their Rcpp sugar equivalents, and using std::sqrt in place of R's base::sqrt.
#include <Rcpp.h>
// [[Rcpp::export]]
double cppJSD(const Rcpp::NumericVector& x, const Rcpp::NumericVector& y) {
return std::sqrt(0.5 * (Rcpp::sum(x * Rcpp::log(x/((x+y)/2))) +
Rcpp::sum(y * Rcpp::log(y/((x+y)/2)))));
}
I only tested with your dist.JSD_2 approach (since it was the fastest version), but you should see an improvement when using cppJSD instead of JSD regardless of the implementation:
R> microbenchmark::microbenchmark(
two_loops = dist.JSD_2(m),
cpp = dist.JSD_cpp(m),
times=100L)
Unit: milliseconds
expr min lq mean median uq max neval
two_loops 41.25142 41.34755 42.75926 41.45956 43.67520 49.54250 100
cpp 36.41571 36.52887 37.49132 36.60846 36.98887 50.91866 100
EDIT:
Actually, your dist.JSD_2 function itself can easily be converted to an Rcpp function for an additional speed-up:
// [[Rcpp::export("dist.JSD_cpp2")]]
Rcpp::NumericMatrix foo(const Rcpp::NumericMatrix& inMatrix) {
  size_t cols = inMatrix.ncol();
  Rcpp::NumericMatrix result(cols, cols);
  for (size_t i = 1; i < cols; i++) {
    for (size_t j = 0; j < i; j++) {
      result(i, j) = cppJSD(inMatrix(Rcpp::_, i), inMatrix(Rcpp::_, j));
    }
  }
  return result;
}
(where cppJSD was defined in the same .cpp file as the above). Here are the timings:
R> microbenchmark::microbenchmark(
two_loops = dist.JSD_2(m),
partial_cpp = dist.JSD_cpp(m),
full_cpp = dist.JSD_cpp2(m),
times=100L)
Unit: milliseconds
expr min lq mean median uq max neval
two_loops 41.25879 41.36729 42.95183 41.84999 44.08793 54.54610 100
partial_cpp 36.45802 36.62463 37.69742 36.99679 37.96572 44.26446 100
full_cpp 32.00263 32.12584 32.82785 32.20261 32.63554 38.88611 100
dist.JSD_2 <- function(inMatrix) {
  matrixColSize <- ncol(inMatrix)
  resultsMatrix <- matrix(0, matrixColSize, matrixColSize)
  for (i in 2:matrixColSize) {
    for (j in 1:(i-1)) {
      resultsMatrix[i,j] <- JSD(inMatrix[,i], inMatrix[,j])
    }
  }
  return(resultsMatrix)
}
##
dist.JSD_cpp <- function(inMatrix) {
  matrixColSize <- ncol(inMatrix)
  resultsMatrix <- matrix(0, matrixColSize, matrixColSize)
  for (i in 2:matrixColSize) {
    for (j in 1:(i-1)) {
      resultsMatrix[i,j] <- cppJSD(inMatrix[,i], inMatrix[,j])
    }
  }
  return(resultsMatrix)
}
m <- runif(24000, min = 0, max = 1)
m <- matrix(m, 24, 1000)
prob_dist <- function(x) t(apply(x, 1, prop.table))
JSD <- function(x,y) sqrt(0.5 * (sum(x*log(x/((x+y)/2))) + sum(y*log(y/((x+y)/2)))))
m <- t(prob_dist(m))
m[m==0] <- 0.000001
I'd like to take an increasing sequence of numbers (e.g. a series of times)
set.seed(41); d <- seq(1:100) + runif(100, 0, 1)
and if the difference between two sequential numbers is below a threshold, merge them into a single point by taking the mean of the two, then continue until the next combination is necessary. I resorted to functions I usually avoid (while and ifelse) to write a quick-and-dirty function; it works, but it isn't fast. Can you solve this task (1) more efficiently and (2) without invoking a for or while loop? Is there some built-in function, perhaps with even more functionality, that is well suited to such a task?
combine_points <- function(x, th=0.5)
{
  i <- 1                                # start i at 1
  while (min(diff(x)) < th)             # initiate while loop
  {
    ifelse(x[i+1] - x[i] < th,          # logical condition
           x[i] <- x[i+1] <-
             mean(c(x[i+1], x[i])),     # assignment if TRUE
           (x[i] <- x[i]))              # assignment if FALSE
    x <- sort(unique(x))  # get rid of the duplicated entry created when
                          # the ifelse statement was TRUE
    # increment i or reset i to 1 if it gets too large
    ifelse(i == length(x), i <- 1, i <- i + 1)
  }
  return(x)
}
newd <- combine_points(d)
th <- 0.5
which(diff(newd) < th)
integer(0)
Update to the benchmarks of the solutions so far.
I benchmarked with a larger sample vector, and the Rcpp solution suggested by @Roland is slower than my first while loop when the vector gets long. I made an improvement to the initial while loop and also made an Rcpp version of it. The benchmark results are below. Note that @flodel's answer is not directly comparable because it takes a fundamentally different approach to combining, but it is definitely very fast.
set.seed(41); d <- seq(1:4000) + runif(4000, 0, 1)
library(microbenchmark)
microbenchmark(
  combine_points.Frank(d, th=0.5),
  combine_points.Frank2(d, th=0.5),
  combine_points_Roland(d, th=0.5),
  combine_points_Roland2(d, th=0.5))
Unit: milliseconds
expr min lq median uq max neval
combine_points.Frank(d, th = 0.5) 2115.6391 2154.5038 2174.5889 2193.8444 7884.1638 100
combine_points.Frank2(d, th = 0.5) 1298.2923 1323.2214 1341.5357 1357.4260 15538.0872 100
combine_points_Roland(d, th = 0.5) 2497.9106 2506.5960 2512.3591 2519.0036 2573.2854 100
combine_points_Roland2(d, th = 0.5) 494.8406 497.3613 498.2347 499.8777 544.9743 100
This is a considerable improvement over my first attempt, and the following Rcpp version of it is the fastest so far:
combine_points.Frank2 <- function(x, th=0.5)
{
  i <- 1
  while (min(diff(x)) < th) {
    if (x[i+1] - x[i] >= th) {
      i <- i + 1
    } else {
      x[i] <- x[i+1] <- mean(c(x[i+1], x[i]))
      x <- unique(x)
    }
  }
  return(x)
}
Rcpp version:
cppFunction('
NumericVector combine_points_Roland2(NumericVector x, double th) {
  int i = 0;
  while (min(diff(x)) < th) {
    if ((x[i+1] - x[i]) >= th) {
      i = i + 1;
    } else {
      x[i] = (x[i+1] + x[i])/2;
      x[i+1] = x[i];
      x = sort_unique(x);
    }
  }
  return x;
}
')
Here is something faster. It avoids resizing/copying x in the loop.
#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
NumericVector combine_points_Roland3(NumericVector x, double th) {
  int i = 0, j;
  int n(x.size());
  while (i < n - 1) {
    if ((x[i+1] - x[i]) >= th) {
      i = i + 1;
    } else {
      x[i] = (x[i+1] + x[i])/2;
      n = n - 1;
      for (j = i+1; j < n; j++) {
        x[j] = x[j+1];
      }
    }
  }
  NumericVector y(n);
  for (i = 0; i < n; i++) {
    y[i] = x[i];
  }
  return y;
}
An R implementation of the same algorithm:
combine_points_Roland3R <- function(x, th) {
  i <- 1
  n <- length(x)
  while (i < n) {
    if ((x[i+1] - x[i]) >= th) {
      i <- i + 1
    } else {
      x[i] <- (x[i+1] + x[i])/2
      n <- n - 1
      x[(i+1):n] <- x[(i+2):(n+1)]
    }
  }
  x[1:n]
}
set.seed(41); d <- seq(1:4000) + runif(4000, 0, 1)
x2 <- combine_points_Roland2(d, 0.5)
x3 <- combine_points_Roland3(d, 0.5)
all.equal(x2, x3)
#TRUE
x4 <- combine_points_Roland3R(d, 0.5)
all.equal(x2, x4)
#TRUE
Benchmarks:
library(microbenchmark)
microbenchmark(combine_points_Roland2(d, 0.5),
               combine_points_Roland3(d, 0.5),
               combine_points_Roland3R(d, 0.5))
# Unit: microseconds
# expr min lq median uq max neval
# combine_points_Roland2(d, 0.5) 126458.64 131414.592 132355.4285 133422.2235 147306.728 100
# combine_points_Roland3(d, 0.5) 121.34 128.269 140.8955 143.3595 393.582 100
# combine_points_Roland3R(d, 0.5) 17564.24 18626.878 19155.6565 20910.2935 68707.888 100
See if this does what you want:
combine_points <- function(x, th=0.5) {
  group <- cumsum(c(FALSE, diff(x) > th))
  unname(sapply(split(x, group), mean))
}
combine_points(c(-1, 0.1, 0.2, 0.3, 1, 1.5, 2.0, 2.5, 3.0, 10), 0.5)
# [1] -1.0 0.2 2.0 10.0
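To see how the grouping works (an illustrative breakdown, not part of the original answer): cumsum over the logical gap indicator assigns a group id that increases whenever a gap exceeds th, and split()/sapply() then averages within each group. For the example vector above with th = 0.5:
x <- c(-1, 0.1, 0.2, 0.3, 1, 1.5, 2.0, 2.5, 3.0, 10)
cumsum(c(FALSE, diff(x) > 0.5))
# [1] 0 1 1 1 2 2 2 2 2 3    (four groups, with means -1.0, 0.2, 2.0 and 10.0)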
Here is a translation of your function into Rcpp. It uses sugar functions, which are very convenient, but often there are faster alternatives (RcppEigen or RcppArmadillo are good for that). And of course the algorithm could be improved.
#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
NumericVector combine_points1(NumericVector x, double th) {
  int i = 0;
  while (min(diff(x)) < th) {
    if ((x[i+1] - x[i]) < th) {
      x[i] = (x[i+1] + x[i])/2;
      x[i+1] = x[i];
    }
    x = sort_unique(x);
    if (i <= x.size()) {
      i = i + 1;
    } else {
      i = 1;
    }
  }
  return x;
}
I recommend using RStudio for writing Rcpp functions and sourcing them.
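For example, assuming the C++ code above is saved in a file (the name combine_points1.cpp is hypothetical), it can be compiled and made available in the R session with:
Rcpp::sourceCpp("combine_points1.cpp")  # hypothetical filename containing the code above
combine_points1(d, 0.5)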
all.equal(combine_points1(d, 0.5),
combine_points(d, 0.5))
#[1] TRUE
library(compiler)
combine_points_comp <- cmpfun(combine_points)

library(microbenchmark)
microbenchmark(combine_points1(d, 0.5),
               combine_points_comp(d, 0.5),
               combine_points(d, 0.5))
# Unit: microseconds
# expr min lq median uq max neval
# combine_points1(d, 0.5) 652.772 664.6815 683.1315 714.653 1030.171 100
# combine_points_comp(d, 0.5) 8344.839 8692.0880 9010.1470 10627.049 14117.553 100
# combine_points(d, 0.5) 8996.768 9371.0805 9687.0235 10560.226 12800.831 100
A speed-up by a factor of 14 without real effort.