Let's say I have a double like 3.5 and I would like to find out where it would sort into an existing sorted vector, say seq(1, 10); put differently, which index the new number would take in that vector. Of course it sits somewhere between 3 and 4, and hence between the third and fourth index, but what would be the fastest way to arrive at this result?
As mentioned in the comments, findInterval is fastest for this purpose. Even a very simple loop in C++ that does the same thing is a little slower on average.
library(Rcpp)
cppFunction("int find_index(double x, NumericVector v) {
  int len = v.size();
  for (int i = 0; i < len; ++i) {
    if (x <= v[i]) return i + 1;
  }
  return NA_INTEGER;
}")
microbenchmark::microbenchmark(
  findInterval = findInterval(453993.5, 1:1000000),
  find_index = find_index(453993.5, 1:1000000)
)
#> Unit: milliseconds
#> expr min lq mean median uq max neval
#> findInterval 1.9646 2.1739 2.996931 2.32375 2.4846 37.4218 100
#> find_index 2.2151 2.4502 11.319199 2.60925 2.9800 337.9229 100
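Applied to the example in the question, the usual pattern is to take the interval index that findInterval returns and add 1 to get the position the new number would occupy:
findInterval(3.5, seq(1, 10))      # 3: 3.5 falls after the third element
findInterval(3.5, seq(1, 10)) + 1  # 4: the index the new number would take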
Something like this?
First define dbl and my_seq,
then concatenate both with c(dbl, my_seq) and wrap the result in sort,
then find the index with which(my_vec == dbl):
dbl <- 3.5
my_seq <- seq(1,10)
my_vec <- sort(c(dbl, my_seq))
index <- which(my_vec == dbl)
index
output:
[1] 4
I have a random vector vec and want to make a new vector L without using a loop. Each new element of L depends on previous elements of L and on vec.
set.seed(0)
vec <- rnorm(20,0)
i = 2;
N <- length(vec) -1
L <- numeric(N-1)
constant <- 0.6
while (i < N) {
  L[i] = vec[i + 1] - vec[i] - constant * L[i - 1]
  i <- i + 1
}
L
# [1] 0.0000000 1.6560326 -1.0509895 -0.2271942 -1.8182750 1.7023480 -0.3875622 0.5214906 2.0975262 -2.8995756 0.1771427
# [12] -0.4549334 1.1311555 -0.6884468 0.3007724 0.4832709 -1.4341071 2.1880687
You want
L[1] = 0
L[i] = -constant * L[i - 1] + (vec[i + 1] - vec[i]), i = 2, 3, ...,
With dv <- diff(vec), the 2nd line becomes
L[i] = -constant * L[i - 1] + dv[i], i = 2, 3, ...
that is, an AR(1) process with lag-1 auto-correlation -constant and innovations dv[-1]. An AR(1) process can be generated efficiently by filter with the "recursive" method.
dv <- diff(vec)
L <- c(0, filter(dv[-1], -constant, "recursive"))
# [1] 0.0000000 1.6560326 -1.0509895 -0.2271942 -1.8182750 1.7023480
# [7] -0.3875622 0.5214906 2.0975262 -2.8995756 0.1771427 -0.4549334
#[13] 1.1311555 -0.6884468 0.3007724 0.4832709 -1.4341071 2.1880687
#[19] -2.9860629
I guess you mean while (i <= N) in your question. If you really do want i < N, then you have to drop the last element above, which can be done by
dv <- diff(vec)
L <- c(0, filter(dv[2:(length(dv) - 1)], -constant, "recursive"))
hours later...
Rui Barradas's benchmark brought this to my attention. For a short vec, any method is fast enough. For a long vec, filter is definitely faster, but in practice it suffers from coercion overhead, as filter expects and returns a "ts" (time series) object. It is better to call its workhorse C routine straight away:
AR1_FILTER <- function (x, filter, full = TRUE) {
  n <- length(x)
  AR1 <- .Call(stats:::C_rfilter, as.double(x), as.double(filter), double(n + 1L))
  if (!full) AR1 <- AR1[-1L]
  AR1
}
dv <- diff(vec)
L <- AR1_FILTER(dv[-1], -constant)
#L <- AR1_FILTER(dv[2:(length(dv) - 1)], -constant)
I am not interested in comparing AR1_FILTER with an R-level loop; I will just compare it with filter.
library(microbenchmark)
v <- runif(100000)
microbenchmark("R" = c(0, filter(v, -0.6, "recursive")),
"C" = AR1_FILTER(v, -0.6))
Unit: milliseconds
expr min lq mean median uq max neval
R 6.803945 7.987209 11.08361 8.074241 9.131967 54.672610 100
C 2.586143 2.606998 2.76218 2.644068 2.660831 3.845041 100
When you have to compute values based on previous values, the general-purpose answer is no, there is no way around a loop.
In your case I would use a for loop. It's simpler.
M <- numeric(N - 1)
for(i in seq_len(N)[-c(1, N)])
M[i] = vec[i + 1] - vec[i] - constant*M[i - 1]
identical(L, M)
#[1] TRUE
Note the use of seq_len, not 2:(N - 1).
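A minimal illustration of why (not from the original answer): with a small N, the colon operator counts downwards, whereas the seq_len form is simply empty.
N <- 2
2:(N - 1)             # 2 1 -- counts down, so the loop would run with bogus indices
seq_len(N)[-c(1, N)]  # integer(0) -- the loop is correctly skipped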
Edit.
I have timed my solution and the one by user 李哲源. The results clearly favor my solution.
f1 <- function(vec, constant = 0.6){
  N <- length(vec) - 1
  M <- numeric(N - 1)
  for(i in seq_len(N)[-c(1, N)]){
    M[i] = vec[i + 1] - vec[i] - constant*M[i - 1]
  }
  M
}
f2 <- function(vec, constant = 0.6){
  dv <- diff(vec)
  c(0, c(stats::filter(dv[2:(length(dv) - 1)], -constant, "recursive")))
}
L1 <- f1(vec)
L2 <- f2(vec)
identical(L, L1)
identical(L, L2)
microbenchmark::microbenchmark(
  loop = f1(vec),
  filter = f2(vec)
)
On my PC the ratio of the medians shows my code to be about 11 times faster.
I was thinking about using Rcpp for this, but one of the answers mentioned rfilter, built internally into R, so I had a look:
/* recursive filtering */
SEXP rfilter(SEXP x, SEXP filter, SEXP out)
{
    if (TYPEOF(x) != REALSXP || TYPEOF(filter) != REALSXP
        || TYPEOF(out) != REALSXP) error("invalid input");
    R_xlen_t nx = XLENGTH(x), nf = XLENGTH(filter);
    double sum, tmp, *r = REAL(out), *rx = REAL(x), *rf = REAL(filter);
    for(R_xlen_t i = 0; i < nx; i++) {
        sum = rx[i];
        for (R_xlen_t j = 0; j < nf; j++) {
            tmp = r[nf + i - j - 1];
            if(my_isok(tmp)) sum += tmp * rf[j];
            else { r[nf + i] = NA_REAL; goto bad3; }
        }
        r[nf + i] = sum;
    bad3:
        continue;
    }
    return out;
}
This function is already quite streamlined, and I don't think I could write an Rcpp version that beats it by much. I did a benchmark with this rfilter and the f1 function in the accepted answer:
f1 <- function(vec, constant = 0.6){
  N <- length(vec) - 1
  M <- numeric(N - 1)
  for(i in seq_len(N)[-c(1, N)]){
    M[i] = vec[i + 1] - vec[i] - constant*M[i - 1]
  }
  M
}
AR1_FILTER <- function (x, filter, full = TRUE) {
  n <- length(x)
  AR1 <- .Call(stats:::C_rfilter, as.double(x), as.double(filter), double(n + 1L))
  if (!full) AR1 <- AR1[-1L]
  AR1
}
f2 <- function (vec, constant) {
  dv <- diff(vec)
  AR1_FILTER(dv[2:(length(dv) - 1)], -constant)
}
library(microbenchmark)
Bench <- function (n) {
  vec <- runif(n)
  microbenchmark("R" = f1(vec, 0.6), "C" = f2(vec, 0.6))
}
For short vectors with length 100, I got
Bench(100)
Unit: microseconds
expr min lq mean median uq max neval
R 68.098 69.8585 79.05593 72.456 74.6210 244.148 100
C 66.423 68.5925 73.18702 69.793 71.1745 150.029 100
For large vectors with length 10000, I got
Bench(10000)
Unit: microseconds
expr min lq mean median uq max neval
R 6168.742 6699.9170 6870.277 6786.0415 6997.992 8921.279 100
C 876.934 904.6175 1192.000 931.9345 1034.273 2962.006 100
Yeah, there is no way that R is going to beat a compiled language.
library(dplyr)
L2 <- c(0,lead(vec) - vec - constant * lag(L))
L2 <- L2[!is.na(L2)]
L2
[1] 0.00000000 1.09605531 -0.62765133 1.81529867 -2.10535596 3.10864280 -4.36975556 1.41375965
[9] -1.08809820 2.16767510 -1.82140234 1.14748512 -0.89245650 0.03962074 -0.10930073 1.48162072
[17] -1.63074832 2.21593009
all.equal(L,L2)
[1] TRUE
I have 3 vectors and I want to apply the which() function separately to each of them.
I'm trying to find the max index of values less than some given number.
How can I perform this task in a vectorized way?
My 3 vectors (they may have various lengths):
vec1 <- c(1,2,3,4,5)
vec2 <- c(11,12,13)
vec3 <- c(1,2,3,4,5,6,7,8)
How can I vectorize it?
max(which(vec1<3))
max(which(vec2<12.3))
max(which(vec3<5.7))
The expected result is:
2
2
5
One way to get a speedup would be to use Rcpp to search for elements smaller than your cutoff, starting from the right side of the vector and moving left. You can return as soon as you find the element that meets your criteria, which means that if your target is near the right side of the vector you might avoid looking at most of the vector's elements (meanwhile, which looks at all vector elements and max looks at all values returned by which). The speedup would be largest for long vectors where the target element is close to the end.
library(Rcpp)
rightmost.small <- cppFunction(
  'double rightmostSmall(NumericVector x, const double cutoff) {
    for (int i = x.size() - 1; i >= 0; --i) {
      if (x[i] < cutoff) return i + 1; // 1-index
    }
    return 0; // None found
  }')
rightmost.small(vec1, 3)
# [1] 2
rightmost.small(vec2, 12.3)
# [1] 2
rightmost.small(vec3, 5.7)
# [1] 5
Let's look at the performance for a vector where we expect this to give us a big speedup:
set.seed(144)
vec.large <- rnorm(1000000)
all.equal(max(which(vec.large < -1)), rightmost.small(vec.large, -1))
# [1] TRUE
library(microbenchmark)
microbenchmark(max(which(vec.large < -1)), rightmost.small(vec.large, -1))
# Unit: microseconds
# expr min lq mean median uq max neval
# max(which(vec.large < -1)) 4912.016 8097.290 12816.36406 9189.0685 9883.9775 60405.585 100
# rightmost.small(vec.large, -1) 1.643 2.476 8.54274 8.8915 12.8375 58.152 100
For this vector of length 1 million, we see a speedup of about 1000x using the Rcpp code.
This speedup should carry over directly to the case where you have many vectors stored in a list; you can use @JoshO'Brien's mapply code and observe a speedup when you switch from max(which(...)) to the Rcpp code:
f <- function(v,m) max(which(v < m))
l <- list(vec.large)[rep(1, 100)]
m <- rep(-1, 100)
microbenchmark(mapply(f, l, m), mapply(rightmost.small, l, m))
Unit: microseconds
expr min lq mean median uq max neval
mapply(f, l, m) 865287.828 907893.8505 931448.1555 918637.343 935632.0505 1133909.950 100
mapply(rightmost.small, l, m) 253.573 281.6855 344.5437 303.094 335.1675 712.897 100
We see a 3000x speedup by using the Rcpp code here.
l <- list(vec1,vec2,vec3)
m <- c(3, 12.3, 5.7)
f <- function(v,m) max(which(v < m))
mapply(f,l,m)
# [1] 2 2 5
I am trying to do Conway–Maxwell–Poisson regression using COMPoissonReg in R.
However, it is extremely slow for large datasets, so I tried profiling and checking the source code.
The majority of the time (>95%) is spent in the function COMPoissonReg:::computez, which is equivalent to:
test <- function (lambda, nu, max = 100)
{
    forans <- matrix(0, ncol = max + 1, nrow = length(lambda))
    for (j in 1:max) {
        temp <- matrix(0, ncol = j, nrow = length(lambda))
        for (i in 1:j) {
            temp[, i] <- lambda/(i^nu)
        }
        for (k in 1:length(lambda)) {
            forans[k, j + 1] <- prod(temp[k, ])
        }
    }
    forans[, 1] <- rep(1, length(lambda))
    ans <- rowSums(forans)
    return(ans)
}
The ν in the formula is nu here, lambda is a vector, and max is the upper limit of the summation index s (here set to 100 as an approximation of infinity).
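For clarity (my reading of the loops above): for each element of lambda the function computes the truncated series sum over s = 0, ..., max of lambda^s / (s!)^nu, i.e. the COM-Poisson normalizing constant. A direct but slow vectorized equivalent, only to spell out what test() computes, would be:
test_direct <- function(lambda, nu, max = 100)
  sapply(lambda, function(l) sum(l^(0:max) / factorial(0:max)^nu))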
The question doesn't really require any special statistics background, but the link or link2 is here just in case.
Here is a simple script to test performance: it takes 8 seconds, and if I lazily compile it with cmpfun, it takes 4 seconds. I believe it can be improved further (without rewriting it in C; I am aiming for around ~0.05 seconds, so that I don't have to refactor the package code that calls this function iteratively).
lambda <- rnorm(10000, 1.5, 0.3)
Rprof(tmp <- tempfile())
sum(log(test(lambda, 1.2)))
Rprof()
summaryRprof(tmp)
Update
I realized another issue: floating-point arithmetic limitations. Computing the power series this way is dangerous, as it can overflow very quickly, especially if we vectorize. E.g. lambda ^ 100 certainly overflows if lambda > 10000. In another language I might accumulate the terms with a reduce, but I fear that in R Reduce is slow.
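A minimal illustration of the overflow, and of the log-scale workaround that some of the answers below use (the numbers are chosen only to show the effect):
lambda <- 20000
lambda^100                                      # Inf: the numerator of a single term overflows
exp(100 * log(lambda) - 1.2 * lfactorial(100))  # the full term lambda^100 / (100!)^1.2, computed on the log scale, stays finite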
You can make it much faster than the function you are using by avoiding loops. For example:
test2 <- function(lambda, nu, max = 100){
  len <- length(lambda)
  mm <- matrix(rep(lambda, each = max + 1), max + 1, len)
  mm <- mm^(0:max)
  mm <- mm / factorial(0:max)^nu
  colSums(mm)
}
This runs about 50 times faster with lambda of length 100:
> require(microbenchmark)
> lam<-rnorm(100)
> max(abs(test(lam,1.2)-test2(lam,1.2)))
[1] 4.510281e-16
> microbenchmark(test(lam,1.2),test2(lam,1.2),times=10)
Unit: milliseconds
expr min lq median uq max neval
test(lam, 1.2) 77.124705 77.422619 78.241945 79.635746 81.260280 10
test2(lam, 1.2) 1.335716 1.373116 1.401411 1.507765 1.562447 10
You can probably optimize it a little more, but this should get most of the gains, unless there is some kind of builtin function you can exploit rather than doing the sum explicitly.
On input of length 10000, it takes 0.148 seconds on my machine, versus 6.850 seconds for test:
> lam<-rnorm(10000)
> max(abs(test(lam,1.2)-test2(lam,1.2)))
[1] 3.552714e-15
> system.time(test2(lam,1.2))
user system elapsed
0.132 0.016 0.148
> system.time(test(lam,1.2))
user system elapsed
6.780 0.056 6.850
OK, here's an Rcpp answer. As expected, it's a lot faster than either of the others.
require(Rcpp)
rcppfun<-"
Rcpp::NumericVector myfun(Rcpp::NumericVector lambda,
Rcpp::NumericVector weights)
{
int num = lambda.size();
int max = weights.size();
std::vector<double> r(num);
for(int i=0; i<num; i++){
double total = 0;
double prod = 1;
for(int j=0; j<max; j++){
total += prod/weights[j];
prod *= lambda[i];
}
r[i]=total;
}
return Rcpp::wrap(r);
}
"
testRcpp<-cppFunction(rcppfun)
test5<-function(lambda,nu,max=100){
wts<-factorial(0:max)^nu
testRcpp(lambda,wts)
}
This is about 40x faster than my original test2 and about 12x faster than @NealFultz's improved logarithmic implementation.
> lam<-abs(rnorm(10000))
> max(abs(test5(lam,1.2)-test2(lam,1.2)))
[1] 7.105427e-15
> microbenchmark(test2(lam,1.2),test3(lam,1.2),test4(lam,1.2),test5(lam,1.2))
Unit: milliseconds
expr min lq median uq max neval
test2(lam, 1.2) 125.601616 126.790516 127.700099 135.182263 222.340179 100
test3(lam, 1.2) 125.523424 126.666410 126.921035 131.316254 178.633839 100
test4(lam, 1.2) 41.734015 42.640340 43.190553 50.932952 97.765219 100
test5(lam, 1.2) 3.432029 3.501046 3.519007 3.532603 3.754232 100
On edit, here's one more Rcpp version that should at least partially address the overflow issue, by computing each term incrementally, rather than the numerator and denominator separately.
rcppfun2<-"
Rcpp::NumericVector myfun2(Rcpp::NumericVector lambda, Rcpp::NumericVector nu){
int num = lambda.size();
int max = nu.size();
std::vector<double> r(num);
for(int i=0; i<num; i++){
double term = 1;
double total = 1;
for(int j=0; j< max; j++){
term *= (lambda[i]/nu[j]);
total += term;
}
r[i]=total;
}
return Rcpp::wrap(r);
}
"
testRcpp2<-cppFunction(rcppfun2)
test6<-function(lambda,nu,max=100){
testRcpp2(lambda,(1:max)^nu)
}
> lam<-abs(rnorm(10000))
> max(abs(test2(lam,1.2)-test6(lam,1.2)))
[1] 1.065814e-14
> microbenchmark(test5(lam,1.2),test6(lam,1.2))
Unit: milliseconds
expr min lq median uq max neval
test5(lam, 1.2) 3.416786 3.426013 3.435492 3.444196 3.604486 100
test6(lam, 1.2) 3.554147 3.572285 3.580865 3.588030 3.840713 100
I slept on it and came up with another big improvement, if you can use the gsl package. All you're doing is evaluating a polynomial:
require(gsl)
test5 <- function(lambda, nu, max = 100){
  gsl_poly(factorial(0:max)^-nu, lambda)
}
R>microbenchmark(test2(1:50,5.1), test4(1:50,5.1), test5(1:50,5.1))
Unit: microseconds
expr min lq median uq max neval
test2(1:50, 5.1) 4518.957 4838.5185 5318.5040 5617.6330 19978.039 100
test4(1:50, 5.1) 2043.422 2268.3490 2472.0430 2727.1045 10328.376 100
test5(1:50, 5.1) 311.144 407.2465 476.0755 540.6095 1138.766 100
Following up on @mrips, sometimes working on the log scale can be a bit faster because you can multiply instead of exponentiate:
test4 <- function(lambda, nu, max = 100){
  mm <- matrix(log(lambda), max, length(lambda), byrow = TRUE)
  mm <- mm * (1:max) - nu * lfactorial(1:max)
  1 + colSums(exp(mm))
}
I've also factored out the special case where s = 0. Here are my timings:
R>microbenchmark(test2(1:50,5), test4(1:50,5))
Unit: microseconds
expr min lq median uq max neval
test2(1:50, 5) 952.360 1432.6600 1436.4525 1440.1860 3467.981 100
test4(1:50, 5) 695.189 1041.4785 1042.8315 1045.6525 2970.441 100
I have a vector of positive and negative numbers
vec<-c(seq(-100,-1), rep(0,20), seq(1,100))
The actual vector is larger than this example and takes on a random set of values. I have to repeatedly find the number of negative numbers in the vector, and I am finding this quite inefficient.
Since I only need to find the number of negative numbers, and the vector is sorted, I only need to know the index of the first 0 or positive number (there may be no 0s in the actual random vectors).
Currently I am using this code to find the length
length(which(vec<0))
but this forces R to go through the entire vector, even though, since it is sorted, there is no need.
I could use
match(0, vec)
but my vector does not always have 0s
So my question is, is there some kind of match() function that applies a condition instead of finding a specific value? Or is there a more efficient way to run my which() code?
The solutions offered so far all imply creating a logical(length(vec)) and doing a full or partial scan on this. As you note, the vector is sorted. We can exploit this by doing a binary search. I started thinking I'd be super-clever and implement this in C for even greater speed, but had trouble with debugging the indexing of the algorithm (which is the tricky part!). So I wrote it in R:
f3 <- function(x) {
  imin <- 1L
  imax <- length(x)
  while (imax >= imin) {
    imid <- as.integer(imin + (imax - imin) / 2)
    if (x[imid] >= 0)
      imax <- imid - 1L
    else
      imin <- imid + 1L
  }
  imax
}
For comparison with the other suggestions
f0 <- function(v) length(which(v < 0))
f1 <- function(v) sum(v < 0)
f2 <- function(v) which.min(v < 0) - 1L
and for fun
library(compiler)
f3.c <- cmpfun(f3)
Leading to
> vec <- c(seq(-100,-1,length.out=1e6), rep(0,20), seq(1,100,length.out=1e6))
> identical(f0(vec), f1(vec))
[1] TRUE
> identical(f0(vec), f2(vec))
[1] TRUE
> identical(f0(vec), f3(vec))
[1] TRUE
> identical(f0(vec), f3.c(vec))
[1] TRUE
> microbenchmark(f0(vec), f1(vec), f2(vec), f3(vec), f3.c(vec))
Unit: microseconds
expr min lq median uq max neval
f0(vec) 15274.275 15347.870 15406.1430 15605.8470 19890.903 100
f1(vec) 15513.807 15575.229 15651.2970 17064.8830 18326.293 100
f2(vec) 21473.814 21558.989 21679.3210 22733.1710 27435.889 100
f3(vec) 51.715 56.050 75.4495 78.5295 100.730 100
f3.c(vec) 11.612 17.147 28.5570 31.3160 49.781 100
Probably there are some tricky edge cases that I've got wrong! Moving to C, I did
library(inline)
f4 <- cfunction(c(x = "numeric"), "
int imin = 0, imax = Rf_length(x) - 1, imid;
while (imax >= imin) {
imid = imin + (imax - imin) / 2;
if (REAL(x)[imid] >= 0)
imax = imid - 1;
else
imin = imid + 1;
}
return ScalarInteger(imax + 1);
")
with
> identical(f3(vec), f4(vec))
[1] TRUE
> microbenchmark(f3(vec), f3.c(vec), f4(vec))
Unit: nanoseconds
expr min lq median uq max neval
f3(vec) 52096 53192.0 54918.5 55539.0 69491 100
f3.c(vec) 10924 12233.5 12869.0 13410.0 20038 100
f4(vec) 553 796.0 893.5 1004.5 2908 100
findInterval came up when a similar question was asked on the R-help list. It is slow but safe, checking that vec is actually sorted and dealing with NA values. If one wants to live on the edge (arguably no worse than implementing f3 or f4), then
f5.i <- function(v)
  .Internal(findInterval(v, 0 - .Machine$double.neg.eps, FALSE, FALSE))
is nearly as fast as the C implementation, but likely more robust and vectorized (i.e., look up a vector of values in the second argument, for easy range-like calculations).
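For reference (not part of the original answer), the same count is available through the documented interface, which keeps the sortedness and NA checks; on the example vec above:
findInterval(-.Machine$double.neg.eps, vec)  # 1000000, the number of negative elements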
Use sum() and logical comparison:
sum( vec < 0 )
[1] 100
This will be pretty quick, and when you sum a logical, TRUE is 1 and FALSE is 0 so the total will be the number of negative values.
Uh oh, I feel the need for a benchmarking comparison... :-) Vector length is 2e5
library(microbenchmark)
vec<-c(seq(-100,-1,length.out=1e5), rep(0,20), seq(1,100,length.out=1e5))
microbenchmark( (which.min(vec < 0) - 1L) , (sum( vec < 0 )) )
Unit: milliseconds
expr min lq median uq max neval
(which.min(vec < 0) - 1L) 1.883847 2.130746 2.554725 3.141787 75.943911 100
(sum(vec < 0)) 1.398100 1.500639 1.508688 1.745088 2.662164 100
You could use which.min
which.min(vec < 0) - 1L
This returns the index of the first FALSE (i.e. the first non-negative element), so subtracting 1 gives the count of negative values.
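A quick check on the vector from the question:
vec <- c(seq(-100, -1), rep(0, 20), seq(1, 100))
which.min(vec < 0) - 1L
# [1] 100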