Is there a way in Rcpp to return an R function with some pre-computed values that are only computed on the first function call? Consider the following R code:
1: func_generator<-function(X) {
2: X_tot<-sum(X)
3: function(b_vec) { (X_tot*b_vec) }
4: }
5: myfunc<-func_generator(c(3,4,5))
6: myfunc(1:2)
7: myfunc(5:6)
8: myfunc2<-func_generator(c(10,11,12,13))
...
Can this be programmed in Rcpp? In practice, assume that something more computationally intensive is done in place of line 2.
To add context: given a vector X and scalar b, there is some likelihood function f(b|X), which can be re-expressed as f(b, s(X)) for some sufficient statistic s(X) that is a function only of X and involves some computation. This is part of a computationally intensive computer experiment with many vectors X (many likelihoods) and many separate calls to f(bvec|X) for each likelihood, so I'd rather compute s(X) once per likelihood and save it in some fashion than re-compute it many times. I've started by simply programming f(bvec, X) to evaluate f(b|X) at the points bvec = (b_1, ..., b_n), but this has extra overhead, since I call this function several times and it computes s(X) on each run. I'd like to compute s(X) just once.
Any suggestions to accomplish this task efficiently in Rcpp would be appreciated (whether via returning a function; or via storing intermediate calculations in some other fashion).
One simple way to store intermediate results would be a static variable at function level:
// [[Rcpp::plugins(cpp11)]]
#include <thread>
#include <chrono>
#include <Rcpp.h>
// [[Rcpp::export]]
Rcpp::NumericVector foo(Rcpp::NumericVector X, Rcpp::NumericVector b, bool useCache = true) {
static double cache;
static bool initialized{false};
if (!(useCache && initialized)) {
// sleep to simulate actual work
std::this_thread::sleep_for (std::chrono::seconds(1));
cache = Rcpp::sum(X);
initialized = true;
}
return cache * b;
}
/*** R
X <- 1:10
b <- 10:20
system.time(r1 <- foo(X, b))
system.time(r2 <- foo(X, b))
all.equal(r1, r2)
system.time(r3 <- foo(X, b, FALSE))
all.equal(r1, r3)
*/
Output:
> system.time(r1 <- foo(X, b))
user system elapsed
0 0 1
> system.time(r2 <- foo(X, b))
user system elapsed
0.002 0.000 0.002
> all.equal(r1, r2)
[1] TRUE
> system.time(r3 <- foo(X, b, FALSE))
user system elapsed
0 0 1
> all.equal(r1, r3)
[1] TRUE
When the cache is used in the second function call, the result is computed almost instantaneously.
This approach is efficient if you can loop over the different b within a loop over the different X. If this restriction does not work for you, then you could use the memoise package at the R level to efficiently store the output of your expensive function for arbitrary input:
// [[Rcpp::plugins(cpp11)]]
#include <thread>
#include <chrono>
#include <Rcpp.h>
// [[Rcpp::export]]
Rcpp::NumericVector foo(double total, Rcpp::NumericVector b) {
return total * b;
}
// [[Rcpp::export]]
double bar(Rcpp::NumericVector X) {
// sleep to simulate actual work
std::this_thread::sleep_for (std::chrono::seconds(1));
return Rcpp::sum(X);
}
/*** R
X1 <- 1:10
b1 <- 10:20
X2 <- 10:1
b2 <- 20:10
library(memoise)
bar2 <- memoise(bar)
system.time(r11 <- foo(bar2(X1), b1))
system.time(r21 <- foo(bar2(X2), b2))
system.time(r12 <- foo(bar2(X1), b1))
system.time(r22 <- foo(bar2(X2), b2))
all.equal(r11, r12)
all.equal(r21, r22)
*/
Output:
> system.time(r11 <- foo(bar2(X1), b1))
user system elapsed
0.001 0.000 1.001
> system.time(r21 <- foo(bar2(X2), b2))
user system elapsed
0.033 0.000 1.033
> system.time(r12 <- foo(bar2(X1), b1))
user system elapsed
0 0 0
> system.time(r22 <- foo(bar2(X2), b2))
user system elapsed
0 0 0
> all.equal(r11, r12)
[1] TRUE
> all.equal(r21, r22)
[1] TRUE
As an alternative you could also use these two functions as building blocks for your function generator:
func_generator <- function(X) {
X_tot <- bar(X)
function(b_vec) { foo(X_tot, b_vec) }
}
myfunc <- func_generator(c(3,4,5))
myfunc2 <- func_generator(c(10,11,12,13))
myfunc(1:2)
myfunc(5:6)
myfunc2(1:2)
myfunc2(5:6)
So keep the numerically expensive work in C++, but keep it simple; the functional aspects can then be added in R.
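If you specifically want the precomputation to happen only on the first call, as in the original question, a lazily caching closure at the R level works too. A minimal sketch, assuming foo() and bar() are compiled as above (func_generator_lazy is a hypothetical name):
func_generator_lazy <- function(X) {
  X_tot <- NULL
  function(b_vec) {
    # compute the sufficient statistic once, on the first call only
    if (is.null(X_tot)) X_tot <<- bar(X)
    foo(X_tot, b_vec)
  }
}
myfunc <- func_generator_lazy(c(3,4,5))
myfunc(1:2)  # first call pays for bar(); later calls reuse X_tot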
Related
I have a data frame in R that consists of two (numeric) columns, say A and B. I would like to construct the column consisting of the elements
A1+B1
(A1+B1)*A2+B2
((A1+B1)*A2+B2)*A3+B3
...
This probably is an easy one-liner, but I don't see it.
Edit: deleted the word vectorized from the title, since I'm basically just interested in any elegant solution (the dumb ones I can do myself). In F# - which I'm more familiar with - this would be something like (assuming the elements would be in a list as tuples, which would be more idiomatic):
ABlist |> List.fold (fun acc (a,b) -> acc*a+b) 1
Which is still very short and clear. I'm dragging this up because I'm an R noob and unfamiliar with it, but I have read somewhere that it's a functional language, so I would guess a solution in terms of a fold over a data frame exists?
This is a different answer, driven by the failure of Bram's effort at using Reduce. It builds a list holding the (A, B) pairs and then sets the initial value for the accumulator to 1 so that the first multiplication doesn't get zeroed out:
abList <- mapply(c, A,B, SIMPLIFY=FALSE) # keeps as a list
Reduce( function(acc,x) {acc*x[1]+x[2]},
abList,init=c(1,0),
accumulate=TRUE)[-1] # removes the initialization value
#--------
[[1]]
[1] 4 3
[[2]]
[1] 10 8
[[3]]
[1] 31 25
[[4]]
[1] 128 104
[[5]]
[1] 645 525
It might take some further work with sapply/lapply(..., "[", 1) to pull out the accumulator:
> sapply( Reduce( function(acc,x) {acc*x[1]+x[2]},
+ abList,init=c(1,0),
+ accumulate=TRUE)[-1], "[", 1)
[1] 4 10 31 128 645
OK, turns out I was too lazy; I figured it out myself (note: there was already a good answer using Rcpp before this, but I can't use that at work). It's basically just a translation to R of what I wrote in my edit on how I would do this in F#:
a <- c(1,2,3)
b <- c(4,5,6)
abList <- list(a,b)
Reduce( function(acc,x) {acc*x[[1]]+x[[2]]},
abList,
accumulate=TRUE)
Does the trick. Edit: as per the comment below, it actually doesn't do the trick. If one builds abList by
abList <- apply(rbind(a,b),2,as.pairlist)
and then folds by:
Reduce(function(acc,x) {(acc*x[[1]])+x[[2]]},abList,1,accumulate=TRUE)
One gets the right answer (with a 1 prepended, because that's the initial value for the accumulator).
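A quick check of the corrected version, with a <- c(1,2,3) and b <- c(4,5,6) as above (the leading 1 is the initializer, dropped with [-1]):
abList <- apply(rbind(a, b), 2, as.pairlist)
Reduce(function(acc, x) { (acc * x[[1]]) + x[[2]] }, abList, 1, accumulate = TRUE)[-1]
# [1]  5 15 51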
This is relatively straightforward in Rcpp, which won't have the performance problems you would see if you tried to implement this with loops in R.
library(Rcpp)
sum.prod <- cppFunction(
"NumericVector sum_prod(NumericVector x, NumericVector y) {
NumericVector result(x.size());
result[0] = x[0] + y[0];
for (int i=1; i < x.size(); ++i) result[i] = result[i-1]*x[i] + y[i];
return result;
}")
sum.prod(c(1, 2, 3, 4, 5), c(3, 2, 1, 4, 5))
# [1] 4 10 31 128 645
I've found Rcpp to be the simplest way to speed up hard-to-vectorize computations.
X = 1; for ( i in seq(length(A) ) ) { X= B[i]+A[i]*X; print(X) }
[1] 4
[1] 10
[1] 31
[1] 128
[1] 645
If you want to accumulate rather than report:
X = 1; for ( i in seq(length(A) ) ) { X[1+i]= B[i]+A[i]*X[i] }; X[-1]
#[1] 4 10 31 128 645
It will be ploddingly slow compared to the Rcpp solution, but if you need to do the compilation step on the fly, it's only when the lengths are more than 1000 that you might even notice the difference:
> A <- sample(1:10000, 1000); B <- sample(1:10000, 1000)
> system.time( {X = 1; for ( i in seq(length(A) ) ) { X[1+i]= B[i]+A[i]*X[i] }; X[-1]})
user system elapsed
0.014 0.002 0.017
> library(Rcpp)
> system.time( {sum.prod <- cppFunction(
+ "NumericVector sum_prod(NumericVector x, NumericVector y) {
+ NumericVector result(x.size());
+ result[0] = x[0] + y[0];
+ for (int i=1; i < x.size(); ++i) result[i] = result[i-1]*x[i] + y[i];
+ return result;
+ }")
+ sum.prod(A,B) } )
user system elapsed
0.012 0.002 0.014
I have a nonzero symmetric matrix 'matr' that is 12000x12000. I need to find the indices of the top 10000 elements in 'matr' in R. The code I have written takes a long time - I was wondering if there were any pointers to make it faster.
listk <- numeric(0)
for( i in 1:10000) {
idx <- which(matr == max(matr), arr.ind=T)
if( length(idx) != 0) {
listk <- rbind( listk, idx[1,])
matr[idx[1,1], idx[1,2]] <- 0
matr[idx[2,1], idx[2,2]] <- 0
}
}
Here's how you might find the indices (ij) of the 4 largest elements in a 10-by-10 matrix m.
## Sample data
m <- matrix(runif(100), ncol=10)
## Extract the indices of the 4 largest elements
(ij <- which(m >= sort(m, decreasing=T)[4], arr.ind=TRUE))
# row col
# [1,] 2 1
# [2,] 5 1
# [3,] 6 2
# [4,] 3 10
## Use the indices to extract the values
m[ij]
# [1] 0.9985190 0.9703268 0.9836373 0.9914510
Edit:
For large matrices, performing a partial sort will be a faster way to find the 10,000th largest element:
v <- runif(1e7)
system.time(a <- sort(v, decreasing=TRUE)[10000])
# user system elapsed
# 4.35 0.03 4.38
system.time(b <- -sort(-v, partial=10000)[10000])
# user system elapsed
# 0.60 0.09 0.69
a==b
# [1] TRUE
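Putting the two pieces together for the original 12000x12000 problem, a sketch might look like the following (matr as in the question; note that ties at the threshold can return slightly more than 10000 indices):
## partial sort to find the 10,000th largest value ...
thresh <- -sort(-matr, partial=10000)[10000]
## ... then recover the (row, col) indices of everything at or above it
ij <- which(matr >= thresh, arr.ind=TRUE)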
I like @JoshO'Brien's answer; the use of partial sorting is great! Here's an Rcpp solution (I'm not a strong C++ programmer, so there are probably bone-headed errors; corrections welcome... how would I template this in Rcpp, to handle different types of input vector?)
I start by including the appropriate headers and using namespaces for convenience
#include <Rcpp.h>
#include <queue>
using namespace Rcpp;
using namespace std;
Then arrange to expose my C++ function to R
// [[Rcpp::export]]
IntegerVector top_i_pq(NumericVector v, int n)
and define some variables, most importantly a priority_queue that holds, as a pair, the numeric value and its index. The queue is ordered so the smallest values are at the 'top', with 'smallest' determined by the standard pair<> comparator.
typedef pair<double, int> Elt;
priority_queue< Elt, vector<Elt>, greater<Elt> > pq;
vector<int> result;
Now I'll walk through the input data, adding an element to the queue if either (a) I don't yet have enough values or (b) the current value is larger than the smallest value in the queue. In the latter case, I pop off the smallest value and insert its replacement. In this way the priority queue always contains the n largest elements.
for (int i = 0; i != v.size(); ++i) {
if (pq.size() < n)
pq.push(Elt(v[i], i));
else {
Elt elt = Elt(v[i], i);
if (pq.top() < elt) {
pq.pop();
pq.push(elt);
}
}
}
And finally I pop the indexes from the priority queue into the return vector, remembering to translate to 1-based R coordinates.
result.reserve(pq.size());
while (!pq.empty()) {
result.push_back(pq.top().second + 1);
pq.pop();
}
and return the result to R
return wrap(result);
This has nice memory use (the priority queue and return vector are both small relative to the original data) and is fast
> library(Rcpp); sourceCpp("top_i_pq.cpp"); z <- runif(12000 * 12000)
> system.time(top_i_pq(z, 10000))
user system elapsed
0.992 0.000 0.998
Problems with this code include:
1. The default comparator greater<Elt> works so that, in the case of a tie spanning the value of the nth element, the last, rather than the first, duplicate is retained.
2. NA values (and non-finite values?) may not be handled correctly; I'm not sure whether this is true or not.
3. The function only works for NumericVector input, but the logic is appropriate for any R data type for which an appropriate ordering relationship is defined.
Problems 1 and 2 can likely be dealt with by writing an appropriate comparator; maybe for 2 this is already implemented in Rcpp? I don't know how to leverage C++ language features and the Rcpp design to avoid re-implementing the function for each data type I want to support.
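For problem 2, one simple possibility is to skip non-finite input before it reaches the queue; a sketch of the modified loop, assuming R's ISNAN macro (available once Rcpp.h is included) is acceptable for your data:
for (int i = 0; i != v.size(); ++i) {
    if (ISNAN(v[i]))
        continue;                  // ignore NA and NaN entirely
    if (pq.size() < (size_t) n)    // cast avoids a signed/unsigned warning
        pq.push(Elt(v[i], i));
    else {
        Elt elt = Elt(v[i], i);
        if (pq.top() < elt) {
            pq.pop();
            pq.push(elt);
        }
    }
}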
Here's the full code:
#include <Rcpp.h>
#include <queue>
using namespace Rcpp;
using namespace std;
// [[Rcpp::export]]
IntegerVector top_i_pq(NumericVector v, int n)
{
typedef pair<double, int> Elt;
priority_queue< Elt, vector<Elt>, greater<Elt> > pq;
vector<int> result;
for (int i = 0; i != v.size(); ++i) {
if (pq.size() < n)
pq.push(Elt(v[i], i));
else {
Elt elt = Elt(v[i], i);
if (pq.top() < elt) {
pq.pop();
pq.push(elt);
}
}
}
result.reserve(pq.size());
while (!pq.empty()) {
result.push_back(pq.top().second + 1);
pq.pop();
}
return wrap(result);
}
A bit late to the party, but I came up with this, which avoids the sort.
Say you want the top 10k elements from your 12k x 12k matrix. The idea is to "clip" the data to the elements corresponding to a quantile of that size.
find_n_top_elements <- function( x, n ){
#set the quantile to correspond to n top elements
quant <- n / (dim(x)[1]*dim(x)[2])
#select the cutpoint to get the quantile above quant
lvl <- quantile(x, probs=1.0-quant)
#select the elements above the cutpoint
res <- x[x>lvl[[1]]]
}
#create a 12k x 12k matrix (1.1 GB!)
n <- 12000
x <- matrix( runif(n*n), ncol=n)
system.time( res <- find_n_top_elements( x, 10e3 ) )
Resulting in
system.time( res <- find_n_top_elements( x, 10e3 ) )
user system elapsed
3.47 0.42 3.89
For comparison, just sorting x on my system takes
system.time(sort(x))
user system elapsed
30.69 0.21 31.33
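Note that find_n_top_elements() returns the values; since the question asked for indices, a hypothetical variant could return those instead (with the caveat that ties at the cutpoint can make the count differ slightly from n):
find_n_top_indices <- function( x, n ){
  quant <- n / length(x)                 # fraction of elements wanted
  lvl <- quantile(x, probs=1.0-quant)    # cutpoint, as before
  which(x > lvl[[1]], arr.ind=TRUE)      # (row, col) indices above it
}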
A matrix in R is stored like a vector (in column-major order).
mat <- matrix(sample(1:5000, 10000, rep=T), 100, 100)
mat.od <- order(mat, decreasing = T)
mat.od.arr <- cbind(mat.od%%nrow(mat), mat.od%/%nrow(mat)+1)
mat.od.arr[,2][mat.od.arr[,1]==0] <- mat.od.arr[,2][mat.od.arr[,1]==0] - 1
mat.od.arr[,1][mat.od.arr[,1]==0] <- nrow(mat)
head(mat.od.arr)
# [,1] [,2]
# [1,] 58 5
# [2,] 59 72
# [3,] 38 22
# [4,] 23 10
# [5,] 38 14
# [6,] 90 15
mat[58, 5]
# [1] 5000
mat[59, 72]
# [1] 5000
mat[38, 22]
# [1] 4999
mat[23, 10]
# [1] 4998
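Incidentally, the manual modulo arithmetic above can be replaced with base R's arrayInd(), which performs the same linear-to-array index conversion in one call:
mat.od.arr2 <- arrayInd(mat.od, dim(mat))
head(mat.od.arr2)  # same (row, col) pairs as above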
I need to define t to be a 3-d vector of repeated elements dependent on user input m, i.e. t = c(m, m, m), and then use it in the function:
b <- c(0.5,0.2,1)
fn <- function(m) { sum(t-b) }
Is there a way for me to tell R that t is a 3-d vector with repeated elements m?
Use the rep() function, for one option:
b <- c(0.5,0.2,1)
fn <- function(m, b) {
t <- rep(m , 3)
sum(t-b)
}
> fn(2, b)
[1] 4.3
You could rely on recycling in this case:
fn2 <- function(m, b) {
sum(m-b)
}
> fn2(2, b)
[1] 4.3
where R automagically extends m to be of the correct length. This is dangerous though, especially if you don't do any checking to see whether the lengths of m and b match.
A better version using rep() would be to not hardcode the length that t needs to be but to take that from b, as in:
fn3 <- function(m, b) {
t <- rep(m , length(b))
sum(t - b)
}
> fn3(2, b)
[1] 4.3
> b <- c(b, b) ## make `b` longer
> length(b)
[1] 6
> fn3(2, b) ## fn3() still works
[1] 8.6
A note on your fn(): you refer to b inside the function but do not pass b in as an argument. Hence you are relying on R finding the right b in the global workspace. This is not good practice. Instead, pass in via arguments all objects needed within the function, as I did in my examples. You'll find errors easier to track down if you do it that way.
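A small extra safeguard for any of these versions is to check the inputs explicitly; a sketch (fn4 is a hypothetical name):
fn4 <- function(m, b) {
  stopifnot(length(m) == 1L)   # guard against silent recycling surprises
  sum(rep(m, length(b)) - b)
}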
I have noticed a curious thing whilst working in R.
When I have a simple program that computes squares from 1 to N, implemented using a for loop and a while loop, the behaviour is not the same. (I don't care about vectorisation in this case, or apply functions.)
fn1 <- function (N)
{
for(i in 1:N) {
y <- i*i
}
}
AND
fn2 <- function (N)
{
i=1
while(i <= N) {
y <- i*i
i <- i + 1
}
}
The results are:
system.time(fn1(60000))
user system elapsed
2.500 0.012 2.493
There were 50 or more warnings (use warnings() to see the first 50)
Warning messages:
1: In i * i : NAs produced by integer overflow
...
system.time(fn2(60000))
user system elapsed
0.138 0.000 0.137
Now we see that the while loop is faster here; my guess was that the for loop would win, because of preallocation and optimisations there. But why does it overflow?
UPDATE: So now trying another way with vectors:
fn3 <- function (N)
{
i <- 1:N
y <- i*i
}
system.time(fn3(60000))
user system elapsed
0.008 0.000 0.009
Warning message:
In i * i : NAs produced by integer overflow
So perhaps it's a funky memory issue? I am running on OS X with 4 GB of memory and all default settings in R. This happens in both 32- and 64-bit versions (except that times are faster).
Alex
Because 1 is numeric but not integer (i.e. it's a floating-point number), whereas 1:60000 is integer.
> print(class(1))
[1] "numeric"
> print(class(1:60000))
[1] "integer"
60000 squared is 3.6 billion, which is NOT representable as a signed 32-bit integer, hence you get NA with an overflow warning:
> as.integer(60000)*as.integer(60000)
[1] NA
Warning message:
In as.integer(60000) * as.integer(60000) : NAs produced by integer overflow
3.6 billion is easily representable in floating point, however:
> as.single(60000)*as.single(60000)
[1] 3.6e+09
To fix your for code, convert to a floating point representation:
function (N)
{
for(i in as.single(1:N)) {
y <- i*i
}
}
The variable in the for loop is an integer sequence, and so eventually you do this:
> y=as.integer(60000)*as.integer(60000)
Warning message:
In as.integer(60000) * as.integer(60000) : NAs produced by integer overflow
whereas in the while loop you are creating a floating point number.
It's also the reason these things are different:
> seq(0,2,1)
[1] 0 1 2
> seq(0,2)
[1] 0 1 2
Don't believe me?
> identical(seq(0,2),seq(0,2,1))
[1] FALSE
because:
> is.integer(seq(0,2))
[1] TRUE
> is.integer(seq(0,2,1))
[1] FALSE
And about timing:
fn1 <- function (N) {
for(i in as.numeric(1:N)) { y <- i*i }
}
fn2 <- function (N) {
i=1
while (i <= N) {
y <- i*i
i <- i + 1
}
}
system.time(fn1(60000))
# user system elapsed
# 0.06 0.00 0.07
system.time(fn2(60000))
# user system elapsed
# 0.12 0.00 0.13
And now we know that the for loop is faster than the while loop, once the integer-overflow warnings are avoided. You cannot ignore warnings during timing.