I made a first stab at an Rcpp function via inline and it solved my speed problem (thanks Dirk!):
Replace negative values by zero
The initial version looked like this:
library(inline)
cpp_if_src <- '
Rcpp::NumericVector xa(a);
int n_xa = xa.size();
for(int i=0; i < n_xa; i++) {
if(xa[i]<0) xa[i] = 0;
}
return xa;
'
cpp_if <- cxxfunction(signature(a="numeric"), cpp_if_src, plugin="Rcpp")
But when I called cpp_if(p), it overwrote p with the output, which was not what I intended. So I assumed it was passing by reference.
So I fixed it with the following version:
library(inline)
cpp_if_src <- '
Rcpp::NumericVector xa(a);
int n_xa = xa.size();
Rcpp::NumericVector xr(a);
for(int i=0; i < n_xa; i++) {
if(xr[i]<0) xr[i] = 0;
}
return xr;
'
cpp_if <- cxxfunction(signature(a="numeric"), cpp_if_src, plugin="Rcpp")
Which seemed to work. But now the original version doesn't overwrite its input anymore when I re-load it into R (i.e. the same exact code now doesn't overwrite its input):
> cpp_if_src <- '
+ Rcpp::NumericVector xa(a);
+ int n_xa = xa.size();
+ for(int i=0; i < n_xa; i++) {
+ if(xa[i]<0) xa[i] = 0;
+ }
+ return xa;
+ '
> cpp_if <- cxxfunction(signature(a="numeric"), cpp_if_src, plugin="Rcpp")
>
> p
[1] -5 -4 -3 -2 -1 0 1 2 3 4 5
> cpp_if(p)
[1] 0 0 0 0 0 0 1 2 3 4 5
> p
[1] -5 -4 -3 -2 -1 0 1 2 3 4 5
I'm not the only one who has tried to replicate this behavior and found inconsistent results:
https://chat.stackoverflow.com/transcript/message/4357344#4357344
What's going on here?
The key is the 'proxy model': your xa really is the same memory location as your original object, so you end up changing your original.
If you don't want that, you should make a (deep) copy using the clone() method, or explicitly create a new object into which the altered values get written. Your second version does neither: it simply uses two differently named variables which are both "pointers" (in the proxy model sense) to the original variable.
An additional complication, though, is the implicit cast and copy that happens when you pass an int vector (from R) to a NumericVector: the conversion creates a copy, and then the original no longer gets altered.
Here is a more explicit example, similar to one I use in the tutorials or workshops:
library(inline)
f1 <- cxxfunction(signature(a="numeric"), plugin="Rcpp", body='
Rcpp::NumericVector xa(a);
int n = xa.size();
for(int i=0; i < n; i++) {
if(xa[i]<0) xa[i] = 0;
}
return xa;
')
f2 <- cxxfunction(signature(a="numeric"), plugin="Rcpp", body='
Rcpp::NumericVector xa(a);
int n = xa.size();
Rcpp::NumericVector xr(a); // still points to a
for(int i=0; i < n; i++) {
if(xr[i]<0) xr[i] = 0;
}
return xr;
')
p <- seq(-2,2)
print(class(p))
print(cbind(f1(p), p))
print(cbind(f2(p), p))
p <- as.numeric(seq(-2,2))
print(class(p))
print(cbind(f1(p), p))
print(cbind(f2(p), p))
and this is what I see:
edd#max:~/svn/rcpp/pkg$ r /tmp/ari.r
Loading required package: methods
[1] "integer"
p
[1,] 0 -2
[2,] 0 -1
[3,] 0 0
[4,] 1 1
[5,] 2 2
p
[1,] 0 -2
[2,] 0 -1
[3,] 0 0
[4,] 1 1
[5,] 2 2
[1] "numeric"
p
[1,] 0 0
[2,] 0 0
[3,] 0 0
[4,] 1 1
[5,] 2 2
p
[1,] 0 0
[2,] 0 0
[3,] 0 0
[4,] 1 1
[5,] 2 2
edd#max:~/svn/rcpp/pkg$
So it really matters whether you pass int-to-float or float-to-float.
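To make the proxy idea concrete outside of R, here is a minimal standalone C++ sketch. It is only an analogy: the `Handle` alias and the functions `clamp_shared` and `clamp_cloned` are hypothetical names introduced for illustration, with a `std::shared_ptr` standing in for the proxy object, and none of this is Rcpp's actual implementation.

```cpp
#include <memory>
#include <vector>

// Hypothetical stand-in for a proxy object: a handle that shares its
// storage with whoever created it (illustration only, not Rcpp's API).
using Handle = std::shared_ptr<std::vector<double>>;

// Mirrors f1/f2 above: a second handle is just a second name for the
// same storage, so the caller's data is modified.
Handle clamp_shared(Handle a) {
    Handle xr = a;  // copies the handle, not the data
    for (double &v : *xr) {
        if (v < 0) v = 0;
    }
    return xr;
}

// Mirrors the clone() fix: deep-copy the storage first, then modify
// only the copy.
Handle clamp_cloned(Handle a) {
    Handle xr = std::make_shared<std::vector<double>>(*a);  // deep copy
    for (double &v : *xr) {
        if (v < 0) v = 0;
    }
    return xr;
}
```

Calling `clamp_cloned` leaves the caller's vector untouched, while `clamp_shared` zeroes its negative entries in place. In Rcpp itself the corresponding deep copy would be written along the lines of `Rcpp::NumericVector xr = Rcpp::clone(xa);`.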
I have created a double for loop in Rcpp to move each 1 in a column up one cell when the cell above it contains a 5. When I compile the code I don't get any error, but the code does not move the 1's in the matrix; it just returns the same matrix. Let's take an original matrix, say named t:
5 1 1 1 1
1 5 5 5 1
5 5 1 5 5
5 0 0 5 1
5 5 0 1 1
after running the code up_rcpp(t,5,5), I should get the following results
1 5 1 1 1
5 5 1 5 1
5 5 5 5 1
5 0 0 5 5
5 1 0 1 1
Below is my rcpp code:
#include <Rcpp.h>
using namespace Rcpp;
//[[Rcpp::export]]
Rcpp::NumericMatrix up_rcpp(Rcpp::NumericMatrix main, int r, int c) {
Rcpp::NumericMatrix t = clone(main);
for (int j=0; j <= c-1; ++j) {
for (int i=0; i <= r-2; ++i){
if ((t(i,j) == 5) & (t(i+1, j) == 1))
{
main(i, j) = 1;
main(i + 1, j) = 5;
}
}
for (int i= r-1; i == r-1; ++i){
if ((t(i, j) == 5) & (t(1, j) == 1))
{
main(i, j) = 1;
main(1, j) = 5;
}
}
}
return main;
}
Maybe I'm a bit paranoid when I pass values to Rcpp, but I never allow my function to change what I pass in. The clone(main) calls are necessary here so that the result can be modified without changing main or t.
The last piece was to change the 1 indices to 0 for the top row.
#include <Rcpp.h>
using namespace Rcpp;
//[[Rcpp::export]]
Rcpp::NumericMatrix up_rcpp(Rcpp::NumericMatrix main, int r, int c) {
Rcpp::NumericMatrix ans = clone(main);
Rcpp::NumericMatrix t = clone(main);
for (int j=0; j <= c-1; ++j) {
for (int i=0; i <= r-2; ++i){
if ((t(i,j) == 5) && (t(i+1, j) == 1))
{
ans(i, j) = 1;
ans(i + 1, j) = 5;
}
}
for (int i= r-1; i <= r-1; ++i){
if ((t(i, j) == 5) && (t(0, j) == 1))
{
ans(i, j) = 1;
ans(0, j) = 5;
}
}
}
return ans;
}
Which gives:
[,1] [,2] [,3] [,4] [,5]
[1,] 1 5 1 1 1
[2,] 5 5 1 5 1
[3,] 5 5 5 5 1
[4,] 5 0 0 1 5
[5,] 5 1 0 5 1
This is different than your solution in column 4, but the way I understand the logic, this is correct.
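The same read-from-a-snapshot, write-to-a-copy idea can be sketched without Rcpp. The version below is an illustrative port only: `Matrix` and `move_ones_up` are hypothetical names, a `std::vector` of rows stands in for `NumericMatrix`, and the wrap-around from the last row to the top row is folded into the main loop with a modulo.

```cpp
#include <cstddef>
#include <vector>

// Row-major matrix of ints; a stand-in for Rcpp::NumericMatrix.
using Matrix = std::vector<std::vector<int>>;

// Every condition is tested against the untouched input t, and every
// write goes to the copy ans, so earlier swaps cannot affect later tests.
Matrix move_ones_up(const Matrix &t) {
    Matrix ans = t;  // deep copy: all writes go here
    const std::size_t r = t.size();
    if (r < 2) return ans;
    const std::size_t c = t[0].size();
    for (std::size_t j = 0; j < c; ++j) {
        for (std::size_t i = 0; i < r; ++i) {
            const std::size_t below = (i + 1) % r;  // last row wraps to row 0
            if (t[i][j] == 5 && t[below][j] == 1) {
                ans[i][j] = 1;
                ans[below][j] = 5;
            }
        }
    }
    return ans;
}
```

Applied to the 5 x 5 matrix from the question, this sketch produces the same matrix as the output shown above, including the 1 and 5 in rows 4 and 5 of column 4.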
I have a logical vector like
as.logical(c(0,0,1,1,1,0,1,1,0,0,0,1,1,1,1))
but much longer. How can I transform it to:
c(0,0,1,2,3,0,1,2,0,0,0,1,2,3,4)
by counting the length of each run of ones?
Another rle option:
r <- rle(x)
x[x] <- sequence(r$l[r$v])
#[1] 0 0 1 2 3 0 1 2 0 0 0 1 2 3 4
Or without saving r:
x[x] <- sequence(with(rle(x), lengths[values]))
With C++ through Rcpp:
library(Rcpp)
cppFunction('NumericVector seqOfLogical(LogicalVector lv) {
size_t n = lv.size();
NumericVector res(n);
int foundCounter = 0;
for (size_t i = 0; i < n; i++) {
if (lv[i] == 1) {
foundCounter++;
} else {
foundCounter = 0;
}
res[i] = foundCounter;
}
return res;
}')
seqOfLogical(x)
# [1] 0 0 1 2 3 0 1 2 0 0 0 1 2 3 4
Benchmarks
library(microbenchmark)
set.seed(1)
x <- sample(c(T,F), size = 1e6, replace = T)
microbenchmark(
symbolix = { symbolix(x) },
thelatemail1 = { thelatemail1(x) },
thelatemail2 = { thelatemail2(x) },
wen = { wen(x) },
maurits = { maurits(x) },
#mhammer = { mhammer(x) }, ## this errors
times = 5
)
# Unit: milliseconds
# expr min lq mean median uq max neval
# symbolix 2.760152 4.579596 34.60909 4.833333 22.31126 138.5611 5
# thelatemail1 154.050925 189.784368 235.16431 235.982093 262.33704 333.6671 5
# thelatemail2 138.876834 146.197278 158.66718 148.547708 179.80223 179.9119 5
# wen 780.432786 898.505231 1091.39099 1093.702177 1279.33318 1404.9816 5
# maurits 1002.267323 1043.590621 1136.35624 1086.967756 1271.38803 1277.5675 5
functions
symbolix <- function(x) {
seqOfLogical(x)
}
thelatemail1 <- function(x) {
r <- rle(x)
x[x] <- sequence(r$l[r$v])
return(x)
}
thelatemail2 <- function(x) {
x[x] <- sequence(with(rle(x), lengths[values]))
return(x)
}
maurits <- function(x) {
unlist(Map(function(l, v) if (!isTRUE(v)) rep(0, l) else 1:l, rle(x)$lengths, rle(x)$values))
}
wen <- function(A) {
B=data.table::rleid(A)
B=ave(B,B,FUN = seq_along)
B[!A]=0
B
}
mhammer <- function(x) {
x_counts <- x
for(i in seq_along(x)) {
if(x[i] == 1) { x_counts[i] <- x_counts[i] + x_counts[i-1] }
}
return(x_counts)
}
You can use rleid from data.table:
A=as.logical(c(0,0,1,1,1,0,1,1,0,0,0,1,1,1,1))
B=data.table::rleid(A)
B=ave(B,B,FUN = seq_along)
B[!A]=0
B
[1] 0 0 1 2 3 0 1 2 0 0 0 1 2 3 4
x <- c(0,0,1,1,1,0,1,1,0,0,0,1,1,1,1)
x_counts <- x
for(i in seq_along(x)) {
if(x[i] == 1) { x_counts[i] <- x_counts[i] + x_counts[i-1] }
}
x_counts
Here is a solution using base R's rle with Map
x <- as.logical(c(0,0,1,1,1,0,1,1,0,0,0,1,1,1,1))
unlist(Map(function(l, v) if (!isTRUE(v)) rep(0, l) else 1:l, rle(x)$lengths, rle(x)$values))
# [1] 0 0 1 2 3 0 1 2 0 0 0 1 2 3 4
or using purrr::pmap
library(purrr);
unlist(pmap(unclass(rle(x)),
function(lengths, values) if (!isTRUE(values)) rep(0, lengths) else 1:lengths))
#[1] 0 0 1 2 3 0 1 2 0 0 0 1 2 3 4
Slightly different from Wen's, I came up with:
library(data.table)
ave(v,rleid(v),FUN=function(x) x *seq_along(x))
# [1] 0 0 1 2 3 0 1 2 0 0 0 1 2 3 4
I recommend the runner package and its function streak_run, which calculates consecutive occurrences. It can also calculate on sliding windows (e.g. the last 5 observations); see the documentation on GitHub for more.
library(runner)
x <- as.logical(c(0,0,1,1,1,0,1,1,0,0,0,1,1,1,1))
streak <- streak_run(x)
streak[x == 0] <- 0
print(streak)
# [1] 0 0 1 2 3 0 1 2 0 0 0 1 2 3 4
I'm trying to write a recursive function that takes as input an integer n and returns a matrix that contains all binary sequences of length n.
I wrote this code but it is not giving an output
binseq <- function(n){
binsequ <- matrix(nrow = length(n), ncol = n)
r <- 0 # current row of binseq
for (i in 0:n) {
for (j in 0:n) {
for (k in 0:n) {
r <- r + 1
return (binsequ[r,] <- c(i, j, k))
}
}
}
}
I tried to run it using n=3
binseq(3)
But with no success.
However, when I do not use the function command and give specific numbers, it works. For example,
binseq <- matrix(nrow = 8, ncol = 3)
r <- 0 # current row of binseq
for (i in 0:1) {
for (j in 0:1) {
for (k in 0:1) {
r <- r + 1
binseq[r,] <- c(i, j, k)
}
}
}
binseq
the output is:
[,1] [,2] [,3]
[1,] 0 0 0
[2,] 0 0 1
[3,] 0 1 0
[4,] 0 1 1
[5,] 1 0 0
[6,] 1 0 1
[7,] 1 1 0
[8,] 1 1 1
I wrote a function for this. I was thinking about making it recursive, but it ended up using a loop. I landed here hoping to see how others did this, as it sounds like quite a basic question. In case it helps, here is my function.
##k: number of digits, k>1, eg k=3 for 101 etc
binary_gen <- function(k) {
base <- c(0L,1L)
##initialize
bi.set <- integer(2^(k))
bi.set[1:2] <- base
##create set through loop
for (i in 2:k) {
bi.set[(2^(i-1)+1):2^i] <- 10^(i-1)+bi.set[1:2^(i-1)]
bi.set
}
return(bi.set)
}
Here is output for k=4.
> binary_gen(4L)
[1] 0 1 10 11 100 101 110 111 1000 1001 1010 1011 1100 1101 1110 1111
You can reshape the output vector into a matrix of the desired format (number of rows = k, number of columns = 2^k). For a given k, the binary numbers with exactly k digits are at positions 2^(k-1)+1 to 2^k.
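Since the question asked for a recursive approach, here is one possible sketch in C++ (chosen because the rest of this page already uses C++ through Rcpp). The names `Rows` and `binseq` are hypothetical; the idea is that the sequences of length n are the sequences of length n-1 prefixed first with 0 and then with 1.

```cpp
#include <vector>

// One binary sequence per row.
using Rows = std::vector<std::vector<int>>;

// Recursive construction: prefix every sequence of length n-1 with 0,
// then with 1. The base case is a single empty sequence.
Rows binseq(int n) {
    if (n <= 0) return Rows(1);  // exactly one sequence of length 0
    const Rows shorter = binseq(n - 1);
    Rows out;
    out.reserve(shorter.size() * 2);
    for (int bit = 0; bit <= 1; ++bit) {
        for (const auto &tail : shorter) {
            std::vector<int> row;
            row.reserve(tail.size() + 1);
            row.push_back(bit);
            row.insert(row.end(), tail.begin(), tail.end());
            out.push_back(row);
        }
    }
    return out;
}
```

For n = 3 this yields 8 rows in the same order as the hand-written triple loop in the question, from 0 0 0 to 1 1 1.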
Can someone help me figure out the running time of this loop? I believe it is O(5nlogn).
for(int f = 0; f < Array.length; f++) {
F = Array[f];
for(int e = 0; e <= f; e++) {
E = Array[e];
for(int d = 0; d <= e; d++) {
D = Array[d];
for(int c = 0; c <= d; c++) {
C = Array[c];
for(int b = 0; b <= c; b++) {
B = Array[b];
for(int a = 0; a <= b; a++) {
A = Array[a];
}
}
}
}
}
}
Thanks
The answer is Θ(n^6). I wrote a program to simulate the inner loop and record, for each i from 1 to n, how many times a run of i innermost executions occurs:
static void Main(string[] args)
{
int arrLength = 20;
int[] arr = new int[arrLength];
for (int f = 0; f < arrLength; f++)
{
for (int e = 0; e <= f; e++)
{
for (int d = 0; d <= e; d++)
{
for (int c = 0; c <= d; c++)
{
for (int b = 0; b <= c; b++)
{
//for (int a = 0; a <= b; a++)
arr[b] = arr[b] + 1;
}
}
}
}
}
for (int i = 0; i < arr.Length; i++)
{
Debug.WriteLine(string.Format("{0} execution: {1} time(s).", i + 1, arr[i]));
Console.WriteLine(string.Format("{0} execution: {1} time(s).", i + 1, arr[i]));
}
Console.ReadLine();
}
Running this with an arrLength of 1 gives:
1 execution: 1 time(s).
Running this with an arrLength of 2 gives:
1 execution: 5 time(s).
2 execution: 1 time(s).
Running this with an arrLength of 3 gives:
1 execution: 15 time(s).
2 execution: 5 time(s).
3 execution: 1 time(s).
As it turns out, the counts always follow the same pattern. At arrLength of 20, we get:
1 execution: 8855 time(s).
2 execution: 7315 time(s).
3 execution: 5985 time(s).
4 execution: 4845 time(s).
5 execution: 3876 time(s).
6 execution: 3060 time(s).
7 execution: 2380 time(s).
8 execution: 1820 time(s).
9 execution: 1365 time(s).
10 execution: 1001 time(s).
11 execution: 715 time(s).
12 execution: 495 time(s).
13 execution: 330 time(s).
14 execution: 210 time(s).
15 execution: 126 time(s).
16 execution: 70 time(s).
17 execution: 35 time(s).
18 execution: 15 time(s).
19 execution: 5 time(s).
20 execution: 1 time(s).
Plugging this into the awesome Online Encyclopedia of Integer Sequences, we get the Binomial coefficient binomial(n,4), which is this (the sequence starts at an offset of 4):
binomial(n,4)
n*(n-1)*(n-2)*(n-3)/24
0 = 0
1 = 0
2 = 0
3 = 0
4 = 1
5 = 5
6 = 15
7 = 35
...
If we look at the execution patterns output by my program above, we can rewrite it using a summation and this binomial sequence. For each integer i between 1 and n inclusive, we take the (n - i + 4)th number in the binomial(n,4) sequence and multiply it by i to get the total number of executions. This is expressed as the following:
sum(i from 1 to n) i * binomial(n - i + 4, 4)
Substituting j = n - i + 1, and realizing that j goes from n down to 1, we can rewrite this equation as:
sum(j from 1 to n) (n - j + 1) * binomial(j + 3, 4)
Relying on Wolfram Alpha to figure out this equation, I plugged in sum (n-j+1)(j+3)(j+2)(j+1)*j/24, j = 1 to n, and it came up with:
n*(n+1)*(n+2)*(n+3)*(n+4)*(n+5)/720
This is very obviously Θ(n^6), so that is our answer.
The final equation is actually binomial(n+5,6), so for m loops, the number of executions of the innermost loop is binomial(n+m-1,m), the number of multisets of size m drawn from n values. For a given number of m loops, we have:
n*(n+1)*...*(n+m-1)/m!
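The closed form can be sanity-checked by brute force. The sketch below is an illustrative check, not part of the original program: `count_iterations` and `closed_form` are hypothetical names, and the closed form is written with the offset made explicit, as binomial(n + 5, 6) = n(n+1)(n+2)(n+3)(n+4)(n+5)/720.

```cpp
#include <cstdint>

// Brute-force count of innermost-loop executions for an array of length n.
std::uint64_t count_iterations(int n) {
    std::uint64_t count = 0;
    for (int f = 0; f < n; ++f)
        for (int e = 0; e <= f; ++e)
            for (int d = 0; d <= e; ++d)
                for (int c = 0; c <= d; ++c)
                    for (int b = 0; b <= c; ++b)
                        for (int a = 0; a <= b; ++a)
                            ++count;
    return count;
}

// binomial(n + 5, 6), built up as binomial(n + k - 1, k) for k = 1..6
// so that every intermediate division is exact.
std::uint64_t closed_form(int n) {
    std::uint64_t result = 1;
    for (int k = 1; k <= 6; ++k) {
        result = result * static_cast<std::uint64_t>(n + k - 1) / k;
    }
    return result;
}
```

For n = 2 both functions give 7, which matches the counts printed above (1 execution 5 times plus 2 executions once).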
A good way to do this is to think about the space you're iterating over. If you think about it, the loops will iterate over nonnegative integral values of (a, b, c, d, e, f) where
n > f ≥ e ≥ d ≥ c ≥ b ≥ a
Each of these iterations does O(1) work (all loops just assign a variable, which takes O(1) work), so the question is how many possible values there are that satisfy the above formula. I'm going to claim it's Θ(n^6), and will try to justify this with the rest of my answer.
First, note that the value certainly isn't any more than O(n^6). All of a, b, c, d, e, and f range between 0 and n-1, so there's at most n different values for each. Therefore, the maximum possible number of values they can have is n^6. This is not a tight bound, but it's certainly an upper bound. That gives us that the runtime is at most O(n^6).
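If we want to get a tighter bound, we have to work harder. To do this, I'm going to use the following fact:
1^k + 2^k + 3^k + ... + n^k = Θ(n^(k+1))
This is a sum of k-th powers rather than a geometric series; the bound follows because the sum has n terms of size at most n^k, and the n/2 largest terms are each at least (n/2)^k.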
This means that
sum(f from 0 to n-1)
sum (e from 0 to f)
sum (d from 0 to e)
sum (c from 0 to d)
sum (b from 0 to c)
sum (a from 0 to b)
1
= sum(f from 0 to n-1)
sum (e from 0 to f)
sum (d from 0 to e)
sum (c from 0 to d)
sum (b from 0 to c)
Theta(b)
= sum(f from 0 to n-1)
sum (e from 0 to f)
sum (d from 0 to e)
sum (c from 0 to d)
Theta(c^2)
= sum(f from 0 to n-1)
sum (e from 0 to f)
sum (d from 0 to e)
Theta(d^3)
= sum(f from 0 to n-1)
sum (e from 0 to f)
Theta(e^4)
= sum(f from 0 to n-1)
Theta(f^5)
= Theta(n^6)
Hope this helps!