I created a double for loop in Rcpp to move every 1 in a column up one cell when the next available cell above it holds a 5. When I compile the code I don't get any errors, but the code does not move the 1's in the matrix; it just returns the same matrix. Take an original matrix, say named t:
5 1 1 1 1
1 5 5 5 1
5 5 1 5 5
5 0 0 5 1
5 5 0 1 1
After running up_rcpp(t, 5, 5), I should get the following result:
1 5 1 1 1
5 5 1 5 1
5 5 5 5 1
5 0 0 5 5
5 1 0 1 1
Below is my Rcpp code:
#include <Rcpp.h>
using namespace Rcpp;
//[[Rcpp::export]]
Rcpp::NumericMatrix up_rcpp(Rcpp::NumericMatrix main, int r, int c) {
Rcpp::NumericMatrix t = clone(main);
for (int j=0; j <= c-1; ++j) {
for (int i=0; i <= r-2; ++i){
if ((t(i,j) == 5) & (t(i+1, j) == 1))
{
main(i, j) = 1;
main(i + 1, j) = 5;
}
}
for (int i= r-1; i == r-1; ++i){
if ((t(i, j) == 5) & (t(1, j) == 1))
{
main(i, j) = 1;
main(1, j) = 5;
}
}
}
return main;
}
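Maybe I'm a bit paranoid when passing values to Rcpp, but I never let my function change what I pass in either, so the result is written into a separate clone, ans. The clone(main) for t is still necessary here, so that changes to the output do not change the values being read from t.
The last piece was to change the row index 1 to 0 when referring to the top row, since Rcpp matrix indices are zero-based.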
#include <Rcpp.h>
using namespace Rcpp;
//[[Rcpp::export]]
Rcpp::NumericMatrix up_rcpp(Rcpp::NumericMatrix main, int r, int c) {
  Rcpp::NumericMatrix ans = clone(main); // output; main itself is never modified
  Rcpp::NumericMatrix t = clone(main);   // untouched copy that all comparisons read from
  for (int j=0; j <= c-1; ++j) {
    // rows 0..r-2: a 5 directly above a 1 swaps places with it
    for (int i=0; i <= r-2; ++i){
      if ((t(i,j) == 5) && (t(i+1, j) == 1))
      {
        ans(i, j) = 1;
        ans(i + 1, j) = 5;
      }
    }
    // wrap-around: the bottom row (r-1) is compared with the top row (index 0)
    for (int i= r-1; i <= r-1; ++i){
      if ((t(i, j) == 5) && (t(0, j) == 1))
      {
        ans(i, j) = 1;
        ans(0, j) = 5;
      }
    }
  }
  return ans;
}
Which gives:
[,1] [,2] [,3] [,4] [,5]
[1,] 1 5 1 1 1
[2,] 5 5 1 5 1
[3,] 5 5 5 5 1
[4,] 5 0 0 1 5
[5,] 5 1 0 5 1
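This differs from your expected result in column 4, but as I understand the logic, this output is correct: in the original column 4, the 5 in row 4 sits directly above the 1 in row 5, so the two swap places.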
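To test the swap logic outside of R, here is a minimal standalone C++ sketch of the same idea, assuming the matrix is stored as a std::vector of equal-length rows. The function name up_shift and the Matrix alias are hypothetical, not part of Rcpp. As in the answer, every comparison reads from the untouched input while every write goes to a copy, so a 1 that has just moved up is never compared again in the same pass.

```cpp
#include <vector>

// Hypothetical alias: a matrix stored as a vector of (equal-length) rows.
using Matrix = std::vector<std::vector<int>>;

// Hypothetical standalone analogue of up_rcpp (no R required):
// in every column, wherever a 5 sits directly above a 1, the two swap places;
// the bottom row wraps around and is compared with the top row.
// All comparisons read from the const input `t`; all writes go to `ans`.
Matrix up_shift(const Matrix& t) {
    Matrix ans = t;  // deep copy, playing the role of clone(main)
    const int r = static_cast<int>(t.size());
    if (r < 2) return ans;  // nothing can move in a single row
    const int c = static_cast<int>(t[0].size());
    for (int j = 0; j < c; ++j) {
        // rows 0..r-2: compare each cell with the one below it
        for (int i = 0; i + 1 < r; ++i) {
            if (t[i][j] == 5 && t[i + 1][j] == 1) {
                ans[i][j] = 1;
                ans[i + 1][j] = 5;
            }
        }
        // wrap-around: last row against the first row (index 0)
        if (t[r - 1][j] == 5 && t[0][j] == 1) {
            ans[r - 1][j] = 1;
            ans[0][j] = 5;
        }
    }
    return ans;
}
```

Calling up_shift on the 5 x 5 example matrix should reproduce the output shown above, including the swap in column 4.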
I'm trying to write nested for loops (index i in the first loop, index j in the second) such that j starts from i + 1 and ends at a specific value (the last value of a vector). Here is my attempt:
vettore = 1 : 10
for (i in vettore) {
j = i + 1
for (j in vettore) {
cat("i: ", i)
cat("j: ", j, "\n")
}
}
I need the following behaviour.
Iteration 1 of 1st loop: i = 1, j = 2 up to j = 10
Iteration 2 of 1st loop: i = 2, j = 3 up to j = 10
and so on. How can I do this?
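The second loop should start from j, but in your code it still runs over every value of vettore: the assignment j = i + 1 is immediately overwritten by the inner for loop. Also, it is better to iterate over indices with seq_along(vettore) or 1:length(vettore) instead of for (i in vettore).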
Here is a way to fix it.
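vettore = 1 : 10
# stop the outer loop one short of the end: for the last i, j = i + 1 would
# exceed length(vettore), and j:length(vettore) would then count *down*
for (i in seq_len(length(vettore) - 1)) {
  j = i + 1
  for (k in j:length(vettore)) {
    cat("i: ", i)
    cat("\tj: ", k, "\n")
  }
}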
#i: 1 j: 2
#i: 1 j: 3
#i: 1 j: 4
#i: 1 j: 5
#i: 1 j: 6
#i: 1 j: 7
#i: 1 j: 8
#i: 1 j: 9
#i: 1 j: 10
#i: 2 j: 3
#i: 2 j: 4
#i: 2 j: 5
#...
#...
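For comparison, here is the same triangular pattern as a minimal C++ sketch (the function name upper_pairs is hypothetical). The inner loop starts at i + 1 and its condition is checked before the first pass, so the last outer iteration simply produces nothing. In R, by contrast, j:length(vettore) counts downwards when j is larger than the end, which is why an R version has to keep j from exceeding length(vettore).

```cpp
#include <utility>
#include <vector>

// Collect every index pair (i, j) with 1 <= i < j <= n,
// i.e. the inner loop always starts at i + 1.
std::vector<std::pair<int, int>> upper_pairs(int n) {
    std::vector<std::pair<int, int>> pairs;
    for (int i = 1; i <= n; ++i) {
        // when i == n, j starts at n + 1 and the condition fails immediately,
        // so no pair is produced (no downward counting as with R's j:n)
        for (int j = i + 1; j <= n; ++j) {
            pairs.emplace_back(i, j);
        }
    }
    return pairs;
}
```

For n = 10 this yields n(n - 1)/2 = 45 pairs, starting at (1, 2) and ending at (9, 10).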
I made a first stab at an Rcpp function via inline and it solved my speed problem (thanks Dirk!):
Replace negative values by zero
The initial version looked like this:
library(inline)
cpp_if_src <- '
Rcpp::NumericVector xa(a);
int n_xa = xa.size();
for(int i=0; i < n_xa; i++) {
if(xa[i]<0) xa[i] = 0;
}
return xa;
'
cpp_if <- cxxfunction(signature(a="numeric"), cpp_if_src, plugin="Rcpp")
But when I called cpp_if(p), it overwrote p with the output, which was not intended, so I assumed it was passing by reference.
So I fixed it with the following version:
library(inline)
cpp_if_src <- '
Rcpp::NumericVector xa(a);
int n_xa = xa.size();
Rcpp::NumericVector xr(a);
for(int i=0; i < n_xa; i++) {
if(xr[i]<0) xr[i] = 0;
}
return xr;
'
cpp_if <- cxxfunction(signature(a="numeric"), cpp_if_src, plugin="Rcpp")
This seemed to work. But now the original version no longer overwrites its input when I reload it into R (i.e., the exact same code now leaves its input untouched):
> cpp_if_src <- '
+ Rcpp::NumericVector xa(a);
+ int n_xa = xa.size();
+ for(int i=0; i < n_xa; i++) {
+ if(xa[i]<0) xa[i] = 0;
+ }
+ return xa;
+ '
> cpp_if <- cxxfunction(signature(a="numeric"), cpp_if_src, plugin="Rcpp")
>
> p
[1] -5 -4 -3 -2 -1 0 1 2 3 4 5
> cpp_if(p)
[1] 0 0 0 0 0 0 1 2 3 4 5
> p
[1] -5 -4 -3 -2 -1 0 1 2 3 4 5
I'm not the only one who has tried to replicate this behavior and found inconsistent results:
https://chat.stackoverflow.com/transcript/message/4357344#4357344
What's going on here?
The key is the 'proxy model': your xa really is the same memory location as your original object, so you end up changing your original.
If you don't want that, you should do one of two things: make a (deep) copy using the clone() method, or explicitly create a new object into which the altered values get written. Your second version does neither; you simply use two differently named variables that are both "pointers" (in the proxy-model sense) to the original variable.
An additional complication, though, is the implicit cast and copy when you pass an integer vector (from R) to a NumericVector: that creates a copy, and then the original no longer gets altered.
Here is a more explicit example, similar to one I use in the tutorials or workshops:
library(inline)
f1 <- cxxfunction(signature(a="numeric"), plugin="Rcpp", body='
Rcpp::NumericVector xa(a);
int n = xa.size();
for(int i=0; i < n; i++) {
if(xa[i]<0) xa[i] = 0;
}
return xa;
')
f2 <- cxxfunction(signature(a="numeric"), plugin="Rcpp", body='
Rcpp::NumericVector xa(a);
int n = xa.size();
Rcpp::NumericVector xr(a); // still points to a
for(int i=0; i < n; i++) {
if(xr[i]<0) xr[i] = 0;
}
return xr;
')
p <- seq(-2,2)
print(class(p))
print(cbind(f1(p), p))
print(cbind(f2(p), p))
p <- as.numeric(seq(-2,2))
print(class(p))
print(cbind(f1(p), p))
print(cbind(f2(p), p))
and this is what I see:
edd#max:~/svn/rcpp/pkg$ r /tmp/ari.r
Loading required package: methods
[1] "integer"
p
[1,] 0 -2
[2,] 0 -1
[3,] 0 0
[4,] 1 1
[5,] 2 2
p
[1,] 0 -2
[2,] 0 -1
[3,] 0 0
[4,] 1 1
[5,] 2 2
[1] "numeric"
p
[1,] 0 0
[2,] 0 0
[3,] 0 0
[4,] 1 1
[5,] 2 2
p
[1,] 0 0
[2,] 0 0
[3,] 0 0
[4,] 1 1
[5,] 2 2
edd#max:~/svn/rcpp/pkg$
So it really matters whether you pass int-to-float or float-to-float.
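To make the three cases concrete without R, here is a toy C++ model of the proxy semantics described above. It is an analogy only, assuming a handle that shares its storage through std::shared_ptr; NumVecHandle and clamp_negatives are hypothetical names, and this is not how Rcpp is actually implemented. Copying a handle shares the storage (like xa and xr in f1 and f2), clone() allocates new storage, and constructing from integers converts into a fresh buffer (like passing an integer vector to a NumericVector).

```cpp
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Toy model (NOT Rcpp's real implementation): a cheap handle to shared storage.
class NumVecHandle {
public:
    // Wrap existing double storage: handles built this way alias the caller's data.
    explicit NumVecHandle(std::shared_ptr<std::vector<double>> data)
        : data_(std::move(data)) {}

    // Converting from integers must allocate a new double buffer,
    // so the caller's integer data is never touched afterwards.
    explicit NumVecHandle(const std::vector<int>& ints)
        : data_(std::make_shared<std::vector<double>>(ints.begin(), ints.end())) {}

    double& operator[](std::size_t i) { return (*data_)[i]; }
    std::size_t size() const { return data_->size(); }

    // Deep copy: new storage holding the same values.
    NumVecHandle clone() const {
        return NumVecHandle(std::make_shared<std::vector<double>>(*data_));
    }

private:
    std::shared_ptr<std::vector<double>> data_;
};

// Replace negative values by zero through whatever handle we are given.
// Taking the handle by value copies the handle, not the data.
NumVecHandle clamp_negatives(NumVecHandle x) {
    for (std::size_t i = 0; i < x.size(); ++i)
        if (x[i] < 0) x[i] = 0;
    return x;
}
```

Passing a handle that wraps existing double storage modifies that storage (float-to-float), passing clone() leaves it alone, and passing integers works on a converted copy (int-to-float).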
I have a vector of numbers:
x <- c(0, 0, 0, 30, 60, 0, 0, 0, 0, 0, 10, 0, 0, 15, 45, 0, 0)
For each element x[i], I would like to do the following:
If x[i] > 0, return 0.
If all 4 elements before x[i] are 0, return NA.
If the 4 elements before x[i] are not all 0, return the distance (number of positions) from the last non-zero element to x[i].
I expect this output:
#> x
#[1] 0 0 0 30 60 0 0 0 0 0 10 0 0 15 45 0 0
#> x_out
#[1] NA NA NA 0 0 1 2 3 4 NA 0 1 2 0 0 1 2
Notice that the solution should also work when fewer than 4 elements are available at the beginning of the vector (i.e., conditions 2 and 3 should use as many elements as are available). Does anybody have a solution for this? A vectorised approach is preferred because the vectors are long and the dataset is fairly big.
Here is a simple Rcpp solution. Create a new C++ file in RStudio, paste the code into it, and source the file. On Windows, you'll need to have Rtools installed.
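#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
IntegerVector funRcpp(const IntegerVector x) {
  const R_xlen_t n = x.length();  // use R's index type rather than double
  // counter = number of zeros since the last positive value;
  // starting at 4 makes leading zeros return NA
  int counter = 4;
  IntegerVector y(n);
  for (R_xlen_t i = 0; i < n; ++i) {
    if (x[i] > 0) {
      y[i] = 0;            // condition 1: positive value
      counter = 0;
    }
    else {
      if (counter > 3) {
        y[i] = NA_INTEGER; // condition 2: the 4 preceding elements are all 0
      } else {
        counter++;
        y[i] = counter;    // condition 3: distance to the last positive value
      }
    }
  }
  return y;
}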
/*** R
x <- c(0, 0, 0, 30, 60, 0, 0, 0, 0, 0, 10, 0, 0, 15, 45, 0, 0)
funRcpp(x)
*/
This returns the desired result:
> funRcpp(x)
[1] NA NA NA 0 0 1 2 3 4 NA 0 1 2 0 0 1 2
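To check the rules outside R, here is a plain C++ sketch that tracks the index of the last positive element instead of a capped counter. gap_since_positive is a hypothetical name, std::nullopt stands in for NA (an assumption of this sketch, not an Rcpp type), and, like the Rcpp answer, only values greater than 0 count as non-zero. Comparing against the index of the last positive value also covers the case of fewer than 4 leading elements: while no positive value has been seen yet, the result is NA.

```cpp
#include <cstddef>
#include <optional>
#include <vector>

// For each x[i]: 0 if x[i] > 0; "NA" (std::nullopt) if no positive value occurs
// among the previous `window` elements; otherwise the distance to the last
// positive value.
std::vector<std::optional<int>> gap_since_positive(const std::vector<double>& x,
                                                   int window = 4) {
    std::vector<std::optional<int>> out(x.size());
    long last_pos = -1;  // index of the most recent x[k] > 0, or -1 if none yet
    for (std::size_t i = 0; i < x.size(); ++i) {
        const long idx = static_cast<long>(i);
        if (x[i] > 0) {
            out[i] = 0;                                  // rule 1
            last_pos = idx;
        } else if (last_pos < 0 || idx - last_pos > window) {
            out[i] = std::nullopt;                       // rule 2
        } else {
            out[i] = static_cast<int>(idx - last_pos);   // rule 3
        }
    }
    return out;
}
```

On the example x it should produce the same sequence as funRcpp, with std::nullopt wherever funRcpp returns NA.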
This is my current approach:
library(dplyr)
last_x_months <- 4
my_list <- vector("list", 1 + last_x_months)
my_list[[1]] <- x
# create lagged variants of vector
for (j in seq_len(last_x_months)) {
my_list[[1 + j]] <- lag(my_list[[1]], n = j, default = NA)
}
# row bind it to a data.frame
i_dat <- do.call(rbind, my_list) %>%
as.data.frame()
# apply function to each column in dataframe
sapply(i_dat, function(x) {
if (sum(x, na.rm = TRUE) == 0) {
NA
} else if (x[1] > 0) {
0
} else {
rle(x)$lengths[1]
}
})
This is the output I get:
#> output
#[1] NA NA NA 0 0 1 2 3 4 NA 0 1 2 0 0 1 2
Is this good practice, or could I improve performance with a shortcut? I am pretty inexperienced when it comes to performance optimisation, which is why I posed the question.
Can someone help me figure out the running time of this loop? I believe it is O(5nlogn).
for(int f = 0; f < Array.length; f++) {
F = Array[f];
for(int e = 0; e <= f; e++) {
E = Array[e];
for(int d = 0; d <= e; d++) {
D = Array[d];
for(int c = 0; c <= d; c++) {
C = Array[c];
for(int b = 0; b <= c; b++) {
B = Array[b];
for(int a = 0; a <= b; a++) {
A = Array[a];
}
}
}
}
}
}
Thanks
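The answer is Θ(n^6). I wrote a program to simulate the loops and record, for each value of b, how many times the b-loop body executes (the innermost a-loop is commented out; on each of those visits it would simply run b + 1 times):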
using System;
using System.Diagnostics;

class Program
{
    static void Main(string[] args)
    {
        int arrLength = 20;
        // arr[b] counts how many times the b-loop body runs for each value of b
        int[] arr = new int[arrLength];
        for (int f = 0; f < arrLength; f++)
        {
            for (int e = 0; e <= f; e++)
            {
                for (int d = 0; d <= e; d++)
                {
                    for (int c = 0; c <= d; c++)
                    {
                        for (int b = 0; b <= c; b++)
                        {
                            //for (int a = 0; a <= b; a++)
                            arr[b] = arr[b] + 1;
                        }
                    }
                }
            }
        }
        for (int i = 0; i < arr.Length; i++)
        {
            Debug.WriteLine(string.Format("{0} execution: {1} time(s).", i + 1, arr[i]));
            Console.WriteLine(string.Format("{0} execution: {1} time(s).", i + 1, arr[i]));
        }
        Console.ReadLine();
    }
}
Running this with an arrLength of 1 gives:
1 execution: 1 time(s).
Running this with an arrLength of 2 gives:
1 execution: 5 time(s).
2 execution: 1 time(s).
Running this with an arrLength of 3 gives:
1 execution: 15 time(s).
2 execution: 5 time(s).
3 execution: 1 time(s).
As it turns out, the execution counts always follow the same pattern. At an arrLength of 20, we get:
1 execution: 8855 time(s).
2 execution: 7315 time(s).
3 execution: 5985 time(s).
4 execution: 4845 time(s).
5 execution: 3876 time(s).
6 execution: 3060 time(s).
7 execution: 2380 time(s).
8 execution: 1820 time(s).
9 execution: 1365 time(s).
10 execution: 1001 time(s).
11 execution: 715 time(s).
12 execution: 495 time(s).
13 execution: 330 time(s).
14 execution: 210 time(s).
15 execution: 126 time(s).
16 execution: 70 time(s).
17 execution: 35 time(s).
18 execution: 15 time(s).
19 execution: 5 time(s).
20 execution: 1 time(s).
Plugging this into the awesome Online Encyclopedia of Integer Sequences, we get the binomial coefficient binomial(n,4), shown below (our counts start at an offset of 4 in this sequence):
binomial(n,4) = n*(n-1)*(n-2)*(n-3)/24
binomial(0,4) = 0
binomial(1,4) = 0
binomial(2,4) = 0
binomial(3,4) = 0
binomial(4,4) = 1
binomial(5,4) = 5
binomial(6,4) = 15
binomial(7,4) = 35
...
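If we look at the execution pattern output by my program above, we can rewrite it using a summation and this binomial sequence. For each integer i between 1 and n inclusive, execution i occurs binomial(n - i + 4, 4) times (the (n - i + 4)th term of the sequence, counting from 0), and each of those occurrences runs the innermost loop i times. The total number of executions is therefore:
$$T(n) = \sum_{i=1}^{n} i \binom{n-i+4}{4}$$
Substituting j = n - i + 1, and noting that j goes from n down to 1, we can rewrite this equation as:
$$T(n) = \sum_{j=1}^{n} (n-j+1)\binom{j+3}{4} = \sum_{j=1}^{n} \frac{(n-j+1)(j+3)(j+2)(j+1)\,j}{24}$$
Relying on Wolfram Alpha to evaluate this sum, I plugged in sum (n-j+1)(j+3)(j+2)(j+1)*j/24, j = 1 to n, and it came up with a closed form equivalent to:
$$T(n) = \frac{n(n+1)(n+2)(n+3)(n+4)(n+5)}{720}$$
This is a degree-6 polynomial in n with leading term n^6/720, so it is Θ(n^6), and that is our answer.
The final expression is actually binomial(n+5,6), so for m nested loops of this shape, the number of executions of the innermost loop is binomial(n+m-1,m). For a given number of m loops, we have:
$$\binom{n+m-1}{m} = \frac{n(n+1)\cdots(n+m-1)}{m!} = \Theta(n^m)$$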
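A brute-force check of the closed form is straightforward. The C++ sketch below (function names are hypothetical) literally runs the six nested loops and counts how often the innermost body executes, then compares that count with binomial(n+5, 6), i.e. the binomial(n,6) sequence read at an offset of 5, in the same spirit as the offset of 4 used for binomial(n,4).

```cpp
#include <cstdint>

// Count how many times the body of the innermost (a) loop runs for an
// array of length n, by literally executing the six nested loops.
std::uint64_t count_innermost(int n) {
    std::uint64_t count = 0;
    for (int f = 0; f < n; ++f)
        for (int e = 0; e <= f; ++e)
            for (int d = 0; d <= e; ++d)
                for (int c = 0; c <= d; ++c)
                    for (int b = 0; b <= c; ++b)
                        for (int a = 0; a <= b; ++a)
                            ++count;
    return count;
}

// binomial(n, k) computed incrementally: after step i the running value is
// binomial(n - k + i, i), so every division is exact.
std::uint64_t binomial(std::uint64_t n, std::uint64_t k) {
    if (k > n) return 0;
    std::uint64_t result = 1;
    for (std::uint64_t i = 1; i <= k; ++i)
        result = result * (n - k + i) / i;
    return result;
}
```

For n = 20 the count is binomial(25, 6) = 177100, and the ratio to n^6 approaches 1/720 as n grows.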
A good way to do this is to think about the space you're iterating over. If you think about it, the loops iterate over nonnegative integer values of (a, b, c, d, e, f) where
n > f ≥ e ≥ d ≥ c ≥ b ≥ a
Each of these iterations does O(1) work (every loop body just assigns a variable, which takes O(1) time), so the question is how many tuples satisfy the above inequalities. I'm going to claim it's Θ(n^6), and will try to justify this with the rest of my answer.
First, note that the count certainly isn't more than O(n^6). All of a, b, c, d, e, and f range between 0 and n-1, so there are at most n different values for each. Therefore, the maximum possible number of tuples is n^6. This overcounts (most of those tuples violate the ordering), but it's certainly an upper bound. That gives us that the runtime is at most O(n^6).
If we want to get a tighter bound, we have to work harder. To do this, I'm going to use the following fact:
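1^k + 2^k + 3^k + ... + n^k = Θ(n^(k+1))
This is not a geometric series. It holds because the sum is at most n · n^k = n^(k+1), while the terms from ceil(n/2) to n number at least n/2 and are each at least (n/2)^k, so the sum is also at least (n/2)^(k+1).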
This means that
$$
\begin{aligned}
\sum_{f=0}^{n-1}\sum_{e=0}^{f}\sum_{d=0}^{e}\sum_{c=0}^{d}\sum_{b=0}^{c}\sum_{a=0}^{b} 1
&= \sum_{f=0}^{n-1}\sum_{e=0}^{f}\sum_{d=0}^{e}\sum_{c=0}^{d}\sum_{b=0}^{c} \Theta(b) \\
&= \sum_{f=0}^{n-1}\sum_{e=0}^{f}\sum_{d=0}^{e}\sum_{c=0}^{d} \Theta(c^2) \\
&= \sum_{f=0}^{n-1}\sum_{e=0}^{f}\sum_{d=0}^{e} \Theta(d^3) \\
&= \sum_{f=0}^{n-1}\sum_{e=0}^{f} \Theta(e^4) \\
&= \sum_{f=0}^{n-1} \Theta(f^5) \\
&= \Theta(n^6)
\end{aligned}
$$
Hope this helps!