Why does quick sort implemented in Rcpp run slowly? - r

I have implemented the quick sort algorithm in Rcpp, but it is significantly slower than sort(array, method="quick") for large arrays. Why?
Here is my Rcpp code:
// partition using hoare's scheme
#include <Rcpp.h>
using namespace Rcpp;
int partition(NumericVector a,int start,int end)
{
double pivot = a[end];
int i = start - 1;
int j = end + 1;
//Rcout << a <<"\n";
while(1)
{
do {
i++;
} while (a[i] < pivot);
do {
j--;
} while (pivot < a[j]);
if(i > j)
return j;
//special.Swap(a, i, j);
std::swap(a[i], a[j]);
}
}
void qsort(NumericVector a,int start,int end)
{
//Rcout << start <<"," << end <<"\n";
if(start < end)
{
int P_index = partition(a, start, end);
//Rcout << P_index << "\n";
qsort(a, start, P_index);
qsort(a, P_index + 1, end);
}
}
// [[Rcpp::export]]
NumericVector QuickSortH_WC(NumericVector arr)
{
int len = arr.size();
qsort(arr, 0, len-1);
//Rcout << arr <<"\n";
return 1;
}
Also, for arrays with floating-point values the algorithm performs even worse. I want to compare Hoare's and Lomuto's partitioning schemes, but I do not know whether this implementation has a flaw that makes it slower.

The main reason for the inefficiency of your code seems to be mixing of the two partitioning schemes you want to compare. You claim to use the Hoare partition scheme, and the code looks very much like it, but pivot is calculated according to the Lomuto partition scheme. In addition, you should return j if i >= j, not if i > j. Fixing these two things and replacing i++ with the slightly faster ++i I get:
// partition using hoare's scheme
#include <Rcpp.h>
using namespace Rcpp;
int partition(NumericVector a,int start,int end)
{
double pivot = a[(start + end) / 2];
int i = start - 1;
int j = end + 1;
//Rcout << a <<"\n";
while(1)
{
do {
++i;
} while (a[i] < pivot);
do {
--j;
} while (pivot < a[j]);
if(i >= j)
return j;
//special.Swap(a, i, j);
std::swap(a[i], a[j]);
}
}
void qsort(NumericVector a,int start,int end)
{
//Rcout << start <<"," << end <<"\n";
if(start < end)
{
int P_index = partition(a, start, end);
//Rcout << P_index << "\n";
qsort(a, start, P_index);
qsort(a, P_index + 1, end);
}
}
// [[Rcpp::export]]
NumericVector QuickSortH_WC(NumericVector arr)
{
int len = arr.size();
qsort(arr, 0, len-1);
//Rcout << arr <<"\n";
return arr;
}
/*** R
set.seed(42)
dat <- runif(1e6)
bench::mark(QuickSortH_WC(dat), sort(dat, method="quick"))
*/
Output
> bench::mark(QuickSortH_WC(dat), sort(dat, method="quick"))
# A tibble: 2 x 13
expression min median `itr/sec` mem_alloc `gc/sec` n_itr
<bch:expr> <bch:> <bch:t> <dbl> <bch:byt> <dbl> <int>
1 QuickSortH_WC(dat) 95.7ms 100.5ms 8.63 2.49KB 43.2 5
2 sort(dat, method = "quick") 15ms 16.5ms 53.1 11.44MB 23.6 27
# … with 6 more variables: n_gc <dbl>, total_time <bch:tm>, result <list>,
# memory <list>, time <list>, gc <list>
Warning message:
Some expressions had a GC in every iteration; so filtering is disabled.
So while this method is about a factor of 7 slower than R's sort, its run time is at least of a comparable order of magnitude. (Thanks @JosephWood for digging out the link.) Wikipedia lists even more improvements over these two schemes.
BTW, I also changed the wrapper function to return the changed array. This allows me to use the default behavior of bench::mark which is to compare the returned results. I find that useful ...

Rcpp handles recursive functions poorly.
I suggest an iterative quick sort implementation:
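void _Quick_sorti( double _array[],int _l,int _h){
if(_h<=_l) return; // fewer than 2 elements: nothing to sort, and the two pushes below would overflow the stack
int *_stack=new int [_h-_l+1]; double _tmp;int _i,_p,_top=-1;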
_stack[++_top]=_l;_stack[++_top]=_h;
while(_top>=0){
_h=_stack[_top--];_l=_stack[_top--];
_tmp=_array[_h];
_i=_l-1;
for(int _j=_l;_j<=_h-1;_j++){
if(_array[_j]<=_tmp){_i++;std::swap(_array[_i],_array[_j]);}
}
_p=_i+1;
std::swap(_array[_p],_array[_h]);
if(_p-1>_l){_stack[++_top]=_l;_stack[++_top]=_p-1;}
if(_p+1<_h){_stack[++_top]=_p+1;_stack[++_top]=_h;}
}
delete[] _stack; // allocated with new[], so it must be released with delete[]
}
// [[Rcpp::export]]
SEXP Quick_sorti(SEXP &unsorted) { //run
SEXP temp=clone(unsorted);// or Rf_duplicate
double *z=REAL(temp);
int N=LENGTH(temp)-1;
int k=0;
_Quick_sorti(z,k,N); // note that we have to provide lvalues (putting a literal 0 in place of N will not work)
return temp;}
The code is adapted from macros, which is why the names carry a '_' prefix and look ugly; moreover, it uses R internals. Adding the stack implies an additional memory requirement on the order of N.
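On the memory point: the extra stack does not have to grow with N. The following is a minimal sketch, not taken from the answer above, written against std::vector<double> instead of R's internals so that it compiles on its own. After each partition the larger side is deferred to an explicit stack and the smaller side is handled immediately, which keeps the number of pending ranges to roughly log2(N). The names lomuto_partition and quick_sort_bounded are made up for illustration.

```cpp
#include <algorithm>
#include <utility>
#include <vector>

// Lomuto partition on a[lo..hi] (inclusive); returns the pivot's final index.
static int lomuto_partition(std::vector<double>& a, int lo, int hi) {
    const double pivot = a[hi];
    int i = lo - 1;
    for (int j = lo; j < hi; ++j) {
        if (a[j] <= pivot) {
            ++i;
            std::swap(a[i], a[j]);
        }
    }
    std::swap(a[i + 1], a[hi]);
    return i + 1;
}

// Iterative quick sort (hypothetical name). The larger side of every
// partition is pushed for later, the smaller side is processed right away.
void quick_sort_bounded(std::vector<double>& a) {
    if (a.size() < 2) return;
    std::vector<std::pair<int, int> > pending;
    pending.push_back(std::make_pair(0, static_cast<int>(a.size()) - 1));
    while (!pending.empty()) {
        int lo = pending.back().first;
        int hi = pending.back().second;
        pending.pop_back();
        while (lo < hi) {
            const int p = lomuto_partition(a, lo, hi);
            if (p - lo < hi - p) {
                pending.push_back(std::make_pair(p + 1, hi));  // defer larger right side
                hi = p - 1;                                    // continue with smaller left side
            } else {
                pending.push_back(std::make_pair(lo, p - 1));  // defer larger left side
                lo = p + 1;                                    // continue with smaller right side
            }
        }
    }
}
```

In an Rcpp wrapper the same loop could run over the data of a cloned NumericVector; that wiring is left out here.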

Related

Allow C++ constants to be a default function parameter using Rcpp Attributes

I created a cumsum function in an R package with Rcpp which cumulatively sums a vector until it hits a user-defined ceiling or floor. However, if one wants the cumsum to be bounded above only, the user must still specify a floor.
Example:
a = c(1, 1, 1, 1, 1, 1, 1)
If I wanted to cumsum a and have an upper bound of 3, I could call cumsum_bounded(a, lower = 1, upper = 3). I would rather not have to specify the lower bound.
My code:
#include <Rcpp.h>
#include <float.h>
#include <cmath>
using namespace Rcpp;
// [[Rcpp::export]]
NumericVector cumsum_bounded(NumericVector x, int upper, int lower) {
NumericVector res(x.size());
double acc = 0;
for (int i=0; i < x.size(); ++i) {
acc += x[i];
if (acc < lower) acc = lower;
else if (acc > upper) acc = upper;
res[i] = acc;
}
return res;
}
What I would like:
#include <Rcpp.h>
#include <float.h>
#include <cmath>
#include <climits> //for LLONG_MIN and LLONG_MAX
using namespace Rcpp;
// [[Rcpp::export]]
NumericVector cumsum_bounded(NumericVector x, long long int upper = LLONG_MAX, long long int lower = LLONG_MIN) {
NumericVector res(x.size());
double acc = 0;
for (int i=0; i < x.size(); ++i) {
acc += x[i];
if (acc < lower) acc = lower;
else if (acc > upper) acc = upper;
res[i] = acc;
}
return res;
}
In short, yes, it's possible, but it requires some finesse: either creating an intermediary function or embedding the NULL-handling logic within the main function.
In long, Rcpp attributes only support a limited set of default values. These are listed in the Rcpp FAQ entry 3.12:
String literals delimited by quotes (e.g. "foo")
Integer and Decimal numeric values (e.g. 10 or 4.5)
Pre-defined constants including:
Booleans: true and false
Null Values: R_NilValue, NA_STRING, NA_INTEGER, NA_REAL, and NA_LOGICAL.
Selected vector types (CharacterVector, IntegerVector, and NumericVector) can be instantiated using the empty form of the ::create static member function.
Matrix types (CharacterMatrix, IntegerMatrix, and NumericMatrix) instantiated using the rows, cols constructor Rcpp::Matrix n(rows,cols).
If you were to specify numerical values for LLONG_MAX and LLONG_MIN, this would meet the criteria to directly use Rcpp attributes on the function. However, these values are implementation specific, so it would not be ideal to hardcode them. Thus, we have to seek an outside solution: the Rcpp::Nullable<T> class, which enables a default value of NULL. The reason we have to wrap the parameter type with Rcpp::Nullable<T> is that NULL is very special and can cause heartache if handled carelessly.
The NULL value, unlike values on the real number line, will not be used to bound your values in this case. As a result, it is the perfect candidate to use as a default in the function call. There are two choices you then have to make: use Rcpp::Nullable<T> for the parameters of the main function, or create a "logic" helper function that has the correct parameters and can be used elsewhere within your application without worry. I've opted for the latter below.
#include <Rcpp.h>
#include <float.h>
#include <cmath>
#include <climits> //for LLONG_MIN and LLONG_MAX
using namespace Rcpp;
NumericVector cumsum_bounded_logic(NumericVector x,
long long int upper = LLONG_MAX,
long long int lower = LLONG_MIN) {
NumericVector res(x.size());
double acc = 0;
for (int i=0; i < x.size(); ++i) {
acc += x[i];
if (acc < lower) acc = lower;
else if (acc > upper) acc = upper;
res[i] = acc;
}
return res;
}
// [[Rcpp::export]]
NumericVector cumsum_bounded(NumericVector x,
Rcpp::Nullable<long long int> upper = R_NilValue,
Rcpp::Nullable<long long int> lower = R_NilValue) {
if(upper.isNotNull() && lower.isNotNull()){
return cumsum_bounded_logic(x, Rcpp::as< long long int >(upper), Rcpp::as< long long int >(lower));
} else if(upper.isNull() && lower.isNotNull()){
return cumsum_bounded_logic(x, LLONG_MAX, Rcpp::as< long long int >(lower));
} else if(upper.isNotNull() && lower.isNull()) {
return cumsum_bounded_logic(x, Rcpp::as< long long int >(upper), LLONG_MIN);
} else {
return cumsum_bounded_logic(x, LLONG_MAX, LLONG_MIN);
}
// Required to quiet compiler
return x;
}
Test Output
cumsum_bounded(a, 5)
## [1] 1 2 3 4 5 5 5
cumsum_bounded(a, 5, 2)
## [1] 2 3 4 5 5 5 5
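As a side note on the clamping logic itself, here is a minimal plain C++ sketch, independent of Rcpp and not part of the answer above. It assumes the bounds are taken as double and that a missing bound is represented by plus or minus infinity, which can never clamp anything. The name cumsum_bounded_sketch is hypothetical. Whether such defaults can be written directly in an exported Rcpp signature is not something this sketch addresses; the Nullable wrapper above would still be the piece that maps NULL onto them.

```cpp
#include <limits>
#include <vector>

// Cumulative sum clamped to [lower, upper]; argument order follows the
// answer's function (upper first, then lower). Infinite defaults mean
// "no bound on that side".
std::vector<double> cumsum_bounded_sketch(
    const std::vector<double>& x,
    double upper = std::numeric_limits<double>::infinity(),
    double lower = -std::numeric_limits<double>::infinity()) {
    std::vector<double> res;
    res.reserve(x.size());
    double acc = 0.0;
    for (std::vector<double>::size_type i = 0; i < x.size(); ++i) {
        acc += x[i];
        if (acc < lower) acc = lower;       // clamp from below
        else if (acc > upper) acc = upper;  // clamp from above
        res.push_back(acc);
    }
    return res;
}
```

With a vector of seven ones, calling it with an upper bound of 5, and then with an upper bound of 5 and a lower bound of 2, reproduces the two result vectors shown in the test output above.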

What's wrong with my dynamic programming solution for uva 10739?

I'm solving this problem on UVa. I've found the recurrence relation and it works perfectly for the given test cases. However, without memoization it exceeds the time limit. I cached the values and returned the cached result (basic memoization). With caching, I'm getting an answer that is 1 more than the actual answer for the last two test cases. I can't work out what the bug might be, because it works if you take out the caching. Thanks for your help.
Code:
#include <algorithm> // std::min
#include <cstring>   // memset
#include <iostream>
#include <string>
using namespace std;
string a;
int n;
int dp[1005][1005];
int solve(int i, int j, int moves)
{
if(j<=i)
return dp[i][j] = moves;
if(dp[i][j]!=-1)
return dp[i][j];
if(a[i]==a[j])
return dp[i][j] = solve(i+1, j-1, moves);
else
return dp[i][j] = min(min(solve(i+1, j-1, moves+1), solve(i+1, j, moves+1)), solve(i, j-1, moves+1));
}
int main()
{
int T;
cin >> T;
while(T--)
{
cin >> a;
n = a.length();
memset(dp, -1, sizeof(dp));
int ans = solve(0, n-1, 0);
cout << ans << "\n";
}
}
Expected O/P for:
sadrulhabibchowdhury: 8
My Output: 9
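No answer is recorded for this question here. One explanation that fits the symptom: dp[i][j] stores the running total moves plus the cost of the range, and moves depends on the path by which (i, j) was reached. A later call that reaches the same (i, j) with a different moves value then gets a stale total back. A minimal sketch of the usual remedy follows, assuming the cache should hold only the cost of fixing the range i..j, with the 1 for the current operation added on the way back up. The names solve_range and min_ops_to_palindrome are hypothetical.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Cost of turning a[i..j] into a palindrome. The memo depends only on (i, j).
static int solve_range(const std::string& a, int i, int j,
                       std::vector<std::vector<int> >& memo) {
    if (j <= i) return 0;               // empty or single character: nothing to do
    int& cached = memo[i][j];
    if (cached != -1) return cached;
    if (a[i] == a[j]) {
        cached = solve_range(a, i + 1, j - 1, memo);
    } else {
        const int replace_one = solve_range(a, i + 1, j - 1, memo);
        const int drop_left   = solve_range(a, i + 1, j, memo);
        const int drop_right  = solve_range(a, i, j - 1, memo);
        cached = 1 + std::min(replace_one, std::min(drop_left, drop_right));
    }
    return cached;
}

int min_ops_to_palindrome(const std::string& a) {
    const int n = static_cast<int>(a.size());
    if (n == 0) return 0;
    std::vector<std::vector<int> > memo(n, std::vector<int>(n, -1));
    return solve_range(a, 0, n - 1, memo);
}
```

The recurrence is the same one the question uses; only what gets cached has changed.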

Frama-C slice: parallelizable loop

I am trying to perform a backward slicing of an array element at specific position. I tried two different source codes. The first one is (first.c):
const int in_array[5][5]={
1,2,3,4,5,
6,7,8,9,10,
11,12,13,14,15,
16,17,18,19,20,
21,22,23,24,25
};
int out_array[5][5];
int main(unsigned int x, unsigned int y)
{
int res;
int i;
int j;
for(i=0; i<5; i++){
for(j=0; j<5; j++){
out_array[i][j]=i*j*in_array[i][j];
}
}
res = out_array[x][y];
return res;
}
I run the command:
frama-c-gui -slevel 10 -val -slice-return main file.c
and get the following generated code:
int main(unsigned int x, unsigned int y)
{
int res;
int i;
int j;
i = 0;
while (i < 5) {
j = 0;
while (j < 5){
out_array[i][j] = (i * j) * in_array[i][j];
j ++;
}
i ++;
}
res = out_array[x][y];
return res;
}
This seems to be ok, since the x and y are not defined, so the "res" can be at any position in the out_array. I tried then with the following code:
const int in_array[5][5]={
1,2,3,4,5,
6,7,8,9,10,
11,12,13,14,15,
16,17,18,19,20,
21,22,23,24,25
};
int out_array[5][5];
int main(void)
{
int res;
int i;
int j;
for(i=0; i<5; i++){
for(j=0; j<5; j++){
out_array[i][j]=i*j*in_array[i][j];
}
}
res = out_array[3][3];
return res;
}
The result given was exactly the same. However, since I am explicitly looking for a specific position inside the array, and the loops are independent (parallelizable), I would expect the output to be something like this:
int main(void)
{
int res;
int i;
int j;
i = 3;
j = 3;
out_array[i][j]=(i * j) * in_array[i][j];
res = out_array[3][3];
}
I am not sure if it is clear from the examples. What I want to do is to identify, for a given array position, which statements impact its final result.
Thanks in advance for any support.
You obtain "the statements which impact the final result". The issue is that not all loop iterations are useful, but there is no way for the slicing to remove a statement from the code in its current form. If you perform syntactic loop unrolling, with -ulevel 5, then each loop iteration is individualized, and slicing can decide for each of them whether it is to be included in the slice or not. In the end, frama-c-gui -ulevel 5 -slice-return main loop.c gives you the following code:
int main(void)
{
int res;
int i;
int j;
i = 0;
i ++;
i ++;
i ++;
j = 0;
j ++;
j ++;
j ++;
out_array[i][j] = (i * j) * in_array[i][j];
res = out_array[3][3];
return res;
}
which is indeed the minimal set of instructions needed to compute the value of out_array[3][3].
Of course whether -ulevel n scales up to very high values of n is another question.

caught segfault, memory not mapped error in Rcpp trying to implement a function

I'm new to Rcpp and don't really know it well, but as a personal project I was trying to run some sorting algorithms using some C code that I had, converting it for use from R with Rcpp.
But I'm getting the memory not mapped error, and I don't really understand what I'm doing wrong, so if someone could enlighten me :)
The problem happens when I try the following code:
#include <Rcpp.h>
using namespace Rcpp;
void intercala(int p, int q, int r, NumericVector v)
{
int i, j, k;
NumericVector w = NumericVector::create();
i = p;
j = q;
k = 0;
while (i < q && j < r) {
if (v[i] < v[j]) {
w[k] = v[i];
i++;
}
else {
w[k] = v[j];
j++;
}
k++;
}
while (i < q) {
w[k] = v[i];
i++;
k++;
}
while (j < r) {
w[k] = v[j];
j++;
k++;
}
for (i = p; i < r; i++)
v[i] = w[i-p];
}
void mergesort(int p, int r, NumericVector v)
{
int q;
if (p < r - 1) {
q = (p + r) / 2;
mergesort(p, q, v);
mergesort(q, r, v);
intercala(p, q, r, v);
}
}
// [[Rcpp::export]]
NumericVector mergesortC(NumericVector vetor) {
int n = vetor.size();
mergesort(0,n,vetor);
return vetor;
}
This code is in a file called merge.cpp
Then, when I try to run it in R:
> library(Rcpp)
> sourceCpp("merge.cpp")
> vetor<-sample(1:10)
> vetor
[1] 1 5 7 4 3 8 9 2 10 6
> mergesortC(vetor)
[1] 1 2 3 4 5 6 7 8 9 10
> vetor
*** caught segfault ***
address 0x8, cause 'memory not mapped'
Possible actions:
1: abort (with core dump, if enabled)
2: normal R exit
3: exit R without saving workspace
4: exit R saving workspace
Selection:
It seems to me that I'm doing something I shouldn't: the code seems to work at first, but then I somehow corrupt the memory of the object vetor. I managed to get other algorithms working with Rcpp, but this one won't work and I don't understand what I'm doing wrong, so if anyone could spare a moment, I'd appreciate it.
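No answer is recorded for this question here. One thing that stands out in the code: NumericVector::create() with no arguments produces a zero-length vector, so every w[k] = ... in intercala writes past the end of w, which is undefined behavior and could corrupt memory in just this delayed way. A minimal sketch of the merge with the buffer sized up front follows, written with std::vector<double> so that it compiles without Rcpp. The names merge_sorted_halves and merge_sort_range are hypothetical. In the Rcpp version the analogous change would be to declare the buffer as NumericVector w(r - p);.

```cpp
#include <vector>

// Merge the sorted halves v[p..q) and v[q..r) back into v[p..r).
void merge_sorted_halves(int p, int q, int r, std::vector<double>& v) {
    std::vector<double> w(r - p);  // buffer allocated with its final size
    int i = p, j = q, k = 0;
    while (i < q && j < r) {
        if (v[i] < v[j]) w[k++] = v[i++];
        else             w[k++] = v[j++];
    }
    while (i < q) w[k++] = v[i++];  // leftovers from the left half
    while (j < r) w[k++] = v[j++];  // leftovers from the right half
    for (i = p; i < r; ++i) v[i] = w[i - p];
}

// Sort v[p..r) using the same half-open ranges as the question's code.
void merge_sort_range(int p, int r, std::vector<double>& v) {
    if (p < r - 1) {
        const int q = p + (r - p) / 2;
        merge_sort_range(p, q, v);
        merge_sort_range(q, r, v);
        merge_sorted_halves(p, q, r, v);
    }
}
```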

Recursion in C unable to return result from the prototype!?

I'm not sure why this recursion is not working! I'm trying to get the total of an input from i=0 to n. I'm also testing recursion instead of 'for loop' to see how it performs. Program runs properly but stops after the input. I would appreciate any comments, thx!
int sigma (int n)
{
if (n <= 0) // Base Call
return 1;
else {
printf ("%d", n);
int sum = sigma( n+sigma(n-1) );
return sum;
}
// recursive call to calculate any sum>0;
// for example: input=3; sum=(3+sigma(3-1)); sum=(3+sigma(2))
// do sigma(2)=2+sigma(2-1)=2+sigma(1);
// so sigma(1)=1+sigma(1-1)=1+sigma(0)=1;
// finally, sigma(3)=3+2+1+0=6
}
int main (int argc, char *argv[])
{
int n;
printf("Enter a positive integer for sum : ");
scanf( " %d ", &n);
int sum = sigma(n);
printf("The sum of all numbers for your entry: %d\n", sum);
getch();
return 0;
}
Change
int sum = sigma( n+sigma(n-1) );
to
int sum = n + sigma( n-1 );
As you've written it, calling sigma(3) then calls sigma(5), etc...
Also, return 0 from the guard case, not 1.
I think it should be
int sum = n + sigma(n-1);
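Putting both suggestions together, a minimal sketch of the corrected function could look like the block below. It compiles as C or C++. Separately, the trailing space in the question's scanf(" %d ", &n) format makes scanf keep consuming input until a non-whitespace character arrives, which would explain why the program appears to stop after the input; scanf("%d", &n) avoids that.

```cpp
// Sum of the integers from 0 to n; the guard case returns 0 so that it
// contributes nothing to the total.
int sigma(int n) {
    if (n <= 0)
        return 0;
    return n + sigma(n - 1);
}
```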
