I know that we can't overload a C++ function in R according to this Q/A. But sometimes we need it. The solution proposed in the linked question (a dispatcher built from templates and switches) seems like overkill to me. I'm wondering if there is an elegant solution for simple problems.
So here is a simple problem. x is a vector of int or a vector of double. sum(x == y) can be written in a faster and more memory-efficient way in C++.
int int_count_equal(IntegerVector x, int y)
{
return std::count(x.begin(), x.end(), y);
}
and
int num_count_equal(NumericVector x, double y)
{
return std::count(x.begin(), x.end(), y);
}
The two functions have the same body. It makes no sense, and it is inconvenient, to have two functions. How can we make a dispatcher in a single function count_equal without writing the same code twice?
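One way to keep the body in a single place is a function template plus a thin dispatcher that only selects the instantiation. The following is a minimal sketch in plain C++, not Rcpp code: AnyVector and ElemType are hypothetical stand-ins for an R vector whose element type is only known at run time, and count_equal_impl is a made-up helper name. In Rcpp the switch would presumably be on the vector's runtime type instead of this tag.

```cpp
#include <algorithm>
#include <vector>

// Shared body, written once for any container and value type.
template <typename Vec, typename T>
int count_equal_impl(const Vec& x, T y) {
    return static_cast<int>(std::count(x.begin(), x.end(), y));
}

// Hypothetical stand-in for an R vector: the element type is a run-time tag.
enum class ElemType { Integer, Real };

struct AnyVector {
    ElemType type;
    std::vector<int> ints;      // used when type == ElemType::Integer
    std::vector<double> reals;  // used when type == ElemType::Real
};

// Single entry point: the switch only picks the template instantiation.
int count_equal(const AnyVector& x, double y) {
    switch (x.type) {
        case ElemType::Integer:
            // Note: y is truncated to int here, so 2.5 would be compared as 2.
            return count_equal_impl(x.ints, static_cast<int>(y));
        case ElemType::Real:
            return count_equal_impl(x.reals, y);
    }
    return 0;  // not reached for valid tags
}
```

The switch still exists, but it contains no logic of its own, so the counting code is never duplicated.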
I have a C function from a down-stream library that I call in C like this
result = cfunction(input_function)
input_function is a callback that needs to have the following structure
double input_function(const double &x)
{
return(x*x);
}
Where x*x is a user-defined computation that is usually much more complicated. I'd like to wrap cfunction using Rcpp so that the R user could call it on arbitrary R functions.
NumericVector rfunction(Function F){
NumericVector result(1);
// MAGIC THAT I DON'T KNOW HOW TO DO
// SOMEHOW TURN F INTO COMPATIBLE input_function
result[0] = cfunction(input_function);
return(result);
}
The R user then might do rfunction(function(x) {x*x}) and get the right result.
I am aware that calling R functions within cfunction will kill the speed, but I expect I can work out how to pass compiled functions later on. I'd just like to get this part working.
The closest thing I can find to what I need is this: https://sites.google.com/site/andrassali/computing/user-supplied-functions-in-rcppgsl which wraps a function whose callback has an oh-so-useful second parameter into which I could stuff the R function.
Advice would be gratefully received.
One possible solution would be saving the R-function into a global variable and defining a function that uses that global variable. Example implementation where I use an anonymous namespace to make the variable known only within the compilation unit:
#include <Rcpp.h>
#include <memory> // std::unique_ptr, std::make_unique (needs C++14)
extern "C" {
double cfunction(double (*input_function)(const double&)) {
return input_function(42);
}
}
namespace {
std::unique_ptr<Rcpp::Function> func;
}
double input_function(const double &x) {
Rcpp::NumericVector result = (*func)(x);
return result(0);
}
// [[Rcpp::export]]
double rfunction(Rcpp::Function F){
func = std::make_unique<Rcpp::Function>(F);
return cfunction(input_function);
}
/*** R
rfunction(sqrt)
rfunction(log)
*/
Output:
> Rcpp::sourceCpp('57137507/code.cpp')
> rfunction(sqrt)
[1] 6.480741
> rfunction(log)
[1] 3.73767
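The same trampoline idea can be tried without R at all. Here is a rough sketch in plain C++, assuming a std::function in place of Rcpp::Function; cfunction is the same toy stand-in as above, and trampoline, current_callback and call_with are names I made up for illustration. Like the global-variable version, it is neither re-entrant nor thread-safe.

```cpp
#include <functional>
#include <utility>

// Stand-in for the C library routine: it only accepts a plain function pointer.
double cfunction(double (*input_function)(const double&)) {
    return input_function(42.0);
}

namespace {
// File-local slot holding the user's callable for the duration of one call.
std::function<double(double)> current_callback;

// Trampoline with exactly the signature the library expects.
double trampoline(const double& x) { return current_callback(x); }
}  // namespace

// Hypothetical wrapper: stores the callable, then hands the trampoline over.
double call_with(std::function<double(double)> f) {
    current_callback = std::move(f);
    const double result = cfunction(trampoline);
    current_callback = nullptr;  // do not keep the callable alive afterwards
    return result;
}
```

Because the callable lives in the slot rather than in the function pointer, even a capturing lambda can be passed, which a bare function pointer could not carry.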
I have a variable which is a vector of vectors. In C++ I can easily define and declare it, but in an OpenCL kernel I am running into issues. Here is an example of what I am trying to do.
std::vector<std::vector<double>> filters;
for (int m= 0;m<3;m++)
{
const auto& w = filters[m];
// ... sum operation using w
}
Here I can easily reference the values of filters[m] through w, but I am not able to do this in the OpenCL kernel file. Here is what I have tried, but it gives me the wrong output.
In host code:
filter_dev = cl::Buffer(context,CL_MEM_READ_ONLY|CL_MEM_USE_HOST_PTR,filter_size,(void*)&filters,&err);
filter_dev_buff = cl::Buffer(context,CL_MEM_READ_WRITE,filter_size,NULL,&err);
kernel.setArg(0, filter_dev);
kernel.setArg(1, filter_dev_buff);
In kernel code:
__kernel void forward_shrink(__global double* filters,__global double* weight)
{
int i = get_global_id(0); // I have tried using individual values of i in filters, just to check the output, but it does not give the same values as the serial C++ implementation
weight = &filters[i];
// ... sum operations using weight
}
Can anyone help me? Where am I going wrong, or what could the solution be?
You are doing multiple things wrong with your vectors.
First of all, (void*)&filters doesn't do what you want it to do. &filters is the address of the vector object itself, not a pointer to the beginning of the actual data. For that you'll have to use filters.data().
Second, you can't pass an array of arrays to OpenCL (and a vector of vectors even less so). You'll have to flatten the data yourself into a 1D array before you pass it to an OpenCL kernel.
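The flattening step could look roughly like this. It is a sketch that assumes the data is rectangular (every inner vector has the same length) and uses row-major order; flatten is a hypothetical helper name, not part of OpenCL.

```cpp
#include <cstddef>
#include <vector>

// Copies a rectangular vector-of-vectors into one contiguous row-major array.
// Element (m, j) ends up at index m * cols + j.
std::vector<double> flatten(const std::vector<std::vector<double>>& rows) {
    const std::size_t cols = rows.empty() ? 0 : rows.front().size();
    std::vector<double> flat;
    flat.reserve(rows.size() * cols);
    for (const auto& row : rows) {
        // Assumption: every row has exactly `cols` elements.
        flat.insert(flat.end(), row.begin(), row.end());
    }
    return flat;
}
```

The host would then hand flat.data() and flat.size() * sizeof(double) to the buffer, and the kernel would find the start of row m at filters + m * cols, with cols passed as an extra kernel argument.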
I am new to functional programming. In FP, recursion replaces the loops of imperative programming. Another claim is that FP gives high concurrency: instructions can be executed in parallel on multi-core/CPU systems because the data is immutable.
In recursion, however, steps cannot be executed in parallel, because each step depends on the previous step's result.
So, I am assuming that recursion in FP will not give high concurrency. Am I correct?
Sort of. You cannot get more execution parallelism than the data parallelism; this is Amdahl's law. However, you frequently have more data parallelism than is expressed in typical sequential algorithms, whether functional or imperative. Consider for example taking the scalar multiple of a vector (note: this is some made-up Algol-style language) [1]:
function scalar_multiple(scalar c, vector v) {
vector v1;
for (int i = 0; i < length(v); i++) {
v1[i] = c * v[i];
}
return v1;
}
Obviously, this isn't going to run in parallel. The situation isn't improved if we re-write in a functional language, using recursion (you can think of this as Haskell):
scalar_multiple c [] = []
scalar_multiple c (x:xn) = c * x : scalar_multiple c xn
This is still a sequential algorithm!
However, notice that there is no data dependency: you don't actually need the results of earlier multiplications to calculate later ones. So we have the potential for parallelization here. This can be accomplished in an imperative language:
function scalar_multiple(scalar c, vector v) {
vector v1;
parallel_for (int i in 0..length(v)-1) {
v1[i] = c * v[i];
}
return v1;
}
But this parallel_for is a dangerous construct. Consider a search function:
function first(predicate p, vector v) {
for (int i = 0; i < length(v); i++) {
if (p(v[i])) return i;
}
return -1;
}
If we try speeding this up by replacing for with parallel_for:
function first(predicate p, vector v) {
parallel_for (int i in 0..length(v)-1) {
if (p(v[i])) return i;
}
return -1;
}
Now we won't necessarily return the index of the first element to satisfy the condition, just the index of some element that satisfies it. We broke the contract of the function by parallelizing it.
The obvious solution is 'don't allow return inside parallel_for'. But there are lots of other dangerous constructs; in fact, you'll notice I had to abandon the C-style for loop because the increment-and-test pattern itself is dangerous in parallel languages. Consider:
function sequence(int n) {
vector v;
int c = 0;
parallel_for (int i in 0..n-1) {
v[i] = c++;
}
return v;
}
This is again a 'toy' example ("just use v[i] = i;!"), but it illustrates the point: this function fills v in a nondeterministic order, because the parallel iterations all race on c. So it turns out that the constructs that are 'safe' to use inside a construct like parallel_for are precisely the constructs that are allowed in purely functional languages, which makes adding parallel constructs to those languages 'safer' than adding them to imperative languages.
[1] This is just a very simple example; of course, real parallelism involves finding bigger chunks of work to parallelize than this!
Not sure if I understand you correctly, but it generally depends on what you want to accomplish.
A single recursion cannot execute its subcalls in parallel. But you CAN have two recursions working on the same dataset, e.g. processing an array from the left AND from the right simultaneously through two concurrently running recursive functions. Those (two) functions can then (theoretically) run in parallel.
In detail, it does not matter whether you have a recursive function or a function with a loop inside, as long as there is a function that can run on its own. So with respect to your question:
No, a recursive function by definition does not give you any concurrency.
Loops are replaced by higher-order functions more frequently than by direct recursion. Recursion is sort of a catch-all measure in functional programming for when higher-order functions don't already exist for what you need to do.
For example, if you want to run the same calculation on all elements of a list, you use a map, which is highly parallelizable. Finding which elements meet certain criteria is a filter, also highly parallelizable.
Some algorithms just plain require the result of the previous iteration in order to proceed. Those are the ones that tend to require a recursive function, and you're right, they are not generally easy to make highly concurrent.
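To make the map and filter mentioned above concrete, here is a small C++ sketch (C++ only because it is widely readable; the point is language-independent). scalar_multiple and keep_even are illustrative names. The standard algorithms used here run sequentially as written; what matters is that no element depends on another, which is what leaves a parallel implementation free to split the work.

```cpp
#include <algorithm>
#include <iterator>
#include <vector>

// map: each output element depends only on the matching input element.
std::vector<double> scalar_multiple(double c, const std::vector<double>& v) {
    std::vector<double> out(v.size());
    std::transform(v.begin(), v.end(), out.begin(),
                   [c](double x) { return c * x; });
    return out;
}

// filter: each element is tested independently of the others.
std::vector<int> keep_even(const std::vector<int>& v) {
    std::vector<int> out;
    std::copy_if(v.begin(), v.end(), std::back_inserter(out),
                 [](int x) { return x % 2 == 0; });
    return out;
}
```

Neither function mentions an index, a counter, or an early return, so none of the hazards of a hand-written loop can creep in.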
I have a user-defined function in R:
blacksch<-function(s_0,k,sigma,r,t)
{
d1=(log(s_0/k) + (r + (sigma^2)/2)*(t))/(sigma*sqrt(t))
d2=(log(s_0/k) + (r - (sigma^2)/2)*(t))/(sigma*sqrt(t))
p=(pnorm(-d2)*k*exp(-r*t))-pnorm(-d1)*s_0
}
And I would like to use this function in c++ code that I have written using Rcpp and cppFunction. I have been through the documentation and examples a few times, but have not been successful.
bs_martin<-cppFunction('NumericMatrix compMartin (NumericMatrix st, NumericMatrix dv, double s_0, double k,
double t, double sigma, double r, int steps, int paths, Function blacksch(fun)) {
// Ensure RNG scope set
RNGScope scope;
int min_bs_step=0;
double minbsvalue=0;
vector<double> u[0]=100.0;
for(int i=1;i<=paths; i++)
{
min_bs_step=0;
for(int j=1;j<=steps;j++)
{
if (dv[i,j]>0 && min_bs_step==0)
{
min_bs_step=i;
minbsvalue=blacksch(s_0,k,sigma,r,t);
}
else if (min_bs_step!=0)
{
dv[i,j]=1 - minbsvalue;
}
}
}
return dv;
}')
I would suggest the following:
Study our documentation and examples. We show how to pass functions around too, even if we do not recommend it (for obvious performance reasons; calling R from C++ ain't speedy).
If your somewhat complex example does not work, try a smaller one. At the end of the day you may just want a tester which receives two numbers and passes those to a supplied function.
And lastly: You really want blacksch in C++ too. All the statistical functions are available under the same names.
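As a rough idea of what blacksch could look like on the C++ side, here is a sketch in plain standard C++. It swaps R's pnorm for a hand-written norm_cdf built on std::erfc (my substitution, not the Rcpp function), and otherwise follows the R formula above term by term.

```cpp
#include <cmath>

// Standard normal CDF via the complementary error function
// (stand-in for R's pnorm with default mean 0 and sd 1).
double norm_cdf(double x) {
    return 0.5 * std::erfc(-x / std::sqrt(2.0));
}

// Same formula as the R function blacksch (Black-Scholes put price).
double blacksch(double s_0, double k, double sigma, double r, double t) {
    const double vol = sigma * std::sqrt(t);
    const double d1 = (std::log(s_0 / k) + (r + sigma * sigma / 2.0) * t) / vol;
    const double d2 = (std::log(s_0 / k) + (r - sigma * sigma / 2.0) * t) / vol;
    return norm_cdf(-d2) * k * std::exp(-r * t) - norm_cdf(-d1) * s_0;
}
```

With this in place the inner loop calls a compiled function directly and no R function needs to be passed in at all.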
I know this is probably a very simple question, but how would I do something like n^2 in a programming language?
Is it n * n? Or is there another way?
n * n is the easiest way.
For languages that support the exponentiation operator (** in this example), you can also do n ** 2
Otherwise you could use a math library to call a function such as pow(n, 2), but that is probably overkill for simply squaring a number.
n * n will almost always work. The couple of cases where it won't are prefix languages (Lisp, Scheme, and co.) or postfix languages (Forth, Factor, dc), but obviously then you can just write (* n n) or n n * respectively.
It will also fail when the result overflows:
#include <limits.h>
#include <stdio.h>
int main()
{
volatile int x = INT_MAX;
printf("INT_MAX squared: %d\n", x * x);
return 0;
}
I threw the volatile qualifier on there just to point out that this can be compiled with -Wall and not raise any warnings, but on my 32-bit computer this says that INT_MAX squared is 1. (Signed integer overflow is undefined behavior in C, so even that result is not guaranteed.)
Depending on the language, you might have a power function such as pow(n, 2) in C, or math.pow(n, 2) in Python. Since those power functions convert to floating-point numbers, they are more useful in cases where overflow is possible.
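If you want to stay with integers, another option is to detect the overflow instead of letting it happen. A minimal sketch, assuming long long is wider than int (true on mainstream platforms, but not guaranteed by the standard); checked_square is a made-up name:

```cpp
#include <climits>

// Returns true and stores n * n in *out when the square fits in an int;
// returns false (leaving *out untouched) when it would overflow.
bool checked_square(int n, int* out) {
    // Multiply in a wider type so the multiplication itself cannot overflow.
    const long long wide = static_cast<long long>(n) * n;
    if (wide > INT_MAX) return false;  // a square is never negative
    *out = static_cast<int>(wide);
    return true;
}
```

The caller then decides what to do on failure, rather than silently receiving a wrapped-around value.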
There are many programming languages, each with their own way of expressing math operations.
Some common ones will be:
x*x
pow(x,2)
x^2
x ** 2
square(x)
(* x x)
If you specify a specific language, we can give you more guidance.
If n is an integer :p :
int res=0;
for(int i=0; i<n; i++)
res+=n; //res=n+n+...+n=n*n
For positive integers you may use recursion:
int square(int n){
if (n>1)
return square(n-1)+(n-1)+n;
else
return 1;
}
Calculate using array allocation (extremely sub-optimal):
#include <iostream>
using namespace std;
int heapSquare(int n){
return sizeof(char[n][n]);
}
int main(){
for(int i=1; i<=10; i++)
cout << heapSquare(i) << endl;
return 0;
}
Using bit shift (ancient Egyptian multiplication):
int sqr(int x){
int i=0;
int result = 0;
for (;i<32;i++)
if (x>>i & 0x1)
result+=x << i;
return result;
}
Assembly:
int x = 10;
__asm__ __volatile__("imul %%eax,%%eax"
:"=a"(x)
:"a"(x)
);
printf("x*x=%d\n", x);
Always use the language's multiplication, unless the language has an explicit square function. Specifically avoid using the pow function provided by most math libraries. Multiplication will (except in the most outrageous of circumstances) always be faster, and -- if your platform conforms to the IEEE-754 specification, which most platforms do -- will deliver a correctly-rounded result. In many languages, there is no standard governing the accuracy of the pow function. It will generally give a high-quality result for such a simple case (many library implementations will special-case squaring to save programmers from themselves), but you don't want to depend on this[1].
I see a tremendous amount of C/C++ code where developers have written:
double result = pow(someComplicatedExpression, 2);
presumably to avoid typing that complicated expression twice or because they think it will somehow slow down their code to use a temporary variable. It won't. Compilers are very, very good at optimizing this sort of thing. Instead, write:
const double myTemporaryVariable = someComplicatedExpression;
double result = myTemporaryVariable * myTemporaryVariable;
To sum up: Use multiplication. It will always be at least as fast and at least as accurate as anything else you can do[2].
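If typing the temporary every time is the objection, a tiny helper does the same job. This is only a sketch; square is a name I am introducing here, not a standard library function.

```cpp
// Evaluates its argument once and multiplies it by itself.
// Works for any type that supports operator* (int, double, ...).
template <typename T>
constexpr T square(T x) {
    return x * x;
}
```

Because the argument is bound to the parameter x before the multiplication, square(someComplicatedExpression) evaluates the expression exactly once, just like the version with the explicit temporary.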
1) Recent compilers on mainstream platforms can optimize pow(x,2) into x*x when the language semantics allow it. However, not all compilers do this at all optimization settings, which is a recipe for hard-to-debug rounding errors. Better not to depend on it.
2) For basic types. If you really want to get into it, if multiplication needs to be implemented in software for the type that you are working with, there are ways to make a squaring operation that is faster than multiplication. You will almost never find yourself in a situation where this matters, however.