In Rust by Example #36, the sum of all squared odd numbers under a limit is calculated in both imperative style and functional style.
I separated the two versions, increased the upper limit to 10000000000000000, and timed the results:
Imperative style:
me.home:rust_by_example>time ./36_higher_order_functions_a
Find the sum of all the squared odd numbers under 10000000000000000
imperative style: 333960700851149440
real 0m2.396s
user 0m2.387s
sys 0m0.009s
Functional style:
me.home:rust_by_example>time ./36_higher_order_functions_b
Find the sum of all the squared odd numbers under 10000000000000000
functional style: 333960700851149440
real 0m5.192s
user 0m5.188s
sys 0m0.003s
The functional version runs slower and also takes very slightly longer to compile.
My question is, what causes the functional version to be slower? Is this inherent to the functional style or is it due to the compiler not optimising as well as it could?
what causes the functional version to be slower? Is this inherent to the functional style or is it due to the compiler not optimising as well as it could?
Generally, the compiler will translate the higher level/shorter functional version to an imperative encoding as part of code generation. It may also apply optimizations that improve performance.
If the compiler has poor optimizations or a poor code generator, the functional code may be worse than manually-written versions.
It's really up to the compiler. Start by enabling optimizations.
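For reference, the two versions being compared look roughly like this (a sketch along the lines of the Rust by Example program; the limit here is smaller and the exact code in the book differs slightly):

fn is_odd(n: u64) -> bool {
    n % 2 == 1
}

fn main() {
    let upper: u64 = 1_000_000;

    // Imperative style: explicit loop and a mutable accumulator.
    let mut acc: u64 = 0;
    for n in 0u64.. {
        let n_squared = n * n;
        if n_squared >= upper {
            break;
        } else if is_odd(n_squared) {
            acc += n_squared;
        }
    }
    println!("imperative style: {}", acc);

    // Functional style: the same computation as a chain of iterator adaptors.
    let sum: u64 = (0u64..)
        .map(|n| n * n)
        .take_while(|&n_squared| n_squared < upper)
        .filter(|&n_squared| is_odd(n_squared))
        .sum();
    println!("functional style: {}", sum);
}

Without optimizations (a plain rustc invocation or a debug cargo build), the iterator adaptors in the functional version go through layers of closure calls that the explicit loop never pays for; with rustc -O or cargo build --release those adaptors are typically inlined away, and the two versions usually compile down to very similar machine code.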
I'm not a native English speaker. While reading Why Functional Programming Matters, I got a little confused by the following paragraph:
Even a functional programmer should be dissatisfied with these so-called advantages, because they give no help in exploiting the power of functional languages. One cannot write a program that is particularly lacking in assignment statements, or particularly referentially transparent. There is no yardstick of program quality here, and therefore no ideal to aim at.
I'm not sure whether the last sentence means "deliberately writing no assignment statements does not improve a program's quality and therefore one should not do it" or just "there is no general yardstick of program quality".
Sorry if I asked in the wrong place.
Any help would be appreciated.
By a "yardstick of program quality", he means something you can measure or you can at least qualitatively look at to see if you're doing a good job or a bad one. So attributes like speed, clarity of code, parallelizability, ease of making changes, ease of re-factoring, etc. are all either quantitative attributes you can measure or qualitative attributes where you can see that a new version of your program is better or worse than the old version. You can give yourself, as a programmer, a job like "double the speed of this program" or "make it easier to add features to this program". On the other hand:
One cannot write a program that is particularly lacking in assignment statements, or particularly referentially transparent.
These things are binary (despite what ML programmers will tell you). You cannot give yourself an assignment like "make this program use fewer assignment statements" (according to Hughes) or "make this program more referentially transparent". That also makes it hard to tell if you've done a good job writing your program in the first place, or if you still have room for improvement.
Are there any software tools for performing arithmetic on very large numbers in parallel? What I mean by parallel is that I want to use all available cores on my computer for this.
The constraints are wide open for me. I don't mind trying any language or tech.
Please and thanks.
It seems like you are either dividing really huge numbers or using a suboptimal algorithm. Parallelizing across a fixed number of cores only tweaks the constants; it has no effect on the asymptotic behavior of your operation. And if you're talking about hours for a single division, asymptotic behavior is what matters most. So I suggest you first make sure your asymptotic complexity is as good as it can be, and then start looking for ways to improve the constants, perhaps by parallelizing.
Wikipedia suggests Barrett division, and GMP has a variant of that. I'm not sure whether what you've tried so far is on a similar level, but unless you are sure that it is, I'd give GMP a try.
See also Parallel Modular Multiplication on Multi-core Processors for recent research. Haven't read into that myself, though.
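To give a feel for what Barrett's trick does, here is a word-sized sketch; GMP and the research mentioned above apply the same structure to multi-word numbers, so this fixed-width version is only meant to show the shape of the idea (division by a fixed modulus replaced by a multiplication with a precomputed constant plus a small correction):

// Barrett reduction: replace repeated division by a fixed modulus n with
// multiplications by a precomputed constant m = floor(2^64 / n).
struct Barrett {
    n: u64,
    m: u64, // floor(2^64 / n)
}

impl Barrett {
    fn new(n: u64) -> Barrett {
        assert!(n > 1 && n < (1 << 63));
        Barrett { n, m: ((1u128 << 64) / n as u128) as u64 }
    }

    // Compute x mod n using only multiplications, shifts and one correction.
    fn reduce(&self, x: u64) -> u64 {
        let q = ((x as u128 * self.m as u128) >> 64) as u64; // estimate of x / n
        let mut r = x - q * self.n; // r < 2n, so at most one subtraction below
        if r >= self.n {
            r -= self.n;
        }
        r
    }
}

fn main() {
    let b = Barrett::new(1_000_000_007);
    let x: u64 = 123_456_789_123_456_789;
    assert_eq!(b.reduce(x), x % 1_000_000_007);
    println!("{}", b.reduce(x));
}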
The only effort I am aware of is a CUDA library called CUMP. However, the library only provides support for addition, subtraction and multiplication. Still, you can use multiplication to perform the division on the GPU and check whether the quality of the result is sufficient for your particular problem.
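To make the "use multiplication to perform the division" remark concrete, here is a minimal sketch of the usual trick in ordinary double precision: Newton's iteration for the reciprocal, x_{k+1} = x_k * (2 - d * x_k), which converges quadratically and needs only multiplication and subtraction after the initial guess. Arbitrary-precision libraries such as GMP apply related ideas (Newton inversion, Barrett reduction as sketched above) at full precision for large operands; the code below is only illustrative, not arbitrary precision.

// Approximate 1/d with Newton's iteration; only multiplications and
// subtractions are used after the power-of-two initial guess.
fn reciprocal(d: f64) -> f64 {
    assert!(d > 0.0);
    // Initial guess 2^(-ceil(log2 d)), so d * x starts roughly in [0.5, 1].
    let mut x = 2.0f64.powi(-(d.log2().ceil() as i32));
    for _ in 0..6 {
        x = x * (2.0 - d * x); // the error roughly squares on every step
    }
    x
}

fn main() {
    let (n, d) = (355.0, 113.0);
    // Division expressed as multiplication by the approximate reciprocal.
    println!("{}", n * reciprocal(d)); // prints something close to 3.14159...
}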
I saw that the latest R version supports byte compilation. What is the performance gain I can expect? And are specific tasks more positively impacted than others?
It is a good question:
The byte compiler appeared with 2.13.0 in April, and some of us ran tests then (with thanks to gsk3 for the links).
I tend to look more at what Rcpp can do for us, but I now always include 'uncompiled R' and 'byte compiled R' as baselines.
I fairly often see gains by a factor of two (e.g. the varSims example) but also essentially no gains at all (the fibonacci example, which is of course somewhat extreme).
We could do with more data. When we had the precursor byte compiler in the Ra engine by Stephen Milborrow (which I had packaged up for Debian too), it was often said that both algebraic expressions and loops benefited. I don't know of a clear rule for the byte compiler by Luke Tierney now in R, but it generally never seems to hurt, so it may be worthwhile to get into the habit of turning it on.
Anywhere from essentially no gain to roughly a five-fold speedup, depending on the task:
http://radfordneal.wordpress.com/2011/05/13/speed-tests-for-r-%E2%80%94-and-a-look-at-the-compiler/
http://dirk.eddelbuettel.com/blog/2011/04/12/
This is a fairly general question about the future of R: is there any hope of seeing a merger of compiler and Rllvm (from Omegahat), or another JIT compilation scheme for R? (I know there is Ra, but it has not been updated recently.)
In my tests the speed gains from the compiler are marginal for "complicated" functions...
What matters isn't how complicated a function is but what kinds of computations it performs. The compiler will make the most difference for functions dominated by interpreter overhead, such as ones that perform mostly simple operations on scalar or other small data. In cases like that I have seen a factor of 3 for artificial examples and a bit better than a factor of 2 for some production code. Functions that spend most of their time in operations implemented in native code, like linear algebra operations, will see little benefit.
This is just the first release of the compiler and it will evolve over time. LLVM is one of several possible directions we will look at, but probably not for a while. In any case, I would expect using something like LLVM to provide further improvements in cases where the current compiler already makes a difference, but not to add much in cases where it does not.
(Moving from a comment to an answer ...)
This sounds more like a question for the R development mailing list. Based on my general impressions I would say "probably not". Are your complicated functions already based on heavily vectorized (and hence efficient) functions? I think a more promising direction for not-so-easily-automatically-optimized situations is the increased simplicity of embedding C++ etc. (i.e. Rcpp), with inline if necessary.
I'm implementing the Euclidean algorithm for finding the GCD (Greatest Common Divisor) of two integers.
Two sample implementations are given: Recursive and Iterative.
http://en.wikipedia.org/wiki/Euclidean_algorithm#Implementations
My Question:
In school I remember my professors talking about recursive functions like they were all the rage, but I have one doubt. Compared to an iterative version, don't recursive algorithms take up more stack space and therefore much more memory? Also, because calling a function incurs some overhead for initialization, aren't recursive algorithms slower than their iterative counterparts?
It depends entirely on the language. If your language supports tail-call optimization (a lot do nowadays), then the two will run at roughly equal speed. If it does not, then the recursive version will be slower and take more (precious) stack space.
It all depends on the language and compiler. Current computers aren't really geared towards efficient recursion, but some compilers can optimize some cases of recursion to run just as efficiently as a loop (essentially, it becomes a loop in the machine code). Then again, some compilers can't.
Recursion is perhaps more beautiful in a mathematical sense, but if you feel more comfortable with iteration, just use it.
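For illustration, the recursive and iterative shapes being discussed look like this in Rust (a sketch; the Wikipedia page linked in the question gives language-agnostic versions):

// Recursive form: the recursive call is in tail position.
fn gcd_recursive(a: u64, b: u64) -> u64 {
    if b == 0 {
        a
    } else {
        gcd_recursive(b, a % b)
    }
}

// Iterative form: the same computation as a loop, using constant stack space.
fn gcd_iterative(mut a: u64, mut b: u64) -> u64 {
    while b != 0 {
        let r = a % b;
        a = b;
        b = r;
    }
    a
}

fn main() {
    assert_eq!(gcd_recursive(1071, 462), 21);
    assert_eq!(gcd_iterative(1071, 462), 21);
}

Because the recursive call is in tail position, a compiler that performs tail-call elimination can turn it into essentially the loop of the iterative version. Rust, for instance, does not guarantee that optimization, but in practice an optimizing build usually applies it to a function like this one.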