OpenCL: better to use macros or functions?

In OpenCL, which is based on C99, we have two options for creating something function-like:
macros
functions
[edit: well, or use a templating language, new option 3 :-) ]
I heard somewhere (I can't find an official reference, just a comment I once saw on Stack Overflow) that functions are almost always inlined in practice, and that using functions is therefore OK performance-wise?
But macros are basically guaranteed to be inlined, by the nature of macros. On the other hand, they are susceptible to bugs (e.g. if you don't add parentheses around everything) and are not type-safe.
In practice, what works well? What is most standard? What is likely to be most portable?
I suppose my requirements are some combination of:
as fast as possible
as little register pressure as possible
when used with compile-time constants, should ideally be guaranteed to be optimized-away to another constant
easy to maintain...
standard, not too weird, since I'm looking at using this for an open-source project that I hope other people will contribute to

But macros are basically guaranteed to be inlined, by the nature of macros.
On GPU at least, the OpenCL compilers aggressively inline pretty much everything, with the exception of recursive functions (OpenCL 2.0). This is done both because of hardware constraints and for performance reasons.
While this is indeed implementation dependent, I have yet to see a GPU binary that is not aggressively inlined. I do not work much with CPU OpenCL but I believe the optimizer strategy may be similar, although the hardware constraints are not the same.
But as far as the standard is concerned, there are no guarantees.
Let's go through your requirements:
as fast as possible
as little register pressure as possible
when used with compile-time constants, should ideally be guaranteed to be optimized-away to another constant
Inline functions are as fast as macros, do not use more registers, and will be optimized away when possible.
easy to maintain...
Functions are much easier to maintain than macros. They are type-safe, they can be refactored easily, etc.; the list goes on forever.
standard, not too weird, since I'm looking at using this for an open-source project that I hope other people will contribute to
I believe that is very subjective. I personally hate macros with a passion and avoid them like the plague. But I know some very successful projects that use them extensively (for example Boost.Compute). It's up to you.
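To make the trade-off concrete, here is a minimal, hypothetical OpenCL C sketch (the names are invented); on the GPU compilers described above both forms typically end up inlined, but only the function gets type checking and easy refactoring:

    // Macro version: textually substituted, so effectively always "inlined",
    // but every use of the argument needs parentheses and nothing is type-checked.
    #define SQUARE(x) ((x) * (x))

    // Function version: type-checked and easy to refactor; in practice the
    // GPU OpenCL compilers inline it anyway.
    float square(float x)
    {
        return x * x;
    }

    __kernel void demo(__global float *out)
    {
        size_t i = get_global_id(0);
        out[i] = SQUARE((float) i) + square((float) i);
    }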

Related

What are the main differences between CLISP, ECL, and SBCL?

I want to do some simulations with ACT-R and I will need a Common Lisp implementation. I have three Common Lisp implementations available: (1) CLISP, (2) ECL, and (3) SBCL. As you might have gathered from the links, I have read a bit about all three of them on Wikipedia. But I would like the opinion of some experienced users. More specifically I would like to know:
(i) What are the main differences between the three implementations (e.g.: What are they best at? Is any of them used only for specific purposes and might therefore not be suited for specific tasks?)?
(ii) Is there an obvious choice either based on the fact that I will be using ACT-R or based on general reasons?
As this could be interpreted as a subjective question, I checked "What topics can I ask about here?" and "What types of questions should I avoid asking?", and if I read correctly it should not qualify as forbidden fruit.
I wrote a moderately-sized application and ran it in SBCL, CCL, ECL, CLISP, ABCL, and LispWorks. For my application, SBCL is far and away the fastest, and it's got a pretty good debugger. It's a bit strict about some warnings--you may end up coding in a slightly more regimented way, or turn off one or more warnings.
I agree with Sylwester: If possible, write to the standard, and then you can run your code in any implementation. You'll figure out through testing which is best for your project.
Since SBCL compiles so aggressively, once in a while the stacktrace in the debugger is less informative than I'd like. This can probably be controlled with parameters, but I just rerun the same code in one of the other implementations. ABCL has an informative stacktrace, for example, as I recall. (It's also very slow, but if you want real Common Lisp and Java interoperability, it's the only option.)
One of the nice things about Common Lisp is how many high-quality implementations there are, most of them free.
For informal use (e.g. to learn Common Lisp), CCL or CLISP may be a better choice than SBCL.
I have never tried compiling to C using ECL. It's possible that it would beat SBCL on speed for some applications. I have no idea.
CLISP and LispWorks will not handle arbitrarily long argument lists (unless that's been fixed in the last couple of years, but I doubt it). This turned out to be a problem with my application, but would not be a problem for most code.
Doesn't ACT-R come out of Carnegie Mellon? What do its authors use? My guess would be CMUCL or SBCL, which is derived from CMUCL. (I only tried CMUCL briefly. Its interpreter is very slow, but I assume that compiled code is very fast. I think that most people choose SBCL over CMUCL, however.)
(It's possible that this question belongs on Programmers.SE.)
In general, SBCL is the default choice among open-source Lisps. It is solid, well-supported, produces fast code, and provides many goodies beyond what the standard mandates (concurrency primitives, profiling, etc.) Another implementation with similar properties is CCL.
CLISP is more suitable if you're not an engineer, or you want to quickly show Lisp to someone who isn't an engineer. It's a pretty basic implementation, but quick to get running and user-friendly. A Lisp calculator :)
ECL's major selling point is that it's embeddable, i.e. it is rather easy to make it work inside some C application, like a web server etc. It's a good choice for geeks who want to explore solutions on the boundary of Lisp and the outside world. If you're not interested in such a use case I wouldn't recommend trying it, especially since it is not actively supported at the moment.
Their names, their bugs, and their non-standard additions (using them will lock you in).
I use CLISP as a REPL and for testing during development, and usually SBCL for production. ECL I've never used.
I recommend you test your code with more than one implementation.

How to set a pragma to constrain Ada's functionality to Pascal

I intend to use Ada for some programs. I remember reading somewhere that with pragmas you can set compiler instructions to optimize your program. More specifically, I remember reading that if you need only a limited subset of Ada functionality (basically corresponding to Pascal, but with Ada's strong typing), you can use pragmas to specify a sort of 'Pascal-like mode' (I use this term for lack of a better expression). My aim is disabling those runtime checks that I don't need (since I only need basic functionality), thus reducing the size of the executable and enhancing the performance.
My question is: how do I set such a pragma? What parameters/options should I specify?
Thank you
This perhaps comes from a misunderstanding.
Ada is not a superset of Pascal. It is a fair bit more accurate to view them as sibling languages of the parent language Algol 60. Pascal was originally developed by Niklaus Wirth to be a simplified version of Algol 60. The Algol folks instead went the other way with what became Algol 68.
Ada was instead a new language designed from scratch that borrowed from Algol 60's syntax (in much the way that Java borrows from C's syntax). It is however much more complex (some would use the word "functional") than even Algol 68.
So asking for a "Pascal flag" in your Ada compiler is much like asking for a "C++ flag" in your Java compiler.
If you are just looking for a free Pascal compiler, you might instead look at using Free Pascal or GNU Pascal.
If you are just looking to decrease the overhead of unused runtime facilities, you should look into Annex H, which lets you use pragma Restrictions() to selectively disallow access to parts of the Ada runtime. That allows you to get rid of things like floating-point, dynamic allocation, dynamic dispatch, tasking, exceptions/runtime constraint checks, etc.
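A minimal sketch of what that might look like; the particular restriction identifiers below are just illustrative examples from Annex H / D.7, typically placed in a configuration pragmas file (gnat.adc for GNAT):

    --  Illustrative restrictions; pick only the ones your program can live with.
    pragma Restrictions (No_Tasking);          --  drop the tasking runtime
    pragma Restrictions (No_Exceptions);       --  no exception propagation support
    pragma Restrictions (No_Allocators);       --  forbid dynamic allocation via "new"
    pragma Restrictions (No_Floating_Point);   --  no floating-point types or operations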
I'm sorry, but this is a Bad Idea. If you want to avoid any possible overhead from tasking constructs, then don't use tasking! People often want to suppress constraint checking (which you can do in GNAT by compiling with -gnatp) but - in my experience - you rarely get more than a small improvement.
Ada now has pragma Restrictions, which prevents you from using certain features; GNAT's supported restrictions are listed in its documentation. The aims are to support production of high-integrity software, portable software, or efficient tasking runtimes.
That 'Pascal-like mode' sounds like an implementation-specific pragma, unless I'm misunderstanding you.
Though there are the `Optimize` (Time or Space) and `Restrictions` pragmas that might impact your final size.

Do functional languages cope well with complexity?

I am curious how functional languages compare (in general) to more "traditional" languages such as C# and Java for large programs. Does program flow become difficult to follow more quickly than if a non-functional language is used? Are there other issues or things to consider when writing a large software project using a functional language?
Thanks!
Functional programming aims to reduce the complexity of large systems, by isolating each operation from others. When you program without side-effects, you know that you can look at each function individually - yes, understanding that one function may well involve understanding other functions too, but at least you know it won't interfere with some other piece of system state elsewhere.
Of course this is assuming completely pure functional programming - which certainly isn't always the case. You can use more traditional languages in a functional way too, avoiding side-effects where possible. But the principle is an important one: avoiding side-effects leads to more maintainable, understandable and testable code.
Does program flow become difficult to follow more quickly than if a non-functional language is used?
"Program flow" is probably the wrong concept to analyze a large functional program. Control flow can become baroque because there are higher-order functions, but these are generally easy to understand because there is rarely any shared mutable state to worry about, so you can just think about arguments and results. Certainly my experience is that I find it much easier to follow an aggressively functional program than an aggressively object-oriented program where parts of the implementation are smeared out over many classes. And I find it easier to follow a program written with higher-order functions than with dynamic dispatch. I also observe that my students, who are more representative of programmers as a whole, have difficulties with both inheritance and dynamic dispatch. They do not have comparable difficulties with higher-order functions.
Are there other issues or things to consider when writing a large software project using a functional language?
The critical thing is a good module system. Here is some commentary.
The most powerful module system I know of is the unit system of PLT Scheme, designed by Matthew Flatt and Matthias Felleisen. This very powerful system unfortunately lacks static types, which I find a great aid to programming.
The next most powerful system is the Standard ML module system. Unfortunately Standard ML, while very expressive, also permits a great many questionable constructs, so it is easy for an amateur to make a real mess. Also, many programmers find it difficult to use Standard ML modules effectively.
The Objective Caml module system is very similar, but there are some differences which tend to mitigate the worst excesses of Standard ML. The languages are actually very similar, but the styles and idioms of Objective Caml make it significantly less likely that beginners will write insane programs.
The least powerful/expressive module system for a functional language is the Haskell module system. This system has a grave defect: there are no explicit interfaces, so most of the cognitive benefit of having modules is lost. Another sad outcome is that while the Haskell module system gives users a hierarchical namespace, use of this namespace (import qualified, in case you're an insider) is often deprecated, and many Haskell programmers write code as if everything were in one big, flat namespace. This practice amounts to abandoning another of the big benefits of modules.
If I had to write a big system in a functional language and had to be sure that other people understood it, I'd probably pick Standard ML, and I'd establish very stringent programming conventions for use of the module system. (E.g., explicit signatures everywhere, opaque ascription with :>, and no use of open anywhere, ever.) For me the simplicity of the Standard ML core language (as compared with OCaml) and the more functional nature of the Standard ML Basis Library (as compared with OCaml) are more valuable than the superior aspects of the OCaml module system.
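For instance, a tiny Standard ML sketch of those conventions (explicit signature, opaque ascription with :>, qualified names instead of open); the names are invented:

    (* Explicit interface: clients see only what the signature exports. *)
    signature COUNTER = sig
      type t
      val zero  : t
      val bump  : t -> t
      val value : t -> int
    end

    (* Opaque ascription with :> hides the fact that t is just an int. *)
    structure Counter :> COUNTER = struct
      type t = int
      val zero = 0
      fun bump c = c + 1
      fun value c = c
    end

    (* Use qualified names rather than 'open'. *)
    val two = Counter.value (Counter.bump (Counter.bump Counter.zero))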
I've worked on just one really big Haskell program, and while I found (and continue to find) working in Haskell very enjoyable, I really missed having explicit signatures.
Do functional languages cope well with complexity?
Some do. I've found ML modules and module types (both the Standard ML and Objective Caml flavors) invaluable tools for managing complexity, understanding complexity, and placing unbreachable firewalls between different parts of large programs. I have had less good experiences with Haskell.
Final note: these aren't really new issues. Decomposing systems into modules with separate interfaces checked by the compiler has been an issue in Ada, C, C++, CLU, Modula-3, and I'm sure many other languages. The main benefit of a system like Standard ML or Caml is that you get explicit signatures and modular type checking (something that the C++ community is currently struggling with around templates and concepts). I suspect that these issues are timeless and are going to be important for any large system, no matter the language of implementation.
I'd say the opposite. It is easier to reason about programs written in functional languages due to the lack of side-effects.
Usually it is not a matter of "functional" vs "procedural"; it is rather a matter of lazy evaluation.
Lazy evaluation is when you can handle values without actually computing them yet; rather, the value is attached to an expression which should yield the value if it is needed. The main example of a language with lazy evaluation is Haskell. Lazy evaluation allows the definition and processing of conceptually infinite data structures, so this is quite cool, but it also makes it somewhat more difficult for a human programmer to closely follow, in his mind, the sequence of things which will really happen on his computer.
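A tiny Haskell sketch of that idea (the names are made up):

    -- 'powers' is conceptually infinite; laziness means only the elements
    -- that are actually demanded ever get computed.
    powers :: [Integer]
    powers = iterate (* 2) 1

    firstFew :: [Integer]
    firstFew = take 5 powers   -- [1,2,4,8,16]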
For mostly historical reasons, most languages with lazy evaluation are "functional". I mean that these languages have good syntactic support for constructions which are typically functional.
Without lazy evaluation, functional and procedural languages allow the expression of the same algorithms, with the same complexity and similar "readability". Functional languages tend to value "pure functions", i.e. functions which have no side effects. Order of evaluation for pure functions is irrelevant: in that sense, pure functions help the programmer by flagging the parts of the program for which knowing what happens in what order is not important. But that is an indirect benefit, and pure functions also appear in procedural languages.
From what I can say, here are the key advantages of functional languages for coping with complexity:
Functional programming hates side effects. You can really black-box the different layers, and you won't be afraid of parallel processing (an actor model like Erlang's is really easier to use than locks and threads).
Culturally, functional programmers are used to designing a DSL to express and solve a problem. Identifying the fundamental primitives of a problem is a radically different approach from rushing to the brand new trendy framework.
Historically, this field has been led by very smart people: garbage collection, object orientation, metaprogramming... all of those concepts were first implemented on functional platforms. There is plenty of literature.
But the downside of those languages is that they lack support and experience in the industry. Portability, performance and interoperability may be a real challenge, whereas on another platform like Java all of this seems obvious. That said, a language based on the JVM like Scala could be a really nice fit to benefit from both sides.
Does program flow become difficult to follow more quickly than if a non-functional language is used?
This may be the case, in that functional style encourages the programmer to prefer thinking in terms of abstract, logical transformations, mapping inputs to outputs. Thinking in terms of "program flow" presumes a sequential, stateful mode of operation--and while a functional program may have sequential state "under the hood", it usually isn't structured around that.
The difference in perspective can be easily seen by comparing imperative vs. functional approaches to "process a collection of data". The former tends to use structured iteration, like a for or while loop, telling the program "do this sequence of tasks, then move to the next one and repeat, until done". The latter tends to use abstracted recursion, like a fold or map function, telling the program "here's a function to combine/transform elements--now use it". It isn't necessary to follow the recursive program flow through a function like map; because it's a stateless abstraction, it's sufficient to think in terms of what it means, not what it's doing.
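As a small illustration of the two mindsets, a hypothetical Haskell sketch:

    -- "Process a collection" expressed as abstractions rather than loops:
    -- the total *is* the fold of (+) over the list, and the transformed
    -- list *is* the map; there is no loop counter or mutable accumulator
    -- to follow.
    total :: [Int] -> Int
    total = foldr (+) 0

    doubled :: [Int] -> [Int]
    doubled = map (* 2)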
It's perhaps somewhat telling that the functional approach has been slowly creeping into non-functional languages--consider foreach loops, Python's list comprehensions...

What are the things that the optimize speed declaration does in CL?

What are some of the optimization steps that this declaration performs
`(optimize speed (safety 0))`
Can I handcode some of these techniques in my Lisp/Scheme program?
What these things do in CL depends on the implementation. What usually happens is: there's a bunch of optimizations and other code transformations that can be applied on your code with various tradeoffs, and these declarations are used as a higher level specification that is translated to these individual transformations. Most implementations will also let you control the individual settings, but that won't be portable.
In Scheme there is no such standard facility, even though some Schemes use a similar approach. The thing is that Scheme in general (i.e., in the standard) avoids such "real world" issues. It is possible to use some optimization techniques here and there, but that depends on the implementation. For example, in PLT the first thing you should do is make sure your code is defined in a module -- this ensures that the compiler gets to do a bunch of optimizations like inlining and unfolding loops.
I don't know, but I think the SBCL internals wiki might have some starting points if you want to explore.
Higher speed settings will cause the compiler to work harder on constant folding, compile-time type inference (hence eliminating runtime dynamic dispatches for generic operations) and other code analyses/transformations; lower safety will skip runtime type checks, array bounds checks, etc. For more details, see the Advanced Compiler Use and Efficiency Hints chapter of the CMUCL User's Manual, which applies to both CMUCL and SBCL (more or less).
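As a rough, hypothetical sketch of how such a declaration is typically combined with type declarations (the function below is made up):

    ;; File- or form-level: ask the compiler to favour speed over safety.
    (declaim (optimize (speed 3) (safety 0)))

    (defun dot (xs ys)
      ;; Type declarations give the compiler what it needs to open-code the
      ;; float arithmetic instead of going through generic dispatch.
      (declare (type (simple-array double-float (*)) xs ys))
      (let ((acc 0d0))
        (declare (type double-float acc))
        (dotimes (i (length xs) acc)
          (incf acc (* (aref xs i) (aref ys i))))))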

How do I use Declarations (type, inline, optimize) in Scheme?

How do I declare the types of the parameters in order to circumvent type checking?
How do I tell the compiler to optimize for speed, so the function runs as fast as possible, like (optimize speed (safety 0)) does?
How do I make an inline function in Scheme?
How do I use an unboxed representation of a data object?
And finally are any of these important or necessary? Can I depend on my compiler to make these optimizations?
thanks,
kunjaan.
You can't do any of these in any portable way.
You can get a "sort of" inlining using macros, but it's almost always a bad idea to try to do that. People who write Scheme (or any other language) compilers are usually much better than you at deciding when it is best to inline a function.
You can't make values unboxed; some Scheme compilers will do that as an optimization, but not in any way that is visible (because it is an optimization -- so it should preserve the semantics).
As for your last question, an answer is very subjective. Some people cannot sleep at night without knowing exactly how many CPU cycles some function uses. Some people don't care and are fine with trusting the compiler to optimize things reasonably well. At least at the stages where you're more of a student of the language and less of an implementor, it is better to stick to the latter group.
If you want to help out the compiler, consider reducing top-level definitions where possible.
If the compiler sees a function at top-level, it's very hard for it to guess how that function might be used or modified by the program.
If a function is defined within the scope of a function that uses it, the compiler's job becomes much simpler.
There is a section about this in the Chez Scheme manual:
http://www.scheme.com/csug7/use.html#./use:h4
Apparently Chez is one of the fastest Scheme implementations there is. If it needs this sort of "guidance" to make good optimizations, I suspect other implementations can't live without it either (or they just ignore it altogether).
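A small, hypothetical Scheme sketch of the local-definition advice above:

    ;; Top-level helper: the compiler must assume it could be redefined,
    ;; which makes it harder to optimize calls to it.
    (define (square x) (* x x))

    ;; Local helper: all of its uses are visible, so a good compiler can
    ;; inline it and specialize the call sites.
    (define (sum-of-squares lst)
      (define (square x) (* x x))
      (let loop ((lst lst) (acc 0))
        (if (null? lst)
            acc
            (loop (cdr lst) (+ acc (square (car lst)))))))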
