Write fast Common Lisp code [closed]

I'm not sure whether some of these odd things actually make my code faster:
Is it normally better to use built-in operations, or to write new specialized functions that do the same thing?
(for example, a version of #'map that works only on vectors; my version is often faster even without type declarations)
Should I define new (complicated) types to use them in declarations?
(for example a typed list)
Should I define slots directly on an object? (for example px and py for a 2-dimensional object, or is it OK to use one slot pos of type vector, so that I can reuse it for other purposes?)

There are a few parts to this, but here is a quick braindump.
PROFILE!
Use a CL implementation that has a profiler built in; I use SBCL, for example: http://www.sbcl.org/1.0/manual/Statistical-Profiler.html
The nice thing about the SBCL profiler is that once you have profiled a function, if you disassemble it, the machine code is annotated with statistics. This requires some knowledge of the target machine code.
Do not underestimate your implementation: it can have advanced type and flow analysis built in and may be able to, for example, pick a vector-only version of map when it makes sense.
Learn compiler macros: a compiler macro can shadow a function, which gives you a place to put extra optimizations based on the context of the form. It does this without replacing the function, so the function can still be used in a higher-order way.
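For instance, here is a minimal sketch of the idea (the function square-distance and its constant-folding rule are my own invention, not from any library): a compiler macro that folds the call into a constant when both arguments are literal numbers, while leaving the ordinary function intact for mapcar and friends.

;; An ordinary function: still usable with FUNCALL, MAPCAR, etc.
(defun square-distance (x y)
  (+ (* x x) (* y y)))

;; The compiler macro shadows it at compile time: when both arguments
;; are literal numbers, fold the whole call into a constant; otherwise
;; return the original form unchanged and the normal call remains.
(define-compiler-macro square-distance (&whole form x y)
  (if (and (numberp x) (numberp y))
      (+ (* x x) (* y y))
      form))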
Learn Type declarations
I found this series of blog posts helped me understand this technique: http://nklein.com/tags/optimization/page/2/ Read 'em all!
ONE MASSIVE NOTE: Don't ever lie to your compiler about a type. Type declarations are a way of telling the compiler that you know what the type is; the compiler doesn't even have to use them, and when it does, it doesn't have to check that you are actually giving it the correct thing.
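As a minimal sketch of what this looks like (the function is made up), the declarations below promise SBCL that n and the running total are fixnums, so it can emit raw machine arithmetic. With safety reduced, calling (sum-below "hello"), or letting the total overflow the fixnum range, is exactly the kind of lie that silently produces garbage.

(defun sum-below (n)
  ;; A promise that N is a fixnum; at low safety the compiler won't check.
  (declare (type fixnum n)
           (optimize (speed 3) (safety 0)))
  (let ((total 0))
    ;; Also a promise that the sum stays within fixnum range.
    (declare (type fixnum total))
    (dotimes (i n total)
      (incf total i))))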
Unboxed data
Some implementations are able to unbox certain datatypes under certain conditions. Sorry, that is vague, but you will need to read up on your implementation. For SBCL, the 'SBCL internals' guide is very helpful.
For example:
(make-array 100 :element-type 'single-float :initial-element 0.0)
can be stored as a contiguous block of memory in SBCL.
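Combining such an array with type declarations lets the compiler index the raw storage directly and keep the accumulator unboxed; a rough sketch (the function name is mine):

(defun vsum (v)
  ;; A simple-array of single-floats can be accessed without boxing
  ;; each element; the accumulator can stay in a float register.
  (declare (type (simple-array single-float (*)) v)
           (optimize (speed 3)))
  (let ((acc 0.0))
    (declare (type single-float acc))
    (dotimes (i (length v) acc)
      (incf acc (aref v i)))))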
PROFILE AGAIN (With realistic data)
I spent 3 hours writing a crazy compiler-macro-based n-dimensional matrix multiplication routine and then tested it against a one-line built-in solution. For matrices below 5 dimensions there was not a big difference! For higher dimensions, yes, it rocked, but that 'performance benefit' was purely academic because those code paths were never touched. Luckily I undertook the task for fun, as I was asking the same question you are now.
Algorithms
All the type specifiers in the world won't give you a 100-times performance increase; that comes from better techniques. Read up on the maths behind the problem, implement different helper functions that have different strengths, and choose between them at runtime... then go back and use compiler macros to let Lisp choose at compile time. Or pass the technique in as a higher-order function: for example, make-hash-table allows you to specify the hashing function and rehash sizes, and this can be crucial in getting good performance at certain sizes.
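For example (the sizes here are arbitrary, and the variable name is mine), pre-sizing a table avoids repeated rehashing while it fills up, and the :test function determines how keys are hashed and compared:

(defparameter *cache*
  (make-hash-table :test #'equal      ; compare (and hash) keys with EQUAL
                   :size 10000        ; pre-allocate for the expected load
                   :rehash-size 2.0)) ; grow by doubling when full

SBCL also accepts a :hash-function keyword as a non-standard extension, if the standard tests don't fit your keys.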
Know the limits of BigO
Algorithmic complexity means nothing if you lose all of the performance to memory-locality issues. Conversely, sometimes we can achieve superlinear performance characteristics if, by splitting the problem among cores, the reduced dataset now fits in the L2 cache.
BigO is a great metric but it isn't the end of the story. This is the reason assoc lists are a totally valid alternative to hash-tables for low numbers of keys and certain access profiles.
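As a concrete illustration (keys and values mine), both structures answer the same lookups, and for a handful of keys the alist's linear walk over a few cons cells is often cheaper than hashing the key on every access:

;; Association list: O(n) scan, but n is tiny and the data is compact.
(defparameter *settings* '((:width . 640) (:height . 480)))
(cdr (assoc :width *settings*))      ; => 640

;; Hash table: O(1) on average, but every GETHASH hashes the key first.
(defparameter *table* (make-hash-table))
(setf (gethash :width *table*) 640)
(gethash :width *table*)             ; => 640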
Summary
There is a golden mantra I heard from somewhere in the lisp community that works so well:
Make it fast and then make it Fast
If nothing else follow this. Chant it to yourself!
Get the program up and running quickly; in doing so you are more likely to spot the places where you can use a better technique or algorithm to get your several-orders-of-magnitude improvement. Do use CL's own functions first. Don't trade away Lisp's higher-order nature too early by using macros; explore how far you can go with functions.
[Edit] More notes - the following is for SBCL
Type definitions on struct slots are used for optimizing; type declarations on class slots are not.
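A minimal example of the struct side of that (the point type is mine):

(defstruct point
  ;; On SBCL these :TYPE options feed the optimizer; the corresponding
  ;; :type options on DEFCLASS slots are not used for optimization.
  (px 0.0 :type single-float)
  (py 0.0 :type single-float))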
With regard to types, start with what makes the program easy to write and understand (make it fast) and then look into access times if they are the bottleneck (make it Fast!)
(slot-value x 'name) is very fast when name is known. Look at how with-slots uses symbol-macrolet to its advantage.
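A sketch of that trick (class and function names mine): with-slots wraps its body in symbol-macrolet, so each use of the bare name pos below expands into a slot-value call with a constant slot name.

(defclass thing ()
  ((pos :initarg :pos)))

(defun nudge-right (obj)
  ;; Roughly: (symbol-macrolet ((pos (slot-value obj 'pos))) ...)
  (with-slots (pos) obj
    (incf (aref pos 0) 1.0)))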
So to kinda directly answer your original question:
built in first (also check libraries)
does it make the problem easier to write and understand?
use pos. By the time the performance of that indirection becomes an issue, you will have found a dozen other ways to speed up the problem, and the solution will be part of a wider optimization technique.

Related

Global State in Functional Programming (F#)

I want to compute some functions which depend on some variables (specific data on which I run the code) and on global variables which are unlikely to change, but which I want to leave user-tunable. Just to clarify with an example, suppose I want to declare the following function:
let multiplyByGain x =
    x * gain
Where would you declare gain, given that gain is a global constant for the whole project? In a separate module of constants? That would couple the module with this code, though. Or would you use a curried version:
let multiplyByGain x gain =
    x * gain
and then specialize it for the specific values? But suppose you have many functions like that: you will have to inject gain into all of them (in a sort of linking module)?
In my specific problem this becomes more cumbersome because both x and gain are arrays which must have the same length (suppose I have to do an Array.zip, for example). What is the best practice, in terms of functional design, for handling a global constant such as gain in a general way?
P.S.: I have found this old post, but it addresses only a specific problem.
There is no single correct answer to the question and the best approach will depend on a variety of other constraints and requirements that you have. Also, it depends on whether you are asking specifically about F# or whether you are asking about functional programming more generally. I think there are three main points:
Keeping it simple.
Using a module that exposes gain as a global value, with some initialization code to read configuration, seems like a good default approach in F#. If this is changed only rarely (say, before you run the whole computation), then mutation is not going to cause you any trouble. You just need to be careful to avoid changing the values while some computation is still running. I think most F# programmers tend to be quite pragmatic about this, and this seems like the easiest thing to start with.
Unit testing.
If you want to unit test your multiplyByGain function with different gain as an argument, then you'll need some way of passing different values of gain to the function from your unit tests. In this case, having it as an additional parameter and using currying is nice, because you can just call it with other values of gain from your tests.
Functional programming.
Some functional language communities (especially Haskell and, sometimes, Scala) are way more strict about state. The purely functional way of keeping state would be to use monads (either the reader monad or some kind of free monad structure). This makes your code a lot more complicated (both conceptually and in terms of extra syntactic overhead), but it is a purely functional solution that eliminates state. In F#, this kind of approach is even more cumbersome, so it's not very common.

What is the need for immutable/persistent data structures in Erlang

Each Erlang process maintains its own private address space. All communication happens via copying without sharing (except big binaries). If each process is processing one message at a time, with no concurrent access to its objects, I don't see why we need immutable/persistent data structures.
Erlang was initially implemented in Prolog, which doesn't really use mutable data structures either (though some dialects do). So it started off without them. This makes runtime implementation simpler and faster (garbage collection in particular).
So adding mutable data structures would require a lot of effort, could introduce bugs, and Erlang programmers are nearly by definition at least willing to live without them.
Many actually consider their absence to be a positive good: less concern about object identity, no need for defensive copying because you don't know whether some other piece of code is going to modify the data you passed (or might be changed later to modify it), etc.
This absence does mean that Erlang is pretty unusable in some domains (e.g. high performance scientific computing), at least as the main language. But again, this means that nobody in these domains is going to use Erlang in the first place and so there's no particular incentive to make it usable at the cost of making existing users unhappy.
I remember seeing a mailing list post by Joe Armstrong quite a long time ago (which I couldn't find with a quick search now) saying that he initially planned to add mutable variables when he'd need them... except he never quite did, and performance was good enough for everything he was using Erlang for.
It is indeed the case that in Erlang immutability does not solve any "shared state" problems, as immutable data are "process local".
From the functional programming language perspective, however, immutability offers a number of benefits, summarized adequately in this Quora answer:
The simplest definition of functional programming is that it’s a programming paradigm where you are transforming immutable data with functions.
The definition uses functions in the mathematical sense, where it’s something that takes an input and produces an output.
OO + mutability tends to violate that definition, because when you want to change a piece of data it generally will not return the output, it will likely return void or unit, and when you call a method on the object, the object itself isn’t input for the function.
As far as what advantages the paradigm has: composability, thread safety, being able to track what went wrong where better, the ability to sort of separate the data from the actual computation on it being done, etc.
How would this work?
factorial(1) -> 1;
factorial(X) ->
    X * factorial(X - 1).
If you run factorial(4), a single process will be running the same function. Each call will have its own value of X; if the value of X were in the scope of the process and not the function, recursive functions wouldn't work. So first we need to understand scope. If you want to say that you don't see why data needs to be immutable within the scope of a single function/block, you would have a point, but it would be a headache to think about where data is immutable and where it isn't.

why use pointers

I'm learning C++ on my own. I'm an EE and learned it about 20 years ago, but in the progress of my career I stopped programming and didn't take it up again until recently. I should add that I never took any classes in programming.
I have a theoretical question about pointers. In reading the books about pointers it seems they have an important role in C++. My problem is that I can't see what that is. I see that pointers have a role in arrays, but I can't see their role in anything else.
I can see what they do, but I don't see why pointers are used in the situations I see them in. Either references or straight variables would work just as well. I have a feeling the answer lies in the area of memory (its optimal use), but I just don't know.
Any answers would be appreciated. Thanks.
Consider the following from cplusplus.com:
"[T]here may be cases where the memory needs of a program can only be
determined during runtime. For example, when the memory needed depends
on user input. On these cases, programs need to dynamically allocate
memory, for which the C++ language integrates the operators new and
delete."
If you could determine all your memory needs prior to run time and did not need to make use of any abstract data type like a linked list, then yes, it would be difficult to see their use. However, what if you want to store values in an array, but you don't yet know how big that array will need to be?
Another value of pointers arises when you consider passing values from function to function. You may find this thread of value regarding the differences between pointers and references in C++ and how/why to use each.
We have been having several pedagogical conversations focused on pointers on the CSEducators.SE site. I'd encourage you to read those as well:
Simple Pointer Examples in C
Lesson Idea: Arrays, Pointers, and Syntactic Sugar
Pointers come from C, which had no concept of reference, and which C++ inherited from.
Everything that can be done with a reference in C++ is done with a pointer in C.
I find this question really great because it is pure.
A programming language is considered "safe" when the programs written in it can only call functions and access data that the program can name.
Now, the concept of a pointer was invented to break out of this sandbox of safety and provide the developer with the freedom to think and act outside of the box.
Think of pointers as a poor man's tool to achieve something not provided by the programming language itself.
It is misleading to think you could achieve higher performance if you programmed some algorithm using pointers. Optimization is the privilege of the compiler and the hardware, not the human.

Is recursion a feature in and of itself?

...or is it just a practice?
I'm asking this because of an argument with my professor: I lost credit for calling a function recursively on the basis that we did not cover recursion in class, and my argument is that we learned it implicitly by learning return and methods.
I'm asking here because I suspect someone has a definitive answer.
For example, what is the difference between the following two methods:
public static void a() {
    return a();
}
public static void b() {
    return a();
}
Other than "a continues forever" (in the actual program it is used correctly to prompt a user again when provided with invalid input), is there any fundamental difference between a and b? To an un-optimized compiler, how are they handled differently?
Ultimately it comes down to whether, by learning to return a() from b, we therefore also learned to return a() from a. Did we?
To answer your specific question: No, from the standpoint of learning a language, recursion isn't a feature. If your professor really docked you marks for using a "feature" he hadn't taught yet, that was wrong.
Reading between the lines, one possibility is that by using recursion, you avoided ever using a feature that was supposed to be a learning outcome for his course. For example, maybe you didn't use iteration at all, or maybe you only used for loops instead of using both for and while. It's common that an assignment aims to test your ability to do certain things, and if you avoid doing them, your professor simply can't grant you the marks set aside for that feature. However, if that really was the cause of your lost marks, the professor should take this as a learning experience of his or her own: if demonstrating certain learning outcomes is one of the criteria for an assignment, that should be clearly explained to the students.
Having said that, I agree with most of the other comments and answers that iteration is a better choice than recursion here. There are a couple of reasons, and while other people have touched on them to some extent, I'm not sure they've fully explained the thought behind them.
Stack Overflows
The more obvious one is that you risk getting a stack overflow error. Realistically, the method you wrote is very unlikely to actually lead to one, since a user would have to give incorrect input many, many times to actually trigger a stack overflow.
However, one thing to keep in mind is that not just the method itself, but other methods higher or lower in the call chain will be on the stack. Because of this, casually gobbling up available stack space is a pretty impolite thing for any method to do. Nobody wants to have to constantly worry about free stack space whenever they write code because of the risk that other code might have needlessly used a lot of it up.
This is part of a more general principle in software design called abstraction. Essentially, when you call DoThing(), all you should need to care about is that Thing is done. You shouldn't have to worry about the implementation details of how it's done. But greedy use of the stack breaks this principle, because every bit of code has to worry about how much stack it can safely assume it has left to it by code elsewhere in the call chain.
Readability
The other reason is readability. The ideal that code should aspire to is to be a human-readable document, where each line describes simply what it's doing. Take these two approaches:
private int getInput() {
    int input;
    do {
        input = promptForInput();
    } while (!inputIsValid(input));
    return input;
}
versus
private int getInput() {
    int input = promptForInput();
    if (inputIsValid(input)) {
        return input;
    }
    return getInput();
}
Yes, these both work, and yes they're both pretty easy to understand. But how might the two approaches be described in English? I think it'd be something like:
I will prompt for input until the input is valid, and then return it
versus
I will prompt for input, then if the input is valid I will return it, otherwise I get the input and return the result of that instead
Perhaps you can think of slightly less clunky wording for the latter, but I think you'll always find that the first one is going to be a more accurate description, conceptually, of what you are actually trying to do. This isn't to say recursion is always less readable. For situations where it shines, like tree traversal, you could do the same kind of side by side analysis between recursion and another approach and you'd almost certainly find recursion gives code which is more clearly self-describing, line by line.
In isolation, both of these are small points. It's very unlikely this would ever really lead to a stack overflow, and the gain in readability is minor. But any program is going to be a collection of many of these small decisions, so even if in isolation they don't matter much, it's important to learn the principles behind getting them right.
To answer the literal question, rather than the meta-question: recursion is a feature, in the sense that not all compilers and/or languages necessarily permit it. In practice, it is expected of all (ordinary) modern compilers - and certainly all Java compilers! - but it is not universally true.
As a contrived example of why recursion might not be supported, consider a compiler that stores the return address for a function in a static location; this might be the case, for example, for a compiler for a microprocessor that does not have a stack.
For such a compiler, when you call a function like this
a();
it is implemented as
move the address of label 1 to variable return_from_a
jump to label function_a
label 1
and the definition of a(),
function a()
{
var1 = 5;
return;
}
is implemented as
label function_a
move 5 to variable var1
jump to the address stored in variable return_from_a
Hopefully the problem when you try to call a() recursively in such a compiler is obvious; the compiler no longer knows how to return from the outer call, because the return address has been overwritten.
For the compiler I actually used (late 70s or early 80s, I think) with no support for recursion the problem was slightly more subtle than that: the return address would be stored on the stack, just like in modern compilers, but local variables weren't. (Theoretically this should mean that recursion was possible for functions with no non-static local variables, but I don't remember whether the compiler explicitly supported that or not. It may have needed implicit local variables for some reason.)
Looking forwards, I can imagine specialized scenarios - heavily parallel systems, perhaps - where not having to provide a stack for every thread could be advantageous, and where therefore recursion is only permitted if the compiler can refactor it into a loop. (Of course the primitive compilers I discuss above were not capable of complicated tasks like refactoring code.)
The teacher wants to know whether you have studied or not. Apparently you didn't solve the problem the way he taught you (the good way; iteration), and thus considers that you didn't. I'm all for creative solutions, but in this case I have to agree with your teacher for a different reason: if the user provides invalid input too many times (e.g. by keeping Enter pressed), you'll get a stack overflow exception and your solution will crash. In addition, the iterative solution is more efficient and easier to maintain. I think that's the reason your teacher should have given you.
Deducting points because "we didn't cover recursion in class" is awful. If you learnt how to call function A which calls function B which calls function C which returns back to B which returns back to A which returns back to the caller, and the teacher didn't tell you explicitly that these must be different functions (which would be the case in old FORTRAN versions, for example), there is no reason that A, B and C cannot all be the same function.
On the other hand, we'd have to see the actual code to decide whether in your particular case using recursion is really the right thing to do. There are not many details, but it does sound wrong.
There are many points of view from which to look at the specific question you asked, but what I can say is that from the standpoint of learning a language, recursion isn't a feature on its own. If your professor really docked you marks for using a "feature" he hadn't taught yet, that was wrong, but like I said, there are other points of view to consider here which actually make the professor right to deduct points.
From what I can deduce from your question, using a recursive function to ask for input again on input failure is not good practice, since every recursive call gets pushed onto the stack. Since this recursion is driven by user input, it is possible to have an infinitely recursive function, resulting in a StackOverflow.
There is no difference between the two examples you mentioned in your question in the sense of what they do (though they do differ in other ways). In both cases, a return address and all the method info are loaded onto the stack. In the recursive case, the return address is simply the line right after the method call (of course it's not exactly what you see in the code itself, but rather in the code the compiler creates). In Java, C, and Python, recursion is fairly expensive compared to iteration (in general) because it requires the allocation of a new stack frame. Not to mention that you can get a stack overflow exception if the input is invalid too many times.
I believe the professor deducted points since recursion is considered a subject of its own, and it's unlikely that someone with no programming experience would think of recursion. (Of course it doesn't mean they won't, but it's unlikely.)
IMHO, I think the professor was right to deduct the points. You could have easily moved the validation part to a different method and used it like this:
public bool foo()
{
    bool validInput = GetInput();
    while (!validInput)
    {
        MessageBox.Show("Wrong Input, please try again!");
        validInput = GetInput();
    }
    return hasWon(x, y, piece);
}
If what you did can indeed be solved in that manner then what you did was a bad practice and should be avoided.
Maybe your professor hasn't taught it yet, but it sounds like you're ready to learn the advantages and disadvantages of recursion.
The main advantage of recursion is that recursive algorithms are often much easier and quicker to write.
The main disadvantage of recursion is that recursive algorithms can cause stack overflows, since each level of recursion requires an additional stack frame to be added to the stack.
For production code, where scaling can result in many more levels of recursion in production than in the programmer's unit tests, the disadvantage usually outweighs the advantage, and recursive code is often avoided when practical.
Regarding the specific question, "is recursion a feature?", I'm inclined to say yes, but after re-interpreting the question. There are common design choices in languages and compilers that make recursion possible, and Turing-complete languages do exist that don't allow recursion at all. In other words, recursion is an ability that is enabled by certain choices in language/compiler design.
Supporting first-class functions makes recursion possible under very minimal assumptions; see writing loops in Unlambda for an example, or this obtuse Python expression containing no self-references, loops or assignments:
>>> map((lambda x: lambda f: x(lambda g: f(lambda v: g(g)(v))))(
... lambda c: c(c))(lambda R: lambda n: 1 if n < 2 else n * R(n - 1)),
... xrange(10))
[1, 1, 2, 6, 24, 120, 720, 5040, 40320, 362880]
Languages/compilers that use late binding, or that define forward declarations, make recursion possible. For example, while Python allows the below code, that's a design choice (late binding), not a requirement for a Turing-complete system. Mutually recursive functions often depend on support for forward declarations.
factorial = lambda n: 1 if n < 2 else n * factorial(n-1)
Statically typed languages that allow recursively defined types contribute to enabling recursion. See this implementation of the Y Combinator in Go. Without recursively-defined types, it would still be possible to use recursion in Go, but I believe the Y combinator specifically would be impossible.
From what I can deduce from your question, using a recursive function to ask for input in case of input failure is not good practice. Why?
Because every recursive function call gets pushed onto the stack. Since this recursion is driven by user input, it is possible to have an infinitely recursive function, resulting in a StackOverflow :-p
Having a non-recursive loop to do this is the way to go.
Recursion is a programming concept, a feature (like iteration), and a practice. As you can see from the link, there's a large domain of research dedicated to the subject. Perhaps we don't need to go that deep in the topic to understand these points.
Recursion as a feature
In plain terms, Java supports it implicitly, because it allows a method (which is basically a special function) to have "knowledge" of itself and of the other methods composing the class it belongs to. Consider a language where this is not the case: you would be able to write the body of that method a, but you wouldn't be able to include a call to a within it. The only solution would be to use iteration to obtain the same result. In such a language, you would have to make a distinction between functions aware of their own existence (by using a specific syntax token) and those which aren't! Actually, a whole group of languages do make that distinction (see the Lisp and ML families for instance). Interestingly, Perl even allows anonymous functions (so-called lambdas) to call themselves recursively (again, with a dedicated syntax).
no recursion?
For languages which don't even support the possibility of recursion, there is often another solution, in the form of the fixed-point combinator, but it still requires the language to support functions as so-called first-class objects (i.e. objects which may be manipulated within the language itself).
Recursion as a practice
Having that feature available in a language doesn't necessarily mean that it is idiomatic. In Java 8, lambda expressions have been included, so it might become easier to adopt a functional approach to programming. However, there are practical considerations:
the syntax is still not very recursion friendly
compilers may not be able to detect that practice and optimize it
The bottom line
Luckily (or more accurately, for ease of use), Java does let methods be aware of themselves by default, and thus supports recursion, so this isn't really a practical problem, but it still remains a theoretical one, and I suppose that your teacher wanted to address it specifically. Besides, in the light of the recent evolution of the language, it might turn into something important in the future.

Goto is considered harmful, but did anyone attempt to make code using goto re-usable and maintainable?

Everyone is aware of Dijkstra's Letters to the editor: go to statement considered harmful (also here .html transcript and here .pdf). I was wondering whether anyone has attempted to find a way to make code using gotos re-usable, maintainable, and not harmful, by adding other language extensions or by developing a language which allows for gotos.
The reason I ask the question is that it occurs to me that code written in assembly language often used gotos and global variables to make the program work well within a limited space. Take the Atari 2600, which had 128 bytes of RAM and loaded its program from a ROM cartridge. In that setting, it was better to use unstructured programming and to exploit the freedoms this allows, to make the most of a very limited space for the program.
When you compare this with a game programmed today without the use of gotos, the game takes up much more space.
Then it occurs to me that perhaps it's possible to program with gotos, if some rules or other language changes are made to support this; then the negative effects of gotos could be reduced or eliminated. Has anyone tried to find a way to make gotos NOT considered harmful, by creating a language or some rules to follow which allow gotos to be non-harmful?
If no one has looked for a way to use gotos in a non-harmful way, then perhaps we adopted structured programming unnecessarily, based solely on this paper? Perhaps there is another solution which allows for the use of gotos without the downside.
Comparing gotos to structured programming is comparing a situation where the programmer has to remember what every label in the code actually means and does, and where it is, to a situation where the conditional branches are explicitly described.
As for the advantages of the goto statement regarding the space a program might take: I think that games today are big because of the graphics and sound resources they use, that is, showing 1,000,000 polygons. The cost of a goto compared to that is totally negligible.
Moreover, structured control statements are ultimately compiled into goto ("jmp") instructions by the compiler when outputting assembly.
To answer the question: it might be possible to make goto less harmful by creating naming and syntax conventions. Enforcing those conventions as rules is, however, pretty much what structured programming does.
Linus Torvalds once argued that goto can make source code clearer, but goto is useful in such rare, special cases that I would not dare use it as a programmer.
This question is somehow related to yours, since I think this is one of the most common situations where a goto is needed.
