Knowing when what you're looking at must be a macro - common-lisp

I know there is macro-function, explained here, which allows you to check, but is it also possible, simply by reading Lisp source, to sometimes infer of what you're looking at that "it must be a macro"? (Assuming, of course, you have never seen the function/macro before.)
I'm fairly sure the answer is yes, but as this seems so fundamental, I thought it worth asking, especially because any nuances here may be valuable and interesting to know about.
In Paul Graham's ANSI Common Lisp, p70, he is describing how to use defstruct.
When I see (defstruct point x y), were I to know absolutely nothing about what defstruct was, this could just as well be a function.
But when I see
(defstruct polemic
  (subject "foo")
  (effect "bar"))
I know that must be a macro because (let's assume) I also know that subject and effect are not defined as functions: calling them "at the top level" (if that's the right term) signals an undefined-function error.
If the two list arguments to defstruct above were quoted, it would not be so simple. Because they're not quoted, it must be a macro.
Is it as simple as that?
I've changed the field names slightly from those used in the book to make this question clearer.
Finally, Graham writes:
"We can specify default values for structure fields by enclosing the field name and a default expression in a list in the original definition"
What I'm noticing is that that's true, but it is not a (quoted) list. Would any readers of this post have phrased the above sentence differently, given that macros haven't been introduced in the book yet (though I have a basic awareness of what they are)?
My feeling is that it's not a "data list" those default expressions are enclosed in (apologies for the bad terminology); I'm seeking how rightly to conceptualise this.

In general, you're right: if there's some nesting inside the call and you are sure that the cars of the nested lists aren't functions, it's a macro.
Also, almost always, def-something and with-something are macros.
But there's no guarantee. The question is, what are you trying to accomplish? Some code walking/transformation, or external processing (like in an editor)? For the latter, you should keep in mind that full control is possible only if you perform code evaluation, although heuristics (like in Emacs) can take you pretty far. Or maybe you just want to develop your intuition for faster code reading...

There is a set of conventions that identifies quite clearly which forms are supposed to be macros, simply by mimicking the syntax of existing macros or special operators of CL.
For example, the following is a mix of various imaginary macros, but even without knowing their definition, the code shouldn't be too hard to figure out:
(defun/typed example ((id (integer 0 10)))
  (with-connection (connection (connect id))
    (do-events (event connection)
      (event-case event
        (:quit (&optional code) (return code))))))
The usual advice about macros is to avoid them if possible, so if you spot something that doesn't make sense as a lisp expression, it probably is, or is enclosed in, a macro.
(defstruct point x y)
[...] were I to know absolutely nothing about what defstruct was, this could just as well be a function.
There are various hints that this is not a function. First of all, the name starts with def. Then, if defstruct were a function, point, x and y would all be evaluated before calling it, and that means the code would be relying on global variables, even though they are not wearing earmuffs (e.g. *point*, *x*, *y*), and you probably won't find any definition for them in the preceding forms (or later in the same compilation unit). Also, if it were a function, the result would be discarded, since it is not used (this is a toplevel form). That only indicates the probable presence of side effects, but still, this would be unusual.
A top-level function with side-effects would look like this instead, with quoted data:
(register-struct 'point '(x y))
Finally, there are cases where you cannot easily guess if you are using a macro or a function:
(my-get object :slot)
This could be a function call, or you could have a macro that turns the above into (aref object 0) (assuming :slot is the zeroth slot in object, because all your objects are assumed to be of a certain custom type backed by a vector). You could also have compiler macros. In case of doubt, try to macroexpand it and look at the documentation.
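When you have a running Lisp at hand, the check is mechanical. Both operators below are standard Common Lisp; the commented results are illustrative, since printed representations vary by implementation:
(macro-function 'defstruct)  ; => #<FUNCTION ...>, non-NIL, so it's a macro
(macro-function 'car)        ; => NIL, an ordinary function
;; MACROEXPAND-1 shows what a macro call rewrites to; for a non-macro
;; form it returns the form unchanged and NIL as a second value:
(macroexpand-1 '(defstruct point x y))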

Related

Moral of the story from SICP Ex. 1.20?

In this exercise we are asked to trace Euclid's algorithm using first normal order then applicative order evaluation.
(define (gcd a b)
  (if (= b 0)
      a
      (gcd b (remainder a b))))
I've done the manual trace, and checked it with the several solutions available on the internet. I'm curious here to consolidate the moral of the exercise.
In gcd above, note that b is re-used three times in the function body, and the function is recursive. This is what gives rise to 18 calls to remainder under normal order, in contrast to only 4 under applicative order.
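As a quick sanity check on the applicative-order count, one can instrument remainder; a small sketch, using the exercise's call (gcd 206 40):
(define remainder-calls 0)
(define (counted-remainder a b)
  (set! remainder-calls (+ remainder-calls 1))
  (remainder a b))
(define (gcd a b)
  (if (= b 0)
      a
      (gcd b (counted-remainder a b))))
(gcd 206 40)     ; => 2
remainder-calls  ; => 4 under applicative order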
So it seems that when a function uses an argument more than once in its body (and perhaps recursively, as here), then not evaluating that argument when the function is called (i.e. normal order) will lead to redundant recomputation of the same thing.
Note that the exercise is at pains to point out that the special form if does not change its behaviour under normal order; that is, the predicate is always evaluated first and only the chosen branch is taken. If this didn't happen, this example could never terminate.
I'm curious about the delayed evaluation we are seeing here.
As a plus, it might allow us to handle infinite things, like streams.
As a minus, with a function like the one here, it can cause great inefficiency.
To fix the latter there seem to be two conceptual options. One, wrap the argument in some data structure that caches its result to avoid recomputation. Two, selectively force the argument to be evaluated when you know it would otherwise cause repeated recomputation.
The thing is, neither of those options seems very nice, because both represent additional "levers" the programmer must know how to use and when to use.
Is all of this dealt with more thoroughly later in the book?
Is there any straightforward consolidation of these points which would be worth making clear at this point (without perhaps going into all the detail that is to come)?
The short answer is yes, this is covered extensively later in the book.
It is covered in most detail in Chapter 4 where we implement eval and apply, and so get the opportunity to modify their behaviour. For example Exercise 4.31 suggests the following syntax:
(define (f a (b lazy) c (d lazy-memo)) ...)
As you can see, this identifies three different behaviours.
Parameters a and c have applicative order (they are evaluated before they are passed to f).
b is normal-order, or lazy: it is evaluated every time it is used (but only if it is used).
d is lazy but its value is memoized, so it is evaluated at most once.
Although this syntax is possible, it isn't used. I think the philosophy is that the language has an expected behaviour (applicative order) and that normal order is only used by default when necessary (e.g., for the consequent and alternative of an if statement, and in creating streams). When it is necessary (or desirable) to have a parameter with normal-order evaluation, we can use delay and force, and if necessary memo-proc (e.g. Exercise 3.77).
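For reference, memo-proc (Section 3.5.1) is only a few lines; here is a sketch close to the book's version, wrapping a thunk so it runs at most once:
(define (memo-proc proc)
  (let ((already-run? #f) (result #f))
    (lambda ()
      (if (not already-run?)
          (begin (set! result (proc))
                 (set! already-run? #t)
                 result)
          result))))
;; (memo-proc (lambda () (expensive))) computes (expensive) at most once
;; and caches the result for later forcings.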
So at this point the authors are introducing the ideas of normal and applicative order so that we have some familiarity with them by the time we get into the nitty gritty later on.
In a sense this is a recurring theme: applicative order is probably more intuitive, but sometimes we need normal order. Recursive functions are simpler to write, but sometimes we need the performance of iterative ones. Programs where we can use the substitution model are easier to reason about, but sometimes we need the environment model because we need mutable state.

How is it possible that a function can call itself

I know about recursion, but I don't know how it's possible. I'll use the following example to further explain my question.
(def (pow (x, y))
  (cond ((y = 0) 1))
  (x * (pow (x, y-1))))
The program above is in the Lisp language. I'm not sure if the syntax is correct, since I came up with it in my head, but it will do. In the program I am defining the function pow, and inside pow it calls itself. I don't understand how it's able to do this. From what I know, the computer has to completely analyze a function before it can be defined. If this is the case, then the computer should give an undefined message when I use pow, because I used it before it was defined. The principle I'm describing is the one at play when you write x = x + 1 where x was not defined previously.
Compilers are much smarter than you think.
A compiler can turn the recursive call in this definition:
(defun pow (x y)
(cond ((zerop y) 1)
(t (* x (pow x (1- y))))))
into a goto instruction that re-starts the function from scratch:
Disassembly of function POW
(CONST 0) = 1
2 required arguments
0 optional arguments
No rest parameter
No keyword parameters
12 byte-code instructions:
0 L0
0 (LOAD&PUSH 1)
1 (CALLS2&JMPIF 172 L15) ; ZEROP
4 (LOAD&PUSH 2)
5 (LOAD&PUSH 3)
6 (LOAD&DEC&PUSH 3)
8 (JSR&PUSH L0)
10 (CALLSR 2 57) ; *
13 (SKIP&RET 3)
15 L15
15 (CONST 0) ; 1
16 (SKIP&RET 3)
If this were a more complicated recursive function that a compiler cannot unroll into a loop, it would merely call the function again.
From what I know the computer has to completely analyze a function before it can be defined.
When the compiler sees that one defines a function POW, it tells itself: now we are defining the function POW. If it then sees a call to POW inside the definition, the compiler says to itself: oh, this seems to be a call to the function that I'm currently compiling; it can then create code to make a recursive call.
A function is just a block of code. Its name is just a convenience so that you don't have to calculate the exact address it will end up at. The programming language turns names into the locations the program should jump to in order to execute.
The way one function calls another is by storing the address of the next instruction in the current function on the stack, perhaps adding arguments to the stack, and then jumping to the address of the called function. The called function jumps to the return address it finds, so that control goes back to the caller. There are several calling conventions, implemented by the language, governing which side does what. CPUs don't really have first-class function support, so just as there is nothing called a while loop at the CPU level, functions are emulated.
Just as functions have names, arguments have names too; however, they are mere pointers, just like the return address. When a function calls itself, it just pushes a new return address and new arguments onto the stack and jumps to itself. The top of the stack is different each time, so the same variable names denote addresses unique to that call: the x and y of the previous call are somewhere other than the current x and y. In fact, no special treatment is needed for calling yourself compared with calling anything else.
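You can watch this happen with the standard CL trace facility. The output below is illustrative (formats vary by implementation), but the nesting shows each call getting its own x and y frame:
(trace pow)
(pow 2 3)
;; 0: (POW 2 3)
;;   1: (POW 2 2)
;;     2: (POW 2 1)
;;       3: (POW 2 0)
;;       3: POW returned 1
;;     2: POW returned 2
;;   1: POW returned 4
;; 0: POW returned 8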
Historically, the first high-level language, Fortran, did not support recursion. A function could call itself, but when the inner call returned, control went back to the original caller without executing the rest of the function after the self-call. The Fortran compiler itself would have been impossible to write without recursion, so while the implementation used recursion, it did not offer it to the programmers using the language. This limitation is the reason why John McCarthy discovered Lisp.
I think to see how this can work in general, and in particular in cases where recursive calls can't be turned into loops, it's worth thinking about how a general compiled language might work, because the problems are not different.
Let's imagine how a compiler might turn this function into machine code:
(defun foo (x)
  (+ x (bar x)))
And let's assume that it does not know anything about bar at the time of compilation. Well, it has two options.
It can compile foo in such a way that the call to bar is translated into a set of instructions which say: 'look up the function definition stored under the name bar, whatever it currently is, and arrange to call that function with the right arguments'.
It can compile foo in such a way that there is a machine-level function call to a function, but the address of that function is left as a placeholder of some kind. It can then attach some metadata to foo which says: 'before this function is called, you need to find the function named bar, find its address, splice it into the code in the right place, and remove this metadata'.
Both of these mechanisms allow foo to be defined before it's known what bar is. And note that instead of bar I could have written foo: these mechanisms deal with recursive calls too. They differ apart from that, however.
The first mechanism means that, every time foo is called it needs to do some kind of dynamic lookup for bar which will involve some overhead (but this overhead can be pretty small):
as a consequence of this the first mechanism will be slightly slower than it might be;
but, also as a consequence of this, if bar gets redefined, then the new definition will get picked up, which is a very desirable thing for an interactive language, which Lisp implementations usually are.
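A rough model of this first mechanism in portable CL (an illustration of the idea, not what any particular compiler emits; bar is the same imaginary function as above):
(defun foo (x)
  ;; Look BAR up by name on every call, so redefining BAR is picked up
  ;; the next time FOO runs.
  (+ x (funcall (fdefinition 'bar) x)))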
The second mechanism means that, after foo has all its references to other functions linked in to it, then the calls happen at the machine level:
this means they will be quick;
but that redefinition will be, at best, more complicated or, at worst, not possible at all.
The second of these implementations is close to how traditional compilers compile code: they compile code leaving a bunch of placeholders with associated metadata saying what names those placeholders correspond to. A linker (sometimes known as a link-loader, or loader) then grovels over all the files produced by the compiler, as well as other libraries of code, and resolves all these references, resulting in a bit of code which can actually be run.
A very simple-minded Lisp system might work entirely by the first mechanism (I am pretty sure that this is how Python works, for instance). A more advanced compiler will probably work by some combination of the first and second mechanisms. As an example of this, CL allows the compiler to assume that apparent self-calls in functions really are self-calls, and so the compiler may well compile them as direct calls (essentially it will compile the function and then link it on the fly). But when compiling calls to other functions in general, it might call 'through the name' of the function.
There are also more-or-less heroic strategies: for instance, at the first call of a function, link it, on the fly, to all the things it refers to, and note in their definitions that if they change then this thing needs to be unlinked as well, so it all happens again. These kinds of tricks once seemed implausible, but compilers for languages like JavaScript do things at least as hairy as this all the time now.
Note that compilers and linkers for modern systems actually do something more complicated than I've described, because of shared libraries &c: what I described is more-or-less what happened pre-shared-libraries.

Why are there no generic operators in Common Lisp?

In CL, we have many operators to check for equality that depend on the data type: =, string-equal, char=, then equal, eql and whatnot, and so on for other data types; the same goes for comparison operators. (Edit: please don't forget to answer about these too :) do we have generic <, >, etc.? Can we make them work for other objects?)
However, the language has mechanisms to make them generic, for example generic functions (defgeneric, defmethod) as described in Practical Common Lisp. I can very well imagine a single == operator that would work on integers, strings and characters, at least!
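For what it's worth, nothing stops one from sketching such an operator with the standard machinery (== is a made-up name here, and this only covers a few built-in types):
(defgeneric == (a b)
  (:documentation "Type-dispatched equality, extensible by users."))
(defmethod == ((a number) (b number))
  (= a b))
(defmethod == ((a string) (b string))
  (string= a b))
(defmethod == ((a character) (b character))
  (char= a b))
;; (== 1 1.0) => T, (== "foo" "foo") => T, and user-defined classes can
;; add their own methods.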
There has been work in that direction: https://common-lisp.net/project/cdr/document/8/cleqcmp.html
I see this as a major frustration, and even a wall, for beginners (of which I am one), especially those of us who come from other languages like Python, where we use one equality operator (==) for every equality check (with the help of objects to make it so on custom types).
I read a blog post today (not a monad tutorial, a great series) pointing this out. The author moved to Clojure, for other reasons too of course, where there is one (or two?) such operator.
So why is it so? Are there any good reasons? I can't even find a third-party library, not even in CL21. (Edit: CL21 has this sort of generic operators, of course.)
On other SO questions I read about performance. First, this won't apply to the little code I'll write, so I don't care; and if you think it matters, do you have figures to make your point?
Edit: despite the tone of the answers, it looks like there isn't ;) we discuss it in the comments.
Kent Pitman has written an interesting article that tackles this subject: The Best of intentions, EQUAL rights — and wrongs — in Lisp.
And also note that EQUAL does work on integers, strings and characters. EQUALP also works for lists, vectors, hash tables and other Common Lisp types, but objects… for some definition of "work". The note at the end of the EQUALP page has a nice answer to your question:
Object equality is not a concept for which there is a uniquely determined correct algorithm. The appropriateness of an equality predicate can be judged only in the context of the needs of some particular program. Although these functions take any type of argument and their names sound very generic, equal and equalp are not appropriate for every application.
Specifically note that there is a trick in my last “works” definition.
A newer library adds generic interfaces to standard Common Lisp functions: https://github.com/alex-gutev/generic-cl/
GENERIC-CL provides a generic function wrapper over various functions in the Common Lisp standard, such as equality predicates and sequence operations. The goal of the wrapper is to provide a standard interface to common operations, such as testing for the equality of two objects, which is extensible to user-defined types.
It does this for equality, comparison, arithmetic, objects, iterators, sequences, hash-tables, math functions,…
So one can define one's own + operator, for example.
Yes, we have! eq works with all values and it works all the time. It does not depend on the data type at all. It is exactly what you are looking for; it's like the is operator in Python. All the other predicates agree with eq when it is t; however, they tend to also be t for totally different values that have various levels of similarity.
(defparameter *a* "this is a string")
(defparameter *b* *a*)
(defparameter *c* "this is a string")
(defparameter *d* "THIS IS A STRING")
All of these are equalp, since they carry the same meaning. equalp is perhaps the sloppiest of the equality functions. I don't think 2 and 2.0 are the same, but equalp does: in my mind 2 is 2, while 2.0 is somewhere between 1.95 and 2.04. You see, they are not the same.
equal understands me. (equal *c* *d*) is definitely nil, and that is good. However, it returns t for (equal *a* *c*) as well: both are arrays of characters, and each character is the same value, but the two strings are not the same object; they just happen to look the same.
Notice I'm using strings for every single one of them. We have four equality functions that tell you whether two values have something in common, but only eq tells you whether they are the same.
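To spell out how the standard predicates fall out on the values above:
(eq *a* *b*)      ; => T    same object
(eq *a* *c*)      ; => NIL  two objects that merely look alike
(equal *a* *c*)   ; => T    same characters, case-sensitively
(equal *c* *d*)   ; => NIL  case differs
(equalp *c* *d*)  ; => T    string comparison ignoring case
(eql 2 2.0)       ; => NIL  different types
(equalp 2 2.0)    ; => T    numbers compared with =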
None of these is type-specific. They work on all types; however, they are not generic functions, since they were around long before generic functions were added to the language. You could perhaps make 3-4 generic equality functions, but would they really be any better than the ones we already have?
Fortunately, CL21 introduces (more) generic operators; particularly for sequences it defines length, append, setf, first, rest, subseq, replace, take, drop, fill, take-while, drop-while, last, butlast, find-if, search, remove-if, delete-if, reverse, reduce, sort, split, join, remove-duplicates, every, some, map, sum (and some more). Unfortunately the doc isn't great; it's best to look at the sources. Those should work at least for strings, lists and vectors, and methods are defined on the new abstract-sequence.
see also
https://github.com/cl21/cl21/wiki
https://lispcookbook.github.io/cl-cookbook/cl21.html

In GHC's STG output with -O2, what's this sequence following Str=DmdType all about?

(Misleading title: it's only one of a plethora of inter-related similar questions below. These sound like asking for a full reference manual, but keep in mind that for this topic there is no reference manual other than the entirety of GHC's source code for its STG pipeline stage, and the collective accumulated experience of others/"insiders"..)
I'm exploring "transpiling" Haskell (from scratch for fun/learning, ignoring existing projects; the target language(s) are similarly high-level / "already fit for the STG machine", with existing GC + lambdas/function values + closures), and so I'm trying to become ever more familiar with GHC's STG IR. Having repeatedly gone through the dozen-or-two online articles/videos of varying age, depth and detail that actually deal with the topic (plus the original paper, plus StgSyn.hs), and understanding many-perhaps-most basic principles, I still find parts of the -ddump-stg output baffling (I won't manually parse it but will reuse the GHC API's in-memory AST later on, of course) --- mostly, I think, I'm stuck mapping my "roughly known" concepts to the still-foreign abbreviated/codified identifiers of that IR. If you know your way around STG a bit, would you mind looking at the following mini-sample to clarify a few open questions and help further solidify my (and future searchers') grasp?
From a most simple .hs module, I captured the -ddump-stg output twice, first (on the left) with -O0 and then (on the right) with -O2, both shown in this diff.
Walking through everything def-by-def..
Lines L_|R5-11: so in O2, testX1 and testX2 seem to be global constants/literals for the integers 4 and 5 --- O0 doesn't have them. Curious!
Is Str=DmdType something about strictness? "Strictness is of type on-demand" or some such? But then a top-level/heap-ish/"global" constant literal can't be "lazy", can it.. (one of the things where I can't just casually Ctrl+F in StgSyn.hs --- it's not in there! Which is odd in its own way; how come there's STG syntax not in StgSyn.hs?)
Caf: I have a rough idea about constant applicative forms, but Unf=OtherCon? "Other constructor" (unboxed/native Type.S#-related?)..
Line L6|R14: surprised to still see type-class information in there (Num). Is that "just info/annotation", or is it crucial for any of the built-in code-gens to set up some "dictionary" lookup machinery at runtime? (I'd sure hope that by the late STG / pre-CMM stage that would be resolved and inlined already, where possible, at least in O2. After all, GHC has also decided to type-default 4 and 5 to Integer.) Generally speaking, I understand STG is "untyped" other than denoting prim types, saturated cons, perhaps strings (it looks like that later on at the bottom), so such "typeclass" annotations can only be.. I guess, for readers to find their way around the ddump-ed *.stg. But correct me if not.
GblId is probably just "global identifier", aka top-level CAF, right? Arity is clear.
Line L7|R18: now Str=DmdType for testX is, only in O2, followed by a freakish <S(LLC(C(S))LLLL),U(1*C1(C1(U)),A,1*C1(C1(U)),A,A,A,C(U))><L,U>! What's that, SKI calculus? ;D No, seriously: LLC.. LLLL.. stack or other memory-layout hints for CMM? Any idea? Must be some optimization; I would like to understand which and how..
Line L8|R20: $dNum_sGM (left) and $dNum_sIx (right) have me a bit concerned; they don't seem to be "defined at the module level" here anywhere. A typeclass "method dispatch dictionary lookup" kind of thing? Would e.g. CMM take this together with the above Num annotation to set things up? It always appears together with the input function argument.
The whole function "body" for both left and right can be seen here essentially as "3 lets with a lambda-ish form over 3 atoms, 2 of which are statically known literal constants" --- I suppose this is standard and to be expected in the STG IR AST? For the first of these, funnily enough, we could say that O0 has "inlined the global" (what is testX1 or testX2 in O2) and O2 hasn't (making the latter much shorter, as that applies to both these constant literals).
I've only ever seen Occ=Once; what are the others, and how should they be interpreted? Once, for one, isn't even in StgSyn.hs..
Now, LclId: a counterpart to the earlier-encountered GblId, denoting the scope of the identifier? Could it also be anything else in this expression context? As in: if, traversing the AST, I roughly know how deep I am, can I ignore this, since at the top level it must be GblId and otherwise LclId? Hm.. maybe better to take what STG gives me, but then I need to be sure about the semantics and possibilities.. Guys, in using StgSyn.hs I have the wrong source file, right? Nothing on this in there either.. (always hopeful, as its comments are quite well-done)
The rest is just metadata as string constants, OK.. oh wait, look at O2: there's Str=DmdType m1 and Str=DmdType m. What's the m/m1 about? Another thing I don't see "defined anywhere at the module level" here. And it's not in O0..
Still going strong? Merely a bonus question (for now): tell us about srt:SRT:[] ;)
Just a few tidbits - a full answer is quite beyond my knowledge.
The type of your function is
testX :: GHC.Num.Num a => a -> a
It’s compiled to a function with two arguments: a dictionary of the Num type class, and the actual argument.
The $d… names stand for dictionaries of type-class instances. The <S(LLC(C(S))LLLL),… annotations are strictness information about the function arguments. They basically say which parts of the argument will be used by your function and which will not. It looks a bit weird here because it contains information about all the members of the class instance.
Some of this is explained here:
https://ghc.haskell.org/trac/ghc/wiki/Commentary/Compiler/Demand
The srt:SRT: is the "static reference table", i.e. the list of free variables of the expression; in your case, always [].
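To make the dictionary point above concrete, here is a rough conceptual desugaring (illustrative Haskell only, not actual GHC Core/STG output; NumDict and g are made-up names):
-- A class constraint compiles to an extra "dictionary" argument from
-- which the class methods are fetched:
data NumDict a = NumDict
  { dPlus        :: a -> a -> a
  , dFromInteger :: Integer -> a }

-- A hypothetical (Num a => a -> a) function like  g x = x + 4
-- becomes, roughly:
g :: NumDict a -> a -> a
g d x = dPlus d x (dFromInteger d 4)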

What does it mean to "open code" something in Common Lisp?

In the SBCL user manual there are several references to the term "open code". Common Lisp hackers also use this term when referring to optimizing code.
Could you please explain what it means to "open code" something and give an example of how it works?
What It Is
Open-coding, AKA inlining, means replacing a function call with code for the function's body, emitted inline.
The idea is that funcall and apply are expensive (they require saving and restoring the stack &c) and replacing them with the few operations which constitute the function can be beneficial.
E.g., the function 1+ is a single instruction when the argument is a fixnum (which it usually is in practice), so turning the funcall into two parallel branches (fixnum and otherwise) would be a win.
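You can see what your implementation actually does with disassemble (standard CL; no output is shown here because it is entirely implementation-specific):
(disassemble (lambda (x) (1+ x)))
(disassemble (lambda (x)
               (declare (fixnum x) (optimize (speed 3) (safety 0)))
               (1+ x)))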
How to Control it
Declarations
The user can control this optimization explicitly with the inline declaration (which implementations are free to ignore).
The user can also influence this optimization with the optimize declaration.
Both will affect inlining of the code of a function defined just as a function (see below).
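A minimal sketch of the declaration route (square is a made-up example name):
(declaim (inline square))  ; request inlining; implementations may ignore it
(defun square (x)
  (* x x))
;; Callers compiled while the declamation is visible may open-code SQUARE:
(defun sum-of-squares (a b)
  (+ (square a) (square b)))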
Macros
The "old" way is to implement the function as a macro. E.g., instead of
(defun last1f (list)
  (car (last list)))
write
(defmacro last1m (list)
  `(car (last ,list)))
and last1m will always be open-coded. The problem with this approach is that you cannot use last1m as a function - you cannot pass it to, say, mapcar.
Thus Common Lisp has an alternative way - compiler macros, which tell the compiler how to transform the form before compiling it:
(define-compiler-macro last1f (list)
  ;; use in conjunction with (defun last1f ...)
  `(car (last ,list)))
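With this approach last1f remains a first-class function, so both of the following work:
(mapcar #'last1f '((1 2 3) (4 5 6)))  ; => (3 6), via the real function
(last1f '(1 2 3))                     ; => 3; this direct call may be
                                      ;    rewritten by the compiler macro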
See also the excellent examples in the aforelinked CLHS page.
Its Effects on Optimization
A comment asked about the effects of inlining, i.e., what optimizations result from it (e.g., constant propagation in addition to eliminating a function call).
The answer to this question is left to implementations. IOW, the CL standard does not specify which optimizations must be done. However, Minimal Compilation implies that if an implementation does something (e.g. constant folding), it will be done for compiler macros too.
For more details, you should compare the results of disassemble with and without the declarations and whatnot, and see the effects. For explanations, you should ask the vendor (e.g., by using the appropriate tag here, such as sbcl, clisp, &c).
