Specification of conses - common-lisp

The Wikipedia-page about car and cdr says that a cons is a pair of pointers.
The following code seems to confirm that:
(progn
(setq a '(1 . 2))
(setq b a)
(setf (car b) 10)
(print a))
The evaluation of that form gives the cons (10 . 2). Setting the car of b changes the car of a. You can try that in the online repl at compileonline.com.
Where is that behavior defined in the Common Lisp specification?
(I've read some of the text but couldn't find the section pin-pointing that behavior.)
Interestingly the Wikipedia page on conses says "cons constructs memory objects which hold two values or pointers to values". If the atom 1 would directly be stored in the cons object then changing b wouldn't change a, would it?
I assume above that a and b hold the cons objects and not pointer to conses.
Even if the actual lisp implementation works with pointers this should not be visible on repl-level or should it according to the spec? One could achieve a similar effect when one assumes that a and b hold pointers that both point to the same cons.
Consing, i.e., the construction of lists through repeated application of cons, supports the assumption that conses are represented by pointers in symbol values.
The following two forms are equivalent:
(let ((a '(1)))
(setq b (cons 2 a)))
(setq b '(2 1))

Setting the car of b changes the car of a.
You are not setting the car of b. You are setting the car of the same cons cell which is referenced by b and a.
CL-USER 1 > (let (a b)
(setq a (cons 1 2))
(setq b a)
(eq a b))
T
Explanation:
we have variables a and b.
(cons 1 2) returns a cons cell
(setq a (cons 1 2)) sets a to the result of (cons 1 2), a cons cell.
(setq b a) evaluates a, which returns above cons cell and sets b to that cons cell.
The key to understand is that evaluation of variables and functions returns non-primitive (that is other than primitive numbers, characters, ...) objects as themselves - not as a copy.
I assume above that a and b hold the cons objects and not pointer to conses.
That's wrong. a and b are variables, which just point to both the same single cons cell.
"cons constructs memory objects which hold two values or pointers to values"
In a Common Lisp implementation something like small numbers (fixnums) might be stored directly in a cons cell. One can not reliably compare numbers by identity (using EQ) and has to do numeric comparison (EQL, =, ...).
OTOH, cons cells can't be stored inside cons cells and thus are referenced by pointers internally.
The operations you use:
SETQ : First form1 is evaluated and the result is stored in the variable var1. -> Notice how it says: result and not copy of the result.
RPLCA - this is what (setf CAR) actually uses. : rplaca replaces the car of the cons with object and The cons is modified -> thus it modifies the cons object it gets passed as an argument.
Evaluation - since you execute your code, the rules for evaluation apply.
Further useful to understand the execution model of Lisp:
An old Lisp book: 'Anatomy of LISP' by John Allen. (amazon)
A Scheme spec like R5RS.
Lisp in small pieces, a book explaining the execution models of Scheme/Lisp and their implementation.

First of all, your code is invalid because you're not allowed to modify constant lists. It should be:
(progn
(setq a (cons 1 2))
(setq b a)
(setf (car b) 10)
(print a))
When you perform an assignment like (setq b a), it sets the value of b to be the same as the value of a. The specification of SETQ says nothing about making a copy of the value, so the two variables contain the same value, which in this case is a cons cell.
Setting the car of that cons cell modifies that object -- again, there's no copying being done. So you'll see the change through any variable that refers to the cons cell, or any other reference (it could be in a structure slot, an array element, another cons cell, etc.).
I don't think the specification ever actually comes out and says that these things are all the same, it's just implicit in the fact that we're passing around abstract objects, and no copying is done unless you call a function that's explicitly defined to do so (e.g. COPY-TREE).
The specification doesn't talk about pointers, but that's generally what's going on under the covers. A cons cell is like a C structure:
typedef struct cons {
lisp_object car,
lisp_object cdr
} cons;
lisp_object would probably be a union of various types (some immediate types for things like FIXNUM, and pointers for other types). When a variable contains a cons, it actually contains a pointer to the above structure, and the assignment copies the pointer, not the structure. So the Lisp code is analogous to C code like:
cons *a = make_cons(1, 2);
cons *b = a;
b->car = 10;
printf("%d\n", a->car);

Related

Why mutating the list to be only its first element with this approach does not work in Common Lisp?

I am trying to learn Common Lisp with the book Common Lisp: A gentle introduction to Symbolic Computation. In addition, I am using SBCL, Emacs, and Slime.
By the end of chapter 10, on the advanced section there is this question:
10.9. Write a destructive function CHOP that shortens any non-NIL list to a list of one element. (CHOP '(FEE FIE FOE FUM)) should return
(FEE).
This is the answer-sheet solution:
(defun chop (x)
(if (consp x) (setf (cdr x) nil))
x)
I understand this solution. However, before checking out the official solution I tried:
(defun chop (xs)
(cond ((null xs) xs)
(t (setf xs (list (car xs))))))
Using this as a global variable for tests:
(defparameter teste-chop '(a b c d))
I tried on the REPL:
CL-USER> (chop teste-chop)
(A)
As you can see, the function returns the expected result.
Unfortunately, the side-effect to mutate the original list does not happen:
CL-USER> teste-chop
(A B C D)
Why it did not change?
Since I was setting the field (setf) of the whole list to be only its car wrapped as a new list, I was expecting the cdr of the original list to be vanished.
Apparently, the pointers are not automatically removed.
Since I have very strong gaps in low-level stuff (like pointers) I thought that some answer to this question could educate me on why this happens.
The point is about how parameters to functions are passed in Common Lisp. They are passed by value. This means that, when a function is called, all arguments are evaluated, and their values are assigned to new, local variables, the parameters of the function. So, consider your function:
(defun chop (xs)
(cond ((null xs) xs)
(t (setf xs (list (car xs))))))
When you call it with:
(chop teste-chop)
the value of teste-chop, that is the list (a b c d) is assigned to the function parameter xs. In the last line of the body of function, by using setf, you are assigning a new value, (list (car xs)) to xs, that is you are assigning the list (a) to this local variable.
Since this is the last expression of the function, such value is also returned by the function, so that the evaluation of (chop test-chop) returns the value (a).
In this process, as you can see, the special variable teste-chop is not concerned in any way, apart from calculating its value at the beginning of the evaluation of the function call. And for this reason its value is not changed.
Other forms of parameter passing are used in other languages, like for instance by name, so that the behaviour of function call could be different.
Note that instead, in the first function, with (setf (cdr x) nil) a data structure is modified, that is a part of a cons cell. Since the global variable is bound to that cell, also the global variable will appear modified (even if, in a certain sense, it is not modified, since it remains bound to the same cons cell).
As a final remark, in Common Lisp it is better not to modify constant data structures (like those obtained by evaluating '(a b c d)), since it could produce an undefined behaviour, depending on the implementation. So, if some structure should be modifiable, it should be built with the usual operators like cons or list (e.g. (list 'a 'b 'c 'd)).

Nondestructive setf?

Common Lisp seems to go to great lengths to provide both nondestructive functions (like subst & remove) and destructive functions and modify macros (like delete & rotatef) for general use. Presumably this is to effectively support both functional and nonfunctional styles of programming. But there also seems to be a particular bias toward the nonfunctional style in the design of the ubiquitous setf. The setf macro, incorporating generalized reference, is apparently flexible enough to modify any specifiable place (except perhaps for immutable integers and characters). This power probably accounts for its widespread useage in spite of its nonfunctional/destructive behavior.
The question is why there is not a corresponding "functional style" nondestructive operator, patterned after setf (call it put, since set is already taken), analogous to other nondestructive/destructive lisp operator pairs. Such an operator would probably take the same arguments, a place and a value, but would return a copy of the object in which the place was embedded, instead of the new value at that place. It also would probably involve a universal copier of some sort, with the standard setf simply modifying the copy before returning it. The nondestructive operator then could be used in lieu of setf for most assignments, with setf being reserved for really large objects. Is such an operator feasible (or even possible) given the (presumed) requirement for a universal copier and the need to recover an object from an arbitrary place embedded in it?
Common Lisp has no universal copier for the same reason it does not have a built-in printable representation of CLOS objects (see, e.g., Saving CLOS objects): the power of MOP.
Specifically, object creation can have arbitrary side effects whose replication is hard to guarantee. E.g., define initialize-instance for your class to modify slots based something fluid (e.g., tides or just random). Then the result of make-instance called with the same arguments twice may be different. You might argue that this is an argument in favor of a memcpy-style generic copier. However, imagine now a class which does instance accounting (an :allocation :class slot which contains a hash table mapping unique ID to instance). Now, the round trip from the copied object via the table will get a different object (the original, not the copy) - a major contract violation. And these examples are just the tip of the iceberg.
However, I would not agree that Common Lisp encourages imperative style more than functional style. It merely discourages the style you describe (copy+setf). Think about it this way: from the functional POV, a copy is indistinguishable from the original (because everything is immutable).
There are also "small hints" showing a clear preference for functional style.
Observe that the "natural" names
(e.g., append) are reserved
for non-destructive functions; the destructive versions
(e.g., nconc) are given
obscure names.
Additionally, pathnames, characters and numbers (including composites like ratio and complex) are
immutable, i.e., all pathname and number functions create new objects (functional style).
Thus, IMO, Common Lisp encourages programmers to use functional style while still making imperative style feasible, in conformance to the slogan "simple things should be easy, hard things should be possible".
See also Kent Pitman's writeup.
There is no universal setter either, but with SETF you are supposed to provide one when you use DEFINE-SETF-EXPANDER. I suppose you could define the equivalent of lenses/functional references. For another approach, Clojure defines an update function (borrowed from other languages, I know it exists in Prolog), which is not universal either. The FSET library provides some functional structures, notably associative maps which work like the ones in Clojure.
Suppose for a moment you limit yourself to structures:
(defstruct foo a b c)
(defstruct bar x y z)
(defparameter *test*
(make-foo :a (make-bar :x 0 :y 0 :z 0)
:b 100
:c "string"))
;; *test* is now
;; #S(FOO :A #S(BAR :X 0 :Y 0 :Z 0) :B 100 :C "string")
The following approach makes a copy, it relies on copy-structure and slot-value, which, while not standard, is well supported:
(defun update (structure keys value)
(if (endp keys)
value
(destructuring-bind (key &rest keys) keys
(let ((copy (copy-structure structure)))
(setf (slot-value copy key)
(update (slot-value copy key)
keys
value))
copy))))
You can update *test* as follows:
(update *test* '(a z) 10)
Here is a trace:
0: (UPDATE #S(FOO :A #S(BAR :X 0 :Y 0 :Z 0) :B 100 :C "string") (A Z) 10)
1: (UPDATE #S(BAR :X 0 :Y 0 :Z 0) (Z) 10)
2: (UPDATE 0 NIL 10)
2: UPDATE returned 10
1: UPDATE returned #S(BAR :X 0 :Y 0 :Z 10)
0: UPDATE returned #S(FOO :A #S(BAR :X 0 :Y 0 :Z 10) :B 100 :C "string")
In order to generalize, you could accept a function instead of a value, so that you could implement the equivalent of incf by partially applying the update function with #'1+ (the resulting closure would accept a list of keys).
Also, you need to generalize the copy operation, which is possible with generic functions. Likewise, you could use other kinds of accessors, and replace slot-value with a generic access function (for which there is a (setf access) operation). This generic copy/setf approach could be wasteful, however, if you want to share some data. For example, you only need to copy the part of a list which leads to your data, not the cons cells that are following it.
You could define some facility for defining custom updaters:
(defmethod perform-update (new (list list) position)
(nconc (subseq list 0 position)
(list new)
(nthcdr (1+ position) list)))

Why does this function crash LispWorks?

When I run this function from a listener in LispWorks, it either crashes the listener or gives an exception and assembly language data. Can anyone tell me what's wrong with it?
(defun should-flip-block (rowlist)
(declare ((vector number) rowlist))
(if (= (length rowlist) 0) (eval nil)
(let* ((exithigh (= (car (last rowlist)) 2))
(enterhigh (= (first rowlist) 2)))
(and exithigh enterhigh))))
It's called as (should-flip-block '(1 2 1 2 1)).
Problematic declaration
Note that not all Common Lisp implementations will think that (declare ((vector number)) rowvector) is a valid declaration.
Write instead (declare (type (vector number) rowvector)).
Wrong: a list is not a vector
The problems you see is because you lied to the implementation and safety is set low. You told Lisp that the argument is a vector, but you pass a list (which is not a vector).
The function then use calls to FIRST and LAST, which don't work on vectors, but lists.
Run code with higher safety value
Don't run Common Lisp by default with low safety. Use a default safety value of 2 or 3.
Using LispWorks 6.1.1:
CL-USER 43 > (proclaim '(optimize (safety 2)))
NIL
now I recompile the function and then call it:
CL-USER 44 > (should-flip-block '(1 2 1 2 1))
Error: Variable ROWLIST was declared type (VECTOR NUMBER) but is being
bound to value (1 2 1 2 1)
1 (abort) Return to level 0.
2 Return to top loop level 0.
Now you see an useful error and not a segment violation.
Literal Vectors
#(1 2 1 2 1) is a vector.
Note: LIST type has no parameter
Note that a type (list number) does not exist in Common Lisp and can't be defined. The type list can't have a parameter. It's also not possible to define such a type based on the type cons - recursive types don't work.
You declare that rowlist is a vector (but treat it is a list — using last and first).
This means that the compiler assumes that the object you pass to it is a vector, so, when you pass it a list, you get undefined behavior.
The most important thing to know about declarations in Lisp is: do not lie to the compiler.
I.e., if you violate your declarations (like you just did), you will get burned.
(Additionally, you do not need to eval nil, and there is no need for the let* since you are using the variables it binds just once).

Scheme: Procedures that return another inner procedure

This is from the SICP book that I am sure many of you are familiar with. This is an early example in the book, but I feel an extremely important concept that I am just not able to get my head around yet. Here it is:
(define (cons x y)
(define (dispatch m)
(cond ((= m 0) x)
((= m 1) y)
(else (error "Argument not 0 or 1 - CONS" m))))
dispatch)
(define (car z) (z 0))
(define (cdr z) (z 1))
So here I understand that car and cdr are being defined within the scope of cons, and I get that they map some argument z to 1 and 0 respectively (argument z being some cons). But say I call (cons 3 4)...how are the arguments 3 and 4 evaluated, when we immediately go into this inner-procedure dispatch which takes some argument m that we have not specified yet? And, maybe more importantly, what is the point of returning 'dispatch? I don't really get that part at all. Any help is appreciated, thanks!
This is one of the weirder (and possibly one of the more wonderful) examples of exploiting first-class functions in Scheme. Something similar is also in the Little Schemer, which is where I first saw it, and I remember scratching my head for days over it. Let me see if I can explain it in a way that makes sense, but I apologize if it's not clear.
I assume you understand the primitives cons, car, and cdr as they are implemented in Scheme already, but just to remind you: cons constructs a pair, car selects the first component of the pair and returns it, and cdr selects the second component and returns it. Here's a simple example of using these functions:
> (cons 1 2)
(1 . 2)
> (car (cons 1 2))
1
> (cdr (cons 1 2))
2
The version of cons, car, and cdr that you've pasted should behave exactly the same way. I'll try to show you how.
First of all, car and cdr are not defined within the scope of cons. In your snippet of code, all three (cons, car, and cdr) are defined at the top-level. The function dispatch is the only one that is defined inside cons.
The function cons takes two arguments and returns a function of one argument. What's important about this is that those two arguments are visible to the inner function dispatch, which is what is being returned. I'll get to that in a moment.
As I said in my reminder, cons constructs a pair. This version of cons should do the same thing, but instead it's returning a function! That's ok, we don't really care how the pair is implemented or laid out in memory, so long as we can get at the first and second components.
So with this new function-based pair, we need to be able to call car and pass the pair as an argument, and get the first component. In the definition of car, this argument is called z. If you were to execute the same REPL session I had above with these new cons ,car, and cdr functions, the argument z in car will be bound to the function-based pair, which is what cons returns, which is dispatch. It's confusing, but just think it through carefully and you'll see.
Based on the implementation of car, it appears to be that it take a function of one argument, and applies it to the number 0. So it's applying dispatch to 0, and as you can see from the definition of dispatch, that's what we want. The cond inside there compares m with 0 and 1 and returns either x or y. In this case, it returns x, which is the first argument to cons, in other words the first component of the pair! So car selects the first component, just as the normal primitive does in Scheme.
If you follow this same logic for cdr, you'll see that it behaves almost the same way, but returns the second argument to cons, y, which is the second component of the pair.
There are a couple of things that might help you understand this better. One is to go back to the description of the substitution model of evaluation in Chapter 1. If you carefully and meticulously follow that substitution model for some very simple example of using these functions, you'll see that they work.
Another way, which is less tedious, is to try playing with the dispatch function directly at the REPL. Below, the variable p is defined to refer to the dispatch function returned by cons.
> (define p (cons 1 2))
#<function> ;; what the REPL prints here will be implementation specific
> (p 0)
1
> (p 1)
2
The code in the question shows how to redefine the primitive procedure cons that creates a cons-cell (a pair of two elements: the car and the cdr), using only closures and message-dispatching.
The dispatch procedure acts as a selector for the arguments passed to cons: x and y. If the message 0 is received, then the first argument of cons is returned (the car of the cell). Likewise, if 1 is received, then the second argument of cons is returned (the cdr of the cell). Both arguments are stored inside the closure defined implicitly for the dispatch procedure, a closure that captures x and y and is returned as the product of invoking this procedural implementation of cons.
The next redefinitions of car and cdr build on this: car is implemented as a procedure that passes 0 to a closure as returned in the above definition, and cdr is implemented as a procedure that passes 1 to the closure, in each case ultimately returning the original value that was passed as x and y respectively.
The really nice part of this example is that it shows that the cons-cell, the most basic unit of data in a Lisp system can be defined as a procedure, therefore blurring the distinction between data and procedure.
This is the "closure/object isomorphism", basically.
The outer function (cons) is a class constructor. It returns an object, which is a function of one argument, where the argument is equivalent to the name of a method. In this case, the methods are getters, so they evaluate to values. You could just as easily have stored more procedures in the object returned by the constructor.
In this case, numbers where chosen as method names and sugary procedures defined outside the object itself. You could have used symbols:
(define (cons x y)
(lambda (method)
(cond ((eq? method 'car) x)
((eq? method 'cdr) y)
(else (error "unknown method")))))
In which case what you have more closely resembles OO:
# (define p (cons 1 2))
# (p 'car)
1
# (p 'cdr)
2

how to allocate memory blocks, in clisp

in clisp
(eq (cons 'a 'b) (cons 'a 'b))
is false(NIL)
because first(AB) and second(AB) allocates in different memory.
and
(eq 'a(car '(a b c)))
is true(T)
however, why?
like two (AB), does first A and second A which is result from (car '(a b c)) have to use different memory?
how to allocate memory blocks, in clisp?
Each call to CONS creates a new cons cell. These are different objects.
(eq (cons 1 2) (cons 1 2)) -> NIL
Symbols with the same name are the same objects.
(eq 'foo 'foo) -> T
Complications:
Symbols with the same name, but in different packages are different objects.
(eq 'bar::foo 'baz::foo) -> NIL
Symbols with the same name without a package might be the same object or not.
(eq '#:foo '#:foo) -> NIL
(let ((sym '#:foo))
(eq sym sym))
-> T
This is because 'a is interned. That is, whenever you type 'a in the code (in your second example, it occurs twice) they will resolve to the same symbol. This is because the reader will obtain a reference to the symbol using the INTERN function. (Hyperspec docs).
That is why (eq 'a 'a) returns T. There is one exception to this though. A symbol with the prefix #: will never be interned. This feature is used by GENSYM to guarantee that you get a unique symbol when writing macros.
CONS creates pairs. In your specific example, two pairs both containing the symbols A and B.
Typically, you don't worry about allocating memory blocks in Common Lisp, you create structures instead (arrays, lists, classes, hash tables). The only time you'd allocate memory bnlocks, as such, is for the purpose of calling functions linked in from another language.
Symbols are special objects. When Common Lisp first encounters a symbol, it is interned, i.e an object representing the symbol is created and the symbol is mapped to that object. For all further references to that symbol this object is retrieved using the symbol name as a key. For example, imagine that Common Lisp has created an object at memory location 0x05 for the symbol a. All further references to a will point to this same object. That means (eq 'a 'a) essentially becomes (eq 0x05 0x05) which evaluates to T.
Things are different for cons. Each call to cons creates a new object in memory. The first call to cons will create an object at, say, location 0x10 and the second call at 0x20. So your first comparison becomes (eq 0x10 0x20) which returns nil. The components of the pairs may point to the same objects, but the pairs themselves point to different locations.

Resources