Internal representation of string - common-lisp

In my understanding, everything in lisp is either an atom or a pair.
Is a string considered as an atom?
How does a lisp compiler know if string is a mere sequence of characters or a string?
What is the difference between foo, 'foo and "foo" in general?

String is an Atom
Your first statement is trivial in the sense that "atom" is defined as "not pair".
Thus a string is most definitely an atom:
(atom "a")
==> T
The class string
is actually defined as a vector
of characters
(or its subtypes):
A string is a specialized vector whose elements are of type character
or a subtype of type character. When used as a type specifier for
object creation, string means (vector character).
What? Collection is an Atom?!!
This flies in the face of the common understanding that a "collection" cannot be an "atom".
Nevertheless, in Lisp, arrays, vectors, hash tables, structures, class objects are all atoms.
The rationale (besides ancient history) is that the implementation (eval) does not have to look inside atoms - they evaluate to themselves (except for symbols).
Quoting
foo is a
symbol,
'foo is identical to (quote foo), and "foo" is a string.
Further Reading
You should take a look at a basic lisp textbook (e.g., "Practical Common
Lisp" or "ANSI Common Lisp").

Related

Why is (equal (copy-<some struct type> <struct type object>) <same struct type object>) nil?

The elements of the struct type are s-expressions.
(defstruct state
homeS
homeH
homeD
homeC
free
stacks)
I was attempting to use these objects as keys to a hash table. Before I converted to a struct and was using a deeper S-expression, all worked well. When I changed to a defstruct, the hash table never succeeded with finding a duplicate key.
The hashtable was made with (make-hash-table :test #'equal), so I looked at the the behavior of equal.
I expected (equal (copy-state state1) state1) to return t, but it returned nil.
I haven't found this in the Common Lisp Hyper-Specification. I think the answer is to write my own test and hash functions, and give them to make-hash-table.
This must be defined as common lisp behavior -- it works the same way in CLISP and SBCL.
According to the Common Lisp Hyper Specification, the natural operation "equal", does not work with structured data like defstruct and defarray objects.
In this case, for objects defined by a defstruct type the "equal" operator does not work, and a different operator, "equalp", is used to compare structured data.

Check whether a symbol is bound

In Emacs Lisp (boundp 'symbol) returns t if symbol is bound to some value, nil otherwise. Is there an equivalent procedure in Guile Scheme?
Scheme avoids leaking implementation into the specification and speaks of 'identifiers' rather than of binding an interned symbol to a value - see §2.1 of R7RS. In scheme, an 'identifier' is just a name.
An identifier name is treated as identifying a variable unless it identifies a macro (syntax) or it is in a context requiring it to be treated as identifying a symbol, such as by quotation. In particular, §2.1 of R7RS states that "When an identifier appears as a literal or within a literal (see section 4.1.2), it is being used to denote a symbol (see section 6.5)". You can test whether an identifer identifies a symbol with the symbol? procedure.
Guile scheme does in fact implement identifiers by interning symbols and you can query whether a symbol is bound using defined?:
(defined? 'num)
=> #f
(define num 1)(defined? 'num)
=> #t
This is a guile implementation matter and not portable scheme.
Edit: Note that defined? only works with top level variables defined with define. It does not work with let and cognates.

What characters are allowed in common lisp symbols?

What characters are allowed in common lisp symbols? Can you give a regular expression to match them (or are they beyond the capable of regular grammars to describe)?
I have tried looking for information on this, but all I can find are some examples in CLHS, but no concrete definition of what exactly a legal symbol is.
Edit:
So, common lisp symbols can legally contain any character.
However, the parser doesn't just accept any character as it reads lisp code. What are the rules for parsable symbols? E.g. symbols that can be supplied as 'quoted symbols or inside of '(quoted lists).
I am interested in generating and reading non-bar-delimited symbols, from a non-lisp language. It should suffice, for my application, to use [a-zA-Z0-9:&-]+, but I tend to prefer to be as accurate as possible, which is why I am trying to determine if there is a regex that can match symbols. Matching the |delimited syntax| would be a bonus, but non-delimited symbols would suffice.
This needs to be symbols that would be loaded legally when using (read). The answer is not that symbols can contain any character:
[1]> (read t)
#
*** - READ from #<IO TERMINAL-STREAM>: objects printed as # in view of *PRINT-LEVEL* cannot be read back in
I want to know the rules, or a regex, for what is a valid symbol here, without delimiting it with |.
As sds mentioned, symbol names can contain any characters. Given any string, you can create a symbol with that name. However, based on your comments, it sounds like you're wonder what, under fairly default settings, will be read as a symbol. The answer is still "pretty much anything", with a few exceptions.
The relevant sections in the HyperSpec begin with 2.2 Reader Algorithm, which describes the tokenization process. It describes the process in detail, but perhaps the most important part is:
When dealing with tokens, the reader's basic function is to
distinguish representations of symbols from those of numbers. When a
token is accumulated, it is assumed to represent a number if it
satisfies the syntax for numbers listed in Figure 2-9. If it does not
represent a number, it is then assumed to be a potential number if it
satisfies the rules governing the syntax for a potential number. If a
valid token is neither a representation of a number nor a potential
number, it represents a symbol.
The Figure 2.9 mentioned in that except is in section 2.3.1 Numbers as Tokens, which says:
When a token is read, it is interpreted as a number or symbol. The token is interpreted as a number if it satisfies the syntax for numbers specified in the next figure.
So, the process is really "tokenize the stream, and for each token, check if it's a number, and if it's not a number, then it's a symbol." I realize this doesn't provide an a nice clean grammar for symbols, but that's just the way the language is defined. If you sit down to the task of writing a tokenizer and reader for a Lisp, you may find that this is a pretty convenient way of going about it. You pretty much just need to recognize which characters terminate a symbol, which characters start and end lists, what gets eliminated as whitespace, and what your escape characters are. Then you read nested lists of tokens, turning each token into a number or a symbol (or a string, etc.).
Perhaps one of the easiest ways to see why you have to do this in terms of tokenization and then checking for numbers is the fact that Common Lisp has a *read-base*variable that controls the base. Depending on the value of *read-base*, some things are numbers or symbols, and you can't know until you know what the complete token is, and what the current state of the runtime is.
CL-USER> 'beef
BEEF
CL-USER> (setf *read-base* 16)
16
CL-USER> 'beef
48879
CL-USER> (setf *read-base* a) ; set it back to 10, which is now a
10
CL-USER> (setf *read-base* 36)
36
CL-USER> 'hello ; a number
29234652
CL-USER> 'hello\ world ; a symbol
|HELLO WORLD|
Any character can be in a symbol. E.g.:
(length (loop for i to char-code-limit
collect (intern (string (code-char i)))))
==> 1114113

The relationship between quotation, reification and reflection

I recently get confused with quotation, reification and reflection. Someone could offer a good explanation about their relationship and differences (if any)?
Quoting
This is probably the easiest one. Consider what happens when you type the following into the REPL:
(+ a 1)
REPL stands for Read Eval Print Loop, so first it Reads this in. This is a list, so after reading we have a list of 3 elements containing: <the symbol "+"> <the symbol "a"> <the number 1>
The next step is evaluation. Evaluating a list in Common Lisp involves looking up the function (or macro) bound to the first item in the list. Since + is bound to a function and not a macro, it then evaluates each subsequent element in the list. Numbers evaluate to themselves, and "a" will evaluate to whatever it is bound to. Now that the arguments are evaluated, the function "+" is called with the results of the evaluation.
We then Print the result and Loop back to the Read step
So this is great, but what if we want something that, when evaluated, will end up as a list of 3 elements containing <the symbol "+"> <the symbol "a"> <the number 1>? The solution to this is quoting. Lisps in general have a special form called "quote" that takes a single argument, and the result is that argument, unevaluated. So
(quote (+ a 1))
will evaluate to that list. As some syntactic sugar, ' is treated the same as (quote ), so we can just write '(+ a 1).
Reification
Reification is a generic term that roughly means "Make an abstract concept" concrete. Specific to programming, when something is reified, it roughly means you can treat it as data (or "First-class object"). An example of this in lisp is functions the lambda expression gives you the ability to create a concrete, first-class object that represents the abstract concept of a function call. Another example is a CLOS class, which is itself a CLOS object, that represents the abstract concept of a class.
Reflection
To a certain degree, reflection is the opposite of Reification. Given something concrete, you want some information about it's abstract representation. Consider a Common Lisp Package object, which is a reification of the concept of a package, which is just a mapping from symbol names to symbols. You can use do-symbols to iterate over all of the symbols in a package, thus getting that information out at runtime.
Also, remember how I said lambda's were a reification of functions? Well "function-lambda-expression" is the reflection on functions.
The metaobject protocol (MOP) is a semi-standard way of doing all sorts of things with classes and objects. Among other things, it allows reflection on classes and objects.

What is the difference between a variable and a symbol in LISP?

In terms of scope? Actual implementation in memory? The syntax? For eg, if (let a 1) Is 'a' a variable or a symbol?
Jörg's answer points in the right direction. Let me add a bit to it.
I'll talk about Lisps that are similar to Common Lisp.
Symbols as a data structure
A symbol is a real data structure in Lisp. You can create symbols, you can use symbols, you can store symbols, you can pass symbols around and symbols can be part of larger data structures, for example lists of symbols. A symbol has a name, can have a value and can have a function value.
So you can take a symbol and set its value.
(setf (symbol-value 'foo) 42)
Usually one would write (setq foo 42), or (set 'foo 42) or (setf foo 42).
Symbols in code denoting variables
But!
(defun foo (a)
(setq a 42))
or
(let ((a 10))
(setq a 42))
In both forms above in the source code there are symbols and a is written like a symbol and using the function READ to read that source returns a symbol a in some list. But the setq operation does NOT set the symbol value of a to 42. Here the LET and the DEFUN introduce a VARIABLE a that we write with a symbol. Thus the SETQ operation then sets the variable value to 42.
Lexical binding
So, if we look at:
(defvar foo nil)
(defun bar (baz)
(setq foo 3)
(setq baz 3))
We introduce a global variable FOO.
In bar the first SETQ sets the symbol value of the global variable FOO. The second SETQ sets the local variable BAZ to 3. In both case we use the same SETQ and we write the variable as a symbol, but in the first case the FOO donates a global variable and those store values in the symbol value. In the second case BAZ denotes a local variable and how the value gets stored, we don't know. All we can do is to access the variable to get its value. In Common Lisp there is no way to take a symbol BAZ and get the local variable value. We don't have access to the local variable bindings and their values using symbols. That's a part of how lexical binding of local variables work in Common Lisp.
This leads for example to the observation, that in compiled code with no debugging information recorded, the symbol BAZ is gone. It can be a register in your processor or implemented some other way. The symbol FOO is still there, because we use it as a global variable.
Various uses of symbols
A symbol is a data type, a data structure in Lisp.
A variable is a conceptual thing. Global variables are based on symbols. Local lexical variables not.
In source code we write all kinds of names for functions, classes and variables using symbols.
There is some conceptual overlap:
(defun foo (bar) (setq bar 'baz))
In the above SOURCE code, defun, foo, bar, setq and baz are all symbols.
DEFUN is a symbol providing a macro.
FOO is a symbol providing a function.
SETQ is a symbol providing a special operator.
BAZ is a symbol used as data. Thus the quote before BAZ.
BAR is a variable. In compiled code its symbol is no longer needed.
Quoting from the Common Lisp HyperSpec:
symbol n. an object of type symbol.
variable n. a binding in the “variable” namespace.
binding n. an association between a name and that which the name denotes. (…)
Explanation time.
What Lisp calls symbols is fairly close to what many languages call variables. In a first approximation, symbols have values; when you evaluate the expression x, the value of the expression is the value of the symbol x; when you write (setq x 3), you assign a new value to x. In Lisp terminology, (setq x 3) binds the value 3 to the symbol x.
A feature of Lisp that most languages don't have is that symbols are ordinary objects (symbols are first-class objects, in programming language terminology). When you write (setq x y), the value of x becomes whatever the value of y was at the time of the assignment. But you can write (setq x 'y), in which case the value of x is the symbol y.
Conceptually speaking, there is an environment which is an association table from symbols to values. Evaluating a symbol means looking it up in the current environment. (Environments are first-class objects too, but this is beyond the scope of this answer.) A binding refers to a particular entry in an environment. However, there's an additional complication.
Most Lisp dialects have multiple namespaces, at least a namespace of variables and a namespace of functions. An environment can in fact contain multiple entries for one symbol, one entry for each namespace. A variable, strictly speaking, is an entry in an environment in the namespace of variables. In everyday Lisp terminology, a symbol is often referred to as a variable when its binding as a variable is what you're interested in.
For example, in (setq a 1) or (let ((a 1)) ...), a is a symbol. But since the constructs act on the variable binding for the symbol a, it's common to refer to a as a variable in this context.
On the other hand, in (defun a (...) ...) or (flet ((a (x) ...)) ...), a is a also symbol, but these constructs act on its function binding, so a would not be considered a variable.
In most cases, when a symbol appears unquoted in an expression, it is evaluated by looking up its variable binding. The main exception is that in a function call (foo arg1 arg2 ...), the function binding for foo is used. The value of a quoted symbol 'x or (quote x) is itself, as with any quoted expression. Of course, there are plenty of special forms where you don't need to quote a symbol, including setq, let, flet, defun, etc.
A symbol is a name for a thing. A variable is a mutable pointer to a mutable storage location.
In the code snippet you showed, both let and a are symbols. Within the scope of the let block, the symbol a denotes a variable which is currently bound to the value 1.
But the name of the thing is not the thing itself. The symbol a is not a variable. It is a name for a variable. But only in this specific context. In a different context, the name a can refer to a completely different thing.
Example: the symbol jaguar may, depending on context, denote
OSX 10.2
a gaming console
a car manufacturer
a ground attack military jet airplane
another military jet airplane
a supercomputer
an electric guitar
and a whole lot of other things
oh, did I forget something?
Lisp uses environments which are similar to maps (key -> value) but with extra built-in mechanisms for chaining environments and controlling bindings.
Now, symbols are pretty much, keys (except special form symbols), and point to a value,
ie function, integer, list, etc.
Since Common Lisp gives you a way to alter the values, i.e. with setq, symbols in some contexts
(your example) are also variables.
A symbol is a Lisp data object. A Lisp "form" means a Lisp object that is intended to be evaluated. When a symbol itself is used as a Lisp form, i.e. when you evaluate a symbol, the result is a value that is associated with that symbol. The way values are associated with symbols is a deep part of the Lisp langauge. Whether the symbol has been declared to be "special" or not greatly changes the way evaluation works.
Lexical values are denoted by symbols, but you can't manipulate those symbols as objects yourself. In my opinion, explaining anything in Lisp in terms of "pointers" or "locations" is not the best way.
Adding a side note to the above answers:
Newcomers to Lisp often are not sure exactly what symbols are for, besides being the names of variables. I think the best answer is that they are like enumeration constants, except that you don't have to declare them before using them. Of course, as others have explained, they are also objects. (This shouldn't seem strange to users of Java, in which enumeration constants are objects too.)
Symbol and variable are 2 different things.
Like in mathematic symbol is a value. And variable have the same meaning than in mathematic.
But your confusion came from the fact that symbol are the meta representation of a variable.
That is if you do
(setq a 42)
You just define a variable a. Incidentally the way common lisp store it is throw the structure of a symbol.
In common lips symbol is a structure withe different property. Each one can be access with function like symbol-name, symbol-function...
In the case of variable you can access his value via ssymbol-value
? (symbol-value 'a)
42
This is not the common case of getting the value of a.
? a
42
Note that symbols are self evaluating that mean that if you ask a symbol you get the symbol not the symbol-value
? 'a
A

Resources