What characters are allowed in common lisp symbols? - common-lisp

What characters are allowed in common lisp symbols? Can you give a regular expression to match them (or are they beyond the capable of regular grammars to describe)?
I have tried looking for information on this, but all I can find are some examples in CLHS, but no concrete definition of what exactly a legal symbol is.
Edit:
So, common lisp symbols can legally contain any character.
However, the parser doesn't just accept any character as it reads lisp code. What are the rules for parsable symbols? E.g. symbols that can be supplied as 'quoted symbols or inside of '(quoted lists).
I am interested in generating and reading non-bar-delimited symbols, from a non-lisp language. It should suffice, for my application, to use [a-zA-Z0-9:&-]+, but I tend to prefer to be as accurate as possible, which is why I am trying to determine if there is a regex that can match symbols. Matching the |delimited syntax| would be a bonus, but non-delimited symbols would suffice.
This needs to be symbols that would be loaded legally when using (read). The answer is not that symbols can contain any character:
[1]> (read t)
#
*** - READ from #<IO TERMINAL-STREAM>: objects printed as # in view of *PRINT-LEVEL* cannot be read back in
I want to know the rules, or a regex, for what is a valid symbol here, without delimiting it with |.

As sds mentioned, symbol names can contain any characters. Given any string, you can create a symbol with that name. However, based on your comments, it sounds like you're wonder what, under fairly default settings, will be read as a symbol. The answer is still "pretty much anything", with a few exceptions.
The relevant sections in the HyperSpec begin with 2.2 Reader Algorithm, which describes the tokenization process. It describes the process in detail, but perhaps the most important part is:
When dealing with tokens, the reader's basic function is to
distinguish representations of symbols from those of numbers. When a
token is accumulated, it is assumed to represent a number if it
satisfies the syntax for numbers listed in Figure 2-9. If it does not
represent a number, it is then assumed to be a potential number if it
satisfies the rules governing the syntax for a potential number. If a
valid token is neither a representation of a number nor a potential
number, it represents a symbol.
The Figure 2.9 mentioned in that except is in section 2.3.1 Numbers as Tokens, which says:
When a token is read, it is interpreted as a number or symbol. The token is interpreted as a number if it satisfies the syntax for numbers specified in the next figure.
So, the process is really "tokenize the stream, and for each token, check if it's a number, and if it's not a number, then it's a symbol." I realize this doesn't provide an a nice clean grammar for symbols, but that's just the way the language is defined. If you sit down to the task of writing a tokenizer and reader for a Lisp, you may find that this is a pretty convenient way of going about it. You pretty much just need to recognize which characters terminate a symbol, which characters start and end lists, what gets eliminated as whitespace, and what your escape characters are. Then you read nested lists of tokens, turning each token into a number or a symbol (or a string, etc.).
Perhaps one of the easiest ways to see why you have to do this in terms of tokenization and then checking for numbers is the fact that Common Lisp has a *read-base*variable that controls the base. Depending on the value of *read-base*, some things are numbers or symbols, and you can't know until you know what the complete token is, and what the current state of the runtime is.
CL-USER> 'beef
BEEF
CL-USER> (setf *read-base* 16)
16
CL-USER> 'beef
48879
CL-USER> (setf *read-base* a) ; set it back to 10, which is now a
10
CL-USER> (setf *read-base* 36)
36
CL-USER> 'hello ; a number
29234652
CL-USER> 'hello\ world ; a symbol
|HELLO WORLD|

Any character can be in a symbol. E.g.:
(length (loop for i to char-code-limit
collect (intern (string (code-char i)))))
==> 1114113

Related

Why is there no generic operators for Common Lisp?

In CL, we have many operators to check for equality that depend on the data type: =, string-equal, char=, then equal, eql and whatnot, so on for other data types, and the same for comparison operators (edit don't forget to answer about these please :) do we have generic <, > etc ? can we make them work for another object ?)
However the language has mechanisms to make them generic, for example generics (defgeneric, defmethod) as described in Practical Common Lisp. I imagine very well the same == operator that will work on integers, strings and characters, at least !
There have been work in that direction: https://common-lisp.net/project/cdr/document/8/cleqcmp.html
I see this as a major frustration, and even a wall, for beginners (of which I am), specially we who come from other languages like python where we use one equality operator (==) for every equality check (with the help of objects to make it so on custom types).
I read a blog post (not a monad tutorial, great serie) today pointing this. The guy moved to Clojure, for other reasons too of course, where there is one (or two?) operators.
So why is it so ? Is there any good reasons ? I can't even find a third party library, not even on CL21. edit: cl21 has this sort of generic operators, of course.
On other SO questions I read about performance. First, this won't apply to the little code I'll write so I don't care, and if you think so do you have figures to make your point ?
edit: despite the tone of the answers, it looks like there is not ;) We discuss in comments.
Kent Pitman has written an interesting article that tackles this subject: The Best of intentions, EQUAL rights — and wrongs — in Lisp.
And also note that EQUAL does work on integers, strings and characters. EQUALP also works for lists, vectors and hash tables an other Common Lisp types but objects… For some definition of work. The note at the end of the EQUALP page has a nice answer to your question:
Object equality is not a concept for which there is a uniquely determined correct algorithm. The appropriateness of an equality predicate can be judged only in the context of the needs of some particular program. Although these functions take any type of argument and their names sound very generic, equal and equalp are not appropriate for every application.
Specifically note that there is a trick in my last “works” definition.
A newer library adds generic interfaces to standard Common Lisp functions: https://github.com/alex-gutev/generic-cl/
GENERIC-CL provides a generic function wrapper over various functions in the Common Lisp standard, such as equality predicates and sequence operations. The goal of the wrapper is to provide a standard interface to common operations, such as testing for the equality of two objects, which is extensible to user-defined types.
It does this for equality, comparison, arithmetic, objects, iterators, sequences, hash-tables, math functions,…
So one can define his own + operator for example.
Yes we have! eq works with all values and it works all the time. It does not depend on the data type at all. It is exactly what you are looking for. It's like the is operator in python. It must be exactly what you were looking for? All the other ones agree with eq when it's t, however they tend to be t for totally different values that have various levels of similarities.
(defparameter *a* "this is a string")
(defparameter *b* *a*)
(defparameter *c* "this is a string")
(defparameter *d* "THIS IS A STRING")
All of these are equalp since they contain the same meaning. equalp is perhaps the sloppiest of equal functions. I don't think 2 and 2.0 are the same, but equalp does. In my mind 2 is 2 while 2.0 is somewhere between 1.95 and 2.04. you see they are not the same.
equal understands me. (equal *c* *d*) is definitely nil and that is good. However it returns t for (equal *a* *c*) as well. Both are arrays of characters and each character are the same value, however the two strings are not the same object. they just happen to look the same.
Notice I'm using string here for every single one of them. We have 4 equal functions that tells you if two values have something in common, but only eq tells you if they are the same.
None of these are type specific. They work on all types, however they are not generics since they were around long before that was added in the language. You could perhaps make 3-4 generic equal functions but would they really be any better than the ones we already have?
Fortunately CL21 introduces (more) generic operators, particularly for sequences it defines length, append, setf, first, rest, subseq, replace, take, drop, fill, take-while, drop-while, last, butlast, find-if, search, remove-if, delete-if, reverse, reduce, sort, split, join, remove-duplicates, every, some, map, sum (and some more). Unfortunately the doc isn't great, it's best to look at the sources. Those should work at least for strings, lists, vectors and define methods of the new abstract-sequence.
see also
https://github.com/cl21/cl21/wiki
https://lispcookbook.github.io/cl-cookbook/cl21.html

String Comparison in Common Lisp

I am new to Common Lisp and Functional programming in general. I have a function lets call it "wordToNumber", I want it to check if the input string is "one" "two" "three".. etc (0-9) only. and I want to return 1 2 3 etc. so (wordToNumber "one") should output the number 1. I'm having some trouble with string comparison, tried using eq and eql, but its not working, from what I read it is comparing memory location not actual string. Is there an easier way to go about this or is there someway to compare strings. I need any examples to be purely functional programming, no loops and stuff. This is a small portion of a project I'm working on for school.
Oh, for string comparison im just using a simple function at the moment like this:
(defun wordToNumber(x)
(if(eq 'x "one")(return-from wordToNumber 1)))
and calling it with this : (wordToNumber "one")
keep getting Nil returned
Thanks for any help
The functions to compare strings are string= and string-equal, depending on whether you want the comparison to be case-sensitive.
And when you want to compare the value of a variable, you mustn't quote it, since the purpose of quoting is to prevent evaluation.
(defun word-to-number (x)
(cond ((string-equal x "one") 1)
((string-equal x "two") 2)
((string-equal x "three") 3)
...
))
As a practical matter, before you make a ten-branch conditional, consider this: you can pass string= and string-equal (and any other binary function) as an :test argument to most of the sequence functions. Look though the sequence functions and see if there's something that seems relevant to this problem. http://l1sp.org/cl/17.3 (There totally is!)
One nice thing about Lisp is the apropos function. Lisp is a big language and usually has what you want, and (apropos "string") would prolly have worked for you. I recommend also the Lisp Hyperpec: http://www.lispworks.com/documentation/HyperSpec/Front/
eq is good for symbols, CLOS objects, and even cons cells but be careful: (eq (list 1) (list 1)) is false because each list form returns a different cons pointing to the same number.
eql is fine for numbers and characters and anything eq can handle. One nice thing is that (eql x 42) works even if x is not a number, in which case (= x 42) would not go well.
You need equal for lists and arrays, and strings are arrays so you could use that. Then there is equalp, which I will leave as an exercise.

Can I manipulate symbols like I manipulate strings?

Question: Can I divide a symbol into two symbols based on a letter or symbol?
Example: For example, let's say I have :symbol1_symbol2, and I want to split it on the _ into :symbol1 and :symbol2. Is this possible?
Motivation: A fairly common recommendation in Julia is to use Symbol in place of String or ASCIIString as it is more efficient for many operations. So I'm interested in situations where this might break down because there is no analogue for Symbol for an operation that we might typically perform on ASCIIString, e.g. anything to do with regular expressions.
No you can't manipulate symbols.
They are not a composite type (in logic, though they maybe in implement).
They are one thing.
Much like an integer is one thing,
or a boolean is one thing.
You can't manipulate the parts of it.
As I understand, it the reason they are fast is because thay are "one thing".
Symbols are not strings.
Symbols are the representation of a parsed token.
They exist for working with macros etc.
They are useful for other things.
Though one fo there most common alterate uses in 0.3 was as a standin for enumerations. Now that Enum is in 0.4, that use will decline.
They are still logically good for dictionary keys etc.
--
If for some reason you must.
Eg for interop with a 3rd party library, or for some kind of dynamic dispatch:
You can convert it to a String,
with string(:abc), (There is not currently a convert),
and back with Symbol("abc").
so
function symsplit(s_s::Symbol)
combined_string_from=string(s_s)
strings= split(combined_string_from, '_')
map(Symbol,strings)
end
#show symsplit(:a)
#show symsplit(:a_b)
#show symsplit(:a_b_c);
but please don't.
You can find all the methods that operate on symbols by calling methodswith(Symbol) (though most just use the symbol as a marker/enum)
See also:
What is a "symbol" in Julia?

Understanding type specifiers in Common Lisp

I've found a great example of type checking in LispWorks Hyper Spec, but the "type specifier" link leads to a mere glossary not the denotation, and I got a little confused with the syntax.
In (check-type n (integer 0 *) "a positive integer") what does (integer 0 *) mean? I assume it means inclusive range from 0 to infinity, but is this so?
Yes you can use type specifiers in common lisp, they can be very powerful if your compiler chooses to use them. While you may find uses for check-type, the most common kinds of type specifications come in the form of declarations.
The declare expression is not only used for types however it has a number of declaration identifiers and common lisp implementations are actually free to add their own.
The bit you are interested in though is 'types' and more specifically than that 'Type Specifiers'. That page will give you the low-down on a variety of ways to specify types, including the way you mentioned in your question.
Again be aware that your implementation doesn’t have to use the declarations it could just ignore them! Here is some more info on that.
And for some example code, here is the example that got me understanding the basics of how this works. Here and more here.
From the 4.2.3 Type Specifiers:
If a type specifier is a list, the car of the list is a symbol, and
the rest of the list is subsidiary type information. Such a type
specifier is called a compound type specifier. Except as explicitly
stated otherwise, the subsidiary items can be unspecified. The
unspecified subsidiary items are indicated by writing *. For example,
to completely specify a vector, the type of the elements and the
length of the vector must be present.
(vector double-float 100)
The following leaves the length unspecified:
(vector double-float *)
The following leaves the element type unspecified:
(vector * 100)

How to prove by induction that a program does something?

I have a computer program that reads in an array of chars that operands and operators written in postfix notation. The program then scans through the array works out the result by using a stack as shown :
get next char in array until there are no more
if char is operand
push operand into stack
if char is operator
a = pop from stack
b = pop from stack
perform operation using a and b as arguments
push result
result = pop from stack
How do I prove by induction that this program correctly evaluates any postfix expression? (taken from exercise 4.16 Algorithms in Java (Sedgewick 2003))
I'm not sure which expressions you need to prove the algorithm against. But if they look like typical RPN expressions, you'll need to establish something like the following:
1) algoritm works for 2 operands (and one operator)
and
algorithm works for 3 operands (and 2 operators)
==> that would be your base case
2) if algorithm works for n operands (and n-1 operators)
then it would have to work for n+1 operands.
==> that would be the inductive part of the proof
Good luck ;-)
Take heart concerning mathematical proofs, and also their sometimes confusing names. In the case of an inductive proof one is still expected to "figure out" something (some fact or some rule), sometimes by deductive logic, but then these facts and rules put together constitute an broader truth, buy induction; That is: because the base case is established as true and because one proved that if X was true for an "n" case then X would also be true for an "n+1" case, then we don't need to try every case, which could be a big number or even infinite)
Back on the stack-based expression evaluator... One final hint (in addtion to Captain Segfault's excellent explanation you're gonna feel over informed...).
The RPN expressions are such that:
- they have one fewer operator than operand
- they never provide an operator when the stack has fewer than 2 operands
in it (if they didn;t this would be the equivalent of an unbalanced
parenthesis situation in a plain expression, i.e. a invalid expression).
Assuming that the expression is valid (and hence doesn't provide too many operators too soon), the order in which the operand/operators are fed into the algorithm do not matter; they always leave the system in a stable situtation:
- either with one extra operand on the stack (but the knowledge that one extra operand will eventually come) or
- with one fewer operand on the stack (but the knowledge that the number of operands still to come is also one less).
So the order doesn't matter.
You know what induction is? Do you generally see how the algorithm works? (even if you can't prove it yet?)
Your induction hypothesis should say that, after processing the N'th character, the stack is "correct". A "correct" stack for a full RPN expression has just one element (the answer). For a partial RPN expression the stack has several elements.
Your proof is then to think of this algorithm (minus the result = pop from stack line) as a parser that turns partial RPN expressions into stacks, and prove that it turns them into the correct stacks.
It might help to look at your definition of an RPN expression and work backwards from it.

Resources