Is it valid to have NIL as an argument in the string comparison functions? - common-lisp

I am wondering whether or not (string= "abc" nil) is valid in Common Lisp. I noticed that SBCL does not complain even though nil is not a string. (string= '() nil) returns T although neither argument is a string ...
(SBCL version: 2.2.2)

In Common Lisp the string comparison operators accept “string designators”. According to the Reference Manual, we have:
string designator n. a designator for a string; that is, an object that denotes a string and that is one of: a character (denoting a singleton string that has the character as its only element), a symbol (denoting the string that is its name), or a string (denoting itself).
So the operators accept symbols, and compare their names.
On the other hand, the empty list is equivalent to the symbol NIL:
nil n. the object that is at once the symbol named "NIL" in the COMMON-LISP package, the empty list, the boolean (or generalized boolean) representing false, and the name of the empty type.
So the comparison is equivalent to testing the equality of the strings "NIL" and "NIL", which is obviously true.
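As a small sketch of the rules quoted above, the following forms use only standard functions. The commented results assume the default reader settings, under which symbol names are upcased.

```lisp
;; STRING converts a string designator to the string it denotes.
(string nil)          ; => "NIL"
(string '())          ; => "NIL", since '() and NIL are the same object
(string #\a)          ; => "a"

;; STRING= compares the denoted strings, so symbols are compared by name.
(string= '() nil)     ; => T    ("NIL" vs "NIL")
(string= "abc" nil)   ; => NIL  ("abc" vs "NIL")
(string= "NIL" nil)   ; => T
(string= 'abc "ABC")  ; => T    (the reader upcases ABC by default)
```

So (string= "abc" nil) is a valid call; it simply compares "abc" with "NIL" and returns false.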

Correcting the regex "\[([a-zA-Z0-9_-]+)]"

The following cl-ppcre regular expression generates an error:
(ppcre:scan-to-strings "\[([a-zA-Z0-9_-]+)]" "[has-instance]")
debugger invoked on a CL-PPCRE:PPCRE-SYNTAX-ERROR in thread
#<THREAD "main thread" RUNNING {10010B0523}>:
Expected end of string. at position 16 in string "[([a-zA-Z0-9_-]+)]"
What I was expecting as return values is:
"[has-instance]"
#("has-instance")
in order to get at the string within the brackets. Can someone provide a regex correction? Thanks.
Inside a string literal, the escape character (backslash) is only needed to escape itself and the double quote; in front of any other character it is simply dropped (§2.4.5 Double-Quote):
If a single escape character is seen, the single escape character is discarded, the next character is accumulated, and accumulation continues.
That means that:
"\[([a-zA-Z0-9_-]+)]"
is parsed the same as the following, where backslash is not present:
"[([a-zA-Z0-9_-]+)]"
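The reader's behavior can be checked without CL-PPCRE at all. This is a minimal sketch using only standard functions on the two literals in question:

```lisp
;; A lone backslash before [ is discarded by the reader.
(length "\[")        ; => 1
(string= "\[" "[")   ; => T

;; Two backslashes in the source yield one backslash in the string.
(length "\\[")       ; => 2
(char "\\[" 0)       ; => #\\
```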
The PCRE syntax implemented by CL-PPCRE treats an opening square bracket as the start of a character class, which ends at the next closing bracket.
Thus, the above reads the following as a class:
[([a-zA-Z0-9_-]
The corresponding regex tree is:
CL-USER> (ppcre:parse-string "[([a-zA-Z0-9_-]")
(:CHAR-CLASS #\( #\[ (:RANGE #\a #\z) (:RANGE #\A #\Z) (:RANGE #\0 #\9) #\_ #\-)
Note in particular that the opening parenthesis inside it is treated literally. When the parser encounters the closing parenthesis that follows the above fragment, it interprets it as the end of a register group, but no such group was started, hence the error message at position 16 of the string.
To avoid treating the bracket as a character class, it must be preceded by a literal backslash in the string, as you tried to do, but in order to do so you must write two backslash characters:
CL-USER> (ppcre:parse-string "\\[([a-zA-Z0-9_-]+)]")
(:SEQUENCE #\[
 (:REGISTER
  (:GREEDY-REPETITION 1 NIL
   (:CHAR-CLASS (:RANGE #\a #\z) (:RANGE #\A #\Z) (:RANGE #\0 #\9) #\_ #\-)))
 #\])
The closing square bracket needs no backslash.
I encourage you to write regular expressions in Lisp using the tree form, with :regex terms when it improves clarity: it avoids having to deal with the kind of problems that escaping brings. For example:
CL-USER> (ppcre:scan-to-strings
          '(:sequence "[" (:register (:regex "[a-zA-Z0-9_-]+")) "]")
          "[has-instance]")
"[has-instance]"
#("has-instance")
Double escape the square brackets.
You forgot to (double) escape the closing bracket, too.
(cl-ppcre:scan-to-strings "\\[([a-zA-Z0-9_-]+)\\]" "[has-instance]")
;; "[has-instance]" ;
;; #("has-instance")
For those who are new to Common Lisp, you can load cl-ppcre using Quicklisp:
(load "~/quicklisp/setup.lisp") ;; adjust the path to where you installed Quicklisp
(ql:quickload :cl-ppcre)

How does a non-standard string literal avoid a syntax error generated by a standard string literal?

Based on the relevant section of the Julia docs, my understanding is that a non-standard string literal like foo"hello, world" is equivalent to explicitly calling the corresponding macro: @foo_str("hello, world"). However, there must be some extra magic that I'm not understanding. Consider a date format dateformat"\m". By itself, "\m" throws a syntax error:
julia> "\m"
ERROR: syntax: invalid escape sequence
And the same syntax error is thrown if I call @dateformat_str("\m"), since the string literal "\m" appears to be evaluated or error checked before it is passed to the macro:
julia> using Dates
julia> @dateformat_str("\m")
ERROR: syntax: invalid escape sequence
However, using the non-standard string literal works:
julia> dateformat"\m"
dateformat"\m"
This is counter-intuitive, because I thought that dateformat"\m" was equivalent to @dateformat_str("\m"). How does the non-standard string literal avoid the syntax error generated by the standard string literal?
In short, because the parser recognizes that situation and parses the string literal differently.
For string macro invocations, it calls parse-raw-literal.
For normal string literals, it calls parse-string-literal.
@dateformat_str("\m"), on the other hand, parses as a macro invocation on a normal string literal, so it uses the latter, parse-string-literal, which errors.
Note that once parsed, the raw literal is stored as the string that would be written with escaping as "\\m":
julia> dump(:(dateformat"\m"))
Expr
  head: Symbol macrocall
  args: Array{Any}((3,))
    1: Symbol @dateformat_str
    2: LineNumberNode
      line: Int64 1
      file: Symbol REPL[6]
    3: String "\\m"
Of related interest is the raw string macro, which does nothing at all to its argument, but whose literal is still parsed using parse-raw-literal.
It is basically defined as
macro raw_str(s)
    return s
end

auto-generate key for hash table in common lisp

I would like to generate sequential keys that I can use across a number of hash tables. I will call them 'id1', 'id2', etc. If ht is my hash table then I would like to make symbols from strings as keys. To add an entry to the hash table I want to do something like:
(setf (gethash (make-symbol "id1") ht) 1)
And then access it again with
(gethash 'id1 ht)
I don't think make-symbol is giving me what I want, and the key 'id1' isn't recognised.
What is the best way to make this key?
Error: symbol should be in a package and needs the correct case
In your case we have:
CL-USER 24 > (symbol-name (make-symbol "id0"))
"id0"
CL-USER 25 > (symbol-package (make-symbol "id0"))
NIL
Make sure that you think about the following:
intern the symbol in a package
intern the symbol in the correct package
make sure the symbol has the correct name with the correct case
write symbols with the case you intend to use, possibly you need to escape the symbol to preserve the case
Examples:
uppercased symbol and lowercase symbol name -> not eq
CL-USER 26 > (eq 'id0 (intern "id0" "CL-USER"))
NIL
uppercased symbol and uppercase symbol name -> is eq
CL-USER 27 > (eq 'id0 (intern "ID0" "CL-USER"))
T
an escaped lowercase symbol and a lowercase symbol name -> is eq
CL-USER 28 > (eq '|id0| (intern "id0" "CL-USER"))
T
make-symbol creates uninterned symbols. It means you will have a unique symbol every time. To get an interned symbol use intern instead.
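Putting the points above together, here is one possible sketch of a sequential key generator. The names make-id and *ht* are hypothetical, and interning into the KEYWORD package is an assumption made here so that the keys do not depend on the current package; interning into a package of your own works the same way.

```lisp
;; Hypothetical helper: builds the keyword :ID<n>, e.g. :ID1, :ID2.
;; The name is upcased so it matches what the default reader produces.
(defun make-id (n)
  (intern (format nil "ID~D" n) :keyword))

(defvar *ht* (make-hash-table))   ; the default EQL test suits symbol keys

(setf (gethash (make-id 1) *ht*) 1)

(gethash :id1 *ht*)               ; => 1, T

;; For contrast, MAKE-SYMBOL returns a fresh symbol on every call.
(eq (make-symbol "ID1") (make-symbol "ID1"))   ; => NIL
(eq (make-id 1) (make-id 1))                   ; => T
```

Because keywords are interned in a single well-known package, the same key can be used across several hash tables regardless of which package the code is read in.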

How to convert any type into String in Julia

Using Julia, I'd like to reliably convert any type into type String. There seem to be two ways to do the conversion in v0.5, either the string function or the String constructor. The problem is that you need to choose the right one depending upon the input type.
For example, typeof(string(1)) evaluates to String, but String(1) throws an error. On the other hand, typeof(string(SubString{String}("a"))) evaluates to SubString{String}, which is not a subtype of String. We instead need to do String(SubString{String}("a")).
So it seems the only reliable way to convert any input x to type String is via the construct:
String(string(x))
which feels a bit cumbersome.
Am I missing something here?
You should rarely need to explicitly convert to String. Note that even if your type definitions have String fields, or if your arrays have concrete element type String, you can still rely on implicit conversion.
For instance, here are examples of implicit conversion:
type TestType
    field::String
end
obj = TestType(split("x y")[1]) # construct TestType with a SubString
obj.field # the String "x"
obj.field = SubString("Hello", 1, 3) # assign a SubString
obj.field # the String "Hel"

What are the default separators for string interpolation?

It seems ",", "$" and "/" all serve as separators, but "_" does not.
x = "1"
"$x,x", "$x$x", "$x/1", "$x_1"
Is there any doc about this?
I believe this is because x_1 is a valid variable name in Julia, so it is trying to insert the value of that variable into the string.
The doc says:
The shortest complete expression after the $ is taken as the expression whose value is to be interpolated into the string
The internal workings are explained in GitHub issue #455, which could be summarised by:
The way string interpolation works is actually entirely defined in Julia. What happens is that the parser (in FemtoLisp) scans the code and finds a string literal, delimited by double quotes. If it finds no unescaped $ in the string, it just creates a string literal itself — ASCIIString or UTF8String depending on the content of the string. On the other hand, if the string has an unescaped $, it punts and hands the interpretation of the string literal to the str julia macro, which generates an expression that constructs the desired strings by concatenating string literals and interpolated values. This is a nice elegant scheme that lets the parser not worry about stuff like interpolation.
My guess is that #\, #\) #\] #\} and #\; (that is, ",", ")", "]", "}" and ";") are closing tokens for expressions, and that $ marks the start of the next interpolation.
