trying to use cl-lexer on a file containing "{" and "}" - common-lisp

Using the file "test-lexer.lisp", I have very slightly modified lex to be
(defparameter *lex* (test-lexer "{ 1.0 12 fred 10.23e12"))
and increased the number of times test repeats to 6
(defun test ()
(loop repeat 6
collect (multiple-value-list (funcall *lex*))))
and tried modifying test-lexer in a number of ways to try to get it to recognize "{" as a token.
For example, adding [:punct:] in (deflexer test-lexer ...)
by changing
("[:alpha:][:alnum:]*"
(return (values 'name %0)))
to
("[:alpha:][:alnum:][:punct:]*"
(return (values 'name %0)))
and consistently get errors like
"""Lexer unable to recognize a token in "{ 1.0 12 fred 10.23e12", position 0 ("{ 1.0 12 fred 10.23e")
[Condition of type SIMPLE-ERROR]"""
How can I specify "{" as a character to be recognized? Or is my problem elsewhere?

The cl-lexer system is based on regular expressions, so most literal characters simply stand for themselves. The brace character, however, has a special meaning in the regular expression language, so you need to quote it with a backslash. And to write a backslash inside a Lisp string, the backslash itself has to be escaped. Hence:
(deflexer test-lexer
("\\{" (return (values :grouping :open-brace))) ;; <-- Here
("[0-9]+([.][0-9]+([Ee][0-9]+)?)"
(return (values 'flt (num %0))))
("[0-9]+"
(return (values 'int (int %0))))
("[:alpha:][:alnum:]*"
(return (values 'name %0)))
("[:space:]+"))
I return the :open-brace value and the :grouping category, but you can choose to return something else if you want.
The test function then returns:
((:GROUPING :OPEN-BRACE) (FLT 1.0) (INT 12)
(NAME "fred") (FLT 1.023e13) (NIL NIL))
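The double backslash can look odd, so here is a minimal sketch in plain Common Lisp (no cl-lexer needed) that shows which characters actually end up in the string handed to the regex engine. The helper name string-chars is made up for this illustration.

```lisp
;; The Lisp reader consumes one level of backslashes before the
;; regex engine ever sees the string.
(defun string-chars (string)
  "Return the characters of STRING as a list."
  (coerce string 'list))

(string-chars "\\{")   ; two characters: a backslash and an open brace
(string-chars "\{")    ; one character: the reader drops the lone backslash
```

Only the first form delivers a backslash to the regex engine, which is why the rule is written as "\\{".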

Related

Escaping quotes in cl-ppcre regex

Background
I need to parse CSV files, and cl-csv et al. are too slow on large files and depend on cl-unicode, which my preferred Lisp implementation does not support. So I am improving cl-simple-table, which Sabra-on-the-hill benchmarked as the fastest CSV reader in a review.
At the moment, simple-table's line parser is rather fragile, and it breaks if the separator character appears within a quoted string. I'm trying to replace the line parser with cl-ppcre.
Attempts
Using the Regex Coach, I've found a regex that works in almost all cases:
("[^"]+"|[^,]+)(?:,\s*)?
The challenge is getting this Perl regex string into something I can use in cl-ppcre to split the line. I have tried passing the regex string, with various escapes for the ":
(defparameter bads "\"AER\",\"BenderlyZwick\",\"Benderly and Zwick Data: Inflation, Growth and Stock returns\",31,5,0,0,0,0,5,\"https://vincentarelbundock.github.io/Rdatasets/csv/AER/BenderlyZwick.csv\",\"https://vincentarelbundock.github.io/Rdatasets/doc/AER/BenderlyZwick.html\"
"Bad string, note a separator character in the quoted field, near Inflation")
(ppcre:split "(\"[^\"]+\"|[^,]+)(?:,\s*)?" bads)
NIL
Neither a single, double, triple, nor quadruple \ works.
I've parsed the string to see what the parse tree looks like:
(ppcre:parse-string "(\"[^\"]+\"|[^,]+)(?:,s*)?")
(:SEQUENCE (:REGISTER (:ALTERNATION (:SEQUENCE #\" (:GREEDY-REPETITION 1 NIL (:INVERTED-CHAR-CLASS #\")) #\") (:GREEDY-REPETITION 1 NIL (:INVERTED-CHAR-CLASS #\,)))) (:GREEDY-REPETITION 0 1 (:GROUP (:SEQUENCE #\, (:GREEDY-REPETITION 0 NIL #\s)))))
and passed the resulting tree to split:
(ppcre:split '(:SEQUENCE (:REGISTER (:ALTERNATION (:SEQUENCE #\" (:GREEDY-REPETITION 1 NIL (:INVERTED-CHAR-CLASS #\")) #\") (:GREEDY-REPETITION 1 NIL (:INVERTED-CHAR-CLASS #\,)))) (:GREEDY-REPETITION 0 1 (:GROUP (:SEQUENCE #\, (:GREEDY-REPETITION 0 NIL #\s))))) bads)
NIL
I also tried various forms of *allow-quoting*:
(let ((ppcre:*allow-quoting* t))
(ppcre:split "(\\Q\"\\E[^\\Q\"\\E]+\\Q\"\\E|[^,]+)(?:,\s*)?" bads))
I've read through the cl-ppcre docs, but there are very few examples of using parse trees, and no examples of escaping quotes.
Nothing seems to work.
I was hoping that the Regex Coach would provide a way to see the S-expression parse tree form of the Perl syntax string. That would be a very useful feature, allowing you to experiment with the regex string and then copy & paste the parse tree in Lisp code.
Does anyone know how to escape quotes in this example?
In this answer I focus on the errors in your code and try to explain how you could make it work. As explained by @Svante, this might not be the best course of action for your use case. In particular, your regex might be too tailored to your known test inputs and might miss cases that could arise later.
For example, your regex considers a field to be either a string delimited by double quotes with no inner double quotes (not even escaped ones), or a sequence of characters other than the comma. If, however, a field starts with a normal letter and then contains a double quote, the quote will be treated as part of the field.
Fixing the test string
Maybe there was a problem when formatting your question, but the form introducing bads is malformed.
Here is a fixed definition for *bads*. Notice the asterisks around the name of the special variable: this useful convention, also known as "earmuffs", helps distinguish special variables from lexical ones.
(defparameter *bads*
"\"AER\",\"BenderlyZwick\",\"Benderly and Zwick Data: Inflation, Growth and Stock returns\",31,5,0,0,0,0,5,\"https://vincentarelbundock.github.io/Rdatasets/csv/AER/BenderlyZwick.csv\",\"https://vincentarelbundock.github.io/Rdatasets/doc/AER/BenderlyZwick.html\"")
Escape characters in regex
The parse tree you obtain contains this:
(... (:GREEDY-REPETITION 0 NIL #\s) ...)
There is a literal character #\s in your parse-tree. To understand why, let's define two auxiliary functions:
(defun chars (string)
"Convert a string to a list of char names"
(map 'list #'char-name string))
(defun test (s)
(list :parse (chars s)
:as (ppcre:parse-string s)))
For example, here is how the different strings below are parsed:
(test "s")
=> (:PARSE ("LATIN_SMALL_LETTER_S") :AS #\s)
(test "\s")
=> (:PARSE ("LATIN_SMALL_LETTER_S") :AS #\s)
(test "\\s")
=> (:PARSE ("REVERSE_SOLIDUS" "LATIN_SMALL_LETTER_S")
:AS :WHITESPACE-CHAR-CLASS)
Only in the last case, where the backslash (reverse solidus) is escaped, does the PPCRE parser see both the backslash and the following character #\s and interpret the sequence as :WHITESPACE-CHAR-CLASS. The Lisp reader reads \s as a plain s, because inside a string a single backslash just quotes the next character and is then dropped.
I tend to work with parse trees directly because a lot of escaping headaches go away (headaches that, in my opinion, \Q and \E only make worse). A fixed parse tree is, for example, the following one, where I replaced the #\s with the desired keyword and removed the :register nodes that were not useful:
(:sequence
(:alternation
(:sequence #\"
(:greedy-repetition 1 nil
(:inverted-char-class #\"))
#\")
(:greedy-repetition 1 nil (:inverted-char-class #\,)))
(:greedy-repetition 0 1
(:group
(:sequence #\,
(:greedy-repetition 0 nil :whitespace-char-class)))))
Why the result is NIL
Remember that you are trying to split the string with this regex, but the regex actually describes a field and the following comma. From the regex's point of view your string is nothing but a sequence of separators, and split drops the trailing empty strings left between them, so the result is NIL, as in this example:
(split #\, ",,,,,,")
NIL
With a simpler example, you can see that using words as separators gives:
(split "[a-z]+" "abc0def1z3")
=> ("" "0" "1" "3")
But if the separators also include digits, then the result is NIL:
(split "[a-z0-9]+" "abc0def1z3")
=> NIL
Looping over fields
With the regex you defined, it is easier to use do-register-groups. It is a loop construct that iterates over the string by trying to match the regex successively on the string, binding each (:register ...) in the regex to a variable.
If you put (:register ...) around the first (:alternation ...), you will sometimes capture the double quotes (first branch of the alternation):
(do-register-groups (field)
('(:SEQUENCE
(:register
(:ALTERNATION
(:SEQUENCE #\"
(:GREEDY-REPETITION 1 NIL
(:INVERTED-CHAR-CLASS #\"))
#\")
(:GREEDY-REPETITION 1 NIL (:INVERTED-CHAR-CLASS #\,))))
(:GREEDY-REPETITION 0 1
(:GROUP
(:SEQUENCE #\,
(:GREEDY-REPETITION 0 NIL :whitespace-char-class)))))
*bads*)
(print field))
"\"AER\""
"\"BenderlyZwick\""
"\"Benderly and Zwick Data: Inflation, Growth and Stock returns\""
"31"
"5"
"0"
"0"
"0"
"0"
"5"
"\"https://vincentarelbundock.github.io/Rdatasets/csv/AER/BenderlyZwick.csv\""
"\"https://vincentarelbundock.github.io/Rdatasets/doc/AER/BenderlyZwick.html\""
Another option is to add two :register nodes, one for each branch of the alternation; that means binding two variables, one of them being NIL for each successful match:
(do-register-groups (quoted simple)
('(:SEQUENCE
(:ALTERNATION
(:SEQUENCE #\"
(:register ;; <- quoted (first register)
(:GREEDY-REPETITION 1 NIL
(:INVERTED-CHAR-CLASS #\")))
#\")
(:register ;; <- simple (second register)
(:GREEDY-REPETITION 1 NIL (:INVERTED-CHAR-CLASS #\,))))
(:GREEDY-REPETITION 0 1
(:GROUP
(:SEQUENCE #\,
(:GREEDY-REPETITION 0 NIL :whitespace-char-class)))))
*bads*)
(print (or quoted simple)))
"AER"
"BenderlyZwick"
"Benderly and Zwick Data: Inflation, Growth and Stock returns"
"31"
"5"
"0"
"0"
"0"
"0"
"5"
"https://vincentarelbundock.github.io/Rdatasets/csv/AER/BenderlyZwick.csv"
"https://vincentarelbundock.github.io/Rdatasets/doc/AER/BenderlyZwick.html"
Inside the loop you could push each field into a list or a vector to be processed later.
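As a rough sketch of that last step, the function below collects the fields of one line into a list. It assumes cl-ppcre is available through Quicklisp, the name split-csv-line is made up for this example, and the regex is simply the string form of the two-register parse tree above (note the doubled backslash in \\s). It inherits the same limitations: no escaped quotes inside quoted fields, and empty fields are skipped.

```lisp
;; Assumption: Quicklisp is installed; any other way of loading
;; cl-ppcre works just as well.
(ql:quickload "cl-ppcre" :silent t)

(defun split-csv-line (line)
  "Return the fields of LINE as a list of strings (hypothetical helper)."
  (let ((fields '()))
    ;; First register: contents of a quoted field.
    ;; Second register: an unquoted field.
    (ppcre:do-register-groups (quoted simple)
        ("(?:\"([^\"]+)\"|([^,]+))(?:,\\s*)?" line (nreverse fields))
      (push (or quoted simple) fields))))

(split-csv-line "\"a, b\",1,c")
```

The fourth element of the do-register-groups header is its result form, so the collected list is returned in the original field order once the loop finishes.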

How to prevent close!-ing before put-ing in onto-chan

I'd like to run a code like
(->> input
(partition-all 5)
(map a-side-effect)
dorun)
with the partitioning of input and the output (a-side-effect) running asynchronously.
To experiment, I wrote the code below.
;; using boot-clj
(set-env! :dependencies '[[org.clojure/core.async "0.2.374"]])
(require '[clojure.core.async :as async :refer [<! <!! >! >!!]])
(let [input (range 18)
c (async/chan 1 (comp (partition-all 5)
(map prn)))]
(async/onto-chan c input false)
(async/close! c))
Explanation of this code:
In the real program, neither the elements of input nor their number is known before running; elements become available in batches of 0 to 10 at a time.
async/onto-chan is used to put a seq of elements (a fragment of input) onto the channel c, and it will be called many times, so the third argument is false.
prn is a stand-in for a-side-effect.
I expected the code above to print
[0 1 2 3 4]
[5 6 7 8 9]
[10 11 12 13 14]
[15 16 17]
in the REPL, but it prints nothing.
Then I added a wait, like this:
(let [c (async/chan 1 (comp (partition-all 5)
(map prn)))]
(async/onto-chan c (range 18) false)
(Thread/sleep 1000) ;wait
(async/close! c))
This code gave the output I expected above.
I then inspected core.async/onto-chan.
Here is what I think happened:
The channel c was closed (core.async/close!) by my code.
Each item passed to core.async/onto-chan was then put (core.async/>!) in vain by the go-loop inside onto-chan, because the channel c was already closed.
Is there a reliable way to make sure the items are put before close! is called?
Should I write a synchronous version of onto-chan that does not use go-loop?
Or is my approach wrong?
Your second example with Thread.sleep only ‘works’ by accident.
The reason it works is that every transformed result value that comes out of c’s transducer is nil, and since nils are not allowed in channels, an exception is thrown, and no value is put into c: this is what allows the producer onto-chan to continue putting into the channel and not block waiting. If you paste your second example into the REPL you’ll see four stack traces – one for each partition.
The nils are of course due to mapping over prn, which is a side-effecting function that returns nil for all inputs.
If I understand your design correctly, your goal is to do something like this:
(defn go-run! [ch proc]
(async/go-loop []
(when-let [value (<! ch)]
(proc value)
(recur))))
(let [input (range 18)
c (async/chan 1 (partition-all 5))]
(async/onto-chan c input)
(<!! (go-run! c prn)))
You really do need a producer and a consumer, else your program will block. I’ve introduced a go-loop consumer.
Very generally speaking, map and side-effects don’t go together well, so I’ve extracted the side-effecting prn into the consumer.
onto-chan cannot be called ‘many times’ (at least in the code shown) so it doesn’t need the false argument.
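Taking megakorre's idea: onto-chan returns a channel that closes once all the items have been copied, so you can wait on it before calling close!: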
(let [c (async/chan 1 (comp (partition-all 5)
(map prn)))
put-ch (async/onto-chan c (range 18) false)]
(async/alts!! [put-ch])
(async/close! c))

Lisp, why is this number not a float

Using Common Lisp, I am trying to loop through a list of students, and if the GPA is greater than or equal to 3.0 I want to push a 1 onto another list called equal_names. The problem I am having is that the interpreter keeps saying the GPA in the comparison is "not of type (or rational float)". Why am I getting this error?
Yes, this is for homework. Also this is my first time asking on here, so if you need anything else please let me know.
Sample of the list I am getting the GPA from, where the GPA is 2.307...:
(SETQ students (LIST
(LIST (LIST 'Abbott 'Ashley 'J) '8697387888 'NONE 2.3073320999676614)))
The code I have written:
(setq gpa_count ())
(loop for x in students
if(>= 3.0 (cdr (cdr (cdr x))))
do(push '1 gpa_count))
Given a non-empty list cdr returns the tail of that list, i.e. the list that contains all the elements of the list but the first. The important thing to note is that it returns a list, not an element. That is (cdr (cdr (cdr x))) returns the list (2.30733...), not the float 2.30733.
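The difference is easy to check at the REPL. This is a small sketch using the record from the question; the variable name *student* is made up for the illustration.

```lisp
;; One student record, shaped like the entries in the question.
(defparameter *student*
  '((Abbott Ashley J) 8697387888 NONE 2.3073320999676614))

(cdr (cdr (cdr *student*)))        ; a one-element list holding the GPA
(car (cdr (cdr (cdr *student*))))  ; the GPA itself, a number
(listp (cdddr *student*))          ; => T
(realp (cadddr *student*))         ; => T
```

Only the second form yields something that >= can compare.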
The loop iterates the outer list. To understand the code in the loop you can look at the first element in students, which is:
'((Abbott Ashley J) 8697387888 NONE 2.3073320999676614)
Now we are going to navigate the list. Every time you skip past an element, add a d.
Every time you pick a value or descend into a nested list, add an a.
To find out how to access the number 2.307..., look at the first element of students and walk along it:
(Abbott Ashley J) d
8697387888 d
NONE d
Now we are at the part you are interested in, i.e. 2.3073320999676614, so you add an a. Write the letters in reverse order, then put a c in front and an r at the end: it becomes cadddr. In light of that, your loop should be:
(setq students '(("Mary Larson" 333 NONE 1.1)
("Mina Morson" 333 NONE 2.5)
("Magnus Outsider" 333 NONE 4.1)))
(setq gpa_count ())
(loop for x in students
if (>= (cadddr x) 3.0) ; GPA first: true when the GPA is at least 3.0
do (push '1 gpa_count))
gpa_count ; ==> (1)
Another example:
(setq x '(1 (2 3) (3 4 (5 6) 7))) ; ==> (1 (2 3) (3 4 (5 6) 7))
To get the 3 inside (2 3), follow the parts: 1 == d, (2 3) == a, 2 == d, 3 == a. In reverse that is adad; add c and r at the ends ==> cadadr. Thus:
(cadadr '(1 (2 3) (3 4 (5 6) 7))) ; ==> 3
To get the 5 we do the same: 1 == d, (2 3) == d, and then we have the list we want == a.
Then 3 == d, 4 == d, (5 6) == a. The 5 is the first element == a. In reverse: aaddadd. CL only guarantees accessors with up to four a/d letters, so we need to split it into groups of at most four from the right. Thus it becomes:
(caadr (cdaddr '(1 (2 3) (3 4 (5 6) 7)))) ; ==> 5
Now you can pick out any number or list without spelling out every step. E.g. to get (5 6): ddadda, which reversed and split up becomes (cadr (cdaddr x)).
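If you want to check a letter sequence before splitting it into real accessors, the recipe above can be sketched as a small helper. The name cxr is made up here and is not a standard function; it takes the a/d letters in their final (already reversed) order and applies them from right to left, just as the built-in c...r accessors do.

```lisp
(defun cxr (letters list)
  "Apply the a/d LETTERS of a c...r name to LIST, rightmost letter first."
  (reduce (lambda (letter acc)
            (if (char= letter #\a) (car acc) (cdr acc)))
          letters
          :from-end t
          :initial-value list))

(defparameter *tree* '(1 (2 3) (3 4 (5 6) 7)))

(cxr "adad" *tree*)     ; same path as cadadr
(cxr "aaddadd" *tree*)  ; same path as (caadr (cdaddr ...))
(cxr "addadd" *tree*)   ; same path as (cadr (cdaddr ...))
```

Unlike the built-in accessors, this helper accepts any number of letters, which makes it handy for experimenting.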
Hope it helps.
If your data format is consistent then
(fourth x)
will return the GPA.
Going further,
(setf (symbol-function 'gpa) (function fourth))
would provide
(gpa x)
as "an accessor" for the gpa in the data structure.
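With such an accessor the whole counting loop can shrink to a single call. The following is only a sketch: it defines gpa with defun instead of setting the symbol function, the name count-high-gpa is made up, and it assumes every record keeps its GPA in the fourth position.

```lisp
(defun gpa (student)
  "Return the GPA of STUDENT, assumed to be the fourth element of the record."
  (fourth student))

(defun count-high-gpa (students &optional (threshold 3.0))
  "Count the records in STUDENTS whose GPA is at least THRESHOLD."
  (count-if (lambda (value) (>= value threshold))
            students
            :key #'gpa))

(defparameter *students*
  '(("Mary Larson" 333 NONE 1.1)
    ("Mina Morson" 333 NONE 2.5)
    ("Magnus Outsider" 333 NONE 4.1)))

(count-high-gpa *students*)
```

Returning a count avoids building a list of 1s only to measure its length afterwards.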
My CLISP 2.49 gives this error message:
*** - >=: (2.307332) is not a real number
Let's look at that error message: >=: (2.307332) is not a real number.
The error happens at the call to >= and one argument is a list of a number, not a number.
Since you try to extract the number from a list, does that extract work?
We see that you call CDR. CDR of a list returns a list. So there is the error. You need to extract the number from the list.
Btw., CLISP has commands like help, where, backtrace, ... to further investigate the problem. Just type help and return, without anything else, and you see a list of commands.

char representation clojure

How can I represent a char (character) in Clojure?
I would also like an example that tests it with the char? function:
(println (char? 1))
(println (char? (char 'a')))
Use a backslash for representing an individual character. For instance:
(char? \a)
returns true

Create a variable name from a string in Lisp

I'm trying to take a string and convert it into a variable name. I thought (make-symbol) or (intern) would do this, but apparently it's not quite what I want, or I'm using it incorrectly.
For example:
> (setf (intern (string "foo")) 5)
> foo
5
Here I would be trying to create a variable named 'foo' with a value of 5. Except, the above code gives me an error. What is the command I'm looking for?
There are a number of things to consider here:
SETF does not evaluate its first argument. It expects a symbol or a form that specifies a location to update. Use SET instead.
Depending upon the vintage and settings of your Common Lisp implementation, symbol names may default to upper case. Thus, the second reference to foo may actually refer to a symbol whose name is "FOO". In this case, you would need to use (intern "FOO").
The call to STRING is harmless but unnecessary if the value is already a string.
Putting it all together, try this:
> (set (intern "FOO") 5)
> foo
5
Use:
CL-USER 7 > (setf (SYMBOL-VALUE (INTERN "FOO")) 5)
5
CL-USER 8 > foo
5
This also works with a variable:
CL-USER 11 > (let ((sym-name "FOO"))
(setf (SYMBOL-VALUE (INTERN sym-name)) 3))
3
CL-USER 12 > foo
3
Remember also that by default the reader converts symbol names to uppercase. If you want to reach a symbol through a string, you therefore have to use an uppercase string.
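The pieces above can be combined into one small function. This is a sketch rather than a recommended idiom: the name set-variable-by-name is made up, it interns into whatever package is current at the time of the call, and it assumes the default readtable case, so the name is upcased before interning.

```lisp
(defun set-variable-by-name (name value)
  "Intern NAME (upcased) in the current package and set its global value."
  (setf (symbol-value (intern (string-upcase name))) value))

(set-variable-by-name "foo" 5)

(symbol-value 'foo)                 ; the value stored above
(symbol-value (find-symbol "FOO"))  ; same symbol, looked up by name
```

Because the name is upcased inside the function, callers can pass "foo" or "FOO" and reach the same symbol.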

Resources