Learning clisp regexp - common-lisp

Trying to do my first steps in lisp:
I'm finding the following behaviour that, AFAIK, is incorrect.
[185]> (if (regexp:match "[:alnum:]" "2" :extended t) t nil)
NIL
[186]> (if (regexp:match "[:alnum:0-9]" "2" :extended t) t nil)
T
I understand :alnum: should include digits, but apparently it doesn't!
What am I doing wrong?

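A character class such as [:alnum:] is only valid inside a bracket expression, and the class name includes its own square brackets. Written alone, "[:alnum:]" is an ordinary bracket expression that matches one of the characters :, a, l, n, u, m, which is why "2" did not match (your second pattern matched only because of the 0-9 range). So if you want to match, you have to put the class inside an outer pair of brackets: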
(regexp:match "[[:alnum:]]" "2" :extended t)
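The difference between the two spellings can be illustrated without any regex library. This is only a sketch in plain Common Lisp: the function names are made up for the illustration, and ALPHANUMERICP is used as a rough stand-in for the [:alnum:] class, which is an assumption that holds for ASCII input.

```lisp
;; "[:alnum:]" on its own is a bracket expression that lists the literal
;; characters : a l n u m, so it is modelled here as a plain character set.
(defun in-literal-set-p (char)
  (and (find char ":alnum") t))

;; "[[:alnum:]]" is a bracket expression that contains the class itself.
;; ALPHANUMERICP approximates that class for ASCII characters.
(defun in-alnum-class-p (char)
  (alphanumericp char))

(in-literal-set-p #\2)  ; => NIL, a digit is not one of : a l n u m
(in-literal-set-p #\n)  ; => T
(in-alnum-class-p #\2)  ; => T
```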

Related

Puzzled about output of "typeof" in R

I cannot understand this output. I'm posting some dummy code because the real data is of course very large, but if you think it's necessary I could post it.
Field <- "COUNTRIES"
M <- a.dataframe.with.country.info
typeof(M[, Field])
#[1] "list"
typeof(M[, "COUNTRIES"])
#[1] "list"
typeof(M$COUNTRIES)
#[1] "character"
The function I need to use will use M[, Field] as input, and I need it to be a character, not a list.
I'm clueless.
Thanks in advance :D
I tried converting with unlist and as.character, but neither worked. The last resort is to open up the (third-party) function and change the line that is causing the error: Error in strsplit(M[, Field], sep) : non-character argument
Thanks to your replies, I can now see what was going on. I did not realize that tibbles are not the same as data frames. So indeed "pull" (as suggested by @MrFlick) and converting to a data frame did the job.
I cannot change the M[, Field] line, because it is third-party code, but it was the source of the error in the strsplit call.

How would I print out a list with spaces in between elements

I'm trying to print out a list that looks something like this: (setq lst (list '- '- '- '- '-)). In the past I used print to print the whole list, but that puts a parenthesis on each side, which I do not want to see. I want to use something like (format t) to print each element of my list, and I have something like this set up:
(loop for item from 0 to 4
      do (progn
           (format t "~X" (nth item lst))
           )
      )
This code prints the list like this: -----. But as I mentioned, I want spaces between the elements, so that the output looks like this: - - - - -. I used the directive "~X" because I looked up how to output spaces with format and that is apparently what you are supposed to use, but it does not work. If anybody knows how I could put spaces between elements, that would be greatly appreciated.
Why not just use the features provided by format:
CL-USER> (defvar *my-list* '(- - - -))
*MY-LIST*
CL-USER> (format nil "~{~A~^ ~}" *my-list*)
"- - - -"
CL-USER> (format t "~{~A~^ ~}" *my-list*)
- - - -
NIL
Here the first call to format outputs to a string to show where the spaces are placed. ~{ is an iteration directive that takes a list as its argument. The directives between the opening ~{ and closing ~} are used repeatedly as a format string for the elements of the input list. The ~^ directive causes an early escape from the iteration context when there are no more arguments; this prevents a trailing space from being added.
The second call to format just outputs to *standard-output*.
Regarding your update, that you posted in the answers to your own post:
First of all, you should edit your post to show us that you found a solution, rather than having us look through all the answers to see how much progress you made on your initial problem.
As already mentioned in another answer, you can iterate over the elements of a list using format's built-in syntax with ~{ ~} and ~^ (see the documentation!).
In your own solution, when you iterate over the list using loop, you can put a space at the end of the format string rather than calling format twice.
You can use loop for <elt> in <list> rather than iterating over the indices and calling nth at each step, which is slower and also more verbose.
The loop ... do <stuff> form already wraps <stuff> in what we call an implicit progn, i.e. you do not need to wrap all your instructions in a progn yourself; the loop macro does that for you.
There is also the macro dolist, which is (arguably) simpler to use in those cases where you simply want to iterate over a list.
To be fair, it looks like you are a Common Lisp beginner. In that case, I suggest you read the excellent book Practical Common Lisp, which covers in detail the loop macro, the format function, and a lot of basic principles. It is available for free online, and is often recommended to beginners, for good reasons!
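As a rough sketch of the dolist suggestion above, the helper below prints the elements of a list separated by single spaces, without indices, nth, or an explicit progn. The name print-spaced and its optional stream parameter are made up for this illustration.

```lisp
(defun print-spaced (items &optional (stream *standard-output*))
  "Print the elements of ITEMS to STREAM, separated by single spaces."
  (let ((first-p t))
    (dolist (item items)
      ;; Emit the separator before every element except the first,
      ;; so no trailing space is printed.
      (unless first-p
        (write-char #\Space stream))
      (format stream "~A" item)
      (setf first-p nil))))

(print-spaced '(- - - - -))  ; prints - - - - -
```

The ~{~A~^ ~} directive shown earlier does the same job in a single format call; the explicit loop is mainly useful when each element needs extra processing.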
OK, I came up with an ingenious solution to my problem which I definitely should've seen before.
(loop for item from 0 to 4
      do (progn
           (format t "~X" (nth item lst))
           (format t " ")
           )
      )
I didn't realize I could print a space like that but it works perfectly fine. Sorry for wasting you all's time but hopefully someone else can see this if they are having a brain fart like me and thanks to everyone who tried to help.

Correcting the regex "\[([a-zA-Z0-9_-]+)]"

The following cl-ppcre regular expression generates an error:
(ppcre:scan-to-strings "\[([a-zA-Z0-9_-]+)]" "[has-instance]")
debugger invoked on a CL-PPCRE:PPCRE-SYNTAX-ERROR in thread
#<THREAD "main thread" RUNNING {10010B0523}>:
Expected end of string. at position 16 in string "[([a-zA-Z0-9_-]+)]"
What I was expecting as return values is:
"[has-instance]"
#("has-instance")
in order to get at the string within the brackets. Can someone provide a regex correction? Thanks.
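In a Lisp string literal, the backslash is a single escape character: you need it only to include a backslash or a double quote, and in front of any other character it is simply dropped by the reader (§2.4.5 Double-Quote):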
If a single escape character is seen, the single escape character is discarded, the next character is accumulated, and accumulation continues.
That means that:
"\[([a-zA-Z0-9_-]+)]"
is parsed the same as the following, where backslash is not present:
"[([a-zA-Z0-9_-]+)]"
The Perl-compatible syntax implemented by CL-PPCRE treats an opening square bracket as the start of a character class, which extends to the next closing bracket.
Thus, the above reads the following as a class:
[([a-zA-Z0-9_-]
The corresponding regex tree is:
CL-USER> (ppcre:parse-string "[([a-zA-Z0-9_-]")
(:CHAR-CLASS #\( #\[ (:RANGE #\a #\z) (:RANGE #\A #\Z) (:RANGE #\0 #\9) #\_ #\-)
Note in particular that the opening parenthesis inside it is treated literally. When the parser encounters the closing parenthesis that follows the above fragment, it interprets it as the end of a register group, but no such group was started, hence the error message at position 16 of the string.
To avoid treating the bracket as a character class, it must be preceded by a literal backslash in the string, as you tried to do, but in order to do so you must write two backslash characters:
CL-USER> (ppcre:parse-string "\\[([a-zA-Z0-9_-]+)]")
(:SEQUENCE #\[
 (:REGISTER
  (:GREEDY-REPETITION 1 NIL
   (:CHAR-CLASS (:RANGE #\a #\z) (:RANGE #\A #\Z) (:RANGE #\0 #\9) #\_ #\-)))
 #\])
The closing square bracket needs no backslash.
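The reader behaviour described above can be checked at the REPL without CL-PPCRE. The forms below are a minimal sketch that only inspects what the Lisp reader stores in each string.

```lisp
;; A single backslash is dropped by the reader: the string holds only "[".
(length "\[")        ; => 1
(string= "\[" "[")   ; => T

;; Two backslashes yield one literal backslash followed by "[",
;; which is what the regex parser needs to see.
(length "\\[")       ; => 2
(char "\\[" 0)       ; => #\\
```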
I encourage you to write regular expressions in Lisp using the tree form, with :regex terms when it improves clarity: it avoids having to deal with the kind of problems that escaping brings. For example:
CL-USER> (ppcre:scan-to-strings
          '(:sequence "[" (:register (:regex "[a-zA-Z0-9_-]+")) "]")
          "[has-instance]")
"[has-instance]"
#("has-instance")
Escape the square brackets with a doubled backslash.
Escaping the closing bracket as well is optional, but it makes the intent explicit.
(cl-ppcre:scan-to-strings "\\[([a-zA-Z0-9_-]+)\\]" "[has-instance]")
;; "[has-instance]"
;; #("has-instance")
For those who are new to Common Lisp, you can load cl-ppcre using Quicklisp:
(load "~/quicklisp/setup.lisp") ;; adjust path to where you installed your quicklisp
(ql:quickload :cl-ppcre)

Regular expression complex combination : (^ +)|( +$)

I consider myself a novice at regular expressions. I came across an R script that removes whitespace from a string (say, a line) using gsub().
Here is the gsub() call, with what is in my opinion a complex pattern:
gsub("(^ +)|( +$)", "", line)
Can anyone explain thoroughly what this expression means?
An example would make this much easier.
Please also provide some links where I can learn more about regex, because I found no good sources when I looked.
Thanks for your consideration.
The regex just trims the spaces at both ends of the string. I think the base R function trimws is clearer, although by default it also strips tabs and newlines, not only spaces.
(^ +)|( +$)
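^ matches the start of the string.
" +" (a space followed by a plus) matches one or more spaces.
$ matches the end of the string.
| is alternation: either the left-hand group or the right-hand group may match.
So (^ +) is a run of leading spaces and ( +$) is a run of trailing spaces; gsub replaces both with the empty string.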
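To make the two alternatives concrete, here is a small sketch in Common Lisp rather than R (an assumption made only to stay consistent with the other examples on this page). Each half of the pattern is mapped onto a standard string function; the name trim-spaces is made up for the illustration.

```lisp
(defun trim-spaces (line)
  "Remove leading and trailing spaces from LINE, leaving inner spaces alone."
  (string-right-trim " "                            ; counterpart of ( +$)
                     (string-left-trim " " line)))  ; counterpart of (^ +)

(trim-spaces "   some text   ")  ; => "some text"
```

Only spaces at the two ends are removed; the space between the words is untouched, just as with the gsub call.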

How to properly tell FORMAT to discard input

I want a dynamic way of telling FORMAT to discard output depending on a certain global variable set before the actual call. I figured that changing t to nil should do the trick, but I am not satisfied, as I will then not be able to use those FORMAT calls at any point where the returned string could be confused with an actual return value.
E.g:
Telling FORMAT to print to standard output (works fine)
(let ((*the-var* t))
  (FORMAT *the-var* "some text")
  #|do some other stuff|#)
->"some text"
->'return-value'
Telling FORMAT to discard output (works fine)
(let ((*the-var* nil))
  (FORMAT *the-var* "some text")
  #|do some other stuff|#)
->'return-value'
Telling FORMAT to discard output (does not work, as the string returned by FORMAT might be mistaken for an actual return value)
(let ((*the-var* nil)) ;no return value intended//nil expected
  #|do some stuff|#
  (FORMAT *the-var* "some text"))
->"some text"
Therefore I wonder whether there is a way of telling FORMAT to discard output without too much fuss, such as setting *the-var* to a "/dev/null"-like stream or wrapping the call in a conditional?
A broadcast stream with no component streams is the Common Lisp way to discard output. You can create one with make-broadcast-stream.
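A minimal sketch of that approach follows. The variable *log-output* and the function report are hypothetical names introduced only for this example; the point is that rebinding the variable to an empty broadcast stream silences the output while FORMAT still returns NIL rather than a string.

```lisp
;; Hypothetical destination variable; defaults to standard output.
(defvar *log-output* *standard-output*)

(defun report (control &rest args)
  "Write a formatted message to *LOG-OUTPUT*. Returns NIL."
  (apply #'format *log-output* control args))

;; A broadcast stream with no component streams swallows everything
;; written to it, so nothing is printed here.
(let ((*log-output* (make-broadcast-stream)))
  (report "some text"))  ; => NIL
```

Because the destination is a stream rather than NIL, the call cannot leak a string into the surrounding form's return value.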
FORMAT does not discard output.
If you pass NIL to FORMAT as the destination, it returns the output as a string instead of printing to a stream.
The best way to not print anything is to not call FORMAT.
It makes very little sense to use FORMAT to generate output and not use that output for display. Just check if you want output or not.
(let ((output-p nil)) ; set to T when output is wanted
  #|do some stuff|#
  (when output-p
    (FORMAT t "some text")))
