CLISP open-http example - http

I am trying to read a series of web pages with CLISP, if they exist, but I don't understand how open-http works to skip non existing web pages.
I have the following:
(dolist (word '(a b c))
(with-open-stream (stream (ext:open-http
(format nil
"https://en.wikipedia.org/wiki/~a.html"
word)
:if-does-not-exist nil))
(when stream
(print word))))
I want to simply skip a web-page if it doesn't exist, but CLISP seems to hang and returns an "Invalid argument" error.
Could anyone explain how the argument :if-does-not-exist works and/or provide examples of how to use open-http. Thanks!

It does work for me:
(with-open-stream (stream (ext:open-http
"http://stackoverflow.com/questions/234242424242"
:if-does-not-exist nil))
(format t "~&Stream: ~A~%" stream))
Output:
;; connecting to "http://stackoverflow.com/questions/234242424242"...connected...HTTP/1.1 404 Not Found
;; HTML source of Page not found
Stream: NIL
NIL
There is a delay to get the connection, but it works.
If the page does exist:
[7]> (with-open-stream (stream (ext:open-http
"http://stackoverflow.com/questions/36003343/clisp-open-http-example"
:if-does-not-exist nil))
(format t "~&Stream: ~A~%" stream))
;; connecting to "http://stackoverflow.com/questions/36003343/clisp-open-http-example"...connected...HTTP/1.1 200 OK
Stream: #<IO INPUT-BUFFERED SOCKET-STREAM CHARACTER stackoverflow.com:80>
NIL
With Wikipedia I couldn't make it work since Wikipedia.org re-directs it to HTTPS and EXT:OPEN-HTTP neither can handle HTTPS directly, nor it can handle redirects:
Here if HTTPS is used directly:
[10]> (with-open-stream (stream (ext:open-http
"https://en.wikipedia.org/wiki/Common_Lisp"
:if-does-not-exist nil))
(format t "~&Stream: ~A~%" stream))
*** - OPEN-HTTP: "https://en.wikipedia.org/wiki/Common_Lisp" is not an HTTP URL
The following restarts are available:
ABORT :R1 Abort main loop
Break 1 [11]> :r1
If "https" is replaced by "http", CLISP doesn't construct a proper address:
[12]> (with-open-stream (stream (ext:open-http
"http://en.wikipedia.org/wiki/Common_Lisp"
:if-does-not-exist nil))
(format t "~&Stream: ~A~%" stream))
;; connecting to "http://en.wikipedia.org/wiki/Common_Lisp"...connected...HTTP/1.1 301 TLS Redirect --> "https://en.wikipedia.org/wiki/Common_Lisp"
;; connecting to "http://en.wikipedia.orghttps://en.wikipedia.org/wiki/Common_Lisp"...
*** - PARSE-INTEGER: substring "" does not have integer syntax at position 0
The following restarts are available:
ABORT :R1 Abort main loop
Break 1 [13]>

Related

How to properly use Common Lisp's multiple-value-bind and handler-case on this HTTP-request interface?

I am using SBCL, Emacs, and Slime. In addition, I am using the library Dexador.
Dexador documentation provides an example on how to handle failed HTTP requests.
From the official documentation, it says:
;; Handles 400 bad request
(handler-case (dex:get "http://lisp.org")
(dex:http-request-bad-request ()
;; Runs when 400 bad request returned
)
(dex:http-request-failed (e)
;; For other 4xx or 5xx
(format *error-output* "The server returned ~D" (dex:response-status e))))
Thus, I tried the following. It must be highlighted that this is part of a major system, so I am simplifying it as:
;;Good source to test errors: https://httpstat.us/
(defun my-get (final-url)
(let* ((status-code)
(response)
(response-and-status (multiple-value-bind (response status-code)
(handler-case (dex:get final-url)
(dex:http-request-bad-request ()
(progn
(setf status-code
"The server returned a failed request of 400 (bad request) status.")
(setf response nil)))
(dex:http-request-failed (e)
(progn
(setf status-code
(format nil
"The server returned a failed request of ~a status."
(dex:response-status e)))
(setf response nil))))
(list response status-code))))
(list response-and-status response status-code)))
My code output is close to what I want. But I do not understand it is output.
When the HTTP request is successful, this is the output:
CL-USER> (my-get "http://www.paulgraham.com")
(("<!DOCTYPE HTML PUBLIC \"-//W3C//DTD HTML 4.01 Transitional//EN\">
<html><script type=\"text/javascript\">
<!--
... big HTML omitted...
</script>
</html>"
200)
NIL NIL)
I was expecting (or wished) the output to be something like: '(("big html" 200) "big html" 200).
But, things happen to be even weirder when the HTTP request fails. For instance:
CL-USER> (my-get "https://httpstat.us/400")
((NIL NIL) NIL
"The server returned a failed request of 400 (bad request) status.")
I was expecting: '((NIL "The server returned a failed request of 400 (bad request) status.") NIL "The server returned a failed request of 400 (bad request) status.")
Or:
CL-USER> (my-get "https://httpstat.us/425")
((NIL NIL) NIL "The server returned a failed request of 425 status.")
Again, I was expecting: ((NIL "The server returned a failed request of 425 status.") NIL "The server returned a failed request of 425 status.")
I am afraid there is a variable overshadowing problem happening - not sure, though.
How can I create a function so that I can safely store in variables the response and the status-code independent of being a failed or successful request?
If the request is successful, I have ("html" 200). If it fails, it would be: (nil 400) or other number (nil 425) - depending on the error message.
Your problem is that you failed to understand how multiple-value-bind, handler-case, and let* work. (Probably also setf and progn.)
TLDR
Quick fix:
(defun my-get (final-url)
(let* ((status-code)
(response)
(response-and-status
(multiple-value-bind (bresponse bstatus-code)
(handler-case (dex:get final-url)
(dex:http-request-bad-request ()
(values nil
"The server returned a failed request of 400 (bad request) status."))
(dex:http-request-failed (e)
(values nil
(format nil "The server returned a failed request of ~a status." (dex:response-status e)))))
(list (setf response bresponse)
(setf status-code bstatus-code)))))
(list response-and-status response status-code)))
Output:
CL-USER> (my-get "http://www.paulgraham.com")
(("big html" 200) "big html" 200)
CL-USER> (my-get "https://httpstat.us/400")
((NIL "The server returned a failed request of 400 (bad request) status.") NIL "The server returned a failed request of 400 (bad request) status.")
CL-USER> (my-get "https://httpstat.us/425")
((NIL "The server returned a failed request of 425 status.") NIL "The server returned a failed request of 425 status.")
So, Why Do You Get Your Current Result?
Preliminary
For multiple-value-bind, it binds multiple values returning from a values form into the corresponding variables.
CL-USER> (multiple-value-bind (a b)
nil
(list a b))
(NIL NIL)
CL-USER> (multiple-value-bind (a b)
(values 1 2)
(list a b))
(1 2)
CL-USER> (multiple-value-bind (a b)
(values 1 2 3 4 5)
(list a b))
(1 2)
For handler-case, it returns the value of the expression form when there is no error. When there is an error, it will execute the corresponding error-handling code.
CL-USER> (handler-case (values 1 2 3)
(type-error () 'blah1)
(error () 'blah2))
1
2
3
CL-USER> (handler-case (signal 'type-error)
(type-error () 'blah1)
(error () 'blah2))
BLAH1
CL-USER> (handler-case (signal 'error)
(type-error () 'blah1)
(error () 'blah2))
BLAH2
For let* form, variables are initialised to nil if init-forms are not provided. Same goes to the let form. The parentheses around these variables without init-forms are unneeded. Example:
CL-USER> (let* (a b (c 3))
(list a b c))
(NIL NIL 3)
Combining These Knowledges
When dex:get success (i.e. no error), it returns the values of dex:get, which is (values body status response-headers uri stream). With multiple-value-bind, your response-and-status is bound to the value of (list response status-code), which is ("big html" 200).
Since the code in both dex:http-request-bad-request and dex:http-request-failed will only be executed when dex:get failed, both response and status-code have the initial value nil. That's why you get (("big html" 200) nil nil) on success.
When dex:get failed, both response and status-code are setf to new values. Since (setf response nil) mutates the value of response and returns nil (the new value set), and since progn returns the value of last form, your progn returns nil for both error handling cases. That's why your response-and-status is bound to (nil nil) when failed.
This is an addendum to zacque's answer, which describes the problem and your confusions about multiple-value-bind & other things pretty well.
As mentioned in that answer, dex:get returns five values: body, status, response-headers, uri, stream. But it also signals one of various conditions on failure. So one obvious thing to do is simply have a function which returns the same five values (there's no reason to package some of them into a list), but handles the errors, relying on the handler to return suitable values. Such a function is terribly simple, and has no assignment.
(defun my-get (final-url)
(handler-case (dex:get final-url)
(dex:http-request-failed (e)
;; These are probably not the right set of values, but if the
;; first one is NIL we're basically OK.
(values nil
(dex:response-status e)
e
(format nil
"The server returned a failed request of ~a status."
(dex:response-status e))
nil))))
If you really want to package the values up into some structure you can do that, but generally it's easy to just handle multiple values. For instance a user of this function can now just ignore all but the first value rather than unpacking all sorts of grot, so:
(defun my-get-user (url)
(or (my-get url)
(error "oops")))

'Required argument is not a symbol' error in let binding

In the following code, I get a Required argument is not a symbol error.
(defconstant +localhost+ (vector 127 0 0 1))
(defun ip-from-hostname (hostname)
(sb-bsd-sockets:host-ent-addresses
(sb-bsd-sockets:get-host-by-name hostname)))
(defun test-connect
(let ((ip (car (ip-from-hostname "www.google.com")))
(socket (make-instance 'sb-bsd-sockets:inet-socket :type :stream :protocol :tcp)))
(sb-bsd-sockets:socket-bind socket +localhost+ 8080)
(sb-bsd-sockets:socket-connect socket ip)
(sb-bsd-sockets:socket-send socket "GET / HTTP/1.1" nil)
(write-line (sb-bsd-sockets:socket-receive socket nil 2048))))
(test-connect)
More complete error message:
Required argument is not a symbol: ((IP
(CAR
(IP-FROM-HOSTNAME "www.google.com")))
(SOCKET
(MAKE-INSTANCE
'SB-BSD-SOCKETS:INET-SOCKET :TYPE
:STREAM :PROTOCOL :TCP)))
I've narrowed down the issue to the section calling ip-from-hostname, but the strange thing is a boiled down version of the let binding works in the REPL:
(let ((ip (sb-bsd-sockets:host-ent-addresses (sb-bsd-sockets:get-host-by-name "www.google.com"))))
(write-line (write-to-string (car ip))))
I also tried replacing the ip-from-hostname call with its body thinking that it might be something to do with the arguments, but still no luck. Any thoughts?
(defun test-connect ...
... should start with a lambda list (the list of parameters), which is missing.
Remember, the syntax for DEFUN is:
defun function-name lambda-list
[[declaration* | documentation]]
form*

How do I catch the socket error in the TCP server of cl-async package?

How do I catch the unexpected disconnection of the TCP socket with cl-async tcp-server? For example the server below cannot handle the situation when the client closes the telnet terminal forcefully. I guess I should add the handler after the :event-cb keyword but I don't know how I can combine with the as:signal-handler.
(require 'cl-async)
(require 'babel)
(defun my-tcp-server ()
(format t "Starting server.~%")
(as:tcp-server
nil 8888 ; nil means 0.0.0.0 to listen for any address
;; read-cb
(lambda (socket data)
(let ((data-str ; stores the received data in utf8 string
(handler-case (babel:octets-to-string data :encoding :utf-8)
(babel-encodings:invalidutf8-continuation-byte (err)
(declare (ignore err))
(format nil "^#~%")))))
;; exits if received "bye"
(cond ((equal "bye" (string-right-trim '(#\Return #\Newline) data-str))
(as:close-socket socket)
(format t "Client disconnected.~%"))
(t (format t "~a" data-str) ; echo on the server side
(as:write-socket-data socket "Send to server > ")))))
;; handle SIGINT
:event-cb (as:signal-handler 2 (lambda (sig)
(declare (ignore sig))
(as:free-signal-handler 2)
(as:exit-event-loop)))
:connect-cb (lambda (socket)
(format t "Client connected.~%")
(as:write-socket-data socket "Send to server > "))))
(as:start-event-loop #'my-tcp-server)

common lisp: how to suppress newline or "soft return"

This code
(defun arabic_to_roman (filename)
(let ((arab_roman_dp '())
(arab nil)
(roman nil))
(with-open-file (in filename
:direction :input
:if-does-not-exist nil)
(when in
(loop for line = (read-line in nil)
while line do
(setq arab (subseq line 0 (search "=" line)))
(setq roman (subseq line (1+ (search "=" line)) (length line)))
(setf arab_roman_dp (acons arab roman arab_roman_dp))
;(format t "~S ~S~%" arab roman)
)))
(with-open-file (stream #p"ar_out.txt"
:direction :output
:if-exists :overwrite
:if-does-not-exist :create )
(write arab_roman_dp :stream stream :escape nil :readably nil))
'done!))
seems to work well. It takes a file with entries like this
1=I
2=II
...
and builds one large list of dotted pairs. However when I look at the output file, it seems as though soft returns or newlines have been inserted.
((4999 . MMMMCMXCIX) (4998 . MMMMCMXCVIII) (4997 . MMMMCMXCVII)
(4996 . MMMMCMXCVI) (4995 . MMMMCMXCV) (4994 . MMMMCMXCIV)
(4993 . MMMMCMXCIII) (4992 . MMMMCMXCII) (4991 . MMMMCMXCI) (4990 . MMMMCMXC)
...
I was expecting the output to look more like just one continuous line:
((4999 . MMMMCMXCIX) (4998 . MMMMCMXCVIII) (4997 . MMMMCMXCVII) (4996 . MMMCMXCVI) (4995 . MMMMCMXCV) (4994 . MMMMCMXCIV) (4993 . MMMMCMXCIII) (4992 . MMMMCMXCII) (4991 . MMMMCMXCI) (4990 . MMMMCMXC) ...
Is my code the way it is indeed throwing in newlines somehow? I've used the write version of princ which supposedly suppresses newlines. Later I want to read this file back into the program as just one big list, so I don't want newline issues.
It looks like the pretty-printer is being invoked (the default is implementation-dependent), to print it with indentation and human-readable line lengths. Use :pretty nil to disable this.
(write arab_roman_dp :stream stream :escape nil :readably nil :pretty nil)
A better way to write it:
use functions to create blocks of code, which can be combined
less side effects and fewer variables
no need to test for in
easy to understand control flow
Example:
(defun arabic_to_roman (filename)
(flet ((read-it ()
(with-open-file (in filename
:direction :input
:if-does-not-exist nil)
(loop for line = (read-line in nil)
for pos = (position #\= line)
while line collect (cons (subseq line 0 pos)
(subseq line (1+ pos))))))
(write-it (list)
(with-open-file (stream #p"ar_out.txt"
:direction :output
:if-exists :overwrite
:if-does-not-exist :create)
(write list :stream stream :escape nil :readably nil :pretty nil))))
(write-it (read-it))
'done-read-write))

How to interact with a process input/output in SBCL/Common Lisp

I have a text file with one sentence per line. I would like to lemmatize the worlds in each line using hunspell (-s option). Since I want to have the lemmas of each line separately, it wouldn't make sense to submit the whole text file to hunspell. I do need to send one line after another and have the hunspell output for each line.
Following the answers from How to process input and output streams in Steel Bank Common Lisp?, I was able to send the whole text file for hunspell one line after another but I was not able to capture the output of hunspell for each line. How interact with the process sending the line and reading the output before send another line?
My current code to read the whole text file is
(defun parse-spell-sb (file-in)
(with-open-file (in file-in)
(let ((p (sb-ext:run-program "/opt/local/bin/hunspell" (list "-i" "UTF-8" "-s" "-d" "pt_BR")
:input in :output :stream :wait nil)))
(when p
(unwind-protect
(with-open-stream (o (process-output p))
(loop
:for line := (read-line o nil nil)
:while line
:collect line))
(process-close p))))))
Once more, this code give me the output of hunspell for the whole text file. I would like to have the output of hunspell for each input line separately.
Any idea?
I suppose you have a buffering problem with the program you want to run. For example:
(defun program-stream (program &optional args)
(let ((process (sb-ext:run-program program args
:input :stream
:output :stream
:wait nil
:search t)))
(when process
(make-two-way-stream (sb-ext:process-output process)
(sb-ext:process-input process)))))
Now, on my system, this will work with cat:
CL-USER> (defparameter *stream* (program-stream "cat"))
*STREAM*
CL-USER> (format *stream* "foo bar baz~%")
NIL
CL-USER> (finish-output *stream*) ; will hang without this
NIL
CL-USER> (read-line *stream*)
"foo bar baz"
NIL
CL-USER> (close *stream*)
T
Notice the finish-output – without this, the read will hang. (There's also force-output.)
Python in interactive mode will work, too:
CL-USER> (defparameter *stream* (program-stream "python" '("-i")))
*STREAM*
CL-USER> (loop while (read-char-no-hang *stream*)) ; skip startup message
NIL
CL-USER> (format *stream* "1+2~%")
NIL
CL-USER> (finish-output *stream*)
NIL
CL-USER> (read-line *stream*)
"3"
NIL
CL-USER> (close *stream*)
T
But if you try this without the -i option (or similar options like -u), you'll probably be out of luck, because of the buffering going on. For example, on my system, reading from tr will hang:
CL-USER> (defparameter *stream* (program-stream "tr" '("a-z" "A-Z")))
*STREAM*
CL-USER> (format *stream* "foo bar baz~%")
NIL
CL-USER> (finish-output *stream*)
NIL
CL-USER> (read-line *stream*) ; hangs
; Evaluation aborted on NIL.
CL-USER> (read-char-no-hang *stream*)
NIL
CL-USER> (close *stream*)
T
Since tr doesn't provide a switch to turn off buffering, we'll wrap the call with a pty wrapper (in this case unbuffer from expect):
CL-USER> (defparameter *stream* (program-stream "unbuffer"
'("-p" "tr" "a-z" "A-Z")))
*STREAM*
CL-USER> (format *stream* "foo bar baz~%")
NIL
CL-USER> (finish-output *stream*)
NIL
CL-USER> (read-line *stream*)
"FOO BAR BAZ
"
NIL
CL-USER> (close *stream*)
T
So, long story short: Try using finish-output on the stream before reading. If that doesn't work, check for command line options preventing buffering. If it still doesn't work, you could try wrapping the programm in some kind of pty-wrapper.

Resources