I'm attempting to learn a little more about handling sockets and network connections in SBCL; so I wrote a simple wrapper for HTTP. Thus far, it merely makes a stream and performs a request to ultimately get the header data and page content of a website.
Until now, it has worked somewhat decently. Nothing to write home about, but it at least worked.
I have come across a strange problem, however; I keep getting "400 Bad Request" errors.
At first, I was somewhat leery about how I was processing the HTTP requests (more or less passing a request string as a function argument), then I made a function that formats a query string with all the parts I need and returns it for use later... but I still get errors.
What's even more odd is that the errors don't happen every time. If I try the script on a page like Google, I get a "200 OK" return value... but at other times on other sites, I'll get "400 Bad Request".
I'm certain it's a problem with my code, but I'll be damned if I know exactly what is causing it.
Here is the code that I am working with:
(use-package :sb-bsd-sockets)

(defun read-buf-nonblock (buffer stream)
  (let ((eof (gensym)))
    (do ((i 0 (1+ i))
         (c (read-char stream nil eof)
            (read-char-no-hang stream nil eof)))
        ((or (>= i (length buffer)) (not c) (eq c eof)) i)
      (setf (elt buffer i) c))))

(defun http-connect (host &optional (port 80))
  "Create I/O stream to given host on a specified port"
  (let ((socket (make-instance 'inet-socket
                               :type :stream
                               :protocol :tcp)))
    (socket-connect
     socket (car (host-ent-addresses (get-host-by-name host))) port)
    (let ((stream (socket-make-stream socket
                                      :input t
                                      :output t
                                      :buffering :none)))
      stream)))

(defun http-request (stream request &optional (buffer 1024))
  "Perform HTTP request on a specified stream"
  (format stream "~a~%~%" request )
  (let ((data (make-string buffer)))
    (setf data (subseq data 0
                       (read-buf-nonblock data
                                          stream)))
    (princ data)
    (> (length data) 0)))

(defun request (host request)
  "Formatted HTTP request"
  (format nil "~a HTTP/1.0 Host: ~a" request host))

(defun get-page (host &optional (request "GET /"))
  "simple demo to get content of a page"
  (let ((stream (http-connect host)))
    (http-request stream (request host request))))
A few things. First, to your concern about the 400 errors you are getting back, a few possibilities come to mind:
"Host:" isn't actually a defined header field in HTTP/1.0, and depending on how strict the web server you are contacting is about standards, it might reject this as a bad request based on the protocol you claim to be speaking.
You need a CRLF after your Request-line and after each of the header lines. As written, (request) puts the Host header on the same line as the Request-line, so the server receives a single malformed line.
It is possible that your (request) function is returning something for the Request-URI field -- you substitute in the value of request as the contents of this part of the Request-line -- that is bogus in one way or another (badly escaped characters, etc.). Seeing what it is outputting might help out some.
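To make the line-ending point concrete, here is a minimal sketch in Python (used only as a neutral illustration language; build_request is a hypothetical helper, not part of the original code). The Lisp (request) function above produces "GET / HTTP/1.0 Host: example.com" on one line, followed by two bare newlines. A well-formed request instead ends every line with CR LF and terminates the header section with an empty line:

```python
def build_request(host, path="/", method="GET"):
    """Return the raw bytes of a minimal HTTP/1.0 request (hypothetical helper)."""
    lines = [
        f"{method} {path} HTTP/1.0",  # Request-line
        f"Host: {host}",              # one header field per line
    ]
    # Every line ends with CR LF; an extra CR LF (empty line) ends the headers.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


print(build_request("example.com"))
# b'GET / HTTP/1.0\r\nHost: example.com\r\n\r\n'
```

The same layout can be produced in Lisp by writing the CR and LF characters explicitly instead of relying on ~%.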
Some other, more general pointers to help you along your way:
(read-buf-nonblock) is very confusing. Where is the symbol 'c' defined? Why is 'eof' (gensym)ed and then not assigned any value? It looks very much like a byte-by-byte copy taken straight out of an imperative program, and plopped into Lisp. It looks like what you have reimplemented here is (read-sequence). Go look here in the Common Lisp Hyperspec, and see if this is what you need. The other half of this is to set your socket you created to be non-blocking. This is pretty easy, even though the SBCL documentation is almost silent on the topic. Use this:
(socket-make-stream socket
                    :input t
                    :output t
                    :buffering :none
                    :timeout 0)
The last (let) form of (http-connect) isn't necessary. Just evaluate
(socket-make-stream socket
                    :input t
                    :output t
                    :buffering :none)
without the let, and http-connect should still return the right value.
In (http-request)...
Replace:
(format stream "~a~%~%" request )
(let ((data (make-string buffer)))
  (setf data (subseq data 0
                     (read-buf-nonblock data
                                        stream)))
  (princ data)
  (> (length data) 0)))
with
(format stream "~a~%~%" request )
(let ((data (read-buf-nonblock stream)))
  (princ data)
  (> (length data) 0)))
and make (read-buf-nonblock) return the string of data, rather than having it assign within the function. So where you have buffer being assigned, create a variable buffer within the function and then return it. What you are doing is called relying on "side-effects," and tends to produce more errors and harder-to-find errors. Use it only when you have to, especially in a language that makes it easy not to depend on them.
I mostly like the way get-page is defined. It feels very much in the functional programming paradigm. However, you should either change the name of the (request) function, or the variable request. Having both in there is confusing.
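The difference between the two styles of (read-buf-nonblock) discussed above can be sketched as follows. This is an illustration in Python rather than Lisp, and read_into and read_chunk are hypothetical names: the first fills a buffer owned by the caller and returns a count, the second allocates internally and returns the data itself.

```python
import io


def read_into(buffer, stream):
    """Side-effect style: fill the caller's buffer and return the byte count."""
    return stream.readinto(buffer)


def read_chunk(stream, limit=1024):
    """Value-returning style: return up to `limit` bytes as a new object."""
    return stream.read(limit)


response = b"HTTP/1.0 200 OK\r\n"

# Side-effect style: the caller allocates, then has to trim the buffer.
buffer = bytearray(1024)
count = read_into(buffer, io.BytesIO(response))
data_from_buffer = bytes(buffer[:count])

# Value-returning style: one call, nothing to trim afterwards.
data_returned = read_chunk(io.BytesIO(response))
```

Both calls end up with the same bytes; the second simply leaves the caller with less bookkeeping.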
Yikes, hands hurt. But hopefully this helps. Done typing. :-)
Here's a possibility:
HTTP/1.0 defines the sequence CR LF as the end-of-line marker.
The ~% format directive is generating a #\Newline (LF on most platforms, though see CLHS).
Some sites may be tolerant of the missing CR, others not so much.
I don't understand why this code behaves differently in different implementations:
(format t "asdf")
(setq var (read))
In CLISP it behaves as would be expected, with the prompt printed followed by the read, but in SBCL it reads, then outputs. I read a bit on the internet and changed it:
(format t "asdf")
(force-output t)
(setq var (read))
This, again, works fine in CLISP, but in SBCL it still reads, then outputs. I even tried separating it into another function:
(defun output (string)
  (format t string)
  (force-output t))
(output "asdf")
(setq var (read))
And it still reads, then outputs. Am I not using force-output correctly or is this just an idiosyncrasy of SBCL?
You need to use FINISH-OUTPUT.
In systems with buffered output streams, some output remains in the output buffer until the output buffer is full (then it will be automatically written to the destination) or the output buffer is explicitly emptied.
Common Lisp has three functions for that:
FINISH-OUTPUT, attempts to ensure that all output is done and THEN returns.
FORCE-OUTPUT, starts the remaining output, but IMMEDIATELY returns and does NOT wait for all output being done.
CLEAR-OUTPUT, tries to delete any pending output.
Also, the T argument unfortunately does not mean the same thing in FORCE-OUTPUT and FORMAT:
force-output / finish-output: T is *terminal-io* and NIL is *standard-output*
FORMAT: T is *standard-output*
This should work:
(format t "asdf")
(finish-output nil) ; note the NIL
(setq var (read))
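The effect of a buffered output stream is easy to observe in isolation. The following is only a sketch, in Python rather than Lisp, with an in-memory byte sink standing in for the terminal; flush() here plays roughly the role of FINISH-OUTPUT (Python has no separate counterpart to FORCE-OUTPUT):

```python
import io

sink = io.BytesIO()            # stands in for the terminal
out = io.BufferedWriter(sink)  # buffered output stream in front of it

out.write(b"asdf")
before_flush = sink.getvalue()  # the prompt is still sitting in the buffer

out.flush()
after_flush = sink.getvalue()   # now it has reached the destination

print(before_flush, after_flush)
```

Until the flush, a read from the terminal would start with the prompt still unseen, which matches the "reads, then outputs" behaviour described in the question.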
In other words, I want to rely on epoll (or similar) to write asynchronous network code that looks like regular code, that is, without relying on callbacks.
The code must look like synchronous code, but instead of blocking to wait for network I/O, it must suspend the current coroutine and resume it when the file descriptor is ready.
My initial idea was to rely on generators and yield, but that was a mistake, partly misguided by the way Python used to abuse yield from.
Anyway, Guile Fibers was a great inspiration, and I adapted its approach to Chez Scheme.
Here is an example server code:
(define (handler request port)
  (values 200 #f (http-get "https://httpbin.davecheney.com/ip")))

(untangle (lambda ()
            (run-server "127.0.0.1" 8888 handler)))
The handler returns its IP address according to the httpbin service. The code looks synchronous with the help of call/cc, or more precisely call/1cc.
untangle will initiate the event loop with a lambda passed as argument!
Here is the definition of run-server:
(define (run-server ip port handler)
  (log 'info "HTTP server running at ~a:~a" ip port)
  (let* ((sock (socket 'inet 'stream 'ipv4)))
    (socket:setsockopt sock 1 2 1) ;; re-use address
    (socket:bind sock (make-address ip port))
    (socket:listen sock 1024)
    (let loop ()
      (let ((client (accept sock)))
        (let ((port (fd->port client)))
          (spawn (lambda () (run-once handler port)))
          (loop))))))
As you can see, there is no callback. The only thing that differs from a simple synchronous web server is the spawn procedure, which handles the request in its own coroutine. In particular, accept is asynchronous.
run-once just passes the Scheme request to handler and takes its three values to build the response. Not very interesting. The part that looks synchronous but is actually asynchronous is http-get above.
I will only explain how accept works, because http-get requires introducing custom binary ports, but suffice it to say the behavior is the same...
(define (accept fd)
  (let ((out (socket:%accept fd 0 0)))
    (if (= out -1)
        (let ((code (socket:errno)))
          (if (= code EWOULDBLOCK)
              (begin
                (abort-to-prompt fd 'read)
                (accept fd))
              (error 'accept (socket:strerror code))))
        out)))
As you can see, it calls a procedure abort-to-prompt, which we could simply call pause, that will "stop" the coroutine and call the prompt handler.
abort-to-prompt works in cooperation with call-with-prompt.
Since Chez Scheme doesn't have prompts, I emulate them using two one-shot continuations created with call/1cc:
(define %prompt #f)
(define %abort (list 'abort))

(define (call-with-prompt thunk handler)
  (call-with-values (lambda ()
                      (call/1cc
                       (lambda (k)
                         (set! %prompt k)
                         (thunk))))
    (lambda out
      (cond
       ((and (pair? out) (eq? (car out) %abort))
        (apply handler (cdr out)))
       (else (apply values out))))))

(define (abort-to-prompt . args)
  (call/1cc
   (lambda (k)
     (let ((prompt %prompt))
       (set! %prompt #f)
       (apply prompt (cons %abort (cons k args)))))))
call-with-prompt captures a continuation and stores it, with set!, in a global called %prompt, which means there is a single prompt for THUNK. If the values OUT received by the second lambda of call-with-values start with the unique object %abort, the continuation was reached via abort-to-prompt. In that case HANDLER is called with the abort-to-prompt continuation followed by any arguments that were passed to abort-to-prompt; that is the (apply handler (cdr out)) call, where (cdr out) is the list (k . args).
abort-to-prompt captures a new continuation, so that the coroutine can be resumed later, and then invokes the prompt's continuation stored in %prompt.
call-with-prompt is at the heart of the event loop. Here it is, in two pieces:
(define (exec epoll thunk waiting)
  (call-with-prompt
   thunk
   (lambda (k fd mode) ;; k is the abort-to-prompt continuation that
                       ;; will allow restarting the coroutine
     ;; add fd to the correct epoll set
     (case mode
       ((write) (epoll-wait-write epoll fd))
       ((read) (epoll-wait-read epoll fd))
       (else (error 'untangle "mode not supported" mode)))
     (scheme:hash-table-set! waiting fd (make-event k mode)))))

(define (event-loop-run-once epoll waiting)
  ;; execute every callback waiting in queue,
  ;; call the above exec procedure
  (let loop ()
    (unless (null? %queue)
      ;; XXX: This is done like that because exec might spawn a
      ;; new coroutine, so we need to cut %queue right now.
      (let ((head (car %queue))
            (tail (cdr %queue)))
        (set! %queue tail)
        (exec epoll head waiting)
        (loop))))
  ;; wait for ONE event
  (let ((fd (epoll-wait-one epoll (inf))))
    (let ((event (scheme:hash-table-ref waiting fd)))
      ;; the event is / will be processed, no need to keep it around
      (scheme:hash-table-delete! waiting fd)
      (case (event-mode event)
        ((write) (epoll-ctl epoll 2 fd (make-epoll-event-out fd)))
        ((read) (epoll-ctl epoll 2 fd (make-epoll-event-in fd))))
      ;; here it will schedule the event continuation, that is, the
      ;; abort-to-prompt continuation, which will be executed by the
      ;; next call to event-loop-run-once
      (spawn (event-continuation event)))))
I think that is all.
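For readers who want to see the same control flow outside Scheme, here is a rough sketch in Python. It is not a port of the code above: Python has no call/1cc, so generators stand in for the one-shot continuations, the standard selectors module stands in for epoll, and the names receive, send and run are made up for the example. A yield plays the role of abort-to-prompt, and next() plays the role of exec resuming a parked continuation.

```python
import selectors
import socket
from collections import deque


def receive(sock, results):
    """Coroutine: read from a non-blocking socket, suspending while it is empty."""
    while True:
        try:
            data = sock.recv(1024)
        except BlockingIOError:
            # Nothing to read yet: suspend and ask to be woken on readability.
            yield (sock, selectors.EVENT_READ)
            continue
        results.append(data)
        return


def send(sock, payload):
    """Coroutine: wait until the socket is writable, then send the payload."""
    yield (sock, selectors.EVENT_WRITE)
    sock.sendall(payload)


def run(coroutines):
    """Run coroutines to completion, parking suspended ones in a selector."""
    selector = selectors.DefaultSelector()
    queue = deque(coroutines)
    parked = 0
    while queue or parked:
        # Run every ready coroutine until it suspends or finishes.
        while queue:
            coroutine = queue.popleft()
            try:
                sock, event = next(coroutine)
            except StopIteration:
                continue  # this coroutine is done
            selector.register(sock, event, data=coroutine)
            parked += 1
        # Block until at least one parked coroutine can make progress.
        if parked:
            for key, _events in selector.select():
                selector.unregister(key.fileobj)
                parked -= 1
                queue.append(key.data)
    selector.close()


left, right = socket.socketpair()
left.setblocking(False)
right.setblocking(False)
received = []
run([receive(left, received), send(right, b"ping")])
left.close()
right.close()
print(received)
```

One limitation of the generator version is worth noting: a generator can only suspend from its own frame, so every helper that may wait has to be a generator too. A continuation captured with call/1cc does not have that restriction, which is why accept above can call abort-to-prompt from an ordinary procedure.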
If you are using Chez Scheme, there is chez-a-sync. It uses POSIX poll rather than epoll (epoll is Linux-specific). guile-a-sync2 is also available for guile-2.2/3.0.
I was hoping to experiment with cl-async to run a series of external programs with a large number of combinations of command line arguments. However, I can't figure out how to read the stdout of the processes launched with as:spawn.
I would typically use uiop which makes it easy to capture the process output:
(let ((p (uiop:launch-program ... :output :stream)))
  (do-something-else-until-p-is-done)
  (format t "~a~%" (read-line (uiop:process-info-output p))))
I've tried both :output :pipe and :output :stream options to as:spawn and executing (as:process-output process-object) in my exit-callback shows the appropriate pipe or async-stream objects but I can't figure out how to read from them.
Can anyone with experience with this library tell how to accomplish this?
So you go to your repl and type:
CL-USER> (documentation 'as:spawn 'function)
And you read whatever comes out (or put your point on the symbol and hit C-c C-d f). If you read it, you’ll see that the format for the :input, etc. arguments is either :pipe, (:pipe args...), :stream, or (:stream args...) (or some other options), that :stream behaves similarly to :pipe but gives output of a different type, and that for details of args one should look at PIPE-CONNECT. So you go and look up the documentation for that. Well, it tells you what the options are, but it isn’t very useful. What’s the documentation/description of PIPE or STREAM? Well, it turns out that PIPE is a class and a subclass of STREAMISH. What about PROCESS? That’s a class too, and it has slots (and accessors) for things like PROCESS-OUTPUT. So what is a good plan for figuring out what to do next? Here’s a suggestion:
Spawn a long running process (like cat foo.txt -) with :output :stream :input :pipe say
Inspect the result (C-c C-v TAB)
Hopefully it’s an instance of PROCESS. What is its output? Inspect that
Hopefully the output is a Gray stream (ASYNC-STREAM). Get it into your repl and see what happens if you try to read from it
And what about the input? See what type that has and what you can do with it
The above is all speculation. I’ve not tried running any of this but you should. Alternatively go look at the source code for the library. It’s already on your computer and if you can’t find it it’s on GitHub. There are only about half a dozen source files and they’re all small. Just read them and see what you can learn. Or go to the symbol you want to know about and hit M-. to jump straight to its definition. Then read the code. Then see if you can figure out what to do.
I found the answer in the test suite. The output stream can only be processed asynchronously, via a read callback. The following is a simple example, for posterity:
(as:start-event-loop
 (lambda ()
   (let ((bytes (make-array 0 :element-type '(unsigned-byte 8))))
     (as:spawn "./test.sh" '()
               :exit-cb (lambda (proc exit-status term-signal)
                          (declare (ignore proc exit-status term-signal))
                          (format t "proc output:~%~a"
                                  (babel:octets-to-string bytes)))
               :output (list :stream
                             :read-cb (lambda (pipe stream)
                                        (declare (ignore pipe))
                                        (let ((buf (make-array 128 :element-type '(unsigned-byte 8))))
                                          (loop for n = (read-sequence buf stream)
                                                while (plusp n) do
                                                  (setf bytes
                                                        (concatenate '(vector (unsigned-byte 8))
                                                                     bytes
                                                                     (subseq buf 0 n)))))))))))
with
$ cat test.sh
#!/bin/bash
sleep_time=$((1+$RANDOM%10))
echo "Process $$ will sleep for $sleep_time"
sleep $sleep_time
echo "Process $$ exiting"
yields the expected output
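For comparison only, the same pattern (start a process, collect its output in small chunks as it becomes available, then look at the result once the process has exited) can be sketched with Python's asyncio. This says nothing about the cl-async API; collect_output is a hypothetical helper, and the child process is just the Python interpreter printing one line so the example does not depend on test.sh.

```python
import asyncio
import sys


async def collect_output(*argv):
    """Run a program and gather its stdout in 128-byte chunks."""
    process = await asyncio.create_subprocess_exec(
        *argv, stdout=asyncio.subprocess.PIPE
    )
    chunks = []
    while True:
        chunk = await process.stdout.read(128)  # suspends until data or EOF
        if not chunk:
            break
        chunks.append(chunk)
    exit_status = await process.wait()
    return exit_status, b"".join(chunks)


status, output = asyncio.run(
    collect_output(sys.executable, "-c", "print('Process exiting')")
)
print(status, output.decode().strip())
```

The loop corresponds to the :read-cb above, and the code after process.wait() corresponds to the :exit-cb.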
I'm currently designing a program in which part of the program files run on a Raspberry Pi and the other part runs on my computer.
To communicate between them I send messages over TCP/IP.
So to read incoming messages, I use (read port). Then I do some calculations and send the answer back.
Now I noticed that when the answer is a number, I don't receive it on the other side (I'm not sure whether that is because it's a number, but I assume so), although it has been sent. Afterwards it causes incorrect reads (I suppose because it's still in the buffer).
So this is how I send messages :
#lang racket
; Not important
(require (rename-in racket/tcp
                    (tcp-connect racket-tcp-connect)
                    (tcp-listen racket-tcp-listen)))
(define (tcp-connect adress port)
  (let-values ([(port-in port-out) (racket-tcp-connect adress port)])
    (cons port-in port-out)))
;;;;
(define ports (tcp-connect "localhost" 6667))
(define in (car ports))
(define out (cdr ports))

(define (send destination message expectAnswer? . arguments)
  (write-byte destination out) ; Send the destination (is a number < 256) (so that I know on the other side which object I have to send the message to).
  (newline out) ; I noticed that if I don't do this, sometimes the message won't be sent.
  (write message out)
  (newline out)
  (write arguments out)
  (newline out)
  (write expectAnswer? out)
  (newline out)
  (flush-output out)
  (display "destination : ") (display destination) (newline)
  (display "Message : ") (display message) (newline)
  (display "Arguments : ") (display arguments) (newline)
  (display "Expects an answer? ") (display expectAnswer?) (newline)
  (when expectAnswer?
    (let ((answer (read in)))
      (if (eof-object? answer)
          'CC ; CC = Connection Closed
          (begin (display "Answer : ")(display answer)(newline)(newline)
                 answer)))))
And this is how I read incoming messages (on the Raspberry Pi) and send an answer back :
#lang racket
; Not important
(require (rename-in racket/tcp
                    (tcp-listen racket-tcp-listen)
                    (tcp-accept racket-tcp-accept)))
(define (tcp-accept port)
  (let-values ([(port-in port-out) (racket-tcp-accept (racket-tcp-listen port))])
    (cons port-in port-out)))
;;;;
(define ports (tcp-accept 6667))
(define in (car ports))
(define out (cdr ports))

(define (executeMessage destination message argumentList expectAnswer?)
  (let ((destinationObject (decode destination)) ; This is the object that corresponds to the number we received
        (answer '()))
    (if (null? argumentList)
        (set! answer (destinationObject message))
        (set! answer (apply (destinationObject message) argumentList)))
    (display "Destination : ")(display destination)(newline)
    (display "Message : ")(display message)(newline)
    (display "Arguments : ")(display argumentList)(newline)
    (display "Expects answer? ")(display expectAnswer?) (newline)
    (display "Answer : ")(display answer)(newline)(newline)
    ; We send the answer back if it is needed.
    (when expectAnswer?
      (write answer out)
      (newline out) ; Because I noticed that if I don't do this, it won't be sent.
      (flush-output out))))

; We call this function to skip the newlines that are sent by "(newline out)"
(define (skipNewline)
  (read-byte in))

(define (listenForMessages)
  (when (char-ready? in) ; Could be omitted.
    ; A message was sent
    (let ((destination (read-byte in))
          (message (begin (skipNewline) (read in)))
          (argumentList (begin (skipNewline) (read in)))
          (expectAnswer? (begin (skipNewline) (read in))))
      (skipNewline)
      (executeMessage destination message argumentList expectAnswer?)))
  (listenForMessages))

(listenForMessages)
When running the program I see a bunch of messages being sent and answered correctly.
But then I see a message which expects an answer and doesn't get one.
This is what is displayed on the raspberry pi :
Destination : 2
Message : getStationHoogte
Arguments : '()
Expects answer? #t
Answer : 15
So the message was executed and the result was 15 (I checked it and that's the result it was supposed to produce, so I'm happy so far).
Notice that the display of Answer : ... happens just before sending the answer.
But on my computer I read this :
Destination : 2
Message : getStationHoogte
Arguments : ()
Expects answer? #t
Answer :
What I find really really strange is that the answer is nothing?
How is that possible? I use "read" for reading incoming answers, that's a blocking operation. How can it be that it detects an answer (I would suppose 15 in this example) (because it stops blocking) and yet produce "nothing".
What could be the reason for this behaviour? What could be the reason a message (in this case a number) isn't sent?
Although I can't tell from what you posted what the exact problem is, I have a couple suggestions:
You can use define-values with the result of tcp-connect, like so:
(define-values (in out) (tcp-connect "localhost" 6667))
It might be simpler and more reliable for each message to be a single write and read. To do so, simply put all the values inside a list (or maybe a #:prefab struct). You can use match to easily extract the elements again. For example something like this (which I haven't run/tested):
(define (send destination message expect-answer? . arguments)
  (write (list destination message expect-answer? arguments)
         out)
  (newline out) ;do you actually need this?
  (flush-output out) ;you definitely do want this!
  (when expect-answer?
    (match (read in)
      [(? eof-object?) 'CC] ; CC = Connection Closed
      [answer (printf "Answer : ~a\n" answer)])))

(define (listen-for-messages)
  (match (read in)
    [(? eof-object?) 'CC]
    [(list destination message expect-answer? arguments)
     (execute-message destination message arguments expect-answer?)
     (listen-for-messages)]))
Update about newlines:
Now that you're writing and reading s-expressions (lists), newlines aren't needed to separate messages -- parentheses now serve that role instead.
What does matter is buffering -- ergo flush-output. And be sure to use it in whatever code runs when expect-answer? is #t, too.
By the way, you can change the buffering mode for some kinds of ports (including TCP ports) with file-stream-buffer-mode. Probably it was 'block by default and that's why you needed newlines, before. It might have worked if you'd changed the mode to 'line, instead. But now that you're using s-expressions I don't think it should matter. You should just use flush-output after each message (or answer) is sent.
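The "one write, one read, then flush" idea from this answer is not specific to Racket. As a rough sketch in Python, with JSON lines standing in for s-expressions and a local socket pair standing in for the TCP connection (send_message, recv_message and CONNECTION_CLOSED are hypothetical names):

```python
import json
import socket

CONNECTION_CLOSED = object()  # sentinel, plays the role of 'CC


def send_message(wfile, message):
    """Write one whole message as a single JSON line, then flush the buffer."""
    wfile.write(json.dumps(message) + "\n")
    wfile.flush()  # without this the bytes may stay in the local buffer


def recv_message(rfile):
    """Read exactly one message, or report that the peer closed the connection."""
    line = rfile.readline()
    if not line:
        return CONNECTION_CLOSED
    return json.loads(line)


computer, raspberry = socket.socketpair()
computer_out = computer.makefile("w", encoding="utf-8")
computer_in = computer.makefile("r", encoding="utf-8")
raspberry_out = raspberry.makefile("w", encoding="utf-8")
raspberry_in = raspberry.makefile("r", encoding="utf-8")

# Request: destination, message, expect-answer?, arguments, all in one write.
send_message(computer_out, [2, "getStationHoogte", True, []])
request = recv_message(raspberry_in)

# Answer: a bare number travels the same way as any other message.
send_message(raspberry_out, 15)
answer = recv_message(computer_in)
print(request, answer)
```

Because each message is framed as a unit, the reader never has to guess how many separator bytes to skip between fields, which is where the original protocol can get out of step.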
Replace (display answer) with (write answer) to see what is printed.
I'm trying to build a lazy seq which pulls its data from aws S3 as needed, (via the amazonica library). I've got the following code which almost does what I want, but makes one more network call than needed. (If there is more data available, it always realizes one more recursive call)
Edit: Thanks Alex, for pointing out that my println was in a place where it would be called even if the network call wasn't realized. This code performs as desired now. So that just leaves the question: is there a better way to do it?
(defn chunked-list-objects-seq
  "Returns a listing of objects in a bucket, with given prefix. These
  are lazily chunked, to avoid unneeded network calls.
  opts are :bucket-name :prefix :next-marker"
  [cred opts]
  (lazy-seq
   (let [response (s3/list-objects cred opts)
         chunk-size (count (:object-summaries response))]
     (println "pulling from network")
     (chunk-cons
      (let [buffer (chunk-buffer chunk-size)]
        (dotimes [i chunk-size]
          (chunk-append buffer (nth (:object-summaries response) i)))
        (chunk buffer))
      (if (:truncated? response)
        (chunked-list-objects-seq cred (assoc opts :next-marker (:next-marker response)))
        nil)))))
Above code was adapted from "Clojure High Performance Programming" pg. 28 (custom chunking)
Calling it looks like this:
user> (time (pprint (count (take 990 (chunked-list-objects-seq cred {:bucket-name "bucket-name" :prefix "path-prefix/"})))))
=> pulling from network
990
"Elapsed time: 2009.723 msecs"
(AWS seems to like returning 1k chunks, when there are more than 1k items in a bucket)
There are certainly other ways to do this, (an atom & future implementation comes to mind), but this seems to fit the interface of a seq the best.
So basically, can this code be fixed to not make unnecessary network calls, and is this a good way to do this?
I think that making a lazy sequence with chunking that fetches blocks of data over the network is a perfectly reasonable approach - with the only caveat being that extra care is needed if the S3 client code happens to rely on having any dynamic bindings set.
Your initial code had the println outside the call to set up the lazy-seq to fetch the next block of data, so you were seeing the message printed regardless of whether the next block was actually fetched. Putting the println closer to the call to list-objects will give you a better idea of when the network request is made.
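The laziness property being asked for (the next page is requested only when a consumer actually walks past the end of the current one) can be checked with a small stand-in. The sketch below is Python, not Clojure; fake_fetch_page is a made-up substitute for the S3 listing call that serves 2500 items in pages of 1000 and records every call, and paged_items is a hypothetical helper.

```python
from itertools import islice


def paged_items(fetch_page, marker=None):
    """Yield items page by page, fetching the next page only when needed."""
    while True:
        page = fetch_page(marker)
        yield from page["items"]
        if not page["truncated"]:
            return
        marker = page["next_marker"]


calls = []  # records the marker of every simulated network call


def fake_fetch_page(marker):
    """Stand-in for the real listing call: 2500 items, 1000 per page."""
    start = marker or 0
    calls.append(start)
    end = min(start + 1000, 2500)
    return {"items": list(range(start, end)),
            "truncated": end < 2500,
            "next_marker": end}


first_990 = list(islice(paged_items(fake_fetch_page), 990))
print(len(first_990), calls)
```

Taking 990 items touches only the first page; a second call is recorded only once a consumer asks for item 1001. Counting calls this way is also a more reliable check than a println, since the counter sits inside the fetch itself.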