How to read utf-8 string by usocket - networking

When I was reading from a usocket stream using the code below:
(let ((stream (socket-stream sk)) line)
(loop for line = (read-line stream)
while line do (format t line)))
when read-line meets an non-ascii charactor, it throw out an exception:
decoding error on stream
#<SB-SYS:FD-STREAM
for "socket 118.229.141.195:52946, peer: 119.75.217.109..."
{BCA02F1}>
(:EXTERNAL-FORMAT :UTF-8):
the octet sequence (176) cannot be decoded.
[Condition of type SB-INT:STREAM-DECODING-ERROR]
Neither read-line nor read-byte works, so I tried to use trivial-utf-8 to read utf-8 string
using read-utf-8-string, but It only accepts a binary stream, it seems socket-stream does not create a binary stream, so I was confused how to read from a socket stream that has non-ascii charactors?

You can first read-sequence (if you know the length ahead of time) or read-bytes while there are some, and then convert them to string with (babel:octets-to-string octets :encoding :utf-8)) (where octets is (make-array expected-length :element-type '(unsigned-byte 8))).

The error you're getting indicates that the data you're trying to read is not actually valid UTF-8 data. Indeed, 176 (= #b10110000) is not a byte that can introduce a UTF-8 character. If the data you're trying to read is in some other encoding, try adjusting your Lisp compiler's external format setting accordingly or using Babel or FLEXI-STREAMS to decode the data.

Once I needed it and I was lazy to look for a library to do it, so I did it myself :) It may not be the best way, but I only needed something for a fast and not complicated, so here it goes:
(defun read-utf8-char (stream)
(loop for i from 7 downto 0
with first-byte = (read-byte stream nil 0)
do (when (= first-byte 0) (return +null+))
do (when (or (not (logbitp i first-byte)) (= i 0))
(setf first-byte (logand first-byte (- (ash 1 i) 1)))
(return
(code-char
(dotimes (a (- 6 i) first-byte)
(setf first-byte
(+ (ash first-byte 6)
(logand (read-byte stream) #x3F)))))))))

Related

Why can I modify a string defined using defparameter?

I tried this in SBCL 2.0.1:
(let ((s "Tom's house"))
(setf (subseq s 0 5) "Cat")
s)
I got a warning:
; in: LET ((S "Tom's house"))
; (SETF (SUBSEQ S 0 5) "Cat")
; --> LET*
; ==>
; (REPLACE #:SEQUENCE #:NEW1 :START1 0 :END1 5)
;
; caught WARNING:
; Destructive function REPLACE called on constant data: "Tom's house"
; See also:
; The ANSI Standard, Special Operator QUOTE
; The ANSI Standard, Section 3.2.2.3
;
; compilation unit finished
; caught 1 WARNING condition
But when I tried the code below, I get no warnings or errors. Why can I modify a string defined using defparameter (or defvar) but not a string defined using let?
(defparameter *s* "Tom's house")
(setf (subseq *s* 0 3) "Cat")
As mentioned in the comments: you are not allowed to modify literal objects, and specifically, from the definition of quote
The consequences are undefined if literal objects (including quoted objects) are destructively modified.
What that means is 'don't do this in conforming programs'. What it does not mean is 'the system is required to prevent you from doing this'.
In particular it should be clear that a system which did prevent you doing this would have either to allocate all literals in some special area of memory so that memory protection can deal with the problem, or have a whole series of secret paired mutable/immutable types for objects which can be literals (or perhaps a 'mutable' bit in the tags of objects). The latter is what, I think, languages like Racket do: they have mutable and immutable strings, for instance.
Requiring the implementation to check this is requiring strategies which may be very hard, and some of which may not even always be possible (for instance the special-memory-area trick assumes that the architecture supports read-only bits on memory pages, which is not something the language should assume). So the language spec just says 'the consequences are undefined'.
However, quite clearly, there are cases where a smart compiler can detect some obviously bogus code. One is this:
(let ((x "literal string"))
... do not assign to x ...
(setf (char x 0) ...)
...)
A smart compiler (particularly one that is doing fancy type inference) can easily see that the value of x you are mutating is a literal string and can warn you about that at compile time and/or raise an exception at run-time.
Compare that with your second example:
(defparameter *x* "a literal string")
...
(setf (char *x* 0) ...)
In order to deal with this, the compiler would have to prove that *x* actually was still a literal string at the point where you tried to mutate its value. Doing that requires some kind of whole-program analysis: it needs to know everything that happened in between the definition of *x* and the assignment. And while that is, perhaps, sometimes possible – for instance, the code is in a file being compiled you are using defparameter (because defvar won't work!) and there is nothing between the definition and the mutation – it certainly is not always possible.
So what you are seeing is that the SBCL compiler is successfully detecting some cases, but not all. And that's fine: that's better than not detecting any.
I have tried running the above code in the REPL of SBCL. I confirm the above observations.
The real underlying problem comes from replace, as the error message suggests.
; doesn't work
(let ((s "Tom's house"))
(setf (subseq s 0 5) "Cat")
s)
; does work
(defparameter *s* "Tom's house")
(setf (subseq *s* 0 3) "Cat")
; also works
(replace "Tom's house" "Cat")
So:
We can change literal strings without complaint or warning
The problem appears to come from let
Exploring some more:
(defun myfunc ()
(let ((s "Tom's house"))
s))
The disassembly is (disassemble 'myfunc):
; disassembly for MYFUNC
; Size: 30 bytes. Origin: #x1003CA1DD3 ;MYFUNC
; D3: 498B4510 MOV RAX, [R13+16]
; D7: 488945F8 MOV [RBP-8], RAX
; DB: 840425F8FF1020 TEST AL, [#x2010FFF8] ;
; E2: 488B15B7FFFFFF MOV RDX, [RIP-73] ; "Tom's house"
; E9: 488BE5 MOV RSP, RBP
; EC: F8 CLC
; ED: 5D POP RBP
; EE: C3 RET
; EF: CC10 INT3 16 ; Invalid argument count trap
NIL
The string literal appears to be held on the heap, and pointed to, rather than being on the stack.
I guess that let doesn't like its values being changed like that.

Pointer to a pointer to a foreign string in Common Lisp

Using SBCL, I am trying to call a GStreamer function with this signature:
void gst_init (int *argc, char **argv[]);
so I wrote this interface code (simplified) based on what I had seen here:
(cffi:defcfun gst-init :VOID
(argc :POINTER :INT)
(argv :POINTER :STRING))
(defun start-gstreamer ()
(cffi:with-foreign-object (argc :INT)
(setf (cffi:mem-ref argc :INT) 1)
(cffi:with-foreign-string (options "foo ")
(cffi:with-foreign-object (poptions :POINTER)
(setf (cffi:mem-ref poptions :POINTER) options)
(gst-init argc poptions)))))
But when I run it I get a "memory fault" referencing an address that turns out to be an ASCII string of " oof", the reverse of the original string. It seems I need yet another level of indirection. Or maybe the defcfun is wrong. How do I accomplish this?

How to write a 2D byte array to a binary file in Common Lisp?

I guess this is an easy question for someone with Common Lisp experience. Not so much for me, who just started out with LISP.
As you see in the next snippet below, I create a 800 by 600 array of type UNSIGNED BYTE.
(defun test-binary-save ()
(let*
((width 800)
(height 600)
(arr (make-array (list width height)
:element-type '(mod 256)
:initial-element 0)))
(utilities::save-array-as-pgm "test.pgm" arr)))
And the function in my utilities package is supposed to write that in P5 PGM format do disk.
(defun save-array-as-pgm (filename buffer)
"Writes a byte array as a PGM file (P5) to a file."
(with-open-file
(stream filename
:element-type '(unsigned-byte 8)
:direction :output
:if-does-not-exist :create
:if-exists :supersede)
(let*
((dimensions (array-dimensions buffer))
(width (first dimensions))
(height (second dimensions))
(header (format nil "P5~A~D ~D~A255~A"
#\newline
width height #\newline
#\newline)))
(loop
:for char :across header
:do (write-byte (char-code char) stream))
;(write-sequence buffer stream) <<-- DOES NOT WORK - is not of type SEQUENCE
))
filename)
The equivalent (and working) C-function which does the same thing looks like this.
static
int
save_pgm
( const char* filename
, size_t width
, size_t height
, const uint8_t* pixels
)
{
if(NULL == filename)
return 0;
if(NULL == pixels)
return 0;
FILE *out = fopen(filename, "wb");
if(NULL != out)
{
fprintf(out, "P5\n%zu %zu\n255\n", width, height);
size_t nbytes = width * height;
fwrite(pixels,1,nbytes,out);
fclose(out);
return 1;
}
return 0;
}
Who can tell me how to fix my save-array-as-pgm function, preferably with writing the array in one go, instead of using a loop and (write-byte (aref buffer y x) stream)?
Before I decided to ask this question here, I googled a lot and only found references to some packages which do fancy binary stuff - but this is a simple case and I look for a simple solution.
Common Lisp supports displaced arrays:
CL-USER 6 > (let ((array (make-array (list 3 4)
:initial-element 1
:element-type 'bit)))
(make-array (reduce #'* (array-dimensions array))
:element-type 'bit
:displaced-to array))
#*111111111111
A displaced array has no storage on its own, but uses the storage of another array. It can have different dimensions.
Now there is the question how efficiently the Lisp implementation can access the array through the displaced array.
If you want do do serious bit pushing in Common Lisp, use 1 dimensional arrays.
Looks like it was not that difficult after all... once I found that it is possible to "cast" a 2D-array to a 1D array and then simply use write-sequence.
To find the solution I had to inspect the sbcl source code on github to get a grasp on how make-array works and find the - sbcl - specific function array-storage-vector.
As I had guessed, multi-dimensional arrays use a 1d backing array for data storage.
The function save-array-as-pgm now looks like this:
(defun save-array-as-pgm (filename buffer)
"Writes a byte array as a PGM file (P5) to a file."
(with-open-file
(stream filename
:element-type '(unsigned-byte 8)
:direction :output
:if-does-not-exist :create
:if-exists :supersede)
(let*
((dimensions (array-dimensions buffer))
(width (first dimensions))
(height (second dimensions))
(header (format nil "P5~A~D ~D~A255~A"
#\newline
width height #\newline
#\newline)))
(loop
:for char :across header
:do (write-byte (char-code char) stream))
(write-sequence (sb-c::array-storage-vector buffer) stream)
))
filename)

Get Thread ID in SBCL

I am working with Lisp sb-thread package. When I use *current-thread* to get the thread id, the result of the last evaluated expression is also returned with the thread id. I only need the thread id for my program.
SBCL has setf-able thread names, not IDs:
(sb-thread:thread-name SB-THREAD:*CURRENT-THREAD*)
==> "main thread"
What do you need the ID for?
There are situations in life when you desperately need the thread's ID, for example:
https://bugs.launchpad.net/sbcl/+bug/1751562
"I had a situation where on SBCL on Linux, I had multiple worker
threads on my machine, and one of them was taking 100% of my CPU. I
wanted to retrieve the thread object of the offending thread, but this
turned out to be non-trivial."
(defun thread-real-id ()
(sb-alien:alien-funcall
(sb-alien:extern-alien "syscall"
;; sb-alien:unsigned is the return value's type and int is the parameter's type
(function sb-alien:unsigned int))
;; if on your system it returns 0xFFFFFFFF then try 186 instead of 224
;; or check the right gettid syscall value for your system.
224))
(ql:quickload :bordeaux-threads)
(let ((top-level *standard-output*))
(bt:make-thread (lambda()
(format top-level "my id is ~A~%" (thread-real-id)))))
; output:
my id is 657
In the man pages syscall function's prototype is long syscall(long number, ...) so the correct type for both the parameter and the ret value is LONG, but I'm not sure how to specify it using the sb-alien: types
I checked out the sbcl's src:
find . -name "*alien*" -exec echo {} \; -exec grep define-alien-type-tr {} \;
search results:
./host-alieneval.lisp
(defun %define-alien-type-translator (name translator)
(define-alien-type-translator system-area-pointer ()
(define-alien-type-translator signed (&optional (bits sb!vm:n-word-bits))
(define-alien-type-translator integer (&optional (bits sb!vm:n-word-bits))
(define-alien-type-translator unsigned (&optional (bits sb!vm:n-word-bits))
(define-alien-type-translator boolean (&optional (bits sb!vm:n-word-bits))
(define-alien-type-translator enum (&whole
(define-alien-type-translator single-float ()
(define-alien-type-translator double-float ()
(define-alien-type-translator * (to &environment env)
(define-alien-type-translator array (ele-type &rest dims &environment env)
(define-alien-type-translator struct (name &rest fields &environment env)
(define-alien-type-translator union (name &rest fields &environment env)
(define-alien-type-translator function (result-type &rest arg-types
(define-alien-type-translator values (&rest values &environment env)
# not sure which of them is the type for LONG
To see the threads you can use this command:
ps -To pid,tid -p `pidof sbcl`
; if you need the PID do (sb-posix:getpid) or call the "getpid" alien:
(sb-alien:alien-funcall
(sb-alien:extern-alien "getpid"
(function sb-alien:unsigned)) )
If you are on Windows you could use the following code ( taken from here: https://www.linux.org.ru/forum/development/11998951 )
#|
typedef struct pthread_thread {
pthread_fn start_routine;
void* arg;
HANDLE handle;
...
}
|#
(defun get-thread-handle (thread)
"Retrieves WIN32 thread HANDLE from SBCL thread"
(declare (type sb-thread:thread thread))
(let* ((pthread-pointer
(sb-sys:int-sap (sb-thread::thread-os-thread thread)))
(pthread-alien
(sb-alien:sap-alien
pthread-pointer (sb-alien:struct nil
(start-addr (* t))
(arg (* t))
(handle (* t))))))
(sb-alien:alien-sap (sb-alien:slot pthread-alien 'handle))))
(defun get-thread-id (thread)
"Retrieves WIN32 thread ID from SBCL thread"
(declare (type sb-thread:thread thread))
(sb-alien:alien-funcall
(sb-alien:extern-alien "GetThreadId" (function sb-alien:unsigned
(* t)))
(get-thread-handle thread)))
(get-thread-id sb-thread:*current-thread*) ; ==> 62
Update It turns out that the windows code above can also be reduced to a one liner:
(sb-alien:alien-funcall (sb-alien:extern-alien "GetCurrentThreadId" (function sb-alien:unsigned)))

Appropriate error handling in initializing a data structure with explicit resource management?

When initializing a data structure, or object, which has sub objects requiring explicit release process after used, how should I handle an error during the initialization process?
Let me take an example, initializing an OBJECT object with SUBOBJ1 and SUBOBJ2 slots to be set foreign pointers to int values:
(defun init-object ()
(let ((obj (make-object)))
(setf (subobj1 obj) (cffi:foreign-alloc :int)
(subobj2 obj) (cffi:foreign-alloc :int))
obj))
If we had an error in FOREIGN-ALLOCing for SUBOBJ2 slot, we should do FOREIGN-FREEing for SUBOBJ1 slot to avoid memory leaking.
As an idea, I can write as below:
(defun init-object ()
(let ((obj (make-object)))
(handler-case
(setf (subobj1 obj) (cffi:foreign-alloc :int)
(subobj2 obj) (cffi:foreign-alloc :int))
(condition (c) ; forcedly handling all conditions
(when (subobj1 obj) (cffi:foreign-free (subobj1 obj)))
(error c))) ; invoke the condition c again explicitly
obj))
Do you have any better idea, or generally idiomatic pattern?
Thanks
Following the answers, I add a code using UNWIND-PROTECT. It would not work because the deallocating forms run even if all allocation are completed successfully.
(defun init-object ()
(let ((obj (make-object)))
(unwind-protect
(progn
(setf (subobj1 obj) (cffi:foreign-alloc :int)
(subobj2 obj) (cffi:foreign-alloc :int))
obj)
; foreign pointers freed even when successfully initialized
(when (subobj2 obj) (cffi:foreign-free (subobj2 obj)))
(when (subobj1 obj) (cffi:foreign-free (subobj1 obj))))))
Use UNWIND-PROTECT. When the error causes an exit out of the scope, unwind-protect allows you to force the execution of clean-up forms.
Something like this:
(defun init-object ()
(let ((obj (make-object)))
(unwind-protect
(setf (subobj1 obj) (cffi:foreign-alloc :int)
(subobj2 obj) (cffi:foreign-alloc :int))
(unless (and (subobj2 obj) (subobj1 obj))
(when (subobj1 obj) (cffi:foreign-free (subobj1 obj)))
(when (subobj2 obj) (cffi:foreign-free (subobj2 obj)))))
obj))
Use whatever is available to detect if a slot is bound or not. Above assumes that a non-initialized slot has a value of NIL.
Common Lisp has the facilities that correspond to today's languages (e.g. Java, C#) exception and resource management statements, such as try with catch and/or finally.
The try-catch in Common Lisp is achieved with handler-case, as you have in your code. It's possible to simply resignal the same error back, but you won't then catch the error on the debugger where it actually happened. Java includes the stacktrace of the exception when it's created. C# includes the stacktrace of the exception when it's thrown. In any case, I think both have ways to throw a new exception with an inner exception, so you can get to the original stacktrace.
The try-finally in Common Lisp is achieved with unwind-protect. The first form is executed normally, and the rest are executed unconditionally whether the first form returns normally or not.
Common Lisp has a facility which allows running code at the point where an error is signaled, which is handler-bind. The main difference regarding handler-case is that it doesn't rewind the stack and it doesn't prevent the error from popping up to other handlers or the debugger, if no handler exited non-locally.
Thus, you'd use something like this:
(defun init-object ()
(let ((obj (make-object)))
(handler-bind
(;; forcedly handling all conditions
(condition #'(lambda (c)
(declare (ignore c))
(when (subobj1 obj) (cffi:foreign-free (subobj1 obj)))
;; return normally, allowing the condition to go up the handler chain
;; and possibly to the debugger, if none exits non-locally
)))
(setf (subobj1 obj) (cffi:foreign-alloc :int)
(subobj2 obj) (cffi:foreign-alloc :int)))
obj))
I advise you against matching with condition, as all conditions inherit from it, such as storage-condition. You may not want to do anything on conditions you can't or won't possibly recover from.
Just for reference, the full try-catch-finally clause in Common Lisp is achieved with unwind-protect around handler-case:
(unwind-protect
(handler-case
(do-something)
(error-type-1 ()
(foo))
(error-type-2 (e)
(bar e)))
(cleanup-form-1)
(cleanup-form-2))
There have been suggestions to use UNWIND-PROTECT. That is the idiomatic way to deal with resource allocation. However, if your goal here is to deallocate resources on error, but return those resources if everything succeeded, you can use the something like the following:
(defun init-object ()
(let ((obj (create-object)))
(handler-case
(progn
(setf (subobj1 obj) (cffi:foreign-alloc :int))
(setf (subobj2 obj) (cffi:foreign-alloc :int))
obj)
(error (condition)
(free-object obj)
;; Re-throw the error up in the call chain
(error condition)))))
(defun free-object (obj)
(when (subobj2 obj) (cffi:foreign-free (subobj2 obj)))
(when (subobj1 obj) (cffi:foreign-free (subobj1 obj))))
Another way of achieving the same thing is to have a check that verifies that the end of the function has been reached, and free the object if not. However, I don't really like that style since it doesn't really show very well exactly what is going on.
Note, however, that when you use the function INIT-OBJECT you need to enclose it in UNWIND-PROTECT. Otherwise you will be leaking resources once the object returned by the function is GC'ed.
The way to do this is is to always do the following when using the function:
(let ((obj (init-object)))
(unwind-protect
... use object here ...
(free-object obj)))
Another solution is to free the object when it's being GC'ed. There is no standard way of doing it, but the necessary functionality is abstracted in the TRIVIAL-GARBAGE:FINALIZE function.
I second Rainer's suggestion:
I wrap an unwind-protect form in a macro and check if the initialization succeeded in the protected clause.

Resources