Why can I modify a string defined using defparameter? - common-lisp

I tried this in SBCL 2.0.1:
(let ((s "Tom's house"))
  (setf (subseq s 0 5) "Cat")
  s)
I got a warning:
; in: LET ((S "Tom's house"))
; (SETF (SUBSEQ S 0 5) "Cat")
; --> LET*
; ==>
; (REPLACE #:SEQUENCE #:NEW1 :START1 0 :END1 5)
;
; caught WARNING:
; Destructive function REPLACE called on constant data: "Tom's house"
; See also:
; The ANSI Standard, Special Operator QUOTE
; The ANSI Standard, Section 3.2.2.3
;
; compilation unit finished
; caught 1 WARNING condition
But when I tried the code below, I get no warnings or errors. Why can I modify a string defined using defparameter (or defvar) but not a string defined using let?
(defparameter *s* "Tom's house")
(setf (subseq *s* 0 3) "Cat")

As mentioned in the comments, you are not allowed to modify literal objects. Specifically, from the definition of quote:
The consequences are undefined if literal objects (including quoted objects) are destructively modified.
What that means is 'don't do this in conforming programs'. What it does not mean is 'the system is required to prevent you from doing this'.
In particular it should be clear that a system which did prevent you doing this would have either to allocate all literals in some special area of memory so that memory protection can deal with the problem, or have a whole series of secret paired mutable/immutable types for objects which can be literals (or perhaps a 'mutable' bit in the tags of objects). The latter is what, I think, languages like Racket do: they have mutable and immutable strings, for instance.
Requiring the implementation to check this is requiring strategies which may be very hard, and some of which may not even always be possible (for instance the special-memory-area trick assumes that the architecture supports read-only bits on memory pages, which is not something the language should assume). So the language spec just says 'the consequences are undefined'.
However, quite clearly, there are cases where a smart compiler can detect some obviously bogus code. One is this:
(let ((x "literal string"))
  ... do not assign to x ...
  (setf (char x 0) ...)
  ...)
A smart compiler (particularly one that is doing fancy type inference) can easily see that the value of x you are mutating is a literal string and can warn you about that at compile time and/or raise an exception at run-time.
Compare that with your second example:
(defparameter *x* "a literal string")
...
(setf (char *x* 0) ...)
In order to deal with this, the compiler would have to prove that *x* actually was still a literal string at the point where you tried to mutate its value. Doing that requires some kind of whole-program analysis: it needs to know everything that happened between the definition of *x* and the mutation. And while that is, perhaps, sometimes possible (for instance, when the code is in a file being compiled, you are using defparameter (because defvar won't work!), and there is nothing between the definition and the mutation), it certainly is not always possible.
So what you are seeing is that the SBCL compiler is successfully detecting some cases, but not all. And that's fine: that's better than not detecting any.

I have tried running the above code in the REPL of SBCL. I confirm the above observations.
The real underlying problem comes from replace, as the warning message suggests.
; doesn't work
(let ((s "Tom's house"))
  (setf (subseq s 0 5) "Cat")
  s)
; does work
(defparameter *s* "Tom's house")
(setf (subseq *s* 0 3) "Cat")
; also works
(replace "Tom's house" "Cat")
So:
We can change literal strings without complaint or warning
The problem appears to come from let
Exploring some more:
(defun myfunc ()
  (let ((s "Tom's house"))
    s))
The disassembly is (disassemble 'myfunc):
; disassembly for MYFUNC
; Size: 30 bytes. Origin: #x1003CA1DD3 ;MYFUNC
; D3: 498B4510 MOV RAX, [R13+16]
; D7: 488945F8 MOV [RBP-8], RAX
; DB: 840425F8FF1020 TEST AL, [#x2010FFF8] ;
; E2: 488B15B7FFFFFF MOV RDX, [RIP-73] ; "Tom's house"
; E9: 488BE5 MOV RSP, RBP
; EC: F8 CLC
; ED: 5D POP RBP
; EE: C3 RET
; EF: CC10 INT3 16 ; Invalid argument count trap
NIL
The string literal appears to be held on the heap, and pointed to, rather than being on the stack.
I guess that let doesn't like its values being changed like that.

Related

Return a pointer at a specific position - Assembly

I am a beginner in Assembly and I have a simple question.
This is my code:
BITS 64                 ; 64-bit mode
global strchr           ; Export 'strchr'
SECTION .text           ; Code section
strchr:
    mov rcx, -1
.loop:
    inc rcx
    cmp byte [rdi+rcx], 0
    je exit_null
    cmp byte [rdi+rcx], sil
    jne .loop
    mov rax, [rdi+rcx]
    ret
exit_null:
    mov rax, 0
    ret
This compiles but doesn't work. As you can see, I want to reproduce the function strchr. When I test my function with printf it crashes (the problem isn't the test).
I know I can INC rdi directly to walk through the string and return it once it points at the position I want.
But I just want to know if there is a way to return rdi offset by rcx, to fix my code and probably improve it.
Your function strchr seems to expect two parameters:
pointer to a string in RDI, and
pointer to a character in RSI.
Register rcx is used as index inside the string? In this case you should use al instead of cl. Be aware that you don't limit the search size. When the character referred to by RSI is not found in the string, it will probably trigger an exception. Perhaps you should test al loaded from [rdi+rcx] and quit further searching when al=0.
If you want it to return a pointer to the first occurrence of the character inside the string, just replace mov rax,[rdi+rcx] with lea rax,[rdi+rcx].
Your code (from edit Version 2) does the following:
char* strchr ( char *p, char x ) {
    int i = -1;
    do {
        if ( p[i] == '\0' ) return NULL;   // first iteration reads p[-1]
        i++;
    } while ( p[i] != x );
    return (char*) * (long long*) &(p[i]); // loads 8 bytes of string data, not the address
}
As @vitsoft says, your intention is to return a pointer, but the first return (in the assembly) returns a single quadword loaded from the address of the found character: 8 bytes of string data instead of an address.
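The difference between the two instructions can be sketched in C. This is only an illustration: the names address_at and qword_at are made up for this example, and memcpy stands in for the 8-byte load so that the unaligned read is well-defined C.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Analogue of `lea rax, [rdi+rcx]`: compute the address, touch no memory. */
const char *address_at(const char *p, size_t i) {
    assert(p != NULL);
    return p + i;
}

/* Analogue of `mov rax, [rdi+rcx]`: read 8 bytes starting at that address.
   The caller must guarantee that 8 bytes are readable at p + i. */
uint64_t qword_at(const char *p, size_t i) {
    uint64_t v;
    assert(p != NULL);
    memcpy(&v, p + i, sizeof v);
    return v;
}
```

A caller that expects a char* and receives the result of qword_at will treat string bytes as an address, which is the most likely reason the printf test crashed.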
It is unusual to increment in the middle of the loop.  It is also odd to start the index at -1.  On the first iteration, the loop continue condition looks at p[-1], which is not a good idea, since that's not part of the string you're being asked to search.  If that byte happens to be the nul character, it'll stop the search right there.
If you waited to increment until both tests are performed, then you would not be referencing p[-1], and you could also start the index at 0, which would be more usual.
You might consider capturing the character into a register instead of using a complex addressing mode three times.
Further, you could advance the pointer in rdi and forgo the index variable altogether.
Here's that in C:
#include <stddef.h>  /* NULL */

char* strchr ( char *p, char x ) {
    for(;;) {
        char c = *p;
        if ( c == '\0' )
            break;
        if ( c == x )
            return p;
        p++;
    }
    return NULL;
}
Thanks to your help, I finally did it!
Thanks to Erik's answer, I fixed a stupid mistake: I was comparing str[-1] to the NUL terminator, which caused an error.
And following vitsoft's answer I switched mov to lea, and it worked!
Here is my code:
strchr:
    mov rcx, -1
.loop:
    inc rcx
    cmp byte [rdi+rcx], 0
    je exit_null
    cmp byte [rdi+rcx], sil
    jne .loop
    lea rax, [rdi+rcx]
    ret
exit_null:
    mov rax, 0
    ret
The only bug remaining in the current version is loading 8 bytes of char data as the return value instead of just doing pointer math, using mov instead of lea. (After various edits removed and added different bugs, as reflected in different answers talking about different code).
But this is over-complicated as well as inefficient (two loads, and indexed addressing modes, and of course extra instructions to set up RCX).
Just increment the pointer since that's what you want to return anyway.
If you're going to loop 1 byte at a time instead of using SSE2 to check 16 bytes at once, strchr can be as simple as:
;; BITS 64 is useless unless you're writing a kernel with a mix of 32 and 64-bit code
;; otherwise it only lets you shoot yourself in the foot by putting 64-bit machine code in a 32-bit object file by accident.
global mystrchr
mystrchr:
.loop:                        ; do {
    movzx  ecx, byte [rdi]    ;   c = *p;
    cmp    cl, sil            ;   if (c == needle) return p;
    je     .found
    inc    rdi                ;   p++
    test   cl, cl
    jnz    .loop              ; } while (c != 0)
    ;; fell out of the loop on hitting the 0 terminator without finding a match
    xor    edi, edi           ; p = NULL
    ; optionally an extra ret here, or just fall through
.found:
    mov    rax, rdi           ; return p
    ret
I checked for a match before end-of-string so I'd still have the un-incremented pointer, and not have to decrement it in the "found" return path. If I started the loop with inc, I could use an [rdi - 1] addressing mode, still avoiding a separate counter. That's why I switched up the order of which branch was at the bottom of the loop vs. your code in the question.
Since we want to compare the character twice, against SIL and against zero, I loaded it into a register. This might not run any faster on modern x86-64 which can run 2 loads per clock as well as 2 branches (as long as at most one of them is taken).
Some Intel CPUs can micro-fuse and macro-fuse cmp reg,mem / jcc into a single load+compare-and-branch uop for the front-end, at least when the memory addressing mode is simple, not indexed. But not cmp [mem], imm/jcc, so we're not costing any extra uops for the front-end on Intel CPUs by separately loading into a register. (With movzx to avoid a false dependency from writing a partial register like mov cl, [rdi])
Note that if your caller is also written in assembly, it's easy to return multiple values, e.g. a status and a pointer (in the not-found case, perhaps to the terminating 0 would be useful). Many C standard library string functions are badly designed, notably strcpy, to not help the caller avoid redoing length-finding work.
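For comparison, here is a rough C sketch of what returning both a status and a pointer could look like. The struct find_result and the function find_char are hypothetical names invented for this example, not part of any standard library; in hand-written asm the two values would simply come back in two registers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical result type for illustration only. */
struct find_result {
    bool found;        /* true if the needle was seen at or before the terminator */
    const char *pos;   /* match position, or the terminating 0 if not found */
};

struct find_result find_char(const char *p, char needle) {
    assert(p != NULL);
    for (;; p++) {
        char c = *p;
        if (c == needle)               /* checked first, so needle == 0 matches the terminator */
            return (struct find_result){ true, p };
        if (c == '\0')
            return (struct find_result){ false, p };
    }
}
```

In the not-found case the caller gets the end of the string for free, so it can compute the length as pos minus the start pointer without a second pass.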
Especially on modern CPUs with SIMD, explicit lengths are quite useful to have: a real-world strchr implementation would check alignment, or check that the given pointer isn't within 16 bytes of the end of a page. But memchr doesn't have to, if the size is >= 16: it could just do a movdqu load and pcmpeqb.
See Is it safe to read past the end of a buffer within the same page on x86 and x64? for details and a link to glibc strlen's hand-written asm. Also Find the first instance of a character using simd for real-world implementations like glibc's using pcmpeqb / pmovmskb. (And maybe pminub for the 0-terminator check to unroll over multiple vectors.)
SSE2 can go about 16x faster than the code in this answer for non-tiny strings. For very large strings, you might hit a memory bottleneck and "only" be about 8x faster.

Pointer to a pointer to a foreign string in Common Lisp

Using SBCL, I am trying to call a GStreamer function with this signature:
void gst_init (int *argc, char **argv[]);
so I wrote this interface code (simplified) based on what I had seen here:
(cffi:defcfun gst-init :VOID
  (argc :POINTER :INT)
  (argv :POINTER :STRING))
(defun start-gstreamer ()
  (cffi:with-foreign-object (argc :INT)
    (setf (cffi:mem-ref argc :INT) 1)
    (cffi:with-foreign-string (options "foo ")
      (cffi:with-foreign-object (poptions :POINTER)
        (setf (cffi:mem-ref poptions :POINTER) options)
        (gst-init argc poptions)))))
But when I run it I get a "memory fault" referencing an address that turns out to be an ASCII string of " oof", the reverse of the original string. It seems I need yet another level of indirection. Or maybe the defcfun is wrong. How do I accomplish this?
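To make the pointer levels concrete, here is a small C sketch of how a C caller would lay out memory for a function with the same parameter types. It does not call GStreamer: first_arg is a hypothetical stand-in with gst_init's parameter list, and demo_call is likewise invented for this example. The Lisp code above appears to pass a pointer to a pointer to characters (char **), which is one level short of char ***; that would be consistent with the callee treating the string's bytes as an address.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in with the same parameter types as gst_init.
   It only reads the first argument so the pointer levels can be checked. */
const char *first_arg(int *argc, char ***argv) {
    assert(argc != NULL && argv != NULL);
    if (*argc < 1)
        return NULL;
    return (*argv)[0];                     /* char*** -> char** -> char* */
}

/* How a C caller would build the arguments. */
const char *demo_call(void) {
    static char arg0[] = "foo";            /* the string itself: char[]            */
    static char *args[] = { arg0, NULL };  /* array of string pointers: char*[]    */
    char **argv = args;                    /* pointer to the first element: char** */
    int argc = 1;
    return first_arg(&argc, &argv);        /* address of that pointer: char***     */
}
```

Translated back to CFFI terms, this suggests allocating a foreign array of pointers holding the string, then a second foreign pointer holding the address of that array, and passing the second one.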

How to write a 2D byte array to a binary file in Common Lisp?

I guess this is an easy question for someone with Common Lisp experience. Not so much for me, who just started out with LISP.
As you see in the snippet below, I create an 800 by 600 array of type UNSIGNED BYTE.
(defun test-binary-save ()
  (let*
      ((width 800)
       (height 600)
       (arr (make-array (list width height)
                        :element-type '(mod 256)
                        :initial-element 0)))
    (utilities::save-array-as-pgm "test.pgm" arr)))
And the function in my utilities package is supposed to write that in P5 PGM format to disk.
(defun save-array-as-pgm (filename buffer)
  "Writes a byte array as a PGM file (P5) to a file."
  (with-open-file
      (stream filename
              :element-type '(unsigned-byte 8)
              :direction :output
              :if-does-not-exist :create
              :if-exists :supersede)
    (let*
        ((dimensions (array-dimensions buffer))
         (width (first dimensions))
         (height (second dimensions))
         (header (format nil "P5~A~D ~D~A255~A"
                         #\newline
                         width height #\newline
                         #\newline)))
      (loop
        :for char :across header
        :do (write-byte (char-code char) stream))
      ;(write-sequence buffer stream) <<-- DOES NOT WORK - is not of type SEQUENCE
      ))
  filename)
The equivalent (and working) C-function which does the same thing looks like this.
#include <stdint.h>  /* uint8_t */
#include <stdio.h>   /* FILE, fopen, fprintf, fwrite, fclose */

static
int
save_pgm
  ( const char* filename
  , size_t width
  , size_t height
  , const uint8_t* pixels
  )
{
    if(NULL == filename)
        return 0;
    if(NULL == pixels)
        return 0;
    FILE *out = fopen(filename, "wb");
    if(NULL != out)
    {
        fprintf(out, "P5\n%zu %zu\n255\n", width, height);
        size_t nbytes = width * height;
        fwrite(pixels,1,nbytes,out);
        fclose(out);
        return 1;
    }
    return 0;
}
Who can tell me how to fix my save-array-as-pgm function, preferably with writing the array in one go, instead of using a loop and (write-byte (aref buffer y x) stream)?
Before I decided to ask this question here, I googled a lot and only found references to some packages which do fancy binary stuff - but this is a simple case and I look for a simple solution.
Common Lisp supports displaced arrays:
CL-USER 6 > (let ((array (make-array (list 3 4)
                                     :initial-element 1
                                     :element-type 'bit)))
              (make-array (reduce #'* (array-dimensions array))
                          :element-type 'bit
                          :displaced-to array))
#*111111111111
A displaced array has no storage on its own, but uses the storage of another array. It can have different dimensions.
Now there is the question how efficiently the Lisp implementation can access the array through the displaced array.
If you want to do serious bit pushing in Common Lisp, use 1 dimensional arrays.
Looks like it was not that difficult after all... once I found that it is possible to "cast" a 2D-array to a 1D array and then simply use write-sequence.
To find the solution I had to inspect the SBCL source code on GitHub to get a grasp of how make-array works and find the SBCL-specific function array-storage-vector.
As I had guessed, multi-dimensional arrays use a 1d backing array for data storage.
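The mapping from two indices to a position in that flat storage is row-major order, which Common Lisp specifies for its arrays (see array-row-major-index). A minimal C sketch of the same arithmetic follows; the function names row_major_index and fill_gradient are invented for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Offset of element (i, j) in the flat storage of an array whose second
   dimension is dim1, stored row-major (the last index varies fastest). */
size_t row_major_index(size_t dim1, size_t i, size_t j) {
    assert(j < dim1);
    return i * dim1 + j;
}

/* Fill a flat buffer through 2D indices. Afterwards the whole buffer can be
   written with a single fwrite, as save_pgm does. */
void fill_gradient(uint8_t *flat, size_t dim0, size_t dim1) {
    for (size_t i = 0; i < dim0; i++)
        for (size_t j = 0; j < dim1; j++)
            flat[row_major_index(dim1, i, j)] = (uint8_t)((i + j) & 0xFF);
}
```

One consequence worth checking in the Lisp code: with (make-array (list width height)), the first index is the slow one, so the flat storage holds width runs of height bytes, whereas a PGM file expects height rows of width pixels.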
The function save-array-as-pgm now looks like this:
(defun save-array-as-pgm (filename buffer)
  "Writes a byte array as a PGM file (P5) to a file."
  (with-open-file
      (stream filename
              :element-type '(unsigned-byte 8)
              :direction :output
              :if-does-not-exist :create
              :if-exists :supersede)
    (let*
        ((dimensions (array-dimensions buffer))
         (width (first dimensions))
         (height (second dimensions))
         (header (format nil "P5~A~D ~D~A255~A"
                         #\newline
                         width height #\newline
                         #\newline)))
      (loop
        :for char :across header
        :do (write-byte (char-code char) stream))
      ;; array-storage-vector is exported from SB-EXT (SBCL only)
      (write-sequence (sb-ext:array-storage-vector buffer) stream)
      ))
  filename)

Get Thread ID in SBCL

I am working with Lisp sb-thread package. When I use *current-thread* to get the thread id, the result of the last evaluated expression is also returned with the thread id. I only need the thread id for my program.
SBCL has setf-able thread names, not IDs:
(sb-thread:thread-name SB-THREAD:*CURRENT-THREAD*)
==> "main thread"
What do you need the ID for?
There are situations in life when you desperately need the thread's ID, for example:
https://bugs.launchpad.net/sbcl/+bug/1751562
"I had a situation where on SBCL on Linux, I had multiple worker
threads on my machine, and one of them was taking 100% of my CPU. I
wanted to retrieve the thread object of the offending thread, but this
turned out to be non-trivial."
(defun thread-real-id ()
  (sb-alien:alien-funcall
   (sb-alien:extern-alien "syscall"
                          ;; sb-alien:unsigned is the return value's type and int is the parameter's type
                          (function sb-alien:unsigned int))
   ;; if on your system it returns 0xFFFFFFFF then try 186 instead of 224
   ;; or check the right gettid syscall value for your system.
   224))
(ql:quickload :bordeaux-threads)
(let ((top-level *standard-output*))
  (bt:make-thread (lambda ()
                    (format top-level "my id is ~A~%" (thread-real-id)))))
; output:
my id is 657
In the man pages syscall function's prototype is long syscall(long number, ...) so the correct type for both the parameter and the ret value is LONG, but I'm not sure how to specify it using the sb-alien: types
I checked out the sbcl's src:
find . -name "*alien*" -exec echo {} \; -exec grep define-alien-type-tr {} \;
search results:
./host-alieneval.lisp
(defun %define-alien-type-translator (name translator)
(define-alien-type-translator system-area-pointer ()
(define-alien-type-translator signed (&optional (bits sb!vm:n-word-bits))
(define-alien-type-translator integer (&optional (bits sb!vm:n-word-bits))
(define-alien-type-translator unsigned (&optional (bits sb!vm:n-word-bits))
(define-alien-type-translator boolean (&optional (bits sb!vm:n-word-bits))
(define-alien-type-translator enum (&whole
(define-alien-type-translator single-float ()
(define-alien-type-translator double-float ()
(define-alien-type-translator * (to &environment env)
(define-alien-type-translator array (ele-type &rest dims &environment env)
(define-alien-type-translator struct (name &rest fields &environment env)
(define-alien-type-translator union (name &rest fields &environment env)
(define-alien-type-translator function (result-type &rest arg-types
(define-alien-type-translator values (&rest values &environment env)
# not sure which of them is the type for LONG
To see the threads you can use this command:
ps -To pid,tid -p `pidof sbcl`
; if you need the PID do (sb-posix:getpid) or call the "getpid" alien:
(sb-alien:alien-funcall
(sb-alien:extern-alien "getpid"
(function sb-alien:unsigned)) )
If you are on Windows you could use the following code (taken from here: https://www.linux.org.ru/forum/development/11998951):
#|
typedef struct pthread_thread {
pthread_fn start_routine;
void* arg;
HANDLE handle;
...
}
|#
(defun get-thread-handle (thread)
  "Retrieves WIN32 thread HANDLE from SBCL thread"
  (declare (type sb-thread:thread thread))
  (let* ((pthread-pointer
           (sb-sys:int-sap (sb-thread::thread-os-thread thread)))
         (pthread-alien
           (sb-alien:sap-alien
            pthread-pointer (sb-alien:struct nil
                                             (start-addr (* t))
                                             (arg (* t))
                                             (handle (* t))))))
    (sb-alien:alien-sap (sb-alien:slot pthread-alien 'handle))))
(defun get-thread-id (thread)
  "Retrieves WIN32 thread ID from SBCL thread"
  (declare (type sb-thread:thread thread))
  (sb-alien:alien-funcall
   (sb-alien:extern-alien "GetThreadId" (function sb-alien:unsigned
                                                  (* t)))
   (get-thread-handle thread)))
(get-thread-id sb-thread:*current-thread*) ; ==> 62
Update: it turns out that the Windows code above can also be reduced to a one-liner:
(sb-alien:alien-funcall (sb-alien:extern-alien "GetCurrentThreadId" (function sb-alien:unsigned)))

How to read utf-8 string by usocket

I am reading from a usocket stream using the code below:
(let ((stream (socket-stream sk)) line)
(loop for line = (read-line stream)
while line do (format t line)))
When read-line meets a non-ASCII character, it throws an exception:
decoding error on stream
#<SB-SYS:FD-STREAM
for "socket 118.229.141.195:52946, peer: 119.75.217.109..."
{BCA02F1}>
(:EXTERNAL-FORMAT :UTF-8):
the octet sequence (176) cannot be decoded.
[Condition of type SB-INT:STREAM-DECODING-ERROR]
Neither read-line nor read-byte works, so I tried to use trivial-utf-8 to read the UTF-8 string using read-utf-8-string, but it only accepts a binary stream, and it seems socket-stream does not create a binary stream. How can I read from a socket stream that contains non-ASCII characters?
You can first read-sequence (if you know the length ahead of time) or read-byte while there are some, and then convert them to a string with (babel:octets-to-string octets :encoding :utf-8) (where octets is (make-array expected-length :element-type '(unsigned-byte 8))).
The error you're getting indicates that the data you're trying to read is not actually valid UTF-8 data. Indeed, 176 (= #b10110000) is not a byte that can introduce a UTF-8 character. If the data you're trying to read is in some other encoding, try adjusting your Lisp compiler's external format setting accordingly or using Babel or FLEXI-STREAMS to decode the data.
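The claim about byte 176 can be checked mechanically from the bit patterns UTF-8 uses. The following C sketch classifies a single byte; the function names utf8_lead_length and utf8_lead_length_of are made up for this example.

```c
#include <assert.h>

/* Classify one byte by its role in UTF-8.
   Returns the total sequence length (1 to 4) if the byte can start a
   character, 0 if it is a continuation byte (10xxxxxx), and -1 if it can
   never appear in valid UTF-8. */
int utf8_lead_length(unsigned char b) {
    if (b < 0x80) return 1;    /* 0xxxxxxx: ASCII                    */
    if (b < 0xC0) return 0;    /* 10xxxxxx: continuation byte        */
    if (b < 0xC2) return -1;   /* 0xC0, 0xC1: only overlong forms    */
    if (b < 0xE0) return 2;    /* 110xxxxx                           */
    if (b < 0xF0) return 3;    /* 1110xxxx                           */
    if (b < 0xF5) return 4;    /* 11110xxx, up to U+10FFFF           */
    return -1;                 /* 0xF5..0xFF                         */
}

/* Convenience wrapper for a value in 0..255, such as a non-EOF result
   of fgetc. The caller must handle EOF before calling this. */
int utf8_lead_length_of(int ch) {
    assert(ch >= 0 && ch <= 255);
    return utf8_lead_length((unsigned char)ch);
}
```

Here utf8_lead_length(176) returns 0: the byte is only valid in the middle of a multi-byte sequence, so a decoder that meets it at a character boundary has to report an error.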
I needed this once and was too lazy to look for a library, so I wrote it myself :) It may not be the best way, but I only needed something quick and uncomplicated, so here it goes:
;; +null+ is not defined in this snippet; it is presumably a constant defined elsewhere by the author.
(defun read-utf8-char (stream)
  (loop for i from 7 downto 0
        with first-byte = (read-byte stream nil 0)
        do (when (= first-byte 0) (return +null+))
        do (when (or (not (logbitp i first-byte)) (= i 0))
             (setf first-byte (logand first-byte (- (ash 1 i) 1)))
             (return
               (code-char
                (dotimes (a (- 6 i) first-byte)
                  (setf first-byte
                        (+ (ash first-byte 6)
                           (logand (read-byte stream) #x3F)))))))))

Resources