Output block size with AES - encryption

I'm trying to use the Racket Crypto library to encrypt blocks of 16 bytes with a 16-byte key. I'm expecting to have a 16-bytes output block but I get a 32-byte one.
A 15-byte input block give a 16-bit output.
#lang racket
(require (planet vyzo/crypto))
(bytes-length (encrypt cipher:aes-128-ecb
(string->bytes/latin-1 "0123456789ABCDEF") ; 16-byte key
(make-bytes 16) ; IV
(string->bytes/latin-1 "0123456789ABCDEF"))) ; 16-byte data
; -> 32
(bytes-length (encrypt cipher:aes-128-ecb
(string->bytes/latin-1 "0123456789ABCDEF") ; 16-byte key
(make-bytes 16)
(string->bytes/latin-1 "0123456789ABCDE"))) ; 15-byte data
; -> 16
Am I wrong somewhere? Is this due to padding?
Note: I'm aware of the problems with ECB mode, My goal is to implement CBC mode.

You're right, it's because of the padding. Unfortunately the API of the vyzo/crypto lib doesn't allow you to easily disable padding (and rightly so, see Caveat below).
How to disable the padding
However, based on this Thread on the Racket users mailing list, you could disable the padding like this:
#lang racket
(require (planet vyzo/crypto) (planet vyzo/crypto/util))
(define (cipher-encrypt-unpadded type key iv)
(lambda (ptext)
(let ((octx (cipher-encrypt type key iv #:padding #f)))
(bytes-append (cipher-update! octx ptext)
(cipher-final! octx)))))
(define (cipher-decrypt-unpadded type key iv)
(lambda (ctext)
(let ((ictx (cipher-decrypt type key iv #:padding #f)))
(bytes-append (cipher-update! ictx ctext)
(cipher-final! ictx)))))
; bytes-> bytes
; convenience function for encryption
(define enc-aes-128-ecb-unpadded
(cipher-encrypt-unpadded cipher:aes-128-ecb
(string->bytes/latin-1 "0123456789ABCDEF"); 16-byte key
(make-bytes 16)))
; bytes -> bytes
; convenience function for decryption
(define dec-aes-128-ecb-unpadded
(cipher-decrypt-unpadded cipher:aes-128-ecb
(string->bytes/latin-1 "0123456789ABCDEF"); 16-byte key
(make-bytes 16)))
(define message (string->bytes/latin-1 "0123456789ABCDEF")) ; 16-byte data
(bytes-length (enc-aes-128-ecb-unpadded message))
; -> 16
(dec-aes-128-ecb-unpadded (enc-aes-128-ecb-unpadded message))
; -> #"0123456789ABCDEF"
This worked well on my machine. Also, switching to CBC mode is trivial.
Caveat
When you disable padding, your messages have to have a length that is a multiple of the block size. For AES128 that is an exact multiple of 16 Bytes. Otherwise the function will blow up in your face:
(enc-aes-128-ecb-unpadded (string->bytes/latin-1 "too short!"))
EVP_CipherFinal_ex: libcrypto error: data not multiple of block length [digital envelope routines:EVP_EncryptFinal_ex:101183626]

It looks like all input is being padded to the next block boundary. That means that a 16 byte input will be padded to the next boundary at 32 bytes. If all your input is going to be exact block sizes, then you could turn off padding. If the input can end in the middle of a block then you will have to leave padding switched on.
If you are going to be using CBC mode, then you might need to think about authentication as well. If you do need it, then HMAC is probably the easiest to get started with.

Related

Lisp cl-dbi - how to embed control characters in sqlite queries

How do you embed control characters (ie " or ' or :) in cl-dbi queries for sqlite3?
currently using a make-string-output/get-output-stream-string to build a variable that contains quoted data (json). Now, I want to be able to store the data in a sqlite3 db, but I'm obviously building the string wrong, because I get an error
DB Error: unrecognized token: ":" (Code: ERROR)
how do I escape characters in cl-dbi to pass them through to sqlite3?
EDIT - here's a brief passage of the JSON data that I'm trying to store (as text) in the sqlite3 db:
{
"type": "artificial",
Update: GAH! Took me a day to find an errant : in the prepared query string. :/
As far as I can make out, what you are trying to do is generate, in Lisp, some string which contains valid SQL, which itself contains an SQL literal string.
First of all this is an instance of an antipattern I call 'language in a string'. I keep thinking I have a big diatribe about this to point people at but it seems I haven't. Suffice it to say it's kind of the antithesis of what Lisp people have tried to achieve for more than 60 years and it's why we have SQL injection attacks and a lot of the other crap that afflicts us. But that battle is long lost and all we can do is try to avoid actually drowning in the sea of mud and rotting bits of people that now litters the battlefield.
So, to do this you need to be able to do two things.
You need to be able to generate an SQL literal string from a sequence of characters (or from a string). This means you need to know the syntax of literal SQL strings and in particular what characters are legal in them and how you express characters which are not.
You need to be able to interpolate this string into a CL string.
The second of these is trivial: this is what format's ~A directive does. Or if you want to get fancy you could use cl-interpol.
For the first, I don't know the syntax of SQL literal strings, but I will give an example which assumes the following simple rules:
literal strings are delimited by " characters;
the character \ escapes the following character to remove it of any special significance;
all other characters are allowed (this is almost certainly wrong).
Well, there are lots of ways of doing this, all of which involve walking along the sequence of characters looking for the ones that need to be escaped. Here is something reasonably horrible and quick which I wrote. It needs a macro called nlet which is Scheme's 'named let' construct, and it assumes TRO in the implementation (if your implementation does not do this, get one that does).
(defmacro nlet (name bindings &body forms)
"named let"
(multiple-value-bind (vars vals) (values (mapcar (lambda (b)
(etypecase b
(symbol b)
(cons (first b))))
bindings)
(mapcar (lambda (b)
(etypecase b
(symbol 'nil)
(cons
(unless (null (cddr b))
(error "bad binding ~A" b))
(second b))))
bindings))
`(labels ((,name ,vars ,#forms))
(,name ,#vals))))
(defun ->sql-string (seq)
;; turn SEQ (a sequence of characters) into a string representing an
;; SQL literal string (perhaps)
(nlet loupe ((tail (coerce seq 'list))
(accum '()))
(if (null tail)
(coerce (cons #\" (nreverse (cons #\" accum))) 'string)
(destructuring-bind (first . rest) tail
(loupe rest
(case first
((#\\ #\")
(append (list first #\\) accum))
(otherwise
(cons first accum))))))))
So now:
> (->sql-string "foo")
"\"foo\""
> (->sql-string '(#\f #\\ #\o #\" #\o))
"\"f\\\\o\\\"o\""
This is made ugly by the Lisp printer, but (see above) we can see what the strings actually are:
> (format t "~&select x from y where x.y = ~A;~%"
(->sql-string '(#\f #\\ #\o #\" #\o)))
select x from y where x.y = "f\\o\"o";
nil
And you can see that the SQL literal string obeys the rules I set out above.
Before using anything like this check what the rules are, because if you get them wrong you are possibly open to SQL injection attacks.

Interpret foreign memory as lisp memory (or vice versa) without copying data

I try to write BLOB into database - chunk by chunk, using database API C-function (say, db-write-chunk).
This function takes a pointer to a foreign memory (where chunk is placed) as an argument.
So, I make buffer for a chunk: foreign-buffer.
I'll take chunk data from a file (or binary stream) by read-sequence into stream-buffer:
(let ((foreign-buffer (foreign-alloc :uchar 1024)))
(stream-buffer ((make-array 1024 :element-type '(unsigned-byte 8))))
(loop
for cnt = (read-sequence stream-buffer MY-STREAM)
while (> cnt 0)
do
;; copy cnt bytes from stream-buffer into foreign-buffer
;; call db-write-chunk with foreign-buffer
L in BLOB is for Large and loop may iterate many times.
Besides that, all this code may be wrapped by the external loop (bulk-insert, for example).
So, I want to minimize the count of steps in the loop(s) body.
To have this done I need:
to be able to read sequence not into stream-buffer, but into foreign-buffer directly, like this:
(read-sequence (coerce foreign-buffer '(vector/array ...)) MY-STREAM)
or to be able to interpret stream-buffer as foreign memory, like this:
(db-write-chunk (mem-aptr stream-buffer :uchar 0))
Is it possible to solve my problem using single buffer only - native or foreign, without copying memory between them?
Like anything else ffi, this is implementation dependent, but cffi has cffi:make-shareable-byte-vector, which is a CL (unsigned-byte 8) array which you can then use with cffi:with-pointer-to-vector-data:
(cffi:defcfun memset :pointer
(ptr :pointer)
(val :int)
(size :int))
(let ((vec (cffi:make-shareable-byte-vector 256)))
(cffi:with-pointer-to-vector-data (ptr vec)
(memset ptr 0 (length vec))))
Depending on your use, this might be preferable to static-vectors, because you don't have to remember to free it manually. On SBCL this works by pinning the vector data during with-pointer-to-vector-data.

How do I memory map tmpfs files in sbcl?

Exactly as the question says. I want to use shared memory to communicate between two lisp processes. Any pointers on how to do that?
I can see some tutorials on doing this in clozure at :-
http://ccl.clozure.com/manual/chapter4.7.html
Can someone point me to a similar library to do this with sbcl?
For a portable implementation, you might want to use the osicat library, which provides a CFFI wrapper for many POSIX calls in the osicat-posix package.
There is a very nice and short article with code for using it at http://wandrian.net/2012-04-07-1352-mmap-files-in-lisp.html (by Nicolas Martyanoff).
To preserve that, I mostly cite from there:
Mapping a file is done by opening it with osicat-posix:open, reading its size with fstat, then calling mmap. Once the file has been mapped we can close the file descriptor, it’s not needed anymore.
(defun mmap-file (path)
(let ((fd (osicat-posix:open path (logior osicat-posix:o-rdonly))))
(unwind-protect
(let* ((size (osicat-posix:stat-size (osicat-posix:fstat fd)))
(addr (osicat-posix:mmap (cffi:null-pointer) size
(logior osicat-posix:prot-read)
(logior osicat-posix:map-private)
fd 0)))
(values addr size))
(osicat-posix:close fd))))
The mmap-file function returns two values: the address of the memory mapping and its size.
Unmapping this chunk of memory is done with osicat-posix:munmap.
Let’s add a macro to safely map and unmap files:
(defmacro with-mmapped-file ((file addr size) &body body)
(let ((original-addr (gensym "ADDR-"))
(original-size (gensym "SIZE-")))
`(multiple-value-bind (,addr ,size)
(mmap-file ,file)
(let ((,original-addr ,addr)
(,original-size ,size))
(unwind-protect
(progn ,#body)
(osicat-posix:munmap ,original-addr ,original-size))))))
This macro mmaps the given file and binds the two given variables to its address and and size. You can then calculate address pointers with cffi:inc-pointer and access the file contents with cffi:mem-aref. You might want to build your own wrappers around this to represent the format of your file (e. g. plain text in UTF-8).
(In comparison to the posting linked above, I removed the wrapping of osicat-posix:munmap into another function of exactly the same signature and effect, because it seemed superfluous to me.)
There is low-level mmap function bundled with sbcl:
CL-USER> (apropos "MMAP")
SB-POSIX:MMAP (fbound)
; No value
CL-USER> (describe 'sb-posix:mmap)
SB-POSIX:MMAP
[symbol]
MMAP names a compiled function:
Lambda-list: (ADDR LENGTH PROT FLAGS FD OFFSET)
Derived type: (FUNCTION (T T T T T T)
(VALUES SYSTEM-AREA-POINTER &OPTIONAL))
Inline proclamation: INLINE (inline expansion available)
Source file: SYS:CONTRIB;SB-POSIX;INTERFACE.LISP.NEWEST
; No value
You have to use explicit address arithmetics to use it, as in C.

How to implement `tail` command using CL?

"with-open-file" will read from the beginning of a file. If the file is VERY big how to read the last 20 lines efficiently ?
Sincerely!
This opens a file, reads the final byte, and closes the file.
(defun read-final-byte (filename)
(with-open-file (s filename
:direction :input
:if-does-not-exist :error)
(let ((len (file-length s)))
(file-position s (1- len)) ; 0-based position.
(read-char s nil)))) ; don't error if reading the end of the file.
If you want to specifically read the last n lines, you will have to read back an indeterminate number of bytes until you get n+1 newlines. In order to do this, you will either have to do block reads backwards (faster but will wind up in reading unneeded bytes), or byte-reads (slower but allows precision and a slightly more obvious algorithm).
I suspect tail has a reasonable algorithm applied for this, so it would likely be worth reading tail's source for a guideline.

How to convert byte array to string in Common Lisp?

I'm calling a funny API that returns a byte array, but I want a text stream. Is there an easy way to get a text stream from a byte array? For now I just threw together:
(defun bytearray-to-string (bytes)
(let ((str (make-string (length bytes))))
(loop for byte across bytes
for i from 0
do (setf (aref str i) (code-char byte)))
str))
and then wrap the result in with-input-from-string, but that can't be the best way. (Plus, it's horribly inefficient.)
In this case, I know it's always ASCII, so interpreting it as either ASCII or UTF-8 would be fine. I'm using Unicode-aware SBCL, but I'd prefer a portable (even ASCII-only) solution to a SBCL-Unicode-specific one.
FLEXI-STREAMS (http://weitz.de/flexi-streams/) has portable conversion function
(flexi-streams:octets-to-string #(72 101 108 108 111) :external-format :utf-8)
=>
"Hello"
Or, if you want a stream:
(flexi-streams:make-flexi-stream
(flexi-streams:make-in-memory-input-stream
#(72 101 108 108 111))
:external-format :utf-8)
will return a stream that reads the text from byte-vector
There are two portable libraries for this conversion:
flexi-streams, already mentioned in another answer.
This library is older and has more features, in particular the extensible streams.
Babel, a library specificially for character encoding and decoding
The main advantage of Babel over flexi-streams is speed.
For best performance, use Babel if it has the features you need, and fall back to flexi-streams otherwise. Below a (slighly unscientific) microbenchmark illustrating the speed difference.
For this test case, Babel is 337 times faster and needs 200 times less memory.
(asdf:operate 'asdf:load-op :flexi-streams)
(asdf:operate 'asdf:load-op :babel)
(defun flexi-streams-test (bytes n)
(loop
repeat n
collect (flexi-streams:octets-to-string bytes :external-format :utf-8)))
(defun babel-test (bytes n)
(loop
repeat n
collect (babel:octets-to-string bytes :encoding :utf-8)))
(defun test (&optional (data #(72 101 108 108 111))
(n 10000))
(let* ((ub8-vector (coerce data '(simple-array (unsigned-byte 8) (*))))
(result1 (time (flexi-streams-test ub8-vector n)))
(result2 (time (babel-test ub8-vector n))))
(assert (equal result1 result2))))
#|
CL-USER> (test)
Evaluation took:
1.348 seconds of real time
1.328083 seconds of user run time
0.020002 seconds of system run time
[Run times include 0.12 seconds GC run time.]
0 calls to %EVAL
0 page faults and
126,402,160 bytes consed.
Evaluation took:
0.004 seconds of real time
0.004 seconds of user run time
0.0 seconds of system run time
0 calls to %EVAL
0 page faults and
635,232 bytes consed.
|#
If you don't have to worry about UTF-8 encoding (that, essentially, means "just plain ASCII"), you may be able to use MAP:
(map 'string #'code-char #(72 101 108 108 111))
I say go with the proposed flexistream or babel solutions.
But just for completeness and the benefit of future googlers arriving at this page I want to mention sbcl's own sb-ext:octets-to-string:
SB-EXT:OCTETS-TO-STRING is an external symbol in #<PACKAGE "SB-EXT">.
Function: #<FUNCTION SB-EXT:OCTETS-TO-STRING>
Its associated name (as in FUNCTION-LAMBDA-EXPRESSION) is
SB-EXT:OCTETS-TO-STRING.
The function's arguments are: (VECTOR &KEY (EXTERNAL-FORMAT DEFAULT) (START 0)
END)
Its defined argument types are:
((VECTOR (UNSIGNED-BYTE 8)) &KEY (:EXTERNAL-FORMAT T) (:START T) (:END T))
Its result type is:
*
SBCL supports the so-called Gray Streams. These are extensible streams based on CLOS classes and generic functions. You could create a text stream subclass that gets the characters from the byte array.
Try the FORMAT function. (FORMAT NIL ...) returns the results as a string.

Resources