Safe Parsing of Format Directives in Common Lisp

I would like to read in a string from an input file (which may or may not have been modified by the user). I would like to treat this string as a format directive to be called with a fixed number of arguments. However, I understand that some format directives (particularly, the ~/ comes to mind) could potentially be used to inject function calls, making this approach inherently unsafe.
When using read to parse data in Common Lisp, the language provides the *read-eval* dynamic variable which can be set to nil to disable #. code injection. I'm looking for something similar that would prevent code injection and arbitrary function calls inside format directives.

If the user cannot introduce custom code but only format strings, then you can avoid the problems of print-object. Remember to use with-standard-io-syntax (or a customized version of it) to control the exact kind of output you will generate (think about *print-base*, ...).
You can scan the input strings to detect the presence of ~/ (but ~~/ is valid, since ~~ is an escaped tilde followed by a literal slash) and refuse to interpret format strings that contain blacklisted constructs.
However, some analyses are more difficult and you might need to act at runtime.
For example, if the format string is malformed, you will probably encounter an error, which must be handled (also, you may give bad values to the expected arguments).
Even if the user is not malicious, you can also have problems with iteration constructs:
~{<X>~:*~}
... never stops because ~:* rewinds the current argument. To handle this, you must take into account that <X> may or may not print something. You could implement both of these strategies:
have a timeout to limit the time formatting takes
have the underlying stream reach end-of-file when writing too much (e.g. write into a string buffer).
There might be other problems I currently don't see, be careful.
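The error-handling and output-control points above can be sketched as follows. This is a minimal sketch, not a complete sandbox: try-format is a hypothetical helper name, it assumes the control string has already been vetted, and it does nothing about the non-terminating iteration case, which still needs a timeout or a bounded stream.

```lisp
;; TRY-FORMAT is a hypothetical name used for illustration.
(defun try-format (control-string &rest args)
  "Format ARGS with CONTROL-STRING under standard printer settings.
Return (values string nil) on success or (values nil condition) on error."
  (handler-case
      (with-standard-io-syntax
        ;; WITH-STANDARD-IO-SYNTAX binds *PRINT-READABLY* to T, which makes
        ;; some objects unprintable; relax it for human-oriented output.
        (let ((*print-readably* nil))
          (values (apply #'format nil control-string args) nil)))
    (error (condition)
      (values nil condition))))

;; The caller's printer settings do not leak into the output:
(let ((*print-base* 16))
  (try-format "~A" 255))   ; => "255", NIL
```

Returning the condition as a second value lets the caller decide whether to report the problem or fall back to a default string.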

It's not just ~/ that you'd need to worry about. The pretty printer functionality has lots of possibilities for code extension, and even ~A can cause problems, because objects may have methods on print-object defined. E.g.,
(defclass x () ())
(defmethod print-object ((x x) stream)
  (format *error-output* "Executing arbitrary code...~%")
  (call-next-method x stream))
CL-USER> (format t "~A" (make-instance 'x))
Executing arbitrary code...
#<X {1004E4B513}>
NIL
I think you'd need to define for yourself which directives are safe, using whatever criteria you consider important, and then include only those.
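One way to approach such a whitelist is to scan the control string, extract the directive character that follows each tilde (skipping prefix parameters and the : and @ modifiers), and reject the string if any directive is not explicitly allowed. The names *allowed-directives*, directive-chars and format-string-allowed-p are hypothetical, and the particular set of allowed directives is only an assumption for illustration. The scanner is deliberately simple and does not validate parameters.

```lisp
;; Hypothetical whitelist: ~A ~D ~S ~X ~% ~& ~~ only.
(defparameter *allowed-directives* "ADSX%&~")

(defun directive-chars (control-string)
  "Return the upcased directive characters found in CONTROL-STRING, in order.
A tilde with no directive character after it yields NIL in the list."
  (let ((result '())
        (i 0)
        (n (length control-string)))
    (loop while (< i n)
          do (if (char/= (char control-string i) #\~)
                 (incf i)
                 (progn
                   (incf i)             ; skip the tilde
                   ;; Skip prefix parameters and modifiers.
                   (loop while (< i n)
                         do (let ((c (char control-string i)))
                              (cond ((char= c #\')
                                     (incf i 2)) ; quote plus quoted character
                                    ((or (digit-char-p c)
                                         (find c ",+-#vV:@"))
                                     (incf i))
                                    (t (return)))))
                   (if (< i n)
                       (progn
                         (push (char-upcase (char control-string i)) result)
                         (incf i))
                       (push nil result)))))
    (nreverse result)))

(defun format-string-allowed-p (control-string
                                &optional (allowed *allowed-directives*))
  "True if every directive in CONTROL-STRING is in the ALLOWED string."
  (every (lambda (c) (and c (find c allowed)))
         (directive-chars control-string)))
```

With this sketch, "Hello ~A, you have ~D items~%" and "~~/" are accepted, while "~/cl-user::evil/" and "~{~A~}" are rejected. As noted above, accepting ~A or ~S is only safe if the arguments themselves are of types whose print-object methods you trust.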

Related

How do I programmatically write to *standard-input* for evaluation at the repl?

I type the following in a repl (clozure common lisp)
(defparameter test 1)
The repl responds with test
Now I enter:
(format *standard-input* "(defparameter test 2)")
Repl outputs (defparameter test 2) followed by nil.
But the value of test remains unchanged at 1.
Why is this? Isn't writing to the variable *standard-input* the same as entering the text at the repl?
How do I achieve the desired evaluation?
Some context:
I'm making a custom frontend for common lisp development using sockets. I need to write to standard input because even though I can evaluate code using eval and read, I cannot debug the code on errors.
For instance entering 1 to unwind the stack and return to top level is impossible without writing to standard input (as far as I can tell). I have the output parts figured out.
*standard-input* is an input stream, as its name implies. It's a stream you read from, not one you write to. It may be an output stream as well, but if it is then writing to it is not going to inject strings into the REPL.
I'd suggest looking at SLIME or SLY if you want to understand how to have REPLs & debuggers which interact with things down streams. In particular SWANK is probably the interesting bit to understand, or the equivalent for SLY, which is SLYNK (or slynk, not sure of the capitalisation). The implementations of these protocols in various Lisps are not entirely trivial, but the implementations exist already: you don't have to write them. Screen-scraping an interface made for humans to interact with is almost always a terrible approach: it's only reasonable when there is no better way, and in this case there is a better way, in fact there are at least two.
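To illustrate the point that *standard-input* is something you read from: text reaches a reader by being the source behind the stream, not by being written to it. The sketch below rebinds *standard-input* to a string stream and evaluates each form read from it. eval-from-string is a hypothetical helper name, and this is not how SWANK or SLYNK work internally; it only shows the direction of data flow. Evaluating untrusted text this way is of course unsafe.

```lisp
;; EVAL-FROM-STRING is a hypothetical name used for illustration.
(defun eval-from-string (text)
  "Read every form from TEXT through *STANDARD-INPUT* and evaluate it.
Return the value of the last form."
  (with-input-from-string (*standard-input* text)
    (let ((eof (gensym "EOF"))
          (result nil))
      (loop for form = (read *standard-input* nil eof)
            until (eq form eof)
            do (setf result (eval form)))
      result)))

(eval-from-string "(+ 1 2)")   ; => 3
```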

How to undo a `declaration` proclamation?

It appears that most (if not all) global declarations cannot be reverted in an ANSI CL standard way.
E.g., once you evaluate (either directly or by loading a file) a form like (proclaim '(declaration my-decl)) or (declaim (special *my-var*)) there is no portable way to make (declare (my-decl ...)) illegal or *my-var* lexical.
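For readers unfamiliar with the declaration proclamation, here is a minimal sketch of what it enables. The names my-decl and tagged-identity are hypothetical.

```lisp
;; MY-DECL and TAGGED-IDENTITY are hypothetical names used for illustration.
(declaim (declaration my-decl))

(defun tagged-identity (x)
  ;; MY-DECL is now a known (and ignored) declaration identifier; without
  ;; the proclamation an implementation may warn about or reject it.
  (declare (my-decl x))
  x)
```

Once the declaim form has been evaluated, the standard offers no operator that makes my-decl unknown again.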
Various implementations offer non-portable ways to revert the special declaration, usually via (proclaim '(notspecial *my-var*)) or some other trick.
How about the declaration proclamation?
How do various implementations undo it?
How would you implement it?
Do you think (proclaim '(notdeclaration my-decl)) is a good idea?
Motivation: in a test suite, it would be nice to be modular - to be able to revert all effects of a test's statements to avoid any possible interference with other parts of the test suite. I know it's a weak motivation because The Right Way is to use packages.
One possible way is to provide a transaction mechanism (rollback/commit). This is taken from Xach's Naggum archive:
I miss a transaction facility where I can make a number of changes
to the system that are only visible to my thread of a multi-threaded
execution environment and then discard them or commit them all at
once. E.g., loading a file could be such a transcation. Signaling
an error during loading could cause the whole slew of operations
that preceded it to be discarded instead of leaving the system in a
partially modified state. This is not impossible to build on top of
the existing system, but it takes significant effort, so it is the
kind of thing that Common Lisp systems programmers should do.
You would miss the possibility to undo only a subset of the declarations, like (declare A), (declare B), (undeclare A) but given your motivation, it would not be a problem because you are likely to want to undo all possible declarations made during a test.
You could provide a special form to individually "undeclare" declarations. I'd name it retract, but this might be difficult to specify in some cases. Suppose you declare that x is either a string or a number, can you retract a declaration that says x is a string? What about inline functions, etc? A transaction looks easier to implement, with e.g. a temporary environment.

Is the "define" primitive of Scheme an imperative languages feature? Why or why not?

(define hypot
  (lambda (a b)
    (sqrt (+ (* a a) (* b b)))))
This is written in the Scheme programming language.
"define" creates a variable and a global binding
lambda creates a procedure
I would like to know whether "define" would be considered an imperative language feature. As far as I know, static scoping is an imperative feature. I think "define" is an imperative feature, since it creates a global binding, and static scoping looks at the global binding for any variable definition, whereas dynamic scoping looks at the most recently active binding.
Please help me find the correct answer!! And I would like to know why or why not?
In a Scheme program a (define var expr) statement is both a declaration and an initialization. Declarations introduce a new name into the scope. Declarations and initialization are present in both imperative and declarative languages.
However, if the same variable is defined twice, then define behaves as an assignment, which belongs to the imperative paradigm.
You've put your finger on a subtle and contentious issue. There have long been two informal camps on how define should work, which I would label (very imperfectly, and very controversially!) as the static vs. dynamic camps.
The static camp sees define as a non-side-effecting top-level declaration—it's a syntax that simply defines a name in a top-level scope, just like let is a syntax that defines a name in a local scope. A bit more precisely, this camp tends to see the top-level environment as equivalent to a big letrec with all the defines as the bindings, and all "loose" top-level expressions as the body. This is, incidentally, similar to the way that simple compilers work—read the whole program from one or more files, figure out all of the top-level bindings and generate code with knowledge of the whole program's source text.
The dynamic camp, on the other hand, tends to conceive of the top-level environment as a mutable data structure to which bindings can be added at runtime, and define is then an operation that modifies the top-level environment. This is, incidentally, similar to how simple interactive interpreters work—read definitions interactively from input, one at a time, and incorporate them into the environment as the user provides them.
To give one example, the SLIB library is one that I recall has been criticized for being much too firmly in the "dynamic" camp. If you read Section 1.1 on "features", you see this right from the beginning:
SLIB maintains a list of features supported by a Scheme session. The set of features provided by a session may change during that session.
The documentation for the require form that you use in SLIB to "load" modules continues with this:
Procedure: require feature
If (provided? feature) is true, then require just returns.
Otherwise, if feature is found in the catalog, then the corresponding files will be loaded and (provided? feature) will henceforth return #t. That feature is thereafter provided.
Otherwise (feature not found in the catalog), an error is signaled.
If you read this carefully, you will be struck that it's framing the whole thing as modules being "loaded" at runtime—and not as compile-time linking, which is foreign to the design.
So a "session" is a set of bindings whose keys (not just their values) change during the runtime of the program. Programs are able to mutate the session with provide and require. They are able to directly observe the mutation with provided?. And it is implied that they can indirectly observe the set of identifiers bound in the top-level environment change as a result of require: procedure invocations that would have resulted in a runtime error before a call to require no longer do so afterwards.
So we can't help but conclude that going by the philosophy of the people who designed this library, define is imperative. But not every Scheme user or implementer shares this philosophy.
First off Scheme is lexically scoped. Define usually is not limited to top level bindings like it is in Racket. It can create bindings within other procedure bodies.
In some implementations define can manipulate state, but only for top-level definitions. Otherwise it acts like let and binds a variable in the local scope. Actually taking advantage of top-level rebinding programmatically is difficult.
So define doesn't introduce an imperative style into Scheme code. Compare define to set! and its relatives, which modify the variable in whatever environment it is bound, thereby allowing an imperative style in Scheme code.

clojure functions, let & return values

Is it unwise to return a var bound using let?
(let [pipeline (Channels/pipeline)]
  (.addLast pipeline "codec" (HttpClientCodec.))
  ;; several more lines like this
  pipeline)
Is the binding here just about the lexical scope (as opposed to def) and not unsafe to pass around?
Update
In writing this question I realised the above was ugly. And if something is ugly in Clojure you are probably doing it wrong.
I think this is probably the more idiomatic way of handling the above (which makes the question moot, btw, but still handy knowledge).
(doto (Channels/pipeline)
  (.addLast "codec" (HttpClientCodec.)))
let is purely lexically scoped and doesn't create a var. The locals created by let (or loop) behave exactly like function arguments. So yeah, it's safe to use as many let/loop-defined locals as you like, close over them, etc. Returning a local from the function simply returns its value, not the internal representation (which is actually on the stack, unless closed over). let/loop bindings are therefore also reentrancy/thread-safe.
By the way, for your specific code example with lots of java calls, you may want to consider using doto instead or additionally. http://clojure.github.com/clojure/clojure.core-api.html#clojure.core/doto

More explanation on Lexical Binding in Closures?

There are many SO posts related to this, but I am asking this again with a different purpose
I am trying to understand why closures are important and useful. One of the things that I've read in other SO posts related to this is that when you pass a variable to a closure, the closure remembers this value from then onwards. Is this the entire technical aspect of it, or is there more to what happens there?
What I wonder then is what would happen when the variable used inside the closure gets modified from outside. Should they be constants only?
In the language Clojure, I can do the following. But since values there are immutable, this issue does not arise. What about other languages, and what is the proper technical definition of a closure?
(defn make-greeter [greeting-prefix]
  (fn [username] (str greeting-prefix ", " username)))
((make-greeter "Hello") "World")
This is not the sort of answer that appears to get up-votes around here, but I would heartily urge you to discover the answer to your question by reading Shriram Krishnamurthi's (free!) (online!) textbook, Programming Languages: Application and Interpretation.
I will paraphrase the book very, very briefly, by summarizing the development of the teeny tiny interpreters that it leads you through:
an arithmetic expression language (AE)
an arithmetic expression language with named expressions (WAE); implementing this involves developing a substitution function that can replace names with values
a language that adds first-order functions (F1WAE): using a function involves substituting values for each of the parameter names.
The same language, without substitution: it turns out that "environments" allow you to avoid the overhead of pre-emptive substitution.
a language that eliminates the separation between functions and expressions by allowing functions to be defined at arbitrary locations (FWAE)
This is the key point: you implement this, and then you discover that with substitution it works fine, but with environments it's broken. In particular, in order to fix it up, you must be sure to associate with an evaluated function definition the environment that was in place when it was evaluated. This pair (fundef + environment-of-definition) is what's called a "closure".
Whew!
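The "fundef + environment-of-definition" pair can be made concrete with a toy evaluator. This is a rough sketch in Common Lisp rather than the code from PLAI: the expression syntax (fun for function definitions, a two-argument +) and the names fun-closure, lookup-variable and evaluate-expr are all assumptions made for illustration. Environments are association lists.

```lisp
;; A closure: a function definition plus the environment it was defined in.
(defstruct fun-closure params body env)

(defun lookup-variable (name env)
  (let ((binding (assoc name env)))
    (if binding
        (cdr binding)
        (error "Unbound variable: ~S" name))))

(defun evaluate-expr (expr env)
  (cond ((numberp expr) expr)
        ((symbolp expr) (lookup-variable expr env))
        ((eq (first expr) 'fun)         ; (fun (param ...) body)
         ;; Capture ENV here, at definition time.
         (make-fun-closure :params (second expr)
                           :body (third expr)
                           :env env))
        ((eq (first expr) '+)           ; (+ a b)
         (+ (evaluate-expr (second expr) env)
            (evaluate-expr (third expr) env)))
        (t                              ; application: (f arg ...)
         (let ((f (evaluate-expr (first expr) env))
               (args (mapcar (lambda (a) (evaluate-expr a env))
                             (rest expr))))
           ;; Extend the closure's own environment, not the caller's.
           (evaluate-expr (fun-closure-body f)
                          (append (mapcar #'cons (fun-closure-params f) args)
                                  (fun-closure-env f)))))))

(evaluate-expr '(((fun (x) (fun (y) (+ x y))) 3) 4) '())   ; => 7
```

The inner function still sees x as 3 when it is finally applied to 4, because the closure carries the environment in which it was created. If the application case extended the caller's environment instead, x would be unbound at that point.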
Okay, what happens when we add mutable bindings to the picture? If you try this yourself, you'll see that the natural implementation replaces an environment that associates names with values with an environment that associates names with bindings. This is orthogonal to the notion of closures; since closures capture environments, and since environments now map names to bindings, you get the behavior you describe, whereby mutation of a variable captured in an environment is visible and persistent.
Again, I would very much urge you to take a look at PLAI.
A closure is really a data structure used by the compiler to make sure that a function will always have access to the data that it needs to operate. Here is an example of a function that records when it was defined (get-time-of-day stands for any function that returns the current time).
(defn outer []
  (let [foo (get-time-of-day)]
    (fn inner []
      (str "then:" foo " now:" (get-time-of-day)))))
(def then-and-now (outer))
(then-and-now) ==> "then:1:02:03 now:2:30:01"
....
(then-and-now) ==> "then:1:02:03 now:2:31:02"
When this function is defined, a class is created and a small structure (a closure) is allocated on the heap that stores the value of foo. The class has a pointer to that structure (or it contains it, I'm not sure). If you run outer again, a second closure is allocated to hold that other foo. When we say "this function closes over foo", we mean that it has a reference to a structure/class/whatever that stores the state of foo at the time the function was created. The reason you need to close over something is that the function that contains it is going away before the data will be used. In this case outer (which contains the value of foo) is going to end and be gone long before foo is used, so nobody will be around to modify foo. Of course, foo could pass a ref to somebody who could then modify it.
A lexical closure is one in which the enclosed variables (e.g. greeting-prefix in your example) are enclosed by reference. The closure created does not simply get the value of greeting-prefix at the time it is created, but gets a reference. If greeting-prefix is modified after the closure is created, then its new value will be used by the closure every time it is called.
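A small Common Lisp sketch of this by-reference behaviour, adapted from the greeter example in the question. make-greeter-pair is a hypothetical name; it returns two closures that share the same binding of prefix.

```lisp
;; MAKE-GREETER-PAIR is a hypothetical name used for illustration.
(defun make-greeter-pair (prefix)
  "Return two closures over the same PREFIX binding: a greeter and a setter."
  (values (lambda (username) (format nil "~A, ~A" prefix username))
          (lambda (new-prefix) (setf prefix new-prefix))))

(multiple-value-bind (greet set-prefix) (make-greeter-pair "Hello")
  (funcall set-prefix "Goodbye")
  (funcall greet "World"))   ; => "Goodbye, World"
```

The greeter sees the new prefix because both closures refer to the binding itself rather than to a copy of its value taken at creation time.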
In pure functional languages this isn't much of a distinction, because values are never changed. So it doesn't matter if the value of greeting-prefix is copied into the closure: there's no possible difference in behaviour that could arise from referring to the original versus its copy.
In "imperative-languages-with-closures", such as C# and Java (via anonymous classes), some decision has to be made about whether the enclosed variable is enclosed by value or by reference. In Java this decision is pre-empted by only allowing final variables to be enclosed, effectively mimicking a functional language as far as that variable is concerned. In C# I believe it is a different matter.
Enclosing by value simplifies the implementation: the variable to be enclosed will often exist on the stack and hence will be destroyed when the function constructing the closure returns -- that means it can't be enclosed by reference. If you need enclosure by reference, a workaround is to identify such variables and keep them in an object allocated each time that function is called. This object is then kept as part of the closure's environment and must remain live as long as all closures using it are live. (I do not know if any compiled languages directly use this technique.)
For more descriptions see for example:
Common Lisp HyperSpec, 3.1.4 Closures and Lexical Binding
and
Common Lisp the Language, 2nd Edition, Chapter 3., Scope and Extent
You can think of a closure as an "environment", in which names are bound to values. Those names are entirely private to the closure, which is why we say that it "closes over" its environment. So your question isn't meaningful, in that the "outside" cannot affect the closed-over environment. Yes, a closure can refer to a name in a global environment (in other words, if it uses a name that is not bound in its private, closed-over environment), but that's a different story.
If you like, you can think of an environment as a dictionary, or hash table. A closure gets its own little dictionary where names are looked up.
You might enjoy reading On lambdas, capture, and mutability, which describes how this works in C# and F#, for comparison.
Have a look at this blog post: ADTs in Clojure. It shows a nice application of closures to the problem of locking up data so that it is accessible exclusively through a particular interface (rendering the data type opaque).
The main idea behind this type of locking is more simply illustrated with the counter example, which huaiyuan posted in Common Lisp while I was composing this answer. Actually, the Clojure version is interesting in that it shows that the issue of a closed-over variable changing its value does arise in Clojure if the variable happens to hold an instance of one of the reference types.
(defn create-counter []
  (let [counter (atom 0)
        inc-counter! #(swap! counter inc)
        get-counter (fn [] @counter)]
    [inc-counter! get-counter]))
As for the original make-greeter example, you could rewrite it thus (note the deref/@, which means greeting-prefix must now hold a reference type such as an atom):
(defn make-greeter [greeting-prefix]
  (fn [username] (str @greeting-prefix ", " username)))
Then you can use it to render personalised greetings from the different operators of various sections of a website. :-)
((make-greeter (atom "Hello from Gizmos Dept")) "John")
((make-greeter (atom "Hello from Gadgets Dept")) "Jack")
You can think of a closure as an
"environment", in which names are
bound to values. Those names are
entirely private to the closure, which
is why we say that it "closes over"
its environment. So your question
isn't meaningful, in that the
"outside" cannot affect the
closed-over environment. Yes, a
closure can refer to a name in a
global environment (in other words, if
it uses a name that is not bound in
its private, closed-over environment),
but that's a different story.
I suppose that the question was if things like these are possible in languages which allow mutation of local variables:
CL-USER> (let ((x (list 1 2 3)))
           (prog1
               (let ((y x))
                 (lambda () y))
             (rplaca x 2)))
#<COMPILED-LEXICAL-CLOSURE #x9FEC77E>
CL-USER> (funcall *)
(2 2 3)
And -- since they are obviously possible -- I think the question is legitimate.
