It appears that most (if not all) global declarations cannot be reverted in an ANSI CL standard way.
E.g., once you evaluate (either directly or by loading a file) a form like (proclaim '(declaration my-decl)) or (declaim (special *my-var*)) there is no portable way to make (declare (my-decl ...)) illegal or *my-var* lexical.
Various implementation offer non-portable way to revert the special declaration, usually via (proclaim '(notspecial *my-var*)) or some other trick.
How about the declaration proclamation?
How do various implementations undo it?
How would you implement it?
Do you think (proclaim '(notdeclaration my-decl)) is a good idea?
Motivation: in a test suite, it would be nice to be modular - to be able to revert all effects of a test statements to avoid any possible interference with test suite parts. I know it's a week motivation because The Right Way is to use packages.
One possible way is to provide a transaction mechanism (rollback/commit). This is taken from Xach's Naggum's archive:
I miss a transaction facility where I can make a number of changes
to the system that are only visible to my thread of a multi-threaded
execution environment and then discard them or commit them all at
once. E.g., loading a file could be such a transcation. Signaling
an error during loading could cause the whole slew of operations
that preceded it to be discarded instead of leaving the system in a
partially modified state. This is not impossible to build on top of
the existing system, but it takes significant effort, so it is the
kind of thing that Common Lisp systems programmers should do.
You would miss the possibility to undo only a subset of the declarations, like (declare A), (declare B), (undeclare A) but given your motivation, it would not be a problem because you are likely to want to undo all possible declarations made during a test.
You could provide a special form to individually "undeclare" declarations. I'd name it retract, but this might be difficult to specify in some cases. Suppose you declare that x is either a string or a number, can you retract a declaration that says x is a string? What about inline functions, etc? A transaction looks easier to implement, with e.g. a temporary environment.
Related
I type the following in a repl (clozure common lisp)
(defparameter test 1)
The repl responds with test
Now I enter:
(format *standard-input* "(defparameter test 2)")
Repl outputs (defparameter test 2) followed by nil.
But the value of test remains unchanged at 1.
Why is this? Isn't writing to the variable *standard-input* the same as entering the text at the repl?
How do I achieve the desired evaluation?
Some context:
I'm making a custom frontend for common lisp development using sockets. I need to write to standard input because even though I can evaluate code using eval and read, I cannot debug the code on errors.
For instance entering 1 to unwind the stack and return to top level is impossible without writing to standard input (as far as I can tell). I have the output parts figured out.
*standard-input* is an input stream, as its name implies. It's a stream you read from, not one you write to. It may be an output stream as well, but if it is then writing to it is not going to inject strings into the REPL.
I'd suggest looking at SLIME or SLY if you want to understand how to have REPLs & debuggers which interact with things down streams. In particular SWANK is probably the interesting bit to understand, or the equivalent for SLY, which is SLYNK (or slynk, not sure of the capitalisation). The implementations of these protocols in various Lisps are not entirely trivial, but the implementations exist already: you don't have to write them. Screen-scraping an interface made for humans to interact with is almost always a terrible approach: it's only reasonable when there is no better way, and in this case there is a better way, in fact there are at least two.
Supposed I wanted to take advantage of Common Lisp's ability to read and execute Common Lisp code so that my program can execute external code written in Lisp, but I don't trust that code so I don't want it to have access the full power of Common Lisp. Is it possible for me to restricts its environment so that it can only see the packages/symbols to which I explicitly give it access, effectively creating a DSL?
To read the code, start by disabling *read-eval* (that stops people injecting execution during parsing, using something like #.(do-evil-stuff). You probably want to do the reading using a custom read-table that disables most (if not all) read-macros. You probably want to do the reading with a custom, one-off, package, importing only symbols you allow.
Once you've read the user-provided code, you still need to validate that there's no unexpected function/macro references in the code. If you have used a custom package, you should be able to confirm that each symbol falls in either of the two classes "belongs to the custom one-off package" (this is user-supplied stuff) or "explicitly allowed from elsewhere" (you would need this list to construct the custom package).
Once that's been done, you can then evaluate it.
However, doing this correctly would take a fair bit of care and you really should have someone else have a look at the code and actively try to break out of the sandbox.
Take a look at the section 'Reader security' in chapter 4 of Let over lambda which discusses this topic in some depth. In particular, you probably want to set *read-eval* to nil. To address your question regarding restricting access to the environment, this is generally difficult in Common Lisp, as it is designed to allow access to most pieces of the system in the first place. Maybe you can use elaborate the ideas of Let over lambda in the direction of white listing symbols (in comparison to the blacklisting of macro characters in the linked chapter). I don't think there are any ready-made solutions.
Let's say we define a function c sum(a, b), functional programming -style, that returns the sum of its arguments. So far so good; all the nice things of FP without any problems.
Now let's say we run this in an environment with dynamic typing and a singleton, stateful error stream. Then let's say we pass a value of a and/or b that sum isn't designed to handle (i.e. not numbers), and it needs to indicate an error somehow.
But how? This function is supposed to be pure and side-effect-less. How does it insert an error into the global error stream without violating that?
No programming language that I know of has anything like a "singleton stateful error stream" built in, so you'd have to make one. And you simply wouldn't make such a thing if you were trying to write your program in a pure functional style.
You could, however, have a sum function that returns either the sum or an indication of an error. The type used to do this is in fact often known by the name Either. Then you could easily make a function that invokes a whole bunch of computations that could possibly return an error, and returns a list of all the errors that were encountered in the other computations. That's pretty close to what you were talking about; it's just explicitly returned rather than being global.
Remember, the question when you're writing a functional program is "how do I make a program that has the behavior I want?" not, "how would I duplicate one particular approach taken in another programming style?". A "global stateful error stream" is a means not an end. You can't have a global stateful error stream in pure function style, no. But ask yourself what you're using the global stateful error stream to achieve; whatever it is, you can achieve that in functional programming, just not with the same mechanism.
Asking whether pure functional programming can implement a particular technique that depends on side effects is like asking how you use techniques from assembly in object-oriented programming. OO provides different tools for you to use to solve problems; limiting yourself to using those tools to emulate a different toolset is not going to be an effective way to work with them.
In response to comments: If what you want to achieve with your error stream is logging error messages to a terminal, then yes, at some level the code is going to have to do IO to do that.1
Printing to terminal is just like any other IO, there's nothing particularly special about it that makes it worthy of singling out as a case where state seems especially unavoidable. So if this turns your question into "How do pure functional programs handle IO?", then there are no doubt many duplicate questions on SO, not to mention many many blog posts and tutorials speaking precisely to that issue. It's not like it's a sudden surprise to implementors and users of pure programming languages, the question has been around for decades, and there have been some quite sophisticated thought put into the answers.
There are different approaches taken in different languages (IO monad in Haskell, unique modes in Mercury, lazy streams of requests and responses in historical versions of Haskell, and more). The basic idea is to come up with a model which can be manipulated by pure code, and hook up manipulations of the model to actual impure operations within the language implementation. This allows you to keep the benefits of purity (the proofs that apply to pure code but not to general impure code will still apply to code using the pure IO model).
The pure model has to be carefully designed so that you can't actually do anything with it that doesn't make sense in terms of actual IO. For example, Mercury does IO by having you write programs as if you're passing around the current state of the universe as an extra parameter. This pure model accurately represents the behaviour of operations that depend on and affect the universe outside the program, but only when there is exactly one state of the universe in the system at any one time, which is threaded through the entire program from start to finish. So some restrictions are put in
The type io is made abstract so that there's no way to construct a value of that type; the only way you can get one is to be passed one from your caller. An io value is passed into the main predicate by the language implementation to kick the whole thing off.
The mode of the io value passed in to main is declared such that it is unique. This means you can't do things that might cause it to be duplicated, such as putting it in a container or passing the same io value to multiple different invocations. The unique mode ensures that you can only ass the io value to a predicate that also uses the unique mode, and as soon as you pass it once the value is "dead" and can't be passed anywhere else.
1 Note that even in imperative programs, you gain a lot of flexibility if you have your error logging system return a stream of error messages and then only actually make the decision to print them close to the outermost layer of the program. If your log calls are directly writing the output immediately, here's just a few things I can think of off the top of my head that become much harder to do with such a system:
Speculatively execute a computation and see whether it failed by checking whether it emitted any errors
Combine multiple high level systems into a single system, adding tags to the logs to distinguish each system
Emit debug and info log messages only if there is also an error message (so the output is clean when there are no errors to debug, and rich in detail when there are)
What are some of the optimization steps that this command does
`(optimize speed (safety 0))`
Can I handcode some of these techniques in my Lisp/Scheme program?
What these things do in CL depends on the implementation. What usually happens is: there's a bunch of optimizations and other code transformations that can be applied on your code with various tradeoffs, and these declarations are used as a higher level specification that is translated to these individual transformations. Most implementations will also let you control the individual settings, but that won't be portable.
In Scheme there is no such standard facility, even though some Schemes use a similar approach. The thing is that Scheme in general (ie, in the standard) avoids such "real world" issues. It is possible to use some optimization techniques here and there, but that depends on the implementation. For example, in PLT the first thing you should do is make sure your code is defined in a module -- this ensures that the compiler gets to do a bunch of optimizations like inlining and unfolding loops.
I don't know, but I think the SBCL internals wiki might have some starting points if you want to explore.
Higher speed settings will cause the compiler to work harder on constant folding, compile-time type-inference (hence eliminating runtime dynamic dispatches for generic operations) and other code analyses/transformations; lower safety will skip runtime type checks, array bound checks, etc. For more details, see
Advanced Compiler Use and Efficiency Hints chapter of the CMUCL User's Manual, which applies to both CMUCL and SBCL(more or less).
How do I declare the types of the parameters in order to circumvent type checking?
How do I optimize the speed to tell the compiler to run the function as fast as possible like (optimize speed (safety 0))?
How do I make an inline function in Scheme?
How do I use an unboxed representation of a data object?
And finally are any of these important or necessary? Can I depend on my compiler to make these optimizations?
thanks,
kunjaan.
You can't do any of these in any portable way.
You can get a "sort of" inlining using macros, but it's almost always to try to do that. People who write Scheme (or any other language) compilers are usually much better than you in deciding when it is best to inline a function.
You can't make values unboxed; some Scheme compilers will do that as an optimization, but not in any way that is visible (because it is an optimization -- so it should preserve the semantics).
As for your last question, an answer is very subjective. Some people cannot sleep at night without knowing exactly how many CPU cycles some function uses. Some people don't care and are fine with trusting the compiler to optimize things reasonably well. At least at the stages where you're more of a student of the language and less of an implementor, it is better to stick to the latter group.
If you want to help out the compiler, consider reducing top level definitions where possible.
If the compiler sees a function at top-level, it's very hard for it to guess how that function might be used or modified by the program.
If a function is defined within the scope of a function that uses it, the compiler's job becomes much simpler.
There is a section about this in the Chez Scheme manual:
http://www.scheme.com/csug7/use.html#./use:h4
Apparently Chez is one of the fastest Scheme implementations there is. If it needs this sort of "guidance" to make good optimizations, I suspect other implementations can't live without it either (or they just ignore it all together).