Ada: how to send and receive objects between executables on separate machines

I need to send some fairly large data structures between instances of my running Ada program. Obviously JSON over HTTPS is an option; not one I want to use, as it's bigger than I'd like in terms of data overhead, but it will work for now.
Ideally I'd want to mash it into a binary blob and be sent with a hash to confirm the message. Is there a decent way to do this in Ada?

I would look for a solution based on Streams, sent over TCP.
If you want to implement your own blocking and hashing, you’ll probably need to write the raw stream to memory first so that you can tell how big the blob is and work out the checksum. A fairly straightforward approach to this would be here, spec and body.
For a solution that’s had a lot more work put into it, look at Dmitry Kazakov’s Simple Components’ Block Streams.
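As a rough sketch of that write-to-memory-first idea (this is not the spec/body referred to above; all names are invented, bounds checks are omitted, and the checksum/socket layer is only hinted at in comments):

   with Ada.Streams; use Ada.Streams;
   with Ada.Text_IO;

   procedure Demo is

      package Memory_Streams is
         type Memory_Stream (Capacity : Stream_Element_Count) is
           new Root_Stream_Type with record
            Data : Stream_Element_Array (1 .. Capacity);
            Last : Stream_Element_Offset := 0;  --  last element written
            Next : Stream_Element_Offset := 1;  --  next element to read
         end record;

         overriding procedure Read
           (Stream : in out Memory_Stream;
            Item   : out Stream_Element_Array;
            Last   : out Stream_Element_Offset);

         overriding procedure Write
           (Stream : in out Memory_Stream;
            Item   : Stream_Element_Array);
      end Memory_Streams;

      package body Memory_Streams is
         overriding procedure Read
           (Stream : in out Memory_Stream;
            Item   : out Stream_Element_Array;
            Last   : out Stream_Element_Offset)
         is
            Avail : constant Stream_Element_Offset :=
              Stream.Last - Stream.Next + 1;
            Count : constant Stream_Element_Offset :=
              Stream_Element_Offset'Min (Item'Length, Avail);
         begin
            Item (Item'First .. Item'First + Count - 1) :=
              Stream.Data (Stream.Next .. Stream.Next + Count - 1);
            Stream.Next := Stream.Next + Count;
            Last := Item'First + Count - 1;
         end Read;

         overriding procedure Write
           (Stream : in out Memory_Stream;
            Item   : Stream_Element_Array) is
         begin
            Stream.Data (Stream.Last + 1 .. Stream.Last + Item'Length) := Item;
            Stream.Last := Stream.Last + Item'Length;
         end Write;
      end Memory_Streams;

      use Memory_Streams;

      type Message is record
         Id    : Integer;
         Value : Float;
      end record;

      Buffer : aliased Memory_Stream (Capacity => 1024);
   begin
      --  Serialize into memory; Buffer.Last is then the blob's length.
      Message'Write (Buffer'Access, (Id => 42, Value => 3.14));
      Ada.Text_IO.Put_Line
        ("Blob is" & Stream_Element_Offset'Image (Buffer.Last) & " bytes");
      --  Send the length and a checksum over Buffer.Data (1 .. Buffer.Last)
      --  first, then the blob itself, e.g. over a GNAT.Sockets stream.
   end Demo;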

Ideally I'd want to mash it into a binary blob and be sent with a hash
to confirm the message. Is there a decent way to do this in Ada?
As mentioned, the DSA [Annex E] is an excellent way to handle this, though with some caveats due to the implementations (rather than language) — the definitions of/for the DSA are broad enough that the transport could be mostly anything, so long as the interface (RPC- and Stream-based) is respected.
Things will be simple[r] if you structure your program with proper categorizations from the outset (see Pure, Shared_Passive, Remote_Types, and Remote_Call_Interface in the ARM, and the Ada Rationale) rather than trying to shoehorn something extant into the DSA's required structuring. (That said, there are some cases where modifying an extant program to be DSA-capable is simply a matter of adding the categorization pragmas/aspects and configuring + compiling.)
Note that Ada's containers are designed so that they can be used in DSA programs, and are all [IIRC] Remote_Types categorized.
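As a hedged sketch of that structuring (the package and subprogram names here are invented, and actually building the partitions requires a DSA implementation such as PolyORB, per the caveat above):

   package Shapes is
      pragma Pure;  --  no state; freely usable from any categorized unit
      type Point is record
         X, Y : Float;
      end record;
   end Shapes;

   with Shapes;
   package Server_Interface is
      pragma Remote_Call_Interface;  --  calls to these become remote calls
      procedure Plot (P : Shapes.Point);
   end Server_Interface;

Point contains no access parts, so it supports external streaming and can legally cross partitions.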
I need to send some fairly large data-structures between instances of
my running Ada program.
Also an option is ASN.1, which allows you to make a type-definition for some data in a language- and machine-independent manner. There are several ASN.1 compilers and a good chunk can generate Ada; here's one (written in F*, IIRC) used by the ESA and freely available open-source.
ASN.1 has an encoding scheme (the Packed Encoding Rules, PER) which is optimized for space, and so will give you the most compact on-the-wire representation.
Obviously json over https is an option. Not
one I want to use as it's bigger than I'd like in terms of data
overhead, but it will work for now.
Using HTTP and JSON directly is attractive to many people because it's "easy", though this ease is typically misleading: all the things that they don't do for you, such as range-checking values or validating the structure, are offloaded onto the programmer. That said, you can make things modular and use generics to let you "swap out" transport methods:
Generic
   Type Data (<>) is private;
   Type Transport_Type (<>) is private;
   Target : However_You_Address_The_Target;  -- stand-in for your addressing type
   with Function Encode (Input : Data) return Transport_Type;
Procedure Send (Value : Data);

and

Generic
   Type Data (<>) is private;
   Type Transport_Type (<>) is private;
   with Function Decode (Input : Transport_Type) return Data;
Function Receive (Value : Transport_Type) return Data;
Or something similar to this. I would rate this as less convenient than using the DSA, but also possibly a bit simpler, considering you [mostly] don't have to worry about categorization with this method.
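A hypothetical instantiation of the Send generic above (Point, Server_Address, and Point_To_JSON are invented names, assuming some JSON encoder exists):

   Procedure Send_Point is new Send
     (Data           => Point,
      Transport_Type => String,
      Target         => Server_Address,
      Encode         => Point_To_JSON);

Swapping transports is then a matter of instantiating with a different Encode (and Transport_Type) without touching the call sites.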

Related

What is the need for immutable/persistent data structures in erlang

Each Erlang process maintains its own private address space. All communication happens via copying without sharing (except big binaries). If each process is processing one message at a time with no concurrent access to its objects, I don't see why we need immutable/persistent data structures.
Erlang was initially implemented in Prolog, which doesn't really use mutable data structures either (though some dialects do). So it started off without them. This makes runtime implementation simpler and faster (garbage collection in particular).
So adding mutable data structures would require a lot of effort, could introduce bugs, and Erlang programmers are nearly by definition at least willing to live without them.
Many actually consider their absence to be a positive good: less concern about object identity, no need for defensive copying because you don't know whether some other piece of code is going to modify the data you passed (or might be changed later to modify it), etc.
This absence does mean that Erlang is pretty unusable in some domains (e.g. high performance scientific computing), at least as the main language. But again, this means that nobody in these domains is going to use Erlang in the first place and so there's no particular incentive to make it usable at the cost of making existing users unhappy.
I remember seeing a mailing list post by Joe Armstrong quite a long time ago (which I couldn't find with a quick search now) saying that he initially planned to add mutable variables when he'd need them... except he never quite did, and performance was good enough for everything he was using Erlang for.
It is indeed the case that in Erlang immutability does not solve any "shared state" problems, as immutable data are "process local".
From the functional programming language perspective, however, immutability offers a number of benefits, summarized adequately in this Quora answer:
The simplest definition of functional programming is that it’s a programming
paradigm where you are transforming immutable data with functions.
The definition uses functions in the mathematical sense, where it’s
something that takes an input, and produces an output.
OO + mutability tends to violate that definition because when you want
to change a piece of data it generally will not return the output; it
will likely return void or unit, and when you call a method on the
object, the object itself isn't input for the function.
As far as what advantages the paradigm has, composability, thread
safety, being able to track what went wrong where better, the ability
to sort of separate the data from the actual computation on it being
done, etc.
how would this work?
factorial(1) -> 1;
factorial(X) ->
    X * factorial(X - 1).
If you run factorial(4), a single process will be running the same function. Each time, the function will have its own value of X; if the value of X were in the scope of the process and not the function, recursive functions wouldn't work. So first we need to understand scope. If you want to say that you don't see why data needs to be immutable within the scope of a single function/block, you would have a point, but it would be a headache to think about where data is immutable and where it isn't.

Ada: pragma Pure / Remote_Types and system types

I'm writing an Ada application that needs to be distributed, and I'm trying to use the DSA to do it, but I'm finding big limitations in what is "allowed" to be "withed" and what isn't.
I won't post source code, since it's quite complex and this is a generic question anyway; I just wanted some pointers on what I'm not understanding correctly, so please bear with me and correct me where I'm wrong.
So my problem is this: I want to mark a procedure with the pragma Remote_Call_Interface so it can be called remotely. However, as soon as I add the pragma, compilation breaks because the procedure withs other packages in my project that are not categorized as either Pure or Remote_Types.
So I try to mark the packages I need as either Pure or Remote_Types (depending on whether they have state or not), but this in turn breaks compilation even further, since it turns out that you can't use even basic library types in a Pure/Remote_Types package. For example, you can't use Vectors, you can't use Unbounded_Strings, you can't use Maps, etc. The whole program falls to pieces, since I can't use the data structures I used to build it anymore!
Is there a way around this? Or, if I want to distribute my application, must I strictly limit myself to the most basic types like Integers and Booleans and little else? I don't understand whether I'm hitting a limitation of the language or just doing it incorrectly. (Unfortunately, the tutorials I found on the DSA are all very vague; incidentally, if anyone has good ones, feel free to link them!)
EDIT: After ajb's answer, let me specify what is bothering me in particular: the package I want to mark with pragma Remote_Call_Interface withs some packages that are not Pure/Remote_Types, but it only uses the types in those packages locally; it does not contain any procedures that accept such types as parameters, nor functions that return such types. This is what bothers me: since those types never have to travel over the network, why can't I with them? I'm only using them locally. That is why I was trying to make those types Pure/Remote_Types, but now that I've read ajb's explanation (i.e. that Remote_Types exists so that objects of those types can travel over the network), I'm even more confused about why I can't use them when I only use them locally.
I'm not an expert on Ada distributed programming, but here's what I do know (or think I know):
The Annotated Ada Reference Manual, Section E.2.3 says, "The restrictions governing a remote call interface library unit are intended to ensure that the values of the actual parameters in a remote call can be meaningfully sent between two active partitions." For example, if a record type has a field that's an access type, you can't send it from one partition to another blindly, because the called partition won't be able to access the memory that the pointer points to. (Unbounded_String, Map, and Vector are implemented using access types as part of the internals.)

All types used as parameters or return types must support "external streaming", meaning there has to be a way for the type to be converted to and from a stream of bytes so that the parameter value can be transmitted over a socket. If you have a record with an access type, but you provide 'Read and 'Write attributes so that the type can be written to and read from a byte stream without any actual pointers being transmitted, then you can put your record type in a Remote_Types package.
I'm not sure exactly what your problem is: are there certain types you want to pass as a parameter to a remote call but can't; or are there types that you want to use only in the rest of your application, but are getting in the way?
If it's the second one, then I think the solution is to restructure your packages so that all the "remote types" are separate from the non-remote types.
However, if you're really looking to pass an Unbounded_String, Map, or Vector from one partition to another in a remote call, it's trickier. Unbounded_String really should support external streaming, and there was a proposal to make Unbounded_String a Remote_Types package (see AI05-0204), but it wasn't acted on--I don't know why. Map and Vector would be bigger problems, though, since they are generic packages that have to work on any type, including those that don't support external streaming. In any case, those types aren't set up to be automatically converted to or from bytes to be passed over a socket.
But I think you could make it work like this:
private with Ada.Strings.Unbounded;
package Remote_Types_Package is
   pragma Remote_Types;
   type My_Unbounded_String is private;
private
   type My_Unbounded_String is record
      S : Ada.Strings.Unbounded.Unbounded_String;
   end record;
end Remote_Types_Package;
The Unbounded_String package must be withed with private with; see E.2.2(6). You'll need to provide a function to create a My_Unbounded_String, and you'll need to provide stream read and write routines for My_Unbounded_String and define 'Read and 'Write for the type. You should be able to write the Read and Write attributes by using the Read and Write attributes of the Unbounded_String. Something similar should be doable if you want to use a Vector as a remote call parameter, although you may have to do more work to marshal/unmarshal the type yourself.
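A sketch of what that could look like, assuming (as is the case with GNAT) that Unbounded_String's own stream attributes transmit the contents rather than any internal pointers; To_My_Unbounded_String is an illustrative name:

   with Ada.Streams;
   private with Ada.Strings.Unbounded;
   package Remote_Types_Package is
      pragma Remote_Types;

      type My_Unbounded_String is private;

      function To_My_Unbounded_String (S : String) return My_Unbounded_String;

   private

      type My_Unbounded_String is record
         S : Ada.Strings.Unbounded.Unbounded_String;
      end record;

      procedure Read
        (Stream : not null access Ada.Streams.Root_Stream_Type'Class;
         Item   : out My_Unbounded_String);
      procedure Write
        (Stream : not null access Ada.Streams.Root_Stream_Type'Class;
         Item   : My_Unbounded_String);

      for My_Unbounded_String'Read use Read;
      for My_Unbounded_String'Write use Write;
   end Remote_Types_Package;

   package body Remote_Types_Package is

      function To_My_Unbounded_String (S : String) return My_Unbounded_String is
      begin
         return (S => Ada.Strings.Unbounded.To_Unbounded_String (S));
      end To_My_Unbounded_String;

      --  Delegate to Unbounded_String's own stream attributes, which
      --  stream the length and characters, not the internal pointer.
      procedure Read
        (Stream : not null access Ada.Streams.Root_Stream_Type'Class;
         Item   : out My_Unbounded_String) is
      begin
         Ada.Strings.Unbounded.Unbounded_String'Read (Stream, Item.S);
      end Read;

      procedure Write
        (Stream : not null access Ada.Streams.Root_Stream_Type'Class;
         Item   : My_Unbounded_String) is
      begin
         Ada.Strings.Unbounded.Unbounded_String'Write (Stream, Item.S);
      end Write;

   end Remote_Types_Package;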
Once again, I have not tried this, and it's possible there are some hitches in this solution.
EDIT: Since it now looks like the question is the simpler one--i.e. you have some types that are not going to be passed between partitions getting in the way--the solution should be simpler. Any types that you define that are going to be communicated between partitions need to be in a Remote_Types package, say P1. Other types should be in a different package, say P2 (or multiple packages). If types in P1 depend on types in P2, you can still get this to work by having P1 say private with P2;, and making sure you have the marshalling and unmarshalling procedures you need. If you run into difficulties, I'd encourage you to ask a new question here.
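In skeleton form (Wire_Record and Local_Type are invented names, not from the question):

   private with P2;  --  P2 holds the non-remote types; only the private part sees it
   package P1 is
      pragma Remote_Types;
      type Wire_Record is private;  --  the type partitions actually exchange
   private
      type Wire_Record is record
         Payload : P2.Local_Type;
      end record;
      --  If Local_Type doesn't support external streaming on its own,
      --  specify 'Read/'Write for Wire_Record here, as sketched above.
   end P1;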
I don't know why the language required all such types to be quarantined in a Remote_Types package, instead of just saying that any type used in a Remote_Call_Interface package has to have only parts that can be streamed. There may have been some implementation issues. Any code that exists for a Remote_Types package has to be in programs in both partitions, perhaps, and this may have been an attempt to limit the type of code that would have to be linked into multiple partitions. But I'm just guessing.

Program extraction using native integers/words (not bignums) from Isabelle theory

This question comes in a context where Isabelle is used with formal software development in mind more than with pure maths theorization in mind (and from a standalone developer's context).
It seems that, at best, SML programs generated from an Isabelle theory use SML's IntInf.int, not the native integer type, which is Int.int, even if Code_Target_Int, Code_Binary_Nat or Code_Target_Nat is used. Investigation of these theories' sources seems to confirm that this is all they can do. Native platform integers may be required for multiple reasons, including efficiency, and the case where the SML imperative program is to be optionally translated into an imperative language subset (e.g. C or Ada), which is relevant when the theory relies on the Imperative_HOL theory. The codegen.pdf document which comes with the Isabelle distribution did not help with this, except in suggesting the first of the options below.
Options may be:
Not using Isabelle's int and nat, and re-creating a new numeric type from scratch, then using the code_printing commands (with type_constructor and constant) to give it the native platform representation and operations (which implies including the range limitations in the theory in some way): this must be tedious, although hopefully not error-prone, thanks to the formal environment. Note this does not seem feasible with Isabelle's own int and nat: it makes code generation fail, and nothing tells you which constants are missing in the code_printing command.
If the SML program is to be compiled directly (e.g. with MLton), tweak the SML environment with a replacement IntInf structure: may be unsafe or not feasible, and still requires embedding the range limitations in the theory, so the previous option may be better than this one.
Edit the generated program to change IntInf into Int: easy, but is it safe? (At least IntInf implements the same signature as Int, so maybe it is.) As above, this requires specifying bounds in the theory in some way, which is fine.
Dive into Isabelle internals: surely unreasonable, even worse than the second option.
There exists a Word theory, but according to some readings, it seems not suited for this purpose.
Are there other known options not listed here? Any comments on the listed options?
If there is no ready-to-use solution (I feel there is none at the time of writing), what hints or leads would be best to follow (e.g. links to documents, mentions of concepts)?
Update
Points #2 and #3 of the list may be OK (if they are OK at all) only if there is a single integer type. If the program uses more than one, they are not applicable.
Directly generating native words from Isabelle int would be unsound, because your formalisation would not take overflow into account where it exists in reality.
It looks like the AFP entry Native_Word does what you want, though:
http://afp.sourceforge.net/entries/Native_Word.shtml

How does functional programming avoid state when it seems unavoidable?

Let's say we define a function sum(a, b), functional-programming style, that returns the sum of its arguments. So far so good; all the nice things of FP without any problems.
Now let's say we run this in an environment with dynamic typing and a singleton, stateful error stream. Then let's say we pass a value of a and/or b that sum isn't designed to handle (i.e. not numbers), and it needs to indicate an error somehow.
But how? This function is supposed to be pure and side-effect-less. How does it insert an error into the global error stream without violating that?
No programming language that I know of has anything like a "singleton stateful error stream" built in, so you'd have to make one. And you simply wouldn't make such a thing if you were trying to write your program in a pure functional style.
You could, however, have a sum function that returns either the sum or an indication of an error. The type used to do this is in fact often known by the name Either. Then you could easily make a function that invokes a whole bunch of computations that could possibly return an error, and returns a list of all the errors that were encountered in the other computations. That's pretty close to what you were talking about; it's just explicitly returned rather than being global.
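To make that concrete in the Ada used elsewhere on this page (the real Either in languages like Haskell is a generic sum type; Sum_Result and Checked_Sum here are invented for illustration):

   with Ada.Text_IO;

   procedure Either_Demo is

      --  A discriminated record playing the role of Either: the
      --  discriminant says which alternative the value holds.
      type Sum_Result (Ok : Boolean := False) is record
         case Ok is
            when True  => Value : Integer;
            when False => null;  --  a real Either would carry error details
         end case;
      end record;

      function Checked_Sum (A, B : Integer) return Sum_Result is
      begin
         return (Ok => True, Value => A + B);
      exception
         when Constraint_Error =>     --  e.g. overflow
            return (Ok => False);     --  failure is part of the result
      end Checked_Sum;

      R : constant Sum_Result := Checked_Sum (2, 3);
   begin
      if R.Ok then
         Ada.Text_IO.Put_Line (Integer'Image (R.Value));
      else
         Ada.Text_IO.Put_Line ("error");
      end if;
   end Either_Demo;

The caller has to inspect the result before using it, which is exactly the discipline the answer describes: errors are returned, not written to shared state.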
Remember, the question when you're writing a functional program is "how do I make a program that has the behavior I want?" not, "how would I duplicate one particular approach taken in another programming style?". A "global stateful error stream" is a means not an end. You can't have a global stateful error stream in pure function style, no. But ask yourself what you're using the global stateful error stream to achieve; whatever it is, you can achieve that in functional programming, just not with the same mechanism.
Asking whether pure functional programming can implement a particular technique that depends on side effects is like asking how you use techniques from assembly in object-oriented programming. OO provides different tools for you to use to solve problems; limiting yourself to using those tools to emulate a different toolset is not going to be an effective way to work with them.
In response to comments: If what you want to achieve with your error stream is logging error messages to a terminal, then yes, at some level the code is going to have to do IO to do that. [1]
Printing to a terminal is just like any other IO; there's nothing particularly special about it that makes it worthy of singling out as a case where state seems especially unavoidable. So if this turns your question into "How do pure functional programs handle IO?", then there are no doubt many duplicate questions on SO, not to mention many, many blog posts and tutorials speaking precisely to that issue. It's not as if it's a sudden surprise to implementors and users of pure programming languages; the question has been around for decades, and quite sophisticated thought has been put into the answers.
There are different approaches taken in different languages (IO monad in Haskell, unique modes in Mercury, lazy streams of requests and responses in historical versions of Haskell, and more). The basic idea is to come up with a model which can be manipulated by pure code, and hook up manipulations of the model to actual impure operations within the language implementation. This allows you to keep the benefits of purity (the proofs that apply to pure code but not to general impure code will still apply to code using the pure IO model).
The pure model has to be carefully designed so that you can't actually do anything with it that doesn't make sense in terms of actual IO. For example, Mercury does IO by having you write programs as if you're passing around the current state of the universe as an extra parameter. This pure model accurately represents the behaviour of operations that depend on and affect the universe outside the program, but only when there is exactly one state of the universe in the system at any one time, which is threaded through the entire program from start to finish. So some restrictions are put in place:
The type io is made abstract so that there's no way to construct a value of that type; the only way you can get one is to be passed one from your caller. An io value is passed into the main predicate by the language implementation to kick the whole thing off.
The mode of the io value passed in to main is declared to be unique. This means you can't do things that might cause it to be duplicated, such as putting it in a container or passing the same io value to multiple different invocations. The unique mode ensures that you can only pass the io value to a predicate that also uses the unique mode, and as soon as you pass it once, the value is "dead" and can't be passed anywhere else.
[1] Note that even in imperative programs, you gain a lot of flexibility if you have your error logging system return a stream of error messages and only actually make the decision to print them close to the outermost layer of the program. If your log calls write their output immediately, here are just a few things, off the top of my head, that become much harder to do:
Speculatively execute a computation and see whether it failed by checking whether it emitted any errors
Combine multiple high level systems into a single system, adding tags to the logs to distinguish each system
Emit debug and info log messages only if there is also an error message (so the output is clean when there are no errors to debug, and rich in detail when there are)

How to use non-blocking or asynchronous IO with Boost Spirit?

Does Spirit provide any capabilities for working with non-blocking IO?
To provide a more concrete example: I'd like to use Boost's Spirit parsing framework to parse data coming in from a network socket that's been placed in non-blocking mode. If the data is not completely available, I'd like to be able to use that thread to perform other work instead of blocking.
The trivial answer is to simply read all the data before invoking Spirit, but potentially gigabytes of data would need to be received and parsed from the socket.
It seems that, in order to support non-blocking I/O while parsing, Spirit would need some ability to partially parse the data, and to pause and save its parse state when no more data is available. Additionally, it would need to be able to resume parsing from the saved parse state when data does become available. Or maybe I'm making this too complicated?
TODO Will post an example for a simple single-threaded 'event-based' parsing model. This is largely trivial but might just be what you need.
For anything less trivial, please heed the following considerations/hints/tips:
How would you be consuming the result? You wouldn't have the synthesized attributes any earlier anyway, or are you intending to use semantic actions on the fly?
That doesn't usually work well due to backtracking. The caveats could be worked around by careful and judicious use of qi::hold, qi::locals and putting semantic actions with side-effects only at stations that will never be backtracked. In other words:
this is bound to be very error-prone
this naturally applies to a limited set of grammars only (grammars with rich contextual information will not lend themselves well to this treatment).
Now, everything can be forced, of course, but in general, experienced programmers should have learned to avoid swimming upstream.
Now, if you still want to do this:
You should be able to make the Spirit library thread-safe/reentrant by defining BOOST_SPIRIT_THREADSAFE and linking to libboost_thread. Note this makes the globals used by Spirit thread-safe (at the cost of fine-grained locking) but not your parsers: you can't share your own parsers/rules/sub-grammars/expressions across threads. In fact, you can only share your own (Phoenix/Fusion) functors iff they are thread-safe, and any other extensions defined outside the core Spirit library should be audited for thread-safety.
If you manage the above, I think by far the best approach would be to:
use boost::spirit::istream_iterator (or, for binary/raw character streams I'd prefer to define a similar boost::spirit::istreambuf_iterator using the boost::spirit::multi_pass<> template class) to consume the input. Note that depending on your grammar, quite a bit of memory could be used for buffering and the performance is suboptimal
run the parser on its own thread (or logical thread, e.g. Boost Asio 'strands' or its famous 'stackless coroutines')
use coarse-grained semantic actions as described above to pass messages to another logical thread that does the actual processing.
Some more loose pointers:
you can easily 'fuse' some functions to handle lazy evaluation of your semantic action handlers using BOOST_FUSION_ADAPT_FUNCTION and friends; this reduces the amount of cruft you have to write to get simple things working, like normal C++ overload resolution in semantic actions, especially when you're not using C++0x and BOOST_RESULT_OF_USE_DECLTYPE
Because you will want to avoid semantic actions with side-effects, you should probably look at Inherited Attributes and qi::locals<> to coordinate state across rules in 'pure functional fashion'.