How far should I take referential transparency? - functional-programming

I am building a website using erlang, mnesia, and webmachine. Most of the documentation I have read praises the virtues of having referentially transparent functions.
The problem is, all database access is external state. This means that any method that hits the database is no longer referentially transparent.
Lets say I have a user object in a database and some functions that deal with authentication.
Referentially opaque functions might look like:
handle_web_request(http_info) ->
is_authorized_user(http_info.userid),
...
%referentially opaque
is_authorized_user(userid) ->
User = get_user_from_db(userid),
User.is_authorized.
%referentially opaque
lots_of_other_functions(that_are_similar) ->
db_access(),
foo.
Referentially transparency requires that I minimize the amount of referentially opaque code, so the caller must get the object from the database and pass that in as an argument to a function:
handle_web_request(http_info) ->
User = get_user(http_info.userid),
is_authorized_user(User),
...
%referentially opaque
get_user(userid) ->
get_user_from_db(userid).
%referentially transparent
is_authorized(userobj) ->
userobj.is_authorized.
%referentially transparent
lots_of_other_functions(that_are_similar) ->
foo.
The code above is obviously not production code - it is made up purely for illustrative purposes.
I don't want to get sucked into dogma. Do the benefits of referentially transparent code (like provable unit testing) justify the less friendly interface? Just how far should I go in the pursuit of referentially transparancy?

Why not take referential transparency all the way?
Consider the definition of get_user_from_db. How does it know how to talk to the database? Obviously it assumes some (global) database context. You could change this function so that it returns a function that takes the database context as its argument. What you have is...
get_user_from_db :: userid -> User
This is a lie. You can't go from a userid to a user. You need something else: a database.
get_user_from_db :: userid -> Database -> User
Now just curry that with the userid, and given a Database at some later time, the function will give you a User. Of course, in the real world, Database will be a handle or a database connection object or whatever. For testing, give it a mock database.

You already mentioned unit-testing, keep thinking in those terms. Everything you find value in testing should be refentially transparent so you can test it.
If you don't have any complex logic that could go wrong, and a single functional/integration test would see that it is correct, then why bother going the extra distance?
Think YAGNI. But where unit-testability is a real need.

Related

how to start one flux after completion of another?

how can i start Flux after completion of Flux? Flux doesn't depend on result of Flux.
I thought operation then() should do the work, but it just return mono. So what is the correct operator?
Flux.thenMany is the operator that you need
abstract Flux<Integer> flux1();
abstract Flux<String> flux2();
public Flux<String> flux2AfterFlux1(){
return flux1().thenMany(flux2());
}
My reply is late, so you have probably either figured this out, or moved on. I need to do the same thing, so I seem to have achieved this by using Flux.usingWhen(). I need to obtain IDs from my database that exhibit a certain property. After retrieving these IDs, then I need to get some information about these IDs. Normally, a join would work, but I am using MongoDB, and joins are horrible, so it is better to do separate, sequential queries. So let's say I have a Mongo reactive repository with a method called Flux<String> findInterestingIds(). I have another method in the same repository called Flux<String> findInterestingInformation(String[] interestingIds) that I need to feed the IDs found in the call to the previous method. So this is how I handle it:
Mono<List<String>> interestingIdsMono = myRepo.findInterestingIds()
.collectList();
Flux.usingWhen(interestingIdsMono,
interestingIds -> myRepo.findInterestingInformation(interestingIds),
interestingIds -> Flux.empty().doOnComplete(() -> log.debug("Complete"));
The cleanup function (third parameter) was something that I don't quite yet understand how to use in a good way, so I just logged completion and I do not need to emit anything extra when the flux completes.
ProjectReactor used to have a compose function that is now called transformDeferred, and I had some hopes for that, but I do not quite see how it would apply in this type of situation. At any rate, this is my attempt at it, so feel free to show me a better way!

Dealing with DB handles and initialization in functional programming

I have several functions that deal with database interactions. (like readModelById, updateModel, findModels, etc) that I try to use in a functional style.
In OOP, I'd create a class that takes DB-connection-parameters in the constructor, creates the database-connection and save the DB-handle in the instance. The functions then would just use the DB-handle from "this".
What's the best way in FP to deal with this? I don't want to hand around the DB handle throughout the entire application. I thought about partial application on the functions to "bake in" the handle, but that creates ugly boilerplate code, doing it one by one and handing it back.
What's the best practice/design pattern for things like this in FP?
There is a parallel to this in OOP that might suggest the right approach is to take the database resource as parameter. Consider the DB implementation in OOP using SOLID principles. Due to Interface Segregation Principle, you would end up with an interface per DB method and at least one implementation class per interface.
// C#
public interface IGetRegistrations
{
public Task<Registration[]> GetRegistrations(DateTime day);
}
public class GetRegistrationsImpl : IGetRegistrations
{
public Task<Registration[]> GetRegistrations(DateTime day)
{
...
}
private readonly DbResource _db;
public GetRegistrationsImpl(DbResource db)
{
_db = db;
}
}
Then to execute your use case, you pass in only the dependencies you need instead of the whole set of DB operations. (Assume that ISaveRegistration exists and is defined like above).
// C#
public async Task Register(
IGetRegistrations a,
ISaveRegistration b,
RegisterRequest requested
)
{
var registrations = await a.GetRegistrations(requested.Date);
// examine existing registrations and determine request is valid
// throw an exception if not?
...
return await b.SaveRegistration( ... );
}
Somewhere above where this code is called, you have to new up the implementations of these interfaces and provide them with DbResource.
var a = new GetRegistrationsImpl(db);
var b = new SaveRegistrationImpl(db);
...
return await Register(a, b, request);
Note: You could use a DI framework here to attempt to avoid some boilerplate. But I find it to be borrowing from Peter to pay Paul. You pay as much in having to learn a DI framework and how to make it behave as you do for wiring the dependencies yourself. And it is another tech new team members have to learn.
In FP, you can do the same thing by simply defining a function which takes the DB resource as a parameter. You can pass functions around directly instead of having to wrap them in classes implementing interfaces.
// F#
let getRegistrations (db: DbResource) (day: DateTime) =
...
let saveRegistration (db: DbResource) ... =
...
The use case function:
// F#
let register fGet fSave request =
async {
let! registrations = fGet request.Date
// call your business logic here
...
do! fSave ...
}
Then to call it you might do something like this:
register (getRegistrations db) (saveRegistration db) request
The partial application of db here is analogous to constructor injection. Your "losses" from passing it to multiple functions is minimal compared to the savings of not having to define interface + implementation for each DB operation.
Despite being in a functional-first language, the above is in principle the same as the OO/SOLID way... just less lines of code. To go a step further into the functional realm, you have to work on eliminating side effects in your business logic. Side effects can include: current time, random numbers, throwing exceptions, database operations, HTTP API calls, etc.
Since F# does not require you to declare side effects, I designate a border area of code where side effects should stop being used. For me, the use case level (register function above) is the last place for side effects. Any business logic lower than that, I work on pushing side effects up to the use case. It is a learning process to do that, so do not be discouraged if it seems impossible at first. Just do what you have to for now and learn as you go.
I have a post that attempts to set the right expectations on the benefits of FP and how to get them.
I'm going to add a second answer here taking an entirely different approach. I wrote about it here. This is the same approach used by MVU to isolate decisions from side effects, so it is applicable to UI (using Elmish) and backend.
This is worthwhile if you need to interleave important business logic with side effects. But not if you just need to execute a series of side effects. In that case just use a block of imperative statements, in a task (F# 6 or TaskBuilder) or async block if you need IO.
The pattern
Here are the basic parts.
Types
Model - The state of the workflow. Used to "remember" where we are in the workflow so it can be resumed after side effects.
Effect - Declarative representation of the side effects you want to perform and their required data.
Msg - Represents events that have happened. Primarily, they are the results of side effects. They will resume the workflow.
Functions
update - Makes all the decisions. It takes in its previous state (Model) and a Msg and returns an updated state and new Effects. This is a pure function which should have no side effects.
perform - Turns a declared Effect into a real side effect. For example, saving to a database. Returns a Msg with the result of the side effect.
init - Constructs an initial Model and starting Msg. Using this, a caller gets the data it needs to start the workflow without having to understand the internal details of update.
I jotted down an example for a rate-limited emailer. It includes the implementation I use on the backend to package and run this pattern, called Ump.
The logic can be tested without any instrumentation (no mocks/stubs/fakes/etc). Declare the side effects you expect, run the update function, then check that the output matches with simple equality. From the linked gist:
// example test
let expected = [SendEmail email1; ScheduleSend next]
let _, actual = Ump.test ump initArg [DueItems ([email1; email2], now)]
Assert.IsTrue(expected = actual)
The integrations can be tested by exercising perform.
This pattern takes some getting-used-to. It reminds me a bit of an Erlang actor or a state machine. But it is helpful when you really need your business logic to be correct and well-tested. It also happens to be a proper functional pattern.

How to avoid the use of mutable datastructure and use more functional approach?

I'm building in scheme a database using a wiredtiger key/value store.
To query a given table one needs to have a cursor over the table. The library recommends to re-use the cursor. The general behavior can be described by the following pseudo-code:
with db.cursor() as cursor:
cursor.get(key)
...
do_something(db)
...
During the extent of the with statment cursor can only be used in the current context. If do_something(db) needs a cursor, it must create/retrieve another cursor even if it's to query the same table. Otherwise the cursor loose its position and the continuation of do_something(db) doesn't expect.
You can work around it by always reseting the cursor, that's a waste. Instead it's preferable to keep a set of cursors ready to be used and when one can request via db.cursor() this will remove a cursor from the available cursors and return it. Once the "context" operation is finished, put it back.
The way I solve this in Python is by using a list. db.cursor() looks like:
def cursor(self):
cursor = self.cursors.pop()
yield cursor
self.cursors.append(cursor)
Which means, retrieve a cursor, send it to the current context, once the context is finished, put it back to the list of available cursors.
How can I avoid the mutation and use more functional approach?
Maybe you want parameters?
Lookup the exact construct used by your Scheme implementation.
Some implementations use:
http://srfi.schemers.org/srfi-39/srfi-39.html

Can Code be Protected From Rogue Callers In Ada?

I'm a fairly new Ada programmer. I have read the book by Barnes (twice I might add) and even managed to write a fair terminal program in Ada. My main language is C++ though.
I am currently wondering if there is a way to "protect" subroutine calls in Ada, perhaps in Ada 2012 (of which I know basically nothing). Let me explain what I mean (although in C++ terms).
Suppose you have a class Secret like this:
class Secret
{
private:
int secret_int;
public:
Set_Secret_Value( int i );
}
Now this is the usual stuff, dont expose secret_int, manipulate it only through access functions. However, the problem is that anybody with access to an object of type Secret can manipulate the value, whether that particular code section is supposed to do it or not. So the danger of rogue altering of secret_int has been reduced to anybody altering secret_int through the permitted functions, even if it happens in a code section that's not supposed to manipulate it.
To remedy that I came up with the following construct
class Secret
{
friend class Secret_Interface;
private:
int secret_int;
Set_Secret_Value( int i );
Super_Secret_Function();
};
class Secret_Interface
{
friend class Client;
private:
static Set_Secret_Value( Secret &rc_secret_object, int i )
{
rc_secret_object.Set_Secret( i );
}
};
class Client
{
Some_Function()
{
...
Secret_Interface::Set_Secret_Value( c_object, some-value );
...
}
}
Now the class Secret_Interface can determine which other classes can use it's private functions and by doing so, indirectly, the functions of class Secret that are exposed to Secret_Interface. This way class Secret still has private functions that can not be called by anybody outside the class, for instance function Super_Secret_Function().
Well I was wondering if anything of this sort is possible in Ada. Basically my desire is to be able to say:
Code A may only be executed by code B but not by anybody else
Thanks for any help.
Edit:
I add a diagram here with a program structure like I have in mind that shows that what I mean here is a transport of a data structure across a wide area of the software, definition, creation and use of a record should happen in code sections that are otherwise unrleated
I think the key is to realize that, unlike C++ and other languages, Ada's primary top-level unit is the package, and visibility control (i.e. public vs. private) is on a per-package basis, not a per-type (or per-class) basis. I'm not sure I'm saying that correctly, but hopefully things will be explained below.
One of the main purposes of friend in C++ is so that you can write two (or more) closely related classes that both take part in implementing one concept. In that case, it makes sense that the code in one class would be able to have more direct access to the code in another class, since they're working together. I assume that in your C++ example, Secret and Client have that kind of close relationship. If I understand C++ correctly, they do all have to be defined in the same source file; if you say friend class Client, then the Client class has to be defined somewhere later in the same source file (and it can't be defined earlier, because at that point the methods in Secret or Secret_Interface haven't yet been declared).
In Ada, you can simply define the types in the same package.
package P is
type Secret is tagged private;
type Client is tagged private;
-- define public operations for both types
private
type Secret is tagged record ... end record;
type Client is tagged record ... end record;
-- define private operations for either or both types
end P;
Now, the body of P will contain the actual code for the public and private operations of both types. All code in the package body of P has access to those things defined in P's private part, regardless of which type they operate on. And, in fact, all code has access to the full definitions of both types. This means that a procedure that operates on a Client can call a private operation that operates on a Secret, and in fact it can read and write a Secret's record components directly. (And vice versa.) This may seem bizarre to programmers used to the class paradigm used by most other OOP languages, but it works fine in Ada. (In fact, if you don't need Secret to be accessible to anything else besides the implementation of Client, the type and its operations can be defined in the private part of P, or the package body.) This arrangement doesn't violate the principles behind OOP (encapsulation, information hiding), as long as the two types are truly two pieces of the implementation of one coherent concept.
If that isn't what you want, i.e. if Secret and Client aren't that closely related, then I would need to see a larger example to find out just what kind of use case you're trying to implement.
MORE THOUGHTS: After looking over your diagram, I think that the way you're trying to solve the problem is inferior design--an anti-pattern, if you will. When you write a "module" (whatever that means--a class or package, or in some cases two or more closely related classes or packages cooperating with each other), the module defines how other modules may use it--what public operations it provides on its objects, and what those operations do.
But the module (let's call it M1) should work the same way, according to its contract, regardless of what other module calls it, and how. M1 will get a sequence of "messages" instructing it to perform certain tasks or return certain information; M1 should not care where those messages are coming from. In particular, M1 should not be making decisions about the structure of the clients that use it. By having M1 decree that "procedure XYZ can only be called from package ABC", M1 is imposing structural requirements on the clients that use it. This, I believe, causes M1 to be too tightly coupled to the rest of the program. It is not good design.
However, it may make sense for the module that uses M1 to exercise some sort of control like that, internally. Suppose we have a "module" M2 that actually uses a number of packages as part of its implementation. The "main" package in M2 (the one that clients of M2 use to get M2 to perform its task) uses M1 to create a new object, and then passes that object to several other packages that do the work. It seems like a reasonable design goal to find a way that M2 could pass that object to some packages or subprograms without giving them the ability to, say, update the object, but pass it to other packages or subprograms that would have that ability.
There are some solutions that would protect against most accidents. For example:
package M1 is
type Secret is tagged private;
procedure Harmless_Operation (X : in out Secret);
type Secret_With_Updater is new Secret with null record;
procedure Dangerous_Operation (X : in out Secret_With_Updater);
end M1;
Now, the packages that could take a "Secret" object but should not have the ability to update it would have procedures defined with Secret'Class parameters. M2 would create a Secret_With_Updater object; since this object type is in Secret'Class, it could be passed as a parameter to procedures with Secret'Class parameters. However, those procedures would not be able to call Dangerous_Operation on their parameters; that would not compile.
A package with a Secret'Class parameter could still call the dangerous operation with a type conversion:
procedure P (X : in out Secret'Class) is
begin
-- ...
M1.Secret_With_Updater(X).Dangerous_Operation;
-- ...
end P;
The language can't prevent this, because it can't make Secret_With_Updater visible to some packages but not others (without using a child package hierarchy). But it would be harder to do this accidentally. If you really wish to go further and prevent even this (if you think there will be a programmer whose understanding of good design principles is so poor that they'd be willing to write code like this), then you could go a little further:
package M1 is
type Secret is tagged private;
procedure Harmless_Operation (X : in out Secret);
type Secret_Acc is access all Secret;
type Secret_With_Updater is tagged private;
function Get_Secret (X : Secret_With_Updater) return Secret_Acc;
-- this will be "return X.S"
procedure Dangerous_Operation (X : in out Secret_With_Updater);
private
-- ...
type Secret_With_Updater is tagged record
S : Secret_Acc;
end record;
-- ...
end M1;
Then, to create a Secret, M2 would call something that creates a Secret_With_Updater that returns a record with an access to a Secret. It would then pass X.Get_Secret to those procedures which would not be allowed to call Dangerous_Operation, but X itself to those that would be allowed. (You might also be able to declare S : aliased Secret, declare Get_Secret to return access Secret, and implement it with return X.S'access. This may avoid a potential memory leak, but it may also run into accessibility-check issues. I haven't tried this.)
Anyway, perhaps some of these ideas could help accomplish what you want to accomplish without introducing unnecessary coupling by forcing M1 to know about the structure of the application that uses it. It's hard to tell because your description of the problem, even with the diagram, is still at too abstract a level for me to see what you really want to do.
You could do this by using child packages:
package Hidden is
private
A : Integer;
B : Integer;
end Hidden;
and then
package Hidden.Client_A_View is
function Get_A return Integer;
procedure Set_A (To : Integer);
end Hidden.Client_A_View;
Then, Client_A can write
with Hidden.Client_A_View;
procedure Client_A is
Tmp : Integer;
begin
Tmp := Hidden.Client_A_View.Get_A;
Hidden.Client_A_View.Set_A (Tmp + 1);
end Client_A;
Your question is extremely unclear (and all the C++ code doesn't help explaining what you need), but if your point is that you want a type to have some publicly accessible operations, and some private operations, then it is easily done:
package Example is
type Instance is private;
procedure Public_Operation (Item : in out Instance);
private
procedure Private_Operation (Item : in out Instance);
type Instance is ... -- whatever you need it to be
end Example;
The procedure Example.Private_Operation is accessible to children of Example. If you want an operation to be purely internal, you declare it only in the package body:
package body Example is
procedure Internal_Operation (Item : in out Instance);
...
end Example;
Well I was wondering if anything of this sort is possible in Ada. Basically my desire is to be able to say:
Code A may only be executed by code B but not by anybody else
If limited to language features, no.
Programmatically, code execution can be protected if the provider must be provided an approved "key" to allow execution of its services, and only authorized clients are supplied with such keys.
Devising the nature, generation, and security of such keys is left as an exercise for the reader.

First-Class Citizen

The definition of first-class citizen found in the wiki article says:
An object is first-class when it
can be stored in variables and data structures
can be passed as a parameter to a subroutine
can be returned as the result of a subroutine
can be constructed at run-time
has intrinsic identity (independent of any given name)
Can someone please explain/elaborate on the 5th requirement (in bold)? I feel that the article should have provided more details as in what sense "intrinsic identity" is capturing.
Perhaps we could use functions in Javascript and functions in C in our discussion to illustrate the 5th bullet.
I believe functions in C are second-class, whereas functions are first-class in Javascript because we can do something like the following in Javascript:
var foo = function () { console.log("Hello world"); };
, which is not permitted in C.
Again, my question is really on the 5th bullet (requirement).
Intrinsic identity is pretty simple, conceptually. If a thing has it, its identity does not depend on something external to that thing. It can be aliased, referenced, renamed, what-have-you, but it still maintains whatever that "identity" is. People (most of them, anyway) have intrinsic identity. You are you, no matter what your name is, or where you live, or what physical transformations you may have suffered in life.
An electron, on the other hand, has no intrinsic identity. Perhaps introducing quantum mechanics here just confuses the issue, but I think it's a really fantastic example. There's no way to "tag" or "label" an electron such that we could tell the difference between it and a neighbor. If you replace one electron with another, there is absolutely no way to distinguish the old one from the new one.
Back to computers: an example of "intrinsic identity" might be the value returned by Object#hashCode() in Java, or whatever mechanism a JavaScript engine uses that permits this statement to be false:
{} === {} // false
but this to be true:
function foo () {}
var bar = foo;
var baz = bar;
baz === foo; // true

Resources