Functions with no side effect but with read-only access to state

Considering the common definition of pure functions:
1. the function return values are identical for identical arguments (no variation with local static variables, non-local variables, mutable reference arguments or input streams)
2. the function application has no side effects (no mutation of local static variables, non-local variables, mutable reference arguments or input/output streams).
What are the implications of using (only) impure functions that still satisfy 2 but not 1, in the sense that they can read the current value of (some) immutable state but cannot modify (any) state? Does such a pattern have a name, and is it useful or is it an anti-pattern?
An example would be using a global way to get a read-only immutable version of a state, or passing as an argument a function that returns the current immutable value of a state.
(Rationale - I have been trying to structure my (C#) code in a more functional way, using pure functions where possible (as static members of static classes).
It quickly became obvious how complex and tedious it is to pass state values to these pure functions even when they only need to read the value. I need to know the relevant state value at the point of calling and that often means passing it around through parts of the code which have no need to know it otherwise.
However, if for example I initialize such static classes with an internal member function that can return the current immutable value of the state, other members can use it instead of having that value passed to them. This pattern has greatly simplified my code where I used it, and it feels like I still get most of the benefits of isolating state changes, etc.)

A potentially impure operation without side effects fits the description of a Query in Command-Query Separation (CQS), a decades-old object-oriented design principle.
CQS explicitly distinguishes between operations with side effects (Commands) and operations that return data (Queries). According to that principle, a Query must not have a side effect.
On the other hand, CQS says nothing about determinism, so a Query is allowed to be non-deterministic.
In C#, a fundamental example of a Query is DateTime.Now. This is essentially a method that takes no arguments, which is equivalent to taking unit as an input argument. Thus, we can think of DateTime.Now as a Query from unit to DateTime: () -> DateTime.
DateTime.Now is (in)famously non-deterministic, so it's clearly not a pure function. It is, however, a Query.
All pure functions are Queries, but not all Queries are pure functions.
CQS is a nice design principle, but it's not Functional Programming (FP). It's a move in the right direction, but you should attempt to have as few non-deterministic Queries as possible.
People often tend to focus on avoiding side-effects when learning FP, but it's just as important to avoid non-determinism.
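As a minimal sketch of the distinction (written in Python rather than C#; the names `Settings`, `price_with_tax` and `make_price_query` are hypothetical), the first function below is pure, while the second builds a Query in the CQS sense: it mutates nothing, but its result depends on whatever the injected getter returns at call time.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    """Immutable snapshot of some application state (hypothetical)."""
    tax_rate: float


def price_with_tax(net: float, settings: Settings) -> float:
    """Pure: the result depends only on the arguments."""
    return net * (1 + settings.tax_rate)


def make_price_query(current_settings):
    """Build a Query: side-effect free, but it reads state through a getter."""
    def query(net: float) -> float:
        return price_with_tax(net, current_settings())
    return query


# The state lives in one place; only this code replaces the snapshot.
state = {"settings": Settings(tax_rate=0.25)}
price = make_price_query(lambda: state["settings"])

before = price(100.0)
state["settings"] = Settings(tax_rate=0.5)  # state is replaced elsewhere
after = price(100.0)  # same argument, different result: not deterministic
```

Keeping the pure core (`price_with_tax`) separate from the lookup lets the non-deterministic part stay small, which is one way to have as few non-deterministic Queries as possible.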


Difference between the internal procedures and functions in Progress4gl?

Both internal procedures and functions accept parameters and produce output. So what is the use of internal procedures instead of functions?
A user-defined function is used when you want to perform some calculation and return a single value. In this respect it is the same as a built-in ABL function, like the SUBSTRING or EXP functions. Putting this calculation code in a FUNCTION block instead of inline in your code allows you to put it in one place and reference it multiple times without code duplication.
An internal procedure is also an encapsulated piece of code that does some work, but it is more general-purpose. While a function must return a single value, an internal procedure may or may not have input parameters or output parameters.
https://docs.progress.com/category/openedge-archives
Also, the parameters and return type of functions (like those of methods) are checked at compile time, which removes some potential problems at run time later.
The question acknowledges that both functions and internal procedures allow OUTPUT parameters and asks "what is the use" of internal procedures instead of functions.
To me, this implies that the poster is contemplating always using functions and deprecating internal procedures and is asking: "what would I lose if I do that?"
Two things spring to mind:
Sort of the opposite of Jean-Christophe Cardot's point: you would lose some automatic type conversions and syntactic flexibility about the parameter lists. Some people see that flexibility in a negative light. Others see it as a positive.
You need to "forward declare" your functions or use dynamic invocations. With an internal procedure you can RUN it without providing a declaration earlier in the code.
If you tend to think that strict type checking is useful then these are probably not benefits that you think of as being lost. If you prefer more flexible behaviors, then you may regret choosing functions rather than internal procedures.

Julia functions: making mutable types immutable

Coming from Wolfram Mathematica, I like the idea that whenever I pass a variable to a function I am effectively creating a copy of that variable. On the other hand, I am learning that in Julia there are the notions of mutable and immutable types, with the former passed by reference and the latter passed by value. Can somebody explain to me the advantage of such a distinction? Why are arrays passed by reference? Naively, I see this as a bad aspect, since it creates side effects and ruins the possibility of writing purely functional code. Where am I wrong in my reasoning? Is there a way to make an array immutable, such that when it is passed to a function it is effectively passed by value?
Here is an example of code:
# x is an Int and so is immutable: it is passed by value
x = 10
function change_value(x)
    x = 17
end
change_value(x)
println(x)
# arrays are mutable: they are passed by reference
arr = [1, 2, 3]
function change_array!(A)
    A[1] = 20
end
change_array!(arr)
println(arr)
which indeed modifies the array arr
There is a fair bit to respond to here.
First, Julia does not pass-by-reference or pass-by-value. Rather it employs a paradigm known as pass-by-sharing. Quoting the docs:
Function arguments themselves act as new variable bindings (new
locations that can refer to values), but the values they refer to are
identical to the passed values.
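Python uses the same pass-by-sharing semantics, so the difference between rebinding an argument and mutating the object it refers to can be sketched there. This is an analogy in Python, not Julia code, and the function names are made up for illustration.

```python
def rebind(x):
    # Rebinding the local name does not affect the caller's variable.
    x = 17
    return x


def mutate_first(a):
    # Mutating the shared object is visible to the caller.
    a[0] = 20


def rebind_list(a):
    # Rebinding a mutable argument is also invisible to the caller.
    a = [0, 0, 0]
    return a


value = 10
rebind(value)

arr = [1, 2, 3]
mutate_first(arr)

other = [1, 2, 3]
rebind_list(other)
```

After these calls `value` and `other` are unchanged and only `arr` has a new first element, which shows that what matters is whether the function mutates the shared object, not whether the argument's type is "passed by reference".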
Second, you appear to be asking why Julia does not copy arrays when passing them into functions. This is a simple one to answer: Performance. Julia is a performance oriented language. Making a copy every time you pass an array into a function is bad for performance. Every copy operation takes time.
This has some interesting side-effects. For example, you'll notice that a lot of the mature Julia packages (as well as the Base code) consist of many short functions. This code structure is a direct consequence of near-zero overhead to function calls. Languages like Mathematica and MATLAB on the other hand tend towards long functions. I have no desire to start a flame war here, so I'll merely state that personally I prefer the Julia style of many short functions.
Third, you are wondering about the potential negative implications of pass-by-sharing. In theory you are correct that this can result in problems when users are unsure whether a function will modify its inputs. There were long discussions about this in the early days of the language, and based on your question, you appear to have worked out that the convention is that functions that modify their arguments have a trailing ! in the function name. Interestingly, this standard is not compulsory so yes, it is in theory possible to end up with a wild-west type scenario where users live in a constant state of uncertainty. In practice this has never been a problem (to my knowledge). The convention of using ! is enforced in Base Julia, and in fact I have never encountered a package that does not adhere to this convention. In summary, yes, it is possible to run into issues when pass-by-sharing, but in practice it has never been a problem, and the performance benefits far outweigh the cost.
Fourth (and finally), you ask whether there is a way to make an array immutable. First things first, I would strongly recommend against hacks to attempt to make native arrays immutable. For example, you could attempt to disable the setindex! function for arrays... but please don't do this. It will break so many things.
As was mentioned in the comments on the question, you could use StaticArrays. However, as Simeon notes in the comments on this answer, there are performance penalties for using static arrays for really big datasets. More than 100 elements and you can run into compilation issues. The main benefit of static arrays really is the optimizations that can be implemented for smaller static arrays.
Another package-based option, suggested by phipsgabler in the comments below, is FunctionalCollections. This appears to do what you want, although it looks to be only sporadically maintained. Of course, that isn't always a bad thing.
A simpler approach is just to copy arrays in your own code whenever you want to implement pass-by-value. For example:
f!(copy(x))
Just be sure you understand the difference between copy and deepcopy, and when you may need to use the latter. If you're only working with arrays of numbers, you'll never need the latter, and in fact using it will probably drastically slow down your code.
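The shallow-versus-deep distinction can be sketched in Python, whose `copy.copy` and `copy.deepcopy` make the same kind of distinction. This is an illustration in another language, not a statement about Julia's implementation.

```python
import copy

# A nested container: the outer list holds references to inner lists.
nested = [[1, 2], [3, 4]]
shallow = copy.copy(nested)      # new outer list, same inner lists
deep = copy.deepcopy(nested)     # new outer list and new inner lists

nested[0][0] = 99                # mutate an inner list of the original

# A flat list of numbers: a shallow copy is already fully independent.
flat = [1, 2, 3]
flat_copy = copy.copy(flat)
flat[0] = 99
```

Here `shallow[0][0]` sees the change while `deep[0][0]` does not, and `flat_copy` is unaffected, which is why a shallow copy is enough for arrays of plain numbers.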
If you wanted to do a bit of work then you could also build your own array type in the spirit of static arrays, but without all the bells and whistles that static arrays entail. For example:
struct MyImmutableArray{T,N}
    x::Array{T,N}
end
Base.getindex(y::MyImmutableArray, inds...) = getindex(y.x, inds...)
and similarly you could add any other functions you wanted to this type, while excluding functions like setindex!.
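The same idea, a wrapper that forwards reads and simply omits the write operation, can be sketched in Python. `ReadOnlyView` is a hypothetical helper, not part of any library.

```python
class ReadOnlyView:
    """Read-only wrapper around a list (hypothetical helper)."""

    def __init__(self, data):
        self._data = data

    def __getitem__(self, index):
        # Reads are forwarded to the wrapped list.
        return self._data[index]

    def __len__(self):
        return len(self._data)


view = ReadOnlyView([1, 2, 3])
first = view[0]

try:
    view[0] = 20  # no __setitem__ is defined, so this raises TypeError
    write_error = None
except TypeError as exc:
    write_error = exc
```

As with the struct above, the wrapper does not copy: anyone who still holds the wrapped array can mutate it, so this only protects against writes made through the wrapper.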

Localizing global variables

When using the Extended Program Check, I get the following warning:
Do not declare fields and field symbols (variable name) globally.
This is from declaring global data before the selection screen. The obvious solution is that they should be declared locally in a subroutine.
If I decide to do this, the data will now be out of scope for the other subroutines, so I would end up creating something to the effect of a main() function from C or Java. This sounds like a good idea - however, events such as INITIALIZATION are not allowed to be inside of subroutines, meaning that it forces a break in scope.
Observe the sample program below:
REPORT Z_EXAMPLE.
SELECTION-SCREEN BEGIN OF BLOCK upload WITH FRAME TITLE text-H01.
PARAMETERS: p_infile TYPE rlgrap-filename LOWER CASE OBLIGATORY.
SELECTION-SCREEN END OF BLOCK upload.
AT SELECTION-SCREEN ON VALUE-REQUEST FOR p_infile.
PERFORM main1 CHANGING p_infile.
INITIALIZATION.
PERFORM main2.
TOP-OF-PAGE.
PERFORM main3.
...
main1, main2, and main3 cannot, to my knowledge, pass any data to one another without global declaration. If the data is parsed from the uploaded file p_infile in main1, it cannot be accessed in main2 or main3. Aside from omitting events altogether, is there any way to abide by the warning but let data be passed across events?
There are a variety of techniques - I prefer to code almost everything except for the basic selection screen handling in a separate controller class. The report simply defers to that class and calls its methods. Other than that - it's just a warning that you can ignore if you know what you're doing. Writing a program without any global variable at all will certainly not be practical - however, you should think at least twice before using global variables or attributes in a place where a method parameter would be more appropriate.
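The shape of the controller approach can be sketched in Python (not ABAP; the class and method names are hypothetical, and the parsing is a stand-in). Each event handler would only delegate to a method, and the data shared between events lives on the controller instance rather than in separate global variables.

```python
class ReportController:
    """Holds the data shared between event handlers (hypothetical sketch)."""

    def __init__(self):
        self._rows = []

    def on_value_request(self, infile_text):
        # Parse the uploaded content once and keep it on the instance.
        self._rows = [
            line.split(",") for line in infile_text.splitlines() if line
        ]

    def on_top_of_page(self):
        # A later event can use the parsed data without any extra globals.
        return f"{len(self._rows)} rows loaded"


controller = ReportController()
controller.on_value_request("a,b\nc,d\n")
header = controller.on_top_of_page()
```

In this sketch the reference to the controller is the only thing the event handlers need to share.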
As @vwegert so rightly said, it's almost impossible to write an ABAP program that doesn't have at least a few global variables (the selection screen and events enforce that, unfortunately).
One approach is to use a controller class, another is to have a main subroutine and have it call other subroutines as required, passing values as required. I tend to favour the latter approach in a lot of cases, if only because it's easier to split the subroutines into logical groupings in separate includes (doing so with classes can sometimes be a little ugly). It really is a matter of approach though, but the key thing is reducing global variables to a minimum - unfortunately too few ABAP developers that I've encountered care about such issues.
Update
@Christian has reminded me that as of ABAP AS 7.02, subroutines are considered obsolete:
Subroutines should no longer be created in new programs for the following reasons:
The parameter interface has clear weaknesses when compared with the parameter interface of methods, such as:
positional parameters instead of keyword parameters
no genuine input parameters in pass by reference
typing is optional
no optional parameters
Every subroutine implicitly belongs to the public interface of its program. Generally this is not desirable.
Calling subroutines externally is critical with regard to the assignment of the container program to a program group in the internal session. This assignment cannot generally be defined as static.
Those are all valid points and I think in light of that, using classes for modularisation is definitely the preferred approach (and from a purely aesthetic point of view, they also "fit" better with the syntax enhancements in 7.02 and later).

Better way to get the reflect.Type of an interface in Go

Is there a better way to get the reflect.Type of an interface in Go than reflect.TypeOf((*someInterface)(nil)).Elem()?
It works, but it makes me cringe every time I scroll past it.
Unfortunately, there is not. While it might look ugly, it is indeed expressing the minimal amount of information needed to get the reflect.Type that you require. These are usually included at the top of the file in a var() block with all such necessary types so that they are computed at program init and don't incur the TypeOf lookup penalty every time a function needs the value.
This idiom is used throughout the standard library, for instance:
html/template/content.go: errorType = reflect.TypeOf((*error)(nil)).Elem()
The reason for this verbose construction stems from the fact that reflect.TypeOf is part of a library and not a built-in, and thus must actually take a value.
In some languages, the name of a type is an identifier that can be used as an expression. This is not the case in Go. The valid expressions can be found in the spec. If the name of a type were also usable as a reflect.Type, it would introduce an ambiguity for method expressions because reflect.Type has its own methods (in fact, it's an interface). It would also couple the language spec with the standard library, which reduces the flexibility of both.

How to hand over variables to a function? With an array or variables?

When I refactor my functions for new needs, I stumble from time to time upon the crucial question:
Shall I add another variable with a default value? Or shall I use only one array, where I'm able to add an additional variable without breaking the API?
Unless you need to support a flexible number of variables, I think it's best to explicitly identify each parameter. In most cases you can add an overloaded method that has a different signature to support the extra parameter while still supporting the original method signature. If you use an array for passing variables it just makes it too confusing for users of your API. Obviously there are some inputs that lend themselves to an array (a list of points in a polygon, a list of account IDs you wish to perform an action on, etc.) but if it's not a variable that you would reasonably expect to be an array or list, you should pass it into the method as a separate parameter.
Just like many questions in programming, the right answer is "it depends".
To take JavaScript/jQuery as an example, one good rule of thumb is whether the parameter will be required each time the function is called or whether it is optional. For example, the main jQuery function itself requires an expression to determine what element(s) the operation will affect:
jQuery(expression)
It makes no sense to try to pass this parameter as part of an array as it will be required every time this function is called.
On the other hand, many jQuery plugins require several miscellaneous parameters that may be optional. By convention, these are passed as parameters via an 'options' array. As you said, this provides a nice interface as new parameters can be added without affecting the existing API. This makes the API clean as well since the user can ignore those options that are not applicable.
In general, when several parameters are involved, passing them as an array is a nice convention, as many of them are likely to be optional. This would have helped clean up many Win32 APIs, although it is more difficult to deal with arrays in C/C++ than in JavaScript.
It depends on the programming language used.
If you have a run-of-the-mill OO language, you should use an object that you can easily extend, if you are really concerned about API consistency.
If that doesn't matter that much, there is the option of changing the method signature and overloading the method with more / different parameters.
If your language doesn't support either and you want the API to be binary stable, use an array.
There are several considerations that must be made.
Where is the function used? - Only in code you created? One place or hundreds of places? The amount of work that will need to be done to maintain existing code is important. Remember to include the amount of time it will take to communicate to other programmers that may currently be using your function.
How critical is the new parameter? - Do you want to require it to be used? If it has a default value, will that default value break existing use of the function in any subtle ways?
Ease of comprehension - How many parameters are already passed into the function? The larger the number, the more confusing and error-prone it will be. Code Complete recommends that you restrict the number of parameters to 7 or fewer. If you need more than that, you should try to abstract some or all of the related parameters into one object.
Other special considerations - Do you want to optimize your efforts for any special conditions such as code speed or size? Are there any special considerations that must be taken into account for your execution environment? Keep in mind your goals for the project and make sure you aren't working against them with whatever design choice you make.
In his book Code Complete, Steve McConnell decrees that a function should never have more than 7 arguments, and rarely even that many. He presents compelling arguments - that I can't cite from memory, alas.
Clean Code, more recently, advocates even fewer arguments.
So unless the number of things to pass is really small, they should be passed in an enveloping structure. If they're homogeneous, an array. If not, then a reasonably lightweight object should be built for the purpose.
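A lightweight parameter object of this kind might look like the following Python sketch. `ExportOptions` and `export` are hypothetical names, and the fields are invented for illustration. New options can be added to the class with defaults without changing the function signature.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExportOptions:
    """Optional settings bundled into one object (hypothetical fields)."""
    delimiter: str = ","
    include_header: bool = True


def export(rows, options=ExportOptions()):
    """Render a list of dicts as delimited text; rows stays a real parameter."""
    lines = []
    if options.include_header and rows:
        lines.append(options.delimiter.join(rows[0].keys()))
    for row in rows:
        lines.append(options.delimiter.join(str(v) for v in row.values()))
    return "\n".join(lines)


rows = [{"id": 1, "name": "a"}]
default_output = export(rows)
custom_output = export(rows, ExportOptions(delimiter=";", include_header=False))
```

The required input (`rows`) remains an explicit parameter, and only the optional settings travel in the object, which matches the rule of thumb about required versus optional values.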
You should do neither. Just add the parameter and change all callers to supply the proper default value. The reason is that parameters with default values can only be at the end, and you will not be able to add any more required parameters anywhere in the parameter list without running the risk of misinterpretation.
These are the critical steps to disaster:
1. add one or two parameters with defaults
2. some callers will supply it, and some will rely on defaults.
[half a year passed]
3. add a required parameter (before them)
4. change all callers to supply the required parameter
5. get a phone call, or other event which will make you forget to change one of the instances from step 2
6. now your program compiles perfectly, but is invalid.
Unfortunately, in function call semantics we usually don't have a chance to say, by name, which value goes where.
An array is also not a proper solution. An array should be used as a collection of similar objects, upon which a uniform activity is performed. As they say here, if it's worth refactoring, it's worth refactoring now.
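The failure mode in the steps above can be reproduced in a few lines of Python (the `send_*` functions are hypothetical). The sketch also shows how, in a language that does support naming arguments at the call site, keyword-only parameters turn the silent mismatch into an error.

```python
def send_v1(to, subject="", retries=3):
    return {"to": to, "subject": subject, "retries": retries}


# Later, a required parameter is inserted before the defaulted ones.
def send_v2(to, body, subject="", retries=3):
    return {"to": to, "body": body, "subject": subject, "retries": retries}


# A call written against v1 still runs against v2,
# but "Hello" is now silently bound to body instead of subject.
stale = send_v2("a@example.com", "Hello")


# With keyword-only parameters, a forgotten caller fails loudly instead.
def send_kw(to, *, body, subject="", retries=3):
    return {"to": to, "body": body, "subject": subject, "retries": retries}


try:
    send_kw("a@example.com", subject="Hello")  # body was never supplied
    error = None
except TypeError as exc:
    error = exc
```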
