How to interrupt, exit a compose or pipe? - functional-programming

What's the proper way to interrupt a long chain of compose or pipe functions ?
Let's say the chain doesn't need to run after the second function because it found an invalid value and it doesn't need to continue the next 5 functions as long as user submitted value is invalid.
Do you return an undefined / empty parameter, so the rest of the functions just check if there is no value returned, and in this case just keep passing the empty param ?

I don't think there is a generic way of dealing with that.
Often when working with algebraic data types, things are defined so that you can continue to run functions even when the data you would prefer is not present. This is one of the extremely useful features of types such as Maybe and Either for instance.
But most versions of compose or related functions don't give you an early escape mechanism. Ramda's certainly doesn't.

While you can't technically exit a pipeline, you can use pipe() within a pipeline, so there's no reason why you couldn't do something like below to 'exit' (return from a pipeline or kick into another):
pipe(
// ... your pipeline
propOr(undefined, 'foo'), // - where your value is missing
ifElse(isNil, always(undefined), pipe(
// ...the rest of your pipeline
))
)

Related

How do I do jumps to a label in an enclosing function in LLVM IR?

I want to do an LLVM compiler for a very old language, PL/M. This has some peculiar features, not least of which is having nested functions with the ability to jump out of an enclosing function. In pseudocode:
toplevel() {
nested() {
if (something)
goto label;
}
nested();
label:
print("finished!");
}
The constraints here are:
you can only jump into the top-level function, luckily
the stack does get unwound (the language does not support destructors, so this is easy)
you do not have to have executed the statement at label before jumping (so the naive setjmp/longjmp method doesn't work).
code at label can be executed normally, i.e. it's not like catch
LLVM has a number of non-local jump mechanisms, such as the exception handling system, but I've never used that. Can this be implemented using LLVM exceptions, or are they not suitable for this? Is there an easier way?
If you want the stack to get unwound, you'll likely want it to be in a separate function, at least a separate LLVM IR function. (The only real exception is if your language does not have a construct like C's "alloca()" and you don't allow calling a nested function by address in which case you could inline it.)
That part of the problem you mentioned, jumping out of an enclosing function, is best handled by having some way for the callee to communicate "how it exited" to the caller, and the caller having a "switch()" on that value. You could stick it in the return value (if it already returns a value, make it a struct of both values), you could add a pointer parameter that it writes to, you could add it a thread-local global variable and fill that in before calling longjmp, or you could use exceptions.
Exceptions, they're complex (I can't describe how to make them work offhand but the docs are here: https://llvm.org/docs/ExceptionHandling.html ) and slow when the exception path is taken, and really intended for exceptional situations, not for normal code. Setjmp/longjmp does the same thing as exceptions except simpler to use and without the performance trade-off when executed, but unfortunately there are miscompiles in LLVM which you need will be the one to fix if you start using them in earnest (see the postscript at the end of the answer).
Those two options cover the ways you can do it without changing the function signature, which may be necessary if your language allows the address to be taken then called later.
If you do need to take the address of nested, then LLVM supports trampolines. See https://llvm.org/docs/LangRef.html#trampoline-intrinsics . Trampolines solve the problem of accessing the local variables of the calling function from the callee, even when the function is called by address.
PS. LLVM miscompiles setjmp/longjmp today. The current model is that a call to setjmp may return twice, and only functions with the returns_twice attribute may return twice. Note that this doesn't affect the whole call stack, only the direct caller of a function that returns twice has to deal with the twice-returning call-- just because function F calls setjmp does not mean that F itself can return twice. So far, so good.
The problem is that in a function with a setjmp, all function calls may themselves call longjmp. I'd say "unless proven otherwise" as with all things in optimizers, but there is no attribute in LLVM doesnotlongjmp or any code within LLVM that attempts to answer the question of whether a function could call longjmp. Adding that would be a good optimization, but it's a separate issue from the miscompile.
If you have code like this pseudo-code:
%entry block:
allocate val
val <- 0
setjmpret <- call setjmp
br i1 setjmpret, %first setjmp return block, %second setjmp return block
%first setjmp return block:
val <- 1;
call foo();
goto after;
%second setjmp return block:
call print(val);
goto after;
%after:
return
The control flow graph shows that is no path from val <- 0 to val <- 1 to print(val). The only path with "print(val)" has "val <- 0" before it therefore constant propagation may turn print(val) into print(0). The problem here is a missing control flow edge from foo() back to the %second setjmp return block. In a function that contains a setjmp, all calls which may call longjmp must have a CFG edge to the second setjmp return block. In LLVM that control flow edge is missing and LLVM miscompiles code because of it.
This problem also manifests in the backend. The first time I heard of this problem it was in the context of the backend losing track of the placement of variables on the stack, and this issue was the underlying root cause.
For the most part setjmp/longjmp seems to work because LLVM isn't usually able to analyze what calling foo() might do and can't perform the optimization. For instance if val was not a fresh allocation but was a pointer, then who's to say that foo() doesn't have access to the same pointer, and then performs "val <- 1" on it? If LLVM can't prove that impossible, that precludes the transform to print(0). Secondly, setjmp/longjmp are just not used often in real code.

What is the use case of firebase-queue sanitize?

I am experimenting with firebase-queue. I saw the option for sanitizing. It's described in the doc as
sanitize - specifies whether the data object passed to the processing
function is sanitized of internal keys reserved for use by the queue.
Defaults to true.
What does it mean?
I am getting an error for not specifying { sanitize : false }
When the sanitize option is set, the queue sanitizes (or cleans) the input provided to the processing function so that it resembles that which the original client placed onto the queue, and doesn't contain any of the keys added by the implementation of the queue itself.
If, however, you rely on a key (usually the keys starting with an underscore, e.g. _id) that is added by the queue, and not the original client, you need to set sanitize: false so those keys are returned to your function and they're not undefined.
You can clearly see the difference with a simple processing function that just performs a console.log(data).
A quick note about why these keys are removed by default: Reading or writing directly to the location (as it looks like you're perhaps doing, by passing undefined into the client SDK child() method instead of data._id) is generally a bad idea from within the worker itself as writes performed directly are not guarded by the extensive transaction logic in the queue to prevent race conditions. If you can isolate the work to taking input from the provided data field, and returning outputs to the resolve() function, you'll likely have a better time scaling up your queue.

Compare & contrast redux reselect vs lodash / underscore memoize...?

I was wondering if someone can compare & contrast the differences between redux reselect lib vs lodash memoize...?
Lodash's memoize is a classic memoizing utility: It memoizes a function using (by default) its first argument as cache key. Lodash's memoize keeps track/caches all the results obtained with different cache keys.
Reselect instead, checks each provided argument, and (by default) recomputes the result ONLY IF one of the arguments changes. Reselect's selectors have by default a cache size of 1, and are mainly designed to stabilize state-derived data avoiding undesired recomputations.
There are 2 major differences:
are all parameters checked for changes before recomputing, or just the first one?
are all results cached, or only the most recent result?
lodash _.memoize:
only checks the first parameter for changes and returns the most recent result for that first parameter, regardless of whether other parameters have changed (by default).
keeps an infinitely large record of all result values keyed by that first parameter.
Example of a problematic case with lodash _.memoize:
const memoizedFunc = _.memoize(
(param1, param2) => param1 + param2
);
console.log(memoizedFunc(1, 2)); // 3
console.log(memoizedFunc(1, 3)); // 3 (but it should be 4!)
Note that you can write a custom resolver function and pass it in as the second argument to _.memoize to change this behavior and thus take all or some parameters into account. This is extra logic to test and maintain and may or may not be worth it to you. (The resolver function determines the key to be used in the Map of cached results that the memoized function maintains. By default the key is just set to equal the first parameter.)
reselect:
checks all parameters for equality against the most recent execution only.
only caches a single result (the most recent).
Note that reselect is primarily used for redux applications, creating memoized selectors with createSelector. In createSelector, each parameter is expected to have a getter (selector) function, because of the expectation of usage with a redux store. If you are not trying to memoize data from a redux store, you still could use this and send in an identity function like _.identity for each parameter, but that would be silly.
Fortunately, if you only want to memoize functions, and you want all parameters checked for changes, you can use reselect's defaultMemoize and get the desired behavior.
Summary / other libraries (memoize-one)
If you want all parameters checked for changes, and are not using redux or otherwise need to use createSelector, you may want to just use a very lightweight and fast library intended for specifically for this like memoize-one.
If you do want createSelector, you can just use reselect for all your needs most likely.
If you want all results to be cached, not just the most recent ones, you can use lodash _.memoize alone, and you can also customize reselect to use that feature of lodash _.memoize.

How to return results together with update operations in BaseX?

I recognized that (insert/delete)-XQueries executed with the BaseX client always returning an empty string. I find this very confusing or unintuitive.
Is there a way to find out if the query was "successful" without querying the database again (and using potentially buggy "transitive" logic like "if I deleted a node, there must be 'oldNodeCount-1' nodes in the XML")?
XQuery Update statements do not return anything -- that's how they are defined. But you're not the only one who does not like those restrictions, and BaseX added two ways around this limitation:
Returning Results
By default, it is not possible to mix different types of expressions
in a query result. The outermost expression of a query must either be
a collection of updating or non-updating expressions. But there are
two ways out:
The BaseX-specific update:output() function bridges this gap: it caches the results of its arguments at runtime and returns them after
all updates have been processed. The following example performs an
update and returns a success message:
update:output("Update successful."), insert node <c/> into doc('factbook')/mondial
With the MIXUPDATES option, all updating constraints will be turned off. Returned nodes will be copied before they are modified by
updating expressions. An error is raised if items are returned within
a transform expression.
If you want to modify nodes in main memory, you can use the transform
expression.
The transform expression will not help you, as you seem to modify the data on disk. Enabling MIXUPDATES allows you to both update the document and return something at the same time, for example running something like
let $node := <c/>
return ($node, insert node $node into doc('factbook')/mondial)
MIXUPDATES allows you to return something which can be further processed. Results are copied before being returned, if you run multiple updates operations and do not get the expected results, make sure you got the concept of the pending update list.
The db:output() function intentionally breaks its interface contract: it is defined to be an updating function (not having any output), but at the same time it prints some information to the query info. You cannot further process these results, but the output can help you debugging some issues.
Pending Update List
Both ways, you will not be able to have an immediate result from the update, you have to add something on your own -- and be aware updates are not visible until the pending update list is applied, ie. after the query finished.
Compatibility
Obviously, these options are BaseX-specific. If you strongly require compatible and standard XQuery, you cannot use these expressions.

How to separate concerns functionally

I'm writing a program in Scala and trying to remain as functionally pure as is possible. The problem I am facing is not Scala specific; it's more to do with trying to code functionally. The logic for the function that I have to code goes something like:
Take some value of type A
Use this value to generate log information
Log this information by calling a function in an external library and evaluate the return status of the logging action (ie was it a successful log or did the log action fail)
Regardless of whether the log succeeded or failed, I have to return the input value.
The reason for returning the input value as the output value is that this function will be composed with another function which requires a value of type A.
Given the above, the function I am trying to code is really of type A => A i.e. it accepts a value of type A and returns a value of type A but in between it does some logging. The fact that I am returning the same value back that I inputted makes this function boil down to an identity function!
This looks like code smell to me and I am wondering what I should do to make this function cleaner. How can I separate out the concerns here? Also the fact that the log function goes away and logs information means that really I should wrap that call in a IO monad and call some unsafePerformIO function on it. Any ideas welcome.
What you're describing sounds more like debugging than logging. For example, Haskell's Debug.Trace.trace does exactly that and its documentation states: "These can be useful for investigating bugs or performance problems. They should not be used in production code."
If you're doing logging, the logging function should only log and have no further return value. As mentioned by #Bartek above, its type would be A -> IO (), i.e. returning no information () and having side-effects (IO). For example Haskell's hslogger library provides such functions.

Resources