How to separate concerns functionally - functional-programming

I'm writing a program in Scala and trying to remain as functionally pure as is possible. The problem I am facing is not Scala specific; it's more to do with trying to code functionally. The logic for the function that I have to code goes something like:
Take some value of type A
Use this value to generate log information
Log this information by calling a function in an external library and evaluate the return status of the logging action (ie was it a successful log or did the log action fail)
Regardless of whether the log succeeded or failed, I have to return the input value.
The reason for returning the input value as the output value is that this function will be composed with another function which requires a value of type A.
Given the above, the function I am trying to code is really of type A => A i.e. it accepts a value of type A and returns a value of type A but in between it does some logging. The fact that I am returning the same value back that I inputted makes this function boil down to an identity function!
This looks like code smell to me and I am wondering what I should do to make this function cleaner. How can I separate out the concerns here? Also the fact that the log function goes away and logs information means that really I should wrap that call in a IO monad and call some unsafePerformIO function on it. Any ideas welcome.

What you're describing sounds more like debugging than logging. For example, Haskell's Debug.Trace.trace does exactly that and its documentation states: "These can be useful for investigating bugs or performance problems. They should not be used in production code."
If you're doing logging, the logging function should only log and have no further return value. As mentioned by #Bartek above, its type would be A -> IO (), i.e. returning no information () and having side-effects (IO). For example Haskell's hslogger library provides such functions.

Related

How do I do jumps to a label in an enclosing function in LLVM IR?

I want to do an LLVM compiler for a very old language, PL/M. This has some peculiar features, not least of which is having nested functions with the ability to jump out of an enclosing function. In pseudocode:
toplevel() {
nested() {
if (something)
goto label;
}
nested();
label:
print("finished!");
}
The constraints here are:
you can only jump into the top-level function, luckily
the stack does get unwound (the language does not support destructors, so this is easy)
you do not have to have executed the statement at label before jumping (so the naive setjmp/longjmp method doesn't work).
code at label can be executed normally, i.e. it's not like catch
LLVM has a number of non-local jump mechanisms, such as the exception handling system, but I've never used that. Can this be implemented using LLVM exceptions, or are they not suitable for this? Is there an easier way?
If you want the stack to get unwound, you'll likely want it to be in a separate function, at least a separate LLVM IR function. (The only real exception is if your language does not have a construct like C's "alloca()" and you don't allow calling a nested function by address in which case you could inline it.)
That part of the problem you mentioned, jumping out of an enclosing function, is best handled by having some way for the callee to communicate "how it exited" to the caller, and the caller having a "switch()" on that value. You could stick it in the return value (if it already returns a value, make it a struct of both values), you could add a pointer parameter that it writes to, you could add it a thread-local global variable and fill that in before calling longjmp, or you could use exceptions.
Exceptions, they're complex (I can't describe how to make them work offhand but the docs are here: https://llvm.org/docs/ExceptionHandling.html ) and slow when the exception path is taken, and really intended for exceptional situations, not for normal code. Setjmp/longjmp does the same thing as exceptions except simpler to use and without the performance trade-off when executed, but unfortunately there are miscompiles in LLVM which you need will be the one to fix if you start using them in earnest (see the postscript at the end of the answer).
Those two options cover the ways you can do it without changing the function signature, which may be necessary if your language allows the address to be taken then called later.
If you do need to take the address of nested, then LLVM supports trampolines. See https://llvm.org/docs/LangRef.html#trampoline-intrinsics . Trampolines solve the problem of accessing the local variables of the calling function from the callee, even when the function is called by address.
PS. LLVM miscompiles setjmp/longjmp today. The current model is that a call to setjmp may return twice, and only functions with the returns_twice attribute may return twice. Note that this doesn't affect the whole call stack, only the direct caller of a function that returns twice has to deal with the twice-returning call-- just because function F calls setjmp does not mean that F itself can return twice. So far, so good.
The problem is that in a function with a setjmp, all function calls may themselves call longjmp. I'd say "unless proven otherwise" as with all things in optimizers, but there is no attribute in LLVM doesnotlongjmp or any code within LLVM that attempts to answer the question of whether a function could call longjmp. Adding that would be a good optimization, but it's a separate issue from the miscompile.
If you have code like this pseudo-code:
%entry block:
allocate val
val <- 0
setjmpret <- call setjmp
br i1 setjmpret, %first setjmp return block, %second setjmp return block
%first setjmp return block:
val <- 1;
call foo();
goto after;
%second setjmp return block:
call print(val);
goto after;
%after:
return
The control flow graph shows that is no path from val <- 0 to val <- 1 to print(val). The only path with "print(val)" has "val <- 0" before it therefore constant propagation may turn print(val) into print(0). The problem here is a missing control flow edge from foo() back to the %second setjmp return block. In a function that contains a setjmp, all calls which may call longjmp must have a CFG edge to the second setjmp return block. In LLVM that control flow edge is missing and LLVM miscompiles code because of it.
This problem also manifests in the backend. The first time I heard of this problem it was in the context of the backend losing track of the placement of variables on the stack, and this issue was the underlying root cause.
For the most part setjmp/longjmp seems to work because LLVM isn't usually able to analyze what calling foo() might do and can't perform the optimization. For instance if val was not a fresh allocation but was a pointer, then who's to say that foo() doesn't have access to the same pointer, and then performs "val <- 1" on it? If LLVM can't prove that impossible, that precludes the transform to print(0). Secondly, setjmp/longjmp are just not used often in real code.

Error During Function. Where does all the data go?

I've developed a function where i use the return function to spit out a list of vectors. Unfortunately there are still a few bugs in my code. Once my function has failed due to error can i recover that list of vectors?
Functions have their own scope, so if a function fails, the function will exit and no return value will be returned. It's difficult to say what's making your function fail without looking at your code. If the failure is due to something you have control of in your code, I suggest you solve it before relying on potentially function bogus results. But if the failure is out of your control (e.g. calling an unavailable external data source) you can wrap your risky code in a try call to recover in the event of an error. I hope this helps.

What is the use case of firebase-queue sanitize?

I am experimenting with firebase-queue. I saw the option for sanitizing. It's described in the doc as
sanitize - specifies whether the data object passed to the processing
function is sanitized of internal keys reserved for use by the queue.
Defaults to true.
What does it mean?
I am getting an error for not specifying { sanitize : false }
When the sanitize option is set, the queue sanitizes (or cleans) the input provided to the processing function so that it resembles that which the original client placed onto the queue, and doesn't contain any of the keys added by the implementation of the queue itself.
If, however, you rely on a key (usually the keys starting with an underscore, e.g. _id) that is added by the queue, and not the original client, you need to set sanitize: false so those keys are returned to your function and they're not undefined.
You can clearly see the difference with a simple processing function that just performs a console.log(data).
A quick note about why these keys are removed by default: Reading or writing directly to the location (as it looks like you're perhaps doing, by passing undefined into the client SDK child() method instead of data._id) is generally a bad idea from within the worker itself as writes performed directly are not guarded by the extensive transaction logic in the queue to prevent race conditions. If you can isolate the work to taking input from the provided data field, and returning outputs to the resolve() function, you'll likely have a better time scaling up your queue.

How to interrupt, exit a compose or pipe?

What's the proper way to interrupt a long chain of compose or pipe functions ?
Let's say the chain doesn't need to run after the second function because it found an invalid value and it doesn't need to continue the next 5 functions as long as user submitted value is invalid.
Do you return an undefined / empty parameter, so the rest of the functions just check if there is no value returned, and in this case just keep passing the empty param ?
I don't think there is a generic way of dealing with that.
Often when working with algebraic data types, things are defined so that you can continue to run functions even when the data you would prefer is not present. This is one of the extremely useful features of types such as Maybe and Either for instance.
But most versions of compose or related functions don't give you an early escape mechanism. Ramda's certainly doesn't.
While you can't technically exit a pipeline, you can use pipe() within a pipeline, so there's no reason why you couldn't do something like below to 'exit' (return from a pipeline or kick into another):
pipe(
// ... your pipeline
propOr(undefined, 'foo'), // - where your value is missing
ifElse(isNil, always(undefined), pipe(
// ...the rest of your pipeline
))
)

How can I tell the Closure Compiler not to rename an inner function using SIMPLE_OPTIMIZATIONS?

How can I tell the Closure Compiler not to rename an inner function? E.g., given this code:
function aMeaninglessName() {
function someMeaningfulName() {
}
return someMeaningfulName;
}
...I'm fine with Closure renaming the outer function (I actively want it to, to save space), but I want the function name someMeaningfulName left alone (so that the name shown in call stacks for it is "someMeaningfulName", not "a" or whatever). This despite the fact that the code calling it will be doing so via the reference returned by the factory function, not by the name in the code. E.g., this is purely for debugging support.
Note that I want the function to have that actual name, not be anonymous and assigned to some property using that name, so for instance this is not a duplicate of this other question.
This somewhat obscure use case doesn't seem to be covered by either the externs or exports functionality. (I was kind of hoping there'd be some annotation I could throw at it.) But I'm no Closure Compiler guru, I'm hoping some of you are. Naturally, if there's just no way to do that, that's an acceptable answer.
(The use case is a library that creates functions in response to calls into it. I want to provide a version of the library that's been pre-compressed by Closure with SIMPLE_OPTIMIZATIONS, but if someone is using that copy of the library with their own uncompressed code and single-stepping into the function in a debugger [or other similar operations], I want them to see the meaningful name. I could get around it with eval, or manually edit the compressed result [in fact, the context is sufficiently unique I could throw a sed script at it], but that's awkward and frankly takes us into "not worth bothering" territory, hence looking for a simple, low-maintenance way.)
There is no simple way to do this. You would have to create a custom subclass of the CodingConvention class to indicate that your methods are "local" externs (support for this was added to handle the Prototype library). It is possible that InlineVariables, InlineFunctions, or RemoveUsedVariables will still try to remove the name and would also need to be fixed up.
Another approach is to use the source maps to remap the stack traces to the original source.
read the following section
https://developers.google.com/closure/compiler/docs/api-tutorial3#export
Two options basically, use object['functionName'] = obj.functionName or the better way
use exportSymbol and exportProperty both on the goog object, here is the docs link for that
http://closure-library.googlecode.com/svn/docs/closure_goog_base.js.html
-- edit
ah, i see now, my first answer is not so great for you. The compiler has some interesting flags, the one which might interest you is DEBUG, which you can pass variables into the compiler which will allow you to drop some debugging annotations in via logging or just a string which does nothing since you are using simple mode.
so if you are using closure you can debug against a development version which is just a page built with dependiencies resolved. we also the drop the following in our code
if(DEBUG){
logger.info('pack.age.info.prototype.func');
}

Resources