Should every function which calls itself be considered a recursive function? - recursion

I understand what recursive functions are, but consider the following example of a function meant to get the local version of data on an item, check if there is new data about it available online based on locally stored cache time, and if there is, updating the local data with the new version, returning up-to-date data about it either way.
function getItemData(id){
var local=getLocalItemData(id);
if(!local.cacheTime.upToDate()){
var newData=getOnlineItemData(id);
updateLocalItemData(id, newData);
return getItemData(id);
}
else{
return local.returnHumanReadable();
}
}
My argument against considering it a recursive function is the fact that it will only end up calling itself on rare occasions when the cache time indicates the data has expired, and that the function only calls itself for convenience.
Instead of using return getLocalItemData(id).returnHumanReadable(); I can use return getItemData(id); because it will return the same result, as the newly saved data won't need to be refreshed again in the several microseconds it will take the function to call itself. Also, it is much shorter: in the actual code, I would use lower level commands instead of those function calls, resulting in code duplication which would make the entire function harder to read and maintain.
So, do you think that my argument makes any sense, or do you consider this to be nothing more than a matter of code organization?

The short answer is: yes it is recursive
It becomes important if you consider a platform that does not support recursion
Since it can call itself, your code will not work on that platform because it is technically recursion
For a trivial case like this replacing the recursive call with getLocalItemData(id).returnHumanReadable(); will allow it to work on this platform. In fact, you can change your code to:
function getItemData(id){
var local=getLocalItemData(id);
if(!local.cacheTime.upToDate()){
var newData=getOnlineItemData(id);
updateLocalItemData(id, newData);
local=getLocalItemData(id);
}
return local.returnHumanReadable();
}
NOTE: If you cannot 100% guarantee that only one call is needed, change the if to while

Related

How do I do jumps to a label in an enclosing function in LLVM IR?

I want to do an LLVM compiler for a very old language, PL/M. This has some peculiar features, not least of which is having nested functions with the ability to jump out of an enclosing function. In pseudocode:
toplevel() {
nested() {
if (something)
goto label;
}
nested();
label:
print("finished!");
}
The constraints here are:
you can only jump into the top-level function, luckily
the stack does get unwound (the language does not support destructors, so this is easy)
you do not have to have executed the statement at label before jumping (so the naive setjmp/longjmp method doesn't work).
code at label can be executed normally, i.e. it's not like catch
LLVM has a number of non-local jump mechanisms, such as the exception handling system, but I've never used that. Can this be implemented using LLVM exceptions, or are they not suitable for this? Is there an easier way?
If you want the stack to get unwound, you'll likely want it to be in a separate function, at least a separate LLVM IR function. (The only real exception is if your language does not have a construct like C's "alloca()" and you don't allow calling a nested function by address in which case you could inline it.)
That part of the problem you mentioned, jumping out of an enclosing function, is best handled by having some way for the callee to communicate "how it exited" to the caller, and the caller having a "switch()" on that value. You could stick it in the return value (if it already returns a value, make it a struct of both values), you could add a pointer parameter that it writes to, you could add it a thread-local global variable and fill that in before calling longjmp, or you could use exceptions.
Exceptions, they're complex (I can't describe how to make them work offhand but the docs are here: https://llvm.org/docs/ExceptionHandling.html ) and slow when the exception path is taken, and really intended for exceptional situations, not for normal code. Setjmp/longjmp does the same thing as exceptions except simpler to use and without the performance trade-off when executed, but unfortunately there are miscompiles in LLVM which you need will be the one to fix if you start using them in earnest (see the postscript at the end of the answer).
Those two options cover the ways you can do it without changing the function signature, which may be necessary if your language allows the address to be taken then called later.
If you do need to take the address of nested, then LLVM supports trampolines. See https://llvm.org/docs/LangRef.html#trampoline-intrinsics . Trampolines solve the problem of accessing the local variables of the calling function from the callee, even when the function is called by address.
PS. LLVM miscompiles setjmp/longjmp today. The current model is that a call to setjmp may return twice, and only functions with the returns_twice attribute may return twice. Note that this doesn't affect the whole call stack, only the direct caller of a function that returns twice has to deal with the twice-returning call-- just because function F calls setjmp does not mean that F itself can return twice. So far, so good.
The problem is that in a function with a setjmp, all function calls may themselves call longjmp. I'd say "unless proven otherwise" as with all things in optimizers, but there is no attribute in LLVM doesnotlongjmp or any code within LLVM that attempts to answer the question of whether a function could call longjmp. Adding that would be a good optimization, but it's a separate issue from the miscompile.
If you have code like this pseudo-code:
%entry block:
allocate val
val <- 0
setjmpret <- call setjmp
br i1 setjmpret, %first setjmp return block, %second setjmp return block
%first setjmp return block:
val <- 1;
call foo();
goto after;
%second setjmp return block:
call print(val);
goto after;
%after:
return
The control flow graph shows that is no path from val <- 0 to val <- 1 to print(val). The only path with "print(val)" has "val <- 0" before it therefore constant propagation may turn print(val) into print(0). The problem here is a missing control flow edge from foo() back to the %second setjmp return block. In a function that contains a setjmp, all calls which may call longjmp must have a CFG edge to the second setjmp return block. In LLVM that control flow edge is missing and LLVM miscompiles code because of it.
This problem also manifests in the backend. The first time I heard of this problem it was in the context of the backend losing track of the placement of variables on the stack, and this issue was the underlying root cause.
For the most part setjmp/longjmp seems to work because LLVM isn't usually able to analyze what calling foo() might do and can't perform the optimization. For instance if val was not a fresh allocation but was a pointer, then who's to say that foo() doesn't have access to the same pointer, and then performs "val <- 1" on it? If LLVM can't prove that impossible, that precludes the transform to print(0). Secondly, setjmp/longjmp are just not used often in real code.

Pattern for async testing with scalatest and mocking objects with mockito

I am writing unit tests for some async sections of my code (returning Futures) that also involves the need to mock a Scala object.
Following these docs, I can successfully mock the object's functions. My question stems from the fact that withObjectMocked[FooObject.type] returns Unit, where async tests in scalatest require either an Assertion or Future[Assertion] to be returned. To get around this, I'm creating vars in my tests that I reassign within the function sent to withObjectMocked[FooObject.type], which ends up looking something like this:
class SomeTest extends AsyncWordSpec with Matchers with AsyncMockitoSugar with ResetMocksAfterEachAsyncTest {
"wish i didn't need a temp var" in {
var ret: Future[Assertion] = Future.failed(new Exception("this should be something")) // <-- note the need to create the temp var
withObjectMocked[SomeObject.type] {
when(SomeObject.someFunction(any)) thenReturn Left(Error("not found"))
val mockDependency = mock[SomeDependency]
val testClass = ClassBeingTested(mockDependency)
ret = testClass.giveMeAFuture("test_id") map { r =>
r should equal(Error("not found"))
} // <-- set the real Future[Assertion] value here
}
ret // <-- finally, explicitly return the Future
}
}
My question then is, is there a better/cleaner/more idiomatic way to write async tests that mock objects without the need to jump through this bit of a hoop? For some reason, I figured using AsyncMockitoSugar instead of MockitoSugar would have solved that for me, but withObjectMocked still returns Unit. Is this maybe a bug and/or a candidate for a feature request (the async version of withObjectMocked returning the value of the function block rather than Unit)? Or am I missing how to accomplish this sort of task?
You should refrain from using mockObject in a multi-thread environment as it doesn't play well with it.
This is because the object code is stored as a singleton instance, so it's effectively global.
When you use mockObject you're efectibly forcefully overriding this var (the code takes care of restoring the original, hence the syntax of usign it as a "resource" if you want).
Because this var is global/shared, if you have multi-threaded tests you'll endup with random behaviour, this is the main reason why no async API is provided.
In any case, this is a last resort tool, every time you find yourself using it you should stop and ask yourself if there isn't anything wrong with your code first, there are quite a few patterns to help you out here (like injecting the dependency), so you should rarely have to do this.

How to avoid loops when writing cloud functions?

When writing event based cloud functions for firebase firestore it's common to update fields in the affected document, for example:
When a document of users collection is updated a function will trigger, let's say we want to determine the user info state and we have a completeInfo: boolean property, the function will have to perform another update so that the trigger will fire again, if we don't use a flag like needsUpdate: boolean to determine if excecuting the function we will have an infinite loop.
Is there any other way to approach this behavior? Or the situation is a consequence of how the database is designed? How could we avoid ending up in such scenario?
I have a few common approaches to Cloud Functions that transform the data:
Write the transformed data to a different document than the one that triggers the Cloud Function. This is by far the easier approach, since there is no additional code needed - and thus I can't make any mistakes in it. It also means there is no additional trigger, so you're not paying for that extra invocation.
Use granular triggers to ensure my Cloud Function only gets called when it needs to actually do some work. For example, many of my functions only need to run when the document gets created, so by using an onCreate trigger I ensure my code only gets run once, even if it then ends up updating the newly created document.
Write the transformed data into the existing document. In that case I make sure to have the checks for whether the transformation is needed in place before I write the actual code for the transformation. I prefer to not add flag fields, but use the existing data for this check.
A recent example is where I update an amount in a document, which then needs to be fanned out to all users:
exports.fanoutAmount = functions.firestore.document('users/{uid}').onWrite((change, context) => {
let old_amount = change.before && change.before.data() && change.before.data().amount ? change.before.data().amount : 0;
let new_amount = change.after.data().amount;
if (old_amount !== new_amount) {
// TODO: fan out to all documents in the collection
}
});
You need to take care to avoid writing a function that triggers itself infinitely. This is not something that Cloud Functions can do for you. Typically you do this by checking within your function if the work was previously done for the document that was modified in a previous invocation. There are several ways to do this, and you will have to implement something that meets your specific use case.
I would take this approach from an execution time perspective, this means that the function for each document will be run twice. Each time when the document is triggered, a field lastUpdate would be there with a timestamp and the function only updates the document if the time is older than my time - eg 10 seconds.

Do you make safe and unsafe version of your functions or just stick to the safe version? (Embedded System)

let's say you have a function that set an index and then update few variables based on the value stored in the array element which the index is pointing to. Do you check the index to make sure it is in range? (In embedded system environment to be specific Arduino)
So far I have made a safe and unsafe version for all functions, is that a good idea? In some of my other codes I noticed that having only safe functions result in checking conditions multiple time as the libraries get larger, so I started to develop both. The safe function checks the condition and call the unsafe function as shown in example below for the case explained above.
Safe version:
bool RcChannelModule::setFactorIndexAndUpdateBoundaries(factorIndex_T factorIndex)
{
if(factorIndex < N_FACTORS)
{
setFactorIndexAndUpdateBoundariesUnsafe(factorIndex);
return true;
}
return false;
}
Unsafe version:
void RcChannelModule::setFactorIndexAndUpdateBoundariesUnsafe(factorIndex_T factorIndex)
{
setCuurentFactorIndexUnsafe(factorIndex);
updateOutputBoundaries();
}
If I am doing it wrong fundamentally please let me know why and how I could avoid that. Also I would like to know, generally when you program, do you consider the future user to be a fool or you expect them to follow the minimal documentation provided? (the reason I say minimal is because I do not have the time to write a proper documentation)
void RcChannelModule::setCuurentFactorIndexUnsafe(const factorIndex_T factorIndex)
{
currentFactorIndex_ = factorIndex;
}
Safety checks, such as array index range checks, null checks, and so on, are intended to catch programming errors. When these checks fail, there is no graceful recovery: the best the program can do is to log what happened, and restart.
Therefore, the only time when these checks become useful is during debugging and testing of your code. C++ provides built-in functionality for dealing with this through asserts, which are kept in the debug versions of the code, but compiled out from the release version:
void RcChannelModule::setFactorIndexAndUpdateBoundariesUnsafe(factorIndex_T factorIndex) {
assert(factorIndex < N_FACTORS);
setCuurentFactorIndexUnsafe(factorIndex);
updateOutputBoundaries();
}
Note: [When you make a library for external use] an argument-checking version of each external function perhaps makes sense, with non-argument-checking implementations of those and all internal-only functions. If you perform argument checking then do it (only) at the boundary between your library and the client code. But it's pointless to offer a choice to your users, for if you want to protect them from usage errors then you cannot rely on them to choose the "safe" versions of your functions. (John Bollinger)
Do you make safe and unsafe version of your functions or just stick to the safe version?
For higher level code, I recommend one version, a safe one.
High level code, with a large set of related functions and data, the combinations of interactions of data and code are not possible to fully check at development time. When an error is detected, the data should be set to indicate an error state. Subsequent use of data within these functions would be aware of the error state.
For lowest level -time critical routines, I'd go with #dasblinkenlight answer. Create one source code that compiles 2 ways per the debug and release compiles.
Yet keep in mind #pete becker, it this really likely a performance bottle neck to do a check?
With floating-point related routines, use the NaN to help keep track of an unrecoverable error.
Lastly, as able, create functions that do not fail and avoid the issue. With many, not all, this only requires small code additions. It often only adds a constant of time performance penalty and not a O(n) penalty.
Example: Consider a function to lop off the first character of a string - in place.
// This work fine as long as s[0] != 0
char *slop_1(char *s) {
size_t len = strlen(s); // most work is here
return memmove(s, s + 1, len); // and here
}
Instead define the function, and code it, to do nothing when s[0] == 0
char *slop_2(char *s) {
size_t len = strlen(s);
if (len > 0) { // negligible additional work
memmove(s, s + 1, len);
}
return s;
}
Similar code can be applied to OP's example. Note that it is "safe", at least within the function. The assert() scheme can still be used to discovery development issues. Yet the released code, without the assert(), still checks the range.
void RcChannelModule::setFactorIndexAndUpdateBoundaries(factorIndex_T factorIndex)
{
if(factorIndex < N_FACTORS) {
setFactorIndexAndUpdateBoundariesUnsafe(factorIndex);
} else {
assert(1);
}
}
Since you tagged this Arduino and embedded, you have a very resource-constrained system, one of the crappiest processors still manufactured.
On such a system you cannot afford extra error handling. It is better to properly document what values the parameters passed to the function must have, then leave the checking of this to the caller.
The caller can then either check this in run-time, if needed, or otherwise in compile-time with a static assert. Your function would however not be able to implement it as a static assert, as it can't know if factorIndex is a run-time variable or a compile-time constant.
As for "I have no time to write proper documentation", that's nonsense. It takes far less time to document this function than to post this SO question. You don't necessarily have to write an essay in some Word file. You don't necessarily have to use Doxygen or similar.
But you do need to write the bare minimum of documentation: In the header file, document the purpose and expected values of all function parameters in the form of comments. Preferably you should have a coding standard for how to document such functions. A minimal documentation of public API functions in the form of comments is part of your job as programmer. The code is not complete until this is written.

xquery call a maintenance function within another function

Using MarkLogic Xquery, I have a function (admin:add-collection-to-publication) which calls another maintenance function ( admin:check-collections-exists) which checks for an element's presence and if its not present then it creates that particular element.
The way I call the maintenance function is with a let. This seems like a weird way, to do this it requires creating an unused variable. Should I instead return a sequence with the call to admin:check-collections-exists being the first item in the sequence then the subsequent processing being the second element? Just looking for the standard elegant way to do this. My functions are:
declare function admin:add-collection-to-publication($pub-name, $collection-name)
{
(:does this publication have a COLLECTIONS element?:)
let $unnecessary-variable := admin:check-collections-exists($pub-name)
(:now go and do what this function does:)
return "do some other stuff then return"
};
declare function admin:check-collections-exists($pub-name)
{
if(fn:exists($pubs-node/pub:PUBLICATION[pub:NAME/text()=$pub-name]/pub:COLLECTIONS))
then
"exists"
else
xdmp:node-insert-child($pubs-node/pub:PUBLICATION[pub:NAME/text()=$pub-name],<pub:COLLECTIONS/>)
};
Using a sequence is not reliable. MarkLogic will most likely attempt to evaluate the sequence items in parallel, which could cause the creating to happen at 'same' time or even after the other work. The best approach is indeed to use a let. The let's are always evaluated before the return. Note though that let's can be evaluated in parallel as well, but the optimizer is smart enough to detect dependencies.
Personally, I often use unused variables. For example to insert logging statements, in which case I have one unused variable name that I reuse each time:
let $log := xdmp:log($before)
let $result := do:something($before)
let $log := xdmp:log($result)
You could also use a very short variable name like $_. Or you could reconsider actually giving the variable a sensible name, and use it after all, even though you know it never reaches the else
let $exists :=
if (collection-exists()) then
create()
else true()
return
if ($exists) then
"do stuff"
else () (: never reached!! :)
HTH!

Resources