I have read that it is in principle possible to convert a recursive function to an iterative one. I have a bunch of functions calling each other. I built the structure of the code from my flowchart, and it felt natural to write it in a recursive style. It runs fine for small problem sizes but gives a segmentation fault at larger scale. So I am trying to switch to an iterative style, but I cannot see how to do it technically since the branching structure confuses me. Can someone give me a clue how to handle it? The code looks something like this in Python:
def main_function(parameters):
    if condition0:
        ....
        if condition1:
            ....
            if condition2:
                ....
                return function1(parameters)
            else:
                ....
                return function2(parameters)
        else:
            return function1(parameters)
    else:
        return function2(parameters)
#############################################
def function1(parameters):
    if condition3:
        ...
        return function3(parameters) ### yet another function.. so messed up? :-(((
    else:
        return main_function(parameters)
##############################################
def function2(parameters):
    if condition4:
        ...
        return main_function(parameters)
    else:
        return function1(parameters)
###############################################
def function3(parameters):
    if condition5:
        if condition6:
            ...
            return function3(parameters)
        else:
            ...
            return main_function(parameters)
    else:
        return RESULTS # The only way out!
Any idea would be greatly appreciated, thank you very much in advance.
Since every return statement that you've shown is essentially a return some_other_function(), it seems that a state machine would be a natural way to model this. There would be a state corresponding to each function, and the return statements would become state transitions.
Since every recursive call happens in a return statement (a tail call), you don't need to hold on to the old stack frame. For example, when function1() ends with return function3(parameters), function1's frame can be discarded; this way you won't hit RuntimeError: maximum recursion depth exceeded.
You can achieve this by returning the next function to call together with its parameters, instead of calling it recursively.
def main_function(parameters):
    if condition0:
        if condition1:
            if condition2:
                return function1, parameters # return the function to call next, together with its arguments
            else:
                ....
                return function2, parameters
        else:
            return function1, parameters
    else:
        return function2, parameters
You should change the other functions in a similar way. Now you can call main_function() as follows:
next_function, next_fun_param = main_function(parameters)
while hasattr(next_function, '__call__'):
    next_function, next_fun_param = next_function(next_fun_param)
# got the RESULT: the terminating branch (function3) would return e.g. (None, RESULTS),
# so the loop stops and next_fun_param holds the final result
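To make the pattern concrete, here is a minimal, self-contained sketch of the same trampoline idea with made-up conditions and parameters (it sums the numbers 1..n without growing the call stack); the real functions would return the next function and its parameters in exactly the same way:
def trampoline(start, params):
    """Keep calling whatever function is returned until a non-callable comes back."""
    next_function, next_params = start, params
    while callable(next_function):
        next_function, next_params = next_function(next_params)
    return next_params  # once the loop stops, this holds the final result

# Made-up example: sum the numbers 1..n iteratively via the trampoline.
def main_function(params):
    if params["n"] > 0:            # stands in for condition0..2
        return add_step, params    # "call" add_step next
    return None, params["total"]   # None is not callable, so the driver loop stops

def add_step(params):
    return main_function, {"n": params["n"] - 1, "total": params["total"] + params["n"]}

print(trampoline(main_function, {"n": 100000, "total": 0}))  # prints 5000050000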
I'm new to Elixir, and I'm trying to find something similar to Python's context managers.
Problem:
I have a bunch of functions and I want to add a latency metric around each of them.
Now we have:
def method_1 do
  ...
end

def method_2 do
  ...
end
... more methods
I'd like to have:
def method_1 do
  start = System.monotonic_time()
  ...
  stop = System.monotonic_time()
  emit_metric(stop - start)
end

def method_2 do
  start = System.monotonic_time()
  ...
  stop = System.monotonic_time()
  emit_metric(stop - start)
end
... more methods
Now the code duplication is a problem:
start = System.monotonic_time()
...
stop = System.monotonic_time()
emit_metric(stop - start)
So what is a better way to avoid code duplication in this case? I like the context manager idea in Python, but I'm not sure how to achieve something similar in Elixir. Thanks for the help in advance!
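Roughly, the Python pattern I have in mind is something like this sketch (emit_metric here is just a placeholder for the real metric sink):
import time
from contextlib import contextmanager

def emit_metric(name, elapsed):  # placeholder sink
    print(name, elapsed)

@contextmanager
def latency_metric(name):
    start = time.monotonic()
    try:
        yield
    finally:
        emit_metric(name, time.monotonic() - start)

# with latency_metric("method_1"):
#     ...  # body being timed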
In Erlang/Elixir this is done through higher-order functions. Take a look at BEAM telemetry: it is an Erlang and Elixir library/standard for collecting metrics and instrumenting your code, and it is widely adopted by Phoenix, Ecto, cowboy and other libraries. Specifically, you'd be interested in the :telemetry.span/3 function, as it emits start time and duration measurements by default:
def some_function(args) do
  :telemetry.span([:my_app, :my_function], %{metadata: "Some data"}, fn ->
    result = do_some_work(args)
    {result, %{more_metadata: "Some data here"}}
  end)
end
def do_some_work(args) # actual work goes here
And then, in some other area of your code, you listen to those events and log them or send them to your APM:
:telemetry.attach_many(
  "test-telemetry",
  [[:my_app, :my_function, :start],
   [:my_app, :my_function, :stop],
   [:my_app, :my_function, :exception]],
  fn event, measurements, metadata, config ->
    # Handle the actual event.
  end,
  nil
)
I think the closest thing to a Python context manager would be to use higher-order functions, i.e. functions taking a function as an argument.
So you could have something like:
def measure(fun) do
  start = System.monotonic_time()
  result = fun.()
  stop = System.monotonic_time()
  emit_metric(stop - start)
  result
end
And you could use it like:
measure(fn ->
  do_stuff()
  ...
end)
Note: there are other similar situations where you would use a context manager in Python that would be handled the same way in Elixir. Off the top of my head: Django has a context manager for transactions, but Ecto uses a higher-order function for the same thing.
PS: to measure elapsed time, you probably want to use :timer.tc/1 though:
def measure(fun) do
  {elapsed, result} = :timer.tc(fun)
  emit_metric(elapsed)
  result
end
There is actually a really nifty library called Decorator in which macros can be used to "wrap" your functions to do all sorts of things.
In your case, you could write a decorator module (thanks to @maciej-szlosarczyk for the telemetry example):
defmodule MyApp.Measurements do
  use Decorator.Define, measure: 0

  def measure(body, context) do
    meta = Map.take(context, [:name, :module, :arity])

    quote do
      # Pass the module/name/arity information as telemetry metadata, to be accessed later
      :telemetry.span([:my_app, :measurements, :function_call], unquote(meta), fn ->
        {unquote(body), %{}}
      end)
    end
  end
end
You can set up a telemetry listener in your Application.start definition:
:telemetry.attach_many(
  "my-app-measurements",
  [[:my_app, :measurements, :function_call, :start],
   [:my_app, :measurements, :function_call, :stop],
   [:my_app, :measurements, :function_call, :exception]],
  &MyApp.MeasurementHandler.handle_telemetry/4,
  nil
)
Then in any module with a function call you'd like to measure, you can "decorate" the functions like so:
defmodule MyApp.Domain.DoCoolStuff do
  use MyApp.Measurements

  @decorate measure()
  def awesome_function(a, b, c) do
    # regular function logic
  end
end
Although this example uses telemetry, you could just as easily print out the time difference within your decorator definition.
For instance, let's suppose we had to write an algorithm to get the max value of an array of integers. Could we still call the code functional if we make the recursive function return extra information that simulates an assignment to a global object? An example:
function getMax(array, props = {}) {
  const { index = 0, actualMax = array[0] } = props // initial props
  const arrayNotEnded = array[index + 1] !== undefined
  if (arrayNotEnded) {
    const maxOf = (a, b) => a > b ? a : b
    const newMax = maxOf(actualMax, array[index + 1])
    const nextIndex = index + 1
    return getMax(array, { index: nextIndex, actualMax: newMax })
  } else return actualMax
}
A funny thing about this is that in Haskell we cannot have optional arguments, so this logic would not be pleasant to work with, since we would have to pass the initial props every time we called the function.
Yes, you could consider it cheating, but this is a well-known technique in functional programming: the accumulator argument [1][2][3]. Remember: code doesn't become functional by not having state; functional programming is all about making state explicit. There's no better way of doing that than by making it a parameter of your function.
Your code has some other problems, though. Most prominently, the state should be internal to your function, only being passed to a helper function (that might be locally declared or separate) but not as part of your function's public interface. This also prevents confusing your helper function by passing invalid state (e.g. out-of-bound indices). And yes, also the optional parameter smells - not because you think this is not possible in Haskell (it is, using Maybe), but because it can be forgotten or passed mistakenly. Instead, the helper function should have a required state parameter, and getMax should have none.
Last but not least, you should avoid out-of-bounds indexed access on arrays - check the length to know where the end is, don't compare to undefined. This includes unconditionally accessing array[0] - that makes it very easy to overlook that your function can return undefined. Make this error condition explicit as well.
Here's how I'd write it:
function getMax(array) {
  if (!array.length)
    throw new Error("array must be non-empty");
  else
    return maxFrom(1, array[0]);

  function maxFrom(index, max) {
    if (index < array.length)
      return maxFrom(index + 1, array[index] > max ? array[index] : max);
    else
      return max;
  }
}
Even better than throwing exceptions would be if you'd had an algebraic data type at hand that you could return to represent the error-or-result.
I need help on returning a value/object from a function.
noReturnKeyword <- function(){
  'noReturnKeyword'
}

justReturnValue <- function(){
  returnValue('returnValue')
}

justReturn <- function(){
  return('justReturn')
}
When I invoked these functions: noReturnKeyword(), justReturnValue(), justReturn(), I got output as [1] "noReturnKeyword", [1] "returnValue", [1] "justReturn" respectively.
My question is: even though I have not used the returnValue or return keywords explicitly in noReturnKeyword(), I still got the output (I mean the value returned by the function).
So what is the difference between these functions: noReturnKeyword(), justReturnValue(), justReturn()?
What is the difference between returnValue('') and return('')? Are they one and the same?
When should I go for returnValue('') versus return('') in R functions?
In R, according to ?return
If the end of a function is reached without calling return, the value of the last evaluated expression is returned.
return is the explicit way to exit a function and set the value that shall be returned. The advantage is that you can use it anywhere in your function.
If there is no explicit return statement R will return the value of the last evaluated expression
returnValue is only defined in a debugging context. The manual states:
the experimental returnValue() function may be called to obtain the
value about to be returned by the function. Calling this function in
other circumstances will give undefined results.
In other words, you shouldn't use it except in the context of on.exit. It does not even work the way you might expect; try this:
justReturnValue <- function(){
  returnValue('returnValue')
  2
}
This function will return 2, not "returnValue". What happened in your example is nothing more than the second approach: R evaluates the last statement, which is returnValue(), and returns exactly that.
Whether you use solution 1 or 2 is up to you. I personally prefer the explicit way because I believe it makes the code clearer, but that is more a matter of opinion.
Just for learning purposes, I tried to use a dictionary as a global variable via an accumulator. The add function works well, but when I ran the code and updated the dictionary inside the map function, it always came back empty.
But similar code for setting a list as a global variable works fine.
class DictParam(AccumulatorParam):
    def zero(self, value=""):
        return dict()

    def addInPlace(self, acc1, acc2):
        acc1.update(acc2)


if __name__ == "__main__":
    sc, sqlContext = init_spark("generate_score_summary", 40)
    rdd = sc.textFile('input')
    # print(rdd.take(5))
    dict1 = sc.accumulator({}, DictParam())

    def file_read(line):
        global dict1
        ls = re.split(',', line)
        dict1 += {ls[0]: ls[1]}
        return line

    rdd = rdd.map(lambda x: file_read(x)).cache()
    print(dict1)
For anyone who arrives at this thread looking for a Dict accumulator for pyspark: the accepted solution does not solve the posed problem.
The issue is actually in the DictParam you defined: it does not update the original dictionary. This works:
class DictParam(AccumulatorParam):
    def zero(self, value=""):
        return dict()

    def addInPlace(self, value1, value2):
        value1.update(value2)
        return value1
The original code was missing the return statement in addInPlace.
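A quick usage sketch of the corrected class (the names dict_acc and parse_line are made up here; sc and rdd are assumed from the question):
dict_acc = sc.accumulator({}, DictParam())

def parse_line(line):
    k, v = line.split(',', 1)
    dict_acc.add({k: v})       # same effect as dict_acc += {k: v}
    return line

rdd.map(parse_line).count()    # count() is an action, so the map actually executes
print(dict_acc.value)          # the merged dictionary is now visible on the driver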
I believe that print(dict1) simply gets executed before rdd.map() does.
In Spark, there are 2 types of operations:
transformations, which describe a future computation,
and actions, which actually trigger the execution.
Accumulators are updated only when some action is executed:
Accumulators do not change the lazy evaluation model of Spark. If they
are being updated within an operation on an RDD, their value is only
updated once that RDD is computed as part of an action.
If you check out the end of this section of the docs, there is an example exactly like yours:
accum = sc.accumulator(0)

def g(x):
    accum.add(x)
    return f(x)

data.map(g)
# Here, accum is still 0 because no actions have caused the `map` to be computed.
So you would need to add some action, for instance:
rdd = rdd.map(lambda x: file_read(x)).cache()  # transformation
foo = rdd.count()  # action
print(dict1)
Please make sure to check on the details of various RDD functions and accumulator peculiarities because this might affect the correctness of your result. (For instance, rdd.take(n) will by default only scan one partition, not the entire dataset.)
For accumulator updates performed inside actions only, their value is
only updated once that RDD is computed as part of an action
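As an additional hedged sketch (not from the answers above): if the goal is simply to end up with the dictionary on the driver, you can sidestep accumulator semantics entirely and let an action build it, for example with collectAsMap() on a pair RDD (score_dict is a made-up name):
pairs = rdd.map(lambda line: tuple(re.split(',', line)[:2]))  # (key, value) pairs
score_dict = pairs.collectAsMap()  # collectAsMap() is an action that returns a plain dict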
Imagine having a function that handles a heavy computational job and that we wish to execute asynchronously in a Tornado application context. Moreover, we would like to evaluate the function lazily, storing its results to disk and not rerunning the function twice for the same arguments.
Without caching the result (memoization) one would do the following:
def complex_computation(arguments):
    ...
    return result

@gen.coroutine
def complex_computation_caller(arguments):
    ...
    result = complex_computation(arguments)
    raise gen.Return(result)
Assume that, to achieve function memoization, we choose the Memory class from joblib. By simply decorating the function with @mem.cache, the function can easily be memoized:
@mem.cache
def complex_computation(arguments):
    ...
    return result
where mem can be something like mem = Memory(cachedir=get_cache_dir()).
Now consider combining the two, where we execute the computationally complex function on an executor:
class TaskRunner(object):
    def __init__(self, loop=None, number_of_workers=1):
        self.executor = futures.ThreadPoolExecutor(number_of_workers)
        self.loop = loop or IOLoop.instance()

    @run_on_executor
    def run(self, func, *args, **kwargs):
        return func(*args, **kwargs)

mem = Memory(cachedir=get_cache_dir())
_runner = TaskRunner(number_of_workers=1)

@mem.cache
def complex_computation(arguments):
    ...
    return result

@gen.coroutine
def complex_computation_caller(arguments):
    result = yield _runner.run(complex_computation, arguments)
    ...
    raise gen.Return(result)
So the first question is whether the aforementioned approach is technically correct?
Now let's consider the following scenario:
@gen.coroutine
def first_coroutine(arguments):
    ...
    result = yield second_coroutine(arguments)
    raise gen.Return(result)

@gen.coroutine
def second_coroutine(arguments):
    ...
    result = yield third_coroutine(arguments)
    raise gen.Return(result)
The second question is how one can memoize second_coroutine. Is it correct to do something like the following?
@gen.coroutine
def first_coroutine(arguments):
    ...
    mem = Memory(cachedir=get_cache_dir())
    mem_second_coroutine = mem(second_coroutine)
    result = yield mem_second_coroutine(arguments)
    raise gen.Return(result)

@gen.coroutine
def second_coroutine(arguments):
    ...
    result = yield third_coroutine(arguments)
    raise gen.Return(result)
[UPDATE I] Caching and reusing a function result in Tornado discusses using functools.lru_cache or repoze.lru.lru_cache as a solution for the second question.
The Future objects returned by Tornado coroutines are reusable, so it generally works to use in-memory caches such as functools.lru_cache, as explained in this question. Just be sure to put the caching decorator before @gen.coroutine.
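For example (a hedged sketch; some_async_lookup and the cache size are just placeholders), the decorator ordering would look like this:
import functools
from tornado import gen

@functools.lru_cache(maxsize=128)  # caching decorator on the outside, so it caches the returned Future
@gen.coroutine
def cached_fetch(key):
    result = yield some_async_lookup(key)  # some_async_lookup is a hypothetical coroutine
    raise gen.Return(result)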
On-disk caching (which seems to be implied by the cachedir argument to Memory) is trickier, since Future objects cannot generally be written to disk. Your TaskRunner example should work, but it's doing something fundamentally different from the others because complex_computation is not a coroutine. Your last example will not work, because it's trying to put the Future object in the cache.
Instead, if you want to cache things with a decorator, you'll need a decorator that wraps the inner coroutine with a second coroutine. Something like this:
def cached_coroutine(f):
    @gen.coroutine
    def wrapped(*args):
        if args in cache:
            return cache[args]   # cache is assumed to be a dict defined alongside this decorator
        result = yield f(*args)
        cache[args] = result
        return result
    return wrapped
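Usage would then look something like this (a hedged sketch; cache is just a plain module-level dict here, and the arguments must be hashable):
cache = {}

@cached_coroutine
@gen.coroutine
def second_coroutine(arguments):
    result = yield third_coroutine(arguments)
    raise gen.Return(result)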