Asynchronously running long-running tasks in Elixir

I have a module that saves data in CSV format, which takes a relatively long time depending on the data size. What is the Elixir way to accomplish this asynchronously? I tried using Agent, but the process times out.
defmodule FinReporting.Export_CSV do
  alias FinReporting.DistributeRepo
  alias FinReporting.InterfaceMdl
  import Ecto.Query
  def start_link do
    Agent.start_link(fn -> HashDict.new end, name: __MODULE__)
  end
  def export do
    Agent.update(__MODULE__, fn dict ->
      export_sub()
    end)
  end
  defp export_sub do
    file = File.open!("test.csv", [:write, :utf8])
    IO.puts("===> CSV export of NGInterface file started.")
    DistributeRepo.all(from entry in InterfaceMdl, limit: 100000, select: %{field1: entry.field1, amount: entry.amount})
    |> Enum.map(fn entry -> %{entry | amount: Decimal.to_string(entry.amount)} end)
    |> Enum.map(fn m -> [m.field1, m.amount] end)
    |> CSV.encode
    |> Enum.each(&IO.write(file, &1))
    IO.puts("===> CSV export of NGInterface file completed.")
    _ = File.close(file)
  end
end

You can specify a custom timeout using the third argument to Agent.update. You can pass an integer specifying the number of milliseconds, e.g. 60000 for one minute, or :infinity for infinite timeout.
Agent.update(__MODULE__, fn dict -> export_sub() end, 60000)
But Agent.update still blocks the caller until the function has finished executing, which is not what you want.
You want Task, and specifically Task.async/1.
Task.async(fn -> export_sub() end)
This will return a Task struct that you can wait on later in your application using Task.await or ask for its status using Task.yield. All this and more is explained in great detail in the documentation of Task.
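As a rough illustration of the Task.async / Task.yield / Task.await flow described above, here is a minimal sketch. The module ExportDemo and the function slow_export/0 are hypothetical stand-ins for the question's export_sub/0, and the sleep and timeout values are arbitrary.

```elixir
defmodule ExportDemo do
  # Hypothetical stand-in for the slow export_sub/0 from the question.
  def slow_export do
    Process.sleep(200)
    :exported
  end

  def run do
    # Starts the work in a separate process and returns immediately.
    task = Task.async(fn -> slow_export() end)

    IO.puts("export started, caller is free to do other work")

    # Check once with a 50 ms timeout; nil means the task is still running.
    case Task.yield(task, 50) do
      {:ok, result} -> result
      {:exit, reason} -> {:error, reason}
      # Still running: block until the task replies.
      nil -> Task.await(task, :infinity)
    end
  end
end

IO.inspect(ExportDemo.run())
```

If the result is never needed, Task.start/1 may be a better fit than Task.async/1, since a task started with Task.async/1 is linked to the caller and is expected to be awaited.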

Related

How to Get the Task Status and Result of a Celery Task Type When Multiple Tasks Are Defined?

Typically I send an asynchronous task with .apply_async method of the Promise defined, and then I use the taskid on the AsyncResult method of the same object to get task status, and eventually, result.
But this requires me to know the exact type of task when more than one task is defined in the same deployment. Is there any way to get the task status and result (if available) without knowing the exact task type?
For example, take this example celery master node code.
#!/usr/bin/env python3
# encoding:utf-8
"""Define the tasks in this file."""
from celery import Celery
redis_host: str = 'redis://localhost:6379/0'
celery = Celery(main='test', broker=redis_host,
                backend=redis_host)
celery.conf.CELERY_TASK_SERIALIZER = 'pickle'
celery.conf.CELERY_RESULT_SERIALIZER = 'pickle'
celery.conf.CELERY_ACCEPT_CONTENT = {'json', 'pickle'}
# pylint: disable=unused-argument
@celery.task(bind=True)
def add(self, x: float, y: float) -> float:
    """Add two numbers."""
    return x + y
@celery.task(bind=True)
def multiply(self, x: float, y: float) -> float:
    """Multiply two numbers."""
    return x * y
When I call something like this in a different module
task1=add.apply_async(args=[2, 3]).id
task2=multiply.apply_async(args=[2, 3]).id
I get two uuids for the tasks. But when checking back the task status, I need to know which method (add or multiply) is associated with that task id, since I have to call the method on the corresponding object, like this.
status: str = add.AsyncResult(task_id=task1).state
My question is: how can I fetch the state and result armed only with the task id, without knowing whether the task belongs to add, multiply or any other category defined?
id and state are just properties of the AsyncResult objects. If you look at the documentation for the AsyncResult class, you will find the name property, which is exactly what you are asking for.

What is a good way for writing a function to measure another function in Elixir

I'm new to Elixir, and I'm trying to find something similar to Python's ContextManager.
Problem:
I have a bunch of functions and I want to add latency metric around them.
Now we have:
def method_1 do
  ...
end
def method_2 do
  ...
end
... more methods
I'd like to have:
def method_1 do
  start = System.monotonic_time()
  ...
  stop = System.monotonic_time()
  emit_metric(stop - start)
end
def method_2 do
  start = System.monotonic_time()
  ...
  stop = System.monotonic_time()
  emit_metric(stop - start)
end
... more methods
Now code duplication is a problem:
start = System.monotonic_time()
...
stop = System.monotonic_time()
emit_metric(stop - start)
So what is a better way to avoid code duplication in this case? I like the context manager idea in Python, but I'm not sure how I can achieve something similar in Elixir. Thanks for the help in advance!
In Erlang/Elixir this is done through higher-order functions. Take a look at BEAM telemetry: it is an Erlang and Elixir library/standard for collecting metrics and instrumenting your code, and it is widely adopted by Phoenix, Ecto, cowboy and other libraries. Specifically, you'd be interested in the :telemetry.span/3 function, as it emits start time and duration measurements by default:
def some_function(args) do
  :telemetry.span([:my_app, :my_function], %{metadata: "Some data"}, fn ->
    result = do_some_work(args)
    {result, %{more_metadata: "Some data here"}}
  end)
end
def do_some_work(args) # actual work goes here
And then, in some other area of your code, you listen to those events and log them or send them to your APM:
:telemetry.attach_many(
  "test-telemetry",
  [
    [:my_app, :my_function, :start],
    [:my_app, :my_function, :stop],
    [:my_app, :my_function, :exception]
  ],
  fn event, measurements, metadata, config ->
    # Handle the actual event here.
    :ok
  end,
  nil
)
I think the closest thing to a Python context manager would be to use higher-order functions, i.e. functions taking a function as an argument.
So you could have something like:
def measure(fun) do
  start = System.monotonic_time()
  result = fun.()
  stop = System.monotonic_time()
  emit_metric(stop - start)
  result
end
And you could use it like:
measure(fn ->
  do_stuff()
  ...
end)
Note: there are other cases where you would use a context manager in Python that are handled the same way in Elixir. Off the top of my head: Django has a context manager for transactions, but Ecto uses a higher-order function for the same thing.
PS: to measure elapsed time, you probably want to use :timer.tc/1, which returns the elapsed time in microseconds together with the function's result:
def measure(fun) do
  {elapsed, result} = :timer.tc(fun)
  emit_metric(elapsed)
  result
end
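One detail worth spelling out for the System.monotonic_time/0 variant: the value is in the VM's native time unit, so it usually needs converting before it is reported. The sketch below assumes a hypothetical Timing module and a placeholder emit_metric/2 that just prints; a real implementation would forward the value to whatever metrics backend is in use.

```elixir
defmodule Timing do
  # Runs a zero-arity function, reports how long it took, and returns its result.
  def measure(name, fun) when is_function(fun, 0) do
    start = System.monotonic_time()
    result = fun.()
    elapsed = System.monotonic_time() - start

    # monotonic_time/0 is in native units; convert before reporting.
    emit_metric(name, System.convert_time_unit(elapsed, :native, :microsecond))
    result
  end

  # Hypothetical placeholder for a real metrics client.
  defp emit_metric(name, microseconds) do
    IO.puts("#{name} took #{microseconds} us")
  end
end

total = Timing.measure(:sum, fn -> Enum.sum(1..1_000) end)
IO.inspect(total)
```

Because the wrapper returns the wrapped function's result unchanged, it can be dropped around existing calls without touching their callers.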
There is actually a really nifty library called Decorator, in which macros can be used to "wrap" your functions to do all sorts of things.
In your case, you could write a decorator module (thanks to @maciej-szlosarczyk for the telemetry example):
defmodule MyApp.Measurements do
  use Decorator.Define, measure: 0
  def measure(body, context) do
    meta = Map.take(context, [:name, :module, :arity])
    quote do
      # Pass the metadata information about module/name/arity as metadata to be accessed later
      :telemetry.span([:my_app, :measurements, :function_call], unquote(meta), fn ->
        {unquote(body), %{}}
      end)
    end
  end
end
You can set up a telemetry listener in your Application.start definition:
:telemetry.attach_many(
  "my-app-measurements",
  [
    [:my_app, :measurements, :function_call, :start],
    [:my_app, :measurements, :function_call, :stop],
    [:my_app, :measurements, :function_call, :exception]
  ],
  &MyApp.MeasurementHandler.handle_telemetry/4,
  nil
)
Then in any module with a function call you'd like to measure, you can "decorate" the functions like so:
defmodule MyApp.Domain.DoCoolStuff do
  use MyApp.Measurements
  @decorate measure()
  def awesome_function(a, b, c) do
    # regular function logic
  end
end
Although this example uses telemetry, you could just as easily print out the time difference within your decorator definition.

Elixir spawn processes without self recursion in anonymous function

I am pretty new to elixir and functional programming in general, and I was wondering in the specific example below how can I add 'nodes' in the currently empty list in the 3rd parameter of the spawn/3 function.
def create(n) do
  nodes = Enum.map(1..n, fn(_) -> spawn(Node, :begin, []) end)
end
For example what I am trying to do is similar to this:
def create(n) do
  nodes = Enum.map(1..n, fn(_) -> spawn(Node, :begin, [nodes]) end)
end
I have tried piping and pre-declaring nodes, but since these are processes, they have already been spawned and the begin function has already been triggered by the time the list is available.
What I am trying to do, and what I need nodes for, is the Node module as follows:
defmodule Node do
  def begin(nodes) do
    # do stuff with nodes here
  end
end
If I understand your problem correctly, I believe you'll need to create the nodes first.
Enum.map(1..n, fn ... create node here ... end)
Then pipe them to a map operation that will evaluate your nodes using the Node.begin function.
Enum.map(nodes, &Node.begin/1)
If Node.begin is an expensive operation (a heavy workload), you could create async Tasks to handle the work in separate processes.
nodes
|> Enum.map(&(Task.async(fn -> Node.begin(&1) end)))
|> Enum.map(&Task.await/1)
FYI: &Node.begin/1 passes the Node.begin function to the callback of Enum.map. The &(...) part inside of the map call is a shorthand anonymous function.
Here's the final code:
# If Node.begin is a cheap operation
def create(n) do
  Enum.map(1..n, fn _ -> ... create nodes here ... end)
  |> Enum.map(&Node.begin/1)
end
# If Node.begin will take some muscle to complete
def create(n) do
  Enum.map(1..n, fn _ -> ... create nodes here ... end)
  |> Enum.map(&(Task.async(fn -> Node.begin(&1) end)))
  |> Enum.map(&Task.await/1)
end
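A different approach, not taken from the answer above, is to spawn every process first and then send each one the full list of pids as a message. The following is only a sketch under that assumption: Peer and Network are hypothetical module names (the question's Node would clash with Elixir's built-in Node module), and the {:peers, list} message shape is invented for illustration.

```elixir
defmodule Peer do
  # Each process starts with no arguments and waits for the peer list.
  def begin do
    receive do
      {:peers, peers} -> run(peers)
    end
  end

  defp run(peers) do
    # Do stuff with peers here; this placeholder just counts them.
    length(peers)
  end
end

defmodule Network do
  def create(n) do
    # Spawn all processes first, so every pid exists...
    peers = Enum.map(1..n, fn _ -> spawn(Peer, :begin, []) end)

    # ...then tell each process about all of them.
    Enum.each(peers, fn pid -> send(pid, {:peers, peers}) end)
    peers
  end
end

IO.inspect(length(Network.create(3)))
```

This sidesteps the circular dependency in the question: the list cannot be passed to spawn/3 because it does not exist until every spawn call has returned.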

Writing the function "once" in Elixir

I'm coming to Elixir from a primarily JavaScript background. In JS, it's possible to write a higher-order function "once", which returns a function that invokes the passed-in function only once and returns the previous result on subsequent calls. The trick is manipulating variables that were captured via closure:
var once = (func) => {
  var wasCalled = false, prevResult;
  return (...args) => {
    if (wasCalled) return prevResult;
    wasCalled = true;
    return prevResult = func(...args);
  }
}
It seems to me that it's not possible to create this function in Elixir, due to its different variable rebinding behavior. Is there some other clever way to do it via pattern matching or recursion, or is it just not possible? Without macros that is, I'd imagine those might enable it. Thanks
Using the current process dictionary:
defmodule A do
  def once(f) do
    key = make_ref()
    fn ->
      case Process.get(key) do
        {^key, val} -> val
        nil ->
          val = f.()
          Process.put(key, {key, val})
          val
      end
    end
  end
end
Or if the function will be passed across processes, an ets table can be used:
# ... during application initialization
:ets.new(:cache, [:set, :public, :named_table])
defmodule A do
  def once(f) do
    key = make_ref()
    fn ->
      case :ets.lookup(:cache, key) do
        [{^key, val}] -> val
        [] ->
          val = f.()
          :ets.insert(:cache, {key, val})
          val
      end
    end
  end
end
Application.put_env / Application.get_env can also be used to hold global state, though they are usually used for configuration settings.
It's not considered idiomatic in most cases, but you can do this with Agent:
defmodule A do
  def once(fun) do
    {:ok, agent} = Agent.start_link(fn -> nil end)
    fn args ->
      case Agent.get(agent, & &1) do
        nil ->
          result = apply(fun, args)
          :ok = Agent.update(agent, fn _ -> {:ok, result} end)
          result
        {:ok, result} ->
          result
      end
    end
  end
end
Now if you run this:
once = A.once(fn sleep ->
  :timer.sleep(sleep)
  1 + 1
end)
IO.inspect once.([1000])
IO.inspect once.([1000])
IO.inspect once.([1000])
IO.inspect once.([1000])
You'll see that the first line is printed after 1 second, but the next 3 are printed instantly, because the result is fetched from the agent.
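Note that the Agent version reads the state with Agent.get and writes it back with a separate Agent.update, so two processes calling the returned function at the same time could both run fun. One possible variation, sketched here under the assumption that running fun inside the agent process is acceptable, is to do the check and the call in a single Agent.get_and_update/3. The module name OnceAtomic and the state atoms are made up for this example.

```elixir
defmodule OnceAtomic do
  def once(fun) do
    {:ok, agent} = Agent.start_link(fn -> :not_called end)

    fn args ->
      # The check and the call happen in one step inside the agent process,
      # so concurrent callers are serialized and fun runs at most once.
      Agent.get_and_update(
        agent,
        fn
          :not_called ->
            result = apply(fun, args)
            {result, {:called, result}}

          {:called, result} = state ->
            {result, state}
        end,
        :infinity
      )
    end
  end
end

once = OnceAtomic.once(fn x -> x * 2 end)
IO.inspect(once.([2]))
IO.inspect(once.([5]))
```

The trade-off is that the agent is blocked while fun runs, and a crash inside fun takes the agent (and the linked caller) down with it.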
While both of the answers already given are perfectly valid, the most precise translation of your JavaScript is shown below:
defmodule M do
  use GenServer
  def start_link(_opts \\ []) do
    GenServer.start_link(__MODULE__, nil, name: __MODULE__)
  end
  def init(_args) do
    Process.sleep(1_000)
    {:ok, 42}
  end
  def value() do
    start_link()
    GenServer.call(__MODULE__, :value)
  end
  def handle_call(:value, _from, state) do
    {:reply, state, state}
  end
end
(1..5) |> Enum.each(&IO.inspect(M.value(), label: to_string(&1)))
Use the same metric as in @Dogbert’s answer: the first value is printed with a delay, and all subsequent ones are printed immediately.
This is an exact analog of your memoized function, using GenServer state. GenServer.start_link/3 returns one of the following:
{:ok, #PID<0.80.0>}
{:error, {:already_started, #PID<0.80.0>}}
That said, it is not restarted if it’s already started. I do not bother to check the returned value since we are all set in any case: if it’s the initial start, we call the heavy function; if it was already started, the value is already at our fingertips in the state.

HashDict and OTP GenServer context within Elixir

I am having trouble using the HashDict function within OTP. I would like to use one GenServer process to put and a different one to fetch. When I try and implement this, I can put and fetch items from the HashDict when calling from the same GenServer; it works perfectly (MyServerA in the example below). But when I use one GenServer to put and a different one to fetch, the fetch implementation does not work. Why is this? Presumably it's because I need to pass the HashDict data structure around between the three different processes?
Code example below:
I use a simple call to send some state to MyServerB:
MyServerA.add_update(state)
For MyServerB I have implemented the HashDict as follows:
defmodule MyServerB do
  use GenServer
  def start_link do
    GenServer.start_link(__MODULE__, [], name: __MODULE__)
  end
  def init([]) do
    # Initialise HashDict to store state
    d = HashDict.new
    {:ok, d}
  end
  # Client API
  def add_update(update) do
    GenServer.cast __MODULE__, {:add, update}
  end
  def get_state(key) do
    GenServer.call __MODULE__, {:get, key}
  end
  # Server APIs
  def handle_cast({:add, update}, dict) do
    %{key: key} = update
    dict = HashDict.put(dict, key, some_Value)
    {:noreply, dict}
  end
  def handle_call({:get, some_key}, _from, dict) do
    value = HashDict.fetch!(dict, some_key)
    {:reply, value, dict}
  end
end
So if from another process I use MyServerB.get_state(dict,some_key), I don't seem to be able to return the contents of the HashDict...
UPDATE:
So if I use ETS I have something like this:
def init do
  ets = :ets.new(:my_table,[:ordered_set, :named_table])
  {:ok, ets}
end
def handle_cast({:add, update}, state) do
  update = :ets.insert(:my_table, {key, value})
  {:noreply, ups}
end
def handle_call({:get, some_key}, _from, state) do
  sum = :ets.foldl(fn({{key},{value}}, acc)
      when key == some_Key -> value + acc
    (_, acc) ->
      acc
  end, 0, :my_table)
  {:reply, sum, state}
end
So again, the cast works: when I check with observer I can see the ETS table filling up with my key-value pairs. However, when I try my call it returns nothing again. So I'm wondering if I'm handling the state incorrectly? Any help gratefully received.
Thanks
Your problem is with this statement:
I would like to use one GenServer process to put and a different one to fetch.
In Elixir, processes cannot share state. So you cannot have one process holding data and another process reading it directly. You could, for example, store the HashDict in one process and then have the other process send a message to the first asking for data. That would make it appear as you describe; however, behind the scenes all transactions would still go through the first process. There are techniques for doing this in a distributed/concurrent fashion so that multiple cores are utilized, but that may be more work than you're looking to do at the moment.
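A minimal sketch of that "one process owns the data, others ask for it" arrangement might look like the following. KVStore is a hypothetical module name, and a plain Map is used in place of HashDict (which is deprecated in current Elixir); both put and fetch go through GenServer.call so that a read issued after a write is guaranteed to see it.

```elixir
defmodule KVStore do
  use GenServer

  # Client API: any process may call these; the data lives only in the server.
  def start_link(_opts \\ []), do: GenServer.start_link(__MODULE__, %{}, name: __MODULE__)
  def put(key, value), do: GenServer.call(__MODULE__, {:put, key, value})
  def fetch(key), do: GenServer.call(__MODULE__, {:fetch, key})

  # Server callbacks
  @impl true
  def init(state), do: {:ok, state}

  @impl true
  def handle_call({:put, key, value}, _from, state) do
    {:reply, :ok, Map.put(state, key, value)}
  end

  def handle_call({:fetch, key}, _from, state) do
    {:reply, Map.fetch(state, key), state}
  end
end

{:ok, _pid} = KVStore.start_link()
:ok = KVStore.put(:some_key, 42)

# Read from a different process than the one that wrote the value.
reader = Task.async(fn -> KVStore.fetch(:some_key) end)
IO.inspect(Task.await(reader))
```

A second GenServer that needs the data would call KVStore.fetch/1 in the same way the task does here.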
Take a look at ETS, which will allow you to create a public table and access the data from multiple processes.
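As a small illustration of a public ETS table being used from several processes, here is a sketch; the table name :shared_state and the key are arbitrary choices for the example. The table is created by the current process and then written and read from two separate tasks.

```elixir
# The creating process owns the table; :public lets any process read and write,
# and :named_table lets them refer to it by name instead of by table id.
:ets.new(:shared_state, [:set, :public, :named_table])

# One process writes...
writer = Task.async(fn -> :ets.insert(:shared_state, {:some_key, 42}) end)
Task.await(writer)

# ...and a different process reads.
reader = Task.async(fn -> :ets.lookup(:shared_state, :some_key) end)
found = Task.await(reader)
IO.inspect(found)
```

Keep in mind that the table is deleted when its owning process exits, so in an application it is usually created by a long-lived process such as a supervised GenServer.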
ETS is the way to go. Sharing a HashDict as state between GenServers is not possible.
I don't know how you are testing your code, but by default ETS has read and write concurrency set to false. If you have no problem with reading or writing concurrently, you can change your init function to:
def init(_args) do
  ets = :ets.new :my_table, [:ordered_set, :named_table,
                             read_concurrency: true,
                             write_concurrency: true]
  {:ok, ets}
end
Hope this helps.

Resources