I am having trouble using the HashDict module within a GenServer. I would like to use one GenServer process to put and a different one to fetch. When I try to implement this, I can put and fetch items from the HashDict when calling from the same GenServer; it works perfectly (MyServerA in the example below). But when I use one GenServer to put and a different one to fetch, the fetch implementation does not work. Why is this? Presumably it's because I need to pass the HashDict data structure around between the different processes?
Code example below:
I use a simple call to send some state to MyServerB:
MyServerB.add_update(state)
For MyServerB I have implemented the HashDict as follows:
defmodule MyServerB do
  use GenServer

  def start_link do
    GenServer.start_link(__MODULE__, [], name: __MODULE__)
  end

  def init([]) do
    # Initialise HashDict to store state
    d = HashDict.new
    {:ok, d}
  end

  # Client API
  def add_update(update) do
    GenServer.cast __MODULE__, {:add, update}
  end

  def get_state(key) do
    GenServer.call __MODULE__, {:get, key}
  end

  # Server APIs
  def handle_cast({:add, update}, dict) do
    %{key: key} = update
    dict = HashDict.put(dict, key, update)
    {:noreply, dict}
  end

  def handle_call({:get, some_key}, _from, dict) do
    value = HashDict.fetch!(dict, some_key)
    {:reply, value, dict}
  end
end
So if from another process I call MyServerB.get_state(some_key), I don't seem to be able to return the contents of the HashDict...
UPDATE:
So if I use ETS I have something like this:
def init(_) do
  ets = :ets.new(:my_table, [:ordered_set, :named_table])
  {:ok, ets}
end

def handle_cast({:add, update}, state) do
  # assuming the update is a map with :key and :value
  %{key: key, value: value} = update
  true = :ets.insert(:my_table, {key, value})
  {:noreply, state}
end

def handle_call({:get, some_key}, _from, state) do
  sum =
    :ets.foldl(fn
      {key, value}, acc when key == some_key -> value + acc
      _, acc -> acc
    end, 0, :my_table)

  {:reply, sum, state}
end
So again, the cast works - when I check with Observer I can see the table filling up with my key-value pairs. However, when I try my call it returns nothing again. So I'm wondering if I'm handling the state incorrectly? Any help gratefully received.
Thanks
Your problem is with this statement:
I would like to use one GenServer process to put and a different one to fetch.
In Elixir, processes cannot share state, so you cannot have one process holding data and another process reading it directly. You could, for example, store the HashDict in one process and then have the other process send a message to the first asking for data. That would make it appear as you describe; however, behind the scenes all transactions would still go through the first process. There are techniques for doing this in a distributed/concurrent fashion so that multiple cores are utilized, but that may be more work than you're looking to do at the moment.
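As a rough sketch of that message-passing approach (using the MyServerB module from the question, with a made-up MyReader module purely for illustration): any other process just calls the client API, and the lookup itself still happens inside the process that owns the HashDict:

defmodule MyReader do
  # Hypothetical caller: the dict never leaves MyServerB; we only send it
  # a {:get, key} request via GenServer.call and receive the reply.
  def read(key) do
    MyServerB.get_state(key)
  end
end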
Take a look at ETS, which will allow you to create a public table and access the data from multiple processes.
ETS is the way to go. Sharing a HashDict as state between GenServers is not possible.
I really don't know how you are testing your code, but ETS has read and write concurrency set to false by default. If, for example, you have no problem with reading or writing concurrently, then you can change your init function to:
def init(_) do
  ets = :ets.new(:my_table, [:ordered_set, :named_table,
                             read_concurrency: true,
                             write_concurrency: true])
  {:ok, ets}
end
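Since the table is created as a :named_table (and is :protected by default), other processes can also read it directly by name; only the owning process may write. A rough sketch of a lookup from some other process, assuming the {key, value} tuples from the question:

case :ets.lookup(:my_table, some_key) do
  [{_key, value}] -> value
  [] -> :not_found
end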
Hope this helps.
Related
Is there a built-in facility or some operator that will run a sensor and negate its status? I am writing a workflow that needs to detect that an object does not exist in order to proceed to eventual success. I have a sensor, but it detects when the object does exist.
For instance, I would like my workflow to detect that an object does not exist. I need almost exactly S3KeySensor, except that I need to negate its status.
The use case you are describing is checking a key in S3: if it exists, wait; otherwise continue the workflow. As you mentioned, this is a Sensor use case. The S3Hook has a check_for_key function that checks whether a key exists, so all that is needed is to wrap it with the sensor's poke function.
A simple basic implementation would be:
from typing import TYPE_CHECKING, Optional, Sequence, Union

from airflow.providers.amazon.aws.hooks.s3 import S3Hook
from airflow.sensors.base import BaseSensorOperator

if TYPE_CHECKING:
    from airflow.utils.context import Context


class S3KeyNotPresentSensor(BaseSensorOperator):
    """Waits for a key to not be present in S3."""

    template_fields: Sequence[str] = ('bucket_key', 'bucket_name')

    def __init__(
        self,
        *,
        bucket_key: Union[str, Sequence[str]],
        bucket_name: Optional[str] = None,
        aws_conn_id: str = 'aws_default',
        verify: Optional[Union[str, bool]] = None,
        **kwargs,
    ):
        super().__init__(**kwargs)
        self.bucket_name = bucket_name
        self.bucket_key = [bucket_key] if isinstance(bucket_key, str) else bucket_key
        self.aws_conn_id = aws_conn_id
        self.verify = verify
        self.hook: Optional[S3Hook] = None

    def poke(self, context: 'Context'):
        # Succeed only when none of the keys exist.
        return not any(
            self.get_hook().check_for_key(key, self.bucket_name)
            for key in self.bucket_key
        )

    def get_hook(self) -> S3Hook:
        """Create and return an S3Hook."""
        if self.hook:
            return self.hook
        self.hook = S3Hook(aws_conn_id=self.aws_conn_id, verify=self.verify)
        return self.hook
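A hypothetical usage inside a DAG could look like the following (the DAG id, bucket and key below are made up for illustration):

from datetime import datetime

from airflow import DAG

with DAG('wait_for_missing_key', start_date=datetime(2023, 1, 1), schedule_interval=None) as dag:
    key_absent = S3KeyNotPresentSensor(
        task_id='key_absent',
        bucket_key='path/to/object.csv',  # hypothetical key
        bucket_name='my-example-bucket',  # hypothetical bucket
        poke_interval=60,
        timeout=60 * 60,
        mode='reschedule',
    )

Downstream tasks would then only run once the key is absent.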
I ended up going another way. I can use the trigger_rule argument of (any) Task -- by setting it to one_failed or all_failed on the next task I can play around with the desired status.
For example,
file_exists = FileSensor(task_id='exists', timeout=3, poke_interval=1, filepath='/tmp/error', mode='reschedule')
sing = SmoothOperator(task_id='sing', trigger_rule='all_failed')
file_exists >> sing
It requires no added code or operator, but has the possible disadvantage of being somewhat surprising.
Replying to myself in the hope that this may be useful to someone else. Thanks!
I'm new to Elixir and I'm trying to find something similar to Python's context managers.
Problem:
I have a bunch of functions and I want to add a latency metric around them.
Now we have:
def method_1 do
  ...
end

def method_2 do
  ...
end
... more methods
I'd like to have:
def method_1 do
  start = System.monotonic_time()
  ...
  finish = System.monotonic_time()
  emit_metric(finish - start)
end

def method_2 do
  start = System.monotonic_time()
  ...
  finish = System.monotonic_time()
  emit_metric(finish - start)
end
... more methods
Now the code duplication is a problem:
start = System.monotonic_time()
...
finish = System.monotonic_time()
emit_metric(finish - start)
So what is a better way to avoid code duplication in this case? I like the context manager idea in Python, but I'm not sure how I can achieve something similar in Elixir. Thanks for the help in advance!
In Erlang/Elixir this is done through higher-order functions. Take a look at BEAM telemetry: it is an Erlang and Elixir library/standard for collecting metrics and instrumenting your code, widely adopted by Phoenix, Ecto, cowboy and other libraries. Specifically, you'd be interested in the :telemetry.span/3 function, as it emits start time and duration measurements by default:
def some_function(args) do
  :telemetry.span([:my_app, :my_function], %{metadata: "Some data"}, fn ->
    result = do_some_work(args)
    {result, %{more_metadata: "Some data here"}}
  end)
end

def do_some_work(args) # actual work goes here
And then, in some other area of your code, you listen to those events and log them/send them to your APM:
:telemetry.attach_many(
  "test-telemetry",
  [[:my_app, :my_function, :start],
   [:my_app, :my_function, :stop],
   [:my_app, :my_function, :exception]],
  fn event, measurements, metadata, config ->
    # Handle the actual event.
  end,
  nil
)
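A rough sketch of such a handler, assuming you only want to log the duration reported on the :stop event (the module name here is made up):

defmodule MyApp.TelemetryLogger do
  require Logger

  # :telemetry.span/3 reports the elapsed time (in native time units)
  # under measurements.duration on the :stop and :exception events.
  def handle_event([:my_app, :my_function, :stop], measurements, _metadata, _config) do
    ms = System.convert_time_unit(measurements.duration, :native, :millisecond)
    Logger.info("my_function took #{ms}ms")
  end

  def handle_event(_event, _measurements, _metadata, _config), do: :ok
end

You would then pass &MyApp.TelemetryLogger.handle_event/4 as the handler in :telemetry.attach_many/4 instead of the inline fn above.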
I think the closest thing to a Python context manager would be to use higher-order functions, i.e. functions taking a function as an argument.
So you could have something like:
def measure(fun) do
  start = System.monotonic_time()
  result = fun.()
  stop = System.monotonic_time()
  emit_metric(stop - start)
  result
end
And you could use it like:
measure(fn ->
  do_stuff()
  ...
end)
Note: there are other similar instances where you would use a context manager in Python that would be handled in a similar way in Elixir; off the top of my head, Django has a context manager for transactions, but Ecto uses a higher-order function for the same thing.
PS: to measure elapsed time, you probably want to use :timer.tc/1 though:
def measure(fun) do
  {elapsed, result} = :timer.tc(fun)
  emit_metric(elapsed)
  result
end
There is actually a really nifty library called Decorator in which macros can be used to "wrap" your functions to do all sorts of things.
In your case, you could write a decorator module (thanks to @maciej-szlosarczyk for the telemetry example):
defmodule MyApp.Measurements do
  use Decorator.Define, measure: 0

  def measure(body, context) do
    meta = Map.take(context, [:name, :module, :arity])

    quote do
      # Pass the metadata information about module/name/arity as metadata to be accessed later
      :telemetry.span([:my_app, :measurements, :function_call], unquote(meta), fn ->
        {unquote(body), %{}}
      end)
    end
  end
end
You can set up a telemetry listener in your Application.start definition:
:telemetry.attach_many(
  "my-app-measurements",
  [[:my_app, :measurements, :function_call, :start],
   [:my_app, :measurements, :function_call, :stop],
   [:my_app, :measurements, :function_call, :exception]],
  &MyApp.MeasurementHandler.handle_telemetry/4,
  nil
)
Then in any module with a function call you'd like to measure, you can "decorate" the functions like so:
defmodule MyApp.Domain.DoCoolStuff do
  use MyApp.Measurements

  @decorate measure()
  def awesome_function(a, b, c) do
    # regular function logic
  end
end
Although this example uses telemetry, you could just as easily print out the time difference within your decorator definition.
I am pretty new to Elixir and functional programming in general, and I was wondering, in the specific example below, how I can add 'nodes' to the currently empty list in the third parameter of the spawn/3 function.
def create(n) do
  nodes = Enum.map(1..n, fn(_) -> spawn(Node, :begin, []) end)
end
For example what I am trying to do is similar to this:
def create(n) do
  nodes = Enum.map(1..n, fn(_) -> spawn(Node, :begin, [nodes]) end)
end
I have tried piping and pre-declaring nodes, but with those approaches the processes are already spawned and the begin function has already been triggered.
What I am trying to do, and what I need nodes for, is in the Node module, as follows:
defmodule Node do
  def begin(nodes) do
    # do stuff with nodes here
  end
end
If I understand your problem correctly, I believe you'll need to create the nodes first.
Enum.map(1..n, fn ... create node here ... end)
Then pipe them to a map operation that will evaluate your nodes using the Node.begin function.
Enum.map(nodes, &Node.begin/1)
If Node.begin is some expensive (heavy-workload) operation, you could take advantage of async Tasks to handle the heavy workload in separate processes.
nodes
|> Enum.map(&(Task.async(fn -> Node.begin(&1) end)))
|> Enum.map(&Task.await/1)
FYI: &Node.begin/1 passes the Node.begin function to the callback of Enum.map. The &(...) part inside of the map call is a shorthand anonymous function.
Here's the final code:
# If Node.begin is a lightweight operation
def create(n) do
  Enum.map(1..n, fn _ -> ... create nodes here ... end)
  |> Enum.map(&Node.begin/1)
end

# If Node.begin will take some muscle to complete
def create(n) do
  Enum.map(1..n, fn _ -> ... create nodes here ... end)
  |> Enum.map(&(Task.async(fn -> Node.begin(&1) end)))
  |> Enum.map(&Task.await/1)
end
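For instance, a contrived but runnable sketch of the first variant, assuming a "node" is simply a map holding its index (replace that with whatever your real node-creation step produces):

def create(n) do
  1..n
  |> Enum.map(fn i -> %{id: i} end)   # hypothetical node data
  |> Enum.map(&Node.begin/1)
end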
Suppose I have a function such as:
query_server : Server.t -> string Or_error.t Deferred.t
Then I produce a list of deferred queries:
let queries : string Or_error.t Deferred.t list = List.map servers ~f:query_server
How to get the result of the first query that doesn't fail (or some error otherwise). Basically, I'd like a function such as:
any_non_error : 'a Or_error.t Deferred.t list -> 'a Or_error.t
Also, I'm not sure how to aggregate the errors. Maybe my function needs an extra parameter such as Error.t -> Error.t -> Error.t, or is there a standard way to combine errors?
A simple approach would be to use Deferred.List, which contains list operations lifted into the Async monad (basically a container interface in the Kleisli category). We will try each server in order until the first one succeeds, e.g.,
let first_non_error =
Deferred.List.find ~f:(fun s -> query_server s >>| Result.is_ok)
Of course, it is not any_non_error, as the processing is sequential. Also, we are losing the error information (though the latter is very easy to fix).
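For example, here is a sketch of the sequential version that keeps the errors, assuming the same query_server signature as above and that Core and Async are open; if every server fails, all collected errors are combined with Error.of_list:

let first_non_error servers =
  Deferred.List.fold servers ~init:(Error []) ~f:(fun acc s ->
      match acc with
      | Ok _ as ok -> return ok           (* already succeeded, skip the rest *)
      | Error errs ->
        query_server s >>| function
        | Ok x -> Ok x
        | Error e -> Error (e :: errs))
  >>| function
  | Ok x -> Ok x
  | Error errs -> Error (Error.of_list (List.rev errs))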
So to make it parallel, we will employ the following strategy. We will have two deferred computations: the first will run all queries in parallel and wait until all are ready; the second will become determined as soon as an Ok result is received. If the first one becomes determined before the second, it means that all servers failed. So let's try:
let query_servers servers =
  let a_success, got_success = Pipe.create () in
  let all_done =
    Deferred.List.map ~how:`Parallel servers ~f:(fun s ->
        query_server s >>| function
        | Error _ as e -> e
        | Ok x as ok -> Pipe.write_without_pushback got_success x; ok)
  in
  Deferred.any [
    (* determined only once every query has finished; if no success was
       written to the pipe, this combines all the errors *)
    (all_done >>| Or_error.find_ok);
    (Pipe.read a_success >>| function
      | `Ok x -> Ok x
      | `Eof -> assert false);
  ]
Just for learning purposes, I tried to use a dictionary as a global variable via an accumulator. The add function works well, but when I run the code and update the dictionary inside the map function, it always returns empty.
But similar code that sets a list as a global variable works.
import re

from pyspark.accumulators import AccumulatorParam


class DictParam(AccumulatorParam):
    def zero(self, value=""):
        return dict()

    def addInPlace(self, acc1, acc2):
        acc1.update(acc2)


if __name__ == "__main__":
    sc, sqlContext = init_spark("generate_score_summary", 40)
    rdd = sc.textFile('input')
    # print(rdd.take(5))
    dict1 = sc.accumulator({}, DictParam())

    def file_read(line):
        global dict1
        ls = re.split(',', line)
        dict1 += {ls[0]: ls[1]}
        return line

    rdd = rdd.map(lambda x: file_read(x)).cache()
    print(dict1)
For anyone who arrives at this thread looking for a Dict accumulator for pyspark: the accepted solution does not solve the posed problem.
The issue is actually in the DictParam defined: addInPlace does not return the updated dictionary. This works:
class DictParam(AccumulatorParam):
    def zero(self, value=""):
        return dict()

    def addInPlace(self, value1, value2):
        value1.update(value2)
        return value1
The original code was missing the return value.
I believe that print(dict1) simply gets executed before the rdd.map() does.
In Spark, there are 2 types of operations:
transformations, which describe the future computation,
and actions, which actually trigger the execution.
Accumulators are updated only when some action is executed:
Accumulators do not change the lazy evaluation model of Spark. If they
are being updated within an operation on an RDD, their value is only
updated once that RDD is computed as part of an action.
If you check out the end of this section of the docs, there is an example exactly like yours:
accum = sc.accumulator(0)

def g(x):
    accum.add(x)
    return f(x)

data.map(g)
# Here, accum is still 0 because no actions have caused the `map` to be computed.
So you would need to add some action, for instance:
rdd = rdd.map(lambda x: file_read(x)).cache() # transformation
foo = rdd.count() # action
print(dict1)
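For completeness, a self-contained sketch combining both fixes from this thread (the return in addInPlace and an action that forces the map to run), using a small made-up in-memory RDD:

from pyspark import SparkContext
from pyspark.accumulators import AccumulatorParam


class DictParam(AccumulatorParam):
    def zero(self, value=""):
        return dict()

    def addInPlace(self, value1, value2):
        value1.update(value2)
        return value1  # returning the merged dict is essential


if __name__ == "__main__":
    sc = SparkContext(appName="dict-accumulator-demo")
    dict1 = sc.accumulator({}, DictParam())

    def file_read(line):
        k, v = line.split(',', 1)
        dict1.add({k: v})
        return line

    rdd = sc.parallelize(["a,1", "b,2", "c,3"]).map(file_read)
    rdd.count()          # the action triggers the map, updating dict1
    print(dict1.value)   # e.g. {'a': '1', 'b': '2', 'c': '3'}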
Please make sure to check on the details of various RDD functions and accumulator peculiarities because this might affect the correctness of your result. (For instance, rdd.take(n) will by default only scan one partition, not the entire dataset.)
For accumulator updates performed inside actions only, Spark guarantees that each task's update to the accumulator will only be applied once, i.e. restarted tasks will not update the value.