How can Prometheus be used to measure latency on asynchronous calls?

I am using Prometheus to instrument my Scala code. It works fine with Counters for most of the app-related metrics.
When it comes to measuring latency, I am not sure how to use Summaries or Histograms (or some other metric type) to measure the latency of asynchronous calls.
Timer.observeDuration in a callback does not really do the trick, since the Timer is reset multiple times before one async call is completed.
What approach should I take to measure asynchronous latency using prometheus metrics?

You need to pass around the timer object from where you create it to where the call is finally complete, and only then call observeDuration.
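To make that concrete, here is a minimal sketch of the pattern. The question is about Scala, but the sketch uses the .NET client prometheus-net for illustration (the metric name, CallExternalAsync and SomeAsyncOperation are hypothetical; the JVM client's startTimer/observeDuration follows the same shape from Scala):

using System.Diagnostics;
using System.Threading.Tasks;
using Prometheus;

public static class LatencyExample
{
    // Hypothetical histogram, created once and shared across calls.
    private static readonly Histogram CallLatency = Metrics.CreateHistogram(
        "external_call_latency_seconds", "Latency of asynchronous external calls.");

    public static async Task CallExternalAsync()
    {
        var stopwatch = Stopwatch.StartNew();   // start timing where the call begins
        try
        {
            await SomeAsyncOperation();         // hypothetical asynchronous call
        }
        finally
        {
            // Observe exactly once, only when the call has actually completed.
            CallLatency.Observe(stopwatch.Elapsed.TotalSeconds);
        }
    }

    private static Task SomeAsyncOperation() => Task.Delay(100);
}

The key point is that the timing object (here a Stopwatch, in the JVM client a Timer) is created where the call starts and carried into the completion path, so the duration is observed once per call.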

Orleans - how to improve reminder precision

What is the best way to organize Reminders + Timers?
I want to run a task with one-second precision.
As I understand it, I need to run a reminder at some interval, and each tick of the reminder will fire a timer to hit the specific time.
But which reminder interval should I specify to achieve these goals:
Make a very durable solution (mitigating sudden silo scale-down, silo faults, etc.)
Maximize the likelihood of achieving the needed precision.
Minimize timer & reminder overhead.
Reminders are for approximate times (minutes), Timers are for precise times. You can create a robust solution by combining the two.
Obviously you will need something that initially creates the grain. You will also need persistent storage for the reminders. Then create a reminder AND a timer in OnActivateAsync().
In the reminder callback, check for the existence of the timer and recreate it if necessary.
In the timer callback, make sure you call a grain method:
private async Task SnapshotTimerFired()
{
    // Call via a grain reference (not directly on 'this') so the call goes through the Orleans runtime.
    var me = this.AsReference<IScoobyDoGrain>();
    await me.DoSomethingForAScoobySnack();
}
Because the timer makes a grain call, the runtime will know that the grain is alive and will try not to deactivate it due to memory pressure.
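Putting the whole answer together, a sketch of such a grain might look like this (assuming the classic Orleans 3.x APIs; the reminder name, the periods and the IScoobyDoGrain members are illustrative):

using System;
using System.Threading.Tasks;
using Orleans;
using Orleans.Runtime;

public interface IScoobyDoGrain : IGrainWithGuidKey
{
    Task DoSomethingForAScoobySnack();
}

public class ScoobyDoGrain : Grain, IScoobyDoGrain, IRemindable
{
    private IDisposable _timer;

    public override async Task OnActivateAsync()
    {
        // Reminder: persistent and durable, but coarse-grained (periods of minutes).
        await RegisterOrUpdateReminder("keep-alive", TimeSpan.FromMinutes(1), TimeSpan.FromMinutes(1));
        // Timer: precise (one second here), but only lives as long as this activation.
        _timer = RegisterTimer(_ => SnapshotTimerFired(), null,
                               TimeSpan.FromSeconds(1), TimeSpan.FromSeconds(1));
        await base.OnActivateAsync();
    }

    public Task ReceiveReminder(string reminderName, TickStatus status)
    {
        // The reminder re-activates the grain after a silo failure (which re-runs
        // OnActivateAsync); check for the timer and recreate it if necessary.
        if (_timer == null)
            _timer = RegisterTimer(_ => SnapshotTimerFired(), null,
                                   TimeSpan.FromSeconds(1), TimeSpan.FromSeconds(1));
        return Task.CompletedTask;
    }

    public Task DoSomethingForAScoobySnack() => Task.CompletedTask;

    private async Task SnapshotTimerFired()
    {
        // Call via a grain reference so the runtime sees the activation as busy.
        var me = this.AsReference<IScoobyDoGrain>();
        await me.DoSomethingForAScoobySnack();
    }
}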

Which wait mechanism is Polly using?

Polly has several retry features, for example WaitAndRetryForever. I looked in the documentation but couldn't find what exactly is used to make the thread wait until the next retry. I guess Polly uses System.Timers for this, or is it something completely different? Thanks for any help.
Asynchronous executions (fooAsyncPolicy.ExecuteAsync(...)) wait with Task.Delay(...), freeing the thread the caller was using while the delay occurs.
Synchronous executions (fooSyncPolicy.Execute(...)) wait between retries in a cancellable thread-blocking manner. This means that, for the synchronous (a):
action();
compared to the synchronous (b):
policy.Execute(action);
the following three things all hold:
(1) both (a) and (b) block progress from continuing (subsequent code does not run) until the statement has completed;
(2) (b) executes action on the same thread that (a) originally would have;
(3) (b) expresses exceptions (if the Policy operation does not intervene) in the same or as similar a way as possible to how (a) originally would have.
These semantics (1), (2) and (3) are intentional, to keep synchronously executing code with Polly as similar as possible in semantics/behaviour to executing the code without Polly, so that surrounding code needs little adjustment.
Anticipating a follow-up question: wouldn't it be possible to write the synchronous Polly Policy.Handle<T>().WaitAndRetry(...).Execute(action) so that it didn't block a thread while waiting before retrying? Yes, but no solution has been found that is preferable to letting the caller control the transition to TPL Tasks or async/await and then using Polly's ExecuteAsync(...).
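For concreteness, here are the two shapes side by side (a sketch; the exception type, retry counts and backoff are arbitrary, and CallServiceAsync/CallService are hypothetical dependencies):

using System;
using System.Net.Http;
using System.Threading.Tasks;
using Polly;

class PollyWaitExample
{
    static async Task Main()
    {
        // Asynchronous: waits between retries with Task.Delay, freeing the calling thread.
        var asyncPolicy = Policy
            .Handle<HttpRequestException>()
            .WaitAndRetryAsync(3, attempt => TimeSpan.FromSeconds(Math.Pow(2, attempt)));
        await asyncPolicy.ExecuteAsync(() => CallServiceAsync());

        // Synchronous: waits between retries by blocking the calling thread (cancellably).
        var syncPolicy = Policy
            .Handle<HttpRequestException>()
            .WaitAndRetry(3, attempt => TimeSpan.FromSeconds(Math.Pow(2, attempt)));
        syncPolicy.Execute(() => CallService());
    }

    static Task CallServiceAsync() => Task.CompletedTask;   // hypothetical async dependency
    static void CallService() { }                           // hypothetical sync dependency
}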

OpenCL: No. of iterations in profiling API

Trying to use clGetEventProfilingInfo for timing my kernels.
Is there any facility to specify a number of iterations over which the start and end times are reported?
If the kernel is run only once then, of course, there is a lot of overhead associated with it. So to get the best timing we should run the kernel several times and take the average time.
Is there such a parameter when profiling using the API? (We do have such parameters when we use third-party profiling tools.)
The clGetEventProfilingInfo function will return profiling information for a single event, which corresponds to a single enqueued command. There is no built-in mechanism to automatically report information across a number of calls; you'll have to code that yourself.
It's pretty straightforward to do - just query the start and end times for each event you care about and add them up. If you are only running a single kernel (in a loop), then you could just use a wall-clock timer (with clFinish before you start and stop timing), or take the difference between the time the first event started and the last event finished.

Discrete Event Simulation without global Queue?

I am thinking about modelling a material flow network. There are processes which operate at a certain speed, buffers which can overflow or underflow and connections between these.
I don't see any problems modelling this in a classic Discrete Event Simulation (DES) fashion using a global event queue. I tried modelling the system without a queue but failed in early stages. Still I do not understand the underlying reason why a queue is needed, at least not for events which originate "inside" the network.
The idea of a queue-less DES is to treat the whole network as a function which takes a stream of events from the outside world and returns a stream of state changes. Every node in the network should only be affected by nodes which are directly connected to it. I have set some hopes on Haskell's arrows and Functional Reactive Programming (FRP) in general, but I am still learning.
An event queue looks too "global" to me. If my network falls apart into two subnets with no connections between them and I only ask questions about the state changes of one subnet, the other subnet should not do any computations at all. I could use two event queues in that case. However, as soon as I connect the two subnets I would have to put all events into a single queue. I don't like the idea, that I need to know the topology of the network in order to set up my queue(s).
So
is anybody aware of DES algorithms which do not need a global queue?
is there a reason why this is difficult or even impossible?
is FRP useful in the context of DES?
To answer the first point, no I'm not aware of any discrete-event simulation (DES) algorithms that do not need a global event queue. It is possible to have a hierarchy of event queues, in which each event queue is represented in its parent event queue as an event (corresponding to the time of its next event). If a new event is added to an event queue such that it becomes the queue's next event, then the event queue needs to be rescheduled in its parent to preserve the order of event execution. However, you will ultimately still boil down to a single, global event queue that is the parent of all of the others in hierarchy, and which dispatches each event.
Alternatively, you could dispense with DES and perform something more akin to a programmable logic controller (PLC) which reevaluates the state of the entire network every small increment of time. However, typically, that would be a lot slower (it may not even run as fast as real-time), because most of the time it would have nothing to do. If you pick too big a time increment, the simulation may lose accuracy.
The simplest answer to the second point is that, ultimately, to the best of my knowledge, it is impossible to do without a global event queue. Each simulation event needs to execute at the correct time, and - since time cannot run backwards - the order in which events are dispatched matters. The current simulation time is defined by the time that the current event executes. If you have separate event queues, you also have separate clocks, which would make things very confusing, to say the least.
In your case, if your subnetworks are completely independent, you could simulate each subnetwork individually. However, if the state of one subnetwork affects the state of the total network, and the state of the total network affects the state of each subnetwork, then - since an event is influenced by the events that preceded it, can only influence the events that follow, but cannot influence what preceded it - you have to simulate the whole network with a global event queue.
If it's any consolation, a true DES simulation does not perform any processing in between events (other than determining what the next event is), so there should be no wasted processing in one subnetwork if all the action is taking place in another.
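To make the "no processing between events" point concrete, here is a minimal single-queue engine (a sketch in C# using .NET 6's PriorityQueue; the event actions for a material-flow network would go where the example action is):

using System;
using System.Collections.Generic;

class Simulation
{
    // The single global event queue: pending actions ordered by scheduled time.
    private readonly PriorityQueue<Action, double> _queue = new();

    public double Now { get; private set; }

    public void Schedule(double delay, Action action) =>
        _queue.Enqueue(action, Now + delay);

    public void Run()
    {
        while (_queue.TryDequeue(out var action, out var time))
        {
            Now = time;    // the clock jumps straight to the next event...
            action();      // ...so nothing is computed in between events
        }
    }
}

// Usage: an idle subnetwork simply contributes no events to the queue.
// var sim = new Simulation();
// sim.Schedule(5.0, () => Console.WriteLine($"buffer overflow at t={sim.Now}"));
// sim.Run();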
Finally, functional reactive programming (FRP) is absolutely useful in the context of a DES. Indeed, I now write a lot of my DES simulations in Scala using this approach.
I hope this helps!
UPDATE: Since writing the above, I've used Sodium (an excellent FRP library, which was referenced by the OP in the comments), and can add some further explanation: Sodium provides a means for subscribing to events, and for performing actions when those events occur. However, here I'm using the term event in a general sense, such as a button being clicked by a user in a GUI, or a network packet arriving, etc. In other words, the events are not necessarily simulation events.
You can still use Sodium—or any other FRP library—as part of a simulation, to subscribe to simulation events and perform actions when they occur; however, these tools typically have no built-in support for simulation, and so you must incorporate a simulation engine as the source of simulation events, in the same way that a GUI is incorporated as the source of user interaction events. It is within this engine that the global event queue must reside.
Incidentally, if you are trying to perform parallel or distributed simulation model execution, things get considerably more complicated. You have multiple event queues in these situations, but they must be synchronized (giving the appearance of a single queue). The two basic approaches are conservative synchronization and optimistic synchronization.

How to write integration tests for systems that interact asynchronously

Assume that I have a function called PlaceOrder, which when called inserts the order details into a local DB and puts a message (the order details) onto a TIBCO EMS queue.
Once the message is received, a TIBCO BW process will then invoke some other system (say, ExternalSystem) to pass on the order details.
Now, the way I wrote my integration tests is:
Call PlaceOrder
Sleep, and check that the details exist in the local DB
Sleep, and check that the details exist in ExternalSystem.
Is the above approach correct? The above test gives me confidence that the end-to-end integration is working, but is there a better way to test this scenario?
The problem you describe is quite common, and your approach is a very typical solution.
The problem with this solution is that if the delay is too short, your tests may sometimes pass and sometimes fail; but if the delay is very long, then you're just wasting time waiting, and with many tests, that can add up to a lot of delay. Unless you can get some signal to tell you the order has arrived in the database, you just have to wait.
You can reduce the delay by doing lots of checks at short intervals. If your order is not there after a timeout, then you fail the test.
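For example, with a small polling helper along these lines (a sketch; OrderExistsInDb and the timeout values are hypothetical):

using System;
using System.Threading;

static class Poll
{
    // Re-check `condition` every `interval` until it holds or `timeout` elapses.
    public static bool Eventually(Func<bool> condition, TimeSpan timeout, TimeSpan interval)
    {
        var deadline = DateTime.UtcNow + timeout;
        while (DateTime.UtcNow < deadline)
        {
            if (condition())
                return true;          // succeed as soon as the order shows up
            Thread.Sleep(interval);   // otherwise wait briefly and re-check
        }
        return condition();           // one final check at the timeout
    }
}

// In the test, replacing the fixed sleeps:
// Assert.True(Poll.Eventually(() => OrderExistsInDb(orderId),
//     TimeSpan.FromSeconds(10), TimeSpan.FromMilliseconds(100)));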
In "Growing Object-Oriented Software, Guided by Tests"*, there is a chapter on this very subject, so you might want to get a copy if you will be doing a lot of this sort of testing.
"There are two ways a test can observe the system: by sampling its observable state or by listening for events that it sends out. Of these, sampling is often the only option because many systems don’t send any monitoring events. It’s quite common for a test to include both techniques to interact with different “ends” of its system"
(*) http://my.safaribooksonline.com/book/software-engineering-and-development/software-testing/9780321574442
