I'm wrapping my head around Akka from a node.js perspective. I do the following node.js code below, and it's easy to write & follow. I can easily extend to handle and aggregate multiple services async, all while holding onto initial argument state data (a in example).
function handleAction(a, b, callback) {
remoteServiceOperation(b, function(err, data) {
if (!err) {
// I can reference argument a
z = a + data; // Do work with a and service result
}
});
}
Below is rough pseudo for Akka/Scala. My understanding is ask ? blocks, and should be generally avoided. Attempting to illustrate where my knowledge train ends, and how I'm not clear about how to hold state a, aggregate (not-shown), or perhaps structure Akka in a node.js style in general.
receive {
case handleAction(a) =>
remoteService ! new RemoteServiceOperation(b, c)
// About to leave and we'll loose `a`
case remoteServiceOperationResponse(data) =>
// `a` is afk
}
How can I write Akka more like node?
I guess your callback well-known style of programming in nodejs maps better of use of Futures than Actors.
It looks like this:
val service1Res: Future[Int] = service1()
val service2Res: Future[Int] = service2()
for {
i <- service1Res
j <- service2Res
} yield i+j + 2
Many people say that actors are used best for holding state and futures for things like this. The problem you will encounter when modelling everything with actors is an explosion of messages you will need to define to communicate those actors.
If you really need or want to use actors when you have that handleAction message spawn a new actor that will complete the work and will get the a in c-tor. Additional benefit of this you will have nice supervision what is probably the least used options of akka and at the same time most valuable.
Related
I am new to akka and still trying to understand the different akka and streaming concepts. For some new feature i need to add a http call to already existing stream which is working on an internal object. Something like this -
val step1Flow = Flow[SampleObject].filter(...--Filtering condition--...)
val step2Flow = Flow[SampleObject].map(obj => {
...
-- Business logic to update values in the obj --
...
})
...
override val flowGraph: Flow[SampleObject, SampleObject, NotUsed] =
bufferIn.via(Flow.fromGraph(GraphDSL.create() {
implicit builder =>
import GraphDSL.Implicits._
...
val step1 = builder.add(step1Flow)
val step2 = builder.add(step2Flow)
val step3 = builder.add(step3Flow)
...
source ~> step1 ~> step2 ~> step3 ~> merge
...
}
I need to add the new http request flow (lets call it newFlow) after step1. All these flow have Inlet and Outlet as SampleObject. Now my understanding is that the newFlow would need to be blocking because the outlet need to be SampleObject only. For that I have used Await function on the http call future. The code looks like this -
val responseFuture: Future[(Try[HttpResponse], SomeContext)] =
Source
.single(httpRequest -> context)
.via(Retry(retrySettings).join(clientFlow))
.runWith(Sink.head)
...
val (httpTry, passedAlongContext) = Await.result(responseFuture, 30.seconds)
-- logic to process response and return SampleObject --
Now this works fine but i think there should be a better way to do this without using wait. Also i think this would block the main thread till the request completes, which is going to affect the app throughput.
Could you please guide if the approach i used is correct or not. And how do i make use of some other thread pool to handle these blocking call so my main threadpool is not affected
This question seems very similar to mine but i do not understand it completely - connect Akka HTTP to Akka stream . Also i can't change the step2 or further flows.
EDIT : Added some code details for the stream
I ended up using the approach mentioned in the question because i couldn't find anything better after looking around. Adding this step decreased the throughput of my application as expected, but there are approaches to increase that can be used. Check these awesome blogs by Colin Breck -
https://blog.colinbreck.com/maximizing-throughput-for-akka-streams/
https://blog.colinbreck.com/partitioning-akka-streams-to-maximize-throughput/
To summarize -
Use Asynchronous Boundaries for flows which are blocking.
Use Futures if possible and add callbacks to futures. There are several ways to do that.
Use Buffers. There are several types of buffers available, choose what suits your needs.
Other than these, you can use inbuilt flows like -
Use "Broadcast" to broadcast your events to multiple consumers.
Use "Partition" to partition your stream into multiple streams based
on some condition.
Use "Balance" to partition your stream when there is no logical way to partition your events or they all could have different work loads.
You could use any one or multiple things from above options.
I am writing unit tests for some async sections of my code (returning Futures) that also involves the need to mock a Scala object.
Following these docs, I can successfully mock the object's functions. My question stems from the fact that withObjectMocked[FooObject.type] returns Unit, where async tests in scalatest require either an Assertion or Future[Assertion] to be returned. To get around this, I'm creating vars in my tests that I reassign within the function sent to withObjectMocked[FooObject.type], which ends up looking something like this:
class SomeTest extends AsyncWordSpec with Matchers with AsyncMockitoSugar with ResetMocksAfterEachAsyncTest {
"wish i didn't need a temp var" in {
var ret: Future[Assertion] = Future.failed(new Exception("this should be something")) // <-- note the need to create the temp var
withObjectMocked[SomeObject.type] {
when(SomeObject.someFunction(any)) thenReturn Left(Error("not found"))
val mockDependency = mock[SomeDependency]
val testClass = ClassBeingTested(mockDependency)
ret = testClass.giveMeAFuture("test_id") map { r =>
r should equal(Error("not found"))
} // <-- set the real Future[Assertion] value here
}
ret // <-- finally, explicitly return the Future
}
}
My question then is, is there a better/cleaner/more idiomatic way to write async tests that mock objects without the need to jump through this bit of a hoop? For some reason, I figured using AsyncMockitoSugar instead of MockitoSugar would have solved that for me, but withObjectMocked still returns Unit. Is this maybe a bug and/or a candidate for a feature request (the async version of withObjectMocked returning the value of the function block rather than Unit)? Or am I missing how to accomplish this sort of task?
You should refrain from using mockObject in a multi-thread environment as it doesn't play well with it.
This is because the object code is stored as a singleton instance, so it's effectively global.
When you use mockObject you're efectibly forcefully overriding this var (the code takes care of restoring the original, hence the syntax of usign it as a "resource" if you want).
Because this var is global/shared, if you have multi-threaded tests you'll endup with random behaviour, this is the main reason why no async API is provided.
In any case, this is a last resort tool, every time you find yourself using it you should stop and ask yourself if there isn't anything wrong with your code first, there are quite a few patterns to help you out here (like injecting the dependency), so you should rarely have to do this.
I have several functions that deal with database interactions. (like readModelById, updateModel, findModels, etc) that I try to use in a functional style.
In OOP, I'd create a class that takes DB-connection-parameters in the constructor, creates the database-connection and save the DB-handle in the instance. The functions then would just use the DB-handle from "this".
What's the best way in FP to deal with this? I don't want to hand around the DB handle throughout the entire application. I thought about partial application on the functions to "bake in" the handle, but that creates ugly boilerplate code, doing it one by one and handing it back.
What's the best practice/design pattern for things like this in FP?
There is a parallel to this in OOP that might suggest the right approach is to take the database resource as parameter. Consider the DB implementation in OOP using SOLID principles. Due to Interface Segregation Principle, you would end up with an interface per DB method and at least one implementation class per interface.
// C#
public interface IGetRegistrations
{
public Task<Registration[]> GetRegistrations(DateTime day);
}
public class GetRegistrationsImpl : IGetRegistrations
{
public Task<Registration[]> GetRegistrations(DateTime day)
{
...
}
private readonly DbResource _db;
public GetRegistrationsImpl(DbResource db)
{
_db = db;
}
}
Then to execute your use case, you pass in only the dependencies you need instead of the whole set of DB operations. (Assume that ISaveRegistration exists and is defined like above).
// C#
public async Task Register(
IGetRegistrations a,
ISaveRegistration b,
RegisterRequest requested
)
{
var registrations = await a.GetRegistrations(requested.Date);
// examine existing registrations and determine request is valid
// throw an exception if not?
...
return await b.SaveRegistration( ... );
}
Somewhere above where this code is called, you have to new up the implementations of these interfaces and provide them with DbResource.
var a = new GetRegistrationsImpl(db);
var b = new SaveRegistrationImpl(db);
...
return await Register(a, b, request);
Note: You could use a DI framework here to attempt to avoid some boilerplate. But I find it to be borrowing from Peter to pay Paul. You pay as much in having to learn a DI framework and how to make it behave as you do for wiring the dependencies yourself. And it is another tech new team members have to learn.
In FP, you can do the same thing by simply defining a function which takes the DB resource as a parameter. You can pass functions around directly instead of having to wrap them in classes implementing interfaces.
// F#
let getRegistrations (db: DbResource) (day: DateTime) =
...
let saveRegistration (db: DbResource) ... =
...
The use case function:
// F#
let register fGet fSave request =
async {
let! registrations = fGet request.Date
// call your business logic here
...
do! fSave ...
}
Then to call it you might do something like this:
register (getRegistrations db) (saveRegistration db) request
The partial application of db here is analogous to constructor injection. Your "losses" from passing it to multiple functions is minimal compared to the savings of not having to define interface + implementation for each DB operation.
Despite being in a functional-first language, the above is in principle the same as the OO/SOLID way... just less lines of code. To go a step further into the functional realm, you have to work on eliminating side effects in your business logic. Side effects can include: current time, random numbers, throwing exceptions, database operations, HTTP API calls, etc.
Since F# does not require you to declare side effects, I designate a border area of code where side effects should stop being used. For me, the use case level (register function above) is the last place for side effects. Any business logic lower than that, I work on pushing side effects up to the use case. It is a learning process to do that, so do not be discouraged if it seems impossible at first. Just do what you have to for now and learn as you go.
I have a post that attempts to set the right expectations on the benefits of FP and how to get them.
I'm going to add a second answer here taking an entirely different approach. I wrote about it here. This is the same approach used by MVU to isolate decisions from side effects, so it is applicable to UI (using Elmish) and backend.
This is worthwhile if you need to interleave important business logic with side effects. But not if you just need to execute a series of side effects. In that case just use a block of imperative statements, in a task (F# 6 or TaskBuilder) or async block if you need IO.
The pattern
Here are the basic parts.
Types
Model - The state of the workflow. Used to "remember" where we are in the workflow so it can be resumed after side effects.
Effect - Declarative representation of the side effects you want to perform and their required data.
Msg - Represents events that have happened. Primarily, they are the results of side effects. They will resume the workflow.
Functions
update - Makes all the decisions. It takes in its previous state (Model) and a Msg and returns an updated state and new Effects. This is a pure function which should have no side effects.
perform - Turns a declared Effect into a real side effect. For example, saving to a database. Returns a Msg with the result of the side effect.
init - Constructs an initial Model and starting Msg. Using this, a caller gets the data it needs to start the workflow without having to understand the internal details of update.
I jotted down an example for a rate-limited emailer. It includes the implementation I use on the backend to package and run this pattern, called Ump.
The logic can be tested without any instrumentation (no mocks/stubs/fakes/etc). Declare the side effects you expect, run the update function, then check that the output matches with simple equality. From the linked gist:
// example test
let expected = [SendEmail email1; ScheduleSend next]
let _, actual = Ump.test ump initArg [DueItems ([email1; email2], now)]
Assert.IsTrue(expected = actual)
The integrations can be tested by exercising perform.
This pattern takes some getting-used-to. It reminds me a bit of an Erlang actor or a state machine. But it is helpful when you really need your business logic to be correct and well-tested. It also happens to be a proper functional pattern.
I have a problem that I can solve reasonably easy with classic imperative programming using state: I'm writing a co-browsing app that shares URL's between several nodes. The program has a module for communication that I call link and for browser handling that I call browser. Now when a URL arrives in link i use the browser module to tell the
actual web browser to start loading the URL.
The actual browser will trigger the navigation detection that the incoming URL has started to load, and hence will immediately be presented as a candidate for sending to the other side. That must be avoided, since it would create an infinite loop of link-following to the same URL, along the line of the following (very conceptualized) pseudo-code (it's Javascript, but please consider that a somewhat irrelevant implementation detail):
actualWebBrowser.urlListen.gotURL(function(url) {
// Browser delivered an URL
browser.process(url);
});
link.receivedAnURL(function(url) {
actualWebBrowser.loadURL(url); // will eventually trigger above listener
});
What I did first wast to store every incoming URL in browser and simply eat the URL immediately when it arrives, then remove it from a 'received' list in browser, along the lines of this:
browser.recents = {} // <--- mutable state
browser.recentsExpiry = 40000;
browser.doSend = function(url) {
now = (new Date).getTime();
link.sendURL(url); // <-- URL goes out on the network
// Side-effect, mutating module state, clumsy clean up mechanism :(
browser.recents[url] = now;
setTimeout(function() { delete browser.recents[url] }, browser.recentsExpiry);
return true;
}
browser.process = function(url) {
if(/* sanity checks on `url`*/) {
now = (new Date).getTime();
var duplicate = browser.recents[url];
if(! duplicate) return browser.doSend(url);
if((now - duplicate_t) > browser.recentsExpiry) {
return browser.doSend(url);
}
return false;
}
}
It works but I'm a bit disappointed by my solution because of my habitual use of mutable state in browser. Is there a "Better Way (tm)" using immutable data structures/functional programming or the like for a situation like this?
A more functional approach to handling long-lived state is to use it as a parameter to a recursive function, and have one execution of the function responsible for handling a single "action" of some kind, then calling itself again with the new state.
F#'s MailboxProcessor is one example of this kind of approach. However it does depend on having the processing happen on an independent thread which isn't the same as the event-driven style of your code.
As you identify, the setTimeout in your code complicates the state management. One way you could simplify this out is to instead have browser.process filter out any timed-out URLs before it does anything else. That would also eliminate the need for the extra timeout check on the specific URL it is processing.
Even if you can't eliminate mutable state from your code entirely, you should think carefully about the scope and lifetime of that state.
For example might you want multiple independent browsers? If so you should think about how the recents set can be encapsulated to just belong to a single browser, so that you don't get collisions. Even if you don't need multiple ones for your actual application, this might help testability.
There are various ways you might keep the state private to a specific browser, depending in part on what features the language has available. For example in a language with objects a natural way would be to make it a private member of a browser object.
I'm trying to use Dart with sqlite, with this project dart-sqlite.
But I found a problem: the API it provides is synchronous style. The code will be looked like:
// Iterating over a result set
var count = c.execute("SELECT * FROM posts LIMIT 10", callback: (row) {
print("${row.title}: ${row.body}");
});
print("Showing ${count} posts.");
With such code, I can't use Dart's future support, and the code will be blocking at sql operations.
I wonder how to change the code to asynchronous style? You can see it defines some native functions here: https://github.com/sam-mccall/dart-sqlite/blob/master/lib/sqlite.dart#L238
_prepare(db, query, statementObject) native 'PrepareStatement';
_reset(statement) native 'Reset';
_bind(statement, params) native 'Bind';
_column_info(statement) native 'ColumnInfo';
_step(statement) native 'Step';
_closeStatement(statement) native 'CloseStatement';
_new(path) native 'New';
_close(handle) native 'Close';
_version() native 'Version';
The native functions are mapped to some c++ functions here: https://github.com/sam-mccall/dart-sqlite/blob/master/src/dart_sqlite.cc
Is it possible to change to asynchronous? If possible, what shall I do?
If not possible, that I have to rewrite it, do I have to rewrite all of:
The dart file
The c++ wrapper file
The actual sqlite driver
UPDATE:
Thanks for #GregLowe's comment, Dart's Completer can convert callback style to future style, which can let me to use Dart's doSomething().then(...) instead of passing a callback function.
But after reading the source of dart-sqlite, I realized that, in the implementation of dart-sqlite, the callback is not event-based:
int execute([params = const [], bool callback(Row)]) {
_checkOpen();
_reset(_statement);
if (params.length > 0) _bind(_statement, params);
var result;
int count = 0;
var info = null;
while ((result = _step(_statement)) is! int) {
count++;
if (info == null) info = new _ResultInfo(_column_info(_statement));
if (callback != null && callback(new Row._internal(count - 1, info, result)) == true) {
result = count;
break;
}
}
// If update affected no rows, count == result == 0
return (count == 0) ? result : count;
}
Even if I use Completer, it won't increase the performance. I think I may have to rewrite the c++ code to make it event-based first.
You should be able to write a wrapper without touching the C++. Have a look at how to use the Completer class in dart:async. Basically you need to create a Completer, return Completer.future immediately, and then call Completer.complete(row) from the existing callback.
Re: update. Have you seen this article, specifically the bit about asynchronous extensions? i.e. If the C++ API is synchronous you can run it in a separate thread, and use messaging to communicate with it. This could be a way to do it.
The big problem you've got is that SQLite is an embedded database; in order to process your query and provide your results, it must do computation (and I/O) in your process. What's more, in order for its transaction handling system to work, it either needs its connection to be in the thread that created it, or for you to run in serialized mode (with a performance hit).
Because these are fairly hard constraints, your plan of switching things to an asynchronous operation mode is unlikely to go well except by using multiple threads. Since using multiple connections complicates things a lot (as you can't share some things between them, such as TEMP TABLEs) let's consider going for a single serialized connection; all activity will be serialized at the DB level, but for an application that doesn't use the DB a lot it will be OK. At the C++ level, you'd be talking about calling that execute from another thread and then sending messages back to the caller thread to indicate each row and the completion.
But you'll take a real hit when you do this; in particular, you're committing to only doing one query at a time, as the technique runs into significant problems with semantic effects when you start using two connections at once and the DB forces serialization on you with one connection.
It might be simpler to do the above by putting the synchronous-asynchronous coupling at the Dart level by managing the worker thread and inter-thread communication there. That would let you avoid having to change the C++ code significantly. I don't know Dart well enough to be able to give much advice there.
Myself, I'd just stick with synchronous connection processing so that I can make my application use multi-threaded mode more usefully. I'd be taking the hit with the semantics and giving each thread its own connection (possibly allocated lazily) so that overall speed was better, but I do come from a programming community that regards threads as relatively heavyweight resources, so make of that what you will. (Heavy threads can do things that reduce the number of locks they need that it makes no sense to try to do with light threads; it's about overhead management.)