In an application that loads data in a batch process, how can Corda trigger many flows in parallel without tracking or observables?
On the RPC client there are only two methods for starting flows (startFlowDynamic and startTrackedFlowDynamic): https://docs.corda.net/api/kotlin/corda/net.corda.core.messaging/-corda-r-p-c-ops/index.html
The goal is to avoid unnecessary overhead and run many flows in parallel. When tracked flows are used, Corda delivers a warning about unused listeners: https://github.com/corda/corda/blob/release-V3.2/client/rpc/src/main/kotlin/net/corda/client/rpc/internal/RPCClientProxyHandler.kt#L143
The warning that you're seeing when using startTrackedFlowDynamic occurs because it returns several Observables that are not being used. As the warning says, you need to close the unused ones manually, either by subscribing to them and then unsubscribing, or by using the helper method net.corda.client.rpc.UtilsKt#notUsed.
With regard to running multiple flows in parallel, you can do something like this:
FlowHandle<Result> flowHandle1 = rpcProxy.startFlowDynamic(ExampleFlow.class, *flow parameters 1*);
FlowHandle<Result> flowHandle2 = rpcProxy.startFlowDynamic(ExampleFlow.class, *flow parameters 2*);
FlowHandle<Result> flowHandle3 = rpcProxy.startFlowDynamic(ExampleFlow.class, *flow parameters 3*);
Then you can wait for all of them to complete by simply waiting on each result Future object in turn:
Arrays.asList(flowHandle1, flowHandle2, flowHandle3).forEach(handle -> {
handle.getReturnValue().get();
});
I am new to Akka and still trying to understand the different Akka and streaming concepts. For a new feature I need to add an HTTP call to an already existing stream which works on an internal object. Something like this -
val step1Flow = Flow[SampleObject].filter(...--Filtering condition--...)
val step2Flow = Flow[SampleObject].map(obj => {
...
-- Business logic to update values in the obj --
...
})
...
override val flowGraph: Flow[SampleObject, SampleObject, NotUsed] =
bufferIn.via(Flow.fromGraph(GraphDSL.create() {
implicit builder =>
import GraphDSL.Implicits._
...
val step1 = builder.add(step1Flow)
val step2 = builder.add(step2Flow)
val step3 = builder.add(step3Flow)
...
source ~> step1 ~> step2 ~> step3 ~> merge
...
}
I need to add the new HTTP request flow (let's call it newFlow) after step1. All these flows have Inlet and Outlet of SampleObject. Now my understanding is that newFlow would need to be blocking, because the outlet needs to be SampleObject only. For that I have used the Await function on the HTTP call future. The code looks like this -
val responseFuture: Future[(Try[HttpResponse], SomeContext)] =
Source
.single(httpRequest -> context)
.via(Retry(retrySettings).join(clientFlow))
.runWith(Sink.head)
...
val (httpTry, passedAlongContext) = Await.result(responseFuture, 30.seconds)
-- logic to process response and return SampleObject --
Now this works fine, but I think there should be a better way to do this without using Await. I also think this would block the main thread till the request completes, which is going to affect the app's throughput.
Could you please advise whether the approach I used is correct or not, and how I can use some other thread pool to handle these blocking calls so my main thread pool is not affected?
This question seems very similar to mine but I do not understand it completely - connect Akka HTTP to Akka stream. Also, I can't change step2 or the further flows.
EDIT: Added some code details for the stream.
I ended up using the approach mentioned in the question because I couldn't find anything better after looking around. Adding this step decreased the throughput of my application as expected, but there are approaches that can be used to increase it. Check these awesome blogs by Colin Breck -
https://blog.colinbreck.com/maximizing-throughput-for-akka-streams/
https://blog.colinbreck.com/partitioning-akka-streams-to-maximize-throughput/
To summarize -
Use Asynchronous Boundaries for flows which are blocking.
Use Futures if possible and add callbacks to futures. There are several ways to do that.
Use Buffers. There are several types of buffers available, choose what suits your needs.
Other than these, you can use inbuilt flows like -
Use "Broadcast" to broadcast your events to multiple consumers.
Use "Partition" to partition your stream into multiple streams based
on some condition.
Use "Balance" to partition your stream when there is no logical way to partition your events or they all could have different work loads.
You could use any one, or a combination, of the above options.
We appear to have a problem with MDriven generating the same ECO_ID for multiple objects. For the most part it seems to happen in conjunction with unexpected process shutdowns and/or server shutdowns, but it does also happen during normal activity.
Our system consists of one ASP.NET application and one WinForms application. The ASP.NET app is set up in IIS to use a single worker process. We have a mixture of WebForms and MVC, including ApiControllers. We're using a rather old version of the ECO packages: 7.0.0.10021. We're on VS 2017, and the target framework is 4.7.1.
We have it configured to use 64-bit integers for object IDs. The database is Firebird. The SQL configuration is set to use ReadCommitted transaction isolation.
As far as I can tell we have configured EcoSpaceStrategyHandler with EcoSpaceStrategyHandler.SessionStateMode.Never, which should mean that EcoSpaces are not reused at all, right? (Why would I even use EcoSpaceStrategyHandler in this case, instead of just creating EcoSpace normally with the new keyword?)
We have created MasterController : Controller and MasterApiController : ApiController classes that we use for all our controllers. These have an EcoSpace property that simply does this:
if (ecoSpace == null)
{
if (ecoSpaceStrategyHandler == null)
ecoSpaceStrategyHandler = new EcoSpaceStrategyHandler(
EcoSpaceStrategyHandler.SessionStateMode.Never,
typeof(DiamondsEcoSpace),
null,
false
);
ecoSpace = (DiamondsEcoSpace)ecoSpaceStrategyHandler.GetEcoSpace();
}
return ecoSpace;
I.e. if no strategy handler has been created, create one, specifying no pooling and no session-state persisting of EcoSpaces. Then, if no EcoSpace has been fetched, fetch one from the strategy handler. Return the EcoSpace. Is this an acceptable approach? Why would it be better than simply doing this:
if (ecoSpace == null)
ecoSpace = new DiamondsEcoSpace();
return ecoSpace;
In aspx we have a master page that has an EcoSpaceManager. It has been configured to use a pool but SessionStateMode is Never. It has EnableViewState set to true. Is this acceptable? Does it mean that EcoSpaces will be pooled but inactivated between round trips?
It is possible that we receive multiple incoming API calls in tight succession, so that one API call hasn't been completed before the next one comes in. I assume that this means that multiple instances of MasterApiController can execute simultaneously but in separate threads. There may of course also be MasterController instances executing MVC requests and also the WinForms app may be running some batch job or other.
But as far as I understand it, ID reservation is made at the beginning of any UpdateDatabase call, in this way:
update "ECO_ID" set "BOLD_ID" = "BOLD_ID" + :N;
select "BOLD_ID" from "ECO_ID";
If the returned value is K, this will reserve N new IDs ranging from K - N to K - 1. Using ReadCommitted transactions everywhere should ensure that the update locks the ID data row, forcing any concurrent save operations to wait, then fetches the update result without interference from other transactions, then commits. At that point any other pending save operation can proceed with its own ID reservation. I fail to see how this could result in the same ID being used for multiple objects.
I should note that it does seem to sometimes produce duplicate IDs within one single UpdateDatabase, i.e. when saving a set of new related objects, some of them end up with the same ID. I haven't really confirmed this though.
Any ideas what might be going on here? What should I look for?
The issue is most likely that you use ReadCommitted isolation.
This allows two systems to simultaneously start a transaction, read the current value, increase it by their batch size, and then save one after the other, so both end up reserving the same range of IDs.
You must use Serializable isolation for key generation; i.e. only read things that are not currently part of a write operation.
MDriven uses two settings for the isolation level: UpdateIsolationLevel and FetchIsolationLevel.
Set your UpdateIsolationLevel to Serializable.
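To make the failure mode concrete, here is a sketch of the read-increment-write pattern described above, written as plain ADO.NET against Firebird. This is an illustration only, not ECO's internal code; it assumes the FirebirdSql.Data.FirebirdClient provider, and the table/column names are taken from the question. Under ReadCommitted, two concurrent callers can both read the same BOLD_ID and reserve overlapping ranges; with IsolationLevel.Serializable the second caller waits or fails instead:
using System;
using System.Data;
using FirebirdSql.Data.FirebirdClient;

public static class IdReservationSketch
{
    // Reserves 'count' ids and returns the first id of the reserved range.
    public static long Reserve(string connectionString, int count, IsolationLevel isolation)
    {
        using (var connection = new FbConnection(connectionString))
        {
            connection.Open();
            using (var transaction = connection.BeginTransaction(isolation))
            {
                long current;
                using (var selectCmd = new FbCommand(
                    "select \"BOLD_ID\" from \"ECO_ID\"", connection, transaction))
                {
                    // Under ReadCommitted, two concurrent callers can both read the same value here...
                    current = Convert.ToInt64(selectCmd.ExecuteScalar());
                }

                using (var updateCmd = new FbCommand(
                    "update \"ECO_ID\" set \"BOLD_ID\" = @newValue", connection, transaction))
                {
                    // ...and then both write current + count, handing out the same id range twice.
                    // With IsolationLevel.Serializable the second writer is blocked or conflicted instead.
                    updateCmd.Parameters.AddWithValue("@newValue", current + count);
                    updateCmd.ExecuteNonQuery();
                }

                transaction.Commit();
                return current; // ids current .. current + count - 1 are now reserved
            }
        }
    }
}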
I have one Azure Function which is adding documents to Cosmos DB. One way of adding them is by creating a Cosmos client and then calling client.CreateDocumentAsync(); the other is creating an output binding on the Azure Function with IAsyncCollector documents and then calling documents.AddAsync().
I would like to know what the difference is between these two and which one is preferable.
Thanks
Two main differences:
The output binding maintains a single static instance of the client across executions. When you call client.CreateDocumentAsync yourself, you are in charge of maintaining the client instance, and the recommendation is to follow the singleton pattern so that you avoid opening a new connection on every execution instead of sharing one (see the sketch below).
The output binding actually does an UpsertDocumentAsync when AddAsync is called on the IAsyncCollector; you can check the source code: https://github.com/Azure/azure-webjobs-sdk-extensions/blob/dev/src/WebJobs.Extensions.CosmosDB/Bindings/CosmosDBAsyncCollector.cs#L28
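For reference, here is a minimal sketch of both options. It is only a sketch: an HTTP trigger is assumed just to give the functions a host, and ItemsDb, ItemsContainer, CosmosDBConnection, CosmosDBAccountUri and CosmosDBAccountKey are placeholder names. The binding comes from Microsoft.Azure.WebJobs.Extensions.CosmosDB and the manual client is the v2 SDK DocumentClient:
using System;
using System.Threading.Tasks;
using Microsoft.AspNetCore.Http;
using Microsoft.AspNetCore.Mvc;
using Microsoft.Azure.Documents.Client;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.Http;

public static class AddDocumentFunctions
{
    // Option 1: output binding. The runtime owns the client, and AddAsync upserts the document.
    [FunctionName("AddWithBinding")]
    public static async Task<IActionResult> AddWithBinding(
        [HttpTrigger(AuthorizationLevel.Function, "post")] HttpRequest req,
        [CosmosDB("ItemsDb", "ItemsContainer", ConnectionStringSetting = "CosmosDBConnection")]
        IAsyncCollector<object> documents)
    {
        await documents.AddAsync(new { id = Guid.NewGuid().ToString(), source = "binding" });
        return new OkResult();
    }

    // Option 2: manual client. You own its lifetime, so keep a single static (singleton) instance.
    private static readonly DocumentClient Client = new DocumentClient(
        new Uri(Environment.GetEnvironmentVariable("CosmosDBAccountUri")),
        Environment.GetEnvironmentVariable("CosmosDBAccountKey"));

    [FunctionName("AddWithClient")]
    public static async Task<IActionResult> AddWithClient(
        [HttpTrigger(AuthorizationLevel.Function, "post")] HttpRequest req)
    {
        var collectionUri = UriFactory.CreateDocumentCollectionUri("ItemsDb", "ItemsContainer");
        // Unlike the binding's upsert, CreateDocumentAsync fails if a document with the same id already exists.
        await Client.CreateDocumentAsync(collectionUri, new { id = Guid.NewGuid().ToString(), source = "client" });
        return new OkResult();
    }
}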
I have a requirement where I will be receiving a batch of records. I have to disassemble the batch and insert the data into a DB, which I have completed. But I don't want any message to come out of the pipeline except the last, custom-made message.
I have extended FFDasm and called Disassemble(); then we have GetNext(), which is sending every debatched message out, and they are failing as there are no subscribers. I want to send nothing out from GetNext() until the last message.
Please help if anyone has already implemented this requirement. Thanks!
If you want to send only one message out of GetNext, you have to call the base Disassemble from your Disassemble method and drain all the messages there (you can enqueue these messages so you can manage them in GetNext), like this:
private readonly Queue<IBaseMessage> _messages = new Queue<IBaseMessage>();
private int messagesCount = 0;

public new void Disassemble(IPipelineContext pContext, IBaseMessage pInMsg)
{
    try
    {
        base.Disassemble(pContext, pInMsg);
        IBaseMessage message = base.GetNext(pContext);
        while (message != null)
        {
            // Keep only one message; the remaining debatched messages
            // are drained here and never published.
            if (this.messagesCount == 0)
            {
                this._messages.Enqueue(message);
                this.messagesCount++;
            }
            message = base.GetNext(pContext);
        }
    }
    catch (Exception ex)
    {
        // Log ex / handle the error as appropriate for your component.
        throw;
    }
}
Then, in the GetNext method, you have the queue and you can return whatever you want:
public new IBaseMessage GetNext(IPipelineContext pContext)
{
    // Return null once the queue is empty so the messaging engine knows there are no more messages.
    return _messages.Count > 0 ? _messages.Dequeue() : null;
}
The recommended approach is to publish the messages after the disassemble stage to the BizTalk MessageBox database and use a DB adapter to insert them into the database. Publishing the messages to the MessageBox and using an adapter will give you more options for design and performance, and will decouple your DB insert from the receive logic. Also, if in the future you want to reuse the same messages for something else, you would be able to do so.
Even then, if for any reason you have to insert from the pipeline component, do the following.
Please note that the GetNext() method of the IDisassembler interface is not invoked until the Disassemble() method is complete. Based on this, you can use the following approach, assuming you have encapsulated FFDasm within your own custom component:
Insert all disassembled messages in the Disassemble method itself and enqueue only the last message to a Queue class variable. In GetNext(), return the dequeued message, and when the Queue is empty return null. You can optimize the DB insert by inserting multiple rows at a time, saving them in batches depending on volume. Please note this approach may encounter performance issues depending on the size of the file and the number of rows being inserted into the DB. A sketch of this approach is shown below.
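A minimal sketch of that approach, assuming (as in the answer above) a custom component that wraps the flat file disassembler; InsertRowIntoDb is a hypothetical placeholder for your own, ideally batched, DB insert:
private readonly Queue<IBaseMessage> _lastMessage = new Queue<IBaseMessage>();

public new void Disassemble(IPipelineContext pContext, IBaseMessage pInMsg)
{
    base.Disassemble(pContext, pInMsg);

    IBaseMessage previous = null;
    IBaseMessage current = base.GetNext(pContext);
    while (current != null)
    {
        if (previous != null)
        {
            // Every message except the last one is consumed here and never published.
            InsertRowIntoDb(previous); // hypothetical helper: your own (batched) DB insert
        }
        previous = current;
        current = base.GetNext(pContext);
    }

    if (previous != null)
    {
        InsertRowIntoDb(previous);      // insert the last row as well, if your scenario needs it
        _lastMessage.Enqueue(previous); // keep only the last message for publication
    }
}

public new IBaseMessage GetNext(IPipelineContext pContext)
{
    // Publish the single retained message, then signal completion with null.
    return _lastMessage.Count > 0 ? _lastMessage.Dequeue() : null;
}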
I am calling DBInsert SP from GetNext()
Oh...so...sorry to say, but you're doing it wrong and actually creating a bunch of problems doing this. :(
This is a very basic scenario to cover with BizTalk Server. All you need is:
A Pipeline Component to promote BTS.InterchangeID (see the sketch after this list).
A Sequential Convoy Orchestration Correlating on BTS.InterchangeID and using Ordered Delivery.
In the Orchestration, call the SP, transform to SOAP, call the SOAP endpoint, whatever you need.
As you process the Messages, check for BTS.LastInterchangeMessage, then perform your close-out logic.
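For the first point, a sketch of what the Execute method of such a promotion component might look like (the rest of the pipeline component boilerplate, class attributes and other IBaseComponent members, is omitted; the namespace string is the standard BizTalk system-properties namespace):
public IBaseMessage Execute(IPipelineContext pContext, IBaseMessage pInMsg)
{
    const string systemProperties = "http://schemas.microsoft.com/BizTalk/2003/system-properties";

    // InterchangeID is already present in the message context; promoting it makes it
    // usable for correlation by the sequential convoy orchestration.
    object interchangeId = pInMsg.Context.Read("InterchangeID", systemProperties);
    if (interchangeId != null)
    {
        pInMsg.Context.Promote("InterchangeID", systemProperties, interchangeId);
    }

    return pInMsg;
}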
To be 100% clear, there are no practical 'performance' issues here. By guessing about 'performance' you've actually created the problem you were trying to solve, and created a bunch of support issues for later on, sorry again. :( There is no reason not to use an Orchestration.
As noted, 25K records isn't a lot. Be sure to have the Receive Location and Orchestration in different Hosts.
I have several function calls in a row where each runs and waits to return before the next one runs. After these have run, I have one more function I want to run, but I don't want to wait for it to be done before I run my return.
Here is an example of what I mean.
get_card, create_order, create_association and debit_order all need to wait for the previous function to complete before they can run. When I get to Queue.start_account_creation_task I want it to start running, but then let the return on the line below run right away too.
Meteor.methods({
singleDonation: function (data) {
logger.info("Started singleDonation");
//Get the card data from balanced and store it
var card = Utils.get_card(customerData._id, data.paymentInformation.href);
//Create a new order
var orders = Utils.create_order(data._id, customerData.href);
//Associate the card with the balanced customer
var associate = Utils.create_association(customerData._id, card.href, customerData.href);
//Debit the order
var debitOrder = Utils.debit_order(data.paymentInformation.total_amount, data._id, customerData._id, orders.href, card.href);
Queue.start_account_creation_task(customerData._id, data._id, debitOrder._id);
return {c: customerData._id, don: data._id, deb: debitOrder._id};
}
});
Sounds like you need parallel and serial control for tasks. The Node.js module for that (as in, 400,000 downloads a day) is called async, and a Meteor wrapper for it is peerlibrary:async.
Sooner or later you'll need a dedicated background task management package. If async is insufficient, have a look at my evaluation of packages to control background tasks in Meteor.
The thing that seemed to work the best for what I was trying to do was to just use a Meteor.setTimeout({}). It might seem like an odd choice, but it does everything I needed, including setting the Meteor environment so that I didn't have to do any Meteor.bindEnvironment call. It also breaks out of the current chain of calls, which means it returns the result to the client right away and a second later finishes the rest of the calls (which are to external APIs that I didn't need my users sitting there waiting for).