Asynchronous data exchange between tasks

The documentation in the "Tasks in Toit" section indicates that the language has facilities for asynchronous data exchange between tasks. If I understand correctly, two classes from the monitor package, Channel and Mailbox, provide this capability. Unfortunately, I didn't find examples of using these classes, so I ask you to give at least the simplest example of the implementation of two tasks:
1. One of the tasks is a message generator; for example, it sends integers or strings to the second task, and the second task receives these numbers or strings. Perhaps in this case the Channel class should be used.
2. Each of the two tasks is both a generator and a receiver of messages, i.e. the first task sends a message to the second task and in turn asynchronously receives the messages generated by the second task. Judging by the description, the Mailbox class should be used in this case.
Thanks in advance,
MK

Here's an example of the first part, using Channel. This class is useful if you have a stream of messages for another task.
import monitor

main:
  // A channel with a backlog of 5 items. Once the reader is 5 items behind, the
  // writer will block when trying to send. This helps avoid unbounded memory
  // use by the in-flight messages if messages are being generated faster than
  // they are being consumed. Decreasing this will tend to reduce throughput,
  // increasing it will increase memory use.
  channel := monitor.Channel 5
  task:: message_generating_task channel
  task:: message_receiving_task channel

/// Normally this could be looking at IO ports, GPIO pins, or perhaps a socket
/// connection. It could block for some unknown time while doing this. In this
/// case we just sleep a little to illustrate that the data arrives at random
/// times.
generate_message:
  milliseconds := random 1000
  sleep --ms=milliseconds
  // The message is just a string, but could be any object.
  return "Message creation took $(milliseconds)ms"

message_generating_task channel/monitor.Channel:
  10.repeat:
    message := generate_message
    channel.send message
  channel.send null  // We are done.

/// Normally this could be looking at IO ports, GPIO pins, or perhaps a socket
/// connection. It could block for some unknown time while doing this. In this
/// case we just sleep a little to illustrate that the data takes a random
/// amount of time to process.
process_message message:
  milliseconds := random 1000
  sleep --ms=milliseconds
  print message

message_receiving_task channel/monitor.Channel:
  while message := channel.receive:
    process_message message
Here is an example of using Mailbox. This class is useful if you have a task processing requests and giving responses to other tasks.
import monitor

main:
  mailbox := monitor.Mailbox
  task:: client_task 1 mailbox
  task:: client_task 2 mailbox
  task --background:: factorization_task mailbox

/// Normally this could be looking at IO ports, GPIO pins, or perhaps a socket
/// connection. It could block for some unknown time while doing this. For
/// this example we just sleep a little to illustrate that the data arrives at
/// random times.
generate_huge_number:
  milliseconds := random 1000
  sleep --ms=milliseconds
  return (random 100) + 1  // Not actually so huge.

client_task task_number mailbox/monitor.Mailbox:
  10.repeat:
    huge := generate_huge_number
    factor := mailbox.send huge  // Send number, wait for result.
    other_factor := huge / factor
    print "task $task_number: $factor * $other_factor == $huge"

// Factorize a number using the quantum computing port.
factorize_number number:
  // TODO: Use actual quantum computing instead of brute-force search.
  for i := number.sqrt.round; i > 1; i--:
    factor := number / i
    if factor * i == number:
      return factor
    // This will yield so the other tasks can run. In a real application it
    // would be waiting on an IO pin connected to the quantum computing unit.
    sleep --ms=1
  return 1  // 1 is sort-of a factor of all integers.

factorization_task mailbox/monitor.Mailbox:
  // Because this task was started as a background task (see 'main' function),
  // the program does not wait for it to exit, so this loop does not need a real
  // exit condition.
  while number := mailbox.receive:
    result := factorize_number number
    mailbox.reply result

I'm pretty sure the Mailbox example worked great at the end of March. I decided to check it now and got the error:
In the Toit console:
./web.toit:8:3: error: Argument mismatch: 'task'
task --background:: factorization_task mailbox
^~~~
Compilation failed.
In the terminal:
micrcx@micrcx-desktop:~/toit_apps/Hsm/communication$ toit execute mailbox_sample.toit
mailbox_sample.toit:8:3: error: Argument mismatch: 'task'
task --background:: factorization_task mailbox
^~~~
Compilation failed.
Perhaps this is due to the latest SDK update. Just in case:
Toit CLI: v1.0.0 (2021-03-29)

Related

.netcore I want to pull 500 kafka messages at once, how do I configure [duplicate]

As per my understanding, a Kafka consumer reads messages from an assigned partition sequentially.
We are planning to have multiple Kafka consumers (Java) with the same group id. If each one reads sequentially from an assigned partition, how can we achieve high throughput? For example, the producer publishes around 40 messages per second while a consumer processes only 1 message per second.
Though we can have multiple consumers, we cannot have 40, right? Correct me if I'm wrong.
And in our case the consumer has to commit the offset only after the message is processed successfully, else the message will be reprocessed. Is there any better solution?
Based on your question clarification.
A Kafka Consumer can read multiple messages at a time. But a Kafka Consumer doesn't really read messages; it's more correct to say that a Consumer reads a certain number of bytes, and the size of the individual messages then determines how many messages are read. Reading through the Kafka Consumer Configs, you're not allowed to specify how many messages to fetch; you specify a max/min data size that a consumer can fetch. However many messages fit inside that range is how many you will get. You will always get messages sequentially, as you have pointed out.
Related Consumer Configs (for 0.9.0.0 and greater)
fetch.min.bytes
max.partition.fetch.bytes
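For concreteness, here is a minimal sketch of where those two settings go, assuming the Java KafkaConsumer API (the broker address, group id, and byte values below are placeholders, not recommendations):

import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class FetchSizeConfig {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder broker
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "my-group");                // placeholder group id
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringDeserializer");
        // The broker waits until at least this many bytes are available
        // before answering a fetch (bigger batches, slightly higher latency).
        props.put(ConsumerConfig.FETCH_MIN_BYTES_CONFIG, "1024");
        // Upper bound on the data returned per partition per fetch; this byte
        // limit, not a message count, is what bounds how many messages you get.
        props.put(ConsumerConfig.MAX_PARTITION_FETCH_BYTES_CONFIG, "1048576");
        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        consumer.close();
    }
}

However many whole messages fit inside those byte limits is how many a single poll can hand back.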
UPDATE
Using your example in the comments, "my understanding is if i specify in config to read 10 bytes and if each message is 2 bytes the consumer reads 5 messages at a time." That is true. Your next statement, "that means the offsets of these 5 messages were random with in partition", is false. Reading sequentially doesn't mean one by one; it just means that the messages remain ordered. You are able to batch items and have them remain sequential/ordered. Take the following examples.
In a Kafka log, if there are 10 messages (each 2 bytes) with the following offsets, [0,1,2,3,4,5,6,7,8,9].
If you read 10 bytes, you'll get a batch containing the messages at offsets [0,1,2,3,4].
If you read 6 bytes, you'll get a batch containing the messages at offsets [0,1,2].
If you read 6 bytes, then another 6 bytes, you'll get two batches containing the messages [0,1,2] and [3,4,5].
If you read 8 bytes, then 4 bytes, you'll get two batches containing the messages [0,1,2,3] and [4,5].
Update: Clarifying Committing
I'm not 100% sure how committing works, as I've mainly worked with Kafka from a Storm environment; the provided KafkaSpout automatically commits Kafka messages.
But looking through the 0.9.0.1 Consumer APIs (which I would recommend you do too), there seem to be three methods in particular that are relevant to this discussion.
poll(long timeout)
commitSync()
commitSync(java.util.Map offsets)
The poll method retrieves messages; it could be only 1, it could be 20. For your example, let's say 3 messages were returned, [0,1,2]. You now have those three messages. Now it's up to you to determine how to process them. You could process them 0 => 1 => 2, 1 => 0 => 2, or 2 => 0 => 1; it just depends. However you process them, after processing you'll want to commit, which tells the Kafka server you're done with those messages.
Using commitSync() commits everything returned on the last poll; in this case it would commit offsets [0,1,2].
On the other hand, if you choose to use commitSync(java.util.Map offsets), you can manually specify which offsets to commit. If you're processing them in order, you can process offset 0 then commit it, process offset 1 then commit it, and finally process offset 2 and commit it.
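As a sketch of that per-offset style, assuming the 0.9+ Java KafkaConsumer API (broker address, group id, and topic below are placeholders), committing after each processed record looks roughly like this:

import java.util.Arrays;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;

public class CommitPerRecord {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder broker
        props.put("group.id", "my-group");                // placeholder group id
        props.put("enable.auto.commit", "false");         // we commit manually below
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Arrays.asList("my-topic")); // placeholder topic
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(1000);
                for (ConsumerRecord<String, String> record : records) {
                    // ... process the record here, then commit just this offset.
                    // The committed value is the offset of the *next* record to
                    // read, hence the +1.
                    consumer.commitSync(Collections.singletonMap(
                            new TopicPartition(record.topic(), record.partition()),
                            new OffsetAndMetadata(record.offset() + 1)));
                }
            }
        }
    }
}

Committing per record like this costs throughput, but after a crash at most the one in-flight message is reprocessed, which matches the commit-only-after-successful-processing requirement from the question.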
All in all, Kafka gives you the freedom to process messages however you desire: you can process them sequentially or entirely at random, at your choosing.
To achieve parallelism, which seems to be what you're asking about, you use topic partitions (you split a topic into N parts, which are called partitions).
Then, in the consumer, you spawn multiple threads to consume from those partitions, as sketched below.
On the Producer side, you publish messages to a random partition (the default), or you provide Kafka with some message attribute from which to calculate a hash (if ordering is required), which makes sure that all msgs with the same hash go to the same partition.
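A rough sketch of that layout, assuming the Java KafkaConsumer API (broker address, group id, topic, and thread count below are placeholders): each thread owns its own consumer instance, since KafkaConsumer is not thread-safe, and consumers sharing a group id split the partitions among themselves.

import java.util.Arrays;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class PartitionedConsumers {
    public static void main(String[] args) {
        int numThreads = 4; // placeholder; only useful up to the partition count
        for (int i = 0; i < numThreads; i++) {
            new Thread(() -> {
                Properties props = new Properties();
                props.put("bootstrap.servers", "localhost:9092"); // placeholder broker
                props.put("group.id", "my-group"); // same group id => partitions are divided up
                props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
                props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
                // One KafkaConsumer per thread: the consumer itself is not thread-safe.
                try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                    consumer.subscribe(Arrays.asList("my-topic")); // placeholder topic
                    while (true) {
                        ConsumerRecords<String, String> records = consumer.poll(1000);
                        for (ConsumerRecord<String, String> record : records) {
                            // ... process the record; order is preserved within each partition
                        }
                    }
                }
            }).start();
        }
    }
}

Note that ordering is then only guaranteed per partition, which is why hash-keyed publishing on the producer side matters when order is required.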
EDIT (example of offset commit request):
This is how I did it. All methods that are not provided are non-essential.
/**
 * Commits the provided offset for the current client (i.e. unique topic/partition/clientName combination)
 *
 * @param offset
 * @return {@code true} or {@code false}, depending on whether commit succeeded
 * @throws Exception
 */
public static boolean commitOffset(String topic, int partition, String clientName, SimpleConsumer consumer,
        long offset) throws Exception {
    try {
        TopicAndPartition tap = new TopicAndPartition(topic, partition);
        OffsetAndMetadata offsetMetaAndErr = new OffsetAndMetadata(offset, OffsetAndMetadata.NoMetadata(), -1L);
        Map<TopicAndPartition, OffsetAndMetadata> mapForCommitOffset = new HashMap<>(1);
        mapForCommitOffset.put(tap, offsetMetaAndErr);
        kafka.javaapi.OffsetCommitRequest offsetCommitReq = new kafka.javaapi.OffsetCommitRequest(
                ConsumerContext.getMainIndexingConsumerGroupId(), mapForCommitOffset, 1, clientName,
                ConsumerContext.getOffsetStorageType());
        OffsetCommitResponse offsetCommitResp = consumer.commitOffsets(offsetCommitReq);
        Short errCode = (Short) offsetCommitResp.errors().get(tap);
        if (errCode != 0) {
            processKafkaOffsetCommitError(tap, offsetCommitResp, BrokerInfo.of(consumer.host()));
            ErrorMapping.maybeThrowException(errCode);
        }
        LOG.debug("Successfully committed offset [{}].", offset);
    } catch (Exception e) {
        LOG.error("Error while committing offset [" + offset + "].", e);
        throw e;
    }
    return true;
}
You can consume the messages in batches and process them in a batched manner.
batch.max.wait.ms (property)
The consumer will wait up to this amount of time while polling for new messages.

When to use MPI_BUFFER_ATTACH?

As far as I know, MPI_BUFFER_ATTACH must be called by a process if it is going to do buffered communication. But does this include the standard MPI_SEND as well? We know that MPI_SEND may behave either as a synchronous send or as a buffered send.
You need to call MPI_Buffer_attach() only if you plan to perform (explicitly) buffered sends via MPI_Bsend().
If you only plan to MPI_Send() or MPI_Isend(), then you do not need to invoke MPI_Buffer_attach().
FWIW, buffered sends are error-prone and I strongly encourage you not to use them.
MPI_Buffer_attach
Attaches a user-provided buffer for sending
Synopsis
int MPI_Buffer_attach(void *buffer, int size)
Input Parameters
buffer: initial buffer address (choice)
size: buffer size, in bytes (integer)
Notes
The size given should be the sum of the sizes of all outstanding
Bsends that you intend to have, plus MPI_BSEND_OVERHEAD for each Bsend
that you do. For the purposes of calculating size, you should use
MPI_Pack_size. In other words, in the code
MPI_Buffer_attach( buffer, size );
MPI_Bsend( ..., count=20, datatype=type1, ... );
...
MPI_Bsend( ..., count=40, datatype=type2, ... );
the value of size in the MPI_Buffer_attach call should be greater than the value computed by
MPI_Pack_size( 20, type1, comm, &s1 );
MPI_Pack_size( 40, type2, comm, &s2 );
size = s1 + s2 + 2 * MPI_BSEND_OVERHEAD;
The MPI_BSEND_OVERHEAD gives the maximum amount of space that may be used in the buffer for use by the BSEND routines in using the buffer. This value is in mpi.h (for C) and mpif.h (for Fortran).
Thread and Interrupt Safety
The user is responsible for ensuring that multiple threads do not try to update the same MPI object from different threads. This routine should not be used from within a signal handler.
The MPI standard defined a thread-safe interface but this does not mean that all routines may be called without any thread locks. For example, two threads must not attempt to change the contents of the same MPI_Info object concurrently. The user is responsible in this case for using some mechanism, such as thread locks, to ensure that only one thread at a time makes use of this routine. Because the buffer for buffered sends (e.g., MPI_Bsend) is shared by all threads in a process, the user is responsible for ensuring that only one thread at a time calls this routine or MPI_Buffer_detach.
Notes for Fortran
All MPI routines in Fortran (except for MPI_WTIME and MPI_WTICK) have an additional argument ierr at the end of the argument list. ierr is an integer and has the same meaning as the return value of the routine in C. In Fortran, MPI routines are subroutines, and are invoked with the call statement.
All MPI objects (e.g., MPI_Datatype, MPI_Comm) are of type INTEGER in Fortran.
Errors
All MPI routines (except MPI_Wtime and MPI_Wtick) return an error value; C routines as the value of the function and Fortran routines in the last argument. Before the value is returned, the current MPI error handler is called. By default, this error handler aborts the MPI job. The error handler may be changed with MPI_Comm_set_errhandler (for communicators), MPI_File_set_errhandler (for files), and MPI_Win_set_errhandler (for RMA windows). The MPI-1 routine MPI_Errhandler_set may be used but its use is deprecated. The predefined error handler MPI_ERRORS_RETURN may be used to cause error values to be returned. Note that MPI does not guarantee that an MPI program can continue past an error; however, MPI implementations will attempt to continue whenever possible.
MPI_SUCCESS
No error; MPI routine completed successfully.
MPI_ERR_BUFFER
Invalid buffer pointer. Usually a null buffer where one is not valid.
MPI_ERR_INTERN
An internal error has been detected. This is fatal. Please send a bug report to mpi-bugs@mcs.anl.gov.
See Also MPI_Buffer_detach, MPI_Bsend
Refer Here For More
Buffer allocation and usage
Programming with MPI
MPI - Bsend usage

Is there a way to send multiple transactions to counterparty without looping

Is there a way to send multiple transactions to a counterparty without using a loop in the flow? Sending one tx at a time in a loop impacts performance significantly, since suspendable behaviour doesn't work well with a large volume of txes.
At some point in time, T, an initiator may be interested in sending N transactions to a regulator/counterparty. But the current SendTransactionFlow only sends one tx at a time, and on the other side ReceiveTransactionFlow records them one by one.
My current code
relevantTxes.forEach {
    subFlow(SendTransactionFlow(session, it))
}
Is there a way to do something along the line of
subFlow(SendTransactionFlow(session, relevantTxes))
You can send the list of transactions without invoking a subflow by using send and receive.
On the sender's side:
val session = initiateFlow(otherParty)
session.send(relevantTxes)
On the receiver's side:
session.receive<List<SignedTransaction>>().unwrap { relevantTxes -> relevantTxes }

Controlling the number of spawned futures to create backpressure

I am using a futures-rs powered version of the Rusoto AWS Kinesis library. I need to spawn a deep pipeline of AWS Kinesis requests to achieve high throughput, because Kinesis has a limit of 500 records per HTTP request. Combined with the 50ms latency of sending a request, I need to start generating many concurrent requests. I am looking to create somewhere on the order of 100 in-flight requests.
The Rusoto put_records function signature looks like this:
fn put_records(
    &self,
    input: &PutRecordsInput,
) -> RusotoFuture<PutRecordsOutput, PutRecordsError>
The RusotoFuture is a wrapper defined like this:
/// Future that is returned from all rusoto service APIs.
pub struct RusotoFuture<T, E> {
    inner: Box<Future<Item = T, Error = E> + 'static>,
}
The inner Future is wrapped, but the RusotoFuture still implements Future::poll(), so I believe it is compatible with the futures-rs ecosystem. The RusotoFuture provides a synchronization call:
impl<T, E> RusotoFuture<T, E> {
    /// Blocks the current thread until the future has resolved.
    ///
    /// This is meant to provide a simple way for non-async consumers
    /// to work with rusoto.
    pub fn sync(self) -> Result<T, E> {
        self.wait()
    }
}
I can issue a request and sync() it, getting the result from AWS. I would like to create many requests, put them in some kind of queue/list, and gather finished requests. If the request errored I need to reissue the request (this is somewhat normal in Kinesis, especially when hitting limits on your shard throughput). If the request is completed successfully I should issue a request with new data. I could spawn a thread for each request and sync it but that seems inefficient when I have the async IO thread running.
I have tried using futures::sync::mpsc::channel from my application thread (not running from inside the Tokio reactor) but whenever I clone the tx it generates its own buffer, eliminating any kind of backpressure on send:
fn kinesis_pipeline(client: DefaultKinesisClient, stream_name: String, num_puts: usize, puts_size: usize) {
    use futures::sync::mpsc::{ channel, spawn };
    use futures::{ Sink, Future, Stream };
    use futures::stream::Sender;
    use rusoto_core::reactor::DEFAULT_REACTOR;

    let client = Arc::new(KinesisClient::simple(Region::UsWest2));
    let data = FauxData::new(); // a data generator for testing

    let (mut tx, mut rx) = channel(1);

    for rec in data {
        tx.clone().send(rec);
    }
}
Without the clone, I have the error:
error[E0382]: use of moved value: `tx`
--> src/main.rs:150:9
|
150 | tx.send(rec);
| ^^ value moved here in previous iteration of loop
|
= note: move occurs because `tx` has type `futures::sync::mpsc::Sender<rusoto_kinesis::PutRecordsRequestEntry>`, which does not implement the `Copy` trait
I have also looked at futures::sync::mpsc::spawn based on recommendations, but it takes ownership of the rx (as a Stream) and does not solve my problem with the cloning of tx causing unbounded behavior.
I'm hoping if I can get the channel/spawn usage working, I will have a system which takes RusotoFutures, waits for them to complete, and then provides me an easy way to grab completion results from my application thread.
As far as I can tell, your problem with channel is not that a single clone of the Sender increases the capacity by one; it is that you clone the Sender for every item you're trying to send.
The error you're seeing without clone comes from your incorrect usage of the Sink::send interface. With clone you actually should see the warning:
warning: unused `futures::sink::Send` which must be used: futures do nothing unless polled
That is: your current code doesn't actually ever send anything!
In order to apply backpressure you need to chain those send calls; each one should wait until the previous one has finished (and you need to wait for the last one too!); on success you'll get the Sender back. The best way to do this is to generate a Stream from your iterator by using iter_ok and to pass it to send_all.
Now you got one future SendAll that you need to "drive". If you ignore the result and panic on error (.then(|r| { r.unwrap(); Ok::<(), ()>(()) })) you could spawn it as a separate task, but maybe you want to integrate it into your main application (i.e. return it in a Box).
// this returns a `Box<Future<Item = (), Error = ()>>`. you may
// want to use a different error type
Box::new(tx.send_all(iter_ok(data)).map(|_| ()).map_err(|_| ()))
RusotoFuture::sync and Future::wait
Don't use Future::wait: it is already deprecated in a branch, and it usually won't do what you actually are looking for. I doubt RusotoFuture is aware of the problems, so I recommend avoiding RusotoFuture::sync.
Cloning Sender increases channel capacity
As you correctly stated cloning Sender increases the capacity by one.
This seems to be done to improve performance: A Sender starts in the unblocked ("unparked") state; if a Sender isn't blocked it can send an item without blocking. But if the number of items in the queue hits the configured limit when a Sender sends an item, the Sender becomes blocked ("parked"). (Removing items from the queue will unblock the Sender at a certain time.)
This means that after the inner queue hits the limit each Sender still can send one item, which leads to the documented effect of increased capacity, but only if actually all the Senders are sending items - unused Senders don't increase the observed capacity.
The performance boost comes from the fact that as long as you don't hit the limit it doesn't need to park and notify tasks (which is quite heavy).
The private documentation at the top of the mpsc module describes more of the details.

Does C++ Actor Framework guarantee message order?

Can C++ Actor Framework be used in such a way that it guarantees message ordering between two actors? I couldn't find anything about this in the manual.
If you have only two actors communicating directly, CAF guarantees that messages arrive in the order they have been sent. Only multi-hop scenarios can cause non-determinism and message reordering.
auto a = spawn(A);
self->send(a, "foo");
self->send(a, 42); // arrives always after "foo"
At the receiving end, it is possible to change the message processing order by changing the actor behavior with become:
[=](int) {
  self->become(
    keep_behavior,
    [=](const std::string&) {
      self->unbecome();
    }
  );
}
In the above example, this will process the int before the string message, even though they have arrived in opposite order at the actor's mailbox.
