Lua producer-consumer pattern with consumers waiting for different data - asynchronous

The problem
One data source generating data in format {key, value}
Multiple receivers, each waiting for a different key
Example
Data is received in a loop. Sometimes I want to get the next value labelled with a given key by using
Value = MyClass:GetNextValue(Key)
I want my code to stop there until the value is ready (making some sort of future value). I've tried using simple coroutines, but they only work when waiting for any data, not for data with a specific key.
So my question is: how do I implement async values in Lua using coroutines or a similar concept (without threads)?
Side notes
Apart from returning values to waiting consumers, the main processing function will also process some of the incoming data itself (say, items labeled with a special key).
The full usage context should look something like:
-- in loop
ReceiveData()
ProcessSpecialData()
--
-- Called outside the loop:
V = RequestDataWithGivenKey(Key)

How to implement async values
You start by not implementing async values. You implement async functions: you don't get the value back until it has been retrieved.
First, your code must be in a Lua coroutine. I'll assume you understand the care and feeding of coroutines. I'll focus on how to implement RequestDataWithGivenKey:
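function RequestDataWithGivenKey(key)
  local request = FunctionThatStartsAsyncGetting(key)
  -- Yield until the request completes. A loop (rather than a single yield)
  -- keeps this correct even if the coroutine is resumed before completion.
  while not request:IsComplete() do
    coroutine.yield()
  end
  -- Request is complete. Return the value.
  return request:GetReturnedValue()
end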
FunctionThatStartsAsyncGetting returns a request object. The request stores all of the data needed to process that specific request; it represents asking for the value. This should be a C function that starts the actual asynchronous fetch.
The request will be either a userdata or an encapsulated Lua table that stores enough information to communicate with the C-code that's doing the async fetching. IsComplete uses the internal request data to see if that request has completed. GetReturnedValue can only be called when IsComplete returns true; it puts the value on the Lua stack, so that this function can return it.
Your external code simply needs to handle the asynchronous work itself. Between resumes of these Lua coroutines, it must pump whatever mechanism is doing the fetching while there are outstanding requests, and resume a coroutine once its request has completed.
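Go has no drop-in equivalent of Lua coroutines, so the following is only an analogue of the request-object design described above, written in Go to match the other examples in this collection rather than in Lua. Like the Lua setup it is single-threaded: `StartGet` plays the role of FunctionThatStartsAsyncGetting, `IsComplete` and `Value` mirror IsComplete and GetReturnedValue, and `Pump` is one pass of the receive loop. All names (`Source`, `Request`, `StartGet`, `Pump`, the special key) are hypothetical.

```go
package main

// Item is one {key, value} pair produced by the single data source.
type Item struct {
	Key   string
	Value int
}

// Request is one outstanding "next value for Key" ask, playing the role of
// the request object in the answer (all names here are hypothetical).
type Request struct {
	Key   string
	done  bool
	value int
}

// IsComplete reports whether a value has been delivered to this request.
func (r *Request) IsComplete() bool { return r.done }

// Value must only be called after IsComplete returns true.
func (r *Request) Value() int { return r.value }

// Source owns the receive loop. It completes waiting requests oldest-first
// per key, buffers values nobody has asked for yet, and handles items that
// carry the special key itself.
type Source struct {
	specialKey    string
	handleSpecial func(Item)
	waiting       map[string][]*Request
	buffered      map[string][]int
}

func NewSource(specialKey string, handleSpecial func(Item)) *Source {
	return &Source{
		specialKey:    specialKey,
		handleSpecial: handleSpecial,
		waiting:       map[string][]*Request{},
		buffered:      map[string][]int{},
	}
}

// StartGet registers interest in the next value for key; the request is
// complete immediately if such a value has already arrived.
func (s *Source) StartGet(key string) *Request {
	r := &Request{Key: key}
	if q := s.buffered[key]; len(q) > 0 {
		r.value, r.done = q[0], true
		s.buffered[key] = q[1:]
		return r
	}
	s.waiting[key] = append(s.waiting[key], r)
	return r
}

// Pump is one iteration of the receive loop (ReceiveData plus
// ProcessSpecialData).
func (s *Source) Pump(batch []Item) {
	for _, it := range batch {
		if it.Key == s.specialKey {
			s.handleSpecial(it) // the loop consumes this item itself
			continue
		}
		if q := s.waiting[it.Key]; len(q) > 0 {
			q[0].value, q[0].done = it.Value, true // oldest waiter first
			s.waiting[it.Key] = q[1:]
			continue
		}
		s.buffered[it.Key] = append(s.buffered[it.Key], it.Value)
	}
}
```

In the Lua version, the host would resume each coroutine whose request became complete after every pump. Buffering unrequested values instead of dropping them is a design choice of this sketch.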

Related

Returning multiple items in gRPC: repeated List or stream single objects?

gRPC newbie. I have a simple API:
Customer getCustomer(int id)
List<Customer> getCustomers()
So my proto looks like this:
message ListCustomersResponse {
repeated Customer customer = 1;
}
rpc ListCustomers (google.protobuf.Empty) returns (ListCustomersResponse);
rpc GetCustomer (GetCustomerRequest) returns (Customer);
I was trying to follow Google's lead on the style. Originally I had returns (stream Customer) for GetCustomers, but Google seems to favor the ListxxxResponse style. When I generate the code, it ends up being:
public void getCustomers(com.google.protobuf.Empty request,
StreamObserver<ListCustomersResponse> responseObserver) {
vs:
public void getCustomers(com.google.protobuf.Empty request,
StreamObserver<Customer> responseObserver) {
Am I missing something? Why would I want to go through the hassle of creating a ListCustomersResponse when I can just do stream Customer and get the streaming functionality?
The ListCustomersResponse just returns the whole list at once, versus streaming each customer. Google's preference seems to be to use the ListCustomersResponse style all of the time.
When is it appropriate to use the ListxxxResponse vs the stream response?
This question is hard to answer without knowing what reference you're using. It's possible there's a miscommunication, or that the reference is simply wrong.
If you're looking at the gRPC Basics tutorial though, then I might have an inkling as to what caused a miscommunication. If that's indeed your reference, then it does not recommend returning repeated fields for streamed responses; your intuition is correct: you would just want to stream the singular Customer.
Here is what it says (screenshot intentional):
You might be reading rpc ListFeatures(Rectangle) as meaning an endpoint that returns a list [noun] of features. If so, that's a miscommunication. The guide actually means an endpoint to list [verb] features. It would have been less confusing if they had just written rpc GetFeatures(Rectangle).
So your proto should look more like this:
rpc GetCustomers (google.protobuf.Empty) returns (stream Customer);
rpc GetCustomer (GetCustomerRequest) returns (Customer);
This generates exactly the code you suspected made more sense.
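To make the difference concrete, here is a dependency-free Go sketch of the two handler shapes. `CustomerSender` is a hypothetical stand-in that models only the `Send` method grpc-go gives a server-streaming handler; the real generated stream type has more methods, and the real handler signatures are generated from the proto.

```go
package main

// Customer stands in for the message generated from the proto (hypothetical).
type Customer struct {
	ID   int32
	Name string
}

// CustomerSender models the one method a grpc-go server-streaming handler
// uses for `returns (stream Customer)`. The generated stream interface also
// embeds grpc.ServerStream; that part is omitted so the sketch has no deps.
type CustomerSender interface {
	Send(*Customer) error
}

// getCustomers streams one Customer per Send call, so the client can start
// consuming results before the server has produced all of them.
func getCustomers(all []Customer, stream CustomerSender) error {
	for i := range all {
		if err := stream.Send(&all[i]); err != nil {
			return err // typically the client cancelled or disconnected
		}
	}
	return nil
}

// ListCustomersResponse is the simple-RPC alternative: one message with a
// repeated field, built completely before it is returned.
type ListCustomersResponse struct {
	Customers []*Customer
}

func listCustomers(all []Customer) *ListCustomersResponse {
	resp := &ListCustomersResponse{Customers: make([]*Customer, 0, len(all))}
	for i := range all {
		resp.Customers = append(resp.Customers, &all[i])
	}
	return resp
}

// sliceSender is an in-memory CustomerSender used to exercise getCustomers.
type sliceSender struct{ sent []*Customer }

func (s *sliceSender) Send(c *Customer) error {
	s.sent = append(s.sent, c)
	return nil
}
```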
Update
Ah I see, so you're looking at this example in googleapis:
// Lists shelves. The order is unspecified but deterministic. Newly created
// shelves will not necessarily be added to the end of this list.
rpc ListShelves(ListShelvesRequest) returns (ListShelvesResponse) {
option (google.api.http) = {
get: "/v1/shelves"
};
}
...
// Response message for LibraryService.ListShelves.
message ListShelvesResponse {
// The list of shelves.
repeated Shelf shelves = 1;
// A token to retrieve next page of results.
// Pass this value in the
// [ListShelvesRequest.page_token][google.example.library.v1.ListShelvesRequest.page_token]
// field in the subsequent call to `ListShelves` method to retrieve the next
// page of results.
string next_page_token = 2;
}
Yeah, I think you've probably figured this out by now, but here they have chosen to use a simple RPC, as opposed to a server-side streaming RPC (see here). I emphasize this because I think the important choice is not the stylistic difference between repeated and stream, but rather the difference between a simple request-response API and a more complex, less ubiquitous streaming API.
In the googleapis example above, they're defining an API that returns a fixed, static number of items per page, e.g. 10 or 50. Streaming would simply be overcomplicated here, when pagination is already so well understood and prevalent in software architecture and REST APIs. I think that is what they should have said, rather than "a small number." So the complexity of streaming (and its learning cost to you and future maintainers) has to be justified, that's all. But suppose you're actually fetching thousands of (x, y, z) items for a point cloud, or you're creating a live-updating bid-ask visualizer for some cryptocurrency.
Then you'd start asking yourself, "Is a simple request-response API my best option here?" In general, the larger the number of items to be returned, the more sense a streaming API starts to make. That can be for conceptual reasons, e.g. the items form a live-updating stream over time, as in the crypto example above, or for architectural ones, e.g. it would be more efficient to start displaying results in the UI as partial data streams back. I think the "small number" thing you read was an oversimplification.
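The page-token contract in the ListShelves example can be sketched in plain Go without any gRPC dependency. Treating the token as the decimal offset of the next item and defaulting to 10 items per page are assumptions made for brevity; the type names mirror the proto, but the function is hypothetical.

```go
package main

import (
	"fmt"
	"strconv"
)

// Shelf stands in for the generated Shelf message (hypothetical).
type Shelf struct{ Name string }

// ListShelvesResponse mirrors the proto: one page plus a continuation token.
type ListShelvesResponse struct {
	Shelves       []Shelf
	NextPageToken string
}

// listShelves serves one page from an in-memory slice. Encoding the token as
// the decimal offset of the next item is an assumption made for brevity;
// clients should treat it as opaque.
func listShelves(all []Shelf, pageSize int, pageToken string) (ListShelvesResponse, error) {
	if pageSize <= 0 {
		pageSize = 10 // hypothetical server-chosen default page size
	}
	start := 0
	if pageToken != "" {
		n, err := strconv.Atoi(pageToken)
		if err != nil || n < 0 || n > len(all) {
			return ListShelvesResponse{}, fmt.Errorf("invalid page token %q", pageToken)
		}
		start = n
	}
	end := start + pageSize
	if end > len(all) {
		end = len(all)
	}
	resp := ListShelvesResponse{Shelves: all[start:end]}
	if end < len(all) {
		resp.NextPageToken = strconv.Itoa(end) // empty token means "last page"
	}
	return resp, nil
}
```

A client loops, passing each NextPageToken back until it comes back empty; every call stays an ordinary request-response exchange, which is the simplicity argument made above.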

Cancel Http handler request

I have handlers that respond to HTTPS requests. In the handlers I call a function F1() which does some application logic, connects to a MySQL database and runs a query. I want to know how I can use the Go context package to cancel the database query if the client cancels the request. Do I need to pass the ctx to F1()? Also, the code I have now takes 4 seconds even if F1() returns in less than 4. How can I return as soon as F1() returns?
func handler(w http.ResponseWriter, r *http.Request) {
	ctx := r.Context()
	F1()
	select {
	case <-ctx.Done():
	case <-time.After(4 * time.Second):
	}
	w.WriteHeader(http.StatusOK)
	return
}
To begin, I highly recommend taking a look at the Context blog post to familiarize yourself with contexts, in addition to reading over the context documentation itself.
To address your specific questions:
How can you cancel the database query if the user cancels their request?
To make this work, there are a few things you want to check:
Ensure that your database driver (if you are using database/sql) supports context cancellation.
Ensure you are using the Context variants of all available methods (e.g. db.QueryContext instead of db.Query).
Ensure that you are passing the context (or a derivative of the context) through the stack from your HTTP request through to the database calls.
Do I need to pass the ctx to F1()?
Per the third point above, yes: you will need to pass the context through all intermediate calls to "connect" the database call with the request context.
How can I return as soon as F1() returns?
The code in your question calls F1 in series with your cancellation/timeout select, rather than concurrently with it. So after F1 returns, the handler still blocks in the select until either the client cancels or the four-second timer fires.
If you want to apply a specific deadline to your database call, use context.WithTimeout to limit how long it can take. Otherwise, you do not need to do anything special: just call F1 with your context, and the rest will happen for you; no select is needed.

Capping an Aerospike map in Lua

We want to remove elements from a map bin based on its size. Multiple threads will try to perform this operation, so we expected that writing a UDF for it would synchronize it between threads. But remove_by_rank_range does not work inside Lua. Below is the error we are getting:
attempt to call field 'remove_by_rank_range' (a nil value)
Sample Lua code:
function delete(rec)
local testBinMap = rec.testBin
map.remove_by_rank_range(testBinMap, 0, 5)
end
The Lua map API does not include most of the operations of the Map data type, as implemented in the clients (for example, the Java client's MapOperation class).
The performance of the native map operations is significantly higher, so why would you use a UDF here, instead of calling remove_by_rank_range from the client?
The next thing to be aware of is that any write operation, whether it's a UDF or a client calling the map remove_by_rank_range method, first grabs a lock on the record. I answered another Stack Overflow question about this request flow. For the problem you described, your UDF offers no advantage over the client map operation.
If you want to cap the size of your map you should be doing it at the very same time you're adding new elements to the map. The two operations would be wrapped together with operate() - an insert, followed by the remove. I have an example of how to do this in rbotzer/aerospike-cdt-examples.

Any other examples of multi-state Agent programming in FSharp?

I'm investigating F# agents that have multiple states, i.e., using the "let rec/and" keyword combination (per Expert F# 3.0's "Message Processing and State Machines") to provide multiple async blocks. The only example I've been able to find so far is the "throttling agent" discussed here (also Fssnip.net). Are there any other resources for learning this pattern?
Edit: My specific application is an agent that has two states:
| StartFeed rateMultiplier replychannel ->
- replychannel out data values at a delay (provided with each value)
multiplied by rateMultiplier
- loop by using
thisAgent.Post(StartFeed rateMultiplier replychannel)
| Pause ->
I would like to provide some way to pass in a feed rate multiplier value that increases/decreases the delay by the passed-in multiplier in the "feed" async state, without interrupting the feed of values. I guess the question boils down to "how do you keep an async state block actively looping while still being aware of new messages?" Almost like skipping the inbox.Receive asynchronous wait unless a message actually comes in? inbox.Scan?
Edit 2: Given the message queue aspect of MailboxProcessor, I can see that an external message (with a different rateMultiplier value) that is received by the agent and placed in the queue will successfully change the rate without interrupting the flow of data values out. Any advice on the "Pause" would still be appreciated.
I have found Tomas Petricek's entry https://github.com/tpetricek/FSharp.AsyncExtensions/blob/master/src/Agents/BlockingQueueAgent.fs , which gives an agent with the standard MailboxProcessor queue a way to choose which async block it will use to process the next incoming message (i.e., lets the agent 'change its state'):
inbox.Receive() is used for the 'standard state' - the agent's message 'inbox' queue is neither full nor empty (State #1)
inbox.Scan() is used for the 'edge' or limiting cases of empty (State #2) and full (State #3) message 'inbox' queue
the actions the agent (in whichever of the three states) can take in response to received messages are written as distinct async blocks, each given its own 'and' block in the agent's 'let rec' loop. I had thought that 'let rec...and...' async blocks were restricted to containing a message receipt function (.Receive, .Scan, etc.), but that is incorrect: they may be any async block that maintains the desired control flow, as seen in the next feature of the 'let rec...and...' agent body:
once the agent, in whichever of the three states, responds to a new message by routing it to the appropriate action, the action itself finishes with a call to another 'and' async block of the agent body's 'let rec' loop: 'chooseState()', an if/then block that determines which state will handle the next message and calls that state's 'and' async block from among the three available.
This example seems essential for demonstrating idiomatic use of the multi-state agent body, specifically how to combine the three functions of message receipt, response, and looping control as mutually recursive elements of a single 'let rec...and...and...' construction.
Of course other message-passing frameworks exist, but this is a general logic/routing design for a more complex agent, whatever the framework, so:
thanks, Tomas.

What is the use case of firebase-queue sanitize?

I am experimenting with firebase-queue. I saw the option for sanitizing. It's described in the doc as
sanitize - specifies whether the data object passed to the processing
function is sanitized of internal keys reserved for use by the queue.
Defaults to true.
What does it mean?
I get an error if I don't specify { sanitize : false }.
When the sanitize option is set, the queue sanitizes (or cleans) the input provided to the processing function so that it resembles that which the original client placed onto the queue, and doesn't contain any of the keys added by the implementation of the queue itself.
If, however, you rely on a key (usually one starting with an underscore, e.g. _id) that is added by the queue and not by the original client, you need to set sanitize: false so that those keys are passed to your function instead of being undefined.
You can clearly see the difference with a simple processing function that just performs a console.log(data).
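What sanitizing does can be sketched in a few lines of Go. This is not firebase-queue code: treating every key that starts with an underscore as queue-internal is an assumption, and `sanitize` and `dataForProcessing` are hypothetical names.

```go
package main

import "strings"

// sanitize returns a copy of the task data without queue-internal keys,
// approximating what the processing function sees with sanitize: true.
// Treating every "_"-prefixed key as internal is this sketch's assumption.
func sanitize(task map[string]any) map[string]any {
	clean := make(map[string]any, len(task))
	for k, v := range task {
		if strings.HasPrefix(k, "_") {
			continue // e.g. _id: added by the queue, not by the original client
		}
		clean[k] = v
	}
	return clean
}

// dataForProcessing mimics the option: with sanitizeOpt false, the raw task,
// including queue-added keys such as _id, is passed through unchanged.
func dataForProcessing(raw map[string]any, sanitizeOpt bool) map[string]any {
	if sanitizeOpt {
		return sanitize(raw)
	}
	return raw
}
```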
A quick note about why these keys are removed by default: reading or writing directly to the task's location from within the worker (as it looks like you may be doing, since undefined rather than data._id is being passed into the client SDK's child() method) is generally a bad idea, because direct writes are not guarded by the queue's extensive transaction logic for preventing race conditions. If you can restrict the work to taking input from the provided data and returning outputs through the resolve() function, you'll likely have a better time scaling up your queue.
