I'm investigating F# agents that have multiple states, i.e., using the "let rec/and" keyword combination (per Expert F# 3.0's "Message Processing and State Machines") to provide multiple async blocks. The only example I've been able to find so far is the "throttling agent" discussed here (also Fssnip.net). Are there any other resources for learning this pattern?
edit: My specific application is an agent that has two states:

| StartFeed rateMultiplier replyChannel ->
    - send data values out over replyChannel, each at a delay (provided with
      the value) multiplied by rateMultiplier
    - loop by calling thisAgent.Post(StartFeed rateMultiplier replyChannel)
| Pause ->
I would like to provide some way to pass in a feed-rate multiplier that increases or decreases the delay in the "feed" async state, without interrupting the flow of values. I guess the question boils down to: how do you keep an async state block actively looping while still being aware of new messages? Almost like skipping the inbox.Receive asynchronous wait unless a message has actually arrived. inbox.Scan?
edit 2: Given the message-queue semantics of MailboxProcessor, I can see that an external message (with a different rateMultiplier value) received by the agent and placed in the queue will successfully change the rate without interrupting the flow of data values out. Any advice on the "Pause" state would still be appreciated.
I have found Tomas Petricek's BlockingQueueAgent (https://github.com/tpetricek/FSharp.AsyncExtensions/blob/master/src/Agents/BlockingQueueAgent.fs), which gives an agent, with the standard MailboxProcessor queue, a way to choose which async block it will employ to process the next incoming message (i.e., it lets the agent 'change its state'):
inbox.Receive() is used for the 'standard state' - the agent's message 'inbox' queue is neither full nor empty (State #1)
inbox.Scan() is used for the 'edge' or limiting cases of empty (State #2) and full (State #3) message 'inbox' queue
the actions the agent (in whichever of the three states) can take in response to received messages are written as distinct async blocks, each given its own 'and' clause in the agent's 'let rec' loop. I had thought that 'let rec...and...' async blocks were restricted to containing a message-receipt function (.Receive, .Scan, etc.); that is incorrect: they may be any async block that maintains the desired control flow, as seen in the next feature of the 'let rec...and...' agent body:
once the agent, in whichever of the three states, responds to a new message by routing to the appropriate action, the action itself finishes with a call to another 'and' async block of the agent body's 'let rec' loop, chooseState(): an if/then block that determines which state will handle the next message and calls the corresponding 'and' async block from among the three available.
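Condensed, the control-flow skeleton of that file looks roughly like the following (reconstructed and abbreviated from the linked source, so treat names and details as approximate; see the full version for error handling):

type BlockingAgentMessage<'T> =
    | AsyncAdd of 'T * AsyncReplyChannel<unit>
    | AsyncGet of AsyncReplyChannel<'T>

let createQueueAgent<'T> (maxLength: int) : MailboxProcessor<BlockingAgentMessage<'T>> =
    MailboxProcessor.Start(fun inbox ->
        let queue = System.Collections.Generic.Queue<'T>()
        // State #2: empty queue - only Add can proceed, so Scan for it
        let rec emptyQueue () =
            inbox.Scan(function
                | AsyncAdd (v, reply) -> Some (enqueueAndContinue (v, reply))
                | _ -> None)
        // State #3: full queue - only Get can proceed, so Scan for it
        and fullQueue () =
            inbox.Scan(function
                | AsyncGet reply -> Some (dequeueAndContinue reply)
                | _ -> None)
        // State #1: neither full nor empty - any message, so Receive
        and runningQueue () = async {
            let! msg = inbox.Receive()
            match msg with
            | AsyncAdd (v, reply) -> return! enqueueAndContinue (v, reply)
            | AsyncGet reply -> return! dequeueAndContinue reply }
        // Actions: ordinary async blocks, no Receive/Scan of their own
        and enqueueAndContinue (v, reply: AsyncReplyChannel<unit>) = async {
            reply.Reply ()
            queue.Enqueue v
            return! chooseState () }
        and dequeueAndContinue (reply: AsyncReplyChannel<'T>) = async {
            reply.Reply (queue.Dequeue())
            return! chooseState () }
        // Looping control: route to whichever state should receive next
        and chooseState () =
            if queue.Count = 0 then emptyQueue ()
            elif queue.Count < maxLength then runningQueue ()
            else fullQueue ()
        chooseState ())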
This example seems essential in demonstrating idiomatic use of the multi-state agent body, specifically how to combine the three functions of message receipt, response, and looping control as mutually recursive elements of a single 'let rec...and...and...' construction.
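Applying the same construction to the two-state feed agent from the question, a minimal sketch might look like this (the FeedMsg type, the plain callback standing in for the reply channel, and the 100 ms base delay are all placeholders). The key trick for the "feed" state is inbox.TryReceive with a zero timeout, which checks the queue without suspending the loop:

type FeedMsg =
    | StartFeed of float * (int -> unit)   // rate multiplier, value consumer
    | Pause

let feedAgent = MailboxProcessor<FeedMsg>.Start(fun inbox ->
    // Pause state: nothing to do, so block until the next message
    let rec paused () = async {
        let! msg = inbox.Receive()
        match msg with
        | StartFeed (rate, send) -> return! feeding rate send
        | Pause -> return! paused () }
    // Feed state: emit values in a loop, polling the inbox between emissions
    and feeding rate send = async {
        // Zero timeout: returns None immediately if no message is queued,
        // so the feed keeps flowing unless something actually arrived
        let! msg = inbox.TryReceive(timeout = 0)
        match msg with
        | Some (StartFeed (newRate, newSend)) -> return! feeding newRate newSend
        | Some Pause -> return! paused ()
        | None ->
            send 42                                // placeholder value
            do! Async.Sleep (int (100.0 * rate))   // per-value delay * multiplier
            return! feeding rate send }
    paused ())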
Of course other message-passing frameworks exist, but this is a general logic/routing design for a more complex agent, whatever the framework, so:
thanks, Tomas.
I am new to Akka and still trying to understand the different Akka and streaming concepts. For a new feature I need to add an HTTP call to an already existing stream which works on an internal object. Something like this -
val step1Flow = Flow[SampleObject].filter(...) // filtering condition
val step2Flow = Flow[SampleObject].map(obj => {
  ...
  // business logic to update values in the obj
  ...
})
...
override val flowGraph: Flow[SampleObject, SampleObject, NotUsed] =
bufferIn.via(Flow.fromGraph(GraphDSL.create() {
implicit builder =>
import GraphDSL.Implicits._
...
val step1 = builder.add(step1Flow)
val step2 = builder.add(step2Flow)
val step3 = builder.add(step3Flow)
...
source ~> step1 ~> step2 ~> step3 ~> merge
...
}
I need to add the new HTTP request flow (let's call it newFlow) after step1. All these flows have Inlet and Outlet of type SampleObject. My understanding was that newFlow would need to be blocking, because the outlet needs to be SampleObject only. For that I have used Await on the HTTP call's future. The code looks like this -
val responseFuture: Future[(Try[HttpResponse], SomeContext)] =
Source
.single(httpRequest -> context)
.via(Retry(retrySettings).join(clientFlow))
.runWith(Sink.head)
...
val (httpTry, passedAlongContext) = Await.result(responseFuture, 30.seconds)
// logic to process the response and return a SampleObject
Now this works fine, but I think there should be a better way to do this without using Await. I also think this blocks the calling thread until the request completes, which is going to affect the app's throughput.
Could you please guide me on whether the approach I used is correct or not? And how do I make use of some other thread pool to handle these blocking calls, so my main thread pool is not affected?
This question seems very similar to mine, but I do not understand it completely - connect Akka HTTP to Akka stream. Also, I can't change step2 or the flows after it.
EDIT : Added some code details for the stream
I ended up using the approach mentioned in the question because I couldn't find anything better after looking around. Adding this step decreased the throughput of my application, as expected, but there are techniques that can be used to win some of it back. Check these awesome blogs by Colin Breck -
https://blog.colinbreck.com/maximizing-throughput-for-akka-streams/
https://blog.colinbreck.com/partitioning-akka-streams-to-maximize-throughput/
To summarize -
Use Asynchronous Boundaries for flows which are blocking.
Use Futures if possible and add callbacks to them. There are several ways to do that; see the sketch after this list.
Use Buffers. There are several types of buffers available, choose what suits your needs.
Other than these, you can use inbuilt flows like -
Use "Broadcast" to broadcast your events to multiple consumers.
Use "Partition" to partition your stream into multiple streams based
on some condition.
Use "Balance" to partition your stream when there is no logical way to partition your events or they all could have different work loads.
You could use any one or multiple things from above options.
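For the specific case in the question, here is a minimal sketch of a non-blocking newFlow. It assumes a helper callAndUpdate that wraps the Retry(retrySettings).join(clientFlow) call and applies the response to the object; the helper and the parallelism value are assumptions, not part of the original code:

import akka.NotUsed
import akka.stream.scaladsl.Flow
import scala.concurrent.{ExecutionContext, Future}

// Hypothetical helper: performs the HTTP call for one object and returns
// the updated object, instead of blocking on the response future.
def callAndUpdate(obj: SampleObject)(implicit ec: ExecutionContext): Future[SampleObject] = ???

// mapAsync keeps up to `parallelism` calls in flight, emits results in
// upstream order, and never parks a stream thread on Await.
def newFlow(implicit ec: ExecutionContext): Flow[SampleObject, SampleObject, NotUsed] =
  Flow[SampleObject]
    .mapAsync(parallelism = 4)(callAndUpdate)
    .async // asynchronous boundary: this stage runs on its own actor

If the call truly has to block, run it inside a Future on a dedicated dispatcher configured for blocking IO and still feed it through mapAsync, so the default dispatcher stays free.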
I have a scheduled function that runs every three minutes.
It is supposed to look at the database (Firestore), query the relevant users, and send them emails or perform other DB actions.
Once it sends an email to a user, it updates the user with the field 'sent_to_today: true'.
If sent_to_today == true, the function won't touch that user for around 24 hours, which is what's intended.
But because I have many users and the function is doing a lot of work, by the time it updates a user with sent_to_today: true, another invocation has already reached that user and processed them for sending emails.
This results in some users getting the same email, twice.
What are my options to make sure this doesn't happen?
Data Model (simplified):
users (Collection)
--- userId (document)
--- sent_to_today [Boolean]
--- NextUpdateTime [String representing a Timestamp in ISO String]
When the function runs, if ("Now" >= NextUpdateTime) && (sent_to_today==false), the user is processed, otherwise, they're skipped.
How do I make sure that the user is only processed by one invocation per day, and not many?
As I said, by the time they're processed by one function invocation (which sets "sent_to_today" to true), the next invocation gets to that user and processes them.
Any help in structuring the data better or using any other logical method would be greatly appreciated.
Here is an idea I'm considering:
Each invocation sets a field on a global document (e.g. busy_right_now: true) at the start, and sets it back to false when finished. If a subsequent invocation runs before the current one has finished, it does nothing while busy_right_now is still true.
Option 1.
Do you think the function could be invoked once every ten minutes, rather than every three? If yes, just modify the scheduler and make sure the 'max instances' attribute is 1. As the function timeout is only 540 seconds, 10 minutes (600 seconds) is more than enough to avoid overlapping.
Option 2.
When a Firestore document is chosen for processing, the cloud function modifies some attribute - e.g. __state - and sets its value to IN_PROGRESS. When the processing is finished (the email is sent), that attribute is modified again - to DONE, for example. Thus, if the function picks up a document whose __state attribute is IN_PROGRESS, it simply ignores it and continues to the next one.
The drawback: if the function crashes, there might be documents left in the IN_PROGRESS state, so there should be some mechanism to monitor and resolve such cases.
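Note that the claim itself must be atomic, otherwise two invocations can both read the document before either writes IN_PROGRESS. A minimal sketch with the Node.js Admin SDK (sendEmailTo is a hypothetical helper; this would run inside the scheduled function for each candidate user):

// Claim the user inside a transaction; runTransaction retries on contention,
// so at most one invocation sees claimed === true.
const claimed = await db.runTransaction(async (tx) => {
  const ref = db.collection('users').doc(userId);
  const snap = await tx.get(ref);
  if (snap.get('sent_to_today') === true || snap.get('__state') === 'IN_PROGRESS') {
    return false; // another invocation got here first
  }
  tx.update(ref, { __state: 'IN_PROGRESS' });
  return true;
});

if (claimed) {
  await sendEmailTo(userId); // hypothetical email helper
  await db.collection('users').doc(userId).update({
    __state: 'DONE',
    sent_to_today: true,
  });
}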
Option 3.
One cloud function runs through the Firestore collection and, for each document that is to be processed, sends a Pub/Sub message which triggers another cloud function. That one works with only one Firestore document. Nevertheless, the 'state machine' control is still required (as in Option 2 above). The benefit of Option 3 is a higher level of specialisation between functions, and many of the 'second' cloud functions can run in parallel.
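The fan-out in Option 3 might look like this sketch (the topic name and payload shape are assumptions):

const { PubSub } = require('@google-cloud/pubsub');
const pubsub = new PubSub();

// One message per user document; each message triggers a separate function
// invocation that processes exactly one user.
async function fanOut(userIds) {
  await Promise.all(
    userIds.map((userId) =>
      pubsub.topic('process-user').publishMessage({ json: { userId } })
    )
  );
}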
gRPC newbie. I have a simple api:
Customer getCustomer(int id)
List<Customer> getCustomers()
So my proto looks like this:
message ListCustomersResponse {
repeated Customer customer = 1;
}
rpc ListCustomers (google.protobuf.Empty) returns (ListCustomersResponse);
rpc GetCustomer (GetCustomerRequest) returns (Customer);
I was trying to follow Google's lead on the style. Originally I had returns (stream Customer) for GetCustomers, but Google seems to favor the ListxxxResponse style. When I generate the code, it ends up being:
public void getCustomers(com.google.protobuf.Empty request,
StreamObserver<ListCustomersResponse> responseObserver) {
vs:
public void getCustomers(com.google.protobuf.Empty request,
StreamObserver<Customer> responseObserver) {
Am I missing something? Why would I want to go through the hassle of creating a ListCustomersResponse when I can just do stream Customer and get the streaming functionality?
The ListCustomersResponse is just streaming the whole list at once vs streaming each customer. Google's preference seems to be to return the ListCustomersResponse style all of the time.
When is it appropriate to use the ListxxxResponse vs the stream response?
This question is hard to answer without knowing what reference you're using. It's possible there's a miscommunication, or that the reference is simply wrong.
If you're looking at the gRPC Basics tutorial though, then I might have an inkling as to what caused a miscommunication. If that's indeed your reference, then it does not recommend returning repeated fields for streamed responses; your intuition is correct: you would just want to stream the singular Customer.
Here is what it says (shown as a screenshot in the original post; not reproduced here):
You might be reading rpc ListFeatures(Rectangle) as meaning an endpoint that returns a list [noun] of features. If so, that's a miscommunication. The guide actually means an endpoint to list [verb] features. It would have been less confusing if they just wrote rpc GetFeatures(Rectangle).
So, your proto should look more like this,
rpc GetCustomers (google.protobuf.Empty) returns (stream Customer);
rpc GetCustomer (GetCustomerRequest) returns (Customer);
generating exactly what you suspected made more sense.
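For reference, the server side of that streaming form might be implemented like this sketch (repository is a hypothetical data source):

@Override
public void getCustomers(com.google.protobuf.Empty request,
                         StreamObserver<Customer> responseObserver) {
  // Each customer is written to the stream individually; the client can
  // start consuming before the full list has been produced.
  for (Customer customer : repository.findAll()) {
    responseObserver.onNext(customer);
  }
  responseObserver.onCompleted();
}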
Update
Ah I see, so you're looking at this example in googleapis:
// Lists shelves. The order is unspecified but deterministic. Newly created
// shelves will not necessarily be added to the end of this list.
rpc ListShelves(ListShelvesRequest) returns (ListShelvesResponse) {
option (google.api.http) = {
get: "/v1/shelves"
};
}
...
// Response message for LibraryService.ListShelves.
message ListShelvesResponse {
// The list of shelves.
repeated Shelf shelves = 1;
// A token to retrieve next page of results.
// Pass this value in the
// [ListShelvesRequest.page_token][google.example.library.v1.ListShelvesRequest.page_token]
// field in the subsequent call to `ListShelves` method to retrieve the next
// page of results.
string next_page_token = 2;
}
Yeah, I think you've probably figured the same by now, but here they have chosen to use a simple RPC, as opposed to a server-side streaming RPC (see here). I emphasize this because, I think the important choice is not the stylistic difference between repeated versus stream, but rather the difference between a simple request-response API versus a more complex and less-ubiquitous streaming API.
In the googleapis example above, they're defining an API that returns a fixed and static number of items per page, e.g. 10 or 50. It would simply be overcomplicated to use streaming for that, when pagination is already so well understood and prevalent in software architecture and REST APIs. I think that is what they should have said, rather than "a small number." So the complexity of streaming (and the learning cost to you and future maintainers) has to be justified, that's all. Suppose instead you're fetching thousands of (x, y, z) items for a point cloud, or creating a live-updating bid-ask visualizer for some cryptocurrency.
Then you'd start asking yourself, "Is a simple request-response API my best option here?" It tends to be that the larger the number of items needing to be returned, the more streaming APIs start to make sense. That can be for conceptual reasons, e.g. the items are a live-updating stream in time like the crypto example above, or architectural ones, e.g. it would be more efficient to start displaying results in the UI as partial data streams back. I think the "small number" thing you read was an oversimplification.
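For completeness, the request side of that paginated style looks roughly like this (field numbers illustrative, mirroring the googleapis example above):

message ListShelvesRequest {
  // Maximum number of shelves to return in one page.
  int32 page_size = 1;
  // The next_page_token value returned by a previous ListShelves call,
  // or empty for the first page.
  string page_token = 2;
}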
I have a micro-service which is involved in an OAuth 1 interaction. I'm finding myself in a situation where two runs of the Lambda function with precisely the same starting state have very different outcomes (where state is considered the "event" passed in, environment variables, and "stageParameters" from the API Gateway).
Here's a CloudWatch log that shows two back-to-back runs (shown as a screenshot in the original post; not reproduced here):
You can see that while the starting state is identical, the execution path diverges pretty quickly. In the second case (the failure case), you see the log entry "Auth state changed: null"; that is very odd indeed, because this is logged before even the first line of code of the handler has executed. Here's the beginning of the function's handler:
export const handler = (event, context, cb) => {
console.log('EVENT:\n', JSON.stringify(event, null, 2));
So where is this premature log entry coming from? Well, one must assume that it is somehow left over from a prior execution. Let me demonstrate ... it is in fact an event listener that was set up in the prior execution. This function interacts with a Firebase DB, and the first time it connects it sets up the following:
auth.signInWithEmailAndPassword(username, password)
  .then((result) => {
    auth.onAuthStateChanged(this.watchAuthState);
  });
where the watchAuthState function is simply:
watchAuthState(user) {
console.log(`Auth state changed:\n`, JSON.stringify(user, null, 2));
}
This seems to mean that when I run the function a second time I am already "initialized" with the Firebase DB, but apparently the authentication has been invalidated. My number one aim is just to get back to a predictable state model and have it execute precisely the same way each time.
If there are sneaky ways to reuse cached state between Lambda executions in resource-useful ways, then I guess that too would be interesting, but only if we can do that while achieving the predictable state machine.
Regarding the log order, look at the ID that comes after each timestamp at the beginning of each line. I believe this is the invocation ID. The two lines you have highlighted in orange are from different invocations of the function. The EVENT log is the first line logged by the invocation with the ID ending in 754ee. The Auth state changed: null line comes from the earlier invocation, with the invocation ID ending in c40d5.
It looks like you are setting auth state to null at the end of an invocation, but the Firebase connection is global, so the second function invocation thinks the Firebase connection is already initialized, but then it throws errors because the authentication was nulled out.
My number one aim is just to get back to a predictable state model and have it execute precisely the same way each time.
Then you need to be aware of Lambda container reuse, and not use any global variables.
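Concretely, one way to get the predictable model is to initialize and tear down inside the handler, so nothing survives container reuse. A minimal sketch (initFirebase is a hypothetical helper; username and password as in the question):

export const handler = async (event) => {
  console.log('EVENT:\n', JSON.stringify(event, null, 2));
  // Fresh app per invocation: slower on warm starts, but every run begins
  // from the same state.
  const app = initFirebase();
  await app.auth().signInWithEmailAndPassword(username, password);
  try {
    // ... the rest of the handler ...
  } finally {
    // Tear down so no listeners (like onAuthStateChanged) survive into the
    // next invocation of a reused container.
    await app.delete();
  }
};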
The problem
One data source generating data in format {key, value}
Multiple receivers each waiting for different key
Example
Getting data is run in a loop. Sometimes I will want to get the next value labelled with a key by using
Value = MyClass:GetNextValue(Key)
I want my code to stop there until the value is ready (making some sort of future(?) value). I've tried using simple coroutines, but they work only when waiting for any data.
So the question I want to ask is something like How to implement async values in lua using coroutines or similar concept (without threads)?
Side notes
The main processing function will, apart from returning values to waiting consumers, process some of incoming data (say, labeled with special key) itself.
The full usage context should look something like:
-- in loop
ReceiveData()
ProcessSpecialData()
--
-- Called outside the loop:
V = RequestDataWithGivenKey(Key)
How to implement async values
You start by not implementing async values. You implement async functions: you don't get the value back until it has been retrieved.
First, your code must be in a Lua coroutine. I'll assume you understand the care and feeding of coroutines. I'll focus on how to implement RequestDataWithGivenKey:
function RequestDataWithGivenKey(key)
    local request = FunctionThatStartsAsyncGetting(key)
    -- Yield back to the scheduler until the request has actually finished;
    -- a single resume does not guarantee completion.
    while not request:IsComplete() do
        coroutine.yield()
    end
    -- Request is complete. Return the value.
    return request:GetReturnedValue()
end
FunctionThatStartsAsyncGetting returns a request object to the caller. The request is an object that stores all of the data needed to process the specific request. It represents asking for the value. This should be a C function that starts the actual async getting.
The request will be either a userdata or an encapsulated Lua table that stores enough information to communicate with the C code that's doing the async fetching. IsComplete uses the internal request data to see if that request has completed. GetReturnedValue can only be called when IsComplete returns true; it puts the value on the Lua stack so that this function can return it.
Your external code simply needs to handle the async stuff internally. Between resumes of these Lua coroutines, you'll need to pump whatever async stuff is doing the fetching, if there are outstanding requests.
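A minimal sketch of what that outer pumping might look like (PumpAsyncIO stands in for whatever C-side function advances outstanding requests; the consumer registry is likewise an assumption):

-- Consumers run inside coroutines so they can block on values.
local consumers = {}

function AddConsumer(fn)
    consumers[coroutine.create(fn)] = true
end

-- Called from the main loop, e.g. alongside ReceiveData().
function PumpConsumers()
    PumpAsyncIO()  -- assumption: lets the C side advance async requests
    for co in pairs(consumers) do
        local ok, err = coroutine.resume(co)
        if not ok then
            print("consumer error: " .. tostring(err))
        end
        if coroutine.status(co) == "dead" then
            consumers[co] = nil
        end
    end
end

-- Usage: this consumer blocks inside RequestDataWithGivenKey until its
-- key's value has been fetched.
AddConsumer(function()
    local v = RequestDataWithGivenKey("temperature")
    print("got value:", v)
end)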