Flink checkpointing halts while async I/O operator is awaiting response

The async I/O operator (1.15.2) waits for the future to return (that is, for the embedded function to finish its processing), and until then checkpointing halts at this operator. It proceeds only once the function finishes. I'm attaching a screenshot.
[Screenshot: Checkpointing in progress]
I would like to test the scenario where the async I/O operator is waiting for a response and the job is restarted. Ideally the last checkpoint would be restored, and the async I/O processing would start again with the same set of data.
But because the checkpoint is stuck, restarting the job restores an earlier checkpoint, which doesn't have the correct set of data.
Please advise how I should tackle this scenario.

You should never let Flink end up waiting; all i/o should be done asynchronously. Otherwise, as you've seen, you'll end up interfering with checkpointing.
The async i/o operator is designed to help with this, but you must either use an asynchronous client library for connecting to the external service, or make your synchronous calls in another thread. See https://stackoverflow.com/a/64225825/2000823 for an example of how to do this.
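The idea of making synchronous calls in another thread can be illustrated outside Flink. The following is a minimal sketch in Python, not Flink's API: `blocking_lookup` is a hypothetical stand-in for a synchronous client call, and the pool size is an arbitrary assumption. The coordinating thread only awaits futures, so it never sits inside the blocking call itself.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor


def blocking_lookup(key: str) -> str:
    """Hypothetical synchronous client call; sleep stands in for network latency."""
    time.sleep(0.05)
    return key.upper()


async def enrich(keys, pool):
    loop = asyncio.get_running_loop()
    # Each blocking call runs on a pool thread; the event loop thread
    # only awaits the resulting futures.
    futures = [loop.run_in_executor(pool, blocking_lookup, k) for k in keys]
    # gather returns results in the same order as the inputs.
    return await asyncio.gather(*futures)


def run_enrich(keys):
    # The pool is created once and reused for every call in this run.
    with ThreadPoolExecutor(max_workers=4) as pool:
        return asyncio.run(enrich(keys, pool))
```

In a Flink job the same shape would presumably apply: the operator's thread hands the call to an executor and returns immediately, and the result is delivered when the future completes.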

Related

Async Operation in A Flink Sink using CompletableFuture

Background
I'm planning to set up a data pipeline using Flink.
The flow looks like this:
Kafka --> Flink Job --> gRPC endpoint
Story so far
A blocking implementation is up and running, but it will not scale to high QPS.
Tried simulating async behavior here
Problem
I'm not sure how the pipeline would behave with an async implementation:
If CompletableFuture is used, each message is processed asynchronously, but will the next message be fetched before processing of the first is complete? In other words, there is a way to achieve async processing within a task manager, but how does the task manager behave when fetching the next message/tuple? Will it wait until the async processing is complete, or will it hand the work to the CompletableFuture/thread and fetch the next message?
Will a custom thread pool cause any issues if it is never shut down, given that the pipeline will run for a long period?
Is there any other way to achieve async behavior in a Flink sink?
I would leverage Flink's support for async operators, and have a DiscardingSink, versus trying to implement a custom async sink.
And no, I don't see any reason why having a persistent thread pool would cause problems.

Async. programming in .Net Core

I was reading Microsoft's documentation, specifically the async programming article, and I didn't understand the section that explains how the server's threads work when using async code:
because it (the server) uses async and await, each of its threads is freed up when the I/O-bound work starts, rather than when it finishes.
Could anyone explain what it means that the threads are freed up when the I/O starts?
Here is the article : https://learn.microsoft.com/en-us/dotnet/standard/async-in-depth
When ASP.NET gets an HTTP request, it takes a thread from the thread pool and uses that to execute the handler for that request (e.g., a specific controller action).
For synchronous actions, the thread stays assigned to that HTTP request until the action completes. For asynchronous actions, the await in the action method may cause the thread to return an incomplete task to the ASP.NET runtime. In this case, ASP.NET will free up the thread to handle other requests while the I/O is in flight.
Further reading about the difference between synchronous and asynchronous request handling and how asynchronous work doesn't require a thread at all times.
In the synchronous approach, when your application makes a call to an external resource such as a database or an HTTP service (via HttpClient), the thread that initiated the call needs to wait.
Until it gets a response, it waits idly.
In the asynchronous approach, the thread gets released as soon as the app makes an external call.
Here is an article about how it happens:
https://medium.com/#karol.rossa/asynchronous-programming-73b4f1988cc6
And a performance comparison between the async and sync approaches:
https://medium.com/#karol.rossa/asynchronous-performance-1be01a71925d
Here's an analogy for you: have you ever ordered at a restaurant with a large group and had someone not be ready to order when the waiter came to them? Did they bring in a different waiter to wait for him or did the waiter just come back to him after he took other people's orders?
The fact that the waiter is allowed to come back to him later means that he's freed up immediately after calling on him rather than having to wait around until he's ready.
Asynchronous I/O works the same way. When you do a web service call, for example, the slowest part (from the perspective of the client at least) is waiting for the result to come back: most of the delay is introduced by the network (and the other server), during which time the client thread would otherwise have nothing to do but wait. Async allows the client to do other things in the background.
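The "thread is freed while the I/O is in flight" behavior can be seen in a small sketch. This uses Python's asyncio rather than ASP.NET, so it is only an analogy: `handle_request` is a hypothetical handler and `asyncio.sleep` stands in for the I/O wait. A single thread starts every request before any of them finishes.

```python
import asyncio
import threading


async def handle_request(i, stats):
    stats["threads"].add(threading.get_ident())
    stats["in_flight"] += 1
    stats["peak"] = max(stats["peak"], stats["in_flight"])
    # Simulated I/O: while this handler waits, the thread runs other handlers.
    await asyncio.sleep(0.01)
    stats["in_flight"] -= 1
    return i


async def serve(n):
    stats = {"threads": set(), "in_flight": 0, "peak": 0}
    results = await asyncio.gather(*(handle_request(i, stats) for i in range(n)))
    return results, stats


results, stats = asyncio.run(serve(100))
# One thread served all requests, and all of them were waiting on I/O at once.
print(len(stats["threads"]), stats["peak"])
```

With a synchronous handler, the same thread would be stuck inside the first request until its I/O completed, so serving the requests concurrently would need one thread per request.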

rust tokio: calling async function from sync closure

I have the following problem:
I'm trying to call a sync closure from an async function, but the sync closure later has to call another async function.
I cannot use an async closure, because they are unstable at the moment:
error[E0658]: async closures are unstable
so I have to do it this way somehow.
I found a couple of questions related to the problem, such as this, but when I tried to implement it, I received the following error:
Cannot start a runtime from within a runtime.
This happens because a function (like `block_on`)
attempted to block the current thread while the
thread is being used to drive asynchronous tasks.'
Here is a playground link which hopefully shows what I'm having a problem with.
I'm using tokio as stated in the title.
As the error message states, Tokio doesn't allow you to have nested runtimes.
There's a section in the documentation for Tokio's Mutex which states the following:
[It] is ok and often preferred to use the ordinary Mutex from the standard library in asynchronous code. [...] the feature that the async mutex offers over the blocking mutex is that it is possible to keep the mutex locked across an .await point, which is rarely necessary for data.
Additionally, from Tokio's mini-Redis example:
A Tokio mutex is mostly intended to be used when locks need to be held
across .await yield points. All other cases are usually best
served by a std mutex. If the critical section does not include any
async operations but is long (CPU intensive or performing blocking
operations), then the entire operation, including waiting for the mutex,
is considered a "blocking" operation and tokio::task::spawn_blocking
should be used.
If the Mutex is the only async thing you need in your synchronous call, it's most likely better to make it a blocking Mutex. In that case, you can lock it from async code by first calling try_lock(), and, if that fails, attempting to lock it in a blocking context via spawn_blocking.
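The "try the lock first, otherwise wait for it off the async thread" pattern can be sketched as follows. This is a Python analogy, not Tokio's API: `threading.Lock` plays the role of the blocking mutex and `asyncio.to_thread` (Python 3.9+) plays the role of `spawn_blocking`. The names `locked_increment`, `hold_lock`, and `run_demo` are hypothetical, and the holder thread exists only to make the lock contended.

```python
import asyncio
import threading
import time


async def locked_increment(lock: threading.Lock, state: dict) -> None:
    # Fast path: try to take the lock without blocking the event loop thread.
    if not lock.acquire(blocking=False):
        # Slow path: wait for the lock on a worker thread, so the event
        # loop stays free to run other tasks in the meantime.
        state["slow_path"] += 1
        await asyncio.to_thread(lock.acquire)
    try:
        state["count"] += 1  # short critical section with no await inside
    finally:
        lock.release()


def hold_lock(lock, acquired, seconds):
    # Holds the lock for a while so that the async tasks find it taken.
    with lock:
        acquired.set()
        time.sleep(seconds)


async def run_tasks(lock, state, n):
    await asyncio.gather(*(locked_increment(lock, state) for _ in range(n)))


def run_demo(n: int = 5) -> dict:
    lock = threading.Lock()
    state = {"count": 0, "slow_path": 0}
    acquired = threading.Event()
    holder = threading.Thread(target=hold_lock, args=(lock, acquired, 0.05))
    holder.start()
    acquired.wait()  # the lock is held before any task starts
    asyncio.run(run_tasks(lock, state, n))
    holder.join()
    return state
```

The critical section contains no await, which is what makes a blocking lock acceptable here; a lock that must stay held across an await point is the case the quoted documentation reserves for the async mutex.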

Does a Meteor server-side collection insert block the process?

In http://docs.meteor.com/#insert there is a statement:
On the server, if you don't provide a callback, then insert blocks until the database acknowledges the write, or throws an exception if something went wrong.
Does that block the entire Node process? Do we always need to provide a callback?
No, it does not block the whole process. It only looks synchronous; in reality the Fiber (the current execution context, a cooperative thread) yields to other events in the event loop. You can safely use it, but be careful: other code can execute between the moment the Fiber yields and the moment it regains control.
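The caveat about other code running between yielding and regaining control can be made concrete. The sketch below uses Python coroutines as a stand-in for Meteor Fibers, so it is an analogy only; `unsafe_increment` and `store` are hypothetical. Each task reads a value, yields (as an insert would), and then writes back a result based on its now stale read.

```python
import asyncio


async def unsafe_increment(store):
    current = store["count"]      # read
    await asyncio.sleep(0)        # yield to the event loop, like a Fiber yielding
    store["count"] = current + 1  # write based on a possibly stale read


async def run_increments(n):
    store = {"count": 0}
    await asyncio.gather(*(unsafe_increment(store) for _ in range(n)))
    return store["count"]


# Every task read 0 before any of them wrote, so the updates overwrite each other.
final_count = asyncio.run(run_increments(10))
print(final_count)  # prints 1, not 10
```

Code that looks sequential is only safe if it does not assume shared state is unchanged across a yielding call.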

Async: Why AsyncDownloadString?

Alright... I'm getting a bit confused here. The async monad allows you to use let!, which starts the computation of the given async method and suspends the thread until the result is available. That's all fine, I do understand that.
What I don't understand is why they made an extension for the WebClient class named AsyncDownloadString. Couldn't you just wrap the normal DownloadString inside an async block? I'm pretty sure I'm missing an important point here, since I've done some testing that shows DownloadString wrapped inside an async block still blocks the thread.
There is an important difference between the two:
The DownloadString method is synchronous - the thread that calls the method will be blocked until the whole string is downloaded (i.e. until the entire content is transferred over the internet).
On the other hand, AsyncDownloadString doesn't block the thread for a long time. It asks the operating system to start the download and then releases the thread. When the operating system receives some data, it picks a thread from the thread pool, the thread stores the data to some buffer and is again released. When all data is downloaded, the method will read all data from the buffer and resume the rest of the asynchronous workflow.
In the first case, the thread is blocked during the entire download. In the second case, threads are only busy for a very short period of time (when processing received responses, but not when waiting for the server).
Internally, the AsyncDownloadString method is just a wrapper for DownloadStringAsync, so you can also find more information in the MSDN documentation.
The important point is that async programming is about operations that are I/O bound rather than CPU bound. These I/O-bound operations are performed on I/O threads (using the overlapped I/O feature of the operating system). This means that even if you wrap, say, a factorial function inside an async block and run it from another async block using a let! binding, you won't get any benefit, because it will still be running on a CPU-bound thread. The main purpose of async programming is to avoid tying up a CPU-bound thread while something of an I/O nature is in progress, so that the thread can be used for other purposes until the I/O completes.
If you look at the various I/O classes in .NET, like File, Socket, etc., they all have blocking as well as non-blocking read and write operations. The blocking operations wait for the I/O to complete on the CPU thread, blocking that thread until the I/O is done, whereas the non-blocking operations use the overlapped I/O API calls to perform the operation.
Async has a method to make an async block out of these non-blocking APIs of File, Socket, etc. In your case, calling DownloadString blocks the CPU thread because it uses the blocking API of the underlying class, whereas AsyncDownloadString uses the non-blocking, overlapped-I/O-based API call.
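The difference between wrapping a blocking call in an async block and using a genuinely non-blocking API can be sketched in a few lines. This is Python asyncio rather than F#, so treat it as an analogy: `time.sleep` stands in for the blocking DownloadString, `asyncio.sleep` for the non-blocking variant, and all function names are hypothetical. The log records whether other work got a chance to run while the "download" was waiting.

```python
import asyncio
import time


async def wrapped_blocking_download(log):
    log.append("download start")
    time.sleep(0.05)  # blocking call merely wrapped in a coroutine
    log.append("download end")


async def non_blocking_download(log):
    log.append("download start")
    await asyncio.sleep(0.05)  # non-blocking wait: yields the thread
    log.append("download end")


async def other_work(log):
    log.append("other work")


async def run_with(download):
    log = []
    await asyncio.gather(download(log), other_work(log))
    return log


blocking_log = asyncio.run(run_with(wrapped_blocking_download))
async_log = asyncio.run(run_with(non_blocking_download))
print(blocking_log)  # other work runs only after the download has ended
print(async_log)     # other work runs while the download is still waiting
```

The async wrapper changes nothing about the call inside it; only an API that actually yields during the wait releases the thread.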
