Can I mix Ktor with Exposed? - asynchronous

I am writing a service using Ktor and the Exposed ORM, which apparently isn't async. I am coming from the Python world, and there, using a blocking ORM with an async IO library is a sin because it may block all users on the same thread.
Does the same rule apply in Kotlin? Am I creating a bad architecture?

Exposed uses thread-local storage to keep the transaction instance accessible to its implementation and to avoid passing it along with every function call. Since the transaction DSL function executes synchronously and does not release the thread for Ktor to reuse for other calls, there shouldn't be any issues with using them together.

There is coroutine support in Exposed.
Please read the documentation:
https://github.com/JetBrains/Exposed/wiki/Transactions#working-with-coroutines

Here's a blog post that shows how to use them together:
https://ryanharrison.co.uk/2018/04/14/kotlin-ktor-exposed-starter.html
I have also successfully done so myself in a test-project but I'm not yet at a point where I'm ready to share the code.
In short, you can use Kotlin coroutines so that the database transaction runs on a separate thread and does not block Ktor's request handling loop.
If you use the right coroutine dispatcher, this should not cause any problems with the thread-local transaction context.

Related

Transient Database contexts from separate dependencies fails for parallel queries

Background (TLDR: I need parallel queries)
I am building a REST service that needs to be able to answer queries very fast.
As such I'm pre-loading a large part of the database into memory and answering using that data instead of making complex database queries for each request. This works great, and the average response time of the API is well below the requirements and a lot faster than direct database queries.
But I have a problem. The service takes about 5 minutes to start and pre-load all of its information. During this time it cannot answer queries.
Problem
I want to change this so that during the pre-load phase it makes database queries until the in-memory cache is loaded.
This leads me to a problem. I need to have multiple active queries to my database. Anyone who has tried this in EF Core has probably seen this message:
System.InvalidOperationException: A second operation started on this context before a previous operation completed. This is usually caused by different threads using the same instance of DbContext. For more information on how to avoid threading issues with DbContext, see https://go.microsoft.com/fwlink/?linkid=2097913.
The first sentence on the linked page is:
Entity Framework Core does not support multiple parallel operations being run on the same DbContext instance.
I thought this would be easily solved by wrapping my cache-loading into its own class and the direct query into another, and then having both of these requiring their own instance of the Database Context. Then my service can in turn get these injected and use both of these dependencies in parallel.
This should be what I have:
I have also set up my database context so that it uses transient for all parts.
services.AddDbContext<IDataContext, DataContext>(options =>
    options.UseSqlServer(connectionString), ServiceLifetime.Transient, ServiceLifetime.Transient
);
I have also enabled MultipleActiveResultSets=True
All of this however results in the exact same error as listed above.
Again, everything is Transient except the HandlerService, which is a Singleton because I want it to keep a copy of the cache in memory and not have to load it for every request.
What is it I have failed to understand about the ef-core database context, or DI in general?
I figured out what the problem was. In my case there is, as described above, one singleton handler. This handler has one (indirect) context, obtained through DI, for fulfilling requests until the cache is loaded. When multiple parallel queries are sent to the API before the cache is loaded, this error occurs because each of these requests uses the same context. In my test I was always sending the parallel requests during startup, so the singleton service was trying to use the same DbContext for multiple requests.
My solution is to step outside the "normal" dependency injection in this one place and use IServiceScopeFactory to get a new instance of the dependency that resolves requests before the cache is loaded. Bohdan's answer led me to this conclusion and the ultimate solution.
I'm not sure whether it qualifies for a full answer but it's too broad for a comment.
When writing .NET Core background services, which are obviously singletons too, I use IServiceScopeFactory to create services with a limited lifetime.
Here's how I create a context
using (var scope = _scopeFactory.CreateScope())
{
    var context = scope.ServiceProvider.GetRequiredService<DbContext>();
}
My guess is that you could inject it in your handler and use it like this too. That would allow you to leave the context as scoped instead of transient, which is the default setting, by the way.
Hope that helps.

Async calls using HTTPClient vs Direct calling methods asynchronously using Tasks for a synchronous service

I have a scenario in my existing application where, on the click of a Save button, a JavaScript function is called. This JavaScript function internally makes 4-5 asynchronous calls to web services. For some reason we now have big JavaScript files with a lot of business logic, and we are facing performance issues in the application. To reduce the number of XHR calls we make to the server, we thought of consolidating these calls on the server side and making just a single call from our JavaScript.
On the server side we are using async/await to make these calls asynchronous. So we have created a wrapper service with one method, which now calls the different service methods using the SendAsync method exposed by HTTPClient.
Our underlying services are all synchronous and to achieve asynchronous functionality we used HTTPClient. We measured performance and it shows considerable gain.
But one of our colleagues pointed out that we will have the overhead of serialization and deserialization, and that we are now originating further web service calls from the server which will ultimately run synchronously. So why not call the methods directly instead of making new HTTP calls?
Now, our methods are all synchronous, and to make them asynchronous we would have to use Tasks, which is again overhead.
Both approaches carry overhead, but we see making new HTTP requests using async/await as more in line with the microservices concept.
There is a debate and I would like to know other thoughts.
My two-cents:
The approach of aggregating the information on the server side is good.
From my point of view the use of HTTPClient internally on the server side is a solution only if you want to connect to a legacy service and you do not have the ability to integrate it directly. HTTPClient is simple to use and robust, but it's technically a lot more overhead than using a Task (think of error handling, serialisation, testing, network/socket-resources).
A Task is also nice, since it allows proper cancelation, which HTTPClient cannot achieve (HTTPClient can only close the socket, other end could still block resources).
On top of the general resource aspect, the use of Futures makes the Task a perfect match:
https://msdn.microsoft.com/en-us/library/ff963556.aspx

How bad is it to run an entire HTTP action method in separate thread using Task::Run()?

I'm writing web services in C++/CLI (not my choice) using Microsoft's Web API. A lot of functions in Web API are async, but because I'm using C++/CLI, I don't get the async/await support of C# or VB. So the fallback position is to use ContinueWith() to schedule a continuation delegate for reading the async task's result safely.
However, because C++/CLI also doesn't support inline anonymous delegates or managed lambdas, every delegate continuation must be written as a separate function somewhere. That quickly turns into spaghetti with the number of async functions in Web API.
So, to avoid the deadlock issues of Task<T>::Result, I've been trying this:
[HttpGet, Route( "get/some/dto" )]
Task< SomeDTO ^ > ^ MyActionMethod()
{
    return Task::Run( gcnew Func< SomeDTO ^ >( this, &MyController::MyActionMethod2 ) );
}
SomeDTO ^ MyActionMethod2()
{
    // execute code and use any task->Result calls I need without deadlocking
}
Okay, so I know this isn't great, but how bad is it? I don't yet understand enough of the guts of Web API or ASP.NET to comprehend the performance or scaling ramifications this will have.
Also, what other consequences may this have that aren't necessarily related to performance? For example, exceptions get wrapped in an extra AggregateException, which represents additional complexity and work for handling exceptions.
Your memory usage will increase with your application's parallelism. For every concurrent call to MyActionMethod you will need a separate thread with its own stack. That will cost you about 1 MB of RAM for each concurrent call. If MyActionMethod runs long enough so that 10000 instances run at once, you're looking at 10 GB of RAM. There is also CPU overhead in setting up each thread.
If concurrency is low, dropping async support won't be a problem. In that case, don't bother with Task::Run. Just change MyActionMethod to return SomeDTO^ (no Task wrapper).
Another potential concern is that you lose easy use of cancellation tokens. However, for Web API it's usually fine to just let an exception propagate back to Web API, which ends up cancelling the synchronous call anyway.
Finally, if you were planning on performing any operation within your action method in parallel, you'll still need to use ContinueWith to accomplish that. Going non-async by default means you'll always perform one operation at a time. Fortunately, it's often just fine to do so.
Okay, so I know this isn't great, but how bad is it?
It's difficult to answer this without load-testing your specific scenario. But you can walk through the known semantics (taken largely from my blog).
First, when a request comes in, ASP.NET executes your handler on a thread pool thread within that request context. Your request handler calls Task.Run, which takes another thread from the thread pool and executes the actual request logic on it. The handler then returns the task returned from Task.Run; this releases the original request thread back to the thread pool.
Then, the Task.Run delegate will block on any asynchronous parts. So, this pattern has the scaling disadvantages of a regular synchronous handler, plus an extra thread context switch. Also, it uses a thread from the ASP.NET thread pool, which is not necessarily a bad thing, but in some scenarios it may throw off the ASP.NET thread pool heuristics.
Also, what other consequences may this have that aren't necessarily related to performance? For example, exceptions get wrapped in an extra AggregateException, which represents additional complexity and work for handling exceptions.
Yes, the exceptions from any .Result or Wait() calls will be wrapped in AggregateException. You may be able to avoid this by calling .GetAwaiter().GetResult() instead.
Another important consideration is that the code executing within the Task.Run is executing without a request context. So, ambient data like HttpContext.Current, current culture, thread principal, etc. are not going to be set correctly. You'll have to capture any important data before calling Task.Run and pass it down manually.

node.js asynchronous initialization issue

I am creating a node.js module which communicates with a program through XML-RPC. The API for this program changed recently after a certain version. For this reason, when a client is created (createClient) I want to ask the program its version (through XML-RPC) and base my API definitions on that.
The problem with this is that, because I do the above asynchronously, there exists a possibility that the work has not finished before the client is actually used. In other words:
var client = program.createClient();
client.doSomething();
doSomething() will fail because the API definitions have not been set, I imagine because the HTTP XML-RPC response has not yet returned from the program.
What are some ways to remedy this? I want to be able to have a variable named client and work with that, as later I will be calling methods on it to get information (which will be returned via a callback).
Set it up this way:
program.createClient(function (client) {
  client.doSomething();
});
Any time there is IO, it must be async. Another approach to this would be with a promise/future/coroutine type thing, but imo, just learning to love the callback is best :)
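For the promise approach mentioned above, here is a minimal sketch. It assumes the version lookup can be wrapped in a promise; fetchVersion, apiByMajor and the version strings are hypothetical stand-ins, not part of any real XML-RPC library. The client object is returned right away, and each method waits on the same initialization promise before it runs, so calling doSomething() immediately after createClient() is safe.

```javascript
// Hypothetical stand-in for the XML-RPC version query. A real module would
// send the request here; the timer only simulates asynchronous network I/O.
function fetchVersion() {
  return new Promise((resolve) => setTimeout(() => resolve('2.1'), 10));
}

// Hypothetical API definitions, keyed by the program's major version.
const apiByMajor = {
  1: { doSomething: () => 'v1 call' },
  2: { doSomething: () => 'v2 call' },
};

function createClient(getVersion = fetchVersion) {
  // Start the version lookup once; every method call waits on this promise.
  const ready = getVersion().then((version) => {
    const api = apiByMajor[parseInt(version, 10)];
    if (!api) throw new Error('Unsupported version: ' + version);
    return api;
  });

  return {
    // The actual call is deferred until the API definitions are known.
    doSomething: () => ready.then((api) => api.doSomething()),
  };
}

// Usage: the client variable exists immediately, results arrive asynchronously.
const client = createClient();
client.doSomething().then((result) => console.log(result));
```

The trade-off is that every method now returns a promise (or takes a callback), but that matches the question, where results are already delivered via callbacks.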

Which is better in this case - sync or async web service?

I'm setting up a web service in Axis2 whose job will be to take a bunch of XML and put it on a queue to be processed later. I understand it's possible to set up a client to invoke a synchronous web service asynchronously by using an "invokeNonBlocking" operation on the "Call" instance. (ref http://onjava.com/pub/a/onjava/2005/07/27/axis2.html?page=4)
So, my question is, is there any advantage to using an asynchronous web service in this case? It seems redundant because 1) the client isn't blocked and 2) the service has to accept and write the xml to queue regardless if it's synchronous or asynchronous
In my opinion, asynchronous is the appropriate way to go. A couple of things to consider:
Do you have multiple clients accessing this service at any given moment?
How often is this process occurring?
It does take a little more effort to implement the async methods. But I guarantee, in the end you will be much happier with the result. For one, you don't have to manage threading. Your primary concern might just be the volatility of the data in the queue (i.e. race/deadlock conditions).
A "sync call" seems appropriate, I agree.
If the request from the client isn't time consuming, then I don't see the advantage either in making the call asynchronous. From what I understand of the situation in question here, the web-service will perform its "processing" against the request some time in the future.
If, on the contrary, the request had required a time-consuming process, then an async call would have been appropriate.
After ruminating some more about it, I'm thinking that the service should be asynchronous. The reason is that it would put the task of writing the data to the queue into a separate thread, thus lessening the chances of a timeout. It makes the process more complicated, but if I can avoid a timeout, then it's got to be done.
