I have a Spring Boot app with batches. I want to start the batches asynchronously, so I am using the SimpleAsyncTaskExecutor for it.
My problem is that the batches perform some persistence operations, so they need to use the EntityManager. At this point I get an exception saying that my EM is closed. I understand the reason for this exception: my EntityManager is not available to the new thread I have created.
My question is how to solve it: how do I correctly provide an EM for my batch running in another thread?
I have an S1 App Service plan on Azure with a SQL Database connected. I'm using EF Core.
Every now and then, not only after restarts of the app, database commands are extremely slow. The Profiler says only "waiting". But waiting for what?
Profiler picture
How can I find out what's blocking here?
The shared Profiler snip shows that your app is stuck in the DbContext.SaveChangesAsync method, which is taking a long time to complete and therefore triggering AWAIT_TIME, while your parallel threads keep executing. See the documentation on double counting in parallel threads to learn more.
The SaveChangesAsync method asynchronously saves all changes made in this context to the underlying database. To cancel the operation while it is waiting for the task to complete, pass the cancellationToken parameter.
Check the sample code below for your reference:
[System.Diagnostics.CodeAnalysis.SuppressMessage("Microsoft.Usage", "CA1801:ReviewUnusedParameters", MessageId="cancellationToken")]
public virtual System.Threading.Tasks.Task<int> SaveChangesAsync (System.Threading.CancellationToken cancellationToken);
Helpful link: https://learn.microsoft.com/en-us/dotnet/api/system.data.entity.dbcontext.savechangesasync?view=entity-framework-6.2.0#System_Data_Entity_DbContext_SaveChangesAsync_System_Threading_CancellationToken_
Background (TLDR: I need parallel queries)
I am building a REST service that needs to be able to answer queries very fast.
As such I'm pre-loading a large part of the database into memory and answering using that data instead of making complex database queries for each request. This works great, and the average response time of the API is well below the requirements and a lot faster than direct database queries.
But I have a problem. The service takes about 5 minutes to start and pre-load all of its information. During this time it cannot answer queries.
Problem
I want to change this so that during the pre-load phase it makes database queries until the in-memory cache is loaded.
This leads me to a problem. I need to have multiple active queries to my database. Anyone who has tried this in EF Core has probably seen this message:
System.InvalidOperationException: A second operation started on this context before a previous operation completed. This is usually caused by different threads using the same instance of DbContext. For more information on how to avoid threading issues with DbContext, see https://go.microsoft.com/fwlink/?linkid=2097913.
The first sentence on the linked page is
Entity Framework Core does not support multiple parallel operations being run on the same DbContext instance.
I thought this would be easily solved by wrapping my cache-loading into its own class and the direct query into another, and then having both of these requiring their own instance of the Database Context. Then my service can in turn get these injected and use both of these dependencies in parallel.
This should be what I have:
I have also set up my database context so that it uses transient for all parts.
services.AddDbContext<IDataContext, DataContext>(options =>
    options.UseSqlServer(connectionString), ServiceLifetime.Transient, ServiceLifetime.Transient
);
I have also enabled MultipleActiveResultSets=True.
All of this, however, results in the exact same error as listed above.
Again, everything is transient except the HandlerService, which is a singleton because I want it to keep a copy of the cache in memory and not have to load it for every request.
What is it I have failed to understand about the ef-core database context, or DI in general?
I figured out what the problem was. In my case there is, as described above, one singleton handler. This handler has one (indirect) context (through DI) for fulfilling requests until the cache is loaded. When multiple parallel queries are sent to the API before the cache is loaded, this error occurs because each of these requests uses the same context. In my test I was always sending the parallel requests as part of the startup, and hence the singleton service was trying to use the same DB context for multiple requests. My solution is to step outside the "normal" dependency injection in this one place and use IServiceScopeFactory to get a new instance of the dependency used to resolve requests before the cache is loaded. Bohdan's answer led me to this conclusion and ultimate solution.
I'm not sure whether it qualifies for a full answer but it's too broad for a comment.
When writing .NET Core background services, which are obviously singletons too, I use IServiceScopeFactory to create services with a limited lifetime.
Here's how I create a context:
using (var scope = _scopeFactory.CreateScope())
{
    var context = scope.ServiceProvider.GetRequiredService<DbContext>();
}
My guess is that you could inject it into your handler and use it like this too. That would allow you to leave the context as scoped (which is the default setting, by the way) instead of transient.
Hope that helps.
I am using the latest version of spring-kafka with @KafkaListener and a batch listener. In the method that listens to the list of messages, I want to acknowledge only if the whole batch of records has been processed. But the framework does not send those messages again until I restart the application. So I call the stop() and start() methods on KafkaListenerEndpointRegistry when the records were not processed, but I feel it's not a good way of solving the problem. Is there a better way of handling this?
See the documentation for the SeekToCurrentBatchErrorHandler.
The SeekToCurrentBatchErrorHandler seeks each partition to the first record in each partition in the batch so the whole batch is replayed. This error handler does not support recovery because the framework cannot know which message in the batch is failing.
Kind of an open question that I run into once in a while -- if you have an EJB stateful or stateless bean, or possibly a direct servlet process, that may with the wrong parameters start running long on a production system, how could you effectively add in a manual 'kill switch' for an administrator/person to specifically kill that thread/process?
You can't, or at least you shouldn't, interfere with application server threads directly. So a "kill switch" looks definitely inappropriate to me in a Java EE environment.
I do understand the problem you have, however, and would rather suggest taking an asynchronous approach where you split your job into smaller work units.
I did that using EJB Timers and was happy with the result: an initial timer is created for the first work unit. When the app server executes the timer, it then registers a second one that corresponds to the second work unit, and so on. Information can be passed from one work unit to the next because EJB Timers support the storage of custom information. Also, timer execution and registration are transactional, which works well with a database. You can even shut down and restart the application server with this approach. Before each work unit ran, we checked in the database whether the job had been canceled in the meantime.
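To make the chaining idea concrete, here is a minimal sketch in plain Java. It is not the EJB Timer API: a ScheduledExecutorService stands in for the container's timer service, so nothing here is persistent or transactional, and the class name ChainedJob and the BooleanSupplier used in place of the database lookup are made up for illustration. It only shows the shape of the approach: each work unit checks the cancel flag first, does a small piece of work, and then registers the next unit.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

// Hypothetical stand-in for an EJB Timer chain (assumption: plain Java, no container).
class ChainedJob {
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();
    private final List<Integer> completedUnits = new ArrayList<>();
    private final CountDownLatch finished = new CountDownLatch(1);
    private final int totalUnits;
    private final BooleanSupplier cancelled; // stands in for the "was it canceled?" DB check

    ChainedJob(int totalUnits, BooleanSupplier cancelled) {
        this.totalUnits = totalUnits;
        this.cancelled = cancelled;
    }

    // Registers the "timer" for the first work unit.
    void start() {
        scheduleUnit(0);
    }

    private void scheduleUnit(int unit) {
        scheduler.schedule(() -> runUnit(unit), 10, TimeUnit.MILLISECONDS);
    }

    private void runUnit(int unit) {
        // Check for cancellation before each work unit, then stop or continue.
        if (unit >= totalUnits || cancelled.getAsBoolean()) {
            scheduler.shutdown();
            finished.countDown();
            return;
        }
        completedUnits.add(unit); // the real work for this unit would go here
        scheduleUnit(unit + 1);   // register the next unit, carrying its index along
    }

    // Waits up to five seconds for the chain to end and returns the units that ran.
    List<Integer> awaitCompletion() {
        try {
            finished.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return new ArrayList<>(completedUnits);
    }
}
```

Because no unit runs for long, an administrator only has to flip the flag (in the original approach, a row in the database) and the job stops at the next unit boundary without anyone touching a server thread.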
I'm designing a part of a Java EE 6 application, consisting of EJB3 beans. One of the requirements is multiple parallel (say a few hundred), long-running (over days) database hunts. Individual hunts have different search parameters (start time, end time, query filter). Parameters may change over time.
Currently I'm thinking of the following:
SearchController (Stateless Session Bean) formulates a set of search parameters, sends it off to a SearchListener via JMS
SearchListener (Message Driven Bean) receives search parameters, instantiates a SearchWorker with the parameters
SearchWorker (SLSB) hunts repeatedly through the database; when it finds something, the result is sent off via JMS, and the search continues; when the given 'end-time' is reached, it ends
What I'm wondering now:
Is there a problem with EJB3 instances running for days? (Other than that I need to be able to deal with container restarts...)
How do I know how many and which EJB instances of SearchWorker are currently running?
Is it possible to communicate with them individually (similar to sending a System V signal to a Unix process), e.g. to send new parameters, to end an instance, etc.?
If you're holding a huge ResultSet open for an extended period of time, you're likely to encounter either transaction timeouts or database locking issues.
There is no built-in mechanism for determining which bean instances are running in a method, so you would need to add your own mechanism. Your product might have some kind of performance monitoring that lets you know how many beans of each type are currently running a method.
As for cross-thread communication, you would need to implement your own synchronization and periodically check in the bean method. You'll be outside the scope of standard EJB since each parallel call to a business method will allocate a new SLSB from the pool.
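As a rough illustration of such a home-grown mechanism, the sketch below keeps a registry of running workers and gives each one a small control object that it polls between passes. This is plain Java rather than anything defined by the EJB specification, and every name in it (SearchRegistry, Control, runWorker, the String filter) is hypothetical. In a container it would also need to live somewhere shared by all bean instances, which is outside what this sketch covers.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Consumer;

// Hypothetical registry of running searches (assumption: not an EJB API).
class SearchRegistry {

    // Per-worker "mailbox": a stop flag plus the most recent search parameters.
    static final class Control {
        private volatile boolean stopRequested;
        private final AtomicReference<String> filter;

        Control(String initialFilter) {
            this.filter = new AtomicReference<>(initialFilter);
        }

        void requestStop() { stopRequested = true; }
        boolean isStopRequested() { return stopRequested; }
        void updateFilter(String newFilter) { filter.set(newFilter); }
        String currentFilter() { return filter.get(); }
    }

    private final Map<String, Control> running = new ConcurrentHashMap<>();

    // Answers "how many and which workers are running?"
    Set<String> runningIds() {
        return new HashSet<>(running.keySet());
    }

    // Lets another thread reach one specific worker; returns null if it is not running.
    Control lookup(String id) {
        return running.get(id);
    }

    // Worker loop: one bounded pass per iteration, polling the control object in between.
    // Returns the number of passes that were executed.
    int runWorker(String id, String initialFilter, int maxPasses, Consumer<String> onePass) {
        Control control = new Control(initialFilter);
        running.put(id, control);
        int passes = 0;
        try {
            while (passes < maxPasses && !control.isStopRequested()) {
                // Re-read the parameters on every pass so updates take effect.
                onePass.accept(control.currentFilter());
                passes++;
            }
        } finally {
            running.remove(id); // always deregister, even if a pass throws
        }
        return passes;
    }
}
```

A controller could then call lookup("some-id").updateFilter(...) or requestStop() from another thread. Keeping each pass short also sidesteps the long-lived ResultSet and transaction timeout concerns mentioned above.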