Hangfire causes "Timeout expired" exception when executing SQL code to optimize database - ASP.NET Core

I have the following code set up as a scheduled task:
public class OptimizeDatabase : IJob {

    #region Constructor

    public OptimizeDatabase(DataContext dataContext) {
        DbContext = dataContext;
    }

    #endregion

    #region Fields

    private readonly DataContext DbContext;

    #endregion

    #region Methods

    public async Task Execute() {
        Stopwatch stopWatch = new Stopwatch();
        stopWatch.Start();
        string result = "Ok";

        try {
            // Rebuild Indexes
            DbContext.Database.ExecuteSqlCommand("EXEC sp_MSforeachtable \"ALTER INDEX ALL ON ? REBUILD WITH (ONLINE=OFF)\"");

            // Update Statistics
            DbContext.Database.ExecuteSqlCommand("EXEC sp_updatestats;");
        }
        catch (Exception ex) {
            result = ex.Message + Environment.NewLine + ex.StackTrace;
        }

        stopWatch.Stop();

        DbContext.TaskLogs.Add(new TaskLog {
            Date = DateTime.Now,
            ElapsedSeconds = stopWatch.Elapsed.TotalSeconds,
            Result = result,
            Task = "Optimize Database"
        });

        await DbContext.SaveChangesAsync();
    }

    #endregion
}
And it's configured to run in Startup.cs
RecurringJob.AddOrUpdate<OptimizeDatabase>(x => x.Execute(), Cron.Daily(10));
All other scheduled tasks execute without issue; however, this one always throws the following error:
Timeout expired. The timeout period elapsed prior to completion of the operation or the server is not responding.
at System.Data.SqlClient.SqlConnection.OnError(SqlException exception, Boolean breakConnection, Action`1 wrapCloseInAction)
Any ideas or insights are appreciated.

The answer to your question is pretty simple. You are rebuilding the indexes for all tables in your database and executing that statement from a Hangfire job. The job therefore tries to rebuild the indexes on Hangfire's own tables while Hangfire is using them, and that creates a deadlock.
You have to rebuild the indexes explicitly for your own tables, one after another, like:
ALTER INDEX ALL ON [dbo].[A] REBUILD;
ALTER INDEX ALL ON [dbo].[B] REBUILD;
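If you want to keep driving this from the Hangfire job, here is a minimal sketch of the same idea, assuming the question's EF ExecuteSqlCommand call and placeholder table names (swap in your own application tables and leave the Hangfire schema out):
// Rebuild indexes table by table instead of via sp_MSforeachtable,
// skipping Hangfire's own tables. The table names below are placeholders.
var tablesToRebuild = new[] { "[dbo].[A]", "[dbo].[B]" };

foreach (var table in tablesToRebuild)
{
    // The SQL is built from the fixed whitelist above, not from user input.
    DbContext.Database.ExecuteSqlCommand(
        "ALTER INDEX ALL ON " + table + " REBUILD WITH (ONLINE = OFF)");
}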

The timeout issue is because one of the queries is taking longer than it should.
In .NET there are two timeouts as far as I know: the connection timeout (ConnectionTimeout property) and the command timeout (CommandTimeout property). The connection timeout defaults to 15 seconds and the command timeout to 30 seconds.
I recommend you to:
Run your queries in SQL Server Management Studio to get an idea of the time needed to run both queries in sequence, as your code does.
Set the connection timeout and command timeout based on the time taken by that run in SQL Server Management Studio. If the timeout keeps appearing, add 30 seconds to each timeout until you find the minimum time needed to execute your queries. Once you find it, add an extra 30 seconds to each timeout just to be safe.
Taking part of your code, the change would be something like:
try {
    // Change CommandTimeout
    DbContext.Database.CommandTimeout = 120;

    // Rebuild Indexes
    DbContext.Database.ExecuteSqlCommand("EXEC sp_MSforeachtable \"ALTER INDEX ALL ON ? REBUILD WITH (ONLINE=OFF)\"");

    // Update Statistics
    DbContext.Database.ExecuteSqlCommand("EXEC sp_updatestats;");
}
catch (Exception ex) {
    result = ex.Message + Environment.NewLine + ex.StackTrace;
}
You can take a look at this answer; it explains common causes of timeouts well: https://stackoverflow.com/a/8603111/2654879 .

We faced the same issue, where we had to do many table operations in a single transaction scope. The Hangfire jobs failed with the same error when multiple jobs ran in parallel. We solved it by increasing the command timeout value (by default it's 30 seconds).
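For reference, a minimal sketch of raising the command timeout before executing the long-running statements; the EF6 property is shown, with the EF Core equivalent in a comment (which one applies depends on the EF version in use):
// EF6: raise the command timeout (in seconds) for this context instance.
DbContext.Database.CommandTimeout = 300;

// EF Core equivalent:
// DbContext.Database.SetCommandTimeout(300);

DbContext.Database.ExecuteSqlCommand("EXEC sp_updatestats;");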

Related

Not able to read uncommitted offsets after restarting the listener in Spring Boot

I am trying to commit the offsets in Kafka on basis of certain conditions.
Here is my listener code.
@KafkaListener(topics = "test")
public void getTopics(@RequestBody String emp, Acknowledgment acknowledgment) {
    Employee model = gson.fromJson(emp, Employee.class);
    if (model.getId() % 2 != 0) {
        System.out.println("Kafka event consumed is: " + emp);
        acknowledgment.acknowledge();
    }
    else {
        System.out.println("Model converted value: " + model);
    }
}
Here are my application.properties configurations:
spring.kafka.listener.ack-mode=manual-immediate
spring.kafka.consumer.auto-offset-reset=earliest
At first, when I start the Spring Boot app, I pass some even employee IDs to the topic.
Then I restart my program after changing the condition to if(model.getId()%2==0), which is the opposite condition. I am able to fetch the values which I had not committed the first time.
I retried the same process after adding spring.kafka.consumer.enable-auto-commit=false in application.properties. But this time the consumer doesn't re-read the earlier records. I had thought that I should have been able to get what I was getting earlier, and that in the first case it should not have worked, since auto-commit should be false for manual committing.
Thanks for any advice.

When using @StreamListener, are customizations to the KafkaListenerContainerFactory reflected in the generated KafkaMessageListenerContainer?

I am using spring-cloud-stream with the Kafka binder to consume messages from Kafka. The application basically consumes messages from Kafka and updates a database.
There are scenarios when the DB is down (which might last for hours) or some other temporary technical issues occur. Since in these scenarios there is no point in retrying a message for a limited amount of time and then moving it to the DLQ, I am trying to achieve an infinite number of retries when we get certain types of exceptions (e.g. DBHostNotAvaialableException).
In order to achieve this I tried 2 approaches (and am facing issues in both) -
In the first approach, I tried setting an error handler on the container properties while configuring the ConcurrentKafkaListenerContainerFactory bean, but the error handler never gets triggered. While debugging the flow I realized that the KafkaMessageListenerContainers that are created have a null errorHandler field, hence they use the default LoggingErrorHandler. Below are my container factory bean configurations -
The @StreamListener method for this approach is the same as in the 2nd approach, except for the seek on the consumer.
@Bean
public ConcurrentKafkaListenerContainerFactory<String, Object> kafkaListenerContainerFactory(ConsumerFactory<String, Object> kafkaConsumerFactory) {
    ConcurrentKafkaListenerContainerFactory<String, Object> factory = new ConcurrentKafkaListenerContainerFactory<>();
    factory.setConsumerFactory(kafkaConsumerFactory);
    factory.getContainerProperties().setAckOnError(false);
    ContainerProperties containerProperties = factory.getContainerProperties();
    // even tried a custom implementation of RemainingRecordsErrorHandler but call never went in to the implementation
    factory.getContainerProperties().setErrorHandler(new SeekToCurrentErrorHandler());
    return factory;
}
Am I missing something while configuring the factory bean, or is this bean only relevant for @KafkaListener and not @StreamListener?
The second alternative was trying to achieve it using manual acknowledgement and seek. Inside a @StreamListener method I get the Acknowledgment and Consumer from the headers; when a retryable exception is received, I do a certain number of retries using a RetryTemplate, and when those are exhausted I trigger a consumer.seek(). Example code below -
@StreamListener(MySink.INPUT)
public void processInput(Message<String> msg) {
    MessageHeaders msgHeaders = msg.getHeaders();
    Acknowledgment ack = msgHeaders.get(KafkaHeaders.ACKNOWLEDGMENT, Acknowledgment.class);
    Consumer<?, ?> consumer = msgHeaders.get(KafkaHeaders.CONSUMER, Consumer.class);
    Integer partition = msgHeaders.get(KafkaHeaders.RECEIVED_PARTITION_ID, Integer.class);
    String topicName = msgHeaders.get(KafkaHeaders.RECEIVED_TOPIC, String.class);
    Long offset = msgHeaders.get(KafkaHeaders.OFFSET, Long.class);

    try {
        retryTemplate.execute(
            context -> {
                // this is a sample service call to update database which might throw retryable exceptions like DBHostNotAvaialableException
                consumeMessage(msg.getPayload());
                return null;
            }
        );
    }
    catch (DBHostNotAvaialableException ex) {
        // once retries as per retrytemplate are exhausted do a seek
        consumer.seek(new TopicPartition(topicName, partition), offset);
    }
    catch (Exception ex) {
        // if some other exception just log and put in dlq based on enableDlq property
        logger.warn("some other business exception hence putting in dlq ");
        throw ex;
    }

    if (ack != null) {
        ack.acknowledge();
    }
}
Problem with this approach - since I am doing a consumer.seek(), there might be pending records from the last poll; those might be processed and committed if the DB comes up during that period (hence out of order). Is there a way to clear those records when a seek is performed?
PS - we are currently on version 2.0.3.RELEASE of Spring Boot and Finchley.RELEASE of the Spring Cloud dependencies (hence we cannot use features like negative acknowledgement either, and an upgrade is not possible at this moment).
Spring Cloud Stream does not use a container factory. I already explained that to you in this answer.
Version 2.1 introduced the ListenerContainerCustomizer and if you add a bean of that type it will be called after the container is created.
Spring Boot 2.0 went end-of-life over a year ago and is no longer supported.
The answer I referred you to shows how you can use reflection to add an error handler.
Doing the seek in the listener will only work if you have max.poll.records=1.

Entity Framework Database Initialization: Timeout when initializing new Azure SqlDatabase

I have an ASP.NET MVC application. When a new customer is created via CustomerController I run a new background task (using HostingEnvironment.QueueBackgroundWorkItem) to create a new Azure SqlDatabase for that customer.
I use Entity Framework Code First to create/initialize the new database. Here's the code:
// My ConnectionString
var con = "...";

// Initialization strategy: create db and execute all Migrations
// MyConfiguration is just a DbMigrationsConfiguration with AutomaticMigrationsEnabled = true
Database.SetInitializer(strategy: new MigrateDatabaseToLatestVersion<CustomerDataContext, MyConfiguration>(useSuppliedContext: true));

using (var context = new CustomerDataContext(con))
{
    // Neither 'Connection Timeout=300' in the ConnectionString nor this line helps -> TimeoutException will rise after 30-40s
    context.Database.CommandTimeout = 300;

    // create the db - this line throws the exception after ~40s
    context.Database.Initialize(true);
}
My problem is that I always get a TimeoutException after about 40 seconds. I think that happens because Azure cannot initialize the new database within this short period of time. Don't get me wrong: the database is created fine by Azure, but I want to wait for that point / get rid of the TimeoutException.
Edit1:
I'm using Connection Timeout=300 in my ConnectionString but my app doesn't really care about that; after about 40s I'm always running into an SqlError.
Edit2:
The exception that is raised is an SqlException. Message: Timeout expired. The timeout period elapsed prior to completion of the operation or the server is not responding. Source: .Net SqlClient Data Provider
Edit3:
I can confirm now that this has nothing to do with ASP.NET/IIS. Even in a simple unit test method the code above fails.
It seems that there is another CommandTimeout setting that is involved in the database initialization process when using Code First Migrations. I want to share my solution here just in case anybody else encounters this problem.
Thanks to Rowan Miller for his hint pointing me to the solution.
Here's my code:
// Initialization strategy
Database.SetInitializer(strategy: new CreateDatabaseIfNotExists<MyDataContext>());

// Use DbContext
using (var context = new MyDataContext(myConnectionString))
{
    // Setting the CommandTimeout here does not prevent the database
    // initialization process from raising a TimeoutException when using
    // Code First Migrations so I think it's not needed here.
    //context.Database.CommandTimeout = 300;

    // this will create the database if it does not exist
    context.Database.Initialize(force: false);
}
And my Configuration.cs class:
public sealed class Configuration : DbMigrationsConfiguration<MyDataContext>
{
    public Configuration()
    {
        AutomaticMigrationsEnabled = false;
        AutomaticMigrationDataLossAllowed = false;

        // Very important! Gives me enough time to wait for Azure
        // to initialize (Create -> Migrate -> Seed) the database.
        // Usually Azure needs 1-2 minutes so the default value of
        // 30 seconds is not big enough!
        CommandTimeout = 300;
    }
}
The command timeout and the connection timeout are two different settings. In this case you only increase the command timeout. You can increase the connection timeout in the web.config: Connection Timeout=120. The only time you want to increase the connection timeout is when you are creating the database.
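If you prefer to keep it in code rather than in web.config, here is a small sketch using SqlConnectionStringBuilder (from System.Data.SqlClient); the baseConnectionString variable is a placeholder for your existing connection string:
// The connection timeout lives in the connection string itself, not on the DbContext.
var builder = new SqlConnectionStringBuilder(baseConnectionString)
{
    ConnectTimeout = 120 // seconds; same effect as "Connection Timeout=120"
};

using (var context = new CustomerDataContext(builder.ConnectionString))
{
    // Database creation/migration now has up to 120 seconds to open the connection.
    context.Database.Initialize(force: true);
}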

NHibernate transactions and unit testing

I've got a piece of code that looks like this:
public void Foo(int userId)
{
    try {
        using (var tran = NHibernateSession.Current.BeginTransaction())
        {
            var user = _userRepository.Get(userId);
            user.Address = "some new fake user address";
            _userRepository.Save(user);
            Validate();
            tran.Commit();
        }
    }
    catch (Exception) {
        logger.Error("log error and don't throw");
    }
}

private void Validate()
{
    throw new Exception();
}
And I'd like to unit test whether the validations are applied correctly. I use NUnit and an SQLite database for testing. Here is the test code:
protected override void When()
{
    base.When();
    ownerOfFooMethod.Foo(1);
    Session.Flush();
    Session.Clear();
}

[Test]
public void FooTest()
{
    var fakeUser = userRepository.GetUserById(1);
    fakeUser.Address.ShouldNotEqual("some new fake user address");
}
My test fails.
While debugging I can see that the exception is thrown and Commit has not been called. But my user still has "some new fake user address" in the Address property, although I was expecting it to be rolled back.
Looking at NHibernate Profiler I can see the begin transaction statement, but it is followed by neither a commit nor a rollback.
What is more, even if I put a try-catch block there and do a Rollback explicitly in the catch, my test still fails.
I assume there is some problem in the testing environment, but everything seems fine to me.
Any ideas?
EDIT: I've added the important try-catch block (initially I had simplified the code too much).
If the exception occurs before NHibernate has flushed the change to the database, and you then keep using that session without evicting/clearing the object, the change will still be persisted if a flush occurs later for some reason, since the object is still dirty according to NHibernate. When rolling back a transaction you should immediately close the session to avoid this kind of problem.
Another way to put it: A rollback will not rollback in-memory changes you've made to persistent entities.
Also, if the session is a regular session, that call to Save() isn't needed, since the instance is already tracked by NH.
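Applied to the method from the question, here is a sketch of what that could look like, keeping the question's NHibernateSession.Current and repository names, dropping the redundant Save() call per the last point, and clearing the session in the catch so the dirty entity cannot be flushed later:
public void Foo(int userId)
{
    try
    {
        using (var tran = NHibernateSession.Current.BeginTransaction())
        {
            var user = _userRepository.Get(userId);
            user.Address = "some new fake user address";

            Validate();
            tran.Commit();
        }
    }
    catch (Exception)
    {
        logger.Error("log error and don't throw");

        // The rollback only undoes database work; the loaded entity is still
        // dirty in the session, so clear (or close) the session before any
        // later flush can push the in-memory change anyway.
        NHibernateSession.Current.Clear();
    }
}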

What steps can I take to make a workflow activity resilient to exceptions?

I am very new to Workflow Foundation development, and am worried that I am opening serious holes in our business process handling by not properly handling application / database exceptions in custom activities.
I would appreciate some steps that I could take to add this resiliency to my custom activities so that I can easily use the designer and other tools to ensure that, as far as I can, I do not create custom activities that are brittle and likely to cause workflow cleanup issues.
Here are some options, at different execution stages, that are available for you to use to handle exceptions.
First option (at activity/workflow execution time):
First of all, in custom activities you should always try to handle exceptions inside the activity's execution. Some activities might fail while the overall workflow can continue; in such cases, logging the error to persistence, and even showing the user that something didn't work as expected but that the process will continue, are good options.
That being said, there will always be cases where an activity has to (and even should) throw exceptions, and those should be handled at the workflow level. Something like: if this exception occurs in this activity, do this; otherwise, do that.
Let's imagine you have a custom activity which persists something to the DB:
public sealed class PersistIntegerToDb : CodeActivity
{
    public InArgument<int> ValueToPersist { get; set; }

    protected override void Execute(CodeActivityContext context)
    {
        try
        {
            // persist
        }
        catch (SqlException)
        {
            // re-throws the SqlException so it can be handled at workflow level
            throw;
        }
    }
}
Then, in code or through the designer, you have the TryCatch activity available to catch that error and handle it the way you want:
var workflow = new TryCatch
{
    Try = new PersistIntegerToDb
    {
        ValueToPersist = 10
    },
    Catches =
    {
        new Catch<SqlException>
        {
            Action = new ActivityAction<SqlException>
            {
                Handler = new WriteLine
                {
                    Text = "An error occurred and the value wasn't saved! Anyway workflow will continue..."
                }
            }
        }
    }
};
Or you can terminate it using TerminateWorkflow.
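A short sketch of that variant, assuming the same PersistIntegerToDb activity; TerminateWorkflow completes the workflow in the Terminated state with the given reason:
var workflow = new TryCatch
{
    Try = new PersistIntegerToDb { ValueToPersist = 10 },
    Catches =
    {
        new Catch<SqlException>
        {
            Action = new ActivityAction<SqlException>
            {
                // Ends the whole workflow instead of letting it continue.
                Handler = new TerminateWorkflow
                {
                    Reason = "'ValueToPersist' couldn't be saved."
                }
            }
        }
    }
};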
Second option (at design time):
Ok, but you could argue that the client doesn't know that he has to handle those cases. In that case, and this is a usability option you might consider, instead of making PersistIntegerToDb available on the designer, you can provide an activity already surrounded by exception catches, through IActivityTemplateFactory:
public sealed class PersistIntegerToDbFactory : IActivityTemplateFactory
{
    public Activity Create(DependencyObject target)
    {
        return new TryCatch
        {
            Try = new PersistIntegerToDb
            {
                ValueToPersist = 10
            },
            Catches =
            {
                new Catch<SqlException>
                {
                }
            }
        };
    }
}
Now you just add PersistIntegerToDbFactory as if it were a regular activity:
new ToolboxItemWrapper(typeof(PersistIntegerToDbFactory), null, "Persist Integer");
Third option (at validation time):
Never forget to validate workflow before execution!
var validationResults = ActivityValidationServices.Validate(workflow);

foreach (var error in validationResults.Errors)
{
    Console.WriteLine(string.Format(
        "Validation error '{0}', generated on activity '{1}' in the property named {2}",
        error.Message,
        error.Source.DisplayName,
        error.PropertyName));
}
Fourth option (at application execution time):
You can handle any unhandled exceptions that might happen during execution using the OnUnhandledException callback:
var wfApp = new WorkflowApplication(activity);

wfApp.OnUnhandledException =
    delegate(WorkflowApplicationUnhandledExceptionEventArgs e)
    {
        if (e.UnhandledException is SqlException)
        {
            Console.WriteLine("Some data wasn't properly persisted.");
        }
        else
        {
            Console.WriteLine("Unknown error: " + e.UnhandledException.GetType());
            Console.WriteLine("With message: " + e.UnhandledException.Message);
        }

        Console.WriteLine("Ok, the workflow will be aborted");
        return UnhandledExceptionAction.Abort;
    };
Note that, at this stage, you can only Abort, Cancel, or Terminate the workflow, and that's the reason why you should 1) avoid throwing exceptions or 2) handle exceptions inside your workflow. OnUnhandledException is your last chance to end the workflow execution gracefully and should always be handled, even if only for logging purposes. Something like a DivideByZeroException can occur and is almost impossible to predict and catch at validation time, for example.
As far as custom activities go, you should treat them as any other piece of code. Handle the errors you can and let the rest bubble up.
At the workflow level you can use the TryCatch activity and workflow persistence to deal with errors. Persistence especially is something people often overlook. Add Persist activities at appropriate steps in your workflow and set the workflow to abort on unhandled errors. Now you can go back in, reload the last good workflow state, and retry the actions that caused the unhandled exception. It is a great way of recovering from failures with resources like databases that might be unavailable for some reason and then come back.
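A rough sketch of that abort-and-reload pattern, assuming SqlWorkflowInstanceStore from System.Activities.DurableInstancing; the connection string, the activity variable, and the stored instance id are placeholders:
var store = new SqlWorkflowInstanceStore("Server=.;Database=WfPersistence;Integrated Security=True");

var wfApp = new WorkflowApplication(activity)
{
    InstanceStore = store,
    // Persist whenever the workflow goes idle, e.g. right after a Persist activity.
    PersistableIdle = e => PersistableIdleAction.Persist,
    // Abort on unhandled errors so the last persisted state stays intact.
    OnUnhandledException = e => UnhandledExceptionAction.Abort
};
var instanceId = wfApp.Id; // keep this somewhere so the instance can be reloaded
wfApp.Run();

// Later, once the database (or other resource) is back, reload the last good
// state and let the workflow retry from there.
var resumedApp = new WorkflowApplication(activity) { InstanceStore = store };
resumedApp.Load(instanceId);
resumedApp.Run();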
