TimeoutException problems.
Context: Visual Studio 2010
The problem: I occasionally (about once a day, in a program that runs once an hour) get an exception
that I do not understand.
This is the exception:
Exception message : The transaction has aborted.
Exception Source : System.Transactions
Exception Trace : at System.Transactions.TransactionStateAborted.BeginCommit(InternalTransaction tx, Boolean asyncCommit, AsyncCallback asyncCallback, Object asyncState)
at System.Transactions.CommittableTransaction.Commit()
at System.Transactions.TransactionScope.InternalDispose()
at System.Transactions.TransactionScope.Dispose()
at JDTranslation_K4_ReadEmails.Form1.ReadEmailDetails(EmailMessage emMessage, String strUserName, String strAccountName)
Exception Target : Void BeginCommit(System.Transactions.InternalTransaction, Boolean, System.AsyncCallback, System.Object)
The exception typically happens after a transaction that has lasted 25-30 minutes.
I have set my transaction timeout to 5 hours (see below), so I do not understand at all why
it can time out.
And I am somewhat concerned: I know that some of the data are committed to the
database - can I trust that they all are? (It is nearly impossible to verify by
investigating the data.)
The program (part of the function) looks approximately like this:
//INSTANTIATE THE TRANSACTION SCOPE
TransactionOptions option = new TransactionOptions();

//SET THE ISOLATION LEVEL AND THE TIMEOUT DURATION
option.IsolationLevel = System.Transactions.IsolationLevel.ReadCommitted;
option.Timeout = new TimeSpan(5, 0, 0);

using (TransactionScope scope = new TransactionScope(TransactionScopeOption.Required, option))
{
    // do some select/update/insert/delete stuff on both connections,
    // up to perhaps 3600 actions
    // some of it happens in static functions declared elsewhere in the same class scope

    scope.Complete(); // marks the scope as complete; the stack trace shows Commit(), so the real code must call this
} // this closing brace (TransactionScope.Dispose) is the line mentioned in the Exception
I would be happy to get answers to any of these questions:
Why does the exception happen?
Can I trust that all data are committed, or is there a real bug in C#/.NET that allows partial commits?
Is my code correct, or should I organize it differently to work around the problem?
Please Help!
You can increase the timeout in web.config. Also try increasing the pool size; long-running queries can sometimes exhaust the maximum pool limit.
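One additional thing worth checking (an assumption on my part, not something stated in the question or the answer above): System.Transactions caps every TransactionScope timeout at TransactionManager.MaximumTimeout, which defaults to 10 minutes, so a 5-hour TransactionOptions.Timeout can be silently reduced. A minimal C# check of the machine-wide limits:

using System;
using System.Transactions;

class TransactionTimeoutCheck
{
    static void Main()
    {
        // Both values come from the <system.transactions> section of machine.config.
        // Any TransactionScope timeout above MaximumTimeout is silently capped,
        // so a 5-hour TransactionOptions.Timeout may never actually apply.
        Console.WriteLine("DefaultTimeout: " + TransactionManager.DefaultTimeout);
        Console.WriteLine("MaximumTimeout: " + TransactionManager.MaximumTimeout);
    }
}

If MaximumTimeout turns out to be the limit, note that it can only be raised through the machineSettings element (maxTimeout="05:30:00") in the <system.transactions> section of machine.config; that particular setting is not honored in web.config.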
I have a consumer polling 2 records at a time, i.e.:
@Bean
ConsumerFactory<String, String> consumerFactory() {
    Map<String, Object> config = Map.of(
            BOOTSTRAP_SERVERS_CONFIG, "localhost:9092",
            GROUP_ID_CONFIG, "my-consumers",
            AUTO_OFFSET_RESET_CONFIG, "earliest",
            MAX_POLL_RECORDS_CONFIG, 2);
    return new DefaultKafkaConsumerFactory<>(config, new StringDeserializer(), new StringDeserializer());
}
and an ErrorHandler which can itself fail while handling a faulty record:
class MyListenerErrorHandler implements ContainerAwareErrorHandler {

    @Override
    public void handle(Exception thrownException,
                       List<ConsumerRecord<?, ?>> records,
                       Consumer<?, ?> consumer,
                       MessageListenerContainer container) {
        simulateBugInErrorHandling(records.get(0));
        skipFailedRecord(); // seek offset+1, which never happens
    }

    private void simulateBugInErrorHandling(ConsumerRecord<?, ?> record) {
        throw new NullPointerException(
                "DB transaction failed when saving info about failure on offset = " + record.offset());
    }
}
Then the following scenario is possible:
Topic gets 3 records
Consumer polls 2 records at a time
MessageListener fails to process the first record due to faulty payload
ErrorHandler fails to process the failure and itself throws an exception, e.g. due to some temporary issue
Third record gets processed
Second record is never processed (never enters MessageListener)
How can I ensure that no record is left unprocessed when the ErrorHandler throws an exception in the above scenario?
My goal is to achieve stateful retry logic with delays, but for brevity I omitted the code responsible for tracking failed records and delaying the retry.
I'd expect that an ErrorHandler throwing an exception would not cause an entire batch of records to be skipped. But it does.
Is this correct behavior?
Should I rather deal with commits manually than use the Spring/Kafka defaults?
Should I use a different ErrorHandler or handle method? (I need access to the Container to call pause() for the delayed retry logic; I cannot use Thread.sleep().)
Somewhat related issue: https://github.com/spring-projects/spring-kafka/issues/1265
Full code: https://github.com/ptomaszek/spring-kafka-error-handler
The consumer has to be re-positioned (using seeks) in order to re-fetch the records after the failed one.
Use a DefaultErrorHandler (2.8.x and later) or a SeekToCurrentErrorHandler with earlier versions.
You can add retry options and a recoverer to deal with the failed record; by default it is just logged.
https://docs.spring.io/spring-kafka/docs/current/reference/html/#default-eh
https://docs.spring.io/spring-kafka/docs/2.7.x/reference/html/#seek-to-current
You need to do the seeks first (or in a finally block), before any exceptions can be thrown; the container does not commit the offset if the error handler throws an exception.
Kafka maintains two offsets: the current committed offset and the current position (set to the committed offset when the consumer starts). The next poll always returns the next record after the last poll, unless a seek is performed.
The default error handlers catch any exceptions thrown by the recoverer and make sure that the current (and subsequent) records will be returned by the next poll. See SeekUtils.doSeeks().
I am using spring-cloud-stream with the Kafka binder to consume messages from Kafka. The application basically consumes messages from Kafka and updates a database.
There are scenarios when the DB is down (which might last for hours) or some other temporary technical issue occurs. Since in these scenarios there is no point in retrying a message a limited number of times and then moving it to the DLQ, I am trying to achieve an infinite number of retries when we get certain types of exceptions (e.g. DBHostNotAvaialableException).
In order to achieve this I tried two approaches (and am facing issues in both):
In the first approach, I tried setting an error handler on the container properties while configuring the ConcurrentKafkaListenerContainerFactory bean, but the error handler never gets triggered. While debugging the flow I realized that the KafkaMessageListenerContainers that are created have a null errorHandler field, so they use the default LoggingErrorHandler. Below is my container factory bean configuration;
the @StreamListener method for this approach is the same as in the second approach, except for the seek on the consumer.
@Bean
public ConcurrentKafkaListenerContainerFactory<String, Object>
        kafkaListenerContainerFactory(ConsumerFactory<String, Object> kafkaConsumerFactory) {
    ConcurrentKafkaListenerContainerFactory<String, Object> factory = new ConcurrentKafkaListenerContainerFactory<>();
    factory.setConsumerFactory(kafkaConsumerFactory);
    factory.getContainerProperties().setAckOnError(false);
    ContainerProperties containerProperties = factory.getContainerProperties();
    // even tried a custom implementation of RemainingRecordsErrorHandler, but the call never went into the implementation
    factory.getContainerProperties().setErrorHandler(new SeekToCurrentErrorHandler());
    return factory;
}
Am I missing something while configuring the factory bean, or is this bean only relevant for @KafkaListener and not @StreamListener?
The second alternative was to achieve it using manual acknowledgment and seek. Inside a @StreamListener method I get the Acknowledgment and Consumer from the headers; when a retryable exception is received, I do a certain number of retries using a RetryTemplate, and when those are exhausted I trigger a consumer.seek(). Example code below:
@StreamListener(MySink.INPUT)
public void processInput(Message<String> msg) {
    MessageHeaders msgHeaders = msg.getHeaders();
    Acknowledgment ack = msgHeaders.get(KafkaHeaders.ACKNOWLEDGMENT, Acknowledgment.class);
    Consumer<?, ?> consumer = msgHeaders.get(KafkaHeaders.CONSUMER, Consumer.class);
    Integer partition = msgHeaders.get(KafkaHeaders.RECEIVED_PARTITION_ID, Integer.class);
    String topicName = msgHeaders.get(KafkaHeaders.RECEIVED_TOPIC, String.class);
    Long offset = msgHeaders.get(KafkaHeaders.OFFSET, Long.class);

    try {
        retryTemplate.execute(
                context -> {
                    // this is a sample service call to update the database, which might throw retryable exceptions like DBHostNotAvaialableException
                    consumeMessage(msg.getPayload());
                    return null;
                }
        );
    }
    catch (DBHostNotAvaialableException ex) {
        // once retries as per the retry template are exhausted, do a seek
        consumer.seek(new TopicPartition(topicName, partition), offset);
    }
    catch (Exception ex) {
        // if some other exception, just log and put in the DLQ based on the enableDlq property
        logger.warn("some other business exception hence putting in dlq ");
        throw ex;
    }

    if (ack != null) {
        ack.acknowledge();
    }
}
Problem with this approach: since I am doing a consumer.seek() while there might be pending records from the last poll, those might be processed and committed if the DB comes up during that period (hence out of order). Is there a way to clear those records when a seek is performed?
PS - we are currently on Spring Boot 2.0.3.RELEASE and Finchley.RELEASE of the Spring Cloud dependencies (hence we cannot use features like negative acknowledgment either, and an upgrade is not possible at this moment).
Spring Cloud Stream does not use a container factory. I already explained that to you in this answer.
Version 2.1 introduced the ListenerContainerCustomizer and if you add a bean of that type it will be called after the container is created.
Spring Boot 2.0 went end-of-life over a year ago and is no longer supported.
The answer I referred you to shows how you can use reflection to add an error handler.
Doing the seek in the listener will only work if you have max.poll.records=1.
I have the following code setup as a scheduled task:
public class OptimizeDatabase : IJob {

    #region Constructor
    public OptimizeDatabase(DataContext dataContext) {
        DbContext = dataContext;
    }
    #endregion

    #region Fields
    private readonly DataContext DbContext;
    #endregion

    #region Methods
    public async Task Execute() {
        Stopwatch stopWatch = new Stopwatch();
        stopWatch.Start();
        string result = "Ok";

        try {
            // Rebuild Indexes
            DbContext.Database.ExecuteSqlCommand("EXEC sp_MSforeachtable \"ALTER INDEX ALL ON ? REBUILD WITH (ONLINE=OFF)\"");

            // Update Statistics
            DbContext.Database.ExecuteSqlCommand("EXEC sp_updatestats;");
        }
        catch (Exception ex) {
            result = ex.Message + Environment.NewLine + ex.StackTrace;
        }

        stopWatch.Stop();

        DbContext.TaskLogs.Add(new TaskLog {
            Date = DateTime.Now,
            ElapsedSeconds = stopWatch.Elapsed.TotalSeconds,
            Result = result,
            Task = "Optimize Database"
        });

        await DbContext.SaveChangesAsync();
    }
    #endregion
}
And it's configured to run in Startup.cs
RecurringJob.AddOrUpdate<OptimizeDatabase>(x => x.Execute(), Cron.Daily(10));
All other scheduled tasks execute without issue, however, this one always throws the following error:
Timeout expired. The timeout period elapsed prior to completion of the
operation or the server is not responding. at
System.Data.SqlClient.SqlConnection.OnError(SqlException exception,
Boolean breakConnection, Action`1 wrapCloseInAction)
Any ideas or insights are appreciated.
The answer to your question is pretty simple: you are trying to rebuild the indexes for all tables in your database, and you execute this statement via a Hangfire job. The Hangfire job then tries to rebuild the indexes of Hangfire's own tables, which creates a deadlock.
You have to rebuild the indexes explicitly for your own tables, one after another, like:
ALTER INDEX ALL ON [dbo].[A] REBUILD;
ALTER INDEX ALL ON [dbo].[B] REBUILD;
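A sketch of how that could be automated inside the same Execute() method shown in the question (the query, the loop, and the 'HangFire' schema filter are my assumptions, not part of the original answer; it also needs using System.Linq):

// Enumerate user tables, excluding Hangfire's own schema, and rebuild each
// table's indexes individually so the job never touches the tables Hangfire
// needs while it runs. Adjust the schema name if your Hangfire tables differ.
List<string> tables = DbContext.Database.SqlQuery<string>(
    @"SELECT QUOTENAME(s.name) + '.' + QUOTENAME(t.name)
      FROM sys.tables t
      JOIN sys.schemas s ON t.schema_id = s.schema_id
      WHERE s.name <> 'HangFire'").ToList();

foreach (string table in tables) {
    DbContext.Database.ExecuteSqlCommand(
        "ALTER INDEX ALL ON " + table + " REBUILD WITH (ONLINE=OFF)");
}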
The timeout issue arises because one of the queries is taking longer than it should.
In .NET there are two timeouts, as far as I know: the connection timeout (ConnectionTimeout property) and the command timeout (CommandTimeout property). Both default to 30 seconds.
I recommend you to:
Run your queries in SQL Server Management Studio to get an idea of the time needed to run both queries in sequence, as your code does.
Set the connection timeout and the command timeout to the number of seconds the previous run in SQL Server Management Studio needed. If the timeout keeps appearing, add 30 seconds to each timeout until you find the minimal time needed to execute your queries. Once you find it, add an extra 30 seconds to each timeout just to be sure.
Taking part of your code, the change would look something like this:
try {
    // Change CommandTimeout
    DbContext.Database.CommandTimeout = 120;

    // Rebuild Indexes
    DbContext.Database.ExecuteSqlCommand("EXEC sp_MSforeachtable \"ALTER INDEX ALL ON ? REBUILD WITH (ONLINE=OFF)\"");

    // Update Statistics
    DbContext.Database.ExecuteSqlCommand("EXEC sp_updatestats;");
}
catch (Exception ex) {
    result = ex.Message + Environment.NewLine + ex.StackTrace;
}
You can take a look at this article; it explains common causes of timeouts well: https://stackoverflow.com/a/8603111/2654879
We faced the same issue, where we had to do many table operations in a single transaction scope. The Hangfire jobs failed with the same error when multiple jobs ran in parallel. We solved this issue by increasing the command timeout value (by default it is 30 seconds).
I'm seeing the dreaded "The timeout period elapsed prior to obtaining a connection from the pool" error.
I've searched the code for any unclosed db connections, but couldn't find any.
What I want to do is this: the next time we get this error, have the system dump a list of which procs or HTTP requests are holding all the handles, so I can figure out which code is causing the problem.
Even better would be to see how long those handles had been held, so I could spot used-but-unclosed connections.
Is there any way to do this?
If you are lucky enough that connection creation/opening is centralized, then the following class should make it easy to spot leaked connections. Enjoy :)
using System;
using System.Data;
using System.Data.SqlClient;
using System.Threading; // Timer here is System.Threading.Timer, not System.Timers.Timer

/// <summary>
/// This class can help identify db connection leaks (connections that are not closed after use).
/// Usage:
/// connection = new SqlConnection(..);
/// connection.Open()
/// #if DEBUG
/// new ConnectionLeakWatcher(connection);
/// #endif
/// That's it. Don't store a reference to the watcher. It will make itself available for garbage collection
/// once it has fulfilled its purpose. Watch the visual studio debug output for details on potentially leaked connections.
/// Note that a connection could possibly just be taking its time and may eventually be closed properly despite being flagged by this class.
/// So take the output with a pinch of salt.
/// </summary>
public class ConnectionLeakWatcher : IDisposable
{
    private readonly Timer _timer = null;

    //Store reference to connection so we can unsubscribe from state change events
    private SqlConnection _connection = null;

    private static int _idCounter = 0;
    private readonly int _connectionId = Interlocked.Increment(ref _idCounter); // thread-safe counter

    public ConnectionLeakWatcher(SqlConnection connection)
    {
        _connection = connection;
        StackTrace = Environment.StackTrace;

        connection.StateChange += ConnectionOnStateChange;
        System.Diagnostics.Debug.WriteLine("Connection opened " + _connectionId);

        _timer = new Timer(x =>
        {
            //The timeout expired without the connection being closed. Write to debug output the stack trace of the connection creation to assist in pinpointing the problem
            System.Diagnostics.Debug.WriteLine("Suspected connection leak with origin: {0}{1}{0}Connection id: {2}", Environment.NewLine, StackTrace, _connectionId);
            //That's it - we're done. Clean up by calling Dispose.
            Dispose();
        }, null, 10000, Timeout.Infinite);
    }

    private void ConnectionOnStateChange(object sender, StateChangeEventArgs stateChangeEventArgs)
    {
        //Connection state changed. Was it closed?
        if (stateChangeEventArgs.CurrentState == ConnectionState.Closed)
        {
            //The connection was closed within the timeout
            System.Diagnostics.Debug.WriteLine("Connection closed " + _connectionId);
            //That's it - we're done. Clean up by calling Dispose.
            Dispose();
        }
    }

    public string StackTrace { get; set; }

    #region Dispose
    private bool _isDisposed = false;

    public void Dispose()
    {
        if (_isDisposed) return;

        _timer.Dispose();

        if (_connection != null)
        {
            _connection.StateChange -= ConnectionOnStateChange;
            _connection = null;
        }

        _isDisposed = true;
    }

    ~ConnectionLeakWatcher()
    {
        Dispose();
    }
    #endregion
}
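For completeness, the usage pattern from the class comment, spelled out (connectionString is assumed to come from your own configuration):

// Attach a watcher in debug builds only, at the single place where
// connections are created. No reference to the watcher is kept; it cleans
// itself up when the connection closes or when the 10-second timer fires.
var connection = new SqlConnection(connectionString);
#if DEBUG
new ConnectionLeakWatcher(connection);
#endif
connection.Open();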
There are some good links for monitoring connection pools. Do a google search for ".net connection pool monitoring".
One article I referred to a while back was Bill Vaughn's article (note this is old but still contains useful info). It has some info on monitoring connection pools, as well as some great insights into where leaks could be occurring.
For monitoring, he suggests:
"Monitoring the connection pool
Okay, so you opened a connection and closed it and want to know if the
connection is still in place—languishing in the connection pool on an
air mattress. Well, there are several ways to determine how many
connections are still in place (still connected) and even what they
are doing. I discuss several of these here and in my book:
· Use the SQL Profiler with the SQLProfiler TSQL_Replay
template for the trace. For those of you familiar with the Profiler,
this is easier than polling using SP_WHO.
· Run SP_WHO or SP_WHO2, which return information from the
sysprocesses table on all working processes showing the current status
of each process. Generally, there’s one SPID server process per
connection. If you named your connection, using the Application Name
argument in the connection string, it’ll be easy to find.
· Use the Performance Monitor (PerfMon) to monitor the pools
and connections. I discuss this in detail next.
· Monitor performance counters in code. This option permits
you to display or simply monitor the health of your connection pool
and the number of established connections. I discuss this in a
subsequent section in this paper."
Edit:
As always, check out some of the other similar posts here on SO
Second Edit:
Once you've confirmed that connections aren't being reclaimed by the pool, another thing you could try is to use the StateChange event to confirm when connections are being opened and closed. If you find many more state changes to open than to closed, that indicates a leak somewhere. You could also log the data from the StateChange event along with a timestamp, and, if you have other logging in your application, start parsing the log files for instances where a connection changes from closed to open with no corresponding open to closed. See this link for more info on how to handle the StateChange event.
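A minimal sketch of that idea (the helper class and method names are mine, not from the original):

using System;
using System.Data;
using System.Data.SqlClient;

static class TrackedConnections
{
    public static SqlConnection Create(string connectionString)
    {
        var connection = new SqlConnection(connectionString);
        // Log every state transition with a timestamp. Many more Closed->Open
        // transitions than Open->Closed in the logs points at a leak.
        connection.StateChange += (sender, e) =>
            Console.WriteLine("{0:O} StateChange: {1} -> {2}",
                DateTime.UtcNow, e.OriginalState, e.CurrentState);
        return connection;
    }
}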
I've used this before to find long-running stored procedures:
http://www.simple-talk.com/sql/performance/how-to-identify-slow-running-queries-with-sql-profiler/
I can then work back and find the method that called the SP.
Don't know if that'll help.
I have a site which runs on ASP.NET 3.5, NHibernate 2.2 and Spring.NET for dependency injection. On our test server a rather strange error occurs, almost always when there are multiple users online. After the problem has occurred, this error is displayed for every user and every request they make, until you do an IISRESET. Then everything is OK again.
Here's the exception:
'count' must be non-negative.
Parameter name: count
Description: An unhandled exception occurred during the execution of the current web request. Please review the stack trace for more information about the error and where it originated in the code.
Exception Details: System.ArgumentOutOfRangeException: 'count' must be non-negative.
Parameter name: count
Source Error:
[No relevant source lines]
Source File: c:\Windows\Microsoft.NET\Framework64\v2.0.50727\Temporary ASP.NET Files\root\4bf9aa39\6dcf5fc6\App_Web_z9ifuy6t.6.cs Line: 0
Stack Trace:
[ArgumentOutOfRangeException: 'count' must be non-negative.
Parameter name: count]
System.String.CtorCharCount(Char c, Int32 count) +10082288
Spring.Objects.Factory.Support.AbstractObjectFactory.GetObjectInternal(String name, Type requiredType, Object[] arguments, Boolean suppressConfigure) +3612
Spring.Objects.Factory.Support.AbstractObjectFactory.GetObject(String name) +75
Spring.Objects.Factory.Support.DefaultListableObjectFactory.GetObjectsOfType(Type type, Boolean includePrototypes, Boolean includeFactoryObjects) +365
Spring.Context.Support.AbstractApplicationContext.GetObjectsOfType(Type type, Boolean includePrototypes, Boolean includeFactoryObjects) +136
Spring.Context.Support.AbstractApplicationContext.GetObjectsOfType(Type type) +66
[ActivationException: Activation error occured while trying to get instance of type InfoTextService, key ""]
Microsoft.Practices.ServiceLocation.ServiceLocatorImplBase.GetInstance(Type serviceType, String key) in c:\Home\Chris\Projects\CommonServiceLocator\main\Microsoft.Practices.ServiceLocation\ServiceLocatorImplBase.cs:57
Microsoft.Practices.ServiceLocation.ServiceLocatorImplBase.GetInstance() in c:\Home\Chris\Projects\CommonServiceLocator\main\Microsoft.Practices.ServiceLocation\ServiceLocatorImplBase.cs:90
OurProjectsNamespace.Infrastructure.ObjectLocator.LocateService() +86
This is indeed a very weird error. When you look at the source of AbstractObjectFactory.GetObjectInternal, you will see the following structure:
[ThreadStatic]
private static int nestingCount; // must be static for [ThreadStatic] to have any effect

protected object GetObjectInternal(...)
{
    const int INDENT = 3;
    bool hasErrors = false;
    try
    {
        nestingCount++;
        if (log.IsDebugEnabled)
        {
            log.Debug("msg" +
                new String(' ', nestingCount * INDENT));
        }

        // More code: Calls self recursively.
    }
    catch
    {
        nestingCount--;
        hasErrors = true;
        if (log.IsErrorEnabled)
        {
            log.Error("msg" +
                new String(' ', nestingCount * INDENT));
        }
    }
    finally
    {
        if (!hasErrors)
        {
            nestingCount--;
            if (log.IsDebugEnabled)
            {
                log.Debug("msg" +
                    new String(' ', nestingCount * INDENT));
            }
        }
    }
}
The exception you are seeing must be thrown by one of the three new String(' ', nestingCount * INDENT) calls. That particular string constructor throws when the supplied count is negative. Because INDENT is a const, nestingCount must have a negative value in that case. nestingCount is a thread-static variable. Thread-static variables are always initialized with their default value (0 in this case) and can't be influenced by other threads. Furthermore, nestingCount is never used outside this method.
Because nestingCount is thread-static and only used in that method, it is hard to imagine a scenario where nestingCount can become negative. Perhaps in the case of an asynchronous (ThreadAbort) exception, but even that I find hard to imagine. The other option is that the thread-static variable is changed by someone else using reflection.
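For reference, the exception from the stack trace is trivial to reproduce in isolation, which confirms that a negative count is the only way to get this message:

// Throws System.ArgumentOutOfRangeException: 'count' must be non-negative.
// Parameter name: count
// i.e. exactly the exception from the question, as soon as the count is negative.
string indent = new string(' ', -3);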
But the big question is: how to solve this?
Solution:
There's only one thing I can think of, and that is to reconfigure log4net in such a way that debug information isn't logged. With debug logging disabled, the string(char, int) constructor will probably never get called again, which will hide the problem. Not very pretty, but possibly effective. This might work because the AbstractObjectFactory logs using the log variable that is initialized as follows:
this.log = LogManager.GetLogger(this.GetType());
You can do this by globally disabling the writing of debug information in log4net, or, when you think that is overkill, by configuring log4net to disable debug info just for the type Spring.Objects.Factory.Support.DefaultListableObjectFactory (the instance that actually causes the exception), as sketched below.
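A minimal log4net configuration sketch of that per-logger override (assuming XML configuration; setting the level to INFO suppresses Debug output for just this logger):

<log4net>
  <!-- Raise the level for just the factory logger that triggers the bug;
       all other loggers keep their configured levels. -->
  <logger name="Spring.Objects.Factory.Support.DefaultListableObjectFactory">
    <level value="INFO" />
  </logger>
</log4net>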
Good luck.
I have seen this error occur when a database column is mapped to more than one property. A typical case is when a foreign key column is mapped to both a property and a collection. A second or third pair of eyes on the config files helps spot these.
One twist with this is that it occurs for all users. Do you have a persistent object that is stored in application state?
In my case, this error occurred after some time during performance testing. It starts fine and after some time this error pops up.
Turns out it was caused by a totally unrelated [ThreadLocal] variable I used in my code. I replaced it with a method parameter and now it works fine.