I have a request to my own service that takes 15 seconds for the service to complete. That should not be a problem, right? We actually have a service-side timer so that it will take at most 15 seconds to complete. However, the client is seeing "the connection was forcibly closed" and is automatically retrying the GET request twice (within the System.Net layer--I have seen it by turning on the diagnostics).
Oh, BTW, this is a non-SOAP situation (WCF 4 REST Service) so there is none of that SOAP stuff in the middle. Also, my client is a program, not a browser.
If I shrink the time down to 5 seconds (which I can do artificially), the retries stop but I am at a loss for explaining how the connection should be dropped so quickly. The HttpWebRequest.KeepAlive flag is, by default, true and is not being modified so the connection should be kept open.
The timing of the retries is interesting. They come at the end of whatever timeout we choose (e.g. 10 or 15 seconds), so the client side seems to be reacting only after getting the first response.
Another thing: there is no indication of a problem on the service side. It works just fine but sees a surprising (to me) couple of retries of the request from the client.
I have Googled this issue and come up empty. The standard for keep-alive is over 100 seconds AFAIK so I am still puzzled why the client is acting the way it is--and the behavior is within the System.Net layer so I cannot step through it.
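For reference, the client side is nothing exotic; it is essentially the stock HttpWebRequest GET pattern, roughly like this sketch (the URL and timeout value here are placeholders, not the real ones):

```csharp
using System;
using System.IO;
using System.Net;

class Client
{
    static void Main()
    {
        // Placeholder URL; the real target is the WCF 4 REST service.
        var request = (HttpWebRequest)WebRequest.Create("http://example.com/api/report");
        request.Method = "GET";
        request.KeepAlive = true;   // the default; we do not change it
        request.Timeout = 60000;    // milliseconds, well above the 15-second server limit

        using (var response = (HttpWebResponse)request.GetResponse())
        using (var reader = new StreamReader(response.GetResponseStream()))
        {
            Console.WriteLine(reader.ReadToEnd());
        }
    }
}
```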
Any help here would be very much appreciated!
== Tevya ==
Change your service so it sends a timeout indication to the client before closing the connection.
Sounds like a piece of hardware (router, firewall, load balancer?) is sending a RST because of some configuration choice.
I found the answer and it was almost totally unrelated to the timeout. Rather, the problem is related to the use of custom serialization of response data. The response data structures have some dynamically appearing types and thus cannot be serialized by the normal ASP.NET mechanisms.
To solve that problem, we create an XmlObjectSerializer object and then pass it plus the objects to serialize to System.ServiceModel.Channels.Message.CreateMessage(). Here two things went wrong:
An unexpected type was added to the message [how and why I will not go into here] which caused the serialization to fail and
It turns out that the CreateMessage() method does not immediately serialize the contents but defers the serialization until sometime later (probably just-in-time).
The two facts together caused an uncatchable serialization failure on the server side because the actual attempt to serialize the objects did not occur until the user-written service code had returned control to the WCF infrastructure.
Now why did it look like a timeout? Well, it turns out that not all the objects being returned had the unexpected object type in them. In particular, the first N objects did not. So, when the time limit was lengthened beyond 5 seconds, the N+1th object did reference the unknown type and was included in the download which broke the serialization. Later testing confirmed that the failure could happen even when only one object was being passed back.
The solution was to pre-process the objects so that no unexpected types are referenced. Then all goes well.
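To illustrate the shape of the fix (this is only a sketch with made-up type names, not our actual service code): declare every type the response graph can reference when building the serializer, so the deferred serialization that WCF performs after the service method returns cannot trip over an unknown type.

```csharp
using System.Collections.Generic;
using System.Runtime.Serialization;
using System.ServiceModel.Channels;

// Made-up payload type standing in for our real response objects.
[DataContract]
public class ReportRow
{
    [DataMember] public int Id { get; set; }
    [DataMember] public string Value { get; set; }
}

public static class ResponseFactory
{
    public static Message BuildResponse(List<ReportRow> rows)
    {
        // Declare every type the payload can reference up front. Because
        // CreateMessage() defers serialization until WCF writes the reply,
        // an unexpected type would otherwise fail *after* our code returns.
        var serializer = new DataContractSerializer(
            typeof(List<ReportRow>),
            new[] { typeof(ReportRow) /*, any other expected types */ });

        // MessageVersion.None = plain REST reply, no SOAP envelope.
        string action = null;
        return Message.CreateMessage(MessageVersion.None, action, rows, serializer);
    }
}
```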
== Tevya ==
I am using an Axon Tracking Event Processor. Sometimes events take longer than 10 seconds to process.
This seems to cause the message to be processed again and this appears in the log "Releasing claim of token X/0 failed. It was owned by another node."
If I increase the number of segments it does not log this, BUT the event is still processed twice, so I think this might be misleading. (I think I was mistaken about this.)
I have tried adjusting the fetchDelay, cleanupDelay and tokenClaimInterval. None of which has fixed this. Is there a property or something that I am missing?
Edit
The scenario taking longer than 10 seconds is making a HTTP request to an external service.
I'm using Axon 4.1.2 with all default configuration via Spring auto-configuration. I cannot see the "Releasing claim on token and preparing for retry in [timeout]s" log message.
I was having this issue with a single segment and 2 instances of the application. I realised I hadn't increased the number of segments like I thought I had.
After further investigation I have discovered that adding an additional segment seems to have stopped this. Even if I have, for example, 2 segments and 6 applications it still doesn't reappear; however, I'm not sure how this is different from my original scenario of 1 segment and 2 applications?
I didn't realise it would be possible for multiple threads to grab the same tracking token and process the same event. It sounds like the best action would be to put an idempotency check before the HTTP call?
The Releasing claim of token [event-processor-name]/[segment-id] failed. It was owned by another node. message can only occur in three scenarios:
You are performing a merge operation of two segments which fails because the given thread doesn't own both segments.
The main event processing loop of the TrackingEventProcessor is stopped, but releasing the token claim fails because the token is already claimed by another thread.
The main event processing loop has caught an Exception, making it retry with an exponential back-off, and it tries to release the claim (which might fail with the given message).
I am guessing it's not option 1 or 2, so that would leave us with option 3. This should also mean you are seeing other WARN level messages, like:
Releasing claim on token and preparing for retry in [timeout]s
Would you be able to share whether that's the case? That way we can pinpoint a little better what the exact problem is you are encountering.
By the way, very likely you have several processes (event handling threads of the TrackingEventProcessor) stealing the TrackingToken from one another. As they're stealing a not-yet-updated token, both (or more) will handle the same event. Hence why you see the event handler being invoked twice.
Obviously undesirable behavior and something we should resolve for you. I would like to ask you to provide answers to my comments under the question, as right now I have too little to go on. Let us figure this out #Dan!
Update
Thanks for updating your question #dan, that's very helpful.
From what you've shared, I am fairly confident that both instances are stealing the token from one another. This does depend though on whether both are using the same database for the token_entry table (although I am assuming they are).
If they are using the same table, then they should "nicely" share their work, unless one of them takes too long. If it takes too long, the token will be claimed by another process. This other process in this case is the thread of the TEP of your other application instance. The "claim timeout" defaults to 10 seconds, which also corresponds with the long running event handling process.
This claimTimeout is adjustable though, by invoking the Builder of the JpaTokenStore/JdbcTokenStore (depending on which you are using / auto wiring) and calling the JpaTokenStore.Builder#claimTimeout(TemporalAmount) method. And I think this would be required on your end, given the fact that you have a long running operation.
There are of course different ways of tackling this. Like making sure the TEP is only run on a single instance (not really fault tolerant though), or offloading this long running operation to a scheduled task which is triggered by the event.
But I think we've found the issue at least, so I'd suggest tweaking the claimTimeout and seeing if the problem persists.
Let us know if this resolves the problem on your end #dan!
I have yet to understand the behavior of a web server thread if I make an async call to, say, a database, and immediately return a response (say, OK) to the client without even waiting for the async call to return. First of all, is it a good approach? What will happen to the thread which made the async call if it is reused to serve another request and then the previous async call returns to this particular thread? Or does the web server hold this thread waiting until the async call it made returns? Then the issue would be that many hanging threads would be left open while the web server remains available to take more requests. I am looking for an answer.
It depends on the way your HTTP server works. But you should be very cautious.
Let's say you have a main event loop taking care of incoming HTTP connections, and worker threads which manage the HTTP communications.
A worker thread should be considered ready to take on the management of a new HTTP request only when it is effectively, completely ready for that.
In terms of pure HTTP, the most important thing is to avoid sending a response before having received the whole query. It seems simple, and it's usually the case. But if the query has a body, which may be a chunked body, it could take time to receive the whole message.
You should never send a response before that, unless it's something like a 400 Bad Request response followed by a real TCP/IP connection closing. If you fail to do so, and you have a message length parsing issue, the fact that you sent a response before the end of the query may lead to security problems. It could be used to exploit differences in the parsing of messages between your server and any other HTTP agent in front of your server (SSL terminator, reverse proxy, etc.) in some sort of HTTP smuggling issue. For this agent, if you made a response, it means you had the whole message, so it can send the next message, where you will in fact think this is just another part of the body.
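As a rough illustration of "receive the whole query before answering" (a sketch, not tied to any particular server; .NET's HttpListener is just a stand-in, and the status code and payload are arbitrary):

```csharp
using System.IO;
using System.Net;
using System.Text;

// Fully drain (or buffer) the request body, chunked or not, before a single
// byte of the response is written, so the response can never race the request.
public static class DrainBeforeRespond
{
    public static void Handle(HttpListenerContext context)
    {
        using (var body = context.Request.InputStream)
        {
            body.CopyTo(Stream.Null);   // read the whole query first
        }

        var payload = Encoding.UTF8.GetBytes("accepted");
        context.Response.StatusCode = 202;              // arbitrary example status
        context.Response.ContentLength64 = payload.Length;
        context.Response.OutputStream.Write(payload, 0, payload.Length);
        context.Response.Close();
    }
}
```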
Now if you have the whole message, you can decide to send an early response and detach an asynchronous task to really perform the work. But this means:
you have to assume that no more output should be generated, you will not try to send any output to the request issuer, you should consider that the communication is now closed
the worker thread should not receive new requests to manage, and this is the hard part. If this thread is marked as available for a new request, it may also be killed by the thread manager (in Nginx or Apache you have request counters associated with workers, and they are killed after reaching a limit, to create fresh ones). It may also receive a graceful reload command (which is usually a kill), etc.
So you start to enter a zone where you need to know the internals of the HTTP server, which may or may not be managed by you, and where changes may appear sooner or later. And you start to do very strange things, which usually leads to strange issues that are hard to reproduce.
Usually the best way to handle asynchronous tasks, while still being able to understand what happens, is to use a messaging system. Put a list of tasks in a queue, and have a parallel asynchronous worker process that does things with these tasks. Track the status of these tasks if you need it.
The same thing may apply to the client: after receiving a very fast HTTP answer, it may need to do some AJAX polling for the task status. Then you may only have to check the status of the task in the queue to send a response.
You will get more control on the whole thing.
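A bare-bones sketch of that queue-plus-polling idea (C# here, with illustrative names; any language or message broker works the same way): the request handler only enqueues and returns an id, a dedicated worker does the heavy task, and a status lookup serves the polling endpoint.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

public enum JobStatus { Queued, Running, Done, Failed }

public class JobQueue
{
    private readonly ConcurrentQueue<Guid> _queue = new ConcurrentQueue<Guid>();
    private readonly ConcurrentDictionary<Guid, JobStatus> _status =
        new ConcurrentDictionary<Guid, JobStatus>();

    // Called from the request handler: record the job and answer immediately.
    public Guid Enqueue()
    {
        var id = Guid.NewGuid();
        _status[id] = JobStatus.Queued;
        _queue.Enqueue(id);
        return id;               // return this id to the client (e.g. with a 202 Accepted)
    }

    // Called from the client's polling endpoint.
    public JobStatus? GetStatus(Guid id) =>
        _status.TryGetValue(id, out var s) ? s : (JobStatus?)null;

    // A dedicated worker, independent of the HTTP worker threads.
    public async Task RunWorkerAsync(CancellationToken ct)
    {
        while (!ct.IsCancellationRequested)
        {
            if (_queue.TryDequeue(out var id))
            {
                _status[id] = JobStatus.Running;
                try
                {
                    await Task.Delay(5000, ct);   // stand-in for the real long task
                    _status[id] = JobStatus.Done;
                }
                catch (Exception)
                {
                    _status[id] = JobStatus.Failed;
                }
            }
            else
            {
                await Task.Delay(100, ct);        // idle wait when the queue is empty
            }
        }
    }
}
```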
Personally, I really dislike having detached threads, coming from strange code, performing heavy tasks without any way of outputting a status or reporting errors, and maybe preventing a clean application stop (still waiting for strange threads to join) that does not imply a killall.
It depends whether this asynchronous operation performs something which the client should be notified about.
If you return 200 OK (i.e. successfully completed) and later the asynchronous operation fails then the client will not know about the error.
You of course have some options, like sending some kind of push notification over a websocket, or having the client send another request which would return the actual result, and things like that. So basically it depends on your needs...
I'm using Rebus SQLTransport with XML serialized messages for integration with SQL Server. Messages represent changes done in SQL Server. Because of that the order of message delivery is essential.
This is because, for example, message1 may contain an object that is referenced (by id) in message2. Another example is that message1 may contain a remove request for some object, and that removal is required before the new object from message2 can be accepted.
Aggregating messages into one message would be quite complicated because messages are generated by triggers.
With message idempotence and one worker I guess that would work, except that it breaks if an error happens and a message is moved to the error queue. An error is quite likely because of a validation or business logic exception. Because of that I believe only a human can fix the problematic message, and until that time the other messages should not be delivered. So I wanted to ask for advice on what would be best to do in that situation. As far as I can see the retry count cannot be set to infinity, so should I stop the service inside the handler until the problem is solved by a human?
Thanks in advance
If it's important that the messages are processed in order without any "holes", I suggest you assign a sequence number to each message.
This way, if the endpoint gets a message whose sequence number is greater than the expected sequence number, it can throw an exception, thus preventing out-of-order messages from being processed.
I would only do this if errors are uncommon though, and only if the message volume is fairly small.
If in-order processing is required, a much better design would be to use another message processing library that supports a pull model, which I think would fit your scenario much better than Rebus' push model.
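To make the sequence-number suggestion above concrete, here is a minimal Rebus handler sketch; the message type, the ISequenceStore abstraction and its methods are made up for the example, and only IHandleMessages<T> is Rebus' own interface:

```csharp
using System;
using System.Threading.Tasks;
using Rebus.Handlers;

// Illustrative message type carrying a per-source sequence number.
public class ObjectChanged
{
    public long SequenceNumber { get; set; }
    public string Payload { get; set; }
}

// Hypothetical abstraction for persisting the next expected sequence number.
public interface ISequenceStore
{
    Task<long> GetNextExpectedAsync();
    Task AdvanceAsync();
}

public class ObjectChangedHandler : IHandleMessages<ObjectChanged>
{
    private readonly ISequenceStore _sequences;

    public ObjectChangedHandler(ISequenceStore sequences) => _sequences = sequences;

    public async Task Handle(ObjectChanged message)
    {
        var expected = await _sequences.GetNextExpectedAsync();

        if (message.SequenceNumber > expected)
        {
            // A gap: an earlier message failed or is still in flight.
            // Throwing makes Rebus retry and eventually dead-letter this one,
            // so nothing is processed out of order.
            throw new InvalidOperationException(
                $"Expected sequence {expected} but got {message.SequenceNumber}");
        }

        if (message.SequenceNumber < expected)
        {
            return; // already handled (idempotence), ignore the duplicate
        }

        await ApplyChangeAsync(message);
        await _sequences.AdvanceAsync();
    }

    private Task ApplyChangeAsync(ObjectChanged message) => Task.CompletedTask; // stand-in
}
```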
[Teradata Database] [3130] Response limit exceeded
I have no idea what is causing this random error message. It happens when I am making a call to the database for a SELECT or to execute a stored procedure. I wish I had more information on how to reproduce this, but it appears to be intermittent.
What does this error actually mean? What types of conditions could cause this?
Edit: I've discovered that the issue goes away when I build my ASP.NET app (vs2012). It's like something related to connections is being cached somewhere on my machine. After I recycle the app pool with a rebuild, it resets everything. The next time this happens, I will try saving the web.config file which automatically recycles the app pool without rebuilding the DLL.
This is a cut&paste from the Messages manual:
3130 Response limit exceeded.
Explanation: There is a TDBMS limit of 16 outstanding responses for a
single session. If responses are allowed to pile up by an application,
this error will occur. A response is the response set from a SELECT
statement. A response is kept by the TDBMS until we know the user is
done with it at which point it is cancelled. There are two scenarios:
If KeepResp is OFF, the response is automatically cancelled when all rows have been returned to the application and the host has been
notified of the end of the response.
If KeepResp is ON, the response is held until explicitly cancelled by the user. In each case, the response can be explicitly cancelled by
the application as soon as it is no longer needed.
Generated By: Dispatcher.
For Whom: End user.
Remedy: The responses are the property of the session and will
automatically be cancelled if the session is logged off.
Cancel an old response and resubmit the request or transaction.
As you already noticed this is usually caused by a misbehaving application, too many open result sets on the server side. It's the client's responsibility to close them :-)
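In an ADO.NET client (as in the ASP.NET app above) that boils down to deterministically disposing readers, commands and connections. A minimal sketch using the generic provider base classes; the provider factory, connection string and SQL text are placeholders for whatever Teradata provider you actually use:

```csharp
using System.Data.Common;

public static class ResponseHygiene
{
    public static int CountRows(DbProviderFactory providerFactory, string connectionString, string sql)
    {
        // Each "using" guarantees the reader/command/connection is closed even on
        // exceptions, so responses cannot pile up on the session (error 3130).
        using (DbConnection connection = providerFactory.CreateConnection())
        {
            connection.ConnectionString = connectionString;
            connection.Open();

            using (DbCommand command = connection.CreateCommand())
            {
                command.CommandText = sql;

                using (DbDataReader reader = command.ExecuteReader())
                {
                    var rows = 0;
                    while (reader.Read()) rows++;
                    return rows;
                } // reader disposed here -> the response is released on the server
            }
        }
    }
}
```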
In my case the error came when I tried to create more than 15 Statements/PreparedStatements on the same Connection instance and then ran executeQuery on them.
So you should check that no more than 15 Statements are created on the same connection, or make sure they are closed before creating another one.
The result set is generally out of the picture if we are talking about the 3130 response limit exceeded error, as a Statement automatically closes its existing ResultSet if it is reused.
I'm using Web API self-host inside a Windows service and I've encountered a problem; after googling for a couple of hours I haven't found a reasonable answer.
One of the API controllers serves a large stream of data (not really that large, a couple of tens of MB). It takes some time to produce the data so I've decided to use TransferMode.StreamedResponse to minimize the time the client has to wait for a response. I've also added a CompressHandler and a custom CompressedContent (derived from HttpContent), mostly based on a following answer.
The controller returns an instance of IDataReader, which is then serialized by a custom formatter and lastly compressed inside the CompressedContent that I've mentioned. The whole data passing is streamed, so while the client receives data, a data reader on the server side may still be reading rows from the database. Everything works fine when the client is acting nicely.
The problem occurs when a client drops the connection while the data is still being serialized to the underlying network stream. I've tried to watch for a faulted task inside the ContinueWith delegate (in the CompressedContent from the link) and dispose the underlying network Stream. Unfortunately the CommunicationException (The specified network name is no longer available) is still being thrown when control leaves my code. From the stack trace it looks like the exception is thrown when Web API tries to close (end) the underlying network stream (HTTP channel?). As happens with unobserved exceptions, it brings the entire Windows service down.
I've mitigated the problem by setting windows service recovery options but I would like to know if this failure can be handled in code.
Is there a way to setup a custom error handler (IErrorHandler presumably) inside web api self hosting service mode to prevent this kind of error?
I'm using the Beta version; I will try to reproduce this error on the RC, but I somehow doubt that setting up this kind of error handler has changed in any way.
We had the same issue. I was able to submit a fix to MS and they have in turn released a nightly build that fixes this. They are looking at back-porting the fix to the RTM. You can see the pull request here: http://aspnetwebstack.codeplex.com/SourceControl/network/forks/rdean79/issue284/contribution/3329
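Until a build with that fix reaches you, one possible stopgap (my own assumption, not part of the linked fix) is to observe unobserved task exceptions process-wide so the faulted continuation cannot escalate and kill the Windows service; under .NET 4.0's default policy an unobserved task exception is rethrown on the finalizer thread and terminates the process. Wire this up once, e.g. in the service's OnStart; the logging call is illustrative:

```csharp
using System;
using System.Threading.Tasks;

public static class UnobservedTaskExceptionGuard
{
    public static void Register()
    {
        TaskScheduler.UnobservedTaskException += (sender, args) =>
        {
            // Mark the exception as observed so the finalizer thread does not
            // escalate it and terminate the process.
            args.SetObserved();

            foreach (var ex in args.Exception.Flatten().InnerExceptions)
            {
                Console.Error.WriteLine("Unobserved task exception: " + ex);
            }
        };
    }
}
```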