How can I set up an alert in Application Insights for a specific exception? For example, I want to be alerted to an outage in some areas of the application even though the application as a whole is available, e.g. an alert when exceptions occur with descriptions like:
1. A network-related or instance-specific error occurred while establishing a connection to SQL Server.
2. Timeout expired.
3. Insufficient system storage.
At this time, we only support alerting on metrics. Alerting on counts of exceptions by type or "ProblemID" is high on our backlog, though. The ability to alert on arbitrary text in the exception description, message, or stack trace is further out. For now, a potential workaround is to send metrics with substantially different outlier values (as compared to normal operation, baseline values).
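The workaround above can be sketched as follows: inspect exception text for the watched substrings and compute a metric value that is a clear outlier whenever one matches, so a plain metric alert can fire on it. This is a minimal illustration, not the Application Insights SDK itself; actually emitting the metric would go through the SDK (e.g. a TrackMetric call), and the baseline/outlier values are assumptions you would tune to your own telemetry.

```python
# Sketch: map exception text to a metric value that a metric alert can fire on.
# Sending the value would use the Application Insights SDK (not shown here).

TARGET_SUBSTRINGS = [
    "A network-related or instance-specific error occurred",
    "Timeout expired",
    "Insufficient system storage",
]

# Baseline reported during normal operation; the outlier value is deliberately
# far outside the normal range so a threshold alert on the metric will trigger.
BASELINE_VALUE = 0
OUTLIER_VALUE = 1000

def metric_value_for(exception_message: str) -> int:
    """Return the outlier value if the message matches a watched error."""
    if any(s in exception_message for s in TARGET_SUBSTRINGS):
        return OUTLIER_VALUE
    return BASELINE_VALUE
```

You would then configure a metric alert with a threshold between the baseline and outlier values.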
Related
After deploying the function, I got this intermittent error several times.
Error: Server Error
The server encountered an error and could not complete your request.
Please try again in 30 seconds.
The error log also shows "The request was aborted because there was no available instance."
Generally, this error occurs when the Cloud Functions infrastructure is not able to scale up instances fast enough to handle the incoming load, so requests are aborted because no instance becomes available to serve them.
This could be due to the following conditions:
A huge sudden increase in traffic.
A long cold start time.
A long request processing time.
High function error rate.
Reaching the maximum instance limit, so the system cannot scale any further.
Transient factors attributed to the Cloud Functions service.
To mitigate the issue, you can set the minimum number of instances for the Cloud Function to a value greater than 0, which avoids cold starts and ensures there are instances ready to serve requests. However, note that setting minimum instances will incur additional charges.
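Assuming the function is deployed with the gcloud CLI, the minimum-instance count can be set at deploy time; the function name and region below are placeholders, and you should verify the flag against your installed gcloud version:

```shell
# Keep at least one warm instance to avoid cold starts (incurs extra cost).
gcloud functions deploy my-function \
  --min-instances=1 \
  --region=us-central1
```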
From the Pro ASP.NET Core MVC 2 book (page 417):
The ASP.NET Debugging Levels
...
Critical - This level is used for messages that describe catastrophic failures.
Error - This level is used for messages that describe errors that interrupt the application....
What is the difference between catastrophic failures and errors that interrupt the application?
The official Microsoft documentation explains it a little more clearly when discussing log levels:
Error = 4
For errors and exceptions that cannot be handled. These messages indicate a failure in the current activity or operation (such as the current HTTP request), not an application-wide failure. Example log message: Cannot insert record due to duplicate key violation.
Critical = 5
For failures that require immediate attention. Examples: data loss scenarios, out of disk space.
See https://learn.microsoft.com/en-us/aspnet/core/fundamentals/logging/?tabs=aspnetcore2x (under the "Log Level" section).
In other words, level 4 ("Error") is used for something which fails the application's current activity but probably doesn't prevent it from continuing to serve other requests or perform other operations. Most exceptions fall into this category.
On the other hand, a level 5 ("Critical") error is used for something likely to have a longer-term impact, potentially making the application entirely unusable until the problem is resolved.
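The same distinction exists in most logging frameworks. As an illustration only (Python's logging module, not the ASP.NET Core API; the failure strings are made up), the choice of level follows the same rule: per-operation failures get Error, application-wide failures get Critical:

```python
import logging

def level_for(failure: str) -> int:
    """Pick a log level following the Error-vs-Critical guidance above."""
    # Per-request failures: the current operation fails, the app keeps serving.
    if failure in ("duplicate key violation", "request validation failed"):
        return logging.ERROR      # numeric value 40
    # Application-wide failures needing immediate attention.
    if failure in ("out of disk space", "data loss"):
        return logging.CRITICAL   # numeric value 50
    return logging.INFO
```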
Today we experienced some failures from SG, BR and IE, all of which look like timeouts to our API, which is hosted in the North Europe data centre:
There were no application failures on our side, so we assume it was a transient network issue between these regions and our server (North Europe).
Are there any resources that can help in confirming/troubleshooting such issues?
EDIT:
When drilling down into the failed request (as suggested by yonisha):
So it must have been either server/network (Azure) or application-related?
Yes, you can see the failures of the availability tests:
Click on the failed test instance, above the chart you've provided (make sure to change the 'Time range' to at least 1 hour).
Click on the failed request, which will open another blade with the exception details:
Depending on the exception details, you can then determine whether it was server/network issue (server was unavailable etc.), application level (you may have insufficient logging to log those failures) or client side issue (client has closed the connection etc.).
I'm not sure if I'm doing this right.
Our orchestration looks like this:
ReceiveOrder
TryScope (Long Running)
AcknowledgementScope (Atomic)
ConstructOrderAckMessage
TransformOrderToAck (using a map)
SendOrderAckToMessageQueue
AtomicWebServiceScope
ImportOrderToDBExpression
Construct and send message to another process
CatchException
ConstructErrorExpression
HandleExceptionStartOrchestration
When we tested this with about 6,000 orders, we noticed that all of them resulted in an acknowledgment message (SendOrderAckToMessageQueue). The acknowledgment is a simple XML based on a schema provided by the team that sends the order to this orchestration.
However, not all of them got imported into the database (ImportOrderToDBExpression); about 45 were missing. In fact, there are no errors, failures, or suspended instances of any kind, and there is nothing unusual about the orders that did not get imported. If the import failed, it did so silently.
Please note, that the AcknowledgementScope portion is something added recently; prior to that all the orders got imported successfully.
Is this because I have the scope set incorrectly in this orchestration? Where else could the problem be? Is there a better way to send the acknowledgment in a foolproof way? Thanks for any advice.
You don't mention any Catch Blocks. Do you have Catch Blocks on all your Scopes?
If there is an Exception without a Catch Block or a Catch Block that does not log the Exception, it will appear to silently fail.
Yes, the main thing you are doing wrong is calling an external DLL to insert records into a database.
Unless that DLL is very well written to be multi-threading capable, including limiting the number of concurrent connections, and has good retry and error handling, it can encounter an error and silently fail.
Even if the DLL does log errors to the Event Log, you have to grant the application name the DLL uses permission to write to the event log; otherwise the DLL will fail in its catch blocks while trying to write to the event log.
What you should be doing is using a Send Port with the appropriate Adapter to send records to the database.
Also, there are very few situations in which you need an atomic scope; with an atomic scope it is up to the developer to implement any rollback. You probably do not need a long-running scope either, unless you expect your orchestration to take a long while and it should dehydrate while waiting for a response.
Sending the acknowledgement after the BizTalk orchestration has received the message is fine, as long as you can then somehow resume a failed message in BizTalk, so you need some sort of retry mechanism.
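The retry idea can be sketched generically (a language-neutral illustration, not a BizTalk API; `import_fn` is a hypothetical stand-in for the database import step): retry the operation a few times with a delay, and report failure so the message can be routed somewhere it can be resumed manually.

```python
import time

def process_with_retry(import_fn, message, attempts=3, delay_seconds=1.0):
    """Try to import a message several times; return True on success.

    On repeated failure the caller should route `message` to a queue or
    table where it can be resumed manually instead of failing silently.
    """
    for attempt in range(1, attempts + 1):
        try:
            import_fn(message)
            return True
        except Exception:
            if attempt < attempts:
                time.sleep(delay_seconds)  # back off before the next try
    return False
```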
Just curious if anyone else has got this particular error and know how to solve it?
The scenario is as follow...
We have an ASP.NET web application using Enterprise Library running on Windows Server 2008 IIS farm connecting to a SQL Server 2008 cluster back end.
MSDTC is turned on. DB connections are pooled.
My suspicion is that somewhere along the line there is a failed MSDTC transaction, the connection was returned to the pool, and the next query on a different page picked up the misbehaving connection and got this particular error. The funny thing is that we got this error on a query that has no need whatsoever for a distributed transaction (committing to two databases, etc.). We were only doing a SELECT query (no transaction) when we got the error.
We did SQL profiling and the query was run on the SQL Server, but it never came back (since the MSDTC transaction had already been aborted on the connection).
Some other related errors to accompany this are:
New request is not allowed to start because it should come with valid transaction descriptor.
Internal .Net Framework Data Provider error 60.
MSDTC has a default 90-second timeout; if a query's execution exceeds this limit, you will encounter this error when the transaction tries to commit.
A bounty may help get the answer you seek, but you're probably going to get better answers if you give some code samples and a better description of when the error occurs.
Does the error only intermittently occur? It sounds like it from your description.
Are you enclosing the code that you want treated as a transaction in a using TransactionScope block, as Microsoft recommends? This should help avoid weird transaction behavior. Recall that a using block makes sure the object is always disposed regardless of exceptions thrown. See here: http://msdn.microsoft.com/en-us/library/ms172152.aspx
If you're using TransactionScope, there is an option, System.TransactionScopeOption.RequiresNew, that tells the framework to always create a new transaction for this block of code:
Using ts As New Transactions.TransactionScope(Transactions.TransactionScopeOption.RequiresNew)
' Do Stuff
End Using
Also, if you suspect that a connection is getting faulted and then put back into the connection pool, the likely solution is to enclose the code that may fault the connection in a Try-Catch block and Dispose of the connection in the Catch block.
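The idea behind that fix can be shown with a toy pool (a language-neutral sketch in Python; in .NET the equivalent is calling Dispose in the Catch block so the faulted connection never re-enters the pool): a connection that faults during a query is marked and discarded instead of being handed to the next caller.

```python
class Pool:
    """Toy connection pool; connections are plain dicts for illustration."""

    def __init__(self):
        self.free = []

    def acquire(self):
        return self.free.pop() if self.free else {"faulted": False}

    def release(self, conn):
        # Only healthy connections go back into the pool; faulted ones are
        # discarded so the next caller cannot pick up a bad connection.
        if not conn["faulted"]:
            self.free.append(conn)

def run_query(pool, query_fn):
    conn = pool.acquire()
    try:
        return query_fn(conn)
    except Exception:
        conn["faulted"] = True  # mark it so release() drops it
        raise
    finally:
        pool.release(conn)
```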
Old question ... but ran into this issue past few days.
Could not find a good answer until now. Just wanted to share what I found out.
My scenario contains multiple sessions being opened by multiple session factories. I had to roll back correctly, then wait and make sure the other transactions were no longer active; it seems that just rolling back one of them will roll back everything.
But after adding a Thread.Sleep() between the rollbacks, the rollback no longer affects the other transactions and completes fine. Subsequent hits that trigger the method don't result in the "New request is not allowed to start because it should come with valid transaction descriptor." error.
https://gist.github.com/josephvano/5766488
I have seen this before, and the cause was exactly what you thought. As Rice suggested, make sure that you are correctly disposing of the DB-related objects to avoid this problem.