Trying to work out why some of my application servers have crept up to over 1s response times, using New Relic. We're using WebApi 2.0 and MVC5.
As you can see below, the bulk of the time is spent under 'WebTransaction'. The throughput figures aren't particularly high - what could be causing this, and what steps can I take to reduce it?
Thanks
EDIT I added transactional tracing to this function to get some further analysis - see below:
Over 1 second waiting in System.Web.HttpApplication.BeginRequest().
Any insight into this would be appreciated.
Ok - I have now solved the issue.
Cause
One of my logging handlers, which syncs its data to cloud storage, was initializing every time it was instantiated, which also involved a call to Azure table storage. As it was passed into the controller in question, every call to the API triggered this initialization.
It was a blocking call, so it added ~1s to every call. Once I configured this initialization to be server life-cycle wide, the problem went away.
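For reference, a minimal sketch of what that registration can look like with Unity - the interface and class names here (ICloudLogHandler, CloudLogHandler) are placeholders, not the actual types from my project:

using Microsoft.Practices.Unity;   // or Unity / Unity.Lifetime in newer package versions

// Register the handler with a container-controlled (singleton) lifetime so its
// blocking initialization runs once per application lifetime, not per request.
var container = new UnityContainer();
container.RegisterType<ICloudLogHandler, CloudLogHandler>(
    new ContainerControlledLifetimeManager());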
Observations
As the blocking call was made at the time the controller was being built (due to Unity resolving the dependencies at this point), New Relic reports this as
System.Web.HttpApplication.BeginRequest()
Although I would love to see this a little more granular, as we can see from the transaction trace above it was in fact the 7 calls to table storage (still not quite sure why it was 7) that led me down this path.
Nice tool - my New Relic subscription is starting to pay for itself.
It appears that the bulk of the time is being spent in Account.NewSession, but it is difficult to say without drilling down into your data. If you need more insight into a block of code, you may want to consider adding Custom Instrumentation.
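As a rough sketch (assuming the NewRelic.Api.Agent NuGet package is referenced; AccountService, NewSession and Session here just mirror the method named in the trace and are placeholders), custom instrumentation can be as simple as an attribute on the method you want broken out as its own segment:

using NewRelic.Api.Agent;

public class AccountService
{
    // The [Trace] attribute tells the New Relic .NET agent to record this
    // method as its own segment within the surrounding web transaction.
    [Trace]
    public Session NewSession(string userId)
    {
        // ... existing session creation logic ...
    }
}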
If you would like us to investigate this in more depth, please reach out to us at support.newrelic.com where we will have your account information on hand.
I want to use the forge viewer as a preview tool in my web app for generated data.
The problem I have is that the Model Derivative API is sometimes slow, sometimes fast.
I read that this happens because the files are placed in a queue and processed sequentially.
In my opinion, this can be solved by:
Having the extraction.update webhook also tell me where I am in the queue, so I can inform my users with better progress information. Or, when the queue is too long, I can stop the process.
Being able to have a private queue. I have no problem paying more credits if necessary.
Being able to generate SVF2 files on my own server.
But I don't know if any of these options are possible. Or if there is another workaround.
Yes, that could be useful. I logged that request in our system: DERI-7940
Might be considered later on, but no plans currently
I'm not aware of any plans for that
We're always working on making the translation service better, but unfortunately, I cannot tell when it will meet your requirements - including the implementation of the webhook feature you mentioned.
SVF2 is specifically for very large models - is that what you are working with? If not, then I'm quite certain that translating to SVF would be faster.
I am using an Axon tracking event processor. Sometimes events take longer than 10 seconds to process.
This seems to cause the message to be processed again, and this appears in the log: "Releasing claim of token X/0 failed. It was owned by another node."
If I up the number of segments it does not log this, BUT the event is still processed twice, so I think this might be misleading. (I think I was mistaken about this.)
I have tried adjusting the fetchDelay, cleanupDelay and tokenClaimInterval, none of which has fixed this. Is there a property or something that I am missing?
Edit
The scenario taking longer than 10 seconds is making an HTTP request to an external service.
I'm using Axon 4.1.2 with all default configuration via Spring auto-configuration. I cannot see the "Releasing claim on token and preparing for retry in [timeout]s" log.
I was having this issue with a single segment and 2 instances of the application. I realised I hadn't increased the number of segments like I thought I had.
After further investigation I have discovered that adding an additional segment seems to have stopped this. Even if I have, for example, 2 segments and 6 applications it still doesn't reappear; however, I'm not sure how this is different from my original scenario of 1 segment and 2 applications?
I didn't realise it would be possible for multiple threads to grab the same tracking token and process the same event. It sounds like the best action would be to put an idempotency check before the HTTP call?
The "Releasing claim of token [event-processor-name]/[segment-id] failed. It was owned by another node." message can only occur in three scenarios:
You are performing a merge operation of two segments which fails because the given thread doesn't own both segments.
The main event processing loop of the TrackingEventProcessor is stopped, but releasing the token claim fails because the token is already claimed by another thread.
The main event processing loop has caught an Exception, making it retry with an exponential back-off, and it tries to release the claim (which might fail with the given message).
I am guessing it's not option 1 or 2, so that would leave us with option 3. This should also mean you are seeing other WARN level messages, like:
Releasing claim on token and preparing for retry in [timeout]s
Would you be able to share whether that's the case? That way we can pinpoint a little better what the exact problem is you are encountering.
By the way, very likely you have several processes (event handling threads of the TrackingEventProcessor) stealing the TrackingToken from one another. As they're stealing an un-updated token, both (or more) will handle the same event, hence why you see the event handler being invoked twice.
Obviously undesirable behavior and something we should resolve for you. I would like to ask you to provide answers to my comments under the question, as right now I have too little to go on. Let us figure this out, @Dan!
Update
Thanks for updating your question @dan, that's very helpful.
From what you've shared, I am fairly confident that both instances are stealing the token from one another. This does depend though on whether both are using the same database for the token_entry table (although I am assuming they are).
If they are using the same table, then they should "nicely" share their work, unless one of them takes too long. If it takes too long, the token will be claimed by another process. This other process, in this case, is the thread of the TEP of your other application instance. The "claim timeout" defaults to 10 seconds, which also corresponds with the long-running event handling process.
This claimTimeout is adjustable though, by invoking the Builder of the JpaTokenStore/JdbcTokenStore (depending on which you are using / auto-wiring) and calling the JpaTokenStore.Builder#claimTimeout(TemporalAmount) method. And I think this would be required on your end, given the fact that you have a long-running operation.
There are of course different ways of tackling this, like making sure the TEP is only run on a single instance (not really fault tolerant though), or offloading this long-running operation to a scheduled task which is triggered by the event.
But I think we've found the issue at least, so I'd suggest tweaking the claimTimeout and seeing if the problem persists.
Let us know if this resolves the problem on your end, @dan!
Background: I have an ASP.NET Core App and have an API method that takes a file name of a blob that the frontend has uploaded to Azure Blob. It then needs to create a thumbnail version of the blob and return the name of the newly uploaded thumbnail Blob. Sometimes, for exactly the same file size it can take up to 40 seconds to complete. Mostly, it's around 400ms.
Below is the end-to-end from App Insights. I have a few things I don't understand:
1) The request duration is 37.5 s, yet the other operations add up to nowhere near this time
2) Why are there calls to master db? We are using EF6 with multiple contexts
3) The app is using an Azure App Service and SQL Azure. I don't understand why the response time is so inconsistent.
Any help would be much appreciated!
I've noticed multiple times that the first request after an application is deployed to Azure, or after a long period in which no requests were made to the application, takes significantly longer to get a response.
As far as I remember it was related to the start-up time of the site (if you're using an App Service on a Windows-based underlying VM, it still uses IIS as a reverse proxy).
I solved the issue by configuring health checks that occasionally perform requests to the app.
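A minimal sketch of such a health check in ASP.NET Core (the /health path is just an example); an external pinger such as an Application Insights availability test can then call it periodically to keep the site warm, and the App Service "Always On" setting helps too:

using Microsoft.AspNetCore.Builder;
using Microsoft.Extensions.DependencyInjection;

// Startup.cs - expose a lightweight endpoint the pinger can hit.
public void ConfigureServices(IServiceCollection services)
{
    services.AddHealthChecks();                 // register health check services
}

public void Configure(IApplicationBuilder app)
{
    app.UseRouting();
    app.UseEndpoints(endpoints =>
    {
        endpoints.MapHealthChecks("/health");   // responds 200 when the app is up
    });
}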
Also, in addition to Application Insights (which logs information only after the application has started), you can try the tools listed here to see more information.
Hope it helps!
1.
The way the request timeline is displayed gives you only the time-span for the whole request (37.5s) and the individual time-spans for each dependency.
A dependency being another call that reports its run-time to Application Insights.
In your example each call to the database is automatically tracked as a dependency. The code running after each database call is not though.
So e.g. requesting a database entry which takes 200ms and then issuing a Thread.Sleep of 2 seconds and requesting another database entry which takes 300ms would result in a 2 second gap between the two database-call dependencies which will each be listed with 200/300ms respectively.
You can use TelemetryClient.TrackDependency to wrap parts of your own code into its own dependency. This way you will see your own code as an entry on the request timeline.
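For example (a sketch only - the "Thumbnail generation" name, blobName and GenerateThumbnail call are placeholders for your own code):

using System;
using System.Diagnostics;
using Microsoft.ApplicationInsights;

// Wrap a block of your own code in a custom dependency so it shows up as
// its own entry on the request timeline instead of an unexplained gap.
var telemetry = new TelemetryClient();
var started = DateTimeOffset.UtcNow;
var timer = Stopwatch.StartNew();
var success = true;
try
{
    GenerateThumbnail(blobName);   // placeholder for the slow code you want to measure
}
catch
{
    success = false;
    throw;
}
finally
{
    timer.Stop();
    telemetry.TrackDependency("InProc", "Thumbnail generation", blobName,
                              started, timer.Elapsed, success);
}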
2.
Depending on your Entity Framework database initializer, EF will connect to the master db on context creation (e.g. to create the database if it does not exist).
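If the database already exists and you don't need EF to create or migrate it, one way to avoid those master calls is to disable the initializer per context (EF6 sketch; BloggingContext is a placeholder for one of your contexts):

using System.Data.Entity;

// Turn off the database initializer so this context no longer checks for /
// creates the database on first use (which is what touches the master db).
Database.SetInitializer<BloggingContext>(null);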
3.
Try tracking your own code to find out which parts of it are slow. EF has a few performance issues to consider; try to understand the performance caveats of the libraries you use. If your calls are inconsistently slow, it might be an issue with resources being over-utilized or caches being emptied too early (like EF warm vs. cold queries).
I am testing a feature which requires a Firebase database write to happen at midnight everyday. Now it is possible that at this particular time, the client app might not be connected to the internet.
I have been using Firebase with persistence off as that can potentially cause issues of stale data in another feature of mine.
From my observation, if I disconnect the app before the write and keep it this way for a minute or so, Firebase eventually reconnects when I turn on the connectivity again and performs the write.
My main questions are:
Will this behaviour be consistent even if the connectivity is lost for quite a few hours?
Will Firebase timeout?
Since it is inside a forever running service, does it still need persistence to ensure that writes are not lost? (assume that the service does not restart).
If the service does restart, will the writes get lost?
I have some experience with this exact case, and I actually do NOT recommend the use of a background service for managing your Firebase requests. In fact, I wouldn't recommend managing Firebase requests at all (explained later).
Services, even though we can make them run forever, tend to get killed by the system quite a lot actually (unless you set their CPU priority to a higher level, but even then the system still might kill them).
If you make a Firebase write call (of any kind) and your service gets killed, the write will get lost, as you said. Unless you create a sophisticated manager in which you store requests that haven't been committed into your internal storage and load them up each time the service is restarted - but that is very dirty work, considering the fact that the Firebase developers took care of us and made .setPersistenceEnabled(true) :)
I know, you mentioned you don't want to use it, but I STRONGLY advise you to do so. It works like a charm, no services required, and you don't have to worry at all about managing your write requests. Perhaps it would be better to solve the other issue you have in order to make this possible.
To sum up, here's what I would do in your case:
I would call the .setPersistenceEnabled(true) someplace at the beginning (extending the Application class and calling it from onCreate() is recommended)
I would use Android's AlarmManager and register a BroadcastReceiver to receive an alarm at midnight (repetitive or not - you decide)
Inside the BroadcastReceiver, I'd simply call a write function of Firebase and worry about nothing :)
To make sure I covered all of your questions:
will this behaviour be consistent....
No. Case scenario: it's midnight, your service has successfully received the call and is now trying to write into Firebase. If, for example, the user has no connection until 6 AM (just a case scenario), there is a very high chance that the system will kill the service during those 6 hours, and your write will get lost. Flight time, or staying in an area with no internet coverage - both are examples of risky scenarios that could break your app's consistency.
Will Firebase Timeout?
It definitely could, as mentioned. I wouldn't take the risk and ship an 80-90% working app. Use persistence and have a 100% working app :)
I believe I covered the rest of the questions..
Good luck!
I know that similar questions have been asked all over the place, but I'm having trouble finding one that relates directly to what I'm after.
I have a website where a user uploads a data file, and then that file is transformed and imported into SQL. The file could be up to 50 MB in size, and sometimes this process can take 30 minutes or even longer.
I realise I need to palm off the actual work to another process, and poll that process from the web page. I'm wondering what the best approach would be, though. Being a web developer by trade, I'm finding all this new Windows Service stuff a bit confusing, and I just wanted somewhere to start.
So:
Can I do / should I be doing this with a Windows Service? If so, how?
Should I use WCF? If this runs under IIS, will I have problems with aspnet_wp.exe recycling and timing out my process?
clarifications
The data is imported into sql, there's no file distribution taking place.
If there is a failure, it absolutely MUST be reported to the user. The web page will poll every, let's say, 5 seconds, from the time the async task begins, to get the 'status' of the import. Once it's finished, another response will tell the page to stop polling for status updates.
queries on final decision
OK, so as I thought, it seems that a Windows Service is the best idea. As to HOW to get it to work, it seems the 'put the file there and wait for the service to pick it up' idea is the generally accepted way. Is there a way I can start a process run by the service without it having to constantly check a database table / folder? As I said earlier, I don't have any experience with Windows Services - I wondered, if I put a public method in the service, can I call it somehow?
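For reference, a rough sketch of how a service can react to new files without polling, using a FileSystemWatcher (the folder path and the ImportFile method are placeholders):

using System.IO;

// Inside the service's OnStart: watch an "incoming" folder and kick off the
// import when a new file arrives, instead of polling the folder on a timer.
var watcher = new FileSystemWatcher(@"C:\Uploads\Incoming", "*.csv");
watcher.Created += (sender, e) => ImportFile(e.FullPath);   // placeholder import call
watcher.EnableRaisingEvents = true;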
well ...
var thread = new Thread(() => {
    // your action (e.g. the transform + SQL import)
});
thread.Start();   // fire-and-forget: runs on a background thread of the web process
but you will have problems with that:
what if the import to SQL fails? should there be any response to the client?
if it fails, how do you ensure the file is still available for a later request?
what if the application shuts down? this newly created and started thread will be killed as well
...
it's not always a good idea to store everything in sql (especially files...). if you want to make the file available to several servers why not distribute them via ftp ...?
i believe that your whole concept is a bit messed up (sry assuming this), and it might be helpful if you elaborate and give us more information about your intentions!
edit:
Can I do / should I be doing this with a Windows Service? If so, how?
you can :) i advise you to create a simple console program and convert it with srvany and sc. you can get a rough overview of how to do it here (note: insert blanks after = ... that's a silly pitfall)
the term should is relative, because you did not answer the most important question
what if a record is persisted to the database, telling a consumer that file test.img should be persisted, but your service hasn't captured it or did not transform it yet?
so ... next on
Should I use WCF? If this runs under IIS, will I have problems with aspnet_wp.exe recycling and timing out my process?
you could probably create a WCF service which receives some binary data and then stores it to a database. this request could be async, yes. but what for?
once again:
please give us more insight into your workflow: what exactly are you trying to achieve? which "environmental conditions" do you have (e.g. app A polls the db and expects file-records which are referenced in table x to be persisted)...
edit:
so you want to import a .csv-file. well that changes everything :)
but i wouldn't advise you to use a wcf-service (there could be a use for it: e.g. a wcf-service which has a method to insert a single row, but then your iteration through the file would be implemented in another app... not that good, though).
i would suggest the following:
at first do everything in your webapp (as you've already done), but use some sort of bulk insert and do your transformation/logic on the database (see the SqlBulkCopy sketch below this list).
if you then have some sort of bottleneck, i would suggest something like a minor job-service, e.g.:
the webapp uploads the file and inserts a row into a job-table. the job-service is continuously polling the table / or gets informed via wcf by the webapp (hey, hey, finally some sort of usage for WCF in your scenario... :) ) and then does the import job, writing a finish-note to a table / or setting the state of the job to finished ...
but this is a bit overkill :)
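a rough sketch of the bulk-insert idea with SqlBulkCopy (the connection string, the staging table name and the dataTable holding your parsed CSV rows are all placeholders):

using System.Data;
using System.Data.SqlClient;

// bulk-insert the parsed file into a staging table, then run the
// transformation/logic as SQL against that table.
using (var connection = new SqlConnection(connectionString))
{
    connection.Open();
    using (var bulkCopy = new SqlBulkCopy(connection))
    {
        bulkCopy.DestinationTableName = "dbo.ImportStaging";   // placeholder staging table
        bulkCopy.BatchSize = 5000;
        bulkCopy.WriteToServer(dataTable);                      // DataTable with the csv rows
    }
}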
Please see if my comments below help you to resolve your issue:
• Can I do / should I be doing this with a Windows Service? If so, how?
Yes, you can do this with a Windows Service, and I think that is the way you should be doing it. You can implement your own service to process your requests, or you can use the open source Job Processor code.
Basically the idea is:
You submit a request for processing the CSV file in a database table, with a status of "Not started".
Then your Windows Service picks up the requests from the database table which are not started and updates them to an "In progress" status.
Once the processing is complete, successfully or unsuccessfully, your service updates the database table with a status of "Completed" / "Failed".
And your ASP.NET page can poll the database table for the current status every 5 seconds or so.
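A very small sketch of what the service's polling loop could look like (stopRequested, GetNextPendingJob, ProcessCsv and SetStatus are placeholders for your own data access and import code):

using System;
using System.Threading;

// Worker loop inside the Windows Service: claim "Not started" jobs, process
// them, and record the outcome so the web page's polling can report status.
while (!stopRequested)
{
    var job = GetNextPendingJob();                 // e.g. SELECT ... WHERE Status = 'Not started'
    if (job != null)
    {
        SetStatus(job.Id, "In progress");
        try
        {
            ProcessCsv(job.FilePath);              // the actual transform + import
            SetStatus(job.Id, "Completed");
        }
        catch (Exception ex)
        {
            SetStatus(job.Id, "Failed", ex.Message);   // surfaced to the user by the web page
        }
    }
    else
    {
        Thread.Sleep(TimeSpan.FromSeconds(5));     // nothing pending - wait before polling again
    }
}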
•Should I use WCF? If this runs under IIS, will I have problems with aspnet_wp.exe recycling and timing out my process?
You should not be using WCF for this purpose.