I have the following setup:
CosmosDB, one write region (Central US) and one read-only region (West Europe). I need to upload new data nightly.
While uploading test data I see throttled requests to the write region (Central US), but I could not find anything in the Azure portal showing whether throttling also happened in the read-only region.
I tried to find that information in Insights, but it seems there is no grouping/filtering by region.
I tried to create a report in Metrics, with no luck.
So my questions are:
Is it guaranteed that data replication cannot cause request throttling? From reading the thread Can replication cause request throttling? I assume it is not guaranteed.
If the replication process can consume enough RUs to cause throttling, how can I monitor it?
P.S. I tried to use all 5 consistency levels while uploading data.
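To show what I mean by monitoring on the client side, here is a minimal sketch (assuming the Microsoft.Azure.Cosmos .NET SDK; the database, container, and item shape are placeholders) that logs the RU charge of each write and catches 429s during the upload. As far as I can tell, the per-region picture (including the read-only region) would still have to come from the Azure Monitor metrics split by the Region and StatusCode dimensions.

// Minimal sketch, assuming the Microsoft.Azure.Cosmos .NET SDK.
// Database/container names and the item shape are placeholders.
using System;
using System.Net;
using System.Threading.Tasks;
using Microsoft.Azure.Cosmos;

public static class UploadSketch
{
    public static async Task UploadAsync(CosmosClient client, object item, string partitionKey)
    {
        Container container = client.GetContainer("MyDatabase", "MyContainer");
        try
        {
            ItemResponse<object> response =
                await container.CreateItemAsync(item, new PartitionKey(partitionKey));

            // RU charge of this write; useful for spotting when provisioned throughput is close to its limit.
            Console.WriteLine($"RU charge: {response.RequestCharge}");
        }
        catch (CosmosException ex) when (ex.StatusCode == (HttpStatusCode)429)
        {
            // 429 = request rate too large (throttled); RetryAfter says how long to back off.
            Console.WriteLine($"Throttled, retry after {ex.RetryAfter}");
        }
    }
}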
Our application's volume is very high, around 500 million events per day. We are observing some loss of events. Has anyone observed loss of events while consuming with @JmsListener?
There could be many reasons:
some of your messages are producing errors and they are generating rollbacks which you didn't handle carefully
you are not using transactions while you are having connectivity / sync / infrastructure issues
it can be your configuration: not using persistent messaging, caching messages on the client side within the driver, etc.
Whatever it is, it is NOT an issue with the @JmsListener annotation, but with your code, driver, and messaging configuration.
We've been using Visual Studio Load Test to exercise our .NET Framework 4.7.2 telemetry client, where we can set up the load test to post metrics to our RabbitMQ at a rate of about 250 metrics per second. Recently, we've had to migrate our telemetry client to .NET Core and need to run load tests to verify that it can still post metrics at the same rate. Now, Visual Studio Load Test (VSLT) is being deprecated and has no support for .NET Core, so we've had to look to something like NBomber to use in place of VSLT.
As for NBomber, there doesn't seem to be enough documentation or support available; I've tried everything I know and cannot get NBomber to post more than 25 metrics per second. At the same time, I'm seeing 100% CPU usage.
Does anyone have any insight to share? Thanks in advance for your help,
Tien
Turns out, my logic was bad. A senior developer and friend shared some insights with me: I was initializing a telemetry client for each posting of a metric. This was the cause of the high CPU consumption and was keeping me from reaching the performance I was expecting. I'm in the process of re-coding my test(s) so that NBomber can be used to initialize 250 telemetry clients posting a minimum of 1MM metrics within an hour. I ran a fix yesterday that posted 17K metrics within 56 seconds with just 1 telemetry client, or a rate of about 300 RPS. I thought VS LT was awesome, but NBomber is quite impressive.
Cheers to Load Testing with NBomber!!
Tien
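For anyone who hits the same wall, the shape of the fix looks roughly like this (a simplified sketch against NBomber 5.x's C# API; the HTTP endpoint and payload are placeholders, not our real telemetry client): the client is created once, outside the scenario body, and reused across all injected requests.

// Simplified sketch, assuming NBomber 5.x (NBomber.CSharp); endpoint and payload are placeholders.
using System;
using System.Net.Http;
using NBomber.CSharp;

public static class MetricsLoadTest
{
    public static void Main()
    {
        // Create the client ONCE and share it; creating one per posted metric was the CPU killer.
        using var httpClient = new HttpClient();

        var scenario = Scenario.Create("post_metrics", async context =>
        {
            var content = new StringContent("{\"metric\":\"cpu\",\"value\":42}");
            var result = await httpClient.PostAsync("http://localhost:5000/metrics", content);

            return result.IsSuccessStatusCode ? Response.Ok() : Response.Fail();
        })
        .WithLoadSimulations(
            // Inject 250 requests per second for 60 minutes.
            Simulation.Inject(rate: 250,
                              interval: TimeSpan.FromSeconds(1),
                              during: TimeSpan.FromMinutes(60)));

        NBomberRunner
            .RegisterScenarios(scenario)
            .Run();
    }
}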
If a single instance of NBomber is consuming 100% of CPU and not producing the necessary load, you will need to set up another machine and run NBomber in distributed cluster mode.
Why do you need a cluster?
You have reached the point where the capacity of one node is not enough to create the relevant load.
You want to delegate running multiple scenarios to different nodes. For example, you want to test the database by sending read and write queries in parallel. In this case, one node can send inserts and another one can send read queries.
You want to simulate a real production load that requires several nodes to participate. For example, you may have one node that periodically writes data to the Kafka broker and two nodes that constantly read this data from the Redis cache.
Also, it seems that Microsoft recommends using Apache JMeter™, so it might be worth giving it a try. JMeter is capable of sending messages to various MQ implementations and its documentation is more concise; see, for example, Building a JMS Topic Test Plan.
My team and I have been at this for 4 full days now, analyzing every log available to us, Azure Application Insights, you name it, we've analyzed it. And we cannot get to the bottom of this issue.
We have a customer who is integrated with our API to make search calls and they are complaining of intermittent but continual 502.3 Bad Gateway errors.
Here is the flow of our architecture:
All resources are in Azure. The endpoint our customers call is a .NET Framework 4.7 Web App Service in Azure that acts as the stateless handler for all the API calls and responses.
This API app sends the calls to an Azure Service Fabric Cluster - that cluster load balances on the way in and distributes the API calls to our Search Service Application. The Search Service Application then generates an ElasticSearch query from the API call, and sends that query to our ElasticSearch cluster.
ElasticSearch then sends the results back to Service Fabric, and the process reverses from there until the results are sent back to the customer from the API endpoint.
What may separate our process from a typical API is that our response payload can be relatively large, depending on the search. On average these last several days, the payload of a single response can be anywhere from 6MB to 12MB. Our searches simply return a lot of data from ElasticSearch. In any case, a normal search is typically executed and returned in 15 seconds or less. As of right now, we have already increased our timeout window to 5 minutes just to try to handle what is happening and reduce timeout errors, given that their searches are taking so long. We increased the timeout via the following code in Startup.cs:
services.AddSingleton<HttpClient>(s => {
    return new HttpClient() { Timeout = TimeSpan.FromSeconds(300) };
});
I've read in some places that you actually have to set this in the web.config file instead of here, or at least in addition to it. I'm not sure if this is true.
So, the customer who is getting the 502.3 errors has significantly increased the volumes they are sending us over the last week, but we believe we are fully scaled to handle it. They are still trying to put the issue on us, but after many days of research, I'm starting to wonder if the problem is actually on their side. Could it be that they are not equipped to take the increased payload on their side? Could it be that their integration architecture is not scaled enough to take the return payload from the increased volumes? When we observe our resource usage (CPU/RAM/IO) on all of the above applications, it is all normal, all below 50%. This also makes me wonder if the issue is on their side.
I know it's a bit of a subjective question, but I'm hoping for some insight from someone who may have experienced this before, and even more importantly, from someone who has experience with a .NET API app in Azure which returns large datasets in its responses.
Any code blocks of our API app, or screenshots from Application Insights are available to post upon request - just not sure what exactly anyone would want to see yet as I type this.
Trying to work out why some of my application servers have crept up to over 1s response times, using New Relic. We're using WebApi 2.0 and MVC5.
As you can see below, the bulk of the time is spent under 'WebTransaction'. The throughput figures aren't particularly high - what could be causing this, and what steps can I take to reduce it?
Thanks
EDIT I added transactional tracing to this function to get some further analysis - see below:
Over 1 second waiting in System.Web.HttpApplication.BeginRequest().
Any insight into this would be appreciated.
Ok - I have now solved the issue.
Cause
One of my logging handlers, which syncs its data to cloud storage, was initializing every time it was instantiated, which also involved a call to Azure Table Storage. As it was injected into the controller in question, every call to the API triggered this initialization.
It was a blocking call, so it added ~1s to every request. Once I configured this initialization to be server-lifecycle wide, the extra second went away.
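The change amounted to registering the handler with a container-wide lifetime rather than letting Unity build a fresh one per resolve. Roughly like this (a sketch with placeholder type names and Unity 5 namespaces, not our actual code):

// Sketch only; ILogSync / AzureTableLogSync stand in for the real logging handler.
using Unity;
using Unity.Lifetime;

public interface ILogSync { /* placeholder for the real handler's interface */ }
public class AzureTableLogSync : ILogSync { /* expensive Azure Table Storage init in its constructor */ }

public static class UnityConfig
{
    public static IUnityContainer RegisterTypes(IUnityContainer container)
    {
        // ContainerControlledLifetimeManager = one instance for the container's lifetime,
        // so the costly initialization happens once instead of on every request.
        container.RegisterType<ILogSync, AzureTableLogSync>(
            new ContainerControlledLifetimeManager());

        return container;
    }
}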
Observations
As the blocking call was made at the time the controller was being built (due to Unity resolving the dependencies at that point), New Relic reports this as
System.Web.HttpApplication.BeginRequest()
Although I would love to see this at a more granular level, as we can see from the transactional trace above, it was in fact the 7 calls to table storage (still not quite sure why it was 7) that led me down this path.
Nice tool - my New Relic subscription is starting to pay for itself.
It appears that the bulk of the time is being spent in Account.NewSession, but it is difficult to say without drilling down into your data. If you need some more insight into a block of code, you may want to consider adding Custom Instrumentation.
If you would like us to investigate this in more depth, please reach out to us at support.newrelic.com where we will have your account information on hand.
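For what it's worth, custom instrumentation via the agent's attribute API looks roughly like this (a sketch assuming the NewRelic.Agent.Api package is referenced and the .NET agent is installed; the class and method names here are placeholders):

// Sketch; AccountService/NewSession are placeholder names for the code being measured.
using NewRelic.Api.Agent;

public class AccountService
{
    // [Trace] asks the agent to time this method as its own segment inside the
    // surrounding web transaction, so it shows up separately in transaction traces.
    [Trace]
    public void NewSession(string accountId)
    {
        // ... session setup work to be measured ...
    }
}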
I have a situation where a single Oracle system is the data master for two separate CRM systems (PeopleSoft & Siebel). The Oracle system sends CRUD messages to BizTalk for customer data, inventory data, product info, and product pricing. BizTalk formats and forwards the messages on to the PeopleSoft & Siebel web service interfaces for action. After the initial synchronization of the data, ongoing operation has created a situation where the data isn't accurate in the outlying Siebel and PeopleSoft systems despite successful delivery of the data (this is another conversation about what these systems mean when they return a 'Success' to BizTalk).
What do other similar implementations do to reconcile system data in this distributed service-oriented approach? Do they run a periodic dump from all systems for comparison? Are there any other techniques or methodologies for spotting failed updates and ensuring synchronization?
Your thoughts and experiences are appreciated. Thanks!
Additional Info
So why do the systems get out of sync? Whenever a destination system acknowledges to BizTalk that it has received a message, it can mean many things. Sometimes an HTTP 200 means "I've got it and put it in a staging table and I'll commit it in a bit." Sometimes that commit is successful, sometimes it is not, for various data issues. Sometimes the HTTP 200 means "yes, I have received and committed the data." Using HTTP, there can also be issues with ordered delivery. All of these problems could have been solved with a lot of architectural planning up front. It was not done. There are no update/create timestamps to prevent unordered delivery from stepping on data. There is no full round-trip acknowledgement of data commit from the destination systems. All of this adds up to things getting out of sync.
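To make the ordering problem concrete, the kind of guard we never built would look something like the sketch below (hypothetical names, not code from our integration): each message carries a source-system timestamp, and the destination skips any update older than what it already holds.

// Sketch with hypothetical names; a last-write-wins guard using a source-system
// timestamp so out-of-order delivery cannot overwrite newer data.
using System;

public class CustomerRecord
{
    public string CustomerId { get; set; }
    public string Name { get; set; }
    public DateTime SourceModifiedUtc { get; set; }
}

public static class UpdateGuard
{
    // Returns true if the incoming message should be applied to the destination system.
    public static bool ShouldApply(CustomerRecord existing, CustomerRecord incoming)
    {
        if (existing == null)
            return true; // nothing stored yet, accept the create

        // Reject stale messages: an older timestamp means this update was delivered late.
        return incoming.SourceModifiedUtc > existing.SourceModifiedUtc;
    }
}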
(Sorry, this is posted as an answer and not a comment; I'm still working my way up to 50 points.)
Can the data be updated in the other systems or is it essentially read only?
Could you implement some further validation in the BizTalk layer to ensure that updates wouldn't fail because of data issues?
Can you grab any sort of notification that the update failed from the destination systems which would allow you to compensate in the BizTalk layer?
FWIW, in situations like this I have usually ended up with a central data store that contains at least the data keys from the 3 systems and acts as the new golden repository for the data; however, that is usually to compensate for multiple update sources. It also seems we usually end up operating some sort of manual error queue that users must maintain.
As for your idea of batch reconciliation, I have seen that be quite common as a way to compensate for transactional errors, especially in the financial services realm.
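A reconciliation pass of that sort can be as simple as diffing key/timestamp snapshots pulled from each system (hypothetical types below; the extract step would be whatever dump or query each system supports):

// Sketch with hypothetical types; compares key/timestamp snapshots from the master
// system and a destination system to flag records that are missing or stale downstream.
using System;
using System.Collections.Generic;

public static class ReconciliationSketch
{
    // Each dictionary maps record key -> last-modified timestamp, as extracted from a nightly dump.
    public static IEnumerable<string> FindOutOfSyncKeys(
        IDictionary<string, DateTime> master,
        IDictionary<string, DateTime> destination)
    {
        foreach (var entry in master)
        {
            if (!destination.TryGetValue(entry.Key, out var destTimestamp))
            {
                yield return entry.Key; // missing in the destination system
            }
            else if (destTimestamp < entry.Value)
            {
                yield return entry.Key; // destination holds an older version
            }
        }
    }
}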