We have built a microservice architecture but have run into an issue where the messages going onto the bus are too large. (We discovered this after moving to Azure Service Bus, which only allows 256KB per message, compared to RabbitMQ's 4MB.)
We have a design as shown in the diagram below. Where we're struggling is with the data being returned.
An example is when performing a search and returning multiple results.
To step through our current process:
The web client sends an HTTP request to the Web API.
The Web API then puts the appropriate message onto the bus (and responds to the client with an Accepted response).
Microservice picks up this message.
Microservice queries its database for the records matching search criteria.
Results returned from database.
A SearchResult message is added to the bus. (This contains the results)
Our response microservice is listening for this SearchResult message.
The response microservice then posts to our SignalR API.
The SignalR API sends the results back to the web client.
My question is: how do we deal with large result sets when the system is designed this way? If it's not possible, how should the design be changed to handle large result sets?
I understand we could page the results, but even so a single result could be over the 256KB allowance, for example a document or a particularly large object.
There are two ways:
Use a Kafka-like system that supports large message sizes.
If you can't go with the first approach (which appears to be the case from your question), then the microservice can place two types of messages for the response service:
(1.) If the result is small, place the complete message on the bus.
(2.) If the result is larger than the supported size, place a message containing a link to an Azure Storage blob that holds the result.
Based on the message type, the response service can retrieve the proper result and return it to the client.
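The two message types above amount to the "claim check" pattern. A minimal sketch, assuming a 256KB limit and a hypothetical `blobStore` object with `upload`/`download` methods (in a real system that would wrap Azure Blob Storage, e.g. via `@azure/storage-blob`; the function and type names here are made up for illustration):

```javascript
// Hypothetical sketch of the claim-check decision described above.
const MAX_MESSAGE_BYTES = 256 * 1024;

async function buildBusMessage(searchResult, blobStore) {
  const payload = JSON.stringify(searchResult);
  if (Buffer.byteLength(payload, "utf8") <= MAX_MESSAGE_BYTES) {
    // (1.) Small enough: place the complete result on the bus.
    return { type: "SearchResult", body: searchResult };
  }
  // (2.) Too large: store the result externally, send only a reference.
  const blobUrl = await blobStore.upload(payload);
  return { type: "SearchResultReference", blobUrl };
}

async function readBusMessage(message, blobStore) {
  if (message.type === "SearchResult") return message.body;
  // Follow the claim check to fetch the full result before posting to SignalR.
  return JSON.parse(await blobStore.download(message.blobUrl));
}
```

The response service never needs to know which path was taken up front; it just branches on the message type, which keeps every bus message well under the broker's limit.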
My team and I have been at this for four full days now, analyzing every log available to us, Azure Application Insights, you name it, we've analyzed it. And we cannot get down to the cause of this issue.
We have a customer who is integrated with our API to make search calls and they are complaining of intermittent but continual 502.3 Bad Gateway errors.
Here is the flow of our architecture:
All resources are in Azure. The endpoint our customers call is a .NET Framework 4.7 Web App Service in Azure that acts as the stateless handler for all the API calls and responses.
This API app sends the calls to an Azure Service Fabric cluster, which load balances on the way in and distributes the API calls to our Search Service application. The Search Service application then generates an ElasticSearch query from the API call and sends that query to our ElasticSearch cluster.
ElasticSearch then sends the results back to Service Fabric, and the process reverses from there until the results are sent back to the customer from the API endpoint.
What may separate our process from a typical API is that our response payload can be relatively large, depending on the search. On average these last several days, the payload of a single response can be anywhere from 6MB to 12MB. Our searches simply return a lot of data from ElasticSearch. In any case, a normal search is typically executed and returned in 15 seconds or less. As of right now, we have already increased our timeout window to 5 minutes just to try to handle what is happening and reduce timeout errors, given how long their searches are taking. We increased the timeout via the following code in Startup.cs:
services.AddSingleton<HttpClient>(s => {
    return new HttpClient() { Timeout = TimeSpan.FromSeconds(300) };
});
I've read in some places that you actually have to do this in the web.config file as opposed to here, or at least in addition to it. Not sure if this is true?
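For reference, the web.config setting usually being referred to is the `httpRuntime` element; a hedged sketch for a .NET Framework app (the value is illustrative, and `executionTimeout` is only honored when `debug="false"`):

```xml
<configuration>
  <system.web>
    <!-- Server-side execution timeout in seconds (only enforced when debug="false"). -->
    <httpRuntime executionTimeout="300" />
  </system.web>
</configuration>
```

Note this governs the server-side request pipeline, which is separate from the `HttpClient.Timeout` set in code, so the two can legitimately coexist.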
So the customer who is getting the 502.3 errors has significantly increased the volume they are sending us over the last week, but we believe we are fully scaled to handle it. They are still trying to put the issue on us, but after many days of research, I'm starting to wonder if the problem is actually on their side. Could it be that they are not equipped to take the increased payload on their side? Could their integration architecture not be scaled enough to take the return payload from the increased volumes? When we observe our resource usage (CPU/RAM/IO) on all of the above applications, it is all normal, all below 50%. This also makes me wonder if this is on their side.
I know it's a bit of a subjective question, but I'm hoping for some insight from someone who may have experienced this before, and even more importantly, from someone who has experience with a .NET API app in Azure that returns large datasets in its responses.
Any code blocks of our API app, or screenshots from Application Insights are available to post upon request - just not sure what exactly anyone would want to see yet as I type this.
I'm using Node.js and ws for my WebSocket servers and want to know the best-practice methods for tracking connections and incoming and outgoing messages with Azure Application Insights.
It appears as though this service is really only designed for HTTP requests and responses, so would I be fine if I tracked everything as an event? I'm currently passing the JSON.parse'd connection message values.
What to do here really depends on the semantics of your websocket operations. You will have to track these manually since the Application Insights SDK can't infer the semantics to map to Request/Dependency/Event/Trace the same way it can for HTTP. The method names in the API do indeed make this unclear for non-HTTP, but it becomes clearer if you map the methods to the telemetry schema generated and what those item types actually represent.
If you would consider receiving a socket message to semantically begin an "operation" that triggers dependencies in your code, you should use trackRequest to record this information. This will populate the information in the most useful way for you to take advantage of the UI in the Azure Portal (e.g. response time analysis in the Performance blade or failure rate analysis in the Failures blade). Because this request isn't HTTP, you'll have to bend your data to fit the schema a bit. An example:
client.trackRequest({name:"WS Event (low cardinality name)", url:"WS Event (high cardinality name)", duration:309, resultCode:200, success:true});
In this example, the name field indicates that items sharing this name are related and should be grouped in the UI. Use the url field for information that more completely describes the operation (like GET parameters would in HTTP). For example, name might be "SendInstantMessage" and url might be "SendInstantMessage/user:Bob".
In the same way, if you consider sending a socket message to be a request for information from your app, and it has a meaningful impact on how your "operation" acts, you should use trackDependency to record this information. Much like above, doing this will populate the data in the most useful way to take advantage of the Portal UI (the Application Map, in this case, would then be able to show you the % of failed WebSocket calls).
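Mirroring the trackRequest example, a sketch of shaping an outgoing socket message to the dependency telemetry schema (the helper name, target URL, and the "WebSocket" type label are assumptions for illustration, not SDK requirements):

```javascript
// Build a dependency telemetry item for an outgoing WebSocket message.
// Field roles mirror the trackRequest example: name is the low-cardinality
// grouping key, data carries the high-cardinality detail.
function wsDependencyTelemetry(operation, detail, durationMs, ok) {
  return {
    target: "ws://chat.example.com",   // hypothetical socket endpoint
    name: operation,                   // e.g. "SendInstantMessage"
    data: detail,                      // e.g. "SendInstantMessage/user:Bob"
    duration: durationMs,
    resultCode: ok ? 0 : 1,            // no HTTP status exists; pick a convention
    success: ok,
    dependencyTypeName: "WebSocket",   // assumed label, not a reserved value
  };
}

// With the applicationinsights Node SDK this would then be sent as:
// client.trackDependency(wsDependencyTelemetry("SendInstantMessage", "SendInstantMessage/user:Bob", 42, true));
```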
If you find you're using websockets in a way that doesn't really fit into these, tracking as an event as you are now would be the correct use of the API.
I have a .NET web service hosted in IIS. The web service has been used by clients over the past few years and there have been occasional timeout events when the client is on a slow connection (e.g. GPRS). On the other hand, the clients sometimes have to POST some data to another web page (part of an ASP.NET web app), and usually the size of the data in the POST requests is bigger than the actual payloads in the web service calls. However, the POST requests are far quicker compared to the web service calls.
To establish this further, I created a test web service with one method and a single web page with exactly the same operation, i.e. receive 100K and send back 100K (random bytes), and I used a test client to call the web service method as well as POST to the web page and get a response back using the same client. The difference between receiving a reply from the web service and a response from the web POST request is huge, i.e. about 1200 ms. Why is that the case? Is there any configuration on the web service that would make such a big difference? Is it the SOAP call stack? Serialization/deserialization?
A number of factors could be contributing to this.
The first thing that leaps to mind for me is that SOAP could be considered a verbose protocol. That is, there's a LOT of data in the XML payload going both ways. XML is verbose in and of itself, and it's not exactly the fastest thing in the universe to process. Sure, you can use an optimized library to process its data, but it'll be parsed out into object trees, and then you have to walk the nodes to drill down to the data you want. Unless you're using XPath, which will just do the same darned thing.
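As a rough illustration of that envelope overhead, here is a sketch comparing a raw 100K payload with the same payload wrapped in a minimal, made-up SOAP envelope (real WSDL-generated envelopes carry even more namespace and type metadata, and base64-encoding binary data would add roughly another third on top):

```javascript
// Wrap a payload in a minimal (hypothetical) SOAP envelope.
function soapWrap(payload) {
  return `<?xml version="1.0" encoding="utf-8"?>` +
    `<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">` +
    `<soap:Body><EchoResponse><EchoResult>${payload}</EchoResult></EchoResponse>` +
    `</soap:Body></soap:Envelope>`;
}

const raw = "A".repeat(100 * 1024);       // the 100K test payload
const wrapped = soapWrap(raw);
// Fixed per-message envelope cost, before any parsing cost is paid.
const overhead = Buffer.byteLength(wrapped) - Buffer.byteLength(raw);
```

The byte overhead itself is small; the point is that every one of those wrapped bytes must also be XML-parsed into a tree and walked on both ends, which is where the latency tends to accumulate.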
This is all presuming that you're actually using SOAP. And that your WebService is correctly configured. And that no packet loss is occurring while connecting to the Web Service. And that your firewall isn't creating issues. And that there's no encryption/decryption overhead.
In my own experience, one thing that frequently causes significant slowdowns server-side is one or more thrown exceptions. Try a Fiddler trace.
I'm using Mate's RemoteObjectInvoker to call methods in my FluorineFX-based API. However, all requests seem to be sent to the server sequentially. That is, if I dispatch a group of messages at the same time, the 2nd one isn't sent until the first returns. Is there any way to change this behavior? I don't want my app to be unresponsive while a long request is processing.
This thread will help you understand what happens (it talks about BlazeDS/LiveCycle, but I assume Fluorine uses the same approach). In a few words, what happens is:
a) The Flash Player groups all your calls into one HTTP POST.
b) The server (BlazeDS, Fluorine, etc.) receives the request and starts to execute the methods serially, one after another.
Solutions
a) Have one HTTP POST per method, instead of one HTTP POST containing all the AMF messages. For that you can use HTTPChannel instead of AMFChannel (internally it uses flash.net.URLLoader instead of flash.net.NetConnection). You will be limited to the maximum number of parallel connections defined by your browser.
b) Have only one HTTP POST but implement a clever solution on the server (it will cost you a lot of development time). Basically, you can write your own parallel processor and use message consumers/publishers to send the results of your methods to the client.
c) There is a workaround similar to a) described at https://bugs.adobe.com/jira/browse/BLZ-184: create your RemoteObject by hand and append a random id to the end of the endpoint.
I have developed a chat web application which uses a SqlServer database for exchanging messages.
All clients poll every x seconds to check for new messages.
It is obvious that this approach consumes many resources, and I was wondering if there is a "cheaper" way of doing that.
I use the same approach for "presence": checking who is on.
Without using a browser plugin/extension like Flash or a Java applet, the browser is essentially a one-way communication tool. The request has to be initiated by the browser to fetch data; you cannot 'push' data to the browser.
Many web apps use the Ajax polling method to simulate a server 'push'. The trick is to balance the frequency/data size against the bandwidth and server resources.
I just did a simple observation of Gmail. It does an HTTP POST poll every 5 seconds. If there's no 'state' change, the response data size is only a few bytes (not including the HTTP headers). Of course, Google has huge server resources and bandwidth; that's why I mention finding a good balance.
That is, "improving user experience vs. server resources". You might need to come up with a creative polling strategy, instead of a straightforward poll every x seconds.
E.g. if there's no activity from party A, poll every 5 seconds; while party A is typing, poll every 3 seconds. This is just an illustration; you can play around with the numbers, or come up with a more efficient scheme.
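The adaptive strategy above can be sketched as a variable-delay loop instead of a fixed setInterval (the function names and intervals here are illustrative, not prescriptive):

```javascript
// Choose the next poll delay from the conversation state.
function nextPollDelayMs(otherPartyTyping, idleSeconds) {
  if (otherPartyTyping) return 3000;   // a message is likely soon: poll faster
  if (idleSeconds > 60) return 10000;  // long-idle conversation: back off
  return 5000;                         // default interval
}

// Drive the poll loop with a recomputed delay after every check.
function startPolling(checkForMessages, getState) {
  const tick = async () => {
    await checkForMessages();
    const { typing, idleSeconds } = getState();
    setTimeout(tick, nextPollDelayMs(typing, idleSeconds));
  };
  tick();
}
```

The back-off branch is what actually saves server resources: idle conversations, which dominate in a chat app, generate a fraction of the requests a fixed interval would.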
Lastly, the data exchange. The challenge is to find a way to pass minimum data sizes to convey the same info.
my 2 cents :)
For something like a real-time chat app, I'd recommend a distributed cache with a SQL backing. I happen to like memcached with the Enyim .NET provider, so I'd do something like the following:
User posts message
System writes message to database
System writes message to cache
All users poll cache periodically for new messages
The database backing allows you to preload the cache in the event the cache is cleared or the application restarts, but the functional bits rely on in-memory cache, rather than polling the database.
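A minimal sketch of that flow, with an in-memory Map standing in for memcached (the Enyim provider mentioned is .NET; every name here is made up for illustration):

```javascript
const cache = new Map(); // roomId -> array of messages (stand-in for memcached)

function postMessage(db, roomId, message) {
  db.push({ roomId, message });              // 1. durable write to the database
  const list = cache.get(roomId) || [];
  cache.set(roomId, [...list, message]);     // 2. write-through to the cache
}

function pollMessages(roomId, sinceIndex) {
  // 3. clients poll the cache, not the database
  const list = cache.get(roomId) || [];
  return list.slice(sinceIndex);
}

function warmCache(db) {
  // Preload after a cache flush or app restart, as described above.
  cache.clear();
  for (const row of db) {
    const list = cache.get(row.roomId) || [];
    cache.set(row.roomId, [...list, row.message]);
  }
}
```

The polling cost then stays in memory on the hot path, and the database is only touched on writes and on cache rebuilds.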
If you are using SQL Server 2005 you can look at Notification Services. Granted, this would lock you into SQL 2005, as Notification Services was removed in SQL 2008. It was designed to allow SQL Server to notify client applications of changes to the database.
If you want something a little more scalable, you can put a couple of bit flags on the Users record. When a message comes in for the user, set the new-messages bit to 1; when the user reads their messages, set it back to 0. Do the same for when people sign on and off. That way you are reading a very small field that has a damn good chance of already being in cache.
So the workflow would be: read the bit. If it's 1, go get the messages from the message table. If it's 0, do nothing.
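That workflow can be sketched as follows, with in-memory stand-ins for the Users and message tables (all names here are hypothetical):

```javascript
const userFlags = new Map();   // userId -> 0 | 1   (stand-in for the Users bit flag)
const messageTable = [];       // { userId, text }  (stand-in for the message table)

function deliverMessage(userId, text) {
  messageTable.push({ userId, text });
  userFlags.set(userId, 1);    // set the "new messages" bit
}

function checkMessages(userId) {
  if (!userFlags.get(userId)) return [];   // bit is 0: do nothing
  userFlags.set(userId, 0);                // reset the bit on read
  return messageTable.filter(m => m.userId === userId);
}
```

The point of the pattern is that the common case (no new messages) costs only the cheap flag read; the expensive message-table query runs only when the flag says it will return something.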
In ASP.NET 4.0 you can use the Observer pattern with JavaScript objects and arrays, i.e. AJAX JSON calls with jQuery and/or PageMethods.
You are always going to have to hit the database to determine whether there is any data to return or not. The trick will be making those calls small and only returning data when needed.
There are two related solutions built-in to SQL Server 2005 and still available in SQL Server 2008:
1) Service Broker, which allows subscribers to post blocking reads on queues (the RECEIVE command with WAITFOR). In your case you would send your message through the database using Service Broker services fronting these queues, which could then be picked up by the waiting clients. There's no polling; the waiting clients just get activated when a message is received.
2) Query Notifications, which allow a subscriber to define a query and then receive notifications when the dataset that would result from executing that query changes. Built on Service Broker, Query Notifications are somewhat easier to use, but may also be somewhat less efficient. (Note that Query Notifications and their siblings, Event Notifications, are frequently mistaken for Notification Services (NS), which causes concern because NS is decommitted in SQL Server 2008; however, Query and Event Notifications are still fully available and even enhanced in SQL Server 2008.)