Maximum number of concurrent HTTP connections in a Google Cloud Function

I must call an external API a certain number of times from a Google Cloud Function, and I must wait for all the results before responding to the client. As I want to respond as quickly as I can, I want to make these calls asynchronously. But since I can end up making many calls (let's say 250, or 1000+), I'm wondering if there is a limit (there certainly is one...). I looked for the answer online, but everything I found is about calling a Cloud Function concurrently, which is not my problem here. I found some information about Node.js, but nothing related to Cloud Functions.
I'm using Firebase.
I would also like to know if there is an easy way in Cloud Functions to use the maximum number of concurrent connections and queue the rest of the calls?
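For the second half of the question (using up to N connections and queueing the rest), here is a minimal worker-pool sketch, assuming Node 18+ (global fetch); `EXTERNAL_API` and the limit of 100 are placeholders, not recommended values:

```ts
// Cap the number of in-flight calls at `limit`; the remaining calls
// effectively queue in memory until a worker is free to claim them.
const EXTERNAL_API = "https://api.example.com/items/"; // hypothetical endpoint

async function fetchAllLimited(ids: string[], limit = 100): Promise<Response[]> {
  const results: Response[] = new Array(ids.length);
  let next = 0;

  // Each worker repeatedly claims the next index until none are left.
  async function worker(): Promise<void> {
    while (next < ids.length) {
      const i = next++; // safe: no await between the check and the claim
      results[i] = await fetch(EXTERNAL_API + ids[i]);
    }
  }

  await Promise.all(Array.from({ length: Math.min(limit, ids.length) }, worker));
  return results;
}
```

With `limit = 100`, a batch of 1000 calls runs with at most 100 in flight at any moment, and `fetchAllLimited` resolves only once every call has finished, so the function can then respond to the client.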

On Cloud Functions, each request is processed by one instance of the function, and an instance can process only one request at a time (no concurrent request processing: 2 concurrent requests create 2 Cloud Function instances).
Alternatively, with Cloud Run, you can process up to 80 requests at the same time on the same instance. So, for the same number of concurrent requests, you have fewer instances (up to 80 times fewer), and because you pay for the CPU and the memory, you will pay less with Cloud Run. I wrote an (old) article on this.
The limit on the number of instances of a single Cloud Function has been removed (previously, it was 1,000). So you have no limit on scalability (even if there is a physical limit, when the region doesn't have enough resources to create a new instance of your function).
About the queue: there is not really a queue. A request is held for a few seconds (about 10) waiting for a new instance to be created, or for an instance to become free (one that has just finished processing another request). After about 10 seconds, a 429 HTTP error code is returned.
Regarding concurrent requests on Cloud Functions, I tested with up to 25,000 simultaneous requests and it worked without issue.
However, you are limited by the function's capacity (only 1 CPU, so concurrency is limited) and by its memory (boosting the memory also boosts the CPU speed and lets you handle more concurrent requests; I got an out-of-memory error in a test with 256 MB and 2,500 concurrent requests).
I performed the tests in Go.
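Since the asker is on Firebase, the memory (and with it CPU) can be raised per function. A minimal sketch using the first-generation firebase-functions API; the function name and values are illustrative only:

```ts
import * as functions from "firebase-functions";

// More memory also means a faster CPU for this function, so it can fan out
// more concurrent outbound calls before hitting resource limits.
export const fanOut = functions
  .runWith({ memory: "1GB", timeoutSeconds: 300 })
  .https.onRequest(async (req, res) => {
    // ...issue the outbound API calls here, e.g. with fetchAllLimited above...
    res.status(200).send("done");
  });
```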

Related

Facing latency issue in Azure EventHub consumer

How can I avoid latency in EventHub consumer data?
My architecture (data flow): IoT Hub -> Event Hub -> Blob Storage (no deviation from the IoT Hub packet to the Blob Storage JSON packet).
Deviation occurs only on the consumer application side (the listener receives with a delay of 30-50 seconds).
Azure configuration: 4 partitions with a standard S2 tier subscription.
Publisher: 3,000 packets per minute.
My question: Blob Storage has the proper data without deviation, so why is the listener receiving with latency? How can I overcome this?
I tried EventProcessorClient with the respective handlers, as suggested in the GitHub sample code. It works fine without errors, but with huge latency. I tried EventHubProducerClient as well; still the same latency issue.
I can't speak to how IoT Hub manages data internally, or what its expected latency is between IoT data being received and IoT Hub itself publishing to Event Hubs.
With respect to Event Hubs, you should expect to see individual events with varying degrees of end-to-end latency. Event Hubs is optimized for throughput (the number of events that flow through the system) and not for the latency of an individual event (the amount of time it takes for it to flow from publisher to consumer).
What I'd suggest monitoring is the backlog of events available to be read in a partition. If there are ample events already available in the partition and you’re not seeing them flow consistently through the processor as fast as you’re able to process them, that’s something we should look to tune.
Additional Event Hubs context
When an event is published - by IoT Hub or another producer - the operation completes when the service acknowledges receipt of the event. At this point, the service has not yet committed the event to a partition, and it is not available to be read. The time that it takes for an event to become available for reading varies and has no SLA associated with it. Most often it's milliseconds, but it can be several seconds in some scenarios - for example, if a partition is moving between nodes.
Another thing to keep in mind is that networks are inherently unreliable. The Event Hubs consumer types, including EventProcessorClient, are resilient to intermittent failures and will retry or recover, which will sometimes entail creating a new connection, opening a new link, performing authorization, and positioning the reader. This is also the case when scaling up/down and partition ownership is moving around. That process may take a bit of time and varies depending on the environment.
Finally, it's important to note that overall throughput is also limited by the time that it takes for you to process events. For a given partition, your handler is invoked and the processor will wait for it to complete before it sends any more events for that partition. If it takes 30 seconds for your application to process an event, that partition will only see 2 events per minute flow through.
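To keep per-partition throughput up, keep the handler itself fast and hand the real work off elsewhere. A minimal sketch with the JavaScript SDK's EventHubConsumerClient (that SDK's counterpart of EventProcessorClient); the connection string and hub name are placeholders, and the worker that drains the queue is not shown:

```ts
import { EventHubConsumerClient } from "@azure/event-hubs";

const client = new EventHubConsumerClient(
  EventHubConsumerClient.defaultConsumerGroupName,
  "<connection-string>", // placeholder
  "<event-hub-name>",    // placeholder
);

const workQueue: unknown[] = []; // drained by separate workers, not shown

client.subscribe({
  // Return quickly: the processor will not deliver more events for this
  // partition until this handler resolves.
  processEvents: async (events, context) => {
    for (const event of events) workQueue.push(event.body);
  },
  processError: async (err, context) => {
    console.error(`error on partition ${context.partitionId}:`, err);
  },
});
```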

Queue system recommendation approach

We have a bus reservation system running in GKE, in which we handle the creation of reservations with different threads. Because of that, CRUD Java methods can sometimes run simultaneously against the same bus, resulting in only the LAST of the simultaneous updates being saved to our DB (the other simultaneous updates are lost).
Even if the probability is low (the simultaneous updates need to be really close, within 1-2 seconds), we need to avoid this. My question is about how to address it:
Lock the bus object and return error to the other simultaneous requests
An in-memory map or a Redis cache to track the bus requests
Use GCP Pub/Sub, Kafka or RabbitMQ as a queue system.
Try to focus the efforts on reducing the simultaneous time window (from 1-2 seconds down to milliseconds)
Others?
Also, we are worried that the scalability of GKE request handling may become an issue in the future. If we manage a relatively higher number of buses, will we need to implement a queue system between the client and the server? Or will the GKE load balancer and Ambassador already manage it for us? In case we need a queue system in the future, could it also be used for the collision problem we are facing now?
Last, the reservation requests from the client often take a while. Therefore, we are changing the requests to be handled asynchronously, with a long-polling approach from the client to know the task status. Could we link this solution to the current problem? For example, using the Redis cache or the queue system to know the task status? Or should we try to keep the requests synchronous and focus on reducing the processing time (which may be quite difficult)?
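One low-ceremony variant of the first and fourth options is optimistic locking: add a version column and let the losing update fail instead of silently overwriting. In the Java code described, JPA's @Version annotation provides exactly this; below is a language-neutral sketch of the same idea in TypeScript against a hypothetical "buses" table (table, column, and function names are made up):

```ts
import { Pool } from "pg";

// Hypothetical "buses" table with an integer "version" column; connection
// settings come from the standard PG* environment variables.
const pool = new Pool();

// Returns true if our update won; false means a concurrent update got in
// first, and the caller can re-read the bus and retry, or return an error.
async function updateBusSeats(
  busId: number,
  seats: number,
  expectedVersion: number,
): Promise<boolean> {
  const result = await pool.query(
    `UPDATE buses
        SET seats = $1, version = version + 1
      WHERE id = $2 AND version = $3`,
    [seats, busId, expectedVersion],
  );
  return result.rowCount === 1; // 0 rows: someone else updated the bus first
}
```

This removes the lost-update window entirely without a distributed lock or a queue, at the cost of occasionally asking a client to retry.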

Maximum concurrent HTTP connections to the same domain between two servers

I have a Tomcat server which connects to another Tableau server. I need to make about 25 GET calls from Tomcat to Tableau. I am trying to thread this and let each thread create its own HTTP connection object and make the call. On my local system (Tomcat local, Tableau remote), I notice that in this case each of my threads takes about 10 seconds on average, so about 10 seconds in total.
However, if I do this sequentially, each request takes 2 seconds, for a total of 50 seconds.
My question is: when making requests in parallel, why does each request take more than 2 seconds, when it takes just 2 seconds done sequentially?
Does this have anything to do with the maximum number of concurrent connections to the same domain from one client (browser)? But here the requests are going from my Tomcat server, not a browser.
If yes, what is the default rule, and is there any way to change it?
In my opinion, it's most likely the context-switching overhead that the system has to go through for each request; that is why you see longer times for individual requests (compared to one sequential thread) but a significant gain in overall processing time.
It makes sense to go for parallel processing when the context-switching overhead is negligible compared to the time taken by the overall activity.
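To see where the time actually goes, it helps to measure both modes the same way. A small sketch, assuming Node 18+ (global fetch) and a placeholder endpoint standing in for the Tableau calls:

```ts
// Compare sequential vs parallel wall-clock time for N GET calls.
const TARGET_URL = "https://tableau.example.com/api/report"; // hypothetical
const CALLS = 25;

async function timedGet(): Promise<number> {
  const start = Date.now();
  await fetch(TARGET_URL);
  return Date.now() - start;
}

async function main() {
  // Sequential: total time is roughly the sum of the individual latencies.
  const seqStart = Date.now();
  for (let i = 0; i < CALLS; i++) await timedGet();
  console.log(`sequential total: ${Date.now() - seqStart} ms`);

  // Parallel: total time is roughly the slowest single call, but per-request
  // latency often rises (server-side throttling, connection limits, switching).
  const parStart = Date.now();
  const latencies = await Promise.all(Array.from({ length: CALLS }, timedGet));
  console.log(`parallel total: ${Date.now() - parStart} ms,` +
    ` max single: ${Math.max(...latencies)} ms`);
}

main().catch(console.error);
```

If the per-request latency grows roughly in proportion to the number of parallel callers, suspect a server-side concurrency limit rather than client overhead.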

Logging Web API calls MVC 4

I need to log to the database every call to my Web API.
Now of course I don't want to go to my database on every call.
So let's say I have a dictionary or a hash-table object in my cache,
and every 10,000 records I go to the database.
I still don't want every 10,000th user to have to wait for this operation.
And I can't start a different thread for long operations, since the application pool
can be recycled at basically any time.
What is the best solution for this scenario?
Thanks
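For reference, the threshold-flush idea from the question looks roughly like this; `saveBatch` is a hypothetical bulk-insert call, and as the answer below notes, anything still buffered is lost if the process recycles:

```ts
// Buffer log entries in memory and write them to the DB in one batch.
interface LogEntry {
  path: string;
  at: Date;
}

const FLUSH_THRESHOLD = 10_000;
let buffer: LogEntry[] = [];

async function saveBatch(entries: LogEntry[]): Promise<void> {
  // hypothetical bulk insert, e.g. a single multi-row INSERT
}

function logCall(entry: LogEntry): void {
  buffer.push(entry);
  if (buffer.length >= FLUSH_THRESHOLD) {
    const batch = buffer;
    buffer = []; // swap first so the caller never waits on the write
    void saveBatch(batch).catch(console.error); // fire-and-forget flush
  }
}
```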
I would argue that your view of durability is rather inconsistent. Your cache of 10,000 objects could also be lost at any time due to an app pool recycle or a server crash.
But to the original question of how to perform a large operation without causing the user to wait:
Put constraints on app pool recycling and deal with the potential data loss.
Periodically dump the cached messages to a Windows service for further processing. This is still not 100% guaranteed to preserve data, e.g. the service/server could crash.
Use a message queue (MSMQ), possibly with WCF. A message queue can persist to disk, so this can be considered reasonably reliable.
Message Queuing (MSMQ) technology enables applications running at different times to communicate across heterogeneous networks and systems that may be temporarily offline. Applications send messages to queues and read messages from queues.
Message Queuing provides guaranteed message delivery, efficient routing, security, and priority-based messaging. It can be used to implement solutions to both asynchronous and synchronous scenarios requiring high performance.
Taking this a step further...
Depending on your requirements and/or environment, you could probably eliminate your cache, and write all messages immediately (and rapidly) to a message queue and not worry about performance loss or a large write operation.
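MSMQ is Windows-specific, so as a language-neutral illustration of the same "write every message straight to a durable queue" pattern, here is a sketch using RabbitMQ via amqplib; the queue name, URL, and entry shape are placeholders:

```ts
import * as amqp from "amqplib";

async function main() {
  const conn = await amqp.connect("amqp://localhost");
  const channel = await conn.createChannel();
  await channel.assertQueue("api-log", { durable: true }); // survives broker restarts

  // Enqueueing is fast; a separate consumer drains the queue into the database.
  const entry = { path: "/api/orders", at: new Date().toISOString() };
  channel.sendToQueue("api-log", Buffer.from(JSON.stringify(entry)), {
    persistent: true, // message is written to disk, surviving restarts
  });

  await channel.close();
  await conn.close();
}

main().catch(console.error);
```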

Is ASP.NET multithreaded (how does it execute requests)

This might be a bit of a silly question, but:
If I have two people logging on to my site at exactly the same time, will the server-side code be executed one after the other, or will the requests be executed simultaneously in separate threads?
I'm curious with regard to a denial-of-service attack on a website login. Does the server slow down because it has a massive queue of logins, or is it slow because it has a billion simultaneous logins?
This is not related to ASP.NET per se (I have very little knowledge in that area), but generally web servers. Most web servers use threads (or processes) to handle requests, so basically, whatever snippet of code you have will be executed for both connections in parallel. Of course, if you access a database or some other backend system where a lock is placed, allowing just one session to perform queries, you might have implicitly serialized all requests.
Web servers typically have a minimum and maximum number of workers, which are tuned to the current hardware (CPUs, memory, etc). If these are exhausted, new requests will be queued waiting for a worker to become available, or until a maximum queue length of pending requests has been reached at which point it disregards new connections, effectively denying service (if this is on purpose, it's called a denial of service or DoS attack).
So, in your terms, it's a combination: a huge number of simultaneous requests filling up the queue.
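A toy sketch of the worker/queue model described above: a fixed number of requests run concurrently, a bounded number wait, and anything beyond that is rejected (a 503 in a real server). The names and limits are illustrative, not any server's actual configuration:

```ts
type Task = () => Promise<void>;

const MAX_WORKERS = 4;  // concurrent requests being processed
const MAX_QUEUE = 100;  // pending requests allowed to wait
let active = 0;
const queue: Task[] = [];

function submit(task: Task): boolean {
  if (active < MAX_WORKERS) {
    run(task);
    return true;
  }
  if (queue.length < MAX_QUEUE) {
    queue.push(task); // request waits for a free worker
    return true;
  }
  return false; // queue full: service effectively denied
}

function run(task: Task): void {
  active++;
  task().finally(() => {
    active--;
    const next = queue.shift(); // first-in-first-out, as described above
    if (next) run(next);
  });
}
```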
It should use a thread pool. Note that the requests are still in the same application, so application-level items like static variables are shared between them.
From this article:
"Remember ISAPI is multi-threaded so requests will come in on multiple threads through the reference that was returned by ApplicationDomainFactory.Create(). Listing 1 shows the disassembled code from the IsapiRuntime.ProcessRequest method that receives an ISAPI ecb object and server type as parameters. The method is thread safe, so multiple ISAPI threads can safely call this single returned object instance simultaneously."
So yes, in the case of a DoS attack, it would be slow because of the large number of connections.
As others said, most web servers use multiple processes or threads (better) to serve multiple requests at a time. In particular, you can configure each ASP.NET application pool with a maximum number of queued requests and a maximum number of worker processes. Each process has multiple threads, up to a maximum (not configurable, AFAIK; I may be wrong), and incoming requests are processed on a first-in-first-out basis.
Moreover, ASP.NET processes only one request at a time for each session - but a malicious user can open as many sessions as she wants.
Multiple logins will probably hit the database and bring it to its knees before the web server itself.
As far as I know, there is no built-in way to throttle ASP.NET requests other than setting the maximum number of queued requests (waiting to be processed). This number should ideally be very small. You can monitor the number of queued ASP.NET requests using performance counters. Say you find that, at peak traffic, this number is 100. You could then update the application so that it refuses login attempts when this number is above 100, so that the database is not hit (I've never done that, just a thought).
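A language-neutral sketch of that throttling idea, with Express standing in for ASP.NET purely for illustration; the threshold of 100 echoes the hypothetical measurement above, and `authenticate` is a made-up stand-in for the real login check:

```ts
import express from "express";

const MAX_IN_FLIGHT = 100; // illustrative, from the measurement above
let inFlight = 0;

const app = express();
app.use(express.json());

app.post("/login", (req, res) => {
  if (inFlight >= MAX_IN_FLIGHT) {
    // Shed load up front so the database never sees the request.
    res.status(503).send("Server busy, try again later");
    return;
  }
  inFlight++;
  authenticate(req.body)
    .then((ok) => res.sendStatus(ok ? 200 : 401))
    .catch(() => res.sendStatus(500))
    .finally(() => inFlight--);
});

// Hypothetical authentication routine standing in for the real login check.
async function authenticate(credentials: unknown): Promise<boolean> {
  return false;
}

app.listen(8080);
```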