I have a Tornado-based game server hosted on Amazon cloud with 1 GB of RAM and a 10 GB hard disk. I have 500 users per day, with 30+ concurrent users at any given time. Users are based around the world, and I host the cloud machine in US West because most of the users are from the USA.
I am facing a network latency issue. With a single user, the response time is 1 second, which is already high, but as the number of users grows to 10+, the response time rises to 2 seconds. For 50+ users it is 8 seconds.
I wrote a test script and ran some tests.
Test 1:
Using the same test script as mentioned above, I tested on my local machine, with the master code and the test script both running locally: latency is less than 1000 ms (90% median 220 ms).
Running the same code on the cloud, with the test script on the same cloud machine: same result.
Running the game server on the cloud and the script on my local machine: latency is 8 seconds.
1. Network I/O
If you're doing network-bound operations in your request handlers (such as connecting to a database, or sending requests to another API), then await those tasks so that Python can pause the coroutine and process other requests asynchronously.
2. CPU tasks and Disk I/O
If you're doing CPU-bound operations (any code other than network I/O that takes more than a few milliseconds to run), then use IOLoop.run_in_executor to run those tasks in a separate thread. This frees up the main thread so that the event loop can serve other requests.
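The two points above can be sketched roughly as follows. This is a minimal illustration, not the asker's code: fetch_player_state and render_leaderboard are hypothetical stand-ins, and plain asyncio is used to keep the sketch dependency-free (Tornado 5 and later run on the asyncio event loop; inside a Tornado handler the corresponding call would be IOLoop.current().run_in_executor).

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor


# Hypothetical stand-in for a network call (database query, remote API).
async def fetch_player_state(player_id):
    await asyncio.sleep(0.05)  # awaiting hands control back to the event loop
    return {"player": player_id, "score": player_id * 10}


# Hypothetical stand-in for blocking work (disk I/O, a blocking library call).
def render_leaderboard(state):
    time.sleep(0.05)  # blocks only the worker thread it runs on
    return "player {player}: {score}".format(**state)


async def handle_request(player_id, executor):
    state = await fetch_player_state(player_id)
    loop = asyncio.get_running_loop()
    # Offload the blocking call so the event loop keeps serving other requests.
    return await loop.run_in_executor(executor, render_leaderboard, state)


async def serve_many(n_clients):
    with ThreadPoolExecutor(max_workers=n_clients) as executor:
        tasks = [handle_request(i, executor) for i in range(n_clients)]
        return await asyncio.gather(*tasks)


if __name__ == "__main__":
    start = time.perf_counter()
    results = asyncio.run(serve_many(10))
    print(results[3], "in %.2fs" % (time.perf_counter() - start))
```

Because every wait is either awaited or offloaded, the ten simulated requests overlap instead of running one after another. For heavy pure-Python computation, a thread may not be enough because of the GIL; a ProcessPoolExecutor can be passed to run_in_executor instead.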
Tornado is a single-threaded server. That means it can run only one thing at a given time. So, if a single response takes 220 ms of CPU time and you have 10 connected clients, it would take over 2 seconds to serve the 10th client. This time keeps increasing (though not always proportionally, as Python may reuse CPU cache) as the number of clients goes up.
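The arithmetic behind that estimate, as a small sketch. It assumes every response costs the same 220 ms of CPU time and that clients are served strictly one after another; completion_times_ms is a name made up for this illustration.

```python
def completion_times_ms(cpu_ms_per_response, clients):
    """Finish time of each client's response when one thread serves them in turn."""
    return [cpu_ms_per_response * position for position in range(1, clients + 1)]


times = completion_times_ms(220, 10)
print(times[0])   # 220: the first client only waits for its own response
print(times[-1])  # 2200: the 10th client waits for all ten responses
```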
It appears the CPU on the cloud server is not as fast as your personal CPU, hence the increased latency.
I want to create a load test for a feature of my app. It uses Google App Engine and a VM. The user sends HTTP requests to App Engine. It's realistic that App Engine gets thousands of requests in a few seconds, so I want to create a load test where I send 20,000-50,000 requests within 1-10 seconds.
How would you solve this problem?
I started by trying Google Cloud Tasks, because it seems perfect for this: you schedule HTTP requests for a specific point in time. The docs say that there is a limit of 500 tasks per second per queue, and that if you need more tasks per second you can split the tasks across multiple queues. I did this, but Google Cloud Tasks does not execute all the scheduled tasks at the given time. One queue needs 2-5 minutes to execute 500 requests that are all scheduled for the same second.
I also tried a TypeScript script running asynchronous node-fetch requests, but 5,000 requests take 77 seconds on my MacBook.
I don't think you can get 50,000 HTTP requests "in a few seconds" from "your MacBook". It's better to go for a dedicated load testing tool, which can be deployed onto a GCP virtual machine in order to minimize network latency and traffic costs.
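To put rough numbers on that gap, here is a small calculation using only the figures from the question (5,000 requests in 77 seconds measured; 20,000-50,000 requests in 1-10 seconds wanted). The helper name required_rate is made up for this sketch.

```python
def required_rate(total_requests, window_seconds):
    """Average requests per second needed to send total_requests within the window."""
    return total_requests / window_seconds


measured = required_rate(5000, 77)   # what the laptop script achieved
easiest = required_rate(20000, 10)   # most lenient target from the question
hardest = required_rate(50000, 1)    # strictest target from the question

print(round(measured, 1))            # 64.9 requests/s
print(round(easiest / measured, 1))  # 30.8 times more throughput needed
print(round(hardest / measured))     # 770 times more throughput needed
```

Even the most lenient target needs roughly 30 times the measured throughput, which is why a more powerful machine or several machines working together come into play.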
The tool choice is up to you. Either you need a machine type powerful enough to issue 50k requests "in a few seconds" from a single virtual machine, or the tool needs a clustered mode so that you can start several machines that send the requests together at the same moment.
Given that you mention TypeScript, you might want to try the k6 tool (it doesn't scale, though), or check out "Open Source Load Testing Tools: Which One Should You Use?" to see what the other options are. None of them provides a JavaScript API; however, several don't require any programming knowledge at all.
A tool you could consider using is siege.
It is Linux-based, and to avoid the additional cost of testing from a system outside GCP, you could deploy siege on a relatively large machine or on a few machines inside GCP.
It is fairly simple to set up, but since you mention that you need 20-50k requests in a span of a few seconds, note that siege by default only allows 255 concurrent simulated users. You can raise this limit, though, so it can fit your needs.
You would need to play around on how many connections a machine can establish, since each machine will have a certain limit based on CPU, Memory and number of network sockets. You could just increase the -c number, until the machine gives an "Error: system resources exhausted" error or something similar. Experiment with what your virtual machine on GCP can handle.
I must call an external API a certain number of times from a Google Cloud Function, and I must wait for all the results before responding to the client. As I want to respond as quickly as I can, I want to make these calls async. But since there can be many calls (let's say 250, or 1000+), I'm wondering if there is a limit (there certainly is one). I looked for the answer online, but everything I found is about calling a Cloud Function concurrently, which is not my problem here. I found some information about Node.js, but nothing related to Cloud Functions.
I'm using Firebase.
I would also like to know if there is an easy way in a Cloud Function to use the maximum number of concurrent connections and queue the rest of the calls.
On Cloud Functions, each request is processed by one instance of the function, and an instance can process only one request at a time (there is no concurrent request processing: 2 concurrent requests create 2 Cloud Functions instances).
Alternatively, with Cloud Run, you can process up to 80 requests at the same time on the same instance. So, for the same number of concurrent requests, you have fewer instances (up to 80 times fewer), and because you pay for the CPU and the memory, you will pay less with Cloud Run. I wrote an (old) article on this.
The limit on the number of instances of the same Cloud Function has been removed (previously, it was 1000). So there is no limit on scalability (even though there is a physical limit when the region doesn't have enough resources to create a new instance of your function).
About the queue: there is not really a queue. The request is kept for a few seconds (about 10) while it waits for a new instance to be created or for an instance to become free (one that has just finished processing another request). After 10 s, a 429 HTTP error code is returned.
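If a caller wants to tolerate those 429 responses, one possible approach is to retry with exponential backoff. The sketch below is only an illustration of that idea: call_with_retry and the send callable are hypothetical, and the delays are arbitrary values chosen to keep the example fast.

```python
import time


def call_with_retry(send, max_attempts=5, base_delay=0.01):
    """Call send() until it returns a status other than 429 or attempts run out.

    send is any zero-argument callable returning an HTTP status code.
    Returns (last_status, attempts_used).
    """
    status = None
    for attempt in range(max_attempts):
        status = send()
        if status != 429:
            return status, attempt + 1
        if attempt < max_attempts - 1:
            time.sleep(base_delay * (2 ** attempt))  # exponential backoff
    return status, max_attempts


if __name__ == "__main__":
    # Simulated backend: rejects the first two calls, then succeeds.
    responses = iter([429, 429, 200])
    print(call_with_retry(lambda: next(responses)))
```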
About concurrent requests on Cloud Functions: I tested up to 25000 requests at the same time, and it worked without issue.
However, you are limited by the function capacity (only 1 CPU, so concurrency is limited) and by the memory (increasing the memory also increases the CPU speed and allows more concurrent requests to be handled; I got an out-of-memory error with 256 MB in a test with 2500 concurrent requests).
I performed the test in Go
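For the second part of the question (using a fixed number of concurrent outbound calls and queueing the rest), a common pattern is a semaphore around each call. The following is a minimal sketch in Python with asyncio rather than the asker's Node.js/Firebase setup; call_external_api is a hypothetical placeholder that sleeps instead of making a real HTTP request, and the cap of 50 is an arbitrary example value.

```python
import asyncio


async def call_external_api(item):
    # Hypothetical placeholder for the real HTTP call.
    await asyncio.sleep(0.01)
    return item * 2


async def call_all(items, max_concurrent):
    semaphore = asyncio.Semaphore(max_concurrent)
    stats = {"in_flight": 0, "peak": 0}

    async def limited(item):
        async with semaphore:  # calls beyond the cap wait here until a slot frees up
            stats["in_flight"] += 1
            stats["peak"] = max(stats["peak"], stats["in_flight"])
            try:
                return await call_external_api(item)
            finally:
                stats["in_flight"] -= 1

    # gather returns results in the same order as the inputs.
    results = await asyncio.gather(*(limited(i) for i in items))
    return results, stats["peak"]


if __name__ == "__main__":
    results, peak = asyncio.run(call_all(range(250), max_concurrent=50))
    print(len(results), "results, peak concurrency", peak)
```

The response to the client is sent only after gather has collected every result, which matches the requirement of waiting for all calls.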
I'm executing a load test against an application hosted in Azure. It's a cloud service with 3 instances behind an internal load balancer (Hash based load balancing mode).
When I execute the load test, requests are queued even though the requests/sec and the total current requests to IIS are quite low. I'm not sure what the problem could be.
Any suggestions?
Adding a few screenshots of performance counters, which might help you diagnose this.
Edit-1: Per request from Rohit Rajan,
The cloud service has 2 instances (meaning 2 VMs), each with 14 GB of RAM and 8 cores.
I'm executing a step load pattern that starts with 100 users and adds 100-150 users every 5 minutes over 4-5 hours, until the load reaches 10,000 VUs.
All calls to external systems are async. Database calls are synchronous.
There is no straightforward answer to your question. One possible way forward is to explore additional investigation options.
Based on your explanation, there seems to be a bottleneck within the application which is causing the requests to queue-up.
In order to investigate this, collect a memory dump when you see the requests queuing up and then use DebugDiag to run a hang analysis on it.
There are several ways to gather the memory dump.
Task Manager
Procdump.exe
Debug Diagnostics
Process Explorer
Once you have the memory dump, you can install DebugDiag and run an analysis on it. It will generate a report that can help you get started.
Debug Diagnostics download: https://www.microsoft.com/en-us/download/details.aspx?id=49924
I have a Tomcat server running which connects to a Tableau server. I need to make about 25 GET calls from Tomcat to Tableau. I am trying to run these in threads, letting each thread create its own HTTP connection object and make the call. On my local system (Tomcat is local, Tableau is remote), I notice that in this case each of my threads takes about 10 seconds on average, so about 10 seconds in total.
However, if I do this sequentially, each request takes 2 seconds, for a total of 50.
My question is: when making requests in parallel, why does each take more than 2 seconds when it takes just 2 when done sequentially?
Does this have anything to do with maximum concurrent connections to same domain from one client (browser)? But here the request is going from my Tomcat server, not browser.
If yes, what is the default rule and is there any way to change that?
In my opinion, it's most likely the context switching overhead that the system has to go through for each request; that is why you see longer times for individual requests (compared to one sequential thread) but a significant gain in overall processing.
It makes sense to go for parallel processing when Context Switching Overhead is negligible compared to time taken in overall activity.
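The trade-off between per-request time and total time can be explored with a small thread pool experiment. This is a rough Python sketch, not the asker's Tomcat code: fake_get is a hypothetical stand-in that sleeps instead of calling the remote server, so it will not reproduce the per-request slowdown seen against a real server; it only shows how to measure both quantities side by side.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_get(view_id, seconds=0.05):
    # Hypothetical stand-in for one GET call to the remote server.
    start = time.perf_counter()
    time.sleep(seconds)
    return view_id, time.perf_counter() - start


def run_sequential(n):
    start = time.perf_counter()
    results = [fake_get(i) for i in range(n)]
    return results, time.perf_counter() - start


def run_parallel(n, workers):
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(fake_get, range(n)))  # map keeps input order
    return results, time.perf_counter() - start


if __name__ == "__main__":
    _, seq_total = run_sequential(25)
    par_results, par_total = run_parallel(25, workers=25)
    slowest = max(elapsed for _, elapsed in par_results)
    print("sequential total: %.2fs" % seq_total)
    print("parallel total: %.2fs, slowest request: %.2fs" % (par_total, slowest))
```

Lowering the workers argument is one way to check whether a smaller pool gives shorter individual requests at an acceptable total time.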
I have a Wookie-based app accepting requests behind nginx. The app works in general, but I'm running into some issues with parallel requests. For instance, when the app accepts a long-running request (R1) to generate a report from a dataset in the database (mongodb, via cl-mongo), it would appear unresponsive to any following request (R2) that comes in before the response to R1 starts being sent over the network.
The client reports an error in communicating with the server for R2, but after the server finishes with R1 and finally sends the response, it tries to process R2 (as evident from debugging output): it performs proper routing etc. (only too late).
Putting blackbird promises around the request processing routines didn't help (and was probably excessive anyway as Wookie is designed to be async).
So what's the proper way to handle this? I'm probably okay with clients waiting for a long time for their responses (via very long timeouts), but it would be much better to process short requests in parallel.
The idea of the underlying libraries (libevent2, libuv) of cl-async, is to use IO wait time of one task (request) for CPU time of another task (request). So it is just a mechanism to not waste IO wait time. The only thing happening in parallel is IO and at most one task using the CPU at a time (per thread/process depending on implementation).
If your requests need on average x ms of CPU time, then as soon as you have n requests in parallel, where n is the number of cores, your (n+1)th request has to wait at least x ms, regardless of whether you use a threaded or an event-based server.
You can of course spawn more server processes and use load balancing to make use of all available cores.
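The bound described above, and the effect of adding cores, can be written as a small function. This sketch assumes purely CPU-bound requests that each take exactly x ms and arrive at the same moment; min_wait_ms is a name invented for the illustration.

```python
def min_wait_ms(position, cores, cpu_ms):
    """Lower bound on the queueing delay of the request at a 1-based arrival position."""
    return ((position - 1) // cores) * cpu_ms


print(min_wait_ms(4, cores=4, cpu_ms=100))  # 0: the first n requests each get a core
print(min_wait_ms(5, cores=4, cpu_ms=100))  # 100: request n+1 waits at least x ms
print(min_wait_ms(5, cores=8, cpu_ms=100))  # 0: more cores (or processes) remove the wait
```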