I have a Node.js service running on Cloud Run (Docker image based on node:14-alpine; both Cloud Run and the Firebase RTDB are in europe-west1) with an average load of ~1 req/sec. Some requests experience high latency (a few seconds) on outgoing HTTP requests to the HomeGraph API or the Firebase Realtime Database.
The latency percentiles are:
< 0.21s - 50%
< 1s - 95%
< 2s - 99%
Things I've tried:
- increasing the memory and CPUs assigned to the service
- setting a minimum of one instance, so one is always available
- using keepAlive in Node.js: this does improve things, but not by much (sketch below)
- creating a pool of Firebase apps so each request has its own app/DB connection
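For reference, this is roughly what the keepAlive setup looks like (a minimal sketch; the URL is a placeholder, and node-fetch v2 accepts a custom agent):

    import https from "https";
    import fetch from "node-fetch"; // v2 accepts a custom agent

    // One shared agent, so outgoing calls reuse warm TLS connections
    // instead of paying the handshake on every request.
    const keepAliveAgent = new https.Agent({ keepAlive: true, maxSockets: 50 });

    async function callApi(url: string, body: unknown) {
      return fetch(url, {
        method: "POST",
        body: JSON.stringify(body),
        agent: keepAliveAgent,
      });
    }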
What can/should I try next? Is there an issue with Cloud Run?
Example traces (screenshots omitted): a 1 s call to HomeGraph; a 4 s call to the Firebase DB (getUserDeviceTraits); and a cold start (loading secrets, etc.).
Later edit (18 Oct 2021):
The problem simply resolved itself after a while, and then it started happening again about a week ago. After months in which the 99th percentile topped out at 1 s and the average was ~200 ms, the 99th percentile now spikes up to 12 s. I've already tried updating all the dependencies (just in case). It's definitely not code related, since I made no changes in that period.
Related
I want to create a load test for a feature of my app. The app uses Google App Engine and a VM. Users send HTTP requests to the App Engine service. Realistically, that service can receive thousands of requests within a few seconds, so I want to create a load test that sends 20,000-50,000 requests in a timeframe of 1-10 seconds.
How would you solve this problem?
I started with Google Cloud Tasks, because it seems perfect for this: you schedule HTTP requests for a specific point in time. The docs say there is a limit of 500 tasks per second per queue, and that if you need more tasks per second, you can split the tasks across multiple queues. I did this, but Google Cloud Tasks does not execute all the scheduled tasks at the given time: one queue needs 2-5 minutes to execute 500 requests that are all scheduled for the same second.
I also tried a TypeScript script running asynchronous node-fetch requests, but 5,000 requests take 77 seconds on my MacBook.
I don't think you can get 50,000 HTTP requests "in a few seconds" from "your MacBook"; consider a dedicated load testing tool instead (one that can be deployed onto a GCP virtual machine to minimize network latency and traffic costs).
The tool choice is up to you: either use a machine type powerful enough to produce 50k requests "in a few seconds" from a single virtual machine, or pick a tool that can run in clustered mode, so you can spin up several machines that send their requests together at the same moment in time.
Given you mention TypeScript, you might want to try the k6 tool (it doesn't scale across machines out of the box, though), or check out Open Source Load Testing Tools: Which One Should You Use? to see the other options; none of them provides a JavaScript API, but several don't require any programming knowledge at all.
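For illustration, a minimal k6 sketch of the kind of burst you describe (the URL, rate, and VU counts are placeholders; k6 scripts are plain JavaScript):

    import http from 'k6/http';

    // Fire a fixed arrival rate regardless of response times:
    // here ~5,000 requests/second for 10 seconds (~50,000 total).
    export const options = {
      scenarios: {
        burst: {
          executor: 'constant-arrival-rate',
          rate: 5000,
          timeUnit: '1s',
          duration: '10s',
          preAllocatedVUs: 2000,
        },
      },
    };

    export default function () {
      http.get('https://your-app.appspot.com/endpoint');
    }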
A tool you could consider using is siege.
Siege is Linux based, and running it from inside GCP avoids the additional cost of sending test traffic from a system outside GCP.
You could deploy siege on one relatively large machine, or on a few machines, inside GCP.
It is fairly simple to set up, but since you mention that you need 20-50k requests in a span of a few seconds: siege by default only allows 255 concurrent users. You can raise this limit, though, so it can fit your needs.
You will need to experiment with how many connections a machine can establish, since each machine has a limit based on CPU, memory, and the number of network sockets. Keep increasing the -c number until the machine reports something like "Error: system resources exhausted", and see what your GCP virtual machine can handle.
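Roughly, the workflow looks like this (exact flags and config keys may differ between siege versions; the URL is a placeholder):

    # Raise siege's own cap (default limit = 255) in ~/.siege/siege.conf:
    #   limit = 10000
    # Raise the OS file-descriptor limit, then push concurrency up:
    ulimit -n 65535
    siege -b -c 2000 -t 10S https://your-app.appspot.com/endpoint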
I have a Tornado-based game server hosted on Amazon's cloud (1 GB RAM, 10 GB hard disk). I have 500 users per day, with 30+ concurrent users at any given time. Users are spread around the world, and I am hosting the machine in US West since most users are from the USA.
I am facing a network latency issue. With a single user, the response time is 1 second, which is already high, but as the number of users grows past 10, the response time climbs to 2 seconds, and for 50+ users it reaches 8 seconds.
I wrote a test script and ran three tests:
1. Game server and test script both on my local machine: latency under 1000 ms (around 220 ms at the 90th percentile).
2. Game server and test script both on the cloud machine: same result.
3. Game server on the cloud machine, test script on my local machine: latency of 8 seconds.
1. Network I/O
If you're doing network-bound operations in your request handlers (such as connecting to a database, or sending requests to another API), use await for those tasks so that Python can pause the coroutine and process other requests in the meantime.
2. CPU tasks and Disk I/O
If you're doing CPU-bound operations (any code other than network I/O that takes more than a few milliseconds to run), use IOLoop.run_in_executor to run those tasks in a separate thread. This frees up the main thread so the event loop can serve other requests.
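A minimal sketch combining both points (fetch_state_from_db and compute_score are hypothetical stand-ins for your own DB coroutine and CPU-heavy game logic):

    from tornado.ioloop import IOLoop
    import tornado.web

    class GameHandler(tornado.web.RequestHandler):
        async def get(self):
            # Network-bound: await suspends this coroutine, letting
            # Tornado serve other clients while the DB call is in flight.
            state = await fetch_state_from_db(self.get_argument("player"))

            # CPU-bound: run in a thread pool so the event loop stays free.
            result = await IOLoop.current().run_in_executor(None, compute_score, state)

            self.write({"score": result})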
Tornado is a single-threaded server, which means the CPU can run only one thing at a given time. So if a single response takes 220 ms of CPU time and you have 10 connected clients, it will take over 2 seconds to serve the 10th client, and this time keeps growing (though not always proportionally, as Python may reuse the CPU cache) as the number of clients goes up.
It also appears the CPU on the cloud server is not as fast as your local CPU, hence the increased latency.
I recently started using AWS CodeDeploy and noticed that the AllowTraffic step consistently takes 3-4 minutes per instance. I've configured the health check interval to 10 seconds and the healthy threshold to 2, so I expect it to take about 20 seconds. I'm using a Network Load Balancer.
I have polled the NLB's target group using describe-target-health and confirmed that the target stays in the initial state for the 3+ minutes that CodeDeploy is waiting. I have also confirmed that the server on the health check port is responsive from the very beginning of those three minutes.
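For reference, the polling looked roughly like this (target group ARN elided):

    # TargetHealth.State stays "initial" for the 3+ minutes in question
    aws elbv2 describe-target-health --target-group-arn arn:aws:elasticloadbalancing:...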
What are other possible reasons for CodeDeploy / NLB to be so slow?
The extra time you are seeing is not due to the health check; the initial registration of a target with an NLB simply takes time.
When you register a new target with your Network Load Balancer, the registration process is expected to take between 30 and 90 seconds (and can go up to 120 seconds). After registration is complete, the Network Load Balancer health check systems begin sending health checks to the target. A newly registered target must pass health checks for the configured interval to enter service and receive traffic.
For example, if you configure your health check with a 10 second interval and require 2 health checks to become healthy, the minimum time before an instance starts receiving traffic is 30-120 s (registration) + 20 s (health checks).
An ALB is not affected by this initial registration delay, so it registers instances much faster. This is just how the NLB operates at this point in time.
According to the Google Vision documentation, the maximum number of image files per request is 16. Elsewhere, however, I'm finding that the maximum number of requests per minute is as high as 1800. Is there any way to submit that many requests in such a short period of time from a single machine? I'm using curl on a Windows laptop, and I'm not sure how to submit a second request before the first finishes almost a minute later (if such a thing is even possible).
If you want to process 1800 images and group 16 images per request, you will need 1800 / 16 = 113 requests (rounding up).
On the other hand, if the limit is 1800 requests per minute and each request can contain 16 images, you can process 1800 * 16 = 28,800 images per minute.
Also consider that the docs say: "These limits apply to each Google Cloud Platform Console project and are shared across all applications and IP addresses using that project." So it doesn't matter whether requests are sent from a single machine or from many.
Cloud Vision can receive parallel requests, so your app should be prepared to manage this volume of requests and responses. You may want to check this example and then use threads (or asynchronous calls) in your preferred programming language to send and receive operations in parallel.
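A rough sketch of that pattern (assumptions: Node 18+ for the global fetch, key-based auth via a VISION_API_KEY environment variable, and label detection as the feature):

    import { readFileSync } from "fs";

    const ENDPOINT =
      `https://vision.googleapis.com/v1/images:annotate?key=${process.env.VISION_API_KEY}`;

    // One request carries up to 16 images.
    async function annotateBatch(paths: string[]) {
      const body = {
        requests: paths.map((p) => ({
          image: { content: readFileSync(p).toString("base64") },
          features: [{ type: "LABEL_DETECTION" }],
        })),
      };
      const res = await fetch(ENDPOINT, {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify(body),
      });
      return res.json();
    }

    // Chunk the full image list into groups of 16 and send them in parallel.
    async function annotateAll(allPaths: string[]) {
      const chunks: string[][] = [];
      for (let i = 0; i < allPaths.length; i += 16) {
        chunks.push(allPaths.slice(i, i + 16));
      }
      return Promise.all(chunks.map(annotateBatch));
    }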
In the DynamoDB documentation, and in many places around the internet, I've seen that single-digit-millisecond response times are typical, but I cannot seem to achieve that even with the simplest setup. I have configured a t2.micro EC2 instance and a DynamoDB table, both in us-west-2, and when running the command below from the AWS CLI on the EC2 instance I get responses averaging about 250 ms. The same command run from my local machine (Denver) averages about 700 ms.
aws dynamodb get-item --table-name my-table --key file://key.json
When looking at the CloudWatch metrics in the AWS console, though, the average get latency is reported as 12 ms. If anyone could tell me what I'm doing wrong, or point me toward information that would let me solve this on my own, I would really appreciate it. Thanks in advance.
The response times you are seeing are largely due to the cold start of the AWS CLI. When you run your get-item command, the CLI has to be loaded into memory, fetch temporary credentials (if using an EC2 IAM role on your t2.micro instance), and establish a secure connection to the DynamoDB service. Only after all that is complete does it execute the get-item request and finally print the results to stdout. Your command also has to read key.json off the filesystem, which adds further overhead.
My experience running on a t2.micro instance is that the AWS CLI has around 200 ms of startup overhead, which is in line with what you are seeing.
This is not an issue for long-running programs, as they pay a similar overhead only once, at start time. I run a number of web services on t2.micro instances that work with DynamoDB, and their DynamoDB response times are consistently sub-20 ms.
There are a lot of factors that go into the latency you will see when making a REST API call. DynamoDB can provide latencies in the single digit milliseconds but there are some caveats and things you can do to minimize the latency.
The first thing to consider is distance and the speed of light. Expect the best latency when accessing DynamoDB from an EC2 instance located in the same region. It is normal to see higher latencies when accessing DynamoDB from your laptop or from another data center. Note that each region also comprises multiple data centers.
There are also performance costs from the client side based on the hardware, network connection, and programming language that you are using. When you are talking millisecond latencies the processing time on your machine can make a difference.
Another likely source of latency is the TLS handshake. Establishing an encrypted connection requires multiple round trips and computation on both sides. However, as long as you use keep-alive on the connection, you only pay this overhead for the first query; successive queries are substantially faster since they do not incur the initial penalty. Unfortunately, the AWS CLI does not keep the connection alive between requests, but the AWS SDKs for most languages will manage this for you automatically.
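For example, with the AWS SDK for JavaScript v3 (a sketch; the table name matches the question, but the id key schema is assumed), the client keeps connections alive by default, so only the first call pays the TLS handshake:

    import { DynamoDBClient, GetItemCommand } from "@aws-sdk/client-dynamodb";

    // Create the client once and reuse it; the SDK keeps the HTTPS
    // connection alive, so repeat calls skip the TLS handshake.
    const client = new DynamoDBClient({ region: "us-west-2" });

    export async function getItem(id: string) {
      return client.send(
        new GetItemCommand({
          TableName: "my-table",
          Key: { id: { S: id } }, // hypothetical key schema
        })
      );
    }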
Another important consideration is that the latency DynamoDB reports in the web console is the average. While DynamoDB does provide a reliably low (double-digit) average latency, the maximum latency will regularly be in the hundreds of milliseconds or even higher. This is visible by viewing the maximum latency in CloudWatch.
AWS recently announced DAX (Preview).
Amazon DynamoDB Accelerator (DAX) is a fully managed, highly available, in-memory cache for DynamoDB that delivers up to a 10x performance improvement – from milliseconds to microseconds – even at millions of requests per second. For more information, see In-Memory Acceleration with DAX (Preview).