I need to get accounts from a web service, and the call might take longer than the current default timeout of 60 seconds.
It seems that my workers are constantly timing out. The timeout can normally be changed through the web_server_worker_timeout variable in the webserver group. However, this variable cannot be modified in Cloud Composer.
Is there any way to get around this?
There is no way to customize this, but in Composer 2 you can set the Kubernetes CPU limit for the webserver.
With 0.5 CPU on an ENVIRONMENT_SIZE_SMALL environment, my webserver usually fails for about 5 minutes (the 2 workers time out) before reaching a stable state.
Try giving your webserver more than 0.5 CPU.
I know that for GCP Cloud Scheduler the max timeout for an HTTP request is around 20 minutes.
Is it somehow possible on GCP (perhaps using a different service) to invoke an HTTP endpoint that takes around 65 minutes to respond, every ~6 hours?
Agreeing with the comments, it would be better if you restructure your application so that it doesn’t have to rely on such a long timeout period. This is due to the drawbacks that John Hanley commented on. As for your actual question, you could combine multiple services. For example, Cloud Run has a maximum timeout of 60 minutes, which you can set up when you deploy your service.
Now, in order to run this service every 6 hours, you can make use of Cloud Workflows. Workflows is an automation tool which can be used to combine multiple GCP services in a single automated process. It can execute Cloud Run services, and you can in turn schedule this Cloud Workflow to run every 6 hours with Cloud Scheduler.
In the end, I created a micro VM instance on GCP and followed this guide to manually set up a cron job in Ubuntu:
https://www.geeksforgeeks.org/how-to-setup-cron-jobs-in-ubuntu/
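For illustration, the cron job could run a small script along these lines. This is only a sketch: the endpoint URL, the 70-minute timeout, and the script path in the comment are assumptions, not details from the setup above.

```python
import urllib.request

# Hypothetical endpoint; replace with the real URL.
ENDPOINT_URL = "https://example.com/long-running-task"

# Assumed margin: the endpoint takes ~65 minutes, so wait up to 70 (in seconds).
TIMEOUT_SECONDS = 70 * 60

# Example crontab entry (standard cron syntax, minute 0 of every 6th hour);
# the script path is hypothetical:
#   0 */6 * * * /usr/bin/python3 /home/user/call_endpoint.py


def call_endpoint(url: str, timeout: float = TIMEOUT_SECONDS) -> int:
    """Send a GET request and return the HTTP status code.

    The timeout bounds each blocking socket operation (connect, read),
    so the server may stay silent for up to that long before we give up.
    """
    with urllib.request.urlopen(url, timeout=timeout) as response:
        response.read()  # drain the body so the request fully completes
        return response.status
```

The script would end with a call such as call_endpoint(ENDPOINT_URL), and the crontab entry in the comment would trigger it every 6 hours.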
I want to create a load test for a feature of my app. It uses Google App Engine and a VM. The user sends HTTP requests to App Engine. It's realistic that App Engine gets thousands of requests in a few seconds, so I want to create a load test where I send 20,000 to 50,000 requests in a timeframe of 1-10 seconds.
How would you solve this problem?
I started by trying Google Cloud Tasks, because it seems perfect for this. You schedule HTTP requests for a specific point in time. The docs say that there is a limit of 500 tasks per second per queue. If you need more tasks per second, you can split the tasks across multiple queues. I did this, but Google Cloud Tasks does not execute all the scheduled tasks at the given time. One queue needs 2-5 minutes to execute 500 requests that are all scheduled for the same second.
I also tried a TypeScript script running asynchronous node-fetch requests, but 5,000 requests take 77 seconds on my MacBook.
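As a rough check on the queue arithmetic, here is a minimal sketch that assumes each queue really sustains the documented 500 tasks per second (which, as described above, was not what I observed). The function name is made up for illustration.

```python
import math


def queues_needed(total_requests: int, window_seconds: float,
                  per_queue_rate: int = 500) -> int:
    """Number of queues required to dispatch total_requests within
    window_seconds, assuming each queue sustains per_queue_rate tasks/s."""
    required_rate = total_requests / window_seconds  # tasks per second overall
    return math.ceil(required_rate / per_queue_rate)


print(queues_needed(50_000, 1))   # 100
print(queues_needed(50_000, 10))  # 10
print(queues_needed(20_000, 10))  # 4
```

So the worst case (50,000 requests in one second) would need about 100 queues on paper.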
I don't think you can get 50,000 HTTP requests "in a few seconds" from "your MacBook". It's better to go for a dedicated load testing tool, which can be deployed onto a GCP virtual machine in order to minimize network latency and traffic costs.
The tool choice is up to you. Either you need a machine type powerful enough to send 50k requests "in a few seconds" from a single virtual machine, or the tool needs to support running in clustered mode, so you can kick off several machines that send the requests together at the same moment.
Since you mention TypeScript, you might want to try the k6 tool (it doesn't scale, though), or check out "Open Source Load Testing Tools: Which One Should You Use?" to see what the other options are. None of them provides a JavaScript API, but several don't require any programming knowledge at all.
A tool you could consider using is siege.
It is Linux-based. To avoid the additional cost of testing from a system outside GCP, you could deploy siege on a relatively large machine or a few machines inside GCP.
It is fairly simple to set up. Since you mention that you need 20-50k requests in a span of a few seconds, note that siege by default only allows 255 concurrent users. You can raise this limit, though, so it can fit your needs.
You would need to experiment with how many connections a machine can establish, since each machine has a limit based on CPU, memory and number of network sockets. You could just increase the -c number until the machine gives an "Error: system resources exhausted" error or something similar. Experiment with what your virtual machine on GCP can handle.
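To get a feel for the concurrency idea before setting up siege, a rough Python sketch using only the standard library could look like this. It is not a replacement for a real load testing tool; the function names are made up, and the concurrency argument plays the same role as siege's -c.

```python
import urllib.error
import urllib.request
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def fetch_status(url: str, timeout: float = 10.0) -> str:
    """Return the HTTP status code as a string, or the exception name on failure."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return str(response.status)
    except urllib.error.HTTPError as exc:
        return str(exc.code)        # server answered with an error status
    except OSError as exc:
        return type(exc).__name__   # connection refused, timeout, etc.


def run_burst(url: str, total_requests: int, concurrency: int) -> Counter:
    """Send total_requests GET requests with at most `concurrency` in flight,
    and count the outcomes."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        return Counter(pool.map(fetch_status, [url] * total_requests))
```

For example, run_burst(url, 5000, 255) sends 5,000 requests with at most 255 in flight and returns a count per status code or error type, which shows where a given machine starts to fail.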
Many times when I try to open the tree view or task duration page of some DAGs in the UI I get the error: 504 gateway time-out.
Sometimes after that I can't even open the page with the list of DAGs.
Do you know where this problem could come from?
The CPU and memory of the machine running Airflow seem to be fine and I use RDS for the metadata.
Thanks!
I've experienced this before as well. I believe it's caused by an HTTP request that takes longer than expected for the webserver's gunicorn worker to fulfill. For example, if you set the DAG tree view to a high setting like 365 DAG runs for a DAG with a lot of tasks, you may be able to reproduce this consistently.
Can you try bumping up the timeout settings on the webserver to see if it makes a difference?
First, try increasing web_server_worker_timeout (default = 120 seconds) under the [webserver] group.
If that doesn't resolve it, you might also try increasing web_server_master_timeout under the same group.
Another technique to try is switching the webserver worker_class (default = sync) to eventlet or gevent.
Reference: https://github.com/apache/incubator-airflow/blob/c27098b8d31fee7177f37108a6c2fb7c7ad37170/airflow/config_templates/default_airflow.cfg#L225-L229
Note that the alternative worker classes require installing Airflow with the async extras like:
pip install apache-airflow[async]
You can find more info about gunicorn worker timeouts in this question: How to resolve the gunicorn critical worker timeout error?.
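If you configure Airflow through environment variables rather than airflow.cfg, the same settings follow the AIRFLOW__{SECTION}__{KEY} naming convention. Here is a small sketch; the helper name and the values (300 seconds, gevent) are only illustrative assumptions, not recommended settings.

```python
def airflow_env_var(section: str, key: str) -> str:
    """Build the environment variable name Airflow reads for a config option."""
    return f"AIRFLOW__{section.upper()}__{key.upper()}"


# Illustrative overrides for the options mentioned above (values are assumptions).
overrides = {
    airflow_env_var("webserver", "web_server_worker_timeout"): "300",
    airflow_env_var("webserver", "web_server_master_timeout"): "300",
    airflow_env_var("webserver", "worker_class"): "gevent",
}

for name, value in overrides.items():
    print(f"export {name}={value}")
```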
We are using Gunicorn with Nginx. Every time we restart Gunicorn, its CPU usage starts low and then keeps increasing gradually, from 0.5% to around 85% in a matter of 3-4 days. On restarting Gunicorn, it comes back down to 0.5%.
Please suggest what can cause this issue and how to go forward to debug and fix this.
Check your workers configuration. Try using the following number of workers: cores * 2 - 1. (For comparison, the Gunicorn documentation suggests (2 x cores) + 1 as a starting point.)
Check your application. It seems that your application is blocking or freezing threads. Add timeouts to all API calls, database queries, etc.
You can add APM software to analyze your application, for example Datadog.
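Gunicorn config files are plain Python, so the worker suggestion could be sketched as a gunicorn.conf.py like the one below. The timeout value is an assumption for illustration. Note that Gunicorn's own timeout only restarts workers that stay silent too long, so it complements, but does not replace, timeouts on the API calls and queries inside your application.

```python
# gunicorn.conf.py (sketch)
import multiprocessing

cores = multiprocessing.cpu_count()

# Worker count from the suggestion above; never go below one worker.
workers = max(1, cores * 2 - 1)

# Seconds a worker may stay silent before Gunicorn kills and restarts it
# (illustrative value, tune for your application).
timeout = 30
```

Start Gunicorn with -c gunicorn.conf.py to pick up these settings.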
I'd like to use OpenShift to host an application that will consume some data from a queue and put it into a database, so it won't receive HTTP requests. Is there a way to automatically scale it up (to minimize the time data spends in the queue)?
Unless you are going to use a paid plan, this application would get idled after 48 hours. You can use the rhc command to scale your application up and down manually.