How to auto scale AWS Memcached cluster node? - terraform-provider-aws

I am using Memcached cluster with one node. I did setup an alarms which will kickoff either 50% CPU or 50% Memory. Can I add a node when alarm gets triggered using auto scaling? fyi: I'm using Terraform to build my AWS infrastructure. Thanks in advance.
Created Memcached cluster with one node using terraform
This is what I got from AWS re:Post
https://repost.aws/questions/QU3g1RkuCBQRKGLy9c8Cc3sw#ANmom5ysKdSDCeIyRI_d0qQg

Related

ECS Cluster died, no logs, no alarms

We're running a platform made out of 5 clusters. One of the clusters died. We're using Kibana because its cheaper than Cloudwatch (log router with fluentbit). The 14 hour span that the cluster was dead shows 0 logs on Kibana, and we have no idea what happened to the cluster. A simple restart of the cluster fixed our issue. So, to make sure it doesn't die again while we're away, we need to set it up so it automatically restarts. Dev did not implement a cluster health check. We're using Kibana, so I can't use Cloudwatch to implement metrics, alarms and actions. What do I do here? How do I make the cluster restart itself when Kibana detects no incoming logs from it? Thank you.

How to setup Airflow > 2.0 high availability cluster on centos 7 or above

I want to setup HA for airflow(2.3.1) on centos7. Messaging queue - Rabbitmq and metadata db - postgres. Anybody knows how to setup it.
Your question is very large, because the high availability has multiple level and definition:
Airflow availability: multiple scheduler, multiple workers, auto scaling to avoid pressure, high storage volume, ...
The databases: a HA cluster for Rabbitmq and a HA cluster for postgres
Even if you have the first two levels, how many node you want to use? you cannot put everything in the same node, you need to run one service replica per node
Suppose you did that, and now you have 3 different nodes running in the same data center, what if there is a fire in the data center? So you need to use multiple nodes in different regions
After doing all of above, is there a risk for network problem? of course there is
If you just want to run airflow in HA mode, you have multiple option to do that on any OS:
docker compose: usually we use it for developing, but you can use it for production too, you can create multiple scheduler instances, with multiple workers, it can help you to improve the availability of your service
docker swarm: similar to docker compose with additional features (scaling, multi nodes, ...), you will not find much resources to install it, but you can use the compose files and just do some changes
kubernetes: the best solution, K8S can help you to ensure the availability of your services, easy install with helm
or just running the different services on your host: not recommended, because of manual tasks, and applying the HA is complicated

AWS Batch executor with Airflow

I'm currently using airflow on Amazon Web services using EC2 instances. The big issue is that the average usage of the instances are about 2%...
I'd like to use a scalable architecture and creating instances only for the duration of the job and kill it. I saw on the roadmap that AWS BATCH was suppose to be an executor in 2017 but no new about that.
Do you know if it possible to use AWS BATCH as an executor for all airflow jobs ?
Regards,
Romain.
There is no executor, but an operator is available from version 1.10. After you create an Execution Environment, Job Queue and Job Definition on AWS Batch, you can use the AWSBatchOperator to trigger Jobs.
Here is the source code.
Currently there is a SequentialExecutor, a LocalExecutor, a DaskExecutor, a CeleryExecutor and a MesosExecutor. I heard they're working on AIRFLOW-1899 targeted for 2.0 to introduce a KubernetesExecutor. So, looking at Dask and Celery it doesn't seem they support a mode where their workers are created per task. Mesos might, Kubernetes should, but then you'd have to scale the clusters for the workers accordingly to account for turning off the nodes when un-needed.
We did a little work to get a cloud formation setup where celery workers scale out and in based on metrics from cloud-watch of the average cpu load across the tagged workers.
You would need to create a custom Executor (extended from BaseExecutor) capable of submitting and monitoring the AWS Batch jobs. Also may need to create a custom Docker image for the instances.
I found this repository in my case is working quite well https://github.com/aelzeiny/airflow-aws-executors I'm using Batch jobs with FARGATE_SPOT with computation engine.
I'm just struggling with the logging on AWS CloudWatch and the return status in AWS batch but from Airflow perspective is working

How do I setup an Airflow of 2 servers?

Trying to split out Airflow processes onto 2 servers. Server A, which has been already running in standalone mode with everything on it, has the DAGs and I'd like to set it as the worker in the new setup with an additional server.
Server B is the new server which would host the metadata database on MySQL.
Can I have Server A run LocalExecutor, or would I have to use CeleryExecutor? Would airflow scheduler has to run on the server that has the DAGs right? Or does it have to run on every server in a cluster? Confused as to what dependencies there are between the processes
This article does an excellent job demonstrating how to cluster Airflow onto multiple servers.
Multi-Node (Cluster) Airflow Setup
A more formal setup for Apache Airflow is to distribute the daemons across multiple machines as a cluster.
Benefits
Higher Availability
If one of the worker nodes were to go down or be purposely taken offline, the cluster would still be operational and tasks would still be executed.
Distributed Processing
If you have a workflow with several memory intensive tasks, then the tasks will be better distributed to allow for higher utilizaiton of data across the cluster and provide faster execution of the tasks.
Scaling Workers
Horizontally
You can scale the cluster horizontally and distribute the processing by adding more executor nodes to the cluster and allowing those new nodes to take load off the existing nodes. Since workers don’t need to register with any central authority to start processing tasks, the machine can be turned on and off without any downtime to the cluster.
Vertically
You can scale the cluster vertically by increasing the number of celeryd daemons running on each node. This can be done by increasing the value in the ‘celeryd_concurrency’ config in the {AIRFLOW_HOME}/airflow.cfg file.
Example:
celeryd_concurrency = 30
You may need to increase the size of the instances in order to support a larger number of celeryd processes. This will depend on the memory and cpu intensity of the tasks you’re running on the cluster.
Scaling Master Nodes
You can also add more Master Nodes to your cluster to scale out the services that are running on the Master Nodes. This will mainly allow you to scale out the Web Server Daemon incase there are too many HTTP requests coming for one machine to handle or if you want to provide Higher Availability for that service.
One thing to note is that there can only be one Scheduler instance running at a time. If you have multiple Schedulers running, there is a possibility that multiple instances of a single task will be scheduled. This could cause some major problems with your Workflow and cause duplicate data to show up in the final table if you were running some sort of ETL process.
If you would like, the Scheduler daemon may also be setup to run on its own dedicated Master Node.
Apache Airflow Cluster Setup Steps
Pre-Requisites
The following nodes are available with the given host names:
master1 - Will have the role(s): Web Server, Scheduler
master2 - Will have the role(s): Web Server
worker1 - Will have the role(s): Worker
worker2 - Will have the role(s): Worker
A Queuing Service is Running. (RabbitMQ, AWS SQS, etc)
You can install RabbitMQ by following these instructions: Installing RabbitMQ
If you’re using RabbitMQ, it is recommended that it is also setup to be a cluster for High Availability. Setup a Load Balancer to proxy requests to the RabbitMQ instances.
Additional Documentation
Documentation: https://airflow.incubator.apache.org/
Install Documentation: https://airflow.incubator.apache.org/installation.html
GitHub Repo: https://github.com/apache/incubator-airflow
All airflow processes need to have the same contents in their airflow_home folder. This includes configuration and dags. If you only want server B to run your MySQL database, you do not need to worry about any airflow specifics. Simply install the database on server B and change your airflow.cfg's sql_alchemy_conn parameter to point to your database on Server B and run airflow initdb from Server A.
If you also want to run airflow processes on server B, you would have to look into scaling using the CeleryExecutor.

How to detect EC2 instance has been idle

I am looking for idea how to approach this problem, basically measuring EC2 instance activity.
Is there a way I can know that via AWS? If not, how can I measure activity on e.g. Ubuntu instance. Taking into account that I will have some processes running and looking into their activity?
I'm not sure how you are defining "instance activity" but you can monitor your Amazon EC2 instance with Amazon CloudWatch and then query on the CPUUtilization metric to get information about how much CPU your instance is using (see Amazon Elastic Compute Cloud Dimensions and Metrics for details).
It's better to measure by 2 metrics and compare, Network traffic and CPU utilization
Ref: https://www.trendmicro.com/cloudoneconformity/knowledge-base/aws/EC2/idle-instance.html#:~:text=Using%20AWS%20Console-,01%20Sign%20in%20to%20the%20AWS%20Management%20Console.,to%20identify%20the%20right%20resource).

Resources