Airflow stored in the cloud? - airflow

I would like to know if I can make the airflow UI accessible to all people who have a user, web page type. For this, I would have to connect it to a server, no? Which server do you recommend for this? I was looking around and some were using Amazon EC2.

If your goal is just making the airflow UI visible to public, there is a lot of solutions, where you can do it even in your local computer (of course it is not a good idea).
Before choosing the cloud provider and the service, you need to think about the requirements:
in your team, do you have the skills and the time to manage the server? if no you need a managed service like GCP cloud composer or AWS MWAA.
which executor yow want to use? KubernetesExecutor? CeleryExecutor on K8S? if yes you need a K8S service and not just a VM.
do you have a huge loading? do you need a HA mode? what about the scalability?
After defining the requirements, you can choose between the options:
Small server with LocalExecutor or CeleryExecutor on a VM -> AWS EC2 with a static IP and Route 53 for DNS name
A scalable server in HA mode on a K8S cluser -> AWS EKS or google GKE
A managed service and focusing only on the development part -> google cloud composer

Related

Unable to access newly created Airflow UI MWAA

I am trying to create MWAA as root user and I have all AWS services (s3 and EMR )in North California. MWAA doesn't exist in North California. Hence created this in Oregon.
I am creating this in a private network, it also required a new s3 bucket in that region for my dags folder.
I see that it also needed a new vpc and private subnet as we dont have anything in that region created by clicking on "Create VPC ".
Now when I click on airflow UI. It says
"This site can’t be reached". Do I need to add my Ip to the security group here to access Airflow UI?
Someone, please guide.
Thanks,
Xi
From AWS MWAA documentation:
3. Enable network access. You'll need to create a mechanism in your Amazon VPC to connect to the VPC endpoint (AWS PrivateLink) for your Apache Airflow Web server. For example, by creating a VPN tunnel from your computer using an AWS Client VPN.
Apache Airflow access modes (AWS)
The AWS documentation suggests 3 different approaches for accomplishing this (tutorials are linked in the documentation).
Using an AWS Client VPN
Using a Linux Bastion Host
Using a Load Balancer (advanced)
Accessing the VPC endpoint for your Apache Airflow Web server (private network access)

Azure key vault values inside AKS

I've been building Azure Bicep files with the goal of having our infrastructure codified.
All is going well except for AKS. Reading and experimenting I think I have two options.
AKS has pods with Nodejs or .net services running in them which need environment variables like database connection strings. These can be passed in at the deploy stage of each node/.net or they can be 'included' in each AKS instance.
Am I on the right track and does one have advantages over the other?
The IaC Code for your AKS should not be mixed with deployment code of workload (your Nodejs or .Net Pods).
I would also not recommend to use ENV variables for secrets and connections strings. Kubernetes upstream decided that CSI (Container Storage Interface) is the way to go.
With that being said, you can write a Bicep deployment that deploys AKS & Azure Key Vault. Enable the azureKeyvaultSecretsProvider add-on for the AKS to sync secrets from Azure KeyVault to Kubernetes secrets or directly as files into pods.
After this you write you workload deployment of Nodejs and .Net Pods and refer the AZURE KEY VAULT PROVIDER FOR SECRETS STORE CSI DRIVER. This also make you more independent if you create more cluster etc.

Setup a kubernetes cluster with bare metal servers from different subnets

What I am doing right now:
I own many VPS which I use to deploy applications with Docker compose, most of the machines come from different subnets and have a public static IP address.
For each new application I would pick a random VPS, assign the new application's subdomain's DNS with the VPS' IP address and deploy my application in this VPS behind an Nginx proxy (jwilder Nginx).
This approach is in my opinion very comfortable since jwilder's Nginx does almost the work for me and I only have to assign the correct DNS.
What I want to achieve:
For the purpose of learning, I would like to take the machines and make a Kubernetes cluster out of them, so I could learn more about this technology. My idea is that I only have to assign new subdomain's DNS to one single point, which also plays the role of a load balancer and pass the traffic to corresponding pods.
To redirect traffic to a new application I only have to configure the load balancer.
My problem:
I know this question is not very precise since I don't know a lot of Kubernetes. Moreover, my servers are not from a cloud provider like Google or AWS and I, therefore, can not use their solutions. They are not even from a single cloud provider, most of them are of my university and some are from a private cloud provider.
Could anybody tell me how can I achieve this?
I think the answer is kubeadm, you can install it on your own pc or vm.
It is gonna create a single control-plane cluster which could be joined by other of your vms and create a kubernetes cluster.
kubeadm helps you bootstrap a minimum viable Kubernetes cluster that conforms to best practices
kubeadm is designed to be a simple way for new users to start trying Kubernetes out, possibly for the first time, a way for existing users to test their application on and stitch together a cluster easily, and also to be a building block in other ecosystem and/or installer tool with a larger scope.
Your cluster pods will communicate via CNI.
CNI was created as a minimal specification, built alongside a number of network vendor engineers to be a simple contract between the container runtime and network plugins

Amazon ECS Application level monitoring and instance autohealing without ELB

i'm deploying and app to Amazon ECS and need some advice on application level monitoring (periodic HTTP 200 and/or body match). Usually i place it behind an ELB and i am sure that my ELB will take action if it sees too many HTTP errors.
However this time it's a very low budget project and the budget for the ELB should be avoided (also consider this is going to work with only one instance as the userbase is very limited).
What strategies could i adopt to grant that the application is alive (kill instance and restart in case of too many app errors)? Regarding the instance i know about AWS autohealing but that's infrastructure.
Obviously one of the problems is that not having an ELB i must bind the DNS to an EIP....so reassigning it it's crucial.
And obviously the solution should not involve any other EC2 instance, external services are acceptable but keeping it all inside AWS would be great.
Thanks a lot
Monitoring of ECS is important to improve the importance of your site. If you still think there could be issues related to deployment on AWS, I suggest to practice auto-scaling feature of AWS.
You can scale up ECS when needed and release it when not required.
Nagios is another open source monitoring tool that you can leverage. Easy to install and configure.

Flask SQLAlchemy Database with AWS Elastic Beanstalk - waste of time?

I have successfully deployed a Flask application to AWS Elastic Beanstalk. The application uses an SQLAlchemy database, and I am using Flask-Security to handle login/registration, etc. I am using Flask-Migrate to handle database migrations.
The problem here is that whenever I use git aws.push it will push my local database to AWS and overwrite the live one. I guess what I'd like to do is only ever "pull" the live one from AWS EB, and only push in rare circumstances.
Will I be able to access the SQLAlchemy database which I have pushed to AWS? Or, is this not possible? Perhaps there is some combination of .gitignore and .elasticbeanstalk settings which could work?
I am using SQLite.
Yes, your database needs to not be in version control, it should live on persistent storage (most likely the Elastic Block Storage service (EBS)), and you should handle schema changes (migrations) using something like Flask-Migrate.
The AWS help article on EBS should get you started, but at a high level, what you are going to do is:
Create an EBS volume
Attach the volume to a running instance
Mount the volume on the instance
Expose the volume to other instances using a Network File System (NFS)
Ensure that when new EBS instances launch, they mount the NFS
Alternatively, you can:
Wait until Elastic File System (EFS) is out of preview (or request access) and mount all of your EB-started instances on the EFS once EB supports EFS.
Switch to the Relational Database Service (RDS) (or run your own database server on EC2) and run an instance of (PostgreSQL|MySQL|Whatever you choose) locally for testing.
The key is hosting your database outside of your Elastic Beanstalk environment. If not, as the load increases different instances of your Flask app will be writing to their own local DB. There won't a "master" database that will contain all the commits.
The easiest solution is using the AWS Relational Database Service (RDS) to host your DB as an outside service. A good tutorial that walks through this exact scenario:
Deploying a Flask Application on AWS using Elastic Beanstalk and RDS
SQLAlchemy/Flask/AWS is definitely not a waste of time! Good luck.

Resources