Google Cloud Composer: Save on costs - airflow

I am trying to figure out how to save on costs with Google Cloud Composer. Is there any way to spin down the environment when none of your DAGs are running, and then spin it up again when a DAG needs to run?
It's costing way too much, since I believe that even though my DAGs are not running, the environment stays up and we keep getting charged.
Thanks,

For now, there is no way to pause or disable a Composer environment. Saving money on an environment that is not in use would require something like autoscaling, for which a feature request has already been filed.
On Medium, you can find a lot of useful information about saving costs.
One way to control your Cloud Composer costs is autoscaling: the number of nodes in the underlying GKE cluster can be set to autoscale (follow this guide). Keeping the Cloud Composer environment small and its running time short is best practice.
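For illustration, node autoscaling for the environment's GKE cluster can be enabled roughly like this; it is only a sketch that shells out to gcloud, and the cluster name, zone, node pool, and min/max bounds below are placeholders:

    # Sketch: enable autoscaling on the GKE cluster that backs a Composer
    # environment by shelling out to the gcloud CLI. The cluster name, zone,
    # and node pool are placeholders for illustration only.
    import subprocess

    CLUSTER = "us-central1-my-composer-env-gke"  # hypothetical cluster name
    ZONE = "us-central1-a"                       # hypothetical zone
    NODE_POOL = "default-pool"

    subprocess.run(
        [
            "gcloud", "container", "clusters", "update", CLUSTER,
            "--zone", ZONE,
            "--node-pool", NODE_POOL,
            "--enable-autoscaling",
            "--min-nodes", "1",
            "--max-nodes", "3",
        ],
        check=True,
    )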
Cloud Composer charges for the compute resources allocated to an environment, and its components continue to run even when no DAGs are deployed. There isn't much you can reduce or turn off, so you may want to consider other services, such as Dataflow, which is serverless.
I hope you find this information useful.

You can now take a snapshot from the GUI or the v1beta API and then delete the environment. When you want to work on it again, simply create a new environment and load the snapshot from GCS via the GUI or the API. Creating the environment and loading the snapshot may each take 20-30 minutes.
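As a rough sketch of that workflow with the gcloud CLI (the beta snapshots command group and its flag names are written from memory and may differ by SDK version; the environment name, location, and bucket are placeholders):

    # Sketch: snapshot a Composer environment before deleting it, then load
    # the snapshot into a freshly created environment later. The command group
    # and flag names (--snapshot-location, --snapshot-path) are assumptions
    # and may differ by gcloud version; the names and bucket are placeholders.
    import subprocess

    ENV = "my-composer-env"                         # hypothetical environment
    LOCATION = "us-central1"
    SNAPSHOT_BUCKET = "gs://my-composer-snapshots"  # hypothetical bucket

    # Save a snapshot to GCS, then delete the environment from the console/API.
    subprocess.run(
        ["gcloud", "beta", "composer", "environments", "snapshots", "save", ENV,
         "--location", LOCATION, "--snapshot-location", SNAPSHOT_BUCKET],
        check=True,
    )

    # Later, after recreating the environment, load the saved snapshot back.
    # The full snapshot path is whatever the save step wrote under the bucket.
    subprocess.run(
        ["gcloud", "beta", "composer", "environments", "snapshots", "load", ENV,
         "--location", LOCATION, "--snapshot-path", SNAPSHOT_BUCKET + "/snapshot-folder"],
        check=True,
    )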

Related

GCP Storage usage from Firebase Cloud Functions

Since the Node v10 changes to Cloud Functions, one thing that we've come to accept is some default storage usage within our GCP project that accrues a cost of ~$0.03 a month. It is believed to be Docker cache files that store the Cloud Functions, node modules, and Hosting revisions.
But on an almost empty project, how can you have 535MB in storage when the source is only 83MB total?
no hosting
2 cloud functions
What are the contents of the files? Is it overhead for the revision history? Is there any reasonable way to minimize it? I get asked these questions many times and I do not have an answer that I feel comfortable with.
It's the entire container image, so I imagine it contains a lot of things that you don't deploy yourself - like the operating system the code runs on, and the runtime of the language you use (Node.js if you're deploying with Firebase).

Autoscaling Google Cloud Composer

I have read this Medium article which is one of the top hits when searching for autoscaling and Cloud Composer. It shows some 'hacks' that you can use to autoscale Composer while it remains configured to use the CeleryExecutor.
I have also read the GCP docs on using KubernetesPodOperator (KPO) with Cloud Composer, and have implemented that before.
However, using KPO means that you don't get to utilize all the other Airflow Operators - you have to write your own container and code every time.
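For reference, a minimal KPO task looks roughly like this; the image, namespace, and entrypoint are placeholders, and the import path varies across Airflow/Composer versions:

    # Minimal sketch of a KubernetesPodOperator task in a Composer DAG.
    # The image, namespace, and command are placeholders; older environments
    # import the operator from airflow.contrib.operators.kubernetes_pod_operator.
    from datetime import datetime

    from airflow import DAG
    from airflow.providers.cncf.kubernetes.operators.kubernetes_pod import (
        KubernetesPodOperator,
    )

    with DAG(
        dag_id="kpo_example",
        start_date=datetime(2023, 1, 1),
        schedule_interval=None,
        catchup=False,
    ) as dag:
        # Each task supplies its own container image - the "write your own
        # container and code every time" overhead mentioned above.
        run_in_pod = KubernetesPodOperator(
            task_id="run_in_pod",
            name="run-in-pod",
            namespace="default",
            image="gcr.io/my-project/my-task-image:latest",  # hypothetical image
            cmds=["python", "-m", "my_task"],                # hypothetical entrypoint
        )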
KubernetesExecutor seems to be the best way forward - you get to use the Airflow Operators, and autoscaling can be enabled since it will create a new Kubernetes Pod for every task instance.
Google Cloud Composer currently runs on the CeleryExecutor; the Blocked Airflow Configurations page currently states:
Cloud Composer configures Airflow to use the Celery executor for the core-executor setting.
Will KubernetesExecutor ever be an option for Composer?
Unfortunately, your question can't be answered yet, as there are no official plans to do so. That said, I would be surprised if this wasn't at least under consideration by the Cloud Composer product team.
But as soon as there is news about it, it should be published in this Feature Request.
I would recommend the airflow-executors-explained overview for a feature comparison of the CeleryExecutor and the KubernetesExecutor. As you have already found in the links you provided, the CeleryExecutor does provide scalability for the Composer environment. Having the KubernetesExecutor as an option would be nice, but it is not essential, as the additional benefits don't outweigh the downsides.

Deploying WordPress as AWS Lambda functions?

I am wondering if it is feasible to deploy WordPress as a series of Lambda functions behind AWS API Gateway. Any pointers on the feasibility/gotchas would be greatly appreciated!
Thanks in advance,
PKK
You'll have a lot of things to consider around persistence, and even before that, Lambda doesn't support PHP out of the box. I'd probably look at Microsoft Azure Functions instead, which does support PHP and has persistent storage.
While other languages (such as Go, Rust, Swift etc.) can be "wrapped" to run in AWS Lambda with relative ease, compiling PHP targeting the same platform and running it is a bit different (and certainly more painstaking). Think about all the various PHP modules you'd need for starters. Moreover, I can't imagine performance will be as good as something like a Go binary.
If you can do something clever with the Phalcon framework and come up with an easy build and deploy process, then maayyyybee.
Though, you'd probably need to really overhaul something like WordPress which was not designed for this at all. It still uses some pretty old conventions due to the age of the project and while that is all well and good for your typical PHP server, it's a different ball game in the sense of this "portable" PHP installation.
Keep in mind that PHP sessions are relied upon as well and so you're going to need to move those elsewhere due to the lack of persistence with AWS Lambda. You can probably find some sort of plugin for WordPress that works with Redis?? I have to imagine something like that has been built by now... But there will be many complications.
I would seriously consider using Azure Functions to begin with OR using Docker and forgoing the pricing model that cloud functions offers. You can still find some pretty cheap and scalable hosting out there.
What I've done previously was use AWS ECS (Docker) with EFS (network storage) for persistence and RDS for the database. While this doesn't carry the same pricing model as Lambda, it is still cost efficient. You can set up your ECS Service to autoscale up and down. So that way you're running the bare minimum until you need more.
I've written a more in-depth article about it here: https://serifandsemaphore.io/how-to-host-wordpress-like-a-boss-b5993fcfbd8e#.n6fbnf8ii ... but it's basically just the idea of running WordPress in Docker and using EFS to offload the persistent storage issues. You can swap many of the pieces of the puzzle out if you like: use a database hosted in some other Docker service, or Compose, or wherever - that part need not be RDS, for example. Even your storage could be handled in a different way, though EFS worked pretty well! The only major thing to note about EFS is the write speed; most WordPress sites are read-heavy, though. Your mileage will vary depending on your needs.
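For the autoscaling piece, a hedged boto3 sketch could look like the following; the cluster and service names are placeholders and the bounds and CPU target are arbitrary:

    # Sketch: let the ECS service that runs the WordPress containers scale
    # between 1 and 4 tasks based on average CPU. Cluster and service names
    # are placeholders.
    import boto3

    autoscaling = boto3.client("application-autoscaling")

    autoscaling.register_scalable_target(
        ServiceNamespace="ecs",
        ResourceId="service/wordpress-cluster/wordpress-service",  # hypothetical
        ScalableDimension="ecs:service:DesiredCount",
        MinCapacity=1,
        MaxCapacity=4,
    )

    autoscaling.put_scaling_policy(
        PolicyName="wordpress-cpu-tracking",
        ServiceNamespace="ecs",
        ResourceId="service/wordpress-cluster/wordpress-service",
        ScalableDimension="ecs:service:DesiredCount",
        PolicyType="TargetTrackingScaling",
        TargetTrackingScalingPolicyConfiguration={
            "TargetValue": 60.0,
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "ECSServiceAverageCPUUtilization"
            },
        },
    )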
Is it possible? Yes, anything is possible with enough time and effort. Is it worth it? That is a question best to ask yourself.
PHP can be run on Lambda as per the documentation located here: https://aws.amazon.com/blogs/compute/scripting-languages-for-aws-lambda-running-php-ruby-and-go/ .
The bigger initial problem, as stated in other comments, is a persistent file system. S3 for media storage is doable via a WordPress plugin (again, from the comments), but any other persistent storage for request/script execution is the biggest hurdle. Tackle one problem at a time till you get to the end!

Network Scheduler Service

I am planning a project to schedule scripts on multiple Windows and Linux servers. I'm kind of going down the path of doing this all from scratch because I have requirements that alternative software doesn't seem to meet (such as running tasks on completion or failure of other tasks, and being able to schedule at non-standard intervals).
I was thinking about having a web interface which will allow users to add/modify/delete schedules for each machine to a database.
A Windows service will then check the database for any jobs that need to run at that point and connect over SSH for Linux or PowerShell for Windows. All the scripts will write their progress back to the database so that it can be checked by the user.
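Here is a rough sketch of the loop I have in mind (the jobs table, its columns, and the SSH details are placeholders, and the PowerShell branch for Windows hosts is left out):

    # Rough sketch of the polling loop: check a jobs table for due work,
    # run it over SSH on Linux hosts, and write the result back.
    # Table/column names and connection details are hypothetical placeholders.
    import sqlite3
    from datetime import datetime

    import paramiko

    def run_due_jobs(db_path="scheduler.db"):
        conn = sqlite3.connect(db_path)
        cur = conn.execute(
            "SELECT id, host, username, command FROM jobs "
            "WHERE next_run <= ? AND status = 'pending'",
            (datetime.utcnow().isoformat(),),
        )
        for job_id, host, username, command in cur.fetchall():
            ssh = paramiko.SSHClient()
            ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
            ssh.connect(host, username=username)  # key-based auth assumed
            _, stdout, _ = ssh.exec_command(command)
            exit_code = stdout.channel.recv_exit_status()
            ssh.close()
            conn.execute(
                "UPDATE jobs SET status = ?, last_exit_code = ? WHERE id = ?",
                ("failed" if exit_code else "done", exit_code, job_id),
            )
        conn.commit()
        conn.close()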
Basically, I just wanted some advice from people who know better ways, or things I may need to look out for that could cause problems, because I don't have much experience.
Thanks.
Oracle Scheduler has all the options you are looking for, and probably more. See Overview of Oracle Scheduler for some global info. It comes down to having a central scheduler database that submits jobs to remote job agents, which do the work pretty much independently of the central scheduler repository. It reports back status etc. when the repository is accessible after a job has finished.
It's a very powerful tool, and it takes away a lot of complex tasks by giving you a framework that you can start using right out of the box.

How do you handle scheduled tasks for your websites running on IIS?

I have a website that's running on a Windows server and I'd like to add some scheduled background tasks that perform various duties. For example, the client would like users to receive emails that summarize recent activity on the site.
If sending out emails was the only task that needed to be performed, I would probably just set up a scheduled task that ran a script to send out those emails. However, for this particular site, the client would like a variety of different scheduled tasks to take place, some of them always running and some of them only running if certain conditions are met. Right now, they've given me an initial set of things they'd like to see implemented, but I know that in the future there will be more.
What I am wondering is if there's a simple solution for Windows that would allow me to define the tasks that needed to be run and then have one scheduled task that ran daily and executed each of the scheduled tasks that had been defined. Is a batch file the easiest way to do this, or is there some other solution that I could use?
To keep life simple, I would avoid building one big monolithic exe; break the work into individual tasks and have a Windows scheduled task for each one. That way you can maintain the codebase more easily and change functionality at a more granular level.
You could, later down the line, build a Windows service that dynamically loads plugins for each different task based on a schedule. This may be more reusable for future projects.
But to be honest if you're on a deadline I'd apply the KISS principle and go with a scheduled task per task.
I would go with a Windows Service right out of the gates. This is going to be the most extensible method for your requirements, creating the service isn't going to add much to your development time, and it will probably save you time not too far down the road.
We use the Windows Scheduler Service, which launches a small console application that just passes parameters to the web service.
For example, if a user has scheduled reports #388 and #88, a scheduled task is created with a command line like this:
c:\launcher\app.exe report:388 report:88
When the scheduler fires, this app just executes a web method on the web service, for example InternalService.SendReport(int id).
Usually you already have all the required business logic available in your web application. This approach lets you use it with minimal effort, so there is no need to create a complex .exe or Windows service with pluggable modules, etc.
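A rough Python equivalent of that launcher could look like this (the URL and the report: argument format are illustrative only):

    # Sketch of the small launcher: parse "report:388"-style arguments from the
    # scheduled task's command line and forward them to the web application.
    # The endpoint URL is a placeholder, not a real API.
    import sys
    import urllib.request

    BASE_URL = "https://example.internal/InternalService/SendReport"  # hypothetical

    def main(argv):
        for arg in argv:
            kind, _, value = arg.partition(":")  # e.g. "report:388"
            if kind == "report" and value.isdigit():
                # Equivalent to calling InternalService.SendReport(int id)
                urllib.request.urlopen(f"{BASE_URL}?id={value}", timeout=60)

    if __name__ == "__main__":
        main(sys.argv[1:])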
The problem with doing the operations from the scheduled EXE, rather than from inside a web page, is that the operations may benefit from, or even outright require, resources that the web page would have -- IIS cache and an ORM cache are two things that come to mind. In the case of ORM, making database changes outside the web app context may even be fatal. My preference is to schedule curl.exe to request the web page from localhost.
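If curl.exe isn't available, the same scheduled request is a few lines of Python (the localhost path is a placeholder):

    # Sketch: request the task-runner page on localhost, exactly as the
    # scheduled curl.exe call would. The URL path is a placeholder.
    import urllib.request

    urllib.request.urlopen("http://localhost/tasks/run-scheduled", timeout=300)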
Use Windows Scheduled Tasks, or create a Windows service that does the scheduling itself.
