Reduce latency when rebuilding/updating Google Cloud Composer? - airflow

I am working with Google Cloud Composer, and whenever I change an environment variable or any other Airflow setting, it triggers a rebuild of the Airflow environment.
I was thinking there may be a way to resolve package dependencies up front (i.e. upload Python packages) to cut down on the rebuild latency.
I ask because the rebuild can take anywhere from 2 to 15 minutes.
Has anyone had any luck with reducing this build time (with or without increasing costs)?
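For context, one way to at least pay the rebuild cost less often is to batch all dependency changes into a single environment update (a sketch; the environment name, location, and requirements file are placeholders, and flag availability may vary by gcloud/Composer version):
# Apply all PyPI package changes in one environment update (one rebuild)
gcloud composer environments update my-composer-env \
  --location us-central1 \
  --update-pypi-packages-from-file requirements.txt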

Environment updates can generally take between 5 and 30 minutes in Cloud Composer, and this currently works as intended. Please check this public issue tracker for more insight; you can click +1 to make it more visible to the Cloud Composer engineering team.
Please note that Composer has to take care of a lot of resources, i.e. it deploys Airflow onto Google Kubernetes Engine and App Engine, which forces it to rebuild container images, update the Airflow web server, and so on.
I suggest you take a look at the Cloud Composer architecture documentation, where you can find all the components that need to be updated with each change.
You can also check whether any of your pods are in the Evicted state, which means your node is running low on resources; if so, consider using a higher-resource machine type instead of the standard n1-standard-2.
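A quick way to check for evicted pods from the command line (a sketch, assuming you have access to the environment's GKE cluster; the cluster name and zone are placeholders):
# Point kubectl at the Composer environment's GKE cluster, then look for evicted pods
gcloud container clusters get-credentials my-composer-gke-cluster --zone us-central1-a
kubectl get pods --all-namespaces | grep Evicted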
I hope you find the above pieces of information useful.

Related

How to delete outdated Firebase Cloud function containers from GC Storage?

So recently Firebase started charging for Cloud Functions container storage (https://firebase.google.com/pricing): no free usage, $0.026/GB.
I have deployed 2 functions several times (no more than 10 times in total; I can't remember the exact count, but it is still pretty low, IMO). I am already being billed a small amount (fractions of a cent for now). So it seems that if I deploy the functions another few dozen times, I'll get close to a dollar, because old (and unused) containers are not deleted from the storage bucket.
Is there a way to safely delete outdated, unused containers to free up some space? It may seem that a few cents aren't worth the time, but still, that's not what a free tier should be like.
I found that the only robust solution to this ongoing issue (for now) is to periodically remove all of the artifact files (following Doug's instructions). As noted by others, removing some of the files can cause subsequent deploy errors (I experienced these).
IMPORTANT: Only delete the artifact files, NOT the folders as this can also cause issues.
You can do partial or full deploys as normal without any issues (it seems that the artifact files are only referenced during the build/deploy process).
Not ideal by a long shot, but at least reduces the storage usage to the minimum (until it starts accumulating again).
Edit: I have experimented with lifecycle rules in the artifacts bucket to try to automate the clearing out of the containers, but the available parameters cannot guarantee that ALL files will be cleared in one hit (which is what you need).
For convenience, you can see the artifacts bucket from within the Firebase Storage UI by selecting the "Add Bucket" option and importing the buckets from GCP.
Go to the Cloud console
Select "Cloud Storage -> Browser" from the products in the hamburger menu.
You will see multiple storage buckets there. Simply dig into the buckets that start with "artifacts" or end with "cloudbuild" and delete the old files (by date) that you don't want.
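If you prefer the command line, roughly the same cleanup can be done with gsutil (a sketch; the bucket typically follows the artifacts.<project-id>.appspot.com naming, so substitute your own project ID and review the listing before deleting anything):
# See what is in the artifacts bucket first
gsutil ls "gs://artifacts.my-project-id.appspot.com/containers/images/"
# Delete the old image files (files only, not the bucket itself)
gsutil -m rm "gs://artifacts.my-project-id.appspot.com/containers/images/**"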
In the case of Firebase Cloud Functions, you can see the following in their documentation (the "lifecycle of a background function" section):
When you update the function by deploying updated code, instances for older versions are cleaned up along with build artifacts in Cloud Storage and Container Registry, and replaced by new instances.
When you delete the function, all instances and zip archives are cleaned up, along with related build artifacts in Cloud Storage and Container Registry. The connection between the function and the event provider is removed.
This means there is no need to clean up manually; the firebase deploy scripts do it automatically.
You should not remove build artifacts, since Cloud Functions scale automatically and new instances are built from these artifacts.
I don't really think the cost is much of a problem: at $0.026/GB you would need very roughly 76 functions to pay $1 for their artifact storage (taking the approximate size of one function's artifacts as 500 MB). Also, the artifact size should not grow with every function, since it is basically the size of the dependencies, which is more or less independent of the number of deployed functions.

Firebase storage artifacts is huge and keeps increasing

I've just noticed that my app's storage started to increase significantly.
After having a closer look, it appeared that this was caused by the "artifacts" bucket.
I can see that the "artifacts" storage keeps increasing by about ~800 MB every week, which worries me to say the least.
I assume this is related to firestore functions deploys (or not?), but is this really expected? Can I clean up these artifacts safely?
I'd appreciate any suggestions on how to safely handle the storage size in this case and keep its consumption to a minimum.
Figured out a solution - it turns out there is a way to set up an auto-deletion rule in the Google Cloud console for the images that clutter the storage.
Go to the Google Cloud console, select your project -> Storage -> Browser: https://console.cloud.google.com/storage/browser
Select the "artifacts" bucket
Under the "lifecycle" tab add a rule to auto delete old images (in my case I put "delete after 1 day since update" which works fine for me)
Storage is safe now!
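For reference, a similar rule can also be set from the command line with gsutil (a sketch; the bucket name is a placeholder, and the age condition here counts days since object creation rather than since update, which amounts to much the same thing for these write-once files):
# Write a lifecycle config that deletes objects one day after creation
cat > lifecycle.json <<'EOF'
{"rule": [{"action": {"type": "Delete"}, "condition": {"age": 1}}]}
EOF
# Apply it to the artifacts bucket
gsutil lifecycle set lifecycle.json gs://artifacts.my-project-id.appspot.com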
NOTE: if you face any deployment issues later (e.g. you deploy several days in a row and get an error on deploy), just delete the whole "container" folder manually in the artifacts bucket, which should solve it, and then redeploy. (Make sure not to delete the artifacts bucket itself!)
Hope the Firebase team will improve this - the current behavior is confusing, as it easily leads to an unexpected bill unless you take extra steps to prevent it, and you won't know it is going to happen until it does.
I assume this is related to firestore functions deploys (or not?), but is this really expected?
Yes, it's expected. Every time you deploy functions, Cloud Build will use a dedicated Cloud Storage space for the built docker image, and retain it until you delete it.
Can I clean up these artifacts safely?
Yes, but then you won't be able to easily revert to a prior image. You would have to deploy again from your own source code.
On top of GCP's lifecycle settings for artifact images, you can also consider the following for further optimization and cost reduction of your Firebase Functions deployments:
Clean up your functions folder and don't put unnecessary files in it, as we do not know whether Google uploads only the files required by the dependencies or the whole functions folder. Feel free to refine this item if anyone can confirm this.
Remove unnecessary dependencies from functions/package.json, functions/node_modules and require statements from your JS files, e.g. functions/index.js.
Compact and compress your functions' JS files by removing unnecessary comments, console logging, etc. You can achieve this with the help of the grunt and uglify NPM packages. Again, we're not sure whether Cloud Build (or any other part of Google's functions deployment system) auto-compresses the function images before storing them in Container Registry or Cloud Storage (please refine this item if you have a better answer).
Organize your functions properly by creating relevant function groups so that you can deploy only certain group(s) of functions rather than simply running firebase deploy --only functions, which deploys all of them (see the example after this list).
If necessary, write code that automatically detects and resolves environment differences, e.g. environment variables between the local emulators and production/staging, because the Firebase emulators and production environments may not be 100% consistent. If you don't, you may end up needing to deploy several times per day to fix small oversights, which will drive up your deployment cost.
If necessary, change your deployment plan: from daily to weekly, or even from weekly to monthly, depending on your monthly budgets, criticality, and urgency.
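For the function-groups point above, a partial deploy looks roughly like this (the group and function names are made up for illustration):
# Deploy only the functions in the "orders" group instead of everything
firebase deploy --only functions:orders
# Or only specific named functions
firebase deploy --only functions:addMessage,functions:makeUppercase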
Lastly, I hope the community can also help to add more recommended cost-reduction plans and strategies to this post in order to help small businesses and individuals survive better on Firebase and the Google Cloud Platform as a whole. Even just some links to good articles would help. Thanks!

Cloud Composer pricing

I've set up a Cloud Composer environment and I'm very confused by the pricing. The pricing page says that any environment has at least 2 database vCPUs and 2 web server vCPUs.
Further down the page, it says it's possible to run the environment only partially (e.g. 25% of a month).
However, in my case, the environment seems to be running 24 hours a day (my billing report shows ~48 hours of usage per day, which I assume corresponds to 24 hours per vCPU).
The problem is that I have nothing running on it except the airflow_monitoring DAG, which I can't stop even by pausing or deleting it (though I read here Why is there an automatic DAG 'airflow_monitoring' generated in GCP Composer? that it is necessary).
So my question is: why does the documentation say I can run my environment partially when it seems to be running all the time without any DAGs scheduled on it?
It's not currently possible to "turn off" an environment aside from deleting it entirely, because there are certain system components that always need to be active (like the managed database). The example is referring to using an environment for development for a week, and then getting rid of it completely.
Airflow consumes CPU and memory regardless of whether DAGs are running or scheduled, so Composer charges for the GCE instances/GKE cluster it's deployed on.

What service to use for Meteor App deploy?

I need a Meteor app in production, without worrying about scaling in the future. The app will have at most 500 users online at any one time. I read in the official guide that, for many reasons, the best practice is to deploy to Galaxy.
But there are also many articles about deploying to DigitalOcean and Heroku.
So which option is best?
Use DigitalOcean.
Set up nameservers, set up Node on your droplet, and have Mupx do the deploying. It's very easy and DO is cheap.
DO has easy-to-follow tutorials for the initial setup; you will thank me later and your wallet will be happy.
I have used Galaxy in a production level app for one of our clients and it works fantastic. If you want a deployment where you don't have to worry about scaling at all, then Galaxy is for you.
With time, as your users increase and you find yourself needing more resources, you can assign better containers and CPU power with just the click of one button. It also provides complete logs and some performance metrics to help you in a completely meteor-specific manner.
The deployment process itself is as simple as having a settings JSON file and running a one-line command like:
DEPLOY_HOSTNAME=galaxy.meteor.com meteor deploy appname.meteorapp.com --settings settings.json

Automated Deployment and Upgrade Strategy for ASP.Net MVC Application

I am working on an ASP.NET MVC 4 project where the same project needs to be deployed to many clients on a daily basis; each client will have its own domain/subdomain, a separate app pool, and a separate database (MS SQL).
Doing each deployment manually could take at least 1-2 hours if everything goes well. Is there any way I can do this in some automated fashion?
Moreover, we also need to update all of the apps when a new version is released - maybe one by one or all of them at the same time. Doing this manually could take weeks, and once we have more clients it will no longer be possible to do these updates by hand.
An update involves suspending the app for some time, taking a full backup of the files and database, updating the application code/files in the app folder, upgrading the database with a script, restarting the app, and then running a diagnostic script to check whether the update was successful; if not, we need to find out what went wrong.
How can we automate these updates? Any ideas on how to approach this would be great.
As a developer for BuildMaster, I can say that this scenario, known as the "Core Version" pattern, is a common one. If you're OK with a paid solution, you can set up deployment plans within the tool that do exactly what you described.
As a more concrete example, we experience this exact situation in a slightly different way. BuildMaster has a set of 60+ extensions that rely on a specific SDK version. In our recent 4.0 release, we had to re-deploy every extension because of breaking API changes within the SDK. This is essentially equivalent to having a bunch of customers and deploying to them all at once. We have set up our deployment plans such that any time we create a new release of the SDK application, we have the option to set a variable that tells it to build every extension that relies on the SDK.
In BuildMaster, the idea is to promote a build (i.e. an immutable object that travels through various environments like Dev, Test, Staging, Prod) to its final environment (where it becomes the deployed build for the release). In your case, this would be pushing your MVC application to its final environment, and that would then trigger the deployments of all dependent applications (i.e. your customers' instances of your application). Our SDK's plan is structured in exactly this way.
For your scenario, you would only need the single action, "Promote Build". As I mentioned before, any dependents would then be promoted to their final environments, so all your customer deployments would kick off once that action is run during deployment. As an example, our Azure extension's deployment plan for its final environment follows the same pattern (internal URLs redacted).
These plans are marked "Shared", which means every extension we have uses the exact same deployment plan but different variables to handle minor differences like names, paths, etc.
Since this is such an enormous topic I could go on for ages, but I think that should be sufficient for your use-case if you wanted to try it out.
There are other options, but you could set up Team Foundation Server (TFS) to deploy automated builds.
http://msdn.microsoft.com/en-us/library/ff650529.aspx
I find the easiest way to do this from an MVC project is to create a publish profile.
This is done by right-clicking your project, selecting Publish, and then configuring it to your needs.
Then from TFS you create a new build definition; this kicks off a wizard which takes you through it.
There are quite a few options which would be too long to go into for every scenario.
The main change I usually find the most important is to set an MSBuild Argument to deploy with the publish profile.
This can be found at Process > Advanced > MSBuild Arguments.
Once this is configured correctly, it's a simple case of right-clicking and queuing a new build to build and deploy.
You will need a different publish profile/build configuration per deployment environment.
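For reference, the MSBuild arguments in the build definition typically end up looking something like this (a sketch; the profile name is a placeholder, and the Visual Studio version flag depends on your toolchain):
/p:DeployOnBuild=true /p:PublishProfile=ProductionProfile /p:VisualStudioVersion=11.0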
For backups I use a powershell script which can be called manually or from TFS.
You also have a drop folder in TFS which keeps a backup of the last x releases.
The databases are automatically configured via SQL Server to back themselves up; to be honest, I didn't set that up - it was done by a DB admin who is also involved with releases.
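As a rough idea of what the backup step can look like when scripted (a sketch using sqlcmd and robocopy rather than the PowerShell script described above; server, database, and path names are placeholders):
# Back up the client database to a .bak file
sqlcmd -S .\SQLEXPRESS -Q "BACKUP DATABASE [ClientDb] TO DISK = N'D:\Backups\ClientDb.bak' WITH INIT"
# Mirror the application files to a backup folder
robocopy "D:\Sites\ClientApp" "D:\Backups\ClientApp" /MIR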
On the dev testing side, I use jMeter (http://jmeter.apache.org/) to run some automated scripts that check that users can log in and view certain screens, just to confirm nothing major has gone wrong. However, there is usually a testing team to run more detailed tests - again, not set up by me.
All of the above will probably take you some time to set up, but in the long run it will save you weeks of time over a year.
A free alternative to TFS is CruiseControl.NET (http://www.cruisecontrolnet.org/); I have used this in the past too and it's pretty good.
You can automate your .Net deployments with Beanstalk, which will give you a way to trigger deployments with a single click, watch progress, manage permissions and see history of deployments. Check out this guide on the topic:
http://guides.beanstalkapp.com/deployments/deploy-dotnet.html
I hope you will find it useful.
P.S. - I work at Beanstalk.
