Cloud Composer pricing - airflow

I've set up a Cloud Composer environment and I'm very confused by the pricing. On the pricing page, it's written that any environment has at least 2 DB vCPUs and 2 web server vCPUs.
Further down the page, it's written that it's possible to partially run the environment (e.g. 25%/month).
However, in my case, my environment seems to be running 24 hours a day (my billing report shows ~48 hours of usage per day, which I guess corresponds to 24h per vCPU).
The problem is that I have nothing running on it except the airflow_monitoring DAG, which I can't stop even by switching it off or deleting it (though I read here, Why is there an automatic DAG 'airflow_monitoring' generated in GCP Composer?, that it is necessary).
So my question is: why does the doc say I can partially run my environment when it seems to be running all the time without any DAG scheduled on it?

It's not currently possible to "turn off" an environment aside from deleting it entirely, because there are certain system components that always need to be active (like the managed database). The example is referring to using an environment for development for a week, and then getting rid of it completely.
Airflow consumes CPU and memory regardless of whether DAGs are running or scheduled, so Composer charges for the GCE instances/GKE cluster it's deployed on.

Related

Reduce latency when rebuilding/updating Google Cloud Composer?

I am working with Google Cloud Composer, and whenever I change an environment variable or any other setting in Airflow, it triggers a rebuild of the Airflow environment.
I was thinking there may be a way to resolve package dependencies (i.e. upload Python packages) to cut back on the rebuild latency.
I ask because the rebuild can take anywhere from 2-15 mins.
Has anyone had any luck with reducing this build time (with or without increasing costs)?
Environment updates generally take between 5 and 30 minutes in Cloud Composer, and this currently works as intended. Please check this public issue tracker for more insight; you can click +1 to make it more visible to the Cloud Composer engineering team.
Please note that Composer needs to take care of a lot of resources: it deploys Airflow on Google Kubernetes Engine and App Engine, which forces rebuilding container images, updating the Airflow web server, and so on.
I suggest you take a look at the Cloud Composer architecture documentation, where you can find all the components that need to be updated with each change.
You can check whether any of your pods are in the Evicted state, which means your node is running low on resources; in that case you should consider using a machine type with more resources instead of the default n1-standard-2.
I hope you find the above pieces of information useful.

Azure Git Deploy Doesn't Include More Recent Commits

I am deploying a .NET application to Azure via a git deployment. However, when I launch the deployment I can see it spinning on the website and eventually complete, yet it has an older commit message listed and not the most recent commits. I also confirmed in /site/deployments/ that it is using an older commit ID. What could possibly cause the deploy to appear as though it works but essentially ignore a bunch of commits after a certain point? I also confirmed in the code that it isn't getting updated.
It looks like it is just taking the commit on top (the one deployed a few weeks back) and updating the date of it to be today.
UPDATE: I will also say that this branch was initially an autodeploy and it was failing due to permissions for some reason. I have a bunch of instances with a similar code base and setup, and all of them work. Not sure if the fact that it was an autodeploy at first makes a difference, but why would pushing directly to the remote's master look like it is working but not include updates?
UPDATE 2:
Here are photos of the progression: there are definitely commits after the one it keeps reusing, which I cherry-picked from other branches.

Updating code on production server when using Go

When I develop and update files on the production server with PHP, I just copy the files on the fly and everything seems to work without interrupting the server.
But to update the code of a Go application I would need to kill the server, copy the src files to the server, run go install, and then start the server again. This would interrupt the service, and if I do it quite often it is going to look very bad to the users of the service.
How can I update files without downtime when using Go with Go's http server?
PHP is an interpreted language, which means you provide your code in source format and the PHP interpreter will read it and execute it (it may create a more compact binary form so that it doesn't have to analyze the source again when needed).
Go is a compiled language: it compiles into a native executable binary. Moreover, it is statically linked, which means all the code and libraries your app refers to are compiled and linked when the executable is created. This implies you can't just "drop in" new Go modules into a running application.
You have to stop your running application and start the new version. You can, however, minimize the downtime: only stop the running application when the new version of the executable has already been built and is ready to run. You may choose to compile it on a remote machine and upload the binary to the server, or upload the source and compile it on the server; it doesn't matter.
With this you could decrease the downtime to a few seconds at most, which your users won't notice. Also, you shouldn't update every hour; you can't really achieve significant updates in just an hour of coding anyway. You could schedule updates daily (or even less frequently), and schedule them for hours when your traffic is low.
If even a few seconds downtime is not acceptable to you, then you should look for platforms which handle this for you automatically without any downtime. Check out Google App Engine - Go for example.
The grace library will allow you to do graceful restarts without annoyance for your users: https://github.com/facebookgo/grace
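As a rough illustration, a minimal server wired through gracehttp might look like the sketch below. The port and handler are placeholders, and the comments reflect my understanding of how the library behaves (it restarts on SIGUSR2 and hands the listening socket to the new binary), so treat it as a starting point rather than a drop-in implementation:

    package main

    import (
        "fmt"
        "net/http"

        "github.com/facebookgo/grace/gracehttp"
    )

    func main() {
        mux := http.NewServeMux()
        mux.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
            fmt.Fprintln(w, "hello") // placeholder handler
        })

        // gracehttp.Serve blocks like ListenAndServe; on SIGUSR2 it starts
        // the new binary, passes it the open listener, and lets this process
        // drain its remaining requests before exiting.
        if err := gracehttp.Serve(&http.Server{Addr: ":8080", Handler: mux}); err != nil {
            panic(err)
        }
    }

To deploy, you would copy the new binary into place and send SIGUSR2 to the running process instead of killing it.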
Yet in my experience, restarting Go applications is so quick that unless you have a high-traffic website it won't cause any trouble.
First of all, don't do it in that order. Copy and install first. Then you could stop the old process and run the new one.
If you run multiple instances of your app, then you can do a rolling update, so that when you bounce one server, the others are still serving. A similar approach is a blue-green deployment, which has the advantage that the code your active cluster is running is always homogeneous (whereas during a rolling deploy you'll have a mixture until they've all rolled), and it also works when you normally run only one instance of your app (whereas rolling requires more than one). It does, however, require you to have double the instances during the blue-green switch.
One thing you'll want to take into consideration is in-flight requests: you may want to make sure that in-flight requests continue to go to old-code servers until they're finished.
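If you're on a reasonably recent Go version, the standard library can handle that drain step for you via http.Server.Shutdown (Go 1.8+). Here's a rough sketch; the port, timeout, and choice of signal are assumptions for illustration only:

    package main

    import (
        "context"
        "log"
        "net/http"
        "os"
        "os/signal"
        "syscall"
        "time"
    )

    func main() {
        srv := &http.Server{Addr: ":8080", Handler: http.DefaultServeMux}

        go func() {
            // ListenAndServe returns ErrServerClosed after Shutdown is called.
            if err := srv.ListenAndServe(); err != nil && err != http.ErrServerClosed {
                log.Fatal(err)
            }
        }()

        // Wait until the deploy tooling (or load balancer) asks this instance to stop.
        stop := make(chan os.Signal, 1)
        signal.Notify(stop, syscall.SIGTERM, os.Interrupt)
        <-stop

        // Shutdown stops accepting new connections and waits for in-flight
        // requests to finish, up to the timeout below.
        ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
        defer cancel()
        if err := srv.Shutdown(ctx); err != nil {
            log.Printf("forced shutdown: %v", err)
        }
    }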
You can also look into Platform-as-a-Service solutions, that can automate a lot of this stuff for you, plus a whole lot more. That way you're not ssh'ing into production servers and copying files around manually. The 12 Factor App principles are always a good place to start when thinking about ops.

How to stop hive/pig install in Amazon Data Pipeline?

I don't need Hive or Pig, and Amazon Data Pipeline by default installs them on any EMR cluster it spins up. This makes testing take longer than it should. Any ideas on how to disable the install?
This is not possible as of today.
The only workaround would be to launch a small EMR cluster that you use for testing (e.g. with a single m1.small master), and then use it with 'workergroup' rather than 'runsOn'.
Depending on the type of activities you want to use, the workergroup field might or might not be supported. But you can always wrap everything in a script (Python, shell, or whatever) and use it with ShellCommandActivity.
Update (as correctly pointed out by ChristopherB):
From AMI version 3.x, Hive and Pig are bundled in the AMI itself, so the steps do not pull any new packages from S3 but only activate the daemons on the master node. So unless you are worried about them consuming your instance resources (CPU, memory, etc.), it should be okay; they do not take noticeable time to run.

Automated Deployment and Upgrade Strategy for ASP.Net MVC Application

I am working on an ASP.NET MVC4 project where the same project needs to be deployed to many clients on a daily basis; each client has its own domain/subdomain, a separate app pool, and a separate database (MS SQL).
Doing each deployment manually could take at least 1-2 hours if everything goes well. Is there any way I can automate this?
Moreover, we also need to update all of the apps when a new version is released, maybe one by one or all of them at the same time. Doing this manually could take weeks, and once we have more clients it will not be possible to do these updates manually.
The update involves suspending the app for some time, taking a full backup of files and DB, updating the application code/files in the app folder, upgrading the DB with a script, starting the app again, and then running a diagnostic script to check whether the update was successful; if not, we need to find out what went wrong.
How can we automate these updates? Any idea on how to approach this issue would be great.
As a developer for BuildMaster, I can say that this scenario, known as the "Core Version" pattern, is a common one. If you're OK with a paid solution, you can set up deployment plans within the tool that do exactly what you described.
As a more concrete example, we experience this exact situation in a slightly different way. BuildMaster has a set of 60+ extensions that rely on a specific SDK version. In our recent 4.0 release, we had to re-deploy every extension because of breaking API changes within the SDK. This is essentially equivalent to having a bunch of customers and deploying to them all at once. We have set up our deployment plans such that any time we create a new release of the SDK application, we have the option to set a variable that tells BuildMaster to build every extension that relies on the SDK.
In BuildMaster, the idea is to promote a build (i.e. an immutable object that travels through various environments like Dev, Test, Staging, and Prod) to its final environment, where it becomes the deployed build for the release. In your case, this would mean pushing your MVC application to its final environment, which would then trigger the deployments of all dependent applications (i.e. your customers' instances of your application).
For your scenario, you would only need a single action, "Promote Build". As I mentioned before, any dependents would then be promoted to their final environments, so all your customer deployments would kick off once that action is run during deployment. Our Azure extension's deployment plan for its final environment is set up exactly this way.
These plans are marked as "Shared", which means every extension we have uses the exact same deployment plan but utilizes different variables to handle the minor differences like names, paths, etc.
Since this is such an enormous topic I could go on for ages, but I think that should be sufficient for your use-case if you wanted to try it out.
There are other options, but you could set up Team Foundation Server to deploy automated builds.
http://msdn.microsoft.com/en-us/library/ff650529.aspx
I find the easiest way to do this from an MVC project is to create a publish profile.
This is done by right-clicking your project, selecting Publish, and then configuring it to your needs.
Then from TFS you create a new build definition; this kicks off a wizard which takes you through it.
There are quite a few options which would be too long to go into for every scenario.
The main change I usually find the most important is to set an MSBuild Argument to deploy with the publish profile.
This can be found at Process > Advanced > MSBuild Arguments.
Once this is configured correctly it's a simple case of right-clicking and queuing a new build to build and deploy.
You will need a different publish profile/build configuration per deployment environment.
For backups I use a PowerShell script which can be called manually or from TFS.
You also have a drop folder in TFS which keeps a backup of the last x releases.
The databases are automatically configured to back up via SQL Server; TBH I didn't set that up, it was a DB admin guy who is also involved with releases.
From a dev testing side I use JMeter (http://jmeter.apache.org/) to run some automated scripts that check that users can log in and view certain screens, just to confirm nothing major has gone wrong. However, there is usually a testing team to run more detailed tests, again not set up by me.
All of the above will probably take you some time to set up, but in the long run it will literally save you weeks of time over a year.
A free alternative to TFS is http://www.cruisecontrolnet.org/, which I have used in the past and is pretty good.
You can automate your .Net deployments with Beanstalk, which will give you a way to trigger deployments with a single click, watch progress, manage permissions and see history of deployments. Check out this guide on the topic:
http://guides.beanstalkapp.com/deployments/deploy-dotnet.html
I hope you will find it useful.
P.S. - I work at Beanstalk.
