Firebase storage artifacts is huge and keeps increasing - firebase

I've just noticed that my app's storage started to increase significantly.
After having a closer look, it appeared that this was caused by the "artifacts" bucket.
I can see that the "artifacts" storage keeps increasing by about ~800Mb every week which worries me to say the least.
I assume this is related to firestore functions deploys (or not?), but is this really expected? Can I cleanup this artifacts safely?
Appreciate any suggestions on how to safely handle storage size in this case and to keep its consumption at minimum.

Figured out a solution - it appeared there is a way to setup an auto deletion rule in google cloud console for those images that clutter the storage.
go to the google cloud console, select your project -> storage -> browser https://console.cloud.google.com/storage/browser
Select the "artifacts" bucket
Under the "lifecycle" tab add a rule to auto delete old images (in my case I put "delete after 1 day since update" which works fine for me)
Storage is safe now!
NOTE: if you face any deployment issues later, like if you deploy several days in a row and if it gives you an error on deploy, just delete the whole "container" folder manually in the artifacts which should solve it and then redeploy again. (make sure not to delete the artifacts bucket itself!)
Hope the firebase team will improve that - the current behavior looks confusing as it easily leads to an unexpected bill unless you take extra steps to prevent that. But you'll never know that it will happen until it does.

I assume this is related to firestore functions deploys (or not?), but is this really expected?
Yes, it's expected. Every time you deploy functions, Cloud Build will use a dedicated Cloud Storage space for the built docker image, and retain it until you delete it.
Can I cleanup this artifacts safely?
Yes, but then you won't be able to easily revert to a prior image. You would have to deploy again from your own source code.

On top of the GCP's Life Cycle settings for artifacts images, you can also consider the following for further optimization and costs reduction of your Firebase Functions deployment:
Clean up your functions folder, don't put unnecessary files in it, as we do not know if Google will only upload files by dependencies or by the whole functions folder. Feel free to refine this item if anyone of you can confirm this.
Remove unnecessary dependencies from functions/package.json, functions/node_modules and require statements from your JS files, e.g. functions/index.js.
Compact and compress your function's JS files by removing unnecessary comments, console loggings etc. You can achieve this with the help of grunt and uglify NPM packages. Again, we're not sure if the Cloud Build (or any of the Google functions deployment system) will auto-compress the function's images for us before storing them into the Container Registry or Cloud Storage (please refine this item if you have better answer).
Organize your functions properly by creating relevant function groups so that you can deploy only certain group(s) of function rather than simply firebase deploy --only functions.
If necessary, write codes that automatically detect and resolve environmental differences, e.g. environment variables from local emulators to production/staging, because the Firebase emulators and production environments may not be 100% consistent. If you don't do that, you may end up needing to deploy several times per day due to certain negligence -- this will spike up your deployment cost.
If necessary, change your deployment plan: from daily to weekly, or even from weekly to monthly, depending on your monthly budgets, criticality, and urgency.
Lastly, I hope the community can also help to add more recommended costs reduction plans and strategies on this post in order to help some small businesses and individuals to survive better on Firebase and Google Cloud Platform as a whole. Even just some links to good articles would help. Thanks!

Related

Firebase Cloud Functions -- package all in a single VS Code project, or create multiple VS Code projects?

I am new to cloud functions and a little unclear about the way they are "containerized" after they are written and deployed to my project.
I have two quite different sets of functions. One set deals with image storage and firebase, another deals with some time consuming computations. They two sets (lets call them A and B) of functions use different node modules and have no dependecies on each other, except they both use Firestore.
My question is wehther it matters if I put all the functions in a single VS Code project, or if I should split them up in separate projects? One question is on the deployment side? (It seems like you deploy all the functions in the project when you run firebase deploy changes, even if some of the functions haven't changed, but probably more important is whether or not functions which don't need sharp or other other image manipulation packages are "containerized" together with other functions which maybe need stats packages and math related packages, and does it make any difference how they are organized into projects?
I realize this question is high level and not about specific code, but its not so clear to me from the various resources what is the appropriate way to bundle these two sets of unrelated cloud functions to not waste a lot of unecessary loading once theya re deployed out to Firestore.
Visual studio code project is simply a way to package your code. You can create 2 folder in your project, one for each set of function with their own firebase configuration.
Only the source repository can be a constraint here, especially if 2 different teams work on the code base and each one doesn't need to see the code of the other set of functions
In addition, if you open a VS code project with the 2 set of functions, it will take more time to load them and to lint them.
On Google Cloud side, each functions are deployed in their own container. Of course, because the packaging engine (Buildpack) doesn't know, the whole code is added inside the container. When the app start, the whole code is loaded. More you have code, longer will be the init.
If you have segregate your set of functions code in different folder in your project, only the code for the set A will be embedded in the container of functions A, and same thing for B.
Now, of course, if you put all the functions at the same level and the functions doesn't use the same data, the same code and so on, it's:
The mess to understand which function do what
The mess in the container to load too much things
So, it's not a great code base design, but it's beyond the "Google Cloud" topic, and an engineering choice.
Initially I was really confused on GCP project vs VS Code IDE project...
On a question about how cloud functions are "grouped" into containers during deployment - I strongly believe that each cloud function "image" is "deployed" into its own dedicated and separate container in the GCP. I think Guillaume described it absolutely correctly. At the same time, the "source" code packed into an "image" - might have a lot of redundancies, and there may be plenty of resources, which are not used by the given cloud function. it may be a good idea to minimize that.
I also would like to suggest, that neither development nor deployment process should depend on the client side IDE, and ideally the deployment should not happen from the client machine at all, to eliminate any local configuration/version variability between different developers. If we work together - I may use vi, and you VS Code, and Guillaume - GoLand, for example. There should not be any difference in deployment, as the deployment process should take all code from (origin/remote) git repository, rather than from the local machine.
In terms of "packaging" - for every cloud function it may be useful to "logically" consolidate all required code (and other files), so that all required files are archived together on deployment, and pushed into a dedicated GCS bucket. And exclude from such "archives" any not used (not required) files. In that case we might have many "archives" - one per cloud function. The deployment process should redeploy only modified "archives", and don't touch unmodified cloud functions.

How to delete outdated Firebase Cloud function containers from GC Storage?

So recently Firebase started charging for Cloud Functions container storage: https://firebase.google.com/pricing
No free usage $0.026/GB
I have deployed 2 functions several times (no more than 10 times, can't remember exact count, but this is still pretty low, IMO). Now I am already billed a small amount (fractions of a cent for now). So seems that if I deploy the functions another few dozens of times, I'll get close to a dollar, because old (and unused) containers are not deleted from the storage bucket.
Is there a way to safely delete outdated, not used containers to free some space? Well, it may seem that a few cents are not worth the time, but still, that's not what a free tier should be like.
I found the only robust solution to this ongoing issue (for now) is to periodically remove all of the artifact files (following Doug's instructions). As noted by others, removing some of the files can cause subsequent deploy errors (I experienced these).
IMPORTANT: Only delete the artifact files, NOT the folders as this can also cause issues.
You can do partial or full deploys as normal without any issues (it seems that the artifact files are only referenced during the build/deploy process).
Not ideal by a long shot, but at least reduces the storage usage to the minimum (until it starts accumulating again).
Edit: I have experimented with Lifecycle rules in the artifacts bucket to try and automate the clearing out of the container, but the parameters provided cannot guarantee that ALL will be cleared in one hit (which you need it to).
For convenience, you can see the artifacts bucket from within the Firebase Storage UI by selecting the "Add Bucket" option and importing the buckets from GCP.
Go to the Cloud console
Select "Cloud Storage -> Browser" from the products in the hamburger menu.
You will see multiple storage buckets there. Simply dig in to the buckets that start with "artifacts" or end with "cloudbuild" and delete the old files (by date) that you don't want.
In case of Firebase Cloud Functions you can see from their documentation (lifecycle of a background function section):
When you update the function by deploying updated code, instances for older versions are cleaned up along with build artifacts in Cloud Storage and Container Registry, and replaced by new instances.
When you delete the function, all instances and zip archives are cleaned up, along with related build artifacts in Cloud Storage and Container Registry. The connection between the function and the event provider is removed.
It means that there is no need to manually cleanup and firebase deploy scripts are doing it automatically.
You should not remove build artifacts since cloud functions are scaling automatically and new instances are built from theese artifacts.
I don't really think that cost is nearly a problem since it's 0.026$/GB so you need very approximately about 76 functions to pay 1$ for their artifacts storage (I take approx size of 1 function's artifacts as 500mb). Also artifacts size should not grow up with every functions since it's basically size of dependencies which are more or less independent from number of deployed functions.

Specifying RAM in firebase cloud function

Trying to boost the memory of an existing cloud function using the GCP functions console. I've been able to perform this operation before using Edit Function/Variable, Networking, and Advanced Settings, but now after specifying my preferred memory limit I am taken to a secondary screen asking me to upload my code. This seems redundant, as I'd just like to redeploy with more RAM.
Memory, CPU, region, runtime, and so on are all options you choose at the time of deployment. They are not dynamically configurable options that you can simply tweak over time without redeploying. You're being asked to upload your code, because you need to go through the process of deployment again. The system doesn't assume that you wan to use exactly what you had previously.
Since you tagged this with "firebase", I'll assume that you're using the Firebase CLI for deployment. In that case, you will have to change the builders in your code to use the new settings you want to apply to the next deployment.

Reduce latency when rebuilding/updating Google Cloud Composer?

I am working with Google Cloud Composer, and whenever I change an environment variable or or any setting in airflow, it will trigger a rebuild of the airflow environment.
I was thinking there may be a way to resolve package dependencies (ie. upload python packages) to cut back on the rebuild latency.
I ask because the rebuild can take anywhere from 2-15 mins.
Has anyone had any luck with reducing this build time (with or without increasing costs)?
Environment updates can generally take between 5-30 minutes in Cloud Composer and it works as intended at this moment. Please check this public issue tracker for more insight. You can click on +1 to make it more visible to the Cloud Composer engineering team.
Please note, that the Composer needs to take care of a lot of resources, i.e. deploys Airflow within the Google Kubernetes Engine and App Engine, which forces rebuilding images of the container, updating the Airflow Webserver and so on.
I suggest you to take a look in the following Cloud Composer architecture documentation, where you can find all the components that needs to be updated with each change.
You can check if any of your PODs are in Evicted state, which means that your node is on low resource, so you should consider using higher resource machine instead of the standard one n1-standard-2.
I hope you find the above pieces of information useful.

Travis and Firebase: deploy only changed functions

I'm using Travis to automatically deploy my Firebase hosted website and cloud functions as I push to GitHub, as detailed here. However, even for my small website with a limited amount of cloud functions, deploying all of the functions takes quite a long time. Were I deploying manually, I would be able to use --only to specify precisely those functions that I actually changed. Is there a way to make this information available to Travis, so that only the necessary functions are rebuilt?
https://m.youtube.com/watch?v=iyGHW4UQ_Ts
min 30 and following
This guy solves the problem by copying all functions to a cloud bucket and then making a diff for every file. This works well if all your logic is in one file. But this is not what you want for larger projects. For my own project i used webpack to create one file for each function that includes the imports. then i generate a md5 hash for that file and save it to a functions-lock.json. with the next run i can easily check against the old hash value and only deploy the changed functions. The ci should manage the state of the lock file by uploading it to the cloud or doing some git magic
Unfortunately this isn't going to be simple to do -- the Firebase CLI deploys all of your functions because it's next-to-impossible to just analyze the code and figure out which functions are impacted (since you can require other files, you might have updated dependencies but no files changed, etc.).
One thing I can think of that might be a hack would be to have named branches for functions or groups of functions. Then you could git push to the branch of the specific function you want to deploy, and have a script that uses the branch name as a signal to pass the --only functions:<fnName> to the firebase deploy command. That's not the most glamorous solution, but, depending on how much this bugs you, it might help.
So this is a bit late but the long deployment times have bothered us for a while now.
Our solution is based on CircleCI but it should be possible to adapt.
First we get all changed files in the last merged PR for our branch with
git log -m -1 --name-only --pretty="format:" ${process.env.CIRCLE_SHA1}
CIRCLE_SHA1 is the SHA of the last merge commit, i.e featurebranch -> master
Then we get all the function filenames from our /functions/ directory and use
madge to generate an array of all the dependencies those functions have.
Next we go trough all changed files that we got from git and check if their filename is part of the dependency array for a sepcific cloud function, if so we add the cloudfunction to another array.
once this is done we pretty much have an array from all cloudfunctions that have been affected by the change of a specific file that we now can map to their actual cloud function names for deployment.
Now instead of always deploying 75 cloudfunctions which takes 45 minutes we only deploy maybe 20.

Resources