Network traffic when mounting GCP bucket - networking

I'm supporting multiple application clusters, 8 individual clusters, with each cluster backed by a MySQL instance and an NFS server that shares a filesystem with the application servers. I want to make daily backups of the database and the shared filesystem to GCP Cloud Storage. While looking into this task I read that I could create a GCP bucket and mount it, like attaching a shared filesystem, on each MySQL and NFS server instance. Alternatively, I could just create the GCP bucket, make local MySQL and NFS shared-filesystem backups, and then copy them over to the bucket.
There are about 12 MySQL instances and 8 NFS instances.
I would create one GCP bucket with a MySQL folder and an NFS folder, each containing a sub folder for each individual MySQL and NFS instance. There would be possibly 2-3 terabytes of data backed up and continuously stored in the bucket.
Local backup and copy to bucket (a rough sketch of this option follows below):
would never lose the mount
copying over the network using scp
have a local copy to restore from if the mount gets lost
Mount the bucket:
can lose the mount and have no local copy
no copying over the network using scp
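The local-backup-and-copy option would look roughly like this on each MySQL server (bucket names and paths below are just placeholders I made up):
# nightly database dump, then copy to the bucket
mysqldump --all-databases --single-transaction | gzip > /backup/mysql-$(date +%F).sql.gz
gsutil cp /backup/mysql-$(date +%F).sql.gz gs://my-backup-bucket/MySQL/cluster1-db1/
# on an NFS server, the shared filesystem could be synced instead
gsutil -m rsync -r /export/shared gs://my-backup-bucket/NFS/cluster1-nfs1/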
My concern now is with network traffic. Assume that the GCP bucket, including sub folders, contains 2-3 terabytes of data. If I mount the bucket on each MySQL and NFS instance, every instance would mount (share) all sub folders, i.e. each instance would see all the data, including data that belongs to other clusters.
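From what I've read, gcsfuse has an --only-dir flag that limits a mount to a single prefix, so each instance might only need to mount its own sub folder. A rough, unverified example with placeholder names:
gcsfuse --only-dir MySQL/cluster1-db1 my-backup-bucket /mnt/backup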
Does anyone know, approximately:
How much network traffic would a mounted GCP bucket generate?
If I mount this bucket, would there be constant synchronization between the individual MySQL/NFS servers and the GCP bucket, since the mounted bucket acts as a shared filesystem?
Would it be better, network-wise, to copy files from a server to the bucket instead of mounting the bucket?
Thank you for any info.
Cheers,
Roland

Related

Nginx cannot write cache content to the gcsfuse mount directory

I mounted the bucket on a Compute Engine VM via gcsfuse and then ran Nginx in the VM. Normally Nginx writes its cache to the directory, i.e. the mounted bucket, using the proxy_cache directive.
However, I have a problem: Nginx can create cache files in the filesystem under the bucket directory, but the size of the cache files is always 0 B, and client requests keep getting a cache MISS status.
So it seems that after mounting with gcsfuse, Nginx can only create cache files, but cannot write to them.
My VM environment is:
Machine type: n2-standard-2
CPU platform: Intel Cascade Lake
Architecture: x86/64
System: ubuntu-1804
In addition, gcsfuse has been given a service account with owner privileges via the --key-file flag, and Nginx has been run as the root user.
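For reference, the mount is done roughly like this (the bucket name and paths are placeholders, not the exact command used):
gcsfuse --key-file /etc/gcsfuse/sa-key.json --implicit-dirs CACHE_BUCKET /var/cache/nginx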
For example, the following debug log shows an empty file (8/2b/0a7ed71cddee71b9276a0d72a030f2b8) created in the bucket after a client request and never written to. What could be the cause of this?
https://storage.googleapis.com/cloud_cdn_cache_bucket/debug_log.txt
Here is the debug log obtained by running gcsfuse with --debug_fuse --debug_fs --debug_gcs --debug_http -foreground.
You can't use Cloud Storage Fuse for cache.
There is a technical reason for that: GCSFuse is a Cloud Storage API wrapper that transforms system calls into API calls. However, not all system calls are supported, especially those related to "database-style" access: streaming writes, seeks, and rewriting partial content of a file. These are all common operations for a database (or cache), but they are not compatible with Cloud Storage: you can only write, read, or delete an object as a whole. Updates/partial writes aren't supported. It's not a file system!
In addition, because you now know that GCSFuse is a wrapper (of system calls into API calls), you can see that using this kind of file system is not a good idea: the latency is terrible, because every operation is an API call. It is absolutely not recommended for caching or other low-latency operations.
The best solution is to use a local file system dedicated to the cache. But if you scale out (more servers in parallel) you could have issues, because the cache is not shared between instances. You can then:
Use a sticky-session mechanism to always route the same user session to the same NGINX server, and therefore always use the same cache context
Use the Filestore service, which offers an NFS (Network File System) share that can be mounted on different servers, with interesting latency (not as good as a local file system); see the sketch below
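A minimal Filestore sketch, assuming a Filestore instance named nginx-cache, the default network, and a mount point of /var/cache/nginx (all names, zones, and sizes are placeholders):
gcloud filestore instances create nginx-cache --zone=us-central1-a --tier=BASIC_HDD --file-share=name=cache,capacity=1TB --network=name=default
# on each NGINX server, mount the share using the IP reported by "gcloud filestore instances describe":
sudo mount -t nfs FILESTORE_IP:/cache /var/cache/nginx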
You also talked about a key file in your question. I recommend avoiding service account key files as much as you can, especially if your app runs on Google Cloud. Let me know your key file usage in detail if you want more guidance.
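For example, instead of a key file you can attach a service account to the VM and let gcsfuse pick up its credentials from the metadata server; a rough sketch, with hypothetical names:
gcloud compute instances create nginx-vm --zone=us-central1-a --service-account=cache-writer@my-project.iam.gserviceaccount.com --scopes=storage-full
# on the VM, no --key-file is needed; gcsfuse uses the attached service account:
gcsfuse CACHE_BUCKET /var/cache/nginx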

Shared storage for multiple GCE VMs

I'm new here and I've done days of research.
What is the best practice to share files with multiple autoscaling Google Compute Engine VMs?
I'm planning to set up an instance group of VMs running NGINX to serve static files for multiple domains. These VMs would autoscale to n instances and the files would change a lot. I need storage for the files these VMs will serve.
So far I've found these solutions:
1) Persistent disk + rsync -> This should have the smallest latency, but when I reach GBs of files, autoscaled VMs would be syncing for a long time after they spawn, thus throwing 404s.
2) Master VM without web server + nfs/smb -> Small latency, but no redundancy.
3) Cloud Storage + FUSE -> Big latency, great redundancy, no SLA.
4) Shared Persistent disk -> Small latency, but read-only.
5) NGINX + Cloud SQL/Bigtable/Spanner/Datastore -> Meh-ish latency, and I don't feel good about connecting a web server to a DB.
Are there any other better solutions?
Thanks
EDIT: The static files are multiple index.html files -> homepages of multiple domains.
There is also:
6) Firebase Hosting - https://firebase.google.com/docs/hosting
or
7) another way - I would personally go with Cloud Storage, but not FUSE. Or at least not for serving. You can still use FUSE for writing to the bucket(s). Of course, the best way would be to just use the Cloud Storage API from within the application.
For serving files:
I would create a load balancer with a backend bucket pointing to the same bucket the application writes to. Also be sure to enable Cloud CDN on that load balancer.
More details at:
Load balancer - https://cloud.google.com/load-balancing/
Cloud CDN - https://cloud.google.com/cdn/
or just be bold and create a load balancer right now at https://console.cloud.google.com/networking/loadbalancing/loadBalancers/list?project=
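For example, roughly (bucket and resource names are placeholders, and a target proxy plus forwarding rule are still needed to finish the load balancer):
gsutil mb gs://static-files-bucket
gcloud compute backend-buckets create static-backend --gcs-bucket-name=static-files-bucket --enable-cdn
gcloud compute url-maps create static-lb --default-backend-bucket=static-backend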
For serving static files the best option is definitely a load balancer and backend buckets with Cloud CDN enabled.
The load balancer has rules to forward traffic. For example, it can route requests by host, subdomain, or path (a rough gcloud sketch follows the examples below):
*.mysite1.com => bucket1
demo.mysite2.net => bucket1
test.mysite3.com => bucket2
Because files are served with Cloud CDN, the latency becomes minimal.
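A rough gcloud sketch of such host rules, assuming a url map named static-lb and backend buckets named bucket1 and bucket2 already exist (all names are hypothetical):
gcloud compute url-maps add-path-matcher static-lb --path-matcher-name=site1 --default-backend-bucket=bucket1 --new-hosts="*.mysite1.com,demo.mysite2.net"
gcloud compute url-maps add-path-matcher static-lb --path-matcher-name=site3 --default-backend-bucket=bucket2 --new-hosts="test.mysite3.com"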
To write your files to the bucket you could use FUSE, or create the files locally and use gsutil cp.
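For example (paths and bucket names are placeholders):
gsutil -m rsync -r ./public gs://bucket1
gsutil cp index.html gs://bucket1/index.html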
Persistent disks can only be shared between several Compute Engine instances in read-only mode. If you need write access, that won't work.
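For example, an existing disk can be attached to a second instance in read-only mode (names are placeholders; the disk must be attached read-only on every instance that shares it):
gcloud compute instances attach-disk web-2 --disk shared-static-disk --mode ro --zone us-central1-a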
The last option, Cloud SQL + NGINX, is actually pretty good. Cloud SQL is often faster than a self-managed MySQL server, and the connection between Cloud SQL and GCE is easy and reliable.
But it is more a matter of preference here, and of whether you feel comfortable writing the scripts that will read from and write to it.

Flask SQLAlchemy Database with AWS Elastic Beanstalk - waste of time?

I have successfully deployed a Flask application to AWS Elastic Beanstalk. The application uses an SQLAlchemy database, and I am using Flask-Security to handle login/registration, etc. I am using Flask-Migrate to handle database migrations.
The problem here is that whenever I use git aws.push it will push my local database to AWS and overwrite the live one. I guess what I'd like to do is only ever "pull" the live one from AWS EB, and only push in rare circumstances.
Will I be able to access the SQLAlchemy database which I have pushed to AWS? Or, is this not possible? Perhaps there is some combination of .gitignore and .elasticbeanstalk settings which could work?
I am using SQLite.
Yes. Your database should not be in version control; it should live on persistent storage (most likely Elastic Block Store (EBS)), and you should handle schema changes (migrations) using something like Flask-Migrate.
The AWS help article on EBS should get you started, but at a high level, what you are going to do is (roughly sketched after the list below):
Create an EBS volume
Attach the volume to a running instance
Mount the volume on the instance
Expose the volume to other instances using a Network File System (NFS)
Ensure that when new Elastic Beanstalk instances launch, they mount the NFS share
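Roughly, with the AWS CLI and standard Linux tools (IDs, zone, device names, and paths are placeholders):
aws ec2 create-volume --availability-zone us-east-1a --size 20 --volume-type gp2
aws ec2 attach-volume --volume-id vol-0123456789abcdef0 --instance-id i-0123456789abcdef0 --device /dev/xvdf
# on the instance that owns the volume:
sudo mkfs -t ext4 /dev/xvdf
sudo mkdir -p /data && sudo mount /dev/xvdf /data
# export /data over NFS so other instances can mount it:
echo "/data *(rw,sync,no_root_squash)" | sudo tee -a /etc/exports
sudo exportfs -a
# on newly launched instances (e.g. via user data):
sudo mount -t nfs NFS_HOST_PRIVATE_IP:/data /data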
Alternatively, you can:
Wait until Elastic File System (EFS) is out of preview (or request access) and mount it on all of your EB-started instances once EB supports EFS.
Switch to the Relational Database Service (RDS) (or run your own database server on EC2) and run an instance of (PostgreSQL|MySQL|Whatever you choose) locally for testing.
The key is hosting your database outside of your Elastic Beanstalk environment. Otherwise, as the load increases, different instances of your Flask app will be writing to their own local DB, and there won't be a "master" database that contains all the commits.
The easiest solution is using the AWS Relational Database Service (RDS) to host your DB as an outside service. A good tutorial that walks through this exact scenario:
Deploying a Flask Application on AWS using Elastic Beanstalk and RDS
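For example, once the RDS instance exists, the app can be pointed at it through an environment variable instead of a bundled SQLite file (this assumes the newer EB CLI and that the app reads its database URI from the environment; the variable name and URI are placeholders):
eb setenv SQLALCHEMY_DATABASE_URI="mysql://user:password@mydb-instance.us-east-1.rds.amazonaws.com:3306/appdb"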
SQLAlchemy/Flask/AWS is definitely not a waste of time! Good luck.

Secondary storage not recognized in Apache CloudStack

I am trying to set up a CloudStack (v4.4 on CentOS 6.5) management instance to talk to one physical host running XenServer (6.2).
I have got as far as setting up the zone/pod/cluster/host, and it can see the XenServer machine. Primary storage is also visible; I can see it in the dashboard. However, it can't see the secondary storage, and thus I can't download templates/ISOs. The dashboard says 0 KB of 0 KB in use for secondary storage.
I have tried having the secondary storage local to the CloudStack management instance (while setting the use.local global setting to true). I have also tried setting up a new host as the NFS share, and that did not work either.
I have checked in both cases that the shares I have made are mountable, and they are. I have also seeded them with the template VM by running the command outlined in the installation guide. Both places I set up as secondary storage had ample space available: one with more than 200 GB, the other around 70 GB. I have also restarted the management machine a few times.
Any help would be much appreciated!
You need secondary storage enabled in order to supply templates to your hosts. The simplest way to achieve that is to create an NFS export that is available to the host. I usually do it on the host itself; in your case that would be the XenServer. Then, in the management server, add the secondary storage under: Infrastructure -> Secondary Storage -> Add Secondary Storage.
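For example, on whichever machine serves the export (the path and options are placeholders):
echo "/export/secondary *(rw,async,no_root_squash,no_subtree_check)" >> /etc/exports
exportfs -a
# verify from the management server that the export is visible:
showmount -e NFS_HOST_IP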
Secondary storage is provided by a dedicated system VM. Once you add a secondary storage, CloudStack will create a system VM for that. Start by checking the status of the system VMs in: Infrastructure -> System VMs
The one you are looking for should be called Secondary Storage VM.
It should be running and the agent should be ready (two green circles). If the agent is not ready, first SSH to your XenServer host and then to the system VM using the link-local IP (you can see the IP in the details of the VM) with the following command:
ssh -i /root/.ssh/id_rsa.cloud -p 3922 LINK_LOCAL_IP_ADDRESS
Then in the system VM, run a diagnostic tool to check what could be wrong:
/usr/local/cloud/systemvm/ssvm-check.sh

Hosting a WordPress blog on AWS

I have hosted a WordPress blog on AWS using a t1.micro EC2 instance (Ubuntu).
I am not an expert in Linux administration, but after going through a few tutorials I managed to get WordPress running successfully.
I noticed a warning on the AWS console that if my EC2 instance terminates, I will lose my data, including the WordPress files and the data stored by the MySQL service.
Does that mean I should use the S3 service for storing data to avoid any accidental data loss? Or will my data remain safe in an EBS volume even if my EC2 instance terminates?
By default, the root volume of an EC2 instance will be deleted if the instance is terminated. An instance is only terminated automatically if it is running as a spot instance; otherwise it is only terminated if you do it yourself.
With that in mind, EBS volumes are not failure-proof; they have a small chance of failing. To recover from this, you should either create regular snapshots of your EBS volume, or back up the contents of your instance to S3 or another storage service.
You can set up a snapshot lifecycle policy to create scheduled volume snapshots:
https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/snapshot-lifecycle.html
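A one-off snapshot can also be taken from the CLI (the volume ID is a placeholder):
aws ec2 create-snapshot --volume-id vol-0123456789abcdef0 --description "daily wordpress backup"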
