Nginx cannot write cache content to the gcsfuse mount directory

I mounted a Cloud Storage bucket on a Compute Engine VM via gcsfuse and then ran Nginx on the VM. Normally Nginx writes its cache to a directory, in this case the mounted bucket, using the proxy_cache directive.
However, I have a problem: Nginx can create cache files in the filesystem under the bucket directory, but the size of the cache files is always 0 B, and client requests keep getting a cache MISS status.
So it seems that after mounting with gcsfuse, Nginx can only create cache files, but cannot write to them.
My VM environment is:
Machine type: n2-standard-2
CPU platform: Intel Cascade Lake
Architecture: x86/64
System: ubuntu-1804
In addition, gcsfuse was given a service account with owner privileges via the --key-file flag, and Nginx runs as the root user.
For example, the debug log below shows an empty file (8/2b/0a7ed71cddee71b9276a0d72a030f2b8) being created in the bucket after a client request, with nothing ever written to it. What could be the cause of this?
https://storage.googleapis.com/cloud_cdn_cache_bucket/debug_log.txt
Here is the debug log obtained by running gcsfuse with --debug_fuse --debug_fs --debug_gcs --debug_http -foreground.

You can't use Cloud Storage FUSE for a cache.
There is a technical reason for that: GCSFuse is a wrapper around the Cloud Storage API that transforms system calls into API calls. However, not all system calls are supported, especially the "database-like" ones: streaming writes, seeks, and rewriting partial content of a file. These are all common operations for a database (or a cache), but they are not possible with Cloud Storage: you can only write, read, or delete an object as a whole. Updates and partial writes aren't supported. It's not a file system!
In addition, because you now know that GCSFuse is a wrapper (of system calls to API calls), you can guess that using that kind of file system is not a good idea: the latency is terrible, because every operation is an API call. It is absolutely not recommended for caches or other low-latency workloads.
The best solution is to use a local file system dedicated to the cache (see the sketch after these options). But if you scale out (more servers in parallel) you could have issues, because the cache is not shared between instances:
Use a sticky-session mechanism to always route the same user session to the same NGINX server, and therefore always use the same cache context
Use the Filestore service, which offers an NFS (Network File System) share to mount the same storage space on different servers, with decent latency (not as good as a local file system)
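For example, here is a minimal sketch of the first option, an Nginx cache kept on the local disk instead of the gcsfuse mount (the directory, zone name, sizes, and cache owner below are illustrative, not taken from your setup):

# create a cache directory on the local disk (an attached local SSD also works)
sudo mkdir -p /var/cache/nginx
sudo chown -R www-data:www-data /var/cache/nginx   # adjust to your worker user

# declare the cache zone; tune the sizes to your workload
sudo tee /etc/nginx/conf.d/cache.conf >/dev/null <<'EOF'
proxy_cache_path /var/cache/nginx levels=1:2 keys_zone=local_cache:10m
                 max_size=5g inactive=60m use_temp_path=off;
EOF

# then reference "proxy_cache local_cache;" in the relevant server/location block
sudo nginx -t && sudo systemctl reload nginx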
You also talked about a key file in your question. I recommend you avoid service account key files as much as you can, especially if your app runs on Google Cloud. Let me know what your key file usage is in detail if you want more guidance.
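As a hedged illustration of the alternative (the bucket name, mount point, and service account email below are placeholders), on a Compute Engine VM you can usually drop the key file entirely by granting the VM's attached service account access to the bucket and letting gcsfuse pick up the instance credentials:

# grant the VM's attached service account access to the bucket
gsutil iam ch serviceAccount:my-vm-sa@my-project.iam.gserviceaccount.com:roles/storage.objectAdmin gs://my-bucket

# mount without --key-file; gcsfuse falls back to the instance's default credentials
gcsfuse --foreground my-bucket /mnt/my-bucket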

Related

Accessing session folder resources in a multi-instance Opencpu

I have an OpenCPU server with my package on it. The package contains a function which generates a plot image. When I send a POST request to this function via OpenCPU, I get a session id in the response, which in turn is actually a folder on the OpenCPU server containing the resources, the image being one of them. I pass this image URL (served by the OpenCPU server) on to my application, which uses it to create a PDF report.
Now I have scaled this whole scenario by creating multiple instances of OpenCPU containing my package. The instances are behind a load balancer. When I do the same, I get the image URL, but when my application uses it the image may not be found, because the request may now have gone to some other instance of OpenCPU.
How can I approach a solution to this problem? One thing I have done for now is uploading the image to a public instance and returning the corresponding path to the application. But that is too tightly coupled.
Thanks.
Load balancing is always a bit complicated, so if possible it is easier to just move to a larger server. Most (cloud) providers offer (virtual) instances with many cores and 100GB+ RAM, which will allow you to support many users.
If you really need load balancing there are a few methods.
One approach is to map the ocpu-store directory on the ocpu servers to a shared NFS server. By default OpenCPU stores all sessions in the /tmp/ocpu-store directory on the server. You can set a different location via the tmpdir option in your /etc/opencpu/server.conf. There is an example configuration file that sets tmpdir in /etc/opencpu/server.conf.d/ec2.conf.disabled on your server (rename it to activate).
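A rough sketch of that setup, assuming a Filestore/NFS export at an address like 10.0.0.2:/ocpu (the address, export, and mount point are placeholders):

# on every OpenCPU server: mount the shared NFS export
sudo apt-get install -y nfs-common
sudo mkdir -p /mnt/ocpu-store
sudo mount -t nfs 10.0.0.2:/ocpu /mnt/ocpu-store

# then set the tmpdir option in /etc/opencpu/server.conf to /mnt/ocpu-store
# (the ec2.conf.disabled example mentioned above shows the exact syntax)
# and restart the service:
sudo service opencpu restart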
If you don't want to set up an NFS server, a simpler approach is to configure your load balancer to always send a particular client to a particular backend. For example, if you use nginx you can set the load-balancing method to ip_hash.
Obviously this method requires that clients do not change IP address during the session, and it will only be effective if you have clients connecting from a variety of IP addresses.
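A minimal sketch of such an nginx front end using the ip_hash method (the upstream name and backend addresses are placeholders):

sudo tee /etc/nginx/conf.d/ocpu-lb.conf >/dev/null <<'EOF'
upstream ocpu_backend {
    ip_hash;                 # requests from the same client IP go to the same backend
    server 10.0.0.11:80;
    server 10.0.0.12:80;
}

server {
    listen 80;
    location / {
        proxy_pass http://ocpu_backend;
    }
}
EOF

sudo nginx -t && sudo systemctl reload nginx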

Google Cloud Container share disk with FTP server

I want to share a persistent volume from my Google Container Engine instance with my FTP server instance, in order to access the files that are uploaded to the FTP server.
If I use a Compute Engine disk I can't do that, because the disk becomes read-only.
I see that I can use Cloud Storage, but I don't know if it can be used as a persistent volume in Container Engine.
Can anybody help me?
Regards
First of all you need an FTP server container image for that. I'm not sure if anything off-the-shelf exists; you may need to build it yourself.
Then you can simply use GCE Persistent Disks as you said (however I'm not sure what you mean by "because the disk became readonly"; you're supposed to explain your problem in detail on Stack Overflow, people here simply can't guess it correctly all the time).
So you can follow this tutorial to learn how to use GCE Persistent Disks on GKE clusters: https://cloud.google.com/kubernetes-engine/docs/tutorials/persistent-disk and then modify the MySQL part to use your FTP server Docker image.
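For reference, a hedged sketch of the disk-creation step from that tutorial (the disk name, size, and zone are placeholders; the pod and volume manifests follow the linked guide):

# create a GCE persistent disk to hold the FTP data
gcloud compute disks create ftp-data-disk --size=200GB --zone=us-central1-a

# then reference it from the FTP server pod spec as a gcePersistentDisk volume,
# exactly as the persistent-disk tutorial does for MySQL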
Google Cloud Storage won't give you filesystem semantics (such as permission bits or user/group ownership on Linux), so it probably won't suit FTP. Also, you can't mount GCS on your compute instances.

Shared storage for multiple GCE VMs

/* I'm new here and I've done days of research. */
What is the best practice to share files with multiple autoscaling Google Compute Engine VMs?
I'm planning to set up an instance group of VMs with NGINX serving static files for multiple domains. These VMs would autoscale up to n instances (multiplying themselves) and the files would change a lot. I need storage for the files these VMs will serve.
So far I've found these solutions:
1) Persistent disk + rsync -> This should have the smallest latency, but when I reach GBs of files, autoscaled VMs would be syncing for a long time after they spawn, thus throwing 404s.
2) Master VM without web server + nfs/smb -> Small latency, but no redundancy.
3) Cloud Storage + FUSE -> Big latency, great redundancy, no SLA.
4) Shared Persistent disk -> Small latency, but read-only.
5) NGINX + Cloud SQL/Bigtable/Spanner/Datastore -> Meh-ish latency, and I don't feel good about connecting the webserver to a DB.
Are there any other better solutions?
Thanks
EDIT: The static files are multiple index.html files -> homepages of multiple domains.
There is also:
6) Firebase Hosting - https://firebase.google.com/docs/hosting
or
7) another way - I would personally go with Cloud Storage, but not FUSE, or at least not FUSE for serving. You can still use FUSE for writing to the bucket(s). Of course, the best way would be to just use the Cloud Storage API from within the application.
For serving files:
I would create a load balancer with a backend bucket, pointing at the same bucket the application writes to. Also make sure to enable Cloud CDN on that load balancer.
More details at:
Load balancer - https://cloud.google.com/load-balancing/
Cloud CDN - https://cloud.google.com/cdn/
or just be bold and create a load balancer right now at https://console.cloud.google.com/networking/loadbalancing/loadBalancers/list?project=
For serving static files the best is definitely to use a load balancer and backend buckets with Cloud CDN enabled.
The load balancer has rules to forward traffic. For example, it can route requests by host, subdomain, or path:
*.mysite1.com => bucket1
demo.mysite2.net => bucket1
test.mysite3.com => bucket2
Because files are served with Cloud CDN, the latency becomes minimal.
In order to write your files to a bucket you could use FUSE, or create files locally and use gsutil cp.
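A hedged sketch of that flow with gsutil and gcloud (the bucket and backend-bucket names are placeholders):

# upload the generated static files to the bucket
gsutil -m cp -r ./public/* gs://my-static-site-bucket/

# make the objects publicly readable so the load balancer can serve them
gsutil iam ch allUsers:objectViewer gs://my-static-site-bucket

# create a CDN-enabled backend bucket to attach to the HTTP(S) load balancer
gcloud compute backend-buckets create my-static-backend \
    --gcs-bucket-name=my-static-site-bucket \
    --enable-cdn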
Persistent disks can only be shared between multiple Compute Engine instances in read-only mode. If you need write mode it won't work.
The last option, Cloud SQL + Nginx, is actually pretty good. Cloud SQL is much faster than a self-managed MySQL server, and the connection between Cloud SQL and GCE is easy and reliable.
But it is more a matter of preference here... it depends on whether you feel comfortable writing the scripts that will read from and write to it.

How to encrypt docker images or source code in docker images?

Say I have a Docker image and I have deployed it on some server, but I don't want other users to access this image. Is there a good way to encrypt the Docker image?
Realistically, no: if a user has permission to run the Docker daemon then they are going to have access to all of the images; this is due to the elevated permissions Docker requires in order to run.
See the extract from the Docker security guide below for more info on why this is.
Docker daemon attack surface
Running containers (and applications)
with Docker implies running the Docker daemon. This daemon currently
requires root privileges, and you should therefore be aware of some
important details.
First of all, only trusted users should be allowed to control your
Docker daemon. This is a direct consequence of some powerful Docker
features. Specifically, Docker allows you to share a directory between
the Docker host and a guest container; and it allows you to do so
without limiting the access rights of the container. This means that
you can start a container where the /host directory will be the /
directory on your host; and the container will be able to alter your
host filesystem without any restriction. This is similar to how
virtualization systems allow filesystem resource sharing. Nothing
prevents you from sharing your root filesystem (or even your root
block device) with a virtual machine.
This has a strong security implication: for example, if you instrument
Docker from a web server to provision containers through an API, you
should be even more careful than usual with parameter checking, to
make sure that a malicious user cannot pass crafted parameters causing
Docker to create arbitrary containers.
For this reason, the REST API endpoint (used by the Docker CLI to
communicate with the Docker daemon) changed in Docker 0.5.2, and now
uses a UNIX socket instead of a TCP socket bound on 127.0.0.1 (the
latter being prone to cross-site request forgery attacks if you happen
to run Docker directly on your local machine, outside of a VM). You
can then use traditional UNIX permission checks to limit access to the
control socket.
You can also expose the REST API over HTTP if you explicitly decide to
do so. However, if you do that, being aware of the above mentioned
security implication, you should ensure that it will be reachable only
from a trusted network or VPN; or protected with e.g., stunnel and
client SSL certificates. You can also secure them with HTTPS and
certificates.
The daemon is also potentially vulnerable to other inputs, such as
image loading from either disk with ‘docker load’, or from the network
with ‘docker pull’. This has been a focus of improvement in the
community, especially for ‘pull’ security. While these overlap, it
should be noted that ‘docker load’ is a mechanism for backup and
restore and is not currently considered a secure mechanism for loading
images. As of Docker 1.3.2, images are now extracted in a chrooted
subprocess on Linux/Unix platforms, being the first-step in a wider
effort toward privilege separation.
Eventually, it is expected that the Docker daemon will run with restricted
privileges, delegating operations to well-audited sub-processes, each
with its own (very limited) scope of Linux capabilities, virtual
network setup, filesystem management, etc. That is, most likely,
pieces of the Docker engine itself will run inside of containers.
Finally, if you run Docker on a server, it is recommended to run
exclusively Docker in the server, and move all other services within
containers controlled by Docker. Of course, it is fine to keep your
favorite admin tools (probably at least an SSH server), as well as
existing monitoring/supervision processes (e.g., NRPE, collectd, etc).
Say only some strings need to be encrypted. You could encrypt this data using openssl or an alternative solution. The encryption should be set up inside the Docker container: when building the container, the data is encrypted; when the container is run, the data is decrypted (possibly with the help of an entrypoint script using a passphrase passed from a .env file). This way the container image can be stored safely.
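For instance, a hedged sketch with openssl (the file names and the SECRETS_PASSPHRASE variable are made up for illustration; -pbkdf2 assumes OpenSSL 1.1.1 or newer):

# at build time: encrypt the sensitive strings, and copy only the .enc file into the image
openssl enc -aes-256-cbc -salt -pbkdf2 \
    -in secrets.txt -out secrets.enc \
    -pass env:SECRETS_PASSPHRASE

# at run time (e.g. in the entrypoint script): decrypt using a passphrase
# injected via --env-file / .env, so the plaintext never ships in the image layers
openssl enc -d -aes-256-cbc -pbkdf2 \
    -in secrets.enc -out /run/secrets.txt \
    -pass env:SECRETS_PASSPHRASE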
I am going to play with it this week as time permits, as I am pretty curious myself.

Hosting wordpress blog on AWS

I have hosted a WordPress blog on AWS using an EC2 t1.micro instance (Ubuntu).
I am not an expert in Linux administration. However, after going through a few tutorials, I managed to get WordPress running successfully.
I noticed a warning on the AWS console: "In case your EC2 instance terminates, you will lose your data, including WordPress files and data stored by the MySQL service."
Does that mean I should use the S3 service for storing data to avoid any accidental data loss? Or will my data remain safe in an EBS volume even if my EC2 instance terminates?
By default, the root volume of an EC2 instance is deleted when the instance is terminated. An instance is only terminated automatically if it is running as a spot instance; otherwise it is only terminated if you do it yourself.
Now, with that in mind, EBS volumes are not failure-proof; they have a small chance of failing. To recover from this, you should either create regular snapshots of your EBS volume, or back up the contents of your instance to S3 or another storage service.
You can set up a snapshot lifecycle policy to create scheduled volume snapshots.
https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/snapshot-lifecycle.html
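If you prefer the CLI over the console, here is a hedged sketch (the volume and instance IDs, and the device name, are placeholders):

# take a one-off snapshot of the EBS volume backing the instance
aws ec2 create-snapshot --volume-id vol-0123456789abcdef0 \
    --description "wordpress data backup"

# optionally keep the root volume even if the instance is terminated
aws ec2 modify-instance-attribute --instance-id i-0123456789abcdef0 \
    --block-device-mappings '[{"DeviceName":"/dev/sda1","Ebs":{"DeleteOnTermination":false}}]'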

Resources