I have the following setup.
2 x GCE n1-standard instances running nginx/php5-fpm
1 x Cloud SQL D8 instance with both GCE instances connected to it.
The front-end servers are running a stripped-down version of osCommerce-2.3.3.4 with NO admin section on the front end. It's just the catalog portion of osC.
I ran a load test with Load Impact, and the site is unusable at around 50-100 users. I am only looking to support at absolute max 500 users at any given time. We average around 130-170 on our current server.
I am not looking for a complete explanation, just helpful places to check, things to try, and stuff to read. I just need a direction to go in to get this cloud platform working like we wanted.
Thanks in advance.
Related
I wanted to resize the RAM and CPU of my machine, so I stopped the VM instance, and when I tried to start it I got an error:
The zone 'projects/freesarkarijobalerts/zones/asia-south1-a' does not
have enough resources available to fulfill the request. Try a
different zone, or try again later.
I tried to start the VM instance again today, but the result was the same and I got the same error message:
The zone 'projects/freesarkarijobalerts/zones/asia-south1-a' does not
have enough resources available to fulfill the request. Try a
different zone, or try again later.
Then I tried to move my instance to a different zone, but I got an error message:
sarkarijobalerts123@cloudshell:~ (freesarkarijobalerts)$ gcloud compute instances move wordpress-2-vm --zone=asia-south1-a --destination-zone=asia-south1-b
Moving gce instance wordpress-2-vm...failed.
ERROR: (gcloud.compute.instances.move) Instance cannot be moved while in state: TERMINATED
My website has been DOWN for a couple of days; please help me.
The standard procedure is to create a snapshot of the stopped VM instance [1] and then create a new instance from it in another zone [2]; a gcloud sketch follows the links below.
[1] https://cloud.google.com/compute/docs/disks/create-snapshots
[2] https://cloud.google.com/compute/docs/disks/restore-and-delete-snapshots#restore_a_snapshot_of_a_persistent_disk_to_a_new_disk
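Assuming the boot disk shares the instance's name (wordpress-2-vm) and an n1-standard-1 machine type, both of which are assumptions here, a minimal sketch of that procedure with the gcloud CLI looks like this:
# Snapshot the stopped instance's boot disk (disk name assumed to match the instance)
gcloud compute disks snapshot wordpress-2-vm --zone=asia-south1-a --snapshot-names=wordpress-2-vm-snap
# Restore the snapshot to a new disk in the target zone
gcloud compute disks create wordpress-2-vm-b --source-snapshot=wordpress-2-vm-snap --zone=asia-south1-b
# Boot a new instance in the target zone from the restored disk
gcloud compute instances create wordpress-2-vm-b --zone=asia-south1-b --machine-type=n1-standard-1 --disk=name=wordpress-2-vm-b,boot=yes
Once the new instance is up, remember to point any DNS records at its new external IP.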
Let's have a look at the cause of this issue:
When you stop an instance, it releases some resources, such as vCPU and memory.
When you start an instance, it requests those resources back, and if there aren't enough resources available in the zone you'll get an error message:
Error: The zone 'projects/freesarkarijobalerts/zones/asia-south1-a' does not have enough resources available to fulfill the request. Try a different zone, or try again later.
More information is available in the documentation:
If you receive a resource error (such as ZONE_RESOURCE_POOL_EXHAUSTED
or ZONE_RESOURCE_POOL_EXHAUSTED_WITH_DETAILS) when requesting new
resources, it means that the zone cannot currently accommodate your
request. This error is due to Compute Engine resource obtainability,
and is not due to your Compute Engine quota.
Resource availability depends on user demand and is therefore dynamic.
There are a few ways to solve your issue:
Move your instance to another zone by following the instructions above.
Wait for a while and try to start your VM instance again.
Reserve resources for your VM by following the documentation, to avoid such issues in the future (see the sketch after the quoted documentation):
Create reservations for Virtual Machine (VM) instances in a specific
zone, using custom or predefined machine types, with or without
additional GPUs or local SSDs, to ensure resources are available for
your workloads when you need them. After you create a reservation, you
begin paying for the reserved resources immediately, and they remain
available for your project to use indefinitely, until the reservation
is deleted.
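As a rough sketch, reserving capacity for this VM could look like the following (the reservation name and machine type are assumptions; note that billing starts as soon as the reservation is created):
# Reserve capacity for one VM so a future start cannot fail for lack of zonal resources
gcloud compute reservations create wordpress-reservation --zone=asia-south1-a --vm-count=1 --machine-type=n1-standard-1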
We are seeing approximately linear growth in our bill due to "GCP Storage egress between NA and EU" costs. As far as I can tell, we have neither storage buckets nor instances in NA. Looking at the storage.googleapis.com/network/sent_bytes_count metric, it appears the egress might coincide with deployments of the App Engine app (it is a static site that is redeployed every 5-10 minutes).
How can I find out what data is being transferred from NA and how to stop this, to avoid the charges?
You can activate the Cloud Storage data access logs. They are deactivated by default because the volume of logs can be huge.
Anyway, for your case, you can activate them for your investigation and then deactivate them.
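If you prefer the CLI to the console, here is a minimal sketch of enabling those logs project-wide through the IAM audit configuration (PROJECT_ID is a placeholder):
# Export the current IAM policy; data access logs are controlled by its auditConfigs section
gcloud projects get-iam-policy PROJECT_ID --format=json > policy.json
# Edit policy.json and add an entry such as:
#   "auditConfigs": [{"service": "storage.googleapis.com",
#     "auditLogConfigs": [{"logType": "DATA_READ"}, {"logType": "DATA_WRITE"}]}]
# Write the updated policy back to activate the logs
gcloud projects set-iam-policy PROJECT_ID policy.json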
You can also have a look at your App Engine deployment region; it may be the root cause.
I'm also noticing some unexpected GCP Storage egress between NA and EU costs. I'm running an App Engine app in the EU region. My theory is that this is due to container images being downloaded from gcr.io (NOT eu.gcr.io) as part of the process of deploying an App Engine version. (It says here that gcr.io is currently in the US.)
I find some evidence of this in the Cloud Build history: there, I see e.g. Pulling image: gcr.io/gae-runtimes/crane:current. If I browse to gcr.io/gae-runtimes/crane, I see that its "Virtual size" is 7.66MB, so, since I've done 37 deploys by now and my bill mentions 1.58GB of egress, that by itself does not explain the figure completely, but presumably other, bigger images are being downloaded as well. (I see in the Build History things like Already have image (with digest): gcr.io/cloud-builders/gcs-fetcher, but perhaps these are charged anyway?)
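To check which images each deployment actually pulls, you can read the build logs from the CLI; a quick sketch (BUILD_ID is a placeholder for an ID taken from the list output):
# List recent builds triggered by App Engine deployments
gcloud builds list --limit=10
# Print one build's log and look for "Pulling image:" lines that reference gcr.io
gcloud builds log BUILD_ID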
When trying to download the Bokeh sample data following the instructions in https://docs.bokeh.org/en/latest/docs/installation.html#sample-data, it fails to download with HTTP Error 403: Forbidden.
In the conda prompt:
bokeh sampledata (failed)
In a Jupyter notebook:
import bokeh.sampledata
bokeh.sampledata.download() (failed)
TL;DR: you will either need to upgrade to Bokeh version 1.3 or later, or else you can manually edit the bokeh.util.sampledata module to use the new CDN location http://sampledata.bokeh.org. You can see the exact change to make in PR #9075.
The bokeh.sampledata module originally pulled data directly from a public AWS S3 bucket location hardcoded in the module. This was a poor choice that left open the possibility for abuse, and in late 2019 an incident finally happened where someone (intentionally or unintentionally) downloaded the entire dataset tens of thousands of times over a three-day period, incurring a significant monetary cost. (Fortunately, AWS saw fit to award us a credit to cover this anomaly.) Starting in version 1.3, sample data is only accessed from a proper CDN with much better cost structures. All public direct access to the original S3 bucket was removed. This change had the unfortunate effect of immediately breaking bokeh.sampledata for all previous Bokeh versions; however, as an open-source project we simply cannot afford the real (and potentially unlimited) financial risk exposure.
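The simplest fix is therefore to upgrade; a quick sketch, assuming a pip- or conda-managed environment:
# Upgrade to a release that pulls sample data from the CDN
pip install --upgrade "bokeh>=1.3"
# or, in a conda environment:
conda update bokeh
# then retry the download
bokeh sampledata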
I need to connect monitoring and tracing tools for our application. Our main code is on Express 4 running on Google Cloud Functions. All requests come in through a front nginx proxy server that handles the domain and pretty route names. Unfortunately, the trace agent traces these requests coming through the nginx front proxy without any additional information, and this is not enough to collect useful information about the app. I found the Stackdriver custom API, which, as I understand it, might help to collect the appropriate data at runtime, but I don't understand how I can connect it to a Google Cloud Functions app. All other examples say that we must extend our startup script, but Google Cloud Functions is a fully managed environment, so there is no such possibility here.
Found the solution. I had included require("@google-cloud/trace-agent") not at the top of index.js. It should be required before all other modules. After I moved it to the top, it started to work.
Placing require("@google-cloud/trace-agent") as the very first import didn't work for me. I still kept getting:
ERROR:@google-cloud/trace-agent: express tracing might not work as /var/tmp/worker/node_modules/express/index.js was loaded before the trace agent was initialized.
However, I managed to work around it by manually patching express:
// Get the already-initialized trace agent API
var traceApi = require('@google-cloud/trace-agent').get();
// Apply the express plugin's patch to the express module that was
// already loaded (looked up via Node's module cache)
require('@google-cloud/trace-agent/src/plugins/plugin-express')[0].patch(
  require(Object.keys(require('module')._cache).find(_ => _.indexOf('express') !== -1)),
  traceApi
);
My server goes down randomly 4-5 times every day because it gets a high load very quickly.
I have installed CSF, and with some configuration the server is now stable, with load around 5.
BUT the big issue is: real users have a very hard time accessing the website, especially from the IE browser (you can test at xaluan.com), and it also times out sometimes.
The following is the config used in CSF:
SYNFLOOD = "1"
SYNFLOOD_RATE = "100/s"
SYNFLOOD_BURST = "10"
CONNLIMIT = "80;30"
PORTFLOOD = "80;tcp;70;5"
CT_LIMIT = "29"
Other config values are the same as the defaults.
I have been playing around with this config for a week, but it is still not good.
If I increase the rate to SYNFLOOD_RATE = "140/s" or more, the website responds very fast; the only bad side effect is that the server load increases very quickly, normally 20 and maybe up to a few hundred at peak time.
What I need is a fast response time while the load stays low. Please help.
Thanks.
PS: The server is running an nginx frontend, Apache, MySQL, and PHP. The home page has around 70 elements, which are cached in the browser on first access.
My server goes down randomly 4-5 times every day because it gets a high load very quickly
There can be many reasons for this. Try nice top -c -d 2 from the command line and check which process is causing too much load. You can't simply blame CSF for that.
Load may also get high if the DB disk I/O is high. It's better to install the mytop DB monitoring tool on the server and check whether that is the reason.
To install mytop, use this link: http://bloke.org/linux/installing-mytop-on-cpanel/
I hope this will help you monitor the DB load usage.
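Putting that together, a short triage sequence might look like this (iostat assumes the sysstat package is installed):
# Watch processes sorted by CPU to see what is spiking the load
nice top -c -d 2
# Check whether the load is disk I/O bound
iostat -x 2
# Once mytop is installed, watch MySQL queries in real time
mytop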