Is it wise to serve a stale page when the upstream server is busy or down? - nginx

I work for a small nonprofit that, during certain times of the year, sees a sudden uptick in traffic coalescing around virtual events or particular emails.
During these busy times, our server will sometimes get overwhelmed and NGINX will occasionally respond with a 502 error.
I know we need to address what's going on with our caching or purchase more server capacity, but for now, I am also thinking that some of the performance issues can be resolved by having NGINX return stale content when the upstream is busy.
Our content -- particularly on the pages where we see an uptick during certain times of year -- is more or less static. My thinking is: rather than return a 502 to the client, why not just send the user a possibly slightly older version of the page?
Our fastcgi_cache_use_stale config currently looks like this:
fastcgi_cache_use_stale updating error timeout invalid_header http_500;
According to the PHP-FPM error logs, we are simply maxing out our child processes, and I am assuming that's what is causing the 502 error. (Again, I know this needs to be addressed.)
Most example configs look like the above as well and do not include http_503 or http_502.
Is there a reason why you would not want to include http_503 and/or http_502, so that users at least get something? Or is there a reason why most use-stale configs don't include those codes?
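For what it's worth, if you do decide to add them, the directive accepts those codes directly. A minimal sketch of such a setup (the cache zone name, paths, cache key, and timings below are placeholders, not your config):

# in http{}: define the cache zone (name, path, and timings are placeholders)
fastcgi_cache_path /var/cache/nginx/fcgi keys_zone=appcache:10m inactive=24h;

server {
    location ~ \.php$ {
        fastcgi_pass unix:/run/php/php-fpm.sock;  # placeholder socket path
        include fastcgi_params;

        fastcgi_cache appcache;
        fastcgi_cache_key "$scheme$request_method$host$request_uri";
        fastcgi_cache_valid 200 10m;
        # also serve stale when PHP-FPM is saturated (502) or unavailable (503)
        fastcgi_cache_use_stale updating error timeout invalid_header
                                http_500 http_502 http_503;
    }
}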

Related

Error 504 bad gateway on Back Office of Prestashop store

I have a problem with my PrestaShop store: sometimes I can get into the Front Office without problems, but other times I get a 504 error, and worse, I cannot get into the Back Office because I get the same 504 error there.
This has been happening for four or five days and I don't know the root of the problem. I checked the server logs, but they only show a negotiation error and an idle timeout (120s) error. I cannot change the php.ini or nginx conf files because my hosting does not allow it (they say I need to switch to a VPS to get root access; the web cloud plan I am on does not grant it). I really need some guidance because I don't want to lose potential customers.
Based on this, the server you are using seems to be very weak.
The best solution is to switch to hosting that supports PrestaShop (a dedicated PrestaShop package); it does not necessarily need to be a VPS.
Check this:
https://digital.com/best-web-hosting/prestashop/
Just increase the timeouts of the fastcgi directives.
On NGINX:
server {
    # ...
    fastcgi_read_timeout 60s;  # max time between two successive reads from the FastCGI server
    fastcgi_send_timeout 60s;  # max time between two successive writes to the FastCGI server
    # ...
}

Nginx request not timing out

I am very new to Nginx. I have set up Nginx in my live environment.
Problem Statement
I have set 4 servers as upstream servers in my Nginx configuration. I can see a few requests that take more than 180 seconds overall, and that makes my system very slow. I can also see some requests going to the first server in the upstream and then being retried on the 2nd server in the upstream. So I guess the problem could be that the first server is timing out and the response only comes back after the timeout period. The only timeout set in my configuration is
proxy_read_timeout 180;
Is this the main culprit? Will requests time out sooner if I change this value to a lower one?
I want to change the value in the live environment only after some expert advice.
Could someone please shed some light on this?
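For context, the directives involved in that behaviour look something like the sketch below (hostnames and values are placeholders, not a recommendation):

upstream backend {
    server app1.example.com;  # placeholder hosts for the 4 upstream servers
    server app2.example.com;
    server app3.example.com;
    server app4.example.com;
}

server {
    location / {
        proxy_pass http://backend;
        proxy_connect_timeout 5s;   # max time to establish a connection
        proxy_read_timeout 180s;    # max time between two successive reads
        # on an error or timeout, retry the request on the next server
        proxy_next_upstream error timeout;
    }
}

With settings like these, a request that stalls for 180 seconds on the first server is retried on the next one, which matches the behaviour described above; lowering proxy_read_timeout makes that failover happen sooner.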

What causes 'The underlying connection was closed' on nginx?

We have a payment gateway integration that posts data to a third party URL. The user then completes their payment process and when the transaction is complete the gateway posts back to a URL on our server.
That post is failing, and the gateway is reporting the following error:
ERROR 13326: Couldn't speak to ServerResultURL [https://foo.com/bar].
Full Error Details: The underlying connection was closed: An unexpected error occurred on a send.
Response object is null
When I post directly to https://foo.com/bar I get a 200 response as I'd expect, so I'm not sure where this is falling down.
This is on an Ubuntu box running nginx.
What could be causing that issue and how can I find more detail about it and a way to resolve it?
EDIT:
For brevity, the example above uses the URL /bar, but in reality I have a rewrite in place (see below). The URL that actually gets posted to is /themes/third_party/cartthrob/lib/extload.php/cardsave_server/result, so I'm not sure if the rewrite below is what's causing the issue.
I would still assume not, as I do get a 200 response when posting via Postman.
# http://expressionengine.stackexchange.com/questions/19296/404-when-sagepay-attempts-to-contact-cartthrob-notification-url-in-nginx
location /themes/third_party/cartthrob/lib/extload.php {
    # pass the original URI to extload.php as the query string
    rewrite ^(.*) /themes/third_party/cartthrob/lib/extload.php?$1 last;
}
Typical causes of this kind of error
I bet your server is responding to the POST to /bar with something that the gateway (PaymentSense, right?) doesn't expect. This might be because:
- The gateway can't reach your Ubuntu box over the network, because a firewall or network hardware between the two is blocking it.
- Your HTTPS cert is bad / expired / self-signed, and the gateway is refusing the connection.
- A misconfiguration of NGINX or your web application software (PHP, I imagine? or whatever nginx is serving up) is causing /bar to respond with some odd response, like a 30x, or a 50x error page, or possibly just the wrong response, such as an HTML page.
- Something else is wrong with the response to the POST.
- The script/controller running at /bar could be getting unexpected input in the POST request, so you might want to look at the request coming in.
- You have a network connectivity issue.
I'll leave the first two items for you to troubleshoot, because I don't think that's what you're asking in this question.
Troubleshooting NGINX Responses
I recommend configuring it to dump its response into an nginx variable using body_filter_by_lua so that you can see what response is coming out. A good example of how to set this up is available here. I think that will lead you to understand why /bar is not behaving.
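A sketch of that approach, assuming the lua-nginx-module (e.g. via OpenResty) is available; the $resp_body variable, log format name, and log path are placeholders:

# in http{}: a log format that records the captured response body
log_format respbody $resp_body;

server {
    location = /bar {
        set $resp_body "";
        # append each response body chunk to the variable as it streams out
        body_filter_by_lua_block {
            ngx.var.resp_body = ngx.var.resp_body .. (ngx.arg[1] or "")
        }
        access_log /var/log/nginx/respbody.log respbody;
        fastcgi_pass php_cgi;
    }
}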
Troubleshooting NGINX Requests
If that isn't revealing the cause of this, try logging the request data. You can do that with something like:
# note: log_format must be declared at the http{} level, not inside a location
log_format postdata $request_body;

location = /bar {
    # $request_body is populated once nginx reads the body for fastcgi_pass
    access_log /var/log/nginx/postdata.log postdata;
    fastcgi_pass php_cgi;
}
Review the request headers and body of this POST, and if the error isn't immediately apparent, try to replay the exact same request (using an HTTP client that gives you complete control, such as curl) and debug what is happening with /bar. Is nginx running the script/controller that you think it should be running when you make an identical POST to /bar? Add logging to the /bar script/controller process.
Use interactive debugging if necessary. (This might require remote Xdebug if you're working with PHP, but no matter what you're using on your server, most web application tools offer some form of interactive debugging.)
Network Troubleshooting
If none of this works, it's possible that the gateway simply can't reach the host and port you're running this on, or that you have some other kind of network connectivity issue. I would run tcpdump on your Ubuntu box to capture the network traffic. If you can recreate this on a quiet (network) system, that will be to your advantage. Still, it's TLS (https), so don't expect to see much other than that the connection opens and packets are arriving. If you find that you need to see inside the TLS traffic in order to troubleshoot, you might consider using mitmproxy to do so.

IIS 7.5 Custom 503 error page

I have a site using ASP.NET and Web API deployed on IIS 7.5. Normal site usage is steady, but occasionally there are bursty spikes of very high usage, and users get very slow responses or timeouts. There is no option to improve performance at this time, and the client has asked for a quick fix: a customised 503 error page with the same look as the rest of the site.
The single IIS server sits behind a load balancer, and there is the option of adding a 2nd server purely to serve a static error page when the site is under heavy load. There has been a suggestion of a solution where slow responses trigger all subsequent requests to go to the static server, which returns the custom 503 error page until the usage subsides.
As far as I can see, it's not possible to customise the default 503 error page that IIS serves automatically when its request queue is full.
I can't see anything on SO about how to solve this sort of issue - can anyone help?
Load balancers use probes to check whether a node is operational. One possible solution is to make your own probe endpoint that checks whether the load on a node has reached a defined threshold. If so, the probe (your probe) returns HTTP 500 to the load balancer; when the load balancer receives this error code from the probe, it stops forwarding requests to that node. When the threshold has not been reached, the probe returns HTTP 200.
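As a sketch of a related idea with nginx as the balancer (open-source nginx only supports passive checks, so this version marks a node failed after repeated errors instead of probing it; hostnames and paths are hypothetical):

upstream iis_pool {
    # take a node out of rotation for 30s after 3 failed requests
    server app1.internal max_fails=3 fail_timeout=30s;
    server app2.internal max_fails=3 fail_timeout=30s;
}

server {
    listen 80;

    location / {
        proxy_pass http://iis_pool;
        # treat errors, timeouts, and 503s as reasons to try the next node
        proxy_next_upstream error timeout http_503;
    }

    # when no node can answer, serve the branded 503 page from the balancer
    error_page 502 503 504 =503 /maintenance.html;
    location = /maintenance.html {
        root /var/www/static;
        internal;
    }
}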

When a database (or upstream service) timeout occurs, do I send a 503 or 504?

If a website depends on an upstream database or other abstracted service or store - basically most websites known to man - then when an upstream request dies with a timeout, should I return a 503 or a 504?
503 Service Unavailable
The server is currently unavailable (because it is overloaded or down for maintenance). Generally, this is a temporary state. Sometimes, this can be permanent as well on test servers.
504 Gateway Timeout
The server was acting as a gateway or proxy and did not receive a timely response from the upstream server.
The 504 feels more designed for proxy servers, caches, or other web infrastructure, but the 503 is not right either, since the service is fine; the current request just happened to die, perhaps because a search was too broad or something.
So which is 'right' according to HTTP?
Luke
503 sounds appropriate if this is a temporary condition that will be resolved simply by waiting. http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html states: "The implication is that this is a temporary condition which will be alleviated after some delay."
500 also sounds appropriate. The RFC states: "The server encountered an unexpected condition which prevented it from fulfilling the request." An unresponsive database is an exceptional/unexpected case.
IMO, what it comes down to is this: are you providing an error code that will help callers (i.e. HTTP clients) respond to the situation? In this case, there is really nothing a client can do other than try again later. Given this, I would keep it simple and return 500. I think clients are more likely to care whether the site is available and less likely to care about the specific reason. Plus, fewer response codes make it easier to code clients. Again, this is just my opinion.
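Whichever code you pick, a gateway can normalize upstream failures to a single status and add a Retry-After hint for clients. A hedged nginx sketch of that idea (upstream name, paths, and values are placeholders):

location /api/ {
    proxy_pass http://backend;   # placeholder upstream
    proxy_intercept_errors on;
    # present upstream 502/504 failures to clients as a plain 503
    error_page 502 504 =503 /unavailable;
}

location = /unavailable {
    internal;
    # hint to well-behaved clients when it is worth retrying
    add_header Retry-After 30 always;
    return 503 "Service temporarily unavailable\n";
}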
