GCP deployment with nginx - uwsgi - flask fails - nginx

I have a very simple Flask app deployed on GKE and exposed via a Google external load balancer, and I am getting random 502 responses from the backend service. (I added custom headers on both the backend service and nginx to identify the source: I can see the backend service's header but not nginx's.)
The setup is:
LB -> backend-service -> neg -> pod (nginx -> uwsgi), where the pod runs the Flask application served by uWSGI behind nginx.
The scenario is handling image uploads in a simple, secured way: the sender includes a token with each upload request.
My Flask app:
receives the request and checks the sent token via another service using "requests";
if the token is valid, proceeds to handle the image and returns 200;
if the token is not valid, stops and sends back a 401 response.
At first I suspected the mix of 200 and 401 responses, so I changed all responses to 200. Even so, after some of the expected responses, the server starts responding with 502 and keeps doing so. Some of the messages at the very beginning succeeded.
The nginx error log contains the line below:
2023/02/08 18:22:29 [error] 10#10: *145 readv() failed (104: Connection reset by peer) while reading upstream, client: 35.191.17.139, server: _, request: "POST /api/v1/imageUpload/image HTTP/1.1", upstream: "uwsgi://127.0.0.1:21270", host: "example-host.com"
My uwsgi.ini file is as follows:
[uwsgi]
socket = 127.0.0.1:21270
master
processes = 8
threads = 1
buffer-size = 32768
stats = 127.0.0.1:21290
log-maxsize = 104857600
logdate
log-reopen
log-x-forwarded-for
uid = image_processor
gid = image_processor
need-app
chdir = /server/
wsgi-file = image_processor_application.py
callable = app
py-auto-reload = 1
pidfile = /tmp/uwsgi-imgproc-py.pid
My nginx.conf is as follows:
location ~ ^/api/ {
    client_max_body_size 15M;
    include uwsgi_params;
    uwsgi_pass 127.0.0.1:21270;
}
Lastly, my app has a health-check endpoint that returns a simple JSON response and does nothing else. It never fails in the way described above.
Edit: the nginx access log in the pod shows the response as 401, while the client receives 502.

For anyone who runs into the same issue: the problem was reading (or not reading) the POST data.
nginx expected the proxied app (uWSGI, in our case) to read the POST body, but in some cases my logic returned the response without reading it.
Setting uWSGI's post-buffering option solved the issue:
post-buffering = %(16 * 1024 * 1024)
This answer led me to the solution:
https://stackoverflow.com/a/26765936/631965
Nginx uwsgi (104: Connection reset by peer) while reading response header from upstream
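As far as I understand, post-buffering makes uWSGI read the whole request body (in memory, or on disk above the given size) before the app runs, so no code path can leave the body unread. An alternative is to make every code path consume the body itself, including the early 401 return. The sketch below shows that idea as a plain WSGI app using only the standard library; it is not the author's Flask code, and the X-Upload-Token header and validate_token() are hypothetical stand-ins for the real token check. In Flask, calling request.get_data() before the early return should have a similar effect.

```python
def validate_token(token):
    # Hypothetical stand-in for the external token-check service.
    return token == "good-token"


def drain_body(environ, chunk_size=64 * 1024):
    """Read and discard whatever is left of the request body."""
    stream = environ["wsgi.input"]
    try:
        remaining = int(environ.get("CONTENT_LENGTH") or 0)
    except ValueError:
        remaining = 0
    while remaining > 0:
        chunk = stream.read(min(chunk_size, remaining))
        if not chunk:  # client sent less than it announced
            break
        remaining -= len(chunk)


def app(environ, start_response):
    token = environ.get("HTTP_X_UPLOAD_TOKEN", "")  # hypothetical header name
    if not validate_token(token):
        # Consume the upload even though it is rejected, so the upstream socket
        # is not closed with unread data while nginx is still sending the body.
        drain_body(environ)
        start_response("401 Unauthorized", [("Content-Type", "text/plain")])
        return [b"invalid token"]
    length = int(environ.get("CONTENT_LENGTH") or 0)
    body = environ["wsgi.input"].read(length)
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"received %d bytes" % len(body)]
```

Draining still costs the bandwidth of the rejected upload; post-buffering does the same work centrally in uWSGI, which is presumably why it fixed every code path at once.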

Related

nginx - connection timed out while reading upstream

I have a Flask server with an endpoint that processes some uploaded .csv files and returns a .zip (in a JSON response, as a base64 string).
This process can take up to 90 seconds.
I've been setting it up for production using gunicorn and nginx, and I'm testing the endpoint with smaller .csv files. They get processed fine, and within a couple of seconds I get the "got blob" log. But nginx doesn't return the response to the client, and eventually it times out. I set a longer fail_timeout of 10 minutes, and the client WILL wait 10 minutes and then time out.
The proxy read timeout offered as a solution here is set to 3600s.
The proxy connect timeout is also set to 75s, according to this.
I also set the timeout for the gunicorn workers, according to this.
The error log says: "upstream timed out connection timed out while reading upstream"
I also see cases where nginx receives an OPTIONS request immediately followed by the POST request (some CORS weirdness from the client): nginx passes the OPTIONS request on, but fails to pass the POST request to gunicorn despite having received it.
Question:
What am I doing wrong here?
Many thanks
http {
    upstream flask {
        server 127.0.0.1:5050 fail_timeout=600;
    }
    # error log
    # 2022/08/18 14:49:11 [error] 1028#1028: *39 upstream timed out (110: Connection timed out) while reading upstream, ...
    # ...
    server {
        # ...
        location /api/ {
            proxy_pass http://flask/;
            proxy_read_timeout 3600;
            proxy_connect_timeout 75s;
            # ...
        }
        # ...
    }
}
# wsgi.py
from main import app

if __name__ == '__main__':
    app.run()

# flask endpoint (`app` is the Flask instance that wsgi.py imports from main)
import base64
from flask import Response, jsonify

@app.route("/process-csv", methods=['POST'])
def process_csv():
    def wrapped_run_func():
        return blob, export_filename
    # ...
    try:
        blob, export_filename = wrapped_run_func()
        b64_file = base64.b64encode(blob.getvalue()).decode()
        ret = jsonify(file=b64_file, filename=export_filename)
        # return Response(response=ret, status=200, mimetype="application/json")
        print("got blob")
        return ret
    except Exception:
        # logger.exception records the traceback itself
        app.logger.exception(f"0: Error processing file: {export_filename}")
        return Response("Internal server error", status=500)
P.S. Stack Overflow gave me this error:
"Your post appears to contain code that is not properly formatted as code. Please indent all code by 4 spaces using the code toolbar button or the CTRL+K keyboard shortcut. For more editing help, click the [?] toolbar icon."
even though the code was properly formatted with language syntax, so I'm sorry that I had to post it ugly.
Sadly, I got no response.
See the last lines for the "solution" I finally implemented.
CAUSE OF ERROR: I believe the problem is that I'm hosting the nginx server on WSL1.
I tried updating to WSL2 to see if that fixed it, but that requires enabling some kind of "nested virtualization", because this WSL1 setup is already running on a VM.
Through config changes I got to the point where no error is logged and gunicorn returns the file, but then the response just stays in the ether: nginx never gets or sends it.
"SOLUTION":
I ended up changing the code for the client, the server and the nginx.conf file:
the server saves the resulting file and returns only the file name;
the client inserts the file name into an href, which then displays a link;
on click, a request is sent to nginx, which simply serves the file from a static folder, leaving gunicorn alone.
I guess this is the optimal way to do it anyway, though it still bugs me that I couldn't find the cause of the error for sure.
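The server side of this workaround (write the result to disk, return only a file name, let nginx serve the folder statically) could look roughly like the sketch below. It is an illustration under assumptions, not the author's code: EXPORT_DIR, the /exports/ URL prefix and the JSON shape are hypothetical, and the prefix would have to match an nginx location that serves that folder. In the Flask view, the JSON body would typically be returned with jsonify instead of json.dumps.

```python
import json
import os
import uuid

EXPORT_DIR = "/srv/exports"      # hypothetical folder that nginx serves statically
EXPORT_URL_PREFIX = "/exports/"  # hypothetical matching nginx location


def save_export(data: bytes, export_dir: str = EXPORT_DIR, suffix: str = ".zip") -> str:
    """Write the generated file to disk and return only its (random) name."""
    os.makedirs(export_dir, exist_ok=True)
    name = uuid.uuid4().hex + suffix
    tmp_path = os.path.join(export_dir, name + ".part")
    with open(tmp_path, "wb") as f:
        f.write(data)
    # Rename only once fully written, so nginx never serves a half-written file.
    os.replace(tmp_path, os.path.join(export_dir, name))
    return name


def build_response_body(name: str) -> str:
    """Small JSON body the client can turn into a download link."""
    return json.dumps({"filename": name, "url": EXPORT_URL_PREFIX + name})
```

A random name avoids collisions between concurrent requests and makes links hard to guess; old files would still need cleaning up, for example with a periodic job.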

How to send metrics to InfluxDB OSS 2 using Jolokia in a Telegraf config?

After running telegraf -debug with this Jolokia config:
[[inputs.jolokia2_agent]]
urls = ["http://<other ip>:8080/jolokia-war-unsecured-1.6.2/"]
[[inputs.jolokia2_agent.metric]]
name = "jr"
mbean = "java.lang:type=Runtime"
paths = ["Uptime"]
I get these errors:
[agent] Initializing plugins
2022-07-02T12:51:57Z D! [agent] Connecting outputs
2022-07-02T12:51:57Z D! [agent] Attempting connection to [outputs.influxdb_v2]
2022-07-02T12:51:57Z D! [agent] Successfully connected to outputs.influxdb_v2
2022-07-02T12:51:57Z D! [agent] Starting service inputs
2022-07-02T12:52:07Z E! [outputs.influxdb_v2] When writing to [https://MYIP:8086]: Post "https://MYIP:8086/api/v2/write?bucket=monitoringdb&org=myorg": http: server gave HTTP response to HTTPS client
2022-07-02T12:52:07Z D! [outputs.influxdb_v2] Buffer fullness: 81 / 10000 metrics
2022-07-02T12:52:07Z E! [agent] Error writing to outputs.influxdb_v2: failed to send metrics to any configured server(s)
2022-07-02T12:52:07Z E! [outputs.influxdb_v2] When writing to [https://MYIP:8086]: Post "https://MYIP:8086/api/v2/write?bucket=monitoringdb&org=myorg": http: server gave HTTP response to HTTPS client
This error comes from your influxdb_v2 output. It says your client is using HTTPS, but the server responded with a plain HTTP response. In your config you probably specified a URL with https://, while the server is probably only serving http://.

Flask server is truncating long JSON responses some of the time

My route fetches user tokens:
GET /tokens
The average response time is around 180 ms and the response is JSON.
I'm using Flask + nginx.
For some requests, the response content is truncated at around 33 KB, so the JSON is malformed. For other requests with the same parameters, made at nearly the same time, the response is fine at around 216 KB.
My question is: why is this happening, and why is it inconsistent?
Here is the Flask response code:
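import json
from flask import Response

class NormalResponse(Response):
    def __init__(self, response):
        super(NormalResponse, self).__init__(response, 200)

def get_tokens():  # hypothetical name for the GET /tokens view; paginator and tokens are defined elsewhere
    res = json.dumps(paginator.paginate(tokens))
    return NormalResponse(res)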
I found that the issue is related to nginx, since the failed responses produce this log entry:
2018/12/18 16:35:17 [crit] 16#16: *95010 open() "/var/tmp/nginx/uwsgi/1/42/0000000421" failed (13: Permission denied) while reading upstream, client: 172.31.72.76, server: , request: "GET /tokens?limit=501&offset=0&order=desc&owner_id=11111 HTTP/1.1", upstream: "uwsgi://unix:/run/server.socket:", host: "oauth.dev.bla_bla.com"
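It seems the response overflows nginx's in-memory buffers and nginx tries to spill it to a temporary file, and your error message pretty much confirms it. Since the upstream here is uwsgi, the relevant directives are uwsgi_buffers and uwsgi_temp_path (the uwsgi counterparts of proxy_buffers and proxy_temp_path); the path in the log, /var/tmp/nginx/uwsgi/1/42/0000000421, sits under that temp path. You should check the file permissions of nginx's worker user on that folder. This would also plausibly explain the inconsistency: nginx only writes to the temp file when a response outgrows the buffers before the client has consumed them, so whether a given request hits the unwritable folder can depend on timing.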
This problem was resolved by adding the following line to the Dockerfile:
RUN chown -R www-data:www-data /var/tmp/nginx
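To check the permissions as the answer suggests, a small standard-library helper like the one below could be run as the nginx worker user. It lists directories under the temp path that the current user cannot write into or traverse, which nginx needs in order to create its buffer files. This is a diagnostic sketch, not part of the original fix; the module name used in the usage line is hypothetical.

```python
import os


def unwritable_dirs(root):
    """Return directories under root that the current (real) user cannot write to or enter."""
    needed = os.W_OK | os.X_OK  # W: create files/subdirs, X: traverse
    bad = []
    if os.path.isdir(root) and not os.access(root, needed):
        bad.append(root)
    for dirpath, dirnames, _filenames in os.walk(root):
        # Check children explicitly: os.walk silently skips directories it cannot list.
        for name in dirnames:
            path = os.path.join(dirpath, name)
            if not os.access(path, needed):
                bad.append(path)
    return bad
```

For example, inside the container: `su -s /bin/sh www-data -c "python3 -c 'from check_dirs import unwritable_dirs; print(unwritable_dirs(\"/var/tmp/nginx\"))'"`. os.access checks against the real user ID, which is why it has to run as www-data rather than root; as root every directory would pass.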

nginx Connection timed out while reading response header from upstream

I am using nginx + uWSGI in front of a Flask app. In the nginx settings, the server block has server_name *.mydomain.com; and the location block for uWSGI looks like this:
location /api/ {
    include uwsgi_params;
    uwsgi_pass unix:///var/uwsgi/app.sock;
    .........
}
The issue is that I can access app.mydomain.com, but when I try app1.mydomain.com, the uWSGI log shows no request at all. The nginx error log shows:
upstream timed out (110: Connection timed out) while reading response header from upstream, client: 122.166.94.231, server: *.mydomain.com, request: "GET /api/client/generic/ping HTTP/1.1", upstream: "uwsgi://unix:///var/uwsgi/app.sock", host: "app1.mydomain.com
I have another test setup where all these settings are the same, and it works. Any pointers? When I restart uWSGI and nginx, app1.mydomain.com works until I load app.mydomain.com. (The initial load of app.mydomain.com fails, but if I keep refreshing it loads; then app1.mydomain.com returns 504 Gateway Timeout and the log shows "Connection timed out while reading response header from upstream".)
It worked when I added single-interpreter = true to the uwsgi.ini settings.
A newly added Python library was causing the issue.
I don't know whether this will help others.
I also ran into the same issue. uWSGI has "http", "http-socket" and "socket" options. When putting uWSGI behind a full webserver like Nginx, we should spawn uWSGI to natively speak the uWSGI protocol:
uwsgi --socket 127.0.0.1:3031 --wsgi-file foobar.py --master --processes 4 --threads 2 --stats 127.0.0.1:9191
More details from uwsgi documentation: https://uwsgi-docs.readthedocs.io/en/latest/WSGIquickstart.html#putting-behind-a-full-webserver
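For reference, the quickstart linked above loads a minimal WSGI file along the lines of the sketch below (this version sets text/plain and Content-Length; the docs' own example may differ slightly). By default uWSGI looks for a callable named `application`, which is why a Flask setup whose object is called `app` needs `callable = app` (or `--callable app`).

```python
def application(environ, start_response):
    """Minimal WSGI callable; `application` is the name uWSGI looks for by default."""
    body = b"Hello World"
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(body)))])
    return [body]
```

Saved as foobar.py, this is what `--wsgi-file foobar.py` in the command above would load; nginx would then reach it over the uwsgi protocol with `include uwsgi_params;` and `uwsgi_pass 127.0.0.1:3031;`.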
Looking at the uWSGI error logs and understanding the problem helped me. The issue was not related to the nginx configuration at all: my email host had changed, and the code threw an error while calling the send-email code.
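If a send-email call like this blocks on an unreachable host, the worker can hang until nginx gives up with a timeout. One way to make such failures show up quickly in the app's own logs is to give the SMTP connection an explicit timeout and handle the error inside the request. A minimal standard-library sketch, assuming the app sends mail with smtplib; the function name, host and port are hypothetical.

```python
import logging
import smtplib

log = logging.getLogger(__name__)


def send_notification(msg, host, port=25, timeout=10.0):
    """Send msg via SMTP; return False on failure instead of hanging or crashing the request."""
    try:
        # timeout applies to the connection attempt and each blocking socket operation
        with smtplib.SMTP(host, port, timeout=timeout) as smtp:
            smtp.send_message(msg)
        return True
    except OSError:  # covers socket errors, timeouts and smtplib.SMTPException
        log.exception("sending mail via %s:%s failed", host, port)
        return False
```

The caller can then decide whether a failed notification should fail the whole request or just be logged.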

GeoServer times out (504 Gateway Time-Out) when accessed from OpenLayers via nginx webserver

I have developed an OpenLayers web app that uses GeoServer. I am using nginx as my web server, with proxy_pass set up for GeoServer. Everything works as expected when I use "localhost", but when I switch to my IP address I get a 504 Gateway Time-Out error for
http://98.153.141.207/geoserver/cite/wfs.
I can access GeoServer at
http://98.153.141.207/geoserver/web
via a browser without problems, so it appears the proxy continues to work as expected.
The GeoServer log shows this when the problem occurs:
Request: describeFeatureType
service = WFS
version = 1.1.0
baseUrl = http://98.153.141.207:80/geoserver/
typeName[0] = {http://www.opengeospatial.net/cite}MyLayer
outputFormat = text/xml; subtype=gml/3.1.1
Then after a minute, I get the 504 Gateway Time-Out in my JavaScript console and this shows up in the GeoServer log:
09 May 06:02:15 WARN [geotools.xml] - Error parsing: http://98.153.141.207/geoserver/wfs/DescribeFeatureType?version=1.1.0&typename=cite:MyLayer
I have tried this supposedly problematic URL in a browser and it works fine.
The nginx error log contains this:
2013/05/09 06:02:15 [error] 420#3844: *54 upstream timed out (10060: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond) while reading response header from upstream, client: 98.153.141.207, server: localhost, request: "GET /geoserver/wfs/DescribeFeatureType?version=1.1.0&typename=cite:MyLayer HTTP/1.1", upstream: "http://127.0.0.1:8080/geoserver/wfs/DescribeFeatureType?version=1.1.0&typename=cite:MyLayer", host: "98.153.141.207"
Further investigation reveals that this problem seems to be restricted to WFS layers; the WMS layers work fine. Here is the declaration of my WFS layer that fails:
myLayer = new OpenLayers.Layer.Vector("MyLayer", {
    strategies: [new OpenLayers.Strategy.BBOX(), saveStrategy],
    projection: "EPSG:2276",
    protocol: new OpenLayers.Protocol.WFS({
        version: "1.1.0",
        url: "http://" + hostip + "/geoserver/cite/wfs",
        featureNS: "http://www.opengeospatial.net/cite",
        srsName: "EPSG:2276",
        featureType: "MyLayer",
        geometryName: "Poly",
        schema: "http://" + hostip + "/geoserver/wfs/DescribeFeatureType?version=1.1.0&typename=cite:MyLayer"
    })
});
Any help would be appreciated. Thanks
I managed to get this working by removing the "schema" property from the OpenLayers.Protocol.WFS of my layer. Can anyone explain why this would be the problem?
