Gitlab timeouts / slow on initial page loads - nginx

I am running Gitlab on Debian using the package from the Repository. Most of the time Gitlab is running very fast, but after longer idle times Gitlab is very slow or even times out (error 502). One time I also had a timeout on a remote git access (could not reproduce the issue - timeout on the internal API).
In my setup the the Debian machine is behind another nginx proxy which also serves some other services just fine. I did the gitlab-cli checks and everything seems fine.
In the error log of my reverse proxy I only see connection timeouts:
[error] 8643#0: *4139 upstream timed out (110: Connection timed out) while reading response header from upstream, client: 10.1.1.10, server: gitlab.mydomain.tld, request: "GET / HTTP/1.1", upstream: "http://{SERVER-IP}:80/", host: "gitlab.mydomain.tld"
I can see some errors in my unicorn_stderr.log
E, [2016-03-30T19:40:20.183991 #783] ERROR -- : worker=1 PID:16798 timeout (61s > 60s), killing
E, [2016-03-30T19:40:20.194969 #783] ERROR -- : reaped #<Process::Status: pid 16798 SIGKILL (signal 9)> worker=1
I, [2016-03-30T19:40:20.197554 #16871] INFO -- : worker=1 spawned pid=16871
I, [2016-03-30T19:40:20.197909 #16871] INFO -- : worker=1 ready
E, [2016-03-30T20:08:42.911429 #783] ERROR -- : worker=0 PID:16866 timeout (61s > 60s), killing
E, [2016-03-30T20:08:43.191151 #783] ERROR -- : reaped #<Process::Status: pid 16866 SIGKILL (signal 9)> worker=0
I, [2016-03-30T20:08:43.758363 #18728] INFO -- : worker=0 spawned pid=18728
I, [2016-03-30T20:08:44.108244 #18728] INFO -- : worker=0 ready
What I am a bit curious about is the fact that there are no errors in the log of the nginx delivered with gitlab.
Some more system information:
#sudo gitlab-rake gitlab:env:info
System information
System: Debian 8.3
Current User: git
Using RVM: no
Ruby Version: 2.1.8p440
Gem Version: 2.5.1
Bundler Version:1.10.6
Rake Version: 10.5.0
Sidekiq Version:4.0.1
GitLab information
Version: 8.5.0
Revision: a513e09
Directory: /opt/gitlab/embedded/service/gitlab-rails
DB Adapter: postgresql
URL: http://gitlab.mydomain.tld
HTTP Clone URL: http://gitlab.mydomain.tld/some-group/some-project.git
SSH Clone URL: git#gitlab.mydomain.tld:some-group/some-project.git
Using LDAP: no
Using Omniauth: no
GitLab Shell
Version: 2.6.10
Repositories: /var/opt/gitlab/git-data/repositories
Hooks: /opt/gitlab/embedded/service/gitlab-shell/hooks/
Git: /opt/gitlab/embedded/bin/git
Edit:
My nginx config on the "external" reverse proxy looks like this:
server {
listen 443;
ssl on;
server_name gitlab.mydomain.tld;
access_log /var/log/nginx/gitlab.mydomain.tld.access.log;
error_log /var/log/nginx/gitlab.mydomain.tld.error.log;
ssl_certificate /etc/nginx/ssl/gitlab.mydomain.tld_unified.crt;
ssl_certificate_key /etc/nginx/ssl/mydomain.tld.key;
location / {
proxy_pass http://gitlab:80;
proxy_redirect default;
proxy_set_header Host $http_host;
proxy_set_header X_FORWARDED_PROTO "https";
satisfy any;
}
}
Edit2:
I took the suggested answer into account and also considered this source: https://github.com/gitlabhq/gitlabhq/blob/master/doc/install/requirements.md
I assigned 2GB RAM to the VM now, and also added one additional unicorn worker.
Edit3:
The problem seems to be solved by adding more memory and using 3 unicorn workers.

Jan,
I have a similar setup although our box is dedicated to GITlab. Without knowing the specs of your server (GITLAB likes memory) and the load on that box I would suggest the following diagnostics:
Does your upstream nginx use identical parameters as the gitlab nginx configuration? They have tweaked a number of things including timeouts.
What kind of request result in time outs? Some operations (like generating diffs) can take some time to render.
If you run the requests via SSH do you also experience time outs?
Have you checked global logs in /var/log?

FYI: I had to enlarge my small GitLab installation to have 4GB RAM not to throw OOM errors
Now I think, I'd better go with gogs or other alternative.

Related

Nginx memcached with fallback to remote service

I can't get Nginx working with memcached module, the requirement is to query remote service, cache data in memcached and never fetch remote endpoint until backend invalidates the cache. I have 2 containers with memcached v1.4.35 and one with Nginx v1.11.10.
The configuration is the following:
upstream http_memcached {
server 172.17.0.6:11211;
server 172.17.0.7:11211;
}
upstream remote {
server api.example.com:443;
keepalive 16;
}
server {
listen 80;
location / {
set $memcached_key "$uri?$args";
memcached_pass http_memcached;
error_page 404 502 504 = #remote;
}
location #remote {
internal;
proxy_pass https://remote;
proxy_http_version 1.1;
proxy_set_header Connection "";
}
}
I tried to set memcached upstream incorrectly but I get HTTP 499 instead and warnings:
*3 upstream server temporarily disabled while connecting to upstream
It seems with described configuration Nginx can reach memcached successfully but can't write or read from it. I can write and read to memcached with telnet successfully.
Can you help me please?
My guesses on what's going on with your configuration
1. 499 codes
HTTP 499 is nginx' custom code meaning the client terminated connection before receiving the response (http://lxr.nginx.org/source/src/http/ngx_http_request.h#0120)
We can easily reproduce it, just
nc -k -l 172.17.0.6 172.17.0.6:11211
and curl your resource - curl will hang for a while and then press Ctrl+C — you'll have this message in your access logs
2. upstream server temporarily disabled while connecting to upstream
It means nginx didn't manage to reach your memcached and just removed it from the pool of upstreams. Suffice is to shutdown both memcached servers and you'd constantly see it in your error logs (I see it every time with error_log ... info).
As you see these messages your assumption that nginx can freely communicate with memcached servers doesn't seem to be true.
Consider explicitly setting http://nginx.org/en/docs/http/ngx_http_memcached_module.html#memcached_bind
and use the -b option with telnet to make sure you're correctly testing memcached servers for availability via your telnet client
3. nginx can reach memcached successfully but can't write or read from it
Nginx can only read from memcached via its built-in module
(http://nginx.org/en/docs/http/ngx_http_memcached_module.html):
The ngx_http_memcached_module module is used to obtain responses from
a memcached server. The key is set in the $memcached_key variable. A
response should be put in memcached in advance by means external to
nginx.
4. overall architecture
It's not fully clear from your question how the overall schema is supposed to work.
nginx's upstream uses weighted round-robin by default.
That means your memcached servers will be queried once at random.
You can change it by setting memcached_next_upstream not_found so a missing key will be considered an error and all of your servers will be polled. It's probably ok for a farm of 2 servers, but unlikely is it what your want for 20 servers
the same is ordinarily the case for memcached client libraries — they'd pick a server out of a pool according to some hashing scheme => so your key would end up on only 1 server out of the pool
5. what to do
I've managed to set up a similar configuration in 10 minutes on my local box - it works as expected. To mitigate debugging I'd get rid of docker containers to avoid networking overcomplication, run 2 memcached servers on different ports in single-threaded mode with -vv option to see when requests are reaching them (memcached -p 11211 -U o -vv) and then play with tail -f and curl to see what's really happening in your case.
6. working solution
nginx config:
https and http/1.1 is not used here but it doesn't matter
upstream http_memcached {
server 127.0.0.1:11211;
server 127.0.0.1:11212;
}
upstream remote {
server 127.0.0.1:8080;
}
server {
listen 80;
server_name server.lan;
access_log /var/log/nginx/server.access.log;
error_log /var/log/nginx/server.error.log info;
location / {
set $memcached_key "$uri?$args";
memcached_next_upstream not_found;
memcached_pass http_memcached;
error_page 404 = #remote;
}
location #remote {
internal;
access_log /var/log/nginx/server.fallback.access.log;
proxy_pass http://remote;
proxy_set_header Connection "";
}
}
server.py:
this is my dummy server (python):
from random import randint
from flask import Flask
app = Flask(__name__)
#app.route('/')
def hello_world():
return 'Hello: {}\n'.format(randint(1, 100000))
This is how to run it (just need to install flask first)
FLASK_APP=server.py [flask][2] run -p 8080
filling in my first memcached server:
$ telnet 127.0.0.1 11211
Trying 127.0.0.1...
Connected to 127.0.0.1.
Escape character is '^]'.
set /? 0 900 5
cache
STORED
quit
Connection closed by foreign host.
checking:
note that we get a result every time although we stored data
only in the first server
$ curl http://server.lan && echo
cache
$ curl http://server.lan && echo
cache
$ curl http://server.lan && echo
cache
this one is not in the cache so we'll get a response from server.py
$ curl http://server.lan/?q=1 && echo
Hello: 32337
whole picture:
the 2 windows on the right are
memcached -p 11211 -U o -vv
and
memcached -p 11212 -U o -vv

nginx Connection timed out while reading response header from upstream

I am using nginx + uwsgi over a flask app. In nginx settings the server block is having server_name *.mydomain.com; and location block for uwsgi is like
location /api/ {
include uwsgi_params;
uwsgi_pass unix:///var/uwsgi/app.sock;
.........
}
so the issue is I can access app.mydomain.com, but when i am trying app1.mydomain.com uwsgi log is not showing any request. nginx error log is showing
upstream timed out (110: Connection timed out) while reading response header from upstream, client: 122.166.94.231, server: *.mydomain.com, request: "GET /api/client/generic/ping HTTP/1.1", upstream: "uwsgi://unix:///var/uwsgi/app.sock", host: "app1.mydomain.com
I have another test setup where all these settings are same and its working. Any pointers? When i restart uwsgi and nginx app1.mydomain.com works, until i load app.mydomain.com (initial load of app.mydomain.com fails, but if i keep on refreshing it loads then app1.mydomain.com raises 504 gateway timeout and log shows Connection timed out while reading response header from upstream).
It worked when I added single-interpreter = true in uwsgi.ini settings.
A newly added python library was causing the issue.
Don't know whether this will help others.
I also ran into the same issue. uWSGI has "http", "http-socket" and "socket" options. When putting uWSGI behind a full webserver like Nginx, we should spawn uWSGI to natively speak the uWSGI protocol:
uwsgi --socket 127.0.0.1:3031 --wsgi-file foobar.py --master --processes 4 --threads 2 --stats 127.0.0.1:9191
More details from uwsgi documentation: https://uwsgi-docs.readthedocs.io/en/latest/WSGIquickstart.html#putting-behind-a-full-webserver
Looking at the uwsgi error logs and understanding what the problem is helped me. Issue was not related to Nginx configurations at all. My email host has changed and the code threw error while calling the send email code.

Rails 4.2 + NGINX - application root won't load

My server setup works for a Rails 4.0 app, but fails on a 4.2 app. I get this error:
An error occurred.
Sorry, the page you are looking for is currently unavailable.
Please try again later.
If you are the system administrator of this resource then you should check the error log for details.
NGINX config:
server {
listen 80;
server_name localhost;
passenger_enabled on;
rails_env production;
root /home/deploy/myapp/current/public;
}
NGINX error.log:
2014/10/13 16:17:06 [error] 9261#0: *9 upstream prematurely closed connection while reading response header from upstream, client: ***.***.***.***, server: localhost, request: "GET / H$
Rails production.log:
W, [2014-10-13T16:11:57.305892 #10891] WARN -- : Warning. Error encountered while saving cache a4b17298d22d34199795f642dc5b96ec8d58cc6c/orders.css.scssc: can't dump anonymous class #<$
W, [2014-10-13T16:11:57.314170 #10891] WARN -- : Warning. Error encountered while saving cache a4b17298d22d34199795f642dc5b96ec8d58cc6c/pages.css.scssc: can't dump anonymous class #<C$
W, [2014-10-13T16:11:57.319744 #10891] WARN -- : Warning. Error encountered while saving cache a4b17298d22d34199795f642dc5b96ec8d58cc6c/registrations.css.scssc: can't dump anonymous c$
If I manually put in an index.html file in the public directory, I can see that. But it fails when I want to go to the root path of the app. Any ideas?
OK so this is a bit embarrassing. To further troubleshoot my application, I started in production mode on my local machine, and when I loaded the app I got the following error on the web page:
Missing `secret_key_base` for 'production' environment, set this value in `config/secrets.yml`
So that was it. I guess I missed this new security feature by jumping straight from Rails 4.0 to 4.2 Not sure why it didn't show in the logs, but at least I found it eventually.

Problems running flask app on uwsgi / nginx

I have created a flask app and up to this point have been using the default flask server for creating/testing it. Now i want to deploy it to a server. I am using uwsgi and nginx, though i am pretty new to both. i know there are a lot of guides and questions about similar things, but i couldnt find the solution after looking through as much as i could understand
The following is from my uwsgi log :
machine: x86_64
clock source: unix
detected number of CPU cores: 1
current working directory: /home/ben/flask/MLS-Flask
detected binary path: /home/ben/flask/MLS-Flask/mls-flask-ve/bin/uwsgi
!!! no internal routing support, rebuild with pcre support !!!
*** WARNING: you are running uWSGI without its master process manager ***
your processes number limit is 1024
your memory page size is 4096 bytes
detected max file descriptor number: 1024
lock engine: pthread robust mutexes
thunder lock: disabled (you can enable it with --thunder-lock)
uwsgi socket 0 bound to UNIX address /home/ben/flask/MLS-Flask/mls_uwsgi.sock fd 3
Python version: 3.3.3 (default, Dec 30 2013, 16:29:41) [GCC 4.4.7 20120313 (Red Hat 4.4.7-4)]
Set PythonHome to /home/ben/flask/MLS-Flask/mls-flask-ve
*** Python threads support is disabled. You can enable it with --enable-threads ***
Python main interpreter initialized at 0x11755d0
your server socket listen backlog is limited to 100 connections
your mercy for graceful operations on workers is 60 seconds
mapped 72760 bytes (71 KB) for 1 cores
*** Operational MODE: single process ***
added /home/ben/flask/MLS-Flask/ to pythonpath.
WSGI app 0 (mountpoint='') ready in 0 seconds on interpreter 0x11755d0 pid: 2926 (default app)
*** uWSGI is running in multiple interpreter mode ***
spawned uWSGI worker 1 (and the only) (pid: 2926, cores: 1)
I am assuming the uwsgi is at least running? I am fairly new to this so i am not quite sure that the problem is.
my nginx config is :
server{
listen 8080;
charset utf-8;
location / {try_files $uri #app; }
location #app {
include uwsgi_params;
uwsgi_pass unix:/home/ben/flask/MLS-Flask/mls_uwsgi.sock;
}
}
my uwsgi ini is :
[uwsgi]
uid = nginx
gid = nginx
base = /home/ben/flask/MLS-Flask
home = %(base)/mls-flask-ve
pythonpath = %(base)
chdir = /home/ben/flask/MLS-Flask
module = runp
#socket file's location
socket = /home/ben/flask/MLS-Flask/mls_uwsgi.sock
#permissions for the socket file
chmod-socket = 666
#variable that holds a flask application inside the module imported
callable = app
#location of log file
logto = /var/log/uwsgi/%n.log
and the file the uwsgi ini is running is my flask app:
from app import app
if __name__ == "__main__":
app.run(debug = False, port = 8080)
I may have some extraneous stuff in my uwsgi ini or nginx config, but i am not sure if those would necessarily be the problems. Can anyone see any reasons why this might not be working? I am currently getting a 502 bad gateway error on localhost:8080, so i am guessing it has something to do with my flask, uwsgi ini/socket.
i appreciate any help.
It turned out my nginx user didnt have access to the socket because the / and /home/ directory was owned by the root group and root user. I ended up giving full access to the owner and group all the way from / directory to the socket (this probably is not the safest solution security wise, but i can further refine it after i get everything working.)
I had the same problem :
Always check socket permissions by using ls -lhtr
Try putting socket in /run/myapp/mysock.sock folder
Create an empty sock file in this folder vi mysock.sock
Set permissions of this empty file to have full access by your user and group stated
in the service. chown user:group /run/myapp/mysock.sock

nginx errors readv() and recv() failed

I use nginx along with fastcgi. I see a lot of the following errors in the error logs
readv() failed (104: Connection reset
by peer) while reading upstream and
recv() failed (104: Connection reset
by peer) while reading response header
from upstream
I don't see any problem using the application. Are these errors serious or how to get rid of them.
I was using php-fpm in the background and slow scripts were getting killed after a said timeout because it was configured that way. Thus, scripts taking longer than a specified time would get killed and nginx would report a recv or readv error as the connection is closed from the php-fpm engine/process.
Update:
Since nginx version 1.15.3 you can fix this by setting the keepalive_requests option of your upstream to the same number as your php-fpm's pm.max_requests:
upstream name {
...
keepalive_requests number;
...
}
Original answer:
If you are using nginx to connect to php-fpm, one possible cause can also be having nginx' fastcgi_keep_conn parameter set to on (especially if you have a low pm.max_requests setting in php-fpm):
http|server|location {
...
fastcgi_keep_conn on;
...
}
This may cause the described error every time a child process of php-fpm restarts (due to pm.max_requests being reached) while nginx is still connected to it. To test this, set pm.max_requests to a really low number (like 1) and see if you get even more of the above errors.
The fix is quite simple - just deactivate fastcgi_keep_conn:
fastcgi_keep_conn off;
Or remove the parameter completely (since the default value is off). This does mean your nginx will reconnect to php-fpm on every request, but the performance impact is negligible if you have both nginx and php-fpm on the same machine and connect via unix socket.
Regarding this error:
readv() failed (104: Connection reset by peer) while reading upstream and recv() failed (104: Connection reset by peer) while reading response header from upstream
there was 1 more case where I could still see this.
Quick set up overview:
CentOS 5.5
PHP with PHP-FPM 5.3.8 (compiled from scratch with some 3rd party
modules)
Nginx 1.0.5
After looking at the PHP-FPM error logs as well and enabling catch_workers_output = yes in the php-fpm pool config, I found the root cause in this case was actually the amfext module (PHP module for Flash).
There's a known bug and fix for this module that can be corrected by altering the amf.c file.
After fixing this PHP extension issue, the error above was no longer an issue.
This is a very vague error as it can mean a few things. The key is to look at all possible logs and figure it out.
In my case, which is probably somewhat unique, I had a working nginx + php / fastcgi config. I wanted to compile a new updated version of PHP with PHP-FPM and I did so. The reason was that I was working on a live server that couldn't afford downtime. So I had to upgrade and move to PHP-FPM as seamlessly as possible.
Therefore I had 2 instances of PHP.
1 directly talking with fastcgi (PHP 5.3.4) - using TCP / 127.0.0.1:9000 (PHP 5.3.4)
1 configured with PHP-FPM - using Unix socket - unix:/dir/to/socket-fpm
(PHP 5.3.8)
Once I started up PHP-FPM (PHP 5.3.8) on an nginx vhost using a socket connection instead of TCP I started getting this upstream error on any fastcgi page taking longer than x minutes whether they were using FPM or not. Typically it was pages doing large SELECTS in mysql that took ~2 min to load. Bad I know, but this is because of back end DB design.
What I did to fix it was add this in my vhost configuration:
fastcgi_read_timeout 5m;
Now this can be added in the nginx global fastcgi settings as well. It depends on your set up. http://wiki.nginx.org/HttpFcgiModule
Answer # 2.
Interestingly enough fastcgi_read_timeout 5m; fixed one vhost for me.
However I was still getting the error in another vhost, just by running phpinfo();
What fixed this for me was by copying over a default production php.ini file and adding the config I needed into it.
What I had was an old copy of my php.ini from the previous PHP install.
Once I put the default php.ini from 'shared' and just added in the extensions and config I needed, this solved my problem and no longer did I have nginx errors readv() and recv() failed.
I hope 1 of these 2 fixes helps someone.
Also it can be a very simple problem - there is an infinity cicle somewhere in your code, or an infinity trying to connect an external host on your page.
Some times this problem happen because of huge of requests. By default the pm.max_requests in php5-fpm maybe is 100 or below.
To solve it increase its value depend on the your site's requests, For example 500.
And after the you have to restart the service
sudo service php5-fpm restart
Others have mentioned the fastcgi_read_timeout parameter, which is located in the nginx.conf file:
http {
...
fastcgi_read_timeout 600s;
...
}
In addition to that, I also had to change the setting request_terminate_timeout in the file: /etc/php5/fpm/pool.d/www.conf
request_terminate_timeout = 0
Source of information (there are also a few other recommendations for changing php.ini parameters, which may be relevant in some cases): https://ma.ttias.be/nginx-and-php-fpm-upstream-timed-out-failed-110-connection-timed-out-or-reset-by-peer-while-reading/

Resources