IRCD-hybrid Connections Limited at 4026

I'm running an Ubuntu Server 12.04 instance on EC2 with an installation of ircd-hybrid 7.2 on it. Right now I'm trying to load test the server by opening a bunch of connections and seeing how many it can handle. I have a script that connects and joins a channel.
My problem is that I can get at most 4026 connections to the server; any further socket connections simply fail. I have the max clients setting at 100k just to be safe, and the per-IP maximum at 50k.
When I run:
sysctl fs.file-nr -> fs.file-nr = 4576 0 1513750
Also, my ulimits have been set:
ulimit -S -> 65536
My ulimit -n is 1024, but since I can get 4026 connections, I don't see how that's affecting it.
ulimit -n -> 1024
Memory and CPU are also nowhere even close to maximum when I run into this.
My code is this:
import random
import sys
import socket
import string
import time

# Random 40-character string used as nick/ident/realname so each test client is unique.
n = ''.join(random.choice(string.letters) for i in xrange(40))

HOST = "<MYHOST IS HERE>"
PORT = 6666
NICK = n
IDENT = n
REALNAME = n
readbuffer = ""

# Connect, register with the server and join a channel.
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect((HOST, PORT))
s.send("NICK %s\r\n" % NICK)
s.send("USER %s %s %s :%s\r\n" % (IDENT, HOST, REALNAME, REALNAME))
s.send('JOIN #foobar\r\n')

# Read forever, answering server PINGs so the connection stays open.
while 1:
    readbuffer = readbuffer + s.recv(1024)
    temp = string.split(readbuffer, "\n")
    readbuffer = temp.pop()
    for line in temp:
        line = string.rstrip(line)
        line = string.split(line)
        if 'PRIVMSG' in line:
            print line
        if line[0] == "PING":
            s.send("PONG %s\r\n" % line[1])
Is there a setting in ircd-hybrid that caps this? When I already have 4026 connections and try to connect with a regular client, the terminal window says "Server is full".

There are two kinds of ulimit, hard and soft. A process may raise its soft limit on a particular resource up to the hard limit, but no further.
On my box (Ubuntu 12.04), the soft file descriptor limit is 1024, but the hard limit is 4096:
$ ulimit -n
1024
moment@moment:~/tmp 20:26:04 0
$ ulimit -n -H
4096
moment@moment:~/tmp 20:26:16 0
$ ulimit -n -S
1024
It's entirely plausible that your IRC server is raising its soft limit up to this hard limit.
A horrible hack to increase the limit temporarily works like this:
sudo su
ulimit -n 10000
su USERNAME
Long term, you would need to increase the limit system-wide or, preferably, raise the ulimit for just the process you are running. For daemons I normally do this with the limit stanza in upstart configuration files.
In general, strace can be useful for debugging problems like this (it will probably show the server's earlier call to raise its file descriptor limit).
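If the load-testing script itself is what hits the cap, it can also raise its own soft limit at startup instead of relying on the shell. Below is a minimal sketch using Python's standard resource module (the same language as the question's script); the 10000 target is only an illustrative value and is still bounded by the hard limit:

import resource

# Current soft/hard limits on open file descriptors for this process.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print("before: soft=%s hard=%s" % (soft, hard))

# Raise the soft limit as far as the hard limit allows (10000 is illustrative).
target = 10000 if hard == resource.RLIM_INFINITY else min(10000, hard)
resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
print("after: soft=%s" % resource.getrlimit(resource.RLIMIT_NOFILE)[0])

Run this before opening the sockets; getting past the hard limit itself still needs a root-level change such as /etc/security/limits.conf or the daemon's upstart job.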

Related

Ubuntu (Oracle VM) - Mounted Samba shares hang indefinitely

I have a VM instance on Oracle Cloud (Ubuntu 22.04) set up with ZeroTier to act as a web server for some services that should work with my local Synology NAS.
For some of those services I also need to mount three SMB shares from my NAS over the ZeroTier tunnel, but I can't make it work.
I have used mount and mount.cifs plenty of times, with automounting too, but this time it acts very strangely:
running the mount command seems to succeed from the console, but /var/log/syslog reads
CIFS: VFS: \\XXX.XXX.XXX.XXX has not responded in 180 seconds.
Reconnecting...
trying to access one of the shares (with ls, lsof, cd or any other command) succeeds for only one of the shares (always the same one), and only the first time a command is given:
$ ls /temp
folder1 folder2 folder3
any following command just "hangs" as if the system is working on something, and most of the time it stays like that indefinitely:
$ ls /temp
█
Just a few times it spits out this error:
lsof: WARNING: can't stat() cifs file system /temp
Output information may be incomplete.
ls 1475 ubuntu 3r DIR 0,44 0 123207681 /temp
findmnt reads:
└─/temp //XXX.XXX.XXX.XXX/Downloads cifs rw,relatime,vers=2.0,cache=strict, username=[redacted],uid=1005,noforceuid,gid=0,noforcegid,addr=XXX.XXX.XXX.XXX,file_mode=0755,dir_mode=0755,soft,nounix,serverino,mapposix,rsize=65536,wsize=65536,bsize=1048576,echo_interval=60,actimeo=1
the remaining two "mounted" shares don't respond to any command, not even the very first one; they just hang like the share that at least lets me browse once;
umount and umount -l take at least 2-3 minutes to successfully unmount the shares.
Same behavior when using smbclient and also with NFS shares from the same NAS.
What I have already tried:
update kernel and all packages;
remove, purge and reinstall cifs-utils, smbclient and so on...
tried mounting the same shares in another client / node within the ZeroTier network and it works just fine; also browsing from Windows and Android file manager apps with and without ZeroTier works flawlessly;
tried all SMB versions including SMBv3 and SMBv1 (CIFS);
tried different browsing or mounting methods / commands including mount, mount.cifs, autofs, smbclient;
tried to debug what happens behind the console, but didn't find anything that seems related to this in logs, htop or anything else. During the "hanging" sessions there is no spike in CPU, RAM or network usage on either the Oracle VM or the Synology NAS;
checked, reset and reconfigured all permissions on my NAS for shares, folders and files recursively and reconfigured users groups permissions.
What I haven't tried yet (I'll try as soon as possible):
reproduce this on another Oracle VM configured the same as the faulty one and another with a different base image (maybe Oracle Linux?);
It seems to me that the mount.cifs process doesn't really succeed in mounting the share correctly, as it doesn't show as such anywhere. It also seems to be an issue not related to folder/file permissions, but rather something related to networking.
A note on something that may or may not be related: ZeroTier on my Synology NAS does not seem to work with IPv4 only - it remains OFFLINE. The node goes ONLINE only when IPv6 is enabled, but I must say this is the only node in my ZT network that shows an IPv6 address as its public IP in the ZT web GUI - the other nodes show public IPv4 addresses.
If anyone has any clue on this, I'll be happy to support and reproduce any advice. Thank you!
I'm using Tailscale, but I presume it will work the same.
You need to add port 445 to /etc/iptables/rules.v4, just under the SSH rule, like below:
-A INPUT -p tcp -m state --state NEW -m tcp --dport 22 -j ACCEPT
-A INPUT -p tcp -m state --state NEW -m tcp --dport 445 -j ACCEPT (like this)
Then you need to edit the interfaces in /etc/samba/smb.conf to:
interfaces = lo tailscale0 100.0.0.0/24
Obviously, my interface is tailscale0, but yours will be different; use ip link show to find yours. You may also need to change the IP range to suit ZeroTier's, instead of 100.0.0.0/24, which is what Tailscale uses.
Then reboot!
I couldn't get it working without doing this.
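Before retrying the mount, it can also be worth confirming that TCP 445 is actually reachable over the tunnel once the firewall rule is in place. Here is a minimal Python sketch; the two addresses are placeholders for the NAS's tunnel IP and this host's own ZeroTier/Tailscale IP:

import socket

NAS_IP = "XXX.XXX.XXX.XXX"           # placeholder: the NAS address inside the tunnel network
LOCAL_TUNNEL_IP = "YYY.YYY.YYY.YYY"  # placeholder: this host's own tunnel address

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.settimeout(5)
s.bind((LOCAL_TUNNEL_IP, 0))         # send the probe from the tunnel address
try:
    s.connect((NAS_IP, 445))
    print("TCP 445 reachable over the tunnel")
except socket.error as exc:
    print("TCP 445 not reachable: %s" % exc)
finally:
    s.close()

If this probe fails but a plain connect without the bind() succeeds, the problem is more likely in how traffic is routed or filtered over the tunnel than in Samba itself.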

Cstack_info() output different between RStudio Server and RStudio Desktop on Ubuntu 20.04 LTS

I am having trouble getting rid of the CStack limit when running my code.
I managed to get rid of the error by appending
* hard stack unlimited
* soft stack unlimited
* soft memlock unlimited
* hard memlock unlimited
root soft stack unlimited
root hard stack unlimited
root soft memlock unlimited
root hard memlock unlimited
to /etc/security/limits.conf which fixes the problem on RStudio Desktop.
I get the following output from running Cstack_info()
> Cstack_info()
size current direction eval_depth
NA NA 1 2
This is the output from ulimit -s on the desktop terminal
coolshades@coolshades-ws:~$ ulimit -s
unlimited
Code runs perfectly on RStudio Desktop.
On the same machine I am also running RStudio Server (free) to run code remotely. It would seem that these settings do not stick for RStudio Server.
This is the output from Cstack_info() on the RStudio Server
> Cstack_info()
size current direction eval_depth
7969177 26336 1 2
This is the ulimit output from terminal on the RStudio Server
coolshades@coolshades-ws:~$ ulimit -s
8192
I can change the limit back to unlimited with ulimit -s unlimited, but it only takes effect once the R session is restarted, and when I restart the R session the output of ulimit -s reverts to 8192.
I am out of ideas as to how best to tackle this problem and hope a more experienced RStudio Server user will be able to advise on this matter.
I have solved this problem.
I had to add DefaultLimitSTACK=134217728 to both of the following files:
sudo nano /etc/systemd/user.conf
sudo nano /etc/systemd/system.conf
Make sure the number you define is a power of 2, or Ubuntu fails to log in for some reason.
I have 128 GB of RAM, so I set my limit to 2^27 bytes (134217728, i.e. 128 MB).
Hope this helps someone with the same problem.
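To confirm the new default actually reaches ordinary user sessions after the reboot (independently of R), any process started in the session can report the stack limit it inherited. A small sketch with Python's resource module, assuming the systemd defaults above are the only change made:

import resource

def show(value):
    # RLIM_INFINITY means "unlimited"; other values are bytes.
    return "unlimited" if value == resource.RLIM_INFINITY else value

soft, hard = resource.getrlimit(resource.RLIMIT_STACK)
print("stack soft limit:", show(soft))
print("stack hard limit:", show(hard))

If this reports 134217728 but Cstack_info() inside RStudio Server still shows a small size, the R session process is probably being started with its own limit, and the server's service configuration is the next place to look.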

ApacheBench gets "Connection reset by peer" from Go server only with "-race" flag

I have a minimal Go HTTP server (code below). When I start the server with go run server.go and then fire off 5000 concurrent requests using
ab -c 5000 -n 5000 http://localhost:8080/
everything works as expected. However, if I start my server with the race detector flag:
go run -race server.go
then I get an issue running ApacheBench even with only 1000 concurrent requests:
apr_socket_recv: Connection reset by peer (54)
Interestingly, my Go server doesn't crash or print any error messages, and is able to continue receiving new requests. This suggests that the problem is not the Go process running out of memory because of the "-race" burden.
Additional details:
I'm running Go 1.10 on a Mac
ab -V tells me I'm using version 2.3 of ab (the default shipped with the MacBook, and it looks like ab has been dropped from brew).
If I run ab with the -r flag so that it doesn't exit right away, I get the output: Test aborted after 10 failures. So it seems like my Go server must be dropping connections rather than queueing them up...
Go server code:
package main

import (
    "log"
    "net/http"
)

func handler(w http.ResponseWriter, r *http.Request) {
    log.Printf("got one\n")
}

func main() {
    http.HandleFunc("/", handler)
    log.Fatal(http.ListenAndServe(":8080", nil))
}
ulimit settings:
$ ulimit -a
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
file size (blocks, -f) unlimited
max locked memory (kbytes, -l) unlimited
max memory size (kbytes, -m) unlimited
open files (-n) 4864
pipe size (512 bytes, -p) 1
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 709
virtual memory (kbytes, -v) unlimited

bind failure: Address already in use even though recycle and reuse flags are set to 1

Environment:
Unix client and unix server.
Tool used : curl.
The client/server should ignore the TIME_WAIT period (2*MSL) when establishing a connection.
This is done by executing the following commands:
sysctl net.ipv4.tcp_tw_reuse=1
sysctl net.ipv4.tcp_tw_recycle=1
The local port must be specified so that it can be re-used.
Start the connection.
Example : while [ 1 ]; do curl --local-port 9056 192.168.40.2; sleep 30; done
I am still seeing the error even though the TIME_WAIT period should have been ignored.
Any idea why this is happening?
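For reference, here is a minimal Python sketch (an illustration of the mechanism, not the original curl setup) of what re-using a fixed local port looks like at the socket level. SO_REUSEADDR is the per-socket flag that lets bind() succeed while an old connection from that port is still in TIME_WAIT; it is separate from the tcp_tw_reuse/tcp_tw_recycle sysctls, and connect() to the very same destination can still fail while the old 4-tuple is in TIME_WAIT:

import socket

LOCAL_PORT = 9056
REMOTE = ("192.168.40.2", 80)   # port 80 assumed, curl's default in the example above

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
# Allow binding the fixed local port even if a previous connection from it is in TIME_WAIT.
s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
s.bind(("", LOCAL_PORT))
s.connect(REMOTE)               # can still raise EADDRNOTAVAIL for a 4-tuple in TIME_WAIT
s.close()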

How can I test an outbound connection to an IP address as well as a specific port?

OK, we all know how to use PING to test connectivity to an IP address. What I need to do is something similar, but test whether my outbound request to a given IP address and a specific port (in the present case 1775) is successful. The test should preferably be performed from the command prompt.
Here is a small site I made that lets you test any outgoing port. The server listens on all available TCP ports.
http://portquiz.net
telnet portquiz.net XXXX
If there is a server running on the target IP/port, you could use Telnet. Any response other than "can't connect" would indicate that you were able to connect.
To automate the awesome portquiz.net service, I wrote a bash script:
NB_CONNECTION=10
PORT_START=1
PORT_END=1000
for (( i=$PORT_START; i<=$PORT_END; i=i+NB_CONNECTION ))
do
    iEnd=$((i + NB_CONNECTION))
    for (( j=$i; j<$iEnd; j++ ))
    do
        #(curl --connect-timeout 1 "portquiz.net:$j" &> /dev/null && echo "> $j") &
        (nc -w 1 -z portquiz.net "$j" &> /dev/null && echo "> $j") &
    done
    wait
done
If you're testing TCP/IP, a cheap way to test remote addr/port is to telnet to it and see if it connects. For protocols like HTTP (port 80), you can even type HTTP commands and get HTTP responses.
e.g.
Command    IP             Port
telnet     192.168.1.1    80
The fastest / most efficient way I found to do this is with nmap and portquiz.net, described here: http://thomasmullaly.com/2013/04/13/outgoing-port-tester/ This scans the top 1000 most-used ports:
# nmap -Pn --top-ports 1000 portquiz.net
Starting Nmap 6.40 ( http://nmap.org ) at 2017-08-02 22:28 CDT
Nmap scan report for portquiz.net (178.33.250.62)
Host is up (0.072s latency).
rDNS record for 178.33.250.62: electron.positon.org
Not shown: 996 closed ports
PORT STATE SERVICE
53/tcp open domain
80/tcp open http
443/tcp open https
8080/tcp open http-proxy
Nmap done: 1 IP address (1 host up) scanned in 4.78 seconds
To scan them all (took 6 sec instead of 5):
# nmap -Pn -p1-65535 portquiz.net
The bash script example from @benjarobin for testing a sequence of ports did not work for me, so I created this minimal, not-really-one-line (command-line) example. It writes the open ports from the sequence 1-65535 (all applicable communication ports) to a local file and suppresses all other output:
for p in $(seq 1 65535); do curl -s --connect-timeout 1 portquiz.net:$p >> ports.txt; done
Unfortunately, this takes 18.2 hours to run, because the minimum connection timeout allowed by my older version of curl is 1 (integer seconds only). If you have a curl version >= 7.32.0 (type "curl -V"), you might try smaller decimal values, depending on how fast you can connect to the service. Or try a smaller port range to minimise the duration.
Furthermore, it will append to the output file ports.txt so if run multiple times, you might want to remove the file first.
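If curl's one-second floor is the bottleneck, the sweep can also be parallelised. A rough Python sketch, assuming (as described above) that portquiz.net answers on every TCP port your network lets out; the worker count and timeout are illustrative values:

import socket
from concurrent.futures import ThreadPoolExecutor

HOST = "portquiz.net"
TIMEOUT = 1.0  # seconds per probe; illustrative

def probe(port):
    # Return the port if an outbound TCP connection succeeds, otherwise None.
    try:
        socket.create_connection((HOST, port), timeout=TIMEOUT).close()
        return port
    except OSError:
        return None

with ThreadPoolExecutor(max_workers=100) as pool:
    for port in pool.map(probe, range(1, 65536)):
        if port is not None:
            print(port)

With 100 workers and a one-second timeout the worst case is on the order of ten minutes rather than hours; adjust both to taste.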
