How to fix high latency and retransmission rate in Ubuntu 18.04

How to fix high latency and retransmission rate in Ubuntu 18.04 - tcp

I installed Ubuntu 18.04 on Hyper-V Win Server 2016.
And network performance of the Ubuntu is bad: I'm hosting few sites (Apache + PHP) and sometime response time is > 10 seconds. Sometimes it is fast.
As I troubleshooted, I see this netstat results:
# netstat -s | egrep -i 'loss|retran'
3447700 segments retransmitted
226 times recovered from packet loss due to fast retransmit
Detected reordering 6 times using reno fast retransmit
TCPLostRetransmit: 79831
45 timeouts after reno fast retransmit
6247 timeouts in loss state
2056435 fast retransmits
107095 retransmits in slow start
TCPLossProbes: 220607
TCPLossProbeRecovery: 3753
TCPSynRetrans: 90564
What can be cause of such high "segments retransmitted" number? And how to fix it?
Few notes:
- VMQ is disabled for Ubuntu VM
- The host system Network adapter is Intel I210
- I disabled IPv6 both on host and in VM
Here is WireShark showing, that it takes ~7 seconds to connect (just initial connection) to my site Propovednik.com:
Sep 20: So far, the issue seems to be caused by OVH / SoYouStart bad network:
This command shows 20-30% packets loss:
sudo ping us.soyoustart.com -c 10 -i 0.2 -p 00 -s 1200 -l 5

The problem could be anywhere along the network, including the workstation where you work from. I suggest you check the network as retransmissions and packetloss means that either something is malfunctioning or misconfigured. If this is on a wireless network, you could be out of range of your router.
I am pinging the website you noted from my computer and there is no packetloss.

Related

autossh tunnel getting killed after 10 minutes

I have an autossh tunnel set up over which I am sending something that needs an uninterrupted connection for a couple dozen minutes. However, I noticed that every 10 minutes the SSH tunnel managed by autossh is killed and recreated.
This is not due to an inactive connection, as there is active communication happening through that channel.
The command used to set up the tunnel was:
autossh -C -f -M 9910 -N -L 6969:127.0.0.1:12345 remoteuser#example.com

In my case the problem was a clash of the monitoring ports on the remote server. There are multiple servers all autossh-ing to the single central server and two of those "clients" used the same monitoring port (-M).
The default interval in which autossh tries to communicate over the monitoring channel is 600 seconds, 10 minutes. When autossh starts up, it does not verify that it could open the remote monitoring port. Everything will look fine until the time when autossh tries to check that the connection is open - and it fails. At that point the SSH tunnel will be forcibly killed and recreated.
A good way to check if this is your case as well is change the default timeout using the AUTOSSH_POLL environment variable:
AUTOSSH_POLL=10 autossh -C -f -M 9910 -N -L 6969:127.0.0.1:12345 remoteuser#example.com

rsync slow start (starting up delay, after change of router)

After I changed router, there is a delay in rsync that i could not trace even with high verbosity output settings for logs. I am running an rsync daemon on an android device (Wifi connected to router). I copy files from that device to another device (LAN connected to router). The delay can be seen even on "--list-only" option. Things are working very well for about 2 years already before I changed our router. The log is shown below:
rsync -vvvvvvvvvv --stats --progress --list-only --port xxxx xxx#xxx.xxx.xxx.xxx::share/
FILE_STRUCT_LEN=16, EXTRA_LEN=4
opening tcp connection to xxx.xxx.xxx.xxx port xxxx
Connected to xxx.xxx.xxx.xxx (xxx.xxx.xxx.xxx)
msg checking charset: UTF-8
There is a delay of atleast 8 seconds after that initial connection. This is the problem. After this delay, everything goes smoothly. Everything is fast.
sending daemon args: --server --sender -vvvvvvvvvvde.Lsfx --list-only . share/ (6 args)
(Client) Protocol versions: remote=30, negotiated=30
receiving file list ...
recv_file_name(.)
recv_file_name(xxxx)
....
....
I can confirm it only occurs in the new router because the delay disappears when I connect the old one. My new router is TL WR841N. I have tried disabling all security features in it (flooding protection, etc) except the usual WPA2-AES password for the access point. I could not trace what rsync is trying to do when the delay occurs.
Although the delay is just in the beginning, my current backup methods include starting rsync many times. Thus if I start rsync 10 times, there is already an overhead of 8*10 seconds for starting delay, which did not exist with my previous router.
I have tried researching already, but the problems of others I have found are not very relevant to the delay I am experiencing (most have problems on SSH, but I am running an rsync daemon, not rsync on SSH)
How can I trace the problem? Any idea for the delay?

Internet Provider with "Private WAN" to the clients?

This is strange. How this actually works. So far I know it is "impossible" to have a network like this.
I'm going to explain in details how my network works.
I have a LAN. 192.168.1.0/24 and router is on 192.168.1.1, This router has a public address.
I can share the IP address because I'm running there a server for tests nothing more. It is ok so far.
Now the magic happens.
When I trace the route to an IP I got this (To google DNS):
traceroute to 8.8.8.8 (8.8.8.8), 30 hops max, 60 byte packets
1 zonhub.home (192.168.1.1) 1.160 ms 1.676 ms 1.340 ms
2 * * *
3 10.137.211.97 (10.137.211.97) 12.915 ms 12.526 ms 12.145 ms
4 10.255.49.90 (10.255.49.90) 10.349 ms 10.255.49.102 (10.255.49.102) 11.483 ms 11.042 ms
5 80.157.128.249 (80.157.128.249) 34.577 ms 80.157.130.41 (80.157.130.41) 32.917 ms 80.157.130.33 (80.157.130.33) 30.602 ms
6 mad-sa3-i.MAD.ES.NET.DTAG.DE (217.5.95.161) 33.396 ms 80.157.128.22 (80.157.128.22) 27.107 ms mad-sa3-i.MAD.ES.NET.DTAG.DE (217.5.95.161) 29.510 ms
7 80.157.128.22 (80.157.128.22) 28.050 ms 72.14.235.20 (72.14.235.20) 32.767 ms 80.157.128.22 (80.157.128.22) 27.932 ms
8 72.14.235.20 (72.14.235.20) 29.780 ms 72.14.235.18 (72.14.235.18) 27.020 ms 26.706 ms
9 216.239.43.233 (216.239.43.233) 49.456 ms 209.85.240.191 (209.85.240.191) 44.034 ms 216.239.43.233 (216.239.43.233) 51.935 ms
10 72.14.236.191 (72.14.236.191) 53.374 ms 209.85.253.20 (209.85.253.20) 50.699 ms 216.239.43.233 (216.239.43.233) 44.918 ms
11 209.85.251.231 (209.85.251.231) 50.151 ms * 216.239.49.45 (216.239.49.45) 47.309 ms
12 google-public-dns-a.google.com (8.8.8.8) 51.536 ms 50.180 ms 45.505 ms
What is that 2nd, 3rd and 4th hop? How can it be on class A private address when 192.168.1.1 is running the NAT service where I have my LAN and my external 3 publics addresses (yes I have 3 and is 3 "class A" IP's on 88,89,93 network).
Another thing is how on 4th hop we have the 2nd octet 255?
Anyone feel free to traceroute my no-ip domain: synackfiles.no-ip.org
Just don't mess with my router (it blocks if you port scan or fail to log in in ssh or http auth so you get banned for this. If you just traceroute it is fine) :P
Now, second magic and weird stuff happens.
I'm going to run nmap. So i GOT THIS:
sudo nmap -sV -A -O 10.137.211.113 -vv -p 1-500 -Pn
Starting Nmap 6.00 ( http://nmap.org ) at 2013-11-14 15:24 WET
NSE: Loaded 93 scripts for scanning.
NSE: Script Pre-scanning.
NSE: Starting runlevel 1 (of 2) scan.
NSE: Starting runlevel 2 (of 2) scan.
Initiating Parallel DNS resolution of 1 host. at 15:24
Completed Parallel DNS resolution of 1 host. at 15:24, 0.04s elapsed
Initiating SYN Stealth Scan at 15:24
Scanning 10.137.211.113 [500 ports]
SYN Stealth Scan Timing: About 30.40% done; ETC: 15:26 (0:01:11 remaining)
SYN Stealth Scan Timing: About 60.30% done; ETC: 15:26 (0:00:40 remaining)
Completed SYN Stealth Scan at 15:26, 101.14s elapsed (500 total ports)
Initiating Service scan at 15:26
Initiating OS detection (try #1) against 10.137.211.113
Initiating Traceroute at 15:26
Completed Traceroute at 15:26, 9.05s elapsed
Initiating Parallel DNS resolution of 1 host. at 15:26
Completed Parallel DNS resolution of 1 host. at 15:26, 0.01s elapsed
NSE: Script scanning 10.137.211.113.
NSE: Starting runlevel 1 (of 2) scan.
Initiating NSE at 15:26
Completed NSE at 15:26, 0.00s elapsed
NSE: Starting runlevel 2 (of 2) scan.
Nmap scan report for 10.137.211.113
Host is up (0.0010s latency).
All 500 scanned ports on 10.137.211.113 are filtered
Device type: general purpose|specialized|media device
Running: Barrelfish, Microsoft Windows 2003|PocketPC/CE|XP, Novell NetWare 3.X, Siemens embedded, Telekom embedded
OS CPE: cpe:/o:barrelfish:barrelfish cpe:/o:microsoft:windows_server_2003::sp1 cpe:/o:microsoft:windows_server_2003::sp2 cpe:/o:microsoft:windows_ce cpe:/o:microsoft:windows_xp:::professional cpe:/o:novell:netware:3.12
Too many fingerprints match this host to give specific OS details
TCP/IP fingerprint:
SCAN(V=6.00%E=4%D=11/14%OT=%CT=%CU=%PV=Y%G=N%TM=5284EBAB%P=armv7l-unknown-linux-gnueabi)
T7(R=Y%DF=N%TG=80%W=0%S=Z%A=S+%F=AR%O=%RD=0%Q=R)
U1(R=N)
IE(R=N)
TRACEROUTE (using proto 1/icmp)
HOP RTT ADDRESS
1 2.32 ms zonhub.home (192.168.1.1)
2 ... 30
NSE: Script Post-scanning.
NSE: Starting runlevel 1 (of 2) scan.
NSE: Starting runlevel 2 (of 2) scan.
Read data files from: /usr/bin/../share/nmap
OS and Service detection performed. Please report any incorrect results at http://nmap.org/submit/ .
Nmap done: 1 IP address (1 host up) scanned in 122.65 seconds
Raw packets sent: 1109 (49.620KB) | Rcvd: 4 (200B)
Well, this is odd. I don't know how my country's WAN is designed and built.
I'm from Portugal and my ISP is "ZON TVCABO". You can search now. :P
This is very very very interesting..
Sincerely,
int3

I cannot tell you how your providers WAN is built - but in order to
save public IPs - one can design an ISPs internal network with private
IPs. The routers that are not needed to be available from public will
have private IPs only - the IPs assigned to you can be routed to your
uplink over routers that are ISP internally only.
the 2nd hop has no tracing allowed, but allows to forward them.
the 4th - 10.255.x.x is a private IP in the 10.0.0.0/8 A range. (you
can use numbers from 0-255)

Preventing TCP SYN retry in netcat (for port knocking)

I'm trying to write the linux client script for a simple port knocking setup. My server has iptables configured to require a certain sequence of TCP SYN's to certain ports for opening up access. I'm able to successfully knock using telnet or manually invoking netcat (Ctrl-C right after running the command), but failing to build an automated knock script.
My attempt at an automated port knocking script consists simply of "nc -w 1 x.x.x.x 1234" commands, which connect to x.x.x.x port 1234 and timeout after one second. The problem, however, seems to be the kernel(?) doing automated SYN retries. Most of the time more than one SYN is being send during the 1 second nc tries to connect. I've checked this with tcpdump.
So, does anyone know how to prevent the SYN retries and make netcat simply send only one SYN per connection/knock attempt? Other solutions which do the job are also welcome.

Yeah, I checked that you may use nc too!:
$ nc -z example.net 1000 2000 3000; ssh example.net
The magic comes from (-z: zero-I/O mode)...

You may use nmap for port knocking (SYN). Just exec:
for p in 1000 2000 3000; do
nmap -Pn --max-retries 0 -p $p example.net;
done

try this (as root):
echo 1 > /proc/sys/net/ipv4/tcp_syn_retries
or this:
int sc = 1;
setsockopt(sock, IPPROTO_TCP, TCP_SYNCNT, &sc, sizeof(sc));

You can't prevent the TCP/IP stack from doing what it is expressly designed to do.

How can I test an outbound connection to an IP address as well as a specific port?

OK, we all know how to use PING to test connectivity to an IP address. What I need to do is something similar but test if my outbound request to a given IP Address as well as a specif port (in the present case 1775) is successful. The test should be performed preferably from the command prompt.

Here is a small site I made allowing to test any outgoing port. The server listens on all TCP ports available.
http://portquiz.net
telnet portquiz.net XXXX

If there is a server running on the target IP/port, you could use Telnet. Any response other than "can't connect" would indicate that you were able to connect.

To automate the awesome service portquiz.net, I did write a bash script :
NB_CONNECTION=10
PORT_START=1
PORT_END=1000
for (( i=$PORT_START; i<=$PORT_END; i=i+NB_CONNECTION ))
do
iEnd=$((i + NB_CONNECTION))
for (( j=$i; j<$iEnd; j++ ))
do
#(curl --connect-timeout 1 "portquiz.net:$j" &> /dev/null && echo "> $j") &
(nc -w 1 -z portquiz.net "$j" &> /dev/null && echo "> $j") &
done
wait
done

If you're testing TCP/IP, a cheap way to test remote addr/port is to telnet to it and see if it connects. For protocols like HTTP (port 80), you can even type HTTP commands and get HTTP responses.
eg
Command IP Port
Telnet 192.168.1.1 80

The fastest / most efficient way I found to to this is with nmap and portquiz.net described here: http://thomasmullaly.com/2013/04/13/outgoing-port-tester/ This scans to top 1000 most used ports:
# nmap -Pn --top-ports 1000 portquiz.net
Starting Nmap 6.40 ( http://nmap.org ) at 2017-08-02 22:28 CDT
Nmap scan report for portquiz.net (178.33.250.62)
Host is up (0.072s latency).
rDNS record for 178.33.250.62: electron.positon.org
Not shown: 996 closed ports
PORT STATE SERVICE
53/tcp open domain
80/tcp open http
443/tcp open https
8080/tcp open http-proxy
Nmap done: 1 IP address (1 host up) scanned in 4.78 seconds
To scan them all (took 6 sec instead of 5):
# nmap -Pn -p1-65535 portquiz.net

The bash script example of #benjarobin for testing a sequence of ports did not work for me so I created this minimal not-really-one-line (command-line) example which writes the output of the open ports from a sequence of 1-65535 (all applicable communication ports) to a local file and suppresses all other output:
for p in $(seq 1 65535); do curl -s --connect-timeout 1 portquiz.net:$p >> ports.txt; done
Unfortunately, this takes 18.2 hours to run, because the minimum amount of connection timeout allowed integer seconds by my older version of curl is 1. If you have a curl version >=7.32.0 (type "curl -V"), you might try smaller decimal values, depending on how fast you can connect to the service. Or try a smaller port range to minimise the duration.
Furthermore, it will append to the output file ports.txt so if run multiple times, you might want to remove the file first.

Develop Reference

r css asp.net wordpress firebase qt symfony nginx http apache-flex