ZeroWindow errors and netstat statistics? - networking

I have been told that one of my servers intermittently throws ZeroWindow errors. I would like to monitor this in Prometheus.
If I run netstat -s some of the results are:
netstat -s
Ip:
...
IcmpMsg:
...
Tcp:
...
TcpExt:
TCPFromZeroWindowAdv: 96
TCPToZeroWindowAdv: 96
TCPWantZeroWindowAdv: 16
It is very difficult to find a definition for these counters. The closest that I have found is:
WantZeroWindowAdv: +1 each time window size of a sock is 0
ToZeroWindowAdv: +1 each time window size of a sock dropped to 0
FromZeroWindowAdv: +1 each time window size of a sock increased from 0
Reading this, I believe that WantZeroWindowAdv shows the ZeroWindow problems. (It is incremented each time a socket is asked for its window size and responds with 0.)
Not part of the question - then I would need to add this to nodes_netstat.go for prometheus.
Am I correct - is this approach valid? Netstat is not highly documented.

Your descriptions of "To" and "From" are correct.
"Want" is when TCP would have liked to have sent a zero window back to a sender, but couldn't because that would have implied a shrinking of the window rather than it being full.
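For monitoring, the counters have to be read programmatically. On Linux, the TcpExt values that netstat -s prints come from /proc/net/netstat, where each protocol has a line of counter names followed by a line of values. The following is only a minimal sketch of parsing that layout; the function name parse_proc_netstat is made up for illustration and this is not how any particular exporter does it.

```python
def parse_proc_netstat(text):
    """Parse the 'names line / values line' pairs used by /proc/net/netstat.

    Returns a dict such as {"TcpExt": {"TCPToZeroWindowAdv": 96, ...}}.
    parse_proc_netstat is a hypothetical helper, not part of any library.
    """
    counters = {}
    lines = [line for line in text.splitlines() if line.strip()]
    # Lines come in pairs: "Proto: name1 name2 ..." then "Proto: v1 v2 ..."
    for header, values in zip(lines[0::2], lines[1::2]):
        proto, names = header.split(":", 1)
        _, numbers = values.split(":", 1)
        counters[proto.strip()] = dict(
            zip(names.split(), (int(n) for n in numbers.split()))
        )
    return counters
```

On a Linux host the text would come from open("/proc/net/netstat").read(), and counters["TcpExt"]["TCPToZeroWindowAdv"] would then be the number of times a socket's advertised window dropped to zero.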


How to know the network traffic my test (using JMeter) is going to generate?

I am going to run a load test using JMeter on Amazon AWS, and before starting the test I need to know how much network traffic it is going to generate.
The criteria that Amazon has in their policy is:
sustains, in aggregate, for more than 1 minute, over 1 Gbps (1 billion bits per second) or 1 Gpps (1 billion packets per second). If my test is going to exceed these limits, we need to submit a form before starting the test.
So how can I know whether the test is going to exceed these numbers or not?
Run your test with 1 virtual user and 1 iteration in command-line non-GUI mode like:
jmeter -n -t test.jmx -l result.csv
To get an approximate figure, open the result.csv file using the Aggregate Report listener, where you will find 2 columns: Received KB/sec and Sent KB/sec. Multiply them by the duration of your test in seconds and you will get the number you're looking for.
Alternatively, you can open the result.csv file using MS Excel, LibreOffice Calc or an equivalent, where you can sum the bytes and sentBytes columns and get the traffic with 1 byte precision.
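The same sum can be scripted instead of done in a spreadsheet. This is a rough sketch that assumes result.csv was written with a header row containing the bytes and sentBytes columns; the function names total_traffic and average_bits_per_second are invented for this example.

```python
import csv


def total_traffic(csv_file):
    """Sum the 'bytes' and 'sentBytes' columns of a JMeter CSV result file.

    csv_file is an open text file (or any iterable of lines) whose first
    line is the header row. Returns (received_bytes, sent_bytes).
    """
    received = 0
    sent = 0
    for row in csv.DictReader(csv_file):
        received += int(row["bytes"])
        sent += int(row["sentBytes"])
    return received, sent


def average_bits_per_second(total_bytes, duration_seconds):
    """Convert a byte total over a test duration into an average bit rate."""
    return total_bytes * 8 / duration_seconds
```

With the file opened as open("result.csv", newline=""), the two totals from the 1-user run can be scaled by the planned number of virtual users and compared with the 1 Gbps threshold. The scaling is only an estimate, since real throughput also depends on think times and server response times.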

SIM5320E - POST request with large data is slow

I have built a prototype using a Raspberry Pi and a SIM5320E module. The goal is to send a large amount of data (~100 KB) through HTTP using this 3G module.
I have followed the instructions specified in section 16.5 (HTTPS) of the AT Command set for the SIM5320:
https://cdn-shop.adafruit.com/datasheets/SIMCOM_SIM5320_ATC_EN_V2.02.pdf
And it worked fine, except that it is slow.
From what I understand from the documentation (and seen from my tests), the data to be sent must be divided in chunks of max 4096 bytes.
Every chunk must be sent to what is called the "sending buffer" using the command AT+CHTTPSSEND.
Every now and then, we must check that the sending buffer does not have too much data in cache using the AT+CHTTPSSEND? command.
The last AT+CHTTPSSEND command commits all sending data.
My problem is that every AT+CHTTPSSEND takes around 10 seconds to complete, which means that my HTTP request will take around 250 seconds to complete.
Does anybody know what might cause this slowness?
Here is some code to illustrate the issue:
def send_chunk(self, chunk):
    # Send chunk
    self._send('CHTTPSSEND={}'.format(len(chunk)), wait_for=">")
    self._send_raw(chunk.encode())
    # Check how much data is left in the sending buffer
    # Wait for this data to be under 3Kb
    data_left = 3001
    while data_left > 3000:
        response = self._send('CHTTPSSEND?', wait_for="+CHTTPSSEND:")
        data_left = int(response.strip().split(" ")[1])
        time.sleep(2)
And here are the logs I get:
>> AT+CHTTPSSEND=4096 -> This commands takes ~10 seconds
<< >
>> Sending chunk of data
<< OK
>> AT+CHTTPSSEND?
<< +CHTTPSSEND: 0
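The 250 second figure follows from the chunk size: roughly 100 KB split into 4096-byte chunks is 25 chunks, each costing about 10 seconds. A small sketch of that arithmetic, with helper names (split_into_chunks, estimated_upload_seconds) that are purely illustrative and not part of the module's API:

```python
import math


def split_into_chunks(payload, chunk_size=4096):
    """Split a bytes payload into pieces of at most chunk_size bytes."""
    return [payload[i:i + chunk_size] for i in range(0, len(payload), chunk_size)]


def estimated_upload_seconds(payload_size, seconds_per_chunk, chunk_size=4096):
    """Estimate the total time if every chunk costs a fixed number of seconds."""
    return math.ceil(payload_size / chunk_size) * seconds_per_chunk
```

For a 102400-byte payload and 10 seconds per AT+CHTTPSSEND, this gives 25 chunks and 250 seconds, so the per-command latency dominates the total time rather than the amount of data.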

Write and read from a serial port

I am using the following python script to write AT+CSQ on serial port ttyUSB1.
But I cannot read anything.
However, when I fire AT+CSQ on minicom, I get the required results.
What may be the issue with this script?
Logs:
Manual Script
root#imx6slzbha:~# python se.py
Serial is open
Serial is open in try block also
write data: AT+CSQ
read data:
read data:
read data:
read data:
Logs:
Minicom console
1. ate
OK
2. at+csq
+CSQ: 20,99
3. at+csq=?
OKSQ: (0-31,99),(99)
How can I receive these results in the following python script?
import serial, time
#initialization and open the port
#possible timeout values:
# 1. None: wait forever, block call
# 2. 0: non-blocking mode, return immediately
# 3. x, x is bigger than 0, float allowed, timeout block call
ser = serial.Serial()
ser.port = "/dev/ttyUSB1"
ser.baudrate = 115200
ser.bytesize = serial.EIGHTBITS #number of bits per bytes
ser.parity = serial.PARITY_NONE #set parity check: no parity
ser.stopbits = serial.STOPBITS_ONE #number of stop bits
ser.timeout = None #block read
#ser.timeout = 0 #non-block read
ser.timeout = 3 #timeout block read
ser.xonxoff = False #disable software flow control
ser.rtscts = False #disable hardware (RTS/CTS) flow control
ser.dsrdtr = False #disable hardware (DSR/DTR) flow control
ser.writeTimeout = 2 #timeout for write
try:
    ser.open()
    print("Serial is open")
except Exception, e:
    print "error open serial port: " + str(e)
    exit()
if ser.isOpen():
    try:
        print("Serial is open in try block also")
        ser.flushInput() #flush input buffer, discarding all its contents
        ser.flushOutput()#flush output buffer, aborting current output
        #and discard all that is in buffer
        #write data
        ser.write("AT+CSQ")
        time.sleep(1)
        # ser.write("AT+CSQ=?x0D")
        print("write data: AT+CSQ")
        # print("write data: AT+CSQ=?x0D")
        time.sleep(2) #give the serial port sometime to receive the data
        numOfLines = 1
        while True:
            response = ser.readline()
            print("read data: " + response)
            numOfLines = numOfLines + 1
            if (numOfLines >= 5):
                break
        ser.close()
    except Exception, e1:
        print "error communicating...: " + str(e1)
else:
    print "cannot open serial port "
You have two very fundamental flaws in your AT command handling:
time.sleep(1)
and
if (numOfLines >= 5):
How bad are they? Nothing will ever work until you fix those, and by that I mean completely change the way you send and receive command and responses.
Sending AT commands to a modem is a communication protocol like any other protocol, where certain parts and behaviours are required and not optional. Just like you would not write an HTTP client that completely ignores the responses it gets back from the HTTP server, you must never write a program that sends AT commands to a modem and completely ignores the responses the modem sends back.
AT commands are a link layer protocol with a window size of 1 (one). Therefore, after sending a command line, the sender MUST wait until it has received a response from the modem saying that it has finished processing the command line. That kind of response is called a Final result code.
If the modem uses 70ms before it responds with a final result code you have to wait at least 70ms before continuing, if it uses 4 seconds you have to wait at least 4 seconds before continuing, if it uses several minutes (and yes, there exists AT commands that can take minutes to complete) you have to wait for several minutes. If the modem has not responded in an hour, your only options are 1) continue waiting, 2) just give up or 3) disconnect, reconnect and start all over again.
This is why sleep is such a horrible approach: in the very best case it is a time-wasting ticking bomb. It is as useful as kicking dogs that stand in your way in order to get them to move. Yes, it might actually work sometimes, but at some point you will be sorry for taking that approach...
And regarding numOfLines there is no way anyone in advance can know exactly how many lines a modem will respond with. What if your modem just responds with a single line with the ERROR final result code? The code will deadlock.
So this line number counting has to go completely away, and instead your code should be sending a command line and then wait for the final result code by reading and parsing the response lines from the modem.
But before diving too deep into that answer, start by reading the V.250 specification, at least all of chapter 5. This is the standard that defines the basics of AT command, and will for instance teach you the difference between a command and a command line. And how to correctly terminate a command line which you are not doing, so the modem will never start processing the commands you send.
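The approach described above (terminate the command line with a carriage return, then read lines until a final result code arrives) could look roughly like the sketch below. It is not a complete V.250 implementation: the list of final result codes is a common subset and your modem may use others, and the names send_command_line and is_final_result_code are made up for this example. The port argument is assumed to behave like an open pyserial Serial object with a read timeout set, whose readline() returns empty bytes on timeout.

```python
# Common final result codes; an assumption, check your modem's manual.
FINAL_RESULT_CODES = ("OK", "ERROR", "NO CARRIER", "NO DIALTONE", "BUSY", "NO ANSWER")
FINAL_RESULT_PREFIXES = ("+CME ERROR:", "+CMS ERROR:")


def is_final_result_code(line):
    """Return True if the line ends the modem's response to a command line."""
    return line in FINAL_RESULT_CODES or line.startswith(FINAL_RESULT_PREFIXES)


def send_command_line(port, command_line, max_timeouts=10):
    """Send one AT command line and collect lines up to the final result code.

    Gives up after max_timeouts read timeouts instead of sleeping for a
    fixed time or counting a fixed number of lines.
    """
    # A command line must be terminated with carriage return.
    port.write((command_line + "\r").encode("ascii"))
    lines = []
    timeouts = 0
    while timeouts < max_timeouts:
        raw = port.readline()
        if not raw:  # read timed out, nothing received yet
            timeouts += 1
            continue
        line = raw.decode("ascii", errors="replace").strip()
        if not line or line == command_line:
            continue  # skip blank lines and the echoed command
        lines.append(line)
        if is_final_result_code(line):
            return lines
    raise TimeoutError("no final result code received for " + command_line)
```

Called as send_command_line(ser, "AT+CSQ"), this would return something like the information text line followed by OK, however long the modem takes to produce it (bounded only by the read timeout multiplied by max_timeouts).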

Why can't I use hPutStr after printing the result of hGetContents?

I'm new to stackoverflow so forgive me if I do something wrong. I'm trying to understand how a simple server would work in Haskell. I think I'm missing something very simple or fundamental about how hGetContents works.
import Network
import System.IO
main = withSocketsDo $ do
    socket <- listenOn $ PortNumber 5002
    (h, _, _) <- accept socket
    c <- hGetContents h
    -- putStrLn c -- doesn't work
    -- putStrLn $ head $ lines c -- works!
    -- putStrLn $ unlines $ take 2 $ lines c -- works!
    -- putStrLn $ unlines $ take 3 $ lines c -- works!
    -- putStrLn $ unlines $ take 6 $ lines c -- works!
    putStrLn $ unlines $ take 10 $ lines c -- doesn't work
    hPutStr h $ "HTTP/1.0 200 OK\r\nContent-Length: 5\r\n\r\nHello!\r\n"
    hClose h
After running the program, I navigate via web browser to http://localhost:5002. The problem seems to be that, depending on how much I've parsed the handle contents, I eventually am unable to send a response. I'd like to be able to parse the request before I send a response. I've commented in the code the cases that work and the cases that don't. Hoogle says that for hGetContents (lazy) the handle is "semi-closed" as it is being read. Am I misunderstanding the laziness or should I consider the handle closed once I begin parsing its contents?
The error I get is "hPutChar: resource vanished (Broken pipe)." Thanks for any help.
I tried to reproduce your problem. For that I ran your code and sent it a request using nc:
printf "1\n2\n3\n4\n5\n6\n7\n8\n9\n10\n11" | nc localhost 5002
As expected the server (code from your question) printed out first 10 lines and exited without any error. The client (nc) printed:
HTTP/1.0 200 OK
Content-Length: 5
Hello!
and also exited without an error.
So at first I couldn't see what your problem was, but then I tried sending a smaller request:
printf "1\n2\n3\n4\n5\n6\n" | nc localhost 5002
The server printed first 6 lines and didn't exit. The client also didn't exit, so I interrupted it with Ctrl-C and after that the server exited with "resource vanished" error.
It took some thinking, but it started making sense to me. I don't understand lazy IO very well, so if my explanation isn't clear or correct, it would be helpful if someone with a better understanding improved it.
Let's follow your code. First:
(h, _, _) <- accept socket
c <- hGetContents h
You open a handle and read its content. Note that the content you get back is lazy. When we say that something is lazy we mean that it can be passed around without being evaluated (in Haskell this is 'call by need', as opposed to 'call by value').
Now:
putStrLn $ unlines $ take 10 $ lines c
Here you pass your lazy, unevaluated content to another function, take 10. take 10 will try to evaluate the first 10 elements of a list and return them; if there are fewer than 10 elements in the list it simply returns all of them. After take 10 we have putStrLn and unlines, which are both perfectly compatible with laziness.
Now let's say that the client sends an input that is only 6 lines long and then starts waiting for the response. Our server lazily receives the content and tries to print the first 10 lines. First, take 10 happily consumes the first 6 lines and passes them over to putStrLn . unlines. What happens then? take 10 can't just finish its output, because there is absolutely no indication that this is the end. The handle is still open and bytes can still be flowing from client to server, so it just waits for more input.
This behaviour can be observed by running:
nc localhost 5002
and manually typing 10 lines there. The input will appear on the server line by line as you type. After you type the 10th line, the server will respond with the "Hello" message.
P.S.: I guess that the behaviour you described happens because your web browser sends 6 to 9 lines with the request.
To test, debug and analyze this kind of low level servers you should use simple tools like nc and curl instead of your web browser :)
When you initiate a lazy read on a handle, you give up the right to do anything much else with the handle until the contents string is fully forced, or you close the handle manually (at which point attempting to force any more of the contents string will lead to bad behavior or an error).
TL;DR
This is not a situation where lazy I/O is appropriate. The situations where a lazy read on a socket is appropriate can probably be counted on zero fingers. You can use regular strict I/O if you like, or conduit, or pipes, or some Haskell web framework like Yesod or Scotty or various other competitors.
Calling hGetContents puts the handle into a "semi-closed" state. You should not perform any operations on the handle after that point. You should only use the string returned from hGetContents.
Put simply, don't use lazy I/O here. You need to manually read and write individual strings one at a time, since the timing matters.
In general, lazy I/O is kind of neat, but it doesn't work well for anything much beyond toy examples.
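The strict alternative recommended above means: read exactly as much of the request as you need (for HTTP, up to the blank line that ends the headers), then write the response. The sketch below shows only the shape of that exchange, in Python rather than Haskell; read_request_head and respond_hello are illustrative names, and conn is assumed to be a connected socket.

```python
import socket


def read_request_head(conn):
    """Read from the socket until the blank line that ends the HTTP headers.

    Returns the request line and header lines as a list of strings.
    Stops early if the peer closes the connection.
    """
    buf = b""
    while b"\r\n\r\n" not in buf:
        chunk = conn.recv(4096)
        if not chunk:  # peer closed the connection
            break
        buf += chunk
    head = buf.split(b"\r\n\r\n", 1)[0]
    return head.decode("latin-1").split("\r\n")


def respond_hello(conn):
    """Send a fixed response whose Content-Length matches the body."""
    body = b"Hello!\r\n"
    head = "HTTP/1.0 200 OK\r\nContent-Length: {}\r\n\r\n".format(len(body))
    conn.sendall(head.encode("ascii") + body)
```

Because the read stops at a marker the client is guaranteed to send, the server never waits for a 10th line that may not come, which is exactly what blocks the lazy version.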

Munin: What does the 'm' mean in the y-axis of nginx requests?

I have the following munin-generated graph of nginx requests:
What does the 'm' in the y-axis mean?
The nginx munin plugin at /usr/share/munin/plugins/nginx_request is extracting:
if ($response->content =~ /^\s+(\d+)\s+(\d+)\s+(\d+)/m) {
print "request.value $3\n";
Which means it is taking the third component of nginx_status, which appears to be the total accumulated request count. Here is an example execution from this same server:
$ curl http://127.0.0.1/nginx_status
Active connections: 1
server accepts handled requests
2936 2936 4205
Reading: 0 Writing: 1 Waiting: 0
The munin nginx plugin is passing the following to rrdtool:
print "graph_title Nginx requests\n";
print "graph_args --base 1000\n";
print "graph_category nginx\n";
print "graph_vlabel Request per second\n";
print "request.label req/sec\n";
print "request.type DERIVE\n";
print "request.min 0\n";
print "request.label requests port $port\n";
print "request.draw LINE2\n";
The 'm' is the 'milli' prefix for the units. So, 400 m means 0.400.
By default, RRDTool uses the SI prefixes: 2000 is shown as 2k, 0.01 is shown as 10m and so on. This isn't normally an issue except when there are no units or the thing being measured doesn't make sense in fractional parts.
The way to stop this behaviour is to not use the %s in the GPRINT (this fixes the legend), and to use the --units-exponent=0 option (this fixes the Y-axis). I don't know that it is possible to make munin do this, though. You might be able to modify the plugin to add '--units-exponent 0' to the graph_args though.
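To make the scaling concrete, here is a small sketch that mimics SI-prefix formatting. It only illustrates the idea and is not RRDTool's actual code; the function name si_format and the limited prefix table are choices made for this example.

```python
def si_format(value):
    """Format a number with an SI prefix, e.g. 0.4 becomes '400 m'."""
    prefixes = [(1e9, "G"), (1e6, "M"), (1e3, "k"), (1, ""), (1e-3, "m"), (1e-6, "u")]
    if value == 0:
        return "0"
    for factor, prefix in prefixes:
        if abs(value) >= factor:
            break
    # If the value is smaller than every factor, the loop ends on the
    # smallest prefix and that one is used.
    return "{:g} {}".format(value / factor, prefix).strip()
```

With this, a rate of 0.4 requests per second is rendered as "400 m" and 2000 as "2 k", which is what the graph's y-axis is doing for a server that receives less than one request per second.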
