Why is TCP not receiving an ACK from the server on a Solaris machine? - networking

On the client side, an SFTP application sends packets to the SSH server on port 22.
The SFTP application hands its data to TCP; from an Ethereal capture we can see that
the SFTP packet goes from the application to TCP, and TCP sends the packet to the server, but TCP never receives a TCP ACK from the server, so TCP resends the packet after a few seconds. There is still no response from the server.
It seems the server never receives the packet from the client.
The client SFTP application waits in select() on the TCP receive with a timeout of 120 seconds;
after 120 seconds the application gets a timeout from select() and closes the SFTP operation
with a timeout error.
In the capture I can see TCP retransmitting the packet many times, but it never receives the server's TCP ACK.
Scenario:
1. The timeout happens only sometimes.
2. After this issue, the next SFTP operation [upload] succeeds with the same server.
3. It seems the network has no issue, because the next upload works fine.
4. Both client and server run the Solaris OS.
5. We are unable to reproduce this in our lab environment.
6. The issue happens only in the real customer network.
7. The application is written in C. The SSH server is OpenSSH.
I want to know:
1. How can we find the reason TCP is not receiving an ACK reply from the server?
2. Could any TCP system setting in Solaris cause this issue?
3. Please provide any information that could help us resolve this issue.

I assume your topology looks like this:
10.25.190.12                                 10.10.10.10
 [e1000g0]                                      [eth0]
SFTP_Client--------------Network------------OpenSSH_Server
There are two things you need to do:
1. Establish whether there is regular significant packet loss between your client and server. TCP tolerates some packet loss, but if you start dropping a lot (which is honestly hard to quantify) it's going to just give up in some circumstances. I would suggest two ways of detecting packet loss... the first is mtr, the second is ping. mtr is far preferable, since you get loss statistics per hop (see below). Run mtr 10.10.10.10 from the client and mtr 10.25.190.12 from the server. Occasionally, packet loss is path-dependent, so it's useful to do it from both sides when you really want to nail down the source of it. If you see packet loss, work with your network administrators to fix it first; you're wasting your time otherwise. In the process of fixing the packet loss, it's possible you will fix this TCP ACK problem as well.
2. If there is no significant packet loss, you need to sniff both sides of the connection simultaneously with snoop or tshark (you can get tshark from SunFreeware) until you see the problem again. When you catch the missing TCP ACKs, figure out: A) whether the OpenSSH_Server sent the ACK, and B) whether the SFTP_Client received it. If the client gets the ACK on its ethernet interface, then you probably need to start looking in your software for clues. You should restrict your sniffs to the IP addresses of the client and server (a snoop sketch follows the mtr output below). In my experience this kind of issue is possible, but it is not a common problem; 90+% of the time it's just network packet loss.
Sample output from mtr:
mpenning@mpenning-T61:~$ mtr -n 4.2.2.4
HOST: mpenning-T61        Loss%   Snt   Last   Avg  Best  Wrst StDev
  1. 10.239.84.1           0.0%   407    8.8   9.1   7.7  11.0   1.0
  2. 66.68.3.223           0.0%   407   11.5   9.2   7.1  11.5   1.3
  3. 66.68.0.8             0.0%   407   19.9  16.7  11.2  21.4   3.5
  4. 72.179.205.58         0.0%   407   18.5  23.7  18.5  28.9   4.0
  5. 66.109.6.108          5.2%   407   16.6  17.3  15.5  20.7   1.5  <----
  6. 66.109.6.181          4.8%   407   18.2  19.1  16.8  23.6   2.3
  7. 4.59.32.21            6.3%   407   20.5  26.1  19.5  68.2  14.9
  8. 4.69.145.195          6.4%   406   21.4  27.6  19.8  79.1  18.1
  9. 4.2.2.4               6.8%   406   22.3  23.3  19.4  32.1   3.7
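
For the sniffing in step 2, a minimal snoop sketch (the interface names e1000g0/eth0 and the IP addresses are taken from the assumed topology above; a Solaris server may well call its NIC e1000g0 too, so adjust to your hosts):

# on SFTP_Client: capture only client<->server SSH traffic to a file
snoop -d e1000g0 -o /tmp/client.cap host 10.10.10.10 and port 22

# on OpenSSH_Server: the same thing from the other side
snoop -d eth0 -o /tmp/server.cap host 10.25.190.12 and port 22

# after reproducing the 120-second timeout, read the captures back
# (or open the files in Wireshark/tshark) and look for the un-ACKed segment
snoop -i /tmp/client.cap -V | less
snoop -i /tmp/server.cap -V | less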

Related

Packet loss issue

My internet download speed is 8 Mbps and upload speed is 17.6 Mbps; ping is 2 ms and jitter is also 2 ms. The problem is that when I run a packet loss check with this command: ping -n 100 8.8.8.8, it reports 100% packet loss, but it should be at or under 1%. What should I do to get it under 1%?
Probably ICMP (which ping uses by default, ICMP ECHO_REQUEST (type 8) to be specific) is blocked by a firewall, which is why you see 100% packet loss.
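One way to narrow that down (the gateway address below is a placeholder; your real one is shown by ipconfig): compare a ping to your own default gateway with the ping to 8.8.8.8, and run a traceroute to see where replies stop coming back.

rem 192.168.1.1 is a placeholder for your default gateway (see "ipconfig")
ping -n 100 192.168.1.1
ping -n 100 8.8.8.8
rem -d skips name resolution; the output shows the last hop that still answers
tracert -d 8.8.8.8

If the gateway answers but 8.8.8.8 never does, something along the path is dropping or filtering ICMP rather than you actually losing packets.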

iperf TCP much faster than UDP, why? [closed]

I am wondering why iperf shows much better performance with TCP than with UDP. This question is very similar to this one.
UDP should be much faster than TCP because there is no acknowledgement or congestion control. I am looking for an explanation.
UDP (807 MBits/sec)
$ iperf -u -c 127.0.0.1 -b10G
------------------------------------------------------------
Client connecting to 127.0.0.1, UDP port 5001
Sending 1470 byte datagrams
UDP buffer size: 208 KByte (default)
------------------------------------------------------------
[  3] local 127.0.0.1 port 52064 connected with 127.0.0.1 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec   962 MBytes   807 Mbits/sec
[  3] Sent 686377 datagrams
[  3] Server Report:
[  3]  0.0-10.0 sec   960 MBytes   805 Mbits/sec   0.004 ms 1662/686376 (0.24%)
[  3]  0.0-10.0 sec  1 datagrams received out-of-order
TCP (26.7 Gbits/sec)
$ iperf -c 127.0.0.1
------------------------------------------------------------
Client connecting to 127.0.0.1, TCP port 5001
TCP window size: 2.50 MByte (default)
------------------------------------------------------------
[  3] local 127.0.0.1 port 60712 connected with 127.0.0.1 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec  31.1 GBytes  26.7 Gbits/sec
The default length of UDP datagrams is 1470 bytes. You probably need to increase the length with the -l parameter. For 26 Gb/s I'd try something like 50000 for your -l parameter and go up or down from there.
You also probably need to add a space in your '-b10G' so that iperf knows 10G is the value to use for the -b parameter. Also, I believe the capital G means GigaBYTES; your maximum achievable bandwidth in the TCP test is 26 GigaBITS, which isn't anywhere close to 10 GB. I would make your -b parameter value 26g, with a lower-case g.
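Putting both suggestions together, a retest might look like this (26g and 50000 are just the starting guesses from above, not measured values):

$ iperf -u -c 127.0.0.1 -b 26g -l 50000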
I suspect you're using the old iperf version 2.0.5, which has known performance problems with UDP. I'd suggest upgrading to version 2.0.10.
iperf -v will give the version
Note 1: The primary issue in 2.0.5 associated with this problem is due to mutex contention between the client thread and the reporter thread. The shared memory between these two threads was increased to address the issue.
Note 3: There are other performance related enhancements in 2.0.10.
Bob
UDP should be much faster than TCP because there is no acknowledgement or congestion control.
That mostly depends on what you are trying to do. If you need to transfer files between two end-points on the Internet, unless you manually implement a reliable transmission mechanism on top of UDP at the application level, you will want to use TCP.
In my opinion, it does not make much sense to do a pure UDP bandwidth test with iPerf, as essentially it just results in iPerf trying to put packets on the wire as fast as possible. I would suggest using it for generating UDP flows with a constant data rate, in order to roughly measure what would happen to UDP traffic, such as VoIP, in your network.
TCP is helped by various hardware offloads such as TSO/GRO, whereas UDP is not helped by any of those offloads, as they don't apply to UDP datagrams.
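If you want to check which of those offloads are active on a Linux box, ethtool can list them (eth0 is an assumed interface name; use lo if you are testing over loopback as above):

$ ethtool -k eth0 | egrep 'tcp-segmentation-offload|generic-segmentation-offload|generic-receive-offload'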

xmpp: size of stanzas versus bandwidth

I am trying to estimate the bandwidth usage of an XMPP application.
The application receives a 150-byte ping each second, and answers with a ping of the same size. (*)
However, when I measure the data usage, I get something like 900 bytes per ping (and not the 300 expected).
I suspect this might relate to something in the layers below (TCP? IP?) and datagram sizes. But, so far, reading the TCP/IP guide has not led me anywhere.
Another hypothesis would be that this overhead comes from XMPP itself, somehow.
Can anyone enlighten me?
(*) To get this "150 bytes" figure, I counted the number of characters in the <iq> element (the XML representation of the ping).
I am using TLS, but not BOSH (actually, BOSH is used on the other connection: I am measuring the results on the Android client, and the pings are coming from a web application, but I think that should not matter).
The client is Xabber, running on Android.
Let's try to calculate the worst-case overhead down to the IP level.
For TLS we have:
With TLS 1.1 and up, in CBC mode: an IV of 16 bytes.
Again, in CBC mode: TLS padding. TLS uses blocks of 16 bytes, so it may need to add 15 bytes of padding.
(Technically TLS allows for up to 255 bytes of padding, but in practice I think that's rare.)
With SHA-384: A MAC of 48 bytes.
TLS header of 5 bytes.
That's 84 extra bytes.
TCP and IP headers are 40 bytes (from this answer) if no extra options are used, but for IPv6 this would be 60 bytes.
So you could be seeing 84 + 60 + 150 = 294 bytes per ping.
However, on the TCP level we also need ACKs. If you are pinging a different client (especially over BOSH), then the pong will likely be too late to piggyback the TCP ACK for the ping. So the server must send a 60 byte ACK for the ping and the client also needs to send a 60 byte ACK for the pong.
That brings us to:
294 + 60 + 294 + 60 = 708
900 still sounds a lot too large. Are you sure the ping and the pong are both 150 bytes?
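If you want to double-check the 900-byte figure, it is more reliable to capture the stream and let the capture tool do the counting than to count XML characters. A rough sketch, run on any box you can capture on (5222 is the standard XMPP client port and xmpp.example.org is a placeholder for your server):

tcpdump -i any -nn -w xmpp-ping.pcap host xmpp.example.org and tcp port 5222

Then open xmpp-ping.pcap in Wireshark and use Statistics -> Conversations to see the total bytes in each direction, which includes all of the TCP/IP and TLS overhead discussed above.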

how to delay sip 183 in asterisk

My calls do not receive ringback tones but go to the IVR system in less than 4 seconds. When I reviewed the SIP captures on my end, I noticed that the SIP 180 Ringing message is followed instantly by a SIP 183 Session Progress with SDP. The SIP 183 with SDP indicates that my Asterisk server is ready to send audio, and since there is no ringing within the audio stream, no ringback is heard. So, please tell me how to put a delay on the SIP 183.
I am using Asterisk 1.4 on CentOS 5.
You can't put any delay in progress messages.
Actually you can, but that would require rewriting chan_sip.c (possible, but costly).
You can remove the 183 completely. See this article:
http://www.voip-info.org/wiki/view/Asterisk+sip+progressinband
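For reference, that article is about the progressinband option in sip.conf. A hypothetical snippet only; the exact semantics and the default differ between Asterisk versions, so verify against the article and your 1.4 install:

; sip.conf ([general] or per peer) -- illustrative, not tested on 1.4
progressinband=never    ; do not indicate call progress in-band; see the voip-info article for the yes/no/never semantics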

Windows 7 or Vista TCP behavior changes

Resolution, of sorts
The client computer that was showing this problem had Trend Micro Security installed. This security suite placed a service or driver on top of each network adapter in the system. I did not bother to debug further once this legacy app started working again.
Update 1
I disabled TCP window scale auto-tuning on Win7.
On Windows 7, if I unplug the Ethernet cable directly connected to the server, the disconnection happens after about 5 seconds, but the client process crashes. netstat on the server reports two TCP connections to the client that are no longer valid, because the client process did not gracefully shut down and close the connections.
With the server in this strange state after the physical disconnect, if I restart the client process it hangs while connecting to the server (just as described in the original).
If I perform a physical disconnection on the XP side, the disconnect happens more quickly than on Win7; some sort of keep-alive value or behavior is different on XP. While ssh'd in (via PuTTY), the ssh connection dies more quickly on XP than on Win7 as well.
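To see how long the Linux server will hold on to those dead connections, you can look at its keepalive settings and per-connection timers; a sketch assuming a stock 2.6 kernel (keepalive only applies to sockets that set SO_KEEPALIVE, the default first-probe delay is 7200 seconds, and a busybox netstat may not support -o):

# keepalive knobs on the server
sysctl net.ipv4.tcp_keepalive_time net.ipv4.tcp_keepalive_intvl net.ipv4.tcp_keepalive_probes

# connection states and timers for the legacy app's listening port (5000 is a placeholder)
netstat -tno | grep :5000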
Original
I have a legacy TCP client/server app that appears to foul up the server only when the client is a Windows 7 machine.
The server is OpenEmbedded Linux running 2.6.11.
A Windows 7 client connects for a bit, and eventually gets to a state where the client disconnects after a second or two.
Once the server is in this state, if I immediately connect a Windows XP client, the XP client cannot connect either.
I cannot seem to get the server into the buggy state by connecting with an XP client alone.
I'd like to know what changes were made to the TCP/IP stack starting with Vista or Windows 7 so I can better debug the legacy code.
I'd also like to know what commands I can run on the Linux server that might better help me understand why the connections are failing.
Perhaps the best thing you can do is to fire up tcpdump or wireshark under linux and analyze the TCP SYN sent by both Windows XP and Windows 7. Wireshark allows you to break out bit-by-bit what TCP options are sent... for example, this is what you see from a debian lenny box making a TCP connection:
Transmission Control Protocol, Src Port: 58456 (58456), Dst Port: 23 (23), Seq: 0, Len: 0
    Source port: 58456 (58456)
    Destination port: 23 (23)
    Sequence number: 0    (relative sequence number)
    Header length: 40 bytes
    Flags: 0x02 (SYN)
        0... .... = Congestion Window Reduced (CWR): Not set
        .0.. .... = ECN-Echo: Not set
        ..0. .... = Urgent: Not set
        ...0 .... = Acknowledgment: Not set
        .... 0... = Push: Not set
        .... .0.. = Reset: Not set
        .... ..1. = Syn: Set
        .... ...0 = Fin: Not set
    Window size: 5840
    Checksum: 0x8b77 [correct]
        [Good Checksum: True]
        [Bad Checksum: False]
    Options: (20 bytes)
        Maximum segment size: 1460 bytes
        SACK permitted
        Timestamps: TSval 136991740, TSecr 0
        NOP
        Window scale: 6 (multiply by 64)
My suspicion is that you'll see differences in RFC 1323 Window Scaling, but I don't have an XP machine handy to validate this.
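To capture just the SYNs on the Linux server for that comparison, something like this works (eth0 is an assumed interface name; -vv makes tcpdump print the TCP options such as MSS, sackOK, timestamps and wscale):

tcpdump -i eth0 -nn -vv 'tcp[tcpflags] & (tcp-syn|tcp-ack) == tcp-syn'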
I gave a detailed response on how to analyze TCP connections using tcptrace under Linux in this answer...
How can I measure the performance and TCP RTT of my server code?
I also suspect a Window Scaling issue. I cannot find a link just at the moment, but there were complaints when Vista first came out that something was screwing with some routers (Belkins, if I recall). They traced it down to a problem with one of the window sizes that Vista (and thereby Windows 7) changes by default. The routers would get hung up and need to be reset every few minutes.
You can issue some commands to turn off window scaling, see if your problem goes away.
From Here:
netsh interface tcp set global autotuninglevel=disabled
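To confirm the change took, and to put it back afterwards, the matching commands are (run from an elevated prompt):

netsh interface tcp show global
netsh interface tcp set global autotuninglevel=normal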
Edit:
Try disabling IPv6 on Windows 7; link on how to do that. With IPv4, it should behave the same as Windows XP. Load up Wireshark on the two systems and compare the differences.
