Kubernetes + Calico on Oracle Cloud VMs - networking

[ Disclaimer: this question was originally posted on ServerFault. However, since the official K8s documentation states "ask your questions on StackOverflow", I am also adding it here ]
I am trying to deploy a test Kubernetes cluster on Oracle Cloud, using OCI VM instances - however, I'm having issues with pod networking.
The networking plugin is Calico - it seems to be installed properly, but no traffic gets across the tunnels from one host to another. For example, here I am trying to access nginx running on another node:
root@kube-01-01:~# kubectl get pod -o wide
NAME                    READY   STATUS    RESTARTS   AGE   IP              NODE         NOMINATED NODE
nginx-dbddb74b8-th9ns   1/1     Running   0          38s   192.168.181.1   kube-01-06   <none>
root@kube-01-01:~# curl 192.168.181.1
[ ... timeout... ]
Using tcpdump, I see the IP-in-IP (protocol 4) packets leaving the first host, but they never seem to make it to the second one (although all other packets, including BGP traffic, make it through just fine).
root@kube-01-01:~# tcpdump -i ens3 proto 4 &
[1] 16642
root@kube-01-01:~# tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on ens3, link-type EN10MB (Ethernet), capture size 262144 bytes
root@kube-01-01:~# curl 192.168.181.1
09:31:56.262451 IP kube-01-01 > kube-01-06: IP 192.168.21.64.52268 > 192.168.181.1.http: Flags [S], seq 3982790418, win 28000, options [mss 1400,sackOK,TS val 9661065 ecr 0,nop,wscale 7], length 0 (ipip-proto-4)
09:31:57.259756 IP kube-01-01 > kube-01-06: IP 192.168.21.64.52268 > 192.168.181.1.http: Flags [S], seq 3982790418, win 28000, options [mss 1400,sackOK,TS val 9661315 ecr 0,nop,wscale 7], length 0 (ipip-proto-4)
09:31:59.263752 IP kube-01-01 > kube-01-06: IP 192.168.21.64.52268 > 192.168.181.1.http: Flags [S], seq 3982790418, win 28000, options [mss 1400,sackOK,TS val 9661816 ecr 0,nop,wscale 7], length 0 (ipip-proto-4)
root@kube-01-06:~# tcpdump -i ens3 proto 4 &
[1] 12773
root@kube-01-06:~# tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on ens3, link-type EN10MB (Ethernet), capture size 262144 bytes
What I have checked so far:
The Calico routing mesh comes up just fine. I can see the BGP traffic on the packet capture, and I can see all nodes as "up" using calicoctl
root@kube-01-01:~# ./calicoctl node status
Calico process is running.
IPv4 BGP status
+--------------+-------------------+-------+----------+-------------+
| PEER ADDRESS | PEER TYPE | STATE | SINCE | INFO |
+--------------+-------------------+-------+----------+-------------+
| 10.13.23.123 | node-to-node mesh | up | 09:12:50 | Established |
| 10.13.23.124 | node-to-node mesh | up | 09:12:49 | Established |
| 10.13.23.126 | node-to-node mesh | up | 09:12:50 | Established |
| 10.13.23.129 | node-to-node mesh | up | 09:12:50 | Established |
| 10.13.23.127 | node-to-node mesh | up | 09:12:50 | Established |
| 10.13.23.128 | node-to-node mesh | up | 09:12:50 | Established |
| 10.13.23.130 | node-to-node mesh | up | 09:12:52 | Established |
+--------------+-------------------+-------+----------+-------------+
The security rules for the subnet allow all traffic. All the nodes are in the same subnet, and I have a stateless rule permitting all traffic from other nodes within the subnet (I have also tried adding a rule permitting IP-in-IP traffic explicitly - same result).
The source/destination check is disabled on all the vNICs on the K8s nodes.
Other things I have noticed:
I can get Calico to work if I disable IP-in-IP encapsulation for same-subnet traffic and use regular routing inside the subnet (as described here for AWS); a sketch of that change follows below this list.
Other networking plugins (such as weave) seem to work correctly.
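For reference, the same-subnet change from the first bullet above is just a setting on the Calico IPPool. A minimal sketch, assuming the default v3 pool name and the 192.168.0.0/16 pod CIDR used here (both may differ in other clusters, and natOutgoing is simply kept at its usual default):

cat <<EOF | ./calicoctl apply -f -
apiVersion: projectcalico.org/v3
kind: IPPool
metadata:
  name: default-ipv4-ippool
spec:
  cidr: 192.168.0.0/16
  ipipMode: CrossSubnet
  natOutgoing: true
EOF

With ipipMode: CrossSubnet, Calico only encapsulates traffic that crosses a subnet boundary and uses plain routing between nodes in the same subnet.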
So my question here is - what is happening to the IP-in-IP encapsulated traffic? Is there anything else I can check to figure out what is going on?
And yes, I know that I could have used a managed Kubernetes engine directly, but where is the fun (and the learning opportunity) in that? :D
Edited to address Rico's answer below:
1) I'm also not getting any pod-to-pod traffic to flow through (no communication between pods on different hosts). But I was unable to capture that traffic, so I used node-to-pod as an example.
2) I'm also getting a similar result if I hit a NodePort svc on a node other than the one the pod is running on - I see the outgoing IP-in-IP packets from the first node, but they never show up on the second node (the one actually running the pod):
root@kube-01-01:~# tcpdump -i ens3 proto 4 &
[1] 6499
root@kube-01-01:~# tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on ens3, link-type EN10MB (Ethernet), capture size 262144 bytes
root@kube-01-01:~# curl 127.0.0.1:32137
20:24:08.460069 IP kube-01-01 > kube-01-06: IP 192.168.21.64.40866 > 192.168.181.1.http: Flags [S], seq 3175451438, win 43690, options [mss 65495,sackOK,TS val 19444115 ecr 0,nop,wscale 7], length 0 (ipip-proto-4)
20:24:09.459768 IP kube-01-01 > kube-01-06: IP 192.168.21.64.40866 > 192.168.181.1.http: Flags [S], seq 3175451438, win 43690, options [mss 65495,sackOK,TS val 19444365 ecr 0,nop,wscale 7], length 0 (ipip-proto-4)
20:24:11.463750 IP kube-01-01 > kube-01-06: IP 192.168.21.64.40866 > 192.168.181.1.http: Flags [S], seq 3175451438, win 43690, options [mss 65495,sackOK,TS val 19444866 ecr 0,nop,wscale 7], length 0 (ipip-proto-4)
20:24:15.471769 IP kube-01-01 > kube-01-06: IP 192.168.21.64.40866 > 192.168.181.1.http: Flags [S], seq 3175451438, win 43690, options [mss 65495,sackOK,TS val 19445868 ecr 0,nop,wscale 7], length 0 (ipip-proto-4)
Nothing on the second node (kube-01-06, the one that is actually running the nginx pod):
root@kubespray-01-06:~# tcpdump -i ens3 proto 4
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on ens3, link-type EN10MB (Ethernet), capture size 262144 bytes
I used 127.0.0.1 for ease of demonstration - of course, the exact same thing happens when I hit that NodePort from an outside host:
20:25:17.653417 IP kube-01-01 > kube-01-06: IP 192.168.21.64.56630 > 192.168.181.1.http: Flags [S], seq 980178400, win 64240, options [mss 1440,nop,wscale 8,nop,nop,sackOK], length 0 (ipip-proto-4)
20:25:17.654371 IP kube-01-01 > kube-01-06: IP 192.168.21.64.56631 > 192.168.181.1.http: Flags [S], seq 3932412963, win 64240, options [mss 1440,nop,wscale 8,nop,nop,sackOK], length 0 (ipip-proto-4)
20:25:17.667227 IP kube-01-01 > kube-01-06: IP 192.168.21.64.56632 > 192.168.181.1.http: Flags [S], seq 2017119223, win 64240, options [mss 1440,nop,wscale 8,nop,nop,sackOK], length 0 (ipip-proto-4)
20:25:20.653656 IP kube-01-01 > kube-01-06: IP 192.168.21.64.56630 > 192.168.181.1.http: Flags [S], seq 980178400, win 64240, options [mss 1440,nop,wscale 8,nop,nop,sackOK], length 0 (ipip-proto-4)
20:25:20.654577 IP kube-01-01 > kube-01-06: IP 192.168.21.64.56631 > 192.168.181.1.http: Flags [S], seq 3932412963, win 64240, options [mss 1440,nop,wscale 8,nop,nop,sackOK], length 0 (ipip-proto-4)
20:25:20.668595 IP kube-01-01 > kube-01-06: IP 192.168.21.64.56632 > 192.168.181.1.http: Flags [S], seq 2017119223, win 64240, options [mss 1440,nop,wscale 8,nop,nop,sackOK], length 0 (ipip-proto-4)
3) As far as I can tell (please correct me if I'm wrong here), the nodes are aware of the routes to the pod networks, and node-to-pod traffic is also encapsulated IP-in-IP (notice the protocol 4 packets in the first capture above):
root@kube-01-01:~# kubectl get pod -o wide
NAME                     READY   STATUS    RESTARTS   AGE   IP               NODE         NOMINATED NODE
alpine-9d85bf65c-2wx74   1/1     Running   1          23m   192.168.82.194   kube-01-08   <none>
nginx-dbddb74b8-th9ns    1/1     Running   0          10h   192.168.181.1    kube-01-06   <none>
root@kube-01-01:~# route -n
Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
<snip>
192.168.181.0 10.13.23.127 255.255.255.192 UG 0 0 0 tunl0

Maybe it is an MTU issue:
Typically the MTU for your workload interfaces should match the network MTU. If you need IP-in-IP then the MTU size for both the workload and tunnel interfaces should be 20 bytes less than the network MTU for your network. This is due to the extra 20 byte header that the tunnel will add to each packet.
Read more here.
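A quick way to compare the two MTUs on a node, and the knob to change if they are off (just a sketch: the veth_mtu key only exists in manifest-based Calico installs, and the right value depends on your actual network MTU):

ip link show ens3 | grep -o 'mtu [0-9]*'     # network MTU
ip link show tunl0 | grep -o 'mtu [0-9]*'    # tunnel MTU, should be 20 bytes smaller
kubectl -n kube-system get configmap calico-config -o yaml | grep veth_mtu
# if needed, set veth_mtu to (network MTU - 20) and restart the calico-node DaemonSet:
kubectl -n kube-system rollout restart daemonset calico-node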

After a long time and a lot of testing, my belief is that this was caused by IP-in-IP (ipip, or IP protocol 4) traffic being blocked by the Oracle cloud networking layer.
Even though I was unable to find this documented anywhere, it is something cloud providers commonly do (Azure, for example, does the same thing: it disallows IP-in-IP and unknown IP traffic).
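A quick way to confirm this, independently of Calico, is to bring up a manual IP-in-IP tunnel between two nodes and ping across it. A rough sketch (NODE1_IP / NODE2_IP stand for the nodes' subnet addresses, and 10.99.0.0/30 is just an arbitrary test range):

# on node 1
ip tunnel add ipiptest mode ipip local NODE1_IP remote NODE2_IP
ip addr add 10.99.0.1/30 dev ipiptest
ip link set ipiptest up
# on node 2
ip tunnel add ipiptest mode ipip local NODE2_IP remote NODE1_IP
ip addr add 10.99.0.2/30 dev ipiptest
ip link set ipiptest up
ping -c 3 10.99.0.1

If that ping times out while a plain ping between NODE1_IP and NODE2_IP works, protocol 4 is being dropped somewhere in between.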
So the possible workarounds here should be the same as the ones listed in the Calico documentation for Azure:
Disabling IP-in-IP for same-subnet traffic (as I mentioned in the question)
Switching Calico to VXLAN encapsulation (a sketch of this follows below the list)
Using Calico for policy only, and flannel for encapsulation (VXLAN)
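The VXLAN variant is, again, mostly an IPPool setting. A sketch assuming Calico 3.7+ and the default pool name; on manifest-based installs the calico_backend value in the calico-config ConfigMap (and the bird-based liveness/readiness probes of calico-node) also need adjusting, as per the Calico docs:

cat <<EOF | ./calicoctl apply -f -
apiVersion: projectcalico.org/v3
kind: IPPool
metadata:
  name: default-ipv4-ippool
spec:
  cidr: 192.168.0.0/16
  ipipMode: Never
  vxlanMode: Always
  natOutgoing: true
EOF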

Are you having issues connecting from Pod to Pod?
The short answer here would seem to be that the PodCidr packets get encapsulated when they are communicating with another pod, either on the same node or on another node.
Note:
By default, Calico’s IPIP encapsulation applies to all container-to-container traffic.
So you will be able to connect to a pod on another node if you are inside a pod yourself, for example after connecting with kubectl exec -it <pod-name>.
This is the reason you can't connect to a pod/container from root@kube-01-01:~#: your node/host doesn't know anything about the PodCidr. It sends the 192.168.x.x packets through the default node/host route; however, your physical network is not 192.168.x.x, so they get lost, since there is no other node/host that physically understands those addresses.
The way you would connect to an nginx service is through a Kubernetes Service; this is different from the network overlay and allows you to connect to pods from outside the PodCidr. Note that these service rules are managed by kube-proxy and are generally iptables rules. Also, with iptables you can explicitly express things like: if you want to talk to IP A.A.A.A, you need to go through a physical interface (e.g. tun0) or through IP B.B.B.B.
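If you want to see those rules on a node, the NodePort and Service chains that kube-proxy programs are visible in the nat table. A sketch (32137 is the NodePort from the capture above, and the second grep assumes the Service is actually named nginx):

iptables -t nat -S KUBE-NODEPORTS | grep 32137     # NodePort -> service chain
iptables -t nat -S | grep -i nginx                 # service chain -> per-endpoint DNAT rules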
Hope it helps!

Related

Neutron+VLAN. Allow trunk traffic between VMs

How can I make it possible to pass trunk (VLAN-tagged) traffic between VMs at the user level in a Neutron private network?
Description
We have the following private, non-shared network without any gateway.
I want VMs to be able to set up trunk (VLAN) connections to each other, with the creation of these connections controlled entirely at the OS level (the number of such connections and VMs is completely chaotic and random).
The problem is the following: if you create several VMs (on CentOS, for example) and try to connect one VM to another through a trunk port with, say, VLAN ID 5, the Neutron network completely drops such traffic.
### Both VMs ###
[root@vlan-X centos]# modprobe bonding
[root@vlan-X centos]# modprobe 8021q
[root@vlan-X centos]# echo "8021q" > /etc/modules-load.d/8021q.conf
[root@vlan-X centos]# echo "bonding" > /etc/modules-load.d/bonding.conf
### VM1 ###
[root@vlan-1 centos]# vi /etc/sysconfig/network-scripts/ifcfg-eth1
NAME="eth1"
DEVICE="eth1"
ONBOOT="yes"
TYPE="Ethernet"
BOOTPROTO="none"
[root@vlan-1 centos]# vi /etc/sysconfig/network-scripts/ifcfg-eth1.5
ONBOOT=yes
VLAN=yes
DEVICE=eth1.5
BOOTPROTO=static
IPADDR=192.168.10.15
PREFIX=24
[root@vlan-1 centos]# systemctl restart network
### VM2 ###
[root@vlan-2 centos]# vi /etc/sysconfig/network-scripts/ifcfg-eth1
NAME="eth1"
DEVICE="eth1"
ONBOOT="yes"
TYPE="Ethernet"
BOOTPROTO="none"
[root@vlan-2 centos]# vi /etc/sysconfig/network-scripts/ifcfg-eth1.5
ONBOOT=yes
VLAN=yes
DEVICE=eth1.5
BOOTPROTO=static
IPADDR=192.168.10.16
PREFIX=24
[root@vlan-2 centos]# systemctl restart network
[root@vlan-2 centos]# ping 192.168.10.15
PING 192.168.10.15 (192.168.10.15) 56(84) bytes of data.
From 192.168.10.16 icmp_seq=1 Destination Host Unreachable
From 192.168.10.16 icmp_seq=2 Destination Host Unreachable
From 192.168.10.16 icmp_seq=3 Destination Host Unreachable
From 192.168.10.16 icmp_seq=4 Destination Host Unreachable
From 192.168.10.16 icmp_seq=5 Destination Host Unreachable
From 192.168.10.16 icmp_seq=6 Destination Host Unreachable
From 192.168.10.16 icmp_seq=7 Destination Host Unreachable
From 192.168.10.16 icmp_seq=8 Destination Host Unreachable
^C
--- 192.168.10.15 ping statistics ---
11 packets transmitted, 0 received, +8 errors, 100% packet loss, time 10001ms
pipe 4
At the same time, tcpdump looks like this - that is, the VLAN tagging is applied successfully, but then even ARP gets no answer:
[root@vlan-2 centos]# tcpdump -e -nvvvti eth1
tcpdump: listening on eth1, link-type EN10MB (Ethernet), capture size 262144 bytes
fa:16:3e:8f:7f:df > Broadcast, ethertype 802.1Q (0x8100), length 46: vlan 5, p 0, ethertype ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 192.168.10.15 tell 192.168.10.16, length 28
fa:16:3e:8f:7f:df > Broadcast, ethertype 802.1Q (0x8100), length 46: vlan 5, p 0, ethertype ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 192.168.10.15 tell 192.168.10.16, length 28
fa:16:3e:8f:7f:df > Broadcast, ethertype 802.1Q (0x8100), length 46: vlan 5, p 0, ethertype ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 192.168.10.15 tell 192.168.10.16, length 28
fa:16:3e:8f:7f:df > Broadcast, ethertype 802.1Q (0x8100), length 46: vlan 5, p 0, ethertype ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 192.168.10.15 tell 192.168.10.16, length 28
^C
4 packets captured
4 packets received by filter
0 packets dropped by kernel
At the same time, the same design on VirtualBox with an internal network works fine.
What I have tried to solve this problem:
I tried creating a network with option "--transparent-vlan" - nothing changed
I tried creating a network with option "--transparent-vlan --disable-port-security" - nothing changed
I tried enabling the "trunk" option in Neutron and configuring the additional entities like "network trunk" and "subport" (a rough sketch of that follows below this list) - nothing changed. I suspect this is all intended for setting up communication between two VMs on different networks, not between many VMs on the same network.
In general, this option has one limitation - it only works with Linuxbridge
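For reference, that trunk/subport setup looks roughly like this (the port and trunk names here are placeholders, not the ones I actually used):

openstack network trunk create --parent-port vm1-port vm1-trunk
openstack network trunk set vm1-trunk \
  --subport port=vm1-vlan5-port,segmentation-type=vlan,segmentation-id=5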
A working configuration for me:
vi /etc/neutron/neutron.conf
...
[DEFAULT]
vlan_transparent = true
...
vi /etc/neutron/plugins/ml2/ml2_conf.ini
...
[ml2]
type_drivers = flat,vlan,vxlan,gre
tenant_network_types = vxlan
mechanism_drivers = linuxbridge
...
vi /etc/neutron/plugin.ini
...
[ml2]
type_drivers = flat,vlan,vxlan,gre
mechanism_drivers = linuxbridge
...
And only then was I able to create a network with the "vlan_transparent" option - not through the CLI, but through the API:
$ curl -s -X POST http://internal.mystack.net:9696/v2.0/networks -H "X-Auth-Token: <TOKEN>" -H "Content-Type: application/json" -d '{"network": {"name": "test", "admin_state_up": true, "tenant_id": "56b0cfe82ef94b2b8a60c53d72921a8b", "vlan_transparent": true}}'
$ openstack network show test --debug
...
{"networks": [{"provider:physical_network": null, "ipv6_address_scope": null, "dns_domain": null, "revision_number": 4, "port_security_enabled": true, "provider:network_type": "vxlan", "id": "78af4991-1b50-4b8d-9299-3a5dfaf689a2", "router:external": false, "availability_zone_hints": [], "availability_zones": [], "ipv4_address_scope": null, "shared": false, "project_id": "56b0cfe82ef94b2b8a60c53d72921a8b", "status": "ACTIVE", "subnets": [], "private_dns_domain": "mcs.local.", "description": "", "tags": [], "updated_at": "2022-03-29T13:22:00Z", "provider:segmentation_id": 88, "name": "test", "admin_state_up": true, "tenant_id": "56b0cfe82ef94b2b8a60c53d72921a8b", "created_at": "2022-03-29T13:22:00Z", "mtu": 1400, "vlan_transparent": true}]}
That is, if you add "openvswitch" or "l2population" to "mechanism_drivers" in the configuration, all of this abruptly stops working.
I didn't take the experiment any further, as I realized this approach doesn't suit me.

Cannot ping gateway for external Openstack network

I installed OpenStack-Ansible, Pike version. There is a separate network controller with a single physical network interface. We created VLAN 139, which carries the traffic to the gateway. The config file for that part looks like:
/etc/network/interfaces
...
auto eno1.139
iface eno1.139 inet manual
vlan-raw-device eno1
# OpenStack Networking VLAN bridge
auto br-vlan
iface br-vlan inet manual
bridge_stp off
bridge_waitport 0
bridge_fd 0
bridge_ports eno1.139
We created an external Openstack network using:
openstack network create --external --share --provider-physical-network vlan --provider-network-type vlan --provider-segment 139 provider1
and all the other steps (subnet, router, etc)
As per the documentation, the first test should be pinging the default gateway from the router namespace. When I try that, it does not work:
root@infra1-neutron-agents-container-e800e983:/# ip netns exec qrouter-eb842b12-9a35-4a93-baa9-38cc73531d9f ping 139.25.25.193
When I run tcpdump on the physical network interface of the controller node, I can see the packets going out without any problem:
openstackadmin#clcontroller:~$ sudo tcpdump -i eno1 --immediate-mode -e -n | grep 139.25.25.193
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eno1, link-type EN10MB (Ethernet), capture size 262144 bytes
16:30:09.182894 fa:16:3e:d4:b6:a1 > ff:ff:ff:ff:ff:ff, ethertype 802.1Q (0x8100), length 50: vlan 139, p 0, ethertype 802.1Q, vlan 139, p 0, ethertype ARP, Request who-has 139.25.25.193 tell 139.25.25.200, length 28
I can see the ARP request reaching the gateway that holds 139.25.25.193, the address I am trying to ping:
hpadmin#hos-gw01:~$ sudo tcpdump -i any --immediate-mode -e -n | grep 139.25.25.193
[sudo] password for hpadmin:
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on any, link-type LINUX_SLL (Linux cooked), capture size 262144 bytes
15:53:29.857281 B fa:16:3e:d4:b6:a1 ethertype 802.1Q (0x8100), length 62: vlan 139, p 0, ethertype 802.1Q, vlan 139, p 0, ethertype ARP, Request who-has 139.25.25.193 tell 139.25.25.200, length 38
15:53:29.857281 B fa:16:3e:d4:b6:a1 ethertype 802.1Q (0x8100), length 58: vlan 139, p 0, ethertype ARP, Request who-has 139.25.25.193 tell 139.25.25.200, length 38
but what is confusing is that my gateway is not responding to those ARP requests.
If I try to do the same thing from a standalone Linux machine connected to the same network segment and the same VLAN, everything works perfectly.
Any idea what the problem might be? Thanks in advance.
It seems the problem was that the external OpenStack network was set up to be on VLAN 139. Once we changed it to flat, everything started working without any problems. I am still confused, though, as to why the gateway did not send ARP responses.
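In CLI terms the working variant was roughly the following (the physical network label is a placeholder and has to match the flat_networks mapping in the ml2 configuration):

openstack network create --external --share \
  --provider-physical-network physnet1 --provider-network-type flat provider1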

libvirt DHCP fails from host

I am having trouble setting up a PXE VM. It is sending DHCP requests and the server is sending responses, but the VM does not appear to be processing the response. I am unsure as to the cause.
I did confirm physical machines are working just fine with the same DHCP and PXE settings and the DHCP requests and responses are the same as with the VM.
The DHCP server is provided by MaaS and is on the host.
Below is an image of the error.
The VM is created with: virt-install --name=maas-node-1 --connect=qemu:///system --ram=15360 --vcpus=8 --hvm --virt-type=kvm --pxe --boot network,hd --os-variant=ubuntu16.04 --graphics vnc --os-type=linux --accelerate --disk=/var/lib/libvirt/images/maas-node-1.qcow2,bus=virtio,format=qcow2,cache=none,sparse=true,size=60 --network=bridge:br0,model=virtio
The network is configured as:
auto br0
iface br0 inet static
address 192.168.10.2
network 192.168.10.0
broadcast 192.168.10.255
netmask 255.255.255.0
gateway 192.168.10.1
dns-nameservers 192.168.10.2
bridge_ports bond0
bridge_stp off
bridge_fd 0
bridge_maxwait 0
auto bond0
iface bond0 inet manual
mtu 1500
bond-miimon 100
bond-downdelay 200
bond-updelay 200
bond-mode 0
bond-slaves none
post-up ifenslave bond0 eno1 eno2 eno3 eno4
pre-down ifenslave bond0 eno1 eno2 eno3 eno4
...
DHCP request is:
steel.maas.bootpc > 255.255.255.255.bootps: [udp sum ok] BOOTP/DHCP, Request from 18:03:73:f8:ea:c9 (oui Unknown), length 257, xid 0xf97e014f, Flags [Broadcast] (0x8000)
Client-Ethernet-Address 18:03:73:f8:ea:c9 (oui Unknown)
Vendor-rfc1048 Extensions
Magic Cookie 0x63825363
DHCP-Message Option 53, length 1: Discover
Client-ID Option 61, length 6: ieee1394 03:73:f8:ea:c9
Parameter-Request Option 55, length 3:
Default-Gateway, Subnet-Mask, Domain-Name-Server
DHCP Reply is:
steel.maas.bootps > 255.255.255.255.bootpc: [udp sum ok] BOOTP/DHCP, Reply, length 300, xid 0xf97e014f, Flags [Broadcast] (0x8000)
Your-IP steel.maas
Server-IP steel.maas
Client-Ethernet-Address 18:03:73:f8:ea:c9 (oui Unknown)
file "pxelinux.0"
Vendor-rfc1048 Extensions
Magic Cookie 0x63825363
DHCP-Message Option 53, length 1: Offer
Server-ID Option 54, length 4: steel.maas
Lease-Time Option 51, length 4: 600
Subnet-Mask Option 1, length 4: 255.255.255.0
Default-Gateway Option 3, length 4: 192.168.10.1
Domain-Name-Server Option 6, length 4: steel.maas
The problem was that ARP requests were not being answered, due to a quirk of using bond-mode 0 with no trunking configured on the switch. Switching to balance-tlb fixed the issue.
This helped narrow the problem down: https://bugs.launchpad.net/ubuntu/+source/qemu-kvm/+bug/785668
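In /etc/network/interfaces terms, the fix is a one-line change to the bond stanza shown above (a sketch; everything else stays as it was):

auto bond0
iface bond0 inet manual
mtu 1500
bond-miimon 100
bond-downdelay 200
bond-updelay 200
# balance-tlb works without EtherChannel/static LAG on the switch, unlike mode 0 (balance-rr)
bond-mode balance-tlb
bond-slaves none
post-up ifenslave bond0 eno1 eno2 eno3 eno4
pre-down ifenslave bond0 eno1 eno2 eno3 eno4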

"No route to host" when I do a telnet [closed]

I have two VMs in Azure with the following IPs:
10.10.1.9
10.10.1.6
When I run telnet with the following command from the server 10.10.1.6, I get this error:
telnet 10.10.1.9 2181
Trying 10.10.1.9...
telnet: connect to address 10.10.1.9: No route to host
When I run tcpdump on the 10.10.1.9 side, I get the following log:
#tcpdump -i eth0 port 2181
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 65535 bytes
07:55:35.530270 IP 10.10.1.6.55910 > 10.10.1.9.eforward: Flags [S], seq 1018543857, win 14600, options [mss 1418,sackOK,TS val 181360935 ecr 0,nop,wscale 7], length 0
At the same time, I also run tcpdump on the 10.10.1.6 side while doing the telnet from 10.10.1.6 to 10.10.1.9:
tcpdump -i eth0 port 2181
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 65535 bytes
07:55:57.970696 IP 10.10.1.6.55910 > 10.10.1.9.eforward: Flags [S], seq 1018543857, win 14600, options [mss 1460,sackOK,TS val 181360935 ecr 0,nop,wscale 7], length 0
tcpdump on 10.10.1.9 with arp
#tcpdump -i eth0 port 2181 or arp
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 65535 bytes
08:00:18.356153 IP 10.10.1.6.55944 > 10.10.1.9.eforward: Flags [S], seq 3337054296, win 14600, options [mss 1418,sackOK,TS val 181643770 ecr 0,nop,wscale 7], length 0
08:00:42.294801 ARP, Request who-has 10.10.1.6 tell 10.10.1.9, length 28
08:00:42.295859 ARP, Reply 10.10.1.6 is-at 12:34:56:78:9a:bc (oui Unknown), length 28
tcpdump on 10.10.1.6
tcpdump -i eth0 port 2181 or arp
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 65535 bytes
08:00:40.805565 IP 10.10.1.6.55944 > 10.10.1.9.eforward: Flags [S], seq 3337054296, win 14600, options [mss 1460,sackOK,TS val 181643770 ecr 0,nop,wscale 7], length 0
08:00:45.805204 ARP, Request who-has 10.10.1.9 tell 10.10.1.6, length 28
08:00:45.805721 ARP, Reply 10.10.1.9 is-at 12:34:56:78:9a:bc (oui Unknown), length 28
08:02:04.752283 ARP, Request who-has 10.10.1.9 tell 10.10.1.6, length 28
08:02:04.753141 ARP, Reply 10.10.1.9 is-at 12:34:56:78:9a:bc (oui Unknown), length 28
Sequence of the run:
First I started tcpdump on both 10.10.1.9 and 10.10.1.10, and then tried the telnet from 10.10.1.10.
arp -a on 10.10.1.9
#arp -a
? (10.10.1.7) at 12:34:56:78:9a:bc [ether] on eth0
? (10.10.1.4) at 12:34:56:78:9a:bc [ether] on eth0
? (10.10.1.1) at 12:34:56:78:9a:bc [ether] on eth0
? (10.10.1.8) at 12:34:56:78:9a:bc [ether] on eth0
? (10.10.1.10) at <incomplete> on eth0
? (10.10.1.11) at 12:34:56:78:9a:bc [ether] on eth0
? (10.10.1.6) at 12:34:56:78:9a:bc [ether] on eth0
? (10.10.1.5) at 12:34:56:78:9a:bc [ether] on eth0
arp -a on 10.10.1.6
#arp -a
? (10.10.1.1) at 12:34:56:78:9a:bc [ether] on eth0
? (10.10.1.10) at <incomplete> on eth0
? (10.10.1.9) at 12:34:56:78:9a:bc [ether] on eth0
Thanks in advance.
The tcpdump on 10.10.1.9 shows that it received a packet from 10.10.1.10 but could not reply back... as a result we get "No route to host" on the 10.10.1.10 side.
You should get "No route to host" if there is, in fact, no route from 10.10.1.10 to 10.10.1.9, not just because a packet sent from 10.10.1.10 to 10.10.1.9 didn't get a reply. I.e., you should only get "No route to host" if 10.10.1.10 couldn't send a packet to 10.10.1.9 in the first place!
Now, perhaps the OS running on 10.10.1.10 is being stupid and returning EHOSTUNREACH ("No route to host") rather than, for example, ETIMEDOUT ("Operation timed out") if it never gets a SYN+ACK back from the initial SYN.
Or perhaps there was a route from 10.10.1.10 to 10.10.1.9 during the time the
23:46:30.003480 IP 10.10.1.10.42946 > 10.10.1.9.eforward: Flags [S], seq 2823099523, win 14600, options [mss 1418,sackOK,TS val 74982205 ecr 0,nop,wscale 7], length 0
packet was sent, but 10.10.1.9 wasn't able to, or decided not to, respond to that initial SYN with a SYN+ACK, and when 10.10.1.10 retransmitted the SYN, it was no longer able to send packets to 10.10.1.9, and reported "No route to host".
If this is reproducible, I would suggest running tcpdump on both hosts, to see more details as to what happened. I would suggest running a command such as
tcpdump -i eth0 port 2181 or arp
so that, for example, if the problem is that the ARP entry for the other host timed out on one of the hosts, and a subsequent attempt to re-ARP for the other host's MAC address failed, that will show up. (I'm assuming here that there's no router in between 10.10.1.10 and 10.10.1.9, so that "No route to host" really means "No ARP entry for host".)
(Another possibility is that there's some sort of "packet filter"/firewall in place on one or the other host, handling some ports differently from others, so that connecting to port 22 is possible but connecting to port 2181 isn't possible.)
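A quick way to check for that last case on 10.10.1.9 (generic commands, nothing specific to this setup):

iptables -L INPUT -n -v | grep -Ei 'reject|drop'   # a REJECT with icmp-host-prohibited also surfaces as "No route to host"
iptables -L INPUT -n -v | grep 2181                # anything matching the port itself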

rsyslog to resend event from client after abnormal server crash

Rsyslog Server IP: 192.168.122.94
Rsyslog Client IP: 192.168.122.93
1) Forced a reboot of the rsyslog server:
root@rsyslogserver:~# reboot -f
Write failed: Broken pipe
2) After the reboot, I sent an event from the rsyslog client.
3) The server is listening on port 1014 and the client is configured to forward logs to the server on port 1014.
4) Ran tcpdump on the rsyslog server to watch the communication on port 1014. The first time an event is sent after the forced reboot, the rsyslog client is not able to forward it to the rsyslog server; after that, the client is able to forward logs to the server again.
root@rsyslogserver:~# tcpdump -i eth1 "src port 1014"
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth1, link-type EN10MB (Ethernet), capture size 65535 bytes
11:03:05.687971 IP 192.168.122.94.1014 > 192.168.122.93.40036: Flags [R], seq 3944299399, win 0, length 0
11:05:28.096264 IP 192.168.122.94.1014 > 192.168.122.93.52079: Flags [S.], seq 3014852900, ack 1286331701, win 14480, options [mss 1460,sackOK,TS val 4294939924 ecr 149156552,nop,wscale 6], length 0
11:05:28.096605 IP 192.168.122.94.1014 > 192.168.122.93.52079: Flags [.], ack 394, win 243, options [nop,nop,TS val 4294939924 ecr 149156552], length 0
Reason:
This seems to be the general behavior of any TCP connection: if a system crashes or terminates abnormally, the next TCP request resets the old pre-crash connection and establishes a new one.
This does not happen after a normal reboot.
RefLink:
https://en.wikipedia.org/wiki/TCP_reset_attack (Section TCP resets)
But my question here is how to prevent the loss of that first event.
Are there any configuration options on the rsyslog server/client side to prevent event loss?
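One thing that should at least cover the retry side on the client is a disk-assisted action queue with unlimited retries, a sketch using rsyslog's standard queue parameters (the file name and drop-in path are placeholders):

# client (192.168.122.93), e.g. /etc/rsyslog.d/60-forward.conf
# spool to disk while the server is unreachable and retry forever instead of discarding
action(type="omfwd" target="192.168.122.94" port="1014" protocol="tcp"
       action.resumeRetryCount="-1"
       queue.type="LinkedList"
       queue.filename="fwdq"
       queue.saveOnShutdown="on")

Even with that, plain TCP cannot guarantee that a message already handed to the broken connection is not lost; that is the case the RELP transport (omrelp/imrelp) was designed for.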
