Random delay in port binding for distributed_virtual_router and router_centralized_snat in openstack neutron - openstack

I have created a private network called "Private_Network" in the range 192.168.220.0/24, plus a virtual router called "Virtual_Router" inside OpenStack which is connected to the external network. I then connect the default gateway of "Private_Network", i.e. 192.168.220.1, to "Virtual_Router" so that all VMs connected to "Private_Network" can access the Internet via SNAT.
I use VXLAN as the overlay network and a flat provider network.
When I connect "Private_Network" to "Virtual_Router", two ports are created immediately: router_interface_distributed with the IP address 192.168.220.1 and router_centralized_snat with the IP address 192.168.220.45. However, both ports stay in the DOWN state for a long and seemingly random time, such as 2 hours, 45 minutes, or 20 minutes. I should mention that, on rare occasions, the ports come UP in less than a minute after I connect "Private_Network" to "Virtual_Router".
Just after creation, both ports are in the DOWN state.
I have searched a lot to find the root cause of this issue. I am not convinced that the server configuration is wrong, because in a few cases the two ports come up right after I connect "Private_Network" to "Virtual_Router". So I looked at the log files and noticed that each port has to pass three main phases to reach the UP state: DHCP, port binding, and L2 provisioning. I changed the log level to DEBUG and investigated the log files in detail.
I ran the following process several times:
Create a brand-new project in Horizon.
Create a brand-new virtual network (called "Private_Network") in the range 192.168.220.0/24.
Create a brand-new virtual router (called "Virtual_Router") connected to the external network.
Connect port 192.168.220.1 (the default gateway) of "Private_Network" to "Virtual_Router".
cat /var/log/neutron/* | grep snat_port
In the more than ten cases I tried, neutron got stuck at either the "port binding" phase or the "L2 provisioning" phase.
When it gets stuck at "port binding", that phase takes a random time to finish, such as 45, 20, or 10 minutes. Once "port binding" is done, "L2 provisioning" completes in less than a minute and the port state changes to UP.
However, when "L2 provisioning" gets stuck, the previous two phases finish in less than a minute, but "L2 provisioning" then stays stuck for hours. It is confusing to me why there is so much delay in getting the ports UP.
I would appreciate it if anybody could help me resolve this issue.
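To put numbers on the delay described above, the time a port takes to leave the DOWN state can be measured by polling its status. The following is only a sketch: the function names are hypothetical, it assumes the port reports `ACTIVE` once it is up (the question calls this state UP), and it assumes the `openstack` CLI is installed and authenticated for `cli_port_status` to work.

```python
import subprocess
import time
from typing import Callable, Optional


def cli_port_status(port_id: str) -> str:
    """Read a port's status via the openstack CLI (assumed to be configured)."""
    result = subprocess.run(
        ["openstack", "port", "show", port_id, "-f", "value", "-c", "status"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def time_until_status(
    get_status: Callable[[], str],
    target: str = "ACTIVE",      # assumed status name for an "UP" port
    timeout: float = 3 * 3600,   # give up after three hours
    interval: float = 5.0,       # seconds between polls
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Optional[float]:
    """Poll get_status() until it returns target.

    Returns the elapsed seconds, or None if the timeout is reached first.
    """
    start = clock()
    while True:
        if get_status() == target:
            return clock() - start
        if clock() - start >= timeout:
            return None
        sleep(interval)
```

Calling `time_until_status(lambda: cli_port_status(port_id))` right after attaching the subnet to the router gives one delay sample per run, which can then be lined up against the DEBUG log timestamps for the binding and provisioning phases.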

Related

Identifying GPRS Dynamic IP connections from the same computer

I'm facing a challenging problem here that I don't know how to resolve:
Context: I have a game launcher that connects to my server and, if it doesn't detect any cheating software on the player's computer, launches the game and tells the server to allow that IP to connect to the game server.
This has many potential issues, such as multiple players sharing the same IP, but I use a queue in that case, so all is fine up to this point.
Now the main problem is that I have no control over what information the game sends; I can only modify the launcher. For this reason everything is IP-based, as that is the only way I have to tell that a certain player is logging in and has been authorised by the launcher. It goes:
Launcher connects to the server and tells it to allow IP A.
Server replies: ok (save IP A)
Launcher starts game.
Player tries to login.
A connection is established to the server, server checks if origin IP (IP A) is allowed to log in, if yes, go ahead.
So the system, even though far from ideal, does the job, and considering the game is compiled and we cannot modify it, I couldn't think of a better way.
Anyway, now we come to the problem:
For certain players, the launcher works fine and the game launches, but when the player tries to log in, the server denies the connection because it comes from a different origin IP!
That baffled me: how can two TCP connections made within a few seconds of each other from client A to server B have different client source IPs? Obviously this breaks my whole system. I even tried periodically fetching the IP from sites like whatismyip to see if it was changing over time, but that wasn't the case. Maybe it is because the connection goes to another port, or some other reason I don't know; sometimes the IP changes and sometimes it doesn't.
It seems to be related to players using tethered internet connections, as I've never seen this on a regular internet connection.
So basically, I'm not sure what I could do to identify or relate those two connections. This is a big problem, as many players are unable to join my game and I cannot let them join without the launcher for obvious reasons.
My random ideas to resolve it range from bad to terrible:
Open multiple connections to the server on different ports and see if that gives different source IPs.
Let the player connect and then do some kind of validation based on a netstat check on the client: when the player is connected to the game server I should see the connection there and could send that info to the server, and the server would kick any connected client that has no validation from the launcher. However, I think I would still have the problem of linking both connections.
Maybe there's another way that I'm not aware of to identify these connections. Assume I have full control on the server side and in the launcher, but I cannot change the game packet that makes the "login" attempt.
Based on your assumptions (IP-based only, game/server unmodifiable), it looks like we are indeed hitting a wall.
For the moment, the only thing that comes to mind is performing multiple requests to the server instead of one, repeating them until the user finally logs in.
I mean:
Periodically, the launcher connects to the server and tells it to allow the current IP. The server saves this IP, and hopefully at some point you will have discovered all the IPs.
Do this in the background until the player has finished logging in (or for a fixed period of time).
With some luck, if you open multiple connections during the whole period needed to start the game and log in, you will have discovered and allowed all of the user's IPs. This will mitigate the issue but not eliminate it.
I'll edit this post if I think about something else.
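The server side of this idea could look roughly like the sketch below: every launcher check-in adds the observed source IP to an allowlist with an expiry, so several IPs can be valid for the same player at once. The class name, method names, and the 60-second lifetime are assumptions made for illustration, not part of the original system.

```python
import time
from typing import Callable, Dict


class IpAllowlist:
    """Allowlist where every entry expires after ttl_seconds."""

    def __init__(self, ttl_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._expiry: Dict[str, float] = {}  # source IP -> expiry timestamp

    def allow(self, ip: str) -> None:
        """Called on each launcher check-in; adds or refreshes the IP."""
        self._expiry[ip] = self._clock() + self._ttl

    def is_allowed(self, ip: str) -> bool:
        """Called when the game client attempts to log in from ip."""
        expires = self._expiry.get(ip)
        if expires is None:
            return False
        if self._clock() >= expires:
            del self._expiry[ip]  # drop stale entries lazily
            return False
        return True
```

The launcher would keep checking in every few seconds while the game starts, so any source IP the carrier assigns during that window ends up in the list. A short lifetime limits how long an address stays authorised after the launcher stops refreshing it.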

Cannot ping across router (see details)

I'm trying to go through an online course to study for my CCENT and CCNA certification exams, and I've come across a trouble spot.
In the module, the instructor goes over basic network setup, including setting up interfaces, assigning IP addresses, the works.
At the end of the video for that portion, he tests the connection by pinging a second machine across the router and has no issue doing so. However, I can't seem to make it work. Crude ASCII topology drawing below.
Currently, I can ping both ends of the router from either machine, and can ping both machines from the router no problem. What am I missing, or what have I not done in order to be able to ping one machine to the other? I want to make sure I have this working before I move on in the course.
      10.0.0.0/25              10.0.0.128/25
|CPU 1|-------G0/0--|R1|--G0/1----------|CPU 2|
R1:    G0/0: 10.0.0.1/25
       G0/1: 10.0.0.129/25
CPU 1: 10.0.0.10/25
       default gateway: 10.0.0.1
CPU 2: 10.0.0.130/25
       default gateway: 10.0.0.129
Are the PCs in question physical machines? One common cause of PC-to-PC ping failure on physical machines is the Windows Firewall. It would need to be disabled, or configured to allow ICMP echo requests, on the remote machine you wish to ping in order to get a response.
Thank you
Please vote on this answer if it was helpful.
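Before blaming a firewall, it can help to rule out an addressing mistake. The sketch below uses Python's standard `ipaddress` module to check the addressing plan from the question: the two PCs should be in different /25 subnets, and each default gateway should sit inside its PC's subnet. The helper names are made up for this example.

```python
import ipaddress


def same_subnet(a: str, b: str) -> bool:
    """True if two interface addresses (with prefix length) share a network."""
    return ipaddress.ip_interface(a).network == ipaddress.ip_interface(b).network


def gateway_on_link(host: str, gateway: str) -> bool:
    """True if the gateway address lies inside the host's own subnet."""
    return ipaddress.ip_address(gateway) in ipaddress.ip_interface(host).network


# Addresses taken from the topology in the question.
print(same_subnet("10.0.0.10/25", "10.0.0.130/25"))    # PCs are on different subnets
print(gateway_on_link("10.0.0.10/25", "10.0.0.1"))     # CPU 1 can reach its gateway
print(gateway_on_link("10.0.0.130/25", "10.0.0.129"))  # CPU 2 can reach its gateway
```

With these values the plan is consistent: the PCs are in separate subnets and each gateway is on-link, which fits the observation that both PCs can ping the router and points toward something on the hosts filtering the echo requests.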

JMS Connection Latency

I am examining an application where a JBOSS Application Server communicates with satellite JBOSS Application Servers (1 main server, hundreds of satellites).
When observing in the Windows Resource Monitor, I can view the connections and see the latency per satellite. Most are sub-second, but I see 10 over 1 second, of which 4 are over 2 seconds and 1 is over 4 seconds. This is a "moment in time" view, so as connections expire and are rebuilt when needed, the trend can shift. I do observe that the same pair of systems has a ping latency matching the one seen in the connection list, so I suspect it is connection related (slow pipe, congestion, or anything in the line between points A and B).
My question is what a target latency should be, keeping in mind that the satellites are connected over VPN from various field sites. I use 2 seconds as the dividing line for when I need the network team to investigate. I'd like to survey what rule of thumb you use in evaluating when the latency for a transient connection starts peaking: is it over a second?

How to test the stability of internet connection for this particular scenario

I work for a company that installs a device in small shops for their payment transactions. This device uses the shop's internet connection as its primary connection and, if that goes down, fails over to a 3G connection. During the failover there is a downtime of a few minutes.
But we are having issues where customers call us and say that their site goes down repeatedly throughout the day. When we look at our logs, we see that our device has indeed failed over from primary to 3G and back to primary a number of times. We advise them to check with their ISP and make sure there are no internet drops.
Often customers say that they have consulted their ISP, which reports no issues on its end.
The only other reason I can think of for the device to keep failing over is faulty cabling. Are there other ways we can test whether the problem lies with the Internet connection and not our device?
Perhaps you ought to expand the test routines included in the device, assuming the device has the memory capacity and/or libraries and computing power available.
For example, does your device determine the Internet is down only if it cannot reach a certain IP destination? If so, you may want to expand this by 1) testing to ensure timeouts aren't too short due to upstream congestion, 2) testing another known location such as Google's DNS server 8.8.8.8 when the intended destination IP fails, and 3) testing the internal gateway to determine if the ISP modem/router has rebooted for some reason.
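The three checks suggested above can be combined into a small decision routine. This is a sketch under assumptions: it probes with TCP connections (the device may use something else), the result labels are invented, and the gateway port is hypothetical because not every modem/router accepts TCP on a known port. The probe is injectable so the device's own check could be plugged in.

```python
import socket
from typing import Callable, Tuple

Target = Tuple[str, int]  # (host, port)


def tcp_probe(host: str, port: int, timeout: float) -> bool:
    """True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def classify_link(
    gateway: Target,
    primary: Target,
    fallback: Target = ("8.8.8.8", 53),  # Google's public DNS, as suggested above
    probe: Callable[[str, int, float], bool] = tcp_probe,
    timeout: float = 5.0,  # generous, to tolerate upstream congestion
    retries: int = 2,
) -> str:
    """Classify connectivity before deciding to fail over to 3G."""

    def reachable(target: Target) -> bool:
        # Retry so that a single lost packet does not count as an outage.
        return any(probe(target[0], target[1], timeout) for _ in range(retries))

    if reachable(primary):
        return "up"
    if reachable(fallback):
        return "destination-unreachable"  # Internet works, payment host does not
    if reachable(gateway):
        return "isp-down"                 # LAN works, upstream does not
    return "local-fault"                  # cabling, switch, or rebooting router
```

Logging the returned label with a timestamp on every failover would show whether the drops are upstream of the shop's router or inside the shop, which is the distinction the ISP conversation needs.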

Observer in a distributed environment

Machine A needs to send a message to machine B. Machine A has a static IP but machine B does not.
One option I could think of to solve this problem is that machine B opens a TCP connection to machine A and then machine A sends the data/message to machine B. However, this solution has the following limitations:
a) It is not scalable if there are many machines like machine B to which the data has to be sent. It might be a drain on the resources of machine A.
b) It is machine A that needs to send the data when it wants. Machine B does not know when there will be data for it. In the current design, machine B will have to keep polling machine A repeatedly with a TCP connection asking if it has any data for it or not. This can get expensive if there are many machine B's.
Is there a less expensive way to solve this problem? The Observer design pattern comes to mind. Machine B could subscribe to a notification from machine A to inform it when data becomes available. However, how does one implement the pattern in a distributed environment when machine B does not have a static IP?
Observer aside, is there a way other than using raw sockets for machine A to send that data to machine B, that would be less expensive?
What if machine B makes a call to machine A to register its IP address for updates? That would be a quick message exchange. Whenever machine A has data it could create a new connection to all of the IPs that have registered themselves and send them the data.
Look at IP Multicast, though you may be able to get by with simple UDP broadcast.
Does not having a static IP mean that machine B is accessible from outside, but its address changes?
If so, you can have machine B call A.detach(old_ip); A.attach(new_ip) every time its address changes.
I'd go with your original idea, except that there's no need to poll - B can just maintain an idle TCP connection to A at all times, then when A wants to send a message it just sends it out to all the clients it has connected at that time. Overhead won't be a problem - even a fairly old machine can handle thousands of simultaneous mostly-idle TCP connections.
(You'll also want to implement some kind of keep-alive echo / echo-reply messages if the gaps between real messages are longer than a few minutes, so that B can quickly detect a dead connection and reconnect, and so that connection-tracking state in firewalls or routers along the path does not time out.)
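The idle-connection approach maps naturally onto the Observer pattern: each connection that B opens acts as a subscription, and A pushes to every connection it currently holds. The sketch below shows only A's bookkeeping, with hypothetical class and method names; accepting connections and the keep-alive messages mentioned above are left out.

```python
import socket
from typing import Set


class Broadcaster:
    """Machine A's view: a set of client-initiated, mostly idle connections."""

    def __init__(self) -> None:
        self._clients: Set[socket.socket] = set()

    def attach(self, conn: socket.socket) -> None:
        """Register a connection that a machine B has opened to A."""
        self._clients.add(conn)

    def detach(self, conn: socket.socket) -> None:
        """Forget a connection and release its socket."""
        self._clients.discard(conn)
        conn.close()

    def broadcast(self, payload: bytes) -> int:
        """Send payload to every attached client.

        Connections that fail are detached. Returns how many sends succeeded.
        """
        delivered = 0
        for conn in list(self._clients):  # copy, since detach mutates the set
            try:
                conn.sendall(payload)
                delivered += 1
            except OSError:
                self.detach(conn)
        return delivered
```

Because B always initiates the connection, A never needs to know B's address, so B's changing IP stops mattering. After a drop, B simply reconnects and is attached again.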
