GCE Network Load Balancer loops traffic back to VM

On GCE, using a Network Load Balancer (NLB), I have the following scenario:
1 VM with internal IP of 10.138.0.62 (no external IP)
1 VM with internal IP of 10.138.0.61 (no external IP)
1 NLB with a target pool (Backend) that contains both of these VMs
1 Health check that monitors a service on these VMs
The issue is that when one of these VMs hits the NLB IP address, the request immediately resolves to the IP of the same instance making the request; it never gets balanced between the two VMs and never reaches the other VM, even if the VM making the request has failed its health check. For example:
VM on 10.138.0.62 is in target pool of NLB and its service is healthy.
VM on 10.138.0.61 is in target pool of NLB and its service is NOT healthy.
Make a request from the second VM, on 10.138.0.61, to the NLB, and even though this same VM has failed its health check, traffic will still be delivered to itself. It basically ignores the fact that there is an NLB and health checks entirely, and simply says, "If the VM is in the target pool for this NLB and it attempts contact with the NLB's IP, loop the traffic back to itself".
Note that if I remove the VM on IP 10.138.0.61 from the target pool of the NLB and try the connection again, it immediately goes through to the other VM that's still in the target pool, just as I'd expect. If I put the VM on IP 10.138.0.61 back in the target pool and attempt to hit the NLB, it again only loops back to the calling machine on 10.138.0.61.
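For reference, the remove/re-add test was done with commands along these lines (a sketch; the pool name, instance name, and zone below are placeholders, not the real ones):
# Take the VM on 10.138.0.61 out of the target pool ...
gcloud compute target-pools remove-instances my-pool --instances=vm-61 --instances-zone=us-west1-a
# ... and later put it back.
gcloud compute target-pools add-instances my-pool --instances=vm-61 --instances-zone=us-west1-a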
Googling around a bit, I saw that this behavior happens on some versions of Windows Server and its NLB, but I didn't expect this on GCE. Have others seen the same behavior? Is this just a known behavior that I should expect? If so, any workarounds?

This is working as intended. Due to how networking is configured in this virtual environment, a request from a load-balanced VM to the load balancer's IP will always be returned to that same VM, ignoring health check status. See the GCE load balancing documentation for more information.
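One way to see this from the VM itself (a sketch, assuming a Linux backend running the Google guest environment; 203.0.113.10 stands in for your NLB address): the forwarding rule's IP is programmed as a local route on every backend VM, so connections from a backend to that IP are answered locally and never leave the machine.
# On a VM that is in the target pool, check whether the NLB address appears
# in the local routing table; if it does, traffic from this VM to that
# address is delivered straight back to this VM.
ip route show table local | grep 203.0.113.10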

Related

AWS EC2 - Unable to consume HTTP server from a different machine on the same network

Followed this tutorial to set up two EC2 instances: "12. Creation of two EC2 instances and how to establish ping communication" (YouTube).
The only difference is that I used a Linux image.
I set up a simple Python HTTP server on one machine (on port 8000), but I cannot access it from my other machine; whenever I curl, the program just waits. (It might eventually time out, but I wasn't patient enough to witness that.)
The workaround, I figured, is to add a port rule via the security group. I don't like this option, since it means that port (on the machine hosting the web server) can be accessed from the internet.
I was looking for an experience similar to what people usually have at home with their routers; machines connected to the same home router can reach out to other machines on any port (provided the destination machine has some service hosted on that port).
What is the solution to achieve something like this when working with EC2?
The instance is open to the internet because you are allowing access from '0.0.0.0/0' (anywhere) in the inbound rule of the security group.
If you want communication to be allowed only between the instances and not from the public internet, you can achieve that by assigning the same security group to both instances and modifying the inbound rule in the security group to allow all traffic (or just ICMP traffic) sourced from the security group itself.
You can read more about it here:
AWS Reference
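A rough sketch of such a rule with the AWS CLI, assuming both instances already share the security group (the group ID and port below are placeholders):
# Allow the web server port only from members of the same security group;
# nothing is opened to 0.0.0.0/0.
aws ec2 authorize-security-group-ingress --group-id sg-0123456789abcdef0 --protocol tcp --port 8000 --source-group sg-0123456789abcdef0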

Google Cloud Platform networking: Resolve VM hostname to its assigned internal IP even when not running?

Is there any way in the GCP, to allow VM hostnames to be resolved to their IPs even when the VMs are stopped?
Listing VMs in a project reveals their assigned internal IP addresses even when the VMs are stopped. This means that, as long as the VMs aren't re-created, their internal IPs are statically assigned.
However, when our VMs are stopped, the DNS resolution stops working:
ping: my-vm: Name or service not known
even though the IP is kept assigned to it, according to gcloud compute instances list.
I've tried reserving the VM's current internal IP:
gcloud compute addresses create my-vm --addresses 10.123.0.123 --region europe-west1 --subnet default
However, the address name my-vm above is not related to the VM name my-vm and the reservation has no effect (except for making the IP unavailable for automatic assignment in case of VM re-creation).
But why?
Some fault-tolerant software has a configuration for connecting to multiple machines for redundancy, and if at least one of the connections can be established, the software runs fine. But if a hostname cannot be resolved, this software will not start at all, forcing us to hard-code the names in /etc/hosts (which doesn't scale well to a cluster of two dozen VMs) or to use IP addresses (which gets hairy after a while). A specific example here is freeDiameter.
Ping uses the ICMP protocol, which requires that the target is running and responding to network requests.
Google Compute Engine VMs use DHCP for private IP addresses. DHCP is integrated with (communicates with) Google's internal DNS: DHCP informs DNS about running VMs (IP address and hostname). If a VM is shut down, that link no longer exists, and the DHCP/DNS information is updated/replaced/deleted hourly.
You can set up a Google Cloud DNS private zone, create entries for your VPC resources, and get resolution of private IP addresses and hostnames that persists.
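A minimal sketch of that approach (the zone name, DNS suffix, network, and record values below are placeholders):
# Create a private zone that is visible only to the chosen VPC network.
gcloud dns managed-zones create my-private-zone --dns-name="internal.example." --visibility=private --networks=default --description="Static internal names"
# Add a record for the VM; it resolves whether or not the VM is running.
gcloud dns record-sets create my-vm.internal.example. --zone=my-private-zone --type=A --ttl=300 --rrdatas=10.123.0.123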

Network problems connecting multiple clients from same public ip to my google Compute Engine Instance

We are using a CentOS 7 Google Cloud instance as a web server, and I'm experiencing connectivity problems when multiple clients from my company try to connect to the web server at the same time.
We are browsing the site fine, then suddenly can't connect for a while (perhaps 10 or 20 seconds), and then we can connect again.
At the same moment, I can browse it perfectly from other public IPs in the same subnet and company, from cell phones on 4G, etc.
It seems that some DDoS filter, WAF protection, or IPS signature is doing something.
The server only has Apache and nothing else.
Is my diagnosis on the right track? How can I fix this behaviour?
It would be worth checking any firewall and load balancer in front of your application server. Since you say you can access the website from the same subnet while it is inaccessible elsewhere, can you perform a port scan from an external network to review the HTTP service and latency, with an nmap command such as: nmap -p 80 [public IP address]?
It is also worth checking the VM instance's health (CPU load, network I/O performance, etc.) while the website is inaccessible. There is a chance that some resource becomes unavailable under high load.
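To correlate the outages with server load, a simple probe run from an external network could look like this (a sketch; the URL and interval are placeholders):
# Log the HTTP status and total response time every 5 seconds so the
# 10-20 second outages show up with timestamps.
while true; do
  printf '%s ' "$(date -u +%FT%TZ)"
  curl -o /dev/null -s -w 'http_code=%{http_code} time_total=%{time_total}\n' --max-time 10 http://PUBLIC_IP/
  sleep 5
done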

netsh portproxy hangs multiple IP addresses

Windows Server 2008 R2, fully patched and updated.
I have 4 static IPs on a dedicated server. I will refer to them as follows:
x.x.x.x
y.y.y.y
z.z.z.z
a.a.a.a
x.x.x.x is the default external and internal IP address of the server.
All external IPs are the same as the internal IPs, all running on the same NIC.
x.x.x.x and y.y.y.y were running on port 80 for HTTP through IIS, with different host headers handling the destinations. That worked perfectly.
I recently added two new IP addresses, z.z.z.z and a.a.a.a for a different application that uses two ports, but we want external port 80 traffic to translate to the internal ports it is using.
We want incoming traffic to work as follows:
Incoming traffic on x.x.x.x:80 map to x.x.x.x:8080
Incoming traffic on y.y.y.y:80 map to y.y.y.y:8080
Incoming traffic on z.z.z.z:80 map to y.y.y.y:8088
Incoming traffic on a.a.a.a:80 map to y.y.y.y:8089
We changed the binding in IIS to only listen on the specific IPs and internal ports so that port 80 was only being listened to by netsh portproxy.
We have been able to accomplish this with 4 separate netsh portproxy rules and everything works great. All traffic to the two HTTP IPs works fine, and the traffic for the other two IPs to the other two internal ports get routed properly as well.
The problem is that everything works as expected, but occasionally something hangs, usually around 4 PM EST, and the websites are no longer available. There are no application pool or website crashes. Just ports no longer routing.
When it hangs, the easiest fix is to run "portproxy reset" and re-create the portproxy rules through a batch file, and everything works again.
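For reference, the reset/re-create step amounts to a batch file along these lines (a sketch; the placeholder addresses above stand in for the real IPs):
rem Clear all existing portproxy rules, then re-create the four mappings.
netsh interface portproxy reset
netsh interface portproxy add v4tov4 listenaddress=x.x.x.x listenport=80 connectaddress=x.x.x.x connectport=8080
netsh interface portproxy add v4tov4 listenaddress=y.y.y.y listenport=80 connectaddress=y.y.y.y connectport=8080
netsh interface portproxy add v4tov4 listenaddress=z.z.z.z listenport=80 connectaddress=y.y.y.y connectport=8088
netsh interface portproxy add v4tov4 listenaddress=a.a.a.a listenport=80 connectaddress=y.y.y.y connectport=8089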
I guess my question is whether there is some kind of idle timeout built into netsh portproxy, or possibly some type of buffer overrun protection.
Not a single log shows any faults.
The application pools in IIS were adjusted to recycle at shorter intervals (30 minutes) to prevent any long running worker processes in case that was the issue. Same result.
I can easily create a Windows service to check the port status at a very short interval and reset the specific portproxy rule, but this is not ideal, as there is still the potential for packet loss and unavailability of the services (even if for only a few seconds) requiring the HTTP requests to be re-sent.
Again, I should reiterate that everything works great until a certain point in the day, and that this has absolutely nothing to do with the Windows Firewall, as we get the exact same results with it on or off. There are no apparent DDoS or other types of attacks either. All separate websites and other applications still run on their internal ports when the portproxy hangs (i.e. accessing http://example.com:8080 still works without issue).
The point of failure is the netsh portproxy.
Has anyone experienced similar issues? I am considering adding a Fortinet hardware firewall that has this functionality built in, but I am wondering if that will handle it any better than what is already in place.

Can the internal IP addresses of azure worker role instances be swapped?

For example, if one is on 192.168.1.1 and the other on 192.168.1.2, can you configure the machines to each other's static IP addresses and thereby have them start receiving traffic for each other's InstanceInputEndpoints (since the Azure gateway should now route the InstanceInputEndpoint to the new owner of the IP address)?
No, you can't do that!
And, as of today (Dec. 2013), you are highly advised never to set a static IP address on your virtual machine inside Windows Azure! You should always use the default DHCP configuration. If you want IP address predictability, check out this blog post. You can still use Azure Virtual Network with web and worker roles and have IP address predictability.
If you use VMs, you should create several VMs and a virtual network, and define the address space you will use. When you create a VM, make sure it uses the network you created. If you forget to include the VM in the network, you need to recreate the VM.
Example of how the internal IPs change, using 3 VMs:
Server A is connected to the network and gets IP 192.168.0.1
Server B 192.168.0.2
Server C 192.168.0.3
Shut down all your servers from the Azure portal so that their status is Deallocated, then turn them on in this sequence:
Server B
Server C
Server A
The result will be:
Server A 192.168.0.3
Server B 192.168.0.1
Server C 192.168.0.2
If you turn off the VM from inside the VM, the internal IP won't change.
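For what it's worth, the same deallocate-and-start sequence sketched with the current az CLI (the resource group and VM names here are placeholders, and this assumes ARM-style VMs rather than the classic portal flow described above):
# Deallocate all three VMs so their dynamic internal IPs are released ...
az vm deallocate --resource-group my-rg --name ServerA
az vm deallocate --resource-group my-rg --name ServerB
az vm deallocate --resource-group my-rg --name ServerC
# ... then start them in the order that determines the new assignment.
az vm start --resource-group my-rg --name ServerB
az vm start --resource-group my-rg --name ServerC
az vm start --resource-group my-rg --name ServerA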
If you remote onto one of your VMs, you will see an XML file at
C:\config
The filename looks something like
[deployment id]_[role name]_[instance number].[version number]
Inside the file you will find all the instances in the deployment with their IP addresses. If you edit the IP address in this file for a particular role instance on a particular VM, that VM will think that the IP address for the instance is the one in the file and will start routing traffic to it.
Warning: I've never tried to do this programmatically. Also, the changes will get wiped out if there is any update to the deployment (either initiated by you or by Azure). And there might be some other horrible side effect.
