BGP Multipath in Cisco and Juniper from different ASes

I recently started looking into BGP load balancing. Specifically, I was wondering whether there is a way in BGP to load-balance over two paths that were advertised by two external BGP speakers in two distinct ASes.
The corresponding Cisco and Juniper documentation says that, if the feature is enabled, load balancing is applied when the decision process results in a tie, which happens only if the advertised routes come from the same external AS.
Could someone explain to me why a tie always implies that the advertisements came from the same AS, and whether it is possible to load-balance across two different ASes?
Thank you in advance.

The reason is that, in order to avoid potential routing loops, ECMP requires the "competing" IGP costs (the costs to reach each BGP next hop) to be equal; this is the "fish" problem, explained at https://routingfreak.wordpress.com/tag/traffic-engineering/. Because each AS, almost by definition, runs its own independent IGP, the IGP costs of the two ASes are not comparable (and would likely be unequal anyway). Also, each prefix will typically have a different AS-path length via each of the two ASes. It could still work in some cases, for example with static routes to reach the BGP next hops.
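To make the tie condition concrete, here is a minimal Python sketch of a heavily simplified best-path comparison. It is not Cisco's or Juniper's actual algorithm: it models only local preference, AS-path length, origin, MED and the IGP cost to the next hop, and the relax_neighbor_as parameter is a hypothetical stand-in for vendor knobs that drop the same-neighbor-AS requirement (Juniper's "multipath multiple-as" and Cisco's "bgp bestpath as-path multipath-relax"; check your platform's documentation before relying on them). It shows why two equally good routes from different neighbor ASes still yield a single path with default settings.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Route:
    next_hop: str
    as_path: Tuple[int, ...]   # first element is the neighboring AS
    local_pref: int = 100
    origin: int = 0            # 0 = IGP, 1 = EGP, 2 = incomplete (lower wins)
    med: int = 0               # compared unconditionally here, for simplicity
    igp_cost: int = 0          # IGP cost to reach the BGP next hop

    @property
    def neighbor_as(self) -> int:
        return self.as_path[0]


def preference_key(r: Route) -> tuple:
    """Smaller key = more preferred (simplified best-path order)."""
    return (-r.local_pref, len(r.as_path), r.origin, r.med, r.igp_cost)


def multipath_set(routes: List[Route], relax_neighbor_as: bool = False) -> List[Route]:
    """Return the best route plus every route that ties with it.

    By default a tying route must also come from the same neighboring AS;
    relax_neighbor_as (hypothetical) models the vendor knobs that lift that rule.
    """
    best = min(routes, key=preference_key)
    return [
        r for r in routes
        if preference_key(r) == preference_key(best)
        and (relax_neighbor_as or r.neighbor_as == best.neighbor_as)
    ]


if __name__ == "__main__":
    via_a = Route("192.0.2.1", (65001, 65100))
    via_b = Route("198.51.100.1", (65002, 65100))
    print(len(multipath_set([via_a, via_b])))                          # 1
    print(len(multipath_set([via_a, via_b], relax_neighbor_as=True)))  # 2
```

Note that even with the relaxation, the two routes only tie because the sketch gives them identical AS-path lengths and IGP costs, which is exactly what the answer says is unlikely between independent ASes.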

Related

Advanced HTTP/2 proxy for load balancing of distributed scraping solution

I have built a distributed HTTP scraping solution that, by design, uses different "exit addresses" to balance the network load.
The solution supports IPv4, IPv6 and HTTP proxies to route the traffic.
Each processor was responsible for choosing the most efficient route to balance the traffic; for prototyping, this was temporarily implemented by hand. The solution is now growing, and as the number of processors increases, so does the complexity of the load-balancing task, which is why I need a dedicated component for it.
I did fairly extensive research but have not found a solution for load-balancing traffic across IPv6 addresses, IPv4 addresses (thousands of local addresses) and public HTTP proxies. The solution needs to support weights, application-level response checks and cool-down periods.
Does anyone know of an existing solution to this problem before I start developing a custom one?
Thanks for your help!
If you search for "load balancing proxy", you will come across the Cache Array Routing Protocol (CARP). CARP might not be exactly what you are looking for, and there are servers dedicated solely to proxy caching, which I did not know about until now.
Nevertheless, those servers have their own load balancers too, and that may be a detail worth researching further.
I also found a presentation that describes CARP as an outstanding solution: https://cs.nyu.edu/artg/internet/Spring2004/lectures/lec_8b.pdf
An example, for proxy arrays in Netra Proxy Cache Server: https://docs.oracle.com/cd/E19957-01/805-3512-10/6j3bg665f/index.html
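CARP's core idea is that every proxy in the array gets a deterministic score for each URL and the request goes to the highest-scoring proxy, so a given URL consistently lands on the same proxy and removing a proxy only remaps the URLs that proxy owned. The minimal Python sketch below shows that highest-score selection using weighted rendezvous hashing. It is an illustration under assumptions, not the CARP specification: it uses SHA-256 instead of CARP's own hash function and the standard weighted-rendezvous formula in place of CARP's load-factor calculation, and the proxy names are hypothetical.

```python
import hashlib
import math
from typing import Dict


def _unit_hash(proxy: str, url: str) -> float:
    """Map (proxy, url) deterministically to a float strictly between 0 and 1."""
    digest = hashlib.sha256(f"{proxy}|{url}".encode()).digest()
    n = int.from_bytes(digest[:8], "big")
    return (n + 1) / (2**64 + 1)


def pick_proxy(url: str, weights: Dict[str, float]) -> str:
    """Weighted rendezvous hashing: the proxy with the highest score wins.

    A larger weight raises a proxy's score, so it receives a larger share of URLs.
    """
    return max(weights, key=lambda p: -weights[p] / math.log(_unit_hash(p, url)))


if __name__ == "__main__":
    proxies = {"proxy-a": 1.0, "proxy-b": 1.0, "proxy-c": 2.0}
    print(pick_proxy("http://example.com/page/1", proxies))
```

Because the choice depends only on the URL and the current set of proxies, the balancers need no shared state to agree on where a request goes.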
There are also several concepts for load balancing (https://link.springer.com/article/10.1023/A:1020943021842):
The three proposed methods can broadly be divided into centralized and decentralized approaches. The centralized history (CH) method makes use of the transfer rate of each request to decide which proxy can provide the fastest turnaround time for the next job. The route transfer pattern (RTP) method learns from the past history to build a virtual map of traffic flow conditions of the major routes on the Internet at different times of the day. The map information is then used to predict the best path for a request at a particular time of the day. The two methods require a central executive to collate information and route requests to proxies. Experimental results show that self-organization can be achieved (Tsui et al., 2001). The drawback of the centralized approach is that a bottleneck and a single point of failure is created by the central executive. The decentralized approach, the decentralized history (DH) method, attempts to overcome this problem by removing the central executive and put a decision maker in every proxy (Kaiser et al., 2000b) regarding whether it should fetch a requested object or forward the request to another proxy.
As you use public proxy servers, on which you cannot install a decision maker, you probably cannot use the decentralized history (DH) method and would use centralized history (CH) or the route transfer pattern (RTP) instead.
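As a rough illustration of the centralized history (CH) idea from the quoted paper, the sketch below keeps a per-proxy history of observed transfer rates and sends the next request to the proxy with the best recent rate. The exponentially weighted moving average, the alpha smoothing factor and the cool-down handling (a proxy that fails an application-level response check is skipped for cooldown_s seconds, one of the requirements in the question) are assumptions of this sketch, not details from the paper. Time is passed in explicitly so the logic is easy to test.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class ProxyStats:
    rate: float = 0.0            # smoothed transfer rate in bytes/s
    cooldown_until: float = 0.0  # proxy is skipped until this time


@dataclass
class CentralHistoryBalancer:
    alpha: float = 0.3           # weight given to the newest sample (assumed)
    cooldown_s: float = 60.0     # cool-down after a failed check (assumed)
    stats: Dict[str, ProxyStats] = field(default_factory=dict)

    def add_proxy(self, name: str, initial_rate: float) -> None:
        self.stats[name] = ProxyStats(rate=initial_rate)

    def record(self, name: str, nbytes: int, seconds: float, ok: bool, now: float) -> None:
        """Feed back the outcome of one request made through proxy `name`."""
        s = self.stats[name]
        if not ok:
            # Failed application-level response check: put the proxy on cool-down.
            s.cooldown_until = now + self.cooldown_s
            return
        sample = nbytes / seconds
        s.rate = self.alpha * sample + (1 - self.alpha) * s.rate

    def choose(self, now: float) -> Optional[str]:
        """Pick the available proxy with the best smoothed transfer rate."""
        available = {n: s for n, s in self.stats.items() if s.cooldown_until <= now}
        if not available:
            return None
        return max(available, key=lambda n: available[n].rate)
```

A single central instance of this class is exactly the "central executive" the paper warns about, so in practice it would need to be replicated or made restartable.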
Perhaps it would even be useful to replace your own solution completely, e.g. with this: https://github.blog/2018-08-08-glb-director-open-source-load-balancer/. I have no particular reason for choosing this example; it is just one I came across in search results.
As I do not work with proxy servers, this post is just a collection of findings, but perhaps there is a useful detail for you. If not, never mind; you probably know most or all of this already. Also, I am not recommending any concrete solution.
Have you checked out Traefik (https://Traefik.io)? It supports HTTP/2 and TCP load balancing, is open source and available on GitHub, and is built in Go. I now use it as my load-balancing reverse proxy for almost everything.
I also wrote a short blog post on Docker and Go in which I showcase the use of Traefik; it might also help in your search: https://marcofranssen.nl/docker-tips-and-tricks-for-your-go-projects/
You might find your answer in the Traefik code base, or you might decide to use Traefik to achieve your goal instead of a home-grown solution.
See here for a good explanation of the upcoming Traefik 2.0, which adds TCP support:
https://blog.containo.us/back-to-traefik-2-0-2f9aa17be305

bind zmq dealer to multiple repliers

I want to create a proxy server that routes incoming messages from REQ-type sockets to one of the REP sockets on one of the computers in a cluster. I have been reading the guide, and I think the proper structure is a combination of ROUTER and DEALER on the proxy server, where the ROUTER passes messages to the DEALER to be distributed. However, I cannot figure out how to create this connection scheme. Is this the correct architecture? If so, how do I bind a DEALER to multiple addresses? The flow I envision is REQ->ROUTER|DEALER->[REP, REP, ...], where each request is handled by exactly one REP socket.
NB: forget about packets; think in terms of "Behaviour". That's the key.
ZeroMQ is rather an abstraction layer for certain communication-behaviour patterns, so while terms like "socket" sound similar to what one has read about or used previously, the ZeroMQ world is very different in many respects.
This very formalism allows ZeroMQ Formal-Communication-Patterns to grow in scale and to be assembled into higher-order patterns (for load balancing, for fault tolerance, for performance scaling). Once you master this style of thinking, you forget about packets, thread-synchronization issues and I/O polling, and focus on your higher-abstraction design, on Behaviour, rather than on the underlying details. This frees your design from re-inventing the wheel and makes it very powerful, as you reuse highly professional tools suited to your problem-domain tasks.
DEALER->[REP,REP,...] Segment
That said, your DEALER node (in fact a ZMQ socket access node whose Behaviour is called "DEALER" to reflect its queueing/buffering style, its round-robin dispatcher and its send-out-and-expect-answer-in model) may .bind() to multiple local address:port pairs, and these "service points" may also operate over different transport classes, one working over tcp://, another over inproc://, if that makes sense for your design architecture. ZeroMQ lets you use this transparently, abstracted from all the "awful and dangerous" low-level nitty-gritty.
ZeroMQ also allows you to reverse .connect() / .bind()
In principle, where helpful, one may reverse .bind() and .connect(), so that the DEALER .connect()s to the known target address of each REP entity.
You leave out a couple of details that are important for determining the correct architecture.
When you say "from REQ type sockets to one of the REP sockets on one of the computers in a cluster", how do you determine which computer gets the message? Is it addressed to a specific computer? Does a computer announce its availability before it can receive a message? Does each message just get passed to the next one in line in a round-robin fashion? (if it's not the last one, you probably don't want a DEALER socket)
When you say "how do I bind a dealer to multiple addresses", it's not clear what you mean by "addresses"... Do you mean to say that the proxy has a unique IP address that it uses to communicate with each computer in the cluster? Or are you just wondering how to manage the connection to multiple different peers with the same socket? The former is a special case, the latter is simple.
I'm going to work with the following assumptions:
You want a worker computer from the cluster to announce its availability for work before it receives any work, and any computer in the cluster can handle any job. A faster worker, or a worker working on a smaller job, will not have to wait behind some slow worker to finish their job and get a new job first.
The proxy/broker uses a single ip interface to communicate with all workers.
If those are true, then what you want will be closer to this:
REQ->ROUTER|ROUTER->[REQ, REQ, ...]
A worker will create a request to the backend router socket to announce its availability, and await a reply with work. Once it is finished, it will create a new request with the finished work, which again announces its availability. The other half of the pattern you've already worked out.
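The difference between a DEALER's round-robin dispatch and this "worker announces READY" scheme can be seen without any sockets. The following pure-Python simulation involves no ZeroMQ at all; the job durations are hypothetical, all jobs are assumed to be queued at time zero, and network latency is ignored. It computes when the last job finishes under each policy.

```python
import heapq
from typing import List


def round_robin_makespan(jobs: List[float], n_workers: int) -> float:
    """DEALER-style: job i goes to worker i % n, whether or not that worker is busy."""
    free_at = [0.0] * n_workers
    for i, duration in enumerate(jobs):
        free_at[i % n_workers] += duration
    return max(free_at)


def ready_queue_makespan(jobs: List[float], n_workers: int) -> float:
    """ROUTER-style: the next job goes to the worker that announced READY first."""
    ready = [(0.0, w) for w in range(n_workers)]   # (time worker became ready, worker id)
    heapq.heapify(ready)
    finish = 0.0
    for duration in jobs:
        t, w = heapq.heappop(ready)
        done = t + duration
        finish = max(finish, done)
        heapq.heappush(ready, (done, w))           # worker re-announces READY when done
    return finish


if __name__ == "__main__":
    jobs = [10, 1, 1, 1, 1, 1]
    print(round_robin_makespan(jobs, 2))   # 12.0
    print(ready_queue_makespan(jobs, 2))   # 10.0
```

With one long job at the head of the queue, round-robin keeps handing work to the busy worker, while the ready queue lets the idle worker drain the short jobs, which is the behaviour assumed in the list above.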
This is the Simple Pirate Pattern from the ZMQ guide. It's a good place to start, but it's not very robust. This is in the Reliable Request-Reply Patterns section of the guide, and I suggest you read or reread that section carefully as it will guide you well. In particular, they keep refining this pattern into more and more reliable implementations and wind up with the Majordomo pattern, which is very robust and fault tolerant. You should see if you need all the features that provides or if you can scale it back a little. Either way, you should learn and understand what these patterns are doing and why before you make the choice to do something different.

Autodiscovery in P2P Applications

I want to create a P2P application on the internet. What is the best way (or, if none exists, a good-enough way) to do auto-discovery of other nodes in a decentralized network?
Grothoff and GauthierDickey from the GNUnet project (an anonymous, censorship-resistant file-sharing network) researched the question of bootstrapping a P2P network without any central host list.
They found that, for the Gnutella (LimeWire) network, a random IP search needed on average 2500 connection attempts to find a peer.
In the paper they proposed a method that reduced the required connection attempts to 817 for Gnutella and 51 for the eD2k (eDonkey) network.
This was achieved by creating a statistical profile of P2P users for every DNS organization; this small (around 100 kB) discovery database has to be created in advance and shipped with the P2P client.
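A quick way to read these numbers, assuming each probe independently succeeds with the same probability p (a simplification for illustration, not a claim from the paper): the number of attempts until the first hit is then geometrically distributed with mean 1/p, so the Gnutella figures correspond to roughly the following hit rates and improvement.

```latex
\mathbb{E}[\text{attempts}] = \frac{1}{p}
\quad\Rightarrow\quad
p_{\text{random}} \approx \frac{1}{2500} = 0.0004,
\qquad
p_{\text{profile}} \approx \frac{1}{817} \approx 0.00122,
\qquad
\frac{2500}{817} \approx 3.06
```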
This is the holy grail of P2P. There isn't really a magic solution: there is no way a node can discover other nodes without a well-known point to act as a reference (you can do so on a LAN by using broadcasting, but not on the internet). P2P file sharing tends to work by having known websites distribute 'start points' for discovery; further discovery (I would expect) can then come from asking nodes which other nodes they know about.
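The "start points, then ask nodes whom they know" approach is essentially a breadth-first crawl of the peer graph. The sketch below simulates it in memory: the network dictionary stands in for real "get peers" requests, and the addresses and the max_peers limit are hypothetical.

```python
from collections import deque
from typing import Dict, Iterable, List, Set


def discover(bootstrap: Iterable[str], network: Dict[str, List[str]],
             max_peers: int = 50) -> Set[str]:
    """Breadth-first peer discovery starting from well-known bootstrap nodes."""
    known: Set[str] = set()
    queue = deque(bootstrap)
    while queue and len(known) < max_peers:
        peer = queue.popleft()
        if peer in known:
            continue
        neighbours = network.get(peer)
        if neighbours is None:
            # Unreachable peer (in reality: a timeout or refused connection).
            continue
        known.add(peer)
        # Ask the peer which other peers it knows about and queue the new ones.
        queue.extend(n for n in neighbours if n not in known)
    return known
```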
A good place to start on research would be Distributed Hash Tables.
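To give a flavour of how a DHT locates data without a central index: in Kademlia, one widely used DHT design, node IDs and keys share one ID space, the distance between two IDs is their bitwise XOR, and a key is stored on the nodes whose IDs are closest to it. The minimal sketch below shows only that closeness rule, with small integer IDs for readability; real Kademlia uses 160-bit IDs and finds the closest nodes iteratively by querying peers rather than from a full list.

```python
from typing import Iterable, List


def xor_distance(a: int, b: int) -> int:
    """Kademlia's distance metric between two IDs."""
    return a ^ b


def closest_nodes(key: int, node_ids: Iterable[int], k: int = 3) -> List[int]:
    """Return the k node IDs closest to `key` under the XOR metric."""
    return sorted(node_ids, key=lambda n: xor_distance(n, key))[:k]


if __name__ == "__main__":
    print(closest_nodes(10, [1, 8, 11, 14, 3]))   # [11, 8, 14]
```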
As for security (non-existent or malicious nodes), that topic will be covered in the literature somewhere, I should think; I would recommend Wikipedia as a starting point. Non-existent nodes are trivially dealt with: if you can't contact an IP/port, don't keep it on your list, and if a node regularly provides non-existent pointers, consider de-prioritising it or removing it from your list entirely.
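The bookkeeping described here (drop peers you cannot contact, de-prioritise and eventually drop peers that keep handing out dead pointers) fits in a small peer table. The class name and the max_bad_referrals threshold below are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class PeerTable:
    max_bad_referrals: int = 3                      # hypothetical threshold
    bad_referrals: Dict[str, int] = field(default_factory=dict)

    def add(self, peer: str) -> None:
        self.bad_referrals.setdefault(peer, 0)

    def unreachable(self, peer: str, referred_by: str = "") -> None:
        """Forget a dead address and count it against whoever handed it to us."""
        self.bad_referrals.pop(peer, None)
        if referred_by in self.bad_referrals:
            self.bad_referrals[referred_by] += 1
            if self.bad_referrals[referred_by] >= self.max_bad_referrals:
                del self.bad_referrals[referred_by]   # drop repeat offenders entirely

    def by_priority(self) -> List[str]:
        """Peers with the fewest bad referrals first."""
        return sorted(self.bad_referrals, key=self.bad_referrals.get)
```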
For evil nodes, it depends on your use case, but let's say you are doing file sharing. If you request a section of a file, check with several nodes what the file section's hash should be, and then request by hash. If the evil node gives you a chunk that has a different hash, then you can again de-prioritise or forget that node.
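A minimal sketch of "agree on the hash first, then verify the chunk", assuming SHA-256 as the hash and a strict majority vote among the nodes you asked; both are choices made for this illustration, not requirements from the answer.

```python
import hashlib
from collections import Counter
from typing import Dict, Optional


def agreed_hash(reported: Dict[str, str]) -> Optional[str]:
    """Strict-majority vote over the hashes reported by several nodes."""
    if not reported:
        return None
    value, votes = Counter(reported.values()).most_common(1)[0]
    return value if votes > len(reported) // 2 else None


def verify_chunk(chunk: bytes, expected_hex: str) -> bool:
    """True if the downloaded chunk matches the agreed SHA-256 digest."""
    return hashlib.sha256(chunk).hexdigest() == expected_hex
```

Nodes whose reported hash disagrees with the majority, or that serve a chunk failing verify_chunk, are the ones to de-prioritise or forget.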
Distributed processing systems work a little differently: they tend to ask several unrelated nodes to perform the same work, and then they use a voting system (probably using hashing again) to determine whether evilness is at hand. If a node provides consistently bad results, the administrator is contacted or the IP is removed from the known nodes list.
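The voting step for redundant computation can be sketched the same way: hash each node's result, take the majority, and flag the nodes that disagree. The node names, the strict-majority quorum and the use of SHA-256 over the serialized result are assumptions of this sketch.

```python
import hashlib
from collections import Counter
from typing import Dict, List, Optional, Tuple


def vote(results: Dict[str, bytes]) -> Tuple[Optional[bytes], List[str]]:
    """Return (majority result, nodes that disagreed), or (None, []) without a majority."""
    digests = {node: hashlib.sha256(r).hexdigest() for node, r in results.items()}
    if not digests:
        return None, []
    winner, votes = Counter(digests.values()).most_common(1)[0]
    if votes <= len(digests) // 2:
        return None, []
    consensus = next(results[n] for n, d in digests.items() if d == winner)
    dissenters = sorted(n for n, d in digests.items() if d != winner)
    return consensus, dissenters
```

A node that keeps showing up in the dissenters list is the candidate for removal from the known-nodes list.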
OK, for two peers to find each other, they both have to know a common mediator, let's say, to exchange IPs once. You can use anything for this first handshake as long as both peers can WRITE to and READ from that "channel", e.g. DNS (your well-known domains), e-mail, IRC, Twitter, Facebook, Dropbox, etc.
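The mediator idea reduces to a shared key/value "channel" that both peers can write to and read from once. In the sketch below an in-memory dictionary stands in for whatever channel you actually use (DNS, e-mail, a shared file); the class and method names and the addresses are hypothetical.

```python
from typing import Dict, Optional


class Rendezvous:
    """Stand-in for any channel both peers can WRITE to and READ from."""

    def __init__(self) -> None:
        self._board: Dict[str, str] = {}

    def publish(self, peer_id: str, address: str) -> None:
        self._board[peer_id] = address

    def lookup(self, peer_id: str) -> Optional[str]:
        return self._board.get(peer_id)


channel = Rendezvous()
channel.publish("alice", "203.0.113.5:4000")
channel.publish("bob", "198.51.100.7:4000")
# Each side reads the other's address once, then connects directly.
print(channel.lookup("bob"))   # 198.51.100.7:4000
```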

Advanced: Link aggregation, MPIO, iSCSI MC/S

I am trying to find the proper way of accomplishing the following.
I would like to provide 2Gb/s access for clients accessing a file-server guest VM on an ESXi server, which itself accesses its datastore over iSCSI. Therefore, the ESXi server needs a 2Gbps connection to the NAS. I would also like to provide 2Gbps directly on the NAS.
It looks like there are three technologies that can help: link aggregation (802.3ad, LAG, trunk), multipath I/O (MPIO), and iSCSI multiple connections per session (MC/S).
However, each has its own purpose and drawbacks. Aggregation provides 2Gbps in total, but a single connection (I think the link is chosen based on source/destination MAC address) can only get 1Gbps, which I think makes it useless for iSCSI, for example, since that is a single stream. MPIO seems a good option for iSCSI, as it balances traffic over two connections; however, it seems to require 2 IPs on the source and 2 IPs on the destination. I am unsure about MC/S.
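The "a single connection only gets 1Gbps" intuition follows from how link aggregation usually distributes traffic: the switch or bond hashes header fields of each frame and sends every frame of a given flow down the same member link, so frames are not reordered. A simplified sketch, assuming a CRC32 hash over the source and destination MAC addresses; real devices use vendor-specific hashes and may include IP addresses or ports, and the MAC addresses here are hypothetical.

```python
import zlib
from typing import List, Set, Tuple


def member_link(src_mac: str, dst_mac: str, n_links: int) -> int:
    """Pick a LAG member link from a hash of the MAC pair."""
    return zlib.crc32(f"{src_mac}-{dst_mac}".encode()) % n_links


def links_used(flows: List[Tuple[str, str]], n_links: int = 2) -> Set[int]:
    """Set of member links actually carrying the given frames."""
    return {member_link(s, d, n_links) for s, d in flows}


if __name__ == "__main__":
    esx, nas = "00:50:56:aa:bb:01", "00:11:32:cc:dd:02"
    # 1000 frames of one ESXi-to-NAS session: all hash to the same link.
    print(links_used([(esx, nas)] * 1000))
```

However many frames a single ESXi-to-NAS iSCSI session sends, they share one MAC pair and therefore one 1Gbps link; only several distinct flows can spread across both links.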
Here is what I would like to achieve, although I am not sure which technology to use on each pair of 1Gbps NICs.
I also think this design is flawed, because link aggregation between the NAS and the switch would prevent me from using MPIO on the ESX host: MPIO also requires 2 IPs on the NAS, and I think link aggregation will give me a single IP.
Maybe using MC/S instead of MPIO would work?
Here is a diagram:
If you want to achieve 2Gbps to a VM in ESX, it is possible using MPIO and iSCSI, but as you say, you will need two adapters on the ESX host and two on the NAS. The drawback is that your NAS will need to support multiple connections from the same initiator; not all of them do. The path policy will need to be set to round-robin so that you can use active-active connections. To get ESX to use both paths roughly equally (about 50% each), you will need to change the round-robin setting so that it switches paths after every 1 I/O operation instead of the default 1000. You can do this by SSHing to the host and using esxcli (if you need full instructions on how to do that, I can provide them).
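Why the switching threshold matters can be seen with a simplified model of the round-robin path policy: the host stays on one path for iops_limit I/Os before moving to the next. The sketch below is a model of the behaviour described above, not VMware code, and the I/O counts are hypothetical; it counts how many I/Os land on each path.

```python
from collections import Counter
from typing import Dict


def path_for_io(io_index: int, iops_limit: int, n_paths: int) -> int:
    """Path used for the io_index-th I/O when switching every iops_limit I/Os."""
    return (io_index // iops_limit) % n_paths


def distribution(n_ios: int, iops_limit: int, n_paths: int = 2) -> Dict[int, int]:
    """Number of I/Os sent down each path."""
    return dict(Counter(path_for_io(i, iops_limit, n_paths) for i in range(n_ios)))


if __name__ == "__main__":
    print(distribution(800, 1000))   # {0: 800}
    print(distribution(800, 1))      # {0: 400, 1: 400}
```

In this model both settings split a long run of I/O evenly, but with the default limit a burst of fewer than 1000 I/Os sits entirely on one 1Gbps path, whereas a limit of 1 keeps both paths busy at the same time.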
After this, you should be able to run IOMeter on a VM and see a data rate above 1Gbps: maybe 150MB/s with a 1500-byte MTU, and around 200MB/s if you are using jumbo frames.
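For context on those figures, the raw line rate of two 1Gbps links converts to bytes per second as follows; the 150MB/s and 200MB/s numbers are the answer's estimates, and the percentages simply relate them to that ceiling. Ethernet, IP, TCP and iSCSI header overhead accounts for part of the gap, and that overhead is proportionally larger with a 1500-byte MTU than with jumbo frames.

```latex
2 \times 1\,\text{Gb/s} = \frac{2 \times 10^{9}\,\text{bit/s}}{8\,\text{bit/B}} = 250\,\text{MB/s},
\qquad
\frac{150}{250} = 60\%,
\qquad
\frac{200}{250} = 80\%
```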
On another note (which might prove useful for your setups in the future), it is possible to achieve 2Gbps with two adapters on the source and a bonded adapter on the NAS (so 2 → 1) when using the MPIO iSCSI initiator that comes with Windows Server 2008. This initiator works slightly differently from VMware's and does not require your NAS to support many connections from one initiator; from what I can tell, it spawns multiple initiators instead of sessions.

Pinging a computer through a specific route

I have a network of computers connected in the form of a graph.
I want to ping from one computer (A) to another computer (B). A and B are connected to each other in many different ways, but I want the ping to travel only along particular edges. Information about which edges to follow is available at both A and B.
How should I do this?
You could source-route the ping, but the reply would choose its own path.
Furthermore, source-routed packets are often filtered due to security concerns. (Not always, they are useful and sometimes even required at edge routers.)
If the machines are under your local administrative control, then you could ensure that source-routed packets are permitted. As long as you are able to start a daemon on machine B, you could also easily enough design your own ping protocol that generates source-routed echo returns.
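A custom echo protocol of the kind suggested here can carry its route in the packet: A puts the list of hops into the request, each hop forwards to the next entry, and B answers along the reversed list so that the reply uses the same edges. The sketch below simulates that on an in-memory graph; it sends no real ICMP, and the graph, node names and the walk and send_echo functions are hypothetical.

```python
from typing import Dict, List, Set


def walk(route: List[str], graph: Dict[str, Set[str]]) -> List[str]:
    """Simulate hop-by-hop forwarding; raise if the route uses a missing edge."""
    for here, nxt in zip(route, route[1:]):
        if nxt not in graph.get(here, set()):
            raise ValueError(f"no edge {here} -> {nxt}")
    return route


def send_echo(route: List[str], graph: Dict[str, Set[str]]) -> List[str]:
    """Return the full path taken by the request and the source-routed reply."""
    request_path = walk(route, graph)
    reply_path = walk(list(reversed(route)), graph)   # B replies along the same edges
    return request_path + reply_path[1:]


if __name__ == "__main__":
    g = {"A": {"X", "Y"}, "X": {"A", "B"}, "Y": {"A", "B"}, "B": {"X", "Y"}}
    print(send_echo(["A", "Y", "B"], g))   # ['A', 'Y', 'B', 'Y', 'A']
```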
Well, path selection is actually done by the routing protocols configured on the devices between the computers (routers, I expect). I don't think there is a way to say "use that specific route". The routers run routing protocols (OSPF, EIGRP, RIPv2), and they do the load balancing. The only way to be sure of one specific route is to use static routing, but that is configured on the routers rather than chosen dynamically by your computer.
This is normal because, if you were able to choose a route, it would be quite easy to mount a DoS attack that kills one particular route.
