Building a small 4 node cluster - few quick questions about networking

I'm putting together a small 4-node cluster on which I'm going to run Storm, and I have a few questions about the networking side of things. First, all the computers are equipped with gigabit Ethernet, but the hub I currently have only goes up to 100 megabits. Should I upgrade my hub, or will the performance gain be negligible? Second, I read on a few sites that a hub is not the best piece of hardware to use and that a switch would be better for my purposes. I'm trying to use Storm to have one machine pull data down from the internet and then pass it off to the others for processing. Would a switch or a hub be more useful? Thanks for all your help, folks.

A router can allow for serious networking capabilities, but it's also often overkill. With only 4 machines you're much more likely to want a gigabit switch instead. These are often sold in stores under the name "Gigabit Router", which is technically a lie, as the device is usually a bridge (a hub or a switch; networking has a lot of overloaded names). If you have difficulty telling the two apart from marketing names alone, note that routers are many times more expensive than switches. A hub, on the other hand, is essentially a dumb switch with fewer capabilities: it repeats every frame to every port, so the machines share the bandwidth, which brings speed penalties in high data flow situations.
Whether you need to upgrade depends on where your bottleneck is. Is the data you're sending large? Do your cluster computers spend a lot of time computing instead of receiving data? First determine whether your network speed will be your bottleneck, then decide whether you should upgrade that bottleneck. If you're worried about network speed but aren't 100% sure it will be a bottleneck, a cheap 1 gigabit switch won't cost you much and will almost certainly meet your needs.
Also note that if your data first needs to come over the internet (it isn't generated on your side of the network), your bottleneck will almost certainly be your internet connection before your local network.
So essentially, profile your problem before making a choice.
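As a rough starting point for that profiling, a back-of-the-envelope estimate can show whether the link or the computation dominates. This is only a sketch: the batch size, the compute time per batch and the 0.9 efficiency factor are made-up assumptions, not measurements from a Storm cluster, and the function names are hypothetical.

```python
def transfer_seconds(payload_bytes, link_mbps, efficiency=0.9):
    """Estimate the time to move payload_bytes over a link rated at link_mbps.

    efficiency is an assumed fudge factor for protocol overhead,
    not a measured value.
    """
    usable_bits_per_second = link_mbps * 1_000_000 * efficiency
    return payload_bytes * 8 / usable_bits_per_second


def bottleneck(payload_bytes, compute_seconds, link_mbps, efficiency=0.9):
    """Return 'network' if moving a batch takes longer than processing it."""
    network_seconds = transfer_seconds(payload_bytes, link_mbps, efficiency)
    return "network" if network_seconds > compute_seconds else "compute"


if __name__ == "__main__":
    batch_bytes = 50 * 1024 * 1024   # hypothetical 50 MiB batch per worker
    compute_seconds = 2.0            # hypothetical processing time per batch
    for link_mbps in (100, 1000):
        seconds = transfer_seconds(batch_bytes, link_mbps)
        verdict = bottleneck(batch_bytes, compute_seconds, link_mbps)
        print(f"{link_mbps} Mbps: {seconds:.2f} s per batch, bottleneck: {verdict}")
```

Replace the assumed numbers with measured ones (and with your internet downlink speed for the machine that pulls the data) before drawing any conclusion.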

Related

What are the advantages of using multiple ports in a game?

What are the advantages of using multiple ports in a game? I understand why some would use a combination of TCP and UDP for different purposes, but why do some games use multiple TCP or UDP ports? Is there any advantage to this? I am asking because I find myself writing networking code for my game and I wonder why others go out of their way to have multiple ports.
For example, GTA V uses 5 UDP ports and Assassin's Creed Revelations uses 4 TCP and 4 UDP ports.
There is always a reason.
Quite often they are not (entirely) technical. For instance one team is working on the inter-game chat functionality while another is working on the server-client protocol for game X. Then they are integrated into the same product, but nobody bothers unifying the protocols due to costs, time constraints, concerns regarding future maintainability etc.
There are also purely technical reasons:
If the game server and the chat server run in different locations, it is natural to use multiple connections; the alternative is to use a sort of reverse-NAT box on the server side, but that's risky since it's a bottleneck and a single point of failure.
Stability: if the chat server crashes or malfunctions you don't want it to also bring down the stream between the client and the game server, so it's safer to communicate on parallel connections.
Overall it's a classic example of theory meets practice: it would be great to use a single port (and connection), but it is more practical to separate the different transmissions and interactions for a variety of reasons.
A bit off topic: have you noticed how many connections are opened when accessing a web page nowadays? It's often several dozen. It would probably be an order of magnitude less without all the ads, though. Anyway, compared to that, a game opening 5 connections is nothing.

Software Routing

"Commercial software routers from companies such as Vyatta can typically only attain transfer data at speeds of up to three gigabits per second. That isn’t fast enough to take advantage of the full speed of a typical network card, which operates at 10 gigabits per second." [1]
How is the speed of the network interface card relevant in this scenario? Aren't software routers connecting multiple Virtual Machines running on the same physical host? [2] Unless a PC has multiple network interface cards, it is unlikely that it functions as a packet switch between different physical hosts.
My interpretation suggests that there seem to exist two different kinds of software routing: (1) Embedding a real time operating system on an actual router. (2) Writing application layer code on a PC that can handle packets being transmitted between different virtual machines running on that very PC. Is this correct?
It depends on what your router is doing. If it's literally just looking at a static route table and forwarding packets out another interface, there isn't much hit in performance.
It's when you get into things like NAT, crypto, QoS, SPI... that you will see performance degradation. Hardware vendors usually use custom silicon to process the more advanced features, which allows for higher-throughput packet forwarding.
Now that merchant silicon is fast enough and the open source applications are getting better, the performance gap is closing.
It really depends on your use case as far as what you want to use. I've gone with both and not seen performance hits, but the software versions weren't handling high throughput workloads.
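To make the cheap case concrete (just looking at a static route table and forwarding out another interface), here is a minimal sketch of a longest-prefix-match lookup. It is only an illustration using Python's standard ipaddress module, not how Vyatta or any real router implements forwarding; the class name, routes and interface names are hypothetical.

```python
import ipaddress


class StaticRouteTable:
    """Toy static route table with longest-prefix-match lookup."""

    def __init__(self):
        self._routes = []  # list of (network, interface name)

    def add_route(self, cidr, interface):
        self._routes.append((ipaddress.ip_network(cidr), interface))
        # Keep the most specific (longest) prefixes first so that the
        # first match found in lookup() is the longest-prefix match.
        self._routes.sort(key=lambda route: route[0].prefixlen, reverse=True)

    def lookup(self, destination):
        """Return the outgoing interface for destination, or None."""
        address = ipaddress.ip_address(destination)
        for network, interface in self._routes:
            if address in network:
                return interface
        return None


if __name__ == "__main__":
    table = StaticRouteTable()
    table.add_route("0.0.0.0/0", "eth0")    # default route
    table.add_route("10.0.0.0/8", "eth1")
    table.add_route("10.1.0.0/16", "eth2")
    for destination in ("10.1.2.3", "10.2.0.1", "8.8.8.8"):
        print(destination, "->", table.lookup(destination))
```

The per-packet work here is a lookup and nothing else. NAT, crypto and stateful inspection add per-packet or per-flow state on top of this, which is where the extra cost comes from.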
Performance of the link from the virtual network to the physical eventually becomes important at any reasonable scale. You're right that, within the same physical host, things can be pretty quick, but that requires that one can get everything needed in one box.
While merchant silicon has come a long way in improving the performance of networking equipment, greater gains are being made in getting CPUs to handle networking tasks better. Both AMD and Intel have improved their architectures to the point where 10 Gbps forwarding is a reality. Intel has developed a specialized library (DPDK) that takes care of a lot of low-level networking functions at high performance.

Can a million New York city devices be programmed for true peer-to-peer? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 8 years ago.
If Chris and Pat want to exchange a text message, they send and receive via their network providers, which charge them for a connection.
If Chris and Pat are both located in New York City, and there are enough wireless devices between Chris and Pat all close enough to each other to form a continuous chain, is it possible for all those devices to be programmed to cooperatively forward packets amongst each other, bypassing the need for network providers?
It would seem the "address" of each device would have to include current geographic coordinates, and devices would have to report their movements frequently enough so routing attempts could still find them, but the speed and capacity of devices nowadays could handle that, right?
Would such a network be viable? Does it already exist or has it been attempted? Is there some kind of inherent programming problem that is difficult to overcome?
There are a few interesting things here:
Reachability. At the very least, you need a technology that can do ad-hoc and peer-to-peer networking. Of those technologies, only Bluetooth, NFC and Wi-Fi are commonly implemented. Of those, only Wi-Fi currently has the strength to connect to devices in other houses or on the street, but even there typical ranges are 30-60m (and that's for APs; it might be lower for UEs).
Mobility. ANY short-range wireless communication protocol has difficulties with fast-moving devices. It's simple math: suppose your coverage is 50m in diameter. If you move at about 20km/h, or 5.5m/s, you have less than 10s to actually detect, connect and send data while passing through this link. Oh, but then we did not consider receiving traffic: you actually have to let all devices know that for the next 10s you want to receive data via this access network. To give an example, Wi-Fi connection setup with decent authentication (which you need for something like this) alone takes a few seconds. 10s might be doable, but as soon as we talk about cars, trains, ... it becomes almost impossible with current technology. But then again, if you can't connect to those, what are the odds you will cross some huge boulevards with your limited reachability?
Hop-to-hop delays. You need a lot of hops. We can fairly assume that you need at least one hop every 20-30m; let's average that at 40 hops/km. So to send a packet over, let's say, 5km you'd need 200 hops. Each hop needs to take in a packet (L2 processing), route it (L3 processing) and send it out again (L2 processing). While mobile devices are relatively powerful these days, I wouldn't assume they can handle that in the microseconds routers do. On top of that, in a wireless network you have to wait for a transmission slot, which can take on the order of milliseconds (at each hop!). So all in all, odds are huge that this would be a terribly slow network.
Loss. Well, this depends a bit on the wireless protocol: either it has its own reliable delivery protocol (which will make the previous point worse) or it doesn't. In the latter case, suppose each wireless link has about 0.1% loss, or 99.9% no-loss. This would end up as an 18.1% loss rate over the 200 hops considered previously ((1-0.999**200)*100). That is nearly impossible to work with in day-to-day communications.
Routing. Let's say you need a few million devices and thus routes. For traditional routing this usually takes some very heavy multicore routers with loads of processing power. Let's just say mobile devices (today) can't cut that yet. A purely geographically based routing mechanism might work, but I can't personally think of any (even theoretical) system for this that works today. You still have to distribute those routes, deal with (VERY) frequent route updates, avoid routing loops, and so on. So even with that I'd guess you'd hit the same scale issues as with, for example, OSPF. But all in all I think this is something that mobile devices will be able to handle in the not-so-far future; we're just talking about computing capacity here.
There are some other points why such a network is very hard today, but these are the major ones I know of. Is it impossible? No, of course not, but I just wanted to show why I think it is almost impossible with the current technologies and would require some very significant improvements, not just building the network.
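The arithmetic behind the mobility, hop-count and loss points above can be reproduced with a short sketch. The inputs (50m coverage, 20km/h, 40 hops/km, 5km, 0.1% loss per hop) are the figures from this answer; the 1 ms per-hop wait is only an assumed placeholder for "on the order of milliseconds", and the function names are hypothetical.

```python
def contact_window_seconds(coverage_m, speed_kmh):
    """Time spent inside one coverage area while crossing its full diameter."""
    speed_m_per_s = speed_kmh * 1000 / 3600
    return coverage_m / speed_m_per_s


def hops_needed(distance_km, hops_per_km=40):
    """Number of relays along the path at the assumed hop density."""
    return round(distance_km * hops_per_km)


def end_to_end_loss(per_hop_loss, hops):
    """Probability that a packet is lost somewhere along the chain,
    assuming independent loss on every hop and no retransmission."""
    return 1 - (1 - per_hop_loss) ** hops


if __name__ == "__main__":
    hops = hops_needed(5)
    per_hop_wait_ms = 1.0  # assumption: one transmission-slot wait per hop
    print(f"contact window: {contact_window_seconds(50, 20):.1f} s")
    print(f"hops over 5 km: {hops}")
    print(f"end-to-end loss: {end_to_end_loss(0.001, hops) * 100:.1f} %")
    print(f"added delay at {per_hop_wait_ms} ms/hop: {hops * per_hop_wait_ms:.0f} ms")
```

Changing the per-hop loss or the hop density shows how quickly the end-to-end figures degrade as the chain gets longer.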
If everyone has a device with sufficient receive/process/send capabilities, then backbones (ISPs) aren't really necessary. Start at mesh networking to find the huge web of implementations, devices, projects, etc., that have already been in development. The early ARPANET was essentially true peer-to-peer, but the number of net nodes grew faster than the nodes' individual capabilities, hence the growth of backbones and those damn fees everyone's paying to phone and cable companies.
Eventually someone will realize there are a million teenagers in NYC that would be happy to text and email each other for free. They'll create a 99-cent download to let everyone turn their phones and laptops and discarded devices into routers and repeaters, and it'll go viral.
Someday household rooftop repeaters might become as common as TV antennas used to be.
Please check: Wireless sensor network
A wireless sensor network (WSN) of spatially distributed autonomous sensors to monitor physical or environmental conditions, such as temperature, sound, pressure, etc. and to cooperatively pass their data through the network to a main location

How to avoid crashing my user's router?

It appears that cheap consumer routers are fairly easy to crash: hanging around in various backup/sync software forums, I see this mentioned from time to time. Developers seem to be putting a fair amount of effort into making sure they don't crash the routers.
What are the "do"s and "don't"s for my network-heavy application to ensure that it doesn't cause issues with badly designed routers? Especially one that intends to connect to a number of peers?
IMO, trying to work around bad hardware is the road to nowhere, because every router fails in its own remarkable way :).
What you can do in a network-heavy application is assume that the network is not a stable medium (routers can crash, etc.) and design the application's network operations accordingly.
For instance, provide reconnect logic, connection timeouts, and some sort of state caching so that users can keep working with the app even when network connectivity is gone.
As for faulty routers, they usually crash because of a large number of simultaneous connections (e.g. downloading via BitTorrent or another P2P protocol). So keeping the number of connections to a minimum can help.
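A minimal sketch of two of those ideas, reconnecting with a timeout and capping the number of simultaneous connections, could look like this. The retry counts, delays and the cap are arbitrary assumptions rather than values known to be safe for any particular router, and the names connect_with_retry and ConnectionLimiter are hypothetical.

```python
import socket
import threading
import time


def connect_with_retry(host, port, *, attempts=5, timeout=5.0,
                       base_delay=0.5, max_delay=30.0,
                       connect=socket.create_connection, sleep=time.sleep):
    """Open a TCP connection, retrying with exponential backoff.

    connect and sleep are injectable so the logic can be exercised
    without touching the network.
    """
    delay = base_delay
    for attempt in range(1, attempts + 1):
        try:
            return connect((host, port), timeout=timeout)
        except OSError:
            if attempt == attempts:
                raise  # give up and let the caller fall back to cached state
            sleep(delay)
            delay = min(delay * 2, max_delay)


class ConnectionLimiter:
    """Caps the number of connections the application holds open at once."""

    def __init__(self, max_open):
        self._slots = threading.BoundedSemaphore(max_open)

    def try_acquire(self):
        """Return True if a new connection may be opened right now."""
        return self._slots.acquire(blocking=False)

    def release(self):
        """Call after closing a connection to free its slot."""
        self._slots.release()
```

A peer-to-peer client would call try_acquire() before dialing a new peer and release() after closing the socket, so the total stays under whatever cap you choose.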

Lots of ports with little data, or one port with lots of data?

I've been checking out using a system called ROS (http://www.ros.org) for some work.
There are lots of different types of data that get sent between network nodes in ROS.
You define a struct of data that you want to send in a message, and ROS will handle opening a specific port between the two nodes that will only send that struct of data.
So if there are 5 different messages, there will be 5 different ports.
As opposed to this scenario, I have seen other platforms that just push all the different messages across one port. This means that there needs to be a sort of multiplexing/demultiplexing (done by some sort of message parsing on the receivers end).
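To illustrate what that single-port multiplexing/demultiplexing might involve, here is a minimal sketch of a type-tagged, length-prefixed framing scheme. It is not how ROS or any particular platform does it; the header layout, the message type numbers and the function names are assumptions made for the example.

```python
import struct

# Assumed header layout: message type (uint16) + payload length (uint32),
# both in network byte order.
HEADER = struct.Struct("!HI")


def encode(msg_type, payload):
    """Prefix payload with its type and length so it can share one stream."""
    return HEADER.pack(msg_type, len(payload)) + payload


def decode_stream(buffer):
    """Split buffer into complete (msg_type, payload) pairs.

    Returns (messages, remaining), where remaining holds the bytes of a
    trailing partial message that has not fully arrived yet.
    """
    messages = []
    offset = 0
    while len(buffer) - offset >= HEADER.size:
        msg_type, length = HEADER.unpack_from(buffer, offset)
        end = offset + HEADER.size + length
        if end > len(buffer):
            break  # payload incomplete, wait for more data
        messages.append((msg_type, buffer[offset + HEADER.size:end]))
        offset = end
    return messages, buffer[offset:]


def dispatch(messages, handlers):
    """Demultiplex: hand each payload to the handler for its type."""
    for msg_type, payload in messages:
        handlers[msg_type](payload)
```

The per-message cost of this approach is one small header parse and a dictionary lookup on the receiving side.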
What I wonder is... which is better from a performance perspective?
Do operating systems switch based on ports quickly, so that a system like ROS doesn't have to do much work to figure out what is in the message and interpret it?
OR
Is opening lots of ports going to mean lots of slower kernel calls, so that the cost of working out and translating message types ends up being less than the time spent switching between ports?
When this scales to a large amount of data at high rates and lots of different messages types there will be lots of ports. So I imagine that when scaling each of these topologies that performance will be a big factor in selecting the way to work.
I should also point out that these nodes usually exist on one small network, or most of the time on a single machine where networking is used as a form of inter-process communication. So the transmission time is only a very small factor in the overall system timing.
ROS, being an architecture for robots, may have one node for every sensor and actuator, so depending on the complexity of your system we may be talking about 20-30 nodes pushing small-ish (100 bytes or so) messages at 10-100 Hz.
It depends. I do not know the specifics of ROS but in networking it comes down to the following constraints:
Distance: speed of light is fast but over a distance it starts making a difference
Protocol Overhead: connection oriented vs. connection-less
On the OS side, maintaining a list of free ports isn't much of an overhead. Of course there is a cost to it, but everything is relative: if you are talking about a distributed system with long-distance links, then it is easy to argue that cycling through OS network ports ranks as a lower concern compared to managing communication quality.
Without a more specific question, I'll stop here.
I don't have any data on this, but it seems plausible that multiple ports might be handled more efficiently by multi-core systems, as opposed to demultiplexing within the program.

Resources