Rsync over Infiniband/RDMA - rsync

Does rsync currently support data transfer over RDMA/InfiniBand? I have to send some data to another server, but the transfer is taking a long time. After searching for a while, I found that there is something called the InfiniBand network protocol, which uses RDMA to send data with low latency.
I'm wondering whether it's possible to run rsync over RDMA too. If not, how can we use InfiniBand? Does it have the same API as rsync? It looks like InfiniBand is restricted to a hardware-based protocol internally.

You can configure IP over InfiniBand (IPoIB) and use rsync over the IP connection. While this is not as fast as native InfiniBand, it is faster than expected. I believe (I could be wrong) that IPoIB is limited to 10Gb/s, though that could be a limitation of my older cards.
For example, rsync over my IPoIB connection between two systems seems to top out at 144MB/s, which is 1.15Gb/s. This is the limit of the write speed of the receiving drive (a spinning HDD). I can get up to 500MB/s (4Gb/s) doing an SSD-to-SSD rsync over InfiniBand.
So at the moment, my limitation is the speed at which I can read from and/or write to the storage device, not the speed of the InfiniBand connection. In other words, connecting to a drive over my InfiniBand network is as fast as having the drive on the local machine.
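The arithmetic behind those figures can be checked with a small sketch. It assumes decimal units (1 MB = 10^6 bytes, 1 Gb = 10^9 bits); the function names and the 10 Gb/s link figure passed in at the end are illustrative assumptions, not measurements.

```python
def mbytes_to_gbits(mb_per_s):
    """Convert megabytes per second to gigabits per second (decimal units)."""
    return mb_per_s * 8 / 1000


def effective_rate(link_gbits, read_mb_per_s, write_mb_per_s):
    """Return the end-to-end rate in MB/s: the slowest stage sets the pace."""
    link_mb_per_s = link_gbits * 1000 / 8
    return min(link_mb_per_s, read_mb_per_s, write_mb_per_s)


# Rates quoted in the answer above.
for label, rate in [("HDD target", 144), ("SSD to SSD", 500)]:
    print(f"{label}: {rate} MB/s = {mbytes_to_gbits(rate):.2f} Gb/s")

# Hypothetical case: 10 Gb/s link, SSD source (500 MB/s), HDD target (144 MB/s).
print(effective_rate(10, 500, 144), "MB/s")
```

As long as the result of `effective_rate` equals one of the disk figures rather than the link figure, a faster transport would not shorten the copy.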
Hope this helps some.

Related

Software Routing

"Commercial software routers from companies such as Vyatta can typically only attain transfer data at speeds of up to three gigabits per second. That isn’t fast enough to take advantage of the full speed of a typical network card, which operates at 10 gigabits per second." [1]
How is the speed of the network interface card relevant in this scenario? Aren't software routers connecting multiple Virtual Machines running on the same physical host? [2] Unless a PC has multiple network interface cards, it is unlikely that it functions as a packet switch between different physical hosts.
My interpretation suggests that there seem to exist two different kinds of software routing: (1) Embedding a real time operating system on an actual router. (2) Writing application layer code on a PC that can handle packets being transmitted between different virtual machines running on that very PC. Is this correct?
It depends on what your router is doing. If it's literally just looking at a static route table and forwarding packets out another interface, there isn't much hit in performance.
It's when you get into things like NAT, crypto, QoS, SPI... that you will see performance degradation. Hardware vendors usually use custom silicon to process the more advanced features, which allows for higher-throughput packet forwarding.
Now that merchant silicon is fast enough and the open source applications are getting better, the performance gap is closing.
It really depends on your use case as far as what you want to use. I've gone with both and not seen performance hits, but the software versions weren't handling high throughput workloads.
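To make the cheap case concrete (looking up a static route table and forwarding out of another interface), here is a minimal sketch of a longest-prefix-match lookup. The route entries and interface names are made up for illustration, and a real router would use a trie or hardware tables rather than a linear scan.

```python
import ipaddress

# Hypothetical static route table: (destination prefix, outgoing interface).
ROUTES = [
    (ipaddress.ip_network("0.0.0.0/0"), "eth0"),    # default route
    (ipaddress.ip_network("10.0.0.0/8"), "eth1"),
    (ipaddress.ip_network("10.1.0.0/16"), "eth2"),
]


def lookup(dst):
    """Return the interface of the most specific matching route, or None."""
    addr = ipaddress.ip_address(dst)
    matches = [(net, ifc) for net, ifc in ROUTES if addr in net]
    if not matches:
        return None
    # Longest prefix wins: the route with the largest prefix length.
    _, ifc = max(matches, key=lambda m: m[0].prefixlen)
    return ifc


for dst in ("10.1.2.3", "10.2.0.1", "8.8.8.8"):
    print(dst, "->", lookup(dst))
```

Nothing here inspects or rewrites the packet beyond its destination address, which is why plain forwarding is cheap compared with NAT, crypto or stateful inspection.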
Performance of the link from the virtual network to the physical eventually becomes important at any reasonable scale. You're right that, within the same physical host, things can be pretty quick, but that requires that one can get everything needed in one box.
While merchant silicon has come a long way in improving the performance of networking equipment, greater gains are being made in getting CPUs to handle networking tasks better. Both AMD and Intel have improved their architectures to the point where 10 Gbps forwarding is a reality. Intel has developed a specialized library, DPDK, that takes care of a lot of low-level networking functions at high performance.

c - netmap - Tun/tap vs netmap/pf_ring/dpdk

Would a tun/tap device avoid the need for a netmap/pf_ring/dpdk installation? If tun/tap allows bypassing the kernel, isn't it the same thing?
Or do those libraries bring so many optimizations that they outclass the tun/tap OS bypass strategy?
The final goal is to port tcp/ip from kernel to user space, FOR TESTING PURPOSES.
I don't quite understand here.
Thanks
No.
For a userspace TCP/IP implementation, see lwIP or rump kernels.
DPDK/PF_RING/netmap, as you probably know, are about getting packets to userspace as fast as possible.
Tun/tap are virtual interfaces, probably not what you're after.
Tun/tap are not particularly performant. They skip the IP stack, but there is still a lot of copying involved. Profile some code using them to see. I think the best option for straight userspace networking is probably AF_PACKET using the ring buffer option, but that is still an indirect ring buffer that gets copied to the network card's ring buffer rather than being direct, as you get with solutions like DPDK. It depends on your performance requirements: if it is just for testing correctness, any solution should be fine.

cudaMemcpy device to distant host

I am working on a simulation which runs on a host and uses the GPU for the computation. Once the computation is done, the host copies the memory from the device to itself and then sends the computed data to a distant host.
Basically, the data path is: GPU -> HOST -> NETWORK CARD
Since the simulation runs in real time, latency is very important, and I would like to have something like this: GPU -> NETWORK CARD, in order to reduce the data transfer delay.
Is it possible?
If not, is it something that we might see someday?
Edit : Distant host => CPU
Yes, this is possible in CUDA 4.0 and later using the GPUDirect facility on platforms which support unified virtual addressing (which I think is basically Linux with Fermi or Kepler Tesla cards at this stage). You haven't said much about what you mean by "distant host", but if you have a network where MPI is feasible, there is probably a ready solution for you to use.
At least MVAPICH2 already has support for GPU-GPU transfers using either InfiniBand or TCP/IP, including RDMA directly to the InfiniBand adapter over the PCI Express bus. Other MPI implementations probably also have support by now, although I haven't looked too closely at it recently to know for sure.

How to monitor network traffic of running processes

I need a program which monitors network traffic, but in this way: it should show the running processes and which IPs and websites they are sending packets to and receiving packets from. I had such a program, but I can't find it or remember its name. All the programs I find through Google searches are of the same kind, which only monitor general network traffic.
You can use Wireshark at the packet level and netstat at the (local) port level.
To monitor a network:
configure a port on the switch as a monitor port, and put the capturing device in promiscuous mode.
Use Wireshark to see the traffic.
(Wireshark was formerly called Ethereal.)
You can try LSP or WinPcap to monitor process traffic.
I hope this may be helpful to you.
The program you need depends on the type of architecture. If you have devices supporting NetFlow, this could be very handy for identifying bottlenecks or misuse. There are just a few good NetFlow tools on a low budget; try SolarWinds or Pandora FMS.
For SNMP monitoring, probably the most common case, most tools do a good job: Cacti, Zabbix, Pandora FMS or Nagios. OpenNMS and Pandora FMS have the best management of traps, and only a few manage v3 properly.
For a mixed monitoring scope (servers, apps and networking), you have fewer tools. We use Pandora FMS for that reason: it can manage NetFlow, SNMP, WMI (for remote server monitoring) and agent-based monitoring for Unix and Windows servers.
Some links:
http://pandorafms.com/Producto/network-monitoring/en
http://opems.org

Is SCTP good for peer-to-peer apps?

I am considering using SCTP instead of TCP for a p2p app written in C. Should I do it? Also how does the speed of SCTP compare to the speed of TCP?
EDIT:
I found that SCTP can be tunneled over UDP, the only problem being that tunneled SCTP is not interoperable with untunneled SCTP.
Have you considered whether your target systems will all have SCTP pre-installed on them, or whether your application will need to include SCTP itself? In my experience, I would not expect all systems to have SCTP installed, and on Windows I would expect it not to be.
If you include SCTP in the application itself, that will more than double the number of messages being passed into and out of the kernel, which will impact performance compared with using the pre-installed TCP.
Have you considered what benefits you want from SCTP? You mentioned fault tolerance, but for this to work with SCTP the host running the application needs multiple Ethernet ports and IP addresses. Is this likely for your app?
As much as I love SCTP (!) I would seriously consider sticking with TCP unless you are sure SCTP is needed or unless you control the hosts your app is deployed on.
Regards
If it's for a local area network, sure, go for it.
Note, however, that if you plan to use it on the open internet, many consumer-grade firewalls aren't flexible enough to permit unrecognised IP protocols through them.
How does it help you?
You're P2P, so every peer must have at least one socket open to every other peer.
If you've got a socket open, then you can do everything you need to do over that. If you've taken the approach of one socket per file and you have multiple files being transferred concurrently between two given peers, then SCTP will save you one socket per file. However, on a normal P2P network of any size, you will almost never have multiple files being transferred concurrently between two peers.
Just have one socket and your own little protocol: send a packet with a header, where the header indicates the content type, e.g. a command or part of a file, and if the latter, which file and which byte range.
Of course, you get a little overhead for that, whereas if you have one socket for commands and one per file, you're more efficient. Is saving one socket per peer (assuming one download at a time) worth the time/hassle/complexity of using SCTP?
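A minimal sketch of such a single-socket protocol is shown below, in Python for brevity (the same layout maps directly onto a packed C struct). The header fields, their sizes, and the names `pack_message` and `unpack_messages` are all assumptions chosen for illustration; the answer above does not prescribe a wire format.

```python
import struct

# Assumed header layout, network byte order, no padding:
#   1 byte  content type
#   4 bytes file id (0 for commands)
#   8 bytes byte offset within the file
#   4 bytes payload length
HEADER = struct.Struct("!BIQI")

TYPE_COMMAND = 0
TYPE_FILE_CHUNK = 1


def pack_message(msg_type, file_id, offset, payload):
    """Prefix the payload with a fixed-size header."""
    return HEADER.pack(msg_type, file_id, offset, len(payload)) + payload


def unpack_messages(buffer):
    """Split a received byte stream into complete messages.

    Returns (messages, leftover), where leftover holds the bytes of an
    incomplete trailing message to be kept until more data arrives.
    """
    messages = []
    pos = 0
    while len(buffer) - pos >= HEADER.size:
        msg_type, file_id, offset, length = HEADER.unpack_from(buffer, pos)
        end = pos + HEADER.size + length
        if end > len(buffer):
            break  # payload not fully received yet
        messages.append((msg_type, file_id, offset, buffer[pos + HEADER.size:end]))
        pos = end
    return messages, buffer[pos:]


stream = (pack_message(TYPE_COMMAND, 0, 0, b"LIST")
          + pack_message(TYPE_FILE_CHUNK, 7, 4096, b"chunk data"))
messages, leftover = unpack_messages(stream)
print(messages, leftover)
```

The per-message overhead in this layout is the 17-byte header, which is the cost being weighed against the extra sockets or the move to SCTP.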

Resources