I have two adjacent computers, both running a recent version of Ubuntu. Both computers have:
Multiple USB 2.0 ports
RJ-45 connection
5400RPM hard drive
ExpressCard slot
PCMCIA Type II
I want to transfer as much data as possible in a set period of time.
What is the fastest physical medium to transfer data between the two computers without swapping hard drives?
What is the fastest protocol (not necessarily TCP/IP based) for transferring high-entropy data? If it is TCP/IP, what needs to be tweaked for optimal performance?
First of all, RJ-45 is not a medium but just a connector type, so your Ethernet connection could be anything between 10BASE-T (10 Mbit/s) and 10GBASE-T (10 Gbit/s). With Ethernet, the link speed is defined by the lowest common speed grade supported by both peers.
The USB Hi-Speed mode is specified at 480 Mbit/s (60 MByte/s), but the typical maximum is somewhere around 40 MByte/s due to protocol overhead. That speed also applies only to direct USB host-to-device connections; here you have two USB hosts, so you need some kind of bridge device in the middle to play the device role towards each side, and I guess that will also lower the achievable data rate.
With Ethernet you have a simple plug-and-play technology with a well-known (socket) API. The transfer speed depends on the link type:
Max. TCP/IP data transfer rates (taken from here):
Fast Ethernet (100Mbit): 11.7 MByte/s
Gigabit Ethernet (1000Mbit): 117.6 MByte/s
The USB 2.0 specification results in a 480 Mbit/s rate, which is 60 MB/s.
Ethernet depends on the network cards (NICs) used and, to a lesser degree, on the wiring. If both NICs support 1 Gbit/s, they will auto-negotiate to 1 Gbit/s, which translates to 125 MByte/s. If one or both NICs only support 100 Mbit/s, they will auto-negotiate to 100 Mbit/s and your speed will be 12.5 MByte/s.
Wireless is also an option with 802.11n supporting up to 600 Mb/s (75 MB/s) - faster than USB 2.0.
USB 3.0 is the latest USB spec supporting up to 5 Gb/s (625 MB/s).
Of course, actual throughput will differ and depends on many other factors, such as wiring, interference, latency, etc.
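As a rough cross-check of the figures above, here is the arithmetic behind the ~117 MByte/s number for Gigabit Ethernet (a sketch; the 12-byte TCP timestamp option and the 38 bytes of Ethernet framing overhead per frame are assumptions about a typical setup):

line_rate_bps = 1_000_000_000                    # Gigabit Ethernet line rate
raw_mbytes = line_rate_bps / 8 / 1e6             # 125 MByte/s on the wire
tcp_payload = 1500 - 20 - 20 - 12                # MTU minus IP, TCP and timestamp headers
frame_on_wire = 1500 + 38                        # + Ethernet header, FCS, preamble, inter-frame gap
print(raw_mbytes * tcp_payload / frame_on_wire)  # ~117.7 MByte/s of TCP payload

Scaling the same calculation down to 100 Mbit/s gives roughly 11.8 MByte/s, in line with the Fast Ethernet figure above.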
TCP vs. UDP depends on the type of connection you need and your application's ability to deal with dropped packets. TCP has a higher cost for setting up the connection, but the transmission is reliable, and for long-running transfers it may turn out to be the fastest. UDP connections are cheaper to set up, but you may get dropped packets.
The Maximum Transmission Unit (MTU) is a parameter that can have a significant effect on an IP-based network. Picking the right MTU depends on several factors; the Internet has numerous articles on this.
Other tweaks are the basics, like closing known chatty apps, disabling the NetBIOS service if you're on Windows, etc. (lots of hits on Google for speeding up TCP).
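For the application-level side of those tweaks, the usual knobs are the socket buffer sizes and, for small chatty messages, Nagle's algorithm. A minimal sketch in Python, assuming the other machine is reachable at 192.168.0.2:5001 and that the 4 MiB buffer sizes are not capped by system-wide limits:

import socket

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, 4 * 1024 * 1024)   # bigger send buffer
s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, 4 * 1024 * 1024)   # bigger receive buffer
s.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)              # disable Nagle batching
s.connect(("192.168.0.2", 5001))    # assumed address of the peer machine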
Related
I have a question regarding detecting physical problems on a link with ping.
If we have a fiber or cable that has a problem and generates some CRC errors on frames (visible in switch or router interface statistics), it's possible that all pings pass because of the small default ICMP packet size and the statistically lower chance of hitting an error. First, can you confirm this?
My second question: if I ping with a large size like 65000 bytes, one ping will generate approximately 65000 / 1500 (MTU) ≈ 43 frames as IP fragments. Since the entire IP packet is normally lost when a single fragment is lost, the packet-loss statistics with large pings should be clearly higher. Is this assumption true?
The overall question is: with large pings, can we detect a physical problem on a link more easily?
A link problem is a layer 1 or layer 2 problem. ping is a layer 3 tool; if you use it for diagnosis you might get completely unexpected results. Port counters are much more precise for diagnosing link problems.
That said, it's quite possible that packet loss for small ping packets is low while real traffic is impacted more severely.
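To put numbers on that, here is the fragment-loss arithmetic from the question (a sketch; the per-frame error rate is just an illustrative assumption):

p = 1e-3                          # assumed per-frame error probability
fragments = 43                    # a ~65000-byte ping split over a 1500-byte MTU
large_ping_loss = 1 - (1 - p) ** fragments
print(large_ping_loss)            # ~0.042: roughly 43x more visible than a single small frame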
In addition to cable problems - that you'll need to repair - and a statistically random loss of packets there are also some configuration problems that can lead to CRC errors.
Most common in 10/100 Mbit networks is a duplex mismatch, where one side uses half-duplex (HDX) transmission with CSMA/CD while the other uses full-duplex (FDX). Once real data is transmitted, the HDX side will detect collisions, late collisions and possibly jabber, while the FDX side will detect FCS errors. Throughput is very low, but ping with its low bandwidth usually works.
Duplex mismatches happen most often when one side is forced to full duplex, which deactivates auto-negotiation, and the other side then defaults to half duplex.
Is there a maximum number of connections in the specification? Do devices start to interfere if many try to connect at the same time?
What are the modes of communication? Is there a secured mode or something else?
What is the maximum packet size?
Can I send an image or a sound using BLE?
There is no limit in the specification. In reality, at least around a millisecond must be allocated to serve each connection event, so with a 7.5 ms connection interval you cannot expect more than about 10 connections at most without getting dropped packets (and therefore larger latency). Connection setup/scanning will also miss a large number of packets if the radio is busy handling current connections.
The maximum packet length is 31 bytes for advertisements (up to Bluetooth 4.2). While connected the longest packet length is 27 bytes. Bluetooth 4.2 defines a packet length extension allowing larger packets but far from all implementations support that.
The security that BLE offers is the bonding procedure. After bonding the devices have established a shared secret key which is then used to encrypt and sign all data being sent.
Sending normal-sized images or sounds will take several seconds since the throughput is quite low.
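As a rough order-of-magnitude sketch of why: assuming 20 bytes of application payload per notification, around 4 packets per connection event and a 7.5 ms connection interval (all assumptions that vary per stack and radio), the throughput and the time for a small image look like this:

payload_per_packet = 20               # bytes of application data per notification
packets_per_event = 4                 # assumed packets the stack sends per connection event
connection_interval = 0.0075          # 7.5 ms
throughput = payload_per_packet * packets_per_event / connection_interval
image_bytes = 100 * 1024              # a small 100 KB image
print(throughput / 1000.0, "kB/s")    # ~10.7 kB/s
print(image_bytes / throughput, "s")  # ~9.6 s for the image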
I think you should really read the Bluetooth specification or some summary to get the answer to your questions.
I have a project in which I need the lowest latency possible (in the 1-100 microsecond range at best) for communication between a computer (Windows + Linux + MacOSX) and a microcontroller (Arduino, STM32, or anything else).
I stress that it not only has to be fast, but also low-latency (for example, a fast communication link to the Moon would still not have low latency).
For the moment the methods I have tried are serial over USB or HID packets over USB. I get results around a little less than a millisecond. My measurement method is a round trip communication, then divide by two. This is OK, but I would be much more happy to have something faster.
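A sketch of that kind of round-trip measurement with pyserial (the port name, the baud rate, and the assumption that the MCU echoes each probe byte back immediately are illustrative):

import time
import serial                      # pyserial

port = serial.Serial("/dev/ttyACM0", 115200, timeout=1)   # assumed device and baud rate

samples = []
for _ in range(1000):
    t0 = time.perf_counter()
    port.write(b"\x55")            # one probe byte; the MCU is assumed to echo it back
    port.read(1)                   # wait for the echo
    samples.append((time.perf_counter() - t0) / 2)   # one way ~= half the round trip

samples.sort()
print("median one-way latency: %.1f us" % (samples[len(samples) // 2] * 1e6))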
EDIT:
The question seems to be quite hard to answer. The best workaround I found is to synchronize the clocks of the computer and the microcontroller, which of course requires communication itself. With the process below, dt is half a round trip and sync is the difference between the clocks.
t = time()                     # local time before the exchange
write(ACK)                     # ask the remote side for its timestamp
remotet = read()               # remote replies with its clock value
dt = (time() - t) / 2          # half the round trip, i.e. the one-way delay
sync = time() - remotet - dt   # offset between the local and remote clocks
Note that the imprecision of this synchronization is at most dt. Having the fastest possible communication channel still matters, but at least I have an estimate of the precision.
Also note the technicalities related to the different timestamp formats on different systems (µs/ms since the epoch on Linux, ms/µs since the MCU booted on Arduino).
Pay attention to clock drift on the Arduino; it is safer to re-synchronize often (before every measurement in my case).
USB Raw HID with hacked 8KHz poll rate (125us poll interval) combined with Teensy 3.2 (or above). Mouse overclockers have achieved 8KHz poll rate with low USB jitter, and Teensy 3.2 (Arduino clone) is able to do 8KHz poll rate with a slightly modified USB FTDI driver on the PC side.
Barring this, if you need even better, you're now looking at PCI-Express parallel port cards, doing lower-latency signalling via digital pins directly on the parallel port. They must be true parallel ports, not ones behind a USB layer. DOS apps on gigahertz-class PCs were tested to get sub-1 µs ability (1.4 GHz Pentium IV) with parallel port pin signalling, but if you write a virtual device driver you can probably get sub-100 µs within Windows.
Use raised priority and critical sections out the wazoo, preferably a non-garbage-collected language, a minimum of background apps, and essentially consume 100% of a CPU core on your critical loop, and you can definitely achieve <100 µs reliably. Not 100% of the time, but certainly in the territory of five-nines (and probably even far better than that), if you can tolerate such aberrations.
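A minimal sketch of the "raised priority, dedicated core" part on Linux (assuming root or CAP_SYS_NICE, and that core 2 is free to burn; the loop body is only a placeholder):

import os

os.sched_setaffinity(0, {2})       # pin this process to core 2 (assumption: core 2 is otherwise idle)
os.sched_setscheduler(0, os.SCHED_FIFO, os.sched_param(50))   # real-time FIFO priority

while True:
    # placeholder for the critical poll/respond loop; deliberately no sleep,
    # the core is burned to keep scheduler-induced latency out of the path
    pass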
To answer the question, there are two low latency methods:
Serial or parallel port. It is possible to get latency down to the millisecond scale, although your performance may vary depending on manufacturer. One good brand is Brainboxes, although their cards can cost over $100!
Write your own driver. It should be possible to achieve latencies on the order of a few hundred microseconds, although obviously the kernel can interrupt your process mid-way if it needs to serve something with a higher priority. This is how a lot of scientific equipment actually works (and a lot of the people telling you that a PC can't be made to work on short deadlines are wrong).
For info, I just ran some tests on a Windows 10 PC fitted with two dedicated PCIe parallel port cards.
Sending TTL (square wave) pulses out using Python code (actually using Psychopy Builder and Psychopy Coder), the 2-channel oscilloscope showed very consistent offsets between the two pulses of 4 µs to 8 µs.
This was when the python code was run at 'above normal' priority.
When run at normal priority it was mostly the same, apart from a very occasional 30 µs gap (presumably when task switching took place).
In short, PCs aren't set up to handle deadlines that short. Even using a bare-metal RTOS on an Intel Core series processor you end up with interrupt latency (how fast the processor can respond to interrupts) in the 2-3 µs range (see http://www.intel.com/content/dam/www/public/us/en/documents/white-papers/industrial-solutions-real-time-performance-white-paper.pdf).
That's ignoring any sort of communication link like USB or ethernet (or other) that requires packetizing data, handshaking, buffering to avoid data loss, etc.
USB stacks are going to have latency, regardless of how fast the link is, because of buffering to avoid data loss. Same with ethernet. Really, any modern stack driver on a full blown OS isn't going to be capable of low latency because of what else is going on in the system and the need for robustness in the protocols.
If you have deadlines in the single-digit microsecond range (or even in the millisecond range), you really need to do your real-time processing on a microcontroller and have the slower control loop/visualization handled by the host.
You have no guarantees about latency to userland without a real-time operating system. You're at the mercy of the kernel, its time slices and its preemption rules, which could add up to more than your 100 µs maximum.
In order for a workstation to respond to a hardware event you have to use interrupts and a device driver.
Your options are limited to interfaces that offer an IRQ:
Hardware serial/parallel port.
PCI
Some interface bridge on PCI.
Or, if you're into abusing I/O, the sound card.
USB is not one of them; it has a 1 kHz polling rate.
Maybe Thunderbolt does, but I'm not sure about that.
Ethernet
Look for a board that has a gigabit ethernet port directly connected to the microcontroller, and connect it to the PC directly with a crossover cable.
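A minimal sketch of a latency probe over such a direct link, assuming the microcontroller echoes UDP datagrams back and sits at 192.168.1.2:5005:

import socket
import time

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.settimeout(1.0)

t0 = time.perf_counter()
sock.sendto(b"ping", ("192.168.1.2", 5005))   # assumed MCU address and port
sock.recvfrom(64)                             # wait for the echoed datagram
print("round trip: %.1f us" % ((time.perf_counter() - t0) * 1e6))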
I referred to different threads about reliable UDP vs. TCP for large file transfers. However, before deciding to choose UDP over TCP (and adding a reliability mechanism to UDP), I want to benchmark the performance of UDP and TCP. Is there any utility on Linux or Windows that can give me this performance benchmark?
I found that iperf is one such utility. But when I used iperf on two Linux machines to send data using both UDP and TCP, I found that TCP performs better than UDP for 10 MB of data. This was surprising to me, as it is commonly said that UDP performs better than TCP.
My questions are:
Does UDP always perform better than TCP, or are there specific scenarios where UDP is better than TCP?
Are there any published benchmarks validating this?
Is there any standard utility to measure TCP and UDP performance on a particular network?
Thanks in advance.
UDP is NOT always faster than TCP. There are many TCP performance tuning options, including RSS/vRSS. For example, TCP on Linux-on-Hyper-V can reach 30 Gbit/s and Linux-on-Azure can reach 20 Gbit/s+. (I think it is similar for Windows VMs; on other virtualization platforms such as Xen and KVM, TCP did even better.)
There are lots of tools to measure this: iPerf, NTttcp (Windows), ntttcp-for-Linux, netperf, etc.:
iPerf3: https://github.com/esnet/iperf
Windows NTTTCP: https://gallery.technet.microsoft.com/NTttcp-Version-528-Now-f8b12769
ntttcp-for-Linux: https://github.com/Microsoft/ntttcp-for-linux
Netperf: http://www.netperf.org/netperf/
The differences have two sides, a conceptual one and a practical one. Much of the documentation regarding performance is from the '90s, when CPU power was significantly faster than network speed and network adapters were quite basic.
Consider that UDP can technically be faster due to lower overhead, but modern hardware is not fast enough to saturate even 1 GigE links at the smallest packet size. TCP is pretty much accelerated by any card, from checksumming to segmentation through to full offload.
Use UDP when you need multicast, i.e. distributing to more than a few recipients. Use UDP when TCP windowing and congestion control don't work well, such as on high-latency, high-bandwidth WAN links: see UDT and WAN accelerators for example.
Look at any performance documentation for 10 GigE NICs. The basic problem is that the hardware is not fast enough to saturate the NIC, so many vendors provide full TCP/IP stack offload. Also consider file servers, such as NetApp et al.: where software is used, you may see the MTU tweaked to larger sizes to reduce CPU overhead. This is popular with low-end equipment such as SOHO NAS devices from ReadyNAS, Synology, etc. With high-end equipment, if you offload the entire stack and the hardware is capable, better latency can be achieved with normal Ethernet MTU sizes, and jumbograms become obsolete.
iperf is pretty much the go-to tool for network testing, but it will not always be the best on Windows platforms. There you should look at Microsoft's own tool, NTttcp:
http://msdn.microsoft.com/en-us/windows/hardware/gg463264.aspx
Note that these tools are more about testing the network than application performance. Microsoft's tool goes to extremes, essentially keeping a large memory-locked buffer queued up and waiting for the NIC, to send as fast as possible with no interactivity. The tool also includes a warm-up phase to make sure no mallocs are necessary during the test period.
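If you only need a quick sanity check rather than a full iperf/NTttcp run, the same "reuse one big buffer, no interactivity" idea can be sketched in a few lines of Python (the receiver address is an assumption, and the other end must simply accept the connection and drain the data):

import socket
import time

HOST, PORT = "192.168.1.2", 5201       # assumed receiver address and port
buf = b"\x00" * (1 << 20)              # one 1 MiB buffer reused for every send
total = 256 * len(buf)                 # send 256 MiB in total

with socket.create_connection((HOST, PORT)) as s:
    t0 = time.perf_counter()
    sent = 0
    while sent < total:
        s.sendall(buf)
        sent += len(buf)
    elapsed = time.perf_counter() - t0

print("%.1f MB/s" % (sent / elapsed / 1e6))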
Can someone explain the concepts of IPoIB and TCP over InfiniBand? I understand the overall concept and the data rates provided by native InfiniBand, but don't quite understand how TCP and IPoIB fit in. Why do you need them and what do they do? What is the difference when someone says their network uses IPoIB or TCP with InfiniBand? Which one is better? I am not from a strong networking background, so it would be nice if you could elaborate.
Thank you for your help.
InfiniBand adapters ("HCAs") provide a couple of advanced features that can be used via the native "verbs" programming interface:
Data transfers can be initiated directly from userspace to the hardware, bypassing the kernel and avoiding the overhead of a system call.
The adapter can handle all of the network protocol work of breaking a large message (even many megabytes) into packets, generating/handling ACKs, retransmitting lost packets, etc., without using any CPU on either the sender or the receiver.
IPoIB (IP-over-InfiniBand) is a protocol that defines how to send IP packets over IB; and for example Linux has an "ib_ipoib" driver that implements this protocol. This driver creates a network interface for each InfiniBand port on the system, which makes an HCA act like an ordinary NIC.
IPoIB does not make full use of the HCA's capabilities; network traffic goes through the normal IP stack, which means a system call is required for every message and the host CPU must handle breaking data up into packets, etc. However, it does mean that applications that use normal IP sockets will work on top of the full speed of the IB link (although the CPU will probably not be able to run the IP stack fast enough to use a 32 Gbit/s QDR IB link).
Since IPoIB provides a normal IP NIC interface, one can run TCP (or UDP) sockets on top of it. TCP throughput well over 10 Gb/sec is possible using recent systems, but this will burn a fair amount of CPU. To your question, there is not really a difference between IPoIB and TCP with InfiniBand -- they both refer to using the standard IP stack on top of IB hardware.
The real difference is between using IPoIB with a normal sockets application versus using native InfiniBand with an application that has been coded directly to the native IB verbs interface. The native application will almost certainly get much higher throughput and lower latency, while spending less CPU on networking.
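To make the "nothing changes for the application" point concrete: a plain TCP socket aimed at the address assigned to the IPoIB interface (ib0) goes over the InfiniBand link without any code changes (the 10.10.0.2 address is an assumption):

import socket

# Exactly the same code you would use over Ethernet; only the peer address,
# which belongs to the ib0 interface here, determines that IPoIB carries it.
with socket.create_connection(("10.10.0.2", 5201)) as s:
    s.sendall(b"hello over IPoIB")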