c - netmap - Tun/tap vs netmap/pf_ring/dpdk - networking

Would a Tun/tap device avoid a netmap/pf_ring/dpdk installation ? If tun/tap allow to bypass kernel, isn't it the same thing ?
Or those codes bring so many optimizations that they overclass tun os bypass strategy ?
The final goal is to port tcp/ip from kernel to user space, FOR TESTING PURPOSES.
I don't quite understand here.
Thanks

no.
for userspace tcpip implementation see lwip or rumpkernel.
dpdk/pfring/netmap as you probably know are about getting packets to userspace as fast as possible.
tun/tap are virtual interface things. probably not what you're after.

Tun/tap are not particularly performant. They miss out on the IP stack, but there is a lot of copying still involved. Profile some code using them to see. I think the best option for straight userspace networking is probably AF_PACKET using the ring buffer option, but that will is still an indirect ring buffer that gets copied to the network card ring buffer rather than being direct like you get with solutions like dpdk. It depends on your performance requirements - if it is just for testing correctness any solution should be fine.

Related

read raw packets over network with C#?

I've got a proprietary BMS language that is sending it's info over a specific UDP port on the network. The existing interface is not very well made or maintained, and functions poorly.
I have access to the stack for the code, and don't mind creating some interpretation functionality
My question is what is the best way that I should be receiving these raw packets in my program to be interpreted? I'm not finding any good documentation on how to do this, and I wanted to try and do it in a reasonably appropriate way.
Do I basically need to make my program constantly sniff a specific port? and will this be cumbersome to the network or program to be doing this?
You tagged this BACnet. Why don't you try Wireshark, with a capture filter "udp port 47808" and see if wireshark exposes the packets in a way that makes sense to you. (or have you done this). If it is bacnet, then normal UDP sockets, bound to port 47808 is the way to go. Note, that 47808-47823 are the most common BACnet "default" ports. Use cports or something to see exactly what port(s) your application is bound to.
You could use a packet-capture library - but that has security connotations, so instead you can probably (for most part) get away with using a .NET 'UdpClient'.
But! The real challenge is the breaking-down & interpretation of the BACnet packets, which is the hard part.
There is (now!/finally) a NuGet package for BACnet - not that I've used it, but that might be one of the best choices for your case.
But I also suggest you experiment with the (advanced & free) VTS (Visual Test Tool) too.
You could also try using the BACnet stack that YABE uses too.

Implementing VPN in an embedded system using LwIP

I've been asked to implement VPN capabilities in an existing software project on an embedded system, in order to make the device available via network to an external server while avoiding trouble with firewalls (no need for encryption, just to make it accessible).
Unfortunately, the embedded system is based on a Cortex-M4 MCU, therefore Linux, which would allow for VPN nearly out of the box, is not an option. All I've got is an RTOS and a working LwIP stack.
I've used VPNs in the past. However, my network knowledge is rather limited concerning implementing VPNs, so I'm rather stumped. As I think, I'd use the current LwIP instance for building up the tunnel connection, and the application would use a second instance for the actual network communication, while the network interface of the second instance is a virtual one (like a tap device on linux), encapsulating its low level data and tranceiving it via the tunnel connection of the first LwIP instance.
Maybe this way I'd be able to create a custom solution for the problem, but the solution should conform to any standards (as the server will be any kind of sophisticated system).
So I wonder if anyone has been confronted with a task like this, and would appreciate any hint what to do, at least a direction where to look at.
Thanks in advance!

cudaMemcpy device to distant host

I am working on a simulation which is running on a host and use the GPU for the computation. Once the computation is done, the host copy the memory from the device to itself and then send the computed data to a distant host.
Basically the data will do : GPU -> HOST -> NETWORK CARD
Since the simulation is in real time, time is very important, and I would like to have something like that : GPU -> NETWORKCARD, in order to reduce the delay of data transfer.
Is it possible?
If no, is it something that we might see someday?
Edit : Distant host => CPU
Yes, this is possible in CUDA 4.0 and later using the GPUDirect facility on platforms which support unified direct addressing (which I think is basically linux with Fermi or Kepler Telsa cards at this stage). You haven't said much about what you mean by "distant host", but if you have a network where MPI is feasible, there is probably a ready solution for you to use.
At least mvapich2 already has support for GPU-GPU transfers using either Infiniband or TCP/IP, including RDMA directly to the Infiniband adapter over the PCI express bus. Other MPI implementations probably also have support by now, although I haven't look too closely at it recently to know for sure.

Inter process communication using GA library

How (GA) Global array library (an implementation of ARMCI) is used for communication between two process located on different remote machines.
Is that something similar to TCP socket programming where one process wait for data and the other transfers it ?
I try to see the documentation that ga_put() and ga_get() are two operation that used for inter-process communication. till now I only able to come up with a program running on the same machine that use shared-Memory architecture (I have used ga_put() and ga_get() to put data in Global array and to get it respectively ).
Now, I want use this program for communicating data (basically performaning one-sided communication) between two remote processes. Obiviously putting the program that I am running on single machine on the remote side will work out. It needs some way to tell which machine should we access and get the right data. And here is where I need your help. how can I do this? (what is its equivalent of TCP/IP listen, accept and connect ... on GA ? )
Or is that the case that GA also uses TCP/IP socket underneath ?
can some one please explain to me? and sample code of two remote processes communicating is also appreciable.
thanks,
I am answering my question after all. May be it will help some one looking for the same issue.
GA Library is implemented to work with MPI. So we have something like:
MPI_Init(..)
GA_Initialize()
MA_Init(..)
// .... do sothing here
GA_Terminate()
MPI_Finalize()
The answer to my question is:
MPI has the following primitives to be able to support client-server commuication:
//in the server side
MPI_Open_port()
MPI_Comm_accept()
//do MPI_Send() or MPI_Recv()
MPI_Close_port()
//client Side
MPI_Comm_connect()
//do MPI_Recv() or MPI_Send()
depending on the hardware support and the MPI implementation used, MPI might use sockets, or other mechanisms (e.g SAN (System area network)).
In general, most MPI implementations use sockets for TCP based communication.
So, yes GA also uses sockets underneath (of course depending on the MPI implementation used)
cheers,

NDIS 5/6 driver with tcp/ip stack, is there code?

I'm trying to write a windows kernel driver which requires tcp/ip communication using NDIS 5/6. Since it will use NDIS, as I understand it, it needs it's own tcp/ip stack implementation.
Could anyone point me in the direction of an implementation of this, or something close to it?
Any help would be greatly appreciated!
Kind regards
You don't need to implement your own TCP/IP stack!
First, are you sure that this needs to be done in a driver? All your complex code and business logic should usually be in a usermode application or service. Drivers are mostly meant to be very simple wrappers around hardware. This rule isn't just some abstract principle either — it's much easier to write usermode code, where you can use a familiar debugger and the much-broader set of Win32 APIs. You'll solve your problem sooner if you can move most of your code to usermode.
If you really must do TCP socket I/O in kernel mode, then you should use Winsock Kernel (WSK). WSK allows you to open a socket, similar to Winsock in usermode. (Although the usermode Winsock API has more options and features; WSK is bare-bones).
WSK is available on Windows Vista and later. If you must support Windows XP, then you need to use TDI. TDI is much harder to get right; I don't reccomend using it if you can avoid it.

Resources