NBAD, Netflow on layer 7 - networking

I'm developing Network Behavior Anomaly Detection and I'm using Cisco protocol NetFlow for collecting traffic information. I want to collect information about layer 7 of ISO OSI Reference Model, especially https protocol.
What is the best way to achieve this?

Maybe someone find it helpful:
In my opinion you should try sFlow or Flexible NetFlow.
SFlow uses a sampling to achieve scalability. System architecture consists receiving devices getting two types of samples:
-randomly sampling packets
-basis of sampling counters at certain time intervals
Sampled packets are sent as sFlow datagrams to a central server running the software for the analysis and reporting of network traffic, sFlow collector.
SFlow may be implemented in hardware or software, and while the name "sFlow" means that this is flow technology, however, this technology is not flow at all, and represents the transmission image on the basis of samples.
NetFlow is a real flow technology. Entries for the flow generated in the network devices and combined into packages.
Flexible NetFlow allows customers to export almost everything that passes through the router, including the entire package and doing it in real time, like sFlow.
In my opinion Flexible NetFlow is much better and if you're afraid of DDoS attack choose it.
If FNF is better why use sFlow? Cause many switches today only supports sFlow, and if we don't have possibility of use FNF and want to get real-time data sFlow is best option.

Related

Network Simulation using NS-3

Can I simulate susceptible-infected-susceptible (SIS) model using NS-3?
I'm aiming to model malware flow using SIS and trying to simulate using NS-3.
I'm a newbie to networks, and have been searching for this since hours, going through tens of research papers but can't find anything similar.
SIS implies some sort of discovery mechanism to spread the infection of a network. These discovery mechanisms typically use exploits in network capable daemons. So, to simulate SIS, you want a simulator that works at the application level of the network stack.
ns-3 has some application-level capabilities, but it's mostly intended to be used to simulate the network stack below the application level. The application-level capabilities that ns-3 does have are limited to traffic generation. Discovering the presence of a service on a Node, let alone a compromised version of a daemon, is not supported.
So, it seems like you'll need to find another simulator. I'm not sure what your options are, but depending on how complex of a simulation you want, you could just roll your own by representing the network as a graph, and infecting a susceptible node with probability p.

Wide Area Network for embedded systems without cellular or internet

I'm looking into how one would create a network of embedded systems. What I'd like to achieve is for a device (basically a chip with network capabilities) to directly send data to a server but not use the internet(tcp/ip) or cellular data(like GSM etc).
I don't have much expertise in this field. Most of the networking protocols I've seen like ZigBee are designed for a Local Area Networks. Wide Area Network can be achieved perhaps over mesh or hoping etc. But is there a known protocol for long range networking, say for sensors, assuming there aren't low power constraints?
I am guessing you want to avoid the internet and GSM, not because you have anything against the protocols but because you want your solution to work without having to rely these networks.
If so then you don't have to rule out TCP/IP as this can be used in private networks also.
From your description it sounds like the closest thing that would meet your needs would be a satellite based communications system. So long as you are not worried about price, power and to a certain extend size, then your sensors can communicate from anywhere using satellite links.
There are also HAP - High Altitude Platforms. These are essentially like low flying Satellites, or high flying planes/blimps, so don't have the same coverage but need less power for a given communication bandwidth. If you search for 'High Altitude Platform Networking' you should find plenty of examples such as the following which is an up to date summary of the technology at the time of writing:
http://www.scielo.br/scielo.php?script=sci_arttext&pid=S2175-91462016000300249
As mentioned above, many if not all of these systems will support IP based communication protocols on top of their lower layers. Unless you really have some issue with the protocols themselves, it seems sensible to use them as there is such a wealth of experience, tools etc associated with IP communications, and using them does not make you dependent on the wider 'Internet'.
Its also worth mentioning that a common pattern is to have local groups of sensors communicate with each other and or with a gateway and the gateway then communicate over the long link back to your server. This allows the individual device be smaller, cheaper, lower power etc. This may not match your requirements if you are not likey to have clusters of sensors, however.
If you search for satellite sensor networks you may find you get a lot of hits for the gateway case mentioned above. This article 'A Survey of Architectures and Scenarios in Satellite-Based Wireless Sensor Networks: System Design Aspects' looks to be a good overview which includes HAPs also and it is available to download form this site at the time of writing:
https://www.researchgate.net/publication/250003254_A_Survey_of_Architectures_and_Scenarios_in_Satellite-Based_Wireless_Sensor_Networks_System_Design_Aspects
I would look into LoRa which is designed with these requirements in mind.
You still need infrastructure for that (gateways, network servers), you could roll out your own or (in urbanized areas) use pre-existing one like TTN.
The choice of networking technologies depends on many factors, such as range, power constraints, bandwidth requirements and so on.
User Raber already pointed out that you could look into LoRa / LoRaWAN. Since nobody else mentioned it yet, my suggestion is to also have a look at SigFox technology, which is slightly different from LoRa in what it offers and their business model.

Network design: one-to-many high bandwidth transmissions that excludes users

If StackOverflow is the wrong Exchange for this question, please help direct me to the correct one.
Short Version
What is the best design for a networking application in which one user transmits a constant, high-bandwidth stream of data to many other addresses? The solution must not require the uploader to duplicate the packets for each recipient and preferably will not transmit to users that have not been accepted by the transmitter.
Long Version
A friend and I have written an application that enables someone to transmit data in real time to one or more recipients that he wants to receive the data. I have designed the high-level application protocol to use UDP and to encode the data so that each packet can be lost without hurting the use of the rest. This solution requires managing sockets with each user and sending each packet to every user.
The problem here is that the stream can be very high bandwidth. The user can modify the settings for how high quality the data he is sending should be, and can end up sending 6 Mbps to each user. It is unfeasible to expect a user to pay his ISP enough to be allowed to upload such a stream to the preferred minimum of four other users at a time.
We need a way for the transmitter to send a packet exactly once and have each user receive a copy.
We have looked at multicasting. It may be what we need to use in the end, but we are concerned about the fact that anyone can join any group. It would be preferable to not allow users we do not want to see the data to not be allowed to join in. There is also the problem that if multiple transmitters happen to use the same group, viewers may find that they are receiving multiple streams' worth of data when they only want one.
My searching has revealed something IBM published over a decade ago called Explicit Multicast (Xcast) that looks perfect, but I have yet to find any information to determine whether this technology is commonly supported. Also, I have not yet seen whether it supports datagrams.
Does anyone know the best way to design an application that meets our needs?
Please keep in mind that we have no funds to support our project. Solutions need to be free.
Edit
In the summary above, I hinted at but failed to explicitly state that this is for a real-time application. The motivating drive behind the application is to keep the clients/recipients as close together in time as is possible. If packets are lost or arrive too late to be used in keeping the server and clients in phase, they need to be disregarded. That is why I designed the application protocol on top of UDP with independent data in each packet. Even if a client receives only one packet out of 300 for a given time step, it will use what it did get.
I think that I_am_Helpful's recommendation may be a good step in the right direction (or possibly the destination). I need to do some experimentation to determine whether using a system like Spread will work. However, I do not think I can budget more than additional 17 ms in transmission time.
If you can think of a system that enables sending unreliable datagrams to a specific group of users (like Spread) for a real-time application (unlike Spread, see p. 3), please let me know about it.
We need a way for the transmitter to send a packet exactly once and
have each user receive a copy.
In my limited knowledge, I would say that Reliable Multicasting appears to be one of the viable option for broadcasting in the group. I would like to mention that there are some of the possible Java API's* which could help you achieving the same :
JGroups Java API
The Spread Toolkit -> Spread consists of a library that user applications are linked with, a binary daemon which runs on each computer that is part of the processor group, and various utility and demonstration programs.
Appia
*NOTE : I have never worked with these API's.
It would be preferable to not allow users we do not want to see the
data to not be allowed to join in.
They do provide this feature, e.g., Spread supports thousands of groups with different sets of members. It also provides a range of reliability, ordering and stability guarantees for messages. JGroups can be used to create groups of processes whose members can send messages to each other. It also has facilities like group creation and deletion(Group members can be spread across LANs or WANs).
There is also the problem that if multiple transmitters happen to use
the same group, viewers may find that they are receiving multiple
streams' worth of data when they only want one.
When you could easily create multiple groups in the same network(using Spread,etc.), then, I believe that would no longer be an issue. It is your responsibility to declassify users into different groups.
I hope the given information helps. Good LUCK.
Via multicast you achieve exactly you want: sending each packet once, but authentication seems to be a concern for you.
One possible solution could be simetric cryptography, where the same key is used to encrypt and decrypt. Via TCP your clients connect to a server and fetch the multicast IP Address of the transmission and its associated key, then they join the multicast group and decrypt the incomming transmission.
If you accept a more flexible solution, you could have a server which sends a transmission in real time to a set of distributed servers. Your clients connect to one of these distributed servers via unicast, and after authentication is done, they are inluded in a list of receivers. Each distributed server sends each new transmission packet to each registered client via UDP. in ordinary situations your clients would have the same experience as if it was delivered in a multicast group, but the servers will spend far more bandwidth. Multiple transmission at a time will be allowed, so it could be good for you, and you can have more control, as clients can send signals to the servers, like PAUSE, and etc.

Flow based routing and openflow

This may not be the typical stackoverflow question.
A colleague of mine has been speculating that flow-based routing is going to be the next big thing in networking. Openflow provides the technology to use low cost switches in large application, IT data-centers, etc; replacing Cisco, HP, etc switch and routers. The theory is that you can create a hierarchy these openflow switches with simple configuration, eg. no spanning tree. Open flow will route each flow to the appropriate switch/switch-port, using only the knowledge of the hierarchy of switches (no routers). The solution is suppose to save enterprises money and simplify networking.
Q. He is speculating that this may dramatically change enterprise networking. For many reasons, I am skeptical. I would like to hear your thoughts.
OpenFlow is a research project from Stanford University led by professor Nick McKeown. In the original OpenFlow research paper, the goal of OpenFlow was to give researchers a way "to run experimental protocols in the networks they use every day." For years networking researchers have had an almost impossible task deploying and evaluating their ideas on real networks with real Ethernet switches and IP routers. The difficultly is that real switches and routers from companies like Cisco, HP, and others, are all closed, proprietary boxes that implement standard "protocols", like Ethernet spanning tree, and OSPF. There are business reasons why Cisco and HP won't let you run software on their switches and routers; there is no technical reason. OpenFlow was invented to solve a people problem: if Cisco is not willing to let you run code on their switch, maybe they can at least provide a very narrow interface to let you remotely configure their switch, and that narrow interface is called OpenFlow.
To my knowledge more than a dozen companies are currently implementing OpenFlow support for their switches. Some like HP are only providing the OpenFlow software for research purposes. Others like NEC are actually offering commercial support.
For academic researchers that want to evaluate new routing protocols in real networks, OpenFlow is a huge win. For switch vendors, it is less clear if OpenFlow support will help, hurt, or have no effect in the long run. After all, the academic research market is very small.
The reason why OpenFlow is most often discussed in the context of enterprise networks is that OpenFlow grew out of a previous research project called Ethane that used OpenFlow's mechanism of remotely programming switches in an enterprise network in order to centralize a security policy. Ethane, and by extension OpenFlow, has led directly to two startup companies: Nicira, founded by Martin Casado, and Big Switch Networks, founded by Guido Appenzeller. It would be easier to implement an Ethane-like system if all of the switches in the network supported OpenFlow.
Closely related to enterprise networks are data center networks, the networks that interconnect thousands to tens of thousands of servers in companies such as Google, Facebook, Microsoft, Amazon.com, and Yahoo!. One problem with Ethernet is that it does not scale to this many servers on the same Layer 2 network. We attempted to solve this problem in a research project called PortLand. We used OpenFlow to facilitate programming the switches from a central controller, which we called a Fabric Manager. We released the PortLand source code as open source.
However, we also found a limitation to OpenFlow's functionality. In another data center networking research project called Helios, we were not able to use OpenFlow because it did not provide a mechanism for bonding multiple switch ports into a Link Aggregation Group (LAG). Presumably one could extend the OpenFlow specification indefinitely until it all possible switch features become exposed.
There are other networks as well such as the Internet access networks, Internet backbones, home networks, wireless networks, cellular networks, etc. Researchers are trying to see where OpenFlow fits into all of these markets. What it really comes down to is the question, "what problem does OpenFlow solve?" Ethane makes a case for enterprise networks but I have not yet seen a compelling case for any other type of network. OpenFlow might be the next big thing, or it might end up being a case of "don't solve a people problem with a technical solution."
In order to assess the future of flow-based networking and OpenFlow, here’s the way to think about it.
It starts with the silicon trends: Moore’s Law (2X transistors per 18-24 months), and a correlated but slower improvement in the I/O bandwidth available on a single chip (roughly 2X every 30-36 months). You can now buy full-featured 10GbE single chip switches with 64 ports, and chips which have a mix of 40GbE and 10GbE ports with comparable total I/O bandwidth.
There are a variety of ways physically connect these in a mesh (ignoring the loop-free constraints of spanning tree and the way Ethernet learns MAC addresses). In the high performance computing (HPC) world, a lot of work has been done building clusters with InfiniBand and other protocols using meshes of small switches to network the compute servers. This is now being applied to Ethernet meshes. The geometry of a CLOS or fat-tree topology enables a two stage mesh with a large number of ports. The math is thus: Where n is the # of ports per chip, the number of devices you can connect in a two-stage mesh is (n*2)/2, and the number you can connect in a three-stage mesh is (n*3)/4. While with standard spanning tree and learning, the spanning tree protocol will disable the multi-path links to the second stage, most of the Ethernet switch vendors have some sort of multi-chassis Link Aggregation protocol which gets around the multi-pathing limitation. There is also standards work in this area. Although it might not be obvious, the vast majority of Link Aggregation schemes allocate traffic so all the frames of any given flow take the same path. This is done in order to minimize out-of-order frames so they don’t get dropped by some higher level protocol. They could have chosen to call this “flow based multiplexing” but instead they call it “link aggregation”.
Although the devil is in the details, there are a variety of data center operators and vendors that have concluded they don’t need to have large multi-slot chassis switches in the aggregation/core layer for server connect, instead using meshes of inexpensive 1U or 2U switches.
People have also concluded that eventually you need some kind of management station to set up the configuration of all the switches. Again, drawing from the experience with HPC and InfiniBand, they use what is called an InfiniBand Controller. In the telecom world, most telecom networks have evolved to separate the management and part of the control plane from the boxes that carry the data traffic.
Summarizing the points above, meshes of Ethernet switches with an external management plane with multipath traffic where flows are kept in order is evolutionary, not revolutionary, and is likely to become mainstream. At least one major company, Juniper, has made a big public statement about their endorsement of this approach. I'd call all of these "flow-based routing".
Juniper and other vendors’ proprietary approaches notwithstanding, this is an area that cries out for standards. The Open Networking Foundation (ONF), was founded to promote standards in this area, starting with OpenFlow. Within a couple of months, the sixty+ members of ONF will be celebrating their first year anniversary. Each member has, I am led to believe, paid tens of thousands of dollars to join. While the OpenFlow protocol has a ways to go before it is widely adopted, it has real momentum.
#Nathan: OpenFlow 1.1 actually adds some primitives that enable the use of multiple links via the Multipath Proposal.
An excellent view of OpenFlow by Simon Crosby
http://community.citrix.com/display/ocb/2011/03/21/The+Rise+of+the+Software+Defined+Network
More context on SDN which discusses IETF's SDN initiative and ONF's Openflow. Working in conjuction is a powerful combination http://bit.ly/A8xYso
Nathan, Excellent historical account and overview of openflow. Thanks!
You've hit on the points that I've been wrapping my head around as to why Openflow might not be widely adopted. Since it was designed to be open to allow researcher the ability to run experimental protocols and not necessarily be "compatible with" the big players Cisco/HP/etc. it puts itself into niche (although potentially big) market, more on this later. And as you've stated it's recieved some adoption in the "cloud data centers (CDC)" e.g. google, facebook, etc because they need to exploit experimental protocols to gain a competitive advantage or optimize for their application.
As you've stated some switch vendors have added openflow capability to capitalize on the niche need in academia and potentially sell into the CDC; google, facebook. This is potentially a big market (or bubble if you're pessimistic).
The problem that I see is that the majority of the market (80% or more) is enterprise IT data centers. The requirements here is for stable, compatible networking. Open and less expensive would be nice, but not at the cost of the former.
One could think of a day where corporate IT is partially or completely cloud-sourced where QoS is maintained by the cloud provider. In this case, experimental protocols could be leveraged to provide a competitive advantaged for speed or QoS. In which case; openflow could play a more dominant roll. I personally think this scenario is many years off.
So, the conclusion I come to is that other than in research and perhaps CDCs (google, facebook), the market is pretty small. I suppose that if researchers use openflow to come up with a better protocol for say link aggregation, or congestion management, then eventually Cisco and HP will provide those in their standard offering because their customers will demand it. So openflow could be a market influencer (via the research community), but it would not be a market disruptor.
Do you agree with my conclusions? Thanks for your input.

Lots of ports with little data, or one port with lots of data?

I've been checking out using a system called ROS (http://www.ros.org) for some work.
There are lots of different types of data that get sent between network nodes in ROS.
You define a struct of data that you want to send in a message, and ROS will handle opening a specific port between the two nodes that will only send that struct of data.
So if there are 5 different messages, there will be 5 different ports.
As opposed to this scenario, I have seen other platforms that just push all the different messages across one port. This means that there needs to be a sort of multiplexing/demultiplexing (done by some sort of message parsing on the receivers end).
What I wonder is... which is better from a performance perspective?
Do operating systems switch based on ports quickly, so that a system like ROS doesn't have to do too much work to work out what is in the message and interpreting it?
OR
Is opening lots of ports going to mean lots of slower kernel calls, and the cost of having to work out and translate message types end up being more then the time spent switching between ports?
When this scales to a large amount of data at high rates and lots of different messages types there will be lots of ports. So I imagine that when scaling each of these topologies that performance will be a big factor in selecting the way to work.
I should also point out that these nodes usually exist on one small network, or most of the time on the one machine in which networking is used as a force of inter-process communication. So the transmission time is only a very small factor in the overall system timing.
ROS being an architecture for robots may have one node for every sensor and actuator, so depending on the complexity of your system we may be talking about 20-30 nodes pushing small-ish (100bytes or so) data between 10-100Hz
It depends. I do not know the specifics of ROS but in networking it comes down to the following constraints:
Distance: speed of light is fast but over a distance it starts making a difference
Protocol Overhead: connection oriented vs. connection-less
On the OS side, maintaining a list of free ports isn't such much of an overhead - of course there is a cost to it but everything is relative: if you are talking about a distributed system with long distance links, then it is easy to argue that cycling through OS network ports ranks as lower concern compared to managing communication quality.
Without a more specific question, I'll stop here.
I don't have any data on this, but it seems plausible that multiple ports might be handled more efficiently by multi-core systems, as opposed to demultiplexing within the program.

Resources