Finite State Machine of concurrent system - fsm

I devise a new algorithm to use flow analysis technique to detect unreachability faults in concurrent systems. I need to find some finite state machine of large concurrent system (probably with hundreds of states) such as network protocols to do experiments. However, I can't find it on the web. Can anyone give me some clue?
I need state machines that the transitions between them should be synchronized.
Thanks in advance.

TCP states:
http://en.wikipedia.org/wiki/File:Tcp_state_diagram_fixed.svg
this is not big but good one.

Any non-trivial Erlang program. Erlang programs usually consist of hundreds of (potentially concurrent) processes exchanging messages.

I've heard the SIP state machine when used with Reliable Provisional Responses and ICE gets really large. But reconstructing a state diagram from those standards will be time consuming (SIP developers all over the world would be grateful for having such a diagram, if it was correct and complete).
Q.931 (the ISDN UNI protocol) has nice SDL state diagrams. It's only 25 states though.

Related

Can i ignore UDP's lack of reliability features in a controlled environment?

I'm in a situation where, logically, UDP would be the perfect choice (i need to be able to broadcast to hundreds of clients). This is in a very small and controlled environment (the whole network is over a few square metters, all devices are local, the network is way oversized with gigabit ethernet and switches everywhere).
Can i simply "ignore" all of the added reliability that needs to be tossed on udp (checking messages arrived, resending them etc) as those mostly apply where the is expected packet loss (the internet) or is it really suggested to handle udp as "may not arrive" even in such conditions?
I'm not asking for theorycrafting, really wondering if anyone could tell me from experience if i'm actually likely to have udp packets missing in such an environment or is it's going to be a really rare event as obviously sending things and assuming that worked is much simpler than handling all possible errors.
This is a matter of stochastics. Even in small local networks, packet losses will occur. Maybe they have an absolute probability of 1e-10 in a normal usage scenario. Maybe more, maybe less.
So, now comes real-world experience: Network controllers and Operating systems do have a tough live, if used in high-throughput scenarios. Worse applies to switches. So, if you're near the capacity of your network infrastructure, or your computational power, losses become far more likely.
So, in the end it's just a question on how high up in the networking stack you want to deal with errors: If you don't want to risk your application failing in 1 in 1e6 cases, you will need to add some flow/data integrity control; which really isn't that hard. If you can live with the fact that the average program has to be restarted every once in a while, well, that's error correction on user level...
Generally, I'd encourage you to not take risks. CPU power is just too cheap, and bandwidth, too, in most cases. Try ZeroMQ, which has broadcast communication models, and will ensure data integrity (and resend stuff if necessary), is available for practically all relevant languages, and runs on all relevant OSes, and is (at least from my perspective) easier to use than raw UDP sockets.

NBAD, Netflow on layer 7

I'm developing Network Behavior Anomaly Detection and I'm using Cisco protocol NetFlow for collecting traffic information. I want to collect information about layer 7 of ISO OSI Reference Model, especially https protocol.
What is the best way to achieve this?
Maybe someone find it helpful:
In my opinion you should try sFlow or Flexible NetFlow.
SFlow uses a sampling to achieve scalability. System architecture consists receiving devices getting two types of samples:
-randomly sampling packets
-basis of sampling counters at certain time intervals
Sampled packets are sent as sFlow datagrams to a central server running the software for the analysis and reporting of network traffic, sFlow collector.
SFlow may be implemented in hardware or software, and while the name "sFlow" means that this is flow technology, however, this technology is not flow at all, and represents the transmission image on the basis of samples.
NetFlow is a real flow technology. Entries for the flow generated in the network devices and combined into packages.
Flexible NetFlow allows customers to export almost everything that passes through the router, including the entire package and doing it in real time, like sFlow.
In my opinion Flexible NetFlow is much better and if you're afraid of DDoS attack choose it.
If FNF is better why use sFlow? Cause many switches today only supports sFlow, and if we don't have possibility of use FNF and want to get real-time data sFlow is best option.

Flow based routing and openflow

This may not be the typical stackoverflow question.
A colleague of mine has been speculating that flow-based routing is going to be the next big thing in networking. Openflow provides the technology to use low cost switches in large application, IT data-centers, etc; replacing Cisco, HP, etc switch and routers. The theory is that you can create a hierarchy these openflow switches with simple configuration, eg. no spanning tree. Open flow will route each flow to the appropriate switch/switch-port, using only the knowledge of the hierarchy of switches (no routers). The solution is suppose to save enterprises money and simplify networking.
Q. He is speculating that this may dramatically change enterprise networking. For many reasons, I am skeptical. I would like to hear your thoughts.
OpenFlow is a research project from Stanford University led by professor Nick McKeown. In the original OpenFlow research paper, the goal of OpenFlow was to give researchers a way "to run experimental protocols in the networks they use every day." For years networking researchers have had an almost impossible task deploying and evaluating their ideas on real networks with real Ethernet switches and IP routers. The difficultly is that real switches and routers from companies like Cisco, HP, and others, are all closed, proprietary boxes that implement standard "protocols", like Ethernet spanning tree, and OSPF. There are business reasons why Cisco and HP won't let you run software on their switches and routers; there is no technical reason. OpenFlow was invented to solve a people problem: if Cisco is not willing to let you run code on their switch, maybe they can at least provide a very narrow interface to let you remotely configure their switch, and that narrow interface is called OpenFlow.
To my knowledge more than a dozen companies are currently implementing OpenFlow support for their switches. Some like HP are only providing the OpenFlow software for research purposes. Others like NEC are actually offering commercial support.
For academic researchers that want to evaluate new routing protocols in real networks, OpenFlow is a huge win. For switch vendors, it is less clear if OpenFlow support will help, hurt, or have no effect in the long run. After all, the academic research market is very small.
The reason why OpenFlow is most often discussed in the context of enterprise networks is that OpenFlow grew out of a previous research project called Ethane that used OpenFlow's mechanism of remotely programming switches in an enterprise network in order to centralize a security policy. Ethane, and by extension OpenFlow, has led directly to two startup companies: Nicira, founded by Martin Casado, and Big Switch Networks, founded by Guido Appenzeller. It would be easier to implement an Ethane-like system if all of the switches in the network supported OpenFlow.
Closely related to enterprise networks are data center networks, the networks that interconnect thousands to tens of thousands of servers in companies such as Google, Facebook, Microsoft, Amazon.com, and Yahoo!. One problem with Ethernet is that it does not scale to this many servers on the same Layer 2 network. We attempted to solve this problem in a research project called PortLand. We used OpenFlow to facilitate programming the switches from a central controller, which we called a Fabric Manager. We released the PortLand source code as open source.
However, we also found a limitation to OpenFlow's functionality. In another data center networking research project called Helios, we were not able to use OpenFlow because it did not provide a mechanism for bonding multiple switch ports into a Link Aggregation Group (LAG). Presumably one could extend the OpenFlow specification indefinitely until it all possible switch features become exposed.
There are other networks as well such as the Internet access networks, Internet backbones, home networks, wireless networks, cellular networks, etc. Researchers are trying to see where OpenFlow fits into all of these markets. What it really comes down to is the question, "what problem does OpenFlow solve?" Ethane makes a case for enterprise networks but I have not yet seen a compelling case for any other type of network. OpenFlow might be the next big thing, or it might end up being a case of "don't solve a people problem with a technical solution."
In order to assess the future of flow-based networking and OpenFlow, here’s the way to think about it.
It starts with the silicon trends: Moore’s Law (2X transistors per 18-24 months), and a correlated but slower improvement in the I/O bandwidth available on a single chip (roughly 2X every 30-36 months). You can now buy full-featured 10GbE single chip switches with 64 ports, and chips which have a mix of 40GbE and 10GbE ports with comparable total I/O bandwidth.
There are a variety of ways physically connect these in a mesh (ignoring the loop-free constraints of spanning tree and the way Ethernet learns MAC addresses). In the high performance computing (HPC) world, a lot of work has been done building clusters with InfiniBand and other protocols using meshes of small switches to network the compute servers. This is now being applied to Ethernet meshes. The geometry of a CLOS or fat-tree topology enables a two stage mesh with a large number of ports. The math is thus: Where n is the # of ports per chip, the number of devices you can connect in a two-stage mesh is (n*2)/2, and the number you can connect in a three-stage mesh is (n*3)/4. While with standard spanning tree and learning, the spanning tree protocol will disable the multi-path links to the second stage, most of the Ethernet switch vendors have some sort of multi-chassis Link Aggregation protocol which gets around the multi-pathing limitation. There is also standards work in this area. Although it might not be obvious, the vast majority of Link Aggregation schemes allocate traffic so all the frames of any given flow take the same path. This is done in order to minimize out-of-order frames so they don’t get dropped by some higher level protocol. They could have chosen to call this “flow based multiplexing” but instead they call it “link aggregation”.
Although the devil is in the details, there are a variety of data center operators and vendors that have concluded they don’t need to have large multi-slot chassis switches in the aggregation/core layer for server connect, instead using meshes of inexpensive 1U or 2U switches.
People have also concluded that eventually you need some kind of management station to set up the configuration of all the switches. Again, drawing from the experience with HPC and InfiniBand, they use what is called an InfiniBand Controller. In the telecom world, most telecom networks have evolved to separate the management and part of the control plane from the boxes that carry the data traffic.
Summarizing the points above, meshes of Ethernet switches with an external management plane with multipath traffic where flows are kept in order is evolutionary, not revolutionary, and is likely to become mainstream. At least one major company, Juniper, has made a big public statement about their endorsement of this approach. I'd call all of these "flow-based routing".
Juniper and other vendors’ proprietary approaches notwithstanding, this is an area that cries out for standards. The Open Networking Foundation (ONF), was founded to promote standards in this area, starting with OpenFlow. Within a couple of months, the sixty+ members of ONF will be celebrating their first year anniversary. Each member has, I am led to believe, paid tens of thousands of dollars to join. While the OpenFlow protocol has a ways to go before it is widely adopted, it has real momentum.
#Nathan: OpenFlow 1.1 actually adds some primitives that enable the use of multiple links via the Multipath Proposal.
An excellent view of OpenFlow by Simon Crosby
http://community.citrix.com/display/ocb/2011/03/21/The+Rise+of+the+Software+Defined+Network
More context on SDN which discusses IETF's SDN initiative and ONF's Openflow. Working in conjuction is a powerful combination http://bit.ly/A8xYso
Nathan, Excellent historical account and overview of openflow. Thanks!
You've hit on the points that I've been wrapping my head around as to why Openflow might not be widely adopted. Since it was designed to be open to allow researcher the ability to run experimental protocols and not necessarily be "compatible with" the big players Cisco/HP/etc. it puts itself into niche (although potentially big) market, more on this later. And as you've stated it's recieved some adoption in the "cloud data centers (CDC)" e.g. google, facebook, etc because they need to exploit experimental protocols to gain a competitive advantage or optimize for their application.
As you've stated some switch vendors have added openflow capability to capitalize on the niche need in academia and potentially sell into the CDC; google, facebook. This is potentially a big market (or bubble if you're pessimistic).
The problem that I see is that the majority of the market (80% or more) is enterprise IT data centers. The requirements here is for stable, compatible networking. Open and less expensive would be nice, but not at the cost of the former.
One could think of a day where corporate IT is partially or completely cloud-sourced where QoS is maintained by the cloud provider. In this case, experimental protocols could be leveraged to provide a competitive advantaged for speed or QoS. In which case; openflow could play a more dominant roll. I personally think this scenario is many years off.
So, the conclusion I come to is that other than in research and perhaps CDCs (google, facebook), the market is pretty small. I suppose that if researchers use openflow to come up with a better protocol for say link aggregation, or congestion management, then eventually Cisco and HP will provide those in their standard offering because their customers will demand it. So openflow could be a market influencer (via the research community), but it would not be a market disruptor.
Do you agree with my conclusions? Thanks for your input.

Lots of ports with little data, or one port with lots of data?

I've been checking out using a system called ROS (http://www.ros.org) for some work.
There are lots of different types of data that get sent between network nodes in ROS.
You define a struct of data that you want to send in a message, and ROS will handle opening a specific port between the two nodes that will only send that struct of data.
So if there are 5 different messages, there will be 5 different ports.
As opposed to this scenario, I have seen other platforms that just push all the different messages across one port. This means that there needs to be a sort of multiplexing/demultiplexing (done by some sort of message parsing on the receivers end).
What I wonder is... which is better from a performance perspective?
Do operating systems switch based on ports quickly, so that a system like ROS doesn't have to do too much work to work out what is in the message and interpreting it?
OR
Is opening lots of ports going to mean lots of slower kernel calls, and the cost of having to work out and translate message types end up being more then the time spent switching between ports?
When this scales to a large amount of data at high rates and lots of different messages types there will be lots of ports. So I imagine that when scaling each of these topologies that performance will be a big factor in selecting the way to work.
I should also point out that these nodes usually exist on one small network, or most of the time on the one machine in which networking is used as a force of inter-process communication. So the transmission time is only a very small factor in the overall system timing.
ROS being an architecture for robots may have one node for every sensor and actuator, so depending on the complexity of your system we may be talking about 20-30 nodes pushing small-ish (100bytes or so) data between 10-100Hz
It depends. I do not know the specifics of ROS but in networking it comes down to the following constraints:
Distance: speed of light is fast but over a distance it starts making a difference
Protocol Overhead: connection oriented vs. connection-less
On the OS side, maintaining a list of free ports isn't such much of an overhead - of course there is a cost to it but everything is relative: if you are talking about a distributed system with long distance links, then it is easy to argue that cycling through OS network ports ranks as lower concern compared to managing communication quality.
Without a more specific question, I'll stop here.
I don't have any data on this, but it seems plausible that multiple ports might be handled more efficiently by multi-core systems, as opposed to demultiplexing within the program.

Is there an algorithm for fingerprinting the TCP congestion control algorithm used in a captured session?

I would like a program for determining the TCP congestion control algorithm used in a captured TCP session.
The referenced Wikipedia article states:
TCP New Reno is the most commonly
implemented algorithm, SACK support is
very common and is an extension to
Reno/New Reno. Most others are
competing proposals which still need
evaluation. Starting with 2.6.8 the
Linux kernel switched the default
implementation from reno to BIC. The
default implementation was again
changed to CUBIC in the 2.6.19
version.
Also:
Compound TCP is a Microsoft
implementation of TCP which maintains
two different congestion windows
simultaneously, with the goal of
achieving good performance on LFNs
while not impairing fairness. It has
been widely deployed with Microsoft
Windows Vista and Windows Server 2008
and has been ported to older Microsoft
Windows versions as well as Linux.
What would be some strategies for determining which CC algorithm is in use (from a third party capturing the session)?
Update
This project has built a tool to do this:
The Internet has recently been
evolving from homogeneous congestion
control to heterogeneous congestion
control. Several years ago, Internet
traffic was mainly controlled by the
standard TCP AIMD algorithm, whereas
Internet traffic is now controlled by
many different TCP congestion control
algorithms, such as AIMD, BIC, CUBIC,
CTCP, HSTCP, HTCP, HYBLA, ILLINOIS,
LP, STCP, VEGAS, VENO, WESTWOOD+, and
YEAH. However, there is very little
work on the performance and stability
study of the Internet with
heterogeneous congestion control. One
fundamental reason is the lack of the
deployment information of different
TCP algorithms. The goals of this
project are to:
1) develop tools for identifying the TCP algorithms in the Internet,
2) conduct large-scale TCP-algorithm measurements in the Internet.
There are many more congestion control algorithms than you mention here, off the top of my head the list includes: FAST, Scalable, HSTCP, HTCP, Bic, Cubic, Veno, Vegas.
There are also small variations of them due to bug fixes in actual implementations and I'd guess that implementations in different OSes also behave slightly different from one another.
But if I need to try to come up with an idea it would be to estimate the RTT of the connection, you can try to look at the time it took between the third and the fourth packets, as the first and second packets may be tainted by ARPs and other discovery algorithms along the route.
After you have an estimate for RTT you could try to refine it along the way, I'm not exactly sure how you could do that though. But you don't require a full spec for the program, just ideas :-)
With the RTT figured out you can try to put the packets into RTT bins and count the number of in flight data packets in each bin. This way you'll be able to "plot" estimated-cwnd (# of packets in bin) to time and try some pattern matching there.
An alternative would be to go along the trace and try to "run" in your head the different congestion control algorithms and see if the decision at any point matches with the decision you would have done. It will require some leniency and accuracy intervals.
This definitely sounds like an interesting and challenging task!

Resources