How to find states given TCP flags as observable states in HMM - tcp

I am implementing an HMM (hidden Markov model). I have obtained a dataset of TCP flags such as Synchronized, Reset, Acknowledgement, FIN/ACK, PUSH/ACK.
The problem is that I have to find the number of states so that I can calculate the conditional, transition, and emission probabilities.
I assumed an arbitrary number of states, treating the TCP flags as observables, and used the Baum-Welch algorithm to calculate the transition and emission probabilities. But with an arbitrarily chosen number of states, we do not know whether the output is accurate.
So we are trying to find a better way to determine the number of states and, specifically, which states to use.
We are trying to implement the following paper: Adaptive IDS using hybrid approach.
Any help would be appreciated.
Thanks in advance!

Things are easier than that, or I do not understand the question.
SYN, SYN/ACK, ... are TCP flags. Interpret them as a first classification of TCP messages, that is, as TCP message types. These are the events in the TCP finite-state machine.
The states of the TCP finite-state machine are CLOSE_WAIT, FIN_WAIT_1, ... . In total, RFC 793 lists 11 states, counting the fictional CLOSED state.
If you search for "tcp state machine" in Google Images, you will easily find a drawing of the state machine. For example: http://www.ssfnet.org/Exchange/tcp/Graphics/tcpStateDiagram1.gif
Synchronized is neither a TCP flag nor a state; the flag is SYN (synchronize).
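
As a rough illustration of the mapping described above, the sketch below uses the connection states listed in RFC 793 as the hidden states and the flag combinations from the question as the observation symbols. The symbol spellings, the helper names `encode` and `random_stochastic_rows`, and the random starting matrices are assumptions made for this example; they are not taken from the paper mentioned in the question.

```python
import random

# Hidden states: the TCP connection states from RFC 793.
TCP_STATES = [
    "CLOSED", "LISTEN", "SYN_SENT", "SYN_RECEIVED", "ESTABLISHED",
    "FIN_WAIT_1", "FIN_WAIT_2", "CLOSE_WAIT", "CLOSING", "LAST_ACK",
    "TIME_WAIT",
]

# Observation symbols: flag combinations as they might appear in the
# dataset (assumed spelling, adjust to match your capture).
OBSERVATIONS = ["SYN", "RST", "ACK", "FIN/ACK", "PSH/ACK"]
OBS_INDEX = {name: i for i, name in enumerate(OBSERVATIONS)}


def encode(flag_sequence):
    """Turn a sequence of flag names into integer observation indices."""
    return [OBS_INDEX[flag] for flag in flag_sequence]


def random_stochastic_rows(n_rows, n_cols, rng):
    """Random matrix whose rows each sum to 1 (a Baum-Welch starting point)."""
    rows = []
    for _ in range(n_rows):
        weights = [rng.random() + 1e-3 for _ in range(n_cols)]
        total = sum(weights)
        rows.append([w / total for w in weights])
    return rows


rng = random.Random(0)
transition = random_stochastic_rows(len(TCP_STATES), len(TCP_STATES), rng)
emission = random_stochastic_rows(len(TCP_STATES), len(OBSERVATIONS), rng)
observed = encode(["SYN", "ACK", "PSH/ACK", "FIN/ACK"])
```

With the number of states fixed this way, Baum-Welch only has to estimate the entries of `transition` and `emission` from sequences such as `observed`. The starting rows are random rather than uniform because identical rows would leave the hidden states indistinguishable to the re-estimation step.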

Related

UnitDiskRadioMedium no power consumption settings? (omnetpp)

Looking at:
OMNET++: How to obtain wireless signal power?
and
https://github.com/inet-framework/inet/blob/master/examples/wireless/scaling/omnetpp.ini
there seem to be no power consumption related settings to packets that are sent in a UnitDiskRadio.
Is there a way of setting packet power consumption in a unit disk radio medium, or, conversely, communication range in ApskScalarRadioMedium?
UnitDiskRadio is a simplified radio model for cases where you are not interested in the details of transmission, propagation, attenuation, etc. You just want a clear-cut transmission distance: beyond it the transmission always fails, within it the transmission always succeeds. This is simple, fast, and suitable if you want to simulate high-level behavior such as the application or routing layer. In that case you really don't care how much your radio draws from a power grid (or battery).
On the other hand, if you are interested in low-level details, the whole radio transmission process should be modeled. In this case you model the power draw and, based on that, the transmission, and there is no clear-cut transmission range. Whether a transmission succeeds is a probabilistic outcome depending on power, antenna configuration, encoding, modulation, noise, and a lot of other factors, so you cannot set it as a simple "range".
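
To make the contrast between the two kinds of model concrete, here is a minimal sketch in plain Python. It is not INET code: the function names are made up for this example, and the free-space (Friis) formula with unity antenna gains merely stands in for the much richer propagation and reception models a scalar radio would use.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # m/s


def unit_disk_received(distance_m, communication_range_m):
    """Unit disk model: reception depends only on a hard distance cutoff."""
    return distance_m <= communication_range_m


def free_space_rx_power_dbm(tx_power_dbm, distance_m, frequency_hz):
    """Received power under free-space path loss (unity antenna gains)."""
    path_loss_db = 20 * math.log10(
        4 * math.pi * distance_m * frequency_hz / SPEED_OF_LIGHT
    )
    return tx_power_dbm - path_loss_db


# Unit disk: a yes/no answer, no power involved.
inside = unit_disk_received(99.0, 100.0)
outside = unit_disk_received(101.0, 100.0)

# Power-based: the signal just gets weaker with distance; whether it is
# decodable then depends on noise, modulation, sensitivity and so on.
near = free_space_rx_power_dbm(0.0, 10.0, 2.4e9)
far = free_space_rx_power_dbm(0.0, 100.0, 2.4e9)
```

In the second model there is no single distance at which reception flips from success to failure, which is why a "range" cannot be configured there.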
TLDR: No, you cannot set both of them on the same radio.
PS: make sure that you do not mix up the various power parameters. The first question you linked is about getting the power of a received packet (i.e. how strong the signal was when it was received). The second link shows how to configure the transmission power (what goes out on the antenna). In your question you are referring to power consumption, which is a third thing: how much you draw from a battery to make the transmission. They are NOT the same thing.

Packet Loss ratio in VEINS/Omnet++

I am new to VEINS/Omnet++ and trying various broadcast suppression techniques and would like to calculate the packet loss ratio. I assume I have to use this formula :
Packet Loss Ratio = TotalLostPackets / SentPackets
But since some nodes send 0 packets, is there an easy way to specify this in the Omnet++ .anf config file or maybe in VEINS without doing manual adjustments? Otherwise, if any node sends 0 packets, the ratio is a division by zero and all graphs show infinity.
Thank you!
This does not directly answer your question, but I would warn against using this equation in a simulation where not all nodes might send the same number of packets or where broadcasts are sent. Each packet sent as a broadcast can potentially be received by many other nodes meaning that even a simulation where only 1 packet is sent might also record 7 successful receptions and 5 packet losses. Your equation would calculate the loss rate as 5/1=500% whereas I would find a rate of 5/12=42% more reasonable.
As a side effect of calculating loss rate as "fail/(success+fail)" you will not need to take special care for nodes that did not send/receive packets.
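
A minimal sketch of that calculation, assuming you have already extracted per-node counts of lost and successfully received packets from the simulation results. The function name and the choice to return `None` for a node with no reception events at all are assumptions of this example.

```python
def loss_ratio(lost, received):
    """Loss rate as fail / (success + fail); None if nothing was observed."""
    total = lost + received
    if total == 0:
        return None  # node saw no reception events at all
    return lost / total


# The broadcast example from above: 7 successful receptions, 5 losses.
example = loss_ratio(lost=5, received=7)
idle_node = loss_ratio(lost=0, received=0)
```

Because the denominator counts reception events rather than sent packets, a node that sent nothing no longer causes a division by zero.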

classification of user browsing activity using machine learning

If you record all IP traffic (using Wireshark or a similar program) while browsing the internet, you'll find many packets that were not sent as part of your browsing activity.
My question is:
If you wish to classify the packets (sent from your PC) into two groups:
1) packets sent as part of your browsing activity
2) all other packets
how would you use machine learning to solve this?
You can assume the packet payload can't be used for this purpose because it's either encapsulated or encrypted, so only packet headers can be used, e.g. TCP window size, TCP flag bits, packet length and packet direction.
Sounds like a binary classification problem.
There are three basic approaches you might use:
1) Collect packets you can manually label as "browsing activity" or "others" and train a binary classifier on them (such as an SVM).
2) Collect only packets which are "browsing activity" and train a one-class classifier on them (such as a one-class SVM).
3) Just collect all the data you can and try to cluster it into two clusters. There is a (very small, unfortunately!) chance that the division found will be the one you are looking for.
In each of the above cases you will need to prepare a set of features to represent your data. So either use a fixed set of features, or try to simply treat the packet header as raw text and train some text-based model, such as a convolutional neural network.
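
For the fixed-feature option, a sketch of turning the header fields named in the question into a numeric vector might look like this. The `PacketHeader` class, its field names, and the feature order are hypothetical; only the flag bit positions follow the TCP header layout.

```python
from dataclasses import dataclass

# Bit positions of the classic TCP flags within the flags byte.
FLAG_BITS = {
    "FIN": 0x01, "SYN": 0x02, "RST": 0x04,
    "PSH": 0x08, "ACK": 0x10, "URG": 0x20,
}


@dataclass
class PacketHeader:
    """Hypothetical container for the header fields of one packet."""
    length: int        # packet length in bytes
    window_size: int   # TCP window size
    tcp_flags: int     # raw TCP flags byte
    outbound: bool     # True if sent from the PC


def to_features(pkt):
    """Fixed-length feature vector: length, window, direction, 6 flag bits."""
    flags = [1.0 if pkt.tcp_flags & bit else 0.0 for bit in FLAG_BITS.values()]
    direction = 1.0 if pkt.outbound else 0.0
    return [float(pkt.length), float(pkt.window_size), direction] + flags


# A SYN/ACK packet (flags byte 0x12) sent from the PC.
vector = to_features(PacketHeader(60, 65535, 0x12, True))
```

Vectors like this one, paired with labels for approach 1 or left unlabeled for approaches 2 and 3, can then be fed to whichever classifier or clustering method you choose. In practice you would probably also scale the length and window features.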

Zigbee beaconing vs non beaconing

When using a non-beaconing Zigbee network, I know that the 802.15.4 spec defines the use of CSMA-CA to control when devices get access to a channel, to make sure no two nodes "step on each other's toes", so to speak. My understanding is that, very simply, it requires each node to "listen before talking". Is that correct? Is there more information on the Zigbee implementation of this? In other words, where do I go to learn more about how to program a Zigbee chip to implement the same?
Also, if I have 20 end nodes sending data asynchronously to one coordinator, is the channel access mechanism enough to ensure that they do not transmit at the same time and flood the coordinator? If five nodes (for example) attempt to transmit at the same time, how will mutual exclusion be ensured? Where can I get some details on that?
Thanks
Rishi
The maximum size of an 802.15.4 PHY payload is 127 bytes (about 1 kbit). So the maximum duration of a frame (at the standard 250 kbps rate on the 2.4 GHz band) is under 5 ms once you take the preamble etc. into account. If your end devices are polling at 1 poll/second, it should easily manage 20 end nodes, I think. If the load gets too high, the exponential backoff should ease the collision rate.
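
The arithmetic behind that estimate can be sketched as follows, assuming a maximum PHY payload of 127 bytes and 6 bytes of synchronization and PHY header (4 preamble, 1 start-of-frame delimiter, 1 length byte) on the 2.4 GHz PHY. Acknowledgements, interframe spacing, and backoff time are deliberately ignored, so this is a lower bound on the channel time used.

```python
PHY_OVERHEAD_BYTES = 6    # assumed: 4 preamble + 1 SFD + 1 length byte
MAX_PSDU_BYTES = 127      # assumed maximum PHY payload
BIT_RATE = 250_000        # bits per second on the 2.4 GHz band


def airtime_ms(psdu_bytes=MAX_PSDU_BYTES):
    """Time on air for one frame, in milliseconds."""
    return (PHY_OVERHEAD_BYTES + psdu_bytes) * 8 / BIT_RATE * 1000


def channel_utilisation(nodes, frames_per_second, psdu_bytes=MAX_PSDU_BYTES):
    """Fraction of each second spent transmitting, ignoring ACKs and backoff."""
    return nodes * frames_per_second * airtime_ms(psdu_bytes) / 1000


longest_frame_ms = airtime_ms()
load = channel_utilisation(nodes=20, frames_per_second=1)
```

Under these assumptions, 20 nodes each sending one maximum-size frame per second occupy well under a tenth of the channel.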
I'm sure you've seen these when searching, but just in case:
http://www.prismmodelchecker.org/casestudies/zigbee.php
http://www.dagstuhl.de/Materials/Files/07/07101/07101.FruthMatthias.Slides.pdf
http://www-public.it-sudparis.eu/~gauthier/Tools/802_15_4_MAC_PHY_Usage.pdf
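
For the "listen before talking" part of the question, here is a simplified sketch of the unslotted CSMA-CA procedure used in non-beaconing mode. The default values (minimum backoff exponent 3, maximum 5, at most 4 additional backoffs, 320 microsecond backoff unit on the 2.4 GHz PHY) are the ones I believe the standard specifies, so check them against your chip's documentation. The `channel_is_idle` callback is a stand-in for the radio's clear channel assessment, and the function only adds up the waiting time instead of actually sleeping.

```python
import random

UNIT_BACKOFF_US = 320  # assumed: 20 symbols of 16 microseconds each


def csma_ca_attempt(channel_is_idle, rng=random,
                    min_be=3, max_be=5, max_backoffs=4):
    """Return (success, total_backoff_us) for one channel access attempt."""
    backoffs = 0
    exponent = min_be
    waited_us = 0
    while True:
        # Wait a random number of backoff units in [0, 2^exponent - 1].
        waited_us += rng.randint(0, 2 ** exponent - 1) * UNIT_BACKOFF_US
        if channel_is_idle():
            return True, waited_us   # channel free: transmit now
        backoffs += 1
        exponent = min(exponent + 1, max_be)
        if backoffs > max_backoffs:
            return False, waited_us  # give up: channel access failure


ok, waited = csma_ca_attempt(lambda: True, rng=random.Random(0))
```

Note that this gives no hard mutual exclusion. Two nodes can still pick the same backoff and collide, which is why frames are normally acknowledged and retransmitted at the MAC layer.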

How many times should I retransmit a packet before assuming that it was lost?

I've been creating a reliable networking protocol similar to TCP, and was wondering what a good default value for the retransmit threshold on a packet should be (the number of times I resend the packet before assuming that the connection is broken). How can I find the optimal number of retries on a network? Not all networks have the same reliability, so I'd imagine this "optimal" value would vary between networks. Is there a good way to calculate the optimal number of retries? Also, how many milliseconds should I wait before retrying?
This question cannot be answered as presented as there are far, far too many real world complexities that must be factored in.
If you want TCP, use TCP. If you design a custom transport-layer protocol, you will most likely do worse than the 40 years of cumulative experience coded into TCP.
If you don't look at the existing literature, you will miss a good hundred design considerations that will never occur to you sitting at your desk.
I ended up allowing the application to set this value, with a default value of 5 retries. This seemed to work across a large number of networks in our testing scenarios.
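
A minimal sketch of such an application-configurable retry limit is shown below. Only the default of 5 retries comes from the answer above. The `send_once` callback, the one-second initial timeout, and doubling the timeout after every failed attempt (similar in spirit to TCP's retransmission timer backoff) are assumptions of this example.

```python
def send_reliably(send_once, max_retries=5, initial_timeout=1.0, backoff=2.0):
    """Send a packet, retransmitting up to max_retries times.

    send_once(timeout) must transmit the packet and return True if an
    acknowledgement arrived within `timeout` seconds.
    Returns the number of retransmissions that were needed.
    """
    timeout = initial_timeout
    for attempt in range(1 + max_retries):  # first send plus the retries
        if send_once(timeout):
            return attempt
        timeout *= backoff  # wait longer before the next attempt
    raise ConnectionError(f"no acknowledgement after {max_retries} retries")


# Example with a fake link that drops the first two transmissions.
seen_timeouts = []


def flaky_link(timeout):
    seen_timeouts.append(timeout)
    return len(seen_timeouts) > 2


retransmissions = send_reliably(flaky_link)
```

Keeping `max_retries` and the timeout parameters as arguments lets each application tune them for its own network instead of relying on a single "optimal" value.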
