I am writing a thesis on identifying patterns in network traffic. The input file contains thousands of data lines, each providing information such as timestamps, source and destination IP addresses, source and destination ports, interfaces, the number of bytes and packets exchanged between the source and the destination, and protocols. The start and end times are always the same within a data line.
My question is whether it is possible to assign all IP addresses to categories such as routers/servers/clients based only on the information provided, or whether other information is necessary to assign all addresses correctly. (About 100-150 ports are in use, both registered and unregistered.)
Thank you!
Your question is very broad because it depends a lot on what categories you have in mind. For example, what's your definition of a server? Anyway, technically NetFlow does not support any kind of endpoint type qualification, so you have to rely on statistics.
If a certain destination IP address receives a significant (absolute) amount of traffic on, for example, (destination) port 25, it is likely an SMTP server. The sender can perhaps be categorized as a client, unless it also receives a lot of SMTP traffic (in which case it would be relaying). Since NetFlow usually runs on routers (and less frequently on switches), your NetFlow origin IP address is likely a router.
Large amounts of traffic to or from an IP address on a specific port will likely mark that IP address as a server. You have to determine the thresholds for that and, if needed, the type of server. SMTP could also run on a non-standard port (e.g. 80); that is less likely, but you could possibly detect it by measuring the amount of ingress vs egress data. My guess would be that several standard protocols have identifiable ratios here.
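As a rough illustration of the statistical approach, here is a minimal sketch. The record layout, the sample flows, the thresholds and the label names are all made up for the example; real NetFlow exports would need parsing first, and the thresholds would have to be tuned to your data.

```python
from collections import defaultdict

# Hypothetical flow records: (src_ip, dst_ip, src_port, dst_port, bytes)
FLOWS = [
    ("10.0.0.5", "10.0.1.25", 49152, 25, 12000),
    ("10.0.0.6", "10.0.1.25", 49153, 25, 9000),
    ("10.0.0.7", "10.0.1.25", 49154, 25, 15000),
    ("10.0.0.5", "10.0.1.80", 49155, 80, 700),
]

def classify(flows, min_bytes=20000, min_peers=2):
    """Label an IP as a server when one of its ports receives at least
    min_bytes from at least min_peers distinct sources (assumed thresholds)."""
    bytes_to = defaultdict(int)   # (dst_ip, dst_port) -> total bytes received
    peers_of = defaultdict(set)   # (dst_ip, dst_port) -> distinct source IPs
    seen = set()
    for src, dst, _sport, dport, nbytes in flows:
        seen.update((src, dst))
        bytes_to[(dst, dport)] += nbytes
        peers_of[(dst, dport)].add(src)

    # Everything starts out unclassified; only strong evidence promotes it.
    labels = {ip: "client/unknown" for ip in seen}
    for (ip, port), total in bytes_to.items():
        if total >= min_bytes and len(peers_of[(ip, port)]) >= min_peers:
            labels[ip] = f"server (port {port})"
    return labels

if __name__ == "__main__":
    for ip, label in sorted(classify(FLOWS).items()):
        print(ip, label)
```

Requiring several distinct peers helps to avoid labelling a client as a server just because one reply flow targets its ephemeral port. Comparing ingress and egress bytes per port could be added along the same lines.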
Related
I'm testing how Windows handles the IPv4 ID field. I need to generate 10,000 TCP or UDP packets per second with different source IPs (my netmask is 255.255.0.0, so there are 2^16 IPs available, although that limit doesn't really apply here since I can do IP spoofing). I know that I can change the count parameter in scapy.sendrecv.send to generate a large number of packets at once, but then all the packets have the same configuration. I also want to occasionally pick out some responses to check the status.
I'm currently thinking about using multithreading, but I'm not sure how to do that. Can anyone give me a structure to start with?
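One possible starting structure, as a sketch only: worker threads each build packet descriptions with a random source address from the /16 and hand them to a send callback, and every N-th result is put on a queue for inspection. The network 10.1.0.0/16, the destination, the port choices and the `send` callback are assumptions for illustration; the callback is where the actual scapy call would go. Plain Python threads may well not reach 10,000 packets per second, so treat this as scaffolding rather than a tuned generator.

```python
import ipaddress
import queue
import random
import threading

def worker(network, dst, count, send, sampled, sample_every):
    """Build `count` packet specs with random source IPs and pass them to send()."""
    hosts = network.num_addresses - 2  # skip network and broadcast addresses
    for i in range(count):
        src = network.network_address + random.randint(1, hosts)
        spec = {
            "src": str(src),
            "dst": dst,
            "sport": random.randint(1024, 65535),
            "dport": 80,
        }
        reply = send(spec)  # hypothetical callback: build and send the real packet here
        if i % sample_every == 0:
            sampled.put((spec, reply))  # keep an occasional response for checking

def run(send, threads=4, per_thread=2500, sample_every=500):
    network = ipaddress.ip_network("10.1.0.0/16")  # assumed address range
    sampled = queue.Queue()
    pool = [
        threading.Thread(
            target=worker,
            args=(network, "10.1.0.1", per_thread, send, sampled, sample_every),
        )
        for _ in range(threads)
    ]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    # All workers have finished, so draining by qsize() is safe here.
    return [sampled.get() for _ in range(sampled.qsize())]

if __name__ == "__main__":
    # Stand-in callback that only pretends to send.
    samples = run(lambda spec: "no reply (dry run)")
    print(len(samples), "sampled results")
```

The callback keeps the threading structure separate from the packet library, so the skeleton can be exercised with a dummy function before any real traffic is sent.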
Not sure how else to phrase the question so that's why it's phrased the way it is. I have these two questions:
Is it possible for two TCP segments with source port 80 to be sent by different processes at the sending host?
Is it possible for two UDP segments with source port 5723 to be sent by different processes at the same host?
I was unsure of the answer at first, but I believe the answer to both is no, it isn't possible. The reason, in the case of TCP, is that there is no way to uniquely identify the segment, because the 4-tuple (source port, dest port, source IP, and dest IP) will be the same across both processes, which means there is no way to distinguish between the segments.
Similarly, for UDP, the IP datagram will carry the source/dest IP, but these will be the same. The UDP segment will carry the source port/dest port, but again, these will be the same. This means there is no way to distinguish between segments for either protocol.
Possible solutions are to run the processes on two separate clients (which would mean separate IPs, solving the problem in both scenarios), or to run the processes on the same host with different ports.
Please let me know if this is correct; if I'm way off, please tell me why. Thank you for your time!
There's a related question: TCP: can two different sockets share a port?
This piece is relevant there:
a given socket connection is uniquely identified by a combination of transport protocol, client IP+port, and server IP+port. Multiple clients can connect to the same server IP+port only if their client IP+port are different from each other
So I think you're mostly right, but there can be special circumstances caused by SO_REUSEADDR or SO_REUSEPORT which might allow multiple different processes to reuse the same port: TCP - possible for same client-side port to be used for different connections by different applications simultaneously?
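The quoted rule can be observed on the loopback interface. The following is a small sketch, assuming loopback networking is available: one listening socket accepts two clients, and both accepted sockets report the same local port while differing in the peer port. A server that hands each accepted socket to a separate worker process would therefore have different processes sending segments with the same source port. The function name `shared_port_demo` is just for this example.

```python
import socket

def shared_port_demo():
    """Accept two connections on one port and report the ports involved."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(2)
    server_addr = listener.getsockname()

    # Two clients connect to the same server IP and port.
    clients = [socket.create_connection(server_addr) for _ in range(2)]
    accepted = [listener.accept()[0] for _ in range(2)]

    # Server-side sockets share the local port; the peer port tells them apart.
    local_ports = [s.getsockname()[1] for s in accepted]
    peer_ports = [s.getpeername()[1] for s in accepted]

    for s in clients + accepted + [listener]:
        s.close()
    return server_addr[1], local_ports, peer_ports

if __name__ == "__main__":
    port, local_ports, peer_ports = shared_port_demo()
    print("listening port:", port)
    print("local ports of accepted sockets:", local_ports)
    print("peer ports:", peer_ports)
```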
I understand that a switch is different from a hub in that, instead of packets being broadcast to all devices connected to it, it knows exactly who requested the packet by looking at the MAC layer.
However, is it still possible to use a packet sniffer like Wireshark to intercept packets meant for other users of the switch? Or is this only a problem with ethernet hubs that doesn't affect switches due to the nature of how a switch works?
On a slightly off-topic side note, what exactly is classified as a LAN? For example, imagine two separate Ethernet switches are hooked up to a router. Would each switch be considered a separate LAN? What is the significance of having multiple LANs within the same network?
it knows exactly who requested the packet by looking at the MAC layer.
More exactly, the switch uses the MAC destination address to forward a frame to the port associated with that address. Addresses are automatically learned by looking at the MAC source address on received frames.
A switch is stateless, i.e. it has no memory of who requested which data. A layer-2 switch also has no understanding of IP packets, addresses or protocols. All a basic switch does is learn source addresses and forward by destination address.
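The learn-and-forward behaviour can be sketched in a few lines. This is a toy model, not how any particular switch is implemented: the class name, the port numbers and the shortened MAC strings are invented for the example, and table aging is left out. The only thing it remembers is which port each source address was last seen on.

```python
class LearningSwitch:
    """Toy layer-2 switch: learn source MACs, forward by destination MAC."""

    def __init__(self, ports):
        self.ports = set(ports)
        self.mac_table = {}  # MAC address -> port it was last seen on

    def receive(self, in_port, src_mac, dst_mac):
        """Return the list of ports the frame is sent out of."""
        # Learn: the source address lives behind the ingress port.
        self.mac_table[src_mac] = in_port

        out_port = self.mac_table.get(dst_mac)
        if out_port is None:
            # Unknown destination: flood to every port except the ingress port.
            return sorted(self.ports - {in_port})
        if out_port == in_port:
            # Destination is on the same port the frame came from: drop it.
            return []
        return [out_port]

if __name__ == "__main__":
    sw = LearningSwitch([1, 2, 3])
    print(sw.receive(1, "aa", "bb"))  # "bb" not learned yet, so the frame is flooded
    print(sw.receive(2, "bb", "aa"))  # "aa" was learned on port 1
    print(sw.receive(1, "aa", "bb"))  # "bb" is now known on port 2
```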
is it still possible to use a packet sniffer like Wireshark to intercept packets meant for other users of the switch?
Yes. You'll need a managed switch supporting port mirroring or SPANning. This doesn't intercept frames, it just copies them to the mirror port. If you need to actually intercept frames you have to put your interceptor in between the nodes (physically or logically).
With a repeater hub, every bit is repeated to every node in the collision domain, making monitoring effortless.
what exactly is classified as a LAN?
This depends on who you ask and on the context. A LAN can be a layer-1 segment/bus aka collision domain (obsolete), a layer-2 segment (broadcast domain), a layer-3 subnet (mostly identical with an L2 segment) or a complete local network installation (when contrasted with SAN or WAN).
Adding to @Zac67:
Regarding this question:
is it still possible to use a packet sniffer like Wireshark to intercept packets meant for other users of the switch?
There are also active ways to trick the switch into sending you data that is meant for other machines. By exploiting the switch's learning mechanism, one can send a frame with a spoofed source MAC, and the switch will then forward frames destined for that MAC to the sender's port (until someone else sends a frame with that MAC address).
This video discusses this in detail:
https://www.youtube.com/watch?v=YVcBShtWFmo&list=PL9lx0DXCC4BMS7dB7vsrKI5wzFyVIk2Kg&index=18
In general, I recommend the following video that explains this in detail and in a visual way:
https://www.youtube.com/watch?v=Youk8eUjkgQ&list=PL9lx0DXCC4BMS7dB7vsrKI5wzFyVIk2Kg&index=17
what exactly is classified as a LAN?
So indeed this is one of the least-well-defined terms in Computer Networks. With regards to the Data Link Layer, a LAN can be defined as a segment, that is - a broadcast domain. In this case, two devices are regarded as part of the same segment iff they are one hop away from one another - that is, they can switch frames in the second layer.
While reading a book on TCP/IP I came across the following passage: "Although it looks as though the use of the flow label may make the source and destination addresses useless, the parts of the Internet that use connection-less service at the network layer still keep these addresses for several reasons. One reason is that part of the packet path may still be using the connection-less service. Another reason is that the protocol at the network layer is designed with these addresses and it may take a while before they can be changed".
My first question: if a connection has been established between hosts in a connection-oriented manner, how can part of a packet's path still be using connection-less service? As far as I know, a virtual path is always set up while the 3-way handshake takes place, and that handshake establishes the TCP/IP connection (which is a connection-oriented service).
My second question concerns the second reason: which protocol are they talking about? These words appear under the heading "Connection-Oriented Services", so I'm frustrated that I can't work out the core concept behind them. Please correct me if I have the wrong idea anywhere. I'd be obliged. Thanks.
TCP as a connection-oriented protocol runs on top of IP which is connection-less. The routers used in transport only look at the IP packet, the TCP segment is simply payload and transported along. TCP provides several algorithms to form a virtual connection over a connection-less network.
The IP packet goes from hop to hop. On each hop, a router makes a forwarding decision solely based on the destination IP address. (More sophisticated devices may inspect more packet elements including source address and payload, but they aren't simple routers.)
The "path" is made up of all these individual hops. Because each hop is based on an independent routing decision the path can change at any time and for any packet. The path is not laid out by the TCP handshake.
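To make the per-hop decision concrete, here is a small sketch. The router names, prefixes and tables are invented, and real routers do considerably more: each router looks up only the destination address (longest matching prefix wins) and hands the packet to the next hop. Changing a single table changes the path, and no handshake is involved anywhere.

```python
import ipaddress

# Hypothetical routing tables: prefix -> next hop ("deliver" = directly attached)
TABLES = {
    "R1": {"0.0.0.0/0": "R2", "203.0.113.0/24": "R3"},
    "R2": {"0.0.0.0/0": "R4"},
    "R3": {"203.0.113.0/24": "R4"},
    "R4": {"203.0.113.0/24": "deliver"},
}

def next_hop(table, dst):
    """Longest-prefix match on the destination address only."""
    addr = ipaddress.ip_address(dst)
    matches = [
        (ipaddress.ip_network(prefix), hop)
        for prefix, hop in table.items()
        if addr in ipaddress.ip_network(prefix)
    ]
    if not matches:
        return None
    return max(matches, key=lambda m: m[0].prefixlen)[1]

def trace(tables, start, dst, max_hops=16):
    """Follow independent per-router decisions and return the routers visited."""
    path = [start]
    node = start
    for _ in range(max_hops):
        hop = next_hop(tables[node], dst)
        if hop is None or hop == "deliver":
            break
        path.append(hop)
        node = hop
    return path

if __name__ == "__main__":
    print(trace(TABLES, "R1", "203.0.113.7"))
    # Withdraw the specific route on R1: later packets take another path.
    changed = dict(TABLES)
    changed["R1"] = {"0.0.0.0/0": "R2"}
    print(trace(changed, "R1", "203.0.113.7"))
```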
Basically, you have to look at each protocol layer individually. Each one serves its own function.
I hope this also answers the second question.
I need to simulate a massive amount of TCP/IP ethernet traffic. For example, I want to simulate the environment that an ISP has where there might be 40,000 different IP addresses sending TCP/UDP IP traffic to different remote hosts. This is my ideal setup:
Traffic generator -> the device I want to test (one inbound interface and one outbound interface) -> traffic receiver.
The device I want to test is a network traffic monitor/QOS appliance. It effectively sits 'in-line', one interface would be connected to the traffic generator and the other interface connected to the traffic receiver. This in-line interface is effectively a bridge and is not assigned an IP address. It can monitor & apply QOS rules on all traffic passing over that bridge interface.
Layer 4 control is important, so that I can set port numbers (80, 443, 22 etc). Layer 7 application information would be ideal as the device I am testing also does deep packet inspection.
Methods I have already tried include iperf, but in order to simulate 40,000 IP addresses I would need to manually configure 40,000 virtual interfaces on both the traffic generator and the traffic receiver, and I have found that iperf is limited to about 1000 simultaneous connections (on my setup). I have also tried replaying large PCAP files, but then I do not have control over the packets to test QOS capabilities.
Other software/solutions I have looked into are:
http://mininet.org/ (can't handle the amount of connections I need).
ns-3
I am looking for someone to point me in the right direction. Thank you.
There are commercial products for this kind of thing. The only alternative I can think of is a home-brew setup combining apache bench, siege, and tcpreplay, which would take significant effort to implement.
See www.spirent.com or www.ixiacom.com.