Subtract value from data with keys through multiple columns in R

I have two data.tables with multiple columns as keys (the key consists of the columns record, dstPort, srcPort, proto, dstIP, and srcIP).
Both have the same format.
dataset_1:
record dstPort srcPort proto dstIP srcIP state timestamp
1: state 80 32768 tcp 192.168.101.5 192.168.101.89 syn 1466580661185059
2: state 80 32768 tcp 192.168.101.5 192.168.101.89 syn_ack 1466520661604781
3: state 80 32768 tcp 192.168.101.5 192.168.101.89 close 1466532661885439
4: state 80 55555 tcp 192.168.101.5 192.168.101.89 syn 1466532661885440
and dataset_2:
record dstPort srcPort proto dstIP srcIP state timestamp
1: state 80 32768 tcp 192.168.101.5 192.168.101.89 established 1466537661727619
2: state 80 32768 tcp 192.168.101.5 192.168.101.89 close 1466532661986891
3: state 80 44444 tcp 192.168.101.5 192.168.101.89 established 1466537661727619
The following is what I would like to do for every key in the dataset:
I want to find the rows that share the same key and have a given state (e.g. state syn in dataset_1 and state established in dataset_2).
For these rows I want to subtract the timestamps from each other.
For example, take this key in dataset_1:
state 80 32768 tcp 192.168.101.5 192.168.101.89 for state syn gives timestamp 1466580661185059
and the same key in dataset_2:
state 80 32768 tcp 192.168.101.5 192.168.101.89 for state established gives timestamp 1466537661727619
After subtracting timestamps:
1466580661185059-1466537661727619 = 42999457440
A key in dataset_1 may have no matching record in dataset_2. This is why approaches that rely on sorting do not work (and all my attempts were based on sorting).
One example attempt (it assumed both tables were sorted and aligned, which is no longer possible):
dt_state1 <- subset(dt, state == 'established')
dt_state2 <- subset(dt, state == 'syn')
dt_delta_test <- data.table(x=(dt_state1$timestamp/1000)- (dt_state2$timestamp/1000),'timestamp'= dt_state1$timestamp-min(dt_state1$timestamp))
Update 1:
@lmo:
F1_in = as.data.table(read.csv(file=Filename, header=TRUE, sep=","))
keys=c("record","dstPort","srcPort","dstIP","srcIP")
state1 = 'syn'
state2 = 'established'
dt_state1 <- subset(F1_in, state == state2)
setkey(dt_state1, keys)
Error in setkeyv(x, cols, verbose = verbose, physical = physical) : some columns are not in the data.table: keys
dt_state2 <- subset(F1_in, state == state1)
setkey(dt_state2, keys)
Error in setkeyv(x, cols, verbose = verbose, physical = physical) : some columns are not in the data.table: keys
dt_state1[dt_state2, timestamp - i.timestamp]
Error in `[.data.table`(dt_state1, dt_state2, timestamp - i.timestamp) :
When i is a data.table (or character vector), x must be keyed (i.e. sorted, and, marked as sorted) so data.table knows which columns to join to and take advantage of x being sorted. Call setkey(x,...) first, see ?setkey.
I don't know why this error occurs.
@toni057: Your solution does not change anything for me (I had to make some changes because it threw errors). I tried the following code:
F1_in = as.data.table(read.csv(file=Filename, header=TRUE, sep=","))
keys=c("record","dstPort","srcPort","dstIP","srcIP")
state1 = 'syn'
state2 = 'established'
dt_state1 <- subset(F1_in, state == state2)
setkey(dt_state1, keys)
dt_state2 <- subset(F1_in, state == state1)
setkey(dt_state2, keys)
dt_state1 %>%
filter("state" == 'syn') %>%
left_join(filter(dt_state2, "state" == 'established'), by = keys) %>%
mutate(timestamp_diff = timestamp.x - timestamp.y)
I also changed the data.table used in the second filter, but dt_state1 does not change at all.

If your goal is to take differences of the timestamps between the two data.tables where they share the same key, you can join them on that key and then calculate the difference:
# get stuff set up
library(data.table)
# convert data.frames to data.tables by reference
setDT(dt_state1)
setDT(dt_state2)
# set keys
setkey(dt_state1, record, dstPort, srcPort, proto, dstIP, srcIP)
setkey(dt_state2, record, dstPort, srcPort, proto, dstIP, srcIP)
# perform left join and get timestamp difference
dt_state1[dt_state2, timestamp - i.timestamp]
[1] 42999457440 -17000122838 -4999842180 47999198168 -12000382110 -101452 NA
This performs a join on the key columns: every row of dt_state2 is kept and paired with each matching row of dt_state1 (a key with no match in dt_state1 yields NA), and dt_state2's timestamp (i.timestamp) is subtracted from dt_state1's timestamp.
The first entry of the returned vector is the value you listed in your example.
data
dt_state1 <- read.table(header=T, text="
record dstPort srcPort proto dstIP srcIP state timestamp
1: state 80 32768 tcp 192.168.101.5 192.168.101.89 syn 1466580661185059
2: state 80 32768 tcp 192.168.101.5 192.168.101.89 syn_ack 1466520661604781
3: state 80 32768 tcp 192.168.101.5 192.168.101.89 close 1466532661885439
4: state 80 55555 tcp 192.168.101.5 192.168.101.89 syn 1466532661885440")
dt_state2 <- read.table(header=T, text="
record dstPort srcPort proto dstIP srcIP state timestamp
1: state 80 32768 tcp 192.168.101.5 192.168.101.89 established 1466537661727619
2: state 80 32768 tcp 192.168.101.5 192.168.101.89 close 1466532661986891
3: state 80 44444 tcp 192.168.101.5 192.168.101.89 established 1466537661727619")

library(dplyr)
dt_state1 %>%
filter(state == 'syn') %>%
left_join(filter(dt_state2, state == 'established'), by = c("record", "dstPort", "srcPort", "proto", "dstIP", "srcIP")) %>%
mutate(timestamp_diff = timestamp.x - timestamp.y)
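
The matching logic itself does not depend on R. As a rough, language-neutral illustration, the following Python sketch indexes the established rows of dataset_2 by key and then looks up each syn row of dataset_1, giving None where no partner exists. The helper names (to_rows, key_of, timestamp_deltas) are assumptions made for this example; the rows are the ones from the question.

```python
# Key columns, in the order used in the question.
KEY_COLS = ("record", "dstPort", "srcPort", "proto", "dstIP", "srcIP")
COLS = KEY_COLS + ("state", "timestamp")


def to_rows(lines):
    """Turn whitespace-separated lines into dicts with an integer timestamp."""
    rows = []
    for line in lines:
        row = dict(zip(COLS, line.split()))
        row["timestamp"] = int(row["timestamp"])
        rows.append(row)
    return rows


def key_of(row):
    """Return the multi-column key of a row as a tuple."""
    return tuple(row[c] for c in KEY_COLS)


def timestamp_deltas(left, right, left_state, right_state):
    """For each left row in left_state, subtract the timestamp of the right
    row with the same key in right_state; None if there is no such row.
    If several rows share a key and state, the last one wins (a simplification)."""
    lookup = {key_of(r): r["timestamp"] for r in right if r["state"] == right_state}
    result = {}
    for r in left:
        if r["state"] != left_state:
            continue
        other = lookup.get(key_of(r))
        result[key_of(r)] = None if other is None else r["timestamp"] - other
    return result


dataset_1 = to_rows([
    "state 80 32768 tcp 192.168.101.5 192.168.101.89 syn 1466580661185059",
    "state 80 32768 tcp 192.168.101.5 192.168.101.89 syn_ack 1466520661604781",
    "state 80 32768 tcp 192.168.101.5 192.168.101.89 close 1466532661885439",
    "state 80 55555 tcp 192.168.101.5 192.168.101.89 syn 1466532661885440",
])
dataset_2 = to_rows([
    "state 80 32768 tcp 192.168.101.5 192.168.101.89 established 1466537661727619",
    "state 80 32768 tcp 192.168.101.5 192.168.101.89 close 1466532661986891",
    "state 80 44444 tcp 192.168.101.5 192.168.101.89 established 1466537661727619",
])

deltas = timestamp_deltas(dataset_1, dataset_2, "syn", "established")
for key, delta in deltas.items():
    print(key, delta)
```

The key with srcPort 32768 yields 42999457440, the value from the question, while the key with srcPort 55555 has no established partner and maps to None.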

Related

Unable to set Windows Server 2019 MTU from Ubuntu by ICMP fragmentation

I'm trying to set the PMTU of a Windows Server 2019 machine (B) from an Ubuntu 20.04 machine (A). To check whether the setting took effect, I need B to send A a packet that is larger than the new PMTU (1300) but smaller than the default PMTU (1500). I know that something like a 'verify ping' makes the outbound packet the same size as the inbound packet, so I could send a 1400-byte packet with DF set and check whether the response is fragmented. However, I don't know the official name for this or how to create it with scapy.
OK, I figured that out: a plain ping is enough for that. However, I can't set the MTU of Windows Server 2019 with scapy. This is the code I ran on Ubuntu:
import scapy.all as scapy
sip = '192.168.100.4'
dip = '192.168.100.5'
ip = scapy.IP()
icmp = scapy.ICMP()
ip.dst = dip
ip.src = sip
ip.protocol = 1 # ICMP
icmp.type = 3 # Destination Unreachable
# set ICMP code Fragmentation needed but DF set 4
icmp.code = 4 # Fragmentation needed
mtu = 1300
icmp.unused = mtu
# Construct the Inner IP embedded into the ICMP error message to simulate
# the packet which caused the ICMP error
ip_orig = scapy.IP()
# ip_orig.src = '10.10.10.2'
# ip_orig.dst = '10.10.10.1'
ip_orig.src = '192.168.100.5'
ip_orig.dst = '192.168.100.4'
udp_orig = scapy.UDP()
udp_orig.sport = 50000
udp_orig.dport = 50000
udp_orig1 = scapy.UDP()
udp_orig1.sport = 53
udp_orig1.dport = 631
# Send the packet
udp_orig.dport = 631
udp_orig.sport = 88
# scapy.send(ip/udp_orig)
scapy.send (ip/icmp/ip_orig/udp_orig1)
After running it, the ping reply sent back still has length 1514.
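
For reference, RFC 1191 puts the next-hop MTU in the low 16 bits of the second 32-bit word of a Destination Unreachable message (type 3, code 4); the high 16 bits are unused. The following stdlib-only Python sketch builds just that ICMP header so the byte layout can be inspected. The helper names internet_checksum and build_frag_needed are assumptions for this example. The sketch sends nothing and makes no claim about whether Windows will honor such a message.

```python
import struct


def internet_checksum(data: bytes) -> int:
    """RFC 1071 ones' complement checksum over big-endian 16-bit words."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_frag_needed(next_hop_mtu: int, original: bytes = b"") -> bytes:
    """Build an ICMP type 3 / code 4 message.

    `original` stands for the embedded IP header plus the first 8 bytes of
    the datagram that supposedly triggered the error (empty here).
    """
    # type, code, checksum placeholder, unused (16 bits), next-hop MTU (16 bits)
    header = struct.pack("!BBHHH", 3, 4, 0, 0, next_hop_mtu)
    csum = internet_checksum(header + original)
    return struct.pack("!BBHHH", 3, 4, csum, 0, next_hop_mtu) + original


msg = build_frag_needed(1300)
print(msg.hex())  # 0304f7e700000514
```

The last two bytes, 0x0514, are 1300 in network byte order, and running the checksum over the finished message gives 0, which is the usual receiver-side validity check.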

Getting TCP Retransmission instead of ACK on TUN device

I'm trying to implement a TCP stack over a TUN device according to RFC 793 in Linux. By default, my program is in the LISTEN state and is waiting for a SYN packet to establish a connection. I use nc to send a SYN:
$ nc 192.168.20.99 20
My program responds with SYN, ACK, but nc doesn't send an ACK at the end. This is the flow:
# tshark -i tun0 -z flow,tcp,network
1 0.000000000 192.168.20.1 → 192.168.20.99 TCP 60 39284 → 20 [SYN] Seq=0 Win=64240 Len=0 MSS=1460 SACK_PERM=1 TSval=1691638570 TSecr=0 WS=128
2 0.000112185 192.168.20.99 → 192.168.20.1 TCP 40 20 → 39284 [SYN, ACK] Seq=0 Ack=1 Win=10 Len=0
3 1.001056784 192.168.20.1 → 192.168.20.99 TCP 60 [TCP Retransmission] [TCP Port numbers reused] 39284 → 20 [SYN] Seq=0 Win=64240 Len=0 MSS=1460 SACK_PERM=1 TSval=1691639571 TSecr=0 WS=128
|Time | 192.168.20.1 |
| | | 192.168.20.99 |
|0.000000000| SYN | |Seq = 0
| |(39284) ------------------> (20) |
|0.000112185| SYN, ACK | |Seq = 0 Ack = 1
| |(39284) <------------------ (20) |
|1.001056784| SYN | |Seq = 0
| |(39284) ------------------> (20) |
More info about my TCP header:
Frame 2: 40 bytes on wire (320 bits), 40 bytes captured (320 bits) on interface tun0, id 0
Raw packet data
Internet Protocol Version 4, Src: 192.168.20.99, Dst: 192.168.20.1
Transmission Control Protocol, Src Port: 20, Dst Port: 39310, Seq: 0, Ack: 1, Len: 0
Source Port: 20
Destination Port: 39310
[Stream index: 0]
[Conversation completeness: Incomplete, CLIENT_ESTABLISHED (3)]
[TCP Segment Len: 0]
Sequence Number: 0 (relative sequence number)
Sequence Number (raw): 0
[Next Sequence Number: 1 (relative sequence number)]
Acknowledgment Number: 1 (relative ack number)
Acknowledgment number (raw): 645383655
0101 .... = Header Length: 20 bytes (5)
Flags: 0x012 (SYN, ACK)
Window: 10
[Calculated window size: 10]
Checksum: 0x99b0 [unverified]
[Checksum Status: Unverified]
Urgent Pointer: 0
NOTE: I'm aware of the ISN prediction attack, but this is just a test, and 0 for the sequence number is just as random as any other number in this case.
UPDATE: This is the output of tcpdump, which says I'm calculating the checksum incorrectly:
# tcpdump -i tun0 -vv -n
...
IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 40, bad cksum 16f3 (->911b)!)
192.168.20.99.20 > 192.168.20.1.39308: Flags [S.], cksum 0x9bb0 (incorrect -> 0x1822), seq 0, ack 274285560, win 10, length 0
...
Here is my checksum calculator (From RFC 1071):
uint16_t checksum(void *addr, int count)
{
    uint32_t sum = 0;
    uint16_t *ptr = addr;

    while (count > 1) {
        sum += *ptr++;
        count -= 2;
    }
    if (count > 0)
        sum += *(uint8_t *)ptr;
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return ~sum;
}
For the TCP checksum, I'm passing the pseudo-header combined with the TCP segment (in big-endian order):
uint16_t tcp_checksum(struct tcp_header *tcph, uint8_t *pseudo_header)
{
    size_t len = PSEUDO_HEADER_SIZE + (tcph->data_offset * 4);
    uint8_t combination[len];

    memcpy(combination, pseudo_header, PSEUDO_HEADER_SIZE);
    dump_tcp_header(tcph, combination, PSEUDO_HEADER_SIZE);
    return checksum(combination, len / 2);
}
What am I doing wrong here?
The problem was solved by calculating checksums with in_cksum.c from the tcpdump source code, which is a line-by-line implementation of RFC 1071. I also had to set IFF_NO_PI for the tun device. For this case, using a tap device instead of a tun device is probably a better choice to handle EtherType.
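
As a cross-check of the values tcpdump reports above, here is a small Python sketch of the RFC 1071 sum applied to the IP header and to the pseudo-header plus TCP header. One detail worth noting: the C checksum() shown earlier decrements count by 2 per 16-bit word, so it expects a length in bytes, and passing len / 2 would cover only half the buffer. That looks like one plausible source of the mismatch, although the post itself reports fixing it by switching to in_cksum.c. The helper names are assumptions for illustration; the field values come from the tcpdump lines (ports 20 and 39308, ack 274285560, window 10, ttl 64, DF set).

```python
import socket
import struct


def internet_checksum(data: bytes) -> int:
    """RFC 1071 checksum; `data` is the full buffer, length counted in bytes."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def tcp_checksum(src_ip: str, dst_ip: str, segment: bytes) -> int:
    """Checksum over the IPv4 pseudo-header followed by the TCP segment."""
    pseudo = (socket.inet_aton(src_ip) + socket.inet_aton(dst_ip)
              + struct.pack("!BBH", 0, socket.IPPROTO_TCP, len(segment)))
    return internet_checksum(pseudo + segment)


SRC, DST = "192.168.20.99", "192.168.20.1"

# SYN-ACK from the capture: data offset 5, flags 0x12, checksum field zeroed.
segment = struct.pack("!HHIIBBHHH", 20, 39308, 0, 274285560, 5 << 4, 0x12, 10, 0, 0)

# 20-byte IPv4 header: total length 40, id 0, DF, ttl 64, checksum field zeroed.
ip_header = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 40, 0, 0x4000, 64,
                        socket.IPPROTO_TCP, 0,
                        socket.inet_aton(SRC), socket.inet_aton(DST))

print(hex(tcp_checksum(SRC, DST, segment)))  # 0x1822
print(hex(internet_checksum(ip_header)))     # 0x911b
```

Both results agree with the values tcpdump lists as correct (0x1822 for TCP and 911b for the IP header).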

Malformed IP Scapy

My goal is to develop a script that can send IP packets from any host to any other host in a different subnet. Right now everything seems to work, except that my IP packet is malformed, so scapy cannot send it.
def sendIPMessage(interfaceName, dst_ip, routerIP, message):
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.bind(("", port))
    src_addr = get_mac_address(interface=interfaceName)
    my_ip = get_ip_address(interfaceName)
    netmask = ipaddress.ip_address(dst_ip) in ipaddress.ip_network(my_ip)
    if netmask is True: # if dst is in the same network
        arp_MAC = sendArpMesage(interfaceName, dst_ip)
    else:
        arp_MAC = sendArpMesage(interfaceName, routerIP)
    ether = Ether(src=str(src_addr), dst=str(arp_MAC))
    print(ether.show())
    size = len(message) + 14
    ip = IP(src=my_ip, dst=dst_ip, proto=17, ihl=5, len=size, ttl=5, chksum=0)
    #print(ip.show())
    payload = Raw(message)
    packet = ether / ip / msg
    del packet[IP].chksum
    packet = packet.__class__(bytes(packet)) # same as packet.show2()
    print(packet.show())
    success = send(packet)
    if success is not None:
        print(success.show)
    else:
        print("success is None")
Here is the show() information:
Begin emission:
*Finished sending 1 packets.
Received 1 packets, got 1 answers, remaining 0 packets
###[ Ethernet ]###
dst = 4e:98:22:86:f6:75
src = 00:00:00:00:00:11
type = LOOP
None
###[ Ethernet ]###
dst = 4e:98:22:86:f6:75
src = 00:00:00:00:00:11
type = IPv4
###[ IP ]###
version = 4
ihl = 5
tos = 0x0
len = 28
id = 1
flags =
frag = 0
ttl = 5
proto = udp
chksum = 0xe9c2
src = 192.168.1.101
dst = 10.0.0.1
\options \
###[ UDP ]###
sport = 21608
dport = 26995
len = 8297
chksum = 0x7320
###[ Padding ]###
load = 'a test'
None
.
Sent 1 packets.
success is None
And this is what Wireshark currently looks like
I am not sure whether the problem is that the checksum values do not match, but any help creating this packet would be appreciated.
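
One observation about the show() output above: the UDP field values are exactly what results when ASCII text is read as a UDP header. The stdlib-only Python sketch below packs the four 16-bit values from the dump back into bytes. This is an inference from the dump, not something the post states: it suggests that the message was placed directly after the IP header with proto=17 and no UDP layer in between, so the dissector consumed the first 8 bytes of the text as the UDP header.

```python
import struct

# UDP field values exactly as printed by show() in the question.
sport, dport, length, chksum = 21608, 26995, 8297, 0x7320

# Re-encode them as the 8 bytes a UDP header occupies on the wire.
header_bytes = struct.pack("!HHHH", sport, dport, length, chksum)

# The dump shows the remaining bytes as Padding with load 'a test'.
text = header_bytes + b"a test"
print(text)  # b'This is a test'

# The function computes size = len(message) + 14, which matches len = 28
# in the dump if the message is this 14-byte string.
print(len(text) + 14)  # 28
```

Under that reading, the IP total length of 28 also does not describe an IPv4 header (20 bytes with ihl=5) plus a UDP header (8 bytes) plus 14 bytes of payload. scapy normally fills in len and chksum itself when they are left unset, so adding an explicit UDP layer and dropping the manual len would be one thing to try.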

R - subsetting by date

I'm trying to subset a large data frame by a date field and am facing strange behaviour:
1) Find an interesting time interval:
> ld[ld$bps>30000000,]
Date.first.seen Duration Proto Src.IP.Addr Src.Pt Dst.IP.Addr Dst.Pt Tos Packets Bytes bps
1400199 2015-03-31 13:52:24 0.008 TCP 3.3.3.3 3128 4.4.4.4 65115 0 39 32507 32500000
1711899 2015-03-31 14:58:10 0.004 TCP 3.3.3.3 3128 4.4.4.7 49357 0 29 23830 47700000
2) Then try to look at what is happening during that second:
> ld[ld$Date.first.seen=="2015-03-31 13:52:24",]
Date.first.seen Duration Proto Src.IP.Addr Src.Pt Dst.IP.Addr Dst.Pt Tos Packets Bytes bps
1401732 2015-03-31 13:52:24 17.436 TCP 3.3.3.3 3128 6.6.6.6 51527 0 3 1608 737
I don't really understand this behavior; I should get far more results.
For example:
> ld[1399074,]
Date.first.seen Duration Proto Src.IP.Addr Src.Pt Dst.IP.Addr Dst.Pt Tos Packets Bytes bps
1399074 2015-03-31 13:52:24 0.152 TCP 10.10.10.10 3128 11.11.11.11 62375 0 8 3910 205789
For the date I use POSIXlt:
> str(ld)
'data.frame': 2657583 obs. of 11 variables:
$ Date.first.seen: POSIXlt, format: "2015-03-31 06:00:00" "2015-03-31 06:00:00" "2015-03-31 06:00:00" "2015-03-31 06:00:01" ...
...
I would appreciate any assistance. Thanks!
POSIXlt may carry additional information that is suppressed when printing the entire data.frame, such as time zone and daylight saving time. Have a look at https://stat.ethz.ch/R-manual/R-devel/library/base/html/DateTimeClasses.html.
Printing only the POSIXlt variable (ld$Date.first.seen) does generally supply at least some of this additional information.
If you are not required to keep the variable as POSIXlt and you don't need the extra functionality the class provides, a simple:
ld$Date.first.seen = as.character(ld$Date.first.seen)
added before your subset statement will probably solve your problem.

How to convert a hex IP address in Java?

I ran cat /proc/pid/net/udp6 and got:
sl local_address remote_address st tx_queue rx_queue tr tm->when retrnsmt uid timeout inode ref pointer drops
63: 00000000000000000000000000000000:D9BF 00000000000000000000000000000000:0000 07 00000000:00000000 00:00000000 00000000 1000 0 181584 2 c16e8d00 0
I know how it's structured, and 00000000000000000000000000000000:D9BF must be the local IP and port. How can I convert it to a normal IP format like 127.0.0.1?
InetAddress a = InetAddress.getByAddress(DatatypeConverter.parseHexBinary("0A064156"));
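
For comparison, here is a Python sketch of the same conversion that also handles the port. It assumes the layout used by /proc/net/udp and /proc/net/udp6 on little-endian machines, where each 32-bit word of the address is printed in host byte order and the port is plain hexadecimal. parse_proc_net_addr is a hypothetical helper name, and the little_endian flag is part of that assumption.

```python
import ipaddress
import struct


def parse_proc_net_addr(field: str, little_endian: bool = True):
    """Convert 'HEXADDR:HEXPORT' from /proc/net/{udp,udp6,...} to (ip, port)."""
    hex_addr, hex_port = field.split(":")
    raw = bytes.fromhex(hex_addr)
    if little_endian:
        # Each 32-bit word was printed in host byte order; swap to network order.
        count = len(raw) // 4
        words = struct.unpack("<%dI" % count, raw)
        raw = struct.pack(">%dI" % count, *words)
    # 4 bytes give an IPv4Address, 16 bytes an IPv6Address.
    return str(ipaddress.ip_address(raw)), int(hex_port, 16)


print(parse_proc_net_addr("00000000000000000000000000000000:D9BF"))  # ('::', 55743)
print(parse_proc_net_addr("0100007F:0035"))  # ('127.0.0.1', 53)
```

The all-zero address from the question is the IPv6 wildcard (::), meaning the socket is bound to every local address on port 55743.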
