Let's take this table of characters with their hex encodings as Unicode code points and as UTF-8.
Does anyone know how to convert UTF-8 bytes to a Unicode code point using only math operations?
E.g. take the first row: given the bytes 227, 129, 130, how do I get 12354?
Is there a simple way to do it?
| Unicode code point | UTF-8 | Char |
|---|---|---|
| 30 42 (12354) | e3 (227) 81 (129) 82 (130) | あ |
| 30 44 (12356) | e3 (227) 81 (129) 84 (132) | い |
| 30 46 (12358) | e3 (227) 81 (129) 86 (134) | う |
* Source: https://www.utf8-chartable.de/unicode-utf8-table.pl?start=12288&unicodeinhtml=hex
This video is the perfect source (watch from 6:15), but here is its summary and a code sample in Go. The letters mark the bits taken from each UTF-8 byte. Once you understand the logic, it's easy to apply with bitwise operators:
| Bytes | Char | UTF-8 bytes | Unicode code point | Explanation |
|---|---|---|---|---|
| 1-byte (ASCII) | E | 0xxx xxxx: 0100 0101 or 0x45 | 0xxx xxxx: 0100 0101 or U+0045 | No conversion needed; the value is the same in UTF-8 and as a Unicode code point |
| 2-byte | Ê | 110x xxxx, 10yy yyyy: 1100 0011 1000 1010 or 0xC38A | 0xxx xxyy yyyy: 0000 1100 1010 or U+00CA | Low 5 bits of the 1st byte, low 6 bits of the 2nd byte |
| 3-byte | あ | 1110 xxxx, 10yy yyyy, 10zz zzzz: 1110 0011 1000 0001 1000 0010 or 0xE38182 | xxxx yyyy yyzz zzzz: 0011 0000 0100 0010 or U+3042 | Low 4 bits of the 1st byte, low 6 bits of the 2nd byte, low 6 bits of the 3rd byte |
| 4-byte | 𐄟 | 1111 0xxx, 10yy yyyy, 10zz zzzz, 10ww wwww: 1111 0000 1001 0000 1000 0100 1001 1111 or 0xF090_849F | 000x xxyy yyyy zzzz zzww wwww: 0000 0001 0000 0001 0001 1111 or U+1011F | Low 3 bits of the 1st byte, low 6 bits of the 2nd byte, low 6 bits of the 3rd byte, low 6 bits of the 4th byte |
2-byte UTF-8
func get2(byte1, byte2 byte) rune {
	// keep the low 5 bits of byte 1 and the low 6 bits of byte 2
	int1 := uint16(byte1&0b_0001_1111) << 6
	int2 := uint16(byte2 & 0b_0011_1111)
	return rune(int1 + int2)
}
3-byte UTF-8
func get3(byte1, byte2, byte3 byte) rune {
	// keep the low 4 bits of byte 1 and the low 6 bits of bytes 2 and 3
	int1 := uint16(byte1&0b_0000_1111) << 12
	int2 := uint16(byte2&0b_0011_1111) << 6
	int3 := uint16(byte3 & 0b_0011_1111)
	return rune(int1 + int2 + int3)
}
4-byte UTF-8
func get4(byte1, byte2, byte3, byte4 byte) rune {
	// keep the low 3 bits of byte 1 and the low 6 bits of bytes 2, 3 and 4
	int1 := uint32(byte1&0b_0000_0111) << 18
	int2 := uint32(byte2&0b_0011_1111) << 12
	int3 := uint32(byte3&0b_0011_1111) << 6
	int4 := uint32(byte4 & 0b_0011_1111)
	return rune(int1 + int2 + int3 + int4)
}
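Since the question asks for math operations only, here is a minimal sketch of the same decoding with subtraction and multiplication instead of masks and shifts. Subtracting the fixed prefix (192, 224 or 240 for the lead byte, 128 for each continuation byte) does the job of the AND, and multiplying by 64, 4096 or 262144 does the job of the shift. The names codePoint2, codePoint3 and codePoint4 are made up for this example, and the functions assume the input is already a valid UTF-8 sequence of the right length.

```go
package main

import "fmt"

// codePoint2 decodes a 2-byte sequence: lead byte 110x xxxx (192..223).
func codePoint2(b1, b2 int) int {
	return (b1-192)*64 + (b2 - 128)
}

// codePoint3 decodes a 3-byte sequence: lead byte 1110 xxxx (224..239).
func codePoint3(b1, b2, b3 int) int {
	return (b1-224)*4096 + (b2-128)*64 + (b3 - 128)
}

// codePoint4 decodes a 4-byte sequence: lead byte 1111 0xxx (240..247).
func codePoint4(b1, b2, b3, b4 int) int {
	return (b1-240)*262144 + (b2-128)*4096 + (b3-128)*64 + (b4 - 128)
}

func main() {
	// First row of the question's table: e3 81 82.
	fmt.Println(codePoint3(227, 129, 130)) // 3*4096 + 1*64 + 2 = 12354
}
```

For the first row this works out as (227 - 224) * 4096 + (129 - 128) * 64 + (130 - 128) = 12288 + 64 + 2 = 12354.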
Block: 130.0.0.0/25.
I want to create 8 subnets.
Binary Form of Block will be
1000 0010.0000 0000.0000 0000.0000 0000/25
Subnet mask would be
1111 1111.1111 1111.1111 1111.1000 0000
How do I make 8 subnetworks out of this?
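A /25 leaves 7 host bits, so the block holds 2 ^ 7 = 128 addresses (host part 000 0000 to 111 1111). Eight subnets need 3 more network bits (2 ^ 3 = 8), so each subnetwork is a /28 (mask 255.255.255.240) with 128 / 8 = 16 addresses. The subnetworks look as follows: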
130.0.0.0 - 130.0.0.15
130.0.0.16 - 130.0.0.31
130.0.0.32 - 130.0.0.47
130.0.0.48 - 130.0.0.63
130.0.0.64 - 130.0.0.79
130.0.0.80 - 130.0.0.95
130.0.0.96 - 130.0.0.111
130.0.0.112 - 130.0.0.127
Write a program to swap odd and even bits in an integer.
For example, bit 0 and bit 1 are swapped, bit 2 and bit 3 are swapped, and so on.
The solution uses 0xaaaaaaaa and 0x55555555.
What do 0xaaaaaaaa and 0x55555555 mean in binary?
Each four bits constitutes a hex digit thus:
0000 0 1000 8
0001 1 1001 9
0010 2 1010 A
0011 3 1011 B
0100 4 1100 C
0101 5 1101 D
0110 6 1110 E
0111 7 1111 F
So, for example, 0x1234 would be 0001 0010 0011 0100 in binary.
For your specific examples:
0xaaaaaaaa = 1010 1010 ... 1010
0x55555555 = 0101 0101 ... 0101
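If you want to check the expansion yourself, a small sketch like the following prints any 32-bit value as groups of four bits, one group per hex digit. The helper name nibbles is made up for this example.

```go
package main

import (
	"fmt"
	"strings"
)

// nibbles formats v as 32 binary digits in groups of four,
// so each group corresponds to one hex digit.
func nibbles(v uint32) string {
	s := fmt.Sprintf("%032b", v)
	parts := make([]string, 0, 8)
	for i := 0; i < len(s); i += 4 {
		parts = append(parts, s[i:i+4])
	}
	return strings.Join(parts, " ")
}

func main() {
	fmt.Println(nibbles(0xAAAAAAAA))
	fmt.Println(nibbles(0x55555555))
}
```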
The reason why a solution might use those two values is that, if you AND a value with 0xaaaaaaaa, you'll get only the odd bits (counting from the left), which you can then shift right to move them to the even bit positions.
Similarly, if you AND a value with 0x55555555, you'll get only the even bits, which you can then shift left to move them to the odd bit positions.
Then you can just OR those two values together and the bits have been swapped.
For example, let's start with the 16-bit value abcdefghijklmnop (each letter being a bit and with a zero bit being . to make it more readable):
      abcdefghijklmnop                abcdefghijklmnop
  AND 1.1.1.1.1.1.1.1.            AND .1.1.1.1.1.1.1.1
    = a.c.e.g.i.k.m.o.              = .b.d.f.h.j.l.n.p
>>1 = .a.c.e.g.i.k.m.o        <<1 = b.d.f.h.j.l.n.p.
                   \___________ ___________/
                               \ /
                       .a.c.e.g.i.k.m.o
                    OR b.d.f.h.j.l.n.p.
                     = badcfehgjilknmpo
So each group of two bits has been swapped around. In C, that would be something like:
val = ((val & 0xAAAAAAAA) >> 1) | ((val & 0x55555555) << 1);
but, if this is classwork of some description, I'd suggest you work it out yourself by doing individual operations.
For an in-depth explanation of the bitwise operators that allow you to do this, see this excellent answer here.
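For comparison, here is the same mask-shift-OR idea as a small Go sketch. The function name swapAdjacentBits is made up for this example, and it assumes a 32-bit unsigned value so the right shift never drags in a sign bit.

```go
package main

import "fmt"

// swapAdjacentBits swaps bit 0 with bit 1, bit 2 with bit 3, and so on.
func swapAdjacentBits(v uint32) uint32 {
	odd := (v & 0xAAAAAAAA) >> 1  // bits 1, 3, 5, ... moved down one place
	even := (v & 0x55555555) << 1 // bits 0, 2, 4, ... moved up one place
	return odd | even
}

func main() {
	fmt.Printf("%04b -> %04b\n", 0b0110, swapAdjacentBits(0b0110))
}
```

Applying the function twice gives back the original value, which is a quick sanity check for any input.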
I've been trying all day to find out how to divide a network into 4 networks:
I got the IP: 195.232.176.0 /20
So, converting the IP into the following form:
[octet 1] . [octet 2] . [octet 3] . [octet 4]
11000011 . 11101000 . 10110000 . 00000000
Setting the first 20 bits to 1, counting from the left, I get:
======== 20 bits =========
11111111 . 11111111 . 1111 0000 . 00000000 = 255.255.240.0 (Subnet Mask)
So the four networks are :
Starting IP Address: 195.232.176.0 /20
195.232.x.0 /20
195.232.x.0 /20
195.232.x.0 /20
195.232.x.0 /20
Are the 4 networks correct? My problem is how to fill in the x values from the data I have found.
It's easy peasy once you know subnetting.
Let's have a closer look at the IP 195.232.176.0 /20. What is the full range?
195.232.176.0 to 195.232.191.255
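Four subnets need 2 more network bits (2 ^ 2 = 4), so to divide 195.232.176.0 /20 into 4 subnets you'll need a /22 mask (255.255.252.0) for each subnet. Each /22 holds 1024 addresses, so the third octet steps by 4: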
195.232.176.0/22 (195.232.176.0 to 195.232.179.255)
195.232.180.0/22 (195.232.180.0 to 195.232.183.255)
195.232.184.0/22 (195.232.184.0 to 195.232.187.255)
195.232.188.0/22 (195.232.188.0 to 195.232.191.255)
Trying to deconstruct this TCPdump BPF style filter, and need some help:
'tcp[((tcp[12:1] & 0xf0) >> 2):4] = 0x47455420'
It's taken from here.
Steps I have taken to better understand what is going on:
1. Let's convert 0x47455420 to ASCII
===> GET
===> tcp[((tcp[12:1] & 0xf0) >> 2):4] = GET
2. Examine the inner tcp filter: (tcp[12:1] & 0xf0)
===> the 0xf0 == 0000 0000 1111 0000 ===> I suppose it is safe to discard the upper zeros so I can write 1111 0000
===> tcp[12:1] == 08 (start filtering from byte 13 (0 based indexing, so you could also say start with the byte that has index 12) for 1 byte, so only 13th byte);
===> 08 == 0000 1000
===> 0000 1000 & 1111 0000 == 0000 (bitwise and = if both are 1 then end result is one)
This is where I got confused. The explanation in the hyperlink I provided above is saying
multiply it by four ( (tcp[12:1] & 0xf0)>>2 ) which should give the tcp header length
Impossible if it is zero. Please:
help me find the mistake in my calculations (maybe I'm mixing TCP and IP headers?);
provide some guidance whether my logic is correct.
This is the packet:
19:10:30.091065 IP (tos 0x0, ttl 63, id 40127, offset 0, flags [DF], proto TCP (6), length 2786)
10.240.35.81.47856 > 172.17.13.201.8080: Flags [P.], cksum 0xf2ef (incorrect -> 0xb8f8), seq 2263020471:2263023205, ack 4187927811, win 28, options [nop,nop,TS val 1906863883 ecr 214445688], length 2734
0x0000: 1a17 8e8a a3a0 026d 627d 049c 0800 4500 .......mb}....E.
0,1 2,3 ... ... ... ... 12,13 ... <=== byte indexes
1,2 3,4 ... ... ... ... 13,14 ... <=== counting how many bytes
0x0010: 0ae2 9cbf 4000 3f06 ac3b 0af0 2351 ac11 ....#.?..;..#Q.. <=== 0x0010 number correctly identifies that the first two digits are the 16th byte
16,17 ... ...
0x0020: 0dc9 baf0 1f90 86e2 f3b7 f99e b503 8018 ................
0x0030: 001c f2ef 0000 0101 080a 71a8 6f0b 0cc8 ..........q.o...
0x0040: 2e78 4745 5420 2f69 636f 6e73 2f75 6e6b .xGET./icons/unk
0x0050: 6e6f 776e 2e67 6966 2048 5454 502f 312e nown.gif.HTTP/1.
0x0060: 310d 0a68 6f73 743a 2070 6870 2d6d 696e 1..host:.php-min
tcp[12:1] is the byte at an offset of 12 bytes from the beginning of the TCP header; the 12 is not the offset from the beginning of the packet, it's the offset from the beginning of the TCP header (it's tcp[12:1], not ether[12:1] or something such as that). The "1" is the number of bytes being referred to.
According to RFC 793, which is the specification for TCP, the byte at an offset of 12 bytes from the beginning of the TCP header contains the data offset in the upper 4 bits and the lower 4 bits are reserved bits. The data offset is "The number of 32 bit words in the TCP Header", which "indicates where the data begins".
The data in the packet is being displayed as a sequence of byte pairs. It's a bit easier to understand if presented as a sequence of individual bytes, so:
0x0000: 1a 17 8e 8a a3 a0 02 6d 62 7d 04 9c 08 00 45 00
        eth dest          eth src           etype IP hdr
The first 6 bytes of the packet are the Ethernet destination address.
The next 6 bytes of the packet are the Ethernet source address.
The 2 bytes after that are the Ethernet type value; it's big-endian, so its value is 0x0800, which is the Ethernet type value for IPv4.
The next 2 bytes are the first 2 bytes of the IPv4 header. According to RFC 791, which is the specification for IPv4, the first byte of the IPv4 header contains the IP version in the upper 4 bits and the header length in the lower 4 bits. That byte has a value of 0x45, so the IP version is 4 (as it should be, for IPv4) and the header length is 5. The header length "is the length of the internet header in 32 bit words", so that's 5 32-bit words, or 20 bytes.
So, for now, let's skip the IPv4 header and go to the TCP header:
0x0020: 0d c9 ba f0 1f 90 86 e2 f3 b7 f9 9e b5 03 80 18
              TCP header                          12 13
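So byte 12 of the TCP header is 0x80. 0x80 & 0xf0 is 0x80, and 0x80 >> 2 is 0x20, which is 32. This is consistent with the upper 4 bits of that byte being the data offset in 32-bit words: shifting right by 4 would give 8 words, and shifting right by only 2 leaves that value multiplied by 4, so 8*4 = 32 bytes.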
tcp[((tcp[12:1] & 0xf0) >> 2):4] is thus, for this packet, tcp[32:4], i.e. the 4 bytes at an offset of 32 from the beginning of the TCP header.
32 bytes from the beginning of the TCP header is:
0x0040: 2e78 4745 5420 2f69 636f 6e73 2f75 6e6b
             ^
there, and that's the "GET" header of the HTTP request, beginning at the beginning of the TCP segment data. The 4 bytes in question are "GET ".
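To make the arithmetic concrete, here is a rough sketch that evaluates the same expression by hand on the first 80 bytes of the packet from the question. It is not how libpcap implements the filter; the names packetBytes and tcpPayloadWord are made up, and it assumes an Ethernet II frame carrying IPv4 and TCP, as in this capture.

```go
package main

import (
	"encoding/binary"
	"encoding/hex"
	"fmt"
	"strings"
)

// First 80 bytes of the packet, copied from the hex dump in the question.
const dump = `
1a17 8e8a a3a0 026d 627d 049c 0800 4500
0ae2 9cbf 4000 3f06 ac3b 0af0 2351 ac11
0dc9 baf0 1f90 86e2 f3b7 f99e b503 8018
001c f2ef 0000 0101 080a 71a8 6f0b 0cc8
2e78 4745 5420 2f69 636f 6e73 2f75 6e6b`

// packetBytes turns the hex dump into raw bytes.
func packetBytes() []byte {
	b, err := hex.DecodeString(strings.Join(strings.Fields(dump), ""))
	if err != nil {
		panic(err)
	}
	return b
}

// tcpPayloadWord mimics tcp[((tcp[12:1] & 0xf0) >> 2):4]. It returns the
// TCP header length in bytes and the 4 bytes found at that offset.
func tcpPayloadWord(pkt []byte) (hdrLen int, word uint32) {
	const ethLen = 14                  // Ethernet header
	ipLen := int(pkt[ethLen]&0x0f) * 4 // IPv4 header length in bytes
	tcp := pkt[ethLen+ipLen:]          // offsets below are relative to here
	hdrLen = int(tcp[12]&0xf0) >> 2    // data offset in words, times 4
	word = binary.BigEndian.Uint32(tcp[hdrLen : hdrLen+4])
	return hdrLen, word
}

func main() {
	hdrLen, word := tcpPayloadWord(packetBytes())
	fmt.Printf("TCP header length %d, word 0x%08x\n", hdrLen, word)
}
```

The key point is that every index into tcp is relative to the start of the TCP header, which in this packet sits 34 bytes into the frame.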
So the 12 in tcp[12:1] is not the offset from the beginning of the packet, it's the offset from the beginning of the TCP header (it's tcp[12:1], not ether[12:1] or something such as that).
And, in answer to the questions about the bytes of the packet and what they are:
0x0000: 1a 17 8e 8a a3 a0: Ethernet destination
02 6d 62 7d 04 9c: Ethernet source
08 00: Ethernet type/length field - 0x0800 = IPv4
So the first 14 (0x000e) bytes of the packet are the Ethernet header.
In this packet, the Ethernet type/length field is 0x0800, so the Ethernet payload, following the Ethernet header, is an IPv4 packet, beginning with an IPv4 header:
45: IPv4 version/header length
00: IPv4 Type of Service/Differentiated Service
0x0010: 0a e2: IPv4 total length
9c bf: IPv4 identification
40 00: IPv4 flags/fragment offset
3f: IPv4 time-to-live
06: IPv4 (next) protocol - 6 = TCP
ac 3b: IPv4 header checksum
0a f0 23 51: IPv4 source address
ac 11: first 2 bytes of IPv4 destination address
0x0020: 0d c9: second 2 bytes of IPv4 destination address
The IPv4 header length is 5, so the IPv4 header is 20 bytes. That's the minimum IPv4 header length; it can't be smaller, but it can be larger, if there are IPv4 options after the fixed-length part of the header. There aren't any, in this case.
As the protocol field has the value 6, the IPv4 payload is a TCP packet:
ba f0: TCP source port (47856)
1f 90: TCP destination port (8080)
86 e2 f3 b7: TCP sequence number
f9 9e b5 03: TCP acknowledgment number
80: TCP data offset + reserved bits
18: reserved bits + TCP flags
0x0030: 00 1c: TCP window
f2 ef: TCP checksum
00 00: TCP urgent pointer
That's the 20-byte fixed-length portion of the TCP header; however, the TCP header length is 32 bytes, so there are an additional 12 bytes of TCP options in the header:
01: TCP No-Operation option
01: TCP No-Operation option
08 0a 71 a8 6f 0b 0c c8: first 8 bytes of TCP Timestamp option
0x0040: 2e 78: last 2 bytes of TCP Timestamp option
A TCP header's length must be a multiple of 32 bits, i.e. a multiple of 4 bytes; TCP options might not be a multiple of 4 in length - the TCP Timestamp option is 10 bytes long - so the No-Operation option is used for padding.
So those 32 bytes were the TCP header; what follows is the TCP payload. Apparently, this is on an HTTP connection (the packet is being sent to port 8080, which is an alternate HTTP port), and this is the beginning of an HTTP GET request:
47 45 54 20 2f 69 63 6f 6e 73 2f 75 6e 6b
0x0050: 6e 6f 77 6e 2e 67 69 66 20 48 54 54 50 2f 31 2e
0x0060: 31 0d 0a 68 6f 73 74 3a 20 70 68 70 2d 6d 69 6e
So:
as this was captured either on an Ethernet or on a Wi-Fi network when not in monitor mode (or on some other type of network that either uses Ethernet headers or on which the adapter or driver supplies "fake Ethernet" headers, as with Wi-Fi), the packet will start with an Ethernet header;
as the Ethernet type value is 0x0800, it's followed by an IPv4 header;
as the IPv4 protocol value is 6, it's followed by a TCP header;
as one of the TCP port numbers is a port number typically used by HTTP (8080), it's probably followed by HTTP data of some sort (this isn't guaranteed, however - TCP port numbers are more like hints).
For ARP over the same network, you'll again have an Ethernet header (the ffff ffff is the Ethernet broadcast address, so the packet is being broadcast, as ARP requests usually are), with an Ethernet type of 0x0806, which is the Ethernet type value for ARP.
For ICMP over the same network, you'll again have an Ethernet header, and you'll also have an IPv4 header, so the Ethernet type will be 0x0800. The value in the protocol field in the IPv4 header will be 1, for ICMP.