I'm writing some code to convert an IPv4 address stored in a string to a custom data type (a class with 4 integers in this case).
I was wondering whether I should accept IPs like the one in the title, or only IPs with no leading zeros. Let's see it with an example.
These two IPs mean the same thing to us (humans), and Windows network configuration, for example, accepts both:
192.56.2.1 and 192.056.2.01
But I was wondering whether the second one is actually correct or not.
I mean, according to the RFC, is the second IP valid?
Thanks in advance.
Be careful: inet_addr(3) is one of Unix's standard APIs for translating a textual representation of an IPv4 address into an internal representation, and it interprets 056 as an octal number:
http://pubs.opengroup.org/onlinepubs/9699919799/functions/inet_addr.html
All numbers supplied as parts in IPv4 dotted decimal notation may be decimal, octal, or hexadecimal, as specified in the ISO C standard (that is, a leading 0x or 0X implies hexadecimal; otherwise, a leading '0' implies octal; otherwise, the number is interpreted as decimal).
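Of its younger siblings, getaddrinfo(3) accepts the same notation for IPv4 address strings, whereas inet_pton(3) (documented together with inet_ntop(3)) accepts only decimal parts: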
http://pubs.opengroup.org/onlinepubs/9699919799/functions/inet_ntop.html
http://pubs.opengroup.org/onlinepubs/9699919799/functions/getaddrinfo.html
Summary
Although textual representations of IP addresses such as 192.056.2.01 might be accepted on all platforms, different operating systems interpret them differently.
This would be enough reason for me to avoid such a way of textual representation.
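To make the risk concrete, here is a minimal Python sketch of the radix rule quoted above (leading 0x means hexadecimal, a leading 0 means octal, otherwise decimal). The function names are made up for illustration, and the sketch is simplified to accept exactly four parts, which the real inet_addr does not require.

```python
def parse_part_c_style(part: str) -> int:
    """Interpret one part using the C-style radix prefixes described in the quote."""
    if part.lower().startswith("0x"):
        return int(part[2:], 16)   # leading 0x or 0X: hexadecimal
    if len(part) > 1 and part.startswith("0"):
        return int(part, 8)        # other leading 0: octal
    return int(part, 10)           # otherwise: decimal


def parse_ipv4_c_style(text: str) -> tuple:
    """Hypothetical helper: four parts only, each interpreted C-style."""
    parts = text.split(".")
    if len(parts) != 4:
        raise ValueError("expected four dot-separated parts")
    values = tuple(parse_part_c_style(p) for p in parts)
    if any(not 0 <= v <= 255 for v in values):
        raise ValueError("part out of range 0..255")
    return values


if __name__ == "__main__":
    print(parse_ipv4_c_style("192.56.2.1"))    # (192, 56, 2, 1)
    print(parse_ipv4_c_style("192.056.2.01"))  # (192, 46, 2, 1): 056 is octal 46
```

Under this rule the two addresses from the question are not the same address, which is exactly the ambiguity to avoid.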
Pros
In decimal notation 056 equals 56, so why not?
Cons
The 0XX format is commonly used for octal notation.
Whatever you decide, just put it in your documentation and it will be OK :)
Defining if it is correct or not depends on your implementation.
As you mentioned, Windows considers it correct because it removes any leading zeros when it resolves the IP.
So if your program applies the appropriate logic, e.g. storing each part of the IP in your 4-integer class without its leading zeros, it will be correct for your case too.
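A minimal sketch of such logic in Python, assuming you want decimal-only parsing and want the leading-zero policy to be an explicit choice. The function name and the allow_leading_zeros parameter are hypothetical, not taken from any library.

```python
def parse_ipv4_decimal(text: str, allow_leading_zeros: bool = False) -> tuple:
    """Parse a dotted-decimal IPv4 string into four integers.

    With allow_leading_zeros=True, "056" is read as decimal 56.
    With the default, any part with a leading zero (other than "0") is rejected.
    """
    parts = text.split(".")
    if len(parts) != 4:
        raise ValueError("expected four dot-separated parts")
    octets = []
    for part in parts:
        if not (part.isascii() and part.isdigit()):
            raise ValueError(f"not a decimal number: {part!r}")
        if len(part) > 1 and part[0] == "0" and not allow_leading_zeros:
            raise ValueError(f"leading zero not allowed: {part!r}")
        value = int(part, 10)
        if value > 255:
            raise ValueError(f"part out of range: {part!r}")
        octets.append(value)
    return tuple(octets)
```

Whichever policy you pick, documenting it matters more than the choice itself, since other tools may read the same string as octal.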
Textual Representation of IPv4 and IPv6 Addresses is an “Internet-Draft”,
which, I guess, is like an RFC wanna-be.
(Also, it expired a decade ago, on 2005-08-23,
and, apparently, has not been reissued,
so it’s not even close to being official.)
Anyway, in Section 2: History it says,
The original IPv4 “dotted octet” format was never fully defined in any RFC,
so it is necessary to look at usage,
rather than merely find an authoritative definition,
to determine what the effective syntax was.
The first mention of dotted octets in the RFC series is …
four dot-separated parts, each of which consists of
“three digits representing an integer value in the range 0 through 255”.
A few months later, [[IPV4-NUMB][3]] …
used dotted decimal format, zero-filling each encoded octet to three digits.
⋮
Meanwhile,
a very popular implementation of IP networking went off in its own direction.
4.2BSD introduced a function inet_aton(), …
[which] allowed octal and hexadecimal in addition to decimal,
distinguishing these radices by using the C language syntax
involving a prefix “0” or “0x”, and allowed the numbers to be arbitrarily long.
The 4.2BSD inet_aton() has been widely copied and imitated,
and so is a de facto standard
for the textual representation of IPv4 addresses.
Nevertheless, these alternative syntaxes have now fallen out of use …
[and] All the forms except for decimal octets are seen as non-standard
(despite being quite widely interoperable) and undesirable.
So, even though [POSIX defines the behavior of inet_addr][4]
to interpret leading zero as octal (and leading “0x” as hex),
it may be safest to avoid it.
P.S. [RFC 790][3] has been obsoleted by [RFC 1700][5],
which uses decimal numbers of one, two, or three digits,
without leading zeroes.
[3]: https://www.rfc-editor.org/rfc/rfc790 "the 'Assigned Numbers' RFC"
[4]: http://pubs.opengroup.org/onlinepubs/9699919799/functions/inet_addr.html
[5]: https://www.rfc-editor.org/rfc/rfc1700
Related
When using zero-compression on the following IPv6 address
2001:0DB8:0000:CD30:0000:0000:0000:0000/60
Why is this not correct:
2001:DB8::CD30::/60
... while this is:
2001:DB8:0:CD30::/60
Zero compression can only be applied once. Otherwise, the IPv6 address would no longer be unique.
Take your example, 2001:DB8::CD30::/60. Would it expand to
2001:0DB8:0000:0000:0000:CD30:0000:0000/60
or
2001:0DB8:0000:0000:CD30:0000:0000:0000/60
or
2001:0DB8:0000:CD30:0000:0000:0000:0000/60
...?
If only one "::" is used, the result will always be unique as there is only one possible fixed number of zeros to be inserted.
Because it is ambiguous.
The address 2001:DB8::CD30:: could be expanded in any of the following possibilities:
2001:DB8:0:CD30:0:0:0:0
2001:DB8:0:0:CD30:0:0:0
2001:DB8:0:0:0:CD30:0:0
2001:DB8:0:0:0:0:CD30:0
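The list above can be reproduced with a short Python sketch that enumerates every way to distribute the missing zero groups over the "::" gaps. The function names are made up for illustration, and the sketch assumes the prefix length (/60) has already been stripped and that the address contains at least one "::".

```python
from itertools import combinations


def compositions(total: int, parts: int):
    """Yield every way to write `total` as an ordered sum of `parts` positive integers."""
    for cuts in combinations(range(1, total), parts - 1):
        bounds = (0,) + cuts + (total,)
        yield [b - a for a, b in zip(bounds, bounds[1:])]


def expand_all(address: str) -> list:
    """Return every 8-group reading of an address containing one or more '::'."""
    chunks = [c.split(":") if c else [] for c in address.split("::")]
    explicit = sum(len(c) for c in chunks)   # groups written out explicitly
    gaps = len(chunks) - 1                   # number of '::' occurrences
    missing = 8 - explicit                   # zero groups that must be inserted
    results = []
    for sizes in compositions(missing, gaps):
        groups = list(chunks[0])
        for size, chunk in zip(sizes, chunks[1:]):
            groups += ["0"] * size + chunk
        results.append(":".join(groups))
    return results


if __name__ == "__main__":
    for candidate in expand_all("2001:DB8::CD30::"):
        print(candidate)                      # four different addresses
    print(expand_all("2001:DB8:0:CD30::"))   # exactly one reading
```

With a single "::" there is only one way to place the zeros, so the result list has exactly one entry.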
The reason is that :: is used to shorten multiple zeros in the 16-bit address field.
In your example 2001:0DB8:0000:CD30:0000:0000:0000:0000/60, it only has multiple 0s in the 16-bit field at the suffix, the 0000 in 2001:0DB8:0000:CD30: is just one 16-bit field and you'd just use 0 to shorten it.
A more interesting question: how would you shorten this: 2001:0000:0000:CD30:0000:0000:0000:0000/60?
It is defined in the standard:
In addition, Section 2.2 of [RFC4291] notes,
'The "::" can only appear once in an address.'
This means that the address can be written as either:
2001:0:0:CD30::/60 OR 2001::0:CD30:0:0:0:0/60.
Both are valid, but I'd prefer the first representation: the purpose of zero compression is to shorten the address, and the first representation is shorter.
How does the 68000 internally represent instructions?
I've read that there are different instruction formats: the single-effective-address operation word format, and the brief and full extension word formats. The single-effective-address operation word seems to encode the instruction, with its lower 6 bits giving the addressing mode and register. Do this addressing mode and register tell you whether a brief or full extension word follows, which in turn holds the operands for the instruction? Also, do you know a better manual than the 68000 programming reference manual?
Thanks in advance.
The actual internal representation is a combination of "microcode" and "nanocode". The 68000 has 544 17-bit microcode words, which dispatch to 336 68-bit nanocode words.
While this may not be what you wanted to know, this link may provide some insights:
http://www.easy68k.com/paulrsm/doc/dpbm68k1.htm
Right: on the 68000, the indexed modes use the brief extension word (BEW). In "Address Register Indirect with Index (8-Bit Displacement) Mode" (d8, An, Xn), the BEW holds D/A (whether Xn is a data or an address register), Xn (the register number), W/L (whether to treat the contents of Xn as 16 or 32 bits), the scale set to 0 (see note), and the 8-bit displacement.
On the other hand, for other modes such as the 16-bit displacement mode, "Address Register Indirect with Displacement" (d16, An), the extension is just one word holding the displacement.
Note: in the brief extension word, the 68000 doesn't support the 2 scale bits, so they are set to 0; scaling via the BEW scale bits and full extension words are only supported on the 68020, 68040 and later CPUs. http://etd.dtu.dk/thesis/264182/bac10_19.pdf
Does anybody have some good ideas for comparing two IPv6 addresses? It looks like the shortening rules make it complicated.
For instance, take the full address
1234:0db8:0000:0000:0000:ff00:ff00:0011
Leading zeros can be removed => 1234:0db8::::ff00:ff00:11
One group of empty fields can be removed: 1234:0db8::ff00:ff00:00111
The last 32 bits can be an old-fashioned IPv4 address: 1234:0db8::::ff00:172.0.0.15
You can use the standard library function socket.inet_pton to convert the addresses into a byte string for comparison (output shown as in Python 3):
>>> import socket
>>> socket.inet_pton(socket.AF_INET6, '1234:0db8::ff00:ff00:0011')
b'\x124\r\xb8\x00\x00\x00\x00\x00\x00\xff\x00\xff\x00\x00\x11'
>>> socket.inet_pton(socket.AF_INET6, '1234:0db8:0000:0000:0000:ff00:ff00:0011')
b'\x124\r\xb8\x00\x00\x00\x00\x00\x00\xff\x00\xff\x00\x00\x11'
This reduces the risk of introducing an IPv6 parsing bug of your own.
The example above is in Python, but the inet_pton function is available on many platforms and in many languages:
http://msdn.microsoft.com/en-us/library/windows/desktop/cc805844(v=vs.85).aspx
http://man7.org/linux/man-pages/man3/inet_pton.3.html
You could just split it by colons and then compare each value.
If you encounter an empty field (from "::") -> insert as many '0000' groups as are needed to bring the total to eight.
If you encounter a field with fewer than 4 digits -> pad it on the left with zeros.
Additionally you could give each of the fields a weight to emphasize the values of the fields.
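A minimal Python sketch of this split-and-pad approach, assuming plain hexadecimal groups only (no embedded IPv4 suffix, no prefix length, no zone index). The function name is made up for illustration.

```python
import string


def expand_ipv6(text: str) -> str:
    """Expand a compressed IPv6 string into eight 4-digit lowercase groups."""
    if text.count("::") > 1:
        raise ValueError("'::' may appear only once")
    if "::" in text:
        head, tail = text.split("::")
        head_groups = head.split(":") if head else []
        tail_groups = tail.split(":") if tail else []
        missing = 8 - len(head_groups) - len(tail_groups)
        if missing < 1:
            raise ValueError("too many groups")
        groups = head_groups + ["0"] * missing + tail_groups
    else:
        groups = text.split(":")
    if len(groups) != 8:
        raise ValueError("expected eight groups")
    for group in groups:
        if not 1 <= len(group) <= 4 or any(ch not in string.hexdigits for ch in group):
            raise ValueError(f"bad group: {group!r}")
    return ":".join(group.lower().zfill(4) for group in groups)


def same_address(a: str, b: str) -> bool:
    """Compare two textual IPv6 addresses by their expanded forms."""
    return expand_ipv6(a) == expand_ipv6(b)
```

For production code, a library parser such as inet_pton is the safer choice, since it also covers the forms this sketch leaves out.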
I'm trying to get into assembler and I often come across numbers in the following form:
org 7c00h
; initialize the stack:
mov ax, 07c0h
mov ss, ax
mov sp, 03feh ; top of the stack.
7c00h, 07c0h, 03feh - What is the name of this number notation? What do they mean? Why are they used over "normal" decimal numbers?
It's hexadecimal, the numeral system with 16 digits, 0-9 and A-F. Memory addresses are given in hex because it's shorter and easier to read, and the numbers that represent memory locations don't mean anything special to humans, so there's no sense in having long numbers. I would guess that somewhere in the past someone had to type in addresses by hand as well, so it might well have started there.
Worth noting also, 0:7C00 is the boot sector load address.
Further worth noting: 07C0:03FE is the same address as 0:7FFE due to the way segmented addressing works.
This guy's left himself a 510 byte stack (he made the very typical off-by-two error in setting up the boot sector's stack).
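The segment arithmetic behind these remarks can be checked with a few lines of Python. This is a sketch of real-mode addressing (segment times 16 plus offset, wrapped to 20 bits); the function name is made up for illustration.

```python
def linear_address(segment: int, offset: int) -> int:
    """Real-mode physical address: segment * 16 + offset, wrapped to 20 bits."""
    return ((segment << 4) + offset) & 0xFFFFF


if __name__ == "__main__":
    stack_top = linear_address(0x07C0, 0x03FE)
    print(hex(stack_top))                     # 0x7ffe
    print(hex(linear_address(0x0000, 0x7C00)))  # 0x7c00, the boot sector load address

    # The 512-byte boot sector occupies 0x7C00..0x7DFF, so the stack can grow
    # down from stack_top to 0x7E00 before it runs into the code.
    boot_sector_end = linear_address(0x0000, 0x7C00) + 512
    print(stack_top - boot_sector_end)        # 510
```

Setting sp to 0400h instead would leave the full 512 bytes between the end of the boot sector and the top of the stack.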
These are numbers in hexadecimal notation, i.e. in base 16, where A to F have the digit values 10 to 15.
One advantage is that there is a more direct conversion to binary numbers. With a little bit of practice it is easy to see which bits in the number are 1 and which are 0.
Another is that many numbers used internally, such as memory addresses, are round numbers in hexadecimal, i.e. they contain a lot of zeros.
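The hex-to-binary correspondence is easy to see in a small Python sketch: each hex digit maps to exactly four bits. The helper name is made up for illustration, and it assumes 16-bit values.

```python
def nibbles(value: int) -> str:
    """Format a 16-bit value as four groups of four bits, one group per hex digit."""
    bits = format(value, "016b")
    return " ".join(bits[i:i + 4] for i in range(0, 16, 4))


if __name__ == "__main__":
    for number in (0x7C00, 0x07C0, 0x03FE):
        # e.g. 7C00 -> 0111 1100 0000 0000
        print(format(number, "04X"), "->", nibbles(number))
```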
Quite often one has to encode a big number (e.g. 128 or 160 bits) in a URL. For example, many web applications use md5(random()) for UUIDs.
If you need to put that value in a URL, the common approach is to just encode it as a hexadecimal string.
But obviously hex encoding is not a very compact encoding. What other approaches are there that fit nicely in a URL?
I would use The "URL and Filename safe" Base 64 Alphabet.
Base 64 uses two character sets.
Data: ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/
URLs: ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_
To use base 64 you need to pad your value to a multiple of 3 bytes (24 bits), then split each 24 bits into four 6-bit values. Each 6-bit value is looked up by position in the string I gave above.
If it all goes well, your final base64 value will always be a multiple of 4 characters long and decode back to a multiple of 3 (8bit) bytes long.
Depending on the language you are using, a lot of them have built in encode and decode functions.
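In Python, for example, the standard base64 module already provides the URL-safe alphabet. The sketch below encodes a 16-byte (128-bit) value and strips the "=" padding, which is an optional choice made here to keep the URL short; the helper names are made up for illustration.

```python
import base64
import uuid


def encode_id(raw: bytes) -> str:
    """URL-safe base64 without the trailing '=' padding."""
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def decode_id(text: str) -> bytes:
    """Restore the padding that encode_id removed, then decode."""
    padding = "=" * (-len(text) % 4)
    return base64.urlsafe_b64decode(text + padding)


if __name__ == "__main__":
    raw = uuid.uuid4().bytes
    token = encode_id(raw)
    # 128 bits take 32 characters in hex but only 22 in base64.
    print(len(raw.hex()), len(token))
    assert decode_id(token) == raw
```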
You can do even better with base64-url encoding (a-z, A-Z, 0-9, - and _ [see RFC4648 Section 5]). RFC4648 covers a number of different encoding methods (base16, base32, and base64) and a couple of variants. Also, depending on the sparsity of the bits that are set in the number, you could conceivably run it through gzip and then use one of the described encoding methods. Of course, whether gzip is worthwhile really depends on how large the number you are encoding is.
If you want it tight you can use a base-36 encoding (from 0 to Z).
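A minimal base-36 sketch in Python, assuming the value is held as a non-negative integer. Python's int() can already decode base 36; the encoder is written by hand and its name is made up for illustration.

```python
import string

DIGITS = string.digits + string.ascii_lowercase  # 0-9 then a-z, 36 symbols


def to_base36(value: int) -> str:
    """Encode a non-negative integer using the digits 0-9 and a-z."""
    if value < 0:
        raise ValueError("value must be non-negative")
    if value == 0:
        return "0"
    out = []
    while value:
        value, remainder = divmod(value, 36)
        out.append(DIGITS[remainder])
    return "".join(reversed(out))


def from_base36(text: str) -> int:
    """Decode using the built-in int(), which accepts bases up to 36."""
    return int(text, 36)
```

A 128-bit value needs at most 25 base-36 characters, compared with 32 in hex and 22 in URL-safe base64.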
Building on the base36 hint, I currently use base32, something like this (in Python):
>>> import base64, uuid
>>> base64.b32encode(uuid.uuid1().bytes).rstrip(b'=').decode('ascii')
'MTB2ONDSL3YWJN3CA6XIG7O4HM'
Just use hex. Even if you were to get 8 bits per character you're still using a 16-20 character random sequence, which nobody will want to type or say. If you can't put up a short identifier, work on your search capabilities.