What's the difference between 0x040h and 0x04?
How this difference should be understood? And where/when each should be used?
I've already seen this Rules for when to append -h/-H to hex numbers in assembly?.
What's the difference between 0x040h and 0x04
The latter is an acceptable way to write 4 in a C program, the former is not. The 'h' is also redundant with '0x' prefix, which already implies hexadecimal in many (most?) programming languages.
Related
I had a question regarding whether two hexadecimal written differently are the same? For example, is 0xf9h and 0xf9 the same? I have heard from some people that they are completely different numbers. Aren't 0x and 0xh just different formats? Thanks.
I've never seen both an 0x prefix and an h suffix. But yes, they're both saying "this is a hexadecimal number". Nether x nor h are valid hex digits, so there is no ambiguity.
What characters are allowed in common lisp symbols? Can you give a regular expression to match them (or are they beyond the capable of regular grammars to describe)?
I have tried looking for information on this, but all I can find are some examples in CLHS, but no concrete definition of what exactly a legal symbol is.
Edit:
So, common lisp symbols can legally contain any character.
However, the parser doesn't just accept any character as it reads lisp code. What are the rules for parsable symbols? E.g. symbols that can be supplied as 'quoted symbols or inside of '(quoted lists).
I am interested in generating and reading non-bar-delimited symbols, from a non-lisp language. It should suffice, for my application, to use [a-zA-Z0-9:&-]+, but I tend to prefer to be as accurate as possible, which is why I am trying to determine if there is a regex that can match symbols. Matching the |delimited syntax| would be a bonus, but non-delimited symbols would suffice.
This needs to be symbols that would be loaded legally when using (read). The answer is not that symbols can contain any character:
[1]> (read t)
#
*** - READ from #<IO TERMINAL-STREAM>: objects printed as # in view of *PRINT-LEVEL* cannot be read back in
I want to know the rules, or a regex, for what is a valid symbol here, without delimiting it with |.
As sds mentioned, symbol names can contain any characters. Given any string, you can create a symbol with that name. However, based on your comments, it sounds like you're wonder what, under fairly default settings, will be read as a symbol. The answer is still "pretty much anything", with a few exceptions.
The relevant sections in the HyperSpec begin with 2.2 Reader Algorithm, which describes the tokenization process. It describes the process in detail, but perhaps the most important part is:
When dealing with tokens, the reader's basic function is to
distinguish representations of symbols from those of numbers. When a
token is accumulated, it is assumed to represent a number if it
satisfies the syntax for numbers listed in Figure 2-9. If it does not
represent a number, it is then assumed to be a potential number if it
satisfies the rules governing the syntax for a potential number. If a
valid token is neither a representation of a number nor a potential
number, it represents a symbol.
The Figure 2.9 mentioned in that except is in section 2.3.1 Numbers as Tokens, which says:
When a token is read, it is interpreted as a number or symbol. The token is interpreted as a number if it satisfies the syntax for numbers specified in the next figure.
So, the process is really "tokenize the stream, and for each token, check if it's a number, and if it's not a number, then it's a symbol." I realize this doesn't provide an a nice clean grammar for symbols, but that's just the way the language is defined. If you sit down to the task of writing a tokenizer and reader for a Lisp, you may find that this is a pretty convenient way of going about it. You pretty much just need to recognize which characters terminate a symbol, which characters start and end lists, what gets eliminated as whitespace, and what your escape characters are. Then you read nested lists of tokens, turning each token into a number or a symbol (or a string, etc.).
Perhaps one of the easiest ways to see why you have to do this in terms of tokenization and then checking for numbers is the fact that Common Lisp has a *read-base*variable that controls the base. Depending on the value of *read-base*, some things are numbers or symbols, and you can't know until you know what the complete token is, and what the current state of the runtime is.
CL-USER> 'beef
BEEF
CL-USER> (setf *read-base* 16)
16
CL-USER> 'beef
48879
CL-USER> (setf *read-base* a) ; set it back to 10, which is now a
10
CL-USER> (setf *read-base* 36)
36
CL-USER> 'hello ; a number
29234652
CL-USER> 'hello\ world ; a symbol
|HELLO WORLD|
Any character can be in a symbol. E.g.:
(length (loop for i to char-code-limit
collect (intern (string (code-char i)))))
==> 1114113
I'm writing some code to convert an v4 ip stored in a string to a custom data type (a class with 4 integers in this case).
I was wondering if I should accept ips like the one I put in the title or only ips wiht no preceding zeros, let's see it with an example.
This two ips represent the same to us (humans) and for example windows network configuration accepts them:
192.56.2.1 and 192.056.2.01
But I was wondering if the second one is actually correct or not.
I mean, according to the RFC is the second ip valid?.
Thanks in advance.
Be careful, inet_addr(3) is one of Unix's standard API to translate a textual representation of IPv4 address into an internal representation, and it interprets 056 as an octal number:
http://pubs.opengroup.org/onlinepubs/9699919799/functions/inet_addr.html
All numbers supplied as parts in IPv4 dotted decimal notation may be decimal, octal, or hexadecimal, as specified in the ISO C standard (that is, a leading 0x or 0X implies hexadecimal; otherwise, a leading '0' implies octal; otherwise, the number is interpreted as decimal).
Its younger brothers like inet_ntop(3) and getaddrinfo(3) are all the same:
http://pubs.opengroup.org/onlinepubs/9699919799/functions/inet_ntop.html
http://pubs.opengroup.org/onlinepubs/9699919799/functions/getaddrinfo.html
Summary
Although such textual representations of IP addresses like 192.056.2.01 might be valid on all platforms, different OS interpret them differently.
This would be enough reason for me to avoid such a way of textual representation.
Pros
In decimal numerotation 056 is equals to 56 so why not?
Cons
0XX format is commonly used to octal numerotation
Whatever your decisions just put it on your documentation and it will be ok :)
Defining if it is correct or not depends on your implementation.
As you mentioned windows OS considers it correct because it removes any leading zeros when it resolves the IP.
So if in your program you set an appropriate logic, e.g every subset of the IP stored in your 4 integer class, without the leading zeros, it will be correct for your case too.
Textual Representation of IPv4 and IPv6 Addresses is an “Internet-Draft”,
which, I guess, is like an RFC wanna-be.
(Also, it expired a decade ago, on 2005-08-23,
and, apparently, has not been reissued,
so it’s not even close to being official.)
Anyway, in Section 2: History it says,
The original IPv4 “dotted octet” format was never fully defined in any RFC,
so it is necessary to look at usage,
rather than merely find an authoritative definition,
to determine what the effective syntax was.
The first mention of dotted octets in the RFC series is …
four dot-separated parts, each of which consists of
“three digits representing an integer value in the range 0 through 255”.
A few months later, [[IPV4-NUMB][3]] …
used dotted decimal format, zero-filling each encoded octet to three digits.
⋮
Meanwhile,
a very popular implementation of IP networking went off in its own direction.
4.2BSD introduced a function inet_aton(), …
[which] allowed octal and hexadecimal in addition to decimal,
distinguishing these radices by using the C language syntax
involving a prefix “0” or “0x”, and allowed the numbers to be arbitrarily long.
The 4.2BSD inet_aton() has been widely copied and imitated,
and so is a de facto standard
for the textual representation of IPv4 addresses.
Nevertheless, these alternative syntaxes have now fallen out of use …
[and] All the forms except for decimal octets are seen as non-standard
(despite being quite widely interoperable) and undesirable.
So, even though [POSIX defines the behavior of inet_addr][4]
to interpret leading zero as octal (and leading “0x” as hex),
it may be safest to avoid it.
P.S. [RFC 790][3] has been obsoleted by [RFC 1700][5],
which uses decimal numbers of one, two, or three digits,
without leading zeroes.
[3]: https://www.rfc-editor.org/rfc/rfc790 "the "Assigned Numbers" RFC"
[4]: http://pubs.opengroup.org/onlinepubs/9699919799/functions/inet_addr.html
[5]: https://www.rfc-editor.org/rfc/rfc1700
This is something I have been thinking while reading programming books and in computer science class at school where we learned how to convert decimal values into hexadecimal.
Can someone please tell me what are the advantages of using hexadecimal values and why we use them in programmnig?
Thank you.
In many cases (like e.g. bit masks) you need to use binary, but binary is hard to read because of its length. Since hexadecimal values can be much easier translated to/from binary than decimals, you could look at hex values as kind of shorthand notation for binary values.
It certainly depends on what you're doing.
It comes as an extension of base 2, which you probably are familiar with as essential to computing.
Check this out for a good discussion of
several applications...
https://softwareengineering.stackexchange.com/questions/170440/why-use-other-number-bases-when-programming/
The hexadecimal digit corresponds 1:1 to a given pattern of 4 bits. With experience, you can map them from memory. E.g. 0x8 = 1000, 0xF = 1111, correspondingly, 0x8F = 10001111.
This is a convenient shorthand where the bit patterns do matter, e.g. in bit maps or when working with i/o ports. To visualize the bit pattern for 169d is in comparison more difficult.
A byte consists of 8 binary digits and is the smallest piece of data that computers normally work with. All other variables a computer works with are constructed from bytes. For example; a single character can be stored in a single byte, and a 32bit integer consists of 4 bytes.
As bytes are so fundamental we want a way to write down their value as neatly and efficiently as possible. One option would be to use binary, but then we would need a lot of digits. This takes up a lot of space and can be confusing when many numbers are written in sequence:
200 201 202 == 11001000 11001001 11001010
Using hexadecimal notation, we can write every byte using just two digits:
200 == C8
Also, as 16 is a power of 2, it is easy to convert between hexadecimal and binary representations in your head. This is useful as sometimes we are only interested in a single bit within the byte. As a simple example, if the first digit of a hexadecimal representation is 0 we know that the first four binary digits are 0.
I'm trying to get into assembler and I often come across numbers in the following form:
org 7c00h
; initialize the stack:
mov ax, 07c0h
mov ss, ax
mov sp, 03feh ; top of the stack.
7c00h, 07c0h, 03feh - What is the name of this number notation? What do they mean? Why are they used over "normal" decimal numbers?
It's hexadecimal, the numeral system with 16 digits 0-9 and A-F. Memory addresses are given in hex, because it's shorter, easier to read, and the numbers that represent memory locations don't mean anything special to humans, so no sense to have long numbers. I would guess that somewhere in the past someone had to type in some addresses by hand as well, might as well have started there.
Worth noting also, 0:7C00 is the boot sector load address.
Further worth noting: 07C0:03FE is the same address as 0:7FFE due to the way segmented addressing works.
This guy's left himself a 510 byte stack (he made the very typical off-by-two error in setting up the boot sector's stack).
These are numbers in hexadecimal notation, i.e. in base 16, where A to F have the digit values 10 to 15.
One advantage is that there is a more direct conversion to binary numbers. With a little bit of practice it is easy to see which bits in the number are 1 and which are 0.
Another is is that many numbers used internally, such as memory addresses, are round numbers in hexadecimal, i.e. contain a lot of zeros.