Is there a standard for phone numbers? - standards

Before you say that this has already been asked, know that I've already reviewed these:
Is there a standard for storing normalized phone numbers in a database? - This is from 2008, and says that, at the time, there was no such standard. I'm hoping that something changed in the last 13 years.
How to validate phone numbers using regex - I already have the parse; it's quite easy: If it's not a digit, skip it. This question is not about the parser, but the format in which I save/display it. I am not worried about how hard it is to parse, but whether it's in standard format.
Say I'm working on a program that has to deal with phone numbers, and I want to make sure that they're saved and displayed in a standard format, so other programs and humans can also understand them predictably & consistently.
For instance, I've seen the following all be valid representations for the same US phone number:
1234567
123-4567
123 4567
5551234567
(555) 1234567
555-1234567
555 123 4567
555-123-4567
(555)-123-4567
(555) 123-4567
(5) 123 4567
1-555-123-4567
(1) 555-123-4567
+1 555-123-4567
+1 555 123-4567
+1 (555) 123-4567
Ad nauseam…
And then different countries represent numbers in different ways:
55 1234 567 8901
55 12 3456 7890
55 123 456 7890
55 1234 567890
555 123 456
(55) 123 4567
5.555.123-45-67
Ad nauseam…
As you can see, the number of ways a user can see a valid phone number is nearly infinite (The Wikipedia page for Telephone numbers in the UK is 26 printer pages long). I want all the numbers in my database and on the screen to be in a universally-recognizable format. As far as I can tell, ISO and ANSI have no defined format. Is there any standard notation for phone numbers?

There's no ISO standard, but there are ITU standards. You want E.123 and E.164.
In summary, any phone number is represented by +CC MMMMMM... where CC is the country code, and is one to three digits, and MMMMMM... is the area code (where applicable) and subscriber number. The total number of digits may not be more than 15. The + means "your local international dialling prefix".
I'll give a few examples using my own land line number.
So, for example, if you are in Germany, the number +44 2087712924 would be dialled as 00442087712924, and in the US you would dial it as 011442087712924. The 44 means that it is a UK number, and 2087712924 is the local part.
In practice, the long string of MMMMM... is normally broken up into smaller parts to make it easier to read. How you do that is country-specific. The example I give would normally be written +44 20 8771 2924.
As well as the unambiguous E.123 representation above, which you can use from anywhere in the world that allows international dialling, each country also has its own local method of representing numbers, and some have several. The example number will sometimes be written as 020 8771 2924 or (020) 8771 2924. The leading 0 is, strictly speaking, not part of the area code (that's 20) but a signal to the exchange meaning "here comes a number that could go outside the local area". Very occasionally the area code will be omitted and the number will be written 8771 2924. All these local representations are ambiguous, as they could represent valid numbers in more than one country, or even valid numbers in more than one part of the same country. This means that you should always store a number with its country code, and ideally store it in E.123 notation. In particular you should note that phone numbers ARE NOT NUMBERS. A number like 05 is the same as 5. A phone number 05 is not the same as 5, and storage systems will strip leading zeroes from numbers. Store phone numbers as CHAR or VARCHAR in your database.
Finally, some oddities. The example number will be written by some stupid people as 0208 771 2924. This is diallable, but if you strip off the leading 0208 assuming that it is an area code, then the remainder is not valid as a local number. And some countries with broken phone systems [glares at North America] have utterly bonkers systems where in some places you must dial all 10 digits for a local call, some where you must not, some where you must dial 1NNN NNN NNNN, some where you must not include the leading one, and so on. In all such cases, storing the number as +CC MMMMM... is correct. It is up to someone actually making a call (or their dialling software) to figure out how to translate that into a dialable sequence of digits for their particular location.
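To make the "+CC MMMMM..." idea concrete, here is a minimal Python sketch of normalizing user input and turning the stored form back into a diallable string. The function names, the default_cc parameter and the trunk_prefix parameter are my own assumptions for illustration; a real application would more likely rely on a dedicated library such as Google's libphonenumber, and this sketch does not handle forms like "+44 (0)20 ...".

```python
import re

def to_e164(raw, default_cc, trunk_prefix="0"):
    """Return '+<digits>' for a user-typed phone number.

    Assumption: input without a leading '+' is a national number of the
    country whose code is `default_cc`, optionally starting with
    `trunk_prefix` (0 in the UK, 1 in North America).
    """
    raw = raw.strip()
    digits = re.sub(r"\D", "", raw)          # drop everything that is not a digit
    if not raw.startswith("+"):
        if trunk_prefix and digits.startswith(trunk_prefix):
            digits = digits[len(trunk_prefix):]
        digits = default_cc + digits
    if not 1 <= len(digits) <= 15:           # E.164 allows at most 15 digits
        raise ValueError(f"not a plausible E.164 number: {raw!r}")
    return "+" + digits                      # keep it a string, never an int

def dial_string(e164, intl_prefix):
    """Replace '+' with the caller's international dialling prefix."""
    return intl_prefix + e164[1:]

number = to_e164("(020) 8771 2924", default_cc="44")
print(number)                      # +442087712924
print(dial_string(number, "00"))   # as dialled from Germany
print(dial_string(number, "011"))  # as dialled from the US
```

The stored value stays a string throughout, which is the same reason for using CHAR or VARCHAR in the database.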

There are a lot of country-specific standards. I had the same problem on one of my projects and solved it like this:
In the DB everything is stored as plain digits: 123456789
Depending on the selected web page language, the number is formatted when the page loads.
Examples:
France, Luxembourg: 12.34.56.78.90 or 12 34 56 78 90
Germany: 123 45 67 or 123-45-67
Great Britain: 020 1234 1234 or 1234 01234 12345
Italy, Netherlands: 12 1234567 or 020-1234567
CIS countries: 123 123-45-67


How to access pointers in SUBLEQ

I've recently started to learn about SUBLEQ One Instruction Set Computers and am currently trying to write a simple assembler for a SUBLEQ emulator I wrote. So far I've implemented DB, MOV, INC and DEC, but I am struggling a bit with the MOV instruction when it has pointers as arguments.
For example: MOV 20, 21 to move data from address 21 to address 20 in SUBLEQ looks like this (assuming address 100 is zero and the program starts at address zero):
sble 20 20 3
sble 21 100 6
sble 100 20 9
The content at the target address is zeroed and the content at the source address is added to the destination by subtracting it two times.
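For reference, this three-instruction MOV can be checked with a tiny emulator. The following is only a sketch in Python under my own assumptions: run_subleq, the halt address and the step limit are illustrative names, and "sble a b c" is taken to mean mem[b] -= mem[a], then jump to c if the result is <= 0.

```python
def run_subleq(mem, pc=0, halt=None, max_steps=10_000):
    """Execute `sble a b c` triples until pc reaches the `halt` address."""
    for _ in range(max_steps):
        if pc == halt:
            return mem
        a, b, c = mem[pc], mem[pc + 1], mem[pc + 2]
        mem[b] -= mem[a]
        pc = c if mem[b] <= 0 else pc + 3
    raise RuntimeError("step limit reached")

mem = [0] * 128
mem[0:9] = [20, 20, 3,     # mem[20] = 0
            21, 100, 6,    # mem[100] = -mem[21]
            100, 20, 9]    # mem[20] = 0 - (-mem[21]) = mem[21]
mem[20], mem[21] = 5, 42   # old destination value, source value
run_subleq(mem, halt=9)
print(mem[20])             # 42
```

Note that address 100 is left holding -42 afterwards, so a further `sble 100 100 12` would be needed if later code relies on it being zero.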
Now to my problem: If one argument is a pointer, for example MOV 20, [21], so that the contents of address 21 point to the real data I want to copy to address 20, how can that be represented using SUBLEQ?
I'll start off by saying I know very little about how subleq is used in practice, so take this answer with a grain of salt:
One downside of subleq is that it is notoriously difficult to use pointers, but it can edit its own code. This means that you will have to use the code to rewrite the address being looked at with the value at 21.
For example, if you somehow got the code to go to a line appended after your current code, you could use this:
# (I used the quotes to mean next line)
sble 3 3 " # set the first value of the second instruction to 0
sble 21 101 " # put the value of 21 into an unused address
sble 101 3 " # subtract the value of 101 and put it back into the code
sble 3 3 " # reset the value at 3 to 0
It might be a good idea to have a movp (move-pointer) command, so you don't accidentally mess up your mov command code at runtime.
This means that a way of using pointers has to be worked out separately for every problem someone comes across, but it will usually be done by having the code edit the code.

Determine user location based latitude-longitude

I am planning to build a system that lets a user enter 10 characters (digits and letters) from which I can determine their location.
I would like to use some mathematics, or anything else, that lets me convert a (latitude, longitude) pair into a single string of digits and letters.
Is it possible to do that? If yes, please give me a hint on how I can do it!
Thanks
At a code length of 10 characters, an Open Location Code (a.k.a. “Plus Code”) gives about 14m of resolution. Usually you'd have a + between the first 8 and the last 2 characters, but you can infer that. You can type and find these codes easily in Google Maps.
Geohash uses base 32 instead of base 20, so each character provides more information. 8 characters there already give you 19m resolution, the way I read Wikipedia. There is a chance you'd accidentally have obscenities in your code, though, which other codes try harder to avoid.
Geohash-36 uses 36 base characters, and avoids vowels (to prevent obscenities), but relies on character case. Wikipedia gives the accuracy of 10 characters as ⅙m.
All of these are well documented and probably have freely accessible reference implementations, too. You can also read about the design principles behind these.
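As a hint for how such a conversion works, here is a minimal Python sketch of the plain Geohash encoding (not Open Location Code or Geohash-36). It repeatedly halves the longitude and latitude ranges, alternating between the two, and packs every 5 decisions into one base-32 character. The function name and the default length of 10 are my own choices.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"   # Geohash alphabet (no a, i, l, o)

def geohash_encode(lat, lon, length=10):
    """Encode a latitude/longitude pair as a Geohash string of `length` chars."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars, bits, nbits = [], 0, 0
    use_lon = True                     # bits alternate, starting with longitude
    while len(chars) < length:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            bit = lon >= mid
            if bit:
                lon_lo = mid
            else:
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            bit = lat >= mid
            if bit:
                lat_lo = mid
            else:
                lat_hi = mid
        bits = (bits << 1) | int(bit)
        nbits += 1
        use_lon = not use_lon
        if nbits == 5:                 # 5 bits make one base-32 character
            chars.append(BASE32[bits])
            bits, nbits = 0, 0
    return "".join(chars)

print(geohash_encode(42.6, -5.6, 5))   # ezs42
print(geohash_encode(57.64911, 10.40744))
```

Decoding runs the same halving in reverse, so the string alone is enough to recover the cell that contains the user.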

How can I encode 0000 to 11110 in 4B/5B encoding scheme

In the 4B/5B encoding scheme, dataword 0000 is encoded to codeword 11110; similarly 0001 is encoded to 01001, etc.
Here the result of XOR operation between two codewords will be another valid codeword.
For example, the XOR of 11110 and 01001 is another codeword, 10111, whose dataword is 1011. Here I have no problem.
Again, to avoid a DC component, the NRZ-I line coding scheme is used. As a result there are never three consecutive zeros within an output codeword.
No codeword has more than one leading zero or more than two trailing zeros. We do not have to worry about the number of ones in the NRZ-I coding scheme.
But how can I encode 0000 to 11110 or 0001 to 01001, and which algorithm should I apply for this encoding scheme?
I searched Google and studied books too, but everywhere they say the same thing and I did not get my answer.
Thanks in advance
Decimal Representation
To understand this mechanism properly we should consider the decimal value of every codeword. Observe the above table carefully: I converted all the binary values of your table to decimal form.
Now, to avoid a DC component during transmission, we should consider only the codewords which don't have more than one leading zero or more than two trailing zeros.
So each pair of consecutive datawords is assigned to a pair of consecutive codewords,
like this:
(2,3) to (20,21),
(4,5) to (10,11)
(6,7) to (14,15)
(8,9) to (18,19)
(10,11) to (22,23)
(12, 13) to (26,27)
(14,15) to (28,29)
Exception
(0,1) to (30,9)
1 is assigned to 9 because all codewords from 0 to 8 (inclusive) are invalid: they have too many zeros. So the first valid codeword, 9, is assigned to 1.
If all valid codewords were assigned consecutively, then changing only one bit (a single-bit error) during transmission would convert a codeword to the next or previous codeword, and this error would remain undetected.
We know that in block coding, if a valid codeword is converted to another valid codeword during transmission as a result of an error, it will remain undetected, and this is a limitation of block coding. To avoid this, the valid codewords are not all assigned consecutively to datawords.
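In practice the "algorithm" is just a 16-entry lookup table, built from the pairs listed above. A minimal Python sketch follows; the function name and the choice to send the high nibble of each byte first are my own assumptions, and real links may order nibbles differently.

```python
# Index = 4-bit dataword, value = 5-bit codeword (decimal: 30, 9, 20, 21, ...)
CODE_4B5B = [
    "11110", "01001", "10100", "10101",
    "01010", "01011", "01110", "01111",
    "10010", "10011", "10110", "10111",
    "11010", "11011", "11100", "11101",
]

def encode_4b5b(data: bytes) -> str:
    """Return the 4B/5B bit string for `data`, high nibble first (assumed)."""
    out = []
    for byte in data:
        out.append(CODE_4B5B[byte >> 4])     # upper 4 bits
        out.append(CODE_4B5B[byte & 0x0F])   # lower 4 bits
    return "".join(out)

print(encode_4b5b(b"\x01"))   # 1111001001

# Every codeword has at most one leading and at most two trailing zeros,
# so concatenated codewords never contain more than three zeros in a row.
assert all(not c.startswith("00") and not c.endswith("000") for c in CODE_4B5B)
```

Decoding is the inverse lookup; any 5-bit group that is not in the table signals an error or a control symbol.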

How to extract first 20 bits of Hexadecimal address?

I have the following hexadecimal 32 bit virtual address: 0x274201
How can I extract the first 20 bits, then convert them to decimal?
I wanted to know how to do this by hand.
Update:
@Pete855217 pointed out that the address 0x274201 is not 32 bits long.
Also, 0x is not part of the address; it only signifies that the number is hexadecimal.
This suggests that I should add 00 after the 0x, so a true 32-bit address would be 0x00274201. I have updated my answer!
I believe I have answered my own question and I hope I am correct?
First convert HEX number 0x00274201 to BIN (this is the long way but I learned something from this):
However, I noticed the first 20 bits include 00274 in HEX. Which makes sense because every HEX digit is four BIN digits.
So, since I wanted the first 20 bits, I am really asking for the first five HEX digits, because 5 * 4 = 20 (bits in BIN).
Thus this will yield 00274 in HEX = 628 in DEC (decimal).
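The hand calculation can be double-checked with a shift: keeping the top 20 bits of a 32-bit value means dropping the low 12 bits. A small Python sketch, where the helper name top_bits is just for illustration:

```python
def top_bits(value, n, width=32):
    """Return the `n` most significant bits of a `width`-bit value."""
    return value >> (width - n)

address = 0x00274201
print(f"{address:032b}")        # the full 32-bit pattern
top20 = top_bits(address, 20)
print(f"{top20:05x}", top20)    # 00274 628
```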

How is overflow detected at the binary level?

I'm reading the textbook Computer Organization and Design by Patterson and Hennessy (4th edition). On page 225 they describe how overflow is detected in signed, 2's complement arithmetic. I just can't even understand what they're talking about.
"How do we detect [overflow] when it does occur? Clearly, adding or
subtracting two 32-bit numbers can yield a result that needs 33 bits
to be fully expressed."
Sure. And it won't need 34 bits because even the smallest 34 bit number is twice the smallest 33 bit number, and we're adding 32 bit numbers.
"The lack of a 33rd bit means that when overflow occurs, the sign bit
is set with the value of the result instead of the proper sign of
the result."
What does this mean? The sign bit is set with the "value" of the result... meaning it's set as if the result were unsigned? And if so, how does that follow from the lack of a 33rd bit?
"Since we need just one extra bit, only the sign bit can be wrong."
And that's where they lost me completely.
What I'm getting from this is that, when adding signed numbers, there's an overflow if and only if the sign bit is wrong. So if you add two positives and get a negative, or if you add two negatives and get a positive. But I don't understand their explanation.
Also, this only applies to unsigned numbers, right? If you're adding signed numbers, surely detecting overflow is much simpler. If the last half-adder of the ALU sets its carry bit, there's an overflow.
note: I really don't know what tags are appropriate here, feel free to edit them.
Any time you want to deal with these kind of ALU items be it add, subtract, multiply, etc, start with 2 or 3 bit numbers, much easier to get a handle on than 32 or 64 bit numbers. After 2 or 3 bits it doesn't matter if it is 22 or 2200 bits it all works exactly the same from there on out. Basically you can by hand if you want make a table of all 3 bit operands and their results such that you can examine the whole table visually, but a table of all 32 bit operands against all 32 bit operands and their results, can't do that by hand in a reasonable time and cannot examine the whole table visually.
Now twos complement, that is just a scheme for representing positive and negative numbers, and it is not some arbitrary thing it has a reason, the reason for the madness is that your adder logic (which is also what the subtractor uses which is the same kind of thing the multiplier uses) DOES NOT CARE ABOUT UNSIGNED OR SIGNED. It does not know the difference. YOU the programmer cares in my three bit world the bit pattern 0b111 could be a positive seven (+7) or it could be a negative one. Same bit pattern, feed it to the add logic and the same thing comes out, and the answer that comes out I can choose to interpret as unsigned or twos complement (so long as I interpret the operands and the result all as either unsigned or all as twos complement). Twos complement also has the feature that for negative numbers the most significant bit (msbit) is set, for positive numbers it is zero. So it is not sign plus magnitude but we still talk about the msbit being the sign bit, because except for two special numbers that is what it is telling us, the sign of the number, the other bits are actually telling us the magnitude they are just not an unsigned magnitude as you might have in sign+magnitude notation.
So, the key to this whole question is understanding your limits. For a 3 bit unsigned number our range is 0 to 7, 0b000 to 0b111. for a 3 bit signed (twos complement) interpretation our range is -4 to +3 (0b100 to 0b011). For now limiting ourselves to 3 bits if you add 7+1, 0b111 + 0b001 = 0b1000 but we only have a 3 bit system so that is 0b000, 7+1 = 8, we cannot represent 8 in our system so that is an overflow, because we happen to be interpreting the bits as unsigned we look at the "unsigned overflow" which is also known as the carry bit or flag. Now if we take those same bits but interpret them as signed, then 0b111 (-1) + 0b001 (+1) = 0b000 (0). Minus one plus one is zero. No overflow, the "signed overflow" is not set...What is the signed overflow?
First what is the "unsigned overflow".
The reason why "it all works the same" no matter how many bits we have in our registers is no different than elementary school math with base 10 (decimal) numbers. If you add 9 + 1 which are both in the ones column you say 9 + 1 = zero carry the 1. you carry a one over to the tens column then 1 plus 0 plus 0 (you filled in two zeros in the tens column) is 1 carry the zero. You have a 1 in the tens column and a zero in the ones column:
 1
 09
+01
====
 10
What if we declared that we were limited to only numbers in the ones column, there isn't any room for a tens column. Well that carry bit being a non-zero means we have an overflow, to properly compute the result we need another column, same with binary:
 111
  111
+ 001
=======
 1000
7 + 1 = 8, but we cant do 8 if we declare a 3 bit system, we can do 7 + 1 = 0 with the carry bit set. Here is where the beauty of twos complement comes in:
 111
  111
+ 001
=======
  000
if you look at the above three bit addition, you cannot tell by looking if that is 7 + 1 = 0 with the carry bit set or if that is -1 + 1 = 0.
So for unsigned addition, as we have known since grade school that a carry over into the next column of something other than zero means we have overflowed that many placeholders and need one more placeholder, one more column, to hold the actual answer.
Signed overflow. The sort of academic answer is if the carry in of the msbit column does not match the carry out. Let's take some examples in our 3 bit world. So with twos complement we are limited to -4 to +3. So if we add -2 + -3 = -5 that wont work correct?
To figure out what minus two is we do an invert and add one 0b010, inverted 0b101, add one 0b110. Minus three is 0b011 -> 0b100 -> 0b101
So now we can do this:
 abc
 100
  110
+ 101
======
  011
If you look at the number under the b that is the "carry in" to the msbit column, the number under the a the 1, is the carry out, these two do not match so we know there is a "signed overflow".
Let's try 2 + 2 = 4:
 abc
 010
  010
+ 010
======
  100
You may say but that looks right, sure unsigned it does, but we are doing signed math here, so the result is actually a -4 not a positive 4. 2 + 2 != -4. The carry in which is under the b is a 1, the carry out of the msbit is a zero, the carry in and the carry out don't match. Signed overflow.
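The two worked examples can be reproduced with a few lines of Python. This is only a sketch; add_bits and its return shape are names I made up. It computes the carry into the msbit column from the lower bits and compares it with the carry out.

```python
def add_bits(a, b, width=3, carry_in=0):
    """Add two `width`-bit patterns.

    Returns (result, carry_out, signed_overflow), where signed_overflow is
    "carry into the msbit column != carry out of the msbit column".
    """
    mask = (1 << width) - 1
    low = mask >> 1                                   # all bits below the msbit
    carry_into_msb = ((a & low) + (b & low) + carry_in) >> (width - 1)
    total = (a & mask) + (b & mask) + carry_in
    carry_out = total >> width
    return total & mask, carry_out, carry_into_msb != carry_out

print(add_bits(0b111, 0b001))   # (0, 1, False): 7+1 carries, -1+1 does not overflow
print(add_bits(0b110, 0b101))   # (3, 1, True):  -2 + -3 overflows
print(add_bits(0b010, 0b010))   # (4, 0, True):  2 + 2 overflows when signed
```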
There is a shortcut to figuring out the signed overflow without having to look at the carry in (or carry out). if ( msbit(opa) == msbit(opb) ) && ( msbit(res) != msbit(opb) ) signed overflow, else no signed overflow. opa being one operand, opb being the other and res the result.
  010
+ 010
======
  100
Take this +2 + +2 = -4. msbit(opa) and msbit(opb) are equal, and the result msbit is not equal to opb msbit so this is a signed overflow. You could think about it using this table:
x ab cr
0 00 00
0 01 01
0 10 01
0 11 10 signed overflow
1 00 01 signed overflow
1 01 10
1 10 10
1 11 11
This table is all the possible combinations of carry in bit, operand a, operand b, carry out and result bit for a single column. (Turn your head sideways to the left to sort of see this.) x is the carry in, the a and b columns are the two operands, and cr as a pair is the result: xab of 011 means 0+1+1 = 2 decimal, which is 0b10 binary. So take the rule that has been dictated to us, that if the carry in and carry out do not match, that is a signed overflow. The two cases where the item in the x column does not match the item in the c column are indicated; those are the cases where the a and b inputs match each other, but the result bit is the opposite of a and b. So assuming the rule is correct, this quick shortcut, which does not require knowing what the carry bits are, will tell you if there was a signed overflow.
Now you are reading an H&P book. Which probably means mips or dlx, neither mips or dlx deal with carry and signed flags in the way that most other processors do. mips is not the best first instruction set IMO primarily for that reason, their approach is not wrong in any way, but being the oddball, you will spend forever thinking differently and having to translate when going to most other processors. Where if you learned the typical znvc flags (zero flag, negative flag, v=signed overflow, c=carry or unsigned overflow) way then you only have to translate when going to mips. Normally these are computed on every alu operation (for the non-mips type processors) you will see signed and unsigned overflow being computed for add and subtract. (I am used to an older mips, maybe this gen of books and the current instruction set has something different). Calling it addu add unsigned right at the start of mips after learning all of the above about how an adder circuit does not care about signed vs unsigned, is a huge problem with mips it really puts you in the wrong mindset for understanding something this simple. Leads to the belief that there is a difference between signed addition and unsigned addition when there isn't. It is only the overflow flags that are computed differently. Now multiply, and divide there is definitely a twos complement vs unsigned difference and you ideally need a signed multiply and an unsigned multiply or you need to deal with the limitation.
I recommend a simple exercise (how simple depends on how strong your bit manipulation and twos complement are) that you can write in some high level language. Basically take all the combinations of unsigned numbers 0 to 7 added to 0 to 7 and save the result. Print out both as decimal and as binary (three bits for operands, four bits for result) and if the result is greater than 7 print overflow as well. Repeat this using signed variables using the numbers -4 to +3 added to -4 to +3. Print both the decimal with a +/- sign and the binary. If the result is less than -4 or greater than +3 print overflow. From those two tables you should be able to see that the rules above are true. Looking strictly at the operand and result bit patterns for the size allowed (three bits in this case) you will see that the addition operation gives the same result, the same bit pattern, for a given pair of inputs independent of whether those bit patterns are considered unsigned or twos complement. Also you can verify that unsigned overflow is when the result needs to use that fourth column, that is, there is a carry out of the msbit. Signed overflow is when the carry in doesn't match the carry out, which you can see using the shortcut of looking at the msbits of the operands and result. Even better is to have your program do those comparisons and print out something. So if you put a note in your table when the result is greater than 7 and a note when bit 3 is set in the result, then you will see for the unsigned table that the two always coincide (limited to inputs of 0 to 7). And the more complicated one, signed overflow, is always when the result is less than -4 or greater than 3, and that is exactly when the operand upper bits match and the result upper bit does not match the operands.
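One possible version of that exercise, sketched in Python. The function names and the row layout are my own; the asserts inside the loop are the "have your program do those comparisons" part.

```python
def to_signed(bits, width):
    """Interpret a `width`-bit pattern as twos complement."""
    return bits - (1 << width) if bits >> (width - 1) else bits

def overflow_table(width=3):
    """Return one row per operand pair: (a, b, result, carry, signed_overflow)."""
    mask = (1 << width) - 1
    msb = width - 1
    lo, hi = -(1 << msb), (1 << msb) - 1
    rows = []
    for a in range(mask + 1):
        for b in range(mask + 1):
            total = a + b                       # the adder only sees bit patterns
            res, carry = total & mask, total >> width
            signed_sum = to_signed(a, width) + to_signed(b, width)
            signed_ovf = not lo <= signed_sum <= hi
            # shortcut: operand msbits equal, result msbit different
            shortcut = (a >> msb) == (b >> msb) and (res >> msb) != (b >> msb)
            assert carry == (total > mask)      # unsigned overflow is the carry out
            assert shortcut == signed_ovf       # shortcut agrees with the range test
            rows.append((a, b, res, carry, signed_ovf))
    return rows

for a, b, res, carry, ovf in overflow_table():
    print(f"{a:03b} + {b:03b} = {res:03b}  carry={carry}  "
          f"unsigned {a}+{b}  signed {to_signed(a, 3):+d}{to_signed(b, 3):+d}"
          f"{'  signed overflow' if ovf else ''}")
```

Changing width to 4 or 8 runs the same checks on larger registers, which is a quick way to convince yourself that nothing changes beyond 3 bits.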
I know this is super long and very elementary. If I totally missed the mark here, please comment and I will remove or re-write this answer.
The other half of the twos complement magic. Hardware does not have subtract logic. One way to "convert" to twos complement is to "invert and add one". If I wanted to subtract 3 - 2 using twos complement, what actually happens is that it is the same as +3 + (-2), right, and to get from +2 to -2 we invert and add one. Looking at our elementary school addition, did you notice the hole in the carry in on the first column?
 111H
  111
+ 001
=======
 1000
I put an H above where the hole is. Well that carry in bit is added to the operands right? Our addition logic is not a two input adder it is a three input adder yes? Most of the columns have to add three one bit numbers in order to compute two operands. If we use a three input adder on the first column now we have a place to ... add one. If I wanted to subtract 3 - 2 = 3 + (-2) = 3 + (~2) + 1 which is:
    1
  011
+ 101
=====
That is the setup before we start; filled in, it is:
 1111
  011
+ 101
=====
  001
3 - 2 = 1.
What the logic does is:
if add then carry in = 0; the b operand is not inverted, the carry out is not inverted.
if subtract then carry in = 1; the b operand is inverted, the carry out MIGHT BE inverted.
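Those two rules fit in a few lines of Python. This is a sketch under my own naming (alu, subtract), returning the raw carry out without any inversion, since whether to invert it is processor specific.

```python
def alu(a, b, subtract=False, width=3):
    """Add or subtract two `width`-bit patterns using only an adder.

    Returns (result, carry_out), with the carry out NOT inverted.
    """
    mask = (1 << width) - 1
    if subtract:
        b = ~b & mask      # invert the b operand
        carry_in = 1       # the "add one" goes into the first column's carry in
    else:
        carry_in = 0
    total = (a & mask) + (b & mask) + carry_in
    return total & mask, total >> width

print(alu(3, 2, subtract=True))   # (1, 1): 3 - 2 = 1, raw carry out is 1
print(alu(2, 3, subtract=True))   # (7, 0): 0b111 is -1 as twos complement
print(alu(7, 1))                  # (0, 1): plain add, 7 + 1 wraps with carry
```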
The addition above shows a carry out, I didn't mention that this was an unsigned operation 3 - 2 = 1. I used some twos complement tricks to perform an unsigned operation, because here again no matter whether I interpret the operands as signed or unsigned the same rules apply for if add or if subtract. Why I said that the carry out MIGHT BE inverted is that some processors invert the carry out and some don't. It has to do with cascading operations, taking say a 32 bit addition logic and using the carry flag and an add with carry or subtract with borrow instruction creating a 64 bit add or subtract, or any multiple of the base register size. Say you have two 64 bit numbers in a 32 bit system a:b + c:d where a:b is the 64 bit number but it is held in the two registers a and b where a is the upper half and b is the lower half. so a:b + c:d = e:f on a 32 bit system unsigned that has a carry bit and add with carry:
add f,b,d
addc e,a,c
The add leaves its carry out bit from the upper most bit lane in the carry flag in the status register, the addc instruction is add with carry takes the operands a+c and if the carry bit is set then adds one more. a+c+1 putting the result in e and the carry out in the carry flag, so:
add f,b,d
addc e,a,c
addc x,y,z
Is a 96 bit addition, and so on. Here again something very foreign to mips since it doesn't use flags like other processors. Where the invert or don't invert comes in for signed carry out is on the subtract with borrow for a particular processor. For subtract:
if subtract then carry in = 1; the b operand is inverted, the carry out MIGHT BE inverted.
For subtract with borrow you have to say if the carry flag from the status register indicates a borrow then the carry in is a 0 else the carry in is a 1, and you have to get the carry out into the status register to indicate the borrow.
Basically for the normal subtract some processors invert the b operand and the carry in on the way in and the carry out on the way out, and some processors invert the b operand and the carry in on the way in but don't invert the carry out on the way out. Then when you want to do a conditional branch you need to know if the carry flag means greater than or less than (often the syntax will have a branch if greater or branch if less than, and sometimes the documentation tells you which one is the simplified branch if carry set or branch if carry clear). (If you don't "get" what I just said there, that is another equally long answer which won't mean anything so long as you are studying mips.)
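The add / addc chaining for a:b + c:d = e:f described earlier can be mimicked in Python, with the carry passed along by hand instead of living in a status register. The helper names add32 and add64 are made up for this sketch.

```python
MASK32 = 0xFFFFFFFF

def add32(x, y, carry_in=0):
    """One 32-bit add; returns (result, carry_out)."""
    total = (x & MASK32) + (y & MASK32) + carry_in
    return total & MASK32, total >> 32

def add64(a, b, c, d):
    """(a:b) + (c:d) -> (carry, e, f); a and c are the upper halves."""
    f, carry = add32(b, d)            # add  f,b,d
    e, carry = add32(a, c, carry)     # addc e,a,c
    return carry, e, f

print(add64(0, 0xFFFFFFFF, 0, 1))            # (0, 1, 0): carry ripples into e
print(add64(0xFFFFFFFF, 0xFFFFFFFF, 0, 1))   # (1, 0, 0): 64-bit unsigned overflow
```

A third add32 call fed with the carry from the second would give the 96 bit addition.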
Since 32-bit signed integers are represented by 1 sign bit and 31 bits for the actual number, we are effectively adding two 31-bit numbers. Hence the 32nd bit (the sign bit) is where the overflow becomes visible.
"The lack of a 33rd bit means that when overflow occurs, the sign bit is set with the value of the result instead of the proper sign of the result."
Imagine the following addition of two positive numbers (16 bit to simplify):
  0100 1100 0011 1010 (19514)
+ 0110 0010 0001 0010 (25106)
= 1010 1110 0100 1100 (-20916 [or 44620])
For the summation of two large negative numbers however the extra bit would be required
  1100 1100 0011 1010
+ 1110 0010 0001 0010
=11010 1110 0100 1100
Usually the CPU exposes this 33rd bit (or whatever bit size it operates on, plus one) as a carry flag in the micro-architecture.
Their description relates to operations on values with a particular bit sequence: the first bit corresponds to the sign of the value, and the other bits relate to the magnitude of that value.
What does this mean? The sign bit is set with the "value" of the result...
They mean that the overflow bit - the one that is a consequence of adding two numbers that need to spill into the next digit over - is dumped into the same place that the sign bit should be.
"Since we need just one extra bit, only the sign bit can be wrong."
All this means is that, when you perform arithmetic that overflows, the only bit whose value may be incorrect is the sign bit. All of the other bits are still the value they should be.
This is a consequence of what was described above: confusion between the sign bit's value due to overflow.
