Dividing a 16-bit Register Into Two 8-bit Parts - cpu-registers

I have encountered some registers on some websites and in my textbook. Generally, 16-bit registers are divided into two parts. These two 8-bit parts are labelled L (low) and H (high).
Why is this done?
Is it that we work on the 8-bit registers?
Do these low and high names give a way of using different parts of the register?

If the CPU you are talking about is from the 8086 family, then yes: it has 16-bit general-purpose registers that can be accessed directly, meaning you can move 16 bits at a time, or load only one byte (8 bits) into either the lower part of the register (the least significant bits) or the higher part (the most significant bits).
Why is this done?
I don't know all the reasons, but there must have been a mix of trade-offs at the time the CPU was designed. Remember that the predecessors of this CPU were 8-bit CPUs, which may have imposed some backward-compatibility requirements.
Is it that we work on the 8-bit registers?
The CPU works with the 16-bit registers, but you can address the lower part or the higher part individually.
Do these low and high names give a way of using different parts of the register?
Both: you can write either half as an input, and you can also read their values back.
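As a minimal sketch (in C rather than assembly), this is how a 16-bit register such as the 8086's AX relates to its AL (low) and AH (high) halves; the variables here just model the registers:

    #include <stdio.h>
    #include <stdint.h>

    int main(void) {
        /* AX is 16 bits; AL and AH are its independently addressable
           halves. Shifts and masks model the same split. */
        uint16_t ax = 0x1234;
        uint8_t  al = ax & 0xFF;         /* low byte:  0x34 */
        uint8_t  ah = (ax >> 8) & 0xFF;  /* high byte: 0x12 */

        ah = 0x56;                       /* write only the high half... */
        ax = (uint16_t)(ah << 8) | al;   /* ...and AX becomes 0x5634    */

        printf("AX=%04X AH=%02X AL=%02X\n", ax, ah, al);
        return 0;
    }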

Related

How does the program read 32 bits from memory in a single clock cycle?

So, I have this assignment where I need to design a 32-bit RISC 5-stage pipeline. I must support at least 32 (32-bit) instructions and 32 (32-bit) data values, and the memory should be read in 1 clock cycle. For this, I have used a word-addressable memory (one address contains 32 bits), but I want to make it byte-addressable.
One way of doing this is to make the external clock four times slower and pass that slow clock into the other stages of the pipeline, while passing the original clock into the memory part. But this makes the simulation hectic: I would have to run the clock 20 times instead of 5.
Another way is to run a clock attached to the memory that is four times faster than the external clock. By the time a single external clock cycle passes, the memory will have been accessed four times, so the complete 32 bits will have been brought in. But circuits for doubling/quadrupling the frequency of a clock seem too complicated.
Are there simpler frequency doubler circuits that can be implemented, or is there any other way to do this?
I am using logisim-evolution to simulate this, and for the memory part I have used the built-in RAM.
The normal way to make a 32-bit byte-addressable memory is to have four 8-bit memory subsystems that are all fed the top N-2 bits of the byte address. When doing a 32-bit load or store, all four memory subsystems are active. When doing a 16-bit load or store, the second-from-the-bottom address bit is used to control whether to activate the first and second subsystems or the third and fourth. When doing an 8-bit load or store, the bottom address bit selects between the first and second, or between the third and fourth, subsystem.
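A minimal sketch of that bank-selection scheme in C; the names (bank, store8, load32) and sizes are made up for illustration:

    #include <stdio.h>
    #include <stdint.h>

    /* Four 8-bit banks; bank[i] holds byte lane i of every 32-bit word. */
    #define WORDS 1024
    static uint8_t bank[4][WORDS];

    static void store8(uint32_t addr, uint8_t value) {
        uint32_t word = addr >> 2;  /* top N-2 address bits pick the row */
        uint32_t lane = addr & 3;   /* bottom 2 bits pick one of 4 banks */
        bank[lane][word] = value;
    }

    static uint32_t load32(uint32_t addr) {
        uint32_t word = addr >> 2;  /* all four banks active at once */
        return  (uint32_t)bank[0][word]
              | (uint32_t)bank[1][word] << 8
              | (uint32_t)bank[2][word] << 16
              | (uint32_t)bank[3][word] << 24;
    }

    int main(void) {
        store8(4, 0xDD); store8(5, 0xCC); store8(6, 0xBB); store8(7, 0xAA);
        printf("%08X\n", load32(4));  /* prints AABBCCDD */
        return 0;
    }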

Do NaN-boxing and tagged pointers have a future on 64-bit platforms?

On common 64-bit architectures like x86-64 and arm64, usually only 48 bits are used for memory addressing, while the upper bits are copies of bit 47 (which is zero for user-space programs). Thus, the remaining 16 bits can be used to store additional data like type tags, as long as those bits are masked off before dereferencing. Alternatively, a 48-bit pointer can fit into the payload of the NaN representation of a 64-bit float. Both techniques are often used by dynamic/interpreted languages.
I've read about Intel 5-level-paging which would extend the address range from 48 to 57 bits, thus significantly reducing the leftover bits and also rendering NaN-boxing impossible. The Linux Kernel has already added support for this paging scheme.
Given that 48 bits correspond to 256 TiB (262,144 GiB) of memory, we can assume we won't need the 57-bit range anytime soon on consumer devices like PCs, laptops and phones. Thus one might assume that those devices will remain in 48-bit mode for a long time to come, with the above-mentioned techniques remaining viable, while 57-bit mode will only be used for servers/supercomputers.
Am I correct to make those assumptions? Or are there indicators that even consumer-scale devices will use the 57 bit mode in the near future?
Even if memory-mapped persistent storage (NV-DIMM) becomes widespread, it'll be a while before consumer PCs have more than 64 TiB or 128 TiB of storage + DRAM. Remember that high-half kernels want half the virtual address space for kernel use, and typically want to direct-map all physical memory into one big contiguous range of virtual addresses, as well as making other mappings in kernel space, I think. See https://www.kernel.org/doc/Documentation/x86/x86_64/mm.txt for what Linux does.
As you suspect, OSes wouldn't actually enable PML5 on computers that have far less than 256TiB of physical address space. There's no need for that much virtual address space and it has a performance cost (more expensive page-walks from another level of page tables). The page-walk hardware wouldn't always be able to keep the two actually-used top-level entries cached; invalidations of everything on CR3 changes can force flushing. (Page-walk hardware can in general cache upper levels of the radix tree to speed up TLB misses for nearby pages.)
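For reference, a minimal sketch of the pointer-tagging trick from the question, assuming user-space pointers whose top 16 bits are zero (exactly the assumption that 57-bit addressing would break); the names here are made up:

    #include <stdio.h>
    #include <stdint.h>

    #define TAG_SHIFT 48
    #define ADDR_MASK ((UINT64_C(1) << TAG_SHIFT) - 1)

    /* Pack a 16-bit tag into the unused top bits of a 48-bit pointer. */
    static uint64_t tag_ptr(void *p, uint16_t tag) {
        return ((uint64_t)tag << TAG_SHIFT) | ((uintptr_t)p & ADDR_MASK);
    }

    /* The tag bits MUST be masked off before dereferencing. */
    static void *untag_ptr(uint64_t boxed) {
        return (void *)(uintptr_t)(boxed & ADDR_MASK);
    }

    static uint16_t ptr_tag(uint64_t boxed) {
        return (uint16_t)(boxed >> TAG_SHIFT);
    }

    int main(void) {
        int x = 42;
        uint64_t boxed = tag_ptr(&x, 0x7);  /* 0x7: hypothetical type tag */
        printf("tag=%u value=%d\n", (unsigned)ptr_tag(boxed),
               *(int *)untag_ptr(boxed));
        return 0;
    }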

Why are the Motorola 68k's 32-bit general-purpose registers divided into data registers and address registers?

The 68k registers are divided into two groups of eight: eight data registers (D0 to D7) and eight address registers (A0 to A7). What is the purpose of this separation? Would it not be better if they were united?
The short answer is that this separation comes from architecture limitations and design decisions made at the time.
The long answer:
The M68K implements quite a lot of addressing modes (especially compared with RISC processors), with many of its instructions supporting most (if not all) of them. This gives a large variety of addressing-mode combinations within every instruction.
This also adds complexity in terms of opcode execution. Take the following example:
move.l $10(pc), -$20(a0,d0.l)
The instruction just copies a long-word from one location to another, simple enough. But in order to actually perform the operation, the processor needs to figure out the actual (raw) memory addresses to work with for both the source and destination operands. This process, in which the operands' addressing modes are decoded (resolved), is called effective address calculation.
For this example:
To calculate the source effective address, $10(pc), the processor loads the value of the PC (program counter) register and adds $10 to it.
To calculate the destination effective address, -$20(a0,d0.l), the processor loads the value of the A0 register, adds the value of the D0 register to it, then subtracts $20.
That is quite a lot of calculation for a single opcode, isn't it?
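A quick sketch of those two effective-address calculations, with hypothetical values standing in for PC, A0 and D0:

    #include <stdio.h>
    #include <stdint.h>

    int main(void) {
        uint32_t pc = 0x00001000;  /* program counter       */
        uint32_t a0 = 0x00002000;  /* address register A0   */
        uint32_t d0 = 0x00000010;  /* data register D0 (.l) */

        uint32_t src = pc + 0x10;       /* $10(pc)       */
        uint32_t dst = a0 + d0 - 0x20;  /* -$20(a0,d0.l) */

        printf("source EA = %08X, destination EA = %08X\n", src, dst);
        return 0;
    }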
But the M68K is quite fast in performing these calculations. In order to calculate effective addresses quickly, it implements a dedicated Address Unit (AU).
As a general rule, operations on data registers are handled by the ALU (Arithmetic Logical Unit) and operations involving address calculations are handled by the AU (Address Unit).
The AU is well optimized for 32-bit address operations: it performs a 32-bit addition/subtraction within one bus cycle (4 CPU ticks), which the ALU doesn't (it takes 2 bus cycles for 32-bit operations).
However, the AU is limited to just loads and basic addition/subtraction (as dictated by the addressing modes), and it's not connected to the CCR (Condition Code Register), which is why operations on address registers never update flags.
That said, the AU was put there to optimize the calculation of complex addressing modes, but it couldn't replace the ALU completely (after all, there were only about 68K transistors in the M68K). Hence there are two register sets (data and address registers), each with its own dedicated unit.
This is just based on a quick lookup, but having all 16 registers usable anywhere would obviously be easier to program. The problem is that every instruction would then need a register field able to name any of the 16 registers, an extra bit per field, which would double the opcode space needed. Splitting them half-and-half by purpose is not ideal, but it gives access to more registers overall.

What if a bus can't take a whole instruction length?

I'm learning about computer architecture and I know how a computer works when it executes a program. The thing that confuses me is when the instruction length is longer than the width of the bus AND the instruction length is NOT double the bus width. Let's say we have 12-bit instructions and an 8-bit bus. What does the computer do? Does it:
1. Analyse the PC
2. Go to the address in the PC
3. Fetch 8 bits of the instruction
4. Store the 8 bits in the instruction register
5. Increase the PC by 8 bits (???)
6. Fetch the remaining 4 bits
7. Fill the instruction register (which is 12 bits long?)
Well, as you can see, I'm confused here. I suppose it's not like this, but I need to know in detail how it works and what the PC is after every step.
Would be very grateful for some help! Thanks in advance.
Normally, the smallest amount of memory that can be read or written is 1 byte, i.e. 8 bits. So if the CPU needs only 12 bits, it has to read two 8-bit bytes. From the 16 bits, the required 12 bits are extracted by hardware, and the remaining 4 bits are not used.
Since this is not very memory-efficient, the instruction length of a CPU is normally a multiple of 8 bits; bits that would otherwise be wasted are put to use, e.g. by packing operands directly into the instruction.
So the 7 steps in your example are right except step 6, in which a full 8 bits are fetched, of which only 4 are used.
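A minimal sketch of that fetch sequence, assuming 12-bit instructions stored two bytes apart with the low 4 bits of every second byte unused; the memory contents are made up:

    #include <stdio.h>
    #include <stdint.h>

    static const uint8_t memory[] = { 0xAB, 0xC0, 0xDE, 0xF0 };

    int main(void) {
        unsigned pc = 0;
        while (pc + 1 < sizeof memory) {
            uint8_t lo = memory[pc];      /* first 8-bit bus read  */
            uint8_t hi = memory[pc + 1];  /* second 8-bit bus read */
            /* Keep 12 of the 16 bits; the hardware discards the
               bottom 4 bits of the second byte. */
            uint16_t ir = (uint16_t)(lo << 4) | (hi >> 4);
            printf("PC=%u IR=%03X\n", pc, (unsigned)ir);
            pc += 2;  /* the PC advances by two bytes per instruction */
        }
        return 0;
    }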

Why 24-bit registers?

In my work I deal with various microcontrollers, microprocessors and DSP processors. Many of them have 24-bit registers and counters.
I know how to use them, this is not my question.
My question is: why do they have 24-bit registers? Why not make them 32-bit?
And as far as I know, it is not a problem of size, because the registers are already 32 bits but have a maximum of 0xFFFFFF.
Does this allow an easier hardware implementation? Faster calculations?
Or is it just "hmmm, let's put in 24-bit registers to make the programmers' job harder"?
My guess is that most DSP applications simply don't need 32 bits. Digital audio most commonly uses 24-bit samples. Implementing 32 bits would require more transistors and thus result in higher costs.
Why would 32 bits be easier for the programmer?
Also, you state that the registers have a maximum of 0xFFFFFF, which makes them 24-bit by definition, not 32-bit as you suggest.
There is no particular reason for 8/16/32/64 bits. There are 24-bit DSPs, 18-bit PICs, the 36-bit PDP... Each bit costs time, money and power, so having enough bits is good enough; no need to overdo it. Just look at the original PCs with 20 address lines, even though memory pointers could be up to 32 bits.
Tagging onto Tomas' answer: some DSPs have a saturating register mode where overflow locks the value at the highest state. If the data is 24-bit and it would roll over into the 25th bit, it should lock there, not at the 32-bit rollover.
For audio you would typically want 16-bit output. Since some precision is lost during processing, they pick a reasonable size somewhat bigger than 16 bits, which happens to be 24 bits.
The reason not to go to a full 32 bits is that it would need substantially more hardware, especially for multiplication.
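A minimal sketch of that 24-bit saturating behaviour (the function name is made up):

    #include <stdio.h>
    #include <stdint.h>

    #define MAX24 0xFFFFFFu  /* highest 24-bit value */

    /* Saturating 24-bit add: the result locks at 0xFFFFFF instead of
       rolling over into the 25th bit. */
    static uint32_t add24_sat(uint32_t a, uint32_t b) {
        uint32_t sum = (a & MAX24) + (b & MAX24);
        return sum > MAX24 ? MAX24 : sum;
    }

    int main(void) {
        printf("%06X\n", add24_sat(0xFFFFF0, 0x20));  /* FFFFFF, not 000010 */
        return 0;
    }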
