Z80 Register Endianness

Considering this sample code:
ZilogZ80A cpu = new ZilogZ80A();
// ... load 229 into H and 90 into L here (the loading code is not shown) ...
Console.WriteLine("H : " + cpu.GeneralRegisters.H.ToString());
Console.WriteLine("L : " + cpu.GeneralRegisters.L.ToString());
Console.WriteLine("HL: " + cpu.GeneralRegisters.HL.ToString());
Console.WriteLine("Load 23268 (0x5AE4) into register HL...");
// ... load 23268 into HL here (not shown) ...
Console.WriteLine("H : " + cpu.GeneralRegisters.H.ToString());
Console.WriteLine("L : " + cpu.GeneralRegisters.L.ToString());
Console.WriteLine("HL: " + cpu.GeneralRegisters.HL.ToString());
It does the following:
Load 229 (decimal) into register H
Load 90 (decimal) into register L
Print out the values (hex, binary MSB, decimal) of the H, L and HL registers
Load 23268 (decimal) into register HL
Print out the values of the H, L and HL registers again.
Sample output:
H : 08-bit length register (#45653674): 0x00E5 | MSB 0b11100101 | 229
L : 08-bit length register (#41149443): 0x005A | MSB 0b01011010 | 90
HL: 16-bit length register (#39785641): 0x5AE5 | MSB 0b01011010 11100101 | 23269
Load 23268 (0x5AE4) into register HL...
H : 08-bit length register (#45653674): 0x00E4 | MSB 0b11100100 | 228
L : 08-bit length register (#41149443): 0x005A | MSB 0b01011010 | 90
HL: 16-bit length register (#39785641): 0x5AE4 | MSB 0b01011010 11100100 | 23268
Now for the questions:
1. Are the above assumptions (and the sample output) on how the registers function correct?
2. Do the other register pairs (AF, BC, DE) function exactly the same way?
3. If the answer to 1. and 2. is yes, why is the Z80 then considered little endian? When the contents of HL get written to memory, the L byte goes first, but surely when they are read back sequentially afterwards the bytes are in big-endian order?

No. HL is composed of H as the most significant byte and L as the least significant one, so loading 23268 (0x5AE4) into HL gives H = 0x5A and L = 0xE4, the opposite of your sample output. If you perform a 16-bit operation like ADD HL,BC, the carry from the top bit of L+C flows into the computation of H+B. All the register pairs are alike in this regard.
That's because the logical order in which things are written isn't related to endianness. E.g. in C you don't have to write 0x0001 on some platforms to mean 0x0100 on others. When writing a number, you write the most significant part first.
The Z80 is little endian because if you store HL to memory, L is written one byte before H. If you read it back, L is read from the address before H.
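A minimal C model of the register-pair layout described above. RegPair, pair_get, pair_set and add_pair are hypothetical names and are not part of the question's ZilogZ80A class. The model treats H as the high byte, shows how a real Z80 would split 23268, and propagates the ADD HL,BC carry byte by byte, from L+C into H+B:

```c
#include <stdint.h>

/* Hypothetical model of a Z80 register pair: h is the high byte, l the low byte. */
typedef struct {
    uint8_t h;
    uint8_t l;
} RegPair;

/* 16-bit view of the pair: H supplies bits 15..8, L supplies bits 7..0. */
uint16_t pair_get(RegPair p)
{
    return (uint16_t)((p.h << 8) | p.l);
}

/* Loading a 16-bit value into the pair splits it into its high and low bytes. */
void pair_set(RegPair *p, uint16_t v)
{
    p->h = (uint8_t)(v >> 8);
    p->l = (uint8_t)(v & 0xFF);
}

/* ADD HL,BC done byte-wise: the carry out of L + C flows into H + B.
   Returns the carry out of bit 15 (what the Z80 puts in the C flag). */
int add_pair(RegPair *hl, RegPair bc)
{
    unsigned lo = hl->l + bc.l;             /* L + C */
    unsigned hi = hl->h + bc.h + (lo >> 8); /* H + B + carry from the low byte */
    hl->l = (uint8_t)lo;
    hl->h = (uint8_t)hi;
    return (int)(hi >> 8);
}
```

With H = 229 and L = 90, pair_get returns 0xE55A (58714), not the 0x5AE5 shown in the sample output.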

ld hl, $1234
ld ($fc00), hl
At this point, H = $12 and L = $34. The byte at $fc00 = $34, and the byte at $fc01 = $12.
So if you subsequently do:
ld hl, $5678
ld ($fc02), hl
($fc00) = $34, ($fc01) = $12, ($fc02) = $78, and ($fc03) = $56. So if you read memory byte by byte from $fc00, you get $34127856 rather than $12345678, because the Z80 is little endian.
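The same stores can be sketched in C. Here a 64 KiB byte array stands in for Z80 memory. mem, ld_nn_hl and ld_hl_nn are illustrative names that mirror the LD (nn),HL and LD HL,(nn) instructions:

```c
#include <stdint.h>

static uint8_t mem[0x10000]; /* 64 KiB Z80 address space */

/* Mirrors LD (nn),HL: L is stored at nn, H at nn+1 (little endian). */
void ld_nn_hl(uint16_t nn, uint16_t hl)
{
    mem[nn] = (uint8_t)(hl & 0xFF);               /* L */
    mem[(uint16_t)(nn + 1)] = (uint8_t)(hl >> 8); /* H; the address wraps at 64K */
}

/* Mirrors LD HL,(nn): L is read from nn, H from nn+1. */
uint16_t ld_hl_nn(uint16_t nn)
{
    return (uint16_t)(mem[nn] | (mem[(uint16_t)(nn + 1)] << 8));
}
```

Calling ld_hl_nn(0xFC00) after the two stores returns 0x1234 again. The register value is unchanged; only the byte order in memory is little endian.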


Implement recursion in ASM without procedures

I'm trying to implement functions and recursion in a simplified ASM-like language that has no procedures, only simple jumpz, jump, push, pop, add and mul type commands.
Here are the commands:
(all variables and literals are integers)
set (sets the value of an already existing variable or declares and initializes a new variable) e.g. (set x 3)
push (pushes a value onto the stack. can be a variable or an integer) e.g. (push 3) or (push x)
pop (pops the stack into a variable) e.g. (pop x)
add (adds the second argument to the first argument) e.g. (add x 1) or (add x y)
mul (same as add but for multiplication)
jump (jumps to a specific line of code) e.g. (jump 3) would jump to line 3 or (jump x) would jump to the line # equal to the value of x
jumpz (jumps to a line number if the second argument is equal to zero) e.g. (jumpz 3 x) or (jumpz z x)
The variable 'IP' is the program counter and is equal to the line number of the current line of code being executed.
In this language, functions are blocks of code at the bottom of the program that end by popping a value off the stack and jumping to that value (using the stack as a call stack). A function can then be called from anywhere else in the program by pushing the instruction pointer (adjusted to point past the call) onto the stack and then jumping to the start of the function.
This works fine for non-recursive functions.
How could this be modified to handle recursion?
I've read that implementing recursion with a stack is a matter of pushing parameters and local variables onto the stack (and in this lower-level case, I think, also the instruction pointer).
I wouldn't be able to do something like x = f(n). To do this I'd have some variable y (that is also used in the body of f), set y equal to n, call f which assigns its "return value" to y and then jumps control back to where f was called from, where we then set x equal to y.
(a function that squares a number whose definition starts at line 36)
1 - set y 3
2 - set returnLine IP
3 - add returnLine 4
4 - push returnLine
5 - jump 36
6 - set x y
36 - mul y y
37 - pop returnLine
38 - jump returnLine
This doesn't seem to lend itself to recursion. Arguments and intermediate values would need to go on the stack and I think multiple instances on the stack of the same address would result from recursive calls which is fine.
The following code raises the number "base" to the power "exponent" recursively in "John Smith Assembly":
1 - set base 2 ;RAISE 2 TO ...
2 - set exponent 4 ;... EXPONENT 4 (2^4=16).
3 - set result 1 ;MUST BE 1 IN ORDER TO MULTIPLY.
4 - set returnLine IP ;IP = 4.
5 - add returnLine 4 ;RETURNLINE = 4+4.
6 - push returnLine ;PUSH 8.
7 - jump 36 ;CALL FUNCTION.
36 - jumpz 43 exponent ;FINISH IF EXPONENT IS ZERO.
37 - mul result base ;RESULT = ( RESULT * BASE ).
38 - add exponent -1 ;RECURSIVE CONTROL VARIABLE.
39 - set returnLine IP ;IP = 39.
40 - add returnLine 4 ;RETURN LINE = 39+4.
41 - push returnLine ;PUSH 43.
42 - jump 36 ;RECURSIVE CALL.
43 - pop returnLine
44 - jump returnLine
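In order to test it, let's run it manually. Each row shows the line executed and the value it sets. On push and pop rows, the number in parentheses tells which copy of 43 on the stack is involved: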
1 | 2
2 | 4
3 | 1
4 | 4
5 | 8
6 | 8
7 |
36 |
37 | 2
38 | 3
39 | 39
40 | 43
41 | 43(1)
42 |
36 |
37 | 4
38 | 2
39 | 39
40 | 43
41 | 43(2)
42 |
36 |
37 | 8
38 | 1
39 | 39
40 | 43
41 | 43(3)
42 |
36 |
37 | 16
38 | 0
39 | 39
40 | 43
41 | 43(4)
42 |
36 |
43 | 43(4)
44 |
43 | 43(3)
44 |
43 | 43(2)
44 |
43 | 43(1)
44 |
43 | 8
44 |
8 |
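The manual trace can also be checked mechanically. Below is a hypothetical C transliteration of the program; toy_pow, stack_push and stack_pop are illustrative names. Each case label is one numbered line of the toy program, ip plays the role of IP, and call_stack is the stack that push/pop use. Lines 1 to 3 become the function's parameters. The sketch assumes 0 <= exponent < STACK_MAX so the fixed-size stack cannot overflow:

```c
#define STACK_MAX 64

static int call_stack[STACK_MAX];
static int sp = 0; /* number of entries currently on call_stack */

static void stack_push(int v) { call_stack[sp++] = v; }
static int stack_pop(void) { return call_stack[--sp]; }

/* Executes lines 4..44 of the toy pow program; returns "result" when
   control comes back to line 8 in the caller. */
int toy_pow(int base, int exponent)
{
    int result = 1;     /* line 3 */
    int returnLine = 0;
    int ip = 4;

    for (;;) {
        switch (ip) {
        case 4:  returnLine = ip;            ip = 5;  break;
        case 5:  returnLine += 4;            ip = 6;  break;
        case 6:  stack_push(returnLine);     ip = 7;  break;
        case 7:                              ip = 36; break; /* call */
        case 8:  return result;                              /* back in the caller */
        case 36: ip = (exponent == 0) ? 43 : 37;      break; /* jumpz 43 exponent */
        case 37: result *= base;             ip = 38; break;
        case 38: exponent += -1;             ip = 39; break;
        case 39: returnLine = ip;            ip = 40; break;
        case 40: returnLine += 4;            ip = 41; break;
        case 41: stack_push(returnLine);     ip = 42; break;
        case 42:                             ip = 36; break; /* recursive call */
        case 43: returnLine = stack_pop();   ip = 44; break;
        case 44: ip = returnLine;                     break; /* return */
        default: return -1;                                  /* not reached by this program */
        }
    }
}
```

toy_pow(2, 4) returns 16 and leaves the stack empty: the four copies of 43 and the initial 8 are popped in the same order as in the trace.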
Edit: the function's parameters are now passed on the stack (I didn't trace this version manually):
1 - set base 2 ;RAISE 2 TO ...
2 - set exponent 4 ;... EXPONENT 4 (2^4=16).
3 - set result 1 ;MUST BE 1 IN ORDER TO MULTIPLY.
4 - set returnLine IP ;IP = 4.
5 - add returnLine 7 ;RETURNLINE = 4+7.
6 - push returnLine ;PUSH 11.
7 - push base ;FIRST PARAMETER.
8 - push result ;SECOND PARAMETER.
9 - push exponent ;THIRD PARAMETER.
10 - jump 36 ;FUNCTION CALL.
36 - pop exponent ;THIRD PARAMETER.
37 - pop result ;SECOND PARAMETER.
38 - pop base ;FIRST PARAMETER.
39 - jumpz 49 exponent ;FINISH IF EXPONENT IS ZERO.
40 - mul result base ;RESULT = ( RESULT * BASE ).
41 - add exponent -1 ;RECURSIVE CONTROL VARIABLE.
42 - set returnLine IP ;IP = 42.
43 - add returnLine 7 ;RETURN LINE = 42+7.
44 - push returnLine ;PUSH 49.
45 - push base
46 - push result
47 - push exponent
48 - jump 36 ;RECURSIVE CALL.
49 - pop returnLine
50 - jump returnLine
Your asm does provide enough facilities to implement the usual procedure call / return sequence. You can push a return address and jump as a call, and pop a return address (into a scratch location) and do an indirect jump to it as a ret. We can just make call and ret macros. (Except that generating the correct return address is tricky in a macro; we might need a label (push ret_addr), or something like set tmp, IP / add tmp, 4 / push tmp / jump target_function). In short, it's possible and we should wrap it up in some syntactic sugar so we don't get bogged down with that while looking at recursion.
With the right syntactic sugar, you can implement Fibonacci(n) in assembly that will actually assemble for both x86 and your toy machine.
You're thinking in terms of functions that modify static (global) variables. Recursion requires local variables so each nested call to the function has its own copy of local variables. Instead of having registers, your machine has (apparently unlimited) named static variables (like x and y). If you want to program it like MIPS or x86, and copy an existing calling convention, just use some named variables like eax, ebx, ..., or r0 .. r31 the way a register architecture uses registers.
Then you implement recursion the same way you do in a normal calling convention, where either the caller or the callee uses push / pop to save/restore a register on the stack so it can be reused. Function return values go in a register. Function args should go in registers. An ugly alternative would be to push them after the return address (creating a callee-pops-the-args calling convention, since the callee has to pop them to reach the return address), because you don't have a stack-relative addressing mode to access them the way x86 does (above the return address on the stack). Or you could pass return addresses in a link register, like most RISC call instructions (usually called bl or similar, for branch-and-link), instead of pushing it like x86's call. (Non-leaf callees then have to push the incoming lr onto the stack themselves before making another call.)
A (silly and slow) naively-implemented recursive Fibonacci might do something like:
int Fib(int n) {
    if(n<=1) return n;  // Fib(0) = 0; Fib(1) = 1
    return Fib(n-1) + Fib(n-2);
}
## valid implementation in your toy language *and* x86 (AMD64 System V calling convention)
### Convenience macros for the toy asm implementation
# pretend that the call implementation has some way to make each return_address label unique so you can use it multiple times.
# i.e. just pretend that pushing a return address and jumping is a solved problem, however you want to solve it.
%define call(target) push return_address; jump target; return_address:
%define ret pop rettmp; jump rettmp # dedicate a whole variable just for ret, because we can
# As the first thing in your program, set eax, 0 / set ebx, 0 / ...
global Fib
Fib:
# input: n in edi.
# output: return value in eax
# if (n<=1) return n; // the asm implementation of this part isn't interesting or relevant. We know it's possible with some adds and jumps, so just pseudocode / handwave it:
... set eax, edi and ret if edi <= 1 ... # (not shown because not interesting)
add edi, -1
push edi # save n-1 for use after the recursive call
call Fib # eax = Fib(n-1)
pop edi # restore edi to *our* n-1
push eax # save the Fib(n-1) result across the call
add edi, -1
call Fib # eax = Fib(n-2)
pop edi # use edi as a scratch register to hold Fib(n-1) that we saved earlier
add eax, edi # eax = return value = Fib(n-1) + Fib(n-2)
ret
During a recursive call to Fib(n-1) (with n-1 in edi as the first argument), the n-1 arg is also saved on the stack, to be restored later. So each function's stack frame contains the state that needs to survive the recursive call, and a return address. This is exactly what recursion is all about on a machine with a stack.
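The register-saving pattern can be checked in C by modelling the "registers" as globals, which is what the toy machine's named variables are. This is an illustrative sketch, and Fib, save, restore and fib_via_globals are hypothetical names. save()/restore() stand in for push/pop on an explicit stack, and C's own call/return stands in for the call/ret macros. The handwaved base case is filled in with the obvious "eax = edi":

```c
static int eax, edi;        /* the "registers": plain global variables */
static int data_stack[256]; /* explicit stack used by push/pop */
static int top = 0;

static void save(int v) { data_stack[top++] = v; }   /* push */
static int restore(void) { return data_stack[--top]; } /* pop */

/* Input in edi, result in eax, exactly as in the asm version. */
static void Fib(void)
{
    if (edi <= 1) {     /* if (n <= 1) return n; */
        eax = edi;
        return;
    }
    edi += -1;
    save(edi);          /* push edi: keep our n-1 across the call */
    Fib();              /* eax = Fib(n-1) */
    edi = restore();    /* pop edi: our own n-1 again */
    save(eax);          /* push eax: keep Fib(n-1) across the next call */
    edi += -1;
    Fib();              /* eax = Fib(n-2) */
    edi = restore();    /* pop edi: the saved Fib(n-1) */
    eax += edi;         /* eax = Fib(n-1) + Fib(n-2) */
}

int fib_via_globals(int n)
{
    edi = n;
    Fib();
    return eax;
}
```

Dropping the first save/restore pair, for instance, lets the nested call overwrite the global edi and gives Fib(3) = 1 instead of 2. That is the state that needs to survive the recursive call.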
Jose's example doesn't demonstrate this as well, IMO, because no state needs to survive the call for pow. It just ends up pushing a return address and args, then popping the args, so all it builds up is a chain of return addresses. At the end, it follows that chain back. It could be extended to save local state across each nested call, but it doesn't actually illustrate that.
My implementation is a bit different from how gcc compiles the same C function for x86-64 (using the same calling convention of first arg in edi, ret value in eax). gcc6.1 with -O1 keeps it simple and actually does two recursive calls, as you can see on the Godbolt compiler explorer (-O2 and especially -O3 do some aggressive transformations). gcc saves/restores rbx across the whole function, and keeps n in ebx so it's available after the Fib(n-1) call (and keeps Fib(n-1) in ebx to survive the second call). The System V calling convention specifies rbx as a call-preserved register, but rdi as call-clobbered (and used for arg-passing).
Obviously you can implement Fib(n) much faster non-recursively, with O(n) time complexity and O(1) space complexity, instead of the naive recursion's O(Fib(n)) time and O(n) space (stack depth) complexity. It makes a terrible example, but it is trivial.
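For comparison, here is a C sketch of the O(n)-time, O(1)-space version mentioned above. fib_iter is an illustrative name, and the sketch uses the same Fib(0) = 0, Fib(1) = 1 convention:

```c
/* Iterative Fibonacci: after i iterations, a = Fib(i) and b = Fib(i+1). */
unsigned long long fib_iter(unsigned n)
{
    unsigned long long a = 0, b = 1;
    for (unsigned i = 0; i < n; i++) {
        unsigned long long next = a + b;
        a = b;
        b = next;
    }
    return a;
}
```

unsigned long long holds correct results up to Fib(93). Beyond that the value wraps around.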
Margaret's pastebin, modified slightly to run in my interpreter for this language (it has an infinite-loop problem, probably due to a transcription error on my part):
set n 3
push n
set initialCallAddress IP
add initialCallAddress 4
push initialCallAddress
jump fact
set finalValue 0
pop finalValue
print finalValue
jump 100
set rip 0
pop rip
pop n
push rip
set temp n
add n -1
jumpz end n
push n
set link IP
add link 4
push link
jump fact
pop n
mul temp n
pop rip
push temp
jump rip
Successful transcription of Peter's Fibonacci calculator:
String[] x = new String[] {
//n is our input, which term of the sequence we want to calculate
"set n 5",
//temp variable for use throughout the program
"set temp 0",
//call fib
"set temp IP",
"add temp 4",
"push temp",
"jump fib",
//program is finished, prints return value and jumps to end
"print returnValue",
"jump end",
//the fib function, which gets called recursively
//if this is the base case (n <= 1), then f(n) = n, i.e. f(0) = 0 and f(1) = 1, so we set returnValue and return from the call
"jumple base n 1",
"jump notBase",
"set returnValue n",
"pop temp",
"jump temp",
//we want to calculate f(n-1) and f(n-2)
//this is where we calculate f(n-1)
"add n -1",
"push n",
"set temp IP",
"add temp 4",
"push temp",
"jump fib",
//return from the call that calculated f(n-1)
"pop n",
"push returnValue",
//now we calculate f(n-2)
"add n -1",
"set temp IP",
"add temp 4",
"push temp",
"jump fib",
//return from call that calculated f(n-2)
"pop n",
"add returnValue n",
//this is where the fib function ultimately ends and returns to caller
"pop temp",
"jump temp",
//end label
};

ADC transfer function

I took over this project from someone who left a long time ago.
I am now looking at the ADC module, but I don't understand what the code means.
ADC: AD7609 (18-bit, 8 channels)
Instrumentation amp: INA114
Signal path: input voltage (0 to +10 V) --> amplifier (INA114) --> AD7609.
Here is the code for that.
After all 8 channels have been converted, the results are stored in data[9].
Then the data is (I think) converted to microvolts:
//convert to microvolts and store the readings
// unsigned long temp[], data[]
temp[0] = ((data[0]<<2)& 0x3FFFC) + ((data[1]>>14)& 0x0003);
temp[1] = ((data[1]<<4)& 0x3FFF0) + ((data[2]>>12)& 0x000F);
temp[2] = ((data[2]<<6)& 0x3FFC0) + ((data[3]>>10)& 0x003F);
temp[3] = ((data[3]<<8)& 0x3FF00) + ((data[4]>>8)& 0x00FF);
temp[4] = ((data[4]<<10)& 0x3FC00) + ((data[5]>>6)& 0x03FF);
temp[5] = ((data[5]<<12) & 0x3F000) + ((data[6]>>4)& 0x0FFF);
temp[6] = ((data[6]<<14)& 0x3FFF0) + ((data[7]>>2)& 0x3FFF);
temp[7] = ((data[7]<<16)& 0x3FFFC) + (data[8]& 0xFFFF);
I don't understand what this code is doing. I know it shifts bits, but how does the result become microvolts?
Transfer function:
//store the final value in the raw data array adstor[]
adstor[i] = (signed long)(((temp[i]*2000)/131072)*10000);
131072 = 2^(18-1), but I don't know where the other values come from.
The AD7609 datasheet says "The FSR for the AD7609 is 40 V for the ±10 V range and 20 V for the ±5 V range", so I guessed he chose the 20 V range described above and it somehow turned into 2000?
Does anyone have any clues?
-------------------Updated question from here ---------------------
I don't understand how the 18-bit value made by concatenating data[0] and data[1] turns into microvolts after the ADC transfer function.
+---------------+---------+---------+---------+---------+
| analog volts  | 1.902 V | 1.921 V | 1.887 V | 1.934 V |
| digital value | 12,464  | 12,589  | 12,366  | 12,674  |
+---------------+---------+---------+---------+---------+
I made an example from data[3:0]:
1 LSB = 20 V / (2^17 - 1) = 152.59 uV/bit, and 1.902 V / 152.59 uV = 12,464
Now go through the concatenation:
temp[0] = ((data[0]<<2)& 0x3FFFC) + ((data[1]>>14)& 0x0003) = C2C0
temp[1] = ((data[1]<<4)& 0x3FFF0) + ((data[2]>>12)& 0x000F) = 312D3
temp[2] = ((data[2]<<6)& 0x3FFC0) + ((data[3]>>10)& 0x003F) = 138C
Then put those into the transfer function to get microvolts:
adstor[i] = (signed long)(((temp[i]*2000)/131072)*10000);
adstor[0] = 7,607,421 with temp[0], but I expected 1.902e6
adstor[1] = 30,735,321 with temp[1], but I expected 1.921e6
adstor[2] = 763,549 with temp[2]
As you can see, they are quite different from the analog values in the table.
I don't understand why the data needs bit-shifting with << and >>, or why two data[] words are added together.
Please note that the maximum 18-bit value is 2^18-1 = $3FFFF = 262143
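For the shift-and-mask code, it appears that s/he unpacks the concatenated 18-bit values into separate longs for easier manipulation by the transfer function. The eight 18-bit results arrive back to back (8 × 18 = 144 bits = 9 × 16-bit words), so each sample straddles two consecutive data[] words. temp[0] is all 16 bits of data[0] followed by the top 2 bits of data[1]. temp[1] is the remaining 14 bits of data[1] followed by the top 4 bits of data[2], and so on. This is also why feeding one sample per data[] word, as in the example above, gives numbers that do not match the table.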
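Here is a hypothetical C helper that does the same unpacking with a loop instead of eight hand-written lines. unpack18 is an illustrative name and does not come from the original project. It assumes data[] holds 16-bit words with the first sample's MSB in the top bit of data[0], which is what the original shifts imply:

```c
#include <stdint.h>

/* Unpacks eight back-to-back 18-bit samples from nine 16-bit words
   (8 * 18 = 144 = 9 * 16 bits). Produces the same values as the
   hand-unrolled temp[0]..temp[7] lines. */
void unpack18(const uint16_t data[9], uint32_t out[8])
{
    uint32_t acc = 0; /* not-yet-consumed bits (never more than 32) */
    int bits = 0;     /* number of valid low bits in acc */
    int w = 0;        /* index of the next input word */

    for (int ch = 0; ch < 8; ch++) {
        while (bits < 18) {            /* top up with whole 16-bit words */
            acc = (acc << 16) | data[w++];
            bits += 16;
        }
        bits -= 18;
        out[ch] = (acc >> bits) & 0x3FFFF;  /* take the top 18 valid bits */
        acc &= (UINT32_C(1) << bits) - 1;   /* keep only the leftover bits */
    }
}
```

Given the question's four example words (0x30B0, 0x312D, 0x304E, 0x3182), this reproduces the same C2C0, 312D3 and 138C values. Those numbers mix bits from two unrelated "samples".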
Regarding the transfer function adstor[i] = (signed long)(((temp[i]*2000)/131072)*10000);
To convert from a raw A/D reading to volts, s/he multiplies by the expected full-scale voltage and divides by the maximum possible A/D value (in this case, $3FFFF). So there seems to be an error in the code, as s/he divides by 2^17 (131072) and not by 2^18-1. Another possibility is that s/he uses half the range of the A/D and compensates for that this way.
If you want 20 V expressed in microvolts, you need to multiply it by 1e6, which gives a factor of 20e6. To avoid overflowing the long, s/he splits that factor into two parts (*2000 and *10000, whose product is 20e6). Because of the intermediate division, the number gets small enough to be multiplied by 10000 at the end without overflowing. The cost is precision: the result is truncated to a multiple of 10000 uV (10 mV).
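The sketch below shows the precision cost of the split multiplication. It compares the legacy formula, kept as written, with a hypothetical variant that uses a 64-bit intermediate (int64_t) and divides only once. legacy_uV and scaled_uV are illustrative names. Both keep the code's 20 V per 131072 counts scale factor. Whether that factor is right for the 0 to +10 V input is the open question discussed above:

```c
#include <stdint.h>

/* The legacy transfer function as written: the intermediate integer
   division truncates, so the result is a multiple of 10000 uV. */
long legacy_uV(unsigned long raw)
{
    return (long)(((raw * 2000) / 131072) * 10000);
}

/* Hypothetical variant: multiply by the full 20e6 uV first in 64 bits,
   then divide once, so only the final division truncates. */
long scaled_uV(uint32_t raw)
{
    return (long)(((int64_t)raw * 20000000) / 131072);
}
```

With temp = 0xC2C0, the legacy formula gives 7,600,000, while the single-division form gives 7,607,421. That second figure is the one quoted in the question, so the question's hand calculations used exact arithmetic rather than the code's integer division.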
P.S. (I use $ as equivalent to 0x due to many years of habit in certain assembly languages)

Convert an 8bit number to hex in z80 assembler

I am writing a game for the ZX Spectrum using z80 and have a bit of a problem.
I have adapted a routine to convert a number held in the “a” register to a hex value held in “de”. I'm not sure how to convert the other way, e.g. pass in a hex value in de and convert it to a number held in “a”.
NB: The following routine converts the input to the ASCII codes of the characters 0 through F. E.g. if a = 255 then d = 70 and e = 70, as “F” is ASCII value 70.
NumToHex ld c, a ; a = number to convert
call Num1
ld d, a
ld a, c
call Num2
ld e, a
ret ; return with hex number in de
Num1 rra ; four RRAs move the high nibble into the low nibble
rra
rra
rra
Num2 or $F0
daa
add a, $A0
adc a, $40 ; Ascii hex at this point (0 to F)
ret
Can anyone advise on a solution to work this in reverse or offer a better solution?
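For reference, here is a C sketch of what NumToHex produces. num_to_hex and nibble_to_ascii are illustrative names. The Z80 version gets the same mapping from the arithmetic trick in Num2 rather than from an explicit comparison:

```c
#include <stdint.h>

/* Maps 0..15 to the ASCII codes of '0'..'9' and 'A'..'F'. */
static char nibble_to_ascii(uint8_t n)
{
    return (char)(n < 10 ? '0' + n : 'A' + (n - 10));
}

/* High nibble goes to *d (like D), low nibble to *e (like E). */
void num_to_hex(uint8_t a, char *d, char *e)
{
    *d = nibble_to_ascii((uint8_t)(a >> 4)); /* high nibble shifted down */
    *e = nibble_to_ascii(a & 0x0F);
}
```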
This code takes DE as a hexadecimal number in ASCII and converts it to binary in A. It assumes that DE is a valid hexadecimal number using uppercase 'A' through 'F'. It will fail if lowercase letters or any ASCII character outside of '0' .. '9' and 'A' .. 'F' are used.
HexToNum ld a,d
call Hex1
add a,a
add a,a
add a,a
add a,a
ld d,a
ld a,e
call Hex1
or d
ret ; return with the number in a
Hex1 sub a,'0'
cp 10
ret c
sub a,'A'-'0'-10
ret
Update: I have now tested the code and fixed a bug in the handling of the 'A' .. 'F' case in Hex1.
Update: Using "add a,a", which is faster than "sla a". Note that if speed is a concern, both conversions can be done much more quickly with lookup tables.
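The same logic in C, as an illustrative sketch. hex1 and hex_to_num are hypothetical names. The sketch mirrors Hex1's subtract-and-compare steps and, like the Z80 code, assumes valid uppercase input:

```c
#include <stdint.h>

/* Mirrors Hex1: subtract '0'; values below 10 came from '0'..'9',
   otherwise subtract a further 'A' - '0' - 10 (= 7) so 'A'..'F' map to 10..15. */
static uint8_t hex1(uint8_t c)
{
    c = (uint8_t)(c - '0');
    if (c < 10)
        return c;
    return (uint8_t)(c - ('A' - '0' - 10));
}

/* Mirrors HexToNum: d holds the high digit, e the low digit. */
uint8_t hex_to_num(uint8_t d, uint8_t e)
{
    uint8_t high = (uint8_t)(hex1(d) << 4); /* the four ADD A,A */
    return (uint8_t)(high | hex1(e));        /* OR D */
}
```

A lookup table indexed by the ASCII code would replace hex1's compare and branch. This is the kind of speed-up the answer alludes to.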

About pointers and ASCII code

I'm learning more about the C language and I have a question about some code I have seen.
int i = (65*256+66)*256+67;
int* pi;
char* pc;
pi = &i;
pc = (char*)pi;
printf("%c %c %c \n", *pc, *(pc+1), *(pc+2));
Output is: C B A
I know that the ASCII code of A is 65, B is 66 and C is 67, but the variable i is none of them.
If I set i = 65, the output is just A and doesn't show B or C. Why?
I would also like to know why this code has that output. Thanks for any help.
The line
int i = (65*256+66)*256+67;
turns i into the following
00000000 01000001 01000010 01000011
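Expanding the initializer shows which byte each character ends up in:

```latex
\begin{aligned}
(65 \cdot 256 + 66) \cdot 256 + 67 &= 65 \cdot 2^{16} + 66 \cdot 2^{8} + 67 \\
&= \texttt{0x41} \cdot 2^{16} + \texttt{0x42} \cdot 2^{8} + \texttt{0x43} \\
&= \texttt{0x00414243} = 4\,276\,803
\end{aligned}
```

So 'A' (0x41) sits in bits 23..16, 'B' (0x42) in bits 15..8 and 'C' (0x43) in bits 7..0.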
int = 4 bytes or 4 groups of 8 bits
char = 1 byte or 1 group of 8 bits.
What happens is that a char pointer is used to point to a subset of the original int bits.
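At first the pointer points to the 8 least significant bits (the group on the right), because on a little-endian machine such as x86 the least significant byte is stored at the lowest address.
So the letter C is printed. Then the pointer itself is incremented by 1, which makes it point to the next group of 8 bits in memory, which happens to be B. And once more for the A.
If you set i = 65, only the lowest byte is nonzero: *(pc+1) and *(pc+2) are 0 (the NUL character), which prints as nothing visible, so you only see A.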
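Here is a runnable sketch of the same experiment. first_three_bytes and is_little_endian are illustrative names. It reads the bytes through a char pointer, as the question does, and checks the machine's byte order with memcpy. This makes it visible that the C B A order depends on the platform being little endian:

```c
#include <string.h>

/* Returns 1 if the lowest-addressed byte of an int holds its least
   significant byte (little endian), 0 otherwise. */
int is_little_endian(void)
{
    unsigned int one = 1;
    unsigned char first;
    memcpy(&first, &one, 1);
    return first == 1;
}

/* Copies the first three bytes of i, in address order, into out[0..2]. */
void first_three_bytes(int i, char out[3])
{
    const char *pc = (const char *)&i; /* a char pointer may view any object's bytes */
    out[0] = pc[0];
    out[1] = pc[1];
    out[2] = pc[2];
}
```

On a big-endian machine with a 32-bit int, the first three bytes would instead be 0, 'A' and 'B'.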
*256 means a left shift by 8 bits (1 byte), so the line
int i = (65*256+66)*256+67;
actually puts A, B and C in 3 adjacent bytes in memory.
Then pi is made to point to the address of the integer i, and the same address is cast to the char pointer pc. On a little-endian machine (such as x86), pc therefore holds the address of the byte that contains 'C' (the least significant byte). Adding 1 and 2 to that address points to the adjacent 'B' and 'A', which are printed in that order.
EDIT: to clarify a bit more, int is 32 bits long here but char is 8 bits; that's why you need a char pointer to address individual 8-bit bytes.
Characters are stored as bytes, as you probably know. The initialization of the variable 'i' means the following:
65*256 // store 65 ('A') and left shift it by 8 bits (= '*256')
(65*256+66)*256 // add 66 ('B') and shift the whole thing again
(65*256+66)*256+67 // add 67 ('C')
'pi' is initialized as an int pointer to 'i'
'pc' is initialized as a char pointer holding the same address as 'pi', i.e. the address of 'i'
So 'pc' then holds the address of the first byte of 'i'. On a little-endian machine that first byte is the least significant one, which holds 'C'.
By adding 1 and 2 to the address in pc, you get the second and third bytes (containing 'B' and 'A'), as follows:
printf("%c %c %c \n", *pc, *(pc+1), *(pc+2));
Working on the bits here ;D

What Does OpenCL Upsample Do?

After reading the documentation for the OpenCL function upsample, I still have no idea what it does.
The documentation's description of the function is:
result[i] = ((gentype)hi[i] << 8|16|32) | lo[i]
What does that mean? What does upsample do?
Perhaps this is best explained through code rather than words (a snippet is worth 2^10 words, after all):
uchar hi = 0xAA;
uchar lo = 0xBB;
ushort x = upsample(hi, lo); // x = 0xAABB
There are overloads for signed versions which respect the signedness rules, and vector overloads too:
uchar2 hi = (uchar2)(0xAA, 0xBB);
uchar2 lo = (uchar2)(0x11, 0x22);
ushort2 x = upsample(hi, lo); // x = {0xAA11, 0xBB22}
Those don't do anything special; they just operate component-wise.
Mathematically, the description of the function makes sense. It "pushes" the hi argument into the most significant bits of the output, so that hi occupies the upper 8 (short), 16 (int) or 32 (long) bits and lo fills the lower half. Below is an example using the ushort upsample(uchar hi, uchar lo) overload for illustration:
upsample(hi, lo) = (hi << 8) | lo

hi = 01010101
lo = 01101110

lo             = 0000000001101110  (extended to result type ushort)
hi << 8        = 0101010100000000  (extended to result type ushort)

(hi << 8) | lo =   0101010100000000
                 | 0000000001101110
                 = 0101010101101110
                   ^       ^
                   hi      lo
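For readers without an OpenCL device, here is a plain-C model of the 8-bit overloads. It is not OpenCL code: upsample_u8, upsample_s8 and upsample_u8_vec are illustrative names, and uint8_t/int8_t stand in for uchar/char. As I understand the OpenCL C integer-function table, the signed overload takes a signed hi and an unsigned lo, so the sign of the result comes from hi:

```c
#include <stddef.h>
#include <stdint.h>

/* Models ushort upsample(uchar hi, uchar lo): hi in bits 15..8, lo in bits 7..0. */
uint16_t upsample_u8(uint8_t hi, uint8_t lo)
{
    return (uint16_t)(((uint16_t)hi << 8) | lo);
}

/* Models short upsample(char hi, uchar lo). Uses hi * 256 + lo because
   left-shifting a negative value is undefined in C; the low byte of hi * 256
   is zero, so adding lo is the same as OR-ing it in. */
int16_t upsample_s8(int8_t hi, uint8_t lo)
{
    return (int16_t)((int32_t)hi * 256 + lo);
}

/* The vector overloads just apply the scalar rule component-wise. */
void upsample_u8_vec(const uint8_t *hi, const uint8_t *lo, uint16_t *out, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = upsample_u8(hi[i], lo[i]);
}
```

upsample_u8(0x55, 0x6E) gives 0x556E, the 0101010101101110 worked out above.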
Actually, thanks, I didn't know about this function, I could certainly make use of it myself. Cheers!
