Z80 flags - how to generate? - microcontroller

I'm designing a Z80 compatible project. I'm up to designing the flags register.
I originally thought that the flags were generated straight from the ALU depending on the inputs and type of ALU operation.
But after looking at the instructions and the flags result it doesn't seem that the flags are always consistent with this logic.
As a result, I'm now assuming I also have to feed the op-code to the ALU as well, to generate the correct flags each time. But this seems to make the design over-complicated, so before making this huge design step I wanted to check with the Internet.
Am I correct? Or am I just really confused, and is it as simple as I originally thought?

Of course, the type of the operation is important. Consider overflow when doing addition and subtraction. Say, you're adding or subtracting 8-bit bytes:
1+5=6 - no overflow
255+7=6 - overflow
1-5=252 - overflow
200-100=100 - no overflow
200+100=44 - overflow
100-56=44 - no overflow
Clearly, the carry flag's state here depends not only on the input bytes or the resultant byte value, but also on the operation. And it indicates unsigned overflow.
The logic is very consistent. If it ever seems not to be, it's time to read the documentation to learn the official behaviour.
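As a sanity check, the carry behaviour above can be sketched in a few lines of Python (just the arithmetic, not the hardware logic):

```python
def add8(a, b):
    """8-bit add: returns (result, carry), carry = unsigned overflow."""
    total = a + b
    return total & 0xFF, total > 0xFF

def sub8(a, b):
    """8-bit subtract: carry here means a borrow occurred (a < b)."""
    return (a - b) & 0xFF, a < b

# The examples from the answer:
assert add8(1, 5)     == (6, False)    # 1+5=6, no overflow
assert add8(255, 7)   == (6, True)     # 255+7=6, overflow
assert sub8(1, 5)     == (252, True)   # 1-5=252, overflow (borrow)
assert sub8(200, 100) == (100, False)
assert add8(200, 100) == (44, True)
assert sub8(100, 56)  == (44, False)
```

Note that add8(255, 7) and sub8(100, 56) produce different carry results from related byte values; only the operation distinguishes them.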
You might be interested in this question.

Your code is written for the CP/M operating system. I/O is done through the BDOS (Basic Disk Operating System) interface. Basically, you load a function code into the C register and any additional parameters into other registers, then call location 0x5. Function code C=2 writes the character in the E register to the console (= screen). You can see this in action at line 1200:
        ld   e,a        ; character to print
        ld   c,2        ; BDOS function 2: console output
        call bdos
        pop  hl
        pop  de
        pop  bc
        pop  af
        ret

bdos:   push af         ; preserve registers across the BDOS call
        push bc
        push de
        push hl
        call 5          ; BDOS entry point
        pop  hl
        pop  de
        pop  bc
        pop  af
        ret
For a reference to BDOS calls, try here.
To emulate this you need to trap calls to address 5 and implement them using whatever facilities you have available to you.
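As an illustration of such a trap, here is a minimal Python sketch; the register and stack representations are hypothetical stand-ins for whatever state your emulator actually keeps:

```python
import sys

def bdos_trap(regs, stack):
    """Handle a call that landed at address 5 (the CP/M BDOS entry).

    `regs` is a dict of 8-bit register values and `stack` holds return
    addresses; both are stand-ins for a real emulator's state.
    """
    func = regs['C']
    if func == 2:                      # C=2: console output of E
        sys.stdout.write(chr(regs['E']))
    else:
        raise NotImplementedError(f"BDOS function {func}")
    return stack.pop()                 # simulate RET: resume the caller

# Hypothetical emulator state after `ld e,'A'` / `ld c,2` / `call 5`:
regs = {'C': 2, 'E': ord('A')}
stack = [0x0103]                       # return address pushed by CALL
pc = bdos_trap(regs, stack)            # prints 'A', pc is now 0x0103
```

A full emulator would check for PC == 5 in its fetch loop and divert to a handler like this instead of executing from memory.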


Where is TIMSK1 defined?

I am trying to understand timer interrupts in Arduino.
I am finding a major difficulty in retrieving documentation.
Sources online refer to apparently "magical" constants (like TIMSK1 for example) but I am unable to find where they are defined.
Do they come from some sort of header file?
Is there some reference documentation anywhere?
You've already answered, but I'd like to fill in some gaps, perhaps.
Those symbols are not magical; they are the names of the processor's I/O registers and of the bit-fields inside those registers. As you found, the processor datasheet explains the layout of the I/O registers. This is standard across microcontroller vendors.
Here's the home page for the Mega328P: https://www.microchip.com/en-us/product/ATmega328p (linked rather than the datasheet PDF itself, since that URL might change some day). Note that farther down the page there are many very helpful "application notes" that describe how to do things...
How does a C program, or an Arduino sketch, get the values of the registers and fields? From the vendor-supplied header files. See here for Microchip's "packs": http://packs.download.atmel.com
The compiler that comes with the Arduino IDE distribution includes the headers for the boards it supports. They're buried pretty deeply; in practice we go to the datasheet for the register and field names, not usually to the header files.
For the mega328, the header file is here (macos): ~/Arduino.app/Contents/Java/hardware/tools/avr/avr/include/avr/iom328p.h
I'm not sure why you said "(although not explained in full detail)". There's really not much to it - it just holds a few bits that enable interrupts when certain things happen in Timer/Counter 1.
From the 328P datasheet 01/2015, TIMSK1 is described on page 90 "All interrupts are individually masked with the Timer Interrupt Mask Register (TIMSK1)." The register layout is on page 112. And its position within the I/O register set on page 278. A programmer doesn't need more than that! :-)
(I wish I could attach a pdf... it has a big table of the registers, fields, interrupt vectors, DIP pins, ... all generated from that header file.)
After some searching I found that those constants are named (although not explained in full detail) in the ATmega processor manual.
One possible location of the manual is:
https://ww1.microchip.com/downloads/en/DeviceDoc/Atmel-7810-Automotive-Microcontrollers-ATmega328P_Datasheet.pdf
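To tie the datasheet description to actual bit twiddling, here's a small Python sketch of the TIMSK1 bit layout. The bit positions are taken from the 328P datasheet; on a real AVR you'd write the equivalent in C, e.g. `TIMSK1 |= _BV(OCIE1A);`, with the names coming from the vendor header:

```python
# Bit positions within TIMSK1, per the ATmega328P datasheet:
TOIE1  = 0   # Timer/Counter1 Overflow Interrupt Enable
OCIE1A = 1   # Output Compare A Match Interrupt Enable
OCIE1B = 2   # Output Compare B Match Interrupt Enable
ICIE1  = 5   # Input Capture Interrupt Enable

timsk1 = 0
timsk1 |= 1 << OCIE1A          # enable the Compare Match A interrupt
assert timsk1 == 0b00000010
timsk1 &= ~(1 << OCIE1A)       # mask it again
assert timsk1 == 0
```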

Arguing whether a situation leads to data hazard or not

I was going through the section on pipelining in the text Computer Organization [5e] by Hamacher et al. There I came across a situation which the authors claim causes a data hazard.
The situation is shown below:
For example, stage E in the four-stage pipeline of Figure 8.2b is responsible for arithmetic and logic operations, and one clock cycle is assigned for this task. Although this may be sufficient for most operations, some operations, such as divide, may require more time to complete. Figure 8.3 shows an example in which the operation specified in instruction I2 requires three cycles to complete, from cycle 4 through cycle 6. Thus, in cycles 5 and 6, the Write stage must be told to do nothing, because it has no data to work with. †: Meanwhile, the information in buffer B2 must remain intact until the Execute stage has completed its operation. This means that stage 2 and, in turn, stage 1 are blocked from accepting new instructions because the information in B1 cannot be overwritten. Thus, steps D4 and F5 must be postponed as shown.
... Any condition that causes the pipeline to stall is called a hazard. We have just seen an example of a data hazard. A data hazard is any condition in which either the source or the destination operands of an instruction are not available at the time expected in the pipeline.
In the example above, the authors assume that a data hazard has occurred, and two stall cycles are introduced into the pipeline. The main reason they give is that, since the Execute phase of instruction I2 requires 2 more cycles than usual, the data on which the Write stage should work is delayed by 2 cycles...
But I am having a little difficulty accepting this analysis. Usually, books give examples of data hazards in situations where there is a data dependency (the usual RAW, WAR, etc.). But here there is no such thing. And I thought this to be a structural hazard assuming that I2 cannot use the EX stage as I1 is using it.
Moreover, the text assumes that there is no queuing of the stages' results in the buffers, as is clear from the statement marked with †, Meanwhile, the information in the buffer... (where there is a little flaw as well: if there is no queuing, then the output of D3 in cycle 4 would overwrite the value in buffer B2 on which the EX stage is working, contradicting their own assumption).
I thought that the stalls are introduced due to this no queuing condition... and structural hazard, and if things are properly managed as shown below, no stalls shall be there.
This is what I assume:
I assume that the Execute stage has more than one separate functional unit (e.g. a basic ALU with 1-cycle latency, where the calculations of instruction 1 are performed; one unit for integer division; another for integer multiplication; etc.). [So structural hazard is out of the way now.]
I also assume that the pipeline buffers can store the results produced in the stages in a queue. [So that the problem in statement marked with † is no longer there.]
This being said, the situation is now as follows:
However hard I tried with these assumptions, I could not remove the bubbles shown in blue. [Even if queuing is assumed in the buffers, they cannot release results out of order, so those stalls remain.]
With this exercise of mine, I feel that the example shown in the text is indeed a hazard, and a data hazard at that (even though there were no data dependencies?), since in my exercise there was no chance of a structural hazard...
Am I correct?
And I thought this to be a structural hazard assuming that I2 cannot use the EX stage as I1 is using it.
Yup, that's the terminology I'd use, based on wikipedia: https://en.wikipedia.org/wiki/Hazard_(computer_architecture).
That article restricts data hazards to only RAW, WAR, and WAW. As such, they're only visible when you consider which operands are being used.
e.g. an independent multiply (result not read by the next few insns) could be allowed to complete out of order, after executing in a separate multi-cycle or pipelined multiplier unit.
Write-back conflicts would be a problem if the slow ALU instruction needed to write a GPR in the same cycle as a later add or something. Also data hazards like WAW, since mul r3, r2, r1 / sw r3, (r4) / add r3, r2, r1 should leave r3 = r2+r1 not r2*r1.
MIPS solved all that with the special hi:lo reg pair for mult/div, allowing the mul and div units to be loosely coupled to the 5-stage pipeline. And the ISA had pretty relaxed rules about what was allowed to happen to those registers, e.g. writing one with mthi r3 destroys the previous value of the other, so mflo r2 would give unpredictable results after mthi. Raymond Chen's article.
An "in-order pipeline" means instructions start execution in program order, not necessarily that they complete in program order. It's very common for modern in-order pipelines to allow memory operations to complete out of order, allowing memory-level parallelism and allowing instruction scheduling to hide the load-use latency of L1d cache hits. It's also possible to pipeline higher-latency ALU operations as long as hazards are detected and handled somehow.
Do these authors use the term "structural hazard" at all, or do they consider all (non-control?) hazards to be data hazards?
At this point it seems like primarily a terminology issue. IDK if they're on their own in using terminology this way, or if there is another convention with any popularity other than the one Wikipedia describes.
Separate from your main question: in clock cycles 4 and 5, you have two instructions in the E stage at the same time. If something stalls in the E stage, the stall bubbles need to come before the E stage in later instructions, as in the Fig 8.3 image you linked from the book.
And yeah, it's weird that they talk about the pipeline register between stages needing to stay constant. If a multi-cycle non-pipelined execution unit needs to keep values around, it could snapshot them.
Unless maybe the stall signal makes the Decode stage keep generating that output repeatedly until the stall signal is de-asserted and the pipeline register will finally latch the output of the previous stage instead of ignoring it. There are latches / flip-flops that have a control signal separate from the clock that makes them ignore their input and keep outputting what they were already outputting.
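To make the timing concrete, here's a simplified Python model of an in-order F-D-E-W pipeline with a variable-length Execute stage. It assumes a decoded result can sit in its buffer until Execute is free (so I3's Decode itself is not postponed, unlike in the book's figure), but the Execute and Write timings match Fig 8.3:

```python
def schedule(ex_cycles):
    """Cycle range in which each instruction occupies each stage of a
    simple in-order F-D-E-W pipeline, where E may take several cycles
    and later instructions cannot overtake earlier ones in any stage."""
    timing = []
    stage_free = {'F': 0, 'D': 0, 'E': 0, 'W': 0}  # last busy cycle per stage
    for ex in ex_cycles:
        t = {}
        prev_done = 0
        for stage, dur in (('F', 1), ('D', 1), ('E', ex), ('W', 1)):
            # A stage starts once this instruction finished the previous
            # stage AND the previous instruction has vacated this stage.
            start = max(prev_done + 1, stage_free[stage] + 1)
            t[stage] = (start, start + dur - 1)
            stage_free[stage] = start + dur - 1
            prev_done = start + dur - 1
        timing.append(t)
    return timing

# I1 takes 1 cycle in E; I2 takes 3 (like the divide in the book's example).
t = schedule([1, 3, 1])
assert t[1]['E'] == (4, 6)   # I2 executes in cycles 4-6
assert t[1]['W'] == (7, 7)   # so I2 writes back in cycle 7, not 5
assert t[2]['E'] == (7, 7)   # and I3's Execute is pushed to cycle 7
```

The two-cycle gap between I2's Decode (cycle 3) and Execute completing (cycle 6) is exactly the pair of bubbles the book draws.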

Zwift : Add resistance with FTMS control point

I'm trying to build a smart home trainer.
At the moment it is connected to Zwift through the Fitness Machine Service (FTMS).
I can send Power and Cadence to Zwift and I can play.
Now I'm trying to add the Control Point (one of the characteristics included in FTMS), but I cannot finish the transaction described in the specification.
I don't think it's very easy.
The XML file which describes the Control Point is empty!
There is no complete sequence diagram or flow chart.
At the moment I can receive a write event from Zwift on the Control Point.
First Zwift sends 0x7 and then 0x0.
After that... again a write of 7 to the Control Point and then 0.
I tried to answer (indicate) with 0x80, 0x801, 0x0180 for the 2 bytes needed (cf. specification).
I think I don't really understand the specification.
Do you have any information to help me?
Any flow chart or sequence diagram for the resistance-level update?
Can you confirm that I just need to indicate 2 bytes to answer a write from Zwift to the Control Point?
#Yonkee
I agree that it's a bit weird that the Control Point XML is empty, since the FTMS specification PDF doesn't give a complete list of e.g. what to expect as parameters to the various commands.
However, I've managed to create an implementation that supports some of the use cases you might be interested in. It currently supports setting target resistance and power, but not the simulation parameters you will need for the regular simulation mode in Zwift. You can find this implementation on github. It's not complete, and I just handle a few commands there, but you can grasp the concept from the code.
I wrote this using the information I could find online, basically the Fitness Machine Specification and the different GATT Characteristics Specifications.
When an app like Zwift writes to the CP, it sends an Op Code optionally followed by parameters. You should reply with Op Code 0x80 (Response Code), followed by the Op Code that the reply is for, followed by a result code and optionally further parameters.
In the case of OP Code 0x00 (Request Control) you should therefore reply with: 0x80, 0x00, 0x01. The last 0x01 is the result code for "Success".
In the case of OP Code 0x07 (Start/Resume), you should reply with: 0x80, 0x07, 0x01. Assuming you consider the request successful, the other possible responses are detailed in Table 4.24 of the FTMS Specification PDF.
One command you should look into is Op Code 0x11. When running Zwift in "non-workout mode", i.e. the normal mode, Zwift will use Op Code 0x11 followed by a set of simulation parameters: Wind Speed, Grade, Rolling Resistance Coefficient and Wind Resistance Coefficient. See section "4.16.2.18 Set Indoor Bike Simulation Parameters Procedure" of the PDF for details on the format of those parameters.
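Putting the reply format and the 0x11 parameters together, here is a hedged Python sketch; the field sizes and resolutions are my reading of section 4.16.2.18 of the FTMS spec, so double-check them against your copy:

```python
import struct

SUCCESS = 0x01

def ftms_response(op_code, result=SUCCESS):
    """Build the indication to send back after a Control Point write:
    0x80 (Response Code), the request's Op Code, then a result code."""
    return bytes([0x80, op_code, result])

assert ftms_response(0x00) == b'\x80\x00\x01'   # Request Control -> success
assert ftms_response(0x07) == b'\x80\x07\x01'   # Start/Resume -> success

def parse_sim_params(payload):
    """Decode the parameters of Op Code 0x11 (Set Indoor Bike Simulation
    Parameters): two sint16 then two uint8 fields, little-endian."""
    wind, grade, crr, cw = struct.unpack('<hhBB', payload)
    return {
        'wind_speed_mps': wind * 0.001,    # resolution 0.001 m/s
        'grade_percent':  grade * 0.01,    # resolution 0.01 %
        'crr':            crr * 0.0001,    # resolution 0.0001
        'cw_kg_per_m':    cw * 0.01,       # resolution 0.01 kg/m
    }

# Example: no wind, 1.5 % grade, typical road coefficients
params = parse_sim_params(struct.pack('<hhBB', 0, 150, 40, 51))
assert abs(params['grade_percent'] - 1.5) < 1e-9
```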
I hope this enables you to proceed. Happy hacking!

How can the processor discern a far return from a near return?

Reading Intel's big manual, I see that if you want to return from a far call, that is, a call to a procedure in another code segment, you simply issue a return instruction (possibly with an immediate argument that moves the stack pointer up n bytes after popping the return address).
This, apparently, if I'm interpreting things correctly, is enough for the hardware to pop both the segment selector and offset into the correct registers.
But, how does the system know that the return should be a far return and that both an offset AND a selector need to be popped?
If the hardware just pops the offset pointer and not the selector after it, then you'll be pointing to the right offset but wrong segment.
There is nothing special about the far return command compared to the near return version.
They both look identical as far as I can tell.
I assume then that the processor, perhaps at the micro-architecture level, keeps track of which calls are far and which are near, so that when they are returned from, the system knows how many bytes to pop and where to pop them (pointer registers and segment selector registers).
Is my assumption correct?
What do you guys know about this mechanism?
The processor doesn't track whether or not a call should be far or near; the compiler decides how to encode the function call and return using either far or near opcodes.
As it is, FAR calls have no use on modern processors because you don't need to change any segment register values; that's the point of a flat memory model. Segment registers still exist, but the OS sets them up with base=0 and limit=0xffffffff so just a plain 32-bit pointer can access all memory. Everything is NEAR, if you need to put a name on it.
Normally you just don't even think about segmentation, so you don't use the NEAR/FAR terminology either. But the manual still describes the call/ret opcodes we use for normal code as the NEAR versions.
FAR and NEAR were used on old x86 processors, which used a segmented memory model. Programs at that time needed to choose which memory model they wished to use, ranging from "tiny" to "large". If a program was small enough to fit in a single segment, it could be compiled using NEAR calls and returns exclusively. If it was "large", the opposite was true. For anything in between, you could choose whether individual functions needed to be callable (and hence returnable) from code in another segment.
Most modern programs (besides bootloaders and the like) run on a different construct: they expect a flat memory model. Behind the scenes the OS will swap out memory as needed (with paging not segmentation), but as far as the program is concerned, it has its virtual address space all to itself.
But, to answer your question, the difference between the call/return flavours is the opcode used; the processor obeys the command given to it. If you make a mistake (say, execute a FAR return when only a near return address is on the stack), it'll fail.
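For concreteness, near and far returns really are distinct instructions with distinct opcode bytes (from the Intel manual's opcode map); a toy decoder makes the point:

```python
# The opcode byte alone tells the CPU how much to pop on RET.
RET_OPCODES = {
    0xC3: ('ret',        'near: pop EIP only'),
    0xC2: ('ret imm16',  'near: pop EIP, then add imm16 to ESP'),
    0xCB: ('retf',       'far: pop EIP, then pop CS'),
    0xCA: ('retf imm16', 'far: pop EIP and CS, then add imm16 to ESP'),
}

def ret_pops_selector(opcode):
    """Does this return opcode pop a segment selector as well?"""
    return opcode in (0xCB, 0xCA)

assert not ret_pops_selector(0xC3)   # near ret
assert ret_pops_selector(0xCB)       # far ret
```

So nothing is "tracked" at run time: the assembler (or compiler) must emit the return flavour matching the call flavour used to reach the function.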

Can someone please explain to me (in very simple terms) what the difference between the esp, ebp, and esi register is?

I've got to learn assembly and I'm very confused as to what the different registers do/point to.
On some architectures, like MIPS, all registers are created equal, and there is really no difference beyond the name of the register (and software conventions). On x86 you can mostly use any registers for general-purpose computing, but some registers are implicitly bound to the instruction set.
Lots of information about special purposes for registers can be found here.
Examples:
eax, accumulator: many arithmetic instructions implicitly operate on eax. There are also special shorter EAX-specific encodings for many instructions: add eax, 123456 is 1 byte shorter than add ecx, 123456, for example. (add eax, imm32 vs. add r/m32, imm32)
ebx, base: few implicit uses, but xlat is one that matches the "Base" naming. Still relevant: cmpxchg8b. Because it's rarely required for anything specific, some 32-bit calling-conventions / ABIs use it as a pointer to the "global offset table" in Position Independent Code (PIC).
edx, data: some arithmetic operations implicitly operate on the 64-bit value in edx:eax
ecx, counter: used for shift counts, and for rep movs. Also, the mostly-obsolete loop instruction implicitly decrements ecx
esi, source index: some string operations read a string from the memory pointed to by esi
edi, destination index: some string operations write a string to the memory pointed to by edi. e.g. rep movsb copies ECX bytes from [esi] to [edi].
ebp, base pointer: normally used to point to local variables. Used implicitly by leave.
esp, stack pointer: points to the top of the stack, used implicitly by push, pop, call and ret
The x86 instruction set is a complex beast, really. Many instructions have shorter forms that implicitly use one register or another. Some registers can be used to do certain addressing while others cannot.
The Intel 80386 Programmer's Reference Manual is an irreplaceable resource; it basically tells you everything there is to know about x86 assembly, except for newer extensions and performance on modern hardware.
The PC Assembly (e)book is a great resource for learning assembly.
The sp register is the stack pointer, used for stack operation like push and pop.
The stack is known as a LIFO structure (last-in, first-out), meaning that the last thing pushed on is the first thing popped off. It's used, among other things, to implement the ability to call functions.
The bp register is the base pointer, and is commonly used for stack frame operations.
This means that it's a fixed reference to locate local variables, passed parameters and so forth on the stack, for a given level (while sp may change during the execution of a function, bp usually does not).
If you're looking at assembly language like:
mov eax, [bp+8]
you're seeing the code access a stack-level-specific variable.
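To see why an offset from the base pointer reaches a parameter, here's a toy Python model of a stack frame right after a typical prologue (`push ebp` / `mov ebp, esp` / `sub esp, N`). It uses the 32-bit convention with 4-byte slots; in 16-bit code the slots would be 2 bytes, so the offsets differ:

```python
def build_frame(args, return_addr, saved_ebp, locals_):
    """Model the stack (addresses grow downward) after a typical prologue.

    Returns (memory, ebp) where memory maps address -> 32-bit value."""
    mem, addr = {}, 0x1000               # arbitrary starting ESP
    for arg in reversed(args):           # caller pushes args right-to-left
        addr -= 4; mem[addr] = arg
    addr -= 4; mem[addr] = return_addr   # pushed by CALL
    addr -= 4; mem[addr] = saved_ebp     # push ebp
    ebp = addr                           # mov ebp, esp
    for loc in locals_:                  # sub esp, N; locals live below ebp
        addr -= 4; mem[addr] = loc
    return mem, ebp

mem, ebp = build_frame(args=[111, 222], return_addr=0x401000,
                       saved_ebp=0x2000, locals_=[0])
assert mem[ebp + 8]  == 111        # first argument: [ebp+8]
assert mem[ebp + 12] == 222        # second argument: [ebp+12]
assert mem[ebp + 4]  == 0x401000   # return address
assert mem[ebp - 4]  == 0          # first local variable
```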
The si register is the source index, typically used for mass copy operations (di is its equivalent destination index). Intel had these registers along with specific instructions for quick movement of bytes in memory.
The e- variants are just the 32-bit versions of these (originally) 16-bit registers. And, as if that weren't enough, we have 64-bit r- variants as well :-)
Perhaps the simplest place to start is here. It's specific to the 8086 but the concepts haven't changed that much. The simplicity of the 8086 compared to the current crop will be a good starting point for your education. Once you've learned the basics, it will be much easier to move up to the later members of the x86 family.
Transcribed here and edited quite a bit, to make the answer self-contained.
GENERAL PURPOSE REGISTERS
8086 CPU has 8 general purpose registers, each register has its own name:
AX - the accumulator register (divided into AH/AL). Probably the most commonly used register for general purpose stuff.
BX - the base address register (divided into BH/BL).
CX - the count register (divided into CH/CL). Special-purpose instructions for looping and shifting.
DX - the data register (divided into DH/DL). Used with AX for some MUL and DIV operations, and for specifying ports in some IN and OUT operations.
SI - source index register. Special purpose instruction to use this as a source of mass memory transfers (DS:SI).
DI - destination index register. Special purpose instruction to use this as a destination of mass memory transfers (ES:DI).
BP - base pointer, primarily used for accessing parameters and variables on the stack.
SP - stack pointer, used for the basic stack operations.
SEGMENT REGISTERS
CS - points at the segment containing the current instruction.
DS - generally points at segment where variables are defined.
ES - extra segment register, it's up to a coder to define its usage.
SS - points at the segment containing the stack.
Although it is possible to store any data in the segment registers, this is never a good idea. The segment registers have a very special purpose - pointing at accessible blocks of memory.
Segment registers work together with general purpose registers to access any memory value. For example, if we want to access memory at the physical address 12345h, we could set DS = 1230h and SI = 0045h. This way we can access much more memory than with a single register, which is limited to 16-bit values.
The CPU makes a calculation of the physical address by multiplying the segment register by 10h and adding the general purpose register to it (1230h * 10h + 45h = 12345h):

  12300
+  0045
  =====
  12345
The address formed with 2 registers is called an effective address.
This usage is for real mode only (which is the only mode the 8086 had). Later processors changed these registers from segments to selectors and they are used to lookup addresses in a table, rather than having a fixed calculation performed on them.
By default BX, SI and DI registers work with DS segment register; and BP and SP work with SS segment register.
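The segment arithmetic described above, as a quick Python sketch:

```python
def physical_address(segment, offset):
    """Real-mode 8086 address: segment * 16 + offset (both 16-bit)."""
    return (segment << 4) + (offset & 0xFFFF)

assert physical_address(0x1230, 0x0045) == 0x12345   # the example above
# Many segment:offset pairs alias the same byte:
assert physical_address(0x1234, 0x0005) == 0x12345
```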
SPECIAL PURPOSE REGISTERS
IP - the instruction pointer:
Always points to the next instruction to be executed.
Offset address relative to CS.
IP register always works together with CS segment register and it points to currently executing instruction.
FLAGS REGISTER
Determines the current state of the processor. These flags are modified automatically by the CPU after mathematical operations; this allows you to determine the type of the result, and to determine conditions for transferring control to other parts of the program.
Generally you cannot access these flag bits directly.
Carry Flag CF - this flag is set to 1 when there is an unsigned overflow. For example when you add bytes 255 + 1 (result is not in range 0...255). When there is no overflow this flag is set to 0.
Parity Flag PF - this flag is set to 1 when the result has an even number of one bits, and to 0 when it has an odd number of one bits.
Auxiliary Flag AF - set to 1 when there is a carry out of the low nibble (4 bits); used for BCD arithmetic.
Zero Flag ZF - set to 1 when result is zero. For non-zero result this flag is set to 0.
Sign Flag SF - set to 1 when result is negative. When result is positive it is set to 0. (This flag takes the value of the most significant bit.)
Trap Flag TF - Used for on-chip debugging.
Interrupt enable Flag IF - when this flag is set to 1 CPU reacts to interrupts from external devices.
Direction Flag DF - this flag is used by the string instructions: when it is set to 0 processing is done forward, when it is set to 1 processing is done backward.
Overflow Flag OF - set to 1 when there is a signed overflow. For example, when you add bytes 100 + 50 (result is not in range -128...127).
Here's a simplified summary:
ESP is the current stack pointer, so you generally only update it to manipulate the stack; EBP is intended for stack manipulation too, for example saving the value of ESP before allocating stack space for local variables. But you can use EBP as a general-purpose register too.
ESI is the Extended Source Index register, "string" (different from C-string, and I don't mean the type of C-string women wear either) instructions like MOVS use ESI and EDI.
Memory Addressing:
x86 CPUs have these special registers called "segment registers", each of them can point to different address, for example DS (commonly called data segment) may point to 0x1000000, and SS (commonly called stack segment) may point to 0x2000000.
When you use EBP and ESP, the default segment register used is SS, for ESI (and other general purpose registers), it's DS. For example, let's say DS=0x1000000, SS=0x2000000, EBP=0x10, ESI=0x10, so:
mov eax,[esp] //loading from address 0x2000000 + 0x10
mov eax,[esi] //loading from address 0x1000000 + 0x10
You can also specify a segment register to use, overriding the default:
mov eax,ds:[ebp]
In terms of addition, subtraction, logical operations, etc, there's no real difference between them.
