Synthesizable asynchronous FIFO design targeting an FPGA

I need some advice on how to design an asynchronous FIFO. I understand the metastability issue when capturing data into a different clock domain; my question is how a two flip-flop shift register helps synchronize the write pointer and read pointer values for the full and empty flag calculation.
When a register captures data from a different clock domain there is a possibility it can enter a metastable state and settle to an unknown value, so how do you effectively resolve this issue?
Thanks

Your read and write pointers need to use Gray encoding when transferred from one clock domain to the other. As you should know, only one bit of a Gray counter differs between two consecutive values. Thus, metastability can affect only that one changing bit. After re-synchronization, the transferred pointer will be either the updated pointer or its previous value.
In either case this is not a problem, and it only leads to pessimistic flags/counts for your FIFO.
I use regular binary counters for my read/write pointers, and use the following functions to convert them to Gray code. They are in VHDL, but you should get the idea:
function bin_to_gray(a : unsigned) return unsigned is
begin
    -- each Gray bit is the XOR of the binary value with itself shifted right by one
    return a xor ('0' & a(a'left downto 1));
end function bin_to_gray;

function gray_to_bin(a : unsigned) return unsigned is
    variable ret : unsigned(a'range);
begin
    -- the binary MSB equals the Gray MSB; each lower binary bit is the XOR of
    -- the binary bit above it with the Gray bit at that position
    ret(a'left) := a(a'left);
    for i in a'left - 1 downto 0 loop
        ret(i) := ret(i + 1) xor a(i);
    end loop;
    return ret;
end function gray_to_bin;
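To show how these plug in (the signal names wr_ptr_bin, wr_ptr_gray and the *_sync registers below are invented for this sketch): register the write pointer in Gray code in the write-clock domain, pass it through a 2-stage synchronizer clocked by the read clock, and only then convert it back to binary for the empty comparison.

-- write-clock domain: source register holding the Gray-coded write pointer
process (wr_clk) is
begin
    if rising_edge(wr_clk) then
        wr_ptr_gray <= bin_to_gray(wr_ptr_bin);
    end if;
end process;

-- read-clock domain: 2-stage synchronizer, then back to binary
process (rd_clk) is
begin
    if rising_edge(rd_clk) then
        wr_ptr_gray_sync1 <= wr_ptr_gray;
        wr_ptr_gray_sync2 <= wr_ptr_gray_sync1;
    end if;
end process;

wr_ptr_bin_rd <= gray_to_bin(wr_ptr_gray_sync2);  -- used for the empty flag

The read pointer crosses the other way (into the write-clock domain) in exactly the same fashion for the full flag.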

Jonathan explained it well.
I would just like to add a few points:
First, in addition to your 2-stage synchronizer registers you must also have a source register.
You can never feed signals from combinational logic into your 2-stage synchronizer, since combinational logic produces glitches.
You must also be aware that Verilog and VHDL have no built-in support for clock domain crossings and metastability.
Even if you create a proper 2-stage synchronizer to transfer the Gray-coded pointers, there is no guarantee that the synthesis tool won't change your synchronizer in a way which makes it ineffective in protecting against metastability. Some synthesis tools try to detect synchronizers and leave them alone. Some don't. In either case, you should not rely on it.
For a completely proper clock domain crossing, you must constrain the synchronizer and the source register using vendor-specific attributes and SDC timing constraints.
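For example (a sketch only, and vendor-specific: ASYNC_REG is a Xilinx Vivado attribute, the constraint lines are Vivado XDC, the wr_ptr_gray*_reg cell names are placeholders for however your tool names the registers, and the 5.0 ns budget is arbitrary; Quartus has its own synthesis attributes and SDC equivalents):

-- VHDL, receiving clock domain: mark the synchronizer flip-flops
attribute ASYNC_REG : string;
attribute ASYNC_REG of wr_ptr_gray_sync1, wr_ptr_gray_sync2 : signal is "TRUE";

# XDC: skip normal setup/hold analysis across the crossing, but still bound
# the data-path delay so the Gray pointer's bits arrive close together
set_max_delay -datapath_only \
    -from [get_cells {wr_ptr_gray_reg[*]}] \
    -to   [get_cells {wr_ptr_gray_sync1_reg[*]}] 5.0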

Related

How can the processor discern a far return from a near return?

Reading Intel's big manual, I see that if you want to return from a far call, that is, a call to a procedure in another code segment, you simply issue a return instruction (possibly with an immediate argument that moves the stack pointer up n bytes after the pointers are popped).
This, apparently, if I'm interpreting things correctly, is enough for the hardware to pop both the segment selector and offset into the correct registers.
But, how does the system know that the return should be a far return and that both an offset AND a selector need to be popped?
If the hardware just pops the offset pointer and not the selector after it, then you'll be pointing to the right offset but wrong segment.
There is nothing special about the far return command compared to the near return version.
They both look identical as far as I can tell.
I assume then that the processor, perhaps at the micro-architecture level, keeps track of which calls are far and which are near so that when they're returned from, the system knows how many bytes to pop and where to pop them (pointer registers and segment selector registers).
Is my assumption correct?
What do you guys know about this mechanism?
The processor doesn't track whether or not a call should be far or near; the compiler decides how to encode the function call and return using either far or near opcodes.
As it is, FAR calls have no use on modern processors because you don't need to change any segment register values; that's the point of a flat memory model. Segment registers still exist, but the OS sets them up with base=0 and limit=0xffffffff so just a plain 32-bit pointer can access all memory. Everything is NEAR, if you need to put a name on it.
Normally you just don't think about segmentation at all, so you don't bother giving it a name either. But the manual still describes the call/ret opcodes we use for normal code as the NEAR versions.
FAR and NEAR were used on old x86 processors, which used a segmented memory model. Programs at that time needed to choose which memory model they wished to use, ranging from "tiny" to "large". If your program was small enough to fit in a single segment, it could be compiled using NEAR calls and returns exclusively. If it was "large", the opposite was true. For anything in between, you could choose whether individual functions needed to be callable/returnable from code in another segment.
Most modern programs (besides bootloaders and the like) run on a different construct: they expect a flat memory model. Behind the scenes the OS will swap out memory as needed (with paging not segmentation), but as far as the program is concerned, it has its virtual address space all to itself.
But, to answer your question, the difference between the call/return variants is the opcode used; the processor simply obeys the instruction given to it. If you get it wrong (say, use a FAR return opcode when only a near return address was pushed), it'll fail.

PIC24F - Set LATx specific pins without affecting the other pins

Is there a way to set specific port pins without affecting other pins on the same port?
For example:
I use LATB[13:6] for a 7-segment LCD; the rest of the LATB bits are used for other purposes.
Now I need to set LATB = 0x003F to display '0', but if I do this the rest of the bits are changed as well.
Can someone help me?
You'll have to split the operation, since you can't address bits 6 to 13 of a 16-bit register on their own. For instance, assuming LATB is a 16-bit register on which bits 13:6 (a range of 8 bits) map to a 7-segment display with decimal point (making 8 segments), and we want to set those pins in particular to 0x3f = 0b00111111, we can do:
LATB = (LATB & ~(0xff<<6)) | (0x3f<<6);
0xff is a bit mask covering the 8 bits we want to affect, which we shift into positions 13:6 using <<6.
However, this is not atomic; we are reading, masking out the bits we want to adjust, setting them to new values, and writing back the entire register including the preserved other bits. Thus we may need, for instance, to disable interrupts around such a line.
Many MCUs have dedicated support for modifying single bits, or set/clear registers. Those might mean you can perform the adjustment without risking trampling a concurrent change, if you stick to simpler operations such as:
val = 0x3f;
LATB |= (val<<6); // set bits which should be set
LATB &= (val<<6) | ~(0xff<<6); // clear bits that should be clear
In this example, we're not doing the display update in one step, but each update we are making is left in a form the compiler might be able to optimize to a single instruction (IOR and AND, respectively).
Some processors also have instructions to access sections of a word like this, frequently named bitfield operations. I don't think PIC24 is among those. It does have single-bit access instructions, but they seem to either operate on the working file or require fixed bit positions, which means setting bit by bit would have to be unrolled.
C also has a concept of bit fields, which means it is possible to define a struct interpretation of the latch register that gives a name to the bits you want to affect, but it's a fairly fragile method. You're writing architecture-specific code anyway when relying on the particular register names, so it is likely best to inspect the documentation for your compiler and platform libraries.
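For illustration, a sketch of that approach (the names are hypothetical; bit-field ordering is implementation-defined, which is part of why it is fragile, and Microchip's own device headers already provide LATBbits with per-pin names):

#include <stdint.h>

/* Sketch of an overlay for a 16-bit latch register where bits 13:6
   drive the 7-segment display (assumes the compiler allocates
   bit-fields starting at the least significant bit).              */
typedef union {
    uint16_t raw;
    struct {
        uint16_t other_low  : 6;   /* bits 5:0, left alone         */
        uint16_t segments   : 8;   /* bits 13:6, the display field */
        uint16_t other_high : 2;   /* bits 15:14, left alone       */
    } field;
} latb_overlay_t;

/* Usage, assuming LATB is the memory-mapped latch register:
       #define LATB_OVERLAY (*(volatile latb_overlay_t *)&LATB)
       LATB_OVERLAY.field.segments = 0x3F;   // show '0'
   This still compiles to a read-modify-write, so the same
   atomicity caveat as above applies.                              */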

Ada pragma Pack or Alignment attribute for Records?

Having just discovered alignment issues for the first time, I am unsure which method is the best/safest way to deal with them. I have a record which I am serialising to send over a Stream (and deserialising at the other end), so it must meet the interface spec and contain no padding.
Given the example record:
type MyRecord is record
   a : Unsigned_8;
   b : Unsigned_32;
end record;
By default this would require 8 bytes, but I am able to remove the padding using two methods:
for MyRecord'Alignment use 1;
or
pragma Pack (MyRecord);
I have found a few questions relating to C examples, but haven't been able to find a clear answer on which method is the most appropriate, how to determine which method to use, or whether they are equivalent.
UPDATE
When I tried both on my 'real' code rather than a basic example, I found that the Alignment attribute achieved what I was looking for. pragma Pack reduced the size significantly; not confirmed, but I assume it packed the many enumerated types I'm using, overriding the 'Size use 8 attribute applied to each type.
For Streams you could leave MyRecord without any representation clauses and use the default MyRecord’Write and MyRecord’Read; ARM 13.13.2(9) says
For elementary types, Read reads (and Write writes) the number of stream elements implied by the Stream_Size for the type T; the representation of those stream elements is implementation defined. For composite types, the Write or Read attribute for each component is called in canonical order, which is last dimension varying fastest for an array (unless the convention of the array is Fortran, in which case it is first dimension varying fastest), and positional aggregate order for a record.
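In practice that just means calling the attributes directly; a minimal sketch (Channel stands in for whatever stream access your socket layer provides, e.g. from GNAT.Sockets.Stream):

with Interfaces;  use Interfaces;
with Ada.Streams; use Ada.Streams;

procedure Send_Record
  (Channel : not null access Root_Stream_Type'Class)
is
   type MyRecord is record
      a : Unsigned_8;
      b : Unsigned_32;
   end record;

   R : constant MyRecord := (a => 1, b => 16#DEADBEEF#);
begin
   MyRecord'Write (Channel, R);  --  writes a then b, in declaration order
end Send_Record;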
One possible disadvantage of the GNAT implementation (and maybe of others) is that the ’Write and ’Read calls each end in a call to the underlying network software. Not a problem (aside from possible inefficiency) normally, but if you’re using TCP_NODELAY (or worse, UDP) this is not the behaviour you’re looking for.
Overloading ’Write leads back to your original problem (but at least it’s confined to the overloading procedure, so the rest of your program can deal with properly aligned data).
I’ve used an in-memory stream for this (especially the UDP case); ’Write to the in-memory stream, then send the Stream_Element_Array to the socket. One example is ColdFrame.Memory_Streams (.ads, .adb).
I think you want the record representation clauses, if you want full control:
for MyRecord'Size use 40;
for MyRecord use record
   a at 0 range 0 .. 7;
   b at 1 range 0 .. 31;
end record;
(or some such, I might have messed up some of the indices here).
NB: edited as per comment by Simon

VHDL Logic produces wrong result when using higher frequencies

I don't have much experience with VHDL, so excuse me if this is a stupid question, but I couldn't find an appropriate answer. My code, somewhat simplified, is:
process (sys_clk, reset)
begin
    if reset = '0' then
        -- resetting code
    elsif rising_edge(sys_clk) then
        if data_ready = '1' and old_data_ready = '0' then           -- rising edge of asynchronous signal
            -- update a few registers and assign values to a few signals
        elsif error_occurred = '1' and old_error_occurred = '0' then
            -- update a few registers and assign values to a few signals (same registers and signals as above)
        end if;
        old_data_ready     <= data_ready;
        old_error_occurred <= error_occurred;
    end if;
end process;
The signal stays high for much longer than the period of sys_clk, but it's not known for how long; it varies.
These ifs result in two registers (one for each signal) plus an AND gate. I believe you know that.
This worked, but very badly; there were errors too often. So I made a special project using two processes, one clocked on the rising edge of data_ready and one on error_occurred, but I could only use them to increment and decrement two separate counters. I used that to verify that the problem with my code is that sometimes this rising-edge detection does not work. sys_clk is 27 MHz, and I have made much bigger projects using that same frequency which worked well, but they had no rising-edge detection of asynchronous signals like this. So I reduced the frequency to 100 kHz, because I don't really need higher frequencies, and that solved my problem.
But just out of curiosity, what is the best way to detect the rising edge of an asynchronous signal when a few of these signals affect the same registers and the device needs to work at higher frequencies?
I use Altera Quartus II and Cyclone II FPGA.
If the signal you are sampling is truly asynchronous, you have to deal with the issue of metastability. If the data_ready signal is in a metastable state exactly on sys_clk's rising edge, old_data_ready and the first if-statement might see different versions of data_ready. Also, you have an asynchronous reset: if the reset signal is released exactly when data_ready is changing, data_ready may be sampled to different values throughout your system. A simulator will not reveal metastability problems, because the code is logically correct.
To circumvent these problems, have asynchronous reset between modules, but synchronous within.
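One common way to get that per clock domain (a sketch only; reset is the external asynchronous reset and rst_n_meta/rst_n_sync are names invented here) is to assert the internal reset asynchronously but release it synchronously:

process (clk, reset) is
begin
    if reset = '0' then                  -- assert asynchronously
        rst_n_meta <= '0';
        rst_n_sync <= '0';
    elsif rising_edge(clk) then          -- release synchronously
        rst_n_meta <= '1';
        rst_n_sync <= rst_n_meta;
    end if;
end process;
-- use rst_n_sync as the reset for the logic inside this clock domain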
Also, synchronize any signal coming from a different clock domain. A synchronizer is a couple of flip-flops placed closely together. When the signal passes through the FFs, any metastability issues will be resolved before it reaches your logic. There is a formula for calculating the mean time between failures (MTBF) due to metastability in FPGAs. I won't recite it, but what it basically says is that this simple method increases the MTBF from seconds to billions of years.
VHDL synchronizer:
process (clk) is
begin
    if rising_edge(clk) then
        if rst = '0' then
            data_ready_s1 <= '0';
            data_ready_s2 <= '0';
        else
            data_ready_s1 <= data_ready;
            data_ready_s2 <= data_ready_s1;
        end if;
    end if;
end process;
Use data_ready_s2 in your module.
Then you constrain the path between the flipflops in the UCF file:
TIMEGRP "FF_s1" = FFS("*_s1") FFS("*_s1<*>");
TIMEGRP "FF_s2" = FFS("*_s2") FFS("*_s2<*>");
TIMESPEC TS_SYNC = FROM "FF_s1" TO "FF_s2" 2 ns DATAPATHONLY;
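And to answer the edge-detection question directly: once the signal has been through the synchronizer, detect its edge with one more register, entirely inside the clk domain. A minimal sketch (data_ready_s3 and data_ready_rise are new names, not from the code above):

process (clk) is
begin
    if rising_edge(clk) then
        data_ready_s3   <= data_ready_s2;                        -- extra delay stage
        data_ready_rise <= data_ready_s2 and not data_ready_s3;  -- one-cycle pulse
    end if;
end process;

data_ready_rise is high for exactly one clk cycle per rising edge of data_ready, so several such pulses (one per asynchronous input) can safely steer the same registers inside a single clocked process, at whatever clock frequency the design otherwise meets timing at.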

VHDL STD_LOGIC_VECTOR Wildcard Values

I've been trying to write a Finite State Machine in VHDL code for a simple 16-bit processor I'm implementing on an Altera DE1 board. In the Finite State Machine, I have a CASE statement that handles the different 16-bit instructions, which are brought into the FSM by a 16-bit STD_LOGIC_VECTOR. However, I'm having a little trouble in the decode state where the Finite State Machine decodes the instruction. One of the instructions is an ADD which takes two registers as operands and a third as the destination register. However, I also have an ADD instruction which takes a register and a 5-bit immediate value as operands and a second register for the destination. My problem is that in the CASE statement, I need to be able to differentiate between the two different ADD instructions. So, I thought that if I use wildcard values like "-" or "X" in the CASE statement, I would be able to differentiate between the two with just two cases instead of listing all of the possible register/immediate value combinations. For example:
CASE IR IS --(IR stands for "Instruction Register")
    WHEN "0001------0-----" => (Go to 3-register add);
    WHEN "0001------1-----" => (Go to 2-register/immediate value add);
    WHEN OTHERS => (Do whatever);
END CASE;
These aren't the only two instructions I have, I just put these two to make this post a little shorter. When I compile and run this code, the processor stops executing when it gets to the "decode" state. Also, Quartus gives many, many warnings saying things like "VHDL choice warning at LC3FSM.vhd(37): ignored choice containing meta-value ""0001------0-----"""
I am at a loss as to how to go about accomplishing this. I REALLY do not want to, and probably don't need to, define every single 16-bit combination, and I hope there's a way to use wildcards in a STD_LOGIC_VECTOR to minimize the number of combinations I will have to define.
Does anybody know how to accomplish this?
Thanks
That can't be done with a plain case statement, unfortunately. Rather unexpectedly for most users, the comparison operator = and the case comparison perform a literal comparison. This is because the std_logic type is just a set of characters, which happen to behave like logic values due to the way other functions (e.g. and and or) are defined.
VHDL-2008 introduces a new case statement, case?, which behaves as you expect - you'll need to tell your compiler to operate in VHDL-2008 mode. In addition, there is a ?= operator in VHDL-2008 which compares two values while treating '-' as a don't-care.
If you are lumbered with a compiler which still doesn't support VHDL-2008, complain to the supplier. There is also a std_match function which allows you to perform such comparisons in older VHDL revisions, but nothing that I am aware of to make the case statement itself work that way.
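For illustration, a sketch of both options applied to the example above (case? needs the compiler set to VHDL-2008 mode; std_match comes from ieee.numeric_std; the null statements stand in for your state transitions):

-- VHDL-2008 matching case statement: '-' is a don't-care
case? IR is
    when "0001------0-----" =>
        null;   -- go to 3-register add
    when "0001------1-----" =>
        null;   -- go to 2-register/immediate value add
    when others =>
        null;   -- do whatever
end case?;

-- pre-2008 alternative using std_match:
if std_match(IR, "0001------0-----") then
    null;       -- go to 3-register add
elsif std_match(IR, "0001------1-----") then
    null;       -- go to 2-register/immediate value add
end if;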
Assuming you don't need the other bits in the instruction you could hack your way around this by masking the other bits with a pre-check process. (Or just ensure the other bits are reset when you write the instruction?)
This really is a bit of a hack.
assuming IR is stored as a variable
if IR_in(15 downto 12) = "0001" then
    IR := IR_in(15 downto 12) & "000000" & IR_in(5) & "00000";
else
    IR := IR_in;
end if;
CASE IR IS --(IR stands for "Instruction Register")
    WHEN "0001000000000000" => (Go to 3-register add);
    WHEN "0001000000100000" => (Go to 2-register/immediate value add);
    WHEN OTHERS => (Do whatever);
END CASE;
Alternatively, assuming your instruction encoding is cleverly thought out (are the first four bits the opcode, or something along those lines?), you could use nested case statements and do the differentiation as needed in those sub-blocks.
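For example, a sketch of that nested decode (assuming, as the patterns above suggest, that IR(15 downto 12) is the opcode and IR(5) selects the immediate form):

case IR(15 downto 12) is
    when "0001" =>                      -- ADD family
        if IR(5) = '0' then
            null;                       -- go to 3-register add
        else
            null;                       -- go to 2-register/immediate value add
        end if;
    when others =>
        null;                           -- other instructions
end case;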
