VHDL Logic produces wrong result when using higher frequencies

I don't have much experience with VHDL, so excuse me if this is a stupid or boring question, but I couldn't find an appropriate answer. Here is a slightly simplified version of my code:
process (sys_clk, reset)
begin
  if reset = '0' then
    -- reset code
  elsif rising_edge(sys_clk) then
    if data_ready = '1' and old_data_ready = '0' then -- rising edge of asynchronous signal
      -- update a few registers and assign values to a few signals
    elsif error_occurred = '1' and old_error_occurred = '0' then
      -- update a few registers and assign values to a few signals (same registers and signals as above)
    end if;
    old_data_ready <= data_ready;
    old_error_occurred <= error_occurred;
  end if;
end process;
Each signal stays high much longer than one sys_clk period, but for an unknown length of time. It varies.
These IFs synthesize to two registers (one per signal) plus AND logic, as you probably know.
This worked, but very badly: errors occurred too often. So I made a separate test project with two processes, one triggered on the rising edge of data_ready and one on the rising edge of error_occurred, but I could only use them to increment and decrement two separate counters. I used this to verify that the problem with my code is that this rising-edge detection sometimes fails. sys_clk is 27 MHz, and I have made much bigger projects at that same frequency that worked well, but none of them detected rising edges of asynchronous signals this way. So I reduced the frequency to 100 kHz, because I don't really need a higher one, and that solved my problem.
Just out of curiosity, what is the best way to detect the rising edge of an asynchronous signal when several such signals affect the same registers and the device needs to run at higher frequencies?
I use Altera Quartus II and Cyclone II FPGA.

If the signal you are sampling is truly asynchronous, you have to deal with the issue of metastability. If the data_ready signal is in a metastable state exactly on sys_clk's rising edge, old_data_ready and the first if-statement might see different versions of data_ready. Also, you have an asynchronous reset. If the reset signal is released exactly when data_ready is changing, data_ready may be sampled as different values throughout your system. A simulator will not reveal metastability problems, because the code is logically correct.
To circumvent these problems, use asynchronous reset between modules, but synchronous reset within them.
Also, synchronize any signal coming from a different clock domain. A synchronizer is a couple of flip-flops placed close together. As the signal passes through the flip-flops, any metastability has almost certainly resolved before it reaches your logic. There is a formula for calculating the mean time between failures (MTBF) due to metastability in FPGAs. I won't recite it, but what it basically says is that this simple method increases the MTBF from seconds to billions of years.
VHDL synchronizer:
process(clk, rst) is
begin
  if(rising_edge(clk)) then
    if(rst = '0') then
      data_ready_s1 <= '0';
      data_ready_s2 <= '0';
    else
      data_ready_s1 <= data_ready;
      data_ready_s2 <= data_ready_s1;
    end if;
  end if;
end process;
Use data_ready_s2 in your module.
Then you constrain the path between the flip-flops. In a Xilinx UCF file (Quartus uses SDC constraints instead) this looks like:
TIMEGRP "FF_s1" = FFS("*_s1") FFS("*_s1<*>");
TIMEGRP "FF_s2" = FFS("*_s2") FFS("*_s2<*>");
TIMESPEC TS_SYNC = FROM "FF_s1" TO "FF_s2" 2 ns DATAPATHONLY;
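To illustrate the timing, here is a cycle-level C model (a sketch, not synthesizable HDL) of the two-flip-flop synchronizer above feeding a rising-edge detector like the one in the question. The type and function names are hypothetical, and each call to sync_edge_clock() stands for one rising edge of clk. Real metastability cannot be modelled this way; the point is that the edge decision only looks at the second stage and its delayed copy, never at the raw input, so every register sees one consistent value.

```c
#include <stdbool.h>

/* Hypothetical model: 2-FF synchronizer followed by a rising-edge detector. */
typedef struct {
    bool s1;  /* first stage (the one that may go metastable in hardware) */
    bool s2;  /* second stage: the only copy the rest of the design reads */
    bool old; /* s2 delayed by one clock, for edge detection */
} sync_edge_t;

static void sync_edge_reset(sync_edge_t *st) {
    st->s1 = false;
    st->s2 = false;
    st->old = false;
}

/* One rising edge of clk. Returns true if the edge condition
   (s2 = '1' and old = '0') holds at this clock edge. */
static bool sync_edge_clock(sync_edge_t *st, bool async_in) {
    /* Condition uses the register contents before the edge, as in VHDL. */
    bool edge = st->s2 && !st->old;
    /* All registers update together on the clock edge. */
    st->old = st->s2;
    st->s2 = st->s1;
    st->s1 = async_in;
    return edge;
}
```

With the input held high, the edge is reported exactly once, on the third clock edge after it rises: two cycles of synchronizer latency plus one for the edge register.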

Related

Why does the first execution of an Ada procedure take longer than other executions?

I am trying to write a delay procedure for an FE310 microcontroller. I need to write this procedure because I use a zero-footprint runtime (ZFP) that doesn't provide the native Ada delays.
The procedure relies on a 64-bit hardware timer, which is incremented 32768 times per second. The procedure reads the timer, computes the end value by adding the number of ticks for the requested delay to the value read, and then reads the timer repeatedly until it reaches that end value.
I toggle a pin before and after the execution and check the delay with a logic analyzer. The delays are quite accurate except for the first execution where they are 400 us to 600 us longer than requested.
Here is my procedure:
procedure Delay_Ms (Ms : Positive)
is
   Start_Time : Machine_Time_Value;
   End_Time   : Machine_Time_Value;
begin
   Start_Time := Machine_Time;
   End_Time   := Start_Time + (Machine_Time_Value (Ms) * Machine_Time_Value (LF_Clock_Frequency)) / 1_000;
   loop
      exit when Machine_Time >= End_Time;
   end loop;
end Delay_Ms;
Machine_Time is a function reading the hardware timer.
Machine_Time_Value is a 64-bit unsigned integer.
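The asker mentions writing the same algorithm in C. A minimal sketch of what that might look like, assuming a 32768 Hz timer as stated above; read_timer stands in for Machine_Time, and fake_timer is a hypothetical host-side stand-in (one tick per read) so the logic can be exercised without hardware.

```c
#include <stdint.h>

#define LF_CLOCK_FREQUENCY 32768u /* timer ticks per second, per the question */

/* Ticks for `ms` milliseconds: multiply first, then divide, in 64-bit
   arithmetic, mirroring the Ada expression. */
static uint64_t ticks_for_ms(uint32_t ms) {
    return ((uint64_t)ms * LF_CLOCK_FREQUENCY) / 1000u;
}

/* Busy-wait until the free-running timer reaches start + ticks.
   `read_timer` is a hypothetical stand-in for Machine_Time. */
static void delay_ms(uint32_t ms, uint64_t (*read_timer)(void)) {
    uint64_t end = read_timer() + ticks_for_ms(ms);
    while (read_timer() < end) {
        /* spin */
    }
}

/* Host-only stand-in for the hardware timer: advances one tick per read. */
static uint64_t fake_now;
static uint64_t fake_timer(void) { return fake_now++; }
```

Because of the integer division, 1 ms becomes 32 ticks (about 0.977 ms), so short delays are slightly truncated; this is independent of the first-execution effect discussed here.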
I am sure the hardware aspect is correct because I wrote the same algorithm in C and it behaves exactly as expected.
I think GNAT is adding some code that is only executed the first time. I searched the web for mentions of similar behavior but didn't find anything relevant. I found some information about elaboration code and how it can be removed, but after some research I realized that elaboration code is executed before the main program and shouldn't be the cause of my problem.
Do you know why the first execution of a procedure like mine could take longer? Is it possible to avoid this kind of behavior?

As Simon Wright suggested, the first execution takes longer because the MCU fetches the code from the SPI flash on the first execution but from the instruction cache on subsequent executions.
By default, the FE310 SPI clock is the processor core clock divided by 8. When I set the SPI clock divider to 2 (a four times faster SPI clock), the difference in execution time is divided by 4.

Counting cycles on Cortex M0+

I have a Cortex M0+ (SAML21) board that I'm using for performance testing. I'd like to measure how many cycles a given piece of code takes. I tried using DWT (DWT_CONTROL), but it never produced a result; it returned 0 cycles regardless of what code ran.
// enable use of the DWT (TRCENA bit in DEMCR)
*DEMCR = *DEMCR | 0x01000000;
// Reset cycle counter
*DWT_CYCCNT = 0;
// enable cycle counter
*DWT_CONTROL = *DWT_CONTROL | 1 ;
// some code here
// .....
// number of cycles stored in count variable
count = *DWT_CYCCNT;
Is there a way to count cycles (perhaps with an interrupt and counter?) much like I can query for milliseconds (eg. millis() on Arduino)?

I cannot find any mention of the cycle counter register in the ARMv6-M Architecture Reference Manual.
So I'd say this is not possible with an internal cycle counter, as it is on the bigger siblings such as the M3, M4 and so on.
This is also stated in this knowledge base article:
This article was written for Cortex-M3 and Cortex-M4, but the same points apply to Cortex-M7, Cortex-M33 and Cortex-M55. Newer Cortex-M processors at the higher end of performance, such as Cortex-M55, may include an extended Performance Monitoring Unit that provides additional performance measuring capabilities, but these are outside the scope of this article. The smaller Cortex-M processors such as Cortex-M0, Cortex-M0+ and Cortex-M23 do not include the DWT capabilities described here, and, other than the Cortex-M23, do not include ETM instruction trace, but all Cortex-M processors provide the "tarmac" capability for the chip designers.
(Emphasis mine)
So other means have to be used:
- Some debuggers can measure the time between hitting two breakpoints (or between two stops); the accuracy is usually limited by the interaction with the OS, so the error can easily be on the order of 20 ms.
- Use an internal timer with a high enough clock frequency to give reasonable results, and start/stop it before and after the region of interest.
- Toggle a pin and measure the time with a logic analyzer/oscilloscope.

According to the CMSIS header file for the M0+ (core_cm0plus.h), the Core Debug Registers are only accessible over the Debug Access Port and not via the processor. I can only suggest using some free running timer (maybe SysTick) or perhaps your debugger can be of some help to get access to the required registers.
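A minimal sketch of the free-running-timer approach using SysTick, assuming the SAML21 implements SysTick (it is an implementation option on the Cortex-M0+) and that it is clocked from the processor clock. SysTick is a 24-bit down-counter, so the elapsed cycle count is the start reading minus the end reading, masked to 24 bits. The function names are hypothetical; the register addresses are the standard SysTick locations in the System Control Space.

```c
#include <stdint.h>

/* SysTick registers in the System Control Space. */
#define SYST_CSR (*(volatile uint32_t *)0xE000E010u) /* control and status */
#define SYST_RVR (*(volatile uint32_t *)0xE000E014u) /* reload value */
#define SYST_CVR (*(volatile uint32_t *)0xE000E018u) /* current value */

/* Run SysTick as a free-running 24-bit down-counter on the CPU clock. */
static inline void systick_start(void) {
    SYST_RVR = 0x00FFFFFFu; /* maximum reload value */
    SYST_CVR = 0u;          /* any write clears the current value */
    SYST_CSR = 0x5u;        /* ENABLE | CLKSOURCE = processor clock, no interrupt */
}

static inline uint32_t systick_now(void) {
    return SYST_CVR & 0x00FFFFFFu;
}

/* Cycles between two readings of the down-counter; valid only if fewer
   than 2^24 cycles elapsed. Wraparound is handled by the 24-bit mask. */
static inline uint32_t systick_elapsed(uint32_t start, uint32_t end) {
    return (start - end) & 0x00FFFFFFu;
}
```

Call systick_start() once, read systick_now() before and after the code of interest, and pass both readings to systick_elapsed(). The result also includes the few cycles spent reading SYST_CVR, and if something else (e.g. an RTOS) already uses SysTick with a smaller reload value, the 24-bit mask no longer applies.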

Not able to read the pin value from Arduino Mega using PINxn

Using the registers of an Arduino Mega 2560, I am trying to read the state of PORTA. I have referred to the datasheet (pages 69-72) and understood that I have to use PINxn (PINA) for this. But all I am getting as output is 0. I have connected the pin to an LED.
The code and the output are mentioned below.
CODE
#define F_CPU 16000000
#include <avr/io.h>
int main(void) {
DDRA = (1 << DDA0); // sets the pin OUTPUT
__asm__("nop\n\t");
PORTA = 0x01; // Sets it HIGH
unsigned int i = PINA;
Serial.println(i);
}
OUTPUT
0
Thanks in advance for your time – if I’ve missed out anything, over- or under-emphasised a specific point let me know in the comments.

If you want to read back the value previously written to the output, I recommend reading it from the register you wrote to, i.e. PORTA.
However, according to the documentation you referenced (bold mine):
13.2.4 Independent of the setting of Data Direction bit DDxn, the port pin can be read through the PINxn Register bit.
A likely explanation for reading back the old value immediately after writing a different one is the passage shortly afterwards in the same chapter:
PINxn Register bit and the preceding latch constitute a synchronizer. This is needed to avoid metastability if the physical pin changes value near the edge of the internal clock, but it also introduces a delay.
So you will have to account for that delay.
Have a look at the timing features provided, e.g., by available libraries, and at the available timer hardware.
But as a proof of concept, I propose the following demonstration:
- print the value of PINA before writing the inverted value
- write the inverted value to PORTA (inverting only the relevant bit, of course)
- read and print the value of PINA afterwards many times (say 1000), hoping that your header declares PINA as volatile
I expect that you will see several old values, but then the new value.
Depending on how the printing is done (busy waiting?), once might be sufficient.
Your NOP (__asm__("nop\n\t");) might be intended to provide the appropriate wait. But I think it is misplaced (it should come after writing the new value), and it might be too short. If it is taken from example code, it should be sufficient. Move it, and perhaps use two for the first try to be sure; that is likely to be effective.

You should put the "nop" between the "PORTA =" assignment and the "PINA" read. This is because writing to the PORTx register updates the state of the output pins only at the end of the system clock cycle, at the rising edge of the clock generator, whereas reading the PINx register returns a value latched in an intermediate buffer. The buffer latches in the middle of the previous clock cycle (i.e. at the falling edge of the clock generator).
So reading from PINx always lags the pin by 0.5 to 1.5 clock cycles:
If the logic level changes in some system clock cycle just before its middle (i.e. before the falling edge of the clock generator), the new value is latched immediately and can be read through the PINx register in the next system clock cycle. Thus, the delay is 0.5 cycles.
If the logic level changes just after that latching moment, it will be latched only in the next cycle and will be available for reading in the cycle after that, introducing a delay of 1.5 cycles.
Writing to the PORTx register updates the output value at the end of the clock cycle, so it is latched only in the next cycle and is available for reading only in the cycle after that.
The C compiler optimizes well, so the two consecutive lines with the PORTA assignment and the PINA read were compiled to just two consecutive instructions, out PORTA, rxx and in ryy, PINA, which causes the effect.
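The timing argument above can be checked with a toy C model (a sketch; the function names are hypothetical). It assumes, as the answer describes, that out, in and nop each take one clock cycle, that the pin changes at the end of the out cycle, and that in PINx returns the value latched in the middle of the previous cycle.

```c
#include <stdbool.h>

/* `out PORTx` in cycle write_cycle drives the new level from the start of
   cycle write_cycle + 1. The latch samples the pin mid-cycle, and
   `in PINx` in cycle read_cycle returns the sample from read_cycle - 1. */
static bool pin_read_sees_new_value(int write_cycle, int read_cycle) {
    int first_cycle_with_new_level = write_cycle + 1;
    int sampled_cycle = read_cycle - 1;
    return sampled_cycle >= first_cycle_with_new_level;
}

/* `nops` single-cycle nop instructions placed between `out` and `in`. */
static bool read_after_nops_sees_new_value(int nops) {
    int write_cycle = 0;
    int read_cycle = write_cycle + 1 + nops;
    return pin_read_sees_new_value(write_cycle, read_cycle);
}
```

Under these assumptions, an in immediately after out reads the old value, and a single nop between them is enough, which matches moving the question's nop to after the PORTA assignment.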

synthesizable asynchronous fifo design towards an FPGA

I need some advice on how to design an asynchronous FIFO. I understand the metastability issue when capturing data into a different clock domain; my question is how using a two-flip-flop shift register helps synchronize the write pointer and read pointer values for the full and empty flag calculation.
When a register captures data from a different clock domain, it may enter a metastable state and settle to an unknown value, so how do you effectively resolve this issue?
Thanks

Your read and write pointers need to use gray encoding when transferred from one clock domain to the other. As you should know, only 1 bit of a gray counter is different between two consecutive values. Thus, metastability can affect only the one changing bit. After re-synchronization, the transferred pointer will be either the updated pointer or its previous value.
In either case, this is not a problem and only leads to pessimistic flags/count for your FIFO.
I use regular counters for my read/write pointers and the following functions to convert them to Gray code. They are in VHDL, but you should get the idea:
function bin_to_gray(a: unsigned) return unsigned is
begin
  return a xor ('0' & a(a'left downto 1));
end function bin_to_gray;

function gray_to_bin(a: unsigned) return unsigned is
  variable ret : unsigned(a'range);
begin
  ret(a'left) := a(a'left);
  for i in a'left-1 downto 0 loop
    ret(i) := ret(i+1) xor a(i);
  end loop;
  return ret;
end function gray_to_bin;
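For comparison, here is the same conversion sketched in C, plus hypothetical full/empty checks using the common scheme of pointers with one extra wrap bit. That scheme is an assumption, not part of the answer above, as are ADDR_BITS and the function names. The checks show why a stale synchronized pointer is harmless: it can only make a flag pessimistic, never optimistic.

```c
#include <stdbool.h>
#include <stdint.h>

/* Binary to Gray: same idea as bin_to_gray above. */
static uint32_t bin_to_gray(uint32_t b) {
    return b ^ (b >> 1);
}

/* Gray to binary: each binary bit is the XOR of all Gray bits at or above it. */
static uint32_t gray_to_bin(uint32_t g) {
    uint32_t b = g;
    for (uint32_t s = g >> 1; s != 0; s >>= 1) {
        b ^= s;
    }
    return b;
}

/* Hypothetical FIFO with 2^ADDR_BITS entries; pointers have ADDR_BITS + 1
   bits so that full and empty can be told apart. */
#define ADDR_BITS 4u
#define PTR_MASK ((1u << (ADDR_BITS + 1u)) - 1u)

/* Read side: own Gray read pointer vs. synchronized Gray write pointer. */
static bool fifo_empty(uint32_t rptr_gray, uint32_t wptr_gray_sync) {
    return rptr_gray == wptr_gray_sync;
}

/* Write side: full when the pointers differ by exactly 2^ADDR_BITS, which in
   Gray code means the top two bits are inverted and the rest are equal. */
static bool fifo_full(uint32_t wptr_gray, uint32_t rptr_gray_sync) {
    uint32_t top_two = 3u << (ADDR_BITS - 1u);
    return wptr_gray == ((rptr_gray_sync ^ top_two) & PTR_MASK);
}
```

For example, with write pointer 18 and a true read pointer of 3 the FIFO holds 15 entries, but if the write side still sees the previous read pointer 2, fifo_full reports full one cycle early, which is safe.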

Jonathan explained it well.
I would just like to add a few points:
First, in addition to your 2-stage synchronizer registers you must also have a source register.
You can never feed signals from combinational logic into your 2-stage synchronizer, since combinational logic produce glitches.
You must also be aware that Verilog and VHDL has no built-in support for clock domain crossings and metastability.
Even if you create a proper 2-stage synchronizer to transfer the gray coded pointers, there is no guarantee that the synthesis tool does not change your synchronizers in a way which make it ineffective in protecting againsts metastability. Some synthesis tools try to detect synchronizers and leave them alone. Some don't. And in either case, you should not rely on it.
For a completely proper clock domain crossing, you must constrain the synchronizer and the source register using vendor-specific attributes and SDC timing constraints.

VHDL STD_LOGIC_VECTOR Wildcard Values

I've been trying to write a Finite State Machine in VHDL code for a simple 16-bit processor I'm implementing on an Altera DE1 board. In the Finite State Machine, I have a CASE statement that handles the different 16-bit instructions, which are brought into the FSM by a 16-bit STD_LOGIC_VECTOR. However, I'm having a little trouble in the decode state where the Finite State Machine decodes the instruction. One of the instructions is an ADD which takes two registers as operands and a third as the destination register. However, I also have an ADD instruction which takes a register and a 5-bit immediate value as operands and a second register for the destination. My problem is that in the CASE statement, I need to be able to differentiate between the two different ADD instructions. So, I thought that if I use wildcard values like "-" or "X" in the CASE statement, I would be able to differentiate between the two with just two cases instead of listing all of the possible register/immediate value combinations. For example:
CASE IR IS --(IR stands for "Instruction Register")
WHEN "0001------0-----" => (Go to 3-register add);
WHEN "0001------1-----" => (Go to 2-register/immediate value add);
WHEN OTHERS => (Do whatever);
END CASE;
These aren't the only two instructions I have, I just put these two to make this post a little shorter. When I compile and run this code, the processor stops executing when it gets to the "decode" state. Also, Quartus gives many, many warnings saying things like "VHDL choice warning at LC3FSM.vhd(37): ignored choice containing meta-value ""0001------0-----"""
I am at a loss as to how to go about accomplishing this. I really do not want to (and probably don't need to) define every single 16-bit combination, and I hope there's a way to use wildcards in a STD_LOGIC_VECTOR to minimize the number of combinations I have to define.
Does anybody know how to accomplish this?
Thanks

Unfortunately, that can't be done. Rather unexpectedly for most users, the comparison operator = and the case comparison perform a literal comparison. This is because the std_logic type is just a set of characters, which happen to behave like logic values because of the way other functions (e.g. "and" and "or") are defined.
VHDL-2008 introduces a new case statement, case?, which behaves as you expect; you'll need to tell your compiler to operate in VHDL-2008 mode. In addition, VHDL-2008 has a ?= operator which compares two values, treating '-' as don't care.
If you are lumbered with a compiler that still doesn't support VHDL-2008, complain to the supplier. There is also a std_match function (in ieee.numeric_std) that lets you perform such comparisons in older VHDL revisions, but nothing that I am aware of to make the case statement work that way.
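As a language-neutral illustration of what std_match, ?= and case? do with '-', here is a small C sketch (match_pattern, decode and the enum names are hypothetical). Each pattern is a 16-character string written MSB first, like the question's choices; '-' matches either bit, while '0' and '1' must match exactly.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* True if `ir` matches `pattern` (16 chars of '0', '1', '-', MSB first). */
static bool match_pattern(uint16_t ir, const char *pattern) {
    if (strlen(pattern) != 16) {
        return false;
    }
    for (int i = 0; i < 16; i++) {
        int bit = (ir >> (15 - i)) & 1;
        char c = pattern[i];
        if (c == '-') {
            continue; /* don't care */
        }
        if ((c != '0' && c != '1') || bit != (c - '0')) {
            return false;
        }
    }
    return true;
}

enum op { OP_ADD_REG, OP_ADD_IMM, OP_OTHER };

/* Same choices as the question's CASE statement, checked in order. */
static enum op decode(uint16_t ir) {
    if (match_pattern(ir, "0001------0-----")) return OP_ADD_REG;
    if (match_pattern(ir, "0001------1-----")) return OP_ADD_IMM;
    return OP_OTHER;
}
```

The two ADD patterns differ only in bit 5 (IR(5)); in VHDL-2008 the same two choices can be written directly in a case? statement.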

Assuming you don't need the other bits in the instruction you could hack your way around this by masking the other bits with a pre-check process. (Or just ensure the other bits are reset when you write the instruction?)
This really is a bit of a hack.
Assuming IR is stored as a variable:
if IR_in(15 downto 12) = "0001" then
  IR := IR_in(15 downto 12) & "000000" & IR_in(5) & "00000";
else
  IR := IR_in;
end if;
CASE IR IS --(IR stands for "Instruction Register")
WHEN "0001000000000000" => (Go to 3-register add);
WHEN "0001000000100000" => (Go to 2-register/immediate value add);
WHEN OTHERS => (Do whatever);
END CASE;
Alternatively assuming your instruction is cleverly thought out (are the first four bits the command word or something along those lines?) you could do nested case statements and do the differentiation as needed in those sub blocks.
