Synchronous vs Asynchronous Resets in an FPGA System

I'm new to creating FPGA systems. I'm building one to drive an I2C bus (although I imagine this problem applies to any FPGA design) out of a variety of modules, all of which use a synchronous reset.
The modules are clocked by a clock divider module that takes the system clock and outputs a lower-frequency clock to the rest of the system.
The problem is that when the active-low reset signal goes low, the clock divider resets, so the clock that the other modules depend on stops, and those modules never register the reset.
An obvious solution would be an asynchronous reset. However, Xilinx ISE doesn't appear to like them and throws a warning saying this is incompatible with the Spartan-6 FPGA, especially when the logic following the asynchronous part IS synchronous (which it is here, because an I2C bus uses the bus clock to put bits onto the bus).
Another solution would be to make the clock divider non-resettable, so that its clock never stops and all modules reset correctly. However, this means the clock divider registers cannot be initialised or reinitialised to a known state, which I've been told would be a big problem. I know you can use an initial value (:= '0' / := '1') in simulation, but does this not work once the design is programmed onto the actual FPGA?
What is the convention for synchronous resets? Are clock generators generally just not reset? Or do they reset only on the edge of the reset signal? Or are none of my suggestions a real solution?
I've included a timing diagram and my code to illustrate what I mean.
Thanks very much!
David
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.NUMERIC_STD.ALL;
library UNISIM;
use UNISIM.VComponents.all;

-- Divides SYSCLK down to two lower-frequency square-wave outputs.
ENTITY CLK_DIVIDER IS
    GENERIC(INPUT_FREQ : INTEGER;    -- SYSCLK frequency in Hz
            OUT1_FREQ  : INTEGER;    -- desired OUT1 frequency in Hz
            OUT2_FREQ  : INTEGER     -- desired OUT2 frequency in Hz
    );
    PORT(SYSCLK  : IN  STD_LOGIC;
         RESET_N : IN  STD_LOGIC;    -- active-low synchronous reset
         OUT1    : OUT STD_LOGIC;
         OUT2    : OUT STD_LOGIC);
END CLK_DIVIDER;

architecture Behavioral of Clk_Divider is
    -- Each output toggles once every dividerN SYSCLK cycles, so a full
    -- output period lasts 2 * dividerN SYSCLK cycles.
    constant divider1 : integer := INPUT_FREQ / OUT1_FREQ / 2;
    constant divider2 : integer := INPUT_FREQ / OUT2_FREQ / 2;
    signal counter1 : integer := 0;
    signal counter2 : integer := 0;
    signal output1  : std_logic := '0';
    signal output2  : std_logic := '0';
begin
    -- OUT1 generator. While RESET_N is low the counter is held at 0 and
    -- the output is held at '1', so the divided clock stops.
    output1_proc : process(SYSCLK)
    begin
        if rising_edge(SYSCLK) then
            if RESET_N = '0' then
                counter1 <= 0;
                output1  <= '1';
            else
                if counter1 >= divider1 - 1 then
                    output1  <= not output1;  -- half period elapsed: toggle
                    counter1 <= 0;
                else
                    counter1 <= counter1 + 1;
                end if;
            end if;
        end if;
    end process;

    -- OUT2 generator, identical in structure to OUT1.
    output2_proc : process(SYSCLK)
    begin
        if rising_edge(SYSCLK) then
            if RESET_N = '0' then
                counter2 <= 0;
                output2  <= '1';
            else
                if counter2 >= divider2 - 1 then
                    output2  <= not output2;  -- half period elapsed: toggle
                    counter2 <= 0;
                else
                    counter2 <= counter2 + 1;
                end if;
            end if;
        end if;
    end process;

    OUT1 <= output1;
    OUT2 <= output2;
end Behavioral;

Don't generate internal clocks with user logic; use a device-specific PLL/DCM if multiple clocks are really needed. All user logic running on the derived clocks should then be held in reset until the clocks are stable, after which the reset for the user logic can be released as the design requires. Either a synchronous or an asynchronous reset can be used.
In this case, though, it is probably better to generate a clock enable signal instead: assert it for a single cycle each time the signals need to be updated, so as to generate whatever protocol is needed, e.g. I2C with the appropriate timing.
Using fewer clocks, combined with synchronous clock enable signals, makes setting up Static Timing Analysis (STA) easier, and also avoids issues with reset synchronization and Clock Domain Crossing (CDC).
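To illustrate the clock-enable approach, here is a cycle-level model written in C rather than VHDL. It is only a sketch: the names `ce_gen`, `ce_step` and the `divide` field are hypothetical. Each call represents one rising edge of the single system clock; the counter asserts the enable for exactly one cycle out of every `divide` (which must be at least 1), and a synchronous reset clears the counter without ever stopping the clock, so downstream logic still sees the reset. In VHDL the I2C logic would stay on SYSCLK and only update inside an `if enable = '1' then` branch.

```c
#include <stdbool.h>

/* Cycle-level model of a clock-enable generator on the system clock.
 * Instead of producing a slower clock, it asserts `enable` for one
 * system-clock cycle out of every `divide` cycles (divide >= 1). */
typedef struct {
    unsigned divide;   /* e.g. input frequency / update rate (hypothetical) */
    unsigned counter;  /* cycles elapsed since the last enable pulse */
} ce_gen;

/* Advance one rising edge of the system clock and return the enable
 * value that downstream logic samples on this edge. A synchronous
 * reset clears the counter, but the clock itself keeps running, so
 * every downstream register still sees the reset. */
static bool ce_step(ce_gen *g, bool reset)
{
    if (reset) {
        g->counter = 0;
        return false;
    }
    if (g->counter == g->divide - 1) {
        g->counter = 0;  /* wrap and emit a single-cycle pulse */
        return true;
    }
    ++g->counter;
    return false;
}
```

For example, with `divide` set to 4 the enable is high on every fourth call, and logic gated by it updates at a quarter of the system clock rate while remaining in a single clock domain.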

A robust way of handling the resets in a system like this is as follows:
Use a DCM/PLL/MMCM in the Xilinx FPGA to process the input system clock and generate all the output clock frequencies you need. For really low frequencies, bear in mind that you should use a clock within the specifications of the clock manager and generate a clock enable signal to use in conjunction with it. The clock manager can be reset from the host system at start-up, or whenever the input clock is removed and then re-applied.
Invert the LOCKED signal from the clock manager to generate an active-high reset that is asserted while the clock manager is in reset or still locking to the input. Pass this through an SRL16 or SRL32 to delay it, clocking the SRL with the output of the PLL after it has been put onto the global clock routing with a BUFG. Add an extra flip-flop after the SRL for improved timing. The resulting signal can then be used as an active-high synchronous reset for the rest of the logic in the device wherever it is needed.
If you get timing errors on the clock enable signal because it is a high-fanout net, it could also be put through a BUFG to access the fast global clock network and improve timing.

@Stuart Vivian
(This should have been posted as a comment, but I don't have enough reputation points to do so; sorry about that.)
Consider using a counter instead of a shift register to delay the reset: some FPGA families do not clear LUT contents after loading the bitstream, so an SRL-based delay may start with arbitrary contents and the reset signal may bounce, leading to unpredictable results.
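A counter-based reset delay can be sketched as a C model of the synchronous logic (the names `reset_delay` and `reset_step` and the `delay` parameter are hypothetical). Each call is one rising edge of the BUFG-buffered PLL clock. While LOCKED is low the counter restarts and reset stays asserted; after lock, reset is released only once the counter has saturated at `delay`. Because the state lives in ordinary flip-flops, which are typically set to their configured initial values at the end of configuration, the delay does not depend on LUT contents being cleared.

```c
#include <stdbool.h>

/* Hypothetical model of a counter-based reset stretcher. */
typedef struct {
    unsigned delay;  /* cycles to keep reset asserted after LOCKED rises */
    unsigned count;  /* saturating counter */
} reset_delay;

/* One rising edge of the PLL output clock. Returns the active-high
 * synchronous reset for the user logic. */
static bool reset_step(reset_delay *r, bool locked)
{
    if (!locked) {
        r->count = 0;   /* LOCKED low: restart the delay */
        return true;    /* keep user logic in reset */
    }
    if (r->count < r->delay) {
        ++r->count;     /* still counting after lock */
        return true;
    }
    return false;       /* counter saturated: release reset */
}
```

As in the answer above, the reset output would normally pass through one more flip-flop before being distributed.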

Related

STM32F4 UART half word addressing

I'm trying to roll my own code for the STM32F4 UART.
A peculiarity of this chip is that if you use byte addressing, as the GNAT compiler does when setting a single bit, the corresponding bit in the other byte of the half-word is also set. The datasheet says to use half-word addressing. Is there a way to tell the compiler to do this? I tried
for CR1_register'Size use 16;
but this had no effect. Writing the whole 16-bit word works, but you lose the ability to set named bits.
The GNAT way to do this, as used in the AdaCore Ada Drivers Library, is to use the GNAT-only aspect Volatile_Full_Access, about which the GNAT Reference Manual says
This is similar in effect to pragma Volatile, except that any reference to the object is guaranteed to be done only with instructions that read or write all the bits of the object. Furthermore, if the object is of a composite type, then any reference to a subcomponent of the object is guaranteed to read and/or write all the bits of the object.
The intention is that this be suitable for use with memory-mapped I/O devices on some machines. Note that there are two important respects in which this is different from pragma Atomic. First a reference to a Volatile_Full_Access object is not a sequential action in the RM 9.10 sense and, therefore, does not create a synchronization point. Second, in the case of pragma Atomic, there is no guarantee that all the bits will be accessed if the reference is not to the whole object; the compiler is allowed (and generally will) access only part of the object in this case.
Their code is:
-- Control register 1
type CR1_Register is record
   -- Send break
   SBK : Boolean := False;
   ...
end record
  with Volatile_Full_Access, Size => 32,
       Bit_Order => System.Low_Order_First;

for CR1_Register use record
   SBK at 0 range 0 .. 0;
   ...
end record;
The portable way is to do this explicitly: read the whole record, modify it, then write it back. As long as the register is declared Volatile, the compiler will not optimize the reads and writes away.
-- excerpt from my working code --
declare
   R : Control_Register_1 := Module.CR1;  -- read the whole register
begin
   R.UE := True;                           -- modify the local copy
   Module.CR1 := R;                        -- write the whole register back
end;
This is very verbose, but it does the job.
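For comparison, the same read-modify-write pattern in C might look like the sketch below. It assumes the register is reached through a `volatile uint16_t` pointer; `cr1_set_bits` and `cr1_clear_bits` are hypothetical helpers, and the bit positions follow the STM32F4 USART_CR1 layout (SBK at bit 0, UE at bit 13), which you should check against the reference manual. Masks are used instead of C bit-fields because writing a volatile bit-field member can itself compile to a narrower access, which is the problem described in the question.

```c
#include <stdint.h>

#define CR1_SBK (1u << 0)   /* send break */
#define CR1_UE  (1u << 13)  /* USART enable */

/* Set bits with one full-width 16-bit read and one full-width write. */
static void cr1_set_bits(volatile uint16_t *reg, uint16_t mask)
{
    uint16_t r = *reg;   /* read the whole register */
    r |= mask;           /* modify the local copy */
    *reg = r;            /* write the whole register back */
}

/* Clear bits using the same read-modify-write sequence. */
static void cr1_clear_bits(volatile uint16_t *reg, uint16_t mask)
{
    uint16_t r = *reg;
    r &= (uint16_t)~mask;
    *reg = r;
}
```

On hardware, `reg` would point at the USART's CR1 address; in a host test any `volatile uint16_t` variable works.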

Precedence of initialized port/signal assigned to port in VHDL

I have a question regarding initialization in VHDL. If an entity output port is initialized to one value, but is assigned from a signal that is initialized to a different value, what initial value will the output assume? I mean a situation like the following:
entity TEST_ENTITY is
    Port (port0 : out STD_LOGIC := '0');
end TEST_ENTITY;

architecture Behavioral of TEST_ENTITY is
    signal signal0 : STD_LOGIC := '1';
begin
    port0 <= signal0;
end Behavioral;
I would assume that the initialization value of the signal will take precedence. Is this correct?
There is no precedence here. A signal assignment takes at least one delta cycle to take effect. So at time 0, port0 will be '0' and signal0 will be '1'; port0 becomes '1' after one delta cycle has elapsed.
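The delta-cycle behaviour can be pictured with a toy two-phase model in C. This is only an illustration, not how a VHDL simulator is structured: `sim_state` and `delta_cycle` are hypothetical names, and chars stand in for std_logic values. The state starts from the declared initial values, and the concurrent assignment `port0 <= signal0;` only becomes visible in the next delta cycle.

```c
/* Toy two-phase model of the TEST_ENTITY example. */
typedef struct {
    char port0;    /* declared initial value '0' */
    char signal0;  /* declared initial value '1' */
} sim_state;

/* One delta cycle: evaluate `port0 <= signal0;` against the current
 * values, then apply the scheduled update to produce the next state. */
static sim_state delta_cycle(sim_state now)
{
    sim_state next = now;
    next.port0 = now.signal0;  /* visible only in the next delta */
    return next;
}
```

Starting from `{'0', '1'}`, port0 reads '0' at time 0 and '1' after one call to `delta_cycle`.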

Prevent unwanted Toggling in a Timer

I have a problem with the Arduino Due timers. First, let me explain what I know about them (I don't know whether there is a way to solve this issue for timers in general). Features of the Due timers:
1) They always start from zero,
2) They work as UP-COUNTING or UP-DOWN counting timers,
3) Each timer has two compare registers.
My project involves working in sampled time periods, i.e. the timer runs for one sample period and, based on the values in the compare registers, the outputs TIOA and TIOB toggle. I am working in up-down mode. The problem is that when a compare register holds zero, I expect a zero output (on TIOA or TIOB) for the whole period, but the timer toggles the output on the zero compare as well. So instead of a constant zero I get a square wave with a period of 2*period. Is this a common problem with other timers too?
Can you guys suggest me a workaround for this problem?
Thanks in advance.
#include <AdvaDueTC.h>
int default_clock = 1;
int RCcntS = 2187*2;
int period0 = 65536;
int a = 2180;
int b = 0;

void subrtn()
{
    changeTC_TC3_Period(RCcntS);          // load sampler TC3 with RCcntS
    changeTC_TC0_Period(RCcntS/2, a, b);  // load timer TC0 with RCcntS/2 (a, b: compare values)
}

void setup() {
    setupTC3_Interrupt(period0, default_clock, subrtn); // set up sampler interrupt
    setupTC_TC0_Timing(period0, default_clock);
}

void loop() {
    // put your main code here, to run repeatedly:
}
The functions used are:
Here TC3 is in UP mode and TC0 is in UPDOWN mode. TIOA0 and TIOB0 are used for the toggling outputs (i.e. in REG_TC0_CMR0, ACPA and BCPB are set to 3, the toggle action). TIOB0 is toggling, and I want it to stay at one value (0 V or 3.3 V) for the whole period.
Thanks for your suggestion.
when I have zero in a compare register I expect a zero output
I would expect the output to be triggered twice (on UP and on DOWN) every tick (I think you call it a period), because the timer is overflowing EVERY tick.
The solution is to turn off the timer comparison.
This looks like PWM to me; you may get better results using the dedicated PWM hardware.
Yes, what you said is correct. At first I didn't get it, but this MCU's timer has an option to set or clear the output for the whole period, so instead of always using TOGGLE I used these options to get the desired behaviour.
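The behaviour discussed above can be reproduced with a simplified C model of an up-down counter with a TOGGLE compare action. This is a hypothetical sketch (`updown_toggle` is an illustrative name) and is not cycle-exact for the SAM3X timer counter. In the model the counter runs from 0 up to `rc` and back, so a compare value of 0 matches only once per up-down period and the pin flips once per period, giving a square wave at twice the sampling period; an interior compare value matches twice and the pin returns to its starting level. Using the SET or CLEAR action instead of TOGGLE keeps the pin at a fixed level.

```c
#include <stdbool.h>

/* Simplified up-down counter: 0 -> rc -> 0, one period = 2*rc ticks.
 * The output pin toggles whenever the counter equals `ra`. Returns the
 * pin level after `periods` full periods starting from `level`, and
 * stores the number of compare matches in *matches. */
static bool updown_toggle(int rc, int ra, int periods, bool level, int *matches)
{
    int value = 0;
    int dir = 1;              /* +1 counting up, -1 counting down */
    *matches = 0;
    for (long tick = 0; tick < 2L * rc * periods; ++tick) {
        if (value == ra) {
            level = !level;   /* TOGGLE action on compare match */
            ++*matches;
        }
        if (value == rc)
            dir = -1;         /* top reached: count down */
        else if (value == 0)
            dir = 1;          /* bottom reached: count up */
        value += dir;
    }
    return level;
}
```

With the question's values (`rc` = 2187), a compare value of 0 flips the pin once per period, while 2180 flips it twice and leaves the level unchanged.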

Replacement for Arduinos millis() that is reliable also with disabled interrupts

As stated in stackoverflow-17135805, the millis() function does not return the correct time if interrupts were disabled while the Arduino had to detect an overflow of timer0.
I have a time-critical program that uses a lot of functions that have to disable interrupts, so my program runs for 1:30 while it thinks it has been running for only 1:00.
Is there another timer I can use to avoid this problem?
It happens to me when I use the GSM module:
// startpoint
unsigned long t = 0;
unsigned long start = millis();
while ((millis() - start) < 30000) {
    // read a chunk from the GPRS module
    for (int i = 0; i < 8; i++)
        client.read();
    // do this loop every 10 ms
    while ((millis() - start) < t * 10) {}
    t++;
}
// endpoint
From the startpoint to the endpoint it should take 30 seconds. Instead it takes 65 seconds.
If you have to disable interrupts so often and for so long, your best bet would be an external timer. I highly recommend the DS3231: since it has a built-in crystal it is easier to set up than a DS1307, and it is also significantly more accurate.
You could use one of the other hardware timers to keep track of the time. For example, on the Leonardo, Timer 1 is a 16-bit timer.
To set it up directly (this obliterates code portability) there are a couple of steps.
TCCR1A = 0;
TCCR1B = 0;
Together these put the timer in "normal" mode, meaning it just counts up to 0xFFFF and wraps back to 0x0000.
TCCR1B = _BV(CS11) | _BV(CS10);
This starts the timer with a clock/64 prescaler, which with a 16 MHz system clock equates to one tick every 4 µs.
To check the time:
unsigned long time; // declared somewhere in scope
time = TCNT1;       // read the timer count register
time *= 4;          // convert ticks (4 µs each) to microseconds
As mentioned earlier, TCNT1 wraps around after 0xFFFF (65535), i.e. every 65536 ticks. With the prescaler set as above, that gives you about 65536 × 4 µs ≈ 0.262 seconds of counting before your program needs to accumulate the count into a bigger variable (assuming you care). Hopefully it isn't a problem to poll more often than four times a second, which frees you from relying on interrupts.
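Accumulating the wrapping count into a bigger variable can be done with unsigned arithmetic. The sketch below is hypothetical (`tick_ext`, `tick_poll` and `ticks_to_us` are illustrative names; on the board `now` would be read from TCNT1) and assumes it is polled at least once per wrap period, about 0.262 s with a 4 µs tick. Casting the difference back to `uint16_t` gives the elapsed ticks even when the counter wrapped between polls.

```c
#include <stdint.h>

/* Software extension of a free-running 16-bit hardware counter. */
typedef struct {
    uint16_t last;   /* counter value seen at the previous poll */
    uint32_t ticks;  /* total ticks accumulated since start */
} tick_ext;

/* Feed in the current counter value; returns the accumulated ticks.
 * The modulo-2^16 difference is correct across at most one wrap. */
static uint32_t tick_poll(tick_ext *t, uint16_t now)
{
    uint16_t delta = (uint16_t)(now - t->last);
    t->last = now;
    t->ticks += delta;
    return t->ticks;
}

/* Convert ticks to microseconds for the clock/64 prescaler at 16 MHz. */
static uint32_t ticks_to_us(uint32_t ticks)
{
    return ticks * 4u;
}
```

Note that the microsecond value itself wraps after 2^32 µs (roughly 71.6 minutes), much as millis() eventually wraps.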
Several Arduino core functions use these timers, so you'll need to verify that the core functions you need don't depend on the timer you choose. For example, doing the above will break analogWrite() on certain pins.

How do I know if the kernels are executing concurrently?

I have a GPU with compute capability (CC) 3.0, so it should support 16 concurrent kernels. I am launching 10 kernels by calling clEnqueueNDRangeKernel 10 times in a loop. How can I tell whether the kernels are executing concurrently?
One way I have thought of is to record the time before and after each NDRangeKernel call. I might have to use events to make sure each kernel has actually finished executing. But I still suspect that the loop will start the kernels sequentially. Can someone help me out?
To determine if your kernel executions overlap, you have to profile them. This requires several steps:
1. Creating the command-queues
Profiling data is only collected if the command-queue is created with the property CL_QUEUE_PROFILING_ENABLE:
cl_int errcode;
cl_command_queue queues[10];
for (int i = 0; i < 10; ++i) {
    queues[i] = clCreateCommandQueue(context, device, CL_QUEUE_PROFILING_ENABLE,
                                     &errcode);
}
2. Making sure all kernels start at the same time
You are right in your assumption that the CPU queues the kernels sequentially. However, you can create a single user event and add it to the wait list for all kernels. This causes the kernels not to start running before the user event is completed:
// Create the user event
cl_event user_event = clCreateUserEvent(context, &errcode);
// Reserve space for kernel events
cl_event kernel_events[10];
// Enqueue kernels; each one waits on user_event before it may start
for (int i = 0; i < 10; ++i) {
    clEnqueueNDRangeKernel(queues[i], kernel, work_dim, global_work_offset,
                           global_work_size, NULL /* local_work_size */,
                           1, &user_event, &kernel_events[i]);
}
// Start all kernels by completing the user event
clSetUserEventStatus(user_event, CL_COMPLETE);
3. Obtain profiling times
Finally, we can collect the timing information for the kernel events:
// Block until all kernels have run to completion
clWaitForEvents(10, kernel_events);
for (int i = 0; i < 10; ++i) {
    cl_ulong start;
    clGetEventProfilingInfo(kernel_events[i], CL_PROFILING_COMMAND_START,
                            sizeof(start), &start, NULL);
    cl_ulong end;
    clGetEventProfilingInfo(kernel_events[i], CL_PROFILING_COMMAND_END,
                            sizeof(end), &end, NULL);
    printf("Event %d: start=%llu, end=%llu\n", i,
           (unsigned long long)start, (unsigned long long)end);
}
4. Analyzing the output
Now that you have the start and end times of all kernel runs, you can check for overlaps (either by hand or programmatically). The output units are nanoseconds. Note however that the device timer is only accurate to a certain resolution. You can query the resolution using:
size_t resolution;
clGetDeviceInfo(device, CL_DEVICE_PROFILING_TIMER_RESOLUTION,
sizeof(resolution), &resolution, NULL);
FWIW, I tried this on an NVIDIA device with CC 2.0 (which should support concurrent kernels) and observed that the kernels were run sequentially.
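Checking for overlaps programmatically can be as simple as comparing every pair of intervals. This sketch uses `uint64_t` in place of `cl_ulong` so it compiles without OpenCL headers; `count_overlaps` is a hypothetical helper. Intervals are treated as half-open, so a kernel that starts exactly when another ends is not counted as overlapping.

```c
#include <stddef.h>
#include <stdint.h>

/* Count pairs of kernels whose [start, end) profiling intervals
 * overlap. start[i] and end[i] are the CL_PROFILING_COMMAND_START and
 * CL_PROFILING_COMMAND_END timestamps (nanoseconds) of kernel i. */
static size_t count_overlaps(const uint64_t *start, const uint64_t *end, size_t n)
{
    size_t pairs = 0;
    for (size_t i = 0; i < n; ++i)
        for (size_t j = i + 1; j < n; ++j)
            if (start[i] < end[j] && start[j] < end[i])
                ++pairs;  /* each interval begins before the other ends */
    return pairs;
}
```

A result of zero suggests sequential execution. Overlaps shorter than the value reported for CL_DEVICE_PROFILING_TIMER_RESOLUTION are best treated as inconclusive.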
You can avoid all the boilerplate code suggested in the other answers (which are correct by the way) by using C Framework for OpenCL, which simplifies this task a lot, and gives you detailed information about OpenCL events (kernel execution, data transfers, etc), including a table and a plot dedicated to overlapped execution of said events.
I developed this library in order to, among other things, simplify the process described in the other answers. You can see a basic usage example here.
Yes, as you suggest, try to use the events, and analyze all the QUEUED, SUBMIT, START, END values. These should be absolute values in "device time", and you may be able to see if processing (START to END) overlaps for the different kernels.
