Why does the first execution of an Ada procedure take longer than subsequent executions? - microcontroller

I am trying to write a delay procedure for an FE310 microcontroller. I need to write this procedure because I use a zero footprint runtime (ZFP) that doesn't provide the native Ada delays.
The procedure relies on a 64-bit hardware timer that is incremented 32,768 times per second. The procedure reads the timer, calculates the final value by adding an offset to the value read, and then polls the timer until it reaches the final value.
I toggle a pin before and after the execution and check the delay with a logic analyzer. The delays are quite accurate except for the first execution, where they are 400 µs to 600 µs longer than requested.
Here is my procedure:
procedure Delay_Ms (Ms : Positive)
is
   Start_Time : Machine_Time_Value;
   End_Time   : Machine_Time_Value;
begin
   Start_Time := Machine_Time;
   End_Time   := Start_Time + (Machine_Time_Value (Ms) * Machine_Time_Value (LF_Clock_Frequency)) / 1_000;
   loop
      exit when Machine_Time >= End_Time;
   end loop;
end Delay_Ms;
Machine_Time is a function reading the hardware timer.
Machine_Time_Value is a 64-bit unsigned integer.
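To sanity-check the tick arithmetic in Delay_Ms, here is a small Python sketch (not part of the question; it assumes LF_Clock_Frequency = 32_768, the low-frequency clock rate stated above):

```python
LF_CLOCK_FREQUENCY = 32_768  # timer increments per second

def delay_ticks(ms: int) -> int:
    # Mirrors End_Time - Start_Time in the Ada code:
    # (Ms * LF_Clock_Frequency) / 1_000 with truncating integer division
    return (ms * LF_CLOCK_FREQUENCY) // 1_000

# A 100 ms request waits 3276 ticks = 3276 / 32768 s, i.e. about 99.98 ms,
# so the integer division truncates the delay slightly short of the request.
print(delay_ticks(100))  # 3276
```

This confirms the requested-versus-measured gap cannot come from the arithmetic itself; the truncation error is under one tick (about 30 µs), far smaller than the observed 400-600 µs.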
I am sure the hardware aspect is correct because I wrote the same algorithm in C and it behaves exactly as expected.
I think that GNAT is adding some code that is only executed the first time. I searched the web for mentions of similar behavior but didn't find anything relevant. I found some information about elaboration code and how it can be removed, but after some research I realized that elaboration code is executed before the main program and shouldn't be the cause of my problem.
Do you know why the first execution of a procedure like mine could take longer? Is it possible to avoid this kind of behavior?

As Simon Wright suggested, the first execution takes longer because the MCU fetches the code from the SPI flash on the first execution but from the instruction cache on subsequent executions.
By default, the FE310 SPI clock is the processor core clock divided by 8. When I set the SPI clock divider to 2, the difference in execution time is divided by 4.

Related

Confused about Airflow's BaseSensorOperator parameters : timeout, poke_interval and mode

I have a bit of confusion about the way BaseSensorOperator's parameters work: timeout and poke_interval.
Consider this usage of the sensor:
BaseSensorOperator(
    soft_fail=True,
    poke_interval=4*60*60,  # Poke every 4 hours
    timeout=12*60*60,       # Timeout after 12 hours
)
The documentation says the timeout sets the task to 'fail' after it runs out. But since I'm using soft_fail=True, I don't think it retains the same behavior, because I've found the task failed instead of skipping when I used both soft_fail and timeout.
So what does happen here?
The sensor pokes every 4 hours, and at every poke, will wait for the duration of the timeout (12 hours)?
Or does it poke every 4 hours, for a total of 3 pokes, then times out?
Also, what happens with these parameters if I use the mode="reschedule"?
Here's the documentation of the BaseSensorOperator
class BaseSensorOperator(BaseOperator, SkipMixin):
    """
    Sensor operators are derived from this class and inherit these attributes.
    Sensor operators keep executing at a time interval and succeed when
    a criteria is met and fail if and when they time out.

    :param soft_fail: Set to true to mark the task as SKIPPED on failure
    :type soft_fail: bool
    :param poke_interval: Time in seconds that the job should wait in
        between each try
    :type poke_interval: int
    :param timeout: Time, in seconds before the task times out and fails.
    :type timeout: int
    :param mode: How the sensor operates.
        Options are: ``{ poke | reschedule }``, default is ``poke``.
        When set to ``poke`` the sensor is taking up a worker slot for its
        whole execution time and sleeps between pokes. Use this mode if the
        expected runtime of the sensor is short or if a short poke interval
        is required.
        When set to ``reschedule`` the sensor task frees the worker slot when
        the criteria is not yet met and it's rescheduled at a later time. Use
        this mode if the time before the criteria is met is expected to be
        quite long. The poke interval should be more than one minute to
        prevent too much load on the scheduler.
    :type mode: str
    """
Defining the terms
poke_interval: the duration between successive 'pokes' (evaluations of the condition being 'sensed')
timeout: poking indefinitely is inadmissible (if, for example, your buggy code pokes for the day to become 29 whenever the month is 2, it could keep poking for up to 4 years). So we define a maximum period beyond which we stop poking and terminate (the sensor is marked either FAILED or SKIPPED)
soft_fail: normally (when soft_fail=False), the sensor is marked FAILED after the timeout. When soft_fail=True, it is instead marked SKIPPED after the timeout
mode: this one is slightly more complex
Any task (including a sensor), while it runs, occupies a slot in some pool (either the default pool or an explicitly specified one); essentially meaning that it takes up some resources.
For sensors, this is
wasteful: a slot is consumed even while we are just waiting (doing no actual work)
dangerous: if your workflow has too many sensors that go into sensing around the same time, they can freeze a lot of resources for quite a while. In fact, having too many ExternalTaskSensors is notorious for putting entire workflows (DAGs) into deadlocks
To overcome this problem, Airflow v1.10.2 introduced modes in sensors
mode='poke' (default) means the existing behaviour that we discussed above
mode='reschedule' means that after a poke attempt, rather than going to sleep, the sensor behaves as though it failed (in the current attempt) and its status changes from RUNNING to UP_FOR_RETRY. That way, it releases its slot, allowing other tasks to progress while it waits for another poke attempt
Citing the relevant snippet from code here
if self.reschedule:
    reschedule_date = timezone.utcnow() + timedelta(
        seconds=self._get_next_poke_interval(started_at, try_number))
    raise AirflowRescheduleException(reschedule_date)
else:
    sleep(self._get_next_poke_interval(started_at, try_number))
    try_number += 1
For more info, read the Sensors Params section
And now answering your questions directly
Q1
The sensor pokes every 4 hours, and at every poke, will wait for the duration of the timeout (12 hours)?
Or does it poke every 4 hours, for a total of 3 pokes, then times out?
Point 2 is correct.
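A back-of-the-envelope sketch (plain Python, not Airflow code) of why those parameters give roughly three pokes:

```python
# Simplified model: in poke mode the sensor pokes, then sleeps
# poke_interval seconds, and gives up once the elapsed time
# reaches the timeout.
POKE_INTERVAL = 4 * 60 * 60   # poke every 4 hours
TIMEOUT = 12 * 60 * 60        # give up after 12 hours

elapsed, pokes = 0, 0
while elapsed < TIMEOUT:      # simplified timeout check
    pokes += 1                # pokes happen at t = 0 h, 4 h, 8 h
    elapsed += POKE_INTERVAL

print(pokes)                  # 3 pokes, then the sensor times out
```

The real check uses wall-clock time (utcnow minus started_at), so the exact count at the boundary can shift by one, but the timeout is a total budget, not a per-poke wait.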
Q2
Also, what happens with these parameters if I use the
mode="reschedule"?
As explained earlier, each of those params is independent, and setting mode='reschedule' doesn't alter their behaviour in any way
BaseSensorOperator(
    soft_fail=True,
    poke_interval=4*60*60,  # Poke every 4 hours
    timeout=12*60*60,       # Timeout of 12 hours
    mode="reschedule"
)
Let's say the criteria is not met at the first poke. The sensor will then poke again after the 4-hour interval, but the worker slot will be freed during the wait since we're using mode="reschedule".
That is what I understood.

Time measurements for function in microcontroller

I am using two microcontrollers for a project, and I want to measure the execution time of some code with the help of the internal timer of both microcontrollers. But one microcontroller's timer counts up to a 32-bit value, while the second microcontroller's timer can only count up to a 16-bit value before it restarts. I know that the execution time of the code exceeds a 16-bit count. Could you suggest a solution for this problem? (Turning a GPIO pin on and off doesn't provide useful results.)
You should be able to measure execution time using either type of timer, assuming that the execution time is less than hours or days. The real problem is how to configure the timer to meet your needs: the configuration controls the precision (granularity) of the measurement, as well as the maximum interval that can be measured.
The general approach will be thus:
Identify required precision and estimate the longest interval to be measured
Given the precision, determine the timer clock prescaler or divider that will meet your precision requirements. For example, if the clock speed is 50 MHz, and you need microsecond precision, then select a prescaler such that (Prescaler) / (Clock speed) ~ 1 microsecond. A spreadsheet helps with this. For this case, a divider value of 64 gives us about 1.28 microseconds per timer increment.
Determine if your timer register is large enough. For a 16-bit timer, you can measure (1.28 microseconds) * (2^16 - 1) = 0.084 seconds, or about a tenth of a second. If the thing you are measuring takes longer than this, you will need to rethink your precision requirements.
You should by now have identified the key parameters for configuring the timer, keeping in mind the limitations. If you update your question with more specifics, such as the microcontrollers you plan to use and what you're trying to measure, I can be more specific.
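The arithmetic in the steps above can be sketched as follows, using the example's assumed numbers (a 50 MHz clock, a divider of 64, and a 16-bit timer):

```python
# Prescaler arithmetic from the steps above.
CLOCK_HZ = 50_000_000   # assumed core clock
PRESCALER = 64          # chosen divider
TIMER_BITS = 16

tick_period_us = PRESCALER / CLOCK_HZ * 1e6              # microseconds per timer increment
max_interval_s = (2**TIMER_BITS - 1) * tick_period_us / 1e6  # before the counter wraps

print(round(tick_period_us, 2))   # 1.28 us per tick
print(round(max_interval_s, 3))   # 0.084 s maximum measurable interval
```

If the code under test runs longer than this maximum, you trade precision for range by raising the prescaler until the interval fits in the counter.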

Computation during analogRead on Arduino

The Arduino A/D converter takes about 0.1 ms per reading according to the manual. Indeed, my tests show that on an Uno I can execute about 7700 analogRead calls per second in a loop.
Unfortunately analogRead waits while the reading is being performed, making it difficult to get anything done.
I wish to interleave computation with a series of A/D conversions. Is there any way of initiating the analogRead, then checking the timing and getting the completed value later? If this needs to be low-level and non-portable to other versions, I can deal with that.
Looking for a solution that would allow sampling all the channels on an Arduino on a regular basis, then sending data via SPI or I2C. I am willing to consider interrupts, but the sampling must remain extremely periodic.
Yes, you can start an ADC conversion without waiting for it to complete. Instead of using analogRead, check out Nick Gammon's example here, in the "Read Without Blocking" section.
To achieve a regular sample rate, you can either:
1) Let it operate in free-running mode, where it takes samples as fast as it can, or
2) Use a timer ISR to start the ADC, or
3) Use millis() to start a conversion periodically (a common "polling" solution). Be sure to step to the next conversion time by adding to the previously calculated conversion time, not by adding to the current time:
uint32_t last_conversion_time;

void setup()
{
    ...
    last_conversion_time = millis();
}

void loop()
{
    if (millis() - last_conversion_time >= ADC_INTERVAL) {
        <start a new conversion here>
        // Assume we got here as calculated, even if there
        // were small delays
        last_conversion_time += ADC_INTERVAL;  // not millis()+ADC_INTERVAL!
        // If there are other delays in your program > ADC_INTERVAL,
        // you won't get back in time, and your samples will not
        // be regularly spaced.
    }
}
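Why stepping by ADC_INTERVAL matters can be shown with a quick simulation (plain Python, not Arduino code; the jitter values are made up for illustration):

```python
import random

ADC_INTERVAL = 10  # desired sample period, in simulated "milliseconds"

def simulate(step_by_interval):
    random.seed(0)                      # identical jitter for both runs
    now, last, samples = 0, 0, []
    for _ in range(1000):
        now += random.randint(1, 3)     # variable loop() latency
        if now - last >= ADC_INTERVAL:
            samples.append(now)
            if step_by_interval:
                last += ADC_INTERVAL    # stay on the original time grid
            else:
                last = now              # lateness accumulates as drift
    return samples

on_grid = simulate(True)
drifting = simulate(False)
# Every on-grid sample lands within the loop jitter of its slot
# k * ADC_INTERVAL; the drifting version's period creeps above
# ADC_INTERVAL because each late check pushes the whole schedule back.
```

The same reasoning applies to the millis()-based sketch above: adding to the previous scheduled time keeps the long-run rate exact even when individual loop() passes are late.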
Regardless of how you start the conversion periodically, you can either poll for completion or attach an ISR to be called when it is complete.
Be sure to use the volatile keyword for variables which are shared between the ISR and loop.

thrust transform performance number

Can anybody tell me whether Thrust routines are blocking or non-blocking?
I want to time them; here are the code snippets:
Code snippet 1:
clock_t start, end;
start = clock();
thrust::transform(a.begin(), a.end(), b.begin(), thrust::negate<int>());
end = clock();
Code snippet 2:
clock_t start, end;
start = clock();
thrust::transform(a.begin(), a.end(), b.begin(), thrust::negate<int>());
cudaThreadSynchronize();
end = clock();
Code snippet 1 takes much less time than code snippet 2.
Why is this happening? And which one is the right way to time the Thrust routines so that I may compare them to my parallel code?
I don't believe Thrust formally defines which APIs are blocking and which are non-blocking anywhere in the documentation. However, a transform call like your example should be executed in a single back-end closure operation (which translates into a single kernel call without host-device data copies) and should be asynchronous.
Your second code snippet is closer to the correct way to time a Thrust operation, but note that
clock() is generally implemented using a low-resolution time source and is probably not suitable for timing these types of operations. You should find a higher-resolution host timer or, better still, use the CUDA events API to time your code. You can see an example of how to use these APIs in this question-answer pair.
cudaThreadSynchronize is a deprecated API as of the CUDA 4.0 release. You should use cudaDeviceSynchronize instead.

VHDL Logic produces wrong result when using higher frequencies

I don't have much experience with VHDL, so excuse me if this is a stupid or boring question, but I couldn't find an appropriate answer. Here is my code, a bit simplified:
process (sys_clk, reset)
begin
    if reset = '0' then
        -- resetting code
    elsif rising_edge(sys_clk) then
        if data_ready = '1' and old_data_ready = '0' then  -- rising edge of asynchronous signal
            -- update a few registers and assign values to a few signals
        elsif error_occured = '1' and old_error_occured = '0' then
            -- update a few registers and assign values to a few signals (same registers and signals as above)
        end if;
        old_data_ready <= data_ready;
        old_error_occured <= error_occured;
    end if;
end process;
The signal is kept high for much longer than the period of sys_clk, but it's not known for how long; it varies.
These ifs result in two registers (one each) and an AND gate. I believe you know that.
This worked, but badly: there were errors too often. So I made a special project using two processes, one active on the rising edge of data_ready and one on the rising edge of error_occured. I could only use them to increment and decrement two separate counters, but that was enough to verify that the problem with my code is that this rising-edge detection sometimes fails. sys_clk is 27 MHz, and I have made much bigger projects using that same frequency that worked well, but they had no rising-edge detection of asynchronous signals like this. So I reduced the frequency to 100 kHz, because I don't really need higher frequencies, and that solved my problem.
But just out of curiosity, what is the best way to test for the rising edge of an asynchronous signal when a few of these signals affect the same registers and the device needs to work at higher frequencies?
I use Altera Quartus II and Cyclone II FPGA.
If the signal you are sampling is truly asynchronous, you have to deal with the issue of metastability. If the data_ready signal is in a metastable state exactly on sys_clk's rising edge, old_data_ready and the first if-statement might see different versions of data_ready. Also, you have an asynchronous reset: if the reset signal is released exactly when data_ready is changing, data_ready may be sampled to different values throughout your system. A simulator will not reveal metastability problems, because the code is logically correct.
To circumvent these problems, have asynchronous reset between modules, but synchronous within.
Also, synchronize any signal coming from a different clock domain. A synchronizer is a couple of flip-flops placed closely together. When the signal passes through the FFs, any metastability issues will be resolved before it reaches your logic. There is a formula for calculating the mean time between failures (MTBF) due to metastability in FPGAs. I won't recite it, but what it basically says is that this simple method increases the MTBF from seconds to billions of years.
VHDL synchronizer:
process(clk, rst) is
begin
    if rising_edge(clk) then
        if rst = '0' then
            data_ready_s1 <= '0';
            data_ready_s2 <= '0';
        else
            data_ready_s1 <= data_ready;
            data_ready_s2 <= data_ready_s1;
        end if;
    end if;
end process;
Use data_ready_s2 in your module.
Then you constrain the path between the flipflops in the UCF file:
TIMEGRP "FF_s1" = FFS("*_s1") FFS("*_s1<*>");
TIMEGRP "FF_s2" = FFS("*_s2") FFS("*_s2<*>");
TIMESPEC TS_SYNC = FROM "FF_s1" TO "FF_s2" 2 ns DATAPATHONLY;
