Firmware update via CANbus and FreeRTOS watch dog triggered issue - arduino

I am developing firmware for an ESP32-based custom PCB in a connected battery (hereafter the BATTERY).
The BATTERY has CANbus connectivity, and I want to take advantage of it to upgrade the firmware.
For this purpose, I use a second PCB (hereafter the UPDATER) to transfer the firmware bin file to the BATTERY.
On the BATTERY, I have a FreeRTOS task with a relatively high priority of 9: its responsibility is to wait for incoming messages from the UPDATER and respond accordingly.
The UPDATER, in turn, runs a FreeRTOS task that sends the new firmware bin file to the BATTERY in chunks. This task also runs at priority 9.
The streamlined exchange protocol is as follows:
UPDATER                                  BATTERY
Start w/ i=0                             Start w/ chunkId = 0
   |                                        |
 loop:                                      |
   |                                        |
Take ith 8-byte chunk                       |
from bin file and                           |
transmit to battery                         |
   |                                        |
   o -------------------------------------> o
   |                                        |
   |                             send current chunkId
   |                             back to UPDATER
   |                             and chunkId++
   |                                        |
   o <------------------------------------- o
   |
Read received chunkId
and compare to i
If i != chunkId
then error and stop
Otherwise i++
and 'loop:' till the end
of file is reached
   |
Success
The typical firmware bin file size is 1.8MB.
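The exchange above is a stop-and-wait scheme: one 8-byte chunk per frame, one acknowledgement per chunk. As a minimal sketch of the bookkeeping on both sides, in plain C with no CAN driver involved (battery_t, battery_on_chunk, chunk_count and updater_send_all are hypothetical names, not part of the firmware shown in this question):

```c
#include <stddef.h>
#include <stdint.h>

#define CHUNK_SIZE 8u /* payload bytes carried by one frame */

/* Hypothetical BATTERY-side state: the id it will report for the next chunk. */
typedef struct {
    uint32_t chunk_id;
} battery_t;

/* BATTERY side: accept one chunk, reply with the current id, then increment. */
uint32_t battery_on_chunk(battery_t *b, const uint8_t *data, size_t len)
{
    (void)data; /* a real handler would write these bytes to flash */
    (void)len;
    return b->chunk_id++;
}

/* Number of chunks needed for a file, rounding the last partial chunk up. */
uint32_t chunk_count(size_t file_size)
{
    return (uint32_t)((file_size + CHUNK_SIZE - 1u) / CHUNK_SIZE);
}

/* UPDATER side: send every chunk and check each acknowledgement.
 * Returns 0 on success, -1 as soon as an id does not match. */
int updater_send_all(battery_t *b, const uint8_t *file, size_t file_size)
{
    uint32_t total = chunk_count(file_size);
    for (uint32_t i = 0; i < total; i++) {
        size_t offset = (size_t)i * CHUNK_SIZE;
        size_t len = file_size - offset;
        if (len > CHUNK_SIZE) {
            len = CHUNK_SIZE;
        }
        if (battery_on_chunk(b, file + offset, len) != i) {
            return -1; /* out of sync: error and stop */
        }
    }
    return 0;
}
```

Because the BATTERY only answers after a chunk has arrived, the UPDATER never has more than one frame in flight, so the per-chunk round-trip time sets the overall speed.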
The issue is on the BATTERY, in the for(;;) loop of the canRx() task. After many loops and chunks exchanged, the task watchdog is triggered:
E (548436) task_wdt: Task watchdog got triggered. The following tasks did not reset the watchdog in time:
E (548436) task_wdt:  - IDLE0 (CPU 0)
E (548436) task_wdt: Tasks currently running:
E (548436) task_wdt: CPU 0: btController
E (548436) task_wdt: CPU 1: IDLE1
E (548436) task_wdt: Aborting.
abort() was called at PC 0x401d105c on core 0
ELF file SHA256: 0000000000000000
Backtrace: 0x40095ac4:0x3ffbffd0 0x40095d3d:0x3ffbfff0 0x401d105c:0x3ffc0010 0x400913f1:0x3ffc0030 0x401e0289:0x3ffd37d0 0x401e085d:0x3ffd37f0 0x400979f2:0x3ffd3820
My task runs on CORE_1 and the code looks like so:
void canRx(void *arg) {
    can_message_t rx_frame;
    firmwareUpdater.log("FU: Installing high performance canRx task handler");
    for (;;) {
        if (can_receive(&rx_frame, pdMS_TO_TICKS(20)) == ESP_OK) { // 1000
            // Receive
            const unsigned id = rx_frame.identifier;
            //#if LOG_LEVEL >= LOG_LEVEL_DEBUG
            firmwareUpdater.log("FU: CAN <== RX 0x%04x", id);
            //#endif
            switch (id) {
                // Restart smartbox
                // == Reboot a device from the smartbox
                case CANBUS_CANID_RESTART:
                {
                    // = What device must be restarted?
                    byte device = rx_frame.data[0];
                    switch (device)
                    {
                        // Restart smartbox
                        case CANBUS_DEVICE_SMARTBOX:
                            pepsr.restart("Requested by CANbus");
                            break;
                        default:
                            // Pass the device id so that %02x has a matching argument
                            firmwareUpdater.error("FU: asked to restart device %02x which is unknown: ignored", device);
                            break;
                    }
                    break;
                }
                // Start (over) update process
                case FIRMWAREUPDATE_CANID_START_OF_PROCESS:
                {
                    // File size arrives big-endian in the first 4 data bytes;
                    // the cast avoids shifting a promoted int into its sign bit
                    const unsigned fileSize = ((unsigned)rx_frame.data[0] << 24) | (rx_frame.data[1] << 16) | (rx_frame.data[2] << 8) | rx_frame.data[3];
                    firmwareUpdater.startProcess(fileSize);
                }
                break;
                case FIRMWAREUPDATE_CANID_FIRMWARE_IDENTIFIER:
                    firmwareUpdater.sendFirmwareBuild();
                    delay(1000);
                    firmwareUpdater.sendFirmwareIdentifier();
                    break;
                case FIRMWAREUPDATE_CANID_ADDCHUNK:
                    firmwareUpdater.addChunk(rx_frame.data, rx_frame.data_length_code);
                    break;
                default:
                    firmwareUpdater.log("FU: CAN <== RX 0x%04x | Not an ADD CHUNK", id);
                    break;
            }
        }
        else {
            firmwareUpdater.log("FU: CAN: no pending messages");
        }
        // === VERY IMPORTANT: Let watchdog know we are still there
        delay(20);
    }
}
#endif
And the task itself is fired as follows:
log("Installing hi speed RX callback");
xTaskCreatePinnedToCore(&canRx, "CAN_rx", 4096, NULL,
                        // 5 // TASK_PRIORITY_REGULAR_5
                        9,   // TASK_PRIORITY_HI_9
                        &xCanRxHandle, tskNO_AFFINITY);
As you can see, I tried to give control back to the scheduler by:
1. Lowering the task priority to 5;
2. Adding an extra delay() in the loop;
3. Reducing the can_receive() timeout from 1000 down to 20.
But nothing really works: for instance, increasing the delay() (point 2) makes the transfer more robust, but at the expense of an unmanageable total duration.
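A rough estimate shows why a fixed delay() per loop iteration gets so expensive. The sketch below assumes the 1.8MB file is 1,800,000 bytes, that each loop iteration handles exactly one 8-byte chunk, and it ignores bus and flash-write time entirely; the function names are illustrative only.

```c
#include <stdint.h>

/* Chunks needed for file_size bytes at chunk_size bytes per frame (rounded up). */
uint32_t chunks_needed(uint32_t file_size, uint32_t chunk_size)
{
    return (file_size + chunk_size - 1u) / chunk_size;
}

/* Lower bound on the transfer time, in seconds, when every chunk costs delay_ms. */
uint32_t min_transfer_seconds(uint32_t file_size, uint32_t chunk_size, uint32_t delay_ms)
{
    uint64_t total_ms = (uint64_t)chunks_needed(file_size, chunk_size) * delay_ms;
    return (uint32_t)(total_ms / 1000u);
}
```

Under those assumptions, min_transfer_seconds(1800000, 8, 20) is 4500: 225,000 chunks at 20 ms each is 75 minutes spent in delay(20) alone.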
My bet is that something is fundamentally wrong somewhere.
I need extra thought and help! Thanks in advance.

Related

GPS module on ESP32 not giving valid logs

Environments
osx
esp32
vscode
platformio
I am working on an ESP32 module with this GPS module (very similar except the one I have has "ublox" logo on it - bought about 2 years ago).
#include <Arduino.h>
#include <HardwareSerial.h>
#include <TinyGPS++.h>
TinyGPSPlus gps;
HardwareSerial SerialGPS(2);
void setup() {
    Serial.begin(115200); // RX TX
    SerialGPS.begin(9600, SERIAL_8N1, 16, 17);
}
void loop() {
    Serial.println("------------");
    Serial.print("available(): ");
    Serial.println(SerialGPS.available());
    Serial.println("------------");
    while (SerialGPS.available() > 0) {
        char c = SerialGPS.read();
        Serial.print(c);
        gps.encode(c);
    }
    Serial.println();
    if (gps.location.isValid()) {
        Serial.print("LAT=");
        Serial.println(gps.location.lat(), 6);
        Serial.print("LONG=");
        Serial.println(gps.location.lng(), 6);
        Serial.print("ALT=");
        Serial.println(gps.altitude.meters());
    } else {
        Serial.println("not valid");
    }
    delay(1000);
}
I took it outside and ran it for over 15 mins, and I see the data are still invalid.
------------
available(): 195
------------
$GPRMC,023424.00,V,,,,,,,051120,,,N*79
$GPVTG,,,,,,,,,N*30
$GPGGA,023424.00,,,,,0,00,99.99,,,,,,*65
$GPGSA,A,1,,,,,,,,,,,,,99.99,99.99,99.99*30
$GPGSV,1,1,00*79
$GPGLL,,,,,023424.00,V,N*49
not valid
------------
available(): 195
------------
$GPRMC,023425.00,V,,,,,,,051120,,,N*78
$GPVTG,,,,,,,,,N*30
$GPGGA,023425.00,,,,,0,00,99.99,,,,,,*64
$GPGSA,A,1,,,,,,,,,,,,,99.99,99.99,99.99*30
$GPGSV,1,1,00*79
$GPGLL,,,,,023425.00,V,N*48
not valid
Since I see letters coming in, I don't think TX and RX are mixed up.
I am giving it 5V (although not exactly sure if it should be 3.3v or 5v).
How can I get valid GPS data coming in from this module?
To me it looks like the GPS module is sending data properly but has no position fix yet: the sentences are well-formed, but their position fields are empty. It could still be looking for satellites. You can try printing the number of satellites in use, and simply wait longer.
Add the following lines to your program before your if statement:
Serial.println(gps.time.value()); // Raw time in HHMMSSCC format (u32)
Serial.println(gps.time.hour()); // Hour (0-23) (u8)
Serial.println(gps.time.minute()); // Minute (0-59) (u8)
Serial.println(gps.time.second()); // Second (0-59) (u8)
Serial.println(gps.satellites.value()); // Number of satellites in use (u32)
The first step should be that your GPS module gets the correct time. This should happen after a few minutes, probably. Then the number of satellites in use should go up, and you should start getting valid results once a reasonable number of satellites are found. I tend to get a reading with probably about 9 satellites.
If it is a cheaper module it might take a while, especially from a cold start.
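The log in the question tells the same story: each $GPRMC sentence carries status V (void, no fix), where a valid fix would show A. As a sketch only, here are two small helpers in plain C (hypothetical names, not part of TinyGPS++) that compute the NMEA checksum and pull out that status field:

```c
#include <stddef.h>
#include <string.h>

/* XOR of every character between '$' and '*', as NMEA 0183 defines it. */
unsigned char nmea_checksum(const char *sentence)
{
    unsigned char sum = 0;
    if (*sentence == '$') {
        sentence++;
    }
    while (*sentence != '\0' && *sentence != '*') {
        sum ^= (unsigned char)*sentence++;
    }
    return sum;
}

/* Status field of an RMC sentence: 'A' = valid fix, 'V' = void.
 * Returns 0 if the sentence is not RMC or the field is missing. */
char rmc_status(const char *sentence)
{
    if (strlen(sentence) < 6 || strncmp(sentence + 3, "RMC", 3) != 0) {
        return 0;
    }
    const char *p = strchr(sentence, ',');  /* comma after "$GPRMC" */
    if (p != NULL) {
        p = strchr(p + 1, ',');              /* comma after the time field */
    }
    if (p == NULL || p[1] == ',' || p[1] == '\0' || p[1] == '*') {
        return 0;
    }
    return p[1];
}
```

For the first sentence in the log, rmc_status returns 'V', which is consistent with TinyGPS++ reporting the location as not valid. The checksums in the log do match their sentences, so the serial link itself looks fine.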

Why doesn't a ROM built with the Distributed Memory Generator in Vivado work?

I am trying to use the Distributed Memory Generator provided by Vivado to store some data, but the simulated output of the ROM is always 'xxx'.
Here's my IP setting: I named it temp. Memory config: Depth=160, width=12, ROM.
Port config: input options: registered
Here's my testbench:
`timescale 1ns/1ps
module tb();
    reg clk;
    reg [6:0] a;
    wire [11:0] out;
    initial
    begin
        clk <= 0;
        a <= 7'b0000000;
    end
    always #5 clk <= ~clk;
    always #20 a <= a + 1;
    temp u (.a(a), .clk(clk), .spo(out));
    defparam u.inst.C_READ_MIF = "temp.mif"; // initial ROM contents
endmodule
Here's the content of temp.mif:
DEPTH = 160;
WIDTH = 12;
ADDRESS_RADIX = BIN;
DATA_RADIX = BIN;
CONTENT
BEGIN
0:100000000010;
1:100000000100;
10:000000000011;
11:100000000001;
100:100000001011;
101:000000000001;
110:000000000001;
111:000000000000;
1000:000000000000;
1001:100000000100;
1010:100000001000;
1011:100000000100;
1100:100000000001;
1101:100000000001;
1110:100000000000;
1111:000000000110;
10000:100000000011;
10001:100000000101;
10010:000000000000;
10011:100000000100;
10100:100000000011;
10101:000000000001;
10110:100000000000;
10111:000000000001;
11000:100000000100;
11001:000000000100;
11010:000000000000;
11011:000000000101;
11100:000000000101;
11101:000000000010;
11110:000000000011;
11111:100000000011;
100000:100000000010;
100001:000000000001;
100010:000000000001;
100011:000000000100;
100100:100000000000;
100101:000000000100;
100110:100000000000;
100111:100000000010;
101000:000000001000;
101001:000000000110;
101010:000000000000;
101011:100000000010;
101100:100000000101;
101101:100000000100;
101110:100000000011;
101111:100000001010;
110000:100000000000;
110001:100000000010;
110010:000000000111;
110011:100000000011;
110100:000000000001;
110101:100000000011;
110110:100000000100;
110111:000000000110;
111000:100000000000;
111001:100000000001;
111010:000000000100;
111011:000000000011;
111100:000000001010;
111101:100000001011;
111110:100000000000;
111111:000000000010;
1000000:000000000000;
1000001:000000000010;
1000010:000000000001;
1000011:000000000100;
1000100:100000000100;
1000101:100000000111;
1000110:000000000100;
1000111:100000000010;
1001000:000000000001;
1001001:100000000000;
1001010:000000000010;
1001011:100000000001;
1001100:100000001010;
1001101:000000000110;
1001110:100000000100;
1001111:000000000100;
1010000:100000000001;
1010001:000000000000;
1010010:000000000000;
1010011:000000000100;
1010100:000000000100;
1010101:100000000001;
1010110:100000000100;
1010111:000000000100;
1011000:100000000110;
1011001:000000000010;
1011010:000000000010;
1011011:000000000100;
1011100:000000001000;
1011101:100000000101;
1011110:100000000000;
1011111:000000000000;
1100000:000000000000;
1100001:100000000110;
1100010:100000000111;
1100011:100000000001;
1100100:100000000000;
1100101:100000000001;
1100110:100000001000;
1100111:000000000010;
1101000:000000000010;
1101001:000000000011;
1101010:100000000010;
1101011:000000000010;
1101100:100000000110;
1101101:100000000000;
1101110:000000000000;
1101111:000000000001;
1110000:000000000011;
1110001:100000000011;
1110010:100000000011;
1110011:100000000101;
1110100:000000001011;
1110101:100000000000;
1110110:100000000001;
1110111:000000000001;
1111000:000000000001;
1111001:100000000000;
1111010:100000000001;
1111011:000000001000;
1111100:000000000000;
1111101:100000000001;
1111110:100000000110;
1111111:100000000100;
10000000:000000000101;
10000001:000000000100;
10000010:000000000101;
10000011:100000000011;
10000100:100000000101;
10000101:000000000001;
10000110:000000000101;
10000111:000000000001;
10001000:000000000001;
10001001:000000000111;
10001010:100000000000;
10001011:100000000100;
10001100:100000001000;
10001101:000000000010;
10001110:100000000010;
10001111:100000000000;
10010000:000000000101;
10010001:000000000011;
10010010:100000000001;
10010011:000000000110;
10010100:000000000110;
10010101:100000000010;
10010110:100000000000;
10010111:000000000000;
10011000:100000000000;
10011001:100000000100;
10011010:000000000100;
10011011:000000000111;
10011100:000000000001;
10011101:100000000010;
10011110:100000000110;
10011111:100000000100;
END
Here's my simulation result
I don't know why out remains '000'. And how should I use the ROM block?
Most likely your initialization fails. Check for an error message in your simulation log file like 'Can't open...'.
The reason I suspect this is that the Xilinx file locations in Vivado are very, very nasty. They are relative to the simulation directory, which can be 4 or 5 directories deep inside the Vivado project directory. Not only that, the depth changed when I switched to a newer version, and I had to adapt every path in every old test-bench! (Thank you Xilinx!!)
This is an example of a simulation where I had to pass a filename in my testbench (tb):
.file_path ("../../../../../test_data/"),
.file_name ("testpattern_00.bin"),
The directory structure I use is:
---+--tb
   |   |
   |   +-- test_bench.sv
   |
   +--test_data
   |   |
   |   +-- testpattern_00.bin
   |
   +--vivado_directory
       |
       +--vivado_project.xpr

printf alternative when using "define _GNU_SOURCE"

After reading https://www.quora.com/How-can-I-bypass-the-OS-buffering-during-I-O-in-Linux I want to try to access data on the serial port with the O_DIRECT option. The only way I can seem to do that is by adding the _GNU_SOURCE define, but when I execute the program, nothing at all is printed on the screen.
If I remove "#define _GNU_SOURCE" and compile, then the system gives me an error on O_DIRECT.
If I remove the define and the O_DIRECT flag, then incorrect (possibly outdated) data is always read, but the data is printed on the screen.
I still want to use the O_DIRECT flag and be able to see the data, so I feel I need an alternative command to printf and friends, but I don't know how to continue.
I attached the code below:
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <time.h>
#include <unistd.h>
#include <termios.h>
#define TIMEOUT 5
int main(){
    char inb[3]; //our byte buffer
    int nread=0; //number bytes read from port
    int n; //counter
    int iosz=128; //Lets get 128 bytes
    int fd=open("/dev/ttyS0", O_NOCTTY | O_RDONLY | O_SYNC | O_DIRECT); //Open port
    tcflush(fd,TCIOFLUSH);
    for(n=0;n<iosz;n++){
        int s=time(NULL); //Start timer for 5 seconds
        while (time(NULL)-s < TIMEOUT && nread < 1){
            inb[0]='A'; //Fill buffer with bad data
            inb[1]='B';
            inb[2]='C';
            nread=read(fd,(char*)inb,1); //Read ONE byte
            tcflush(fd,TCIOFLUSH);
            if (nread < 0 || time(NULL)-s >= TIMEOUT){
                close(fd); //Exit if read error or timeout
                return -1;
            }
        }
        printf("%x:%d ",inb[0] & 0xFF,nread); //Print byte as we receive it
    }
    close(fd); //program ends so close and exit
    printf("\n"); //Finish the output line
    return 0;
}
First off, I'm no expert on this topic, just curious about it, so take this answer with a pinch of salt.
I don't know if what you're trying to do here (unless I'm misreading it, bypassing the kernel and reading directly from the port to userspace) was ever possible. You can find some examples, like this one, but I could not find anything properly documented. With recent kernels your code should be getting an error, but you're not catching it.
If you add these lines after declaring your port (they also need #include <errno.h> and #include <string.h>):
...
int fd=open("/dev/ttyS0", O_NOCTTY | O_RDONLY | O_SYNC | O_DIRECT );
if (fd == -1) {
    fprintf(stderr, "Error %d opening SERIALPORT : %s\n", errno, strerror(errno));
    return 1;
}
tcflush(fd,TCIOFLUSH);
....
When you try to run you'll get: Error 22 opening SERIALPORT : Invalid argument
In my humble and limited understanding, you should be able to get the same effect by switching the port to raw mode through termios. Something like this should do:
struct termios t;
tcgetattr(fd, &t); /* get current port state */
cfmakeraw(&t); /* set port state to raw */
tcsetattr(fd, TCSAFLUSH, &t); /* set updated port state */
There are many good sources for termios, but the only place I could find that also refers to O_DIRECT (for files) is this one.
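To make the raw-mode suggestion a bit more concrete, here is a sketch of a helper that checks every call and lets read() time out on its own through VMIN/VTIME instead of polling time(). The function name is made up, the idea of passing 50 (5 seconds, mirroring TIMEOUT in the question) is my assumption, and I have not tried it on real serial hardware.

```c
#include <termios.h>
#include <unistd.h>

/* Put an already-open terminal fd into raw mode with a read timeout.
 * timeout_ds is in tenths of a second (VTIME units).
 * Returns 0 on success, -1 on error with errno set by the failing call. */
int set_raw_with_timeout(int fd, unsigned char timeout_ds)
{
    struct termios t;

    if (tcgetattr(fd, &t) == -1) {   /* fails if fd is not a terminal */
        return -1;
    }
    cfmakeraw(&t);                   /* BSD/glibc extension: no echo, no line editing */
    t.c_cc[VMIN] = 0;                /* read() may return 0 bytes ...          */
    t.c_cc[VTIME] = timeout_ds;      /* ... once this many deciseconds elapse  */
    if (tcsetattr(fd, TCSAFLUSH, &t) == -1) {
        return -1;
    }
    return 0;
}
```

Called as set_raw_with_timeout(fd, 50) right after a successful open() without O_DIRECT, each read() should then return either a byte or 0 after about 5 seconds, so the loop no longer needs its own timer.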

I am getting this watchdog Timer error while doing a simple task in arduino with esp32

I have seen a lot of forums where this problem is discussed, but nothing seems to work. I am working with an ESP32 and it was all fine until, out of nowhere, this watchdog timer error came up. I am new to this, so I can't really fix it.
I copied a very simple chunk of my code into a new file, but the watchdog timer error appears there too. I don't know what the issue is.
It says IDLE0 is not resetting the watchdog timer and the "wifi" task is running on CPU 0.
ERROR LOG
E (42418) task_wdt: Task watchdog got triggered. The following tasks did not reset the watchdog in time:
E (42418) task_wdt: - IDLE0 (CPU 0)
E (42418) task_wdt: Tasks currently running:
E (42418) task_wdt: CPU 0: wifi
E (42418) task_wdt: CPU 1: IDLE1
E (42418) task_wdt: Aborting.
abort() was called at PC 0x400d96f7 on core 0
Backtrace: 0x4008c470:0x3ffbe270 0x4008c6a1:0x3ffbe290 0x400d96f7:0x3ffbe2b0 0x400815dd:0x3ffbe2d0 0x40136087:0x00000000
Rebooting...
ets Jun 8 2016 00:22:57
rst:0xc (SW_CPU_RESET),boot:0x17 (SPI_FAST_FLASH_BOOT)
configsip: 0, SPIWP:0xee
clk_drv:0x00,q_drv:0x00,d_drv:0x00,cs0_drv:0x00,hd_drv:0x00,wp_drv:0x00
mode:DIO, clock div:1
load:0x3fff0018,len:4
load:0x3fff001c,len:1100
load:0x40078000,len:10088
load:0x40080400,len:6380
entry 0x400806a4
I have tried running my task on core 1 too, but wifi automatically runs on core 0, and I get the same error.
I have also tried adding delays, but nothing works.
#include <WiFi.h>
const char *wssid = "PTCL-TB";
const char *wpassword = "pakistan";
bool connected2Wifi = false;
void setup() {
    // put your setup code here, to run once:
    Serial.begin(115200);
    delay(10);
    Serial.println('\n');
    WiFi.begin(wssid, wpassword); // Connect to the network
    Serial.print("Connecting to ");
    Serial.print(wssid);
    while (WiFi.status() != WL_CONNECTED) { // Wait for the Wi-Fi to connect
        delay(500);
        Serial.print('.');
    }
    Serial.println('\n');
    Serial.println("Connection established!");
    Serial.print("IP address:\t");
    Serial.println(WiFi.localIP());
}
void loop() {
    // put your main code here, to run repeatedly:
}
I want to connect to WiFi in this task. It's very simple, and I copied it from a reliable source whose code was running, but the error persists.
Go to Tools -> CPU Frequency and set it to 160, 80 or 240 MHz (the ones that support WiFi/BT).

Killing a Haskell binary

If I press Ctrl+C, this throws an exception (always in thread 0?). You can catch this if you want - or, more likely, run some cleanup and then rethrow it. But the usual result is to bring the program to a halt, one way or another.
Now suppose I use the Unix kill command. As I understand it, kill basically sends a (configurable) Unix signal to the specified process.
How does the Haskell RTS respond to this? Is it documented somewhere? I would imagine that sending SIGTERM would have the same effect as pressing Ctrl+C, but I don't know that for a fact...
(And, of course, you can use kill to send signals that have nothing to do with killing at all. Again, I would imagine that the RTS would ignore, say, SIGHUP or SIGPWR, but I don't know for sure.)
Googling "haskell catch sigterm" led me to System.Posix.Signals of the unix package, which has a rather nice looking system for catching and handling these signals. Just scroll down to the "Handling Signals" section.
EDIT: A trivial example:
import System.Posix.Signals
import Control.Concurrent (threadDelay)
import Control.Concurrent.MVar
termHandler :: MVar () -> Handler
termHandler v = CatchOnce $ do
    putStrLn "Caught SIGTERM"
    putMVar v ()
loop :: MVar () -> IO ()
loop v = do
    putStrLn "Still running"
    threadDelay 1000000
    val <- tryTakeMVar v
    case val of
        Just _ -> putStrLn "Quitting" >> return ()
        Nothing -> loop v
main = do
    v <- newEmptyMVar
    installHandler sigTERM (termHandler v) Nothing
    loop v
Notice that I had to use an MVar to inform loop that it was time to quit. I tried using exitSuccess from System.Exit, but since termHandler executes in a thread that isn't the main one, it can't cause the program to exit. There might be an easier way to do it, but I've never used this module before so I don't know of one. I tested this on Ubuntu 12.10.
Searching for "signal" in the ghc source code on github revealed the initDefaultHandlers function:
void
initDefaultHandlers(void)
{
    struct sigaction action,oact;
    // install the SIGINT handler
    action.sa_handler = shutdown_handler;
    sigemptyset(&action.sa_mask);
    action.sa_flags = 0;
    if (sigaction(SIGINT, &action, &oact) != 0) {
        sysErrorBelch("warning: failed to install SIGINT handler");
    }
#if defined(HAVE_SIGINTERRUPT)
    siginterrupt(SIGINT, 1); // isn't this the default? --SDM
#endif
    // install the SIGFPE handler
    // In addition to handling SIGINT, also handle SIGFPE by ignoring it.
    // Apparently IEEE requires floating-point exceptions to be ignored by
    // default, but alpha-dec-osf3 doesn't seem to do so.
    // Commented out by SDM 2/7/2002: this causes an infinite loop on
    // some architectures when an integer division by zero occurs: we
    // don't recover from the floating point exception, and the
    // program just generates another one immediately.
#if 0
    action.sa_handler = SIG_IGN;
    sigemptyset(&action.sa_mask);
    action.sa_flags = 0;
    if (sigaction(SIGFPE, &action, &oact) != 0) {
        sysErrorBelch("warning: failed to install SIGFPE handler");
    }
#endif
#ifdef alpha_HOST_ARCH
    ieee_set_fp_control(0);
#endif
    // ignore SIGPIPE; see #1619
    // actually, we use an empty signal handler rather than SIG_IGN,
    // so that SIGPIPE gets reset to its default behaviour on exec.
    action.sa_handler = empty_handler;
    sigemptyset(&action.sa_mask);
    action.sa_flags = 0;
    if (sigaction(SIGPIPE, &action, &oact) != 0) {
        sysErrorBelch("warning: failed to install SIGPIPE handler");
    }
    set_sigtstp_action(rtsTrue);
}
From that, you can see that GHC installs at least SIGINT and SIGPIPE handlers. I don't know if there are any other signal handlers hidden in the source code.

Resources