Unexpected close of tokio mpsc channel - asynchronous

I am trying to use tokio::mpsc::channel to send data from a synchronous function to tokio thread to handle it asynchronously.
Since tokio::mpsc::channel is an async function, I spawn a runtime from the sync function to create rx and tx and return tx after moving rx to the newly spawned task in it.
However, it does not work as I expected and I have conducted some debugging and found out the followings.
The channel is not closed right after the creation.
The channel reported itself to be closed right after the rx moved into a separate task.
Right at the moment the channel reported itself to be closed, the spawned thread, which holds the moved rx seems does not even start. Hence couldn't be dropped, I guess.
https://docs.rs/tokio/latest/tokio/sync/mpsc/struct.Sender.html.
The documentation says that the tx report itself is closed only when rx handle gets dropped or explicitly calls the close function. It seems neither in this case.
fn websocket(base: &Url, id: &str) -> Result<(Sender<String>, Stream<String>), BoxErr> {
...
let ws_reciver_sink = Sink::new();
let ws_receiver_stream = ws_reciver_sink.stream();
let (ws_sender_tx, mut ws_sender_rx) = mpsc::channel(100);
debug!("ws_sender_channel is closed 1: {}", ws_sender_tx.is_closed());
runtime::Builder::new_current_thread()
.enable_all()
.build()?
.block_on(async move {
let (ws_stream, _res) = connect_async(url).await?;
let (mut ws_sender_inner, mut ws_receiver_inner) = ws_stream.split();
debug!("spawning ws_recv_task");
let ws_recv_task = tokio::spawn(async move {
while let Some(Ok(Message::Text(msg))) = ws_receiver_inner.next().await {
ws_reciver_sink.send(msg);
}
});
debug!("spawning ws_send_task");
let ws_send_task = tokio::spawn(async move {
debug!("moving ws_sender_rx handle");
while let Some(msg) = ws_sender_rx.recv().await {
if ws_sender_inner.send(Message::Text(msg)).await.is_err() {
ws_recv_task.abort();
}
debug!("dropping ws_sender_rx handle");
}
});
Ok::<(), BoxErr>(())
})?;
debug!("ws_sender_channel is closed 2: {}", ws_sender_tx.is_closed());
Ok((ws_sender_tx, ws_receiver_stream))
}
Output
[2022-12-13T11:58:45Z DEBUG reqwest::connect] starting new connection: http://127.0.0.1:8000/
[2022-12-13T11:58:45Z DEBUG iot] connecting to websocket ws://127.0.0.1:8000/node/test-a
[2022-12-13T11:58:45Z DEBUG iot] ws_sender_channel is closed 1: false
[2022-12-13T11:58:45Z DEBUG tungstenite::handshake::client] Client handshake done.
[2022-12-13T11:58:45Z DEBUG iot] spawning ws_recv_task
[2022-12-13T11:58:45Z DEBUG iot] spawning ws_send_task
[2022-12-13T11:58:45Z DEBUG iot] ws_sender_channel is closed 2: true
[2022-12-13T11:58:45Z DEBUG reqwest::connect] starting new connection: http://127.0.0.1:8000/
[2022-12-13T11:58:45Z DEBUG iot] ws_sender_channel is closed: true
thread 'tests::test_connect' panicked at 'called `Result::unwrap()` on an `Err` value: SendError("Some nights")', iot/src/lib.rs:199:50
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
What am I missing? Please enlighten me.

Related

Firmware update via CANbus and FreeRTOS watch dog triggered issue

I develop a firmware running a ESP32-based custom PCB in a connected battery (hereafter the BATTERY).
Th battery is capable of CANbus connectivity, and I want to take advantage of it to upgrade the firmware.
For this purpose, I took a second PCB (hereafter the UPDATER) which I use to transfer the firmware bin file to the BATTERY.
On the BATTERY, I have FreeRTOS task with a relatively high priority of 9: its responsibility is to wait for the UPDATER incoming message and respond accordingly.
On the other hand, the UPDATER runs a FreeRTOS task that is used to send new firmware bin file chunks of data to the BATTERY. The task also runs at priority 9.
Streamlined exchange protocol is as follows:
UPDATER BATTERY
Start w/ i=0 Start w/ chunkId = 0
| |
loop: |
| |
Take ith 8-byte chunk |
from bin file and |
transmit to battery |
| |
o --------------------------->o
| |
| send current chunkId
| back to UPDATER
| and chunkId++
| |
o< ---------------------------o
|
Read received chunkId
and compare to I
If i != chunkId
then error and stop
Otherwise i++
and 'loop:' till the end
of file is reached
|
Success
The typical firmware bin file size is 1.8MB.
The issue I have, is on the BATTERY, and it is related to the canRx() task for(;;) loop. After a many loops and chunks exchanged, I end up with a watch dog triggered:
E (548436) task_wdt: Task watchdog got triggered. The following tasks did not reset the watchdog in time:E (548436) task_wdt:  - IDLE0 (CPU 0)
E (548436) task_wdt: Tasks currently running:
E (548436) task_wdt: CPU 0: btController
E (548436) task_wdt: CPU 1: IDLE1
E (548436) task_wdt: Aborting.
abort() was called at PC 0x401d105c on core 0ELF file SHA256: 0000000000000000Backtrace: 0x40095ac4:0x3ffbffd0 0x40095d3d:0x3ffbfff0 0x401d105c:0x3ffc0010 0x400913f1:0x3ffc0030 0x401e0289:0x3ffd37d0 0x401e085d:0x3ffd37f0 0x400979f2:0x3ffd3820
My task runs on CORE_1 and the code looks like so:
void canRx(void *arg) {
can_message_t rx_frame;
firmwareUpdater.log("FU: Installing high performance canRx task handler");
for (;;) {
if (can_receive(&rx_frame, pdMS_TO_TICKS(20)) == ESP_OK) { // 1000
// Receive
const unsigned id = rx_frame.identifier;
//#if LOG_LEVEL >= LOG_LEVEL_DEBUG
firmwareUpdater.log("FU: CAN <== RX 0x%04x", id);
//#endif
switch (id) {
// Restart smartbox
// == Reboot a device from the smartbox
case CANBUS_CANID_RESTART:
{
// = What deice must be restarted?
byte device = rx_frame.data[0];
switch (device)
{
// Restart smartbox
case CANBUS_DEVICE_SMARTBOX:
pepsr.restart("Requested by CANbus");
break;
default:
firmwareUpdater.error("FU: asked to restart device %02x which is unknown: ignored");
break;
}
break;
}
// Start (over) update process
case FIRMWAREUPDATE_CANID_START_OF_PROCESS:
{
const unsigned fileSize = (rx_frame.data[0] << 24) | (rx_frame.data[1] << 16) | (rx_frame.data[2] << 8) | rx_frame.data[3];
firmwareUpdater.startProcess(fileSize);
}
break;
case FIRMWAREUPDATE_CANID_FIRMWARE_IDENTIFIER:
firmwareUpdater.sendFirmwareBuild();
delay(1000);
firmwareUpdater.sendFirmwareIdentifier();
break;
case FIRMWAREUPDATE_CANID_ADDCHUNK:
firmwareUpdater.addChunk(rx_frame.data, rx_frame.data_length_code);
break;
default:
firmwareUpdater.log("FU: CAN <== RX 0x%04x | Not an ADD CHUNK", id);
break;
}
}
else {
firmwareUpdater.log("FU: CAN: no pending messages");
}
// === VERY IMPORTANT: Let watchdog know we are still there
delay(20);
}
}
#endif
And the task itself is fired as follows:
log("Installing hi speed RX callback");
xTaskCreatePinnedToCore(&canRx, "CAN_rx", 4096, NULL,
//5 //TASK_PRIORITY_REGULAR_5
9 // TASK_PRIORITY_HI_9
, &xCanRxHandle, tskNO_AFFINITY);
As you can see, I tried to transfer control to the scheduler by playing with:
The task priority by taking it down to 5;
Adding extra delay() in the loop;
Reducing the timeout value of the `can_receive() from 1000 down to 20.
But nothing really works: for instance, by increasing the delay()(point 2), I manage to make the transfer more robust but at the expense of an unmanageable total duration.
My bet is that something is fundamentally wrong somewhere.
I need extra thought and help! Thanks in advance.

(airflow) emr steps operator -> emr steps sensor; sensor failed -> trigger before operator

I want to handle failovers.
But, sensor failed -> retries only sensor self.
But, I want to trigger before operator.
This is flow chart.
a -> a_sensor (failed) -> a (retry) -> a_sensor -> (done)
Can I do this?
I recommend waiting the EMR job in the operator itself, even if this keeps the task running and occupying the worker slot, but it doesn't consume much resources, and you can simply manage the timeout, cleanup and the retry strategy:
class EmrOperator(BaseOperator):
...
def execute():
run_job():
wait_job()
def wait_job():
while not (is_finished()):
sleep(10s)
def on_kill():
cleanup()
And you can use the official operator EmrAddStepsOperator which already supports this.
And if you want to implement what you have mentioned in the question, Airflow doesn't support the retry for a group of tasks yet, but you can achieve this using callbacks:
a = EmrOperator(..., retries=0)
a_sensor = EmrSensor(, retries=0, on_failure_callback=emr_a_callback)
def emr_a_callback(ti, dag_run,):
max_retries = 3
retry_num = ti.xcom_pull(dag_ids=ti.task_id, key="retry_num")
if retry > max_retries:
retrun # do nothing
task_a = dag_run.get_task_instance("<task a id>")
task_a.state = None # pass a state to None
ti.state = None # pass the sensor state to None

cannot move out of a captured variable in an async `Fn` closure

Here is my code. In this program, I want to create a simple websocket server. When user sends a request to the ws://{url}/, the browser will establish a websocket connection with the server.
use std::{collections::HashMap, sync::Arc};
use async_std::{prelude::*, sync::Mutex};
use tide_websockets::WebSocket;
use uuid::Uuid;
#[async_std::main]
async fn main() {
let connections = Arc::new(Mutex::new(HashMap::new()));
let mut app = tide::new();
app.at("/").get(WebSocket::new(move |_, mut stream| async move {
let uuid = Uuid::new_v4();
// Add the connection to clients when opening a new connection
connections.lock().await.insert(uuid, stream.clone());
// Waiting for the connection to be closed
while let Some(Ok(_)) = stream.next().await {}
// Remove the connection from clients when it is closed
connections.lock().await.remove(&uuid);
Ok(())
}));
// app.bind(url).await
}
When I tried to compile this program, the rustc said:
error[E0507]: cannot move out of `connections`, a captured variable in an `Fn` closure
--> src/main.rs:11:57
|
9 | let connections = Arc::new(Mutex::new(HashMap::new()));
| ----------- captured outer variable
10 | let mut app = tide::new();
11 | app.at("/").get(WebSocket::new(move |_, mut stream| async move {
| ____________________________________--------------------_^
| | |
| | captured by this `Fn` closure
12 | | let uuid = Uuid::new_v4();
13 | |
14 | | // Add the connection to clients when opening a new connection
15 | | connections.lock().await.insert(uuid, stream.clone());
| | -----------
| | |
| | variable moved due to use in generator
| | move occurs because `connections` has type `Arc<async_std::sync::Mutex<HashMap<Uuid, WebSocketConnection>>>`, which does not implement the `Copy` trait
... |
23 | | Ok(())
24 | | }));
| |_____^ move out of `connections` occurs here
For more information about this error, try `rustc --explain E0507`.
error: could not compile `mre` due to previous error
And this is the definition of the Websocket::new method (no sure if it's useful):
impl<S, H, Fut> WebSocket<S, H>
where
S: Send + Sync + Clone + 'static,
H: Fn(Request<S>, WebSocketConnection) -> Fut + Sync + Send + 'static,
Fut: Future<Output = Result<()>> + Send + 'static,
{
/// Build a new WebSocket with a handler function that
pub fn new(handler: H) -> Self {
Self {
handler: Arc::new(handler),
ghostly_apparition: PhantomData,
protocols: Default::default(),
}
}
// ...
}
I tried searching this problem before posting this question. Most of the answers are either irrelevant, or need to modify the source code of the method (Websocket::new method here). But this method is not written by me but is from a third-party crate. Is there still any way to resolve this problem?
The argument of WebSocket::new() has to be an Fn closure, meaning, it must be callable repeatedly.
In your code, however, it internally uses the connections variable inside of an async move, meaning it moves the variable into the async block. This can for obvious reasons only be done once.
It's easy to fix, though. Instead of moving the entire connections variable in, you need to create a new Arc reference of the connections variable and move that one into the async move. So every invocation gets its own copy of it, making it compatible with Fn.
Here is a compiling version:
use std::{collections::HashMap, sync::Arc};
use async_std::{prelude::*, sync::Mutex};
use tide_websockets::WebSocket;
use uuid::Uuid;
#[async_std::main]
async fn main() {
let connections = Arc::new(Mutex::new(HashMap::new()));
let mut app = tide::new();
app.at("/").get(WebSocket::new(move |_, mut stream| {
let connections = Arc::clone(&connections);
async move {
let uuid = Uuid::new_v4();
// Add the connection to clients when opening a new connection
connections.lock().await.insert(uuid, stream.clone());
// Waiting for the connection to be closed
while let Some(Ok(_)) = stream.next().await {}
// Remove the connection from clients when it is closed
connections.lock().await.remove(&uuid);
Ok(())
}
}));
// app.bind(url).await
}

Send SIGINT to a process by sending ctrl-c to stdin

I'm looking for a way to mimick a terminal for some automated testing: i.e. start a process and then interact with it via sending data to stdin and reading from stdout. E.g. sending some lines of input to stdin including ctrl-c and ctrl-\ which should result in sending signals to the process.
Using std::process::Commannd I'm able to send input to e.g. cat and I'm also seeing its output on stdout, but sending ctrl-c (as I understand that is 3) does not cause SIGINT sent to the shell. E.g. this program should terminate:
use std::process::{Command, Stdio};
use std::io::Write;
fn main() {
let mut child = Command::new("sh")
.arg("-c").arg("-i").arg("cat")
.stdin(Stdio::piped())
.spawn().unwrap();
let mut stdin = child.stdin.take().unwrap();
stdin.write(&[3]).expect("cannot send ctrl-c");
child.wait();
}
I suspect the issue is that sending ctrl-c needs the some tty and via sh -i it's only in "interactive mode".
Do I need to go full fledged and use e.g. termion or ncurses?
Update: I confused shell and terminal in the original question. I cleared this up now. Also I mentioned ssh which should have been sh.
The simplest way is to directly send the SIGINT signal to the child process. This can be done easily using nix's signal::kill function:
// add `nix = "0.15.0"` to your Cargo.toml
use std::process::{Command, Stdio};
use std::io::Write;
fn main() {
// spawn child process
let mut child = Command::new("cat")
.stdin(Stdio::piped())
.spawn().unwrap();
// send "echo\n" to child's stdin
let mut stdin = child.stdin.take().unwrap();
writeln!(stdin, "echo");
// sleep a bit so that child can process the input
std::thread::sleep(std::time::Duration::from_millis(500));
// send SIGINT to the child
nix::sys::signal::kill(
nix::unistd::Pid::from_raw(child.id() as i32),
nix::sys::signal::Signal::SIGINT
).expect("cannot send ctrl-c");
// wait for child to terminate
child.wait().unwrap();
}
You should be able to send all kinds of signals using this method. For more advanced "interactivity" (e.g. child programs like vi that query terminal size) you'd need to create a pseudoterminal like #hansaplast did in his solution.
After a lot of research I figured out it's not too much work to do the pty fork myself. There's pty-rs, but it has bugs and seems unmaintained.
The following code needs pty module of nix which is not yet on crates.io, so Cargo.toml needs this for now:
[dependencies]
nix = {git = "https://github.com/nix-rust/nix.git"}
The following code runs cat in a tty and then writes/reads from it and sends Ctrl-C (3):
extern crate nix;
use std::path::Path;
use nix::pty::{posix_openpt, grantpt, unlockpt, ptsname};
use nix::fcntl::{O_RDWR, open};
use nix::sys::stat;
use nix::unistd::{fork, ForkResult, setsid, dup2};
use nix::libc::{STDIN_FILENO, STDOUT_FILENO, STDERR_FILENO};
use std::os::unix::io::{AsRawFd, FromRawFd};
use std::io::prelude::*;
use std::io::{BufReader, LineWriter};
fn run() -> std::io::Result<()> {
// Open a new PTY master
let master_fd = posix_openpt(O_RDWR)?;
// Allow a slave to be generated for it
grantpt(&master_fd)?;
unlockpt(&master_fd)?;
// Get the name of the slave
let slave_name = ptsname(&master_fd)?;
match fork() {
Ok(ForkResult::Child) => {
setsid()?; // create new session with child as session leader
let slave_fd = open(Path::new(&slave_name), O_RDWR, stat::Mode::empty())?;
// assign stdin, stdout, stderr to the tty, just like a terminal does
dup2(slave_fd, STDIN_FILENO)?;
dup2(slave_fd, STDOUT_FILENO)?;
dup2(slave_fd, STDERR_FILENO)?;
std::process::Command::new("cat").status()?;
}
Ok(ForkResult::Parent { child: _ }) => {
let f = unsafe { std::fs::File::from_raw_fd(master_fd.as_raw_fd()) };
let mut reader = BufReader::new(&f);
let mut writer = LineWriter::new(&f);
writer.write_all(b"hello world\n")?;
let mut s = String::new();
reader.read_line(&mut s)?; // what we just wrote in
reader.read_line(&mut s)?; // what cat wrote out
writer.write(&[3])?; // send ^C
writer.flush()?;
let mut buf = [0; 2]; // needs bytewise read as ^C has no newline
reader.read(&mut buf)?;
s += &String::from_utf8_lossy(&buf).to_string();
println!("{}", s);
println!("cat exit code: {:?}", wait::wait()?); // make sure cat really exited
}
Err(_) => println!("error"),
}
Ok(())
}
fn main() {
run().expect("could not execute command");
}
Output:
hello world
hello world
^C
cat exit code: Signaled(2906, SIGINT, false)
Try adding -t option TWICE to force pseudo-tty allocation. I.e.
klar (16:14) ~>echo foo | ssh user#host.ssh.com tty
not a tty
klar (16:14) ~>echo foo | ssh -t -t user#host.ssh.com tty
/dev/pts/0
When you have a pseudo-tty, I think it should convert that to SIGINT as you wanted to do.
In your simple example, you could also just close stdin after the write, in which case the server should exit. For this particular case it would be more elegant and probably more reliable.
Solution without using a crate
Now that you are spawning a command in Rust, you might as well spawn another to send SIGINT to it. That command is kill.
So, you can do this:
use std::process::{Command, Stdio};
use std::io::{Result, Write};
fn main() -> Result<()> {
let mut child = Command::new("sh")
.arg("-c").arg("-i").arg("cat")
.stdin(Stdio::piped())
.spawn()?;
let mut stdin = child.stdin.take().unwrap();
let mut kill = Command::new("kill")
.arg(child.id().to_string())
.spawn()?;
kill.wait()
}

Killing a Haskell binary

If I press Ctrl+C, this throws an exception (always in thread 0?). You can catch this if you want - or, more likely, run some cleanup and then rethrow it. But the usual result is to bring the program to a halt, one way or another.
Now suppose I use the Unix kill command. As I understand it, kill basically sends a (configurable) Unix signal to the specified process.
How does the Haskell RTS respond to this? Is it documented somewhere? I would imagine that sending SIGTERM would have the same effect as pressing Ctrl+C, but I don't know that for a fact...
(And, of course, you can use kill to send signals that have nothing to do with killing at all. Again, I would imagine that the RTS would ignore, say, SIGHUP or SIGPWR, but I don't know for sure.)
Googling "haskell catch sigterm" led me to System.Posix.Signals of the unix package, which has a rather nice looking system for catching and handling these signals. Just scroll down to the "Handling Signals" section.
EDIT: A trivial example:
import System.Posix.Signals
import Control.Concurrent (threadDelay)
import Control.Concurrent.MVar
termHandler :: MVar () -> Handler
termHandler v = CatchOnce $ do
putStrLn "Caught SIGTERM"
putMVar v ()
loop :: MVar () -> IO ()
loop v = do
putStrLn "Still running"
threadDelay 1000000
val <- tryTakeMVar v
case val of
Just _ -> putStrLn "Quitting" >> return ()
Nothing -> loop v
main = do
v <- newEmptyMVar
installHandler sigTERM (termHandler v) Nothing
loop v
Notice that I had to use an MVar to inform loop that it was time to quit. I tried using exitSuccess from System.Exit, but since termHandler executes in a thread that isn't the main one, it can't cause the program to exit. There might be an easier way to do it, but I've never used this module before so I don't know of one. I tested this on Ubuntu 12.10.
Searching for "signal" in the ghc source code on github revealed the installDefaultSignals function:
void
initDefaultHandlers(void)
{
struct sigaction action,oact;
// install the SIGINT handler
action.sa_handler = shutdown_handler;
sigemptyset(&action.sa_mask);
action.sa_flags = 0;
if (sigaction(SIGINT, &action, &oact) != 0) {
sysErrorBelch("warning: failed to install SIGINT handler");
}
#if defined(HAVE_SIGINTERRUPT)
siginterrupt(SIGINT, 1); // isn't this the default? --SDM
#endif
// install the SIGFPE handler
// In addition to handling SIGINT, also handle SIGFPE by ignoring it.
// Apparently IEEE requires floating-point exceptions to be ignored by
// default, but alpha-dec-osf3 doesn't seem to do so.
// Commented out by SDM 2/7/2002: this causes an infinite loop on
// some architectures when an integer division by zero occurs: we
// don't recover from the floating point exception, and the
// program just generates another one immediately.
#if 0
action.sa_handler = SIG_IGN;
sigemptyset(&action.sa_mask);
action.sa_flags = 0;
if (sigaction(SIGFPE, &action, &oact) != 0) {
sysErrorBelch("warning: failed to install SIGFPE handler");
}
#endif
#ifdef alpha_HOST_ARCH
ieee_set_fp_control(0);
#endif
// ignore SIGPIPE; see #1619
// actually, we use an empty signal handler rather than SIG_IGN,
// so that SIGPIPE gets reset to its default behaviour on exec.
action.sa_handler = empty_handler;
sigemptyset(&action.sa_mask);
action.sa_flags = 0;
if (sigaction(SIGPIPE, &action, &oact) != 0) {
sysErrorBelch("warning: failed to install SIGPIPE handler");
}
set_sigtstp_action(rtsTrue);
}
From that, you can see that GHC installs at least SIGINT and SIGPIPE handlers. I don't know if there are any other signal handlers hidden in the source code.

Resources