Asynchronous I/O with io_uring and pipe - asynchronous

I'm trying to fork() the io_uring simulation by creating parent and child process, each process have its own rings, parent for writing to the pipe, child for reading from the pipe, however I stumbled upon some error when the reader side read an empty pipe and instead of waiting for content in the pipe (if therers no error the simulation indeed run faaster), it update the CQE Async task failed. To overcome the problem, I didn't set the O_NONBLOCK in the pipe2(), this makes the reader to wait until theres content in the pipes before unblocking. But on the downside, it will call fcntl() with call syscall and the performance of the simulation is slower than the original simulation file. I wanted to know how to set reader side of pipe to be ready without syscalls
*Edit
I wonder if I can incoporate io_uring (pipes as file descriptor) with eventfd so instead of using blocking pipes, once the write to pipe is completed, it will notify the eventfd, so the reader SQE will wait until eventfd is ready before it read from pipe to prevent reading error

Related

Julia - Is there a way to prevent a block of code from being immediately interrupted by Ctrl+C/SIGINT in Julia?

There is this answer for python from an old thread which I used in the past to great effect (see first answer here How to prevent a block of code from being interrupted by KeyboardInterrupt in Python?). I'd like to know if it is possible to do something similar in Julia.
That's pretty much it, but here's some background.
I'm developing a Markov Chain Monte Carlo algorithm in Julia and I need to insure that the state of the chain is always valid, especially following a Ctrl+C/SIGINT. This is particularly important during development in the REPL where I want to be able to stop the chain but then restart it where it was left off without having to start from scratch and wait for the burn-in period all over again.
As it stand it is almost always the case that the chain will be in an invalid state following a SIGINT.
In other words I want to have blocks of code that are uninterruptibles, namely critical blocks within which a SIGINT is deferred until the block has terminated.
Ctrl+C/SIGINT causes the InterruptException, so you can place your code within a try block and catch the InterruptException.
try
# ... your algorithm here ...
catch e
if e isa InterruptException
save_your_chain_to_a_valid_state()
then_exit()
else
rethrow()
end
end
If you're running the code outside of the REPL, you'll first have to call Base.exit_on_sigint(false) before the try block.
disable_sigint prevents a block of code to be interrupted immediately, whereas catching InterruptException is a mean to handle the interrupt:
try
disable_sigint() do
for i in 1:5
sleep(1)
print(".")
end
end
println("\nnow it is safe to interrupt immediately")
while true
sleep(1)
print("*")
end
catch InterruptException
println("interrupt catched")
end
If running a script remember how to catch Ctrl-C:
julia -e 'include(pop!(ARGS))' script.jl
disable_sigint is able to mask the InterruptException generated by a SIGINT signal (for example a signal coming from a
kill command on unix). For example if the running code is notified throwing directly an interruptException() it is interrupted as soon as possible ignoring the disable_sigint directive.
This may be the case of a jupyter environment where communication between client and kernel processes happens through a IPC channel (ZeroMQ transport channel) or the case of vscode where a "supervisor" process manage a child julia execution context.
Chances are that the kernel interrupt command delivered from jupiter frontend to jupyter kernel backend that "supervises" julia code is not delivered to child process using a SIGINT OS-signal.

In Trio, how do you write data to a socket without waiting?

In Trio, if you want to write some data to a TCP socket then the obvious choice is send_all:
my_stream = await trio.open_tcp_stream("localhost", 1234)
await my_stream.send_all(b"some data")
Note that this both sends that data over the socket and waits for it to be written. But what if you just want to queue up the data to be sent, but not wait for it to be written (at least, you don't want to wait in the same coroutine)?
In asyncio this is straightforward because the two parts are separate functions: write() and drain(). For example:
writer.write(data)
await writer.drain()
So of course if you just want to write the data and not wait for it you can just call write() without awaiting drain(). Is there equivalent functionality in Trio? I know this two-function design is controversial because it makes it hard to properly apply backpressure, but in my application I need them separated.
For now I've worked around it by creating a dedicated writer coroutine for each connection and having a memory channel to send data to that coroutine, but it's quite a lot of faff compared to choosing between calling one function or two, and it seems a bit wasteful (presumably there's still a send buffer under the hood, and my memory channel is like a buffer on top of that buffer).
I posted this on the Trio chat and Nathaniel J. Smith, the creator of Trio, replied with this:
Trio doesn't maintain a buffer "under the hood", no. There's just the kernel's send buffer, but the kernel will apply backpressure whether you want it to or not, so that doesn't help you.
Using a background writer task + an unbounded memory channel is basically what asyncio does for you implicitly.
The other option, if you're putting together a message in multiple pieces and then want to send it when you're done would be to append them into a bytearray and then call send_all once at the end, at the same place where you'd call drain in asyncio
(but obviously that only works if you're calling drain after every logical message; if you're just calling write and letting asyncio drain it in the background then that doesn't help)
So the question was based on a misconception: I wanted to write into Trio's hidden send buffer, but no such thing exists! Using a separate coroutine that waits on a stream and calls send_all() makes more sense than I had thought.
I ended up using a hybrid of the two ideas (using separate coroutine with a memory channel vs using bytearray): save the data to a bytearray, then use a condition variable ParkingLot to signal to the other coroutine that it's ready to be written. That lets me coalesce writes, and also manually check if the buffer's getting too large.

printf messages show up after about 10 of them are sent [duplicate]

Why does printf not flush after the call unless a newline is in the format string? Is this POSIX behavior? How might I have printf immediately flush every time?
The stdout stream is line buffered by default, so will only display what's in the buffer after it reaches a newline (or when it's told to). You have a few options to print immediately:
Print to stderrinstead using fprintf (stderr is unbuffered by default):
fprintf(stderr, "I will be printed immediately");
Flush stdout whenever you need it to using fflush:
printf("Buffered, will be flushed");
fflush(stdout); // Will now print everything in the stdout buffer
Disable buffering on stdout by using setbuf:
setbuf(stdout, NULL);
Or use the more flexible setvbuf:
setvbuf(stdout, NULL, _IONBF, 0);
No, it's not POSIX behaviour, it's ISO behaviour (well, it is POSIX behaviour but only insofar as they conform to ISO).
Standard output is line buffered if it can be detected to refer to an interactive device, otherwise it's fully buffered. So there are situations where printf won't flush, even if it gets a newline to send out, such as:
myprog >myfile.txt
This makes sense for efficiency since, if you're interacting with a user, they probably want to see every line. If you're sending the output to a file, it's most likely that there's not a user at the other end (though not impossible, they could be tailing the file). Now you could argue that the user wants to see every character but there are two problems with that.
The first is that it's not very efficient. The second is that the original ANSI C mandate was to primarily codify existing behaviour, rather than invent new behaviour, and those design decisions were made long before ANSI started the process. Even ISO nowadays treads very carefully when changing existing rules in the standards.
As to how to deal with that, if you fflush (stdout) after every output call that you want to see immediately, that will solve the problem.
Alternatively, you can use setvbuf before operating on stdout, to set it to unbuffered and you won't have to worry about adding all those fflush lines to your code:
setvbuf (stdout, NULL, _IONBF, BUFSIZ);
Just keep in mind that may affect performance quite a bit if you are sending the output to a file. Also keep in mind that support for this is implementation-defined, not guaranteed by the standard.
ISO C99 section 7.19.3/3 is the relevant bit:
When a stream is unbuffered, characters are intended to appear from the source or at the destination as soon as possible. Otherwise characters may be accumulated and transmitted to or from the host environment as a block.
When a stream is fully buffered, characters are intended to be transmitted to or from the host environment as a block when a buffer is filled.
When a stream is line buffered, characters are intended to be transmitted to or from the host environment as a block when a new-line character is encountered.
Furthermore, characters are intended to be transmitted as a block to the host environment when a buffer is filled, when input is requested on an unbuffered stream, or when input is requested on a line buffered stream that requires the transmission of characters from the host environment.
Support for these characteristics is implementation-defined, and may be affected via the setbuf and setvbuf functions.
It's probably like that because of efficiency and because if you have multiple programs writing to a single TTY, this way you don't get characters on a line interlaced. So if program A and B are outputting, you'll usually get:
program A output
program B output
program B output
program A output
program B output
This stinks, but it's better than
proprogrgraam m AB ououtputputt
prproogrgram amB A ououtputtput
program B output
Note that it isn't even guaranteed to flush on a newline, so you should flush explicitly if flushing matters to you.
To immediately flush call fflush(stdout) or fflush(NULL) (NULL means flush everything).
stdout is buffered, so will only output after a newline is printed.
To get immediate output, either:
Print to stderr.
Make stdout unbuffered.
Note: Microsoft runtime libraries do not support line buffering, so printf("will print immediately to terminal"):
https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/setvbuf
by default, stdout is line buffered, stderr is none buffered and file is completely buffered.
You can fprintf to stderr, which is unbuffered, instead. Or you can flush stdout when you want to. Or you can set stdout to unbuffered.
Use setbuf(stdout, NULL); to disable buffering.
There are generally 2 levels of buffering-
1. Kernel buffer Cache (makes read/write faster)
2. Buffering in I/O library (reduces no. of system calls)
Let's take example of fprintf and write().
When you call fprintf(), it doesn't wirte directly to the file. It first goes to stdio buffer in the program's memory. From there it is written to the kernel buffer cache by using write system call. So one way to skip I/O buffer is directly using write(). Other ways are by using setbuff(stream,NULL). This sets the buffering mode to no buffering and data is directly written to kernel buffer.
To forcefully make the data to be shifted to kernel buffer, we can use "\n", which in case of default buffering mode of 'line buffering', will flush I/O buffer.
Or we can use fflush(FILE *stream).
Now we are in kernel buffer. Kernel(/OS) wants to minimise disk access time and hence it reads/writes only blocks of disk. So when a read() is issued, which is a system call and can be invoked directly or through fscanf(), kernel reads the disk block from disk and stores it in a buffer. After that data is copied from here to user space.
Similarly that fprintf() data recieved from I/O buffer is written to the disk by the kernel. This makes read() write() faster.
Now to force the kernel to initiate a write(), after which data transfer is controlled by hardware controllers, there are also some ways. We can use O_SYNC or similar flags during write calls. Or we could use other functions like fsync(),fdatasync(),sync() to make the kernel initiate writes as soon as data is available in the kernel buffer.

Will the write() system call block further operation till read() is involved, or vice versa?

Written as part of a TCP/IP client-server:
Server:
write(nfds,data1,sizeof(data1));
usleep(1000);
write(nfds,data2,sizeof(data2));
Client:
read(fds,s,sizeof(s));
printf("%s",s);
read(fds,s,sizeof(s));
printf("%s",s);
Without usleep(1000) between the two calls to write(), the client prints data1 twice. Why is this?
Background:
I am doing a Client-Server program where the server has to send two consecutive pieces of information after their acquisition, via the network (socket); nfds is the file descriptor we get from accept().
In the client side, we receive these information via read; here fds is the file descriptor obtained via socket().
My issue is that when I am NOT using the usleep(1000) between the write() functions, the client just prints the info represented by data1 twice, instead of printing data1 and then data2. When I put in the usleep() it's fine. Exactly WHY is this happening? Is write() blocking the operation till the buffer is read or is read() blocking the operation till info is written into the buffer? Or am I completely off the page?
You are making several false assumptions. There is nothing in TCP that guarantees that one send equals one receive. There is a lot of buffering, at both ends, and there are deliberate delays in sending to as to coalesce packets (the Nagle algorithm). When you call read(), or recv() and friends, you need to store the result into a variable and examine it for each of the following cases:
-1: an error: examine/log/print errno, or strerror(), or call perror(), and in most cases close the socket and exit the reading loop.
0: end of stream; the owner has closed the connection; close the socket and exit the reading loop.
a positive value but less than you expected: keep reading and accumulate the data until you have everything you need.
a positive value that is more than you expected: process the data you expected, and save the rest for next time.
exactly what you expected: process the data, discard it all, and repeat. This isn the easy case, and it is rare, but it is the only case you are currently programming for.
Don't add sleeps into networking code. It doesn't solve problems, it only delays them.

Piping as interprocess communication

I am interested in writing separate program modules that run as independent threads that I could hook together with pipes. The motivation would be that I could write and test each module completely independently, perhaps even write them in different languages, or run the different modules on different machines. There are a wide variety of possibilities here. I have used piping for a while, but I am unfamiliar with the nuances of its behaviour.
It seems like the receiving end will block waiting for input, which I would expect, but will the sending end block sometimes waiting for someone to read from the stream?
If I write an eof to the stream can I keep continue writing to that stream until I close it?
Are there differences in the behaviour named and unnamed pipes?
Does it matter which end of the pipe I open first with named pipes?
Is the behaviour of pipes consistent between different Linux systems?
Does the behaviour of the pipes depend on the shell I'm using or the way I've configured it?
Are there any other questions I should be asking or issues I should be aware of if I want to use pipes in this way?
Wow, that's a lot of questions. Let's see if I can cover everything...
It seems like the receiving end will
block waiting for input, which I would
expect
You expect correctly an actual 'read' call will block until something is there. However, I believe there are some C functions that will allow you to 'peek' at what (and how much) is waiting in the pipe. Unfortunately, I don't remember if this blocks as well.
will the sending end block sometimes
waiting for someone to read from the
stream
No, sending should never block. Think of the ramifications if this were a pipe across the network to another computer. Would you want to wait (through possibly high latency) for the other computer to respond that it received it? Now this is a different case if the reader handle of the destination has been closed. In this case, you should have some error checking to handle that.
If I write an eof to the stream can I
keep continue writing to that stream
until I close it
I would think this depends on what language you're using and its implementation of pipes. In C, I'd say no. In a linux shell, I'd say yes. Someone else with more experience would have to answer that.
Are there differences in the behaviour
named and unnamed pipes?
As far as I know, yes. However, I don't have much experience with named vs unnamed. I believe the difference is:
Single direction vs Bidirectional communication
Reading AND writing to the "in" and "out" streams of a thread
Does it matter which end of the pipe I
open first with named pipes?
Generally no, but you could run into problems on initialization trying to create and link the threads with each other. You'd need to have one main thread that creates all the sub-threads and syncs their respective pipes with each other.
Is the behaviour of pipes consistent
between different linux systems?
Again, this depends on what language, but generally yes. Ever heard of POSIX? That's the standard (at least for linux, Windows does it's own thing).
Does the behaviour of the pipes depend
on the shell I'm using or the way I've
configured it?
This is getting into a little more of a gray area. The answer should be no since the shell should essentially be making system calls. However, everything up until that point is up for grabs.
Are there any other questions I should
be asking
The questions you've asked shows that you have a decent understanding of the system. Keep researching and focus on what level you're going to be working on (shell, C, so on). You'll learn a lot more by just trying it though.
This is all based on a UNIX-like system; I'm not familiar with the specific behavior of recent versions of Windows.
It seems like the receiving end will block waiting for input, which I would expect, but will the sending end block sometimes waiting for someone to read from the stream?
Yes, although on a modern machine it may not happen often. The pipe has an intermediate buffer that can potentially fill up. If it does, the write side of the pipe will indeed block. But if you think about it, there aren't a lot of files that are big enough to risk this.
If I write an eof to the stream can I keep continue writing to that stream until I close it?
Um, you mean like a CTRL-D, 0x04? Sure, as long as the stream is set up that way. Viz.
506 # cat | od -c
abc
^D
efg
0000000 a b c \n 004 \n e f g \n
0000012
Are there differences in the behaviour named and unnamed pipes?
Yes, but they're subtle and implementation dependent. The biggest one is that you can write to a named pipe before the other end is running; with unnamed pipes, the file descriptors get shared during the fork/exec process, so there's no way to access the transient buffer without the processes being up.
Does it matter which end of the pipe I open first with named pipes?
Nope.
Is the behaviour of pipes consistent between different linux systems?
Within reason, yes. Buffer sizes etc may vary.
Does the behaviour of the pipes depend on the shell I'm using or the way I've configured it?
No. When you create a pipe, under the covers what happens is your parent process (the shell) creates a pipe which has a pair of file descriptors, then does a fork exec like this pseudocode:
Parent:
create pipe, returning two file descriptors, call them fd[0] and fd[1]
fork write-side process
fork read-side process
Write-side:
close fd[0]
connect fd[1] to stdout
exec writer program
Read-side:
close fd[1]
connect fd[0] to stdin
exec reader program
Are there any other questions I should be asking or issues I should be aware of if I want to use pipes in this way?
Is everything you want to do really going to lay out in a line like this? If not, you might want to think about a more general architecture. But the insight that having lots of separate processes interacting through the "narrow" interface of a pipe is desirable is a good one.
[Updated: I had the file descriptor indices reversed at first. They're correct now, see man 2 pipe.]
As Dashogun and Charlie Martin noted, this is a big question. Some parts of their answers are inaccurate, so I'm going to answer too.
I am interested in writing separate program modules that run as independent threads that I could hook together with pipes.
Be wary of trying to use pipes as a communication mechanism between threads of a single process. Because you would have both read and write ends of the pipe open in a single process, you would never get the EOF (zero bytes) indication.
If you were really referring to processes, then this is the basis of the classic Unix approach to building tools. Many of the standard Unix programs are filters that read from standard input, transform it somehow, and write the result to standard output. For example, tr, sort, grep, and cat are all filters, to name but a few. This is an excellent paradigm to follow when the data you are manipulating permits it. Not all data manipulations are conducive to this approach, but there are many that are.
The motivation would be that I could write and test each module completely independently, perhaps even write them in different languages, or run the different modules on different machines.
Good points. Be aware that there isn't really a pipe mechanism between machines, though you can get close to it with programs such as rsh or (better) ssh. However, internally, such programs may read local data from pipes and send that data to remote machines, but they communicate between machines over sockets, not using pipes.
There are a wide variety of possibilities here. I have used piping for a while, but I am unfamiliar with the nuances of its behaviour.
OK; asking questions is one (good) way to learn. Experimenting is another, of course.
It seems like the receiving end will block waiting for input, which I would expect, but will the sending end block sometimes waiting for someone to read from the stream?
Yes. There is a limit to the size of a pipe buffer. Classically, this was quite small - 4096 or 5120 were common values. You may find that modern Linux uses a larger value. You can use fpathconf() and _PC_PIPE_BUF to find out the size of a pipe buffer. POSIX only requires the buffer to be 512 (that is, _POSIX_PIPE_BUF is 512).
If I write an eof to the stream can I keep continue writing to that stream until I close it?
Technically, there is no way to write EOF to a stream; you close the pipe descriptor to indicate EOF. If you are thinking of control-D or control-Z as an EOF character, then those are just regular characters as far as pipes are concerned - they only have an effect like EOF when typed at a terminal that is running in canonical mode (cooked, or normal).
Are there differences in the behaviour named and unnamed pipes?
Yes, and no. The biggest differences are that unnamed pipes must be set up by one process and can only be used by that process and children who share that process as a common ancestor. By contrast, named pipes can be used by previously unassociated processes. The next big difference is a consequence of the first; with an unnamed pipe, you get back two file descriptors from a single function (system) call to pipe(), but you open a FIFO or named pipe using the regular open() function. (Someone must create a FIFO with the mkfifo() call before you can open it; unnamed pipes do not need any such prior setup.) However, once you have a file descriptor open, there is precious little difference between a named pipe and an unnamed pipe.
Does it matter which end of the pipe I open first with named pipes?
No. The first process to open the FIFO will (normally) block until there's a process with the other end open. If you open it for reading and writing (aconventional but possible) then you won't be blocked; if you use the O_NONBLOCK flag, you won't be blocked.
Is the behaviour of pipes consistent between different Linux systems?
Yes. I've not heard of or experienced any problems with pipes on any of the systems where I've used them.
Does the behaviour of the pipes depend on the shell I'm using or the way I've configured it?
No: pipes and FIFOs are independent of the shell you use.
Are there any other questions I should be asking or issues I should be aware of if I want to use pipes in this way?
Just remember that you must close the reading end of a pipe in the process that will be writing, and the writing end of the pipe in the process that will be reading. If you want bidirectional communication over pipes, use two separate pipes. If you create complicated plumbing arrangements, beware of deadlock - it is possible. A linear pipeline does not deadlock, however (though if the first process never closes its output, the downstream processes may wait indefinitely).
I observed both above and in comments to other answers that pipe buffers are classically limited to quite small sizes. #Charlie Martin counter-commented that some versions of Unix have dynamic pipe buffers and these can be quite large.
I'm not sure which ones he has in mind. I used the test program that follows on Solaris, AIX, HP-UX, MacOS X, Linux and Cygwin / Windows XP (results below):
#include <unistd.h>
#include <signal.h>
#include <stdio.h>
#include <fcntl.h>
#include <stdlib.h>
#include <errno.h>
#include <string.h>
static const char *arg0;
static void err_syserr(char *str)
{
int errnum = errno;
fprintf(stderr, "%s: %s - (%d) %s\n", arg0, str, errnum, strerror(errnum));
exit(1);
}
int main(int argc, char **argv)
{
int pd[2];
pid_t kid;
size_t i = 0;
char buffer[2] = "a";
int flags;
arg0 = argv[0];
if (pipe(pd) != 0)
err_syserr("pipe() failed");
if ((kid = fork()) < 0)
err_syserr("fork() failed");
else if (kid == 0)
{
close(pd[1]);
pause();
}
/* else */
close(pd[0]);
if (fcntl(pd[1], F_GETFL, &flags) == -1)
err_syserr("fcntl(F_GETFL) failed");
flags |= O_NONBLOCK;
if (fcntl(pd[1], F_SETFL, &flags) == -1)
err_syserr("fcntl(F_SETFL) failed");
while (write(pd[1], buffer, sizeof(buffer)-1) == sizeof(buffer)-1)
{
putchar('.');
if (++i % 50 == 0)
printf("%u\n", (unsigned)i);
}
if (i % 50 != 0)
printf("%u\n", (unsigned)i);
kill(kid, SIGINT);
return 0;
}
I'd be curious to get extra results from other platforms. Here are the sizes I found. All the results are larger than I expected, I must confess, but Charlie and I may be debating the meaning of 'quite large' when it comes to buffer sizes.
8196 - HP-UX 11.23 for IA-64 (fcntl(F_SETFL) failed)
16384 - Solaris 10
16384 - MacOS X 10.5 (O_NONBLOCK did not work, though fcntl(F_SETFL) did not fail)
32768 - AIX 5.3
65536 - Cygwin / Windows XP (O_NONBLOCK did not work, though fcntl(F_SETFL) did not fail)
65536 - SuSE Linux 10 (and CentOS) (fcntl(F_SETFL) failed)
One point that is clear from these tests is that O_NONBLOCK works with pipes on some platforms and not on others.
The program creates a pipe, and forks. The child closes the write end of the pipe, and then goes to sleep until it gets a signal - that's what pause() does. The parent then closes the read end of the pipe, and sets the flags on the write descriptor so that it won't block on an attempt to write on a full pipe. It then loops, writing one character at a time, and printing a dot for each character written, and a count and newline every 50 characters. When it detects a write problem (buffer full, since the child is not reading a thing), it stops the loop, writes the final count, and kills the child.

Resources