How to detect end of stream in F#? - asynchronous

I have an F# function like the following:
open System.IO
open Microsoft.FSharp.Control.CommonExtensions
let rec copyData (ins:Stream) (outs:Stream) = async {
    let! bytes = ins.AsyncRead(1)
    do! outs.AsyncWrite(bytes)
    return! copyData ins outs
}
When the ins stream reaches the end, it throws an AtEndOfStream exception, so I have to catch it in the calling function. How can I avoid this exception by detecting that the stream is at its end?

The AsyncRead overload that you're using here tries to read exactly the number of bytes you specified (and it fails if it reaches the end, because it cannot read the specified number of bytes).
Alternatively, you can use an overload that takes a buffer and returns the number of bytes read:
let rec copyData (ins:Stream) (outs:Stream) = async {
    let buffer = Array.zeroCreate 1024
    let! bytes = ins.AsyncRead(buffer)
    if bytes > 0 then
        do! outs.AsyncWrite(buffer, 0, bytes)
        return! copyData ins outs
}
This overload does not throw an exception at the end of the stream, but instead it will return 0 (and it won't write anything into the buffer). So you can just check if the number of bytes read is greater than 0 and stop otherwise.
If the stream is closed already before calling copyData then you'll need to check for CanRead or handle the exception, but if the stream is open before calling AsyncRead, you'll just get 0 back.
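The same read-until-zero loop can be sketched in Rust, the language most of the related questions on this page use. This is only an illustration of the pattern, not a translation of the F# async workflow: it is synchronous, and copy_data is a hypothetical name chosen to mirror the function above.

```rust
use std::io::{self, Read, Write};

/// Copies everything from `ins` to `outs` and returns the number of bytes copied.
/// `copy_data` is a hypothetical name mirroring the F# function above.
fn copy_data<R: Read, W: Write>(ins: &mut R, outs: &mut W) -> io::Result<u64> {
    let mut buffer = [0u8; 1024];
    let mut total = 0u64;
    loop {
        // With a non-empty buffer, Ok(0) signals end of stream; no exception or error
        let n = ins.read(&mut buffer)?;
        if n == 0 {
            return Ok(total);
        }
        // Only the first n bytes of the buffer hold fresh data
        outs.write_all(&buffer[..n])?;
        total += n as u64;
    }
}
```

For real code, std::io::copy already implements this loop; the sketch only makes the end-of-stream check visible.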

Just check the CanRead property, like so:
let rec copyData (ins:Stream) (outs:Stream) = async {
    if ins.CanRead then
        let! bytes = ins.AsyncRead(1)
        do! outs.AsyncWrite(bytes)
        return! copyData ins outs
}

Related

Rust TCP how to get bytearray length?

I have a TCP Client in rust, which should communicate with a Java Server. I got the basics working and can send bytearrays between them.
But to size the bytearray buffer, I need to know the length of the incoming bytearray, and I don't know how I should obtain it. At the moment, I only have a fixed size for the buffer.
My Rust code looks like this:
loop {
    let mut buffer = vec![0; 12]; //fixed buffer length
    let n = stream.read(&mut buffer).await;
    let text = from_utf8(&buffer).unwrap();
    println!("{}", text);
}
In Java, you can send the size of the buffer directly as an Integer with DataInputStream. Is there any option to do that in rust?
For example, this is how I'm doing it in Java:
public String readMsg(Socket socket) throws IOException {
    DataInputStream in = new DataInputStream(new BufferedInputStream(socket.getInputStream()));
    byte[] bytes = new byte[in.readInt()]; //dynamic buffer length
    in.readFully(bytes);
    return new String(bytes, StandardCharsets.US_ASCII);
}
What you want to know is a property of the protocol that you are using. It's not a property of the programming language you use. Based on your Java code it seems like you are using a protocol which sends a 4 byte length field before the message data (signed/unsigned?).
If that is the case you can handle reading the message the same way in Rust:
1. Read the 4 bytes in order to obtain the length information
2. Read the remaining data
3. Deserialize the data
use std::io::{self, Read};

fn read_message(mut stream: impl Read) -> io::Result<String> {
    let mut buffer = [0u8; 4];
    // Read the length information
    stream.read_exact(&mut buffer)?;
    // Deserialize the length (big-endian, as written by Java's DataOutputStream)
    let size = u32::from_be_bytes(buffer) as usize;
    // Allocate a buffer for the message
    // Be sure to check against a maximum size before doing this in production
    let mut payload = vec![0; size];
    // Blocking read; with an async stream this call would be awaited instead
    stream.read_exact(&mut payload)?;
    // Convert the buffer into a string (one possible error mapping)
    let text = String::from_utf8(payload)
        .map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))?;
    println!("{}", text);
    Ok(text)
}
This is obviously only correct if your protocol uses length-prefixed messages with a 4-byte unsigned int prefix. This is something that you need to check.
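To make the framing concrete, here is a minimal round-trip sketch of the read-length, read-payload steps together with the matching write side. It assumes the 4-byte big-endian prefix discussed above; write_frame, read_frame, and MAX_MESSAGE_LEN are hypothetical names and values, not part of any library.

```rust
use std::io::{self, Read, Write};

// Hypothetical upper bound; pick one that suits your protocol.
const MAX_MESSAGE_LEN: usize = 1024 * 1024;

/// Writes a 4-byte big-endian length followed by the payload bytes.
fn write_frame<W: Write>(stream: &mut W, payload: &[u8]) -> io::Result<()> {
    if payload.len() > u32::MAX as usize {
        return Err(io::Error::new(io::ErrorKind::InvalidInput, "payload too large"));
    }
    let len = payload.len() as u32;
    stream.write_all(&len.to_be_bytes())?;
    stream.write_all(payload)
}

/// Reads one frame, rejecting announced lengths above MAX_MESSAGE_LEN.
fn read_frame<R: Read>(stream: &mut R) -> io::Result<Vec<u8>> {
    let mut prefix = [0u8; 4];
    // read_exact fails with UnexpectedEof if the peer closes mid-prefix
    stream.read_exact(&mut prefix)?;
    let len = u32::from_be_bytes(prefix) as usize;
    if len > MAX_MESSAGE_LEN {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "frame too large"));
    }
    let mut payload = vec![0u8; len];
    stream.read_exact(&mut payload)?;
    Ok(payload)
}
```

Both functions are generic over Read and Write, so they work with a TcpStream as well as with an in-memory Vec<u8> or byte slice for testing. Note that Java's readInt() returns a signed int, so lengths above i32::MAX would need agreement on both sides.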

Misunderstanding of how the Read trait works for TcpStreams

My goal is to read some bytes from a TcpStream in order to parse the data in each message and build a struct from it.
loop {
    let mut buf: Vec<u8> = Vec::new();
    let len = stream.read(&mut buf)?;
    if 0 == len {
        //Disconnected
    }
    println!("read() -> {}", len);
}
Like in Python, I thought the stream.read() would block until it received some data.
So I've set up a server that calls the loop you see above for each incoming connection. I've then tried to connect to the server with netcat; netcat connects successfully to the server and blocks on the stream.read(), which is what I want; but as soon as I send some data, read() returns 0.
I've also tried doing something similar with stream.read_to_end(), but it appears to return only when the connection is closed.
How can I read from the TcpStream, message by message, knowing that each message can have a different, unknown size?
You're getting caught with your pants down by an underlying technicality of Vec more than by std::io::Read, although they both interact in this particular case.
The definition and documentation of Read states:
If the return value of this method is Ok(n), then it must be guaranteed that 0 <= n <= buf.len(). A nonzero n value indicates that the buffer buf has been filled in with n bytes of data from this source. If n is 0, then it can indicate one of two scenarios:
The important part is the guarantee that 0 <= n <= buf.len(). (The two scenarios the documentation goes on to list are that the reader has reached end of file, or that the buffer passed in was 0 bytes long.)
When you define a new Vec the way you did, it starts with a length (and capacity) of zero. This means that the slice you pass as a buffer has a length of zero. As a result, since it must be guaranteed that 0 <= n <= buf.len() and since buf.len() is zero, your read() call immediately returns with 0 bytes read.
To "fix" this, you can either give your Vec a non-zero length up front (let mut buf = vec![0u8; 1024], or buf.resize(1024, 0) on an existing Vec), or just use an array from the get-go (let mut buffer: [u8; 1024] = [0; 1024]).
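The effect of the buffer length can be checked without a network connection. The sketch below uses a byte slice as a stand-in for the TcpStream (an assumption made for the demo; slices implement Read), and read_counts is a hypothetical helper name.

```rust
use std::io::Read;

/// Returns the byte counts that read() reports for three differently built buffers.
fn read_counts() -> (usize, usize, usize) {
    // A byte slice implements Read, so it can stand in for a TcpStream here
    let mut source: &[u8] = b"hello world";

    // Length 0: read() has nowhere to put data and returns Ok(0)
    let mut empty: Vec<u8> = Vec::new();
    let a = source.read(&mut empty).unwrap();

    // with_capacity reserves memory, but the length is still 0
    let mut reserved: Vec<u8> = Vec::with_capacity(1024);
    let b = source.read(&mut reserved).unwrap();

    // Length 1024: read() may fill up to 1024 bytes
    let mut sized = vec![0u8; 1024];
    let c = source.read(&mut sized).unwrap();

    (a, b, c)
}
```

With this in-memory source, the first two counts are 0 and the third is 11, the length of the input. On a real socket the third count is whatever has arrived so far, up to the buffer length.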

buffer in rust-tokio streams is there a way to use something other than &[u8]?

I am trying to make an echo server that capitalizes a String when it replies, as an exercise to practice with tokio. I used an array as a buffer, which is annoying: what if the string overflows the buffer?
I would like to know if there is a better way to do this without using an array, ideally just using a String or a vector without needing to create the buffer array.
I tried read_from_string(), but it is not async and ends up blocking the socket.
extern crate tokio;
use tokio::net::TcpListener;
use tokio::prelude::*;
fn main() {
    let addr = "127.0.0.1:6142".parse().unwrap();
    let listener = TcpListener::bind(&addr).unwrap();
    let server = listener
        .incoming()
        .for_each(|socket| {
            let (mut reader, mut writer) = socket.split();
            let mut buffer = [0; 16];
            reader.poll_read(&mut buffer)?;
            let s = std::str::from_utf8(&buffer).unwrap();
            s.to_uppercase();
            writer.poll_write(&mut s.as_bytes())?;
            Ok(())
        })
        .map_err(|e| {
            eprintln!("something went wrong {}", e);
        });
    tokio::run(server);
}
Results:
"012345678901234567890" becomes -> "0123456789012345"
I could increase the buffer of course but it would just kick the can down the road.
I believe tokio_codec is the right tool for such tasks. Tokio documentation: https://tokio.rs/docs/going-deeper/frames/
It uses Bytes / BytesMut as its buffer, a very powerful structure that lets you process your data however you want and avoid unnecessary copies.
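The idea behind a codec can be sketched without tokio: received bytes are appended to a growable buffer, and a frame is handed out only once it is complete, so no fixed-size array limits the message length. The sketch below uses a plain Vec<u8> and a newline delimiter as assumptions for illustration; take_line and shout are hypothetical helpers, not part of tokio or tokio_codec.

```rust
/// Removes one newline-terminated frame from the front of `buf`,
/// or returns None if no complete frame has arrived yet.
fn take_line(buf: &mut Vec<u8>) -> Option<Vec<u8>> {
    // Look for the delimiter among the bytes received so far
    let pos = buf.iter().position(|&b| b == b'\n')?;
    // Split off the frame including '\n'; the remainder stays buffered
    let mut frame: Vec<u8> = buf.drain(..=pos).collect();
    frame.pop(); // drop the trailing '\n'
    Some(frame)
}

/// Upper-cases a frame, as the echo server in the question wants to do.
fn shout(frame: &[u8]) -> String {
    String::from_utf8_lossy(frame).to_uppercase()
}
```

A caller would append each chunk it reads to the buffer with extend_from_slice and then call take_line in a loop until it returns None. A codec's decoder plays the role of take_line, with BytesMut in place of the Vec.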

How to handle I/O of a subprocess asynchronously? [duplicate]

I have a subprocess, which may or may not write something to its stdout within a specific amount of time, e.g. 3 seconds.
If a new line in the subprocess stdout starts with the correct thing, I want to return the line.
Optimally I would like to realize something like this:
use std::io::{BufRead, BufReader};
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::Duration;
pub fn wait_for_or_exit(
    reader: &BufReader<&mut std::process::ChildStdout>,
    wait_time: u64,
    cmd: &str,
) -> Option<String> {
    let signal: Arc<AtomicBool> = Arc::new(AtomicBool::new(false));
    let signal_clone = signal.clone();
    let child = thread::spawn(move || {
        thread::sleep(Duration::from_millis(wait_time));
        signal_clone.store(true, Ordering::Relaxed);
    });
    let mut line = String::new();
    while !signal.load(Ordering::Relaxed) {
        //Sleep a really small amount of time not to block cpu
        thread::sleep(Duration::from_millis(10));
        //This line is obviously invalid!
        if reader.has_input() {
            line.clear();
            reader.read_line(&mut line).unwrap();
            if line.starts_with(cmd) {
                return Some(line);
            }
        }
    }
    None
}
The only line not working here is reader.has_input().
Obviously, if the subprocess repeatedly answers much faster than wait_time, there will be a lot of sleeping threads, but I can take care of that with channels.
There are two approaches.
1. You can spin up a separate thread, and then use some mechanism (probably a channel) to signal success or failure to your waiting thread.
2. You can use async IO as you mentioned, such as the futures and tokio libs.
I'll demo both. I prefer the futures/Tokio approach, but if you're not familiar with the futures model, then option one might be better.
The Rust stdlib has a Channels API, and this channel actually features a recv_timeout which can help us out quite a bit.
use std::io::{BufRead, BufReader};
use std::process::ChildStdout;
use std::sync::mpsc;
use std::thread;
use std::time::Duration;
// This spins up a separate thread in which to wait for stuff to read
// from the BufReader<ChildStdout>.
// If we successfully read, we send the string over the channel.
// Back in the original thread, we wait for an answer over the channel
// or time out after wait_time secs.
// Note: the reader is taken by value here, because a spawned thread
// cannot borrow it from the caller.
pub fn wait_for_or_exit(
    mut reader: BufReader<ChildStdout>,
    wait_time: u64,
    cmd: &str,
) -> Option<String> {
    let (sender, receiver) = mpsc::channel();
    thread::spawn(move || {
        let mut line = String::new();
        // read_line blocks until a full line (or EOF) arrives
        if reader.read_line(&mut line).is_ok() {
            // The receiver may already have timed out; ignore a failed send
            let _ = sender.send(line);
        }
    });
    match receiver.recv_timeout(Duration::from_secs(wait_time)) {
        Ok(line) if line.starts_with(cmd) => Some(line),
        Ok(_) => None,
        Err(mpsc::RecvTimeoutError::Timeout) => None,
        Err(mpsc::RecvTimeoutError::Disconnected) => None,
    }
}
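The thread-plus-channel pattern does not depend on a subprocess, so it can be tried out with any reader. The following is a minimal sketch under that assumption: first_line_within is a hypothetical helper, generic over BufRead, that takes ownership of the reader because the worker thread may outlive the call.

```rust
use std::io::BufRead;
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Waits up to `timeout` for one line from `reader`.
/// Returns None on timeout, on a read error, or if the reader is already at EOF.
fn first_line_within<R>(mut reader: R, timeout: Duration) -> Option<String>
where
    R: BufRead + Send + 'static,
{
    let (sender, receiver) = mpsc::channel();
    thread::spawn(move || {
        let mut line = String::new();
        // Blocks until a newline or EOF; Ok(0) means EOF with nothing read
        if let Ok(n) = reader.read_line(&mut line) {
            if n > 0 {
                // The receiver may have given up already, so ignore send errors
                let _ = sender.send(line);
            }
        }
    });
    // Err covers both the timeout and the worker exiting without sending
    receiver.recv_timeout(timeout).ok()
}
```

One design consequence is worth knowing: on a timeout the worker thread is not cancelled. It stays blocked in read_line until the other side writes a line or closes its end, and it takes the reader with it.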
Option two assumes that you're building a futures-based app. To accomplish what you want with async IO, we need a file descriptor that lets us set non-blocking mode. Luckily, we don't have to do that ourselves: the Futures and Tokio APIs handle this nicely. The trade-off is that you have to compose your code out of non-blocking futures.
The code below was taken almost entirely from Tokio Process with a Futures timeout that comes from the Tokio API.
extern crate futures;
extern crate tokio;
extern crate tokio_process;
use std::process::Command;
use std::time::{Duration};
use futures::Future;
use tokio_process::CommandExt;
use tokio::prelude::*;
const TIMEOUT_SECS: u64 = 3;
fn main() {
    // Like above, but use `output_async` which returns a future instead of
    // immediately returning the `Child`.
    let output = Command::new("echo").arg("hello").arg("world")
        .output_async();
    let future = output.map_err(|e| panic!("failed to collect output: {}", e))
        .map(|output| {
            assert!(output.status.success());
            assert_eq!(output.stdout, b"hello world\n");
            println!("received output: {}", String::from_utf8(output.stdout).unwrap());
        })
        .timeout(Duration::from_secs(TIMEOUT_SECS)) // here is where we say we only want to wait TIMEOUT_SECS seconds
        .map_err(|_e| { println!("Timed out waiting for data"); });
    tokio::run(future);
}

Reading from TcpStream results in empty buffer

I want to read data from a TCP stream but it results in an empty Vec:
extern crate net2;
use net2::TcpBuilder;
use std::io::Read;
use std::io::Write;
use std::io::BufReader;
let tcp = TcpBuilder::new_v4().unwrap();
let mut stream = tcp.connect("127.0.0.1:3306").unwrap();
let mut buf = Vec::with_capacity(1024);
stream.read(&mut buf);
println!("{:?}", buf); // prints []
When I use stream.read_to_end the buffer is filled but this takes way too long.
In Python I can do something like
import socket
TCP_IP = '127.0.0.1'
TCP_PORT = 3306
BUFFER_SIZE = 1024
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect((TCP_IP, TCP_PORT))
#s.send(MESSAGE)
data = s.recv(BUFFER_SIZE)
s.close()
print "received data:", data
How can I achieve this in Rust?
The two methods you tried don't work for different reasons:
read(): "does not provide any guarantees about whether it blocks waiting for data". In general, read() is unreliable from a user's perspective and should only be used as a building block for higher-level functions, like read_to_end().
But maybe more importantly, you have a bug in your code: you create your vector via with_capacity(), which reserves memory internally but doesn't change the length of the vector. It is still empty! When you now slice it like &mut buf, you pass an empty slice to read(), thus read() cannot read any actual data. To fix that, the elements of your vector need to be initialized: let mut buf = vec![0; 1024] or something like that.
read_to_end(): calls read() repeatedly until EOF is encountered. This doesn't really make sense in most TCP stream situations.
So what should you use instead? In your Python code you read a specific number of bytes into a buffer. You can do that in Rust, too: read_exact(). It works like this:
const BUFFER_SIZE: usize = 1024;
let mut stream = ...;
let mut buf = [0; BUFFER_SIZE];
stream.read_exact(&mut buf);
println!("{:?}", buf);
You could also use take(). That way you can use read_to_end():
const BUFFER_SIZE: usize = 1024;
let mut stream = ...;
let mut buf = Vec::with_capacity(BUFFER_SIZE);
stream.take(BUFFER_SIZE as u64).read_to_end(&mut buf); // take() expects a u64 limit
println!("{:?}", buf);
If you want to use the stream multiple times, you probably want to use by_ref() before calling take().
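The two code snippets are not equivalent though! read_exact() returns an error if the stream ends before the buffer is full, while take() followed by read_to_end() returns whatever arrived before EOF. Please read the documentation for more details.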
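The difference between the two snippets shows up when the source ends before BUFFER_SIZE bytes have arrived. The sketch below wraps both variants in small functions so they can be compared on an in-memory reader; read_all_or_fail and read_up_to are hypothetical names.

```rust
use std::io::{self, Read};

/// Fills an `n`-byte buffer with read_exact; fails if the source ends early.
fn read_all_or_fail<R: Read>(source: &mut R, n: usize) -> io::Result<Vec<u8>> {
    let mut buf = vec![0u8; n];
    // Returns an UnexpectedEof error if fewer than n bytes are available before EOF
    source.read_exact(&mut buf)?;
    Ok(buf)
}

/// Reads at most `n` bytes with take + read_to_end; a short source yields a short Vec.
fn read_up_to<R: Read>(source: &mut R, n: usize) -> io::Result<Vec<u8>> {
    let mut buf = Vec::with_capacity(n);
    // by_ref() borrows the reader, so it remains usable after the call
    source.by_ref().take(n as u64).read_to_end(&mut buf)?;
    Ok(buf)
}
```

Given a 3-byte source and n = 8, read_all_or_fail returns an error while read_up_to returns the 3 bytes. On an open TCP connection both variants keep blocking until n bytes have arrived or the peer closes, so neither behaves exactly like Python's recv(), which returns as soon as some data is available.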

Resources