Why is Rust's std::thread::sleep allowing my HTTP response to return the correct body? - http

I am working on the beginning of the final chapter of The Rust Programming Language, which is teaching how to write an HTTP response with Rust.
For some reason, the HTML file being sent does not display in the browser unless I have Rust wait before calling TcpStream::flush().
Here is the code:
use std::io::prelude::*;
use std::net::TcpListener;
use std::net::TcpStream;
use std::fs;
use std::thread::sleep;
use std::time::Duration;
fn main() {
    let listener = TcpListener::bind("127.0.0.1:7878").unwrap();
    for stream in listener.incoming() {
        let stream = stream.unwrap();
        handle_connection(stream);
    }
}
fn handle_connection(mut stream: TcpStream) {
    let mut buffer = [0; 1024];
    stream.read(&mut buffer).unwrap();
    let contents = fs::read_to_string("hello.html").unwrap();
    let response = format!(
        "HTTP/1.1 200 OK\r\nContent-Length: {}\r\n{}",
        contents.len(),
        contents
    );
    stream.write(response.as_bytes()).unwrap();
    // let i = stream.write(response.as_bytes()).unwrap();
    // println!("{} bytes written to the stream", i);
    // ^^ using this code instead will sometimes make it display properly
    sleep(Duration::from_secs(1));
    // ^^ commenting this out will cause a blank page to load.
    stream.flush().unwrap();
}
I observe the same behavior in multiple browsers.
According to the Rust book, calling TcpStream::flush should ensure that the bytes finish writing to the stream. So why would I be unable to view the HTML file in the browser unless I sleep the thread before flushing?
I have done hard reloading and restarted the server with cargo run multiple times and the behavior is the same. I have also printed out the file contents to the terminal, and the contents are being read fine under either condition (of course they are).
I wonder if this is a problem with my operating system. I'm on Windows 10.
It isn't really holding the project up as I can continue learning (and I'm not planning on putting an actual web project into production right now), but I would appreciate any insight anyone has on this issue. There must be something about Rust's handling of the stream or the environment that I am not understanding.
Thanks for your time!
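One detail in the code above is worth checking regardless of the timing behavior: HTTP expects an empty line (a second \r\n) between the last header and the body, and the format string has only one \r\n after Content-Length. The following is a minimal sketch of that framing, assuming the same status line and header as in the question; build_response is a hypothetical helper name, not something from the book.

```rust
/// Builds a minimal HTTP/1.1 response around an HTML body.
/// `build_response` is a hypothetical helper used only for illustration.
fn build_response(body: &str) -> String {
    // Each header line ends with \r\n; one extra \r\n (an empty line)
    // marks the end of the headers and the start of the body.
    format!(
        "HTTP/1.1 200 OK\r\nContent-Length: {}\r\n\r\n{}",
        body.len(),
        body
    )
}

fn main() {
    let response = build_response("<h1>Hello!</h1>");
    // Splitting at the first empty line separates headers from body.
    let (head, body) = response.split_once("\r\n\r\n").unwrap();
    println!("{}\n---\n{}", head, body);
}
```

With a response built this way, stream.write_all(response.as_bytes()) is also a safer call than write, because write is allowed to send only part of the buffer. Whether either point explains the effect of the sleep is not established here.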

Related

How to send GET request longer than 65535 symbols from rust?

I am rewriting part of my API from Python to Rust. In particular, I am trying to make an HTTP request to an OSRM server to get a big distance matrix. This kind of request can have quite a large URL. In Python everything works fine, but in Rust I get an error:
thread 'tokio-runtime-worker' panicked at 'a parsed Url should always be a valid Uri: InvalidUri(TooLong)'
I have tried several HTTP client libraries: reqwest, surf, isahc, awc. But it turns out that the limit is enforced in the URL-processing library https://github.com/hyperium/http, and most HTTP clients depend on it, so they all behave the same. I could not use some of the libraries at all; with awc, for example, I got compile-time errors in my async code.
Is there any way to send a large GET request from Rust, preferably asynchronously?
As freakish already pointed out in the comments, having such a long URL is a bad idea: anything longer than about 2,000 characters won't work in most browsers.
That being said: In the comments, you stated that an external API wants those crazily long URIs, so you don't really have an alternative. Therefore, let's give this problem a shot.
It looks like the limit of 65,534 bytes exists because the http library stores the position of the query string as a u16 (and uses 65,535 to mean that there is no query part). The following patch seems to make the code use u32 instead, raising the limit to 4,294,967,294 bytes (if you have URIs longer than that, you might be able to use u64, but that would mean a URI longer than 4 GB, and I doubt you need this):
--- a/src/uri/mod.rs
+++ b/src/uri/mod.rs
@@ -141,7 +141,7 @@ enum ErrorKind {
 }
 // u16::MAX is reserved for None
-const MAX_LEN: usize = (u16::MAX - 1) as usize;
+const MAX_LEN: usize = (u32::MAX - 1) as usize;
 // URI_CHARS is a table of valid characters in a URI. An entry in the table is
 // 0 for invalid characters. For valid characters the entry is itself (i.e.
diff --git a/src/uri/path.rs b/src/uri/path.rs
index be2cb65..9abec4c 100644
--- a/src/uri/path.rs
+++ b/src/uri/path.rs
@@ -11,10 +11,10 @@ use crate::byte_str::ByteStr;
 #[derive(Clone)]
 pub struct PathAndQuery {
     pub(super) data: ByteStr,
-    pub(super) query: u16,
+    pub(super) query: u32,
 }
-const NONE: u16 = ::std::u16::MAX;
+const NONE: u32 = ::std::u32::MAX;
 impl PathAndQuery {
     // Not public while `bytes` is unstable.
@@ -32,7 +32,7 @@ impl PathAndQuery {
             match b {
                 b'?' => {
                     debug_assert_eq!(query, NONE);
-                    query = i as u16;
+                    query = i as u32;
                     break;
                 }
                 b'#' => {
You could try to get this merged, however the issue covering this problem sounds like a pull request might not be accepted. Depending on your use case, you could fork the repository, commit the fix and then use the Cargo features for overriding dependencies to make Cargo use your patched version instead of the version in the repositories. The following addition to your Cargo.toml might get you started:
[patch.crates-io]
http = { git = 'https://github.com/your/repository' }
Note, however, that this only overrides the current version of the http crate: as soon as a new version of the original crate is published, Cargo will probably choose it until you update your fork.
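To make the reasoning behind the limit concrete, here is a small sketch of an offset stored in a u16 with u16::MAX reserved as the "no query" marker. It only mirrors the scheme described above: locate_query is a hypothetical function, not the http crate's API, and the OSRM-style path is just sample input.

```rust
/// u16::MAX is reserved to mean "there is no query string".
const NONE: u16 = u16::MAX;
/// Longest input whose every byte index still fits below the sentinel.
const MAX_LEN: usize = (u16::MAX - 1) as usize;

/// Hypothetical helper: returns the byte index of '?' as a u16,
/// NONE if there is no query, or an error if the input is too long.
fn locate_query(path_and_query: &str) -> Result<u16, &'static str> {
    if path_and_query.len() > MAX_LEN {
        return Err("too long");
    }
    match path_and_query.find('?') {
        // The cast cannot truncate: index < len <= MAX_LEN < NONE.
        Some(index) => Ok(index as u16),
        None => Ok(NONE),
    }
}

fn main() {
    println!("{:?}", locate_query("/table/v1/driving?annotations=distance"));
    println!("{:?}", locate_query("/table/v1/driving"));
    // One byte past the limit is rejected instead of wrapping the offset.
    println!("{:?}", locate_query(&"a".repeat(MAX_LEN + 1)));
}
```

Widening the offset type, as the patch does, moves both the sentinel and the maximum length up together, which is why the two constants have to change in step.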

corrupted file when sending it via POST with reqwest

I'm trying to send a POST request to my server including a file. I can do that with curl following this example with no problems https://gokapi.readthedocs.io/en/latest/advanced.html#interacting-with-the-api, but I can't with Rust.
When I try to implement the request with Rust I have issues, namely the file is corrupted as if it was sent in the wrong way. I tried to get it working with this code,
use reqwest::blocking::multipart;
use reqwest::header::ACCEPT;
fn upload(file: &String) -> Result<(), Box<dyn std::error::Error>> {
    let client = reqwest::blocking::Client::new();
    // Build the multipart form: the file part plus the API's text fields.
    let form = multipart::Form::new()
        .file("file", file)
        .unwrap()
        .text("allowedDownloads", "0")
        .text("expiryDays", "2")
        .text("password", "");
    let res = client
        .post("http://myserver.com/api/files/add")
        .header(ACCEPT, "application/json")
        .header("apikey", "secret")
        .header("Accept-Encoding", "gzip, deflate, br")
        .multipart(form)
        .send();
    // Parse the JSON reply and print the download link for the new file.
    let response_json = json::parse(&res.unwrap().text().unwrap()).unwrap();
    let id = &response_json["FileInfo"]["Id"];
    print!("http://myserver.com/downloadFile?id={}", id);
    Ok(())
}
but the server receives a bad file; 7zip gives me an error when I try to open it.
I tried writing the same script in Python, and I got it working in 3 lines.
import requests
files = {'file': ("1398608 Fractal Dreamers - Gardens Under a Spring Sky.osz", open("1398608 Fractal Dreamers - Gardens Under a Spring Sky.osz", "rb"), "application/octet-stream")}
request = requests.post("http://myserver/api/files/add", files=files, headers={'apikey': 'api'})
The file uploaded from the Python script works flawlessly, while the one uploaded from Rust doesn't.
Any help is appreciated as I'm still a beginner with Rust.
I also did try Sending attachment with reqwest but I get
{"Result":"error","ErrorMessage":"multipart: NextPart: bufio: buffer full"}
EDIT: The issue looks like it's related to the file name containing some strange characters? The test file was "1398608 Fractal Dreamers - Gardens Under a Spring Sky.osz", but renaming it to "a.osz" made the issue disappear. I have no clue why that is.
The content of the zip is:
"Fractal Dreamers - Gardens Under a Spring Sky ([Crz]xz1z1z) [Vernal].osu"
"audio.mp3"
I get the error with the full name, but "1398608 Fractal Dreamers - Gardens Under a Spring Sky.zip" works as well. What's the issue with .osz?

Streaming with ZeroRPC

As you may know, ZeroRPC documentation is sparse. I can't get Streaming between a Python server and a Node client to work.
Here is the Python method:
@zerorpc.stream
def PublishWhatever(self, some_args):
    yield "love"
    yield "stack"
    yield "overflow"
Here is the Node call:
export const tryStream = () => {
  connectedZerorpcClient.invoke('PublishWhatever', (error, res, more) => {
    console.log('STREAM', res, more);
  });
};
This code will log "STREAM love", and then do nothing.
So here are my questions:
In the Python server code, am I supposed to call PublishWhatever with relevant args so that it yields additional values?
In the Node client, should I call some recursive function when there is more data?
What I am trying to implement is a Pub/Sub system, but right now an implementation seems to exist only for a Python server and a Python client; there is no Node example.
The example on the main page and the tests are not relevant either: they show how to stream an array that already exists when the invoke method is called. Here the messages are generated during some heavy computations; I want the server to be able to tell the client "here, some data is ready" and never disconnect.
Well, ZeroRPC actively promotes using its own Python implementation code as self-documentation of how things work. In other words, no one has spent the additional effort needed to publish user-focused documentation, let alone documentation focused on the learning process.
Anyway, try to obey the few "visible" statements from the ZeroRPC description.
@zerorpc.stream
def PublishWhatever(self, some_args):
    yield ("love", "stack", "overflow",)  # one tuple-wrapped result container

Memory leak while sending response from rebus handler

I saw some very strange behavior in my Rebus handler, which is self-hosted in an exe. Right after a response is sent using the bus.send method, the memory consumed by the process goes up. I looked at the object graph using a memory profiler and found that Rebus is holding the response message in serialized form somewhere.
The object graph showed the following hierarchy to the root:
System.Message --> CachedBodyMessage --> stream
Please give me some pointers if anybody is aware of this.
I understand that a memory leak is a grave concern, but my belief is that it is unlikely that Rebus should contain a memory leak.
This belief is rooted in the fact that I have been running Windows Service-hosted Rebus endpoints in production for 1.5 years now, and several of them (e.g. the timeout managers) have sometimes been running for several months without being restarted.
I'd like to be absolutely bulletproof sure though, so I'm willing to investigate the issue you're reporting.
You're mentioning "CachedBodyMessage" - judging by the names of fields inside System.Messaging.Message, it sounds like it's something within MSMQ. To try to reproduce your issue, I coded the following test:
[Test, Ignore("Only works in RELEASE mode because otherwise object references are held on to for the duration of the method")]
public void DoesNotLeakMessages()
{
    // arrange
    const string inputQueueName = "test.leak.input";
    var queue = new MsmqMessageQueue(inputQueueName);
    disposables.Add(queue);
    var body = Encoding.UTF8.GetBytes(new string('*', 32768));
    var message = new TransportMessageToSend
    {
        Headers = new Dictionary<string, object> { { Headers.MessageId, "msg-1" } },
        Body = body
    };
    var weakMessageRef = new WeakReference(message);
    var weakBodyRef = new WeakReference(body);
    // act
    queue.Send(inputQueueName, message, new NoTransaction());
    message = null;
    body = null;
    GC.Collect();
    GC.WaitForPendingFinalizers();
    // assert
    Assert.That(weakMessageRef.IsAlive, Is.False, "Expected the message to have been collected");
    Assert.That(weakBodyRef.IsAlive, Is.False, "Expected the body bytes to have been collected");
}
which verifies that the sent transport message is collected as it should be (it only does so in RELEASE mode, though, because of the way DEBUG mode holds on to object references within scope).
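The weak-reference technique used in that test is not specific to .NET. As a rough analogue only, here is the same check sketched in Rust, where the payload is freed when the last strong reference is dropped rather than by a garbage collector. The send function and released_after_send are hypothetical stand-ins, not Rebus or MSMQ code.

```rust
use std::rc::Rc;

/// Hypothetical stand-in for a transport that should not retain the payload.
fn send(body: &Rc<Vec<u8>>) -> usize {
    body.len() // uses the payload but stores no clone of the Rc
}

/// Returns true if the payload was freed once the caller dropped it.
fn released_after_send(size: usize) -> bool {
    let body = Rc::new(vec![b'*'; size]);
    let weak_body = Rc::downgrade(&body); // does not keep the payload alive

    send(&body);
    drop(body); // release the caller's (only) strong reference

    // upgrade() yields None only when no strong reference is left anywhere,
    // so a transport that cached a clone would make this return false.
    weak_body.upgrade().is_none()
}

fn main() {
    println!("released: {}", released_after_send(32_768));
}
```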
I'll try and run the TimePrinter sample now and leave it running for a while to see if I can reproduce the issue. If you stumble upon more information about e.g. exactly which objects are leaking, it would be very helpful.
Thanks again for taking the time to report your worries to me :)
Followup:
I've modified the TimePrinter sample so that it sends 50 msg/s and includes a 64 KB random string payload with each message, and I've tracked the memory usage for almost four hours now. As you can see, it does not look like memory is being leaked.
I'll leave it running the rest of the day, just to be sure.
Maybe you can tell me some more about why you suspected there was a memory leak in the first place?
Update:
As you can see from the trace, it has now been running for 7 hours, and more than 1,200,000 messages containing more than 70 GB of data have been sent and consumed by the same process. If cached message bodies were leaking, I am pretty sure that we would have been able to see something rising on the graph.

"Throttled" async download in F#

I'm trying to download the 3000+ photos referenced from the xml backup of my blog. The problem I came across is that if just one of those photos is no longer available, the whole async gets blocked because AsyncGetResponse doesn't do timeouts.
ildjarn helped me put together a version of AsyncGetResponse which does fail on timeout, but using it gives a lot more timeouts, as though requests that are merely queued also time out. It seems like all the WebRequests are launched 'immediately'; the only way to make it work is to set the timeout to the time required to download all of them combined, which isn't great because it means I have to adjust the timeout depending on the number of images.
Have I reached the limits of vanilla async? Should I be looking at reactive extensions instead?
This is a bit embarrassing, because I've already asked two questions here on this particular bit of code, and I still haven't got it working the way I want!
I think there must be a better way to find out that a file is not available than using a timeout. I'm not exactly sure, but is there some way to make it throw an exception if a file cannot be found? Then you could just wrap your async code inside try .. with and you should avoid most of the problems.
Anyway, if you want to write your own "concurrency manager" that runs certain number of requests in parallel and queues remaining pending requests, then the easiest option in F# is to use agents (the MailboxProcessor type). The following object encapsulates the behavior:
type ThrottlingAgentMessage =
    | Completed
    | Work of Async<unit>
/// Represents an agent that runs operations concurrently. When the number
/// of concurrent operations exceeds 'limit', they are queued and processed later
type ThrottlingAgent(limit) =
    let agent = MailboxProcessor.Start(fun agent ->
        /// Represents a state when the agent is blocked
        let rec waiting () =
            // Use 'Scan' to wait for completion of some work
            agent.Scan(function
                | Completed -> Some(working (limit - 1))
                | _ -> None)
        /// Represents a state when the agent is working
        and working count = async {
            while true do
                // Receive any message
                let! msg = agent.Receive()
                match msg with
                | Completed ->
                    // Decrement the counter of work items
                    return! working (count - 1)
                | Work work ->
                    // Start the work item & continue in blocked/working state
                    async { try do! work
                            finally agent.Post(Completed) }
                    |> Async.Start
                    if count < limit then return! working (count + 1)
                    else return! waiting () }
        working 0)
    /// Queue the specified asynchronous workflow for processing
    member x.DoWork(work) = agent.Post(Work work)
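For comparison, the same throttling idea (run at most 'limit' operations at once and queue the rest) can be sketched in Rust. This is not a port of the agent above: it swaps the mailbox for a shared queue drained by a fixed number of OS threads, and run_throttled is a hypothetical name. The sleep in main merely stands in for a download.

```rust
use std::collections::VecDeque;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{mpsc, Arc, Mutex};
use std::thread;
use std::time::Duration;

/// Runs every job with at most `limit` of them executing at the same time.
/// Results are returned in the order the jobs were submitted.
fn run_throttled<T, F>(jobs: Vec<F>, limit: usize) -> Vec<T>
where
    T: Send + 'static,
    F: FnOnce() -> T + Send + 'static,
{
    let total = jobs.len();
    // Pending jobs, each tagged with its original position.
    let pending: VecDeque<(usize, F)> = jobs.into_iter().enumerate().collect();
    let queue = Arc::new(Mutex::new(pending));
    let (tx, rx) = mpsc::channel();

    // One worker per allowed concurrent job (at least one, never more than needed).
    let workers: Vec<_> = (0..limit.max(1).min(total))
        .map(|_| {
            let queue = Arc::clone(&queue);
            let tx = tx.clone();
            thread::spawn(move || loop {
                // The lock is released at the end of this statement,
                // so other workers are not blocked while the job runs.
                let next = queue.lock().unwrap().pop_front();
                match next {
                    Some((index, job)) => tx.send((index, job())).unwrap(),
                    None => break, // queue drained: this worker is done
                }
            })
        })
        .collect();
    drop(tx); // only the workers hold senders now

    let mut results: Vec<Option<T>> = (0..total).map(|_| None).collect();
    for (index, value) in rx {
        results[index] = Some(value);
    }
    for worker in workers {
        worker.join().unwrap();
    }
    results
        .into_iter()
        .map(|r| r.expect("every job reports a result"))
        .collect()
}

fn main() {
    let running = Arc::new(AtomicUsize::new(0));
    let peak = Arc::new(AtomicUsize::new(0));
    let jobs: Vec<_> = (0..20u64)
        .map(|i| {
            let (running, peak) = (Arc::clone(&running), Arc::clone(&peak));
            move || {
                let now = running.fetch_add(1, Ordering::SeqCst) + 1;
                peak.fetch_max(now, Ordering::SeqCst);
                thread::sleep(Duration::from_millis(5)); // stands in for a download
                running.fetch_sub(1, Ordering::SeqCst);
                i * 2
            }
        })
        .collect();
    let results = run_throttled(jobs, 4);
    println!("{} results, peak concurrency {}", results.len(), peak.load(Ordering::SeqCst));
}
```

Because only as many workers exist as the limit allows, a queued job does not start its own timeout clock until a worker picks it up, which is the property the question is after.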
Nothing is ever easy. :)
I think the issues you're hitting are intrinsic to the problem domain (as opposed to merely being issues with the async programming model, though they do interact somewhat).
Say you want to download 3000 pictures. First, in your .NET process, there is a setting (System.Net.ServicePointManager.DefaultConnectionLimit, if I remember the name right) that throttles the number of simultaneous HTTP connections your .NET process can run (and the default is just '2', I think). So you could find that control and set it to a higher number, and it would help.
But then next, your machine and internet connection have finite bandwidth. So even if you could try to concurrently start 3000 HTTP connections, each individual connection would get slower based on the bandwidth pipe limitations. So this would also interact with timeouts. (And this doesn't even consider what kinds of throttles/limits are on the server. Maybe if you send 3000 requests it will think you are DoS attacking and blacklist your IP.)
So this is really a problem domain where a good solution requires some intelligent throttling and flow-control in order to manage how the underlying system resources are used.
As in the other answer, F# agents (MailboxProcessors) are a good programming model for authoring such throttling/flow-control logic.
(Even with all that, if most picture files are like 1MB but then there is a 1GB file mixed in there, that single file might trip a timeout.)
Anyway, this is not so much an answer to the question, as just pointing out how much intrinsic complexity there is in the problem domain itself. (Perhaps it's also suggestive of why UI 'download managers' are so popular.)
Here's a variation on Tomas's answer, because I needed an agent which could return results.
type ThrottleMessage<'a> =
    | AddJob of (Async<'a>*AsyncReplyChannel<'a>)
    | DoneJob of ('a*AsyncReplyChannel<'a>)
    | Stop
/// This agent accumulates 'jobs' but limits the number which run concurrently.
type ThrottleAgent<'a>(limit) =
    let agent = MailboxProcessor<ThrottleMessage<'a>>.Start(fun inbox ->
        let rec loop(jobs, count) = async {
            let! msg = inbox.Receive() //get next message
            match msg with
            | AddJob(job) ->
                if count < limit then //if not at limit, we work, else loop
                    return! work(job::jobs, count)
                else
                    return! loop(job::jobs, count)
            | DoneJob(result, reply) ->
                reply.Reply(result) //send back result to caller
                return! work(jobs, count - 1) //no need to check limit here
            | Stop -> return () }
        and work(jobs, count) = async {
            match jobs with
            | [] -> return! loop(jobs, count) //if no jobs left, wait for more
            | (job, reply)::jobs -> //run job, post Done when finished
                async { let! result = job
                        inbox.Post(DoneJob(result, reply)) }
                |> Async.Start
                return! loop(jobs, count + 1) //job started, go back to waiting
        }
        loop([], 0)
    )
    member m.AddJob(job) = agent.PostAndAsyncReply(fun rep-> AddJob(job, rep))
    member m.Stop() = agent.Post(Stop)
In my particular case, I just need to use it as a 'one shot' 'map', so I added a static function:
    static member RunJobs limit jobs =
        let agent = ThrottleAgent<'a>(limit)
        let res =
            jobs
            |> Seq.map (fun job -> agent.AddJob(job))
            |> Async.Parallel
            |> Async.RunSynchronously
        agent.Stop()
        res
It seems to work ok...
Here's an out of the box solution:
FSharpx.Control offers an Async.ParallelWithThrottle function. I'm not sure it is the best implementation, as it uses SemaphoreSlim, but it is very easy to use, and since my application doesn't need top performance it works well enough for me. Since it is a library, if someone knows how to make it better, improving it would benefit everyone who just wants working code out of the box.
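To show what semaphore-based throttling looks like in general, here is a minimal sketch in Rust. It is not FSharpx's implementation: the Semaphore type is hand-rolled from a Mutex and a Condvar because the standard library has none, and parallel_with_throttle is a hypothetical name borrowed from the function mentioned above.

```rust
use std::sync::{Arc, Condvar, Mutex};
use std::thread;

/// A minimal counting semaphore; a sketch, not production code.
struct Semaphore {
    permits: Mutex<usize>,
    available: Condvar,
}

impl Semaphore {
    fn new(permits: usize) -> Self {
        Semaphore { permits: Mutex::new(permits), available: Condvar::new() }
    }

    /// Blocks until a permit is free, then takes it.
    fn acquire(&self) {
        let mut permits = self.permits.lock().unwrap();
        while *permits == 0 {
            permits = self.available.wait(permits).unwrap();
        }
        *permits -= 1;
    }

    /// Returns a permit and wakes one waiting thread.
    fn release(&self) {
        *self.permits.lock().unwrap() += 1;
        self.available.notify_one();
    }
}

/// Starts one thread per job, but lets only `limit` jobs run at a time.
/// Results are returned in submission order.
fn parallel_with_throttle<T, F>(limit: usize, jobs: Vec<F>) -> Vec<T>
where
    T: Send + 'static,
    F: FnOnce() -> T + Send + 'static,
{
    let semaphore = Arc::new(Semaphore::new(limit.max(1)));
    let handles: Vec<_> = jobs
        .into_iter()
        .map(|job| {
            let semaphore = Arc::clone(&semaphore);
            thread::spawn(move || {
                semaphore.acquire();
                // Note: a panicking job would leak its permit; a guard type
                // that releases on drop would be the more robust choice.
                let result = job();
                semaphore.release();
                result
            })
        })
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}

fn main() {
    let jobs: Vec<_> = (1..=5).map(|i| move || i + 100).collect();
    println!("{:?}", parallel_with_throttle(2, jobs));
}
```

The trade-off against a queue with a fixed set of workers is that every job gets its own thread up front and merely waits for a permit, which is simple but heavier when there are thousands of jobs.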
