I'm attempting to create a simple web server in C# using asynchronous socket programming. The purpose is very narrow: a Comet server (HTTP long-polling).
I've got the windows service running, accepting connections, dumping request info to the Console and returning simple fixed content to the client.
Now, I can't figure out a manageable strategy for parsing the request data asynchronously and safely. I've written synchronous LL(1) parsers before, but I'm not sure an LL(1) parser is appropriate or necessary for HTTP, and I don't know how to tokenize the input stream asynchronously. All I can think of is having an input buffer per client, reading into that, then copying it into a StringBuilder and periodically checking whether I have a complete request. But that seems inefficient and might lead to code that is difficult to debug and maintain.
Also, there are two phases to the connection: receiving the request in full, and then sending a response - in this case, after some delay. Only once the request is validated and actionable do I plan to enroll the connection in the long-polling manager. However, a misbehaving client could continue to send data and fill up a buffer, so I think I need to keep monitoring and draining the input stream during the response phase, right?
Any guidance on this is appreciated.
I guess the first step is knowing whether it is possible to efficiently tokenize a network stream asynchronously and without a large intermediate buffer. Even without a proper parser, the same challenges of building a tokenizer apply to reading "lines" of input at a time, or even reading until the blank line that ends the headers (one big token). I don't want to read one byte at a time from the network, but neither do I want to read too many bytes and have to store them in some intermediate buffer, right?
For HTTP the best approach is to read the headers completely into memory (until you receive \r\n\r\n) and then simply split on \r\n to get the individual headers, and split each header on : to separate the name from the value.
There's no need to use a complex parser for that.
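For illustration, here is a minimal sketch of that splitting step (the original question is C#, but the logic is language-agnostic; the sketch below is C++, the function name is mine, and it assumes the header block up to the blank line has already been accumulated):

    #include <map>
    #include <string>

    // Split an already-received header block on "\r\n" to get individual
    // header lines, then split each line on the first ':' to separate the
    // header name from its value.
    std::map<std::string, std::string> parseHeaders(const std::string& headerBlock)
    {
        std::map<std::string, std::string> headers;
        size_t pos = 0;
        while (pos < headerBlock.size()) {
            size_t eol = headerBlock.find("\r\n", pos);
            if (eol == std::string::npos) eol = headerBlock.size();
            std::string line = headerBlock.substr(pos, eol - pos);
            pos = (eol == headerBlock.size()) ? eol : eol + 2;

            size_t colon = line.find(':');
            if (colon == std::string::npos) continue;   // request line or malformed line: skip
            std::string name  = line.substr(0, colon);
            std::string value = line.substr(colon + 1);
            if (!value.empty() && value[0] == ' ')      // drop the space after ':'
                value.erase(0, 1);
            headers[name] = value;
        }
        return headers;
    }

The request line (e.g. GET / HTTP/1.1) is simply skipped here because it has no colon; in a real server you would parse it separately before handling the headers.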
I'm supposed to duplicate a binary TCP Stream.
So I set up a NiFi 1.9.0 server, put in a ListenTCP processor and a PutTCP processor, configured the proper IPs and Ports and connected them.
So far so good, the packets were received by the ListenTCP processor and also forwarded by the PutTCP processor.
But NiFi seems to mess around with the data somehow, the sent packets aren't exactly the same as received. I expected NiFi to just forward everything 1:1 but something is happening and I cannot find out what.
I've been playing around with the Character Set, Max Batch Size and Batching Message Delimiter settings on the ListenTCP processor, and also with the Outgoing Message Delimiter and Character Set on the PutTCP processor.
I also messed around with a MergeContent processor but didn't get it to work properly.
Here you can see the difference between received (red) and sent data (captured using tcpflow).
Link to picture
Another problem is that I don't really know the data I'm processing, it says in the documentation:
These log files are in the machine-readable binary format that is described by the XML file called ebm.xml.
and
The streamed events are in the TCP-based binary format.
I do have access to the ebm.xml file, but I'm not sure how I can make use of it.
Anyone an idea how I can get NiFi to simply forward everything?
I'm new to NiFi, so I might have missed some possibilities...
The ListenTCP processor reads data from the stream using a new-line character as a logical message separator. For example, if the stream had:
<chunk1><new-line><chunk2><new-line><chunk3><new-line>
It would result in reading chunk1, chunk2, and chunk3 into an internal queue.
When it writes them back out it uses the outgoing message delimiter. So the outgoing flow file would be:
<chunk1><outgoing-delim><chunk2><outgoing-delim><chunk3><outgoing-delim>
Unfortunately it is more geared towards receiving textual data such as logs which are typically line-delimited. The chunks should be passing through unaltered as byte[], but typically binary data wouldn't have these logical new-line boundaries, so I'm not sure how well it works for that.
Let's suppose, I have a custom server that listens to connections on some port and once it has received a connection, it starts sending data (sort of a logger). Here's the first question:
Can it be just binary data? Actually, I need just two non-zero 8-bit values, and I was thinking of 0-value byte to separate each new portion of data.
These three bytes will be sent once or may be twice a second.
So, now I am looking for some code snippet in Swift 2 to properly read this data. Normally, I would expect calling
connectSocket(IP,port)
which would connect to the socket, and once it receives the first chunk of data,
socketCallBack()
is called, or something like that.
Intuitively, I don't like the idea of checking data in a while (true) loop. Or is this the proper way?
I've seen an example where the client first sends a 'get' request to the server and immediately starts waiting for the response. Presumably I could do that on a timer, once a second? Would that be correct?
What I am concerned about is traffic. Right now I have implemented this through a web server, but I don't like that it spends far too much traffic on the HTTP overhead.
Presumably, with TCP connections on a timer that overhead would be much smaller, and it would save even more traffic if I established just one connection at the beginning and transmitted the data over that connection. Am I right?
I have a GUI application that sends/recv over tcp to a server.
Sometimes we get junk data while doing a TCP recv from the server, and while reading these nulls or other invalid data the client application sometimes crashes.
Is there a good way to validate this data, other than catching the exception?
I don't want the GUI application to crash because of bad data sent by the server.
TCP has a checksum that it uses to validate the data it receives; that is done by the operating system (or sometimes by the network hardware, if you have nice hardware). So if the contents are not what you expect, it is very likely that the data was actually sent that way rather than corrupted in transit. I only mention this because I'm not sure you were aware of it.
If you need to validate the data, you will have to validate the data. Write a function that parses your data, and returns a meaningful value only if there's meaningful data. Make your GUI aware of this.
Your question is kind of self-answering... you can't say "I want to be fault-tolerant, but I don't want to care about faults" ("other than catching this exception"), and based on the lack of description of the data you'd expect, I'd say you don't really care about the form of the data.
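As a sketch of that parse-and-validate idea (the message layout below is invented purely for illustration; substitute whatever framing your server actually uses), the point is that the parser returns "no value" instead of throwing when the bytes don't make sense:

    #include <cstdint>
    #include <optional>
    #include <vector>

    // Hypothetical framing: 1-byte type, 2-byte big-endian payload length,
    // then the payload. Returns an empty optional for anything malformed.
    struct Message {
        uint8_t type;
        std::vector<uint8_t> payload;
    };

    std::optional<Message> tryParse(const std::vector<uint8_t>& buf)
    {
        if (buf.size() < 3)                      // too short to even hold a header
            return std::nullopt;
        uint16_t len = static_cast<uint16_t>((buf[1] << 8) | buf[2]);
        if (buf.size() < 3u + len)               // truncated payload
            return std::nullopt;
        Message m;
        m.type = buf[0];
        m.payload.assign(buf.begin() + 3, buf.begin() + 3 + len);
        return m;                                // only now is the data "meaningful"
    }

The GUI code then only acts when tryParse() returns a value, so junk bytes never reach the rest of the application.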
I have some code in C++ in which I use recv() from Berkeley sockets to receive data from a remote host. The issue is that I do not know the size of the data (which is variable), so I probably need some kind of timeout option to make this work.
Since I'm new to socket programming, I was wondering how, for example, a web client handles responses from a server (e.g. a server sending HTML data to the client). Does it use some kind of timeout, since it doesn't know how big the page is? Same question for an FTP client.
When your data is of variable length, it is typically framed within another container. That is to say, there is a header preceding the actual data block that tells the receiver how much data it should accept.
For example, HTTP uses newline characters to delimit the headers. If there is a variable-length message body, the header will include a "Content-Length:" field that indicates exactly how many bytes to read once the entire header has been received (the header ends when you read two consecutive newlines).
It is perfectly fine to read 4 bytes from the socket, find out how much data follows, then do another receive to read the rest. Just be careful: when you ask for 4 bytes, the socket might give you anywhere between 1 and 4 bytes, so anything less than 4 means you need to go back and ask for the remaining bytes. This is a very common mistake. In a dev environment you will almost always get 4 bytes when asking for 4, but once you deploy your app, somewhere on some machine you will get random crashes because its network behavior is somehow different.
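A minimal sketch of that receive loop with Berkeley sockets (the helper name recvAll is mine); you would call it once for the 4-byte length prefix and once more for the body:

    #include <sys/types.h>
    #include <sys/socket.h>
    #include <cstddef>

    // Keep calling recv() until exactly `len` bytes have arrived, or the peer
    // closes the connection / an error occurs. Returns true only on a full read.
    bool recvAll(int sock, char* buf, size_t len)
    {
        size_t got = 0;
        while (got < len) {
            ssize_t n = recv(sock, buf + got, len - got, 0);
            if (n <= 0)              // 0 = orderly shutdown, < 0 = error
                return false;
            got += static_cast<size_t>(n);
        }
        return true;
    }

Remember that a multi-byte length prefix is usually transmitted in network byte order, so convert it (e.g. with ntohl()) before using it as a byte count.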
Generally, it is a bad approach to rely on timeouts to determine when you have reached the end of the data. With a timeout you might get things "reliably" working in a well-controlled dev environment, but it is a very flaky solution. Any CPU/disk/network hiccup might cause your app to stop receiving prematurely. You are also limiting your data throughput and responsiveness, since your app is sleeping for some time interval instead of doing work.
I am writing a 2D multiplayer game consisting of two applications, a console server and a windowed client. So far, the server has an FD_SET which is filled with connected clients, a list of my game object pointers and some other things. In main(), I initialize listening on a socket and create three threads: one for accepting incoming connections and placing them within the FD_SET, another for processing the objects' location, velocity and acceleration and flagging them (if needed) as ones that have to be updated on the client, and a third that uses send() to send update info for every object (iterating through the list of object pointers). Such a packet consists of an operation code, the packet size and the actual data.
On the client I parse it by reading the first 5 bytes (the opcode and packet size), which are received correctly, but when I want to read the remaining part of the packet (since I now know its size), I get a WSAECONNABORTED (error code 10053). I've read about this error, but can't see why it occurs in my application. Any help would be appreciated.
The error means the system closed the socket. This could be because it detected that the client disconnected, or because it was sending more data than you were reading.
A parser for network protocols typically needs a lot of work to make it robust, and you can't tell how much data you will get in a single read(): you may get more than your operation code and packet size in the first chunk you read, or you might even get less (e.g. only the operation code). Double check that this isn't happening in your failure case.
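To make that concrete, here is one way (a hedged sketch, not your actual code) to accumulate incoming bytes per connection and only act once a complete opcode + size + body packet is present; handlePacket is a placeholder for whatever the client does with a parsed packet, and `size` is taken to mean the body length (adjust if your size field counts the header too):

    #include <cstdint>
    #include <cstring>
    #include <vector>

    // Append whatever recv() returned to a per-connection buffer, then pull
    // out complete packets. Assumes the layout from the question: a 1-byte
    // opcode and a 4-byte size, followed by `size` bytes of data.
    void onBytesReceived(std::vector<uint8_t>& buffer, const uint8_t* data, size_t n)
    {
        buffer.insert(buffer.end(), data, data + n);

        while (buffer.size() >= 5) {                   // enough for opcode + size?
            uint32_t size = 0;
            std::memcpy(&size, buffer.data() + 1, 4);  // adjust for byte order if needed
            if (buffer.size() < 5u + size)             // body not fully received yet
                break;
            // handlePacket(buffer[0], buffer.data() + 5, size);  // hypothetical handler
            buffer.erase(buffer.begin(), buffer.begin() + 5 + size);
        }
    }

With this structure it doesn't matter how the stream is chopped up: a packet is only handled once all of its bytes have arrived, however many read calls that takes.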