Invalid data handling over TCP socket

I have a GUI application that sends and receives data over TCP to a server.
Sometimes we get junk data while doing a TCP recv from the server, and while reading these nulls or invalid data the client application sometimes crashes.
Is there a good way to validate this data, other than catching the exception?
I don't want the GUI application to crash because of bad data sent by the server.

TCP has a checksum that it uses to validate the data received; that is done by the operating system (or sometimes by the network hardware, if you have nice hardware). So if the content you receive is junk, then with very high probability it is the same junk that was actually sent. I only mention this because I'm not sure you were aware of it.
If you need to validate the data, you will have to validate the data. Write a function that parses your data, and returns a meaningful value only if there's meaningful data. Make your GUI aware of this.
Your question is kind of self-answering... you can't say "I want to be fault-tolerant, but I don't want to care about faults" ("other than catching this exception"), and based on the lack of description of the data you'd expect, I'd say you don't really care about the form of the data.
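For instance, here is a minimal sketch of such a validation function in C++, assuming a hypothetical message format of printable ASCII terminated by a newline (your real format will differ):

    #include <cctype>
    #include <cstddef>
    #include <optional>
    #include <string>

    // Hypothetical validator: accept only messages made of printable ASCII
    // terminated by '\n'; anything else (nulls, junk bytes) is rejected
    // instead of being handed to the GUI.
    std::optional<std::string> parse_message(const char *buf, std::size_t len)
    {
        std::string msg;
        for (std::size_t i = 0; i < len; ++i) {
            unsigned char c = static_cast<unsigned char>(buf[i]);
            if (c == '\n')
                return msg;              // complete, well-formed message
            if (!std::isprint(c))
                return std::nullopt;     // junk: report it, don't crash
            msg.push_back(static_cast<char>(c));
        }
        return std::nullopt;             // incomplete: wait for more data
    }

The GUI then only ever sees values for which the parser returned something meaningful; everything else gets logged or reported instead of rendered.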

Related

How to read stream data from a TCP socket in Swift 2?

Let's suppose I have a custom server that listens for connections on some port and, once it has received a connection, starts sending data (sort of a logger). Here's the first question:
Can it be just binary data? Actually, I need just two non-zero 8-bit values, and I was thinking of using a 0-value byte to separate each new portion of data.
These three bytes will be sent once or may be twice a second.
So, now I am looking for some code snippet in Swift 2 to properly read this data. Normally, I would expect calling
connectSocket(IP,port)
which would connect to the socket, and once it receives the first chunk of data,
socketCallBack()
is called, or something like that.
Intuitively, I don't like the idea of checking data in a while (true) loop. Or is this the proper way?
I've seen an example where the client first sends a 'get' request to the server and immediately starts waiting for the response. Probably I could call that on a timer, once a second? Would that be correct?
What I am concerned about is traffic. Right now I have implemented this through a web server, but I don't like that it spends way too much traffic on the HTTP overhead.
Probably, with TCP connections on a timer, that would be much less, and it would save even more traffic if I establish just one connection at the beginning and transmit the data within that connection. Am I right?
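As a side note, the framing described above (two non-zero bytes followed by a 0-value separator) is simple to parse incrementally in any language; here is a minimal sketch of the splitting logic, shown in C++ with hypothetical names, just to illustrate the idea:

    #include <cstddef>
    #include <cstdint>
    #include <utility>
    #include <vector>

    // Hypothetical incremental parser for the framing described above:
    // two non-zero bytes per sample, each sample terminated by a 0 byte.
    struct SampleParser {
        std::vector<std::uint8_t> pending;   // bytes of the current, unfinished sample

        // Feed whatever the socket returned; emit every complete pair.
        std::vector<std::pair<std::uint8_t, std::uint8_t>>
        feed(const std::uint8_t *data, std::size_t len) {
            std::vector<std::pair<std::uint8_t, std::uint8_t>> out;
            for (std::size_t i = 0; i < len; ++i) {
                if (data[i] == 0) {              // separator: sample is complete
                    if (pending.size() == 2)
                        out.emplace_back(pending[0], pending[1]);
                    pending.clear();             // malformed samples are dropped
                } else {
                    pending.push_back(data[i]);
                }
            }
            return out;
        }
    };

Because the parser keeps its own pending buffer, it does not matter how the bytes arrive; partial reads that split a sample across two recv calls are handled naturally.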

If TCP is a data stream, how do messages get created?

The closest resemblance to my question was posted here.
However, I am still having trouble understanding how a TCP data stream creates "messages", if you will. Aren't messages things that happen after xx amount of time? A TCP stream is a constant flow of data.
For example, take a game server running at 30 Hz. If messages are sent out 30 times a second, it must be using something internally to do that; what is it?

recv() data of unknown size with Berkeley Sockets

I have code in C++ in which I use recv() from Berkeley sockets to receive data from a remote host. The issue is that I do not know the size of the data (which is variable), so I probably need some kind of timeout option to make this work.
Since I'm new to sockets programming, I was wondering how, for example, a web client handles responses from a server (e.g. a server sending HTML data to the client). Does it use some kind of timeout, since it doesn't know how big the page is? Same with an FTP client.
When your data is of variable length, that data is typically framed within another container. That is to say, there's a header preceding the actual data block that tells the receiver how much data it should accept.
For example, HTTP uses newline characters to delimit the headers. If there is a variable-length message body, the header will include a "Content-Length:" field that indicates exactly how many bytes to read once the entire header has been received (the header ends when you read two consecutive newlines).
It is perfectly fine to read 4 bytes from the socket, learn how much data follows, then do another receive and read the rest. Only be careful: when you ask for 4 bytes, the socket might give you anywhere between 1 and 4, so anything less than 4 means you need to go back and ask for the remaining bytes. This is a very common mistake. In a dev environment you will almost always get 4 bytes when asking for 4, but once you deploy your app, somewhere on some machine you will get random crashes because the network behavior there is somehow different.
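For illustration, here is a minimal sketch in C++ of that receive loop, assuming a hypothetical framing of a 4-byte big-endian length prefix followed by the payload:

    #include <cstddef>
    #include <cstdint>
    #include <vector>
    #include <sys/types.h>
    #include <sys/socket.h>
    #include <arpa/inet.h>   // ntohl

    // Keep calling recv() until exactly len bytes have arrived (or the peer
    // closes / an error occurs). recv() may return fewer bytes than asked
    // for, so a single call is never enough.
    static bool recv_exact(int fd, void *buf, std::size_t len)
    {
        char *p = static_cast<char *>(buf);
        while (len > 0) {
            ssize_t n = recv(fd, p, len, 0);
            if (n <= 0)                  // 0 = connection closed, <0 = error
                return false;
            p += n;
            len -= static_cast<std::size_t>(n);
        }
        return true;
    }

    // Hypothetical framing: 4-byte big-endian length prefix, then the payload.
    static bool recv_message(int fd, std::vector<char> &payload)
    {
        std::uint32_t netlen;
        if (!recv_exact(fd, &netlen, sizeof(netlen)))
            return false;
        std::uint32_t len = ntohl(netlen);
        payload.resize(len);
        return len == 0 || recv_exact(fd, payload.data(), len);
    }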
Generally, it is a bad approach to rely on timeouts to determine when you have reached the end of the data. With a timeout you might get things "reliably" working in a well-controlled dev environment, but it is a very flaky solution. Any CPU/disk/network hiccup might cause your app to stop receiving prematurely. You are also limiting your data throughput and responsiveness, since your app is sleeping for some time interval instead of doing work.

Web server - how to parse requests? Asynchronous Stream Tokenizer?

I'm attempting to create a simple webserver in C# in asynchronous socket programming style. The purpose is very narrow - a Comet server (http long-polling).
I've got the windows service running, accepting connections, dumping request info to the Console and returning simple fixed content to the client.
Now, I can't figure out a manageable strategy for parsing the request data asynchronously and safely. I've written synchronous LL(1) parsers before, but I'm not sure whether an LL(1) parser is appropriate or necessary for HTTP. I don't know how to tokenize the input stream asynchronously. All I can think of is having an input buffer per client, reading into that, copying it to a StringBuilder, and periodically checking whether I have a complete request. But that seems inefficient and might lead to code that is difficult to debug and maintain.
Also, there are two phases to the connection: receiving the request in full, and then sending a response (in this case, after some delay). Only once the request is validated and actionable am I planning to enroll the connection in the long-polling manager. However, a misbehaving client could continue to send data and fill up a buffer, so I think I need to keep monitoring and emptying the input stream during the response phase, right?
Any guidance on this is appreciated.
I guess the first step is knowing whether it is possible to efficiently tokenize a network stream asynchronously and without a large intermediate buffer. Even without a proper parser, the same challenges of creating a tokenizer apply to reading "lines" of input at a time, or even reading until a double blank line (one big token). I don't want to read one byte at a time from the network, but neither do I want to read too many bytes and have to store them in some intermediate buffer, right?
For HTTP, the best way is to read the headers completely into memory (until you receive \r\n\r\n), then split on \r\n to get the individual header lines, and split each header line on ":" to separate the name from the value.
There's no need to use a complex parser for that.
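A minimal sketch of that splitting, shown here in C++ rather than C# just to illustrate the idea (the buffer handling and names are assumptions, not the asker's code):

    #include <cstddef>
    #include <map>
    #include <sstream>
    #include <string>

    // Hypothetical header parser: raw is everything read up to and including
    // the "\r\n\r\n" that ends the header block.
    std::map<std::string, std::string> parse_headers(const std::string &raw)
    {
        std::map<std::string, std::string> headers;
        std::istringstream in(raw);
        std::string line;
        std::getline(in, line);              // request line, e.g. "GET / HTTP/1.1"
        while (std::getline(in, line)) {
            if (!line.empty() && line.back() == '\r')
                line.pop_back();             // strip the '\r' left by getline
            if (line.empty())
                break;                       // blank line: end of headers
            std::size_t colon = line.find(':');
            if (colon == std::string::npos)
                continue;                    // malformed header line, skip it
            std::string name = line.substr(0, colon);
            std::string value = line.substr(colon + 1);
            if (!value.empty() && value.front() == ' ')
                value.erase(0, 1);           // drop the usual space after ':'
            headers[name] = value;
        }
        return headers;
    }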

Are CRC's required for critical webservice calls?

I have data that is stored on a local machine and periodically replicated using webservices. This data is critical to the operation of this program and is along the lines of business transactions.
TransactionHeader JOIN TransactionDetail
So forth.
Should I be using some type of CRC checking when sending the data to the webservice or is this handled by the TCP protocol itself sufficiently?
EDIT: Just to be clear, the data isn't deleted from the client until the server acknowledges receipt, and I use strongly typed parameters in my webservice, but I am thinking more about "mangled" data (although in all cases but strings it should theoretically fail datatype casting).
Normally TCP does a fine job of transferring data intact, but if that data is business-critical then you shouldn't leave checking it up to TCP; you should use a good hash function.
At the TCP level everything is reduced to byte strings, so if you transferred a number and that number was changed in transit by an error but still happened to be a number on the other side, datatype casting wouldn't catch it.
If the main problem you're dealing with is transfer checking, then CRC32 or the like would work fine, but if the hashes are used to verify the data after it has been received and stored, a much better hash like SHA-1 should be used.
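For the transfer-checking case, a plain CRC-32 is cheap to compute. Here is a minimal sketch in C++ using the common reflected IEEE polynomial; for verifying stored data you would instead call a real crypto library for SHA-1 or stronger:

    #include <cstddef>
    #include <cstdint>

    // Plain bitwise CRC-32 (reflected IEEE polynomial 0xEDB88320), good
    // enough to detect accidental corruption of a payload in transit.
    std::uint32_t crc32(const std::uint8_t *data, std::size_t len)
    {
        std::uint32_t crc = 0xFFFFFFFFu;
        for (std::size_t i = 0; i < len; ++i) {
            crc ^= data[i];
            for (int bit = 0; bit < 8; ++bit)
                crc = (crc >> 1) ^ ((crc & 1u) ? 0xEDB88320u : 0u);
        }
        return ~crc;
    }

The sender computes the CRC over the payload and sends it alongside; the receiver recomputes it after storing the data and rejects the transaction on a mismatch.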
