Bytes to receive on gen_tcp:recv by parsing json

Bytes to receive on gen_tcp:recv by parsing json - tcp

Working on a chat server, I need to receive json via gen_tcp in erlang.
One way is to send a 4byte int header which is a good idea so i can also reject messages from clients if they exceed the max length but add complexity on client side.
Another way is to read line, should work too for json if i am not wrong.
Third idea is to read json using depth tracking (counting '{' maybe?)
That way i can also set max message length and make client code less complex.
How can i do it specially with erlang i.e. check number of square brackets opened and keep receiving till last closes? or if its even a good idea?
How does xmpp and other messaging protocols handle this problem?

Another way is to read line, should work too for json if i am not wrong.
Any key or value in json can contain a newline, and if your read protocol is: "Stop reading when a newline character is read from the socket.", you will not read the whole json if any key or value in the json has a newline character in it.
Third idea is to read json using depth tracking (counting '{' maybe?)
Ugh. Too complex. And json can start with a [ as well. And, a key or value could contain a ] or a } too.
The bottom line is: you need to decide on what should mark the end of a sent message. You could choose some relatively unique string like: --*456?END OF MESSAGE!123**--, but once again a key or value in the json could possibly contain that string--and that is why byte headers are used. You should be able to make an informed choice on how you want to proceed after reading this.

Related

QTextBrowser display issue in Qt 4.8.6 - always displayed wrongly

I hope to read some characters or strings and display them with QTextBrowse from serial port by Qt 4.8.6 and called the following functions( textBrowser is a object of QTextBrowser):
connect(com, SIGNAL(readyRead()), this, SLOT(readSerialPort()));
connect(textBrowser, SIGNAL(textChanged()), SimApplianceQtClass, SLOT(on_textBrowser_textChanged()));
void SimApplianceQt::on_textBrower_textChanged()
{
ui.textBrowser->moveCursor(QTextCursor::End);
}
void SimApplianceQt::readSerialPort()
{
QByteArray temp = com->readAll();
ui.textBrowser->insertPlainText(temp);
}
However, every time I cannot display characters or strings in the textBrowser rightly. Those input strings are always cut into smaller strings to be displayed in multiple lines in the textBrowser. For example, a string "0123456789" may be displayed as (in multiple lines):
01
2345
6789
How to deal with this issue? Many thanks.

What happens is that the readyRead signal is fired not after everything has been received, but after something has been received and is ready to read.
There is no guarantee that everything will have arrived or is readable by the time you receive the first readyRead.
This is a common "problem" for almost any kind of IO, especially if the data is larger than very few bytes. There is usually no automatic way to know when all the data has been received.
There are a few possible solutions:
All of them will require you to put the data in a buffer in readSerialPort() instead of adding it directly to the text browser. Maybe a simple QByteArray member variable in SimApplianceQt would already do the trick in your case.
The rest depends on the exact solution.
If you have access to the sender of the data, you could send the
number of bytes that will be sent before sending the actual string.
This must always be in an integer type of the same size (for
example, always a quint32). Then, in readSerialPort(), you would
first read that size, and then continue to read bytes to your buffer
in readSerialPort() until everything has been received. And then,
you could finally print it. I'd recommend that one. It is also what is used in almost all cases where this problem arises.
If you have access to the sender of the data, you could send some
kind of "ending sequence" at the end of the string. In your
readSerialPort(), you would then continue to read bytes into your
buffer until you receive that ending sequence. Once the ending
sequence has been received, you can print everything that came in
prior to it. Note that the ending sequence itself could be interrupted,
so you'd have to take care of that, too.
If you do not have access to the sender, the best idea I could come
up with would be to work with a timer. You put everything into a
buffer and re-start that timer each time you readSerialPort() is
called. When the timer runs out, that means no new data has been
sent for a while and you can probably print what you have so far.
This is... risky and I wouldn't recommend it if there is any other way.

Generating a multipart/byterange response without scanning the parts ahead of sending

I would like to generate a multipart byte range response. Is there a way for me to do it without scanning each segment I am about to send out, since I need to generate multipart boundary strings?
For example, I can have a user request a byterange that would have me fetch and scan 2GB of data, which in my case involves me loading that data into my (slow) VM as strings and so forth. Ideally I would like to simply state in the response that a part has a length of a certain number of bytes, and be done with it. Is there any tooling that could provide me with this option? I see that many developers just grab a UUID as the boundary and are probably willing to risk a tiny probability that it will appear somewhere within the part, but that risk seems to be small enough multiple people are taking it?
To explain in more detail: scanning the parts ahead of time (before generating the response) is not really feasible in my case since I need to fetch them via HTTP from an upstream service. This means that I effectively have to prefetch the entire part first to compute a non-matching multipart boundary, and only then can I splice that part into the response.

Assuming the data can be arbitrary, I don’t see how you could guarantee absence of collisions without scanning the data.
If the format of the data is very limited (like... base 64 encoded?), you may be able to pick a boundary that is known to be an illegal sequence of bytes in that format.
Even if your boundary does collide with the data, it must be followed by headers such as Content-Range, which is even more improbable, so the client is likely to treat it as an error rather than consume the wrong data.
Major Web servers use very simple strategies. Apache grabs 8 random bytes at startup and renders them in hexadecimal. nginx uses a sequential counter left-padded with zeroes.
UUIDs are designed to avoid collisions with other UUIDs, not with arbitrary data. A UUID is no more likely to be a good boundary than a completely random string of the same length. Moreover, some UUID variants include information that you may not want to disclose, such as your machine’s MAC address.
Ideally I would like to simply state in the response that a part has a length of a certain number of bytes, and be done with it. Is there any tooling that could provide me with this option?
Maybe you can avoid supporting multiple ranges and simply tell the clients to request each range separately. In that case, you don’t use the multipart format, so there is no problem.
If you do want to send multiple ranges in one response, then RFC 7233 requires the multipart format, which requires the boundary string.
You can, of course, invent your own mechanism instead of that of RFC 7233. In that case:
You cannot use 206 (Partial Content). You must use 200 (OK) or some other applicable status code.
You cannot use the multipart/byteranges media type. You must come up with your own media type.
You cannot use the Range request header.
Because a 200 (OK) response to a GET request is supposed to carry a (full) representation of the resource, you must do one of the following:
encode the requested ranges in the URL; or
use something like POST instead of GET; or
use a custom, non-standard status code instead of 200 (OK); or
(not sure if this is a correct approach) use media type parameters, send them in Accept, and add Accept to Vary.
The chunked transfer coding may be useful, but you cannot rely on it alone, because it is a property of the connection, not of the payload.

How to get raw (un-decoded) byte arrays from a TCPSocket in Ruby

I've been using ruby to setup a TCPSocket with a server and I've hit a snag. When receiving data from the socket (with either a socket.gets or a socket.recv) I get something like this:
x00\x03!\xB2\x00\x00*\xCF
What I get when I capture the packets in Wireshark is
x00\x03\x21\xB2\x00\x00\x2a\xCF
As you can see, the \x21 is decoded into the ASCII equivalent ! and the \x2a is decoded into the ASCII equivalent *.
I've checked and googled a ton of times and have not yet found a solution to get the raw, un-decoded information. I have a parser built that will search for the relevant data from the stream and grab what I need, but I don't want to have to waste time re-encoding it before I have to decode it. Or, I incorporate ASCII into my parser, but that would be a huge pain. There is a lot of bytes in this stream and to re-encode them all would be time consuming. I also see that netcat returns the same output from the TCP stream that ruby does. I could not figure out how to get netcat to output the un-decoded byte arrays either.
Code:
require 'socket'
s = TCPSocket.new "10.0.0.3", 27000
while true do
item = s.recv(5000)
puts item
puts item.inspect
end
This is my first foray into socket programming, so I apologize if I missed something very obvious.

I kind of invented the problem in my head, I am dumb.
To solve this, all you need to do is take the string of TCP information and call unpack("H*") on it like this:
"x00\x03!\xB2\x00\x00*\xCF".unpack("H*")
=> ["7830300321b200002acf"]
Which is exactly like x00\x03\x21\xB2\x00\x00\x2a\xCF
Now I just need to adjust my parser to split it or deal with the big clump of byte arrays
More info on unpack

How to handle passing different types of serialized messages on a network

I'm currently sitting with the problem of passing messages that might contain different data over a network. I have created a prototype of my game, and now I'm busy implementing networking for my game.
I want to send different types of messages, as I think it would be silly to constantly send all the information every network-tick and I would rather send different messages that contain different data. What would be the best way to distinguish what message is received on the receiving side?
Currently I have a system where I prepend a string which distinguishes a certain type of message. My message is then sent through my own message parser class where it determines the type, and deserializes it to the correct type.
What I would like to know is if there is a better way of doing this? It seems like it should be a fairly common problem and so there must be a more trivial solution, unless I'm already doing it the trivial way.
Thanks!

I have read again carefully your question, and now I do not understand what is your problem, you say Currently I have a system where I prepend a string which distinguishes a certain type of message. My message is then sent through my own message parser class where it determines the type, and deserializes it to the correct type.
Looks OK, you may reduce the size of your message with my answer below horizontal line but the principle stays identical.
This the right way for asynchronous communication, but if you do synchrone you know that when you send A message you will receive B answer, so you do not have to prepend with a string which distinguishes the message, but you have to take care not sending another message before having the answer from the previous ...
So if you know how is formatted the answer you do not need any identification bytes, for example you know that the first four bytes is an integer, then a float on eight bytes, etc ...
Use boost::serialization, typically you save your structures, even with pointers, within a dumb bytes buffer, send that buffer over your network, and the other side de-serialize.
This example shows how Boost.Serialization can be used with asio to encode and decode structures for transmission over a socket.
Even if it is using boost::asio you could extract only the serialization part easily.

How to properly determine that the end-of-input has been reached in a QTCPSocket?

I have the following code that reads from a QTCPSocket:
QString request;
while(pSocket->waitForReadyRead())
{
request.append(pSocket->readAll());
}
The problem with this code is that it reads all of the input and then pauses at the end for 30 seconds. (Which is the default timeout.)
What is the proper way to avoid the long timeout and detect that the end of the input has been reached? (An answer that avoids signals is preferred because this is supposed to be happening synchronously in a thread.)

The only way to be sure is when you have received the exact number of bytes you are expecting. This is commonly done by sending the size of the data at the beginning of the data packet. Read that first and then keep looping until you get it all. An alternative is to use a sentinel, a specific series of bytes that mark the end of the data but this usually gets messy.

If you're dealing with a situation like an HTTP response that doesn't contain a Content-Length, and you know the other end will close the connection once the data is sent, there is an alternative solution.
Use socket.setReadBufferSize to make sure there's enough read buffer for all the data that may be sent.
Call socket.waitForDisconnected to wait for the remote end to close the connection
Use socket.bytesAvailable as the content length
This works because a close of the connection doesn't discard any buffered data in a QTcpSocket.

Categories

Develop Reference

r css asp.net wordpress firebase qt symfony nginx http apache-flex