As you probably know, there are three modes of gen_tcp: {active, false}, {active, true} and {active, once}.
I have read some documentation about {active, false}, {active, true} and {active, once}, but I still don't get it.
What is the difference between {active, false}, {active, true} and {active, once}?
Could you please explain plainly?
It's about flow control: you have an Erlang process handling incoming network traffic. Usually you want it to react to incoming packets quickly, but you don't want its queue of messages to grow faster than it can process them - though in certain cases you'll have different goals.
With {active, false}, you have explicit control of when the process receives incoming traffic: it only happens when you call gen_tcp:recv. However, while the process is waiting in gen_tcp:recv, it cannot receive other Erlang messages. Perhaps some other Erlang process is sending a message telling it to stop, but it doesn't know that yet because it's concentrating on getting network input.
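For example, a minimal passive-mode loop might look like the sketch below, where handle_data/1 is a hypothetical handler:

    passive_loop(Socket) ->
        %% Blocks here until data arrives; Length 0 means "whatever is available".
        case gen_tcp:recv(Socket, 0) of
            {ok, Data} ->
                handle_data(Data),        %% hypothetical handler
                passive_loop(Socket);
            {error, closed} ->
                ok
        end.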
With {active, true}, network input gets sent to the process as a message as soon as it is available. That means that you could have a receive expression that expects both network traffic and simple Erlang messages from other processes. This mode of operation could be useful if you're confident that your process can handle the input faster than it arrives, but you could end up with a long message queue that never gets cleared.
{active, once} is a compromise between the two: you receive incoming data as Erlang messages, meaning that you can mix network traffic with other work, but after receiving a packet you need to explicitly call inet:setopts with {active, once} again to receive more data, so you get to decide how quickly your process receives messages.
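A minimal sketch of such a loop; handle_data/1 and the stop message are illustrative assumptions:

    active_once_loop(Socket) ->
        ok = inet:setopts(Socket, [{active, once}]),  %% re-arm for one more packet
        receive
            {tcp, Socket, Data} ->
                handle_data(Data),                    %% hypothetical handler
                active_once_loop(Socket);
            {tcp_closed, Socket} ->
                ok;
            stop ->
                %% an ordinary Erlang message from another process
                gen_tcp:close(Socket)
        end.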
Since Erlang/OTP 17.0 there is yet another option, {active, N}, where N is an integer. That means that you can receive N messages before you have to call inet:setopts again. That could give higher throughput without having to give up flow control.
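A sketch of the same idea with {active, N}, assuming the socket starts with {active, 100} set; when the counter reaches zero the socket delivers a tcp_passive message and you grant another batch (handle_data/1 is again a hypothetical handler):

    active_n_loop(Socket) ->
        receive
            {tcp, Socket, Data} ->
                handle_data(Data),                    %% hypothetical handler
                active_n_loop(Socket);
            {tcp_passive, Socket} ->
                %% the counter hit zero: grant another batch of 100 messages
                ok = inet:setopts(Socket, [{active, 100}]),
                active_n_loop(Socket);
            {tcp_closed, Socket} ->
                ok
        end.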
{active, false}
You have to read a chunk of data from the socket by calling gen_tcp:recv().
{active, true}
Erlang automatically reads chunks of data from the socket for you, gathers the chunks into a complete message, and puts the message in the process mailbox. You read the messages using a receive expression. If some hostile actor floods your socket with data, your mailbox grows without bound and your process (or even the whole node) can run out of memory and crash.
{active, once}
Equivalent to {active, true} for the first message read from the socket, then {active, false} for any subsequent data until you set {active, once} again.
You also need to understand how specifying {packet, N} influences things. See here: Erlang gen_tcp not receiving anything.
I want to manage HTTP or RTSP sessions with Erlang.
For example, a standard session for the RTSP protocol looks like:
OPTIONS rtsp://192.168.1.55/test/ RTSP/1.0\r\n
CSeq: 1\r\n
User-Agent: VLC media player (LIVE555 Streaming Media v2008.07.24)\r\n
...
PLAY rtsp://192.168.1.55/test/ RTSP/1.0\r\n
CSeq: 5\r\n
Session: 1\r\n
Range: npt=0.000-\r\n
User-Agent: VLC media player (LIVE555 Streaming Media v2008.07.24)\r\n
The length of every message is different.
In Erlang, gen_tcp:listen takes the option {active, true} (to receive an unlimited quantity of data) or {active, false} (to receive a fixed length of data).
Is there a recommended way to receive and parse such variable-length messages?
For HTTP, use one of the HTTP packet modes documented for the inet:setopts/2 function. For example, to set a socket to receive HTTP messages as binaries, you can set {packet, http_bin} on the socket. Have a look at my simple web server example to see how to use the HTTP packet modes.
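For instance, with {packet, http_bin} each recv returns an already-parsed term instead of raw bytes. A rough sketch of reading the request line and headers on a passive-mode socket (representing the request as a map is just for illustration):

    read_request(Socket) ->
        %% with {packet, http_bin} this returns a parsed request-line tuple
        {ok, {http_request, Method, Uri, Version}} = gen_tcp:recv(Socket, 0),
        read_headers(Socket, #{method => Method, uri => Uri, version => Version}).

    read_headers(Socket, Req) ->
        case gen_tcp:recv(Socket, 0) of
            {ok, {http_header, _, Name, _, Value}} ->
                read_headers(Socket, Req#{Name => Value});
            {ok, http_eoh} ->
                {ok, Req}   %% end of headers; switch to raw mode to read any body
        end.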
For RTSP, there's no built-in packet parser, but because RTSP headers are line-oriented like HTTP, you can do your own header parsing using the {packet, line} mode. In that mode, you'll receive one header at a time until you receive an empty line indicating the end of the headers. You can then change the socket to {packet, raw} mode to receive any message body. The Content-Length header, if present, indicates the size of the message body.
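A sketch of that approach on a passive-mode socket opened with the binary option; the function names are illustrative:

    read_rtsp_headers(Socket) ->
        ok = inet:setopts(Socket, [{packet, line}]),
        read_rtsp_headers(Socket, []).

    read_rtsp_headers(Socket, Acc) ->
        case gen_tcp:recv(Socket, 0) of
            {ok, <<"\r\n">>} ->
                %% empty line: end of headers, switch to raw mode for any body
                ok = inet:setopts(Socket, [{packet, raw}]),
                {ok, lists:reverse(Acc)};
            {ok, HeaderLine} ->
                read_rtsp_headers(Socket, [HeaderLine | Acc])
        end.

    read_body(_Socket, 0) ->
        {ok, <<>>};
    read_body(Socket, ContentLength) ->
        %% ContentLength comes from the Content-Length header, if present
        gen_tcp:recv(Socket, ContentLength).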
The {active, true} vs {active, false} socket modes you mention control how data arrive at the controlling process (owner) of the socket.
The {active, true} mode sends all data from the socket to the controlling process as soon as they arrive. In this mode, data arrive as messages on the owner's message queue. Receiving messages on the process message queue is great because it allows the process to also handle other non-socket-related Erlang messages while handling socket data, but {active, true} isn't used that often because it provides no TCP back-pressure to the sender, and so a fast sender can overrun the receiver.
The {active, false} mode requires the receiver to call gen_tcp:recv/2,3 on the socket to retrieve data. This doesn't have the back-pressure problem of {active, true} but it can make message handling awkward since the Erlang process has to actively request the socket data rather than just sitting in a receive loop as it can with the other active modes.
Two other active modes you didn't mention are {active, once} and {active, N}. In {active, once} mode, the receiving process gets a single message via its message queue at a time, with the socket moving to the passive {active, false} mode after each message. To get another message, the receiver has to set {active, once} on the socket again when it's ready for the next message. This mode is nice because messages arrive on the process message queue same as they do with {active, true} mode, but back-pressure still works. The {active, N} mode is similar except that N messages, rather than just one, are received before the socket reverts to passive mode.
In the MPI Standard, Section 3.4 (page 37): http://mpi-forum.org/docs/mpi-3.0/mpi30-report.pdf
the synchronous send completion means
1. the send-buffer can be reused
2. the receiver has started to receive data.
The standard says "has started" instead of "has completed", so I have a question about this. Imagine a case:
The sender calls MPI_Ssend, then a receiver is matched and has started to receive data. At this point the send is complete and returns. As the MPI standard says, the send buffer can be reused, so the sender modifies some data in the send buffer. At the same time, the receiver is receiving data very slowly (e.g. the network is very bad), so how can we guarantee that the data finally received by the receiver is the same as the original data stored in the sender's send buffer?
Ssend is synchronous. It means that Ssend cannot return before the corresponding Recv is called.
Ssend is blocking. It means that the function returns only when it is safe to touch the "send-buffer".
Synchronous and blocking are two different things; I know it can be confusing.
Most implementations of Send work as follows (MPICH, Open MPI, Cray MPI):
For small messages, the send buffer is copied to memory reserved for MPI. As soon as the copy is done, the send returns.
For large messages, no copy is made, so the send returns only once the entire send buffer has been handed to the network (which cannot happen before the Recv has been called, to avoid overloading the network memory).
So MPI_Send is: blocking, asynchronous for small messages, synchronous for large ones.
An Ssend works as follows:
As soon as the Recv is started AND the send buffer is either copied or fully in the network, the Ssend returns.
Ssend should be avoided as much as one can, as it slows down the communication (the network needs to tell the sender that the Recv has started).
I am writing an Erlang TCP server which uses the following protocol.
Each packet is exactly 4 bytes in size.
There is one special-case packet - <<?SPECIAL_BYTE, 0, PayloadLength:2/big-unsigned-integer-unit:8>>. This packet indicates that the server must read the next PayloadLength bytes of raw data.
I can receive the raw stream of data and parse this protocol in Erlang code, of course. But I wonder: is there any way to use Erlang's built-in packet packaging? When my packets are preceded by their length, I can say [{packet, HeaderLength}]. Is there any way to force Erlang to automatically package received data into 4-byte chunks?
UPD: I am planning to use {active, once} mode. I could also use gen_tcp:recv(Socket, 4), but I am afraid of a performance penalty due to multiple socket reads in this case. Is my fear justified?
Erlang's native packet decoding is extremely helpful when it matches the actual format of your data. If your packets are always encoded with a 4-byte (32-bit) big-endian length, then {packet, 4} is exactly what you need. However, if there are exceptions in your encoding, then you must use {packet, raw} and do the decoding yourself.
For the decoding, you can indeed use the socket in passive mode with {active, false}, read four bytes and then the rest of the packet. You can also use the socket in active mode, but in that case you must be prepared to receive less or more than your packet header, including more than a single packet. {active, once} can help but will not solve the problem. Passive mode might be easier to deal with in your case.
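For the passive-mode route, a rough sketch of a loop for the protocol above (assumes a binary-mode socket; ?SPECIAL_BYTE is the macro from the question, and the two handlers are hypothetical):

    read_packet(Socket) ->
        case gen_tcp:recv(Socket, 4) of
            {ok, <<?SPECIAL_BYTE, 0, PayloadLength:16/big>>} ->
                %% special packet: the next PayloadLength bytes are raw payload
                Payload = case PayloadLength of
                              0 -> <<>>;
                              _ -> {ok, P} = gen_tcp:recv(Socket, PayloadLength),
                                   P
                          end,
                handle_payload(Payload),      %% hypothetical handler
                read_packet(Socket);
            {ok, Packet} ->
                handle_packet(Packet),        %% hypothetical handler
                read_packet(Socket);
            {error, closed} ->
                ok
        end.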
Performance-wise, you can refer to the following question:
How can the "packet" option of socket in Erlang accelerate the tcp transmission so much? However, I highly suggest focusing on getting a working implementation before trying to optimize it. Premature optimization never yields good results.
I have a program where there is a master/slave setup, and I have some functions implemented for the master which sends different kinds of data to the slaves. Some functions send to individual slaves, but some broadcast information to all the slaves via MPI_Bcast.
I want to have only one receive function in the slaves, so I want to know if I can probe for a message and know whether it was broadcast or sent as a normal blocking message, since there are different methods for receiving what was broadcast and what was sent normally.
No, you can't decide whether to call Bcast or Recv on the basis of a probe call.
An MPI_Bcast call is a collective operation -- all MPI tasks must participate. As a result, these are not like point-to-point communications; they make use of the fact that all processes are involved to make higher-order optimizations.
Because the collective operations imply so much synchronization, it just doesn't make sense to allow other tasks to check to see whether they should start participating in a collective; it's something which has to be built into the logic of a program.
The root process's role in a broadcast is not like a Send; it can't, in general, just call MPI_Bcast and then proceed. The implementation will almost certainly block until some number of other processes have participated in the broadcast.
The other processes' role in a broadcast is not like receiving a message; in general they will be both receiving and sending information. So participating in a broadcast is different from making a simple Recv call.
So Probe won't work; the documentation for MPI_Probe is fairly clear that it returns information about what would happen upon the next MPI_Recv, and Recv is a different operation than Bcast.
You may be able to get some of what you want in MPI 3.0, which is being finalized now and allows for nonblocking collectives -- e.g., MPI_Ibcast. In that case you could start the broadcast and call MPI_Test to check on the status of the request. However, even here, everyone would need to call MPI_Ibcast first; this just allows easier interleaving of collective and point-to-point communication.
I'm trying to make a simple server/application in Erlang.
My server initialize a socket with gen_tcp:listen(Port, [list, {active, false}, {keepalive, true}, {nodelay, true}]) and the clients connect with gen_tcp:connect(Server, Port, [list, {active, true}, {keepalive, true}, {nodelay, true}]).
Messages received from the server are tested by guards such as {tcp, _, [115, 58 | Data]}.
The problem is that packets sometimes get concatenated when sent or received, which causes unexpected behavior because the guards treat the next packet as part of the variable.
Is there a way to make sure every packet is sent as a single message to the receiving process?
Plain TCP is a streaming protocol with no concept of packet boundaries (like Alnitak said).
Usually, you send messages over either UDP (which has a limited per-packet size and can arrive out of order) or TCP using a framed protocol.
Framed meaning you prefix each message with a size header (usually 4 bytes) that indicates how long the message is.
In Erlang, you can add {packet, 4} to your socket options to get framed packet behavior on top of TCP.
Assuming both sides (client and server) use {packet, 4}, you will only get whole messages.
Note: you won't see the size header; Erlang removes it from the message you see, so your example match at the top should still work just fine.
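As a rough sketch, keeping the options from the question and just adding the framing (Port, Server and Data are placeholders):

    %% server side: add {packet, 4} to the listen options
    {ok, Listen} = gen_tcp:listen(Port, [list, {packet, 4}, {active, false},
                                         {keepalive, true}, {nodelay, true}]),

    %% client side: use the same framing
    {ok, Socket} = gen_tcp:connect(Server, Port, [list, {packet, 4}, {active, true},
                                                  {keepalive, true}, {nodelay, true}]),
    ok = gen_tcp:send(Socket, "s:" ++ Data),
    %% the 4-byte length header is added on send and stripped on receive,
    %% so each whole message still arrives as {tcp, _, [115, 58 | Data]}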
You're probably seeing the effects of Nagle's algorithm, which is designed to increase throughput by coalescing small packets into a single larger packet.
You need the Erlang equivalent of enabling the TCP_NODELAY socket option on the sending socket.
EDIT ah, I see you already set that. Hmm. TCP doesn't actually expose packet boundaries to the application layer - by definition it's a stream protocol.
If packet boundaries are important you should consider using UDP instead, or make sure that each packet you send is delimited in some manner. For example, in the TCP version of DNS each message is prefixed by a 2 byte length header, which tells the other end how much data to expect in the next chunk.
You need to implement a delimiter for your packets.
One solution is to use a special delimiter character such as ; or something similar.
The other solution is to send the size of the packet first.
PacketSizeInBytes:Body
Then read the indicated number of bytes from the stream; when you reach the end, you have your whole packet.
Nobody has mentioned that TCP may also split your message into multiple pieces (i.e. one packet can arrive as two messages).
So the second solution is the best of all, though a little harder. The first one is still good, but it limits your ability to send packets containing the special character; it is, however, the easiest to implement. Of course, there are workarounds for all of this. I hope it helps.
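If you do roll your own length-prefixed framing in active mode, you need a buffer that survives both concatenation and splitting. A rough sketch, assuming a binary-mode socket and a hypothetical handle_packet/1:

    loop(Socket, Buffer) ->
        receive
            {tcp, Socket, Data} ->
                Rest = extract_packets(<<Buffer/binary, Data/binary>>),
                loop(Socket, Rest);
            {tcp_closed, Socket} ->
                ok
        end.

    %% keep pulling complete <Length:32, Body> frames out of the buffer;
    %% whatever is left over is an incomplete frame waiting for more data
    extract_packets(<<Len:32, Body:Len/binary, Rest/binary>>) ->
        handle_packet(Body),                  %% hypothetical handler
        extract_packets(Rest);
    extract_packets(Incomplete) ->
        Incomplete.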