Data dependency through Redis: how to wait gracefully (asynchronous)

I have two back-end services, A and B, and the web front end sends asynchronous requests to A and B at the same time. The expected processing path is that A first processes its request and writes some data into Redis, and B then picks that data up for its own processing. But because the requests are asynchronous, B may receive its request first. Since the data from A is not there yet, B gets stuck waiting. The code is below. Any suggestions for optimizing this? Thanks.
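String runOnlyResultKey = RUN_ONLY_RESULT_KEY_PREFIX + msg.getString(DATA);
int i = 0;
// Poll until service A has written the key, giving up after 5 * 10 attempts
while (!redisHelper.hasKey(runOnlyResultKey) && i < 5 * 10) {
    Thread.sleep(100); // assumed interval; the original loop body is not shown
    i++;
}
String runOnlyResult = redisHelper.get(runOnlyResultKey);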
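Instead of polling hasKey() in a loop, B can block until A signals that the data is ready. In Redis itself, one way to get this is for A to LPUSH the result onto a list and for B to wait on that list with BLPOP and a timeout; keyspace notifications or pub/sub are alternatives. The sketch below only models the idea in-process with a condition variable: NotifyingStore, wait_get and the key name are hypothetical and not part of redisHelper or any Redis client.

```python
import threading


class NotifyingStore:
    """Hypothetical in-memory stand-in for Redis: set() wakes up waiting readers."""

    def __init__(self):
        self._data = {}
        self._cond = threading.Condition()

    def set(self, key, value):
        with self._cond:
            self._data[key] = value
            self._cond.notify_all()  # wake every reader blocked in wait_get()

    def wait_get(self, key, timeout):
        """Block until key exists or timeout seconds pass; return the value or None."""
        with self._cond:
            if self._cond.wait_for(lambda: key in self._data, timeout=timeout):
                return self._data[key]
            return None


# Usage: "service B" starts waiting before "service A" writes the result 0.1 s later.
store = NotifyingStore()
producer = threading.Timer(0.1, store.set, args=("run_only_result:42", "done"))
producer.start()
print(store.wait_get("run_only_result:42", timeout=5.0))  # prints: done
producer.join()
```

The waiting side sleeps without consuming CPU and wakes as soon as the producer writes, while the timeout still bounds how long B can be stuck, which is what the 5 * 10 counter was approximating.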


TCP WiFi python-arduino communication problem

I am trying to steer a drone with a joystick from a PC over WiFi. On board I have an Arduino-like board called Particle Photon. I want to send 3 floats (pitch, roll, throttle) via TCP, and I concocted this sort of thing:
On the PC (the Python client side), it just sends this at 20 FPS, i.e. every 50 ms:
try:
    s.sendall(bytearray(struct.pack("f", float(-20*axis0))))
    s.sendall(bytearray(struct.pack("f", float(20*axis1))))
    s.sendall(bytearray(struct.pack("f", float(axis2))))
except socket.error as e:
    print("error while sending :: " + str(e))
where s is my socket.
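Two client-side details are worth ruling out before blaming WiFi: sending each float with its own sendall() produces tiny writes that Nagle's algorithm (together with delayed ACKs on the receiver) may hold back, and one 12-byte frame can end up split across TCP segments. A minimal sketch, assuming the drone expects the three floats little-endian in the order it reads them; AXES, configure_low_latency, pack_axes and send_axes are hypothetical helper names, and this is not a confirmed fix for the disconnects.

```python
import socket
import struct

# Three little-endian 32-bit floats = one 12-byte message, in the order the drone reads them
AXES = struct.Struct("<3f")


def configure_low_latency(sock):
    """Disable Nagle's algorithm so small writes are not held back waiting for ACKs."""
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)


def pack_axes(axis0, axis1, axis2):
    """Same three values as the original three sendall() calls, packed into one message."""
    return AXES.pack(-20 * axis0, 20 * axis1, axis2)


def send_axes(sock, axis0, axis1, axis2):
    try:
        sock.sendall(pack_axes(axis0, axis1, axis2))  # one write instead of three
    except OSError as e:  # socket.error is an alias of OSError in Python 3
        print("error while sending :: " + str(e))
```

The drone's existing client.available() >= 12 check already matches this framing. The explicit "<" byte order replaces the native order of the original "f" format; it should be checked against the byte order your board actually uses.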
On the server-drone side I have this:
if (myIMU.delt_t >= 25) {
    if (TCPcomms && client.connected()) {
        // Check for 12 bytes (3 floats) from joystick input;
        // if we have them, update roll, pitch and throttle references
        if (client.available() >= 12) {
            byte tempBuff[4];
            float newInput[3];
            for (int j = 0; j < 3; j++) {
                for (int i = 0; i < 4; i++)
                    tempBuff[i] = client.read();
                // memcpy avoids the strict-aliasing problem of *((float*)(tempBuff))
                memcpy(&newInput[j], tempBuff, sizeof(float));
            }
            roll_reference = newInput[0];
            pitch_reference = newInput[1];
            throttle_reference = newInput[2];
        }
    }
}
The drone loop runs a lot faster, hence the time check at the top (no need to check that often).
Now, I print-debugged it at slower speeds and everything seems fine: whenever 3 floats are ready, the drone code reads them; if not, it just continues.
But half the time, when I run it at normal speed, the Particle board just checks out after some time and disconnects entirely. Apparently it does that whenever there is any problem, but it doesn't do a good job of communicating what the problem is...
Earlier I also had a mechanism that disconnects after X seconds, but it also dropped connections often. I checked with Wireshark that the drops always coincided with retransmitted packets.
I guess my question is: is it WiFi/TCP's fault, and it's just not good for this type of task, or am I doing something stupid? Has anyone had a similar issue with Arduino or Particle?

Using MPI_Scatter with 3 processes

I am new to MPI, and my question is: how does the root (for example rank 0) make sure all the values in its array are initialized before the other processes receive their i-th value from it?
For example:
in the root I initialize arr[0]=20, arr[1]=90, arr[2]=80.
My question is: if I have, for example, process number 2 that starts a little before the root process, can MPI_Scatter send an incorrect value instead of 80?
How can I make sure the root initializes all of its memory before the others use Scatter?
Thank you!
The MPI standard specifies that
If comm is an intracommunicator, the outcome is as if the root executed n send operations, MPI_Send(sendbuf + i·sendcount·extent(sendtype), sendcount, sendtype, i, ...), and each process executed a receive, MPI_Recv(recvbuf, recvcount, recvtype, i, ...).
This means that each non-root process waits until its recvcount elements have been transmitted. Such a call is known as a blocking routine (the process waits until its part of the communication is complete).
You, as the programmer, are responsible for ensuring that the data being sent is correct by the time you call any communication routine and until the send buffer is available again (in this case, until MPI_Scatter returns). In an MPI-only program, this is as simple as placing the initialization code before the call to MPI_Scatter, as each process executes the program sequentially.
The following is an example based on Example 5.11 of the standard:
int grank, gsize, *sendbuf = NULL;
int root, *rbuf;
MPI_Comm_rank(comm, &grank);
MPI_Comm_size(comm, &gsize);
root = 0;
if (grank == root) {
    sendbuf = (int *)malloc(gsize * 100 * sizeof(int));
    // Initialize sendbuf. None of its values are valid at this point.
    for (int i = 0; i < gsize * 100; i++)
        sendbuf[i] = i;
}
rbuf = (int *)malloc(100 * sizeof(int));
// Distribute sendbuf data
// At the root process, all sendbuf values are valid
// In non-root processes, sendbuf argument is ignored.
MPI_Scatter(sendbuf, 100, MPI_INT, rbuf, 100, MPI_INT, root, comm);
free(rbuf);
free(sendbuf); // NULL on non-root ranks, so this is a no-op there
MPI_Scatter() is a collective operation, so the MPI library takes care of everything, and the outcome of a collective operation does not depend on which rank calls it first.
In this specific case, a non-root rank will block (at least) until the root rank calls MPI_Scatter().
This is no different from MPI_Send()/MPI_Recv():
MPI_Recv() blocks if it is called before the remote peer has sent a matching message with MPI_Send().
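The guarantee can be illustrated without MPI. The sketch below models the standard's description (the root executes one send per rank, every rank executes one blocking receive) with plain Python threads and one queue per rank; it is a hypothetical simulation, not real MPI. The receivers start first and block in get(), as MPI_Recv would, and because the root fills sendbuf before its first put(), rank 2 can only ever see 80.

```python
import queue
import threading
import time


def simulate_scatter(values, root_delay=0.05):
    """Model MPI_Scatter as one send per rank plus one blocking receive per rank.

    Threads and queues stand in for MPI processes and messages (hypothetical model).
    """
    n = len(values)
    inboxes = [queue.Queue() for _ in range(n)]
    received = [None] * n

    def receiver(rank):
        received[rank] = inboxes[rank].get()  # blocks like MPI_Recv until data arrives

    def root():
        time.sleep(root_delay)          # the root deliberately starts late
        sendbuf = list(values)          # initialization finishes before any send
        for i in range(n):
            inboxes[i].put(sendbuf[i])  # like MPI_Send(sendbuf + i*sendcount, ...) to rank i

    receivers = [threading.Thread(target=receiver, args=(r,)) for r in range(n)]
    for t in receivers:
        t.start()                       # non-root ranks reach "Scatter" first
    root_thread = threading.Thread(target=root)
    root_thread.start()
    for t in receivers + [root_thread]:
        t.join()
    return received


print(simulate_scatter([20, 90, 80]))  # prints: [20, 90, 80]
```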

How to use QTcpSocket for high frequent sending of small data packages?

We have two Qt applications. App1 accepts a connection from App2 through QTcpServer and stores it in an instance of QTcpSocket* tcpSocket. App1 runs a simulation at 30 Hz. For each simulation run, a QByteArray of a few kilobytes is sent using the following code (from the main/GUI thread):
QByteArray block;
/* lines omitted which write data into block */
tcpSocket->write(block); // QIODevice::write(const QByteArray &) writes the whole array
The receiver socket listens to the QTcpSocket::readDataBlock signal (in main/GUI thread) and prints the corresponding time stamp to the GUI.
When App1 and App2 run on the same system, the packages are perfectly in sync. However, when App1 and App2 run on different systems connected through a network, App2 is no longer in sync with the simulation in App1: the packages come in much more slowly. Even more surprising (and an indication that our implementation is wrong), when we stop the simulation loop, no more packages are received. This surprises us, because we expect the TCP protocol to deliver all packages eventually.
We built the TCP logic based on Qt's fortune example. The fortune server, however, is different, because it only sends one package per incoming client. Could someone identify what we have done wrong?
Note: we use MSVC2012 (App1), MSVC2010 (App2) and Qt 5.2.
Edit: By a package I mean the result of a single simulation experiment, which is a bunch of numbers written into QByteArray block. The first bits, however, contain the length of the QByteArray, so that the client can check whether all data has been received. This is the code that is called when the signal QTcpSocket::readDataBlock is emitted:
QDataStream in(tcpSocket);
if (blockSize == 0) {
    if (tcpSocket->bytesAvailable() < (int)sizeof(quint16))
        return; // cannot yet read size from data block
    in >> blockSize; // read data size for data block
}
// if the whole data block is not yet received, ignore it
if (tcpSocket->bytesAvailable() < blockSize)
    return;
// if we get here, the whole object is available to parse
QByteArray object;
in >> object;
blockSize = 0; // reset blockSize for handling the next package
The problem in our implementation was caused by data packages being piled up and incorrect handling of packages which had only arrived partially.
The answer to "Tcp packets using QTcpSocket" points in the right direction. However, it could not be applied in a straightforward manner, because we rely on QDataStream instead of a plain QByteArray.
The following code (run each time QTcpSocket::readDataBlock is emitted) works for us and shows how a raw series of bytes can be read from QDataStream. Unfortunately it seems that it is not possible to process the data in a clearer way (using operator>>).
QDataStream in(tcpSocket);
const int headerSize = sizeof(quint16) + sizeof(quint8) + sizeof(quint32);
// Process every package that has piled up, not just the first one.
// blockSize and dataType are members, so a partial package survives until the next signal.
while (tcpSocket->bytesAvailable() > 0) {
    if (blockSize == 0) {
        if (tcpSocket->bytesAvailable() < headerSize)
            return; // cannot yet read size and type info from data block
        in >> blockSize;
        in >> dataType;
        in.skipRawData(4); // ignore the quint32 length QDataStream writes before a QByteArray
    }
    const int objectSize = blockSize - headerSize;
    if (tcpSocket->bytesAvailable() < objectSize)
        return; // package only partially arrived: wait for the next signal
    QByteArray buffer;
    buffer.resize(objectSize);
    in.readRawData(buffer.data(), objectSize); // the whole object is available now
    blockSize = 0; // reset blockSize for handling the next package
    // ready for parsing
    if (dataType == 0) {
        // deserialize object
    }
}
Please note that the first three bytes of the expected QDataStream are part of our own protocol: blockSize indicates the number of bytes of a complete single package, and dataType helps deserialize the binary chunk.
For reducing the latency of sending objects through the TCP connection, disabling packet bunching was very useful:
// disable Nagle's algorithm to avoid delay and bunching of small packages
tcpSocket->setSocketOption(QAbstractSocket::LowDelayOption, 1);
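The core of the fix, keeping partially received bytes and looping until no complete package is left, does not depend on Qt. This Python sketch assumes a simplified header of blockSize (quint16, counting the header itself) followed by dataType (quint8), both big-endian as QDataStream writes them by default; it leaves out the extra quint32 QByteArray length. PackageReader and encode_package are hypothetical names.

```python
import struct

# Simplified header: blockSize (quint16, includes the header) + dataType (quint8),
# big-endian like QDataStream's default byte order.
HEADER = struct.Struct(">HB")


def encode_package(data_type, payload):
    return HEADER.pack(HEADER.size + len(payload), data_type) + payload


class PackageReader:
    """Accumulates received bytes and returns only complete packages (hypothetical helper)."""

    def __init__(self):
        self._buf = bytearray()

    def feed(self, data):
        self._buf.extend(data)
        packages = []
        # Keep going: several packages may have piled up since the last signal
        while len(self._buf) >= HEADER.size:
            block_size, data_type = HEADER.unpack_from(self._buf, 0)
            if block_size < HEADER.size:
                raise ValueError("corrupt header: blockSize smaller than the header")
            if len(self._buf) < block_size:
                break  # partial package: keep the bytes and wait for more data
            packages.append((data_type, bytes(self._buf[HEADER.size:block_size])))
            del self._buf[:block_size]
        return packages


reader = PackageReader()
first, second = encode_package(0, b"abc"), encode_package(1, b"hello")
print(reader.feed(first[:2]))           # prints: []
print(reader.feed(first[2:] + second))  # prints: [(0, b'abc'), (1, b'hello')]
```

Returning after only one package per signal, as in the question's code, leaves the rest sitting in the socket buffer until new data triggers another signal, which matches the symptom of nothing arriving once the simulation stops.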

Concurrent Processing - Peterson's Algorithm

For those unfamiliar, the following is Peterson's algorithm used for process coordination:
#define FALSE 0
#define TRUE  1
#define No_Of_Processes 2              // Number of processes (an array size must be a constant)

int turn;                              // Whose turn is it?
int interested[No_Of_Processes];       // All values initially FALSE

void enter_region(int process) {       // process is 0 or 1
    int other;                         // number of the other process
    other = 1 - process;               // the opposite process
    interested[process] = TRUE;        // this process is interested
    turn = process;                    // set flag
    while (turn == process && interested[other] == TRUE); // wait
}

void leave_region(int process) {
    interested[process] = FALSE;       // process leaves critical region
}
My question is, can this algorithm give rise to deadlock?
No, there is no deadlock possible.
The only place a process waits is the while loop. The process variable is not shared between threads (each thread has its own, different value), but turn is shared. So turn == process can be true for at most one thread at any given moment, which means at least one interested thread can always leave the loop.
That said, your code is not correct as written: Peterson's algorithm in this form is only for two concurrent threads, not for an arbitrary No_Of_Processes as in your code.
In the original algorithm for N processes, deadlocks are possible.
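The deadlock-freedom argument can be checked informally with a direct Python translation of enter_region/leave_region running in two threads. This sketch assumes a standard GIL-enabled CPython, which executes these list and attribute accesses one bytecode at a time in a single global order; the C version gets no such guarantee on modern multicore hardware without atomics or memory barriers. PetersonLock and run_demo are hypothetical names.

```python
import threading
import time


class PetersonLock:
    """Two-process Peterson lock mirroring enter_region/leave_region above."""

    def __init__(self):
        self.interested = [False, False]  # all values initially FALSE
        self.turn = 0

    def enter_region(self, process):
        other = 1 - process                # the opposite process
        self.interested[process] = True    # this process is interested
        self.turn = process                # the last process to write turn is the one that waits
        while self.turn == process and self.interested[other]:
            time.sleep(0)                  # yield the GIL instead of spinning a full time slice

    def leave_region(self, process):
        self.interested[process] = False   # process leaves critical region


def run_demo(iterations=200):
    lock = PetersonLock()
    state = {"counter": 0, "inside": 0, "max_inside": 0}

    def worker(process):
        for _ in range(iterations):
            lock.enter_region(process)
            state["inside"] += 1           # critical region starts
            state["max_inside"] = max(state["max_inside"], state["inside"])
            state["counter"] += 1
            state["inside"] -= 1           # critical region ends
            lock.leave_region(process)

    threads = [threading.Thread(target=worker, args=(p,)) for p in (0, 1)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state


state = run_demo()
print(state["counter"], state["max_inside"])  # expected on GIL-enabled CPython: 400 1
```

Both threads finishing shows that neither got stuck in the wait loop, and max_inside staying at 1 shows that they never overlapped in the critical region.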

Flush socket blocks the call on a non-blocking socket

I am using a non-blocking socket for sending messages. We were occasionally getting the EAGAIN error, so I decided to use flush(socket) to flush the buffer and make room for new data, so that I can avoid EAGAIN. But the problem is that flush(socket) gets stuck indefinitely.
Here is the code
int res = send(socket, buffer, size + lengthSize, 0);
delete buffer; // use delete[] instead if buffer was allocated with new[]
if (res == -1) {
    int error = errno;
    cout << "ERROR on SendOnPortString, errno = " << error << endl;
    return 0;
}
cout << "Send SucessFul = " << res << "Total Message size" << size + lengthSize;
return 1;
This code prints
Send SucessFul = 11Total Message size 11
But after that it gets stuck in the flush(socket) method. Any idea why it behaves like that?
You cast a socket handle of type int to a reference to a std::ostream in order to avoid compiler warnings/errors when you tried to hand it to flush. I'm surprised it's not crashing.
You can't make more space. The problem isn't you: the problem is that the system's internal buffers are full, and they will drain at their own pace in their own time. You can either poll by trying to send over and over till it works (in which case, why are you using non-blocking sockets?), or you use select, poll, kqueue, epoll, libevent, etc. to sleep till the socket is able to accept more data.
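Instead of flushing, the usual pattern on a non-blocking socket is to treat EAGAIN as "try again later", sleep in select() (or poll/epoll) until the socket is writable, and resume after partial sends, since send() may accept fewer bytes than requested. A minimal Python sketch of that loop; send_all_nonblocking is a hypothetical helper, and select.select stands in for whichever readiness API your program uses.

```python
import select
import socket
import threading


def send_all_nonblocking(sock, data, timeout=5.0):
    """Send all of data on a non-blocking socket, waiting in select() while the send buffer is full."""
    view = memoryview(data)
    while view:
        try:
            sent = sock.send(view)
            view = view[sent:]          # send() may accept only part of the data
        except BlockingIOError:         # EAGAIN/EWOULDBLOCK: kernel send buffer is full
            _, writable, _ = select.select([], [sock], [], timeout)
            if not writable:
                raise TimeoutError("socket not writable within %.1f s" % timeout)
    return len(data)


# Usage: a payload large enough to fill the kernel buffer, drained by a slower reader.
a, b = socket.socketpair()
a.setblocking(False)
payload = b"x" * (4 * 1024 * 1024)
received = bytearray()


def drain():
    while len(received) < len(payload):
        chunk = b.recv(65536)
        if not chunk:
            break
        received.extend(chunk)


reader = threading.Thread(target=drain)
reader.start()
print(send_all_nonblocking(a, payload))  # prints: 4194304
reader.join()
a.close()
b.close()
```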
