What should I do if I want to send messages with MPI and receive messages at the same time? - tcp

Background: rank 0 sends a message to rank 1; after rank 1 completes its work, it returns a message to rank 0.
Actually, in rank 0 I run one thread for sending and another for receiving, like this:
int tag = 1;
void* thread_send(void* argc)
{
    ...;
    while(1)
    {
        if(tag == 1)
        {
            MPI_Send(...,1,TAG_SEND,...); // send something to slave
            tag = 0;
        }
    }
    ...
}
void* thread_receive(void* argc)
{
    while(1)
    {
        MPI_Recv(...,0,TAG_RECV,...); // ready for receiving from slave
        tag = 1;
    }
}
In rank 1 I run a thread like this:
void* slave(void* argc)
{
    ...;
    while(1)
    {
        MPI_Probe(0,MPI_ANY_TAG,MPI_COMM_WORLD,&status);
        switch(status.MPI_TAG){
        case TAG_SEND:
            MPI_Recv(..,0,TAG_SEND,..);
            break;
        }
        MPI_Send(...,0,TAG_RECV,...); // notify rank 0 that the slave has done its work
    }
}
Then I got an error like this:
[comp01-mpi.gpu01.cis.k.hosei.ac.jp][[54135,1],0]
[btl_tcp_endpoint.c:486:mca_btl_tcp_endpoint_recv_connect_ack]
received unexpected process identifier [[16641,0],301989888]
In fact there are several network interfaces on each machine. I knew that might be a problem, so I passed the parameters
--mca btl_tcp_if_include eth0 --mca oob_tcp_if_include eth0
to restrict MPI traffic to a single interface.
Have I done something wrong? I would appreciate any suggestions, thanks.
Thanks to @HristoIliev, I checked the thread support level provided by Open MPI like this:
MPI_Init_thread(&argc,&argv,MPI_THREAD_MULTIPLE,&provide_level);
if(provide_level < MPI_THREAD_MULTIPLE){
    printf("Error: the MPI library doesn't provide the required thread level\n");
    MPI_Abort(MPI_COMM_WORLD,0);
}
and I got the error:
Error: the MPI library doesn't provide the required thread level
That means I CANNOT use multiple threads, so what else can I do?
Now I am using non-blocking sends (MPI_Isend) and receives (MPI_Irecv); the code looks like this:
send thread:
int tag = 1;
void* thread_send(void* argc)
{
    ...;
    while(1)
    {
        while(1)
        {
            MPI_Irecv(&tag,1,MPI_INT,1,MSG_TAG,MPI_COMM_WORLD,&request);
            if(tag == 1) break;
            printf("tag is %d\n",tag);
            MPI_Wait(&request,&status);
        }
        MPI_Send(...,1,MSG_SEND,...); // send something to slave
        tag = 0;
    }
    ...
}
receive thread:
void* slave(void* argc)
{
    ...;
    while(1)
    {
        MPI_Probe(0,MPI_ANY_TAG,MPI_COMM_WORLD,&status);
        switch(status.MPI_TAG){
        case MSG_SEND:
            MPI_Recv(..,0,MSG_SEND,..);
            break;
        }
        int tag = 1;
        MPI_Isend(&tag,1,MPI_INT,0,MSG_TAG,MPI_COMM_WORLD,&request); // notify rank 0 that the slave has done its work
        MPI_Wait(&request,&status);
        printf("slave is idle now \n");
    }
}
and it printed like this:
tag is 0
slave is idle now
and then it hangs there.

I have solved the problem by changing the location of the Irecv() call, as follows:
send thread:
int tag = 1;
void* thread_send(void* argc)
{
    ...;
    while(1)
    {
        while(1)
        {
            if(tag == 1) break;
            printf("tag is %d\n",tag);
            MPI_Irecv(&tag,1,MPI_INT,1,MSG_TAG,MPI_COMM_WORLD,&request);
            MPI_Wait(&request,&status);
        }
        MPI_Send(...,1,MSG_SEND,...); // send something to slave
        tag = 0;
    }
    ...
}
In conclusion: to send and receive messages at the same time, you can use multiple threads if your MPI library supports the multi-threaded mode. You can check that when you initialize your MPI program, like this:
MPI_Init_thread(&argc,&argv,MPI_THREAD_MULTIPLE,&provide_level);
if(provide_level < MPI_THREAD_MULTIPLE){
    printf("Error: the MPI library doesn't provide the required thread level\n");
    MPI_Abort(MPI_COMM_WORLD,0);
}
Or, if your MPI library doesn't support the multi-threaded mode, you can use non-blocking communication instead.
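For reference, here is a minimal, self-contained sketch of that single-threaded, non-blocking pattern. The tags MSG_SEND/MSG_TAG and the integer payload are placeholders for illustration, not the exact values from my program:

#include <mpi.h>
#include <stdio.h>

#define MSG_SEND 1  /* rank 0 -> rank 1: a work item     */
#define MSG_TAG  2  /* rank 1 -> rank 0: an "idle" token */

int main(int argc, char** argv)
{
    int rank, work = 42, tag;
    MPI_Status status;
    MPI_Request request;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    for (int round = 0; round < 3; round++) {
        if (rank == 0) {
            MPI_Send(&work, 1, MPI_INT, 1, MSG_SEND, MPI_COMM_WORLD);
            MPI_Irecv(&tag, 1, MPI_INT, 1, MSG_TAG, MPI_COMM_WORLD, &request);
            MPI_Wait(&request, &status);  /* tag is only valid after the wait completes */
            printf("master: slave is idle (round %d)\n", round);
        } else if (rank == 1) {
            MPI_Recv(&work, 1, MPI_INT, 0, MSG_SEND, MPI_COMM_WORLD, &status);
            tag = 1;                      /* ... the slave's real work goes here ... */
            MPI_Isend(&tag, 1, MPI_INT, 0, MSG_TAG, MPI_COMM_WORLD, &request);
            MPI_Wait(&request, &status);
        }
    }
    MPI_Finalize();
    return 0;
}

Run it with mpirun -np 2 so that exactly two processes take part.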

Related

nghttp2: Using server-sent events to be use by EventSource

I'm using nghttp2 to implement a REST server which should use HTTP/2 and server-sent events (to be consumed by an EventSource in the browser). However, based on the examples it is unclear to me how to implement SSE. Using res.push() as in asio-sv.cc doesn't seem to be the right approach.
What would be the right way to do it? I'd prefer to use nghttp2's C++ API, but the C API would do as well.
Yup, I did something like that back in 2018. The documentation was rather sparse :).
First of all, ignore response::push because that's HTTP/2 push -- something for proactively sending unsolicited objects to the client before it requests them. I know it sounds like what you need, but it is not -- the typical use case would be proactively sending a CSS file and some images along with the originally requested HTML page.
The key thing is that your end() callback must eventually return NGHTTP2_ERR_DEFERRED whenever you run out of data to send. When your application somehow obtains more data to be sent, call http::response::resume().
Here's some simple code. Build it with g++ -std=c++17 -Wall -O3 -ggdb clock.cpp -lssl -lcrypto -pthread -lnghttp2_asio -lspdlog -lfmt. Be careful: modern browsers don't do HTTP/2 over a plaintext socket, so you'll need to reverse-proxy it via something like nghttpx -f '*,8080;no-tls' -b '::1,10080;;proto=h2'.
#include <atomic>
#include <boost/asio/io_service.hpp>
#include <boost/lexical_cast.hpp>
#include <boost/signals2.hpp>
#include <chrono>
#include <list>
#include <mutex>
#include <nghttp2/asio_http2_server.h>
#define SPDLOG_FMT_EXTERNAL
#include <spdlog/spdlog.h>
#include <thread>
using namespace nghttp2::asio_http2;
using namespace std::literals;

using Signal = boost::signals2::signal<void(const std::string& message)>;

class Client {
    const server::response& res;

    enum State {
        HasEvents,
        WaitingForEvents,
    };
    std::atomic<State> state;
    std::list<std::string> queue;
    mutable std::mutex mtx;
    boost::signals2::scoped_connection subscription;

    size_t send_chunk(uint8_t* destination, std::size_t len, uint32_t* data_flags [[maybe_unused]])
    {
        std::size_t written{0};
        std::lock_guard lock{mtx};
        if (state != HasEvents) throw std::logic_error{std::to_string(__LINE__)};
        while (!queue.empty()) {
            auto num = std::min(queue.front().size(), len - written);
            std::copy_n(queue.front().begin(), num, destination + written);
            written += num;
            if (num < queue.front().size()) {
                queue.front() = queue.front().substr(num);
                spdlog::debug("{} send_chunk: partial write", (void*)this);
                return written;
            }
            queue.pop_front();
            spdlog::debug("{} send_chunk: sent one event", (void*)this);
        }
        state = WaitingForEvents;
        return written;
    }

public:
    Client(const server::request& req, const server::response& res, Signal& signal)
        : res{res}
        , state{WaitingForEvents}
        , subscription{signal.connect([this](const auto& msg) {
              enqueue(msg);
          })}
    {
        spdlog::warn("{}: {} {} {}", (void*)this, boost::lexical_cast<std::string>(req.remote_endpoint()), req.method(), req.uri().raw_path);
        res.write_head(200, {{"content-type", {"text/event-stream", false}}});
    }

    void onClose(const uint32_t ec)
    {
        spdlog::error("{} onClose", (void*)this);
        subscription.disconnect();
    }

    ssize_t process(uint8_t* destination, std::size_t len, uint32_t* data_flags)
    {
        spdlog::trace("{} process", (void*)this);
        switch (state) {
        case HasEvents:
            return send_chunk(destination, len, data_flags);
        case WaitingForEvents:
            return NGHTTP2_ERR_DEFERRED;
        }
        __builtin_unreachable();
    }

    void enqueue(const std::string& what)
    {
        {
            std::lock_guard lock{mtx};
            queue.push_back("data: " + what + "\n\n");
        }
        state = HasEvents;
        res.resume();
    }
};
int main(int argc [[maybe_unused]], char** argv [[maybe_unused]])
{
    spdlog::set_level(spdlog::level::trace);

    Signal sig;

    std::thread timer{[&sig]() {
        for (int i = 0; /* forever */; ++i) {
            std::this_thread::sleep_for(std::chrono::milliseconds{666});
            spdlog::info("tick: {}", i);
            sig("ping #" + std::to_string(i));
        }
    }};

    server::http2 server;
    server.num_threads(4);

    server.handle("/events", [&sig](const server::request& req, const server::response& res) {
        auto client = std::make_shared<Client>(req, res, sig);
        res.on_close([client](const auto ec) {
            client->onClose(ec);
        });
        res.end([client](uint8_t* destination, std::size_t len, uint32_t* data_flags) {
            return client->process(destination, len, data_flags);
        });
    });

    server.handle("/", [](const auto& req, const auto& resp) {
        spdlog::warn("{} {} {}", boost::lexical_cast<std::string>(req.remote_endpoint()), req.method(), req.uri().raw_path);
        resp.write_head(200, {{"content-type", {"text/html", false}}});
        resp.end(R"(<html><head><title>nghttp2 event stream</title></head>
<body><h1>events</h1><ul id="x"></ul>
<script type="text/javascript">
const ev = new EventSource("/events");
ev.onmessage = function(event) {
const li = document.createElement("li");
li.textContent = event.data;
document.getElementById("x").appendChild(li);
};
</script>
</body>
</html>)");
    });

    boost::system::error_code ec;
    if (server.listen_and_serve(ec, "::", "10080")) {
        return 1;
    }
    return 0;
}
I have a feeling that my queue handling is probably too complex. When testing via curl, I never seem to run out of buffer space. In other words, even if the client is not reading any data from the socket, the library keeps invoking send_chunk, asking me for up to 16 kB of data at a time. Strange. I have no idea how it behaves when pushing data more heavily.
My "real code" used to have a third state, Closed, but I think that blocking events via on_close is enough here. However, I think you never want to enter send_chunk after the client has already disconnected but before the destructor gets called.
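For a quick manual test (assuming your curl build has HTTP/2 support), curl -N --http2-prior-knowledge http://localhost:10080/events talks h2c directly to the server and should print a "data: ping #N" line roughly every 666 ms; opening http://localhost:8080/ in a browser through the nghttpx proxy from above should exercise the EventSource path and append one list item per event.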

How does TinyOS communicate with the TelosB hardware?

I have a very basic question regarding how the whole system works for TelosB. I have gone through the data manual and how TelosB operates.
Now I need to send some specific data through the TelosB nodes. I looked at the TinyOS packet format and defined the payload:
typedef nx_struct packet {
    nx_uint16_t id;    /* Mote id of sending mote. */
    nx_uint16_t count;
} packet_t;
In the RadioCountToLeds code I changed a few small things:
#include "Timer.h"
#include "RadioCountToLeds.h"

module RadioCountToLedsC @safe() {
  uses {
    interface Leds;
    interface Boot;
    interface Receive;
    interface AMSend;
    interface Timer<TMilli> as MilliTimer;
    interface SplitControl as AMControl;
    interface Packet;
  }
}
implementation {
  message_t packet;
  bool locked;
  uint16_t counter = 0;

  event void Boot.booted() {
    call AMControl.start();
  }

  event void AMControl.startDone(error_t err) {
    if (err == SUCCESS) {
      call MilliTimer.startPeriodic(250);
    }
    else {
      call AMControl.start();
    }
  }

  event void AMControl.stopDone(error_t err) {
    // do nothing
  }

  event void MilliTimer.fired() {
    counter++;
    dbg("RadioCountToLedsC", "RadioCountToLedsC: timer fired, counter is %hu.\n", counter);
    if (locked) {
      return;
    }
    else {
      radio_count_msg_t* rcm = (radio_count_msg_t*)call Packet.getPayload(&packet, sizeof(radio_count_msg_t));
      if (rcm == NULL) {
        return;
      }
      rcm->counter = 11110000;
      if (call AMSend.send(AM_BROADCAST_ADDR, &packet, sizeof(radio_count_msg_t)) == SUCCESS) {
        dbg("RadioCountToLedsC", "RadioCountToLedsC: packet sent.\n", counter);
        locked = TRUE;
      }
    }
  }
The counter is the data I want to transmit: 11110000. When this TinyOS code runs on the TelosB mote, how is the 11110000 interpreted? I need to be very specific about what data goes into the DAC of the TelosB, or into the OQPSK modulation.
I want to know in detail how the data is read and interpreted.
I guess I got my answer: the CC2420 radio chip.
http://www.ti.com/lit/ds/symlink/cc2420.pdf

Deadlock in a Single Producer, Multiple Consumer Case

Could anyone point out why this code can cause a deadlock?
It is a single producer, multiple consumer problem. The producer has 8 buffers. Here there are 4 consumers, and each consumer owns two buffers. When a buffer is filled, the producer flags it as ready to consume and switches to the second buffer. The consumer can then process this buffer. After it is done, it returns the buffer to the producer.
Buffer 0-1 for consumer 0
Buffer 2-3 for consumer 1
Buffer 4-5 for consumer 2
Buffer 6-7 for consumer 3
The program reaches a deadlocked state once in a while.
My understanding was that since the flag can only be in one state, either 0 or 1, at least one of the producer and consumer can always proceed; and if one proceeds, it will eventually break the deadlock.
#include <iostream>
#include <thread>
#include <mutex>
#include <cstdio>

using namespace std;

const int BUFFERSIZE = 100;
const int row_size = 10000;

class sharedBuffer
{
public:
    int B[8][BUFFERSIZE];
    volatile int B_STATUS[8];
    volatile int B_SIZE[8];

    sharedBuffer()
    {
        for (int i=0;i<8;i++)
        {
            B_STATUS[i] = 0;
            B_SIZE[i] = 0;
            for (int j=0;j<BUFFERSIZE;j++)
            {
                B[i][j] = 0;
            }
        }
    }
};

class producer
{
public:
    sharedBuffer * buffer;
    int data[row_size];

    producer(sharedBuffer * b)
    {
        this->buffer = b;
        for (int i=0;i<row_size;i++)
        {
            data[i] = i+1;
        }
    }

    void produce()
    {
        int consumer_id;
        for(int i=0;i<row_size;i++)
        {
            consumer_id = data[i] % 4;
            while(true)
            {
                if (buffer->B_STATUS[2*consumer_id] ==1 && buffer->B_STATUS[2*consumer_id + 1] == 1)
                    continue;
                if (buffer->B_STATUS[2*consumer_id] ==0 )
                {
                    buffer->B[2*consumer_id][buffer->B_SIZE[2*consumer_id]++] = data[i];
                    if(buffer->B_SIZE[2*consumer_id] == BUFFERSIZE || i==row_size -1)
                    {
                        buffer->B_STATUS[2*consumer_id] =1;
                    }
                    break;
                }
                else if (buffer->B_STATUS[2*consumer_id+1] ==0 )
                {
                    buffer->B[2*consumer_id+1][buffer->B_SIZE[2*consumer_id+1]++] = data[i];
                    if(buffer->B_SIZE[2*consumer_id+1] == BUFFERSIZE || i==row_size -1)
                    {
                        buffer->B_STATUS[2*consumer_id+1] =1;
                    }
                    break;
                }
            }
        }
        // some buffers are not full; still need to set their flags to 1
        for (int i=0;i<8;i++)
        {
            if (buffer->B_STATUS[i] ==0 && buffer->B_SIZE[i] >0 )
                buffer->B_STATUS[i] = 1;
        }
        cout<<"Done produce, wait the data to be consumed\n";
        while(true)
        {
            if (buffer->B_STATUS[0] == 0 && buffer->B_SIZE[0] == 0
                && buffer->B_STATUS[1] == 0 && buffer->B_SIZE[1] == 0
                && buffer->B_STATUS[2] == 0 && buffer->B_SIZE[2] == 0
                && buffer->B_STATUS[3] == 0 && buffer->B_SIZE[3] == 0
                && buffer->B_STATUS[4] == 0 && buffer->B_SIZE[4] == 0
                && buffer->B_STATUS[5] == 0 && buffer->B_SIZE[5] == 0
                && buffer->B_STATUS[6] == 0 && buffer->B_SIZE[6] == 0
                && buffer->B_STATUS[7] == 0 && buffer->B_SIZE[7] == 0 )
            {
                for (int i=0;i<8;i++)
                    buffer->B_STATUS[i] = 2;
                break;
            }
        }
    };
};

class consumer
{
public:
    sharedBuffer * buffer;
    int sum;
    int index;

    consumer(int id, sharedBuffer * buf){this->index = id;this->sum = 0;this->buffer = buf;};

    void consume()
    {
        while(true)
        {
            if (buffer->B_STATUS[2*index] ==0 && buffer->B_STATUS[2*index+1] ==0 )
                continue;
            if (buffer->B_STATUS[2*index] ==2 && buffer->B_STATUS[2*index+1] ==2 )
                break;
            if (buffer->B_STATUS[2*index] == 1)
            {
                for (int i=0;i<buffer->B_SIZE[2*index];i++)
                {
                    sum+=buffer->B[2*index][i];
                }
                buffer->B_STATUS[2*index]=0;
                buffer->B_SIZE[2*index] =0;
            }
            if (buffer->B_STATUS[2*index+1] == 1)
            {
                for (int i=0;i<buffer->B_SIZE[2*index+1];i++)
                {
                    sum+=buffer->B[2*index+1][i];
                }
                buffer->B_STATUS[2*index+1]=0;
                buffer->B_SIZE[2*index+1] =0;
            }
        }
        printf("Sum of consumer %d = %d \n",index,sum);
    };
};

int main()
{
    sharedBuffer b;
    producer p(&b);
    consumer c1(0,&b),c2(1,&b),c3(2,&b),c4(3,&b);

    thread p_t(&producer::produce,p);
    thread c1_t(&consumer::consume,c1);
    thread c2_t(&consumer::consume,c2);
    thread c3_t(&consumer::consume,c3);
    thread c4_t(&consumer::consume,c4);

    p_t.join();c1_t.join();c2_t.join();c3_t.join();c4_t.join();
}
This is flawed in many ways. The compiler can reorder your instructions, and different CPU cores may not see memory operations in the same order.
Basically your producer does this:
it writes data to the buffer
it sets the flag
Your consumer does this:
it reads the flag
if the flag is what it wants it reads data
it resets the flag
This does not work, for several reasons.
The compiler can reorder your instructions (both on the consumer and producer side) to do things in a different order. For example, on the producer side, it could store all your computations in registers, and then write the status flag to memory first, and the data later. The consumer would then get stale data.
Even in absence of that, there is no guarantee that different writes to memory are seen in the same order by different CPU cores (e.g. if they have separate caches, and your flag and data are on different cache lines).
This can cause all sorts of trouble - data corruption, deadlocks, segfaults, depending on what exactly your code does. I haven't analyzed your code sufficiently to tell you exactly why this causes a deadlock, but I'm not surprised at all.
Note that the 'volatile' keyword is completely useless for this type of synchronization. 'volatile' is only useful for things like signal handlers (unix signals) and memory-mapped I/O, not for multithreaded code.
The correct way to do this is to use proper synchronization (for example mutexes) or atomic operations (e.g. std::atomic). They have various different guarantees that make sure that the issues above don't happen.
Mutexes are generally easier to use if speed is not of the highest importance. Atomic operations can get you a little more control but they are very tricky to use.
I would recommend that you do this with mutexes, then profile the program, and then only go to atomic operations if it's insufficiently fast.
valgrind is a great tool for debugging multithreaded programs (in particular its Helgrind and DRD tools, which point out unsynchronized memory accesses and the like).
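To make that concrete, here is a minimal sketch of one buffer's handshake rewritten with std::atomic. This is not your full program, just the 0/1 flag protocol for a single buffer; the release/acquire pair guarantees that the data writes are visible to the consumer before it sees the flag flip:

#include <atomic>
#include <cstdio>
#include <thread>

// One buffer, one 0/1 flag, same protocol as in the question.
std::atomic<int> status{0};   // 0 = empty (producer owns it), 1 = full (consumer owns it)
int buffer[100];

void produce_one()
{
    for (int i = 0; i < 100; i++) buffer[i] = i + 1;  // write the data first...
    status.store(1, std::memory_order_release);       // ...then publish the flag
}

void consume_one()
{
    while (status.load(std::memory_order_acquire) != 1)
        ;                                             // spin until the flag is published
    int sum = 0;
    for (int i = 0; i < 100; i++) sum += buffer[i];   // safe: the fill happened-before this read
    status.store(0, std::memory_order_release);       // hand the buffer back
    std::printf("sum = %d\n", sum);
}

int main()
{
    std::thread c(consume_one), p(produce_one);
    p.join();
    c.join();
}

With plain volatile int there is no such happened-before relationship, which is exactly why the flag and the data can disagree.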
Thanks for the helpful comments.
I thought that if I made sure all the flag/status values were read from memory, not from registers/cache, the deadlock could not happen no matter how the compiler reorganized the instructions, and that the volatile keyword would enforce this. It looks like my understanding was wrong.
Another baffling thing: I thought the status variable could only hold one of the values (0, 1, 2), but once in a while I saw a value like 5384. Somehow the data got corrupted.

boost mpi assertion failing

I am using Boost.MPI and I am getting an error that I am having a hard time figuring out. I am using a recv call, and an assertion in the Boost code fails:
void boost::mpi::binary_buffer_iprimitive::load_impl(void *, int): Assertion `position+l<=static_cast<int>(buffer_.size())'
This is from the file binary_buffer_iprimitive.hpp located in boost/mpi/detail/.
This does not occur the first time I receive, so I know it's not a general error with every send/recv call I make. I think this assertion checks whether the buffer is large enough to accommodate the data being received, but I'm not even sure about that. If that is the case, what could cause the buffer to be too small? Shouldn't that be taken care of by Boost under the hood?
for (unsigned int i=0;i<world.size();i++)
{
    if (particles_to_be_sent[i].size()>0)
    {
        ghosts_to_be_sent[i].part_crossed_send()='b';
        world.isend(i,20,particles_to_be_sent[i]);
    }
}
mpi::all_to_all(world,ghosts_to_be_sent,ghosts_received);
// receive particles
for (int recv_rank=0;recv_rank<world.size();recv_rank++)
{
    if (ghosts_received[recv_rank].part_crossed_send()=='b')
    {
        world.recv(recv_rank,20,particles_received); // this line fails
        for (unsigned int i=0;i<particles_received.size();i++)
        {
            // do stuff
        }
    }
    for (unsigned int j=0;j<ghosts_received[recv_rank].ghosts_to_send().size();j++)
    {
        // do stuff
    }
}

Does waitpid block on a stopped job?

I have a child process which has received a SIGTSTP signal.
When I call
waitpid(-1,NULL,0);
the parent blocks, but the documentation says that waitpid returns the pid of stopped children.
#include <unistd.h>
#include <stdio.h>
#include <signal.h>
#include <sys/wait.h>

int main() {
    int pid;
    if( (pid=fork()) > 0) {
        sleep(5);
        if(kill(pid,SIGTSTP) < 0)   /* stop the child */
            printf("kill error\n");
        int status;
        waitpid(-1,&status,0);      /* blocks here instead of returning */
        printf("Returned %d\n",WIFSTOPPED(status));
    }
    else if(pid==0) {
        while(1);                   /* child spins until stopped */
    }
    return 0;
}
You missed the option WUNTRACED for waitpid (3rd argument). Without it, waitpid doesn't return until the child terminates.
When the WUNTRACED option is set, children of the current process that are stopped due to a SIGTTIN, SIGTTOU, SIGTSTP, or SIGSTOP signal also have their status reported (from the mac man page).
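Applied to the program above, the fix is the one-line change below; with it, waitpid should return as soon as the child stops:

int status;
waitpid(-1, &status, WUNTRACED);             /* also report stopped children */
printf("Returned %d\n", WIFSTOPPED(status)); /* should now print "Returned 1" */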
