I am using boost-mpi and I am getting an error that I am having a hard time figuring out. On a recv call, an assertion in the Boost code fails:
void boost::mpi::binary_buffer_iprimitive::load_impl(void *, int): Assertion `position+l<=static_cast<int>(buffer_.size())'
This is from the file binary_buffer_iprimitive.hpp located in boost/mpi/detail/.
This does not occur the first time I receive, so I know it's not a general problem with every send/recv call I make. I think this assertion checks whether the buffer is large enough to accommodate the data being received, but I'm not even sure about that. If that is the case, what could cause the buffer to be too small? Shouldn't that be taken care of by Boost under the hood?
for (unsigned int i = 0; i < world.size(); i++)
{
    if (particles_to_be_sent[i].size() > 0)
    {
        ghosts_to_be_sent[i].part_crossed_send() = 'b';
        world.isend(i, 20, particles_to_be_sent[i]);
    }
}
mpi::all_to_all(world, ghosts_to_be_sent, ghosts_received);
// receive particles
for (int recv_rank = 0; recv_rank < world.size(); recv_rank++)
{
    if (ghosts_received[recv_rank].part_crossed_send() == 'b')
    {
        world.recv(recv_rank, 20, particles_received); // this line fails
        for (unsigned int i = 0; i < particles_received.size(); i++)
        {
            // do stuff
        }
    }
    for (unsigned int j = 0; j < ghosts_received[recv_rank].ghosts_to_send().size(); j++)
    {
        // do stuff
    }
}
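Edit: in case it is relevant, I currently discard the request objects that isend returns. A minimal sketch of keeping them and completing them with wait_all (from boost/mpi/nonblocking.hpp) would look like this; I have not verified that this is related to the assertion:
#include <boost/mpi.hpp>
#include <boost/mpi/nonblocking.hpp>
#include <vector>

// Sketch only: collect every request from isend and complete them
// once the matching receives have been posted.
std::vector<boost::mpi::request> requests;
for (unsigned int i = 0; i < world.size(); i++)
{
    if (particles_to_be_sent[i].size() > 0)
    {
        ghosts_to_be_sent[i].part_crossed_send() = 'b';
        requests.push_back(world.isend(i, 20, particles_to_be_sent[i]));
    }
}
// ... post the matching recv calls as above ...
boost::mpi::wait_all(requests.begin(), requests.end());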
I've written this code to receive a series of char variables through USART6 and store them in a string. The problem is that the first received value is just junk! Any help would be appreciated.
while (1)
{
    //memset(RxBuffer, 0, sizeof(RxBuffer));
    i = 0;
    requestRead(&dt, 1);
    RxBuffer[i++] = dt;
    while (i < 11)
    {
        requestRead(&dt, 1);
        RxBuffer[i++] = dt;
        HAL_Delay(5);
    }
}
Function prototype:
static void requestRead(char *buffer, uint16_t length)
{
    while (HAL_UART_Receive_IT(&huart6, (uint8_t *)buffer, length) != HAL_OK)
        HAL_Delay(10);
}
First of all, the HAL_Delay seems to be redundant. Is there any particular reason for it?
The HAL_UART_Receive_IT function is used for non-blocking mode. What you have written seems to be more like blocking mode, which uses the HAL_UART_Receive function.
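For comparison, a blocking read of your 11-byte frame would look roughly like this (sketch, untested):
// Sketch: HAL_UART_Receive blocks until 'Size' bytes arrive or the timeout expires.
if (HAL_UART_Receive(&huart6, (uint8_t *)RxBuffer, 11, HAL_MAX_DELAY) == HAL_OK)
{
    // RxBuffer now holds a complete 11-byte frame
}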
Also, I believe you need something like this:
Somewhere in the main:
// global variables
volatile uint8_t Rx_byte;
volatile uint8_t Rx_data[10];
volatile uint8_t Rx_indx = 0;
HAL_UART_Receive_IT(&huart1, (uint8_t *)&Rx_byte, 1);
And then the callback function:
void HAL_UART_RxCpltCallback(UART_HandleTypeDef *huart)
{
    if (huart->Instance == USART1) { // current UART
        if (Rx_indx < sizeof(Rx_data))
            Rx_data[Rx_indx++] = Rx_byte; // add the byte to Rx_data
    }
    HAL_UART_Receive_IT(&huart1, (uint8_t *)&Rx_byte, 1); // re-arm one-byte reception
}
The idea is to always receive only one byte and save it into an array. Then, elsewhere, check the number of received bytes, or match some pattern, etc., and process the received frame.
On the other hand, if the number of bytes is always the same, you can change the HAL_UART_Receive_IT call to request the correct byte count.
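For illustration, the callback could detect the end of a frame like this (sketch; the '\n' terminator is my own assumed convention):
// Sketch: detect end-of-frame in the callback and let the main loop
// process the completed frame.
volatile uint8_t frame_ready = 0;

void HAL_UART_RxCpltCallback(UART_HandleTypeDef *huart)
{
    if (huart->Instance == USART1) {
        Rx_data[Rx_indx++] = Rx_byte;
        if (Rx_byte == '\n' || Rx_indx >= sizeof(Rx_data)) {
            frame_ready = 1; // main loop reads Rx_data, then clears frame_ready
            Rx_indx = 0;
        }
    }
    HAL_UART_Receive_IT(&huart1, (uint8_t *)&Rx_byte, 1);
}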
I have two pointers in memory and I want to swap them atomically, but atomic operations in CUDA support only integer types. Is there a way to do the following swap?
classA* a1 = malloc(...);
classA* a2 = malloc(...);
atomicSwap(a1,a2);
When writing device-side code...
While CUDA provides atomics, they can't cover multiple (possibly remote) memory locations at once.
To perform this swap, you will need to "protect" access to both of these values with something like a mutex, and have whoever wants to write to them hold the mutex for the duration of the critical section (as with C++'s host-side std::lock_guard). This can be done using CUDA's actual atomic facilities, e.g. compare-and-swap, and is the subject of this question:
Implementing a critical section in CUDA
A caveat to the above is mentioned by @RobertCrovella: if you can make do with, say, a pair of 32-bit offsets rather than two 64-bit pointers, you could store them in a single 64-bit aligned struct and use compare-and-exchange on the whole struct to implement an atomic swap of both values at once.
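A rough sketch of that idea (my own names; untested, and it assumes the pair lives in a naturally aligned 64-bit slot):
// Sketch: pack two 32-bit offsets into one 64-bit word so a single
// atomicExch swaps both "pointers" at once.
struct __align__(8) OffsetPair {
    unsigned a;
    unsigned b;
};

__device__ OffsetPair atomicSwapPair(unsigned long long *slot, OffsetPair desired)
{
    unsigned long long packed, old;
    memcpy(&packed, &desired, sizeof(packed)); // reinterpret the struct as 64 bits
    old = atomicExch(slot, packed);            // the actual atomic swap
    OffsetPair previous;
    memcpy(&previous, &old, sizeof(previous));
    return previous;                           // the pair that was there before
}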
... but is it really device side code?
Your code actually doesn't look like something one would run on the device: memory allocation is usually (though not always) done from the host side before you launch your kernel and do the actual work. If you can make sure these alterations only happen on the host side (think CUDA events and callbacks), and that device-side code will not be interfered with by them, you can just use your plain vanilla C++ facilities for concurrent programming (like the lock_guard I mentioned above).
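For the host-side case, that could be as simple as this (sketch):
// Sketch: on the host, one mutex guarding both pointers is enough.
#include <mutex>
#include <utility>

struct classA;   // as in the question
std::mutex ptr_mutex;
classA *a1, *a2;

void swapPointers()
{
    std::lock_guard<std::mutex> guard(ptr_mutex); // critical section
    std::swap(a1, a2);                            // both writes happen under the lock
}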
I managed to get the behaviour I needed; it is not an atomic swap, but it is still safe. The context was a monotonic linked list working on both CPU and GPU:
template<typename T>
union readablePointer
{
    T* ptr;
    unsigned long long int address;
};

template<typename T>
struct LinkedList
{
    struct Node
    {
        T value;
        readablePointer<Node> previous;
    };

    Node start;
    Node end;
    int size;

    __host__ __device__ void initialize()
    {
        size = 0;
        start.previous.ptr = nullptr;
        end.previous.ptr = &start;
    }

    __host__ __device__ void push_back(T value)
    {
        Node* node = static_cast<Node*>(malloc(sizeof(Node))); // heap allocation on host or device
        readablePointer<Node> nodePtr;
        nodePtr.ptr = node;
        nodePtr.ptr->value = value;
#ifdef __CUDA_ARCH__
        nodePtr.ptr->previous.address = atomicExch(&end.previous.address, nodePtr.address);
        atomicAdd(&size, 1);
#else
        nodePtr.ptr->previous.address = end.previous.address;
        end.previous.address = nodePtr.address;
        size += 1;
#endif
    }

    __host__ __device__ T pop_back()
    {
        assert(end.previous.ptr != &start);
        readablePointer<Node> lastNodePtr;
        lastNodePtr.ptr = nullptr;
#ifdef __CUDA_ARCH__
        lastNodePtr.address = atomicExch(&end.previous.address, end.previous.ptr->previous.address);
        atomicSub(&size, 1);
#else
        lastNodePtr.address = end.previous.address;
        end.previous.address = end.previous.ptr->previous.address;
        size -= 1;
#endif
        T toReturn = lastNodePtr.ptr->value;
        free(lastNodePtr.ptr);
        return toReturn;
    }

    __host__ __device__ void clear()
    {
        while (size > 0)
        {
            pop_back();
        }
    }
};
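For reference, usage looks like this (host side; the same calls work from a kernel thanks to the __host__ __device__ qualifiers):
LinkedList<int> list;
list.initialize();
list.push_back(1);
list.push_back(2);
int last = list.pop_back(); // last == 2
list.clear();               // pops (and frees) the remaining node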
Is it necessary to clear all the inner lists to avoid a leak?
class Instruction
{
    int opcode;
    int data1;
    int data2;
    bool Load(QTextStream* in);
    void Save(QTextStream* out) const;
};
class Interpreter
{
    QList<QList<Instruction>> steps;
    bool Load(QTextStream* file)
    {
        if (file_is_bad)
        {
            return false;
        }
        int end = steps.size();
        for (int i = 0; i < end; i++)
        {
            steps[i].clear(); // QList::at() returns a const reference, so operator[] is needed here
        }
        steps.clear();
        // now that it's clear, rebuild it from file
        return true;
    }
};
Or can I just call steps.clear(); and call it a day?
Igor was right: steps.clear() is enough, because it destroys all inner QLists, and destroying an inner QList calls the destructor of every Instruction instance.
So as long as a single Instruction does not leak memory (e.g. by calling new in its constructor but no delete in its destructor), QList<QList<Instruction>> will not leak either.
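For illustration, this is the kind of Instruction that would leak (my own counter-example):
// Counter-example: steps.clear() would leak here, because the buffer
// allocated in the constructor is never freed.
class LeakyInstruction
{
public:
    LeakyInstruction() : data(new int[16]) {}
    // missing: ~LeakyInstruction() { delete[] data; }
private:
    int* data;
};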
#include <iostream>
#include <functional>
#include <future>
#include <tchar.h>
void StartBackground(std::function<void()> notify)
{
    auto background = std::async([&]
    {
        notify(); // (A)
    });
}

int _tmain(int argc, _TCHAR* argv[])
{
    StartBackground([](){});
    char c; std::cin >> c; // (B)
    while (1);
    return 0;
}
1) Build and run the code above using Visual Studio 2012.
2) Line (A) triggers an Access Violation in _VARIADIC_EXPAND_P1_0(_CLASS_FUNC_CLASS_0, , , , ):
First-chance exception at 0x0F96271E (msvcp110d.dll) in
ConsoleApplication1.exe: 0xC0000005: Access violation writing location
0x0F9626D8
Most confusingly, the exception can be avoided by removing line (B).
Questions
Why does the callable object notify apparently conflict with the use of std::cin?
What's wrong with this code?
The real world scenario for this simplified example is a function that executes some code in parallel and have that code call a user-supplied notify function when done.
Edit
I found at least one problem in my code: the background variable is destroyed as soon as StartBackground() exits. Since std::async may or may not start a separate thread, and a std::thread's destructor calls terminate() if the thread is still joinable, this might be causing the problem.
The following variant works because it gives the task enough time to complete:
void StartBackground(std::function<void()> notify)
{
    auto background = std::async([&]
    {
        notify(); // (A)
    });
    std::this_thread::sleep_for(std::chrono::seconds(1));
}
Keeping the std::future object alive over a longer period instead of sleeping should also work. But the following code also causes the same access violation:
std::future<void> background;

void StartBackground(std::function<void()> notify)
{
    background = std::async([&]
    {
        notify(); // (A)
    });
}
whereas using a std::thread in the same manner works as expected:
std::thread background;

void StartBackground(std::function<void()> notify)
{
    background = std::thread([&]
    {
        notify(); // (A)
    });
}
I'm completely puzzled.
I must be missing some very crucial points here regarding std::async and std::thread.
The result of std::async is a future, not a running thread. You have to synchronize with the task by calling background.get(); without that, the client procedure may never get executed.
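Building on that, one way to make the simplified example safe (a sketch, combining a by-value capture with a kept future, so the lambda never refers to the destroyed notify parameter) would be:
#include <functional>
#include <future>

// Sketch: capture notify by value so the task owns its own copy,
// and return the future so the caller decides when to wait.
std::future<void> StartBackground(std::function<void()> notify)
{
    return std::async(std::launch::async, [notify] // by value, not [&]
    {
        notify(); // (A)
    });
}

// Usage:
// auto task = StartBackground([]{ /* ... */ });
// task.get(); // blocks until the notification has run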
I tried running QtConcurrent::run() in a loop, but the program crashes (I am using libsmbclient):
void Scanner::scan()
{
    for (int i = 0; i < ipList.length(); i++)
    {
        QtConcurrent::run(this, &Scanner::scanThread, i);
    }
}
void Scanner::scanThread(int i)
{
    int dh;
    QString ip;
    ip = "smb://" + ipList[i] + "/";
    dh = smbc_opendir(ip.toAscii()); // debugger points to this location
    if (dh < 0)
        return;
    emit updateTree(i, dh); // on commenting this line, it still crashes
}
Error:
talloc: access after free error - first free may be at ../lib/util/talloc_stack.c:103
Bad talloc magic value - access after free
The program has unexpectedly finished.
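Edit: one workaround I am considering (untested sketch), since the talloc "access after free" error suggests libsmbclient state is being shared across threads, is to serialize the smbc_* calls with a mutex:
#include <QMutex>
#include <QMutexLocker>

QMutex smbMutex; // my own addition: one libsmbclient call at a time

void Scanner::scanThread(int i)
{
    QString ip = "smb://" + ipList[i] + "/";
    int dh;
    {
        QMutexLocker locker(&smbMutex); // guard the possibly non-thread-safe library
        dh = smbc_opendir(ip.toAscii());
    }
    if (dh < 0)
        return;
    emit updateTree(i, dh);
}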