locks in OpenMP - qt

Everyone good time of day!
Not so long ago, I was able to parallel the recursive algorithm for searching for possible options for combining some events. At the moment, the code is as follows:
//#include's
// function announcements
// declaring a global variable:
QVector<QVector<QVector<float>>> variant; (or "std::vector")
int main() {
// reads data from file
// data are converted and analyzed
// the variant variable containing the current best result is filled in (here - by pre-analysis)
#pragma omp parallel shared(variant)
#pragma omp master
// occurs call a recursive algorithm of search all variants:
PEREBOR(Tabl_1, a, i_a, ..., reс_depth);
return 0;
}
void PEREBOR(QVector<QVector<uint8_t>> Tabl_1, QVector<A_struct> a, uint8_t i_a, ..., uint8_t reс_depth)
{
// looking for the boundaries of the first cycle for some reasons
for (int i = quantity; i < another_quantity; i++) {
// the Tabl_1 is processed and modified to determine the number of steps in the subsequent for cycle
for (int k = 0; k < the_quantity_just_found; k++) {
if the recursion depth is not 1, we go down further: {
// add descent to the next recursion level to the call stack:
#pragma omp task
PEREBOR(Tabl_1_COPY, a, i_a, ..., reс_depth-1);
}
else (if we went down to the lowest level): {
if (condition fulfilled) // condition check - READ variant variable
variant = it_is_equal_to_that_,_to_that...;
else
continue;
}
}
}
}
At the moment, this thing really works well, and on six cores the CPU gives an increase of more than 5.7 from the single-core version.
As you can see, with a sufficiently large number of threads, there may be a failure associated with the simultaneous reading/writing of the variant variable. I understand she needs to be protected. At the moment, I see an output only in the use of blocking functions, since the critical section is not suitable because if the variable variant is written in only one section of the code (at the lowest level of recursion), then the reading occurs in many places.
Actually, here is the question - if I apply the constructions:
omp_lock_t lock;
int main() {
...
omp_init_lock(&lock);
#pragma omp parallel shared(variant, lock)
...
}
...
else (if we went down to the lowest level): {
if (condition fulfilled) { // condition check - READ variant variable
omp_set_lock(&lock);
variant = it_is_equal_to_that_,_to_that...;
omp_unset_lock(&lock);
}
else
continue;
...
will this lock protect the reading of the variable in all other places? Or will I need to manually check the lock status and pause the thread before reading elsewhere?
I will be incredibly grateful to the distinguished community for help!

In OpenMP specification (1.4.1 The structure of OpenMP memory model) you can read
The OpenMP API provides a relaxed-consistency, shared-memory model.
All OpenMP threads have access to a place to store and to retrieve
variables, called the memory. In addition, each thread is allowed to
have its own temporary view of the memory. The temporary view of
memory for each thread is not a required part of the OpenMP memory
model, but can represent any kind of intervening structure, such as
machine registers, cache, or other local storage, between the thread
and the memory. The temporary view of memory allows the thread to
cache variables and thereby to avoid going to memory for every
reference to a variable.
This practically means that (as with any relaxed memory model), only at well-defined points, are threads guaranteed to have the same, consistent view on the value of shared variables. In between such points, the temporary view may be different across the threads.
In your code you handled the problem of simultaneous writing of the same variable, but there is no guarantee that an another thread reads the correct value of the variable without additional measures.
You have 3 options to do (Note that each of these solutions not only will handle simultaneous read/writes, but also provides a consistent view on the value of shared variables.):
If your variable is scalar type, the best solution is to use atomic operations. This is the fastest option as atomic operations are typically supported by the hardware.
#pragma omp parallel
{
...
#pragma omp atomic read
tmp=variant;
....
#pragma omp atomic write
variant=new_value;
}
Use critical construct. This solution can be used if your variable is a complex type (such as class) and its read/write cannot be performed atomically. Note that it is much less efficient (slower) than an atomic operation.
#pragma omp parallel
{
...
#pragma omp critical
tmp=variant;
....
#pragma omp critical
variant=new_value;
}
Use locks for each read/write of your variable. Your code is OK for write, but have to use it for reads as well. It requires the most coding, but practically the result is the same as using the critical construct. Note that OpenMP implementations typically use locks to implement critical constructs.

Related

QThreadStorage vs C++11 thread_local

As the title suggests: What are the differences. I know that QThreadStorage existed long before the thread_local keyword, which is probably why it exists in the first place - but the question remains: Is there any significant difference in what they do between those two (except for the extended API of QThreadStorage).
Im am assuming a normal use case of a static variable - as using QThreadStorage as non static is possible, but not recommended to do.
static thread_local int a;
// vs
static QThreadStorage<int> b;
Well, since Qt is open-source you can basically figure out the answers from the Qt sources. I am currently looking at 5.9 and here are some things, that I'd classify as significant:
1) Looking at qthreadstorage.cpp, the get/set methods both have this block in the beginning:
QThreadData *data = QThreadData::current();
if (!data) {
qWarning("QThreadStorage::set: QThreadStorage can only be used with threads started with QThread");
return 0;
}
So (an maybe this has changed/will change) you can't mix QThreadStorage with anything else than QThread, while the thread_local keyword does not have similar restrictions.
2) Quoting cppreference on thread_local:
thread storage duration. The object is allocated when the thread begins and deallocated when the thread ends.
Looking at qthreadstorage.cpp, the QThreadStorage class actually does not contain any storage for T, the actual storage is allocated upon the first get call:
QVector<void *> &tls = data->tls;
if (tls.size() <= id)
tls.resize(id + 1);
void **v = &tls[id];
So this means that the lifetime is quite different between these two - with QThreadStorage you actually have more control, since you create the object and set it with the setter, or default initialize with the getter. E.g. if the ctor for the type could throw, with QThreadStorage you could catch that, with thread_local I am not even sure.
I assume these two are significant enough not to go any deeper.

Do you make safe and unsafe version of your functions or just stick to the safe version? (Embedded System)

let's say you have a function that set an index and then update few variables based on the value stored in the array element which the index is pointing to. Do you check the index to make sure it is in range? (In embedded system environment to be specific Arduino)
So far I have made a safe and unsafe version for all functions, is that a good idea? In some of my other codes I noticed that having only safe functions result in checking conditions multiple time as the libraries get larger, so I started to develop both. The safe function checks the condition and call the unsafe function as shown in example below for the case explained above.
Safe version:
bool RcChannelModule::setFactorIndexAndUpdateBoundaries(factorIndex_T factorIndex)
{
if(factorIndex < N_FACTORS)
{
setFactorIndexAndUpdateBoundariesUnsafe(factorIndex);
return true;
}
return false;
}
Unsafe version:
void RcChannelModule::setFactorIndexAndUpdateBoundariesUnsafe(factorIndex_T factorIndex)
{
setCuurentFactorIndexUnsafe(factorIndex);
updateOutputBoundaries();
}
If I am doing it wrong fundamentally please let me know why and how I could avoid that. Also I would like to know, generally when you program, do you consider the future user to be a fool or you expect them to follow the minimal documentation provided? (the reason I say minimal is because I do not have the time to write a proper documentation)
void RcChannelModule::setCuurentFactorIndexUnsafe(const factorIndex_T factorIndex)
{
currentFactorIndex_ = factorIndex;
}
Safety checks, such as array index range checks, null checks, and so on, are intended to catch programming errors. When these checks fail, there is no graceful recovery: the best the program can do is to log what happened, and restart.
Therefore, the only time when these checks become useful is during debugging and testing of your code. C++ provides built-in functionality for dealing with this through asserts, which are kept in the debug versions of the code, but compiled out from the release version:
void RcChannelModule::setFactorIndexAndUpdateBoundariesUnsafe(factorIndex_T factorIndex) {
assert(factorIndex < N_FACTORS);
setCuurentFactorIndexUnsafe(factorIndex);
updateOutputBoundaries();
}
Note: [When you make a library for external use] an argument-checking version of each external function perhaps makes sense, with non-argument-checking implementations of those and all internal-only functions. If you perform argument checking then do it (only) at the boundary between your library and the client code. But it's pointless to offer a choice to your users, for if you want to protect them from usage errors then you cannot rely on them to choose the "safe" versions of your functions. (John Bollinger)
Do you make safe and unsafe version of your functions or just stick to the safe version?
For higher level code, I recommend one version, a safe one.
High level code, with a large set of related functions and data, the combinations of interactions of data and code are not possible to fully check at development time. When an error is detected, the data should be set to indicate an error state. Subsequent use of data within these functions would be aware of the error state.
For lowest level -time critical routines, I'd go with #dasblinkenlight answer. Create one source code that compiles 2 ways per the debug and release compiles.
Yet keep in mind #pete becker, it this really likely a performance bottle neck to do a check?
With floating-point related routines, use the NaN to help keep track of an unrecoverable error.
Lastly, as able, create functions that do not fail and avoid the issue. With many, not all, this only requires small code additions. It often only adds a constant of time performance penalty and not a O(n) penalty.
Example: Consider a function to lop off the first character of a string - in place.
// This work fine as long as s[0] != 0
char *slop_1(char *s) {
size_t len = strlen(s); // most work is here
return memmove(s, s + 1, len); // and here
}
Instead define the function, and code it, to do nothing when s[0] == 0
char *slop_2(char *s) {
size_t len = strlen(s);
if (len > 0) { // negligible additional work
memmove(s, s + 1, len);
}
return s;
}
Similar code can be applied to OP's example. Note that it is "safe", at least within the function. The assert() scheme can still be used to discovery development issues. Yet the released code, without the assert(), still checks the range.
void RcChannelModule::setFactorIndexAndUpdateBoundaries(factorIndex_T factorIndex)
{
if(factorIndex < N_FACTORS) {
setFactorIndexAndUpdateBoundariesUnsafe(factorIndex);
} else {
assert(1);
}
}
Since you tagged this Arduino and embedded, you have a very resource-constrained system, one of the crappiest processors still manufactured.
On such a system you cannot afford extra error handling. It is better to properly document what values the parameters passed to the function must have, then leave the checking of this to the caller.
The caller can then either check this in run-time, if needed, or otherwise in compile-time with a static assert. Your function would however not be able to implement it as a static assert, as it can't know if factorIndex is a run-time variable or a compile-time constant.
As for "I have no time to write proper documentation", that's nonsense. It takes far less time to document this function than to post this SO question. You don't necessarily have to write an essay in some Word file. You don't necessarily have to use Doxygen or similar.
But you do need to write the bare minimum of documentation: In the header file, document the purpose and expected values of all function parameters in the form of comments. Preferably you should have a coding standard for how to document such functions. A minimal documentation of public API functions in the form of comments is part of your job as programmer. The code is not complete until this is written.

QList crashes when size is large

I am using a QList to store the data read from a SQL Table. The table has more than a million records. I need to get them in a list and then do some processing on the list.
QList<QVariantMap> list;
QString selectNewDB = QString("SELECT * FROM newDatabase.M106SRData");
QSqlQuery selectNewDBQuery = QSqlDatabase::database("CurrentDBConn").exec(selectNewDB);
while (selectNewDBQuery.next())
{
QSqlRecord selectRec = selectNewDBQuery.record();
QVariantMap varMap;
QString key;
QVariant value;
for (int i=0; i < selectRec.count(); ++i)
{
key = selectRec.fieldName(i);
value = selectRec.value(i);
varMap.insert(key, value);
}
list << varMap;
}
I get "qvector.h, line 534: Out of memory" error.
The program crashes when the list reaches the size of <1197762 items>. I tried using reserve() but it didn't work. Is QList limited to a specific size?
You've ran out of memory because the C++ runtime has reported that it cannot allocate any more memory. It's not a problem with Qt containers. The containers are limited to 2^31-1 items due to the size of int the use for the index. You're nowhere near that.
At the very least:
Use a QVector instead of QList as it has much lower overhead for the QVariantMap element.
Attempt to reserve the space if the query allows it: this will almost halve the memory requirements!
Compile for a 64 bit target if you can.
QVector<QVariantMap> list;
QString selectNewDB = QString("SELECT * FROM newDatabase.M106SRData");
QSqlQuery selectNewDBQuery = QSqlDatabase::database("CurrentDBConn").exec(selectNewDB);
auto const size = selectNewDBQuery.size();
if (size > 0) list.reserve(size);
while (selectNewDBQuery.next())
{
auto selectRec = selectNewDBQuery.record();
QVariantMap varMap;
for (int i=0; i < selectRec.count(); ++i)
{
auto const key = selectRec.fieldName(i);
auto const value = selectRec.value(i);
varMap.insert(key, value);
}
list.append(varMap);
}
You either don't have enough ram, or more likely are using a 32bit Qt build, which cannot utilize more than 4 GB of ram. Or maybe both. Size wise the container itself should be able to handle more than 2 billion elements.
QList ain't helping either, as in your case it will likely store every element as a pointer and do an additional heap allocation for the actual variant map. So you end up with a sizeable additional heap allocation overhead.
And since the query already contains a significant amount of data, it probably eats a decent amount of ram itself.
Unless you have disabled pagefile, running out of ram on its own should not result in a crash, as it would just start paging and ruin performance, but keep running, so you are likely hitting the memory limit for a 32 bit process, which may be as low as a mere 2 GB.
Aside from doing the things Kuba suggested in his answer, you might want to split your query into smaller pieces, and get the results in a few queries rather than one if possible, and process them one at a time, reducing the memory used by the query results and freeing the memory for a query once you are done with it.
There is also the option of saving on RAM from QString, in case you have a lot of repeating strings. As it is implicitly shared, you can have a bunch of identical strings that all use the same underlying data. You can take advantage of this, by using a QSet to keep a collection of unique strings and a quick check if a string is already present. Then instead of using the string from the query result, use the one from the set. All identical strings copied by value from the set will reuse the same string data. In contrast, your current approach will use n amount of space for every n duplicated strings.

Confusion over the time taken by write() and read() sys calls

The below code simply calculates the time taken to write a file.
#include<time.h>
void main()
{
int fp;
long a,b;
char *str = "Life is like that only";
fp = open("tmp.txt",O_WRONLY,0666);
time(&a);
write(fp,str);
time(&b);
/*(b-a) should be the time taken to write
* the file tmp.txt.
*/
close(fp);
return;
}
My question is that if we have a single CPU then whether the time taken (b-a) would be exact or it can be affected by the execution of other process running parallel.
Some posts here mention that write() and read() can be treated almost like atomic syscalls as if they are not successful EINTR is set that simply means to try again.But still does that mean if it is successful then in the course of its execution all other processes are on hold.
Other processes (that are not using I/O or that are using I/O on different devices) can run while your process is waiting for the write to complete, and your process may not immediately get the CPU back after it completes.
In practice, for a small write to a regular file, your write() will probably return immediately after copying your data into a kernel-space buffer, rather than waiting for it to go all the way to the disk.

How do game trainers change an address in memory that's dynamic?

Lets assume I am a game and I have a global int* that contains my health. A game trainer's job is to modify this value to whatever in order to achieve god mode. I've looked up tutorials on game trainers to understand how they work, and the general idea is to use a memory scanner to try and find the address of a certain value. Then modify this address by injecting a dll or whatever.
But I made a simple program with a global int* and its address changes every time I run the app, so I don't get how game trainers can hard code these addresses? Or is my example wrong?
What am I missing?
The way this is usually done is by tracing the pointer chain from a static variable up to the heap address containing the variable in question. For example:
struct CharacterStats
{
int health;
// ...
}
class Character
{
public:
CharacterStats* stats;
// ...
void hit(int damage)
{
stats->health -= damage;
if (stats->health <= 0)
die();
}
}
class Game
{
public:
Character* main_character;
vector<Character*> enemies;
// ...
}
Game* game;
void main()
{
game = new Game();
game->main_character = new Character();
game->main_character->stats = new CharacterStats;
// ...
}
In this case, if you follow mikek3332002's advice and set a breakpoint inside the Character::hit() function and nop out the subtraction, it would cause all characters, including enemies, to be invulnerable. The solution is to find the address of the "game" variable (which should reside in the data segment or a function's stack), and follow all the pointers until you find the address of the health variable.
Some tools, e.g. Cheat Engine, have functionality to automate this, and attempt to find the pointer chain by themselves. You will probably have to resort to reverse-engineering for more complicated cases, though.
Discovery of the access pointers is quite cumbersome and static memory values are difficult to adapt to different compilers or game versions.
With API hooking of malloc(), free(), etc. there is a different method than following pointers. Discovery starts with recording all dynamic memory allocations and doing memory search in parallel. The found heap memory address is then reverse matched against the recorded memory allocations. You get to know the size of the object and the offset of your value within the object. You repeat this with backtracing and get the jump-back code address of a malloc() call or a C++ constructor. With that information you can track and modify all objects which get allocated from there. You dump the objects and compare them and find a lot more interesting values. E.g. the universal elite game trainer "ugtrain" does it like this on Linux. It uses LD_PRELOAD.
Adaption works by "objdump -D"-based disassembly and just searching for the library function call with the known memory size in it.
See: http://en.wikipedia.org/wiki/Trainer_%28games%29
Ugtrain source: https://github.com/sriemer/ugtrain
The malloc() hook looks like this:
static __thread bool no_hook = false;
void *malloc (size_t size)
{
void *mem_addr;
static void *(*orig_malloc)(size_t size) = NULL;
/* handle malloc() recursion correctly */
if (no_hook)
return orig_malloc(size);
/* get the libc malloc function */
no_hook = true;
if (!orig_malloc)
*(void **) (&orig_malloc) = dlsym(RTLD_NEXT, "malloc");
mem_addr = orig_malloc(size);
/* real magic -> backtrace and send out spied information */
postprocess_malloc(size, mem_addr);
no_hook = false;
return mem_addr;
}
But if the found memory address is located within the executable or a library in memory, then ASLR is likely the cause for the dynamic. On Linux, libraries are PIC (position-independent code) and with latest distributions all executables are PIE (position-independent executables).
EDIT: never mind it seems it was just good luck, however the last 3 numbers of the pointer seem to stay the same. Perhaps this is ASLR kicking in and changing the base image address or something?
aaahhhh my bad, i was using %d for printf to print the address and not %p. After using %p the address stayed the same
#include <stdio.h>
int *something = NULL;
int main()
{
something = new int;
*something = 5;
fprintf(stdout, "Address of something: %p\nValue of something: %d\nPointer Address of something: %p", &something, *something, something);
getchar();
return 0;
}
Example for a dynamicaly allocated varible
The value I want to find is the number of lives to stop my lives from being reduced to 0 and getting game over.
Play the Game and search for the location of the lifes variable this instance.
Once found use a disassembler/debugger to watch that location for changes.
Lose a life.
The debugger should have reported the address that the decrement occurred.
Replace that instruction with no-ops
Got this pattern from the program called tsearch
A few related websites found from researching this topic:
http://deviatedhacking.com/index.php?/topic/75-dynamic-memory-allocation/
http://www.edgeofnowhere.cc/viewforum.php?f=183
http://www.oldschoolhack.de/tutorials/Theories%20and%20methods%20of%20code-caves.htm
http://webcache.googleusercontent.com/search?q=cache:4wzMzFIZx54J:gamehacking.com/forums/tutorials-beginners/11597-c-making-game-trainer.html+reading+a+dynamic+memory+address+game+trainer&cd=2&hl=en&ct=clnk&gl=au&client=firefox-a (A google cache version)
http://www.codeproject.com/KB/cpp/codecave.aspx
The way things like Gameshark codes were figured out were by dumping the memory image of the application, then doing one thing, then looking to see what changed. There might be a few things changing, but there should be patterns to look for. E.g. dump memory, shoot, dump memory, shoot again, dump memory, reload. Then look for changes and get an idea for where/how ammo is stored. For health it'll be similar, but a lot more things will be changing (since you'll be moving at the very least). It'll be easiest though to do it when minimizing the "external effects," e.g. don't try to diff memory dumps during a firefight because a lot is happening, do your diffs while standing in lava, or falling off a building, or something of that nature.

Resources