Qt: PNG image transparency lost after opening - qt

I'm trying to open a PNG image using Qt, but the alpha value of every pixel is always 255, even when the image is blank (i.e., the alpha value should be 0).
I tried converting the image to ARGB format after loading, but that didn't help. Here is an example of a blank PNG I'm trying to load:
Image:
QImage im = QImage("JGbVc0r.png");
int p1 = 20;
int p2 = 20;
QImage img = im.convertToFormat(QImage::Format_ARGB32);
qInfo()<<qRed(img.pixel(p1,p2))<<qBlue(img.pixel(p1,p2))<<qGreen(img.pixel(p1,p2))<<qAlpha(img.pixel(p1,p2));
The expected output is
0 0 0 0
but I get
255 255 255 255
Any suggestions on how to get the desired output? Thanks in advance. I'm using Qt 5.11.

I tried to reproduce the OP's issue.
Unfortunately, imgur.com is blocked at my company (where I first stumbled on this question). Hence, I started with a small sample image that I prepared myself in GIMP.
It's really small: 8x8 pixels with RGBA channels.
Next, I made a small sample application to inspect the result in a QImage – testQImageRGBA.cc:
#include <functional>
#include <iomanip>
#include <iostream>
#include <QtWidgets>
void dump(
const QImage &qImg, int row, int col,
std::function<void(const QImage&, int, int)> dumpPixel)
{
qDebug() << qImg;
const int nR = qImg.height(), nC = qImg.width();
const int iR0 = row >= 0 ? row : 0;
const int iRN = row >= 0 ? row + 1 : nR;
const int iC0 = col >= 0 ? col : 0;
const int iCN = col >= 0 ? col + 1 : nC;
std::cout << "Dump of rows [" << iR0 << ", " << iRN << "),"
<< " cols [" << iC0 << ", " << iCN << ")\n"
<< "Pixel format: '#RRGGBBAA'\n";
for (int iR = iR0; iR < iRN; ++iR) {
for (int iC = iC0; iC < iCN; ++iC) {
dumpPixel(qImg, iR, iC);
}
std::cout << '\n';
}
}
int main(int argc, char **argv)
{
std::function<void(const QImage&, int, int)> dumpPixelOP // OPs flavor
= [](const QImage &qImg, int iR, int iC) {
std::cout << std::hex << "_#"
<< std::setw(2) << std::setfill('0') << qRed(qImg.pixel(iC, iR))
<< std::setw(2) << std::setfill('0') << qGreen(qImg.pixel(iC, iR))
<< std::setw(2) << std::setfill('0') << qBlue(qImg.pixel(iC, iR))
<< std::setw(2) << std::setfill('0') << qAlpha(qImg.pixel(iC, iR));
};
std::function<void(const QImage&, int, int)> dumpPixel // my flavor
= [](const QImage &qImg, int iR, int iC) {
const QColor pixel = qImg.pixelColor(iC, iR);
std::cout << std::hex << " #"
<< std::setw(2) << std::setfill('0') << pixel.red()
<< std::setw(2) << std::setfill('0') << pixel.green()
<< std::setw(2) << std::setfill('0') << pixel.blue()
<< std::setw(2) << std::setfill('0') << pixel.alpha();
};
qDebug() << "Qt Version:" << QT_VERSION_STR;
QApplication app(argc, argv);
const QStringList args = app.arguments();
const QString filePath = args.size() > 1 ? args[1] : "smiley.8x8.rgba.png";
const int row = args.size() > 2 ? args[2].toInt() : -1;
const int col = args.size() > 3 ? args[3].toInt() : -1;
const bool useDumpOP = args.size() > 4;
QImage qImg(filePath);
qImg = qImg.convertToFormat(QImage::Format_ARGB32);
dump(qImg, row, col, useDumpOP ? dumpPixelOP : dumpPixel);
return 0;
}
and a Qt project file – testQImageRGBA.pro:
SOURCES = testQImageRGBA.cc
QT += widgets
Compiled and tested in cygwin64:
$ qmake-qt5 testQImageRGBA.pro
$ make && ./testQImageRGBA
g++ -c -fno-keep-inline-dllexport -D_GNU_SOURCE -pipe -O2 -Wall -W -D_REENTRANT -DQT_NO_DEBUG -DQT_WIDGETS_LIB -DQT_GUI_LIB -DQT_CORE_LIB -I. -isystem /usr/include/qt5 -isystem /usr/include/qt5/QtWidgets -isystem /usr/include/qt5/QtGui -isystem /usr/include/qt5/QtCore -I. -I/usr/lib/qt5/mkspecs/cygwin-g++ -o testQImageRGBA.o testQImageRGBA.cc
g++ -o testQImageRGBA.exe testQImageRGBA.o -lQt5Widgets -lQt5Gui -lQt5Core -lGL -lpthread
Qt Version: 5.9.4
QImage(QSize(8, 8),format=5,depth=32,devicePixelRatio=1,bytesPerLine=32,byteCount=256)
Dump of rows [0, 8), cols [0, 8)
Pixel format: '#RRGGBBAA'
#00000000 #00000000 #000000ff #000000ff #000000ff #000000ff #00000000 #00000000
#00000000 #000000ff #ffff00ff #ffff00ff #ffff00ff #ffff00ff #000000ff #00000000
#000000ff #ffff00ff #0000ffff #ffff00ff #ffff00ff #0000ffff #ffff00ff #000000ff
#000000ff #ffff00ff #ffff00ff #ffff00ff #ffff00ff #ffff00ff #ffff00ff #000000ff
#000000ff #ffff00ff #ff0000ff #ffff00ff #ffff00ff #ff0000ff #ffff00ff #000000ff
#000000ff #ffff00ff #ffff00ff #ff0000ff #ff0000ff #ffff00ff #ffff00ff #000000ff
#00000000 #000000ff #ffff00ff #ffff00ff #ffff00ff #ffff00ff #000000ff #00000000
#00000000 #00000000 #000000ff #000000ff #000000ff #000000ff #00000000 #00000000
The dump reflects exactly what I drew in GIMP. Note the corner pixels, which are transparent black.
I prepared a CMake source file CMakeLists.txt:
project(QImageRGBA)
cmake_minimum_required(VERSION 3.10.0)
set_property(GLOBAL PROPERTY USE_FOLDERS ON)
set(CMAKE_CXX_STANDARD 17)
set(CMAKE_CXX_STANDARD_REQUIRED ON)
set(CMAKE_CXX_EXTENSIONS OFF)
find_package(Qt5Widgets CONFIG REQUIRED)
include_directories("${CMAKE_SOURCE_DIR}")
add_executable(testQImageRGBA testQImageRGBA.cc)
target_link_libraries(testQImageRGBA Qt5::Widgets)
and compiled the same source in VS2017. Tested in cmd.exe on Windows 10:
C:\Users\Scheff\tests\misc\Qt\QImageRGBA>set PATH=%PATH%;D:\Scheff\Qt\5.11.2\msvc2017_64\bin;$(LocalDebuggerEnvironment)
C:\Users\Scheff\tests\misc\Qt\QImageRGBA>build-VS2017\Debug\testQImageRGBA.exe
Qt Version: 5.11.2
QImage(QSize(8, 8),format=5,depth=32,devicePixelRatio=1,bytesPerLine=32,sizeInBytes=256)
Dump of rows [0, 8), cols [0, 8)
Pixel format: '#RRGGBBAA'
#00000000 #00000000 #000000ff #000000ff #000000ff #000000ff #00000000 #00000000
#00000000 #000000ff #ffff00ff #ffff00ff #ffff00ff #ffff00ff #000000ff #00000000
#000000ff #ffff00ff #0000ffff #ffff00ff #ffff00ff #0000ffff #ffff00ff #000000ff
#000000ff #ffff00ff #ffff00ff #ffff00ff #ffff00ff #ffff00ff #ffff00ff #000000ff
#000000ff #ffff00ff #ff0000ff #ffff00ff #ffff00ff #ff0000ff #ffff00ff #000000ff
#000000ff #ffff00ff #ffff00ff #ff0000ff #ff0000ff #ffff00ff #ffff00ff #000000ff
#00000000 #000000ff #ffff00ff #ffff00ff #ffff00ff #ffff00ff #000000ff #00000000
#00000000 #00000000 #000000ff #000000ff #000000ff #000000ff #00000000 #00000000
C:\Users\Scheff\tests\misc\Qt\QImageRGBA>
There is effectively no difference.
Once at home (with access to imgur.com), I downloaded the OP's image JGbVc0r.png and tried again.
First in cygwin64:
$ make && ./testQImageRGBA JGbVc0r.png 20 20
make: Nothing to be done for 'first'.
Qt Version: 5.9.4
QImage(QSize(1000, 1000),format=5,depth=32,devicePixelRatio=1,bytesPerLine=4000,byteCount=4000000)
Dump of rows [20, 21), cols [20, 21)
Pixel format: '#RRGGBBAA'
#00000000
then in VS2017 and cmd.exe:
C:\Users\Scheff\tests\misc\Qt\QImageRGBA>build-VS2017\Debug\testQImageRGBA.exe JGbVc0r.png 20 20
Qt Version: 5.11.2
QImage(QSize(1000, 1000),format=5,depth=32,devicePixelRatio=1,bytesPerLine=4000,sizeInBytes=4000000)
Dump of rows [20, 21), cols [20, 21)
Pixel format: '#RRGGBBAA'
#00000000
OP stated:
The expected output is 0 0 0 0
and that's exactly what I got.
replete provided an interesting hint about a trap concerning QColor::QColor(QRgb), which is the subject of the Q/A
SO: QImage setting alpha in PNG with transparency.
However, I came to the conclusion that this shouldn't be the issue here, because in the OP's code
qInfo()<<qRed(img.pixel(p1,p2))<<qBlue(img.pixel(p1,p2))<<qGreen(img.pixel(p1,p2))<<qAlpha(img.pixel(p1,p2));
QImage::pixel() is used, which returns a QRgb
qRed(), qGreen(), qBlue(), and qAlpha() take a QRgb and return an int.
Hence, the treacherous QColor::QColor(QRgb) is simply not involved. To eliminate my last doubts, I tried this as well in cygwin:
$ make && ./testQImageRGBA JGbVc0r.png 20 20 dumpLikeOP
make: Nothing to be done for 'first'.
Qt Version: 5.9.4
QImage(QSize(1000, 1000),format=5,depth=32,devicePixelRatio=1,bytesPerLine=4000,byteCount=4000000)
Dump of rows [20, 21), cols [20, 21)
Pixel format: '#RRGGBBAA'
_#00000000
and in VS2017 and cmd.exe:
C:\Users\Scheff\tests\misc\Qt\QImageRGBA>build-VS2017\Debug\testQImageRGBA.exe JGbVc0r.png 20 20 dumpLikeOP
Qt Version: 5.11.2
QImage(QSize(1000, 1000),format=5,depth=32,devicePixelRatio=1,bytesPerLine=4000,sizeInBytes=4000000)
Dump of rows [20, 21), cols [20, 21)
Pixel format: '#RRGGBBAA'
_#00000000
So, I simply cannot reproduce the OP's issue, although I tried hard.
The OP might use my code to test as well. Either the OP overlooked something that was not shown in the question, or there is a specific issue on the OP's system.
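To make the point about QRgb concrete, here is a minimal sketch in plain C++ of how the four channels can be unpacked from such a value. It assumes the documented QRgb layout (an unsigned 32-bit value of the form 0xAARRGGBB). The alias Rgb and the helpers alphaOf(), redOf(), greenOf() and blueOf() are hypothetical stand-ins for illustration, not Qt functions.

```cpp
#include <cstdint>

// Stand-in for QRgb, assuming the layout 0xAARRGGBB.
using Rgb = std::uint32_t;

// Hypothetical equivalents of qAlpha(), qRed(), qGreen() and qBlue():
// each one shifts the wanted channel down and masks off the rest.
inline int alphaOf(Rgb c) { return static_cast<int>((c >> 24) & 0xffu); }
inline int redOf(Rgb c)   { return static_cast<int>((c >> 16) & 0xffu); }
inline int greenOf(Rgb c) { return static_cast<int>((c >> 8) & 0xffu); }
inline int blueOf(Rgb c)  { return static_cast<int>(c & 0xffu); }
```

With these helpers, a fully transparent black pixel 0x00000000 yields 0 for all four channels, whereas an opaque white pixel 0xffffffff yields 255 for all four, which matches the output the OP reported.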
My last (desperate) idea was to check what happens if the QImage was simply not loaded (e.g. because the file was not found).
In cygwin:
$ make && ./testQImageRGBA ImageNotFound.png 20 20
make: Nothing to be done for 'first'.
Qt Version: 5.9.4
QImage(null)
Dump of rows [20, 21), cols [20, 21)
Pixel format: '#RRGGBBAA'
QImage::pixelColor: coordinate (20,20) out of range
#000000ff
$ make && ./testQImageRGBA ImageNotFound.png 20 20 dumpLikeOP
make: Nothing to be done for 'first'.
Qt Version: 5.9.4
QImage(null)
Dump of rows [20, 21), cols [20, 21)
Pixel format: '#RRGGBBAA'
QImage::pixel: coordinate (20,20) out of range
QImage::pixel: coordinate (20,20) out of range
QImage::pixel: coordinate (20,20) out of range
QImage::pixel: coordinate (20,20) out of range
_#00303900
Note that the out-of-range accesses are reported on the console. Apart from that, the code seems to run properly but returns arbitrary values.
In VS2017 / cmd.exe:
C:\Users\Scheff\tests\misc\Qt\QImageRGBA>build-VS2017\Debug\testQImageRGBA.exe ImageNotFound.png 20 20
Qt Version: 5.11.2
QImage(null)
Dump of rows [20, 21), cols [20, 21)
Pixel format: '#RRGGBBAA'
QImage::pixelColor: coordinate (20,20) out of range
#000000ff
C:\Users\Scheff\tests\misc\Qt\QImageRGBA>build-VS2017\Debug\testQImageRGBA.exe ImageNotFound.png 20 20 dumpLikeOP
Qt Version: 5.11.2
QImage(null)
Dump of rows [20, 21), cols [20, 21)
Pixel format: '#RRGGBBAA'
_#QImage::pixel: coordinate (20,20) out of range
00QImage::pixel: coordinate (20,20) out of range
30QImage::pixel: coordinate (20,20) out of range
39QImage::pixel: coordinate (20,20) out of range
00
The order of the output changed a bit. Apart from that, I even got the same "arbitrary" values (which IMHO means nothing).

Related

MPI_Sendrecv_replace gets blocked

I am working on code that implements Cannon's matrix multiplication algorithm.
Cannon's algorithm is described in the following fragment in pseudocode:
Executed in parallel:
circular movement with i positions to the left of submatrices Ai,x
circular movement with j positions upwards of submatrices Bx,j
for k = 0 to n/p-1
Executed in parallel:
Ci,j = Ci,j + Ai,j * Bi,j
circular movement with 1 position to the left of submatrices Ai,x
circular movement with 1 position upwards of submatrices Bx,j
However, my code seems to get blocked in the for loop after sending the submatrix B.
int main(int argc, char* argv[])
{
read_input_files(argc, argv);
int rank, size, i, j, shift;
//print_matrix(N, A, 0);
//print_matrix(N, B, 0);
//print_matrix(N, AB);
MPI_Init(&argc, &argv);
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
MPI_Comm_size(MPI_COMM_WORLD, &size);
MPI_Comm comm;
MPI_Status status;
int left, right, up, down;
int shiftsource, shiftdest;
int dims[2] = { 0, 0 }, periods[2] = { 1, 1 }, coords[2];
MPI_Dims_create(size, 2, dims);
MPI_Cart_create(MPI_COMM_WORLD, 2, dims, periods, 1, &comm);
MPI_Cart_coords(comm, rank, 2, coords);
MPI_Cart_shift(comm, 1, -1, &right, &left);
MPI_Cart_shift(comm, 0, -1, &down, &up);
//printf("%d --- %d %d %d %d.\n", rank, right, left, up, down);
if (dims[0] != dims[1]) {
printf("The number of processors must be a perfect square.\n");
if (rank == 0)
printf("The number of processors must be a perfect square.\n");
MPI_Finalize();
return 0;
}
int block_size = N / sqrt(size);
cout << rank << " : dims " << dims[0] << "---------------------------------------" << endl;
int* A_sub = make_sub(A, rank, block_size, size);
int* B_sub = make_sub(B, rank, block_size, size);
int* AB_sub = (int*)calloc(block_size * block_size, sizeof(int));
//print_submatrix(block_size, A_sub, rank);
cout << rank << " : coords " << coords[0] << " * " << coords[1] << "---------------------------------------" << endl;
MPI_Cart_shift(comm, 0, -coords[0], &shiftsource, &shiftdest);
MPI_Sendrecv_replace(A_sub, block_size * block_size, MPI_INT, shiftdest, 1, shiftsource, 1, comm, &status);
cout << rank << " : MPI_Sendrecv_replace A_sub " << endl;
//print_submatrix(block_size, A_sub, rank);
MPI_Cart_shift(comm, 1, -coords[1], &shiftsource, &shiftdest);
MPI_Sendrecv_replace(B_sub, block_size * block_size, MPI_INT, shiftdest, 1, shiftsource, 1, comm, &status);
cout << rank << " : MPI_Sendrecv_replace B_sub " << endl;
for (shift = 0;shift < dims[0];shift++) {
for (i = 0;i < block_size;i++) {
for (j = 0;j < block_size;j++)
{
for (k = 0;k < block_size;k++) {
AB_sub[i * block_size + j] += A_sub[i * block_size + k] * B_sub[k * block_size + j];
}
}
}
if(shift == dims[0]-1) print_submatrix(block_size, AB_sub, rank);
MPI_Cart_shift(comm, 1, 1, &left, &right);
MPI_Sendrecv_replace(A_sub, block_size * block_size, MPI_INT, left, 1, right, 1, comm, MPI_STATUS_IGNORE);
cout << rank << " : MPI_Sendrecv_replace A " << endl;
MPI_Cart_shift(comm, 0, 1, &up, &down);
MPI_Sendrecv_replace(B_sub, block_size * block_size, MPI_INT, up, 1, down, 1, comm, MPI_STATUS_IGNORE);
cout << rank << " : MPI_Sendrecv_replace B " <<endl;
}
//print_matrix(N, AB, rank);
//cout << rank << " : coords " << coords[0] << " * " << coords[1] << "---------------------------------------" << endl;
MPI_Gather(&AB_sub, block_size*block_size, MPI_INT, AB, N*N, MPI_INT, 0, MPI_COMM_WORLD);
MPI_Finalize();// MPI_Comm_free(&comm); Free up communicator
//print_matrix(N, AB, 0);
return 0;
}
read_input_files() is a function that reads the 2 matrices from the files given as command-line arguments.
A matrix after reading file:
0 1 2 3 4 5
6 7 8 9 10 11
12 13 14 15 16 17
18 19 20 21 22 23
24 25 26 27 28 29
30 31 32 33 34 35
B matrix after reading file:
0 1 2 3 4 5
6 7 8 9 10 11
12 13 14 15 16 17
18 19 20 21 22 23
24 25 26 27 28 29
30 31 32 33 34 35
N is the size of the matrix; N is 6 in this case.
Your call MPI_Cart_shift(comm, 1, -coords[1], ...) has a strange shift parameter: you're shifting by an amount that depends on the process's own coordinate. That should probably be MPI_Cart_shift(comm, 1, -1, ...).
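Why a coordinate-dependent displacement is suspicious can be checked without MPI. The following is only a sketch of a simulation, assuming a periodic q x q grid and the MPI_Cart_shift convention that the destination is at coord + disp and the source at coord - disp along the shifted dimension. The names wrapCoord() and shiftsPairUp() are hypothetical helpers. The function reports whether every process is the source that its destination expects; if it is not, a send can be left without a matching receive, which is a typical way for MPI_Sendrecv_replace to hang.

```cpp
#include <functional>

// Wrap a coordinate onto a periodic axis of length q.
int wrapCoord(int x, int q) { return ((x % q) + q) % q; }

// Simulates one shift on a periodic q x q process grid (no MPI involved).
// disp(row, col) is the displacement the process at (row, col) would pass to
// MPI_Cart_shift for dimension dim, assuming dest = coord + disp and
// source = coord - disp along that dimension.
// Returns true if every process is the source that its destination expects.
bool shiftsPairUp(int q, int dim, const std::function<int(int, int)>& disp) {
    for (int row = 0; row < q; ++row) {
        for (int col = 0; col < q; ++col) {
            const int d = disp(row, col);
            // Where this process sends its block.
            const int destRow = (dim == 0) ? wrapCoord(row + d, q) : row;
            const int destCol = (dim == 1) ? wrapCoord(col + d, q) : col;
            // Where the destination expects its incoming block to come from.
            const int dd = disp(destRow, destCol);
            const int srcRow = (dim == 0) ? wrapCoord(destRow - dd, q) : destRow;
            const int srcCol = (dim == 1) ? wrapCoord(destCol - dd, q) : destCol;
            if (srcRow != row || srcCol != col) return false;
        }
    }
    return true;
}
```

Under these assumptions, shiftsPairUp(3, 1, [](int, int col) { return -col; }), which mirrors a shift along dimension 1 by -coords[1], returns false, while a constant displacement of -1 returns true. A displacement taken from the other coordinate (dimension 1 shifted by -coords[0]) also pairs up, which is the kind of movement the pseudocode in the question describes.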

Why am I getting a seg fault with ILOG CP Optimizer

I am brand new to constraint programming and am trying to model my first problem as a constraint program using the ILOG CP Optimizer. I've installed ILOG Optimization Suite 20.1 and have successfully compiled and run many of the included example files.
What I'd like to do is schedule 5 tasks on 2 units. Each task has a release time and a due time. Moreover, on each unit, each task has a processing time and a processing cost.
The following is my code.
#include <ilcp/cp.h>
#include <iostream>
using namespace std;
int main(int argc, const char * argv[])
{
IloEnv env;
try
{
IloModel model(env);
IloInt numUnits = 2;
IloInt numTasks = 5;
IloIntervalVarArray dummyTasks(env, numTasks);
IloInt taskReleaseTimes[] = {0, 21, 15, 37, 3};
IloInt taskDueTimes[] = {200, 190, 172, 194, 161};
IloArray<IloBoolArray> unitTaskAssignment(env, numUnits);
unitTaskAssignment[0] = IloBoolArray(env, numTasks, IloTrue, IloTrue, IloFalse, IloTrue, IloTrue);
unitTaskAssignment[1] = IloBoolArray(env, numTasks, IloTrue, IloFalse, IloTrue, IloFalse, IloTrue);
IloArray<IloIntArray> unitTaskTimes(env, numUnits);
IloArray<IloNumArray> unitTaskCosts(env, numUnits);
IloIntArray minTaskTimes(env, numTasks);
IloIntArray maxTaskTimes(env, numTasks);
IloNumExpr totalCost(env);
unitTaskTimes[0] = IloIntArray(env, numTasks, 51, 67, 0, 24, 76);
unitTaskTimes[1] = IloIntArray(env, numTasks, 32, 0, 49, 0, 102);
unitTaskCosts[0] = IloNumArray(env, numTasks, 3.1, 3.7, 0.0, 3.4, 3.6);
unitTaskCosts[1] = IloNumArray(env, numTasks, 3.2, 0.0, 3.9, 0.0, 3.2);
IloArray<IloIntervalVarArray> taskUnits(env, numTasks);
IloArray<IloIntervalVarArray> unitTasks(env, numUnits);
for(IloInt i = 0; i < numTasks; i++)
{
IloInt minTaskTime = unitTaskTimes[0][i];
IloInt maxTaskTime = unitTaskTimes[0][i];
for(IloInt j = 1; j < numUnits; j++)
{
if(unitTaskTimes[j][i] < minTaskTime) minTaskTime = unitTaskTimes[j][i];
if(unitTaskTimes[j][i] > maxTaskTime) maxTaskTime = unitTaskTimes[j][i];
}
minTaskTimes[i] = minTaskTime;
maxTaskTimes[i] = maxTaskTime;
/* cout << "Minimum task time for task " << i << ": " << minTaskTimes[i] << endl;*/
/* cout << "Maximum task time for task " << i << ": " << maxTaskTimes[i] << endl;*/
}
char name[128];
for(IloInt i = 0; i < numTasks; i++)
{
taskUnits[i] = IloIntervalVarArray(env, numUnits);
sprintf(name, "dummyTask_%ld", i);
dummyTasks[i] = IloIntervalVar(env, minTaskTimes[i], maxTaskTimes[i]);
dummyTasks[i].setStartMin(taskReleaseTimes[i]);
dummyTasks[i].setEndMax(taskDueTimes[i]);
for(IloInt j = 0; j < numUnits; j++)
{
sprintf(name, "task%ld_in_unit%ld", j, i);
taskUnits[i][j] = IloIntervalVar(env, unitTaskTimes[j][i], name);
taskUnits[i][j].setOptional();
taskUnits[i][j].setStartMin(taskReleaseTimes[i]);
taskUnits[i][j].setEndMax(taskDueTimes[i]);
if(!unitTaskAssignment[j][i]) taskUnits[i][j].setAbsent();
totalCost += unitTaskCosts[j][i]*IloPresenceOf(env, taskUnits[i][j]);
}
model.add(IloAlternative(env, dummyTasks[i], taskUnits[i]));
}
for(IloInt j = 0; j < numUnits; j++)
{
unitTasks[j] = IloIntervalVarArray(env, numTasks);
for(IloInt i = 1; i < numTasks; i++)
{
unitTasks[j][i] = taskUnits[i][j];
}
model.add(IloNoOverlap(env, unitTasks[j]));
}
model.add(IloMinimize(env, totalCost));
IloCP cp(model);
cp.setParameter(IloCP::TimeLimit, 20);
if (cp.solve())
{
cout << "There's a solution." << endl;
}
else
{
cp.out() << "No solution found. " << std::endl;
}
}
catch (IloException & ex)
{
env.out() << "Caught " << ex << std::endl;
}
env.end();
return 0;
}
Perhaps there is a better logic that could be applied, and I'm happy to take suggestions on how to better model the problem, but that really isn't the question.
The question is, if I comment out the line
model.add(IloNoOverlap(env, unitTasks[j]));
everything compiles and runs fine, but if I leave it in, the program compiles, but seg faults on execution. Why?
Here's the valgrind output if it helps:
==361075== Memcheck, a memory error detector
==361075== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==361075== Using Valgrind-3.15.0 and LibVEX; rerun with -h for copyright info
==361075== Command: ./CP_test
==361075==
! --------------------------------------------------- CP Optimizer 20.1.0.0 --
! Minimization problem - 15 variables, 5 constraints
! TimeLimit = 20
! Initial process time : 1.00s (0.99s extraction + 0.02s propagation)
! . Log search space : 18.3 (before), 18.3 (after)
! . Memory usage : 335.4 kB (before), 335.4 kB (after)
! Using parallel search with 8 workers.
! ----------------------------------------------------------------------------
! Best Branches Non-fixed W Branch decision
0 15 -
+ New bound is 17.30000
^C==361075==
==361075== Process terminating with default action of signal 2 (SIGINT)
==361075== at 0x4882A9A: futex_wait (futex-internal.h:141)
==361075== by 0x4882A9A: futex_wait_simple (futex-internal.h:172)
==361075== by 0x4882A9A: pthread_barrier_wait (pthread_barrier_wait.c:184)
==361075== by 0xCEFA06: IlcParallel::SynchronizedMaster::workerSynchronize(IlcParallel::ThreadWorker*) (in /home/nate/Dropbox/Princeton/Maravelias/DCA_Branch_and_Cut/cplex_implementation/CP_optimizer_test_instance/CP_test)
==361075== by 0x66F224: IlcParallelEngineI::SynchronizedMaster::workerSynchronize(IlcParallel::ThreadWorker*) (in /home/nate/Dropbox/Princeton/Maravelias/DCA_Branch_and_Cut/cplex_implementation/CP_optimizer_test_instance/CP_test)
==361075== by 0xCF3B60: IlcParallel::SynchronizedMaster::ThreadIO::workerWaitInput() (in /home/nate/Dropbox/Princeton/Maravelias/DCA_Branch_and_Cut/cplex_implementation/CP_optimizer_test_instance/CP_test)
==361075== by 0xCF4182: IlcParallel::ThreadWorker::startup() (in /home/nate/Dropbox/Princeton/Maravelias/DCA_Branch_and_Cut/cplex_implementation/CP_optimizer_test_instance/CP_test)
==361075== by 0xCF5929: IlcThread::CallStartup(void*) (in /home/nate/Dropbox/Princeton/Maravelias/DCA_Branch_and_Cut/cplex_implementation/CP_optimizer_test_instance/CP_test)
==361075== by 0x487A608: start_thread (pthread_create.c:477)
==361075== by 0x4D0A292: clone (clone.S:95)
==361075==
==361075== HEAP SUMMARY:
==361075== in use at exit: 1,161,688 bytes in 353 blocks
==361075== total heap usage: 942 allocs, 589 frees, 2,260,411 bytes allocated
==361075==
==361075== LEAK SUMMARY:
==361075== definitely lost: 19,728 bytes in 1 blocks
==361075== indirectly lost: 9,752 bytes in 13 blocks
==361075== possibly lost: 2,304 bytes in 8 blocks
==361075== still reachable: 1,129,904 bytes in 331 blocks
==361075== of which reachable via heuristic:
==361075== stdstring : 35 bytes in 1 blocks
==361075== suppressed: 0 bytes in 0 blocks
==361075== Rerun with --leak-check=full to see details of leaked memory
==361075==
==361075== For lists of detected and suppressed errors, rerun with: -s
==361075== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
Any advice is appreciated. Please note that I attempted to post to the IBM forum, but I apparently don't have permission for some reason.
You have a segmentation fault because unitTasks[j][0] is a null handle: your array initialization loop starts at 1.
Also, you should try compiling in debug mode (without -DNDEBUG); you would then get an assertion failure when the null handle is used.
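The effect of starting the loop at 1 can be shown with an analogy in plain C++. This is not the Concert or CP Optimizer API: the struct Handle and the functions firstNullHandle() and fillFrom() are hypothetical stand-ins that only mimic a handle array whose slots stay null until something is assigned to them.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a handle class: a thin wrapper around a pointer
// that is null until an object is assigned to it.
struct Handle {
    const int* impl = nullptr;
    bool isNull() const { return impl == nullptr; }
};

// Returns the index of the first null handle, or -1 if every slot is set.
int firstNullHandle(const std::vector<Handle>& handles) {
    for (std::size_t i = 0; i < handles.size(); ++i) {
        if (handles[i].isNull()) return static_cast<int>(i);
    }
    return -1;
}

// Fills the slots from index `first` onwards, as the loop in the question
// does with first == 1; slots before `first` keep their null handle.
std::vector<Handle> fillFrom(std::size_t first, const std::vector<int>& tasks) {
    std::vector<Handle> handles(tasks.size());
    for (std::size_t i = first; i < tasks.size(); ++i) {
        handles[i].impl = &tasks[i];
    }
    return handles;
}
```

With five tasks, firstNullHandle(fillFrom(1, tasks)) reports slot 0, and starting the loop at 0 leaves no null slot. A check of this kind before handing the array to a constraint is one way to catch the problem early.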

QPrinter rounds pagesize if it is close to the standard page sizes (A3, A4 ...)

I want to save and restore QPrinter::pagesize for PageSize::Custom.
But when I read the size back after setting it, I get strangely rounded values:
QPrinter p;
for(int i=0; i<20000; ++i) {
QSizeF size( qreal(rand()%100000)/100, qreal(rand()%100000)/100 );
p.setPaperSize( size, QPrinter::Millimeter );
if( size != p.paperSize(QPrinter::Millimeter) )
qDebug() << size << "->" << p.paperSize(QPrinter::Millimeter);
}
QSizeF(216.48, 321.33) -> QSizeF(215.9, 322.3)
QSizeF(250.15, 352.36) -> QSizeF(250, 353)
QSizeF(178.75, 227.77) -> QSizeF(177.8, 228.6) // 178.75 - 177.8 = 0.95 !!!
QSizeF(321.24, 445.22) -> QSizeF(322, 445)
QSizeF(182.6, 258.4) -> QSizeF(182, 257) // 258.4 - 257 = 1.4 !!!
QSizeF(382.17, 279.77) -> QSizeF(381, 279.4)
QSizeF(111.1, 208.13) -> QSizeF(110, 208) // 111.1 - 110 = 1.1 !!!
QSizeF(32.32, 43.67) -> QSizeF(32, 45)
QSizeF(114.07, 163.04) -> QSizeF(114, 162)
QSizeF(228.5, 323.36) -> QSizeF(229, 324)
QSizeF(63.81, 92.1) -> QSizeF(64, 91)
11 values are badly rounded; the other ~20000 values are returned unchanged.
The size is rounded to a standard size (A0, A4, ...) if its width and height both differ from that standard size by less than ~1.2 mm.
How can I disable this? Example code showing the problem (the spin box gets stuck at 210):
QDoubleSpinBox sb;
sb.setRange(0.0, 300.0);
sb.setValue(210.0);
sb.show();
QObject::connect(&sb, qOverload<double>(&QDoubleSpinBox::valueChanged),[&sb](double value){
QPrinter pr;
pr.setPaperSize( QSizeF(value, 297.0), QPrinter::Millimeter );
sb.blockSignals(true);
sb.setValue( pr.paperSize(QPrinter::Millimeter).width() );
sb.blockSignals(false);
});
Qt5:
void setPaperSize(QPrinter::PaperSize newPaperSize) is obsolete.
This is a wild guess:
Qt uses predefined sizes, and PaperSize paperSize() returns the "nearest" one. See this link here; it is not well programmed, hence obsolete.

Arduino: convert boolean array to decimal

I have an issue with my Arduino. I am trying to convert a boolean array into an int with this piece of code:
int boolean_to_decimal(bool bol[]) {
int somme=0;
for (int i = 0; i < 6; i++){
somme += bol[i] * pow(2, 5-i);
}
return somme;
}
Nothing really impressive but here are my results:
010101 == 20 (instead of 21)
100101 == 36 (instead of 37)
101001 == 40 (instead of 41)
011001 == 23 (instead of 25)
etc
Thank you for your time, David
Using the floating-point function pow() for integer arithmetic is a bad idea because its result may carry small rounding errors, which are then truncated when the value is converted to int. Try bit-shifting instead.
int boolean_to_decimal(bool bol[]){
int somme=0;
for (int i = 0; i<6; i++){
somme += bol[i]*(1 << (5-i));
}
return somme;
}
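A variant of the same idea is to build the value one bit at a time: shift the running result left by one and OR in the next bit. This is a minimal sketch that assumes the most significant bit comes first, as in the question. The name bits_to_int and the count parameter are introduced here for illustration only.

```cpp
// Builds the integer one bit at a time, most significant bit first.
// No floating-point arithmetic is involved, so nothing can be truncated.
int bits_to_int(const bool bits[], int count) {
    int value = 0;
    for (int i = 0; i < count; i++) {
        value = (value << 1) | (bits[i] ? 1 : 0);
    }
    return value;
}
```

Called with count set to 6, the inputs 010101, 100101, 101001 and 011001 from the question give 21, 37, 41 and 25. Passing the length explicitly also removes the hard-coded 6 and 5 from the loop.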

OpenCL CL_OUT_OF_RESOURCES Error

I'm trying to convert code written in CUDA to OpenCL and have run into some trouble. My final goal is to implement the code on an Odroid XU3 board with a Mali T628 GPU.
To simplify the transition and save time debugging OpenCL kernels, I've taken the following steps:
Implement the code in CUDA and test it on an Nvidia GeForce 760
Implement the code in OpenCL and test it on an Nvidia GeForce 760
Test the OpenCL code on an Odroid XU3 board with a Mali T628 GPU.
I know that different architectures may need different optimizations, but that isn't my main concern for now. I managed to run the OpenCL code on my Nvidia GPU with no apparent issues, but I keep getting strange errors when trying to run the code on the Odroid board. I know that different architectures handle exceptions etc. differently, but I'm not sure how to solve those.
Since the OpenCL code works on my Nvidia GPU, I assume that I made the transition from threads/blocks to work-items/work-groups etc. correctly.
I already fixed several issues related to CL_DEVICE_MAX_WORK_GROUP_SIZE, so that can't be the cause.
When running the code I'm getting a "CL_OUT_OF_RESOURCES" error. I've narrowed the cause of the error down to 2 lines in the code, but I'm not sure how to fix them.
The error is caused by the following lines:
lowestDist[pixelNum] = partialDiffSumTemp; both variables are private variables of the kernel and therefore I don't see any potential issue.
d_disparityLeft[globalMemIdx + TILE_BOUNDARY_WIDTH - WINDOW_RADIUS + 0] = bestDisparity[0];
Here I guess the cause is an out-of-bounds access, but I'm not sure how to debug it since the original code doesn't have any issue.
My kernel code is:
#define ALIGN_IMAGE_WIDTH 64
#define NUM_PIXEL_PER_THREAD 4
#define MIN_DISPARITY 0
#define MAX_DISPARITY 55
#define WINDOW_SIZE 19
#define WINDOW_RADIUS (WINDOW_SIZE / 2)
#define TILE_SHARED_MEM_WIDTH 96
#define TILE_SHARED_MEM_HEIGHT 32
#define TILE_BOUNDARY_WIDTH 64
#define TILE_BOUNDARY_HEIGHT (2 * WINDOW_RADIUS)
#define BLOCK_WIDTH (TILE_SHARED_MEM_WIDTH - TILE_BOUNDARY_WIDTH)
#define BLOCK_HEIGHT (TILE_SHARED_MEM_HEIGHT - TILE_BOUNDARY_HEIGHT)
#define THREAD_NUM_WIDTH 8
#define THREADS_NUM_HEIGHT TILE_SHARED_MEM_HEIGHT
//TODO fix input arguments
__kernel void hello_kernel( __global unsigned char* d_leftImage,
__global unsigned char* d_rightImage,
__global float* d_disparityLeft) {
int blockX = get_group_id(0);
int blockY = get_group_id(1);
int threadX = get_local_id(0);
int threadY = get_local_id(1);
__local unsigned char leftImage [TILE_SHARED_MEM_WIDTH * TILE_SHARED_MEM_HEIGHT];
__local unsigned char rightImage [TILE_SHARED_MEM_WIDTH * TILE_SHARED_MEM_HEIGHT];
__local unsigned int partialDiffSum [BLOCK_WIDTH * TILE_SHARED_MEM_HEIGHT];
int alignedImageWidth = 640;
int partialDiffSumTemp;
float bestDisparity[4] = {0,0,0,0};
int lowestDist[4];
lowestDist[0] = 214748364;
lowestDist[1] = 214748364;
lowestDist[2] = 214748364;
lowestDist[3] = 214748364;
// Read image blocks into shared memory. read is done at 32bit integers on a uchar array. each thread reads 3 integers(12byte) 96/12=8threads
int sharedMemIdx = threadY * TILE_SHARED_MEM_WIDTH + 4 * threadX;
int globalMemIdx = (blockY * BLOCK_HEIGHT + threadY) * alignedImageWidth + blockX * BLOCK_WIDTH + 4 * threadX;
for (int i = 0; i < 4; i++) {
leftImage [sharedMemIdx + i ] = d_leftImage [globalMemIdx + i];
leftImage [sharedMemIdx + 4 * THREAD_NUM_WIDTH + i ] = d_leftImage [globalMemIdx + 4 * THREAD_NUM_WIDTH + i];
leftImage [sharedMemIdx + 8 * THREAD_NUM_WIDTH + i ] = d_leftImage [globalMemIdx + 8 * THREAD_NUM_WIDTH + i];
rightImage[sharedMemIdx + i ] = d_rightImage[globalMemIdx + i];
rightImage[sharedMemIdx + 4 * THREAD_NUM_WIDTH + i ] = d_rightImage[globalMemIdx + 4 * THREAD_NUM_WIDTH + i];
rightImage[sharedMemIdx + 8 * THREAD_NUM_WIDTH + i ] = d_rightImage[globalMemIdx + 8 * THREAD_NUM_WIDTH + i];
}
barrier(CLK_LOCAL_MEM_FENCE);
int imageIdx = sharedMemIdx + TILE_BOUNDARY_WIDTH - WINDOW_RADIUS;
int partialSumIdx = threadY * BLOCK_WIDTH + 4 * threadX;
for(int dispLevel = MIN_DISPARITY; dispLevel <= MAX_DISPARITY; dispLevel++) {
// horizontal partial sum
partialDiffSumTemp = 0;
#pragma unroll
for(int i = imageIdx - WINDOW_RADIUS; i <= imageIdx + WINDOW_RADIUS; i++) {
//partialDiffSumTemp += calcDiff(leftImage [i], rightImage[i - dispLevel]);
partialDiffSumTemp += abs(leftImage[i] - rightImage[i - dispLevel]);
}
partialDiffSum[partialSumIdx] = partialDiffSumTemp;
barrier(CLK_LOCAL_MEM_FENCE);
for (int pixelNum = 1, i = imageIdx - WINDOW_RADIUS; pixelNum < NUM_PIXEL_PER_THREAD; pixelNum++, i++) {
partialDiffSum[partialSumIdx + pixelNum] = partialDiffSum[partialSumIdx + pixelNum - 1] +
abs(leftImage[i + WINDOW_SIZE] - rightImage[i - dispLevel + WINDOW_SIZE]) -
abs(leftImage[i] - rightImage[i - dispLevel]);
}
barrier(CLK_LOCAL_MEM_FENCE);
// vertical sum
if(threadY >= WINDOW_RADIUS && threadY < TILE_SHARED_MEM_HEIGHT - WINDOW_RADIUS) {
for (int pixelNum = 0; pixelNum < NUM_PIXEL_PER_THREAD; pixelNum++) {
int rowIdx = partialSumIdx - WINDOW_RADIUS * BLOCK_WIDTH;
partialDiffSumTemp = 0;
for(int i = -WINDOW_RADIUS; i <= WINDOW_RADIUS; i++,rowIdx += BLOCK_WIDTH) {
partialDiffSumTemp += partialDiffSum[rowIdx + pixelNum];
}
if (partialDiffSumTemp < lowestDist[pixelNum]) {
lowestDist[pixelNum] = partialDiffSumTemp;
bestDisparity[pixelNum] = dispLevel - 1;
}
}
}
}
if (threadY >= WINDOW_RADIUS && threadY < TILE_SHARED_MEM_HEIGHT - WINDOW_RADIUS && blockY < 32) {
d_disparityLeft[globalMemIdx + TILE_BOUNDARY_WIDTH - WINDOW_RADIUS + 0] = bestDisparity[0];
d_disparityLeft[globalMemIdx + TILE_BOUNDARY_WIDTH - WINDOW_RADIUS + 1] = bestDisparity[1];
d_disparityLeft[globalMemIdx + TILE_BOUNDARY_WIDTH - WINDOW_RADIUS + 2] = bestDisparity[2];
d_disparityLeft[globalMemIdx + TILE_BOUNDARY_WIDTH - WINDOW_RADIUS + 3] = bestDisparity[3];
}
}
Thanks for all the help
Yuval
In my experience, Nvidia GPUs do not always crash on an out-of-bounds access, and often the kernel still returns the expected results.
Use printf to check the indexes. If you have the Nvidia OpenCL 1.2 driver installed, printf should be available as a core function. As far as I checked, the Mali-T628 uses OpenCL 1.1, so check whether printf is available there as a vendor extension. You can also run your kernel on an AMD/Intel CPU, where printf is available (OpenCL 1.2 / 2.0).
An alternative way of checking the indexes is to pass a __global int* debug array in which you store the indexes, and then check them on the host. Make sure to allocate it big enough that an out-of-bounds index will still be recorded.
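The index arithmetic can also be replayed on the host before touching the device. The sketch below copies the constants and the write guard from the kernel in the question and computes the largest index written to d_disparityLeft. The launch geometry is an assumption, since the question does not show it: groupsX x groupsY work-groups of THREAD_NUM_WIDTH x THREADS_NUM_HEIGHT work-items. The function name maxDisparityWriteIndex is hypothetical.

```cpp
#include <algorithm>

// Constants copied from the kernel in the question.
constexpr int WINDOW_SIZE = 19;
constexpr int WINDOW_RADIUS = WINDOW_SIZE / 2;
constexpr int TILE_SHARED_MEM_WIDTH = 96;
constexpr int TILE_SHARED_MEM_HEIGHT = 32;
constexpr int TILE_BOUNDARY_WIDTH = 64;
constexpr int TILE_BOUNDARY_HEIGHT = 2 * WINDOW_RADIUS;
constexpr int BLOCK_WIDTH = TILE_SHARED_MEM_WIDTH - TILE_BOUNDARY_WIDTH;
constexpr int BLOCK_HEIGHT = TILE_SHARED_MEM_HEIGHT - TILE_BOUNDARY_HEIGHT;
constexpr int THREAD_NUM_WIDTH = 8;
constexpr int THREADS_NUM_HEIGHT = TILE_SHARED_MEM_HEIGHT;
constexpr int ALIGNED_IMAGE_WIDTH = 640;  // alignedImageWidth in the kernel

// Largest index written to d_disparityLeft for the assumed launch geometry.
// Returns -1 if no work-item passes the kernel's write guard.
long maxDisparityWriteIndex(int groupsX, int groupsY) {
    long maxIdx = -1;
    for (int blockY = 0; blockY < groupsY; ++blockY) {
        for (int blockX = 0; blockX < groupsX; ++blockX) {
            for (int threadY = 0; threadY < THREADS_NUM_HEIGHT; ++threadY) {
                for (int threadX = 0; threadX < THREAD_NUM_WIDTH; ++threadX) {
                    // Same guard as the final if statement in the kernel.
                    const bool writes = threadY >= WINDOW_RADIUS &&
                        threadY < TILE_SHARED_MEM_HEIGHT - WINDOW_RADIUS &&
                        blockY < 32;
                    if (!writes) continue;
                    // Same expression as globalMemIdx in the kernel.
                    const long globalMemIdx =
                        static_cast<long>(blockY * BLOCK_HEIGHT + threadY) *
                            ALIGNED_IMAGE_WIDTH +
                        blockX * BLOCK_WIDTH + 4 * threadX;
                    // The last of the four writes uses offset + 3.
                    maxIdx = std::max(maxIdx, globalMemIdx +
                        TILE_BOUNDARY_WIDTH - WINDOW_RADIUS + 3);
                }
            }
        }
    }
    return maxIdx;
}
```

Comparing the result with the number of floats allocated for d_disparityLeft shows whether the last write stays inside the buffer. For example, under these assumptions maxDisparityWriteIndex(20, 1) is 14774.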
