OpenMP code in CUDA source file not compiling on Google Colab - jupyter-notebook

I am trying to run a simple Hello World program with OpenMP directives on Google Colab using OpenMP library and CUDA. I have followed this tutorial but I am getting an error even if I am trying to include %%cu in my code. This is my code-
%%cu
#include<stdio.h>
#include<stdlib.h>
#include<omp.h>
/* Main Program */
int main(int argc , char **argv)
{
int Threadid, Noofthreads;
printf("\n\t\t---------------------------------------------------------------------------");
printf("\n\t\t Objective : OpenMP program to print \"Hello World\" using OpenMP PARALLEL directives\n ");
printf("\n\t\t..........................................................................\n");
/* Set the number of threads */
/* omp_set_num_threads(4); */
/* OpenMP Parallel Construct : Fork a team of threads */
#pragma omp parallel private(Threadid)
{
/* Obtain the thread id */
Threadid = omp_get_thread_num();
printf("\n\t\t Hello World is being printed by the thread : %d\n", Threadid);
/* Master Thread Has Its Threadid 0 */
if (Threadid == 0) {
Noofthreads = omp_get_num_threads();
printf("\n\t\t Master thread printing total number of threads for this execution are : %d\n", Noofthreads);
}
}/* All thread join Master thread */
return 0;
}
And this is the error I am getting-
/tmp/tmpxft_00003eb7_00000000-10_15fcc2da-f354-487a-8206-ea228a09c770.o: In function `main':
tmpxft_00003eb7_00000000-5_15fcc2da-f354-487a-8206-ea228a09c770.cudafe1.cpp:(.text+0x54): undefined reference to `omp_get_thread_num'
tmpxft_00003eb7_00000000-5_15fcc2da-f354-487a-8206-ea228a09c770.cudafe1.cpp:(.text+0x78): undefined reference to `omp_get_num_threads'
collect2: error: ld returned 1 exit status
Without OpenMP directives, a simple Hello World program is running perfectly as can be seen below-
%%cu
#include <iostream>
int main()
{
std::cout << "Welcome To GeeksforGeeks\n";
return 0;
}
Output-
Welcome To GeeksforGeeks

There are two problems here:
nvcc doesn't enable or natively support OpenMP compilation. This has to be enabled by additional command line arguments passed through to the host compiler (gcc by default)
The standard Google Colab/Jupyter notebook plugin for nvcc doesn't allow passing of extra compilation arguments, meaning that even if you solve the first issue, it doesn't help in Colab or Jupyter.
You can solve the first problem as described here, and you can solve the second as described here and here.
Combining these in Colab got me this:
and then this:

Related

Can I use QCommandLineParser to determine GUI mode or CLI mode?

One of the programs that I work with has two modes that it can run in: GUI (Graphical User Interface) mode or CLI (Command-Line Interface) mode. We determine which mode to use via a command line argument (i.e., if "--cli" is passed, it will use CLI mode).
The type of QApplication that is instantiated depends on which mode is used: QApplication should be used for GUI mode, and QCoreApplication should be used for CLI mode, because the GUI parts of Qt should not be instantiated for CLI mode (since CLI mode does not use or need them).
I can do that via code similar to the following:
std::unique_ptr<QCoreApplication> app =
(cliMode) ? std::make_unique<QCoreApplication>(argc, argv)
: std::make_unique<QApplication>(argc, argv);
// Do some other stuff...
return app->exec();
Since I am already using Qt, it makes sense to use QCommandLineParser to parse my arguments. After parsing the arguments, I want to analyze them to determine whether we should run in GUI mode or CLI mode. However, it has been becoming increasingly difficult to do so.
The first problem I noticed was the following on Linux (this did not happen in older versions of Qt5, but it does happen in the newer versions):
$ ./myQtApplication --help
QCoreApplication::arguments: Please instantiate the QApplication object first
Segmentation fault (core dumped)
Okay: so I can no longer run the --help command without already having a QApplication object instantiated. I temporarily fixed this by manually parsing the arguments to see whether or not --help is an argument. If it is, go ahead and instantiated the QCoreApplication, parse the arguments, and then exit.
But then I started getting a cryptic error on Mac OS X. When I would run the executable on OS X directly, it would run without any issues. But if I tried to double-click on the .app file or type in the terminal $ open myQtApplication.app, I would get this cryptic error:
LSOpenURLsWithRole() failed with error -10810 for the file ./myQtApplication.app
Since it is a rather cryptic error, it took me a long time to figure out that this error was being caused by the QCommandLineParser being used before having a QApplication object instantiated.
To fix this, I am now doing the following:
Manually parse the arguments at the beginning of the main() function to determine whether or not --cli was passed.
Instantiate a QApplication object based on the results of #1.
Run QCommandLineParser to process the rest of the arguments.
This is not a very clean way to do this because I now have two argument parsers: one to determine if --cli was passed, and the rest for the other arguments.
Is there a much better, or "proper", way to do this?
I guess the main question is: can I use QCommandLineParser to determine whether to instantiate a QCoreApplication object or a QApplication object?
Of course you can use the parser - as long as QCoreApplication already present. If the --cli option is absent, you will switch to a QApplication. Recall that you have full control over the lifetime of the application object.
This works under Qt 4.8 and 5.11 on both Windows and OS X:
// https://github.com/KubaO/stackoverflown/tree/master/questions/app-cli-gui-switch-52649458
#include <QtGui>
#if QT_VERSION >= QT_VERSION_CHECK(5, 0, 0)
#include <QtWidgets>
#endif
struct Options {
bool cli;
};
static Options parseOptionsQt4() {
Options opts = {};
for (auto arg : QCoreApplication::arguments().mid(1)) {
if (arg == "--cli")
opts.cli = true;
else
qFatal("Unknown option %s", arg.toLocal8Bit().constData());
}
return opts;
}
static Options parseOptions() {
if (QT_VERSION < QT_VERSION_CHECK(5, 0, 0)) return parseOptionsQt4();
#if QT_VERSION >= QT_VERSION_CHECK(5, 0, 0)
Options opts = {};
QCommandLineParser parser;
QCommandLineOption cliOption("cli", "Start in command line mode.");
parser.addOption(cliOption);
parser.process(*qApp);
opts.cli = parser.isSet(cliOption);
return opts;
#endif
}
int main(int argc, char *argv[]) {
QScopedPointer<QCoreApplication> app(new QCoreApplication(argc, argv));
auto options = parseOptions();
if (options.cli) {
qDebug() << "cli";
} else {
qDebug() << "gui";
app.reset();
app.reset(new QApplication(argc, argv));
}
if (qobject_cast<QApplication *>(qApp))
QMessageBox::information(nullptr, "Hello", "Hello, World!");
QMetaObject::invokeMethod(qApp, "quit", Qt::QueuedConnection);
return app->exec();
}

breakpad not generate minidump on erase iterator twice

I find breakpad does not handle sigsegv sometimes.
and i wrote a simple example to reproduce it:
#include <vector>
#include <breakpad/client/linux/handler/exception_handler.h>
int InitBreakpad()
{
char core_file_folder[] = "/tmp/cores/";
google_breakpad::MinidumpDescriptor descriptor(core_file_folder);
auto exception_handler_ =
new google_breakpad::ExceptionHandler(descriptor,
nullptr,
nullptr,
nullptr,
true,
-1);
}
int main()
{
InitBreakpad();
// int* ptr = nullptr;
// *ptr = 1;
std::vector<int> sum;
sum.push_back(1);
auto it = sum.begin();
sum.erase(it);
sum.erase(it);
return 0;
}
and gcc is 4.8.5 and my comiple cmd is
g++ test_breakpad.cpp -I./include -I./include/breakpad -L./lib -lbreakpad -lbreakpad_client -std=c++11 -lpthread
run a.out, get "Segmentation fault" but no minidump is generated.
if i uncomment nullptr write, breakpad works!
what should i do to correct it?
GDB debug output:
(gdb) b google_breakpad::ExceptionHandler::~ExceptionHandler()
Breakpoint 2 at 0x402ed0: file src/client/linux/handler/exception_handler.cc, line 264.
(gdb) c
The program is not being run.
(gdb) r
Starting program: /home/zen/tmp/a.out
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Breakpoint 1, google_breakpad::ExceptionHandler::ExceptionHandler (this=0x619040, descriptor=..., filter=0x0, callback=0x0, callback_context=0x0, install_handler=true, server_fd=-1) at src/client/linux/handler/exception_handler.cc:224
224 ExceptionHandler::ExceptionHandler(const MinidumpDescriptor& descriptor,
Missing separate debuginfos, use: debuginfo-install glibc-2.17-157.el7_3.1.x86_64 libgcc-4.8.5-11.el7.x86_64 libstdc++-4.8.5-11.el7.x86_64
(gdb) c
Continuing.
Program received signal SIGSEGV, Segmentation fault.
0x00007ffff712f19d in __memmove_ssse3_back () from /lib64/libc.so.6
(gdb) c
Continuing.
Program received signal SIGSEGV, Segmentation fault.
0x00007ffff712f19d in __memmove_ssse3_back () from /lib64/libc.so.6
(gdb) c
Continuing.
Program terminated with signal SIGSEGV, Segmentation fault.
The program no longer exists.
and i tried breakpad out of process dump, but still got nothing(nullptr write works).
After some debugging I think that the reason that the sum.erase(it) does not create a minidump in your example is due to stack corruption.
While debugging you can see that the variable g_handler_stack_ in src/client/linux/handler/exception_handler.cc is correctly initialized and the google_breakpad::ExceptionHandler instance is correctly added to the vector. However when google_breakpad::ExceptionHandler::SignalHandler is called the vector is reported empty despite no calls to google_breakpad::ExceptionHandler::~ExceptionHandler or any of the std::vector methods that would change the vector.
Some further data points that point to stack corruption is that the code works with clang++. Additionally, as soon as we change the std::vector<int> sum; to a std::vector<int>* sum, which will ensure that we don't corrupt the stack, the minidump is written to disk.

a simple MPI can't run after compile

i write a simple MPI program:
#include <stdio.h>
2 #include "mpi.h"
3
4 int main(int argc,char* argv[])
5 {
6 int rank;
7 int size;
8
9 MPI_Init(0,0);
10 MPI_Comm_rank(MPI_COMM_WORLD,&rank);
11 MPI_Comm_size(MPI_COMM_WORLD,&size);
12 printf("Hello World from process %d of %d\n",rank,size);
13 MPI_Finalize();
14 return 0;
15 }
the program compile successflly,but can't run
i use "mpirun -np 4 ./hello" or "mpirun -np 4 hello"
it shows like this:
_create_ep, create command failed: Operation not permitted
GLEX_ERR(ln0): _init_glex(608), _create_ep: system error
_create_ep, create command failed: Operation not permitted
GLEX_ERR(ln0): _init_glex(608), _create_ep: system error
_create_ep, create command failed: Operation not permitted
GLEX_ERR(ln0): _init_glex(608), _create_ep: system error
Fatal error in MPI_Init: Other MPI error, error stack:
MPIR_Init_thread(498)........:
MPID_Init(187)...............: channel initialization failed
MPIDI_CH3_Init(89)...........:
MPID_nem_init(320)...........:
MPID_nem_glex_init(74).......:
MPIDI_nem_glex_init_glex(610): Cannot create GLEX endpoint.
besides,i wirite this program on HPC.And I guess the problem "Cannot create GLEX endpoint" maybe related to the HPC(HPC has already deployed MPI).
I'm not too sure of the level of support for MPI_Init() when null pointers are passed as arguments (I think there is something like calling it without arguments supported since MPI 3.0, but I wouldn't commit on that).
However, I would definitely replace MPI_Init(0,0) in your code by MPI_Init( &argc, &argv ) for a starter.
EDIT: my bad, MPI_Init() is supposed to support null pointer as argument as stated here.
However, trying with MPI_Init( &argc, &argv ) would still be my first try for fixing the issue.

MPI: Why does my MPICH program fails for large no. of processes?

/* C Example */
#include <mpi.h>
#include <stdio.h>
#include <stddef.h>
#include <stdlib.h>
int main (int argc, char* argv[])
{
int rank, size;
int buffer_length = MPI_MAX_PROCESSOR_NAME;
char hostname[buffer_length];
MPI_Init (&argc, &argv); /* starts MPI */
MPI_Comm_rank (MPI_COMM_WORLD, &rank); /* get current process id */
MPI_Comm_size (MPI_COMM_WORLD, &size); /* get number of processes */
MPI_Get_processor_name(hostname, &buffer_length); /* get hostname */
printf( "Hello world from process %d running on %s of %d\n", rank, hostname, size );
MPI_Finalize();
return 0;
}
The above program compiles and run successfully on ubuntu 12.04 for smaller no. of processes. But it fails when I try to execute with 1000s of processes. Why it is so?
I am expecting that scheduler would keep the threads in queue and can dispatch one by one (I am running this code on a single core machine)
Why the following error is coming for large no. of processes and how to resolve this issue?
root#ubuntu:/home# mpiexec -n 1000 ./hello
[proxy:0:0#ubuntu] HYDU_create_process (./utils/launch/launch.c:26): pipe error (Too many open files)
[proxy:0:0#ubuntu] launch_procs (./pm/pmiserv/pmip_cb.c:751): create process returned error
[proxy:0:0#ubuntu] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:935): launch_procs returned error
[proxy:0:0#ubuntu] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
[proxy:0:0#ubuntu] main (./pm/pmiserv/pmip.c:226): demux engine error waiting for event
Killed
You are running into the open file limit on your system. Default in Ubuntu is 1024. You can try raising the limit in your session with the ulimit command.
ulimit -n 2048

QProcess::readAllStandardOutput gives flaky outputs

I am trying to create a GUI in Qt4 for my tcl based tool. In order to populate widgets I need to execute some tcl commands. I read about QProcess and I am invoking tcl scripts using QProcess and then grabbing their output from stdout.
Suppose I execute 3 commands in tcl then when I query stdout I believe I should see 3 outputs corresponding to each of the three commands, however this is not happening consistently. Behavior is flaky.
As you can see in the main.cpp I am executing multiple commands using runTclCommand() function and in the end executing getData() function to read stdout.
main.cpp:
#include <QApplication>
#include <QProcess>
#include <QDebug>
#include "Tclsh.h"
int main(int argc, char *argv[])
{
QApplication a(argc, argv);
QByteArray out;
Tclsh *tcl = new Tclsh;
tcl->startTclsh();
tcl->runTclCommand("set ::tcl_interactive 1\n");
tcl->runTclCommand("set a 23\n");
tcl->runTclCommand("puts $a\n");
tcl->runTclCommand("set a 40\n");
tcl->runTclCommand("puts $a\n");
// out = idl->getData();
out = tcl->getData();
}
Tclsh.cpp:
#include <QProcessEnvironment>
#include <QProcess>
#include <QDebug>
#include "Tclsh.h"
void Tclsh::startTclsh() {
QString program = "/usr/bin/tclsh8.4";
this->setProcessChannelMode(QProcess::MergedChannels);
this->start(program);
if ( !this->waitForStarted()) {
qDebug()<<"ERROR Starting tclsh";
}
return;
}
void Tclsh::runTclCommand(const char *cmd) {
qDebug()<<"CMD:"<<cmd;
this->write(cmd);
if (!this->waitForBytesWritten()) {
qDebug()<<"Error in writing data";
}
}
QByteArray Tclsh::getData() {
if (!this->waitForReadyRead()) {
qDebug()<<"Error in reading stdout: Ready read signal is not emitted";
}
QByteArray data = this->readAllStandardOutput();
qDebug()<<"DATA:"<<data;
return data;
}
However, sometime I get the following output:
CMD: set ::tcl_interactive 1
CMD: set a 23
CMD: puts $a
CMD: set a 40
CMD: puts $a
DATA: "1
% 23
% 23
% "
And sometimes this:
CMD: set ::tcl_interactive 1
CMD: set a 23
CMD: puts $a
CMD: set a 40
CMD: puts $a
DATA: "1
"
I do not understand why this is happening. I would really appreciate if someone can point me to the error in my approach here.
Thanks,
Newbie
Edit: After some more research, here are my thoughts
According to Qt manual, readyRead signal will be emitted whenever new data is available (as specified by #Frank Osterfeld also, thanks!). It will not wait for complete output data to be available (which is justified since it does not know when will that happen). Hence my approach is not good. What I can do is something like this:
start the process -> wait for process to finish -> read stdout
This will ensure that flaky behavior does not arise as process is already finished when I am reading hence no new data can come.
However, in this proposed approach I am not clear about one thing: Does stdout is specific to a process? I mean can it happen that process which was supposed to read stdout output from process1, can get other stdout data from some other process which happen to write stdout at the same time as process1?
Thanks,
Newbie
I am closing this question. Reading from a channel more than once does not seem to a be a good idea. Instead what I do now is write what I want to write in one go --> close the channel for writing --> then read it back. In that way I get consistent output.

Resources