I get the following problem if I try to run my OpenCL kernel on a server with an NVIDIA GPU.
On my Mac there is no problem.
This line of code seems to be the problem:
float largest_0 = max(float (sin_i_angle), float (cos_i_angle));
Here is the error message.
File "threed_dp.py", line 918, in gpu_calculate_segment_costs_orig
bld = prg.build()
File "/work/mrdrygal/.local/lib/python3.6/site-packages/pyopencl/__init__.py", line 510, in build
options_bytes=options_bytes, source=self._source)
File "/work/mrdrygal/.local/lib/python3.6/site-packages/pyopencl/__init__.py", line 554, in _build_and_catch_errors
raise err
pyopencl._cl.RuntimeError: clBuildProgram failed: BUILD_PROGRAM_FAILURE - clBuildProgram failed: BUILD_PROGRAM_FAILURE - clBuildProgram failed: BUILD_PROGRAM_FAILURE
Build on <pyopencl.Device 'Tesla P100-PCIE-16GB' on 'NVIDIA CUDA' at 0x3767e50>:
<kernel>:82:33: error: expected expression
float largest_0 = max(float (sin_i_angle), float (cos_i_angle));
float (sin_i_angle)
is not a valid expression in C. It is valid in C++ (a function-style cast, which reads like invoking a constructor on float), so perhaps that is why Apple's OpenCL compiler allows it. OpenCL C is based on C99, so use a C-style cast and change the line to:
float largest_0 = max((float)sin_i_angle, (float)cos_i_angle);
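Since the build only fails on the stricter compiler, it can help to scan the kernel source for this pattern before calling prg.build(). The following is a minimal sketch, assuming the kernel source is available as a Python string; find_function_style_casts is a hypothetical helper, and the regular expression is a rough heuristic that may miss cases or flag unusual declarations.

```python
import re

# A scalar type name followed directly by "(" looks like a C++
# function-style cast, e.g. `float (x)`. This is a heuristic only.
_FUNC_CAST = re.compile(
    r"\b(char|uchar|short|ushort|int|uint|long|ulong|half|float|double)\s*\("
)


def find_function_style_casts(source):
    """Return (line_number, stripped_line) pairs that contain a suspect cast."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        if _FUNC_CAST.search(line):
            hits.append((lineno, line.strip()))
    return hits


if __name__ == "__main__":
    kernel_source = (
        "float a = 1;\n"
        "float largest_0 = max(float (sin_i_angle), float (cos_i_angle));\n"
    )
    for lineno, text in find_function_style_casts(kernel_source):
        print("line %d: %s" % (lineno, text))
```

C-style casts such as (float)x are not flagged, because the type name is followed by a closing parenthesis rather than an opening one.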
I recently came across AMIEGO. When I try to run the example problems (provided in the example directory) I get the following error.
-------------------------------------------------------------------------------
Exit Flag: True
Elapsed Time: 0.04829263687133789
======================ContinuousOptimization-End=======================================
+------------------------------------------------------------------------------+
| pyOptSparse Error: Received an unknown option: 'Major optimality tolerance' |
+------------------------------------------------------------------------------+
Traceback (most recent call last):
File "/home/sky/anaconda3/lib/python3.8/site-packages/openmdao/utils/concurrent.py", line 65, in concurrent_eval_lb
retval = func(*args)
File "/home/sky/anaconda3/lib/python3.8/site-packages/amiego/kriging.py", line 239, in _calculate_thetas
opt_x, opt_f, success, msg = snopt_opt(_calcll, x0, low, high, title='kriging',
File "/home/sky/anaconda3/lib/python3.8/site-packages/amiego/optimize_function.py", line 76, in snopt_opt
opt.setOption(name, value)
File "/home/sky/anaconda3/lib/python3.8/site-packages/pyoptsparse/pyOpt_optimizer.py", line 829, in setOption
raise Error("Received an unknown option: %s" % repr(name))
pyoptsparse.pyOpt_error.Error
I tested the pyoptsparse optimization driver with the Sellar problem and it worked as expected, so I think I'm missing something in AMIEGO. FYI, I didn't modify anything in the example, so I am running it with SLSQP (from the pyoptsparse driver) for the continuous part (I don't have SNOPT). Any pointers on how to fix this or where to start looking would be helpful.
I've pushed up a couple of fixes to the repository so that you can run it without SNOPT. The basic Branin problem in the examples works and gets to the expected answer now. I can't promise that SLSQP is the best choice for more complicated problems as we usually favor SNOPT over SLSQP in our work. This is still very experimental code, so the documentation is weak and there are still a lot of control knobs and flags that are buried as subcomponent attributes (including ideas that we tried that didn't pan out). But we appreciate users who are willing to try AMIEGO and help us improve it.
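The error in the question comes from passing an SNOPT-style option name ('Major optimality tolerance') to an optimizer that does not define it. One way calling code can guard against this is to filter the options before calling setOption. The sketch below is not the fix that was pushed to the repository: FakeOptimizer and set_supported_options are hypothetical names used only for illustration, and the option names given to the stand-in are assumptions.

```python
class FakeOptimizer:
    """Hypothetical stand-in that rejects unknown options, like the traceback."""

    def __init__(self, known_options):
        self.known = set(known_options)
        self.options = {}

    def setOption(self, name, value):
        if name not in self.known:
            raise KeyError("Received an unknown option: %r" % name)
        self.options[name] = value


def set_supported_options(opt, options, supported):
    """Set only the options named in `supported`; return the names skipped."""
    skipped = []
    for name, value in options.items():
        if name in supported:
            opt.setOption(name, value)
        else:
            skipped.append(name)
    return skipped


if __name__ == "__main__":
    # "ACC" and "MAXIT" are assumed option names for the stand-in optimizer.
    opt = FakeOptimizer(["ACC", "MAXIT"])
    requested = {"Major optimality tolerance": 1e-6, "ACC": 1e-8}
    print("skipped:", set_supported_options(opt, requested, opt.known))
```

Returning the skipped names makes it visible when a tolerance was silently dropped, which matters because the two optimizers do not necessarily interpret their tolerances the same way.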
The OpenMDAO problem that I'm running is quite complicated so I don't think it would be helpful to post the entire script. However, the basic setup is that my problem root is a ParallelFDGroup (not actually finite differencing for now--just running the problem once) that contains a few normal components as well as a parallel group. The parallel group is responsible for running 56 instances of an external code (one component per instance of the code). Strangely, when I run the problem with 4-8 processors, everything seems to work fine (sometimes even works with 10-12 processors). But when I try to use more processors (20+), I fairly consistently get the errors below. It provides two tracebacks:
Traceback (most recent call last):
File "opt_5mw.py", line 216, in <module>
top.setup() #call setup
File "/home/austinherrema/.local/lib/python2.7/site-packages/openmdao/core/problem.py", line 644, in setup
self.root._setup_vectors(param_owners, impl=self._impl, alloc_derivs=alloc_derivs)
File "/home/austinherrema/.local/lib/python2.7/site-packages/openmdao/core/group.py", line 476, in _setup_vectors
self._u_size_lists = self.unknowns._get_flattened_sizes()
File "/home/austinherrema/.local/lib/python2.7/site-packages/openmdao/core/petsc_impl.py", line 204, in _get_flattened_sizes
return self.comm.allgather(sizes)
File "MPI/Comm.pyx", line 1291, in mpi4py.MPI.Comm.allgather (src/mpi4py.MPI.c:109194)
File "MPI/msgpickle.pxi", line 746, in mpi4py.MPI.PyMPI_allgather (src/mpi4py.MPI.c:48575)
mpi4py.MPI.Exception: MPI_ERR_IN_STATUS: error code in status
Traceback (most recent call last):
File "opt_5mw.py", line 216, in <module>
top.setup() #call setup
File "/home/austinherrema/.local/lib/python2.7/site-packages/openmdao/core/problem.py", line 644, in setup
self.root._setup_vectors(param_owners, impl=self._impl, alloc_derivs=alloc_derivs)
File "/home/austinherrema/.local/lib/python2.7/site-packages/openmdao/core/group.py", line 476, in _setup_vectors
self._u_size_lists = self.unknowns._get_flattened_sizes()
File "/home/austinherrema/.local/lib/python2.7/site-packages/openmdao/core/petsc_impl.py", line 204, in _get_flattened_sizes
return self.comm.allgather(sizes)
File "MPI/Comm.pyx", line 1291, in mpi4py.MPI.Comm.allgather (src/mpi4py.MPI.c:109194)
File "MPI/msgpickle.pxi", line 749, in mpi4py.MPI.PyMPI_allgather (src/mpi4py.MPI.c:48609)
File "MPI/msgpickle.pxi", line 191, in mpi4py.MPI.Pickle.loadv (src/mpi4py.MPI.c:41957)
File "MPI/msgpickle.pxi", line 143, in mpi4py.MPI.Pickle.load (src/mpi4py.MPI.c:41248)
cPickle.BadPickleGet: 65
I am running under Ubuntu with OpenMDAO 1.7.3. I have tried running with both mpirun.openmpi (OpenRTE) 1.4.3 and mpirun (Open MPI) 1.4.3 and have gotten the same result in each case.
I found this post that seems to suggest that there is something wrong with the MPI installation. But if this were the case, it strikes me as strange that the problem would work for a small number of processors but not with a larger number. I also can run a relatively simple OpenMDAO problem (no external codes) with 32 processors without incident.
Because the traceback references OpenMDAO unknowns, I wondered if there are limitations on the size of OpenMDAO unknowns. In my case, each external code component has a few dozen array outputs that can be up to 50,000-60,000 elements each. Might that be problematic? Each external code component also reads the same set of input files. Could that be an issue as well? I have tried to ensure that read and write access is defined properly but perhaps that's not enough.
Any suggestions about what might be the culprit in this situation are appreciated.
EDIT: I should add that I have tried running the problem without actually running the external codes (i.e. the components in the parallel group are called and set up but the external subprocesses are never actually created) and the problem persists.
EDIT2: I have done some more debugging on this issue and thought I should share the little that I have discovered. If I strip the problem down to only the parallel group containing the external code instances, the problem persists. However, if I reduce the components in the parallel group to basically nothing--just a print function for setup and for solve_nonlinear--then the problem can successfully "run" with a large number of processors. I started adding setup lines back in one by one to see what would create problems. I ran into issues when trying to add many large unknowns to the components. I can actually still add just a single large unknown--for example, this works:
self.add_output('BigOutput', shape=[100000])
But when I try to add too many large outputs like below, I get errors:
for i in range(100):
outputname = 'BigOutput{0}'.format(i)
self.add_output(outputname, shape=[100000])
Sometimes I just get a general segmentation violation error from PETSc. Other times I get a fairly lengthy traceback that is too long to post here. I'll post just the beginning in case it provides any helpful clues:
*** glibc detected *** python2.7: free(): invalid pointer: 0x00007f21204f5010 ***
======= Backtrace: =========
/lib/x86_64-linux-gnu/libc.so.6(+0x7da26)[0x7f2285f0ca26]
/home/austinherrema/miniconda2/lib/python2.7/lib-dynload/../../libsqlite3.so.0(sqlite3_free+0x4f)[0x7f2269b7754f]
/home/austinherrema/miniconda2/lib/python2.7/lib-dynload/../../libsqlite3.so.0(+0x1cbbc)[0x7f2269b87bbc]
/home/austinherrema/miniconda2/lib/python2.7/lib-dynload/../../libsqlite3.so.0(+0x54d6c)[0x7f2269bbfd6c]
/home/austinherrema/miniconda2/lib/python2.7/lib-dynload/../../libsqlite3.so.0(+0x9d31f)[0x7f2269c0831f]
/home/austinherrema/miniconda2/lib/python2.7/lib-dynload/../../libsqlite3.so.0(sqlite3_step+0x1bf)[0x7f2269be261f]
/home/austinherrema/miniconda2/lib/python2.7/lib-dynload/_sqlite3.so(pysqlite_step+0x2d)[0x7f2269e4306d]
/home/austinherrema/miniconda2/lib/python2.7/lib-dynload/_sqlite3.so(_pysqlite_query_execute+0x661)[0x7f2269e404b1]
/home/austinherrema/miniconda2/bin/../lib/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x8942)[0x7f2286c6a5a2]
/home/austinherrema/miniconda2/bin/../lib/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x86c3)[0x7f2286c6a323]
/home/austinherrema/miniconda2/bin/../lib/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x86c3)[0x7f2286c6a323]
/home/austinherrema/miniconda2/bin/../lib/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x89e)[0x7f2286c6b1ce]
/home/austinherrema/miniconda2/bin/../lib/libpython2.7.so.1.0(+0x797e1)[0x7f2286be67e1]
/home/austinherrema/miniconda2/bin/../lib/libpython2.7.so.1.0(PyObject_Call+0x53)[0x7f2286bb6dc3]
/home/austinherrema/miniconda2/bin/../lib/libpython2.7.so.1.0(+0x5c54f)[0x7f2286bc954f]
/home/austinherrema/miniconda2/bin/../lib/libpython2.7.so.1.0(PyObject_Call+0x53)[0x7f2286bb6dc3]
/home/austinherrema/miniconda2/bin/../lib/libpython2.7.so.1.0(PyEval_CallObjectWithKeywords+0x43)[0x7f2286c60d63]
/home/austinherrema/miniconda2/bin/../lib/libpython2.7.so.1.0(+0x136652)[0x7f2286ca3652]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x7e9a)[0x7f2286957e9a]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7f2285f8236d]
======= Memory map: ========
00400000-00401000 r-xp 00000000 08:03 9706352 /home/austinherrema/miniconda2/bin/python2.7
00600000-00601000 rw-p 00000000 08:03 9706352 /home/austinherrema/miniconda2/bin/python2.7
00aca000-113891000 rw-p 00000000 00:00 0 [heap]
7f21107d6000-7f2241957000 rw-p 00000000 00:00 0
etc...
It's hard to guess what's going on here, but if it works for a small number of processors and not for a larger number, one guess is that the issue shows up when you use more than one node and data has to be transferred across the network. I have seen bad MPI builds that behaved this way: things would work if I kept the job on one node, but broke on more than one.
The traceback shows that you're not even getting through setup, so it's not likely to be anything in your external code or in any other component's run method.
If you're running on a cluster, are you compiling your own MPI? You usually need to compile with very specific options and libraries for any kind of HPC library, but most HPC systems provide modules you can load that have MPI pre-compiled.
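As a rough cross-check of the sizes mentioned in the question's second edit, the arithmetic below puts a number on the data volume of the 100-output test case. It is only a sketch: it assumes 8-byte floating-point elements, and the 56-component figure applies only if every component declared the same 100 outputs. It does not establish memory as the cause, and the framework may allocate more than one vector of this size.

```python
BYTES_PER_FLOAT64 = 8  # assumption: unknowns are stored as 64-bit floats


def unknowns_bytes(n_components, outputs_per_component, elements_per_output,
                   bytes_per_element=BYTES_PER_FLOAT64):
    """Bytes needed to hold one copy of all declared outputs."""
    return (n_components * outputs_per_component
            * elements_per_output * bytes_per_element)


if __name__ == "__main__":
    one_component = unknowns_bytes(1, 100, 100000)
    all_components = unknowns_bytes(56, 100, 100000)
    print("one component : %.2f GB" % (one_component / 1e9))
    print("56 components : %.2f GB" % (all_components / 1e9))
```

Comparing figures like these against the memory available per node is a cheap first check before digging into the MPI installation itself.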
I am running developmental scientific code. I am stuck on a cryptic error message, and am curious what the OpenMDAO team thinks. When I run the code in serial, it works with no issues. When I run it under mpirun, OpenMDAO throws a cryptic error message:
Traceback (most recent call last):
File "test/exampleOptimizationAEP.py", line 129, in <module>
prob['ratedPower'] = ratedPower
.....
File "/scratch/jquick/test/lib/python2.7/site-packages/openmdao-1.7.3-py2.7.egg/openmdao/core/vec_wrapper.py", line 1316, in __setitem__
(self.name, name))
AttributeError: 'params' has not been initialized, setup() must be called before 'ratedPower' can be accessed
I am not sure how to approach this. There is nothing obviously different about the ratedPower variable in the code. What information does this error give me about what is going wrong?
This is a bug in OpenMDAO <= v1.7.2. Look at the output of check_setup and see the list of parameters without associated unknowns. You will find that variable in there. When running in parallel (because of the bug), you cannot set any hanging params (ones without associated unknowns) in your setup script.
The way to fix it is to add an IndepVarComp for any variable whose value you need to initialize.
I'm implementing some numerical algorithms on the GPU via OpenGL and Qt,
but I am not very familiar with it.
I want to extract some functions from my current shader into a "shader library" and use them in my other shaders via string interpolation. It is not hard to implement, but I don't know how to handle the shader's compile errors.
I use the following code to compile the shader:
QOpenGLShaderProgram *shaderProgram = new QOpenGLShaderProgram();
if (!shaderProgram->addShaderFromSourceFile(QOpenGLShader::Fragment,fragmentShaderFileName)) {
qDebug() << "Failed to compile fragment shader";
//..........
When a compile error occurs, Qt prints the following message (an example):
QOpenGLShader::compile(Fragment): 0:331(9): error: syntax error, unexpected NEW_IDENTIFIER, expecting ',' or ';'
*** Problematic Fragment shader source code ***
//my shader source code
Is it possible to catch the error line number and use it to build my own error message (with the offending line highlighted)?
According to the Qt documentation, you can use QOpenGLShaderProgram::log():
Returns the errors and warnings that occurred during the last link()
or addShader() with explicitly specified source code.
You can then parse the resulting string to build your own error message.
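The parsing step can be sketched as follows. This is a minimal illustration in Python rather than Qt/C++, and it assumes the log entries have the shape shown in the question ("0:331(9): error: ..."); that format is driver-specific, so the pattern would need adjusting for other drivers. parse_shader_log and highlight are hypothetical helper names.

```python
import re

# Matches entries shaped like "0:331(9): error: message".
# Groups: source index, line, column, severity, message.
_LOG_ENTRY = re.compile(r"(\d+):(\d+)\((\d+)\):\s*(error|warning):\s*(.*)")


def parse_shader_log(log):
    """Extract line, column, severity and message from each log entry."""
    entries = []
    for match in _LOG_ENTRY.finditer(log):
        entries.append({
            "line": int(match.group(2)),
            "column": int(match.group(3)),
            "severity": match.group(4),
            "message": match.group(5).strip(),
        })
    return entries


def highlight(source, line_no, context=1):
    """Return the lines around line_no, marking the offending one with '>>'."""
    lines = source.splitlines()
    first = max(1, line_no - context)
    last = min(len(lines), line_no + context)
    out = []
    for n in range(first, last + 1):
        marker = ">>" if n == line_no else "  "
        out.append("%s %4d | %s" % (marker, n, lines[n - 1]))
    return "\n".join(out)
```

In the Qt program, the same two steps would be applied to the string returned by log() and to the shader source after the library functions have been interpolated, so that the reported line numbers refer to the text that was actually compiled.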
I have a Mali GPU which does not support local memory at all.
Every time I run code that uses local memory, I get errors from the device.
So, I want to convert my code to a version that only uses global memory.
I was wondering whether it is possible to run a prefix sum/parallel reduction algorithm on the GPU using global memory only.
EDITED: I was debugging the error and found something strange: one particular line is causing the error.
I have lines like this:
`#define LOG_LSIZE 8`
`#define LSIZE_SHIFT_VALUE 4`
`#define LOG_NUM_BANKS 2`
`#define GET_CONFLICT_OFFSET(lid) ((lid) >> LOG_NUM_BANKS)`
`#define LSIZE 32`
`__local int lm_sum[2][LSIZE + LOG_LSIZE]`
`**lm_sum[lid >> LSIZE_SHIFT_VALUE][bi] += lm_sum[lid >> LSIZE_SHIFT_VALUE][ai]**`
lid is the local ID and I used a work-group size of 32. I found that the highlighted line is the cause of the error. I tried using fixed values and found that I cannot use lm_sum on the right side of a statement. If I do, I get an error. For example, this line also gives me an error:
int temp= lm_sum[0][0]
Any idea on what is going on?
Error:
`In initial.cpp***[14100.684249] Mali<ERROR, BASE_MMU>: In file: /home/jbmaster/work/01.LPD_OpenCL_RFS/01.arm_work_3_0_31/SEC_All_EVT0_TX013-BU-00001-r2p0-00rel0/TX013-BU-00001-r2p0-00rel0/driver/product/kernel/drivers/gpu/arm/t6xx/kbase/src/common/mali_kbase_mmu.c line: 1240 function:kbase_mmu_report_fault_and_kill
[14100.709724] Unhandled Page fault in AS0 at VA 0x00000002000EC1A0
[14100.709728] raw fault status 0x500003C3
[14100.709730] decoded fault status: SLAVE FAULT
[14100.709733] exception type 0xC3: TRANSLATION_FAULT
[14100.709736] access type 0x3: WRITE
[14100.709738] source id 0x5000
[14100.734958]
[14100.736432] Mali<ERROR, BASE_JD>: In file: /home/jbmaster/work/01.LPD_OpenCL_RFS/01.arm_work_3_0_31/SEC_All_EVT0_TX013-BU-00001-r2p0-00rel0/TX013-BU-00001-r2p0-00rel0/driver/product/kernel/drivers/gpu/arm/t6xx/kbase/src/common/mali_kbase_jm.c line: 899 function:kbase_job_slot_hardstop
[14100.761458] Issueing GPU soft-reset instead of hard stopping job due to a hardware issue
[14100.769517] `
Since lm_sum[0][0] doesn't work, the memory for the array is not allocated. You said your GPU doesn't support local memory. Well, you are trying to use lm_sum which is declared to be in local memory (__local int lm_sum[2][LSIZE + LOG_LSIZE]).
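On the original question of whether a prefix sum can be done with global memory only: one common approach is a Hillis-Steele style scan that ping-pongs between two global buffers, with one kernel launch per doubling step so that the launch boundary acts as the global synchronization point. The sketch below simulates that structure on the host in Python; scan_pass stands in for the body of a hypothetical kernel, with each loop iteration playing the role of one work-item. It illustrates the access pattern only and says nothing about performance on a particular GPU.

```python
def scan_pass(src, dst, offset):
    """One simulated kernel launch: each index i acts as one work-item."""
    for i in range(len(src)):
        # Work-item i reads only from src and writes only to dst[i],
        # so no local memory or intra-launch synchronization is needed.
        dst[i] = (src[i] + src[i - offset]) if i >= offset else src[i]


def inclusive_scan_global(data):
    """Inclusive prefix sum using two 'global' buffers and repeated passes."""
    src = list(data)
    dst = [0] * len(src)
    offset = 1
    while offset < len(src):
        scan_pass(src, dst, offset)
        src, dst = dst, src  # swap buffers between launches
        offset *= 2
    return src


if __name__ == "__main__":
    print(inclusive_scan_global([1, 2, 3, 4, 5]))
```

This variant does O(n log n) additions in total, which is more work than the local-memory version with bank-conflict padding, but it avoids __local storage entirely.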