I am using pybullet-gym to evaluate my policy model and visualize its interactions. However, when it renders the environment using the following sample code (taken from its own repo), the Jupyter notebook crashes and restarts its kernel:
import gym
import pybulletgym
env = gym.make('HumanoidPyBulletEnv-v0')
env.render()
env.reset()
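For context, my full evaluation loop is essentially the following, although the crash already happens with just the sample above. This is a simplified sketch: in the real code the action comes from my policy model, and env.action_space.sample() is only a stand-in.
import gym
import pybulletgym  # importing this registers the PyBullet envs with gym

env = gym.make('HumanoidPyBulletEnv-v0')
env.render()  # open the GUI window before the first reset()
obs = env.reset()

for _ in range(1000):
    action = env.action_space.sample()  # stand-in for my policy model
    obs, reward, done, info = env.step(action)
    env.render()
    if done:
        obs = env.reset()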
Here is the message in the shell:
startThreads creating 1 threads.
starting thread 0
started thread 0
argc=2
argv[0] = --unused
argv[1] = --start_demo_name=Physics Server
ExampleBrowserThreadFunc started
X11 functions dynamically loaded using dlopen/dlsym OK!
X11 functions dynamically loaded using dlopen/dlsym OK!
Creating context
Created GL 3.3 context
Direct GLX rendering context obtained
Making context current
GL_VENDOR=NVIDIA Corporation
GL_RENDERER=GeForce MX150/PCIe/SSE2
GL_VERSION=3.3.0 NVIDIA 450.102.04
GL_SHADING_LANGUAGE_VERSION=3.30 NVIDIA via Cg compiler
pthread_getconcurrency()=0
Version = 3.3.0 NVIDIA 450.102.04
Vendor = NVIDIA Corporation
Renderer = GeForce MX150/PCIe/SSE2
b3Printf: Selected demo: Physics Server
startThreads creating 1 threads.
starting thread 0
started thread 0
MotionThreadFunc thread started
ven = NVIDIA Corporation
ven = NVIDIA Corporation
XIO: fatal IO error 11 (Resource temporarily unavailable) on X server ":1"
after 21540 requests (21540 known processed) with 0 events remaining.
I am using CefSharp v99.2.120 and I am getting the following error in my application:
ERROR:gpu_process_host.cc(967) GPU process exited unexpectedly: exit_code=532462766
WARNING:gpu_process_host.cc(1273) The GPU process has crashed 1 time(s)
This repeats until the GPU process has crashed 3 times, after which I get:
FATAL:gpu_data_manager_impl_private.cc(417) GPU process isn't usable. Goodbye.
I am using the following settings:
CefSettings settings = new CefSettings();
settings.CefCommandLineArgs.Add("disable-gpu");
settings.CefCommandLineArgs.Add("disable-gpu-compositing");
settings.CefCommandLineArgs.Add("disable-gpu-vsync");
settings.CefCommandLineArgs.Add("disable-software-rasterizer");
The app still crashes with the same error.
I also added
settings.DisableGpuAcceleration();
Still the same.
I expected no GPU to be in use, but the settings above don't change anything.
I'm trying to get started with Keras. I have one of the new Nvidia RTX GPUs (an RTX 3080), but I can't seem to get GPU support off the ground, despite using a fresh installation of Ubuntu 20.04.
On my first attempt, Ubuntu detected my graphics card, so I installed the driver via "Additional Drivers." I then installed Keras and TensorFlow using the following commands, which produced no errors.
install.packages("keras")
library(keras)
install_keras(tensorflow = "gpu")
However, when I try to actually set up a Keras model,
model <- keras_model_sequential() %>%
layer_dense(units = 16, activation = "relu", input_shape = c(10000)) %>%
layer_dense(units = 16, activation = "relu") %>%
layer_dense(units = 1, activation = "sigmoid")
I get this awful error message:
2021-01-14 09:04:53.188680: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-01-14 09:04:53.189214: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcuda.so.1
2021-01-14 09:04:53.224466: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-01-14 09:04:53.224843: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties:
pciBusID: 0000:09:00.0 name: GeForce RTX 3080 computeCapability: 8.6
coreClock: 1.785GHz coreCount: 68 deviceMemorySize: 9.78GiB deviceMemoryBandwidth: 707.88GiB/s
2021-01-14 09:04:53.224860: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2021-01-14 09:04:53.226413: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.11
2021-01-14 09:04:53.226446: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublasLt.so.11
2021-01-14 09:04:53.226935: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10
2021-01-14 09:04:53.227061: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10
2021-01-14 09:04:53.227139: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcusolver.so.10'; dlerror: libcusolver.so.10: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /home/arta/.local/share/r-miniconda/envs/r-reticulate/lib:/usr/lib/R/lib:/usr/local/cuda-11.2/lib64:::/lib:/usr/lib/x86_64-linux-gnu:/usr/lib/jvm/default-java/lib/server:/usr/local/cuda-11.2/lib64
2021-01-14 09:04:53.227437: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusparse.so.11
2021-01-14 09:04:53.227513: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.8
2021-01-14 09:04:53.227519: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1757] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
2021-01-14 09:04:53.228275: I tensorflow/compiler/jit/xla_gpu_device.cc:99] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-01-14 09:04:53.228290: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1261] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-01-14 09:04:53.228293: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1267]
As you might notice, this error message mentions cuda-11.2; however, I got an almost identical error message when I was using my system's default cuda-10.1, which I assume came with the driver.
I tried a number of things, including downloading cuDNN directly from Nvidia's website and installing it per their documentation, and adding CUDA to PATH and LD_LIBRARY_PATH, to no avail.
Finally, I removed my r-reticulate conda environment so that I could reinstall TensorFlow from scratch, this time against cuda 11.2 instead of the default 10.1.
I followed the directions in this blog post, but I substituted every instance of 10.1 with 11.2, and libcudnn.so.7 with libcudnn.so.8, since that is the newest version available and the one I downloaded to my system. That brings me to the error message above, which is almost the same as the one I got with the default 10.1.
Also, I noticed something strange when I tried to use Tensorflow in R again. I installed it using install_keras(tensorflow = "gpu") with no discernible problems, but when I called the following command:
imdb <- dataset_imdb(num_words = 10000)
It started downloading and installing it for me once again, but it gave me this warning:
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
tensorflow-gpu 2.2.0 requires tensorboard<2.3.0,>=2.2.0, but you have tensorboard 2.4.0 which is incompatible.
tensorflow-gpu 2.2.0 requires tensorflow-estimator<2.3.0,>=2.2.0, but you have tensorflow-estimator 2.4.0 which is incompatible.
What am I supposed to make of this? Why is it that it can use the right installation of CUDA:
2021-01-14 09:00:06.766462: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
But it can't use another file somewhere else?
2021-01-14 09:04:53.227139: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcusolver.so.10'; dlerror: libcusolver.so.10: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /home/arta/.local/share/r-miniconda/envs/r-reticulate/lib:/usr/lib/R/lib:/usr/local/cuda-11.2/lib64:::/lib:/usr/lib/x86_64-linux-gnu:/usr/lib/jvm/default-java/lib/server:/usr/local/cuda-11.2/lib64
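One way to see which of these libraries the dynamic loader can actually resolve is to try dlopening them directly from the r-reticulate Python environment. This is only a diagnostic sketch; the library names are copied from the log above.
import ctypes
import os

# Show the search path the loader sees, then try to dlopen each of the
# CUDA libraries mentioned in the TensorFlow log above.
print(os.environ.get("LD_LIBRARY_PATH", "(not set)"))

for name in ["libcudart.so.11.0", "libcublas.so.11", "libcusolver.so.10",
             "libcusparse.so.11", "libcudnn.so.8"]:
    try:
        ctypes.CDLL(name)
        print("found   ", name)
    except OSError as err:
        print("missing ", name, "-", err)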
What do I do now? Why can't I get GPU acceleration to work? My plan is to follow the directions in that blog post, purge all Nvidia software from Ubuntu, and try again with 10.1, since that seems to be the most stable version.
Thanks to @RobertCrovella, I uninstalled CUDA, cuDNN, etc. because of the version mismatch, and reinstalled CUDA 11.0 with cuDNN 8.0.
> tensorflow::tf_gpu_configured()
...
tensorflow/core/common_runtime/gpu/gpu_device.cc:1406] Created TensorFlow device (/device:GPU:0 with 8779 MB memory) -> physical GPU (device: 0, name: GeForce RTX 3080, pci bus id: 0000:09:00.0, compute capability: 8.6)
GPU device name: /device:GPU:0
[1] TRUE
Do I understand correctly that if I install CUDA 11.0 and the cuDNN 8.0 build for CUDA 11.0, then all these errors disappear?
I have installed CUDA 11.2 and a cuDNN 8 build for CUDA 11.1. I then installed TensorFlow etc. with python3 (3.8, the Ubuntu 20.04.1 LTS default) and pip3. In Python it seems to be working, but in R it's broken.
I have created symbolic links to the existing versions, and my R code gets to the point where it should use the GPU, but then it aborts with a core dump.
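For comparison, this is the kind of quick check that passes for me on the Python side (a minimal sketch, assuming TensorFlow 2.x installed with pip3):
import tensorflow as tf

# Confirm TensorFlow sees the GPU and check where a small matmul actually runs.
print(tf.__version__)
print(tf.config.list_physical_devices("GPU"))

with tf.device("/GPU:0"):
    x = tf.random.normal((1024, 1024))
    y = tf.matmul(x, x)
print("matmul ran on:", y.device)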
I was following the steps given at Gluon Documentation to run JavaFX on Raspberry Pi 4 via DRM. I downloaded the JavaFX EA 16 builds from here.
javafx.properties file:
javafx.version=16-internal
javafx.runtime.version=16-internal+28-2020-11-10-180413
javafx.runtime.build=28
After cloning the samples repository containing hellofx, I compiled it via javac (according to the steps) and then ran this command to run it using DRM:
sudo -E java -Dmonocle.platform=EGL -Djava.library.path=/opt/arm32fb-sdk/lib -Dmonocle.egl.lib=/opt/arm32fb-sdk/lib/libgluon_drm.so --module-path /opt/arm32fb-sdk/lib --add-modules javafx.controls -cp dist/. hellofx.HelloFX
However, this caused the following error:
#
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGSEGV (0xb) at pc=0x9c3314dc, pid=734, tid=746
#
# JRE version: OpenJDK Runtime Environment (11.0.9+11) (build 11.0.9+11-post-Raspbian-1deb10u1)
# Java VM: OpenJDK Server VM (11.0.9+11-post-Raspbian-1deb10u1, mixed mode, serial gc, linux-)
# Problematic frame:
# C [libgluon_drm.so+0x14dc] getNativeWindowHandle+0x54
#
# No core dump will be written. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# An error report file with more information is saved as:
# /home/pi/samples/CommandLine/Modular/CLI/hellofx/hs_err_pid734.log
#
# If you would like to submit a bug report, please visit:
# Unknown
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.
#
Aborted
It seems that loading libgluon_drm.so from the JavaFX SDK's lib/ directory fails at getNativeWindowHandle.
What's weird is that after I ran sudo apt install libegl* mesa* libgl*, it actually succeeded, but it asked me to set the variable ENABLE_GLUON_COMMERCIAL_EXTENSIONS to true, which I had already done.
However, after rebooting, it started showing the same error.
I am using a Raspberry Pi 4 Model B with 2GB RAM. It is running on Raspberry Pi OS 32-Bit with desktop.
I had performed all of this on a clean installation.
The Pi 4 exposes both vc4 (for display) and v3d (for 3D rendering). You can probe the devices for their capabilities; only one of them should report the DRIVER_RENDER or DRIVER_MODESET capability. See the "Pi4 DRM questions" thread for more background.
The card which JavaFX selects by default is /dev/dri/card1. In my case, /dev/dri/card0 was the one that should be used, not card1. I solved the issue with the following runtime argument:
-Degl.displayid=/dev/dri/card0
The JavaFX version I used was 16-ea+5.
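To see which card belongs to which driver, you can query the kernel's sysfs; the following is a small sketch assuming the standard /sys/class/drm layout:
import glob
import os

# For each DRM card, resolve which kernel driver is bound to it.
# On a Pi 4 one card should belong to vc4 (display) and the other to v3d (3D).
for card in sorted(glob.glob("/sys/class/drm/card[0-9]")):
    driver_link = os.path.join(card, "device", "driver")
    driver = os.path.basename(os.readlink(driver_link)) if os.path.islink(driver_link) else "unknown"
    print("/dev/dri/%s -> %s" % (os.path.basename(card), driver))
Whichever card resolves to the vc4 driver is the display card to pass to -Degl.displayid.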
I am able to run a number of GUI applications successfully on Windows Subsystem for Linux (WSL) with Ubuntu 14.04.4 LTS, using X-forwarding (via MobaXterm).
I recently tried to run an application that uses OpenGL. Although the GUI opens, there are a number of errors and some aspects of the GUI don't work properly. The errors are:
QT error: 1
QT error: <PyQt5.QtCore.QMessageLogContext object at 0x7f88e21d9ba8>
QT error: QStandardPaths: XDG_RUNTIME_DIR not set, defaulting to '/tmp/runtime-user'
libGL error: failed to load driver: swrast
I can get rid of the swrast error by setting export LIBGL_ALWAYS_INDIRECT=1 but the other errors remain:
QT error: 1
QT error: <PyQt5.QtCore.QMessageLogContext object at 0x7fed3689e828>
QT error: QStandardPaths: XDG_RUNTIME_DIR not set, defaulting to '/tmp/runtime-user'
A typical error when attempting to use part of the GUI is:
WARNING: QT error: 1 (Gui.qtMessageHandler:51)
WARNING: QT error: <PyQt5.QtCore.QMessageLogContext object at 0x7f9b35e7b6d8> (Gui.qtMessageHandler:51)
WARNING: QT error: QOpenGLWidget: Failed to make context current (Gui.qtMessageHandler:51)
I can run glxgears, but I realise this only exercises a fraction of OpenGL's functionality. If I run glmark2, I get the following:
glmark2
** GLX does not support GLX_EXT_swap_control or GLX_MESA_swap_control!
** Failed to set swap interval. Results may be bounded above by refresh rate.
Error: Glmark2 needs OpenGL(ES) version >= 2.0 to run (but version string is: '1.4 (4.0.0 - Build 10.18.10.4358)')!
Error: main: Could not initialize canvas
If I run find /usr -iname "*libGL.so*" -exec ls -l -- {} + (suggested at https://askubuntu.com/questions/541343/problems-with-libgl-fbconfigs-swrast-through-each-update), I get the following output, but I'm not sure whether this indicates an error:
lrwxrwxrwx 1 root root 14 Jan 12 2016 /usr/lib/x86_64-linux-gnu/mesa/libGL.so.1 -> libGL.so.1.2.0
-rw-r--r-- 1 root root 413968 Jan 12 2016 /usr/lib/x86_64-linux-gnu/mesa/libGL.so.1.2.0
Running glxinfo seems to indicate that the on-chip graphics card is being used (there is a separate AMD graphics card, but I don't think it's necessary to use that).
glxinfo | grep render
direct rendering: No (LIBGL_ALWAYS_INDIRECT set)
GLX_MESA_multithread_makecurrent, GLX_MESA_query_renderer,
OpenGL renderer string: Intel(R) HD Graphics 4000
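To separate Qt-specific problems from general GL problems, context creation can also be tested with a minimal PyQt5 widget outside the application (a sketch, assuming PyQt5 is installed in the same environment; close the window to exit):
import sys
from PyQt5.QtWidgets import QApplication, QOpenGLWidget

class GLProbe(QOpenGLWidget):
    # initializeGL runs once Qt has created a GL context for the widget.
    def initializeGL(self):
        ctx = self.context()
        fmt = ctx.format()
        print("GL context valid:", ctx.isValid())
        print("GL version reported:", fmt.majorVersion(), fmt.minorVersion())

app = QApplication(sys.argv)
widget = GLProbe()
widget.show()
sys.exit(app.exec_())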
How can I successfully run OpenGL applications via WSL?
edit: solved. I was not using the proper version of cuDNN. After downloading and copying over the cuDNN v5.1 Library for Windows 10, everything works fine so far. I should have followed the TensorFlow Windows guidelines more closely.
I just spent some time setting up a Windows system to run r-tensorflow with Keras and TensorFlow as a backend. I followed these blog instructions, and I am able to get the hello world example to work. However, when I continue to some of the R Keras examples, several will just crash and exit RGui during the training phase; e.g. mnist_cnn fails at the train-and-evaluate block.
# train and evaluate
model %>% fit(
x_train, y_train,
batch_size = batch_size,
epochs = epochs,
verbose = 1,
validation_data = list(x_test, y_test)
)
A Visual Studio JIT popup shows "An unhandled win32 exception occurred in Rgui.exe [10456]", and the VS editor shows "Unhandled exception at 0x00007FFFEA2734BE (ucrtbase.dll) in Rgui.exe: Fatal program exit requested."
There is very little load on the CPU or GPU during runtime (less than 30%). Any ideas on how to debug this further are appreciated.
Hardware/Software/Setup:
Windows 10 Pro, 64-bit OS, x64 CPU
Intel i7-7700 @ 3.60GHz
64 GB RAM
R 3.4.1 (64-bit)
Anaconda 3 / Python 3.5.4
Nvidia GTX 1060, 6 GB
CUDA 8.0 (V8.0.60)
Visual Studio 2015 Community Edition
using
library(tensorflow)
use_condaenv("r-tensorflow")
library(keras)
I also just ran it in RStudio with rdesktop log diagnostics and got the following log after the crash (I can see that it cannot find a file, but I'm not certain which file).
20 Aug 2017 01:29:32 [rdesktop] ERROR system error 2 (The system cannot find the file specified);
OCCURRED AT: virtual void rstudio::core::http::NamedPipeAsyncClient::connectAndWriteRequest() C:/Users/Administrator/rstudio/src/cpp/core/include/core/http/NamedPipeAsyncClient.hpp:84;
LOGGED FROM: void rstudio::desktop::NetworkReply::onError(const rstudio::core::Error&) C:\Users\Administrator\rstudio\src\cpp\desktop\DesktopNetworkReply.cpp:288
20 Aug 2017 01:29:42 [rdesktop] ERROR system error 2 (The system cannot find the file specified);
OCCURRED AT: virtual void rstudio::core::http::NamedPipeAsyncClient::connectAndWriteRequest() C:/Users/Administrator/rstudio/src/cpp/core/include/core/http/NamedPipeAsyncClient.hpp:84;
LOGGED FROM: void rstudio::desktop::NetworkReply::onError(const rstudio::core::Error&) C:\Users\Administrator\rstudio\src\cpp\desktop\DesktopNetworkReply.cpp:288
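To narrow down whether the crash comes from the TensorFlow/cuDNN stack or from RGui itself, the same kind of convolutional fit can be run directly from Python inside the Anaconda r-tensorflow environment. This is a minimal sketch with random stand-in data; the layer sizes are illustrative, not taken from the failing examples.
import numpy as np
from keras.models import Sequential
from keras.layers import Conv2D, Flatten, Dense

# Random data shaped like MNIST images, enough to exercise the cuDNN conv path.
x_train = np.random.random((256, 28, 28, 1)).astype("float32")
y_train = np.random.randint(0, 10, size=(256,))

model = Sequential([
    Conv2D(32, (3, 3), activation="relu", input_shape=(28, 28, 1)),
    Flatten(),
    Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_train, y_train, batch_size=64, epochs=1, verbose=1)
If this crashes in the same way, the problem is below R, in the TensorFlow/cuDNN installation.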
edit: stepping through several more examples, several seem to work fine -- it could just be specific to the particular code and my setup. I don't know enough right now to troubleshoot the code idiosyncrasies. Those that crash also seem to do so at the model-fitting stage. I'll update the list below as I test. pass = no crash, fail = crash and exit
mnist_hierarchical_rnn # pass
mnist_transfer_cnn # fail
variational_autoencoder_deconv # fail
stateful_lstm # pass
variational_autoencoder # pass
reuters_mlp # pass
mnist_mlp # pass
mnist_irnn # pass
mnist_cnn # fail
mnist_antirectifier # fail