Loading Flux and CuArrays in Different Orders Causes Errors - julia

I'm trying to get Flux and CuArrays to work on my GPU-enabled laptop running Pop!_OS. I get errors on the second package that I load.
I initially didn't have CUDA tools installed but now that I do I get errors for each package.
My CUDA info is:
(base) ➜ ~ find / -type d -name cuda 2>/dev/null
/usr/lib/cuda-10.1/targets/x86_64-linux/include/thrust/system/cuda
/usr/local/cuda-10.1/targets/x86_64-linux/include/thrust/system/cuda
/home/kailukowiak/anaconda3/lib/python3.7/site-packages/numba/cuda
/home/kailukowiak/anaconda3/pkgs/numba-0.43.1-py37h962f231_0/lib/python3.7/site-packages/numba/cuda
/home/kailukowiak/.julia/packages/CUDAnative/gJDZI/src/device/cuda
/home/kailukowiak/.julia/packages/Flux/qXNjB/src/cuda
/home/kailukowiak/.julia/packages/Flux/qXNjB/test/cuda
If I run the following code chunks in different Julia sessions:
Session 1:
using CuArrays
using Flux
Session 2:
using Flux
using CuArrays
I get:
┌ Warning: CUDNN is not installed, some functionality will not be available.
└ @ Flux.CUDA ~/.julia/packages/Flux/qXNjB/src/cuda/cuda.jl:35
However, the error is associated with different packages depending on the order.
Errors Depending on Order
Does anybody have any ideas for what I could try?
Thanks

So after quite a bit of hair pulling, I was able to get it working by adding export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/usr/local/cuda/lib64 to my .bashrc file.
I believe that Julia/Flux/CuArrays was simply unable to find the cuDNN library.
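In other words, a sketch of the fix (assuming /usr/local/cuda is a symlink to the versioned install that the find output above shows as /usr/local/cuda-10.1):

```shell
# Append the CUDA library directory to the loader path for future shells
# (assumes /usr/local/cuda points at the versioned install, e.g. cuda-10.1):
echo 'export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/usr/local/cuda/lib64' >> "$HOME/.bashrc"

# Apply it to the current shell as well, and confirm the entry is present:
export LD_LIBRARY_PATH="${LD_LIBRARY_PATH}:/usr/local/cuda/lib64"
echo "$LD_LIBRARY_PATH" | tr ':' '\n' | grep -x '/usr/local/cuda/lib64'
```

After restarting Julia, CuArrays should be able to locate the cuDNN shared library through the loader path.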

Related

Unable to locate conda environment after creating it using .yml file

I'm trying to create (and activate & use) a Conda environment using a .yml file (in fact, I'm following instructions on this GitHub page: https://github.com/RajLabMSSM/echolocatoR). I'm working in a cluster computing system running Linux.
conda env create -f https://github.com/RajLabMSSM/echolocatoR/raw/master/inst/conda/echoR.yml
After running the above line of code, I'm trying to activate the environment:
conda activate echoR
However, this returns the following message:
Could not find conda environment: echoR
You can list all discoverable environments with conda info --envs.
When checking the list of environments in .conda/environments.txt, the echoR environment is indeed not listed.
I'm hoping for some suggestions of what might be the issue here.
Likely Cause: Out of Memory
Given the HPC context, the solver is likely trying to exceed the allocated memory and getting killed. The Python-based Conda solver is not very resource efficient and can struggle with large environments. This particular environment is quite large, given it mixes both Python and R, and it doesn't give exact specifications for R and Python versions - only lower bounds - which makes the SAT search space enormous.
Profiling Memory
I attempted to use a GitHub Workflow to profile the memory usage. Using Mamba, it solved without issue; using Conda, the job was killed because the GitHub runner ran out of memory (7GB max). The breakdown was:
Tool     Memory (MB)   User Time (s)
Mamba    745           195.45
Conda    > 6,434       > 453.34
Workarounds
Use Mamba
As a drop-in replacement for Conda that is compiled, Mamba is much more resource-efficient. It has also seen welcome adoption in the bioinformatics community (e.g., it is the default frontend for Snakemake).
As the GitHub workflow demonstrates, the Mamba-based creation works perfectly fine with the YAML as is.
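For reference, the Mamba route is only a couple of commands (a sketch; assumes Mamba is installed into the base environment from conda-forge, the standard way to get it):

```
(base) $ conda install -n base -c conda-forge mamba
(base) $ mamba env create -f https://github.com/RajLabMSSM/echolocatoR/raw/master/inst/conda/echoR.yml
(base) $ conda activate echoR
```

Note that activation still goes through conda; Mamba only replaces the solver and download steps.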
Request more memory
Ask SLURM/SGE for more memory for your interactive session. Conda seems to need more than 6.5 GB (maybe try 16 GB?).
Create a better YAML
The first thing one could do to get a faster solve is to provide exact versions for Python and R. The Mamba solution resolved to python=3.9 and r-base=4.0.
There's also a bunch of development-level stuff in the environment that is completely unnecessary for end-users. But that's more something to bother the developers about.

Tensorflow (in R) cannot find libcudart.so.11.0

I've spent an entire day on StackOverflow and the Tensorflow Github issues page trying to fix this, but still can't seem to resolve the problem, suggesting there's something obvious I'm missing.
I'm trying to get Tensorflow with GPU support running within RStudio on Ubuntu 20.04 LTS. I've followed the RStudio installation instructions with Nvidia drivers 470, CUDA 11.2, and cuDNN 8.1.0, and updated ~/.bashrc to point at
export CUDA_HOME=/usr/local/cuda
export PATH="/usr/local/cuda/bin:$PATH"
export LD_LIBRARY_PATH=/usr/lib/cuda/lib64:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH=/usr/lib/cuda/include:$LD_LIBRARY_PATH
and installed TF with GPU support via install_keras(tensorflow = "gpu"), yet still getting the dreaded error:
W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /home/nickopotamus/.local/share/r-miniconda/envs/r-reticulate/lib:/usr/lib/R/lib::/lib:/usr/lib/x86_64-linux-gnu:/usr/lib/jvm/default-java/lib/server
sudo find / -name 'libcudart.so.11.0' finds the file in:
/home/nickopotamus/.local/share/r-miniconda/envs/r-reticulate/lib/libcudart.so.11.0
/home/nickopotamus/anaconda3/pkgs/cudatoolkit-11.3.1-h2bc3f7f_2/lib/libcudart.so.11.0
/home/nickopotamus/anaconda3/pkgs/cudatoolkit-11.2.0-h73cb219_8/lib/libcudart.so.11.0
/home/nickopotamus/anaconda3/pkgs/cudatoolkit-11.2.72-h2bc3f7f_0/lib/libcudart.so.11.0
/usr/local/cuda-11.2/targets/x86_64-linux/lib/libcudart.so.11.0
The top entry at least appears to be in the path that the error is searching, so I'm at a bit of a loss as to what to try next. Is it a conflict with the other anaconda packages (which I can't seem to remove), or am I simply being oblivious to a quick-fix?
Edit: Solved by using the NVIDIA Data Science Stack to install everything from scratch.
Not a particularly satisfying answer, but solved by removing all NVIDIA packages and installing the NVIDIA Data Science Stack from scratch, then using its Conda environment to run keras in R.

How to check which dependency has errored while building a package?

I am a new Julia user and am building my own package. When I build it, I get this:
(VFitApproximation) pkg> build VFitApproximation
Building MathLink → `~/.julia/scratchspaces/44cfe95a-1eb2-52ea-b672-e2afdf69b78f/653c35640db592dfddd8c46dbb43d561cbef7862/build.log`
Building Conda ───→ `~/.julia/scratchspaces/44cfe95a-1eb2-52ea-b672-e2afdf69b78f/299304989a5e6473d985212c28928899c74e9421/build.log`
Building PyCall ──→ `~/.julia/scratchspaces/44cfe95a-1eb2-52ea-b672-e2afdf69b78f/169bb8ea6b1b143c5cf57df6d34d022a7b60c6db/build.log`
Progress [========================================>] 4/4
✗ VFitApproximation
3 dependencies successfully precompiled in 9 seconds (38 already precompiled)
1 dependency errored
I don't know which dependency has errored and how to fix that. How do I ask Julia what has errored?
It's VFitApproximation itself that has the error (it's the only one with a ✗ next to it). You should try starting a session and typing using VFitApproximation; if that causes an error, the message will tell you much more about its origin than build does. If that doesn't directly trigger an error, you can try the verbose mode of build, as suggested by @sundar above. Julia's package manager runs a lot of its system-wide operations in parallel, which is wonderful when you have to build dozens or hundreds of packages, but under those circumstances you only get general summaries rather than the level of detail you can get from operations focused on a single package.
More generally, most packages don't need a manual build: it's typically used for packages that require special configuration at install time. Examples include downloading a data set from the internet (though this is now better handled through Artifacts), or saving configuration data about the user's hardware to a file, etc. For reference, on my system, out of 418 packages I have dev'ed, only 20 have deps/build.jl scripts, and many of those only because they haven't yet been updated to use Artifacts.
Bottom line: for most code, you never need Pkg.build, and you should just use Pkg.precompile or using directly.
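For reference, the verbose build mentioned above can be run either from the Pkg REPL or through the functional API (a sketch; the package name is the one from the question):

```
(VFitApproximation) pkg> build -v VFitApproximation

julia> import Pkg; Pkg.build("VFitApproximation"; verbose=true)
```

Either form streams the build script's full output instead of the one-line summary, which usually pinpoints the failing dependency or command.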

Cannot use Elixir in Jupyter notebook

Question Summary
I'm trying to use Elixir in a Jupyter notebook, but IElixir doesn't work.
Does somebody know how to solve the error below and get Elixir working in a Jupyter notebook?
Environments
OS: Ubuntu 18.04.3 LTS (Bionic Beaver)
CPU: Intel Core i7-7700HQ 2.80GHz
RAM: 16GB
GPU: NVIDIA GeForce GTX 1600 Mobile
Version Information
anaconda 4.7.12
jupyter 1.0.0
elixir 1.9.4
What I did & sticking points
I followed the IElixir GitHub README to build a development environment.
https://github.com/pprzetacznik/IElixir
Progress so far
Repository clone (done)
git clone https://github.com/pprzetacznik/IElixir.git
cd IElixir
Build IElixir (sticking point)
mix deps.get
mix test
MIX_ENV=prod mix compile
The error happened during mix test:
kojiro@Inspiron7577:~/IElixir$ mix test
===> Compiling esqlite
===> Compiling /home/kojiro/IElixir/deps/esqlite/c_src/esqlite3_nif.c
===> /home/kojiro/IElixir/deps/esqlite/c_src/esqlite3_nif.c:25:10: fatal error: sqlite3.h: No such file or directory
 #include "sqlite3.h"
          ^~~~~~~~~~~
compilation terminated.
** (Mix) Could not compile dependency :esqlite, "/home/kojiro/.mix/rebar3 bare compile --paths="/home/kojiro/IElixir/_build/test/lib/*/ebin"" command failed. You can recompile this dependency with "mix deps.compile esqlite", update it with "mix deps.update esqlite" or clean it with "mix deps.clean esqlite"
At first, I thought the error meant that esqlite could not be compiled by Mix, so I tried every command suggested in the error message, but the error wasn't solved.
What I did to try to solve this error
I followed the IElixir GitHub README:
ran mix local.rebar --force
added ~/.mix/ to PATH, then tried mix test again:
(base) kojiro@Inspiron7577:~/IElixir$ export PATH="$PATH:~/.mix/"
but the error wasn't solved.
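One pitfall with the export above (worth checking, though it may not be the root cause here): the tilde is not expanded inside double quotes, so the literal string ~/.mix/ ends up in PATH. A sketch of the portable form:

```shell
# Tilde does not expand inside double quotes; use $HOME instead:
export PATH="$PATH:$HOME/.mix"

# Confirm the expanded directory is now a PATH entry:
echo "$PATH" | tr ':' '\n' | grep -x "$HOME/.mix"
```

With the literal ~/.mix/ entry, most programs looking up rebar3 on PATH will not find it, because PATH entries are not tilde-expanded at lookup time.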
I have already searched for "Could not compile dependency :esqlite", but everyone's environment is different.
Does someone know how to solve this situation?
I'm not familiar with this problem; however, I found some resources that might help you get Elixir working in Jupyter (if you haven't already found them!):
This Medium post takes you step by step through the installation and also mentions that a Docker image exists for it.
You can also find an installation tutorial with Docker here (different from the one mentioned above).
Hope it helps.
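One more thing worth trying (an assumption the original answer did not verify): the fatal error says the C compiler cannot find sqlite3.h, which on Ubuntu ships in the libsqlite3-dev package, so installing it and recompiling may be enough:

```
$ sudo apt-get install libsqlite3-dev
$ mix deps.clean esqlite
$ mix deps.compile esqlite
$ mix test
```

This addresses the missing header directly rather than the rebar3 PATH setup, which is a separate concern.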

Compilation error building an old version of R

In order to use a specific library that has not been updated for some time, I want to use an older version of R (2.3.1) under Linux Mint 14.
I got the source file and installed the required libraries, checking with:
apt-cache showsrc r-base | grep Build-Depends
and issued, as indicated in the R-admin help page, the command:
./configure
that ended without error; then
make
that terminated with the following error message:
In file included from datetime.c:95:0:
Rstrptime.h:201:12: error: conflicting types for ‘wcsncasecmp’
In file included from ../../src/include/Defn.h:928:0,
from datetime.c:58:
/usr/include/wchar.h:172:12: note: previous declaration of ‘wcsncasecmp’ was here
Does anyone know what triggered this error (a type conflict between datetime.c and wchar.h, if I understand correctly), and how I could get past it and keep compiling?
Thanks in advance for your help.
The problem is that R 2.3.1 is very old and was developed with the old C libraries in mind. With a recent Linux install, you have the new C libraries, which might not work well with your old R version. What you could do:
Install an old version of Linux from around the time of that R version, for example in a virtual machine.
Port the package to the new version of R yourself.
The second option takes more time, but will make the work you base on the package more future-proof.
