I want to build a Docker container with Airflow. The app requires geospatial packages like GeoPandas. When I try to build the Docker image, it fails while installing Fiona with:
FileNotFoundError: [Errno 2] No such file or directory: 'gdal-config': 'gdal-config'
I don't know exactly how to proceed. I don't have conda installed in the prod environment, so I need to install GeoPandas using pip only.
Below is docker file part:
COPY requirements.txt .
RUN pip install --user -r requirements.txt
Below is requirements.txt
apache-airflow[crypto,celery,postgres,jdbc,mysql,s3,password]==1.10.12
werkzeug<1.0.0
pytz
pyOpenSSL
ndg-httpsclient
gspread
oauth2client
pyasn1
boto3
airtable
numpy
scipy
slackclient
area
google-api-python-client
sqlalchemy
pandas
celery[redis]==4.1.1
analytics-python
networkx
zenpy==2.0.22
pyarrow
google-auth
six==1.13.0
geopandas
I tried installing the required packages separately in requirements.txt, along with GDAL, but that fails with the same error. I want to run a DAG that uses the geopandas library on Docker.
Installing packages into a Docker environment is no different from installing into any other local environment, apart from perhaps the desire to speed up the build. So I'll answer this in a way that highlights a faster option, but any other question about installing geopandas is relevant here.
I'd give the geopandas installation guide a close read. It includes multiple warnings about the issue you're facing. The recommended way to install geopandas is with conda. You cannot install geopandas with pip without manually installing its dependencies, some of which cannot themselves be installed with pip. So it can be done, but simply calling pip install geopandas won't get you there.
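For completeness, the pip-only route means installing those system libraries yourself before pip builds Fiona and pyproj. A rough sketch for a Debian-based image (the package names and the GDAL version pin are assumptions to verify against your base image):
# assumed Debian/Ubuntu package names -- verify for your base image
RUN apt-get update && apt-get install -y gdal-bin libgdal-dev libgeos-dev libproj-dev
# Fiona's build looks for gdal-config; pinning the GDAL Python binding to the
# system library version avoids header/library mismatches
RUN pip install GDAL==$(gdal-config --version)
RUN pip install --user -r requirements.txt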
I'd recommend miniforge, or, especially since you're building a Docker container, mambaforge, its faster compiled cousin. mamba is a significantly faster drop-in replacement for conda, written to build environments in parallel, but it tends to crash harder and with worse error messages. In my opinion the speedup is definitely worth it when working with Docker containers, but if you're struggling to debug something you can always fall back to conda, which comes installed alongside mamba.
Don't install Anaconda, which bundles conda with a huge number of packages from the defaults channel in your base environment, as it will cause a mix-and-match of channels. Generally, you should keep your base env clean, without any packages except those which explicitly manage channels themselves, such as an IDE. Instead, by using miniforge or mambaforge, you'll use the conda-forge channel by default.
To install mambaforge and then create a new geopandas environment:
curl -L -O "https://github.com/conda-forge/miniforge/releases/latest/download/Mambaforge-$(uname)-$(uname -m).sh"
bash Mambaforge-$(uname)-$(uname -m).sh
# install whatever env you'd like here. try to build it in one command
# rather than iteratively installing dependencies
mamba create -n mynewenv -c conda-forge python=3.10 geopandas [other packages]
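If you'd rather bake this into the image itself, here's a minimal Dockerfile sketch (the condaforge/mambaforge tag and the package pins are assumptions, not a tested setup):
# condaforge/mambaforge ships mamba with conda-forge as the default channel
FROM condaforge/mambaforge:latest
# build the whole environment in one command rather than iteratively,
# then clear package caches to keep the image small
RUN mamba install -y -n base python=3.10 geopandas && mamba clean -afy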
Related
I have been trying to run an RStudio Quarto script on a fresh Ubuntu 20.04 installation but ran into some trouble: some Python packages required to run the simple hello.qmd were missing. I was getting these errors:
ModuleNotFoundError: No module named 'nbclient'
and a second error:
ModuleNotFoundError: No module named 'matplotlib_inline'
The first error was because I hadn't installed the nbclient package. My default Python installation was Python 2.7. Quarto will not run well with Python 2.7; you should use 3.7+. If your Linux doesn't come with a recent version by default, this is easily addressed by installing another Python version and configuring multiple Python versions with the help of the command:
sudo update-alternatives --config python
If no Python version shows up, then it means you have first to configure all your installed Python versions. This is very well explained at https://www.rosehosting.com/blog/how-to-install-and-switch-python-versions-on-ubuntu-20-04/
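As a rough example (the interpreter paths and priorities are assumptions for a stock Ubuntu 20.04), registering two versions looks like:
# update-alternatives --install <link> <name> <path> <priority>
sudo update-alternatives --install /usr/bin/python python /usr/bin/python2.7 1
sudo update-alternatives --install /usr/bin/python python /usr/bin/python3.8 2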
Once you have configured all your Python versions, every time you run
sudo update-alternatives --config python, you will be prompted to choose the Python version you want as default. On a fresh Ubuntu 20.04 you most likely have two: Python 2.7 and Python 3.8. Select 3.8 and you will be fine. Quarto won't work with Python 2.7.
After you have Python 3 running and switched to it, install nbclient with:
pip install nbclient
The first error will now pass, but most likely you will now get
ModuleNotFoundError: No module named 'matplotlib_inline'. This is because you also need to install the matplotlib-inline package. This is not documented in Quarto's installation instructions, but it's easy to fix. Run:
pip install matplotlib-inline
Now, go back to your VS Code, open the command palette and run Quarto: Render, or just type from the terminal:
quarto preview hello.qmd --no-browser --no-watch-inputs
You are done!
I am trying to perform an RNA-seq analysis from raw data (fastq extension), and I am trying to install kb-python by running the following lines:
conda create -y --name kb python=3.8 #create an environment, specifying python v3.8
conda activate kb #activate that newly created environment
pip install kb-python #install kb-python in the environment. Note: if this fails because of an issue with pysam, then do 'conda install pysam' then retry this line.
When I run the last line it takes many, many hours (more than one day; I tried it on two PCs, one with 4GB and one with 8GB of RAM, each with 300GB of storage).
Because it takes so long, the PC overheats and I have to turn it off to avoid physical damage. Any suggestions? Is there any alternative for performing an RNA-seq analysis?
I am following this tutorial: https://protocols.hostmicrobe.org/conda
Many thanks!
It looks like kb-python is available through Anaconda (https://anaconda.org/bioconda/kb-python), so you could try installing with conda install -c bioconda kb-python instead. I came across the same time-intensive problem using pip; this conda install worked for me.
Your commands would look like so:
conda create -y --name kb python=3.8 #create an environment, specifying python v3.8
conda activate kb #activate that newly created environment
conda install -c bioconda kb-python #install kb-python in the environment. Note: if this fails because of an issue with pysam, then do 'conda install pysam' then retry this line.
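If conda cannot find the package, bioconda also expects its channels to be configured once, in this order (taken from the bioconda setup docs; adjust if your setup differs):
conda config --add channels bioconda
conda config --add channels conda-forge
conda config --set channel_priority strict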
I'm trying to uninstall all django packages in my superuser environment to ensure that all my webapp dependencies are installed to my virtualenv.
sudo su
sudo pip freeze | grep -E '^django-' | xargs pip -q uninstall
But pip wants to confirm every package uninstall, and there doesn't seem to be a -y option for pip. Is there a better way to uninstall a batch of python modules? Is rm -rf .../site-packages/ a proper way to go? Is there an easy_install alternative?
Alternatively, would it be better to force pip to install all dependencies to the virtualenv rather than relying on the system python modules to meet those dependencies, e.g. pip install --upgrade, but forcing even equally old versions to be installed to override any system modules? I tried activating my virtualenv and then pip install --upgrade -r requirements.txt, and that does seem to install the dependencies, even those existing in my system path, but I can't be sure if that's because my system modules were old. And man pip doesn't seem to guarantee this behavior (i.e. installing the same version of a package that already exists in the system site-packages).
Starting with pip version 7.1.2 you can run pip uninstall -y <python package(s)>:
pip uninstall -y package1 package2 package3
or from file
pip uninstall -y -r requirements.txt
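Applied to the original batch-uninstall of django packages, a sketch might be (the cut is there because pip freeze emits name==version pins):
# strip the ==version pins, then uninstall without prompting
pip freeze | grep -E '^django-' | cut -d'=' -f1 | xargs pip uninstall -y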
Pip does NOT include a --yes option (as of pip version 1.3.1).
WORKAROUND: pipe yes to it!
$ sudo ls # enter pw so not prompted again
$ /usr/bin/yes | sudo pip uninstall pymongo
If you want to uninstall every package from requirements.txt,
pip uninstall -y -r requirements.txt
On www.saturncloud.io, in Jupyter notebooks, one can use it like this:
!yes | pip uninstall tensorflow
!yes | pip uninstall gast
!yes | pip uninstall tensorflow-probability
Alternatively, would it be better to force pip to install all dependencies to the virtualenv rather than relying on the system python modules to meet those dependencies,
Yes. Don't mess too much with the built-in system packages. Many parts of the system, particularly on OS X (and even on Debian and its derivatives), depend heavily on them.
pip install --upgrade, but forcing even equally old versions to be installed to override any system modules.
It should not be a big deal if a few more packages are installed within the venv that are already there in the system, particularly if they are of a different version. That's the whole point of virtualenv.
I tried activating my virtualenv and then pip install --upgrade -r requirements.txt and that does seem to install the dependencies, even those existing in my system path, but I can't be sure if that's because my system modules were old. And man pip doesn't seem to guarantee this behavior (i.e. installing the same version of a package that already exists in the system site-packages).
No, it doesn't reinstall packages that are already there in the main installation, unless you used the --no-site-packages flag when creating the virtualenv, or the required and present versions are different.
Lakshman Prasad was right: pip install --upgrade and/or virtualenv --no-site-packages is the way to go. Uninstalling the system-wide Python modules is bad.
The --upgrade option to pip does install required modules in the virtual env, even if they already exist in the system environment, and even if the required or latest available version is the same as the system version:
pip install --upgrade -r requirements.txt
And using the --no-site-packages option when creating the virtual environment ensures that missing dependencies can't be masked by modules present in the system path. This helps expose problems during the migration of a module from one package to another, e.g. pinax.apps.groups -> django-groups, especially when the problem is with load templatetags statements in Django, which search all available modules for templatetags directories and the tag definitions within.
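Putting both together, a sketch of the workflow (note that in modern virtualenv, isolation from system site-packages is the default and this flag has been removed):
# create an isolated env; on modern virtualenv just drop the flag
virtualenv --no-site-packages myenv
source myenv/bin/activate
# force local copies of every requirement, even if the system has them
pip install --upgrade -r requirements.txt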
pip install -U xxxx
can bypass the confirmation prompt.
I have successfully installed Caffe on Ubuntu 18.* using
sudo apt-get install caffe-cpu
Running which caffe returns /usr/bin/caffe.
I can successfully run the caffe command in the terminal, but the problem is running the test files, as they are linked to Caffe's build directories. If I manually fetch the GitHub repository of Caffe and build it, the build keeps failing, and some of the dependencies don't have an installation candidate on Ubuntu 18.
Also, all the examples available on the net are for the previous, manually built type of Caffe.
Normally, it will work with the default Python in /usr/bin/python3. You can check:
/usr/bin/python3
>>> import caffe
With the Python in /usr/bin/python3, you do not need to add any additional PYTHONPATH.
I am trying to install numpy on a remote host where I have no admin rights. I have successfully installed Python 2.7 and pip inside a virtualenv, and can use pip to install trivial things like pip install Markdown. But if I pip install numpy or scipy, it errors with SystemError: Cannot compile 'Python.h'. Perhaps you need to install python-dev|python-devel. I do not have rights to sudo apt-get or apt-get, so I cannot do sudo apt-get install python27-devel or sudo apt-get install python-devel. I wanted to build from source so that I could use the --user option, but the source is a .deb file and building it requires even more things I would have to apt-get. I contacted the admin but was advised to keep my own installations in my own local environment. What should I do?
The OS system is Ubuntu 14.04 LTS.
The reason for the admin's answer is simple: Ubuntu also uses Python for internal scripts, so the admin will not update or change the Python installation just because you need a more recent version of a package.
This is what I would try:
1. Compile the Python 2.7 source yourself and install it in your preferred path in your home directory. This way you always have all the needed headers. Put the interpreter on your PATH.
2. (Optional) Set PYTHONUSERHOME to your local Python site-packages.
3. Install the virtualenv package via pip.
4. Set up a virtualenv environment for numpy etc.
5. (Optional) Build BLAS libraries, e.g. OpenBLAS, in your home directory.
6. Install Cython in the virtualenv, and probably some more packages needed for numpy.
7. Install numpy and scipy in the virtualenv with the correct BLAS library settings.
If you use your own Python installation, the virtualenv is not really necessary, so you might want to omit that step. You just need to make sure that your Python interpreter is always the first one to be found.
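A condensed sketch of that route (the version, URL, and paths are illustrative assumptions; adjust to whatever you actually need):
# build a private Python under $HOME so the headers are always available
wget https://www.python.org/ftp/python/2.7.18/Python-2.7.18.tgz
tar xzf Python-2.7.18.tgz && cd Python-2.7.18
./configure --prefix=$HOME/local/python2.7
make && make install
# put it first on PATH and bootstrap pip (ensurepip ships with 2.7.9+)
export PATH=$HOME/local/python2.7/bin:$PATH
python2.7 -m ensurepip
python2.7 -m pip install numpy scipy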