Install an R package permanently in Google Colab

I am using the idefix R package and I do not want to install it every time I log into Google Colab. Is there any way of installing it permanently? Will it also be installed for other people if I share the notebook?
Thank you :)

As on a local computer, you can copy the R library from the source machine to the target location. See the instructions in this blog (atusy.net).
Here are two Colab notebooks that reproduce the export and import of an R library.
CoLab Notebook export local library
CoLab Notebook import local library
Here are minimal snippets for the export and import steps.
Open a Colab notebook with a Python runtime:
# activate R magic
%load_ext rpy2.ipython
Make the notebook available for R.
%%R
install.packages('tidymodels')
tar("library.tar.gz", "/usr/local/lib/R/site-library")
Install the tidymodels package, then archive your library of installed packages.
from google.colab import drive
drive.mount('/content/drive')
Connect your Google Drive and make a copy of the archive for future use.
%cp library.tar.gz drive/MyDrive/src/
drive/MyDrive/src/ is the path I chose; you can use another.
Next, use this library in another notebook or a new one.
from google.colab import drive
drive.mount('/content/drive')
Connect your Google Drive.
%cp drive/MyDrive/src/library.tar.gz .
Copy it into your working directory.
!tar xf library.tar.gz
Extract the installed packages from the archive.
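%load_ext rpy2.ipython
Activate the R magic in the new notebook as well.
%%R
.libPaths('usr/local/lib/R/site-library/')
library(tidymodels)
Update the library path so that the extracted directory ranks first (the path is relative because the archive was extracted into the working directory), then load the package to check that it can be reused.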
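The export and import steps above can also be sketched in plain Python with the standard tarfile module, which may be convenient in a Python-runtime Colab notebook. This is only a minimal sketch: the function names pack_library and restore_library are hypothetical helpers, not part of Colab or rpy2, and the paths you pass in are your own choice.

```python
import tarfile
from pathlib import Path


def pack_library(lib_dir, archive_path):
    """Archive lib_dir as a gzip-compressed tarball (hypothetical helper)."""
    lib_dir = Path(lib_dir).resolve()
    with tarfile.open(archive_path, "w:gz") as tar:
        # Store member names without the leading slash, so that extraction
        # recreates the directory tree below the extraction directory.
        tar.add(lib_dir, arcname=str(lib_dir).lstrip("/"))
    return Path(archive_path)


def restore_library(archive_path, target_dir="."):
    """Extract the archive under target_dir and return the library path."""
    target_dir = Path(target_dir)
    with tarfile.open(archive_path, "r:gz") as tar:
        # The first member is the library directory that pack_library added.
        top = tar.getnames()[0]
        tar.extractall(target_dir)
    return target_dir / top
```

In the second notebook, the path returned by restore_library is the one you would then pass to .libPaths() in an R cell.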

As far as I understand it, each virtual machine is recycled after you close the browser window or once the session has run for more than 12 hours. To the best of my knowledge, there is no way to install packages so that you can use them later without installing them again.

Related

How to install libs for R arrow package on ubuntu without internet?

I am working on Azure Databricks, and its compute server runs Ubuntu 18.04. I want to install the arrow R package, but without internet access for security reasons. I downloaded the arrow tar file on my MacBook, which has internet access, and made it available on Ubuntu for manual installation. I performed the following steps:
Re-installed build-essential by downloading it from this link, uploading it to Ubuntu, and executing the following bash command to make it available: sudo dpkg -i /dbfs/FileStore/tables/build_essential_12_4ubuntu1_amd64.deb
Installed cpp11, as it is a dependency listed on CRAN: R CMD INSTALL /dbfs/FileStore/tables/arrow_dir/cpp11_0_3_1.tar.gz
Downloaded arrow_4.0.1.tar.gz from here and made it available on Ubuntu.
Here I see that some C++ dependencies must be available on Ubuntu before installing the arrow package. How can I install these dependencies without internet access?
Thanks for reading my question.
Note: A solution is suggested below. After executing ./thirdparty/download_dependencies.sh $HOME/arrow-thirdparty I get:
# Environment variables for offline Arrow build
export ARROW_ABSL_URL=/root/arrow-thirdparty/absl-0f3bb466b868b523cf1dc9b2aaaed65c77b28862.tar.gz
export ARROW_AWSSDK_URL=/root/arrow-thirdparty/aws-sdk-cpp-1.8.133.tar.gz
export ARROW_AWS_CHECKSUMS_URL=/root/arrow-thirdparty/aws-checksums-v0.1.10
export ARROW_AWS_C_COMMON_URL=/root/arrow-thirdparty/aws-c-common-v0.5.10.tar.gz
export ARROW_AWS_C_EVENT_STREAM_URL=/root/arrow-thirdparty/aws-c-event-stream-v0.1.5
export ARROW_BOOST_URL=/root/arrow-thirdparty/boost-1.75.0.tar.gz
export ARROW_BROTLI_URL=/root/arrow-thirdparty/brotli-v1.0.9.tar.gz
export ARROW_BZIP2_URL=/root/arrow-thirdparty/bzip2-1.0.8.tar.gz
export ARROW_CARES_URL=/root/arrow-thirdparty/cares-1.17.1.tar.gz
export ARROW_GBENCHMARK_URL=/root/arrow-thirdparty/gbenchmark-v1.5.2.tar.gz
export ARROW_GFLAGS_URL=/root/arrow-thirdparty/gflags-v2.2.2.tar.gz
export ARROW_GLOG_URL=/root/arrow-thirdparty/glog-v0.4.0.tar.gz
export ARROW_GRPC_URL=/root/arrow-thirdparty/grpc-v1.35.0.tar.gz
export ARROW_GTEST_URL=/root/arrow-thirdparty/gtest-1.10.0.tar.gz
export ARROW_JEMALLOC_URL=/root/arrow-thirdparty/jemalloc-5.2.1.tar.bz2
export ARROW_LZ4_URL=/root/arrow-thirdparty/lz4-v1.9.3.tar.gz
export ARROW_MIMALLOC_URL=/root/arrow-thirdparty/mimalloc-v1.7.2.tar.gz
export ARROW_ORC_URL=/root/arrow-thirdparty/orc-1.6.6.tar.gz
Failed downloading https://github.com/google/protobuf/releases/download/v3.14.0/protobuf-all-3.14.0.tar.gz
Would it help to use the script mentioned in the link below to download the dependencies and put them somewhere you can then install them from?
There are some instructions here: https://arrow.apache.org/docs/developers/cpp/building.html#offline-builds
I've pasted them below in case the link expires, but you may want to check it for the most up to date version of these instructions.
To enable offline builds, you can download the source artifacts yourself and use environment variables of the form ARROW_$LIBRARY_URL to direct the build system to read from a local file rather than accessing the internet.
To make this easier for you, we have prepared a script thirdparty/download_dependencies.sh which will download the correct version of each dependency to a directory of your choosing. It will print a list of bash-style environment variable statements at the end to use for your build script.
# Download tarballs into $HOME/arrow-thirdparty
$ ./thirdparty/download_dependencies.sh $HOME/arrow-thirdparty
You can then invoke CMake to create the build directory and it will use the declared environment variable pointing to downloaded archives instead of downloading them (one for each build dir!).
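Since the script prints its export statements mixed with comments and, as in the question, possibly error messages, it can help to collect the variables programmatically and check that every archive really exists before building. The following is a rough sketch under that assumption; parse_export_lines and missing_archives are hypothetical helper names, not part of Arrow.

```python
import os


def parse_export_lines(script_output):
    """Collect ARROW_*_URL variables from lines of the form 'export NAME=VALUE'."""
    env = {}
    for line in script_output.splitlines():
        line = line.strip()
        if not line.startswith("export "):
            continue  # skip comments, blank lines, and error messages
        name, sep, value = line[len("export "):].partition("=")
        if sep and name.startswith("ARROW_") and name.endswith("_URL"):
            env[name] = value
    return env


def missing_archives(env):
    """Return the variable names whose archive path does not exist on disk."""
    return sorted(name for name, path in env.items() if not os.path.exists(path))
```

If missing_archives returns an empty list, one option is to call os.environ.update(env) before launching the build from the same Python process, so that the build sees the variables.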
Starting in arrow 6.0.0, the package should successfully install from source when offline. It will have only basic features: you'll be able to work with Arrow data and feather files, but features like Parquet reading, S3, and compression libraries won't be available. There is also a new utility function, create_package_with_all_dependencies(), that you can run on a machine connected to the internet in order to produce a "fat" source package containing all third-party C++ dependencies. You can then copy this to your airgapped server. See https://arrow.apache.org/docs/r/reference/create_package_with_all_dependencies.html for details.

How to open a local Jupyter Notebook directly on Google Colab without going for the extra step of uploading it on Google Drive?

I would like to load my Jupyter notebook from my local repo directly on Colab, without uploading it to Google Drive first. I'm well aware of the possibility of opening GitHub notebooks directly on Colab, but that's not what I'm trying to do, since I would like to open notebooks directly from my local repo and then have the freedom to commit and push whenever I want. Is this possible, or do I have to give up, upload a copy to Google Drive, and then download the .ipynb every time I want to save locally?
You can use my library to open Jupyter Notebook and open any local notebook in the same machine instance.
!pip install kora -q
from kora import jupyter
jupyter.start()
Then click the shown link. It should list local ipynb files. You can then click them to open and edit the files directly.

Jupyterlab extension install offline

Can Jupyterlab extensions be downloaded and installed later on offline?
If so, how?
In the documentation it says:
You can also install an extension that is not uploaded to npm, i.e., my-extension can be a local directory containing the extension, a gzipped tarball, or a URL to a gzipped tarball.
However, I can't seem to figure out how to do this.
How?
This GitHub issue details the following solution. I tried it and it worked with the Bokeh JupyterLab extension.
Outside (on a machine with internet access), install all extensions of interest with jupyter labextension install.
Copy $PREFIX/share/jupyter/lab/static from the outside machine onto a shared or thumb drive.
Inside (on the offline machine), overwrite or create that same folder.
Make sure JupyterLab isn't running when you replace the static folder.
This is a temporary solution; a more active discussion of the plans to make this process easier can be found here.
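The folder swap on the offline machine could be scripted roughly as follows. This is a sketch only: replace_static_dir is a hypothetical helper, both paths are yours to supply, and keeping a .bak copy of the old folder is an extra precaution that the linked solution does not mention.

```python
import shutil
from pathlib import Path


def replace_static_dir(source_static, target_static):
    """Replace target_static with a copy of source_static, keeping a backup."""
    source_static = Path(source_static)
    target_static = Path(target_static)
    if not source_static.is_dir():
        raise FileNotFoundError(f"no static folder at {source_static}")
    backup = target_static.with_name(target_static.name + ".bak")
    if target_static.exists():
        if backup.exists():
            shutil.rmtree(backup)  # drop an older backup first
        target_static.rename(backup)  # keep the previous build, just in case
    target_static.parent.mkdir(parents=True, exist_ok=True)
    shutil.copytree(source_static, target_static)
    return target_static
```

Run it only while JupyterLab is stopped, as noted above.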

How to install julia packages offline

I'd like to use Julia on a computer which is disconnected from the Internet.
Is there simple procedure to download a package and then install it offline?
Sure, it's possible.
Pkg.dir() # => returns the package installation path
Check pkg.julialang.org to find the right package and click on its GitHub link; you can then download a zip archive from github.com and extract it into Pkg.dir().
But you may run into trouble, because you must do many extra things manually, e.g.:
rename the folder to remove .jl
run the build steps
install all related packages
I think a better way is to install the packages on a connected machine and then copy the contents of Pkg.dir() from that machine to your system. This approach works well only if both machines have the same setup (CPU, OS, and Julia version).

How to install stringi from local file (ABSOLUTELY no Internet Access)

I am working on a remote server using RStudio. This server has no access to the Internet. I would like to install the package "stringi." I have looked at this stackoverflow article, but whenever I use the command
install.packages("stringi_0.5-5.tar.gz",
configure.vars="ICUDT_DIR=/my/directory/for/icudt.zip")
It simply tries to access the Internet, which it cannot do. Up until now I have been using Tools -> Install Packages -> Install from Packaged Archive File. However, due to this error, I can no longer use this method.
How can I install this package?
If you have no internet access on local machines, you can build a distributable source package that includes all the required ICU data files (for off-line use) by omitting some relevant lines in the .Rbuildignore file. The following command sequence should do the trick:
wget https://github.com/gagolews/stringi/archive/master.zip -O stringi.zip
unzip stringi.zip
sed -i '/\/icu..\/data/d' stringi-master/.Rbuildignore
R CMD build stringi-master
Assuming the most recent development version is 1.3.1, a file named stringi_1.3.1.tar.gz is created in the current working directory. The package can now be installed (the source bundle may be propagated via scp etc.) by executing:
R CMD INSTALL stringi_1.3.1.tar.gz
or by calling install.packages("stringi_1.3.1.tar.gz", repos=NULL) from within an R session.
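On a machine without sed, the .Rbuildignore edit can be reproduced in Python. The sketch below applies the same pattern as the sed command (a line containing /icu, any two characters, then /data); drop_icu_data_rules is a hypothetical helper name, and the path to .Rbuildignore is assumed to be the one inside the unzipped stringi-master folder.

```python
import re
from pathlib import Path

# Same pattern as the sed expression: "/icu", any two characters, "/data".
ICU_DATA_PATTERN = re.compile(r"/icu../data")


def drop_icu_data_rules(rbuildignore_path):
    """Delete lines that exclude ICU data; return how many were removed."""
    path = Path(rbuildignore_path)
    lines = path.read_text().splitlines(keepends=True)
    kept = [line for line in lines if not ICU_DATA_PATTERN.search(line)]
    path.write_text("".join(kept))
    return len(lines) - len(kept)
```

After running it on stringi-master/.Rbuildignore, R CMD build stringi-master proceeds as in the command sequence above.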
For a Linux machine, the easiest way from my point of view is:
Download the release you need from Rexamine in tar.gz format to your local PC. Unlike the version on CRAN, it already contains the icu55\data\ folder.
Move the archive to your target Linux machine without internet access.
Run R CMD INSTALL stringi-1.0-1.tar.gz (in the case of release 1.0-1).
You provided the wrong value for configure.vars.
ICUDT_DIR must be the name of the directory that contains the file, not the file name itself.
Correct your code to the following:
install.packages("stringi_0.5-5.tar.gz",
configure.vars="ICUDT_DIR=/my/directory/for/")
Regards,
Sean
Follow the steps below.
Download icudt55l.zip separately on a server where you have internet access with
wget http://www.mini.pw.edu.pl/~gagolews/stringi/icudt55l.zip
Copy the downloaded files to the server where you want to install stringi.
Execute the following command:
R CMD INSTALL --configure-vars='ICUDT_DIR=/tmp/ALL' stringi_1.1.6.tar.gz
Here, icudt55l.zip was copied to /tmp/ALL.
The suggestion from @gagolews almost worked for me. Here's what actually did the trick with RStudio.
Download the master.zip file that will save as stringi-master.zip.
Unzip the file onto your desktop. The unzipped folder should be stringi-master.
Edit the .Rbuildignore file by removing ^src/icu55/data and ^src/icu61/data or similar lines.
Move the folder from your desktop to the home directory of your server.
Create a New Project in RStudio with ~/stringi-master as the Existing Directory
From RStudio's menu, select Build and Build Source Package. (You may need to first select Configure Build Tools. For Project build tools choose Package then select OK.)
It should create a tar.gz file, in the following format: stringi_x.x.(x+1).tar.gz. For example, if the current version of stringi is 1.5.3, it will create version 1.5.4. (I received a few warnings that didn't seem to affect the outcome.)
Move the newly created package to your local repository, update the repository index, and install the package.

Resources