I'm trying to figure out how Julia packages work since I like having containerized environments. I'm really struggling with it.
In Python, I'd run something like conda create --name ds to make an environment, and then conda activate ds; conda install <packages> to install packages into it.
I'm not having much success trying to get Julia to make a virtual environment.
From the Julia REPL I can type ] to enter the package manager, then create an environment with activate ds. From there I can add the packages I need with add IJulia DataFrames Plots.
At this point, my environment becomes actual folders which is good.
What I then don't know how to do is to activate my environment so that I can then run using IJulia; notebook()
From the REPL if I type activate ds it doesn't know what I'm talking about, even if I do cd("ds"); activate . it still doesn't know what I'm trying to do...
I looked at the docs and it seems to detail out how to manipulate packages but I haven't found anything helpful for actually running them.
You have to write activate ds (or activate . if you are already in the ds directory) in package manager mode, which you enter with ] as you described.
Alternatively you can activate environments when you start Julia. Just write
julia --project=.
(if you are already in the ds directory).
A step-by-step example of how to run things for a sample project is at https://github.com/bkamins/PyDataGlobal2020.
A third option is to activate the environment via the package manager API, for example:
using Pkg
Pkg.activate(".")
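One more route, not mentioned in the answer above, is the JULIA_PROJECT environment variable, which Julia reads at startup much like the --project flag. A minimal shell sketch, assuming a POSIX shell; the helper name use_julia_env is hypothetical and not part of Julia:

```shell
# use_julia_env is a hypothetical helper, not part of Julia.
# It points JULIA_PROJECT at a directory that contains a Project.toml,
# so that a plain `julia` started later from this shell uses that environment.
use_julia_env() {
    dir="${1:-.}"
    if [ ! -f "$dir/Project.toml" ]; then
        echo "no Project.toml in $dir" >&2
        return 1
    fi
    # Store an absolute path so the setting survives a later `cd`.
    JULIA_PROJECT="$(cd "$dir" && pwd)"
    export JULIA_PROJECT
}
```

After use_julia_env ds, a plain julia started from the same shell should come up with the ds environment active.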
The Julia Pkg documentation explains how to create an environment (activate $ENV-NAME), but it seems to lack a handy command for switching to an environment that already exists. I also can't find a command that lists all previously created environments, so if I forget their names I have to search through the Julia-related folders manually...
So far, the built-in help command in the Julia REPL gives only a poor description, and so does the related Pkg documentation page.
Another possible answer to this predicament is to use the Playground.jl package, which was recommended in a post on Medium.
However, installing it directly with Pkg repeatedly fails because Pkg can't find the package in the suggested GitHub project.
Thanks in advance for any recommendations.
In the package manager prompt, just type activate @ and press the Tab key. The REPL's autocomplete will show you the possible environments.
If you are on a Mac or Linux, you can run this shell command to find all the environments:
bash-3.2$ pwd
/Users/ssiew/juliascript
bash-3.2$ find . -name Project.toml
./Luxordir/Project.toml
./symata/Project.toml
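The same search can be wrapped in a small function that prints the environment directories rather than the Project.toml paths. This is only a sketch, assuming every environment is marked by a Project.toml file; the name list_julia_envs is made up for illustration:

```shell
# list_julia_envs is a hypothetical helper.
# It prints every directory under the given root (default: current
# directory) that directly contains a Project.toml file.
list_julia_envs() {
    root="${1:-.}"
    find "$root" -name Project.toml -type f -exec dirname {} \; | sort
}
```

Run from /Users/ssiew/juliascript in the transcript above, list_julia_envs would print ./Luxordir and ./symata. Assuming the default depot location, list_julia_envs ~/.julia/environments should list the shared environments as well.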
I often ]dev Pkg, but I want the devved packages to be stored somewhere other than the default location for convenient access.
I don't want to change the path of the ]add Pkg. This seems to be controlled by the environment parameter DEPOT_PATH.
Is there a way to change only the path for dev Pkg, i.e. the path in which the dev package is stored?
You can set the environment variable JULIA_PKG_DEVDIR to change where development packages are installed. See the develop docs for more info.
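As a rough sketch of how that variable could be set from a shell before starting Julia: the helper name set_julia_devdir is hypothetical, and the directory you pass is your own choice.

```shell
# set_julia_devdir is a hypothetical helper, not part of Julia.
# It creates the target directory if needed and exports JULIA_PKG_DEVDIR
# so that Julia sessions started from this shell inherit it.
set_julia_devdir() {
    mkdir -p "$1" || return 1
    # Resolve to an absolute path so it does not depend on the current directory.
    JULIA_PKG_DEVDIR="$(cd "$1" && pwd)"
    export JULIA_PKG_DEVDIR
}
```

For example, set_julia_devdir ~/code/julia-dev in a shell startup file. A pkg> dev Example run from a Julia session started in that shell should then place the checkout there, while add keeps using the depot.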
As @crstnbr noted, an alternative is to use the --local option to the pkg> dev command to install a development version of the package in a dev directory within the current project. This could make sense if you're developing your own package MyCode.jl which relies on Example.jl and you need to make a hotfix to Example.jl. Then your Pkg REPL command would look like this:
(MyCode) pkg> dev --local Example
If you would like to make changes to a third-party package and submit those changes as a pull request on GitHub, there are a few more steps. See this Discourse thread for more details on that process.
Not quite what you're asking for but you can of course always git clone the package to a path of your choice and then dev path/to/the/local/clone/of/the/pkg.
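You can even do this from within Julia:
using Pkg
# Note: Pkg.GitTools is internal API, and the clone signature may differ between Julia versions.
Pkg.GitTools.clone("<pkg url>", "<local path>")
# Track the local clone as a development dependency of the active environment.
Pkg.develop(PackageSpec(path="<local path>"))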
Is there a Python-like virtualenv environment simulator for Julia where one can do development in a local, virtual environment?
Julia (as of 1.2) can manage virtual environments via its built-in Pkg standard library module:
https://docs.julialang.org/en/v1/stdlib/Pkg
julia> ]
(v1.2) pkg> activate tutorial
[ Info: activating new environment at `/tmp/tutorial/Project.toml`.
(tutorial) pkg>
(tutorial) pkg> status
Status `/tmp/tutorial/Project.toml`
(empty environment)
(tutorial) pkg> add Example
...
(tutorial) pkg> status
Status `/tmp/tutorial/Project.toml`
[7876af07] Example v0.5.1
There is Playground.jl
A package for managing julia sandboxes like python's virtualenv (with a little influence from pyenv and virtualenvwrapper)
Nowadays Julia has this kind of thing built into its package manager. They're called environments, and they're described here. It boils down to this: enter the package management REPL by hitting ], use the command activate $dir to switch to the environment described in $dir, then use the instantiate command to install the packages described in that environment.
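The activate-then-instantiate sequence can also be run non-interactively from a shell. A minimal sketch, assuming julia is on the PATH; the function name instantiate_julia_env is hypothetical:

```shell
# instantiate_julia_env is a hypothetical helper.
# It starts Julia with the environment in the given directory active and
# runs Pkg.instantiate(), which installs the packages that environment records.
instantiate_julia_env() {
    dir="${1:-.}"
    if [ ! -f "$dir/Project.toml" ]; then
        echo "no Project.toml in $dir" >&2
        return 1
    fi
    julia --project="$dir" -e 'using Pkg; Pkg.instantiate()'
}
```

This is the kind of step one might run right after cloning a project that ships its own Project.toml.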
What's wrong with just having a separate Julia installation in another directory? Then you just need to set the JULIA_PKGDIR environment variable appropriately, for the Julia setup you want to run.
I want to call an R function in a Scala script on Databricks. Is there any way to do it?
I use
JVMR_JAR=$(R --slave -e 'library("jvmr"); cat(.jvmr.jar)')
scalac -cp "$JVMR_JAR"
scala -cp ".:$JVMR_JAR"
on my Mac, and it automatically opens a Scala REPL which can call R functions.
Is there any way I can do similar stuff on databricks?
On Databricks Cloud, you can use the sbt-databricks plugin to deploy external libraries to the cloud and attach them to specific clusters. Both steps are necessary to make sure jvmr is available on the machines you're calling it from.
See the plugin's github README and the blog post.
If those resources don't suffice, perhaps you should ask your questions to Databricks' support.
If you want to call an R function in a Scala notebook, you can use the %r shortcut. First register the DataFrame as a temporary table:
df.registerTempTable("temp_table_scores")
Create a new cell, then use:
%r
scores <- table(sqlContext, "temp_table_scores")
local_df <- collect(scores)
someFunc(local_df)
If you want to pass the data back into the environment, you can save it to S3 or register it as a temporary table.
I've found several posts about best practice, reproducibility and workflow in R, for example:
How to increase longer term reproducibility of research (particularly using R and Sweave)
Complete substantive examples of reproducible research using R
One of the major preoccupations is ensuring portability of code, in the sense that moving it to a new machine (possibly running a different OS) is relatively straightforward and gives the same results.
Coming from a Python background, I'm used to the concept of a virtual environment. When coupled with a simple list of required packages, this goes some way to ensuring that the installed packages and libraries are available on any machine without too much fuss. Sure, it's no guarantee - different OSes have their own foibles and peculiarities - but it gets you 95% of the way there.
Does such a thing exist within R? Even if it's not as sophisticated. For example simply maintaining a plain text list of required packages and a script that will install any that are missing?
I'm about to start using R in earnest for the first time, probably in conjunction with Sweave, and would ideally like to start in the best way possible! Thanks for your thoughts.
I'm going to use the comment posted by @cboettig in order to resolve this question.
Packrat
Packrat is a dependency management system for R. It gives you three important advantages, all of them relevant to your portability needs:
Isolated: Installing a new or updated package for one project won’t break your other projects, and vice versa. That’s because packrat gives each project its own private package library.
Portable: Easily transport your projects from one computer to another, even across different platforms. Packrat makes it easy to install the packages your project depends on.
Reproducible: Packrat records the exact package versions you depend on, and ensures those exact versions are the ones that get installed wherever you go.
What's next?
Walkthrough guide: http://rstudio.github.io/packrat/walkthrough.html
Most common commands: http://rstudio.github.io/packrat/commands.html
Using Packrat with RStudio: http://rstudio.github.io/packrat/rstudio.html
Limitations and caveats: http://rstudio.github.io/packrat/limitations.html
Update: Packrat has been soft-deprecated and is now superseded by renv, so you might want to check this package instead.
The Anaconda package manager conda supports creating R environments.
conda create -n r-environment r-essentials r-base
conda activate r-environment
I have had a great experience using conda to maintain different Python installations, both user-specific ones and several versions for the same user. I have tested R with conda and the Jupyter notebook and it works great, at least for my needs, which include RNA-sequencing analyses using DESeq2 and related packages, as well as data.table and dplyr. Many Bioconductor packages are available in conda via bioconda, and according to the comments on this SO question, it seems like install.packages() might work as well.
It looks like there is another option from RStudio devs, renv. It's available on CRAN and supersedes Packrat.
In short, you use renv::init() to initialize your project library, and use renv::snapshot() / renv::restore() to save and load the state of your library.
I prefer this option to conda R environments because here everything is stored in the file renv.lock, which can be committed to a Git repo and distributed to the team.
To add to this:
Note:
1. Anaconda is already installed
2. Your working directory is assumed to be "C:"
To create the desired environment, here called "r_environment_name":
C:\>conda create -n "r_environment_name" r-essentials r-base
To see available environments
C:\>conda info --envs
.
..
...
To activate environment
C:\>conda activate "r_environment_name"
(r_environment_name) C:\>
Launch Jupyter Notebook and let the party begin
(r_environment_name) C:\> jupyter notebook
For a similar "requirements.txt", perhaps this link will help -> Is there something like requirements.txt for R?
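Within conda itself, the closest analogue to a requirements file is probably an exported environment file. A small sketch, assuming conda is on the PATH; the function name export_conda_env and the default file name environment.yml are illustrative choices:

```shell
# export_conda_env is a hypothetical helper.
# It writes the package list of a named conda environment to a YAML file
# that `conda env create -f <file>` can later rebuild the environment from.
export_conda_env() {
    name="$1"
    file="${2:-environment.yml}"
    conda env export --name "$name" > "$file"
}
```

For example, export_conda_env r_environment_name writes environment.yml in the current directory, and that file can be committed alongside the analysis code.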
Check out roveR, the R container management solution. For details, see https://www.slideshare.net/DavidKunFF/ownr-technical-introduction, in particular slide 12.
To install roveR, execute the following command in R:
install.packages("rover", repos = c("https://lair.functionalfinances.com/repos/shared", "https://lair.functionalfinances.com/repos/cran"))
To make full use of the power of roveR (including installing specific versions of packages for reproducibility), you will need access to a laiR - for CRAN, you can use our laiR instance at https://lair.ownr.io, for uploading your own packages and sharing them with your organization you will need a laiR license. You can contact us on the email address in the presentation linked above.