Kernel LDA in Julia (trouble installing the package)

I want to use Kernel LDA in julia 1.6.1.
I found the repo.
https://github.com/remusao/LDA.jl
I read the README.md and typed
] add LDA
but it does not work:
The following package names could not be resolved:
LDA (not found in project, manifest or registry)
I also tried all of the following commands, but none of them work either.
add https://github.com/remusao/LDA.jl
add https://github.com/remusao/LDA.jl.git
Pkg.clone("https://github.com/remusao/LDA.jl.git")
What is the problem? How can I install LDA.jl in my Julia installation?

The package you have linked, https://github.com/remusao/LDA.jl, has had no commits in over eight years. Among other things, it lacks a Project.toml file, which is necessary for installation in modern Julia.
Since Julia was only about one year old and at version 0.2 back in 2013 when this package last saw maintenance, the language has also changed drastically in this time such that the code in this package would likely no longer function even if you could get it to install.
If you can't find any alternative to this package for your work, forking it and upgrading it to work with modern Julia would be a nice intermediate-beginner project.

Related

Can't install YStockData.jl in Julia 1.4

I am trying to add the Yahoo Finance package YStockData.jl to Julia 1.4 without success. The package's page at JuliaObserver says
This package is not yet in the official package repository. Therefore, to install, use the following invocation Pkg.clone("https://github.com/Algocircle/YStockData.jl")
However, this fails with the following:
UndefVarError: clone not defined
(Is Pkg.clone no longer working?) So I tried
Pkg.add(PackageSpec(url="https://github.com/Algocircle/YStockData.jl"))
which caused this response:
Updating git-repo https://github.com/Algocircle/YStockData.jl
could not find project file in package at https://github.com/Algocircle/YStockData.jl
So now what? This project was last updated three years ago.
I found a similar result trying to install Quandl which can also download financial data--missing project file.
How do others download financial data with Julia?
One new option to do this in Julia is to use Alpha Vantage through the AlphaVantage.jl package.

What type of object is an R package?

Probably a pretty basic question, but a friend and I tried to run str(package_name) and R threw us an error. Now that I'm looking at it, I'm wondering if an R package is like a .zip file in that it is a collection of objects, say pictures and songs, but not a picture or song itself.
If I tried to open a zip of pictures with an image viewer, it wouldn't know what to do until I unzipped it - just like I can't call str(forecast) but I can call str(ts) once I've loaded the forecast package into my library...
Can anyone set me straight?
R packages are generally distributed as compressed bundles of files. They can either be in "binary" form which are preprocessed at a repository to compile any C or Fortran source and create the proper headers, or they can be in source form where the various required files are available to be used in the installation process, but this requires that the users have the necessary compilers and tools installed at locations where the R build process using OS system resources can get at them.
If you read the documentation for a package at CRAN, you see that packages are distributed in a set of compressed formats that vary depending on the target OS:
Package source: Rcpp_0.11.3.tar.gz # the Linux/UNIX targets
Windows binaries: r-devel: Rcpp_0.11.3.zip, r-release: Rcpp_0.11.3.zip, r-oldrel: Rcpp_0.11.3.zip
OS X Snow Leopard binaries: r-release: Rcpp_0.11.3.tgz, r-oldrel: Rcpp_0.11.3.tgz
OS X Mavericks binaries: r-release: Rcpp_0.11.3.tgz
Old sources: Rcpp archive # not really a file but a web link
Once installed an R package will have a specified directory structure. The DESCRIPTION file is a text file with specific entries for components that determine whether the local installation meets the dependencies of the package. There are NAMESPACE, LICENSE, and INDEX files. There are directories named '/help', '/html', '/Meta', '/R', and possibly '/libs', '/demo', '/data', '/unitTests', and others.
This is the tree at the top of the ../library/Rcpp package directory:
$ ls
CITATION NAMESPACE THANKS examples libs
DESCRIPTION NEWS.Rd announce help prompt
INDEX R discovery html skeleton
Meta README doc include unitTests
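The same layout can be inspected from within R. This is a minimal sketch that uses the base package stats as a stand-in, since it ships with every R installation while Rcpp may not be installed; the variable names are illustrative.

```r
# Locate the installed directory of a package (stats is used as a stand-in).
pkg_dir <- system.file(package = "stats")

# List the top-level files and directories of the installed package.
top_level <- list.files(pkg_dir)
print(top_level)

# Read selected DESCRIPTION fields of the installed package.
desc <- packageDescription("stats", fields = c("Package", "Version"))
print(desc)
```

The exact entries listed vary between packages and R versions, but DESCRIPTION and NAMESPACE should be among them.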
So in the "life-cycle" of a package, there is initially a series of required and optional files, which are processed by the BUILD and CHECK mechanisms into an installed package, which can then be compressed for distribution and later unpacked into a specified directory tree on the user's machine. See these help pages:
?.libPaths # also describes .Library()
?package.skeleton
?install.packages
?INSTALL
And of course read Writing R Extensions, a document that ships with every installation of R.
Your question is:
What type of object is an R package?
Somehow, I’m still missing an answer to this exact question. So here goes:
As far as R is concerned, an R package is not an object. That is, it’s not an object in R’s type system. R is being a bit difficult, because it allows you to write
library(pkg_name)
without requiring you to define pkg_name anywhere beforehand. In contrast, other objects which you are using in R have to be defined somewhere – either by you, or by some package that’s loaded either explicitly or implicitly.
This is unfortunate, and confuses people. Therefore, when you see library(pkg_name), think
library('pkg_name')
That is, imagine the package name in quotes. This does in fact work just as expected. The fact that the code also works without quotes is a peculiarity of the library function, known as non-standard evaluation. In this case, it’s mostly an unfortunate design decision (but there are reasons).
So, to repeat the answer: a package isn’t a type of R object [1]. For R, it’s simply a name which refers to a known location in the file system, similar to what you’ve assumed. BondedDust’s answer goes into detail to explain that structure, so I shan’t repeat it here.
[1] For super technical details, see Joshua’s and Richard’s comments below.
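A minimal sketch of the quoted form, using the base package stats as a stand-in; the variable pkg is illustrative. The last lines touch on the technical footnote: once a package is attached, its exported objects are reachable through an environment on the search path.

```r
# Quoted and unquoted forms are equivalent for library().
library("stats")
library(stats)

# When the name is stored in a variable, character.only = TRUE is needed;
# otherwise library() would look for a package literally called "pkg".
pkg <- "stats"
library(pkg, character.only = TRUE)

# After attaching, the package shows up on the search path as an environment.
attached <- "package:stats" %in% search()
pkg_env <- as.environment("package:stats")
print(attached)
print(is.environment(pkg_env))
```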
From R's own documentation:
Packages provide a mechanism for loading optional code, data and
documentation as needed.…A package is a directory of files which
extend R, a source package (the master files of a package), or a
tarball containing the files of a source package, or an installed
package, the result of running R CMD INSTALL on a source package. On
some platforms (notably OS X and Windows) there are also binary
packages, a zip file or tarball containing the files of an installed
package which can be unpacked rather than installing from sources. A
package is not a library.
So yes, a package is not the functions within it; it is a mechanism to have R be able to use the functions or data which comprise the package. Thus, it needs to be loaded first.
I am reading Hadley's book Advanced-R (Chapter 6.3 - functions, p.79) and this quote will cover you I think:
Every operation is a function call
"To understand computations in R, two slogans are helpful:
Everything that exists is an object.
Everything that happens is a function call."
— John Chambers
By that reasoning, library(name_of_library) is a function call that loads the package. Everything that has been loaded, i.e. functions or data sets, is an object that you can use by calling other functions. In that sense, a package is not an object in any of R's environments until it is loaded; after that, you can say that it is the collection of loaded objects it contains.

Virtual environment in R?

I've found several posts about best practice, reproducibility and workflow in R, for example:
How to increase longer term reproducibility of research (particularly using R and Sweave)
Complete substantive examples of reproducible research using R
One of the major preoccupations is ensuring portability of code, in the sense that moving it to a new machine (possibly running a different OS) is relatively straightforward and gives the same results.
Coming from a Python background, I'm used to the concept of a virtual environment. When coupled with a simple list of required packages, this goes some way to ensuring that the installed packages and libraries are available on any machine without too much fuss. Sure, it's no guarantee - different OSes have their own foibles and peculiarities - but it gets you 95% of the way there.
Does such a thing exist within R? Even if it's not as sophisticated. For example simply maintaining a plain text list of required packages and a script that will install any that are missing?
I'm about to start using R in earnest for the first time, probably in conjunction with Sweave, and would ideally like to start in the best way possible! Thanks for your thoughts.
I'm going to use the comment posted by @cboettig to answer this question.
Packrat
Packrat is a dependency management system for R. It gives you three important advantages, all of them relevant to your portability needs:
Isolated: Installing a new or updated package for one project won’t break your other projects, and vice versa. That’s because packrat gives each project its own private package library.
Portable: Easily transport your projects from one computer to another, even across different platforms. Packrat makes it easy to install the packages your project depends on.
Reproducible: Packrat records the exact package versions you depend on, and ensures those exact versions are the ones that get installed wherever you go.
What's next?
Walkthrough guide: http://rstudio.github.io/packrat/walkthrough.html
Most common commands: http://rstudio.github.io/packrat/commands.html
Using Packrat with RStudio: http://rstudio.github.io/packrat/rstudio.html
Limitations and caveats: http://rstudio.github.io/packrat/limitations.html
Update: Packrat has been soft-deprecated and is now superseded by renv, so you might want to check this package instead.
The Anaconda package manager conda supports creating R environments.
conda create -n r-environment r-essentials r-base
conda activate r-environment
I have had a great experience using conda to maintain different Python installations, both user-specific and several versions for the same user. I have tested R with conda and Jupyter Notebook and it works great, at least for my needs, which include RNA-sequencing analyses using DESeq2 and related packages, as well as data.table and dplyr. Many Bioconductor packages are available in conda via bioconda, and according to the comments on this SO question, it seems that install.packages() might work as well.
It looks like there is another option from RStudio devs, renv. It's available on CRAN and supersedes Packrat.
In short, you use renv::init() to initialize your project library, and use renv::snapshot() / renv::restore() to save and load the state of your library.
I prefer this option to conda R environments because everything is stored in the file renv.lock, which can be committed to a Git repo and distributed to the team.
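The renv workflow described above might look like the following sketch. It assumes renv is installed and that the working directory is the project root; the wrapper function names are illustrative, and nothing runs until they are called.

```r
# Run once in a new project: creates a private library and renv.lock.
start_project <- function() {
  renv::init()
}

# Run after installing or updating packages: records versions in renv.lock.
save_state <- function() {
  renv::snapshot()
}

# Run on another machine after cloning the repo: reinstalls recorded versions.
load_state <- function() {
  renv::restore()
}
```

In practice the three renv calls are usually typed directly at the console; wrapping them here only keeps the sketch free of side effects.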
To add to this:
Note:
1. Anaconda is assumed to be installed already
2. The working directory is assumed to be "C:"
To create desired environment -> "r_environment_name"
C:\>conda create -n "r_environment_name" r-essentials r-base
To see available environments
C:\>conda info --envs
.
..
...
To activate environment
C:\>conda activate "r_environment_name"
(r_environment_name) C:\>
Launch Jupyter Notebook and let the party begin
(r_environment_name) C:\> jupyter notebook
For a similar "requirements.txt", perhaps this link will help -> Is there something like requirements.txt for R?
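A plain-text package list can also be handled with base R alone. The sketch below is one possible approach, not an established convention: the file name requirements.txt and the function missing_packages are hypothetical, and the install step is left commented out so that nothing is downloaded by accident.

```r
# Return the packages named in `required` that are not installed.
missing_packages <- function(required,
                             installed = rownames(installed.packages())) {
  setdiff(required, installed)
}

# Hypothetical requirements file: one package name per line.
req_file <- file.path(tempdir(), "requirements.txt")
writeLines(c("stats", "utils"), req_file)

# Read the list, dropping surrounding whitespace and empty lines.
required <- trimws(readLines(req_file))
required <- required[nzchar(required)]
to_install <- missing_packages(required)
print(to_install)

# Uncomment to install whatever is missing:
# if (length(to_install) > 0) install.packages(to_install)
```

Unlike renv, this records no version numbers, so it only ensures that the packages are present, not that they match across machines.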
Check out roveR, the R container management solution. For details, see https://www.slideshare.net/DavidKunFF/ownr-technical-introduction, in particular slide 12.
To install roveR, execute the following command in R:
install.packages("rover", repos = c("https://lair.functionalfinances.com/repos/shared", "https://lair.functionalfinances.com/repos/cran"))
To make full use of the power of roveR (including installing specific versions of packages for reproducibility), you will need access to a laiR - for CRAN, you can use our laiR instance at https://lair.ownr.io, for uploading your own packages and sharing them with your organization you will need a laiR license. You can contact us on the email address in the presentation linked above.

How can I get a Makefile from an existing R package?

I like the GBM package in R.
I can't get R's memory management to work with the combination of my machine/data set/task needed for reasons that have been covered elsewhere and should be considered off topic for the purposes of this question.
I would like to "rip" out the GBM algorithm away from R and rebuild it as standalone code.
Unfortunately there is no Makefile in the package tarball (or indeed any R package tarball I've seen). Is there a place I can look for straightforward Makefiles of R packages? Or do I really have to go way back to ground zero and write my own Makefile for the long painful journey ahead?
As Henry Spencer quipped: "Those who do not understand Unix are doomed to reinvent it, poorly."
R packages do not have a Makefile because R creates one on the fly when building the package, using both the defaults of the current R installation and the settings in the package, typically via a file Makevars.
Run the usual command R CMD INSTALL foo_1.2.3.tar.gz and you will see the effect of the generated Makefile as the build proceeds. Worst case you can always start by copying and pasting.
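To see the defaults that feed the generated Makefile, one can look at the Makeconf file of the local R installation. This sketch assumes a Unix-like layout, where the file sits in the etc directory under R's home; on Windows the location can differ, so the file may not be found at this exact path.

```r
# Path where a Unix-like R installation keeps its default build settings.
makeconf <- file.path(R.home("etc"), "Makeconf")

if (file.exists(makeconf)) {
  # Show the compiler-related settings R will use when building packages.
  settings <- readLines(makeconf)
  print(grep("^(CC|CXX|CFLAGS|CXXFLAGS) *=", settings, value = TRUE))
} else {
  message("Makeconf not found at ", makeconf)
}
```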
You could also take a look at CMake which can quite easily create makefiles for you. It took me minimal time to get it working for a project of mine.

Dependency management in R

Does R have a dependency management tool to facilitate project-specific dependencies? I'm looking for something akin to Java's maven, Ruby's bundler, Python's virtualenv, Node's npm, etc.
I'm aware of the "Depends" clause in the DESCRIPTION file, as well as the R_LIBS facility, but these don't seem to work in concert to provide a solution to some very common workflows.
I'd essentially like to be able to check out a project and run a single command to build and test the project. The command should install any required packages into a project-specific library without affecting the global R installation. E.g.:
my_project/.Rlibs/*
Unfortunately, Depends: within the DESCRIPTION file is all you get, for the following reasons:
R itself is reasonably cross-platform, but that means we need this to work across platforms and OSs
Encoding Depends: beyond R packages requires encoding the Depends in a portable manner across operating systems---good luck encoding even something simple such as 'a PNG graphics library' in a way that can be resolved unambiguously across systems
Windows does not have a package manager
AFAIK OS X does not have a package manager that mixes what Apple ships and what other Open Source projects provide
Even among Linux distributions, you do not get consistency: just take RStudio as an example which comes in two packages (which all provide their dependencies!) for RedHat/Fedora and Debian/Ubuntu
This is a hard problem.
The packrat package is precisely meant to achieve the following:
install any required packages into a project-specific library without affecting the global R installation
It allows installing different versions of the same packages in different project-local package libraries.
I am adding this answer even though this question is 5 years old, because this solution apparently didn't exist yet at the time the question was asked (as far as I can tell, packrat first appeared on CRAN in 2014).
Update (November 2019)
The new R package renv replaced packrat.
As a stop-gap, I've written a new rbundler package. It installs project dependencies into a project-specific subdirectory (e.g. <PROJECT>/.Rbundle), allowing the user to avoid using global libraries.
rbundler on Github
rbundler on CRAN
We've been using rbundler at Opower for a few months now and have seen a huge improvement in developer workflow, testability, and maintainability of internal packages. Combined with our internal package repository, we have been able to stabilize development of a dozen or so packages for use in production applications.
A common workflow:
Check out a project from github
cd into the project directory
Fire up R
From the R console:
library(rbundler)
bundle('.')
All dependencies will be installed into ./.Rbundle, and an .Renviron file will be created with the following contents:
R_LIBS_USER='.Rbundle'
Any R operations run from within this project directory will adhere to the project-specific library and package dependencies. Note that, while this method uses the package DESCRIPTION to define dependencies, the project needn't have an actual package structure. Thus, rbundler becomes a general tool for managing an R project, whether it be a simple script or a full-blown package.
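The underlying mechanism can be sketched with base R alone: a project-specific directory is placed at the front of the library search path, so that installs and loads prefer it. This is not rbundler's implementation; the use of tempdir() as a stand-in project directory is an assumption for illustration.

```r
# Stand-in for a project directory; in practice this would be the project root.
project_dir <- tempdir()
project_lib <- file.path(project_dir, ".Rbundle")
dir.create(project_lib, showWarnings = FALSE)

# Put the project library first; the existing libraries stay as fallbacks.
.libPaths(c(project_lib, .libPaths()))

# install.packages() installs into the first entry by default.
print(.libPaths()[1])
```

Setting R_LIBS_USER in an .Renviron file has a similar effect at startup, without any code in the session itself.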
You could use the following workflow:
1) create a script file that contains everything you want to set up, and store it in your project directory as e.g. projectInit.R
2) source this script from your .Rprofile (or any other file executed by R at startup) with a try statement
try(source("./projectInit.R"), silent=TRUE)
This ensures that R starts without an error message even when no projectInit.R is found.
3) if you start R in your project directory, the projectInit.R file will be sourced if present in the directory and you are ready to go
This is from a Linux perspective, but it should work the same way under Windows and Mac as well.
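An alternative sketch of step 2 checks for the file explicitly instead of relying on try(). With try(..., silent = TRUE) the error message is suppressed, though R may still report a warning about the missing file. The helper name source_if_present is hypothetical, and tempdir() stands in for the project directory.

```r
# Source a project init script only when it exists; report whether it ran.
source_if_present <- function(path = "projectInit.R") {
  if (!file.exists(path)) {
    return(FALSE)
  }
  source(path)
  TRUE
}

# Demonstration in a temporary directory standing in for a project.
init_file <- file.path(tempdir(), "projectInit.R")
ran_before <- source_if_present(init_file)   # no file yet

writeLines("project_ready <- TRUE", init_file)
ran_after <- source_if_present(init_file)    # file exists now
print(c(ran_before, ran_after))
```

In an .Rprofile, the call would simply be source_if_present() with the default path, after defining the helper.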
