I have a very long R script with many if statements and exception cases. As I've been going, I've been importing and testing libraries and haven't documented them very well. The problem is that if I run this from a clean installation, I'm not sure which statements the script will run, and so which libraries will be needed.
My question is: Is there any R function to test which libraries are being used in a script?
EDIT: I have not used all of the libraries that have been installed, so print(sessionInfo()) won't be useful. I just want to start the script with an install.packages call.
I found the list.functions.in.file() function from NCmisc (install.packages("NCmisc")) quite helpful for this:
list.functions.in.file(filename, alphabetic = TRUE)
For more info see this link: https://rdrr.io/cran/NCmisc/man/list.functions.in.file.html
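Since the question mentions starting the script with an install.packages call, here is a minimal sketch of a common pattern for that step. It assumes you already have a character vector of package names (from list.functions.in.file or any other method); the helper name missing_packages and the example vector are made up for illustration.

```r
# Hypothetical helper: return the packages from `pkgs` that are not installed.
missing_packages <- function(pkgs) {
  installed <- rownames(installed.packages())
  setdiff(pkgs, installed)
}

# Example vector (an assumption; substitute the packages your script needs).
needed <- c("stats", "utils")

# Install only what is missing, so repeated runs do no extra work.
to_install <- missing_packages(needed)
if (length(to_install) > 0) {
  install.packages(to_install)
}
```

Checking first avoids reinstalling packages every time the script runs.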
The ‘renv’ package provides a robust solution for this nowadays via renv::dependencies.
renv::dependencies performs proper static analysis and reliably finds package dependencies even when they are declared in non-standard ways (e.g. via box::use) or via a package DESCRIPTION file rather than via library or ::.
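A minimal sketch of how this might be used, assuming the renv package is installed. The wrapper name script_packages is made up for illustration; renv::dependencies returns a data frame, and the sketch reads its Package column.

```r
# Sketch: list the packages renv detects in a script or project directory.
# Assumes renv is installed; `path` is a file or directory of your choosing.
script_packages <- function(path = ".") {
  if (!requireNamespace("renv", quietly = TRUE)) {
    stop("The 'renv' package is required for this sketch.")
  }
  deps <- renv::dependencies(path)
  sort(unique(deps$Package))
}

# Example call (not run here):
# script_packages("my_script.R")
```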
As a quick hack I’ve previously (pre-‘renv’) used a shell script for this:
#!/usr/bin/env bash
source_files=($(git ls-files '*.R'))
grep -hE '\b(require|library)\([\.a-zA-Z0-9]*\)' "${source_files[@]}" | \
sed '/^[[:space:]]*#/d' | \
sed -E 's/.*\(([\.a-zA-Z0-9]*)\).*/\1/' | \
sort -uf \
> DEPENDS
This uses Git to collect all R files under version control in a project. Since you should be using version control anyway, this is normally a good solution (although you may need to adapt the command to your version control system). In the few cases where the project isn’t under version control you should (1) put it under version control or, failing that, (2) use find . -regex '.*\.[rR]' instead of git ls-files '*.R'.
And it produces a DEPENDS file containing a very simple list of dependencies.
It only finds direct calls to library and require though – if you wrap those calls, the script won’t work.
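For readers who would rather stay in R, the same idea can be sketched with regular expressions. This is a rough port, not the original script: the function name scan_library_calls is made up, it additionally picks up pkg::fun references, and it shares the limitation that wrapped or computed library calls are missed.

```r
# Hypothetical helper: regex-based scan of R files for package references.
scan_library_calls <- function(files) {
  lines <- unlist(lapply(files, readLines, warn = FALSE))
  # Drop lines that consist only of a comment.
  lines <- lines[!grepl("^[[:space:]]*#", lines)]

  # library(pkg) / require(pkg), with optional quotes around the name.
  lib_pat  <- "\\b(library|require)\\([\"']?([.A-Za-z0-9]+)[\"']?\\)"
  lib_hits <- unlist(regmatches(lines, gregexpr(lib_pat, lines, perl = TRUE)))
  lib_pkgs <- sub(lib_pat, "\\2", lib_hits, perl = TRUE)

  # pkg::fun and pkg:::fun references.
  ns_pat  <- "([A-Za-z][.A-Za-z0-9]*):::?"
  ns_hits <- unlist(regmatches(lines, gregexpr(ns_pat, lines, perl = TRUE)))
  ns_pkgs <- sub(":::?$", "", ns_hits)

  sort(unique(c(lib_pkgs, ns_pkgs)))
}

# Example call (not run here):
# scan_library_calls(list.files(".", pattern = "\\.[rR]$",
#                               recursive = TRUE, full.names = TRUE))
```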
Based on everyone's response, especially eh21's suggestion of the NCmisc package, I put together a little function that outputs a list of packages used in all your R scripts in a directory, as well as their frequencies.
library(NCmisc)
library(stringr)
library(dplyr)
checkPacks <- function(path) {
  ## get all R files in your directory
  ## by the way, extract R code from Rmd: http://felixfan.github.io/extract-r-code/
  files <- list.files(path, pattern = "\\.R$")
  ## extract all functions and which package they are from
  ## using NCmisc::list.functions.in.file
  funs <- unlist(lapply(paste0(path, "/", files), list.functions.in.file))
  packs <- funs %>% names()
  ## "character" functions such as reactive objects in Shiny
  characters <- packs[str_detect(packs, "^character")]
  ## user defined functions in the global environment
  globals <- packs[str_detect(packs, "^\\.GlobalEnv")]
  ## functions that are in multiple packages' namespaces
  multipackages <- packs[str_detect(packs, ", ")]
  ## get just the unique package names from multipackages
  mpackages <- multipackages %>%
    str_extract_all(., "[a-zA-Z0-9]+") %>%
    unlist() %>%
    unique()
  mpackages <- mpackages[!mpackages %in% c("c", "package")]
  ## functions that are from single packages
  packages <- packs[str_detect(packs, "package:") & !packs %in% multipackages] %>%
    str_replace(., "[0-9]+$", "") %>%
    str_replace(., "package:", "")
  ## unique packages
  packages_u <- packages %>%
    unique() %>%
    union(., mpackages)
  return(list(packs = packages_u, tb = table(packages)))
}
checkPacks("~/your/path")
I am not sure of a good way to automate this, but what you could do is:
1. Open a new R console.
2. Check with sessionInfo that you don't have extra packages loaded. If you load extra packages by default (e.g. using your .Rprofile file), I suggest you avoid doing that, as it's a recipe for disaster.
Normally you should only have the base packages loaded: stats, graphics, grDevices, utils, datasets, methods, and base.
You can unload any extra libraries using:
detach("package:<packageName>", unload=TRUE)
3. Now run the script after commenting out all of the library and require calls and see which functions give an error.
4. To find which package each function requires, type in the console:
??<functionName>
5. Load the required packages and re-run steps 3-5 until satisfied.
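Parts of this manual check can be scripted. The sketch below is only an illustration: extra_attached is a made-up helper that compares the search path against the seven default packages named above, and find() (from utils) reports where a currently visible function lives.

```r
# The seven packages a default R session attaches (as listed above).
base_pkgs <- c("stats", "graphics", "grDevices", "utils",
               "datasets", "methods", "base")

# Hypothetical helper: attached packages beyond the default set.
extra_attached <- function() {
  attached <- sub("^package:", "", grep("^package:", search(), value = TRUE))
  setdiff(attached, base_pkgs)
}

# Empty when nothing beyond the defaults is attached.
extra_attached()

# For a function that is currently visible, find() reports its location.
find("median")
```

Note that find() only sees attached packages, so for functions from packages that are not loaded you still need ?? as described in the steps.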
You might want to look at the checkpoint function from Revolution Analytics on GitHub here: https://github.com/RevolutionAnalytics/checkpoint
It does some of this, and solves the problem of reproducibility. But I don't see that it can report a list of what you are using.
However, if you look at the code you will probably get some ideas.
I had a similar need when converting my code into a package: I needed to identify every package dependency and either import it or use a fully qualified name.
While reading the book Extending R, I found that XRtools::makeImports can scan a package and find all packages that need to be imported. This doesn't solve our problem yet, as it only applies to an existing package, but it provided the main insight on how to do it.
I wrote a function and put it into my package mischelper. You can install the package and either use the RStudio addin menu to scan the current file or selected code, or use the command line functions. Every external function (fun_inside) and the function that called it (usage) will be listed in a table.
You can now go to each function and press F1 to find which package it belongs to. I actually have another package that can scan all installed packages for function names and build a database, but that may cause more false positives for this use, because if you have only loaded some packages, pressing F1 only searches the loaded packages.
See details of the usage on my package page:
https://github.com/dracodoc/mischelper
I'd trust the {renv}-based solutions the most for identifying package dependencies.
That said, I wrote a package, funspotr, that offers functionality similar to the answers mentioning NCmisc::list.functions.in.file() and can be used to parse the functions or packages in a file or files:
library(dplyr)
funspotr::spot_pkgs("https://gist.githubusercontent.com/brshallo/4b8c81bc1283a9c28876f38a7ad7c517/raw/b399b768e900a381d99f5120e44d119c7fb40ab9/source_rmd.R")
#> [1] "knitr" "magrittr" "stringr" "readr" "purrr" "glue"
funspotr::spot_funs("https://gist.githubusercontent.com/brshallo/4b8c81bc1283a9c28876f38a7ad7c517/raw/b399b768e900a381d99f5120e44d119c7fb40ab9/source_rmd.R") %>%
select(-in_multiple_pkgs)
#> # A tibble: 13 x 2
#> funs pkgs
#> <chr> <chr>
#> 1 tempfile base
#> 2 purl knitr
#> 3 getOption base
#> 4 options base
#> 5 .Call base
#> 6 source base
#> 7 library base
#> 8 read_file readr
#> 9 map purrr
#> 10 str_extract stringr
#> 11 glue glue
#> 12 str_c stringr
#> 13 write_file readr
I am creating my first package, in which I make some var estimations. The functions run; however, I use packages that export functions with the same names.
Before writing the package I made an R script with the function and tested that it works, but at the top of my script I used the following code:
invisible(lapply(c("tibble","readxl","dplyr","stringr", "tidyr", "vars", "conflicted","forecast", "lubridate"), library, character.only = T))
conflict_prefer("select","dplyr")
conflict_prefer("lag", "dplyr")
conflict_prefer("filter", "dplyr")
With the conflicted package I chose to have select, lag, and filter come from the dplyr package rather than from the stats package.
But I have not figured out how to use the conflict_prefer function inside the package.
Should they be the first lines of my function?
Is there a roxygen way to prefer same-name functions?
I ask this because I get this warning:
> devtools::load_all()
i Loading FAVAR.MEF
Warning messages:
1: replacing previous import ‘dplyr::filter’ by ‘stats::filter’ when loading ‘FAVAR.MEF’
2: replacing previous import ‘dplyr::lag’ by ‘stats::lag’ when loading ‘FAVAR.MEF’
3: replacing previous import ‘stats::filter’ by ‘dplyr::filter’ when loading ‘FAVAR.MEF’
4: In setup_ns_exports(path, export_all, export_imports) :
Objects listed as exports, but not present in namespace: favar_est
Thanks in advance!!
If you are writing your own package and using external dependencies, you should not load them through repeated calls to library.
The proper way to do it is to state your dependencies in the DESCRIPTION file of your package, which will mean that your dependencies are put on the search path in the correct order when your package is loaded. In your case, this removes the need for conflict_prefer, as dplyr will be higher up on the search path than stats. It also makes your package portable, because anyone who installs your package will have any missing dependencies installed automatically according to the packages listed in your DESCRIPTION file. Furthermore, doing it this way allows you to specify a minimum version of the dependency, so that anyone who already has an older version of the dependency installed will not come up against an obscure error when they try to use your package.
The DESCRIPTION file resides in the root directory of your package. It is a simple text file.
You need only add:
Depends:
    tibble,
    readxl,
    dplyr,
    stringr,
    tidyr,
    vars,
    conflicted,
    forecast,
    lubridate
within this file, and your dependencies will be loaded with your package.
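As an alternative to attaching everything through Depends, packages often import only the functions they need, or qualify calls with ::, which sidesteps name clashes such as filter and lag. The sketch below assumes roxygen2 generates your NAMESPACE; the function smooth3 is made up for illustration and uses stats::filter so that it runs with base R alone.

```r
#' Three-point moving average (hypothetical package function).
#'
#' The @importFrom tag asks roxygen2 to write importFrom(stats, filter)
#' into NAMESPACE, so only that one function is imported from stats.
#'
#' @param x A numeric vector.
#' @importFrom stats filter
#' @export
smooth3 <- function(x) {
  # The explicit stats:: prefix leaves no doubt about which filter is meant,
  # regardless of whether dplyr is also imported or attached.
  as.numeric(stats::filter(x, rep(1 / 3, 3), sides = 2))
}
```

With selective imports, the "replacing previous import" warnings have nothing to collide on, because two packages are no longer imported wholesale.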
I have rminer installed, yet I'm wondering why mmetric is still not found and I am unable to use the function.
Has anyone come across this?
#probability of each landmass category
flagdmodel <- naiveBayes(landmass ~ ., data=trainfdata)
#flagdmodel
#predictionmodel
flagprediction <- predict(flagdmodel, testfdata)
mmetric(testfdata$landmass, flagprediction, c("ACC","PRECISION","TPR","F1"))
+ mmetric(testfdata$landmass, flagprediction, c("ACC","PRECISION","TPR","F1"))
mmetric()
Error in mmetric() : could not find function "mmetric"
Question: why can't R find functions for the package I installed with RStudio's install tool?
Answer:
When you want to use functions or other objects from packages that aren't part of base R, you need to do two things:
install.packages("rminer")
library(rminer)
RStudio can do the first step for you with the install tool, but you still need to do the second one. The first step installs the needed directories and files on your computer. The second step loads them into your current R environment.
In RStudio, you can use the packages tab to check both steps. Installed packages will be in the list in that tab. Loaded packages will have a checkmark to the left of the package name.
It may be easier, though, to just run the following in your console:
"package:rminer" %in% search()
If the output is TRUE you're good to go. If it's FALSE you need to run library(rminer).
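The two steps can also be checked separately from the console. A small sketch, with made-up helper names is_installed and is_attached: system.file returns an empty string for a package that is not installed, and search() lists what is currently attached.

```r
# Hypothetical helpers separating "installed" from "loaded".
is_installed <- function(pkg) nzchar(system.file(package = pkg))
is_attached  <- function(pkg) paste0("package:", pkg) %in% search()

# stats ships with R and is attached by default, so both checks pass.
is_installed("stats")
is_attached("stats")

# For the package in the question you would call, for example:
# is_installed("rminer"); is_attached("rminer")
```

If is_installed is TRUE but is_attached is FALSE, the missing step is the library call.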
From the vignette of tibble, I read that some changes can be made to global options through options() to control the appearance of printing. However, I failed to find any manual for these options within R. I cannot even tell what fields can be added to the global options for a package. So the question is:
For a package, can we get a list of fields (like tibble.print_max and tibble.print_min for tibble, and BioC_mirror for utils) that can be set through options within R, without knowing them beforehand?
Given the lack of required practice (e.g., on CRAN) for how to handle options in external (or even internal, as far as I can tell) packages, perhaps the most general approach is like this:
1. Find the package on the CRAN mirror on GitHub. For example, here's tibble.
2. Search for "option" within the repository to find all references to "option" in the package's code.
3. Search through this. It takes a bit of a keen eye to know what to look for, but this is how I learned that all of tibble's options are listed on the main package help page (?"tibble-package"), because I found these lines with the search.
Step 3 can be automated better if you clone the repo to your machine and use command line tools, e.g.
cd package_dir
grep option R/*
(this is quite similar to the above, but enables the full flexibility of grep)
Just for additional confirmation, this approach led me to the right place for data.table and xtable as well.
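A similar search can be attempted from within R against an installed package, without cloning anything. This is only a rough sketch: find_getoption_names is a made-up helper that deparses every function in a package namespace and collects literal getOption("name") calls, so it will miss options read through wrappers or computed names.

```r
# Hypothetical helper: option names read via literal getOption("...") calls.
find_getoption_names <- function(pkg) {
  ns   <- asNamespace(pkg)
  pat  <- "getOption\\(\"([^\"]+)\""
  hits <- character(0)
  for (nm in ls(ns, all.names = TRUE)) {
    obj <- get(nm, envir = ns, inherits = FALSE)
    if (!is.function(obj)) next
    # deparse() gives the function's code (including default arguments) as text.
    src  <- deparse(obj)
    m    <- unlist(regmatches(src, gregexpr(pat, src)))
    hits <- c(hits, sub(pat, "\\1", m))
  }
  sort(unique(hits))
}

# Example: options referenced by the utils package.
head(find_getoption_names("utils"))
```

As with the grep approach, the result is a list of candidates to read through, not an authoritative list of supported options.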
The way settable options are handled is at the discretion of the package author (whether they include them in .Options, keep them hidden, etc.). It looks like the tibble package has a hidden variable op.tibble, which shows the available options.
tibble:::op.tibble
# $tibble.print_max
# [1] 20
#
# $tibble.print_min
# [1] 10
#
# $tibble.width
# NULL
#
# $tibble.max_extra_cols
# [1] 100
So the following will give you the names of available options in the package.
names(tibble:::op.tibble)
# [1] "tibble.print_max" "tibble.print_min"
# [3] "tibble.width" "tibble.max_extra_cols"
As a note, I found op.tibble by doing
grep("op", ls(getNamespace("tibble"), all=TRUE), value=TRUE)
# [1] "op.tibble" "stopc" "tibble_opt"
and then looking at those items individually. Perhaps other authors might do something similar. But there is no general rule for defining options in packages.
Is there a way to exclude a function from an imported package. For example, I use almost all of dplyr but recently, they added a new function called recode that overwrites a function that I have from a proprietary package (that I can't make changes to).
Is there a way to exclude the s3 function from the namespace so it only sees the function from my package and ignores the one from dplyr.
I'm aware that we can import one-off functions from a package with ease, but in this case, I'm looking to exclude - just one.
R 3.3.0 or later now support "import all but x,y,z from foo" statements:
\item The \code{import()} namespace directive now accepts an
argument \code{except} which names symbols to exclude from the
imports. The \code{except} expression should evaluate to a
character vector (after substituting symbols for strings). See
Writing R Extensions.
Methinks that is exactly what you want here, and what most people want who do not intend to have dplyr clobber functions from the stats package included with R, such as filter or lag.
Edited based on later discussion in comments:
Example usage in the NAMESPACE file, per Section 1.5.1 of WRE, is as follows:
import(dplyr, except = c(recode, lag, filter))
The other alternative would be to use
recode <- SILLY_PROPRIETARY_PACKAGENAME::recode
at the head of your code (with an explanatory comment) to create a copy of recode in the global workspace (which should then mask the version from dplyr). This could prevent future confusion when you hand your code to someone who has the stock dplyr, rather than your personally hacked version, installed.
Use the Hack-R version of dplyr instead of the Hadley version. Given that I created this in the past 2 minutes, you could also easily make your own version.
require(devtools)
install_github("hack-r/dplyr")
require(dplyr)
All I did was fork it, open the project in RStudio via version control, remove recode, commit, and push it back to my GitHub.
It looks like library() gained this functionality in version 3.6, in the form of the exclude and include.only parameters.
See https://developer.r-project.org/Blog/public/2019/03/19/managing-search-path-conflicts/
library(digest, exclude="sha1")
digest(letters)
#> [1] "5cab7c8e9f3d7042d6146f98602c88d2"
sha1(letters)
#> Error in sha1(letters): could not find function "sha1"
or:
library(digest, include.only="sha1")
digest(letters)
#> Error in digest(letters): could not find function "digest"
sha1(letters)
#> [1] "005ae317c931561a05b53fcfa860d7ac61dfec85"
As compared to how it would appear without either of the options:
library(digest)
digest(letters)
#> [1] "5cab7c8e9f3d7042d6146f98602c88d2"
sha1(letters)
#> [1] "005ae317c931561a05b53fcfa860d7ac61dfec85"
Very neat!
(R 4.0.3 was used for the reprexes above)
What documentation is available for an R package? For example, I am trying to understand the sp package.
In addition to help(sp), what are the other functions for searching through help and documentation?
Getting help on a function that you know the name of
Use ? or, equivalently, help.
?mean
help(mean) # same
For non-standard names use quotes or backquotes; see An Introduction to R: Getting help with functions and features:
"For a feature specified by special characters, the argument must be enclosed in double or single quotes, making it a “character string”: This is also necessary for a few words with syntactic meaning including if, for and function."
?`if`
?"if" # same
help("if") # same
There are also help pages for datasets, general topics and some packages.
?iris
?Syntax
?lubridate
Use the example function to see examples of how to use it.
example(paste)
example(`for`)
The demo function gives longer demonstrations of how to use a function.
demo() # all demos in loaded pkgs
demo(package = .packages(all.available = TRUE)) # all demos
demo(plotmath)
demo(graphics)
Finding a function that you don't know the name of
Use ?? or, equivalently, help.search.
??regression
help.search("regression")
Again, non-standard names and phrases need to be quoted.
??"logistic regression"
apropos finds functions and variables in the current session-space (but not in installed but not-loaded packages) that match a regular expression.
apropos("z$") # all fns ending with "z"
rseek.org is an R search engine with a Firefox plugin.
RSiteSearch searches several sites directly from R.
findFn in sos wraps RSiteSearch returning the results as a HTML table.
RSiteSearch("logistic regression")
library(sos)
findFn("logistic regression")
Finding packages
available.packages tells you all the packages that are available in the repositories that you set via setRepositories. installed.packages tells you all the packages that you have installed in all the libraries specified in .libPaths. library (without any arguments) is similar, returning the names and tag-line of installed packages.
View(available.packages())
View(installed.packages())
library()
.libPaths()
Similarly, data with no arguments tells you which datasets are available on your machine.
data()
search tells you which packages have been loaded.
search()
packageDescription shows you the contents of a package's DESCRIPTION file. Likewise, news reads the NEWS file.
packageDescription("utils")
news(package = "ggplot2")
Getting help on variables
ls lists the variables in an environment.
ls() # global environment
ls(all.names = TRUE) # including names beginning with '.'
ls("package:sp") # everything for the sp package
Most variables can be inspected using str or summary.
str(sleep)
summary(sleep)
ls.str is like a combination of ls and str.
ls.str()
ls.str("package:grDevices")
lsf.str("package:grDevices") # only functions
For large variables (particularly data frames), the head function is useful for displaying the first few rows.
head(sleep)
args shows you the arguments for a function.
args(read.csv)
General learning about R
The Info page is a very comprehensive set of links to free R resources.
Many topics in R are documented via vignettes, listed with browseVignettes.
browseVignettes()
vignette("intro_sp", package = "sp")
By combining vignette with edit, you can get its code chunks in an editor.
edit(vignette("intro_sp",package="sp"))
This answer already gives you a very comprehensive list.
I would add that findFn("some search terms") in package sos is extremely helpful, if you only have an idea/keywords of what you are looking for and don't already have a package or function in mind.
And also the task views on CRAN: not really a search process but a great place to wander as you wonder.
This thread contains many good suggestions. Let me add one more.
For finding which packages are loaded, plus extra goodies, ?sessionInfo is quite nice.
help(package="<package-name>") where of course <package-name> is the name of the package you want help for.
Often the same function name is used by several packages. To get help on a function from a specific package, use:
help(aggregate, package="stats")
help(aggregate, package="sp")
In the RStudio IDE you can click on any function name and press F1, which opens the associated help text directly in the Help pane, just as if you had called help(fun) or ?fun.