Determining if there are unused packages in an R script [duplicate]

As my code evolves from version to version, I'm aware that there are some packages for which I've found better/more appropriate packages for the task at hand or whose purpose was limited to a section of code which I've now phased out.
Is there any easy way to tell which of the loaded packages are actually used in a given script? My header is beginning to get cluttered.

Update 2020-04-13
I've now updated the referenced function to use the abstract syntax tree (AST) instead of using regular expressions as before. This is a much more robust way of approaching the problem (it's still not completely ironclad). This is available from version 0.2.0 of funchir, now on CRAN.
I've just got around to writing a quick-and-dirty function to handle this which I call stale_package_check, and I've added it to my package (funchir).
e.g., if we save the following script as test.R:
library(data.table)
library(iotools)
DT = data.table(a = 1:3)
Then, if we run funchir::stale_package_check('test.R') (from the directory with that script), we'll get:
Functions matched from package data.table: data.table
**No exported functions matched from iotools**
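For a flavor of the AST idea (a minimal sketch, not funchir's actual implementation), you can parse the script and intersect the names it uses with a package's exports:
exprs <- parse("test.R")                            # parse the script into expressions
used <- all.names(exprs)                            # every name appearing in the code
intersect(used, getNamespaceExports("data.table"))  # exports the script actually uses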

Have you considered using packrat?
packrat::clean() would remove unused packages, for example.
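A sketch of the packrat workflow (packrat scans the project's code to decide which packages are used):
packrat::init()    # set up a private, project-specific library
packrat::clean()   # remove packages the project's code does not reference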

I've written a command-line script to accomplish this task. You can find it in this GitHub gist. I'm sure there are edge cases it misses, but it works pretty well on both R scripts and Rmd files.

My approach is always to close my R script or IDE (e.g., RStudio) and then start it again.
After this I run my function without loading any dependencies/packages beforehand.
This should result in various warning and error messages telling you which functions couldn't be found and executed. This in turn gives you hints about which packages are necessary to load beforehand and which ones you can leave out.
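For example, in a fresh session (assuming a hypothetical analysis.R that calls data.table() without attaching the package):
source("analysis.R")
# Error in data.table(a = 1:3) : could not find function "data.table"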

Related

Add an Rcpp file to an existing R package?

I have already made a simple R package (pure R) to solve a problem with brute force; then I tried to speed up the code by writing an Rcpp script. I wrote a script to compare the running times with the "bench" library. Now, how can I add this script to my package? I tried to add
#' @importFrom Rcpp cppFunction
on top of my R script and inserted the Rcpp file in the src folder, but it didn't work. Is there a way to add it to my R package without creating the package from scratch? Sorry if it has already been asked, but I am new to all this and completely lost.
That conversion is actually (still) surprisingly difficult (in the sense of requiring more than just one file). It is easy to overlook details. Let me walk you through why.
Let us assume for a second that you started a working package using base R's package.skeleton(). That is the simplest and most general case. The package will work (yet have warnings; see my pkgKitten package for a wrapper that cleans up, and a dozen other package-helping functions and packages on CRAN). Note in particular that I have said nothing about roxygen2, which at this point is just an added complication, so let's focus on just .Rd files.
You can now contrast your simplest package with one built by and for Rcpp, namely by using Rcpp.package.skeleton(). You will see at least these differences in
DESCRIPTION for LinkingTo: and Imports
NAMESPACE for importFrom as well as the useDynLib line
a new src directory and a possible need for src/Makevars
All of which make it easier to (basically) start a new package via Rcpp.package.skeleton() and copy your existing package code into that package. We simply do not have a conversion helper. I still do the "manual conversion" you tried every now and then, and even I need a try or two, and I have seen all the error messages a few times over... (A sketch of the typical additions follows the steps below.)
So even if you don't want to "copy everything over" I think the simplest way is to
create two packages with and without Rcpp
do a recursive diff
ensure the difference is applied in your original package.
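For orientation, the recursive diff will typically surface entries like these (a minimal sketch; mypkg is a hypothetical package name):
In DESCRIPTION:
Imports: Rcpp
LinkingTo: Rcpp
In NAMESPACE:
importFrom(Rcpp, evalCpp)
useDynLib(mypkg, .registration = TRUE)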
PS: And remember, when you use roxygen2 and have documentation in the src/ directory, to always run Rcpp::compileAttributes() first, before running roxygen2::roxygenize(). RStudio and other helpers do that for you, but it is still easy to forget...
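In other words, the order matters:
Rcpp::compileAttributes()   # regenerates RcppExports.cpp and RcppExports.R first
roxygen2::roxygenize()      # then regenerates NAMESPACE and the .Rd files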

What is the difference between library()/require() and source() in R?

I've looked around and am still not sure what the difference is between library()/require() and source() in R. According to this SO question: What is the difference between require() and library()? it looks like library() and require() are the same thing; maybe one is legacy. Is source() for lazy developers who don't want to create a library? When do you use each of these constructs?
The differences between library and require are already well documented in What is the difference between require() and library()?.
So I will focus on how source differs from these. In fact, they are fundamentally quite different commands. Neither library nor require actually executes any code. They simply attach a namespace, in a lazy fashion, meaning that individual functions in the package are not run unless they are actually called later. source, on the other hand, does something quite different: it executes all of the code in the file at that time.
A small caveat: packages can be made to actually run some code at the time of package loading or attaching, via the .onLoad and .onAttach functions. Have a look here: https://stat.ethz.ch/R-manual/R-devel/library/base/html/ns-hooks.html
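For example, a package can set options when its namespace is loaded (a minimal sketch; the option name is hypothetical):
.onLoad <- function(libname, pkgname) {
  # runs when the namespace is loaded, even if the package is only imported
  options(mypkg.verbose = FALSE)
}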
source runs the code in a .R file, line by line.
library and require load and attach R packages.
Is source() for lazy developers that don't want to create a library?
You're correct that source is for the cases when you don't have a package. Laziness is not the only reason, though; sometimes packages are not appropriate: packages provide functionality, but they don't do things. Perhaps I have a script that pulls data from a database, fits a model, and makes some predictions. A package may provide functions to help me do that, but it does not actually do it. A script saved in a .R file and run with source() could run the commands and complete the task.
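For instance, a task script along these lines (file names are hypothetical) does work rather than providing functions, and is naturally run with source():
# predict.R: pull data, fit a model, save predictions
dat <- read.csv("input.csv")
fit <- lm(y ~ x, data = dat)
saveRDS(predict(fit), "predictions.rds")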
I do want to address this:
it looks like library() and require() are the same thing, maybe one is legacy.
They both do the same thing (load and attach a package). The main difference is that library() will throw an error and stop the script if the package is not available, whereas require() will return TRUE or FALSE depending on its success. The general consensus is that library is better, so that your script stops with a nice clear error and you can install the missing package before proceeding. The linked question has a more thorough discussion which I won't try to replicate here.
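A quick illustration of the difference (assuming nosuchpkg is not installed):
library(nosuchpkg)        # Error: there is no package called 'nosuchpkg'; the script stops here
ok <- require(nosuchpkg)  # only a warning; ok is FALSE and the script continues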

How to use/install merge method for data.sets from memisc package?

I have two data.sets (from the memisc package) all set for merge, and the merge goes through without error or warning, but the output is a data.frame, not a data.set. The command is:
datTS <- merge(datT1, datT2, by.x="ryear", by.y="ryear")
(Sorry I don't have a more convenient example with toy data handy.) The following pages seem to make it very clear that there should be a method built into memisc that properly merges the data.sets into one data.set:
http://rpackages.ianhowson.com/rforge/memisc/man/dataset-manip.html
https://github.com/melff/memisc/blob/master/pkg/R/dataset-methods.R
...but it just doesn't seem to be triggered properly on my machine (sorry also for my clumsy lingo). Note the similarity between my code and the example code from the very end of the first page I linked:
ds6 <- merge(ds1,ds5,by.x="a",by.y="c")
I've verified that I have the most recent versions of R, RStudio, memisc, and all dependencies. I've used a number of other memisc methods so far (within, transform, missing.values, etc.) without issue.
So my question is: what else does one need to do to get the merge function to properly produce a data.set when the source data are in data.set form, as per the memisc package? (There's no explicit addressing of this merge capability in the official package documentation.) Since the code in the second link above seems to provide the method for this, is there some workaround, at least, for installing and utilizing that code? Maybe there's just some separate "methods installation" I'm not aware of (but why would it be separate from the main package?).
The help page for pkg:memisc in the released version 0.97 does not describe a merge method for data.sets. You are pointing us to the GitHub version, which may not be the one that has been released. You need to install the GitHub version. See: https://github.com/melff/memisc/releases
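One way to do that (a sketch using devtools; the subdir argument reflects that the package sources appear to live in the pkg/ subdirectory of the repository, as your second link suggests):
devtools::install_github("melff/memisc", subdir = "pkg")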

R: importing data.table package namespace, unexplainable jump in memory consumption

I use the data.table package inside my own package, and I import the data.table namespace in my NAMESPACE and DESCRIPTION files.
In one of my functions I use the data.table function to convert a data.frame into a data.table:
dt <- data.table(df)
But when I call my function, at the point of calling data.table() memory usage jumps instantly and R just stops responding.
The code within the function works fine when I run it line by line, and with low memory consumption.
Also, if I put library(data.table) within my function, everything is fine. I was trying to avoid putting library(data.table) in my function and to declare a dependency instead. However, it seems something is going wrong that way. I am running R-2.14.0 on Mac OS X 10.6.8.
Can anybody explain what the reason could be, and how I can fix it (without using library(data.table) within my function)?
Some random guesses, in no particular order:
Try using the Imports or Depends field in DESCRIPTION only. I don't think you need to import in NAMESPACE as well, but I might be wrong. Why that would explain the memory use, though, I don't know. (A sketch follows after these guesses.)
What is df? Is it big or somehow recursive or strange in some way? Please provide str(df) to tell us something about it, if possible.
Try as.data.table(df) which is faster than data.table(df). But it sounds like your problem is different to that.
Is your function being called repeatedly? I can see why repeatedly converting df to dt would use up memory, but not why just calling library(data.table) would make that fast.
Try starting R with R --vanilla to ensure no .RData (which may include functions masking data.table's) is being loaded on startup, amongst other things. If you have developed your own package, then some kind of function name conflict, or the order of packages on the search() path, sounds plausible.
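To illustrate the first and third guesses (a minimal sketch):
In DESCRIPTION, one of:
Depends: data.table
Imports: data.table
And inside the function:
dt <- as.data.table(df)   # faster than data.table(df), which copies df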
Otherwise we'll need more information please. I don't recall anything similar to this happening to me, or being reported before.
And, which version of data.table are you using? There is this bug fix in v1.8.1 on R-Forge (not yet on CRAN):
Moved data.table setup code from .onAttach to .onLoad so that it is also run when data.table is simply imported from within a package, fixing #1916 related to missing data.table options.
But if you are using 1.8.0 from CRAN, and are Importing (only) rather than Depending then I'd expect you to get an error about missing options rather than a jump in memory consumption.
