R CMD Check: List of auxiliary functions - r

I am currently checking my package. R CMD check gives me the following warning:
* checking for missing documentation entries ... WARNING Undocumented code objects:
followed by a list of functions.
The problem seems to be that I have 3 lists containing functions. These, however, are just small functions that exist only because I tried to modularise my code as far as possible. I would like (and have seen that previously in other packages) to just give the function list + documentation without having to provide a documentation for every single tiny function bit.
Is there a way to do this?

The mistake was that within the function list I had
funclist <- list(function1 <- function1(){})
This lead to function1 being sourced upon sourcing funclist. However, when one writes
funclist <- list(function1 = function1(){})
function1 is not sourced and can be addressed via funclist$function1() then.

Related

Is it unwise to modify the class of functions in other packages?

There's a bit of a preamble before I get to my question, so hang with me!
For an R package I'm working on I'd like to make it as easy as possible for users to partially apply functions inline. I had the idea of using the [] operators to call my partial application function, which I've name "partialApplication." What I aim to achieve is this:
dnorm[mean = 3](1:10)
# Which would be exactly equivalent to:
dnorm(1:10, mean = 3)
To achieve this I tried defining a new [] method for objects of class function, i.e.
`[.function` <- function(...) partialApplication(...)
However, R gives a warning that the [] method for function objects is "locked." (Is there any way to override this?)
My idea seemed to be thwarted, but I thought of one simple solution: I can invent a new S3 class "partialAppliable" and create a [] method for it, i.e.
`[.partialAppliable` = function(...) partialApplication(...)
Then, I can take any function I want and append 'partialAppliable' to its class, and now my method will work.
class(dnorm) = append(class(dnorm), 'partialAppliable')
dnorm[mean = 3](1:10)
# It works!
Now here's my question/problem: I'd like users to be able to use any function they want, so I thought, what if I loop through all the objects in the active environment (using ls) and append 'partialAppliable' to the class of all functions? For instance:
allobjs = unlist(lapply(search(), ls))
#This lists all objects defined in all active packages
for(i in allobjs) {
if(is.function(get(i))) {
curfunc = get(i)
class(curfunc) = append(class(curfunc), 'partialAppliable')
assign(i, curfunc)
}
}
Voilà! It works. (I know, I should probably assign the modified functions back into their original package environments, but you get the picture).
Now, I'm not a professional programmer, but I've picked up that doing this sort of thing (globally modifying all variables in all packages) is generally considered unwise/risky. However, I can't think of any specific problems that will arise. So here's my question: what problems might arise from doing this? Can anyone think of specific functions/packages that will be broken by doing this?
Thanks!
This is similar to what the Defaults package did. The package is archived because the author decided that modifying other package's code is a "very bad thing". I think most people would agree. Just because you can do it, does not mean it's a good idea.
And, no, you most certainly should not assign the modified functions back to their original package environments. CRAN does not like when packages modify the users search path unnecessarily, so I would be surprised if they allowed a package to modify other package's function arguments.
You could work around that by putting all the modified functions in an environment on the search path. But then you have to ensure that environment is always searched first, which means modifying the search path every time another package is loaded.
Changing arguments for functions in other packages also has the potential to make it very difficult for others to reproduce your results because they must have all your argument settings. Unless you always call functions with all their arguments specified, which defeats the purpose of what you're trying to do.

R package can see variables not passed to it

I am writing a new R package and find that variables that I have not explicitly passed to a function in the package (as input argument) are visible within it, e.g.:
myFunc <- function(a,b,c) {
print(d)
}
where d is in the caller .R script, but has not been passed to myFunc, is visible.
Any help would be great, thanks; I'm using R 3.2.4 and have been using roxygen2 (via devtools::document()) to create the NAMESPACE if that helps.
Isn't this just a consequence of the scoping rules in R?
Your function defines a new myFunc environment. When you try to reference d in print(d), the interpreter first checks the myFunc environment for an object called d. Because no such object exists, the interpreter next checks the calling environment for an object called d. It finds the variable defined in your .R script and then prints it.
Here's a link with more info and a pile of examples.
Very useful link, thanks. It looks like forcing limited scoping within a function (i.e. getting a function to not access the global scope) is not a default property of R.
I found a similar question here: R force local scope
Using the checkStrict function posted by the main responder to that question seems to have worked; it found an unintended use of a global variable.
> require(myCustomPackage)
> checkStrict(showDendro)
Warning message:
In checkStrict(showDendro) : global variables used: palName
where showDendro is a function inside my custom package.
So it seems the solution to my problem is:
1) while you can stop R from moving up to the global environment by enclosing all your functions in the local() function , that seems like a tedious solution.
2) when moving code from the general environment into its own function, run something like checkStrict to remove unintended use of global variables.

How to get help in R?

What is the possible documentation available for R package? For example I try to understand sp package.
In addition to help(sp), what are the other functions for searching through help and documentation?
Getting help on a function that you know the name of
Use ? or, equivalently, help.
?mean
help(mean) # same
For non-standard names use quotes or backquotes; see An Introduction to R: Getting help with functions and features:
For a feature specified by special characters, the argument must be enclosed in double or single quotes, making it a “character string”: This is also necessary for a few words with syntactic meaning including if, for and function."
?`if`
?"if" # same
help("if") # same
There are also help pages for datasets, general topics and some packages.
?iris
?Syntax
?lubridate
Use the example function to see examples of how to use it.
example(paste)
example(`for`)
The demo function gives longer demonstrations of how to use a function.
demo() # all demos in loaded pkgs
demo(package = .packages(all.available = TRUE)) # all demos
demo(plotmath)
demo(graphics)
Finding a function that you don't know the name of
Use ?? or, equivalently, help.search.
??regression
help.search("regression")
Again, non-standard names and phrases need to be quoted.
??"logistic regression"
apropos finds functions and variables in the current session-space (but not in installed but not-loaded packages) that match a regular expression.
apropos("z$") # all fns ending with "z"
rseek.org is an R search engine with a Firefox plugin.
RSiteSearch searches several sites directly from R.
findFn in sos wraps RSiteSearch returning the results as a HTML table.
RSiteSearch("logistic regression")
library(sos)
findFn("logistic regression")
Finding packages
available.packages tells you all the packages that are available in the repositories that you set via setRepositories. installed.packages tells you all the packages that you have installed in all the libraries specified in .libPaths. library (without any arguments) is similar, returning the names and tag-line of installed packages.
View(available.packages())
View(installed.packages())
library()
.libPaths()
Similarly, data with no arguments tells you which datasets are available on your machine.
data()
search tells you which packages have been loaded.
search()
packageDescription shows you the contents of a package's DESCRIPTION file. Likewise news read the NEWS file.
packageDescription("utils")
news(package = "ggplot2")
Getting help on variables
ls lists the variables in an environment.
ls() # global environment
ls(all.names = TRUE) # including names beginning with '.'
ls("package:sp") # everything for the sp package
Most variables can be inspected using str or summary.
str(sleep)
summary(sleep)
ls.str is like a combination of ls and str.
ls.str()
ls.str("package:grDevices")
lsf.str("package:grDevices") # only functions
For large variables (particularly data frames), the head function is useful for displaying the first few rows.
head(sleep)
args shows you the arguments for a function.
args(read.csv)
General learning about R
The Info page is a very comprehensive set of links to free R resources.
Many topics in R are documented via vignettes, listed with browseVignettes.
browseVignettes()
vignette("intro_sp", package = "sp")
By combining vignette with edit, you can get its code chunks in an editor.
edit(vignette("intro_sp",package="sp"))
This answer already gives you a very comprehensive list.
I would add that findFn("some search terms") in package sos is extremely helpful, if you only have an idea/keywords of what you are looking for and don't already have a package or function in mind.
And also the task views on CRAN: not really a search process but a great place to wander as you wonder.
This thread contains many good suggestions. Let me add one more.
For finding which packages are loaded, plus extra goodies, ?sessionInfo is quite nice.
help(package="<package-name>") where of course <package-name> is the name of the package you want help for.
Often the same function name is used by several packages. To get help on a function from a specific package, use:
help(aggregate, package="stats")
help(aggregate, package="sp")
In the RStudio IDE you can click on any function name and press F1, which will directly open the associated function help text in its pane. Like you would have called help() or ?fun().

How to prevent functions polluting global namespace?

My R project is getting increasingly complex, and I'm starting to look for some construct that's equivalent to classes in Java/C#, or modules in python, so that my global namespace doesn't become littered with functions that are never used outside of one particular .r file.
So, I guess my question is: to what extent is it possible to limit the scope of functions to within a specific .r file, or similar?
I think I can just make the entire .r file into one giant function, and put functions inside that, but that messes with the echoing:
myfile.r:
myfile <- function() {
somefunction <- function(a,b,c){}
anotherfunction <- function(a,b,c){}
# do some stuff here...
123
456
# ...
}
myfile()
Output:
> source("myfile.r",echo=T)
> myfile <- function() {
+ somefunction <- function(a,b,c){}
+ anotherfunction <- function(a,b,c){}
+
+ # do some stuff here...
+ # . .... [TRUNCATED]
> myfile()
>
You can see that "123" is not printed, even though we used echo=T in the source command.
I'm wondering if there is some other construct which is more standard, since putting everything inside a single function doesn't sound like something that is really standard? But perhaps it is? Also, if it means that echo=T works then that is a definite bonus for me.
Firstly, as #Spacedman has said, you'll be best served by a package but there are other options.
S3 Methods
R's original "object orientation" is known as S3. The majority of R's code base uses this particular paradigm. It is what makes plot() work for all kinds of objects. plot() is a generic function and the R Core Team and package developers can and have written their own methods for plot(). Strictly these methods might have names like plot.foo() where foo is a class of object for which the function defines a plot() method. The beauty of S3 is that you don't (hardly) ever need to know or call plot.foo() you just use plot(bar) and R works out which plot() method to dispatch to based on the class of object bar.
In your comments on your Question you mention that you have a function populate() that has methods (in effect) for classes "crossvalidate" and "prod" which you keep in separate .r files. The S3 way to set this up is to do:
populate <- function(x, ...) { ## add whatever args you want/need
UseMethod("populate")
}
populate.crossvalidate <-
function(x, y, z, ...) { ## add args but must those of generic
## function code here
}
populate.prod <-
function(x, y, z, ...) { ## add args but must have those of generic
## function code here
}
The given some object bar with class "prod", calling
populate(bar)
will result in R calling populate() (the generic), it then looks for a function with name populate.prod because that is the class of bar. It finds our populate.prod() and so dispatches that function passing on to it the arguments we initially specified.
So you see that you only ever refer to the methods using the name of the generic, not the full function name. R works out for you what method needs to be called.
The two populate() methods can have very different arguments, with exception that strictly they should have the same arguments as the generic function. So in the example above, all methods should have arguments x and .... (There is an exception for methods that employ formula objects but we don't need to worry about that here.)
Package Namespaces
Since R 2.14.0, all R packages have had their own namespace, even if one were not provided by the package author, although namespaces have been around for a lot longer in R than that.
In your example, we wish to register the populate() generic and it's two methods with the S3 system. We also wish to export the generic function. Usually we don't want or need to export the individual methods. So, pop your functions into .R files in the R folder of the package sources and then in the top level of the package sources create a file named NAMESPACE and add the following statements:
export(populate) ## export generic
S3method(populate, crossvalidate) ## register methods
S3method(populate, prod)
Then once you have installed your package, you will note that you can call populate() but R will complain if you try to call populate.prod() etc directly by name from the prompt or in another function. This is because the functions that are the individual methods have not been exported from the namespace and thence are not visible outside it. Any function in your package that call populate() will be able to access the methods you have defined, but any functions or code outside your package can't see the methods at all. If you want, you can call non-exported functions using the ::: operator, i.e.
mypkg:::populate.crossvalidate(foo, bar)
will work, where mypkg is the name of your package.
To be honest, you don't even need a NAMESPACE file as R will auto generate one when you install the package, one that automatically exports all functions. That way your two methods will be visible as populate.xxx() (where xxx is the particular method) and will operate as S3 methods.
Read Section 1 Creating R Packages in the Writing R Extensions manual for details of what is involved, but yuo won't need to do half of this if you don't want too, especially if the package is for your own use. Just create the appropriate package folders (i.e. R and man), stick your .R files in R. Write a single .Rd file in man where you add
\name{Misc Functions}
\alias{populate}
\alias{populate.crossvalidate}
\alias{populate.prod}
at the top of the file. Add \alias{} for any other functions you have. Then you'll need to build and install the package.
Alternative using sys.source()
Although I don't (can't!) really recommend what I mention below as a long-term viable option here, there is an alternative that will allow you to isolate the functions from individual .r files as you initially requested. This is achieved through the use of environments not namespaces and doesn't involve creating a package.
The sys.source() function can be used to source R code/functions from a .R file and evaluate it in an environment. As you .R file is creating/defining functions, if you source it inside another environment then those will functions will be defined there, in that environment. They won't be visible on the standard search path by default and hence a populate() function defined in crossvalidate.R will not clash with a populate() defined in prod.R as long as you use two separate environments. When you need to use one set of functions you can assign the environment to the search path, upon which it will then be miraculously visible to everything, and when you are done you can detach it. The attach the other environment, use it, detach etc. Or you can arrange for R code to be evaluated in a specific environment using things like eval().
Like I said, this isn't a recommended solution but it will work, after a fashion, in the manner you describe. For example
## two source files that both define the same function
writeLines("populate <- function(x) 1:10", con = "crossvalidate.R")
writeLines("populate <- function(x) letters[1:10]", con = "prod.R")
## create two environments
crossvalidate <- new.env()
prod <- new.env()
## source the .R files into their respective environments
sys.source("crossvalidate.R", envir = crossvalidate)
sys.source("prod.R", envir = prod)
## show that there are no populates find-able on the search path
> ls()
[1] "crossvalidate" "prod"
> find("populate")
character(0)
Now, attach one of the environments and call populate():
> attach(crossvalidate)
> populate()
[1] 1 2 3 4 5 6 7 8 9 10
> detach(crossvalidate)
Now call the function in the other environment
> attach(prod)
> populate()
[1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j"
> detach(prod)
Clearly, each time you want to use a particular function, you need to attach() its environment and then call it, followed by a detach() call. Which is a pain.
I did say you can arrange for R code (expressions really) to be evaluated in a stated environment. You can use eval() of with() for this for example.
> with(crossvalidate, populate())
[1] 1 2 3 4 5 6 7 8 9 10
At least now you only need a single call to run the version of populate() of your choice. However, if calling the functions by their full name, e.g. populate.crossvalidate() is too much effort (as per your comments) then I dare say that even the with() idea will be too much hassle? And anyway, why would you use this when you can quite easily have your own R package.
Don't worry about the complexity of 'making a package'. Stop thinking of it like that. What you are going to do is this:
in the folder where you are working on your project, make a folder called 'R'
put your R code in there, one function per file
make a DESCRIPTION file in your project directory. Check out existing examples for the exact format, but you only need a few fields.
Get devtools. install.packages("devtools")
Use devtools. library(devtools)
Now, write your functions in your R files in your R folder. To load them into R, DONT source them. Do load_all(). Your functions will be loaded but NOT into the global environment.
Edit one of your R files, then do load_all() again. This will load any modified files in the R folder, thus updating your function.
That's it. Edit, load_all, rinse and repeat. You have created a package, but its pretty lightweight and you don't have to deal with the bondage and discipline of R's package building tools.
I've seen, used, and even written code that tries to implement a lightweight packagey mechanism for loading objects, and none are as good as what devtools does.
All Hail Hadley!
You might want to consider making a package. As an alternative, you could look at environments. Finally, RStudio's projects may be closer to what would suit you.

What is read.data in R? Why can't I find it in any docs?

I'm trying to debug my first R script and I came across this line:
data <- read.data(dir, indiv, label)
I've been googling "R read.data" for the past 30 minutes and absolutely nothing is coming up. Am I doing something wrong? Is there a good way to look up things I see in R scripts that I don't know what they are?
And what is this particular line doing anyway?
It's probably a function defined by the author of the script. Search for it in the code you have.
A couple of things to check:
Does your script define the function 'read.data' somewhere? read.data <- function(...
Does your script use library() or require() to load another package? In that case, the read.data function can be defined in that package.
Does your script use source to read another script? Check that script then...
package sos to the rescue:
read.data is a deprecated function in the rjags package
> library(sos)
> findFn("read.data")
Finds this result:
http://finzi.psych.upenn.edu/R/library/rjags/html/read.data.html
From this page:
Read data for a JAGS model from a file.
Usage
read.jagsdata(file)
read.bugsdata(file)
Note
Earlier versions of the rjags package had a read.data function which read data
in either format, but the function name was ambiguous (There are many data file
format in R) so this is now deprecated.
There isn't a base function named read.data. If you want to find help for an R function (for example read.table), simply type ?read.table at the interactive prompt.
This line calls a read.data function which is either defined in that script or in something else it loads (such as libraries with the library() or require(), other scripts with source()). You'll need to search those sources to find this function.

Resources