I am following a tutorial on R-bloggers and came across the use of double colons. I looked online but couldn't find an explanation of what they do.
Here is an example of their use.
df <- dplyr::data_frame(
year = c(2015, NA, NA, NA),
trt = c("A", NA, "B", NA)
)
I understand it creates a data frame but I don't understand their purpose.
As you have probably seen on the help page by now, :: lets you access a function from one specific package. When you loaded dplyr you probably got a message like the following:
The following objects are masked from ‘package:base’:
intersect, setdiff, setequal, union
So, for instance, if you would like to use the intersect function from either dplyr or the base package, you need to say which one by using the :: double colon. Usage is as follows:
mtcars$model <- rownames(mtcars)
first <- mtcars[1:20, ]
second <- mtcars[10:20, ]
dplyr::intersect(first, second)
base::intersect(first, second)
Update: Added additional explanation
Note: The order in which you load libraries determines which version of a function gets used by default. Developers of different packages tend to use the same function names. When R encounters a function name, it searches the packages attached in the session one after another. You can check the packages attached in a session by running (.packages()):
[1] "tidyr" "data.table" "dplyr" "stats"
[5] "graphics" "grDevices" "utils" "datasets"
[9] "methods" "base"
As you can see in my example session above, tidyr is the last library I loaded, so it is the first entry in the list. When you use any function in your code, R searches tidyr first, then data.table, then dplyr and so on, and looks in the base package last. When function names overlap between packages, the package loaded last masks the earlier ones. To avoid this masking, you specify in your R code where to look for the function. Hence base::intersect uses the function from the base package instead of the one from dplyr. Alternatively, you can use :: to call a function without loading the complete library. There are positives and negatives to this. Read the links to learn more.
Run both calls above and check the differences.
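The lookup order described above can be seen without loading any extra package. This is a minimal sketch, assuming a fresh session: a function defined in your own workspace (here a deliberately silly stand-in that reuses the name intersect) is found before anything in an attached package, so it masks base::intersect until it is removed, while the :: form always reaches the base version.

```r
# Hypothetical stand-in: a user-defined function that reuses the name 'intersect'.
intersect <- function(x, y) "my own intersect"

masked_result   <- intersect(1:5, 3:7)        # the user-defined version is found first
explicit_result <- base::intersect(1:5, 3:7)  # :: goes straight to the base package

print(masked_result)
print(explicit_result)

# Remove the stand-in so the plain name resolves to the base version again.
rm(intersect)
restored_result <- intersect(1:5, 3:7)
print(restored_result)
```

The same thing happens between two attached packages: whichever one is found first on the search path wins, and package::function is the way around it.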
Here are some resources for you to get an understanding.
Compare library(), require(), ::
Namespace
There may be multiple functions with the same name in multiple packages. The double colon operator allows you to specify the specific function you want:
package::functionname
Every time I load the dplyr package, the console shows a warning message.
The message says that some objects are masked from other packages. I think this is because the objects have the same name. For example:
filter has this usage in the dplyr package: filter(.data, ..., .preserve = FALSE)
filter has this usage in the stats package: filter(x, filter, method = c("convolution", "recursive"), sides = 2, circular = FALSE, init)
How can I unmask the filter object from stats package if I need to use it?
Regards
You are correct that they are simply functions that share the same name. The comments above basically answer the question already. If you have a conflict between functions that you want to avoid, you can also select which one you prefer, like so:
library(conflicted)
conflict_prefer("slice", # the function
"dplyr") # the package
And R will tell you which it will use as your primary:
[conflicted] Will prefer dplyr::slice over any other package
However, that is an extra step, and I usually prefer to name the function explicitly, like dplyr::slice, as mentioned in the comments.
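For the specific question of reaching the stats version of filter, the :: form works whether or not dplyr is attached. This is a minimal sketch using only the stats package that ships with R. The input vector and the three-point moving-average weights are made up for illustration.

```r
# Made-up input data for illustration.
x <- c(1, 2, 3, 4, 5)

# stats::filter applies a linear filter; here it is a centred 3-point moving average.
# Writing the package name means a masking dplyr::filter is never consulted.
smoothed <- stats::filter(x, filter = rep(1 / 3, 3), method = "convolution", sides = 2)

# The two end points are NA because the 3-point window does not fit there.
print(as.numeric(smoothed))
```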
If I have multiple packages loaded that define functions of the same name, is there an easy way to determine which version of the function is currently the active one? For example, let's say I have base R, the tidyverse, and a bunch of time series packages loaded. I'd like a function which_package("intersect") that would tell me the package name of the active version of the intersect function. I know you can go back and look at all the warning messages you received when loading packages, but I think that sort of manual search is not only tedious but also error-prone.
There is a function here that does sort of what I want, except it produces a table for all conflicts rather than the value for one function. I would actually be quite happy with that, and would also accept a similar function as an answer, but I have had problems with the implementation of the function given. As applied to my examples, it inserts vast amounts of white space and many duplicates of the package names (e.g. the %>% function shows up with 132 packages listed), making the output hard to read and hard to use. It seems like it should be easy to remove the white space and duplicates, and I have spent considerable time on various approaches that I expected to work but which had no impact on the outcome.
So, for an example of many conflicts:
install.packages(pkgs = c("tidyverse", "fpp3", "tsbox", "rugarch", "Quandl", "DREGAR", "dynlm", "zoo", "GGally", "dyn", "ARDL", "bigtime", "BigVAR", "dLagM", "VARshrink"))
lapply(X = c("tidyverse", "fable", "tsbox", "rugarch", "Quandl", "DREGAR", "dynlm", "zoo", "GGally", "dyn", "ARDL", "bigtime", "BigVAR", "dLagM", "VARshrink"),
       library, character.only = TRUE)
You can pull this information with your own function helper.
which_package <- function(fun) {
if(is.character(fun)) fun <- getFunction(fun)
stopifnot(is.function(fun))
x <- environmentName(environment(fun))
if (!is.null(x)) return(x)
}
This will return R_GlobalEnv for functions that you define in the global environment. There is also the packageName function if you really want to restrict it to packages only.
For example
library(MASS)
library(dplyr)
which_package(select)
# [1] "dplyr"
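If a list of every attached package that defines a name is also useful, utils::find() returns the matching search-path entries in lookup order, so the first entry is the active one. This is a small sketch. The wrapper names all_definitions and active_definition are hypothetical and not part of any package.

```r
# Hypothetical helper: every search-path entry that defines a function called `name`,
# in the order R would look them up.
all_definitions <- function(name) {
  stopifnot(is.character(name), length(name) == 1)
  utils::find(name, mode = "function")
}

# Hypothetical helper: the first hit is the definition a bare call would use.
active_definition <- function(name) {
  hits <- all_definitions(name)
  if (length(hits) == 0) NA_character_ else hits[1]
}

print(all_definitions("intersect"))
print(active_definition("intersect"))
```

With dplyr attached, all_definitions("intersect") would be expected to list both package:dplyr and package:base, with whichever was attached later appearing first.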
I have used several R packages for my study. All the libraries are loaded together at the beginning of my code, and here is the problem: I ran several tests with different functions from those packages, but the final code does not use every function I tried. Therefore, I am loading libraries that I do not use.
Would there be any way to check the libraries to know if they really are necessary for my code?
Start by restarting R with a fresh environment, no libraries loaded. For this demonstration, I'm going to define two functions:
zoo1 <- function() na.locf(1:10)
zoo2 <- function() zoo::na.locf(1:10)
With no libraries loaded, let's try something:
codetools::checkUsage(zoo1)
# <anonymous>: no visible global function definition for 'na.locf'
codetools::checkUsage(zoo2)
library(zoo)
# Attaching package: 'zoo'
# The following objects are masked from 'package:base':
# as.Date, as.Date.numeric
codetools::checkUsage(zoo1)
Okay, so we know we can check a single function to see if it is abusing scope and/or using non-base functions. Let's assume that you've loaded your script full of functions (but not the calls to require or library), so let's do this process for all of them. Let's first unload zoo, so that we'll see a complaint again about our zoo1 function:
detach("package:zoo", unload=TRUE)
Now let's iterate over all functions:
allfuncs <- Filter(function(a) is.function(get(a)), ls())
str(sapply(allfuncs, function(fn) capture.output(codetools::checkUsage(get(fn))), simplify=FALSE))
# List of 2
# $ zoo1: chr "<anonymous>: no visible global function definition for 'na.locf'"
# $ zoo2: chr(0)
Now you know to look in the function named zoo1 for a call to na.locf. It'll be up to you to find which not-yet-loaded package this function resides in, but that might be more reasonable, depending on the number of packages you are loading.
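Once the libraries are attached again, the same machinery can hint at which of them a function actually relies on. This is a rough sketch, assuming codetools is installed (it normally ships with R): codetools::findGlobals() lists the global function names a function refers to, and utils::find() maps each name to the first search-path entry that provides it. The helper name used_packages and the demo function are hypothetical, and calls written with :: or built dynamically may not be attributed correctly.

```r
# Hypothetical helper: which attached packages provide the functions that `fun` calls?
used_packages <- function(fun) {
  stopifnot(is.function(fun))
  # Names of global functions referenced in the body of `fun`.
  globals <- codetools::findGlobals(fun, merge = FALSE)$functions
  # For each name, take the first search-path entry that defines it (NA if none).
  providers <- vapply(globals, function(name) {
    hits <- utils::find(name, mode = "function")
    if (length(hits) > 0) hits[1] else NA_character_
  }, character(1))
  sort(unique(providers[!is.na(providers)]))
}

# Made-up demo function that uses one function from stats and one from utils.
demo <- function(x) median(head(x, 3))
print(used_packages(demo))
```

An attached package that never shows up in this output for any of your functions is a candidate for removal, though that is worth confirming by hand.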
Some side-thoughts:
If you have a script file that does not have everything comfortably ensconced in functions, then just wrap all of the global R code into a single function, say bigfunctionfortest <- function() { as the first line and } as the last. Then source the file and run codetools::checkUsage(bigfunctionfortest).
Package developers have to go through a process that uses this, so that the Imports: and Depends: fields of DESCRIPTION and the directives in NAMESPACE (another ref: http://r-pkgs.had.co.nz/namespace.html) will be correct. One good trick that will prevent "namespace pollution" is loading the namespace but not the package ... and though that may sound confusing, it often results in using zoo::na.locf for all non-base functions. This gets old quickly (especially if you are using dplyr and such, where most of your daily functions are non-base), suggesting those oft-used functions should be directly imported instead of just referenced wholly. If you're familiar with python, then:
# R
library(zoo)
na.locf(c(1,2,NA,3))
is analogous to
# fake-python
from zoo import *
na_locf([1,2,None,3])
(if that package/function exists). Then the non-polluting variant looks like:
# R
zoo::na.locf(c(1,2,NA,3))
# fake-python
import zoo
zoo.na_locf([1,2,None,3])
where the function's package (and/or subdir packaging) must be used explicitly. There is no ambiguity. It is explicit. This is by some/many considered "A Good Thing (tm)".
(Language-philes will likely say that library(zoo) and from zoo import * are not exactly the same ... a better way to describe what is happening is that they bring everything from zoo into the search path of functions, potentially causing masking as we saw in a console message earlier; while the :: functionality only loads the namespace but does not add it to the search path. Lots of things going on in the background.)
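The difference between loading a namespace and attaching a package can be checked directly. This is a minimal sketch using the tools package, chosen only because it ships with R and is not attached by default. It assumes nothing else in the session has attached tools.

```r
# Is 'tools' on the search path before we use it?
attached_before <- "package:tools" %in% search()

# Calling through :: loads the namespace on demand and runs the function.
ext <- tools::file_ext("report.csv")

# The namespace is now loaded, but the search path is unchanged.
loaded_after   <- isNamespaceLoaded("tools")
attached_after <- "package:tools" %in% search()

print(ext)
print(c(attached_before = attached_before,
        loaded_after    = loaded_after,
        attached_after  = attached_after))
```

Because the package never lands on the search path, none of its functions can mask anything else in the session.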
When I load the data.table package after having already loaded the lubridate package, I get the following error message:
Loading required package: data.table
data.table 1.9.4 For help type: ?data.table
*** NB: by=.EACHI is now explicit. See README to restore previous behaviour.
Attaching package: ‘data.table’
The following objects are masked from ‘package:lubridate’:
hour, mday, month, quarter, wday, week, yday, year
Does anyone know a) what's causing this issue and b) how to prevent these objects within lubridate from being masked?
UPDATE:
The issue associated with the above is that I'm using the quarter function from the lubridate package and, after loading the data.table package, I can no longer do so in the same way.
Specifically, when I run quarter(Date, with_year=TRUE) (where Date is a vector of class = Dates), I now get the following error: Error in quarter(Date, with_year = TRUE) : unused argument (with_year = TRUE).
If I simply run quarter(Date), I get the output without the year attached. For example, if Date is set to May 15, 2015 (today), then quarter(Date) yields 2 (since we're in the 2nd quarter of 2015), but I'd like it to yield 2015.2, hence the importance of the with_year = TRUE option.
Obviously, I can overcome this by using paste to bind together the year and the output of quarter(Date), but I'd prefer to avoid that work-around.
An object name in a package namespace is masked when a new object is defined with the same name. This can be done by the user assigning the name, or by attaching another package that has an object of the same name.
data.table and lubridate have overlapping function names. If you want the lubridate version to be the default, then the easiest solution is to load data.table first, then load lubridate, so that the data.table versions of these functions are the ones masked by the "newer" lubridate versions.
library(data.table)
library(lubridate)
Otherwise, the solution is to use :: (as in package::function) to fully specify which version of the function you want to use, for example:
lubridate::quarter(Date, with_year = TRUE)
Another option, which involves a little less typing but is perhaps a little less clear as well, would be to alias the lubridate functions you want in the global environment at the start of your script.
quarter = lubridate::quarter
Any use of quarter() later in the script will use the lubridate version of the function.
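One way to confirm that the alias really points at the lubridate version is to ask which namespace the function belongs to. This is a small sketch that only runs its body if lubridate is installed. The variable name alias_source is hypothetical, and the date is the one from the question.

```r
alias_source <- NA_character_

if (requireNamespace("lubridate", quietly = TRUE)) {
  # Alias the lubridate version under the short name, as described above.
  quarter <- lubridate::quarter

  # The function's enclosing environment is the namespace it was defined in.
  alias_source <- environmentName(environment(quarter))
  print(alias_source)

  # The alias is used regardless of the order in which packages were attached.
  print(quarter(as.Date("2015-05-15")))
}
```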
Yet another option is the conflicted package, which provides a system for preferring a function from one package. It is a more heavyweight and deliberate approach, and you should definitely read the documentation before using it, but your script might include something like this:
library(conflicted)
conflict_prefer("quarter", "lubridate")
The conflicted package provides various alternatives, and using it when loading libraries is good practice because it makes the masking explicit.
https://github.com/r-lib/conflicted