Peculiar error in dplyr filter - r

I am working through Hadley Wickham's 2015 ggplot2 book. It contains this line of code (the ggplot2 package is already loaded):
presidential <- subset(presidential, start > economics$date[1])
and it works fine. I tried replacing subset with filter as in:
library(dplyr)
presidential <- filter(presidential, start > economics$date[1])
and I get the error:
Error in `>.default`(start, x) :
comparison (6) is possible only for atomic and list types
If the comparison is incorrect, should it also not affect subset?

I think I found the problem. If I explicitly specify dplyr:: as below, then it works:
presidential <- dplyr::filter(presidential, start > economics$date[1])
This means that some other filter function was overriding the one from dplyr.
In the code I posted earlier, I showed the library(dplyr) line immediately before the line that I thought was causing the problem, but in reality dplyr had already been loaded by my startup script.
It looks like the stats package, which also has a filter function, was attached after dplyr (because dplyr was loaded from my startup script), and hence stats::filter masked dplyr::filter.
I really ought to have checked this before posting, but it does highlight the impact that loading packages in a startup script can have. The other tricky point is that in this situation we do not get any message about the masking that has occurred.
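For anyone hitting the same thing, here is a minimal sketch of the checks I could have run first. It uses only base R and assumes a fresh session in which dplyr is not attached; the date literal is just an illustrative value, not taken from the book.

```r
# find() lists every attached environment that defines the name,
# in search-path order; the first entry is the one a bare call uses.
where_filter <- find("filter")
print(where_filter)

# The namespace that the currently visible filter() belongs to.
filter_env <- environmentName(environment(filter))
print(filter_env)

# stats::filter does no data masking, so `start` is looked up as the
# stats::start function. Comparing a function with a date is an error.
cmp <- tryCatch(
  start > as.Date("1967-07-01"),
  error = function(e) e
)
print(inherits(cmp, "error"))
```

If the first entry printed by find("filter") is not package:dplyr, a bare filter() call will not reach dplyr, and dplyr::filter() is the safe spelling.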

Related

Why wont replace_na actually replace the missing values using dplyr and piping?

I have been struggling a lot recently with the replace_na() function when cleaning my data. I have two complementary variables and I want to use one variable (varname2) to supply the missing values for the other (varname1). I've been trying the following:
df %>%
replace_na(varname1 = varname2)
In response I keep getting the error:
> df <- df %>%
+ replace_na(varname1 = varname2)
Error: 1 components of `...` were not used.
We detected these problematic arguments:
* `varname1`
Did you misspecify an argument?
Run `rlang::last_error()` to see where the error occurred.
Suggestions for an efficient way to fix this?
I found a blog response elsewhere in which Hadley himself said they wanted to move away from replace_na() toward coalesce(), a command that is closer to SQL. The solution involves both across() and coalesce().
Here's an example of what I just did in my work:
df %>%
mutate(across(varname1, coalesce, varname2))
It seems to have worked like a charm.
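For a single pair of columns, a plain mutate() with coalesce() may be easier to read than the across() form. This is only a sketch: the data frame and its values are made up for illustration, and it assumes dplyr is installed. The original call failed because replace_na() expects a list of constant replacement values rather than another column.

```r
library(dplyr)

# Toy data: varname1 has gaps that varname2 can fill (values are made up).
df <- data.frame(
  varname1 = c(1, NA, 3),
  varname2 = c(10, 20, 30)
)

# coalesce() takes, element by element, the first non-missing value.
filled <- df %>%
  mutate(varname1 = coalesce(varname1, varname2))

print(filled)
```

Only the missing entry of varname1 is taken from varname2; values that were already present are left alone.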

select command not working in R even after installing the library dplyr

Error message: could not find function "select"
After installing the package dplyr, which contains the select function for R,
I did not expect this error, but I am still getting it.
I want to select a particular column of the dataset, but the dollar sign operator is also not working.
I think I've had this problem as well and I'm not sure what causes it. However, I can usually solve the problem by specifying the package before the command as in the code below.
dplyr::select()
Hope this helps.
@THATguy nailed it! That will solve your problem. This kind of error is often caused by multiple attached packages that export a function with the same name. In this case specifically, a function named "select" exists in both 'dplyr' and 'MASS'. If you type select in your code, R uses whichever of the two was attached most recently, which may well be the MASS one; if your intention is to select only certain columns of a data frame, then you want the select from 'dplyr'. For example:
df <- read.csv("df.csv") %>% #bring in the data frame
dplyr::select(-x, -y, -z) # remove the x, y, and z columns from the data frame
Or if you want to keep certain columns then drop the '-' in front of the variable.
There are various ways you can try to solve this problem:
Restart the R session (Ctrl + Shift + F10 in RStudio).
Use dplyr::select() if that is the select function you want.
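As a minimal sketch of both points, assuming dplyr is installed: installing a package only puts it on disk, so "could not find function" usually means library(dplyr) has not been run in the current session. The data frame and its column names here are made up for illustration.

```r
# library() attaches the package; install.packages() alone does not.
library(dplyr)

# Toy data frame standing in for the real dataset.
df <- data.frame(x = 1:3, y = 4:6, z = 7:9, w = 10:12)

# The qualified call reaches dplyr's select no matter what else
# (MASS, for instance) is attached later and masks the bare name.
kept <- dplyr::select(df, -x, -y, -z)

print(names(kept))
```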

utils::globalVariables(.) not applicable to R CMD CHECK note: no visible binding for global variable '.' [duplicate]

I noticed in checking a package that I obtain notes "no visible binding for global variable" when I use functions like subset that use verbatim names of list elements as arguments.
For example with a data frame:
foo <- data.frame(a=c(TRUE,FALSE,TRUE),b=1:3)
I can do silly things like:
subset(foo,a)
transform(foo,a=b)
Both work as expected. However, the code check in R CMD check does not understand that these names refer to columns of the data frame, and it complains that there are no visible bindings for them as global variables.
While this works OK, I don't really like having notes in my package and would prefer it to pass the check with no errors, warnings, or notes at all. I also don't really want to rework my code too much. Is there a way to write this code so that it is clear the arguments do not refer to global variables?
To get it past R CMD check you can either :
Use get("b") (but that is onerous)
Place a=b=NULL somewhere higher up in your function (that's what I do)
There was a thread on r-devel a while ago where somebody from r-core basically said (from memory) "NOTES are ok, you know. The assumption is that the author checked it and is ok with the NOTE.". But, I agree with you. I do prefer to have CRAN checks return a clean "OK" on all platforms. That way the user is left in no doubt that it passes checks ok.
EDIT :
Here is the r-devel thread I was remembering (from April 2010). So that appears to suggest that there are some situations where there is no known way to avoid the NOTE, but that's ok.
This is one of the potential "unanticipated consequences" of using subset non-interactively. As it says in the Warning section of ?subset:
This is a convenience function intended for use interactively. For
programming it is better to use the standard subsetting functions like
‘[’, and in particular the non-standard evaluation of argument
‘subset’ can have unanticipated consequences.
From R version 2.15.1 onwards there is a way around this:
if(getRversion() >= "2.15.1") utils::globalVariables(c("a", "othervar"))
As per the warning section of ?subset it is better to use subset interactively, and [ for programming.
I would replace a command like
subset(foo,a)
with
foo[foo$a]
or if foo is a dataframe:
foo[foo$a, ]
You might also like to use with() if foo is a data frame and the expression to be evaluated is complex:
with(foo, foo[a, ])
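As a rough sketch of what this looks like inside a package function, using the foo data frame from the question (the function name keep_flagged is hypothetical): because the column is reached through df$a, the code checker has no bare symbol to complain about.

```r
foo <- data.frame(a = c(TRUE, FALSE, TRUE), b = 1:3)

# Hypothetical package function: standard subsetting, no bare column names.
keep_flagged <- function(df) {
  df[df$a, , drop = FALSE]
}

via_subset  <- subset(foo, a)     # convenient interactively
via_bracket <- keep_flagged(foo)  # safer in package code
via_with    <- with(foo, foo[a, ])

print(via_subset)
print(via_bracket)
print(via_with)
```

One difference to be aware of: subset() drops rows where the condition is NA, while plain logical indexing with `[` returns rows of NA for them.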
I had this issue and traced it to my ggplot2 section.
This code produced the NOTE:
ggplot2::ggplot(data = spec.df, ggplot2::aes(E.avg, fraction)) +
ggplot2::geom_line() +
ggplot2::ggtitle(paste0(title))
Adding the data name to the parameters eliminated the NOTE:
ggplot2::ggplot(data = spec.df, ggplot2::aes(spec.df$E.avg, spec.df$fraction)) +
ggplot2::geom_line() +
ggplot2::ggtitle(paste0(title))

Large data.tree causes plot() to error

I'm trying to build an org chart from a data.frame in R using the data.tree package.
As far as I can tell I have constructed the tree correctly, but when I try to plot() the data.tree object (which print()s fine) I get an error:
abort(0) at jsStackTrace#http://localhost:30899/session/viewhtml2fdc215a4edd/lib/viz-0.3/viz.js:5:22110
stackTrace#http://localhost:30899/session/viewhtml2fdc215a4edd/lib/viz-0.3/viz.js:5:22258
abort#http://localhost:30899/session/viewhtml2fdc215a4edd/lib/viz-0.3/viz.js:28:10656
nullFunc_iii#http://localhost:30899/session/viewhtml2fdc215a4edd/lib/viz-0.3/viz.js:5:662065
a8#http://localhost:30899/session/viewhtml2fdc215a4edd/lib/viz-0.3/viz.js:21:31634
iC#http://localhost:30899/session/viewhtml2fdc215a4edd/lib/viz-0.3/viz.js:9:83383
aD#http://localhost:30899/session/viewhtml2fdc215a4edd/lib/viz-0.3/viz.js:9:102098
uF#http://localhost:30899/session/viewhtml2fdc215a4edd/lib/viz-0.3/viz.js:9:173805
pG#http://localhost:30899/session/viewhtml2fdc215a4edd/lib/viz-0.3/viz.js:9:204484
xc#http://localhost:30899/session/viewhtml2fdc215a4edd/lib/viz-0.3/viz.js:11:740
http://localhost:30899/session/viewhtml2fdc215a4edd/lib/viz-0.3/viz.js:28:403
ccallFunc#http://localhost:30899/session/viewhtml2fdc215a4edd/lib/viz-0.3/viz.js:5:15982
http://localhost:30899/session/viewhtml2fdc215a4edd/lib/viz-0.3/viz.js:47:42
renderValue#http://localhost:30899/session/viewhtml2fdc215a4edd/lib/grViz-binding-0.8.4/grViz.js:38:27
http://localhost:30899/session/viewhtml2fdc215a4edd/lib/htmlwidgets-0.7/htmlwidgets.js:625:30
forEach#[native code]
forEach#http://localhost:30899/session/viewhtml2fdc215a4edd/lib/htmlwidgets-0.7/htmlwidgets.js:55:21
http://localhost:30899/session/viewhtml2fdc215a4edd/lib/htmlwidgets-0.7/htmlwidgets.js:551:14
forEach#[native code]
forEach#http://localhost:30899/session/viewhtml2fdc215a4edd/lib/htmlwidgets-0.7/htmlwidgets.js:55:21
staticRender#http://localhost:30899/session/viewhtml2fdc215a4edd/lib/htmlwidgets-0.7/htmlwidgets.js:549:12
http://localhost:30899/session/viewhtml2fdc215a4edd/lib/htmlwidgets-0.7/htmlwidgets.js:638:38
Any ideas?
I just started using the data.tree package a couple of days ago, but came across the same problem with some of my trees: print(tree) worked, but plot(tree) gave an error message similar to yours. So far, I have always been able to resolve the issue by removing special characters such as single and double quotation marks from the input data. The plot function therefore appears to be sensitive to certain symbols or special characters; maybe that is a starting point for your search for a solution?
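A minimal sketch of that clean-up in base R, applied before the tree is built. The helper name strip_quotes and the sample names are made up for illustration, and it only covers quotation marks, which is all I have confirmed causes trouble.

```r
# Hypothetical helper: drop single and double quotation marks from labels.
strip_quotes <- function(x) {
  gsub("[\"']", "", x)
}

# Made-up sample labels of the kind that broke plot() for me.
labels <- c("O'Brien", "\"Chief\" Officer", "Plain Name")
clean_labels <- strip_quotes(labels)

print(clean_labels)
```

You would apply this to every character column that ends up as a node name or label, then build the tree from the cleaned data frame.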

Lubridate Objects Masked After Loading Data.Table

When I load the data.table package after having already loaded the lubridate package, I get the following error message:
Loading required package: data.table
data.table 1.9.4 For help type: ?data.table
*** NB: by=.EACHI is now explicit. See README to restore previous behaviour.
Attaching package: ‘data.table’
The following objects are masked from ‘package:lubridate’:
hour, mday, month, quarter, wday, week, yday, year
Does anyone know a) what's causing this issue and b) how to prevent these objects within lubridate from being masked?
UPDATE:
The issue associated with the above is that I'm using the quarter function from the lubridate package and, after loading the data.table package, I can no longer do so in the same way.
Specifically, when I run quarter(Date, with_year = TRUE) (where Date is a vector of class Date), I now get the following error: Error in quarter(Date, with_year = TRUE) : unused argument (with_year = TRUE).
If I simply run quarter(Date), I get the output without the year attached. For example, if Date is simply May 15, 2015 (today), then quarter(Date) yields 2 (since we're in the 2nd quarter of 2015), but I'd like it to yield 2015.2, hence the importance of the with_year = TRUE option.
Obviously, I can work around this by using paste to combine the year with the output of quarter(Date), but I'd prefer to avoid that workaround.
An object name in a package namespace is masked when a new object is defined with the same name. This can be done by the user assigning the name, or by attaching another package that has an object of the same name.
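The first case, a user assignment masking a package function, can be seen with base R alone. This is just an illustrative sketch using median from the stats package; the replacement function is deliberately silly.

```r
# Defining our own `median` masks stats::median for bare calls.
median <- function(x, ...) "my median"

masked_result    <- median(c(1, 2, 10))         # finds our definition first
qualified_result <- stats::median(c(1, 2, 10))  # :: bypasses the mask

# Removing our definition makes the stats version visible again.
rm(median)
restored_result <- median(c(1, 2, 10))

print(masked_result)
print(qualified_result)
print(restored_result)
```

Attaching a package works the same way: the most recently attached definition is found first, and the earlier one is still reachable with ::.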
data.table and lubridate have overlapping function names. If you want the lubridate versions to be the default, the easiest solution is to load data.table first and then load lubridate, so that the data.table versions of these functions are the ones masked by the "newer" lubridate versions.
library(data.table)
library(lubridate)
Otherwise, the solution is to use :: (as in package::function) to fully specify which version of the function you want to use, for example:
lubridate::quarter(Date, with_year = TRUE)
Another option, which involves a little less typing but is perhaps a little less clear as well, would be to alias the lubridate functions you want in the global environment at the start of your script.
quarter = lubridate::quarter
Any use of quarter() later in the script will use the lubridate version of the function.
Yet another option is the conflicted package, which provides a system for declaring which package's function to prefer. It is a more deliberate approach, and you should definitely read the documentation before using it, but your script might include something like this:
library(conflicted)
conflict_prefer("quarter", "lubridate")
The conflicted package provides various alternatives, and it is good practice to use it when loading libraries so that any masking is made explicit.
https://github.com/r-lib/conflicted
