I have a simple task that I would like to loop over many datasets (which have similar variable names). I know how to do it in dplyr, but I need to convert it to base R in order to get it into an anonymous function.
For example (this is not the real data I am working with):
This is my dplyr approach:
mtcars %>%
  select(mpg, contains("cyl")) %>%
  distinct()
However, when I throw this into an anonymous function:
mtcars %>% (function(x) subset(x, select = c(mpg, contains("cyl"))))
I get an error: Error: No tidyselect variables were registered
Any ideas about how to solve this, and how to add distinct() to the function so that I only get unique values? Any and all suggestions are appreciated, thank you!
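One possible base R sketch: contains() is a tidyselect helper that only works inside dplyr verbs such as select(), which is why calling it inside subset() fails. In base R, grep() on names(x) can do the name matching and unique() can stand in for distinct(). The function name pick_distinct and the list of datasets are made up for illustration.

```r
# Hypothetical helper: keep mpg plus every column whose name contains "cyl",
# then drop duplicate rows (base R stand-in for select() + distinct()).
pick_distinct <- function(x) {
  cols <- c("mpg", grep("cyl", names(x), value = TRUE))
  unique(x[, cols, drop = FALSE])
}

res <- pick_distinct(mtcars)

# Looping over several datasets with similar variable names
# (this list is only an illustration).
datasets <- list(full = mtcars, first_ten = head(mtcars, 10))
results <- lapply(datasets, pick_distinct)
```

Note that contains() ignores case by default, while grep() is case-sensitive unless ignore.case = TRUE is passed.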
I'm currently using the code below to keep one row per Loan_ID: the row whose Execution_Date is closest to a predetermined date (in my case "2022-03-31").
The example is as follows:
library(dplyr)
df %>%
  group_by(Loan_ID) %>%
  slice(which.min(abs(Execution_Date - as.Date("2022-03-31")))) %>%
  ungroup()
The problem is that if I run this code in sparklyr I get an error: "Slice() is not supported on database backends" (this is because slice() has no equivalent in SQL).
How can I deal with this problem?
Thank you in advance!
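A possible workaround, sketched here on a local data frame only: filtering on a grouped minimum avoids slice(), and database backends can usually translate that pattern into a window function. The toy data, the gap column, and the target variable are made up for illustration. This has not been verified against a Spark connection, and on Spark the date subtraction may need to be replaced with Spark SQL's datediff().

```r
library(dplyr)

# Toy data, invented for illustration.
df <- data.frame(
  Loan_ID = c(1, 1, 2, 2),
  Execution_Date = as.Date(c("2022-03-01", "2022-04-15",
                             "2021-12-31", "2022-03-30"))
)
target <- as.Date("2022-03-31")

closest <- df %>%
  group_by(Loan_ID) %>%
  # distance in days between each row's date and the target date
  mutate(gap = abs(as.numeric(Execution_Date - target))) %>%
  # keep the row(s) with the smallest distance within each loan
  filter(gap == min(gap, na.rm = TRUE)) %>%
  ungroup() %>%
  select(-gap)
```

Unlike which.min(), this keeps every tied row when two dates are equally close, so an extra tie-breaking step may be needed.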
I would like to arrange a data frame by the number of characters in a variable called "Name". I know that I need the arrange() function from the dplyr package, but I cannot find an option in arrange() that sorts by the number of characters in the name.
So far I have come up with: arrange((Name))
Is there someone who can help me with this?
Here's a simple workaround with dplyr package and iris data:
library(dplyr)
iris %>%
  mutate(Species = as.character(Species)) %>% # Convert factor to character
  arrange(nchar(Species))
Following "dplyr mutate using variable columns" and "dplyr - mutate: use dynamic variable names", I am trying to use dynamic names in mutate. What I want to do is normalize column data by group, subject to a minimum standard deviation. Each column has a different minimum standard deviation.
e.g. (I omitted loops & map statements for convenience)
library(dplyr)
library(purrr)  # provides pluck()
data(iris)
iris <- as_tibble(iris)
minsd <- c('Sepal.Length' = 0.8)
varname <- 'Sepal.Length'
iris %>% group_by(Species) %>% mutate(!!varname := mean(pluck(iris, varname), na.rm = TRUE) / max(sd(pluck(iris, varname)), minsd[varname]))
I got the dynamic assignment and variable selection to work as suggested by the referenced answers, but group_by() is not respected, which, for me at least, is the main benefit of using dplyr here.
The desired result is given by:
iris %>% group_by(Species) %>% mutate(!!varname := mean(Sepal.Length,na.rm=T)/max(sd(Sepal.Length),minsd[varname]))
Is there a way around this?
I actually did not know much about pluck, so I don't know what went wrong, but I would go for this and this works:
iris %>%
  group_by(Species) %>%
  mutate(
    !!varname :=
      mean(!!as.name(varname), na.rm = TRUE) /
      max(sd(!!as.name(varname)),
          minsd[varname])
  )
Let me know if this isn't what you were looking for.
The other answer is obviously the best, and it also solved a similar problem that I have encountered. For example, with !!as.name(), there is no need to use group_by_() (or group_by_at()) or arrange_() (or arrange_at()).
However, another way is to replace pluck(iris,varname) in your code with .data[[varname]]. I suppose pluck(iris,varname) does not work because the iris it refers to is the whole, ungrouped tibble. By contrast, .data refers to the data that mutate() is currently working on, so it respects the grouping.
An alternative to as.name() is rlang::sym() from the rlang package.
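As a rough sketch of both suggestions, the snippet below computes the grouped result once with .data[[varname]] and once with rlang::sym(), and compares each with the hard-coded version from the question. The result names (via_data, via_sym, expected) are made up for illustration.

```r
library(dplyr)

minsd <- c(Sepal.Length = 0.8)
varname <- "Sepal.Length"

# Variant 1: the .data pronoun sees the current group inside mutate().
via_data <- iris %>%
  group_by(Species) %>%
  mutate(!!varname := mean(.data[[varname]], na.rm = TRUE) /
           max(sd(.data[[varname]]), minsd[[varname]])) %>%
  ungroup()

# Variant 2: turn the string into a symbol and unquote it.
via_sym <- iris %>%
  group_by(Species) %>%
  mutate(!!varname := mean(!!rlang::sym(varname), na.rm = TRUE) /
           max(sd(!!rlang::sym(varname)), minsd[[varname]])) %>%
  ungroup()

# Reference: the hard-coded column name from the question.
expected <- iris %>%
  group_by(Species) %>%
  mutate(Sepal.Length = mean(Sepal.Length, na.rm = TRUE) /
           max(sd(Sepal.Length), minsd[["Sepal.Length"]])) %>%
  ungroup()
```

Both variants give one value per species, which shows that the grouping is respected.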
The article on dplyr here says "[]" (square brackets) can be used to subset filtered Tibbles like this:
filter(mammals, adult_body_mass_g > 1e7)[ , 3]
But I am getting an "object not found" error.
Here is a reproduction of the error on the better-known iris dataset:
library(dplyr)
iris %>% filter(Sepal.Length>6) [,c(1:3)]
Error in filter_(.data, .dots = lazyeval::lazy_dots(...)) :
object 'Sepal.Length' not found
I also want to mention that I am deliberately avoiding dplyr's native subsetting with select(), as I need a vector output and not a data frame when I pick a single column. Unfortunately, dplyr always forces a data frame output (for good reasons).
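You need an extra pipe. Because [ binds more tightly than %>%, the brackets are parsed as part of the right-hand side and filter() never receives iris as its data: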
iris %>% filter(Sepal.Length>6) %>% .[,1:3]
Sorry, forgot the . before the brackets.
Note: Your code will probably be more readable if you stick to the tidyverse syntax and use select as the last operation.
iris %>%
  filter(Sepal.Length > 6) %>%
  select(1:3)
The dplyr-native way of doing this is to use select:
iris %>% filter(Sepal.Length > 6) %>% select(1:3)
You could also use {} so that the filtering is done before [ is applied:
{iris %>% filter(Sepal.Length>6)}[,c(1:3)]
Or, as suggested in another answer, use the . notation to indicate where the data should go in relation to [:
iris %>% filter(Sepal.Length>6) %>% .[,1:3]
You can also load magrittr explicitly and use extract, which is a "pipe-able" version of [:
library(magrittr)
iris %>% filter(Sepal.Length>6) %>% extract( ,1:3)
The blog entry you reference is old in dplyr time - about 3 years old. dplyr has been changing a lot. I don't know whether the blog's suggestion worked at the time it was written or not, but I'd recommend finding more recent sources to learn about this frequently changing package.
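For the vector output on a single column mentioned in the question, dplyr's pull() may be an option. A minimal sketch, with Petal.Length chosen only as an example column and petal_lengths as a made-up name:

```r
library(dplyr)

# pull() returns the column as a plain vector rather than a data frame.
petal_lengths <- iris %>%
  filter(Sepal.Length > 6) %>%
  pull(Petal.Length)
```

This keeps the whole chain in pipe syntax and does not depend on how [ behaves for data frames versus tibbles.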
I am trying to use the function mutate_if from the dplyr package to convert all the character columns to factor columns. I am aware of alternative approaches for this transformation, but I am curious to see how mutate_if works. I tried the following command:
df <-df %>% mutate_if(is.character,as.factor)
But I am getting this message:
could not find function mutate_if
I reinstalled dplyr, but I am still getting the same error message.
When I was having this problem, I also got the error message "there is no package called 'pillar'" while loading the dplyr package. (Re-installing dplyr didn't seem to install pillar.) Installing the pillar package allowed things to work again.
df <-df %>% dplyr::mutate_if(is.character,as.factor)
Try it like this. When I faced the same error, I called the function with the dplyr:: prefix as shown.
If your installed version of dplyr does not provide mutate_if (older releases did not), you could use this longer form, which extracts the names of the character columns itself:
dt1 <- dt1 %>% mutate_each_(funs(as.factor),names(which (sapply(dt1, class) == 'character',arr.ind = TRUE)))
If you already have a list of the variables to convert, say in an object called "varlist", you can use this one:
dt1 <- dt1 %>% mutate_each_(funs(as.factor),varlist)
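In recent dplyr releases (1.0.0 and later), the same conversion can be written with across() and where(), which supersede mutate_if() and the older mutate_each_() family. A small sketch on a made-up data frame:

```r
library(dplyr)

# Toy data, invented for illustration.
df <- data.frame(
  id = 1:3,
  city = c("a", "b", "a"),
  grade = c("x", "y", "z"),
  stringsAsFactors = FALSE
)

# Convert every character column to a factor; other columns are untouched.
df <- df %>% mutate(across(where(is.character), as.factor))
```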