Is there a shortcut to comment out the pipe operator (%>%) on a given line? It would be great to have something like CMD+SHIFT+M+C that comments out just the pipe. This is useful when you have a long chain of pipes and want to check your code along the way (especially when hunting for errors).
Example case (no reprex needed)
df %>%
select(col1, col2) %>%
filter(col1 > 5) %>%
mutate(col_sum = sum(col1, col2))
Shortcut to comment out just the pipe operator (say, to just see if the filter worked properly)
df %>%
select(col1, col2) %>%
filter(col1 > 5) #%>%
mutate(col_sum = sum(col1, col2))
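A workaround that avoids editing the pipe operator at all (a sketch, not an RStudio shortcut; the data frame below is made up for illustration): end the chain with identity(), so any step, including the last one, can be commented out as a whole line with the usual comment shortcut.

```r
library(dplyr)

# Hypothetical toy data standing in for df
df <- data.frame(col1 = c(1, 6, 10), col2 = c(2, 3, 4))

checked <- df %>%
  select(col1, col2) %>%
  filter(col1 > 5) %>%
  # mutate(col_sum = col1 + col2) %>%   # whole step commented out
  identity()                             # returns its input unchanged

print(checked)
```

Because identity() is always the last step, every line above it ends in %>% and can be switched on or off independently.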
Related
I am trying to write the following function, to which I will pass a dataset's column name to use in the group_by clause.
counter<-function(df,col_name){
  a<-df %>%
    group_by(col_name) %>%
    count() %>%
    arrange(desc(n))
  return(a)
}
So if I try for example:
fraud_continent<-counter(fraud,continent_source1)
where fraud is the dataset and continent_source1 is a column name from this dataset, the function won't work and the error I get is:
Error: Must group by variables found in .data.
Column col_name is not found.
How do I solve this?
You can use the curly-curly operator ({{ }}).
counter<-function(df,col_name){
  a<-df %>%
    group_by({{col_name}}) %>%
    count() %>%
    arrange(desc(n))
  return(a)
}
Also you can do this without group_by -
counter<-function(df,col_name){
  a<-df %>%
    count({{col_name}}) %>%
    arrange(desc(n))
  return(a)
}
This can be called as -
fraud_continent<-counter(fraud,continent_source1)
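Since the fraud dataset is not available here, the same function can be tried on the built-in mtcars data. This is only an illustrative run, with cyl standing in for continent_source1:

```r
library(dplyr)

counter <- function(df, col_name) {
  df %>%
    count({{ col_name }}) %>%   # {{ }} forwards the bare column name to count()
    arrange(desc(n))
}

# cyl plays the role of continent_source1
cyl_counts <- counter(mtcars, cyl)
print(cyl_counts)
```

The result has one row per distinct value of the column, sorted by frequency.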
We could use ensym with !!
library(dplyr)
counter <- function(df, colname){
  df %>%
    count(!! rlang::ensym(colname)) %>%
    arrange(desc(n))
}
and then it can be called as either
fraud_continent<-counter(fraud,continent_source1)
Or
fraud_continent<-counter(fraud, "continent_source1")
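A quick check on the built-in mtcars data (used here only as a stand-in for fraud) suggests the two calling styles are interchangeable, because ensym() turns either a bare name or a string into a symbol:

```r
library(dplyr)

counter <- function(df, colname) {
  df %>%
    count(!!rlang::ensym(colname)) %>%
    arrange(desc(n))
}

by_name   <- counter(mtcars, cyl)     # bare column name
by_string <- counter(mtcars, "cyl")   # column name as a string

print(identical(by_name, by_string))
```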
Update:
Thanks to akrun's suggestion: .data[[col_name]] is a better choice than df[, col_name].
First answer:
Or you could use df[, col_name]:
library(dplyr)
counter<-function(df,col_name){
  a<-df %>%
    group_by(df[,col_name]) %>%
    count() %>%
    arrange(desc(n))
  return(a)
}
fraud_continent<-counter(fraud,"continent_source1")
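The update mentions .data[[col_name]] without showing it. A minimal sketch of what that version might look like, assuming the column name is passed as a string and using mtcars in place of fraud:

```r
library(dplyr)

counter <- function(df, col_name) {
  df %>%
    count(.data[[col_name]]) %>%   # .data pronoun looks the column up by its string name
    arrange(desc(n))
}

cyl_counts <- counter(mtcars, "cyl")
print(cyl_counts)
```

Unlike df[, col_name], the .data pronoun refers to the data at the current step of the pipeline, so it keeps working if earlier steps have filtered or modified the data frame.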
Pretty basic but I don't think I really understand the change:
library(dplyr)
library(lubridate)
Lab_import_sql <- Lab_import %>%
select_if(~sum(!is.na(.)) > 0) %>%
mutate_if(is.factor, as.character) %>%
mutate_if(is.character, funs(ifelse(is.character(.), trimws(.),.))) %>%
mutate_at(.vars = Lab_import %>% select_if(grepl("'",.)) %>% colnames(),
.funs = gsub,
pattern = "'",
replacement = "''") %>%
mutate_if(is.character, funs(ifelse(is.character(.), paste0("'", ., "'"),.))) %>%
mutate_if(is.Date, funs(ifelse(is.Date(.), paste0("'", ., "'"),.)))
Edit:
Thanks everyone for the input, here's reproducible code and my solution:
library(dplyr)
library(lubridate)
import <- data.frame(Test_Name = "Fir'st Last",
Test_Date = "2019-01-01",
Test_Number = 10)
import_sql <-import %>%
select_if(~!all(is.na(.))) %>%
mutate_if(is.factor, as.character) %>%
mutate_if(is.character, trimws) %>%
mutate_if(is.character, list(~gsub("'", "''",.))) %>%
mutate_if(is.character, list(~paste0("'", ., "'"))) %>%
mutate_if(is.Date, list(~paste0("'", ., "'")))
As of dplyr 0.8.0, the documentation states that we should use list instead of funs, giving the example:
Before:
funs(name = f(.))
After:
list(name = ~f(.))
So here, the call funs(ifelse(is.character(.), trimws(.), .)) can instead become list(~ifelse(is.character(.), trimws(.), .)). This uses the tidyverse formula notation for anonymous functions: a one-sided formula (an expression beginning with ~) is interpreted as a function of one argument, and a dot (.) stands wherever that argument would appear in the body. You can still use full functions inside list.
Note the difference between the .funs argument of mutate_if and the funs() function which wrapped other functions to pass to .funs; i.e. .funs = gsub still works. You only needed funs() if you needed to apply multiple functions to selected columns or to name them something by passing them as named arguments. You can do all the same things with list().
You also are duplicating work by adding ifelse inside mutate_if; that line could be simplified to mutate_if(is.character, trimws) since if the column is character already you don't need to check it again with ifelse. Since you apply only one function, no need for funs or list at all.
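To make the three forms concrete, here is a small sketch on made-up data (the import data frame below is hypothetical, not the asker's Lab_import). It also shows across(), which superseded the scoped verbs such as mutate_if() in dplyr 1.0.0:

```r
library(dplyr)

# Hypothetical toy data
import <- data.frame(
  Test_Name = c("  Fir'st Last ", "Plain"),
  Test_Number = c(10, 20),
  stringsAsFactors = FALSE
)

# A bare function, a list of formulas, and a single formula all work as .funs
scoped <- import %>%
  mutate_if(is.character, trimws) %>%
  mutate_if(is.character, list(~ gsub("'", "''", .))) %>%
  mutate_if(is.character, ~ paste0("'", ., "'"))

# Same transformation written with across() (dplyr >= 1.0.0)
with_across <- import %>%
  mutate(across(where(is.character),
                ~ paste0("'", gsub("'", "''", trimws(.x)), "'")))

print(scoped)
```

With a single unnamed function or formula, the selected columns are overwritten in place rather than suffixed with a function name.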
Is there a command that can be added to a tidyverse pipeline without breaking the flow, but that produces some side effect, like printing something out? The use case I have in mind is something like this. Given a pipeline
data %>%
mutate(new_var = <some time consuming operation>) %>%
mutate(new_var2 = <some other time consuming operation>) %>%
...
I would like to add some command to the pipeline that would not modify the end result, but would print out some progress or the state of things. Maybe something like this:
data %>%
mutate(new_var = <some time consuming operation>) %>%
command_x(print("first operation done")) %>%
mutate(new_var2 = <some other time consuming operation>) %>%
...
Does such a command_x already exist?
For the specific case of printing an intermediate step in the pipeline, just use %>% print() %>%. E.g.,
mtcars %>%
filter(cyl == 4) %>%
print() %>%
summarise(mpg = mean(mpg))
For a simple status message, you'd do:
pipe_message = function(.data, status) {message(status); .data}
mtcars %>%
filter(cyl == 4) %>%
pipe_message("first operation done") %>%
select(cyl)
See the answer by @MrFlick for a more general solution for non-print functions.
You could easily write your own function
pass_through <- function(data, fun) {fun(data); data}
And use it like
mtcars %>% pass_through(. %>% ncol %>% print) %>% nrow
Here we use the . %>% syntax to create an anonymous function. You could also write your own more explicitly with
mtcars %>% pass_through(function(x) print(ncol(x))) %>% nrow
You can do it on the fly with an anonymous function:
mtcars %>% ( function(x){print(x); return(x)} ) %>% nrow()
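Another option, not mentioned in the answers above, is magrittr's tee operator %T>%, which evaluates its right-hand side for the side effect and then passes the left-hand side along unchanged. A minimal sketch, assuming magrittr is attached (dplyr re-exports %>% but not %T>%):

```r
library(dplyr)
library(magrittr)  # provides %T>%

result <- mtcars %>%
  filter(cyl == 4) %T>%
  { message("first operation done, rows: ", nrow(.)) } %>%  # side effect only
  summarise(mpg = mean(mpg))

print(result)
```

The braces matter: they let the dot be used inside an arbitrary expression instead of being inserted as the first argument of message().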
I'm trying to write a function that does a split-apply-combine for which the split variable(s) are parameters, and - importantly - a null split is acceptable. For example, running statistics either on subsets of data or on the entire dataset.
somedata=expand.grid(a=1:3,b=1:3)
somefun=function(df_in,grpvars=NULL){
df_in %>% group_by_(.dots=grpvars) %>% nest() %>%
mutate(X2.Resid=map(data,~with(.x,chisq.test(b)$residuals))) %>%
unnest(data,X2.Resid) %>% return()
}
somefun(somedata,"a") # This works
somefun(somedata) # This fails
The null condition fails because nest() seems to need a variable to nest by, rather than nesting the entire df into a 1x1 data.frame. I can get around this as follows:
somefun2=function(df_in,grpvars="Dummy"){
df_in$Dummy=1
df_in %>% group_by_(.dots=grpvars) %>% nest() %>%
mutate(X2.Resid=map(data,~with(.x,chisq.test(b)$residuals))) %>%
unnest(data,X2.Resid) %>%
select(-Dummy) %>% return()
}
somefun2(somedata) # This works
However, I'm wondering if there is a more elegant way to fix this, without needing the dummy variable?
Hmm, that behavior is a little surprising to me. A fix is easy though: you just have to make sure you nest everything():
somefun3 <- function(df_in, grpvars = NULL) {
df_in %>%
group_by_(.dots = grpvars) %>%
nest(everything()) %>%
mutate(X2.Resid = map(data, ~with(.x, chisq.test(b)$residuals))) %>%
unnest()
}
somefun3(somedata, "a")
somefun3(somedata)
Both work.