dplyr::recode: Why does the pipe generate an error?

If I use recode in a pipe, I get an error:
df <- df %>%
recode(unit, .missing="g")
Error in UseMethod("recode") : no applicable method for 'recode'
applied to an object of class "c('tbl_df', 'tbl', 'data.frame')"
If I pull it out of the pipe, it works fine:
df$unit <- recode(df$unit, .missing="g")
Any ideas why? I'd like to stay in the pipe if possible.

The dplyr equivalent of the base R solution is to use recode inside mutate:
df %>%
mutate(unit = recode(unit, .missing="g"))
Chaining recode directly after %>% passes the data frame to recode as its first argument, which does not match recode's signature. Its first argument, .x, must be a vector; unlike some other dplyr functions, recode does not use non-standard evaluation to interpret unit as the column of that name in df. Most functions designed for direct use with the pipe take a data frame as their first argument and return a data frame. The magrittr documentation describes how the pipe works in more detail.
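To make the difference concrete, here is a minimal sketch on a made-up data frame (the column values are assumptions for illustration, and the character column stands in for the asker's unit column). It shows that x %>% f(y) is evaluated as f(x, y), which is why recode has to be wrapped in mutate to receive the column rather than the whole data frame.

```r
library(dplyr)

# Hypothetical data: a character column with one missing value
df <- data.frame(unit = c("kg", NA, "mg"), stringsAsFactors = FALSE)

# df %>% recode(unit, .missing = "g") is the same call as
# recode(df, unit, .missing = "g"), so recode receives a data frame
# and fails at method dispatch
piped <- try(df %>% recode(unit, .missing = "g"), silent = TRUE)
inherits(piped, "try-error")

# Inside mutate, `unit` is looked up as a column, so recode gets a vector
fixed <- df %>%
  mutate(unit = recode(unit, .missing = "g"))
fixed$unit
```

The try() wrapper is only there so the failing call does not stop the script; in an interactive session the first call reproduces the error from the question.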

Related

Using plyr to count with a which condition

I am trying to use the which function in conjunction with the count function. I would like to count the number of factors that follow a which condition. This code isn't correct, but any advice would be appreciated.
library(plyr)
count(data, 'factor', which numeric > 10)
#Base version attempt
count(data$factor, which(data$numeric > 10))
Error in UseMethod("group_by_") :
no applicable method for 'group_by_' applied to an object of class "factor"
This isn't exactly what you're looking for, but here are two pieces of advice:
plyr is the predecessor of dplyr, so I would use the newer package, especially since it is part of the tidyverse. dplyr's count can deal with factors.
If you don't need factor behaviour for that column, I would suggest simply coercing it with as.character.
With dplyr you could write something like:
data %>% filter(numeric > 10) %>% count(factor)
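As a rough illustration of that pipeline, the sketch below runs it on a small invented data frame. The column names factor and numeric follow the question; the values are assumptions made up for the example.

```r
library(dplyr)

# Hypothetical data standing in for `data` in the question
data <- data.frame(
  factor  = c("a", "a", "b", "b", "c"),
  numeric = c(5, 12, 20, 30, 8)
)

# Keep only rows where numeric > 10, then count rows per value of `factor`
counts <- data %>%
  filter(numeric > 10) %>%
  count(factor)
counts
```

The result has one row per remaining value of factor and a column n holding the number of rows; values with no rows above the threshold (here "c") do not appear.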

do.call for all dataframe columns with lazy evaluation and dplyr function

My question is similar to earlier ones, but my problem is more complex as it requires multiple dplyr operations and lazy evaluation.
This is my function:
stats <- function(col_names){
require("dplyr")
data %>%
group_by_(col_names) %>%
summarise(Count = n()) %>%
mutate(Percent = prop.table(Count)) -> temp
write.csv(temp, file=paste(col_names,".csv",sep="_"))}
Then I want to pass every column name as an argument with do.call.
col_names <- names(data)
do.call(stats, as.list(col_names))
But I get a common error:
Error in (function (col_names) :
unused arguments ("loans_approved_amount_limit_in_account", "loans_approved_amount_limit_in_ron")
The function works if I enter the column names separately. But I have over 1000 columns, so I need to automate the process.
do.call is used to supply multiple function arguments to a single execution of a function. For example, instead of writing paste('c', 1:2) we can use a list of arguments such that do.call(paste, list('c', 1:2)) gives the same result.
So in your case, do.call is the same as running stats(col1, col2, col3, ...). You can easily see that this won't work, since stats only takes one argument. That's why the error you get speaks about unused arguments.
What you want to do instead is run your function multiple times, each time with a single argument. One way to do that is lapply:
lapply(names(data), stats)
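The contrast can be sketched in base R alone. Here describe is a hypothetical one-argument stand-in for stats that only builds the file name, and cols is an invented vector of column names.

```r
# do.call spreads the list over the arguments of ONE call
one_call <- do.call(paste, list("c", 1:2))   # same as paste("c", 1:2)

# Hypothetical one-argument function, standing in for stats in the question
describe <- function(col_name) paste(col_name, ".csv", sep = "_")

cols <- c("a", "b", "c")

# This is describe("a", "b", "c"): one call with three arguments, which fails
# because describe only accepts one
spread_call <- try(do.call(describe, as.list(cols)), silent = TRUE)

# lapply calls describe once per element instead
per_column <- lapply(cols, describe)
```

Since stats is called only for its side effect of writing a file, the list that lapply returns can simply be ignored.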

Could not find function mutate_if

I am trying to use the function mutate_if from the dplyr package to convert all the character columns to factor columns. I am aware of alternative approaches for this transformation, but I am curious to see how mutate_if works. I tried the following command:
df <-df %>% mutate_if(is.character,as.factor)
But I am getting this message:
could not find function mutate_if
I reinstalled dplyr, but I am still getting the same error message.
When I was having this problem, I also got the error message "there is no package called 'pillar'" while loading the dplyr package. (Re-installing dplyr didn't seem to install pillar.) Installing the pillar package allowed things to work again.
df <-df %>% dplyr::mutate_if(is.character,as.factor)
Try it like this. When I faced the same error, I called the function this way, with the dplyr:: prefix.
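A minimal sketch of that namespaced call on a made-up data frame, assuming the installed dplyr exports mutate_if. The column names and values are invented for illustration, and the function is called directly so the example does not depend on %>% being attached.

```r
# Hypothetical data with one character and one numeric column
df <- data.frame(name = c("x", "y"), value = c(1, 2), stringsAsFactors = FALSE)

# The dplyr:: prefix looks the function up in the package namespace,
# so it does not rely on library(dplyr) having attached successfully
df <- dplyr::mutate_if(df, is.character, as.factor)

# Only the character column is converted
sapply(df, class)
```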
If mutate_if is not available in your version of dplyr, you could use the following, which is rather long because it first has to extract the names of the character columns:
dt1 <- dt1 %>% mutate_each_(funs(as.factor),names(which (sapply(dt1, class) == 'character',arr.ind = TRUE)))
If you already have the names of the variables to convert, let's say in an object called varlist, you can use this one:
dt1 <- dt1 %>% mutate_each_(funs(as.factor),varlist)

Pass a string variable to spread function in dplyr

I am trying to write a function that passes a string variable into a dplyr pipeline, but I am having a problem. For example:
col_spread <- "speed"
In select(), I can use get(col_spread) to select the column named speed.
df %>% select(get(col_spread))
However, when I use the spread function (which comes from tidyr rather than dplyr):
df %>% spread(key = Key_col, value = get(col_spread))
Error: Invalid column specification
It doesn't work.
Is NSE the only way to go? If so, what should I do?
Thank you!
Actually, get really isn't a great idea. It would be better to use the standard evaluation version of select:
df %>% select_(col_spread)
and then for spread it would look like
df %>% spread_("Key_col", col_spread)
Note which values are quoted and which are not: spread_ expects two character values.

mutate_each_ non-standard evaluation

I am really struggling with putting dplyr functions inside my own functions. I understand the function_ suffix for the standard evaluation versions, but I am still having problems and have seemingly tried every combination of eval, paste and lazy.
I am trying to divide multiple columns by the median of the control for each group. The example data adds a column named 'Control' to iris, so each species has 40 'normal' and 10 'control' rows.
data(iris)
control <- rep(c(rep("normal", 40), rep("control", 10)), 3)
iris$Control <- control
Normal dplyr works fine:
out_df <- iris %>%
group_by(Species) %>%
mutate_each(funs(./median(.[Control == "control"])), 1:4)
Trying to wrap this up into a function:
norm_iris <- function(df, control_col, control_val, species, num_cols = 1:4){
out <- df %>%
group_by_(species) %>%
mutate_each_(funs(./median(.[control_col == control])), num_cols)
return(out)
}
norm_iris(iris, control_col = "Control", control_val = "control", species = "Species")
I get the error:
Error in UseMethod("as.lazy_dots") :
no applicable method for 'as.lazy_dots' applied to an object of class "c('integer', 'numeric')"
Using funs_ instead of funs, I get Error:...: need numeric data
If you haven't already, it might help to read the dplyr vignette on standard evaluation, although it sounds like some of this may be changing soon.
Your function is missing the use of interp from package lazyeval in the mutate_each_ line. Because you are trying to use a variable name (the Control variable) in the funs, you need funs_ in this situation along with interp. Notice that this is a situation where you don't need mutate_each_ at all. You would need it if you were trying to use column names instead of column numbers when selecting the columns you want to mutate.
Here is what the line would look like in your function instead of what you have:
mutate_each(funs_(interp(~./median(.[x == control_val]), x = as.name(control_col))),
num_cols)
