I would like to assign a text to a variable and then use that variable within my pipeline. I extensively use gather and select.
In the example below, I want to be able to use x within my pipeline code:
library(tidyverse)
mtcars %>% head
mtcars %>%
gather(type, value, mpg:am) %>% head
mtcars %>% select(mpg:am) %>% head
This is the variable I want to use:
x <- "mpg:am"
None of what I have tried has worked:
mtcars %>%
gather(type, value, get(x)) %>% head
mtcars %>%
gather(type, value, !!rlang::sym(x)) %>% head
mtcars %>% select(x) %>% head
mtcars %>% select(!!rlang::sym(x)) %>% head
Any ideas?
We can capture the expression with quo() and then unquote it with !! where it is needed:
x <- quo(mpg:am)
out1 <- mtcars %>%
gather(type, value, !! x)
Checking the output with
out2 <- mtcars %>%
gather(type, value, mpg:am)
identical(out1, out2)
#[1] TRUE
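If the selection really has to start life as a character string, as in the question, one possible route is to parse the string into an expression first and then unquote it. This is a minimal sketch, assuming the rlang package is installed; the names sel, out_select and out_gather are only illustrative.

```r
library(dplyr)
library(tidyr)
library(rlang)

x <- "mpg:am"

# Turn the string into an unevaluated call (mpg:am), then unquote it with !!
sel <- parse_expr(x)

out_select <- mtcars %>% select(!!sel)
out_gather <- mtcars %>% gather(type, value, !!sel)

# Same result as writing the selection by hand
identical(out_select, mtcars %>% select(mpg:am))
```

Parsing arbitrary strings runs whatever code they contain, so this is probably best reserved for strings you control.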
Related
I have a set of chains of pipe operators (%>%) doing different things with different datasets.
For instance:
dataset %>%
mutate(...) %>%
filter(...) %>%
rowwise() %>%
summarise() %>%
etc...
If I want to reuse some parts of these chains, is there a way to do it, without just wrapping it into a function?
For instance (in pseudocode obviously):
subchain <- filter(...) %>%
rowwise() %>%
summarise()
# and then instead of the chain above it would be:
dataset %>%
mutate(...) %>%
subchain() %>%
etc...
Similar in syntax to desired pseudo-code:
library(dplyr)
subchain <- . %>%
filter(mass > mean(mass, na.rm = TRUE)) %>%
select(name, gender, homeworld)
all.equal(
starwars %>%
group_by(gender) %>%
filter(mass > mean(mass, na.rm = TRUE)) %>%
select(name, gender, homeworld),
starwars %>%
group_by(gender) %>%
subchain()
)
This works by using a dot (.) as the start of the piping sequence. In effect it is close to wrapping the steps in a function; magrittr calls the result a functional sequence. See ?functions and try magrittr::functions(subchain).
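A small sketch of what a functional sequence is, using only base functions inside the chain so that it runs without any dataset. The name root_sum is made up for this illustration.

```r
library(magrittr)

# A pipeline that starts with a dot is stored as a function of one argument
root_sum <- . %>% sqrt() %>% sum()

is.function(root_sum)        # TRUE
root_sum(c(1, 4, 9))         # 6, i.e. sum(sqrt(c(1, 4, 9)))

# functions() lists the individual steps of the sequence
steps <- functions(root_sum)
length(steps)                # 2
```

Because the result is an ordinary function, it can be reused in other pipelines or passed to lapply() like any other function.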
I'm new to R and am trying to explore my variables by group. I'm using a for loop to pass all suitable variable names to expss.
Here is a reproducible example:
require(expss)
require(dplyr)
colnoms <- as.data.frame(HairEyeColor) %>% names(.)
expss_digits(2)
for (i in colnoms){
as.data.frame(HairEyeColor) %>%
tab_cells(get(i)) %>%
tab_cols(Eye) %>%
tab_stat_cpct() %>%
tab_last_sig_cpct() %>%
tab_pivot() %>%
set_caption(i) %>%
htmlTable() %>%
print()
}
I expect the name of the variable in the output (Hair, Eye, Color), but instead I get only "get(i)".
Thanks for any advice
After get() there is no way to know the original variable name. The simplest way to show the original name is to set the variable name as its label:
require(expss)
data(HairEyeColor)
HairEyeColor <- as.data.frame(HairEyeColor)
colnoms <- names(HairEyeColor)
expss_digits(2)
for (i in colnoms){
# if we don't have label we assign name as label
if(is.null(var_lab(HairEyeColor[[i]]))) var_lab(HairEyeColor[[i]]) = i
HairEyeColor %>%
tab_cells(get(i)) %>%
tab_cols(Eye) %>%
tab_stat_cpct() %>%
tab_last_sig_cpct() %>%
tab_pivot() %>%
set_caption(i) %>%
htmlTable() %>%
print()
}
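As a toy illustration of why the header reads "get(i)": a function that labels its argument by the expression it was called with only ever sees the text of that expression. The helper label_of below is hypothetical and is not part of expss; the package's actual labelling logic may differ.

```r
# Hypothetical helper: returns the expression used for its argument, as text
label_of <- function(x) deparse(substitute(x))

df <- as.data.frame(HairEyeColor)
i <- "Hair"

lab_direct <- label_of(df$Hair)   # the expression itself carries the column name
lab_get    <- label_of(get(i))    # only the call get(i) is visible here

lab_direct
lab_get
```

An explicit label attached to the data, as in the loop above, avoids relying on the calling expression at all.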
I'd like to produce nice summaries for a selection of grouping variables in my dataset, where for each group I would show the top 6 frequencies and their associated proportions. I can get this for a single grouping variable using the syntax:
my_db %>%
group_by(my_var) %>%
summarise(n=n()) %>%
mutate(pc=scales::percent(n/sum(n))) %>%
arrange(desc(n)) %>%
head()
How do I modify this expression so it can be used in an apply function?
For example using mtcars, I've tried something like this:
apply(mtcars[c(2:4,11)], 2,
function(x) {
group_by(!!x) %>%
summarise(n=n()) %>%
mutate(pc=scales::percent(n/sum(n))) %>%
arrange(desc(n)) %>% head()
}
)
but it doesn't work. Any idea how I can achieve this?
Apply over colnames(dat) rather than over the columns themselves, so that each name can be turned into the grouping variable:
dat <- mtcars[c(2:4,11)]
grp <- function(x) {
group_by(dat,!!as.name(x)) %>%
summarise(n=n()) %>%
mutate(pc=scales::percent(n/sum(n))) %>%
arrange(desc(n)) %>% head()
}
lapply(colnames(dat), grp)
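A variation on the same idea, sketched with count() and the .data pronoun instead of !!as.name(x). The function name top_freq and the column name value are made up for this example, and the proportion is left numeric here rather than formatted with scales::percent.

```r
library(dplyr)

dat <- mtcars[c(2:4, 11)]

# Top six frequencies and proportions for one column, given its name as a string
top_freq <- function(col) {
  dat %>%
    count(value = .data[[col]], sort = TRUE) %>%  # group, tally and sort by n
    mutate(pc = n / sum(n)) %>%
    head()
}

# setNames(nm = ...) keeps the column names as names of the result list
res <- lapply(setNames(nm = colnames(dat)), top_freq)
res$cyl
```

Naming the list makes it easier to tell afterwards which summary belongs to which variable.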
apply(mtcars[c(2:4,11)], 2,
function(x) {
mtcars %>%
group_by(x= !!x) %>%
summarise(n=n()) %>%
mutate(pc=scales::percent(n/sum(n))) %>%
arrange(desc(n)) %>% head()
}
)
You just need to supply the parent data frame so that the expression has something to be evaluated against.
Let's say I want to split mtcars into 3 CSV files based on their cyl grouping. I can use mutate to do this, but it will create a NULL column in the output.
library(tidyverse)
by_cyl = mtcars %>%
group_by(cyl) %>%
nest()
by_cyl %>%
mutate(unused = map2(data, cyl, function(x, y) write.csv(x, paste0(y, '.csv'))))
Is there a way to do this on the by_cyl object without calling mutate?
Here is an option using purrr without mutate from dplyr.
library(tidyverse)
mtcars %>%
split(.$cyl) %>%
walk2(names(.), ~write_csv(.x, paste0(.y, '.csv')))
Update
This drops the cyl column before saving the output.
library(tidyverse)
mtcars %>%
split(.$cyl) %>%
map(~ .x %>% select(-cyl)) %>%
walk2(names(.), ~write_csv(.x, paste0(.y, '.csv')))
Update2
library(tidyverse)
by_cyl <- mtcars %>%
group_by(cyl) %>%
nest()
by_cyl %>%
split(.$cyl) %>%
walk2(names(.), ~write_csv(.x[["data"]][[1]], paste0(.y, '.csv')))
Here's a solution with do and group_by; if your data is already grouped as needed, you save one line:
mtcars %>%
group_by(cyl) %>%
do(data.frame(write.csv(.,paste0(.$cyl[1],".csv"))))
data.frame is used here only because do needs to return a data frame, so it's a bit of a hack.
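Another option for already grouped data, sketched under the assumption that the installed dplyr is recent enough to provide group_walk(). Writing to tempdir() and the variable out_dir are choices made for this example only.

```r
library(dplyr)

out_dir <- tempdir()  # assumption: write to a temporary directory for the demo

mtcars %>%
  group_by(cyl) %>%
  # .x is the data for one group (without cyl), .y is a one-row tibble of keys
  group_walk(~ write.csv(.x,
                         file.path(out_dir, paste0(.y$cyl, ".csv")),
                         row.names = FALSE))

file.exists(file.path(out_dir, c("4.csv", "6.csv", "8.csv")))
```

group_walk() is called for its side effect, so no dummy return value or extra column is needed.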
Apply function table() to each column of a data.frame using dplyr
I often apply the table-function on each column of a data frame using plyr, like this:
library(plyr)
ldply( mtcars, function(x) data.frame( table(x), prop.table( table(x) ) ) )
Is it possible to do this in dplyr also?
My attempts fail:
mtcars %>% do( table %>% data.frame() )
melt( mtcars ) %>% do( table %>% data.frame() )
You can try the following which does not rely on the tidyr package.
mtcars %>%
lapply(table) %>%
lapply(as.data.frame) %>%
Map(cbind,var = names(mtcars),.) %>%
bind_rows() %>%  # rbind_all() is no longer available in current dplyr
group_by(var) %>%
mutate(pct = Freq / sum(Freq))
Using tidyverse (dplyr and purrr):
library(tidyverse)
mtcars %>%
map( function(x) table(x) )
Or:
mtcars %>%
map(~ table(.x) )
Or simply:
library(tidyverse)
mtcars %>%
map( table )
In general you probably would not want to run table() on every column of a data frame because at least one of the variables will be unique (an id field) and produce a very long output. However, you can use group_by() and tally() to obtain frequency tables in a dplyr chain. Or you can use count() which does the group_by() for you.
> mtcars %>%
group_by(cyl) %>%
tally()
> # mtcars %>% count(cyl)
Source: local data frame [3 x 2]
cyl n
1 4 11
2 6 7
3 8 14
If you want to do a two-way frequency table, group by more than one variable.
> mtcars %>%
group_by(gear, cyl) %>%
tally()
> # mtcars %>% count(gear, cyl)
You can use spread() of the tidyr package to turn that two-way output into the output one is used to receiving with table() when two variables are input.
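A short sketch of that reshaping step, assuming tidyr is installed; the name two_way is illustrative. The fill = 0 argument stands in for combinations that never occur in the data.

```r
library(dplyr)
library(tidyr)

two_way <- mtcars %>%
  count(gear, cyl) %>%        # long format: one row per observed gear/cyl pair
  spread(cyl, n, fill = 0)    # wide format: one column per cyl value

two_way

# Base R equivalent, for comparison
table(mtcars$gear, mtcars$cyl)
```

The wide result has one row per gear and one column per cyl, which mirrors the layout of the table() output.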
The solution by Caner did not work for me, but the one from commenter akrun (credit goes to him) worked great. I also use a much larger tibble to demonstrate it, and I added an ordering by percent, descending.
library(tidyverse)
library(nycflights13); dim(flights)
tte <- gather(flights, Var, Val) %>%
  group_by(Var) %>% dplyr::mutate(n = n()) %>%
  group_by(Var, Val) %>% dplyr::mutate(n1 = n(), Percent = n1 / n) %>%
  arrange(Var, desc(n1)) %>% unique()