I'm trying to create a function that takes two variables from a dataset and maps their distinct values side by side, after which it writes the result out to a CSV file. I'll be using dplyr's distinct function to get the unique values.
map_table <- function(df, var1, var2){
df_distinct <- df %>% distinct(var1, var2)
write.csv(df_distinct, 'var1.csv')
}
map_table(iris, Species, Petal.Width)
1) map_table(iris, Species, Petal.Width) doesn't produce what I want. It should produce 27 rows of data, instead I'm getting 150 rows of data.
2) How can I name the csv file after the input of var1?
So if var1 = 'Sepal.Length', the name of the file should be 'Sepal.Length.csv'
If you want to pass the column names without quotes, you need to use non-standard evaluation.
deparse(substitute()) will get you the name for the file output.
library(dplyr)
map_table <- function(df, var1, var2){
file_name <- paste0(deparse(substitute(var1)), ".csv") # file name
var1 <- enquo(var1) # non-standard eval
var2 <- enquo(var2) # enquo() captures the expression passed, e.g. Species
df_distinct <- df %>%
distinct(!!var1, !!var2) # non-standard eval, !! tells dplyr to use Species
write.csv(df_distinct, file = file_name)
}
map_table(iris, Species, Petal.Width)
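For comparison, here is a minimal sketch of the same idea using the curly-curly operator, assuming a version of dplyr/rlang that supports `{{ }}` (rlang 0.4.0 or later). The function name `map_table_cc` and the `out_dir` argument are hypothetical additions for illustration and are not part of the original answer.

```r
library(dplyr)

# Hypothetical variant of map_table(); `out_dir` is an assumed extra argument
# so the example writes to a temporary directory instead of the working directory.
map_table_cc <- function(df, var1, var2, out_dir = tempdir()) {
  # as_label() turns the captured expression into a string such as "Species"
  file_name <- file.path(out_dir, paste0(rlang::as_label(rlang::enquo(var1)), ".csv"))
  # {{ }} quotes and unquotes the bare column names in one step
  df_distinct <- df %>% distinct({{ var1 }}, {{ var2 }})
  write.csv(df_distinct, file = file_name, row.names = FALSE)
  invisible(file_name)  # return the path so the caller can check it
}

out <- map_table_cc(iris, Species, Petal.Width)
```

Returning the file path invisibly is a design choice made here only so that the result is easy to inspect.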
You're trying to pass the columns as objects. Try passing their names instead and then use a select helper:
map_table <- function(df, var1, var2){
df_distinct <- df %>% select(one_of(c(var1, var2)))%>%
distinct()
write.csv(df_distinct, 'var1.csv')
}
map_table(iris, 'Species', 'Petal.Width')
1) OK, the answer is to use distinct_ instead of distinct, and the variables passed in need to be quoted.
2) Use paste0 to build the file name from the string, and pass it via file =
map_table <- function(df, var1, var2){
df_distinct <- df %>% distinct_(var1, var2)
write.csv(df_distinct, file = paste0(var1, '.csv')) # paste0 avoids the space that paste() would insert before '.csv'
}
map_table(iris, 'Species', 'Petal.Width')
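The underscore verbs such as distinct_ are deprecated in current dplyr releases. A minimal sketch of a string-based alternative, assuming dplyr 1.0 or later (for `across()` and `all_of()`), could look like this. The helper name `distinct_by_name` is hypothetical.

```r
library(dplyr)

# Hypothetical helper: distinct combinations of two columns given as strings.
distinct_by_name <- function(df, var1, var2) {
  # all_of() selects columns by character vector and errors if one is missing
  df %>% distinct(across(all_of(c(var1, var2))))
}

res <- distinct_by_name(iris, "Species", "Petal.Width")

# paste0() concatenates without a separator, so no stray space in the name
file_name <- paste0("Species", ".csv")
```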
Related
How do I drop a variable with the same name in a list of dataframes using map? Sadly the variable appears in a different position in each data frame, so I can't drop it using its position. It has to be with its name.
var1<-rnorm(100)
var2<-sample(letters, 100, replace=T)
var3<-rnorm(100)
df<-data.frame(var1, var2, var3)
df2<-data.frame(var1, var3, var2)
list1<-list(df, df2)
library(purrr)
#This works, but it won't help me because var2 is in different positions.
list1 %>%
map(., `[`, -2)
#This does not work.
list1 %>%
map(., `[`, -c("var2"))
You can do
map(list1, ~ .x %>% select(-var2))
Or using NSE with a curly-curly expression
name_excl <- "var2"
map(list1, ~ .x %>% select(-{{name_excl}}))
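If the column name is held in a character variable, another option is the `any_of()` select helper, which also tolerates data frames where the column is absent. This is a sketch assuming dplyr 1.0 or later; the small data frames below are made up for illustration.

```r
library(dplyr)
library(purrr)

# Illustrative data: the same columns in a different order in each data frame
df  <- data.frame(var1 = 1:5, var2 = letters[1:5], var3 = 6:10)
df2 <- df[, c("var1", "var3", "var2")]
list1 <- list(df, df2)

name_excl <- "var2"

# Drop the column by name, wherever it sits; any_of() ignores missing names
dropped <- map(list1, ~ select(.x, -any_of(name_excl)))
```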
I'm trying to create a function that essentially gets me the MODE, or MODE-X (the 2nd to Xth most common value), and the associated counts for each column in a data frame.
I can't figure out what I'm missing and I'm looking for some assistance. I believe it has to do with passing a variable into a dplyr function.
library(tidyverse)
myfunct_get_mode = function(x, rank=1){
mytable = dplyr::count(rlang::sym(x), sort = TRUE)
names(mytable)= c('variable','counts')
# return just the rank specified...such as mode or mode -1, etc
result = table %>% dplyr::slice(rlang::sym(rank))
return(result)
}
mtcars %>% lapply(. %>% (function(x) myfunct_get_mode(x, rank=2)))
There are some problems with your function:
Your function call is not doing what you think. Check with mtcars %>% lapply(. %>% (function(x) print(x))) that your x is actually a whole column of mtcars. To get the column names, apply the function to names(mtcars). But then you also have to specify the data frame you're working on.
To evaluate the symbol that sym returns, you need to put !! in front of rlang::sym(x).
rank is not a variable name, thus no need for rlang::sym here.
table should be mytable in the second-to-last line of your function.
So how could it work (although there are probably better ways):
myfunct_get_mode = function(df, x, rank=1){
mytable = count(df, !!rlang::sym(x), sort = TRUE)
names(mytable)= c('variable','counts')
# return just the rank specified...such as mode or mode -1, etc
result = mytable %>% slice(rank)
return(result)
}
names(mtcars) %>% lapply(function(x) myfunct_get_mode(mtcars, x, rank=2))
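Since lapply over a data frame already hands each column to the function as a vector, the same result can be sketched in base R without any tidy evaluation. The helper name `nth_mode` is hypothetical, and this is only one possible approach.

```r
# Hypothetical helper: the rank-th most common value of a vector and its count.
nth_mode <- function(x, rank = 1) {
  # table() counts each value; sort() puts the most frequent first
  counts <- sort(table(x), decreasing = TRUE)
  data.frame(variable = names(counts)[rank],
             counts   = as.integer(counts[rank]))
}

# One small data frame per column of mtcars, here for the second most common value
modes <- lapply(mtcars, nth_mode, rank = 2)
```

Note that the values come back as character strings because they are taken from the names of the table.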
If we need this in a list, we can use map
f1 <- function(dat, rank = 1) {
purrr::imap(dat, ~
dat %>%
count(!! rlang::sym(.y)) %>%
rename_all(~ c('variable', 'counts')) %>%
arrange(desc(counts)) %>%
slice(seq_len(rank))) #%>%
#bind_cols - convert to a data.frame
}
f1(mtcars, 2)
I have the code below and need to run it repeatedly, changing only var1 and var2. I would like to turn it into a function, but I could not. Can you make a function out of this simple piece of work?
a12 = data %>% group_by(var1,var2)%>% tally
a12_1 <- data %>% group_by(var1) %>% tally
a12_2 = merge(a12,a12_1,by="var1")
a12_2$perc = a12_2[,3] / a12_2[,4]
The challenge for me is how to handle these arguments when creating the function.
a_fun <- function(data,var1,var2)
I'm guessing you're struggling with non-standard evaluation. If you append _ to the dplyr functions, you can pass strings as arguments. I've not tested it, but you could try:
a_fun <- function(data, var1, var2) {
a12 <- data %>% group_by_(var1, var2) %>% tally()
a12_1 <- data %>% group_by_(var1) %>% tally()
a12_2 <- merge(a12, a12_1, by = var1)
a12_2$perc <- a12_2[, 3] / a12_2[, 4]
return(a12_2)
}
e.g.
a_fun(data, "col1", "col2")
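The underscore verbs such as group_by_ are deprecated in current dplyr releases. A minimal sketch of the same calculation with string arguments, assuming dplyr 1.0 or later, is shown below. The function name `a_fun_chr` is hypothetical, and mtcars stands in for the unspecified data.

```r
library(dplyr)

# Hypothetical rewrite: count var1/var2 combinations and express each count
# as a share of its var1 group, which replaces the merge of the two tallies.
a_fun_chr <- function(data, var1, var2) {
  data %>%
    group_by(across(all_of(c(var1, var2)))) %>%
    summarise(n = n(), .groups = "drop") %>%
    group_by(across(all_of(var1))) %>%
    mutate(perc = n / sum(n)) %>%   # n divided by the var1 total
    ungroup()
}

res <- a_fun_chr(mtcars, "cyl", "gear")
```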
It's still not fully clear to me how I can pass certain expressions to dplyr.
I'd like to use a user defined function within mutate and be able to pass it column names as characters. I tried a few things with interp{lazyeval} with no success.
See the dummy example below.
library(dplyr)
library(lazyeval)
# Define custom function
sumVar <- function(x, y) { x + y }
# Using bare column names (OK)
iris %>%
mutate(newVar = sumVar(Petal.Length, Petal.Width))
# Using characters for column names (does not work)
iris %>%
mutate_(newVar = sumVar('Petal.Length', 'Petal.Width'))
We can try
library(lazyeval)
library(dplyr)
res1 <- iris %>%
mutate_(newVar= interp(~sumVar(x, y),
x= as.name("Petal.Length"),
y = as.name("Petal.Width")) )
The OP's method
res2 <- iris %>%
mutate(newVar = sumVar(Petal.Length, Petal.Width))
identical(res1, res2)
#[1] TRUE
Update
In the devel version of dplyr (soon to be released as 0.6.0 in April 2017), this can also be done with quosures
varNames <- quos(Petal.Length, Petal.Width)
res3 <- iris %>%
mutate(newVar = sumVar(!!! varNames))
quos quotes the arguments, and inside mutate we use !!! to unquote the list for evaluation
identical(res2, res3)
#[1] TRUE
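When the column names arrive as character strings, the `.data` pronoun is another way to refer to them inside mutate. This is a sketch assuming a dplyr version with tidy evaluation (0.7.0 or later); the variable names `col_x` and `col_y` are made up for illustration.

```r
library(dplyr)

# Custom function from the question
sumVar <- function(x, y) { x + y }

# Column names held as strings
col_x <- "Petal.Length"
col_y <- "Petal.Width"

# .data[[name]] looks up the column with that name in the current data frame
res4 <- iris %>%
  mutate(newVar = sumVar(.data[[col_x]], .data[[col_y]]))
```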
How can I simplify or perform the following operations using dplyr:
Run a function on all data.frame names, like mutate_each(funs()) for values, e.g.
names(iris) <- make.names(names(iris))
Delete columns that do NOT exist (i.e. delete nothing), e.g.
iris %>% select(-matches("Width")) # ok
iris %>% select(-matches("X")) # returns empty data.frame, why?
Add a new column by name (string), e.g.
iris %>% mutate_("newcol" = 0) # ok
x <- "newcol"
iris %>% mutate_(x = 0) # adds a column with name "x" instead of "newcol"
Rename a data.frame colname that does not exist
names(iris)[names(iris)=="X"] <- "Y"
iris %>% rename(sl=Sepal.Length) # ok
iris %>% rename(Y=X) # error, instead of no change
I would use setNames for this:
iris %>% setNames(make.names(names(.)))
Include everything() as an argument for select:
iris %>% select(-matches("Width"), everything())
iris %>% select(-matches("X"), everything())
To my understanding there's no other shortcut than explicitly naming the string like you already do:
iris %>% mutate_("newcol" = 0)
I came up with the following solution for #4:
iris %>%
rename_at(vars(everything()),
function(nm)
recode(nm,
Sepal.Length="sl",
Sepal.Width = "sw",
X = "Y")) %>%
head()
The last line is just for convenient output, of course.
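Regarding point 3 (adding a column whose name is stored in a string), dplyr versions with tidy evaluation (0.7.0 or later) accept an unquoted name on the left-hand side of `:=`. A minimal sketch under that assumption:

```r
library(dplyr)

x <- "newcol"

# !!x is replaced by the string it holds, and := allows a computed name
res <- iris %>% mutate(!!x := 0)
```

The resulting data frame has a column called newcol rather than x.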
1 through 3 are answered above. I came here because I had the same problem as number 4. Here is my solution:
df <- iris
Set a name key with the columns to be renamed and the new values:
name_key <- c(
sl = "Sepal.Length",
sw = "Sepal.Width",
Y = "X"
)
Set values not in data frame to NA. This works for my purpose better. You could probably just remove it from name_key.
for (var in names(name_key)) {
if (!(name_key[[var]] %in% names(df))) {
name_key[var] <- NA
}
}
Get a vector of column names in the data frame.
cols <- names(name_key[!is.na(name_key)])
Rename columns
for (nm in names(name_key)) {
names(df)[names(df) == name_key[[nm]]] <- nm
}
Select columns
df2 <- df %>%
select(all_of(cols)) # all_of() makes explicit that cols is an external character vector
I'm almost positive this can be done more elegantly, but this is what I have so far. Hope this helps, if you haven't solved it already!
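A shorter route for the renaming step may be to pass the name key straight to rename via `any_of()`, which skips entries whose old name is not in the data frame. This is a sketch assuming dplyr 1.0 or later with a matching tidyselect. Unlike the steps above, it keeps all columns rather than selecting only the renamed ones.

```r
library(dplyr)

# Same key as above: names are the new column names, values the old ones
name_key <- c(
  sl = "Sepal.Length",
  sw = "Sepal.Width",
  Y  = "X"
)

# "X" does not exist in iris, so that entry is silently ignored
renamed <- iris %>% rename(any_of(name_key))
```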
Answer for the question n.2:
You can use the function any_of if you want to give explicitly the full names of the columns.
iris %>%
select(-any_of(c("X", "Sepal.Width","Petal.Width")))
This will not remove the non-existing column X and will remove the other two listed.
Otherwise, you are good with the solution with matches or a combination of any_of and matches.
iris %>%
select(-any_of("X")) %>%
select(-matches("Width"))
This removes X if it exists, along with every column matching the pattern. Multiple matches are also possible.
iris %>%
select(-any_of("X")) %>%
select(-matches(c("Width", "Spec"))) # use c for multiple matches