I have a simple function in R to modify a dataframe
monthly_fun <- function(x){
x %>%
mutate(obstime = convert_dates(obstime)) %>%
select(obstime, x = obsvalue)
}
When applying the function to dataframe df, i.e. monthly_fun(df), I would like df to be the name of obsvalue. In my current code, the name is obviously "x", how can I modify the part in select to get the name of the supplied dataframe as the variable name instead?
Thanks a lot
EDIT: I want to apply this function to several dataframes using
result <- list( df1, df2, df3) %>%
lapply( monthly_fun )
You could extract the name of input by deparse(substitute(x)), and use !!y := obsvalue in mutate().
monthly_fun <- function(x) {
y <- deparse(substitute(x))
x %>%
mutate(obstime = convert_dates(obstime),
!!y := obsvalue) %>%
select(obstime, y)
}
A simplified example:
fun <- function(x) {
y <- deparse(substitute(x))
x %>%
mutate(!!y := 1) %>%
select(y)
}
fun(df)
# df
# 1 1
# 2 1
# 3 1
# 4 1
# 5 1
Update
If you want to apply it to several data frames stored in a list, you should design a 2-argument function, one argument for data and the other for new column names. Then use Map() to apply this function over each pair of data and names.
fun <- function(x, y) {
x %>%
mutate(!!y := 1) %>%
select(y)
}
Map(fun, list(df1, df2), c("name1", "name2"))
# [[1]]
# name1
# 1 1
# 2 1
# 3 1
# 4 1
# 5 1
#
# [[2]]
# name2
# 1 1
# 2 1
# 3 1
# 4 1
# 5 1
If you're familiar with purrr, The use of Map can be replaced with map2() or imap(). (Notice the difference of inputs to the both functions)
library(purrr)
# (1) map2(): Input data and names separately
map2(list(df1, df2), c("name1", "name2"), fun)
# (2) imap(): Input a named list
imap(list(name1 = df1, name2 = df2), fun)
Using the suggestion by Julien and creating a variable using deparse(substitute(df)) and rename using that.
monthly_fun <- function(x) {
y = deparse(substitute(x))
x <- x %>%
mutate(obstime = obstime*5) %>%
select(obstime, obsvalue)
names(x)[names(x) == "obsvalue"] <- y
return(x)
}
see this site for more naming methods.
Related
I am looking to replace mutate_ with mutate since there are deprecation warnings now and am unsure how to use some of the answers I have sought out on Stack Overflow. This answer has the deprecated quosure issue and not sure how this one can be applied.
library(tibble)
library(dplyr)
library(magrittr)
library(rlang)
# two data frames/tibbles
df1 <-
data.frame(
w = c(0,9,8),
x = c(1,2,3),
y = c(4,5,6)
) %>% tibble()
df2 <-
data.frame(
x = c(9,9,9),
y = c(1,1,1),
z = c(6,6,6)
) %>% tibble()
# the original function
swapThem <- function(to, from) {
cols <- colnames(from)
if (length(cols) != 0) {
# Loop through `from` columns and if there's a match in `to`, copy and paste
# it into `to`
for (i in seq_along(cols)) {
col <- cols[i]
if (col %in% colnames(to)) {
print(col)
dots <-
stats::setNames(list(lazyeval::interp(
~ magrittr::use_series(from, x), x = as.name(col)
)), col)
to <- to %>%
#dplyr::mutate(.dots = dots)
dplyr::mutate_(.dots = dots)
} else {
next
}
}
}
return(to)
}
Here is a simpler base R alternative -
swapThem <- function(to, from) {
cols <- intersect(colnames(to), colnames(from))
if(length(cols)) to[cols] <- from[cols]
to
}
swapThem(df1, df2)
# A tibble: 3 × 3
# w x y
# <dbl> <dbl> <dbl>
#1 0 9 1
#2 9 9 1
#3 8 9 1
The output is similar when I run your code with swapThem(df1, df2) I get
#[1] "x"
#[1] "y"
# A tibble: 3 × 3
# w x y
# <dbl> <dbl> <dbl>
#1 0 9 1
#2 9 9 1
#3 8 9 1
There are easier ways to do this (see Ronak Shah's base R approach, for example), however since you specifically asked about how to switch from mutate_ to mutate, you can adjust your original code in this way:
swapThem <- function(to, from) {
cols <- colnames(from)
if (length(cols) != 0) {
# Loop through `from` columns and if there's a match in `to`, copy and paste
# it into `to`
for (i in seq_along(cols)) {
col <- cols[i]
if (col %in% colnames(to)) {
to <- to %>% dplyr::mutate(!!sym(col) := from[[col]])
} else {
next
}
}
}
return(to)
}
Note that you can also use {{}}, like this:
to <- to %>% dplyr::mutate({{col}} := from[[col]])
Here is another tidy approach, that uses bind_cols. The relocate is to ensure that the order of columns in to is preserved
swapThem <- function(to,from) {
bind_cols(
to %>% select(all_of(setdiff(colnames(to), colnames(from)))),
from %>% select(all_of(intersect(colnames(to), colnames(from))))
) %>%
relocate(colnames(to))
}
My foo function below works great, however it omits the name associated with its vector output.
For the example below, I expect foo to return:
scale
[1] 2
But it simply returns:
[1] 2
Is there fix for this?
library(tidyverse)
library(rlang)
txt = "
study scale
1 1
1 1
2 2
3 2
3 2
3 2
4 1
"
h <- read.table(text = txt,h=T)
foo <- function(data, ...){
dot_cols <- rlang::ensyms(...)
str_cols <- purrr::map_chr(dot_cols, rlang::as_string)
min(sapply(str_cols, function(i) length(unique(data[[i]]))))
}
foo(h, study, scale)
We may use which.min to get the index and then use it to subset the element. Also, loop over a named vector
foo <- function(data, ...){
dot_cols <- rlang::ensyms(...)
str_cols <- purrr::map_chr(dot_cols, rlang::as_string)
v1 <- sapply(setNames(str_cols, str_cols),
function(i) length(unique(data[[i]])))
v1[which.min(v1)]
}
-testing
> foo(h, study, scale)
scale
2
You can skip the rlang stuff by using summarise and passing ... to across
library(dplyr)
foo <- function(data, ...){
data %>%
summarise(across(c(...), n_distinct)) %>%
unlist() %>%
.[which.min(.)]
}
foo(h, study, scale)
#> scale
#> 2
I have a messy, highly nested, list:
m <- list('form' = list('elements' = list('name' = 'Bob', 'code' = 12), 'name' = 'Mary', 'code' = 15))
> m
$form
$form$elements
$form$elements$name
[1] "Bob"
$form$elements$code
[1] 12
$form$name
[1] "Mary"
$form$code
[1] 15
How can I extract from the object m the name and code, regardless as to how nested name and code appears within a list?
Expected output:
# A tibble: 2 x 2
name code
<chr> <dbl>
1 Bob 12
2 Mary 15
1) rrapply Flatten m using rrapply giving r and then separate the name and code fields of unlist(r) using tapply, remove the dimensions using c, convert to data.frame and set the order of the columns.
Note that this is not hard coded to name and code and would work with other fields and numbers of fields.
library(rrapply)
r <- rrapply(m, f = c, how = "flatten")
nms <- names(r)
as.data.frame(c(tapply(unname(r), nms, unlist)))[unique(nms)]
giving:
name code
1 Bob 12
2 Mary 15
An alternative to the final two lines of code above would be:
out <- unstack(stack(r))
out[] <- lapply(out, type.convert)
If there could be other fields in m in addition to name and code that we want ignored then use this in place of the statement that defines r above:
cond <- function(x, .xname) .xname %in% c("name", "code")
r <- rrapply(m, cond, c, how = "flatten")
2) Base R A base R solution is the following which unlists m, and then uses tapply as in (1) grouping by the suffixes of names(r). Like (1) this is a general approach that is not hard coded to name and code. Note that tools comes with R so it is part of Base R.
r <- unlist(m)
nms <- tools::file.ext(names(r))
as.data.frame(c(tapply(unname(r), nms, unlist)))[unique(nms)]
This could help formating the list into a dataframe and then reshaping it:
library(tidyverse)
#Process
y1 <- as.data.frame(lapply(m,unlist),stringsAsFactors = F)
y1$id <- rownames(y1)
rownames(y1)<-NULL
#Dplyr mutation
y1 %>% mutate(Var=ifelse(grepl('name',id,),'name',
ifelse(grepl('code',id),'code',NA))) %>%
select(-id) %>% group_by(Var) %>%
mutate(i=1:n())%>% pivot_wider(names_from = Var,values_from = form) %>%
select(-i) %>% mutate(code=as.numeric(code))
Output:
# A tibble: 2 x 2
name code
<chr> <dbl>
1 Bob 12
2 Mary 15
In this example, I have a simple function taking variable names passed via ... and making use of the enquos function in order to pass them to group_by operator in dplyr.
Basic function
# Libraries
library(dplyr)
library(rlang)
sample_function <- function(x, ...) {
group_vars <- enquos(...)
x %>%
group_by(!!!group_vars) %>%
summarise(num_obs = n())
}
Results
mtcars %>% sample_function(cyl, am)
# A tibble: 6 x 3
# Groups: cyl [3]
cyl am num_obs
<dbl> <dbl> <int>
1 4 0 3
2 4 1 8
3 6 0 4
4 6 1 3
5 8 0 12
6 8 1 2
Problem
I would like to expand the function above and in addition to the produced results create a new scalar character that would reflect names of variables passed to enquos in a format: "var1, var2, ...".
Attempt
library(dplyr)
library(rlang)
sample_function <- function(x, ...) {
group_vars <- enquos(...)
# Problem:
# Create test object of quoted variables
assign(x = "used_group_variables",
value = quo_text(group_vars),
envir = globalenv())
# Summary
x %>%
group_by(!!!group_vars) %>%
summarise(num_obs = n())
}
Results
Produced string does not match the desired format.
used_group_variables
# [1] "structure(list(~cyl, ~am), .Names = c(\"\", \"\"), class = c(\"quosures\", \n\"list\"))"
Desired results
Only names of all variables initially passed via enquos are returned and pasted together with "`" as a separator.
used_group_variables
# "cyl, am"
Notes
Admittedly, assigning values to the global environment from the inside of a function is not a good practice. This is only done for illustrative purposes. In effect, the key goal is to coerce whatever sits within enquos to string of format "name1, name2, ...".
You could use sapply for that and collapse with toString:
sample_function <- function(x, ...) {
group_vars <- enquos(...)
assign(x = "used_group_variables",
value = toString(sapply(group_vars, quo_name)),
envir = globalenv())
x %>%
group_by(!!!group_vars) %>%
summarise(num_obs = n())
}
Output:
mtcars %>% sample_function(am, cyl)
used_group_variables
# [1] "am, cyl"
Edit: As suggested by #LionelHenry in the comment, you may want to use as_label instead of quo_name as the latter is misleading and will likely be deprecated.
I am just started learning programming and I have a question that is probably easy for you.
I have a dataset that looks something like this
df <- data.frame(id= c(1,1,1,2,2,2,3,3,3), time=c(1,2,3,1,2,3,1,2,3),y = rnorm(9), x1 = LETTERS[seq( from = 1, to = 9 )], x2 = c(0,0,0,0,1,0,1,1,1),c2 = rnorm(9))
df
# id time y x1 x2 c2
# 1 1 1 0.6364831 A 0 -0.066480473
# 2 1 2 0.4476390 B 0 0.161372575
# 3 1 3 1.5113458 C 0 0.343956178
# 4 2 1 0.3532957 D 0 0.279987147
# 5 2 2 0.3401402 E 1 -0.462635393
# 6 2 3 -0.3160222 F 0 0.338454940
# 7 3 1 -1.3797158 G 1 -0.621169576
# 8 3 2 1.4026640 H 1 -0.005690801
# 9 3 3 0.2958363 I 1 -0.176488132
I am writing a function with multiple steps. I would like the feed the function with two elements the dataset and the variable of interest.
However, the function breaks down when I try to dcast it as it fails to individuate the variable. The crucial step of the function looks something like this.
testfun<-function(df,var)
{
newdf <- dcast(dataset,id+time~ x1, value.var = var) %>% # note this should be the variable of interest that i feed into the function
distinct()
return(newdf)
}
df2<-testfun(df,y)
Can anyone help me and explain how can I create a function where I index both a dataset and a function?
Thank you in advance for your help
If you pass column name as a string the function would work as it is
library(tidyverse)
library(data.table)
testfun1<-function(df,var) {
newdf <- dcast(df,id+time~ x1, value.var = var) %>% distinct()
return(newdf)
}
testfun1(df, "y")
However, if you want to pass unquoted variable as input you can use
testfun2<-function(df,var) {
var1 <- deparse(substitute(var))
newdf <- dcast(df,id+time~ x1, value.var = var1) %>% distinct()
return(newdf)
}
testfun2(df, y)
The equivalent tidyr function mentioned by #Konrad Rudolph is pivot_wider which would work with both types of inputs.
testfun3 <-function(df,var) {
new_df <- pivot_wider(df, names_from = x1, values_from = y)
return(new_df)
}
testfun3(df, y)
testfun3(df, "y")