Naive question ahead: I would like to remove a column after a map.
Reprex:
library(tidyverse)
tibble(a = rep(c("A", "B"), each = 5),
x = runif(10),
y = runif(10),
z = runif(10)) %>%
split(.$a) %>%
map(`[`, c("x", "y", "z"))
This selects the x, y, and z columns of each tibble.
What if I want to drop the column a instead?
(Same result, but easier for me.)
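Using base R subsetting (each snippet replaces the final map() step of the pipeline above; the anchored pattern "^a$" drops only the column named exactly a, whereas 'a' would drop any column whose name contains an "a"):
map(~.x[grep("^a$", names(.x), invert = TRUE)])
#OR
map(function(x) x[grep("^a$", names(x), invert = TRUE)])
Using dplyr
map(~select(.x, -a))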
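For a version that avoids purrr entirely, lapply() over split() can drop the column by name with setdiff(). This is a minimal sketch that assumes a plain data.frame with set.seed() for reproducibility instead of the tibble from the reprex:

```r
set.seed(1)
df <- data.frame(a = rep(c("A", "B"), each = 5),
                 x = runif(10),
                 y = runif(10),
                 z = runif(10))

# split() on a data frame returns a list of data frames, one per level of a
parts <- split(df, df$a)

# Keep every column except the one named exactly "a"
no_a <- lapply(parts, function(d) d[setdiff(names(d), "a")])
```

setdiff() matches whole names, so there is no regex to anchor.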
I'm working on a function to create tables, and I need some conditional formatting rules. One will be based on a column name; however, when I pass it through as.formula, it seems to be overdoing it. I've made an example here:
library(tidyverse)
library(rlang)
library(flextable)
a <- as_tibble(x =cbind( Year = c(2018, 2019, 2020), a = 1:3,
b.1 = c("a", "b", "c"),
b.2 = c("d", "e", "f"),
fac = c("This", "This","That")))
foo <- function(x, y, z, ...){
y_var <- enquo(y)
x %>%
filter(Year %in% c(2018, 2019),
...) %>%
mutate(!!quo_name(y_var) := factor(!!y_var,
levels = z,
ordered = TRUE)) %>%
arrange(!!y_var)
}
to.table <- function(x, y, z, ...){
y_var <- enquo(y)
df.in <- foo(x=x,
y=!!y_var,
z= z)
cond <- paste("~!is.na(", quo_name(y_var),")")
cond.2 <- paste("~startsWith(colnames(", df.in, "),\"b\")")
flextable(df.in) %>%
bold(i = as.formula(cond),
part = "body") %>%
bg(i = as.formula(cond.2),
bg = "Red3",
j = as.formula(cond.2))
}
to.table(x=a,
y=Year,
z= c(2020,2018,2019),
fac == "This")
Error in startsWith(colnames(2:3), "b") : non-character object(s)
From the error I've been receiving, it looks like the expression is being evaluated before it gets passed to as.formula, as those two columns are the correct answer.
Proof:
df.in <- foo(x=a,
y=Year,
z= c(2020,2018,2019),
fac == "This")
startsWith(colnames(df.in), prefix = "b")
[1] FALSE FALSE TRUE TRUE FALSE
What am I missing here? If anyone has a solution, or a suggestion on how to do things differently, potentially using quosures or other tidyverse-friendly methods, I would much appreciate it.
Extension:
To make this clearer, I should elaborate on my intended use of this example. I'm trying to figure out how to take column names generated dynamically in a function (represented here by foo) that start with a specified value (generally 3 columns), and then check those columns for a specified value that I can highlight in a specific color.
Additionally, in the answer cond is used in both of the i = designations; the two separate conditions will likely never overlap.
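One more thing worth checking in the code above: to.table() accepts ... but never passes it on to foo(), so the fac == "This" filter in the call is silently ignored (it makes no visible difference here, because both rows that survive the Year filter are "This"). The base-R sketch below uses hypothetical helper names to show the difference; in to.table() the fix would be to call foo(x = x, y = !!y_var, z = z, ...).

```r
# Hypothetical helper: reports how many arguments arrived in its dots
# (...length() counts them without evaluating them)
inner <- function(...) ...length()

# Accepts ... but does not forward it (the pattern used in to.table())
outer_dropped <- function(x, ...) inner()

# Forwards ... explicitly
outer_forwarded <- function(x, ...) inner(...)

n_dropped <- outer_dropped(1, fac == "This")
n_forwarded <- outer_forwarded(1, fac == "This")
```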
We could specify j with the column names of the created data: startsWith() returns a logical vector over the column names, TRUE for the names that start with 'b', and that logical vector is used with [ to extract the matching column names (nm1).
to.table <- function(x, y, z, ...){
y_var <- enquo(y)
df.in <- foo(x=x,
y=!!y_var,
z= z)
cond <- as.formula(glue::glue('~ !is.na({quo_name(y_var)})'))
nm1 <- names(df.in)[startsWith(names(df.in), prefix = "b")]
flextable(df.in) %>%
bold(i = cond,
part = "body") %>%
bg(i = cond,
bg = "Red3",
j = nm1)
}
-testing
to.table(x=a,
y=Year,
z= c(2020,2018,2019),
fac == "This")
In the OP's post, the formula created for 'cond' is fine (although building it with glue is a bit more flexible), whereas the second one, 'cond.2', returns
paste("~startsWith(colnames(", df.in, "),\"b\")")
[1] "~startsWith(colnames( 2:3 ),\"b\")" "~startsWith(colnames( c(\"1\", \"2\") ),\"b\")"
[3] "~startsWith(colnames( c(\"a\", \"b\") ),\"b\")" "~startsWith(colnames( c(\"d\", \"e\") ),\"b\")"
[5] "~startsWith(colnames( c(\"This\", \"This\") ),\"b\")"
This is because df.in is a data frame, and paste() is vectorized over it column by column: each returned string contains the deparsed values of one column, not the column names.
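A small base-R illustration of the same behaviour, using a hypothetical two-column data frame: paste() treats the data frame as a list of columns and deparses each column's values, while names() gives the strings startsWith() actually needs.

```r
d <- data.frame(p = 1:2, q = c("u", "v"))

# paste() is vectorized over the columns: one string per column, built
# from the deparsed column values rather than the column names
pasted <- paste("~f(", d, ")")

# Working on names(d) instead selects the matching column names
picked <- names(d)[startsWith(names(d), "q")]
```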
If we want to color the columns whose names start with either 'a' or 'b' in red, replace startsWith with grep, which takes a regex as the pattern
to.table <- function(x, y, z, ...){
y_var <- enquo(y)
df.in <- foo(x=x,
y=!!y_var,
z= z)
cond <- as.formula(glue::glue('~ !is.na({quo_name(y_var)})'))
nm1 <- grep("^(a|b)", names(df.in), value = TRUE)
flextable(df.in) %>%
bold(i = cond,
part = "body") %>%
bg(i = cond,
bg = "Red3",
j = nm1)
}
to.table(x=a,
y=Year,
z= c(2020,2018,2019),
fac == "This")
If we want to color based on the value of column 'a':
to.table <- function(x, y, z, ...){
y_var <- enquo(y)
df.in <- foo(x=x,
y=!!y_var,
z= z)
cond <- as.formula(glue::glue('~ !is.na({quo_name(y_var)})'))
nm1 <- names(df.in)[startsWith(names(df.in), prefix = "b")]
flextable(df.in) %>%
bold(i = cond,
part = "body") %>%
bg(i = ~ a == 2,
bg = "Red3",
j = nm1)
}
to.table(x=a,
y=Year,
z= c(2020,2018,2019),
fac == "This")
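For the use case described in the extension (highlighting a specific value only within the dynamically named columns), one option is to build one row condition per matching column and call bg() once per column. The sketch below only builds and checks the formulas in base R; the stand-in df.in, the prefix "b" and the target value "e" are assumptions for illustration.

```r
# Hypothetical stand-in for the data returned by foo()
df.in <- data.frame(Year = c(2018, 2019), a = c("1", "2"),
                    b.1 = c("a", "b"), b.2 = c("d", "e"),
                    fac = c("This", "This"))

# Columns whose names start with the (assumed) prefix "b"
nm1 <- names(df.in)[startsWith(names(df.in), "b")]

# Hypothetical value to highlight
target <- "e"

# One row-selection formula per matching column, e.g. ~b.2 == "e";
# backticks keep non-syntactic column names safe
row_conds <- lapply(nm1, function(col) {
  as.formula(sprintf('~ `%s` == "%s"', col, target))
})
names(row_conds) <- nm1
```

Each formula could then be applied column by column, for example with for (col in nm1) ft <- bg(ft, i = row_conds[[col]], j = col, bg = "Red3"), where ft is the flextable object; treat that loop as an untested sketch.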
Suppose I have the following data frame:
df <- data.frame(x = rnorm(10), y = rnorm(10), z = rnorm(10))
I would like to replace the values in each variable with the corresponding column of the matching data frame in the following list:
replace_df <- list(x = data.frame(x = 1:10),
y = data.frame(y = 11:20),
z = data.frame(z = 21:30))
How would I do that using dplyr?
I feel like my issue is related to this Q&A, but I haven't been able to implement the answers to that question correctly to my situation.
I've attempted the below, among others, without success:
library(tidyverse)
variables <- c("x", "y", "z")
df %>%
mutate_at(vars(variables), funs(replace_df[[.]][[.]]))
The "dumb" way would be the following:
df %>%
mutate(x = replace_df[["x"]][["x"]],
y = replace_df[["y"]][["y"]],
z = replace_df[["z"]][["z"]])
You need to use expr! I am not sure whether the subsetting will work the way you tried above, but I was able to get the correct output by writing a simple function and passing in an argument wrapped in expr():
df <- data.frame(x = rnorm(10), y = rnorm(10), z = rnorm(10))
replace_df <- list(x = data.frame(x = 1:10),
y = data.frame(y = 11:20),
z = data.frame(z = 21:30))
my_func <- function(string) {
return(
replace_df[[string]][[string]]
)
}
df %>%
mutate_at(vars(x, y, z), funs(my_func(expr(.))))
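With dplyr 1.0.0 or later, mutate_at() and funs() are superseded, and across() together with cur_column() gives each column's name directly, so no expr() is needed. This is an alternative sketch, not the approach from the answer above:

```r
library(dplyr)

df <- data.frame(x = rnorm(10), y = rnorm(10), z = rnorm(10))
replace_df <- list(x = data.frame(x = 1:10),
                   y = data.frame(y = 11:20),
                   z = data.frame(z = 21:30))
variables <- c("x", "y", "z")

out <- df %>%
  # cur_column() returns the name of the column currently being modified,
  # which is used to look up both the list element and its column
  mutate(across(all_of(variables),
                ~ replace_df[[cur_column()]][[cur_column()]]))
```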
I have two lists, both of which contain similar datasets corresponding to different years. I wish to merge the datasets in both lists, element by element. When I use mapply (via Map) with dplyr::full_join and the key variables have different names, so that I need to use the by argument, R is unable to perform the join.
library(dplyr)
set.seed(100)
first_list <- list(data.frame(x = 1:3, y = rnorm(3)),
data.frame(x = 4:6, y = rnorm(3)))
second_list <- list(data.frame(z = 1:3, w = rnorm(3)),
data.frame(z = 4:6, w = rnorm(3)))
Map(full_join, by = c("x" = "z"), first_list, second_list)
#Error: 'z' column not found in rhs, cannot join
However,
Map(function(x, y) full_join(x, y, by = c("x" = "z")), first_list, second_list)
works successfully. I am curious about this behaviour and wonder if anyone could provide some explanation.
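Since Map is a wrapper around mapply, every argument passed through ... is vectorized over, including by = c("x" = "z"). mapply extracts each element with [[, which drops the name, so full_join receives an unnamed by = "z" instead of the mapping from x to z. Arguments that should be passed unchanged belong in MoreArgs, while the other required args (...) are the lists to be vectorized over (see ?mapply):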
test1 <- Map(full_join, first_list, second_list, MoreArgs=list(by = c("x" = "z")))
test2 <- Map(function(x, y) full_join(x, y, by = c("x" = "z")), first_list, second_list)
all.equal(test1, test2)
# [1] TRUE
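The name-dropping can be seen without dplyr. In this sketch show_by() is a hypothetical helper that just returns the names of whatever by it receives; wrapping the vector in list() is another way (besides MoreArgs) to make Map() recycle it unchanged:

```r
# Hypothetical helper: returns the names of the by argument it receives
show_by <- function(by) names(by)

# by is vectorized over; each element is extracted with [[, losing its name
dropped <- Map(show_by, by = c("x" = "z"))

# A length-one list is recycled, and its single element keeps its names
kept <- Map(show_by, by = list(c("x" = "z")))
```

So Map(full_join, first_list, second_list, by = list(c("x" = "z"))) should behave like the MoreArgs version.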
I'm starting with the table dt below and am trying to filter its rows column by column, using the vectors in the list keys:
library(data.table)
set.seed(123)
randomchar <- function(n, w){
chararray <- replicate(w, sample(c(letters, LETTERS), n, replace = TRUE))
apply(chararray, 1, paste0, collapse = "")
}
dt <- data.table(x = randomchar(1000, 3),
y = randomchar(1000, 3),
z = randomchar(1000, 3),
key = c("x", "y", "z"))
keys <- with(dt, list(x = sample(x, 501),
y = sample(y, 500),
z = sample(z, 721)))
I can get the result I want by using a loop:
desired <- copy(dt)
for(i in seq_along(keys)){
keyname <- names(keys)[i]
desired <- desired[get(keyname) %in% keys[[i]]]
}
desired
Is there a more idiomatic data.table way to do this subset?
I tried using CJ: dt[CJ(keys)], but it takes a very long time.
What about building a logical mask and filtering dt on it:
dt[Reduce(`&`, Map(function(key, col) col %in% key, keys, dt)),]
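The Map() call above pairs keys with the columns of dt by position, so it relies on dt's columns being in the same order as names(keys). A sketch that selects the columns by name instead, using a small hypothetical table so the result is easy to check:

```r
library(data.table)

dt_small <- data.table(x = c("a", "b", "c"),
                       y = c("d", "e", "f"),
                       z = c("g", "h", "i"))
# Deliberately not in column order
keys_small <- list(z = c("g", "i"), x = c("a", "b"))

# Pick the key columns by name so each one is matched with its own key vector
cols <- dt_small[, names(keys_small), with = FALSE]
mask <- Reduce(`&`, Map(`%in%`, cols, keys_small))
result <- dt_small[mask]
```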
I would like to loop through a list of data frames and change the column names (I want each of the columns to get the same new name in every data frame).
Does anyone have a solution using the following data?
df <- data.frame(x = 1:10, y = 2:11, z = 3:12)
df2 <- data.frame(x = 1:10, y = 2:11, z = 3:12)
df3 <- data.frame(x = 1:10, y = 2:11, z = 3:12)
x <- list(df, df2, df3)
Either using a for loop or apply? I would love to see both, if possible.
Thanks,
Ben
Both hrbrmstr and David Arenburg's answers are perfect.
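Since the answers referred to are not included here, a minimal sketch of both approaches follows; new_names is a hypothetical set of replacement names, because the question does not say which names to use:

```r
df <- data.frame(x = 1:10, y = 2:11, z = 3:12)
df2 <- data.frame(x = 1:10, y = 2:11, z = 3:12)
df3 <- data.frame(x = 1:10, y = 2:11, z = 3:12)
x <- list(df, df2, df3)

new_names <- c("a", "b", "c")  # hypothetical replacement names

# for loop: rename each list element in place
x_loop <- x
for (i in seq_along(x_loop)) {
  names(x_loop[[i]]) <- new_names
}

# lapply: setNames(object, nm) returns a renamed copy of each data frame
x_apply <- lapply(x, setNames, new_names)
```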