add labels to gt table with for loop over column names - r

I have the following data table:
library(dplyr)
library(gt)
library(stringr) # provides str_trim() and str_remove(), used below
df <- tibble(
`model 2000` = c("a", "b"),
`car 2022` = c("f", "d")
)
I would like to loop over a vector of column names and perform a string replace, then append this to the gt table
my_cols <- colnames(df)
for(i in my_cols){
df <- df %>%
gt() %>%
cols_label(
i = str_trim(str_remove(i, "2020|2021|2022"))
)
df
}
I want to be able to change the names after the GT table is created using this for loop but when the loop passes the values in my_cols, they aren't recognized... any help?
Here is the error:
Error: All column names provided must exist in the input `.data` table.

The best way to do this is to skip the loop, build a named vector (names are the existing column names, values are the new labels), and splice it into cols_label() with !!!:
my_cols <- setNames(str_trim(str_remove(names(df), "2020|2021|2022")), names(df))
df %>%
gt() %>%
cols_label(!!!my_cols)
If for some reason you must use a loop, create the gt() object outside of the loop. Otherwise, after the first iteration you are passing an object that is already of class gt_tbl to gt(), which causes an error. You also need to use the walrus operator := instead of =, and the LHS needs to be a symbol or a glue string.
my_cols <- names(df)
df <- df %>%
gt()
for(i in my_cols) {
df <- df %>%
cols_label("{i}" := str_trim(str_remove(i, "2020|2021|2022"))) # or !!sym(i) := ...
}
df
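The original loop fails because, with a plain =, R treats i as a literal argument name instead of looking up the loop variable. Here is a minimal sketch of the difference, using rlang::list2() as a stand-in for a function that accepts dynamic dots. It assumes cols_label() collects its arguments in a similar way, which the := syntax above suggests; the object names literal, injected and spliced are only for illustration.

```r
library(rlang)

i <- "car 2022"

# Plain `=`: the argument is literally named "i"
literal <- list2(i = "car")

# `:=` with `!!`: the name is taken from the value stored in `i`
injected <- list2(!!i := "car")

# `!!!` splices a whole named vector, one argument per element
labels <- c("model 2021" = "model", "car 2022" = "car")
spliced <- list2(!!!labels)

names(literal)   # "i"
names(injected)  # "car 2022"
names(spliced)   # "model 2021" "car 2022"
```

This is why the error message complains that the column does not exist: the table has no column called i.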

You can use the .list option in cols_label().
my_cols <- colnames(df)
df %>%
gt() %>%
cols_label(
.list = setNames(as.list(str_trim(str_remove(my_cols, "2020|2021|2022"))), my_cols)
)
However, it seems much easier to just rename the columns before creating the table:
my_cols <- colnames(df)
df %>%
rename_with(~ str_trim(str_remove(.x, "2020|2021|2022")), .cols = my_cols) %>%
gt()
Input:
df <- tibble(
`model 2021` = c("a", "b"),
`car 2022` = c("f", "d")
)
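One difference worth keeping in mind: rename_with() changes the column names of the data itself, whereas cols_label() only changes what gt displays. A small sketch of the renaming step on its own, using the input above (the object name renamed is only for illustration):

```r
library(dplyr)
library(stringr)

df <- tibble(
  `model 2021` = c("a", "b"),
  `car 2022` = c("f", "d")
)

# Strip the year from every column name, then trim the leftover space
renamed <- df %>%
  rename_with(~ str_trim(str_remove(.x, "2020|2021|2022")))

names(df)       # "model 2021" "car 2022" (the original is untouched)
names(renamed)  # "model" "car"
```

If later code still needs to refer to the columns by their original names, the cols_label() route may be the safer choice.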

Related

multiple kableExtra::column_spec based on number of variables

I want to reproduce the figure below for a data frame with any number of columns (assuming all columns have same format)
For example, I have a data frame where each cell is a list containing numeric values
# dataframe containg data
df <- data.frame(YEAR = 1980:1990) %>%
tibble::as_tibble()
vars <- c("a","b","c")
df["a"] <- list(list(rnorm(100)))
df["b"] <- list(list(rnorm(100)))
df["c"] <- list(list(rnorm(100)))
I then create a table
# dataframe to create for table
newdf <- data.frame(YEAR = 1980:1990) %>%
tibble::as_tibble()
newdf[vars] <- ""
# create table
kableExtra::kbl(newdf,
col.names=c("YEAR",vars),
caption=paste0("Title"),
escape=F) %>%
kableExtra::kable_styling(bootstrap_options = c("striped", "hover")) %>%
kableExtra::column_spec(2,image=kableExtra::spec_hist(df$a)) %>%
kableExtra::column_spec(3,image=kableExtra::spec_hist(df$b)) %>%
kableExtra::column_spec(4,image=kableExtra::spec_hist(df$c))
It looks something like this:
This all works great.
However, in reality I have a data frame whose number of columns to be plotted by kableExtra changes (it is created from user inputs), and I can't work out how to handle this, since in the example above the column_spec function has to be repeated for each column. So I need a way to generate the table for a data frame of variable size.
This seems to be compounded by the use of the pipe operator.
I have looked at piping a function, but I think the function still has the same problem of piping a variable number of sequential commands.
Any help greatly appreciated.
You can simultaneously format multiple columns with a purrr::reduce statement, setting the .init argument to the table. That way, the column_spec function can be applied to multiple columns in an elegant way.
The command call will be like
reduce(columns, column_spec, [column_spec arguments], .init = table)
The reduce will call column_spec(table, columns[1], [column_spec arguments]), then send that output (call it modified_table) to column_spec(modified_table, columns[2], [column_spec arguments]), and so on.
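To make that call order concrete, here is a toy trace that does not need kableExtra at all. fake_spec() is a made-up stand-in for column_spec() that just records which column it was asked to format, and the string "table" plays the role of the kable object.

```r
library(purrr)

# Hypothetical stand-in for column_spec(): appends a note about the column
fake_spec <- function(tbl, col, style) {
  paste0(tbl, " -> ", style, "(", col, ")")
}

# .init is the starting "table"; each step receives the previous result
result <- reduce(
  c(2, 3, 4),
  function(tbl, col) fake_spec(tbl, col, style = "bold"),
  .init = "table"
)

result  # "table -> bold(2) -> bold(3) -> bold(4)"
```

Each step wraps the output of the previous one, which is the same thing a chain of piped column_spec() calls would do.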
Here's some example code. Sorry - I tried to create a reprex but I can't get it to work with the html tables.
library(tidyverse)
library(kableExtra)
df <- data.frame(a = 1:10, b = 1:10, c = 1:10)
which_col <- c("b", "c") # which columns to format in the reduce()
df %>%
kbl() %>%
reduce(
which(names(df) %in% which_col), # column_spec wants a vector of column indices
column_spec,
bold = TRUE, # this is a ... argument, which will get sent to column_spec
.init = .
)
# for more complex cases, won't be able to use ... argument as elegantly
df %>%
kbl() %>%
reduce(
which(names(df) %in% which_col),
~column_spec(.x, .y, bold = rep(c(TRUE, FALSE), 5)),
.init = .
)
edit: here is how this would be applied to your table
library(kableExtra)
reduce_inputs <- lst(
col = match(vars, names(newdf)),
dat = df[, vars]
) %>%
transpose()
# create table
# newdf is supplied by the pipe, so it must not be repeated as an argument
newdf %>%
kbl(
col.names = c("YEAR", vars),
caption = paste0("Title"),
escape = FALSE
) %>%
kable_styling(bootstrap_options = c("striped", "hover")) %>%
reduce(
reduce_inputs,
~column_spec(.x, .y$col, image = spec_hist(.y$dat)),
.init = .
)

Create loop to change structure of multiple data frames in R

I have a bunch of Excel files that I have loaded into R as separate data frames. I now need to change the structure/layout of every one of these data frames. I have done all of this separately, but it is becoming very time-consuming. I am not sure whether there is a better way to accomplish this. My guess is that I need to combine them all into a list and then create some type of loop to go through every data frame in that list. I need to be able to remove rows and columns from the edge, add 'row' to the top-left cell, which is currently empty, and then apply the pivot_longer, mutate, and select steps listed below, which I have been running separately.
names(df)[1] <- 'row'
df <- df %>%
pivot_longer((!row), names_to = "plateColumn", values_to = "Broth_t0")
df <- df %>%
mutate(wellID = paste0(row, plateColumn)) %>%
select(-c(row, plateColumn))
I have tried what is below and I get an error. Does anyone have a better way than what I am currently doing to accomplish this?
for(x in seq_along(files.list)){
names(files.list)[1] <- 'row'
df <- df %>%
pivot_longer((!row), names_to = "plateColumn", values_to = "Broth_t0")
df <- df %>%
mutate(wellID = paste0(row, plateColumn)) %>%
select(-c(row, plateColumn))
}
If you have a vector of filenames my_files, I think this will work:
library(tidyverse)
library(readxl)
prepare_df <- function(df) {
# make changes to df
names(df)[1] <- 'row'
df <- df %>%
pivot_longer((!row), names_to = "plateColumn", values_to = "Broth_t0")
df <- df %>%
mutate(wellID = paste0(row, plateColumn)) %>%
select(-c(row, plateColumn))
return(df)
}
names(my_files) <- my_files # often useful if the vector we're mapping over has names
dfs <- map(my_files, read_excel) # read into a list of data frames
dfs <- map(dfs, prepare_df) # prepare each one
df <- bind_rows(dfs, .id = "file") # if you prefer one data frame instead
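To try the same pattern without any Excel files, here is a sketch on two small in-memory plates. The tibbles in plates are made-up stand-ins for what read_excel() would return, and the first column name label is arbitrary since prepare_df() overwrites it.

```r
library(dplyr)
library(tidyr)
library(purrr)

prepare_df <- function(df) {
  names(df)[1] <- "row"
  df %>%
    pivot_longer(!row, names_to = "plateColumn", values_to = "Broth_t0") %>%
    mutate(wellID = paste0(row, plateColumn)) %>%
    select(-c(row, plateColumn))
}

# Hypothetical stand-ins for two sheets already read from Excel
plates <- list(
  plate1 = tibble(label = c("A", "B"), `1` = c(0.1, 0.2), `2` = c(0.3, 0.4)),
  plate2 = tibble(label = c("A", "B"), `1` = c(0.5, 0.6), `2` = c(0.7, 0.8))
)

# Reshape every plate, then stack them with the list names in a "file" column
long <- plates %>%
  map(prepare_df) %>%
  bind_rows(.id = "file")

long$wellID  # "A1" "A2" "B1" "B2" "A1" "A2" "B1" "B2"
```

Because the list is named, the file column records which plate each well came from.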

How do I write a function (analogous to a SAS macro) in R to import and format a list of Excel files?

I'm looking for a more efficient way to write the following:
Read in all my Excel files
DF1 <- read_excel(DF1, sheet = "ABC", range = cell_cols(1:10) )
DF2 <- read_excel(DF2, sheet = "ABC", range = cell_cols(1:10) )
etc...
DF50 <- read_excel(DF50, sheet = "ABC", range = cell_cols(1:10) )
Add a column to each DF with a location
DF1$Location <- location1
DF2$Location <- location2
etc...
DF50$Location <- location50
Keep only columns with specified names, get rid of blank rows, and convert column CR_NUMBER to an integer
library(hablar)
DF1 <- DF1 %>% select(all_of(colnames_r)) %>% filter(!is.na(NAME)) %>% convert(int(CR_NUMBER))
DF2 <- DF2 %>% select(all_of(colnames_r)) %>% filter(!is.na(NAME)) %>% convert(int(CR_NUMBER))
etc...
DF50 <- DF50 %>% select(all_of(colnames_r)) %>% filter(!is.na(NAME)) %>% convert(int(CR_NUMBER))
You can try the following, which collects the data in a list:
library(readxl)
library(hablar)
library(dplyr)
#Get the complete path of file which has name "DF" followed by a number.
file_names <- list.files('/folder/path', pattern = 'DF\\d+', full.names = TRUE)
list_data <- lapply(seq_along(file_names), function(x) {
data <- read_excel(file_names[x], sheet = "ABC", range = cell_cols(1:10))
data %>%
mutate(Location = paste0('location', x)) %>%
select(all_of(colnames_r)) %>%
filter(!is.na(NAME)) %>%
convert(int(CR_NUMBER))
})
list_data is a list of data frames, which is usually easier to manage than 50 separate data frames in the global environment. If you still want all the data frames separately, name the list and use list2env.
names(list_data) <- paste0('DF', seq_along(list_data))
list2env(list_data, .GlobalEnv)
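Note that paste0('location', x) stores the literal text "location1", "location2", and so on. If the locations are real values, one option is to keep them in a vector parallel to the files and pair them up with Map(). Below is a base-R sketch with made-up data frames and a hypothetical locations vector standing in for location1 ... location50. It copies the result into a dedicated environment rather than the global one.

```r
# Made-up stand-ins for the data frames read from Excel
list_data <- list(
  data.frame(NAME = c("x", NA), CR_NUMBER = c("1", "2")),
  data.frame(NAME = "y", CR_NUMBER = "3")
)
locations <- c("Boston", "Denver")  # hypothetical values, one per file

# Pair each data frame with its location, then clean it up
list_data <- Map(function(d, loc) {
  d$Location <- loc
  d <- d[!is.na(d$NAME), , drop = FALSE]  # drop blank rows
  d$CR_NUMBER <- as.integer(d$CR_NUMBER)  # base-R counterpart of convert(int(...))
  d
}, list_data, locations)

names(list_data) <- paste0("DF", seq_along(list_data))

# Copy the named data frames into a dedicated environment
env <- new.env()
list2env(list_data, envir = env)
ls(env)  # "DF1" "DF2"
```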

Pass a data.frame column name to a function that uses purrr::map

I'm working with nested dataframes and want to pass the name of the top level dataframe, and the name of a column containing lower level dataframes, to a function that uses purrr::map to iterate over the lower level data frames.
Here's a toy example.
library(dplyr)
library(purrr)
library(tibble)
library(tidyr)
df1 <- tibble(x = c("a","b","c", "a","b","c"), y = 1:6)
df1 <- df1 %>%
group_by(x) %>%
nest()
testfunc1 <- function(df) {
df <- df %>%
mutate(out = map(data, min))
tibble(min1 = df$out)
}
testfunc2 <- function(df, col_name) {
df <- df %>%
mutate(out = map(col_name, min))
tibble(min2 = df$out)
}
df1 <- bind_cols(df1, testfunc1(df1))
df1 <- bind_cols(df1, testfunc2(df1, "data"))
df1$min1
df1$min2
testfunc1 behaves as expected, in this case giving the minimum of each data column in a new column. In testfunc2, where I've tried to pass the column name, a string reading "data" is passed to the new column. I think I understand from the thread here (Pass a data.frame column name to a function) why this doesn't behave as I want, but I haven't been able to figure out how to make it work in this case. Any suggestions would be great.
This should work for you, it uses the tidy eval framework. This assumes col_name is a string.
testfunc2 <- function(df, col_name) {
df <- df %>%
mutate(out = map(!! rlang::sym(col_name), min))
tibble(min2 = df$out)
}
EDIT:
If you'd rather pass a bare column name to the function, instead of a string, use enquo instead of sym.
testfunc2 <- function(df, col_name) {
col_quo = enquo(col_name)
df <- df %>%
mutate(out = map(!! col_quo, min))
tibble(min2 = df$out)
}
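Two other spellings should do the same job: the .data pronoun for a string, and the embrace operator {{ }} for a bare name. This is a sketch on a small nested tibble like the one in the question; the function names min_by_string and min_by_bare are made up for illustration.

```r
library(dplyr)
library(purrr)
library(tidyr)

nested <- tibble(x = c("a", "b", "c", "a", "b", "c"), y = 1:6) %>%
  nest(data = y)

# Column name supplied as a string
min_by_string <- function(df, col_name) {
  df %>% mutate(out = map(.data[[col_name]], min))
}

# Column name supplied bare (unquoted)
min_by_bare <- function(df, col_name) {
  df %>% mutate(out = map({{ col_name }}, min))
}

unlist(min_by_string(nested, "data")$out)  # 1 2 3
unlist(min_by_bare(nested, data)$out)      # 1 2 3
```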

Make a list element of each group with dplyr's group_by function

I would like to be able to use more automation when creating SpatialLines objects from otherwise tidy data frames.
library(sp)
#create sample data
sample_data <- data.frame(group_id = rep(c("a", "b","c"), 10),
x = rnorm(10),
y = rnorm(10))
#How can I recreate this using dplyr?
a_list <- Lines(list(Line(sample_data %>% filter(group_id == "a") %>% select(x, y))), ID = 1)
b_list <- Lines(list(Line(sample_data %>% filter(group_id == "b") %>% select(x, y))), ID = 2)
c_list <- Lines(list(Line(sample_data %>% filter(group_id == "c") %>% select(x, y))), ID = 3)
SpatialLines(list(a_list, b_list, c_list))
You can see how using something like group_by would make the process pretty easy if you could understand how the data could be piped into a list.
Using your sample data, a wrapper function, and dplyr::do will give you what you want :)
wrapper <- function(df) {
df %>% select(x,y) %>% as.data.frame %>% Line %>% list %>% return
}
y <- sample_data %>% group_by(group_id) %>%
do(res = wrapper(.))
# and now assign IDs (since we can't do that inside dplyr easily)
ids = 1:dim(y)[1]
SpatialLines(
mapply(x = y$res, ids = ids, FUN = function(x,ids) {Lines(x,ID=ids)})
)
I don't use sp so there might be a better way to assign IDs.
For reference, consider reading Hadley's comments on returning non-dataframe from dplyr do calls
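Since do() has been superseded in more recent dplyr releases, here is an alternative sketch that avoids it by splitting the coordinates per group with base split() and building the Lines objects with Map(). It assumes the group names are acceptable as IDs, and the names pieces, lines_list and sl are only for illustration.

```r
library(sp)

sample_data <- data.frame(
  group_id = rep(c("a", "b", "c"), 10),
  x = rnorm(30),
  y = rnorm(30)
)

# One data frame of coordinates per group, named by group_id
pieces <- split(sample_data[c("x", "y")], sample_data$group_id)

# Build one Lines object per group, using the group name as its ID
lines_list <- Map(
  function(coords, id) Lines(list(Line(as.matrix(coords))), ID = id),
  pieces,
  names(pieces)
)

sl <- SpatialLines(unname(lines_list))
```

Using the group name as the ID keeps the link between each line and the group it came from.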
