Same Column Names in flextable in R - r

I am trying to create a 'flextable' object with the R package "flextable". I need to use the same column name for more than one column. When I pass the duplicated names to the "col_keys" argument of the "flextable" function, I get the error "duplicated col_keys". Is there a way to solve this problem?
library(flextable)
a<-c(1:8)
b<-c(1:8)
c<-c(2:9)
d<-c(2:9)
A<-data.frame(a,b,c,d)
# fails with "duplicated col_keys"
A<-flextable(A,col_keys=c("a","b","a","b"))
This is the example code for which I am getting the error.

As it stands now, flextable does not allow duplicate column keys. However, you can achieve the same result by adding a row of "headers," or a row of column labels, to the top of your table. These headers can contain duplicate values.
You do this with the "add_header_row" function.
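Here is a basic example using the iris data set.
library(flextable)
# ft was not defined in the original snippet; this definition is an assumption.
# Species comes first so that the empty label in the new header row sits above it.
ft <- flextable(head(iris),
col_keys = c("Species", "Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width"))
ft <- add_header_row(
ft,
values = c("", "length", "width", "length", "width"),
top = FALSE
)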
ft <- theme_box(ft)
https://davidgohel.github.io/flextable/articles/layout.html

I found a workaround: appending the character \r to the column names makes them unique:
library(flextable)
A <- matrix(rnorm(8), nrow = 2, ncol = 4)
A <- as.data.frame(A)
col_Names <- c("a","b","a","b")
nb_Col_Names <- length(col_Names)
for(i in 1 : nb_Col_Names)
{
col_Names[i] <- paste0(col_Names[i], paste0(rep("\r", i), collapse = ""), collapse = "")
}
colnames(A) <- col_Names
tbl_A <- flextable(A)
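The loop above can also be written as a single vectorized call. This is only a minimal base-R sketch, assuming R >= 3.3 (for strrep()); unique_names is a hypothetical variable name that does not appear in the original answer.

```r
col_Names <- c("a", "b", "a", "b")

# Append i carriage returns to the i-th name, so every name differs
# from the others in its number of trailing "\r" characters.
unique_names <- paste0(col_Names, strrep("\r", seq_along(col_Names)))

# No duplicates remain, and stripping "\r" gives back the original labels.
anyDuplicated(unique_names) == 0
identical(gsub("\r", "", unique_names), col_Names)
```

The result can be assigned with colnames(A) <- unique_names exactly as in the loop version.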

Currently, using set_header_labels:
library(flextable)
a<-c(1:8)
b<-c(1:8)
c<-c(2:9)
d<-c(2:9)
A <- data.frame(a,b,c,d)
flextable(A) |> set_header_labels(`a` = "a", `b` = "b", `c` = "a", `d` = "b")

Related

data.table with unique names from nested list with different classes and missing names

I have a nested list of this structure:
test_list <- list(
"some string",
list(type = "entry_type", text = "some content"),
list("more strings"),
list(type = "another_type", text = "more text yet"),
""
)
So it is a list containing lists and plain entries, while the only names are those of the items of the nested lists - which are duplicated.
My goal is to transfer this into a data.table with the original names (provided they exist) but made unique.
Currently I use this pipeline:
library(data.table)
# flatten() is not part of data.table; purrr::flatten() is assumed here
dt <- as.data.table(purrr::flatten(test_list))
unique_names <- paste0("V", seq_len(length(names(dt))))
propper_names <- names(dt)
new_names <- propper_names
blank_names <- which(new_names == "")
new_names[blank_names] <- unique_names[blank_names]
duplicates_names <- which(duplicated(new_names))
new_names[duplicates_names] <- paste(
propper_names[duplicates_names],
unique_names[duplicates_names],
sep = "_"
)
setnames(
dt,
new_names
)
Is there a nicer/faster/better/more robust way to accomplish this goal?
You could use rapply(), which unlists its result by default, to retain the deeper-nested names:
library(data.table)
DT <- setDT(as.list(rapply(test_list, identity)))
setnames(DT, make.names(names(DT), unique = TRUE))
DT
# V1 type text V2 type.1 text.1 V3
# 1: some string entry_type some content more strings another_type more text yet
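If the dots that make.names() inserts are not wanted, the renaming step can be sketched in base R with make.unique(), which keeps the original names and only numbers the repeats. This is an illustrative variant rather than part of the answer above; nm is a hypothetical stand-in for the names of the flattened list.

```r
# Names as they come out of the flattened list: blanks plus repeats.
nm <- c("", "type", "text", "", "type", "text", "")

# Give blank names a running V1, V2, ... label.
blank <- nm == ""
nm[blank] <- paste0("V", seq_len(sum(blank)))

# Number only the repeated names, using "_" as the separator.
new_names <- make.unique(nm, sep = "_")
new_names
```

The resulting vector can be passed to setnames() in place of the make.names() call.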

How to order a DataTable using a hidden column

I am rather new to R and I am trying to prepare an interactive data table using the DT package. My data contains numeric values, but some of these values are preceded by a < or > sign. I want my data table to allow interactive sorting on the numeric values, regardless of whether there is a < or > sign in front of them. So for example >10, <5, 9, >8 should sort to <5, >8, 9, >10.
My initial approach was to duplicate the column containing the values with < and > signs, remove the signs from the duplicate, and convert it to numeric, giving a column with only the numeric values. I would then like to order the table on these numeric values when clicking the ordering button of the column that still shows the < and > signs. Therefore, I want to hide the purely numeric column (since I do not want it to appear in the table) but somehow link the ordering function of the original column to this hidden column.
Here are some example data and a script in which I have already duplicated the column (b to c), removed the < and > signs, and converted it to numeric values to obtain the column c, which I have then hidden:
library(DT)
df <- data.frame(a=1:5, b=c('10','5.0','2.0','< 1.0','> 20'), c=c(10,5,2,1,20))
DT <- DT::datatable(df,
options = list(columnDefs =
list(list(visible=FALSE,
targets=3))))
DT
I have not been able to find a way to sort the data in the table on this hidden column c by using the sorting button of column b.
I have found that this should be possible in JavaScript: jQuery DataTables - Ordering dates by hidden column
However, I am not able to figure out how to do the same in R, either by using a suitable function in R, or by providing it in JavaScript using the JS() function.
Could anyone help me with this problem?
Here is a solution using render:
library(DT)
render <- c(
"function(data, type, row){",
" if(type === 'sort'){",
" return parseFloat(data.match(/\\d+\\.?\\d*/)[0]);",
" }else{",
" return data;",
" }",
"}"
)
df <- data.frame(
a = 1:5,
b = c('10','5.0','2.0','< 1.0','> 20')
)
DT <- datatable(df,
options = list(
columnDefs = list(
list(render = JS(render), type = "num", targets = 2)
)
)
)
DT
This solution does not require a hidden column.
Here's a way to do it. To get the "sorting key" use order.
library(DT)
# df <- data.frame(a=1:5, b=c('10','5.0','2.0','< 1.0','> 20'), c=c(10,5,2,1,20))
df <- data.frame(a = 1:5, b = c('10', '5.0', '2.0', '< 1.0', '> 20'))
df
#ONE APPROACH
df$c <-
stringr::str_replace(string = df$b,
pattern = "[<>]",
replacement = "") %>%
as.numeric()
#ANOTHER APPROACH
df$c <- gsub("[<>]", "", df$b) %>% as.numeric()
DT::datatable(df[order(df$c), -3], rownames = FALSE)
library(DT)
df <- data.frame(a=1:5, b=c('10','5.0','2.0','< 1.0','> 20'), c=c(10,5,2,1,20))
DT <- DT::datatable(df,
options = list(columnDefs =
list(list(visible=FALSE, targets=3),
list(orderData=3, targets=2)
)))
DT
Note: This answer is based on this one here, but DT now uses R indexing instead of JS indexing.
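The numeric sort key that the hidden column holds can be derived from the displayed strings. The following is a small base-R sketch on the example values from the question; x and key are hypothetical names, and it assumes every entry consists of one number with an optional leading < or > sign.

```r
x <- c(">10", "<5", "9", ">8")

# Drop the comparison signs and any whitespace, then convert to numeric.
key <- as.numeric(gsub("[<>[:space:]]", "", x))

# Ordering the original strings by the numeric key gives the wanted order.
sorted <- x[order(key)]
sorted
```

In the hidden-column approach, key plays the role of column c and the orderData option points column b at it.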

How can I delete a row containing a specific string in R?

I am new to using R. I am using a data set and the missing values have been replaced with "?" before I get the data. I am looking for a way to delete the rows that contain this. It isn't specific to just one row it is in all of them.
I have tried Delete rows containing specific strings in R but it isn't working for me. I have included my code so far below.
library(randomForest)
heart <- read.csv(url('http://archive.ics.uci.edu/ml/machine-learning-databases/echocardiogram/echocardiogram.data'))
names <- names(heart)
nrow(heart)
ncol(heart)
names(heart)
colnames(heart)[colnames(heart)=="X11"] <- "survival"
colnames(heart)[colnames(heart)=="X0"] <- "alive"
colnames(heart)[colnames(heart)=="X71"] <- "attackAge"
colnames(heart)[colnames(heart)=="X0.1"] <- "pericardialEffusion"
colnames(heart)[colnames(heart)=="X0.260"] <- "fractionalShortening"
colnames(heart)[colnames(heart)=="X9"] <- "epss"
colnames(heart)[colnames(heart)=="X4.600"] <- "lvdd"
colnames(heart)[colnames(heart)=="X14"] <- "wallMotionScore"
colnames(heart)[colnames(heart)=="X1"] <- "wallMotionIndex"
colnames(heart)[colnames(heart)=="X1.1"] <- "mult"
colnames(heart)[colnames(heart)=="name"] <- "patientName"
colnames(heart)[colnames(heart)=="X1.2"] <- "group"
colnames(heart)[colnames(heart)=="X0.2"] <- "aliveAfterYear"
names(heart)
library(randomForest)
heart <- read.csv(url('http://archive.ics.uci.edu/ml/machine-learning-databases/echocardiogram/echocardiogram.data'),na.strings = "?")
names <- names(heart)
nrow(heart)
ncol(heart)
names(heart)
colnames(heart)[colnames(heart)=="X11"] <- "survival"
colnames(heart)[colnames(heart)=="X0"] <- "alive"
colnames(heart)[colnames(heart)=="X71"] <- "attackAge"
colnames(heart)[colnames(heart)=="X0.1"] <- "pericardialEffusion"
colnames(heart)[colnames(heart)=="X0.260"] <- "fractionalShortening"
colnames(heart)[colnames(heart)=="X9"] <- "epss"
colnames(heart)[colnames(heart)=="X4.600"] <- "lvdd"
colnames(heart)[colnames(heart)=="X14"] <- "wallMotionScore"
colnames(heart)[colnames(heart)=="X1"] <- "wallMotionIndex"
colnames(heart)[colnames(heart)=="X1.1"] <- "mult"
colnames(heart)[colnames(heart)=="name"] <- "patientName"
colnames(heart)[colnames(heart)=="X1.2"] <- "group"
colnames(heart)[colnames(heart)=="X0.2"] <- "aliveAfterYear"
names(heart)
heart1 <- na.omit(heart)
While importing the file, you can specify na.strings = "?" so that every "?" is read as NA; na.omit() then removes all rows that contain an NA.
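The effect of na.strings is easy to check on a tiny inline data set. A minimal sketch, where csv_text, toy and clean are made-up names and the three data rows are invented for illustration:

```r
# Three data rows; "?" marks a missing value in two of them.
csv_text <- "a,b\n1,?\n2,3\n?,4"

# "?" is converted to NA on import, so both columns stay numeric.
toy <- read.csv(text = csv_text, na.strings = "?")

# Only the complete row (a = 2, b = 3) survives.
clean <- na.omit(toy)
clean
```

Because "?" never reaches the data frame as a string, the columns are not silently turned into character or factor columns.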
I think this can do what you want.
# Do not forget to set stringsAsFactors as false to the read.csv
# as to make string comparison efficient
heart <- read.csv(url('http://archive.ics.uci.edu/ml/machine-learning-databases/echocardiogram/echocardiogram.data'),stringsAsFactors = F)
# Simpler way to assign column names to the dataframe
colnames(heart) <- c("survival", "alive", "attackAge", "pericardialEffusion",
"fractionalShortening", "epss", "lvdd", "wallMotionScore",
"wallMotionIndex", "mult", "patientName",
"group", "aliveAfterYear")
# You can traverse a dataframe as a matrix using the row and column index
# as coordinates
for(r in 1:nrow(heart)){
for(c in 1:ncol(heart)){
# For this particular cell you do a comparison
# substituting the ? with NA which is the default missing value
# in R
heart[r,c] <- ifelse(heart[r,c]=="?",NA,heart[r,c])
}
}
# omit the NA rows
heart <- na.omit(heart)
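The cell-by-cell loop can usually be replaced by a single logical-matrix assignment. This is a sketch on an invented toy data frame, assuming the columns were read as character and do not already contain NA values:

```r
toy <- data.frame(x = c("1", "?", "3"),
                  y = c("a", "b", "?"),
                  stringsAsFactors = FALSE)

# toy == "?" is a logical matrix; every TRUE cell is set to NA at once.
toy[toy == "?"] <- NA

# Keep only the rows without any NA.
clean <- na.omit(toy)
clean
```

On the heart data the same two lines would stand in for the nested for loops.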
Some libraries support reading csv files and specifying strings to be read as missing values. I use the readr library most often. Then you can just use na.omit and similar functions.
library(readr)
library(dplyr)
heart <- read_csv(
'http://archive.ics.uci.edu/ml/machine-learning-databases/echocardiogram/echocardiogram.data',
na=c("", "?")
)
colnames(heart) <- recode(
colnames(heart),
"X11" = "survival",
"X0" = "alive",
"X71" = "attackAge",
"X0.1" = "pericardialEffusion",
"X0.260" = "fractionalShortening",
"X9" = "epss",
"X4.600" = "lvdd",
"X14" = "wallMotionScore",
"X1" = "wallMotionIndex",
"X1.1" = "mult",
"name" = "patientName",
"X1.2" = "group",
"X0.2" = "aliveAfterYear"
)
heart
heart <- na.omit(heart)
(You can also save some typing with the recode function from the dplyr package, but your solution for renaming the columns works just as well.)

R: Conditional Formatting across excel files

I am trying to highlight rows of an Excel file based on a match from the columns in a separate Excel file. In short, I want to highlight a row in file1 if a cell in that row matches a cell in file2.
I saw that the conditionalFormatting() function (from the openxlsx package) has some of this functionality, but I cannot figure out how to use it.
The pseudo-code I think would look something like this:
file1 <- read_excel("file1")
file2 <- read_excel("file2")
conditionalFormatting(file1, sheet = 1, cols = 1:end, rows = 1:22,
rule = "number in file1 is found in a specific column of file 2")
Please let me know if this makes sense or if I need to clarify something.
Thanks!
The conditionalFormatting() function embeds active conditional formatting into the Excel document but is likely more complicated than you need for a one-time highlight. I'd suggest loading each file into a dataframe, determining which rows contain a matching cell, creating a highlight style (yellow background), loading the file as a workbook object, setting the appropriate rows to the highlight style, and saving the updated workbook object.
The following function is used to determine which rows have a match. The magrittr package provides the %>% pipe and the data.table package provides the transpose() function.
find_matched_rows <- function(df1, df2) {
require(magrittr)
require(data.table)
# the dataframe object treats each column as a list making it much easier and
# faster to search via column than row. Transpose the original file1 dataframe
# to treat the rows as columns.
df1_transposed <- data.table::transpose(df1)
# assuming that the location of the match in the second file is irrelevant,
# unlist the file2 dataframe so that each value in file1 can be searched in a
# vector
df2_as_vector <- unlist(df2)
# determine which columns contain a match. If one or more matches are found,
# attribute the row as 'TRUE' in the output vector to be used to subset the
# row numbers
match_map <- lapply(df1_transposed,FUN = `%in%`, df2_as_vector) %>%
as.data.frame(stringsAsFactors = FALSE) %>%
sapply(function(x) sum(x) > 0)
# make a vector of row numbers using the logical match_map vector to subset
matched_rows <- seq_len(nrow(df1))[match_map]
return(matched_rows)
}
The following code loads the data, finds the matched rows, applies the highlight, and saves over the original file1.xlsx. The second tst_df1 and tst_df2 provide for an easy way of testing the find_matched_rows() function. As expected, it finds that the 1st and 3rd rows of the first dataframe contain a cell that matches a cell in second dataframe.
# used to ensure that the correct rows are highlighted. the dataframe does not
# include the header as an independent row unlike excel.
file1_header_row <- 1
file2_header_row <- 1
tst_df1 <- openxlsx::read.xlsx("./file1.xlsx",
startRow = file1_header_row)
tst_df2 <- openxlsx::read.xlsx("./file2.xlsx",
startRow = file2_header_row)
#example data for testing
tst_df1 <- data.frame(fname = c("John", "Bob", "Bill"),
lname = c("Smith", "Johnson", "Samson"),
wage = c(10, 15.23, 137.38),
stringsAsFactors = FALSE)
tst_df2 <- data.frame(a = c(10, 34, 284.2),
b = c("Billy", "Bill", "Billy-Bob"),
c = c("Samson", "Johansson", NA),
stringsAsFactors = FALSE)
df_matched_rows <- find_matched_rows(tst_df1, tst_df2)
# any color found in colours() can be used here or hex color beginning with "#"
highlight_style <- openxlsx::createStyle(fgFill = "yellow")
file1_wb <- openxlsx::loadWorkbook(file = "./file1.xlsx")
openxlsx::addStyle(wb = file1_wb,
sheet = 1,
style = highlight_style,
rows = file1_header_row + df_matched_rows,
cols = 1:ncol(tst_df1),
stack = TRUE,
gridExpand = TRUE)
openxlsx::saveWorkbook(wb = file1_wb,
file = "./file1.xlsx",
overwrite = TRUE)
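The row-matching step can also be sketched in base R without transposing, by testing each column against the pooled values of the second data frame and combining the results row-wise. This is an alternative to find_matched_rows(), not the answer's own method; lookup and matched are hypothetical names, and values are compared as character strings.

```r
tst_df1 <- data.frame(fname = c("John", "Bob", "Bill"),
                      lname = c("Smith", "Johnson", "Samson"),
                      wage = c(10, 15.23, 137.38),
                      stringsAsFactors = FALSE)
tst_df2 <- data.frame(a = c(10, 34, 284.2),
                      b = c("Billy", "Bill", "Billy-Bob"),
                      c = c("Samson", "Johansson", NA),
                      stringsAsFactors = FALSE)

# Pool every cell of the second data frame into one character vector.
lookup <- as.character(unlist(tst_df2))

# One logical vector per column of df1, then OR them together row-wise.
hits <- lapply(tst_df1, function(col) as.character(col) %in% lookup)
matched <- which(Reduce(`|`, hits))
matched
```

The vector matched can be used in place of df_matched_rows when computing the rows argument of openxlsx::addStyle().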

Initialize an empty tibble with column names and 0 rows

I have a vector of column names called tbl_colnames.
I would like to create a tibble with 0 rows and length(tbl_colnames) columns.
The best way I've found of doing this is...
tbl <- as_tibble(data.frame(matrix(nrow=0,ncol=length(tbl_colnames))))
and then I want to name the columns so...
colnames(tbl) <- tbl_colnames
My question: Is there a more elegant way of doing this?
Something like tbl <- tibble(colnames=tbl_colnames)
my_tibble <- tibble(
var_name_1 = numeric(),
var_name_2 = numeric(),
var_name_3 = numeric(),
var_name_4 = numeric(),
var_name_5 = numeric()
)
Haven't tried, but I guess it works too if instead of initiating numeric vectors of length 0 you do it with other classes (for example, character()).
This SO question explains how to do it with other R libraries.
According to this tidyverse issue, this won't be a feature for tribbles.
Since you want to combine a list of tibbles, you can just assign NULL to the variable and then bind_rows with the other tibbles.
res = NULL
for(i in tibbleList)
res = bind_rows(res,i)
However, a much more efficient way to do this is
bind_rows(tibbleList) # combine all tibbles in the list
For anyone still interested in an elegant way to create a 0-row tibble with column names given by a character vector tbl_colnames:
tbl_colnames %>% purrr::map_dfc(setNames, object = list(logical()))
or:
tbl_colnames %>% purrr::map_dfc(~tibble::tibble(!!.x := logical()))
or:
tbl_colnames %>% rlang::rep_named(list(logical())) %>% tibble::as_tibble()
This, of course, results in each column being of type logical.
The following command will create a tibble with 0 rows and variables (columns) named with the contents of tbl_colnames
tbl <- tibble::tibble(!!!tbl_colnames, .rows = 0)
You could abuse readr::read_csv, which allows reading from a string. You can control names and types, e.g.:
tbl_colnames <- c("one", "two", "three", "c4", "c5", "last")
read_csv("\n", col_names = tbl_colnames) # all character type
read_csv("\n", col_names = tbl_colnames, col_types = "lcniDT") # various types
I'm a bit late to the party, but for future readers:
as_tibble(matrix(nrow = 0, ncol = length(tbl_colnames)), .name_repair = ~ tbl_colnames)
.name_repair allows you to name your columns within the same function.
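When the columns should have different types, one option is to pair the name vector with a list of zero-length prototype vectors. A minimal sketch, assuming the tibble package is installed; col_protos and the example column names are hypothetical.

```r
library(tibble)

tbl_colnames <- c("id", "name", "score")

# One zero-length vector per column; its class sets the column type.
col_protos <- list(integer(), character(), numeric())

# A named list of length-0 vectors converts to a 0-row tibble.
tbl <- as_tibble(setNames(col_protos, tbl_colnames))
tbl
```

Rows bound to tbl later keep these column types as long as the incoming data is compatible.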
