How to generate "pretty" output from the str() function in R - r

I have a moderate-sized data set that is 1000 rows by 81 columns. I'd like to use the output from str(), but I'd like to present it in a "prettier" way. I've tried things like this:
df %>% str() %>% kableExtra::kbl() %>% kableExtra::kable_minimal()
and
tbl_summary(as.data.frame(str(df)))
but neither works. I'm not married to str() or to any specific package, but that's the kind of summary I'm going for.
In the end, this is intended to generate an HTML file, but I'd like it to work with PDF output as well.
Any ideas on how to do this?

Update II:
This can be achieved making use of this gist devtools::source_gist('4a0a5ab9fe7e1cf3be0e')
<devtools::source_gist('4a0a5ab9fe7e1cf3be0e')>
print(strtable(iris, factor.values=as.integer), na.print='') %>%
kable() %>%
htmlTable()
Update I:
you could extend:
data.frame(variable = names(iris),
class = sapply(iris, typeof),
levels = sapply(iris, class),
first_values = sapply(iris, function(x) paste0(head(x), collapse = ", ")),
levels_values = sapply(iris, function(x) paste0(unique(x), collapse =", ")),
row.names = NULL) %>%
kable() %>%
htmlTable()
First answer:
Something like this using iris dataset:
library(knitr)
library(magrittr)
library(htmlTable)
data.frame(variable = names(iris),
classe = sapply(iris, typeof),
first_values = sapply(iris, function(x) paste0(head(x), collapse = ", ")),
row.names = NULL) %>%
kable() %>%
htmlTable()

skimr and gt (or kable, or flextable, or DT, or many other table packages) could also work here:
mtcars |>
skimr::skim() |>
gt::gt()

Related

how to use markdown formatting in gt::cols_label when supplying names with .list argument

Hello stackoverflow community,
I'd like to use the function for column renaming gt:cols_label() and globally define a list for this to input to the .list-argument of gt:cols_label(). The list should contain:
the new column names
the desired formatting options (using the md()-Function)
While providing both info in the ...-argument directly works:
mpg %>%
gt::gt() %>%
gt::cols_label(
manufacturer = md("*M*a**nufac***ture***r**")
)
This does not (and neither does the commented try):
rename_format_cols <- c(
"manufacturer" = md("*M*a**nufac***ture***r**")
) %>%
as.list()
# rename_format_cols <- c(
# "manufacturer" = "md("*M*a**nufac***ture***r**")"
# ) %>%
# as.list()
mpg %>%
gt::gt() %>%
gt::cols_label(
.list = rename_format_cols
)
I'd be very glad if you'd helped out.
Best regards, paulge

How to force linebreaks in kableExtra functions with escape = FALSE?

In kableExtra >= 0.8.0, the canonical way to insert a linebreak into text piped into a table from a kableExtra function such as add_header_above or pack_rows is to add an \n directly.
However, this appears not to work with the escape = FALSE argument, which is required if the text also contains LaTeX code.
How can one force linebreaks in kableExtra functions with escape = FALSE?
library(dplyr)
library(knitr)
library(kableExtra)
starwars %>%
filter(species == 'Gungan' | species == 'Droid') %>%
arrange(species) %>%
select(name, eye_color) %>%
kbl(booktabs = TRUE) %>%
pack_rows(
index = c(
'The droids: everybody\'s favourite' = 6,
'The Gungans: only beloved of \nthose aged under $3^2$' = 3),
escape = FALSE)
ISSUE
The issue at hand is that you wish to escape part of your header (i.e., the break) and not escape another part (i.e., the math code).
Further Complications
This core issue is further complicated by a number of factors:
when and how kableExtra is programmed to deal with escaping
a desire to have a solution that works for both html and LaTeX output
when and how R evaluates code
A SOLUTION
Here is a solution that will work for both html and LaTeX output, but it is not as clean and straight forward as your original code:
# a new version of `kableExtra::linebreak()` that takes into account what type
# of output is desired as well as how much escaping is necessary
linebreak2 <- function(x, double_escape = TRUE, ...) {
# if LaTeX insert text into a `\makecell[]{}` command and double escape
if(knitr::is_latex_output())
return(linebreak(x, double_escape = double_escape, ...))
# if html output just replace `\n`s with `<br/>`s
if(knitr::is_html_output())
return(gsub("\n", "<br/>", x))
# let x pass through for other types of output
return(x)
}
# build the index named vector outside the pipe flow
# in order to set the names using `linebreak2()`
index <- c(6, 3)
names(index) <- c(
'The droids: everybody\'s favourite',
linebreak2('The Gungans: only beloved of \nthose aged under $3^2$')
)
# proceed as before
starwars %>%
filter(species == 'Gungan' | species == 'Droid') %>%
arrange(species) %>%
select(name, eye_color) %>%
kbl(booktabs = TRUE) %>%
pack_rows(index = index, escape = FALSE)
PDF Output
HTML Output
You could use html line break tag <br/>:
starwars %>%
filter(species == 'Gungan' | species == 'Droid') %>%
arrange(species) %>%
select(name, eye_color) %>%
kbl(booktabs = TRUE) %>%
pack_rows(
index = c(
'The droids: everybody\'s favourite' = 6,
'The Gungans: only beloved of <br/> those aged under $3^2$' = 3),
escape = FALSE)

How to display LaTeX symbols in Flextable (R)

I have generated the following table using the Flextable package in R. I created a conditionally formatted column with LaTeX arrow symbols in it, however the symbols aren't displayed when I generate it as a flextable. Is there a way to fix this?
library(tidyverse)
library(flextable)
data.frame(one = c(10,20,30), two = c(30,20,6)) %>%
mutate(Trend = case_when(.[,2] == .[,1] ~ "$\\rightarrow$", .[,2] > .[,1] ~ "$\\nearrow$", TRUE ~ "$\\searrow$")) %>%
flextable()
It may be easier to do this with unicode values for the symbols
library(dplyr)
library(flextable)
data.frame(one = c(10,20,30), two = c(30,20,6)) %>%
mutate(Trend = ifelse(two == one, "\U2192", "\U2190")) %>%
flextable()
-output

Drop row numbers in R tibble

How do I drop the row numbers when displaying R tibble through the DT package?
The options = list(rownames = FALSE) argument doesn't seem to work, I also "made up" options = list(rownumbers = FALSE) and that didn't work. I messed around with things like select(2:everything()) but that did not work either. Maybe piping in remove_rownames() at then end will work... it does not, or perhaps as.data.frame() piped at the end... Nope.
library(tidyverse)
library(DT)
datatable(mtcars %>% head() %>% as_tibble(), options = list(rownames = FALSE))
library(tidyverse)
library(DT)
datatable(mtcars %>% head() %>% as_tibble, rownames = FALSE)
You just need to rearrange the parenthesis, so that the rownames value is changed for datatable and not for the as_tibble call.

`gather` can't handle rownames

allcsvs = list.files(pattern = "*.csv$", recursive = TRUE)
library(tidyverse)
##LOOP to redact the snow data csvs##
for(x in 1:length(allcsvs)) {
df = read.csv(allcsvs[x], check.names = FALSE)
newdf = df %>%
gather(COL_DATE, SNOW_DEPTH, -PT_ID, -DATE) %>%
mutate(
DATE = as.Date(DATE,format = "%m/%d/%Y"),
COL_DATE = as.Date(COL_DATE, format = "%Y.%m.%d")
) %>%
filter(DATE == COL_DATE) %>%
select(-COL_DATE)
####TURN DATES UNAMBIGUOUS HERE####
df$DATE = lubridate::mdy(df$DATE)
finaldf = merge(newdf, df, all.y = TRUE)
write.csv(finaldf, allcsvs[x])
df = read.csv(allcsvs[x])
newdf = df[, -grep("X20", colnames(df))]
write.csv(newdf, allcsvs[x])
}
I am using the code above to populate a new column row-by-row using values from different existing columns, using date as selection criteria. If I manually open each .csv in excel and delete the first column, this code works great. However, if I run it on the .csvs "as is"
I get the following message:
Error: Column 1 must be named
So far I've tried putting -rownames within the parenthesis of gather, I've tried putting remove_rownames %>% below newdf = df %>%, but nothing seems to work. I tried reading the csv without the first column [,-1] or deleting the first column in R df[,1]<-NULL but for some reason when I do that my code returns an empty table instead of what I want it to. In other words, I can delete the rownames in Excel and it works great, if I delete them in R something funky happens.
Here is some sample data: https://drive.google.com/file/d/1RiMrx4wOpUdJkN4il6IopciSF6pKeNLr/view?usp=sharing
You can consider to import them with readr::read_csv.
An easy solution with tidyverse:
allcsvs %>%
map(read_csv) %>%
reduce(bind_rows) %>%
gather(COL_DATE, SNOW_DEPTH, -PT_ID, -DATE) %>%
mutate(
DATE = as.Date(DATE,format = "%m/%d/%Y"),
COL_DATE = as.Date(COL_DATE, format = "%Y.%m.%d")
) %>%
filter(DATE == COL_DATE) %>%
select(-COL_DATE)
With utils::read.csv, you are importing strings are factors. as.Date(DATE,format = "%m/%d/%Y") evaluates NA.
Update
Above solution returns one single dataframe. To write the each data file separately with the for loop:
for(x in 1:length(allcsvs)) {
read_csv(allcsvs[x]) %>%
gather(COL_DATE, SNOW_DEPTH, -PT_ID, -DATE) %>%
mutate(
COL_DATE = as.Date(COL_DATE, format = "%Y.%m.%d")
) %>%
filter(DATE == COL_DATE) %>%
select(-COL_DATE) %>%
write_csv(paste('tidy', allcsvs[x], sep = '_'))
}
Comparison
purrr:map and purrr:reduce can be used instead of for loop in some cases. Those functions take another functions as arguments.
readr::read_csv is typically 10x faster than base R equivalents. (More info: http://r4ds.had.co.nz/data-import.html). Also it can handle CSV files better.

Resources