How to make cells bold in kableExtra based on a condition - r

My data looks like this as a kable:
pdtable %>%
kbl(caption = "This is the caption") %>%
kable_classic_2()
However, I want to make some cells bold. Is there a way to do it without editing the input data frame? I tried to integrate cell_spec into the pipe, but I can't get it to work.
Does anyone have a solution?
EDIT:
Here is some example data. I want to make bold every cell whose value in brackets is below 0.05. A conditional row_spec, however, does not seem to work because each cell contains two values.
structure(list(`2012` = c("4.16 (0.02)", "1.39 (0.043)", "-3.65 (0.213)",
"4.35 (0.248)", "3.16 (0.036)", "8.84 (0.002)", "15.13 (0)",
"13.03 (0)", "11.16 (0.002)", "4.35 (0.047)", "-2.39 (0.6)",
"-1.45 (0.531)"), `2013` = c("-5.97 (0.24)", "-2.45 (0.73)",
"1.58 (0.002)", "17.77 (0)", "24.23 (0)", "17.29 (0)", "24.62 (0)",
"26.95 (0)", "16.92 (0)", "2.53 (0.13)", "3.79 (0.019)", "4.37 (0)"
), `2014` = c("-22.53 (0.04)", "-14.01 (0.899)", "-3.06 (0.079)",
"12.06 (0.072)", "20.32 (0.011)", "13.86 (0.009)", "34.91 (0)",
"32.15 (0)", "27.33 (0)", "2.53 (0.412)", "3.79 (0.158)", "-6.35 (0)"
), `2012-2014` = c("-26.36 (0.002)", "-13.62 (0.028)", "-4.05 (0)",
"34.98 (0)", "46.65 (0)", "37.45 (0)", "76.91 (0)", "77.23 (0)",
"60.26 (0)", "-14.44 (0.004)", "-15.67 (0)", "-6.71 (0)")), class = "data.frame", row.names = c("test 3",
"test 7", "test 15", "test1 3", "test1 7", "test1 15",
"test3 3", "test 3", "test 4", "test 4", "test 4", "test 4"))

You could use cell_spec() conditionally with dplyr::mutate() and stringr:
library(kableExtra)
library(dplyr)
library(stringr)
pdtable |>
  # Wrap every cell in cell_spec(); a cell is bold when the number in brackets is below 0.05
  mutate(across(everything(), ~cell_spec(.x, bold = as.numeric(str_extract(.x, "(?<=\\().*?(?=\\))")) < 0.05))) |>
  # escape = FALSE so the markup generated by cell_spec() is passed through unescaped
  kbl(caption = "This is the caption",
      escape = FALSE) |>
  kable_classic_2()

column_spec() can accept a logical vector to control the text formatting of individual cells in a column. This example makes cell (3, 1) bold:
library(tidyverse)
library(kableExtra)
df <- tibble(a = 1:5, b = 1:5)
df %>%
kbl() %>%
column_spec(1, bold = ifelse(df$a == 3, TRUE, FALSE)) %>%
kable_styling()
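The two answers can also be combined so that pdtable itself is never modified: compute one logical vector per column and pass it to column_spec(bold = ...). Below is a minimal base-R sketch of the test itself, avoiding the lookbehind regex. `is_significant` is a hypothetical helper name, and it assumes the value of interest sits inside the last pair of brackets in each cell.

```r
# Hypothetical helper: flag cells such as "4.16 (0.02)" whose bracketed
# value is below `alpha`. Cells without brackets are never flagged.
is_significant <- function(x, alpha = 0.05) {
  has_bracket <- grepl("\\([^)]*\\)", x)
  p <- rep(NA_real_, length(x))
  # take the text inside the last "(...)" and convert it to a number
  p[has_bracket] <- suppressWarnings(
    as.numeric(sub(".*\\(([^)]*)\\).*", "\\1", x[has_bracket]))
  )
  !is.na(p) & p < alpha
}

is_significant(c("4.16 (0.02)", "-3.65 (0.213)", "15.13 (0)"))
```

To apply it, loop over the columns and call something like `tbl <- column_spec(tbl, j + 1, bold = is_significant(pdtable[[j]]))` on a table created with `kbl(pdtable)`. The `+ 1` assumes kbl() prints the character row names as the first column, so data column j becomes table column j + 1.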


Creating a Kendall correlation matrix

I have data that looks like this:
In total there are 38 columns.
Data code sample:
df <- structure(
list(
Christensenellaceae = c(
0.010484508,
0.008641566,
0.010017172,
0.010741488,
0.1,
0.2,
0.3,
0.4,
0.7,
0.8,
0.9,
0.1,
0.3,
0.45,
0.5,
0.55
),
Date=c(27,27,27,27,27,27,27,27,28,28,28,28,28,28,28,28),
Treatment = c(
"Treatment 1",
"Treatment 1",
"Treatment 1",
"Treatment 1",
"Treatment 2",
"Treatment 2",
"Treatment 2",
"Treatment 2",
"Treatment 1",
"Treatment 1",
"Treatment 1",
"Treatment 1",
"Treatment 2",
"Treatment 2",
"Treatment 2",
"Treatment 2"
)
),class = "data.frame",
row.names = c(NA, -16L)
)
What I wish to do is create a Kendall correlation matrix (the data do not behave linearly) between the treatment types (10 in total, but 2 in the example) for every column except Treatment and Date. That makes 36 correlation matrices in total, each of size 10×10 (2×2 here).
This is my code:
res2 <- cor(as.matrix(data),method ="kendall")
but I get the error:
Error in cor(data, method = "kendall") : 'x' must be numeric
Is there any way to solve this? Thank you :)
You can do that with a tidyverse approach: first do some data wrangling, then use corrr::correlate() to calculate the pairwise correlation for every combination of variables.
library(corrr)
library(tidyverse)
df |>
# Transform data into wide format
pivot_wider(id_cols = Date,
names_from = Treatment,
values_from = -starts_with(c("Treatment", "Date"))) |>
# Unnest lists inside each column
unnest(cols = starts_with("Treatment")) |>
# Remove Date from the columns
select(-Date) |>
# Correlate all columns using kendall
correlate(method = "kendall")
# A tibble: 2 x 3
# term `Treatment 1` `Treatment 2`
# <chr> <dbl> <dbl>
#1 Treatment 1 NA 0.546
#2 Treatment 2 0.546 NA
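The original error occurs because as.matrix() on a data frame that contains the character Treatment column produces a character matrix, which cor() rejects. As a dependency-free alternative to the corrr answer, here is a minimal base-R sketch. It assumes each treatment has the same number of observations and that rows pair up by position within each treatment (for example, sorted by Date, as in the sample data). `kendall_by_treatment` and `kendall_all` are hypothetical helper names.

```r
# Hypothetical helper: one Kendall matrix (treatments x treatments) for a
# single measurement column. Assumes equal group sizes, paired by position.
kendall_by_treatment <- function(df, value_col, group_col = "Treatment") {
  groups <- split(df[[value_col]], df[[group_col]])  # one vector per treatment
  m <- do.call(cbind, groups)                        # columns named by treatment
  cor(m, method = "kendall")
}

# Hypothetical helper: repeat for every column except Treatment and Date,
# returning a named list of matrices (36 of them for the full data).
kendall_all <- function(df, exclude = c("Treatment", "Date")) {
  value_cols <- setdiff(names(df), exclude)
  lapply(setNames(value_cols, value_cols), kendall_by_treatment, df = df)
}
```

With 10 treatments, each matrix in `kendall_all(df)` would be 10×10. If the group sizes differ, cbind() recycles the shorter vectors, so check the sizes with `table(df$Treatment)` first.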

table1() Output Labeling all Data as "Missing"

I am trying to make a descriptive statistics table in R. My code runs and produces a table, but although there are no missing values in my dataset, the table reports all of my values as missing. I am still a novice in R, so I do not know enough to troubleshoot this.
My code:
library(readxl)  # read_excel()
library(table1)  # table1() and label()
data <- read_excel("Data.xlsx")
data$stage <-
factor(data$stage, levels=c(1,2,3,4,5,6,7),
labels =c("Stage 0", "Stage 1", "Stage 2", "Stage 3", "Unsure", "Unsure (Early Stage)", "Unsure (Late Stage)"))
data$primary_language <-factor(data$primary_language, levels=c(1,2), labels = c("Spanish", "English"))
data$status_zipcode <- factor(data$status_zipcode, levels = (1:3), labels = c("Minority", "Majority", "Diverse"))
data$status_censusblock <- factor(data$status_censusblock, levels = c(0:2), labels = c("Minority", "Majority", "Diverse"))
data$self_identity <- factor(data$self_identity, levels = c(0:1), labels = c("Hispanic/Latina","White/Caucasian"))
data$subjective_identity <- factor(data$subjective_identity, levels = c(0,1,2,4), labels = c("Hispanic/Latina", "White/Caucasian", "Multiracial", "Asian"))
label (data$stage)<- "Stage at Diagnosis"
label(data$age) <- "Age"
label(data$primary_language) <- "Primary language"
label(data$status_zipcode)<- "Demographic Status in Zipcode Area"
label(data$status_censusblock)<- "Demographic Status in Census Block Group"
label(data$self_identity) <- "Self-Identified Racial/Ethnic Group"
label(data$subjective_identity)<- "Racial/Ethnic Group as Identified by Others"
table1(~ stage +age + primary_language + status_zipcode + status_censusblock + self_identity + subjective_identity| primary_language, data=data)
Table output: (screenshot not included)
Data set: (screenshot not included)
When I run the data set, the values are there. It actually worked for me when I redid the spacing:
data$stage <- factor(data$stage,
levels = c(1,2,3,4,5,6,7),
labels = c("Stage 0", "Stage 1", "Stage 2", "Stage 3", "Unsure", "Unsure (Early Stage)", "Unsure (Late Stage)"))
When I ran it exactly as you typed it, it came up with NAs, too. Try the first one and see if it works for you that way, then check the spacing for the others. That may be all it is.
I do end up with one NA in the stage column because 0 is not defined in your levels.
Edit: I ran the rest, so here are some other points.
You end up with an NA in stage because one of your values is 0, but 0 is not given a level or label.
You end up with NAs in language because your data use 0 and 1 but you define the levels as 1, 2, so you'd need to change the levels to match the values. You end up with NAs in other columns because of the `:` ranges in your `levels` definitions.
Change your code to this and you should have the values you need, except for that initial 0 in "stage":
data$stage <- factor(data$stage,
levels=c(1,2,3,4,5,6,7),
labels =c("Stage 0", "Stage 1", "Stage 2", "Stage 3", "Unsure", "Unsure (Early Stage)", "Unsure (Late Stage)"))
data$primary_language <-factor(data$primary_language,
levels=c(0,1),
labels = c("Spanish", "English"))
data$status_zipcode <- factor(data$status_zipcode,
levels = c(0,1,2),
labels = c("Minority", "Majority", "Diverse"))
data$status_censusblock <- factor(data$status_censusblock,
levels = c(0,1,2),
labels = c("Minority", "Majority", "Diverse"))
data$self_identity <- factor(data$self_identity,
levels = c(0,1),
labels = c("Hispanic/Latina","White/Caucasian"))
data$subjective_identity <- factor(data$subjective_identity,
levels = c(0,1,2,4),
labels = c("Hispanic/Latina", "White/Caucasian", "Multiracial", "Asian"))
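The common cause behind all of these NAs is that factor() silently turns any value not listed in `levels` into NA, and table1 then reports it as Missing. A quick check before labelling each variable is to compare the raw codes with the levels you pass. `unmatched_codes` is a hypothetical helper, not part of table1, and the 0/1 vector below is made-up example data.

```r
# Hypothetical helper: raw codes present in `x` but absent from `levels`.
# factor(x, levels = levels) turns exactly these values into NA.
unmatched_codes <- function(x, levels) {
  setdiff(unique(x[!is.na(x)]), levels)
}

lang <- c(0, 1, 1, 0)                       # example: coded 0/1 in the data
unmatched_codes(lang, levels = c(1, 2))     # 0 is not a level
sum(is.na(factor(lang, levels = c(1, 2))))  # so both 0s become NA
unmatched_codes(lang, levels = c(0, 1))     # nothing left over
```

Running it on each column, e.g. `unmatched_codes(data$status_zipcode, 1:3)`, shows which codes still need a level before you call factor().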

Indentation when line break in group_rows() command - kableExtra package in R markdown

I'm using the kableExtra package to output a table to PDF in R markdown.
I use the command group_rows() to group some rows of my table together.
The text in some rows of my first column is too long for the column width, so it is broken into two lines. However, the second line is not indented. Is there a way to either indent the second line as well or remove the indentation altogether?
Increasing the column width so the text won't be spread over two lines is unfortunately no option since I have way more columns in my real table.
This is a subset of my data frame:
data <- structure(list(`Control variables` = c("GDP growth", "GDP per capita",
"Top income tax rate", "Right-wing executive"), Treated = structure(c("2.29",
"21,523.57", "0.70", "0.62"), class = "AsIs"), top10_synthetic = structure(c("3.37", "19,939.72", "0.68", "0.63"), class = "AsIs"), top10_mean = structure(c("2.95", "30,242.60", "0.64", "0.43"), class = "AsIs")), .Names = c("Control variables", "Treated", "top10_synthetic", "top10_mean"), row.names = c(NA, 4L), class = "data.frame")
This is the code I am using:
```{r}
kable(data, "latex", caption = "table 1", booktabs = T, col.names = c("Control variables", "Treated", "Synthetic", "Mean")) %>%
add_header_above(c("", "", "Top 10%" = 2)) %>%
group_rows("UK", 1, 2) %>%
group_rows("Japan", 3, 4, latex_gap_space = "0.8cm") %>%
footnote(general = "xxx") %>%
kable_styling(latex_options = c("HOLD_position", "scale_down")) %>%
column_spec(1, width = "3cm")
```
This is what the .pdf output looks like. As you can see, the text "Top income tax rate", for example, is split into two lines, and I would like the second line to be indented just like the first.
Thank you for any tips!
If you just run the chunk in the R console, you'll see this LaTeX output:
\begin{table}[H]
\caption{\label{tab:}table 1}
\centering
\resizebox{\linewidth}{!}{
\begin{tabular}[t]{>{\raggedright\arraybackslash}p{3cm}lll}
\toprule
\multicolumn{1}{c}{} & \multicolumn{1}{c}{} & \multicolumn{2}{c}{Top 10\%} \\
\cmidrule(l{2pt}r{2pt}){3-4}
Control variables & Treated & Synthetic & Mean\\
\midrule
\addlinespace[0.3em]
\multicolumn{4}{l}{\textbf{UK}}\\
\hspace{1em}GDP growth & 2.29 & 3.37 & 2.95\\
\hspace{1em}GDP per capita & 21,523.57 & 19,939.72 & 30,242.60\\
\addlinespace[0.8cm]
\multicolumn{4}{l}{\textbf{Japan}}\\
\hspace{1em}Top income tax rate & 0.70 & 0.68 & 0.64\\
\hspace{1em}Right-wing executive & 0.62 & 0.63 & 0.43\\
\bottomrule
\multicolumn{4}{l}{\textit{Note: }}\\
\multicolumn{4}{l}{xxx}\\
\end{tabular}}
\end{table}
As you can see, kableExtra isn't inserting the line break in that row title; LaTeX is. This means you need a LaTeX fix for the problem. Maybe someone else knows an easier one, but the best I could find is the following: wrap the long row title in a minipage environment, and adjust the spacing so it looks better.
Since this is kind of messy, I'd write an R function to do it:
inMinipage <- function(x, width)
paste0("\\begin{minipage}[t]{",
width,
"}\\raggedright\\setstretch{0.8}",
x,
"\\vspace{1.2ex}\\end{minipage}")
This needs to be called on the data being put into the table, and kable needs to be told not to escape those backslashes (using escape = FALSE). In addition, the \setstretch command comes from the setspace LaTeX package. So overall your sample document would look like this:
---
output:
  pdf_document:
    extra_dependencies: setspace
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
library(kableExtra)
library(knitr)
```
```{r}
inMinipage <- function(x, width)
paste0("\\begin{minipage}[t]{",
width,
"}\\raggedright\\setstretch{0.8}",
x,
"\\end{minipage}")
data <- structure(list(`Control variables` = c("GDP growth", "GDP per capita", "Top income tax rate", "Right-wing executive"), Treated = structure(c("2.29",
"21,523.57", "0.70", "0.62"), class = "AsIs"), top10_synthetic = structure(c("3.37", "19,939.72", "0.68", "0.63"), class = "AsIs"), top10_mean = structure(c("2.95", "30,242.60", "0.64", "0.43"), class = "AsIs")), .Names = c("Control variables", "Treated", "top10_synthetic", "top10_mean"), row.names = c(NA, 4L), class = "data.frame")
data[[1]] <- inMinipage(data[[1]], "2.5cm")
kable(data, "latex", caption = "table 1", booktabs = T, col.names = c("Control variables", "Treated", "Synthetic", "Mean"), escape = FALSE) %>%
add_header_above(c("", "", "Top 10%" = 2)) %>%
group_rows("UK", 1, 2) %>%
group_rows("Japan", 3, 4, latex_gap_space = "0.8cm") %>%
footnote(general = "xxx") %>%
kable_styling(latex_options = c("HOLD_position", "scale_down")) %>%
column_spec(1, width = "3cm")
```
With that code I see this:
The spacing isn't quite right, but it's getting close. I hope this helps.
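If only some labels are long enough to wrap, one possible refinement is to apply the minipage wrapper only to those, so short labels remain plain text. This is a minimal sketch. `wrapLong` and its `max_chars` threshold are hypothetical, and the character count is only a rough proxy for the rendered width of the text.

```r
# Same helper as in the full document above (variant without \vspace)
inMinipage <- function(x, width)
  paste0("\\begin{minipage}[t]{", width, "}\\raggedright\\setstretch{0.8}",
         x, "\\end{minipage}")

# Hypothetical variant: wrap only labels longer than `max_chars` characters
wrapLong <- function(x, width, max_chars = 15) {
  ifelse(nchar(x) > max_chars, inMinipage(x, width), x)
}

labels <- c("GDP growth", "GDP per capita", "Top income tax rate", "Right-wing executive")
out <- wrapLong(labels, "2.5cm")
cat(out, sep = "\n")
```

In the document you would use `data[[1]] <- wrapLong(data[[1]], "2.5cm")` in place of the `inMinipage()` call, still with `escape = FALSE` in kable().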

Create empty data frame

I created a completely empty matrix. I would like to split one column into two sub-columns (as you can in Excel).
Indices <- matrix(NA, 8, 2)
rownames(Indices) <- rownames(Indices, do.NULL = FALSE, prefix = "Plot") # do I need this?
rownames(Indices) <- c("Plot 1", "Plot 2", "Plot 3", "Plot 8", "Plot 9", "Plot 10",
"Plot 12", "Plot 13")
colnames(Indices) <- c("Density", "Trees per ha")
I would like to split Density into two columns: Density of oaks only and total Density. I don't know what this is called. Is it even possible in R?
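A base R matrix has only one row of column names, so there is no direct equivalent of Excel's merged header cells. Below is a minimal sketch, assuming the goal is simply two separate Density columns: create the matrix with three columns from the start. The column names are illustrative. Because the next line sets the row names through `dimnames`, the separate `rownames(..., prefix = "Plot")` step is not needed here.

```r
# Empty 8 x 3 matrix with Density already split into two columns
plots <- c(1, 2, 3, 8, 9, 10, 12, 13)
Indices <- matrix(NA, nrow = length(plots), ncol = 3,
                  dimnames = list(paste("Plot", plots),
                                  c("Density (oaks only)", "Density (total)",
                                    "Trees per ha")))
Indices
```

For a printed table with one shared "Density" header above the two columns, kableExtra's add_header_above() can group them, e.g. `kbl(Indices) %>% add_header_above(c(" " = 1, "Density" = 2, " " = 1))`.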

ggplot2 geom_text - 'dynamically' place label over barchart

I have what I know is going to be an impossibly easy question. I am showing an average number of days by month using a bar chart, using the following example:
dat <- structure(list(Days = c("217.00", "120.00", "180.00", "183.00",
"187.00", "192.00"), Amt = c("1,786.84", "1,996.53",
"1,943.23", "321.30", "2,957.03", "1,124.32"), Month = c(201309L,
201309L, 201309L, 201310L, 201309L, 201309L), Vendor = c("Comp A",
"Comp A", "Comp A", "Comp A", "Comp A",
"Comp A"), Type = c("Full", "Full",
"Self", "Self", "Self", "Self"
), ProjectName = c("Rpt 8",
"Rpt 8", "Rpt 8",
"Rpt 8", "Rpt 8",
"Rpt 8")), .Names = c("Days",
"Amt", "Month", "Vendor", "Type", "ProjectName"
), row.names = c("558", "561", "860", "1157", "1179", "1221"), class =
"data.frame")
ggplot(dat, aes(x=as.character(Month),y=as.numeric(Days),fill=Type))+
stat_summary(fun.y='mean', geom = 'bar')+
ggtitle('Rpt 8')+
xlab('Month')+
ylab('Average Days')+
geom_text(stat='bin',aes(y=100, label=paste('Avg:\n',..count..)))
Right now my labels show counts and appear wherever I set y.
I want to:
place labels at the top of the bars.
display the average, not the count.
I've pretty thoroughly - and unsuccessfully - tried most of the other solutions on SO & elsewhere.
Just got it:
library(plyr)     # ddply()
library(ggplot2)
means <- ddply(dat, .(Vendor, Type, Month), summarise, avg = mean(as.numeric(Days)))
ggplot(dat, aes(x = as.character(Month), y = as.numeric(Days), fill = Type)) +
  stat_summary(fun = mean, geom = 'bar') +  # `fun.y` is deprecated since ggplot2 3.3.0
  geom_text(data = means, stat = 'identity',
            aes(y = avg + 7, label = round(avg, 0), group = Type))
I realize there is nearly identical code elsewhere. My error was placing round()'s 0 outside the correct closing parenthesis, which moved all my labels to 0 on the x axis... DUH!
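The ddply() step requires the plyr package. If you would rather not load it, base R's aggregate() can compute the same per-group averages. This is a minimal sketch that rebuilds only the columns the plot uses from the question's data. The `DaysNum` and `label` column names are illustrative.

```r
# Only the columns needed for the averages (values from the question)
dat <- data.frame(Days = c("217.00", "120.00", "180.00", "183.00", "187.00", "192.00"),
                  Month = c(201309L, 201309L, 201309L, 201310L, 201309L, 201309L),
                  Vendor = "Comp A",
                  Type = c("Full", "Full", "Self", "Self", "Self", "Self"),
                  stringsAsFactors = FALSE)

dat$DaysNum <- as.numeric(dat$Days)  # Days is stored as text, e.g. "217.00"
means <- aggregate(DaysNum ~ Vendor + Type + Month, data = dat, FUN = mean)
names(means)[names(means) == "DaysNum"] <- "avg"
means$label <- round(means$avg, 0)   # the digits argument stays inside round()
means
```

The resulting `means` can then be passed to `geom_text(data = means, aes(y = avg + 7, label = label, group = Type))` exactly as in the answer.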
