Superscripting a variable over another when building tables in R and knitr - r

I am trying to build a table in which one of my variables has another variable superscripted after it. I can find several related answers here on SO, but they all involve fixed values that need to be superscripted, rather than vectors as in my case.
Also, most examples involve plot legends rather than tables (although I don't think that makes much of a difference).
Example data:
library(tidyverse)
library(knitr)
df <- crossing(
X = seq(1:2),
Y = c("A", "B"))
df
# A tibble: 4 x 2
X Y
<int> <chr>
1 1 A
2 1 B
3 2 A
4 2 B
I would like to mutate a new variable that is just X with the Y values superscripted after it.
Here is what I have tried (none of these work):
df %>% mutate(
New = paste0(X, "^Y")) %>%
kable()
df %>% mutate(
New = paste0(X, ^{Y})) %>%
kable()
df %>% mutate(
New = paste0(X, bquote(^~{.Y}~))) %>%
kable()
Any help appreciated.

You could use tableHTML:
df <- data.frame(
X = seq(1:2),
Y = c("A", "B"))
library(dplyr)
library(tableHTML)
You can slightly modify X with the HTML tag <sup> to display Y as a superscript:
df %>%
mutate(X = paste0(X, "<sup>", Y, "</sup>")) %>%
select(X) %>%
tableHTML(rownames = FALSE,
escape = FALSE,
widths = 50)
Edit
As pointed out by Steen, this also works with knitr:
df %>%
mutate(X = paste0(X, "<sup>", Y, "</sup>")) %>%
select(X) %>%
knitr::kable(escape = FALSE)

Is this for PDF output?
If so, the following could work:
library(tidyverse)
library(knitr)
df <- crossing(
X = seq(1:2),
Y = c("A", "B"))
df %>% mutate(
New = paste0(X, "\\textsuperscript{", Y, "}")) %>%
kable(escape = FALSE)
Using escape = FALSE to add LaTeX inside the table.
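The HTML and LaTeX answers differ only in the markup pasted around Y. As a minimal sketch, assuming knitr::is_latex_output() is available (knitr 1.18 or later), a small helper can pick the markup from the output format. The name superscript_markup and its latex argument are hypothetical and introduced only for illustration:

```r
library(knitr)

# Hypothetical helper: appends `sup` to `base` wrapped in superscript markup.
# `latex` defaults to knitr's own check of the current output format.
superscript_markup <- function(base, sup, latex = knitr::is_latex_output()) {
  if (latex) {
    paste0(base, "\\textsuperscript{", sup, "}")
  } else {
    paste0(base, "<sup>", sup, "</sup>")
  }
}

df <- data.frame(X = c(1, 1, 2, 2), Y = c("A", "B", "A", "B"))
df$New <- superscript_markup(df$X, df$Y)

# escape = FALSE keeps the markup from being escaped in the rendered table
kable(df, escape = FALSE)
```

Because paste0() is vectorised, the helper works on whole columns, which is the case the question asks about.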

Related

How to combine lapply with dplyr in a function

Below is a sample data frame that I have created along with the expected output.
df = data.frame(color = c("Yellow", "Blue", "Green", "Red", "Magenta"),
values = c(24, 24, 34, 45, 49),
Quarter = c("Period1","Period2" , "Period3", "Period3", "Period1"),
Market = c("Camden", "StreetA", "DansFireplace", "StreetA", "DansFireplace"))
dfXQuarter = df %>% group_by(Quarter) %>% summarise(values = sum(values)) %>%
mutate(cut = "Quarter") %>% data.frame()
colnames(dfXQuarter)[1] = "Grouping"
dfXMarket = df %>% group_by(Market) %>% summarise(values = sum(values)) %>%
mutate(cut = "Market")%>% data.frame()
colnames(dfXMarket)[1] = "Grouping"
df_all = rbind(dfXQuarter, dfXMarket)
Now, for the sake of brevity, I want to compile this into a function using lapply.
Below is my attempt:
list = c("Market", "Quarter")
df_all <- do.call(rbind, lapply(list, function(x){
df_l= df %>% group_by(x) %>%
summarise(values = sum(values)) %>%
mutate(cut= x) %>%
data.frame()
colnames(df_l)[df_l$x] = "Grouping"
df_l
}))
This block of code gives me an error.
I need the output to be an exact replica of the 'df_all' output for further operations.
How do I write this function correctly?
We can use purrr::map_dfr
library(dplyr)
library(purrr)
#Don't use R built-in names (e.g. list) as variable names
lst <- c("Market", "Quarter")
#Use map if you need the output as a list
map_dfr(lst, ~df %>% group_by("Grouping"=!!sym(.x)) %>%
summarise(values = sum(values)) %>%
mutate(cut = .x) %>%
#To avoid the warning message from bind_rows
mutate_if(is.factor, as.character))
# A tibble: 6 x 3
Grouping values cut
<chr> <dbl> <chr>
1 Camden 24 Market
2 DansFireplace 83 Market
3 StreetA 69 Market
4 Period1 73 Quarter
5 Period2 24 Quarter
6 Period3 79 Quarter
We can fix the first solution with two changes:
Change group_by(x) to group_by_at(x), since x is a string here.
Use colnames(df_l)[colnames(df_l) == x] <- "Grouping" to rename the grouping variable.
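Putting those two changes into the original lapply attempt gives roughly the following. This is a sketch of the fix as described, with the vector renamed to groupings (a hypothetical name) so that it no longer masks list():

```r
library(dplyr)

df <- data.frame(
  color   = c("Yellow", "Blue", "Green", "Red", "Magenta"),
  values  = c(24, 24, 34, 45, 49),
  Quarter = c("Period1", "Period2", "Period3", "Period3", "Period1"),
  Market  = c("Camden", "StreetA", "DansFireplace", "StreetA", "DansFireplace")
)

groupings <- c("Market", "Quarter")

df_all <- do.call(rbind, lapply(groupings, function(x) {
  df_l <- df %>%
    group_by_at(x) %>%                 # x is a string, so group_by_at() is used
    summarise(values = sum(values)) %>%
    mutate(cut = x) %>%
    data.frame()
  # Rename the grouping column by matching its name, not by indexing with df_l$x
  colnames(df_l)[colnames(df_l) == x] <- "Grouping"
  df_l
}))

df_all
```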
Not pretty, but it works and doesn't require tidyverse functions:
groupwise_summation <- function(df, grouping_vecs){
# Split, apply, combine:
tmpdf <- do.call(rbind, lapply(split(df, df[,grouping_vecs]), function(x){sum(x$values)}))
# Clean up the df (columns are built directly; cbind() would coerce value to character):
data.frame(cut = row.names(tmpdf), value = as.numeric(tmpdf), row.names = NULL)
}
# Apply and combine:
df_all <- rbind(groupwise_summation(df, c("Quarter")), groupwise_summation(df, c("Market")))
# Note inside the c(), you can use multiple grouping variables.

Dynamically change the column name created using summarise() and complete()

I'm trying to dynamically create an extra column. The first piece of code works as I want it to:
library(dplyr)
library(tidyr)
set.seed(1)
df <- data.frame(animals = sample(c('dog', 'cat', 'rat'), 100, replace = T))
my_fun <- function(data, column_name){
data %>% group_by(animals) %>%
summarise(!!column_name := n())
}
my_fun(df, 'frequency')
Here I also use the complete function, and it doesn't work:
library(dplyr)
set.seed(1)
df <- data.frame(animals = sample(c('dog', 'cat', 'rat'), 100, replace = T))
my_fun <- function(data, column_name){
data %>% group_by(animals) %>%
summarise(!!column_name := n())%>%
ungroup() %>%
complete(animals = c('dog', 'cat', 'rat', 'bat'),
fill = list(!!column_name := 0))
}
my_fun(df, 'frequency')
The list function doesn't seem to like !!column_name :=
Is there something I can do to make this work? Basically, I want the second piece of code to output:
animals frequency
bat 0
cat 38
dog 27
rat 35
You could keep the fill argument of complete() as the default (which will give you the missing values as NA) and subsequently replace them with 0:
my_fun <- function(data, column_name){
data %>%
group_by(animals) %>%
summarise(!!column_name := n())%>%
ungroup() %>%
complete(animals = c('dog', 'cat', 'rat', 'bat')) %>%
mutate_all(~replace(., is.na(.), 0))
}
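An alternative that keeps the fill argument, sketched here under the assumption that the only obstacle is that base list() does not understand := names: build the named list with setNames(), which accepts the name as an ordinary string.

```r
library(dplyr)
library(tidyr)

set.seed(1)
df <- data.frame(animals = sample(c("dog", "cat", "rat"), 100, replace = TRUE),
                 stringsAsFactors = FALSE)

my_fun <- function(data, column_name) {
  data %>%
    group_by(animals) %>%
    summarise(!!column_name := n()) %>%
    ungroup() %>%
    complete(animals = c("dog", "cat", "rat", "bat"),
             # setNames() builds list(<column_name> = 0L) from a string name;
             # 0L matches the integer counts returned by n()
             fill = setNames(list(0L), column_name))
}

res <- my_fun(df, "frequency")
res
```

Unlike mutate_all(), this only fills the named column, so missing values elsewhere in the data are left alone.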

How to add comma to summary result after using "Summarise function" from tidyverse

I did a sum of a column using the code below.
I have the correct number but it is not formatted properly as a number. I also have a case where I need it formatted as currency. This is the code I've tried
Result %>%
summarise(Pieces_Mailed = sum(Households, na.rm = TRUE)) %>%
comma_format(digits = 12)
First case: it gave me 520698. How do I get it to return 520,698 instead?
Second case: it gave me 46553549. How do I get it to return $4,655,354 instead?
Thanks.
comma_format() just returns a formatting function, so the function it returns has to be called on the column:
library(dplyr)
library(scales)
library(tibble)
tibble(col = sample(1e5, 10, replace = FALSE)) %>%
summarise(col = sum(col)) %>%
mutate(col = comma_format(accuracy = 12)(col))
# A tibble: 1 x 1
# col
# <chr>
#1 481,296
For adding $, we need dollar_format
tibble(col = sample(1e5, 10, replace = FALSE)) %>%
summarise(col = sum(col)) %>%
mutate(col = dollar_format(accuracy = 12)(col))
# A tibble: 1 x 1
# col
# <chr>
#1 $445,896
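As a short sketch using the values from the question: scales also provides comma() and dollar(), which format a vector directly. Note that accuracy is the number to round to, so accuracy = 12 rounds to the nearest multiple of 12; the defaults are used below instead.

```r
library(scales)

# comma() and dollar() format values directly
pieces <- comma(520698)     # "520,698"
cost   <- dollar(4655354)   # "$4,655,354"

# comma_format() instead returns a function, which is then applied to the values
fmt <- comma_format()
pieces_again <- fmt(520698) # "520,698"
```

Inside a pipeline the same calls go in mutate(), for example mutate(Pieces_Mailed = comma(Pieces_Mailed)).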

Use variable names in function in dplyr for sum and cumsum

dplyr programming question here. I am trying to write a dplyr function which takes column names as inputs and also filters on a value passed to the function. What I am trying to recreate is the following data frame, called test:
#test df
x<- sample(1:100, 10)
y<- sample(c(TRUE, FALSE), 10, replace = TRUE)
date<- seq(as.Date("2018-01-01"), as.Date("2018-01-10"), by =1)
my_df<- data.frame(x = x, y =y, date =date)
test<- my_df %>% group_by(date) %>%
summarise(total = n(), total_2 = sum(y ==TRUE, na.rm=TRUE)) %>%
mutate(cumulative_a = cumsum(total), cumulative_b = cumsum(total_2)) %>%
ungroup() %>% filter(date >= "2018-01-03")
The function I am testing is as follows:
cumsum_df<- function(data, date_field, cumulative_y, minimum_date = "2017-04-21") {
date_field <- enquo(date_field)
cumulative_y <- enquo(cumulative_y)
data %>% group_by(!!date_field) %>%
summarise(total = n(), total_2 = sum(!!cumulative_y ==TRUE, na.rm=TRUE)) %>%
mutate(cumulative_a = cumsum(total), cumulative_b = cumsum(total_2)) %>%
ungroup() %>% filter((!!date_field) >= minimum_date)
}
test2<- cumsum_df(data = my_df, date_field = date, cumulative_y = y, minimum_date = "2018-01-03")
I have looked at some examples of using enquo, and this thread gets me halfway there:
Use variable names in functions of dplyr
But the issue is that I get two different data frame outputs for test and test2. The output from the function does not have data from the logical column y.
I also tried this instead:
cumsum_df<- function(data, date_field, cumulative_y, minimum_date = "2017-04-21") {
date_field <- enquo(date_field)
cumulative_y <- deparse(substitute(cumulative_y))
data %>% group_by(!!date_field) %>%
summarise(total = n(), total_2 = sum(data[[cumulative_y]] ==TRUE, na.rm=TRUE)) %>%
mutate(cumulative_a = cumsum(total), cumulative_b = cumsum(total_2)) %>%
ungroup() %>% filter((!!date_field) >= minimum_date)
}
test2<- cumsum_df(data= my_df, date_field = date, cumulative_y = y, minimum_date = "2018-01-04")
Based on this thread: Pass a data.frame column name to a function
But the output in my test2 column is also wildly different, and it seems to do some kind of recursive accumulation, which again differs from my test data frame.
If anyone can help that would be much appreciated.
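One likely cause, offered as a sketch rather than a confirmed diagnosis: !! has the same low precedence as !, so !!cumulative_y == TRUE is parsed as !!(cumulative_y == TRUE), and the comparison is applied to the quosure rather than to the column. In the second attempt, data[[cumulative_y]] refers to the whole ungrouped column, so every date receives the overall sum, which cumsum() then accumulates. Wrapping the unquoted variable in parentheses avoids both problems. The seed below is arbitrary and only makes the example reproducible:

```r
library(dplyr)

set.seed(42)  # arbitrary seed, not from the question
my_df <- data.frame(
  x    = sample(1:100, 10),
  y    = sample(c(TRUE, FALSE), 10, replace = TRUE),
  date = seq(as.Date("2018-01-01"), as.Date("2018-01-10"), by = 1)
)

cumsum_df <- function(data, date_field, cumulative_y, minimum_date = "2017-04-21") {
  date_field   <- enquo(date_field)
  cumulative_y <- enquo(cumulative_y)
  data %>%
    group_by(!!date_field) %>%
    summarise(total = n(),
              # Parentheses make !! apply to the variable only, not to the comparison
              total_2 = sum((!!cumulative_y) == TRUE, na.rm = TRUE)) %>%
    mutate(cumulative_a = cumsum(total), cumulative_b = cumsum(total_2)) %>%
    ungroup() %>%
    filter((!!date_field) >= as.Date(minimum_date))
}

# Reference result written out by hand, as in the question
test <- my_df %>%
  group_by(date) %>%
  summarise(total = n(), total_2 = sum(y == TRUE, na.rm = TRUE)) %>%
  mutate(cumulative_a = cumsum(total), cumulative_b = cumsum(total_2)) %>%
  ungroup() %>%
  filter(date >= as.Date("2018-01-03"))

test2 <- cumsum_df(data = my_df, date_field = date, cumulative_y = y,
                   minimum_date = "2018-01-03")

all.equal(as.data.frame(test), as.data.frame(test2))
```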

distinct drops columns after group_by

After doing a group_by I can't get distinct values unless I change the object back to a data frame.
library(dplyr)
x <- data.frame(A = c(1,1,2,2,3,3), B = c(1,2,3,4,5,6), C = c(6,6,6,5,5,5))
y <- x %>% group_by(A) %>% transmute(B = mean(B), C = mean(C))
y
distinct(y)
distinct(as.data.frame(y))
This behaviour seems to have changed after a recent dplyr release (I have dplyr_0.5.0) as I'm sure my code used to work. The question is, is this a bug or by design? If by design, I need to change a bunch of code. Thanks!
Try:
library(dplyr)
x <- data.frame(A = c(1,1,2,2,3,3), B = c(1,2,3,4,5,6), C = c(6,6,6,5,5,5))
y <- x %>% group_by(A) %>% transmute(B = mean(B), C = mean(C)) %>% ungroup()
y
distinct(y)
distinct(as.data.frame(y))
Note the ungroup().
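If the goal is simply one row per group, a different technique from the ungroup() plus distinct() combination is to use summarise() in place of transmute(). This is a swapped-in alternative rather than the answerer's method, sketched with the same data:

```r
library(dplyr)

x <- data.frame(A = c(1, 1, 2, 2, 3, 3), B = c(1, 2, 3, 4, 5, 6), C = c(6, 6, 6, 5, 5, 5))

# summarise() collapses each group to a single row, so distinct() is not needed
y <- x %>%
  group_by(A) %>%
  summarise(B = mean(B), C = mean(C))

y
```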
