Applying text_transform() to row groups in R gt

In the {gt} package I want to use text_transform() on the row group titles in order to render the HTML, but I'm getting the "no applicable method for 'resolve_location'" error.
In my example below, you can see that text_transform() works if the locations argument is cells_body() (which is not what I actually want) but not if it's cells_row_groups() which is what I want.
Thoughts?
Zev
# As an experiment, I put HTML in both a value and in the groups, though
# in the real data there is only HTML in groups.
tbl <- tibble::tibble(values = c("test<sup>2</sup>", 2:4), groups = c("x<sup>2</sup>", "x<sup>2</sup>", "y", "y"))
unescape_html <- function(str){
  xml2::xml_text(xml2::read_html(paste0("<x>", str, "</x>")))
}
# Error, no applicable method for resolve_location
tbl |>
  gt::gt(groupname_col = "groups") |>
  gt::text_transform(
    locations = gt::cells_row_groups(),
    fn = function(x){
      x <- purrr::map_chr(x, unescape_html)
      paste("<span style=color:red;>", x, "</span>")
    }
  )
# This works so it shows that I'm close :)
tbl |>
  gt::gt(groupname_col = "groups") |>
  gt::text_transform(
    locations = gt::cells_body(columns = 1),
    fn = function(x){
      x <- purrr::map_chr(x, unescape_html)
      paste("<span style=color:red;>", x, "</span>")
    }
  )

After some trial and error and a look at the rendered HTML code, I figured out a solution using gt::html():
tbl <- tibble::tibble(values = c("test<sup>2</sup>", 2:4), groups = c("x<sup>2</sup>", "x<sup>2</sup>", "y", "y"))
unescape_html <- function(str){
  xml2::xml_text(xml2::read_html(paste0("<x>", str, "</x>")))
}
tbl |>
  gt::gt(groupname_col = "groups") |>
  gt::text_transform(
    locations = gt::cells_row_groups(),
    fn = function(x) {
      purrr::map(x, ~ gt::html(paste("<span style=color:blue;>", .x, "</span>")))
    }
  ) |>
  gt::text_transform(
    locations = gt::cells_body(columns = 1),
    fn = function(x){
      x <- purrr::map_chr(x, unescape_html)
      paste("<span style=color:red;>", x, "</span>")
    }
  )

Related

How to use plot inside a function?

I have a list and I want to plot every data frame in it with the function MyPlot, but there are several problems:
It only plots the last data frame (L2).
The name of each data frame cannot be extracted with name = deparse(substitute(df)).
If I use jpeg instead of pdf, I get an error:
"Error in switch(units, in = res, cm = res/2.54, mm = res/25.4, px = 1) * :
non-numeric argument to binary operator"
Any help would be appreciated.
L1 = data.frame(A = c(1:4) , B = c(1:4) , C = c(1:4))
L2 = data.frame(A = c(5:8) , B = c(8:11), G = c(1:4) )
L=list(L1,L2)
names(L) = c('L1' , 'L2')
MyPlot <- function(df){
name = deparse(substitute(df))
jpeg(paste(name) , ".jpg")
#pdf(paste0(name,".pdf"), onefile = TRUE, paper = "A4")
P = ggplot(df, aes(A , B)) + geom_point()
#print(P)
dev.off()
}
Plot_jpeg = L %>% lapply(MyPlot)
This might not be what you want, but:
library(ggplot2)
L1 = data.frame(A = c(1:4) , B = c(1:4) , C = c(1:4))
L2 = data.frame(A = c(5:8) , B = c(8:11), G = c(1:4) )
L = list(L1, L2)
names(L) = c('L1' , 'L2')
MyPlot <- function(df, name){
  P = ggplot(df, aes(A , B)) + geom_point()
  # ggsave() takes the file name first and the plot second
  ggsave(glue::glue("{name}.jpeg"), plot = P)
  return(P)
}
Plots_list = purrr::map2(L, names(L),
                         function(.x, .y) MyPlot(.x, .y))
As MrFlick suggests, a more idiomatic purrr option could be purrr::imap(L, ~MyPlot(.x, .y)), since imap(x, ...) is shorthand for map2(x, names(x), ...). I am using map2 because I would rather be explicit: fewer things to remember.
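The symptoms in the question most likely share one cause, which a small base R sketch can show. Inside lapply(), deparse(substitute(df)) returns lapply's internal expression rather than the list element's name, so every file would get the same name and overwrite the previous one. Separately, jpeg(paste(name), ".jpg") passes ".jpg" as the second positional argument of jpeg(), which is width, and that would explain the "non-numeric argument" error. The helper show_name below is hypothetical and only mimics the name lookup in MyPlot.

```r
# Hypothetical helper that mimics the name lookup in MyPlot()
show_name <- function(df) deparse(substitute(df))

L <- list(L1 = data.frame(A = 1:4), L2 = data.frame(A = 5:8))

# Called directly, substitute() sees the expression typed by the caller
direct <- show_name(L$L1)
print(direct)       # "L$L1"

# Called through lapply(), it sees lapply's internal expression instead
via_lapply <- unlist(lapply(L, show_name), use.names = FALSE)
print(via_lapply)   # "X[[i]]" "X[[i]]"

# Iterating over the names gives one distinct file name per data frame
file_names <- paste0(names(L), ".jpg")
print(file_names)   # "L1.jpg" "L2.jpg"
```

This is why the answer passes the name in explicitly instead of trying to recover it inside the function.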

get structure of unrooted tree with tree_layout()

Is there a way to get the structure of an unrooted tree with the phyloseq::tree_layout() function?
Using tree_layout() will give you the coordinates of the nodes and segments that compose the tree plotted below. You can then easily redraw that tree.
library(ape)
library(phyloseq) # bioconductor
#
cat("(((Strix_aluco:4.2,Asio_otus:4.2):3.1,",
"Athene_noctua:7.3):6.3,Tyto_alba:13.5);",
file = "ex.tre", sep = "\n")
tree.owls <- read.tree("ex.tre")
par(mfrow=c(2,1))
#original tree
plot.phylo(tree.owls,type = 'p',main = 'plot.phylo')
#redraw structure with tree_layout object
tree.ly <- tree_layout(tree.owls)
plot(1,type='n',axes=FALSE, xlim = c(0,max(tree.ly$edgeDT$xright)),ylim = c(0,max(tree.ly$edgeDT$y)),main = 'tree_layout')
segments(x0=tree.ly$edgeDT$xleft,y0=tree.ly$edgeDT$y,x1=tree.ly$edgeDT$xright,y1=tree.ly$edgeDT$y)
segments(x0=tree.ly$vertDT$x,y0=tree.ly$vertDT$vmin,x1=tree.ly$vertDT$x,y1=tree.ly$vertDT$vmax)
But what if you want to redraw this tree: plot.phylo(tree.owls,type = 'u'). How would you do it?
library(ape)
library(phyloseq)
cat(
"(((Strix_aluco:4.2,Asio_otus:4.2):3.1,",
"Athene_noctua:7.3):6.3,Tyto_alba:13.5);",
file = "ex.tre",
sep = "\n"
)
tree.owls <- read.tree("ex.tre")
tree_layout() works with the phylo object, not with the plotted phylogeny. So calling tree_layout() after plot.phylo(type='p') vs. plot.phylo(type='u') will result in the same object.
plot.phylo(tree.owls, type = 'p')
a <- tree_layout(tree.owls)
plot.phylo(tree.owls, type = 'u')
b <- tree_layout(tree.owls)
identical(a, b)
[1] TRUE
To get the coordinates of an already plotted phylogeny, we can use last_plot.phylo like so:
plot.phylo(tree.owls, type = 'p')
coords_phylogram <- get("last_plot.phylo",envir=.PlotPhyloEnv)
plot.phylo(tree.owls, type = 'u')
coords_unrooted <- get("last_plot.phylo",envir=.PlotPhyloEnv)
identical(coords_phylogram, coords_unrooted)
[1] FALSE
We can see that the coordinates are now different between the phylogram and unrooted network.
The output of get("last_plot.phylo",envir=.PlotPhyloEnv) is a named list, so we can easily extract a particular element. For example, if we wanted to add some annotations at the coordinates of nodes and tips, we could do the following:
points(x = coords_unrooted$xx, y=coords_unrooted$yy, pch=21, bg="yellow", cex=3)
EDIT: create a layout data frame (like tree_layout) and plot an unrooted phylogeny
Preliminaries. phyloseq is no longer required as we are extracting the layout by hand.
library(ape)
library(dplyr)
tr <- "(((Strix_aluco:4.2,Asio_otus:4.2):3.1,Athene_noctua:7.3):6.3,Tyto_alba:13.5);"
tree.owls <- read.tree(text = tr)
A function to create tree_layout()'s edgeDT and vertDT. To draw segments that are not only horizontal or vertical (like you would have in a cladogram or phylogram), we need another column for the y0 coordinates as these are now different from the y coordinates. This way we can plot slanted phylograms and unrooted trees.
Important note: Edge lengths as calculated by this function and stored in edgeDT are incorrect (in some cases)! This is because they are calculated by subtracting coordinates, and this obviously doesn't work if edges were plotted at an angle (as in a slanted phylogram or unrooted network). This does not matter for plotting, as the plotting function does not use edge.lengths. But keep it in mind.
layout_from_plot <- function(tree, type, drop_root=FALSE, use_vert=FALSE) {
plot.phylo(x = tree, type = type)
title("Plot to get coordinates")
coords <- get("last_plot.phylo",envir=.PlotPhyloEnv)
edgeDT <- tibble(
xright = coords$xx[coords$edge[,2]],
xleft = coords$xx[coords$edge[,1]],
y = coords$yy[coords$edge[,2]],
edge.length = xright-xleft,
V1 = coords$edge[,1],
V2 = coords$edge[,2]
)
edgeDT <- edgeDT %>%
arrange(V2) %>%
mutate(OTU = c(tree$tip.label, rep(NA_character_, coords$Nnode - 1))) %>%
select(V1, V2, edge.length, OTU, xleft, xright, y)
if (!use_vert) {
edgeDT <- mutate(edgeDT, y0=y[V1-1])
} else {
edgeDT <- mutate(edgeDT, y0=y)
}
# this is a bit hacky, but it does work.
# double check that OTUs are in the right positions
# and branch lengths are correct.
# ideally, you would unroot the tree, then plot the network
# to extract coordinates. See below.
if (is.rooted(phy = tree)) {
if (drop_root) {
edgeDT <- edgeDT %>%
group_by(y0) %>%
mutate(xleft = ifelse(y == y0, xright, xleft)) %>%
mutate(xright = ifelse(y == y0, lead(xright), xright)) %>%
mutate(y = ifelse(y == y0, lead(y), y)) %>%
distinct(y, y0, .keep_all = TRUE)
}
}
vertDT <- edgeDT %>%
group_by(V1) %>%
mutate(vmin=min(y), vmax=max(y)) %>%
mutate(x=xleft[which(y==min(y))]) %>%
select(V1, x, vmin, vmax) %>%
distinct()
return(list("edgeDT"=edgeDT, "vertDT"=vertDT))
}
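The edge-length caveat noted before the function can be illustrated with a toy table. This is a minimal sketch with made-up coordinates (not taken from the owl tree), assuming the edgeDT column names used here. It compares the horizontal difference stored in edge.length with the straight-line length of each segment. Whether the straight-line length equals the true branch length depends on how the plot type places nodes, so treat it as a check rather than a replacement.

```r
# Toy edge table with hypothetical coordinates, using edgeDT-style column names
edges <- data.frame(
  xleft  = c(0, 0),
  xright = c(3, 4),
  y0     = c(0, 0),
  y      = c(4, 0)
)

# What layout_from_plot() stores: horizontal difference only
edges$len_horizontal <- edges$xright - edges$xleft

# Straight-line length of the drawn segment
edges$len_segment <- sqrt((edges$xright - edges$xleft)^2 +
                          (edges$y - edges$y0)^2)

print(edges)
```

The first edge is slanted, so the two measures differ (3 versus 5). The second edge is horizontal, so they agree (4 and 4).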
A function to plot different types of trees based on the extracted tree layout.
plot_from_layout <- function(tree_ly, plot_vert=FALSE) {
plot(
1,
type = 'n',
axes = TRUE,
xlim = c(0, max(tree_ly$edgeDT$xright)),
ylim = c(0, max(c(tree_ly$edgeDT$y, tree_ly$edgeDT$y0))),
main = 'tree_layout'
)
segments(
x0 = tree_ly$edgeDT$xleft,
y0 = tree_ly$edgeDT$y0,
x1 = tree_ly$edgeDT$xright,
y1 = tree_ly$edgeDT$y
)
if (plot_vert) {
segments(
x0 = tree_ly$vertDT$x,
y0 = tree_ly$vertDT$vmin,
x1 = tree_ly$vertDT$x,
y1 = tree_ly$vertDT$vmax
)
}
}
Testing.
par(mfrow=c(1,2))
layout_from_plot(
tree = tree.owls,
type = "p",
drop_root = FALSE,
use_vert = TRUE
) %>%
plot_from_layout(tree_ly = ., plot_vert = TRUE)
layout_from_plot(
tree = tree.owls,
type = "c",
drop_root = FALSE,
use_vert = FALSE
) %>%
plot_from_layout(tree_ly = ., plot_vert = FALSE)
layout_from_plot(
tree = tree.owls,
type = "u",
drop_root = TRUE,
use_vert = FALSE
) %>%
plot_from_layout(tree_ly = ., plot_vert = FALSE)
The preferred way, instead of removing edges by hand (as above), would be to unroot the tree first, then plot it. For this to work, you would need to modify the way coordinates are extracted. The main question is how the root node splits the edges and to which branch to add the leftover after removing the root node. By default it appears that this extra edge is added to the one internal edge of the network, but this does not seem right given the phylogram and branch lengths there. See below.
owls.unrooted <- unroot(tree.owls)
layout_from_plot(
tree = owls.unrooted,
type = "u",
drop_root = FALSE,
use_vert = FALSE
) %>%
plot_from_layout(tree_ly = ., plot_vert = FALSE)
EDIT 2: Update the layout_from_plot function to handle the y0 coordinates correctly and use unrooted trees instead of manually removing edges.
I realized I was over-complicating the creation of the y0 column. All that is needed is to take the yy coordinates indexed by the first column of edge in the output of get("last_plot.phylo",envir=.PlotPhyloEnv). Now things work fine with rooted and unrooted trees.
Updated function:
layout_from_plot <- function(tree, type, drop_root=FALSE, use_vert=FALSE) {
if (drop_root) {
tree <- unroot(tree)
}
plot.phylo(x = tree, type = type)
title("Plot to get coordinates")
coords <- get("last_plot.phylo",envir=.PlotPhyloEnv)
edgeDT <- tibble(
xright = coords$xx[coords$edge[,2]],
xleft = coords$xx[coords$edge[,1]],
y = coords$yy[coords$edge[,2]],
y0 = coords$yy[coords$edge[,1]],
edge.length = xright-xleft,
V1 = coords$edge[,1],
V2 = coords$edge[,2]
)
edgeDT <- edgeDT %>%
arrange(V2) %>%
mutate(OTU = c(tree$tip.label, rep(NA_character_, coords$Nnode - 1))) %>%
select(V1, V2, edge.length, OTU, xleft, xright, y, y0)
if (use_vert) {
edgeDT <- mutate(edgeDT, y0=y)
}
vertDT <- edgeDT %>%
group_by(V1) %>%
mutate(vmin=min(y), vmax=max(y)) %>%
mutate(x=xleft[which(y==min(y))]) %>%
select(V1, x, vmin, vmax) %>%
distinct()
return(list("edgeDT"=edgeDT, "vertDT"=vertDT))
}
Additional test code:
data(bird.orders) # example tree shipped with ape
layout_from_plot(tree = bird.orders, type = "c", use_vert = FALSE, drop_root = TRUE) %>%
plot_from_layout(tree_ly = ., plot_vert = FALSE)
layout_from_plot(tree = bird.orders, type = "u", drop_root = TRUE, use_vert = FALSE) %>%
plot_from_layout(tree_ly = ., plot_vert = FALSE)

What is the function of !! in front of an argument inside a function?

I am reviewing code that was given to me as part of a homework assignment. I am familiar with functions in R, but this person is using !! before the 'y' argument inside the function, which I had never seen before. What does !! do here?
I have tried Google and the help function in RStudio.
my_table <- function(df = NULL, y = NULL, grp = NULL,
                     grpname = " ", dgts = 1, tbltitle = " ") {
  y <- enquo(y)
  grp <- enquo(grp)
  mysummary <- df %>% group_by(!!grp) %>% summarize(
    n = sum(!is.na(!!y)),
    mean = format(round(mean(!!y, na.rm=T), dgts), nsmall=dgts),
    sd = format(round(sd(!!y, na.rm=T), dgts), nsmall=dgts),
    se = format(round(sd(!!y, na.rm=T)/sqrt(sum(!is.na(!!y))), dgts),
                nsmall=dgts))
  kable(mysummary)
}
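As a rough illustration of what the code above is doing: enquo() captures the expression the caller typed for an argument (for example a bare column name), and !! ("bang-bang") inserts that captured expression back into a dplyr verb so it is evaluated as a column of the data. The sketch below uses a hypothetical mean_by() function and the built-in mtcars data. The second version uses the {{ }} shorthand, which does the capture and unquote in one step.

```r
library(dplyr)

# Hypothetical function: y and grp are bare column names supplied by the caller
mean_by <- function(df, y, grp) {
  y <- enquo(y)      # capture the expression, e.g. mpg
  grp <- enquo(grp)  # capture the expression, e.g. cyl
  df %>%
    group_by(!!grp) %>%                       # unquote: group by that column
    summarize(mean = mean(!!y, na.rm = TRUE)) # unquote: average that column
}

# Same behaviour with the {{ }} shorthand
mean_by2 <- function(df, y, grp) {
  df %>%
    group_by({{ grp }}) %>%
    summarize(mean = mean({{ y }}, na.rm = TRUE))
}

res  <- mean_by(mtcars, mpg, cyl)
res2 <- mean_by2(mtcars, mpg, cyl)
print(res)
```

Without !!, group_by(grp) would look for a column literally named grp instead of the column the caller passed in.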

rgenoud - How to pass parameters to the function?

I have a function that currently plays nice with rgenoud. It has one parameter (xx) and rgenoud will optimize xx perfectly.
However, I would like to add a second parameter to my function that wouldn't be optimized by rgenoud. For example, I would like my function to fit a model with either a Gaussian link or a Poisson link, and to specify which one when I call rgenoud.
Any idea?
thanks
edit: here is a minimal working example of what I mean. How would you get the last line to work?
Adstock reflects the fact that TV advertising should have an impact on the number of quotes in future weeks.
Adstock[t] = Ads[t] + rate* Ads[t-1] + rate^2*Ads[t-2] + .... + rate^max_memory * Ads[t-max_memory]
We want rgenoud to figure out what rate and max_memory will return the model with the best fit. Best fit is defined as the lowest RMSE.
set.seed(107)
library(fpp)
library(dplyr) # for %>%, mutate() and filter()
library(rgenoud)
adstock_k <- function(x, adstock_rate = 0, max_memory = 12){
learn_rates <- rep(adstock_rate, max_memory+1) ^ c(0:max_memory)
adstocked_advertising <- stats::filter(c(rep(0, max_memory), x), learn_rates, method="convolution")
adstocked_advertising <- adstocked_advertising[!is.na(adstocked_advertising)]
return(as.numeric(adstocked_advertising))
}
getRMSE <- function(x, y) {
mean((x-y)^2) %>% sqrt
}
df <- data.frame(insurance) %>%
mutate(Quotes = round (Quotes*1000, digits = 0 ))
df$idu <- as.numeric(rownames(df))
my_f <- function(xx){
adstock_rate <- xx[1]
adstock_memory <- xx[2]
df.temp <- df %>%
mutate(adstock = adstock_k(TV.advert, adstock_rate/100, adstock_memory ))
mod <- lm(data=df.temp, Quotes ~ adstock )
getRMSE( df.temp$Quotes, predict(mod))
}
domaine <- cbind(c(30,1), c(85, 8))
#this works
min_f <- genoud(my_f, nvars = 2, max = F, pop.size=1000, wait.generations=10, Domains = domaine, data.type.int = T)
#here I try to add a second parameter to the function.
my_f2 <- function(xx,first_n_weeks=20){
adstock_rate <- xx[1]
adstock_memory <- xx[2]
df.temp <- df %>%
filter(idu<= first_n_weeks) %>%
mutate(adstock = adstock_k(TV.advert, adstock_rate/100, adstock_memory ))
mod <- lm(data=df.temp, Quotes ~ adstock )
getRMSE( df.temp$Quotes, predict(mod))
}
#this doesnt work
min_f2 <- genoud(my_f2(first_n_week=10), nvars = 2, max = F, pop.size=1000, wait.generations=10, Domains = domaine, data.type.int = T)
genoud() forwards any further named arguments to the function being optimized, so include the argument in the call to genoud, e.g.
genoud(my_f2, nvars = 2, max = F, pop.size=1000, wait.generations=10, Domains = domaine, data.type.int = T, first_n_weeks = 10)
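The same forwarding pattern can be tried without rgenoud. The sketch below uses base R's optim() as a stand-in (an assumption made only so the example runs anywhere), with a hypothetical objective my_obj whose extra argument target is fixed rather than optimized. It also shows a closure, which is an alternative when an optimizer does not forward extra arguments.

```r
# Hypothetical objective: xx is optimized, target is a fixed extra argument
my_obj <- function(xx, target = 0) sum((xx - target)^2)

# Option 1: pass the extra argument by name; optim() forwards it to fn
fit_dots <- optim(par = c(0, 0), fn = my_obj, target = 3, method = "BFGS")

# Option 2: wrap the objective in a closure that fixes the extra argument
fit_closure <- optim(par = c(0, 0),
                     fn = function(xx) my_obj(xx, target = 3),
                     method = "BFGS")

print(fit_dots$par)
print(fit_closure$par)
```

Both calls should converge to values close to 3 for each parameter. Calling my_f2(first_n_weeks = 10) directly, as in the question, evaluates the function immediately instead of handing the function itself to the optimizer.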

randomize observations by groups (blocks) without replacement

This is a follow-up question. The answers to the previous question do the random sampling with replacement. How can I change the code so that I assign each observation to one of J "urns" without putting the observation back in the 'lottery'?
This is the code I have right now:
set.seed(9782)
I <- 500
g <- 10
library(dplyr)
anon_id <- function(n = 1, length = 12) {
  randomString <- c(1:n)
  for (i in 1:n)
  {
    randomString[i] <- paste(sample(c(0:9, letters, LETTERS),
                                    length, replace = TRUE),
                             collapse = "")
  }
  return(randomString)
}
df <- data.frame(id = anon_id(n = I, length = 16),
                 group = sample(1:g, I, T))
J <- 3
p <- c(0.25, 0.5, 0.25)
randomize <- function(data, urns=2, block_id = NULL, p=NULL, seed=9782) {
if(is.null(p)) p <- rep(1/urns, urns)
if(is.null(block_id)){
df1 <- data %>%
mutate(Treatment = sample(x = c(1:urns),
size = n(),
replace = T,
prob = p))
return(df1)
}else{
df1 <- data %>% group_by_(block_id) %>%
mutate(Treatment = sample(x = c(1:urns),
size = n(),
replace = T,
prob = p))
}
}
df1 <- randomize(data = df, urns = J, block_id = "group", p = p, seed = 9782)
If I change replace = T to replace = F I get the following error:
Error: cannot take a sample larger than the population when 'replace = FALSE'
Clarification of my objective:
Suppose that I have 10 classrooms (or villages, or something like that). To keep it simple, suppose each classroom has 20 students (in reality they will have N_j). Classroom by classroom, I want to assign each student to one of J groups, for example J=3. p gives the fraction that will be assigned to each group, for example 25% to group 1, 40% to group 2 and 35% to group 3.
This solution is based on @Frank's comment. I created one function that does the randomization for block j and another that calls that function for every block.
randomize_block <- function(data, block=NULL, block_name=NULL, urns, p, seed=9782) {
  set.seed(seed)
  if(!is.null(block)) {
    # keep only the rows of the current block (filter_() is deprecated in dplyr)
    df <- data[data[[block_name]] == block, ]
  } else df <- data
  if(is.null(p)) p <- rep(1/urns, urns)
  N <- nrow(df)
  Np <- round(N*p,0)
  # push any rounding remainder into the first urn so the counts sum to N
  if(sum(Np)!=N) Np[1] <- N - sum(Np[2:length(Np)])
  Urns = rep(seq_along(p), Np)
  Urns = sample(Urns)
  df$urn <- Urns
  return(df)
}
randomize <- function(data, block_name=NULL, urns, p, seed=9782) {
  if(is.null(p)) p <- rep(1/urns, urns)
  if(!is.null(block_name)){
    blocks <- unique(data[,block_name])
    df <- lapply(blocks, randomize_block,
                 data = data,
                 block_name=block_name,
                 urns = urns,
                 p = p,
                 seed=seed)
    return(data.table::rbindlist(df))
  }else {
    df <- randomize_block(data = data,
                          urns = urns, p = p,
                          seed=seed)
    return(df)
  }
}
test <- randomize(data = df, block_name = "group",
urns = 3, p = c(0.25, 0.5, 0.25),
seed=4222016)
I'm trying to figure out if it is possible to use dplyr to do this, alternative solutions implementing that are more than welcome!
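The error with replace = FALSE arises because sample(1:urns, size = n()) asks for more draws than there are urn labels. A minimal sketch of the alternative idea, in base R with a hypothetical helper assign_urns: first build a vector with the right number of copies of each label, then shuffle it, which is sampling without replacement from that vector. The toy data (two blocks of 20) is made up for illustration.

```r
# Hypothetical helper: shuffle a vector that holds round(n * p) copies of each urn label
assign_urns <- function(n, p) {
  counts <- round(n * p)
  counts[1] <- n - sum(counts[-1])   # put any rounding remainder in urn 1
  urns <- rep(seq_along(p), times = counts)
  urns[sample.int(length(urns))]     # random permutation, no replacement
}

set.seed(1)
toy <- data.frame(id = 1:40, group = rep(1:2, each = 20))
p <- c(0.25, 0.5, 0.25)

# Apply the helper separately within each block
toy$urn <- ave(toy$id, toy$group,
               FUN = function(ids) assign_urns(length(ids), p))

counts_table <- table(toy$group, toy$urn)
print(counts_table)
```

Each block of 20 ends up with exactly 5, 10 and 5 observations in urns 1 to 3. With dplyr, the same helper could be used as group_by(group) %>% mutate(urn = assign_urns(n(), p)).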
My answer to your other question is without replacement, as can be seen below:
# as.tibble() is deprecated; as_tibble() is the current name
block_rand <- tibble::as_tibble(randomizr::block_ra(blocks = df$group, conditions = c("urn_1","urn_2","urn_3")))
df2 <- tibble::as_tibble(bind_cols(df, block_rand))
df2 %>% janitor::tabyl(group, value)
df2 %>%
group_by(id) %>%
filter(n()>1) %>%
str()
