I don't know how to order the columns in the tidyHtmlTable() function from the htmlTable package.
The documentation says:
Columns are sorted by arrange(cgroup,header) where cgroup will be expanded to the columns
of the cgroup argument, e.g. cgroup = c(a, b), header = c will become arrange(a,b,c). If you
want to sort in non-alphabetic order you can provide a factor variable and that information will be retained.
How can I provide such a factor variable? For example, take the example from the documentation:
library(magrittr)
library(tidyr)
library(dplyr)
library(htmlTable)
library(tibble)
td <- mtcars %>%
as_tibble(rownames = "rnames") %>%
pivot_longer(names_to = "per_metric",
cols = c(hp, mpg, qsec))
tidy_summary <- td %>%
group_by(cyl, gear, per_metric) %>%
summarise(Mean = round(mean(value), 1),
SD = round(sd(value), 1),
Min = round(min(value), 1),
Max = round(max(value), 1),
.groups = 'drop') %>%
pivot_longer(names_to = "summary_stat",
cols = c(Mean, SD, Min, Max)) %>%
ungroup() %>%
mutate(gear = paste(gear, "Gears"),
cyl = paste(cyl, "Cylinders"))
tidy_summary %>%
arrange(per_metric, summary_stat) %>%
addHtmlTableStyle(align = "r") %>%
tidyHtmlTable(header = gear,
cgroup = cyl,
rnames = summary_stat,
rgroup = per_metric)
Suppose I want to keep the order of rgroup and rnames, but put the columns in a non-alphabetical order, like this:
4 Gears | 3 Gears | 5 Gears
How can I do that? Thank you.
We can make the change before calling tidyHtmlTable(). How it works:
convert gear to a factor inside mutate()
set the order of its levels with fct_relevel() from the forcats package (part of the tidyverse)
library(forcats)
tidy_summary %>%
arrange(per_metric, summary_stat) %>%
addHtmlTableStyle(align = "r") %>%
mutate(gear = as.factor(gear),
gear = fct_relevel(gear, "4 Gears", "3 Gears", "5 Gears")) %>%
tidyHtmlTable(header = gear,
cgroup = cyl,
rnames = summary_stat,
rgroup = per_metric)
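This works because the columns are sorted with arrange() (per the documentation quoted in the question), and arrange() orders a factor by its levels rather than alphabetically. Below is a minimal base R sketch of that behaviour. The `gears` vector is made-up data, and base `factor(levels = ...)` stands in for fct_relevel(), which sets the same level order.

```r
# Made-up gear labels in no particular order
gears <- c("5 Gears", "4 Gears", "3 Gears", "4 Gears")

# A character vector sorts alphabetically
sorted_chr <- sort(gears)

# A factor sorts by the order of its levels; factor(levels = ...)
# gives the same level order that fct_relevel() would
gear_f <- factor(gears, levels = c("4 Gears", "3 Gears", "5 Gears"))
sorted_fct <- sort(gear_f)

print(sorted_chr)
print(sorted_fct)
```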
I have the following function with which I want to create summary statistics (for two data sets simultaneously):
mean.k <- function(x) round(mean(x, na.rm = TRUE), digits = 3)
sd.k <- function(x) round(sd(x, na.rm = TRUE), digits = 3)
sumstats <- function(x, y) {
  sumtable <- cbind(as.matrix(colSums(!is.na(x))), sapply(x, mean.k),
                    paste("(", sapply(x, sd.k), ")", sep = ""),
                    as.matrix(colSums(!is.na(y))), sapply(y, mean.k),
                    paste("(", sapply(y, sd.k), ")", sep = ""))
  sumtable <- as.data.frame(sumtable)
  names(sumtable) <- c("Obs", "Mean", "Std.Dev", "Obs", "Mean", "Std.Dev")
  sumtable
}
Applied to two subsets of iris, it is called like this:
data(iris)
library(dplyr)
iris_1 <- iris %>% filter(Species == "setosa") %>% select(Sepal.Length, Sepal.Width, Petal.Length, Petal.Width) # data set 1
iris_2 <- iris %>% filter(Species == "versicolor") %>% select(Sepal.Length, Sepal.Width, Petal.Length, Petal.Width) # data set 2
iris_desk_stats <- sumstats(iris_1, iris_2)
However, I would like the result to differ in two ways:
Each standard deviation should be placed under the respective mean instead of in a separate column next to the means.
Obs should not be a proper column, but a row at the top in order to avoid redundancy.
Does anyone have an idea? The first of the two changes is the more important one.
I tried to approach this "directly", and here's what I came up with:
library(dplyr)
library(tidyr)
iris %>%
as_tibble() %>%
summarise(
across(!where(is.factor), list(
mean = . %>% mean(na.rm = TRUE),
sd = . %>% sd(na.rm = TRUE)
)),
Obs = n()
) %>%
pivot_longer(
# c(contains(c("mean", "sd")), "Obs"),
everything(),
names_to = c("variable", "metric"),
values_to = "stats",
names_sep = "_") %>%
# Obs has no "_" in its name, so its metric is NA; label it "identity"
tidyr::replace_na(list(metric = "identity")) %>%
pivot_wider(names_from = metric,
values_from = stats) %>%
select(variable, everything()) %>%
mutate(entries = glue::glue("{mean}<br>({sd})",
mean = round(mean, 3),
sd = round(sd, 3)),
entries = if_else(!is.na(identity),
glue::glue("{identity}"),
entries)) %>%
select(-c("mean", "sd", "identity")) %>%
arrange(variable) %>%
gt::gt() %>%
gt::fmt_markdown(entries) %>%
identity()
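If gt is not an option, the layout the asker wants can also be sketched in base R for the two data sets: an Obs row on top, then each mean with its standard deviation in parentheses on the row below. This is only a sketch. `stack_stats()` is a hypothetical helper, and Obs is assumed to equal the number of rows. That assumption is fine here because iris has no missing values. The original function instead counted non-missing values per column.

```r
# Two data sets as in the question: setosa vs. versicolor measurements
set1 <- iris[iris$Species == "setosa", 1:4]
set2 <- iris[iris$Species == "versicolor", 1:4]

# Hypothetical helper: one character column holding the Obs count,
# then each variable's mean followed by its "(sd)" on the next row
stack_stats <- function(d, digits = 3) {
  fmt <- function(v) formatC(v, format = "f", digits = digits)
  means <- vapply(d, function(v) mean(v, na.rm = TRUE), numeric(1))
  sds <- vapply(d, function(v) sd(v, na.rm = TRUE), numeric(1))
  # rbind() + as.vector() interleaves: mean1, sd1, mean2, sd2, ...
  cells <- as.vector(rbind(fmt(means), paste0("(", fmt(sds), ")")))
  c(as.character(nrow(d)), cells)  # Obs assumed to be the number of rows
}

# Row labels: variable name on the mean row, blank on the sd row
row_labels <- c("Obs", as.vector(rbind(names(set1), "")))

desc_stats <- data.frame(variable = row_labels,
                         setosa = stack_stats(set1),
                         versicolor = stack_stats(set2))
print(desc_stats, row.names = FALSE)
```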
I have a very large data set whose variable names are heavily abbreviated. It would help immensely if each variable's label (stored in attr(*, "label")) were extracted and shown in a column beside the corresponding variable. The labels are set with Hmisc:
library(Hmisc)
label(mtcars[["mpg"]]) <- "Miles/(US) gallon"
label(mtcars[["hp"]]) <- "Gross horsepower"
label(mtcars[["wt"]]) <- "Weight (1000lbs)"
My current code just gets the mean/sd from the entire data set:
mtcars %>% select(mpg, hp, wt) %>% pivot_longer(everything()) %>% group_by(name) %>% summarise(mean=mean(value, na.rm = TRUE), sd=sd(value, na.rm=TRUE))
But I want an extra column with each variable's label, so it's easier to tell them apart:
name  mean   sd     label
hp    146.7  68.6   Gross horsepower
mpg   20.1   6.03   Miles/(US) gallon
wt    3.22   0.978  Weight (1000lbs)
I found a thread that sort of gets to what I want, but if I add mutate(labels=label(mtcars)[name]) at the end of the code, I get a column with NA instead of the labels.
We can use imap_dfr() from purrr:
library(purrr)
library(dplyr)
library(Hmisc)
# .x is the whole column and .y its name; .x[[1]] would be only the first value
imap_dfr(mtcars[c('hp', 'mpg', 'wt')], ~
tibble(name = .y, mean = mean(.x, na.rm = TRUE),
sd = sd(.x, na.rm = TRUE),
label = attr(.x, 'label')))
Staying closer to the OP's method, we can also use summarise_all() and then pivot_longer():
library(tidyr)
mtcars %>%
dplyr::select(mpg, hp, wt) %>%
summarise_all(list(mean = ~mean(., na.rm = TRUE),
sd = ~sd(., na.rm = TRUE),
label = ~attr(., 'label'))) %>%
mutate(rn = 1) %>%
pivot_longer(cols = -rn, names_to = c('name', '.value'), names_sep="_") %>%
select(-rn)
# name mean sd label
#1 mpg 20.09062 6.0269481 Miles/(US) gallon
#2 hp 146.68750 68.5628685 Gross horsepower
#3 wt 3.21725 0.9784574 Weight (1000lbs)
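The labels are ordinary attributes, so base R can read them too, without purrr or Hmisc. The sketch below assumes the labels are stored under the attribute name "label", which is where Hmisc::label() puts them. They are set here with attr() so that the block runs on its own. `cars` and `get_label()` are hypothetical names.

```r
cars <- mtcars[c("mpg", "hp", "wt")]
attr(cars$mpg, "label") <- "Miles/(US) gallon"
attr(cars$hp, "label") <- "Gross horsepower"
attr(cars$wt, "label") <- "Weight (1000lbs)"

# Hypothetical helper: return the "label" attribute, or NA if there is none
get_label <- function(v) {
  lab <- attr(v, "label", exact = TRUE)
  if (is.null(lab)) NA_character_ else lab
}

label_summary <- data.frame(
  name = names(cars),
  mean = vapply(cars, mean, numeric(1), na.rm = TRUE),
  sd = vapply(cars, sd, numeric(1), na.rm = TRUE),
  label = vapply(cars, get_label, character(1)),
  row.names = NULL
)
print(label_summary)
```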
I want to create a summary table for some dichotomous variables using the expss package. Since the variables are dichotomous, one of the two levels would be sufficient to "show the picture".
I tried to use the function tab_net_cells, but was not able to get the right results. Here is some example code where BrCa (breast cancer) is coded 1 or 0. I only want to show the number of patients with breast cancer, not those without.
library(magrittr)  # for %>%
library(expss)     # also makes eq() available for tab_net_cells()
df <- data.frame(BrCa = c(1,1,1,0,0,0,NA,NA,0,0))
df$group <- c(1,2,1,2,1,2,1,2,1,2)
df %>%
expss::tab_cols(group) %>%
expss::tab_cells(BrCa) %>%
expss::tab_stat_cpct(total_row_position = "none",label = "%") %>%
expss::tab_stat_cases(total_row_position = "none",label = "N") %>%
expss::tab_pivot(stat_position = "inside_rows")
df %>%
expss::tab_cols(group) %>%
expss::tab_cells(BrCa) %>%
expss::tab_net_cells("BrCa" = eq(1)) %>%
expss::tab_stat_cpct(total_row_position = "none",label = "%") %>%
expss::tab_stat_cases(total_row_position = "none",label = "N") %>%
expss::tab_pivot(stat_position = "inside_rows")
The simplest way is to filter the resulting table:
df <- data.frame(BrCa = c(1,1,1,0,0,0,NA,NA,0,0))
df$group <- c(1,2,1,2,1,2,1,2,1,2)
df %>%
expss::tab_cols(group) %>%
expss::tab_cells(BrCa) %>%
expss::tab_stat_cpct(total_row_position = "none",label = "%") %>%
expss::tab_stat_cases(total_row_position = "none",label = "N") %>%
expss::tab_pivot(stat_position = "inside_rows") %>%
expss::where(grepl(1, row_labels))
Another way is to use mean and sum instead of cpct and cases:
df %>%
expss::tab_cols(group) %>%
expss::tab_cells(BrCa*100) %>%
expss::tab_stat_mean(label = "%") %>%
expss::tab_cells(BrCa) %>%  # back to 0/1, otherwise the sum would be the count times 100
expss::tab_stat_sum(label = "N") %>%
expss::tab_pivot(stat_position = "inside_rows")
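To check what either expss table should show, the same two numbers can be computed per group in base R. This sketch assumes that missing BrCa values are left out of the percentage base. `has_brca` and `brca_summary` are hypothetical names.

```r
df <- data.frame(BrCa = c(1, 1, 1, 0, 0, 0, NA, NA, 0, 0),
                 group = c(1, 2, 1, 2, 1, 2, 1, 2, 1, 2))

# TRUE where the patient has breast cancer, NA where BrCa is missing
has_brca <- df$BrCa == 1

# Per group: number of patients with BrCa == 1, and their share of non-missing cases
n_pos <- tapply(has_brca, df$group, sum, na.rm = TRUE)
pct_pos <- tapply(has_brca, df$group, mean, na.rm = TRUE) * 100

brca_summary <- rbind("%" = pct_pos, "N" = n_pos)
print(brca_summary)
```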
I can use purrr::map() to get the mean of every column in a dataframe. Can I use any of the map functions in combination with count() to get counts for each categorical variable in a dataframe?
library(dplyr)
library(purrr)
mtcars %>% map(mean)
mtcars %>% mutate(am = factor(am, labels = c("auto", "manual")),
vs = factor(vs, labels = c("V", "S"))) %>% select_if(is.factor) %>%
map(count)
Almost there! count() expects a data frame, so wrap each column in one and name the column to count:
mtcars %>%
mutate(
am = factor(am, labels = c("auto", "manual")),
vs = factor(vs, labels = c("V", "S"))
) %>%
select_if(is.factor) %>%
map(~count(data.frame(x = .x), x))
You can also use table() instead of count():
mtcars %>%
mutate(
am = factor(am, labels = c("auto", "manual")),
vs = factor(vs, labels = c("V", "S"))
) %>%
select_if(is.factor) %>%
map(table)
#$vs
#V S
#18 14
#$am
#auto manual
#19 13
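Without purrr, the same per-column counts can be sketched with base lapply() and Filter(). This is an alternative rather than the approach from either answer. `cars` and `factor_counts` are hypothetical names.

```r
cars <- transform(mtcars,
                  am = factor(am, labels = c("auto", "manual")),
                  vs = factor(vs, labels = c("V", "S")))

# Keep only the factor columns, then tabulate each one into a small data frame
factor_counts <- lapply(Filter(is.factor, cars),
                        function(col) as.data.frame(table(level = col)))
print(factor_counts)
```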
I am trying to use purrr::pmap() to apply a custom function row-wise over the rows of a data frame. I can achieve my desired end result with a for-loop and with apply(), but when I try to use pmap() I can only get the result I want in combination with mutate(), which in my real-life applied case will be insufficient.
Is there a way to use pmap() to apply my custom function and just have the output print rather than be stored in a new column?
library(dplyr)
library(purrr)
library(tibble)
Create demo data & custom function
set.seed(57)
ds_mt <-
mtcars %>%
rownames_to_column("model") %>%
mutate(
am = factor(am, labels = c("auto", "manual")),
vs = factor(vs, labels = c("V", "S"))
) %>%
select(model, mpg, wt, cyl, am, vs) %>%
sample_n(3)
foo <- function(model, am, mpg){
print(
paste("The", model, "has a", am, "transmission and gets", mpg, "mpgs.")
)
}
Successful example of rowwise for-loop:
for (row in 1:nrow(ds_mt)) {
foo(
model = ds_mt[row, "model"],
am = ds_mt[row, "am"],
mpg = ds_mt[row, "mpg"]
)
}
Successful example using apply():
row.names(ds_mt) <- NULL # to avoid named vector as output
apply(
ds_mt,
MARGIN = 1,
FUN = function(ds)
foo(
model = ds["model"],
am = ds["am"],
mpg = ds["mpg"]
)
)
Example using pmap() within mutate() that is almost what I need.
ds_mt %>%
mutate(new_var =
pmap(
.l =
list(
model = model,
am = am,
mpg = mpg
),
.f = foo
))
FAILING CODE: Why doesn't this work?
ds_mt %>%
pmap(
.l =
list(
model = model,
am = am,
mpg = mpg
),
.f = foo
)
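So after some more reading, it seems this is a case for pwalk() rather than pmap(), because I want output printed (a side effect) rather than stored in a data frame. The failing pmap() call errors because, outside mutate(), `model`, `am` and `mpg` are looked up in the calling environment rather than inside ds_mt, where they do not exist. The piped ds_mt also ends up in pmap()'s `...` instead of `.l`. Passing the data frame itself as `.l` avoids both problems, as long as its column names match foo()'s arguments: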
library(dplyr)
library(purrr)
library(tibble)
set.seed(57)
ds_mt <-
mtcars %>%
rownames_to_column("model") %>%
mutate(
am = factor(am, labels = c("auto", "manual")),
vs = factor(vs, labels = c("V", "S"))
) %>%
select(model, mpg, wt, cyl, am, vs) %>%
sample_n(3)
foo <- function(model, am, mpg){
print(
paste("The", model, "has a", am, "transmission and gets", mpg, "mpgs.")
)
}
ds_mt %>%
select(model, am, mpg) %>%
pwalk(
.l = .,
.f = foo
)
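For comparison, base R's Map() performs the same row-by-row call with arguments matched by name. The sketch below reuses foo() from the question unchanged. The two-row `ds` data frame is made up so that the result is reproducible, unlike the sampled ds_mt, and `am` is stored as character for simplicity. Because print() returns its argument invisibly, foo() also returns each sentence. Wrapping the call in invisible() would therefore keep only the printed side effect, much like pwalk().

```r
foo <- function(model, am, mpg){
  print(
    paste("The", model, "has a", am, "transmission and gets", mpg, "mpgs.")
  )
}

ds <- data.frame(model = c("Mazda RX4", "Valiant"),
                 am = c("manual", "auto"),
                 mpg = c(21, 18.1))

# do.call() passes each column as a named argument, so Map() calls
# foo(model = ..., am = ..., mpg = ...) once per row
msgs <- do.call(Map, c(list(f = foo), as.list(ds)))
```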