Unnest multiple nested dataframes in R

I am looking for a way to unnest a tibble whose list-columns contain data frames. The challenge is that I have to call unnest_wider() first and unnest() next, and, as far as I can tell, both of them take only one column at a time. I have more than 30 such columns, each containing a nested data frame.
Here is a reproducible example that resembles my data frame.
df <- tibble(
character = c("Toothless", "Dory"),
metadata = list(
list(
species = "dragon",
color = "black",
films = c(
"How to Train Your Dragon",
"How to Train Your Dragon 2",
"How to Train Your Dragon: The Hidden World"
)
),
list(
species = "blue tang",
color = "blue",
films = c("Finding Nemo", "Finding Dory")
)
)
)
df$mtcars <- list(mtcars)
df$iris <- list(iris)
df$longley <- list(longley)
My current working approach:
df1 <- df %>%
unnest_wider(mtcars, names_sep = "_") %>%
unnest_wider(iris, names_sep = "_") %>%
unnest_wider(longley, names_sep = "_")
df2 <- df1 %>%
unnest(mtcars_mpg) %>%
unnest(mtcars_cyl) %>%
unnest(mtcars_disp)
Any advice to do it efficiently will be highly appreciated.
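One possible way to shorten the chain, sketched under the assumption that tidyr 1.2.0 or later is installed (from that release on, unnest_wider() accepts several columns through tidyselect). The toy tibble and the names nested_cols, wide and long_x are made up for illustration and are not part of the question's data.

```r
library(tidyr)
library(dplyr)

# Toy stand-in for the question's data: two list-columns of data frames
df <- tibble(
  id = c("a", "b"),
  x  = list(data.frame(p = 1:2, q = 3:4)),
  y  = list(data.frame(r = 5:7))
)

# Names of the nested columns (with 30+ columns this could come from a
# tidyselect helper such as where(is.list) instead of a hand-written vector)
nested_cols <- c("x", "y")

# Step 1: widen all nested columns in a single call (assumes tidyr >= 1.2.0)
wide <- df %>%
  unnest_wider(all_of(nested_cols), names_sep = "_")

# Step 2: unnest the columns that came from the same data frame together,
# so they are expanded in parallel rather than one after another
long_x <- wide %>%
  unnest(cols = starts_with("x_"))
```

Columns that came from the same data frame have equal lengths, so they can be unnested in one unnest() call. Columns from data frames with different row counts (such as mtcars and iris) cannot share a call and still need separate steps.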

Related

R: How to create a Drilldown Highchart using loops

While working on a project I ran into a problem that I don't know how to solve.
I have a data frame that has 2 columns:
date
value
And it has a total of 1303 rows.
For each year there are 12 values (one per month), except for the last year, which has only 7.
The task is to create a 'drilldown' style chart using the 'highcharter' library. The problem is that I don't know how to do it efficiently.
The solution that comes to mind is not very efficient. I show it below so you can see what I mean.
# Load packages
library(tidyverse)
library(highcharter)
library(lubridate)
# Load dataset
df <- read.csv('example.csv')
# Prepare df to use
dfDD <- tibble(name = year(df$date),
y = round(df$value, digits = 2),
drilldown = name)
# Create a data frame to use in 'drilldown' (for each year)
df1913 <- df %>%
filter(year(date) == 1913) %>%
data.frame()
df1914 <- df %>%
filter(year(date) == 1914) %>%
data.frame()
# Create a drilldown chart using Highcharter library
highchart() %>%
hc_chart(type = "column") %>%
hc_title(text = "Example Drilldown") %>%
hc_xAxis(type = "category") %>%
hc_legend(enabled = FALSE) %>%
hc_plotOptions(series = list(borderWidth = 2,
dataLabels = list(enabled = TRUE))) %>%
hc_add_series(data = dfDD,
name = "Mean",
colorByPoint = TRUE) %>%
hc_drilldown(allowPointDrilldown = TRUE,
series = list(list(id = 1913,
data = list_parse2(df1913)),
list(id = 1914,
data = list_parse2(df1914))))
Looking at my solution, I realized that to complete the chart I would have to create a subset of values for each year. I then tried to find a more efficient solution using a for loop, but so far I can't get it to work.
Is there a more efficient way to create this chart using a loop?
If it can be done in another way than using loops, I would also like to know.
Thank you for reading my question and I hope I explained myself well.
Using split and purrr::imap you could split your data by year and loop over the resulting list to convert your data to the nested list object required by hc_drilldown. Note: It's important to make the id numeric and to pass an unnamed list.
library(tidyverse)
library(highcharter)
library(lubridate)
series <- split(df, year(df$date)) %>%
purrr::imap(function(x, y) list(id = as.numeric(y), data = list_parse2(x)))
# Unname list
names(series) <- NULL
highchart() %>%
hc_chart(type = "column") %>%
hc_title(text = "Example Drilldown") %>%
hc_xAxis(type = "category") %>%
hc_legend(enabled = FALSE) %>%
hc_plotOptions(series = list(borderWidth = 2,
dataLabels = list(enabled = TRUE))) %>%
hc_add_series(data = dfDD,
name = "Mean",
colorByPoint = TRUE) %>%
hc_drilldown(allowPointDrilldown = TRUE,
series = series)
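To see what the split and imap step produces, here is a minimal sketch on made-up data. It leaves highcharter out: the toy data frame is an assumption, and a plain numeric vector stands in for the output of list_parse2().

```r
library(purrr)

# Toy data (assumed shape: one date column and one value column)
toy <- data.frame(
  date  = as.Date(c("1913-01-01", "1913-02-01", "1914-01-01")),
  value = c(1.5, 2.5, 3.5)
)

# split() returns a list named by year ("1913", "1914")
by_year <- split(toy, format(toy$date, "%Y"))

# imap() passes each element (x) together with its name (y)
series <- imap(by_year, function(x, y) {
  list(id = as.numeric(y), data = x$value)  # x$value stands in for list_parse2(x)
})

# Drop the names so the result is an unnamed list
series <- unname(series)

str(series)
```

Each element carries a numeric id and the data for one year, which is the shape the answer above passes to hc_drilldown(series = ...).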

Multi-Label-Classification Approach to complex tabular data structure in R

I have a set of two dataframes that both contain duplicates but need to be merged into one big dataframe in order to use it as input for an ML algorithm.
Connecting the two dataframes is a big problem in the first place. Furthermore, it is difficult to predict multiple classes from the resulting dataset.
Since the original dataset has its origin in the medical field and is confidential, I added a fictional (though realistic) example code for a reproduction of the problem below.
library(data.table)
library(dplyr)
library(tidyr)
data1 <- data.table("color" = c("green", "green", "red", "red", "blue", "blue", "blue", "red", "pink"),
"type" = c("SUV", "SUV", "SEDAN", "SEDAN", "SEDAN", "TRUCK", "TRUCK", "CABRIO", "CABRIO"),
"NUM_SEATS" = c(4,4,5,4,4,3,3,2,2),
"MODELL_ID" = c("xyz", "xyz", "abc", "abc", "abc", "rtz", "rtz", "ghj", "ghj"))
data2 <- data.table("BRAND" = c("VW", "VW", "VW", "AUDI", "AUDI", "BMW", "BMW", "GM", "GM"),
"year_quarter" = c("20173", "20173", "20174", "20174", "20171", "20181", "20162", "20172", "20192"),
"MODELL_ID" = c("xyz", "xyz", "abc", "abc", "abc", "rtz", "rtz", "ghj", "ghj"))
data1 <- data1 %>% group_by(MODELL_ID) %>% mutate(time = row_number()) %>% ungroup()
data2 <- data2 %>% group_by(MODELL_ID) %>% mutate(time = row_number()) %>% ungroup()
data1_temp <- data1 %>% pivot_wider(names_from = time, values_from = c(-MODELL_ID), names_sort = TRUE, names_sep = "-")
data2_temp <- data2 %>% pivot_wider(names_from = time, values_from = c(-MODELL_ID), names_sort = TRUE, names_sep = "-")
data_join <- inner_join(data1_temp, data2_temp, by = c("MODELL_ID")) %>% select(-starts_with(c("n.", "time"))) %>% pivot_wider(names_from = "MODELL_ID", values_from = "MODELL_ID", names_prefix = "MODELL_ID-") %>% as.matrix()
data_join[is.na(data_join)] <- "0"
x_data <- data_join %>% as.data.table() %>% select(-starts_with("MODELL_ID-"))
y_data <- data_join %>% as.data.table() %>% select(starts_with("MODELL_ID-"))
x_data # input (unvectorized)
y_data # output (unvectorized)
x_data %>% data.matrix()-1 # input (vectorized)
y_data %>% data.matrix()-1 # output (vectorized)
X is my input (x_data), Modell_ID (y_data) my output. I want my ML solution to predict all possible Modell_IDs when given a row of X.
It would be great to get some advice on how to actually implement a solution for this. Every approach so far (feed-forward net, etc.) has not delivered noticeable results...
I am really looking for a game-changing command, approach, example code rather than just superficial tips.

how to create columngroups from data with changing/reactive column values in reactable (and shiny)?

I am trying to create colGroups within a reactable, but the values of the column I want to group by can change. In my example dataset the values of the column group are red and blue, but they can change through user interaction, since I want to embed the reactable in a Shiny app.
The first code example uses statically coded column names, and this of course works fine.
But I am looking for a way to recreate the result of the first code example without explicitly coding the column names, which is what I tried in the second code example.
I cannot understand why it gives me the error Error in reactable(., columns = map(.x = seq_along(data_wider_colnames), : `columns` must be a named list of column definitions. Something must be wrong in the columns definition; the columnGroups definition seems to work.
library(reactable)
library(tidyr)
data = tibble(group = c("red","blue"),
month = rep("august", 2),
valueOne = c(500,1000),
valueTwo = c(200, 2000),
valueThree = c(100, 5000))
### bring data into wider format and separate new colnames with `.`
data_wider = data %>%
pivot_wider(names_from = group, names_glue = "{group}.{.value}", values_from = c("valueOne", "valueTwo", "valueThree"))
### pipe wider data into reactable
data_wider %>%
reactable(
### rename all new colnames
columns = list(
"red.valueOne" = colDef(name = "valueOne"),
"blue.valueOne" = colDef(name = "valueOne"),
"red.valueTwo" = colDef(name = "valueTwo"),
"blue.valueTwo" = colDef(name = "valueTwo"),
"red.valueThree" = colDef(name = "valueThree"),
"blue.valueThree" = colDef(name = "valueThree")),
### create columngroups
columnGroups = list(
colGroup(name = "red", columns = c("red.valueOne", "red.valueTwo", "red.valueThree")),
colGroup(name = "blue", columns = c("blue.valueOne", "blue.valueTwo", "blue.valueThree")))
)
library(reactable)
library(tidyr)
library(purrr)
library(stringr)
data = tibble(group = c("red","blue"),
month = rep("august", 2),
valueOne = c(500,1000),
valueTwo = c(200, 2000),
valueThree = c(100, 5000))
### bring data into wider format and separate new colnames with `.`
data_wider = data %>%
pivot_wider(names_from = group, names_glue = "{group}.{.value}", values_from = c("valueOne", "valueTwo", "valueThree"))
### assign new colnames to a helper vector to extract and change the names attribute from it in colDef()
data_wider_colnames = colnames(data_wider[,2:ncol(data_wider)])
### assign distinct values of column `group` to a new vector
distinct_group_values = data$group
### assign new colnames for each distinct value of `group` to a helper vector to declare colGroups() and corresponding columns
distinct_group_colnames = map(seq_along(names), .f = ~ str_subset(data_wider_colnames, distinct_group_values[.x]))
data_wider %>%
reactable(
columns = map(.x = seq_along(data_wider_colnames),
.f = ~ set_names(list(colDef(name = str_extract(data_wider_colnames[.x], "(?<=\\.).*"))), data_wider_colnames[.x])),
columnGroups = map(.x = seq_along(distinct_group_values),
.f = ~ colGroup(name = distinct_group_values[.x], columns = distinct_group_colnames[.x][[1]]))
)
I have not tried to check out what's wrong with your code. But one option to achieve your desired result may look like so:
library(reactable)
library(tidyr)
data = tibble(group = c("red","blue"),
month = rep("august", 2),
valueOne = c(500,1000),
valueTwo = c(200, 2000),
valueThree = c(100, 5000))
### bring data into wider format and separate new colnames with `.`
data_wider = data %>%
pivot_wider(names_from = group, names_glue = "{group}.{.value}",
values_from = c("valueOne", "valueTwo", "valueThree"))
# Get column names
cols <- names(data_wider)[!names(data_wider) %in% c("month")]
cols <- setNames(cols, cols)
# Get colGroup names
col_groups <- as.character(unique(data$group))
# Make columns
columns <- lapply(cols, function(x) colDef(name = gsub("^.*?\\.(.*)$", "\\1", x)))
# Make columnGroups
columnGroups <- lapply(col_groups, function(x) {
cols <- unlist(cols[grepl(x, names(cols))])
colGroup(name = x, columns = unname(cols))
})
reactable(
data_wider,
columns = columns,
columnGroups = columnGroups
)
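As for the error message in the question, a likely cause is that map() over seq_along(...) returns an unnamed list whose elements are each a one-element named list, whereas reactable asks for one flat named list. The sketch below illustrates the difference with purrr alone. The helper make_def is a hypothetical stand-in for reactable::colDef(), used only so the example runs without reactable.

```r
library(purrr)

col_names <- c("red.valueOne", "blue.valueOne")

# Hypothetical stand-in for reactable::colDef(): keeps the part after the first dot
make_def <- function(nm) list(name = sub("^[^.]*\\.", "", nm))

# Pattern from the question: an unnamed outer list of one-element named lists
nested <- map(
  seq_along(col_names),
  ~ set_names(list(make_def(col_names[.x])), col_names[.x])
)
names(nested)  # the outer list has no names

# Alternative: name the input vector first, map() then keeps those names
flat <- map(set_names(col_names), make_def)
names(flat)    # the column names are now on the outer list
```

With the real colDef() in place of make_def, a list built like flat has the same shape as the columns list built with lapply() in the answer above.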
A small amendment to Stefan's answer. If column groups have similar names, e.g.
data = tibble(group = c("red","red month"))
Stefan's solution would not work (it would create one large column group named only 'red'). This can easily be fixed by adjusting Stefan's code with
cols <- unlist(cols[grepl(paste0(x,"."), names(cols), fixed = TRUE)])
The full example would then look like this:
library(reactable)
library(tidyr)
data = tibble(group = c("red","red month"),
month = rep("august", 2),
valueOne = c(500,1000),
valueTwo = c(200, 2000),
valueThree = c(100, 5000))
### bring data into wider format and separate new colnames with `.`
data_wider = data %>%
pivot_wider(names_from = group, names_glue = "{group}.{.value}",
values_from = c("valueOne", "valueTwo", "valueThree"))
# Get column names
cols <- names(data_wider)[!names(data_wider) %in% c("month")]
cols <- setNames(cols, cols)
# Get colGroup names
col_groups <- as.character(unique(data$group))
# Make columns
columns <- lapply(cols, function(x) colDef(name = gsub("^.*?\\.(.*)$", "\\1", x)))
# Make columnGroups
columnGroups <- lapply(col_groups, function(x) {
# Here is the change
cols <- unlist(cols[grepl(paste0(x,"."), names(cols), fixed = TRUE)])
colGroup(name = x, columns = unname(cols))
})
reactable(
data_wider,
columns = columns,
columnGroups = columnGroups
)
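The matching issue can be seen in isolation with base R. This is a small sketch on made-up column names. It also shows startsWith(), which is not used in the answers above, as a possible alternative when a group name could appear in the middle of another column name.

```r
cols <- c("red.valueOne", "red month.valueOne", "dark red.valueOne")

# Plain pattern match: "red" is found in all three names
plain <- grepl("red", cols)

# Fixed match on "red.": excludes "red month", but still finds "dark red."
fixed_dot <- grepl(paste0("red", "."), cols, fixed = TRUE)

# Prefix match: only names that begin with "red."
prefix <- startsWith(cols, paste0("red", "."))

data.frame(cols, plain, fixed_dot, prefix)
```

Whether the stricter prefix match is needed depends on how the group values can look in the real data.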

Calculating percent for missing values using gtsummary in Rstudio

My question is a bit similar to this one here.
I have the following code:
library(gtsummary)
basicvars <- names(isoq) %in% c("homeless_nonself", "test_result")
basictable <- isoq[basicvars]
# summarize the data
table1 <- tbl_summary(basictable, missing = "always",
missing_text = "(Missing)",
percent = "cell",
type = all_dichotomous() ~"categorical"
) %>%
bold_labels()
############Selecting the order of variables
basiccompletetable <- basictable %>% select(test_result,homeless_nonself)
mutate(test_result = factor(test_result) %>% fct_explicit_na()) %>%
table3 <- tbl_summary(basiccompletetable, #missing = "always", missing_text = "(Missing)",
percent = "cell",
label = list(
test_result ~ "COVID-19 Test Result",
homeless_nonself ~ "Homeless",
),
sort = list(
test_result ~ "frequency",
homeless_nonself ~ "frequency",
),
type = list(all_character() ~ "categorical")
) %>%
modify_spanning_header(starts_with("stat_") ~ "**All**") %>%
modify_header(label = "**Variable**") %>% # update the column header
#add_n() %>%
bold_labels() %>%
as_gt() %>%
gt::tab_source_note(gt::md("*This data is simulated*"))
table3
It produces this output (not the complete output).
I am trying to show the percentages for the missing values. I tried first with test_result, using the line mutate(test_result = factor(test_result) %>% fct_explicit_na()) %>% as suggested in the earlier question. However, I am seeing the same table as my output and there are no percentages on the missing values for the variable test_result.
Any suggestions as to why this is not working? Thanks
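One thing that stands out in the code as posted: the line before mutate() does not end with %>%, so mutate() is not applied to basiccompletetable, and its trailing %>% pipes into the table3 assignment instead. The sketch below, on made-up data, shows the conversion step inside one connected pipeline. The helper na_to_level is hypothetical and uses base R only, as a stand-in for fct_explicit_na(). The toy tibble is an assumption, not the isoq data.

```r
library(dplyr)

# Hypothetical helper: turn NA into an explicit factor level (base R only)
na_to_level <- function(x, level = "(Missing)") {
  f <- factor(x)
  levels(f) <- c(levels(f), level)
  f[is.na(f)] <- level
  f
}

# Made-up data with the same column names as in the question
toy <- tibble(
  test_result      = c("Positive", "Negative", NA, "Negative"),
  homeless_nonself = c("Yes", NA, "No", "No")
)

# select() and mutate() joined by %>% so the conversion is applied
toy_explicit <- toy %>%
  select(test_result, homeless_nonself) %>%
  mutate(test_result = na_to_level(test_result))

# "(Missing)" is now an ordinary level and gets its own share
shares <- prop.table(table(toy_explicit$test_result))
shares

# The converted data could then be passed on as in the question, e.g.
# tbl_summary(toy_explicit, percent = "cell")
```

Once the missing values are an ordinary factor level, they are counted like any other category, which is presumably what the suggestion in the earlier question relied on.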

Loop through columnames to format colums with R-Package formattable kableExtra (R dplyr)

Hei,
To compare several variants of data I produce an HTML report.
Within a given category, some indexes in the database should be the same. To detect errors or incorrect entries in the database, I compare the different categories in a table.
For better readability, it would be nice to have coloured tables. This can be done easily with the formattable package.
My dataset:
require(tidyverse)
require(formattable)
require(kableExtra)
require(knitr)
df1 <- data.frame(V1 = c(68,sample(c("J","N"),size=15,replace = TRUE)),
V2 = c(10,sample(c("J","N"),size=15,replace = TRUE)),
V3 = c(1,sample(c("J","N"),size=15,replace = TRUE))
)
It has, in this example, 3 different variants. Only one is recommended. The assumption is that the variant with the highest N (the first entry in each Vx column) is the real one.
My formatted table is produced with this code:
df1 %>%
mutate(
V2 = ifelse((as.character(V2) == as.character(V1)) == FALSE,
cell_spec(V2, color = "red",bold = TRUE),
cell_spec(V2, color = "black",bold = FALSE)),
V3 = ifelse((as.character(V3) == as.character(V1)) == FALSE,
cell_spec(V3, color = "red",bold = TRUE),
cell_spec(V3, color = "black",bold = FALSE))
) %>%
kable(format = "html", escape = FALSE) %>%
kable_styling(c("striped", "condensed"), full_width = FALSE) %>%
row_spec(1, bold = T, color = "white", background = "#D7261E")
Two questions:
How to mutate in a loop?
This is necessary because the different categories I have to investigate can have up to 18 different variants. In each dataset, V1 is always the reference variant.
As you can see (run the code!), the first line (the "N"s) is coded in the wrong manner. Is it possible to compare from the second line on only (with the first line set to TRUE by default)?
This would be helpful, because the first line is currently formatted in a manner that does not really make sense.
Thank you!
To answer your two questions:
Instead of looping over the columns, you can use mutate_all
Just take a copy of the first column and mutate it back in later
I have first made your cell_spec calls into functions to reduce clutter in the code.
red <- function(x) cell_spec(x, color = "red", bold = TRUE)
black <- function(x) cell_spec(x, color = "black", bold = FALSE)
c1 <- as.character(df1[[1]])
Now we can do this:
df1 %>%
select(-V1) %>%
mutate_all(function(x) ifelse(as.character(x) != df1[[1]], red(x), black(x))) %>%
mutate(V1 = black(c1)) %>%
mutate_all(function(x) `[<-`(x, 1, " ")) %>%
select(V1, V2, V3) %>%
kable(format = "html", escape = FALSE) %>%
kable_styling(c("striped", "condensed"), full_width = FALSE) %>%
row_spec(1, bold = T, color = "white", background = "#D7261E")
Which gives this result:
Thank you, @AllanCameron!
I'm not familiar with the package purrr. I really should study it more.
Your idea with purrr::map_dfc solved the problem.
Instead of the first column I need the first row (the digit row), and of course this can be solved with grepl. The condition in the ifelse statement is then a little longer.
My final solution is:
df1 %>%
map_dfc(function(x) ifelse(as.character(x) != as.character(df1$V1) & !grepl("[[:digit:]]",x),
mark_true(x), mark_false(x))) %>%
select(V1, everything()) %>%
kable(format = "html", escape = FALSE) %>%
kable_styling(c("striped", "condensed"), full_width = FALSE) %>%
row_spec(1, bold = T, color = "white", background = "#D7261E")
Thank you very much!
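For reference, mutate_all() has been superseded by across() since dplyr 1.0.0. The following is a minimal sketch of the same comparison written with across(), under the assumption that the first row should never be marked. The toy data frame and the plain <b> tags are stand-ins for df1 and for the cell_spec() calls, so the example runs without kableExtra.

```r
library(dplyr)

# Toy data: first row holds the N per variant, V1 is the reference variant
toy <- data.frame(
  V1 = c("68", "J", "N"),
  V2 = c("10", "J", "J"),
  V3 = c("1", "N", "N"),
  stringsAsFactors = FALSE
)

# Mark cells that differ from V1, skipping the first row
marked <- toy %>%
  mutate(across(
    -V1,
    ~ ifelse(.x != V1 & seq_along(.x) > 1,
             paste0("<b>", .x, "</b>"),  # stand-in for cell_spec(..., color = "red")
             .x)
  ))

marked
```

Because the selection is -V1, the same call covers any number of variant columns without listing them.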
