Selecting elements from a list with incompatible lengths in R

Given the following structure of the list:
x <- list(list(Main = list(one = list(tlv = 1, beta = 2), two = "three", three = 4,list_a = list(list(value_1 = "a1", value_2 = "b", c = "c")))),
list(Main = list(one = list(tlv = 2, beta = 6), two = "seven", three = 8,list_a = list(list(value_1 = "aa2", value_2 = "bb", c = "cc")))),
list(Main = list(one = list(tlv = 3),list_a = list(list(value_1 = c("aaa3", "aaaa4"), value_2 = c("bbb", "bbbb"), c = c("ccc", "ccc"))))))
I'm trying to create a data frame with the following structure:
tlv | value_1
1 | a1
2 | aa2
3 | aaa3
3 | aaaa4
So far I have the following:
library(tidyverse)
tibble::tibble(
tlv = map(x, list(1,1,"tlv"), .default = NA) %>% unlist(),
value = map(x, list(1, "list_a", 1, "value_1"), .default = NA) %>% unlist())
Which leads to the following error:
Error: Tibble columns must have compatible sizes.
* Size 3: Existing data.
* Size 4: Column `value`.
i Only values of size one are recycled.
This makes sense given the structure of the list (3 values for one of the variables and 4 values for the other), but I don't see how to link each value to its parent element of the list, so that every 'value' also gets the corresponding 'tlv' value. Any guidance on how to solve this problem?

I found a solution; this does the trick:
x %>%
map_df(~tibble(
tlv = .$Main$one$tlv,
value = .$Main$list_a[[1]]$value_1))
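The map_df() version works because tibble() recycles the single tlv of each element to the length of that element's value_1 vector. The same parent-to-child link can be made explicit in base R by repeating each tlv once per value it owns. This is a sketch, not part of the original thread; it assumes every element has exactly one Main$one$tlv and a single list_a entry, as in the sample data (x is copied from the question).

```r
# Sample data from the question
x <- list(
  list(Main = list(one = list(tlv = 1, beta = 2), two = "three", three = 4,
                   list_a = list(list(value_1 = "a1", value_2 = "b", c = "c")))),
  list(Main = list(one = list(tlv = 2, beta = 6), two = "seven", three = 8,
                   list_a = list(list(value_1 = "aa2", value_2 = "bb", c = "cc")))),
  list(Main = list(one = list(tlv = 3),
                   list_a = list(list(value_1 = c("aaa3", "aaaa4"),
                                      value_2 = c("bbb", "bbbb"),
                                      c = c("ccc", "ccc")))))
)

# One tlv per top-level element (assumed to always be a single number)
tlv <- vapply(x, function(el) el$Main$one$tlv, numeric(1))

# One character vector of value_1 entries per element; lengths may differ
values <- lapply(x, function(el) el$Main$list_a[[1]]$value_1)

# Repeat each parent's tlv once for every value it owns, so both
# columns end up with the same length
df <- data.frame(
  tlv = rep(tlv, times = lengths(values)),
  value_1 = unlist(values),
  stringsAsFactors = FALSE
)
df
```

Here lengths(values) is what resolves the size mismatch from the error message: it tells rep() how many rows each parent needs.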

An alternative:
library(tidyverse)
value_1 <-
map_depth(x, 4, pluck, "value_1", .ragged = TRUE) %>%
map(unlist, use.names = FALSE)
tlv <-
map_depth(x, 3, pluck, "tlv") %>%
map_dbl(unlist, use.names = FALSE)
df <-
tibble(tlv = tlv, value_1 = value_1) %>%
unnest_auto(col = value_1)
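Another option, not taken from the thread, is tidyr::hoist() followed by unnest_longer(). A sketch assuming tidyr 1.0 or later; the paths passed to hoist() follow purrr::pluck() rules (strings pick by name, integers by position), and x is again copied from the question.

```r
library(tibble)
library(tidyr)
library(dplyr)

# Sample data from the question
x <- list(
  list(Main = list(one = list(tlv = 1, beta = 2), two = "three", three = 4,
                   list_a = list(list(value_1 = "a1", value_2 = "b", c = "c")))),
  list(Main = list(one = list(tlv = 2, beta = 6), two = "seven", three = 8,
                   list_a = list(list(value_1 = "aa2", value_2 = "bb", c = "cc")))),
  list(Main = list(one = list(tlv = 3),
                   list_a = list(list(value_1 = c("aaa3", "aaaa4"),
                                      value_2 = c("bbb", "bbbb"),
                                      c = c("ccc", "ccc")))))
)

out <- tibble(el = x) %>%
  # Pull both fields out of every element into their own columns;
  # el is kept intact and dropped by select() below
  hoist(el,
        tlv = list("Main", "one", "tlv"),
        value_1 = list("Main", "list_a", 1L, "value_1"),
        .remove = FALSE) %>%
  # value_1 stays a list column because one element holds two values;
  # unnest_longer() gives each value its own row and repeats tlv
  unnest_longer(value_1) %>%
  select(tlv, value_1)
out
```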

Related

Formatting gtsummary tables with checkbox questions

I have been enjoying the gtsummary library quite a bit, but I can't find a clean way to display checkbox-style (select all that apply) questions with gtsummary::tbl_summary(). Here is an example:
library(dplyr) # provides the %>% pipe used below
example_df = tibble::tibble(
CHOICE1 = sample(c(1, NA), size = 10, replace = TRUE),
CHOICE2 = sample(c(1, NA), size = 10, replace = TRUE),
CHOICE3 = sample(c(1, NA), size = 10, replace = TRUE)
)
for(i in 1:3){
expss::val_lab(example_df[[i]]) = setNames(1, letters[i])
expss::var_lab(example_df[[i]]) = 'Question 1'
}
example_df %>%
gtsummary::tbl_summary(
type = list(
CHOICE1 ~ "categorical",
CHOICE2 ~ "categorical",
CHOICE3 ~ "categorical"
)
)
Ideally, there would be a single header reading 'Question 1', with each of the columns summarized below it. Any suggestions on how to do this properly, or how to jury-rig it?
Thank you!
Great question. Below is an admittedly imperfect solution, but it does get the job done. If you file a GitHub issue on the gtsummary page requesting better support for this type of data, we can work together on a more concise solution. Happy programming!
library(gtsummary)
library(tidyverse)
example_df = tibble::tibble(
CHOICE1 = sample(c(1, NA), size = 10, replace = TRUE),
CHOICE2 = sample(c(1, NA), size = 10, replace = TRUE),
CHOICE3 = sample(c(1, NA), size = 10, replace = TRUE)
)
for(i in 1:3){
expss::val_lab(example_df[[i]]) = setNames(1, letters[i])
expss::var_lab(example_df[[i]]) = 'Question 1'
}
example_df %>%
mutate(across(everything(), ~replace_na(., 0L))) %>%
gtsummary::tbl_summary(
type = list(
CHOICE1 ~ "categorical",
CHOICE2 ~ "categorical",
CHOICE3 ~ "categorical"
)
) %>%
remove_row_type(variables = c(CHOICE2, CHOICE3), type = "header") %>%
modify_table_body(
~.x %>%
filter(label != "0")
) %>%
as_kable() # converting to kable to display on SO
Characteristic | N = 10
Question 1 |
a | 4 (40%)
b | 3 (30%)
c | 4 (40%)
Created on 2022-01-12 by the reprex package (v2.0.1)
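The figures in such a table are plain per-column counts: a ticked box is 1, an unticked box is NA, and each option's percentage is its count divided by N. The sketch below reproduces that computation in base R, independently of gtsummary, on a small hypothetical data set (fixed values replace sample() so the output is reproducible; the option labels a, b, c mirror the value labels in the question).

```r
# Hypothetical checkbox data: 1 = ticked, NA = not ticked
example_df <- data.frame(
  CHOICE1 = c(1, NA, 1, NA, 1, NA, NA, 1, NA, NA),
  CHOICE2 = c(NA, 1, NA, NA, 1, NA, 1, NA, NA, NA),
  CHOICE3 = c(1, 1, NA, NA, NA, 1, NA, NA, 1, NA)
)
n <- nrow(example_df)

# Each option's count is the number of non-missing (ticked) cells
counts <- unname(colSums(!is.na(example_df)))

# One row per option, all listed under a single "Question 1" heading
summary_tbl <- data.frame(
  option = c("a", "b", "c"),
  n_pct = sprintf("%d (%.0f%%)", as.integer(counts), 100 * counts / n),
  stringsAsFactors = FALSE
)
cat("Question 1\n")
print(summary_tbl, row.names = FALSE)
```

This is also why the answer's replace_na(., 0L) step helps: once unticked boxes are 0 instead of NA, tbl_summary() sees a two-level categorical variable, and the "0" rows can be filtered out.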

Subsetting nested lists within R

I am trying to create a data frame from nested lists in R. Here is an example:
mylist<-list(file1 = list("a", sample1 = list(x = 2, y = list(c(1, 2)),
sample2 = list(x = 4, y = list(c(3, 8))))), file2 = list(
"a", sample1 = list(x = 6, y = list(c(6, 4)), sample2 = list(
x = 6, y = list(c(7, 4))))))
How can I extract all the 'x' and 'y' elements from the nested lists, with 'y' split into two columns, one for each value?
Thank you for your time, everyone!
I'm not exactly sure what your expected output is supposed to look like, but perhaps something like this?
library(tidyverse)
unlist(mylist) %>%
data.frame(val = .) %>%
rownames_to_column("id") %>%
filter(str_detect(id, "(x|y1|y2)")) %>%
separate(id, into = c("id", "col"), sep = "\\.(?=\\w+$)") %>%
spread(col, val)
# id x y1 y2
#1 file1.sample1 2 1 2
#2 file1.sample1.sample2 4 3 8
#3 file2.sample1 6 6 4
#4 file2.sample1.sample2 6 7 4
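Because unlist() coerces everything to character (the list also contains the string "a"), x, y1 and y2 above come out as strings. A sketch of an alternative, not from the thread: walk the list recursively and emit one row wherever a node has both an x and a y, which keeps the numbers numeric. It assumes each y holds exactly two values; collect_xy() is a hypothetical helper name, and mylist is copied from the question.

```r
mylist <- list(file1 = list("a", sample1 = list(x = 2, y = list(c(1, 2)),
  sample2 = list(x = 4, y = list(c(3, 8))))), file2 = list(
  "a", sample1 = list(x = 6, y = list(c(6, 4)), sample2 = list(
  x = 6, y = list(c(7, 4))))))

# Hypothetical helper: recursively collect one data-frame row per node that
# has both x and y; id is the dot-separated path to that node
collect_xy <- function(node, path = character()) {
  rows <- list()
  if (!is.list(node)) return(rows)
  if (all(c("x", "y") %in% names(node))) {
    y <- unlist(node$y)
    rows[[1]] <- data.frame(id = paste(path, collapse = "."), x = node$x,
                            y1 = y[1], y2 = y[2], stringsAsFactors = FALSE)
  }
  nms <- names(node)
  for (i in seq_along(node)) {
    # Unnamed children (such as the "a" strings) are named by position
    nm <- if (is.null(nms) || nms[i] == "") as.character(i) else nms[i]
    rows <- c(rows, collect_xy(node[[i]], c(path, nm)))
  }
  rows
}

res <- do.call(rbind, collect_xy(mylist))
rownames(res) <- NULL
res
```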

Recoding values in a selection of columns of a dataframe using dplyr

I have a large dataset like the one in the example below. Columns with an A in their names have codes from 1 to 4, and columns with a B have codes from 1 to 3.
library(dplyr)
d <- data.frame(
ID = 1:10,
A = sample(x = 1:4, size = 10, replace = T),
AA = sample(x = 1:4, size = 10, replace = T),
B = sample(x = 1:3, size = 10, replace = T),
BB = sample(x = 1:3, size = 10, replace = T)
)
Is there a neat way, using pipes in dplyr, to recode the values in the A columns and the B columns to the following strings?
As - from 1, 2, 3, 4 to Green, Yellow, Orange, Red respectively
Bs - from 1, 2, 3 to Green, Yellow, Red respectively
This is a simplified and friendlier version of the real dataset.
By using mutate_at() from dplyr, it is possible to recode the numeric codes into strings. The columns to be recoded must first be coerced from numeric to character; otherwise, there will be an error message.
library(dplyr)
d <- data.frame(
ID = 1:10,
A = sample(x = 1:4, size = 10, replace = T),
AA = sample(x = 1:4, size = 10, replace = T),
B = sample(x = 1:3, size = 10, replace = T),
BB = sample(x = 1:3, size = 10, replace = T))
# funs() is deprecated in current dplyr, so bare functions and ~ lambdas are used instead
d_recoded <- d %>% mutate_at(vars(-contains("ID")), as.character) %>%
mutate_at(vars(contains("A"), -contains("ID")), ~ case_when(. == 1 ~ "Green", . == 2 ~ "Yellow", . == 3 ~ "Orange", . == 4 ~ "Red")) %>%
mutate_at(vars(contains("B"), -contains("ID")), ~ case_when(. == 1 ~ "Green", . == 2 ~ "Yellow", . == 3 ~ "Red"))
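In dplyr 1.0 and later, mutate_at() is superseded by across(). A sketch of the same recoding with across(), which indexes a lookup vector by the code instead of calling case_when(). It assumes the codes are consecutive integers starting at 1 and that only the A and B columns have names starting with those letters (starts_with() ignores case by default). The small fixed data frame replaces sample() so the result is predictable.

```r
library(dplyr)

# Fixed example data instead of sample()
d <- data.frame(
  ID = 1:4,
  A  = c(1L, 2L, 3L, 4L),
  AA = c(4L, 3L, 2L, 1L),
  B  = c(1L, 2L, 3L, 1L),
  BB = c(3L, 2L, 1L, 3L)
)

# Lookup vectors: position k holds the label for code k
a_labels <- c("Green", "Yellow", "Orange", "Red")
b_labels <- c("Green", "Yellow", "Red")

d_recoded <- d %>%
  mutate(
    across(starts_with("A"), ~ a_labels[.x]),
    across(starts_with("B"), ~ b_labels[.x])
  )
d_recoded
```

Indexing a character vector with the integer codes returns character directly, so no prior coercion step is needed here.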

How to Convert a Data Frame to a List for Clickstream Analysis

I am a novice R user and new to the forum.
I have a data frame that I need to convert so that each row is a character vector.
I want to remove the 0s so that each row can have a different length; in essence, each row becomes a separate character vector in a list.
Here is what I have so far:
mydf<-matrix(sample(0:1,12*5, replace = T),ncol =4)
colnames(mydf)<-letters[1:ncol(mydf)]
swapcol <-which(mydf == 1, arr.ind = T)
mydf[swapcol]<-colnames(mydf)[swapcol[,2]]
mydf
The code produces a character matrix in which each 1 is replaced by its column label. I need the following output:
Desired List Result
This format appears to be what I need in order to read the data into the clickstream package.
Thanks
Try this solution:
library(tidyverse)
s <- sample(x = 0:1, size = 15 * 4, replace = TRUE)
mx <- matrix(data = s, nrow = 15, ncol = 4, byrow = TRUE,
dimnames = list(c(paste("User", 1:15, sep = " ")), c("V1", "V2", "V3", "V4")))
# The pipe must end the line; a line starting with %>% is a syntax error
df2 <- mx %>% as.data.frame() %>% rownames_to_column() %>% as_tibble() %>%
mutate(
V1 = ifelse(test = V1 == 1, yes = "a", no = NA),
V2 = ifelse(test = V2 == 1, yes = "b", no = NA),
V3 = ifelse(test = V3 == 1, yes = "c", no = NA),
V4 = ifelse(test = V4 == 1, yes = "d", no = NA))
# For each row, move the non-NA entries to the front and the NAs to the end
mx2 <- t(apply(X = df2, MARGIN = 1, FUN = function(x) {
return(c(x[!is.na(x)], x[is.na(x)]))
}))
This returns a list with the format you are asking for:
list(
apply(mydf, 1, function(a_row) {
my_paste <- function(...){
paste(..., sep = ", ")
}
a_row <- Reduce(my_paste, a_row)
a_row <- gsub("0(, )*", "", a_row)
a_row <- gsub(", $", "", a_row)
})
)
This returns a list of length 1. Replacing list() with as.list() returns a list of length 15.
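A shorter base R sketch of the same idea, not from the thread: for each row, keep the names of the columns that hold a 1, which directly yields a list with one variable-length character vector per row. It uses a small hypothetical 0/1 matrix instead of the random one in the question; whether clickstream accepts exactly this shape is an assumption to check against its documentation.

```r
# Hypothetical 0/1 matrix standing in for the random one in the question
m <- matrix(c(1, 0, 1, 1,
              0, 0, 1, 0,
              1, 1, 1, 1,
              0, 0, 0, 0),
            ncol = 4, byrow = TRUE,
            dimnames = list(NULL, letters[1:4]))

# For each row, keep the names of the columns holding a 1,
# so the rows end up with different lengths
paths <- lapply(seq_len(nrow(m)), function(i) colnames(m)[m[i, ] == 1])
str(paths)
```

A row with no 1s becomes character(0); drop such rows first if empty sequences are not wanted.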

Return nested list with nested level and value

I would like to visualize some deeply nested data using networkD3. I can't figure out how to get the data into the right format before sending to radialNetwork.
Here is some sample data:
level <- c(1, 2, 3, 4, 4, 3, 4, 4, 1, 2, 3)
value <- letters[1:11]
where level is the nesting depth of each node and value is the name of the node. Using these two vectors, I need to get the data into the following format:
my_list <- list(
name = "root",
children = list(
list(
name = value[1], ## a
children = list(list(
name = value[2], ## b
children = list(list(
name = value[3], ## c
children = list(
list(name = value[4]), ## d
list(name = value[5]) ## e
)
),
list(
name = value[6], ## f
children = list(
list(name = value[7]), ## g
list(name = value[8]) ## h
)
))
))
),
list(
name = value[9], ## i
children = list(list(
name = value[10], ## j
children = list(list(
name = value[11] ## k
))
))
)
)
)
Here is the deparsed object:
> dput(my_list)
# structure(list(name = "root",
# children = list(
# structure(list(
# name = "a",
# children = list(structure(
# list(name = "b",
# children = list(
# structure(list(
# name = "c", children = list(
# structure(list(name = "d"), .Names = "name"),
# structure(list(name = "e"), .Names = "name")
# )
# ), .Names = c("name",
# "children")), structure(list(
# name = "f", children = list(
# structure(list(name = "g"), .Names = "name"),
# structure(list(name = "h"), .Names = "name")
# )
# ), .Names = c("name",
# "children"))
# )), .Names = c("name", "children")
# ))
# ), .Names = c("name",
# "children")), structure(list(
# name = "i", children = list(structure(
# list(name = "j", children = list(structure(
# list(name = "k"), .Names = "name"
# ))), .Names = c("name",
# "children")
# ))
# ), .Names = c("name", "children"))
# )),
# .Names = c("name",
# "children"))
Then I can pass it to the final plotting function:
library(networkD3)
radialNetwork(List = my_list)
The output will look similar to this:
Question: How can I create the nested list?
Note: As pointed out by @zx8754, there is already a solution in this SO post, but it requires a data.frame as input. Because of the irregular structure of level, I don't see a simple way to transform it into a data.frame.
Using a data.table non-equi join:
library(data.table)
dt = data.table(idx=1:length(value), level, parent=value)
dt = dt[dt[, .(i=idx, level=level-1, child=parent)], on=.(level, idx < i), mult='last']
dt[is.na(parent), parent:= 'root'][, c('idx','level'):= NULL]
> dt
# parent child
# 1: root a
# 2: a b
# 3: b c
# 4: c d
# 5: c e
# 6: b f
# 7: f g
# 8: f h
# 9: root i
# 10: i j
# 11: j k
Now we can use maketreelist() from the other post (the function is defined there, not here):
x = maketreelist(as.data.frame(dt))
> identical(x, my_list)
# [1] TRUE
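The nested list can also be built directly from level and value without an intermediate parent/child table. A recursive base R sketch, not from the thread: each node at a given depth owns every following entry up to the next node at the same depth, and those entries become its subtree. build_tree() is a hypothetical helper name, and the sketch assumes well-formed input (depth grows by at most 1 from one entry to the next).

```r
level <- c(1, 2, 3, 4, 4, 3, 4, 4, 1, 2, 3)
value <- letters[1:11]

# Hypothetical helper: build the list of children at `depth` from a slice
# of level/value. Each node at `depth` owns the entries that follow it, up
# to (but excluding) the next node at the same depth.
build_tree <- function(level, value, depth = 1) {
  starts <- which(level == depth)
  lapply(seq_along(starts), function(k) {
    from <- starts[k]
    to <- if (k < length(starts)) starts[k + 1] - 1 else length(level)
    node <- list(name = value[from])
    if (to > from) {
      idx <- (from + 1):to
      node$children <- build_tree(level[idx], value[idx], depth + 1)
    }
    node
  })
}

tree <- list(name = "root", children = build_tree(level, value))
str(tree, max.level = 4)
```

With my_list defined as in the question, identical(tree, my_list) is a quick way to check the result before passing tree to radialNetwork().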
As a preface, your data is difficult to work with because critical information is encoded in the order of the values in level. I don't know how you get those values in that order, but consider that there may be a better way to structure that information in the first place, which would make the next task easier.
Here's a base R way of converting your data into a data frame with two columns, parent and child, and then passing that to data.tree functions that convert it into the nested (JSON-style) list format you need, which can then be passed to radialNetwork():
level <- c(1, 2, 3, 4, 4, 3, 4, 4, 1, 2, 3)
value <- letters[1:11]
library(data.tree)
library(networkD3)
parent_idx <- sapply(1:length(level), function(n) rev(which(level[1:n] < level[n]))[1])
df <- data.frame(parent = value[parent_idx], child = value, stringsAsFactors = F)
df$parent[is.na(df$parent)] <- ""
tree_list <- ToListExplicit(FromDataFrameNetwork(df), unname = TRUE)
radialNetwork(tree_list)
Here's a tidyverse way of achieving the same:
level <- c(1, 2, 3, 4, 4, 3, 4, 4, 1, 2, 3)
value <- letters[1:11]
library(tidyverse)
library(data.tree)
library(networkD3)
data.frame(level, value, stringsAsFactors = F) %>%
mutate(row = row_number()) %>%
mutate(level2 = level, value2 = value) %>%
spread(level2, value2) %>%
mutate(`0` = "") %>%
arrange(row) %>%
fill(-level, -value, -row) %>%
gather(parent_level, parent, -level, -value, -row) %>%
filter(parent_level == level - 1) %>%
arrange(row) %>%
select(parent, child = value) %>%
data.tree::FromDataFrameNetwork() %>%
data.tree::ToListExplicit(unname = TRUE) %>%
radialNetwork()
And as a bonus, the current development version of networkD3 (v0.4.9000) has a new treeNetwork() function that takes a data frame with nodeId and parentId columns, which eliminates the need for the data.tree functions to convert to JSON, so something like this works:
level <- c(1, 2, 3, 4, 4, 3, 4, 4, 1, 2, 3)
value <- letters[1:11]
library(tidyverse)
library(networkD3)
data.frame(level, value, stringsAsFactors = F) %>%
mutate(row = row_number()) %>%
mutate(level2 = level, value2 = value) %>%
spread(level2, value2) %>%
mutate(`0` = "root") %>%
arrange(row) %>%
fill(-level, -value, -row) %>%
gather(parent_level, parent, -level, -value, -row) %>%
filter(parent_level == level - 1) %>%
arrange(row) %>%
select(nodeId = value, parentId = parent) %>%
rbind(data.frame(nodeId = "root", parentId = NA)) %>%
mutate(name = nodeId) %>%
treeNetwork(direction = "radial")
