Extracting from Nested list to data frame - r

I have put a dput of my list at the bottom so that the question is reproducible. Note that the dput is of a, not x.
I have a big nested list called x from which I'm trying to build a data frame, but I cannot figure out how.
I have done the first part:
for(i in 1:3){a[[i]]<-x$results[[i]]$experiences
indx <- lengths(a)
zz <- as.data.frame(do.call(rbind,lapply(a, `length<-`, max(indx))))}
For this I used the following answer:
Converting nested list (unequal length) to data frame
This leaves me with a data.frame with n columns, where n is the largest number of experiences in any result i:
v1 v2 v3
1 NULL NULL NULL
2 * * *
3 NULL NULL NULL
Each * is another nested list in the format list(experience = list(duration = ...
Take, for example, the first * in row 2, column v1. I don't want the whole list; I only want:
a[[2]][[1]]$experience$start
or in terms of the original list x:
x$results[[2]]$experiences[[1]]$experience$start
I feel like I'm nearly there with some tweaks. I tried:
for(i in 1:3){a[[i]]<-x$results[[i]]$experiences
indx <- lengths(a)
for(y in 1:length(a[[i]])) aa <- rbind(aa,tryCatch(x$results[[i]]$experiences[[y]]$experience$start, error=function(e) print(NA)))
zz <- as.data.frame(do.call(rbind,lapply(aa, `length<-`, max(indx))))}
Resulting in:
v1 v2 v3
1 NA NA NA
2 NA NA NA
3 2014 NA NA
4 2012 NA NA
5 2006 NA NA
6 NA NA NA
7 NA NA NA
I tried cbind instead of rbind on the final line, and that put all the dates in the first row.
I also tried the following:
for(i in 1:3){a[[i]]<-lengths(x$results[[i]]$experiences)
indx <- lengths(a)
for(y in 1:length(indx)){tt[i] <- tryCatch(x$results[[i]]$experiences[[y]]$experience$start, error=function(e) print(""))}
zz <- as.data.frame(do.call(rbind,lapply(tt, `length<-`, max(indx))))}
This came close: it builds the right format but only returns the first result:
v1 v2 v3
1 NA NA NA
2 2014 NA NA
3 NA NA NA
The format I want is:
V1 V2 V3
1 NA NA NA
2 2014 2012 2006
3 NA NA NA
(The sample data is now at the bottom.)
Newest attempt:
The following returns only the first start date from each a[[i]]; in the second loop I need to make aa[i][y] something different.
for(i in 1:3){a[[i]]<-x$results[[i]]$experiences
for(y in 1:length(a[[i]])){aa[i][y] = if(is.null(a[[i]][[y]]$experience$start)){"NULL"}else{a[[i]][[y]]$experience$start}}}
So for dput2 I'd like the form:
v1 v2 v3 v4 v5 v6 v7 v8
1 2015
2 2011 2007 null null null null null null
3 2016 2015 2015 2015 2013 2010
I don't mind whether the blanks are NULL or NA.
UPDATE
The answer below almost works; however, in my data the structure changes. The order of the names (roleName, duration, etc.) varies, which breaks the answer, because cumsum is used to determine where a new list starts. If duration comes before start, the keys are 9 and 1, and the cumsum step labels them as two different lists.
I wrote the following:
my.list <- list(structure(
list(
experience = structure(
list(
start = "1",
end = "1",
roleName = "a",
summary = "a",
duration = "a",
current = "a",
org = structure(list(name = "a", url = "a"), .Names = c("name","url")),
location = structure(
list(
displayLocation = NULL,
lat = NULL,
lng = NULL
),
.Names = c("displayLocation",
"lat", "lng")
) ),.Names = c("start", "end", "roleName", "summary", "duration", "current", "org", "location")),
`_meta` = structure(
list(weight = 1L, `_sources` = list(structure(
list(`_origin` = "a"), .Names = "_origin"
))),.Names = c("weight", "_sources"))),.Names = c("experience", "_meta")))
Then:
aa <- lapply(1:length(a), function(y){tryCatch(lapply(1:length(a[[y]]),
function(i){a[[y]][[i]]$experience[names(my.list[[1]]$experience)]}), error=function(e) print(list()))})
This reorders the fields of each experience to match my.list, so that key2 is always computed from keys in the same order.
However, after this step I found another issue.
Sometimes, for example, the experience list contains nothing but a roleName. If that occurs twice in a row, the keys are repeated, and cumsum treats the two as the same experience instead of separate ones.
This means I cannot create df3, because of duplicate identifiers for rows. And even if I could, by removing the troublesome rows, the names wouldn't match: i in the solution below assigns names by position, so removing any rows changes the lengths.
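To see why repeated keys merge records, here is a toy run of the same cumsum rule used for key2. The key values are made up to mirror dummydf (1 = start, 2 = end, 3 = roleName). Switching the comparison from < to <= splits the repeated roleName-only records, but it is only a partial fix: it still cannot detect a boundary where the next record's first key happens to be larger than the previous record's last key.

```r
# Hypothetical keys for a flattened list: experience 1 has start, end, roleName;
# experiences 2 and 3 contain only roleName (key 3)
key <- c(1, 2, 3, 3, 3)
prev <- c(0, key[-length(key)])

# Rule used for key2: a new record starts only when the key drops
record_lt <- cumsum(key < prev) + 1
print(record_lt)  # all five fields end up in record 1

# Treating an equal key as a boundary separates the roleName-only records
record_le <- cumsum(key <= prev) + 1
print(record_le)  # 1 1 1 2 3
```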
Here is my full code, for more context:
library(dplyr)   # left_join, select, %>%
library(tidyr)   # spread
a <- list()
for(i in 1:x$count){a[[i]]<-x$results[[i]]$experiences}
aa <- lapply(1:length(a), function(y){tryCatch(lapply(1:length(a[[y]]),
function(i){a[[y]][[i]]$experience[names(my.list[[1]]$experience)]}), error=function(e) print(list()))})
aaa <- unlist(aa)
dummydf <- data.frame(b=c("start", "end", "roleName", "summary",
"duration", "current", "org.name", "org.url"), key=1:8)
df <- data.frame(a=aaa, b=names(aaa))
df2 <- left_join(df, dummydf)
df2$key2 <- as.factor(cumsum(df2$key < c(0, df2$key[-length(df2$key)])) +1)
df_split <- split(df2, df2$key2)
df3 <- lapply(df_split, function(x){
x %>% select(-c(key, key2)) %>% spread(b, a)
}) %>% data.table::rbindlist(fill=TRUE) %>% t
df3 <- data.frame(df3)
i <- sapply(seq_along(aa), function(y) rep(y, sapply(aa, function(x) length(x))[y])) %>% unlist
names(df3) <- paste0(names(df3), "_", i)
df4 <- data.frame(t(df3))
df4$dates <- as.Date(NA)
df4$dates <- as.Date(df4$start)
df4 <- data.frame(dates = df4$dates)
df4 <- t(df4)
df4 <- data.frame(df4)
names(df4) <- paste0(names(df4), "_", i)
df4[] <- lapply(df4[], as.character)
l1 <- lapply(split(stack(df4), sub('.*_', '', stack(df4)[,2])), '[', 1)
df5 <- t(do.call(cbindPad, l1))
df5 <- data.frame(df5)
cbindPad is taken from this question.
New sample code including the issues:
dput3 =
list(list(), list(
structure(list(experience = structure(list(
duration = "1", start = "2014",
end = "3000", roleName = "a",
summary = "aaa",
org = structure(list(name = "a"), .Names = "name"),
location = structure(list(displayLocation = NULL, lat = NULL,
lng = NULL), .Names = c("displayLocation", "lat", "lng"
))), .Names = c("duration", "start", "end", "roleName", "summary",
"org", "location")), `_meta` = structure(list(weight = 1L, `_sources` = list(
structure(list(`_origin` = ""), .Names = "_origin"))), .Names = c("weight",
"_sources"))), .Names = c("experience", "_meta")),
structure(list(
experience = structure(list(end = "3000",
start = "2012", duration = "2",
roleName = "a", summary = "aaa",
org = structure(list(name = "None"), .Names = "name"),
location = structure(list(displayLocation = NULL, lat = NULL, lng = NULL), .Names = c("displayLocation", "lat", "lng"))), .Names = c("duration", "start", "end", "roleName",
"summary", "org", "location")), `_meta` = structure(list(
weight = 1L, `_sources` = list(structure(list(`_origin` = " "), .Names = "_origin"))), .Names = c("weight", "_sources"))), .Names = c("experience", "_meta")),
structure(list(
experience = structure(list(duration = "3",
start = "2006", end = "3000",
roleName = "a", summary = "aaa", org = structure(list(name = " "), .Names = "name"),
location = structure(list(displayLocation = NULL, lat = NULL, lng = NULL), .Names = c("displayLocation", "lat", "lng"))), .Names = c("duration", "start", "end", "roleName",
"summary", "org", "location")), `_meta` = structure(list(weight = 1L, `_sources` = list(structure(list(`_origin` = ""), .Names = "_origin"))), .Names = c("weight",
"_sources"))), .Names = c("experience", "_meta")),
structure(list(
experience = structure(list(roleName = "a",
location = structure(list(displayLocation = NULL, lat = NULL, lng = NULL), .Names = c("displayLocation", "lat", "lng"))), .Names = c("roleName",
"location")), `_meta` = structure(list(
weight = 1L, `_sources` = list(structure(list(`_origin` = " "), .Names = "_origin"))), .Names = c("weight", "_sources"))), .Names = c("experience", "_meta")),
structure(list(
experience = structure(list(roleName = "a",
location = structure(list(displayLocation = NULL, lat = NULL, lng = NULL), .Names = c("displayLocation", "lat", "lng"))), .Names = c("roleName",
"location")), `_meta` = structure(list(
weight = 1L, `_sources` = list(structure(list(`_origin` = " "), .Names = "_origin"))), .Names = c("weight", "_sources"))), .Names = c("experience", "_meta"))
),
list(
structure(list(experience = structure(list(
duration = "1", start = "2014",
end = "3000", roleName = "a",
summary = "aaa",
org = structure(list(name = "a"), .Names = "name"),
location = structure(list(displayLocation = NULL, lat = NULL,
lng = NULL), .Names = c("displayLocation", "lat", "lng"
))), .Names = c("duration", "start", "end", "roleName", "summary",
"org", "location")), `_meta` = structure(list(weight = 1L, `_sources` = list(
structure(list(`_origin` = ""), .Names = "_origin"))), .Names = c("weight",
"_sources"))), .Names = c("experience", "_meta"))))

Maybe this can help
library(dplyr)
library(tidyr)
a <- unlist(a)
df <- data.frame(a=a, b=names(a)) %>% mutate(key=cumsum(b=="experience.duration")) %>%
split(.$key) %>% lapply(function(x) x %>% select(-key) %>% spread(b, a)) %>%
do.call(rbind, .) %>% t %>% data.frame
df$key <- rownames(df)
Then you can filter on the rows of interest.
The above would be equivalent to:
rbind(unlist(a)[1:8], unlist(a)[9:16],unlist(a)[17:24]) %>% t
Update
Try this for dput2:
a <- unlist(dput2)
library(dplyr)
library(tidyr)
dummydf <- data.frame(b=c("experience.start", "experience.end", "experience.roleName", "experience.summary",
"experience.org", "experience.org.name", "experience.org.url",
"_meta.weight", "_meta._sources._origin", "experience.duration"), key=1:10)
df <- data.frame(a=a, b=names(a))
df2 <- left_join(df, dummydf)
df2$key2 <- as.factor(cumsum(df2$key < c(0, df2$key[-length(df2$key)])) +1)
df_split <- split(df2, df2$key2)
df3 <- lapply(df_split, function(x){
x %>% select(-c(key, key2)) %>% spread(b, a)
}) %>% data.table::rbindlist(fill=TRUE) %>% t
df3 <- data.frame(df3)
i <- sapply(seq_along(dput2), function(y) rep(y, sapply(dput2, function(x) length(x))[y])) %>% unlist
names(df3) <- paste0(names(df3), "_", i)
View(df3)

I managed to figure something out, using dput3 above:
a <- dput3
aa <- lapply(1:length(a), function(y){tryCatch(lapply(1:length(a[[y]]),
function(i){if(is.null(a[[y]][[i]]$experience$start)){"Null"}else{a[[y]][[i]]$experience$start}}),error=function(e) print(list()))})
test <- NULL  # must exist before the loop; otherwise every rbind() errors and tryCatch swallows it
for(i in 1:length(aa)){for(y in 1:length(aa[[i]])){tryCatch(for(z in length(aa[[i]][[y]]))
{test <- rbind(test, data.frame(key = i, key2= y))},error=function(e) print(0))}}
aaa <- unlist(aa)
df <- data.frame(a=aaa)
df2 <- cbind(df, test)
i <- sapply(seq_along(aa), function(y) rep(y, sapply(aa, function(x) length(x))[y])) %>% unlist
df5 <- data.frame(dates = df2$a)
df5 <- t(df5)
df5 <- data.frame(df5)
names(df5) <- paste0(names(df5), "_", i)
df5[] <- lapply(df5[], as.character)
l1 <- lapply(split(stack(df5), as.numeric(sub('.*_', '', stack(df5)[,2]))), '[', 1)
df6 <- t(do.call(cbindPad, l1))
df6 <- data.frame(df6)
I will try to expand it so that it works with more than one field (at the moment, aa only extracts start).
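For comparison, here is a base-R sketch that avoids the cumsum keys altogether by reading start by name inside each experience, so field order and roleName-only entries do not matter. It assumes the structure x$results[[i]]$experiences[[j]]$experience$start from the question; the toy x below is made up to mimic dput3 and is not the real data. Missing starts become NA, and shorter rows are padded with NA using the same length<- trick as the first attempt.

```r
# Toy stand-in for x (hypothetical data shaped like dput3):
# result 1 has no experiences, result 2 has five (two without a start),
# result 3 has one
x <- list(results = list(
  list(experiences = list()),
  list(experiences = list(
    list(experience = list(duration = "1", start = "2014")),
    list(experience = list(end = "3000", start = "2012")),
    list(experience = list(start = "2006")),
    list(experience = list(roleName = "a")),
    list(experience = list(roleName = "a"))
  )),
  list(experiences = list(list(experience = list(start = "2014"))))
))

# Pull start from each experience by exact name; NA when it is absent
extract_starts <- function(exps) {
  vapply(exps, function(e) {
    s <- e[["experience"]][["start"]]
    if (is.null(s)) NA_character_ else as.character(s)
  }, character(1))
}

starts <- lapply(x$results, function(r) extract_starts(r$experiences))

# Pad every result to the longest one, then stack the results as rows
n <- max(lengths(starts))
zz <- as.data.frame(do.call(rbind, lapply(starts, `length<-`, n)),
                    stringsAsFactors = FALSE)
zz
```

Using [[ ]] instead of $ avoids R's partial name matching on lists, so a field such as startDate could never be picked up by mistake.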

Related

Combine text from two or more cells into one cell in formattable R

Given a dataframe df as follows:
df <- structure(list(class = c("A", "", "", "B", ""), name = c("Jack",
"Rose", "Steve", "James", "Rick"), score = c(7L, 18L, 9L, 15L,
12L)), class = "data.frame", row.names = c(NA, -5L))
Is it possible to get the following output with formattable?
library(formattable)
formattable(df)
Out:
Updated code:
library(flextable)
data <- flextable(
df,
col_keys = c("class", "name", "score"))
data <- merge_v(data, j = c("class"))
formattable(data)
Out:
Error in create_obj(x, "formattable", list(formatter = formatter, format = list(...), :
argument "formatter" is missing, with no default
The following raises the same error as above:
library(gt)
library(dplyr)
library(stringr)
library(tibble)
library(formattable)
# dummy data
dat <- tibble(
a=1:3,
b=c("a","b c","d e f")
)
d <- dat %>%
mutate(b = str_replace_all(b, " ", "<br>")) %>%
gt() %>%
fmt_markdown(columns = TRUE)
formattable(d)
Code which may help:
library(formattable)
a1 <- c(1, 2, 3)
data <- c(100, 155, -4)
a2 <- c(0, paste(data, collapse = "<br> "), 1000000)
b <- data.frame(cbind(a1, a2))
width <- 10
formattable(b)
Out:
Reference:
In R when using formattable() place line breaks between entries inside a single cell
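Building on the line-break idea from the referenced question, one way to combine cells is to collapse each class's rows into a single row before calling formattable. This is a sketch under two assumptions: a blank class means "same class as the row above" (so the first row must have a class), and formattable renders <br> inside a cell as a line break, as in the referenced question. Only the data preparation is run here; the formattable call is left as a comment.

```r
df <- structure(list(class = c("A", "", "", "B", ""), name = c("Jack",
"Rose", "Steve", "James", "Rick"), score = c(7L, 18L, 9L, 15L,
12L)), class = "data.frame", row.names = c(NA, -5L))

# Assumption: a blank class continues the class of the row above
heads <- df$class[df$class != ""]
grp <- cumsum(df$class != "")
key <- factor(heads[grp], levels = unique(heads))

# Collapse one column's values within each class into a single <br>-separated cell
collapse_by <- function(v) {
  unname(vapply(split(v, key), paste, character(1), collapse = "<br>"))
}

merged <- data.frame(class = levels(key),
                     name = collapse_by(df$name),
                     score = collapse_by(df$score),
                     stringsAsFactors = FALSE)
merged
# With the formattable package installed: formattable::formattable(merged)
```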

Match strings before special character

I am trying to compare the comma-separated entries in two columns and return the mismatches, comparing only the part before ":". Entries such as x2x or y67y should not be returned, since x stays x and y stays y.
I don't want to compare the ":decimal" part: if x2y is in both columns, then it's a match, irrespective of any difference in the decimal after the colon.
INPUT:
input <- structure(list(x = structure(c(1L, 2L, 3L, 3L), .Label = c("A",
"B", "C"), class = "factor"), y = structure(c(2L, 3L, 1L, 4L), .Label = c("A",
"B", "C", "D"), class = "factor"), x_val = c("x2x:0.12345,y67h:0.06732,d7j:0.032647",
"x2y:0.26345,y67y:0.28320,d7r:0.043647", "x2y:0.23435,y67y:0.28310,d7r:0.043547",
"x2y:0.23435,y67y:0.28330,d7r:0.043247"), y_val = c("x2y:0.33134,y67y:0.3131,d7r:0.23443",
"x2y:0.34311,y67y:0.14142,d7r:0.31431", "x2x:0.34314,y67h:0.14141,d7j:0.453145",
"x67b:0.31411,g72v:0.3134,b8c:0.89234")), row.names = c(NA, -4L
), class = "data.frame")
Output:
output <- structure(list(x = structure(c(1L, 2L, 3L, 3L), .Label = c("A",
"B", "C"), class = "factor"), y = structure(c(2L, 3L, 1L, 4L), .Label = c("A",
"B", "C", "D"), class = "factor"), x_val = c("x2x:0.12345,y67h:0.06732,d7j:0.032647",
"x2y:0.26345,y67y:0.28320,d7r:0.043647", "x2y:0.23435,y67y:0.28310,d7r:0.043547",
"x2y:0.23435,y67y:0.28330,d7r:0.043247"), y_val = c("x2y:0.33134,y67y:0.3131,d7r:0.23443",
"x2y:0.34311,y67y:0.14142,d7r:0.31431", "x2x:0.34314,y67h:0.14141,d7j:0.453145",
"x67b:0.31411,g72v:0.3134,b8c:0.89234"), diff_x = c("y67h:0.06732,d7j:0.03264",
NA, "x2y:0.23435,d7r:0.043547", "x2y:0.23435,y67y:0.28330,d7r:0.043247"
), diff_y = c("x2y:0.33134,d7r:0.23443", NA, "y67h:0.14141,d7j:0.453145",
"x67b:0.31411,g72v:0.3134,b8c:0.89234")), row.names = c(NA, -4L
), class = "data.frame")
I run into problems when I want to match only up to the ":" character. The following code is taken from this answer: https://stackoverflow.com/a/55285959/5150629.
library(dplyr)
library(purrr)
input %>% mutate(diff_x = map2_chr(strsplit(x_val, split = ", "),
strsplit(y_val, split = ", "),
~paste(grep('([a-z])(?>\\d+)(?!\\1)', setdiff(.x, .y),
value = TRUE, perl = TRUE),
collapse = ", ")) %>%
replace(. == "", NA),
diff_y = map2_chr(strsplit(x_val, split = ", "),
strsplit(y_val, split = ", "),
~paste(grep('([a-z])(?>\\d+)(?!\\1)', setdiff(.y, .x),
value = TRUE, perl = TRUE),
collapse = ", ")) %>%
replace(. == "", NA))
Can anyone help? Thanks!
I modified my answer in https://stackoverflow.com/a/55285959/5150629 to fit this question:
library(dplyr)
library(purrr)
input %>%
mutate(
diff_x = map2_chr(
strsplit(x_val, split = ","),
strsplit(y_val, split = ","),
~ {
setdiff(sub(":.+$", "", .x), sub(":.+$", "", .y)) %>%
grep('([a-z])(?>\\d+)(?!\\1)', ., value = TRUE, perl = TRUE) %>%
sapply(grep, .x, value = TRUE) %>%
paste(collapse = ", ") %>%
replace(. == "", NA)
}
),
diff_y = map2_chr(
strsplit(x_val, split = ","),
strsplit(y_val, split = ","),
~ {
setdiff(sub(":.+$", "", .y), sub(":.+$", "", .x)) %>%
grep('([a-z])(?>\\d+)(?!\\1)', ., value = TRUE, perl = TRUE) %>%
sapply(grep, .y, value = TRUE) %>%
paste(collapse = ", ") %>%
replace(. == "", NA)
}
)
)
Output:
x y x_val y_val diff_x
1 A B x2x:0.12345,y67h:0.06732,d7j:0.032647 x2y:0.33134,y67y:0.3131,d7r:0.23443 y67h:0.06732, d7j:0.032647
2 B C x2y:0.26345,y67y:0.28320,d7r:0.043647 x2y:0.34311,y67y:0.14142,d7r:0.31431 <NA>
3 C A x2y:0.23435,y67y:0.28310,d7r:0.043547 x2x:0.34314,y67h:0.14141,d7j:0.453145 x2y:0.23435, d7r:0.043547
4 C D x2y:0.23435,y67y:0.28330,d7r:0.043247 x67b:0.31411,g72v:0.3134,b8c:0.89234 x2y:0.23435, d7r:0.043247
diff_y
1 x2y:0.33134, d7r:0.23443
2 <NA>
3 y67h:0.14141, d7j:0.453145
4 x67b:0.31411, g72v:0.3134, b8c:0.89234
Notes:
Since we are only interested in comparing the first part of strings of the form x1y:000000, I added sub(":.+$", "", .x) to each map2_chr input to strip out the :000000 part first.
setdiff and the following grep steps work as expected, returning the mismatches and excluding strings of the form x1x.
sapply(grep, .x, value = TRUE) after the first grep takes the vector of mismatches and looks up their corresponding original strings (in x1y:000000 form).
paste collapses the vector of mismatches into a single comma separated list.
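The same key-only comparison can be sketched in base R without purrr. This is a variation on the answer rather than the answer itself: it compares keys with %in% and keeps the original entries by position instead of using the second grep, which avoids treating a key such as x2y as a regular expression that could also match a longer key. The two vectors are the x_val and y_val columns of input.

```r
x_val <- c("x2x:0.12345,y67h:0.06732,d7j:0.032647",
           "x2y:0.26345,y67y:0.28320,d7r:0.043647",
           "x2y:0.23435,y67y:0.28310,d7r:0.043547",
           "x2y:0.23435,y67y:0.28330,d7r:0.043247")
y_val <- c("x2y:0.33134,y67y:0.3131,d7r:0.23443",
           "x2y:0.34311,y67y:0.14142,d7r:0.31431",
           "x2x:0.34314,y67h:0.14141,d7j:0.453145",
           "x67b:0.31411,g72v:0.3134,b8c:0.89234")

# Key of an entry: everything before the first ":" ("x2y:0.26345" -> "x2y")
entry_key <- function(s) sub(":.*$", "", s)

# Entries of a whose key is absent from b, dropping keys like x2x or y67y
diff_entries <- function(a, b) {
  ea <- strsplit(a, ",", fixed = TRUE)[[1]]
  eb <- strsplit(b, ",", fixed = TRUE)[[1]]
  keep <- !(entry_key(ea) %in% entry_key(eb)) &
    grepl("([a-z])(?>\\d+)(?!\\1)", entry_key(ea), perl = TRUE)
  if (!any(keep)) return(NA_character_)
  paste(ea[keep], collapse = ", ")
}

diff_x <- mapply(diff_entries, x_val, y_val, USE.NAMES = FALSE)
diff_y <- mapply(diff_entries, y_val, x_val, USE.NAMES = FALSE)
```

The atomic group (?>\\d+) is what excludes y67y: without it, the engine could backtrack to match only "6" and then see "7", which differs from "y".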

Combine two columns containing lists into single column based on condition

I have two list columns, x and y, in a data frame. Some elements of column x are empty (logical(0)), and I want to fill these with the corresponding values from column y. How can I do that in R?
Sample dput
df <- structure(
list(
x = list(
structure(logical(0), .Dim = c(0L,
2L)),
structure(
c(72.8468555473385, 19.1207531432888),
.Dim = 1:2,
.Dimnames = list("1", c("X", "Y"))
),
structure(logical(0), .Dim = c(0L, 2L)),
structure(
c(72.8466089689375, 19.1222313526198),
.Dim = 1:2,
.Dimnames = list("1", c("X", "Y"))
),
structure(
c(72.8458211528575, 19.1206957620104),
.Dim = 1:2,
.Dimnames = list("1", c("X", "Y"))
)
),
y = list(
structure(
c(72.846989997634, 19.1197250026469),
.Dim = 1:2,
.Dimnames = list(NULL, c("lon", "lat"))
),
structure(
c(72.846989997634,
19.1197250026469),
.Dim = 1:2,
.Dimnames = list(NULL, c("lon",
"lat"))
),
structure(
c(72.8480650003086, 19.1195200000195),
.Dim = 1:2,
.Dimnames = list(NULL, c("lon", "lat"))
),
structure(
c(72.8463200059764,
19.1207150074423),
.Dim = 1:2,
.Dimnames = list(NULL, c("lon",
"lat"))
),
structure(
c(72.8468350022863, 19.1204500035408),
.Dim = 1:2,
.Dimnames = list(NULL, c("lon", "lat"))
)
)
),
.Names = c("x", "y"),
row.names = c(NA,-5L),
class = "data.frame"
)
I want column x to contain the combined values from x and y, like this:
x
1 72.84699, 19.11973
2 72.84686, 19.12075
3 72.84807, 19.11952
4 72.84661, 19.12223
5 72.84582, 19.12070
There might be a smarter way to do this, but using base R mapply we can check the length of each element of x and, if it is empty, replace it with the corresponding value from y.
df$x <- mapply(function(x, y) if (length(x) > 1) list(x) else list(y), df$x, df$y)
df
# x y
#1 72.84699, 19.11973 72.84699, 19.11973
#2 72.84686, 19.12075 72.84699, 19.11973
#3 72.84807, 19.11952 72.84807, 19.11952
#4 72.84661, 19.12223 72.84632, 19.12072
#5 72.84582, 19.12070 72.84684, 19.12045
We can do this in a vectorized way in base R by creating a logical index of the empty elements and using it to assign values to the x column:
i1 <- !lengths(df$x)
df$x[i1] <- df$y[i1]
Or in a single line
df$x <- replace(df$x, i1, df$y[i1])
df
# x y
#1 72.84699, 19.11973 72.84699, 19.11973
#2 72.84686, 19.12075 72.84699, 19.11973
#3 72.84807, 19.11952 72.84807, 19.11952
#4 72.84661, 19.12223 72.84632, 19.12072
#5 72.84582, 19.12070 72.84684, 19.12045
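lengths() returns the length of every list element in one call, which is what makes the index approach vectorized: an empty 0 x 2 matrix has length 0, and !0 is TRUE. A minimal sketch on a made-up two-row data frame (not the question's dput):

```r
# Toy data frame with two list columns; row 1 of x holds an empty 0 x 2 matrix
df <- data.frame(id = 1:2)
df$x <- list(matrix(logical(0), 0, 2), matrix(c(1, 2), 1, 2))
df$y <- list(matrix(c(3, 4), 1, 2), matrix(c(5, 6), 1, 2))

lengths(df$x)                # 0 2
i1 <- lengths(df$x) == 0     # same as !lengths(df$x), just more explicit
df$x[i1] <- df$y[i1]         # copy the y element into each empty x slot
lengths(df$x)                # 2 2
```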
Or with tidyverse
library(tidyverse)
df %>%
mutate(x = ifelse(lengths(x)==0, y, x))
Benchmarks
Some benchmarks on a larger dataset (5 million rows):
df1 <- df[rep(seq_len(nrow(df)), 1e6), ]
df2 <- df1  # base R copies on modify, so df2 is independent of df1 (copy() would need data.table)
system.time({
df1$x <- mapply(function(x, y) if (length(x) > 1) list(x) else list(y), df1$x, df1$y)
})
#user system elapsed
# 6.261 0.941 7.164
system.time({
i1 <- !lengths(df2$x)
df2$x[i1] <- df2$y[i1]
})
# user system elapsed
# 0.858 0.018 0.874

Return nested list with nested level and value

I would like to visualize some deeply nested data using networkD3. I can't figure out how to get the data into the right format before sending to radialNetwork.
Here is some sample data:
level <- c(1, 2, 3, 4, 4, 3, 4, 4, 1, 2, 3)
value <- letters[1:11]
where level indicates the nesting level and value is the name of the node. Using these two vectors, I need to get the data into the following format:
my_list <- list(
name = "root",
children = list(
list(
name = value[1], ## a
children = list(list(
name = value[2], ## b
children = list(list(
name = value[3], ## c
children = list(
list(name = value[4]), ## d
list(name = value[5]) ## e
)
),
list(
name = value[6], ## f
children = list(
list(name = value[7]), ## g
list(name = value[8]) ## h
)
))
))
),
list(
name = value[9], ## i
children = list(list(
name = value[10], ## j
children = list(list(
name = value[11] ## k
))
))
)
)
)
Here is the deparsed object:
> dput(my_list)
# structure(list(name = "root",
# children = list(
# structure(list(
# name = "a",
# children = list(structure(
# list(name = "b",
# children = list(
# structure(list(
# name = "c", children = list(
# structure(list(name = "d"), .Names = "name"),
# structure(list(name = "e"), .Names = "name")
# )
# ), .Names = c("name",
# "children")), structure(list(
# name = "f", children = list(
# structure(list(name = "g"), .Names = "name"),
# structure(list(name = "h"), .Names = "name")
# )
# ), .Names = c("name",
# "children"))
# )), .Names = c("name", "children")
# ))
# ), .Names = c("name",
# "children")), structure(list(
# name = "i", children = list(structure(
# list(name = "j", children = list(structure(
# list(name = "k"), .Names = "name"
# ))), .Names = c("name",
# "children")
# ))
# ), .Names = c("name", "children"))
# )),
# .Names = c("name",
# "children"))
Then I can pass it to the final plotting function:
library(networkD3)
radialNetwork(List = my_list)
The output will look similar to this:
Question: How can I create the nested list?
Note: As pointed out by @zx8754, there is already a solution in this SO post, but it requires a data.frame as input. Because of the inconsistency in my level vector, I don't see a simple way to transform it into a data.frame.
Using a data.table-style merge:
library(data.table)
dt = data.table(idx=1:length(value), level, parent=value)
dt = dt[dt[, .(i=idx, level=level-1, child=parent)], on=.(level, idx < i), mult='last']
dt[is.na(parent), parent:= 'root'][, c('idx','level'):= NULL]
> dt
# parent child
# 1: root a
# 2: a b
# 3: b c
# 4: c d
# 5: c e
# 6: b f
# 7: f g
# 8: f h
# 9: root i
# 10: i j
# 11: j k
Now we can use the solution from the other post:
x = maketreelist(as.data.frame(dt))
> identical(x, my_list)
# [1] TRUE
As a preface, your data is difficult to work with because critical information is encoded in the order of the values in level. I don't know how you get those values in that order, but consider that there may be a better way to structure that information in the first place, which would make the next task easier.
Here's a base-y way of converting your data into a data frame with two columns, parent and child, then passing that to data.tree functions that can easily convert it to the JSON format you need, and then on to radialNetwork:
level <- c(1, 2, 3, 4, 4, 3, 4, 4, 1, 2, 3)
value <- letters[1:11]
library(data.tree)
library(networkD3)
parent_idx <- sapply(1:length(level), function(n) rev(which(level[1:n] < level[n]))[1])
df <- data.frame(parent = value[parent_idx], child = value, stringsAsFactors = FALSE)
df$parent[is.na(df$parent)] <- ""  # top-level nodes have no parent
tree_list <- ToListExplicit(FromDataFrameNetwork(df), unname = TRUE)
radialNetwork(tree_list)
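If you would rather not depend on data.tree, the nested list can also be built directly with a short recursive function. This is a different technique from the answers above, sketched under the assumption that level starts at 1 and never jumps by more than one from a node to its first child: the children of a node at depth d are the depth-d entries in its block, and each child's own block runs up to its next sibling.

```r
level <- c(1, 2, 3, 4, 4, 3, 4, 4, 1, 2, 3)
value <- letters[1:11]

# Children of a block of positions idx whose shallowest entries sit at depth d.
# Assumes level starts at 1 and increases by at most 1 from a node to its child.
build_children <- function(idx, d) {
  starts <- idx[level[idx] == d]            # the direct children, in order
  lapply(seq_along(starts), function(k) {
    s <- starts[k]
    # a child's subtree runs up to its next sibling (or the end of the block)
    end <- if (k < length(starts)) starts[k + 1] - 1 else max(idx)
    node <- list(name = value[s])
    if (end > s) node$children <- build_children((s + 1):end, d + 1)
    node
  })
}

tree <- list(name = "root", children = build_children(seq_along(level), 1))
# networkD3::radialNetwork(List = tree)
```

Leaves get only a name element, which matches the shape of my_list in the question.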
Here's a tidyverse way of achieving the same...
level <- c(1, 2, 3, 4, 4, 3, 4, 4, 1, 2, 3)
value <- letters[1:11]
library(tidyverse)
library(data.tree)
library(networkD3)
data.frame(level, value, stringsAsFactors = F) %>%
mutate(row = row_number()) %>%
mutate(level2 = level, value2 = value) %>%
spread(level2, value2) %>%
mutate(`0` = "") %>%
arrange(row) %>%
fill(-level, -value, -row) %>%
gather(parent_level, parent, -level, -value, -row) %>%
filter(parent_level == level - 1) %>%
arrange(row) %>%
select(parent, child = value) %>%
data.tree::FromDataFrameNetwork() %>%
data.tree::ToListExplicit(unname = TRUE) %>%
radialNetwork()
And for a bonus: the current dev version of networkD3 (v0.4.9000) has a new treeNetwork function that takes a data frame with nodeId and parentId columns, which eliminates the need for the data.tree functions to convert to JSON, so something like this works:
level <- c(1, 2, 3, 4, 4, 3, 4, 4, 1, 2, 3)
value <- letters[1:11]
library(tidyverse)
library(networkD3)
data.frame(level, value, stringsAsFactors = F) %>%
mutate(row = row_number()) %>%
mutate(level2 = level, value2 = value) %>%
spread(level2, value2) %>%
mutate(`0` = "root") %>%
arrange(row) %>%
fill(-level, -value, -row) %>%
gather(parent_level, parent, -level, -value, -row) %>%
filter(parent_level == level - 1) %>%
arrange(row) %>%
select(nodeId = value, parentId = parent) %>%
rbind(data.frame(nodeId = "root", parentId = NA)) %>%
mutate(name = nodeId) %>%
treeNetwork(direction = "radial")

Using both column and row names in apply

I have something like this:
#data.table
# a b
#aland 1 2
#bland 3 4
freq_all = read.table(file='data.table', header=T,stringsAsFactors = FALSE)
country_names = rownames(freq_all)
blood_types = colnames(freq_all)
func <- function(country,type) {paste(country, type)}
newfr <- freq_all
for (country in country_names){
for (type in blood_types){
newfr[country, type] <- func(country, type)
}
}
And I'm wondering if I can use the apply() function or something similar here.
We can use outer:
freq1 <- freq_all
freq1[] <- outer(rownames(freq_all), colnames(freq_all), FUN= paste)
freq1
# a b
#aland aland a aland b
#bland bland a bland b
identical(freq1, newfr)
#[1] TRUE
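outer() expands both vectors and calls FUN once on the expanded pair, so FUN must be vectorized; paste is. If your real func only handles one country and one type at a time, wrapping it in Vectorize() (which loops with mapply) lets outer use it anyway. A small sketch using the same freq_all data; func_scalar is a hypothetical stand-in for such a non-vectorized function:

```r
freq_all <- structure(list(a = c(1L, 3L), b = c(2L, 4L)), .Names = c("a",
"b"), class = "data.frame", row.names = c("aland", "bland"))

# paste is vectorized, so outer can call it directly on the expanded vectors
labels <- outer(rownames(freq_all), colnames(freq_all), FUN = paste)

# Hypothetical non-vectorized version of func: refuses vectors longer than 1
func_scalar <- function(country, type) {
  if (length(country) != 1 || length(type) != 1) stop("scalar inputs only")
  paste(country, type)
}
labels2 <- outer(rownames(freq_all), colnames(freq_all),
                 FUN = Vectorize(func_scalar, USE.NAMES = FALSE))

freq1 <- freq_all
freq1[] <- labels2   # fill the cells, keeping the row and column names
freq1
```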
data
freq_all <- structure(list(a = c(1L, 3L), b = c(2L, 4L)), .Names = c("a",
"b"), class = "data.frame", row.names = c("aland", "bland"))
