Nested list assignment in R

I have a list of the following type:
categories = list(
c("Women","Clothing", "Jeans"),
c("Women","Clothing", "Sweaters"),
c("Men","Accessories", "Belts"),
c("Women", "Accessories", "Jewelry" ))
I want to parse this list and create a list of lists to export as JSON. It should have the following structure:
Women = {
  Clothing = {
    Jeans = {},
    Sweaters = {}
  },
  Accessories = {
    Jewelry = {}
  }
},
Men = {
  Accessories = {
    Belts = {}
  }
}
So it should go over each element of the list (each one is a character vector) and check whether that element already exists in the final list. If it doesn't, it should be appended at the proper level. For example, if Clothing is the second element after Women, it should be appended to the Women list of the final list. If Sweaters is the third element after Women and Clothing, it should be appended to the Clothing list inside the Women list of the final list.
If the element already exists at the given level, it should not be appended; instead, processing should move on to the next element of the character vector.
In the character vectors of the input list, the first element is always level 1, the second level 2, the third level 3, and so on.
It should be done recursively. I tried a few times, but I have no idea how to assign to a nested list; specifically, I need to do nested assignments.

I made the data into a matrix, transposed, then a dataframe:
x <- data.frame(t(vapply(categories, identity, character(3))), stringsAsFactors = FALSE)
Then split, and lapply. You could do this recursively if you have more than 3 levels:
lapply(split(x, x$X1), function(df) {
lapply(split(df, df$X2), function(df) {
lapply(split(df, df$X3), function(x) list())
})
})

If you are looking for a recursive solution, then the following may help you:
To output the full path as a string at each leaf:
## construct a data frame from the list
df <- data.frame(matrix(unlist(categories), nrow = length(categories), byrow = TRUE), stringsAsFactors = FALSE)
## recursive function that builds the nested list
f <- function(df, k = 1) {
  ## last column: each leaf becomes the comma-separated path
  if (k == ncol(df)) return(lapply(split(df, df[, k]), toString))
  return(lapply(split(df, df[, k]), function(df) f(df, k + 1)))
}
The nested list output looks as follows:
> f(df)
$Men
$Men$Accessories
$Men$Accessories$Belts
[1] "Men, Accessories, Belts"
$Women
$Women$Accessories
$Women$Accessories$Jewelry
[1] "Women, Accessories, Jewelry"
$Women$Clothing
$Women$Clothing$Jeans
[1] "Women, Clothing, Jeans"
$Women$Clothing$Sweaters
[1] "Women, Clothing, Sweaters"
To output empty lists at the leaves instead:
f <- function(df, k=1) {
if (k == ncol(df)) return(lapply(split(df,df[,k]), function(v) list()))
return(lapply(split(df,df[,k]), function(df) f(df, k+1)))
}
which gives:
> f(df)
$Men
$Men$Accessories
$Men$Accessories$Belts
list()
$Women
$Women$Accessories
$Women$Accessories$Jewelry
list()
$Women$Clothing
$Women$Clothing$Jeans
list()
$Women$Clothing$Sweaters
list()
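The question also asks how to assign into the nested list directly, without going through a data frame. What follows is a minimal sketch of that approach, not taken from the answers above; insert_path is a hypothetical helper name. It walks one character vector at a time, creates a level only if it is missing, and returns the modified list, because R copies lists on modification and the result has to be reassigned at every level.

```r
categories <- list(
  c("Women", "Clothing", "Jeans"),
  c("Women", "Clothing", "Sweaters"),
  c("Men", "Accessories", "Belts"),
  c("Women", "Accessories", "Jewelry")
)

# insert_path() is a hypothetical helper: it adds one path to the tree
# and returns the updated tree.
insert_path <- function(tree, path) {
  if (length(path) == 0) return(tree)          # nothing left to insert
  key <- path[1]
  if (is.null(tree[[key]])) {
    tree[[key]] <- list()                      # create the level only if missing
  }
  # recurse into the child and store the updated child back in the parent
  tree[[key]] <- insert_path(tree[[key]], path[-1])
  tree
}

# Fold every path into an initially empty tree.
tree <- Reduce(insert_path, categories, list())
str(tree)
```

Unlike the split-based answers, this keeps the levels in order of first appearance (Women before Men) rather than sorting them alphabetically.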

Related

How to use the `purrr` package in R instead of for-loop to iterate over indices

I have a list of S4 objects, and I'm trying to iterate a function over these lists where I select an index position, and then from that position extract keywords I'm interested in. I am able to do a for loop and apply the function successfully, but is there a way this could be done using the purrr package? I'm not sure how to replicate the S4 object exactly, so I've included a very high level example just to get an idea of my process.
list_1 <- list("Sample", "test", "test Date")
list_2 <- list("test", "sample", "test Date")
listoflists <- list(list_1, list_2)
I created a list of indices of "Sample":
groupList <- map(listoflists,~which(toupper(.) == "SAMPLE"))
As well as a list of keywords that I'd like to extract:
keywordsList <- list(c("One test", "two test"), c("one test", "two test"))
I have a function that takes the S4 objects, selects the index where "sample" is found, and from that extracts the keywords.
for(i in seq_along(listoflists){
output[[i]] <- some_function(listoflists[[i]], index = groupList[[i]], keywords = keywordsList[[i]]) }
I tried using imap, but it seems like when I do this, the output's sublist only has 1 keyword (say "One test" in first list and "two test" in second list) instead of 3:
output <- listoflists %>% imap(~some_function(.x,index = groupList[[.y]], keywords = keywordsList[[.y]])
You are missing a closing bracket in your for loop, but other than that your code should work. I am going to define a trivial some_function() to demonstrate:
library(purrr)  # provides map() and imap()
some_function <- function(x, index, keywords) {
c(x[[index]], keywords)
}
loop_output <- vector(mode = "list", length = length(listoflists))
for (i in seq_along(listoflists)) {
loop_output[[i]] <- some_function(listoflists[[i]], index = groupList[[i]], keywords = keywordsList[[i]])
}
purr_output <- imap(
listoflists,
~ some_function(
.x,
index = groupList[[.y]],
keywords = keywordsList[[.y]]
)
)
identical(loop_output, purr_output)
# TRUE
If, even with the correct brackets, your example works in a loop but not using imap, I doubt that the use of S4 objects is relevant.
You can be tripped up if you have a named list. From the imap docs:
imap_xxx(x, ...), an indexed map, is short hand for map2(x, names(x), ...) if x has names, or map2(x, seq_along(x), ...) if it does not.
See for example:
listoflists <- list(list_1, list_2)
imap(listoflists, ~.y)
# [[1]]
# [1] 1
# [[2]]
# [1] 2
listoflists <- list(l1 = list_1, l2 = list_2)
imap(listoflists, ~.y)
# $l1
# [1] "l1"
# $l2
# [1] "l2"
Make sure you are looping over the indices rather than the names and the output should be identical.
You could also do this with purrr::pmap(), which maps in parallel over an arbitrary number of lists (passed within a super-list):
output <-
pmap(.l = list(listoflists, index = groupList, keywords = keywordsList),
.f = some_function)
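For comparison, the same parallel iteration can be written in base R with Map(), which avoids the names-versus-indices question entirely because the three lists are matched by position. This is a sketch using the toy data and the trivial some_function() from this thread; it is a swapped-in base R alternative, not the purrr approach itself.

```r
list_1 <- list("Sample", "test", "test Date")
list_2 <- list("test", "sample", "test Date")
listoflists <- list(list_1, list_2)

# Position of "sample" (case-insensitive) in each sublist.
groupList <- lapply(listoflists, function(l) which(toupper(unlist(l)) == "SAMPLE"))
keywordsList <- list(c("One test", "two test"), c("one test", "two test"))

# Trivial stand-in for the real function, as in the answer above.
some_function <- function(x, index, keywords) {
  c(x[[index]], keywords)
}

# Reference result from an explicit loop.
loop_output <- vector(mode = "list", length = length(listoflists))
for (i in seq_along(listoflists)) {
  loop_output[[i]] <- some_function(listoflists[[i]],
                                    index = groupList[[i]],
                                    keywords = keywordsList[[i]])
}

# Map() walks the three lists in parallel, element by element.
map_output <- Map(some_function, listoflists,
                  index = groupList, keywords = keywordsList)
identical(loop_output, map_output)
```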

Tracking parent list in nested list

I have a nested list that I need to iterate over. I want to go to the end of the list (in this case, the second item of the parent list) and add an item to it once there is no further nesting. The list may have many levels of nesting. Right now, I'm only getting the second list as a return value. How do I keep track of the parent list?
a <- list( x = list(1,2,3),y =list(4,5,6))
con=TRUE
while(con){
i <-length(a)
for(k in i:i){
if(!typeof(a[[k]])=="list"){
a[[k+1]] <- "test"
con=FALSE
}else{
a <- a[[k]]
i <- length(a)
}
}
}
Expected result: a <- list(x = list(1,2,3), y = list(4,5,6, "test"))
Actual result: a <- list(4,5,6,"test")
library(magrittr)
a <- list( x = list(1,2,3),y =list(4,5,6), z = 1)
temp <- lapply(a, typeof) %>% unlist
tempList <- (temp!="list")
if (sum(tempList) > 0) {
a[[max(which(tempList == FALSE))]] %<>% append("test")
} else {
a[[length(a)]] %<>% append("test")
}
It isn't clear to me exactly what you want to do, but concentrating on your example, this would work.
In short, the code checks which elements of the parent list are lists and appends "test" to the last of them. If every element is a list, it appends "test" to the last element.
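For arbitrary nesting depth, one way to keep the parent is to let recursion do the bookkeeping: each call modifies its own level and returns it, and the caller stores that result back. A minimal sketch follows, assuming the goal is to descend through the last element for as long as it is a list; append_deepest is a hypothetical name.

```r
# append_deepest() is a hypothetical helper: it follows the last element
# down while that element is a list, appends `value` at the deepest such
# level, and returns the whole (modified) structure.
append_deepest <- function(lst, value) {
  n <- length(lst)
  if (n > 0 && is.list(lst[[n]])) {
    # The last element is itself a list: recurse and store the result back,
    # so the parent is never lost.
    lst[[n]] <- append_deepest(lst[[n]], value)
    lst
  } else {
    # No further nesting: append here.
    c(lst, list(value))
  }
}

a <- list(x = list(1, 2, 3), y = list(4, 5, 6))
result <- append_deepest(a, "test")
str(result)
```

The while loop in the question loses the parent because a <- a[[k]] overwrites the outer list. Returning and reassigning at each level avoids that.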

Trying to get a proper names(list) output

I'm trying to flatten a two-level list of character vectors into a one-level list, using a suffix in the names.
More precisely, I have a list of genes, each containing 6 lists of probes corresponding to 6 bins. The structure looks like this:
feat_indexed_probes_bin$HSPB6$bin1
[1] "cg14513218" "cg22891287" "cg20713852" "cg04719839" "cg27580050" "cg18139462" "cg02956481" "cg26608795" "cg15660498" "cg25654926" "cg04878216"
I'm trying to get a list "bins_indexed_probes" with the following structure:
bins_indexed_probes$HSPB6_bin6 containing the same probes, so I can pass it to my map-reduce function.
I tried many approaches such as melt(), for loops, etc., but I can't figure out how to perform a doubly nested loop (over genes and over bins) and get a list output that is only one level deep.
For the moment, my function for this is the following:
create_map <- function(indexes = feat_indexed_probes_bin, binlist = c("bin1", "bin2", "bin3", "bin4", "bin5", "bin6"), genes = features) {
map <- list()
ret <- lapply(binlist, function(bin) {
lapply(rownames(features), function(gene) {
map[[paste(gene, "_", bin, sep = "")]] <- feat_indexed_probes_bin[[gene]][[bin]]
tmp_names <<- paste(gene, "_", bin, sep = "")
return(map)
})
names(map) <- tmp_names
rm(tmp_names)
})
return(ret)
}
It returns:
[[6]][[374]]
GDF10_bin6
"cg13565300"
[[6]][[375]]
NULL
[[6]][[376]]
[[6]][[376]]$HNF1B_bin6
[1] "cg03433642" "cg09679923" "cg17652435" "cg03348978" "cg02435495" "cg02701059" "cg05110178" "cg11862993" "cg09463047"
[[6]][[377]]
[[6]][[377]]$GPIHBP1_bin6
[1] "cg01953797" "cg00152340"
Instead, I would expect something like:
$GPIHBP1_bin1
"cg...." "cg...."
...
$GPIHBP1_bin6
"someotherprobe"
$someothergene_bin1
"probe" "probe"
...
I hope I'm being clear. Since this is my first time asking a question, I apologise in advance if I didn't follow the Stack Overflow protocol.
Thank you for reading.
Consider a nested lapply with the extract operator, [[, and setNames calls, all wrapped in do.call using c to bind the returned elements together:
bins_indexed_probes <- do.call(c,
lapply(1:6, function(i)
setNames(lapply(feat_indexed_probes_bin, `[[`, i),
paste0(names(feat_indexed_probes_bin), "_bin", i))
)
)
# RE-ORDER ELEMENTS BY NAME
bins_indexed_probes <- bins_indexed_probes[sort(names(bins_indexed_probes))]
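To see the flattening on something small, here is a self-contained sketch of the same do.call(c, ...) pattern on made-up data. The gene names, bin names and probe IDs are invented for illustration, and the bins are indexed by name rather than by position.

```r
# Toy stand-in for feat_indexed_probes_bin (all values are invented).
feat <- list(
  GENE_A = list(bin1 = c("cg01", "cg02"), bin2 = "cg03"),
  GENE_B = list(bin1 = "cg04",            bin2 = c("cg05", "cg06"))
)
bins <- c("bin1", "bin2")

flat <- do.call(c, lapply(bins, function(bin) {
  # For one bin: pull that bin out of every gene, then name each
  # element "<gene>_<bin>".
  setNames(lapply(feat, `[[`, bin), paste0(names(feat), "_", bin))
}))

# Group the bins of each gene together by sorting on the names.
flat <- flat[sort(names(flat))]
str(flat)
```

The result is a one-level named list, so flat$GENE_B_bin2 gives the probes directly.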

R: How to store a list within a list?

I am trying to parse data from tables at baseball-reference.com. I want to do so for multiple teams and multiple years. The code below is used to capture each team season link.
library(XML)
#Will use for loop to fill in the rest of the link
link_base <- "http://www.baseball-reference.com/teams/"
#List of teams
teams <- c("CHC", "STL")
#Year
season <- 2000:2002
#End of link
end_link <- "-schedule-scores.shtml"
links <- list()
for(i in 1:length(teams)){
links[[i]] <- NaN*seq(length(teams))
for(j in 1:length(season)){
links[[i]][j] <- paste0(link_base, teams[i], "/", season[j], end_link)
}
}
This results in:
> links
[[1]]
[1] "http://www.baseball-reference.com/teams/CHC/2000-schedule-scores.shtml"
[2] "http://www.baseball-reference.com/teams/CHC/2001-schedule-scores.shtml"
[3] "http://www.baseball-reference.com/teams/CHC/2002-schedule-scores.shtml"
[[2]]
[1] "http://www.baseball-reference.com/teams/STL/2000-schedule-scores.shtml"
[2] "http://www.baseball-reference.com/teams/STL/2001-schedule-scores.shtml"
[3] "http://www.baseball-reference.com/teams/STL/2002-schedule-scores.shtml"
Now, for each element in the list, I would like to use the readHTMLTable function so that I can parse information. I have tried doing so:
a <- list()
for(i in 1:length(teams)){
a[[i]] <- NaN*seq(length(teams))
for(j in 1:length(season)){
a[[i]][j] <- readHTMLTable(links[[i]][j])
}
}
The readHTMLTable returns a list of length 6:
x <- readHTMLTable(links[[1]][1])
> length(x)
[1] 6
I would like the 1st element of list a to store the output from the readHTMLTable function for the "CHC" links, and the 2nd element of list a to store the output for the "STL" links. Thus, the list a would consist of 2 elements, each of which would consist of 3 lists of 6 elements.
I think this works
lst <- lapply(links, function(l) lapply(l, function(x) readHTMLTable(x)))
length(lst)
# [1] 2
lengths(lst)
# [1] 3 3
The first sublist should hold the CHC tables, the second the STL tables.
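The loop in the question likely fails because a[[i]] is a numeric vector and a[[i]][j] <- ... uses single brackets, which cannot hold a whole list in one slot. A loop version works if each level is pre-allocated as a list and assigned with [[. The sketch below runs offline: fake_read is a hypothetical stand-in for readHTMLTable that simply returns a list of 6 elements.

```r
teams  <- c("CHC", "STL")
season <- 2000:2002
link_base <- "http://www.baseball-reference.com/teams/"
end_link  <- "-schedule-scores.shtml"

# Hypothetical stand-in for readHTMLTable(): returns a list of 6 items
# so the example does not need network access.
fake_read <- function(url) {
  setNames(as.list(seq_len(6)), paste0("table", seq_len(6)))
}

a <- vector("list", length(teams))             # outer container is a list
names(a) <- teams
for (i in seq_along(teams)) {
  a[[i]] <- vector("list", length(season))     # inner container is a list too
  for (j in seq_along(season)) {
    url <- paste0(link_base, teams[i], "/", season[j], end_link)
    a[[i]][[j]] <- fake_read(url)              # [[ stores the whole list in one slot
  }
}

length(a)
lengths(a)
```

Swapping fake_read for readHTMLTable should give the structure described in the question: 2 teams, 3 seasons each, 6 tables per season.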

R: get element by name from a nested list

I have a nested list like so:
smth <- list()
smth$a <- list(a1=1, a2=2, a3=3)
smth$b <- list(b1=4, b2=5, b3=6)
smth$c <- "C"
The names of every element in the list are unique.
I would like to get an element from such a list merely by name without knowing where it is located.
Example:
getByName(smth, "c") = "C"
getByName(smth, "b2") = 5
Also I don't really want to use unlist since the real list has a lot of heavy elements in it.
The best solution so far is the following:
rmatch <- function(x, name) {
pos <- match(name, names(x))
if (!is.na(pos)) return(x[[pos]])
for (el in x) {
if (inherits(el, "list")) {  # safer than class(el) == "list" when an element has several classes
out <- Recall(el, name)
if (!is.null(out)) return(out)
}
}
}
rmatch(smth, "a1")
[1] 1
rmatch(smth, "b3")
[1] 6
Full credit goes to @akrun for finding it and mbedward for posting it here.

Resources