Counting split rules in decision trees in R

I'm trying to count each unique split rule from a data frame of decision trees in R. For example, if I have a data frame containing 4 trees like the one shown below:
df <- data.frame(
var = c('x10', NA, NA,
'x10', NA, 'x7', NA, NA,
'x5', 'x2', NA, NA, 'x9', NA, NA,
'x5', NA, NA),
num = c(1,1,1,
2,2,2,2,2,
1,1,1,1,1,1,1,
2,2,2),
iter = c(rep(1, 8), rep(2, 10))
)
> df
var num iter
1 x10 1 1
2 <NA> 1 1
3 <NA> 1 1
4 x10 2 1
5 <NA> 2 1
6 x7 2 1
7 <NA> 2 1
8 <NA> 2 1
9 x5 1 2
10 x2 1 2
11 <NA> 1 2
12 <NA> 1 2
13 x9 1 2
14 <NA> 1 2
15 <NA> 1 2
16 x5 2 2
17 <NA> 2 2
18 <NA> 2 2
The var column contains the variable name used in the splitting rule and is ordered by depth first. So, for example, the 4 trees created from that data would look like this:
I'm trying to find a way to return the count of each pair of variables used in a split rule, grouped by iter. For example, if we look at the 2nd tree (i.e., num == 2, iter == 1), we can see that x7 splits on x10. So the pair x10:x7 appears 1 time when iter == 1.
My desired output would look something like this:
allSplits count iter
1 x10:x7 1 1
2 x5:x2 1 2
3 x5:x9 1 2
Any suggestions as to how I could do this?

There is probably a package that knows how to operate on this kind of data frame, but maybe these two hand-crafted recursive functions can get you started.
mkTree <- function(x, pos = 1L) {
var <- x[pos]
if (is.na(var)) {
list(NA_character_, NULL, NULL, 1L)
} else {
node <- vector("list", 4L)
node[[1L]] <- var
node[[2L]] <- l <- Recall(x, pos + 1L)
node[[3L]] <- r <- Recall(x, pos + 1L + l[[4L]])
node[[4L]] <- 1L + l[[4L]] + r[[4L]]
node
}
}
tabTree <- function(tree, sep = ":") {
x <- rep.int(NA_character_, tree[[4L]])
pos <- 1L
recurse <- function(subtree) {
var1 <- subtree[[1L]]
if (!is.na(var1)) {
for (i in 2:3) {
var2 <- subtree[[c(i, 1L)]]
if (!is.na(var2)) {
x[pos] <<- paste0(var1, sep, var2)
pos <<- pos + 1L
Recall(subtree[[i]])
}
}
}
}
recurse(tree)
x <- x[!is.na(x)]
if (length(x)) {
x <- factor(x)
setNames(tabulate(x), levels(x))
} else {
integer(0L)
}
}
mkTree transforms into recursive lists the segments of var in your data frame that specify a tree. Nodes in these recursive structures have the form:
list(variable_name, left_node, right_node, subtree_size)
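To make the node layout concrete, here is a hand-built version of the second tree from the question (x10, NA, x7, NA, NA) in the same four-slot form. This is only an illustrative sketch: the names leaf and tree are made up here and are not produced by the code above.

```r
# A leaf: no split variable, no children, subtree size 1
leaf <- list(NA_character_, NULL, NULL, 1L)

# Root splits on x10; left child is a leaf, right child splits on x7
tree <- list("x10", leaf, list("x7", leaf, leaf, 3L), 5L)

tree[[1L]]         # split variable at the root
tree[[c(3L, 1L)]]  # split variable at the right child (recursive indexing)
tree[[4L]]         # number of nodes in the whole tree
```

The recursive index c(3L, 1L) is the same kind of access that tabTree uses with subtree[[c(i, 1L)]] to read a child's variable name.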
tabTree takes the mkTree result and returns a named integer vector tabulating the splits. So you could do
f <- function(x) tabTree(mkTree(x))
L <- tapply(df[["var"]], df[c("num", "iter")], f, simplify = FALSE)
to get a list matrix storing the tabulated splits for each [num, iter] pair (i.e., for each tree).
L
## iter
## num 1 2
## 1 integer,0 integer,2
## 2 1 integer,0
L[2L, 1L]
## [[1]]
## x10:x7
## 1
L[1L, 2L]
## [[1]]
## x5:x2 x5:x9
## 1 1
And you could sum over num to get tabulated splits for each level of iter.
g <- function(l) {
x <- unlist(unname(l))
tapply(x, names(x), sum)
}
apply(L, 2L, g)
## $`1`
## x10:x7
## 1
## $`2`
## x5:x2 x5:x9
## 1 1
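If the three-column layout from the question is needed, the per-iter list can be flattened into a data frame. The sketch below assumes a list shaped like the printed result of apply(L, 2L, g); the object res is a hypothetical stand-in typed in by hand so that the snippet runs on its own.

```r
# Hypothetical stand-in for the result of apply(L, 2L, g)
res <- list(`1` = c("x10:x7" = 1L),
            `2` = c("x5:x2" = 1L, "x5:x9" = 1L))

out <- data.frame(
  allSplits = unlist(lapply(res, names), use.names = FALSE),  # split labels
  count     = unlist(res, use.names = FALSE),                 # how often each occurs
  iter      = rep(as.integer(names(res)), lengths(res))       # which iter it came from
)
out
```

With the real result, replacing the first assignment by res <- apply(L, 2L, g) should work the same way, since names(), unlist() and lengths() also handle the one-dimensional arrays that tapply returns.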

Related

How to make a For loop that keeps the original row value

I am trying to run multiple conditional statements in a loop. My first conditional is an if, else if with 3 conditions (4 technically if nothing matches). My second really only needs one condition, and I want to keep the original row value if it doesn't meet that condition. The problem is my output doesn't match the row numbers, and I'm not sure how to output only to a specific row in a loop.
I want to loop over each column, and within each column use sapply to check each value: a value outside range1 is marked 4, a value inside range1 is marked 1, NA is marked 9, and anything else is marked -999. A narrower range2 is then applied: if a value falls outside range2 it is marked 3, otherwise the existing mark is not updated.
My partially working code, and a reproducible example is below. My input and first loop is:
df <- structure(list(A = c(-2, 3, 5, 10, NA), A.c = c(NA, NA, NA, NA, NA), B = c(2.2, -55, 3, NA, 99), B.c = c(NA, NA, NA, NA, NA)), class = "data.frame", row.names = c(NA, -5L))
> df
A A.c B B.c
1 -2 NA 2.2 NA
2 3 NA -55.0 NA
3 5 NA 3.0 NA
4 10 NA NA NA
5 NA NA 99.0 NA
min1 <- 0
max1 <- 8
test1.func <- function(x) {
val <- if (!is.na(x) & is.numeric(x) & (x < min1 | x > max1)){
num = 4
} else if (!is.na(x) & is.numeric(x) & x >= min1 & x <= max1){
num = 1
} else if (is.na(x)){# TODO it would be better to make this just what is already present in the row
num = 9
} else {
num = -999
}
val
}
Test1 <- function(x) {
i <- NA
for(i in seq(from = 1, to = ncol(x), by = 2)){
x[, i + 1] <- sapply(x[[i]], test1.func)
}
x
}
df_result <- Test1(df)
> df_result
A A.c B B.c
1 -2 4 2.2 1
2 3 1 -55.0 4
3 5 1 3.0 1
4 10 4 NA 9
5 NA 9 99.0 4
The next loop and conditional (any existing values of 4 or 9 would remain):
min2 <- 3
max2 <- 5
test2.func <- function(x) {
val <- if (!is.na(x) & is.numeric(x) & (x < min2 | x > max2)){
num = 3
}
val
}
Test2 <- function(x) {
i <- NA
for(i in seq(from = 1, to = ncol(x), by = 2)){
x[, i + 1] <- sapply(x[[i]], test2.func)
}
x
}
df_result2 <- Test2(df_result)
# Only 2.2 matches, if working correctly would output
> df_result2
A A.c B B.c
1 -2 4 2.2 3
2 3 1 -55.0 4
3 5 1 3.0 1
4 10 4 NA 9
5 NA 9 99.0 4
The current code gives a warning, since there is only one match:
Warning messages:
1: In `[<-.data.frame`(`*tmp*`, , i + 1, value = list(3, NULL, NULL, :
provided 5 variables to replace 1 variables

Some thoughts.
for loops are not necessary, it is better to capitalize on R's vectorized operations;
it appears that your values of 4 and 3 are really something like "outside band 1" and "outside band 2", in which case this can be resolved in one function.
Testing for == "NA" is a bit off ... if one of the values in a column is a string "NA" (and not R's NA value), then all values in that column are strings and you have other problems. Because of this, I don't explicitly check for is.numeric, though it is not hard to work back in.
Try this:
func <- function(x, range1, range2) {
ifelse(is.na(x), 9L,
ifelse(x < range1[1] | x > range1[2], 4L,
ifelse(x < range2[1] | x > range2[2], 3L,
1L)))
}
df[,c("A.c", "B.c")] <- lapply(df[,c("A", "B")], func, c(0, 8), c(3, 5))
df
# A A.c B B.c
# 1 -2 4 2.2 3
# 2 3 1 -55.0 4
# 3 5 1 3.0 1
# 4 10 4 NA 9
# 5 NA 9 99.0 4
One problem I have with this is that it uses three nested ifelse calls. While this works fine, it can be difficult to trace and troubleshoot (and ifelse has problems of its own). If you have other conditions to incorporate, it might be nice to use dplyr::case_when.
func2 <- function(x, range1, range2) {
dplyr::case_when(
is.na(x) ~ 9L,
x < range1[1] | x > range1[2] ~ 4L,
x < range2[1] | x > range2[2] ~ 3L,
TRUE ~ 1L
)
}
I find this second method much easier to read, though it does have the added dependency of dplyr (which, while it definitely has advantages and strengths, includes an army of other dependencies). If you are already using any of the tidyverse packages in your workflow, though, this is likely the better solution.
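For the narrower "keep the original row value" part of the question, a vectorized second pass is one option. The warning in the question most likely arises because test2.func returns NULL whenever its condition is false, so sapply cannot simplify the result and hands back a list. The sketch below is an assumption about the intended rule (mark 3 only where the value is outside range2 and the existing flag is 1); second_pass and its argument names are made up for illustration.

```r
# Update flags to 3 where the value lies outside [min2, max2] and the
# existing flag is 1; every other flag (4, 9, ...) is returned unchanged.
second_pass <- function(value, flag, min2 = 3, max2 = 5) {
  outside <- !is.na(value) & (value < min2 | value > max2)
  ifelse(outside & flag == 1, 3, flag)
}

B   <- c(2.2, -55, 3, NA, 99)
B.c <- c(1, 4, 1, 9, 4)
second_pass(B, B.c)
```

Because ifelse always returns one value per element, the result has the same length as the input column and can be assigned straight back into the data frame.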

How to get the largest possible column sequence with the least possible row NAs from a huge matrix?

I want to select columns from a data frame so that the resulting continuous column-sequences are as long as possible, while the number of rows with NAs is as small as possible, because they have to be dropped afterwards.
(The reason I want to do this is, that I want to run TraMineR::seqsubm() to automatically get a matrix of transition costs (by transition probability) and later run cluster::agnes() on it. TraMineR::seqsubm() doesn't like NA states and cluster::agnes() with NA states in the matrix doesn't necessarily make much sense.)
For that purpose I already wrote a working function that, in principle, builds every possible subset (the power set) of rows and columns and checks each one for NAs. It works well with this toy data d, which represents a 10x5 matrix:
> d
id X1 X2 X3 X4 X5
1 A 1 11 21 31 41
2 B 2 12 22 32 42
3 C 3 13 23 33 NA
4 D 4 14 24 34 NA
5 E 5 15 25 NA NA
6 F 6 16 26 NA NA
7 G 7 17 NA NA NA
8 H 8 18 NA NA NA
9 I 9 NA NA NA NA
10 J 10 NA NA NA NA
11 K NA NA NA NA NA
The problem now is that I actually want to apply the algorithm to survey data that would represent a 34235 x 17 matrix!
My code has been reviewed on Code Review, but still cannot be applied to the real data.
I am aware that with this approach there would be a huge calculation. (Presumably too huge for non-supercomputers?!)
Does anyone know a more suitable approach?
Here is the already enhanced function by @minem from Code Review:
seqRank2 <- function(d, id = "id") {
require(matrixStats)
# change structure, convert to matrix
ii <- as.character(d[, id])
dm <- d
dm[[id]] <- NULL
dm <- as.matrix(dm)
rownames(dm) <- ii
your.powerset = function(s){
l = vector(mode = "list", length = 2^length(s))
l[[1]] = numeric()
counter = 1L
for (x in 1L:length(s)) {
for (subset in 1L:counter) {
counter = counter + 1L
l[[counter]] = c(l[[subset]], s[x])
}
}
return(l[-1])
}
psr <- your.powerset(ii)
psc <- your.powerset(colnames(dm))
sss <- lapply(psr, function(x) {
i <- ii %in% x
lapply(psc, function(y) dm[i, y, drop = F])
})
cn <- sapply(sss, function(x)
lapply(x, function(y) {
if (ncol(y) == 1) {
if (any(is.na(y))) return(NULL)
return(y)
}
isna2 <- matrixStats::colAnyNAs(y)
if (all(isna2)) return(NULL)
if (sum(isna2) == 0) return(NA)
r <- y[, !isna2, drop = F]
return(r)
}))
scr <- sapply(cn, nrow)
scc <- sapply(cn, ncol)
namesCN <- sapply(cn, function(x) paste0(colnames(x), collapse = ", "))
names(scr) <- namesCN
scr <- unlist(scr)
names(scc) <- namesCN
scc <- unlist(scc)
m <- t(rbind(n.obs = scr, sq.len = scc))
ag <- aggregate(m, by = list(sequence = rownames(m)), max)
ag <- ag[order(-ag$sq.len, -ag$n.obs), ]
rownames(ag) <- NULL
return(ag)
}
Yielding:
> seqRank2(d)
sequence n.obs sq.len
1 X1, X2, X3, X4 4 4
2 X1, X2, X3 6 3
3 X1, X2, X4 4 3
4 X1, X3, X4 4 3
5 X2, X3, X4 4 3
6 X1, X2 8 2
7 X1, X3 6 2
8 X2, X3 6 2
9 X1, X4 4 2
10 X2, X4 4 2
11 X3, X4 4 2
12 X1 10 1
13 X2 8 1
14 X3 6 1
15 X4 4 1
16 X5 2 1
> system.time(x <- seqRank2(d))
user system elapsed
1.93 0.14 2.93
In this case I would choose "X1, X2, X3, X4", "X1, X2, X3" or "X2, X3, X4", because they're continuous and yield an appropriate number of observations.
Expected output:
So for toy data d the expected output would be something like:
> seqRank2(d)
sequence n.obs sq.len
1 X1, X2, X3, X4 4 4
2 X1, X2, X3 6 3
3 X2, X3, X4 4 3
4 X1, X2 8 2
5 X2, X3 6 2
6 X3, X4 4 2
7 X1 10 1
8 X2 8 1
9 X3 6 1
10 X4 4 1
11 X5 2 1
And at the end the function should run properly on the huge matrix d.huge which leads to errors at the moment:
> seqRank2(d.huge)
Error in vector(mode = "list", length = 2^length(s)) :
vector size cannot be infinite
Toy data d:
d <- structure(list(id = structure(1:11, .Label = c("A", "B", "C",
"D", "E", "F", "G", "H", "I", "J", "K"), class = "factor"), X1 = c(1L,
2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, NA), X2 = c(11L, 12L, 13L,
14L, 15L, 16L, 17L, 18L, NA, NA, NA), X3 = c(21L, 22L, 23L, 24L,
25L, 26L, NA, NA, NA, NA, NA), X4 = c(31L, 32L, 33L, 34L, NA,
NA, NA, NA, NA, NA, NA), X5 = c(41L, 42L, NA, NA, NA, NA, NA,
NA, NA, NA, NA)), row.names = c(NA, -11L), class = "data.frame")
Toy data d.huge:
d.huge <- setNames(data.frame(matrix(1:15.3e5, 3e4, 51)),
c("id", paste0("X", 1:50)))
d.huge[, 41:51] <- lapply(d.huge[, 41:51], function(x){
x[which(x %in% sample(x, .05*length(x)))] <- NA
x
})
Appendix (see comments latest answer):
d.huge <- read.csv("d.huge.csv")
d.huge.1 <- d.huge[sample(nrow(d.huge), 3/4*nrow(d.huge)), ]
d1 <- seqRank3(d.huge.1, 1.27e-1, 1.780e1)
d2 <- d1[complete.cases(d1), ]
dim(d2)
names(d2)

This takes less than one second on the huge data:
l1 = combn(2:length(d), 2, function(x) d[x[1]:x[2]], simplify = FALSE)
# If you also need "combinations" of only single columns, then uncomment the next line
# l1 = c(d[-1], l1)
l2 = sapply(l1, function(x) sum(complete.cases(x)))
score = sapply(1:length(l1), function(i) NCOL(l1[[i]]) * l2[i])
best_score = which.max(score)
best = l1[[best_score]]
The question was unclear about how to rank the various combinations. We can use different scoring formulae to generate different preferences. For example, to weight number of rows versus columns separately we can do
col_weight = 2
row_weight = 1
score = sapply(1:length(l1), function(i) col_weight*NCOL(l1[[i]]) + row_weight * l2[i])
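To see what the combn call produces, here is the same idea on a tiny made-up data frame, laid out like the n.obs / sq.len table from the question. The data and the names ranges, n_obs and tab are hypothetical; only the column-range trick comes from the code above.

```r
d <- data.frame(id = letters[1:4],
                X1 = c(1, 2, 3, NA),
                X2 = c(1, 2, NA, NA),
                X3 = c(1, NA, NA, NA))

# Every pair (first, last) of column positions defines one contiguous range
ranges <- combn(2:length(d), 2, simplify = FALSE)

# Rows without any NA inside each range
n_obs <- sapply(ranges, function(r) sum(complete.cases(d[r[1]:r[2]])))

tab <- data.frame(
  sequence = sapply(ranges, function(r) paste(names(d)[r[1]:r[2]], collapse = ", ")),
  n.obs    = n_obs,
  sq.len   = sapply(ranges, function(r) r[2] - r[1] + 1L)
)
tab[order(-tab$sq.len, -tab$n.obs), ]
```

Only contiguous ranges are generated, so the number of candidates grows quadratically with the number of columns rather than exponentially.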

Convert to matrix and calculate NA counts for each column:
dm <- is.na(d[, -1])
na_counts <- colSums(dm)
x <- data.frame(na_counts = na_counts, non_na_count = nrow(dm) - na_counts)
x <- as.matrix(x)
# create all combinations for column indexes:
nx <- 1:nrow(x)
psr <- do.call(c, lapply(seq_along(nx), combn, x = nx, simplify = FALSE))
# test if continuous:
good <- sapply(psr, function(y) !any(diff(sort.int(y)) != 1L))
psr <- psr[good == T] # remove non continuous
# for each combo count nas and non NA:
s <- sapply(psr, function(y) colSums(x[y, , drop = F]))
# put all together in table:
res <- data.frame(var_count = lengths(psr), t(s))
res$var_indexes <- sapply(psr, paste, collapse = ',')
res
# var_count na_counts non_na_count var_indexes
# 1 1 1 10 1
# 2 1 3 8 2
# 3 1 5 6 3
# 4 1 7 4 4
# 5 1 9 2 5
# 6 2 4 18 1,2
# 7 2 8 14 2,3
# 8 2 12 10 3,4
# 9 2 16 6 4,5
# 10 3 9 24 1,2,3
# 11 3 15 18 2,3,4
# 12 3 21 12 3,4,5
# 13 4 16 28 1,2,3,4
# 14 4 24 20 2,3,4,5
# 15 5 25 30 1,2,3,4,5
# choose
As var indexes are sorted, for speed we can use simply:
good <- sapply(psr, function(y) !any(diff(y) != 1L))
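Since only continuous runs survive the filter, they can also be generated directly instead of building all 2^n - 1 combinations first, which may matter once the number of columns grows. A minimal sketch, assuming n columns indexed 1..n as above; the name idx is made up, and psr is reused only to mirror the code above.

```r
n <- 5L  # number of candidate columns, as in the toy data

# All (start, end) pairs with start <= end describe one continuous run each
idx <- expand.grid(start = seq_len(n), end = seq_len(n))
idx <- idx[idx$start <= idx$end, ]

# Expand each pair into the full vector of column indexes
psr <- Map(seq, idx$start, idx$end)

length(psr)  # n * (n + 1) / 2 runs
```

The resulting list can be passed to the same sapply(psr, ...) steps as before; the order of the runs differs from the combn-based version, so sort the final table if the order matters.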

Just to clarify, the seqsubm function from TraMineR has no problem at all with NAs, nor with sequences of different length. However, the function expects a state sequence object (to be created with seqdef) as input.
The function seqsubm is for computing substitution costs (i.e. dissimilarities) between states by means of different methods. You probably refer to the method ('TRATE') that derives the costs from the observed transition probabilities, namely as 2-p(i|j) - p(j|i), where p(i|j) is the probability to be in state i in t when we were in state j in t-1. So, all we need are the transition probabilities, which can easily be estimated from a set of sequences of different length or with gaps within them.
I illustrate below using the ex1 data that ships with TraMineR. (Due to the high number of different states in your toy example, the resulting matrix of substitution costs would be too large (28 x 28) for this illustration.)
library(TraMineR)
data(ex1)
sum(is.na(ex1))
# [1] 38
sq <- seqdef(ex1[1:13])
sq
# Sequence
# s1 *-*-*-A-A-A-A-A-A-A-A-A-A
# s2 D-D-D-B-B-B-B-B-B-B
# s3 *-D-D-D-D-D-D-D-D-D-D
# s4 A-A-*-*-B-B-B-B-D-D
# s5 A-*-A-A-A-A-*-A-A-A
# s6 *-*-*-C-C-C-C-C-C-C
# s7 *-*-*-*-*-*-*-*-*-*-*-*-*
sm <- seqsubm(sq, method='TRATE')
round(sm,digits=3)
# A-> B-> C-> D->
# A-> 0 2.000 2 2.000
# B-> 2 0.000 2 1.823
# C-> 2 2.000 0 2.000
# D-> 2 1.823 2 0.000
Now, it is not clear to me what you want to do with the state dissimilarities. Inputting them in a clustering algorithm, you would cluster the states. If you want to cluster the sequences, then you should first compute dissimilarities between sequences (using seqdist and possibly passing the matrix of substitution costs returned by seqsubm as sm argument) and then input the resulting distance matrix in the clustering algorithm.
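The cost formula 2 - p(i|j) - p(j|i) mentioned above can be worked through by hand in base R. The transition rates below are invented for illustration and do not come from ex1; the sketch only mirrors the arithmetic, not TraMineR's internals.

```r
# Hypothetical transition rates: rows = state at t-1, columns = state at t
tr <- matrix(c(0.9, 0.1,
               0.2, 0.8),
             nrow = 2, byrow = TRUE,
             dimnames = list(c("A", "B"), c("A", "B")))

# Substitution cost between two different states: 2 - p(i|j) - p(j|i)
sm <- 2 - tr - t(tr)
diag(sm) <- 0  # substituting a state by itself costs nothing
sm
```

Here the cost between A and B is 2 - 0.1 - 0.2 = 1.7: the more often the two states follow each other, the cheaper the substitution.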

Double for loop with NA in R

I have a couple of questions about my R script. I have a data set with many series containing NA and numeric values. In each series, I would like to replace every NA that comes after the first numeric value with 0, but keep the NAs that come before the series has started.
As shown below, in the second column for example, I would like to keep the first 2 NAs but replace the fourth value with 0.
example
There is my script, but it doesn't work
my actual script
It would be very kind to have some suggestions
Many thanks
ER

In case you, or anyone else, want to avoid for loops:
# example dataset
df = data.frame(x1 = c(23,NA,NA,35),
x2 = c(NA,NA,45,NA),
x3 = c(4,34,NA,5))
# function to replace NAs not in the beginning of vector with 0
f = function(x) { x[is.na(x) & cumsum(!is.na(x)) != 0] = 0; x }
# apply function and save as dataframe
data.frame(sapply(df, f))
# x1 x2 x3
# 1 23 NA 4
# 2 0 NA 34
# 3 0 45 0
# 4 35 0 5
Or using tidyverse and the same function f:
library(tidyverse)
df %>% map_df(f)
# # A tibble: 4 x 3
# x1 x2 x3
# <dbl> <dbl> <dbl>
# 1 23. NA 4.
# 2 0. NA 34.
# 3 0. 45. 0.
# 4 35. 0. 5.
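The one-liner f is compact, so here is the same logic unrolled on a single column. The intermediate name started is introduced only for this illustration.

```r
x <- c(NA, NA, 45, NA)

# cumsum(!is.na(x)) counts the non-NA values seen so far: 0 0 1 1
# It stays 0 until the series has started
started <- cumsum(!is.na(x)) != 0

# Replace only the NAs that come after the first real value
x[is.na(x) & started] <- 0
x
```

Leading NAs are left alone because started is still FALSE for them.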

if this is your dataset:
ORIGINAL_DATA <- data.frame(X1 = c(23, NA, NA, 35),
X2 = c(NA, NA, 45, NA),
X3 = c(4, 34, NA, 5))
This could probably work:
for(i in 1:ncol(ORIGINAL_DATA)) {
for (j in 1:nrow(ORIGINAL_DATA)) {
if(!is.na(ORIGINAL_DATA[j, i])) {
ORIGINAL_DATA[c(j:nrow(ORIGINAL_DATA)), i] <- ifelse(is.na(ORIGINAL_DATA[c(j:nrow(ORIGINAL_DATA)), i]), 0, ORIGINAL_DATA[c(j:nrow(ORIGINAL_DATA)), i])
# Stop scanning this column once the first non-NA value has been handled
break
}
}
}

How to add columns to data.frame based on vector length

I have a function runBootstrap whose output result is a vector of variable length (depending on # of values for cat, which itself is a product of test). Apologies that this isn't "minimal".
require(dplyr)
test <- function(combo) {
if(combo[1] == 4) {
cat <- 4
} else if((combo[1] == 3 & combo[2] == 2) | (combo[1] == 2 & combo[2] == 2)) {
cat <- 3
} else if((combo[1] == 2 & combo[2] == 1) | (combo[1] == 1 & combo[2] == 2)) {
cat <- 2
} else {
cat <- 1
}
}
arg1.freqs <- c(0.5, 0.2, 0.1, 0.1)
arg2.freqs <- c(0.8, 0.2)
runBootstrap <- function(arg1.freqs, arg2.freqs) {
sim.df <- data.frame(x1 = 1:10000, y1 = NA)
sim.df$x1 <- sample(1:4, 10000, replace = TRUE,
prob = arg1.freqs)
sim.df$y1 <- sample(1:2, 10000, replace = TRUE,
prob = arg2.freqs)
sim.df$cat <- NA
for(i in 1:nrow(sim.df)) {
combo <- c(sim.df[i, 1], sim.df[i, 2])
sim.df$cat[i] <- test(combo)
}
sim.df <- sim.df %>%
select(cat) %>%
group_by(cat) %>%
summarise(n = n()) %>%
mutate(freq = n / sum(n))
sim.df <- as.data.frame(sim.df)
result <- c(sim.df[1, 3], sim.df[2, 3])
}
In this current version there are only two values for cat so result is a vector of length 2; in a future version I will adjust code so that length(result) will equal # values of cat.
When using the function in a for loop, I would like to use the vector values to create new columns in an already existing data.frame df1. The code I've tried thus far is as follows:
df1$result <- NA
for (i in 1:nrow(df1)) {
df1$result[i] <- runBootstrap(arg1.freqs, arg2.freqs)
}
This clearly doesn't work unless the result vector is length = 1. But I don't know the length of the vector until the function runs (although once it runs it will be same length each iteration).
What I would like to achieve is the following:
Example 1: if length(result) == 2
df1.col x1 x2
1 1 1 1
2 2 2 2
3 3 3 3
4 4 4 4
5 5 5 5
6 6 6 6
Example 2: if length(result) == 3
df1.col x1 x2 x3
1 1 1 1 1
2 2 2 2 2
3 3 3 3 3
4 4 4 4 4
5 5 5 5 5
6 6 6 6 6
Thanks for any advice or direction.
edited for clarification
UPDATE - edited with solution
I got it to work as I wanted by creating a blank list, populating, then using rbind as follows:
appendResults <- function(df1, arg1, arg2) {
my.list <- vector("list", nrow(df1))
for (i in 1:nrow(df1)) {
arg1.freqs <- as.numeric(arg1[i, 3:6])
arg2.freqs <- as.numeric(arg2[i, 3:4])
my.list[[i]] <- runBootstrap(arg1.freqs, arg2.freqs)
}
result.df <- do.call(rbind, my.list)
df2 <- do.call(cbind, list(df1, result.df))
}
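The rbind/cbind pattern from the update can be tried without running the bootstrap. In the sketch below, rep(i, 2) is a hypothetical stand-in for the vector returned by runBootstrap, and the x1, x2 column names are an assumption based on the desired output shown above.

```r
df1 <- data.frame(df1.col = 1:3)

# One result vector per row of df1 (stand-in for runBootstrap output)
res_list <- lapply(seq_len(nrow(df1)), function(i) rep(i, 2))

# Stack the vectors into a matrix: one row per iteration
res_mat <- do.call(rbind, res_list)
colnames(res_mat) <- paste0("x", seq_len(ncol(res_mat)))

# Attach the new columns, however many there are
df2 <- cbind(df1, res_mat)
df2
```

Because the column names are derived from ncol(res_mat), the same code works unchanged when the result vector has length 3 or more.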

Check this one, not sure what the result looks like, but this creates empty columns, equal to the length of results, with NAs:
# fake data frame
df1 <- data.frame(x = c(1,2,3), y = c("a", "b", "c"))
# say result has length 3
res <- c(5,6,7)
# make columns with names x1, ..., x + length of res
# and assign NA values to those column
df1[ , paste("x", 1:length(res), sep = "")] <- NA

R - Collapse into vector same member of a list

I have a list with same structure for every member as the following
config <- NULL
config[["secA"]] <- NULL
config[["secA"]]$VAL <- 0
config[["secA"]]$ARR <- c(1,2,3,4,5)
config[["secA"]]$DF <- data.frame(matrix(c(1,5,3,8),2,2))
config[["secB"]] <- NULL
config[["secB"]]$VAL <- 1
config[["secB"]]$ARR <- c(1,3,2,4,9)
config[["secB"]]$DF <- data.frame(matrix(c(2,6,1,9),2,2))
config[["secC"]] <- NULL
config[["secC"]]$VAL <- 5
config[["secC"]]$ARR <- c(4,2,1,5,8)
config[["secC"]]$DF <- data.frame(matrix(c(4,2,1,7),2,2))
and I need to obtain 3 vectors VAL, ARR and DF, each with the concatenated elements of the corresponding member. such as
# VAL: 0,1,5
# ARR: 1,2,3,4,5,1,3,2,4,9,4,2,1,5,8
# DF: 1,5,3,8,2,6,1,9,4,2,1,7
Looking at similar situations, I have the feeling I need to use a combination of do.call and cbind or lapply but I have no clue. any suggestions?

config <- NULL
config[["secA"]] <- NULL
config[["secA"]]$VAL <- 0
config[["secA"]]$ARR <- c(1,2,3,4,5)
config[["secA"]]$DF <- data.frame(matrix(c(1,5,3,8),2,2))
config[["secB"]] <- NULL
config[["secB"]]$VAL <- 1
config[["secB"]]$ARR <- c(1,3,2,4,9)
config[["secB"]]$DF <- data.frame(matrix(c(2,6,1,9),2,2))
config[["secC"]] <- NULL
config[["secC"]]$VAL <- 5
config[["secC"]]$ARR <- c(4,2,1,5,8)
config[["secC"]]$DF <- data.frame(matrix(c(4,2,1,7),2,2))
sapply(names(config[[1]]), function(x)
unname(unlist(sapply(config, `[`, x))), USE.NAMES = TRUE)
# $VAL
# [1] 0 1 5
#
# $ARR
# [1] 1 2 3 4 5 1 3 2 4 9 4 2 1 5 8
#
# $DF
# [1] 1 5 3 8 2 6 1 9 4 2 1 7
Or you can use this clist function
Unfortunately there were no other answers.
(l <- Reduce(clist, config))
# $VAL
# [1] 0 1 5
#
# $ARR
# [1] 1 2 3 4 5 1 3 2 4 9 4 2 1 5 8
#
# $DF
# X1 X2 X1 X2 X1 X2
# 1 1 3 2 1 4 1
# 2 5 8 6 9 2 7
It merges data frames and matrices, so you need to unlist to get the vector you want
l$DF <- unname(unlist(l$DF))
l
# $VAL
# [1] 0 1 5
#
# $ARR
# [1] 1 2 3 4 5 1 3 2 4 9 4 2 1 5 8
#
# $DF
# [1] 1 5 3 8 2 6 1 9 4 2 1 7
Function
clist <- function (x, y) {
islist <- function(x) inherits(x, 'list')
'%||%' <- function(a, b) if (!is.null(a)) a else b
get_fun <- function(x, y)
switch(class(x %||% y)[1L],  # use the first class only; a matrix has class c("matrix", "array") in R >= 4.0
matrix = cbind,
data.frame = function(x, y)
do.call('cbind.data.frame', Filter(Negate(is.null), list(x, y))),
factor = function(...) unlist(list(...)), c)
stopifnot(islist(x), islist(y))
nn <- names(rapply(c(x, y), names, how = 'list'))
if (is.null(nn) || any(!nzchar(nn)))
stop('All non-NULL list elements should have unique names', domain = NA)
nn <- unique(c(names(x), names(y)))
z <- setNames(vector('list', length(nn)), nn)
for (ii in nn)
z[[ii]] <- if (islist(x[[ii]]) && islist(y[[ii]]))
Recall(x[[ii]], y[[ii]]) else
(get_fun(x[[ii]], y[[ii]]))(x[[ii]], y[[ii]])
z
}

Another approach, with slightly less code.
un_config <- unlist(config)
un_configNAM <- names(un_config)
vecNAM <- c("VAL", "ARR", "DF")
for(n in vecNAM){
assign(n, un_config[grepl(n, un_configNAM)])
}
This will return 3 vectors as the OP requested. However, it is generally more advantageous to store results in a list, as rawr suggests. You can of course adapt the above code so that the results are stored in a list.
l <- rep(list(NA), length(vecNAM))
i = 1
for(n in vecNAM){
l[[i]] <- un_config[grepl(n, un_configNAM)]
i = i +1
}
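A loop-free way to build the same named list is to extract each member by its exact name rather than by pattern matching on the flattened names, which avoids accidental matches if one member name is contained in another. This is a sketch on a shortened copy of the question's config; res is a made-up name.

```r
config <- list(
  secA = list(VAL = 0, ARR = c(1, 2, 3, 4, 5), DF = data.frame(matrix(c(1, 5, 3, 8), 2, 2))),
  secB = list(VAL = 1, ARR = c(1, 3, 2, 4, 9), DF = data.frame(matrix(c(2, 6, 1, 9), 2, 2)))
)

# For each member name, pull that member from every section and flatten it
res <- lapply(setNames(nm = names(config[[1]])), function(k) {
  unlist(lapply(config, `[[`, k), use.names = FALSE)
})
res
```

setNames(nm = ...) makes lapply return a list that is already named VAL, ARR and DF.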
