I have a list of lists, as follows:
my_list = list(list(a=1,b=2),list(a=1,b=2),list(a=1,b=2))
I have a vector b_new, the length of which is exactly the same as length(my_list):
b_new = c(3,4,5)
I would like to overwrite the b-elements of my_list with the values in b_new sequentially, so the expected output is:
my_list = list(list(a=1,b=3),list(a=1,b=4),list(a=1,b=5))
I could obviously do this in a for loop:
for(i in 1:length(b_new))
{
my_list[[i]]$b=b_new[i]
}
but I wonder if there is a way of doing this without a for loop, for example using mapply?
It's still a loop really, but the following will do it:
Map(`[<-`, my_list, "b", b_new)
# or more pleasantly named:
Map(replace, my_list, "b", b_new)
str(Map(`[<-`, my_list, "b", b_new))
#List of 3
# $ :List of 2
# ..$ a: num 1
# ..$ b: num 3
# $ :List of 2
# ..$ a: num 1
# ..$ b: num 4
# $ :List of 2
# ..$ a: num 1
# ..$ b: num 5
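Since the question asks about mapply specifically, the same idea can be sketched with it. This is only an illustration, assuming my_list and b_new as defined in the question; the anonymous function and the name res are not part of the original answer. SIMPLIFY = FALSE keeps the result as a list, which is what Map() does by default.

```r
my_list <- list(list(a = 1, b = 2), list(a = 1, b = 2), list(a = 1, b = 2))
b_new <- c(3, 4, 5)

# One call per (element, value) pair; SIMPLIFY = FALSE stops mapply()
# from collapsing the result into a matrix.
res <- mapply(function(el, val) {
  el$b <- val  # overwrite only the b element
  el
}, my_list, b_new, SIMPLIFY = FALSE)

str(res)
```

As with Map(), my_list itself is not modified; assign the result back if you want to replace it.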
I have a list of elements that hold values. I would like to write an if statement so that, if there isn't a specific number of elements for a given ID (e.g., A, B, C), the missing elements are added with NA as their value. I would like the output to look something like expected_ID. Is there an efficient way of doing this?
library(lubridate)
library(tidyverse)
library(purrr)
date <- rep_len(seq(dmy("01-01-2011"), dmy("31-07-2011"), by = "days"), 200)
ID <- rep(c("A","B", "C"), 200)
df <- data.frame(date = date,
x = runif(length(date), min = 60000, max = 80000),
y = runif(length(date), min = 800000, max = 900000),
ID)
df$Month <- month(df$date)
int1 <- df %>%
# arrange(ID) %>% # skipped for readability of result
mutate(new = floor_date(date, '10 day')) %>%
mutate(new = if_else(day(new) == 31, new - days(10), new)) %>%
group_by(ID, new) %>%
filter(Month == "1") %>%
group_split()
names(int1) <- sapply(int1, function(x) paste(x$ID[1]))
int1 <- int1[-c(6, 8, 9)]
expected_ID <- list(int1[[1]], int1[[2]], int1[[3]], int1[[4]], int1[[5]], NA, int1[[6]], NA, NA)
names(expected_ID) <- c(rep("A", 3), rep("B", 3), rep("C", 3))
It's not usually desirable to create lists with repeated names; a hierarchical structure is a better way to store these data. Building that hierarchy is also a necessary intermediate step toward your intended output, and once it is done we can flatten the data back into the format you've specified. Comments are in the code block below.
# split the list into a list of nested lists
lst <- split(int1, names(int1))
# fill each inner list to the desired length
# the use of pmax() ensures that rep() will not be sent an invalid negative value
# '3' here is your desired list length
filled_lst <- lapply(lst, \(x) list(x, rep(list(NA), pmax(0, 3 - length(x)))))
# convert to desired flattened output format
flat_lst <- unlist(unlist(filled_lst, recursive = F), recursive = F)
names(flat_lst) <- sub('(.).*', '\\1', names(flat_lst))
For posterity, here is my original answer, which worked on the example in which ID was a list of vectors.
# split the list into a list of nested lists
lst <- split(ID, names(ID))
str(lst)
List of 3
$ A:List of 3
..$ A: int [1:3] 1 2 3
..$ A: int [1:3] 4 5 6
..$ A: int [1:3] 7 8 9
$ B:List of 2
..$ B: int [1:2] 1 2
..$ B: int [1:2] 3 4
$ C:List of 1
..$ C: int [1:3] 1 2 3
# fill each nested list to the desired length
# the use of pmax() ensures that rep() will not be sent an invalid negative value
# '3' here is your desired list length
filled_lst <- lapply(lst, \(x) c(x, rep(NA, pmax(0, 3 - length(x)))))
str(filled_lst)
List of 3
$ A:List of 3
..$ A: int [1:3] 1 2 3
..$ A: int [1:3] 4 5 6
..$ A: int [1:3] 7 8 9
$ B:List of 3
..$ B: int [1:2] 1 2
..$ B: int [1:2] 3 4
..$ : logi NA
$ C:List of 3
..$ C: int [1:3] 1 2 3
..$ : logi NA
..$ : logi NA
# convert to desired flattened output format
flat_lst <- unlist(filled_lst, recursive = F)
names(flat_lst) <- gsub('\\d|.\\.+', '', names(flat_lst))
str(flat_lst)
List of 9
$ A: int [1:3] 1 2 3
$ A: int [1:3] 4 5 6
$ A: int [1:3] 7 8 9
$ B: int [1:2] 1 2
$ B: int [1:2] 3 4
$ B: logi NA
$ C: int [1:3] 1 2 3
$ C: logi NA
$ C: logi NA
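The split, fill, and flatten steps can also be wrapped in one function. The following is a minimal sketch rather than part of the original answer: the helper name pad_named_list and its argument n are hypothetical, and the example ID list is reconstructed from the str() output above. It sets the names directly, so no regular expression is needed afterwards.

```r
# Hypothetical helper: pad every name group of `x` with NA up to `n` elements.
# Groups that already have n or more elements are left unchanged.
pad_named_list <- function(x, n) {
  groups <- split(x, names(x))  # one sub-list per distinct name
  out <- list()
  for (nm in names(groups)) {
    g <- unname(groups[[nm]])
    padded <- c(g, rep(list(NA), max(0, n - length(g))))
    names(padded) <- rep(nm, length(padded))
    out <- c(out, padded)
  }
  out
}

ID <- list(A = 1:3, A = 4:6, A = 7:9, B = 1:2, B = 3:4, C = 1:3)
res <- pad_named_list(ID, 3)
str(res)
```

Note that split() orders the groups by sorted name, so the output order can differ from the input order if the names are not already sorted.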
Given a nested list for example as below
lst <- list(
1,
list(list(c(4, 5, 4)), list(c(6, 7))),
list(c(2, 3, 3)),
list(list(c(5, 5, 6)), list(c(7, 7, 7)))
)
> str(lst)
List of 4
$ : num 1
$ :List of 2
..$ :List of 1
.. ..$ : num [1:3] 4 5 4
..$ :List of 1
.. ..$ : num [1:2] 6 7
$ :List of 1
..$ : num [1:3] 2 3 3
$ :List of 2
..$ :List of 1
.. ..$ : num [1:3] 5 5 6
..$ :List of 1
.. ..$ : num [1:3] 7 7 7
Let's say its deepest level is 3, e.g., depths of vectors 4 5 4, 6 7, 5 5 6 and 7 7 7 in lst.
I am wondering if there is a way to run a given function only over those deepest levels while leaving the other levels untouched. For example, if the function is unique, then my expected output is
lstout <- list(
1,
list(list(c(4, 5)),list(c(6,7))),
list(c(2, 3, 3)),
list(list(c(5, 6)), list(7))
)
> str(lstout)
List of 4
$ : num 1
$ :List of 2
..$ :List of 1
.. ..$ : num [1:2] 4 5
..$ :List of 1
.. ..$ : num [1:2] 6 7
$ :List of 1
..$ : num [1:3] 2 3 3
$ :List of 2
..$ :List of 1
.. ..$ : num [1:2] 5 6
..$ :List of 1
.. ..$ : num 7
It seems rapply cannot run the function selectively on only the deepest level, and I have no idea how to achieve this.
Any base R idea or solution would be greatly appreciated!
We can recursively descend lst to find the maximum depth and then use that to recursively descend again applying unique only at the maximum depth. No packages are used.
maxDepth <- function(x, depth = 0) {
if (is.list(x)) max(sapply(x, maxDepth, depth+1))
else depth
}
lstUnique <- function(x, depth = maxDepth(x)) {
if (depth == 0) unique(x)
else if (is.list(x)) lapply(x, lstUnique, depth-1)
else x
}
lstUnique(lst)
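The same two-pass idea generalises to any function, not just unique. The sketch below is an illustration under that assumption: the names leafDepth and applyAtMaxDepth are hypothetical, and the empty-list guard is an addition that the answer's code does not have. It is checked against the lstout given in the question.

```r
# Depth of the deepest leaf; a bare vector (or an empty list) counts as a leaf.
leafDepth <- function(x, depth = 0) {
  if (is.list(x) && length(x) > 0) max(sapply(x, leafDepth, depth + 1))
  else depth
}

# Apply f only to the leaves that sit at the maximum depth.
applyAtMaxDepth <- function(x, f, depth = leafDepth(x)) {
  if (is.list(x)) lapply(x, function(el) applyAtMaxDepth(el, f, depth - 1))
  else if (depth == 0) f(x)
  else x
}

lst <- list(
  1,
  list(list(c(4, 5, 4)), list(c(6, 7))),
  list(c(2, 3, 3)),
  list(list(c(5, 5, 6)), list(c(7, 7, 7)))
)
lstout <- list(
  1,
  list(list(c(4, 5)), list(c(6, 7))),
  list(c(2, 3, 3)),
  list(list(c(5, 6)), list(7))
)

out <- applyAtMaxDepth(lst, unique)
identical(out, lstout)
```

The shallower vector c(2, 3, 3) is left alone because its depth counter never reaches zero.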
Variation using rapply
A variation of the above is to recursively add a class to each leaf equal to its depth. Then we can use rapply three times. First, use rapply to extract the classes and take the maximum to find the maximum depth. Second, use rapply to apply unique to just the nodes having the maximum depth class. Third, remove any remaining classes that were not removed by unique because the node was not at maximum depth. (The third rapply, i.e. the last line of code below, could be omitted if it is OK to leave some leaves with the classes we added.)
addDepth <- function(x, depth = 0) {
if (is.list(x)) lapply(x, addDepth, depth+1)
else structure(x, class = format(depth))
}
lst2 <- addDepth(lst)
mx <- max(as.numeric(rapply(lst2, class))) # max depth
lst3 <- rapply(lst2, unique, classes = format(mx), how = "replace")
rapply(lst3, as.vector, how = "replace")
Note on rapply
Note that if you instead wanted to run unique on all leaves, rather than just on the leaves at maximum depth, then rapply in base R would work directly.
rapply(lst, unique, how = "replace")
data.tree
This alternative does require the use of a package. First we create a data.tree dt and then traverse it applying unique to the nodes that satisfy the filterFun.
library(data.tree)
dt <- as.Node(lst)
dt$Do(function(x) x$"1" <- unique(x$"1"),
filterFun = function(x) x$level == dt$height)
print(dt, "1")
rrapply
The rrapply package provides an enhancement to rapply which can also pass a position vector whose length equals the depth so we can use it first to calculate the maximum depth mx and then again to apply unique only at that depth. (Have updated rrapply call to use how = "unlist" as opposed to applying unlist afterwards as per suggestion in comments.)
library(rrapply)
mx <- max(rrapply(lst, f = function(x, .xpos) length(.xpos), how = "unlist"))
uniq_mx <- function(x, .xpos) if (length(.xpos) == mx) unique(x) else x
rrapply(lst, is.numeric, uniq_mx)
I cannot think of a base R option, but with purrr you can get a close solution:
modify_depth(lst, 3, unique, .ragged = TRUE)
[[1]]
[1] 1
[[2]]
[[2]][[1]]
[[2]][[1]][[1]]
[1] 4 5
[[2]][[2]]
[[2]][[2]][[1]]
[1] 6 7
[[3]]
[[3]][[1]]
[1] 2 3 3
[[4]]
[[4]][[1]]
[[4]][[1]][[1]]
[1] 5 6
[[4]][[2]]
[[4]][[2]][[1]]
[1] 7
Given a nested list, how to create all possible lists from its elements, while preserving the structure of the nested list?
Nested list:
l = list(
a = list(
b = 1:2
),
c = list(
d = list(
e = 3:4,
f = 5:6
)
),
g = 7
)
Desired output: all possible combinations of the elements of l, while preserving the structure, e.g.:
# One possible output:
list(
a = list(
b = 1
),
c = list(
d = list(
e = 3,
f = 5
)
),
g = 7
)
# Another possible output:
list(
a = list(
b = 1
),
c = list(
d = list(
e = 4,
f = 5
)
),
g = 7
)
My approach so far is to:
1. flatten the list (e.g., as discussed in this answer)
2. expand.grid() and get a matrix where each row represents a unique combination
3. parse every row of the resulting matrix and reconstruct the structure from the names() using regular expressions
I am looking for a less cumbersome approach because I have no guarantee that the names of the list elements will not change.
The relist function from utils seems to be designed for this task:
rl <- as.relistable(l)
r <- expand.grid(data.frame(rl), KEEP.OUT.ATTRS = F)
> head(r, 5)
b c.d.e c.d.f g
1 1 3 5 7
2 2 3 5 7
3 1 4 5 7
4 2 4 5 7
5 1 3 6 7
It saves the structure of the list (skeleton). This means one can now manipulate the data within the nested list and re-assign it into the structure (flesh). Here with the first row of the expanded matrix.
r <- rep(unname(unlist(r[1,])),each = 2)
l2 <- relist(r, skeleton = rl)
> l2
$a
$a$b
[1] 1 1
$c
$c$d
$c$d$e
[1] 3 3
$c$d$f
[1] 5 5
$g
[1] 7
attr(,"class")
[1] "relistable" "list"
Note that since the structure stays the same, I need to supply the same number of elements as in the original list. This is why I used rep to repeat each element twice. One could also fill it with NA, I guess.
For every possible combination iterate through r (expanded):
lapply(1:nrow(r), function(x)
relist(rep(unname(unlist(r[x,])),each = 2), skeleton = rl))
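To see the skeleton/flesh mechanism in isolation, here is a minimal sketch with a hand-written skeleton that has exactly one slot per leaf, so no rep() padding is needed. The skeleton and flesh values are made up for illustration; only utils::relist itself comes from the answer.

```r
# Skeleton: same nesting and names as l, but one placeholder per leaf.
skeleton <- list(
  a = list(b = 0),
  c = list(d = list(e = 0, f = 0)),
  g = 0
)

# Flesh: one value per leaf, in the order unlist(skeleton) would visit them.
flesh <- c(1, 3, 5, 7)

res <- utils::relist(flesh, skeleton)
str(res)
```

Because the skeleton is a plain list here, the result carries no "relistable" class attribute.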
Combining Ben Nutzer's brilliant answer and Joris Chau's brilliant comment, the answer will become a one-liner:
apply(expand.grid(data.frame(l)), 1L, relist, skeleton = rapply(l, head, n = 1L, how = "list"))
It creates a list of lists with as many elements as rows returned by expand.grid(). The result is better visualised by the output of str():
str(apply(expand.grid(data.frame(l)), 1L, relist, skeleton = rapply(l, head, n = 1L, how = "list")))
List of 16
$ :List of 3
..$ a:List of 1
.. ..$ b: num 1
..$ c:List of 1
.. ..$ d:List of 2
.. .. ..$ e: num 3
.. .. ..$ f: num 5
..$ g: num 7
$ :List of 3
..$ a:List of 1
.. ..$ b: num 2
..$ c:List of 1
.. ..$ d:List of 2
.. .. ..$ e: num 3
.. .. ..$ f: num 5
..$ g: num 7
...
...
...
$ :List of 3
..$ a:List of 1
.. ..$ b: num 2
..$ c:List of 1
.. ..$ d:List of 2
.. .. ..$ e: num 4
.. .. ..$ f: num 6
..$ g: num 7
Unequal sublist lengths
Here is an approach, extending Uwe's and Ben's answers, that also works for arbitrary sublist lengths. Instead of calling expand.grid on data.frame(l), first flatten l to a single-level list and then call expand.grid on it:
## skeleton
skel <- rapply(l, head, n = 1L, how = "list")
## flatten to single level list
l.flat <- vector("list", length = length(unlist(skel)))
i <- 0L
invisible(
rapply(l, function(x) {
i <<- i + 1L
l.flat[[i]] <<- x
})
)
## expand all list combinations
l.expand <- apply(expand.grid(l.flat), 1L, relist, skeleton = skel)
str(l.expand)
#> List of 12
#> $ :List of 3
#> ..$ a:List of 1
#> .. ..$ b: num 1
#> ..$ c:List of 1
#> .. ..$ d:List of 2
#> .. .. ..$ e: num 3
#> .. .. ..$ f: num 5
#> ..$ g: num 7
#> ...
#> ...
#> $ :List of 3
#> ..$ a:List of 1
#> .. ..$ b: num 2
#> ..$ c:List of 1
#> .. ..$ d:List of 2
#> .. .. ..$ e: num 4
#> .. .. ..$ f: num 7
#> ..$ g: num 7
Data
I slightly modified the data structure, so that the sublist components e and f are of unequal length.
l <- list(
a = list(
b = 1:2
),
c = list(
d = list(
e = 3:4,
f = 5:7
)
),
g = 7
)
## calling data.frame on l does not work
data.frame(l)
#> Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE, : arguments imply differing number of rows: 2, 3
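The flattening step can also be written without a counter and `<<-`. The following is a sketch of one possible alternative, not the answer's own code: flatten_leaves is a hypothetical helper that recursively concatenates the leaves, and it happens to produce the same dotted names that unlist() would.

```r
l <- list(
  a = list(b = 1:2),
  c = list(d = list(e = 3:4, f = 5:7)),
  g = 7
)

# Hypothetical helper: collect every non-list leaf into a single-level list.
flatten_leaves <- function(x) {
  if (!is.list(x)) return(list(x))
  do.call(c, lapply(x, flatten_leaves))
}

flat <- flatten_leaves(l)
names(flat)

# One row per combination, then rebuild the nesting for every row.
skel <- rapply(l, head, n = 1L, how = "list")
grid <- expand.grid(flat)
combos <- apply(grid, 1L, utils::relist, skeleton = skel)
length(combos)
```

With leaf lengths 2, 2, 3 and 1 this yields 2 * 2 * 3 * 1 = 12 combinations, matching the str() output above.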
Putting together the great answers from Ben Nutzer and Joris Chau, we have a way to create all possible combinations from a nested list, regardless of whether some sublist components are of unequal length.
Put together as a function:
list.combine <- function(input) {
# Create list skeleton.
skeleton <- rapply(input, head, n = 1, how = "list")
# Create storage for the flattened list.
flattened <- list()
# Flatten the list.
invisible(rapply(input, function(x) {
flattened <<- c(flattened, list(x))
}))
# Create all possible combinations from list elements.
combinations <- expand.grid(flattened, stringsAsFactors = FALSE)
# Create list for storing the output.
output <- apply(combinations, 1, relist, skeleton = skeleton)
return(output)
}
Note: If any sublist component is of type character, every value in the output is coerced to character, because apply() converts the data frame of combinations to a matrix before iterating over its rows. For example:
# Input list.
l <- list(
a = "string",
b = list(
c = 1:2,
d = 3
)
)
# Applying the function.
o <- list.combine(l)
# View the list:
str(o)
# List of 2
# $ :List of 2
# ..$ a: chr "string"
# ..$ b:List of 2
# .. ..$ c: chr "1"
# .. ..$ d: chr "3"
# $ :List of 2
# ..$ a: chr "string"
# ..$ b:List of 2
# .. ..$ c: chr "2"
# .. ..$ d: chr "3"
One (slow) way around this is to relist within a loop, which keeps each value in a 1x1 data frame. Accessing that data frame as df[, 1] returns a vector of length 1 with the same type as the corresponding element of the input list. For example:
Updated list.combine():
list.combine <- function(input) {
# Create list skeleton.
skeleton <- rapply(input, head, n = 1, how = "list")
# Create storage for the flattened list.
flattened <- list()
# Flatten the list.
invisible(rapply(input, function(x) {
flattened <<- c(flattened, list(x))
}))
# Create all possible combinations from list elements.
combinations <- expand.grid(flattened, stringsAsFactors = FALSE)
# Create list for storing the output.
output <- list()
# Relist and preserve original data type.
for (i in seq_len(nrow(combinations))) {
output[[i]] <- retain.element.type(relist(flesh = combinations[i, ], skeleton = skeleton))
}
return(output)
}
Then the retain.element.type():
retain.element.type <- function(input.list) {
for (name in names(input.list)) {
# If the element is a list, recall the function.
if(inherits(input.list[[name]], "list")) {
input.list[[name]] <- Recall(input.list[[name]])
# Else, get the first element and preserve the type.
} else {
input.list[[name]] <- input.list[[name]][, 1]
}
}
return(input.list)
}
Example:
# Input list.
l <- list(
a = "string",
b = list(
c = 1:2,
d = 3
)
)
# Applying the updated function to preserve the data type.
o <- list.combine(l)
# View the list:
str(o)
# List of 2
# $ :List of 2
# ..$ a: chr "string"
# ..$ b:List of 2
# .. ..$ c: int 1
# .. ..$ d: num 3
# $ :List of 2
# ..$ a: chr "string"
# ..$ b:List of 2
# .. ..$ c: int 2
# .. ..$ d: num 3
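Another way to keep the original types is to enumerate combinations of positions instead of values, so that nothing but integers ever passes through expand.grid(). This is a sketch of that alternative under my own naming (combine_typed is hypothetical and is not one of the functions above); it relies on rapply() visiting the leaves in the same order on every pass.

```r
# Hypothetical alternative: build the grid from leaf indices, then pick the
# matching element of each leaf with `[`, which preserves the leaf's type.
combine_typed <- function(input) {
  leaves <- list()
  invisible(rapply(input, function(x) {
    leaves <<- c(leaves, list(x))
    NULL
  }))
  idx <- expand.grid(lapply(leaves, seq_along))
  lapply(seq_len(nrow(idx)), function(r) {
    i <- 0L  # position of the current leaf in traversal order
    rapply(input, function(leaf) {
      i <<- i + 1L
      leaf[idx[r, i]]
    }, how = "replace")
  })
}

l <- list(a = "string", b = list(c = 1:2, d = 3))
o <- combine_typed(l)
str(o)
```

Here c stays an integer and d stays a double, with no helper needed to unwrap 1x1 data frames.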
I'm new to R. I'd like to get a number of statistics on the numeric columns (say, column C) of a data frame (dt) based on combinations of factor columns (say, columns A and B). First, I want the results grouped by both columns A and B, and then the same operations by A alone and by B alone. I've written code that looks like the example below. I have a list of the factor combinations that I'd like to test (groupList), and in each iteration of the loop I feed an element of that list as the argument to "by". However, as you can surely see, it doesn't work: R doesn't recognize the elements of the list as arguments to "by". Any ideas on how to make this work? Any pointer or suggestion is welcome and appreciated.
groupList <- list(".(A, B)", "A", "B")
for(i in 1:length(groupList)){
output <- dt[,list(mean=mean(C),
sd=sd(C),
min=min(C),
median=median(C),
max=max(C)),
by = groupList[i]]
# insert code here to save each output
}
I guess the aggregate function can solve your problem. Let us say you have a data frame df that contains three columns A, B and C, given as:
df<-data.frame(A=rep(letters[1:3],3),B=rep(letters[4:6],each=3),C=1:9)
If you want to calculate the mean of C by factor A, try:
# the formula is passed unnamed: its argument was renamed from formula to x in R 4.2.0
aggregate(C~A,data=df,FUN=mean)
by factor B, try:
aggregate(C~B,data=df,FUN=mean)
by factors A and B, try:
aggregate(C~A+B,data=df,FUN=mean)
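To run all three groupings from one list, as the question intends, the formula can be built from character vectors. This is a sketch that assumes groupList is rewritten as a list of column-name vectors; reformulate() is from the stats package, and the summary function and the name results are illustrative.

```r
df <- data.frame(A = rep(letters[1:3], 3),
                 B = rep(letters[4:6], each = 3),
                 C = 1:9)

# Assumption: each element names the grouping columns for one run.
groupList <- list(c("A", "B"), "A", "B")

results <- lapply(groupList, function(g) {
  # reformulate(c("A", "B"), response = "C") builds the formula C ~ A + B
  aggregate(reformulate(g, response = "C"), data = df,
            FUN = function(x) c(mean = mean(x), sd = sd(x), min = min(x),
                                median = median(x), max = max(x)))
})

results[[2]]
```

Because FUN returns several values, the C column of each result is a matrix with one column per statistic, e.g. results[[2]]$C[, "mean"].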
Your groupList can be restructured as a list of character vectors. Then you can either use lapply or the existing for loop with an added eval() to interpret the by= input properly:
library(data.table)
dt <- data.table(A=rep(1:2,each=5), B=rep(1:5,each=2), C=1:10)
groupList <- list(c("A", "B"), c("A"), c("B"))
lapply(
groupList,
function(x) {
dt[, .(mean=mean(C), sd=sd(C)), by=x]
}
)
out <- vector("list", 3)
for(i in 1:length(groupList)){
out[[i]] <- dt[, .(mean=mean(C), sd=sd(C)), by=eval(groupList[[i]]) ]
}
str(out)
#List of 3
# $ :Classes ‘data.table’ and 'data.frame': 6 obs. of 4 variables:
# ..$ A : int [1:6] 1 1 1 2 2 2
# ..$ B : int [1:6] 1 2 3 3 4 5
# ..$ mean: num [1:6] 1.5 3.5 5 6 7.5 9.5
# ..$ sd : num [1:6] 0.707 0.707 NA NA 0.707 ...
# ..- attr(*, ".internal.selfref")=<externalptr>
# $ :Classes ‘data.table’ and 'data.frame': 2 obs. of 3 variables:
# ..$ A : int [1:2] 1 2
# ..$ mean: num [1:2] 3 8
# ..$ sd : num [1:2] 1.58 1.58
# ..- attr(*, ".internal.selfref")=<externalptr>
# $ :Classes ‘data.table’ and 'data.frame': 5 obs. of 3 variables:
# ..$ B : int [1:5] 1 2 3 4 5
# ..$ mean: num [1:5] 1.5 3.5 5.5 7.5 9.5
# ..$ sd : num [1:5] 0.707 0.707 0.707 0.707 0.707
For demonstration, I used the mtcars data set. Here is one way with the dplyr package.
library(dplyr)
# create a vector of functions that you need
describe <- c("mean", "sd", "min", "median", "max")
# group by the variable gear
mtcars %>%
group_by(gear) %>%
summarise_at(vars(mpg), describe)
# group by the variable carb
mtcars %>%
group_by(carb) %>%
summarise_at(vars(mpg), describe)
# group by both gear and carb
mtcars %>%
group_by(gear, carb) %>%
summarise_at(vars(mpg), describe)
I have a list of data frames that have different dimensions. I want to create different alternative sublists that contain data frames with the same number of columns.
The structure of my list df_list looks something like this:
List of 6
$ df1:'data.frame': 49743 obs. of 88 variables
$ df2:'data.frame': 49889 obs. of 89 variables
$ df3:'data.frame': 50500 obs. of 91 variables
$ df4:'data.frame': 49732 obs. of 88 variables
$ df5:'data.frame': 48500 obs. of 90 variables
$ df6:'data.frame': 50011 obs. of 91 variables
My desired output would be something similar to:
sub_list1 = list(df1, df4)
sub_list2 = list(df3, df6)
Could anyone help me to solve this issue? Many thanks in advance
It's very easily solved using
split(df_list, lengths(df_list))
# or for older R versions: split(df_list, sapply(df_list, ncol))
which will result in a new list of lists, where each sublist contains data frames with an equal number of columns.
Here's a reproducible example:
l <- list(
data.frame(x = 1),
data.frame(x = 1, y = 2),
data.frame(x = 1),
data.frame(x = 1, y = 2, z = 3),
data.frame(x = 1))
To check how many variables each data.frame in l has, run:
lengths(l)
#[1] 1 2 1 3 1
Now you can split them and check the structure:
res <- split(l, lengths(l))
str(res)
#List of 3
# $ 1:List of 3
# ..$ :'data.frame': 1 obs. of 1 variable:
# .. ..$ x: num 1
# ..$ :'data.frame': 1 obs. of 1 variable:
# .. ..$ x: num 1
# ..$ :'data.frame': 1 obs. of 1 variable:
# .. ..$ x: num 1
# $ 2:List of 1
# ..$ :'data.frame': 1 obs. of 2 variables:
# .. ..$ x: num 1
# .. ..$ y: num 2
# $ 3:List of 1
# ..$ :'data.frame': 1 obs. of 3 variables:
# .. ..$ x: num 1
# .. ..$ y: num 2
# .. ..$ z: num 3
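The desired output in the question only lists groups that contain more than one data frame. If that is the intent, the split result can be filtered afterwards. This is a sketch under that assumption, using a small named stand-in for df_list (the real data frames have far more rows and columns); the name shared is illustrative.

```r
# Stand-in for df_list: df1/df4 share a column count, as do df3/df6.
df_list <- list(
  df1 = data.frame(x = 1),
  df2 = data.frame(x = 1, y = 2),
  df3 = data.frame(x = 1, y = 2, z = 3),
  df4 = data.frame(x = 1),
  df5 = data.frame(x = 1, y = 2, z = 3, w = 4),
  df6 = data.frame(x = 1, y = 2, z = 3)
)

groups <- split(df_list, lengths(df_list))

# Keep only the column counts shared by at least two data frames.
shared <- Filter(function(g) length(g) > 1, groups)
lapply(shared, names)
```

The sublists are named by column count ("1" and "3" here), and the original data frame names are preserved inside each one.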