Say, I have the following list
raw <- list(list(1:2, 2:3, 3:4), list(4:5, 5:6, 6:7), list(7:8, 8:9, 9:10))
I would like to find the mean of the corresponding entries of the outermost list. The expected output would be something like
[[1]]
[1] 4 5
[[2]]
[1] 5 6
[[3]]
[1] 6 7
This is because the mean of 1:2, 4:5, and 7:8 would be 4:5.
I have been experimenting with stuff like lapply(raw, function(x) lapply(x, mean)), but apparently it doesn't return the desired output.
This is pretty ugly, but we can use mapply to iterate over the inner lists in parallel; do.call is needed to expand the outer list into separate arguments
do.call("mapply", c(function(...) rowMeans(data.frame(...)), raw, SIMPLIFY=FALSE))
You can make this prettier using the purrr package
purrr::pmap(raw, ~rowMeans(data.frame(...)))
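For comparison, here is a base R sketch that avoids data.frame altogether: it pulls the i-th vector out of every inner list, sums them element-wise with Reduce, and divides by the number of inner lists. The helper name mean_by_position is made up for illustration, and the sketch assumes every inner list has the same length and holds numeric vectors of equal length.

```r
raw <- list(list(1:2, 2:3, 3:4), list(4:5, 5:6, 6:7), list(7:8, 8:9, 9:10))

# Hypothetical helper (not from the answers above).
mean_by_position <- function(lst) {
  n_pos <- length(lst[[1]])
  lapply(seq_len(n_pos), function(i) {
    ith <- lapply(lst, `[[`, i)       # i-th vector from each inner list
    Reduce(`+`, ith) / length(lst)    # element-wise sum, then divide by count
  })
}

res <- mean_by_position(raw)
res[[1]]
# [1] 4 5
```

Because it only uses `+` and `/`, this returns plain unnamed numeric vectors.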
1
n = length(raw[[1]])
lapply(1:n, function(i){
d = do.call(rbind, lapply(seq_along(raw), function(j){
raw[[j]][[i]]
}))
apply(d, 2, mean)
})
#[[1]]
#[1] 4 5
#[[2]]
#[1] 5 6
#[[3]]
#[1] 6 7
2
aggregate(. ~ ind, do.call(rbind, lapply(raw, function(x)
data.frame(cbind(do.call(rbind, x), ind = seq_along(x))))), mean)
# ind V1 V2
#1 1 4 5
#2 2 5 6
#3 3 6 7
You could put the thing into an array and take the cell medians (I suppose you want these instead of means).
A <- array(matrix(unlist(raw), 2, byrow=FALSE), dim=c(2, 3, 3))
v.mds <- t(apply(A, 1:2, median))
lapply(1:3, function(x) v.mds[x, ])
# [[1]]
# [1] 4 5
#
# [[2]]
# [1] 5 6
#
# [[3]]
# [1] 6 7
Generalized like so:
A <- array(matrix(unlist(raw), length(el(el(raw))), byrow=FALSE),
           dim=c(length(el(el(raw))), el(lengths(raw)), length(raw)))
v.mds <- t(apply(A, 1:2, median))
lapply(1:nrow(v.mds), function(x) v.mds[x, ])
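Since the question asked for means, the same array idea can be sketched with mean in place of median. This is only an illustration under the assumption that the nested list is fully regular (every inner list has the same number of equal-length vectors); the names n_elem, n_pos and cell_means are introduced here and are not part of the answer above.

```r
raw <- list(list(1:2, 2:3, 3:4), list(4:5, 5:6, 6:7), list(7:8, 8:9, 9:10))

n_elem <- length(raw[[1]][[1]])   # length of each innermost vector
n_pos  <- length(raw[[1]])        # number of vectors per inner list

# unlist() fills the array in column-major order, so the dimensions are
# (element within vector, position in inner list, outer list index).
A <- array(unlist(raw), dim = c(n_elem, n_pos, length(raw)))

m <- apply(A, 1:2, mean)          # average over the third dimension
cell_means <- lapply(seq_len(n_pos), function(i) m[, i])
cell_means[[3]]
# [1] 6 7
```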
Related
Simple problem, given a list:
main_list <- list(1:3,
4:6,
7:9,
10:12,
13:15)
main_list
# [[1]]
# [1] 1 2 3
# [[2]]
# [1] 4 5 6
# [[3]]
# [1] 7 8 9
# [[4]]
# [1] 10 11 12
# [[5]]
# [1] 13 14 15
I want to split the list into multiple lists where I break up the original one into lists each of length x. So if I said x = 2, I would get 3 lists of length 2, 2 and the leftover 1:
target <- list(list(1:3,
4:6),
list(7:9,
10:12),
list(13:15))
target
# [[1]]
# [[1]][[1]]
# [1] 1 2 3
# [[1]][[2]]
# [1] 4 5 6
# [[2]]
# [[2]][[1]]
# [1] 7 8 9
# [[2]][[2]]
# [1] 10 11 12
# [[3]]
# [[3]][[1]]
# [1] 13 14 15
Something like:
my_split <- function(listtest, x) {
split(listtest, c(1:x))
}
target <- my_split(main_list, 2)
Thanks
Here is an option with gl:
split(main_list, as.integer(gl(length(main_list), 2, length(main_list))))
It can be converted to a custom function
f1 <- function(lstA, n) {
l1 <- length(lstA)
split(lstA, as.integer(gl(l1, n, l1)))
}
EDIT: no conditional logic needed. Just use split() with c() and rep():
my_split <- function(l, x){
l_length <- length(l)
l_div <- l_length / x
split(l, c(rep(seq_len(l_div), each = x), rep(ceiling(l_div), l_length %% x)))
}
my_split(main_list, 2)
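Another common way to build the grouping vector is ceiling(seq_along(l) / x), which numbers the elements 1, 1, 2, 2, 3, ... directly. The sketch below uses a hypothetical name, chunk_list, and assumes x is a positive whole number.

```r
main_list <- list(1:3, 4:6, 7:9, 10:12, 13:15)

# Hypothetical helper: split a list into consecutive chunks of size x;
# the last chunk holds whatever is left over.
chunk_list <- function(l, x) {
  unname(split(l, ceiling(seq_along(l) / x)))
}

chunks <- chunk_list(main_list, 2)
lengths(chunks)
# [1] 2 2 1
```

The unname() call only drops the "1", "2", "3" names that split() adds, so the result matches the unnamed target list in the question.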
What is the most elegant way to split a vector into n-Elements based on a condition?
Every separate true-block should go into its own list element. All the false elements get thrown away.
example1:
vec <- c(1:3,NA,NA,NA,4:6,NA,NA,NA,7:9,NA)
cond <- !is.na(vec)
result = list(1:3,4:6,7:9)
example2:
vec_2 <- c(3:1,11:13,6:4,14:16,9:7,20)
cond_2 <- vec_2 < 10
results_2 = list(3:1,6:4,9:7)
It would be great to have a general solution for a vector vec and a relating condition cond.
My best try:
res <- split(vec,data.table::rleidv(cond))
odd <- as.logical(seq_along(res)%%2)
res[if(cond[1])odd else !odd]
I guess this should work generally:
> split(vec[cond], data.table::rleid(cond)[cond])
$`1`
[1] 1 2 3
$`3`
[1] 4 5 6
$`5`
[1] 7 8 9
Let's make it a function:
> f <- function(vec, cond) split(vec[cond], data.table::rleid(cond)[cond])
> f(vec_2, cond_2)
$`1`
[1] 3 2 1
$`3`
[1] 6 5 4
$`5`
[1] 9 8 7
Here is a base R option with rle
grp <- with(rle(cond), rep(seq_along(values) * NA^ !values, lengths))
split(vec[cond], grp[cond])
#$`1`
#[1] 1 2 3
#$`3`
#[1] 4 5 6
#$`5`
#[1] 7 8 9
Similarly with 'vec_2'
grp <- with(rle(cond_2), rep(seq_along(values) * NA^ !values, lengths))
split(vec_2[cond_2], grp[cond_2])
#$`1`
#[1] 3 2 1
#$`3`
#[1] 6 5 4
#$`5`
#[1] 9 8 7
Or create a grouping variable with cumsum and diff
grp <- cumsum(c(TRUE, diff(cond) < 0)) * NA^ is.na(vec)
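The cumsum/diff idea can also be wrapped into a function that takes any vec and cond. The following is a sketch only: the name split_runs is made up, and it assumes cond is a non-empty logical vector of the same length as vec with no NA values.

```r
# Hypothetical helper: one list element per consecutive run of TRUE in cond.
split_runs <- function(vec, cond) {
  # A new run starts wherever cond changes value (TRUE->FALSE or FALSE->TRUE).
  run_id <- cumsum(c(TRUE, diff(cond) != 0))
  # Keep only the positions where cond holds, grouped by their run id.
  unname(split(vec[cond], run_id[cond]))
}

vec   <- c(1:3, NA, NA, NA, 4:6, NA, NA, NA, 7:9, NA)
out_1 <- split_runs(vec, !is.na(vec))

vec_2 <- c(3:1, 11:13, 6:4, 14:16, 9:7, 20)
out_2 <- split_runs(vec_2, vec_2 < 10)
out_2[[2]]
# [1] 6 5 4
```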
Assume I have the following list:
list(c(1:5,NA,NA),NA,c(NA,6:10))
[[1]]
[1] 1 2 3 4 5 NA NA
[[2]]
[1] NA
[[3]]
[1] NA 6 7 8 9 10
I want to replace all NAs with 0:
[[1]]
[1] 1 2 3 4 5 0 0
[[2]]
[1] 0
[[3]]
[1] 0 6 7 8 9 10
I was originally thinking is.na would be involved, but couldn't get it to affect all list elements. I learned from the related question (Remove NA from list of lists), that using lapply would allow me to apply is.na to each element, but that post demonstrates how to remove (not replace) NA values.
How do I replace NA values from multiple list elements?
I've tried for loops and ifelse approaches, but everything I've tried is either slow, doesn't work or just plain clunky. There's got to be a simple way to do this with an apply function...
And there is!
Here's a simple lapply approach using the replace function:
L1 <-list(c(1:5,NA,NA),NA,c(NA,6:10))
lapply(L1, function(x) replace(x,is.na(x),0))
With the desired result:
[[1]]
[1] 1 2 3 4 5 0 0
[[2]]
[1] 0
[[3]]
[1] 0 6 7 8 9 10
There are multiple ways to do this:
Using map from the purrr package:
library(purrr)
lt <- list(c(1:5,NA,NA),NA,c(NA,6:10))
lt %>%
  map(~replace(., is.na(.), 0))
#output
[[1]]
[1] 1 2 3 4 5 0 0
[[2]]
[1] 0
[[3]]
[1] 0 6 7 8 9 10
kk<- list(c(1:5,NA,NA),NA,c(1,6:10))
lapply(kk, function(i) {
  p <- which(is.na(i))
  i[p] <- 0
  i
})
Edited upon Gregor's comment:
lapply(kk, function(i) {i[is.na(i)] <- 0; i})
I've decided to benchmark the various lapply approaches mentioned:
lapply(Lt, function(x) replace(x,is.na(x),0))
lapply(Lt, function(x) {x[is.na(x)] <- 0; x})
lapply(Lt, function(x) ifelse(is.na(x), 0, x))
Benchmarking code:
Lt <- lapply(1:10000, function(x) sample(c(1:10000,rep(NA,1000))) ) ##Sample list
elapsed.time <- data.frame(
m1 = mean(replicate(25,system.time(lapply(Lt, function(x) replace(x,is.na(x),0)))[3])),
m2 = mean(replicate(25,system.time(lapply(Lt, function(x) {x[is.na(x)] <- 0; x}))[3])),
m3 = mean(replicate(25,system.time(lapply(Lt, function(x) ifelse(is.na(x), 0, x)))[3]))
)
Results:
Function                                          Average elapsed time (s)
lapply(Lt, function(x) replace(x,is.na(x),0))     0.8684
lapply(Lt, function(x) {x[is.na(x)] <- 0; x})     0.8936
lapply(Lt, function(x) ifelse(is.na(x), 0, x))    8.3176
The replace approach is fastest, followed closely by the [] approach. The ifelse approach is roughly 10x slower.
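This will deal with any list depth and structure (a is the input list here). Note that it replaces every literal "NA" in the deparsed text, so strings or names that contain "NA" are altered too: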
x <- eval(parse(text=gsub("NA","0",capture.output(dput(a)))))
# [[1]]
# [1] 1 2 3 4 5 0 0
#
# [[2]]
# [1] 0
#
# [[3]]
# [1] 0 6 7 8 9 10
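For nested lists, base R's rapply with how = "replace" is another option that works on the values themselves rather than on deparsed text. A small sketch, using a made-up nested list and assuming every leaf is an atomic vector:

```r
# Example input invented for illustration: NAs at several nesting depths.
nested <- list(c(1:5, NA, NA), NA, list(c(NA, 6:10), list(NA_real_)))

# Apply the replacement to every leaf while keeping the list structure.
cleaned <- rapply(nested,
                  function(v) replace(v, is.na(v), 0),
                  how = "replace")

anyNA(unlist(cleaned))
# [1] FALSE
```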
Try this:
lapply(enlist, function(x) { x[!is.na(x)]})
where:
enlist <- list(c(1:5,NA,NA),NA,c(NA,6:10))
This yields:
[[1]]
[1] 1 2 3 4 5
[[2]]
logical(0)
[[3]]
[1] 6 7 8 9 10
Assume that this is my list
a <- list(c(1,2,4))
a[[2]] <- c(2,10,3,2,7)
a[[3]] <- c(2, 2, 14, 5)
How do I subset this list to exclude all the 2's? How do I obtain the following:
[[1]]
[1] 1 4
[[2]]
[1] 10 3 7
[[3]]
[1] 14 5
My current solution:
for(j in seq(1, length(a))){
a[[j]] <- a[[j]][a[[j]] != 2]
}
However, this approach feels a bit unnatural. How would I do the same thing with a function from the apply family?
Thanks!
lapply(a, function(x) x[x != 2])
#[[1]]
#[1] 1 4
#
#[[2]]
#[1] 10 3 7
#
#[[3]]
#[1] 14 5
Using lapply, you can apply the subset to each vector in the list. The subset used is x[x != 2].
Or use setdiff by looping over the list with lapply
lapply(a, setdiff, 2)
#[[1]]
#[1] 1 4
#[[2]]
#[1] 10 3 7
#[[3]]
#[1] 14 5
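One difference between the two answers is worth knowing: setdiff returns unique values, so it also drops repeated values you may have wanted to keep, while logical subsetting keeps them. The sketch below uses a slightly different list (with a repeated 4) purely to show this, and adds a %in% variant for excluding several values at once.

```r
# Example list invented for illustration; note the repeated 4.
a <- list(c(1, 2, 4, 4), c(2, 10, 3, 2, 7))

keep_dupes   <- lapply(a, function(x) x[x != 2])            # keeps both 4s
with_setdiff <- lapply(a, setdiff, 2)                       # de-duplicates
drop_many    <- lapply(a, function(x) x[!x %in% c(2, 7)])   # exclude 2 and 7

keep_dupes[[1]]
# [1] 1 4 4
with_setdiff[[1]]
# [1] 1 4
```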
I have a range of values
c(1,2,3,4,5,8,9,10,13,14,15)
And I want to find the ranges where the numbers become discontinuous. All I want is this as output:
(1,5)
(8,10)
(13,15)
I need to find break points.
I need to do it in R.
Something like this?
x <- c(1:5, 8:10, 13:15) # example data
unname(tapply(x, cumsum(c(1, diff(x)) != 1), range))
# [[1]]
# [1] 1 5
#
# [[2]]
# [1] 8 10
#
# [[3]]
# [1] 13 15
Another example:
x <- c(1, 5, 10, 11:14, 20:21, 23)
unname(tapply(x, cumsum(c(1, diff(x)) != 1), range))
# [[1]]
# [1] 1 1
#
# [[2]]
# [1] 5 5
#
# [[3]]
# [1] 10 14
#
# [[4]]
# [1] 20 21
#
# [[5]]
# [1] 23 23
x <- c(1:5, 8:10, 13:15)
rr <- rle(x - seq_along(x))
rr$values <- seq_along(rr$values)
s <- split(x, inverse.rle(rr))
s
# $`1`
# [1] 1 2 3 4 5
#
# $`2`
# [1] 8 9 10
#
# $`3`
# [1] 13 14 15
## And then to get *literally* what you asked for:
cat(paste0("(", gsub(":", ",", sapply(s, deparse)), ")"), sep="\n")
# (1,5)
# (8,10)
# (13,15)
I published seqle which will do this for you in one line. You can load the package cgwtools or search SO for the code, as it's been posted a couple times.
Assuming that you don't care about the exact output and are looking for the min and max of each range, you can use diff/cumsum/range as follows:
x <- c(1:5, 8:10, 13:15)
x. <- c(0, cumsum( diff(x)-1 ) )
lapply( split(x, x.), range )
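If a table of start and end points is more convenient than a list, the break positions can be indexed directly. This is a sketch with a made-up function name, find_ranges, and it assumes x is non-empty, sorted in increasing order, and that consecutive values differ by exactly 1 within a run.

```r
# Hypothetical helper: one row per run of consecutive values.
find_ranges <- function(x) {
  # Positions after which the sequence jumps, padded with 0 and length(x).
  breaks <- c(0, which(diff(x) != 1), length(x))
  data.frame(start = x[head(breaks, -1) + 1],
             end   = x[breaks[-1]])
}

r1 <- find_ranges(c(1:5, 8:10, 13:15))
r1
#   start end
# 1     1   5
# 2     8  10
# 3    13  15

r2 <- find_ranges(c(1, 5, 10, 11:14, 20:21, 23))
```

Isolated values such as 1, 5 and 23 in the second call come out as rows whose start equals their end.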