Suppose I have a list of length 2, within which is another list of length 2, within which there is a data frame of numbers coded as either 0, 1 or 2 (bear with me!):
set.seed(42)
l1<-data.frame(sample(0:2, 5, replace = TRUE))
l2<-data.frame(sample(0:2, 5, replace = TRUE))
l<-list(l1,l2)
ll<-list(list(l,l), list(l,l))
I need to count the number of times either 1 or 2 appears within each data frame. I then need to sum these counts across all the data frames at the level above.
So for ll[[1]][[1]][[1]] the count would be 1, and for ll[[1]][[1]][[2]] the count would be 4. Across those two data frames, the sum would be 5.
To give a more plain-English description of the real data I'm working with: the top level is the species (in this example, 2 species); the level below that is the year in which data were recorded (in this example, 2 years); below that are the locations at which data were recorded. I need to know, within each year, how many times 1 or 2 appears across all locations.
There is perhaps a better way to describe this but so far it's eluding me. Any help would be appreciated.
We can use purrr functions. Note that transpose() flips the year/location nesting, so this sums each location's counts across the two years; and because the values are coded 0, 1 or 2, counting entries that are != 0 is the same as counting the 1s and 2s.
library(purrr)
map(ll, function(x) transpose(x) %>% map(~ sum(unlist(.x) != 0)))
#[[1]]
#[[1]][[1]]
#[1] 2
#[[1]][[2]]
#[1] 8
#[[2]]
#[[2]][[1]]
#[1] 2
#[[2]][[2]]
#[1] 8
A bit nested, but the solution should work:
lapply(ll, function(l)
  lapply(l, function(li) sum(unlist(li) %in% 1:2)))
# [[1]]
# [[1]][[1]]
# [1] 5
#
# [[1]][[2]]
# [1] 5
#
#
# [[2]]
# [[2]][[1]]
# [1] 5
#
# [[2]][[2]]
# [1] 5
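If you are already using purrr, map_depth() (available in purrr >= 0.3.0) expresses the same two-levels-down traversal without the explicit nesting; this sketch should give the same result as the lapply() version above:
library(purrr)
# apply the counting function to each year within each species,
# i.e. two levels down the list
map_depth(ll, 2, ~ sum(unlist(.x) %in% 1:2))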
I have two lists x and y created from
x1 = list(c(1,2,3,4))
x2 = list(c(seq(1, 10, by = 2)))
x<- list(x1,x2)
x
[[1]]
[[1]][[1]]
[1] 1 2 3 4
[[2]]
[[2]][[1]]
[1] 1 3 5 7 9
and y,
y1 = list(c(5, 6, 7, 8))
y2 = list(c(9, 7, 5, 3, 1))
y <- list(y1, y2)
y
[[1]]
[[1]][[1]]
[1] 5 6 7 8
[[2]]
[[2]][[1]]
[1] 9 7 5 3 1
So basically, I want to find the matches of x in y, so the only actual match I should get is '1 3 5 7 9'. I also need the indices.
Here is what I have tried; I want to match the values of each x[[ ]] with each y[[ ]], irrespective of position:
Matches <- x[x %in% y]
IDX <- which(x %in% y)
This does not work. I would like something that returns matches of the same elements, irrespective of their positions within each list. This is a rough idea of what I need:
matches
[1] FALSE
[1] 1 3 5 7 9
Thanks in advance, appreciate all the help.
Here is what you can do:
You have made a list of lists, which is quite confusing to work with; you could have avoided the extra nesting entirely by using c instead, i.e. x <- c(x1, x2), which gives a plain list of vectors and is much easier to work with.
But since you provided a list of lists, I will work with that.
Now back to solving your question:
flags <- lapply(Map(`%in%`,
                    unlist(x, recursive = FALSE),
                    unlist(y, recursive = FALSE)), all)
k <- lapply(seq_along(x), function(i)
  ifelse(unlist(flags)[i],
         list(unlist(x, recursive = FALSE)[[i]]),
         unlist(flags[i])))
unlist(k, recursive = FALSE)  # final output
Logic:
Mapping the items of the two flattened lists with %in% checks whether each element of x is contained in the corresponding element of y; all() then reduces each check to a single TRUE or FALSE. In your case this returns FALSE and TRUE respectively.
We then iterate over the elements of x, using flags as a filter criterion, to build another list k: where the flag is TRUE, the contents of x are copied over; where it is FALSE, the FALSE is kept.
The final step is to unlist k with recursive = FALSE, converting it back into a list of vectors.
Output:
# [[1]]
# [1] FALSE
# [[2]]
# [1] 1 3 5 7 9
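For what it's worth, the same output can be produced in one pass with Map(), returning the matched vector when everything matches and FALSE otherwise (a minimal sketch using the same x and y as above):
Map(function(a, b) if (all(a %in% b)) a else FALSE,
    unlist(x, recursive = FALSE),
    unlist(y, recursive = FALSE))
# [[1]]
# [1] FALSE
# [[2]]
# [1] 1 3 5 7 9
The indices of the matches then follow from which(mapply(function(a, b) all(a %in% b), unlist(x, recursive = FALSE), unlist(y, recursive = FALSE))), which returns 2 here.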
I'd like to change a list into one cell of a data frame.
list <- list(1,2,3,4,5)
View(list)
[[1]]
[1] 1
[[2]]
[1] 2
[[3]]
[1] 3
[[4]]
[1] 4
[[5]]
[1] 5
I'd like to transform this such that it looks like:
x
1 1,2,3,4,5
The reason is that I have a loop that stores a result in a list on each iteration, but I only want one cell per iteration.
There are other columns where each iteration produces only one result, so saving those in a data frame is easy. But for the metric with multiple results, I don't want multiple columns or rows.
So I will have two data frames that I can use cbind on such that my final data frame will look like:
x y
1 1,2,3,4,5 a
2 5,4,3 b
You can easily achieve that with paste and collapse, i.e. (calling your list l1 rather than list; see the side note below):
l1 <- list(1, 2, 3, 4, 5)
data.frame(x = paste(l1, collapse = ','))
# x
#1 1,2,3,4,5
or simply (thanks @David)
data.frame(x = toString(list))
# x
#1 1, 2, 3, 4, 5
On a side note, avoid naming your lists list, as there is a base function called list in R.
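To illustrate the loop scenario you describe (with made-up per-iteration results res1 and res2): collapse the multi-valued metric to one string per iteration, keep the single-valued metric as its own data frame, and cbind the two at the end:
res1 <- list(1, 2, 3, 4, 5)  # results of iteration 1
res2 <- list(5, 4, 3)        # results of iteration 2
x <- data.frame(x = c(toString(res1), toString(res2)))
y <- data.frame(y = c("a", "b"))
cbind(x, y)
#               x y
# 1 1, 2, 3, 4, 5 a
# 2       5, 4, 3 b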
Say I have a vector x:
x <- c(1, 1, 1.1, 2, 1, 2.1, 2.6)
tol <- 0.4
how do I get the indices of the groups of elements that are 'unique' within the tolerance range tol, as in the list below? I don't know beforehand how many of these groups there are.
[[1]]
[1] 1 2 3 5
[[2]]
[1] 4 6
[[3]]
[1] 7
thanks
Not 100% reliable, since it uses unique on lists, but you can try:
unique(apply(outer(x,x,function(a,b) abs(a-b)<tol),1,which))
#[[1]]
#[1] 1 2 3 5
#
#[[2]]
#[1] 4 6
#
#[[3]]
#[1] 7
The point @Roland raised in the comments shows that there is some ambiguity in your requirements. For instance, if x <- c(1, 1.3, 1.6), my line gives three groups: 1-2, 2-3 and 1-2-3. This is because, from 1's point of view, it is similar only to 1.3, but from 1.3's point of view, it is similar to both 1 and 1.6.
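To see this ambiguity directly, you can run the same line on that vector (tol = 0.4 as before; x2 is used here so the original x stays untouched):
x2 <- c(1, 1.3, 1.6)
unique(apply(outer(x2, x2, function(a, b) abs(a - b) < tol), 1, which))
#[[1]]
#[1] 1 2
#
#[[2]]
#[1] 1 2 3
#
#[[3]]
#[1] 2 3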
An alternative using nn2 from RANN to find nearest neighbors within radius for clustering:
library(RANN)
x <- c(1, 1, 1.1, 2, 1, 2.1, 2.6)
tol=0.4
nn <- nn2(x,x,k=length(x),searchtype="radius",radius=tol)
m <- unique(apply(nn$nn.idx,1,sort), MARGIN=2)
sapply(seq_len(ncol(m)), function(i) m[which(m[,i] > 0),i])
##[[1]]
##[1] 1 2 3 5
##
##[[2]]
##[1] 4 6
##
##[[3]]
##[1] 7
x <- c(1, 1.3, 1.6)
nn <- nn2(x,x,k=length(x),searchtype="radius",radius=tol)
m <- unique(apply(nn$nn.idx,1,sort), MARGIN=2)
sapply(seq_len(ncol(m)), function(i) m[which(m[,i] > 0),i])
##[[1]]
##[1] 1 2
##
##[[2]]
##[1] 1 2 3
##
##[[3]]
##[1] 2 3
Notes:
The call to nn2 finds all nearest neighbors for each element of x, with respect to all elements of x, within a radius equal to tol. The result nn$nn.idx is a matrix whose rows contain the indices of the nearest neighbors of each element of x. The matrix is dense, padded with zeroes as needed.
Clustering is performed by sorting each row so that unique rows can be extracted. The output m is a matrix where each column contains the indices in a cluster. Again, this matrix is dense and filled with zeroes as needed.
The resulting list is extracted by subsetting each column to remove the zero entries.
This is likely more efficient for large x because nn2 uses a KD-Tree, but it suffers from the same issue for elements that overlap (with respect to the tolerance) as pointed out by nicola.
Maybe it's a hammer to kill a mosquito, but I thought of univariate density-based clustering: the dbscan package lets you do exactly that:
library(dbscan)
groups <- dbscan(as.matrix(x), eps=tol, minPts=1)$cluster
#### [1] 1 1 1 2 1 2 3
You don't need to know the number of groups in advance.
It gives you cluster numbers as output, but if you prefer you can take the group means and round them to the closest integer. Once you have the cluster vector, you can generate the list, for instance, like this:
split(seq_along(x), groups)
#### $`1`
#### [1] 1 2 3 5
#### ...
Edit: behaviour with overlapping:
This algorithm assigns the same group to all elements that are within the tolerance of one another (it works by chained proximity), so you might end up with fewer groups than expected when ranges overlap.
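For example, the overlapping vector discussed in the other answers collapses into a single cluster, because 1 chains to 1.3 and 1.3 chains to 1.6:
library(dbscan)
dbscan(as.matrix(c(1, 1.3, 1.6)), eps = tol, minPts = 1)$cluster
#### [1] 1 1 1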
Here is another attempt with the cut function from base R. We first create the vector of break points, named sq, and then find which elements of x fall within each range.
sq <- seq(min(x)-tol,max(x)+tol*2,tol*2)
# [1] 0.6 1.4 2.2 3.0
sapply(1:(length(sq)-1), function(i) which(!is.na(cut(x, breaks =c(sq[i], sq[i+1])))))
# [[1]]
# [1] 1 2 3 5
# [[2]]
# [1] 4 6
# [[3]]
# [1] 7
It does not produce any duplicates (no need for unique, as in @nicola's answer).
It works as follows: in sapply, we first search for elements within the range [0.6, 1.4], then within [1.4, 2.2], and finally within [2.2, 3.0].
Say I have a very simple dataframe:
DF <- data.frame(col1=c("a", "a", "b", "b"), col2=c(1, 2, 3, 4))
How can I end up with a list that looks like:
$a
[1] 1 2
$b
[1] 3 4
More importantly, how is this generalizable to some unknown number of groups, beyond a and b?
I first thought I could use something like group_by from the dplyr package, but that only seems useful if you then want to summarise or do something along those lines.
I think the best idea would be to use lapply but I'm not exactly sure how to do the grouping.
Any advice is appreciated.
Another option would be using split
with(DF, split(col2, col1))
# $a
# [1] 1 2
#
# $b
# [1] 3 4
Or, using indexes (per @joran's comment)
split(DF[[2]], DF[[1]])
Or
split(DF[, 2], DF[, 1])
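Note that split generalises automatically to any number of groups, which addresses the concern about going beyond a and b; a quick sketch with three groups:
DF2 <- data.frame(col1 = c("a", "b", "c", "a", "c"), col2 = 1:5)
with(DF2, split(col2, col1))
# $a
# [1] 1 4
#
# $b
# [1] 2
#
# $c
# [1] 3 5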
I think this does what you want:
as.list(unstack(DF, col2 ~ col1))
## $a
## [1] 1 2
##
## $b
## [1] 3 4
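A note on the as.list() wrapper: with equal-sized groups, as here, unstack() returns a data frame, hence the conversion; with unequal group sizes it already returns a list:
DF3 <- data.frame(col1 = c("a", "a", "b"), col2 = 1:3)
unstack(DF3, col2 ~ col1)
## $a
## [1] 1 2
##
## $b
## [1] 3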
Apologies in advance if this has been answered but I've not figured out what to search for.
Say you have a nested list object of the following form:
Ob1 <- list(
A=vector("list", 5),
B=vector("list", 5)
)
sub <- c(2,4)
Is there any non-messy way to create a new object Ob2 which contains the same nested structure as Ob1, but only those elements of A and B indexed by sub? Ideally I'd like something that could be generalised to structures more than two levels deep.
Many thanks, Jon
Here is one approach:
# Fill the specified list elements with some random values
# This makes it easier to check that really the right elements have been extracted
Ob1[["A"]][[2]] <- 2
Ob1[["A"]][[4]] <- 4
Ob1[["B"]][[2]] <- 22
Ob1[["B"]][[4]] <- 44
# Extract the specified list elements
new.list <- lapply(Ob1, function(x) x[sub])
# > new.list
# $A
# $A[[1]]
# [1] 2
#
# $A[[2]]
# [1] 4
#
#
# $B
# $B[[1]]
# [1] 22
#
# $B[[2]]
# [1] 44
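To generalise this to deeper nestings, one option is a small recursive helper (a sketch; subset_at_depth is a made-up name, and depth says how many levels down the subsetting should happen):
subset_at_depth <- function(x, sub, depth) {
  # at the target depth, subset; otherwise recurse one level down
  if (depth == 1) return(x[sub])
  lapply(x, subset_at_depth, sub = sub, depth = depth - 1)
}
# depth = 2: subset the elements inside A and B, as above
Ob2 <- subset_at_depth(Ob1, sub, depth = 2)
identical(Ob2, new.list)
# [1] TRUE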