Count if a particular element is within a given range? - r

I have a list, sims, whose elements each hold 10 values.
I have two other vectors storing the lower and upper quantiles of each element.
Is there a way to extract the data between the quantiles for each set of 10?
Basically, I want to see how many of these sets contain a specific number within that range.
So far I have tried to use %in%:
for (i in 1:100){
a <- 80.0 %in% sims[[i]]
}
I was going to count how many of these are TRUE; however, this only returns FALSE, and it also doesn't check whether the value lies within the quantile range.
Is there an easier way than subsetting each element to the relevant data and then checking whether it has the value?

Since you don't provide a sample dataset, here is a reproducible example based on some sample data I generate:
set.seed(2018)
lst <- replicate(4, sample(10), simplify = FALSE)
qrt <- lapply(lst, quantile, probs = c(0.25, 0.75))
Here I've generated the 25% and 75% quantiles for every vector in lst; the result, qrt, is a list with as many elements as lst.
We can now use Map to select only those entries from each list element that fall within the quantile range
Map(function(x, y) x[x >= y[1] & x <= y[2]], lst, qrt)
#[[1]]
#[1] 4 5 7 6
#
#[[2]]
#[1] 4 6 5 7
#
#[[3]]
#[1] 6 5 4 7
#
#[[4]]
#[1] 4 7 6 5
To count the number of elements within the quantile range
Map(function(x, y) sum(x >= y[1] & x <= y[2]), lst, qrt)
#[[1]]
#[1] 4
#
#[[2]]
#[1] 4
#
#[[3]]
#[1] 4
#
#[[4]]
#[1] 4
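To get back to the original question (how many of the sets contain a specific number inside their quantile range), the same Map/mapply pattern can return one TRUE/FALSE per set, which can then be summed. This is only a sketch: has_value_in_range is a hypothetical helper, the data are made up for illustration, and the exact == comparison assumes the target value is stored exactly (for non-integer doubles a tolerance may be needed).

```r
# Hypothetical helper: TRUE if `target` occurs in x and lies between
# the lower and upper bounds stored in q
has_value_in_range <- function(x, q, target) {
  any(x == target & x >= q[1] & x <= q[2])
}

# Made-up example data (not the asker's sims)
lst <- list(c(1, 5, 9), c(2, 5, 8), c(5, 20, 40))
qrt <- lapply(lst, quantile, probs = c(0.25, 0.75))

# One logical per list element; target is passed unchanged to every call
hits <- mapply(has_value_in_range, lst, qrt, MoreArgs = list(target = 5))
hits
#[1]  TRUE  TRUE FALSE

# Number of sets that contain the target within their quantile range
n_hits <- sum(hits)
n_hits
#[1] 2
```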

Related

Get common values in three vectors at least repeated once between vectors

I want to compare an undetermined number of vectors and get the common values in at least two of them.
As an example, let’s just use three vectors:
x <- c(1,3,5,7,9)
y <- c(3,6,8,9,4)
z <- c(2,3,4,5,7,9)
Reduce(intersect, list(x,y,z))
# or
intersect(x, intersect(y,z))
[1] 3 9
Expected result is:
[1] 3 4 5 7 9
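We can stack a named list into a two-column data.frame, tabulate it, check which values are present in at least 2 of the vectors (row-wise sum of the presence indicators), and return the names of the rows for which that is TRUE. Note that the result is a character vector; wrap it in as.numeric() if numbers are needed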
names(which(rowSums(table(stack(list(x= x, y = y, z = z))) > 0) >= 2))
[1] "3" "4" "5" "7" "9"
If we want to use pairwise intersect for all combinations, use combn and then Reduce with union
sort(Reduce(union, combn(list(x, y, z), 2,
FUN = function(x) intersect(x[[1]], x[[2]]), simplify = FALSE)))
[1] 3 4 5 7 9
You can put the vectors in one large vector:
vec <- unlist(list(x, y, z))
Or, if you may have duplicates within a single contributing vector:
vec <- unlist(lapply(list(x, y, z), unique))
Then, subset based on presence of duplicates (and sort if needed):
sort(unique(vec[duplicated(vec)]))
Output
[1] 3 4 5 7 9
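Since the question mentions an undetermined number of vectors, the counting idea above can be wrapped in a small function with a threshold. This is a sketch under assumptions: common_in_at_least and its k argument are hypothetical names, and the conversion back with as.numeric() assumes numeric input vectors.

```r
# Hypothetical helper: values that appear in at least `k` of the vectors
common_in_at_least <- function(vectors, k = 2) {
  # unique() per vector so repeats inside one vector are counted once
  counts <- table(unlist(lapply(vectors, unique)))
  # table names are character, so convert back to numeric before sorting
  sort(as.numeric(names(counts)[counts >= k]))
}

x <- c(1, 3, 5, 7, 9)
y <- c(3, 6, 8, 9, 4)
z <- c(2, 3, 4, 5, 7, 9)

in_two   <- common_in_at_least(list(x, y, z))         # at least two vectors
in_three <- common_in_at_least(list(x, y, z), k = 3)  # all three vectors
in_two
#[1] 3 4 5 7 9
in_three
#[1] 3 9
```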

Count within multiple nested lists R

Suppose I have a list of length 2, within which is another list of length 2, within which there is a data frame of numbers coded as either 0, 1 or 2 (bear with me!):
set.seed(42)
l1<-data.frame(sample(0:2, 5, replace = TRUE))
l2<-data.frame(sample(0:2, 5, replace = TRUE))
l<-list(l1,l2)
ll<-list(list(l,l), list(l,l))
I need to count the number of times either 1 or 2 appears within each data frame. I then need to sum these counts across all counts at the level above.
So for ll[[1]][[1]][[1]] the count would be 1, for ll[[1]][[1]][[2]] the count would be 4. Across those two dataframes the sum would be 5.
To give a more plain-English description of the real data I'm working with: the top level is the number of species (in this example, 2 species), the level below that is the year when data was recorded (in this example, data is collected in 2 different years). Below that is a location within which data are recorded. I need to know that, within years, how many times 1 or 2 appears across all locations (within that year).
There is perhaps a better way to describe this but so far it's eluding me. Any help would be appreciated.
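We can use purrr functions. Note that transpose() regroups each inner list by position, so the counts below are summed per location across years (2 and 8) rather than per year across locations.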
library(purrr)
map(ll, function(x) transpose(x) %>% map(~sum(unlist(.x) != 0)))
#[[1]]
#[[1]][[1]]
#[1] 2
#[[1]][[2]]
#[1] 8
#[[2]]
#[[2]][[1]]
#[1] 2
#[[2]][[2]]
#[1] 8
A bit nested, but the solution should work:
lapply(ll,
       function(l)
         lapply(l,
                function(li) sum(unlist(li) %in% 1:2)))
# [[1]]
# [[1]][[1]]
# [1] 5
#
# [[1]][[2]]
# [1] 5
#
#
# [[2]]
# [[2]][[1]]
# [1] 5
#
# [[2]][[2]]
# [1] 5
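If a compact summary is easier to work with than a nested list, the same per-year counts could be collected into a matrix with vapply. This is a sketch, not part of the answers above: it uses fixed, made-up data frames instead of random sampling so the counts are easy to check, and it assumes every species has the same number of years.

```r
# Fixed, hypothetical data (no random sampling)
l1 <- data.frame(v = c(0, 0, 0, 0, 1))  # one value coded 1 or 2
l2 <- data.frame(v = c(1, 1, 0, 2, 2))  # four values coded 1 or 2
l  <- list(l1, l2)                      # locations within one year
ll <- list(list(l, l), list(l, l))      # species > year > location

n_years <- length(ll[[1]])  # assumes the same number of years per species

# Inner vapply: one count per year; outer vapply: one column per species
counts <- vapply(ll, function(species) {
  vapply(species, function(year) sum(unlist(year) %in% 1:2), integer(1))
}, integer(n_years))
dimnames(counts) <- list(year = seq_len(n_years), species = seq_along(ll))
counts
```

Every cell is 5 here, because each year holds the same two data frames (1 + 4).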

How can I remove elements by columns number from a list?

I'd like to remove elements from a list if they contain fewer than 3 values.
For this I tried:
#Create a list
my_list <- list(a = c(3,5,6), b = c(3,1,0), c = 4, d = NA)
my_list
$a
[1] 3 5 6
$b
[1] 3 1 0
$c
[1] 4
$d
[1] NA
# Then I create a function to remove the elements based on my condition:
delete.F <- function(x.list){
x.list[unlist(lapply(x.list, function(x) ncol(x)) < 3)]}
delete.F(my_list)
And I get this output:
Error in unlist(lapply(x.list, function(x) ncol(x)) < 3) :
(list) object cannot be coerced to type 'double'
Any ideas, please?
An option is to create a logical expression with lengths and use that for subsetting the list
my_list[lengths(my_list) >=3]
#$a
#[1] 3 5 6
#$b
#[1] 3 1 0
Note that the example is a list of vectors, not a list of data.frames. ncol/nrow only apply to objects with dimensions, such as a matrix or a data.frame; for a plain vector they return NULL.
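A quick check of that point, as a minimal sketch using the my_list from the question: ncol() gives NULL for every element, so there is nothing to compare against 3, while lengths() returns one integer per element.

```r
my_list <- list(a = c(3, 5, 6), b = c(3, 1, 0), c = 4, d = NA)

# Plain vectors have no dimensions, so ncol() returns NULL for each element
ncols <- lapply(my_list, ncol)
all_null <- all(vapply(ncols, is.null, logical(1)))
all_null
#[1] TRUE

# lengths() returns one integer per element, which can be compared directly
lens <- lengths(my_list)
lens >= 3

# Objects with dimensions do have a column count
ncol(matrix(1:6, nrow = 2))
#[1] 3
ncol(data.frame(p = 1:2, q = 3:4))
#[1] 2
```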
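If we want to use lapply instead (based on some constraints), build the condition with length. Note that unlist() then flattens the kept elements into a single named vector rather than returning a list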
unlist(lapply(my_list, function(x) if(length(x) >=3 ) x))
If we need to create the index with lapply, use length (but it would be slower than lengths)
my_list[unlist(lapply(my_list, length)) >= 3]
Here are a few more options. Using Filter in base R
Filter(function(x) length(x) >=3, my_list)
#$a
#[1] 3 5 6
#$b
#[1] 3 1 0
Or using purrr's keep and discard
purrr::keep(my_list, ~length(.) >= 3)
purrr::discard(my_list, ~length(.) < 3)

Appending vector values to sublists

Let's assume we have a list with three sublists: list1 <- [[1,2],[4,5],[7,8]]
and a vector: vector1 <- c(3,6,9)
Is there a way in R, without using loops, to append the vector's values to the list, so we could get the result list2 = [[1,2,3],[4,5,6],[7,8,9]]?
Thanks for all comments
Use Map
Map(c, list1, vector1)
#[[1]]
#[1] 1 2 3
#[[2]]
#[1] 4 5 6
#[[3]]
#[1] 7 8 9
Or lapply
lapply(seq_along(list1), function(x) c(list1[[x]], vector1[[x]]))
The equivalent purrr variants can be
purrr::map2(list1, vector1, c)
purrr::map(seq_along(list1), ~c(list1[[.]], vector1[[.]]))
data
list1 <- list(c(1,2),c(4,5),c(7,8))
vector1 <- c(3,6,9)
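One thing to keep in mind is that Map recycles its arguments, so a vector that is shorter than the list would be reused rather than rejected. A minimal sketch of a guard, where append_each is a hypothetical wrapper name and not part of the answers above:

```r
list1 <- list(c(1, 2), c(4, 5), c(7, 8))
vector1 <- c(3, 6, 9)

# Hypothetical wrapper: stop on mismatched lengths instead of letting Map recycle
append_each <- function(lst, vec) {
  stopifnot(length(lst) == length(vec))
  Map(c, lst, vec)
}

list2 <- append_each(list1, vector1)
list2
#[[1]]
#[1] 1 2 3
#
#[[2]]
#[1] 4 5 6
#
#[[3]]
#[1] 7 8 9
```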

Removing specific element from a list of vectors in R

Suppose I have a list of indices and values.
indx_list <- list(1,2,c(3,4),5,c(6,7,8))
val_list <- list(0.1,0.6,c(0.8,0.9),0.3,c(0.4,0.8,0.5))
I then want to update both lists by removing indices c(4,7) and the corresponding values c(0.9,0.5). This is pretty easily done using lapply and setdiff. For example:
indx_list_new <- lapply(indx_list,function(x) setdiff(x,c(4,7)))
val_list_new <- lapply(val_list,function(x) setdiff(x,c(0.9,0.5)))
However, I don't know beforehand what indices and corresponding values I will be removing.
set.seed(1234)
indx_flag <- sample(seq(8),2)
You can also see that some values are repeated (e.g. 0.8), so using setdiff might actually remove values at the wrong position.
Questions
1) I can still use lapply and setdiff to update indx_list, but how can I update the values in val_list?
2) Is lapply the most efficient solution here? I will have lists with thousands of elements, and each element can be a vector of hundreds of indices/values.
Edit
Each element in the list (highest level) actually has a particular meaning, so I'd like to keep the list structure.
Instead, arrange your data into a 'tidy' representation
df = data.frame(
    indx = unlist(indx_list),
    val = unlist(val_list),
    grp = factor(rep(seq_along(indx_list), lengths(indx_list)))
)
where the operation is more-or-less transparent
base::subset(df, !indx %in% c(4, 7))
  indx val grp
1    1 0.1   1
2    2 0.6   2
3    3 0.8   3
5    5 0.3   4
6    6 0.4   5
8    8 0.5   5
Using subset() is similar to df[!df$indx %in% c(4, 7), , drop = FALSE]. (I used factor() to allow for empty groups, i.e., levels with no corresponding values).
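Because the question's edit asks to keep the list structure, the filtered data frame can be turned back into lists with split() on the grp factor. This is a sketch under the assumption that one list element per original group is wanted; the names kept, indx_list_new and val_list_new are chosen here for illustration.

```r
indx_list <- list(1, 2, c(3, 4), 5, c(6, 7, 8))
val_list  <- list(0.1, 0.6, c(0.8, 0.9), 0.3, c(0.4, 0.8, 0.5))

df <- data.frame(
  indx = unlist(indx_list),
  val  = unlist(val_list),
  grp  = factor(rep(seq_along(indx_list), lengths(indx_list)))
)

kept <- subset(df, !indx %in% c(4, 7))

# split() on a factor returns one element per level, so a group that lost
# all of its values would still appear (as an empty vector)
indx_list_new <- unname(split(kept$indx, kept$grp))
val_list_new  <- unname(split(kept$val, kept$grp))
val_list_new[[5]]
#[1] 0.4 0.5
```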
Here's an attempt using relist and Map to remove the same points:
Map(`[`, val_list, relist(!unlist(indx_list) %in% c(4,7), indx_list))
#[[1]]
#[1] 0.1
#
#[[2]]
#[1] 0.6
#
#[[3]]
#[1] 0.8
#
#[[4]]
#[1] 0.3
#
#[[5]]
#[1] 0.4 0.5
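To update both lists consistently, one possible variant is to compute the logical mask once from the indices and apply it to each list, so values are removed by position rather than by value. A minimal sketch, where to_drop and keep are names introduced here:

```r
indx_list <- list(1, 2, c(3, 4), 5, c(6, 7, 8))
val_list  <- list(0.1, 0.6, c(0.8, 0.9), 0.3, c(0.4, 0.8, 0.5))
to_drop   <- c(4, 7)

# One logical mask per list element, derived from the indices only
keep <- lapply(indx_list, function(i) !i %in% to_drop)

# Apply the same mask to both lists so they stay aligned
indx_list_new <- Map(`[`, indx_list, keep)
val_list_new  <- Map(`[`, val_list, keep)

indx_list_new[[5]]
#[1] 6 8
val_list_new[[5]]
#[1] 0.4 0.5
```

The repeated value 0.8 in the third element is untouched, because the mask is built from the indices and never compares values.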
