Averaging duplicated list elements - r

I have the following list, which contains duplicated names:
> l
$A
[1] 2
$A
[1] 4
$B
[1] 10
I can't find a way to merge the "A" elements into a single "A" whose value is the average of those elements. The resulting list should be as follows:
> l
$A
[1] 3
$B
[1] 10
Is there a way to produce this list?

Here is a base R option with aggregate
aggregate(values ~ ind, stack(li), FUN = mean)
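With li as defined in the data section below, this should return a two-column data frame:
#  ind values
#1   A      3
#2   B     10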
If we need the result as a list, split by the names and loop through the groups to take the mean:
lapply(split(li, names(li)), function(x) mean(unlist(x)))
#$A
#[1] 3
#$B
#[1] 10
data
li <- list(A = 2, A = 4, B = 10)

Using tidyverse:
library(tidyverse)
li <- list(A = 2, A = 4, B = 10)
tibble(key = names(li), value = unlist(li)) %>%
  group_by(key) %>%
  summarize(mean = mean(value))
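With the same li, this should yield a tibble along these lines (exact print formatting varies by tibble version):
## A tibble: 2 x 2
#  key    mean
#  <chr> <dbl>
#1 A         3
#2 B        10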

Related

Find the values common to each column across a list of data sets

I have a list of 18 datasets; each dataset has some columns. How do I write a loop that finds the intersection of each column by index and returns a list with one element per column index?
df1 <- data.frame(id = c(1:5), loc = c("a","b","c","a","b"))
df2 <- data.frame(id = c(3:7), ta = c("c","b","d","a","b"))
df3 <- data.frame(id = c(1:5), az = c("d","a","e","d","b"))
df <- list(df1, df2, df3)
df <- lapply(df, function(i) as.data.frame(lapply(i, as.character))) # convert all columns to character, keeping data frames
intersect(df[[1]][1], df[[2]][1], df[[3]][1])
intersect(df[[1]][2], df[[2]][2], df[[3]][2])
With tidyverse, we can use map/reduce (base intersect() only takes two sets at a time, so we fold it over the list):
library(purrr)
library(dplyr)
map(df, pull, 1) %>%
  reduce(intersect)
#[1] "3" "4" "5"
Or as a function
f1 <- function(lstA, ind) {
  map(lstA, pull, ind) %>%
    reduce(intersect)
}
f1(df, 1)
#[1] "3" "4" "5"
f1(df, 2)
#[1] "a" "b"
You may use Reduce with intersect, and `[` inside sapply to choose the column by number.
Single column:
Reduce(intersect, sapply(df, `[`, 1))
# [1] "3" "4" "5"
Reduce(intersect, sapply(df, `[`, 2))
# [1] "a" "b"
Or altogether:
lapply(1:2, function(i) Reduce(intersect, sapply(df, `[`, i)))
# [[1]]
# [1] "3" "4" "5"
#
# [[2]]
# [1] "a" "b"
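For completeness, a base R analogue of the f1 function above — a small sketch (f2 is just an illustrative name), using `[[` to extract the column by index:
f2 <- function(lstA, ind) {
  Reduce(intersect, lapply(lstA, `[[`, ind))
}
f2(df, 1)
# [1] "3" "4" "5"
f2(df, 2)
# [1] "a" "b"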

How to remove duplicate elements from two lists (pairwise)?

I have two very large lists (13,000 elements each). I would like to remove the duplicates pairwise, i.e. remove position i from both lists whenever the element at that position in the first list is identical to the element at the same position in the second list.
The function unique() works very well for a single list, but does not work pairwise.
a = matrix(c(50,70,45,89), ncol = 2)
b = matrix(c(45,86), ncol = 2)
c = matrix(c(20,35), ncol = 2)
df1 = list(a,b,c)
df2 = list(a,b,a)
df3 = cbind(df1,df2)
v = unique(df3, incomparables = FALSE)
In the end, the expected result would be df1 = list(c) and df2 = list(a). Do you have a good approach for this? Thank you a lot!
If you only have a single element in each component of your list, then you can do:
df1 <- list("a", "b", "c")
df2 <- list("a", "b", "a")
comp <- unlist(df1) != unlist(df2)
df1[comp]
[[1]]
[1] "c"
df2[comp]
[[1]]
[1] "a"
Is that what you were looking for?
A more generic solution (whatever you'd have in your lists) using purrr would be:
comp2 <- !purrr::map2_lgl(df1, df2, identical)
df1[comp2]
[[1]]
[1] "c"
df2[comp2]
[[1]]
[1] "a"
You can try (with the single-element lists from the answer above):
Filter(length, Map(function(x, y) x[x != y], df1, df2))
#[[1]]
#[1] "c"
Filter(length, Map(function(x, y) x[x != y], df2, df1))
#[[1]]
#[1] "a"
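For the matrix example in the question, the same pairwise idea works in base R with mapply and identical — a sketch:
keep <- !mapply(identical, df1, df2) # FALSE where the pair is identical
keep
#[1] FALSE FALSE  TRUE
df1[keep] # list containing only the matrix c
df2[keep] # list containing only the matrix a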

Making new dataframes from old dataframes by column number

I'm trying to re-organize my data frames by column order.
For example:
x <- data.frame("A" = c(1,1), "B" = c(2,2), "C" = c(3,3))
y <- data.frame("A" = c(2,2), "B" = c(3,3), "C" = c(4,4))
z <- data.frame("A" = c(3,3), "B" = c(4,4), "C" = c(5,5))
Say I have data frames as above.
What I want to do is make new data frames from the columns of those data frames, in order. (Simply put, I want to put all the "A"s, "B"s and "C"s into 3 new data frames.)
The data frames below are my wanted results:
a <- data.frame("A" = c(1,1), "A" = c(2,2), "A" = c(3,3))
b <- data.frame("B" = c(2,2), "B" = c(3,3), "B" = c(4,4))
c <- data.frame("C" = c(3,3), "C" = c(4,4), "C" = c(5,5))
We can do this with tidyverse
library(tidyverse)
list(x, y, z) %>%
  transpose %>%
  map(~ do.call(cbind, .x))
Or with base R
lapply(names(x), function(nm) cbind(x[nm], y[nm], z[nm])) # subset with [nm] to keep data frames and column names
Assuming you have an equal number of columns in all the data frames, one way is to use lapply over the list of data frames and subset them sequentially.
lst1 <- list(x, y, z)
lapply(seq_len(ncol(x)), function(i) cbind.data.frame(lapply(lst1, `[`, i)))
#[[1]]
# A A A
#1 1 2 3
#2 1 2 3
#[[2]]
# B B B
#1 2 3 4
#2 2 3 4
#[[3]]
# C C C
#1 3 4 5
#2 3 4 5
If your data frames are not already sorted by column names, you might want to do that first.
lst1 <- lapply(list(x, y, z), function(i) i[order(names(i))])
We can also use purrr, applying the same logic
library(purrr)
map(seq_len(ncol(x)), ~cbind.data.frame(map(lst1, `[`, .)))
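If you want to address the pieces by name afterwards, you can set names on the result — a sketch (out is just an illustrative name):
out <- map(seq_len(ncol(x)), ~cbind.data.frame(map(lst1, `[`, .)))
names(out) <- names(x)
out$A
#  A A A
#1 1 2 3
#2 1 2 3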

Summing a dataframe with lapply

Here I'm attempting to sum the columns c1, c2, c3 row-wise and add the total as a new column in the res data frame:
res <- data.frame("ID" = c(1,2), "c1" = c(1,2), "c2" = c(3,4), "c3" = c(5,6))
res_subset <- data.frame(res$c1, res$c2, res$c3)
tr <- t(res_subset)
s1 <- lapply(tr, function(x) {
  sum(x)
})
s1 contains:
s1
[[1]]
[1] 1
[[2]]
[1] 3
[[3]]
[1] 5
[[4]]
[1] 2
[[5]]
[1] 4
[[6]]
[1] 6
I take the transpose of the columns to be summed (tr <- t(res_subset)) because lapply applies the function to each column, but I'm attempting to apply the function to each row.
Is there an issue with how I take the transpose? This approach appears to work for a simpler example:
res1 <- data.frame("c1" = c(1,2), "c2" = c(3,4), "c3" = c(5,6))
lapply(res1, function(x) {
  sum(x)
})
returns:
$c1
[1] 3
$c2
[1] 7
$c3
[1] 11
If I understood right what you need, just use the rowSums() function. (The reason your s1 has six separate entries is that t() returns a matrix, and lapply over a matrix iterates over its individual cells rather than its rows.)
res$sum <- rowSums(res[,2:4])
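If you'd rather stay with the apply family, you can apply sum over the rows directly, with no transpose needed — a sketch using the same res:
res$sum <- apply(res[, c("c1", "c2", "c3")], 1, sum)
res
#  ID c1 c2 c3 sum
#1  1  1  3  5   9
#2  2  2  4  6  12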
The function sum returns a scalar, which is not what you want here. Instead, col1 + col2 + ... gives the desired result. So you can use Reduce in combination with +:
res$sum <- Reduce(`+`, res[, c('c1','c2','c3')])
The + operator must be quoted with backticks, since we are using it as a function name. (Quoting with normal quotation marks works too.)
rowSums also works, but my understanding is that it will create an intermediate matrix, which is not efficient.

How to extract the non-empty elements of list in R?

I have a very big list, but some of its elements (positions) are NULL, meaning there is nothing there.
I want to extract just the part of my list which is non-empty. Here is my effort, but I ran into an error:
ind<-sapply(mylist, function() which(x)!=NULL)
list<-mylist[ind]
#Error in which(x) : argument to 'which' is not logical
Would someone help me implement this?
You can use the logical negation of is.null here. That can be applied over the list with vapply, and we can return the non-NULL elements with `[`:
(mylist <- list(1:5, NULL, letters[1:5]))
# [[1]]
# [1] 1 2 3 4 5
# [[2]]
# NULL
# [[3]]
# [1] "a" "b" "c" "d" "e"
mylist[vapply(mylist, Negate(is.null), NA)]
# [[1]]
# [1] 1 2 3 4 5
# [[2]]
# [1] "a" "b" "c" "d" "e"
Try:
myList <- list(NULL, c(5,4,3), NULL, 25)
Filter(Negate(is.null), myList)
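For that myList, this should give:
#[[1]]
#[1] 5 4 3
#
#[[2]]
#[1] 25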
If you don't care about the result structure, you can just unlist; note that this also coerces everything to a common type:
unlist(mylist)
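For example, with the numeric myList from the previous answer:
unlist(myList)
#[1]  5  4  3 25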
What the error means is that the argument to which() must be logical. Note, however, that != NULL is not a usable test in R, because comparing against NULL yields a zero-length vector; test with is.null() instead:
which(!sapply(mylist, is.null))
One can find the indices of the NULL entries in the list using the which function (applying is.null element-wise via sapply) and drop them with negative indexing:
new_list <- mylist[-which(sapply(mylist, is.null))]
should do the job :)
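One caveat, as a sketch: if the list happens to contain no NULLs at all, which() returns an empty index and mylist[-integer(0)] selects nothing, so guard that case:
idx <- which(sapply(mylist, is.null))
new_list <- if (length(idx)) mylist[-idx] else mylist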
Try this (the %>% pipe comes from magrittr):
library(magrittr)
list(NULL, 1, 2, 3, NULL, 5) %>%
  purrr::map_if(is.null, ~ NA_character_) %>% # convert NULL into NA
  is.na() %>% # find NAs
  `!` %>% # negate
  which() # get indices of non-NULLs
or even this:
list(NULL, 1, 2, 3, NULL, 5) %>%
  purrr::map_lgl(is.null) %>%
  `!` %>% # negate
  which()
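Both chains should return the indices of the non-NULL elements:
#[1] 2 3 4 6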
(MyList <- list(NULL, c(5, 4, 3), NULL, NULL))
[[1]]
NULL
[[2]]
[1] 5 4 3
[[3]]
NULL
[[4]]
NULL
MyList[!unlist(lapply(MyList, is.null))]
[[1]]
[1] 5 4 3
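Worth noting: purrr also ships a helper for exactly this, compact(), which discards the NULL and empty elements:
purrr::compact(MyList)
#[[1]]
#[1] 5 4 3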
