I have the following list containing duplicate names
> l
$A
[1] 2
$A
[1] 4
$B
[1] 10
I can't find a way to merge the "A" elements into a single "A", averaging the values of those elements. The resulting list should be as follows:
> l
$A
[1] 3
$B
[1] 10
Is there a way to produce this list?
Here is a base R option with aggregate
aggregate(values ~ ind, stack(li), FUN = mean)
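With the li from the data section below, this returns a two-row data frame with columns ind and values. If the final result should again be a named list, as in the question, the aggregate output can be converted back, for example with setNames (a sketch under that assumption):
agg <- aggregate(values ~ ind, stack(li), FUN = mean)
as.list(setNames(agg$values, agg$ind))
#$A
#[1] 3
#
#$B
#[1] 2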
If we need it in a list, then do a split and loop through the list to get the mean
lapply(split(li, names(li)), function(x) mean(unlist(x)))
#$A
#[1] 3
#$B
#[1] 2
data
li <- list(A = 2, A = 4, B = 2)
Using tidyverse:
library(tidyverse)
li <- list(A = 2, A = 4, B = 2)
tibble(key = names(li), value = unlist(li)) %>%
  group_by(key) %>%
  summarize(mean = mean(value))
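If you need the result back in the named-list form from the question, the two-column summary can be converted with deframe() from tibble plus as.list() (a sketch; deframe() turns a two-column tibble into a named vector):
tibble(key = names(li), value = unlist(li)) %>%
  group_by(key) %>%
  summarize(mean = mean(value)) %>%
  deframe() %>%
  as.list()
#$A
#[1] 3
#
#$B
#[1] 2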
I have a list of 18 datasets, each with several columns. How do I write a loop that takes the intersection of the columns by column index and returns the result for each index?
df1 <- data.frame(id = c(1:5), loc = c("a","b","c","a","b"))
df2 <- data.frame(id = c(3:7), ta = c("c","b","d","a","b"))
df3 <- data.frame(id = c(1:5), az = c("d","a","e","d","b"))
df <- list(df1, df2, df3)
df <- lapply(df, function(i) lapply(i, function(j) as.character(j)))
intersect(df[[1]][1], df[[2]][1], df[[3]][1])
intersect(df[[1]][2], df[[2]][2], df[[3]][2])
With tidyverse, we can use map/reduce
library(purrr)
library(dplyr)
map(df, pull, 1) %>%
  reduce(intersect)
#[1] 3 4 5
Or as a function
f1 <- function(lstA, ind) {
  map(lstA, pull, ind) %>%
    reduce(intersect)
}
f1(df, 1)
#[1] 3 4 5
f1(df, 2)
#[1] "a" "b"
You may use Reduce with the intersect function, and `[` inside sapply to choose the sub-list number.
Single:
Reduce(intersect, sapply(df, `[`, 1))
# [1] "3" "4" "5"
Reduce(intersect, sapply(df, `[`, 2))
# [1] "a" "b"
Or altogether:
lapply(1:2, function(i) Reduce(intersect, sapply(df, `[`, i)))
# [[1]]
# [1] "3" "4" "5"
#
# [[2]]
# [1] "a" "b"
I have two very large lists (13,000 elements each). I would like to remove the duplicates pair-wise, i.e. remove the element at position i from both lists whenever both lists hold the same object at that position.
The function unique() works very well for a single list, but does not work pairwise.
a = matrix(c(50,70,45,89), ncol = 2)
b = matrix(c(45,86), ncol = 2)
c = matrix(c(20,35), ncol = 2)
df1 = list(a,b,c)
df2 = list(a,b,a)
df3 = cbind(df1,df2)
v = unique(df3, incomparables = FALSE)
In the end, the expected result would be df1 = list(c) and df2 = list(a). Do you have a good approach for this? Thanks a lot!
If you only have a single element in each component of your list, then you can do:
df1 <- list("a", "b", "c")
df2 <- list("a", "b", "a")
comp <- unlist(df1) != unlist(df2)
df1[comp]
[[1]]
[1] "c"
df2[comp]
[[1]]
[1] "a"
Is that what you were looking for?
A more generic solution (whatever you might have in your lists) using purrr would be:
comp2 <- !purrr::map2_lgl(df1, df2, identical)
df1[comp2]
[[1]]
[1] "c"
df2[comp2]
[[1]]
[1] "a"
You can try
Filter(length, Map(function(x, y) x[x != y], df1, df2))
#[[1]]
#[1] "c"
Filter(length, Map(function(x, y) x[x != y], df2, df1))
#[[1]]
#[1] "a"
I'm trying to re-organize my data frames by column order.
For example:
x <- data.frame("A" = c(1,1), "B" = c(2,2), "C" = c(3,3))
y <- data.frame("A" = c(2,2), "B" = c(3,3), "C" = c(4,4))
z <- data.frame("A" = c(3,3), "B" = c(4,4), "C" = c(5,5))
Say I have data frames like the ones above.
What I want to do is make new data frames from the columns of those data frames. (Simply put, I want to put all the "A"s, "B"s and "C"s into 3 new data frames.)
The data frames below are my desired results:
a <- data.frame("A" = c(1,1), "A" = c(2,2), "A" = c(3,3))
b <- data.frame("B" = c(2,2), "B" = c(3,3), "B" = c(4,4))
c <- data.frame("C" = c(3,3), "C" = c(4,4), "C" = c(5,5))
We can do this with tidyverse
library(tidyverse)
list(x, y, z) %>%
  transpose %>%
  map(~ do.call(cbind, .x))
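Note that do.call(cbind, .x) binds plain vectors, so each element of the result is a matrix; if data frames are preferred, wrapping the bind in as.data.frame() should work as well (a sketch, not from the original answer):
list(x, y, z) %>%
  transpose %>%
  map(~ as.data.frame(do.call(cbind, .x)))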
Or with base R
lapply(names(x), function(nm) cbind(x[, nm], y[, nm], z[, nm]))
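If the pieces should keep the duplicated column names, as in the desired a, b and c, a small variation of this base R line could be used (a sketch; check.names = FALSE stops data.frame() from renaming the duplicate columns):
res <- lapply(names(x), function(nm) data.frame(x[nm], y[nm], z[nm], check.names = FALSE))
names(res) <- names(x)
res$A
#  A A A
#1 1 2 3
#2 1 2 3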
Assuming you have an equal number of columns in all the data frames, one way is to use lapply over the list of data frames and subset them sequentially.
lst1 <- list(x, y, z)
lapply(seq_len(ncol(x)), function(i) cbind.data.frame(lapply(lst1, `[`, i)))
#[[1]]
# A A A
#1 1 2 3
#2 1 2 3
#[[2]]
# B B B
#1 2 3 4
#2 2 3 4
#[[3]]
# C C C
#1 3 4 5
#2 3 4 5
If your data frames' columns are not already sorted by name, you might want to do that first:
lst1 <- lapply(list(x, y, z), function(i) i[order(names(i))])
We can also use purrr with the same logic:
library(purrr)
map(seq_len(ncol(x)), ~cbind.data.frame(map(lst1, `[`, .)))
Here I'm attempting to sum the columns c1, c2 and c3 and add the total as a new column in the res data frame:
res <- data.frame("ID" = c(1,2), "c1" = c(1,2), "c2" = c(3,4), "c3" = c(5,6))
res_subset <- data.frame(res$c1, res$c2, res$c3)
tr <- t(res_subset)
s1 <- lapply(tr, function(x) {
  sum(x)
})
s1 contains:
s1
[[1]]
[1] 1
[[2]]
[1] 3
[[3]]
[1] 5
[[4]]
[1] 2
[[5]]
[1] 4
[[6]]
[1] 6
I take the transpose of the columns to be summed (tr <- t(res_subset)) because lapply executes the function against each column, whereas I'm attempting to execute it against each row.
Is there an issue with how I take the transpose? This approach appears to work for a simpler example:
res1 <- data.frame("c1" = c(1,2), "c2" = c(3,4), "c3" = c(5,6))
lapply(res1, function(x) {
  sum(x)
})
returns:
$c1
[1] 3
$c2
[1] 7
$c3
[1] 11
If I understood what you need correctly, just use the rowSums() function:
res$sum <- rowSums(res[,2:4])
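Selecting the columns by name instead of position is a little safer if the column order ever changes (a small variation, with the expected output sketched):
res$sum <- rowSums(res[, c("c1", "c2", "c3")])
res
#  ID c1 c2 c3 sum
#1  1  1  3  5   9
#2  2  2  4  6  12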
The function sum returns a scalar, which is not what you want here. Instead, col1 + col2 + ... gives the desired result. So you can use Reduce in combination with +:
res$sum <- Reduce(`+`, res[, c('c1','c2','c3')])
The + operator must be quoted with backticks, since we are using it as a function. (I think quoting with normal quotation marks is OK too.)
rowSums also works, but my understanding is that it will create an intermediate matrix, which is not efficient.
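With the res from the question, this evaluates to the per-row totals, the same values rowSums() produces (a quick sketch of the expected output):
Reduce(`+`, res[, c('c1','c2','c3')])
#[1]  9 12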
I have a very big list, but some of the elements (positions) are NULL, meaning there is nothing inside them.
I just want to extract the part of my list that is non-empty. Here is my attempt, but I ran into an error:
ind<-sapply(mylist, function() which(x)!=NULL)
list<-mylist[ind]
#Error in which(x) : argument to 'which' is not logical
Would someone help me implement this?
You can use the logical negation of is.null here. That can be applied over the list with vapply, and we can return the non-NULL elements with [:
(mylist <- list(1:5, NULL, letters[1:5]))
# [[1]]
# [1] 1 2 3 4 5
# [[2]]
# NULL
# [[3]]
# [1] "a" "b" "c" "d" "e"
mylist[vapply(mylist, Negate(is.null), NA)]
# [[1]]
# [1] 1 2 3 4 5
# [[2]]
# [1] "a" "b" "c" "d" "e"
Try:
myList <- list(NULL, c(5,4,3), NULL, 25)
Filter(Negate(is.null), myList)
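With the myList defined just above, this should drop the NULL entries and return (a sketch of the expected output):
# [[1]]
# [1] 5 4 3
#
# [[2]]
# [1] 25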
If you don't care about the result structure, you can just unlist:
unlist(mylist)
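Note that unlist() drops the NULLs but also flattens everything into a single vector; for example, with the myList from the previous answer (a sketch):
unlist(myList)
# [1]  5  4  3 25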
What the error means is that your brackets are not correct; the condition you want to test must go inside the which function:
which(x != NULL)
One can extract the indices of the NULL entries in the list using the which function and exclude them from the new list using -.
new_list <- mylist[-which(sapply(mylist, is.null))]
should do the job :)
Try this:
list(NULL, 1, 2, 3, NULL, 5) %>%
  purrr::map_if(is.null, ~ NA_character_) %>% # convert NULL into NA
  is.na() %>%                                 # find the NAs
  `!` %>%                                     # negate
  which()                                     # get the indices of the non-NULLs
or even this:
list(NULL, 1, 2, 3, NULL, 5) %>%
  purrr::map_lgl(is.null) %>%
  `!` %>% # negate
  which()
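Either pipeline returns the positions of the non-NULL elements, which can then be used to subset the original list (a sketch, storing the example list in a variable I'm calling lst here):
lst <- list(NULL, 1, 2, 3, NULL, 5)
idx <- which(!purrr::map_lgl(lst, is.null))
idx
#[1] 2 3 4 6
lst[idx] # only the non-NULL elements remain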
MyList <- list(NULL, c(5, 4, 3), NULL, NULL)
MyList
[[1]]
NULL
[[2]]
[1] 5 4 3
[[3]]
NULL
[[4]]
NULL
MyList[!unlist(lapply(MyList,is.null))]
[[1]]
[1] 5 4 3