Making new dataframes from old dataframes by column number - r

I'm trying to reorganize my dataframes by column order. For example:
x <- data.frame("A" = c(1,1), "B" = c(2,2), "C" = c(3,3))
y <- data.frame("A" = c(2,2), "B" = c(3,3), "C" = c(4,4))
z <- data.frame("A" = c(3,3), "B" = c(4,4), "C" = c(5,5))
Say I have the dataframes above.
What I want to do is make new dataframes grouped by column position. Simply put, I want to collect all the "A"s, "B"s and "C"s into 3 new dataframes.
The dataframes below are the results I want:
a <- data.frame("A" = c(1,1), "A" = c(2,2), "A" = c(3,3), check.names = FALSE)
b <- data.frame("B" = c(2,2), "B" = c(3,3), "B" = c(4,4), check.names = FALSE)
c <- data.frame("C" = c(3,3), "C" = c(4,4), "C" = c(5,5), check.names = FALSE)

We can do this with the tidyverse:
library(tidyverse)
list(x, y, z) %>%
  transpose() %>%
  map(~ do.call(cbind, .x))
Or with base R:
lapply(names(x), function(nm) cbind(x[, nm], y[, nm], z[, nm]))

Assuming all the dataframes have the same number of columns, one way is to lapply over the list of dataframes and subset each one by column position:
lst1 <- list(x, y, z)
lapply(seq_len(ncol(x)), function(i) cbind.data.frame(lapply(lst1, `[`, i)))
#[[1]]
# A A A
#1 1 2 3
#2 1 2 3
#[[2]]
# B B B
#1 2 3 4
#2 2 3 4
#[[3]]
# C C C
#1 3 4 5
#2 3 4 5
If the columns are not in the same order in every dataframe, you might want to sort them by name first:
lst1 <- lapply(list(x, y, z), function(i) i[order(names(i))])
We can also use purrr with the same logic:
library(purrr)
map(seq_len(ncol(x)), ~cbind.data.frame(map(lst1, `[`, .)))
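The answers above return unnamed lists (and the first two return matrices rather than data frames). A minimal base R sketch, assuming every data frame has the same column names as x, that matches columns by name instead of position and returns a named list of data frames whose columns all keep the original name:

```r
x <- data.frame(A = c(1, 1), B = c(2, 2), C = c(3, 3))
y <- data.frame(A = c(2, 2), B = c(3, 3), C = c(4, 4))
z <- data.frame(A = c(3, 3), B = c(4, 4), C = c(5, 5))
lst1 <- list(x, y, z)

# Assumes every data frame in lst1 has the same column names as x.
# setNames() makes lapply() return a list named "A", "B", "C".
by_col <- lapply(setNames(names(x), names(x)), function(nm) {
  cols <- lapply(lst1, `[[`, nm)            # column `nm` from each data frame
  out <- do.call(cbind.data.frame, cols)    # bind the columns side by side
  setNames(out, rep(nm, length(cols)))      # label every column with `nm`
})
by_col$B
```

Because the lookup is by name, the data frames do not need to share a column order, so the sorting step is not needed here.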

Related

Find unique level of list of data set each column

I have a list of 18 datasets, each with several columns. How do I write a loop that finds the intersection of each column across all the datasets (matching columns by index) and returns a list indexed by column?
df1 <- data.frame(id = c(1:5), loc = c("a","b","c","a","b"))
df2 <- data.frame(id = c(3:7), ta = c("c","b","d","a","b"))
df3 <- data.frame(id = c(1:5), az = c("d","a","e","d","b"))
df <- list(df1, df2, df3)
df <- lapply(df, function(i) lapply(i, function(j) as.character(j)))
# Desired result for columns 1 and 2, written schematically:
intersect(df[[1]][1], df[[2]][1], df[[3]][1])
intersect(df[[1]][2], df[[2]][2], df[[3]][2])
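With the tidyverse, we can use map/reduce. pull() expects data frames, so run this on the original list of data frames (before the as.character() step):
library(purrr)
library(dplyr)
map(df, pull, 1) %>%
  reduce(intersect)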
#[1] 3 4 5
Or as a function:
f1 <- function(lstA, ind) {
  map(lstA, pull, ind) %>%
    reduce(intersect)
}
f1(df, 1)
#[1] 3 4 5
f1(df, 2)
#[1] "a" "b"
You can use Reduce with intersect, and `[` inside an sapply to choose the sub-list by column number.
For a single column:
Reduce(intersect, sapply(df, `[`, 1))
# [1] "3" "4" "5"
Reduce(intersect, sapply(df, `[`, 2))
# [1] "a" "b"
Or for all columns at once:
lapply(1:2, function(i) Reduce(intersect, sapply(df, `[`, i)))
# [[1]]
# [1] "3" "4" "5"
#
# [[2]]
# [1] "a" "b"
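A base R sketch that generalizes the lapply(1:2, ...) call to every column and works directly on the list of data frames (no character conversion), assuming all datasets have the same number of columns. stringsAsFactors = FALSE is set explicitly so the result is the same on R versions before 4.0:

```r
df1 <- data.frame(id = c(1:5), loc = c("a", "b", "c", "a", "b"), stringsAsFactors = FALSE)
df2 <- data.frame(id = c(3:7), ta = c("c", "b", "d", "a", "b"), stringsAsFactors = FALSE)
df3 <- data.frame(id = c(1:5), az = c("d", "a", "e", "d", "b"), stringsAsFactors = FALSE)
dfs <- list(df1, df2, df3)

# For column position i, take that column from every data frame with `[[`
# and fold intersect() over the resulting vectors.
common <- lapply(seq_along(dfs[[1]]), function(i) {
  Reduce(intersect, lapply(dfs, `[[`, i))
})
common
```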

How to remove duplicate elements from two lists (pairwise)?

I have two very large lists (13000 elements). I would like to remove the duplicates pairwise, i.e. remove element i from both lists if the two lists hold the same object at position i.
The function unique() works very well for a single list, but does not work pairwise.
a = matrix(c(50,70,45,89), ncol = 2)
b = matrix(c(45,86), ncol = 2)
c = matrix(c(20,35), ncol = 2)
df1 = list(a,b,c)
df2 = list(a,b,a)
df3 = cbind(df1,df2)
v = unique(df3, incomparables = FALSE)
In the end, the expected result would be df1 = list(c) and df2 = list(a). Is there a good approach for this? Thanks a lot!
If each component of your lists holds a single element, then you can:
df1 <- list("a", "b", "c")
df2 <- list("a", "b", "a")
comp <- unlist(df1) != unlist(df2)
df1[comp]
[[1]]
[1] "c"
df2[comp]
[[1]]
[1] "a"
Is that what you were looking for?
A more generic solution using purrr (whatever you have in your lists) would be:
comp2 <- !purrr::map2_lgl(df1, df2, identical)
df1[comp2]
[[1]]
[1] "c"
df2[comp2]
[[1]]
[1] "a"
You can try:
Filter(length, Map(function(x, y) x[x != y], df1, df2))
#[[1]]
#[1] "c"
Filter(length, Map(function(x, y) x[x != y], df2, df1))
#[[1]]
#[1] "a"
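The answers above use character vectors. A base R sketch on the question's own matrix data, similar in spirit to the purrr::map2_lgl() answer, assuming both lists have the same length and that "duplicate" means identical() at the same position:

```r
a <- matrix(c(50, 70, 45, 89), ncol = 2)
b <- matrix(c(45, 86), ncol = 2)
c <- matrix(c(20, 35), ncol = 2)
df1 <- list(a, b, c)
df2 <- list(a, b, a)

# identical() compares whole objects (values, dims, attributes),
# so it works for matrices as well as single values.
same <- mapply(identical, df1, df2)
df1 <- df1[!same]
df2 <- df2[!same]
```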

Averaging list duplicated elements

I have the following list containing duplicate names:
> l
$A
[1] 2
$A
[1] 4
$B
[1] 10
I can't find a way to merge the "A" elements into a single "A" holding the average of their values. The resulting list should be as follows:
> l
$A
[1] 3
$B
[1] 10
Is there a way to produce this list?
Here is a base R option with aggregate:
aggregate(values ~ ind, stack(li), FUN = mean)
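To see why the aggregate() call works: stack() turns the named list into a two-column data frame, with duplicated names simply repeated in the ind factor. A short illustration on the same example data:

```r
li <- list(A = 2, A = 4, B = 2)
s <- stack(li)   # columns: values (2, 4, 2) and ind (factor A, A, B)
s
aggregate(values ~ ind, s, FUN = mean)
```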
If we need the result as a list, split by name and loop through the pieces to get the mean:
lapply(split(li, names(li)), function(x) mean(unlist(x)))
#$A
#[1] 3
#$B
#[1] 2
data
li <- list(A = 2, A = 4, B = 2)
Using the tidyverse:
library(tidyverse)
li <- list(A = 2, A = 4, B = 2)
tibble(key = names(li), value = unlist(li)) %>%
  group_by(key) %>%
  summarize(mean = mean(value))
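The split() answer returns groups in alphabetical order, because split() orders by factor levels. A minimal base R sketch, using a hypothetical helper name average_by_name(), that keeps groups in order of first appearance and returns a named list like the one in the question:

```r
# Hypothetical helper: average list elements that share a name.
# Assumes every element is a single number, as in the question.
average_by_name <- function(l) {
  nms <- factor(names(l), levels = unique(names(l)))  # keep first-appearance order
  lapply(split(unlist(l, use.names = FALSE), nms), mean)
}

l <- list(A = 2, A = 4, B = 10)   # the question's data
average_by_name(l)
```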

More efficient than nested loop in R

I am trying to break my habit of using for loops by switching to apply, but I've gotten stumped on this one. I have a for loop that collapses every two rows into one row for an object, obj.tmp (366 by 34343), but it is slow.
Here's a much shortened example:
df <- data.frame(X1 = letters[1:10], X2 = letters[11:20], stringsAsFactors = FALSE)
Thus:
> df
X1 X2
a k
b l
c m
d n
e o
f p
g q
h r
i s
j t
df2 <- df[seq_len(nrow(df) / 2), ]  # pre-allocate a result with half as many rows
for (i in 1:(nrow(df) / 2)) {
  df2[i, ] <- apply(df[(i * 2 - 1):(i * 2), ], 2, paste, collapse = "")
}
Output:
> df2
X1 X2
ab kl
cd mn
ef op
gh qr
ij st
Any suggestions for a better method?
Based on your sample data, here is one possibility:
# Sample data
df <- data.frame(X1 = letters[1:10], X2 = letters[11:20], stringsAsFactors = FALSE)
do.call(rbind, lapply(split(df, gl(nrow(df) / 2, 2, nrow(df))), function(x) sapply(x, paste0, collapse = "")))
# X1 X2
#1 "ab" "kl"
#2 "cd" "mn"
#3 "ef" "op"
#4 "gh" "qr"
#5 "ij" "st"
Explanation: split df into a list of two-row chunks, paste the entries in each column together, and rbind the results into the final object.
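For reference, the grouping factor gl(nrow(df) / 2, 2, nrow(df)) gives the same label to each consecutive pair of rows. With the 10-row example it looks like this:

```r
n <- 10                          # nrow(df) in the example
grp <- gl(n / 2, 2, n)           # 5 levels, each repeated twice: 1 1 2 2 3 3 4 4 5 5
grp
split(letters[1:n], grp)[1:2]    # first two chunks: "a" "b" and "c" "d"
```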
If you want to avoid rbind-ing the list elements, you can also do:
t(sapply(split(df, gl(nrow(df) / 2, 2, nrow(df))), function(x) sapply(x, paste0, collapse = "")))
# X1 X2
#1 "ab" "kl"
#2 "cd" "mn"
#3 "ef" "op"
#4 "gh" "qr"
#5 "ij" "st"
We can use the aggregate function:
df1 <- cbind(df, id = rep(1:(nrow(df) / 2), each = 2))  # Create a new df with an id that shows the rows to be combined
aggregate(. ~ id, df1, paste0, collapse = "")[-1]  # Combine the rows
X1 X2
1 ab kl
2 cd mn
3 ef op
4 gh qr
5 ij st
You can do all this in one line:
aggregate(.~id,cbind(df,id=rep(1:(nrow(df)/2),each=2)),paste0,collapse="")[-1]
You can also try:
matrix(do.call(paste0, data.frame(matrix(unlist(df), ncol = 2, byrow = TRUE))), ncol = 2)
[,1] [,2]
[1,] "ab" "kl"
[2,] "cd" "mn"
[3,] "ef" "op"
[4,] "gh" "qr"
[5,] "ij" "st"
Something like this? If not, could you be clearer and share code that reproduces what you are doing? But I hope this solves your problem.
df <- data.frame(X1 = letters[1:10], stringsAsFactors = FALSE)
df2 <- data.frame(X1 = character(), stringsAsFactors = FALSE)
sapply(1:round(nrow(df)/2), FUN = function(x) {
df2[x,] <<- paste(df[(x*2-1):(x*2),], collapse = "")
})
df2
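A fully vectorized sketch, not taken from the answers above and assuming the number of rows is even: select the odd and even rows of each column with recycled logical vectors and paste them elementwise. This needs one paste0() call per column instead of one per pair of rows:

```r
df <- data.frame(X1 = letters[1:10], X2 = letters[11:20], stringsAsFactors = FALSE)

# c(TRUE, FALSE) recycles to pick rows 1, 3, 5, ...; c(FALSE, TRUE) picks 2, 4, 6, ...
# Assumes nrow(df) is even, so every odd row has a partner.
collapsed <- as.data.frame(
  lapply(df, function(col) paste0(col[c(TRUE, FALSE)], col[c(FALSE, TRUE)])),
  stringsAsFactors = FALSE
)
collapsed
```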

Inserting values in one list into another list by index

I have two lists, x and y, and a vector of indices called where.
x <- list(a = 1:4, b = letters[1:6])
y <- list(a = c(20, 50), b = c("abc", "xyz"))
where <- c(2, 4)
I want to insert y into x at the indices in where, so that the result is
list(a = c(1,20,2,50,3,4), b = c("a", "abc", "b", "xyz", "c", "d", "e", "f"))
#$a
#[1] 1 20 2 50 3 4
#
#$b
#[1] "a" "abc" "b" "xyz" "c" "d" "e" "f"
I've been trying it with append, but it's not working.
lapply(seq(x), function(i) append(x[[i]], y[[i]], after = where[i]))
#[[1]]
#[1] 1 2 20 50 3 4
#
#[[2]]
#[1] "a" "b" "c" "d" "abc" "xyz" "e" "f"
This is appending at the wrong index. Plus, I want to retain the list names in the process. I also don't know if append is the right function for this, since I've literally never seen it used anywhere.
What's the best way to insert values from one list into another list using an index vector?
How about an mapply solution?
x <- list(a = 1:4, b = letters[1:6])
y <- list(a = c(20, 50), b = c("abc", "xyz"))
where <- c(2, 4)
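mapply(function(x, y, w) {
  r <- vector(class(x), length(x) + length(y))  # empty vector of the combined length
  r[-w] <- x  # original values fill the non-insert positions
  r[w] <- y   # new values go to the `where` positions
  r
}, x, y, MoreArgs = list(where), SIMPLIFY = FALSE)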
which returns
$a
[1] 1 20 2 50 3 4
$b
[1] "a" "abc" "b" "xyz" "c" "d" "e" "f"
which seems to be the results you desire.
Here I created an APPEND function that is an iterative (via Reduce) version of append:
APPEND <- function(x, where, y)
  Reduce(function(z, args) do.call(append, c(list(z), args)),
         Map(list, y, where - 1), init = x)
Then you just need to call that function via Map:
Map(APPEND, x, list(where), y)
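Another base R sketch, using order() instead of append() or pre-allocation. Like the mapply answer, it assumes where gives the positions the new values should occupy in the final vector, and it also assumes where is sorted in increasing order. insert_at() is a hypothetical helper name:

```r
# Hypothetical helper: place y[k] at position where[k] of the result.
# Each x value gets its own index as a sort key; y[k] gets a key just after
# the (where[k] - k)-th x value, since k - 1 earlier y values precede it.
insert_at <- function(x, y, where) {
  keys <- c(seq_along(x), where - seq_along(where) + 0.5)
  c(x, y)[order(keys)]
}

x <- list(a = 1:4, b = letters[1:6])
y <- list(a = c(20, 50), b = c("abc", "xyz"))
where <- c(2, 4)

res <- Map(insert_at, x, y, list(where))  # Map() keeps the names a and b
res
```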
