Merge elements of dataframe in list based on partial match of names - r

I have a list built from these data frames:
dfa1 <- data.frame(x=c(1:5), y=c(2, 5, 7, 9, 10))
dfa2 <- data.frame(x=c(1:6), y=c(3, 8, 1, 2, 4, 13))
dfb1 <- data.frame(x=c(1:4), y=c(7, 9, 3, 2))
dfb2 <- data.frame(x=c(1:7), y=c(9, 3, 5, 1, 7, 9, 11))
lst <- list(a1=dfa1, a2=dfa2, b1=dfb1, b2=dfb2)
Based on the partial name matches 'a' and 'b', I want to column-bind the data frames so that the new list looks like this:
new_list
$a
x y1 y2
1 1 2 3
2 2 5 8
3 3 7 1
4 4 9 2
5 5 10 4
$b
x y1 y2
1 1 7 9
2 2 9 3
3 3 3 5
4 4 2 1

Here is a method with lapply and Reduce. lapply iterates through the letters "a" and "b" and applies Reduce to the list elements whose names contain the current letter. Reduce successively applies merge() to the data frames in each group, merging by the variable "x" (an inner join, so only x values present in every data frame of the group are kept), and setNames() then gives the columns the desired names. Thanks to zx8754's suggestion, I added seq_along(grep(let, names(lst))) so that the number of y columns grows with the number of group members.
myList <- lapply(c("a", "b"), function(let)
setNames(Reduce(function(x, y) merge(x, y, by="x"),
lst[grep(let, names(lst))]),
c("x", paste0("y", seq_along(grep(let, names(lst)))))))
[[1]]
x y1 y2
1 1 2 3
2 2 5 8
3 3 7 1
4 4 9 2
5 5 10 4
[[2]]
x y1 y2
1 1 7 9
2 2 9 3
3 3 3 5
4 4 2 1
To add names to the list, it is probably easiest to do this afterward:
names(myList) <- c("a", "b")
You could also start with the vector
myVec <- c("a", "b")
and then use it both in the lapply call and in the names line.
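If the group letters should not be typed by hand at all, one option is to derive each group from the list names themselves. This is a sketch under one assumption: every name is a group prefix followed by trailing digits (a1, a2, b1, ...). It uses split() on the list instead of grep(), so the result is already named by group.

```r
# Example data from the question
dfa1 <- data.frame(x = 1:5, y = c(2, 5, 7, 9, 10))
dfa2 <- data.frame(x = 1:6, y = c(3, 8, 1, 2, 4, 13))
dfb1 <- data.frame(x = 1:4, y = c(7, 9, 3, 2))
dfb2 <- data.frame(x = 1:7, y = c(9, 3, 5, 1, 7, 9, 11))
lst <- list(a1 = dfa1, a2 = dfa2, b1 = dfb1, b2 = dfb2)

# Assumption: the group is the name with its trailing digits stripped
groups <- sub("[0-9]+$", "", names(lst))

# split() on a list returns a list of sub-lists named "a", "b", ...
new_list <- lapply(split(lst, groups), function(dfs) {
  # Inner-join all data frames of the group on x
  merged <- Reduce(function(x, y) merge(x, y, by = "x"), dfs)
  setNames(merged, c("x", paste0("y", seq_along(dfs))))
})
new_list
```

Because the grouping comes from sub() rather than grep(), a name like "ab1" ends up only in group "ab" instead of matching both "a" and "b".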


Is there a way to automatically append data frame columns below each other into one column within large list of data frames?

I have a large list containing thousands of data frames, each with multiple columns. In each of these data frames, I want to automatically stack the columns into a single column, i.e. append them below each other as shown below. Afterwards, I would convert the list to a data frame whose columns would have varying lengths, because the elements of the original list have different numbers of columns.
From this:
y1 y2
1 4
2 5
3 6
To this:
y1
1
2
3
4
5
6
This should be done for each element in the list, and the solution needs to take into account that there are thousands of different data frames, which cannot be referenced individually. Example:
df1 = data.frame(
X1 = c(1, 2, 3),
X1.2 = c(4, 5, 6)
)
df2 = data.frame(
X2 = c(7, 8, 9),
X2.2 = c(1, 4, 6)
)
df3 = data.frame(
X3 = c(3, 4, 1),
X3.2 = c(8, 3, 5),
X3.3 = c(3, 1, 9)
)
listOfDataframe = list(df1, df2, df3)
Final output:
df_final = data.frame(
X1 = c(1, 2, 3, 4, 5, 6),
X2 = c(7, 8, 9, 1, 4, 6),
X3 = c(3, 4, 1, 8, 3, 5, 3, 1, 9)
)
Another problem underlying this question is that there will be a differing number of rows, which I do not know how to account for in the data frame, as the columns need to have the same length.
Thank you in advance for your help; it is highly appreciated.
We can unlist after looping over the list with lapply
lst1 <- lapply(listOfDataframe, \(x)
setNames(data.frame(unlist(x, use.names = FALSE)), names(x)[1]))
-output
lst1
[[1]]
X1
1 1
2 2
3 3
4 4
5 5
6 6
[[2]]
X2
1 7
2 8
3 9
4 1
5 4
6 6
[[3]]
X3
1 3
2 4
3 1
4 8
5 3
6 5
7 3
8 1
9 9
If we need to convert the list to a single data.frame, use cbind.na from the qpcR package:
do.call(qpcR:::cbind.na, lst1)
X1 X2 X3
1 1 7 3
2 2 8 4
3 3 9 1
4 4 1 8
5 5 4 3
6 6 6 5
7 NA NA 3
8 NA NA 1
9 NA NA 9
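If you would rather not depend on qpcR, a base R sketch is to pad each stacked vector with NA up to the longest length before building the data frame. Naming each new column after the first column of its data frame mirrors the lapply answer above; that naming is an assumption about what you want.

```r
df1 <- data.frame(X1 = c(1, 2, 3), X1.2 = c(4, 5, 6))
df2 <- data.frame(X2 = c(7, 8, 9), X2.2 = c(1, 4, 6))
df3 <- data.frame(X3 = c(3, 4, 1), X3.2 = c(8, 3, 5), X3.3 = c(3, 1, 9))
listOfDataframe <- list(df1, df2, df3)

# Stack all columns of each data frame into one unnamed vector
stacked <- lapply(listOfDataframe, unlist, use.names = FALSE)
# Name each vector after the first column of its data frame
names(stacked) <- vapply(listOfDataframe, function(x) names(x)[1], character(1))

# Extending a vector with `length<-` fills the new positions with NA
max_len <- max(lengths(stacked))
df_final <- as.data.frame(lapply(stacked, `length<-`, max_len))
df_final
```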
Here is a tidyverse solution:
library(dplyr)
library(purrr)
listOfDataframe %>%
map(~.x %>% stack(.)) %>%
map(~.x %>% select(-ind))
[[1]]
values
1 1
2 2
3 3
4 4
5 5
6 6
[[2]]
values
1 7
2 8
3 9
4 1
5 4
6 6
[[3]]
values
1 3
2 4
3 1
4 8
5 3
6 5
7 3
8 1
9 9

Divide the number into different groups according to the adjacency relationship

I have a data frame that stores adjacency relations, and I want to divide the numbers into different groups according to it. The data frame is as follows:
df = data.frame(from=c(1,1,2,2,2,3,3,3,4,4,4,5,5), to=c(1,3,2,3,4,1,2,3,2,4,5,4,5))
df
from to
1 1 1
2 1 3
3 2 2
4 2 3
5 2 4
6 3 1
7 3 2
8 3 3
9 4 2
10 4 4
11 4 5
12 5 4
13 5 5
In the above data frame, number 1 is linked to numbers 1 and 3, and number 2 is linked to numbers 2, 3 and 4, so number 1 cannot be in the same group as number 3, and number 2 cannot be in the same group as number 3 or number 4. In the end, the groups could be c(1, 2, 5) and c(3, 4).
How can I program this?
First replace the values of to with NA when from and to are equal.
df2 <- transform(df, to = replace(to, from == to, NA))
Then use Reduce() to go through the rows one by one, binding each row to the accumulated result only if its from value has not already appeared in the to column of the rows kept so far.
Reduce(function(x, y) {
if(y$from %in% x$to) x else rbind(x, y)
}, split(df2, 1:nrow(df2)))
# from to
# 1 1 NA
# 2 1 3
# 3 2 NA
# 4 2 3
# 5 2 4
# 12 5 4
# 13 5 NA
Finally, you can extract the unique elements of both columns to get the two groups.
The overall pipeline should be
df |>
transform(to = replace(to, from == to, NA)) |>
(\(dat) split(dat, 1:nrow(dat)))() |>
Reduce(f = \(x, y) if(y$from %in% x$to) x else rbind(x, y))
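The extraction step described above can be sketched as follows. It reruns the Reduce() step, then reads the first group from the from column and the second from the non-NA to values. Treat it as an illustration for this example data rather than a general grouping algorithm.

```r
df <- data.frame(from = c(1,1,2,2,2,3,3,3,4,4,4,5,5),
                 to   = c(1,3,2,3,4,1,2,3,2,4,5,4,5))

# Self-links become NA so they never block a row
df2 <- transform(df, to = replace(to, from == to, NA))
res <- Reduce(function(x, y) if (y$from %in% x$to) x else rbind(x, y),
              split(df2, 1:nrow(df2)))

# Group 1: the kept `from` values; group 2: the numbers they are linked to
groups <- list(unique(res$from), unique(res$to[!is.na(res$to)]))
groups
```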
Darren Tsai's answer solves this problem, but with some flaws.
Here is a very clumsy solution:
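df = data.frame(from=c(1,1,2,2,2,3,3,3,4,4,4,5,5), to=c(1,3,2,3,4,1,2,3,2,4,5,4,5))
# linked numbers for each `from` value (assumes the numbers are 1..n, used as indices)
df.list = lapply(split(df,df$from), function(x){
  x$to
})
# start with every number in group 1
group.idx = rep(1, length(unique(df$from)))
for (i in seq_along(df.list)) {
  df.vec <- df.list[[i]]
  curr.group = group.idx[i]
  # linked numbers other than i itself
  remain.vec = setdiff(df.vec, i)
  for (j in remain.vec) {
    # a linked number in the same group is pushed to the next group
    if(group.idx[j] == curr.group){
      group.idx[j] = curr.group + 1
    }
  }
}
group.idx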
[1] 1 1 2 2 1
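A more general way to look at this problem is graph coloring: each number is a node, each non-self link is an edge, and linked numbers must end up in different groups. The sketch below swaps in a simple greedy coloring (visit the nodes in increasing order and give each the smallest group number not already used by a neighbour), treating links as undirected. Greedy coloring always produces valid groups, but it is not guaranteed to use the fewest possible groups.

```r
df <- data.frame(from = c(1,1,2,2,2,3,3,3,4,4,4,5,5),
                 to   = c(1,3,2,3,4,1,2,3,2,4,5,4,5))

# Self-links (1-1, 2-2, ...) do not constrain the grouping, so drop them
links <- df[df$from != df$to, ]
nodes <- sort(unique(c(df$from, df$to)))

# Group label per node, looked up by name; NA means not assigned yet
group <- setNames(rep(NA_integer_, length(nodes)), nodes)

for (v in nodes) {
  # Neighbours of v in either direction (links treated as undirected)
  nb <- c(links$to[links$from == v], links$from[links$to == v])
  used <- group[as.character(nb)]
  # Smallest group number not already taken by a neighbour
  g <- 1L
  while (g %in% used) g <- g + 1L
  group[as.character(v)] <- g
}

split(nodes, group)
```

On the example data this gives the groups c(1, 2, 5) and c(3, 4).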

Sorting specific columns of a dataframe by their names in R

df is a test data frame, and I need to sort its last three columns by name in ascending order (without hardcoding the order).
df <- data.frame(X = c(1, 2, 3, 4, 5),
Z = c(1, 2, 3, 4, 5),
Y = c(1, 2, 3, 4, 5),
A = c(1, 2, 3, 4, 5),
C = c(1, 2, 3, 4, 5),
B = c(1, 2, 3, 4, 5))
Desired output:
> df
X Z Y A B C
1 1 1 1 1 1 1
2 2 2 2 2 2 2
3 3 3 3 3 3 3
4 4 4 4 4 4 4
5 5 5 5 5 5 5
I'm aware of the order() function but I can't seem to find the right way to implement it to get the desired output.
Update:
Base R:
cbind(df[1:3],df[4:6][,order(colnames(df[4:6]))])
First answer:
We could use relocate() from dplyr:
https://dplyr.tidyverse.org/reference/relocate.html
It is designed to change column positions.
Here we relocate by index: we take the last column (index 6, which is B) and put it before position 5 (which is C).
library(dplyr)
df %>%
relocate(6, .before = 5)
An alternative that sorts the names of the last three columns instead of hardcoding their order:
library(dplyr)
df %>%
select(1:3, all_of(sort(colnames(df)[4:6])))
X Z Y A B C
1 1 1 1 1 1 1
2 2 2 2 2 2 2
3 3 3 3 3 3 3
4 4 4 4 4 4 4
5 5 5 5 5 5 5
In base R, keep the first columns as they are, then sort the names of the last three:
df[, c(names(df)[1:(ncol(df)-3)], sort(names(df)[ncol(df)-2:0]))]
We want to reorder the columns based on their names. Passing names(df) to order() would reorder all of the columns, but we only want to reorder some of them.
The complicating factor is that order() returns a vector of positions, so if we want to reorder only a subset of the column names, we need an approach that keeps the original order of the first three columns.
We accomplish this by building a vector of the first three column names followed by the remaining column names sorted with sort(), which returns the values rather than their positions, and then using that vector with the [ extraction operator.
df <- data.frame(X = c(1, 2, 3, 4, 5),
Z = c(1, 2, 3, 4, 5),
Y = c(1, 2, 3, 4, 5),
A = c(1, 2, 3, 4, 5),
C = c(1, 2, 3, 4, 5),
B = c(1, 2, 3, 4, 5))
df[,c(names(df[1:3]),sort(names(df[4:6])))]
...and the output:
> df[,c(names(df[1:3]),sort(names(df[4:6])))]
X Z Y A B C
1 1 1 1 1 1 1
2 2 2 2 2 2 2
3 3 3 3 3 3 3
4 4 4 4 4 4 4
5 5 5 5 5 5 5
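Another option builds a sort key that is 0 for the columns that stay in place and the alphabetical rank of the name for the last three columns; order() is stable, so the leading columns keep their order.
to_order <- seq(ncol(df)) > ncol(df) - 3
# rank(), not order(): each column needs the rank of its own name
df[order(to_order * rank(names(df)))]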
#> X Z Y A B C
#> 1 1 1 1 1 1 1
#> 2 2 2 2 2 2 2
#> 3 3 3 3 3 3 3
#> 4 4 4 4 4 4 4
#> 5 5 5 5 5 5 5
Created on 2021-12-24 by the reprex package (v2.0.1)
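To avoid hardcoding 4:6, the base R approach can be wrapped in a small function. sort_last_cols and its argument k are hypothetical names introduced here, not part of any package.

```r
# Hypothetical helper: sort the last k columns of a data frame by name,
# leaving the leading columns in their original order
sort_last_cols <- function(df, k) {
  n <- ncol(df)
  keep <- seq_len(n - k)               # columns that stay in place
  tail_idx <- setdiff(seq_len(n), keep) # the last k columns
  df[, c(names(df)[keep], sort(names(df)[tail_idx])), drop = FALSE]
}

df <- data.frame(X = c(1, 2, 3, 4, 5), Z = c(1, 2, 3, 4, 5),
                 Y = c(1, 2, 3, 4, 5), A = c(1, 2, 3, 4, 5),
                 C = c(1, 2, 3, 4, 5), B = c(1, 2, 3, 4, 5))
sort_last_cols(df, 3)
```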

R - list of dataframes - how to add columns

I have a list of dataframes (my.list)
d1 <- data.frame(ref = c(1, 2, 3), y2 = c(4, 5, 6), y3 = c(7, 8, 9), y4 = c(10, 11, 12))
d2 <- data.frame(ref = c(3, 2, 1), y2 = c(6, 5, 4), y3 = c(9, 8, 1))
my.list <- list(d1, d2)
d1
ref y2 y3 y4
1 1 4 7 10
2 2 5 8 11
3 3 6 9 12
Now I want to add some columns with absolute difference values to each of the dataframes in this list. I would use the following for loop to do this for dataframe d1
for (i in names(d1)[2:length(names(d1))]){
d1[[paste(i, 'abs_diff', sep="_")]] <- abs(d1[,i]-d1[,2])
}
d1 then looks like this:
ref y2 y3 y4 y2_abs_diff y3_abs_diff y4_abs_diff
1 1 4 7 10 0 3 6
2 2 5 8 11 0 3 6
3 3 6 9 12 0 3 6
But how can I now do this in one shot for all dataframes of my.list? I know I should be using 'lapply' for this, but I can't get it to work.
We can use lapply to loop over the list and create the new columns by assignment:
my.list1 <- lapply(my.list, function(x) {
x[paste0(names(x)[2:length(x)], "abs_diff")] <- abs(x[-1] - x[,2])
x
})
my.list1
#[[1]]
# ref y2 y3 y4 y2abs_diff y3abs_diff y4abs_diff
#1 1 4 7 10 0 3 6
#2 2 5 8 11 0 3 6
#3 3 6 9 12 0 3 6
#[[2]]
# ref y2 y3 y2abs_diff y3abs_diff
#1 3 6 9 0 3
#2 2 5 8 0 3
#3 1 4 1 0 3
NOTE: This works because there is a single reference column (x[,2]): R recycles its values, so the subtraction is applied to each of the columns in x[-1]. If the difference involved several reference columns, we would either have to make the dimensions match by replicating the columns or use a loop (as in the OP's post).
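Another option is to keep the OP's loop and simply wrap it in a function for lapply. add_abs_diff and its ref_col argument are hypothetical names introduced here; unlike the answer above, this keeps the OP's "_abs_diff" naming.

```r
d1 <- data.frame(ref = c(1, 2, 3), y2 = c(4, 5, 6), y3 = c(7, 8, 9), y4 = c(10, 11, 12))
d2 <- data.frame(ref = c(3, 2, 1), y2 = c(6, 5, 4), y3 = c(9, 8, 1))
my.list <- list(d1, d2)

# Hypothetical helper: the OP's loop, wrapped in a function
add_abs_diff <- function(d, ref_col = 2) {
  # names(d)[-1] is evaluated once, before any new columns are added
  for (i in names(d)[-1]) {
    d[[paste(i, "abs_diff", sep = "_")]] <- abs(d[[i]] - d[[ref_col]])
  }
  d
}

my.list2 <- lapply(my.list, add_abs_diff)
my.list2[[1]]
```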

Fill array in for-loop with sequences of different lengths [duplicate]

This question already has answers here:
Generate a sequence of numbers with repeated intervals
(6 answers)
Closed 5 years ago.
I'm struggling with a small issue. I want to fill a one-dimensional array using this for loop.
Minimal-Example (it's not working!):
Numbers <- seq(1,5)
Result <- array(NA)
for(n in Numbers){
Result[n] <- seq(n,5)
# The Result array should be like this:
# (1, 2, 3, 4, 5, 2, 3, 4, 5, 3, 4, 5, 4, 5, 5)
}
I guess there are two problems:
The sequences assigned to Result[n] don't have the same length, and a single element Result[n] can only hold one value.
The index n in Result[n] is wrong. It should be dynamic, i.e. change with every new n.
Can you guys help me?
Thank you!
We can do this with sapply
unlist(sapply(Numbers, function(x) seq(x, 5)))
#[1] 1 2 3 4 5 2 3 4 5 3 4 5 4 5 5
Or using the for loop
Result <- c()
for(n in Numbers){
Result <- c(Result, seq(n, 5))
}
Result
#[1] 1 2 3 4 5 2 3 4 5 3 4 5 4 5 5
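Since the question is really about the dynamic index, here is a sketch that keeps the for loop but preallocates Result and tracks the number of filled slots with a counter (pos, a name introduced here). Preallocating avoids rebuilding the vector with c() on every iteration.

```r
n <- 5
Numbers <- seq_len(n)

# Total length is n + (n - 1) + ... + 1 = n * (n + 1) / 2, so allocate it once
Result <- numeric(n * (n + 1) / 2)
pos <- 0  # number of slots already filled

for (k in Numbers) {
  s <- seq(k, n)
  # Write the whole sequence into the next free slots (the "dynamic index")
  Result[pos + seq_along(s)] <- s
  pos <- pos + length(s)
}
Result
```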
Using sequence and rep:
n <- 5
sequence(n:1) + rep(0:(n-1), n:1)
# [1] 1 2 3 4 5 2 3 4 5 3 4 5 4 5 5
You may also create an 'oversized' matrix and select the lower triangle:
m <- matrix(c(NA, 1:n), nrow = n + 1, ncol = n + 1)
m[lower.tri(m)]
# [1] 1 2 3 4 5 2 3 4 5 3 4 5 4 5 5
