R - list of dataframes - how to add columns - r

I have a list of dataframes (my.list)
d1 <- data.frame(ref = c(1, 2, 3), y2 = c(4, 5, 6), y3 = c(7, 8, 9), y4 = c(10, 11, 12))
d2 <- data.frame(ref = c(3, 2, 1), y2 = c(6, 5, 4), y3 = c(9, 8, 1))
my.list <- list(d1, d2)
d1
ref y2 y3 y4
1 1 4 7 10
2 2 5 8 11
3 3 6 9 12
Now I want to add some columns with absolute difference values to each of the dataframes in this list. I would use the following for loop to do this for dataframe d1
for (i in names(d1)[2:length(names(d1))]){
d1[[paste(i, 'abs_diff', sep="_")]] <- abs(d1[,i]-d1[,2])
}
d1 then looks like this:
ref y2 y3 y4 y2_abs_diff y3_abs_diff y4_abs_diff
1 1 4 7 10 0 3 6
2 2 5 8 11 0 3 6
3 3 6 9 12 0 3 6
But how can I now do this in one shot for all dataframes of my.list? I know I should be using 'lapply' for this, but I can't get it to work.

We can use lapply to loop over the list and create the new columns by assignment
my.list1 <- lapply(my.list, function(x) {
x[paste0(names(x)[2:length(x)], "abs_diff")] <- abs(x[-1] - x[,2])
x
})
my.list1
#[[1]]
# ref y2 y3 y4 y2abs_diff y3abs_diff y4abs_diff
#1 1 4 7 10 0 3 6
#2 2 5 8 11 0 3 6
#3 3 6 9 12 0 3 6
#[[2]]
# ref y2 y3 y2abs_diff y3abs_diff
#1 3 6 9 0 3
#2 2 5 8 0 3
#3 1 4 1 0 3
NOTE: Here a single column (x[,2], a vector) is subtracted from a data frame, so R recycles that vector and applies the subtraction to each column in turn. If more than one column is to be subtracted, either replicate the columns so that both sides have the same dimensions, or loop (as in the OP's post).
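The lapply call above produces names such as y2abs_diff, whereas the question asked for y2_abs_diff. A minimal sketch of a variant that keeps the underscore, reusing the loop from the question inside a helper (add_abs_diff is a hypothetical name, not part of the original answer):

```r
d1 <- data.frame(ref = c(1, 2, 3), y2 = c(4, 5, 6), y3 = c(7, 8, 9), y4 = c(10, 11, 12))
d2 <- data.frame(ref = c(3, 2, 1), y2 = c(6, 5, 4), y3 = c(9, 8, 1))
my.list <- list(d1, d2)

# Hypothetical helper: add one <col>_abs_diff column per value column,
# each holding the absolute difference to the second column (y2).
add_abs_diff <- function(x) {
  value_cols <- names(x)[-1]  # every column except ref
  for (col in value_cols) {
    # New columns are appended at the end, so x[[2]] stays y2 throughout
    x[[paste(col, "abs_diff", sep = "_")]] <- abs(x[[col]] - x[[2]])
  }
  x
}

my.list2 <- lapply(my.list, add_abs_diff)
names(my.list2[[2]])
```

Because the helper takes one data frame and returns the modified copy, it can also be tested on d1 alone before being applied to the whole list.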

Related

Is there a way to automatically append data frame columns below each other into one column within large list of data frames?

I have a large list containing thousands of data frames, each with multiple columns. Within each data frame, I want to stack the columns into a single column, so that they are appended below each other as shown below. Afterwards, I want to convert the list to a data frame, whose columns would have varying lengths because the elements of the original list have different numbers of columns.
From this:
y1 y2
1 4
2 5
3 6
To this:
y1
1
2
3
4
5
6
This should be done for each element in the list, whereby the solution needs to take into account that there are thousands of different data frames, which cannot be mentioned individually (example):
df1 = data.frame(
X1 = c(1, 2, 3),
X1.2 = c(4, 5, 6)
)
df2 = data.frame(
X2 = c(7, 8, 9),
X2.2 = c(1, 4, 6)
)
df3 = data.frame(
X3 = c(3, 4, 1),
X3.2 = c(8, 3, 5),
X3.3 = c(3, 1, 9)
)
listOfDataframe = list(df1, df2, df3)
Final output:
df_final = data.frame(
X1 = c(1, 2, 3, 4, 5, 6),
X2 = c(7, 8, 9, 1, 4, 6),
X3 = c(3, 4, 1, 8, 3, 5, 3, 1, 9)
)
Another problem underlying this question is that there will be a differing number of rows, which I do not know how to account for in the data frame, as the columns need to have the same length.
Thank you in advance for your help, it is highly appreciated.
We can unlist after looping over the list with lapply
lst1 <- lapply(listOfDataframe, \(x)
setNames(data.frame(unlist(x, use.names = FALSE)), names(x)[1]))
-output
lst1
[[1]]
X1
1 1
2 2
3 3
4 4
5 5
6 6
[[2]]
X2
1 7
2 8
3 9
4 1
5 4
6 6
[[3]]
X3
1 3
2 4
3 1
4 8
5 3
6 5
7 3
8 1
9 9
If we need to convert the list to a single data.frame, use cbind.na from the qpcR package
do.call(qpcR:::cbind.na, lst1)
X1 X2 X3
1 1 7 3
2 2 8 4
3 3 9 1
4 4 1 8
5 5 4 3
6 6 6 5
7 NA NA 3
8 NA NA 1
9 NA NA 9
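If installing qpcR just for cbind.na is undesirable, the same NA padding can be sketched in base R. This is an alternative under the assumption that every column is numeric; it relies on the fact that assigning a longer length to a vector pads it with NA. The names cols, padded and n are illustrative only.

```r
df1 <- data.frame(X1 = c(1, 2, 3), X1.2 = c(4, 5, 6))
df2 <- data.frame(X2 = c(7, 8, 9), X2.2 = c(1, 4, 6))
df3 <- data.frame(X3 = c(3, 4, 1), X3.2 = c(8, 3, 5), X3.3 = c(3, 1, 9))
listOfDataframe <- list(df1, df2, df3)

# Stack the columns of each data frame into one plain vector
cols <- lapply(listOfDataframe, function(x) unlist(x, use.names = FALSE))
# Name each vector after the first column of its data frame
names(cols) <- vapply(listOfDataframe, function(x) names(x)[1], character(1))

# Pad every vector with NA up to the longest length
n <- max(lengths(cols))
padded <- lapply(cols, function(v) {
  length(v) <- n  # extending a vector fills the new positions with NA
  v
})

df_final <- as.data.frame(padded)
df_final
```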
Here is a tidyverse solution:
library(dplyr)
library(purrr)
listOfDataframe %>%
map(~.x %>% stack(.)) %>%
map(~.x %>% select(-ind))
[[1]]
values
1 1
2 2
3 3
4 4
5 5
6 6
[[2]]
values
1 7
2 8
3 9
4 1
5 4
6 6
[[3]]
values
1 3
2 4
3 1
4 8
5 3
6 5
7 3
8 1
9 9

Merging 2 datasets by calling on the row numbers (without using merge() or lookup functions)

Hi. This is a problem that I run into often in R programming, and I am in need of a simple solution from this community. In short, the problem requires a lookup value to be returned to a dataframe. I would like to call on the row number of the lookup table.
> x1 <- c(2, 3, 1, 5, 4)
> x2 <- c("a", "b", "c", "d", "e")
>
> set.seed(5)
> x3 <- round(runif (10, 1, 5))
>
> lookup.df <- data.frame(x1, x2)
> Data.df <- data.frame(x3)
> lookup.df
x1 x2
1 2 a
2 3 b
3 1 c
4 5 d
5 4 e
> Data.df
x3
1 2
2 4
3 5
4 2
5 1
6 4
7 3
8 4
9 5
10 1
Data.df$x2 <- lookup.df [ (matching row numbers from Data.df with lookup.df$x1) , 2 ]
In theory, the code should be able to generate a list that would look like
rows <- c(1, 5, 4, 1, 3, 5, 2, 5, 4, 3)
so that the following would result
> Data.df$x2 <- lookup.df [ rows , 2 ]
> Data.df
x3 x2
1 2 a
2 4 e
3 5 d
4 2 a
5 1 c
6 4 e
7 3 b
8 4 e
9 5 d
10 1 c
I appreciate any ideas. Thanks.
We can use a named vector to match
Data.df$x2 <- setNames(lookup.df$x2, lookup.df$x1)[as.character(Data.df$x3)]
-output
> Data.df
x3 x2
1 2 a
2 4 e
3 5 d
4 2 a
5 1 c
6 4 e
7 3 b
8 4 e
9 5 d
10 1 c
You may use the match function:
Data.df$x2 <- lookup.df$x2[match(Data.df$x3, lookup.df$x1)]
# x3 x2
#1 2 a
#2 4 e
#3 5 d
#4 2 a
#5 1 c
#6 4 e
#7 3 b
#8 4 e
#9 5 d
#10 1 c
From the title of the post I understand that you don't want to use merge function but that would be the most straightforward solution.
merge(lookup.df, Data.df, by.x = 'x1', by.y = 'x3')
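To connect the match answer to the question: match returns exactly the row-number vector the asker called rows. The sketch below hard-codes the x3 values printed in the question instead of regenerating them with runif, so it does not depend on the random number generator. Note also that merge sorts its result on the key columns by default, so it does not keep the original row order of Data.df.

```r
lookup.df <- data.frame(x1 = c(2, 3, 1, 5, 4),
                        x2 = c("a", "b", "c", "d", "e"))
# x3 values copied from the printout in the question
Data.df <- data.frame(x3 = c(2, 4, 5, 2, 1, 4, 3, 4, 5, 1))

# For each x3, the position of its first match in lookup.df$x1
rows <- match(Data.df$x3, lookup.df$x1)
rows
# [1] 1 5 4 1 3 5 2 5 4 3

# Index the lookup table by those row numbers
Data.df$x2 <- lookup.df[rows, 2]
Data.df
```

Values of x3 that have no counterpart in lookup.df$x1 would give NA in rows and therefore NA in the new column.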

Replacing NA in for/if loop in R

I'm running into an unexpected challenge in R. In my dataset, there are NA in certain columns. Some of these NAs SHOULD be present (the values are truly missing), while others should be replaced with 0s. I used code like the following:
df1 <- data.frame(x = c(1, 2, 3, 4, 5), y = c(10, 10, NA, NA, 12), z = c(9, 9, 9, 9, 9))
for (i in nrow(df1)){
if(df1$x[i] > 3){
df1$y[i] = 0
df1$z[i] = 0
}
}
And obtained this output
x y z
1 1 10 9
2 2 10 9
3 3 NA 9
4 4 NA 9
5 5 0 0
The NA SHOULD be preserved in row 3, but the NA in row 4 should have been replaced with 0. Further, the z value in row 4 did not update. Any ideas as to what is happening?
You've used for i in nrow(df1) which evaluates to for i in 5. I'm guessing you meant to use for i in 1:nrow(df1), which would evaluate to for i in 1:5 and include all rows.
Don't do it this way, R isn't Python, you get your vectorized functions out of the box:
df1[df1$x > 3, c('y', 'z')] <- 0
df1
# x y z
# 1 1 10 9
# 2 2 10 9
# 3 3 NA 9
# 4 4 0 0
# 5 5 0 0
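For comparison, here is a sketch of the loop with the index range corrected, next to the vectorized assignment. It uses seq_len(nrow(...)) rather than 1:nrow(...) because seq_len also behaves sensibly for a data frame with zero rows. The names loop_version and vec_version are illustrative only.

```r
df1 <- data.frame(x = c(1, 2, 3, 4, 5),
                  y = c(10, 10, NA, NA, 12),
                  z = c(9, 9, 9, 9, 9))

# Corrected loop: iterate over every row index, not just the last one
loop_version <- df1
for (i in seq_len(nrow(loop_version))) {
  if (loop_version$x[i] > 3) {
    loop_version$y[i] <- 0
    loop_version$z[i] <- 0
  }
}

# Vectorized equivalent from the answer above
vec_version <- df1
vec_version[vec_version$x > 3, c("y", "z")] <- 0

# Both approaches should agree, and the NA in row 3 is preserved
isTRUE(all.equal(loop_version, vec_version))
```

The loop version would stop with an error if x itself contained NA, since if() cannot evaluate a missing condition.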

In R, how to group data by multiple columns in the descending order, then give every group an index starting from 1?

My data is like:
a <- data.frame(a1=c(2,2,1,1,2,2,3,3),
a2=c(5,4,2,2,5,5,6,6),
a3=c(3,1,5,5,7,7,8,8))
Then, I sort the data like:
aa <- a %>%
arrange(desc(a3),desc(a2),desc(a1))
The data looks like:
> aa
a1 a2 a3
1 3 6 8
2 3 6 8
3 2 5 7
4 2 5 7
5 1 2 5
6 1 2 5
7 2 5 3
8 2 4 1
Now I need to group the data by a3, a2 and a1. So, in aa, rows 1 and 2 will be in one group, and rows 3 and 4 will be in another. Each group should then get an index, starting from 1, so the data should look like this:
> aa
a1 a2 a3 Index
1 3 6 8 1
2 3 6 8 1
3 2 5 7 2
4 2 5 7 2
5 1 2 5 3
6 1 2 5 3
7 2 5 3 4
8 2 4 1 5
In summary, I need to arrange the data in descending order first, then group it, then give every group an index starting from 1. Could anyone help me out here?
We could potentially use group_indices, but that would also have a reordering issue. Instead, an option is to paste (or str_c - from stringr) on the columns of interest and then match with unique values of pasted string
library(dplyr)
library(stringr)
aa %>%
mutate(Index = str_c(a1, a2, a3),
Index = match(Index, unique(Index)))
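Or, instead of arranging separately, do the sorting in the same pipeline with across (note that a1:a3 sorts by a1 first, then a2, then a3, unlike the a3, a2, a1 order used in the question)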
library(tidyr)
a %>%
arrange(across(a1:a3, desc)) %>%
unite(Index, everything(), remove = FALSE) %>%
mutate(Index = match(Index, unique(Index)))
Or with .GRP in data.table
library(data.table)
setDT(aa)[, Index := .GRP, .(a1, a2, a3)]
aa
# a1 a2 a3 Index
#1: 3 6 8 1
#2: 3 6 8 1
#3: 2 5 7 2
#4: 2 5 7 2
#5: 1 2 5 3
#6: 1 2 5 3
#7: 2 5 3 4
#8: 2 4 1 5
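One caveat for the paste-based keys above: joining values without a separator can make different groups look identical once the values have more than one digit. A small sketch with made-up data (the collide data frame is hypothetical, not from the question):

```r
# Two different groups whose digits concatenate to the same string
collide <- data.frame(a1 = c(1, 12), a2 = c(23, 3), a3 = c(4, 4))

# Without a separator both rows collapse to "1234"
no_sep <- paste0(collide$a1, collide$a2, collide$a3)
match(no_sep, unique(no_sep))

# With a separator the keys stay distinct: "1_23_4" and "12_3_4"
with_sep <- paste(collide$a1, collide$a2, collide$a3, sep = "_")
match(with_sep, unique(with_sep))
```

With the single-digit data in the question this makes no difference, and the unite and .GRP variants are not affected, since unite inserts "_" by default and .GRP groups on the columns themselves.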
Base R:
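# Sort by a3, then a2, then a1, all descending (as in the question)
a_ordered <- a[order(a$a3, a$a2, a$a1, decreasing = TRUE), ]
# Build one key per row; the separator keeps multi-digit values distinct
key <- with(a_ordered, paste(a1, a2, a3, sep = "_"))
# Number the groups in order of first appearance
a_ordered$idx <- match(key, unique(key))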
Data:
a <- data.frame(
a1 = c(2, 2, 1, 1, 2, 2, 3, 3),
a2 = c(5, 4, 2, 2, 5, 5, 6, 6),
a3 = c(3, 1, 5, 5, 7, 7, 8, 8)
)

Merge elements of dataframe in list based on partial match of names

I have a list:
dfa1 <- data.frame(x=c(1:5), y=c(2, 5, 7, 9, 10))
dfa2 <- data.frame(x=c(1:6), y=c(3, 8, 1, 2, 4, 13))
dfb1 <- data.frame(x=c(1:4), y=c(7, 9, 3, 2))
dfb2 <- data.frame(x=c(1:7), y=c(9, 3, 5, 1, 7, 9, 11))
lst <- list(a1=dfa1, a2=dfa2, b1=dfb1, b2=dfb2)
Based on the partial match of the element names on 'a' and 'b', I want to column-bind the dataframes, and the new list should look like below:
new_list
$a
x y1 y2
1 1 2 3
2 2 5 8
3 3 7 1
4 4 9 2
5 5 10 4
$b
x y1 y2
1 1 7 9
2 2 9 3
3 3 3 5
4 4 2 1
Here is a method with lapply and Reduce. lapply iterates through the letters "a" and "b" and applies Reduce to the list elements whose names contain the current letter. Reduce applies the merge function to the two data.frames, merging by the variable "x" and adding the desired suffixes with the given argument. Thanks to zx8754's suggestion, I added seq_along(grep(let, names(lst))) to allow the final names of the variables to increase by the number of group members.
myList <- lapply(c("a", "b"), function(let)
setNames(Reduce(function(x, y) merge(x, y, by="x"),
lst[grep(let, names(lst))]),
c("x", paste0("y", seq_along(grep(let, names(lst)))))))
[[1]]
x y1 y2
1 1 2 3
2 2 5 8
3 3 7 1
4 4 9 2
5 5 10 4
[[2]]
x y1 y2
1 1 7 9
2 2 9 3
3 3 3 5
4 4 2 1
To add names to the list it is probably easiest to do this afterward,
names(myList) <- c("a", "b")
You could also start with the vector
myVec <- c("a", "b")
and then use it in the lapply and in the names line.
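The group letters do not have to be typed in by hand. As a sketch, assuming every list name is a prefix followed by trailing digits (as in a1, a2, b1, b2), the prefixes can be derived from the names and used with split, which also names the result automatically:

```r
dfa1 <- data.frame(x = 1:5, y = c(2, 5, 7, 9, 10))
dfa2 <- data.frame(x = 1:6, y = c(3, 8, 1, 2, 4, 13))
dfb1 <- data.frame(x = 1:4, y = c(7, 9, 3, 2))
dfb2 <- data.frame(x = 1:7, y = c(9, 3, 5, 1, 7, 9, 11))
lst <- list(a1 = dfa1, a2 = dfa2, b1 = dfb1, b2 = dfb2)

# Strip the trailing digits to get the group of each element: "a" "a" "b" "b"
prefix <- sub("[0-9]+$", "", names(lst))

# split() returns one sub-list per prefix, named by that prefix
groups <- split(lst, prefix)

new_list <- lapply(groups, function(dfs) {
  # Inner-join all data frames of the group on x
  merged <- Reduce(function(x, y) merge(x, y, by = "x"), dfs)
  # Rename the value columns y1, y2, ... by position within the group
  setNames(merged, c("x", paste0("y", seq_along(dfs))))
})
new_list
```

As in the answer above, merge keeps only the x values present in every data frame of a group, which is why the "a" result has 5 rows and the "b" result has 4.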
