Subset columns of data frames contained in list based on matrix of indices - r

I have a list that contains many data frames, and I have a matrix representing the index positions of columns of interest, with each row for each successive data frame. I am trying to subset each of the data frames within that list based on the matrix.
df1 <- data.frame(id=letters[1:4], result1=1:4, result2=1:4, result3=1:4)
df2 <- data.frame(id=letters[1:4], result1=5:8, result2=1:4, result3=1:4)
df3 <- data.frame(id=letters[1:4], result1=9:12, result2=1:4, result3=1:4)
df4 <- data.frame(id=letters[1:4], result1=13:16, result2=1:4, result3=1:4)
dflist <- list(df1, df2, df3, df4)
indices <- matrix(c(1,1,1,1,2,2,4,3),nrow=4, ncol=2)
So the data frames look like this:
[[1]]
id result1 result2 result3
1 a 1 1 1
2 b 2 2 2
3 c 3 3 3
4 d 4 4 4
[[2]]
id result1 result2 result3
1 a 5 1 1
2 b 6 2 2
3 c 7 3 3
4 d 8 4 4
[[3]]
id result1 result2 result3
1 a 9 1 1
2 b 10 2 2
3 c 11 3 3
4 d 12 4 4
[[4]]
id result1 result2 result3
1 a 13 1 1
2 b 14 2 2
3 c 15 3 3
4 d 16 4 4
and the index matrix looks like this
[,1] [,2]
[1,] 1 2
[2,] 1 2
[3,] 1 4
[4,] 1 3
From the first data frame, I want to subset columns 1 and 2, from the second dataframe I want columns 1, 2, from the third, I want columns 1 and 4, etc.
I can achieve this one by one using:
dflist[[1]][indices[1,]]
But I can't figure out a way to do this for all of them at once (I tried lapply() and sapply() without luck).

You could loop on the indices
lapply(1:4, function(i) dflist[[i]][indices[i,]]) # or 1:nrow(indices) as @bgoldst suggests
Or, using mapply to operate on the rows of indices and the dflist
mapply(function(a, b) a[,b], dflist, split(indices, row(indices)), SIMPLIFY = FALSE)
This could be simplified further as suggested by @Frank, using Map (a wrapper for mapply) and removing the anonymous function
Map(`[`, dflist, split(indices,row(indices)))
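As a quick check, the sketch below rebuilds the example data from the question and compares the three approaches. The names idx_rows, by_loop, by_mapply and by_map are made up for illustration and do not appear in the original answer.

```r
# Rebuild the example data from the question
df1 <- data.frame(id = letters[1:4], result1 = 1:4,   result2 = 1:4, result3 = 1:4)
df2 <- data.frame(id = letters[1:4], result1 = 5:8,   result2 = 1:4, result3 = 1:4)
df3 <- data.frame(id = letters[1:4], result1 = 9:12,  result2 = 1:4, result3 = 1:4)
df4 <- data.frame(id = letters[1:4], result1 = 13:16, result2 = 1:4, result3 = 1:4)
dflist <- list(df1, df2, df3, df4)
indices <- matrix(c(1, 1, 1, 1, 2, 2, 4, 3), nrow = 4, ncol = 2)

# One vector of column positions per data frame (one per row of the matrix)
idx_rows <- split(indices, row(indices))

by_loop   <- lapply(seq_len(nrow(indices)), function(i) dflist[[i]][indices[i, ]])
by_mapply <- mapply(function(a, b) a[, b], dflist, idx_rows, SIMPLIFY = FALSE)
by_map    <- Map(`[`, dflist, idx_rows)

# Column names selected from each data frame, one column per list element
sapply(by_map, names)

# Compare the results of the three approaches
all.equal(by_loop, by_mapply, check.attributes = FALSE)
all.equal(by_loop, by_map, check.attributes = FALSE)
```

Using seq_len(nrow(indices)) instead of 1:4 keeps the loop version working if the number of data frames changes.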

Related

How to interlacely merge two matrices?

I am facing another problem while coding in RStudio. I have two data frames (with the same number of rows and columns). Now I want to merge the two into one, but the 6 columns of data frame 1 should be columns 1, 3, 5, 7, 9, 11 in the new data frame, while those of data frame 2 should be columns 2, 4, 6, 8, 10, 12.
I can do it with a for loop, but is there a smarter way or function to do it? Thank you in advance ; )
You can cbind them and then reorder the columns accordingly:
df1 <- as.data.frame(sapply(LETTERS[1:6], function(x) 1:3))
df2 <- as.data.frame(sapply(letters[1:6], function(x) 1:3))
cbind(df1, df2)[, matrix(seq_len(2*ncol(df1)), 2, byrow=T)]
# A a B b C c D d E e F f
# 1 1 1 1 1 1 1 1 1 1 1 1 1
# 2 2 2 2 2 2 2 2 2 2 2 2 2
# 3 3 3 3 3 3 3 3 3 3 3 3 3
The code below will produce your required result, and will also work if one data frame has more columns than the other
# create sample data
df1 <- data.frame(
a1 = 1:10,
a2 = 2:11,
a3 = 3:12,
a4 = 4:13,
a5 = 5:14,
a6 = 6:15
)
df2 <- data.frame(
b1=11:20,
b2=12:21,
b3=13:22,
b4=14:23,
b5=15:24,
b6=16:25
)
# join by interleaving columns
want <- cbind(df1,df2)[,order(c(1:length(df1),1:length(df2)))]
Explanation:
cbind(df1,df2) combines the data frames with all the df1 columns first, then all the df2 columns.
The [,...] element re-orders these columns.
c(1:length(df1),1:length(df2)) gives 1 2 3 4 5 6 1 2 3 4 5 6 - i.e. the order of the columns in df1, followed by the order in df2
order() of this gives 1 7 2 8 3 9 4 10 5 11 6 12 which is the required column order
So [, order(c(1:length(df1), 1:length(df2)))] re-orders the columns so that the columns of the original data frames are interleaved as required.
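To illustrate the claim about unequal column counts, here is a minimal sketch with made-up data frames (3 columns and 2 columns). The names pos and interleaved are only for illustration.

```r
# Hypothetical example data: df1 has 3 columns, df2 has 2
df1 <- data.frame(a1 = 1:2, a2 = 3:4, a3 = 5:6)
df2 <- data.frame(b1 = 7:8, b2 = 9:10)

# Position of each column within its own data frame: 1 2 3 1 2
pos <- c(seq_along(df1), seq_along(df2))

# order() leaves ties in their original order, so a1 comes before b1, and so on
interleaved <- cbind(df1, df2)[, order(pos)]
names(interleaved)
# "a1" "b1" "a2" "b2" "a3"
```

The leftover column of the wider data frame simply ends up last.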

I have a list of data frames and a character vector. I want to rename the second column of each data frame by iterating through the vector. How do I?

I have a list of dataframes. Each of these dataframes has the same number of columns and rows, and has a similar data structure:
df.list <- list(data.frame1, data.frame2, data.frame3)
I have a vector of characters:
charvec <- c("a","b","c")
I want to replace the column name of the second column in each data frame by iterating through the above character vector. For example, the first data frame's second column should be "a". The second data frame's second column should be "b".
[[1]]
col1 a
1 1 2
2 2 3
[[2]]
col1 b
1 1 2
2 2 3
A reproducible example:
charvec <- c("a","b","c")
df_list <- list(df1 = data.frame(x = seq_len(3), y = seq_len(3)), df2 = data.frame(x = seq_len(4), y = seq_len(4)), df3 = data.frame(x = seq_len(5), y = seq_len(5)))
for(i in seq_along(df_list)){
names(df_list[[i]])[2] <- charvec[i]
}
> df_list
$df1
x a
1 1 1
2 2 2
3 3 3
$df2
x b
1 1 1
2 2 2
3 3 3
4 4 4
$df3
x c
1 1 1
2 2 2
3 3 3
4 4 4
5 5 5
You can also use map2 from purrr. Thanks to @ismirsehregal for the example data.
library(purrr)
map2(
df_list,
charvec,
\(x, y) {
names(x)[2] <- y
x
}
)
Output
$df1
x a
1 1 1
2 2 2
3 3 3
$df2
x b
1 1 1
2 2 2
3 3 3
4 4 4
$df3
x c
1 1 1
2 2 2
3 3 3
4 4 4
5 5 5
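The \(x, y) shorthand used with map2 needs R 4.1 or later. As a sketch of the same idea without purrr, base R's Map can walk over the list and the character vector in parallel. The name renamed is made up here, and the example data is the one from the answer above.

```r
charvec <- c("a", "b", "c")
df_list <- list(
  df1 = data.frame(x = seq_len(3), y = seq_len(3)),
  df2 = data.frame(x = seq_len(4), y = seq_len(4)),
  df3 = data.frame(x = seq_len(5), y = seq_len(5))
)

# Pair each data frame with the matching element of charvec
renamed <- Map(function(d, nm) {
  names(d)[2] <- nm  # rename the second column only
  d                  # return the modified copy
}, df_list, charvec)

# Column names of each renamed data frame
sapply(renamed, names)
```

Unlike the for loop, this leaves df_list untouched and returns a new list, so the result has to be assigned if the renaming should persist.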

Remove the last column of dataframe in R in a function

I need to remove the last column of 10 dataframes, so I decided to put it in lapply(). I wrote a function to remove the col, like below,
remove_col <- function(mydata){
mydata = subset(mydata, select=-c(24))
}
and created mylist <- list(data1, data2.... data10), then I called lapply as
lapply(mylist, FUN = remove_col)
It did give me a list of data frames with the column removed; however, when I checked the original data frames, the last column was still there.
How should I change the code to change the original dataset?
You need to assign the result of the function call to the input list on the LHS:
mylist <- lapply(mylist, FUN = remove_col)
Had you defined your function with an explicit return value, this might have been more obvious:
remove_col <- function(mydata) {
mydata <- subset(mydata, select=-c(24))
return(mydata) # return the modified list/data frame
}
Instead of hardcoding the column number to remove you can use ncol to remove the last column from each dataframe.
remove_col <- function(mydata){
mydata[, -ncol(mydata), drop = FALSE] # drop = FALSE keeps a data frame even if only one column remains
}
mylist <- lapply(mylist, remove_col)
To see the changes in the original data frames, you can assign names to the list of data frames and use list2env.
names(mylist) <- paste0('data', seq_along(mylist))
list2env(mylist, .GlobalEnv)
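Putting the pieces together, a minimal self-contained sketch might look like this. The data frames and the helper name remove_last_col are hypothetical, and the list is named up front so that list2env can write the elements back under the original variable names.

```r
# Hypothetical two-column data frames, so removing one column leaves a single column
data1 <- data.frame(c1 = 1:3, c2 = 4:6)
data2 <- data.frame(c1 = 7:9, c2 = 10:12)
mylist <- list(data1 = data1, data2 = data2)

# Remove the last column, whatever its position; drop = FALSE keeps a data frame
remove_last_col <- function(d) d[, -ncol(d), drop = FALSE]

# lapply returns a new list, so the result must be assigned
mylist <- lapply(mylist, remove_last_col)

# Overwrite data1 and data2 in the current environment with the list elements
list2env(mylist, envir = environment())
names(data1)
```

At the top level of a script, environment() is the global environment, so this has the same effect as passing .GlobalEnv.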
Using base R and lapply. Note: you can remove ", drop = F" from the call if every data frame in the list has more than 2 columns; with exactly 2 columns it is needed to keep the result a data frame.
> d1
c1 c2
1 1 6
2 2 7
3 3 8
4 4 9
5 5 10
> d2
c1 c2
1 5 10
2 4 9
3 3 8
4 2 7
5 1 6
> mylist <- list(d1, d2)
> mylist
[[1]]
c1 c2
1 1 6
2 2 7
3 3 8
4 4 9
5 5 10
[[2]]
c1 c2
1 5 10
2 4 9
3 3 8
4 2 7
5 1 6
> lapply(mylist, function(x) x[,1:(ncol(x)-1), drop = F] )
[[1]]
c1
1 1
2 2
3 3
4 4
5 5
[[2]]
c1
1 5
2 4
3 3
4 2
5 1

R How to permute all rows of a data frame such that all possible combinations of rows are returned in a list?

I'm trying to produce all possible row permutations of a data frame (or matrix if that's easier) and have an object returned as a list or array of the data frames/matrices. I've constructed a mock data frame that has the same dimensions as the one I'm working with.
test.df <- as.data.frame(matrix(1:80,nrow=16,ncol=5))
Edit: changed combinations to permutations
v.df <- data.frame(symbol = c("a", "b", "c"), number = c(1,2,3))
v.df
## symbol number
## 1 a 1
## 2 b 2
## 3 c 3
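library(gtools) # provides permutations()
permutate.rows <- function(df) {
k <- nrow(df) # number of rows
# each column of index.df is one ordering of the row numbers 1..k
index.df <- as.data.frame(t(permutations(n = k, r = k, v = 1:k)))
# return the list visibly so that calling the function prints it
lapply(index.df, function(idx) df[idx, , drop = FALSE])
}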
permutate.rows(v.df)
gives the list of all permuted data frames:
$V1
symbol number
1 a 1
2 b 2
3 c 3
$V2
symbol number
1 a 1
3 c 3
2 b 2
$V3
symbol number
2 b 2
1 a 1
3 c 3
$V4
symbol number
2 b 2
3 c 3
1 a 1
$V5
symbol number
3 c 3
1 a 1
2 b 2
$V6
symbol number
3 c 3
2 b 2
1 a 1
To apply it to your example, pass your own 16-row data frame instead of the 3-row one.
I shortened the data frame because 16! = 20922789888000, which is far too many permutations to enumerate.
library(purrr)
library(combinat)
test.df <- as.data.frame(matrix(1:25,nrow=5,ncol=5))
map(permn(1:nrow(test.df)), function(x) test.df[x,])
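If installing a package is not an option, the same idea can be sketched in base R with a small recursive helper. The function all_perms and the objects small.df and perm.list are made up for this illustration, and the approach is only practical for a handful of rows because the number of permutations grows factorially.

```r
# Recursively build every ordering of the vector v; returns a list of vectors
all_perms <- function(v) {
  if (length(v) <= 1) return(list(v))
  out <- list()
  for (i in seq_along(v)) {
    # put v[i] first, then append each permutation of the remaining elements
    for (p in all_perms(v[-i])) {
      out[[length(out) + 1]] <- c(v[i], p)
    }
  }
  out
}

small.df <- data.frame(symbol = c("a", "b", "c"), number = 1:3)

# One data frame per ordering of the row numbers
perm.list <- lapply(all_perms(seq_len(nrow(small.df))),
                    function(idx) small.df[idx, , drop = FALSE])
length(perm.list)  # 3! = 6

# Number of permutations for a 16-row data frame
factorial(16)
```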

Select unique values from a list of 3

I would like to list all unique combinations of vectors of length 3 where each element of the vector can range between 1 to 9.
First I list all such combinations:
df <- expand.grid(1:9, 1:9, 1:9)
Then I would like to remove the rows that contain repetitions.
For example:
1 1 9
9 1 1
1 9 1
should only be included once.
In other words, if two rows contain the same numbers, each the same number of times, only one of them should be kept.
Note that
8 8 8 or
9 9 9 is fine as long as it only appears once.
Based on your approach and the idea to remove repetitions:
df <- expand.grid(1:2, 1:2, 1:2)
# Var1 Var2 Var3
# 1 1 1 1
# 2 2 1 1
# 3 1 2 1
# 4 2 2 1
# 5 1 1 2
# 6 2 1 2
# 7 1 2 2
# 8 2 2 2
df2 <- unique(t(apply(df, 1, sort))) #class matrix
# [,1] [,2] [,3]
# [1,] 1 1 1
# [2,] 1 1 2
# [3,] 1 2 2
# [4,] 2 2 2
df2 <- as.data.frame(df2) #class data.frame
There are probably more efficient methods, but if I understand you correctly, that is the result you want.
Maybe something like this (your data frame is not large, so efficiency is not a concern):
len <- apply(df,1,function(x) length(unique(x)))
res <- rbind(df[len!=2,], df[unique(apply(df[len==2,],1,prod)),])
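Here is what is done:
First, get the number of unique elements per row (len).
Then build the result with rbind, which comprises two steps:
First argument of rbind: rows with either 1 unique element (e.g. 1 1 1, 7 7 7, etc.) or 3 unique elements (e.g. 5 8 7, 2 4 9, etc.) are included in the final result res.
Second argument of rbind: for rows with 2 unique elements (e.g. 1 1 9, 3 5 3, etc.), we take the product per row and keep only the unique products (because, for example, 3 3 5, 3 5 3 and 5 3 3 all have the same product).
Caveats: rows with 3 distinct values are kept in every ordering (5 8 7 and 7 8 5 both stay), different rows can share a product (1 1 4 and 1 2 2 both give 4), and unique(...) returns the products themselves, which are then used as row numbers of df. The result should therefore be checked against the sort-based approach above.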
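One way to sanity-check any of these approaches on the full 1 to 9 range is to count the rows: the number of unordered triples with repetition allowed from 9 values is choose(9 + 3 - 1, 3) = 165. The sketch below applies the sort-based idea to the full grid and compares it with a simple filter that keeps only non-decreasing rows. The names sorted_rows, combos and alt are made up for illustration.

```r
df <- expand.grid(1:9, 1:9, 1:9)

# Sort within each row so that 1 1 9, 9 1 1 and 1 9 1 all become 1 1 9
sorted_rows <- t(apply(df, 1, sort))
combos <- unique(sorted_rows)
nrow(combos)

# Alternative: keep only the rows that are already in non-decreasing order
alt <- df[df$Var1 <= df$Var2 & df$Var2 <= df$Var3, ]
nrow(alt)

# Expected count of unordered triples with repetition from 9 values
choose(9 + 3 - 1, 3)
```

The filter version avoids apply() entirely, which may matter if the ranges grow.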
