How to interlacely merge two matrices? - r

I am facing with the other problem in coding with R-Studio. I have two dataframes (with the same number of rows and colunms). Now I want to merge them two into one, but the 6 columns of dataframe 1 would be columns 1,3,5,7,9.11 in the new matrix; while those of data frame 2 would be 2,4,6,8,10,12 in the new merged dataframe.
I can do it with for loop but is there any smarter way/function to do it? Thank you in advance ; )

You can cbind them and then reorder the columns accordingly:
df1 <- as.data.frame(sapply(LETTERS[1:6], function(x) 1:3))
df2 <- as.data.frame(sapply(letters[1:6], function(x) 1:3))
cbind(df1, df2)[, matrix(seq_len(2*ncol(df1)), 2, byrow=T)]
# A a B b C c D d E e F f
# 1 1 1 1 1 1 1 1 1 1 1 1 1
# 2 2 2 2 2 2 2 2 2 2 2 2 2
# 3 3 3 3 3 3 3 3 3 3 3 3 3

The code below will produce your required result, and will also work if one data frame has more columns than the other
# create sample data
df1 <- data.frame(
a1 = 1:10,
a2 = 2:11,
a3 = 3:12,
a4 = 4:13,
a5 = 5:14,
a6 = 6:15
)
df2 <- data.frame(
b1=11:20,
b2=12:21,
b3=13:22,
b4=14:23,
b5=15:24,
b6=16:25
)
# join by interleaving columns
want <- cbind(df1,df2)[,order(c(1:length(df1),1:length(df2)))]
Explanation:
cbind(df1,df2) combines the data frames with all the df1 columns first, then all the df2 columns.
The [,...] element re-orders these columns.
c(1:length(df1),1:length(df2)) gives 1 2 3 4 5 6 1 2 3 4 5 6 - i.e. the order of the columns in df1, followed by the order in df2
order() of this gives 1 7 2 8 3 9 4 10 5 11 6 12 which is the required column order
So [, order(c(1:length(df1), 1:length(df2)] re-orders the columns so that the columns of the original data frames are interleaved as required.

Related

I have a list of data frames and a character vector. I want to rename the second column of each data frame by iterating through the vector. How do I?

I have a list of dataframes. Each of these dataframes has the same number of columns and rows, and has a similar data structure:
df.list <- list(data.frame1, data.frame2, data.frame3)
I have a vector of characters:
charvec <- c("a","b","c")
I want to replace the column name of the second column in each data frame by iterating through the above character vector. For example, the first data frame's second column should be "a". The second data frame's second column should be "b".
[[1]]
col1 a
1 1 2
2 2 3
[[2]]
col1 b
1 1 2
2 2 3
A reproducible example:
charvec <- c("a","b","c")
df_list <- list(df1 = data.frame(x = seq_len(3), y = seq_len(3)), df2 = data.frame(x = seq_len(4), y = seq_len(4)), df3 = data.frame(x = seq_len(5), y = seq_len(5)))
for(i in seq_along(df_list)){
names(df_list[[i]])[2] <- charvec[i]
}
> df_list
$df1
x a
1 1 1
2 2 2
3 3 3
$df2
x b
1 1 1
2 2 2
3 3 3
4 4 4
$df3
x c
1 1 1
2 2 2
3 3 3
4 4 4
5 5 5
Also can use map2 from purrr. Thanks to #ismirsehregal for example data.
library(purrr)
map2(
df_list,
charvec,
\(x, y) {
names(x)[2] <- y
x
}
)
Output
$df1
x a
1 1 1
2 2 2
3 3 3
$df2
x b
1 1 1
2 2 2
3 3 3
4 4 4
$df3
x c
1 1 1
2 2 2
3 3 3
4 4 4
5 5 5

From axis values to coodinates pairs [duplicate]

I have two vectors of integers, say v1=c(1,2) and v2=c(3,4), I want to combine and obtain this as a result (as a data.frame, or matrix):
> combine(v1,v2) <--- doesn't exist
1 3
1 4
2 3
2 4
This is a basic case. What about a little bit more complicated - combine every row with every other row? E.g. imagine that we have two data.frames or matrices d1, and d2, and we want to combine them to obtain the following result:
d1
1 13
2 11
d2
3 12
4 10
> combine(d1,d2) <--- doesn't exist
1 13 3 12
1 13 4 10
2 11 3 12
2 11 4 10
How could I achieve this?
For the simple case of vectors there is expand.grid
v1 <- 1:2
v2 <- 3:4
expand.grid(v1, v2)
# Var1 Var2
#1 1 3
#2 2 3
#3 1 4
#4 2 4
I don't know of a function that will automatically do what you want to do for dataframes(See edit)
We could relatively easily accomplish this using expand.grid and cbind.
df1 <- data.frame(a = 1:2, b=3:4)
df2 <- data.frame(cat = 5:6, dog = c("a","b"))
expand.grid(df1, df2) # doesn't work so let's try something else
id <- expand.grid(seq(nrow(df1)), seq(nrow(df2)))
out <-cbind(df1[id[,1],], df2[id[,2],])
out
# a b cat dog
#1 1 3 5 a
#2 2 4 5 a
#1.1 1 3 6 b
#2.1 2 4 6 b
Edit: As Joran points out in the comments merge does this for us for data frames.
df1 <- data.frame(a = 1:2, b=3:4)
df2 <- data.frame(cat = 5:6, dog = c("a","b"))
merge(df1, df2)
# a b cat dog
#1 1 3 5 a
#2 2 4 5 a
#3 1 3 6 b
#4 2 4 6 b

R Sum columns by index

I need to find a way to sum columns by their index,I'm working on a bigread.csv file, I'll show here a sample of the problem; I'd like for example to sum from the 2nd to the 5th and from the 6th to the 7h the following matrix:
a 1 3 3 4 5 6
b 2 1 4 3 4 1
c 1 3 2 1 1 5
d 2 2 4 3 1 3
The result has to be like this:
a 11 11
b 10 5
c 7 6
d 8 4
The columns have all different names
We can use rowSums on the subset of columns i.e 2:5 and 6:7 separately and then create a new data.frame with the output.
data.frame(df1[1], Sum1=rowSums(df1[2:5]), Sum2=rowSums(df1[6:7]))
# id Sum1 Sum2
#1 a 11 11
#2 b 10 5
#3 c 7 6
#4 d 11 4
The package dplyr has a function exactly made for that purpose:
require(dplyr)
df1 = data.frame(a=c(1,2,3,4,3,3),b=c(1,2,3,2,1,2),c=c(1,2,3,21,2,3))
df2 = df1 %>% transmute(sum1 = a+b , sum2 = b+c)
df2 = df1 %>% transmute(sum1 = .[[1]]+.[[2]], sum2 = .[[2]]+.[[3]])

Subset columns of data frames contained in list based on matrix of indices

I have a list that contains many data frames, and I have a matrix representing the index positions of columns of interest, with each row for each successive data frame. I am trying to subset each of the data frames within that list based on the matrix.
df1 <- data.frame(id=letters[1:4], result1=1:4, result2=1:4, result3=1:4)
df2 <- data.frame(id=letters[1:4], result1=5:8, result2=1:4, result3=1:4)
df3 <- data.frame(id=letters[1:4], result1=9:12, result2=1:4, result3=1:4)
df4 <- data.frame(id=letters[1:4], result1=13:16, result2=1:4, result3=1:4)
dflist <- list(df1, df2, df3, df4)
indices <- matrix(c(1,1,1,1,2,2,4,3),nrow=4, ncol=2)
So the data frames look like this:
[[1]]
id result1 result2 result3
1 a 1 1 1
2 b 2 2 2
3 c 3 3 3
4 d 4 4 4
[[2]]
id result1 result2 result3
1 a 5 1 1
2 b 6 2 2
3 c 7 3 3
4 d 8 4 4
[[3]]
id result1 result2 result3
1 a 9 1 1
2 b 10 2 2
3 c 11 3 3
4 d 12 4 4
[[4]]
id result1 result2 result3
1 a 13 1 1
2 b 14 2 2
3 c 15 3 3
4 d 16 4 4
and the index matrix looks like this
[,1] [,2]
[1,] 1 2
[2,] 1 2
[3,] 1 4
[4,] 1 3
From the first data frame, I want to subset columns 1 and 2, from the second dataframe I want columns 1, 2, from the third, I want columns 1 and 4, etc.
I can achieve this one by one using:
dflist[[1]][indices[1,]]
But I can't figure out a way to do for all at once (I tried lapply() and sapply() without luck)
You could loop on the indices
lapply(1:4, function(i) dflist[[i]][indices[i,]]) # or 1:nrow(indices) as #bgoldst suggests
Or, using mapply to operate on the rows of indices and the dflist
mapply(function(a, b) a[,b], dflist, split(indices, row(indices)), SIMPLIFY = F)
This could be simplified further as suggested by #Frank, using Map (a wrapper for mapply) and removing the anonymous function
Map(`[`, dflist, split(indices,row(indices)))

How to combine two data frames using dplyr or other packages?

I have two data frames:
df1 = data.frame(index=c(0,3,4),n1=c(1,2,3))
df1
# index n1
# 1 0 1
# 2 3 2
# 3 4 3
df2 = data.frame(index=c(1,2,3),n2=c(4,5,6))
df2
# index n2
# 1 1 4
# 2 2 5
# 3 3 6
I want to join these to:
index n
1 0 1
2 1 4
3 2 5
4 3 8 (index 3 in two df, so add 2 and 6 in each df)
5 4 3
6 5 0 (index 5 not exists in either df, so set 0)
7 6 0 (index 6 not exists in either df, so set 0)
The given data frames are just part of large dataset. Can I do it using dplyr or other packages in R?
Using data.table (would be efficient for bigger datasets). I am not changing the column names, as the rbindlist uses the name of the first dataset ie. in this case n from the second column (Don't know if it is a feature or bug). Once you join the datasets by rbindlist, group it by column index i.e. (by=index) and do the sum of n column (list(n=sum(n)) )
library(data.table)
rbindlist(list(data.frame(index=0:6,n=0), df1,df2))[,list(n=sum(n)), by=index]
index n
#1: 0 1
#2: 1 4
#3: 2 5
#4: 3 8
#5: 4 3
#6: 5 0
#7: 6 0
Or using dplyr. Here, the column names of all the datasets should be the same. So, I am changing it before binding the datasets using rbind_list. If the names are different, there will be multiple columns for each name. After joining the datasets, group it by index and then use summarize and do the sum of column n.
library(dplyr)
nm1 <- c("index", "n")
colnames(df1) <- colnames(df2) <- nm1
rbind_list(df1,df2, data.frame(index=0:6, n=0)) %>%
group_by(index) %>%
summarise(n=sum(n))
This is something you could do with the base functions aggregate and rbind
df1 = data.frame(index=c(0,3,4),n=c(1,2,3))
df2 = data.frame(index=c(1,2,3),n=c(4,5,6))
aggregate(n~index, rbind(df1, df2, data.frame(index=0:6, n=0)), sum)
which returns
index n
1 0 1
2 1 4
3 2 5
4 3 8
5 4 3
6 5 0
7 6 0
How about
names(df1) <- c("index", "n") # set colnames of df1 to target
df3 <- rbind(df1,setNames(df2, names(df1))) # set colnnames of df2 and join
df <- df3 %>% dplyr::arrange(index) # sort by index
Cheers.

Resources