Creating combinations of two vectors - r

Suppose the following situation. There are two tables, each with data of different quality. Both contain the same variables A, B and C. The variables in the first table are called A1, B1 and C1, while those in the second table are called A2, B2 and C2.
The first table can be updated with the second table. Leaving aside the two unmixed cases, where all variables come from the same table, there are six possible combinations:
A1, B1, C2
A1, B2, C1
A2, B1, C1
A1, B2, C2
A2, B1, C2
A2, B2, C1
The question is how to get that in R. What I'm using is what follows:
require(utils)
require(stringr)
vars <- c("A1", "B1", "C1", "A2", "B2", "C2")
combine <- function(data, n){
  com1 = combn(data, n)  # make all combinations
  com2 = c(str_sub(com1, end=-2L))  # remove the number at the end of each name
  com3 = matrix(com2, nrow = dim(com1)[1], ncol = dim(com1)[2])  # vector to matrix
  com3 = split(com3, rep(1:ncol(com3), each = nrow(com3)))  # matrix to list
  com3 = lapply(com3, duplicated)  # flag duplicated names within each combination
  com3 = lapply(com3, function(X){X[which(!any(X == TRUE))]})  # keep one FALSE if no name is duplicated, nothing otherwise
  pos = which(as.numeric(com3) == 0)  # positions of the combinations without duplicated names
  com3 = com1[,pos]  # return elements from the original matrix
  com3 = split(com3, rep(1:ncol(com3), each = nrow(com3)))  # matrix to list
  com3 = lapply(com3, sort)  # sort by alphabetical order
  com3 = as.data.frame(com3, stringsAsFactors = FALSE)  # list to data frame
  res = list(positions = pos, combinations = com3)  # return positions and combinations
  return(res)
}
combine(vars, 3)
$positions
[1] 1 4 6 10 11 15 17 20
$combinations
X1 X2 X3 X4 X5 X6 X7 X8
1 A1 A1 A1 A1 A2 A2 A2 A2
2 B1 B1 B2 B2 B1 B1 B2 B2
3 C1 C2 C1 C2 C1 C2 C1 C2
I'd like to know if anyone knows a more straightforward solution than creating all possible combinations and afterwards cleaning up the result as my function does.

You're overthinking the problem. Just use expand.grid:
> expand.grid(c('A1','A2'),c('B1','B2'),c('C1','C2'))
Var1 Var2 Var3
1 A1 B1 C1
2 A2 B1 C1
3 A1 B2 C1
4 A2 B2 C1
5 A1 B1 C2
6 A2 B1 C2
7 A1 B2 C2
8 A2 B2 C2
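If the variable names are only available as a single vector, as in the question, the inputs to expand.grid can be derived from it. The following is a minimal sketch, assuming every name consists of a base name followed by a numeric table suffix; the helper names base_name, choices, combos and mixed are made up for illustration.

```r
vars <- c("A1", "B1", "C1", "A2", "B2", "C2")

# Strip the trailing digits to get the base variable name (A, B, C).
base_name <- sub("[0-9]+$", "", vars)

# Group the candidates by base name: list(A = c("A1", "A2"), B = ..., C = ...).
choices <- split(vars, base_name)

# One row per way of picking exactly one candidate for each variable.
combos <- expand.grid(choices, stringsAsFactors = FALSE)

# Optionally drop the rows that take every variable from the same table.
suffix <- sapply(combos, function(x) sub("^[^0-9]*", "", x))
mixed <- combos[apply(suffix, 1, function(r) length(unique(r)) > 1), ]

print(combos)
print(mixed)
```

Here combos holds all eight rows, and mixed keeps the six rows that draw on both tables.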

Related

Is there a way to print the two text files one after the other inside jupyter notebook

I have two text files:
First.txt
A1 B1 C1
A2 B1 C2
A3 B2 C2
A4 B3 C3
and
Second.txt
C1 D1
C2 D1
C3 D2
Here is the code written in jupyter notebook:
!type First.txt Second.txt
The output is given as:
A1 B1 C1
A2 B1 C2
A3 B2 C2
A4 B3 C3C1 D1
C2 D1
C3 D2
but I want the output to be like:
A1 B1 C1
A2 B1 C2
A3 B2 C2
A4 B3 C3
C1 D1
C2 D1
C3 D2
How can I print the two text files one after the other?
I would not resort to shell commands for this. Just this snippet should work fine:
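for filename in ["First.txt", "Second.txt"]:
    with open(filename) as f:
        for line in f:
            # Strip the line's own newline so print() does not add a second one;
            # this also covers a file whose last line has no trailing newline.
            print(line.rstrip("\n"))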

RStudio: Compare two different data tables when row.name is same

I have two data frames with different columns and rows:
The first table has the names of specifically expressed genes in rows (750 entries) and statistical results (p-value, fold change) in columns (a 750x2 matrix).
The second table has the names of all expressed genes (13,000) with their associated genes in the same row (rows can be up to 100 entries long), giving a 13,000x100 matrix.
I am interested in creating a data frame with the 750 expressed gene names from the first table, using a match function in R to insert the associated genes from the second table.
Example:
First data table
Name Fold Change P-value
A 0 3
B 1 4
F 2 6
H 1 8
Second Data table
Name X1 X2 X3 X4 X5
A A1 A2 A3 A4 A5
B B1 B2 B3 B4 B5
C C1 C2 C3 C4 C5
D D1 D2 D3 D4 D5
E E1 E2 E3 E4 E5
F F1 F2 F3 F4 F5
Desired Output
Name X1 X2 X3 X4 X5
A A1 A2 A3 A4 A5
B B1 B2 B3 B4 B5
D D1 D2 D3 D4 D5
F F1 F2 F3 F4 F5
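One possible approach is to look up the names of the first table in the second with match(). This is a minimal sketch that rebuilds the example tables in memory; the object names first, second, result and merged, and the column names FoldChange and PValue, are assumptions made for illustration.

```r
# Example data rebuilt from the tables in the question.
first <- data.frame(Name = c("A", "B", "F", "H"),
                    FoldChange = c(0, 1, 2, 1),
                    PValue = c(3, 4, 6, 8),
                    stringsAsFactors = FALSE)
second <- data.frame(Name = c("A", "B", "C", "D", "E", "F"),
                     stringsAsFactors = FALSE)
for (i in 1:5) second[[paste0("X", i)]] <- paste0(second$Name, i)

# Position of each name of the first table in the second (NA if absent).
idx <- match(first$Name, second$Name)

# Keep only the rows of the second table that were found.
result <- second[idx[!is.na(idx)], ]
print(result)

# merge() does the same lookup and also keeps the statistics columns.
merged <- merge(first, second, by = "Name")
print(merged)
```

With the example data this returns the rows for A, B and F: H has no entry in the second table, and D does not occur in the first table, so neither can be matched.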

append three excel sheet's data with same header in R

The file data.xlsx contains three sheets: S1, S2 and S3. All of them use the same header. How can I merge their data into one data frame?
data.xlsx S1 sheet
A B C
a1 b1 c1
data.xlsx S2 sheet
A B C
a2 b2 c2
data.xlsx S3 sheet
A B C
a3 b3 c3
Here is my starting code
s1 = read.xlsx('data.xlsx', sheetName='S1') # contains 2 rows
s2 = read.xlsx('data.xlsx', sheetName='S2') # contains 3 rows
s3 = read.xlsx('data.xlsx', sheetName='S3') # contains 4 rows
all = s1 + s2 + s3 # of course this is wrong code
I want all to contain:
A B C
a1 b1 c1
a2 b2 c2
a3 b3 c3
Use rbind:
rbind(s1, s2, s3)
This assumes that s1, s2 and s3 have the same number of columns and the same column names.
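When the number of sheets grows, the same idea can be applied to a list of data frames with do.call. The following is a minimal sketch using in-memory stand-ins for the three sheets; the names sheets and all_rows are made up for illustration.

```r
# Stand-ins for the three sheets. With the real file, the list could be built
# with the call from the question, e.g.
#   lapply(c("S1", "S2", "S3"), function(s) read.xlsx("data.xlsx", sheetName = s))
s1 <- data.frame(A = "a1", B = "b1", C = "c1", stringsAsFactors = FALSE)
s2 <- data.frame(A = "a2", B = "b2", C = "c2", stringsAsFactors = FALSE)
s3 <- data.frame(A = "a3", B = "b3", C = "c3", stringsAsFactors = FALSE)
sheets <- list(S1 = s1, S2 = s2, S3 = s3)

# rbind() accepts any number of data frames, so do.call() passes the whole list.
all_rows <- do.call(rbind, sheets)

# Discard the row names derived from the list names.
rownames(all_rows) <- NULL
print(all_rows)
```

This avoids spelling out each sheet by hand, at the cost of the same requirement as before: all sheets must share the same column names.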

Recursively building a list of dataframes (in R) - is loop only option?

I have a dataframe with n columns and I need to obtain combinations of its variables:
E.g.:
df <- data.frame(A = c("a1","a2","a3","a4","a5","a6"),
                 B = c("a1","a1","a3","a3","a5","a5"),
                 C = c("a1","a1","a1","a3","a4","a4"),
                 D = c("a1","a1","a1","a3","a4","a5"))
I need to make a list with n-1 elements, each containing all the unique combinations of the data frame's variables. The first element holds the unique rows over all columns, from the first to the last. For each subsequent element I need to drop the first column of the previously appended data frame. Like this:
myList <- list(unique(df[, 1:ncol(df)]),
               unique(df[, 2:ncol(df)]),
               unique(df[, 3:ncol(df)]))
I managed to solve this with a for loop:
myList <- list()
for (i in 1:(ncol(df) - 1)){
  myList[[i]] <- unique(df[, i:ncol(df)])
}
but I was left wondering whether there was a faster and more elegant way to do this.
With sapply():
sapply(1:(ncol(df)-1),
FUN = function(x, nc, df) unique(df[, x:nc]), nc = ncol(df), df = df)
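Since sapply() tries to simplify its result, lapply() may be the safer choice when a list is wanted. This is a minimal sketch of that variant, using the df from the question; the name my_list is made up for illustration.

```r
df <- data.frame(A = c("a1", "a2", "a3", "a4", "a5", "a6"),
                 B = c("a1", "a1", "a3", "a3", "a5", "a5"),
                 C = c("a1", "a1", "a1", "a3", "a4", "a4"),
                 D = c("a1", "a1", "a1", "a3", "a4", "a5"))

# One list element per starting column; lapply() always returns a list.
# drop = FALSE keeps the result a data frame whatever the column range.
my_list <- lapply(seq_len(ncol(df) - 1), function(i) {
  unique(df[, i:ncol(df), drop = FALSE])
})

# Number of unique rows in each element.
print(sapply(my_list, nrow))
```

For this data the three elements have 6, 5 and 4 rows, which matches the loop version.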
An elegant solution would be a recursion:
func = function(df, n, lst)
{
  if(ncol(df)==n) return(lst)
  func(df, n+1, c(lst, list(unique(df[n:ncol(df)]))))
}
#> func(df,1, list())
#[[1]]
# A B C D
#1 a1 a1 a1 a1
#2 a2 a1 a1 a1
#3 a3 a3 a1 a1
#4 a4 a3 a3 a3
#5 a5 a5 a4 a4
#6 a6 a5 a4 a5
#[[2]]
# B C D
#1 a1 a1 a1
#3 a3 a1 a1
#4 a3 a3 a3
#5 a5 a4 a4
#6 a5 a4 a5
#[[3]]
# C D
#1 a1 a1
#4 a3 a3
#5 a4 a4
#6 a4 a5

expand.grid with separate variable for each column

I would like to achieve the following data.frame in R:
i1 i2 i3
1 A1 A2 A3
2 No A2 A3
3 A1 No A3
4 No No A3
5 A1 A2 No
6 No A2 No
7 A1 No No
8 No No No
In each column the variable can either be the concatenated string "A" and the column number or "No". The data.frame should contain all possible combinations.
My idea was to use expand.grid, but I don't know how to create the list dynamically. Or is there a better approach?
expand.grid(list(c("A1", "No"), c("A2", "No"), c("A3", "No")))
I guess you could create your own helper function, something like that
MyList <- function(n) expand.grid(lapply(paste0("A", seq_len(n)), c, "No"))
Then simply pass it the number of elements (e.g., 3)
MyList(3)
# Var1 Var2 Var3
# 1 A1 A2 A3
# 2 No A2 A3
# 3 A1 No A3
# 4 No No A3
# 5 A1 A2 No
# 6 No A2 No
# 7 A1 No No
# 8 No No No
Alternatively, you could try data.table's CJ equivalent, which should be much more efficient than expand.grid for a big n:
library(data.table)
DTCJ <- function(n) do.call(CJ, lapply(paste0("A", seq_len(n)), c, "No"))
DTCJ(3) # will return a sorted cross join
# V1 V2 V3
# 1: A1 A2 A3
# 2: A1 A2 No
# 3: A1 No A3
# 4: A1 No No
# 5: No A2 A3
# 6: No A2 No
# 7: No No A3
# 8: No No No
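The expand.grid helper can also produce the column names i1, i2, i3 requested in the question if the list is named before the call. This is a minimal sketch; the function name named_grid and the variable levels_per_col are made up for illustration.

```r
named_grid <- function(n) {
  # One choice vector per column: c("A1", "No"), c("A2", "No"), ...
  levels_per_col <- lapply(paste0("A", seq_len(n)), c, "No")
  # expand.grid() takes its column names from the names of the list.
  names(levels_per_col) <- paste0("i", seq_len(n))
  expand.grid(levels_per_col, stringsAsFactors = FALSE)
}

res <- named_grid(3)
print(res)
```

Setting stringsAsFactors = FALSE keeps the columns as character vectors, which is usually easier to work with afterwards.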
Another option is using Map with expand.grid
n <- 3
expand.grid(Map(c, paste0('A', seq_len(n)), 'No'))
Or
expand.grid(as.data.frame(rbind(paste0('A', seq_len(n)), 'No')))
Another option, only using the most fundamental functions in R, is to use the indices:
df <- data.frame(V1 = c('A','A','A', 'A',rep('No',4)), V2 = c('A','A','No','No','A','A','No','No'), V3 = c('A','No','A','No','A','No','A','No'), stringsAsFactors = FALSE)
To get the row and column indices of the elements we need to change:
rindex <- which(df != 'No') %% nrow(df)
cindex <- ceiling(which(df != 'No')/nrow(df))
The solution is then basically a one-liner:
df[matrix(c(rindex,cindex),ncol=2)] <- paste0(df[matrix(c(rindex,cindex),ncol=2)],cindex)
> df
V1 V2 V3
1 A1 A2 A3
2 A1 A2 No
3 A1 No A3
4 A1 No No
5 No A2 A3
6 No A2 No
7 No No A3
8 No No No
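The modulo arithmetic above returns a row index of 0 for any match in the last row, so it only works here because the last row is all 'No'. A variant that avoids this is to let which() return the row and column positions directly via arr.ind = TRUE. This is a minimal sketch on a character matrix; the names m, idx and out are made up for illustration.

```r
df <- data.frame(V1 = c("A", "A", "A", "A", rep("No", 4)),
                 V2 = c("A", "A", "No", "No", "A", "A", "No", "No"),
                 V3 = c("A", "No", "A", "No", "A", "No", "A", "No"),
                 stringsAsFactors = FALSE)

# Work on a character matrix so that matrix indexing applies directly.
m <- as.matrix(df)

# Two-column matrix ("row", "col") of the cells that are not "No".
idx <- which(m != "No", arr.ind = TRUE)

# Append the column number to each selected cell.
m[idx] <- paste0(m[idx], idx[, "col"])

out <- as.data.frame(m, stringsAsFactors = FALSE)
print(out)
```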
