How can I get all the combinations of a list, including duplicates? By duplicates I mean an element paired with itself. I am building a symmetric matrix.
names.list<-c("A","B","C")
as.data.frame(t(combn(names.list,2)))
Result is:
V1 V2
1 A B
2 A C
3 B C
When I want:
V1 V2
1 A A
2 A B
3 A C
4 B B
5 B C
6 C C
Or even:
V1 V2
1 A A
2 A B
3 A C
4 B A
5 B B
6 B C
7 C A
8 C B
9 C C
But my matrices are large, so I would like to keep the number of combinations to a minimum (so preferably the six-row result), since more combinations = more computations = longer run times.
Thanks.
It sounds like you're looking for expand.grid instead of combn:
expand.grid(names.list, names.list)
# Var1 Var2
# 1 A A
# 2 B A
# 3 C A
# 4 A B
# 5 B B
# 6 C B
# 7 A C
# 8 B C
# 9 C C
Update
There's also combinations from "gtools" which would give you your preferred output.
library(gtools)
combinations(3, 2, names.list, repeats = TRUE)
# [,1] [,2]
# [1,] "A" "A"
# [2,] "A" "B"
# [3,] "A" "C"
# [4,] "B" "B"
# [5,] "B" "C"
# [6,] "C" "C"
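If installing "gtools" is not an option, the same six pairs can be built in base R. This is only a sketch: the index names i and j and the result name pairs are made up for illustration, and it simply filters the expand.grid output down to the rows where the first index does not exceed the second.

```r
names.list <- c("A", "B", "C")
n <- length(names.list)

# All index pairs; j is listed first so that it varies fastest
idx <- expand.grid(j = seq_len(n), i = seq_len(n))

# Keep one triangle of the symmetric matrix, diagonal included
idx <- idx[idx$i <= idx$j, ]

pairs <- data.frame(V1 = names.list[idx$i],
                    V2 = names.list[idx$j],
                    stringsAsFactors = FALSE)
pairs
```

This gives n * (n + 1) / 2 rows rather than the n^2 rows of the full grid.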
I have the stdin input below and I am trying to convert it to a list.
Input
input <- suppressWarnings(readLines(stdin(), n=31))
8 30
a s 3
b s 5
s a 3
b a 1
c a 10
d a 11
s b 5
a b 3
c b 2
d b 3
a c 10
b c 2
d c 3
e c 7
f c 12
a d 15
b d 7
c d 2
e d 11
f d 2
c e 7
d e 11
f e 3
z e 2
c f 12
d f 2
e f 3
z f 2
e z 2
f z 2
In line 1, the first value is the total number of letters (nodes) and the second value is the total number of rows.
From line 2 to line n, the first value is the starting node, the second is the ending node, and the third is the cost.
I want to group the letters and the costs as lists in the following manner.
Expected output
> alphabets
$s
[1] "a" "b"
$a
[1] "s" "b" "c" "d"
$b
[1] "s" "a" "c" "d"
$c
[1] "a" "b" "d" "e" "f"
$d
[1] "a" "b" "c" "e" "f"
$e
[1] "c" "d" "f" "z"
$f
[1] "c" "d" "e" "z"
$z
[1] "e" "f"
> cost
$s
[1] 3 5
$a
[1] 3 1 10 11
$b
[1] 5 3 2 3
$c
[1] 10 2 3 7 12
$d
[1] 15 7 2 11 2
$e
[1] 7 11 3 2
$f
[1] 12 2 3 2
$z
[1] 2 2
Any suggestions on where to start?
Does this give you what you want? I convert your input to a data.frame and then split it based on your second column. The output differs slightly from yours since split sorts the groups. If you do not want that, you can reorder the output based on the input.
df <- read.table(textConnection(input[-1]))
alphabets <- split(df$V1, df$V2)
cost <- split(df$V3, df$V2)
# split() sorts the groups; index by first appearance to get the order you had
ord <- as.character(unique(df$V2))
alphabets[ord]
cost[ord]
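Another way to keep the input order, sketched here on a shortened four-row version of the input (the trimmed data and the name grp are illustrative only), is to split on a factor whose levels are set in order of first appearance, so no reordering step is needed afterwards.

```r
# Shortened stand-in for the stdin input: header line plus four edges
input <- c("3 4", "a s 3", "b s 5", "s a 3", "b a 1")

df <- read.table(text = input[-1], stringsAsFactors = FALSE)

# Levels in order of first appearance, so split() keeps the input order
grp <- factor(df$V2, levels = unique(df$V2))

alphabets <- split(df$V1, grp)
cost <- split(df$V3, grp)
alphabets
cost
```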
Hopefully this is simple, but it seems tricky to explain!
I want to combine two matrices in R, but I'd like to take the first two columns from the first matrix as the first two columns of the combined matrix, then the first column of the second matrix as the third column of the new matrix, then the 4th and 5th columns of the new matrix would be the 3rd and 4th from the first matrix, and so on and so forth. All matrices have the same row names and the same number of rows.
Matrix 1:
1 2 1 2 1 2
A a b c d e f
B a b c d e f
C a b c d e f
Matrix 2:
3 3 3
A x x x
B y y y
C z z z
Desired Matrix:
1 2 3 1 2 3 1 2 3
A a b x c d x e f x
B a b y c d y e f y
C a b z c d z e f z
In my example I need this (1,2)(3)(1,2)(3) configuration, but as the post title suggests, it would be nice to have a generic way of doing this for any configuration of columns from the matrices to be merged.
Make a set of column indexes and then subset a cbind-ed version of the pair of matrices:
grp1 <- 2
grp2 <- 1
sel <- c(rbind(
matrix(1:ncol(mat1),ncol=ncol(mat1)/grp1),
matrix(1:ncol(mat2),ncol=ncol(mat2)/grp2) + ncol(mat1)
))
# 'sel' looks like this before coercion to a vector.
# You can see how the alternating numbers fit together here:
# [,1] [,2] [,3]
#[1,] 1 3 5
#[2,] 2 4 6
#[3,] 7 8 9
cbind(mat1,mat2)[,sel]
1 2 3 1 2 3 1 2 3
A "a" "b" "x" "c" "d" "x" "e" "f" "x"
B "a" "b" "y" "c" "d" "y" "e" "f" "y"
C "a" "b" "z" "c" "d" "z" "e" "f" "z"
Using the following objects as mat1 and mat2:
mat1 <- as.matrix(read.table(text="1 2 1 2 1 2
A a b c d e f
B a b c d e f
C a b c d e f", header=TRUE, check.names=FALSE, stringsAsFactors=FALSE))
mat2 <- as.matrix(read.table(text="3 3 3
A x x x
B y y y
C z z z", header=TRUE, check.names=FALSE, stringsAsFactors=FALSE))
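For the generic case, the index trick above could be wrapped in a small helper. This is a sketch under assumptions: interleave_cols is a made-up name, it handles exactly two matrices, and it requires both matrices to split into the same number of column groups.

```r
# Hypothetical helper: take g1 columns from m1, then g2 columns from m2, repeatedly
interleave_cols <- function(m1, m2, g1, g2) {
  stopifnot(ncol(m1) %% g1 == 0,
            ncol(m2) %% g2 == 0,
            ncol(m1) / g1 == ncol(m2) / g2)
  sel <- c(rbind(
    matrix(seq_len(ncol(m1)), nrow = g1),
    matrix(seq_len(ncol(m2)), nrow = g2) + ncol(m1)
  ))
  cbind(m1, m2)[, sel, drop = FALSE]
}

# Small numeric example: 6 columns interleaved with 3 columns
m1 <- matrix(1:12, nrow = 2)
m2 <- matrix(101:106, nrow = 2)
res <- interleave_cols(m1, m2, 2, 1)
res
```

Stacking the two index matrices with rbind and flattening with c() works because R stores matrices column by column, so each column of the stacked matrix becomes one (1,2)(3) block.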
I have a data frame in R and want to create all possible unique subsets of it, where each subset consists of one unique pairwise combination of two columns from the original data frame. This means that if the original data frame has Y columns, I should get Y*(Y-1)/2 unique subsets. I also want the columns in each subset to keep the names they had in the original data frame. How do I do it?
colpairs <- function(d) {
apply(combn(ncol(d),2), 2, function(x) d[,x])
}
x <- colpairs(iris)
lapply(x, head, n=2)
## [[1]]
## Sepal.Length Sepal.Width
## 1 5.1 3.5
## 2 4.9 3.0
##
## [[2]]
## Sepal.Length Petal.Length
## 1 5.1 1.4
## 2 4.9 1.4
...
I'd use combn to make the indices of your columns, and lapply to take subsets of your data.frame and store them in a list structure. e.g.
# Example data
set.seed(1)
df <- data.frame( a = sample(2,4,repl=T) ,
b = runif(4) ,
c = sample(letters ,4 ),
d = sample( LETTERS , 4 ) )
# Use combn to get indices
ind <- combn( x = 1:ncol(df) , m = 2 , simplify = FALSE )
# ind is a list of column-index pairs. Shown side by side (one pair per column), the pairs are:
#[,1] [,2] [,3] [,4] [,5] [,6]
#[1,] 1 1 1 2 2 3
#[2,] 2 3 4 3 4 4
# Make subsets, combine in list
out <- lapply( ind , function(x) df[,x] )
[[1]]
# a b
#1 1 0.2016819
#2 1 0.8983897
#3 2 0.9446753
#4 2 0.6607978
[[2]]
# a c
#1 1 q
#2 1 b
#3 2 e
#4 2 x
[[3]]
# a d
#1 1 R
#2 1 J
#3 2 S
#4 2 L
[[4]]
# b c
#1 0.2016819 q
#2 0.8983897 b
#3 0.9446753 e
#4 0.6607978 x
[[5]]
# b d
#1 0.2016819 R
#2 0.8983897 J
#3 0.9446753 S
#4 0.6607978 L
[[6]]
# c d
#1 q R
#2 b J
#3 e S
#4 x L
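A small variation, sketched on a made-up three-column data frame, is to run combn on the column names instead of the indices and to label each list element with the pair it holds. The "a_b" style labels are just one possible naming choice.

```r
# Illustrative data only
df <- data.frame(a = 1:3, b = 4:6, c = 7:9)

# Pairs of column names rather than column indices
pairs <- combn(names(df), 2, simplify = FALSE)

# One two-column data frame per pair; column names are kept as in df
out <- lapply(pairs, function(p) df[, p, drop = FALSE])

# Name each list element after its pair of columns
names(out) <- vapply(pairs, paste, character(1), collapse = "_")
names(out)
```

A single subset can then be pulled out by name, for example out$a_c.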
After creating a key on a data.table:
library(data.table)
set.seed(12345)
DT <- data.table(x = sample(LETTERS[1:3], 10, replace = TRUE),
y = sample(LETTERS[1:3], 10, replace = TRUE))
setkey(DT, x, y)
DT
# x y
# [1,] A B
# [2,] A B
# [3,] B B
# [4,] B B
# [5,] C A
# [6,] C A
# [7,] C A
# [8,] C A
# [9,] C C
# [10,] C C
I would like to get an integer vector giving for each row the corresponding "key index". I hope the expected output (column i) below will help clarify what I mean:
# x y i
# [1,] A B 1
# [2,] A B 1
# [3,] B B 2
# [4,] B B 2
# [5,] C A 3
# [6,] C A 3
# [7,] C A 3
# [8,] C A 3
# [9,] C C 4
# [10,] C C 4
I thought about using something like cumsum(!duplicated(DT[, key(DT), with = FALSE])) but am hoping there is a better solution. I feel this vector could be part of the table's internal representation, and maybe there is a way to access it? Even if it is not the case, what would you suggest?
Update: From v1.8.3, you can simply use the inbuilt special .GRP:
DT[ , i := .GRP, by = key(DT)]
See history for older answers.
I'd probably just do this, since I'm fairly confident that no index counter is available from within the call to [.data.table():
ii <- unique(DT)
ii[ , i := seq_len(nrow(ii))]
DT[ii]
# x y i
# 1: A B 1
# 2: A B 1
# 3: B B 2
# 4: B B 2
# 5: C A 3
# 6: C A 3
# 7: C A 3
# 8: C A 3
# 9: C C 4
# 10: C C 4
You could make this a one-liner, at the expense of an additional call to unique.data.table():
DT[unique(DT)[ , i := seq_len(nrow(unique(DT)))]]
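For comparison, the cumsum(!duplicated(...)) idea mentioned in the question can be sketched in base R on a plain data frame. The data frame d below just copies the ten sorted rows shown above. The approach assumes the rows are already sorted by the key columns, so that equal keys are adjacent.

```r
# Plain data.frame copy of the keyed table shown above (already sorted by x, y)
d <- data.frame(x = c("A", "A", "B", "B", "C", "C", "C", "C", "C", "C"),
                y = c("B", "B", "B", "B", "A", "A", "A", "A", "C", "C"),
                stringsAsFactors = FALSE)

# TRUE at the first row of each (x, y) group; the running sum is the group index
d$i <- cumsum(!duplicated(d[, c("x", "y")]))
d
```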
I have an R data frame that looks like this:
z = as.data.frame(list(Col1=c("a","c","e","g"),Col2=c("b","d","f","h"),Col3=c("1,2,5","3,5,7","9,8","1")))
> z
Col1 Col2 Col3
1 a b 1,2,5
2 c d 3,5,7
3 e f 9,8
4 g h 1
(The third column is a text column with comma-separated values.) I would like to convert it to a data frame like this:
a b 1
a b 2
a b 5
c d 3
c d 5
c d 7
e f 9
e f 8
g h 1
Can anyone suggest a way to accomplish this using apply? I'm close using the command below but it's not quite right. Any suggestions on more efficient ways to do this would be appreciated as well...
> apply(z,1,function(a){ids=strsplit(as.character(a[3]),",")[[1]];out<-c();for(id in ids){out<-rbind(out,c(a[1:2],id))};return(out)})
[[1]]
Col1 Col2
[1,] "a" "b" "1"
[2,] "a" "b" "2"
[3,] "a" "b" "5"
[[2]]
Col1 Col2
[1,] "c" "d" "3"
[2,] "c" "d" "5"
[3,] "c" "d" "7"
[[3]]
Col1 Col2
[1,] "e" "f" "9"
[2,] "e" "f" "8"
[[4]]
Col1 Col2
[1,] "g" "h" "1"
You can use ddply.
library(plyr)
ddply(z, c("Col1", "Col2"), summarize,
Col3=strsplit(as.character(Col3),",")[[1]]
)
With reshape or reshape2:
require(reshape2)
merge(cbind(z[,-3], L1=rownames(z)), melt(strsplit(as.character(z$Col3),",")))
gives
L1 Col1 Col2 value
1 1 a b 1
2 1 a b 2
3 1 a b 5
4 2 c d 3
5 2 c d 5
6 2 c d 7
7 3 e f 9
8 3 e f 8
9 4 g h 1
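A base R alternative without extra packages, offered as a sketch rather than a drop-in replacement for the answers above: split Col3 once, then repeat the other columns as many times as each row has values. The names parts and long are made up here.

```r
z <- data.frame(Col1 = c("a", "c", "e", "g"),
                Col2 = c("b", "d", "f", "h"),
                Col3 = c("1,2,5", "3,5,7", "9,8", "1"),
                stringsAsFactors = FALSE)

# One character vector of values per original row
parts <- strsplit(z$Col3, ",", fixed = TRUE)

# Repeat Col1 and Col2 once per value, then flatten the values
long <- data.frame(Col1 = rep(z$Col1, lengths(parts)),
                   Col2 = rep(z$Col2, lengths(parts)),
                   Col3 = unlist(parts),
                   stringsAsFactors = FALSE)
long
```

Col3 stays a character column here; wrap unlist(parts) in as.integer() if numbers are needed.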