I have 5 groups: G1, G2,…,G5 with n1,n2,…,n5 elements in each group respectively. I select 2 elements from each of the 4 groups and 1 element from the 5th group. How do I generate all possible combinations in R?
(The question does not say whether the groups are mutually exclusive, so assume:
1. the groups are mutually exclusive;
2. the subsets of the groups (of sizes n1, n2, ...) are filled from the same pool of elements;
3. just for the sake of argument, |G1|=|G2|=|G3|=5. You can change the code below for groups of different sizes.)
What follows is a 3-group mock-up of the answer, which can be generalized to an arbitrary number of groups. Assume the group names are G1, G2, G3.
library(causfinder)
gctemplate(5,2,2) # Elements are coded as: 1,2,3,4,5; |sub-G1|=2; |sub-G2|=2; |sub-G3|=5-(2+2)=1
# In the following table, each number represents a unique element. The table is the solution.
My package (causfinder) is not on CRAN, so the code of the function gctemplate is given after the table.
[,1] [,2] [,3] [,4] [,5]
[1,] 1 2 3 4 5 sub-G1={1,2} sub-G2={3,4} sub-G3={5}
[2,] 1 2 3 5 4
[3,] 1 2 4 5 3 sub-G1={1,2} sub-G2={4,5} sub-G3={3}
[4,] 1 3 2 4 5
[5,] 1 3 2 5 4
[6,] 1 3 4 5 2
[7,] 1 4 2 3 5
[8,] 1 4 2 5 3
[9,] 1 4 3 5 2
[10,] 1 5 2 3 4
[11,] 1 5 2 4 3
[12,] 1 5 3 4 2
[13,] 2 3 1 4 5
[14,] 2 3 1 5 4
[15,] 2 3 4 5 1
[16,] 2 4 1 3 5
[17,] 2 4 1 5 3
[18,] 2 4 3 5 1
[19,] 2 5 1 3 4
[20,] 2 5 1 4 3
[21,] 2 5 3 4 1
[22,] 3 4 1 2 5
[23,] 3 4 1 5 2
[24,] 3 4 2 5 1
[25,] 3 5 1 2 4
[26,] 3 5 1 4 2
[27,] 3 5 2 4 1
[28,] 4 5 1 2 3
[29,] 4 5 1 3 2
[30,] 4 5 2 3 1
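As a quick sanity check on the table above (a small sketch, not part of causfinder; the name n_rows is made up here), the number of rows should equal the number of ways to pick 2 of 5 elements, then 2 of the remaining 3, then the last 1:

```r
# Ways to split 5 elements into ordered subsets of sizes 2, 2 and 1
n_rows <- choose(5, 2) * choose(3, 2) * choose(1, 1)
n_rows
```

This agrees with the 30 rows printed above.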
The code of gctemplate:
gctemplate <- function(nvars, ncausers, ndependents){
  # Each column of independents is one choice of the first subset
  independents <- combn(nvars, ncausers)
  # Number of ways to pick the second subset from the remaining elements
  patinajnumber <- dim(combn(nvars - ncausers, ndependents))[[2]]
  independentspatinajednumber <- dim(combn(nvars, ncausers))[[2]]*patinajnumber
  dependents <- matrix(, nrow = dim(combn(nvars, ncausers))[[2]]*patinajnumber, ncol = ndependents)
  for (i in as.integer(1:dim(combn(nvars, ncausers))[[2]])){
    dependents[(patinajnumber*(i-1)+1):(patinajnumber*i),] <- t(combn(setdiff(seq(1:nvars), independents[,i]), ndependents))
  }
  # Repeat each first subset once per matching second subset
  independentspatinajed <- matrix(, nrow = dim(combn(nvars, ncausers))[[2]]*patinajnumber, ncol = ncausers)
  for (i in as.integer(1:dim(combn(nvars, ncausers))[[2]])){
    for (j in as.integer(1:patinajnumber)){
      independentspatinajed[(i-1)*patinajnumber+j,] <- independents[,i]
    }
  }
  independentsdependents <- cbind(independentspatinajed, dependents)
  # The elements left over in each row form the last subset
  others <- matrix(, nrow = dim(combn(nvars, ncausers))[[2]]*patinajnumber, ncol = nvars - ncausers - ndependents)
  for (i in as.integer(1:((dim(combn(nvars, ncausers))[[2]])*patinajnumber))){
    others[i, ] <- setdiff(seq(1:nvars), independentsdependents[i,])
  }
  causalitiestemplate <- cbind(independentsdependents, others)
  causalitiestemplate
}
The above is the solution for G1, G2, G3. The same logic extends to the 5-group case.
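One literal reading of the original question (five disjoint groups, 2 elements from each of four of them and 1 from the fifth) can also be sketched without gctemplate. This is a different approach from the one above, not a generalization of it. The group contents and the names groups, picks, choices, idx and all_combos are made up for illustration.

```r
# Hypothetical groups; sizes and labels are invented for this example
groups <- list(G1 = paste0("a", 1:3), G2 = paste0("b", 1:3),
               G3 = paste0("c", 1:2), G4 = paste0("d", 1:4),
               G5 = paste0("e", 1:2))
picks <- c(2, 2, 2, 2, 1)  # how many elements to take from each group

# For each group, list every way to choose the required number of elements.
# combn() works on positions so that a one-element group cannot be misread.
choices <- Map(function(g, k) {
  combn(length(g), k, FUN = function(pos) g[pos], simplify = FALSE)
}, groups, picks)

# Every way to pair one choice from each group
idx <- expand.grid(lapply(choices, seq_along))

# One row per overall combination: 2 + 2 + 2 + 2 + 1 = 9 columns
all_combos <- t(apply(idx, 1, function(r) {
  unlist(Map(function(ch, i) ch[[i]], choices, r), use.names = FALSE)
}))
dim(all_combos)
```

The row count is the product of choose(n_k, picks[k]) over the groups, so it grows quickly with group size.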
I'd like to add a variable to a data set that numbers the occurrences of each value of a grouping variable. For example:
ids <- c(rep(1,4),rep(2,6),rep(3,2))
I want another variable that counts the occurrences of each id, creating a vector like this:
1,2,3,4,1,2,3,4,5,6,1,2
With them combined looking something like this:
ids count
1 1 1
2 1 2
3 1 3
4 1 4
5 2 1
6 2 2
7 2 3
8 2 4
9 2 5
10 2 6
11 3 1
12 3 2
Any ideas? Many thanks!
I suggest ave with seq_along
ids <- c(rep(1,4),rep(2,6),rep(3,2))
count <- ave(ids,ids, FUN=seq_along)
cbind(ids, count)
# ids count
# [1,] 1 1
# [2,] 1 2
# [3,] 1 3
# [4,] 1 4
# [5,] 2 1
# [6,] 2 2
# [7,] 2 3
# [8,] 2 4
# [9,] 2 5
# [10,] 2 6
# [11,] 3 1
# [12,] 3 2
Or, if ids is already sorted:
cbind(ids, count=sequence(unname(table(ids))))
# ids count
# [1,] 1 1
# [2,] 1 2
# [3,] 1 3
# [4,] 1 4
# [5,] 2 1
# [6,] 2 2
# [7,] 2 3
# [8,] 2 4
# [9,] 2 5
# [10,] 2 6
# [11,] 3 1
# [12,] 3 2
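Why the sorted case is singled out can be seen with a small sketch (the unsorted vector below is made up for illustration): ave numbers occurrences in place, while sequence on the table counts assumes that equal ids sit next to each other.

```r
ids_unsorted <- c(2, 1, 2, 1, 1)  # hypothetical, deliberately not sorted

# Running count within each id, in the original positions
by_ave <- ave(ids_unsorted, ids_unsorted, FUN = seq_along)
by_ave        # 1 1 2 2 3

# Counts per id laid end to end; only lines up with ids when they are sorted
by_sequence <- sequence(as.vector(table(ids_unsorted)))
by_sequence   # 1 2 3 1 2
```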
Or
cbind(ids,within.list(rle(ids), lengths <- sequence(lengths))$lengths)
Or
library(data.table)
dt <- as.data.table(ids)
dt[,count:=seq_len(.N), by=ids]
Or
library(dplyr)
dat <- data.frame(ids)
dat %>%
group_by(ids) %>%
mutate(count=row_number())
I have a table which is
[,1] [,2] [,3] [,4] [,5]
[1,] 1 5 10 0.00040803 0.00255277
[2,] 1 11 3 0.01765470 0.01584580
[3,] 1 6 2 0.15514850 0.15509000
[4,] 1 8 14 0.02100531 0.02572320
[5,] 1 9 4 0.04748648 0.00843252
[6,] 2 5 10 0.00040760 0.06782680
[7,] 2 11 3 0.01765480 0.01584580
[8,] 2 6 2 0.15514810 0.15509000
[9,] 2 8 14 0.02100491 0.02572320
[10,] 2 9 4 0.04748608 0.00843252
[11,] 3 5 10 0.00040760 0.06782680
[12,] 3 11 3 0.01765480 0.01584580
[13,] 3 8 14 0.02100391 0.02572320
[14,] 3 9 4 0.04748508 0.00843252
[15,] 4 5 10 0.00040760 0.06782680
[16,] 4 11 3 0.01765480 0.01584580
[17,] 4 8 14 0.02100391 0.02572320
[18,] 4 9 4 0.04748508 0.00843252
[19,] 5 8 14 0.02100391 0.02572320
[20,] 5 9 4 0.04748508 0.00843252
I want to remove duplicates from this table. However, only columns 2, 3 and 4 matter. Example: rows 1, 6, 11 and 15 are identical if only columns 2, 3 and 4 are considered. A note on column 4: is it possible to treat two values as the same as long as they are within 10e-5 of each other? Then rows 1 and 6 would be considered identical even though the value in column 4 differs slightly (within the tolerance I mentioned).
Then it would be great to get output like:
column 2 value | column 3 value | column 1 value at which the pair was first observed (with the tolerance; 1 in the example) | column 1 value at which the pair was last observed (with the tolerance; 4 in the example) | value of column 4 at first appearance (0.00040803 in the example)
This is a way of thinking about it, but I'm not sure it's what you're looking for. The logic should be able to get you started though.
# dat holds your data set (a data frame with columns V1 to V5)
dat
V1 V2 V3 V4 V5
1 1 5 10 0.00040803 0.00255277
2 1 11 3 0.01765470 0.01584580
3 1 6 2 0.15514850 0.15509000
4 1 8 14 0.02100531 0.02572320
5 1 9 4 0.04748648 0.00843252
# TRUNCATED
dat <- dat[, c(2, 3, 4)]
dat$V4 <- round(dat$V4, 5)
unique(dat)
V2 V3 V4
1 5 10 0.00041
2 11 3 0.01765
3 6 2 0.15515
4 8 14 0.02101
5 9 4 0.04749
9 8 14 0.02100
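The last row of that output shows a limitation of rounding: two values can be well within the tolerance and still round to different numbers, so they are not merged. A minimal sketch using two V4 values from the table (the name x is just for illustration):

```r
x <- c(0.02100531, 0.02100491)  # rows 4 and 9 of the table, column 4

within_tol <- abs(diff(x)) < 1e-5  # the two values differ by about 4e-7
within_tol                         # TRUE

rounded <- round(x, 5)
rounded                            # 0.02101 0.02100
```

So rounding is only an approximation of a tolerance test; values that straddle a rounding boundary stay separate.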
You could do something like this:
# read your data
yy <- read.csv('your-data.csv', header=FALSE)
## V1 V2 V3 V4 V5
## 1 1 5 10 0.00040803 0.00255277
## 2 1 11 3 0.01765470 0.01584580
## 3 1 6 2 0.15514850 0.15509000
## 4 1 8 14 0.02100531 0.02572320
# create a logical matrix indicating value is within tolerance
mat.eq.tol <- sapply(yy$V4, function(x) abs(yy$V4-x) < 1E-5)
# minimum index
eq.min <- apply(mat.eq.tol, 1, function(x) min(which(x)))
# maximum index
eq.max <- apply(mat.eq.tol, 1, function(x) max(which(x)))
# combine result
res <- cbind(yy$V2, yy$V3, yy$V1[eq.min], yy$V1[eq.max], yy$V4[eq.min])
## [,1] [,2] [,3] [,4] [,5]
## [1,] 5 10 1 4 0.00040803
## [2,] 11 3 1 4 0.01765470
## [3,] 6 2 1 2 0.15514850
## [4,] 8 14 1 5 0.02100531
## [5,] 9 4 1 5 0.04748648
## [6,] 5 10 1 4 0.00040803
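Note that the code above compares column 4 only and returns one row per input row. A possible variation, sketched here under my own assumptions, first groups by columns 2 and 3 and then applies the tolerance within each pair, giving one row per pair. The helper summarise_pair and the output column names are made up, and rows whose V4 is not within the tolerance of the pair's first V4 are simply ignored in this simplified version.

```r
# A few rows from the question's table (columns 1 to 4 only)
yy <- data.frame(
  V1 = c(1, 1, 2, 2, 3, 4),
  V2 = c(5, 6, 5, 6, 5, 5),
  V3 = c(10, 2, 10, 2, 10, 10),
  V4 = c(0.00040803, 0.15514850, 0.00040760, 0.15514810,
         0.00040760, 0.00040760)
)
tol <- 1e-5  # same tolerance as in the answer above

# Summarise one (V2, V3) pair: keep the rows whose V4 is within tol of the
# pair's first V4, then report the first and last V1 among those rows.
summarise_pair <- function(d) {
  keep <- abs(d$V4 - d$V4[1]) < tol
  data.frame(V2 = d$V2[1], V3 = d$V3[1],
             first = min(d$V1[keep]), last = max(d$V1[keep]),
             V4.first = d$V4[1])
}

pairs <- split(yy, list(yy$V2, yy$V3), drop = TRUE)
res <- do.call(rbind, lapply(pairs, summarise_pair))
res
```

For the pair (5, 10) this gives first = 1, last = 4 and V4.first = 0.00040803, which matches the example in the question.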
I have a table which I want to transform:
t LabelA LabelB start stop
[1,] 1 4 9 1 2
[2,] 1 3 5 1 2
[3,] 1 1 6 1 2
[4,] 2 4 9 1 2
[5,] 2 3 5 1 2
[6,] 2 1 6 1 2
[7,] 2 7 2 2 2
[8,] 3 3 5 3 4
[9,] 3 1 6 3 4
[10,] 3 7 2 3 5
[11,] 3 4 9 3 5
I want to filter the data so that rows which differ only in the number in the first column are removed (not all of them, only the duplicates). So for rows 1 and 4 only row 1 should remain in the table. Or for row 3 and 9 only row 9 should remain. It is important that the information in the first column is retained and that the earliest occurrence of the row remains in the table, not the later ones.
You can use duplicated:
mat[!duplicated(as.data.frame(mat[, -1])), ]
t LabelA LabelB start stop
[1,] 1 4 9 1 2
[2,] 1 3 5 1 2
[3,] 1 1 6 1 2
[7,] 2 7 2 2 2
[8,] 3 3 5 3 4
[9,] 3 1 6 3 4
[10,] 3 7 2 3 5
[11,] 3 4 9 3 5
where mat is the name of your matrix.
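The reason this keeps the earliest occurrence is that duplicated only flags the second and later copies of a row. A tiny sketch (the matrix m is made up for illustration):

```r
m <- rbind(c(1, 4, 9),
           c(2, 4, 9),   # same as row 1 once the first column is dropped
           c(2, 7, 2))

dup <- duplicated(m[, -1])  # compares whole rows of the remaining columns
dup                         # FALSE TRUE FALSE

kept <- m[!dup, ]           # row 1 survives, row 2 is dropped
kept
```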
Try using the duplicated function:
mymx <- matrix(c(1,4,9,1,2 ,1,3,5,1,2 ,1,1,6,1,2 ,2,4,9,1,2 ,2,3,5,1,2 ,2,1,6,1,2 ,2,7,2,2,2 ,3,3,5,3,4 ,3,1,6,3,4 ,3,7,2,3,5 ,3,4,9,3,5), ncol=5, byrow=TRUE)
mymx[!duplicated(mymx[,-1]),]
[,1] [,2] [,3] [,4] [,5]
[1,] 1 4 9 1 2
[2,] 1 3 5 1 2
[3,] 1 1 6 1 2
[4,] 2 7 2 2 2
[5,] 3 3 5 3 4
[6,] 3 1 6 3 4
[7,] 3 7 2 3 5
[8,] 3 4 9 3 5
Using R, I'm trying to construct a dataframe of the row and col numbers of a given matrix. E.g., if
a <- matrix(c(1:15), nrow=5, ncol=3)
then I'm looking to construct a dataframe that gives:
row col
1 1
1 2
1 3
. .
5 1
5 2
5 3
What I've tried:
row <- matrix(row(a), ncol=1, nrow=dim(a)[1]*dim(a)[2], byrow=T)
col <- matrix(col(a), ncol=1, nrow=dim(a)[1]*dim(a)[2], byrow=T)
out <- cbind(row, col)
colnames(out) <- c("row", "col")
results in:
row col
[1,] 1 1
[2,] 2 1
[3,] 3 1
[4,] 4 1
[5,] 5 1
[6,] 1 2
[7,] 2 2
[8,] 3 2
[9,] 4 2
[10,] 5 2
[11,] 1 3
[12,] 2 3
[13,] 3 3
[14,] 4 3
[15,] 5 3
This isn't what I'm looking for, as the order of rows and cols is suddenly reversed, even though I specified "byrow=T". I don't see whether or where I'm making a mistake, but I would hugely appreciate suggestions to overcome this problem. Thanks in advance!
I'd use expand.grid on the vectors 1:ncol and 1:nrow, then flip the columns with [,2:1] to get them in the order you want:
> expand.grid(seq(ncol(a)),seq(nrow(a)))[,2:1]
Var2 Var1
1 1 1
2 1 2
3 1 3
4 2 1
5 2 2
6 2 3
7 3 1
8 3 2
9 3 3
10 4 1
11 4 2
12 4 3
13 5 1
14 5 2
15 5 3
Use row and col, but manipulate their output ordering more directly, since they return the indices laid out in the same shape as the input array. Use t to get the non-default (row-major) order you want in the end:
data.frame(row = as.vector(t(row(a))), col = as.vector(t(col(a))))
row col
1 1 1
2 1 2
3 1 3
4 2 1
5 2 2
6 2 3
7 3 1
8 3 2
9 3 3
10 4 1
11 4 2
12 4 3
13 5 1
14 5 2
15 5 3
Or, as a matrix not a data.frame:
cbind(as.vector(t(row(a))), as.vector(t(col(a))))
[,1] [,2]
[1,] 1 1
[2,] 1 2
[3,] 1 3
[4,] 2 1
[5,] 2 2
[6,] 2 3
[7,] 3 1
[8,] 3 2
[9,] 3 3
[10,] 4 1
[11,] 4 2
[12,] 4 3
[13,] 5 1
[14,] 5 2
[15,] 5 3
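As for why byrow=T made no difference in the question's attempt, here is a short sketch (variable names are mine): a matrix is read out column by column, and with ncol=1 filling by row and filling by column are the same thing, so the order is fixed before byrow is even considered.

```r
a <- matrix(1:15, nrow = 5, ncol = 3)

v <- as.vector(row(a))  # read out column by column: 1 2 3 4 5 1 2 ...

# With a single column, byrow cannot change anything
same <- identical(matrix(v, ncol = 1, byrow = TRUE),
                  matrix(v, ncol = 1, byrow = FALSE))
same                    # TRUE

# Transposing first changes the order in which the elements are read out
v_t <- as.vector(t(row(a)))
v_t                     # 1 1 1 2 2 2 ...
```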
You may want to have a look at ?expand.grid, which does just about exactly what you want to achieve.
Since there are many ways to skin a cat, I'll chip in with yet another variant based on rep:
data.frame(row=rep(seq(nrow(a)), each=ncol(a)), col=rep(seq(ncol(a)), nrow(a)))
...but to announce a "winner", I think you need to time the solutions:
# Make up a huge matrix...
a <- matrix(runif(1e7), 1e4)
system.time( a1<-data.frame(row = as.vector(t(row(a))),
col = as.vector(t(col(a)))) ) # 0.68 secs
system.time( a2<-expand.grid(col = seq(ncol(a)),
row = seq(nrow(a)))[,2:1] ) # 0.49 secs
system.time( a3<-data.frame(row=rep(seq(nrow(a)), each=ncol(a)),
col=rep(seq(ncol(a)), nrow(a))) ) # 0.59 secs
identical(a1, a2) && identical(a1, a3) # TRUE
...so it seems @Spacedman has the speediest solution!
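A single system.time call can be noisy, and the three timings above are fairly close. One way to get a steadier number, sketched here with a smaller matrix and made-up names (f_rep, times), is to repeat the measurement and take the median:

```r
a <- matrix(runif(1e4), 1e2)  # smaller than above so the example runs quickly

# The rep-based variant wrapped in a function so it can be timed repeatedly
f_rep <- function() {
  data.frame(row = rep(seq(nrow(a)), each = ncol(a)),
             col = rep(seq(ncol(a)), nrow(a)))
}

# Elapsed time of five separate runs, then the median of those
times <- replicate(5, system.time(f_rep())[["elapsed"]])
median(times)
```

The same wrapper can be written for the other two variants to compare them on equal terms.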