R, data.table, group by column *numbers* AND sum a column

Let's say I have the following data.table
> DT
# A B C D E N
# 1: J t X D N 0.07898388
# 2: U z U L A 0.46906049
# 3: H a Z F S 0.50826435
# ---
# 9998: X b R L X 0.49879990
# 9999: Z r U J J 0.63233668
# 10000: C b M K U 0.47796539
Now I need to group by a pair of columns and calculate the sum of N.
That's easy to do when you know the column names in advance:
> DT[, sum(N), by=.(A,B)]
# A B V1
# 1: J t 6.556897
# 2: U z 9.060844
# 3: H a 4.293426
# ---
# 674: V z 11.439100
# 675: M x 1.736050
# 676: U k 3.676197
But I must do that in a function, which receives a vector of column indices to group by.
> f <- function(columns = 1:2) {
    DT[, sum(N), by = columns]
  }
> f(1:2)
Error in `[.data.table`(DT, , sum(N), by = columns) :
The items in the 'by' or 'keyby' list are length (2). Each must be same
length as rows in x or number of rows returned by i (10000).
I also tried:
> f(list("A", "B"))
Error in `[.data.table`(DT, , sum(N), by = list(columns)) :
column or expression 1 of 'by' or 'keyby' is type list. Do not quote column
names. Usage: DT[,sum(colC),by=list(colA,month(colB))]
How do I make this work?

Here's how I would approach this:
f <- function(columns) {
  Get <- if (!is.numeric(columns)) match(columns, names(DT)) else columns
  columns <- names(DT)[Get]
  DT[, sum(N), by = columns]
}
The first line (Get <- ...) leaves "columns" untouched if it is already numeric; otherwise it uses match() to convert the column names to their numeric positions. The second line then maps those positions back to column names, because by= happily accepts a character vector of names, whereas a bare numeric vector is treated as a grouping vector and must match the number of rows (which is exactly the error above).
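For illustration (a toy call of my own, not from the original answer), this is the name-to-position conversion that match() performs:
## match() returns the positions of the first vector's values in the second
match(c("A", "C"), c("A", "B", "C", "N"))
# [1] 1 3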
Test it out with some sample data:
library(data.table)
set.seed(1)
DT <- data.table(
  A = sample(letters[1:3], 20, TRUE),
  B = sample(letters[1:5], 20, TRUE),
  C = sample(LETTERS[1:2], 20, TRUE),
  N = rnorm(20)
)
## Should work with either column number or name
f(1)
f("A")
f(c(1, 3))
f(c("A", "C"))

Related

How to calculate the number of common neighbours for all edges in a graph?

I have a network defined by a list of edges. The network is large and sparse. For each pair of connected vertices, I would like to calculate the number of common neighbours. This post discusses how to do this for a single pair of vertices, but it strikes me as inefficient to loop over all edges to calculate this statistic for each edge in the graph. Instead, the statistic I'm after can be calculated from the product of the adjacency matrix with itself, as follows:
library(igraph)
library(data.table)
set.seed(1111)
E <- data.table(i = sample(as.character(1:5e4), 1e5, replace = T),
                j = sample(as.character(1:5e4), 1e5, replace = T))
G <- simplify(graph_from_data_frame(E, directed = F)) # remove loops and multiples
N <- as_adjacency_matrix(G) %*% as_adjacency_matrix(G)
However, I don't know how to efficiently get the information out of the resulting matrix N, without looping over all the cells, which would look like this:
extract_entries <- function(x, M) {
  nl <- M@p[x] + 1 # index from 1, not 0
  nu <- M@p[x + 1]
  j.col <- M@Dimnames[[1]][M@i[nl:nu] + 1]
  i.col <- M@Dimnames[[2]][x]
  nb.col <- M@x[nl:nu]
  data.table(i = i.col, j = j.col, nb = nb.col)
}
system.time(E.nb <- rbindlist(lapply(1:N@Dim[1], extract_entries, N), fill = T))
# user system elapsed
# 8.29 0.02 8.31
E <- E.nb[E, on = c('i', 'j')][is.na(nb), nb := 0]
Even in the reproducible example above, looping is slow, and the true graph might have millions of vertices and tens of millions of edges. My end goal is to add a column to the data frame E with the number of common neighbours for each edge, as illustrated in the MWE.
My question is: is there a (much) more efficient way of extracting the number of common neighbours for each pair of vertices and merging this information back into the list of edges?
I have seen that the DiagrammeR package includes a function that calculates the number of common neighbours; however, it again appears intended for a limited number of edges, and it wouldn't solve the problem of adding the common-neighbour counts back to the original data frame.
You're pretty much there. Just a couple of things: converting N to a triangular matrix lets us build E.nb without lapply. Also, the i-j ordering breaks the E.nb[E join, so we temporarily sort each row to fix it.
I've also included a function that uses the igraph triangles function instead of squaring the adjacency matrix, which is a bit faster on this example dataset.
library(igraph)
library(data.table)
library(Matrix)
set.seed(1111)
E <- data.table(i = sample(as.character(1:5e4), 1e5, replace = TRUE),
                j = sample(as.character(1:5e4), 1e5, replace = TRUE))
f1 <- function(E) {
  blnSort <- E[[1]] > E[[2]]
  E[blnSort, 2:1 := .SD, .SDcols = 1:2]
  G <- simplify(graph_from_data_frame(E, directed = FALSE)) # remove loops and multiples
  N <- as(tril(as_adjacency_matrix(G) %*% as_adjacency_matrix(G), -1), "dtTMatrix")
  data.table(
    i = N@Dimnames[[1]][N@i + 1],
    j = N@Dimnames[[2]][N@j + 1],
    nb = as.integer(N@x)
  )[
    i > j, 2:1 := .(i, j)
  ][
    E, on = .(i, j)
  ][
    is.na(nb), nb := 0L
  ][
    blnSort, 2:1 := .(i, j)
  ]
}
f2 <- function(E) {
  blnSort <- E[[1]] > E[[2]]
  E[blnSort, 2:1 := .SD, .SDcols = 1:2]
  G <- simplify(graph_from_data_frame(E, directed = FALSE)) # remove loops and multiples
  tri <- matrix(triangles(G), ncol = 3, byrow = TRUE)
  data.table(
    i = names(V(G)[tri[, c(1, 1, 2)]]),
    j = names(V(G)[tri[, c(2, 3, 3)]])
  )[
    i > j, 2:1 := .(i, j)
  ][
    , .(nb = .N), .(i, j)
  ][
    E, on = .(i, j)
  ][
    is.na(nb), nb := 0L
  ][
    blnSort, 2:1 := .(i, j)
  ]
}
microbenchmark::microbenchmark(f1 = f1(EE),
                               f2 = f2(EE),
                               setup = {EE <- copy(E)})
#> Unit: milliseconds
#> expr min lq mean median uq max neval
#> f1 257.4803 281.8928 325.0267 303.4441 370.4977 478.7524 100
#> f2 123.5213 139.5152 169.3914 151.3065 190.7800 284.2644 100
identical(f1(copy(E)), f2(copy(E)))
#> [1] TRUE
I think you can simply use a matrix to index N. Since you are using characters as vertex names, which become the row/column names of the sparse matrix N, we have to use match to find the corresponding indices:
cbind(df, nb = N[matrix(match(as.matrix(df), colnames(N)), ncol = 2)])
Example
Given an edge list df and resulting sparse matrix N
df <- data.frame(
  from = letters[c(1, 1, 2, 2, 6, 6, 6, 1, 1, 1)],
  to = letters[c(2, 4, 3, 5, 3, 2, 5, 5, 6, 3)]
)
g <- graph_from_data_frame(df, directed = FALSE)
m <- get.adjacency(g)
N <- m %*% m
By running the solution above, we obtain
> cbind(df, nb = N[matrix(match(as.matrix(df), colnames(N)), ncol = 2)])
from to nb
1 a b 3
2 a d 0
3 b c 2
4 b e 2
5 f c 2
6 f b 3
7 f e 2
8 a e 2
9 a f 3
10 a c 2

How to run a function with multiple arguments of varying length in a loop in R

I need to run this function about 6000 times to cover all of its iterations. It takes 6 arguments in total. The first 3 of them go hand in hand and have 75 values each. The next argument has 9 values, and the last 2 arguments have 3 values each.
#require dplyr
#data is history as list
matchloop <- function(data, data2, x, a, b, c) {
  #history as list
  split <- data
  #history for reference
  fh <- FullHistory
  #start counter
  n <- 1
  #end counter
  m <- a
  tempdf0.3 <- fh
  #set condition for loop
  while(nrow(tempdf0.3) > 1 && m <= (nrow(data2))*b) {
    #put history into a variable
    tempdf0.0 <- split
    #put fh into a variable
    tempdf0.5 <- fh
    #put test path into variable from row n to m
    tempdf0.1 <- as.data.frame(data2[n:m,], stringsAsFactors = FALSE)
    #change column name of test path
    colnames(tempdf0.1) <- "directions"
    #put row n to m of history into variable
    tempdf0.2 <- lapply(tempdf0.0, function(df) df[n:m,])
    #put output into output
    tempdf0.3 <- orderedDistancespos(tempdf0.2, tempdf0.1,
                                     "allPaths", "directions")
    #add to output routeID based on reference from fh-the test path ID
    tempdf0.3 <- mutate(tempdf0.3, routeID = (subset(tempdf0.5, routeID != x)$routeID))
    #reduce output based on the matched threshold
    tempdf0.3 <- subset(tempdf0.3, dists >= a*c)
    #create new history based on the IDs remaining in output
    split <- split[as.character(tempdf0.3$routeID)]
    #create new history for reference based on the IDs remaining in output
    fh <- subset(fh, routeID %in% tempdf0.3$routeID)
    #increase loop counter
    n <- n + a
    #increase loop counter
    m <- n + (a - 1)
  }
  #show output
  mylist <- list(tempdf0.3, nrow(tempdf0.3))
  return(mylist)
}
I tried putting the 3 arguments with 75 elements each into their own lists and using mapply, which works. But even then I still have to run the code 81 times to cover all the variables because, as far as I understand, mapply recycles based on the length of the longest argument.
mapply(matchloop, mylist2,mylist3,mylist4, MoreArgs = list(a=a, b=b, c=c))
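For illustration (a toy example of my own, not from the original question): mapply recycles shorter arguments against the longest one, e.g.
mapply(function(x, y) x + y, 1:6, 1:2)
# [1] 2 4 4 6 6 8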
data is a list of dataframes
data2 is a dataframe
x, a, b, c are all numerical.
Right now I'm trying to streamline my output so that it all ends up in one place. If possible, I would like a single CSV output with all 6000+ lines.
You can combine the mapply and apply functions to cycle through all possible combinations of the a, b and c variables. To create all possible combinations you can use expand.grid. Finally, you can concatenate the list of rows into a data.frame with the help of the do.call and rbind functions as follows:
matchloop_stub <- matchloop <- function(data, data2, x, a, b, c) {
  # stub
  c(d = sum(data), d2 = sum(data2), x = sum(x), a = a, b = b, c = c, r = a + b + c)
}
set.seed(123)
mylist2 <- replicate(75, data.frame(rnorm(1)))
mylist3 <- replicate(75, data.frame(rnorm(2)))
mylist4 <- replicate(75, data.frame(rnorm(3)))
a <- 1:9
b <- 1:3
c <- 1:3
abc <- expand.grid(a, b, c)
names(abc) <- c("a", "b", "c")
xs <- apply(abc, 1, function(x) (mapply(matchloop_stub, mylist2, mylist3, mylist4, x[1], x[2], x[3], SIMPLIFY = FALSE)))
df <- do.call(rbind, do.call(rbind, xs))
write.csv(df, file = "temp.csv")
res <- read.csv("temp.csv")
nrow(res)
# [1] 6075  (75 data sets x 81 parameter combinations)
head(res)
# X d d2 x a b c r
# 1 1 -0.5604756 0.7407984 -1.362065 1 1 1 3
# 2 2 -0.5604756 0.7407984 -1.362065 2 1 1 4
# 3 3 -0.5604756 0.7407984 -1.362065 3 1 1 5
# 4 4 -0.5604756 0.7407984 -1.362065 4 1 1 6
# 5 5 -0.5604756 0.7407984 -1.362065 5 1 1 7
# 6 6 -0.5604756 0.7407984 -1.362065 6 1 1 8

Extract the combinations of cells without repeating the index

I am trying to calculate the combinations of the elements of a matrix, where each element should appear only once.
The (real) matrix is symmetric and can have more than 5 elements (up to ~2000):
o <- matrix(runif(25), ncol = 5, nrow = 5)
dimnames(o) <- list(LETTERS[1:5], LETTERS[1:5])
# A B C D E
# A 0.4400317 0.1715681 0.7319108946 0.3994685 0.4466997
# B 0.5190471 0.1666164 0.3430245044 0.3837903 0.9322599
# C 0.3249180 0.6122229 0.6312876740 0.8017402 0.0141673
# D 0.1641411 0.1581701 0.0001703419 0.7379847 0.8347536
# E 0.4853255 0.5865909 0.6096330935 0.8749807 0.7230507
I want to calculate the product over all combinations of pairs (if possible, every element should appear: AB, CD, EF if the matrix has 6 elements), where for each pair one letter is the column and the other is the row. Here are some combinations:
AB, CD, E
AC, BD, E
AD, BC, E
AE, BC, D
AE, BD, C
Where the value of the single element is just 1.
Combinations not desired:
AB, BC: Element B appears twice
AB, AC: Element A appears twice
Things I tried:
I thought about removing the unwanted part of the matrix:
out <- which(upper.tri(o), arr.ind = TRUE)
out <- cbind.data.frame(out, value = o[upper.tri(o)])
out[, 1] <- colnames(o)[out[, 1]]
out[, 2] <- colnames(o)[out[, 2]]
# row col value
# 1 A B 0.1715681
# 2 A C 0.7319109
# 3 B C 0.3430245
# 4 A D 0.3994685
# 5 B D 0.3837903
# 6 C D 0.8017402
# 7 A E 0.4466997
# 8 B E 0.9322599
# 9 C E 0.0141673
# 10 D E 0.8347536
My attempt involves the following process:
1. Make a copy of the matrix (out).
2. Store the first value of the first row.
3. Remove all the pairs that involve either element of that pair.
4. Select the next pair of the resulting matrix.
5. Repeat until all rows are removed from the matrix.
6. Repeat steps 2-5 starting from a different row.
However, this method has one big problem: it doesn't guarantee that all the combinations are stored, and it could store the same combination several times.
My expected output is a vector, where each element is the product of the values in the cells selected by the combination:
AB, CD: 0.137553
How can I extract all those combinations efficiently?
This might work. I tested this on N elements = 5 and 6.
Note that this is not optimised, and hopefully can provide a framework for you to work from. With a much larger array, I can see steps involving apply and combn being a bottleneck.
The idea here is to generate a collection of unique sets first before calculating the product of the sets from another data.frame that stores values of sets.
Unique sets are identified by counting the number of unique elements in all combination pairs. For example, if N elements = 6, we expect length(unlist(combination)) == 6. The same is true if N elements = 7 (there will only be 3 pairs plus a remainder element). In cases where N elements is odd, we can ignore the remaining, unpaired element since it is constrained by the other elements.
library(dplyr)
library(reshape2)
## some functions
unique_by_n <- function(inlist, N){
  ## select unique combinations by count
  ## (if unique, expect n = 6 when n elements = 6)
  if(N %% 2) N <- N - 1 ## for odd numbers
  return(length(unique(unlist(inlist))) == N)
}
get_combs <- function(x, xall){
  ## format and catch the remainder if the matrix has an odd number of elements
  xu <- unlist(x)
  remainder <- setdiff(xall, xu) ## catch remainder if any
  xset <- unlist(lapply(x, paste0, collapse=''))
  finalset <- c(xset, remainder)
  return(finalset)
}
## make dataset
set.seed(0) ## set reproducible example
#o <- matrix(runif(25), ncol = 5, nrow = 5) ## uncomment to test 5
#dimnames(o) <- list(LETTERS[1:5], LETTERS[1:5])
o <- matrix(runif(36), ncol = 6, nrow = 6)
dimnames(o) <- list(LETTERS[1:6], LETTERS[1:6])
o[lower.tri(o)] <- t(o)[lower.tri(o)] ## make matrix symmetric
n_elements = nrow(o)
#### get matrix
dat <- melt(o, varnames = c('Rw', 'Cl'), as.is = TRUE)
dat$Set <- apply(dat, 1, function(x) paste0(sort(unique(x[1:2])), collapse = ''))
## get unique sets (since your matrix is symmetric)
dat <- subset(dat, !duplicated(Set))
#### get sets
elements <- rownames(o)
allpairs <- expand.grid(Rw = elements, Cl = elements) %>%
  filter(Rw != Cl) ## get all pairs
uniqpairsgrid <- unique(t(apply(allpairs,1,sort)))
uniqpairs <- split(uniqpairsgrid, seq(nrow(uniqpairsgrid))) ## get unique pairs
allpaircombs <- combn(uniqpairs,floor(n_elements/2)) ## get combinations of pairs
uniqcombs <- allpaircombs[,apply(allpaircombs, 2, unique_by_n, N = n_elements)] ## remove pairs with repeats
finalcombs <- apply(uniqcombs, 2, get_combs, xall=elements)
#### calculate results
res <- apply(finalcombs, 2, function(x) prod(subset(dat, Set %in% x)$value)) ## calculate product
names(res) <- apply(finalcombs, 2, paste0, collapse=',') ## add names
resdf <- data.frame(Sets = names(res), Products = res, stringsAsFactors = FALSE, row.names = NULL)
print(resdf)
#> Sets Products
#> 1 AB,CD,EF 0.130063454
#> 2 AB,CE,DF 0.171200062
#> 3 AB,CF,DE 0.007212619
#> 4 AC,BD,EF 0.012494787
#> 5 AC,BE,DF 0.023285088
#> 6 AC,BF,DE 0.001139712
#> 7 AD,BC,EF 0.126900247
#> 8 AD,BE,CF 0.158919605
#> 9 AD,BF,CE 0.184631344
#> 10 AE,BC,DF 0.042572488
#> 11 AE,BD,CF 0.028608495
#> 12 AE,BF,CD 0.047056905
#> 13 AF,BC,DE 0.003131029
#> 14 AF,BD,CE 0.049941770
#> 15 AF,BE,CD 0.070707311
Created on 2018-07-23 by the [reprex package](http://reprex.tidyverse.org) (v0.2.0.9000).
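As a sanity check on the size of that output (my addition, not from the answer): the desired sets are perfect matchings, and a complete graph on n vertices (n even) has (n-1)!! = (n-1)(n-3)...1 of them, which explains the 15 rows above and shows why enumerating them for very large n is infeasible:
## Double factorial (n-1)!! counts the perfect matchings of n (even) elements
dbl_fact <- function(n) prod(seq(n, 1, by = -2))
dbl_fact(5)   # 15 matchings for n = 6, matching the 15 rows above
dbl_fact(19)  # 654729075 matchings already for n = 20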
Maybe the following does what you want.
Note that I was more interested in being right than in performance.
Also, I have set the RNG seed, to have reproducible results.
set.seed(9840) # Make reproducible results
o <- matrix(runif(25), ncol = 5, nrow = 5)
dimnames(o) <- list(LETTERS[1:5], LETTERS[1:5])
cmb <- combn(LETTERS[1:5], 2)
n <- ncol(cmb)
res <- NULL
nms <- NULL
for(i in seq_len(n)){
  for(j in seq_len(n)[-seq_len(i)]){
    x <- unique(c(cmb[, i], cmb[, j]))
    if(length(x) == 4){
      res <- c(res, o[cmb[1, i], cmb[2, i]] * o[cmb[1, j], cmb[2, j]])
      nms <- c(nms, paste0(cmb[1, i], cmb[2, i], '*', cmb[1, j], cmb[2, j]))
    }
  }
}
names(res) <- nms
res

Computing squared residual in regression row-wise in a data.table

Suppose I have a data.table with columns X1, X2, X3, Y. For each row, I would like to treat the entries in X1, X2, X3 as a vector of length 3, prepend a 1 for the intercept, take the inner product with a fixed vector beta of length 4, subtract the result from the entry in Y, square the result, and either output the final result for every row or save it as another column.
After much research, I came up with this
dat[, (Y - sum(.SD * beta))^2, .SDcols = c(1:3)]
which does not work as expected: .SD * beta recycles beta down each column, and sum() collapses over the whole subset rather than row by row.
Bonus point #1: Doing this with 3 replaced by a general n.
Bonus point #2: Suppose I have a grp column with group indices and I want to average these residual squares by group.
Assuming y is the first column of your data.table dat and the rest of the columns are predictors, this works, including bonus 1:
mat = as.matrix(dat[, x1:x3, with = F])
pred = cbind(1, mat) %*% beta
dat[, rss := (pred - y)^2]
For bonus 2:
dat[, mean_by_grp := mean(rss), by = grp]
To avoid the matrix conversion, you could do this:
dat[, pred := beta[1] + beta[2] * x1 + beta[3] * x2 + beta[4] * x3]
writing out the inner product.
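For bonus 1 without the matrix conversion (a sketch of my own, assuming the predictors are named x1 ... xn and beta[1] is the intercept), the inner product can be built up column by column:
## Hypothetical general-n version: accumulate b_i * x_i across the .SD columns
n <- 3
xcols <- paste0("x", seq_len(n))
dat[, pred := beta[1] + Reduce(`+`, Map(`*`, as.list(beta[-1]), .SD)), .SDcols = xcols]
dat[, rss := (pred - y)^2]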
Complete reproducible example
set.seed(47)
dat = data.table(replicate(4, rnorm(5)))
setnames(dat, c("y", paste0("x", 1:3)))
dat[, grp := c("A", "A", "B", "B", "B")]
beta = 1:4
mat = as.matrix(dat[, x1:x3, with = F])
pred = cbind(1, mat) %*% beta
dat[, rss := (pred - y) ^ 2]
dat[, mean_by_grp := mean(rss), by = grp]
dat
# y x1 x2 x3 grp rss mean_by_grp
# 1: 1.9946963 -1.08573747 -0.92245624 0.67077922 A 10.565250 7.064164
# 2: 0.7111425 -0.98548216 0.03960243 -0.08107805 A 3.563078 7.064164
# 3: 0.1854053 0.01513086 0.49382018 1.26424109 B 54.512843 38.263204
# 4: -0.2817650 -0.25204590 -1.82822917 -0.70338819 B 56.558929 38.263204
# 5: 0.1087755 -1.46575030 0.09147291 -0.04057817 B 3.717840 38.263204

Recode dataframe based on one column - in reverse

I asked this question a while ago (Recode dataframe based on one column) and the answer worked perfectly. Now, however, I almost want to do the reverse. Namely, I have a (700k * 2000) matrix of 0/1/2 or NA. In a separate dataframe I have two columns (Ref and Obs). A 0 corresponds to two instances of Ref, a 1 is one instance of Ref and one instance of Obs, and a 2 is two instances of Obs. To clarify, a data snippet:
Genotype File ---
Ref Obs
A G
T C
G C
Ref <- c("A", "T", "G")
Obs <- c("G", "C", "C")
Current Data---
Sample.1 Sample.2 .... Sample.2000
0 1 2
0 0 0
0 NA 1
mat <- matrix(nrow=3, ncol=3)
mat[,1] <- c(0,0,0)
mat[,2] <- c(1,0,NA)
mat[,3] <- c(2,0,1)
Desired Data format---
Sample.1 Sample.1 Sample.2 Sample.2 Sample.2000 Sample.2000
A A A G G G
T T T T T T
G G 0 0 G C
I think that's right. The desired data format has two columns (space separated) for each sample. 0 in this format (plink ped file for the bioinformaticians out there) is missing data.
MAJOR ASSUMPTION: your data is in 3-element frames, i.e. you want to apply your mapping to the first 3 rows, then the next 3, and so on, which I think makes sense given DNA frames. If you want a rolling 3-element window this will not work (though the code can be modified to make it work). This will work for an arbitrary number of columns and an arbitrary number of 3-row groups:
# Make up a matrix with your properties (4 cols, 6 rows)
col <- 4L
frame <- 3L
mat <- matrix(sample(c(0:2, NA_integer_), 2 * frame * col, replace=T), ncol=col)
# Mapping data
Ref <- c("A", "T", "G")
Obs <- c("G", "C", "C")
map.base <- cbind(Ref, Obs)
num.to.let <- matrix(c(1, 1, 1, 2, 2, 2), byrow=T, ncol=2) # how many from each of ref obs
# Function to map 0,1,2,NA to Ref/Obs
re_map <- function(mat.small) { # 3 row matrices, with col columns
  t(
    mapply( # iterate through each row in matrix
      function(vals, map, num.to.let) {
        vals.2 <- unlist(lapply(vals, function(x) map[num.to.let[x + 1L, ]]))
        ifelse(is.na(vals.2), 0, vals.2)
      },
      vals = split(mat.small, row(mat.small)), # a row
      map = split(map.base, row(map.base)), # the mapping for that row
      MoreArgs = list(num.to.let = num.to.let) # general conversion of number to Obs/Ref
    )
  )
}
# Split input data frame into 3 row matrices (assumes frame size 3),
# and apply mapping function to each group
mat.split <- split.data.frame(mat, sort(rep(1:(nrow(mat) / frame), frame)))
mat.res <- do.call(rbind, lapply(mat.split, re_map))
colnames(mat.res) <- paste0("Sample.", rep(1:ncol(mat), each=2))
print(mat.res, quote=FALSE)
# Sample.1 Sample.1 Sample.2 Sample.2 Sample.3 Sample.3 Sample.4 Sample.4
# 1 G G A G G G G G
# 2 C C 0 0 T C T C
# 3 0 0 G C G G G G
# 1 A A A A A G A A
# 2 C C C C T C C C
# 3 C C G G 0 0 0 0
I am not sure, but this could be what you need.
First, some simple data:
geno <- data.frame(Ref = c("A", "T", "G"), Obs = c("G", "C", "C"))
data <- data.frame(s1 = c(0,0,0),s2 = c(1, 0, NA))
then a couple of functions:
f <- function(i, x, geno){
  x <- x[i]
  if(!is.na(x)){
    if (x == 0) {y <- geno[i, c(1,1)]}
    if (x == 1) {y <- geno[i, c(1,2)]}
    if (x == 2) {y <- geno[i, c(2,2)]}
  }
  else y <- c(0,0)
  names(y) <- c("s1", "s2")
  y
}
g <- function(x, geno){
  Reduce(rbind, lapply(1:length(x), FUN = f, x = x, geno = geno))
}
The way f() is defined may not be the most elegant, but it does the job.
Then simply run it as a double for loop in an lapply fashion:
as.data.frame(Reduce(cbind, lapply(data, g, geno = geno)))
Hope it helps.
Here's one way based on the sample data in your answer:
# create index: 0 -> cols (1,1), 1 -> (1,2), 2 -> (2,2) of geno; NA stays NA
idx <- lapply(data, function(x) cbind((x > 1) + 1, (x > 0) + 1))
# list of matrices: look up geno by (row, column) index, then recode NA as 0
lst <- lapply(idx, function(x) {
  tmp <- apply(x, 2, function(y) geno[cbind(seq_along(y), y)])
  replace(tmp, is.na(tmp), 0)
})
# one data frame
as.data.frame(lst)
# s1.1 s1.2 s2.1 s2.2
# 1 A A A G
# 2 T T T T
# 3 G G 0 0
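To see what the index construction does (a toy illustration of mine, not from the answer), apply it to the four possible codes:
x <- c(0, 1, 2, NA)
cbind((x > 1) + 1, (x > 0) + 1)
#      [,1] [,2]
# [1,]    1    1
# [2,]    1    2
# [3,]    2    2
# [4,]   NA   NA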
