How to get findOverlapped region? - r

Hi, I am working with GRanges and finding overlaps with the findOverlaps function from IRanges. I get the hits showing which query and subject ranges overlap, but I also want the coordinates of the query and the subject where they overlap, so that I can retrieve the corresponding sequence.
How can I get the coordinates of both subject and query where they overlap? I am using the following code:
library(GenomicRanges)
library(regioneR) # toGRanges
df1 <- structure(list(df1c = c("chr2", "chr2", "chr2", "chr2"), df1c2 = c(2800,
3600, 3719, 3893), df1c3 = c(3270, 4152, 5092, 4547)), class = "data.frame", row.names = c(NA,
-4L))
df2 <- structure(list(df2c = c("chr2", "chr2", "chr2", "chr2", "chr2L"
), df2c2 = c(263, 342, 424, 846, 1030), df2c3 = c(20091, 17222,
2612, 4265, 11575)), class = "data.frame", row.names = c(NA,
-5L))
# The data frames must exist before they are converted to GRanges
fo <- findOverlaps(query = toGRanges(df1), subject = toGRanges(df2), type = "within")
The expected output should be like
chr CoDF1 CoDF2
1 100-200 90-210
1 150-280 100-285
CoDF1 = coordinates in the df1 file where it overlaps with df2 reads
CoDF2 = coordinates in the df2 file where it overlaps with df1 reads

You'd better use intersect() :
> intersect(toGRanges(df1),toGRanges(df2))
GRanges object with 2 ranges and 0 metadata columns:
seqnames ranges strand
<Rle> <IRanges> <Rle>
[1] chr2 2800-3270 *
[2] chr2 3600-5092 *
-------
seqinfo: 2 sequences from an unspecified genome; no seqlengths
But note that the column names of your data frames are not the ones expected for creating a GRanges object; they should be seqnames/start/end.
EDIT:
To see all pairwise overlaps between the two sets of coordinates:
intersection = findOverlaps(query = toGRanges(df1), subject = toGRanges(df2), type = "any")
df = data.frame(df1[queryHits(intersection),], df2[subjectHits(intersection),])
df
seqnames start end seqnames.1 start.1 end.1
1 chr2 2800 3270 chr2 263 20091
1.1 chr2 2800 3270 chr2 342 17222
1.2 chr2 2800 3270 chr2 846 4265
2 chr2 3600 4152 chr2 263 20091
2.1 chr2 3600 4152 chr2 342 17222
2.2 chr2 3600 4152 chr2 846 4265
3 chr2 3719 5092 chr2 263 20091
3.1 chr2 3719 5092 chr2 342 17222
3.2 chr2 3719 5092 chr2 846 4265
4 chr2 3893 4547 chr2 263 20091
4.1 chr2 3893 4547 chr2 342 17222
4.2 chr2 3893 4547 chr2 846 4265
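The paired table above shows which ranges overlap, but not where. For two closed intervals that are already known to overlap, the shared region runs from the larger of the two starts to the smaller of the two ends. A minimal base R sketch of that step, assuming a paired table shaped like df above (three of its rows are retyped here; the ov_start, ov_end and overlap column names are made up for illustration):

```r
# Three hit pairs copied from the table above (query = df1, subject = df2).
pairs <- data.frame(
  seqnames = "chr2",
  start    = c(2800, 3719, 3893),
  end      = c(3270, 5092, 4547),
  start.1  = c(846, 846, 263),
  end.1    = c(4265, 4265, 20091)
)

# Shared region of each pair: larger of the two starts, smaller of the two ends.
pairs$ov_start <- pmax(pairs$start, pairs$start.1)
pairs$ov_end   <- pmin(pairs$end, pairs$end.1)

# Illustrative "start-end" label in the style of the expected output.
pairs$overlap <- paste0(pairs$ov_start, "-", pairs$ov_end)
print(pairs[, c("seqnames", "start", "end", "start.1", "end.1", "overlap")])
```

This only makes sense for rows that come from actual hits; for pairs that do not overlap, ov_start would be larger than ov_end.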

Related

Finding overlaps between 2 ranges and their overlapped region lengths?

I need to find the length of the overlapping region on the same chromosome between two groups (gp1 and gp2). (Similar questions on Stack Overflow differ from my aim, because I want the overlapping region itself, not a TRUE/FALSE answer.)
For example:
gp1:
chr start end id1
chr1 580 600 1
chr1 900 970 2
chr3 400 600 3
chr2 100 700 4
gp2:
chr start end id2
chr1 590 864 1
chr3 550 670 2
chr2 897 1987 3
I'm looking for a way to compare these 2 groups and get results like this:
id1 id2 chr overlapped_length
1 1 chr1 10
3 2 chr3 50
Should point you in the right direction:
Load libraries
# install.packages("BiocManager")
# BiocManager::install("GenomicRanges")
library(GenomicRanges)
library(IRanges)
Generate data
gp1 <- read.table(text =
"
chr start end id1
chr1 580 600 1
chr1 900 970 2
chr3 400 600 3
chr2 100 700 4
", header = TRUE)
gp2 <- read.table(text =
"
chr start end id2
chr1 590 864 1
chr3 550 670 2
chr2 897 1987 3
", header = TRUE)
Calculate ranges
gr1 <- GenomicRanges::makeGRangesFromDataFrame(
gp1,
seqnames.field = "chr",
start.field = "start",
end.field = "end"
)
gr2 <- GenomicRanges::makeGRangesFromDataFrame(
gp2,
seqnames.field = "chr",
start.field = "start",
end.field = "end"
)
Calculate overlaps
hits <- findOverlaps(gr1, gr2)
p <- Pairs(gr1, gr2, hits = hits)
i <- pintersect(p)
Result
> as.data.frame(i)
seqnames start end width strand hit
1 chr1 590 600 11 * TRUE
2 chr3 550 600 51 * TRUE
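Note that the widths above (11 and 51) count both end positions, while the expected output in the question (10 and 50) corresponds to end minus start. A rough sketch in plain base R (a swapped-in approach, not the GenomicRanges one used above) that keeps id1 and id2 and reports both conventions; the width_inclusive column name is made up for illustration, and pairing every range per chromosome with merge() is assumed to be acceptable for small tables:

```r
gp1 <- read.table(text = "
chr start end id1
chr1 580 600 1
chr1 900 970 2
chr3 400 600 3
chr2 100 700 4
", header = TRUE)

gp2 <- read.table(text = "
chr start end id2
chr1 590 864 1
chr3 550 670 2
chr2 897 1987 3
", header = TRUE)

# Pair every gp1 range with every gp2 range on the same chromosome.
m <- merge(gp1, gp2, by = "chr", suffixes = c(".1", ".2"))

# Candidate shared region; a pair overlaps when start <= end (closed intervals).
ov_start <- pmax(m$start.1, m$start.2)
ov_end   <- pmin(m$end.1, m$end.2)
keep     <- ov_start <= ov_end

res <- data.frame(
  id1 = m$id1[keep],
  id2 = m$id2[keep],
  chr = m$chr[keep],
  overlapped_length = ov_end[keep] - ov_start[keep],    # end - start, as in the question
  width_inclusive   = ov_end[keep] - ov_start[keep] + 1 # both ends counted, like width()
)
res <- res[order(res$id1), ]
print(res)
```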

Plotting genomic data using RCircos package

I am trying to use the RCircos package in R to visualize links between genomic positions. I am unfamiliar with this package and have been using the package documentation available from the CRAN repository from 2016.
I have attempted to format my data according to the package requirements. Here is what it looks like:
> head(pts3)
Chromosome ChromStart ChromEnd Chromosome.1 ChromStart.1 ChromEnd.1
1 chr1 33 34 chr1 216 217
2 chr1 33 34 chr1 789 790
3 chr1 33 34 chr1 1716 1717
4 chr1 33 34 chr1 1902 1903
5 chr1 33 34 chr2 2538 2539
6 chr1 33 34 chr2 4278 4279
Ultimately, I would like to produce a plot with tracks from ChromStart to ChromStart.1 and each gene labeled along the outside of the plot. I thought the script would look something like:
RCircos.Set.Core.Components(cyto.info = pts3,
chr.exclude = NULL,
tracks.inside = 1,
tracks.outside = 2)
RCircos.Set.Plot.Area()
RCircos.Chromosome.Ideogram.Plot()
RCircos.Link.Plot(link.data = pts3,
track.num = 3,
by.chromosome = FALSE)
It appears that to do so, I must first initialize with the RCircos.Set.Core.Components() function which requires positional information for each gene to pass to RCircos.Chromosome.Ideogram.Plot(). So, I created a second data frame containing the required information to pass to the function and this is the error that I get:
> head(genes)
Chromosome ChromStart ChromEnd GeneName Band Stain
1 chr1 0 2342 PB2 NA NA
2 chr2 2343 4683 PB1 NA NA
3 chr3 4684 6917 PA NA NA
4 chr4 6918 8710 HA NA NA
5 chr5 8711 10276 NP NA NA
6 chr6 10277 11735 NA NA NA
> RCircos.Set.Core.Components(cyto.info = genes,
+ chr.exclude = NULL,
+ tracks.inside = 1,
+ tracks.outside = 2)
Error in RCircos.Validate.Cyto.Info(cyto.info, chr.exclude) :
Cytoband start should be 0.
I don't actually have data for the Band or Stain columns and don't understand what they are for, but adding data to those columns (such as 1:8 or chr1, chr2, etc.) does not resolve the problem. Based on a recommendation from another forum, I also tried to reset the plot parameters for RCircos using the following functions, but that did not resolve the error either:
core.chrom <- data.frame("Chromosome" = c("chr1", "chr2", "chr3", "chr4", "chr5", "chr6", "chr7", "chr8"),
"ChromStart" = c(0, 2343, 4684, 6918, 8711, 10277, 11736, 12763),
"ChromEnd" = c(2342, 4683, 6917, 8710, 10276, 11735, 12762, 13666),
"startLoc" = c(0, 2343, 4684, 6918, 8711, 10277, 11736, 12763),
"endLoc" = c(2342, 4683, 6917, 8710, 10276, 11735, 12762, 13666),
"Band" = NA,
"Stain" = NA)
RCircos.Reset.Plot.Ideogram(chrom.ideo = core.chrom)
Any advice would be deeply appreciated!
I'm not sure if you figured this one out or moved on etc. I had the same problem and ended up resolving it by reformatting my start positions for each chromosome to 0 as opposed to a continuation of the previous chr. For you it would be:
Chromosome ChromStart ChromEnd GeneName Band Stain
1 chr1 0 2342 PB2 NA NA
2 chr2 0 2340 PB1 NA NA
3 chr3 0 2233 PA NA NA
...etc
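The re-basing described above can be done in one step rather than by hand: subtract each segment's start from its end, then set every start to 0. A small base R sketch, assuming a genes data frame like the one in the question (only the first three rows are retyped, and the Band and Stain columns are left out):

```r
genes <- data.frame(
  Chromosome = c("chr1", "chr2", "chr3"),
  ChromStart = c(0, 2343, 4684),
  ChromEnd   = c(2342, 4683, 6917),
  GeneName   = c("PB2", "PB1", "PA")
)

# Re-base every segment so that it starts at 0 and keeps its own length.
genes$ChromEnd   <- genes$ChromEnd - genes$ChromStart
genes$ChromStart <- 0
print(genes)
```

This only addresses the "Cytoband start should be 0" check; whether RCircos needs anything further in the Band and Stain columns is not covered here.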

Get ranges for synonymous and non-synonymous nucleotide positions within a codon separately

I have a GRanges object (coordinates of all exons of a gene); coding_pos gives the codon position of the first nucleotide of each exon (1 means that the first nucleotide in the exon is also the first nt of a codon, and so on).
grTargetGene itself looks like this
> grTargetGene
GRanges object with 11 ranges and 7 metadata columns:
seqnames ranges strand | ensembl_ids gene_biotype prev_exons_length coding_pos
<Rle> <IRanges> <Rle> | <character> <character> <numeric> <numeric>
[1] chr2 [148602722, 148602776] + | ENSG00000121989 protein_coding 0 1
[2] chr2 [148653870, 148654077] + | ENSG00000121989 protein_coding 55 2
[3] chr2 [148657027, 148657136] + | ENSG00000121989 protein_coding 263 3
[4] chr2 [148657313, 148657467] + | ENSG00000121989 protein_coding 373 2
[5] chr2 [148672760, 148672903] + | ENSG00000121989 protein_coding 528 1
[6] chr2 [148674852, 148674995] + | ENSG00000121989 protein_coding 672 1
[7] chr2 [148676016, 148676161] + | ENSG00000121989 protein_coding 816 1
[8] chr2 [148677799, 148677913] + | ENSG00000121989 protein_coding 962 3
[9] chr2 [148680542, 148680680] + | ENSG00000121989 protein_coding 1077 1
[10] chr2 [148683600, 148683730] + | ENSG00000121989 protein_coding 1216 2
[11] chr2 [148684649, 148684843] + | ENSG00000121989 protein_coding 1347 1
-------
seqinfo: 1 sequence from an unspecified genome; no seqlengths
I am interested in looking at coordinates separately for [1,2] positions in each codon and [3]. In other words, I would like to have 2 different GRanges objects that look approximately like this (here it is only the beginning)
> grTargetGene_Nonsynonym
GRanges object with X ranges and 7 metadata columns:
seqnames ranges strand | ensembl_ids gene_biotype
<Rle> <IRanges> <Rle> | <character> <character>
[1] chr2 [148602722, 148602723] + | ENSG00000121989 protein_coding
[2] chr2 [148602725, 148602726] + | ENSG00000121989 protein_coding
[3] chr2 [148602728, 148602729] + | ENSG00000121989 protein_coding
[4] chr2 [148602731, 148602732] + | ENSG00000121989 protein_coding
> grTargetGene_Synonym
GRanges object with X ranges and 7 metadata columns:
seqnames ranges strand | ensembl_ids gene_biotype
<Rle> <IRanges> <Rle> | <character> <character>
[1] chr2 [148602724, 148602724] + | ENSG00000121989 protein_coding
[2] chr2 [148602727, 148602727] + | ENSG00000121989 protein_coding
[3] chr2 [148602730, 148602730] + | ENSG00000121989 protein_coding
[4] chr2 [148602733, 148602733] + | ENSG00000121989 protein_coding
I was planning to do it through the loop that creates a set of granges for each exon according to coding_pos and strand, but I suspect there is a smarter way or maybe even a function that can do it already, but I couldn't find a simple solution.
Important: I do not need the sequence itself (the easiest way, in that case, would be to extract DNA first and then work with the sequence), but instead of doing this I only need the positions which I will use to overlap with some features.
> library("GenomicRanges")
> dput(grTargetGene)
new("GRanges"
, seqnames = new("Rle"
, values = structure(1L, .Label = "chr2", class = "factor")
, lengths = 6L
, elementMetadata = NULL
, metadata = list()
)
, ranges = new("IRanges"
, start = c(148602722L, 148653870L, 148657027L, 148657313L, 148672760L,
148674852L)
, width = c(55L, 208L, 110L, 155L, 144L, 144L)
, NAMES = NULL
, elementType = "integer"
, elementMetadata = NULL
, metadata = list()
)
, strand = new("Rle"
, values = structure(1L, .Label = c("+", "-", "*"), class = "factor")
, lengths = 6L
, elementMetadata = NULL
, metadata = list()
)
, elementMetadata = new("DataFrame"
, rownames = NULL
, nrows = 6L
, listData = structure(list(ensembl_ids =
c("ENSG00000121989","ENSG00000121989",
"ENSG00000121989", "ENSG00000121989", "ENSG00000121989", "ENSG00000121989"
), gene_biotype = c("protein_coding", "protein_coding", "protein_coding",
"protein_coding", "protein_coding", "protein_coding"), cds_length =
c(1542,1542, 1542, 1542, 1542, 1542), gene_start_position = c(148602086L,
148602086L, 148602086L, 148602086L, 148602086L, 148602086L),
gene_end_position = c(148688393L, 148688393L, 148688393L,
148688393L, 148688393L, 148688393L), prev_exons_length = c(0,
55, 263, 373, 528, 672), coding_pos = c(1, 2, 3, 2, 1, 1)), .Names =
c("ensembl_ids", "gene_biotype", "cds_length", "gene_start_position",
"gene_end_position",
"prev_exons_length", "coding_pos"))
, elementType = "ANY"
, elementMetadata = NULL
, metadata = list()
)
, seqinfo = new("Seqinfo"
, seqnames = "chr2"
, seqlengths = NA_integer_
, is_circular = NA
, genome = NA_character_
)
, metadata = list()
)
How about the following:
grl <- lapply(list(Nonsym = c(1, 2), Sym = c(3, 3)), function(x) {
ranges(grTargetGene) <- IRanges(
start = start(grTargetGene) + x[1] - 1,
end = start(grTargetGene) + x[2] - 1)
return(grTargetGene) })
grl
#$Nonsym
#GRanges object with 6 ranges and 7 metadata columns:
# seqnames ranges strand | ensembl_ids gene_biotype
# <Rle> <IRanges> <Rle> | <character> <character>
# [1] chr2 148602722-148602723 + | ENSG00000121989 protein_coding
# [2] chr2 148653870-148653871 + | ENSG00000121989 protein_coding
# [3] chr2 148657027-148657028 + | ENSG00000121989 protein_coding
# [4] chr2 148657313-148657314 + | ENSG00000121989 protein_coding
# [5] chr2 148672760-148672761 + | ENSG00000121989 protein_coding
# [6] chr2 148674852-148674853 + | ENSG00000121989 protein_coding
# cds_length gene_start_position gene_end_position prev_exons_length
# <numeric> <integer> <integer> <numeric>
# [1] 1542 148602086 148688393 0
# [2] 1542 148602086 148688393 55
# [3] 1542 148602086 148688393 263
# [4] 1542 148602086 148688393 373
# [5] 1542 148602086 148688393 528
# [6] 1542 148602086 148688393 672
# coding_pos
# <numeric>
# [1] 1
# [2] 2
# [3] 3
# [4] 2
# [5] 1
# [6] 1
# -------
# seqinfo: 1 sequence from an unspecified genome; no seqlengths
#
#$Sym
#GRanges object with 6 ranges and 7 metadata columns:
# seqnames ranges strand | ensembl_ids gene_biotype cds_length
# <Rle> <IRanges> <Rle> | <character> <character> <numeric>
# [1] chr2 148602724 + | ENSG00000121989 protein_coding 1542
# [2] chr2 148653872 + | ENSG00000121989 protein_coding 1542
# [3] chr2 148657029 + | ENSG00000121989 protein_coding 1542
# [4] chr2 148657315 + | ENSG00000121989 protein_coding 1542
# [5] chr2 148672762 + | ENSG00000121989 protein_coding 1542
# [6] chr2 148674854 + | ENSG00000121989 protein_coding 1542
# gene_start_position gene_end_position prev_exons_length coding_pos
# <integer> <integer> <numeric> <numeric>
# [1] 148602086 148688393 0 1
# [2] 148602086 148688393 55 2
# [3] 148602086 148688393 263 3
# [4] 148602086 148688393 373 2
# [5] 148602086 148688393 528 1
# [6] 148602086 148688393 672 1
# -------
# seqinfo: 1 sequence from an unspecified genome; no seqlengths
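grl contains a list of two GRanges, one with ranges based on positions 1 and 2, and the other with ranges based on position 3. Note that the positions are counted from the start of each exon, so this gives one range per exon and does not take coding_pos into account.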
I created a function that accounts for the strand and can process exons whose length is not divisible by 3 (and might even be less than 3):
CodonPosition_separation = function(grTargetGene) {
grTargetGene = sort(grTargetGene)
# Width of the preceding exon (0 for the first one); the loop below turns this into a running total
grTargetGene$prev_exons_length = c(0, width(grTargetGene)[-length(grTargetGene)])
if (length(grTargetGene) >1) {
for (l in 2:length(grTargetGene)) {
grTargetGene$prev_exons_length[l] = grTargetGene$prev_exons_length[l]+grTargetGene$prev_exons_length[l-1]
}
}
grTargetGene$coding_pos = grTargetGene$prev_exons_length%%3+1
grTargetGene_N = GRanges()
grTargetGene_S = GRanges()
for (l in 1:length(grTargetGene)) {
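# Clear temporaries left over from the previous exon; rm(obj) would look for a variable literally named "obj"
for (obj in c("start_nonsyn","start_syn", "end_nonsyn", "end_syn","gr_nonsyn","gr_syn")) {if(exists(obj, inherits = FALSE)) {rm(list = obj)}}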
if (as.character(strand(grTargetGene)[1]) =="+"){
start_ns = start(grTargetGene[l])+1-grTargetGene$coding_pos[l]
end_ns = end(grTargetGene[l])
if (start_ns <=end_ns) {
start_nonsyn = seq(from = start(grTargetGene[l])+1-grTargetGene$coding_pos[l],to = end(grTargetGene[l]), by=3)
end_nonsyn = seq(from = start(grTargetGene[l])+2-grTargetGene$coding_pos[l],to = end(grTargetGene[l]), by=3)
}
start_s =start(grTargetGene[l])+3-grTargetGene$coding_pos[l]
end_s = end(grTargetGene[l])
if (start_s <=end_s) {
start_syn = seq(from = start(grTargetGene[l])+3-grTargetGene$coding_pos[l],to = end(grTargetGene[l]), by=3)
end_syn = start_syn
}
} else {
start_ns = end(grTargetGene[l])-1+grTargetGene$coding_pos[l]
end_ns = start(grTargetGene[l])
if (start_ns >=end_ns) {
start_nonsyn = seq(from = end(grTargetGene[l])-1+grTargetGene$coding_pos[l],to = start(grTargetGene[l]), by=-3)
end_nonsyn = seq(from = end(grTargetGene[l])-2+grTargetGene$coding_pos[l],to = start(grTargetGene[l]), by=-3)
}
start_s =end(grTargetGene[l])-3+grTargetGene$coding_pos[l]
end_s = start(grTargetGene[l])
if (start_s >=end_s) {
start_syn = seq(from = end(grTargetGene[l])-3+grTargetGene$coding_pos[l],to = start(grTargetGene[l]), by=-3)
end_syn = start_syn
}
}
if (exists("start_nonsyn")) {
length_nonsyn = length(start_nonsyn)+ length(end_nonsyn)
gr_nonsyn = GRanges(
seqnames = rep(seqnames(grTargetGene[l]), length_nonsyn),
strand = rep(strand(grTargetGene[l]), length_nonsyn),
ranges = IRanges(start = c(start_nonsyn, end_nonsyn), end = c(start_nonsyn, end_nonsyn))
)
gr_nonsyn = intersect(gr_nonsyn,grTargetGene[l])
grTargetGene_N = append(grTargetGene_N, gr_nonsyn)
}
if (exists("start_syn")) {
length_syn = length(start_syn)
gr_syn = GRanges(
seqnames = rep(seqnames(grTargetGene[l]), length_syn),
strand = rep(strand(grTargetGene[l]), length_syn),
ranges = IRanges(start = start_syn, end = end_syn)
)
gr_syn = intersect(gr_syn,grTargetGene[l])
grTargetGene_S = append(grTargetGene_S, gr_syn)
}
}
return(list("grTargetGene_S"=grTargetGene_S,"grTargetGene_N"=grTargetGene_N))
}
It works nicely:
> CodonPosition_separation(grTargetGene)
$grTargetGene_S
GRanges object with 514 ranges and 0 metadata columns:
seqnames ranges strand
<Rle> <IRanges> <Rle>
[1] chr2 [148602724, 148602724] +
[2] chr2 [148602727, 148602727] +
[3] chr2 [148602730, 148602730] +
[4] chr2 [148602733, 148602733] +
[5] chr2 [148602736, 148602736] +
... ... ... ...
[510] chr2 [148684831, 148684831] +
[511] chr2 [148684834, 148684834] +
[512] chr2 [148684837, 148684837] +
[513] chr2 [148684840, 148684840] +
[514] chr2 [148684843, 148684843] +
-------
seqinfo: 1 sequence from an unspecified genome; no seqlengths
$grTargetGene_N
GRanges object with 517 ranges and 0 metadata columns:
seqnames ranges strand
<Rle> <IRanges> <Rle>
[1] chr2 [148602722, 148602723] +
[2] chr2 [148602725, 148602726] +
[3] chr2 [148602728, 148602729] +
[4] chr2 [148602731, 148602732] +
[5] chr2 [148602734, 148602735] +
... ... ... ...
[513] chr2 [148684829, 148684830] +
[514] chr2 [148684832, 148684833] +
[515] chr2 [148684835, 148684836] +
[516] chr2 [148684838, 148684839] +
[517] chr2 [148684841, 148684842] +
-------
seqinfo: 1 sequence from an unspecified genome; no seqlengths
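For comparison, the same bookkeeping can be sketched without loops in base R. This is only a sketch under stated assumptions: plus strand only, exons given in transcription order, and the coding sequence starting at the first base of the first exon (as coding_pos = 1 for the first exon suggests). The helper name codon_positions is made up, and it returns single positions rather than GRanges:

```r
# Hypothetical helper: label every base of plus-strand exons with its
# codon position (1, 2 or 3).
codon_positions <- function(starts, widths) {
  # Number of coding bases that precede each exon (0 for the first exon).
  prev_len <- c(0, cumsum(widths)[-length(widths)])
  # Genomic coordinate of every exonic base, exon by exon.
  pos <- unlist(Map(function(s, w) seq(s, length.out = w), starts, widths))
  # 0-based index of every base within the concatenated coding sequence.
  cds_index <- unlist(Map(function(p, w) p + seq_len(w) - 1, prev_len, widths))
  data.frame(pos = pos, codon_pos = cds_index %% 3 + 1)
}

# First two exons from the dput() output in the question.
cp <- codon_positions(starts = c(148602722, 148653870), widths = c(55, 208))

nonsyn_pos <- cp$pos[cp$codon_pos %in% c(1, 2)]  # codon positions 1 and 2
syn_pos    <- cp$pos[cp$codon_pos == 3]          # codon position 3
print(head(syn_pos, 4))
```

The resulting position vectors can then be wrapped as width-1 ranges for the overlap queries mentioned in the question.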

Find overlapping ranges based on positions in R

I have two datasets:
chr1 25 85
chr1 2000 3000
chr2 345 2300
and the 2nd,
chr1 34 45 1.2
chr1 100 1000
chr2 456 1500 1.3
This is my desired output,
chr1 25 85 1.2
chr2 345 2300 1.3
Below is my code:
sb <- NULL
rangesC <- NULL
sb$bin <- NULL
for(i in levels(df1$V1)){
s <- subset(df1, df1$V1 == i)
sb <- subset(df2, df2$V1 == i)
for(j in 1:nrow(sb)){
sb$bin[j] <-s$V4[(s$V2 <= sb$V2[j] & s$V3 >= sb$V3[j])]
}
rangesC <- try(rbind(rangesC, sb),silent = TRUE)
}
The error I get is "replacement has length zero"; or, when I use as.character, rangesC is empty.
I would like to get the V4 corresponding if the positions overlap. What is going wrong?
The foverlaps() function from the data.table package does an overlap join of two data.tables:
library(data.table)
setDT(df1, key = names(df1))
setDT(df2, key = key(df1))
foverlaps(df2, df1, nomatch = 0L)[, -c("i.V2", "i.V3")]
V1 V2 V3 V4
1: chr1 25 85 1.2
2: chr2 345 2300 1.3
Data
library(data.table)
df1 <- fread(
"chr1 25 85
chr1 2000 3000
chr2 345 2300", header = FALSE
)
df2 <- fread(
"chr1 34 45 1.2
chr1 100 1000
chr2 456 1500 1.3", header = FALSE
)
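As for what goes wrong in the loop: one likely cause of "replacement has length zero" is that the logical index selects no row for some j, so there is nothing to assign. A guarded base R sketch of the same lookup, assuming that the first dataset holds the wider ranges and the second one the V4 values (the missing V4 value of the middle row is not given in the question, so NA is used as a placeholder):

```r
df1 <- data.frame(V1 = c("chr1", "chr1", "chr2"),
                  V2 = c(25, 2000, 345),
                  V3 = c(85, 3000, 2300))
# The V4 value of the middle row is not shown in the question; NA is a placeholder.
df2 <- data.frame(V1 = c("chr1", "chr1", "chr2"),
                  V2 = c(34, 100, 456),
                  V3 = c(45, 1000, 1500),
                  V4 = c(1.2, NA, 1.3))

# For each df1 range, look up df2 ranges on the same chromosome lying within it.
df1$V4 <- NA_real_
for (j in seq_len(nrow(df1))) {
  inside <- df2$V1 == df1$V1[j] & df2$V2 >= df1$V2[j] & df2$V3 <= df1$V3[j]
  hit <- df2$V4[inside]
  # Assign only when something matched (first match only); assigning an
  # empty `hit` is what raises "replacement has length zero".
  if (length(hit) > 0) df1$V4[j] <- hit[1]
}

res <- df1[!is.na(df1$V4), ]
print(res)
```

Unlike foverlaps() with its default settings, this keeps only df2 ranges that lie entirely within a df1 range, and only the first such match per row.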

How to sort a data frame by user-defined (e.g. non-alphabetic) order

Given a data frame dna
> dna
chrom start
chr2 39482
chr1 203918
chr1 198282
chrX 7839028
chr17 3874
The following code reorders dna by $chrom in alphabetical ascending order and by $start in numerical ascending order:
> dna <- dna[with(dna, order(chrom, start)), ]
> dna
chrom start
chr1 198282
chr1 203918
chr17 3874
chr2 39482
chrX 7839028
However, I would like to be able to have $chrom ordered as follows (simplified for the sake of my example here):
chrom_order <- c("chr1","chr2", "chr17", "chrX")
I am not allowed to rename stuff, for example chr1 to chr01.
You need to specify the levels in factor and then use order with indexing:
zz <- "chrom start
chr2 39482
chr1 203918
chr1 198282
chrX 7839028
chr17 3874"
Data <- read.table(text=zz, header = TRUE)
library(Hmisc)
library(gdata)
Data$chrom <- reorder.factor(Data$chrom , levels = c("chr1","chr2", "chr17", "chrX"))
Data[order(Data$chrom), ]
chrom start
2 chr1 203918
3 chr1 198282
1 chr2 39482
5 chr17 3874
4 chrX 7839028
or you can use this:
> Data$chrom <- factor(Data$chrom, levels = c("chr1","chr2", "chr17", "chrX"))
> Data[order(Data$chrom), ]
chrom start
2 chr1 203918
3 chr1 198282
1 chr2 39482
5 chr17 3874
4 chrX 7839028
or use this:
> Data$chrom <- reorder(Data$chrom, new.order=c("chr1","chr2", "chr17", "chrX"))
> Data[order(Data$chrom), ]
Try this:
dna <- structure(list(chrom = structure(c(2L, 1L, 1L, 4L, 3L), .Label = c("chr1",
"chr2", "chr17", "chrX"), class = c("ordered", "factor")), start = c(39482L,
203918L, 198282L, 7839028L, 3874L)), .Names = c("chrom", "start"
), row.names = c(NA, -5L), class = "data.frame")
chrom_order <- c("chr1","chr2", "chr17", "chrX")
# Make chrom column ordered. Second term defines the order
dna$chrom <- ordered(dna$chrom, chrom_order)
dna[with(dna, order(chrom, start)),]
chrom start
3 chr1 198282
2 chr1 203918
1 chr2 39482
5 chr17 3874
4 chrX 7839028
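If you would rather not change the type of the chrom column, a base R alternative is to sort on each value's rank in chrom_order using match(). A minimal sketch with the data from the question (dna_sorted is just an illustrative name):

```r
dna <- data.frame(
  chrom = c("chr2", "chr1", "chr1", "chrX", "chr17"),
  start = c(39482, 203918, 198282, 7839028, 3874)
)
chrom_order <- c("chr1", "chr2", "chr17", "chrX")

# match() gives each chromosome's rank in chrom_order; ties are broken by start.
dna_sorted <- dna[order(match(dna$chrom, chrom_order), dna$start), ]
print(dna_sorted)
```

Chromosomes that do not appear in chrom_order get a rank of NA, which order() places last by default.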

Resources