Through image recognition and segmentation, I have already obtained an abstract representation of the plants in a field (i.e. I know the exact coordinates of every plant in an image).
Now I want to detect the crop rows in this abstract representation, and I can't quite figure out how.
My problems are:
1. The rows in the images may be slightly rotated rather than exactly north/south (angles vary between -10° and +10°).
2. The number of crop rows is not fixed: it varies from image to image and is unknown before processing.
3. The rotation of the crop rows may differ slightly in each processed image.
4. I have hundreds of images/representations to process (so doing it by hand is obviously not feasible :-) ), so I need an algorithm that I can later put into a loop.
Could you help me with at least a strategy (or code snippets) for this crop-row detection? Ideally, I would end up with the parameters of a linear equation (y = m*x + t) for each crop row, so that abline() can be used, but I am open to anything. The result could look something like this (drawn by hand, purely for illustration):
[image: crop rows drawn by hand over the plant positions]
The underlying data (a 224 × 2 matrix of plant coordinates, as printed by R's dput) is:
structure(c(5278.072, 2632.564, 393.34, 4057.704, 3805.599, 611.269,
1823.835, 3359.069, 3598.284, 5262.873, 2069.963, 1579.745, 4539.584,
3579.977, 4296.46, 1831.153, 2333.835, 1126.639, 152.948, 4030.205,
3368.738, 2066.733, 855.111, 2579.665, 3092.37, 1318.357, 1109.438,
3578.606, 375.756, 3796.788, 4520.064, 1807.36, 5001.773, 87.272,
4033.594, 836.708, 639.13, 3105.628, 1569.256, 2831.851, 826.444,
3557.598, 1078.643, 576.266, 4789.585, 3091.929, 5239.658, 1099.954,
1807.972, 2534.677, 4271.841, 5019.276, 2053.246, 1536.071, 3347.644,
4019.766, 3793.392, 5257.628, 604.323, 2561.307, 1792.665, 884.25,
109.456, 3066.108, 3750.833, 4511.819, 2815.08, 119.468, 4499.801,
2582.512, 2822.354, 3773.842, 1054.719, 4251.171, 4002.476, 2018.277,
1775.284, 4959.269, 2541.009, 4742.312, 2265.149, 3071.313, 1779.218,
3972.64, 2822.409, 5217.848, 1265.449, 1522.899, 3057.732, 5364.729,
346.341, 4226.012, 3287.299, 1767.18, 3991.963, 1811.498, 2785.251,
4488.214, 822.509, 2016.435, 3022.344, 2528.079, 4470.315, 3017.716,
572.771, 97.748, 5168.119, 4199.643, 2006.285, 3946.505, 2771.626,
3495.94, 1745.531, 3734.241, 3265.819, 4963.116, 1058.788, 300.408,
1252.845, 4453, 5411.107, 2768.93, 557.806, 2004.424, 2218.582,
4214.073, 4698.292, 5149.238, 4953.886, 1238.343, 3502.518, 2753.044,
5417.502, 1031.945, 2518.901, 1483.487, 4450.737, 2258.484, 289.261,
2987.945, 5156.371, 4171.407, 1995.901, 781.96, 3918.94, 1974.667,
316.758, 1470.993, 5160.868, 3237.828, 521.251, 787.228, 1039.416,
1202.261, 3456.837, 4148.167, 2200.492, 2720.912, 4915.451, 3902.744,
4435.419, 1209.418, 1471.057, 4641.269, 3913.51, 5412.672, 1953.878,
2220.277, 4911.249, 1006.368, 2974.173, 4410.827, 1688.391, 293.729,
1462.871, 4618.785, 5150.904, 2689.061, 1952.56, 5389.383, 2176.387,
995.073, 4125.245, 498.978, 5137.266, 5358.118, 1444.34, 1674.431,
2689.288, 2465.351, 4566.352, 765.125, 1196.984, 1687.859, 258.247,
1914.911, 4575.408, 3421.147, 495.879, 979.079, 1922.943, 4097.704,
737.439, 3410.562, 234.74, 2159.697, 471.983, 1418.991, 2440.575,
1942.708, 1162.525, 5312.409, 2162.656, 5059.814, 1411.412, 4558.905,
247.618, 4319.106, 3411.827, 1786.69, 1670.462, 1180.524, 1640.636,
4715.993, 3576.548, 3566.57, 3589.872, 3565.564, 3531.571, 3415.178,
3511.07, 3510.051, 3487.762, 3470.791, 3443.062, 3369.329, 3386.999,
3387.786, 3277.473, 3376.266, 3421.932, 3387.869, 3367.994, 3346.403,
3259.785, 3296.081, 3297.633, 3285.163, 3300.119, 2941.504, 3264.344,
3277.9, 3235.499, 3198.869, 3235.508, 3156.907, 3221.313, 3123.96,
3165.979, 3186.806, 3148.158, 3129.906, 3035.963, 2987.899, 3053.684,
3050.107, 3052.643, 3037.767, 3037.525, 2994.456, 3006.454, 2960.606,
2973.443, 2919.843, 2917.246, 2939.87, 2914.804, 2886.588, 2920.769,
2906.616, 2908.866, 2868.052, 2885.769, 2860.088, 2801.168, 2853.439,
2853.863, 2847.141, 2805.677, 2806.183, 2718.094, 2661.652, 2695.19,
2656.518, 2612.372, 2603.286, 2602.449, 2591.63, 2595.714, 2593.287,
2575.333, 2572.15, 2476.559, 2435.917, 2538.626, 2514.215, 2458.875,
2477.5, 2385.366, 2421.47, 2220.899, 2397.842, 2396.848, 2393.501,
2352.039, 2292.429, 2315.84, 2328.682, 2256.508, 2236.925, 2192.809,
2241.279, 2144.107, 2195.016, 2185.86, 2112.28, 2098.085, 2020.843,
1971.232, 1979.691, 1968.859, 1943.755, 1974.743, 1891.801, 1944.186,
1951.423, 1872.022, 1928.441, 1880.504, 1912.82, 1893.822, 1878.889,
1850.38, 1834.762, 1851.886, 1806.117, 1776.713, 1682.26, 1733.805,
1714.941, 1700.778, 1686.258, 1703.367, 1549.601, 1682.525, 1563.277,
1632.103, 1609.4, 1621.888, 1587.126, 1545.346, 1537.933, 1542.424,
1366.974, 1494.822, 1498.618, 1494.055, 1450.098, 1407.89, 1345.613,
1388.68, 1380.527, 1368.772, 1372.391, 1161.35, 1297.577, 1312.849,
1304.972, 1286.721, 1292.485, 1257.53, 1241.146, 1263.164, 1217.146,
1226.615, 993.046, 1166.837, 1112.254, 1072.249, 1117.723, 1061.758,
1098.207, 1084.597, 1059.916, 1059.685, 1063.814, 1054.735, 944.2,
982.653, 963.989, 969.55, 941.066, 907.014, 930.988, 776.849,
877.918, 889.259, 805.872, 831.361, 803.752, 786.654, 791.649,
814.271, 794.444, 776.833, 694.969, 664.718, 653.238, 661.703,
652.696, 655.997, 637.118, 539.101, 555.694, 491.482, 459.712,
453.73, 490.567, 391.441, 409.506, 319.697, 391.505, 390.46,
308.658, 310.59, 285.799, 268.86, 245.89, 195.933, 243.418, 214.203,
172.129, 173.754, 191.456, 194.795, 98.098, 99.4479999999999,
62.1419999999998), .Dim = c(224L, 2L))
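The `structure(c(...), .Dim = c(224L, 2L))` output above is an R matrix stored column by column, so the first 224 numbers form column 1 and the next 224 form column 2. If you want to try the ideas below outside R, here is a minimal Python sketch of the conversion; it assumes column 1 is x and column 2 is y, and `dput_matrix_to_points` is a hypothetical helper name.

```python
def dput_matrix_to_points(r_values, nrow):
    """Turn the values of an R nrow x 2 matrix (as printed by dput) into (x, y) pairs.

    R stores matrices in column-major order: the first nrow values are
    column 1 (assumed x) and the next nrow values are column 2 (assumed y).
    """
    if len(r_values) != 2 * nrow:
        raise ValueError("expected 2 * nrow values")
    return [(float(r_values[i]), float(r_values[i + nrow])) for i in range(nrow)]


# Tiny illustration with 3 points instead of the 224 above
pts = dput_matrix_to_points([1, 2, 3, 10, 20, 30], nrow=3)
print(pts)  # [(1.0, 10.0), (2.0, 20.0), (3.0, 30.0)]
```

For the real data, paste the 448 numbers inside `c(...)` as a Python list and call it with `nrow=224`.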
Here is something that may help:
For each detected plant, find its closest neighboring plant. Hopefully, more often than not, this is a plant in the same crop row. If the images are known a priori to be roughly north/south oriented, the search should favor the vertical direction when choosing neighbors. One way to do that is to redefine "distance" for the nearest-neighbor search as something anisotropic, such as
distance = 10 * (x0 - x1)² + (y0 - y1)²
Here is a plot of the result, with a line segment drawn between each plant and its nearest neighbor:
[plot: each plant joined to its anisotropic nearest neighbor]
It's not perfect, but it could be a useful start: in most crop rows, at least one run of four or more plants is correctly chained together.
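A minimal Python sketch of the anisotropic nearest-neighbor step, using brute force and the weight 10 from the formula above (a starting value, not a tuned one). `anisotropic_nearest_neighbors` is a hypothetical name; each pair `(i, nearest[i])` is one line segment to draw.

```python
import math


def anisotropic_nearest_neighbors(points, x_weight=10.0):
    """Index of each plant's nearest neighbor under the anisotropic distance
    d = x_weight * (x0 - x1)**2 + (y0 - y1)**2.

    x_weight > 1 makes horizontal offsets expensive, so the search prefers
    plants above or below, i.e. in the same roughly north/south row.
    Brute force, O(n^2): fine for a few hundred plants per image.
    On ties the lower index wins.
    """
    nearest = []
    for i, (x0, y0) in enumerate(points):
        best_j, best_d = -1, math.inf
        for j, (x1, y1) in enumerate(points):
            if i == j:
                continue
            d = x_weight * (x0 - x1) ** 2 + (y0 - y1) ** 2
            if d < best_d:
                best_j, best_d = j, d
        nearest.append(best_j)
    return nearest


# Two vertical rows 50 units apart, plants 100 units apart within a row
pts = [(0, 0), (0, 100), (0, 200), (50, 0), (50, 100), (50, 200)]
print(anisotropic_nearest_neighbors(pts))                # [1, 0, 1, 4, 3, 4]
print(anisotropic_nearest_neighbors(pts, x_weight=1.0))  # [3, 4, 5, 0, 1, 2]
```

The second call shows why the weight matters: with plain Euclidean distance, every plant links to its neighbor in the other row, because the rows are closer together than the plants within a row.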
A thought on a possible strategy from here:
1. Identify the connected components, i.e. the "chains" of plants.
2. For each chain, fit a line by least squares. Better yet, use the RANSAC algorithm, so that the fit robustly ignores a single stray plant in an otherwise collinear chain.
3. Again using the rough north/south orientation, consider a fitted line "valid" only if it is close enough to vertical. If it is valid, find all plants that lie close to the line. If many plants are close, the line is likely a crop row.
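The three steps above, sketched in Python under some assumptions. `nearest` is the list of nearest-neighbor indices from the step described earlier. Each row is modeled as x = k·y + c rather than y = m·x + t, because a nearly vertical row has a huge, unstable m. Distances to a row are measured horizontally, which for tilts within ±10° differs from the perpendicular distance by under 2%. The tolerance `tol`, `min_support`, and the rule that drops a row sharing more than half its plants with an already accepted row (several chains can lie on the same row) are hypothetical choices to tune on real images; all function names are illustrative.

```python
import math
import random


def chains_from_neighbors(nearest):
    """Connected components ("chains") of the graph with edges i -> nearest[i]."""
    parent = list(range(len(nearest)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in enumerate(nearest):
        parent[find(i)] = find(j)
    groups = {}
    for i in range(len(nearest)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())


def fit_x_on_y(pts):
    """Least squares for x = k*y + c; x as a function of y stays well
    conditioned for near-vertical rows. Returns None if all y are equal."""
    n = len(pts)
    mx = sum(x for x, _ in pts) / n
    my = sum(y for _, y in pts) / n
    syy = sum((y - my) ** 2 for _, y in pts)
    if syy == 0:
        return None
    k = sum((x - mx) * (y - my) for x, y in pts) / syy
    return k, mx - k * my


def ransac_row(pts, tol, iters=200, seed=0):
    """RANSAC: fit lines through random pairs of plants, keep the line with
    the most plants within `tol` horizontally, then refit on those inliers."""
    rng = random.Random(seed)
    best = []
    for _ in range(iters):
        model = fit_x_on_y(rng.sample(pts, 2))
        if model is None:
            continue
        k, c = model
        inliers = [(x, y) for x, y in pts if abs(x - (k * y + c)) <= tol]
        if len(inliers) > len(best):
            best = inliers
    return fit_x_on_y(best) if len(best) >= 2 else None


def detect_rows(points, nearest, tol=30.0, max_angle_deg=10.0, min_support=4):
    """Return (k, c, plant indices) for each accepted row x = k*y + c."""
    max_slope = math.tan(math.radians(max_angle_deg))
    rows = []
    for chain in chains_from_neighbors(nearest):
        if len(chain) < 3:
            continue  # too short to say anything about direction
        model = ransac_row([points[i] for i in chain], tol)
        if model is None:
            continue
        k, c = model
        if abs(k) > max_slope:
            continue  # not close enough to north/south
        support = [i for i, (x, y) in enumerate(points) if abs(x - (k * y + c)) <= tol]
        if len(support) < min_support:
            continue
        if any(len(set(support) & set(r[2])) > len(support) / 2 for r in rows):
            continue  # same row already found via another chain
        rows.append((k, c, support))
    return rows


def to_abline(k, c):
    """Rewrite x = k*y + c as arguments for R's abline(a = intercept, b = slope)."""
    if k == 0:
        return {"v": c}  # exactly vertical: use abline(v = c)
    return {"a": -c / k, "b": 1 / k}


# Two tilted rows (x = 0.1*y and x = 300 + 0.1*y) plus one stray plant
points = [(0, 0), (10, 100), (20, 200), (30, 300), (40, 400), (50, 500),
          (300, 0), (310, 100), (320, 200), (330, 300), (340, 400), (350, 500),
          (150, 250)]
# Hand-made nearest-neighbor list: each plant points to the next one in its
# row, and the stray plant is (wrongly) chained to plant 2
nearest = [1, 2, 3, 4, 5, 4, 7, 8, 9, 10, 11, 10, 2]
for k, c, support in detect_rows(points, nearest):
    print(round(k, 3), round(c, 1), support, to_abline(k, c))
```

On this toy input it should report two rows with k ≈ 0.1 (about 5.7° off vertical), with RANSAC leaving the stray plant out of the first row even though it sits in the same chain. `to_abline` gives the y = m*x + t form asked for in the question: slope 1/k and intercept -c/k.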
I've just downloaded an XML file from NCBI Nucleotide covering the 5.8S region in Aedes aegypti. As an example, I've pasted the record for the first sample below.
From it, I want to extract:
1. <INSDSeq_accession-version>CH477247.1</INSDSeq_accession-version>
2. <INSDSeq_update-date>23-MAR-2015</INSDSeq_update-date>
3. <INSDSeq_create-date>28-OCT-2005</INSDSeq_create-date>
4. <INSDReference_journal>Submitted (07-OCT-2005) Broad Institute of MIT and Harvard, 320 Charles Street, Cambridge, MA 02141, USA</INSDReference_journal>
As I said, this is only an excerpt of what I actually downloaded (13 records, from https://www.ncbi.nlm.nih.gov/nuccore/?term=aedes+aegypti+5.8). Is there a way to extract these fields for all of the records?
I'm familiar with R, but which platform is best suited for this?
<INSDSeq_locus>CH477247</INSDSeq_locus>
<INSDSeq_length>3065330</INSDSeq_length>
<INSDSeq_strandedness>double</INSDSeq_strandedness>
<INSDSeq_moltype>DNA</INSDSeq_moltype>
<INSDSeq_topology>linear</INSDSeq_topology>
<INSDSeq_division>CON</INSDSeq_division>
<INSDSeq_update-date>23-MAR-2015</INSDSeq_update-date>
<INSDSeq_create-date>28-OCT-2005</INSDSeq_create-date>
<INSDSeq_definition>Aedes aegypti strain Liverpool supercont1.62 genomic scaffold, whole genome shotgun sequence</INSDSeq_definition>
<INSDSeq_primary-accession>CH477247</INSDSeq_primary-accession>
<INSDSeq_accession-version>CH477247.1</INSDSeq_accession-version>
<INSDSeq_other-seqids>
<INSDSeqid>gnl|WGS:AAGE|supercont1.62</INSDSeqid>
<INSDSeqid>gb|CH477247.1|</INSDSeqid>
<INSDSeqid>gi|78216626</INSDSeqid>
</INSDSeq_other-seqids>
<INSDSeq_project>PRJNA12434</INSDSeq_project>
<INSDSeq_keywords>
<INSDKeyword>WGS</INSDKeyword>
</INSDSeq_keywords>
<INSDSeq_source>Aedes aegypti (yellow fever mosquito)</INSDSeq_source>
<INSDSeq_organism>Aedes aegypti</INSDSeq_organism>
<INSDSeq_taxonomy>Eukaryota; Metazoa; Ecdysozoa; Arthropoda; Hexapoda; Insecta; Pterygota; Neoptera; Holometabola; Diptera; Nematocera; Culicoidea; Culicidae; Culicinae; Aedini; Aedes; Stegomyia</INSDSeq_taxonomy>
<INSDSeq_references>
<INSDReference>
<INSDReference_reference>1</INSDReference_reference>
<INSDReference_position>1..3065330</INSDReference_position>
<INSDReference_authors>
<INSDAuthor>Nene,V.</INSDAuthor>
<INSDAuthor>Wortman,J.R.</INSDAuthor>
<INSDAuthor>Lawson,D.</INSDAuthor>
<INSDAuthor>Haas,B.</INSDAuthor>
<INSDAuthor>Kodira,C.</INSDAuthor>
<INSDAuthor>Tu,Z.J.</INSDAuthor>
<INSDAuthor>Loftus,B.</INSDAuthor>
<INSDAuthor>Xi,Z.</INSDAuthor>
<INSDAuthor>Megy,K.</INSDAuthor>
<INSDAuthor>Grabherr,M.</INSDAuthor>
<INSDAuthor>Ren,Q.</INSDAuthor>
<INSDAuthor>Zdobnov,E.M.</INSDAuthor>
<INSDAuthor>Lobo,N.F.</INSDAuthor>
<INSDAuthor>Campbell,K.S.</INSDAuthor>
<INSDAuthor>Brown,S.E.</INSDAuthor>
<INSDAuthor>Bonaldo,M.F.</INSDAuthor>
<INSDAuthor>Zhu,J.</INSDAuthor>
<INSDAuthor>Sinkins,S.P.</INSDAuthor>
<INSDAuthor>Hogenkamp,D.G.</INSDAuthor>
<INSDAuthor>Amedeo,P.</INSDAuthor>
<INSDAuthor>Arensburger,P.</INSDAuthor>
<INSDAuthor>Atkinson,P.W.</INSDAuthor>
<INSDAuthor>Bidwell,S.</INSDAuthor>
<INSDAuthor>Biedler,J.</INSDAuthor>
<INSDAuthor>Birney,E.</INSDAuthor>
<INSDAuthor>Bruggner,R.V.</INSDAuthor>
<INSDAuthor>Costas,J.</INSDAuthor>
<INSDAuthor>Coy,M.R.</INSDAuthor>
<INSDAuthor>Crabtree,J.</INSDAuthor>
<INSDAuthor>Crawford,M.</INSDAuthor>
<INSDAuthor>Debruyn,B.</INSDAuthor>
<INSDAuthor>Decaprio,D.</INSDAuthor>
<INSDAuthor>Eiglmeier,K.</INSDAuthor>
<INSDAuthor>Eisenstadt,E.</INSDAuthor>
<INSDAuthor>El-Dorry,H.</INSDAuthor>
<INSDAuthor>Gelbart,W.M.</INSDAuthor>
<INSDAuthor>Gomes,S.L.</INSDAuthor>
<INSDAuthor>Hammond,M.</INSDAuthor>
<INSDAuthor>Hannick,L.I.</INSDAuthor>
<INSDAuthor>Hogan,J.R.</INSDAuthor>
<INSDAuthor>Holmes,M.H.</INSDAuthor>
<INSDAuthor>Jaffe,D.</INSDAuthor>
<INSDAuthor>Johnston,J.S.</INSDAuthor>
<INSDAuthor>Kennedy,R.C.</INSDAuthor>
<INSDAuthor>Koo,H.</INSDAuthor>
<INSDAuthor>Kravitz,S.</INSDAuthor>
<INSDAuthor>Kriventseva,E.V.</INSDAuthor>
<INSDAuthor>Kulp,D.</INSDAuthor>
<INSDAuthor>Labutti,K.</INSDAuthor>
<INSDAuthor>Lee,E.</INSDAuthor>
<INSDAuthor>Li,S.</INSDAuthor>
<INSDAuthor>Lovin,D.D.</INSDAuthor>
<INSDAuthor>Mao,C.</INSDAuthor>
<INSDAuthor>Mauceli,E.</INSDAuthor>
<INSDAuthor>Menck,C.F.</INSDAuthor>
<INSDAuthor>Miller,J.R.</INSDAuthor>
<INSDAuthor>Montgomery,P.</INSDAuthor>
<INSDAuthor>Mori,A.</INSDAuthor>
<INSDAuthor>Nascimento,A.L.</INSDAuthor>
<INSDAuthor>Naveira,H.F.</INSDAuthor>
<INSDAuthor>Nusbaum,C.</INSDAuthor>
<INSDAuthor>O'leary,S.</INSDAuthor>
<INSDAuthor>Orvis,J.</INSDAuthor>
<INSDAuthor>Pertea,M.</INSDAuthor>
<INSDAuthor>Quesneville,H.</INSDAuthor>
<INSDAuthor>Reidenbach,K.R.</INSDAuthor>
<INSDAuthor>Rogers,Y.H.</INSDAuthor>
<INSDAuthor>Roth,C.W.</INSDAuthor>
<INSDAuthor>Schneider,J.R.</INSDAuthor>
<INSDAuthor>Schatz,M.</INSDAuthor>
<INSDAuthor>Shumway,M.</INSDAuthor>
<INSDAuthor>Stanke,M.</INSDAuthor>
<INSDAuthor>Stinson,E.O.</INSDAuthor>
<INSDAuthor>Tubio,J.M.</INSDAuthor>
<INSDAuthor>Vanzee,J.P.</INSDAuthor>
<INSDAuthor>Verjovski-Almeida,S.</INSDAuthor>
<INSDAuthor>Werner,D.</INSDAuthor>
<INSDAuthor>White,O.</INSDAuthor>
<INSDAuthor>Wyder,S.</INSDAuthor>
<INSDAuthor>Zeng,Q.</INSDAuthor>
<INSDAuthor>Zhao,Q.</INSDAuthor>
<INSDAuthor>Zhao,Y.</INSDAuthor>
<INSDAuthor>Hill,C.A.</INSDAuthor>
<INSDAuthor>Raikhel,A.S.</INSDAuthor>
<INSDAuthor>Soares,M.B.</INSDAuthor>
<INSDAuthor>Knudson,D.L.</INSDAuthor>
<INSDAuthor>Lee,N.H.</INSDAuthor>
<INSDAuthor>Galagan,J.</INSDAuthor>
<INSDAuthor>Salzberg,S.L.</INSDAuthor>
<INSDAuthor>Paulsen,I.T.</INSDAuthor>
<INSDAuthor>Dimopoulos,G.</INSDAuthor>
<INSDAuthor>Collins,F.H.</INSDAuthor>
<INSDAuthor>Birren,B.</INSDAuthor>
<INSDAuthor>Fraser-Liggett,C.M.</INSDAuthor>
<INSDAuthor>Severson,D.W.</INSDAuthor>
</INSDReference_authors>
<INSDReference_title>Genome sequence of Aedes aegypti, a major arbovirus vector</INSDReference_title>
<INSDReference_journal>Science 316 (5832), 1718-1723 (2007)</INSDReference_journal>
<INSDReference_xref>
<INSDXref>
<INSDXref_dbname>doi</INSDXref_dbname>
<INSDXref_id>10.1126/science.1138878</INSDXref_id>
</INSDXref>
</INSDReference_xref>
<INSDReference_pubmed>17510324</INSDReference_pubmed>
</INSDReference>
<INSDReference>
<INSDReference_reference>2</INSDReference_reference>
<INSDReference_position>1..3065330</INSDReference_position>
<INSDReference_authors>
<INSDAuthor>Galagan,J.</INSDAuthor>
<INSDAuthor>Devon,K.</INSDAuthor>
<INSDAuthor>Henn,M.R.</INSDAuthor>
<INSDAuthor>Severson,D.W.</INSDAuthor>
<INSDAuthor>Collins,F.</INSDAuthor>
<INSDAuthor>Jaffe,D.</INSDAuthor>
<INSDAuthor>Rounsley,S.</INSDAuthor>
<INSDAuthor>DeCaprio,D.</INSDAuthor>
<INSDAuthor>Kodira,C.</INSDAuthor>
<INSDAuthor>Lander,E.</INSDAuthor>
<INSDAuthor>Crawford,M.</INSDAuthor>
<INSDAuthor>Butler,J.</INSDAuthor>
<INSDAuthor>Alvarez,P.</INSDAuthor>
<INSDAuthor>Gnerre,S.</INSDAuthor>
<INSDAuthor>Grabherr,M.</INSDAuthor>
<INSDAuthor>Kleber,M.</INSDAuthor>
<INSDAuthor>Mauceli,E.</INSDAuthor>
<INSDAuthor>Brockman,W.</INSDAuthor>
<INSDAuthor>Young,S.</INSDAuthor>
<INSDAuthor>LaButti,K.</INSDAuthor>
<INSDAuthor>Pushparaj,V.</INSDAuthor>
<INSDAuthor>Koehrsen,M.</INSDAuthor>
<INSDAuthor>Engels,R.</INSDAuthor>
<INSDAuthor>Montgomery,P.</INSDAuthor>
<INSDAuthor>Pearson,M.</INSDAuthor>
<INSDAuthor>Howarth,C.</INSDAuthor>
<INSDAuthor>Zeng,Q.</INSDAuthor>
<INSDAuthor>Yandava,C.</INSDAuthor>
<INSDAuthor>Oleary,S.</INSDAuthor>
<INSDAuthor>Alvarado,L.</INSDAuthor>
<INSDAuthor>Nusbaum,C.</INSDAuthor>
<INSDAuthor>Birren,B.</INSDAuthor>
</INSDReference_authors>
<INSDReference_consortium>The Broad Institute Genome Sequencing Platform</INSDReference_consortium>
<INSDReference_title>Direct Submission</INSDReference_title>
<INSDReference_journal>Submitted (07-OCT-2005) Broad Institute of MIT and Harvard, 320 Charles Street, Cambridge, MA 02141, USA</INSDReference_journal>
</INSDReference>
<INSDReference>
<INSDReference_reference>3</INSDReference_reference>
<INSDReference_position>1..3065330</INSDReference_position>
<INSDReference_authors>
<INSDAuthor>Loftus,B.J.</INSDAuthor>
<INSDAuthor>Nene,V.M.</INSDAuthor>
<INSDAuthor>Hannick,L.I.</INSDAuthor>
<INSDAuthor>Bidwell,S.</INSDAuthor>
<INSDAuthor>Haas,B.</INSDAuthor>
<INSDAuthor>Amedeo,P.</INSDAuthor>
<INSDAuthor>Orvis,J.</INSDAuthor>
<INSDAuthor>Wortman,J.R.</INSDAuthor>
<INSDAuthor>White,O.R.</INSDAuthor>
<INSDAuthor>Salzberg,S.</INSDAuthor>
<INSDAuthor>Shumway,M.</INSDAuthor>
<INSDAuthor>Koo,H.</INSDAuthor>
<INSDAuthor>Zhao,Y.</INSDAuthor>
<INSDAuthor>Holmes,M.</INSDAuthor>
<INSDAuthor>Miller,J.</INSDAuthor>
<INSDAuthor>Schatz,M.</INSDAuthor>
<INSDAuthor>Pop,M.</INSDAuthor>
<INSDAuthor>Pai,G.</INSDAuthor>
<INSDAuthor>Utterback,T.</INSDAuthor>
<INSDAuthor>Rogers,Y.-H.</INSDAuthor>
<INSDAuthor>Kravitz,S.</INSDAuthor>
<INSDAuthor>Fraser,C.M.</INSDAuthor>
</INSDReference_authors>
<INSDReference_title>Direct Submission</INSDReference_title>
<INSDReference_journal>Submitted (07-OCT-2005) The Institute for Genomic Research, 9712 Medical Center Drive, Rockville, MD 20850, USA</INSDReference_journal>
</INSDReference>
<INSDReference>
<INSDReference_reference>4</INSDReference_reference>
<INSDReference_position>1..3065330</INSDReference_position>
<INSDReference_consortium>VectorBase</INSDReference_consortium>
<INSDReference_title>Direct Submission</INSDReference_title>
<INSDReference_journal>Submitted (05-SEP-2012) VectorBase / Ensembl, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK</INSDReference_journal>
<INSDReference_remark>Annotation update by submitter</INSDReference_remark>
</INSDReference>
</INSDSeq_references>
<INSDSeq_comment>The sequence for this assembly was produced jointly by The Broad Institute of Harvard/MIT and The Institute for Genomic Research. The assembly represents 7.6X sequence coverage of the genome and the total length of the contigs is 1.31 Gb. Additional information about the Aedes aegypti sequencing project and assembly can be found at http://www.broad.mit.edu/annotation/disease_vector/aedes_aegypti/ and http://www.tigr.org/msc/aedes/aedes.shtml. Long-term curation of the sequence and subsequent annotation updates will be the responsibility of VectorBase at http://www.vectorbase.org.~Annotation was updated by VectorBase in Sept 2012.</INSDSeq_comment>
<INSDSeq_feature-table>
<INSDFeature>
<INSDFeature_key>source</INSDFeature_key>
<INSDFeature_location>1..3065330</INSDFeature_location>
<INSDFeature_intervals>
<INSDInterval>
<INSDInterval_from>1</INSDInterval_from>
<INSDInterval_to>3065330</INSDInterval_to>
<INSDInterval_accession>CH477247.1</INSDInterval_accession>
</INSDInterval>
</INSDFeature_intervals>
<INSDFeature_quals>
<INSDQualifier>
<INSDQualifier_name>organism</INSDQualifier_name>
<INSDQualifier_value>Aedes aegypti</INSDQualifier_value>
</INSDQualifier>
<INSDQualifier>
<INSDQualifier_name>mol_type</INSDQualifier_name>
<INSDQualifier_value>genomic DNA</INSDQualifier_value>
</INSDQualifier>
<INSDQualifier>
<INSDQualifier_name>strain</INSDQualifier_name>
<INSDQualifier_value>Liverpool</INSDQualifier_value>
</INSDQualifier>
<INSDQualifier>
<INSDQualifier_name>db_xref</INSDQualifier_name>
<INSDQualifier_value>taxon:7159</INSDQualifier_value>
</INSDQualifier>
<INSDQualifier>
<INSDQualifier_name>chromosome</INSDQualifier_name>
<INSDQualifier_value>2</INSDQualifier_value>
</INSDQualifier>
</INSDFeature_quals>
</INSDFeature>
</INSDSeq_feature-table>
<INSDSeq_contig>join(AAGE02003964.1:1..7226,gap(unk100),AAGE02003965.1:1..6376,gap(unk100),AAGE02003966.1:1..16236,gap(4301),AAGE02003967.1:1..174188,gap(unk100),AAGE02003968.1:1..24199,gap(1396),AAGE02003969.1:1..104064,gap(29770),AAGE02003970.1:1..12303,gap(56956),AAGE02003971.1:1..2368,gap(12542),AAGE02003972.1:1..29888,gap(1379),AAGE02003973.1:1..98175,gap(unk100),AAGE02003974.1:1..13180,gap(unk100),AAGE02003975.1:1..2872,gap(unk100),AAGE02003976.1:1..18626,gap(unk100),AAGE02003977.1:1..52378,gap(151),AAGE02003978.1:1..153108,gap(901),AAGE02003979.1:1..3583,gap(unk100),AAGE02003980.1:1..32852,gap(unk100),AAGE02003981.1:1..68239,gap(unk100),AAGE02003982.1:1..61056,gap(unk100),AAGE02003983.1:1..21852,gap(unk100),AAGE02003984.1:1..49659,gap(unk100),AAGE02003985.1:1..33070,gap(315),AAGE02003986.1:1..411266,gap(unk100),AAGE02003987.1:1..2985,gap(unk100),AAGE02003988.1:1..38365,gap(159),AAGE02003989.1:1..110697,gap(890),AAGE02003990.1:1..22405,gap(2299),AAGE02003991.1:1..7510,gap(187),AAGE02003992.1:1..447937,gap(263),AAGE02003993.1:1..92770,gap(1409),AAGE02003994.1:1..2258,gap(132),AAGE02003995.1:1..5605,gap(unk100),AAGE02003996.1:1..3451,gap(2717),AAGE02003997.1:1..20215,gap(unk100),AAGE02003998.1:1..35683,gap(514),AAGE02003999.1:1..307288,gap(unk100),AAGE02004000.1:1..71359,gap(433),AAGE02004001.1:1..10550,gap(unk100),AAGE02004002.1:1..289125,gap(unk100),AAGE02004003.1:1..45622,gap(unk100),AAGE02004004.1:1..35927)</INSDSeq_contig>
<INSDSeq_xrefs>
<INSDXref>
<INSDXref_dbname>BioProject</INSDXref_dbname>
<INSDXref_id>PRJNA12434</INSDXref_id>
</INSDXref>
<INSDXref>
<INSDXref_dbname>BioSample</INSDXref_dbname>
<INSDXref_id>SAMN02953616</INSDXref_id>
</INSDXref>
</INSDSeq_xrefs>
Use an XPath expression or a CSS selector, depending on the language and libraries you use.
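A minimal sketch with Python's standard-library `xml.etree.ElementTree`, which supports a limited subset of XPath path expressions. It assumes the downloaded file holds one `<INSDSeq>` element per record, as in the excerpt above, and treats item 4 as every reference titled "Direct Submission", since a record can list several (the one above has Broad, TIGR and VectorBase submissions); pick the one you need from the list. The function name and output keys are hypothetical. In R, the xml2 package supports the same approach with `read_xml()`, `xml_find_all()` and `xml_text()`.

```python
import xml.etree.ElementTree as ET


def extract_records(xml_text):
    """Pull the requested fields out of every <INSDSeq> record."""
    root = ET.fromstring(xml_text)
    records = []
    for seq in root.iter("INSDSeq"):  # also matches if the root itself is INSDSeq
        # Keep only the "Submitted (...)" journal lines of direct submissions
        submissions = [
            ref.findtext("INSDReference_journal", "").strip()
            for ref in seq.findall("INSDSeq_references/INSDReference")
            if ref.findtext("INSDReference_title") == "Direct Submission"
        ]
        records.append({
            "accession_version": seq.findtext("INSDSeq_accession-version"),
            "update_date": seq.findtext("INSDSeq_update-date"),
            "create_date": seq.findtext("INSDSeq_create-date"),
            "submissions": submissions,
        })
    return records


# Abridged version of the record above
sample = """<INSDSet><INSDSeq>
<INSDSeq_update-date>23-MAR-2015</INSDSeq_update-date>
<INSDSeq_create-date>28-OCT-2005</INSDSeq_create-date>
<INSDSeq_accession-version>CH477247.1</INSDSeq_accession-version>
<INSDSeq_references>
<INSDReference>
<INSDReference_title>Direct Submission</INSDReference_title>
<INSDReference_journal>Submitted (07-OCT-2005) Broad Institute of MIT and Harvard, 320 Charles Street, Cambridge, MA 02141, USA</INSDReference_journal>
</INSDReference>
</INSDSeq_references>
</INSDSeq></INSDSet>"""
for rec in extract_records(sample):
    print(rec)
```

For the full 13-record file, read its contents into a string first and loop over the returned dictionaries, or write them out with the `csv` module.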