My XML looks something like this:
<books>
<book id="b1">
<title>Set theory and the continuum problem</title>
<category>Mathematics</category>
<location>
<area>hall1</area>
<case>1</case>
<shelf>2</shelf>
</location>
<description>A lucid, elegant, and complete survey of set theory.</description>
<history>
<borrowed by="m4"/>
<borrowed by="m2" until="2018-04-05"/>
</history>
</book>
<book id="b2">
<title>Computational Complexity</title>
<isbn>978-0201-530-827</isbn>
<category>Computer Science</category>
<location>
<area>hall1</area>
<case>3</case>
<shelf>3</shelf>
</location>
<description>.</description>
</book>
<book id="b3">
<title>To mock a mockingbird</title>
<isbn>1-292-09761-2</isbn>
<category>Logic</category>
<category>Mathematics</category>
<location>
<area>hall1</area>
<case>1</case>
<shelf>3</shelf>
</location>
<description>.</description>
</book>
</books>
Is it possible to count how many books have area='hall1' and case='1'?
I tried this:
count(//books/book[location/area='hall1'])
but I do not know how to add the case='1' restriction as well.
This should work:
count(//books/book[location/area='hall1'][location/case='1'])
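For readers who want to check the count outside an XPath engine, here is a minimal sketch in Python using only the standard library. It is not part of the original answer: the `count_books` helper and the trimmed `SAMPLE` string are illustrative assumptions, and the filtering is done in plain Python rather than with the XPath expression above.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample document from the question (assumption: only the
# elements needed for the filter are kept).
SAMPLE = """
<books>
  <book id="b1"><title>Set theory and the continuum problem</title>
    <location><area>hall1</area><case>1</case><shelf>2</shelf></location></book>
  <book id="b2"><title>Computational Complexity</title>
    <location><area>hall1</area><case>3</case><shelf>3</shelf></location></book>
  <book id="b3"><title>To mock a mockingbird</title>
    <location><area>hall1</area><case>1</case><shelf>3</shelf></location></book>
</books>
"""

def count_books(root, area, case):
    """Count <book> elements whose location has the given area AND case."""
    return sum(
        1
        for book in root.iter("book")
        if book.findtext("location/area") == area
        and book.findtext("location/case") == case
    )

root = ET.fromstring(SAMPLE)
print(count_books(root, "hall1", "1"))  # b1 and b3 match, b2 is in case 3
```

Both conditions are tested on the same book, which is what chaining the two predicates in the XPath expression does.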
** ADDITIONAL QUESTION **
Is it possible to list the titles of all books that have area='hall1' and case='1'?
for $bb in //books/book[location/area='hall1'][location/case='1']
let $n := $bb/title
return <book>{$n}</book>
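The same selection of titles can be sketched in Python with the standard library. This is an illustration under assumptions, not the XQuery from the thread: `matching_titles` is a hypothetical helper, and the sample is a trimmed copy of the question's data.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample document from the question.
SAMPLE = """
<books>
  <book id="b1"><title>Set theory and the continuum problem</title>
    <location><area>hall1</area><case>1</case></location></book>
  <book id="b2"><title>Computational Complexity</title>
    <location><area>hall1</area><case>3</case></location></book>
  <book id="b3"><title>To mock a mockingbird</title>
    <location><area>hall1</area><case>1</case></location></book>
</books>
"""

def matching_titles(root, area, case):
    """Return the title text of each book stored in the given area and case."""
    titles = []
    for book in root.iter("book"):
        if (book.findtext("location/area") == area
                and book.findtext("location/case") == case):
            # Take the title of *this* book, not of every book in the document.
            titles.append(book.findtext("title"))
    return titles

titles = matching_titles(ET.fromstring(SAMPLE), "hall1", "1")
for t in titles:
    print(t)
```

The key point is that the title is looked up relative to the book currently being iterated over.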
I have very large, complex XML files (like this one: https://github.com/HL7/C-CDA-Examples/blob/master/General/Parent%20Document%20Replace%20Relationship/CCD%20Parent%20Document%20Replace%20(C-CDAR2.1).xml ) to process, but I only need the attributes and values at particular XPaths (nodes). Removing the unneeded nodes before detailed processing should cut the processing time.
So far I have tried xml_remove:
library(xml2)
xmlfile <- paste0(dir, "xmlFiles/", filelist[k])
file <- read_xml(xmlfile)
file <- xml_ns_strip(file)
# Remove every node matched by each XPath listed in xpathTable
for (counx in seq_len(nrow(xpathTable))) {
  xr <- xml_find_all(file, xpath = paste0("/", toString(xpathTable$xpaths[counx])))
  xml_remove(xr, free = TRUE)
}
This works well when removing a few nodes but crashes as the number goes up (>100).
Below is an example of what I want to achieve:
<?xml version="1.0" encoding="UTF-8"?>
<bookstore>
<book category="cooking">
<title lang="en">Everyday Italian</title>
<author>Giada De Laurentiis</author>
<year>2005</year>
<price>30.00</price>
</book>
<book category="children">
<title lang="en">Harry Potter</title>
<author>J K. Rowling</author>
<year>2005</year>
<price>29.99</price>
<ISBN>
<Random>12354</Random>
</ISBN>
</book>
<book category="web">
<title lang="en">XQuery Kick Start</title>
<author>James McGovern</author>
<author>Per Bothner</author>
<author>Kurt Cagle</author>
<author>James Linn</author>
<author>Vaidyanathan Nagarajan</author>
<year>2003</year>
<price>49.99</price>
</book>
<book category="web">
<title lang="en">Learning XML</title>
<author>Erik T. Ray</author>
<year>2003</year>
<ISBN>
<Random>12345</Random>
</ISBN>
<price>39.95</price>
</book>
</bookstore>
Filter by XPaths
/bookstore/book/title
/bookstore/book/year
/bookstore/book/ISBN/Random
<?xml version="1.0" encoding="UTF-8"?>
<bookstore>
<book category="cooking">
<title lang="en">Everyday Italian</title>
<year>2005</year>
</book>
<book category="children">
<title lang="en">Harry Potter</title>
<year>2005</year>
<ISBN>
<Random>12354</Random>
</ISBN>
</book>
<book category="web">
<title lang="en">XQuery Kick Start</title>
<year>2003</year>
</book>
<book category="web">
<title lang="en">Learning XML</title>
<year>2003</year>
<ISBN>
<Random>12345</Random>
</ISBN>
</book>
</bookstore>
Looks like an XQuery job, e.g. you could recreate your document like this
<bookstore>{
for $book in /bookstore/*
return <book category="{$book/@category}">
{$book/title}
{$book/year}
{$book/ISBN}
</book>
}</bookstore>
This takes the book example as input and produces the result shown below it. You can test it online, with XQuery selected as the option, at https://www.videlibri.de/cgi-bin/xidelcgi
There might be ways to run XQuery from R but I would rather do it in a pre-processing step from the command line using a tool like xidel.
All elements could be looked up in a single XPath 1.0 expression valid for many languages:
/bookstore/book/descendant::*[name()="title" or name()="year" or name()="Random"]
Equivalent/similar expressions:
/bookstore/book/title | /bookstore/book/year | /bookstore/book/ISBN/Random
//book/@category | //book/year | //ISBN/Random
To filter out elements:
//book/*[not(name()="title" or name()="year" or name()="ISBN" or name()="Random")]
For XML with namespaces, local-name() can be used instead of name() when namespace handling is not set up.
Testing the given example and elements on the command line:
echo 'cat /bookstore/book/descendant::*[name()="title" or name()="year" or name()="Random"]' | xmllint --shell test.xml
Result:
/ > cat /bookstore/book/descendant::*[name()="title" or name()="year" or name()="Random"]
-------
<title lang="en">Everyday Italian</title>
-------
<year>2005</year>
-------
<title lang="en">Harry Potter</title>
-------
<year>2005</year>
-------
<Random>12354</Random>
-------
<title lang="en">XQuery Kick Start</title>
-------
<year>2003</year>
-------
<title lang="en">Learning XML</title>
-------
<year>2003</year>
-------
<Random>12345</Random>
/ >
For the mentioned R crash, worth looking here.
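The pruning asked for in the question (keep only the elements on a list of paths, drop everything else) can also be sketched in Python with the standard library. This is a minimal sketch under assumptions, not the XQuery or xmllint approach above: `prune` and `KEEP` are hypothetical names, paths are compared as plain strings, and namespaces are ignored.

```python
import xml.etree.ElementTree as ET

# Two books from the question's example, enough to show the effect.
SAMPLE = """
<bookstore>
  <book category="cooking">
    <title lang="en">Everyday Italian</title>
    <author>Giada De Laurentiis</author>
    <year>2005</year>
    <price>30.00</price>
  </book>
  <book category="children">
    <title lang="en">Harry Potter</title>
    <author>J K. Rowling</author>
    <year>2005</year>
    <price>29.99</price>
    <ISBN><Random>12354</Random></ISBN>
  </book>
</bookstore>
"""

# The paths to keep, as listed in the question.
KEEP = {"/bookstore/book/title", "/bookstore/book/year", "/bookstore/book/ISBN/Random"}

def prune(elem, path, keep):
    """Remove children that are neither on a kept path nor an ancestor of one."""
    for child in list(elem):              # copy, because elem is mutated below
        child_path = f"{path}/{child.tag}"
        if child_path in keep:
            continue                      # kept as-is, including its subtree
        if any(k.startswith(child_path + "/") for k in keep):
            prune(child, child_path, keep)  # ancestor of a kept path: recurse
        else:
            elem.remove(child)

root = ET.fromstring(SAMPLE)
prune(root, "/" + root.tag, KEEP)
remaining = [[c.tag for c in book] for book in root]
print(remaining)
```

Walking the tree once and deciding per child avoids issuing one removal query per unwanted path, which is the pattern that became slow and unstable in the question.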
I used the HERE Geocoder API to get the area shape for a district, but the API returns no shapes.
Here is the request URL:
https://geocoder.ls.hereapi.com/6.2/geocode.xml?xnlp=CL_JSMv3.0.17.0&apiKey=<APIKEY>&searchtext=Cakung%20Barat%20Kel.%20Jakarta%2013910%20Indonesia&mode=retrieveAddresses&jsoncallback=H.service.jsonp.handleResponse(37)&gen=9&additionalData=IncludeShapeLevel,district
Here is the response:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<ns2:Search xmlns:ns2="http://www.navteq.com/lbsp/Search-Search/4">
<Response>
<MetaInfo>
<Timestamp>2021-07-28T10:09:26.934Z</Timestamp>
</MetaInfo>
<View xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="ns2:SearchResultsViewType">
<ViewId>0</ViewId>
<Result>
<Relevance>1.0</Relevance>
<MatchLevel>district</MatchLevel>
<MatchQuality>
<Country>1.0</Country>
<City>1.0</City>
<Subdistrict>1.0</Subdistrict>
<PostalCode>1.0</PostalCode>
</MatchQuality>
<Location>
<LocationId>NT_ezpLZJFGXsz2rWjEVeA2oD</LocationId>
<LocationType>area</LocationType>
<DisplayPosition>
<Latitude>-6.16492</Latitude>
<Longitude>106.93429</Longitude>
</DisplayPosition>
<NavigationPosition>
<Latitude>-6.16492</Latitude>
<Longitude>106.93429</Longitude>
</NavigationPosition>
<MapView>
<TopLeft>
<Latitude>-6.1516</Latitude>
<Longitude>106.92209</Longitude>
</TopLeft>
<BottomRight>
<Latitude>-6.19349</Latitude>
<Longitude>106.94349</Longitude>
</BottomRight>
</MapView>
<Address>
<Label>Cakung Barat Kel., Cakung, Jakarta, Indonesia</Label>
<Country>IDN</Country>
<County>DKI Jakarta</County>
<City>Jakarta</City>
<District>Cakung</District>
<Subdistrict>Cakung Barat Kel.</Subdistrict>
<PostalCode>13910</PostalCode>
<AdditionalData key="CountryName">Indonesia</AdditionalData>
<AdditionalData key="CountyName">DKI Jakarta</AdditionalData>
</Address>
</Location>
</Result>
</View>
</Response>
</ns2:Search>
It works for city and other levels. According to the documentation, district is one of the valid shape levels.
We tried to reproduce the request and determined that the coverage does not include district-level shapes for Jakarta.
For example:
Demo link: https://tcs.ext.here.com/examples/v3/admin_boundaries
The documentation also has information on coverage.
Link: https://developer.here.com/documentation/geocoder/dev_guide/topics/coverage-geocoder.html
I am trying to convert the XML file below to a data frame that also has the datetime as a column, but I am unable to extract the datetime attribute. Any ideas on how to do this in R? A demonstration with sample code would help me understand how to do this.
<?xml version="1.0" encoding="UTF-8"?>
<Group snapshotTime="2018-05-30T19:33:44.352Z">
<Links>
<rel>self</rel>
<href>https:cloud.com/Group/1</href>
</Links>
<Links>
<rel>last</rel>
<href>https:cloud.com/Group/6</href>
</Links>
<Links>
<rel>next</rel>
<href>https:cloud.com/Group/2</href>
</Links>
<Equipment>
<EquipmentHeader>
<Name>CASE IH</Name>
<Model>1100</Model>
<EquipmentID> Desk</EquipmentID>
<SerialNumber>1231</SerialNumber>
<PIN>123</PIN>
</EquipmentHeader>
<Location datetime="2012-06-25T11:14:54.000Z">
<Latitude>12.573722</Latitude>
<Longitude>-45.515805</Longitude>
</Location>
<Ophrs datetime="2012-03-01T17:42:37.000Z">
<Hour>1968.80</Hour>
</Ophrs>
</Equipment>
<Equipment>
<EquipmentHeader>
<Name>CALL</Name>
<Model>L2048</Model>
<EquipmentID>1MM772GP4</EquipmentID>
<SerialNumber>1TT772GPVJF688214</SerialNumber>
<PIN>1TT772G4</PIN>
</EquipmentHeader>
<Location datetime="2018-05-30T19:22:46.000Z">
<Latitude>15.518556</Latitude>
<Longitude>-55.422444</Longitude>
</Location>
<CumulativeIdleHours datetime="2018-05-30T19:02:46.000Z">
<Hour>14.74</Hour>
</CumulativeIdleHours>
<Ophrs datetime="2018-05-30T19:22:48.000Z">
<Hour>52.35</Hour>
</Ophrs>
<Distance datetime="2018-05-30T19:02:46.000Z">
<OdometerUnits>kilometre</OdometerUnits>
<Odometer>130.9</Odometer>
</Distance>
<FuelUsed datetime="2018-05-30T19:02:46.000Z">
<FuelUnits>litre</FuelUnits>
<FuelConsumed>395</FuelConsumed>
</FuelUsed>
</Equipment>
</Group>
Here's how you would get the datetime attribute, for example, for the Location nodes:
library("xml2")
library("tidyverse")
temp <- '<?xml version="1.0" encoding="UTF-8"?>
<Group snapshotTime="2018-05-30T19:33:44.352Z">
<Links>
<rel>self</rel>
<href>https:cloud.com/Group/1</href>
</Links>
<Links>
<rel>last</rel>
<href>https:cloud.com/Group/6</href>
</Links>
<Links>
<rel>next</rel>
<href>https:cloud.com/Group/2</href>
</Links>
<Equipment>
<EquipmentHeader>
<Name>CASE IH</Name>
<Model>1100</Model>
<EquipmentID> Desk</EquipmentID>
<SerialNumber>1231</SerialNumber>
<PIN>123</PIN>
</EquipmentHeader>
<Location datetime="2012-06-25T11:14:54.000Z">
<Latitude>12.573722</Latitude>
<Longitude>-45.515805</Longitude>
</Location>
<Ophrs datetime="2012-03-01T17:42:37.000Z">
<Hour>1968.80</Hour>
</Ophrs>
</Equipment>
<Equipment>
<EquipmentHeader>
<Name>CALL</Name>
<Model>L2048</Model>
<EquipmentID>1MM772GP4</EquipmentID>
<SerialNumber>1TT772GPVJF688214</SerialNumber>
<PIN>1TT772G4</PIN>
</EquipmentHeader>
<Location datetime="2018-05-30T19:22:46.000Z">
<Latitude>15.518556</Latitude>
<Longitude>-55.422444</Longitude>
</Location>
<CumulativeIdleHours datetime="2018-05-30T19:02:46.000Z">
<Hour>14.74</Hour>
</CumulativeIdleHours>
<Ophrs datetime="2018-05-30T19:22:48.000Z">
<Hour>52.35</Hour>
</Ophrs>
<Distance datetime="2018-05-30T19:02:46.000Z">
<OdometerUnits>kilometre</OdometerUnits>
<Odometer>130.9</Odometer>
</Distance>
<FuelUsed datetime="2018-05-30T19:02:46.000Z">
<FuelUnits>litre</FuelUnits>
<FuelConsumed>395</FuelConsumed>
</FuelUsed>
</Equipment>
</Group>'
temp %>% xml2::read_xml() %>% xml2::xml_find_all(".//Location") %>% xml2::xml_attr("datetime")
Hope this helps.
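To show how the attribute can end up as a column next to the element values, here is a minimal sketch in Python with the standard library. It is not the R solution the question asks for: `equipment_rows` and the column names are assumptions made for illustration, and the sample is a trimmed copy of the question's XML.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the question's document: two Equipment entries.
SAMPLE = """
<Group snapshotTime="2018-05-30T19:33:44.352Z">
  <Equipment>
    <EquipmentHeader><Name>CASE IH</Name><Model>1100</Model></EquipmentHeader>
    <Location datetime="2012-06-25T11:14:54.000Z">
      <Latitude>12.573722</Latitude><Longitude>-45.515805</Longitude>
    </Location>
  </Equipment>
  <Equipment>
    <EquipmentHeader><Name>CALL</Name><Model>L2048</Model></EquipmentHeader>
    <Location datetime="2018-05-30T19:22:46.000Z">
      <Latitude>15.518556</Latitude><Longitude>-55.422444</Longitude>
    </Location>
  </Equipment>
</Group>
"""

def equipment_rows(root):
    """Build one dict per <Equipment>, with the Location datetime as a column."""
    rows = []
    for eq in root.iter("Equipment"):
        loc = eq.find("Location")
        has_loc = loc is not None
        rows.append({
            "name": eq.findtext("EquipmentHeader/Name"),
            # Attributes are read with .get(), element text with findtext().
            "location_datetime": loc.get("datetime") if has_loc else None,
            "latitude": float(loc.findtext("Latitude")) if has_loc else None,
            "longitude": float(loc.findtext("Longitude")) if has_loc else None,
        })
    return rows

rows = equipment_rows(ET.fromstring(SAMPLE))
for row in rows:
    print(row)
```

Iterating per Equipment keeps each datetime aligned with the right row even when an entry lacks a Location element.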
Below is my sample XML
<catalog>
<book>
<author>Matthew</author>
<title>XML Developer's Guide</title>
<genre>Computer</genre>
<price>44.95</price>
<publish_date>2000-10-01</publish_date>
<description>An in-depth look at creating applications
with XML.</description>
<storeno>123</storeno>
</book>
<book>
<author>Ralls, Kim</author>
<title>Rain Fantasy</title>
<genre>Fantasy</genre>
<price>5.95</price>
<publish_date>2000-12-16</publish_date>
<description>A former</description>
<storeno>123</storeno>
</book>
<book>
<author>zxcv</author>
<title>Maeve</title>
<genre>Fantasy</genre>
<price>5.95</price>
<publish_date>2000-11-17</publish_date>
<description>After the collapse of a nanotechnology</description>
<storeno>123</storeno>
</book>
<book>
<author>zxcv</author>
<title>Legacy</title>
<genre>Fiction</genre>
<price>5.95</price>
<publish_date>2001-03-10</publish_date>
<description>In post-apocalypse</description>
<storeno>123</storeno>
</book>
<book>
<author>Corets, Eva</author>
<title>The</title>
<genre>Fiction</genre>
<price>5.95</price>
<publish_date>2001-09-10</publish_date>
<description>The two daughters</description>
<storeno>123</storeno>
</book>
<book>
<author>Horror</author>
<title>Horror</title>
<genre>Horror</genre>
<price>4.95</price>
<publish_date>2000-09-02</publish_date>
<description>When abc meets xyz</description>
<storeno>123</storeno>
</book>
<book>
<author>Knorr, Stefan</author>
<title>Creepy Crawlies</title>
<genre>Horror</genre>
<price>4.95</price>
<publish_date>2000-12-06</publish_date>
<description>An anthology of horror stories about roaches,
centipedes, scorpions and other insects.</description>
<storeno>123</storeno>
</book>
<book>
<author>O'Brien, Tim</author>
<title>kids ganes</title>
<genre>story</genre>
<price>36.95</price>
<publish_date>2000-12-09</publish_date>
<description>Microsoft's .NET initiative is explored in
detail in this deep programmer's reference.</description>
<storeno>123</storeno>
</book>
<book>
<author>O'Brien, Tim</author>
<title>MSXML3: A Comprehensive Guide</title>
<genre>computer</genre>
<price>36.95</price>
<publish_date>2000-12-01</publish_date>
<description>The abc</description>
<storeno>123</storeno>
</book>
<book>
<author>Galos, Mike</author>
<title>Visual Studio 7: A Comprehensive Guide</title>
<genre>story</genre>
<price>49.95</price>
<publish_date>2001-04-16</publish_date>
<description>Microsoft Visual Studio</description>
<storeno>123</storeno>
</book>
</catalog>
and I need an XQuery that returns
<titles>
need the first instance of the title where the genre is “Fantasy”
need all the titles concatenated where the genre is “Computer”
need all the titles concatenated where the genre is “Fiction”
need the first instance of the title where the genre is “Story”
</titles>
Example: (Rain Fantasy, XML Developer's Guide………………, Legacy……………….., kids ganes)
Note: case can be ignored in the comparisons above.
Here is what we are trying
<Titles>
let $fan := $catalog /book[genre = ‘Fantasy’][1]/title
let $stry := $catalog /book[genre = ‘Story’][1]/title
for $comp in $catalog /book[genre ='Computer']/title
return concat($comp, “”)
for $fict in $catalog /book[genre ='Fiction']/title
return concat($fict, “”)
concat($fan, $comp, $fict, $stry)
</Titles>
We are having trouble implementing the multiple for loops.
Any help is really appreciated.
Thanks in advance
From the question and the comments you seem to want something like this:
<Titles>
{
for $genre in distinct-values($catalog/book/genre)
let $books := $catalog/book[genre=$genre]
let $retval := if($genre=("Fantasy","Story")) then $books[1]/title else $books/title
return data($retval)
}
</Titles>
With your input, the result this gives is:
<Titles>XML Developer's Guide Rain Fantasy Legacy The Horror Creepy Crawlies kids ganes Visual Studio 7: A Comprehensive Guide MSXML3: A Comprehensive Guide</Titles>
My gut tells me you probably don't want the data() part though. Without it you get:
<Titles>
<title>XML Developer's Guide</title>
<title>Rain Fantasy</title>
<title>Legacy</title>
<title>The</title>
<title>Horror</title>
<title>Creepy Crawlies</title>
<title>kids ganes</title>
<title>Visual Studio 7: A Comprehensive Guide</title>
<title>MSXML3: A Comprehensive Guide</title>
</Titles>
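The question asks for case-insensitive genre matching and for only four genres in a fixed order. A minimal Python sketch of that selection logic follows. It is an illustration and not the XQuery above: `FIRST_ONLY`, `WANTED` and the helper names are assumptions, and the sample keeps only the title and genre of some of the books.

```python
import xml.etree.ElementTree as ET

# Reduced sample: title and genre only, with the mixed-case genres kept.
SAMPLE = """<catalog>
  <book><title>XML Developer's Guide</title><genre>Computer</genre></book>
  <book><title>Rain Fantasy</title><genre>Fantasy</genre></book>
  <book><title>Maeve</title><genre>Fantasy</genre></book>
  <book><title>Legacy</title><genre>Fiction</genre></book>
  <book><title>The</title><genre>Fiction</genre></book>
  <book><title>kids ganes</title><genre>story</genre></book>
  <book><title>MSXML3: A Comprehensive Guide</title><genre>computer</genre></book>
  <book><title>Visual Studio 7: A Comprehensive Guide</title><genre>story</genre></book>
</catalog>"""

FIRST_ONLY = {"fantasy", "story"}                    # genres that yield one title
WANTED = ["fantasy", "computer", "fiction", "story"]  # output order from the question

def titles_by_genre(root):
    """Group titles by lower-cased genre, preserving document order."""
    groups = {}
    for book in root.iter("book"):
        genre = book.findtext("genre", default="").strip().lower()
        groups.setdefault(genre, []).append(book.findtext("title"))
    return groups

def selected_titles(root):
    """First title for FIRST_ONLY genres, all titles for the other wanted genres."""
    groups = titles_by_genre(root)
    out = []
    for genre in WANTED:
        titles = groups.get(genre, [])
        out.extend(titles[:1] if genre in FIRST_ONLY else titles)
    return out

result = selected_titles(ET.fromstring(SAMPLE))
print(", ".join(result))
```

Lower-casing the genre before grouping is what merges "Computer" with "computer" and lets "story" match the requested "Story".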
I have used the XML package to parse both HTML and XML before, and have a rudimentary grasp of XPath. However, I've been asked to consider XML data where the important bits are determined by a combination of the text and attributes of the elements themselves, as well as those of related nodes. I've never done that. For example:
[updated example, slightly more expansive]
<Catalogue>
<Bookstore id="ID910705541">
<location>foo bar</location>
<books>
<book category="A" id="1">
<title>Alpha</title>
<author ref="1">Matthew</author>
<author>Mark</author>
<author>Luke</author>
<author ref="2">John</author>
<year>2005</year>
<price>29.99</price>
</book>
<book category="B" id="10">
<title>Beta</title>
<author ref="1">Huey</author>
<author>Duey</author>
<author>Louie</author>
<year>2005</year>
<price>29.99</price>
</book>
<book category="D" id="100">
<title>Gamma</title>
<author ref="1">Tweedle Dee</author>
<author ref="2">Tweedle Dum</author>
<year>2005</year>
<price>29.99</price>
</book>
</books>
</Bookstore>
<Bookstore id="ID910700051">
<location>foo</location>
<books>
<book category="A" id="1">
<title>Happy</title>
<author>Dopey</author>
<author>Bashful</author>
<author>Doc</author>
<author ref="1">Grumpy</author>
<year>2005</year>
<price>29.99</price>
</book>
<book category="B" id="10">
<title>Ni</title>
<author ref="1">John</author>
<author ref="2">Paul</author>
<author ref="3">George</author>
<year>2005</year>
<price>29.99</price>
</book>
<book category="D" id="100">
<title>San</title>
<author ref="1">Ringo</author>
<year>2005</year>
<price>29.99</price>
</book>
</books>
</Bookstore>
<Bookstore id="ID910715717">
<location>bar</location>
<books>
<book category="A" id="1">
<title>Un</title>
<author ref="1">Winkin</author>
<author>Blinkin</author>
<year>2005</year>
<price>29.99</price>
</book>
<book category="B" id="10">
<title>Deux</title>
<author>Nod</author>
<year>2005</year>
<price>29.99</price>
</book>
<book category="D" id="100">
<title>Trois</title>
<author>Manny</author>
<author>Moe</author>
<year>2005</year>
<price>29.99</price>
</book>
</books>
</Bookstore>
</Catalogue>
I would like to extract all author names where:
1) the location element has a text value that contains "NY"
2) the author element does NOT contain a "ref" attribute; that is where ref is not present in the author tag
I will ultimately need to concatenate the extracted authors together within a given bookstore, so that my resulting data frame has one row per store. I'd like to preserve the bookstore id as an additional field in my data frame so that I can uniquely reference each store.
Since only the first bookstore is in NY, the results from this simple example would look something like:
1 Jane Smith John Doe Karl Pearson William Gosset
If another bookstore contained "NY" in its location, it would comprise the second row, and so forth.
Am I asking too much of R to parse under these convoluted conditions?
require(XML)
xdata <- xmlParse(apptext)
xpathSApply(xdata,'//*/location[text()[contains(.,"NY")]]/following-sibling::books/.//author[not(@ref)]')
#[[1]]
#<author>Jane Smith</author>
#[[2]]
#<author>John Doe</author>
#[[3]]
#<author>Karl Pearson</author>
#[[4]]
#<author>William Gosset</author>
Breakdown:
Get all locations containing 'NY'
//*/location[text()[contains(.,"NY")]]
Get the books sibling of these nodes
/following-sibling::books
From these nodes, get all authors without a ref attribute
/.//author[not(@ref)]
Use xmlValue if you want the text:
> xpathSApply(xdata,'//*/location[text()[contains(.,"NY")]]/following-sibling::books/.//author[not(@ref)]',xmlValue)
[1] "Jane Smith" "John Doe" "Karl Pearson" "William Gosset"
UPDATE:
child.nodes <- xpathSApply(xdata,'//*/location[text()[contains(.,"NY")]]/following-sibling::books/.//author[not(@ref)]')
ans.func<-function(x){
xpathSApply(x,'.//ancestor::bookstore[@id]/@id')
}
sapply(child.nodes,ans.func)
# id id id id
#"1" "1" "1" "1"
UPDATE 2:
With your changed data
xdata <- '<Catalogue>
<Bookstore id="ID910705541">
<location>foo bar</location>
<books>
<book category="A" id="1">
<title>Alpha</title>
<author ref="1">Matthew</author>
<author>Mark</author>
<author>Luke</author>
<author ref="2">John</author>
<year>2005</year>
<price>29.99</price>
</book>
<book category="B" id="10">
<title>Beta</title>
<author ref="1">Huey</author>
<author>Duey</author>
<author>Louie</author>
<year>2005</year>
<price>29.99</price>
</book>
<book category="D" id="100">
<title>Gamma</title>
<author ref="1">Tweedle Dee</author>
<author ref="2">Tweedle Dum</author>
<year>2005</year>
<price>29.99</price>
</book>
</books>
</Bookstore>
<Bookstore id="ID910700051">
<location>foo</location>
<books>
<book category="A" id="1">
<title>Happy</title>
<author>Dopey</author>
<author>Bashful</author>
<author>Doc</author>
<author ref="1">Grumpy</author>
<year>2005</year>
<price>29.99</price>
</book>
<book category="B" id="10">
<title>Ni</title>
<author ref="1">John</author>
<author ref="2">Paul</author>
<author ref="3">George</author>
<year>2005</year>
<price>29.99</price>
</book>
<book category="D" id="100">
<title>San</title>
<author ref="1">Ringo</author>
<year>2005</year>
<price>29.99</price>
</book>
</books>
</Bookstore>
<Bookstore id="ID910715717">
<location>bar</location>
<books>
<book category="A" id="1">
<title>Un</title>
<author ref="1">Winkin</author>
<author>Blinkin</author>
<year>2005</year>
<price>29.99</price>
</book>
<book category="B" id="10">
<title>Deux</title>
<author>Nod</author>
<year>2005</year>
<price>29.99</price>
</book>
<book category="D" id="100">
<title>Trois</title>
<author>Manny</author>
<author>Moe</author>
<year>2005</year>
<price>29.99</price>
</book>
</books>
</Bookstore>
</Catalogue>'
Note that the element was previously named bookstore and is now Bookstore. NY is gone from the data, so I have used foo instead.
require(XML)
xdata <- xmlParse(xdata)
child.nodes <- getNodeSet(xdata,'//*/location[text()[contains(.,"foo")]]/following-sibling::books/.//author[not(@ref)]')
ans.func<-function(x){
xpathSApply(x,'.//ancestor::Bookstore[@id]/@id')
}
sapply(child.nodes,ans.func)
# id id id id id
#"ID910705541" "ID910705541" "ID910705541" "ID910705541" "ID910700051"
# id id
#"ID910700051" "ID910700051"
xpathSApply(xdata,'//*/location[text()[contains(.,"foo")]]/following-sibling::books/.//author[not(#ref)]',xmlValue)
# [1] "Mark" "Luke" "Duey" "Louie" "Dopey" "Bashful" "Doc"
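The question ultimately wants one row per bookstore, with the store id and the concatenated authors. A minimal Python sketch of that final step follows, using the standard library only. It is an illustration under assumptions, not the R code above: `authors_per_store` is a hypothetical helper and the sample is a reduced version of the updated data.

```python
import xml.etree.ElementTree as ET

# Reduced version of the updated example: three stores, a few authors each.
SAMPLE = """<Catalogue>
  <Bookstore id="ID910705541">
    <location>foo bar</location>
    <books>
      <book><author ref="1">Matthew</author><author>Mark</author><author>Luke</author></book>
      <book><author ref="1">Huey</author><author>Duey</author></book>
    </books>
  </Bookstore>
  <Bookstore id="ID910700051">
    <location>foo</location>
    <books>
      <book><author>Dopey</author><author ref="1">Grumpy</author></book>
    </books>
  </Bookstore>
  <Bookstore id="ID910715717">
    <location>bar</location>
    <books>
      <book><author>Blinkin</author></book>
    </books>
  </Bookstore>
</Catalogue>"""

def authors_per_store(root, needle):
    """One (store id, 'author author ...') tuple per store whose location contains needle."""
    rows = []
    for store in root.iter("Bookstore"):
        if needle not in (store.findtext("location") or ""):
            continue
        # Keep only authors that have no ref attribute at all.
        names = [a.text for a in store.iter("author") if "ref" not in a.attrib]
        rows.append((store.get("id"), " ".join(names)))
    return rows

rows = authors_per_store(ET.fromstring(SAMPLE), "foo")
for store_id, authors in rows:
    print(store_id, authors)
```

Filtering store by store, instead of selecting all authors first and looking up each ancestor afterwards, gives the grouping by store id directly.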