I have the following XML that I need to transform:
<?xml version="1.0" encoding="utf-8"?>
<TestRecords>
<TestData>
<Users>
<User>
<Id>BG123</Id>
<Name>Bill Gates</Name>
</User>
<User>
<Id>SN123</Id>
<Name>Satya Nadella</Name>
</User>
</Users>
<UserDetails>
<UserDetail>
<UserId>SN123</UserId>
<CompanyName>Microsoft Corp</CompanyName>
</UserDetail>
<UserDetail>
<UserId>
<UserId>BG123</UserId>
<CompanyName>Bill Gates Foundation</CompanyName>
</UserId>
</UserDetail>
</UserDetails>
</TestData>
</TestRecords>
I need to map this XML into the following XML:
<?xml version="1.0" encoding="utf-8"?>
<TestRecords>
<TestData>
<Users>
<User>
<Id>BG123</Id>
<Name>Bill Gates</Name>
<CompanyName>Bill Gates Foundation</CompanyName>
</User>
<User>
<Id>SN123</Id>
<Name>Satya Nadella</Name>
<CompanyName>Microsoft Corp</CompanyName>
</User>
</Users>
</TestData>
</TestRecords>
While looping over Users/User, I need to find the UserDetail whose UserDetail/UserId is equal to the current User/Id.
Thank you and best regards
Michael
If you don't want to use custom XSLT as suggested by FCR, the only other option when you have different looping structures is an intermediate schema and two maps.
The first map produces this intermediate document:
<TestRecords>
<TestData>
<Users>
<User>
<Id>BG123</Id>
<Name>Bill Gates</Name>
<UserDetails>
<UserID>SN123</UserID>
<CompanyName>Microsoft Corp</CompanyName>
</UserDetails>
<UserDetails>
<UserID>BG123</UserID>
<CompanyName>Bill Gates Foundation</CompanyName>
</UserDetails>
</User>
<User>
<Id>SN123</Id>
<Name>Satya Nadella</Name>
<UserDetails>
<UserID>SN123</UserID>
<CompanyName>Microsoft Corp</CompanyName>
</UserDetails>
<UserDetails>
<UserID>BG123</UserID>
<CompanyName>Bill Gates Foundation</CompanyName>
</UserDetails>
</User>
</Users>
</TestData>
</TestRecords>
You can then run this through a second map to produce the desired outcome.
This becomes very inefficient, however, if the second list is large.
This is a common lookup pattern in XSLT, and you can also use xsl:key to create an index, which can boost performance on large documents. Refer here if you need to convert a .btm to XSLT.
(Also, I'm assuming that there isn't a double wrapper UserId on the last UserDetails/UserDetail element):
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="1.0">
<xsl:output indent="yes"/>
<xsl:key name="userLookup"
match="/TestRecords/TestData/UserDetails/UserDetail" use="UserId"/>
<!--identity template - copy everything by default -->
<xsl:template match="@* | node()">
<xsl:copy>
<xsl:apply-templates select="@* | node()"/>
</xsl:copy>
</xsl:template>
<!-- i.e. match only User elements in the first Users/User tree. Actually the explicit
ancestor qualifier is redundant because of the other suppress template -->
<xsl:template match="User[ancestor::Users]">
<User>
<xsl:copy-of select="child::*" />
<CompanyName>
<xsl:value-of select="key('userLookup', Id)/CompanyName"/>
</CompanyName>
</User>
</xsl:template>
<!--Suppress the second userdetails part of the tree entirely -->
<xsl:template match="UserDetails" />
</xsl:stylesheet>
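For comparison, the same lookup can be written without a key by filtering UserDetail with current(), which is close to how the question phrases it. This is only a minimal sketch, assuming (as above) that the last UserDetail has no double UserId wrapper. It rescans every UserDetail for each User, which is the cost that xsl:key avoids on large documents.

```xml
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output indent="yes"/>

  <!-- identity template: copy everything by default -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="Users/User">
    <xsl:copy>
      <!-- keep the existing Id and Name children -->
      <xsl:copy-of select="*"/>
      <!-- current() is the User being processed; inside the predicate,
           UserId is evaluated relative to each candidate UserDetail -->
      <xsl:copy-of
        select="/TestRecords/TestData/UserDetails/UserDetail[UserId = current()/Id]/CompanyName"/>
    </xsl:copy>
  </xsl:template>

  <!-- suppress the UserDetails part of the tree -->
  <xsl:template match="UserDetails"/>
</xsl:stylesheet>
```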
I am trying to replace a text node with an element when the document contains that text. The query below is what I have tried, but it gives the error "Target is not an element, text, attribute, comment or pi".
inputXML:
<book>
<p>Isn't it lovely here? Very smart. We'll be like three queens when you've finished with us,
Edie. You doing well then?</p>
<p>
<name type="person">April De Angelis</name>’ plays include <title type="work">Positive
Hour</title> (Out of Joint) <title type="work">Playhouse Creatures</title> (<name
type="org">Sphinx Theatre Company</name>), <title type="work">Hush</title> (<name
type="org">Royal Court</name>), <title type="work">Soft Vengeance</title>, <title
type="work">The Life and Times of Fanny Hill</title> (adapted from the <name type="org"
>John Cleland novel</name>) and <title type="work">Ironmistress</title>. Her work for
radio includes <title>The Outlander</title> (<name type="org">Radio 5</name>), which won the
<name type="org">Writers’ Guild Award</name> (<date>1992</date>), and, for opera, <title
type="work">Flight</title> with composer <name type="person">Jonathan Dove</name> (<name
type="place">Glyndebourne</name>, <date>1998</date>).</p>
</book>
Expected output:
<book>
<p>Isn't it lovely here? Very smart. We'll be like three <highlight>queens</highlight> when
you've finished with us, Edie. You doing well then?</p>
<p>
<name type="person">April De Angelis</name>’ plays <highlight>include</highlight>
<title type="work">Positive Hour</title> (Out of Joint) <title type="work">Playhouse
Creatures</title> (<name type="org">Sphinx Theatre Company</name>), <title type="work"
>Hush</title> (<name type="org">Royal Court</name>), <title type="work">Soft
Vengeance</title>, <title type="work">The Life and Times of Fanny Hill</title> (adapted
from the <name type="org">John Cleland novel</name>) and <title type="work"
>Ironmistress</title>. Her work for radio includes <title>The Outlander</title> (<name
type="org">Radio 5</name>), which won the <name type="org">Writers’ Guild Award</name>
(<date>1992</date>), and, for opera, <title type="work">Flight</title> with composer
<name type="person">Jonathan Dove</name> (<name type="place">Glyndebourne</name>,
<date>1998</date>).</p>
</book>
I am using BaseX version 9.5.1. Below is the code.
let $body := <indexedterms>
<content>
<terms>
<term>include</term>
<term>Queens</term>
</terms>
<uri>/IEEE/IEEE/test.xml</uri>
</content>
</indexedterms>
for $contents in $body/content
let $uri := $contents/uri
let $doc := fn:doc($uri)
for $selectedterm in $contents/terms/term/string()
let $Modifieddoc := copy $c := $doc
modify
(
for $nodes in $c//*//text()[fn:matches(.,$selectedterm)]/parent::*
return
if($nodes/node()[fn:matches(.,$selectedterm)]/parent::*:highlight)
then ()
else
replace node $nodes/$selectedterm with <highlight>{$selectedterm}</highlight>
)
return $c
return
db:replace('IEEE',substring-after($uri,'/IEEE'),$Modifieddoc)
Previously I was using "replace node $nodes/node()[fn:contains(.,$selectedterm)] with <highlight>{$selectedterm}</highlight>" instead of "replace node $nodes/$selectedterm with <highlight>{$selectedterm}</highlight>". That did the work, but for terms that share a stem (e.g. include, includes) it matched both words, which is not correct, so I changed the code to "replace node $nodes/$selectedterm with <highlight>{$selectedterm}</highlight>".
$nodes/$selectedterm is probably the culprit and most likely not what you want: the $selectedterm variable holds a string value (you bind for $selectedterm in $contents/terms/term/string()), so that path expression yields a string rather than a node that replace node could target. It might help us understand what you want to achieve if you show us a sample document you load with the doc function and the update you want to do on that with BaseX, for instance, for the two sample terms you have shown in your code snippet.
Your task of identifying and wrapping search terms in your text contents can be done nicely in XSLT 2 or 3, which you can run with BaseX if you put Saxon 9.9 or 10 or 11 on the class path:
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="3.0"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:fn="http://www.w3.org/2005/xpath-functions"
exclude-result-prefixes="#all"
expand-text="yes">
<xsl:param name="terms" as="xs:string*" select="'include', 'Queens'"/>
<xsl:output method="xml" indent="no"/>
<xsl:template match="p//text()">
<xsl:apply-templates select="analyze-string(., string-join($terms, '|'), 'i')/node()"/>
</xsl:template>
<xsl:template match="fn:match">
<highlight>{.}</highlight>
</xsl:template>
<xsl:template match="fn:non-match">
<xsl:apply-templates/>
</xsl:template>
<xsl:mode on-no-match="shallow-copy"/>
</xsl:stylesheet>
As the analyze-string function used here also exists in BaseX/XQuery, you should also be able to use XQuery Update on the result of calling that function, i.e. by replacing fn:match elements with highlight elements.
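The question also wants "include" highlighted without touching "includes". One possible variation, shown here only as a sketch: let analyze-string split the text into whole words with '\w+' and highlight a word only when it equals one of the terms, ignoring case. The variable name lc-terms is introduced for illustration, and the sketch assumes the search terms are single words.

```xml
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="3.0"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:fn="http://www.w3.org/2005/xpath-functions"
    exclude-result-prefixes="#all"
    expand-text="yes">

  <xsl:param name="terms" as="xs:string*" select="'include', 'Queens'"/>
  <!-- hypothetical helper: lower-cased terms for a case-insensitive comparison -->
  <xsl:variable name="lc-terms" as="xs:string*" select="$terms ! lower-case(.)"/>

  <xsl:mode on-no-match="shallow-copy"/>

  <!-- split each text node into word (fn:match) and non-word (fn:non-match) pieces -->
  <xsl:template match="p//text()">
    <xsl:apply-templates select="analyze-string(., '\w+')/node()"/>
  </xsl:template>

  <!-- a whole word that equals one of the terms gets wrapped -->
  <xsl:template match="fn:match[lower-case(.) = $lc-terms]">
    <highlight>{.}</highlight>
  </xsl:template>

  <!-- every other piece is written back unchanged -->
  <xsl:template match="fn:match | fn:non-match">{.}</xsl:template>

</xsl:stylesheet>
```

Because the comparison is made on complete words, "includes" is left alone while "include" and "queens" are wrapped.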
I have a fairly deep xml file of travel data that I've anonymized here. I would like to pull out the coupon statuses for multiple segments and attach them to the itinerary id. I'm having a very difficult time using the xml2 package and I think the reason why is that some of my XML data terminates with text and some terminate with attributes. I've tried to convert the xml to a list with as_list(). I've also tried to start with xml_find_all() but get a nodeset of 0 regardless of the node I search for (Ticketing or Coupons should work, for example). Below is the data:
<?xml version="1.0" encoding="UTF-8"?>
<eTicketCouponRS xmlns="http://webse" xmlns:ns4="http://s" xmlns:stl="http://se" Version="2.0.0">
<stl:ApplicationResults status="Complete">
<stl:Success timeStamp="2021-06-16T11:39:52-05:00" />
</stl:ApplicationResults>
<TicketingInfos>
<TicketingInfo>
<Ticketing AgencyCity="DCA" AgentWorkArea="A" IATA_Number="0952" IssuingAgent="A" PrimeHostID="1S" PseudoCityCode="5SE0" TransactionDateTime="2021-06-16T11:39">
<CouponData InformationSource="S" IssueDate="2021-03-29" NumBooklets="1" TicketMedia="E" TicketMode="63">
<AirItineraryPricingInfo>
<FareCalculation>
<Text>SAN AA X/E/DFW AA TYO M0.00NUC0.00END ROE1.00 XFSAN4.5DFW4.5</Text>
</FareCalculation>
<ItinTotalFare>
<BaseFare Amount="0.00" CurrencyCode="USD" />
<Taxes>
<Tax Amount="19.10" TaxCode="US" />
<Tax Amount="5.60" TaxCode="AY" />
<Tax Amount="9.00" TaxCode="XF" />
</Taxes>
<TotalFare Amount=".70" CurrencyCode="USD" />
</ItinTotalFare>
<PassengerTypeQuantity Code="GV1" />
</AirItineraryPricingInfo>
<Coupons>
<Coupon CodedStatus="OK" Number="1" StatusCode="RFND">
<FlightSegment DepartureDateTime="2021-08-13T06:15" FlightNumber="2535" RPH="1" ResBookDesigCode="V">
<DestinationLocation LocationCode="DFW" />
<FareBasis Code="VCA" />
<MarketingAirline Code="AA" FlightNumber="2535" />
<OperatingAirline Code="AA" />
<OriginLocation LocationCode="SAN" />
</FlightSegment>
</Coupon>
<Coupon CodedStatus="OK" Number="2" StatusCode="RFND">
<FlightSegment ConnectionInd="X" DepartureDateTime="2021-08-13T12:20" FlightNumber="175" RPH="2" ResBookDesigCode="V">
<DestinationLocation LocationCode="HND" />
<FareBasis Code="VCA" />
<MarketingAirline Code="AA" FlightNumber="175" />
<OperatingAirline Code="AA" />
<OriginLocation LocationCode="DFW" />
<FareTypeClass>PG</FareTypeClass>
<FareTypeRule>OW-GO</FareTypeRule>
</FlightSegment>
</Coupon>
</Coupons>
<CustomerInfo>
<Customer>
<Invoice Number="126" />
<Payment ApprovalID="03" RPH="1" ReferenceNumber="XXXXXXXXXXXX" Type="CC">
<CC_Info>
<PaymentCard Code="VI" ExpirationDate="XX-XX" />
</CC_Info>
</Payment>
<PersonName NameReference="PCS" PassengerType="GV1">
<GivenName>VER</GivenName>
<Surname>DE</Surname>
</PersonName>
</Customer>
</CustomerInfo>
<ItineraryRef CustomerIdentifier="R5" ID="EXAMPLE" />
</CouponData>
</Ticketing>
</TicketingInfo>
</TicketingInfos>
</eTicketCouponRS>
I have about 100 of these each to load separately and pull out a small table consisting of the following columns:
SuccTimeStamp TransacTimeStamp ItineraryID CouponNumber StatusCode Origin Destination OperatingAirline FlightNumber.
You can see that each of these elements is found at a different depth of the XML, and every travel itinerary has a different number of coupons, anywhere from 1 to 10. I also found a helpful post here from hrbrmstr helping someone out in 2018, but I can't get a similar solution to "see" my nodes, and I'm not sure if it's my code or my XML data.
Any help is appreciated!
For nested XML files that you need to flatten for end uses such as R, consider XSLT, the special-purpose language designed to transform XML files. You can run XSLT 1.0 scripts in R using the xslt package (sister to xml2). Alternatively, you can use a dedicated XSLT processor and have R call the external program with system(). Like SQL, XSLT is an industry-standard, portable language not limited to R.
Within XSLT, since your granularity is the coupon, you can extract at the <Coupon> level and use the ancestor:: XPath axis to retrieve higher-level information. Because of the default namespace, the long-winded <xsl:element> is used. IATA_Number is assumed to be ItineraryID.
XSLT (save as .xsl file, a special .xml file)
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:w="http://webse"
xmlns:stl="http://se">
<xsl:output method="xml" omit-xml-declaration="yes" indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:template match="/w:eTicketCouponRS">
<xsl:copy>
<xsl:apply-templates select="descendant::w:Coupon"/>
</xsl:copy>
</xsl:template>
<xsl:template match="w:Coupon">
<xsl:copy>
<xsl:element name="SuccTimeStamp" namespace="http://webse">
<xsl:value-of select="ancestor::w:eTicketCouponRS/stl:ApplicationResults/stl:Success/@timeStamp"/>
</xsl:element>
<xsl:element name="TransacTimeStamp" namespace="http://webse">
<xsl:value-of select="ancestor::w:Ticketing/@TransactionDateTime"/>
</xsl:element>
<xsl:element name="ItineraryID" namespace="http://webse">
<xsl:value-of select="ancestor::w:Ticketing/@IATA_Number"/>
</xsl:element>
<xsl:element name="CouponNumber" namespace="http://webse">
<xsl:value-of select="@Number"/>
</xsl:element>
<xsl:element name="StatusCode" namespace="http://webse">
<xsl:value-of select="@CodedStatus"/>
</xsl:element>
<xsl:element name="Origin" namespace="http://webse">
<xsl:value-of select="w:FlightSegment/w:OriginLocation/@LocationCode"/>
</xsl:element>
<xsl:element name="Destination" namespace="http://webse">
<xsl:value-of select="w:FlightSegment/w:DestinationLocation/@LocationCode"/>
</xsl:element>
<xsl:element name="OperatingAirline" namespace="http://webse">
<xsl:value-of select="w:FlightSegment/w:OperatingAirline/@Code"/>
</xsl:element>
<xsl:element name="FlightNumber" namespace="http://webse">
<xsl:value-of select="w:FlightSegment/@FlightNumber"/>
</xsl:element>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
R
library(xml2)
library(xslt)
# LOAD XML AND XSLT
doc <- read_xml("/path/to/Input.xml")
style <- read_xml("/path/to/Style.xsl")
# RUN TRANSFORMATION AND SEE OUTPUT
flat_xml <- xml_xslt(doc, style)
cat(as.character(flat_xml))
# RETRIEVE data NODES
nmsp <- c(w = "http://webse")
recs <- xml2::xml_find_all(flat_xml, "//w:Coupon", ns=nmsp)
# BIND EACH CHILD TEXT AND NAME
df_list <- lapply(recs, function(r) {
vals <- xml2::xml_children(r)
data.frame(rbind(setNames(c(xml2::xml_text(vals)),
c(xml2::xml_name(vals)))))
})
# COMBINE ALL DFS
final_df <- do.call(rbind.data.frame, df_list)
rm(recs, df_list)
final_df
#                SuccTimeStamp TransacTimeStamp ItineraryID CouponNumber StatusCode Origin Destination OperatingAirline FlightNumber
# 1 2021-06-16T11:39:52-05:00 2021-06-16T11:39        0952            1         OK    SAN         DFW               AA         2535
# 2 2021-06-16T11:39:52-05:00 2021-06-16T11:39        0952            2         OK    DFW         HND               AA          175
The above runs for a single XML file. For 100 separate files, wrap it in a user-defined function and run lapply to build a list of data frames, then concatenate them once at the very end. Load the XSLT once outside the loop since it does not change, assuming the XML files keep the same structure.
style <- read_xml("/path/to/Style.xsl")
xml_to_df <- function(xml_file) { ... }
xml_dfs <- lapply(list_of_xml_files, xml_to_df)
master_df <- do.call(rbind.data.frame, xml_dfs)
Thanks Parfait! I was able to modify the template xsl that you provided. The xsl sheet seems to "parse" everything nicely! After I got it "flattened", all I simply had to do was as_list(), as_tibble() and unnest() a couple of times and then it was a data frame.
Thanks!
I'm a newbie at XSL / XML. I would like to write a simple XSL for the XML code below that just shows the name and address attributes. I have most of the XSL, but I can't write the part that shows my results (the customers).
This is the XML code:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="customer.xsl"?>
<customers xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:noNamespaceSchemaLocation="customer.xsd">
<customer name="Jay Z" address="New York, trinity st. 300, USA"/>
<customer name="Vladimir Putin" address="St. Petersburg, wadim street 23, Russia"/>
<customer name="Hiro Nakamura" address="Kyoto, Natsukawa street 49, Japan"/>
</customers>
Like this?
<xsl:value-of select="customers/customer"/>
Any help will be much appreciated! Thank you.
The XSL itself would look like the following:
<xsl:for-each select="customers/customer">
<xsl:value-of select="@name"/>
<xsl:value-of select="@address"/>
</xsl:for-each>
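To show where that loop sits, here is a minimal sketch of a complete customer.xsl (the file name comes from the xml-stylesheet instruction in the question). The HTML table and its column headings are assumptions made for illustration, not part of the original answer.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html" indent="yes"/>

  <xsl:template match="/">
    <html>
      <body>
        <table border="1">
          <tr>
            <th>Name</th>
            <th>Address</th>
          </tr>
          <!-- one table row per customer element -->
          <xsl:for-each select="customers/customer">
            <tr>
              <!-- @name and @address read the attributes of the current customer -->
              <td><xsl:value-of select="@name"/></td>
              <td><xsl:value-of select="@address"/></td>
            </tr>
          </xsl:for-each>
        </table>
      </body>
    </html>
  </xsl:template>
</xsl:stylesheet>
```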
I'm saving some timestamps in my XML results in standard UTC format.
What I'd like to do is convert these back to human-readable times, without the timezone suffix. The furthest I've been able to get so far is:
format-dateTime(
xs:dateTime(
adjust-dateTime-to-timezone(
xs:dateTime(@thevalue),xs:dayTimeDuration('P0DT4H')
)
),'[M01]/[D01]/[Y0001] [H01]:[m01]:[s01]'
)
where @thevalue is like: 2006-02-15T17:00:00
It's giving me a headache because the formatter returns a time of 17:00. If I peel back a layer of the format-dateTime to see what the adjust-dateTime function returns, it gives
2006-02-15T17:00:00+04:00
... and all I really want to see is 21:00... so very frustrated. Anyone deal with this before?
Here is a transformation that does what you want:
<xsl:stylesheet version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:variable name="vDateTime" as="xs:dateTime"
select="xs:dateTime('2006-02-15T17:00:00+00:00')"/>
<xsl:template match="/">
<xsl:sequence select=
"adjust-dateTime-to-timezone($vDateTime,
xs:dayTimeDuration('P0DT4H')
)"/>
</xsl:template>
</xsl:stylesheet>
When applied to any XML document (not used), the result is:
2006-02-15T21:00:00+04:00
And the complete solution is:
<xsl:stylesheet version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:variable name="vDateTime" as="xs:dateTime"
select="xs:dateTime('2006-02-15T17:00:00+00:00')"/>
<xsl:template match="/">
<xsl:variable name="vadjustedDateTime" select=
"adjust-dateTime-to-timezone($vDateTime,
xs:dayTimeDuration('P0DT4H')
)"/>
<xsl:sequence select=
"format-dateTime($vadjustedDateTime,
'[M01]/[D01]/[Y0001] [H01]:[m01]:[s01]'
)
"/>
</xsl:template>
</xsl:stylesheet>
which produces this result:
02/15/2006 21:00:00
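The stylesheets above work because the literal carries an explicit +00:00 offset. The value in the question, 2006-02-15T17:00:00, has no timezone at all, and for such a value adjust-dateTime-to-timezone only attaches the given offset without moving the clock time, which is why 17:00:00+04:00 came back. A minimal sketch of one way around that, assuming the stored values really are UTC: label the value as UTC first, then adjust. The result element used as input here is hypothetical.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xsl:output method="text"/>

  <!-- hypothetical input: <result thevalue="2006-02-15T17:00:00"/> -->
  <xsl:template match="/result">
    <!-- step 1: the value has no timezone, so this only attaches UTC to it -->
    <xsl:variable name="vUtc" as="xs:dateTime"
        select="adjust-dateTime-to-timezone(xs:dateTime(@thevalue),
                                            xs:dayTimeDuration('PT0H'))"/>
    <!-- step 2: a timezone is now present, so this shifts the clock time by +4 hours -->
    <xsl:variable name="vLocal" as="xs:dateTime"
        select="adjust-dateTime-to-timezone($vUtc, xs:dayTimeDuration('PT4H'))"/>
    <!-- the picture string contains no timezone component, so none is printed -->
    <xsl:value-of
        select="format-dateTime($vLocal, '[M01]/[D01]/[Y0001] [H01]:[m01]:[s01]')"/>
  </xsl:template>
</xsl:stylesheet>
```

For the sample value this should give 02/15/2006 21:00:00.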
I need to loop through an XML document (no problem there) and check whether a value I find is already in an (a) tag in a div in the output that my XSL is generating. Only if the value is not yet in an (a) tag should I create a new (a) tag for it and put it in the div I am checking.
Does anyone know how to do this dynamically in XSLT?
<div id="tags"><span class="l_cap"> </span>
all
<xsl:for-each select="root/nodes/node/data/genres">
<xsl:for-each select="value">
**<xsl:if test="not(contains())">**
<xsl:value-of select="current()"/>
</xsl:if>
</xsl:for-each>
</xsl:for-each>
Sorry for the confusion. What I am trying to do is this: in the if statement, check whether the current value already exists in the div; if not, add it; if it does, don't do anything.
Thanks again.
It sounds like you're trying to create a distinct list of all of the "genres" in your list.
Assuming a data structure which looks a bit like this:
<root>
<nodes>
<node>
<data>
<genres>
<value>One</value>
<value>Two</value>
<value>Two</value>
<value>Three</value>
<value>Two</value>
</genres>
</data>
</node>
</nodes>
</root>
And a stylesheet which looks a bit like this:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:key name="genres" match="value" use="."/>
<xsl:template match="/">
<div>
<xsl:for-each select="/root/nodes/node/data/genres/value">
<xsl:if test="generate-id(.) = generate-id(key('genres', .)[1])">
<xsl:value-of select="."/>
</xsl:if>
</xsl:for-each>
</div>
</xsl:template>
</xsl:stylesheet>
Then you will end up with something like this:
<div>
One
Two
Three
</div>
This is a fairly standard XSLT 1.0 technique. It uses keys (described here: http://www.xml.com/pub/a/2002/02/06/key-lookups.html ) to create a sort of index of all the /root/nodes/node/data/genres/value entries. Then it loops through all of the entries, but only prints the first one of each type. The end result is that each value will only be output once.
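Since the question asks for one (a) tag per genre inside the tags div, the same key technique can emit links instead of bare text. A minimal sketch under the same assumed data structure; the href value is a placeholder, not something taken from the question.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- index every genre value by its text -->
  <xsl:key name="genres" match="genres/value" use="."/>

  <xsl:template match="/">
    <div id="tags">
      <!-- keep only the first value in each group of equal values -->
      <xsl:for-each
          select="/root/nodes/node/data/genres/value[generate-id() = generate-id(key('genres', .)[1])]">
        <!-- placeholder href: replace with the real link target -->
        <a href="#">
          <xsl:value-of select="."/>
        </a>
      </xsl:for-each>
    </div>
  </xsl:template>
</xsl:stylesheet>
```

With the sample data above, the div should end up with three a elements, for One, Two and Three.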