Parse HTML/XML characters in R - r

I am reading in the following XML as a text file in R:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE score-partwise PUBLIC
"-//Recordare//DTD MusicXML 3.0 Partwise//EN"
"http://www.musicxml.org/dtds/partwise.dtd">
<score-partwise version="3.0">
<part-list>
<score-part id="P1">
<part-name>Music</part-name>
</score-part>
</part-list>
<part id="P1">
<measure number="1">
<attributes>
<divisions>1</divisions>
<key>
<fifths>0</fifths>
</key>
<time>
<beats>4</beats>
<beat-type>4</beat-type>
</time>
<clef>
<sign>G</sign>
<line>2</line>
</clef>
</attributes>
<note>
<pitch>
<step>C</step>
<octave>4</octave>
</pitch>
<duration>4</duration>
<type>whole</type>
</note>
</measure>
</part>
</score-partwise>
R:
library(readtext)
xml <- readtext("musicxml.txt")$text
I am then trying to render this in Javascript via Shiny by feeding my XML text to a Javascript function. NB: Working outside of R.
shiny::tags$script(paste0('var osmd = new opensheetmusicdisplay.OpenSheetMusicDisplay(\"sheet-music\", {drawingParameters: "compact",
drawPartNames: false, drawMeasureNumbers: false, drawMetronomeMarks: false, drawTitle: false});
var loadPromise = osmd.load(\'',xml,'\');
loadPromise.then(function(){
osmd.render();
});
'))
However, when I concatenate the XML string above, it does not work because characters are escaped, e.g one line:
<note>
I tried using the unescape_xml function here (with and without the tags removed), but this does not solve the problem. It leaves me with:
"Music1044G2C44whole"
So how can I end up with a concatenated string with none of the escaped characters? It must just be a string and not another R object.

You need to wrap the contents of the tag call with shiny::HTML to ensure it is passed unescaped:
shiny::tags$script(shiny::HTML(paste0(
'var osmd = new opensheetmusicdisplay.OpenSheetMusicDisplay(\"sheet-music\",
{ drawingParameters: "compact",
drawPartNames: false,
drawMeasureNumbers: false,
drawMetronomeMarks: false,
drawTitle: false});
var loadPromise = osmd.load(\'',xml,'\');
loadPromise.then(function(){ osmd.render() });')))
Which gives you:
<script>var osmd = new opensheetmusicdisplay.OpenSheetMusicDisplay("sheet-music",
{ drawingParameters: "compact",
drawPartNames: false,
drawMeasureNumbers: false,
drawMetronomeMarks: false,
drawTitle: false});
var loadPromise = osmd.load('<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE score-partwise PUBLIC
"-//Recordare//DTD MusicXML 3.0 Partwise//EN"
"http://www.musicxml.org/dtds/partwise.dtd">
<score-partwise version="3.0">
<part-list>
<score-part id="P1">
<part-name>Music</part-name>
</score-part>
</part-list>
<part id="P1">
<measure number="1">
<attributes>
<divisions>1</divisions>
<key>
<fifths>0</fifths>
</key>
<time>
<beats>4</beats>
<beat-type>4</beat-type>
</time>
<clef>
<sign>G</sign>
<line>2</line>
</clef>
</attributes>
<note>
<pitch>
<step>C</step>
<octave>4</octave>
</pitch>
<duration>4</duration>
<type>whole</type>
</note>
</measure>
</part>
</score-partwise>');
loadPromise.then(function(){ osmd.render() });</script>

Related

WSO2 : Transforming response xml

I would like to turn this xml response into something more easily readable.
<?xml version="1.0" encoding="ISO-8859-1"?><soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
<SOAP-ENV:Header xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/"/>
<soap:Body>
<executeResponse xmlns="urn:GCE">
<BusinessViewServiceexecuteOut xmlns="http://www.generix.fr/technicalframework/businesscomponent/applicationmodule/common" xmlns:ns2="http://www.generixgroup.com/processus/configuration/scheduler" xmlns:ns3="http://www.generix.fr/technicalframework/business/service/common">
<xmlpres><?xml version = '1.0' encoding = 'UTF-8'?> <VueTable type="View" name="Table" habctr="true" total_business_row="2" nbline="400" confNbline="400" numpage="1" nbpage="1">
<JTblView name="JTblView" type="ViewObject" maxfetchsize="999" maxfetchsizeexceeded="false">
<JTblViewRow current="true" type="ViewRow" index="1" business_row_index="1">
<Cletbl precision="6" type="VARCHAR" pk="true">
<business_data>N</business_data>
</Cletbl>
<Codtbl precision="6" type="VARCHAR" pk="true">
<business_data>001</business_data>
</Codtbl>
<Lib1 precision="30" type="VARCHAR">
<business_data>Non</business_data>
</Lib1>
<Lib2 precision="30" type="VARCHAR">
<business_data/>
</Lib2>
<Lir precision="10" type="VARCHAR">
<business_data>Non</business_data>
</Lir>
</JTblViewRow>
<JTblViewRow type="ViewRow" index="2" business_row_index="2">
<Cletbl precision="6" type="VARCHAR" pk="true">
<business_data>O</business_data>
</Cletbl>
<Codtbl precision="6" type="VARCHAR" pk="true">
<business_data>001</business_data>
</Codtbl>
<Lib1 precision="30" type="VARCHAR">
<business_data>Oui</business_data>
</Lib1>
<Lib2 precision="30" type="VARCHAR">
<business_data/>
</Lib2>
<Lir precision="10" type="VARCHAR">
<business_data>Oui</business_data>
</Lir>
</JTblViewRow>
</JTblView>
</VueTable></xmlpres>
</BusinessViewServiceexecuteOut>
</executeResponse>
</soap:Body></soap:Envelope>
At least if I could extract what's in the value of "xmlpres", the better I could do:
<table><row><code></code><libelle></libelle/></row></table>
To then turn it into a json response but I can't see ... I just get all the output or in json stream but with everything , which is not usable.
Create an out-mediation sequence with the following content and attach it to the respective API and try out the scenario. This is to extract the xmlpres content and send that as the response to the client
<?xml version="1.0" encoding="UTF-8"?>
<sequence xmlns="http://ws.apache.org/ns/synapse" name="out-sequence">
<!-- extract the xmlpres content and store as OM element -->
<property name="XMLBody"
expression="$body//soap:Body//generic:xmlpres"
xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/"
xmlns:gce="urn:GCE"
xmlns:generic="http://www.generix.fr/technicalframework/businesscomponent/applicationmodule/common" type="OM" />
<!-- pass the extracted property as response body -->
<enrich>
<source type="property" property="XMLBody" />
<target type="body" />
</enrich>
</sequence>
Hope this helps you to extract and send the response accordingly.

XML to list and back to XML

Situation: "Software" to R and back to "Software". The only interface for "Software" is xml.
In R, I need to make a few changes in the file so i convert it to a list and make some changes.
library(XML)
myFile = xmlParse("myXML")
xml_data <- xmlToList(myFile)
xml_data$timetable$train$.attrs[6] = "HelloNewWorld"
Now i need to convert this list "xml_data" it back to xml.
I found some functions like this:
function(item, tag) {
# just a textnode, or empty node with attributes
if(typeof(item) != 'list') {
if (length(item) > 1) {
xml <- xmlNode(tag)
for (name in names(item)) {
xmlAttrs(xml)[[name]] <- item[[name]]
}
return(xml)
} else {
return(xmlNode(tag, item))
}
}
# create the node
if (identical(names(item), c("text", ".attrs"))) {
# special case a node with text and attributes
xml <- xmlNode(tag, item[['text']])
} else {
# node with child nodes
xml <- xmlNode(tag)
for(i in 1:length(item)) {
if (names(item)[i] != ".attrs") {
xml <- append.xmlNode(xml, listToXml(item[[i]], names(item)[i]))
}
}
}
# add attributes to node
attrs <- item[['.attrs']]
for (name in names(attrs)) {
xmlAttrs(xml)[[name]] <- attrs[[name]]
}
return(xml)
}
But this doesnt work...
Any help or hints appreciated!
Thanks!
In the linked picture you can see the current xml-file. Highlighted in yellow the values that I need to change.
Link:
https://i.stack.imgur.com/remzj.png
Consider XSLT, the special-purpose language designed to transform XML files. No need to rewrite the entire tree in R. Using its xslt package (available on CRAN-R), extension of xml2, you can transform an input source and write output to screen or file.
Using the Identity Transform to copy document as is, below XSLT then rewrites one of the attributes in <train> tag, #source, similar to your above code attempt but with sixth attribute.
XML (sample input from railIML Wiki page)
<?xml version="1.0" encoding="UTF-8"?>
<railml xmlns:xsi="http://www.w3.org/2000/10/XMLSchema-instance" xsi:noNamespaceSchemaLocation="timetable.xsd">
<timetable version="1.1">
<train trainID="RX 100.2" type="planned" source="opentrack">
<timetableentries>
<entry posID="ZU" departure="06:08:00" type="begin"/>
<entry posID="ZWI" departure="06:10:30" type="pass"/>
<entry posID="ZOER" arrival="06:16:00" departure="06:17:00" minStopTime="9" type="stop"/>
<entry posID="WS" departure="06:21:00" type="pass"/>
<entry posID="DUE" departure="06:23:00" type="pass"/>
<entry posID="SCW" departure="06:27:00" type="pass"/>
<entry posID="NAE" departure="06:29:00" type="pass"/>
<entry posID="UST" arrival="06:34:30" type="stop"/>
</timetableentries>
</train>
</timetable>
</railml>
XSLT (save as .xsl file, rewrites the #source attribute)
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" omit-xml-declaration="yes" indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:template match="node()|#*">
<xsl:copy>
<xsl:apply-templates select="node()|#*"/>
</xsl:copy>
</xsl:template>
<xsl:template match="#source">
<xsl:attribute name="source">HelloNewWorld</xsl:attribute>
</xsl:template>
</xsl:stylesheet>
R
library(xslt)
doc <- read_xml("/path/to/Input.xml", package = "xslt")
style <- read_xml("/path/to/XLSTScript.xsl", package = "xslt")
new_xml <- xml_xslt(doc, style)
# OUTPUT TO SCREEN
cat(as.character(new_xml))
# OUTPUT TO FILE
write_xml(new_xml, "/path/to/Output.xml")
Output
<?xml version="1.0" encoding="UTF-8"?>
<railml xmlns:xsi="http://www.w3.org/2000/10/XMLSchema-instance" xsi:noNamespaceSchemaLocation="timetable.xsd">
<timetable version="1.1">
<train trainID="RX 100.2" type="planned" source="HelloNewWorld">
<timetableentries>
<entry posID="ZU" departure="06:08:00" type="begin"/>
<entry posID="ZWI" departure="06:10:30" type="pass"/>
<entry posID="ZOER" arrival="06:16:00" departure="06:17:00" minStopTime="9" type="stop"/>
<entry posID="WS" departure="06:21:00" type="pass"/>
<entry posID="DUE" departure="06:23:00" type="pass"/>
<entry posID="SCW" departure="06:27:00" type="pass"/>
<entry posID="NAE" departure="06:29:00" type="pass"/>
<entry posID="UST" arrival="06:34:30" type="stop"/>
</timetableentries>
</train>
</timetable>
</railml>
I found it hard to apply many of the answers listed here so I wonder if this set of simple Java XML XPathHelper Unities may help others. You can find the source code here. I didn't write it all myself but adapted code I found, so I can't take all the credit but it works and it is compact and hope it helps others.
String xmlPayLoad = readFileAsString(payLoadPath + "/payLoad.xml");
TreeMap<String, String> header = new TreeMap<String, String>();
XPathHelperCommon xph = new XPathHelperCommon();
header = xph.findMultipleXMLItems(xmlPayLoad, "//header/*");
header.put("type", "newProcess");
xmlPayLoad = xph.modifyMultipleXMLItems(xmlPayLoad, "//header/*", header);
The primative XML header could be something like this:
<header>
<type>process</type>
<ruleBaseVersion>0</ruleBaseVersion>
<ruleBaseCommitment>0</ruleBaseCommitment>
<sequenceId>0</sequenceId>
<priortiseSID>0</priortiseSID>
<monitorIncomingEvents>0</monitorIncomingEvents>
<activityCount>0</activityCount>
<taskElapsedTime>0</taskElapsedTime>
<processStartTime>0</processStartTime>
<processElapsedTime>0</processElapsedTime>
<eventElapsedTime>0</eventElapsedTime>
<status>0</status>
</header>

Import XML to R data frame

I am trying to import an xml file into R. It is of the format below with an event on each row followed by a number of attributes - which ones depend on the event type. This file is 0.7GB and future versions may be much bigger. I would like to create a data frame with each event on a new row and all the possible attributes in separate columns (meaning some will be empty depending on the event type). I have looked elsewhere for answers but they all seem to be dealing with XML files in a tree structure and I can't work out how to apply them to this format.
I am new to R and have no experience with XML files so please give me the "for dummies" answer with plenty of explanation. Thanks!
<?xml version="1.0" encoding="utf-8"?>
<events version="1.0">
<event time="21510.0" type="actend" person="3" link="1" actType="h" />
<event time="21510.0" type="departure" person="3" link="1" legMode="car" />
<event time="21510.0" type="PersonEntersVehicle" person="3" vehicle="3" />
<event time="21510.0" type="vehicle enters traffic" person="3" link="1" vehicle="3" networkMode="car" relativePosition="1.0" />
...
</events>
You can try something like this:
original_xml <- '<?xml version="1.0" encoding="utf-8"?>
<events version="1.0">
<event time="21510.0" type="actend" person="3" link="1" actType="h" />
<event time="21510.0" type="departure" person="3" link="1" legMode="car" />
<event time="21510.0" type="PersonEntersVehicle" person="3" vehicle="3" />
<event time="21510.0" type="vehicle enters traffic" person="3" link="1" vehicle="3" networkMode="car" relativePosition="1.0" />
</events>'
library(xml2)
data2 <- xml_children(read_xml(original_xml))
attr_names <- unique(names(unlist(xml_attrs(data2))))
xmlDataFrame <- as.data.frame(sapply(attr_names, function (attr) {
xml_attr(data2, attr = attr)
}), stringsAsFactors = FALSE)
#-- since all columns are strings, you may want to turn the numeric columns to numeric
xmlDataFrame[, c("time", "person", "link", "vehicle")] <- sapply(xmlDataFrame[, c("time", "person", "link", "vehicle")], as.numeric)
If you have additional "numeric" columns, you can add them at the end to convert the data to its proper class.

xquery: how to remove unused namespace in xml node?

I have a xml document same as:
<otx xmlns="http://iso.org/OTX/1.0.0" xmlns:i18n="http://iso.org/OTX/1.0.0/i18n" xmlns:diag="http://iso.org/OTX/1.0.0/DiagCom" xmlns:measure="http://iso.org/OTX/1.0.0/Measure" xmlns:string="http://iso.org/OTX/1.0.0/StringUtil" xmlns:dmd="http://iso.org/OTX/1.0.0/Auxiliaries/DiagMetaData" xmlns:fileXml="http://vwag.de/OTX/1.0.0/XmlFile" xmlns:log="http://iso.org/OTX/1.0.0/Logging" xmlns:file="http://vwag.de/OTX/1.0.0/File" xmlns:dataPlus="http://iso.org/OTX/1.0.0/DiagDataBrowsingPlus" xmlns:event="http://iso.org/OTX/1.0.0/Event" xmlns:quant="http://iso.org/OTX/1.0.0/Quantities" xmlns:hmi="http://iso.org/OTX/1.0.0/HMI" xmlns:math="http://iso.org/OTX/1.0.0/Math" xmlns:flash="http://iso.org/OTX/1.0.0/Flash" xmlns:data="http://iso.org/OTX/1.0.0/DiagDataBrowsing" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:dt="http://iso.org/OTX/1.0.0/DateTime" xmlns:eventPlus="http://iso.org/OTX/1.0.0/EventPlus" xmlns:corePlus="http://iso.org/OTX/1.0.0/CorePlus" xmlns:xmime="http://www.w3.org/2005/05/xmlmime" xmlns:job="http://iso.org/OTX/1.0.0/Job" id="id_4e5722f2f81a4309860c146fd3c743e5" name="NewDocument1" package="NewOtxProject1Package1" version="1.0.0.0" timestamp="2014-04-11T09:42:50.2091628+07:00">
<declarations>
<constant id="id_fdc639e20fbb42a4b6b039f4262a0001" name="GlobalConstant">
<realisation>
<dataType xsi:type="Boolean">
<init value="false"/>
</dataType>
</realisation>
</constant>
</declarations>
<metaData>
<data key="MadeWith">Created by emotive Open Test Framework - www.emotive.de</data>
<data key="OtfVersion">4.1.0.8044</data>
</metaData>
<procedures>
<procedure id="id_1a80900324c64ee883d9c12e08c8c460" name="main" visibility="PUBLIC">
<realisation>
<flow/>
</realisation>
</procedure>
</procedures>
</otx>
I want to remove unsused namespaces in the xml by xquery but i don't know any solution to solving it. the current namespace used are "http://iso.org/OTX/1.0.0 and http://www.w3.org/2001/XMLSchema-instance".
please help me!.
Unfortunately, XQuery Update provides no expression to remove existing namespaces, but you can recursively rebuild your document:
declare function local:strip-ns($n as node()) as node() {
if($n instance of element()) then (
element { node-name($n) } {
$n/#*,
$n/node()/local:strip-ns(.)
}
) else if($n instance of document-node()) then (
document { local:strip-ns($n/node()) }
) else (
$n
)
};
let $xml :=
<A xmlns="used1" xmlns:unused="unused">
<b:B xmlns:b="used2"/>
</A>
return local:strip-ns($xml)

XSLT : Cannot convert the operand to 'Result tree fragment'

I work on an xslt stylesheet, and I should receive as parameter two additional XML. I get an error when I use the node-set() method (from namespace ms, microsoft). The contents of the XML is correct. The parameters are send with classic ASP.
Here's the header and the call in xslt:
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0"
xmlns:ms="urn:schemas-microsoft-com:xslt"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
>
...
<xsl:param name="xmlPlanning"></xsl:param>
<xsl:variable name="myXml" select="ms:node-set($xmlPlanning)"></xsl:variable>
<xsl:value-of select="ms:node-set($xmlPlanning)/*"/>
Here's the stack trace of the error:
[XsltException: Impossible de convertir l'opérande en 'fragment de l'arborescence résultat'.]
System.Xml.Xsl.XsltOld.XsltFunctionImpl.ToNavigator(Object argument) +380943
System.Xml.Xsl.XsltOld.FuncNodeSet.Invoke(XsltContext xsltContext, Object[] args, XPathNavigator docContext) +33
MS.Internal.Xml.XPath.FunctionQuery.Evaluate(XPathNodeIterator nodeIterator) +292
[XPathException: Échec de la fonction 'ms:node-set()'.]
MS.Internal.Xml.XPath.FunctionQuery.Evaluate(XPathNodeIterator nodeIterator) +347
System.Xml.Xsl.XsltOld.Processor.RunQuery(ActionFrame context, Int32 key) +24
System.Xml.Xsl.XsltOld.VariableAction.Execute(Processor processor, ActionFrame frame) +200
System.Xml.Xsl.XsltOld.ActionFrame.Execute(Processor processor) +20
System.Xml.Xsl.XsltOld.Processor.Execute() +82
System.Xml.Xsl.XsltOld.Processor.Execute(TextWriter writer) +96
System.Xml.Xsl.XslTransform.Transform(XPathNavigator input, XsltArgumentList args, TextWriter output, XmlResolver resolver) +68
System.Xml.Xsl.XslTransform.Transform(IXPathNavigable input, XsltArgumentList args, TextWriter output, XmlResolver resolver) +43
System.Web.UI.WebControls.Xml.Render(HtmlTextWriter output) +132
And here's the beginning of the xml I receive in parameter :
<?xml version="1.0" encoding="UTF-8"?>
<ArrayOfGenerationPlanningDesign xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns="http://webservices.secureholiday.net/">
<GenerationPlanningDesign>
What could be my problem ?
In case the parameter you are passing is already a true nodeset (XPath navigator or XPathNodeIterator in .NET or IXMLDOMNodeList for MSXML), you don't need and must not use the ms:node-set() extension function. Simply remove the call to ms:nodeset().
In case it is a string that represents XML -- well it shouldn't! Parse this string to one of the allowable parameter types for a nodeset and only then invoke the transformation -- using the true node-set.
node-set() operates on Result Document Fragments (RDFs) only, but you give it a string, which is something entirely different (even if the string contents looks like XML).
What you must do is parse the string into XML. You can use an extension script for that. The following worked for me (tested with msxsl.exe on the command line), but if you don't want to use JScript you can use C# or any other supported language to do the same.
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:ms="urn:schemas-microsoft-com:xslt"
xmlns:script="urn:my-scripts"
exclude-result-prefixes="ms script"
>
<ms:script language="JScript" implements-prefix="script">
<![CDATA[
function stringToXml(str) {
var xml = new ActiveXObject("MSXML2.DOMDocument.4.0");
xml.async = false;
xml.loadXML(str);
return xml;
}
]]>
</ms:script>
<xsl:param name="xmlPlanning"></xsl:param>
<xsl:variable name="myXml" select="script:stringToXml(string($xmlPlanning))" />
<xsl:template match="/">
<xsl:value-of select="$myXml/*" /><!-- whatever -->
</xsl:template>
</xsl:stylesheet>
As Dimitre said you can use ms:node-set but you must use node()
<xsl:variable name="yourVariable">
<xsl:copy-of select="/foo/bar/something/node()"/>
</xsl:variable>
<xsl:value-of select="ms:node-set($yourVariable)/theOtherElement"/>

Resources