Convert an XML file to R Data Frame and extract an attribute - r

I am trying to convert the XML file below to a data frame, with the date-time also as a column of the data frame, but I am unable to extract the date-time attribute. Any ideas on how to do this in R? A demonstration with sample code would be helpful.
<?xml version="1.0" encoding="UTF-8"?>
<Group snapshotTime="2018-05-30T19:33:44.352Z">
<Links>
<rel>self</rel>
<href>https:cloud.com/Group/1</href>
</Links>
<Links>
<rel>last</rel>
<href>https:cloud.com/Group/6</href>
</Links>
<Links>
<rel>next</rel>
<href>https:cloud.com/Group/2</href>
</Links>
<Equipment>
<EquipmentHeader>
<Name>CASE IH</Name>
<Model>1100</Model>
<EquipmentID> Desk</EquipmentID>
<SerialNumber>1231</SerialNumber>
<PIN>123</PIN>
</EquipmentHeader>
<Location datetime="2012-06-25T11:14:54.000Z">
<Latitude>12.573722</Latitude>
<Longitude>-45.515805</Longitude>
</Location>
<Ophrs datetime="2012-03-01T17:42:37.000Z">
<Hour>1968.80</Hour>
</Ophrs>
</Equipment>
<Equipment>
<EquipmentHeader>
<Name>CALL</Name>
<Model>L2048</Model>
<EquipmentID>1MM772GP4</EquipmentID>
<SerialNumber>1TT772GPVJF688214</SerialNumber>
<PIN>1TT772G4</PIN>
</EquipmentHeader>
<Location datetime="2018-05-30T19:22:46.000Z">
<Latitude>15.518556</Latitude>
<Longitude>-55.422444</Longitude>
</Location>
<CumulativeIdleHours datetime="2018-05-30T19:02:46.000Z">
<Hour>14.74</Hour>
</CumulativeIdleHours>
<Ophrs datetime="2018-05-30T19:22:48.000Z">
<Hour>52.35</Hour>
</Ophrs>
<Distance datetime="2018-05-30T19:02:46.000Z">
<OdometerUnits>kilometre</OdometerUnits>
<Odometer>130.9</Odometer>
</Distance>
<FuelUsed datetime="2018-05-30T19:02:46.000Z">
<FuelUnits>litre</FuelUnits>
<FuelConsumed>395</FuelConsumed>
</FuelUsed>
</Equipment>
</Group>

Here's how you would get the datetime attribute, for example, for the Location nodes:
library("xml2")
library("tidyverse")
temp <- '<?xml version="1.0" encoding="UTF-8"?>
<Group snapshotTime="2018-05-30T19:33:44.352Z">
<Links>
<rel>self</rel>
<href>https:cloud.com/Group/1</href>
</Links>
<Links>
<rel>last</rel>
<href>https:cloud.com/Group/6</href>
</Links>
<Links>
<rel>next</rel>
<href>https:cloud.com/Group/2</href>
</Links>
<Equipment>
<EquipmentHeader>
<Name>CASE IH</Name>
<Model>1100</Model>
<EquipmentID> Desk</EquipmentID>
<SerialNumber>1231</SerialNumber>
<PIN>123</PIN>
</EquipmentHeader>
<Location datetime="2012-06-25T11:14:54.000Z">
<Latitude>12.573722</Latitude>
<Longitude>-45.515805</Longitude>
</Location>
<Ophrs datetime="2012-03-01T17:42:37.000Z">
<Hour>1968.80</Hour>
</Ophrs>
</Equipment>
<Equipment>
<EquipmentHeader>
<Name>CALL</Name>
<Model>L2048</Model>
<EquipmentID>1MM772GP4</EquipmentID>
<SerialNumber>1TT772GPVJF688214</SerialNumber>
<PIN>1TT772G4</PIN>
</EquipmentHeader>
<Location datetime="2018-05-30T19:22:46.000Z">
<Latitude>15.518556</Latitude>
<Longitude>-55.422444</Longitude>
</Location>
<CumulativeIdleHours datetime="2018-05-30T19:02:46.000Z">
<Hour>14.74</Hour>
</CumulativeIdleHours>
<Ophrs datetime="2018-05-30T19:22:48.000Z">
<Hour>52.35</Hour>
</Ophrs>
<Distance datetime="2018-05-30T19:02:46.000Z">
<OdometerUnits>kilometre</OdometerUnits>
<Odometer>130.9</Odometer>
</Distance>
<FuelUsed datetime="2018-05-30T19:02:46.000Z">
<FuelUnits>litre</FuelUnits>
<FuelConsumed>395</FuelConsumed>
</FuelUsed>
</Equipment>
</Group>'
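# xml_find_all() takes an XPath expression; it replaces rvest::xml_nodes(), which is deprecated in recent rvest releases
temp %>% xml2::read_xml() %>% xml2::xml_find_all(".//Location") %>% xml2::xml_attr("datetime")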
Hope this helps.
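To go one step further and get an actual data frame with the date-time as a column, something along these lines may work. It is only a sketch under assumptions: the XML is trimmed down from the sample above, the column names (name, model, latitude, longitude, location_datetime) and the objects equipment_df and readings_df are made up for illustration, and every Equipment node is assumed to have at most one Location child.

```r
library(xml2)

# Trimmed copy of the sample document from the question
equipment_xml <- '<Group snapshotTime="2018-05-30T19:33:44.352Z">
<Equipment>
<EquipmentHeader><Name>CASE IH</Name><Model>1100</Model></EquipmentHeader>
<Location datetime="2012-06-25T11:14:54.000Z">
<Latitude>12.573722</Latitude><Longitude>-45.515805</Longitude>
</Location>
<Ophrs datetime="2012-03-01T17:42:37.000Z"><Hour>1968.80</Hour></Ophrs>
</Equipment>
<Equipment>
<EquipmentHeader><Name>CALL</Name><Model>L2048</Model></EquipmentHeader>
<Location datetime="2018-05-30T19:22:46.000Z">
<Latitude>15.518556</Latitude><Longitude>-55.422444</Longitude>
</Location>
</Equipment>
</Group>'

doc <- read_xml(equipment_xml)
equipment <- xml_find_all(doc, ".//Equipment")

# xml_find_first() returns one result per Equipment node (NA when a child is
# missing), so every column has the same length.
equipment_df <- data.frame(
  name = xml_text(xml_find_first(equipment, "./EquipmentHeader/Name")),
  model = xml_text(xml_find_first(equipment, "./EquipmentHeader/Model")),
  latitude = as.numeric(xml_text(xml_find_first(equipment, "./Location/Latitude"))),
  longitude = as.numeric(xml_text(xml_find_first(equipment, "./Location/Longitude"))),
  location_datetime = xml_attr(xml_find_first(equipment, "./Location"), "datetime"),
  stringsAsFactors = FALSE
)

# Parse the ISO 8601 strings into POSIXct (UTC)
equipment_df$location_datetime <- as.POSIXct(
  equipment_df$location_datetime,
  format = "%Y-%m-%dT%H:%M:%OSZ", tz = "UTC"
)

# Long format: one row per child element that carries a datetime attribute
readings <- xml_find_all(doc, ".//Equipment/*[@datetime]")
readings_df <- data.frame(
  equipment = xml_text(xml_find_first(readings, "../EquipmentHeader/Name")),
  element = xml_name(readings),
  datetime = xml_attr(readings, "datetime"),
  stringsAsFactors = FALSE
)
```

The wide table (equipment_df) suits the case where only the Location time matters; the long table (readings_df) keeps the datetime of every element, such as Ophrs or FuelUsed, without needing a column per element.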

Related

WSO2 : Transforming response xml

I would like to turn this xml response into something more easily readable.
<?xml version="1.0" encoding="ISO-8859-1"?><soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
<SOAP-ENV:Header xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/"/>
<soap:Body>
<executeResponse xmlns="urn:GCE">
<BusinessViewServiceexecuteOut xmlns="http://www.generix.fr/technicalframework/businesscomponent/applicationmodule/common" xmlns:ns2="http://www.generixgroup.com/processus/configuration/scheduler" xmlns:ns3="http://www.generix.fr/technicalframework/business/service/common">
<xmlpres><?xml version = '1.0' encoding = 'UTF-8'?> <VueTable type="View" name="Table" habctr="true" total_business_row="2" nbline="400" confNbline="400" numpage="1" nbpage="1">
<JTblView name="JTblView" type="ViewObject" maxfetchsize="999" maxfetchsizeexceeded="false">
<JTblViewRow current="true" type="ViewRow" index="1" business_row_index="1">
<Cletbl precision="6" type="VARCHAR" pk="true">
<business_data>N</business_data>
</Cletbl>
<Codtbl precision="6" type="VARCHAR" pk="true">
<business_data>001</business_data>
</Codtbl>
<Lib1 precision="30" type="VARCHAR">
<business_data>Non</business_data>
</Lib1>
<Lib2 precision="30" type="VARCHAR">
<business_data/>
</Lib2>
<Lir precision="10" type="VARCHAR">
<business_data>Non</business_data>
</Lir>
</JTblViewRow>
<JTblViewRow type="ViewRow" index="2" business_row_index="2">
<Cletbl precision="6" type="VARCHAR" pk="true">
<business_data>O</business_data>
</Cletbl>
<Codtbl precision="6" type="VARCHAR" pk="true">
<business_data>001</business_data>
</Codtbl>
<Lib1 precision="30" type="VARCHAR">
<business_data>Oui</business_data>
</Lib1>
<Lib2 precision="30" type="VARCHAR">
<business_data/>
</Lib2>
<Lir precision="10" type="VARCHAR">
<business_data>Oui</business_data>
</Lir>
</JTblViewRow>
</JTblView>
</VueTable></xmlpres>
</BusinessViewServiceexecuteOut>
</executeResponse>
</soap:Body></soap:Envelope>
At the very least, I would like to extract the content of the "xmlpres" element and reshape it into something like:
<table><row><code></code><libelle></libelle></row></table>
I would then turn that into a JSON response, but I can't see how. I only get the entire output, either as XML or as a JSON stream, with everything in it, which is not usable.
Create an out-mediation sequence with the following content, attach it to the respective API, and try out the scenario. The sequence extracts the xmlpres content and sends it as the response to the client:
<?xml version="1.0" encoding="UTF-8"?>
<sequence xmlns="http://ws.apache.org/ns/synapse" name="out-sequence">
<!-- extract the xmlpres content and store as OM element -->
<property name="XMLBody"
expression="$body//soap:Body//generic:xmlpres"
xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/"
xmlns:gce="urn:GCE"
xmlns:generic="http://www.generix.fr/technicalframework/businesscomponent/applicationmodule/common" type="OM" />
<!-- pass the extracted property as response body -->
<enrich>
<source type="property" property="XMLBody" />
<target type="body" />
</enrich>
</sequence>
Hope this helps you to extract and send the response accordingly.

How to count elements with two "restrictions"?

My XML looks something like this :
<books>
<book id="b1">
<title>Set theory and the continuum problem</title>
<category>Mathematics</category>
<location>
<area>hall1</area>
<case>1</case>
<shelf>2</shelf>
</location>
<description>A lucid, elegant, and complete survey of set theory.</description>
<history>
<borrowed by="m4"/>
<borrowed by="m2" until="2018-04-05"/>
</history>
</book>
<book id="b2">
<title>Computational Complexity</title>
<isbn>978-0201-530-827</isbn>
<category>Computer Science</category>
<location>
<area>hall1</area>
<case>3</case>
<shelf>3</shelf>
</location>
<description>.</description>
</book>
<book id="b3">
<title>To mock a mockingbird</title>
<isbn>1-292-09761-2</isbn>
<category>Logic</category>
<category>Mathematics</category>
<location>
<area>hall1</area>
<case>1</case>
<shelf>3</shelf>
</location>
<description>.</description>
</book>
</books>
Is it possible to count how many books are there with elements area='hall1' and case='1'?
I tried this:
count(//books/book[location/area='hall1'])
but I do not know how to also include the case='1' restriction.
This should work:
count(//books/book[location/area='hall1'][location/case='1'])
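** ADDITIONAL QUESTION **
Is it possible to list the titles of all books that have area='hall1' and case='1'?
Yes. In XQuery, select the title relative to each matching book ($bb) and wrap the expression in braces so that it is evaluated rather than emitted as literal text:
for $bb in //books/book[location/area='hall1'][location/case='1']
let $n := $bb/title
return <book>{data($n)}</book>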
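For readers following along in R, the same two XPath queries can be tried with the xml2 package. This is only a sketch: the XML is a shortened copy of the sample above, and the object names (books_xml, both, n_books, titles) are made up for illustration. The two chained predicates are written here as a single predicate joined with "and", which selects the same books.

```r
library(xml2)

# Shortened copy of the sample document from the question
books_xml <- '<books>
<book id="b1">
<title>Set theory and the continuum problem</title>
<location><area>hall1</area><case>1</case><shelf>2</shelf></location>
</book>
<book id="b2">
<title>Computational Complexity</title>
<location><area>hall1</area><case>3</case><shelf>3</shelf></location>
</book>
<book id="b3">
<title>To mock a mockingbird</title>
<location><area>hall1</area><case>1</case><shelf>3</shelf></location>
</book>
</books>'

doc <- read_xml(books_xml)

# Books that satisfy both restrictions
both <- "//books/book[location/area='hall1' and location/case='1']"

# count() yields a number, so use xml_find_num() rather than xml_find_all()
n_books <- xml_find_num(doc, paste0("count(", both, ")"))

# Titles of the matching books
titles <- xml_text(xml_find_all(doc, paste0(both, "/title")))

print(n_books)
print(titles)
```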

Import XML to R data frame

I am trying to import an xml file into R. It is of the format below with an event on each row followed by a number of attributes - which ones depend on the event type. This file is 0.7GB and future versions may be much bigger. I would like to create a data frame with each event on a new row and all the possible attributes in separate columns (meaning some will be empty depending on the event type). I have looked elsewhere for answers but they all seem to be dealing with XML files in a tree structure and I can't work out how to apply them to this format.
I am new to R and have no experience with XML files so please give me the "for dummies" answer with plenty of explanation. Thanks!
<?xml version="1.0" encoding="utf-8"?>
<events version="1.0">
<event time="21510.0" type="actend" person="3" link="1" actType="h" />
<event time="21510.0" type="departure" person="3" link="1" legMode="car" />
<event time="21510.0" type="PersonEntersVehicle" person="3" vehicle="3" />
<event time="21510.0" type="vehicle enters traffic" person="3" link="1" vehicle="3" networkMode="car" relativePosition="1.0" />
...
</events>
You can try something like this:
original_xml <- '<?xml version="1.0" encoding="utf-8"?>
<events version="1.0">
<event time="21510.0" type="actend" person="3" link="1" actType="h" />
<event time="21510.0" type="departure" person="3" link="1" legMode="car" />
<event time="21510.0" type="PersonEntersVehicle" person="3" vehicle="3" />
<event time="21510.0" type="vehicle enters traffic" person="3" link="1" vehicle="3" networkMode="car" relativePosition="1.0" />
</events>'
library(xml2)
data2 <- xml_children(read_xml(original_xml))
attr_names <- unique(names(unlist(xml_attrs(data2))))
xmlDataFrame <- as.data.frame(sapply(attr_names, function (attr) {
xml_attr(data2, attr = attr)
}), stringsAsFactors = FALSE)
#-- since all columns are strings, you may want to turn the numeric columns to numeric
xmlDataFrame[, c("time", "person", "link", "vehicle")] <- sapply(xmlDataFrame[, c("time", "person", "link", "vehicle")], as.numeric)
If you have additional "numeric" columns, you can add them at the end to convert the data to its proper class.
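If you would rather not list the numeric columns by hand, a variant like the one below may be worth trying. It is a sketch, not a drop-in solution: it uses the same four sample events, the object names (events_xml, events, cols, events_df) are made up for illustration, and it relies on base R's type.convert() to guess each column's type. Note that read_xml() loads the whole document into memory, so a 0.7 GB file may need a lot of RAM with either approach.

```r
library(xml2)

events_xml <- '<?xml version="1.0" encoding="utf-8"?>
<events version="1.0">
<event time="21510.0" type="actend" person="3" link="1" actType="h" />
<event time="21510.0" type="departure" person="3" link="1" legMode="car" />
<event time="21510.0" type="PersonEntersVehicle" person="3" vehicle="3" />
<event time="21510.0" type="vehicle enters traffic" person="3" link="1" vehicle="3" networkMode="car" relativePosition="1.0" />
</events>'

events <- xml_find_all(read_xml(events_xml), "//event")

# Every attribute name that occurs on at least one event
attr_names <- unique(unlist(lapply(xml_attrs(events), names)))

# One character vector per attribute; events lacking the attribute get NA
cols <- lapply(attr_names, function(a) xml_attr(events, a))
names(cols) <- attr_names

events_df <- as.data.frame(cols, stringsAsFactors = FALSE)

# Let R pick numeric/integer/character per column instead of naming columns
events_df <- type.convert(events_df, as.is = TRUE)

str(events_df)
```

The trade-off is that automatic conversion can guess wrong for columns that only look numeric (IDs with leading zeros, for example), so the explicit conversion shown above remains the safer choice when the column types are known.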

Dynamically populate version from version.sbt

I'm trying to take the version from version.sbt and populate it into the applicationVersion field of the log appender in logback.xml.
version.sbt
version in ThisBuild := "0.4.63"
logback.xml
<configuration debug="true" scan="true" scanPeriod="60 seconds">
<appender name="ADP-MESSAGING" class="com.agoda.adp.messaging.logging.appenders.LogbackAppender">
<applicationName>MyApp</applicationName>
<applicationAssemblyName>myapp</applicationAssemblyName>
<applicationVersion>0.4.61</applicationVersion>
<!-- <applicationVersion>${application.version}</applicationVersion> -->
<apiKey>s234W##$WFW$#$#</apiKey>
<getCallerData>false</getCallerData>
</appender>
....
<root level="WARN">
<appender-ref ref="ADP-MESSAGING" />
<appender-ref ref="STDOUT" />
</root>
</configuration>
I tried adding ${application.version} and ${version}, but without success.
How can I do this?
Please share your thoughts.
Thanks
The values interpolated in a logback.xml file can come from Java system properties. All you have to do is pass the value you want on the Java command line:
// NOTE: javaOptions only applies to a forked JVM, so forking must be enabled.
// This only works when running through sbt. You'll have to pass the same
// -D flag in your startup scripts when running elsewhere.
fork := true
javaOptions += "-Dapplication.version=" + version.value
With this flag, you should be able to interpolate the version in your XML file:
<applicationVersion>${application.version}</applicationVersion>
Alternatively, you can add a logback PropertyDefiner implementation:
package org.mypackage
import ch.qos.logback.core.PropertyDefinerBase
class VersionPropertyDefiner extends PropertyDefinerBase {
override def getPropertyValue: String = BuildInfo.version
}
BuildInfo.version is autogenerated (managed) Scala code, which you get if you enable BuildInfoPlugin (sbt-buildinfo) in your project build settings.
Then you can define and use the variable in your logback.xml configuration:
<configuration debug="true" scan="true" scanPeriod="60 seconds">
<define name="appVersion" class="org.mypackage.VersionPropertyDefiner"/>
<appender name="ADP-MESSAGING" class="com.agoda.adp.messaging.logging.appenders.LogbackAppender">
<applicationName>MyApp</applicationName>
<applicationAssemblyName>myapp</applicationAssemblyName>
<applicationVersion>${appVersion}</applicationVersion>
<apiKey>s234W##$WFW$#$#</apiKey>
<getCallerData>false</getCallerData>
</appender>
....
<root level="WARN">
<appender-ref ref="ADP-MESSAGING" />
<appender-ref ref="STDOUT" />
</root>
</configuration>

Add children to existing node using R XML

I have the following XML file test.graphml that I am trying to manipulate using the XML package in R.
<?xml version="1.0" encoding="UTF-8"?>
<graphml xmlns="http://graphml.graphdrawing.org/xmlns"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://graphml.graphdrawing.org/xmlns
http://graphml.graphdrawing.org/xmlns/1.0/graphml.xsd">
<graph id="G" edgedefault="directed">
<node id="n0"/>
<node id="n1"/>
<node id="n2"/>
<node id="n3"/>
<node id="n4"/>
<edge source="n0" target="n1"/>
<edge source="n0" target="n2"/>
<edge source="n2" target="n3"/>
<edge source="n1" target="n3"/>
<edge source="n3" target="n4"/>
</graph>
</graphml>
I would like to nest nodes n0, n1, n2, and n3 into a new graph node as shown below.
<?xml version="1.0" encoding="UTF-8"?>
<graphml xmlns="http://graphml.graphdrawing.org/xmlns"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://graphml.graphdrawing.org/xmlns
http://graphml.graphdrawing.org/xmlns/1.0/graphml.xsd">
<graph id="G" edgedefault="directed">
<graph id="g1">
<node id="n0"/>
<node id="n1"/>
<node id="n2"/>
<node id="n3"/>
</graph>
<node id="n4"/>
<edge source="n0" target="n1"/>
<edge source="n0" target="n2"/>
<edge source="n2" target="n3"/>
<edge source="n1" target="n3"/>
<edge source="n3" target="n4"/>
</graph>
</graphml>
The code I have written has unknowns and errors that I am unable to resolve due to my lack of experience with XML processing. I would greatly appreciate some pointers that will help me proceed.
library(XML)
# Read file
x <- xmlParse("test.graphml")
ns <- c(graphml ="http://graphml.graphdrawing.org/xmlns")
# Create new graph node
ng <- xmlNode("graph", attrs = c("id" = "g1"))
# Add n0-n3 as children of new graph node
n0_n1_n2_n3 <- getNodeSet(x,"//graphml:node[@id = 'n0' or @id='n1' or @id='n2' or @id='n3']", namespaces = ns)
ng <- append.xmlNode(ng, n0_n1_n2_n3)
# Get only graph node
g <- getNodeSet(x,"//graphml:graph", namespaces = ns)
# Remove nodes n0-n3 from the only graph node
# How do I do this?
# This did not work: removeNodes(g, n0_n1_n2_n3)
# Add new graph node as child of only graph node
g <- append.xmlNode(g, ng)
#! Error message:
Error in UseMethod("append") :
no applicable method for 'append' applied to an object of class "XMLNodeSet"
Consider XSLT, the special-purpose language to transform XML files. Since you require modification of the XML (adding parent node in a select group of children) and have to navigate through an undeclared namespace prefix (xmlns="http://graphml.graphdrawing.org/xmlns"), XSLT is an optimal solution.
However, to date R does not have a fully compliant XSL module to run XSLT 1.0 scripts, unlike other general-purpose languages (Java, PHP, Python). Nonetheless, R can use system() to call external programs (including the aforementioned languages), dedicated XSLT processors (Xalan, Saxon), or command-line interpreters such as PowerShell and the terminal's xsltproc. The latter approaches are shown below.
XSLT (save as .xsl, to be referenced in R script)
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:doc="http://graphml.graphdrawing.org/xmlns"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://graphml.graphdrawing.org/xmlns http://graphml.graphdrawing.org/xmlns/1.0/graphml.xsd">
<xsl:output method="xml" omit-xml-declaration="no" indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:template match="doc:graphml">
<xsl:copy>
<xsl:copy-of select="document('')/*/@xsi:schemaLocation"/>
<xsl:apply-templates select="doc:graph"/>
</xsl:copy>
</xsl:template>
<xsl:template match="doc:graph">
<xsl:element name="{local-name()}" namespace="http://graphml.graphdrawing.org/xmlns">
<xsl:apply-templates select="@*"/>
<xsl:element name="graph" namespace="http://graphml.graphdrawing.org/xmlns">
<xsl:attribute name="id">g1</xsl:attribute>
<xsl:apply-templates select="doc:node[position() &lt; 5]"/>
</xsl:element>
<xsl:apply-templates select="doc:node[@id='n4']|doc:edge"/>
</xsl:element>
</xsl:template>
<xsl:template match="doc:graph/@*">
<xsl:attribute name="{local-name()}"><xsl:value-of select="."/></xsl:attribute>
</xsl:template>
<xsl:template match="doc:node|doc:edge">
<xsl:element name="{local-name()}" namespace="http://graphml.graphdrawing.org/xmlns">
<xsl:attribute name="{local-name(@*)}"><xsl:value-of select="@*"/></xsl:attribute>
</xsl:element>
</xsl:template>
</xsl:stylesheet>
PowerShell script (for Windows PC users, save as XMLTransform.ps1)
param ($xml, $xsl, $output)
if (-not $xml -or -not $xsl -or -not $output) {
Write-Host "& .\XMLTransform.ps1 [-xml] xml-input [-xsl] xsl-input [-output] transform-output"
exit;
}
trap [Exception]{
Write-Host $_.Exception;
}
$xslt = New-Object System.Xml.Xsl.XslCompiledTransform;
$xslt.Load($xsl);
$xslt.Transform($xml, $output);
Write-Host "generated" $output;
R Script (calling command line operations)
library(XML)
# WINDOWS USERS
ps <- '"C:\\Path\\To\\XMLTransform.ps1"' # POWER SHELL SCRIPT
input <- '"C:\\Path\\To\\Input.xml"' # XML SOURCE
xsl <- '"C:\\Path\\To\\XSLTScript.xsl"' # XSLT SCRIPT
output <- '"C:\\Path\\To\\Output.xml"' # BLANK, EMPTY FILE PATH TO BE CREATED
system(paste('Powershell.exe -executionpolicy remotesigned -File',
ps, input, xsl, output)) # NOTE SECURITY BYPASS ARGS
doc <- xmlParse("C:\\Path\\To\\Output.xml")
# UNIX (MAC/LINUX) USERS
system("xsltproc /path/to/XSLTScript.xsl /path/to/input.xml -o /path/to/output.xml")
doc <- xmlParse("/path/to/output.xml")
print(doc)
# <?xml version="1.0" encoding="utf-8"?>
# <graphml xmlns="http://graphml.graphdrawing.org/xmlns" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://graphml.graphdrawing.org/xmlns http://graphml.graphdrawing.org/xmlns/1.0/graphml.xsd">
# <graph id="G" edgedefault="directed">
# <graph id="g1">
# <node id="n0"/>
# <node id="n1"/>
# <node id="n2"/>
# <node id="n3"/>
# </graph>
# <node id="n4"/>
# <edge source="n0"/>
# <edge source="n0"/>
# <edge source="n2"/>
# <edge source="n1"/>
# <edge source="n3"/>
# </graph>
# </graphml>
