R SAX: parse an attribute in an empty XML element

I am new to R and cannot find an example of extracting a specific attribute from an empty element; most examples I found extract the text value of child nodes.
In a nutshell, how do I extract an attribute using xmlEventParse() for XML like this:
<elements>
<element attribute1="value" attribute2="value"/>
<element attribute1="value" attribute2="value"/>
</elements>
Assume I want to get attribute1 of the 2nd element.
Thanks in advance.
Update: Found the solution. It is xmlAttrs(root[[2]])[['attribute1']]
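The update above reads the attribute from a parsed tree. For the event-driven (SAX) route the question asks about, here is a minimal sketch. It assumes the XML package is installed; the sample values "first" and "second" and the variable name collected are made up for illustration. The startElement handler receives the element name and a named character vector of attributes, and it is called for empty elements too.

```r
library(XML)

# Sample input; the attribute values are placeholders for illustration.
xml_text <- '<elements>
  <element attribute1="first" attribute2="x"/>
  <element attribute1="second" attribute2="y"/>
</elements>'

# Accumulates attribute1 from each <element> as the parser streams through.
collected <- character(0)

handlers <- list(
  # Called once per start tag; attrs is NULL when the tag has no attributes.
  startElement = function(name, attrs, ...) {
    if (name == "element" && "attribute1" %in% names(attrs)) {
      collected <<- c(collected, attrs[["attribute1"]])
    }
  }
)

# asText = TRUE tells the parser the first argument is XML content, not a file name.
invisible(xmlEventParse(xml_text, handlers = handlers, asText = TRUE))

# attribute1 of the 2nd <element>
collected[2]
```

Because the handler only keeps the values it needs, this approach should stay cheap on memory for large files, which is the usual reason to prefer xmlEventParse() over building the whole tree.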

Related

Extract XML child attribute based on another child attribute

I have the following XML structure. I am trying to extract the StartDate and EndDate values of the relationship period, that is, only where rr:PeriodType is RELATIONSHIP_PERIOD.
However, the nodes for the "relationship" and "accounting" periods have exactly the same name, and I am not sure how to proceed.
<rr:RelationshipPeriods>
<rr:RelationshipPeriod>
<rr:StartDate>2018-01-01T00:00:00.000Z</rr:StartDate>
<rr:EndDate>2018-12-31T00:00:00.000Z</rr:EndDate>
<rr:PeriodType>ACCOUNTING_PERIOD</rr:PeriodType>
</rr:RelationshipPeriod>
<rr:RelationshipPeriod>
<rr:StartDate>2019-01-02T00:00:00.000Z</rr:StartDate>
<rr:PeriodType>RELATIONSHIP_PERIOD</rr:PeriodType>
</rr:RelationshipPeriod>
</rr:RelationshipPeriods>
I tried using this code:
ldply(xpathApply(xmlData, '//rr:RelationshipPeriod/rr:StartDate', getChildrenStrings), rbind)
But it doesn't work well, as it's hard to tell whether it is extracting the accounting or the relationship period.
Any help would be greatly appreciated!
For rr:StartDate, use this XPath:
//rr:RelationshipPeriod[rr:PeriodType='RELATIONSHIP_PERIOD']/rr:StartDate
But it is probably better to first find the correct rr:RelationshipPeriod using XPath:
//rr:RelationshipPeriod[rr:PeriodType='RELATIONSHIP_PERIOD']
See this answer on how to reuse the result of an XPath query.
When you then query relative to that node, don't put // in front of rr:StartDate and rr:EndDate: a leading // searches the whole document again, not just the selected period.
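A minimal sketch of that two-step approach with the XML package follows. The namespace URI http://example.com/rr is a placeholder, since the question does not show the real one, and the helper first_or_na is a name invented here. The relative paths rr:StartDate and rr:EndDate are evaluated against each selected period node.

```r
library(XML)

# NOTE: the namespace URI is a placeholder; substitute the one from the real document.
xml_text <- '<rr:RelationshipPeriods xmlns:rr="http://example.com/rr">
  <rr:RelationshipPeriod>
    <rr:StartDate>2018-01-01T00:00:00.000Z</rr:StartDate>
    <rr:EndDate>2018-12-31T00:00:00.000Z</rr:EndDate>
    <rr:PeriodType>ACCOUNTING_PERIOD</rr:PeriodType>
  </rr:RelationshipPeriod>
  <rr:RelationshipPeriod>
    <rr:StartDate>2019-01-02T00:00:00.000Z</rr:StartDate>
    <rr:PeriodType>RELATIONSHIP_PERIOD</rr:PeriodType>
  </rr:RelationshipPeriod>
</rr:RelationshipPeriods>'

doc <- xmlParse(xml_text, asText = TRUE)
ns <- c(rr = "http://example.com/rr")

# Step 1: select only the periods whose PeriodType child matches.
periods <- getNodeSet(
  doc,
  "//rr:RelationshipPeriod[rr:PeriodType='RELATIONSHIP_PERIOD']",
  namespaces = ns
)

# Step 2: read children relative to each selected node (no leading //).
# Hypothetical helper: returns NA when the child element is absent.
first_or_na <- function(node, path) {
  hits <- xpathSApply(node, path, xmlValue, namespaces = ns)
  if (length(hits) == 0) NA_character_ else hits[[1]]
}

result <- data.frame(
  StartDate = vapply(periods, first_or_na, character(1), path = "rr:StartDate"),
  EndDate   = vapply(periods, first_or_na, character(1), path = "rr:EndDate"),
  stringsAsFactors = FALSE
)
result
```

Returning NA for a missing child keeps the rows aligned, which matters here because the relationship period in the sample has no rr:EndDate.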

Not Able to fetch element using Get Element in Robot Framework

I have the XML snippet below and I am unable to fetch the element using Get Element.
<configuration commit-localtime="2020-06-27 12:48:13 IST" commit-seconds="1593242293" commit-user="root">
<groups>
<name>group1</name>
<interfaces>
<interface>
<name>&lt;*&gt;</name>
<unit>
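Is xpath=configuration/groups/name incorrect?
I have also tried xpath=name, but that does not work either.
I get the error: No element matching 'configuration/groups/name' found
Your XPath is incorrect. Parse XML returns the root element (here configuration), and the xpath is evaluated relative to it, so the path must not start with the root element's own name. I have created a small test case as an example; in it you can see an xpath expression that returns what you want.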
Fetch element in XML document
    ${root} =    Parse XML    ${XML}
    Log    ${root}
    ${first} =    Get Element    ${root}    xpath=.//groups/name
    Should Be Equal    ${first.text}    group1

getNodeSet {XML} not working when XML root node contains "xmlns" attribute

I have an XML document like this:
<MasterDataSet xmlns="http://tempuri.org/MasterDataSet.xsd">
<t_attribute>
<class_id>2</class_id>
<description>Latitude</description>
</t_attribute>
<t_object>
<name>Ship</name>
</t_object>
...
</MasterDataSet>
It has many "t_attribute" and "t_object" nodes. I want to get a node set of all the "t_object" nodes, so I use getNodeSet with XPath:
library("XML")
emtree0 <- xmlParse("EM0.xml", useInternalNodes = TRUE)
onlyobjects <- getNodeSet(emtree0,"/MasterDataSet//t_object")
But this returns an empty list.
However if I modify the XML file to look like this, i.e. if I remove the xmlns attribute, it works perfectly:
<MasterDataSet>
<t_attribute>
...
Any suggestions to make the code work without having to remove the xmlns attribute?
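One common way to handle this, sketched here under the assumption that the XML package is used as in the question: the xmlns attribute puts every element in a default namespace, and XPath 1.0 only matches namespaced elements through a prefix. The prefix d below is an arbitrary name chosen for this example and bound to the document's namespace URI through the namespaces argument.

```r
library(XML)

# Inline copy of the sample so the sketch is self-contained;
# with a file, use xmlParse("EM0.xml") instead.
xml_text <- '<MasterDataSet xmlns="http://tempuri.org/MasterDataSet.xsd">
  <t_attribute>
    <class_id>2</class_id>
    <description>Latitude</description>
  </t_attribute>
  <t_object>
    <name>Ship</name>
  </t_object>
</MasterDataSet>'

emtree0 <- xmlParse(xml_text, asText = TRUE)

# Bind an arbitrary prefix ("d") to the default namespace URI.
ns <- c(d = "http://tempuri.org/MasterDataSet.xsd")

# Every element step in the path needs the prefix.
onlyobjects <- getNodeSet(emtree0, "/d:MasterDataSet//d:t_object", namespaces = ns)
length(onlyobjects)

# The same prefix works for pulling values out directly.
object_names <- xpathSApply(emtree0, "//d:t_object/d:name", xmlValue, namespaces = ns)
object_names
```

The XML file itself stays untouched; only the XPath expressions change.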

XQuery Create where clause based on xml structure as a kind of dynamic where clause

This is about XQuery; I am using MarkLogic as the database.
I have data as in the following example:
<instrument name="myTest1" id="test1">
<daten>
<daily>
<day date="2016-02-05">
<screener>
<column name="i1">
<value>1</value>
<bg>red</bg>
</column>
<column name="i2">
<value>1</value>
<fg>lime</fg>
</column>
<column name="i4">
<fg>black</fg>
</column>
</screener>
</day>
</daily>
</daten>
</instrument>
I have many instruments, and each one has an entry for each day in the daily element. Inside screener there can be many columns, all with different names. Some screeners include more columns than others. Each column can include a value element, a bg element and an fg element.
I want to search for instruments that fulfill specific criteria about which columns have children with specific values. Example: I want a sequence of all instruments that, for a given day, have a value of 1 for column i1 and an fg of black for column i2.
Since I have many different conditions of this kind, I would rather not hardcode them in XQuery where clauses. I did that for a few and it works, but the code gets a lot of duplication and is hard to maintain.
My question is: is it possible to build a where clause in a FLWOR statement programmatically, meaning based on another XML structure, which could look like this:
<searchpatterns>
<pattern name="test1">
<c>
<name>i1</name>
<element>value</element>
<value>1</value>
</c>
<c>
<name>i2</name>
<element>fg</element>
<value>red</value>
<modifier>not</modifier>
</c>
</pattern>
</searchpatterns>
This would find those instruments where the screener has a column i1 with a value of 1, and which must not have a column i2 with an fg of red.
When I do it the normal way, I query my data like this:
for $res in doc()/instrument
where $res/daten/daily/day[@date="2016-02-05"]/screener/column[@name="i1"]/value/text()="1"
and $res/daten/daily/day[@date="2016-02-05"]/screener/column[@name="i2"]/fg/text()!="red"
return $res
This kind of where clause I want to generate based on an XML structure.
I did some research on the MarkLogic built-in cts:search function and a lot of related material, but it seems to be meant for something else (more user-interactive searching).
If you have a hint to point me in the right direction, or can tell me whether what I want is even possible, I would very much appreciate it. Thanks!
The doc()/instrument XPath asks for every document with an instrument element and then filters those documents.
Where possible, it's usually better in MarkLogic to model the documents so you can use the indexes to retrieve as few documents as possible. It's also usually better to use cts:search() instead of XPath to generate the sequence so you are working directly with the indexes.
In this case, you might consider using the values of the name attribute as elements instead of the generic "column." You could then generate a cts:element-query that matches the name containing a cts:element-value-query that matches the value within the name.
Hoping that helps,
Yes, this can be achieved programmatically. If you want to check whether an element satisfies a test for every item in a sequence, the every ... satisfies construct comes to mind. So in this case it could be:
for $res in doc()/instrument
where every $pattern in $searchpatterns/pattern/c satisfies (
  let $equal := $res/daten/daily/day[@date="2016-02-05"]/screener/column[@name = $pattern/name]/*[name() = $pattern/element] = $pattern/value
  return if ($pattern/modifier = "not") then not($equal) else $equal
)
return $res
So every $pattern will be checked. I assume the modifier element is supposed to modify the equality test, so we first check whether the element satisfies the equality condition and then check whether the modifier element is equal to not. The same idea could of course be used to implement other modifiers as well.

how to use the BizTalk Flat File Mapping Wizard for nested repeating items?

I have a flat file with some repeating sections in it, and I'm confused how to create the schema via the BT flat file mapping wizard. The file looks like this:
001,bunch of data
002,bunch of data
006,bunch of data
006A,bunch of data
006B,bunch of data
006B,bunch of data
006,bunch of data
006A,bunch of data
006B,bunch of data
As you can see, the 006* records can repeat. I'm going to want to wind up with XML that looks like this:
<001Stuff>...</001Stuff>
<002Stuff>...</002Stuff>
<006Loop>
<006Stuff>...</006Stuff>
<006AStuff>...</006AStuff>
<006BStuff>...</006BStuff>
<006BStuff>...</006BStuff>
</006Loop>
<006Loop>
<006Stuff>...</006Stuff>
<006AStuff>...</006AStuff>
<006BStuff>...</006BStuff>
</006Loop>
Obviously I can't just set the first group of 006* records to "Repeating record" and Ignore the second set. I'm used to dealing with single repeating rows via the wizard (i.e. another 006 row right after the first one) and not nested things like this - any suggestions on how to proceed? Thanks!
Working with the Flat File Schema Wizard is quite hard and there is only so much it can help you with. I always seem to have to tweak its output a little bit.
To make things a little easier, I suggest restricting your sample document to a single occurrence of the whole <006> structure. That way you will not have to set many lines to Ignored in the Flat File Schema Wizard:
001,bunch of data
002,bunch of data
006,bunch of data
006A,bunch of data
006B,bunch of data
006B,bunch of data
Next, each repeating structure should be wrapped inside a corresponding Repeating Record in the definition of your Xml Schema.
Please note that you can always run the Flat File Schema Wizard recursively on nested structures to get more fine-grained control. So I would suggest first running the wizard with an all-encompassing repeating <006> structure.
Then you can right-click on the structure and provide a more detailed definition of the nested child structures, highlighting only a subset of the sample contents.
Then, the most important part: you need to set the Child Order property to Conditional Default for both repeating structures, because there is only one empty line at the end of your document file and the Wizard cannot help you out with this situation.
For reference, your resulting structure should have the following settings:
BunchOfStuff (Root) : Delimited, 0x0D 0x0A, Suffix.
  _001Stuff : Delimited, ",", Prefix, Tag Identifier 001.
  _002Stuff : Delimited, ",", Prefix, Tag Identifier 002.
  _006Loop : Delimited, 0x0D 0x0A, Conditional Default.
    _006Stuff : Delimited, ",", Prefix, Tag Identifier 006.
    _006AStuff : Delimited, ",", Prefix, Tag Identifier 006A.
    _006BLoop : Delimited, 0x0D 0x0A, Conditional Default.
      _006BStuff : Delimited, ",", Prefix, Tag Identifier 006B.
Hope this helps.
Treat everything from the start of the first 006, record to the start of the second 006, record as one record. When you define the 006 record, set it up as a repeating record as well. This should create a node for each 006, group and nodes for each 006 record under it.
That is what I would try.
Here is my output after 2 minutes of work. Except for the node/element names, I think it is what you want. You would still have to create separate elements for each of the fields in your data.
<_x0030_01 xmlns="">001,bunch of data
<_x0030_02 xmlns="">002,bunch of data
<_x0030_06 xmlns="">
<_x0030_06_Child1>bunch of data
<_x0030_06_Child2>
<_x0030_06_Child2_Child1>A,bunch of data
<_x0030_06_Child2>
<_x0030_06_Child2_Child1>B,bunch of data
<_x0030_06_Child2>
<_x0030_06_Child2_Child1>B,bunch of data
<_x0030_06 xmlns="">
<_x0030_06_Child1>bunch of data
<_x0030_06_Child2>
<_x0030_06_Child2_Child1>A,bunch of data
<_x0030_06_Child2>
<_x0030_06_Child2_Child1>B,bunch of data
