Parsing an input file with DFDL page by page

I have a very simple row-based document in which each row is a record. I want a DFDL schema that parses it into chunks, each containing a fixed number of rows (for example, 3 records per chunk).
Original File:
record1
record2
record3
record4
record5
record6
record7
After DFDL Parse:
1) [record1, record2, record3]
2) [record4, record5, record6]
3) [record7]
I am currently able to get all the records at once with the following DFDL schema, but this becomes a serious problem as the document grows larger, which is why I want to get the records page by page. Is this possible? Does anyone have an idea how it can be done?
<?xml version="1.0" encoding="UTF-8"?><xsd:schema xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/" xmlns:ibmDfdlExtn="http://www.ibm.com/dfdl/extensions" xmlns:ibmSchExtn="http://www.ibm.com/schema/extensions" xmlns:recSepFieldsFmt="http://www.ibm.com/dfdl/RecordSeparatedFieldFormat" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<xsd:import namespace="http://www.ibm.com/dfdl/RecordSeparatedFieldFormat" schemaLocation="../IBMdefined/RecordSeparatedFieldFormat.xsd"/>
<xsd:annotation>
<xsd:appinfo source="http://www.ogf.org/dfdl/">
<dfdl:format byteOrder="{$dfdl:byteOrder}" encoding="{$dfdl:encoding}" escapeSchemeRef="recSepFieldsFmt:RecordEscapeScheme" occursCountKind="fixed" ref="recSepFieldsFmt:RecordSeparatedFieldsFormat"/>
</xsd:appinfo>
</xsd:annotation>
<xsd:element dfdl:encoding="{$dfdl:encoding}" ibmSchExtn:docRoot="true" name="MM1">
<xsd:complexType>
<xsd:sequence dfdl:separator="%CR;%LF;%WSP*;" dfdl:terminator="">
<xsd:element dfdl:alignment="1" dfdl:escapeSchemeRef="" dfdl:lengthKind="delimited" dfdl:occursCountKind="implicit" maxOccurs="unbounded" name="body" type="xsd:string">
</xsd:element>
</xsd:sequence>
</xsd:complexType>
</xsd:element>
</xsd:schema>
Thanks

<xsd:element ibmSchExtn:docRoot="true" name="MM1">
<xsd:complexType>
<xsd:sequence dfdl:separator="" dfdl:terminator="">
<xsd:element dfdl:occursCountKind="implicit" maxOccurs="unbounded" name="Chunk">
<xsd:complexType>
<xsd:sequence dfdl:separator="%CR;%LF;%WSP*;" dfdl:terminator="">
<xsd:element dfdl:lengthKind="delimited" dfdl:occursCountKind="implicit" maxOccurs="3" name="Body" type="xsd:string">
</xsd:element>
</xsd:sequence>
</xsd:complexType>
</xsd:element>
</xsd:sequence>
</xsd:complexType>
</xsd:element>
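The schema above groups the records declaratively: an unbounded Chunk element whose inner sequence holds at most three Body elements. For comparison outside DFDL, the same three-records-per-chunk grouping can be sketched in Python with a lazy reader, so only one chunk is held in memory at a time. This only illustrates the grouping, assuming one record per line; it says nothing about how a particular DFDL parser streams its input, and chunked_records is a hypothetical helper name.

```python
import io
from itertools import islice
from typing import Iterable, Iterator, List


def chunked_records(lines: Iterable[str], size: int = 3) -> Iterator[List[str]]:
    """Yield lists of at most `size` records, reading the input lazily."""
    # Strip line terminators and surrounding whitespace (similar in spirit to
    # the %CR;%LF;%WSP*; separator) and skip blank lines.
    records = (line.strip() for line in lines if line.strip())
    while True:
        # islice pulls at most `size` records from the shared generator.
        chunk = list(islice(records, size))
        if not chunk:
            return
        yield chunk


if __name__ == "__main__":
    sample = io.StringIO("\r\n".join(f"record{i}" for i in range(1, 8)))
    for n, chunk in enumerate(chunked_records(sample), start=1):
        print(f"{n}) {chunk}")
```

Run on the seven sample records from the question, this yields the three chunks shown there, the last one holding only record7.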

Related

Importing XSD and data into R

I generated an XML output from a database (MSSQL2014) and would now like to load the data, as well as the XSD schema information, into an R data frame.
Data Source: MSSQL2014 - AdventureWorks2014 database
Query executed:
select top 1 *
from person.person as p
join person.EmailAddress as ea on p.businessEntityID = ea.businessentityID
join person.PersonPhone as pphone on p.businessEntityID = pphone.businessentityID
for XML AUTO, ELEMENTS, XMLSCHEMA('person');
Below is the generated file. I tried the following R code to import it, but it was unsuccessful. Does anyone have a guide or tip to point me in the right direction?
RCode:
library(XML)
(xml_data <- xmlParse(gsub(" ", "", "C:\\dissertation\\smta\\indata\\01_Source_Query.XML", fixed = TRUE), asText = TRUE))
xml_attrib <- xpathSApply(doc=xml, path="//person", xmlAttrs)
df2 <- data.frame(t(xml_attrib))
df2
<xsd:schema targetNamespace="person" xmlns:schema="person" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:sqltypes="http://schemas.microsoft.com/sqlserver/2004/sqltypes" elementFormDefault="qualified">
<xsd:import namespace="http://schemas.microsoft.com/sqlserver/2004/sqltypes" schemaLocation="http://schemas.microsoft.com/sqlserver/2004/sqltypes/sqltypes.xsd" />
<xsd:import namespace="http://schemas.microsoft.com/sqlserver/2004/07/adventure-works/ContactInfo" />
<xsd:import namespace="http://schemas.microsoft.com/sqlserver/2004/07/adventure-works/ContactRecord" />
<xsd:import namespace="http://schemas.microsoft.com/sqlserver/2004/07/adventure-works/ContactTypes" />
<xsd:import namespace="http://schemas.microsoft.com/sqlserver/2004/07/adventure-works/IndividualSurvey" />
<xsd:element name="p">
<xsd:complexType>
<xsd:sequence>
<xsd:element name="BusinessEntityID" type="sqltypes:int" />
<xsd:element name="PersonType">
<xsd:simpleType>
<xsd:restriction base="sqltypes:nchar" sqltypes:localeId="1033" sqltypes:sqlCompareOptions="IgnoreCase IgnoreKanaType IgnoreWidth" sqltypes:sqlSortId="52">
<xsd:maxLength value="2" /></xsd:restriction>
</xsd:simpleType>
</xsd:element>
<xsd:element name="NameStyle" type="sqltypes:bit" sqltypes:sqlTypeAlias="[AdventureWorks2014].[dbo].[NameStyle]" />
<xsd:element name="Title" minOccurs="0">
<xsd:simpleType>
<xsd:restriction base="sqltypes:nvarchar" sqltypes:localeId="1033" sqltypes:sqlCompareOptions="IgnoreCase IgnoreKanaType IgnoreWidth" sqltypes:sqlSortId="52">
<xsd:maxLength value="8" /></xsd:restriction>
</xsd:simpleType>
</xsd:element>
<xsd:element name="FirstName">
<xsd:simpleType sqltypes:sqlTypeAlias="[AdventureWorks2014].[dbo].[Name]">
<xsd:restriction base="sqltypes:nvarchar" sqltypes:localeId="1033" sqltypes:sqlCompareOptions="IgnoreCase IgnoreKanaType IgnoreWidth" sqltypes:sqlSortId="52">
<xsd:maxLength value="50" /></xsd:restriction>
</xsd:simpleType>
</xsd:element>
<xsd:element name="MiddleName" minOccurs="0">
<xsd:simpleType sqltypes:sqlTypeAlias="[AdventureWorks2014].[dbo].[Name]">
<xsd:restriction base="sqltypes:nvarchar" sqltypes:localeId="1033" sqltypes:sqlCompareOptions="IgnoreCase IgnoreKanaType IgnoreWidth" sqltypes:sqlSortId="52">
<xsd:maxLength value="50" /></xsd:restriction>
</xsd:simpleType>
</xsd:element>
<xsd:element name="LastName">
<xsd:simpleType sqltypes:sqlTypeAlias="[AdventureWorks2014].[dbo].[Name]">
<xsd:restriction base="sqltypes:nvarchar" sqltypes:localeId="1033" sqltypes:sqlCompareOptions="IgnoreCase IgnoreKanaType IgnoreWidth" sqltypes:sqlSortId="52">
<xsd:maxLength value="50" /></xsd:restriction>
</xsd:simpleType>
</xsd:element>
<xsd:element name="Suffix" minOccurs="0">
<xsd:simpleType>
<xsd:restriction base="sqltypes:nvarchar" sqltypes:localeId="1033" sqltypes:sqlCompareOptions="IgnoreCase IgnoreKanaType IgnoreWidth" sqltypes:sqlSortId="52">
<xsd:maxLength value="10" /></xsd:restriction>
</xsd:simpleType>
</xsd:element>
<xsd:element name="EmailPromotion" type="sqltypes:int" />
<xsd:element name="AdditionalContactInfo" minOccurs="0">
<xsd:complexType sqltypes:xmlSchemaCollection="[AdventureWorks2014].[Person].[AdditionalContactInfoSchemaCollection]">
<xsd:complexContent>
<xsd:restriction base="sqltypes:xml">
<xsd:sequence>
<xsd:any processContents="strict" minOccurs="0" maxOccurs="unbounded" namespace="http://schemas.microsoft.com/sqlserver/2004/07/adventure-works/ContactInfo http://schemas.microsoft.com/sqlserver/2004/07/adventure-works/ContactRecord http://schemas.microsoft.com/sqlserver/2004/07/adventure-works/ContactTypes"
/></xsd:sequence>
</xsd:restriction>
</xsd:complexContent>
</xsd:complexType>
</xsd:element>
<xsd:element name="Demographics" minOccurs="0">
<xsd:complexType sqltypes:xmlSchemaCollection="[AdventureWorks2014].[Person].[IndividualSurveySchemaCollection]">
<xsd:complexContent>
<xsd:restriction base="sqltypes:xml">
<xsd:sequence>
<xsd:any processContents="strict" minOccurs="0" maxOccurs="unbounded" namespace="http://schemas.microsoft.com/sqlserver/2004/07/adventure-works/IndividualSurvey" /></xsd:sequence>
</xsd:restriction>
</xsd:complexContent>
</xsd:complexType>
</xsd:element>
<xsd:element name="rowguid" type="sqltypes:uniqueidentifier" />
<xsd:element name="ModifiedDate" type="sqltypes:datetime" />
<xsd:element ref="schema:ea" minOccurs="0" maxOccurs="unbounded" /></xsd:sequence>
</xsd:complexType>
</xsd:element>
<xsd:element name="ea">
<xsd:complexType>
<xsd:sequence>
<xsd:element name="BusinessEntityID" type="sqltypes:int" />
<xsd:element name="EmailAddressID" type="sqltypes:int" />
<xsd:element name="EmailAddress" minOccurs="0">
<xsd:simpleType>
<xsd:restriction base="sqltypes:nvarchar" sqltypes:localeId="1033" sqltypes:sqlCompareOptions="IgnoreCase IgnoreKanaType IgnoreWidth" sqltypes:sqlSortId="52">
<xsd:maxLength value="50" /></xsd:restriction>
</xsd:simpleType>
</xsd:element>
<xsd:element name="rowguid" type="sqltypes:uniqueidentifier" />
<xsd:element name="ModifiedDate" type="sqltypes:datetime" />
<xsd:element ref="schema:pphone" minOccurs="0" maxOccurs="unbounded" /></xsd:sequence>
</xsd:complexType>
</xsd:element>
<xsd:element name="pphone">
<xsd:complexType>
<xsd:sequence>
<xsd:element name="BusinessEntityID" type="sqltypes:int" />
<xsd:element name="PhoneNumber">
<xsd:simpleType sqltypes:sqlTypeAlias="[AdventureWorks2014].[dbo].[Phone]">
<xsd:restriction base="sqltypes:nvarchar" sqltypes:localeId="1033" sqltypes:sqlCompareOptions="IgnoreCase IgnoreKanaType IgnoreWidth" sqltypes:sqlSortId="52">
<xsd:maxLength value="25" /></xsd:restriction>
</xsd:simpleType>
</xsd:element>
<xsd:element name="PhoneNumberTypeID" type="sqltypes:int" />
<xsd:element name="ModifiedDate" type="sqltypes:datetime" /></xsd:sequence>
</xsd:complexType>
</xsd:element>
</xsd:schema>
<p xmlns="person">
<BusinessEntityID>1</BusinessEntityID>
<PersonType>EM</PersonType>
<NameStyle>0</NameStyle>
<FirstName>Ken</FirstName>
<MiddleName>J</MiddleName>
<LastName>Sánchez</LastName>
<EmailPromotion>0</EmailPromotion>
<Demographics>
<IndividualSurvey xmlns="http://schemas.microsoft.com/sqlserver/2004/07/adventure-works/IndividualSurvey">
<TotalPurchaseYTD>0</TotalPurchaseYTD>
</IndividualSurvey>
</Demographics>
<rowguid>92C4279F-1207-48A3-8448-4636514EB7E2</rowguid>
<ModifiedDate>2009-01-07T00:00:00</ModifiedDate>
<ea>
<BusinessEntityID>1</BusinessEntityID>
<EmailAddressID>1</EmailAddressID>
<EmailAddress>ken0#adventure-works.com</EmailAddress>
<rowguid>8A1901E4-671B-431A-871C-EADB2942E9EE</rowguid>
<ModifiedDate>2009-01-07T00:00:00</ModifiedDate>
<pphone>
<BusinessEntityID>1</BusinessEntityID>
<PhoneNumber>697-555-0142</PhoneNumber>
<PhoneNumberTypeID>1</PhoneNumberTypeID>
<ModifiedDate>2009-01-07T00:00:00</ModifiedDate>
</pphone>
</ea>
</p>
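I prefer the rvest/xml2 packages for parsing files.
library(xml2)
library(rvest)
# Parse as HTML: the HTML parser tolerates the two top-level elements
# (the inline schema and <p>) and does not apply the default namespace
page <- read_html("C:\\dissertation\\smta\\indata\\01_Source_Query.XML")
# Select the <p> (person) nodes
persons <- html_nodes(page, xpath = "//p")
# Element names and text values of all descendants of each person
fieldnames <- xml_name(xml_find_all(persons, ".//*"))
fields <- xml_text(xml_find_all(persons, ".//*"))
df <- data.frame(fieldnames, fields)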
The file seems to be interpreted as HTML. The code above reads the data file, finds the paragraph (<p>) tags for the persons, extracts the field names and values, and places them into a data frame. If there are multiple persons in your file, the lines defining the field names and values will need to be vectorized (most likely with sapply). Some clean-up is required to remove a few extraneous rows added to the final data frame.
Good luck.
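A standard-library Python sketch of the same idea, offered as an alternative rather than a translation of the R code: wrap the output in a synthetic root element (the inline schema and the <p> data element are two top-level elements, which a strict XML parser rejects), then address the "person" default namespace explicitly. It assumes the file has no leading <?xml ...?> declaration and keeps only leaf fields directly under <p>; person_fields is a hypothetical name.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

PERSON_NS = "{person}"  # default namespace declared by xmlns="person"


def person_fields(fragment: str) -> List[List[Tuple[str, str]]]:
    """Return, for each top-level <p>, its (field name, text) pairs."""
    # The FOR XML output holds the inline schema followed by the data,
    # i.e. more than one top-level element, so wrap it in a root first.
    root = ET.fromstring(f"<root>{fragment}</root>")
    people = []
    for person in root.findall(f"{PERSON_NS}p"):
        rows = []
        for child in person:
            # Keep simple (leaf) fields only; nested ea/Demographics are skipped.
            if len(child) == 0:
                local_name = child.tag.split("}", 1)[-1]
                rows.append((local_name, (child.text or "").strip()))
        people.append(rows)
    return people
```

Each inner list corresponds to one person, so it maps naturally onto one row of a data frame; nested ea and pphone records would need their own pass.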

Bpel arrays transformation foreach

I have a BPEL process that receives an array and outputs another one.
The thing is, I need to take the elements of the first array, populate the second array with them, and fill in some additional elements in the second one.
The first array holds objects of this type:
<xsd:complexType name="comment_A">
<xsd:sequence>
<xsd:element name="id" type="xsd:int"/>
<xsd:element name="username" type="xsd:string"/>
<xsd:element name="name" type="xsd:string"/>
<xsd:element name="picture" type="xsd:base64Binary"/>
<xsd:element name="date" type="xsd:string"/>
<xsd:element name="hour" type="xsd:string"/>
<xsd:element name="bus-line" type="xsd:string"/>
<xsd:element name="bus-number" type="xsd:integer"/>
<xsd:element name="description" type="xsd:string"/>
<xsd:element name="rate" type="xsd:int"/>
</xsd:sequence>
</xsd:complexType>
And the second one:
<xsd:complexType name="comment_B">
<xsd:sequence>
<xsd:element name="id" type="xsd:int"/>
<xsd:element name="username" type="xsd:string"/>
<xsd:element name="name" type="xsd:string"/>
<xsd:element name="picture" type="xsd:base64Binary"/>
<xsd:element name="date" type="xsd:string"/>
<xsd:element name="hour" type="xsd:string"/>
<xsd:element name="bus-line" type="xsd:string"/>
<xsd:element name="bus-number" type="xsd:integer"/>
<xsd:element name="description" type="xsd:string"/>
<xsd:element name="rate" type="xsd:int"/>
<xsd:element name="type-comment" type="xsd:string"/>
<xsd:element name="liked-number" type="xsd:int"/>
</xsd:sequence>
</xsd:complexType>
So first, I tried to iterate over the first array to populate the second one with the properties both types have in common, using a forEach activity.
My code looks like this:
<forEach parallel="yes" counterName="c" name="forEachComment">
<startCounterValue>1</startCounterValue>
<finalCounterValue>count($comments.VwCommentCollection/ns3:VwComment)</finalCounterValue>
<scope name="Scope1">
<assign name="assignResult">
<extensionAssignOperation>
<bpelx:copyList>
<bpelx:from>$comments.VwCommentCollection[$c]</bpelx:from>
<bpelx:to>$outputVariable.payload</bpelx:to>
</bpelx:copyList>
</extensionAssignOperation>
<copy>
<from>$comments.VwCommentCollection[$c]/ns3:VwComment/ns3:id</from>
<to>$outputVariable.payload/ns2:comment/ns2:id</to>
</copy>
</assign>
</scope>
</forEach>
I tried this first with just the id element as a test; however, when the comment_A array has more than one element, I receive an exception.
$comments is my variable holding the comment_A array.
I found a solution by following this Brazilian blog post: http://blog.iprocess.com.br/2012/09/oracle-soa-suite-11g-uso-da-atividade-assign-no-bpel/
I used the append operation inside BPEL.
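The idea behind an append-based fix can be illustrated outside BPEL: for each input item, add a new comment element to the output instead of copying into one fixed target path. This Python sketch ignores namespaces, and the element names comments/comment and the default values for type-comment and liked-number are assumptions, not taken from the question's WSDL.

```python
import xml.etree.ElementTree as ET

# Field names shared by comment_A and comment_B, in schema order.
COMMON_FIELDS = ["id", "username", "name", "picture", "date", "hour",
                 "bus-line", "bus-number", "description", "rate"]


def to_comment_b(comments_a: ET.Element, type_comment: str = "",
                 liked_number: int = 0) -> ET.Element:
    """Build one output <comment> per input item (hypothetical element names)."""
    out = ET.Element("comments")
    for item in comments_a:
        # The "append" step: a fresh element per iteration, never a fixed slot.
        comment = ET.SubElement(out, "comment")
        for field in COMMON_FIELDS:
            src = item.find(field)
            ET.SubElement(comment, field).text = src.text if src is not None else None
        # Extra comment_B fields; these default values are placeholders.
        ET.SubElement(comment, "type-comment").text = type_comment
        ET.SubElement(comment, "liked-number").text = str(liked_number)
    return out
```

Because every iteration creates its own output element, the output grows with the input, which is the behavior the forEach loop needs.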

Unmarshal xhtml as string using xsd

I'm trying to unmarshal a large XHTML document using XSDs and JAXB. I've got everything working except for one part, which contains pure HTML. Here is an example of the XHTML I'm getting (I am able to grab every element except the "content"):
<feed xmlns="http://www.w3.org/2005/Atom">
<title type="text">...</title>
<id>...</id>
<updated>...</updated>
<entry>
<id>...</id>
<title type="text">...</title>
<updated>...</updated>
<author>
<name>...</name>
</author>
<content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml">
<div>{html...}<div>{html...}</div></div></div>
</content>
</entry>
</feed>
Here are the relevant parts of the XSD file:
<xsd:complexType name="ApCategoriesJAXB" >
<xsd:sequence>
<xsd:element name="id" type="xsd:string" minOccurs="1" maxOccurs="1"></xsd:element>
<xsd:element name="title" type="xsd:string" minOccurs="1" maxOccurs="1"></xsd:element>
<xsd:element name="updated" type="xsd:string" minOccurs="1" maxOccurs="1"></xsd:element>
<xsd:element name="link" type="tns:ApLinkJAXB" minOccurs="0"></xsd:element>
<xsd:element name="entry" type="tns:ApEntryJAXB" minOccurs="0" maxOccurs="unbounded"></xsd:element>
</xsd:sequence>
</xsd:complexType>
<xsd:complexType name="ApEntryJAXB">
<xsd:sequence>
<xsd:element name="id" type="xsd:string" minOccurs="1" maxOccurs="1"></xsd:element>
<xsd:element name="name" type="xsd:string" minOccurs="0"></xsd:element>
<xsd:element name="title" type="xsd:string" minOccurs="1" maxOccurs="1"></xsd:element>
<xsd:element name="updated" type="xsd:string" minOccurs="1" maxOccurs="1"></xsd:element>
<xsd:element name="author" type="tns:ApAuthorJAXB" minOccurs="0"></xsd:element>
<xsd:element name="link" type="tns:ApLinkJAXB" minOccurs="0"></xsd:element>
<xsd:element name="category" type="tns:ApCategoryJAXB" minOccurs="0"></xsd:element>
<xsd:element name="content" type="tns:ApContentJAXB" minOccurs="0"></xsd:element>
</xsd:sequence>
</xsd:complexType>
<xsd:complexType name="ApCategoryJAXB" >
<xsd:sequence></xsd:sequence>
<xsd:attribute name="term" type="xsd:string" />
<xsd:attribute name="label" type="xsd:string" />
<xsd:attribute name="scheme" type="xsd:string" />
</xsd:complexType>
<xsd:complexType name="ApContentJAXB" >
<xsd:sequence>
<xsd:element name="div" type="tns:ApDivJAXB" minOccurs="0" maxOccurs="unbounded"></xsd:element>
</xsd:sequence>
</xsd:complexType>
<xsd:complexType name="ApDivJAXB" >
<xsd:sequence>
<xsd:any namespace="http://www.w3.org/2005/Atom" processContents="lax" minOccurs="0" maxOccurs="unbounded"/>
</xsd:sequence>
</xsd:complexType>
I have tried every combination of nested XSD elements, complexTypes, xsd:any, etc., and cannot get this "content" value no matter what I try. I am happy to take all the HTML as a string or to unmarshal it into an object.
Thank you in advance for any thoughts.
EDIT: I've edited the XSD part to include the relevant parts. I've tried both nesting the "any" element in the "div" complexType, as shown, and skipping the "div" complexType altogether.
Thanks again.
If you want <content> to have a type of xsd:string, you need to encode or otherwise escape the HTML. You could use CDATA sections, Base64 encoding, or escape all the entities (e.g. < to &lt;, etc.).
Otherwise, xsd:any should work. Can you provide a more complete example of your XSD when you tried this?
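A minimal Python sketch of the two producer-side options mentioned above, escaping the markup or wrapping it in CDATA, checked by parsing the result and reading content back as a single string. Python stands in here for whatever generates the feed; the helper names are hypothetical, and a consumer binding content to xsd:string should likewise see one plain text value.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def wrap_escaped(html: str) -> str:
    # escape() replaces &, < and > so the markup becomes character data.
    return f"<content>{escape(html)}</content>"


def wrap_cdata(html: str) -> str:
    # A CDATA section cannot contain "]]>", so split that sequence
    # across two adjacent CDATA sections.
    safe = html.replace("]]>", "]]]]><![CDATA[>")
    return f"<content><![CDATA[{safe}]]></content>"


if __name__ == "__main__":
    html = '<div xmlns="http://www.w3.org/1999/xhtml"><b>Hi</b> &amp; bye</div>'
    for wrapped in (wrap_escaped(html), wrap_cdata(html)):
        # Either way, the parser hands back the original markup as text.
        print(ET.fromstring(wrapped).text == html)
```

Escaping works for any content; CDATA keeps the payload readable but needs the "]]>" split shown above.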

Biztalk mapper altering nodes order

I need to map document X to document Y; both are quite similar. X has the following XSD fragment:
<xsd:complexType>
<xsd:sequence>
<xsd:choice minOccurs="3" maxOccurs="unbounded">
<xsd:element maxOccurs="unbounded" ref="A" />
<xsd:element maxOccurs="unbounded" ref="B" />
<xsd:element maxOccurs="unbounded" ref="C" />
<xsd:element minOccurs="0" maxOccurs="unbounded" ref="D"/>
</xsd:choice>
</xsd:sequence>
</xsd:complexType>
Y has the same elements (A,B,C), but they're not within a sequence.
My problem arises when I test the map with the following input:
<doc-X>
<A>...</A>
<B>...</B>
<C>...</C>
<D>...</D>
<C>...</C>
<D>...</D>
</doc-X>
I get something like this:
<doc-Y>
<A>...</A>
<B>...</B>
<C>...</C>
<C>...</C>
<D>...</D>
<D>...</D>
</doc-Y>
I don't understand why this is happening, since I just map each element to its counterpart in the other schema.
EDIT: I've tried setting the PreserveSequenceOrder property to "Yes", but that hasn't worked.
XSD on its own does not guarantee that sibling elements will appear in any given order. From what you've described, it sounds like the output is perfectly valid according to its schema. Are you actually getting a failure in your test map?
Is there any way you can post the complete schema and document instances?
You are getting that output because the map always processes the links of the output document (document Y) from the top element to the bottom element. So in your case it will first execute the links for element A (in document Y), then B, then C.
Try modifying the XSD of the output document (document Y) to something like this:
<xsd:complexType>
<xsd:sequence>
<xsd:choice minOccurs="3" maxOccurs="unbounded">
<xsd:element maxOccurs="unbounded" ref="D" />
<xsd:element maxOccurs="unbounded" ref="C" />
<xsd:element maxOccurs="unbounded" ref="B" />
<xsd:element minOccurs="0" maxOccurs="unbounded" ref="A"/>
</xsd:choice>
</xsd:sequence>
</xsd:complexType>
You will see the difference.
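A toy model, in Python, of the behavior this answer describes: if the map emits destination nodes one schema element at a time, each pulling every matching source node, the interleaving in the source is lost. It is a simplification for illustration, not BizTalk's actual generated XSLT, and both function names are hypothetical.

```python
from typing import List, Sequence


def map_by_destination_order(source: Sequence[str], dest_order: Sequence[str]) -> List[str]:
    """Emit, for each destination element (top to bottom), all matching source nodes."""
    return [node for name in dest_order for node in source if node == name]


def map_preserving_source_order(source: Sequence[str], dest_names: Sequence[str]) -> List[str]:
    """Contrast: walk the source once and copy nodes in document order."""
    wanted = set(dest_names)
    return [node for node in source if node in wanted]
```

With the source ["A", "B", "C", "D", "C", "D"] and destination order ["A", "B", "C", "D"], map_by_destination_order returns ["A", "B", "C", "C", "D", "D"], the grouping reported in the question; reversing the destination order reverses the groups, while map_preserving_source_order keeps the original interleaving.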

SOAP Response not unserialized correctly in Flex 4

I am seeing certain nodes of my SOAP response disappear in Flex 4. I am using an <mx:WebService> backed by a service written in PHP/NuSOAP, and 99% of the responses are unserialized correctly in Flex. For some reason, this snippet is causing problems:
RAW XML:
<data xsi:type="SOAP-ENC:Array" SOAP-ENC:arrayType="tns:reportData[1]">
<item xsi:type="tns:reportData">
<name xsi:type="xsd:string">Tue. 8 Mar. 2011</name>
<year xsi:type="xsd:int">2011</year>
<month xsi:type="xsd:int">3</month>
<day xsi:type="xsd:int">8</day>
<counts xsi:type="SOAP-ENC:Array" SOAP-ENC:arrayType="xsd:double[3]">
<item xsi:type="xsd:double">26</item>
<item xsi:type="xsd:double">11</item>
<item xsi:type="xsd:double">11</item>
</counts>
</item>
</data>
The only element to show in the Flex ProxyObject is "name". All other values are simply ignored.
The WSDL defines reportData as:
<xsd:complexType name="reportData">
<xsd:all>
<xsd:element name="name" type="xsd:string"/>
<xsd:element name="url" type="xsd:string"/>
<xsd:element name="year" type="xsd:int"/>
<xsd:element name="month" type="xsd:int"/>
<xsd:element name="day" type="xsd:int"/>
<xsd:element name="hour" type="xsd:int"/>
<xsd:element name="counts" type="tns:reportCountList"/>
<xsd:element name="breakdown_total" type="tns:reportCountList"/>
<xsd:element name="breakdown" type="tns:reportDataList"/>
</xsd:all>
</xsd:complexType>
Any ideas why this XML will not unserialize correctly?
From the comments above:
According to w3.org/TR/2001/REC-xmlschema-1-20010502/#element-all, minOccurs defaults to 1 for elements in an xsd:all group, meaning an optional element must explicitly be set to minOccurs="0" in the WSDL.
The SOAP response above was missing the required url element. Changing the WSDL to explicitly mark the url element as optional (minOccurs="0") fixes the issue.
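A small Python check of the rule above: an xsd:element without a minOccurs attribute counts as required. The sketch compares a flat complex type such as reportData with an instance element and reports required children that are absent. It strips namespaces from the instance tags, does not handle nested anonymous types, and missing_required is a hypothetical helper.

```python
import xml.etree.ElementTree as ET
from typing import Set

XSD = "{http://www.w3.org/2001/XMLSchema}"


def missing_required(complex_type_xml: str, item_xml: str) -> Set[str]:
    """Required element names (minOccurs defaults to 1) absent from the instance."""
    ctype = ET.fromstring(complex_type_xml)
    required = {
        el.get("name")
        for el in ctype.iter(f"{XSD}element")
        if el.get("minOccurs", "1") != "0"
    }
    # Local names of the instance's direct children.
    present = {child.tag.split("}", 1)[-1] for child in ET.fromstring(item_xml)}
    return required - present
```

Applied to the full reportData type and the item shown in the question, this reports url together with hour, breakdown_total and breakdown, which suggests those may need minOccurs="0" as well.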
