How to use the BizTalk Flat File Mapping Wizard for nested repeating items?

I have a flat file with some repeating sections in it, and I'm confused about how to create the schema via the BizTalk Flat File Schema Wizard. The file looks like this:
001,bunch of data
002,bunch of data
006,bunch of data
006A,bunch of data
006B,bunch of data
006B,bunch of data
006,bunch of data
006A,bunch of data
006B,bunch of data
As you can see, the 006* records can repeat. I'm going to want to wind up with XML that looks like this:
<001Stuff>...</001Stuff>
<002Stuff>...</002Stuff>
<006Loop>
<006Stuff>...</006Stuff>
<006AStuff>...</006AStuff>
<006BStuff>...</006BStuff>
<006BStuff>...</006BStuff>
</006Loop>
<006Loop>
<006Stuff>...</006Stuff>
<006AStuff>...</006AStuff>
<006BStuff>...</006BStuff>
</006Loop>
Obviously I can't just set the first group of 006* records to "Repeating record" and Ignore the second set. I'm used to dealing with single repeating rows via the wizard (i.e. another 006 row right after the first one) and not nested things like this - any suggestions on how to proceed? Thanks!

Working with the Flat File Schema Wizard is quite hard and there is only so much it can help you with. I always seem to have to tweak its output a little bit.
To make things a little easier, I suggest restricting your sample document to a single occurrence of the whole <006> structure. That way you will not have to set many lines to Ignored in the Flat File Schema Wizard:
001,bunch of data
002,bunch of data
006,bunch of data
006A,bunch of data
006B,bunch of data
006B,bunch of data
Next, each repeating structure should be wrapped inside a corresponding Repeating Record in the definition of your XML schema.
Note that you can always run the Flat File Schema Wizard recursively on nested structures to get more fine-grained control. So I would suggest first running the wizard with an all-encompassing repeating <006> structure. Then you can right-click the structure and provide a more detailed definition of the nested child structures, highlighting only a subset of the sample contents.
Then, the most important part: you need to set the Child Order property to Conditional Default on both repeating structures, because there is only one empty line at the end of your document file and the wizard cannot work this situation out for you on its own.
For reference, your resulting structure should use the following settings:
BunchOfStuff (Root): Delimited, child delimiter 0x0D 0x0A, child order Postfix.
_001Stuff: Delimited, child delimiter , (comma), child order Prefix, Tag Identifier 001.
_002Stuff: Delimited, child delimiter , (comma), child order Prefix, Tag Identifier 002.
_006Loop: Delimited, child delimiter 0x0D 0x0A, child order Conditional Default.
_006Stuff: Delimited, child delimiter , (comma), child order Prefix, Tag Identifier 006.
_006AStuff: Delimited, child delimiter , (comma), child order Prefix, Tag Identifier 006A.
_006BLoop: Delimited, child delimiter 0x0D 0x0A, child order Conditional Default.
_006BStuff: Delimited, child delimiter , (comma), child order Prefix, Tag Identifier 006B.
Hope this helps.

Treat everything from the start of the first 006 record to the start of the second 006 record as one record. When you define the 006 record, set it up as a repeating record as well. This should create a node for each 006 group and nodes for each 006* row under it.
That is what I would try.
Here is my output after 2 minutes of work. Except for the node/element names, I think it is what you want. You would still have to create separate elements for each of the fields in your data.
<_x0030_01 xmlns="">001,bunch of data</_x0030_01>
<_x0030_02 xmlns="">002,bunch of data</_x0030_02>
<_x0030_06 xmlns="">
  <_x0030_06_Child1>bunch of data</_x0030_06_Child1>
  <_x0030_06_Child2>
    <_x0030_06_Child2_Child1>A,bunch of data</_x0030_06_Child2_Child1>
  </_x0030_06_Child2>
  <_x0030_06_Child2>
    <_x0030_06_Child2_Child1>B,bunch of data</_x0030_06_Child2_Child1>
  </_x0030_06_Child2>
  <_x0030_06_Child2>
    <_x0030_06_Child2_Child1>B,bunch of data</_x0030_06_Child2_Child1>
  </_x0030_06_Child2>
</_x0030_06>
<_x0030_06 xmlns="">
  <_x0030_06_Child1>bunch of data</_x0030_06_Child1>
  <_x0030_06_Child2>
    <_x0030_06_Child2_Child1>A,bunch of data</_x0030_06_Child2_Child1>
  </_x0030_06_Child2>
  <_x0030_06_Child2>
    <_x0030_06_Child2_Child1>B,bunch of data</_x0030_06_Child2_Child1>
  </_x0030_06_Child2>
</_x0030_06>
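If it helps to see the grouping logic outside of BizTalk, here is a small Python sketch of the same idea (illustrative only, not BizTalk tooling; the dictionary shape is made up): every 006 row opens a new loop, and 006A/006B rows attach to the most recently opened one.

def group_rows(lines):
    # Each 006 row starts a new loop; 006A/006B rows belong to the most
    # recently opened loop. Anything else is header-level data.
    header, loops = [], []
    for line in lines:
        tag, _, data = line.partition(",")
        if tag == "006":
            loops.append({"006": data, "006A": [], "006B": []})
        elif tag in ("006A", "006B") and loops:
            loops[-1][tag].append(data)
        else:
            header.append((tag, data))
    return header, loops

sample = [
    "001,bunch of data",
    "002,bunch of data",
    "006,bunch of data",
    "006A,bunch of data",
    "006B,bunch of data",
    "006B,bunch of data",
    "006,bunch of data",
    "006A,bunch of data",
    "006B,bunch of data",
]
print(group_rows(sample))  # two loops: the first with two 006B rows, the second with one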

Related

OpenEdge Progress 4GL WRITE-XML NAMESPACE-PREFIX

Hi Progress OpenEdge devs,
I am using the following syntax to generate an XML file from a temp-table. All is good except for one item.
dataset dsCust:write-xml("FILE", "c:/Test/Customer.xml", true).
This is my temp table declaration
def temp-table ttCustomer no-undo
namespace-uri "http://WMS.URI"
namespace-prefix "ns0"
field PurchaseOrderNumber as char
field Plant as char.
This is my output
<ns0:GoodsReceipt xmlns:ns0="http://WMS.URI">
<ns0:PurchaseOrderNumber/>
<ns0:Plant>Rose</ns0:Plant>
</ns0:GoodsReceipt>
But this is my desired output
<ns0:GoodsReceipt xmlns:ns0="http://WMS.URI">
<PurchaseOrderNumber/>
<Plant>Rose</Plant>
</ns0:GoodsReceipt>
Notice that the elements inside the GoodsReceipt node do not have the ns0 prefix.
Can this be achieved using write-xml? I want to avoid using DOM or SAX if possible.
Thank you
You can always manually set attributes and tag names using XML-NODE-TYPE and SERIALIZE-NAME.
However, I've worked with lots of XML and APIs together with Progress OpenEdge and have yet to be blocked by a namespace problem, but I guess it might depend on what you want to do with the data.
Since you don't include the entire dataset, this is something of a guess. It produces more or less what you want for this specific case. I don't know how multiple "receipts" should be rendered, though, so you might need to change this.
DEFINE TEMP-TABLE ttCustomer NO-UNDO SERIALIZE-NAME "ns0:GoodsReceipt"
    /* Write the namespace declaration as a literal attribute rather than a
       real namespace, so the child elements stay unprefixed. */
    FIELD xmlns AS CHARACTER SERIALIZE-NAME "xmlns:ns0" INITIAL "http://WMS.URI" XML-NODE-TYPE "ATTRIBUTE"
    FIELD PurchaseOrderNumber AS CHARACTER
    FIELD Plant AS CHARACTER.
/* SERIALIZE-HIDDEN keeps the dataset wrapper element out of the output. */
DEFINE DATASET dsCust SERIALIZE-HIDDEN
    FOR ttCustomer.
CREATE ttCustomer.
ASSIGN ttCustomer.Plant = "Rose".
DATASET dsCust:WRITE-XML("FILE", "c:/temp/Customer.xml", TRUE).
From a quick Google on the subject, it seems the W3C suggests that the namespace prefix should be presented the way OpenEdge does it: https://www.w3schools.com/xml/xml_namespaces.asp
And I'm pretty certain you can't change the behaviour with write-xml like you want to either. The documentation doesn't mention any way of overriding the behaviour. https://documentation.progress.com/output/ua/OpenEdge_latest/index.html#page/dvxml/namespace-uri-and-namespace-prefix.html

Why does InDesign put "c_" before literal text?

I'm working with the old interchange format of Adobe InDesign (.inx files).
This XML file has text contents like the following:
<pcnt>c_Stackoverflow

</pcnt>
Which results in
Stackoverflow<CR><CR>
Question: why does it put c_ before the actual value instead of simply using CDATA?
Adobe encodes everything. The reason they prefix the payload is that you can have several values in a single field, so CDATA would not work.
c_ indicates a string
x_ represents a list (enum)
x_a represents a list with 0x0a items (a = 10 hexadecimal)
l_ represents a long
See a full list of all prefixes in the PDF:
http://partners.adobe.com/public/developer/en/indesign/sdk/working_with_inx_file_format.pdf
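As a rough illustration of the scheme (this is not Adobe SDK code; the function name is made up, only the prefixes listed above are handled, and the long payload is assumed to be plain decimal):

def decode_inx_value(raw):
    # Illustrative decoder: strip the type prefix and return (kind, payload).
    if raw.startswith("c_"):
        return "string", raw[2:]
    if raw.startswith("l_"):
        return "long", int(raw[2:])  # assumes a decimal payload
    if raw.startswith("x_"):
        return "list", raw[2:]  # a hex item count follows; see the PDF for details
    raise ValueError("unknown INX prefix: %r" % raw[:3])

print(decode_inx_value("c_Stackoverflow"))  # ('string', 'Stackoverflow')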

Using MarkLogic XQuery data population

I have data in the following form:
<Status>Active Leave Terminated</Status>
<date>05/06/2014 09/10/2014 01/10/2015</date>
I want to get the data in the following form:
<status>Active</status>
<date>05/06/2014</date>
<status>Leave</status>
<date>09/10/2014</date>
<status>Terminated</status>
<date>01/10/2015</date>
Please help me with the query to retrieve the data as specified above.
Well, you have a string and want to split it at the whitespace. That's what tokenize() is for, and \s matches a whitespace character. To get the corresponding date you can track the current position in the for loop using at. Together it looks something like this (note that I assume the input data is the current context item):
let $dates := tokenize(date, "\s+")
for $status at $pos in tokenize(Status, "\s+")
return (
<status>{$status}</status>,
<date>{$dates[$pos]}</date>
)
You did not indicate whether your data is on the file system or already loaded into MarkLogic. It's also not clear if this is something you need to do once on a small set of data or on an on-going basis with a lot of data.
If it's on the file system, you can transform it as it is being loaded. For instance, MarkLogic Content Pump can apply a transformation during load.
If you have already loaded the content and you want to transform it in place, you can use Corb2.
If you have a small amount of data, then you can just loop across it using Query Console.
Regardless of how you apply the transformation code, dirkk's answer shows how you need to change it. If you are updating content already in your database, you'll xdmp:node-delete() the original Status and date elements and xdmp:node-insert-child() the new ones.

BizTalk Varying Length Flat File using Single Schema for Transform

I have a pipe-delimited .txt flat file that I'm using to do a bulk insert to SQL. Everything works well for a straight one-to-one mapping. However, the flat file now contains 2 new fields that can repeat an unknown number of times.
Is there a way to create a single flat file schema where I can have an unbounded child within the main unbounded child? I think where I'm getting tripped up is how to make the ChildRoot listed below just a "group heading" like Root is, where ChildRoot doesn't correspond to a location in the flat file. How do I insert something like that?
Schema:
-Roots
--Root (unbounded)
---ChildID
---ChildName
Roots gets a direct link to my SQL stored procedure to do a bulk insert on as many "Root" rows as come in.
Now I have:
Schema:
-Roots
--Root (unbounded)
---Child
---ChildName
---ChildRoot (unbounded)
----ChildRootID
----ChildRootName
EDIT: I should also add that ChildRootID and ChildRootName can repeat an indefinite number of times until the row delimiter (carriage return) is found.
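To make the target row shape concrete, here is the structure in plain Python terms (illustrative only, not BizTalk; the field names follow the schema outline above and the sample values are made up):

def parse_row(row):
    # Two fixed fields, then (ChildRootID, ChildRootName) pairs repeating
    # until the carriage-return row delimiter.
    child, child_name, *tail = row.rstrip("\r\n").split("|")
    child_roots = [tuple(tail[i:i + 2]) for i in range(0, len(tail), 2)]
    return {"Child": child, "ChildName": child_name, "ChildRoots": child_roots}

print(parse_row("1|Alpha|10|A|11|B\r\n"))
# {'Child': '1', 'ChildName': 'Alpha', 'ChildRoots': [('10', 'A'), ('11', 'B')]}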

How to read a non-standard DBF memo (BLOB) file from ACT?

I am trying to convert data from Act 2000 to a MySQL database. I have successfully imported the DBF files into individual MySQL tables. However, I am having issues with the *.BLB file, which seems to be a non-standard memo file.
The DBF files identify themselves as dBase III Plus, no memo format. There is a single *.BLB file, which is a memo file shared by multiple DBFs for BLOB data.
If you read this document: http://cicorp.com/act/sdk/ACT6-SDK-ChapterA.htm#_Toc483994053
you can see that the REGARDING column is a 6-character one. The description is: "This 6-byte field is supplied by the system and contains a reference to a field in the Binary Large Object (BLOB) Database."
Now, upon opening the *.BLB, I can see that the block size is 64 bytes. All the blocks of text are null-padded out to that size.
Where I am stumbling is trying to convert the values stored in the REGARDING column to block locations in the BLB file. My assumption is that the 6-character field is an offset.
For example, one value for REGARDING is (ignoring the square brackets): [ ",J$]
In my Googling, I found this: http://ulisse.elettra.trieste.it/services/doc/dbase/DBFstruct.htm#C1.5
It explains that in memo fields (in normal DBF files, at least) the space value is ignored (i.e. it pads out the column).
Therefore, if I'm correct, (again in square brackets) [",J$] should be the offset into my BLB file. Luckily I've still got access to the original ACT2000 software, so I can compare the full text in the program / MySQL and the BLB file.
Using my example value, I know that the DB row with REGARDING value of [ ",J$] corresponds to a 1024 byte offset (or 16 blocks, assuming my guess of a 64 byte sized block).
I've tried reading some Python code from open source projects that read DBF files, but I'm in over my head.
I think what I need to do is unpack the characters to binary, but am not sure.
How can I find the right 64-byte block to read from, based on what's found in the DBF files?
EDIT by Jerry Dodge
I've attempted to reverse-engineer the strings in this field to hexadecimal values, and then to an integer value using StrToInt64, but the result still does not match up with the blob file. I've also tried multiplying this integer value by 64 and not multiplying, but the result keeps winding up outside of the size of the blob file, not actually finding any data.
For example, a value of ___/BD (_ = space) translates to $2f4244 hexadecimal, which in turn translates to the integer value 3097156, but does not correspond to any relevant portion of data in the blob file, even when multiplied or divided by 64.
According to the SDK you linked, the following happens as I understand it:
There is a TYPE field (right behind REGARDING) that encodes what REGARDING is used for (see the second table of the linked chapter). So I'd assume that if type=6 (meeting not held) the REGARDING is either irrelevant or only contains a meeting ID reference from some other table. On that line of thought, I would only expect REGARDING to be a BLB offset if type=101 (or possibly 100). I'd also not abandon the thought that in these relevant cases REGARDING might be a concatenation of BLB file index and offset (because there is a mention that each file must not be longer than 30K chars, and I really expect to be able to store much more data, even in one table).
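If you want to test these hypotheses systematically, a small Python script can enumerate plausible integer readings of the 6-byte value and dump what each one points at in the memo file (a guess-testing harness only; the file name is hypothetical and the 64-byte multiplier comes from the block size observed in the question):

def candidate_offsets(regarding, block_size=64):
    # Enumerate plausible interpretations of a 6-byte REGARDING value.
    stripped = regarding.strip(b" ")  # leading spaces look like padding
    results = {}
    for order in ("little", "big"):
        n = int.from_bytes(stripped, order)
        results[order + "-endian"] = n
        results[order + "-endian * block"] = n * block_size
    return results

with open("ACT.BLB", "rb") as f:  # hypothetical path to the shared memo file
    blob = f.read()

for name, offset in candidate_offsets(b' ",J$').items():
    if 0 <= offset < len(blob):
        print(name, offset, blob[offset:offset + 64])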
