I have to parse a CSV flat file containing only line item data, with no recognisable header record, like this:
930001,14-02-2013,100.00,1,Line 1,2,10.00,20.00
930001,14-02-2013,100.00,2,Line 2,2,20.00,40.00
930001,14-02-2013,100.00,3,Line 3,1,40.00,40.00
930002,13-02-2013,200.00,1,Line 1,10,10.00,100.00
930002,13-02-2013,200.00,2,Line 2,5,20.00,100.00
930003,14-02-2013,100.00,1,Line 1,3,20.00,60.00
930003,14-02-2013,100.00,2,Line 2,2,20.00,40.00
Where the fields are, in order:
Order No,Order Date,Order Amt,Line No,Line Desc,Line Qty,Unit Price,Line Price
I want to use the BizTalk Flat File receive pipeline to transform this into a hierarchical schema, grouping on the first field, the Order No:
OrderBatch
+ Order
+ OrderLine
Is there a way to perform a grouping operation via the flat-file receive so that, in the above instance, the first 3 lines (Order No=930001) end up under a single Order element, like this?
<OrderBatch>
<Order>
<OrderLine>
<OrderNo>930001</OrderNo>
<other_fields />
<LineNo>1</LineNo>
<other_fields_etc />
</OrderLine>
<OrderLine>
<OrderNo>930001</OrderNo>
<other_fields />
<LineNo>2</LineNo>
<other_fields_etc />
</OrderLine>
<OrderLine>
<OrderNo>930001</OrderNo>
<other_fields />
<LineNo>3</LineNo>
<other_fields_etc />
</OrderLine>
</Order>
<Order> ... Details of Order 930002 ... </Order>
<Order> ... Details of Order 930003 ... </Order>
</OrderBatch>
The only option I currently see available to me is to accept the entire file as a set of OrderLine records, un-batched, then perform the batching using the Gather pattern in another Orchestration. I would prefer to Keep It Seriously Simple.
Use a map to translate from flat to hierarchical:
Create a schema for your flat file using the flat file schema wizard
Use a pipeline and the flat file disassembler to get the input message
Create a schema for your desired output xml
Create a map to transform the flat file message to the desired output message
I believe you can use custom XSLT in the map to do the grouping.
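As a rough sketch of that grouping step: assuming the flat file disassembler emits a root element named OrderLines with one OrderLine child per record, each carrying an OrderNo element, and no namespaces (all of these names are assumptions, so adjust them and any namespace prefixes to match your generated schema), a custom XSLT 1.0 stylesheet using Muenchian grouping might look like this:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <!-- Index every OrderLine by its OrderNo (element names are assumed). -->
  <xsl:key name="lines-by-order" match="OrderLine" use="OrderNo"/>

  <!-- OrderLines is a hypothetical root name for the disassembled flat file. -->
  <xsl:template match="/OrderLines">
    <OrderBatch>
      <!-- Visit only the first OrderLine of each distinct OrderNo. -->
      <xsl:for-each select="OrderLine[generate-id() = generate-id(key('lines-by-order', OrderNo)[1])]">
        <Order>
          <!-- Copy every line that shares this OrderNo under one Order. -->
          <xsl:copy-of select="key('lines-by-order', OrderNo)"/>
        </Order>
      </xsl:for-each>
    </OrderBatch>
  </xsl:template>
</xsl:stylesheet>
```

With the sample file, this should yield three Order elements (930001, 930002, 930003), each containing its own OrderLine records in their original order.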
I'm using MarkLogic 8, and our query looks like this:
query=Color:red,yellow,black AND Size:middle
Our search options look like this:
<options xmlns="http://marklogic.com/appservices/search">
<grammar>
<quotation>"</quotation>
<implicit>
<cts:and-query strength="20" xmlns:cts="http://marklogic.com/cts"/>
</implicit>
<starter strength="30" apply="grouping" delimiter=")">(</starter>
<starter strength="40" apply="prefix" element="cts:not-query" tokenize="word">NOT</starter>
<joiner strength="10" apply="infix" element="cts:or-query" tokenize="word">OR</joiner>
<joiner strength="20" apply="infix" element="cts:and-query" tokenize="word">AND</joiner>
<joiner strength="10" apply="infix" element="cts:or-query">,</joiner>
<joiner strength="50" apply="constraint">:</joiner>
</grammar>
<constraint name="Color"><value><element name="Color" ns="" /></value></constraint>
<constraint name="Size"><value><element name="Size" ns="" /></value></constraint>
</options>
We are using this to parse our query text:
cts:query(search:parse($query, $options))
However, it doesn't parse the query the way we expect:
<cts:or-query xmlns:cts="http://marklogic.com/cts">
<cts:element-value-query>
<cts:element>Color</cts:element>
<cts:text xml:lang="en">red</cts:text>
</cts:element-value-query>
<cts:word-query>
<cts:text xml:lang="en">yellow</cts:text>
</cts:word-query>
<cts:word-query>
<cts:text xml:lang="en">black</cts:text>
</cts:word-query>
<cts:element-value-query>
<cts:element>Size</cts:element>
<cts:text xml:lang="en">middle</cts:text>
</cts:element-value-query>
</cts:or-query>
I know that we can write the input query like this:
query=Color:red OR Color:yellow OR Color:black AND Size:middle
But that's too long.
Is there any way to shorten our input query?
The MarkLogic Search API does not do that. However, you can write a small custom search constraint on the Search API to accomplish what you are trying to do. Custom constraints are passed 2 parameters - the information on the left and right sides of the colon. You could then create the proper query to match as you like. You could probably accomplish this by extending the search library as well.
However, it is also something that you can likely take care of in your logic before passing the query to the server.
It might be worth looking into cts:parse. You have to translate your options to bindings yourself (not too difficult), but you'll get a slightly more advanced, and faster, parser for your search strings. Among other things, it allows expressions like:
Color = (yellow red black) AND Size:middle
See also: http://docs.marklogic.com/guide/search-dev/cts_query#id_15151
HTH!
I am running ft:query on a collection stored in eXist-db, but it returns zero results. If I use the fn:contains function it works perfectly, but ft:query returns nothing. Below are my XML structure, index configuration file, and query:
test.xml
<article xmlns="http://www.rsc.org/schema/rscart38"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
type="ART"
xsi:schemaLocation="http://www.rsc.org/schema/rscart38 http://www.rsc.org/schema/rscart38/rscart38.xsd" dtd="RSCART3.8">
<metainfo last-modified="2012-11-23T19:16:50.023Z">
<subsyear>1997</subsyear>
<collectiontype>rscart</collectiontype>
<collectionname>journals</collectionname>
<docid>A605867A</docid>
<doctitle>NMR studies on hydrophobic interactions in solution Part
2.—Temperature and urea effect on
the self-association of ethanol in water</doctitle>
<summary/>
</metainfo>
</article>
collection.xconf
<collection xmlns="http://exist-db.org/collection-config/1.0">
<index rsc="http://www.rsc.org/schema/rscart38"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
type="ART"
xsi:schemaLocation="http://www.rsc.org/schema/rscart38 http://www.rsc.org/schema/rscart38/rscart38.xsd"
dtd="RSCART3.8">
<fulltext default="all" attributes="false"/>
<lucene>
<analyzer id="nosw" class="org.apache.lucene.analysis.standard.StandardAnalyzer">
<param name="stopwords" type="org.apache.lucene.analysis.util.CharArraySet"/>
</analyzer>
<text qname="//rsc:article" analyzer="nosw"/>
</lucene>
<create path="//rsc:doctitle" type="xs:string"/>
<create path="//rsc:journal-full-title" type="xs:string"/>
<create path="//rsc:journal-full-title" type="xs:string"/>
</index>
</collection>
test.xq
declare namespace rsc="http://www.rsc.org/schema/rscart38";
let $coll := collection('/db/apps/test/RSC')
let $hits := $coll//rsc:doctitle[ft:query(., 'studies')]
return
$hits
Let's start from your query. The key part of your query is:
$coll//rsc:doctitle[ft:query(., 'studies')]
This performs a full text query for the string studies on rsc:doctitle elements in the collection. For this ft:query() function to work, there must be an index configuration for the named elements. This brings us to your index configuration.
In your index configuration, you have a full text (Lucene) index:
<text qname="//rsc:article" analyzer="nosw"/>
A couple of issues:
The @qname attribute should be a QName: simply an element or attribute name. You've expressed this as a path. Remove the leading //, leaving just rsc:article.
Your code does a full text query on rsc:doctitle, not on rsc:article, so I would expect your code, as written, to return 0 results. Change the existing index to rsc:doctitle, or add a new index on rsc:doctitle so that you could query either one. Reindex the collection afterwards, and as Adam suggested, check the Monex app's Indexing pane to ensure that the database has applied your index configuration as expected.
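Putting those two fixes together, a collection.xconf along these lines might work. This is only a minimal sketch: it assumes the default Lucene analyzer is acceptable (the custom nosw analyzer is left out), and it declares the rsc prefix with xmlns:rsc so that the qname values can be resolved. The config in the question writes rsc="..." instead, which is an ordinary attribute and does not declare a namespace prefix.

```xml
<collection xmlns="http://exist-db.org/collection-config/1.0">
    <!-- xmlns:rsc binds the prefix used in the qname attributes below. -->
    <index xmlns:rsc="http://www.rsc.org/schema/rscart38">
        <lucene>
            <!-- Index the element the query actually searches. -->
            <text qname="rsc:doctitle"/>
            <!-- Optional: keep an index on the whole article as well. -->
            <text qname="rsc:article"/>
        </lucene>
    </index>
</collection>
```

After storing this in the matching location under /db/system/config, reindex the collection so the new definition is applied to documents that are already loaded.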
Lastly, contains() does not require an index to be in place. It benefits from the presence of a range index (i.e., your <create> elements), but range indexes are quite different from full text indexes. To learn more about these, I'd suggest reading the eXist documentation on indexing, http://exist-db.org/exist/apps/doc/indexing.xml.
I am not certain whether configuring a Standard Analyzer without stopwords in the way you have done is correct. Can you check with Monex that your index has your terms in it?
Note also that if you created the index configuration after loading the documents, you need to reindex the collection. When you reindex, it is also worth monitoring $EXIST_HOME/webapp/WEB-INF/exist.log to ensure that the indexing is done as expected.
Reading the docs at http://exist-db.org/exist/apps/doc/indexing.xml, I'm finding it difficult to understand how, and whether, I can improve the performance of a 'read' query (with 2 parameters: a string and an integer).
Does eXist-db have a default structural index? Can I improve a 2-parameter query with a 'range index'?
More details about my XML db (note there are 2 different dbs simply merged on the same root):
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<db>
<docs>
<doc>
<header>
<year>2001</year>
<number>1</number>
<type>O</type>
</header>
<metas>
<meta>
<number>26001</number>
<details>
<detail>
<description>legge</description>
<number>19</number>
<date>14/01/1994</date>
</detail>
<detail>
<description>decreto legge</description>
<number>453</number>
<date>15/11/1993</date>
</detail>
</details>
</meta>
</metas>
</doc>
<doc>
<header>
<year>2001</year>
<number>2</number>
<type>O</type>
</header>
<metas>
<meta>
<number>26002</number>
<details>
<detail>
<description>decreto legislativo</description>
<number>29</number>
<date>03/02/1993</date>
</detail>
</details>
</meta>
<meta>
<number>26016</number>
<details>
<detail>
<description>decreto legislativo</description>
<number>29</number>
<date>03/02/1993</date>
</detail>
</details>
</meta>
</metas>
</doc>
</docs>
<full_text_docs>
<doc>
<header>
<year>2001</year>
<number>1</number>
<type>O</type>
<president>ferrari</president>
</header>
<text>lorem ipsum ...
</text>
</doc>
<doc>
<header>
<year>2001</year>
<number>2</number>
<type>O</type>
<president>ferrari</president>
</header>
<text>lorem ipsum......
</text>
</doc>
</full_text_docs>
</db>
This is my xquery
xquery version "3.0";
let $doc := doc("/db//index_test/test_general.xml")//db/docs/doc
let $fulltxt := doc("/db//index_test/test_general.xml")//db/full_text_docs/doc
return <root> {
for $a in $doc[metas/meta/details/detail[date="03/02/1993" and number = "29"]]/header
return $fulltxt[header/year/text()=$a/year/text() and
header/number/text()=$a/number/text() and
header/type/text()=$a/type/text()
]
} </root>
Basically, I find the entries whose detail/number and detail/date match the input in the first db, and use the results to query the second db. The result is all the matching <doc> elements under <full_text_docs>.
I would like to know if I can create indexes for the number and date fields to improve performance. Note that this is the ONLY query I need to optimize (the only one I run on this db); obviously the number and date values change :).
SOLUTION:
For a clear explanation, read joewiz's answer. My problem was getting the .xconf file recognised correctly. It has to be placed in /db/yourcollectiondir. If you're using eXide, select the XML type with the template "eXist-db collection configuration" when you create the file. When you save the file you will see the prompt "Apply configuration?"; click 'OK'. Only then run this XQuery: xmldb:reindex('/db/yourcollectiondir').
Now, if everything is set up correctly, when you run an XQuery that involves an index you will see the index usage in "Monitoring and profiling".
As that documentation page states, eXist does create a structural index for all XML stored in the database. This is not an index of values, though, so without further indexes, queries based on value (rather than structure) would involve a lookup of values in the DOM. As your data grows larger, looking up values in the DOM gets slower and slower. This is where value-based indexes, such as a range index, save the day. (For a fuller explanation, see the "Indexing" section of Wolfgang Meier's "Tuning the Database" article, which is essential for getting the most performance out of eXist.)
So, yes, you can create indexes for the <number> and <date> fields. I'd recommend the "new range" index, as described on that documentation page. Your collection.xconf file setting up these indexes would look like this:
<collection xmlns="http://exist-db.org/collection-config/1.0"
xmlns:xs="http://www.w3.org/2001/XMLSchema">
<index>
<range>
<create qname="number" type="xs:integer"/>
<create qname="date" type="xs:string"/>
</range>
</index>
</collection>
You have to store this within the /db/system/config/ collection, in a subcollection corresponding to the location of your data in the database. So if your data is located in /db/apps/myapp/data, you would place this collection.xconf file in /db/system/config/db/apps/myapp/data.
Note that the configuration here would only affect the for clause's queries of date and number values, and not the predicates in the return clause, which also depend on the values of the <year> and <type> elements. So, to ensure your query maximizes the use of indexes, you should declare indexes on these as well; xs:integer seems appropriate for <year>, while <type> holds values such as O in your sample, so xs:string would fit it better.
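For illustration, an extended collection.xconf covering those extra comparisons might look like the sketch below. The types are assumptions based on the sample data only: <year> looks numeric, while <type> contains values like O, so it is declared as xs:string here.

```xml
<collection xmlns="http://exist-db.org/collection-config/1.0"
            xmlns:xs="http://www.w3.org/2001/XMLSchema">
    <index>
        <range>
            <!-- Used by the detail[date = ... and number = ...] predicate. -->
            <create qname="number" type="xs:integer"/>
            <create qname="date" type="xs:string"/>
            <!-- Used by the header comparisons in the return clause. -->
            <create qname="year" type="xs:integer"/>
            <create qname="type" type="xs:string"/>
        </range>
    </index>
</collection>
```

As with any change to collection.xconf, the collection needs to be reindexed before the new definitions take effect.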
Lastly, I would suggest eliminating the /text() steps, which are completely extraneous. For more on the use/abuse of text(), see Evan Lenz's article, "text() is a code smell".
Update (2016-07-17): With the updated code sample above, I have a couple of additional suggestions. First, since the code is in /db/index_test, we will store our files as follows:
Assuming you're using eXide, when you store the collection.xconf file in a collection, eXide will prompt you to have a copy of the file placed in the correct location in /db/system/config. If you're not using eXide, you need to store the collection.xconf file there yourself.
Using the unmodified query, I can confirm that despite the presence of the collection.xconf file, monex shows no indexes are being applied:
Let's make a few modifications to the query to ensure indexes are properly applied:
xquery version "3.0";
<root> {
for $a in doc("/db/index_test/test_general.xml")//detail[date = "03/02/1993" and number = 29]/ancestor::doc/header
return
doc("/db/index_test/test_general.xml")/db/full_text_docs/doc
[
header/year = $a/year and
header/number = $a/number and
header/type = $a/type
]
} </root>
With these modifications, monex shows that indexes are applied to the comparisons in the for clause:
The insights here are derived from the "Tuning the Database" article. To get full indexing for all comparisons, you will need to define additional indexes and may need to make similar modifications to your query.
One final note: the version of monex you see in these pictures is using a feature I added this weekend, called "Tare", which tries to filter out other operations from the query profiling results in order to help the user see just the effects of their own query. This feature is still just a pull request, so running the current release version, you won't see identical results.
I have an MS Access database where Saved Imports (on the External Data tab) contains import jobs that bring data from various locations into tables. I can't tell which table each job imports into, because the names given to the imports are unclear and unrelated to the tables. Is there any way to find out which table each import actually writes to?
The items that appear when you click "Saved Imports" on the "External Data" tab are stored as ImportExportSpecification objects in the CurrentProject.ImportExportSpecifications collection. Each object has a .Name property and an .XML property (among others). The details of the import operation are in the XML data, for example
<?xml version="1.0"?>
<ImportExportSpecification Path="C:\Users\Public\zzz.csv" xmlns="urn:www.microsoft.com/office/access/imexspec">
<ImportText TextFormat="Delimited" FirstRowHasNames="false" FieldDelimiter="," TextDelimiter="" CodePage="437" Destination="MyNewTable">
<DateFormat DateOrder="YMD" DateDelimiter="-" TimeDelimiter=":" FourYearDates="true" DatesLeadingZeros="false"/>
<NumberFormat DecimalSymbol="."/>
<Columns PrimaryKey="id">
<Column Name="Col1" FieldName="id" Indexed="YESDUPLICATES" SkipColumn="false" DataType="Long" Width="2"/>
<Column Name="Col2" FieldName="textfield" Indexed="NO" SkipColumn="false" DataType="Text" Width="4"/>
</Columns>
</ImportText>
</ImportExportSpecification>
The Path= attribute of the <ImportExportSpecification> element indicates the location of the file to be imported.
The Destination= attribute of the <ImportText> element specifies the name of the table into which the data will be imported.
I have two test groups which are dependent on another group.
<dependencies>
<group name="search" depends-on="login" />
<group name="addnew" depends-on="login" />
</dependencies>
Which one out of the two groups (search, addnew) should ideally get executed first? For me, the group addnew is getting executed first all the time, which I don't want to happen. I want search to get executed and then addnew to get executed, once login is done. Also, I have set "preserve-order" for the test as true. Any suggestions?
If you want search to be executed first, then addnew effectively depends on the search group as well. You can specify a list of groups in depends-on. Try depends-on="login search" on addnew, or let search depend on login and make addnew depend on search, to guarantee the execution order.
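Using the group names from the question, the first option could be sketched like this (depends-on takes a space-separated list of group names):

```xml
<dependencies>
  <!-- search runs once login has finished. -->
  <group name="search" depends-on="login" />
  <!-- addnew waits for both login and search. -->
  <group name="addnew" depends-on="login search" />
</dependencies>
```

Since search itself already depends on login, depends-on="search" alone on addnew should give the same ordering.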
Quote from documentation: "By default, TestNG will run your tests in the order they are found in the XML file. If you want the classes and methods listed in this file to be run in an unpredictible order, set the preserve-order attribute to false:"
<test name="Regression1" preserve-order="false">
<class name="test.Test1">
<methods>
<include name="m1" />
<include name="m2" />
</methods>
</class>
<class name="test.Test2" />