XProc: p:xquery with multiple input documents

According to the XProc W3C Recommendation, p:xquery gets only one input document plus parameters (which can only be atomic, right?):
<p:declare-step type="p:xquery">
<p:input port="source" sequence="true" primary="true"/>
<p:input port="query"/>
<p:input port="parameters" kind="parameter"/>
<p:output port="result" sequence="true"/>
</p:declare-step>
If my query has multiple input documents (from previous steps), do I really have to store them first and load them inside the query?

No. As the syntax description you quote makes clear, the source port is declared with sequence="true", which means the step may receive a sequence of documents on that port, not just one.
So you do not have to store the documents and then load them inside the query; just feed them into the p:xquery step's source port as a sequence of documents.
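As a rough sketch of the query side: in XProc 1.0 the first document on the source port becomes the initial context item, and the whole sequence is also available as the default collection. Under that assumption, a query supplied on the query port could walk over every input document like this. The merged element and its count attribute are made up for illustration and are not part of the original question.

```xquery
(: Hypothetical query body for p:xquery.
   collection() with no argument returns the default collection,
   i.e. every document that arrived on the source port. :)
<merged count="{count(collection())}">{
  for $doc in collection()
  return $doc/*   (: copy the root element of each input document :)
}</merged>
```

If the query is embedded in a c:query element inside the pipeline, the angle brackets have to be escaped or placed in a CDATA section so that they are read as query text rather than as pipeline markup.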

Related

How to replace &lt; and &gt; in BizTalk message?

I am new to BizTalk and I need to read some values from a SQL Server table. An example of the result set I am getting is the following:
<SelectResponse
xmlns="http://schemas.microsoft.com/Sql/2008/05/TableOp/dbo/tableName">
<SelectResult>
<tableName xmlns="http://schemas.microsoft.com/Sql/2008/05/Types/Tables/dbo">
<Message> &lt;item_1&gt; item_1Value &lt;/item_1&gt;
&lt;item_2&gt; item_2Value &lt;/item_2&gt;
&lt;item_3&gt; item_3Value &lt;/item_3&gt;
&lt;item_n&gt; item_3Value &lt;/item_n&gt; </Message>
</tableName>
</SelectResult>
</SelectResponse>
So I get my message in BizTalk (the schema is auto-generated from SQL Adapter). What I want is the following:
<SelectResponse
xmlns="http://schemas.microsoft.com/Sql/2008/05/TableOp/dbo/tableName">
<SelectResult>
<tableName xmlns="http://schemas.microsoft.com/Sql/2008/05/Types/Tables/dbo">
<Message>
<item_1> item_1Value </item_1>
<item_2> item_2Value </item_2>
<item_3> item_3Value </item_3>
<item_n> item_3Value </item_n>
</Message>
</tableName>
</SelectResult>
</SelectResponse>
I have the new schema (for item_1, item_2, ...). Considering that <Message> can appear multiple times inside the BizTalk message, what is the easiest way to get what I need, and how can I do that? Thanks.
The most likely reason you are seeing this is that the item XML content is stored within another XML structure, Message. XML stored within XML is escaped, so this isn't an actual problem; it's the expected behavior.
You have several options including:
Use a Stored Procedure that loads and handles the item Xml as Xml and not a string in the result.
In an Orchestration, extract the item Xml, reformat to include a root element and create a new Message based on that.
The problem is how this data has been stored, not how SQL Server is returning it or how BizTalk is presenting it.

Sequence input to <p:filter> in XProc 1.0?

Is <p:filter> in XProc able to accept a sequence of documents as input? When I feed Calabash the following:
<?xml version="1.0" encoding="UTF-8"?>
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" xmlns:c="http://www.w3.org/ns/xproc-step"
version="1.0">
<p:input port="source" sequence="true">
<p:inline>
<doc>
<content>Hello world!</content>
</doc>
</p:inline>
<p:inline>
<doc>
<content>Goodbye world!</content>
</doc>
</p:inline>
</p:input>
<p:output port="result" sequence="true"/>
<p:filter select="//content">
<p:input port="source" sequence="true"/>
</p:filter>
</p:declare-step>
it raises the following error:
err:XD0006 : 2 documents appear on the 'source' port. If sequence is not specified, or has the value false, then it is a dynamic error unless exactly one document appears on the declared port.
@sequence is specified, and with the value "true". If I remove the second inline document from the input, the processing runs to completion successfully. And if I leave the two inputs but replace <p:filter> with something else that accepts a sequence, like <p:count>, it also runs to completion successfully.
I’m confused because the error message doesn’t say that <p:filter> cannot accept a sequence; it tells me to specify a sequence, and I’ve done that. And since XPath filtering can be applied to an XPath collection() function, it isn't clear (well, to me) why it shouldn’t be possible, at least in principle, to filter a sequence of documents in XProc.
I’m also not sure how to read the spec, which says about <p:filter> that:
This step behaves just like an p:input with a select expression except that the
select expression is computed dynamically.
Since <p:input> can accept a sequence, if <p:filter> is said to behave the same way except for filtering, that would seem to imply that <p:filter> should also be able to accept a sequence.
I think the options are:
1. <p:filter> accepts multiple inputs but I haven’t specified that correctly.
2. <p:filter> does not accept multiple inputs and either the error message and spec are misleading or I’ve failed to understand them correctly.
I’m happy (well, willing) to plead guilty to user error in either case, but I’d be grateful for clarification.
And yes, I can work around the problem by using <p:wrap-sequence> to form the multiple inputs into a single XML tree, but my question is about how <p:filter> works, and not about how to get a specific result. In my actual code it takes 1.5 seconds to read and pass along my real input documents and 4.5 seconds if I add the step of wrapping them, and I’d like to save the 3 seconds, especially because the wrapping would be an ephemeral work-around, since I’m just going to extract content and wind up with multiple documents after the filtering step anyway.
The XProc Recommendation gives the following step definition in section 7.1.9, p:filter:
<p:declare-step type="p:filter">
<p:input port="source"/>
<p:output port="result" sequence="true"/>
<p:option name="select" required="true"/> <!-- XPathExpression -->
</p:declare-step>
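Notice that the source port is not declared with sequence="true", so the second option you mentioned above is the right one: <p:filter> accepts exactly one document.
As a workaround, you can indeed use <p:wrap-sequence>. Alternatively, placing the <p:filter> inside a <p:for-each> passes the documents to it one at a time, which avoids the wrapping step.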

Access the HTTP Response from xdmp:http-get()

Using MarkLogic to pull in data from a web service with xdmp:http-get() or xdmp:http-post(), I'd like to be able to check the headers that come back before I attempt to process the data. In DQ I can do this:
let $result := xdmp:http-get($query,$options) (: $query and $options are fine, I promise. :)
return $result
And the result I get back looks like this:
<v:results v:warning="more than one node">
<response>
<code>200</code>
<message>OK</message>
<headers>
<server>(actual server data was here)</server>
<date>Thu, 07 Jun 2012 16:53:24 GMT</date>
<content-type>application/xml;charset=UTF-8</content-type>
<content-length>2296</content-length>
<connection>close</connection>
</headers>
</response>
followed by the actual response. The problem is that I can't seem to XPath into this response node. If I change my return statement to return $result/response/code, I get the empty sequence. If I could check that code to make sure I got a 200 back before attempting to process the actual data that came back, it would be much better than using try-catch blocks to see if the data exists and is sane.
So, if anyone knows how to access those response codes I would love to see your solution.
For the record, I have tried xdmp:get-response-code(), but it doesn't take any parameters, so I don't know what response code it's looking at.
You're getting burned by two gotchas at once:
awareness of namespaces
awareness of document nodes
First, the namespace. The XML output of the http-get function is in a namespace as seen by the top-level element:
<response xmlns="xdmp:http-get">
To successfully access elements in that namespace, you need to declare a prefix in your query bound to the correct namespace, and then use that prefix in your XPath expressions. For example:
declare namespace h="xdmp:http-get";
//h:code
Now let's talk about document nodes. :-)
You're trying to access $result as if it is a document node containing an element, but in actuality, it is a sequence of two root nodes (so they're not siblings either). The first one (the one you're interested in here) is a parentless <response> element—not a document containing a <response> element.
This is a common gotcha: knowing when a document node is present or not. Document nodes are always invisible when serialized (hence the gotcha), and they're always present on documents stored in the database. However, when you just use a bare element constructor in XQuery (as the http-get implementation does), you construct not a document node but an element node without a document node parent.
For example, the following query will return the empty sequence, because it's trying to get the <foo> child of <foo>:
declare variable $foo := <foo>bar</foo>;
$foo/foo
On the other hand, the following does return <foo>, because it's getting the <foo> child of the document node (which has to be explicitly constructed, in XQuery):
declare variable $doc := document{ <foo>bar</foo> };
$doc/foo
So you have to know how a given function's API is designed (whether it returns a document containing an element or just an element).
To solve your problem, don't try to access $result/h:response/h:code (which is trying to get the <response> child of <response>). Instead, access $result/h:code (or more precisely $result[1]/h:code, since <response> is the first of a sequence of two nodes returned by the http-get function).
For more information on document nodes, check out this blog article series: http://community.marklogic.com/blog/document-formats-part1
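Putting the two points together, a minimal sketch of a status check might look like the following. It assumes, as described above, that the first item returned is the parentless response element and the second is the payload. The URL and the HTTP-ERROR error name are placeholders chosen for illustration.

```xquery
xquery version "1.0-ml";
declare namespace h = "xdmp:http-get";

(: Placeholder URL; substitute the real request and options. :)
let $result   := xdmp:http-get("http://example.com/service")
let $response := $result[1]   (: parentless h:response element :)
let $payload  := $result[2]   (: the actual response body :)
return
  if (xs:integer($response/h:code) eq 200)
  then $payload
  else fn:error(
    xs:QName("HTTP-ERROR"),   (: hypothetical error name :)
    fn:concat("Unexpected status ", $response/h:code, ": ", $response/h:message)
  )
```

Binding $result[1] and $result[2] to separate variables keeps the header check and the payload processing apart, so the payload is only touched once the status code has been confirmed.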

Adobe Air - User preferences XML

I need to create and read a user preferences XML file with Adobe Air. It will contain around 30 nodes.
<id>18981</id>
<firstrun>false</firstrun>
<background>green</background>
<username>stacker</username>
...
What's a good method to do this?
Write an "XML parser" that reads the values and knows which data types to convert them to, based on a "save preferences model." In other words, write one method/class that writes the data from the model to XML and another that reads the XML back into the model; you can use describeType for both. describeType returns an XML description of the model class's properties, including their types and accessibility (read/write, read-only, write-only). When writing, store every read/write property in the XML output. When reading, do the same in reverse, using the type property from the describeType output to decide whether a value needs a string-to-Boolean conversion (if (boolValue == "true")) or a string-to-number conversion (parseInt or parseFloat). You could ultimately store the XML in a local SQL database if you want to keep history, or else just store the current preferences in a flat file (using FileReference, or in AIR you can use FileStream to write directly to a location).
Edit:
I agree with Joshua's comment below: local shared objects were the first thing I thought of when I saw this. They eliminate the need to write the XML parser/reader, since they handle serializing and deserializing the objects for you (though inspecting the LSO by hand is probably ugly). Anyhow, I did something similar for another project of mine and tried stripping out the relevant code. Note that my example here doesn't use describeType, but the general concept is the same:
http://shaunhusain.com/OnePageSaverLoader/index.php

XQuery - remove nodes based on its sub element being in the "ban" list

I am a total noob with XQuery, but before I start digging deep into it, I'd like to ask for some expert advice about whether I am looking in the right direction.
The problem:
I have a huge XML file that contains a whole lot of users and their access information (password, access rights, and so on). Example below:
<user>
<name>JC1234</name>
<password>popstar</password>
<accesslevel>0</accesslevel>
</user>
<user>
<name>AHkl</name>
<password>Rudy10!</password>
<accesslevel>2</accesslevel>
</user>
I have a list of user names (a CSV file) that I need to remove from that huge XML file.
The result should be a new XML file without those removed users.
Is this feasible with XQuery?
Any advice for a quick and dirty solution is welcome!
There is no standard way of loading a CSV file in vanilla XQuery 1.0, although most implementations have an unparsed-text function or similar. If yours does not, the contents of the file can be passed in as a parameter.
The CSV file can be parsed using the tokenize function:
declare variable $names := tokenize(unparsed-text("banned.csv"), ",");
And the actual query is quite straightforward. Assuming your document is a fragment containing just a list of <user /> nodes, the query is simply
doc("users.xml")/user[not(name=$names)]
If, however, the XML file contains a lot of other data, then you may find XSLT's templating facilities more useful.
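For a complete, well-formed input and output, the two pieces above could be combined roughly as follows. This is only a sketch under several assumptions: the processor supports unparsed-text (standard from XQuery 3.0 on), the users sit under a single root element here called users, and the CSV may separate names with commas or line breaks. The file names and the users wrapper are illustrative.

```xquery
xquery version "3.0";

(: Split the CSV on commas and line breaks, trim each entry,
   and drop empty tokens such as a trailing blank line. :)
declare variable $names as xs:string* :=
  for $token in tokenize(unparsed-text("banned.csv"), "[,\r\n]+")
  let $name := normalize-space($token)
  where $name ne ""
  return $name;

(: Rebuild the document, keeping only users whose name is not banned. :)
<users>{
  doc("users.xml")/users/user[not(normalize-space(name) = $names)]
}</users>
```

The general comparison name = $names is true when the name matches any entry in the list, so wrapping it in not() keeps exactly the users that match none of them.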

Resources