Sequence input to <p:filter> in XProc 1.0? - xproc

Is <p:filter> in XProc able to accept a sequence of documents as input? When I feed Calabash the following:
<?xml version="1.0" encoding="UTF-8"?>
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" xmlns:c="http://www.w3.org/ns/xproc-step"
version="1.0">
<p:input port="source" sequence="true">
<p:inline>
<doc>
<content>Hello world!</content>
</doc>
</p:inline>
<p:inline>
<doc>
<content>Goodbye world!</content>
</doc>
</p:inline>
</p:input>
<p:output port="result" sequence="true"/>
<p:filter select="//content">
<p:input port="source" sequence="true"/>
</p:filter>
</p:declare-step>
it raises the following error:
err:XD0006 : 2 documents appear on the 'source' port. If sequence is not specified, or has the value false, then it is a dynamic error unless exactly one document appears on the declared port.
The sequence attribute is specified, and with the value "true". If I remove the second inline document from the input, the processing runs to completion successfully. And if I leave the two inputs but replace <p:filter> with something else that accepts a sequence, like <p:count>, it also runs to completion successfully.
I’m confused because the error message doesn’t say that <p:filter> cannot accept a sequence; it tells me to specify a sequence, and I’ve done that. And since XPath filtering can be applied to an XPath collection() function, it isn't clear (well, to me) why it shouldn’t be possible, at least in principle, to filter a sequence of documents in XProc.
I’m also not sure how to read the spec, which says about <p:filter> that:
This step behaves just like an p:input with a select expression except that the
select expression is computed dynamically.
Since <p:input> can accept a sequence, if <p:filter> is said to behave the same way except for filtering, that would seem to imply that <p:filter> should also be able to accept a sequence.
I think the options are:
1. <p:filter> accepts multiple inputs but I haven’t specified that correctly.
2. <p:filter> does not accept multiple inputs and either the error message and spec are misleading or I’ve failed to understand them correctly.
I’m happy (well, willing) to plead guilty to user error in either case, but I’d be grateful for clarification.
And yes, I can work around the problem by using <p:wrap-sequence> to form the multiple inputs into a single XML tree, but my question is about how <p:filter> works, and not about how to get a specific result. In my actual code it takes 1.5 seconds to read and pass along my real input documents and 4.5 seconds if I add the step of wrapping them, and I’d like to save the 3 seconds, especially because the wrapping would be an ephemeral work-around, since I’m just going to extract content and wind up with multiple documents after the filtering step anyway.

The XProc 1.0 Recommendation gives the following step declaration in section 7.1.9, p:filter:
<p:declare-step type="p:filter">
<p:input port="source"/>
<p:output port="result" sequence="true"/>
<p:option name="select" required="true"/> <!-- XPathExpression -->
</p:declare-step>
Notice that the source port is not declared with sequence="true", so the second option you mentioned above is the right one: <p:filter> does not accept a sequence on its source port.
As a workaround, you can indeed use <p:wrap-sequence>.
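If the cost of wrapping matters, another approach that may be worth trying is to iterate over the sequence with <p:for-each>, so that <p:filter> receives exactly one document per iteration. The following is a minimal sketch, not tested against Calabash; it assumes the documents arrive on the pipeline's source port as in the question, and it relies on the default port bindings rather than explicit <p:pipe> connections.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="1.0">
  <!-- The pipeline itself may receive any number of documents. -->
  <p:input port="source" sequence="true"/>
  <p:output port="result" sequence="true"/>

  <!-- p:for-each reads the default readable port (the pipeline's source)
       and runs its subpipeline once per document. -->
  <p:for-each>
    <!-- Inside the loop, p:filter sees a single document on its source port,
         which satisfies its declaration (no sequence="true"). -->
    <p:filter select="//content"/>
  </p:for-each>
</p:declare-step>
```

The results of all iterations are concatenated on the result port, so the output is still a sequence of <content> documents and no wrapper element is created.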

Related

getopt_long confusion, flag setting with single-char switches

I'm parsing command line options using getopt_long based on the example from the man page. That example does something a bit sneaky, it includes two flag-setting options in long_options but does not list those in the short form parameter string in getopt_long.
Poking about, this answer clarifies what happens in this case, if they enter the short-form it does not set the flag. And leads to my question: If you have a flag-setting switch, and you want to have both long and short forms for that switch on the command line...
1. Did I miss an option in the short-form string to set flags, sort of like how : works for indicating a parameter follows? Like "v?1" or something?
2. Failing that, is it best practice to not do it in the long form as well, and just return 'v'? Or would "the average code" use the flag-set and have separate code in the switch?
I'd like my code to be "as standard as possible", whatever that means, which is why I'm curious about (2).

xproc: p:xquery with multiple input documents

According to XProc: W3C Recommendation p:xquery gets only one input document and parameters (which can only be atomic, right?)
<p:declare-step type="p:xquery">
<p:input port="source" sequence="true" primary="true"/>
<p:input port="query"/>
<p:input port="parameters" kind="parameter"/>
<p:output port="result" sequence="true"/>
</p:declare-step>
If my query has multiple input documents (from previous steps), do I really have to store them first and load them inside the query?
No, as the syntax description you quote makes clear, the 'source' port has sequence="true", which means that the step may receive a sequence of documents on the source port, not just one.
So no, you do not really have to store them and then load them inside the query; just feed them into the p:xquery step's source port as a sequence of documents.
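As a minimal sketch of what that can look like: as I read the p:xquery section of the XProc 1.0 Recommendation, the first document of the sequence is the initial context item and the whole sequence is the default collection, so the query can reach every input through collection(). The query text and the element name content are illustrative assumptions, and the pipeline has not been run against a specific processor.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:c="http://www.w3.org/ns/xproc-step"
                version="1.0">
  <!-- Documents produced by earlier steps arrive here as a sequence. -->
  <p:input port="source" sequence="true"/>
  <p:output port="result" sequence="true"/>

  <!-- The primary source port of p:xquery is bound to the sequence by default. -->
  <p:xquery>
    <p:input port="query">
      <p:inline>
        <!-- collection() with no argument refers to the default collection. -->
        <c:query>collection()//content</c:query>
      </p:inline>
    </p:input>
    <!-- No external parameters are passed in this sketch. -->
    <p:input port="parameters">
      <p:empty/>
    </p:input>
  </p:xquery>
</p:declare-step>
```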

XML XQuery basic querying results

I'm here with a question that I hope can be answered, which is really quite silly and basic.
I have a file of authors in the format of:
<authorRoot>
<author>
<info tags on author>
</author>
etc
</authorRoot>
and all I wish to do is, through FLWOR, return a list where each 'author' and its information is a different value, so when I run the query, the result should come out looking like
1. <author><info>.....</info></author>
2. <author><info>.....</info></author>
etc
and I am CERTAIN that something as simple as that should just be the following code
xquery version "1.0";
for $x in //author
return $x
yet when I do so, the query result comes out as
1.<author><info>...</info></author><author><info>...</info></author><author><info>...</info></author><author><info>...</info></author><author><info>...</info></author>....etc
I'm relatively new to XQuery, and I'm using AltovaSpy. I've done similar questions as basic as this (where I have a file of similar layout and I use essentially the same code, resulting in an xquery result page of multiple values, not just one long one) but for this file it just doesn't seem to work! Is it something with my code that I'm just not seeing? Or could it be the file, perhaps?
Thank you for whatever input you have on the situation.
Well, your reasoning is correct.
It is just a formatting issue: it seems Altova prints the entire sequence on a single line without line breaks.
You can also try it in my XQuery online tester, there you can see that the sequence is as you expected it to be.
If you watch this demo video of Altova XMLSpy and advance to 2:35 you will see how clicking on one of the toolbar buttons (which appears to be labeled "Pretty-print") will format the results of your XQuery as nicely indented XML.

XML Header Creation

I have a few working XQuery scripts that I would like to use to generate valid XML documents. Therefore I would also like to include an XML header: <?xml version="1.0" encoding="UTF-8"?>. I am aware that this header is optional but I want it to be included regardless, especially to specify the right encoding.
However, I'm at a loss as to how I would insert this header into the output. My editor (XMLSpy) complains on any variation I can think of to insert the header.
According to the few resources I've found on this, it may be that I'm not supposed to generate this header manually, but rather let my serializer do this for me, possibly by setting interpreter options with declare option. I can't find any information on this when it comes to XMLSpy however.
Is there a way to insert this header manually? If not, do I need to modify my interpreter so it is generated automatically?
I think the declare options are implementation specific, so you'd need to look up the correct option for your XQuery processor. For example:
Saxon:
declare option saxon:output "omit-xml-declaration=no";
MarkLogic:
declare option xdmp:output "omit-xml-declaration=no";
However, you should be able to manually output the prolog by prepending this to your output:
<?xml version="1.0" encoding="UTF-8" ?>
Presuming you're using AltovaXML to execute your XQuery, you need to set omitXMLDeclaration to false (the default is true) through whatever interface you're executing it with: command line, Java, or whatever. There does not appear to be an XQuery-level option declaration to set this. The docs at http://manual.altova.com/AltovaXML/altovaxmlcommunity/ show how to do it for each of the various cases.

Access the HTTP Response from xdmp:http-get()

Using MarkLogic to pull in data from a web service with xdmp:http-get() or xdmp:http-post(), I'd like to be able to check the headers that come back before I attempt to process the data. In DQ I can do this:
let $result := xdmp:http-get($query,$options) (: $query and $options are fine, I promise. :)
return $result
And the result I get back looks like this:
<v:results v:warning="more than one node">
<response>
<code>200</code>
<message>OK</message>
<headers>
<server>(actual server data was here)</server>
<date>Thu, 07 Jun 2012 16:53:24 GMT</date>
<content-type>application/xml;charset=UTF-8</content-type>
<content-length>2296</content-length>
<connection>close</connection>
</headers>
</response>
followed by the actual response. The problem is that I can't seem to XPath into this response node. If I change my return statement to return $result/response/code I get the empty sequence. If I could check that code to make sure I got a 200 back before attempting to process the actual data that came back it would be much better than using try-catch blocks to see if the data exists and is sane.
So, if anyone knows how to access those response codes I would love to see your solution.
For the record, I have tried xdmp:get-response-code(), but it doesn't take any parameters, so I don't know what response code it's looking at.
You're getting burned by two gotchas at once:
awareness of namespaces
awareness of document nodes
First, the namespace. The XML output of the http-get function is in a namespace as seen by the top-level element:
<response xmlns="xdmp:http-get">
To successfully access elements in that namespace, you need to declare a prefix in your query bound to the correct namespace, and then use that prefix in your XPath expressions. For example:
declare namespace h="xdmp:http-get";
//h:code
Now let's talk about document nodes. :-)
You're trying to access $result as if it is a document node containing an element, but in actuality, it is a sequence of two root nodes (so they're not siblings either). The first one (the one you're interested in here) is a parentless <response> element—not a document containing a <response> element.
This is a common gotcha: knowing when a document node is present or not. Document nodes are always invisible when serialized (hence the gotcha), and they're always present on documents stored in the database. However, when you just use a bare element constructor in XQuery (as the http-get implementation does), you construct not a document node but an element node without a document node parent.
For example, the following query will return the empty sequence, because it's trying to get the <foo> child of <foo>:
declare variable $foo := <foo>bar</foo>;
$foo/foo
On the other hand, the following does return <foo>, because it's getting the <foo> child of the document node (which has to be explicitly constructed, in XQuery):
declare variable $doc := document{ <foo>bar</foo> };
$doc/foo
So you have to know how a given function's API is designed (whether it returns a document containing an element or just an element).
To solve your problem, don't try to access $result/h:response/h:code (which is trying to get the <response> child of <response>). Instead, access $result/h:code (or more precisely $result[1]/h:code, since <response> is the first of a sequence of two nodes returned by the http-get function).
For more information on document nodes, check out this blog article series: http://community.marklogic.com/blog/document-formats-part1
