XQuery can't access XML HTTP content as XML document - xquery

In HTTP POST requests, I am getting simple XML like this:
<createcoll>
<title>some foo title</title>
<editor>foouser1</editor>
<editor>foouser2</editor>
<editor/>
<indexer>foouser3</indexer>
<indexer>foouser4</indexer>
<indexer/>
</createcoll>
In eXist-db I have an XQuery module which examines the request, extracts the content, and then must transform it into a document to be stored on the server.
Getting the content works without a problem:
let $content := request:get-data()
And I can store it directly on the server:
let $store := xmldb:store("/db/apps/myapp/data","foodoc.xml",$content)
And once it is stored I can query it like an XML document.
However, I need to access $content via XQuery in order to extract and transform data before actually storing it. But, for example, this statement returns nothing:
let $editors := $content//editor
return $editors
I suspect XQuery doesn't "see" it yet as an XML document? How do I get it to see it as such?
(The document has no namespaces.)
Many thanks.

Related

XQuery file returning invalid entity reference using special characters

I have the following query in a MarkLogic XQuery file, and I am seeing the following error message returned
XDMP-ENTITYREF: (err:XPST0003) Invalid entity reference " " . See the MarkLogic server error log for further detail.
The following is the code I am using in the XQuery file.
xquery version "1.0-ml";
declare variable $query :=
cts:or-query
((
cts:element-word-query(xs:QName("lines"),"l&l"),
cts:element-word-query(xs:QName("lines"),"pool & cue"),
cts:element-word-query(xs:QName("lines"),"look")
));
declare function local:do-query(){
element xml {
for $i in cts:uris( (), (), $query)
let $item := doc($i)
return
element item {
element title { $item/title/string() }
}
}
};
local:do-query()
Obviously the two terms I am looking for are l&l and pool & cue. I have also looked into the repair-full suggestion in another question posted, but couldn't figure out how that fits into this query. If I remove the ones with special characters, it works as expected.
Any ideas?
Based on the additional info in the comments to the question, this is not an issue with the execution of the code, but rather with deployment of the code.
This happens often if you insert code using QConsole, or some other way in which you evaluate XQuery code. The &amp; gets interpreted and translated to the & character it represents. If you then write that into a .xqy file in some Modules database, it does not get escaped back into &amp; again, since XQuery files are stored as plain text in MarkLogic, and & doesn't get escaped in plain text.
A better way to deploy code is by uploading or inserting from disk. That way characters like &, >, and { inside XML won't get interpreted, but preserved and inserted as is. There are tools like ml-gradle and Roxy that make deploying MarkLogic code very easy. Consider using these. Alternatively, you could also look into using curl against the Management REST API.
If you want to use QConsole after all, escape characters like & twice. E.g. &amp; becomes &amp;amp;, and &lt; becomes &amp;lt;.
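For illustration, here is a minimal sketch of what the deployed module has to contain: in an XQuery string literal a bare & is a syntax error, so each ampersand is written as &amp;. The variable name $terms is made up for this example, and the sketch uses plain XQuery rather than the cts: functions from the question.

```xquery
xquery version "1.0";

(: Hypothetical example: $terms is not part of the original query. :)
(: Inside a string literal, &amp; is the predefined entity reference for "&". :)
let $terms := ("l&amp;l", "pool &amp; cue", "look")
for $t in $terms
return concat($t, " has ", string-length($t), " characters")
```

If the file on the server shows a bare & where the source had &amp;, the deployment step has unescaped it, which matches the error in the question.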
HTH!

parse escaped HTML into node in xqilla

I'm trying to get text from an rss 2.0 feed (description tag) using XQilla. The address is here. This is fine but the tag contains escaped HTML like
"<a href="some_address>..."
It would be useful to have this HTML in a node and further work with it, but I am at a loss here. I can get the tag contents with
let $desc := $item/*[name()='description']
but do not know how to unescape it. I tried parse-html, which only strips the text of tags and returns a string, like the data() function. Searching on the web suggests that extension functions exist for this, but in other parsers. Is there a way to do it in XQilla? By the way, the code I am working on is a JAWS ResearchIt lookup source.
XQilla has – like lots of other XQuery implementations – a proprietary function to load XML and HTML from a string (they don't have anchor tags, thus you need to scroll through the document, I'm sorry).
xqilla:parse-xml($xml as xs:string?) as document-node()?
xqilla:parse-html($html as xs:string?) as document-node()?
Given $desc contains the unparsed HTML, xqilla:parse-html($desc) will return the parse result.
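As a rough sketch of that call (assuming the xqilla prefix is predeclared by the processor, and using a made-up description string in place of the real feed), the escaped markup can be parsed and then queried like any other node tree. Matching on local-name() is a precaution in case the parser places the elements in the XHTML namespace.

```xquery
(: Hypothetical stand-in for string($item/*[name()='description']). :)
(: The entity references in the literal expand to real markup characters, :)
(: so $desc holds the string <a href="http://example.com/">link</a>. :)
let $desc := "&lt;a href=&quot;http://example.com/&quot;&gt;link&lt;/a&gt;"
(: Parse the string into a document node using XQilla's extension function. :)
let $html := xqilla:parse-html($desc)
(: Select the href of every anchor, regardless of namespace. :)
return $html//*[local-name() = 'a']/@href/string()
```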

Access the HTTP Response from xdmp:http-get()

Using MarkLogic to pull in data from a web service with xdmp:http-get() or xdmp:http-post(), I'd like to be able to check the headers that come back before I attempt to process the data. In DQ I can do this:
let $result := xdmp:http-get($query,$options) (: $query and $options are fine, I promise. :)
return $result
And the result I get back looks like this:
<v:results v:warning="more than one node">
<response>
<code>200</code>
<message>OK</message>
<headers>
<server>(actual server data was here)</server>
<date>Thu, 07 Jun 2012 16:53:24 GMT</date>
<content-type>application/xml;charset=UTF-8</content-type>
<content-length>2296</content-length>
<connection>close</connection>
</headers>
</response>
followed by the actual response. The problem is that I can't seem to XPath into this response node. If I change my return statement to return $result/response/code I get the empty sequence. If I could check that code to make sure I got a 200 back before attempting to process the actual data that came back, it would be much better than using try-catch blocks to see if the data exists and is sane.
So, if anyone knows how to access those response codes I would love to see your solution.
For the record, I have tried xdmp:get-response-code(), but it doesn't take any parameters, so I don't know what response code it's looking at.
You're getting burned by two gotchas at once:
awareness of namespaces
awareness of document nodes
First, the namespace. The XML output of the http-get function is in a namespace as seen by the top-level element:
<response xmlns="xdmp:http-get">
To successfully access elements in that namespace, you need to declare a prefix in your query bound to the correct namespace, and then use that prefix in your XPath expressions. For example:
declare namespace h="xdmp:http-get";
//h:code
Now let's talk about document nodes. :-)
You're trying to access $result as if it is a document node containing an element, but in actuality, it is a sequence of two root nodes (so they're not siblings either). The first one (the one you're interested in here) is a parentless <response> element—not a document containing a <response> element.
This is a common gotcha: knowing when a document node is present or not. Document nodes are always invisible when serialized (hence the gotcha), and they're always present on documents stored in the database. However, when you just use a bare element constructor in XQuery (as the http-get implementation does), you construct not a document node but an element node without a document node parent.
For example, the following query will return the empty sequence, because it's trying to get the <foo> child of <foo>:
declare variable $foo := <foo>bar</foo>;
$foo/foo
On the other hand, the following does return <foo>, because it's getting the <foo> child of the document node (which has to be explicitly constructed, in XQuery):
declare variable $doc := document{ <foo>bar</foo> };
$doc/foo
So you have to know how a given function's API is designed (whether it returns a document containing an element or just an element).
To solve your problem, don't try to access $result/h:response/h:code (which is trying to get the <response> child of <response>). Instead, access $result/h:code (or more precisely $result[1]/h:code, since <response> is the first of a sequence of two nodes returned by the http-get function).
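Putting both points together, a sketch of the status check might look like the following. The URL is a placeholder, and the names $response and $body are chosen for this example only.

```xquery
xquery version "1.0-ml";

(: Bind a prefix to the namespace used by the <response> element. :)
declare namespace h = "xdmp:http-get";

(: Placeholder URL; substitute the real service and options. :)
let $result := xdmp:http-get("http://example.com/service")
(: The first item is the parentless <response> element, the second is the payload. :)
let $response := $result[1]
let $body := $result[2]
return
  if ($response/h:code = 200)
  then $body
  else fn:concat("Request failed: ", $response/h:code, " ", $response/h:message)
```

The path starts at h:code rather than h:response because $response already is the <response> element.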
For more information on document nodes, check out this blog article series: http://community.marklogic.com/blog/document-formats-part1

parsing simple xml with jquery from asp.net webservice

I've been racking my brain over this for a while now and I have no clue what I'm doing wrong.
The scenario is as follows: I'm using swfupload to upload files with a progress bar
via a webservice. The webservice needs to return the name of the generated thumbnail.
This all goes well, and though I prefer to get the returned data in JSON (might change it later in the swfupload js files), the default XML data is fine too.
So when an upload completes, the webservice returns the following XML as expected (note I removed the namespace in the webservice):
<?xml version="1.0" encoding="utf-8"?>
<string>myfile.jpg</string>
Now I want to parse this result with jquery and thought the following would do it:
var xml = response;
alert($(xml).find("string").text());
But I cannot get the string value. I've tried lots of combinations (.html(), .innerhtml(), response.find("string").text()) but nothing seems to work. This is my first time trying to parse XML via jQuery, so maybe I'm doing something fundamentally wrong. The 'response' is populated with the XML.
I hope someone can help me with this.
Thanks for your time.
Kind regards,
Mark
I think $(xml) is looking for a DOM object with a selector that matches the string value of xml, so I guess it's coming back null or empty?
The first plugin mentioned below, xmldom, looks pretty good, but if your returned XML really is as simple as your example above, a bit of string parsing might be quicker, something like:
var start = xml.indexOf('<string>') + 8;
var end = xml.indexOf('</string>');
var resultstring = xml.substring(start, end);
From this answer to this question: How to query an XML string via DOM in jQuery
Quote:
There are 2 ways to approach this.
Convert the XML string to DOM, parse it using this plugin or follow this tutorial
Convert the XML to JSON using this plugin.
jQuery cannot parse XML. If you pass a string full of XML content into the $ function it will typically try to parse it as HTML instead using standard innerHTML. If you really need to parse a string full of XML you will need browser-specific and not-globally-supported methods like new DOMParser and the XMLDOM ActiveXObject, or a plugin that wraps them.
But you almost never need to do this, since an XMLHttpRequest should return a fully-parsed XML DOM in the responseXML property. If your web service is correctly setting a Content-Type response header to tell the browser that what's coming back is XML, then the data argument to your callback function should be an XML Document object and not a string. In that case you should be able to use your example with find() and text() without problems.
If the server side does not return an XML Content-Type header and you're unable to fix that, you can pass the option dataType: 'xml' in the ajax settings as an override.

XQuery: Inserting Nodes

I'm reading in an XML file using XQuery and want to insert several nodes/elements and generate a new XML file. How can I accomplish this?
I've tried using the replace() function, but, it looks like all my XML tags are being stripped when I call doc() to load my document. So calling replace() isn't any good if my XML tags are being removed.
Any help? Are there other technologies I can use?
An extension to the XQuery language allowing updates -- the XQuery Update Facility -- exists to allow documents to be modified.
Inserting a node looks like this:
insert node <foo>bar</foo>
into /bar//baz[id='qux']
Among other engines, this is supported by BaseX.
See http://www.w3.org/TR/xquery-update-10/
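When the goal is a modified copy rather than an in-place change, the same specification also defines a copy/modify/return (transform) expression. A small sketch, using an inline document invented to match the path above and assuming a processor that implements the Update Facility:

```xquery
(: Hypothetical inline data shaped to match /bar//baz[id='qux']. :)
copy $doc := <bar><baz><id>qux</id></baz></bar>
(: The update is applied to the copy only; the source is left untouched. :)
modify insert node <foo>bar</foo> into $doc//baz[id = 'qux']
return $doc
```

The returned copy can then be serialized to a new file.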
replace() is a string operation, so the XML will be converted to a string before replacement.
To create a modified copy of the original file, you can modify an identity transformation which recursively copies the original file to insert the new nodes where required - see the article in the XQuery Wikibook
Alternatively if the file is in an XML database such as eXist, you can use update operations to insert elements in situ.
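The identity-transformation approach can be sketched in plain XQuery 1.0 as below. The function name local:copy-with-insert and the sample <stores> data are invented for illustration; the function copies every node recursively and appends one new element when it reaches the target.

```xquery
xquery version "1.0";

(: Hypothetical helper: recursively copies $node, appending $new to each <stores> element. :)
declare function local:copy-with-insert($node as node(), $new as element()) as node() {
  typeswitch ($node)
    case document-node() return
      document { for $child in $node/node() return local:copy-with-insert($child, $new) }
    case element() return
      element { node-name($node) } {
        $node/@*,
        for $child in $node/node() return local:copy-with-insert($child, $new),
        if (local-name($node) = "stores") then $new else ()
      }
    (: Text, comments and processing instructions are copied unchanged. :)
    default return $node
};

local:copy-with-insert(
  <stores><store><store-number>1</store-number></store></stores>,
  <store><store-number>4</store-number></store>
)
```

In practice the first argument would be doc("...") for the input file, and the condition in the element branch would be adapted to wherever the new nodes belong.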
Using XQuery Scripting you can write programs like this:
variable $stores := doc("stores.xml")/stores;
insert node element store {
element store-number { 4 },
element state { "CA" }
} into $stores;
$stores
You can try this example live at http://www.zorba-xquery.com/html/demo#vpshT+pVURyQSCEOKrFBrF0jyGY=
