Delete all documents matching a query? - xquery

I would like to delete all documents matching some predicates. The query I have come up with is as follows, but nothing is deleted from the database.
I suspect this is because the $doc is set to the XML value of the document rather than the document itself. Can anyone shed any light on this?
xquery version "1.0-ml";
for $doc in cts:search(fn:collection("MYCOLLECTIONNAME")/MyDocumentRoot,
cts:or-query((
cts:element-range-query (xs:QName("MyElement"), "=", "MyElementValue"),
)), "unfiltered" )
return xdmp:document-delete($doc);
The document looks like
<MyDocumentRoot>
<MyElementName>MyElementValue</MyElementName>
</MyDocumentRoot>

You are indeed passing the contents of the documents into xdmp:document-delete instead of their URIs. You could derive the URI using, for instance, fn:base-uri(), but that way every document you want to delete is first retrieved from the database, which is unnecessary.
Instead, enable the URI lexicon and use cts:uris to drive the deletion. It might also be wise to do the deletion in batches of, let's say, 1000 docs.
HTH!

xdmp:document-delete() takes the URI of the document rather than a node. So the simplest fix is to wrap $doc in base-uri():
return xdmp:document-delete(base-uri($doc))
But if you have the URI lexicon enabled, you can write a much faster query like this:
let $query := cts:and-query((
cts:collection-query("MYCOLLECTIONNAME"),
cts:or-query((...)) (: put your full or query here :)
))
for $uri in cts:uris("",(),$query) return xdmp:document-delete($uri)
In the latter case, you avoid having to read each document to get its URI.
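The batching suggested in the earlier answer can be combined with the lexicon approach. A minimal sketch, assuming the URI lexicon is enabled and using the "limit=1000" option of cts:uris as the batch size; the range query is copied from the question and stands in for the full or-query:

```xquery
xquery version "1.0-ml";

(: Assumption: same collection and range query as in the question. :)
let $query := cts:and-query((
  cts:collection-query("MYCOLLECTIONNAME"),
  cts:element-range-query(xs:QName("MyElement"), "=", "MyElementValue")
))
(: "limit=1000" caps the number of URIs returned, so each run
   deletes at most 1000 documents in a single transaction. :)
for $uri in cts:uris((), ("limit=1000"), $query)
return xdmp:document-delete($uri)
```

Rerun the query until no matching URIs remain; each run is its own transaction, which keeps the individual transactions small.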

xdmp:collection-delete("MYCOLLECTIONNAME") is also worth a mention.

On the last line, change to
xdmp:document-delete(fn:base-uri($doc));

Related

In gatling, how do I validate the value of a string extracted via the css check?

I'm writing a Gatling simulation, and I want to verify both that a certain element exists, and that the content of one of its attributes starts with a certain substring. E.g.:
val scn: ScenarioBuilder = scenario("BasicSimulation")
.exec(http("request_1")
.get("/path/to/resource")
.check(
status.is(200),
css("form#name", "action").ofType[String].startsWith(BASE_URL).saveAs("next_url")))
Now, when I add the startsWith above, the compiler reports an error saying that startsWith is not a member of io.gatling.http.check.body.HttpBodyCssCheckBuilder[String]. If I leave the startsWith out, everything works just fine. I know that the expected form element is there, but I can't confirm that its action attribute starts with the correct base.
How can I confirm that the attribute starts with a certain substring?
Refer to this: https://gatling.io/docs/2.3/general/scenario/
I have copied the snippet below from there. Note that it is a session function, so it works on a value already stored in the session:
doIf(session => session("myKey").as[String].startsWith("admin")) {
  // executed if the session value stored in "myKey" starts with "admin"
  exec(http("if true").get("..."))
}
I just had the same problem. I guess one option is to use a validator, but I'm not sure whether you can declare one on the fly to validate against your BASE_URL (the documentation doesn't really give any examples). Alternatively, you can use transform and is.
Could look like this:
css("form#name", "action").transform(_.startsWith(BASE_URL)).is(true)
If you also want to include the saveAs call in one go you could probably also do something like this:
css("form#name", "action").transform(_.substring(0, BASE_URL.length)).is(BASE_URL).saveAs
But that's harder to read. Also I'm not sure what happens when substring throws an exception (like IndexOutOfBounds).

How to remove punctuation from a database in marklogic?

I want to remove punctuation from a database of XML documents in MarkLogic. This is a preprocessing step for machine learning. I'm new to MarkLogic and I don't know how to do that. Is there an XQuery query that could remove punctuation?
To do a mass replacement of all text in the database, and take out punctuation, you could start with something that looks like this code (modified for your needs):
for $doc in cts:search(fn:collection(), ())
for $text in $doc//text()
return xdmp:node-replace($text, text{fn:replace($text, "[\.,;]", "")})
To be honest, that task is much less expensive to do on the source text files themselves - or in MarkLogic by treating the XML as string during the replacement process. Updating nodes one element at a time will be expensive.
Outside of MarkLogic:
use sed, awk, or a similar tool before ingestion.
Inside of MarkLogic (as a trigger, perhaps):
use xdmp:quote to turn the XML into a string, do the replacement in a single pass with fn:replace, and then turn the result back into XML with xdmp:unquote:
let $new-doc := xdmp:unquote(fn:replace(xdmp:quote($doc), "[\.,;]", ""))
Then either store it by replacing the root node with xdmp:node-replace, or store this version as a property. This all depends on whether the original (punctuated) version matters to you. Or perhaps you just want to keep the original and serve this cleansed version back to someone.
In all cases above, you have to make sure that your replacement does not murder your XML. Also, be aware of the options for the functions above (like how CDATA is handled).
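One way the string-level replacement can damage the XML: the pattern "[\.,;]" also strips the semicolon from entity references such as &amp; in the serialized text, so xdmp:unquote may then fail to parse the result. A hedged sketch of the node-level alternative, which only touches text nodes; the URI /example/doc.xml and the broader \p{P} pattern (any Unicode punctuation) are assumptions for illustration:

```xquery
xquery version "1.0-ml";

(: Hypothetical document URI; substitute your own. :)
for $text in fn:doc("/example/doc.xml")//text()
let $clean := fn:replace($text, "\p{P}", "")
where $clean ne fn:string($text)   (: skip text nodes that need no update :)
return
  if ($clean eq "")
  then xdmp:node-delete($text)     (: nothing left once punctuation is gone :)
  else xdmp:node-replace($text, text { $clean })
```

The empty-string branch matters because a text constructor with empty content produces no node at all, which would leave xdmp:node-replace without a replacement node.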
Lastly, "This is for machine learning purposes". You do not elaborate. I think many of us here have a feeling that this solution (cleansing punctuation before insert) rubs against the very grain of MarkLogic - in which you store as-is and then have awesome index, tokenizing, stemming, collation, search support to find and return your data as you need. If you were to elaborate on your use case a bit, you may inspire others to give more MarkLogic-Specific suggestions.
It will work if you use 'punctuation-insensitive' and if required 'diacritic-insensitive' in cts:element-word-query()
I'm not sure if this is what you're asking, but it's technically possible to update every document in the database to remove punctuation; however, it's very expensive and I wouldn't recommend it.
Using built-in search functions, you can probably achieve the same goal without updating your documents by querying with punctuation insensitivity. For example, if you want to select documents with a title matching a string regardless of punctuation:
cts:search(//mydoc,
cts:element-word-query(xs:QName('title'), 'Moby-Dick', 'punctuation-insensitive'))
Or in an existing XQuery:
for $d in $documents
where cts:contains($d,
cts:element-word-query(xs:QName('title'), 'Moby-Dick', 'punctuation-insensitive'))
return $d/summary

How can I find documents accessed to answer my XQuery query?

I have the following objective: I want to find out which documents contain my data when executing any kind of XQuery or XPath expression. In other words, I need every document that provides result data for a given query. I am trying to do this in an eXist-db environment, but I suppose there should be something at the XQuery level.
I found op:context-document() operator which seems to have functionality I want, yet, as an operator it is not available for me. fn:document-uri also does not do the trick, as its $arg must be a document node, otherwise it returns an empty sequence.
Do you have any idea in mind? All the assistance is highly appreciated.
fn:base-uri() may help; it returns the base URI property of a node:
for $d in doc('....')/your[query]
return base-uri($d)
You can also use it to filter your documents for specific types:
collection('/path/to/documents')[ends-with(base-uri(), '.xml')]
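Regarding fn:document-uri() returning the empty sequence for non-document nodes: fn:root() climbs from any result node to its document node first. A small sketch in standard XQuery 1.0; the collection path and the item/price names are hypothetical:

```xquery
xquery version "1.0";

(: Hypothetical collection and query; replace with your own. :)
let $hits := collection('/db/mydata')//item[price > 10]
return
  (: one URI per document that contributed at least one result node :)
  distinct-values(
    for $hit in $hits
    return document-uri(root($hit))
  )
```

This only reports documents whose nodes appear in the result, not documents that were merely read while evaluating the query.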
Use the standard XPath / XQuery function collection() .
For example, using Saxon:
collection('file:///a/b/c/d?select=*.xml')[yourBooleanExpression]
selects the document nodes of all XML documents, residing in the /a/b/c/d directory of the filesystem, that satisfy your criteria (yourBooleanExpression evaluates to true())

Access the HTTP Response from xdmp:http-get()

Using MarkLogic to pull in data from a web service with xdmp:http-get() or xdmp:http-post(), I'd like to be able to check the headers that come back before I attempt to process the data. In DQ I can do this:
let $result := xdmp:http-get($query,$options) (: $query and $options are fine, I promise. :)
return $result
And the result I get back looks like this:
<v:results v:warning="more than one node">
<response>
<code>200</code>
<message>OK</message>
<headers>
<server>(actual server data was here)</server>
<date>Thu, 07 Jun 2012 16:53:24 GMT</date>
<content-type>application/xml;charset=UTF-8</content-type>
<content-length>2296</content-length>
<connection>close</connection>
</headers>
</response>
followed by the actual response. The problem is that I can't seem to XPath into this response node. If I change my return statement to return $result/response/code I get the empty sequence. If I could check that code to make sure I got a 200 back before attempting to process the actual data that came back, it would be much better than using try-catch blocks to see if the data exists and is sane.
So, if anyone knows how to access those response codes I would love to see your solution.
For the record, I have tried xdmp:get-response-code(), but it doesn't take any parameters, so I don't know what response code it's looking at.
You're getting burned by two gotchas at once:
awareness of namespaces
awareness of document nodes
First, the namespace. The XML output of the http-get function is in a namespace as seen by the top-level element:
<response xmlns="xdmp:http-get">
To successfully access elements in that namespace, you need to declare a prefix in your query bound to the correct namespace, and then use that prefix in your XPath expressions. For example:
declare namespace h="xdmp:http-get";
//h:code
Now let's talk about document nodes. :-)
You're trying to access $result as if it is a document node containing an element, but in actuality, it is a sequence of two root nodes (so they're not siblings either). The first one (the one you're interested in here) is a parentless <response> element—not a document containing a <response> element.
This is a common gotcha: knowing when a document node is present or not. Document nodes are always invisible when serialized (hence the gotcha), and they're always present on documents stored in the database. However, when you just use a bare element constructor in XQuery (as the http-get implementation does), you construct not a document node but an element node without a document node parent.
For example, the following query will return the empty sequence, because it's trying to get the <foo> child of <foo>:
declare variable $foo := <foo>bar</foo>;
$foo/foo
On the other hand, the following does return <foo>, because it's getting the <foo> child of the document node (which has to be explicitly constructed, in XQuery):
declare variable $doc := document{ <foo>bar</foo> };
$doc/foo
So you have to know how a given function's API is designed (whether it returns a document containing an element or just an element).
To solve your problem, don't try to access $result/h:response/h:code (which is trying to get the <response> child of <response>). Instead, access $result/h:code (or more precisely $result[1]/h:code, since <response> is the first of a sequence of two nodes returned by the http-get function).
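Putting both points together, a minimal sketch of checking the status before touching the payload. The URL and the error name HTTP-ERROR are placeholders, and the options argument from the question is left out for brevity:

```xquery
xquery version "1.0-ml";
declare namespace h = "xdmp:http-get";

(: Placeholder URL; use your own $query and $options here. :)
let $result   := xdmp:http-get("http://example.com/service")
let $response := $result[1]   (: parentless <response> element :)
let $body     := $result[2]   (: the actual payload :)
return
  if (xs:integer($response/h:code) eq 200)
  then $body
  else fn:error(xs:QName("HTTP-ERROR"),
         fn:concat("Unexpected status: ",
                   $response/h:code, " ", $response/h:message))
```

Headers are reached the same way, for example $response/h:headers/h:content-type.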
For more information on document nodes, check out this blog article series: http://community.marklogic.com/blog/document-formats-part1

XQuery: Inserting Nodes

I'm reading in an XML file using XQuery and want to insert several nodes/elements and generate a new XML file. How can I accomplish this?
I've tried using the replace() function, but, it looks like all my XML tags are being stripped when I call doc() to load my document. So calling replace() isn't any good if my XML tags are being removed.
Any help? Are there other technologies I can use?
An extension to the XQuery language allowing updates -- the XQuery Update Facility -- exists to allow documents to be modified.
Inserting a node looks like this:
insert node <foo>bar</foo>
into /bar//baz[id='qux']
Among other engines, this is supported by BaseX.
See http://www.w3.org/TR/xquery-update-10/
replace() is a string operation, so the XML will be converted to a string before replacement.
To create a modified copy of the original file, you can modify an identity transformation which recursively copies the original file to insert the new nodes where required - see the article in the XQuery Wikibook
Alternatively if the file is in an XML database such as eXist, you can use update operations to insert elements in situ.
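The identity-transformation approach mentioned above can be sketched in plain XQuery 1.0, with no update extension. This assumes a stores.xml file whose root element is stores; the file name and the inserted store element are illustrative only:

```xquery
xquery version "1.0";

(: Recursively copy a node, appending a new <store> to <stores>. :)
declare function local:copy($node as node()) as node()* {
  typeswitch ($node)
    case document-node() return
      document { for $c in $node/node() return local:copy($c) }
    case element() return
      element { node-name($node) } {
        $node/@*,
        for $c in $node/node() return local:copy($c),
        (: the one place where the copy differs from the original :)
        if ($node/self::stores)
        then <store><store-number>4</store-number><state>CA</state></store>
        else ()
      }
    default return $node
};

local:copy(doc("stores.xml"))
```

The query returns a modified copy and leaves the source untouched; how the result is written out as a new file depends on the processor.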
Using XQuery Scripting you can write programs like this:
variable $stores := doc("stores.xml")/stores;
insert node element store {
element store-number { 4 },
element state { "CA" }
} into $stores;
$stores
You can try this example live at http://www.zorba-xquery.com/html/demo#vpshT+pVURyQSCEOKrFBrF0jyGY=
