How to know the distinct namespaces in a database in MarkLogic? - xquery

I have a database in MarkLogic Server. The database has many collections, and the documents in different collections use different namespaces. What is the query to find the distinct namespaces? My goal is to build a search application that lets users type into a search bar and get documents back from the most relevant collections. Since the collections all have different XML structures, I also want to customize the display of the documents based on the collection and the search.

Your question is not entirely clear to me, but if you want to get all unique namespaces from your DB you can run:
fn:distinct-values(//namespace-uri())
and if you want to get all unique collections from the DB (this relies on the collection lexicon of the DB):
cts:collections()
and if you want to search within particular collections only:
in search:search use:
<additional-query>{cts:collection-query('collectionName')}</additional-query>
in cts:search use:
cts:collection-query(("reports", "analysis"))
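To see which namespaces occur in which collection, the two snippets above can be combined. The following is only a sketch: it assumes the collection lexicon is enabled (cts:collections() needs it), and it reads every document in every collection, so it is only reasonable on a small database or a sample. The output element names (collection, uri) are made up for illustration.

```xquery
xquery version "1.0-ml";

(: For each collection, list the distinct element namespace URIs
   found in its documents. Requires the collection lexicon.
   The element names "collection" and "uri" are illustrative only. :)
for $c in cts:collections()
return
  element collection {
    attribute name { $c },
    for $ns in fn:distinct-values(fn:collection($c)//*/fn:namespace-uri(.))
    order by $ns
    return element uri { $ns }
  }
```

Elements in no namespace show up as an empty uri element.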

One way to get the list of unique collections in a database is to use the App Services Search API. You can specify a collection constraint in the search options which will return the unique collections. The example below specifies a collection constraint without a prefix, then returns a list of the facet values with the number of documents counted for each collection.
(: insert test documents here :)
xquery version "1.0-ml";
for $i in 0 to 5
let $collection := "https://example.com/" || $i
for $j in 0 to $i
return xdmp:document-insert("/example-doc/" || $i || "-" || $j, <example/>, (), $collection);
(: Use search API to get collections as a facet :)
xquery version "1.0-ml";
import module namespace search =
"http://marklogic.com/appservices/search"
at "/MarkLogic/appservices/search/search.xqy";
(: build a collection constraint facet :)
let $options :=
<options xmlns="http://marklogic.com/appservices/search">
<constraint name="collections">
<collection prefix="" facet="true" />
</constraint>
</options>
(: return facets ordered by the number of documents in each collection :)
let $facets := search:search("", $options)/search:facet/search:facet-value
for $facet in $facets
(: cast to integer so that counts sort numerically rather than as strings :)
order by xs:integer($facet/@count) descending
return (element collection {($facet/@name, $facet/@count)})
Returning:
<collection name="https://example.com/5" count="6"/>
<collection name="https://example.com/4" count="5"/>
<collection name="https://example.com/3" count="4"/>
<collection name="https://example.com/2" count="3"/>
<collection name="https://example.com/1" count="2"/>
<collection name="https://example.com/0" count="1"/>
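Once the facet shows which collections exist, a search can be restricted to one of them with the additional-query option mentioned in the other answer. A minimal sketch, assuming the test documents inserted above; the search text is empty only because those example documents have no content to match.

```xquery
xquery version "1.0-ml";
import module namespace search =
  "http://marklogic.com/appservices/search"
  at "/MarkLogic/appservices/search/search.xqy";

(: Restrict the search to a single collection taken from the facet list :)
let $options :=
  <options xmlns="http://marklogic.com/appservices/search">
    <additional-query>{cts:collection-query("https://example.com/5")}</additional-query>
  </options>
return search:search("", $options)
```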

What I have seen most typically with applications built on MarkLogic is that they either have a single search UI for all document types, or a separate search UI for each document type. You can always do full-text search across any document type, and you can define and show facets regardless of whether they apply to all or only a subset of the documents. Collection name could be a facet, for instance, but you could also have a facet called Keyword that only applies to two of the collections, and another facet called Company that applies to three other ones.
In short, think first about what end-user functionality you'd like to provide, and think about how to implement it technically as a second step. I doubt knowing the namespaces truly matters to the search UI; it likely only matters at the index level.
HTH!

Related

Can we limit the number of documents returned in a FLWOR expression?

I have many documents in a database that I want to search for a specific condition. I saw that I can use predicates, but they seem to work within only one document. Is that correct?
A FLWOR expression works across many documents in the whole database, but when returning the documents, can we also use a predicate [] to limit the number of rows returned?
I need to cover both cases in MarkLogic 10: querying within a single document and querying across the whole database.
You can write
for $x at $position in ....
where $position le 100
return ...
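Two other common ways to cap the result are a positional predicate on the whole FLWOR result and fn:subsequence. The sketch below assumes a collection named "employee" (hypothetical, used only for illustration). Unlike the at $position form, both variants apply the limit after order by has been evaluated, because they filter the finished sequence.

```xquery
xquery version "1.0-ml";

(: Wrap the FLWOR in parentheses and keep the first 100 items :)
let $first-100 :=
  (for $doc in fn:collection("employee")
   order by fn:document-uri($doc)
   return $doc)[fn:position() le 100]

(: Same idea with fn:subsequence: start at item 101, take 100 items :)
let $next-100 :=
  fn:subsequence(
    (for $doc in fn:collection("employee")
     order by fn:document-uri($doc)
     return $doc),
    101, 100)

return ($first-100, $next-100)
```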

How to sort dynamically in XQuery on MarkLogic?

In XQuery Marklogic how to sort dynamically?
let $sortelement := 'Salary'
for $doc in collection('employee')
order by $doc/$sortelement
return $doc
PS: The sort element will change based on user input, for example data or name in place of salary.
If Salary is the name of the element, then you could more generically select any element in the XPath with * and then apply a predicate filter to test whether the local-name() matches the variable for the selected element value $sortelement:
let $sortelement := 'Salary'
for $doc in collection('employee')
order by $doc/*[local-name() eq $sortelement]
return $doc
This manner of sorting all items in the collection may work with a smaller number of documents, but if you are working with hundreds of thousands or millions of documents, you may find that pulling back all docs is either slow or blows out the Expanded Tree Cache.
A more efficient solution would be to create range indexes on the elements that you intend to sort on, and could then perform a search with options specified to order the results by cts:index-order with an appropriate reference to the indexed item, such as cts:element-reference(), cts:json-property-reference(), cts:field-reference().
For example:
let $sortelement := 'Salary'
return
cts:search(doc(),
cts:collection-query("employee"),
cts:index-order(cts:element-reference(xs:QName($sortelement)))
)
This is not recommended, because the chance of introducing security issues, runtime crashes, and just 'bad results' is much higher and harder to control, but it is available as a last resort: any XQuery can be created dynamically as a string and then evaluated using xdmp:eval.
It is much better to follow the guidance of Mads and use the search APIs instead of XQuery FLWOR expressions. Note that these APIs actually 'compile down' to a data structure. This is what the 'cts constructors' do: https://docs.marklogic.com/cts/constructors
I find it helps to think of cts searches as a structured search described by data, which the cts:xxx functions are simply helpers to create. (They don't actually do any searching; they build up a data structure that is used to do the searching.)
If you look at the source of the search:xxx APIs you can see how this is done.
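For completeness, the xdmp:eval last resort could look roughly like the sketch below. It assumes the sort element name arrives as a plain string and that documents live in the "employee" collection from the question; the error name INVALID-SORT is made up. The castable check rejects anything that is not a bare XML name before it is concatenated into the query, which narrows the injection risk but does not remove the other drawbacks; checking against a fixed list of known element names would be stricter still.

```xquery
xquery version "1.0-ml";

(: Stand-in for user input :)
let $sortelement := "Salary"
return
  (: Only a bare XML name may be spliced into the query text :)
  if (fn:not($sortelement castable as xs:NCName)) then
    fn:error(xs:QName("INVALID-SORT"),
             "Unsupported sort element: " || $sortelement)
  else
    (: Build the query as a string and evaluate it :)
    xdmp:eval(
      'for $doc in fn:collection("employee") ' ||
      'order by $doc/' || $sortelement || ' ' ||
      'return $doc')
```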

How do we apply the ORDER BY and LIMIT,OFFSET Clauses to DAO.fetch(query) in cn1-data-access?

DAO.fetch(query) allows us to get a collection of entities from the SQLite database that meet the query condition. query can be a map or a string []. How can we specify ordering with the ORDER BY clause, and how do we apply the LIMIT and OFFSET clauses? Or do we have to fall back to db.execute(query)?
Currently ORDER BY, LIMIT, and OFFSET clauses aren't supported. It wouldn't be hard to add. Please file an RFE.
Alternatively it wouldn't be difficult to add this in your own DAO subclass. You can see how fetch(query) is implemented here.

How can I add specific relationship or query types as an attribute on a Bookshelf Model?

I've been using KnexJS for a while, and want to transition to BookshelfJS, as my model classes on the server side are starting to get a little hairy, and why re-invent the wheel.
For a lot of my API server, what I want to do is pre-fetch a list of related models (a document has many and belongs to many editors) without necessarily pre-fetching the whole thing. Ideally, I'd end up with
document = {
id: 1
body: 'foobar'
editor_ids: [1, 2]
}
Now, I can do this by doing editors: belongsToMany(Profiles) on the Document definition, and then doing a fetch().withRelated(['editors']), but the problem there is that it returns the full Profile object on the fetch.
This generates an extraneous join (documents_editors join editors on editors.id = documents_editors.editor_id) that isn't needed. To conform to the spec my client app expects (the IDs embedded, with the profiles themselves added later in the JSON response only optionally, and in practice never, because profiles tend to get cached and loaded elsewhere), I have to manually shove the editor_ids attribute in there by parsing through Document.relations, which also adds a tiny bit of extra time.
So, ultimately, I can do what I want but it's not elegant. Ideally, there's something in BookshelfJS where I could do something like
Document = bookshelf.Model.extend
  tableName: 'documents'
  fancyValue: ->
    @rawQuery 'select editor_id from documents_editors where document_id = ?', [@id]
Or build a knex-style query in there. I know in the above particular use case, a raw query is kind of overkill, but I do actually have some more annoying queries to run as well. I track user-community memberships, and permission grants on documents to communities, which means I use a postgres-style CTE to do something like
with usergroups as
(
select communities.id from communities inner join edges
on communities.id = edges.parent_id and edges.parent_type = 'communities'
and edges.child_id = ? and edges.child_type = 'profiles'
and edges.type = 'grant: comment'
)
select distinct documents.id as parent_id, 'documents' as parent_type
from documents inner join edges
on edges.parent_id = documents.id and edges.parent_type = 'documents'
and edges.type = 'grant: edit'
and documents.type = 'collection'
where (edges.child_type = 'profiles' and edges.child_id = ?) or
(edges.child_type = 'communities' and edges.child_id in (select id from usergroups))
(which finds all the documents of type 'collection' that the user in question can edit, either because they were directly added as an editor or because they belong to a community which was granted edit access).
A description of how I approached this problem can be found here:
https://gist.github.com/ericeslinger/a83e74501e9901c8b795
which basically amounts to "I wrote some custom behaviors and threw them into a subclass of bookshelf.Model that all of my application Model objects inherit from".

Cassandra map data types in where clause

I'm new to Cassandra, so this may be a trivial question.
Given a table defined as follows:
create table Users (
username text,
props map<text, text>,
PRIMARY KEY (username)
);
Can I use individual elements of the map in a where clause? (i.e. select * from users where props['is_online']='no';)
I've seen examples referencing individual elements on updates and deletes, but haven't been able to find anything regarding usage as part of the where clause.
Indexing collections is not supported until 2.1: https://issues.apache.org/jira/browse/CASSANDRA-4511
