Setting collection in MarkLogic - xquery

I need to set a collection on existing documents, and there are around 20 million of them. I am running the query below from Query Console, and it throws a time-out error.
I also tried the limit=N option with the query below. The maximum I could reach was N=40000; beyond that it times out again.
Please suggest a faster query or approach.
for $each in cts:uri-match("/data/employee/*")
return xdmp:document-set-collections($each, "employee")

ml-gradle has out-of-the-box support for this; no code is needed. See https://github.com/marklogic-community/ml-gradle/wiki/DMSDK-Tasks#trying-it-out and look at "mlAddCollections".

Batching is the right approach for this kind of task. Use xdmp:spawn-function() to queue multiple tasks on the Task Server. You just need to identify how many records one task can process within your time limit (for example, 10 minutes or 1 hour).
For example, if one task can process 5000 records within 10 minutes:
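xquery version "1.0-ml";
(: Count only the documents under /data/employee/, not the whole database :)
let $total-records := xdmp:estimate(cts:search(fn:doc(), cts:directory-query("/data/employee/", "infinity")))
let $batch-size := 5000
for $batch in 1 to xs:integer(fn:ceiling($total-records div $batch-size))
(: 1-based, inclusive bounds of this batch; no mutable counter needed :)
let $start := ($batch - 1) * $batch-size + 1
let $end := $batch * $batch-size
return
  xdmp:spawn-function(
    function() {
      for $each in cts:uri-match("/data/employee/*")[$start to $end]
      return xdmp:document-set-collections($each, "employee")
    },
    (: document-set-collections is an update, so run the task as an update transaction :)
    <options xmlns="xdmp:eval">
      <update>true</update>
    </options>
  )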
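The batch arithmetic above can be checked outside MarkLogic. The Python sketch below is not MarkLogic code, and the names `batch_ranges` and `take` are illustrative only; it computes the 1-based, inclusive `(start, end)` pairs each spawned task would use in `[$start to $end]` and confirms that the slices cover every URI exactly once. As with the XQuery positional predicate, an `end` past the last item simply yields a shorter final slice.

```python
def batch_ranges(total: int, batch_size: int) -> list[tuple[int, int]]:
    """1-based inclusive (start, end) pairs, one per spawned task."""
    # Integer ceiling division, equivalent to fn:ceiling($total div $batch-size)
    n_batches = (total + batch_size - 1) // batch_size
    return [((b - 1) * batch_size + 1, b * batch_size) for b in range(1, n_batches + 1)]


def take(seq: list, start: int, end: int) -> list:
    """Mimic the XQuery positional predicate seq[$start to $end]."""
    # XQuery positions start at 1; Python slicing also tolerates an end past the last item
    return seq[start - 1:end]


if __name__ == "__main__":
    uris = [f"/data/employee/{i}.xml" for i in range(12)]
    ranges = batch_ranges(len(uris), 5)
    print(ranges)  # [(1, 5), (6, 10), (11, 15)]
    covered = [u for start, end in ranges for u in take(uris, start, end)]
    assert covered == uris
    # 20 million documents in batches of 5000 means 4000 spawned tasks
    print(len(batch_ranges(20_000_000, 5000)))  # 4000
```

With 20 million documents and a batch size of 5000, this works out to 4000 tasks, so it is worth checking the Task Server's configured queue size before spawning them all at once.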

Related

Write and read document in the same run

I am using BaseX 9.5 and trying to ingest documents into a database and read them back in the same run:
let $result := for $i in 1 to 10
return
(
db:replace('test','/content/'||$i||'.xml',<test id="{$i}"></test>)
)
return
for $uris in fn:uri-collection('test')
return $uris
Currently the documents are ingested, but I do not get their URIs back.
Please suggest how I can achieve this.
Thanks

How to write/extend custom DQL Function for Cast(X $comparisonOperator Y) in Doctrine? (Goal: Show related Objects)

To show related articles on a website, I need the CAST() function.
My query looks like this:
SELECT
*,
(CAST(a.uploader = ?1 AS UNSIGNED)
+ CAST(a.param2 = ?2 AS UNSIGNED)
...
) AS matches_count
FROM articles AS a
ORDER BY matches_count DESC
It counts the matches and sorts by matches_count, highest first.
The problem is that Doctrine has no built-in CAST() function.
After hours of trial and error, I found an existing custom DQL function:
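The scoring idea can be illustrated without a database. In the Python sketch below, each comparison contributes 1 when it is true and 0 otherwise, which is what `CAST(a.col = ?n AS UNSIGNED)` does in MySQL for non-NULL values; the flags are summed and articles are sorted in descending order. The column names `uploader` and `param2` come from the query above, while the sample data is made up. SQL `NULL` handling (where `NULL = x` yields `NULL` and therefore a `NULL` sum) is deliberately not modeled.

```python
def matches_count(article: dict, criteria: dict) -> int:
    """Sum of 0/1 flags, one per (column, value) criterion."""
    # int(True) == 1 and int(False) == 0, like CAST(<comparison> AS UNSIGNED)
    return sum(int(article.get(column) == value) for column, value in criteria.items())


def rank_related(articles: list[dict], criteria: dict) -> list[dict]:
    """Articles ordered by matches_count, highest first (ORDER BY matches_count DESC)."""
    # sorted() is stable, so ties keep their original order even with reverse=True
    return sorted(articles, key=lambda a: matches_count(a, criteria), reverse=True)


if __name__ == "__main__":
    articles = [
        {"id": 1, "uploader": 7, "param2": "news"},
        {"id": 2, "uploader": 25, "param2": "news"},
        {"id": 3, "uploader": 25, "param2": "sport"},
    ]
    criteria = {"uploader": 25, "param2": "news"}
    for a in rank_related(articles, criteria):
        print(a["id"], matches_count(a, criteria))
```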
https://github.com/beberlei/DoctrineExtensions/blob/master/src/Query/Mysql/Cast.php
I registered it inside my doctrine.yml.
But it doesn't work, because it expects CAST(X AS Y), where X is an arithmetic expression, not a comparison such as CAST(X $comparisonOperator Y AS Z).
When I use it inside my repository, for example:
$this->createQueryBuilder('a, (CAST(author=25 AS UNSIGNED) AS matches_count)')
->getQuery()
->getResult()
;
I get this error, because it doesn't expect a comparison operator:
[Syntax Error] line 0, col 29: Error: Expected Doctrine\ORM\Query\Lexer::T_AS, got '='
Do you know how I could extend that class so that it accepts CAST(X $comparisonOperator Y AS Z) instead of only CAST(X AS Y)?
I couldn't find a solution online and have been trying for hours.
Thank you in advance for taking the time to write an answer!
Update:
I changed line 37 of the custom DQL Cast class mentioned above:
//old
//$this->fieldIdentifierExpression = $parser->SimpleArithmeticExpression();
//new
$this->fieldIdentifierExpression = $parser->ComparisonExpression();
and this is how I create the query:
$this->createQueryBuilder('a')
->select('a, (CAST(a.averageRating=:averageRating AS UNSIGNED) + CAST(a.author=:author AS UNSIGNED)) AS matches_count')
->setParameter('averageRating', $averageRating)
->setParameter('author', $author)
->orderBy('matches_count', 'DESC')
->getQuery()
->getResult();
and that seems to be it!
I hope it's the right way to do it, that it's the best approach for this purpose, and that it will help someone.
To improve performance later, I plan to cache the IDs of 10 recommended articles for each article page in a dedicated table,
so the calculation doesn't have to run on every page load.
The table could be rebuilt every 24 hours by a cron job.
ID | recommended_article_ids | article_id
1  | 10,24,76,88             | 5
Feedback and tips are much appreciated!

How to match space in MarkLogic using CTS functions?

I need to find elements that have a space (" ") in their attributes.
For example:
<unit href="http:xxxx/unit/2 ">
In the example above, the href attribute has a trailing space.
I have done this with a FLWOR query, but I need to do it with CTS functions. Please suggest an approach.
For the FLWOR query, I used this:
let $x := (
for $d in doc()
order by $d//id
return
for $attribute in data($d//@href)
return
if (fn:contains($attribute," ")) then
<td>{(concat( "id = " , $d//id) ,", data =", $attribute)}</td>
else ()
)
return <tr>{$x}</tr>
This is working fine.
With CTS, I tried this:
let $query :=
cts:element-attribute-value-query(xs:QName("methodology"),
xs:QName("href"),
xs:string(" "),
"wildcarded")
let $search := cts:search(doc(), $query)
return fn:count($search)
Your query is looking for " " to be the entirety of the value of the attribute. If you want to look for attributes that contain a space, then you need to use wildcards. However, since there is no indexing of whitespace except for exact value queries (which are by definition not wildcarded), you are not going to get a lot of index support for that query, so you'll need to run this as a filtered search (which you have in your code above) with a lot of false positives.
You may be better off creating a string range index on the attribute and doing value-match on that.
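To see why the exact-value query finds nothing, compare an equality test with a wildcard pattern. The Python sketch below uses `fnmatch.fnmatchcase` only as a stand-in for MarkLogic's wildcard matching: `*` matches any run of characters in both, but they are not identical (for example, `fnmatch` also treats `[...]` as a character class, while MarkLogic applies its own case and diacritic options). The pattern `"* *"` means "contains at least one space", which is the kind of pattern the wildcarded query and value-match suggestions above rely on.

```python
from fnmatch import fnmatchcase


def has_space_exact(value: str) -> bool:
    # What the original query asks: is the whole attribute value exactly " "?
    return value == " "


def has_space_wildcard(value: str) -> bool:
    # Wildcard "* *": any characters, then a space, then any characters
    return fnmatchcase(value, "* *")


if __name__ == "__main__":
    hrefs = ["http:xxxx/unit/2 ", "http:xxxx/unit/3", " "]
    for href in hrefs:
        print(repr(href), has_space_exact(href), has_space_wildcard(href))
```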

MarkLogic: How to change a directory name

How can I change the directory from "/collections/" to "/collectionsCount/" in MarkLogic?
for $each in xdmp:directory("/collections/", "infinity")
return xdmp:node-uri($each)
Result:
/collections/Count2017-08-25.xml
/collections/Count2017-08-27.xml
/collections/Count2017-08-26.xml
/collections/Count2017-08-28.xml
I would like to change the directory to "/collectionsCount/" so that the URIs look like the following. Which function should I use? Thanks in advance.
/collectionsCount/Count2017-08-25.xml
/collectionsCount/Count2017-08-26.xml
....
Directories are a construct based on a document's URI, so in order to change the directories, you will need to delete the old documents and insert new ones at new URIs (using the new directory prefix in place of the old one).
for $each in xdmp:directory("/collections/", "infinity")
let $old-uri := $each/xdmp:node-uri(.)
let $permissions := xdmp:document-get-permissions($old-uri)
let $collections := xdmp:document-get-collections($old-uri)
let $quality := xdmp:document-get-quality($old-uri)
let $properties := xdmp:document-properties($old-uri)
let $new-uri := concat('/collectionsCount/', substring-after($old-uri, '/collections/'))
return (
xdmp:document-insert($new-uri, $each, $permissions, $collections, $quality),
xdmp:document-set-properties($new-uri, $properties),
xdmp:document-delete($old-uri))
Edit: Updated to include @hunterhacker's suggestion to propagate document metadata. Note that I intentionally did not assign the new document to the old document's forest, since I assume that in most cases it's better to let the database decide.
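The URI mapping in the answer is a plain string operation. The Python sketch below mirrors `concat('/collectionsCount/', substring-after($old-uri, '/collections/'))`; `substring_after` and `new_uri` are illustrative helpers, not MarkLogic functions. The match is case-sensitive, so `/Collections/` and `/collections/` are different prefixes, and a URI without the old prefix would map to the bare new directory (this cannot happen for URIs returned by `xdmp:directory("/collections/", "infinity")`).

```python
def substring_after(s: str, sep: str) -> str:
    """Like XPath fn:substring-after: text after the first sep, or "" if sep is absent."""
    i = s.find(sep)
    return "" if i < 0 else s[i + len(sep):]


def new_uri(old_uri: str, old_dir: str = "/collections/", new_dir: str = "/collectionsCount/") -> str:
    # Same mapping as concat('/collectionsCount/', substring-after($old-uri, '/collections/'))
    return new_dir + substring_after(old_uri, old_dir)


if __name__ == "__main__":
    for uri in ["/collections/Count2017-08-25.xml", "/collections/Count2017-08-26.xml"]:
        print(uri, "->", new_uri(uri))
```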

LINQ: Get all members with LAST order failed

I'm learning LINQ, and I'm trying to figure out how to get all members whose last order failed (each member can have many orders). For efficiency reasons, I'd like to do it all in LINQ before putting it into a list, if possible.
So far, I believe this is the right way to get all recently joined members with a failed order (cutoffDate is the current date minus 10 days).
var failedOrders =
from m in context.Members
from o in context.Orders
where m.DateJoined > cutoffDate
where o.Status == Failed
select m;
I expect I need to use Last or LastOrDefault, or possibly I need to use
orderby o.OrderNumber descending
and then get the First or FirstOrDefault as suggested in this stackoverflow answer.
Note that I want to look at ONLY the last order for a given member and see if that has failed (NOT just find last failed order).
Normally you would write something like:
var failedOrders = from m in context.Members
where m.DateJoined > cutoffDate
select new
{
Member = m,
LastOrder = m.Orders.OrderByDescending(x => x.OrderNumber).FirstOrDefault()
} into mlo
// no need for null checks here, because the query is done db-side
where mlo.LastOrder.Status == Failed
select mlo; // or select mlo.Member to have only the member
This assumes there is a Members.Orders relationship.
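The key point of the query above is that the status check applies only to each member's most recent order, not to any failed order. The Python sketch below reproduces that logic over in-memory lists; the `Member` and `Order` dataclasses, their field names, and the `"Failed"` status string are assumptions modeled on the question. Unlike the database-side LINQ query, it has to skip members with no orders explicitly.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Order:
    member_id: int
    order_number: int
    status: str  # hypothetical: "Failed", "Paid", ...


@dataclass
class Member:
    id: int
    date_joined: date


def members_with_last_order_failed(members, orders, cutoff):
    # Keep only the highest order_number per member (the "last" order)
    last_order: dict[int, Order] = {}
    for o in orders:
        current = last_order.get(o.member_id)
        if current is None or o.order_number > current.order_number:
            last_order[o.member_id] = o
    return [
        m for m in members
        if m.date_joined > cutoff
        and m.id in last_order  # members without orders are skipped
        and last_order[m.id].status == "Failed"
    ]


if __name__ == "__main__":
    members = [Member(1, date(2024, 5, 20)), Member(2, date(2024, 5, 21))]
    orders = [Order(1, 1, "Failed"), Order(1, 2, "Paid"), Order(2, 3, "Paid"), Order(2, 4, "Failed")]
    print([m.id for m in members_with_last_order_failed(members, orders, date(2024, 5, 15))])
```

Member 1 has an earlier failed order but a later paid one, so only member 2 qualifies; this is exactly the "last order, not last failed order" distinction from the question.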
