In MarkLogic, can we use MLCP to read/export/import/copy data based on a condition?
Example: read only the files in which the student's subject element contains only maths.
Yes, you can apply the -query_filter option to restrict documents to those matching the filter query.
https://docs.marklogic.com/guide/mlcp/export#id_66898
The -query_filter option accepts a serialized XML cts:query or JSON cts.query as its value.
Controlling What is Exported, Copied, or Extracted
By default, mlcp exports all documents or all documents and metadata in the database, depending on whether you are exporting in document or archive format or copying the database. Several command line options are available to enable customization.
-query_filter - export/copy only documents matched by the specified cts query. You can use this option alone or in combination with a directory, collection or document selector filter.
-directory_filter - export only the documents in the listed database directories. You cannot use this option with -collection_filter or -document_selector.
-collection_filter - export only the documents in the listed collections. You cannot use this option with -directory_filter or -document_selector.
-document_selector - export only documents selected by the specified XPath expression. You cannot use this option with -directory_filter or -collection_filter. Use -path_namespace to define namespace prefixes.
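As a rough sketch of how the -query_filter option could be applied to the "subject is maths" example from the question, the snippet below assembles the argument list for an mlcp export. The host, port, credentials, output path, and the helper names are placeholders I made up, and the serialized query assumes a no-namespace subject element; it is safer to generate the serialized form from your real cts query than to copy this one.

```javascript
// Escape characters that would break the serialized XML query.
function escapeXml(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Build the arguments for an `mlcp.sh export` run that keeps only documents
// whose <subject> element has the given value.
// Connection details and the output path are placeholders (assumptions).
function buildExportArgs(subject) {
  // Assumed serialized form of
  // cts:element-value-query(xs:QName("subject"), $subject).
  const queryFilter =
    '<cts:element-value-query xmlns:cts="http://marklogic.com/cts">' +
    "<cts:element>subject</cts:element>" +
    `<cts:text xml:lang="en">${escapeXml(subject)}</cts:text>` +
    "</cts:element-value-query>";

  return [
    "export",
    "-host", "localhost",
    "-port", "8000",
    "-username", "user",
    "-password", "password",
    "-mode", "local",
    "-output_file_path", "/tmp/export",
    "-query_filter", queryFilter,
  ];
}

console.log(buildExportArgs("maths").join(" "));
```

Passing the list as separate arguments (for example to Node's child_process.spawn) sidesteps the shell quoting problems that an inline XML query tends to cause on the command line.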
So this is possible:
const docSnapshot = await firebase.firestore().collection("SOME_COL").doc("SOME_DOC").get();
console.log(docSnapshot.exists);
But it "downloads" the whole document just to check whether it exists. I'm currently working with some heavier documents, and I have a script where I only need to know whether they exist; I don't need to download them at that point.
Is there a way to check if a document exists without .get() and avoid downloading the document data?
It seems you are using the JavaScript SDK. With this SDK there isn't any way to only get a subset of the fields of a document.
One possible solution is to maintain another collection whose documents have the same IDs as the documents in the main collection but hold only a very small dummy field. You could use a set of Cloud Functions to synchronise the two collections (on document creation/deletion).
On the other hand, with the Firestore REST API, it is possible, with the get method, to define a DocumentMask which defines a "set of field paths on a document" and is "used to restrict a get operation on a document to a subset of its fields". Depending on your exact use case, this can be an interesting and easier solution.
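To illustrate the REST approach, here is a minimal sketch. It assumes the mask is passed as the mask.fieldPaths query parameter of the documents get method and that a missing document comes back as HTTP 404. The project ID, the field name "dummy", and the function names are hypothetical, and the optional ID token is only needed if your security rules require authentication.

```javascript
// Build a Firestore REST "get" URL whose response is masked down to one field.
function buildMaskedGetUrl(projectId, collection, docId, fieldPath) {
  const base = "https://firestore.googleapis.com/v1";
  const name =
    `projects/${encodeURIComponent(projectId)}/databases/(default)/documents/` +
    `${encodeURIComponent(collection)}/${encodeURIComponent(docId)}`;
  return `${base}/${name}?mask.fieldPaths=${encodeURIComponent(fieldPath)}`;
}

// Resolve to true when the document exists and false on a 404.
// `fetchImpl` is injectable so the function can be exercised without a network.
async function documentExists(url, fetchImpl = globalThis.fetch, idToken) {
  const headers = idToken ? { Authorization: `Bearer ${idToken}` } : {};
  const response = await fetchImpl(url, { headers });
  if (response.status === 404) return false;
  if (!response.ok) throw new Error(`Unexpected status ${response.status}`);
  return true;
}

const url = buildMaskedGetUrl("my-project", "SOME_COL", "SOME_DOC", "dummy");
console.log(url);
```

With the mask in place, the response body carries only the listed field (or none, if the field is absent) rather than the whole document.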
Firestore has a reference field type, which lets us add a reference to another document via its "Document path".
For example, I have the document animals/3OYc0QTbGOTRkhXeiW0t, with a field name having value Zebra. I reference it in the array animals of document zoo/xmo5wX0MLUEbfFJHvKq6. I am basically storing a list of animals in a zoo, with each entry referring to the corresponding document in the animals collection.
Now if I query a specific document from the zoo collection, will references to the animals be automatically resolved? Will I get the animal names in the query result? If not, how can I achieve this?
All document queries in Firestore are shallow, meaning that you only get one document in return for each document requested.
References in a document are not automatically fetched - you will have to make subsequent queries using the references in the document to get those other documents on your own.
Same thing with documents in subcollections - they require separate queries.
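A minimal sketch of those follow-up reads, assuming the namespaced JavaScript SDK (where exists is a property) and the animals array and name field from the question. The function name resolveAnimalNames is made up for illustration.

```javascript
// Follow the DocumentReference values stored in a zoo document's `animals`
// array and return the `name` field of every referenced animal that exists.
async function resolveAnimalNames(zooSnapshot) {
  const animalRefs = zooSnapshot.get("animals") || [];
  // Each reference needs its own read; issue them in parallel.
  const animalSnapshots = await Promise.all(animalRefs.map((ref) => ref.get()));
  return animalSnapshots
    .filter((snapshot) => snapshot.exists)
    .map((snapshot) => snapshot.get("name"));
}

// Possible usage (not executed here, requires an initialised Firebase app):
//   const zooSnapshot = await firebase.firestore()
//     .collection("zoo").doc("xmo5wX0MLUEbfFJHvKq6").get();
//   const names = await resolveAnimalNames(zooSnapshot);
```

Each referenced animal is a separate document read, so a zoo with many animals results in as many reads as there are references.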
I have a CoRB script that runs a node replace on the XML files.
If I don't specify the collection, will it remove the documents from the existing collections?
If you are altering the document with xdmp:node-replace(), then the document will remain in its collections and you do not need to worry about setting/adding it back.
If you are using xdmp:document-insert() to replace the document at the current URI, then you do need to specify the collection(s), otherwise it will be removed from the existing collections.
However, you can use xdmp:document-get-collections() to retrieve the sequence of collections for the URI and use it for the 4th parameter of xdmp:document-insert():
xdmp:document-insert($URI, $doc, (), xdmp:document-get-collections($URI))
It's better to provide an empty collection value when doing the node-replace, so that it doesn't alter the existing collections of the document. Leaving this attribute undefined throws errors when running the script.
In SolrCloud Collections API (https://cwiki.apache.org/confluence/display/solr/Collections+API), we can list collections using action:
/admin/collections?action=LIST
However, aliases are not included in this list. There is also no corresponding command for aliases (we can only CREATEALIAS or DELETEALIAS). How can I list aliases?
This feature does not seem to be implemented yet: https://issues.apache.org/jira/browse/SOLR-4968
However, you can use this command:
/admin/collections?action=CLUSTERSTATUS
Each collection will be listed together with the aliases that cover it. Also, at the bottom of the XML there is a separate node summarising all aliases and the collections they cover.
The alias list can be fetched in JSON format using the following command:
[solr_server_hostname]:8983/solr/zookeeper?detail=true&path=/aliases.json
The "data" field in this JSON holds the list of collections object.
For Solr 6.6+, you could use:
/solr/admin/collections?action=LISTALIASES
See https://solr.apache.org/guide/6_6/collections-api.html#CollectionsAPI-listaliases
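As a small sketch of consuming LISTALIASES from a script: the code below builds the request URL and turns the response into a map from alias to collection names. It assumes the JSON response contains an aliases object whose values are comma-separated collection names; the base URL, the sample response, and the function names are illustrative.

```javascript
// Build the LISTALIASES request URL for a Solr base URL such as
// "http://localhost:8983/solr" (placeholder), asking for a JSON response.
function buildListAliasesUrl(solrBaseUrl) {
  const trimmed = solrBaseUrl.replace(/\/+$/, "");
  return `${trimmed}/admin/collections?action=LISTALIASES&wt=json`;
}

// Convert the assumed response shape
//   { "aliases": { "alias1": "collA,collB" } }
// into { alias1: ["collA", "collB"] }.
function parseAliases(responseBody) {
  const aliases = (responseBody && responseBody.aliases) || {};
  const result = {};
  for (const [alias, collections] of Object.entries(aliases)) {
    result[alias] = collections.split(",").map((name) => name.trim());
  }
  return result;
}

// Hand-written sample response, used here instead of a live Solr call.
const sampleResponse = {
  responseHeader: { status: 0 },
  aliases: { testalias: "collection1", testalias2: "collection1,collection2" },
};
console.log(buildListAliasesUrl("http://localhost:8983/solr/"));
console.log(parseAliases(sampleResponse));
```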
I am uploading some documents to MarkLogic Server (doc, docx, pdf, txt etc). Now I am building an interface in HTML & XQuery that allows a user to enter a search term, and if it matches the contents of any documents, the names of those documents are displayed in the grid. I am using the search:search API for searching. I also want to show the last modified date and author of each document in the grid. Every Windows document has last modified date and author properties. How can I get this information from the search:search API so that I can show it in the grid?
If you have enabled the "maintain last modified" setting, MarkLogic keeps the last modified information in document property fragments. However, this is unrelated to the properties information kept in Windows, which is lost by default when you load the documents into MarkLogic.
If you want to retain the Windows properties data, set up a filter in Information Studio to populate the MarkLogic property fragments with the data. Alternatively, you could write your own XSLT and use xdmp:document-filter() to store the data directly in the document.
Once you have loaded your documents and populated them with the properties you need, you can access the data directly if stored in the document, or using xdmp:document-properties() if stored in document properties.