Marklogic - Delete Versioned Collections - xquery

I have around 43 million documents which is having the latest versioned document in LIVE collection and also have same versioned document in another version collection named as (/collection/versionNumber). I want to delete the versioned collections which is around 34 million. what is best approach to go for it to delete all in one go .

You could try using xdmp:collection-delete() to delete all documents in the collection in a single transaction.
If that doesn't work and it isn't able to delete in one shot, then I would look to utilize batch tools. For instance, a CoRB job.
An example job options file with properties needed, except for the XCC-CONNECTION-URI:
# Inline module to select all URIs from the collection
URIS-MODULE=INLINE-XQUERY|let $uris := cts:uris("",(),cts:collection-query("/collection/versionNumber")) return (count($uris), $uris)
# Inline module to delete the docs
PROCESS-MODULE=INLINE-XQUERY|declare variable $URI as xs:string external; xdmp:document-delete($URI)
THREAD-COUNT=10

I think your application is using DLS library for versioning. If yes, and if you never want any version to look into in future, then only delete the versioned documents. Use can use "dls:document-unmanage" API in that case.
Also, explore dls:purge and dls:document-purge before proceeding. I am not very sure of these two.
Anyways, even if it's not DLS, processing them in one go (single transaction) would not be a recommended way. Either process them in batches or set them all in different threads on task server through spawn.

Related

Pass search strategy to filter from rest URI

First time using api-platform and Symfony 4 to create an API interface for a MySQL db.
I'm updating an old search interface for the db for which I need to replicate many of the search options. This includes being able to search on a given field using various matching operators/strategies. e.g. starts with, contains exactly equals, etc.
I've set everything up for the api using Annotations.
The #ApiFilter(SearchFilter::class, properties={"fieldname": "strategy"} annotation on my table class works as designed, but I am limited to one-and-only-one strategy per field. I need to be able to pass the strategy to the api search function in the url. something like:
/api/staff?lastname[start]=dav
or
/api/staff?lastname=david&match=contains
or
/api/staff/lastname/son?searchtype=end
would be fine.
I can't figure out how to set this up. Shockingly, to me anyway, this common requirement doesn't seem to be documented at all.
The file CustomSearchFilter.php located at the repo https://github.com/jordonedavidson/custom_search_filter solves this use-case using the
/api/staff?lastname[start]=dav
syntax.
The file was written by Kévin Dunglas (the author of Api Platform) and is presented with his blessing.

aem-6-2-replication-issue-for-node-containing-invalid-jcr-names

I am working on AEM 5.6 to 6.2 upgrade project. There are some nodes in aem 5.6 environment which contains invalid character(as per JCR naming convention like rte[2] is one of the node name which doesn't follow the naming convention)but somehow we are able to replicate those nodes in 5.6 environment. After upgrade it to aem 6.2,it seems like JCR is more restricted and won't allow the nodes to replicate if it is having invalid characters.
Getting the below error in aem 6.2: error:
com.day.cq.replication.ReplicationException: Repository error during node import: Cannot create a new node using a name including an index
Is there any way we can configure AEM 6.2 to stop checking JCR node names?or any other solution?
JCR 2 does not allow [ as a valid character therefore you won't get an easy workaround for this. It's one of the limitations just like the same-named-sibling.
My recommendation will be to modify these nodes before the upgrade/migration to 6.2. This can be complicated and costly for business but 6.2 won't allow it.
As a background [ was allowed in older version due to twisted support for grammar syntax for same-named-siblings.
Assuming that these are all content nodes as nothing out-of-the-box in AEM 5.x follows this naming convention.
Some ways to fix it:
Write a custom servlet to query and rename the paths across all references. You will have to test your content for these renames.
Use Groovy console (https://github.com/OlsonDigital/aem-groovy-console) to rename the nodes.
In either case, you will need to modify the nodes before the migration as the structure is not oak compliant therefore you cannot use crx2oak commit hooks also. This can be done with both in-place upgrade and side-by-side migration. This is similar to the problem with same named siblings that must be corrected before the migration.
Some efficiency techniques that might help:
Write queries to find invalid node names on top-level nodes like /content/mysite-a, /content/mysite-b etc. Don't run root level queries on /content as it might downgrade to
traversal and halt the execution.
Ensure that all references are updated in same commit. If you are using custom servlet, call session.save() only after updating all the node names and it's corresponding references.
As i mention in the comment this replication failure causes because of the oak workspace restriction as the code snippet below
//handle index
if (oakName.contains("["))
{ throw new RepositoryException("Cannot create a new node using a name including an index");
}
and i feel you can't escape this constraint as it it required by the repository to maintain consistency
you can find nodes that ends with '[', by below query
SELECT [jcr:path] FROM [nt:base] WHERE ISDESCENDANTNODE('/content/path/') AND [jcr:path] like '%\['
and to modify the JCR/CRX nodes you can use CURL or SlingPostServlet method
Some helpful posts are blow.
https://github.com/paulrohrbeck/aem-links/blob/master/curl_cheatsheet.md
http://sling.apache.org/site/manipulating-content-the-slingpostservlet-servletspost.html
Can you try migrating using a tool like oak-upgrade and let us know if you are still facing this issue.
The tool is robust and you have the flexibility to configure specific sub-trees for migration using this tool.

Publish data with SSDT?

I have a SSDT-project. When publishing a new version I want to publish/initialize some data in the database as well. Can that be done using SSDT?
It can be done, but could be tricky. If you set up a variable in the project that can be used for "New" releases, you could put that in your post-deploy script as a section that would run a series of inserts, but only for that "New" type.
As David mentioned, the better way would likely be to use something like Red-Gate's data compare or run the scripts after creating the database. It's possible to do it in post-deploy scripts, but could prove tricky.
Something like this could work:
IF '$(DeployType)' = 'New'
BEGIN --"New" release scripts
PRINT 'Post-Deploy Scripts for release.'
:r .\InsertScript1.sql
:r .\InsertScript2.sql
--etc
END --"New" release scripts
This isn't possible in SSDT. The current guidance is to use a post-deployment script.
Redgate ReadyRoll provides many experiences familiar to SSDT users, but has improved static data management as well as many other improvements.
We include Merge-scripts automatically, when they are placed in a specific subfolder of the Project.
depending on what you do, table valued custructors might be something to have a look at:
SELECT *
FROM
(VALUES
(101, 'Bikes'),
(102, 'Accessories'),
(103, 'Clothes')
) AS Category(CategoryID, CategoryName);
These are easily transported and compared by SSDT.
For ore information see https://www.simple-talk.com/sql/sql-training/table-value-constructors-in-sql-server-2008/

avoiding XDMP-EXPNTREECACHEFULL and loading document

I am using marklogic 4 and I have some 15000 documents (each of around 10 KB). I want to load the entire content as a document ( and convert the total documents to a single csv file and output to HTTP output stream for downloading). While I load the documents this way:
let $uri := cts:uri-match('products/documents/*.xml')
let $doc := fn:doc ($uri)
The xpath has some 15000 xmls. So fn:doc throws an error XDMP-EXPNTREECACHEFULL.
Is there any workaround for this? I cannot increase tree cache size in admin console because the number of xml files in products/documents/*.xml may increase.
Thanks.
When you want to export large quantities of XML from MarkLogic, the best technique is to write the query so that results can stream, avoiding the expanded tree cache entirely. It is a very different style of coding, though: you'll have to avoid strong typing of any kind, and refactor your code to remove FLWOR expressions. You won't be able to test any of the code in cq or qconsole, either.
Take a look at http://blakeley.com/blogofile/2012/03/19/let-free-style-and-streaming/ for some tips on how to get there. At a minimum the code sample you posted would have to become:
doc(cts:uri-match('products/documents/*.xml'))
In passing I would try to rework that to avoid the *.xml part, because it will be slower than needed. Maybe something like this?
cts:search(
collection(),
cts:directory-query('products/documents/', 'infinity'))
If you need to test for something more than the directory, you could add a cts:and-query with some cts:element-query test.
For general information about this error, see the MarkLogic knowledge base article on XDMP-EXPNTREECACHEFULL

SQL Server Deploy Script

I use sql server 2005 for an asp.net project. I want to run a SQL file that contains all DB changes from the last release to easily bring a DB up to the latest version.
I basically just have a bunch of alter table, create table, create index, alter view, call stored proc, etc statements. But I would like to wrap it in a transaction so if any part of it fails, none of the changes will go through. Otherwise it could make for some really messy debugging where it finished.
Also, if you know of a better way to manage DB deployment let me know!
I do something similar with a Powershell script using SMO.
Pseudocode would be:
$SDB = SourceDBObject
$TDB = TargetDBObject
ForEach $table in $SDB.Tables
{
Add an entry to a hash table with the name
and some attributes (rowcount, columns, datasize)
}
# Same thing for $TDB
# Compare the two arrays, and make a list of all the ones that exist in the source but not in the target, or that are different
# Same thing for Procs and Views
# Pass this list to a SMO.Scripter as an UrnCollection object, and it will script them out in dependency order (it's an option), with drops
# Wrap the script in a transaction and execute it on target server
# Use SQLBulkCopy class to transfer data server-to-server
What version of Visual Studio do you use? In Visual Studio 2010 and from what I can remember Visual Studio 2008 - in the menu under "Data" there are two options - "Schema Compare" and "Data Compare". That should move you in the right direction.
BEGIN TRANSACTION #TranName;
USE AdventureWorks2008R2;
DELETE FROM AdventureWorks2008R2.HumanResources.JobCandidate
WHERE JobCandidateID = 13;
COMMIT TRANSACTION #TranName;
You should execute everything within a transaction
Note that some DDL statements have to be the first statement in a batch (batches are separate from transactions). (GO is the default batch separator in SSMS and SQLCMD).

Resources