How to rename a document in MarkLogic?

How to rename a document in MarkLogic? - xquery

I have simple task to do but unable to find the exact solutions for this.I have saved a file as abc.xml in MarkLogic.How can i rename the file as some example.xml using XQuery?
Code which I tried:
xquery version "1.0-ml";
xdmp:document-rename ("/aaa.xml","/final.xml");
This is showing an error.

There is no way, that I know of, to change the document URI of an existing document. The only way I can think of is to create a new document with the same content and the new URI, and delete the existing one, in the same transaction.
Where it gets tricky is to make sure to preserve the ownership, the permissions, all the properties, the property document, make sure that the old URI is not used anywhere to link to the existing document, etc.
But usually, the document URI is never really used. You should first considering whether you really need to rename the document, and why.
(Note that saying "this is showing an error" is rarely useful on SO or on mailing lists, if you do not show what the error is.)

Florent is correct, a true 'rename' is not possible, or perhaps not even meaningful. ( analogy - rename a file from one disk to another )
"Move" however is meaningful (copy then delete in a transaction).
Defining "Move" is use case dependent - i.e. what metatdata also needs to 'move' ? permissions? collections ? document properties ? inherited permissions ?
xmlsh (http://www.xmlsh.org) implements a 'rename' (http://www.xmlsh.org/MarkLogicRename) command for the marklogic extension which is really a 'move', with the implemenation borrowed from postings on markmail (http://markmail.org/)
The implementation is the following XQuery - it doesnt do everything you might want and it might do more then you want. YMMV
https://github.com/DALDEI/xmlsh/blob/master/extensions/marklogic/src/org/xmlsh/marklogic/resources/rename.xquery
( it was also written long ago - it is likely to benefit from improvement )

I have working example this works for me.
xquery version "1.0-ml";
declare function local:document-rename(
$old-uri as xs:string, $new-uri as xs:string)
as empty-sequence()
{
xdmp:document-delete($old-uri),
let $permissions := xdmp:document-get-permissions($old-uri)
let $collections := xdmp:document-get-collections($old-uri)
return xdmp:document-insert(
$new-uri, doc($old-uri),
if ($permissions) then $permissions
else xdmp:default-permissions(),
if ($collections) then $collections
else xdmp:default-collections(),
xdmp:document-get-quality($old-uri)
)
,
let $prop-ns := namespace-uri(<prop:properties/>)
let $properties :=
xdmp:document-properties($old-uri)/node()
[ namespace-uri(.) ne $prop-ns ]
return xdmp:document-set-properties($new-uri, $properties)
};
(: function call :)
local:document-rename ("/opt/backup/x.xml","y.xml");

MarkLogic has a tutorial up addressing file renaming (moving):
https://developer.marklogic.com/recipe/move-a-document/
Importantly, it uses the function xdmp:lock-for-update() to prevent modifications to the source file while it is being copied to the target location.
Also, if you are doing a batch renaming you'll want to make sure that each file URI you rename corresponds to a document in the database or you'll get runtime errors.

Related

how to write query where the input file is passed from the command line (Saxon)

I'm very new to this.
I have a query and an xml file.
I can write a query over that specific file
for $x in doc("file:///C:/Users/Foo/IdeaProjects/XQuery/src/books.xml")/bookstore/book
where $x/price>30
order by $x/title
return $x/title
I have a basic xml file, with books in it, works nicely in intellij.
but if I wanted to run this query against some file defined on the command line, then how do I do it?
the command line for running the above is (as much for other peoples reference)
java -cp C:\Users\Foo\.IdeaIC2019.2\config\plugins\xquery-intellij-plugin\lib\Saxon-HE-9.9.1-7.jar net.sf.saxon.Query -t -q:"C:\Users\Foo\IdeaProjects\XQuery\src\w3schools.com.xqy"
and that also works nicely.
the saxon documentation
https://www.saxonica.com/html/documentation/using-xquery/commandline.html
implies that I can specify an input file, using "-d"
and "The document node of the document is made available to the query as the context item"
but this doesnt really make any sense to my 1 day old XQuery skills.
how do I specify the document is sent from the command line in the query? what is the context item? and how do I reference it?
(I can do a bit of XSLT 1.0, so I understand the notion of a context).

I think the option is named -s (for source) so you can use -s:books.xml and inside your XQuery main expression any path is evaluated with that document as the context item so you can just use e.g.
for $x in /bookstore/book
where $x/price>30
order by $x/title
return $x/title

and the answer is to drop the doc() function
for $x in bookstore/book
i.e. the same notion as xslt.

Storing files with xquery in exist

Running the following xquery directly in eXide (Eval) it works fine, adding the XML files in MyFSdirectory into the MyCollectionPath:
xquery version "3.1";
let $selected_directory:= 'MyFSdirectory'
let $source-directory := $selected_directory
let $target-collection := 'MyCollectionPath'
return
xmldb:store-files-from-pattern($target-collection, $source-directory, '*.xml')
But when I add it to a function and call it from my app, the store-files-from-pattern is not doing the job (no errors are shown but files are not uploaded), the check point is printed in my screen, so the function is being called correctly. Any hints?
declare function app:upload_file($node as node(), $model as map(*)) {
let $selected_directory:= "MyFSdirectory"
let $source-directory := $selected_directory
let $target-collection := "MyCollectionPath"
return
<p>check point</p> |
xmldb:store-files-from-pattern($target-collection, $source-directory, '*.xml')
};

This sounds like a permissions issue. In other words, when you run the script in eXide, you're likely running as a user (e.g., "admin") with write permissions on the target collection, but in your application the script is likely running as a guest user without the required permission to write to the target collection.
To troubleshoot, add an expression calling xmldb:login() to your app:upload_file() function, supplying the credentials for the user you use in eXide.
If elevating privileges this way works, then the next step would be to consider setting appropriate permissions on the target collection or setting applying setuid or setgid on the module that writes to the database.

Updating embedded triples using xquery in MarkLogic

I tried to update embedded triples in marklogic using xquery but it seems to be not working for embedded triples however the same query is working for other triples
can you tell me if there is some other option which needs to specified while performing an update on embedded triples.
The code i used is
xquery version "1.0-ml";
import module namespace sem = "http://marklogic.com/semantics"
at "/Marklogic/semantics.xqy";
let $triples := cts:triples(sem:iri("http://smartlogic.com/document#2012-10-26_DNB.OL_(Citi)_DNB_ASA_(DNB.OL)__Model_Update.61259187.xml"),()())
for $triple in $triples
let $node := sem:database-nodes($triple)
let $replace :=
<sem:triple>
<sem:subject>http://www.example.com/products/1001_Test
</sem:subject>
{$node/sem:predicate, $node/sem:object}
</sem:triple>
return $node ! xdmp:node-replace(., $replace)
My document contains the following triple
<sem:triples xmlns:sem="http://marklogic.com/semantics">
<sem:triple>
<sem:subject>http://smartlogic.com/document#2012-10-26_DNB.OL_(Citi)_DNB_ASA_(DNB.OL)__Model_Update.61259187.xml</sem:subject>
<sem:predicate>http://www.smartlogic.com/schemas/docinfo.rdf#cik</sem:predicate>
<sem:object>datatype="http://www.w3.org/2001/XMLSchema#string</sem:object>
</sem:triple>
</sem:triples>
and i want this particular subject to change into something like this
<sem:subject>http://www.example.com/products/1001_Test</sem:subject>
But when i use the xquery to update it , it does not alter anything, the embedded triple in the documents remains the same.
Because when i tried to see if any of the results have changed to the subject i specified it returned me no results.
I used the following query to test.
SELECT *
WHERE {
<http://www.example.com/products/1001_Test> ?predicate ?object
}

You need to add the option 'all' when you ask for the database nodes backing the triple: sem:database-nodes($triple, 'all').
To be perfectly honest, I am not 100% sure why, but I think this is because your sem:triples element is not the root element of the document it appears on.

Fix serialized data broken due to editing MySQL database in a text editor?

Background: I downloaded a *.sql backup of my WordPress site's database, and replaced all instances of the old database table prefix with a new one (e.g. from the default wp_ to something like asdfghjkl_).
I've just learnt that WordPress uses serialized PHP strings in the database, and what I did will have messed with the integrity of the serialized string lengths.
The thing is, I deleted the backup file just before I learnt about this (as my website was still functioning fine), and installed a number of plugins since. So, there's no way I can revert back, and I therefore would like to know two things:
How can I fix this, if at all possible?
What kind of problems could this cause?
(This article states that, a WordPress blog for instance, could lose its settings and widgets. But this doesn't seem to have happened to me as all the settings for my blog are still intact. But I have no clue as to what could be broken on the inside, or what issues it'd pose in the future. Hence this question.)

Visit this page: http://unserialize.onlinephpfunctions.com/
On that page you should see this sample serialized string: a:1:{s:4:"Test";s:17:"unserialize here!";}. Take a piece of it-- s:4:"Test";. That means "string", 4 characters, then the actual string. I am pretty sure that what you did caused the numeric character count to be out of sync with the string. Play with the tool on the site mentioned above and you will see that you get an error if you change "Test" to "Tes", for example.
What you need to do is get those character counts to match your new string. If you haven't corrupted any of the other encoding-- removed a colon or something-- that should fix the problem.

I came to this same problem after trying to change the domain from localhost to the real URL. After some searching I found the answer in Wordpress documentation:
https://codex.wordpress.org/Moving_WordPress
I will quote what is written there:
To avoid that serialization issue, you have three options:
Use the Better Search Replace or Velvet Blues Update URLs plugins if you can > access your Dashboard.
Use WP-CLI's search-replace if your hosting provider (or you) have installed WP-CLI.
Run a search and replace query manually on your database. Note: Only perform a search and replace on the wp_posts table.
I ended up using WP-CLI which is able to replace things in the database without breaking serialization: http://wp-cli.org/commands/search-replace/

I know this is an old question, but better late than never, I suppose. I ran into this problem recently, after inheriting a database that had had a find/replace executed on serialized data. After many hours of researching, I discovered that this was because the string counts were off. Unfortunately, there was so much data with lots of escaping and newlines and I didn't know how to count in some cases and I had so much data that I needed something automated.
Along the way, I stumbled across this question and Benubird's post helped put me on the right path. His example code did not work in production use on complex data, containing numerous special characters and HTML, with very deep levels of nesting, and it did not properly handle certain escaped characters and encoding. So I modified it a bit and spent countless hours working through additional bugs to get my version to "fix" the serialized data.
// do some DB query here
while($res = db_fetch($qry)){
$str = $res->data;
$sCount=1; // don't try to count manually, which can be inaccurate; let serialize do its thing
$newstring = unserialize($str);
if(!$newstring) {
preg_match_all('/s:([0-9]+):"(.*?)"(?=;)/su',$str,$m);
# preg_match_all("/s:([0-9]+):(\"[^\"\\\\]*(?:\\\\.[^\"\\\\]*)*\")(?=;)/u",$str,$m); // alternate: almost works but leave quotes in $m[2] output
# print_r($m); exit;
foreach($m[1] as $k => $len) {
/*** Possibly specific to my case: Spyropress Builder in WordPress ***/
$m_clean = str_replace('\"','"',$m[2][$k]); // convert escaped double quotes so that HTML will render properly
// if newline is present, it will output directly in the HTML
// nl2br won't work here (must find literally; not with double quotes!)
$m_clean = str_replace('\n', '<br />', $m_clean);
$m_clean = nl2br($m_clean); // but we DO need to convert actual newlines also
/*********************************************************************/
if($sCount){
$m_new = $m[0][$k].';'; // we must account for the missing semi-colon not captured in regex!
// NOTE: If we don't flush the buffers, things like <img src="http://whatever" can be replaced with <img src="//whatever" and break the serialize count!!!
ob_end_flush(); // not sure why this is necessary but cost me 5 hours!!
$m_ser = serialize($m_clean);
if($m_new != $m_ser) {
print "Replacing: $m_new\n";
print "With: $m_ser\n";
$str = str_replace($m_new, $m_ser, $str);
}
}
else{
$m_len = (strlen($m[2][$k]) - substr_count($m[2][$k],'\n'));
if($len != $m_len) {
$newstr='s:'.$m_len.':"'.$m[2][$k].'"';
echo "Replacing: {$m[0][$k]}\n";
echo "With: $newstr\n\n";
$str = str_replace($m_new, $newstr, $str);
}
}
}
print_r($str); // this is your FIXED serialized data!! Yay!
}
}
A little geeky explanation on my changes:
I found that trying to count with Benubird's code as a base was too inaccurate for large datasets, so I ended up just using serialize to be sure the count was accurate.
I avoided the try/catch because, in my case, the try would succeed but just returned an empty string. So, I check for empty data instead.
I tried numerous regex's but only a mod on Benubird's would accurately handle all cases. Specifically, I had to modify the part that checked for the ";" because it would match on CSS like "width:100%; height:25px;" and broke the output. So, I used a positive lookahead to only match when the ";" was outside of the set of double quotes.
My case had lots of newlines, HTML, and escaped double quotes, so I had to add a block to clean that up.
There were a couple of weird situations where data would be replaced incorrectly by the regex and then the serialize would count it incorrectly as well. I found NOTHING on any sites to help with this and finally thought it might be related to caching or something like that and tried flushing the output buffer (ob_end_flush()), which worked, thank goodness!
Hope this helps someone... Took me almost 20 hours including the research and dealing with weird issues! :)

This script (https://interconnectit.com/products/search-and-replace-for-wordpress-databases/) can help to update an sql database with proper URLs everywhere, without encountering serialized data issues, because it will update the "characters count" that could throw your URLs out of sync whenever serialized data occurs.
The steps would be:
if you already have imported a messed up database (widgets not
working, theme options not there, etc), just drop that database
using PhpMyAdmin. That is, remove everything on it. Then export and
have at hand an un-edited dump of the old database.
Now you have to import the (un-edited) old database into the
newly created one. You can do this via an import, or copying over
the db from PhpMyAdmin. Notice that so far, we haven't done any
search and replace yet; we just have an old database content and
structure into a new database with its own user and password. Your site will be probably unaccessible at this point.
Make sure you have your WordPress files freshly uploaded to the
proper folder on the server, and edit your wp-config.php to make it
connect with the new database.
Upload the script into a "secret" folder - just for security
reasons - at the same level than wp-admin, wp-content, and wp-includes. Do not forget to remove it all once the search and
replace have taken place, because you risk to offer your DB details
open to the whole internet.
Now point your browser to the secret folder, and use the script's fine
interface. It is very self-explanatory. Once used, we proceed to
completely remove it from the server.
This should have your database properly updated, without any serialized data issues around: the new URL will be set everywhere, and serialized data characters counts will be accordingly updated.
Widgets will be passed over, and theme settings as well - two of the typical places that use serialized data in WordPress.
Done and tested solution!

If the error is due to the length of the strings being incorrect (something I have seen frequently), then you should be able to adapt this script to fix it:
foreach($strings as $key => $str)
{
try {
unserialize($str);
} catch(exception $e) {
preg_match_all('#s:([0-9]+):"([^;]+)"#',$str,$m);
foreach($m[1] as $k => $len) {
if($len != strlen($m[2][$k])) {
$newstr='s:'.strlen($m[2][$k]).':"'.$m[2][$k].'"';
echo "len mismatch: {$m[0][$k]}\n";
echo "should be: $newstr\n\n";
$strings[$key] = str_replace($m[0][$k], $newstr, $str);
}
}
}
}

I personally don't like working in PHP, or placing my DB credentials in an public file. I created a ruby script to fix serializations that you can run locally:
https://github.com/wsizoo/wordpress-fix-serialization
Context Edit:
I approached fixing serialization by first identifying serialization via regex, and then recalculating the byte size of the contained data string.
$content_to_fix.gsub!(/s:([0-9]+):\"((.|\n)*?)\";/) {"s:#{$2.bytesize}:\"#{$2}\";"}
I then update the specified data via an escaped sql update query.
escaped_fix_content = client.escape($fixed_content)
query = client.query("UPDATE #{$table} SET #{$column} = '#{escaped_fix_content}' WHERE #{$column_identifier} LIKE '#{$column_identifier_value}'")

Using XQuery with multiple collection functions

I am using the following XQuery to query a collection of files:
for $files in collection("/data?select=*data*.xml")
Each file in the directory has a specific name, which enables me to recognize it. I use this as the identifier, which I retrieve as follows:
let $file-id := tokenize(base-uri($files), "/")[last()]
The $file-id variable follows a certain pattern: abc-1234. The first eight characters are relevant, so I fetch them using the variable below:
let $file-link-id := substring($file-id, 1, 8)
Now, I have another collection of files, which I want to query. These files follow the same pattern in the name, because they contain connected information.
How can I use the $file-link-id to select the correct file in the second collection?
I assume I would have to include it in the second collection clause, something along the lines of ?select=$file-link-id.xml, but I am unsure of how to do this.

Maybe you could clearify your problem statement if I assume wrong here, because your problem seems to be very easy solveable (so maybe I misunderstood your problem).
So you have your correct $file-link-id and want to use it as string. If your xquery processor supports XQuery 3.0 you can use two pipes, i.e.
for $files in collection("/data?select=" || $file-link-id || ".xml")
If not, use string-join():
for $files in collection(string-join(("/data?select=", $file-link-id, ".xml"), ''))

Develop Reference

r css asp.net wordpress firebase qt symfony nginx http apache-flex