BaseX - Out of memory when using enclosing xml in XQuery - xquery

I've been trying to query a BaseX db which contains more than 1500000 items.
When I run this query
for $item in collection('coll')//item
return $item (: returns an xml element :)
it executes in less than a second.
But when I try to wrap the result in an XML element, I get an "Out of main memory" error.
<xml>{
for $item in collection('coll')//item
return $item
}</xml>
This is something that makes me want to abandon the native XML database approach (the same happens with other databases, such as eXistDB), so if anyone has any info on this problem, it would be extremely helpful.
Thanks

Due to the semantics of XQuery, all child nodes need to be copied if they are wrapped by a new parent node. This is demonstrated by the following query, which compares the node identity of the original and copied node. It will yield false:
let $node := <node/>
let $parent := <parent>{ $node }</parent>
return $parent/node is $node
As copying millions of nodes is expensive, this inevitably leads to an out-of-memory error.
If you write results to files, here is a pragmatic solution to get around this restriction:
(:~
: Writes elements to a file, wrapped by a root node.
: @param $path path to file
: @param $elements elements to write
: @param $name name of root node
:)
declare function local:write-to(
$path as xs:string,
$elements as element()*,
$name as xs:string
) as empty-sequence() {
file:write-text($path, '<' || $name || '>'),
file:append($path, $elements),
file:append-text($path, '</' || $name || '>')
};
local:write-to('result.xml', <result/>, 'root')
To anticipate criticism: This is a clear hack. For example, the approach conflicts with various non-default serialization parameters of BaseX (the result will not be well-formed if an XML declaration needs to be output, etc.).
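The same streaming idea can be sketched outside BaseX. The following is a minimal Python illustration, not part of the original answer: it writes the opening root tag, serializes one element at a time, and then writes the closing tag, so the full result is never held in memory. The names `write_wrapped` and `generate_items` are hypothetical, and `generate_items` merely stands in for the database query.

```python
import os
import tempfile
import xml.etree.ElementTree as ET
from typing import Iterable, Iterator


def write_wrapped(path: str, elements: Iterable[ET.Element], root_name: str) -> None:
    """Stream elements to a file between an opening and a closing root tag."""
    with open(path, "w", encoding="utf-8") as out:
        out.write(f"<{root_name}>")
        for element in elements:
            # Serialize one element at a time; nothing is accumulated in memory.
            out.write(ET.tostring(element, encoding="unicode"))
        out.write(f"</{root_name}>")


def generate_items(count: int) -> Iterator[ET.Element]:
    """Hypothetical stand-in for the database query: yields items lazily."""
    for i in range(count):
        item = ET.Element("item", {"id": str(i)})
        item.text = f"value {i}"
        yield item


if __name__ == "__main__":
    target = os.path.join(tempfile.mkdtemp(), "result.xml")
    write_wrapped(target, generate_items(1000), "xml")
    print(len(ET.parse(target).getroot()))
```

As with the XQuery version, this sketch writes no XML declaration and assumes the root name is a valid element name.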

With BaseX 9.0, you can temporarily disable node copying via the COPYNODE option:
(# db:copynode false #) {
<xml>{
for $item in collection('coll')//item
return $item
}</xml>
}

Related

How to list collections/resources recursively in XQuery

I would like to list all collections from a particular point recursively:
declare function local:list-collections($collection as xs:string) {
for $child in xmldb:get-child-collections($collection)
return
local:list-collections(concat($collection, '/', $child))
};
local:list-collections('/db/apps/tested-bunny/data/')
This returns nothing (no errors, no results). I was inspired by this article and consider it a good starting point for recursively setting permissions and so on.
See the dbutil:scan-*() functions in Wolfgang Meier's article on higher order functions with XQuery in eXist-db 2.0+. The article is very instructive in general. These days the dbutil module is available in the shared-resources package that is installed by default with eXist, so you can make use of it as follows:
xquery version "3.0";
import module namespace dbutil="http://exist-db.org/xquery/dbutil"
at "/db/apps/shared-resources/content/dbutils.xql";
dbutil:scan-collections(
xs:anyURI('/db'),
function($collection) { $collection }
)
These functions perform well. I just ran this in eXide and the query returned 4125 collection names in 0.699s.
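Your query does actually find collections recursively, but it produces no output: the function returns only the results of its recursive calls and never the collection path itself. I'd suggest doing something like this: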
declare function local:list-collections($collection as xs:string) {
for $child in xmldb:get-child-collections($collection)
let $childCollection := concat($collection, '/', $child)
return
(local:list-collections($childCollection), $childCollection)
};
local:list-collections('/db/apps/fundocs')
But for sure Joe's suggestion is much cleaner.

What's the difference between global var post and get_post?

I use the the_posts filter to add an object to each queried post. When I access the added object, I get different results depending on whether I use $post or get_post().
This is the code to attach the object to posts:
add_filter( 'the_posts', 'populate_posts_obj', 10,2 );
function populate_posts_obj( $posts, $query ){
if ( !count( $posts ) || !isset($query->query['post_type']) )
return $posts;
if( in_array( $query->query['post_type'], get_valid_grade_types())){
foreach ( $posts as $post ) {
if ( $obj = new Gradebook( $post->ID ) )
$post->gradebook = $obj;
}
}
return $posts;
}
When I then access the object via $post, I sometimes get it and sometimes don't (even when it's the same post):
function get_the_gradebook(){
global $post;
return isset($post->gradebook) ? $post->gradebook : null;
}
When I access the object via get_post(), I always get it:
function get_the_gradebook(){
global $post;
$p = get_post($post->ID);
return isset($p->gradebook) ? $p->gradebook : null;
}
I could just use the get_post() version, but it would be useful to know why there is a difference.
Additional info:
In case you wonder why I attach an object to each post: I think WordPress may then take care of the caching in the first place, and other caching plugins can work on my object as if they were working on standard WP posts.
Let me explain with a little pseudocode. I am trying to keep my approach broad so that my answer is relevant to StackOverflow, although I don't know how many downvotes I may receive for this.
The simple difference is that $post is a variable and get_post() is a function. You can expect different output from get_post() depending on several factors, whereas $post only changes when it is explicitly reassigned.
Let's assume something like this:
function get_post() {
return rand(0, 5);
}
$post = get_post(); /* lets assume random
value that was generated
this time was "2" */
Now each time you call get_post() its value keeps changing, whereas the value of $post is always 2.
Coming back to WordPress: $post is set using get_post() within the Loop and refers to the post object for the default post ID of the current URL, whereas get_post() takes a post ID as input and returns the corresponding post object.
$post is what WordPress considers to be the current "post" (post/page/custom post type) and can quite often end up giving you data you didn't quite expect. This is especially true if you run additional WP_Query loops in your template or have a template that uses data from several "posts".
By using get_post() with the ID you want the data from, you can be assured that you are getting the data you really want.
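The point made in these answers can be modelled without WordPress. The following Python sketch is only an analogy under stated assumptions: `current_post` plays the role of the global $post, `get_post` is a plain lookup by ID, and `set_current_post` and `render_sidebar` are hypothetical names for code that runs a second loop and leaves the global changed.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Post:
    id: int
    title: str


# Hypothetical post store standing in for the database / object cache.
POSTS: Dict[int, Post] = {1: Post(1, "Main article"), 2: Post(2, "Sidebar teaser")}

# Models the global $post: whatever was last marked as "current".
current_post: Optional[Post] = None


def get_post(post_id: int) -> Post:
    """Look a post up by ID; the result depends only on the argument."""
    return POSTS[post_id]


def set_current_post(post: Post) -> None:
    """Hypothetical helper that overwrites the global, as a loop would."""
    global current_post
    current_post = post


def render_sidebar() -> None:
    """A secondary loop that changes the global and never restores it."""
    set_current_post(get_post(2))


def demo() -> tuple:
    set_current_post(get_post(1))  # the main loop is on post 1
    render_sidebar()               # other code runs a second loop
    # The global now points at post 2; an explicit lookup is unaffected.
    return current_post.id, get_post(1).id
```

Calling `demo()` shows the global ending up on a different post than the one the main loop set, while the lookup by ID still returns the requested post.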

XQuery to change attribute value and return previous one

I'm trying to update an attribute value of a node and return its previous value all in one query and I can't find a way to do it. I'm using BaseX as my XML/XQuery database.
For now I've tried doing this:
/Root/Elem/properties/property[@id='17']/@format,
replace value of node /Root/Elem/properties/property[@id='17']/@format with 'URL'
and also this:
for $prop in /Root/Elem/properties/property[@id='17']
let $format := $prop/@format
return (replace value of node $prop/@format with 'URL', $format)
And multiple other tests but they all lead to the following error:
List expression: no updating expression allowed.
Is it a limitation of BaseX or is it not possible in XQuery?
XQuery Update does not allow returning results from an updating query. You can however use BaseX's proprietary update:output($seq) function to do that:
for $prop in /Root/Elem/properties/property[@id='17']
let $format := $prop/@format
return (replace value of node $format with 'URL', update:output($format))
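For contrast, here is a minimal Python sketch of the same "change the attribute and hand back the previous value" operation using the standard library's ElementTree. It is not BaseX code; the sample document, its attribute values, and the helper name `replace_attribute` are assumptions made for illustration. In XQuery Update the changes are collected in a pending update list and applied when the query ends, which is why returning a value needs a dedicated function there, whereas imperative code can simply read before it writes.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional

# Hypothetical sample document shaped like the paths in the question.
DOC = """<Root><Elem><properties>
<property id="17" format="TEXT"/>
<property id="18" format="TEXT"/>
</properties></Elem></Root>"""


def replace_attribute(
    root: ET.Element, path: str, name: str, new_value: str
) -> List[Optional[str]]:
    """Set an attribute on every matching element and return the old values."""
    previous: List[Optional[str]] = []
    for element in root.findall(path):
        previous.append(element.get(name))  # read the old value first
        element.set(name, new_value)        # then overwrite it
    return previous


root = ET.fromstring(DOC)
old_values = replace_attribute(
    root, "./Elem/properties/property[@id='17']", "format", "URL"
)
```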

Drupal: Return SQL string from db_query

Is it possible to return the actual SQL query as a string from the result of db_query?
Or otherwise take the returned resource ID from db_query and get the SQL string?
Edit:
As an addendum, I recently found out about db_queryd() from the Devel module, which echoes the query passed (as well as executing it). It doesn't return the string as this question asked, but it is really helpful for copying and pasting a complete query.
I don't think it is. However if you are only doing so for the purpose of debugging you can turn on the devel module and that will show you the queries run.
Actually you could just set the variable 'dev_query' to 1 and then access the global array $queries, but I wouldn't recommend it.
In Drupal 7, when debugging, you can look at the following function in \includes\database\database.inc:
function query($query, array $args = array(), $options = array())
and inspect $stmt's queryString
or
print_r($stmt->getQueryString());
If you have D7 but don't have Devel to hand, the following snippet could come in useful. It may not handle every type of placeholder, however: it currently wrongly assumes all placeholders are strings (which has been fine for my usage).
function stringify_query( $query ){
$s = preg_replace('/\}|\{/', '', $query->__toString());
$a = $query->arguments();
foreach ( $a as $key => $val ) {
$a[$key] = '\'' . $val . '\'';
}
return strtr($s, $a);
}
It also rudely strips out the curly braces Drupal uses to handle table prefixes. If you rely on table prefixes, you should find the correct Drupal function to have them replaced correctly.
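The substitution step of that snippet can be sketched in Python, independent of Drupal, to show one way of relaxing the "all placeholders are strings" assumption. This is a debugging aid only, not a safe way to build SQL, and the names `quote_value` and `stringify_query` as well as the sample placeholders are hypothetical. Like the PHP version, it simply strips the curly braces.

```python
import re
from typing import Mapping, Union

Value = Union[str, int, float, bool, None]


def quote_value(value: Value) -> str:
    """Render a value roughly as it would appear in SQL (debug output only)."""
    if value is None:
        return "NULL"
    if isinstance(value, bool):
        return "1" if value else "0"
    if isinstance(value, (int, float)):
        return str(value)
    # Strings are quoted, with embedded single quotes doubled.
    return "'" + value.replace("'", "''") + "'"


def stringify_query(sql: str, arguments: Mapping[str, Value]) -> str:
    """Replace named placeholders such as ':nid' with their quoted values."""
    sql = re.sub(r"[{}]", "", sql)  # strip the table-prefix braces
    if not arguments:
        return sql
    # Try longer names first so that ':nid' does not clobber ':nid_2'.
    names = sorted(arguments, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(name) for name in names))
    # A single pass, so substituted values are never scanned again.
    return pattern.sub(lambda m: quote_value(arguments[m.group(0)]), sql)
```

For example, `stringify_query("SELECT * FROM {node} WHERE nid = :nid", {":nid": 5})` yields `SELECT * FROM node WHERE nid = 5`.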
I would recommend the use of the devel module. There is a setting devel offers which will show all queries run during the generation of a page at the bottom of the page, with data on query execution time and the function that called db_query(). If you have a general idea of what your query will look like or the function that called it, you could search for it within your browser and see what was actually sent to the database.
Late answer, but you can often turn
$result = db_query($query, $arg1, $arg2);
quickly into
drupal_set_message(sprintf($query, $arg1, $arg2), "status");
And get what you want.
This doesn't help you if you are using an array as your argument to db_query as sprintf doesn't support that, but is often useful in your debugging toolkit.
For those using Drupal 7.x and the Devel module, the correct function to call to output the built SQL statement to the drupal message area is dpq(). It needs to be passed the query object though. e.g.
// to see the built SQL
$query = db_select('node', 'n')->fields('n');
dpq($query);
// to see the results of the query
$results = $query->execute()->fetchAssoc();
dsm($results);
Hope that can help!
D7 version with devel.
>= PHP 5.4
dpm(str_replace(['{', '}'], '', dpq($query, TRUE)));
< PHP 5.4
dpm(str_replace(array('{', '}'), '', dpq($query, TRUE)));

Removing [nid:n] in nodereference autocomplete

Using the autocomplete field for a cck nodereference always displays the node id as a cryptic bracketed extension:
Page Title [nid:23]
I understand that this ensures that selections are unique in case nodes have the same title, but obviously this is a nasty thing to expose to the user.
Has anyone had any success in removing these brackets, or adding a different unique identifier?
Ultimately, you need to change the output of nodereference_autocomplete() in nodereference.module.
To do this properly, you want a custom module to cleanly override the function.
This function is defined as a menu callback, thus,
/**
* Implementation of hook_menu_alter().
*/
function custom_module_menu_alter(&$items) {
$items['nodereference/autocomplete']['page callback'] = 'custom_module_new_nodereference_autocomplete';
}
Then, copy the nodereference_autocomplete function into your custom module, changing its name to match your callback. Then change this one line:
$matches[$row['title'] ." [nid:$id]"] = '<div class="reference-autocomplete">'. $row['rendered'] . '</div>';
Dropping the nid reference.
$matches[$row['title']] = '<div class="reference-autocomplete">'. $row['rendered'] . '</div>';
I believe the identifier is purely cosmetic at this point, which means you could also change the text however you like. If it is not purely cosmetic, well, I haven't tested to see what will happen in the wrong conditions.
I always meant to identify how to do this. Thank you for motivating me with your question.
What Grayside has posted will work... as long as you don't have two nodes with the same title. In other words, if you want to do as Grayside has proposed, you need to be aware that the nid is not entirely unimportant. The nodereference_autocomplete_validate() function does two things: it checks whether a matching node exists and, if so, passes the nid on by setting it in the $form_state array; if it can't find a node, it sets an error. If the nid is present, it is used to get the node, which is also faster. The code is here:
preg_match('/^(?:\s*|(.*) )?\[\s*nid\s*:\s*(\d+)\s*\]$/', $value, $matches);
if (!empty($matches)) {
// Explicit [nid:n].
list(, $title, $nid) = $matches;
if (!empty($title) && ($n = node_load($nid)) && $title != $n->title) {
form_error($element[$field_key], t('%name: title mismatch. Please check your selection.', array('%name' => t($field['widget']['label']))));
}
}
This just checks whether there is a nid and whether that node's title matches; if so, the nid is passed on.
The second option is a bit slower, and it is where errors can happen. If you follow the execution, you will see that it tries to find a node based on the title alone and takes the first node that matches. As a result, if you have two nodes with the same title, the same one will always be used. This might not be a problem for you, but you will never find out when it happens. Everything will appear to work just fine, and the user will think that he selected the node he wanted. That might be the case, but he might just as well have chosen the wrong node.
So in short, you can get rid of the nid in the autocomplete callback, but it has 2 drawbacks:
performance (little)
uncertainty in selecting the correct node.
So you have to think about it before going this route, especially since you most likely won't be able to trace the problem if the wrong node does get selected. Another thing to be aware of is that the visible nid also gives users some valuable information: a quick way to look up the node if they are in doubt whether it is the one they want when several nodes have similar titles.
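The ambiguity described above can be reproduced in a few lines of Python. The regular expression is the one from the validation code; everything else (the `NODES` list, `parse_reference`, `resolve`) is a hypothetical, simplified model that skips the title-mismatch check and is not the module's actual logic.

```python
import re
from typing import List, Optional, Tuple

# Same pattern as in the validation code quoted above.
NID_PATTERN = re.compile(r"^(?:\s*|(.*) )?\[\s*nid\s*:\s*(\d+)\s*\]$")

# Hypothetical data: two different nodes that share one title.
NODES: List[Tuple[int, str]] = [(23, "Page Title"), (42, "Page Title")]


def parse_reference(value: str) -> Tuple[Optional[str], Optional[int]]:
    """Split 'Title [nid:n]' into (title, nid); nid is None without a suffix."""
    match = NID_PATTERN.match(value)
    if match:
        title, nid = match.groups()
        return title, int(nid)
    return value, None


def resolve(value: str, nodes: List[Tuple[int, str]]) -> Optional[int]:
    """Prefer an explicit nid; otherwise fall back to the first title match."""
    title, nid = parse_reference(value)
    if nid is not None:
        return nid
    for node_id, node_title in nodes:
        if node_title == title:
            return node_id  # first match wins, silently
    return None
```

With the suffix, `resolve("Page Title [nid:42]", NODES)` returns 42. Without it, `resolve("Page Title", NODES)` returns 23 regardless of which of the two nodes the user meant, and no error is raised.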
I got Grayside's answer to work, but I had to use MENU alter, instead of the FORM alter he posted. No biggy!
function custommodule_menu_alter(&$items) {
$items['nodereference/autocomplete']['page callback'] = 'fp_tweaks_nodereference_autocomplete';
}
I've found an alternative solution is to change your widget type to select list and then use the chosen module to convert your list to an autocomplete field.
This handles nodes with the same title, and actually I think the UI is better than the one provided by the autocomplete widget.
To anyone coming across this (rather old) topic by way of a google search - for Drupal 7 please consider using entityreference module and "Entity Reference" field type if possible.
You can achieve a lot more in configuration with an "Entity Reference" field. It doesn't have this problem with the nid in square brackets.
Here is the full Drupal 7 version (References 7.x-2.1) of Grayside's answer. This goes in your custom module:
/**
* Implementation of hook_menu_alter().
*/
function custom_menu_alter(&$items) {
$items['node_reference/autocomplete/%/%/%']['page callback'] = 'custom_new_node_reference_autocomplete';
}
/**
* Implementation of Menu callback for the autocomplete results.
*/
function custom_new_node_reference_autocomplete($entity_type, $bundle, $field_name, $string = '') {
$field = field_info_field($field_name);
$instance = field_info_instance($entity_type, $field_name, $bundle);
$options = array(
'string' => $string,
'match' => $instance['widget']['settings']['autocomplete_match'],
'limit' => 10,
);
$references = node_reference_potential_references($field, $options);
$matches = array();
foreach ($references as $id => $row) {
// Markup is fine in autocompletion results (might happen when rendered
// through Views) but we want to remove hyperlinks.
$suggestion = preg_replace('/<a href="([^<]*)">([^<]*)<\/a>/', '$2', $row['rendered']);
// Add a class wrapper for a few required CSS overrides.
$matches[$row['title']] = '<div class="reference-autocomplete">' . $suggestion . '</div>'; // this is the line that was modified to remove the "[nid:XX]" disambiguator
}
drupal_json_output($matches);
}
