How to convert string to XPATH in BaseX - xquery

How can I convert a string into an XPath expression? Below is the code:
let $ti := "item/title"
let $tiValue := "Welcome to America"
return db:open('test')/*[ $tiValue = $ti]/base-uri()

Here is one way to solve it:
let $ti := "item/title"
let $tiValue := "Welcome to America"
let $input := db:open('test')
let $steps := tokenize($ti, '/')
let $process-step := function($input, $step) { $input/*[name() = $step] }
let $output := fold-left($steps, $input, $process-step)
let $test := $output[. = $tiValue]
return $test/base-uri()
The path string is split into single steps (item, title). With fold-left, all child nodes of the current input (initially db:open('test')) will be matched against the current step (initially, item). The result will be used as new input and matched against the next step (title), and so on. Finally, only those nodes with $tiValue as text value will be returned.
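The same stepwise matching can be tried without a database. This is a minimal sketch that assumes an inline document whose root element is item; the document and the variable names are made up for illustration and are not part of the original answer.

```xquery
xquery version "3.0";
(: Hypothetical inline document standing in for db:open('test') :)
let $doc := document { <item><title>Welcome to America</title></item> }
let $steps := tokenize("item/title", "/")
(: Start from the document node and descend one named child step at a time :)
let $matches := fold-left(
  $steps,
  $doc,
  function($nodes, $step) { $nodes/*[name() = $step] }
)
return $matches[. = "Welcome to America"]/string()
```

Each call of the inline function receives the nodes matched so far and the next step name, so the fold ends with the title elements reached through item.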

Your question is very unclear - the basic problem is that you've shown us some code that doesn't do what you want, and you're asking us to work out what you want by guessing what was going on in your head when you wrote the incorrect code.
I suspect -- I may be wrong -- that you were hoping this might somehow give you the result of
db:open('test')/*[item/title = $tiValue]/base-uri()
and presumably $ti might hold different path expressions on different occasions.
XQuery 3.0/3.1 doesn't have any standard way to evaluate an XPath expression supplied dynamically as a string (unless you count the rather devious approach of using fn:transform() to invoke an XSLT transformation that uses the xsl:evaluate instruction).
BaseX, however, has an xquery:eval() function that will do the job for you. See https://docs.basex.org/wiki/XQuery_Module
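As a rough sketch of how that could look, assuming the BaseX XQuery Module convention of binding the context item through the empty-string key of the bindings map, and keeping db:open() as used in the question (untested against a real database):

```xquery
let $ti := "item/title"
let $tiValue := "Welcome to America"
return db:open('test')/*[
  (: Evaluate the path string relative to the current element :)
  xquery:eval($ti, map { '': . }) = $tiValue
]/base-uri()
```

Because the string is evaluated as a query, it should only ever come from a trusted source.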

Related

Counting nr of elements in a file

I am trying to count the number of harbour elements in an XML file. However, I keep getting the following error:
item expected, sequence found: (element harbour {...}, ...)
The code snippet is the following:
for $harbour in distinct-values(/VOC/voyage/leftpage/harbour)
let $count := count(/VOC/voyage/leftpage/harbour eq $harbour)
return concat($harbour, " ", $count)
Input XML:
<voyage>
<number>4411</number>
<leftpage>
<harbour>Rammekens</harbour>
</leftpage>
</voyage>
<voyage>
<number>4412</number>
<leftpage>
<harbour>Texel</harbour>
</leftpage>
</voyage>
Can someone help me out? How do I iterate over the number of harbours in the XML file instead of trying to use /VOC/voyage/leftpage/harbour?
eq is a value comparison, i.e. it is used to compare individual items. That is why the error message tells you that it expected a (single) item but instead found all the harbour elements. You have to use the general comparison operator = instead. Also, if you compared it like this
/VOC/voyage/leftpage/harbour = $harbour
the count would always be 1, because the comparison only tests whether any harbour matches and returns a single boolean. Instead, you want to keep only the harbour elements whose value equals $harbour. You can do so with a predicate in []. Altogether it will be
for $harbour in distinct-values(/VOC/voyage/leftpage/harbour)
let $count := count(/VOC/voyage/leftpage/harbour[. = $harbour])
return concat($harbour, " ", $count)
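The difference between counting a comparison and counting a filtered sequence can be seen in a small sketch; the three inline harbour elements are made up for illustration:

```xquery
let $harbours := (
  <harbour>Texel</harbour>,
  <harbour>Rammekens</harbour>,
  <harbour>Texel</harbour>
)
return (
  (: The general comparison yields one boolean, so the count is 1 :)
  count($harbours = "Texel"),
  (: The predicate keeps the matching elements, so the count is 2 :)
  count($harbours[. = "Texel"])
)
```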
Also, if your XQuery processor supports XQuery 3.0, you can use the group by clause, which in my opinion is nicer to read (and could be faster, but this depends on the implementation):
for $voyage in /VOC/voyage
let $harbour := $voyage/leftpage/harbour
let $harbour-name := $harbour/string()
group by $harbour-name
return $harbour-name || " " || count($harbour)

Compare two elements of the same document in MarkLogic

I have a MarkLogic 8 database in which there are documents which have two date time fields:
created-on
active-since
I am trying to write an XQuery to find all the documents for which the value of active-since is less than the value of created-on.
Currently I am using the following FLWOR expression:
for $entity in fn:collection("entities")
let $id := fn:data($entity//id)
let $created-on := fn:data($entity//created-on)
let $active-since := fn:data($entity//active-since)
where $active-since < $created-on
return
(
$id,
$created-on,
$active-since
)
The above query takes too long to execute, and its execution time will grow with the number of documents.
Also, I have an element range index for both of the above-mentioned dateTime fields, but they are not being used here. The cts-element-query function only compares one element with a set of atomic values. In my case I am trying to compare two elements of the same document.
I think there should be a better and optimized solution for this problem.
Please let me know in case there is any search function or any other approach which will be suitable in this scenario.
This may be efficient enough for you.
Take one of the values and build a range query per value. This all uses the range indexes, so in that sense it is efficient. However, at some point a large query is built. It reads similar to a FLWOR statement. If you really wanted to be a bit more efficient, you could find out which of your elements has fewer unique values (the size of the index) and use that one for your iteration, thus building a smaller query. Also, you will note that on the element-values call I constrain it to your collection. This is just in case you happen to have that element in documents outside of your collection. It keeps the list to only those values you know are in your collection:
let $q := cts:or-query(
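(: the third argument of cts:element-values is the options list; the constraining query goes fourth :)
for $created-on in cts:element-values(xs:QName("created-on"), (), (), cts:collection-query("entities"))
return cts:element-range-query(xs:QName("active-since"), "<", $created-on)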
)
return
cts:search(
fn:collection("entities"),
$q
)
So, let's explain what is happening with a simple example:
Let's say I have elements A and B, each with a range index defined.
Let's pretend we have the following combinations in 5 documents:
A,B
2,3
4,2
2,7
5,4
2,9
let $q := cts:or-query(
for $a in cts:element-values(xs:QName("A"))
return cts:element-range-query(xs:QName("B"), "<", $a)
)
This would create the following query:
cts:or-query(
(
cts:element-range-query(xs:QName("B"), "<", 2),
cts:element-range-query(xs:QName("B"), "<", 4),
cts:element-range-query(xs:QName("B"), "<", 5)
)
)
And in the example above, the only match would be the document with the combination: (5,4)
You might try using cts:value-tuples(). Pass in three references: active-since, created-on, and the URI reference. Then iterate over the results looking for the ones where active-since is less than created-on, and you'll have the URI of the doc.
It's not the prettiest code, but it will let all the data come from RAM, so it should scale nicely.
I am now using the following script to get the count of documents for which the value of active-since is less than the value of created-on:
fn:sum(
for $value-pairs in cts:value-tuples(
(
cts:element-reference(xs:QName("created-on")),
cts:element-reference(xs:QName("active-since"))
),
("fragment-frequency"),
cts:collection-query("entities")
)
let $created-on := json:array-values($value-pairs)[1]
let $active-since := json:array-values($value-pairs)[2]
return
if($active-since lt $created-on) then cts:frequency($value-pairs) else 0
)
Sorry, I don't have enough reputation to comment, so I have to respond to your answer here. Why do you think that ML will not return (2,3) and (4,2)? I believe we are using an or-query, which returns a document as soon as any single sub-query matches it.
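The point raised here can be checked with plain XQuery, without MarkLogic. The sketch below models the A/B example from the answer above under the assumption that every range query inside the or-query is tested against each document independently; the inline doc elements are illustrative stand-ins for the five documents.

```xquery
xquery version "3.0";
let $docs := (
  <doc><A>2</A><B>3</B></doc>,
  <doc><A>4</A><B>2</B></doc>,
  <doc><A>2</A><B>7</B></doc>,
  <doc><A>5</A><B>4</B></doc>,
  <doc><A>2</A><B>9</B></doc>
)
(: Distinct A values across all documents, like cts:element-values would list :)
let $a-values := distinct-values($docs/A ! xs:integer(.))
(: Model of the or-query: B is smaller than ANY of the A values :)
let $or-matches := $docs[some $a in $a-values satisfies xs:integer(B) < $a]
(: What the question asks for: B is smaller than A in the SAME document :)
let $same-doc-matches := $docs[xs:integer(B) < xs:integer(A)]
return (
  string-join($or-matches ! (A || "," || B), " "),
  string-join($same-doc-matches ! (A || "," || B), " ")
)
```

Under that assumption the first string lists 2,3 4,2 5,4 and the second lists 4,2 5,4, so the or-query can also return documents whose own two values do not satisfy the condition.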

Recursively wrapping up an element

Say I have an element <x>x</x> and some empty elements (<a/>, <b/>, <c/>), and I want to wrap up the first inside the second one at a time, resulting in <c><b><a><x>x</x></a></b></c>. How do I go about this when I don't know the number of the empty elements?
I can do
xquery version "3.0";
declare function local:wrap-up($inner-element as element(), $outer-elements as element()+) as element()+ {
if (count($outer-elements) eq 3)
then element{node-name($outer-elements[3])}{element{node-name($outer-elements[2])}{element{node-name($outer-elements[1])}{$inner-element}}}
else
if (count($outer-elements) eq 2)
then element{node-name($outer-elements[2])}{element{node-name($outer-elements[1])}{$inner-element}}
else
if (count($outer-elements) eq 1)
then element{node-name($outer-elements[1])}{$inner-element}
else ($outer-elements, $inner-element)
};
let $inner-element := <x>x</x>
let $outer-elements := (<a/>, <b/>, <c/>)
return
local:wrap-up($inner-element, $outer-elements)
but is there a way to do this by recursion, not descending and parsing but ascending and constructing?
In functional programming, you usually try to work with the first element and the tail of a list, so the canonical solution would be to reverse the input before nesting the elements:
declare function local:recursive-wrap-up($elements as element()+) as element() {
let $head := head($elements)
let $tail := tail($elements)
return
element { name($head) } { (
$head/@*,
$head/node(),
if ($tail)
then local:recursive-wrap-up($tail)
else ()
) }
};
let $inner-element := <x>x</x>
let $outer-elements := (<a/>, <b/>, <c/>)
return (
local:wrap-up($inner-element, $outer-elements),
local:recursive-wrap-up(reverse(($inner-element, $outer-elements)))
)
Whether reverse(...) actually has to reverse the sequence depends on your XQuery engine. In the end, reversing does not increase the computational complexity, and it may result not only in cleaner code but even in faster execution!
Something similar could be achieved by working from the end of the sequence instead, but there are no built-in functions for getting the last item and everything before it, and using the predicates [last()] and [position() < last()] may reduce performance. You could use XQuery arrays, but you would have to pass counters in each recursive function call.
Which solution is fastest in the end will require benchmarking using the specific XQuery engine and code.
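As a further alternative that is not part of the answer above, the "ascending and constructing" idea can also be sketched with fold-left, assuming an XQuery 3.0 processor. The fold starts with the inner element and wraps it in one outer element per step:

```xquery
xquery version "3.0";
let $inner-element := <x>x</x>
let $outer-elements := (<a/>, <b/>, <c/>)
return fold-left(
  $outer-elements,
  $inner-element,
  (: $wrapped is the result so far, $outer the next wrapper to apply :)
  function($wrapped, $outer) {
    element { node-name($outer) } { $outer/@*, $wrapped }
  }
)
```

This works for any number of outer elements and needs neither reverse() nor positional predicates.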

Access variable from within itself in XQuery

I'm wondering whether in XQuery it is possible to access some elements in a variable from within the variable itself.
For instance, if you have a variable with several numbers and you want to sum them all up inside the variable itself. Can you do that with only one variable? Consider something like this:
let $my_variable :=
<my_variable_root>
<number>5</number>
<number>10</number>
<sum>{sum (??)}</sum>
</my_variable_root>
return $my_variable
Can you put some XPath expression inside sum() to access the value of the preceding number elements? I've tried $my_variable//number/number(text()), //number/number(text()), and preceding-sibling::number/number(text()) - but nothing worked for me.
You cannot do that. The variable is not created until everything in it has been constructed.
But you can have temporary variables inside the variable, like this:
let $my_variable :=
<my_variable_root>{
let $numbers := (
<number>5</number>,
<number>10</number>
)
return ($numbers, <sum>{sum ($numbers)}</sum>)
} </my_variable_root>
Or (XQuery 3):
let $my_variable :=
<my_variable_root>{
let $numbers := (5,10)
return (
$numbers ! <number>{.}</number>,
<sum>{sum ($numbers)}</sum>)
} </my_variable_root>
This is not possible, neither by using the variable name (it is not defined yet), nor using the preceding-sibling axis (no context item bound).
Construct the variable's contents in a flwor-expression instead:
let $my_variable :=
let $numbers := (
<number>5</number>,
<number>10</number>
)
return
<my_variable_root>
{ $numbers }
<sum>{ sum( $numbers) }</sum>
</my_variable_root>
return $my_variable
If you have similar patterns multiple times, consider writing a function; using XQuery Update might also be an alternative (but does not seem to be the most reasonable one to me, both in terms of readability and probably performance).
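A function for this pattern might look like the following sketch; the name local:with-sum and its signature are made up for illustration:

```xquery
xquery version "3.0";
(: Hypothetical helper: wraps the numbers and their sum in one element :)
declare function local:with-sum($numbers as xs:decimal*) as element(my_variable_root) {
  <my_variable_root>{
    (for $n in $numbers return <number>{ $n }</number>),
    <sum>{ sum($numbers) }</sum>
  }</my_variable_root>
};

let $my_variable := local:with-sum((5, 10))
return $my_variable
```

The numbers are passed in once, and both the number elements and the sum are derived from the same parameter.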

idl: pass keyword dynamically to isa function to test structure read by read_csv

I am using IDL 8.4. I want to use the isa() function to check the types of the fields read by read_csv(). I want to use /number, /integer, /float and /string, because some fields must be float, others must be integer, and for others I don't care. I can do it like this, but it is not very readable:
str = read_csv(filename, header=inheader)
; TODO check header
if not isa(str.(0), /integer) then stop
if not isa(str.(1), /number) then stop
if not isa(str.(2), /float) then stop
I am hoping I can do something like
expected_header = ['id', 'x', 'val']
expected_type = ['/integer', '/number', '/float']
str = read_csv(filename, header=inheader)
if not array_equal(strlowcase(inheader), expected_header) then stop
for i=0l,n_elements(expected_type)-1 do begin
if not isa(str.(i), expected_type[i]) then stop
endfor
The above doesn't work, as '/integer' is taken literally and I guess isa() is looking for a named structure. How can I do something similar?
Ideally I want to pick the expected type based on the header read from the file, so that the script still works as long as the header specifies the expected fields.
EDIT:
My tentative solution is to write a wrapper for ISA(). It is not very pretty, but it does what I wanted. If there is a cleaner solution, please let me know.
Also, read_csv is defined to return only one of long, long64, double and string, so I could write a function that tests only for these. But I wanted to make it work in general so that I can reuse it for other similar cases.
function isa_generic,var,typ
; calls isa() http://www.exelisvis.com/docs/ISA.html with keyword
; if 'n', test /number
; if 'i', test /integer
; if 'f', test /float
; if 's', test /string
if typ eq 'n' then return, isa(var, /number)
if typ eq 'i' then return, isa(var, /integer)
if typ eq 'f' then return, isa(var, /float)
if typ eq 's' then return, isa(var, /string)
print, 'unexpected typename: ', typ
stop
end
IDL has some limited reflection abilities, which will do exactly what you want:
expected_types = ['integer', 'number', 'float']
expected_header = ['id', 'x', 'val']
str = read_csv(filename, header=inheader)
if ~array_equal(strlowcase(inheader), expected_header) then stop
foreach type, expected_types, index do begin
if ~isa(str.(index), _extra=create_struct(type, 1)) then stop
endforeach
It's debatable if this is really "easier to read" in your case, since there are only three cases to test. If there were 500 cases, it would be a lot cleaner than writing 500 slightly different lines.
This snippet uses some rather esoteric IDL features, so let me explain a bit what's happening:
expected_types is just a list of (string) keyword names in the order they should be used.
The foreach part iterates over expected_types, putting the keyword string into the type variable and the iteration count into index.
This is equivalent to using for index = 0, n_elements(expected_types) - 1 do and then using expected_types[index] instead of type, but the foreach loop is easier to read IMHO.
_extra is a special keyword that can pass a structure as if it were a set of keywords. Each of the structure's tags is interpreted as a keyword.
The create_struct function takes one or more pairs of (string) tag names and (any type) values, then returns a structure with those tag names and values.
Finally, I replaced not (bitwise not) with ~ (logical not). This step, like foreach vs for, is not necessary in this instance, but can avoid headache when debugging some types of code, where the distinction matters.
--
Reflective abilities like these can do an awful lot, and come in super handy. They're work-horses in other languages, but IDL programmers don't seem to use them as much. Here's a quick list of common reflective features I use in IDL:
create_struct - Create a structure from (string) tag names and values.
n_tags - Get the number of tags in a structure.
_extra, _strict_extra, and _ref_extra - Pass keywords by structure or reference.
call_function - Call a function by its (string) name.
call_procedure - Call a procedure by its (string) name.
call_method - Call a method (of an object) by its (string) name.
execute - Run complete IDL commands stored in a string.
Note: Be very careful using the execute function. It will blindly execute any IDL statement you (or a user, file, web form, etc.) feed it. Never ever feed untrusted or web user input to the IDL execute function.
You can't access the keywords quite like that, but there is a typename parameter to ISA that might be useful. This is untested, but should work:
expected_header = ['id', 'x', 'val']
expected_type = ['int', 'long', 'float']
str = read_csv(filename, header=inheader)
if not array_equal(strlowcase(inheader), expected_header) then stop
for i = 0L, n_elements(expected_type) - 1L do begin
if not isa(str.(i), expected_type[i]) then stop
endfor
