I'm trying to work out a nice cts query that matches two nodes, rather than one. For example, I have records from two sources, both with an ID value and a dateTime value. I'd like to find records in the first source that have a matching ID in the second source and a newer dateTime value.
Something like this (does not work):
cts:uris(
  (),
  (),
  cts:and-query((
    cts:collection-query("source1"),
    cts:path-range-query(
      "/record/ID",
      "=",
      cts:values(
        cts:path-reference("/record/ID"),
        (),
        (),
        cts:collection-query("source2")
      )
    ),
    cts:path-range-query(
      "/record/dateTimeValue",
      ">",
      cts:values(
        cts:path-reference("/record/dateTimeValue"),
        (),
        (),
        cts:collection-query("source2")
      )
    )
  ))
)
This won't work because the two conditions are evaluated independently: it returns records that have a matching ID and a dateTimeValue greater than that of any record in source2, not necessarily the record with the matching ID.
How do I make the cts query match on two values? Can I only do this through a FLWOR?
If I understand the requirement correctly, this query could be implemented efficiently with a join:
Create a TDE view scoped to the collection with a context of /record that projects id and datetime columns
Use an Optic query to join the view with itself on the id with a condition of a greater datetime and then join the matching documents (or project all of the needed columns from the record)
Something like the following:
const op = require('/MarkLogic/optic');
const docid = op.fragmentIdCol('docid');
const v1 = op.fromView(null,"source2", "v1", docid);
const v2 = op.fromView(null,"source2", "v2");
// Join on equal id, keeping only rows where v1's datetime is greater than v2's
v1.joinInner(v2, op.on(v1.col("id"), v2.col("id")),
    op.gt(v1.col("datetime"), v2.col("datetime")))
  .select(docid)
  .joinDoc('doc', 'docid')
  .result();
For more detail, see:
https://docs.marklogic.com/ModifyPlan.prototype.joinInner
You can do this with a cts query by searching source1 for every combination found in source2 (or vice versa). I don't know how performant it would be... it doesn't seem like it should be worse than getting two co-occurrence maps and doing it manually.
cts:uris(
  (),
  (),
  cts:and-query((
    cts:collection-query("source1"),
    cts:or-query((
      for $tuple in cts:value-co-occurrences(
        cts:path-reference("/record/ID"),
        cts:path-reference("/record/dateTimeValue"),
        (),
        cts:collection-query("source2")
      )
      return cts:and-query((
        cts:path-range-query("/record/ID", "=", $tuple/cts:value[1]),
        cts:path-range-query("/record/dateTimeValue", ">", $tuple/cts:value[2])
      ))
    ))
  ))
)
If the ID overlap between source1 and source2 is small, then it's probably better to find the overlap first and plug those IDs into the co-occurrence query, so it isn't scatter-querying so widely.
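To make the pairing requirement concrete, here is a minimal in-memory sketch in plain JavaScript. It is not MarkLogic code: the sample records and the function names independentMatch and pairedMatch are made up for illustration. It contrasts two independent range conditions (as in the question) with conditions that are tied to the same source2 tuple (as the co-occurrence approach generates).

```javascript
// Hypothetical sample data standing in for the two collections.
const source1 = [
  { uri: '/s1/a.xml', id: 'A', dateTime: '2020-01-05T00:00:00Z' },
  { uri: '/s1/b.xml', id: 'B', dateTime: '2020-01-05T00:00:00Z' },
];
const source2 = [
  { id: 'A', dateTime: '2020-01-01T00:00:00Z' },
  { id: 'B', dateTime: '2020-01-09T00:00:00Z' },
];

// Independent conditions: the ID matches some source2 record, and the
// dateTime is newer than some (possibly different) source2 record.
function independentMatch(s1, s2) {
  return s1
    .filter(r => s2.some(t => t.id === r.id) &&
                 s2.some(t => Date.parse(r.dateTime) > Date.parse(t.dateTime)))
    .map(r => r.uri);
}

// Paired conditions: one and the same source2 record must supply both
// the matching ID and the older dateTime.
function pairedMatch(s1, s2) {
  return s1
    .filter(r => s2.some(t => t.id === r.id &&
                              Date.parse(r.dateTime) > Date.parse(t.dateTime)))
    .map(r => r.uri);
}

console.log(independentMatch(source1, source2));
console.log(pairedMatch(source1, source2));
```

With this sample data the independent version also returns /s1/b.xml, because record B is newer than source2's record A even though it is older than source2's record B.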
This does the work:
let $local:cL := function($dt as xs:dateTime, $id as xs:string)
{
  let $query :=
    cts:and-query((
      cts:collection-query("source1"),
      cts:path-range-query("/record/dateTimeValue", ">", $dt),
      cts:path-range-query("/record/ID", "=", $id)
    ))
  for $uri in cts:uris("", "document", $query)
  return
    <uri>{$uri}</uri>
}
let $docs := cts:search(doc(), cts:collection-query("source2"))
for $doc in $docs
return
  xdmp:apply( $local:cL, $doc/record/dateTimeValue, $doc/record/ID )
Related
let $sortelement := 'Salary'
let $sortby := 'ascending'
for $doc in collection('employee')
order by $doc/*[local-name() eq $sortelement] $sortby
return $doc
This code throws an error. What is the correct way to do this?
If you are just looking to build the order by dynamically within a FLWOR statement, you can't. As Michael Kay points out in the comments, you could use a conditional statement and decide whether or not to reverse() the ascending (default) sorted sequence.
let $sortelement := 'Salary'
let $sortby := 'ascending'
let $results :=
for $doc in collection('employee')
order by $doc/*[local-name() eq $sortelement]
return $doc
return
if ($sortby eq 'descending')
then reverse($results)
else $results
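The sort-then-reverse idea is not specific to XQuery. As a rough illustration, here is the same logic in plain JavaScript over in-memory objects; the employee records and the sortBy helper are hypothetical and only mirror the $sortelement and $sortby variables above.

```javascript
// Hypothetical employee records; in the original these are documents in a collection.
const employees = [
  { name: 'Ann', Salary: 300 },
  { name: 'Bob', Salary: 100 },
  { name: 'Cy', Salary: 200 },
];

// Sort ascending by a field chosen at run time, then reverse if requested.
function sortBy(records, sortElement, sortDirection) {
  // Copy first so the caller's array is not mutated by sort().
  const ascending = [...records].sort((a, b) => a[sortElement] - b[sortElement]);
  return sortDirection === 'descending' ? ascending.reverse() : ascending;
}

console.log(sortBy(employees, 'Salary', 'descending').map(e => e.name));
```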
Depending upon how many documents are in your collection, retrieving every document and sorting them won't scale. It can take a long time, and can exceed memory limits for expanded tree cache.
If you have range indexes on those elements, then you can dynamically build a cts:index-order() and pass it as the third parameter of cts:search() to get the documents returned in the specified order:
let $sortelement := 'Salary'
let $sortby := 'ascending'
return
cts:search(doc(),
cts:collection-query("employee"),
cts:index-order(cts:element-reference(xs:QName($sortelement)), $sortby)
)
I have a need to be able to query Azure Data Explorer (ADX) tables dynamically, that is, using application-specific metadata that is also stored in ADX.
If this is even possible, the way to do it seems to be via the table() function. In other words, it feels like I should be able to simply write:
let table_name = <non-trivial ADX query that returns the name of a table as a string>;
table(table_name) | limit 10
But this query fails since I am trying to pass a variable to the table() function, and "a parameter, which is not scalar constant string can't be passed as parameter to table() function". The workaround provided doesn't really help, since all the possible table names are not known ahead of time.
Is there any way to do this all within ADX (i.e. without multiple queries from the client) or do I need to go back to the drawing board?
If you know the desired output schema, you could potentially achieve that using union. Note that in this case the result schema will be the union of all tables, and you'll need to explicitly project the columns you're interested in:
let TableA = view() { print col1 = "hello world"};
let TableB = view() { print col1 = "goodbye universe" };
let LabelTable = datatable(table_name:string, label:string, updated:datetime)
[
"TableA", "MyLabel", datetime(2019-10-08),
"TableB", "MyLabel", datetime(2019-10-02)
];
let GetLabeledTable = (l:string)
{
toscalar(
LabelTable
| where label == l
| order by updated desc
| limit 1
)
};
let table_name = GetLabeledTable('MyLabel');
union withsource = T *
| where T == table_name
| project col1
I want to get the distinct countries of all mountains, but sometimes a mountain will be in more than one country as indicated by having multiple country codes in a string like this:
<mountain id="mount-Kangchendzonga" country="NEP IND"></mountain>
I can get all the distinct strings associated with a country using
let $mts := doc("mondial.xml")/mondial//mountain
let $countries := distinct-values(data($mts/@country))
But this isn't quite correct because if I had one mountain with country="NEP IND" and another with country="NEP" these would be recognized as distinct.
let $countries := distinct-values(concat(' ', data($mts/@country)))
let $countries := distinct-values(tokenize(data($mts/@country), "\s+"))
Is there a way I could first split up a string of a country by white space, and then get the distinct values of these? I have tried using distinct-values on concatenated and tokenized data like I showed above, but both result in errors with the compiler.
This is one possible way to combine tokenize() and distinct-values() to get the distinct country codes. tokenize() accepts only a single string, so it has to be applied to each attribute in turn:
let $all-countries :=
  for $c in $mts/@country
  return tokenize($c, "\s+")
let $distinct-countries := distinct-values($all-countries)
Or in XQuery 3.1, as suggested in a comment below:
($mts/@country ! tokenize(., '\s+')) => distinct-values()
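For comparison, the split-then-deduplicate step can be sketched in plain JavaScript. This is only an illustration of the logic, not XQuery: the countryAttrs array is hypothetical sample data with one string per mountain, and distinctCountries is a made-up helper name.

```javascript
// Hypothetical attribute values, one string per mountain element.
const countryAttrs = ['NEP IND', 'NEP', 'CH  I'];

// Split every string on runs of whitespace, then keep each code once.
function distinctCountries(attrs) {
  const tokens = attrs
    .flatMap(a => a.split(/\s+/))
    .filter(t => t.length > 0); // drop empty tokens from leading/trailing spaces
  return [...new Set(tokens)];  // Set keeps first-seen order
}

console.log(distinctCountries(countryAttrs));
```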
I want to create a cts:or-query in a for loop. How can I do this?
An example of my logic:
let $query := for $tag in (1,2,3,4,5)
return myquery
I would like to get final queries such as:
let $query := cts:or-query(
(
cts:element-query(xs:QName("ts:tag"),'1'),
cts:element-query(xs:QName("ts:tag"),'2'),
cts:element-query(xs:QName("ts:tag"),'3'),
cts:element-query(xs:QName("ts:tag"),'4'),
cts:element-query(xs:QName("ts:tag"),'5')
)
)
For this particular example it would be better to write a shotgun-OR:
cts:element-value-query(xs:QName("ts:tag"), xs:string(1 to 5))
This will behave like an or-query, but will be a little more efficient. Note that I changed your cts:element-query to an element-value query. That may or may not be what you want, but each query term should be as precise as possible.
You can also use a FLWOR expression to generate queries. This is useful for and-query semantics, where the previous technique doesn't help.
let $query := cts:and-query(
for $i in ('dog', 'cat', 'rat')
return cts:word-query($i))
return cts:search(collection(), $query)[1 to 20]
This will work:
let $query := cts:or-query(
for $val in ('1', '2', '3', '4', '5')
return cts:element-query(xs:QName("ts:tag"), $val)
)
The FLWOR expression returns a sequence of cts:element-query items, which cts:or-query accepts as its argument.
I would like to filter a collection with grouped clauses. In SQL this would look something like:
SELECT * FROM `my_table` WHERE col1='x' AND (col2='y' OR col3='z')
How can I "translate" this to filtering a collection with ->addFieldToFilter(...)?
Thanks!
If your collection is an EAV type then this works well:
$collection = Mage::getResourceModel('yourmodule/model_collection')
->addAttributeToFilter('col1', 'x')
->addAttributeToFilter(array(
array('attribute'=>'col2', 'eq'=>'y'),
array('attribute'=>'col3', 'eq'=>'z'),
));
However if you're stuck with a flat table I don't think addFieldToFilter works in quite the same way. One alternative is to use the select object directly.
$collection = Mage::getResourceModel('yourmodule/model_collection')
->addFieldToFilter('col1', 'x');
$collection->getSelect()
->where('col2 = ?', 'y')
->orWhere('col3 = ?', 'z');
But the failing of this is operator precedence. You will get a query like SELECT * FROM my_table WHERE (col1='x') AND (col2='y') OR (col3='z'). Because AND binds more tightly than OR, the col3 condition is not restricted by col1. Getting around it means being more specific:
$collection = Mage::getResourceModel('yourmodule/model_collection')
->addFieldToFilter('col1', 'x');
$select = $collection->getSelect();
$adapter = $select->getAdapter();
$select->where(sprintf('(col2 = %s) OR (col3 = %s)', $adapter->quote('y'), $adapter->quote('z')));
It is unsafe to pass values unquoted, so here the adapter is used to quote them safely.
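The precedence issue can be checked with a small stand-alone sketch. The following plain JavaScript (not Magento code) evaluates both forms of the condition against one hypothetical row; it relies on the fact that && binds more tightly than || in JavaScript, just as AND binds more tightly than OR in SQL.

```javascript
// A hypothetical row that should NOT match, because col1 is not 'x'.
const row = { col1: 'other', col2: 'no', col3: 'z' };

// Shape produced by where() followed by orWhere():
// (col1 = 'x') AND (col2 = 'y') OR (col3 = 'z')
const ungrouped = row.col1 === 'x' && row.col2 === 'y' || row.col3 === 'z';

// Intended shape: col1 = 'x' AND (col2 = 'y' OR col3 = 'z')
const grouped = row.col1 === 'x' && (row.col2 === 'y' || row.col3 === 'z');

console.log({ ungrouped, grouped });
```

The ungrouped form accepts the row through the col3 branch alone, which is why the OR pair needs its own parentheses.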
Finally, if col2 and col3 are actually the same column, that is, you're OR-ing values within a single column, then you can use this shorthand:
$collection = Mage::getResourceModel('yourmodule/model_collection')
->addFieldToFilter('col1', 'x')
->addFieldToFilter('col2', array('in' => array('y', 'z')));