XQuery get distinct values after tokenize?

I want to get the distinct countries of all mountains, but sometimes a mountain will be in more than one country as indicated by having multiple country codes in a string like this:
<mountain id="mount-Kangchendzonga" country="NEP IND"></mountain>
I can get all the distinct strings associated with a country using
let $mts := doc("mondial.xml")/mondial//mountain
let $countries := distinct-values(data($mts/@country))
But this isn't quite correct because if I had one mountain with country="NEP IND" and another with country="NEP" these would be recognized as distinct.
let $countries := distinct-values(concat(' ', data($mts/@country)))
let $countries := distinct-values(tokenize(data($mts/@country), "\s+"))
Is there a way I could first split up a string of a country by white space, and then get the distinct values of these? I have tried using distinct-values on concatenated and tokenized data like I showed above, but both result in errors with the compiler.

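One way to combine tokenize() and distinct-values() is to tokenize each country attribute individually and then deduplicate the combined result. Calling tokenize() on the whole sequence of country attributes fails because its first argument must be a single string, not a sequence: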
let $all-countries :=
  for $c in $mts/@country
  return tokenize($c, "\s+")
let $distinct-countries := distinct-values($all-countries)
xpathtester.com demo
Or in XQuery 3.1, as suggested in a comment:
($mts/@country ! tokenize(., '\s+')) => distinct-values()
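The same split-then-deduplicate logic, sketched in Python purely to illustrate the order of operations (this is not XQuery; the sample attribute values and the function name distinct_codes are made up for the example):

```python
# Hypothetical sample data standing in for the country attribute values.
country_attrs = ["NEP IND", "NEP", "CH I", "I"]

def distinct_codes(attrs):
    """Split each attribute on whitespace, then keep each code once."""
    seen = {}  # dict keys preserve first-seen order
    for attr in attrs:
        for code in attr.split():
            seen.setdefault(code, None)
    return list(seen)

print(distinct_codes(country_attrs))  # ['NEP', 'IND', 'CH', 'I']
```

The key point is that splitting happens per attribute value, before deduplication, so "NEP IND" and "NEP" contribute the code NEP only once.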

Related

How to sort by dynamically with ascending or descending in Marklogic?

let $sortelement := 'Salary'
let $sortby := 'ascending'
for $doc in collection('employee')
order by $doc/*[local-name() eq $sortelement] $sortby
return $doc
This code throws an error. What is the correct way to do this?

If you are just looking to build the order by dynamically within a FLWOR statement, you can't. As Michael Kay points out in the comments, you could use a conditional statement and decide whether or not to reverse() the ascending (default) sorted sequence.
let $sortelement := 'Salary'
let $sortby := 'ascending'
let $results :=
  for $doc in collection('employee')
  order by $doc/*[local-name() eq $sortelement]
  return $doc
return
  if ($sortby eq 'descending')
  then reverse($results)
  else $results
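The "sort ascending, then reverse if needed" idea can be sketched outside MarkLogic as well. The Python below is only an illustration; the employee records and the function name sort_records are invented for the example:

```python
# Hypothetical records standing in for the employee documents.
employees = [
    {"Name": "Ann", "Salary": 5200},
    {"Name": "Bob", "Salary": 4100},
    {"Name": "Cyd", "Salary": 6300},
]

def sort_records(records, sort_element, sort_by="ascending"):
    """Sort ascending by the chosen key, then reverse if descending is requested."""
    ordered = sorted(records, key=lambda r: r[sort_element])
    if sort_by == "descending":
        ordered.reverse()
    return ordered

print([e["Name"] for e in sort_records(employees, "Salary", "descending")])
# ['Cyd', 'Ann', 'Bob']
```

One caveat of this approach: reversing an ascending result also reverses the relative order of records with equal keys, which may differ from what a true descending sort would return.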
If the collection holds many documents, retrieving every document and sorting them in the query won't scale: it can take a long time and can exceed the memory limits of the expanded tree cache.
If you have range indexes on those elements, you can instead build a cts:index-order() dynamically and pass it as the third parameter of cts:search() to get the documents returned in the specified order:
let $sortelement := 'Salary'
let $sortby := 'ascending'
return
  cts:search(doc(),
    cts:collection-query("employee"),
    cts:index-order(cts:element-reference(xs:QName($sortelement)), $sortby)
  )

How can I cts query on two values?

I'm trying to work out a nice cts query that matches two nodes, rather than one. For example, I have records from two sources, both with an ID value and a dateTime value. I'd like to find records in the first source that have a matching ID in the second source and a newer dateTime value.
Something like this (does not work):
cts:uris(
  (),
  (),
  cts:and-query((
    cts:collection-query("source1"),
    cts:path-range-query(
      "/record/ID",
      "=",
      cts:values(
        cts:path-reference("/record/ID"),
        (),
        (),
        cts:collection-query("source2")
      )
    ),
    cts:path-range-query(
      "/record/dateTimeValue",
      ">",
      cts:values(
        cts:path-reference("/record/dateTimeValue"),
        (),
        (),
        cts:collection-query("source2")
      )
    )
  ))
)
This won't work because it returns records that have a matching ID and a dateTimeValue greater than that of at least one record in source2, which need not be the record with the matching ID.
How do I make the cts query match on two values together? Can I only do this through a FLWOR?

If I understand the requirement correctly, this query could be implemented efficiently with a join:
Create a TDE view scoped to the collection, with a context of /record, that projects id and datetime columns.
Use an Optic query to join the view with itself on the id, with a condition of a greater datetime, and then join the matching documents (or project all of the needed columns from the record).
Something like the following:
const op = require('/MarkLogic/optic');

const docid = op.fragmentIdCol('docid');
const v1 = op.fromView(null, "source2", "v1", docid);
const v2 = op.fromView(null, "source2", "v2");
v1.joinInner(
    v2,
    op.on(v1.col("id"), v2.col("id")),            // join key: same id
    op.gt(v1.col("datetime"), v2.col("datetime")) // condition: v1 row is newer
  )
  .select(docid)
  .joinDoc('doc', 'docid')
  .result();
For more detail, see:
https://docs.marklogic.com/ModifyPlan.prototype.joinInner
Hoping that helps,

You can do this with a cts query by searching source1 for every combination found in source2 (or vice versa). I don't know how performant it would be... it doesn't seem like it should be worse than getting two co-occurrence maps and doing it manually.
cts:uris(
  (),
  (),
  cts:and-query((
    cts:collection-query("source1"),
    cts:or-query((
      for $tuple in cts:value-co-occurrences(
        cts:path-reference("/record/ID"),
        cts:path-reference("/record/dateTimeValue"),
        (),
        cts:collection-query("source2")
      )
      return cts:and-query((
        cts:path-range-query("/record/ID", "=", $tuple/cts:value[1]),
        cts:path-range-query("/record/dateTimeValue", ">", $tuple/cts:value[2])
      ))
    ))
  ))
)
If the ID overlap between source1 and source2 is small, then it's probably better to find the overlap first and plug those IDs into the co-occurrence query, so it isn't scatter-querying so widely.
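To see why the pairing matters, here is a small Python sketch (an illustration only, not MarkLogic code) that contrasts checking the two conditions independently, as the query in the question does, with checking them against the same source2 record, as the co-occurrence approach does. The sample data and function names are hypothetical:

```python
from datetime import datetime

# Hypothetical (ID, dateTimeValue) pairs for each source.
source1 = [("a", datetime(2020, 1, 5)), ("b", datetime(2020, 1, 2))]
source2 = [("a", datetime(2020, 1, 1)), ("b", datetime(2020, 1, 3))]

def independent_match(s1, s2):
    """Each condition is checked against all of s2 separately."""
    ids = {i for i, _ in s2}
    times = [t for _, t in s2]
    return [i for i, t in s1 if i in ids and any(t > other for other in times)]

def pairwise_match(s1, s2):
    """Both conditions must hold for the same s2 record."""
    return [i for i, t in s1 if any(i == j and t > u for j, u in s2)]

print(independent_match(source1, source2))  # ['a', 'b']
print(pairwise_match(source1, source2))     # ['a']
```

Record "b" passes the independent check only because its dateTime is newer than the source2 record for "a"; it is older than its own counterpart, so the pairwise check rejects it.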

This does the work:
let $local:cL := function($dt as xs:dateTime, $id as xs:string)
{
  let $query :=
    cts:and-query((
      cts:collection-query("source1"),
      cts:path-range-query("/record/dateTimeValue", ">", $dt),
      cts:path-range-query("/record/ID", "=", $id)
    ))
  for $uri in cts:uris("", "document", $query)
  return
    <uri>{$uri}</uri>
}
let $docs := cts:search(doc(), cts:collection-query("source2"))
for $doc in $docs
return
  xdmp:apply($local:cL, $doc/record/dateTimeValue, $doc/record/ID)

Xquery: Counting the number of occurrences of a term in each record within a set of records

I have a set of XML records and a sequence of terms, $terms, that were extracted from those records. I want to count the number of occurrences of each term in the paragraph of each record. I used the following code to do so:
for $record in /rec:Record
for $term in $terms
return xdmp:unquote(concat('<info>',string(count(lower-case($record/rec:paragraph )[. = lower-case($term)])), '</info>'))
For each term in each record I got a count of 0.
Example: $term:='Mathematics', $record/rec:paragraph:='Mathematics is the study of topics such as quantity'
I want the number of occurrences of the term Mathematics in $record/rec:paragraph.
Any idea what caused this result? Is there another way to count the number of occurrences of each term in each paragraph?

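Your expression compares the entire lower-cased paragraph with the term, so it only matches when the paragraph consists of exactly that term. Use tokenize() to split the input string into word tokens first; then the counting itself is trivial. For example: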
let $text := 'Mathematics is the study of topics such as quantity'
let $myterms := 'mathematics'
let $wds := tokenize($text, '\s+')
for $t in $myterms
return <term name="{$t}">{count($wds[lower-case(.)=lower-case($t)])}</term>
Returns this:
<term name="mathematics">1</term>
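The tokenize-then-count idea is not specific to XQuery. A minimal Python sketch of the same logic follows; the function name count_terms is made up, and, like the XQuery above, it splits on whitespace only and does not strip punctuation:

```python
import re

def count_terms(paragraph, terms):
    """Count case-insensitive whole-word occurrences of each term."""
    words = [w.lower() for w in re.split(r"\s+", paragraph.strip()) if w]
    return {term: words.count(term.lower()) for term in terms}

text = "Mathematics is the study of topics such as quantity"
print(count_terms(text, ["Mathematics", "study", "physics"]))
# {'Mathematics': 1, 'study': 1, 'physics': 0}
```

To get counts per record, the function would be called once per paragraph, mirroring the nested for over records and terms in the question.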

displaying count values IN AN ARRAY

I have the following sql code:
$management_lcfruh_sql = "SELECT COUNT(schicht), codes.lcfruh, personal.status, dienstplan.kw, personal.perso_id, personal.sort_order, dienstplan.datum FROM dienstplan INNER JOIN codes ON dienstplan.schicht=codes.lcfruh INNER JOIN personal ON personal.perso_id=dienstplan.perso_id WHERE codes.lcfruh!='' AND personal.status='management' AND dienstplan.kw='$kw' ORDER BY personal.sort_order, dienstplan.datum";
$management_lcfruh_result= mysql_query($management_lcfruh_sql);
How can I get a list of counts instead of only one count? dienstplan.kw='$kw' selects a week of the year, which has seven days, so I should get seven results listed instead of a single count covering all seven days.
<?php
while($rows=mysql_fetch_array($management_lcfruh_result)){
?>
<? echo $rows['COUNT(schicht)']; ?>
<?php
}
?>

Um, well, you haven't been a big help on the extra info front, but maybe this will get you going in the right direction.
Given Table1(WeekNo int, Category int, Schict int)
and Table2 (WeekNo int, Schict int, CategoryDate dateTime)
Then
Select t1.Category, DayName(t2.CategoryDate), t1.Schict, Count(t1.schict)
From Table1 t1
Inner join Table2 t2 On t1.Schict = t2.schict and t1.WeekNo = t2.weekNo
Group by t1.Category,DayName(t2.CategoryDate), t1.Schict
This would get you a count by Category and schict across the two tables, but the number of records you get would depend on how many days of the week you had records for.
You could deal with that in SQL, but it would be much easier to fill in the missing days in your array client side in PHP. It is possibly even easier if you use the MySQL DayOfWeek function instead of DayName.
Some clues, anyway.
Perhaps even easier would be to simply add this date you won't tell us about to the select clause of your current query, group by it, and then use PHP to get the day.
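As a rough illustration of why grouping by the date yields one count per day rather than a single total, here is a self-contained Python sketch. It uses an in-memory SQLite database instead of MySQL, a much simplified dienstplan table, and invented rows, so treat every name other than the column names from the question as an assumption:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dienstplan (perso_id INTEGER, schicht TEXT, kw INTEGER, datum TEXT)"
)
conn.executemany(
    "INSERT INTO dienstplan VALUES (?, ?, ?, ?)",
    [
        (1, "F", 12, "2013-03-18"),
        (2, "F", 12, "2013-03-18"),
        (1, "F", 12, "2013-03-19"),
        (3, "F", 13, "2013-03-25"),
    ],
)

def counts_per_day(kw):
    """Return one (datum, count) row per day of the given week."""
    rows = conn.execute(
        "SELECT datum, COUNT(schicht) FROM dienstplan "
        "WHERE kw = ? GROUP BY datum ORDER BY datum",
        (kw,),
    )
    return rows.fetchall()

print(counts_per_day(12))  # [('2013-03-18', 2), ('2013-03-19', 1)]
```

Days without any rows simply do not appear in the result, which is why filling in the missing days on the client side was suggested above.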

SQLite query with - || OR. Concatenated results

I have a problem with the following query :
SELECT DISTINCT city || strftime("%Y", begintime) FROM Texts_original1
The query itself works but results themselves are concatenated so instead of for example :
city = Dublin
strftime("%Y", begintime) = 2008
I get :
city || strftime("%Y", begintime) = Dublin2008
Any ideas how to avoid that concatenation and have the response separated into different columns?

The || operator is "concatenate": it joins together the two strings of its operands.
So, whatever it is you're trying to do, the || operator isn't what you want.

Replace || with a comma to get separate columns. What happens if you try to execute this?
SELECT DISTINCT city, strftime("%Y", begintime)
FROM Texts_original1
That should address what you asked for: how to avoid the concatenation and have the response separated into different columns.
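The difference between the two forms can be checked quickly with Python's built-in sqlite3 module. This is a sketch against an in-memory database with two invented rows; only the table and column names come from the question:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Texts_original1 (city TEXT, begintime TEXT)")
conn.executemany(
    "INSERT INTO Texts_original1 VALUES (?, ?)",
    [("Dublin", "2008-05-01 10:00:00"), ("Dublin", "2008-07-15 09:30:00")],
)

# || concatenates, so the result has a single column.
concatenated = conn.execute(
    "SELECT DISTINCT city || strftime('%Y', begintime) FROM Texts_original1"
).fetchall()

# A comma keeps city and year as two separate columns.
separate = conn.execute(
    "SELECT DISTINCT city, strftime('%Y', begintime) FROM Texts_original1"
).fetchall()

print(concatenated)  # [('Dublin2008',)]
print(separate)      # [('Dublin', '2008')]
```

DISTINCT still applies to the combination of both columns, so the two Dublin rows from 2008 collapse into one result row in either form.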
