Exclude empty lists from intersection in XQuery


I have several lists which may or may not be empty. I want to find those elements which occur in all lists, but only for those lists that are not empty.
I have something like this:
let $results :=
$list1 intersect
$list2 intersect
$list3 intersect
$list4
But if any of the lists is empty this expression returns an empty list. Is there any way I can exclude a list from my intersection if it is empty?
SOLUTION:
This is the solution I ended up using, based on the answer provided by Ranon.
let $union := $list1 | $list2 | $list3 | $list4
let $results :=
(if ($list1) then $list1 else $union) intersect
(if ($list2) then $list2 else $union) intersect
(if ($list3) then $list3 else $union) intersect
(if ($list4) then $list4 else $union)
I would like to thank everyone who contributed. Coming from an object-oriented and procedural background, I find that XQuery's functional style doesn't come naturally to me (yet).

Instead of using the builtin intersect directly, you can wrap it in a function and check the input lists:
declare function local:safe-intersect($xs, $ys) {
if(exists($xs) and exists($ys))
then $xs intersect $ys
else ($xs, $ys) (: at least one is empty :)
};
Then your example would look like this:
let $results :=
local:safe-intersect(
$list1,
local:safe-intersect(
$list2,
local:safe-intersect($list3, $list4)
)
)
...

You could check each list for emptiness and replace every empty list with the union of all lists.
For my example I used the functx implementations of intersect and union so that the code also works on sequences of atomic values. One could write a helper function to avoid the redundant code, but this is enough to show the idea:
import module namespace functx = "http://www.functx.com" at ".../functx-1.0-nodoc-2007-01.xq";
let $list1 := (1,2,3)
let $list2 := (1,3)
let $list3 := ()
let $union := functx:value-union($list1, functx:value-union($list2, $list3))
let $list1a := if (count($list1) != 0) then $list1 else $union
let $list2a := if (count($list2) != 0) then $list2 else $union
let $list3a := if (count($list3) != 0) then $list3 else $union
return functx:value-intersect($list1a, functx:value-intersect($list2a, $list3a))
$union could also be written as ($list1, $list2, $list3), but that produces duplicate elements, which slow down the intersection operations for large sequences.
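For readers coming from a procedural background, the substitution idea in this answer can be sketched in Python. This is only an illustration under stated assumptions: the function name intersect_ignoring_empty is hypothetical, plain Python lists stand in for XQuery sequences, and values are compared by equality as functx:value-intersect does.

```python
def intersect_ignoring_empty(*lists):
    """Intersect the given lists, treating every empty list as 'no constraint'."""
    # Union of all values in first-seen order, without duplicates.
    union = list(dict.fromkeys(x for lst in lists for x in lst))
    # Replace each empty list by the union so that it cannot shrink the result.
    substituted = [set(lst) if lst else set(union) for lst in lists]
    # Keep the values that occur in every (substituted) list.
    return [x for x in union if all(x in s for s in substituted)]


if __name__ == "__main__":
    print(intersect_ignoring_empty([1, 2, 3], [1, 3], []))
```

If every list is empty, the union is empty too, so the sketch returns an empty list.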

Use:
( $vL1 | ($vL2 | $vL3 | $vL4)[not($vL1)] )
intersect
( $vL2 | ($vL3 | $vL4 | $vL1)[not($vL2)] )
intersect
( $vL3 | ($vL4 | $vL1 | $vL2)[not($vL3)] )
intersect
( $vL4 | ($vL1 | $vL2 | $vL3)[not($vL4)] )
In this expression every argument of intersect is either $vLN or, if $vLN is empty, the union of the other sets.
This can be written more compactly as:
let $vUniverse := $vL1 | $vL2 | $vL3 | $vL4
return
( $vL1 | $vUniverse [not($vL1)] )
intersect
( $vL2 | $vUniverse [not($vL2)] )
intersect
( $vL3 | $vUniverse [not($vL3)] )
intersect
( $vL4 | $vUniverse [not($vL4)] )
Here is a complete example:
let $vL1 := /*/*[. mod 2 eq 1],
$vL2 := /*/*[. mod 3 eq 1],
$vL3 := /*/*[. mod 4 eq 1],
$vL4 := /*/*[. mod 5 eq 1],
$vUniverse := $vL1 | $vL2 | $vL3 | $vL4
return
( $vL1 | $vUniverse [not($vL1)] )
intersect
( $vL2 | $vUniverse [not($vL2)] )
intersect
( $vL3 | $vUniverse [not($vL3)] )
intersect
( $vL4 | $vUniverse [not($vL4)] )
When this XQuery expression is evaluated (using Saxon 9.3.04 EE) on the following XML document:
<nums>
<num>01</num>
<num>02</num>
<num>03</num>
<num>04</num>
<num>05</num>
<num>06</num>
<num>07</num>
<num>08</num>
<num>09</num>
<num>10</num>
</nums>
the wanted, correct result is produced:
<num>01</num>

Related

Kusto: Apply function on multiple column values during bag_unpack

Given a dynamic field, say milestones, with a value like {"ta": 1655859586546, "tb": 1655859586646},
how do I print a table with columns "ta", "tb", etc., and a single row containing unixtime_milliseconds_todatetime(tolong(taValue)), unixtime_milliseconds_todatetime(tolong(tbValue)), etc.?
I figured that I need a function that I can call, so I created this:
let f = (a:string) {
unixtime_milliseconds_todatetime(tolong(a))
};
I can use this function with a normal column: project f(columnName).
However, in this case it's a dynamic field with a large number of items, so I do not want to enter the fields manually. This is what I have so far:
log_table
| take 1
| evaluate bag_unpack(milestones, "m_") // This gives me fields as columns
// | project-keep m_* // This would work if I just wanted the value; however, I want f(columnValue)
| project-keep f(m_*) // This of course doesn't work, but explains the idea.

This solution is based on the mv-apply operator:
// Generate data sample. Not part of the solution.
let log_table = materialize(range record_id from 1 to 10 step 1 | mv-apply range(1, 1 + rand(5), 1) on (summarize milestones = make_bag(pack_dictionary(strcat("t", make_string(to_utf8("a")[0] + toint(rand(26)))), 1600000000000 + rand(60000000000)))));
// Solution Starts here.
log_table
| mv-apply kv = milestones on
(
extend k = tostring(bag_keys(kv)[0])
| extend v = unixtime_milliseconds_todatetime(tolong(kv[k]))
| summarize milestones = make_bag(pack_dictionary(k, v))
)
| evaluate bag_unpack(milestones)
Result (one row per record_id; the columns are record_id followed by the unpacked keys ta, tb, tc, td, te, tf, tg, th, ti, tk, tl, tm, to, tp, tr, tt, tu, tw, tx, tz; only the non-empty cells of each row are listed):
record_id 1: 2021-07-06T20:24:47.767Z
record_id 2: 2021-05-09T07:21:08.551Z, 2022-07-28T20:57:16.025Z, 2022-07-28T14:21:33.656Z, 2020-11-09T00:54:39.71Z, 2020-12-22T00:30:13.463Z
record_id 3: 2021-12-07T11:07:39.204Z, 2022-05-16T04:33:50.002Z, 2021-10-20T12:19:27.222Z
record_id 4: 2022-01-31T23:24:07.305Z, 2021-01-20T17:38:53.21Z
record_id 5: 2022-04-27T22:41:15.643Z
record_id 7: 2022-01-22T08:30:08.995Z, 2021-09-30T08:58:46.47Z
record_id 8: 2022-03-14T13:41:10.968Z, 2022-03-26T10:45:19.56Z, 2022-08-06T16:50:37.003Z
record_id 10: 2021-03-03T11:02:02.217Z, 2021-02-28T09:52:24.327Z, 2021-04-09T07:08:06.985Z, 2020-12-28T20:18:04.973Z
record_id 9: 2022-02-17T04:55:35.468Z
record_id 6: 2022-08-02T14:44:15.414Z, 2021-03-24T10:22:36.138Z, 2020-12-17T01:14:40.652Z, 2022-01-30T12:45:54.28Z, 2022-03-31T02:29:43.114Z
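The per-key conversion that the mv-apply block performs can be illustrated outside Kusto with a small Python sketch. It assumes the bag is available as a plain dict of millisecond Unix timestamps; the function name convert_milestones is hypothetical and not part of any Kusto API.

```python
from datetime import datetime, timedelta, timezone

# Start of the Unix epoch in UTC.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def convert_milestones(milestones):
    """Return a new dict with every millisecond timestamp turned into a UTC datetime."""
    # Integer milliseconds are added as a timedelta to avoid float rounding.
    return {
        key: EPOCH + timedelta(milliseconds=int(value))
        for key, value in milestones.items()
    }


if __name__ == "__main__":
    sample = {"ta": 1655859586546, "tb": 1655859586646}
    for key, moment in convert_milestones(sample).items():
        print(key, moment.isoformat())
```

Like the query, the sketch never names the keys explicitly, so it works for any number of entries in the bag.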

How to accomplish each_slice like Ruby with jq

Sample Input
[1,2,3,4,5,6,7,8,9]
My Solution
$ echo '[1,2,3,4,5,6,7,8,9]' | jq --arg g 4 '. as $l|($g|tonumber) as $n |$l|length as $c|[range(0;$c;($g|tonumber))]|map($l[.:.+$n])' -c
Output
[[1,2,3,4],[5,6,7,8],[9]]
Is there a shorter, handier way to do this?

Use a while loop to chop off the first 4 elements .[4:] until the array is empty []. Then, for each result array, consider only its first 4 items [:4]. Generalized to $n:
jq -c --argjson n 4 '[while(. != []; .[$n:])[:$n]]'
[[1,2,3,4],[5,6,7,8],[9]]
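The chop-and-take loop described in this answer can be sketched in Python as follows. The function name each_slice is borrowed from Ruby for illustration only, and a Python list stands in for the JSON array.

```python
def each_slice(items, n):
    """Split items into consecutive chunks of at most n elements."""
    chunks = []
    rest = items
    while rest:                  # stop once the remaining list is empty
        chunks.append(rest[:n])  # keep only the first n items of what is left
        rest = rest[n:]          # chop those n items off
    return chunks


if __name__ == "__main__":
    print(each_slice([1, 2, 3, 4, 5, 6, 7, 8, 9], 4))
```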

There's an undocumented builtin function, _nwise/1, which you would use like this:
jq -nc --argjson n 4 '[1,2,3,4,5,6,7,8,9] | [_nwise($n)]'
[[1,2,3,4],[5,6,7,8],[9]]
Notice that using --argjson allows you to avoid the call to tonumber.

One way is to use reduce over the whole list, appending one sub-array of $g entries at a time:
jq -c --argjson g 4 '. as $input |
reduce range(0; ( $input | length ) ; $g) as $r ( []; . + [ $input[ $r: ( $r + $g ) ] ] )'
The three-argument form range(from; upto; by) generates numbers starting at from, in increments of by, up to but not including upto.
E.g. range(0; 9; 4) on your original input produces the indices 0, 4, 8. The reduce iterates over them and builds the final list by appending the slices produced by the array slice operation, i.e. $input[0:4], $input[4:8] and $input[8:12].
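Python's built-in range takes the same three arguments, so the index-based variant can be sketched like this. The function name slices_by_index is hypothetical; as in jq, a slice that runs past the end of the list is simply cut short.

```python
def slices_by_index(items, size):
    """Build the chunks from start indices generated with a stepped range."""
    # For 9 items and size 4 the start indices are 0, 4 and 8.
    starts = range(0, len(items), size)
    # Each slice covers [start, start + size); the last one may be shorter.
    return [items[start:start + size] for start in starts]


if __name__ == "__main__":
    print(list(range(0, 9, 4)))
    print(slices_by_index([1, 2, 3, 4, 5, 6, 7, 8, 9], 4))
```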

DB2 Case when need to count the digits of a value

I need to write a CASE WHEN statement in DB2. I am new to it, so I do not have much experience; sorry for that.
I have a column with different call numbers; each call number should contain 7 digits (e.g. AR78HJ8).
When the value is blank or "_______" (7 times _), the result should be 0,
and when I have a seven-digit call number (but not 7 times _), the result should be 1.
There could also be cases where the call number has 8, 6, or any other number of digits different from 7. In that case I want to show the call number itself.
What I have written so far is
case when ab.call_number = '' then '0'
when ab.call_number = '_______' then '0'
else '1'
end as "Call number",
but this assumes that all other call numbers always have 7 digits.
What should I do?
Thanks a lot for your help!

Try this as is:
WITH TAB (CALL) AS
(
VALUES
'1234567'
, ''
, ' '
, 'AR78HJ8'
, '12345678'
)
SELECT CALL
,
CASE
WHEN CALL = '' THEN '0'
WHEN LENGTH(TRANSLATE(CALL, '', '0123456789', '')) = 0 AND LENGTH(CALL) = 7 THEN '1'
ELSE CALL
END AS "Call number"
FROM TAB;
The result is:
|CALL    |Call number|
|--------|-----------|
|1234567 |1          |
|        |0          |
|        |0          |
|AR78HJ8 |AR78HJ8    |
|12345678|12345678   |
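The branching logic of this CASE expression, as far as it can be read from the query and its result, can be mirrored in a short Python sketch. The function name classify_call_number is hypothetical. Stripping blanks before the first comparison is an assumption that imitates how the query above treats ' ' the same as ''.

```python
def classify_call_number(call):
    """Mirror the three branches of the CASE expression above."""
    # Blank values (empty or only spaces) map to '0'.
    if call.strip(" ") == "":
        return "0"
    # Exactly seven characters, all of them digits, map to '1'.
    if len(call) == 7 and all(ch in "0123456789" for ch in call):
        return "1"
    # Everything else is shown unchanged.
    return call


if __name__ == "__main__":
    for value in ["1234567", "", " ", "AR78HJ8", "12345678"]:
        print(repr(value), "->", classify_call_number(value))
```

Note that, like the query, this sketch treats only numeric characters as digits, so a value such as AR78HJ8 is returned unchanged.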

How to produce cartesian square in jq?

How to produce the Cartesian square of an array in jq?
Input:
[0,1,2]
Output:
[[0,0],[0,1],[0,2],
[1,0],[1,1],[1,2],
[2,0],[2,1],[2,2]]
I found simple way to make it work with arithmetic operations, but no luck with comma operator.

Cartesian product
One way to generate the array of pairs in the specified order would be as follows:
def data: [0,1,2];
data | [.[] as $i | .[] as $j | [$i, $j] ]
Alternatively, avoiding $-variables:
[range(0;3) | [.] + (range(0;3)|[.])]
Square matrix with m[i][j] = [i,j]
def Mij(n):
[ range(0;n) as $i
| [ range(0;n) as $j
| [$i, $j] ] ];
Mij(3)
produces:
[[[0,0],[0,1],[0,2]],[[1,0],[1,1],[1,2]],[[2,0],[2,1],[2,2]]]
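For comparison, both shapes can be sketched in Python. The function names cartesian_square and pair_matrix are hypothetical; itertools.product with repeat=2 yields the pairs with the second component varying fastest, which is the order asked for in the question.

```python
from itertools import product


def cartesian_square(values):
    """Flat list of all ordered pairs [a, b] with a and b taken from values."""
    return [list(pair) for pair in product(values, repeat=2)]


def pair_matrix(n):
    """Square matrix m with m[i][j] == [i, j] for indices 0 .. n - 1."""
    return [[[i, j] for j in range(n)] for i in range(n)]


if __name__ == "__main__":
    print(cartesian_square([0, 1, 2]))
    print(pair_matrix(3))
```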

jq select elements with array not containing string

Now, this is somewhat similar to jq: select only an array which contains element A but not element B but it somehow doesn't work for me (which is likely my fault)... ;-)
So here's what we have:
[ {
"employeeType": "student",
"cn": "dc8aff1",
"uid": "dc8aff1",
"ou": [
"4210910",
"4210910 #Abg",
"4210910 Abgang",
"4240115",
"4240115 5",
"4240115 5\/5"
]
},
{
"employeeType": "student",
"cn": "160f656",
"uid": "160f656",
"ou": [
"4210910",
"4210910 3",
"4210910 3a"
] } ]
I'd like to select all elements where ou does not contain a specific string, say "4210910 3a" or - which would be even better - where ou does not contain any member of a given list of strings.

When an input is likely to change, you should make it a parameter of your filter rather than hardcoding it. Also, contains might not work for you in general: it is applied recursively, so a string also matches when it is merely a substring of an element, which is probably not what you want.
For example:
["10", "20", "30", "40", "50"] | contains(["0"])
evaluates to true.
I would write it like this:
$ jq --argjson ex '["4210910 3a"]' 'map(select(all(.ou[]; $ex[]!=.)))' input.json

This response addresses the case where .ou is an array and we are given another array of forbidden strings.
For clarity, let's define a filter, intersectq(a;b), that will return true iff the arrays have an element in common:
def intersectq(a;b):
any(a[]; . as $x | any( b[]; . == $x) );
This is effectively a loop-within-a-loop, but because of the semantics of any/2, the computation will stop once a match has been found.(*)
Assuming $ex is the list of exceptions, then the filter we could use to solve the problem would be:
map(select(intersectq(.ou; $ex) | not))
For example, we could use an invocation along the lines suggested by Jeff:
$ jq --argjson ex '["4210910 3a"]' -f myfilter.jq input.json
Now you might ask: why use the any-within-any double loop rather than .[]-within-all double loop? The answer is efficiency, as can be seen using debug:
$ jq -n '[1,2,3] as $a | [1,1] as $b | all( $a[]; ($b[] | debug) != .)'
["DEBUG:",1]
["DEBUG:",1]
false
$ jq -n '[1,2,3] as $a | [1,1] as $b | all( $a[]; . as $x | all( $b[]; debug | $x != .))'
["DEBUG:",1]
false
(*) Footnote
Of course intersectq/2 as defined here is still O(m*n) and thus inefficient, but the main point of this post is to highlight the drawback of the .[]-within-all double loop.
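The same short-circuiting membership test can be sketched in Python, where any over a generator also stops at the first match. The names shares_element and exclude_forbidden are hypothetical, and the records are assumed to be dicts whose "ou" value is a list of strings, as in the question.

```python
def shares_element(a, b):
    """True if the two lists have at least one element in common."""
    # any() stops consuming the generator as soon as one pair matches.
    return any(x == y for x in a for y in b)


def exclude_forbidden(records, forbidden):
    """Keep the records whose 'ou' list contains none of the forbidden strings."""
    return [r for r in records if not shares_element(r["ou"], forbidden)]


if __name__ == "__main__":
    people = [
        {"uid": "dc8aff1", "ou": ["4210910", "4210910 #Abg", "4240115"]},
        {"uid": "160f656", "ou": ["4210910", "4210910 3", "4210910 3a"]},
    ]
    print([p["uid"] for p in exclude_forbidden(people, ["4210910 3a"])])
```

Strings are compared for exact equality here, so "4210910 3" is not treated as a match for "4210910 3a".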

Here is a solution that checks the .ou member of each element of the input using foreach and contains.
["4210910 3a"] as $list # adjust as necessary
| .[]
| foreach $list[] as $e (
.; .; if .ou | contains([$e]) then . else empty end
)
EDIT: I now realize a filter of the form foreach E as $X (.; .; R) can almost always be rewritten as E as $X | R so the above is really just
["4210910 3a"] as $list
| .[]
| $list[] as $e
| if .ou | contains([$e]) then . else empty end
