XQuery loop conditions

I have an XML file that follows this DTD structure.
<!DOCTYPE report [
<!ELEMENT report (title,section+)>
<!ELEMENT section (title,body?,section*)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT body (para+)>
<!ELEMENT para(#PCDATA)>
<!ATTLIST book version CDATA #REQUIRED>
<!ATTLIST section number ID CDATA #REQUIRED>
]>
And I want to query the following two things using XQuery.
1. Get all titles that appear at least twice (two sections with same title).
for $x in /report/section/
for $y in /report/section/
where $x/@title = $y/@title
return $x/@title
2. Get the number and titles of all sections with at least 10 paragraphs in the body or 5 nested sections.
for $x in /report/section/
where $x/para >= 10 or count(/section) > 10
return <large>$x/number $x/title</large>
But my queries don't seem to be correct. I am a beginner with XQuery and XPath; could someone tell me how to fix my queries?
Edit: Sample XML
<?xml version="1.0" encoding="UTF-8"?>
<report version = '1'>
<title>Harry Potter</title>
<section number = '1'>
<title>sec1</title>
<body>
<para>1</para>
<para>2</para>
<para>3</para>
<para>4</para>
<para>5</para>
<para>6</para>
<para>7</para>
<para>8</para>
<para>9</para>
<para>10</para>
<para>11</para>
</body>
</section>
<section number = '2'>
<title>sec2</title>
<body><para>test</para></body>
<section number = '2.1'>
<title>sec21</title>
<body>
<para>test</para>
<para>test</para>
<para>test</para>
<para>test</para>
<para>test</para>
<para>test</para>
<para>test</para>
<para>test</para>
<para>test</para>
<para>test</para>
<para>test</para>
</body>
</section>
<section number = '2.2'>
<title>sec21</title>
<body><para>test</para></body>
</section>
<section number = '2.3'>
<title>sec23</title>
<body><para>test</para></body>
</section>
<section number = '2.4'>
<title>sec24</title>
<body><para>test</para></body>
</section>
<section number = '2.5'>
<title>sec25</title>
<body><para>test</para></body>
</section>
<section number = '2.6'>
<title>sec1</title>
<body><para>test</para></body>
</section>
</section>
</report>

In your first example, there are two problems. First off, you are not getting the nested sections, because you are only iterating over the section elements that are direct children of the report element. Secondly, you are using two loops over the same content. It is possible for both $x and $y to be the same element, so the where condition will match at least once for each section. I would write it like this:
for $x in distinct-values(/report//section/title)
where count(/report//section[title=$x]) > 1
return $x
The loop gets all unique titles and loops over them (note that we use report//section to get all descendant sections). Then for each of these, we count how many times it was used keeping the ones that occurred more than once. We then return the loop variable (which is bound to the title).
Running it, we get back
sec1 sec21
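For readers who want to cross-check the result outside an XQuery processor, the same counting logic can be sketched with Python's standard library. This is only an illustration, not part of the tested queries. The trimmed-down SAMPLE document and the function name duplicate_titles are assumptions made for the example.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Trimmed-down stand-in for the question's XML (an assumption for this sketch).
SAMPLE = """
<report version="1">
  <title>Harry Potter</title>
  <section number="1"><title>sec1</title></section>
  <section number="2"><title>sec2</title>
    <section number="2.1"><title>sec21</title></section>
    <section number="2.2"><title>sec21</title></section>
    <section number="2.6"><title>sec1</title></section>
  </section>
</report>
"""

def duplicate_titles(xml_text):
    """Return section titles used by more than one section, in document order."""
    root = ET.fromstring(xml_text)
    # iter("section") visits every descendant section, like /report//section.
    titles = [section.findtext("title") for section in root.iter("section")]
    counts = Counter(titles)
    # dict.fromkeys drops repeats but keeps first-seen order,
    # playing the role of distinct-values().
    return [title for title in dict.fromkeys(titles) if counts[title] > 1]

print(duplicate_titles(SAMPLE))
```

As in the XQuery version, each distinct title is tested once, so a section can never be matched against itself.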
In the second case, we have the same problem of not getting all descendants. We also need to take the counts. I would use
for $x in /report//section
where count($x/body/para) > 9 or count($x/section) > 4
return <large>{$x/@number} {string($x/title)}</large>
Notice that I selected $x/body/para to get the paragraphs in the section (they occur as children of the body element). This counts the direct descendants, but can be modified to get all descendants if necessary. Notice also the use of curly brackets in the direct element constructor. When we construct a direct element, all text is read literally. The curly brackets are used to evaluate an xquery expression instead of literal text.
I used the string function on the title in order to extract the text contents of the element. If we didn't do that, we would get an actual title element instead of its content (which may be a desired behavior). As we extract the number attribute itself, it becomes an attribute on our constructed element (if we wanted it to be text, we could have applied the string function to it).
In this case, it returns
<large number="1">sec1</large>
<large number="2">sec2</large>
<large number="2.1">sec21</large>
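The two counts in the where clause can also be sketched in Python, purely as an illustration of what is being counted. The helper make_section, the generated SAMPLE document, and the function name large_sections are hypothetical and exist only for this example.

```python
import xml.etree.ElementTree as ET

def large_sections(xml_text, min_paras=10, min_subsections=5):
    """Return (number, title) for sections with many paragraphs or child sections."""
    root = ET.fromstring(xml_text)
    result = []
    for section in root.iter("section"):              # like /report//section
        paras = len(section.findall("body/para"))     # like count($x/body/para)
        children = len(section.findall("section"))    # like count($x/section)
        if paras >= min_paras or children >= min_subsections:
            result.append((section.get("number"), section.findtext("title")))
    return result

def make_section(number, title, paras=1, children=""):
    """Hypothetical helper that builds one <section> string for the sample."""
    body = "".join("<para>p</para>" for _ in range(paras))
    return (f'<section number="{number}"><title>{title}</title>'
            f"<body>{body}</body>{children}</section>")

# Section 1 has 11 paragraphs; section 2 has 5 small child sections.
kids = "".join(make_section(f"2.{i}", f"sec2{i}") for i in range(1, 6))
SAMPLE = ("<report><title>t</title>"
          + make_section("1", "sec1", paras=11)
          + make_section("2", "sec2", children=kids)
          + "</report>")

print(large_sections(SAMPLE))
```

Note that both counts look only at direct children of the section (and of its body), matching the XQuery paths.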
The examples here were tested against the OP's provided XML (example.xml) using Saxon-HE 9.7.0.2J. Only the relevant parts appear above; the complete first example, as run, looks like
declare namespace output = "http://www.w3.org/2010/xslt-xquery-serialization";
declare option output:method "text";
declare context item := doc("example.xml");
for $x in distinct-values(/report//section/title)
where count(/report//section[title=$x]) > 1
return $x
and the complete second example looks like
declare namespace output = "http://www.w3.org/2010/xslt-xquery-serialization";
declare option output:method "xml";
declare context item := doc("example.xml");
for $x in /report//section
where count($x/body/para) > 9 or count($x/section) > 4
return <large>{$x/@number} {string($x/title)}</large>

For the first example in XQuery 3.0 I would use
declare context item := doc("example.xml");
for $x in /report//section/title/data()
group by $x
where count($x) > 1
return $x[1]

Related

XQuery: How to add a comma after sequence except for the last element

I have the following xml:
<form>
<bibl>
<biblScope>1</biblScope>
<biblScope>a</biblScope>
</bibl>
<bibl>
<biblScope>2</biblScope>
<biblScope>b</biblScope>
</bibl>
<bibl>
<biblScope>38</biblScope>
<biblScope>c</biblScope>
</bibl>
</form>
Using this XQuery
for $bibl in form/bibl
return
<div>
{
for $biblScope in $bibl/biblScope/text()
return
$biblScope
}
{if ($bibl[position()] ne $bibl[last()]) then ',' else '#'}
</div>
The <biblScope> contents are printed one after another. After each bibl element, I would like to add a separator (comma / ",") except for the last one, but with the code above, all I get is
<div>
1a
#
</div>
<div>
2b
#
</div>
<div>
38c
#
</div>
and this is definitely wrong, because what I would like to have is
<div>1a,</div>
<div>2b,</div>
<div>38c#</div>
(add a comma after every bibl element content; the last one is supposed to be followed by # instead of a comma.)
Having tried different things for some time now, I could use some help. What is the proper way to do this?
The problem is that position() and last() work on the current context, which is not set by FLWOR expressions. If you want to use similar semantics, use the at $position syntax to get a position counter, and define $last as the number of results:
let $last := count(form/bibl)
for $bibl at $position in form/bibl
return
<div>
{
for $biblScope in $bibl/biblScope/text()
return
$biblScope
}
{if ($position ne $last) then ',' else '#'}
</div>
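The same idea (an explicit position counter compared against a precomputed total) can be sketched in Python with enumerate. This is an illustrative analogy only. The FORM string mirrors the question's XML, and the function name render_divs is an assumption.

```python
import xml.etree.ElementTree as ET

FORM = """<form>
<bibl><biblScope>1</biblScope><biblScope>a</biblScope></bibl>
<bibl><biblScope>2</biblScope><biblScope>b</biblScope></bibl>
<bibl><biblScope>38</biblScope><biblScope>c</biblScope></bibl>
</form>"""

def render_divs(xml_text):
    """Build one <div> per bibl, ending in ',' except the last, which ends in '#'."""
    bibls = ET.fromstring(xml_text).findall("bibl")
    last = len(bibls)                                   # like count(form/bibl)
    lines = []
    for position, bibl in enumerate(bibls, start=1):    # like "at $position"
        text = "".join(scope.text or "" for scope in bibl.findall("biblScope"))
        separator = "," if position != last else "#"
        lines.append(f"<div>{text}{separator}</div>")
    return lines

for line in render_divs(FORM):
    print(line)
```

The counter starts at 1 to match XQuery's 1-based positions.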
If you're able to use XQuery 3.0, the simple map operator ! might be of use here. By replacing the unnecessary loops with axis steps and constructing the elements inside the map expression, you can rely on the positional functions, as the context is set to the <bibl/> elements:
form/bibl ! <div>
{
biblScope/text(),
if (position() ne last()) then ',' else '#'
}
</div>
First note that this:
for $biblScope in $bibl/biblScope/text()
return
$biblScope
can be replaced by this:
$bibl/biblScope/text()
(This bit of verbosity is a surprisingly common mistake. It suggests you're thinking procedurally - processing items one at a time, rather than processing sets.)
Then this:
<div>
{$bibl/biblScope/text()}
{if ($bibl[position()] ne $bibl[last()]) then ',' else '#'}
</div>
should be replaced by this:
<div>{string-join($bibl/biblScope, ',')}#</div>
Note that I've also got rid of the unnecessary (and arguably incorrect) use of /text(), which is another XQuery anti-pattern.

Nested elements naming style (Jade, HAML, Slim)

I'm looking for a way to use the SMACSS naming convention with the Jade, Haml, or Slim template engines.
Given the following Jade code:
.module
  .child
  .child
I get the following output:
<div class="module">
<div class="child"></div>
<div class="child"></div>
</div>
but I'd like to get the following result:
<div class="module">
<div class="module-child"></div>
<div class="module-child"></div>
</div>
Is there any way to do this the way I can in SASS, for example, so that I avoid adding the 'module-' string to each 'child' manually?
UPDATE
Solutions using Haml or Slim are also acceptable.
This is the closest I got with jade (live playground here):
mixin e(elt)
  - var a = attributes;
  - var cl = attributes.class;delete attributes.class
  - var elt = elt ? elt : 'div' // If no parameter given
  if cl
    - var cl = parent + '-' + cl
  else
    - var cl = parent
  #{elt}&attributes({'class': cl}, attributes)
    block
- var parent = 'box'
+e('aside')#so-special
  +e('h2').title Related
  +e('ul').list
    +e('li').item: +e('a')(href='#').link Item 1
    +e('li').item: +e('span').link.current Item 2 and current
    +e('li').item#third(data-dash='fine', aria-live='polite') Item 3 not even a link
      | multi-line
      | block
    // - var parent = 'other' problem of scope I guess
    +e('li').item lorem ipsum dolor sit amet
- var parent = 'footer'
+e('footer')(role='contentInfo')
  +e.inner © Company - 2014
A mixin named e outputs the element passed as its parameter (div by default) with its attributes and content unchanged, except that the first class is prefixed with the value of the variable parent (or, if the element has no class, its class becomes the value of parent itself).
I prefer using the default Jade syntax for attributes, including class and id, to passing many parameters to a mixin. This one needs no parameter for a div: just as .sth text would output <div class="sth">text</div>, +e.sth text outputs <div class="parent-sth">text</div>.
The mixin would be shorter if it didn't have to deal with other attributes (href, id, data-*, role, etc.).
Remaining problem: changing the value of parent has no effect when it's indented. It did with simpler previous attempts, so I guess it's related to variable scope. You theoretically don't want to change the prefix for child elements, but in practice... Maybe as a second optional parameter?
Things I had problems with while playing with Jade:
attributes doesn't work as expected; it is now &attributes(attributes). Thanks to the jade-book issue on GitHub.
But that outputs the untouched class plus the prefixed one, so I had to remove it (delete) in a place where Jade would execute it.
Some thoughts from me: what's wrong with a variable?
- var myModule = 'module'
div(class="#{myModule}")
  div(class="#{myModule}-child")
  div(class="#{myModule}-child")
or combine it with an each:
- var myModule2 = 'foobar'
div(class="#{myModule2}")
  each idx in [0, 1, 2, 3]
    div(class="#{myModule2}-child") I'm child #{idx}
Sure, there is much more code to write, but if a change is necessary, you only have to make it in one place.
Ciao
Ralf
You should be able to achieve this with SASS. As long as you have the latest SASS version, you should be able to use the following syntax:
.module {
&-child {
}
}
Have a look at this article for more information on newer features of SASS http://davidwalsh.name/future-sass

transforming tree to sequence of elements

I have something like this
<a>
<b>
<c>1</c>
<d>2<e>3</e></d>
</b>
</a>
and I want to obtain a sequence like this
<a/>
<b/>
<c>1</c>
<d>2<e>3</e></d>
that is, when recursing, every time an element occurs which does not have a text node, I want to output the element as an empty element, whereas every time an element with a text node occurs, I want to output it as it is. Of course, the text nodes in the above input have to be space-normalized.
If I put it through a standard identity transform,
declare function local:copy($element as element()) as element() {
element {node-name($element)}
{$element/@*,
for $child in $element/*
return
if ($child/text())
then $child
else (element {node-name($child)}{$child/@*}, local:copy($child))
}
};
<b> gets reconstructed as a full element (containing <c> and <d>), but if I remove the element constructor, it does not get output at all.
I don't quite get the fourth line in your example output, I'm just guessing what you want is actually this:
<a/>
<b/>
<c>1</c>
<d>2</d>
<e>3</e>
You don't need any functions. Just list all elements, reconstruct each one with the same name, and include its text node children.
for $element in //*
return element { local-name($element) } { $element/text() }
This version is even shorter, but I think it requires XQuery 3.0, because earlier versions did not allow element constructors in step expressions:
//*/element { local-name(.) } { text() }
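To make the flattening step concrete, here is a rough Python equivalent using the standard library, offered as a sketch rather than a replacement for the query. The function name flatten and the SAMPLE string are assumptions. ElementTree stores the text that follows a child element in that child's tail, so an element's own text nodes are its text plus the tails of its children.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

SAMPLE = """<a>
<b>
<c>1</c>
<d>2<e>3</e></d>
</b>
</a>"""

def flatten(xml_text):
    """List every element in document order, keeping only its own text nodes."""
    root = ET.fromstring(xml_text)
    flat = []
    for elem in root.iter():                      # like //* in document order
        # Own text nodes: leading text plus the tail after each child element.
        pieces = [elem.text or ""] + [child.tail or "" for child in elem]
        text = " ".join("".join(pieces).split())  # space-normalize
        if text:
            flat.append(f"<{elem.tag}>{escape(text)}</{elem.tag}>")
        else:
            flat.append(f"<{elem.tag}/>")
    return flat

for line in flatten(SAMPLE):
    print(line)
```

Like the query, this sketch drops attributes; elements whose only text is whitespace come out empty.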

Using "And" , "Or" operators in Xquery

let us consider an example,
<li>
<div id="comments-list-245667" class="comments yui-u">
<h3><span class="fn n">Ram</span></h3>
</div>
<div id="comments-list-245687" class="comments yui-u">
<h3><span class="fn n"><a href='http://www.xyz.com' rel='external nofollow' class='url url'>laxman</a></span></h3>
</div>
</li>
Now how to get both "Ram" and "laxman" from the nodes using "and" or "or" operators.
Use:
/*/*/h3/(span[not(*)] | span/a)/string()
This produces a sequence of the string values of every span that doesn't have an element child, and of every a that is a child of a span, where the span is a child of an h3 element that is a grandchild of the top element of the XML document.
BTW, the above happens also to be a pure XPath 2.0 expression (XPath 2.0 is a subset of XQuery).
Assuming capitalization isn't important (in XML you have "laxman", in your question "Laxman") and you want to get the <li>s (otherwise, "and" wouldn't be reasonable, but if you want, move some axis steps in/out the predicate). Each line is an alternative.
or:
/li[.//h3//*[lower-case(.) = ("ram", "laxman")]]
/li[.//h3//*[lower-case(.) = "ram"] or .//h3//*[lower-case(.) = "laxman"]]
and:
/li[.//h3//*[lower-case(.) = "ram"]][.//h3//*[lower-case(.) = "laxman"]]
/li[.//h3//*[lower-case(.) = "ram"] and .//h3//*[lower-case(.) = "laxman"]]
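The first or-alternative relies on the existential semantics of XQuery's general comparison operator =: the comparison is true if any item on the left equals any item on the right.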
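In the first or-alternative, the comparison against ("ram", "laxman") succeeds as soon as any item on the left equals any item on the right. A minimal Python sketch of that behavior follows; the function name general_equals is an assumption made for illustration, not an XQuery API.

```python
def general_equals(left, right):
    """Mimic XQuery's general comparison '=': true if ANY pair of items is equal."""
    return any(l == r for l in left for r in right)

names = ["ram", "laxman"]

print(general_equals(["ram"], names))             # one item matches
print(general_equals(["sita"], names))            # no item matches
print(general_equals(["sita", "laxman"], names))  # second item matches
```

This is why a single predicate with a sequence on the right-hand side behaves like the longer version that spells out "or".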

How to write regular expressions in CSS

How do I use regular expressions in CSS? I found a tutorial here for matching static strings in CSS, but I haven't been able to find one for using regular expressions to match multiple strings in CSS. (I found one here, but I couldn't get it to work. I also looked at the W3C documentation on using regular expressions, but I couldn't make sense of the document.)
I want to match a series of <DIV> tags whose ids start at s1 and increase by one (i.e. #s1 #s2 #s3...).
I know that div[id^=s], div[id^='s'], and div[id^="s"] each perform the match as I intend it in my CSS. However, each of those also matches an id of #section, which I don't want to happen. I believe that "/^s([0-9])+$/" is the equivalent PHP string; I'm just looking for the CSS version of it.
There is no way to match elements with a regular expression, even in CSS3. Your best option is probably to simply use a class for your divs.
<style>
.s-div {
/* stuff specific to each div */
}
</style>
<div id="s1" class="s-div"><!-- stuff --></div>
<div id="s2" class="s-div"><!-- stuff --></div>
<div id="s3" class="s-div"><!-- stuff --></div>
<div id="s4" class="s-div"><!-- stuff --></div>
<div id="s5" class="s-div"><!-- stuff --></div>
Also remember that you can separate multiple class names by a space inside a class attribute.
<div class="class1 class2 class3"></div>
javascript:
/* page scrape the DIV s# id's and generate the style selector */
re=/<div\b[^>]*\b(id=(('|")?)s[0-9]+\2)(\b[^>]*)?>/ig;
alert(
document . body . innerHTML .
match(re) . join("") .
replace(re,"div[$1], ") + "{ styling details here }" );
alert(
("test with <div id=s2 aadf><DIV ID=123> <DIV adf=asf as><Div id='s45'>" +
"<div id=s2a ><DIV ID=s23 > <DIV asdf=as id=S9 ><Div id='x45' >") .
match(re) . join("") .
replace(re,"div[$1], ") + "{ styling details here }"
);
The test yields
div[id=s2], div[id='s45'], div[ID=s23], div[id=S9], { styling details here }
Note the dangling , and the case preserved S9.
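The same scrape-and-generate idea can be sketched in Python, for example as a build step, with the selectors joined so that no dangling comma is left over. This is an illustration under assumptions: the pattern DIV_ID, the function selector_list, and TEST_HTML are made up for the example, and a regular expression is only a rough stand-in for a real HTML parser.

```python
import re

# Match a <div ...> tag whose id is "s" followed by digits, optionally quoted.
# The lookahead requires the id to end at whitespace or '>', so "s2a" is rejected.
DIV_ID = re.compile(r"""<div\b[^>]*?\bid=(['"]?)(s[0-9]+)\1(?=[\s>])""",
                    re.IGNORECASE)

def selector_list(html):
    """Return a comma-separated CSS selector list for all matching div ids."""
    ids = [match.group(2) for match in DIV_ID.finditer(html)]
    # join() puts separators only between items, so there is no trailing comma.
    return ", ".join(f'div[id="{found}"]' for found in ids)

TEST_HTML = ("test with <div id=s2 aadf><DIV ID=123> <DIV adf=asf as><Div id='s45'>"
             "<div id=s2a ><DIV ID=s23 > <DIV asdf=as id=S9 ><Div id='x45' >")

print(selector_list(TEST_HTML) + " { styling details here }")
```

The matched ids keep their original case, so S9 is still emitted as S9.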
If you don't want to or can't use the solution posted by @zneak, you could do it by editing the labels with JavaScript, but be warned: it's a hell of a lot of work.
The following CSS will select #s0, #s1, ... , #s9 and not #section, though a browser must implement the CSS3 negation :not().
The final selection is equivalent to:
/^s[0-9]?.*[0-9]$/
which says that each id must start with s and a number and end with a number like:
s6, s42, s123, s5xgh7, ...
The :not() line vacuously excludes those ID's that do not start properly using an empty style {}.
<style>
div:not([id^=s0]):not([id^=s1]):not([id^=s2]):not ... :not([id^=s9]) {}
div[id^=s][id$=0], div[id^=s][id$=1], div[id^=s][id$=2], ... div[id^=s][id$=9] { ... }
</style>
CSS3 does not use regular expressions to define selectors BUT ...
CSS Conditional Rules Module Level 3
defines a very specific function, regexp(<string>), that parses a URL with a regular expression when creating an @document rule.
<style>
/* eliminate the alphabet except s - NB some funny characters may be left */
/* HTML id's are case-sensitive - upper case may need exclusion & inclusion */
div[id*=a], div[id*=b], div[id*=c], ..., div[id*=q], div[id*=r] {}
div[id*=t], div[id*=u], div[id*=v], div[id*=w], div[id*=x], div[id*=y], div[id*=z] {}
div[id*='_'], div[id*='-'], div[id*='%'], div[id*='#'] {}
/* s can not be embedded */
div[id*=0s], div[id*=1s], div[id*=2s], ..., div[id*=9s] {}
/* s will be followed by a string of numerals - maybe a funny char or two */
div[id^=s0], div[id^=s1], div[id^=s2], ... div[id^=s9] { ... }
</style>

Resources