I have a movie database and want to search for actors by last name and/or first name. The goal is a list showing each actor's name together with the title of every movie they appeared in and the role they played.
The XML movie database looks like this:
<movies>
<movie>
<title>A History of Violence</title>
<year>2005</year>
<country>USA</country>
<genre>Crime</genre>
<summary>Tom Stall, a humble family man and owner of a
popular neighborhood restaurant, lives a quiet but
fulfilling existence in the Midwest. One night Tom
foils a crime at his place of business and, to his
chagrin, is plastered all over the news for his
heroics. Following this, mysterious people follow
the Stalls' every move, concerning Tom more than
anyone else. As this situation is confronted, more
lurks out over where all these occurrences have
stemmed from compromising his marriage, family
relationship and the main characters' former
relations in the process.</summary>
<director>
<last_name>Cronenberg</last_name>
<first_name>David</first_name>
<birth_date>1943</birth_date>
</director>
<actor>
<first_name>Vigo</first_name>
<last_name>Mortensen</last_name>
<birth_date>1958</birth_date>
<role>Tom Stall</role>
</actor>
<actor>
<first_name>Maria</first_name>
<last_name>Bello</last_name>
<birth_date>1967</birth_date>
<role>Eddie Stall</role>
</actor>
<actor>
<first_name>Ed</first_name>
<last_name>Harris</last_name>
<birth_date>1950</birth_date>
<role>Carl Fogarty</role>
</actor>
<actor>
<first_name>William</first_name>
<last_name>Hurt</last_name>
<birth_date>1950</birth_date>
<role>Richie Cusack</role>
</actor>
</movie>
My current code works so far, but for a query such as last_name=Dunst I get this result:
1. Dunst, Kirsten
movie title as role
2. Dunst, Kirsten
movie title as role
but I want each actor listed only once. I tried adding distinct-values(), but it doesn't work :(
I would like the output to look like this:
1. Dunst, Kirsten
movie title as role
movie title as role
Here is the code:
xquery version "3.0";
declare option exist:serialize "method=xhtml media-type=text/html indent=yes";
let $last_name := distinct-values(request:get-parameter('last_name', ''))
let $first_name := distinct-values(request:get-parameter('first_name', ''))
let $movies := collection('/db/Movie/data')/movies/movie/actor[if(not($last_name)) then xs:boolean(1) else equals(last_name, $last_name)][if(not($first_name)) then xs:boolean(1) else equals(first_name, $first_name)]
return
<html>
<head>
</head>
<body>
<h1>Search results for actor {$last_name} {$first_name}:</h1>
<ol>{
for $movie in $movies
let $title := $movie/../title/text()
let $role := $movie/role/text()
return
<li>{$movie/last_name/text()}, {$movie/first_name/text()} <p> In the movie <i>{$title}</i> as role <i>{$role}</i> </p></li>
}</ol>
</body>
</html>
Hope someone can help me ;)
Thanks in advance!
The variable $movies is actually bound to a sequence of actor elements (one per matching movie appearance), not movies, which causes some confusion and is why the same actor shows up once per movie.
If you only intend to search for one actor at a time, you can simply output the actor's name once, before the FLWOR expression, and get your intended output:
<ol>
<li>
<p>{ $movies[1]/last_name }, { $movies[1]/first_name}</p>
{
for $movie in $movies
let $title := $movie/../title
let $role := $movie/role
return
<p>In the movie <i>{$title}</i> as role <i>{$role}</i></p>
}</li>
</ol>
Note: the text() path selector is unnecessary in this case, and occasionally confusing as it can return a sequence of text nodes. If you need to ensure a type constraint, consider using fn:string() instead.
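If the search can match more than one actor, one possible variation is to group the matching actor elements by name, so that each actor gets a single list item. This is only a sketch: it assumes a processor that supports the XQuery 3.0 group by clause, and the inline sample bound to $movies (titles and roles included) is made-up data standing in for the database.

```xquery
xquery version "3.0";

(: Made-up sample data standing in for the actor elements selected from the database :)
let $movies := (
  <movies>
    <movie>
      <title>Film A</title>
      <actor><first_name>Kirsten</first_name><last_name>Dunst</last_name><role>Role A</role></actor>
    </movie>
    <movie>
      <title>Film B</title>
      <actor><first_name>Kirsten</first_name><last_name>Dunst</last_name><role>Role B</role></actor>
      <actor><first_name>Ed</first_name><last_name>Harris</last_name><role>Role C</role></actor>
    </movie>
  </movies>
)//actor
return
<ol>{
  (: one group per distinct (last name, first name) pair :)
  for $actor in $movies
  group by $last := string($actor/last_name), $first := string($actor/first_name)
  return
    <li>
      <p>{$last}, {$first}</p>
      {
        (: after group by, $actor holds every actor element in the group :)
        for $a in $actor
        return <p>In the movie <i>{string($a/../title)}</i> as role <i>{string($a/role)}</i></p>
      }
    </li>
}</ol>
```

With this sample, Dunst should appear once with two movie lines and Harris once with one. The order of the groups is up to the processor unless an order by clause is added.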
I have a listPers.xml (a TEI list containing persons, obviously). I want to write a function to update the listPers.xml.
My function looks like this:
declare function app:addPerson($node as node(), $model as map(*)) {
let $person := "<person xml:id=""><persName><forename>Albert</forename><surname>Test</surname></persName></person>"
let $list := doc(concat($config:app-root, '/resources/listPers_test.xml'))
return
update insert $person into $list//tei:listPerson
};
And the listPerson.xml looks more or less like a typical list of person entries.
It has a tei:header (omitted here) followed by:
<text>
<body>
<listPerson xml:id="person">
<person xml:id="abbadie_jacques">
<persName ref="http://d-nb.info/gnd/100002307">
<forename>Jacques</forename>
<surname>Abbadie</surname>
</persName>
<note>Prediger der französisch-reformierten Gemeinde in <rs type="place" ref="#berlin">Berlin</rs>
</note>
</person>
</listPerson>
</body>
</text>
</TEI>
(sorry for the ruined indentation, it's just an excerpt)
I do not get an error, which means that my app:addPerson should be fine, right?
I want the listPers_test to look like this:
<text>
<body>
<listPerson xml:id="person">
<person xml:id="abbadie_jacques">
<persName ref="http://d-nb.info/gnd/100002307">
<forename>Jacques</forename>
<surname>Abbadie</surname>
</persName>
<note>Prediger der französisch-reformierten Gemeinde in <rs type="place" ref="#berlin">Berlin</rs>
</note>
</person>
<!-- here comes the output that I wish to have :-) -->
<person xml:id=""><persName><forename>Albert</forename><surname>Test</surname></persName></person>
</listPerson>
</body>
</text>
</TEI>
In the long run, I aim for an HTML form that allows users to input names etc., where ids are generated using something like
lower-case(concat($surname, "_", $forename))
But I will not get into my questions about forms and XQuery here, as I have barely done more than a quick Google search on HTML forms and XQuery!
Can anyone give me a hint as to why the listPers_test.xml file does not get updated with the second entry? :-)
All the best and thanks in advance to everyone,
K
Alright, I have a solution for anyone interested in it:
My first snippet, $person := ..., contains a STRING, not an element. Changing the line
let $person := "<person xml:id=""><persName><forename>Albert</forename><surname>Test</surname></persName></person>"
to this one actually solves the issue:
let $person := <tei:person xml:id=""><persName><forename>Albert</forename><surname>Test</surname></persName></tei:person>
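For the id generation mentioned in the question, a minimal sketch of building the element from two strings could look like the following. The function name local:make-person is hypothetical, the namespace URI is the usual TEI one, and the standard function for lowercasing is lower-case(). Unlike the snippet above, the children are also placed in the TEI namespace here, which is an assumption about what the list expects.

```xquery
xquery version "3.0";

declare namespace tei = "http://www.tei-c.org/ns/1.0";

(: Hypothetical helper: builds a person element (a node, not a string) from two names :)
declare function local:make-person($forename as xs:string, $surname as xs:string)
  as element(tei:person)
{
  (: the id follows the surname_forename pattern sketched in the question :)
  <tei:person xml:id="{lower-case(concat($surname, '_', $forename))}">
    <tei:persName>
      <tei:forename>{$forename}</tei:forename>
      <tei:surname>{$surname}</tei:surname>
    </tei:persName>
  </tei:person>
};

local:make-person("Albert", "Test")
```

The result carries xml:id="test_albert". In eXist, the returned element could then be used in place of $person in the update insert expression from the question.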
At the moment in XQuery 3.1 (in eXist 4.7) I receive XML fragments that look like the following (from eXist's Lucene full-text search):
let $text :=
<tei:text>
<tei:front>
<tei:div>
<tei:listBibl>
<tei:bibl>There is some</tei:bibl>
<tei:bibl>text in certain elements</tei:bibl>
</tei:listBibl>
</tei:div>
<tei:div>
<tei:listBibl>
<tei:bibl>which are subject <exist:match>to</exist:match> a Lucene search</tei:bibl>
<tei:bibl></tei:bibl>
</tei:listBibl>
</tei:div>
</tei:front>
<tei:body>
<tei:p>and often produces</tei:p>
<tei:p>a hit.</tei:p>
</tei:body>
</tei:text>
Currently I have XQuery send this fragment to an XSLT stylesheet in order to transform it into HTML like this:
<td>...elements which are subject <span class="search-hit">to</span> a Lucene search and often p...
Where the stylesheet's job is to return 30 characters of text before and after <exist:match/> and put the content of <exist:match/> into a span. There is only one <exist:match/> per transformation.
This all works fine. However, it has occurred to me that this is a very small job: effectively a single transformation of only one element, the rest being a sort of string-join. I therefore wonder whether this could be done efficiently in XQuery.
In trying to do this, I can't seem to find a way to handle the string content up to the <exist:match/> and then the string content after <exist:match/>. My idea is, in pseudo code, to output a result like:
let $textbefore := some function to get the text before <exist:match/>
let $textafter := some function to get the text after <exist:match/>
return <td>...{$textbefore}
<span class="search-hit">
{$text//exist:match/text()}
</span> {$textafter}...</td>
Is this even worth doing in XQuery vs the current XQuery -> XSLT pipeline I have?
Many thanks.
I think it can be done as
declare namespace output = "http://www.w3.org/2010/xslt-xquery-serialization";
declare namespace tei = "http://example.com/tei";
declare namespace exist = "http://example.com/exist";
declare option output:method 'html';
let $text :=
<tei:text>
<tei:front>
<tei:div>
<tei:listBibl>
<tei:bibl>There is some</tei:bibl>
<tei:bibl>text in certain elements</tei:bibl>
</tei:listBibl>
</tei:div>
<tei:div>
<tei:listBibl>
<tei:bibl>which are subject <exist:match>to</exist:match> a Lucene search</tei:bibl>
<tei:bibl></tei:bibl>
</tei:listBibl>
</tei:div>
</tei:front>
<tei:body>
<tei:p>and often produces</tei:p>
<tei:p>a hit.</tei:p>
</tei:body>
</tei:text>
,
$match := $text//exist:match,
$text-before-all := normalize-space(string-join($match/preceding::text(), ' ')),
$text-before := substring($text-before-all, string-length($text-before-all) - 29),
$text-after := substring(normalize-space(string-join($match/following::text(), ' ')), 1, 30)
return
<td>...{$text-before}
<span class="search-hit">
{$match/text()}
</span> {$text-after}...</td>
This is not really much of a query in XQuery either: it is just some XPath selection plus some possibly expensive string joining and extraction along the preceding and following axes.
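The substring arithmetic is easy to get wrong by one, because substring() positions are 1-based. A small sketch of two helpers that make the intent explicit is shown here. The names local:last-chars and local:first-chars are hypothetical and not part of any library, and the sample strings are taken from the fragment above.

```xquery
xquery version "3.0";

(: Hypothetical helper: the last $n characters of $s, or all of $s if it is shorter :)
declare function local:last-chars($s as xs:string, $n as xs:integer) as xs:string {
  (: clamp the start position to 1 so that short strings are returned whole :)
  substring($s, max((1, string-length($s) - $n + 1)))
};

(: Hypothetical helper: the first $n characters of $s :)
declare function local:first-chars($s as xs:string, $n as xs:integer) as xs:string {
  substring($s, 1, $n)
};

(
  local:last-chars("text in certain elements which are subject", 30),
  local:first-chars("a Lucene search and often produces a hit.", 30)
)
```

Each call returns at most 30 characters, so the two results can be dropped into the td element on either side of the span.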
When I execute my XQuery code on an XML file, I get multiple results in one of my fields.
Here is my XML file:
<Actors>
<Actor name="NTR">
<Movie TITLE="Yamadonga" Director="Rajamouli"></Movie>
<Movie TITLE="AADI" Director="VV vinayak">
</Movie>
</Actor>
<Actor name="Rajeev">
<Movie TITLE="Yamadonga" Director="Rajamouli" ></Movie>
</Actor>
<Actor name="mahesh">
<Movie TITLE="pokiri" Director="puri">
</Movie>
</Actor>
</Actors>
My XQuery file:
<Director>
{
for $Movie in doc("actors.xml")/Actors/Actor/Movie
return
if($Movie/@TITLE=$title)
then
data($Movie/@Director)
else()
}
</Director>
Most importantly, my result
<movies>
<movie>
<Title>Yamadonga</Title>
<Actor>NTR</Actor>
<Actor>Rajeev</Actor>
<Director>Rajamouli Rajamouli</Director>
</movie>
</movies>
How can I get only one value in the Director field?
My procedure:
I ran the distinct-values() function over (../Movie/@TITLE), and that gave me the right result for displaying the title. But as title and director are both attributes of Movie, I cannot reach one through the other. When I iterate over Actor, the director name gets printed twice, because two actors share the same director for the same movie. When I iterate over Movie, I cannot use distinct-values() on it, as it is an element, not an attribute.
Your XQuery is not very efficient or easily readable. You can use a simple XPath expression that takes only the first matching movie:
<Director>
{
data((doc("actors.xml")/Actors/Actor/Movie[@TITLE = $title])[1]/@Director)
}
</Director>
It's because the for is returning 2 movies. Why don't you just use an XPath with distinct-values()?
<Director>
{
distinct-values(doc("actors.xml")/Actors/Actor/Movie[@TITLE=$title]/data(@Director))
}
</Director>
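Putting the pieces together, the whole result from the question could be built along these lines. This is a sketch rather than the asker's actual query: the sample data is inlined (in a variable named $actors, which is an assumption) so that it runs without actors.xml, and it produces one movie element per distinct title.

```xquery
xquery version "1.0";

(: Inline copy of the sample data from the question :)
let $actors :=
  <Actors>
    <Actor name="NTR">
      <Movie TITLE="Yamadonga" Director="Rajamouli"/>
      <Movie TITLE="AADI" Director="VV vinayak"/>
    </Actor>
    <Actor name="Rajeev">
      <Movie TITLE="Yamadonga" Director="Rajamouli"/>
    </Actor>
    <Actor name="mahesh">
      <Movie TITLE="pokiri" Director="puri"/>
    </Actor>
  </Actors>
return
<movies>{
  for $title in distinct-values($actors/Actor/Movie/@TITLE)
  (: every Movie element carrying this title, across all actors :)
  let $matches := $actors/Actor/Movie[@TITLE = $title]
  return
    <movie>
      <Title>{$title}</Title>
      {
        (: the parent step yields each Actor once, in document order :)
        for $actor in $matches/..
        return <Actor>{string($actor/@name)}</Actor>
      }
      <Director>{distinct-values($matches/@Director)}</Director>
    </movie>
}</movies>
```

For "Yamadonga" this gives the actors NTR and Rajeev and a single Director value, Rajamouli.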
Happy New Year to All!
I am learning XQuery with BaseX and now face the following problem.
I am parsing the factbook.xml file, which is part of the distribution.
The following query runs OK:
for $country in db:open('factbook')//country
where $country/@population < 1000000 and $country/@population > 500000
return <country name="{$country/name}" population="{$country/@population}">
{
for $city in $country/city
let $pop := number($city/population)
order by $pop descending
return <city population="{$city/population/text()}"> {$city/name/text()}
</city>
}
</country>
But when I try to generate HTML with the second query, putting {$country/@population} inside the <h2>Country population: </h2> tag produces the error message "Attribute must follow the root element".
<html><head><title>Some Countries</title></head><body>
{
for $country in db:open('factbook')//country
let $pop_c := $country/@population
where $pop_c < 1000000 and $pop_c > 500000
return
<p>
<h1>Country: {$country/name/text()}</h1>
<h2>Country population: #error comes if I put it here!#</h2>
{
for $city in $country/city
let $pop := number($city/population)
order by $pop descending
return ( <h3>City: {$city/name/text()}</h3>,
<p>City population: {$city/population/text()}</p>
)
}
</p>
}
</body></html>
Where is my mistake?
Thank you!
Just using:
{$country/@population}
copies the population attribute itself into the result. An attribute node must come directly after the start of its element (or after other attributes of that element), but this one follows a text node, and this causes the error to be raised.
Use:
<h2>Country population: {string($country/@population)} </h2>
When you write {$country/@population}, you do not insert the text of the population attribute, but the attribute itself. If you did not have the "Country population" text before it, using {$country/@population} would attach the attribute to the h2 element, creating something like <h2 population="..."/>.
If you want its value, use:
{data($country/@population)}
Or
{data($pop_c)}
since you already have it in a variable. (The number or string functions can also be used instead of data, but I think data is the fastest.)
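The difference between the two cases can be seen in a small self-contained sketch. The country element here is made up and only stands in for one factbook entry.

```xquery
xquery version "1.0";

(: Made-up country element standing in for one factbook entry :)
let $country := <country population="650000"><name>Exampleland</name></country>
return (
  (: the attribute node is the first thing in the content, so it becomes an attribute of h2 :)
  <h2>{$country/@population}</h2>,
  (: string() yields the attribute's value, which is safe to place after text :)
  <h2>Country population: {string($country/@population)}</h2>
)
```

The first constructor yields <h2 population="650000"/>, the second yields <h2>Country population: 650000</h2>.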
Hi, I am new to MarkLogic and to the XQuery world. I cannot think of a starting point for writing the following logic in MarkLogic XQuery. I would be thankful if somebody could give me an idea or a sample so that I can achieve the following:
I want to query A.XML based on a word lookup in B.XML. The query should produce C.XML. The logic should be as follows:
A.XML
<root>
<content> The state passed its first ban on using a handheld cellphone while driving in 2004 Nokia Vodafone Nokia Growth Recession Creicket HBO</content>
</root>
B.XML
<WordLookUp>
<companies>
<company name="Vodafone">Vodafone</company>
<company name="Nokia">Nokia</company>
</companies>
<topics>
<topic group="Sports">Cricket</topic>
<topic group="Entertainment">HBO</topic>
<topic group="Finance">GDP</topic>
</topics>
<moods>
<mood number="4">Growth</mood>
<mood number="-5">Depression</mood>
<mood number="-3">Recession</mood>
</moods>
</WordLookUp>
C.XML (Result XML)
<root>
<content> The state passed its first ban on using a handheld cellphone while driving in 2004 Nokia Vodafone Nokia Growth Recession Creicket HBO</content>
<updatedElement>
<companies>
<company count="1">Vodafone</company>
<company count="2">Nokia</company>
</companies>
<mood>1</mood>
<topics>
<topic count="1">Sports</topic>
<topic count="1">Entertainment</topic>
</topics>
<word-count>22</word-count>
</updatedElement>
</root>
For each company/text() in B.xml, search A.xml; if a match is found, create the tag:
<company count="number of occurrences of that word">company/@name</company>
For each topic/text() in B.xml, search A.xml; if a match is found, create the tag:
<topic count="number of occurrences of that word">topic/@group</topic>
For each mood/text() in B.xml, search A.xml; if matches are found, compute:
[occurrences of first word * /mood[first word]/@number] + [occurrences of second word * /mood[second word]/@number] + ...
Finally, get the word count of the content element.
This was a fun one, and I learned a few things in the process. Thanks!
Note: to get the results you wanted, I fixed a typo in A.xml ("Creicket" -> "Cricket").
The following solution uses two MarkLogic-specific functions:
- cts:highlight (for replacing matching text with nodes which you can then count)
- cts:tokenize (for breaking up a given string into word, space, and punctuation parts)
It also includes some powerful magic specific to those two functions, respectively:
- the dynamic binding of the special variable $cts:text (which isn't really necessary for this particular use case, but I digress), and
- the data model extension which adds these subtypes of xs:string: cts:word, cts:space, and cts:punctuation.
Enjoy!
xquery version "1.0-ml";
(: Generic function using MarkLogic's ability to find query matches within a single node :)
declare function local:find-matches($content, $search-text) {
cts:highlight($content, $search-text, <MATCH>{$cts:text}</MATCH>)
//MATCH
};
(: Generic function using MarkLogic's ability to tokenize text into words, punctuation, and spaces :)
declare function local:get-words($text) {
cts:tokenize($text)[. instance of cts:word]
};
(: The rest of this is pure XQuery :)
let $content := doc("A.xml")/root/content,
$lookup := doc("B.xml")/WordLookUp
return
<root>
{$content}
<updatedElement>
<companies>{
for $company in $lookup/companies/company
let $results := local:find-matches($content, string($company))
where exists($results)
return
<company count="{count($results)}">{string($company/@name)}</company>
}</companies>
<mood>{
sum(
for $mood in $lookup/moods/mood
let $results := local:find-matches($content, string($mood))
return count($results) * $mood/@number
)
}</mood>
<topics>{
for $topic in $lookup/topics/topic
let $results := local:find-matches($content, string($topic))
where exists($results)
return
<topic count="{count($results)}">{string($topic/@group)}</topic>
}</topics>
<word-count>{
count(local:get-words($content))
}</word-count>
</updatedElement>
</root>
Let me know if you have any follow-up questions about how all the above works. At first, I was inclined to use cts:search or cts:contains, which are the bread and butter for search in MarkLogic. But I realized that this example wasn't so much about search (finding documents) as it was about looking up matching text within an already-given document. If you needed to extend this somehow to aggregate across a large number of documents, then you'd want to look into the additional use of cts:search or cts:contains.
One final caveat: if you think your content might have <MATCH> elements already, you'll want to use a different element name when calling cts:highlight (a name which you can guarantee won't conflict with your content's existing element names). Otherwise, you'll potentially get the wrong number of results (higher than the accurate count).
ADDENDUM:
I was curious if this could be done without cts:highlight, given that cts:tokenize already breaks up the text into all the words for you. The same result is produced using this alternative implementation of local:find-matches (provided you swap the order of the function declarations because one depends on the other):
(: Find word matches by comparing them one-by-one :)
declare function local:find-matches($content, $search-text) {
local:get-words($content)[cts:stem(.) = cts:stem($search-text)]
};
It uses cts:stem to normalize the given word to its stem, so, for example searching for "pass" will match "passed", etc. However, this still won't work for multi-word (phrase) searches. So to be safe, I'd stick with using cts:highlight, which, like cts:search and cts:contains, can handle any cts:query you give it (including simple word/phrase searches like we do above).
It might make sense to step back and ask whether you would be better served by modeling your data and/or documents for use with a document-oriented database instead of an RDBMS.
This is simpler/shorter, fully compliant XQuery that uses no implementation extensions, which makes it work with any compliant XQuery 1.0 processor:
let $content := doc('file:///c:/temp/delete/A.xml')/*/*,
$lookup := doc('file:///c:/temp/delete/B.xml')/*,
$words := tokenize($content, '\W+')[.]
return
<root>
{$content}
<updatedElement>
<companies>
{for $c in $lookup/companies/*,
$occurs in count(index-of($words, $c))
return
if($occurs)
then
<company count="{$occurs}">
{$c/text()}
</company>
else ()
}
</companies>
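<mood>
{
(: weight each mood's number by how often its word occurs in the content :)
sum(for $m in $lookup/moods/*
return count(index-of($words, data($m))) * number($m/@number))
}
</mood>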
<topics>
{for $t in $lookup/topics/*,
$occurs in count(index-of($words, $t))
return
if($occurs)
then
<topic count="{$occurs}">
{data($t/@group)}
</topic>
else ()
}
</topics>
<word-count>{count($words)}</word-count>
</updatedElement>
</root>
When applied on the provided files A.xml and B.XML (contained in the local directory c:/temp/delete), the wanted, correct result is produced:
<root>
<content> The state passed its first ban on using a handheld cellphone while driving in 2004 Nokia Vodafone Nokia Growth Recession Cricket HBO</content>
<updatedElement>
<companies>
<company count="1">Vodafone</company>
<company count="2">Nokia</company>
</companies>
<mood>1</mood>
<topics>
<topic count="1">Sports</topic>
<topic count="1">Entertainment</topic>
</topics>
<word-count>22</word-count>
</updatedElement>
</root>
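The counting technique used in this answer can be tried in isolation. The following is a small sketch with made-up inline data (the content string, the two mood entries, and the summary and nokia-count element names are all assumptions for illustration), showing the tokenization, the per-word count via index-of(), and a mood score weighted by occurrences as the question describes.

```xquery
xquery version "1.0";

(: Made-up content and lookup data, standing in for A.xml and B.xml :)
let $content := " Nokia Vodafone Nokia Growth Recession Growth "
let $moods :=
  <moods>
    <mood number="4">Growth</mood>
    <mood number="-3">Recession</mood>
  </moods>
(: split on runs of non-word characters; the [.] predicate drops empty tokens :)
let $words := tokenize($content, '\W+')[.]
return
<summary>
  <word-count>{count($words)}</word-count>
  <nokia-count>{count(index-of($words, "Nokia"))}</nokia-count>
  <mood>{
    sum(
      for $m in $moods/mood
      (: occurrences of the mood word times its number attribute :)
      return count(index-of($words, string($m))) * xs:integer($m/@number)
    )
  }</mood>
</summary>
```

Here the word count is 6, "Nokia" occurs twice, and the mood score is 2 * 4 + 1 * (-3) = 5. Note that index-of() compares whole tokens exactly, so unlike the cts:highlight approach it is case-sensitive and does no stemming.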