I have been working on a homework exercise in XQuery and I am currently stuck. The assignment was to generate information about the two continents that, after 50 years, would have the largest and the smallest population increase, respectively. All I have left to do is take the min and max. Everything is saved in $minAndMaxCont, which looks like this:
<Continent name="asia" pop="4243769598" futurePop="7255593125" increase="3011823527" ratio="1.709704770122159681"/>
<Continent name="africa" pop="1043912572" futurePop="3405022718" increase="2361110146" ratio="3.261789166382412339"/>
<Continent name="america" pop="955621605" futurePop="1510928928" increase="555307323" ratio="1.581095404388643976"/>
<Continent name="australia" pop="93146473" futurePop="156995765" increase="63849292" ratio="1.685471923343785653"/>
<Continent name="europe" pop="633227105" futurePop="693248396" increase="60021291" ratio="1.094786357889717939"/>
So what I want to do is extract the minimum and maximum values of "increase", which seems simple enough. But I cannot get it to work. I have tried a lot of different approaches, one of them being the max and min functions, using other threads as guides.
One thread that I followed was this one:
How can I use XPath to find the minimum value of an attribute in a set of elements?
From there I took this code:
let $xml := <foo>
<bar id="1" score="192" />
<bar id="2" score="227" />
<bar id="3" score="105" />
</foo>
let $min := min($xml/bar/@id)
let $max := max($xml/bar/@id)
return $max
And this works perfectly fine: it returns the min/max value (I can use both XQuery and XPath solutions, by the way). However, when I attempt to do something similar with my own collection of data, like this:
let $incMin := min($minAndMaxCont/@increase)
return $incMin
The generated result becomes this (It behaves the same way with max() too):
3.011823527E9
2.361110146E9
5.55307323E8
6.3849292E7
6.0021291E7
So instead of extracting a minimum (or maximum) value, it converts the whole list into another form and does nothing else with it. I really want to get this to work, and I am also genuinely curious as to why it converts the entries into another form instead of extracting the max value. I would very much appreciate any help.
//With kind regards.
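A possible explanation, offered as a sketch rather than a confirmed diagnosis since the surrounding query is not shown: min() and max() atomize the untyped attribute values and cast them to xs:double, which is why the numbers are printed in the form 3.011823527E9. Getting one value back per continent suggests that min() is being evaluated once per element, for example because $minAndMaxCont is bound in a for clause, so each call only sees a sequence of length one. Assuming the variable is instead bound with let to the whole sequence, something like the following should return the two continents. The inline data is a shortened copy of the sample above.

```xquery
(: Assumption: $minAndMaxCont holds the whole sequence of Continent elements,
   bound with "let" rather than iterated over with "for". :)
let $minAndMaxCont := (
  <Continent name="asia" increase="3011823527"/>,
  <Continent name="africa" increase="2361110146"/>,
  <Continent name="america" increase="555307323"/>,
  <Continent name="australia" increase="63849292"/>,
  <Continent name="europe" increase="60021291"/>
)
(: min() and max() cast the untyped attribute values to xs:double :)
let $incMin := min($minAndMaxCont/@increase)
let $incMax := max($minAndMaxCont/@increase)
return (
  $minAndMaxCont[@increase = $incMin],  (: continent with the smallest increase :)
  $minAndMaxCont[@increase = $incMax]   (: continent with the largest increase :)
)
```

With this data the query should return the europe element followed by the asia element. Selecting the whole element rather than just the number keeps the name and the other attributes available for the final output.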
I have defined $today = current system date.
Objective: find the "date" element closest to today and get the corresponding "price_per_share" value from the XML below. FYI, it does not matter whether the date is in the past or in the future when calculating the "closest date".
How do I do this?
I was thinking of
Calculating the difference between today and each "date" value
Then adding "difference" as the child of "details" for each "details" node
Then sort by the smallest "duration"
Then extract the first occurrence of "price_per_share" value.
But somehow adding a child does not seem to work for me either.
<test_record>
<details>
<date>2013-01-16</date>
<currency>USD</currency>
<shares>48</shares>
<price_per_share>20</price_per_share>
</details>
<details>
<date>2018-05-28</date>
<currency>USD</currency>
<shares>49</shares>
<price_per_share>30</price_per_share>
</details>
<details>
<date>2018-10-25</date>
<currency>USD</currency>
<shares>50</shares>
<price_per_share>40</price_per_share>
</details>
<details>
<date>2018-05-02</date>
<currency>USD</currency>
<shares>51</shares>
<price_per_share>60</price_per_share>
</details>
</test_record>
I can't use "map" as I am restricted to XQuery 1.0.
The steps you describe sound good, except for the second one. There is no need to append anything as you don't actually want to modify the nodes. In fact, you can't do this with XQuery alone (XQuery Update is another specification which is intended for this). Also, there is no need to use a map here. Your logic sounds like you know procedural programming, but the concept of a functional programming language (XQuery being one) is quite different, so you might want to familiarize yourself with it.
Regarding your question: first calculate the difference between each date and today's date. Since it should not matter whether the date is in the past or the future, we need the absolute value of that duration. abs() works on numbers, not on durations, so we first divide the duration by a one-second duration, which yields the number of seconds as a plain number. Its absolute value is stored in $abs-diff. We then order by this value, take the first element of the resulting sequence and return its price_per_share:
let $today := fn:current-date()
return (
for $detail in //details
let $diff := xs:date($detail/date) - $today
let $abs-diff := abs($diff div xs:dayTimeDuration('PT1S'))
order by $abs-diff
return $detail
)[1]/price_per_share
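For a reproducible check, here is a minimal self-contained variant of the same idea. It is only a sketch under assumptions: $today is fixed to 2018-06-01 instead of fn:current-date(), the records are inlined in a $records variable (a name made up for this example), and the closest entry is picked by comparing against the minimum distance instead of sorting.

```xquery
(: Assumption: a fixed "today" so the result does not depend on when this runs :)
let $today := xs:date('2018-06-01')
(: Assumption: shortened inline copy of the sample data :)
let $records :=
  <test_record>
    <details><date>2013-01-16</date><price_per_share>20</price_per_share></details>
    <details><date>2018-05-28</date><price_per_share>30</price_per_share></details>
    <details><date>2018-10-25</date><price_per_share>40</price_per_share></details>
    <details><date>2018-05-02</date><price_per_share>60</price_per_share></details>
  </test_record>
(: absolute distance to $today in seconds, one number per details element :)
let $diffs :=
  for $d in $records/details
  return abs((xs:date($d/date) - $today) div xs:dayTimeDuration('PT1S'))
let $min-diff := min($diffs)
(: keep the details whose distance equals the minimum; [1] breaks ties :)
let $closest :=
  $records/details[
    abs((xs:date(date) - $today) div xs:dayTimeDuration('PT1S')) = $min-diff
  ][1]
return $closest/price_per_share
```

With the fixed date above, 2018-05-28 is four days away and is the closest entry, so the query should return the price_per_share element containing 30. It uses nothing beyond XQuery 1.0.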
Imagine if I have an xml document stored in Marklogic in the following format:
<document>
<id>DocumentID</id>
<questions>
<question_item>
<question>question1</question>
<answer>answer1</answer>
</question_item>
<question_item>
<important>high</important>
<question>question2</question>
<answer>answer2</answer>
</question_item>
</questions>
</document>
Basically, each document has a number of questions, and only some of them have an "important" element. I want to return all of the "important" questions in a flat format, with metadata taken from the document each one is pulled from (e.g., id).
The following xquery seems to work, and is reasonably fast:
for $x in cts:search(/document,
cts:element-query(xs:QName("important"),cts:and-query((
))
), "unfiltered" , 0.0)
return for $y in $x/questions/question_item
return
if ($y/important) then
fn:concat($x/id,'|',
$y/question,'|',
$y/answer,
$y/important
)
else ()
However, I usually find that for loops are not the fastest way to work in XQuery, and the solution does seem relatively cumbersome. Is there a better way to return just the "important" nodes initially, but still have access to the main document elements?
Personally, I find conditional logic more cumbersome than for loops, but I think you can remove one of each for a simpler query. Instead of looping over the first sequence of documents, you can simply assign them to a variable, which will allow you to reference them. Then in your loop, use a predicate to constrain question_item to those with important elements, eliminating the need for the conditional:
let $documents := cts:search(/document,
cts:element-query(xs:QName("important"), cts:and-query(())
), "unfiltered" , 0.0)
for $y in $documents/questions/question_item[important]
return fn:concat($y/ancestor::document/id,'|',
$y/question,'|',
$y/answer,
$y/important)
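The predicate part can be tried without a MarkLogic server. The following is a minimal sketch in plain XQuery, assuming an inline document in place of the cts:search result and using string-join with a '|' separator throughout (a choice made for this example, not taken from the question). The document id is reached from each question_item via the ancestor axis.

```xquery
(: Assumption: an inline document stands in for the cts:search result :)
let $documents :=
  <document>
    <id>DocumentID</id>
    <questions>
      <question_item>
        <question>question1</question>
        <answer>answer1</answer>
      </question_item>
      <question_item>
        <important>high</important>
        <question>question2</question>
        <answer>answer2</answer>
      </question_item>
    </questions>
  </document>
(: the predicate keeps only question items that have an important child :)
for $item in $documents/questions/question_item[important]
return string-join(
  ($item/ancestor::document/id,   (: metadata from the enclosing document :)
   $item/question,
   $item/answer,
   $item/important),
  '|')
```

For the sample document this should yield the single string DocumentID|question2|answer2|high.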
As in the code sample, the optimal approach is to match documents first based on the indexes and then to extract values from the matched documents. A FLWOR expression with non-redundant XPaths is an efficient way to extract values from a document.
One possible improvement would be to take a more fine-grained approach in modelling the documents: that is, to put each question item in a separate document. That way, the search will retrieve only the question items that are important.
That change would become important if the documents are large. For maximum performance, you could then put range indexes on the question, answer, and important elements and get one tuple for each question item directly from the indexes.
If the specific list of question items is usually retrieved and updated together, however, that would argue against splitting out each question as a separate document.
Hoping that helps,
I want to know if there is a way to check for the last element in a fusion for_each loop (in order to apply special code for this case).
Edit: Maybe a better question would be:
I have played with fusion::for_each. Now I want to apply code to each element of a fusion sequence, with special code for the last element (special code does not mean "extra code" but different code). Maybe I should use iterators (an example, please)?
Some ideas:
1) use boost::fusion::fold, count your way through, and on the last one, perform your edit
2) if all types in the tuple are heterogeneous, match on type to determine the last one
3) include some sort of marker for the last one on which you can match
4) use the 'prior(end(v))' operators to manipulate the last element when for_each processing is complete
I have a sorted list with 3 columns, and I'm searching to see if the second column matches 2 or 4, then returning the first column's element if so, and putting that into a function.
noOutliers((L1LeanList[order(L1LeanList[,1]),])[(L1LeanList[order(L1LeanList[,1]),2]==2)|
(L1LeanList[order(L1LeanList[,1]),2]==4),1])
When nothing matches the condition, I get:
Error in ((L1LeanList[order(L1LeanList[, 1]), ])[1, ])[(L1LeanList[order(L1LeanList[, :
incorrect number of dimensions
This is due to the fact that we effectively have List[List[all FALSE]].
I can't just substitute something like L1LLSorted <- L1LeanList[order(L1LeanList[,1]),]
and use L1LLSorted[,2], since this returns an error when the list is of length exactly 1,
so now my code would need to look like
noOutliers(ifelse(any((L1LeanList[order(L1LeanList[,1]),2]==2)|
(L1LeanList[order(L1LeanList[,1]),2]==4)),0,
(L1LeanList[order(L1LeanList[,1]),])[(L1LeanList[order(L1LeanList[,1]),2]==2)|
(L1LeanList[order(L1LeanList[,1]),2]==4),1])))
which seems a bit ridiculous for the simple thing I'm requesting.
While writing this, I realized that I can put all this error checking into the noOutliers function itself, so the call looks like
noOutliers(L1LeanList,2,2,4), which will look much better. That is a necessity, since slightly varying versions of this appear in my code dozens of times. I can't help but wonder, still, if there's a more elegant way to write the actual function.
For the curious, noOutliers finds the mean of the 30th-70th percentile in the sorted data set, like so:
noOutliers<-function(oList)
{
if (length(oList)<=20) return ("insufficient data")
cumSum<-0
iterCount<-0
for(i in round(length(oList)*3/10-.000001):round(length(oList)*7/10+.000001)+1)#adjustments deal with .5->even number rounding r mishandling
{ #and 1-based indexing (ex. for a list 1-10, taking 3-7 cuts off 1,2,8,9,10, imbalanced.)
cumSum<-cumSum+oList[i]
iterCount<-iterCount+1
}
return(cumSum/iterCount)
}
Let's see...
foo <- bar[(bar[,2]==2 | bar[,2]==4),1]
should extract all the first-column values you want. Then run whatever function you want on foo, perhaps with a guard such as "if (length(foo) < 1) {exit, or skip, or something}".
Given the following query
let $a := xs:dateTime("2012-01-01T00:00:00.000+00:00")
let $b := xs:dateTime($a)
let $c := xs:dateTime($a cast as xs:string)
(: cannot - don't know how to - execute the function without assignment :)
let $d := adjust-dateTime-to-timezone($a, xs:dayTimeDuration("PT1H"))
return (<a>{$a}</a>,<b>{$b}</b>,<c>{$c}</c>)
the output is as follows
<a>2012-01-01T01:00:00+01:00</a>
<b>2012-01-01T01:00:00+01:00</b>
<c>2012-01-01T00:00:00Z</c>
Based on XQuery's documentation on constructor functions (the constructor function for a given type is used to convert instances of other atomic types into the given type) this is the expected behaviour. Calling xs:dateTime($a) simply returns $a as there is no need to cast, but xs:dateTime($a cast as xs:string) creates a new xs:string from $a first. However this requires an extra conversion.
Is there any other way to tackle this problem? Or are conversions cheap, so that I shouldn't care?
(If it makes any difference my XQuery processor is BaseX 7.2.)
It seems it does make a difference that I'm using BaseX. I really thought that this was the way the xs:dateTime constructor function and the adjust-dateTime-to-timezone function were supposed to work, which is why I misinterpreted the XQuery documentation.
Given the input I've been given by Dimitre and Ranon it seems the problem described is gone.
By the way, my use case is (or rather was) that I wanted to run a date-time interval query against the date-time element of my XML data set. Because the input parameters and the source date-time values used different time zones, I had to make time-zone corrections with the above function, which modified its input parameter (the original source date-time in my case), while I wanted to preserve the original value. Given the function's name, adjust-dateTime, I assumed it was fine for it to modify its argument, so I automatically thought I had to copy the original value with a constructor function to be able to keep it.
Looks like you ran into some really weird bug.
Your line 5 shouldn't change $a, $b or $c at all, as XQuery is a functional programming language with immutable variables and without side effects (adjust-dateTime-to-timezone should not change your variables). That's why you were forced to assign $d; otherwise the calculated result would simply have been thrown away.
I just submitted a bug report. Zorba handles your query correctly, so you can use it to understand the problem.
BaseX, as your preferred XQuery processor, will do so within a few days, too. I or another BaseX team member will notify you here as soon as it's fixed.
I guess your problem arose from a misunderstanding combined with wrong behaviour of BaseX, and it should be solved now. Feel free to ask again if anything about your query remains unclear.
The output that is reported is incorrect.
The correct output (produced running Saxon under oXygen) is:
<a>2012-01-01T00:00:00Z</a>
<b>2012-01-01T00:00:00Z</b>
<c>2012-01-01T00:00:00Z</c>
The adjust-dateTime-to-timezone() function, like any other function, cannot modify its arguments. Its effect is only contained in the variable $d, which you don't use in the return clause.
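To see the adjusted value next to the untouched original, $d can simply be included in the output. This is a minimal sketch based on the query from the question; the <d> wrapper element is an addition made for this example.

```xquery
let $a := xs:dateTime("2012-01-01T00:00:00.000+00:00")
(: returns a new value; $a itself is never modified :)
let $d := adjust-dateTime-to-timezone($a, xs:dayTimeDuration("PT1H"))
return (<a>{$a}</a>, <d>{$d}</d>)
```

On a conforming processor, <a> should still contain 2012-01-01T00:00:00Z, while <d> should contain the same instant expressed in the +01:00 time zone, i.e. 2012-01-01T01:00:00+01:00. No copy via a constructor function is needed to preserve the original value.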