Can I use indexes for speeding up select queries over dateTime ranges in BaseX?

I would like to know how to speed up select queries in a BaseX database.
For example, I have the following XML in a database with many events (approximately 650,000):
<EventList>
<Event>
<ID>317849</ID>
<Type>Measurement</Type>
<TimeStamp>2016-03-15T18:00:09.409</TimeStamp>
<Space>BIOCAT</Space>
<SourceID>BIOCAT.TE310A</SourceID>
<Content>
<Measurement>
<value>920</value>
</Measurement>
</Content>
</Event>
<Event>
<ID>317850</ID>
<Type>Measurement</Type>
<TimeStamp>2016-03-15T18:05:09.409</TimeStamp>
<Space>BIOCAT</Space>
<SourceID>BIOCAT.TE310A</SourceID>
<Content>
<Measurement>
<value>920</value>
</Measurement>
</Content>
</Event>
</EventList>
I am retrieving the events with the following code, which selects them by the dateTime value of the TimeStamp node:
for $b in doc('mydb/my.xml')//EventList/Event
let $date_string as xs:string := xs:string($b/TimeStamp/data())
let $date as xs:dateTime := xs:dateTime($date_string)
where $date ge xs:dateTime('"+startdate+"')
and $date le xs:dateTime('"+enddate+"')
and $b/Type='"+EventType+"'
return $b
But it is very slow: it takes one minute to return 60 events.
There is a lot of data in the BaseX database.
How can I speed up the query or my database?

BaseX currently does not have a range index for xs:dateTime, but you can use the text index for getting all events with a given event Type by moving the comparison into the XPath:
for $b in //EventList/Event[Type = 'Measurement']
let $date as xs:dateTime := xs:dateTime($b/TimeStamp)
where $date ge xs:dateTime('2016-03-15T18:00:00.000')
and $date le xs:dateTime('2016-03-15T19:00:00.000')
return $b
In the Info View of the GUI you can see that the text index is applied:
Compiling:
rewriting descendant-or-self step(s)
applying text index for "Measurement"
pre-evaluating "2016-03-15T18:00:00.000" cast as xs:dateTime
pre-evaluating "2016-03-15T19:00:00.000" cast as xs:dateTime
Optimized Query:
for $b_0 in db:text("mydb/my.xml", "Measurement")
/parent::*:Type/parent::*:Event[parent::*:EventList]
let $date_1 as xs:dateTime := $b_0/TimeStamp cast as xs:dateTime?
where (($date_1 ge "2016-03-15T18:00:00")
and ($date_1 le "2016-03-15T19:00:00"))
return $b_0
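The question builds the query by concatenating strings in the client code. As a minimal sketch, the same query can be written with external variables instead. The variable names $start, $end and $type are hypothetical, and the calling client is assumed to bind them before execution. Whether the text index is still applied for a bound variable can be checked in the Info View.

```xquery
(: Hypothetical external variables; the client is assumed to bind them. :)
declare variable $start as xs:dateTime external;
declare variable $end   as xs:dateTime external;
declare variable $type  as xs:string   external;

(: Same structure as the answer: the Type comparison stays in the path,
   the dateTime range is checked in the where clause. :)
for $b in doc('mydb/my.xml')//EventList/Event[Type = $type]
let $date := xs:dateTime($b/TimeStamp)
where $date ge $start and $date le $end
return $b
```

This keeps user-supplied values out of the query text, so quotes in the event type cannot break or alter the query.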

Related

How to return the total number of distinct users by dateCreated in the last 30 days

There are lots of audit documents in my database, like the one below:
<Record>
<objectType>Audit</objectType>
<dateCreated>2017-04-07T03:51:56.231-04:00</dateCreated>
<createdBy>first user</createdBy>
</Record>
How can I get the total number of users (createdBy) who created an audit file in the last 30 days? Some audit files have the same createdBy value, so we need a count of distinct values.
I tried the query below:
let $query := cts:values(cts:element-reference(fn:QName($NS, "createdBy")))
return fn:count($query)
But how can I use a condition on dateCreated (within the last 30 days), or a cts:range-query, inside cts:values?
Is there any other way to achieve this?
(I have set up element range index for createdBy and dateCreated)
You can pass the constraining range query as the third argument to cts:count-aggregate() - something along the following lines should work:
let $index := fn:QName($NS, "dateCreated")
let $count := cts:count-aggregate(
  cts:element-reference($index),
  (),
  cts:element-range-query($index, ">",
    fn:current-dateTime() - xs:dayTimeDuration("P30D")
  )
)
return $count
That should give you the total number of dateCreated values for the past 30 days.
For more information, see:
http://docs.marklogic.com/cts:count-aggregate
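For the distinct createdBy count that the question asks for, the same range query can also be passed to cts:values() as its fourth argument. This is a sketch under assumptions: $NS holds the namespace URI as in the question, both range indexes exist, and each Record is stored as its own document, so the query constrains the fragments that values are taken from.

```xquery
(: $NS is assumed to be bound to the namespace URI, as in the question. :)
let $since := fn:current-dateTime() - xs:dayTimeDuration("P30D")
let $users := cts:values(
  cts:element-reference(fn:QName($NS, "createdBy")),
  (),   (: no start value :)
  (),   (: no options :)
  cts:element-range-query(fn:QName($NS, "dateCreated"), ">", $since)
)
(: cts:values returns each distinct value once, so the count is the number of distinct users. :)
return fn:count($users)
```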

XQuery: inserting split node data into another node

<?xml version="1.0" encoding="UTF-8"?>
<Data>
<A><DelInfo>123-20150308-345</DelInfo><OrderNo>11</OrderNo></A>
<A><DelInfo>1204-20150308-355</DelInfo><OrderNo>15</OrderNo></A>
<A><DelInfo>153-20150408-343</DelInfo><OrderNo>10</OrderNo></A>
<A><DelInfo>44345-20150308-341</DelInfo><OrderNo>21</OrderNo></A>
<A><DelInfo>153-20150204-245</DelInfo><OrderNo>1</OrderNo></A>
<A><DelInfo>423-20150311-445</DelInfo><OrderNo>13</OrderNo></A>
..........
</Data>
I receive the XML above. The DelInfo node contains a combination of
EmpId, delivery date and receipt number. The OrderNo node contains the
order number for the delivery information.
The XML is stored in BaseX and I need the following report to be generated from the
above XML:
<A><DelInfo>123-20150308-345</DelInfo><OrderNo>11</OrderNo><Report>20150308 - 11</Report></A>
.....
In other words, I want to insert an additional Report node containing the date and the order number.
Any ideas?
Replace yourdoc with your document name.
for $x in doc('yourdoc')//A
let $d := substring-before(substring-after($x/DelInfo, "-"), "-")
let $o := $x/OrderNo/text()
let $i := <Report>{concat($d, " - ", $o)}</Report>
return
insert node $i after $x/OrderNo
The inner substring-after() returns the string after the first -. The outer substring-before() then returns the part of that string before the next -. This way you get the date portion.
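If the stored document should stay unchanged and only the report is needed as a query result, a non-updating variant is possible. The following is a sketch, not part of the original answer: it uses tokenize() to pick the second dash-separated field and assumes that DelInfo always has the form EmpId-Date-ReceiptNo.

```xquery
for $x in doc('yourdoc')//A
(: Second field of "EmpId-Date-ReceiptNo" is the delivery date. :)
let $date := tokenize($x/DelInfo, '-')[2]
return
  <A>{
    $x/*,   (: copy the existing DelInfo and OrderNo children :)
    <Report>{ concat($date, ' - ', $x/OrderNo) }</Report>
  }</A>
```

Each returned A element is a new copy, so the database content is not modified.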

Invoking database using xquery giving duplicate values

I am using a XQuery to query database in an OSB project. Consider the
following table:
userId Name Category
------ ------- --------
1 Dheepan Student
2 Raju Student
and the XQuery
let $userName:=fn-bea:execute-sql(
$dataSourceJndiName,
xs:string("NAME"),
xs:string("select NAME from USER where CATEGORY= 'Student'")
)/*:NAME[1]
return <root> {data($userName)} </root>
For this query I get the result <root>Dheepan Raju</root>. But I
need to return only one row, even if the query returns more than one row, like
this: <root>Dheepan</root>. I have used the predicate [1] in the query, but I have
no clue why it concatenates the values and returns them. Can anybody tell me how
to return only the first row when more than one row is returned?
You need to use proper parentheses:
let $userName:=(fn-bea:execute-sql(
$dataSourceJndiName,
xs:string("NAME"),
xs:string("select NAME from USER where CATEGORY= 'Student'")
)/*:NAME)[1]
return <root> {data($userName)} </root>
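The reason is that a predicate applies to the step it is attached to, not to the whole path. In .../*:NAME[1], the [1] selects the first NAME child of every row, while the parenthesized form applies [1] to the complete result sequence. The sketch below shows the difference on hypothetical inline data; the row wrapper is an assumed stand-in for the actual fn-bea:execute-sql result shape.

```xquery
(: Hypothetical data standing in for the SQL result. :)
let $rows :=
  <result>
    <row><NAME>Dheepan</NAME></row>
    <row><NAME>Raju</NAME></row>
  </result>
return (
  count($rows/row/NAME[1]),    (: 2: the first NAME of every row :)
  count(($rows/row/NAME)[1])   (: 1: the first NAME overall :)
)
```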

Sum using XQuery

I'm using XQuery to perform addition. Following is the structure of XML saved in database:
<Events>
<Event>
<id>1</id>
<code>1001</code>
<Amount>50,1</Amount>
</Event>
<Event>
<id>1</id>
<code>1002</code>
<Amount>5,5</Amount>
</Event>
<Event>
<id>1</id>
<code>1001</code>
<Amount>50,1</Amount>
</Event>
<Event>
<id>1</id>
<code>1002</code>
<Amount>5,5</Amount>
</Event>
</Events>
I want to get the output below using XQuery: the sum of the amounts that have the same code. Please note that , is the decimal separator here, so I need to replace , with . and then perform the arithmetic operation.
<Total>
<1001> 100,2 </1001>
<1002> 11,0 </1002>
</Total>
If your XQuery processor supports XQuery 3.0, use the group by clause.
<Total>
{
for $i in //Event
let $code := $i/code
group by $code
return element {"code"} { attribute {"id"} {$code}, sum($i/Amount)}
}
</Total>
There are two differences from the XML snippets in your question. First, I changed the decimal separator to a point, which is required (of course, you could do this with XQuery string operations, too). Second, element names may not consist of numbers only (have a look at the element naming rules), so I decided to return the code as an id attribute instead in my example.
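The string operations mentioned above could look like the following sketch, which keeps the comma-separated input as it is and converts each Amount inside the query. It assumes XQuery 3.0 (for group by and the ! operator) and that every Amount contains exactly one comma-separated decimal number.

```xquery
<Total>{
  for $i in //Event
  let $code := $i/code
  group by $code
  (: Convert "50,1" to the decimal 50.1 before summing. :)
  let $sum := sum($i/Amount ! xs:decimal(translate(., ',', '.')))
  return element code {
    attribute id { $code },
    (: Switch the decimal point back to a comma for output. :)
    translate(string($sum), '.', ',')
  }
}</Total>
```

Using xs:decimal rather than xs:double avoids binary floating-point rounding in the totals. Whole-number sums are serialized without a fractional part, so extra formatting would be needed if a fixed number of decimals is required.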
This will get you the data as a result set.
declare @X xml
set @X =
'<Events>
<Event>
<id>1</id>
<code>1001</code>
<Amount>50,1</Amount>
</Event>
<Event>
<id>1</id>
<code>1002</code>
<Amount>5,5</Amount>
</Event>
<Event>
<id>1</id>
<code>1001</code>
<Amount>50,1</Amount>
</Event>
<Event>
<id>1</id>
<code>1002</code>
<Amount>5,5</Amount>
</Event>
</Events>'
select T.code,
       sum(Amount) as Amount
from
  (
    select T.X.value('code[1]', 'int') as code,
           cast(replace(T.X.value('Amount[1]', 'varchar(13)'), ',', '.') as float) as Amount
    from @X.nodes('Events/Event') as T(X)
  ) as T
group by T.code
The following code will calculate the totals and output the result as XML, but not in your requested output format (which is invalid XML):
SELECT Code AS 'Code', SUM(Value) AS 'Total'
FROM (
    SELECT
        CONVERT(DECIMAL(9,2), REPLACE(c.value('Amount[1]', 'VARCHAR(10)'), ',', '.')) AS Value
        , c.value('code[1]', 'INT') AS Code
    FROM @x.nodes('//Event') AS t(c)
) t
GROUP BY Code
FOR XML PATH('Total'), ROOT('Totals')
where @x is an XML variable containing your data.

Auto increment with XQuery Update?

Does XQuery Update support auto increment attributes, just like auto increment fields in SQL?
I'm using BaseX as my database.
According to an answer from Christian Grün on the BaseX mailing list, this is doable when the node one is adding is defined in the XQuery Update statement, and hence can be enhanced using an {enclosed expression} before inserting it:
You might specify the attribute counter within your XML file/database
and increment it every time when you insert an element. A simple
example:
input.xml:
<root count="0"/>
insert.xq:
let $root := doc('input.xml')/root
let $count := $root/@count
return (
  insert node <node id='{ $count }'/> into $root,
  replace value of node $count with $count + 1
)
I've not been able to achieve the same with an external org.w3c.dom.Document created in Java, and added to the XML database using XQJ and declare variable $doc external. Here, one might be tempted to update the "auto-increment" data after adding the document. However, the processing model defines that changes are not visible until all the commands have been queued (the Pending Update List). Hence a new document or node is, by definition, simply not visible for updates in the same FLWOR expression. So:
db:add('db1', '<counters comment-id="0"/>', 'counters')
...followed by repetitive executions of the following, will NOT work:
let $doc := document{ <note id=""><text>Hello world</text></note> }
let $count := /counters/@comment-id
return (
  db:add('db1', $doc, 'dummy'),
  replace value of node $count with $count + 1,
  for $note in /note[@id='']
  return replace value of node $note/@id with $count (: WRONG! :)
)
Above, the last inserted document will always have <note id="">, and will not be updated until the next document is added. (Also, it would not work when somehow multiple documents with <note id=""> would exist.)
Though in the example above one could successfully delete the for $note in ... part and use:
let $doc := document{ <note id="{ $count }"><text>Hello world</text></note> }
...I had no luck setting <note id="{ $count }"> in the Document in the Java code, as that enclosed expression would not be replaced then.
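For reference, the working variant described above can be put together as follows. This is a sketch assembled from the statements in this answer; it assumes the counters document has already been added to db1 and that db1 is the context database when the query runs.

```xquery
let $count := /counters/@comment-id
(: The id is filled in while the note is constructed, before it is added. :)
let $doc := document { <note id="{ $count }"><text>Hello world</text></note> }
return (
  db:add('db1', $doc, 'dummy'),
  (: Both updates go onto the same pending update list and are applied together. :)
  replace value of node $count with $count + 1
)
```

Because the counter is read before any pending update is applied, the new note receives the current value and the stored counter is incremented for the next run.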
Finally, some people state about similar solutions:
[...] perform badly as it will lock out concurrent updates. You should consider using xdmp:random() to generate a 64 bit random number for your unique identifier.
In our case, the id would also be used in URLs; not too nice then.
See also XRX/Autoincrement File ID and Using XQuery to return Document IDs.
This would depend on the implementation of the underlying data-store, because the auto-increment attribute is on the column definition in relational databases.
Probably, "yes".
