MarkLogic: what are the field range query and path range query cts functions in XQuery?

I have been following the documentation to understand cts:field-range-query and cts:path-range-query. These are the links I used:
https://docs.marklogic.com/cts:field-range-query
https://docs.marklogic.com/cts:path-range-query
In cts:path-range-query, I didn't understand the output. How do you compare a string with < or >?
cts:search(doc(),cts:path-range-query("/name/fname",">","Jim"),"filtered")
=>
<?xml version="1.0" encoding="UTF-8"?>
<name><fname>John</fname><mname>Rob</mname><lname>Goldings</lname></name>
<?xml version="1.0" encoding="UTF-8"?>
<name><fname>Ooi</fname><mname>Ben</mname><lname>Fu</lname></name>
In cts:field-range-query as well, I didn't understand the output.
cts:search(doc(),cts:field-range-query("aname",">","Jim Kurla"));
(:
returns the following:
<?xml version="1.0" encoding="UTF-8"?>
<name>
<fname>John</fname>
<mname>Rob</mname>
<lname>Goldings</lname>
</name>
<?xml version="1.0" encoding="UTF-8"?>
<name>
<fname>Ooi</fname>
<mname>Ben</mname>
<lname>Fu</lname>
</name>
:)
Sorry if this is silly, but I have been trying to understand this little thing for several days and somehow I don't get it. I'd really appreciate the help.

String comparison is based on alphanumeric comparison. It actually depends on the collation, but the default is based on Unicode (UCA Root Collation with case and diacritic sensitivity). A comes before B, but a comes after B, and also alpha comes after Zeta. More confusingly, 10 comes before 2 as well.
In your examples, the path query only looks at fname, where Jim comes before both John and Ooi.
The second example is likely a field with multiple paths, including fname, mname, and lname. The > comparison is satisfied if any name value in the document sorts after Jim. Goldings, Ben, and Fu come before Jim alphabetically, but there are other names, like John and Ooi, that come after. So that query returns both documents as well.
It is more fun to repeat the queries with Lee. The path query will then return 1 result only (the second), but the field is likely still returning both.
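The difference between the two queries can be sketched outside MarkLogic. The following is only an illustration in Python, not MarkLogic code: Python's built-in string comparison works by code point rather than by a UCA collation, so it is an assumption that it stands in for the server's collation here, although it does reproduce the orderings mentioned above. The names docs, path_matches, and field_matches are hypothetical, and the field is assumed to cover fname, mname, and lname.

```python
# Toy stand-in for the two documents; this is not MarkLogic code.
docs = [
    {"fname": "John", "mname": "Rob", "lname": "Goldings"},
    {"fname": "Ooi", "mname": "Ben", "lname": "Fu"},
]


def path_matches(doc, bound):
    """Mimics the path range query on /name/fname: only fname is compared."""
    return doc["fname"] > bound


def field_matches(doc, bound):
    """Mimics a field over fname, mname and lname: any one value may match."""
    return any(value > bound for value in doc.values())


def matching_fnames(predicate, bound):
    """Return the fname of every document accepted by the predicate."""
    return [doc["fname"] for doc in docs if predicate(doc, bound)]


# String ordering as described in the answer (code point comparison here).
ordering_checks = {
    "A < B": "A" < "B",
    "a > B": "a" > "B",
    "alpha > Zeta": "alpha" > "Zeta",
    "10 < 2": "10" < "2",
}

path_jim = matching_fnames(path_matches, "Jim")
field_jim = matching_fnames(field_matches, "Jim Kurla")
path_lee = matching_fnames(path_matches, "Lee")
field_lee = matching_fnames(field_matches, "Lee")

print(ordering_checks)
print(path_jim, field_jim, path_lee, field_lee)
```

With Lee as the bound, the first document no longer matches on fname, but it still matches the field comparison because Rob sorts after Lee.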

Related

BI Publisher conditional field masking

I have the following code on a field in a Peoplesoft BI Publisher RTF template where it is masking the last 4 digits of the Bank Account number.
<?xdofx:lpad('',length(Bank_Account__)-4,'*')?>
<?xdoxslt:rtrim(xdoxslt:right(Bank_Account__,4))?>
The problem is that sometimes the total Bank Account number length is less than 4 digits, and when this happens the lpad function raises a negative array error.
Can I wrap some kind of conditional IF statement around this so that it checks the length of the bank account number and, if it is longer than 5 digits, masks the last 4 digits, or else (for Bank Account numbers shorter than 5 digits) masks just the last 2 digits? What would this look like?
Thanks in advance!
EDIT:
I should add that the existing code above is already wrapped in the following IF statement:
<?if@inlines:Bank_Account__!=''?>
So the entire statement is:
<?if@inlines:Bank_Account__!=''?>
<?xdofx:lpad('',length(Bank_Account__)-4,'*')?>
<?xdoxslt:rtrim(xdoxslt:right(Bank_Account__,4))?>
<?end if?>
I would just like to add in the conditional logic to check the bank account length and subsequently perform either of the above masking.
EDIT 2:
Here is my setup with your suggested changes, but I don't think I have the logic nested right, and the syntax may also be an issue.
Edit 3:
Here is the modified code, and the resulting error message:
The if statements can be nested, but since BIP does not have an else clause, the second if condition has to check for the negative case.
Maybe this might work:
<?if@inlines:Bank_Account__!=''?>
<?if@inlines:string-length(Bank_Account__)>4?>
<?xdofx:lpad('',length(Bank_Account__)-4,'*')?><?xdoxslt:rtrim(xdoxslt:right(Bank_Account__,4))?>
<?end if?>
<?if@inlines:string-length(Bank_Account__)<=4?>
<?xdofx:lpad('','2','*')?><?xdoxslt:rtrim(xdoxslt:right(Bank_Account__,string-length(Bank_Account__)-2))?>
<?end if?>
<?end if?>
Update: Here is a screenshot of what I got:
Here is the xml snippet I used.
<?xml version="1.0"?>
<root>
<record>
<Bank_Account__>123456</Bank_Account__>
</record>
<record>
<Bank_Account__>12345</Bank_Account__>
</record>
<record>
<Bank_Account__>1234</Bank_Account__>
</record>
<record>
<Bank_Account__>123</Bank_Account__>
</record>
<record>
<Bank_Account__>12</Bank_Account__>
</record>
</root>
Download working files from here
There are some more functions available for other ways to implement this requirement.
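The branching in the suggested template can be checked outside BI Publisher. The following is a minimal Python sketch of the same two-branch logic, not BI Publisher code: mask_account is a hypothetical helper name, and the handling of values of 2 characters or fewer (showing only the two asterisks) is an assumption, since the template above does not say what happens there.

```python
def mask_account(account: str) -> str:
    """Mirror the nested-if template: one branch per length range."""
    if account == "":
        # Corresponds to the outer Bank_Account__ != '' check.
        return ""
    length = len(account)
    if length > 4:
        # Pad with asterisks, then show the last 4 characters.
        return "*" * (length - 4) + account[-4:]
    # Negative case (length <= 4): two asterisks, then the trailing
    # length - 2 characters. Clamping at zero is an assumption.
    visible = max(length - 2, 0)
    return "**" + (account[-visible:] if visible else "")


# The sample values from the XML snippet above.
samples = ["123456", "12345", "1234", "123", "12"]
masked = [mask_account(value) for value in samples]
print(masked)
```

Because there is no else clause in the template, each branch is written as its own condition; the Python version uses an early return for the same effect.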

How to dynamically fetch value using cts:search in Marklogic?

My database has "n" documents, and I need to search for documents dynamically using the element name and value I provide. I explain it below.
Sample documents in my database-
document1-
<root>
<id1>12345</id1>
<value>Country</value>
<node1>somevalue</node1>
<node2>somevalue</node2>
<node3>somevalue</node3>
<node4>somevalue</node4>
.......................
</root>
document2-
<root>
<id2>34567</id2>
<value>Fruits</value>
<node1>somevalue</node1>
<node2>somevalue</node2>
<node3>somevalue</node3>
<node4>somevalue</node4>
.......................
</root>
I need to pass the input parameters to a REST endpoint to perform my operation. The input XML document for the REST call looks like this:
INPUT XML-
<root>
<id>id1</id>
<idvalue>12345</idvalue>
.......................
</root>
The output I need is shown in this example:
Example: search for all the documents in the database that have Id=id1 and its value=12345
Any suggestions?
You can explore Query By Example (QBE) of MarkLogic. For more details go to URL https://docs.marklogic.com/guide/search-dev/qbe
XPath can extract the input values for constructing a cts.elementValueQuery().
Something similar to the following should work in SJS:
cts.search(cts.elementValueQuery(
xs.QName(fn.string(input.xpath('/root/id'))),
fn.string(input.xpath('/root/idvalue'))
))
Or similar to the following in XQuery:
cts:search(fn:collection(), cts:element-value-query(
xs:QName(fn:string($input/root/id)),
fn:string($input/root/idvalue)
))
For more information, see http://docs.marklogic.com/cts.elementValueQuery
Hoping that helps,
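The idea of reading the element name and the value from the request and building the lookup from them can be sketched in memory. The following Python illustration only mimics the behaviour on the sample documents; it does not call MarkLogic, and find_matches, documents, and request are hypothetical names introduced here.

```python
import xml.etree.ElementTree as ET

# In-memory stand-ins for the two sample documents and the request.
documents = [
    "<root><id1>12345</id1><value>Country</value></root>",
    "<root><id2>34567</id2><value>Fruits</value></root>",
]
request = "<root><id>id1</id><idvalue>12345</idvalue></root>"


def find_matches(request_xml, docs):
    """Return the documents holding the requested element with the requested value."""
    req = ET.fromstring(request_xml)
    element_name = req.findtext("id")      # which element to look at
    wanted_value = req.findtext("idvalue")  # which value it must hold
    matches = []
    for text in docs:
        root = ET.fromstring(text)
        # iter() walks every element with that tag name in the document.
        if any(el.text == wanted_value for el in root.iter(element_name)):
            matches.append(text)
    return matches


found = find_matches(request, documents)
print(found)
```

On the server the same two extracted strings would feed the element value query shown above, so the index does the filtering instead of a loop.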

R parsing plist XML

Sorry, edited with one more little nuance! I had simplified my raw file a little too much in the example I provided, so while your solution works beautifully as-is, what if there are a few extra things thrown into the second line? Those seem to throw off the xml_find_all(page, "//event"), since now it can't find that node. How can I get the script to ignore the extras (or maybe what is the right search term to incorporate them?) Thanks!!!
I'm new to working with xml, and I have some speech xml files that I'm trying to flatten into dataframes in R, but I can't get them to be read using some of the standard functions in the XML package. I think the problem is the plist format, because some of the other answers that I've tried to apply don't work on these files.
My files look as follows (*****second line edited):
<?xml version="1.0" encoding="us-ascii"?>
<event id="111" extraInfo="CivilwarSpeeches" xmlns = "someurl">
<meta>
<title>Gettysburg</title>
<date>1863-11-19</date>
<organizations>
<org>Union</org>
</organizations>
<people>
<person id="0" type="President">Honest Abe</person>
</people>
</meta>
<body>
<section name="Address">
<speaker id="0">
<plist>
<p>Four score and seven years ago</p>
</plist>
</speaker>
</section>
</body>
</event>
And I would like to end up with a dataframe that links some of the info in the two sections, something like
Section|Speaker|Speaker Type| Speaker Name|Body
Address|0 |President | Honest Abe |Four score and seven years ago
I found this answer fairly helpful, but it still can't seem to unpack my data. Parsing XML file with known structure and repeating elements
Any help would be appreciated!
I prefer to use the xml2 package over the XML package.
This is a pretty straightforward problem. Read the data in, parse out the desired attributes and nodes, and assemble them into a data frame.
library(xml2)
page<-read_xml('<?xml version="1.0" encoding="us-ascii"?>
<event id="111">
<meta>
<title>Gettysburg</title>
<date>1863-11-19</date>
<organizations>
<org>Union</org>
</organizations>
<people>
<person id="0" type="President">Honest Abe</person>
</people>
</meta>
<body>
<section name="Address">
<speaker id="0">
<plist>
<p>Four score and seven years ago</p>
</plist> </speaker> </section> </body> </event>')
#get the nodes
nodes<-xml_find_all(page, "//event")
#parse the requested information out of each node
Section<- xml_attr(xml_find_first(nodes, ".//section"), "name")
Speaker<- xml_attr(xml_find_first(nodes, ".//person"), "id")
SpeakerType<- xml_attr(xml_find_first(nodes, ".//person"), "type")
SpeakerName<- xml_text(xml_find_first(nodes, ".//person"))
Body<- xml_text(xml_find_first(nodes, ".//plist/p"))
#put together into a data.frame
answer<-data.frame(Section, Speaker, SpeakerType, SpeakerName, Body)
The code is set up to parse a series of "event" nodes. For clarity, I am using 5 steps to parse each requested information field separately and then combine them into the final data frame.
Part of the justification for this is to maintain alignment in case some "event" nodes are missing some of the requested information. This could be simplified, but if your dataset is small, there shouldn't be much of a performance impact.
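The edited file in the question adds a default namespace (xmlns) to the event element, which is why an unqualified search such as //event stops finding anything: every element is now in that namespace. The following Python sketch shows the general idea of namespace-qualified lookups with the standard library; it is an illustration of the concept rather than the R solution, the prefix e is an arbitrary choice, and the XML declaration is left out of the sample string for brevity.

```python
import xml.etree.ElementTree as ET

xml_text = """<event id="111" extraInfo="CivilwarSpeeches" xmlns="someurl">
  <meta>
    <people>
      <person id="0" type="President">Honest Abe</person>
    </people>
  </meta>
  <body>
    <section name="Address">
      <speaker id="0">
        <plist>
          <p>Four score and seven years ago</p>
        </plist>
      </speaker>
    </section>
  </body>
</event>"""

root = ET.fromstring(xml_text)

# Without a namespace the lookup finds nothing, as in the question.
unqualified = root.find(".//section")

# Bind an arbitrary prefix to the default namespace and qualify every step.
ns = {"e": "someurl"}
section = root.find(".//e:section", ns)
speaker = section.find("e:speaker", ns)
person = root.find(f".//e:person[@id='{speaker.get('id')}']", ns)

row = {
    "Section": section.get("name"),
    "Speaker": speaker.get("id"),
    "SpeakerType": person.get("type"),
    "SpeakerName": person.text,
    "Body": speaker.findtext("e:plist/e:p", namespaces=ns),
}
print(row)
```

The same principle applies in any XPath-based tool: either qualify each step with a prefix bound to the namespace URI, or strip the namespace before querying.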

What is cstyle in XSLT?

My XSLT is shown below.
aic is a namespace.
What is cstyle?
Is it a built-in XSLT element/function?
Or an element within the expected input xml?
<xsl:stylesheet exclude-result-prefixes="aic"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:aic="http://ns.adobe.com/AdobeInCopy/2.0/" >
<xsl:template match="/">
</xsl:template>
<xsl:template match="aic:cstyle[contains(@name,'bold')]">
</xsl:template>
</xsl:stylesheet>
It is an element within the expected input XML. The XPaths in an XSLT's match attributes are generally applied to contents from the input XML.
Exactly as in my answer to your previous question, aic:cstyle is a selector that matches elements whose local name is cstyle and whose namespace URI is http://ns.adobe.com/AdobeInCopy/2.0/ (the URI bound to the aic prefix in the xsl:stylesheet element). Thus
<xsl:template match="aic:cstyle[contains(@name,'bold')]">
is a template that will apply to any {http://ns.adobe.com/AdobeInCopy/2.0/}cstyle element that has a name attribute that contains the substring bold. (So, to answer your question directly: the expression in question will match elements in the input streams for which the stylesheet was written.)
As with any new programming language, I would strongly recommend that you find a decent tutorial and work through that to get comfortable with the syntax and idioms of the language through simple examples before you start trying to decode a large and complex XSLT that you've inherited from elsewhere.
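To see which elements such a match pattern selects, here is a small Python sketch that applies the same test (namespace URI, local name cstyle, name attribute containing bold) to a made-up input. The sample XML is hypothetical and is not taken from any real InCopy file; only the namespace URI comes from the stylesheet above.

```python
import xml.etree.ElementTree as ET

AIC = "http://ns.adobe.com/AdobeInCopy/2.0/"

# Hypothetical input: two namespaced cstyle elements and one without a namespace.
sample = f"""<story xmlns:aic="{AIC}">
  <aic:cstyle name="bold italic">A</aic:cstyle>
  <aic:cstyle name="plain">B</aic:cstyle>
  <cstyle name="bold">C</cstyle>
</story>"""

root = ET.fromstring(sample)

# ElementTree spells a qualified name as {namespace-uri}local-name.
matched = [
    el.text
    for el in root.iter(f"{{{AIC}}}cstyle")
    if "bold" in el.get("name", "")
]
print(matched)
```

Only the first element is selected: the second has the right name but no bold in its name attribute, and the third has the attribute but is not in the aic namespace.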

Do not include repeated data in facets with MarkLogic

I'm doing a search using facets with the new search:search API, but I have the following problem:
My source:
File #1
<root>
<location>
<university>
<name>Yale</name>
<country>USA</country>
</university>
</location>
<location>
<university>
<name>MIT</name>
<country>USA</country>
</university>
</location>
<location>
<university>
<name>Santander</name>
<country>Spain</country>
</university>
</location>
</root>
File #2
<root>
<location>
<university>
<name>MIT</name>
<country>USA</country>
</university>
</location>
</root>
I need to know the number of universities in each country, but the facets return either the number of files that include a country or the number of locations across all files, which counts repeated universities more than once. For the sample data above, the two options return the following.
First Option (using frequency-order)
USA - 2 (Number of Files with at least one location with USA)
SPAIN - 1
Second Option (Using item-frequency)
USA - 3
SPAIN - 1
When the result should be this:
USA - 2 (because in the two files there are only two universities)
SPAIN - 1
How can I do this???
I think you need the item-frequency option, instead of the default fragment-frequency option. You add it to a constraint as a so-called facet-option. More details, and examples can be found on CMC: http://community.marklogic.com/pubs/5.0/apidocs/SearchAPI.html#search:search
-- edit --
I think I didn't read your question thoroughly enough. The search library focusses on search results, and the facet counts on fragments. Easiest way to improve the counts is by defining the location element as a fragment root. However, I don't think that really returns the numbers you are looking for. The country facet really only counts the country occurrences, and not the universities within countries. You can't achieve that with the search library. It isn't difficult to do it yourself though:
for $country in cts:element-values(xs:QName('country'))
let $universities := cts:element-values(xs:QName('university'), (), cts:element-value-query(xs:QName('country'), $country))
return fn:concat($country, ' - ', fn:count($universities))
Note: Untested code, but it at least shows the essential steps. It also requires that countries do not occur within the same fragments. You need to add location as a fragment root in the ML admin interface.
HTH!
Try cts:element-value-co-occurrences with name and country
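The counting that the question asks for amounts to collecting distinct (country, name) pairs and then counting names per country. The following Python sketch performs that computation on the two sample files in memory; it does not use MarkLogic or its indexes, and universities_per_country is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

FILE_1 = """<root>
  <location><university><name>Yale</name><country>USA</country></university></location>
  <location><university><name>MIT</name><country>USA</country></university></location>
  <location><university><name>Santander</name><country>Spain</country></university></location>
</root>"""

FILE_2 = """<root>
  <location><university><name>MIT</name><country>USA</country></university></location>
</root>"""


def universities_per_country(xml_docs):
    """Count distinct university names per country across all documents."""
    names_by_country = defaultdict(set)
    for text in xml_docs:
        for university in ET.fromstring(text).iter("university"):
            country = university.findtext("country")
            # A set keeps MIT only once even though it appears in both files.
            names_by_country[country].add(university.findtext("name"))
    return {country: len(names) for country, names in names_by_country.items()}


counts = universities_per_country([FILE_1, FILE_2])
print(counts)
```

A co-occurrence lookup on name and country would supply the same pairs from the indexes, leaving only the per-country count to be done in code.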
