Splitting up a particular field in XQuery

I have an input field coming as
<BID>12-ABS-65789345</BID>
I need to adjust the XQuery so that it captures only the last part of the field, i.e. the part after the second - symbol.
In the above case, I need the output of the XQuery to be:
<BID>65789345</BID>
Any help would be appreciated.
Thanks

Assuming your requirement can be interpreted as taking the content after the "last" hyphen, you can take the last item in the sequence formed by splitting the string on hyphen:
let $x := <BID>12-ABS-65789345</BID>
return
<BID>{tokenize($x,'-')[last()]}</BID>
If you always need the content after the second hyphen and you can guarantee there will always be at least two hyphens then you can take the third item after splitting the string:
let $x := <BID>12-ABS-65789345</BID>
return
<BID>{tokenize($x,'-')[3]}</BID>
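For comparison, here is a minimal Python sketch of the same two strategies. `str.split` stands in for `tokenize`, and the helper names are hypothetical. XQuery positions are 1-based, so `[3]` corresponds to Python index `2`.

```python
from typing import Optional


def after_last_hyphen(value: str) -> str:
    # Like tokenize($x, '-')[last()]: the final piece after splitting.
    return value.split("-")[-1]


def third_token(value: str) -> Optional[str]:
    # Like tokenize($x, '-')[3]. XQuery positions are 1-based, so the third
    # item is Python index 2. Returns None when there are fewer than two
    # hyphens (XQuery would yield an empty sequence instead).
    parts = value.split("-")
    return parts[2] if len(parts) >= 3 else None


print(after_last_hyphen("12-ABS-65789345"))  # 65789345
print(third_token("12-ABS-65789345"))        # 65789345
```

The two helpers only disagree when the value has more than two hyphens. Pick the one that matches the real rule for the field.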

Related

Custom sorting issue in MarkLogic?

xquery version "1.0-ml";
declare function local:sortit(){
for $i in ('a','e','f','b','d','c')
order by $i
return
element Result{
element N{1},
element File{$i}
}
};
local:sortit()
The above code is a sample; I need the data in this format. This sorting function is used in multiple places. In some places I need only the N element data, and in others only the File element data.
But as soon as I use local:sortit()//File, the sorting order is lost and the output comes back in an arbitrary order. Please let me know the best way to handle this.
All the data in the File element is calculated and comes from multiple files. After all the joins and calculations, it is formed into XML with many elements. So sorting using an index is not possible here; only the order by clause can be used.
XPath expressions are always returned in document order.
You lose the sorting when you apply an XPath to the sequence returned from that function call.
If you want to select only the File elements in sorted order, try using the simple mapping operator !, and pluck the File element from each item as the sequence is mapped:
local:sortit() ! File
Or, if you like typing, you can use a FLWOR to iterate over the sequence and return the File:
for $result in local:sortit()
return $result/File
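The difference between a path step and `!` can be illustrated with a toy Python model. This is a sketch only. `doc_order` is a hypothetical key standing in for document order, which for separately constructed nodes is implementation-dependent in XQuery. The model assumes the nodes happen to be "created" in input order, purely to make the effect visible.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Result:
    # Toy stand-in for a constructed <Result> element.
    doc_order: int  # hypothetical model of document order
    n: int
    file: str


def sortit() -> List[Result]:
    # Nodes are "constructed" in input order, then returned sorted by File,
    # loosely mirroring `for $i in (...) order by $i return element Result {...}`.
    inputs = ["a", "e", "f", "b", "d", "c"]
    built = [Result(doc_order=i, n=1, file=v) for i, v in enumerate(inputs)]
    return sorted(built, key=lambda r: r.file)


def path_step(results: List[Result]) -> List[str]:
    # Like `local:sortit()/File`: a path step removes duplicates and returns
    # nodes in document order, discarding the order of the input sequence.
    return [r.file for r in sorted(set(results), key=lambda r: r.doc_order)]


def simple_map(results: List[Result]) -> List[str]:
    # Like `local:sortit() ! File`: items are processed in sequence order.
    return [r.file for r in results]


print(simple_map(sortit()))  # ['a', 'b', 'c', 'd', 'e', 'f']
print(path_step(sortit()))   # ['a', 'e', 'f', 'b', 'd', 'c']
```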

Numbers in cts:word query in Marklogic

I have a cts:word-query which is having number as the text value.
cts:search(fn:doc(),cts:word-query("226"))
This query fetches only documents that match 226. But I also need the documents that contain 00226.
Example:
This is abc.xml
<a>
<b>00226</b>
</a>
This is abc1.xml
<a>
<b>226</b>
</a>
If I give the query as cts:search(fn:doc(),cts:word-query("226")), it will fetch only abc1.xml and if the query is cts:search(fn:doc(),cts:word-query("00226")), it will fetch only abc.xml.
But I need to get both the documents, irrespective of leading zeros.
The simplest way would be to use a wildcard character (*) and add the wildcarded option:
cts:search(fn:doc(),cts:word-query("*226", ('wildcarded')))
EDIT:
Although this matches the example documents, as Kishan points out in the comments, the wildcard also matches unwanted documents (e.g. containing "226226").
Since range indexes are not an option in this case because the data is mixed, here is an alternative hack:
cts:search(
fn:doc(),
cts:word-query(
for $lead in ('', '0', '00', '000')
return $lead || "226"))
Obviously, this depends on how many leading zeros there can be and will only work if this is known and limited.
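The trade-off between the wildcard and the explicit list of padded terms can be sketched in Python. This is only an approximation: `fnmatch` stands in for MarkLogic's wildcard matching, which really operates on indexed word tokens. The helper `leading_zero_variants` is hypothetical.

```python
from fnmatch import fnmatchcase


def leading_zero_variants(number: str, max_zeros: int = 3) -> list:
    # Mirrors `for $lead in ('', '0', '00', '000') return $lead || "226"`:
    # one exact term per allowed count of leading zeros.
    return ["0" * k + number for k in range(max_zeros + 1)]


terms = leading_zero_variants("226")
print(terms)  # ['226', '0226', '00226', '000226']

for word in ["226", "00226", "226226"]:
    wildcard_hit = fnmatchcase(word, "*226")  # rough model of the "*226" query
    exact_hit = word in terms                 # model of the OR of exact terms
    print(word, wildcard_hit, exact_hit)
# 226 True True
# 00226 True True
# 226226 True False
```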
You can add an element range index on the element <b> in the database with scalar type int or long. The following query should then return both documents:
let $query := cts:element-range-query(xs:QName("b"),"=",00226)
return cts:search(fn:doc(),$query)

RegularExpression Validator For Textbox

In my requirement, a TextBox should allow letters, numbers, special characters and special symbols, with at least one letter.
I tried the following, but it does not work:
^\d*[a-zA-Z][a-zA-Z0-9#*,$._&% -!><^#]*$
You may want to have two regular expression validators: one to validate the allowed characters, and one to validate that at least one letter has been provided. You may be able to do both in a single expression, but this way you can show the user two separate validation messages explaining why the input is wrong.
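The two-validator idea can be sketched in Python. The allowed character set below is an illustrative guess based on the question (letters, digits, space and a few symbols). The function name and messages are hypothetical.

```python
import re

# Hypothetical rules: the allowed set is a guess based on the question.
ALLOWED = re.compile(r"[A-Za-z0-9#*,$._&%!<>^ -]*")
HAS_LETTER = re.compile(r"[A-Za-z]")


def validate(text: str) -> list:
    # Each rule produces its own message, as two separate validators would.
    errors = []
    if not ALLOWED.fullmatch(text):
        errors.append("Only letters, digits, spaces and #*,$._&%!<>^- are allowed.")
    if not HAS_LETTER.search(text):
        errors.append("Enter at least one letter.")
    return errors


print(validate("abc 123"))  # []
print(validate("123"))      # ['Enter at least one letter.']
```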
Just match for special characters until you encounter a letter, then match for everything until the end of the string:
^[0-9#*,$._&% -!><^#]*[a-zA-Z][a-zA-Z0-9#*,$._&% -!><^#]*$
Use lookaheads :
/^(?=.*[a-zA-Z])[\w#*,$.&%!><^#-]*$/
Edit :
I assume the - is meant as the actual - character and not a range of space to !.
I removed the space character. You can of course add it if you want.
[ -!]
Effectively means :
[ -!] # Match a single character in the range between “ ” and “!”
Because space is U+0020 and ! is U+0021, that range contains only those two characters, so a literal - is not matched at all.
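Here is a quick Python check of the lookahead pattern and of the `[ -!]` range. The pattern is ported without the surrounding slashes. Note that Python's `\w` also matches `_` and Unicode letters.

```python
import re

# The lookahead requires at least one ASCII letter somewhere in the string;
# the character class then restricts which characters may appear at all.
PATTERN = re.compile(r"^(?=.*[a-zA-Z])[\w#*,$.&%!><^#-]*$")

for candidate in ["abc123", "12#a", "123", "a b"]:
    print(candidate, bool(PATTERN.match(candidate)))
# abc123 True
# 12#a True
# 123 False
# a b False  (space is not in the class)

# [ -!] spans U+0020 to U+0021, i.e. only space and "!".
RANGE = re.compile(r"[ -!]")
print([c for c in " !\"#a" if RANGE.fullmatch(c)])  # [' ', '!']
```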

ASPX attribute regex parsing in c#

I need to find attribute values in an ASPX file using regular expressions.
That means you don't need to worry about malformed HTML or any HTML related issues.
I need to find the value of a particular attribute (LocText). I want to get what's inside the quotes.
Any ASPX tags such as <%=, <%#, <%$ etc. inside the value don't make sense for this attribute and are therefore considered part of it.
The regex I began with looks like this:
LocText="([^"]+)"
This works great: the first group, which is the result text, gets everything except the double quotes, which are not allowed there (&quot; must be used instead).
But the ASPX file also allows single quotes, in which case a second regular expression must be applied:
LocText='([^']+)'
I could use these two regular expressions but I'm looking for a way to connect them.
LocText=("([^"]+)"|'([^']+)')
This also works, but it doesn't seem very efficient because it creates an unnecessary number of groups. I think this could somehow be done using backreferences, but I can't get it to work.
LocText=(["']{1})([^\1]+)\1
I thought that by this, I save the single/double quote to the first group and then I tell it to read anything that is NOT the char found in the first group. This is enclosed again by the quote from the first group. Obviously, I'm wrong and it's not working like that.
Is there any way to connect the first two expressions, creating a minimal number of groups with one group being the value of the attribute I want to get? Is it possible using a backreference for the single/double quote value, or have I completely misunderstood what they mean?
I'd say your solution with alternation isn't that bad, but you could use named captures so the result will always be found in the same group's value:
Regex regexObj = new Regex(@"LocText=(?:""(?<attr>[^""]+)""|'(?<attr>[^']+)')");
resultString = regexObj.Match(subjectString).Groups["attr"].Value;
Explanation:
LocText= # Match LocText=
(?: # Either match
"(?<attr>[^"]+)" # "...", capture in named group <attr>
| # or match
'(?<attr>[^']+)' # '...', also capture in named group <attr>
) # End of alternation
Another option would be to use lookahead assertions ([^\1] isn't working because you can't place backreferences inside a character class, but you can use them in lookarounds):
Regex regexObj = new Regex(@"LocText=([""'])((?:(?!\1).)*)\1");
resultString = regexObj.Match(subjectString).Groups[2].Value;
Explanation:
LocText= # Match LocText=
(["']) # Match and capture (group 1) " or '
( # Match and capture (group 2)...
(?: # Try to match...
(?!\1) # (unless it's the quote character we matched before)
. # any character
)* # repeat any number of times
) # End of capturing group 2
\1 # Match the previous quote character
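For readers working in Python, here is a sketch of the lookahead-plus-backreference approach. Python's `re` rejects duplicate group names, so the named-alternation version does not port directly. This version uses named groups with `(?P=quote)` instead. The helper name `loc_text` is hypothetical.

```python
import re

# (?P<quote>...) captures the opening quote; the lookahead (?!(?P=quote))
# stops the value before the same quote character; (?P=quote) closes it.
LOC_TEXT = re.compile(r"""LocText=(?P<quote>["'])(?P<attr>(?:(?!(?P=quote)).)*)(?P=quote)""")


def loc_text(markup: str):
    match = LOC_TEXT.search(markup)
    return match.group("attr") if match else None


print(loc_text('<asp:Label LocText="Say \'hi\'" runat="server" />'))    # Say 'hi'
print(loc_text("<asp:Label LocText='Say \"hi\"' runat=\"server\" />"))  # Say "hi"
```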

combining captures in regex

some text I want to capture. <tag> junk I don't care about</tag> more stuff I want.
Is there an easy way to write a regex that captures the first and third sentences in one capture?
You could also consider stripping out the unwanted data and then capturing.
import re
data = "some text to capture. <tag>junk</tag> other stuff to capture"
data = re.sub(r'<tag>[^<]*</tag>', '', data)
data_match = re.match(r'[\w. ]+', data)
Not to my knowledge. Usually that's why regex search-and-replace functions allow you to refer to multiple capturing groups in the first place.
Unfortunately, no, it's not possible. The solution is to capture into two separate groups and then concatenate them afterwards.
According to this older thread on this site:
Regular expression to skip character in capture group
A group capture is contiguous, so you can't. You can do it in one pass with a regex like the one below and join the lines in code:
^(?<line1>.*?)(?:\<\w*\>.*?\</\w*\>)(?<line3>.*?)$
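Here is a Python sketch of the same idea. Python spells named groups `(?P<name>...)` rather than `(?<name>...)`. The helper name `drop_tagged_part` is hypothetical. The two captures are joined in code, as the answer suggests.

```python
import re

# The tagged part in the middle is matched by a non-capturing group
# and therefore left out of both named captures.
TAGGED = re.compile(r"^(?P<line1>.*?)(?:<\w*>.*?</\w*>)(?P<line3>.*?)$")


def drop_tagged_part(text: str):
    match = TAGGED.match(text)
    if match is None:
        return None
    return match.group("line1") + match.group("line3")


print(drop_tagged_part(
    "some text I want to capture. <tag> junk I don't care about</tag> more stuff I want."))
# some text I want to capture.  more stuff I want.
```

The doubled space comes from the spaces on either side of the removed tag. Strip or normalize it if that matters.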
Here's a non-regex way: split on </tag> and go through the array items. For any item containing <tag>, split on <tag> and take the first element. E.g.:
>>> s="some text I want to capture. <tag> junk I don't care about</tag> more stuff I want. <tag> don't care </tag> i care"
>>> for item in s.split("</tag>"):
...     if "<tag>" in item:
...         print(item.split("<tag>")[0])
...     else:
...         print(item)
...
some text I want to capture.
more stuff I want.
i care
In ASP.NET, use the .NET String.Split method to do the same.
