How can I send non-XML (for example, plain text) content from XQuery code in MarkLogic Server?
From what I have seen, whatever we write, the output always seems to be in XML format.
You can use the xdmp:set-response-content-type function, e.g. xdmp:set-response-content-type("text/plain"); see the official documentation.
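A minimal sketch of that call in a main module, assuming the module is requested over HTTP through a MarkLogic app server (outside an HTTP request there is no response whose content type could be set):

```xquery
xquery version "1.0-ml";

(: Set the HTTP response content type first; the call returns the
   empty sequence, so only the string that follows ends up in the body. :)
xdmp:set-response-content-type("text/plain"),
"hello world"
```

The comma builds a sequence, so the function call and the string can live in the same module body without a FLWOR expression.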
XQuery module output can be XML, or text, or binary, or any combination of those three. Here is a valid main module, yielding text (technically a string item, but if I needed a text node I could wrap it with the text constructor):
xquery version "1.0-ml";
"hello world"
This module yields binary data:
xquery version "1.0-ml";
binary { xs:hexBinary("deadbeef") }
A module can also yield a sequence:
xquery version "1.0-ml";
"hello", "world"
I have a $text = "Hello 😀😃😄 💜 🙏🏻 🦦üäö$"
I want to remove just the emoji from the text using XQuery. How can I do that?
Expected result : "Hello üäö$"
I tried to use:
replace($text, '\p{IsEmoticons}+', '')
but it didn't work.
It just removed the smileys.
Result now: "Hello 💜 🙏🏻 🦦üäö$"
Expected result : "Hello üäö$"
Thanks in advance :)
I outlined the approach in my answer to the original question, which I updated based on your comment asking about how to strip out 💜.
Quoting from that expanded answer:
The "Emoticons" block doesn't contain all characters commonly associated with "emoji." For example, 💜 (Purple Heart, U+1F49C), according to a site like https://www.compart.com/en/unicode/U+1F49C that lets you look up Unicode character information, is from:
Miscellaneous Symbols and Pictographs, U+1F300 - U+1F5FF
This block is not available in XPath or XQuery processors, since it is neither listed in the XML Schema 1.0 spec linked above, nor is it in Unicode block names for use in XSD regular expressions—a list of blocks that XPath and XQuery processors conforming to XML Schema 1.1 are required to support.
For characters from blocks not available in XPath or XQuery, you can manually construct character classes. For example, given the purple heart character above, we can match it as follows:
replace("Purple 💜 heart", "[🌀-🗿]", "")
This returns the expected result:
Purple  heart
This approach can be applied to 🙏🏻, 🦦, or any other character:
Locate the character's Unicode block.
Craft your regular expression with the block name (if available in XPath) or a character class covering the block's code point range.
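The two steps above can be sketched for the sample string as follows. The range U+1F300 to U+1F9FF is an assumption chosen because it spans the blocks that the characters in this particular string come from (Miscellaneous Symbols and Pictographs, Emoticons, Supplemental Symbols and Pictographs); it also sweeps in a few non-emoji blocks in between, such as Alchemical Symbols, and it does not cover every emoji in Unicode.

```xquery
xquery version "3.1";

let $text := "Hello 😀😃😄 💜 🙏🏻 🦦üäö$"
(: Hand-built character class written with character references;
   the range is an illustrative assumption, not a complete emoji list. :)
let $emoji := "[&#x1F300;-&#x1F9FF;]"
(: normalize-space collapses the runs of spaces left behind by the
   removed characters; note that it collapses all other whitespace too. :)
return normalize-space(replace($text, $emoji, ""))
```

Writing the range with character references rather than literal characters keeps the code points visible in the source and avoids depending on how an editor renders them.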
Alternatively, rather than locating the blocks of characters you want to strip out, you could identify the blocks of characters you want to preserve. For example, given the example string in the original post, perhaps the goal is to preserve only those characters in the "Basic Latin" block. To do so, we can match characters NOT in this block via the \P Category Escape:
xquery version "3.1";
let $text := "Hello 😀😃😄 💜 🙏🏻 🦦üäö$"
return
replace($text, "\P{IsBasicLatin}", "")
This query returns:
Hello $
Notice that this has stripped out the characters with diacritics, which perhaps isn't desired. These characters with diacritics belong to the Latin-1 Supplement block. To preserve characters from both the Latin and Latin-1 Supplement blocks, we'd need to adjust the query as follows:
xquery version "3.1";
let $text := "Hello 😀😃😄 💜 🙏🏻 🦦üäö$"
return
replace($text, "[^\p{IsBasicLatin}\p{IsLatin-1Supplement}]", "")
... which returns:
Hello üäö$
This now preserves the characters with diacritics.
To be precise about the characters you preserve or remove, you need to consult the Unicode blocks and charts.
When trying to use the oXygen editor to comment out a node inside of an element oXygen simply wrapped it into (:<foo>foo 1</foo>:), but I then found out that that way the node did not get commented out but was rather prefixed by a text node with (: and suffixed by a text node with :).
Then I looked up the syntax and found out you need to use an enclosed expression {(:<foo>foo 1</foo>:)} instead to have access to the comment syntax.
However, while BaseX and Saxon 9.8 happily accept {(:<foo>foo 1</foo>:)}, Altova complains and needs an additional empty sequence {(:<foo>foo 1</foo>:)()}.
https://www.w3.org/TR/xquery-31/#doc-xquery31-EnclosedExpr suggests in XQuery 3.1 the expression inside curly braces is optional and defaults to ().
Does this also mean that in XQuery 3.1 it should suffice to use simply the comment inside of the curly braces, without an empty sequence?
So to summarize, Saxon and BaseX allow me to use <root>{(:<foo>foo 1</foo>:)}</root> while Altova complains about incorrect syntax, forcing me to use <root>{(:<foo>foo 1</foo>:)()}</root>.
Is that still necessary in XQuery 3.1?
Sounds like a bug in their commenter, which is pretty common in XQuery editors. Within an element (assuming you are using direct element constructors, not computed element constructors), use XML comments:
<hello>world
<!-- Don't print me -->
</hello>
Computed element constructors still use XQuery comments:
element hello {
'world' (: Don't print me :)
}
My XQuery script:
declare namespace output = "http://www.w3.org/2010/xslt-xquery-serialization";
declare option output:method "text";
for $row in all/row
return ('"<row>","',data($row),'"
')
My XML:
<all>
<row>one</row>
<row>two</row>
<row>three</row>
</all>
My command line:
java -cp …/saxon9he.jar net.sf.saxon.Query '!omit-xml-declaration=yes' -s:./trouble-with-output-escaping.xml -q:./trouble-with-output-escaping.xqy
My output as created by saxon9he:
"<row>"," one "
"<row>"," two "
"<row>"," three "
I actually want to have output like this:
"<row>","one"
"<row>","two"
"<row>","three"
During my web investigation I came across XSLT's disable-output-escaping.
I thought: if XQuery had that, that might help.
Update/0:
Actually nothing (visible) was wrong with the above XQuery script.
The namespace declaration above needs to get replaced by this one:
declare namespace output = "http://www.w3.org/2010/xslt-xquery-serialization";
Looks the same, but it isn't, as Michael pointed out.
Having completed this, the above is an example of how to create text output using XQuery.
Elsewhere, Michael showed how to get rid of the space (0x20) that is used to separate the items, i.e. the space character preceding lines 2 to the end:
string-join(…,"")
where "…" would be the entire FLWOR.
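Put together, the corrected script might look like the sketch below. It assumes the same input document as above and a namespace URI that has been retyped by hand so that it contains no hidden characters.

```xquery
declare namespace output = "http://www.w3.org/2010/xslt-xquery-serialization";
declare option output:method "text";

(: Build one string per row, then join them with "" so the serializer
   receives a single item and has nothing to separate with spaces. :)
string-join(
  for $row in all/row
  return concat('"<row>","', data($row), '"&#10;'),
  ""
)
```

Here the line break is written as the character reference &#10; instead of a literal newline inside the string, which is only a stylistic choice.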
It's doing the right thing if you set output method "text" from the command line, that is
java net.sf.saxon.Query -q:test.xquery -s:test.xml -t !method=text
but you had me baffled as to why setting the serialization options from within the query isn't working. Looking at it in the debugger, though, I see that your URI, which looks like
http://www.w3.org/2010/xslt-xquery-serialization
actually contains several occurrences of decimal 8203, hex 200B, which is a zero-width space. This means the URI doesn't match the serialization output URI, and "declare option" with an unrecognized URI is ignored.
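One way to check a suspicious string without a debugger is to look at its code points. In this sketch, $uri is a hypothetical stand-in for text pasted from a web page, with a single zero-width space written explicitly as a character reference:

```xquery
xquery version "3.1";

(: Hypothetical pasted value containing one zero-width space (U+200B) :)
let $uri := "http://www.w3.org/2010/xslt-&#x200B;xquery-serialization"
return (
  (: true if any code point in the string is 8203 (hex 200B) :)
  string-to-codepoints($uri) = 8203,
  (: the same string with every zero-width space stripped out :)
  translate($uri, "&#x200B;", "")
)
```

Comparing string-length before and after the translate call would show how many such characters were present.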
I want to use > and < in my xqy pages in MarkLogic Server. But MarkLogic converts > to &gt; and < to &lt;. In Query Console also, when I write > and run the query, it prints the output as &gt;, but I want it to be > only and not &gt;. How can I do this?
In QConsole you can select the Text output format. If you do so with a query that only contains ">", then only a > will be output. If you select the XML output format, it will be escaped and wrapped in a result element by the QConsole eval function to make it well-formed.
If you are checking your xqy page in a web browser, the browser could be escaping properly written results too; make sure to check the page source.
Note also that MarkLogic usually returns XQuery output as text/xml. You can set the response content type to text/plain using xdmp:set-response-content-type("text/plain").
HTH
I am trying this in an XQuery (assume that doc('input:instance') does indeed return a valid XML document) which is generated using XSLT
let $a:= <xsl:text>"<xsl:copy-of select="doc('input:instance')//A" />"</xsl:text>
let $p := <xsl:text>"<xsl:copy-of select="doc('input:instance')//P" />"</xsl:text>
let $r := <xsl:text>"<xsl:copy-of select="doc('input:instance')//R" />"</xsl:text>
But I get the error:
xsl:text must not contain child elements
How do I retrieve XML results using the XPath in xsl:copy-of and then encode the special characters received in the result while formatting the result as string? I would be happy to use CDATA section if that's possible (if I do that instead of xsl:text above, xsl:copy-of is not evaluated since it becomes part of CDATA section).
Obviously I am a newcomer to XSL...
What you need here is the ability to serialize an XML document (here the document returned by doc()) using the XML serialization, into a string.
Various XQuery implementations have extension functions for this purpose. For example, if you are using Saxon:
saxon:serialize(document, 'xml')
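If the processor supports XQuery 3.0 or later, the standard fn:serialize function may be an alternative to a vendor extension. That is an assumption about the environment, not something the question confirms. In this sketch, the inline element is a hypothetical stand-in for the document returned by doc('input:instance'):

```xquery
xquery version "3.1";

(: Hypothetical stand-in for doc('input:instance') :)
let $doc := <root><A>x &amp; y</A></root>
(: serialize() turns the selected node into a string of XML markup,
   with special characters escaped as the XML output method requires. :)
return serialize($doc//A)
```

Because the result is an ordinary string, it can be concatenated or quoted like any other string value.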
This has nothing to do with XQuery (you could be building the XSLT stylesheet with any language, even XSLT itself!).
From http://www.w3.org/TR/xslt20/#xsl-text
<!-- Category: instruction -->
<xsl:text
[disable-output-escaping]? = "yes" | "no">
<!-- Content: #PCDATA -->
</xsl:text>
[...] The content of the xsl:text element is a single text node whose value forms the string value of the new text node.