I'm here with a question that I hope can be answered, which is really quite silly and basic.
I have a file of authors in the format of:
<authorRoot>
<author>
<info tags on author>
</author>
etc
</authorRoot>
and all I wish to do is, through FLWOR, return a list where each 'author' and its information is a different value, so when I run the query, the result should come out looking like
1. <author><info>.....</info></author>
2. <author><info>.....</info></author>
etc
and I am CERTAIN that something as simple as that should just be the following code
xquery version "1.0";
for $x in //author
return $x
yet when I do so, the query result comes out as
1.<author><info>...</info></author><author><info>...</info></author><author><info>...</info></author><author><info>...</info></author><author><info>...</info></author>....etc
I'm relatively new to XQuery, and I'm using Altova XMLSpy. I've done similar exercises as basic as this (where I have a file with a similar layout and use essentially the same code, resulting in an XQuery result page of multiple values, not just one long one), but for this file it just doesn't seem to work! Is it something in my code that I'm just not seeing? Or could it be the file, perhaps?
Thank you for whatever input you have on the situation.
Well, your reasoning is correct.
It is just a formatting issue: it seems Altova prints the entire sequence on a single line without line breaks.
You can also try it in my XQuery online tester, there you can see that the sequence is as you expected it to be.
If you watch this demo video of Altova XMLSpy and advance to 2:35 you will see how clicking on one of the toolbar buttons (which appears to be labeled "Pretty-print") will format the results of your XQuery as nicely indented XML.
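To illustrate the point outside XQuery, here is a minimal Python sketch (the sample data and variable names are made up for illustration). It shows that selecting the author elements yields separate items, and that they only look like one long value when they are written out without separators:

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data in the same shape as the file in the question.
source = """<authorRoot>
<author><info>A</info></author>
<author><info>B</info></author>
</authorRoot>"""

root = ET.fromstring(source)

# One serialized string per <author>; strip() drops the newline that follows each element.
authors = [ET.tostring(a, encoding="unicode").strip() for a in root.iter("author")]

# Written out without separators, the items look like a single value.
one_line = "".join(authors)

# Written out one per line, the same items are visibly distinct.
for number, item in enumerate(authors, 1):
    print(f"{number}. {item}")
```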
I am trying to put multiple values inside this content with the XQuery Expression Builder. I tried to use a string function like this: fn:concat($body, $inbound, $inbound), but this does not seem to keep the whole message.
Is there any way that I can put all these variables in one report action? If this is possible, how should I read these values back out after they are stored in the database (some key-value structure would be perfect)?
You only need to form a xml with the content you want to show in your report:
<report>
<body>{$body}</body>
<inbound>{$inbound}</inbound>
...
</report>
The only requirement is that the output has to be XML, whatever its structure.
Not sure, but I would try something like this:
<myroot>{$body, $inbound, $outbound}</myroot>
Or if you really need a string returned:
fn:serialize(<myroot>{$body, $inbound, $outbound}</myroot>)
Note, fn:serialize is only in OSB 12c+.
I want to remove punctuation from a database of XML documents in MarkLogic. This is a preprocessing step for machine learning. I'm new to MarkLogic and I don't know how to do that. Is there an XQuery query that could remove punctuation?
To do a mass replacement of all text in the database, and take out punctuation, you could start with something that looks like this code (modified for your needs):
for $doc in cts:search(fn:collection(), ())
for $text in $doc//text()
return xdmp:node-replace($text, text{fn:replace($text, "[\.,;]", "")})
To be honest, that task is much less expensive to do on the source text files themselves, or in MarkLogic by treating the XML as a string during the replacement process. Updating nodes one element at a time will be expensive.
Outside of MarkLogic:
use sed or awk or a similar tool BEFORE INGESTION
Inside of MarkLogic (as a trigger, perhaps):
use xdmp:quote to change the XML to a string, then replace in a single pass with fn:replace, and then make XML again with xdmp:unquote
let $new-doc := xdmp:unquote(fn:replace(xdmp:quote($doc), "[\.,;]", ""))
Then either store it by replacing the root node with xdmp:node-replace, or store this version as a property. This all depends on whether the original (punctuated) version matters to you. Or perhaps you just want to keep the original and serve this cleansed version back to someone.
In all cases above, you have to make sure that your replacement does not murder your XML. Also, be aware of the options for the functions above (like how CDATA is handled).
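As a rough illustration of how a string-level replacement can murder the XML, here is a small Python sketch (not MarkLogic code; the sample document and the helper name strip_punctuation are made up). It compares stripping punctuation from text nodes only with running the same regex over the serialized document:

```python
import re
import xml.etree.ElementTree as ET

# Same character class as the XQuery examples above.
PUNCTUATION = re.compile(r"[.,;]")

def strip_punctuation(root):
    """Remove punctuation from text content only, leaving markup and attributes alone."""
    for element in root.iter():
        if element.text:
            element.text = PUNCTUATION.sub("", element.text)
        if element.tail:
            element.tail = PUNCTUATION.sub("", element.tail)
    return root

# Hypothetical document: note the "." in the attribute and the ";" ending the entity.
source = '<doc id="a.1"><p>Hello, world; fish &amp; chips.</p></doc>'

# Node-level cleaning keeps the attribute value and the escaped ampersand intact.
cleaned = ET.tostring(strip_punctuation(ET.fromstring(source)), encoding="unicode")

# String-level cleaning also rewrites the attribute and turns "&amp;" into "&amp",
# which is no longer well-formed XML.
naive = PUNCTUATION.sub("", source)
```

If you do go the quote/replace/unquote route, a pattern that cannot match inside entity references or attribute values would be needed, or the cleaning has to be limited to text nodes as in the first variant.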
Lastly, "This is for machine learning purposes". You do not elaborate. I think many of us here have a feeling that this solution (cleansing punctuation before insert) rubs against the very grain of MarkLogic - in which you store as-is and then have awesome index, tokenizing, stemming, collation, search support to find and return your data as you need. If you were to elaborate on your use case a bit, you may inspire others to give more MarkLogic-Specific suggestions.
It will work if you use 'punctuation-insensitive' and if required 'diacritic-insensitive' in cts:element-word-query()
I'm not sure if this is what you're asking, but it's technically possible to update every document in the database to remove punctuation; however, it's very expensive and I wouldn't recommend it.
Using built-in search functions, you can probably achieve the same goal without updating your documents by querying with punctuation insensitivity. For example, if you want to select documents with a title matching a string regardless of punctuation:
cts:search(//mydoc,
cts:element-word-query(xs:QName('title'), 'Moby-Dick', 'punctuation-insensitive'))
Or in an existing XQuery:
for $d in $documents
where cts:contains($d,
cts:element-word-query(xs:QName('title'), 'Moby-Dick', 'punctuation-insensitive'))
return $d/summary
I'm aware that some languages write certain characters in a different order than the common Latin-script languages do. E.g., a percentage in English would be written "100%", while in Persian it would be "/100" (the symbol comes before the number).
Question: how to consider that in the Qt internationalization system in an intelligent way?
I first thought about this code:
myLabel->setText(tr("%1%2").arg(value).arg(tr("%")));
So what would happen is that, in the Qt Linguist, the translator would change the order of the replacement fields:
%1%2 -> in Persian translation -> %2%1
I tried that in my code and found that, while everything was fine with the normal (English) translation, a bug occurred when I switched to the file containing the Persian translation: the number shown was always one digit short of what I had entered. So if I chose "99%", it would show "%9", and if I set only "9%", I would get just "%".
The problem disappeared when I put a space between %1 and %2, both in the source code and in the translation (%2 %1). Since ISO xxxxx says that the % should be separated from the corresponding number by a space, that is no problem for this specific situation. But what if I wanted to have both symbols without a space between them? How should it be done?
I confirm that the problem you described exists. However I would solve this problem in the following way:
QString sPer = QString("%%1").arg(value); // %99
QString sEng = QString("%1%").arg(value); // 99%
So that
%1% -> in Persian translation -> %%1
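A likely explanation for the missing digit, as far as I can tell: QString::arg() replaces the lowest-numbered place marker, and markers can have two digits (%1 to %99). After the first arg() call the Persian string "%2%1" becomes "%299", which the second call reads as marker %29 followed by a literal "9". The sketch below is a simplified Python model of that behaviour, not real Qt code; qt_arg is a made-up helper name and it ignores field widths, %L markers and other details:

```python
import re

# Simplified model: a place marker is "%" followed by one or two digits (1..99).
MARKER = re.compile(r"%(\d{1,2})")

def qt_arg(template, value):
    """Replace every occurrence of the lowest-numbered place marker with value."""
    numbers = [int(m.group(1)) for m in MARKER.finditer(template)]
    numbers = [n for n in numbers if n >= 1]
    if not numbers:
        return template
    lowest = min(numbers)

    def substitute(match):
        return str(value) if int(match.group(1)) == lowest else match.group(0)

    return MARKER.sub(substitute, template)

english = qt_arg(qt_arg("%1%2", 99), "%")      # "99%"
persian = qt_arg(qt_arg("%2%1", 99), "%")      # "%9": "%299" is read as marker %29 plus "9"
with_space = qt_arg(qt_arg("%2 %1", 99), "%")  # "% 99": the space ends the marker
single = qt_arg("%%1", 99)                     # "%99": only one marker, nothing to misread
```

This is also why keeping the percent sign inside the translated string, with a single %1 marker, sidesteps the problem.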
Put the percentage inside the string to translate, something like
myLabel->setText(tr("%%1").arg(value));
Even better, I think I would add a disambiguation string (Qt "old style" comment)
myLabel->setText(tr("%%1", "Show the number with a percentage").arg(value));
or maybe a new-style translator comment like this
//: Show the number with a percentage
myLabel->setText(tr("%%1").arg(value));
Translator comments have issues of their own, you might be better off using a Qt disambiguation string like the preceding example.
Putting a disambiguation string (or a translator comment) will let your translator know what they are translating...
Let the translators decide where they want to put the percentage. Don't try to handle it in the code as somebody else suggested: it is not scalable. What if you need to use another character or different formatting for another language?
It might even be possible that Qt might have calls to handle number formatting but I can't seem to find them at the moment and I am not fully sure how we handle them in the Qt application I work on...
If putting a % alone doesn't work, try to precede it with \, it might be necessary to escape it, I am not sure...
I can see the technology-independent Tridion Content Delivery Language (TCDL) link has the following parameters, which are pretty well described on SDL Live Content.
type
origin
destination
templateURI
linkAttributes
textOnFail
addAnchor
VariantId
How do we add multiple attribute-value pairs for the linkAttributes? Specifically, what do we use to escape the double quotes as well as separate pairs (e.g. if we need class="someclass" and onclick="someevent").
The separate pairs are just space delimited, like a normal series of attributes. Try XML-encoding the value of linkAttributes, however. So, " becomes &quot;, etc.
If you are using some JavaScript, you may need to escape the JavaScript quotes too, as in \".
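For illustration only, this is how such an XML-encoded value could be produced in Python. The helper name encode_link_attributes is made up, and whether Tridion accepts the value in exactly this form is an assumption you would need to verify against a published page:

```python
from html import escape

def encode_link_attributes(pairs):
    """Join name="value" pairs with spaces, then XML-encode the whole string."""
    raw = " ".join(f'{name}="{value}"' for name, value in pairs.items())
    # quote=True also encodes double quotes as &quot;
    return escape(raw, quote=True)

encoded = encode_link_attributes({"class": "someclass", "onclick": "someevent"})
# encoded is: class=&quot;someclass&quot; onclick=&quot;someevent&quot;
```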
Edit: after I figured out your real question, the answer is a lot simpler:
You should wrap the values inside your linkAttributes in single quotes. Spaces inside linkAttributes are typically handled fine; but if not, escape them with %20.
If you need something more or want something that isn't handled by the standard tcdl:ComponentLink, remember that you can always create your own TCDL tag and use a TagHandler or TagRenderer (look them up in the docs for examples or search for Jaime's article on TagRenderer) to do precisely what you want.
My original answer was to a question you didn't ask: what is the format for TCDL tags (in general). But the explanation might still be useful to some, so remains below.
I'd suggest having a look at the format that the default building blocks (e.g. the Link Resolver TBB in the Default Finish Actions) output and using that as a guideline.
This is what I could quickly get from the transport package of a published page:
<tcdl:Link type="Page" origin="tcm:5-199-64" destination="tcm:5-206-64"
templateURI="tcm:0-0-0" linkAttributes="" textOnFail="true"
addAnchor="" variantId="">Home</tcdl:Link>
<tcdl:ComponentPresentation type="Embedded" componentURI="tcm:5-69"
templateURI="tcm:5-133-32">
<span>
...
One of the things that I know from experience: your entire TCDL tag will have to be on a single line (I wrapped the lines above for readability only). Or at least that is the case if it is used to invoke a REL TagRenderer. Clearly the tcdl:ComponentPresentation tag above will span multiple lines, so that "single line rule" doesn't apply everywhere.
And that is probably the best advice: given that TCDL tags are processed at multiple points in the Tridion Publishing, Deployment and Delivery pipeline, I'd stick to the format that the default TBBs output. And from my sample that seems to be: put everything on a single line and wrap the values in (double) quotes.
I've tried to find the answer in other questions, and none of the "standard" answers are working, so I'm hoping someone can either point me to where this has already been answered, or can tell me how to do this.
I have a large file with multiple documents within it. For a sample, assume something like this
DOCUMENT_IDENTIFIER 123400000000000000000123457 OTHER STUFF HERE
LINE WITH STUFF HERE
LINE WITH STUFF HERE
DOCUMENT_IDENTIFIER 123500000000000000000127456 OTHER STUFF HERE
LINE WITH STUFF HERE
LINE WITH STUFF HERE
Now, I'll need to preserve everything in the DOCUMENT_IDENTIFIER line, starting with the first 0 through the 123 (or 127 in the second document). That header line, plus all the LINE WITH STUFF HERE lines below it, should make up one document, and a new document should start at the second DOCUMENT_IDENTIFIER line.
When I attempt to use the standard Debatching techniques, the pipeline fails: either it just fails completely (when, for instance, I try to define a header and body schemas for the pipeline) or it never starts the second document (if I try just a body schema).
I'm certain this is something fairly simple, but I'm completely missing how to get it done. Any suggestions/direction would be welcome.
If it matters, I'm stuck on BT2006 R2, at current.
What does your Body Schema look like? I would start getting that right and make sure that you have something that will create xml with separate records of all the "DOCUMENT_IDENTIFIER 1234" records.
I would use the "DOCUMENT_IDENTIFIER "1234 bit as the Tag Identifier, and then I would set the Tag Offset to 4, to avoid the first 4 characters.
You should have a structure like this:
RecordForDocumentIdentifiers (Root of your Schema) Group Maxoccurs=*
    RecordForDocumentIdentifier (Set the Tag Identifier here)
        Fields for the columns you want to parse
    RecordForOtherLines Group Maxoccurs=*
        RecordForOtherLine Maxoccurs=* or whatever your rules are.
            Fields for the columns of other lines
When that seems to parse your example okay, and generate the XML you want, I would start creating my header and body schemas from that. I know it is 2 steps, but it takes some of the guesswork out of it.
I guess the header schema would be taken from RecordForDocumentIdentifier, and the body would be RecordForOtherLines (the outer record for the other lines).
I hope that helps. If not, post your actual file and schema and let us take a look at it.
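To check what the debatched output should look like before fighting with the schemas, a quick sketch outside BizTalk can help. The Python below is only a reference for the expected split (it is not a flat file disassembler configuration, and split_documents is a made-up helper name): every line starting with DOCUMENT_IDENTIFIER opens a new document, and the lines that follow belong to it.

```python
def split_documents(lines, marker="DOCUMENT_IDENTIFIER"):
    """Group lines into documents; each marker line starts a new document."""
    documents = []
    for line in lines:
        if line.startswith(marker):
            documents.append([line])       # header line opens a new document
        elif documents:
            documents[-1].append(line)     # body line joins the current document
    return documents

# Sample taken from the question.
sample = [
    "DOCUMENT_IDENTIFIER 123400000000000000000123457 OTHER STUFF HERE",
    "LINE WITH STUFF HERE",
    "LINE WITH STUFF HERE",
    "DOCUMENT_IDENTIFIER 123500000000000000000127456 OTHER STUFF HERE",
    "LINE WITH STUFF HERE",
    "LINE WITH STUFF HERE",
]

documents = split_documents(sample)
```

If the pipeline produces anything other than two documents of three lines each for this sample, the tag identifier or the record occurrence settings are the first things I would look at.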