In ReStructuredText, is it possible to have emphasis and no emphasis in the same word? For example:
*emph*not-emph
leading to "emph no-emph", but with no white space in between? I can't find a way to do it, not even with a substitution.
What you are looking for is Character-Level Inline Markup. The description from the reStructuredText specification is (emphasis mine):
It is possible to mark up individual characters within a word with backslash escapes [...] Backslash escapes can be used to allow arbitrary text to immediately follow inline markup.
The two examples provided in the specification are:
For a single character immediately following inline markup:
Python ``list``\s use square bracket syntax.
For arbitrary text immediately following inline markup:
Possible in *re*\ ``Structured``\ *Text*, though not encouraged.
So to achieve the output you want, you need to use the backslash-escaped whitespace pattern:
*emph*\ not-emph
The reason this is required is because the inline markup recognition rules require that:
Inline markup end-strings must end a text block or be immediately followed by
whitespace,
one of the ASCII characters - . , : ; ! ? \ / ' " ) ] } > or
a non-ASCII punctuation character with Unicode category Pd (Dash), Po (Other), Pe (Close), Pf (Final quote), or Pi (Initial quote).
Note that the use of that pattern above is discouraged in the reStructuredText specification:
The use of backslash-escapes for character-level inline markup is not encouraged. Such use is ugly and detrimental to the unprocessed document's readability. Please use this feature sparingly and only where absolutely necessary.
Related
I have a $text = "Hello πππ π ππ» π¦¦ΓΌΓ€ΓΆ$"
I wanted to remove just emoji's from the text using xquery. How can i do that?
Expected result : "Hello üÀâ$"
i tried to use:
replace($text, '\p{IsEmoticons}+', '')
but didn't work.
it just removed smiley's
Result now: "Hello π ππ» π¦¦ΓΌΓ€ΓΆ$"
Expected result : "Hello üÀâ$"
Thanks in advance :)
I outlined the approach in my answer to the original question, which I updated based on your comment asking about how to strip out π.
Quoting from that expanded answer:
The "Emoticons" block doesn't contain all characters commonly associated with "emoji." For example, π (Purple Heart, U+1F49C), according to a site like https://www.compart.com/en/unicode/U+1F49C that lets you look up Unicode character information, is from:
Miscellaneous Symbols and Pictographs, U+1F300 - U+1F5FF
This block is not available in XPath or XQuery processors, since it is neither listed in the XML Schema 1.0 spec linked above, nor is it in Unicode block names for use in XSD regular expressionsβa list of blocks that XPath and XQuery processors conforming to XML Schema 1.1 are required to support.
For characters from blocks not available in XPath or XQuery, you can manually construct character classes. For example, given the purple heart character above, we can match it as follows:
replace("Purple π heart", "[π-πΏ]", "")
This returns the expected result:
Purple Heart
This approach can be applied to ππ» , π¦¦, or any other character:
Locate the character's unicode block.
Craft your regular expression with the block name (if available in XPath) or character class.
Alternatively, rather than locating the blocks of characters you want to strip out, you could identify the blocks of characters you want to preserve. For example, given the example string in the original post, perhaps the goal is to preserve only those characters in the "Basic Latin" block. To do so, we can match characters NOT in this block via the \P Category Escape:
xquery version "3.1";
let $text := "Hello πππ π ππ» π¦¦ΓΌΓ€ΓΆ$"
return
replace($text, "\P{IsBasicLatin}", "")
This query returns:
Hello $
Notice that this has stripped out the characters with diacritics, which perhaps isn't desired. These characters with diacritics belong to the Latin-1 Supplement block. To preserve characters from both the Latin and Latin-1 Supplement blocks, we'd need to adjust the query as follows:
xquery version "3.1";
let $text := "Hello πππ π ππ» π¦¦ΓΌΓ€ΓΆ$"
return
replace($text, "[^\p{IsBasicLatin}\p{IsLatin-1Supplement}]", "")
... which returns:
Hello üÀâ$
This now preserves the characters with diacritics.
To be precise about the characters you preserve or remove, you need to consult the Unicode blocks and charts.
I need to concatenate two strings within an R object: one is just regular text; the other is italicized. So, I tried a lot of combinations, e.g.
paste0(" This is Regular", italic( This is Italics))
The desired result should be:
This is Regular This is Italics
Any ideia on how to do it?
Thanks!
In plot labels, you can use expressions, see mathematical annotation :
plot(1,xlab=expression("This is regular"~italic("this is italic")))
To provide an string for which an HTML parser will recognise the need to render the text in Italics, wrap the text in <i> and </i>. For example: "This is plain text, but <i>this is in Italics</i>.".
However, most HTML processors will assume that you want your text to appear as-is and will escape their input by default. This means that the special meanings of certain characters - including < and > will be "turned off". You need to tell the processor not to do this. How you do that will depend on context. I can't tell you that because you haven't given me context.
Are you for example, writing to a raw HTML file? (You need do nothing.) Are you writing to a Markdown file? If so, how? In plain text or in a rendered chunk? Are you writing a caption to a graphic? (Waldi has suggested a solution.) Etc, etc....
Given the following examples which I picked up from here:
CSS.escape(".foo#bar") // "\.foo\#bar"
CSS.escape("()[]{}") // "\(\)\[\]\{\}"
Since .foo#bar is a valid CSS selector expression. Why we need to append \ before some characters? Suppose I want to write my own program which does the same task of escaping all the values/expressions in a CSS file then, how should I proceed?
PS: I am always confused about the escaping, how should I think when it comes to escaping some input?
You escape strings only when those strings contain special symbols that you want to be treated literally. If you are expecting a valid CSS selector as user input, you shouldn't be escaping anything.
.foo#bar is a valid CSS selector, but it means something completely different from \.foo\#bar. The former matches an element with that respective class and ID, e.g. <div class=foo id=bar> in HTML. The latter matches an element with the element name ".foo#bar", which in a hypothetical markup language could be represented as <.foo#bar> (obviously this is not legal HTML or XML syntax, but you get the picture).
Can the 'class' attribute of HTML5 elements contain line breaks? Is it allowable in the specs and do browsers support it?
I ask because I have some code that dynamically inserts various classes into the element and this has created one very long line that is hard to manage. Normally I would build the class value using a variable but the CMS I'm using requires the template conditional tags to be positioned inline with the HTML. I can't use variables or PHP.
What I found in my research is that some HTML tag attributes need to be a single line, but I haven't been able to discover if the class attribute is one of those.
Does anyone know something about this?
Per the HTML 4 spec, the class attribute is CDATA:
User agents should interpret attribute values as follows:
o Replace character entities with characters
o Ignore line feeds
o Replace each carriage return or tab with a single space.
so you're in good shape there.
The HTML5 spec describes a class as a set of space separated tokens, where a 'space' includes newlines.
So you should be good there, too.
Can the [class] attribute of HTML5 elements contain line breaks?
Yes. The HTML5 spec says:
The attribute, if specified, must have a value that is a set of space-separated tokens representing the various classes that the element belongs to.
The link proceeds to say:
A set of space-separated tokens is a string containing zero or more words (known as tokens) separated by one or more space characters, where words consist of any string of one or more characters, none of which are space characters.
And space characters include:
space (' ')
tab (\t)
line feed (\n)
form feed (\f)
carriage return (\r)
The space characters, for the purposes of this specification, are U+0020 SPACE, "tab" (U+0009), "LF" (U+000A), "FF" (U+000C), and "CR" (U+000D).
Newlines as you would add to UTF-8 documents are:
line feeds (\n)
carriage returns (\r)
a carriage return followed immediately by a line feed (\r\n)
I'm trying to create a RegEx Validator that checks the file extension in the FileUpload input against a list of allowed extensions (which are user specified). The following is as far as I have got, but I'm struggling with the syntax of the backward slash (\) that appears in the file path. Obviously the below is incorrect because it just escapes the (]) which causes an error. I would be really grateful for any help here. There seems to be a lot of examples out there, but none seem to work when I try them.
[a-zA-Z_-s0-9:\]+(.pdf|.PDF)$
To include a backslash in a character class, you need to use a specific escape sequence (\b):
[a-zA-Z_\s0-9:\b]+(\.pdf|\.PDF)$
Note that this might be a bit confusing, because outside of character classes, \b represents a word boundary. I also assumed, that -s was a typo and should have represented a white space. (otherwise it shouldn't compile, I think)
EDIT: You also need to escape the dots. Otherwise they will be meta character for any character but line breaks.
another EDIT: If you actually DO want to allow hyphens in filenames, you need to put the hyphen at the end of the character class. Like this:
[a-zA-Z_\s0-9:\b-]+(\.pdf|\.PDF)$
You probably want to use something like
[a-zA-Z_0-9\s:\\-]+\.[pP][dD][fF]$
which is same as
[\w\s:\\-]+\.[pP][dD][fF]$
because \w = [a-zA-Z0-9_]
Be sure character - to put as very first or very last item in the [...] list, otherwise it has special meaning for range or characters, such as a-z.
Also \ character has to be escaped by another slash, even inside of [...].