.NET regular expression for reddit style bullets, blockquotes, etc - asp.net

I'm writing an application that will take user input and convert certain character strings to HTML tags in much the same way that reddit does. I have regular expressions for bold, italics, numbered lists, strikethrough, superscript all working properly, but doing the same for blockquotes and bulleted lists are causing problems.
What I have:
* Text (start of line, asterisk, space then text to the next line break)
r = New Regex("(?s)^|\n\*\s(.+?)\n", RegexOptions.IgnoreCase Or RegexOptions.Multiline)
strOutput = r.Replace(strOutput, "<ul><li>$1</li></ul>")
r = Nothing
This appears to be putting bullets in random places.
Likewise, blockquote would be:
> Text (start of line, greater than symbol, space then text to next line break)
r = New Regex("(^?|\n?)\>\s(.+?)\n", RegexOptions.IgnoreCase Or RegexOptions.Multiline)
strOutput = r.Replace(strOutput, "<blockquote>$1</blockquote>")
r = Nothing
Any ideas?

By simplifying your alternations for new line/start of string I was able to get these to work:
(^|\n)\*\s(.+?)\n
(^|\n)\>\s(.+?)\n
For example, looking for "zero or one start of string" with ^? doesn't make a whole lot of sense to me, especially when alternately matching \n.

Related

How to remove characters between space and specific character in R

I have a question similar to this one but instead of having two specific characters to look between, I want to get the text between a space and a specific character. In my example, I have this string:
myString <- "This is my string I scraped from the web. I want to remove all instances of a picture. picture-file.jpg. The text continues here. picture-file2.jpg"
but if I were to do something like this: str_remove_all(myString, " .*jpg) I end up with
[1] "This"
I know that what's happening is R is finding the first instance of a space and removing everything between that space and ".jpg" but I want it to be the first space immediately before ".jpg". My final result I hope for looks like this:
[1] "This is my string I scraped from the web. I want to remove all instances of a picture. the text continues here.
NOTE: I know that a solution may arise which does what I want, but ends up putting two periods next to each other. I do not mind a solution like that because later in my analysis I am removing punctuation.
You can use
str_remove_all(myString, "\\S*\\.jpg")
Or, if you also want to remove optional whitespace before the "word":
str_remove_all(myString, "\\s*\\S*\\.jpg")
Details:
\s* - zero or more whitespaces
\S* - zero or more non-whitespaces
\.jpg - .jpg substring.
To make it case insensitive, add (?i) at the pattern part: "(?i)\\s*\\S*\\.jpg".
If you need to make sure there is no word char after jpg, add a word boundary: "(?i)\\s*\\S*\\.jpg\\b"

Create new emphasis command R Markdown

In R Markdown, to make a text bold, we just need to do:
**code**
The the word code shows in bold.
I was wondering if there is a way to create a new command, let's say:
***code***
That would make the text highlighted?
Thanks!
It is not easily possible to create new markup, but one can change the way existing markup commands are rendered. Text enclosed by three stars is interpreted as emphasized strong emphasis. So one has to change that interpretation and change it to something else. One way to do so is via pandoc Lua filters. We just have to match on pandoc's internal representation of emphasized strong text and convert it to whatever we want:
function Strong (strong)
-- if this contains only one element, and if that element
-- is emphasized text, convert it to highlighted text.
local element = #strong.content == 1 and strong.content[1]
if element and element.t == 'Emph' then
table.insert(element.content, 1, pandoc.RawInline('html', '<mark>'))
table.insert(element.content, pandoc.RawInline('html', '</mark>'))
return element.content
end
end
The above works for HTML output. One would have to define what "highlighted text" means for each targeted format.
See this and this question for other approaches to the problem, and for details of how to use the filter with R Markdown.

Implementing syntax highlighting for markdown titles in PySide/PyQt

I am trying to implement a syntax highlighter for markdown for my project in PySide. The current code covers the basic, with bold, italic, code blocks, and some custom tags. Below is an extract of the relevant part of the current code.
What is blocking me right now is how to implement the highlighting for titles (underlined with ===, for the main title, or --- for sub-titles). The method that is used by Qt/PySide to highlight the text is highlightBlock, which processes only one line at a time.
class MySyntaxHighlighter(QtGui.QSyntaxHighlighter):
def highlightBlock(self, text):
# do something with this line of text
self.setCurrentBlockState(0)
startIndex = 0
if self.previousBlockState() != 1:
startIndex = self.blockStartExpression.indexIn(text)
while startIndex >= 0:
endIndex = self.blockEndExpression.indexIn(
text, startIndex)
...
There is a way to recover the previousBlockState, which is useful when a block has a defined start (for instance, the ~~~ syntax at the beginning of a code-block). Unfortunately, there is nothing that defines the start of a title, except for the underlining with === or --- that take place on the next line. All the examples I found only handle cases where there is a defined start of the expression, and so that the previousBlockState gives you an information (as in the example above).
The question is then: is there a way to recover the text of the next line, inside the highlightBlock? To perform a look-ahead, in some sense.
I though about recovering the document currently being worked on, and find the current block in the document, then find the next line and make the regular expression check on this. This would however break if there is a line in the document that has the exact same wording as the title. Plus, it would become quite slow to systematically do this for all lines in the document. Thanks in advance for any suggestion.
If self.currentBlock() gives you the block being highlighted, then:
self.currentBlock().next().text()
should give you the text of the following block.

Justify text according to a size as opposed to string length? ASP.NET

I have listbox with text in it, and I was asked to see if I could just justify its contents after the dash. My resulting code produced something like this:
Which works fine for scenarios where the text to the left of the dash is less than the max length found from the other items in the listbox (i.e. (B20) is less than (B15-B19), which is the longest entry found, so add some whitespace before the dash).
The issue, though, is that if the text before the dash is same length, it still looks like it isn't justified. Example:
Is there a way to truly line up all the dashes? I would imagine I would have to look at the actual pixel length of the characters before the dash as opposed to the length?
Notes:
I am using ASP.NET Webforms
VB.NET
The text for each item in the listbox is all one string
Right now, my method to accomplish what you see in the first picture is as follows:
Public Sub JustifyDisplayName()
Const ACCOUNT_FOR_DASH As Integer = 4
Dim maxCharCount As Integer = 0
Dim whiteSpace As String = HttpUtility.HtmlDecode(" ")
'Find which one is the longest code
For Each element As TextEntry In Me
If element.Value.Length > maxCharCount Then
maxCharCount = element.Value.Length
End If
Next
'Now, extend the '-' to the max for all items
For Each element As TextEntry In Me
'See how much white space we need to inject
Dim paddingNeeded As Integer = maxCharCount - element.Value.Length
Dim tempDisplay As StringBuilder = New StringBuilder(element.Value)
If paddingNeeded > 0 Then
tempDisplay.Append(CChar(whiteSpace), paddingNeeded + ACCOUNT_FOR_DASH)
tempDisplay.Append(" - " & element.Description)
End If
tempDisplay.Append(" - " & element.Description)
element.DrillDownDisplayNameJustified = tempDisplay.ToString()
Next
End Sub
Thanks.
If you used a fixed-width font, you could make this all much easier. In addition to good ol' Courier, I believe there are others.
If you don't, you're not going to be able to get exactly the right width. You could get close, but you won't get it exactly, because the difference in length between (H60-H95) and (I00-I99) as they are rendered may not evenly divide into increments of one .
But if you really want to give this a try, you'll have to use the System.Drawing namespace, the Graphics class, and a method on Graphics called MeasureString. This will be just to get the lengths of the strings in your selected font, though: System.Drawing doesn't apply to web apps.
If you could append spaces to short items before the dash so that you always have the same number of characters before the dash, you may consider using Monospaced Fonts, where each character occupies the same width - Ref: Similar Question.

How to check for and remove a newline (\n) in the first line of a text field in actionscript?

In the script, sometimes a newline is added in the beginning of the text field (am using a textArea in adobe flex 3), and later on that newline might need to be removed (after other text has been added). I was wondering how to check if there is a newline at the beginning of the text field and then how to remove it. Thanks in advance.
How about
private function lTrimTextArea(ta:TextArea) {
ta.text = ta.text.replace(/^\n*/,'');
}
To remove all line breaks from the start of a string, regardless of whether they are Windows (CRLF) or UNIX (LF only) line breaks, use:
ta.text = ta.text.replace(/^[\r\n]+/,'');
You should use + in the regex instead of * so that the regex only makes a replacement if there is actually a line break at the start of the string. If you use ^\n* as Robusto suggested the regex will find a zero-length match at the start of the string if the string does not start with a line break, and replace that with nothing. Replacing nothing with nothing is a waste of CPU cycles. It may not matter in this situation, but avoiding unintended zero-length matches is a very good habit when working with regular expressions. In other situations they will bite you.
If you particularly want to check first character here is solution:
if(ta.text.charAt(0) == "\n" || ta.text.charAt(0) == "\r")
{
ta.text.slice(1,ta.text.length-1);
}
slice method will slice that first character and gives your text from second character.
If you want to simply disable the ability to create a carriage return / new line, then all you need to do is disable multiline for that TextField...
(exampleTextField as TextField).multiline = false;
This will still trigger KEY_DOWN and KEY_UP events, however will not append the text with the carriage return.

Resources