Select a single element from a complex ID - r

I have IDs such as "3_K97_T12_High_2_Apples". I want to select just "T12" and store it in a character vector (Tiles) so I can pass it to text() as label = Tiles when I label the points in my plot(). Each point should be labelled with just the 3rd element of its ID (i.e., T12).
How can I do this?

Use a regex with sub: anchor at the start (^) of the string, skip the first two sets of characters that are not a _, capture the third set as a group (([^_]+)), and in the replacement specify the backreference (\\1) of the captured group.
Tiles <- sub("^[^_]+_[^_]+_([^_]+)_.*", "\\1", str1)
Tiles
[1] "T12"
^ - start of the string
[^_]+ - one or more characters not a _
_ - the _
[^_]+ - one or more characters not a _
_ - the _
([^_]+) - one or more characters not a _ captured
_.* - the _ followed by the rest of the string, so the whole ID is replaced by the captured group
data
str1 <- "3_K97_T12_High_2_Apples"
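As a cross-check of the same idea outside R, here is a minimal Python sketch (not part of the original answer). The helper name third_field is made up for illustration, and it assumes every ID has at least four underscore-separated parts, as in the example.

```python
import re

# Same pattern as the R answer: skip two "_"-delimited fields, capture the third.
PATTERN = re.compile(r"^[^_]+_[^_]+_([^_]+)_.*")


def third_field(identifier: str) -> str:
    """Return the third underscore-separated field (hypothetical helper)."""
    match = PATTERN.match(identifier)
    if match is None:
        # Assumption: IDs with fewer than four fields are treated as errors.
        raise ValueError(f"unexpected ID format: {identifier!r}")
    return match.group(1)


print(third_field("3_K97_T12_High_2_Apples"))  # T12
```

Splitting on "_" and taking the element at index 2 gives the same result for this ID; the regex version additionally rejects IDs that have fewer than four fields.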

Related

Extract a regex capture group from a string in MariaDB

For example:
Regex: District ([0-9]{1,2})([^0-9]|$)
Input District 12 2021 returns 12
Input Southern District 3 returns 3
Input FooBar returns NULL
The function REGEXP_SUBSTR doesn't allow extracting a single capturing group.
You can use e.g. REGEXP_REPLACE(input, regex, '\\1') to replace occurrences of regex in input with the first capture group of regex.
The following stored function makes this easy to use:
DELIMITER $$
CREATE FUNCTION regexp_extract(inp TEXT, regex TEXT, capture INT) RETURNS TEXT DETERMINISTIC
BEGIN
DECLARE capstr VARCHAR(5);
DECLARE mregex TEXT;
IF inp IS NULL OR LENGTH(inp) = 0 OR inp NOT REGEXP regex THEN
RETURN NULL;
END IF;
SET capstr = CONCAT('\\', capture);
SET mregex = CONCAT('.*', regex, '.*'); -- Want to match the entire input string so it all gets replaced
RETURN REGEXP_REPLACE(inp, mregex, capstr);
END;
$$
DELIMITER ;
Used like so:
SELECT regexp_extract('District 12 2021', 'District ([0-9]{1,2})([^0-9]|$)', 1);
For those users who might be stuck with an earlier version of MySQL or MariaDB which does not have REGEXP_REPLACE available, we can also use SUBSTRING_INDEX here:
SELECT SUBSTRING_INDEX(
SUBSTRING_INDEX('Southern District 3', 'District ', -1), ' ', 1); -- 3
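To make the expected results of the three example inputs easy to check, here is a rough Python equivalent of the stored function. It is only a sketch: it uses Python's re module rather than MariaDB's regex engine, and the name regexp_extract is simply borrowed from the function above.

```python
import re
from typing import Optional


def regexp_extract(inp: Optional[str], regex: str, capture: int) -> Optional[str]:
    """Return capture group `capture` of the first match, or None if no match."""
    if not inp:
        # Mirrors the NULL / empty-input guard in the stored function.
        return None
    match = re.search(regex, inp)
    if match is None:
        return None
    return match.group(capture)


DISTRICT = r"District ([0-9]{1,2})([^0-9]|$)"
print(regexp_extract("District 12 2021", DISTRICT, 1))     # 12
print(regexp_extract("Southern District 3", DISTRICT, 1))  # 3
print(regexp_extract("FooBar", DISTRICT, 1))               # None
```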

Remove character string only if it occurs at the end of a string

I want to remove the final characters "txt" if they exist at the end of a character string. However, sometimes, the characters can be found at other locations of the string also.
For example, I would like to remove txt from "/some-text-here/txt" but not from "/txt-not-to-remove-text/", since in the latter it does not come after the last occurrence of the /.
So, I only want to remove txt if it occurs at the end of the string.
string = c("/some-text-here/txt",
"/some-other-text-here/txt",
"/txt-not-to-remove-text/",
"/txt-another-line-of-text/txt")
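We may use trimws (the whitespace argument needs R >= 3.6.0). Its value is treated as a regex with + appended, so this would also strip an ending such as "/txtt".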
trimws(string, whitespace = "/txt", which = 'right')
-output
[1] "/some-text-here" "/some-other-text-here"
[3] "/txt-not-to-remove-text/" "/txt-another-line-of-text"
Or use sub and anchor the pattern to the end ($) of the string
sub("/txt$", "", string)
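The end-of-string anchor works the same way in other regex flavours. Below is a small Python sketch of the sub approach on the same four strings; the function name strip_trailing_txt is invented for this example.

```python
import re

paths = [
    "/some-text-here/txt",
    "/some-other-text-here/txt",
    "/txt-not-to-remove-text/",
    "/txt-another-line-of-text/txt",
]


def strip_trailing_txt(path: str) -> str:
    """Remove "/txt" only when it is the very end of the string."""
    # "$" anchors the pattern at the end, so "/txt" elsewhere is left alone.
    return re.sub(r"/txt$", "", path)


cleaned = [strip_trailing_txt(p) for p in paths]
print(cleaned)
```

Only the third string is left unchanged, because its "/txt" sits at the start rather than at the end.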

Superscript price decimal numbers when using Standard Numeric Format Strings

I'm using Standard Numeric Format Strings (see here) to format pricing on my page:
Dim price As Integer = 378
Dim s As String = (CDec(price) / 100).ToString("F")
results in string "3,78" which I print in HTML.
However, I want the decimal part "78" to be superscripted using HTML/CSS. How can I do so?
Unless the text is automatically HTML encoded (and you have access to the source of all this) you can use the <sup> tag.
To superscript only the 78 part just split the string by , (or . if that is the case) into an array and join it together again with the tag included.
For example:
Dim sep As Char = If(s.Contains("."), "."c, ","c) ' Decimal separator actually present in s
Dim Parts() As String = s.Split(sep)
Dim s2 As String = Parts(0) & sep & "<sup>" & Parts(1) & "</sup>"
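An alternative that avoids splitting a formatted string is to build the markup directly from the integer price. The following Python sketch illustrates that idea; price_html is a hypothetical helper, and it assumes a non-negative price given in cents.

```python
def price_html(cents: int, separator: str = ",") -> str:
    """Format a price in cents with the decimals wrapped in <sup>."""
    # Assumption: cents is non-negative, as in the question's example (378).
    units, fraction = divmod(cents, 100)
    # :02d keeps the leading zero, e.g. 305 -> "05".
    return f"{units}{separator}<sup>{fraction:02d}</sup>"


print(price_html(378))  # 3,<sup>78</sup>
```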

Access 2010 sql query to format 14 character finance data

I have raw finance text files that I'm importing into Access 2010 and exporting in Excel format. These files contain several 14 character length fields which represent dollar values. I'm having issues converting these fields into currency because of the 14th character. The 14th character is a number represented by a bracket or letter. It also dictates whether the unique field is a positive or negative value.
Positive numbers 0 to 9 start with open bracket { being zero, A being one, B being two,...I being nine.
Negative numbers -0 to -9 (I know, -0 is a mathematical faux pas but stay with me. I don't know how else to explain it.) start with close bracket } being -0, J being -1,K being -2,...R being -9.
Example data (all belonging to the same field/column):
0000000003422{ converted is $342.20
0000000006245} converted is -$624.50
0000000000210N converted is -$21.05
0000000011468D converted is $1,146.84
Here's the query that I'm working with. Each time I execute it, the entire field is deleted though. I would prefer to stick to a SQL query if possible but I'm open to all methods of resolution.
SET FIELD_1 = Format(Left([FIELD_1],12) & "." & Mid([FIELD_1],13,1) & IIf(Right([FIELD_1],1)="{",0,IIf(Right([FIELD_1],1)="A",1,IIf(Right([FIELD_1],1)="B",2,IIf(Right([FIELD_1],1)="C",3,IIf(Right([FIELD_1],1)="D",4,IIf(Right([FIELD_1],1)="E",5,IIf(Right([FIELD_1],1)="F",6,IIf(Right([FIELD_1],1)="G",7,IIf(Right([FIELD_1],1)="H",8,IIf(Right([FIELD_1],1)="I",9,"")))))))))),"$##0.00"), IIf(Right([FIELD_1],1)="}",0,IIf(Right([FIELD_1],1)="J",1,IIf(Right([FIELD_1],1)="K",2,IIf(Right([FIELD_1],1)="L",3,IIf(Right([FIELD_1],1)="M",4,IIf(Right([FIELD_1],1)="N",5,IIf(Right([FIELD_1],1)="O",6,IIf(Right([FIELD_1],1)="P",7,IIf(Right([FIELD_1],1)="Q",8,IIf(Right([FIELD_1],1)="R",9,"")))))))))),"-$##0.00")
Here is a function that you can call to convert an input string like the ones in your example into a string formatted as you desire.
Private Function ConvertCurrency(strCur As String) As String
Const DIGITS = "{ABCDEFGHI}JKLMNOPQR"
Dim strAlphaDgt As String
Dim intDgt As Integer, intSign As Integer
Dim f As Integer
Dim curConverted As Currency
strAlphaDgt = Right(strCur, 1) ' Extract 1st char from right
f = InStr(DIGITS, strAlphaDgt) ' Search char in DIGITS. Its position is related to digit value
intDgt = (f - 1) Mod 10 ' Converts position into value of the digit
intSign = 1 - 2 * Int((f - 1) / 10) ' Positive if the char is in the 1st half of DIGITS, negative if it is in the 2nd half
curConverted = intSign * _
CCur(Left(strCur, Len(strCur) - 1) & _
Chr(intDgt + 48)) / 100 ' Rebuild a currency value with 2 decimal digits
ConvertCurrency = Format(curConverted, _
"$#,##0.00") ' Format output, keeping a leading zero for amounts under $1
End Function
If you need to have a Currency as returned value, you can change the type returned from String to Currency and return the content of curConverted variable.
Bye.
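The decoding rule can be checked against the four sample values from the question with a short Python sketch. This is an illustration of the same lookup-by-position idea, not the Access solution itself; the function names are made up, and it assumes the last character is always one of the twenty listed symbols.

```python
from decimal import Decimal

# Position in this string encodes both the last digit and the sign.
DIGITS = "{ABCDEFGHI}JKLMNOPQR"


def convert_currency(raw: str) -> Decimal:
    """Decode a field whose last character carries the final digit and the sign."""
    pos = DIGITS.index(raw[-1])    # raises ValueError for an unknown symbol
    digit = pos % 10               # value of the last digit
    sign = -1 if pos >= 10 else 1  # second half of DIGITS means negative
    return sign * Decimal(int(raw[:-1] + str(digit))) / 100


def format_currency(value: Decimal) -> str:
    """Render as $1,234.56 with a leading minus for negative amounts."""
    prefix = "-" if value < 0 else ""
    return f"{prefix}${abs(value):,.2f}"


for field in ("0000000003422{", "0000000006245}", "0000000000210N", "0000000011468D"):
    print(field, format_currency(convert_currency(field)))
```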

Find first comma in string, then extract value between spaces

I'm extracting rows from a txt file.
This row contains values like this:
DESCRIPTION 1 1.234,00 15.980,00 [etc.]
I would like to extract these values (I mean only numeric values).
So I thought I would find the first comma, run a For loop backwards until the first white space, and then run a For loop forwards over the decimal digits.
Then I should go to the second comma and perform these loops again.
Can you suggest some code that could be useful for my solution?
From your description, if you just need the decimal number before the comma, then you can do this with a pretty simple regex:
Dim s = "DESCRIPTION 1 1.234,00 15.980,00"
Dim pattern = "\d+(\.\d+)?,\d+"
Dim matches = System.Text.RegularExpressions.Regex.Matches(s, pattern)
For Each match As System.Text.RegularExpressions.Match In matches
Console.WriteLine(match.Value)
Next
'Outputs:
'
'1.234,00
'15.980,00
Here's a quick breakdown of the regex:
\d+ - \d is shorthand for [0-9], which just means "any numeric character". The + just indicates "one or more"
\. - this just matches a period character.
, - this just matches a comma.
( ... ) - parentheses just creates a group (think of it as a sub-regex)
? - question marks mean that the previous item is optional. In this case, that means that the group matching (\.\d+)? is optional, which allows you to match both 0.000,00 and 0,00
In that regex, if the comma and period are optional, then you can add a ? after them.
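Once the tokens are matched, they usually still have to be turned into numbers. The Python sketch below uses the same pattern (with a non-capturing group so that findall returns whole matches) and then converts each token; extract_amounts is a hypothetical name, and the conversion assumes "." is the thousands separator and "," the decimal separator, as in the sample row.

```python
import re
from decimal import Decimal

# Same pattern as above; (?: ... ) groups without capturing.
PATTERN = re.compile(r"\d+(?:\.\d+)?,\d+")


def extract_amounts(row: str) -> list:
    """Return every 1.234,00-style number in the row as a Decimal."""
    amounts = []
    for token in PATTERN.findall(row):
        # Drop the thousands separator, then switch to a decimal point.
        amounts.append(Decimal(token.replace(".", "").replace(",", ".")))
    return amounts


print(extract_amounts("DESCRIPTION 1 1.234,00 15.980,00"))
```

The standalone 1 in the row is skipped because it has no comma. Numbers with more than one thousands group (for example 1.234.567,00) would need (?:\.\d+)* instead of (?:\.\d+)?.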
My Visual Basic knowledge is pretty limited, but can't you utilize the IsNumeric function available in VB.NET?
Something like this:
' initial string/row/etc
Dim s As String = "DESCRIPTION 1 1.234,00 15.980,00"
' Split string based on spaces
Dim words As String() = s.Split(New Char() {" "c})
' Use For Each loop over split and display them
Dim word As String
For Each word In words
If IsNumeric(word) Then
Console.WriteLine(word & " is numeric")
Else
Console.WriteLine(word & " is not numeric")
End If
Next
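I think you'll need to look at System.Text.RegularExpressions.
' n1 captures the digits just before the comma, n2 the decimals after it
Dim m As System.Text.RegularExpressions.Match = System.Text.RegularExpressions.Regex.Match("DESCRIPTION 1 1.234,00 15.980,00", ".*?( [0-9]*?.(?'n1'[0-9]+),(?'n2'[0-9]+))")
While m.Success
System.Diagnostics.Debug.WriteLine(m.Groups("n1").Value & " " & m.Groups("n2").Value)
m = m.NextMatch()
End While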
If the columns are fixed width, you can get the values like this:
Dim input As String = "DESCRIPTION 1 1.234,00 15.980,00"
Dim col1 As String = input.SubString(17, 12).Trim()
Dim col2 As String = input.SubString(29).Trim()
