Extract a regex capture group from a string in MariaDB

For example:
Regex: District ([0-9]{1,2})([^0-9]|$)
Input District 12 2021 returns 12
Input Southern District 3 returns 3
Input FooBar returns NULL

The function REGEXP_SUBSTR doesn't allow extracting a single capturing group.
You can use e.g. REGEXP_REPLACE(input, regex, '\\1') to replace occurrences of regex in input with the first capture group of regex.
The following stored function makes this easy to use:
DELIMITER $$
CREATE FUNCTION regexp_extract(inp TEXT, regex TEXT, capture INT) RETURNS TEXT DETERMINISTIC
BEGIN
DECLARE capstr VARCHAR(5);
DECLARE mregex TEXT;
IF inp IS NULL OR LENGTH(inp) = 0 OR inp NOT REGEXP regex THEN
RETURN NULL;
END IF;
SET capstr = CONCAT('\\', capture);
SET mregex = CONCAT('.*', regex, '.*'); -- Want to match the entire input string so it all gets replaced
RETURN REGEXP_REPLACE(inp, mregex, capstr);
END;
$$
DELIMITER ;
Used like so:
SELECT regexp_extract('District 12 2021', 'District ([0-9]{1,2})([^0-9]|$)', 1);
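For comparison, the same capture-group behaviour can be sketched outside the database. The Python snippet below is only an illustration and not part of the MariaDB answer; the function name `regexp_extract` is reused for readability, and `None` stands in for SQL NULL. One assumed difference: `re.search` returns the first match in the input, whereas the stored function's greedy leading `.*` favours the last one. For the sample inputs both give the same result.

```python
import re

def regexp_extract(inp, regex, capture):
    """Return capture group `capture` of the first match of `regex` in `inp`, or None."""
    if not inp:
        # Mirrors the NULL / empty-string guard in the stored function
        return None
    match = re.search(regex, inp)
    return match.group(capture) if match else None

pattern = r"District ([0-9]{1,2})([^0-9]|$)"
for text in ("District 12 2021", "Southern District 3", "FooBar"):
    print(text, "->", regexp_extract(text, pattern, 1))
```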

For those users who might be stuck with an earlier version of MySQL or MariaDB which does not have REGEXP_REPLACE available, we can also use SUBSTRING_INDEX here:
SELECT SUBSTRING_INDEX(
SUBSTRING_INDEX('Southern District 3', 'District ', -1), ' ', 1); -- 3
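To make the nested SUBSTRING_INDEX calls easier to follow, here is a rough Python imitation of the function. The helper `substring_index` is a hypothetical stand-in written for this illustration, not a library call. It also shows a limitation of this approach: when the delimiter is absent the whole string comes back, so an input like FooBar would not yield NULL.

```python
def substring_index(s, delim, count):
    """Rough imitation of SUBSTRING_INDEX(s, delim, count)."""
    if count == 0 or not delim:
        return ""
    parts = s.split(delim)
    if count > 0:
        # Everything to the left of the count-th delimiter
        return delim.join(parts[:count])
    # Negative count: everything to the right of the count-th delimiter from the end
    return delim.join(parts[count:])

inner = substring_index("Southern District 3", "District ", -1)
print(substring_index(inner, " ", 1))
```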

Related

PLSQL SUBSTR function ignore the trailing zero

select TO_NUMBER (SUBSTR(10.31, INSTR (10.31, '.') + 1)) from dual
Above query returns 31 as the output. But below query returns 3 as the output.
select TO_NUMBER (SUBSTR(10.30, INSTR (10.30, '.') + 1)) from dual
How could I get the 30 as the output instead of the 3?

As it seems (from comments) that you are starting with a numeric value that you want to turn into words, you should begin by splitting it into dollars and cents.
If you really need to use substr etc, then you could start with a known format, such as to_char(amount,'fm9990.00'), so it will be a string with exactly two decimal places. However, if you have the numeric value it would be easier to convert it into the desired units using arithmetic functions. Whole dollars are trunc(amount) and cents are 100 * mod(amount,1).
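The arithmetic split can be sketched as follows. This is a Python illustration, not Oracle code, and `split_amount` is a hypothetical name. It assumes the amount arrives as a decimal string so that no binary floating-point rounding is involved.

```python
from decimal import Decimal

def split_amount(amount):
    """Split a decimal amount into whole dollars and cents using arithmetic only."""
    value = Decimal(amount)
    dollars = int(value)                    # truncates toward zero, like trunc(amount)
    cents = int((value - dollars) * 100)    # like 100 * mod(amount, 1)
    return dollars, cents

for amount in ("10.30", "0.25", "25"):
    print(amount, "->", split_amount(amount))

# Fixed two-decimal formatting, comparable in spirit to to_char(amount, 'fm9990.00')
print(format(Decimal("10.3"), ".2f"))
```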
Another issue is that the 'Jsp' date format approach can't handle zeroes. If you are using Oracle 12.2 or later there is a workaround using the default on conversion error clause:
create table demo
( amount number(6,2) );
insert into demo values (10.3);
insert into demo values (.25);
insert into demo values (25);
select amount
, nvl(to_char(to_date(trunc(amount) default null on conversion error,'J'),'Jsp'),'Zero') as dollars
, nvl(to_char(to_date(100 * mod(amount,1) default null on conversion error,'J'),'Jsp'),'Zero') as cents
from demo;
  AMOUNT DOLLARS      CENTS
-------- ------------ -------------
   10.30 Ten          Thirty
   25.00 Twenty-Five  Zero
    0.25 Zero         Twenty-Five
In 12.1 you could get around it using an inline function (maybe not a bad idea even in later versions, to simplify the rest of the query):
with
function to_words(num number) return varchar2 as
begin
return
case num
when 0 then 'Zero'
else to_char(to_date(num,'J'),'Jsp')
end;
end;
select amount
, to_words(trunc(amount)) as dollars
, to_words(100 * mod(amount,1)) as cents
from demo;
For values greater than 5373484 (the Julian representation of date '9999-12-31'), you can use this from Ask Tom: Spell the number (converted here to a WITH clause, but you can create it as a standalone function):
with function spell_number
( p_number in number )
return varchar2
as
-- Tom Kyte, 2001:
-- https://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:1407603857650
l_num varchar2(50) := trunc(p_number);
l_return varchar2(4000);
type myarray is table of varchar2(15);
l_str myarray :=
myarray
( ''
, ' thousand '
, ' million '
, ' billion '
, ' trillion '
, ' quadrillion '
, ' quintillion '
, ' sextillion '
, ' septillion '
, ' octillion '
, ' nonillion '
, ' decillion '
, ' undecillion '
, ' duodecillion ');
begin
for i in 1 .. l_str.count loop
exit when l_num is null;
if substr(l_num, length(l_num) -2, 3) <> 0 then
l_return := to_char(to_date(substr(l_num, length(l_num) - 2, 3), 'J'), 'Jsp') || l_str(i) || l_return;
end if;
l_num := substr(l_num, 1, length(l_num) - 3);
end loop;
return l_return;
end spell_number;
select amount
, spell_number(trunc(amount)) as dollars
, spell_number(100 * mod(amount,1)) as cents
from demo
/
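The three-digits-at-a-time idea behind spell_number can be sketched in Python as below. This is not a port of the Ask Tom function: the word lists, the lower-case output and the explicit "zero" case are choices made for this illustration, and the scale names stop at trillion to keep it short.

```python
ONES = ["", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine",
        "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen", "sixteen",
        "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]
SCALES = ["", " thousand", " million", " billion", " trillion"]

def spell_below_1000(n):
    """Spell an integer in the range 1..999."""
    words = []
    if n >= 100:
        words.append(ONES[n // 100] + " hundred")
        n %= 100
    if n >= 20:
        word = TENS[n // 10]
        if n % 10:
            word += "-" + ONES[n % 10]
        words.append(word)
    elif n > 0:
        words.append(ONES[n])
    return " ".join(words)

def spell_number(n):
    """Spell a non-negative integer by processing it in groups of three digits."""
    if n < 0 or n >= 1000 ** len(SCALES):
        raise ValueError("value out of range for this sketch")
    if n == 0:
        return "zero"
    parts = []
    scale = 0
    while n > 0:
        n, chunk = divmod(n, 1000)   # peel off the lowest three digits
        if chunk:
            parts.append(spell_below_1000(chunk) + SCALES[scale])
        scale += 1
    return " ".join(reversed(parts))

print(spell_number(1234567))
```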

I am actually surprised that your current query is even running without error, given that Oracle's SUBSTR function is supposed to operate on strings, not numbers. That being said, if you properly use your current query with strings, then it works:
SELECT TO_NUMBER(SUBSTR('10.30', INSTR ('10.30', '.') + 1)) FROM dual; -- returns 30
A more compact (though not necessarily more performant) way of doing this might be to use REGEXP_SUBSTR:
SELECT REGEXP_SUBSTR('10.30', '[0-9]+$') FROM dual;
This returns the run of digits at the end of the string, which is the fractional part whenever a decimal point is present. For inputs with no decimal component, it returns all of the digits instead.
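The underlying effect is not specific to Oracle: once 10.30 is a number, its trailing zero no longer exists, so it cannot survive a conversion back to text. A small Python illustration follows, with `trailing_digits` as a hypothetical helper that applies the same pattern as the REGEXP_SUBSTR call.

```python
import re

def trailing_digits(text):
    """Return the run of digits at the end of `text`, or None if there is none."""
    match = re.search(r"[0-9]+$", text)
    return match.group(0) if match else None

# The numeric literal loses its trailing zero when converted to text
print(str(10.30))

# Starting from a string keeps it
print(trailing_digits("10.30"))

# Without a decimal point the whole number is returned
print(trailing_digits("25"))
```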

u-sql script can not obtain scalar value from dataset

In a U-SQL script I need to extract a value from a file into a rowset and then use it to build the name of the output file. How can I get that value out of the rowset and into a scalar variable?
In detail:
I have two input files: a CSV file with a set of fields, and a dictionary file. The first file has a name like ****ClientCode*****.csv. The second file, the dictionary, has two fields that map ClientCode to ClientCode2. My task is to extract the ClientCode value from the file name, look up ClientCode2 in the dictionary, insert it as a field in the output file (already implemented), and also name the output file ****ClientCode2****.csv.
Dictionary csv file has the content:
OldCode NewCode
6HAA Alfa
CCVV Beta
CVXX gamma
? Davis
The question is how to get ClientCode2 into a scalar variable so that I can write an expression for the output file name.
DECLARE @inputFile string = "D:/DFS_SSC_Automation/Tasks/FundInfo/ESP_FAD_GL_6HAA_20170930.txt"; // '6HAA' is the ClientCode here; it is mapped to another code in ClientCode_KVP.csv
DECLARE @outputFile string = "D:/DFS_SSC_Automation/Tasks/FundInfo/ClientCode_sftp_" + // 'ClientCode' should be replaced with the mapped code from ClientCode_KVP.csv
    DateTime.Now.ToString("yyyyMMdd") + "_" + // MM is the month; mm would be minutes
    DateTime.Now.ToString("HHmmss") + ".csv";
DECLARE @dictionaryFile string = "D:/DFS_SSC_Automation/ClientCode_KVP.csv";
@dict =
    EXTRACT [OldCode] string,
            [NewCode] string
    FROM @dictionaryFile
    USING Extractors.Text(skipFirstNRows : 1, delimiter : ',');
@theCode =
    SELECT Path.GetFileNameWithoutExtension(@inputFile).IndexOf([OldCode]) >= 0 ? 1 : 3 AS [CodeExists],
           [NewCode]
    FROM @dict
    UNION
    SELECT *
    FROM(
        VALUES
        (
            2,
            ""
        )) AS t([CodeExists],[NewCode]);
@code =
    SELECT [NewCode]
    FROM @theCode
    ORDER BY [CodeExists]
    FETCH 1 ROWS;
@GLdata =
    EXTRACT [ASAT] string,
            [ASOF] string,
            [BASIS_INDICATOR] string,
            [CALENDAR_DATE] string,
            [CR_EOP_AMOUNT] string,
            [DR_EOP_AMOUNT] string,
            [FUND_ID] string,
            [GL_ACCT_TYPE_IND] string,
            [TRANS_CLIENT_FUND_NUM] string
    FROM @inputFile
    USING Extractors.Text(delimiter : '|', skipFirstNRows : 1);
// Prepare output dataset
@FundInfoGL =
    SELECT "" AS [AccountPeriodEnd],
           "" AS [ClientCode],
           [FUND_ID] AS [FundCode],
           SUM(GL_ACCT_TYPE_IND == "A"? System.Convert.ToDecimal(DR_EOP_AMOUNT) : 0) AS [NetValueOtherAssets],
           SUM(GL_ACCT_TYPE_IND == "L"? System.Convert.ToDecimal(CR_EOP_AMOUNT) : 0) AS [NetValueOtherLiabilities],
           0.0000 AS [NetAssetsOfSeries]
    FROM @GLdata
    GROUP BY FUND_ID;
// NetAssetsOfSeries calculation
@FundInfoGLOut =
    SELECT [AccountPeriodEnd],
           [NewCode] AS [ClientCode],
           [FundCode],
           Convert.ToString([NetValueOtherAssets]) AS [NetValueOtherAssets],
           Convert.ToString([NetValueOtherLiabilities]) AS [NetValueOtherLiabilities],
           Convert.ToString([NetValueOtherAssets] - [NetValueOtherLiabilities]) AS [NetAssetsOfSeries]
    FROM @FundInfoGL
    CROSS JOIN @code;
// Output
OUTPUT @FundInfoGLOut
TO @outputFile
USING Outputters.Text(outputHeader : true, delimiter : '|', quoting : false);

As David points out: You cannot assign query results to scalar variables.
However, we have a dynamic partitioned output feature in private preview right now that will give you the ability to generate file names based on column values. Please contact me if you want to try it out.

You can't. Please see Convert Rowset variables to scalar value.
You may still be able to achieve your ultimate goal in a different manner. Please consider re-writing your post with clear & concise language, small dataset, expected output, and a very minimal amount of code needed to repro - remove all details and nuances that aren't necessary to create a test case.

Access 2010 sql query to format 14 character finance data

I have raw finance text files that I'm importing into Access 2010 and exporting in Excel format. These files contain several 14 character length fields which represent dollar values. I'm having issues converting these fields into currency because of the 14th character. The 14th character is a number represented by a bracket or letter. It also dictates whether the unique field is a positive or negative value.
Positive numbers 0 to 9 start with open bracket { being zero, A being one, B being two,...I being nine.
Negative numbers -0 to -9 (I know, -0 is a mathematical faux pas but stay with me. I don't know how else to explain it.) start with close bracket } being -0, J being -1,K being -2,...R being -9.
Example data (all belonging to the same field/column):
0000000003422{ converted is $342.20
0000000006245} converted is -$624.50
0000000000210N converted is -$21.05
0000000011468D converted is $1,146.84
Here's the query that I'm working with. Each time I execute it, the entire field is deleted though. I would prefer to stick to a SQL query if possible but I'm open to all methods of resolution.
SET FIELD_1 = Format(Left([FIELD_1],12) & "." & Mid([FIELD_1],13,1) & IIf(Right([FIELD_1],1)="{",0,IIf(Right([FIELD_1],1)="A",1,IIf(Right([FIELD_1],1)="B",2,IIf(Right([FIELD_1],1)="C",3,IIf(Right([FIELD_1],1)="D",4,IIf(Right([FIELD_1],1)="E",5,IIf(Right([FIELD_1],1)="F",6,IIf(Right([FIELD_1],1)="G",7,IIf(Right([FIELD_1],1)="H",8,IIf(Right([FIELD_1],1)="I",9,"")))))))))),"$##0.00"), IIf(Right([FIELD_1],1)="}",0,IIf(Right([FIELD_1],1)="J",1,IIf(Right([FIELD_1],1)="K",2,IIf(Right([FIELD_1],1)="L",3,IIf(Right([FIELD_1],1)="M",4,IIf(Right([FIELD_1],1)="N",5,IIf(Right([FIELD_1],1)="O",6,IIf(Right([FIELD_1],1)="P",7,IIf(Right([FIELD_1],1)="Q",8,IIf(Right([FIELD_1],1)="R",9,"")))))))))),"-$##0.00")

Here is a function that you can call to convert an input string like the ones in your example into a string formatted as you desire.
Private Function ConvertCurrency(strCur As String) As String
Const DIGITS = "{ABCDEFGHI}JKLMNOPQR"
Dim strAlphaDgt As String
Dim intDgt As Integer, intSign As Integer
Dim f As Integer
Dim curConverted As Currency
strAlphaDgt = Right(strCur, 1) ' Extract 1st char from right
f = InStr(DIGITS, strAlphaDgt) ' Search char in DIGITS. Its position is related to digit value
intDgt = (f - 1) Mod 10 ' Converts position into value of the digit
intSign = 1 - 2 * Int((f - 1) / 10) ' If it's in the 1st half is positive, if in the 2nd half of DIGITS it's negative
curConverted = intSign * _
CCur(Left(strCur, Len(strCur) - 1) & _
Chr(intDgt + 48)) / 100 ' Rebuild a currency value with 2 decimal digits
ConvertCurrency = Format(curConverted, _
"$#,###.00") ' Format output
End Function
If you need to have a Currency as returned value, you can change the type returned from String to Currency and return the content of curConverted variable.
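The position-based decoding used by ConvertCurrency can be checked against the sample data from the question with a short Python sketch. The function name `convert_signed_field` is hypothetical, and the sketch returns a Decimal rather than a formatted string.

```python
from decimal import Decimal

# Position 0-9 encodes a positive last digit, position 10-19 a negative one
DIGITS = "{ABCDEFGHI}JKLMNOPQR"

def convert_signed_field(field):
    """Decode a 14-character amount whose last character carries digit and sign."""
    pos = DIGITS.index(field[-1])        # raises ValueError for an unknown character
    digit = pos % 10                     # value of the final digit
    sign = -1 if pos >= 10 else 1        # second half of DIGITS means negative
    return sign * Decimal(field[:-1] + str(digit)) / 100

for raw in ("0000000003422{", "0000000006245}", "0000000000210N", "0000000011468D"):
    print(raw, "->", convert_signed_field(raw))
```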

How to remove an extra char which is coming in XMLAGG() output

I'm using Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production.
We replaced LISTAGG() with XMLAGG() to avoid the concatenation error.
When I compare the length of the output from the two functions, XMLAGG() returns one extra character.
Could you please suggest how I can overcome this issue?
Please find the SQL and output below.
XMLAGG():
SELECT TO_CHAR (
SUBSTR (
XMLAGG (XMLELEMENT (e, table_name, CHR (13)).EXTRACT (
'//text()') ORDER BY tablespace_name).GetClobVal (),
1,
2000))
AS str_concate,
LENGTH (
TO_CHAR (
SUBSTR (
XMLAGG (XMLELEMENT (e, table_name, CHR (13)).EXTRACT (
'//text()') ORDER BY tablespace_name).GetClobVal (),
1,
2000)))
AS str_length
FROM all_tables
WHERE table_name = 'TEST_LOAD';
OUTPUT:
STR_CONCATE STR_LENGTH
TEST_LOAD TEST_LOAD 26
LISTAGG()
SELECT LISTAGG (SUBSTR (table_name, 1, 2000), CHR (13))
WITHIN GROUP (ORDER BY tablespace_name)
AS str_concate,
LENGTH (
LISTAGG (SUBSTR (table_name, 1, 2000), CHR (13))
WITHIN GROUP (ORDER BY tablespace_name))
AS str_length
FROM all_tables
WHERE table_name = 'TEST_LOAD';
OUTPUT:
STR_CONCATE STR_LENGTH
TEST_LOAD TEST_LOAD 25

With XMLELEMENT, you actually create a node of the XML tree with two children: table_name and CHR(13). (It may end up looking like a single node since both are text, but that is not important.) This is the expansion of the value_expr nonterminal. The essential point is that the node is not aware of other nodes, so CHR(13) is added to every node as its suffix or, in other words, its terminator.
With LISTAGG, you describe an aggregation of multiple elements. Here CHR(13) serves as the delimiter (see the syntax diagram), which is put between elements. It is a separator rather than a terminator.
Since XMLAGG does not suffer from the 4000-character limit, I usually prefer XMLAGG.
If a separator is needed, I recommend prepending it to each value and cutting off the first occurrence using substr. Appending it after each value is possible but makes the expression harder to write.
substr(
xmlagg(
xmlelement(e, ', ' || table_name).extract('//text()')
order by tablespace_name
).getclobval(),
3 -- length(', ')+1
)
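The terminator versus separator distinction, and the prepend-then-cut trick, can be illustrated with plain string operations. The Python sketch below uses made-up table names and does not touch XML at all; it only shows where the one extra character comes from.

```python
names = ["TABLE_A", "TABLE_B"]  # hypothetical values, for illustration only

# Terminator: the character is appended to every element (the XMLELEMENT case)
terminated = "".join(name + "\r" for name in names)

# Separator: the character is placed between elements (the LISTAGG case)
separated = "\r".join(names)

print(len(terminated) - len(separated))  # one trailing terminator too many

# Prepend the separator to every element, then cut the first occurrence
sep = ", "
prepended = "".join(sep + name for name in names)
print(prepended[len(sep):])
```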

Find first comma in string, then extract value between spaces

I'm extracting rows from a txt file.
This row contains values like this:
DESCRIPTION 1 1.234,00 15.980,00 [etc.]
I would like to extract these values (I mean only numeric values).
So I thought I would find the first comma, loop backwards to the first white space, and then loop forwards over the decimal digits.
Then I would go to the second comma and repeat these loops.
Can you suggest some code that could be useful for my solution?

From your description, if you just need the decimal number before the comma, then you can do this with a pretty simple regex:
Dim s = "DESCRIPTION 1 1.234,00 15.980,00"
Dim pattern = "\d+(\.\d+)?,\d+"
Dim matches = System.Text.RegularExpressions.Regex.Matches(s, pattern)
For Each match As System.Text.RegularExpressions.Match In matches
Console.WriteLine(match.Value)
Next
'Outputs:
'
'1.234,00
'15.980,00
Here's a quick breakdown of the regex:
\d+ - \d is shorthand for [0-9], which just means "any numeric character". The + just indicates "one or more"
\. - this just matches a period character.
, - this just matches a comma.
( ... ) - parentheses just creates a group (think of it as a sub-regex)
? - question marks mean that the previous item is optional. In this case, that means that the group matching (\.\d+)? is optional, which allows you to match both 0.000,00 and 0,00
In that regex, if the comma and period are optional, then you can add a ? after them.
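The same pattern behaves identically in other regex engines. As a cross-check, here it is in Python (an illustration only; the answer itself targets VB.NET). `re.finditer` is used because `re.findall` would return the contents of the optional group rather than the full match.

```python
import re

s = "DESCRIPTION 1 1.234,00 15.980,00"
pattern = r"\d+(\.\d+)?,\d+"

# group(0) is the whole match, regardless of the optional inner group
values = [m.group(0) for m in re.finditer(pattern, s)]
print(values)  # ['1.234,00', '15.980,00']
```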

My Visual Basic knowledge is pretty limited, but can't you utilize the IsNumeric function available in VB.NET?
Something like this:
' initial string/row/etc
Dim s As String = "DESCRIPTION 1 1.234,00 15.980,00"
' Split string based on spaces
Dim words As String() = s.Split(New Char() {" "c})
' Use For Each loop over split and display them
Dim word As String
For Each word In words
If IsNumeric(word) Then
Console.WriteLine(word & " is numeric")
Else
Console.WriteLine(word & " is not numeric")
End If
Next
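Whether IsNumeric accepts a token such as 1.234,00 depends on the regional settings, so a split-and-test approach may need an explicit conversion. A minimal Python sketch of that idea follows, assuming the period is always a thousands separator and the comma is always the decimal mark; `parse_european` is a hypothetical helper.

```python
from decimal import Decimal, InvalidOperation

def parse_european(token):
    """Parse a token like '1.234,00' into a Decimal, or return None if it is not numeric."""
    normalized = token.replace(".", "").replace(",", ".")
    try:
        # Note: Decimal also accepts special strings such as 'NaN' and 'Infinity'
        return Decimal(normalized)
    except InvalidOperation:
        return None

row = "DESCRIPTION 1 1.234,00 15.980,00"
for word in row.split():
    print(word, "->", parse_european(word))
```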

I think you'll need to look at System.Text.RegularExpressions.
Dim m As System.Text.RegularExpressions.Match = System.Text.RegularExpressions.Regex.Match("DESCRIPTION 1 1.234,00 15.980,00", ".*?( [0-9]*?.(?'n1'[0-9]+),(?'n2'[0-9]+))")
While m.Success
System.Diagnostics.Debug.WriteLine(m.Groups("n1").Value & " " & m.Groups("n2").Value)
m = m.NextMatch()
End While

If the columns are fixed width, you can get the values like this:
Dim input As String = "DESCRIPTION 1 1.234,00 15.980,00"
Dim col1 As String = input.Substring(17, 12).Trim()
Dim col2 As String = input.Substring(29).Trim()
