I need help extracting 49 from this string: "7 DAYS LATE 49 UNDERUSE" in Teradata, I keep on messing up with STRTOK or regexp_substr. Thanks
select strtok('7 DAYS LATE 49 UNDERUSE',' ',4);
You can use REGEXP_SUBSTR to extract a number between two strings. Here's one way:
SELECT REGEXP_SUBSTR(
'7 DAYS LATE 49 UNDERUSE', -- source string
'[0-9]+', -- regexp pattern: a run of digits
1, 2, 'i' -- additional options (start position, occurrence, match argument)
)
REGEXP_SUBSTR returns the whole matched substring, not a capture group, so the pattern matches only the digits and the occurrence argument (2) selects the second run of digits, which here is 49.
TD Manual
SQL Fiddle (Postgres)
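For checking the logic outside the database, here is a rough Python sketch of the two ideas in this thread: take the 4th space-delimited token, or take the second run of digits. It is only an illustration; the variable names are made up, and Python's re module is not identical to Teradata's regex engine.

```python
import re

text = "7 DAYS LATE 49 UNDERUSE"

# Token approach (like STRTOK with a space delimiter):
# the 4th token is at index 3 because Python lists are zero-based.
fourth_token = text.split(" ")[3]

# Regex approach (like REGEXP_SUBSTR with occurrence = 2):
# collect every run of digits and take the second one.
second_number = re.findall(r"[0-9]+", text)[1]

print(fourth_token, second_number)
```

The token approach depends on the number always being the 4th word; the regex approach depends on it always being the second number in the string.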
I have something like below stored in a table column. I need only 133 extracted from this.
015.133.Governmental Affairs
When I do
select regexp_substr('015.133.Governmental Affairs', '\.*+[[:digit:]]+*',1,2) from dual;
The result is .133
If I do
regexp_substr('015.133.Governmental Affairs', '\*+[[:digit:]]+*',1,2)
it returns nothing. What's the correct expression here?
The trick with coming up with a good regex is to be able to explain it in plain language first.
Editing to explain better hopefully.
Here I am matching zero or more digits followed by a literal period. The 4th argument to REGEXP_SUBSTR (2) says which occurrence of this pattern to match. Note that the pattern consists of 2 groups, each surrounded by parentheses. The 6th argument to REGEXP_SUBSTR says to return the 1st subgroup (the digits, not the period) when a match is found; if you put a 2 there you'd get the period that follows the number 133.
SELECT REGEXP_SUBSTR('015.133.Governmental Affairs', '([[:digit:]]*?)(\.)', 1, 2, NULL, 1) AS nbr
FROM dual;
NBR
---
133
1 row selected.
Here's something adapted from this question: How to extract group from regular expression in Oracle?
SELECT REGEXP_REPLACE(
'015.133.Governmental Affairs',
'^[[:digit:]]+\.([[:digit:]]+)\..*',
'\1'
) FROM DUAL;
The regex looks for a string that starts with a series of digits, then ., then more digits, then another ., then the rest of the string. It then replaces the entire match (which is the entire string) with \1, which is whatever was in that second set of digits, inside the parentheses.
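The capture-group idea can be tried out in Python as a quick sanity check. This is a sketch only: the pattern is the one from the answer with `[0-9]` written in place of `[[:digit:]]`, and the variable names are invented for the example.

```python
import re

value = "015.133.Governmental Affairs"
pattern = r"^[0-9]+\.([0-9]+)\..*"

# Replace the whole match with capture group 1,
# which is what REGEXP_REPLACE(..., '\1') does in the query above.
middle_via_replace = re.sub(pattern, r"\1", value)

# Alternatively, read the capture group straight from the match object.
match = re.match(pattern, value)
middle_via_group = match.group(1) if match else None

print(middle_via_replace, middle_via_group)
```

If the input does not match the pattern at all, the replace version returns the input unchanged, while the match version yields None, so the second form makes a failed match easier to detect.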
I am trying to create a query which groups payments into ranges (e.g. 4-, 5 - 9, 10 - 49, 50 - 99, 100 - 149, 150+).
If I try to order these by the above range they appear in alphabetical order (as you would expect).
Is it possible for me to order these by a manual list (see the range above)?
What's your TD release?
TD14 supports regular expressions, simply extract the first string of digits and cast it to an integer:
ORDER BY CAST(REGEXP_SUBSTR(grp, '[0-9]+') AS INTEGER)
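To see why sorting on the first run of digits gives the intended order, here is a small Python sketch of the same idea. The label list is taken from the question; the function name is hypothetical.

```python
import re

labels = ["10 - 49", "150+", "4-", "100 - 149", "5 - 9", "50 - 99"]

def first_number(label: str) -> int:
    """Return the first run of digits in the label as an integer (0 if none)."""
    m = re.search(r"[0-9]+", label)
    return int(m.group()) if m else 0

# Sort by the numeric lower bound instead of alphabetically.
ordered = sorted(labels, key=first_number)
print(ordered)
```

Because only the first number is used, it does not matter whether the last bucket is written as an open range such as 150+ or as a closed one.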
You can use OTRANSLATE, which basically allows you to specify characters and replace them with another.
EDIT : Thanks to JNeville for setting me straight on this being ranges. The same idea still applies though, if you take his suggestion to make the last entry a range as well.
So, assuming you only have numbers, +, and -, and white space:
select
otranslate(<your column>,'+- ','')
from
<your table>
Which should return just the numeric portion of those strings. Then you should be able to cast it as an integer, and sort it.
create volatile table vt as
(select cast ('-5' as varchar(10)) as theCol)
with data
on commit preserve rows;
INSERT into vt values ('10 - 49');
INSERT INTO vt values ('50 - 99');
insert into vt
values ('150-9999');
select
cast (otranslate(theCol,'+- ','') as integer) as theNum
from
vt
order by theNum
5
1049
5099
1509999
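The strip-then-cast idea can be sketched in Python with str.translate, which plays the role OTRANSLATE plays above. This is an illustration under the same assumption as the answer (only digits, +, - and spaces appear); the names are made up.

```python
# Translation table that deletes '+', '-' and ' ' (the characters removed by OTRANSLATE above).
strip_chars = str.maketrans("", "", "+- ")

rows = ["50 - 99", "150-9999", "-5", "10 - 49"]

def as_number(label: str) -> int:
    """Remove the separator characters and read what is left as one integer."""
    return int(label.translate(strip_chars))

ordered = sorted(rows, key=as_number)
print([as_number(r) for r in ordered])
print(ordered)
```

Note that both ends of the range are glued together into one number, so an open-ended label such as 150+ would become 150 and sort before 1049. That is why the last entry needs to be a range as well for this approach to work.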
I am wanting to sort my data but the standard Excel "A to Z" sort function isn't cutting it. I was hoping someone knew how to make a custom sort that could suit my needs. Here is a sample:
chrPos count
chr1_10000598 10
chr1_10000647 10
chr1_10001370 30
chr1_10001390 30
chr1_10001392 30
chr1_10001414 30
chr1_10001418 30
chr1_10001473 10
chr1_10001505 10
chr1_10001516 20
chr1_1000156 30
As you can see, the last row is out of place when using the built-in sort function; it should be the first row here, not the last. I think adding a second layer of sorting would do the trick, but that layer would have to sort ascending on the number that follows the underscore.
Any ideas? Would this possibly be easier with R instead?
Edit to add details from comments:
Sorting is to be ascending on the numeric part after the underscore, within ascending on the chr numeric part (running from 1 to 22 both inclusive) and then chrM_, chrX_ and chrY_ in that order (also with their numeric parts sorted ascending).
The numeric part after the underscore may be up to 8 digits.
Assuming chrPos is in ColumnA, please try in a helper column:
=IF(FIND("_",A1)=5,CHAR(64+MID(A1,4,1)),CHAR(64+MID(A1,4,2)))&REPT("0",8-LEN(A1)+FIND("_",A1))&MID(A1,FIND("_",A1)+1,8)
OR, for additional requirements as mentioned in comments:
=IF(MID(A1,4,1)="M","W",IF(MID(A1,4,1)="X","X",IF(MID(A1,4,1)="Y","Y",IF(FIND("_",A1)=5,CHAR(64+MID(A1,4,1)),CHAR(64+MID(A1,4,2))))))&REPT("0",9-LEN(A1)+FIND("_",A1))&MID(A1,FIND("_",A1)+1,9)
then select the helper column, Copy, Paste Special, Values over the top and use that for sorting.
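If the data can be taken out of Excel, the same ordering can be expressed as a two-part sort key. The sketch below is in Python rather than R and is only one possible approach; it assumes every label looks like chr<name>_<digits>, with <name> being 1 to 22, M, X or Y as described in the edit above. The helper names are hypothetical.

```python
# Chromosome order from the question: 1..22, then M, X, Y.
CHROM_ORDER = [str(n) for n in range(1, 23)] + ["M", "X", "Y"]
RANK = {name: i for i, name in enumerate(CHROM_ORDER)}

def chr_pos_key(label: str):
    """Sort key: (rank of the chromosome, numeric position after the underscore)."""
    chrom, pos = label.split("_", 1)
    return (RANK[chrom[3:]], int(pos))  # chrom[3:] drops the leading "chr"

sample = [
    "chr1_10000598", "chr1_10001370", "chr1_1000156",
    "chrX_5", "chr2_7", "chrM_12", "chr10_3",
]
ordered = sorted(sample, key=chr_pos_key)
print(ordered)
```

Comparing the position as an integer does the job that the zero padding does in the helper-column formula.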
For a project, I have to loop through the alphabet and run a search for each letter against some values in a database. The function would return the number of matches for each letter.
I would like to be able to do this in a SQL Stored Procedure, but I'm not certain how I could do a 'FOR letter = A to Z' loop in a SP. Does anyone know how this could be done?
It depends on the alphabet. If you only need English characters, you can loop from 65 (ASCII for A) to 90 (Z) and use char letter = (char)i to get the letter.
If you also need non-English ones, just put the letters in a web config setting such as "ABC......Z" and loop through it.
with ATable(c) as
(
select cast('A' as CHAR(1)) as c
union all
select CHAR(ASCII(c)+1) as C from ATable where C<'Z'
)
select * from ATable
SQLFiddle demo
Use a loop going from 65 (A) to 90 (Z), and use the T-SQL CHAR() function.
Of course, I'm assuming that you are using a SQL Server database. If not, please post the DB you're using.
You need to use "group by" and possibly include "count" in the query too. You can find further information here: http://msdn.microsoft.com/en-us/library/ms177673.aspx
;WITH Alphabet AS
(
SELECT CHAR(65) AS Letter, 65 AS Code
UNION ALL
SELECT CHAR(Code + 1), Code + 1
FROM Alphabet
WHERE Code < 90
)
SELECT Letter
FROM Alphabet
WITH alpha AS
(
SELECT 65 AS c
UNION ALL
SELECT c + 1 FROM alpha
WHERE c < 90
)
SELECT CHAR(c) FROM alpha
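The 65-to-90 loop and the group-by-and-count suggestion can be combined. The Python sketch below shows the shape of the result (one row per letter, including letters with no matches); the list of names is invented purely for illustration and stands in for the database column.

```python
from collections import Counter

# Hypothetical values standing in for the database column.
names = ["Adams", "allen", "Blake", "Clark", "Zed"]

# "GROUP BY first letter, COUNT(*)": tally names by upper-cased first letter.
first_letters = Counter(name[0].upper() for name in names if name)

# Loop over character codes 65 (A) to 90 (Z) so every letter appears, even with 0 matches.
counts = {chr(code): first_letters.get(chr(code), 0) for code in range(65, 91)}

print(counts)
```

In SQL the equivalent would be to left join the generated alphabet to the grouped counts, so that letters without matches still show up with a count of zero.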
This question is very basic; however, after reading an answer to a given question I got fairly confused (I don't know why, as it is a simple subject).
Consider this basic query:
SELECT * FROM emp WHERE ename BETWEEN 'A' AND 'C'
The employee names returned will be those that start with A and B, and the explanation is as follows:
Here, a character column is compared against a string using the
BETWEEN operator, which is equivalent to ename >= 'A' AND ename <=
'C'. The name CLARK will not be included in this query, because
'CLARK' is > 'C'.
Why is Clark considered greater than 'C' if in the explanation we have the statement: ename is less than or equal to 'C' ?
Thank you.
Because when you alphabetically sort
Constant Clark C Claude
you'll get
C Clark Claude Constant
so
C < Clark < Claude < Constant
See Wikipedia for a more formal explanation, the essence is this (emphasis mine):
To decide which of two strings comes first in alphabetical order,
initially their first letters are compared. The string whose first
letter appears earlier in the alphabet comes first in alphabetical
order. If the first letters are the same, then the second letters are
compared, and so on, until the order is decided. (If one string runs
out of letters to compare, then it is deemed to come first; for
example, "cart" comes before "carthorse".) The result of arranging a
set of strings in alphabetical order is that words with the same first
letter are grouped together, and within such a group words with the
same first two letters are grouped together and so on.
Why is Clark considered greater than 'C' if in the explanation we have the statement: ename is less than or equal to 'C' ?
There is one thing to consider here. Suppose we have ALLEN, BLAKE, CLARK, ADAMS, A, C. Sorting alphabetically, we have
A
ADAMS
ALLEN
BLAKE
C
CLARK
That's why CLARK is not part of the range: it comes after C.
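The same comparison can be reproduced in Python, whose string comparison is also character by character, with a shorter string sorting before any longer string it is a prefix of. This is only a sketch: a database may apply collation rules (case, trailing blanks) that plain Python comparison does not.

```python
names = ["ALLEN", "BLAKE", "CLARK", "ADAMS", "A", "C"]

# Equivalent of: ename BETWEEN 'A' AND 'C'  (ename >= 'A' AND ename <= 'C')
in_range = [n for n in names if "A" <= n <= "C"]

print(sorted(names))
print(in_range)
print("CLARK" > "C")
```

CLARK and C agree on the first letter, but C runs out of letters first, so C sorts before CLARK and CLARK falls outside the upper bound.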