I am trying to extract the following from a text field using regex in Oracle.
For example:
"This is example,
and this really a example :h,j,j,j,j,
l //Updated question , as this letter is on the next line
now this is a disease:yes"
I am expecting h,j,j,j,j,l as the result, but if I use
REGEXP_SUBSTR(text_field,'example :[^:]+,') AS Result
I am getting example:h,j,j,j,j
But I am not getting the last letter 'l' shown above, and I am guessing that's because it's on the next line. Also, if I could get only the string "disease:yes", that would be very helpful as well. Thank you very much!
The result you are getting is because your pattern includes the word 'example' and ends with a comma, which leaves out the final 'l'. Try this form instead. Note that the example uses a Common Table Expression (CTE): the WITH clause creates a table called tbl that just sets up test data, much like a temp table. This is also a great way to set up data when asking a question.
This form of REGEXP_SUBSTR() uses a capture group (the last argument, 1, asks for the first group). The group holds the characters after the string 'example :' up to the end of that line in the multi-line field, because without the 'n' match parameter the dot does not match a newline. From this you should be able to get the other string you are after. Give it a go.
WITH tbl(text_field) AS (
SELECT 'This is example,
and this really a example :h,j,j,j,j,l
now this is a disease:yes' FROM dual
)
SELECT REGEXP_SUBSTR(text_field,'example :(.*)', 1, 1, NULL, 1) AS Result
FROM tbl;
RESULT
-----------
h,j,j,j,j,l
1 row selected.
Edit based on new info: since that last letter could be on its own line, you'll need to allow for the newline. Use the 'n' match parameter, which lets the dot (match any character) also match a newline. We switch to REGEXP_REPLACE() because we need to return multiple capture groups. Here the WITH clause sets up 2 rows, one with an embedded newline in the data and one without. The capture groups are (going left to right): 1, the data after "example :" up to and including a comma; 2, the optional newline; and 3, the next single character. The entire value is then replaced with capture groups 1 and 3 (leaving out the newline).
NOTE: this is very specific to the case of only 1 character on the following line.
WITH tbl(ID, text_field) AS (
SELECT 1, 'This is example,
and this really a example :h,j,j,j,j,
l
now this is a disease:yes' FROM dual UNION ALL
SELECT 2, 'This is example,
and this really a example :h,j,j,j,j,l
now this is a disease:yes' FROM dual
)
SELECT ID,
REGEXP_REPLACE(text_field, '.*example :(.*,)('||CHR(10)||')?(.).*', '\1\3', 1, 1, 'n') AS Result
FROM tbl;
        ID RESULT
---------- ------------
         1 h,j,j,j,j,l
         2 h,j,j,j,j,l
2 rows selected.
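For readers who want to try the pattern outside Oracle, here is a rough Python sketch of the same idea. It is an analogue, not the Oracle call itself: re.DOTALL plays the role of the 'n' match parameter, and the function name extract_list is made up for illustration. It shares the limitation noted above (only 1 character on the following line).

```python
import re

def extract_list(text):
    """Return the comma-separated list after 'example :', joining a lone
    trailing character that may sit on the next line (hypothetical helper)."""
    # re.DOTALL lets '.' match newlines, similar to Oracle's 'n' match parameter.
    # Group 1: everything after 'example :' up to the last comma in the text.
    # Group 2: the single character that follows, after an optional newline.
    match = re.search(r"example :(.*,)\n?(.)", text, flags=re.DOTALL)
    if match is None:
        return None
    return match.group(1) + match.group(2)

with_newline = "This is example,\nand this really a example :h,j,j,j,j,\nl\nnow this is a disease:yes"
without_newline = "This is example,\nand this really a example :h,j,j,j,j,l\nnow this is a disease:yes"

print(extract_list(with_newline))
print(extract_list(without_newline))
```

Because group 1 is greedy and the dot matches newlines, a comma anywhere later in the text would be swallowed too, which is another reason to treat this as specific to the sample data.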
I have more than 200,000 rows in a table with a column that contains the special character sequence 'Æ??', for example the text test1234Æ??. How can I replace that with a '?' symbol?
The character Æ is Unicode code point U+00C6 (just google "Æ unicode" to find that). With that knowledge, you can use the UNISTR function to represent the character. You can also just paste the character into your SQL, as shown below, if your client supports Unicode characters.
WITH mystrings AS
(SELECT 'found a Æ in this text' as str FROM DUAL UNION ALL
SELECT 'text is test1234Æ??' FROM DUAL
)
SELECT
REPLACE(str,UNISTR('\00c6'),'?') as clean_string,
REPLACE(str,'Æ','?') as clean_string2,
REPLACE(str,UNISTR('\00c6??'),'?') as clean_stringnoqm
FROM mystrings;
CLEAN_STRING CLEAN_STRING2 CLEAN_STRINGNOQM
----------------------- ----------------------- -----------------------
found a ? in this text found a ? in this text found a Æ in this text
text is test1234??? text is test1234??? text is test1234?
If you want to keep only letters, digits, commas and spaces, you could use a regular expression that replaces everything else. There are plenty of other answers around for that.
WITH mystrings AS
( SELECT
'Brand1® is supercool, but I prefer bRand2™ since it supports Æ' AS str
FROM dual
)
SELECT
regexp_replace(str,'[^A-Za-z0-9, ]', '?') AS clean_string
FROM mystrings;
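The query above is shown without its output, so here is a small Python sketch of the same whitelist idea applied to the same sample string. The function name clean_string is only illustrative, and Python's re.sub stands in for Oracle's regexp_replace here.

```python
import re

def clean_string(text):
    """Replace every character that is not a letter, digit, comma or space with '?'."""
    return re.sub(r"[^A-Za-z0-9, ]", "?", text)

sample = "Brand1® is supercool, but I prefer bRand2™ since it supports Æ"
print(clean_string(sample))
# prints: Brand1? is supercool, but I prefer bRand2? since it supports ?
```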
I need a procedure or function for validation:
when I insert data into a column, only values made up entirely of digits should be accepted, not values containing special characters or letters.
If I understand correctly, you need a function that takes a string of numbers, letters and special characters and keeps only the numbers.
In that case, try REGEXP_REPLACE.
E.g.:
SELECT REGEXP_REPLACE('1/A2!46', '([^0-9])+', '') FROM dual;
Output:
REGEXP_REPLACE('1/A2!46','([^0-9])+','')
-----------------------------------------
1246
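Note that stripping characters and validating a value are slightly different things. The Python sketch below shows both side by side. The function names are made up for illustration, and it assumes "integer" means one or more ASCII digits with no sign.

```python
import re

def digits_only(value):
    """Strip every non-digit character, like the REGEXP_REPLACE call above."""
    return re.sub(r"[^0-9]+", "", value)

def is_all_digits(value):
    """Validate instead of cleaning: True only if the whole value is digits."""
    return re.fullmatch(r"[0-9]+", value) is not None

print(digits_only("1/A2!46"))    # prints: 1246
print(is_all_digits("1/A2!46"))  # prints: False
print(is_all_digits("1246"))     # prints: True
```

If the goal is to reject bad input rather than silently repair it, the second form is closer to what the question asks for.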
I am trying to split a delimited string (separated by '|' or ','). I used fn:tokenize to implement this. Consider the example text below, which has 4 columns; the value in the 3rd column is the same character as the delimiter.
fn:tokenize("column1|column2|||column4", "|")
The above code gives me 5 values, 2 of which are empty:
column1
column2
column4
I also tried adding quotes around the column3 value, which does not give me the expected result either.
In MarkLogic 9 you can define your own custom tokenizer.
Apart from fn:tokenize splitting by regular expressions and thus requiring | to be escaped, this seems like a horrible data format. Setting aside the issues indicated by Michael Kay, and assuming that || always indicates a new field starting with | and that there are never empty columns, you can apply a simple hack: replace the pipe symbols with another character and convert back afterwards. This requires finding some Unicode character that cannot occur in your data set, though.
for $token in fn:tokenize(fn:replace("column1|||||column4", "\|\|", "|_"), "\|")
return fn:replace($token, "_", "|")
Result:
column1
|
|
column4
If the assumptions I made do not apply to your use case, you will have to determine another set of similarly strict assumptions to be able to parse your contents.
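The same placeholder trick can be sketched in Python to check how it behaves on both sample strings. This is only an illustration under the assumptions above (no empty columns, and || always starts a field whose value is |). The placeholder character U+0001 and the function name are my own choices, not part of the XQuery answer.

```python
# Assumption: this control character never occurs in the real data.
PLACEHOLDER = "\u0001"

def split_with_pipe_values(line):
    """Split on '|' while keeping fields whose value is itself '|'."""
    # Every '||' becomes '|<placeholder>': the first pipe stays a delimiter,
    # the second is protected from the split.
    marked = line.replace("||", "|" + PLACEHOLDER)
    # Split on the remaining pipes, then turn the placeholder back into '|'.
    return [token.replace(PLACEHOLDER, "|") for token in marked.split("|")]

print(split_with_pipe_values("column1|column2|||column4"))
print(split_with_pipe_values("column1|||||column4"))
```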
I do a full text search using a LIKE clause, and the text can contain a '%'.
What is a good way to search for a % sign in an sqlite database?
I did try
SELECT * FROM table WHERE text_string LIKE '%[%]%'
but that doesn't work in sqlite.
From the SQLite documentation:
If the optional ESCAPE clause is present, then the expression following the ESCAPE keyword must evaluate to a string consisting of a single character. This character may be used in the LIKE pattern to include literal percent or underscore characters. The escape character followed by a percent symbol (%), underscore (_), or a second instance of the escape character itself matches a literal percent symbol, underscore, or a single escape character, respectively.
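As a concrete illustration of the quoted passage, here is a minimal sketch using Python's built-in sqlite3 module. The table name notes, the helper name, and the choice of backslash as the escape character are assumptions made for the example; any single character can be declared after ESCAPE.

```python
import sqlite3

def rows_containing_percent(values):
    """Return the values that contain a literal % sign, using LIKE ... ESCAPE."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE notes (text_string TEXT)")
    conn.executemany("INSERT INTO notes VALUES (?)", [(v,) for v in values])
    # ESCAPE '\' declares the backslash as the escape character,
    # so \% in the pattern matches a literal percent sign.
    cursor = conn.execute(
        r"SELECT text_string FROM notes "
        r"WHERE text_string LIKE '%\%%' ESCAPE '\' ORDER BY rowid"
    )
    result = [row[0] for row in cursor]
    conn.close()
    return result

print(rows_containing_percent(["98.9% done", "no sign here", "100%"]))
# prints: ['98.9% done', '100%']
```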
We can achieve the same thing with the query below:
SELECT * FROM table WHERE instr(text_string, ?)>0
Here:
? => your search word
Example:
You can give the text directly, like
SELECT * FROM table WHERE instr(text_string, '%')>0
SELECT * FROM table WHERE instr(text_string, '98.9%')>0 etc.
Hope this helps better.
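For completeness, here is a short sketch of the instr() approach with the search word bound to the ? placeholder, again using Python's sqlite3 module. The table name notes and the function name are hypothetical. Since instr() does a plain substring search, % and _ need no escaping.

```python
import sqlite3

def search_literal(values, needle):
    """Return the values that contain needle as a plain substring."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE notes (text_string TEXT)")
    conn.executemany("INSERT INTO notes VALUES (?)", [(v,) for v in values])
    # The search word is passed as a bound parameter, matching the ? above.
    cursor = conn.execute(
        "SELECT text_string FROM notes "
        "WHERE instr(text_string, ?) > 0 ORDER BY rowid",
        (needle,),
    )
    result = [row[0] for row in cursor]
    conn.close()
    return result

data = ["98.9% done", "rate 98.9", "50%_off"]
print(search_literal(data, "%"))      # prints: ['98.9% done', '50%_off']
print(search_literal(data, "98.9%"))  # prints: ['98.9% done']
```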