What is an equivalent to OTRANSLATE in BigQuery? - teradata

I'm trying to translate a query to run in BigQuery that uses the OTRANSLATE function from Teradata. For example,
SELECT OTRANSLATE(text, 'ehlo', 'EHLO')
FROM (
SELECT 'hello world' AS text UNION ALL
SELECT 'elliott'
);
This should produce:
HELLO wOrLd
ELLiOtt
Is there any way of expressing this function in BigQuery? It doesn't look like there is a direct equivalent.

Another, slightly different approach (BigQuery Standard SQL)
#standardSQL
CREATE TEMP FUNCTION OTRANSLATE(text STRING, from_string STRING, to_string STRING) AS ((
SELECT STRING_AGG(IFNULL(y, a), '' ORDER BY pos)
FROM UNNEST(SPLIT(text, '')) a WITH OFFSET pos
LEFT JOIN (
SELECT x, y
FROM UNNEST(SPLIT(from_string, '')) x WITH OFFSET
JOIN UNNEST(SPLIT(to_string, '')) y WITH OFFSET
USING(OFFSET)
)
ON a = x
));
WITH `project.dataset.table` AS (
SELECT 'hello world' AS text UNION ALL
SELECT 'elliott'
)
SELECT text, OTRANSLATE(text, 'ehlo', 'EHLO') as new_text
FROM `project.dataset.table`
with output
Row  text         new_text
1    hello world  HELLO wOrLd
2    elliott      ELLiOtt
Note: the above version assumes from and to strings of equal length and no repeated characters in the from string.
Update to follow up on "expanded expectations" for the version of that function in BigQuery
#standardSQL
CREATE TEMP FUNCTION OTRANSLATE(text STRING, from_string STRING, to_string STRING) AS ((
SELECT STRING_AGG(IFNULL(y, a), '' ORDER BY pos)
FROM UNNEST(SPLIT(text, '')) a WITH OFFSET pos
LEFT JOIN (
SELECT x, ARRAY_AGG(IFNULL(y, '') ORDER BY OFFSET LIMIT 1)[OFFSET(0)] y
FROM UNNEST(SPLIT(from_string, '')) x WITH OFFSET
LEFT JOIN UNNEST(SPLIT(to_string, '')) y WITH OFFSET
USING(OFFSET)
GROUP BY x
)
ON a = x
));
SELECT -- text, OTRANSLATE(text, 'ehlo', 'EHLO') as new_text
OTRANSLATE("hello world", "", "EHLO") AS empty_from, -- 'hello world'
OTRANSLATE("hello world", "hello world1", "EHLO") AS larger_from_than_source, -- 'EHLLL'
OTRANSLATE("hello world", "ehlo", "EHLO") AS equal_size_from_to, -- 'HELLO wOrLd'
OTRANSLATE("hello world", "ehlo", "EH") AS larger_size_from, -- 'HE wrd'
OTRANSLATE("hello world", "ehlo", "EHLOPQ") AS larger_size_to, -- 'HELLO wOrLd'
OTRANSLATE("hello world", "ehlo", "") AS empty_to; -- 'wrd'
with result
Row  empty_from   larger_from_than_source  equal_size_from_to  larger_size_from  larger_size_to  empty_to
1    hello world  EHLLL                    HELLO wOrLd         HE wrd            HELLO wOrLd     wrd
Note: the Teradata version of this function is recursive, so the current implementation is not an exact implementation of Teradata's OTRANSLATE.
Usage Notes (from the Teradata documentation)
If the first character in from_string occurs in the source_string, all occurrences of it are replaced by the first character in to_string. This repeats for all characters in from_string and for all characters in from_string. The replacement is performed character-by-character, that is, the replacement of the second character is done on the string resulting from the replacement of the first character.
This could easily be implemented with a JS UDF, which I think is trivial, so I am not going in that direction :o)
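To cross-check the expected values listed above outside BigQuery, the mapping can be modelled in a few lines of Python. This is a minimal sketch, not Teradata's implementation; the names `otranslate` and `otranslate_sequential` are made up for illustration. The first function applies all replacements at once, as the SQL UDF does. The second follows a literal reading of the quoted usage note (each replacement works on the result of the previous one), which has not been checked against Teradata here.

```python
def otranslate(text: str, from_string: str, to_string: str) -> str:
    """Simultaneous mapping, modelled on the SQL UDF above (hypothetical helper)."""
    mapping = {}
    for i, ch in enumerate(from_string):
        # The first occurrence in from_string wins; characters without a
        # counterpart in to_string map to '' and are therefore removed.
        if ch not in mapping:
            mapping[ch] = to_string[i] if i < len(to_string) else ""
    return "".join(mapping.get(ch, ch) for ch in text)


def otranslate_sequential(text: str, from_string: str, to_string: str) -> str:
    """Assumed reading of the usage note: replace one character at a time,
    each step working on the output of the previous step."""
    for i, ch in enumerate(from_string):
        text = text.replace(ch, to_string[i] if i < len(to_string) else "")
    return text


print(otranslate("hello world", "ehlo", "EHLO"))  # HELLO wOrLd
print(otranslate("abc", "ab", "bc"))              # bcc
print(otranslate_sequential("abc", "ab", "bc"))   # ccc
```

The last two lines show where the two readings diverge: they only differ when a character in to_string also appears later in from_string.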

Yes, you can do this using array operations over the strings. Here is one solution:
CREATE TEMP FUNCTION OTRANSLATE(s STRING, key STRING, value STRING) AS (
(SELECT
STRING_AGG(
IFNULL(
(SELECT SPLIT(value, '')[OFFSET((
SELECT o2 FROM UNNEST(SPLIT(key, '')) AS k WITH OFFSET o2
WHERE k = c))]
),
c),
'' ORDER BY o1)
FROM UNNEST(SPLIT(s, '')) AS c WITH OFFSET o1)
);
SELECT OTRANSLATE(text, 'ehlo', 'EHLO')
FROM (
SELECT 'hello world' AS text UNION ALL
SELECT 'elliott'
);
The idea is to find the character in the same position of the key string in the value string. If there is no matching character in the key string, we end up with a null offset, so the second argument to IFNULL causes it to return the unmapped character. We then aggregate back into a string, ordered by character offset.
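The same lookup can be sketched in Python to make the steps concrete. This only illustrates the idea and assumes the value string is at least as long as the key string; `translate_by_position` is a hypothetical name, not a BigQuery or Teradata function.

```python
def translate_by_position(s: str, key: str, value: str) -> str:
    out = []
    for c in s:
        pos = key.find(c)  # -1 plays the role of the NULL offset
        # Found: take the character at the same position in value.
        # Not found: keep the unmapped character, like the IFNULL fallback.
        out.append(value[pos] if pos != -1 else c)
    return "".join(out)  # characters are already in their original order


print(translate_by_position("hello world", "ehlo", "EHLO"))  # HELLO wOrLd
print(translate_by_position("elliott", "ehlo", "EHLO"))      # ELLiOtt
```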
Edit: Here's a variant that also handles differences in the key and value lengths:
CREATE TEMP FUNCTION otranslate(s STRING, key STRING, value STRING) AS (
IF(LENGTH(key) < LENGTH(value) OR LENGTH(s) < LENGTH(key), s,
(SELECT
STRING_AGG(
IFNULL(
(SELECT ARRAY_CONCAT([c], SPLIT(value, ''))[SAFE_OFFSET((
SELECT IFNULL(MIN(o2) + 1, 0) FROM UNNEST(SPLIT(key, '')) AS k WITH OFFSET o2
WHERE k = c))]
),
''),
'' ORDER BY o1)
FROM UNNEST(SPLIT(s, '')) AS c WITH OFFSET o1
))
);
SELECT
otranslate("hello world", "", "EHLO") AS empty_from, -- 'hello world'
otranslate("hello world", "hello world1", "EHLO") AS larger_from_than_source, -- 'hello world'
otranslate("hello world", "ehlo", "EHLO") AS equal_size_from_to, -- 'HELLO wOrLd'
otranslate("hello world", "ehlo", "EH") AS larger_size_from, -- 'HE wrd'
otranslate("hello world", "ehlo", "EHLOPQ") AS larger_size_to, -- 'hello world'
otranslate("hello world", "ehlo", "") AS empty_to; -- 'wrd'

Related

Extract a regex capture group from a string in MariaDB

For example:
Regex: District ([0-9]{1,2})([^0-9]|$)
Input District 12 2021 returns 12
Input Southern District 3 returns 3
Input FooBar returns NULL
The function REGEXP_SUBSTR doesn't allow extracting a single capturing group.
You can use e.g. REGEXP_REPLACE(input, regex, '\\1') to replace occurrences of regex in input with the first capture group of regex.
The following stored function makes this easy to use:
DELIMITER $$
CREATE FUNCTION regexp_extract(inp TEXT, regex TEXT, capture INT) RETURNS TEXT DETERMINISTIC
BEGIN
DECLARE capstr VARCHAR(5);
DECLARE mregex TEXT;
IF inp IS NULL OR LENGTH(inp) = 0 OR inp NOT REGEXP regex THEN
RETURN NULL;
END IF;
SET capstr = CONCAT('\\', capture);
SET mregex = CONCAT('.*', regex, '.*'); -- Want to match the entire input string so it all gets replaced
RETURN REGEXP_REPLACE(inp, mregex, capstr);
END;
$$
DELIMITER ;
Used like so:
SELECT regexp_extract('District 12 2021', 'District ([0-9]{1,2})([^0-9]|$)', 1);
For those users who might be stuck with an earlier version of MySQL or MariaDB which does not have REGEXP_REPLACE available, we can also use SUBSTRING_INDEX here:
SELECT SUBSTRING_INDEX(
SUBSTRING_INDEX('Southern District 3', 'District ', -1), ' ', 1); -- 3
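The examples from the question can also be checked outside the database. The sketch below uses Python's `re` module rather than MariaDB, so it only illustrates what the capture group is expected to return; `regexp_extract` here is a stand-in with the same argument order as the stored function, not the function itself.

```python
import re
from typing import Optional


def regexp_extract(inp: Optional[str], regex: str, capture: int) -> Optional[str]:
    """Return the requested capture group, or None when there is no match."""
    if not inp:
        return None
    match = re.search(regex, inp)
    return match.group(capture) if match else None


PATTERN = r"District ([0-9]{1,2})([^0-9]|$)"
for text in ("District 12 2021", "Southern District 3", "FooBar"):
    print(text, "->", regexp_extract(text, PATTERN, 1))
```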

Crafting a Subquery-able UNION ALL based on the results of a query

Data
I have a couple of tables like so:
CREATE TABLE cycles (
`cycle` varchar(6) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
`cycle_type` varchar(140) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
`start` date DEFAULT NULL,
`end` date DEFAULT NULL
);
CREATE TABLE rsvn (
`str` varchar(140) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
`start_date` date DEFAULT NULL,
`end_date` date DEFAULT NULL
);
INSERT INTO `cycles` (`cycle`, `cycle_type`, `start`, `end`) values
('202013', 'a', '2021-01-04', '2021-01-31'),
('202013', 'b', '2021-01-04', '2021-01-31'),
('202101', 'a', '2021-01-04', '2021-01-31'),
('202101', 'b', '2021-01-04', '2021-01-31'),
('202102', 'a', '2021-02-01', '2021-02-28'),
('202102', 'b', '2021-02-01', '2021-02-28'),
('202103', 'a', '2021-03-01', '2021-03-28'),
('202103', 'b', '2021-03-01', '2021-03-28');
INSERT INTO `rsvn` (str, start_date, end_date) values
('STR01367', '2020-12-07', '2020-06-21'),
('STR00759', '2020-12-07', '2021-04-25'),
('STR01367', '2021-01-04', '2021-09-12'),
('STR01367', '2021-06-21', '2022-02-27');
Desired Results
For any given range of cycles, I want to count the number of occurrences of each str across those cycles. So for cycles 2108 - 2108 (one cycle), I see:
str       count
STR01367  1
STR00759  1
And for cycles 2108 - 2109 (two cycles), I see:
str       count
STR01367  2
STR00759  1
What I've tried
I'm trying to figure out how to obtain those results dynamically. I don't see any options other than a UNION ALL query (one query per cycle), so I tried writing a PROCEDURE. However, that didn't work because I want to do post-processing on the query results, and I don't believe you can use the results of a PROCEDURE in a CTE or subquery.
My PROCEDURE (works, can't include results in a subquery like SELECT * FROM call count_cycles (?)):
CREATE PROCEDURE `count_cycles`(start_cycle CHAR(6), end_cycle CHAR(6))
BEGIN
SET @cycles := (
SELECT CONCAT('WITH installed_cycles_count AS (',
GROUP_CONCAT(
CONCAT('
SELECT rsvn.str, 1 AS installed_cycles
FROM rsvn
WHERE "', `cy`.`start`, '" BETWEEN rsvn.start_date AND COALESCE(rsvn.end_date, "9999-01-01")
OR "', `cy`.`end`, '" BETWEEN rsvn.start_date AND COALESCE(rsvn.end_date, "9999-01-01")
GROUP BY rsvn.str
'
)
SEPARATOR ' UNION ALL '
),
')
SELECT
store.chain AS "Chain"
,store.division AS "Division"
,dividers_store AS "Store"
,SUM(installed_cycles) AS "Installed Cycles"
FROM installed_cycles_count r
LEFT JOIN store ON store.name = r.dividers_store
GROUP BY dividers_store
ORDER BY chain, division, dividers_store, installed_cycles'
)
FROM cycles `cy`
WHERE `cy`.`cycle_type` = 'Ad Cycle'
AND `cy`.`cycle` >= CONCAT('20', RIGHT(start_cycle, 4))
AND `cy`.`cycle` <= CONCAT('20', RIGHT(end_cycle, 4))
GROUP BY `cy`.`cycle_type`
);
EXECUTE IMMEDIATE @cycles;
END
Alternatively, I attempted to use a recursive query to obtain my results by incrementing my cycle. This gave me the cycles I wanted:
WITH RECURSIVE xyz AS (
SELECT cy.`cycle`, cy.`start`, cy.`end`
FROM cycles cy
WHERE cycle_type = 'Ad Cycle'
AND `cycle` = '202101'
UNION ALL
SELECT cy.`cycle`, cy.`start`, cy.`end`
FROM xyz
JOIN cycles cy
ON cy.`cycle` = increment_cycle(xyz.`cycle`, 1)
AND cy.`cycle_type` = 'Ad Cycle'
WHERE cy.`cycle` <= '202110'
)
SELECT * FROM xyz;
But I can't get it working when I add in the reservations table:
-- infinite loop?
WITH RECURSIVE xyz AS (
SELECT cy.`cycle`, 'dr.dividers_store', 1 AS installed_cycles
FROM cycles cy
LEFT JOIN rsvn dr
ON cy.`start` BETWEEN dr.start_date AND COALESCE(dr.end_date, "9999-01-01")
OR cy.`end` BETWEEN dr.start_date AND COALESCE(dr.end_date, "9999-01-01")
WHERE cy.`cycle_type` = 'Ad Cycle'
AND cy.`cycle` = '202101'
UNION ALL
SELECT cy.`cycle`, 'dr.dividers_store', 1 AS installed_cycles
FROM xyz
JOIN cycles cy
ON cy.`cycle` = increment_cycle(xyz.`cycle`, 1)
AND cy.`cycle_type` = 'Ad Cycle'
LEFT JOIN rsvn dr
ON cy.`start` BETWEEN dr.start_date AND COALESCE(dr.end_date, "9999-01-01")
OR cy.`end` BETWEEN dr.start_date AND COALESCE(dr.end_date, "9999-01-01")
WHERE cy.`cycle` <= '202102'
)
SELECT * FROM xyz
What options do I have to get the results I need, in such a way that I can use them in a CTE or subquery?
The results I am looking for are easily obtained via a two-stage grouping. Something like this:
WITH sbc AS (
SELECT cy.`cycle`, dr.str, 1 AS 'count'
FROM cycles cy
LEFT JOIN rsvn dr
ON cy.`start` BETWEEN dr.start_date AND dr.end_date
OR cy.`end` BETWEEN dr.start_date AND dr.end_date
WHERE cy.`cycle_type` = 'Ad Cycle'
AND cy.`cycle` BETWEEN '202201' AND '202205'
GROUP BY cy.`cycle`, dr.str
ORDER BY dr.str, cy.`cycle`
)
SELECT str, SUM(`count`) AS `count`
FROM sbc
GROUP BY str
The CTE produces one result per rsvn per cycle. Afterwards all that is needed is to group by store and count the number of occurrences.
Besides being simpler, I suspect that this query is faster than the union concept I was stuck on when I asked the question, since among other things the server does not need to perform a union on multiple grouping queries. However, I do not understand how MariaDB optimizes such queries, and while I am curious I don't have the time to run benchmarks to find out.
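To try the two-stage grouping without a MariaDB server, here is a small sketch in Python with an in-memory SQLite database. The table layout is simplified from the question: `cycle_start`/`cycle_end` replace `start`/`end`, the cycle type is 'a' rather than 'Ad Cycle', and an inner join is used so that cycles without reservations drop out. Those choices and the `count_cycles` helper are assumptions of this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cycles (cycle TEXT, cycle_type TEXT, cycle_start TEXT, cycle_end TEXT);
CREATE TABLE rsvn (str TEXT, start_date TEXT, end_date TEXT);
INSERT INTO cycles VALUES
  ('202101', 'a', '2021-01-04', '2021-01-31'),
  ('202102', 'a', '2021-02-01', '2021-02-28'),
  ('202103', 'a', '2021-03-01', '2021-03-28');
INSERT INTO rsvn VALUES
  ('STR00759', '2020-12-07', '2021-04-25'),
  ('STR01367', '2021-01-04', '2021-09-12'),
  ('STR01367', '2021-06-21', '2022-02-27');
""")

QUERY = """
WITH sbc AS (            -- stage 1: one row per (cycle, str) pair
  SELECT cy.cycle, dr.str, 1 AS cnt
  FROM cycles cy
  JOIN rsvn dr
    ON cy.cycle_start BETWEEN dr.start_date AND dr.end_date
    OR cy.cycle_end   BETWEEN dr.start_date AND dr.end_date
  WHERE cy.cycle_type = ? AND cy.cycle BETWEEN ? AND ?
  GROUP BY cy.cycle, dr.str
)
SELECT str, SUM(cnt) AS cnt   -- stage 2: number of cycles per str
FROM sbc
GROUP BY str
ORDER BY str
"""


def count_cycles(first_cycle, last_cycle, cycle_type="a"):
    """Hypothetical helper: run the two-stage query for a cycle range."""
    return conn.execute(QUERY, (cycle_type, first_cycle, last_cycle)).fetchall()


print(count_cycles("202101", "202101"))  # [('STR00759', 1), ('STR01367', 1)]
print(count_cycles("202101", "202102"))  # [('STR00759', 2), ('STR01367', 2)]
```

Because the whole thing is a single SELECT, it can itself be wrapped in another CTE or subquery, which is what the stored procedure could not offer.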

How to use a % wild card in a query to an sqlite3 table with user input where clause

I have written a query which allows a user to search a table in an sqlite3 database by any of 4 different fields. It only returns data if what the user enters is an exact match. Is there a way to put a wildcard in for each field? Code below:
import sqlite3
conn = sqlite3.connect("lucyartists.db")
cursor = conn.cursor()
#Get user input
print ('Search for an Artist')
print ('You can search by Name, DOB, Movement or Description')
print ('')
name = input("Enter Artists Name: ")
dob = input("Enter Date of Birth: ")
movement = input("Enter Movement: ")
description = input ("Enter Description: ")
#query database with user input
sql = ("select name, dob, movement, description from artists where name like (?) or dob like (?) or movement like (?) or description like (?)")
values = (name, dob, movement, description)
#Run query
cursor.execute(sql, values)
result = cursor.fetchall()
#print results without quotes and comma separated
print()
print('RESULTS')
for r in result:
    print()
    print("{}, {}, {}, {}".format(r[0], r[1], r[2], r[3]))
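One possible way to get partial matches is to keep the `?` placeholders and put the `%` wildcards into the bound values instead of into the SQL text. The sketch below is an assumption-laden illustration: it uses an in-memory database with two made-up rows, and the `search_artists` helper is hypothetical. It also skips fields the user left empty, since a bare `%%` would match every row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE artists (name TEXT, dob TEXT, movement TEXT, description TEXT)")
conn.executemany(
    "INSERT INTO artists VALUES (?, ?, ?, ?)",
    [  # made-up sample rows, for illustration only
        ("Ada Example", "1901-02-03", "Cubism", "sample row one"),
        ("Bob Placeholder", "1950-06-07", "Surrealism", "sample row two"),
    ],
)


def search_artists(conn, **criteria):
    """Return rows where any non-empty field contains the given text."""
    allowed = ("name", "dob", "movement", "description")
    filled = {k: v for k, v in criteria.items() if k in allowed and v}
    if not filled:
        return []
    # Column names come from the fixed allow-list above, never from user text.
    where = " OR ".join(f"{col} LIKE ?" for col in filled)
    values = [f"%{text}%" for text in filled.values()]  # wildcards live in the values
    sql = f"SELECT name, dob, movement, description FROM artists WHERE {where}"
    return conn.execute(sql, values).fetchall()


for row in search_artists(conn, name="exam", movement=""):
    print(", ".join(row))
```

SQLite's LIKE is case-insensitive for ASCII characters by default, so "exam" finds "Ada Example" here.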

How do I solve Sqlite DB Index Error

I am working with web2py and an SQLite DB on Ubuntu. In web2py, user input posts an item such as 'Hello World' into an SQLite DB as follows:
In the controller default the item is posted into ThisDb as follows:
consult = db.consult(id) or redirect(URL('index'))
form1 = [consult.body]
form5 = form1#.split()
name3 = ' '.join(form5)
conn = sqlite3.connect("ThisDb.db")
c = conn.cursor()
conn.execute("INSERT INTO INPUT (NAME) VALUES (?);", (name3,))
conn.commit()
Another piece of code reads (or should read) the item from ThisDb, in this case 'Hello World', as follows:
location = ""
conn = sqlite3.connect("ThisDb.db")
c = conn.cursor()
c.execute('select * from input')
c.execute("select MAX(rowid) from [input];")
for rowid in c: break
for elem in rowid:
    m = elem
c.execute("SELECT * FROM input WHERE rowid = ?", (m,))
for row in c: break
location = row[1]
name = location.lower().split()
My DB configuration for the table 'input', where 'Hello World' should be read from, is this:
CREATE TABLE `INPUT` (
`NAME` TEXT
);
This code previously worked well on Windows 7 and 10, but I am having this problem on Ubuntu 16.04. I keep getting this error:
File "applications/britamintell/modules/xxxxxx/define/yyyy0.py", line 20, in xxxdefinition
location = row[1]
IndexError: tuple index out of range
row[0] is the value in the first column.
row[1] is the value in the second column.
Apparently, your previous database had more than one column.
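A small sketch of that point, assuming the single-column table definition shown in the question and using an in-memory database instead of ThisDb.db:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE input (name TEXT)")
conn.execute("INSERT INTO input (name) VALUES (?)", ("Hello World",))

# SELECT * on a one-column table yields 1-tuples, so only index 0 exists;
# row[1] would raise "IndexError: tuple index out of range".
row = conn.execute(
    "SELECT * FROM input WHERE rowid = (SELECT MAX(rowid) FROM input)"
).fetchone()
print(len(row))  # 1

location = row[0]
name = location.lower().split()
print(name)  # ['hello', 'world']
```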

SQLite3 in python - can't output

I'm having an issue with SQL in python, I am simply creating a table and I want to print it out. I'm confused D: I am using python 3.x
import sqlite3
print ("-" * 80)
print ("- Welcome to my Address Book -")
print ("-" * 80)
new_db = sqlite3.connect('C:\\Python33\\test.db')
c = new_db.cursor()
c.execute('''CREATE TABLE Students
(student_id text,
student_firstname text,
student_surname text,
student_DOB date,
Form text)
''')
c.execute('''INSERT INTO Students
VALUES ('001', 'John','Doe', '12/03/1992', '7S')''')
new_db.commit()
new_db.close()
input("Enter to close")
firstname = "John"
surname = "Foo"
c.execute("SELECT * FROM Students WHERE student_firstname LIKE ? OR student_surname LIKE ? ", [firstname, surname])
while True:
    row = c.fetchone()
    if row == None:
        break  # stops while loop if there is no more lines in table
    for column in row:  # otherwise print line from table
        print col, " "
output should be:
001 John Doe 12/03/1992 7S
and any other row from table that contain first name John OR surname Foo
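One way to get that output, sketched under a few assumptions: the query runs before the connection is closed (a cursor cannot be used after `close()`), Python 3's `print()` function is used, and the loop variable is referenced by the name it was given. An in-memory database stands in for the file path from the question.

```python
import sqlite3

db = sqlite3.connect(":memory:")  # stand-in for the test.db file in the question
c = db.cursor()
c.execute("""CREATE TABLE Students
             (student_id text, student_firstname text, student_surname text,
              student_DOB date, Form text)""")
c.execute("INSERT INTO Students VALUES ('001', 'John', 'Doe', '12/03/1992', '7S')")
db.commit()

firstname, surname = "John", "Foo"
c.execute(
    "SELECT * FROM Students WHERE student_firstname LIKE ? OR student_surname LIKE ?",
    [firstname, surname],
)
# Read the rows while the connection is still open, one output line per row.
lines = [" ".join(str(col) for col in row) for row in c]
db.close()

for line in lines:
    print(line)  # 001 John Doe 12/03/1992 7S
```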
