How to join 2 tables' column data into a single column - SQLite

I did find some examples, but they do not merge the data into a single column. So, I am trying to join two tables' column data into a single column.
I have 5 columns in table1: Url1, site1, url2, site2, endurl,
and a keywords column in table2.
I want to join or merge these columns into one column, like:
url1 site1 keywords,url2 site2 keywords endurl
(the result will be used for URL generation; the example is just for understanding).
I tried
SELECT table1.Url1, table1.site1, table1.url2, table1.site2, table1.endurl, table2.keywords
FROM table1
LEFT JOIN table2
ON table1.site1 = table2.keywords AND table1.site2 = table2.keywords;
I want to merge all of these columns into a single column.

What you're probably looking for is the format() function (added in SQLite 3.38.0), which uses SQLite's built-in printf implementation. So, assuming your columns are all TEXT columns, this will give you what you're looking for:
SELECT format('%s, %s, %s, %s, %s, %s', table1.Url1, table1.site1, table1.url2, table1.site2, table1.endurl, table2.keywords) as my_column
FROM table1
LEFT JOIN table2
ON table1.site1 = table2.keywords AND table1.site2 = table2.keywords;
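If your SQLite is older than 3.38.0 and format() is not available, the same result can be produced with the printf() function, which takes the same arguments (this is simply the query above with the function name swapped):
SELECT printf('%s, %s, %s, %s, %s, %s',
              table1.Url1, table1.site1, table1.url2,
              table1.site2, table1.endurl, table2.keywords) as my_column
FROM table1
LEFT JOIN table2
ON table1.site1 = table2.keywords AND table1.site2 = table2.keywords;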

You could concatenate those columns into one with the || operator.
Edit: Now for SQLite
SELECT t1.Url1 || ' ' || t1.site1 || ' ' || t2.keywords || ',' ||
       t1.url2 || ' ' || t1.site2 || ' ' || t3.keywords || ' ' || t1.endurl AS column_name
FROM table1 t1, table2 t2, table2 t3
WHERE t1.site1 = t2.keywords AND t1.site2 = t3.keywords;
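As a quick sanity check, here is a minimal sketch with made-up tables and rows (the table definitions and values are purely illustrative) that the concatenation query above can be run against:
CREATE TABLE table1 (Url1 TEXT, site1 TEXT, url2 TEXT, site2 TEXT, endurl TEXT);
CREATE TABLE table2 (keywords TEXT);
INSERT INTO table1 VALUES ('http://a.example/', 'siteA', 'http://b.example/', 'siteB', '/end');
INSERT INTO table2 VALUES ('siteA'), ('siteB');
-- Running the concatenation query above against this data returns one row:
--   http://a.example/ siteA siteA,http://b.example/ siteB siteB /end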

Related

How to remove data after a space in SQLite

Let's suppose I have data like
column
ABC
ABC PQR
ABC (B21)
XYZ ABC
and I want the output to be just the first word of each value, i.e.
ABC
XYZ
i.e. grouped by that column,
but I was not able to remove the string after the space.
I believe that the following would do what you want :-
SELECT * FROM mytable
GROUP BY CASE
    WHEN instr(mycolumn,' ') > 0 THEN substr(mycolumn,1,instr(mycolumn,' ')-1)
    ELSE mycolumn
END;
obviously with the table and column names changed appropriately.
As an example, using your data plus other data to demonstrate, the following :-
DROP TABLE IF EXISTS mytable;
CREATE TABLE IF NOT EXISTS mytable (mycolumn);
INSERT INTO mytable VALUES ('ABC'),('ABC PQR'),('ABC (B21)'),('XYZ'),('A B'),('AAAAAAAAAAAAAAAAAAAAAAAA B'),(' ABC'),(' XZY');
SELECT * FROM mytable;
SELECT *, group_concat(mycolumn)
FROM mytable
GROUP BY CASE
    WHEN instr(mycolumn,' ') > 0 THEN substr(mycolumn,1,instr(mycolumn,' ')-1)
    ELSE mycolumn
END;
DROP TABLE IF EXISTS mytable;
group_concat is added to show the values included in each group.
This produces two result sets (shown as screenshots in the original, omitted here): the ungrouped table from the first SELECT, and the grouped result with the extra group_concat column. In the grouped result, the rows ' ABC' and ' XZY' end up in the same first group because their first character is a space, so the extracted first word is empty for both.
You don't want to do any aggregation, so there is no need for a GROUP BY clause.
Use string functions like SUBSTR() and INSTR() to get the 1st word of each string and then use DISTINCT to remove duplicates from the results:
SELECT DISTINCT SUBSTR(columnname, 1, INSTR(columnname || ' ', ' ') - 1) new_column
FROM tablename
Results:
new_column
ABC
XYZ
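A self-contained sketch of the above, using the table/column names from the query and the sample values from the question:
CREATE TABLE tablename (columnname TEXT);
INSERT INTO tablename VALUES ('ABC'),('ABC PQR'),('ABC (B21)'),('XYZ ABC');
SELECT DISTINCT SUBSTR(columnname, 1, INSTR(columnname || ' ', ' ') - 1) new_column
FROM tablename;
-- => ABC, XYZ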

Recursing through a SQLite table to find a matching subset of records

I have a table with 3 text columns (A, B & C) imported from a flat file comprising many thousands of lines. None of these columns has a UNIQUE constraint and there is no primary key combination. As a result, one or more records may have the same values, and there will even be records with the same values across all columns. In many records, columns A, B and C should be the same, but due to data quality issues column C has many variants where columns A and B are the same. Where columns A and B are the same, the corresponding value in column C may be a substring of the value of column C in another record that has the same values for columns A and B.
To illustrate, using GROUP BY gives a subset like the one shown in the question's first screenshot (omitted here).
I now need to narrow down that subset further to find all records where the value in column C is INSTR the values of the other grouped results, i.e. I'd like to return the rows shown in the second screenshot (also omitted),
because "Buckingham" and "Lindsey" are both INSTR the records that contain "Lindsey Buckingham" in column C.
With EXISTS and INSTR():
select t.* from tablename t
where exists (
select 1 from tablename
where a = t.a and b = t.b and c <> t.c and instr(c, t.c) > 0
)
or with LIKE:
select t.* from tablename t
where exists (
select 1 from tablename
where a = t.a and b = t.b and c <> t.c and c like '%' || t.c || '%'
)
or with a self join:
select distinct t.*
from tablename t inner join tablename tt
on tt.a = t.a and tt.b = t.b and tt.c <> t.c and tt.c like '%' || t.c || '%'
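As a concrete sketch, here is some made-up data built around the "Lindsey Buckingham" example from the question (the table name matches the queries above; the schema and values are assumptions), together with the EXISTS/INSTR() query:
CREATE TABLE tablename (a TEXT, b TEXT, c TEXT);
INSERT INTO tablename VALUES
  ('x', 'y', 'Lindsey Buckingham'),
  ('x', 'y', 'Buckingham'),
  ('x', 'y', 'Lindsey'),
  ('x', 'y', 'Stevie Nicks');
SELECT t.* FROM tablename t
WHERE EXISTS (
  SELECT 1 FROM tablename
  WHERE a = t.a AND b = t.b AND c <> t.c AND instr(c, t.c) > 0
);
-- Returns the 'Buckingham' and 'Lindsey' rows, because each of those values
-- occurs inside another row's column C for the same A/B pair.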

pyodbc - access column by name

I have 2 tables with 150 columns each, and I am trying to join those tables, fetch the result set row by row, and process it:
qry = '''select a.*, b.*
from table_a a
full outer join table_b b
on a.id = b.id'''
table_row = conn.execute(qry) #execute method yields a generator
Now I need to access the result set, which is a generator, and determine the values of each and every column of table-1 & table-2.
For example: if table-1 & table-2 both have a column named name, I need to compare them.
How can I access the result set by column name? I'm using pyodbc,
i.e. resultset.table1.name = resultset.table2.name
Use the ISO information schema views (I'm using SQL Server in the example) to return column names for each table, substituting database and schema parameter values as appropriate.
Merge the resulting lists into a set containing column names present in both tables.
Use this set to build a string representing column names to select from each table, aliasing each column by prefixing with a table name. Defining column aliases will allow you to differentiate columns by table.
Execute select query and print values for comparison.
Code sample
# assumes connection, cursor already setup
# build SQL for retrieving column names
sql = '''SELECT COLUMN_NAME
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_CATALOG = ? AND TABLE_SCHEMA = ?
AND TABLE_NAME = ?'''
# get column names from table_a
rows = cursor.execute(sql, ('database', 'schema', 'table_a')).fetchall()
table_a_columns = [column[0] for column in rows]
# get column names from table_b
rows = cursor.execute(sql, ('database', 'schema', 'table_b')).fetchall()
table_b_columns = [column[0] for column in rows]
# get unique matching columns from lists
matches = set(table_a_columns).intersection(table_b_columns)
# get string of column names to use in query, setting column alias prefixed with
# table name for each column
column_alias = 'a.{0} as a_{0}, b.{0} as b_{0}'
columns = ', '.join([column_alias.format(column) for column in matches])
sql = 'SELECT {} FROM table_a a FULL OUTER JOIN table_b b ON a.id = b.id'
sql = sql.format(columns)
# print values to compare
for row in cursor.execute(sql):
    print(row)
There's probably a less complicated way, but it's eluding me.

Get count on joined tables

I have two tables (Oracle):
(I have marked the primary keys with a star before the column name)
Table1 Columns are :
*date,
*code,
*symbol,
price,
weight
Table2 columns are :
*descriptionID
code
symbol
date
description
I need to find the below information using a query:
For a given code and symbol on a particular day, is there any description?
For example: code = "AA" and symbol = "TEST" on 2012-4-1 in Table1 => is there at least one row like ID=, code = "AA", symbol = "TEST", date = 2012-4-1 in Table2?
I tried with the below query:
select * from Table1 t1 INNER JOIN
Table2 t2
on t1.code = t2.code and t1.symbol = t2.symbol and
TO_CHAR(t1.date, 'YYYY/MM/DD') = TO_CHAR(t1.date, 'YYYY/MM/DD')
But it doesn't give me output like:
code = AA, symbol = TEST, date 2012-4-1 => description count = 10
code = AA, symbol = TEST, date 2012-4-2 => description count = 5
code = BB, symbol = HELO, date 2012-4-1 => description count = 20
Can someone suggest a query which can achieve the above output?
I don't see why you need the join:
SELECT count(*)
FROM Table2
WHERE code='AA'
AND symbol = 'TEST'
AND date = to_date('2012-04-01', 'yyyy-mm-dd')
UPDATE: (after reading your comment)
I still don't see why you need the join. Do you need some data from table1 ?
Anyway, if you want the count for all the (code, symbol, date) combinations, then why not group by?
As for the dates, it is better to use TRUNC to get rid of the time parts.
So:
SELECT code, symbol, date, count(*)
FROM Table2
GROUP BY code, symbol, date
TRUNC() takes a DATE and returns the same date with the time portion removed (truncated to midnight), so grouping on TRUNC(date) treats every row from the same calendar day as one group.
So it should do exactly what you want.
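Putting both suggestions together, a sketch of the grouped count with TRUNC applied (column names are taken from the question as-is; note that date is a reserved word in Oracle, so a real table would need a different or quoted column name):
SELECT code, symbol, TRUNC(date) AS day, COUNT(*) AS description_count
FROM Table2
GROUP BY code, symbol, TRUNC(date)
ORDER BY code, symbol, day;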

Nested subquery is too slow - outer join equivalent?

I'm collecting some basic statistics on our codebase and am trying to generate a query using the following schema:
A files table holding all the files (a synthetic primary key id, a unique path, and a region column which records who the file belongs to).
A file_stats table holding data for the files on a specific date (Primary Key is combination of date and file_id)
CREATE TABLE files (
    id INT PRIMARY KEY,
    path VARCHAR(255) NOT NULL UNIQUE,
    region VARCHAR(4) CHECK (region IN ('NYK', 'LDN', 'CORE', 'TKY'))
);
CREATE TABLE file_stats (
    date DATE NOT NULL,
    file_id INT NOT NULL REFERENCES files,
    num_lines INT NOT NULL,
    CONSTRAINT file_stats__pk PRIMARY KEY(date, file_id)
);
I'm trying to create a query which will return all combinations of dates and regions in the tables and the number of files for that combination.
The simple approach of
SELECT date, region, COUNT(*) FROM file_stats fs, files f WHERE fs.file_id = f.id
GROUP BY date, region
doesn't work, as not all regions are represented at all dates.
I've tried
SELECT
    d.date,
    r.region,
    (SELECT COUNT(*) FROM file_stats fs, files f
     WHERE fs.file_id = f.id AND fs.date = d.date AND f.region = r.region
    ) AS num_files
FROM
    (SELECT DISTINCT date FROM file_stats) AS d,
    (SELECT DISTINCT region FROM files) AS r
but the performance is unacceptable because of the nested subquery.
I've tried LEFT OUTER JOINS, but never seem to be able to make them work.
The database is SQLite.
Can anyone suggest a better query?
SELECT date, region, COUNT(*) FROM file_stats fs, files f WHERE fs.file_id = f.id
GROUP BY date, region
doesn't work as not all regions are represented at all dates.
Assuming you mean it works correctly, but you need all the dates to show whether a region might appear there or not, then you need two things.
A calendar table.
A left join on the calendar table.
After you have a calendar table, something like this . . .
SELECT c.cal_date, f.region, COUNT(*)
FROM calendar c
LEFT JOIN file_stats fs ON (fs.date = c.cal_date)
INNER JOIN files f ON (fs.file_id = f.id)
GROUP BY c.cal_date, f.region
I used cal_date above. The name you use depends on your calendar table. This will get you started. You can use a spreadsheet to generate the dates (a recursive-CTE alternative is sketched after the inserts below).
CREATE TABLE calendar (cal_date date primary key);
INSERT INTO "calendar" VALUES('2011-01-01');
INSERT INTO "calendar" VALUES('2011-01-02');
INSERT INTO "calendar" VALUES('2011-01-03');
INSERT INTO "calendar" VALUES('2011-01-04');
INSERT INTO "calendar" VALUES('2011-01-05');
INSERT INTO "calendar" VALUES('2011-01-06');
INSERT INTO "calendar" VALUES('2011-01-07');
INSERT INTO "calendar" VALUES('2011-01-08');
If you're certain that all the dates are in file_stats, you can do without a calendar table. But there are some cautions.
select fs.date, f.region, count(*)
from file_stats fs
left join files f on (f.id = fs.file_id)
group by fs.date, f.region;
This will work if your data is right, but your tables don't guarantee the data will be right. The foreign key reference on file_stats isn't enforced unless foreign key support is switched on (PRAGMA foreign_keys = ON), so there might be file id numbers in each table that don't have matching id numbers in the other table. Let's have some sample data.
insert into files values (1, 'a long path', 'NYK');
insert into files values (2, 'another long path', 'NYK');
insert into files values (3, 'a shorter long path', 'LDN'); -- not in file_stats
insert into file_stats values ('2011-01-01', 1, 35);
insert into file_stats values ('2011-01-02', 1, 37);
insert into file_stats values ('2011-01-01', 2, 40);
insert into file_stats values ('2011-01-01', 4, 35); -- not in files
Running this query (same as immediately above, but add ORDER BY) . . .
select fs.date, f.region, count(*)
from file_stats fs
left join files f on (f.id = fs.file_id)
group by fs.date, f.region
order by fs.date, f.region;
. . . returns
2011-01-01||1
2011-01-01|NYK|2
2011-01-02|NYK|1
'LDN' doesn't show, because there's no row in file_stats with file id number 3. One row has a null region, because no row in files has file id number 4.
You can quickly find mismatched rows with a left join.
select f.id, fs.file_id
from files f
left join file_stats fs on (fs.file_id = f.id)
where fs.file_id is null;
returns
3|
meaning that there's a row in files that has id 3, but no row in file_stats that has id 3. Flip the table around to determine the rows in file_stats that have no matching row in files.
select fs.file_id, f.id
from file_stats fs
left join files f on (fs.file_id = f.id)
where f.id is null;
One way of doing what you want (slower, due to the performance hit of the second half) is a UNION of the date/region pairs that have a count with a manufactured list of pairs that have a zero count:
-- Include the counts for date/region pairs that HAVE files
SELECT date, region, COUNT(*) as COUNT1
FROM file_stats fs, files f
WHERE fs.file_id = f.id
GROUP BY date, region
UNION
SELECT DISTINCT date, region, 0 as COUNT1
FROM file_stats fs0, files f0
WHERE NOT EXISTS (
    SELECT 1
    FROM file_stats fs, files f
    WHERE fs.file_id = f.id
      AND fs.date = fs0.date
      AND f.region = f0.region
)
I'm not entirely sure why you're opposed to the use of temp tables? E.g. (this is Sybase-ish syntax for temp table population, but it should port easily; I don't recall the exact SQLite form, so see the SQLite sketch after the query below). The table size should be minimal (just # of days * # of regions):
CREATE TABLE COMBINATIONS TEMPORARY (region VARCHAR(4), date DATE)
INSERT COMBINATIONS SELECT DISTINCT region, date FROM files, file_stats
SELECT c.date, c.region, SUM(CASE WHEN fs.file_id IS NULL THEN 0 ELSE 1 END) AS num_files
FROM COMBINATIONS c
LEFT JOIN files f ON f.region=c.region
LEFT OUTER JOIN file_stats fs ON fs.date=c.date AND fs.file_id = f.id
GROUP BY c.date, c.region
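For reference, a sketch of how the same idea might look in actual SQLite syntax (assuming the files/file_stats schema from the question):
CREATE TEMP TABLE combinations AS
    SELECT DISTINCT fs.date, f.region FROM file_stats fs, files f;
SELECT c.date, c.region,
       SUM(CASE WHEN fs.file_id IS NULL THEN 0 ELSE 1 END) AS num_files
FROM combinations c
LEFT JOIN files f ON f.region = c.region
LEFT JOIN file_stats fs ON fs.date = c.date AND fs.file_id = f.id
GROUP BY c.date, c.region;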
I suspect that it is having to scan file_stats and files for every single row of the output. The following version might be substantially faster, and it won't require creating new tables.
SELECT d.date
     , r.region
     , COUNT(f.id) AS num_files
FROM (SELECT DISTINCT date FROM file_stats) AS d
CROSS JOIN (SELECT DISTINCT region FROM files) AS r
LEFT JOIN file_stats AS fs
       ON fs.date = d.date
LEFT JOIN files f
       ON f.id = fs.file_id
      AND f.region = r.region
GROUP BY d.date, r.region;
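As a quick check against the sample rows from the earlier answer (files 1-3 plus the four file_stats rows), this query should return every date/region combination, including the zero counts; with an ORDER BY d.date, r.region added, the output looks something like:
2011-01-01|LDN|0
2011-01-01|NYK|2
2011-01-02|LDN|0
2011-01-02|NYK|1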
