When I am copying rows from one table to another, there is a convenient syntax:
INSERT INTO table1 SELECT * FROM table2 WHERE <table 2 rows have some property>
But what if I want to overwrite entire existing rows in table1 with entire rows from table2? So I want something like:
UPDATE table1 SET * FROM table2 WHERE <table 1 and 2 rows match on some key field>
But from what I can tell, the only way to do this is to enumerate the columns being set one by one (set table1.columnA = table2.columnA, table1.columnB = table2.columnB, and so on). Is there some way to say "do it for all the columns" when using UPDATE like there is when using INSERT? If not, why not?
(I guess I could delete all rows from table1 with the given property, and then use the INSERT INTO table1 SELECT * syntax to bring in the replacement rows from table2. But that seems like it leaves a bunch of unwanted deleted rows in the database needing to be vacuumed at some point, as opposed to a clean UPDATE where there are no deleted records? Or maybe I'm not understanding the efficiency of a bunch of deletes followed by a bunch of inserts?)
There is no such syntax for exactly what you have in mind, so you will need to SET each column separately. Older versions of SQLite also lack an update-join syntax (UPDATE ... FROM was only added in version 3.33.0), but correlated subqueries work in every version:
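-- SET appears only once, and SQLite does not accept table-qualified names on the left of "="
UPDATE table1
SET columnA = (SELECT columnA FROM table2 WHERE table1.col = table2.col),
    columnB = (SELECT columnB FROM table2 WHERE table1.col = table2.col),
    columnC = (SELECT columnC FROM table2 WHERE table1.col = table2.col)
-- without this filter, rows with no match in table2 would be set to NULL
WHERE EXISTS (SELECT 1 FROM table2 WHERE table1.col = table2.col);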
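If the real objection is typing out every column, one option is to let a script build the SET list. The following is only a sketch using Python's sqlite3 module: the table and column names come from the question, the sample rows are made up, and it assumes the identifiers are trusted (they are interpolated into the SQL without quoting).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (col INTEGER PRIMARY KEY, columnA TEXT, columnB TEXT);
    CREATE TABLE table2 (col INTEGER PRIMARY KEY, columnA TEXT, columnB TEXT);
    INSERT INTO table1 VALUES (1, 'old', 'old'), (2, 'keep', 'keep');
    INSERT INTO table2 VALUES (1, 'new', 'new'), (3, 'extra', 'extra');
""")

key = "col"  # assumed key column shared by both tables

# PRAGMA table_info returns one row per column; index 1 holds the column name.
columns = [row[1] for row in conn.execute("PRAGMA table_info(table1)") if row[1] != key]

# One correlated subquery per non-key column, generated instead of typed by hand.
set_clause = ", ".join(
    f"{c} = (SELECT {c} FROM table2 WHERE table2.{key} = table1.{key})" for c in columns
)
sql = (
    f"UPDATE table1 SET {set_clause} "
    f"WHERE EXISTS (SELECT 1 FROM table2 WHERE table2.{key} = table1.{key})"
)
conn.execute(sql)

rows = conn.execute("SELECT * FROM table1 ORDER BY col").fetchall()
print(rows)
```

Only the row with a matching key is overwritten; the unmatched row in table1 is left alone and the extra row in table2 is not inserted.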
I am learning SQLite and constructed a line which I thought would delete dups but it deletes all rows instead.
DELETE from tablename WHERE rowid not in (SELECT distinct(timestamp) from tablename);
I expected this to delete rows with a duplicate (leaving one). I know I can simply create a new table with the distinct rows, but why does what I have done not work? Thanks
If timestamp is a column in the table and it is the value you want to compare in order to delete duplicates, then do this:
delete from tablename
where exists (
select 1 from tablename t
where t.rowid < tablename.rowid and t.timestamp = tablename.timestamp
)
With recent versions of SQLite (3.25 or later, which added window functions), the following is an alternative:
DELETE FROM tablename
WHERE rowid IN (SELECT rowid
FROM (SELECT rowid, row_number() OVER (PARTITION BY timestamp ORDER BY rowid) AS rownum
FROM tablename)
WHERE rownum >= 2);
why does what I have done not work?
Consider the WHERE condition:
rowid not in (SELECT distinct(timestamp) from tablename)
The simple answer is that you are not comparing data in the same columns, nor are they columns with the same type of data. rowid is an automatically assigned integer column, and I assume that the timestamp column is either a numeric or string column containing time values, or perhaps custom-generated sequential numeric values. Because a rowid will almost never match a value in timestamp, the NOT IN test is true for every row, so every row of the table is deleted.
SQL is rather explicit, so there are no hidden or mysterious column comparisons. It will not automatically compare the rowids from one query with those of another. Notice that the various alternative statements do something to distinguish rows with duplicate key values (timestamp in your case), either by direct comparison between the main query and the subquery, or by using window functions to uniquely label rows with duplicate values, etc.
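To see the mismatch concretely, here is a small sketch with Python's sqlite3 module. The table layout and the timestamp values are invented for illustration; only the DELETE statement comes from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tablename (timestamp INTEGER, payload TEXT)")
conn.executemany(
    "INSERT INTO tablename VALUES (?, ?)",
    [(1700000000, "a"), (1700000000, "b"), (1700000060, "c")],
)

# What the subquery produces versus the rowids it is compared against.
timestamps = [r[0] for r in conn.execute("SELECT DISTINCT timestamp FROM tablename ORDER BY 1")]
rowids = [r[0] for r in conn.execute("SELECT rowid FROM tablename ORDER BY rowid")]

# The statement from the question: no rowid appears among the timestamps,
# so NOT IN is true for every row.
conn.execute("DELETE FROM tablename WHERE rowid NOT IN (SELECT DISTINCT timestamp FROM tablename)")
remaining = conn.execute("SELECT count(*) FROM tablename").fetchone()[0]

print(timestamps, rowids, remaining)
```

The two lists share no values, which is why nothing survives the DELETE.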
Just for kicks, here's another alternative that uses NOT IN like your original code.
DELETE FROM tablename
WHERE rowid NOT IN (
SELECT max(t.rowid) FROM tablename t
GROUP BY t.timestamp )
First notice that this is comparing rowid with max(t.rowid), values which derive from the same column.
Because the subquery groups on t.timestamp, the aggregate function max() will return the greatest/last t.rowid separately for each set of rows with the same t.timestamp value. The resultant list will exclude t.rowid values that are less than the maximum. Thus, the NOT IN operation will not find those lesser values and will return true so they will be deleted.
It also uses basic SQL (no window functions... the OVER keyword). It will likely be more efficient than the alternative that references the outer query from the subquery, because this statement can execute the subquery just once and then use an efficient index to match individual records... it doesn't need to rerun the query for each row. For that matter, it should also be more efficient than the windowing function, because the window partition essentially "groups" on the partitioned columns, but must then execute the windowing function for each row, an extra step not present in the basic aggregate query. Efficiency is not always critical, but something important to consider.
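A quick way to check which rows this statement keeps is to run it on a throwaway table. This is a sketch with Python's sqlite3 module; the payload column and the sample values are assumptions made for the demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tablename (timestamp INTEGER, payload TEXT)")
conn.executemany(
    "INSERT INTO tablename VALUES (?, ?)",
    [(1700000000, "a"), (1700000000, "b"), (1700000060, "c")],
)

# Keep only the highest rowid within each group of equal timestamps.
conn.execute("""
    DELETE FROM tablename
    WHERE rowid NOT IN (
        SELECT max(t.rowid) FROM tablename t
        GROUP BY t.timestamp)
""")

kept = conn.execute("SELECT rowid, timestamp, payload FROM tablename ORDER BY rowid").fetchall()
print(kept)
```

Note that this keeps the last row of each group, whereas the EXISTS version keeps the first; swap max() for min() if the earliest row should survive.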
By the way, the distinct keyword is not a function and does not need or accept parentheses. It is a directive that applies to the entire select statement. The subquery is being interpreted as
SELECT DISTINCT (timestamp) FROM tablename
where DISTINCT is interpreted in isolation and the parentheses are interpreted as a separate expression.
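A small check, sketched with Python's sqlite3 module on made-up data, suggests the parentheses change nothing:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tablename (timestamp INTEGER)")
conn.executemany("INSERT INTO tablename VALUES (?)", [(10,), (10,), (20,)])

# "(timestamp)" is just a parenthesized expression; DISTINCT still applies to the whole row.
with_parens = conn.execute("SELECT DISTINCT (timestamp) FROM tablename ORDER BY 1").fetchall()
without_parens = conn.execute("SELECT DISTINCT timestamp FROM tablename ORDER BY 1").fetchall()

print(with_parens == without_parens, with_parens)
```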
Update
These two queries will return the same data:
SELECT DISTINCT timestamp FROM tablename;
SELECT timestamp FROM tablename GROUP BY timestamp;
Both results eliminate duplicate rows from the output by showing only unique/distinct values, but neither has a "handle" (another data column) that indicates which rows to keep and which rows to eliminate. In other words, these queries return distinct values, but the results lose all relationship to the source rows and so are of no use in specifying which source rows to delete (or keep). To understand this better, run the subqueries separately and inspect what they return, so that you can verify what data you are working with.
To make those queries useful, we need to do something to distinguish rows with duplicate key values. The rows need a "handle"--some other key value to select for either deleting or keeping those rows. Try this...
SELECT DISTINCT rowid, timestamp FROM tablename;
But that won't work, because it applies the DISTINCT keyword to ALL returned columns, but since rowid is already unique it will necessarily output each row separately and so there is no use to the query.
SELECT max(rowid), timestamp FROM tablename GROUP BY timestamp;
That query preserves the unique grouping, but provides just one rowid per timestamp as the "handle" to include/exclude for deletion.
Try this:
DELETE liens from liens where
id in
( SELECT * FROM (SELECT min(id) FROM liens group by lkey having count(*) > 1 ) AS c)
You can run this repeatedly: each run removes one row per duplicated lkey, so repeat it until no rows are deleted.
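The "run it repeatedly" idea can be sketched as a loop. This uses Python's sqlite3 module, so the DELETE is written in SQLite's form rather than the MySQL-style DELETE liens FROM liens above; the table contents are invented for the demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE liens (id INTEGER PRIMARY KEY, lkey TEXT)")
conn.executemany(
    "INSERT INTO liens (lkey) VALUES (?)",
    [("a",), ("a",), ("a",), ("b",), ("c",), ("c",)],
)

passes = 0
while True:
    # Each pass deletes the lowest id of every lkey that still has duplicates.
    cur = conn.execute("""
        DELETE FROM liens
        WHERE id IN (SELECT min(id) FROM liens GROUP BY lkey HAVING count(*) > 1)
    """)
    if cur.rowcount == 0:  # nothing deleted: no duplicates remain
        break
    passes += 1

remaining = conn.execute("SELECT id, lkey FROM liens ORDER BY id").fetchall()
print(passes, remaining)
```

With three copies of one key, two deleting passes are needed, which is why a single run is not always enough.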
I'm not sure I'm using right terminology here.
Basically I want to update entire "id" column using count(*) [485] as a delimiter, in an ascending order, so the resulting row value will correspond with rownumber (not the rowid).
If I understand you correctly, this should work for you:
UPDATE tbl_name SET id=rowid
EDIT
If that's the case, then it's a little bit trickier, since SQLite doesn't support variable declarations.
So what I suggest is:
1. Create a temporary table from a select of your original table, so that its rowids become row numbers 1, 2, 3, etc.
2. Set its rowNum (the needed row number column) to each row's rowid.
3. Replace the original table with it.
Like this (assuming the original table is called orig_name):
CREATE TABLE tmp_tbl AS SELECT rowNum FROM orig_name;
UPDATE tmp_tbl SET rowNum=rowid;
DROP TABLE orig_name;
CREATE TABLE orig_name AS SELECT rowNum FROM tmp_tbl;
DROP TABLE tmp_tbl;
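A different technique, which avoids dropping and recreating the table, is to count how many rows have a rowid less than or equal to the current one. This is only a sketch using Python's sqlite3 module; it assumes id is an ordinary column (not an INTEGER PRIMARY KEY alias for rowid), and the name column and sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl_name (id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO tbl_name VALUES (?, ?)",
    [(10, "a"), (20, "b"), (30, "c"), (40, "d")],
)
conn.execute("DELETE FROM tbl_name WHERE name = 'b'")  # leaves a gap: rowids 1, 3, 4

# For each row, id becomes the number of rows at or before it in rowid order.
conn.execute("""
    UPDATE tbl_name
    SET id = (SELECT count(*) FROM tbl_name AS t WHERE t.rowid <= tbl_name.rowid)
""")

rows = conn.execute("SELECT rowid, id, name FROM tbl_name ORDER BY rowid").fetchall()
print(rows)
```

The id values end up gap-free even though the rowids are not, at the cost of one counting subquery per row.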
Try this: http://www.sqlite.org/lang_createtable.html#rowid
You can use some inner database variables such as rowid, oid etc to get what you need.
[edit]
Think I just understood what you meant, you want for each insert action, add a value that is the total count of rows currently in the table?
If so, try something like this:
UPDATE tbl_name
SET id = (select count(*) from tbl_name)
WHERE yada_yada_yada
I execute a query in my DB:
SELECT table1.*, table2.* FROM table1 JOIN table2 USING(id);
These two tables have a common column, "id". What do I have to write in order to get the "id" column only once in the results instead of twice?
I thought one solution would be to list in the query exactly which columns I want. But what if I want to avoid that (as there are many)?
Will it work for you to name specific columns you need from both tables? something like:
SELECT table1.id, table2.other_column1, table2.other_column2 FROM table1 JOIN table2 USING(id);
You are selecting all fields from both tables by using (*)
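For what it's worth, in SQLite a bare * combined with USING already returns the shared column once; it is the table-qualified table1.*, table2.* form that repeats it. The sketch below checks this with Python's sqlite3 module. The extra columns a and b are made up, and other database engines may behave differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER, a TEXT);
    CREATE TABLE table2 (id INTEGER, b TEXT);
    INSERT INTO table1 VALUES (1, 'x');
    INSERT INTO table2 VALUES (1, 'y');
""")

# cursor.description holds one entry per result column; index 0 is the column name.
star = [d[0] for d in conn.execute("SELECT * FROM table1 JOIN table2 USING(id)").description]
qualified = [
    d[0]
    for d in conn.execute("SELECT table1.*, table2.* FROM table1 JOIN table2 USING(id)").description
]

print(star)
print(qualified)
```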
Hope you are doing well. I am new to SQL coding. I want to write a query that finds the differences between two tables and writes the updated or new rows into a third table. My two tables have identical column names. The third table, which captures the changes, has an extra column called comment. I would like the comment to record whether each row is a new row or an updated row.
**TABLE1 (BACKUP)**
KEY,FIRST_NAME,LAST_NAME,CITY
1,RAM,KUMAR,INDIA
2,TOM,MOODY,ENGLAND
3,MOHAMMAD,HAFEEZ,PAKISTAN
4,MONIKA,SAM,USA
5,MIKE,PALEDINO,USA
**TABLE2 (CURRENT)**
KEY,FIRST_NAME,LAST_NAME,CITY
1,RAM,KUMAR,USA
2,TOM,MOODY,ENGLAND
3,MOHAMMAD,HAFEEZ,PAKISTAN
4,MONIKA,SAM,INDIA
5,MIKE,PALEDINO,USA
6,MAHELA,JAYA,SL
**TABLE3 (DIFFERENCE FROM TABLE2 TO TABLE1)**
KEY,FIRST_NAME,LAST_NAME,CITY,COMMENT
1,RAM,KUMAR,USA,UPDATE
4,MONIKA,SAM,INDIA,UPDATE
6,MAHELA,JAYA,SL,INSERT
Anyone else? I want to update my comments columns whether it is a new insert or update to existing row
#danny117 is correct in the general sense though I think using MINUS is better
SELECT * FROM TABLE2
MINUS
SELECT * FROM TABLE1
You may also like to look at this documentation which explains more about minus, intersect
INSERT INTO TABLE3
SELECT KEY,FIRST_NAME,LAST_NAME,CITY,NULL AS COMMENTS FROM TABLE2
MINUS
SELECT KEY,FIRST_NAME,LAST_NAME,CITY,NULL AS COMMENTS FROM TABLE1;
UPDATE TABLE3
SET COMMENTS =
CASE
WHEN 1=(SELECT 1 FROM TABLE1 WHERE TABLE1.KEY=TABLE3.KEY) THEN 'UPDATED'
ELSE 'INSERTED'
END
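The same idea can be tried end to end on the sample data from the question. This is a sketch using Python's sqlite3 module, so it uses EXCEPT (SQLite has no MINUS) and fills the comment in the same INSERT instead of a follow-up UPDATE. The column is named COMMENT and the labels are UPDATE/INSERT to match the expected output in the question; those choices are assumptions, not the answer's exact code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript('''
    CREATE TABLE table1 ("KEY" INTEGER, FIRST_NAME TEXT, LAST_NAME TEXT, CITY TEXT);
    CREATE TABLE table2 ("KEY" INTEGER, FIRST_NAME TEXT, LAST_NAME TEXT, CITY TEXT);
    CREATE TABLE table3 ("KEY" INTEGER, FIRST_NAME TEXT, LAST_NAME TEXT, CITY TEXT, COMMENT TEXT);
    INSERT INTO table1 VALUES
        (1,'RAM','KUMAR','INDIA'), (2,'TOM','MOODY','ENGLAND'),
        (3,'MOHAMMAD','HAFEEZ','PAKISTAN'), (4,'MONIKA','SAM','USA'),
        (5,'MIKE','PALEDINO','USA');
    INSERT INTO table2 VALUES
        (1,'RAM','KUMAR','USA'), (2,'TOM','MOODY','ENGLAND'),
        (3,'MOHAMMAD','HAFEEZ','PAKISTAN'), (4,'MONIKA','SAM','INDIA'),
        (5,'MIKE','PALEDINO','USA'), (6,'MAHELA','JAYA','SL');
''')

# Rows of table2 that are not identical to any row of table1,
# labelled by whether their key already existed in the backup.
conn.execute('''
    INSERT INTO table3
    SELECT d."KEY", d.FIRST_NAME, d.LAST_NAME, d.CITY,
           CASE WHEN EXISTS (SELECT 1 FROM table1 AS b WHERE b."KEY" = d."KEY")
                THEN 'UPDATE' ELSE 'INSERT' END
    FROM (SELECT "KEY", FIRST_NAME, LAST_NAME, CITY FROM table2
          EXCEPT
          SELECT "KEY", FIRST_NAME, LAST_NAME, CITY FROM table1) AS d
''')

diff = conn.execute('SELECT * FROM table3 ORDER BY "KEY"').fetchall()
for row in diff:
    print(row)
```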
Can we delete duplicate records from a multiset table in teradata without using intermediate table.
Suppose we have 2 rows with values
1, 2, 3
and 1, 2, 3
in my multiset table then after delete i should have
only one row i.e. 1, 2, 3.
You can't, unless ROWID usage has been enabled on your system (and the probability of that is quite low). You can easily test it by trying to explain a SELECT ROWID FROM table;
Otherwise there are two possible ways.
Low number of duplicates:
create a new table as result of SELECT all columns FROM table GROUP BY all columns HAVING COUNT(*) > 1;
DELETE FROM tab WHERE EXISTS (SELECT * FROM newtab WHERE...)
INSERT INTO tab SELECT * FROM newtab
High number of duplicates:
copy to a new table using SELECT DISTINCT * or copy to a SET TABLE to get rid of the duplicates and then re-INSERT back
Use the same approach, but create a volatile table in the middle.
CREATE VOLATILE MULTISET TABLE TEMPDB.TEMP_DUP_ID (
Row_ID DECIMAL(31,0)
) PRIMARY INDEX (Row_ID)
ON COMMIT PRESERVE ROWS;
INSERT INTO TEMPDB.TEMP_DUP_ID
SELECT ROW_ID
FROM DB.TABLE T
QUALIFY ROW_NUMBER() OVER (PARTITION BY DUP ORDER BY DUP DESC) > 1
Then use the table to delete.
Ideally you will have unique key per row, otherwise, you will need to manipulate the data a bit more to generate one (with row_number() for instance... This is just a recommendation).
---Without creating intermediate table
delete FROM ORGINAL_TABLE WHERE (COL1, 2) in (select COL1, count(*) from ORGINAL_TABLE
GROUP BY 1
HAVING COUNT(*) >1 )
and DUPLICATE_BASED_COL >1; -------Delete one row(keep it)
If you have duplicates and want to delete only one row of each pair, keep the last line of the SQL; if you want to delete both rows, leave that condition out.
create table without dup
CREATE TABLE new AS (SELECT DISTINCT * FROM old) WITH DATA;
verify
select * from new;
drop the original one
drop table old;
rename the new table as original
RENAME TABLE new to old;
verify
select * from old;
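The copy-distinct-and-swap steps can be rehearsed on a scratch database first. The sketch below uses Python's sqlite3 module purely as a stand-in, so the syntax is SQLite's (no WITH DATA, and ALTER TABLE ... RENAME TO instead of RENAME TABLE), not Teradata's; the table names old_table and new_table and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE old_table (a INTEGER, b INTEGER, c INTEGER);
    INSERT INTO old_table VALUES (1, 2, 3), (1, 2, 3), (4, 5, 6);

    -- copy only the distinct rows, then swap the copy in for the original
    CREATE TABLE new_table AS SELECT DISTINCT * FROM old_table;
    DROP TABLE old_table;
    ALTER TABLE new_table RENAME TO old_table;
""")

rows = conn.execute("SELECT * FROM old_table ORDER BY a").fetchall()
print(rows)
```

Keep in mind that a table created from a SELECT generally does not carry over indexes or constraints, so those would need to be recreated after the swap.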
SEL * FROM THE_TABLE_Containing_duplications
QUALIFY (ROW_number() over(partition by duplicated_column order by duplicated_column)=1) --keep only one occurrence (the first one)