How to convert rownum in following query(oracle) to teradata equivalent:
and not exists(select 1
from CSE, SPD
WHERE cse.id=spd.id
AND ROWNUM = 1
AND CSE.STATUSID IN(6,7,8,13)
thanks.
There's no ROWNUM in Teradata, but you can usually rewrite it using ROW_NUMBER plus QUALIFY.
In your case there's no need for ROWNUM at all (at least logically, maybe Oracle prefers it to do a better plan), this is exactly the same:
and not exists(select *
from CSE, SPD
WHERE cse.id=spd.id
AND CSE.STATUSID IN(6,7,8,13)
Teradata specifically does not have any rownumber attached to the rows in a table as in Oracle.
But it has two analytical functions such as ROW_NUMBER() and RANK() which will give a number to your row and then you can accordingly pick you data.
You may use something like below:
QUALIFY ROW_NUMBER()(Partition by Id order by date)=1
Here, Partition by with a column or few columns can be used to group your data, like suppose some id column in your table and order by will sort the data for that id in your table according to the order by column you provide and =1 means it then chooses the row for that id which is given row number as 1.
Use somethin like below:
row_number() over(partition by '' order by statusid asc) as rownum_1
qualify rownum_1 = 1
The above statement imitates the rownum functionality of oracle.
you can use row_number. keep in mind the columns you want to take into consideration while calculating the row number. You can use qualify for the same.
other options are dense_rank, rank etc. In your case you can use rank in the sub query and put the condition rank = 1. if you want the syntax do let me know.
As you using this tweak for efficiency, I suppose, QUALIFY ROW_NUMBER() and other window functions will not suite you during their heavy use of CPU.
You can simply remove this part, Teradata should be fine.
Related
I have 300000 entries in my db and am trying to access entry 50000-100000 (to 50000 total).
My query is as follows:
query = 'SELECT TOP 50000* FROM database ORDER BY col_name QUALIFY ROW_NUMBER() BETWEEN 50000 and 100000'
I only found the BETWEEN KEYWORD in one source however and am suspecting I am not using it correctly since it says it can't be used on a non-ordered database. I assume the QUALIFY then gets evaluated before the ORDER BY.
So I tried something along the lines of
query_second_try = 'SELECT TOP 50000* FROM database QUALIFY ROW_NUMBER() OVER (ORDER BY col_name)'
to see if this fixes the problem (without taking into account the specific rows I want to select). This is also not the case.
I have tried using qualify with rank, but this doesn't seem to be exactly what I need either, I think the BETWEEN statement would be a better fit.
Can someone push me in the right direction here?
I am essentially trying to do the equivalent of 'ORDER BY col_name OFFSET BY 50000' in teradata.
Any help would be appreciated.
Few problems here.
row_number requires an order by. And it needs to be granular enough to ensure it's deterministic. You can also play around with rank, dense_rank, and row_number, depending on what you want to do with ties.
You're also mixing top N and qualify.
Try this:
select
*
from
<table>
qualify row_number() over (order by <column(s)>) between X and Y
I am learning SQLite and constructed a line which I thought would delete dups but it deletes all rows instead.
DELETE from tablename WHERE rowid not in (SELECT distinct(timestamp) from tablename);
I expected this to delete rows with a duplicate (leaving one). I know I can simply create a new table with the distinct rows, but why does what I have done not work? Thanks
If timestamp is a column in the table and this is what you want to compare so to delete duplicates then do this:
delete from tablename
where exists (
select 1 from tablename t
where t.rowid < tablename.rowid and t.timestamp = tablename.timestamp
)
With recent versions of sqlite, the following is an alternative:
DELETE FROM tablename
WHERE rowid IN (SELECT rowid
FROM (SELECT rowid, row_number() OVER (PARTITION BY timestamp) AS rownum
FROM tablename)
WHERE rownum >= 2);
why does what I have done not work?
Consider the WHERE condition:
rowid not in (SELECT distinct(timestamp) from tablename)
The simple answer is that you are not comparing data in the same columns, nor are they columns with the same type of data. rowid is an automatically-incremented integer column and I assume that timestamp column is either a numeric or string column containing time values, or perhaps custom-generated sequential numeric values. Because rowid likely never matches a value in timestamp, then the NOT IN operation will always return true. Thus each row of the table will be deleted.
SQL is rather explicit and so there are no hidden/mysterious column comparisons. It will not automatically compare the rowid's from one query with another. Notice that the various alternative statements do something to distinguish rows with duplicate key values (timestamp in your case), either by direct comparison between main query and subquery, or using windowing functions to uniquely label rows with duplicate values, etc.
Just for kicks, here's another alternative that uses NOT IN like your original code.
DELETE FROM tablename
WHERE rowid NOT IN (
SELECT max(t.rowid) FROM tablename t
GROUP BY t.timestamp )
First notice that this is comparing rowid with max(t.rowid), values which derive from the same column.
Because the subquery groups on t.timestamp, the aggregate function max() will return the greatest/last t.rowid separately for each set of rows with the same t.timestamp value. The resultant list will exclude t.rowid values that are less than the maximum. Thus, the NOT IN operation will not find those lesser values and will return true so they will be deleted.
It also uses basic SQL (no window functions... the OVER keyword). It will likely be more efficient than the alternative that references the outer query from the subquery, because this statement can execute the subquery just once and then use an efficient index to match individual records... it doesn't need to rerun the query for each row. For that matter, it should also be more efficient than the windowing function, because the window partition essentially "groups" on the partitioned columns, but must then execute the windowing function for each row, an extra step not present in the basic aggregate query. Efficiency is not always critical, but something important to consider.
By the way, the distinct keyword is not a function and does not need/accept parenthesis. It is a directive that applies to the entire select statement. The subquery is being interpreted as
SELECT DISTINCT (timestamp) FROM tablename
where DISTINCT is interpreted in isolation and the parenthesis are interpreted as a separate expression.
Update
These two queries will return the same data:
SELECT DISTINCT timestamp FROM tablename;
SELECT timestamp FROM tablename GROUP BY timestamp;
Both results eliminate duplicate rows from the output by showing only unique/distinct values, but neither has a "handle" (other data column) which indicates which rows to keep and which rows to eliminate. In other words, these queries return distinct values, but the results loose all relationship to the source rows and so have no use in specifying which source rows to delete (or keep). To understand better, you should run subqueries separately to inspect what they return so that you can understand and verify what data you're working with.
To make those queries useful, we need to do something to distinguish rows with duplicate key values. The rows need a "handle"--some other key value to select for either deleting or keeping those rows. Try this...
SELECT DISTINCT rowid, timestamp FROM tablename;
But that won't work, because it applies the DISTINCT keyword to ALL returned columns, but since rowid is already unique it will necessarily output each row separately and so there is no use to the query.
SELECT max(rowid), timestamp FROM tablename GROUP BY timestamp;
That query preserves the unique grouping, but provides just one rowid per timestamp as the "handle" to include/exclude for deletion.
try this
DELETE liens from liens where
id in
( SELECT * FROM (SELECT min(id) FROM liens group by lkey having count(*) > 1 ) AS c)
you can do this many times
I'm not sure I'm using right terminology here.
Basically I want to update entire "id" column using count(*) [485] as a delimiter, in an ascending order, so the resulting row value will correspond with rownumber (not the rowid).
If I understand you correctly, this should work for you:
UPDATE tbl_name SET id=rowid
EDIT
If that's is the case -> then it's a lit bit more tricky, since SQlite doesn't support variables declaration.
So what I suggest is,
To create temporary table from select of your original table which makes it's rowids to be as row numbers 1,2,3 etc...
Set it's rowNum (the needed row number column) as each rowid
Then replace the original table with it.
Like this: (assume original table called orig_name)
CREATE TABLE tmp_tbl AS SELECT rowNum FROM orig_name;
UPDATE tmp_tbl SET rowNum=rowid;
DROP TABLE orig_name;
CREATE TABLE orig_name AS SELECT rowNum FROM tmp_tbl;
DROP TABLE tmp_tbl;
Try this: http://www.sqlite.org/lang_createtable.html#rowid
You can use some inner database variables such as rowid, oid etc to get what you need.
[edit]
Think I just understood what you meant, you want for each insert action, add a value that is the total count of rows currently in the table?
If so, try something like this:
UPDATE tbl_name
SET id = (select count(*) from tbl_name)
WHERE yada_yada_yada
I'm trying to add an auto-calculated field in SQL Server 2012 Express, that stores the % of project completion, by calculating the date difference by using:
ALTER TABLE dbo.projects
ADD PercentageCompleted AS (select COUNT(*) FROM projects WHERE project_finish > project_start) * 100 / COUNT(*)
But I am getting this error:
Msg 1046, Level 15, State 1, Line 2
Subqueries are not allowed in this context. Only scalar expressions are allowed.
What am I doing wrong?
Even if it would be possible (it isn't), it is anyway not something you would want to have as a caculated column:
it will be the same value in each row
the entire table would need to be updated after every insert/update
You should consider doing this in a stored procedure or a user defined function instead.Or even better in the business logic of your application,
I don't think you can do that. You could write a trigger to figure it out or do it as part of an update statement.
Are you storing "percentageCompleted" as a duplicated column value in the same table as your project data?
If this is the case, I would not recommend this, because it would duplicate the data.
If you don't care about duplicate data, try something separating the steps out like this:
ALTER TABLE dbo.projects
ADD PercentageCompleted decimal(2,2) --You could also store it as a varchar or char
declare #percentageVariable decimal(2,2)
select #percentageVariable = (select count(*) from projects where Project_finish > project_start) / (select count(*) from projects) -- need to get ratio by completed/total
update projects
set PercentageCompleted = #percentageVariable
this will give you a decimal value in that table, then you can format it on select if you desire to % + PercentageCompleted * 100
I want to get a number of rows in my table using max(id). When it returns NULL - if there are no rows in the table - I want to return 0. And when there are rows I want to return max(id) + 1.
My rows are being numbered from 0 and autoincreased.
Here is my statement:
SELECT CASE WHEN MAX(id) != NULL THEN (MAX(id) + 1) ELSE 0 END FROM words
But it is always returning me 0. What have I done wrong?
You can query the actual number of rows withSELECT Count(*) FROM tblName
see https://www.w3schools.com/sql/sql_count_avg_sum.asp
If you want to use the MAX(id) instead of the count, after reading the comments from Pax then the following SQL will give you what you want
SELECT COALESCE(MAX(id)+1, 0) FROM words
In SQL, NULL = NULL is false, you usually have to use IS NULL:
SELECT CASE WHEN MAX(id) IS NULL THEN 0 ELSE (MAX(id) + 1) END FROM words
But, if you want the number of rows, you should just use count(id) since your solution will give 10 if your rows are (0,1,3,5,9) where it should give 5.
If you can guarantee you will always ids from 0 to N, max(id)+1 may be faster depending on the index implementation (it may be faster to traverse the right side of a balanced tree rather than traversing the whole tree, counting.
But that's very implementation-specific and I would advise against relying on it, not least because it locks your performance to a specific DBMS.
Not sure if I understand your question, but max(id) won't give you the number of lines at all. For example if you have only one line with id = 13 (let's say you deleted the previous lines), you'll have max(id) = 13 but the number of rows is 1. The correct (and fastest) solution is to use count(). BTW if you wonder why there's a star, it's because you can count lines based on a criteria.
I got same problem if i understand your question correctly, I want to know the last inserted id after every insert performance in SQLite operation. i tried the following statement:
select * from table_name order by id desc limit 1
The id is the first column and primary key of the table_name, the mentioned statement show me the record with the largest id.
But the premise is u never deleted any row so the numbers of id equal to the numbers of rows.
Extension of VolkerK's answer, to make code a little more readable, you can use AS to reference the count, example below:
SELECT COUNT(*) AS c from profile
This makes for much easier reading in some frameworks, for example, i'm using Exponent's (React Native) Sqlite integration, and without the AS statement, the code is pretty ugly.