SQL query: date range and value greater than 10 each year - SQLite

I have a table with, say, 3 columns (companyname, years, roc). I want to find every companyname that satisfies the following condition:
between the years 2010 and 2020, roc > 10 in every year. That is, if roc < 10 in any year between 2010 and 2020, that company should be excluded; a companyname should be returned only if roc > 10 for each year.

Filter the table for the years you want, then aggregate with a condition on the roc column in the HAVING clause:
SELECT companyname
FROM tablename
WHERE years BETWEEN 2010 AND 2020
GROUP BY companyname
HAVING MIN(roc) > 10;
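Note that MIN(roc) > 10 only tests rows that actually exist: a company with no row at all for some year in the range would still pass. If companies can have missing years, you can additionally require all 11 years to be present (a sketch, assuming at most one row per company per year):
SELECT companyname
FROM tablename
WHERE years BETWEEN 2010 AND 2020
GROUP BY companyname
HAVING MIN(roc) > 10
AND COUNT(DISTINCT years) = 11;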

Okay, I think I now understand your problem. The main issue here is that companies are duplicated within a single table, which violates normalization principles and makes the table harder to query.
So the best thing you could do is break this single table into two: COMPANY and COMPANY_ROC.
COMPANY will only have companyID (PRIMARY KEY) and companyName.
COMPANY_ROC will have roc, years, and companyID (a FOREIGN KEY referencing the PRIMARY KEY of COMPANY).
CREATE TABLE COMPANY(
companyID int PRIMARY KEY NOT NULL,
companyName varchar
);
CREATE TABLE COMPANY_ROC(
companyID int,
roc int,
years int,
FOREIGN KEY(companyID) REFERENCES COMPANY(companyID)
);
So when querying, you can use an INNER JOIN as follows:
SELECT COMPANY.companyName
FROM COMPANY
INNER JOIN COMPANY_ROC ON COMPANY.companyID = COMPANY_ROC.companyID
WHERE COMPANY_ROC.years BETWEEN 2010 AND 2020
AND COMPANY_ROC.roc > 10;
My query might still have issues, as I didn't test it; just follow the idea and give it a try. Breaking the table into two and having primary and foreign keys is the key to easy querying :)
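Note that the join above returns a company as soon as roc > 10 in any single year of the range (once per qualifying year, in fact). To enforce the "every year" requirement from the question, combine the join with the same GROUP BY/HAVING technique as the first answer (a sketch, untested):
SELECT COMPANY.companyName
FROM COMPANY
INNER JOIN COMPANY_ROC ON COMPANY.companyID = COMPANY_ROC.companyID
WHERE COMPANY_ROC.years BETWEEN 2010 AND 2020
GROUP BY COMPANY.companyID, COMPANY.companyName
HAVING MIN(COMPANY_ROC.roc) > 10;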

Related

SQLite Swap 2 Primary Key Values In A Column

My table student is as follows:
Name    Id
Dave    3414
Bob     2861
The ID is the primary key value that has a unique constraint. I'm trying to swap the two primary key values, so Bob's ID is 3414 and Dave's ID is 2861. What's the quickest way to do this?
UPDATE student SET Id=2861 WHERE Id=3414;
UPDATE student SET Id=3414 WHERE Id=2861;
These two statements won't work: the first one alone would already create a duplicate primary key.
You can do it in three steps (wrapped in a transaction so the swap is atomic as far as other connections to the database are concerned) by first changing one of the PKs to a value that's not already in the database. Negative numbers work well, assuming your PKs are normally values >= 0, which is true of automatically generated rowid/INTEGER PRIMARY KEY values in SQLite.
BEGIN;
UPDATE student SET Id=-Id WHERE Id=2861;
UPDATE student SET Id=2861 WHERE Id=3414;
UPDATE student SET Id=3414 WHERE Id=-2861;
COMMIT;
Somebody asked something like this here, but that approach apparently does not work in some versions.
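If the approach referred to was the usual single-statement swap, it would look something like this (a hypothetical reconstruction):
UPDATE student
SET Id = CASE Id WHEN 3414 THEN 2861 ELSE 3414 END
WHERE Id IN (3414, 2861);
SQLite checks UNIQUE constraints row by row as the UPDATE proceeds rather than at the end of the statement, so this can fail with a "UNIQUE constraint failed" error; hence the three-step transaction above is the safer route.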

How to solve a spool space error with rank () over partition by SQL optimising?

I have a table holding information about contacts made to many different customers in the format
email_address | treatment_group | customer_id | contact_date |
I am trying to add a column that looks at each distinct customer and numbers the contacts they've received from longest ago to most recent. I'm using this code:
explain create table db.responses_with_rank as
( select a.*,
rank() over (partition by customer_id order by contact_date asc) as xrank
from db.responses_with_rank a
)
with data
primary index (email_address, treatment_group)
My query is running out of spool space. There is a primary index of (email_address, treatment_group), which gives a skew factor of 1.1, and a secondary index on customer_id. I've collected statistics on both sets of indexes. The table is quite large, around 200M records. Is there something I can try to optimize this query?
There is not enough information to determine the cause of the error.
For a start, please add the following to your question:
TD version (select * from dbc.dbcinfo)
Execution plan
The statistics collection commands you have used
customer_id top frequencies (select top 10 customer_id,count(*) from db.responses_with_rank group by 1 order by 2 desc)
Do you have wide text columns in your table?
P.S.
I strongly recommend using CREATE MULTISET TABLE rather than plain CREATE TABLE: in Teradata session mode the default is a SET table, which checks every inserted row for duplicates, an expensive check on a table this size.
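Also worth knowing: because the window function partitions by customer_id while the primary index is (email_address, treatment_group), Teradata must redistribute all ~200M rows by customer_id into spool before it can rank them. If the downstream workload allows it, giving the new table a primary index on customer_id may save a further redistribution when the result is written (a sketch, untested; db.responses stands in for whatever your real source table is):
create multiset table db.responses_with_rank as
( select a.*,
rank() over (partition by customer_id order by contact_date asc) as xrank
from db.responses a
)
with data
primary index (customer_id);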

How to find 2 matching records comparing two fields in one table to two fields in another table?

Using MS Access 2010, I have two tables that I need to compare and only retrieve the matches.
Table [EE] has 5 fields:
Field4, SSN, Birthdate, Address1, Address2
Table [UPDATED] has fields:
Field4, DOB
and several others not relevant to this question.
I need to find all records in [EE] whose Field4 AND Birthdate have matching values in BOTH Field4 and DOB in [UPDATED]. I tried an INNER JOIN, and it is returning several duplicates. I have tried:
SELECT EE.Birthdate, EE.Field4
FROM EE, UPDATED
WHERE (EE.Birthdate = UPDATED.DOB)
AND (EE.Field4 = UPDATED.FIELD4)
And
SELECT EE.Birthdate, EE.Field4
FROM EE INNER JOIN UPDATED
ON (EE.Birthdate = UPDATED.DOB)
AND (EE.Field4 = UPDATED.FIELD4)
I am getting a lot of duplicate records and only want the records that appear in BOTH tables.
There probably are actual duplicates in the data. Add the DISTINCT keyword to eliminate them:
SELECT DISTINCT EE.Birthdate, EE.Field4
FROM EE
INNER JOIN UPDATED
ON (EE.Birthdate = UPDATED.DOB
AND EE.Field4 = UPDATED.FIELD4)
I also fixed a few syntax errors.
Also, always prefer explicit JOIN ... ON syntax over listing tables with commas in the FROM clause.
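If the duplicates come from UPDATED containing several rows per person, an EXISTS subquery is an alternative that returns each EE record at most once without needing DISTINCT (a sketch in Access SQL, untested):
SELECT EE.Birthdate, EE.Field4
FROM EE
WHERE EXISTS (SELECT * FROM UPDATED
WHERE UPDATED.DOB = EE.Birthdate
AND UPDATED.Field4 = EE.Field4);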

Get last_insert_rowid() from bulk insert

I have an SQLite database where I need to insert spatial information along with metadata into an R*tree and an accompanying regular table. Each entry needs to be uniquely defined for the lifetime of the database. Therefore the regular table has an INTEGER PRIMARY KEY AUTOINCREMENT column, and my plan was to start with the insert into this table, extract the last inserted rowids, and use these for the insert into the R*tree. Alas, this doesn't seem possible:
>testCon <- dbConnect(RSQLite::SQLite(), ":memory:")
>dbGetQuery(testCon, 'CREATE TABLE testTable (x INTEGER PRIMARY KEY, y INTEGER)')
>dbGetQuery(testCon, 'INSERT INTO testTable (y) VALUES ($y)', bind.data=data.frame(y=1:5))
>dbGetQuery(testCon, 'SELECT last_insert_rowid() FROM testTable')
last_insert_rowid()
1 5
2 5
3 5
4 5
5 5
Only the last inserted rowid seems to be kept (probably for performance reasons). As the number of records to be inserted is in the hundreds of thousands, it is not feasible to do the inserts line by line.
So the question is: Is there any way to make the last_insert_rowid() bend to my will? And if not, what is the best failsafe alternative? Some possibilities:
Record highest rowid before insert and 'SELECT rowid FROM testTable WHERE rowid > prevRowid'
Get the number of rows to insert, fetch the last_insert_rowid() and use seq(to=lastRowid, length.out=nInserts)
While the two suggestions above should intuitively work, I don't feel confident enough in SQLite to know whether they are failsafe.
The algorithm for generating autoincrementing IDs is documented.
For an INTEGER PRIMARY KEY column, you can simply get the current maximum value:
SELECT IFNULL(MAX(x), 0) FROM testTable
and then use the next values.
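Putting it together, the pattern is: remember the current maximum before the bulk insert, do the insert, then read back every key above that mark (a minimal sketch; run it in one transaction so a concurrent writer cannot interleave its own inserts):
BEGIN;
SELECT IFNULL(MAX(x), 0) FROM testTable;      -- remember this in client code
INSERT INTO testTable (y) VALUES (1), (2), (3);
SELECT x FROM testTable WHERE x > :prev_max;  -- bind the remembered value
COMMIT;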

Speed up SQL select in SQLite

I'm making a large database that, for the sake of this question, let's say contains 3 tables:
A. Table "Employees" with fields:
id = INTEGER PRIMARY KEY AUTOINCREMENT
Others don't matter
B. Table "Job_Sites" with fields:
id = INTEGER PRIMARY KEY AUTOINCREMENT
Others don't matter
C. Table "Workdays" with fields:
id = INTEGER PRIMARY KEY AUTOINCREMENT
emp_id = a foreign key to Employees(id)
job_id = a foreign key to Job_Sites(id)
datew = INTEGER that stands for the actual workday, stored as Unix time (seconds since midnight of Jan 1, 1970)
The most common operation in this database is to display workdays for a specific employee. I perform the following select statement:
SELECT * FROM Workdays WHERE emp_id='Actual Employee ID' AND job_id='Actual Job Site ID' AND datew>=D1 AND datew<D2
I need to point out that D1 and D2 are calculated as the start of the month being searched and the start of the next month, respectively.
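For what it's worth, SQLite can compute such month bounds itself with its built-in date functions (the date literal below is just an example):
SELECT strftime('%s', '2020-03-15', 'start of month');             -- D1
SELECT strftime('%s', '2020-03-15', 'start of month', '+1 month'); -- D2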
I actually have two questions:
Should I index any fields besides the primary keys? (Sorry, I seem to misunderstand the whole indexing concept.)
Is there any way to rewrite the SELECT statement to speed it up? For instance, most of the checks in it are to see that the employee ID and job site ID match. Maybe there's a way to split it up?
PS. Forgot to say, I use SQLite in a Windows C++ application.
If you use the above query often, then you may get better performance by creating a multicolumn index containing the columns in the query:
CREATE INDEX WorkdaysLookupIndex ON Workdays (emp_id, job_id, datew);
Sometimes you just have to create the index and try your queries to see what is faster.
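You can verify that SQLite actually uses the index with EXPLAIN QUERY PLAN (the IDs and timestamps below are made-up example values, and the exact output wording varies between SQLite versions):
EXPLAIN QUERY PLAN
SELECT * FROM Workdays
WHERE emp_id = 42 AND job_id = 7 AND datew >= 1583020800 AND datew < 1585699200;
-- expected: SEARCH TABLE Workdays USING INDEX WorkdaysLookupIndex (emp_id=? AND job_id=? AND datew>?)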
