I have a .NET web application that uses SQL Server 2008. The data table I am trying to display in a grid contains columns that are actually rows of another table. Right now I do this in the BLL: read the data into a DataTable, fetch the related data from the other table and turn it into columns of the first DataTable, and then walk every row of the DataTable to populate the new columns. Very time consuming and slow.
I believe this can be done through a query in SQL Server 2012 and above using "transpose" or something similar, but I'm not sure it is possible in 2008. I researched and tried using PIVOT, but I am not good at SQL and couldn't get it to work.
This is a simplified example of DB tables and what I need to display:
Facility Table:
FacilityID
12345
67890
PartnerInfo table:
PartnerID Partner
1 Partner1
2 Partner2
3 Partner3
FacilityPartner table:
FacilityID PartnerID
12345 1
12345 3
67890 2
67890 3
Need a query to return something like:
FacilityID Partner1 Partner2 Partner3
12345 true false true
67890 false true true
The following should give you some idea of how to pivot the data. It doesn't give you the exact true/false output you asked for.
declare @facility table (facilityId int)
declare @PartnerInfo table (partnerid int, partnerN varchar(1000))
declare @FacilityPartner table (facilityId int, partnerid int)
insert into @facility values (12345)
insert into @facility values (67890)
insert into @facility values (67891)
insert into @PartnerInfo values (1, 'partner1')
insert into @PartnerInfo values (2, 'partner2')
insert into @PartnerInfo values (3, 'partner3')
insert into @FacilityPartner values (12345, 1)
insert into @FacilityPartner values (12345, 3)
insert into @FacilityPartner values (67890, 2)
insert into @FacilityPartner values (67890, 3)
-- the un-pivoted rows: one row per facility/partner pair
select f.facilityId as facid, p.partnerN as partn, 100 as val
FROM @facility f
LEFT JOIN @FacilityPartner fp on f.facilityId = fp.facilityId
LEFT JOIN @PartnerInfo p on p.partnerid = fp.partnerid
-- the pivoted rows: one column per partner, 100 where the pair exists, NULL otherwise
select facid, partner1, partner2, partner3 FROM
(select f.facilityId as facid, p.partnerN as partn, 100 as val
 FROM @facility f
 LEFT JOIN @FacilityPartner fp on f.facilityId = fp.facilityId
 LEFT JOIN @PartnerInfo p on p.partnerid = fp.partnerid) x
PIVOT(
avg(val)
for partn in ([partner1], [partner2], [partner3])
) as pvt
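If you do want actual true/false values, one way (a sketch that simply wraps the same pivot, turning the 100/NULL results it produces into bit values, which display as 1/0) is:
select facid,
       cast(case when partner1 is null then 0 else 1 end as bit) as Partner1,
       cast(case when partner2 is null then 0 else 1 end as bit) as Partner2,
       cast(case when partner3 is null then 0 else 1 end as bit) as Partner3
FROM
(select f.facilityId as facid, p.partnerN as partn, 100 as val
 FROM @facility f
 LEFT JOIN @FacilityPartner fp on f.facilityId = fp.facilityId
 LEFT JOIN @PartnerInfo p on p.partnerid = fp.partnerid) x
PIVOT(
avg(val)
for partn in ([partner1], [partner2], [partner3])
) as pvt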
The first thing to understand is that, just like many other languages, SQL has a sort of "compile" process, where an execution plan is produced. An SQL query MUST be able to know the precise number and types of its result columns at compile time, without referencing the data (it does have some table metadata available for the compile, which is why SELECT * works).
This means what you want to do is only possible if one of two conditions is met:
You must know the precise number of partners (and the names for the columns, in this case) ahead of time. This is true even for a query using the PIVOT keyword.
You must be willing to do this in multiple steps, using dynamic SQL, where the first step looks at the data to know how many columns you'll need. Then you build up a new query in a varchar variable, and finally execute that string using Exec() or sp_executesql(). This works because the last step invokes a new "compile" process and execution context for that string variable.
Of course there's also a third option: pivot the data in your client code. That is my preference. Most people, though, opt for option 2.
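For example, here is a minimal sketch of option 2 against the question's tables (Facility, PartnerInfo, FacilityPartner; the column names are taken from the question). It uses the FOR XML PATH trick to build the column list, since that works on SQL Server 2008:
DECLARE @cols nvarchar(max), @sql nvarchar(max);

-- step 1: look at the data to build the column list, e.g. [Partner1],[Partner2],[Partner3]
SELECT @cols = STUFF((SELECT ',' + QUOTENAME(Partner)
                      FROM PartnerInfo
                      ORDER BY PartnerID
                      FOR XML PATH('')), 1, 1, '');

-- step 2: build the pivot query as a string; executing it triggers a fresh compile
SET @sql = N'SELECT facid, ' + @cols + N'
FROM (SELECT f.FacilityID AS facid, p.Partner AS partn, 100 AS val
      FROM Facility f
      LEFT JOIN FacilityPartner fp ON f.FacilityID = fp.FacilityID
      LEFT JOIN PartnerInfo p ON p.PartnerID = fp.PartnerID) x
PIVOT (AVG(val) FOR partn IN (' + @cols + N')) AS pvt;';

EXEC sp_executesql @sql;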
I have a sqlite database of about 1.4 million rows and 16 columns.
I have to run an operation on 80,000 ids:
Get all rows associated with that id
Convert to an R Date object and sort by date
Calculate the difference between the 2 most recent dates
For each id I have been querying sqlite from R using dbSendQuery and dbFetch for step 1, while steps 2 and 3 are done in R. Is there a faster way? Would it be faster or slower to load the entire sqlite table into a data.table?
It heavily depends on how you are working on that problem.
Normally, loading the whole query result into memory and then doing the operations on it will be faster, from what I have experienced and seen in benchmark graphics; I cannot show you a benchmark right now. Logically it also makes sense, because you have to repeat several operations multiple times on multiple data.frames, and fetching 80k rows in one go is pretty fast, faster than, say, three separate fetches of ~26,000 rows.
However, you could have a look at the parallel package and use multiple cores on your machine to load subsets of your data and process them in parallel, each on a separate core.
Here you can find information how to do this:
http://jaehyeon-kim.github.io/2015/03/Parallel-Processing-on-Single-Machine-Part-I
If you're doing all that in R and fetching rows from the database 80,000 times in a loop... you'll probably have better results doing it all in one go in sqlite instead.
Given a skeleton table like:
CREATE TABLE data(id INTEGER, timestamp TEXT);
INSERT INTO data VALUES (1, '2019-07-01'), (1, '2019-06-25'), (1, '2019-06-24'),
(2, '2019-04-15'), (2, '2019-04-14');
CREATE INDEX data_idx_id_time ON data(id, timestamp DESC);
a query like:
SELECT id
, julianday(first_ts)
- julianday((SELECT max(d2.timestamp)
FROM data AS d2
WHERE d.id = d2.id AND d2.timestamp < d.first_ts)) AS days_difference
FROM (SELECT id, max(timestamp) as first_ts FROM data GROUP BY id) AS d
ORDER BY id;
will give you
id days_difference
---------- ---------------
1 6.0
2 1.0
An alternative for modern versions of sqlite (3.25 or newer) (EDIT: On a test database with 16 million rows and 80,000 distinct ids, it runs considerably slower than the above one, so you don't want to actually use it):
WITH cte AS
(SELECT id, timestamp
, lead(timestamp, 1) OVER id_by_ts AS next_ts
, row_number() OVER id_by_ts AS rn
FROM data
WINDOW id_by_ts AS (PARTITION BY id ORDER BY timestamp DESC))
SELECT id, julianday(timestamp) - julianday(next_ts) AS days_difference
FROM cte
WHERE rn = 1
ORDER BY id;
(The index is essential for performance for both versions. Probably want to run ANALYZE on the table at some point after it's populated and your index(es) are created, too.)
I have a question similar to the one here: distinct values as new columns & count
But instead of having only 3 values (in the case above: drivers), I have about 1 million, so I cannot list all of them in my code. How can I do that in SQLite?
So I kind of want something like the code below to be repeated for i = 1 to length(DISTINCT(driver)):
SELECT model
, COUNT(model) as drives
, SUM(distance) as distance
, SUM(CASE WHEN driver=DISTINCT(driver)[i] THEN 1 ELSE 0 END) AS DISTINCT(driver)[i]
FROM new_table
GROUP BY model;
SQLite has no mechanism for dynamic SQL. You have to read the list of all possible drivers from the database and then, in your program, construct the query with a separate SUM(CASE...) column for each value.
But a large number of columns is inefficient, and once it grows beyond 2000 (SQLite's default column limit), it will not work at all.
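For example, if reading SELECT DISTINCT driver FROM new_table returned the two values 'alice' and 'bob' (hypothetical driver names, purely for illustration), the query your program generates would look like:
SELECT model
     , COUNT(model) AS drives
     , SUM(distance) AS distance
     , SUM(CASE WHEN driver = 'alice' THEN 1 ELSE 0 END) AS alice
     , SUM(CASE WHEN driver = 'bob'   THEN 1 ELSE 0 END) AS bob
FROM new_table
GROUP BY model;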
It might be a better idea to return each matrix entry individually:
SELECT model,
driver,
COUNT(*) AS drives_for_this_model_and_driver
FROM new_table
GROUP BY model, driver
ORDER BY model, driver;
Is there any way to query directly by row number in a table in Oracle? In other words, to achieve the same effect as an ordinary array lookup in a language like C or Java. I've not yet tried virtual columns.
For instance, the following is an example of an efficient query, but it wastes disk space:
create table ary (row_position_id number(10) NOT NULL,
datum binary_float NOT NULL);
declare i pls_integer;
begin
for i in 0..10000000
loop
insert into ary values (i, dbms_random.normal());
end loop;
commit;
end;
create unique index ary_rp on ary(row_position_id);
Now, I'm going to create a set of query values to store in another "parameter" table:
create table query_values (qval number(10) NOT NULL);
declare i pls_integer;
begin
for i in 0..10000
loop
insert into query_values values (abs(mod(dbms_random.random(), 10000000)));
end loop;
commit;
end;
Now, having these query values, I'm going to query the original table:
select d.* from ary d where exists (select 0 from query_values v
where d.row_position_id = v.qval);
Now, this query would be fine -- it would use an INDEX UNIQUE SCAN and TABLE ACCESS BY ROWID. The problem I have is that row_position_id takes up as much space in the table blocks as the actual data (the DATUM column).
I am aware of Index-organized tables and also Virtual Columns (which cannot be used with IOTs). And, of course, things like ROWNUM and ROW_NUMBER are irrelevant here (unless I'm misunderstanding something).
Also worth pointing out, this table is static data -- once loaded, it will never change. I would likely do an ALTER TABLE ARY READ ONLY;
What I would really like is:
create table ary (datum binary_float not null);
-- load rows in a specific order
-- efficiently query this table by implicit row position
Thanks very much!
Henry
I think you're going to want to keep the extra column. Here's why:
As you said, ROWNUM and ROW_NUMBER are not applicable here because they are generated as rows are returned in the query; they will not tell you anything about insert order.
What about ROWID? ROWID is just where a row is stored. From the docs, it consists of:
The data object number of the object
The data block in the data file in which the row resides
The position of the row in the data block (first row is 0)
The data file in which the row resides (first file is 1). The file number is relative to the tablespace.
The "position in the data block" sounds interesting, but you would have no idea what order of the data blocks that were inserted (Oracle could use whatever datablocks it can quickly make use of) so this would not be a reliable option, and even so, you'd be having to parse ROWIDs which are not human readable (e.g. in 12g they look like this: *BAGAASMCwQL+ )
Another option is ORA_ROWSCN, which is interesting in that it does give you some idea of order, in terms of the system change number (SCN). However, it doesn't come for free. Just to start, you have to create your table with the ROWDEPENDENCIES option, and per the docs:
ROWDEPENDENCIES Specify ROWDEPENDENCIES if you want to enable
row-level dependency tracking. This setting is useful primarily to
allow for parallel propagation in replication environments. It
increases the size of each row by 6 bytes.
The other catch is that you would have to follow each inserted row with a commit so that each row gets a different SCN.
If you're willing to go this far, you'll still have to convert the rows to indexes (starting with, say, 0 or 1) that you can use to join to other tables.
Here's a quick sample of what it would involve:
DROP TABLE temp;
CREATE TABLE temp
( a number(10)
, b varchar2(10)
)
ROWDEPENDENCIES
;
-- one commit after all rows
INSERT INTO temp VALUES (1, 'A');
INSERT INTO temp VALUES (2, 'B');
INSERT INTO temp VALUES (3, 'C');
INSERT INTO temp VALUES (4, 'D');
INSERT INTO temp VALUES (5, 'E');
INSERT INTO temp VALUES (6, 'F');
COMMIT;
SELECT X.*, ROWNUM
FROM (SELECT T.*
, ORA_ROWSCN
FROM TEMP T
ORDER BY ORA_ROWSCN
) x
;
A B ORA_ROWSCN ROWNUM
1 A 2272340 1
2 B 2272340 2
6 F 2272340 3
4 D 2272340 4
5 E 2272340 5
3 C 2272340 6
Whoops. Those rows are definitely not in the order they came in.
Now using one commit per row:
TRUNCATE TABLE temp;
INSERT INTO temp VALUES (1, 'A');
COMMIT;
INSERT INTO TEMP VALUES (2, 'B');
COMMIT;
INSERT INTO temp VALUES (3, 'C');
COMMIT;
INSERT INTO temp VALUES (4, 'D');
COMMIT;
INSERT INTO temp VALUES (5, 'E');
COMMIT;
INSERT INTO temp VALUES (6, 'F');
COMMIT;
SELECT X.*, ROWNUM
FROM (SELECT T.*
, ORA_ROWSCN
FROM TEMP T
ORDER BY ORA_ROWSCN
) x
;
A B ORA_ROWSCN ROWNUM
1 A 2272697 1
2 B 2272699 2
3 C 2272701 3
4 D 2272703 4
5 E 2272705 5
6 F 2272707 6
Better. But if you've got a significant number of rows it's not going to go in fast. (I think this is what you would do if you intentionally wanted to slow down your inserts. ;) )
I think that's about as good as you'll get trying to get around using your own column, BUT there is still hope to economize storage: you can do away with the table + index and just go with an index-organized table. It's basically an index that you query directly.
It's just this easy:
CREATE TABLE TEMP2
( A NUMBER(10)
, B VARCHAR2(10)
, CONSTRAINT PK_CONSTRAINT PRIMARY KEY (A)
)
ORGANIZATION INDEX
;
There are other parameters you'll want to consider for this as well, but for more info check out... the docs.
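Querying the index-organized table is then a single INDEX UNIQUE SCAN with no separate table access, since the rows live in the primary key index itself. For example (the lookup value 3 is arbitrary):
SELECT B
FROM TEMP2
WHERE A = 3;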
I have a SQLite table which contains a numeric field field_name. I need to group by ranges of this column, something like this: SELECT CAST(field_name/100 AS INT), COUNT(*) FROM table GROUP BY CAST(field_name/100 AS INT), but including ranges which have no values (their COUNT should be 0). I can't work out how to write such a query.
You can do this by using a join and (though kludgy) an extra table.
The extra table would contain each of the values you want a row for in the response to your query. (This would not only fill in missing CAST(field_name/100 AS INT) values between your returned values, but would also let you expand the range: if your current groups were 5, 6, 7, you could include 0 through 10.)
In other flavors of SQL you'd be able to right join or full outer join, and you'd be on your way. Alas, SQLite doesn't offer these.
Accordingly, we'll use a cross join (join everything to everything) and then count only the data rows that match each group, so that empty groups still come back with a count of 0. If you've got a relatively small database or a small number of groups, you're in good shape. If you have large numbers of both, this will be a very intensive way to go about this (the cross join result will have #ofRowsOfData * #ofGroups rows, so watch out).
Example:
TABLE: groups_for_report
desired_group
-------------
0
1
2
3
4
5
6
Table: data
fieldname other_field
--------- -----------
250 somestuff
230 someotherstuff
600 stuff
you would use a query like
select groups_for_report.desired_group,
count(case when CAST(data.fieldname/100.0 AS INT) = desired_group then 1 end) as cnt
from data
cross join groups_for_report
group by desired_group;
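With the sample tables above, this returns every desired group, including the empty ones:
desired_group  cnt
-------------  ---
0              0
1              0
2              2
3              0
4              0
5              0
6              1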
I am trying to update Table B of a database looking like this:
Table A:
id, amount, date, b_id
1,200,6/31/2012,1
2,300,6/31/2012,1
3,400,6/29/2012,2
4,200,6/31/2012,1
5,200,6/31/2012,2
6,200,6/31/2012,1
7,200,6/31/2012,2
8,200,6/31/2012,2
Table B:
id, b_amount, b_date
1,0,0
2,0,0
3,0,0
Now with this query I get all the data I need in one select:
SELECT A.*,B.* FROM A LEFT JOIN B ON B.id=A.b_id WHERE A.b_id>0 GROUP BY B.id
id, amount, date, b_id, id, b_amount, b_date
1,200,6/31/2012,1,1,0,0
3,400,6/29/2012,2,2,0,0
Now, I just want to copy the selected column amount to b_amount and date to b_date
b_amount=amount, b_date=date
resulting in
id, amount, date, b_id, id, b_amount, b_date
1,200,6/31/2012,1,1,200,6/31/2012
3,400,6/29/2012,2,2,400,6/29/2012
I've tried COALESCE() without success.
Does someone experienced have a solution for this?
Solution:
Thanks to the answers below, I managed to come up with this. It is probably not the most efficient way, but it is fine for a one-time-only update. This will insert the first corresponding entry of each group for you.
REPLACE INTO B SELECT Bid, amount, date FROM
(SELECT B.id as Bid, A.amount, A.date FROM A INNER JOIN B ON (B.id=A.b_id)
ORDER BY A.id DESC)
GROUP BY Bid;
So what you are looking for seems to be a JOIN inside an UPDATE query. In MySQL you would use
UPDATE B INNER JOIN A ON B.id=A.b_id SET B.b_amount=A.amount, B.b_date=A.date;
but this is not supported by sqlite, as this probably related question points out. However, there is a workaround using REPLACE:
REPLACE INTO B
SELECT B.id, A.amount, A.date FROM A
LEFT JOIN B ON B.id=A.b_id
WHERE A.b_id>0 GROUP BY B.id;
The query will simply fill in the values of table B for all columns that should keep their state, and fill in the values of table A for the copied values. Make sure the order of the columns in the SELECT statement matches the column order of table B and that all columns are present, or you will lose those fields' data. This is fragile with respect to future changes on table B, so remember to update the column order/presence in this query when changing table B.
A bit off topic, because you did not ask for that: A.b_id is obviously a foreign key to B.id. It seems you are using the value 0 in that foreign key to express that there is no corresponding entry in B (inferred from your SELECT with WHERE A.b_id>0). You should consider using NULL for that. If you then use an INNER JOIN instead of the LEFT JOIN, you can drop the WHERE clause entirely; the DBMS will sort out all unmatched rows.
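Under that assumption (NULLs instead of 0 in A.b_id), the workaround above shrinks to:
REPLACE INTO B
SELECT B.id, A.amount, A.date FROM A
INNER JOIN B ON B.id=A.b_id
GROUP BY B.id;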
WARNING: Some RDBMSs will return 2 rows, as you show above. Others will return the Cartesian product of the rows, i.e. A rows times B rows.
One tricky method is to generate SQL that is then executed
SELECT "update B set b.b_amount = ", a.amount, ", b.b_date = ", a.date,
" where b.id = ", a.b_id
FROM A LEFT JOIN B ON B.id=A.b_id WHERE A.b_id>0 GROUP BY B.id
Now add the batch terminator and execute this SQL. The query result should look like this
update B set b_amount = 200, b_date = 6/31/2012 where id = 1
update B set b_amount = 400, b_date = 6/29/2012 where id = 2
NOTE: Some RDBMS will handle dates differently. Some require quotes.