SQL question: vlookup equivalent in SELECT - teradata

My T1:
ROUTE_NAME
ASE DTW
BLI DTW
DTW MOD
DTW OGG
DTW VPS
DTW LAS
T2 is the lookup table which has two columns:
airp_cd city_cd
UPP UPP
MUF MUF
PPU PPU
CGV CGV
DTW DTT
I'd like to get the city pair by looking up each airp_cd in T2. In Excel I can do this with the formula City Pair = VLOOKUP(LEFT(A2,3),T2!$A$1:$B$6,2,0)&" "&VLOOKUP(RIGHT(A2,3),T2!$A$1:$B$6,2,0). Here is the expected result:
ROUTE_NAME City pair
ASE DTW ASE DTT
BLI DTW BLI DTT
DTW MOD DTT MOD
DTW OGG DTT OGG
DTW VPS DTT VPS
DTW LAS DTT LAS
How can I write the equivalent SQL? I tried:
SELECT
T1.ROUTE_NAME,
T2.city_cd
FROM T1
LEFT OUTER JOIN T2
ON (LEFT(T1.ROUTE_NAME,3) = T2.airp_cd AND RIGHT(T1.ROUTE_NAME,3) = T2.airp_cd)
But it doesn't return the expected results.

Logically you are doing two separate lookups. So you can't use AND in a single join criteria - that would require that BOTH the left side AND the right side be equal to the same airp_cd. You would still be in a pickle if you tried OR instead - that would just mean only the right or left sides need to match, but you really want the left side to match (for the first lookup), and separately you also want the right side to match (for the second lookup).
So to make this work, you can separate the two lookups by using your T2 table twice, with aliases to represent the two lookups that they represent.
Note: this is written for MS SQL SERVER (although it is fairly vanilla SQL except maybe the syntax for temp tables and the datatypes) - so you may need to alter as needed for your database system.
sample data
CREATE TABLE #T1 (Route_Name NVARCHAR(30));
INSERT INTO #T1 VALUES
('ASE DTW'),
('BLI DTW'),
('DTW MOD'),
('DTW OGG'),
('DTW VPS'),
('DTW LAS');
CREATE TABLE #T2 (airp_cd NVARCHAR(30), city_cd NVARCHAR(30));
INSERT INTO #T2 VALUES
('UPP', 'UPP'),
('MUF', 'MUF'),
('PPU', 'PPU'),
('CGV', 'CGV'),
('DTW', 'DTT'),
('ASE', 'ASE'),
('BLI', 'BLI'),
('MOD', 'MOD'),
('VPS', 'VPS'),
('LAS', 'LAS'),
('OGG', 'OGG');
query
SELECT
T1.Route_Name,
COALESCE(RouteName1.city_cd, '')
+ ' '
+ COALESCE(RouteName2.city_cd, '') AS City_Pair
FROM
#T1 T1
LEFT OUTER JOIN #T2 AS RouteName1
ON LEFT(T1.Route_Name,3) = RouteName1.airp_cd
LEFT OUTER JOIN #T2 AS RouteName2
ON RIGHT(T1.Route_Name,3) = RouteName2.airp_cd;
result
ROUTE_NAME City_Pair
ASE DTW ASE DTT
BLI DTW BLI DTT
DTW MOD DTT MOD
DTW OGG DTT OGG
DTW VPS DTT VPS
DTW LAS DTT LAS
PLEASE NOTE - using functions like Left() and Right() in your join criteria will almost certainly result in bad performance if you have a significant amount of data.
Table T1 really should be separated into a two-column table.
I hope this helps.
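A minimal sketch of the two-column design suggested above, using Python's built-in sqlite3 module so it can be run without a database server. The table name routes and the columns origin_cd and dest_cd are hypothetical, and SQLite's substr() and || stand in for LEFT()/RIGHT() and +; adjust these to whatever your database provides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (route_name TEXT);
    INSERT INTO t1 VALUES ('ASE DTW'), ('DTW MOD'), ('DTW LAS');

    CREATE TABLE t2 (airp_cd TEXT, city_cd TEXT);
    INSERT INTO t2 VALUES
        ('DTW', 'DTT'), ('ASE', 'ASE'), ('MOD', 'MOD'), ('LAS', 'LAS');

    -- Split the route once, up front (hypothetical table and column names).
    -- Assumes every route_name has the fixed form 'AAA BBB'.
    CREATE TABLE routes AS
    SELECT route_name,
           substr(route_name, 1, 3) AS origin_cd,
           substr(route_name, 5, 3) AS dest_cd
    FROM t1;
""")

# Two lookups against t2, one per alias, now on plain columns.
city_pairs = conn.execute("""
    SELECT r.route_name,
           COALESCE(o.city_cd, '') || ' ' || COALESCE(d.city_cd, '') AS city_pair
    FROM routes AS r
    LEFT JOIN t2 AS o ON o.airp_cd = r.origin_cd
    LEFT JOIN t2 AS d ON d.airp_cd = r.dest_cd
    ORDER BY r.route_name
""").fetchall()

for row in city_pairs:
    print(row)
```

Because the joins compare plain columns rather than function results, the database is typically free to use an index on airp_cd for both lookups.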


SQLite order results by smallest difference

In many ways this question follows on from my previous one. I have a table that is pretty much identical
CREATE TABLE IF NOT EXISTS test
(
id INTEGER PRIMARY KEY,
a INTEGER NOT NULL,
b INTEGER NOT NULL,
c INTEGER NOT NULL,
d INTEGER NOT NULL,
weather INTEGER NOT NULL);
in which I would typically have entries such as
INSERT INTO test (a,b,c,d,weather) VALUES(1,2,3,4,30100306);
INSERT INTO test (a,b,c,d,weather) VALUES(1,2,3,4,30140306);
INSERT INTO test (a,b,c,d,weather) VALUES(1,2,5,5,10100306);
INSERT INTO test (a,b,c,d,weather) VALUES(1,5,5,5,11100306);
INSERT INTO test (a,b,c,d,weather) VALUES(5,5,5,5,21101306);
Typically this table would have multiple rows with some or all of the b, c and d values being identical but with different a and weather values. As per the answer to my other question I can certainly issue
WITH cte AS (SELECT *, DENSE_RANK() OVER (ORDER BY (b=2) + (c=3) + (d=4) DESC) rn FROM test where a = 1) SELECT * FROM cte WHERE rn < 3;
No issues thus far. However, I have one further requirement which arises as a result of the weather column. Although this value is an integer it is in fact a composite where each digit represents a "banded" weather condition. Take for example weather = 20100306. Here 2 represents the wind direction divided up into 45 degree bands on the compass, 0 represents a wind speed range, 1 indicates precipitation as snow etc. What I need to do now while obtaining my ordered results is to allow for weather differences. Take for example the first two rows
INSERT INTO test (a,b,c,d,weather) VALUES(1,2,3,4,30100306);
INSERT INTO test (a,b,c,d,weather) VALUES(1,2,3,4,30140306);
Though otherwise similar they represent rather different weather conditions: the fourth digit is 4 as opposed to 0, indicating a higher precipitation intensity band. The WITH cte... above would rank the first two rows at the top, which is fine. But what if I would rather have the row that differs the least from an incoming "weather condition" of 30130306? I would clearly like to have the second row appearing at the top. Once again, I can live with the "raw" result returned by WITH cte... and then drill down to the right row based on my current "weather condition" in Java. However, once again I find myself thinking that there is perhaps a rather neat way of doing this in SQL that is outwith my skill set. I'd be most obliged to anyone who might be able to tell me how/whether this can be done using just SQL.
You can sort the results 1st by DENSE_RANK() and 2nd by the absolute difference of weather and the incoming "weather condition":
WITH cte AS (
SELECT *,
DENSE_RANK() OVER (ORDER BY (b=2) + (c=3) + (d=4) DESC) rn
FROM test
WHERE a = 1
)
SELECT a,b,c,d,weather
FROM cte
WHERE rn < 3
ORDER BY rn, ABS(weather - ?);
Replace ? with the value of that incoming "weather condition".
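To check the ordering against the sample rows from the question, here is a small sketch using Python's sqlite3 module. It assumes the bundled SQLite is 3.25 or later (needed for window functions); the variable names are only illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE test (
        id INTEGER PRIMARY KEY,
        a INTEGER NOT NULL, b INTEGER NOT NULL,
        c INTEGER NOT NULL, d INTEGER NOT NULL,
        weather INTEGER NOT NULL);
    INSERT INTO test (a,b,c,d,weather) VALUES
        (1,2,3,4,30100306), (1,2,3,4,30140306), (1,2,5,5,10100306),
        (1,5,5,5,11100306), (5,5,5,5,21101306);
""")

incoming_weather = 30130306  # the incoming "weather condition"
query = """
    WITH cte AS (
        SELECT *,
               DENSE_RANK() OVER (ORDER BY (b=2) + (c=3) + (d=4) DESC) AS rn
        FROM test
        WHERE a = 1
    )
    SELECT weather
    FROM cte
    WHERE rn < 3
    ORDER BY rn, ABS(weather - ?)
"""
# Bind the incoming value to the ? placeholder.
ordered_weather = [w for (w,) in conn.execute(query, (incoming_weather,))]
print(ordered_weather)
```

Note that ABS() treats weather as one number, so a difference in a leading digit outweighs any difference in the digits after it. Whether that matches how the bands should be weighted depends on the encoding.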

Find the x minimal values in a distance matrix in R

I have computed a distance matrix between patches of ancient forests and recent forests in PostgreSQL thanks to the following code:
CREATE TABLE MatDist as (
SELECT
a.id a,
b.id b ,
st_distance(a.geom, b.geom) dist
FROM public.bvi_foret a, public.bvi_foret b
WHERE a.id != b.id AND a.ANC_FOR != b.ANC_FOR
)
and it works perfectly.
I now want to select the 5 pairs of ancient forest (a)/recent forest (b) with the minimal distance between them.
So I started working with R, and I can find the single pair with the minimum distance, thanks to the following code:
DT <- data.table(df)
DT[ , .SD[which.min(dist)], by = a]
But how can I compute the 5 first pairs? It's probably easy, with a for loop or an apply function in R, but I can't find it...
Thanks in advance for your answers.
Using pure SQL:
SELECT *
FROM MatDist
ORDER BY dist
LIMIT 5;
Thanks for your answer, but I need the 5 first pairs FA/FR for each patch of ancient forest.
SELECT *
FROM (SELECT *, ROW_NUMBER() OVER(PARTITION BY a ORDER BY dist ASC) as rn
FROM MatDist) sub
WHERE sub.rn <= 5;
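A small runnable sketch of the per-group query, using Python's sqlite3 module in place of PostgreSQL (the window-function syntax is the same; SQLite 3.25 or later is assumed). The distances are made-up toy values, and n_nearest is set to 2 rather than 5 to keep the output short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE MatDist (a INTEGER, b INTEGER, dist REAL);
    INSERT INTO MatDist VALUES
        (1, 10, 5.0), (1, 11, 2.0), (1, 12, 9.0),
        (2, 10, 1.0), (2, 11, 7.0), (2, 12, 3.0);
""")

n_nearest = 2  # the question asks for 5; 2 keeps the toy output short
nearest = conn.execute("""
    SELECT a, b, dist
    FROM (SELECT *, ROW_NUMBER() OVER (PARTITION BY a ORDER BY dist ASC) AS rn
          FROM MatDist) AS sub
    WHERE sub.rn <= ?
    ORDER BY a, dist
""", (n_nearest,)).fetchall()

for row in nearest:
    print(row)
```

ROW_NUMBER() restarts at 1 for each value of a, so the filter keeps the closest n_nearest recent patches for every ancient patch.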

SQL Query assistance - Looping through data query

I have two tables. Config and Data. Config table has info to define what I call "Predefined Points". The columns are configId, machineId, iotype, ioid, subfield and predeftype. I have a second table that contains all the data for all the items in the config table linked by configId. Data table contains configId, timestamp, value.
I am trying to return each row from the config table with 2 new columns in the result which would be min timestamp of this particular predefined point and max timestamp of this particular predefined point.
Pseudocode would be
select a.*, min(b.timestamp), max(b.timestamp) from TrendConfig a join TrendData b on a.configId = b.configId where configId = (select configId from TrendConfig)
Where the subquery would return multiple values.
Any idea how to formulate this?
Try an inner join:
select a.*, min(b.timestamp), max(b.timestamp)
from config a
inner join data b
on a.configId = b.configID
I was able to find an answer using: Why can't you mix Aggregate values and Non-Aggregate values in a single SELECT?
The solution was indeed GROUP BY as CL mentioned above.
select a.*, min(b.timestamp), max(b.timestamp) from TrendConfig a join TrendData b on a.configId = b.configId group by a.configId
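Here is a sketch of the GROUP BY approach run against toy data with Python's sqlite3 module. The column layout is cut down, and the aliases first_seen and last_seen are made up for illustration. It also swaps the inner join for a LEFT JOIN, which is an assumption on my part: that way a config row with no data still appears, with NULL for both timestamps.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TrendConfig (configId INTEGER PRIMARY KEY, machineId INTEGER);
    CREATE TABLE TrendData (configId INTEGER, timestamp INTEGER, value REAL);
    INSERT INTO TrendConfig VALUES (1, 100), (2, 100), (3, 200);
    INSERT INTO TrendData VALUES
        (1, 1000, 0.5), (1, 1500, 0.7), (1, 1200, 0.6),
        (2, 2000, 1.5), (2, 1800, 1.1);
""")

# One output row per config; MIN/MAX are computed within each group.
ranges = conn.execute("""
    SELECT a.configId,
           MIN(b.timestamp) AS first_seen,
           MAX(b.timestamp) AS last_seen
    FROM TrendConfig AS a
    LEFT JOIN TrendData AS b ON a.configId = b.configId
    GROUP BY a.configId
    ORDER BY a.configId
""").fetchall()

for row in ranges:
    print(row)
```

With a plain inner join, as in the query above, configId 3 would simply be missing from the result.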

SQL Query to transpose rows into columns

I have a .net web application that uses SQL Server 2008. The data table I am trying to display in a grid contains columns that are actually rows of another table. Right now, I am doing this in the BLL, reading data into data table; getting the data from another table and making it into columns of first data table and then going through each row of data in that data table to populate the new columns. Very time consuming and slow.
I believe this can be done through a query in SQL 2012 and above using "Transpose" or something similar but not sure if it is possible in 2008. I researched and tried using "pivot" but I am not good at SQL and couldn't get it to work.
This is a simplified example of DB tables and what I need to display:
Facility Table:
FacilityID
12345
67890
PartnerInfo table:
PartnerID Partner
1 Partner1
2 Partner2
3 Partner3
FacilityPartner table:
FacilityID PartnerID
12345 1
12345 3
67890 2
67890 3
Need a query to return something like:
FacilityID Partner1 Partner2 Partner3
12345 true false true
67890 false true true
The following should give you some idea of how to pivot the data. It doesn't give you the exact true/false values you asked for.
-- sample data in table variables
declare @facility table (facilityId int)
declare @PartnerInfo table (partnerid int, partnerN varchar(1000))
declare @FacilityPartner table (facilityId int, partnerid int)
insert into @facility values (12345)
insert into @facility values (67890)
insert into @facility values (67891)
insert into @PartnerInfo values (1, 'partner1')
insert into @PartnerInfo values (2, 'partner2')
insert into @PartnerInfo values (3, 'partner3')
insert into @FacilityPartner values(12345, 1)
insert into @FacilityPartner values(12345, 3)
insert into @FacilityPartner values(67890, 2)
insert into @FacilityPartner values(67890, 3)
-- unpivoted rows: one per facility/partner link
select f.facilityId as facid, p.PartnerN as partn, 100 as val
FROM @facility f
LEFT join @FacilityPartner fp on f.facilityId = fp.facilityId
LEFT JOIN @PartnerInfo p on p.partnerid = fp.partnerid
-- pivoted: one column per partner (100 if linked, NULL otherwise)
select facid, Partner1 , partner2,partner3 FROM
(select f.facilityId as facid, p.PartnerN as partn, 100 as val
FROM @facility f
LEFT join @FacilityPartner fp on f.facilityId = fp.facilityId
LEFT JOIN @PartnerInfo p on p.partnerid = fp.partnerid) x
PIVOT(
avg(val)
for partn in ([partner1], [partner2],[partner3])
) as pvt
The first thing to understand is, just like many other languages, SQL has a sort of "compile" process, where an execution plan is produced. An SQL query MUST be able to know the precise number and types of columns at compile time, without referencing the data (it does have some table metadata available for the compile, which is why SELECT * works).
This means what you want to do is only possible if one of two conditions is met:
1. You must know the precise number of partners (and the names for the columns, in this case) ahead of time. This is true even for a query using the PIVOT keyword.
2. You must be willing to do this in multiple steps, using dynamic SQL, where the first step looks at the data to know how many columns you'll need. Then you can build up a new query in a varchar variable, and finally execute that string using Exec() or sp_executesql(). This works because the last step invokes a new "compile" process and execution context for that string variable.
Of course there's also a third option: pivot the data in your client code. That is my preference. Most people, though, opt for option 2.
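The multi-step idea can be sketched as follows. This is not SQL Server code: it uses Python's sqlite3 module, builds the statement in client code rather than in a varchar variable, and uses MAX(CASE ...) because SQLite has no PIVOT keyword. The helper quote_identifier is hypothetical, and 1/0 stand in for true/false.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Facility (FacilityID INTEGER);
    CREATE TABLE PartnerInfo (PartnerID INTEGER, Partner TEXT);
    CREATE TABLE FacilityPartner (FacilityID INTEGER, PartnerID INTEGER);
    INSERT INTO Facility VALUES (12345), (67890);
    INSERT INTO PartnerInfo VALUES (1, 'Partner1'), (2, 'Partner2'), (3, 'Partner3');
    INSERT INTO FacilityPartner VALUES (12345, 1), (12345, 3), (67890, 2), (67890, 3);
""")

def quote_identifier(name):
    # Hypothetical helper: double-quote an identifier, escaping embedded
    # quotes, so partner names can be used as column aliases.
    return '"' + name.replace('"', '""') + '"'

# Step 1: look at the data to learn which columns are needed.
partners = conn.execute(
    "SELECT PartnerID, Partner FROM PartnerInfo ORDER BY PartnerID").fetchall()

# Step 2: build the query text, one CASE column per partner.
columns = ", ".join(
    f"MAX(CASE WHEN fp.PartnerID = ? THEN 1 ELSE 0 END) AS {quote_identifier(name)}"
    for _, name in partners)
pivot_sql = f"""
    SELECT f.FacilityID, {columns}
    FROM Facility AS f
    LEFT JOIN FacilityPartner AS fp ON fp.FacilityID = f.FacilityID
    GROUP BY f.FacilityID
    ORDER BY f.FacilityID
"""

# Step 3: execute the generated statement, which is compiled afresh.
pivoted = conn.execute(pivot_sql, [pid for pid, _ in partners]).fetchall()
for row in pivoted:
    print(row)
```

The partner IDs are passed as bound parameters; only the column aliases are spliced into the SQL text, which is why they are quoted first.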

HAVING clause with a LEFT JOIN and sqlite

I have 2 tables, say T1 and T2, with a 1-n relationship (n can be 0). I need to join the 2 tables, but only on the latest T2. So the query I made was like:
select * from t1 left join t2 on t1.a = t2.b group by t1.a having t2.c=max(t2.c)
The problem is that if there are no rows in T2, the query does not return a row, despite the LEFT JOIN. I think this is incorrect with regard to the SQL standard.
So does anyone know how to have a result even when n=0?
HAVING is executed after the grouping.
And any comparison with NULL fails, so this filters out rows with n=0.
In any case, t2.c=max(t2.c) does not make sense and does not do what you want.
If you have SQLite 3.7.11 or later, you can use max() in the result to select the row from which the GROUP BY results for the other columns come:
SELECT *, max(t2.c)
FROM t1 LEFT JOIN t2 ON t1.a = t2.b
GROUP BY t1.a
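A quick way to see this behaviour is the sketch below, run through Python's sqlite3 module. The note column is a hypothetical extra column added to t2 so that it is visible which row the non-aggregated values come from.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (a INTEGER);
    CREATE TABLE t2 (b INTEGER, c INTEGER, note TEXT);
    INSERT INTO t1 VALUES (1), (2);           -- a = 2 has no rows in t2
    INSERT INTO t2 VALUES (1, 10, 'older'), (1, 20, 'latest');
""")

# t2.note is a bare column: SQLite takes it from the row that holds max(t2.c).
latest = conn.execute("""
    SELECT t1.a, t2.note, max(t2.c)
    FROM t1 LEFT JOIN t2 ON t1.a = t2.b
    GROUP BY t1.a
    ORDER BY t1.a
""").fetchall()

for row in latest:
    print(row)
```

The row for a = 2 is kept, with NULL in the t2 columns, because no HAVING clause filters it out. Taking bare columns from the max() row is SQLite-specific behaviour; other databases generally reject or handle such a query differently.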
