How to find 2 matching records comparing two fields in one table to two fields in another table? - ms-access-2010

Using MS Access 2010, I have two tables that I need to compare and only retrieve the matches.
Table [EE] has 5 fields:
Field4, SSN, Birthdate, Address1, Address2
Table [UPDATED] has fields:
Field4, DOB
and several others not relevant to this question.
I need to find all records in [EE] whose Field4 AND Birthdate match BOTH Field4 and DOB in [UPDATED]. I have tried INNER JOIN, but it returns several duplicates. I have tried:
SELECT EE.Birthdate, EE.Field4
FROM EE, UPDATED
WHERE (EE.Birthdate = UPDATED.DOB)
AND (EE.Field4 = UPDATED.FIELD4)
And
SELECT EE.Birthdate, EE.Field4
FROM INNER JOIN UPDATED ON EE.Birthdate = UPDATED.DOB)
AND (EE.Field4 = UPDATED.FIELD4)
I am getting a lot of duplicate records and only want the records that appear in BOTH tables.

There probably are real duplicates: if an EE row matches more than one row in UPDATED, the join returns it once per match. Add the DISTINCT keyword to eliminate them:
SELECT DISTINCT EE.Birthdate, EE.Field4
FROM EE
INNER JOIN UPDATED ON (EE.Birthdate = UPDATED.DOB
AND EE.Field4 = UPDATED.FIELD4)
I also fixed a few syntax errors.
Also, always prefer explicit JOIN ... ON syntax over listing several tables in FROM and joining them in WHERE.
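A minimal sketch of where the duplicates come from, written against SQLite through Python's built-in sqlite3 module rather than Access (the sample rows are made up, and Access differs in details such as requiring INNER JOIN). When one EE row matches two UPDATED rows, the plain join returns it twice; DISTINCT collapses the repeats, and a correlated EXISTS subquery, an alternative not mentioned in the answer, avoids producing them in the first place.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE EE (Field4 TEXT, SSN TEXT, Birthdate TEXT, Address1 TEXT, Address2 TEXT);
    CREATE TABLE UPDATED (Field4 TEXT, DOB TEXT);
    -- Hypothetical sample data: 'A' appears twice in UPDATED, 'C' has no match.
    INSERT INTO EE (Field4, Birthdate) VALUES
        ('A', '1980-01-01'), ('B', '1990-05-05'), ('C', '2000-12-31');
    INSERT INTO UPDATED (Field4, DOB) VALUES
        ('A', '1980-01-01'), ('A', '1980-01-01'), ('B', '1990-05-05');
""")

# Plain inner join: 'A' comes back once per matching UPDATED row.
joined = conn.execute("""
    SELECT EE.Birthdate, EE.Field4
    FROM EE
    INNER JOIN UPDATED ON EE.Birthdate = UPDATED.DOB AND EE.Field4 = UPDATED.FIELD4
    ORDER BY EE.Field4
""").fetchall()

# DISTINCT removes the repeated result rows.
distinct = conn.execute("""
    SELECT DISTINCT EE.Birthdate, EE.Field4
    FROM EE
    INNER JOIN UPDATED ON EE.Birthdate = UPDATED.DOB AND EE.Field4 = UPDATED.FIELD4
    ORDER BY EE.Field4
""").fetchall()

# EXISTS returns each matching EE row once, however many UPDATED rows match it.
exists = conn.execute("""
    SELECT EE.Birthdate, EE.Field4
    FROM EE
    WHERE EXISTS (SELECT 1 FROM UPDATED
                  WHERE UPDATED.DOB = EE.Birthdate AND UPDATED.FIELD4 = EE.Field4)
    ORDER BY EE.Field4
""").fetchall()

print(joined)    # [('1980-01-01', 'A'), ('1980-01-01', 'A'), ('1990-05-05', 'B')]
print(distinct)  # [('1980-01-01', 'A'), ('1990-05-05', 'B')]
print(exists)    # [('1980-01-01', 'A'), ('1990-05-05', 'B')]
```

One design difference: DISTINCT also collapses rows that are duplicated within EE itself, whereas EXISTS keeps one output row per matching EE row.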

Related

SQLite: treat non-existent column as NULL

I have a query like this (simplified and anonymised):
SELECT
Department.id,
Department.name,
Department.manager_id,
Employee.name AS manager_name
FROM
Department
LEFT OUTER JOIN Employee
ON Department.manager_id = Employee.id;
The field Department.manager_id may be NULL. If it is non-NULL then it is guaranteed to be a valid id for precisely one row in the Employee table, so the OUTER JOIN is there just for the rows in the Department table where it is NULL.
Here is the problem: old instances of the database do not have this Department.manager_id column at all. In those cases, I would like the query to act as if the field did exist but was always NULL, so e.g. the manager_name field is returned as NULL. If the query only used the Department table then I could just use SELECT * and check for the column in my application, but the JOIN seems to make this impossible. I would prefer not to modify the database, partly so that I can load the database in read only mode. Can this be done just by clever adjustment of the query?
For completeness, here is an answer that does not require munging both possible schemas into one query (but still doesn't need you to actually do the schema migration):
Check for the schema version, and use that to determine which SELECT query to issue (i.e. with or without the manager_id column and JOIN) as a separate step. Here are a few possibilities to determine the schema version:
The ideal situation is that you already keep track of the schema by assigning version numbers to the schema and recording them in the database. Commonly this is done with either:
The user_version pragma.
A table called "Schema" or similar with one row containing the schema version number.
You can directly determine whether the column is present in the table. Two possibilities:
Use the table_info pragma to determine the list of columns in the table.
Use a simple SELECT * FROM Table LIMIT 1 and look at which columns are returned (this is probably more portable, since it doesn't rely on an SQLite pragma, although not every engine supports LIMIT).
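The column-check option can be sketched with Python's sqlite3 module, assuming the Department/Employee layout from the question (the sample rows and the helper names has_column and fetch_departments are hypothetical). PRAGMA table_info returns one row per column, with the column name in the second field; the old-schema query simply selects NULL in place of the missing columns.

```python
import sqlite3

QUERY_WITH_MANAGER = """
    SELECT Department.id, Department.name, Department.manager_id,
           Employee.name AS manager_name
    FROM Department
    LEFT OUTER JOIN Employee ON Department.manager_id = Employee.id
"""
# Old schema: no manager_id column, so return NULLs in its place.
QUERY_WITHOUT_MANAGER = """
    SELECT Department.id, Department.name,
           NULL AS manager_id, NULL AS manager_name
    FROM Department
"""

def has_column(conn, table, column):
    """Return True if `table` has a column named `column` (table name must be trusted)."""
    # Each row is (cid, name, type, notnull, dflt_value, pk); index 1 is the name.
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    return any(row[1] == column for row in rows)

def fetch_departments(conn):
    """Pick the query that matches the schema actually present."""
    if has_column(conn, "Department", "manager_id"):
        query = QUERY_WITH_MANAGER
    else:
        query = QUERY_WITHOUT_MANAGER
    return conn.execute(query + " ORDER BY Department.id").fetchall()

old = sqlite3.connect(":memory:")
old.executescript("""
    CREATE TABLE Department (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Employee (id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO Department VALUES (1, 'Sales');
""")

new = sqlite3.connect(":memory:")
new.executescript("""
    CREATE TABLE Department (id INTEGER PRIMARY KEY, name TEXT, manager_id INTEGER);
    CREATE TABLE Employee (id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO Employee VALUES (7, 'Alice');
    INSERT INTO Department VALUES (1, 'Sales', 7), (2, 'Support', NULL);
""")

print(fetch_departments(old))  # [(1, 'Sales', None, None)]
print(fetch_departments(new))  # [(1, 'Sales', 7, 'Alice'), (2, 'Support', None, None)]
```

With the user_version approach the dispatch looks the same, except the test would read conn.execute("PRAGMA user_version").fetchone()[0] instead of inspecting the columns.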
This seems to work:
SELECT
Dept.id,
Dept.name,
Dept.manager_id,
Employee.name AS manager_name
FROM
(SELECT *, NULL AS manager_id FROM Department) AS Dept
LEFT OUTER JOIN Employee
ON Dept.manager_id = Employee.id;
If the manager_id column is present in Department then it is used for the join, whereas if it is not then Dept.manager_id and Employee.name are both NULL.
If I swap the column order in the subquery:
(SELECT NULL AS manager_id, * FROM Department) AS Dept
then the Dept.manager_id and Employee.name are both NULL even if the Department.manager_id column exists, so it seems that Dept.manager_id refers to the first column in the Dept subquery that has that name. It would be good to find a reference in the SQLite documentation saying that this behaviour is guaranteed (or explicitly saying that it is not), but I can't find anything (e.g. in the SELECT or expression pages).
I haven't tried this with other database systems so I don't know if it will work with anything other than SQLite.

How to update multiple columns with a single statement with multiple conditions

I want to update a series of columns Country1, Country2, ..., Country10 based on a comma-delimited string of country names in column Country. I have done so with the following statement:
cur.execute("\
UPDATE t \
SET Country1 = returnCommaDelimitedValue(Country,0),\
Country2 = returnCommaDelimitedValue(Country,1),\
...
Country10 = returnCommaDelimitedValue(Country,9) \
WHERE Country IS NOT NULL\
;")
I also have a column named Genre in my table. Now I would like to update columns Genre1, Genre2, ..., Genre10 in the same statement. I think the statement would look something like:
cur.execute("\
UPDATE t \
SET Country1 = returnCommaDelimitedValue(Country,0), Genre1 = returnCommaDelimitedValue(Genre,0),\
Country2 = returnCommaDelimitedValue(Country,1), Genre2 = returnCommaDelimitedValue(Genre,1),\
...
Country10 = returnCommaDelimitedValue(Country,9), Genre10 = returnCommaDelimitedValue(Genre,9) \
WHERE Country IS NOT NULL \
AND Genre IS NOT NULL\
;")
Is the statement correct?
You should run 2 separate update queries. The problem is that Country can be null independently of Genre. So however you combine the conditions Country IS NOT NULL and Genre IS NOT NULL in the where clause of the query, you will miss some columns that could have been updated, or you will update some columns with null values.
Now, I haven't seen the implementation of returnCommaDelimitedValue, but it may return null when the string argument is null. In that case, you might consider removing the WHERE clause completely and updating the CountryN and GenreN columns in the same query: if Country is null, Country1 simply becomes null too, which may be what you want. If almost all rows have a non-null Country and Genre, this approach might be faster.
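A minimal sketch of the two-statement approach using Python's sqlite3 module. The question does not show returnCommaDelimitedValue, so the version below is a hypothetical stand-in registered with Connection.create_function: it returns the zero-based n-th item of a comma-separated string, or NULL if the string is NULL or too short. Only three columns per list are used to keep it short, and the sample rows are made up.

```python
import sqlite3

def return_comma_delimited_value(text, index):
    """Hypothetical stand-in: n-th (0-based) comma-separated item, or None."""
    if text is None:
        return None
    parts = [p.strip() for p in text.split(",")]
    return parts[index] if index < len(parts) else None

conn = sqlite3.connect(":memory:")
conn.create_function("returnCommaDelimitedValue", 2, return_comma_delimited_value)
conn.execute("""CREATE TABLE t (Country TEXT, Genre TEXT,
                Country1 TEXT, Country2 TEXT, Country3 TEXT,
                Genre1 TEXT, Genre2 TEXT, Genre3 TEXT)""")
# One row has only countries, the other only genres.
conn.executemany("INSERT INTO t (Country, Genre) VALUES (?, ?)",
                 [("USA, France", None), (None, "Drama, Comedy, War")])

with conn:  # commit both updates together
    # Each list gets its own UPDATE and its own WHERE condition.
    conn.execute("""
        UPDATE t SET Country1 = returnCommaDelimitedValue(Country, 0),
                     Country2 = returnCommaDelimitedValue(Country, 1),
                     Country3 = returnCommaDelimitedValue(Country, 2)
        WHERE Country IS NOT NULL""")
    conn.execute("""
        UPDATE t SET Genre1 = returnCommaDelimitedValue(Genre, 0),
                     Genre2 = returnCommaDelimitedValue(Genre, 1),
                     Genre3 = returnCommaDelimitedValue(Genre, 2)
        WHERE Genre IS NOT NULL""")

rows = conn.execute("""SELECT Country1, Country2, Country3, Genre1, Genre2, Genre3
                       FROM t ORDER BY rowid""").fetchall()
print(rows)
# [('USA', 'France', None, None, None, None), (None, None, None, 'Drama', 'Comedy', 'War')]
```

With a single combined UPDATE filtered on Country IS NOT NULL AND Genre IS NOT NULL, neither sample row would have been updated at all.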
Country1, Country2, Country3... this is an anti-pattern. You're feeling the consequences of very poor table design. Don't feel too put out; this is a very common anti-pattern. Lists are difficult and very non-intuitive to work with in standard SQL. The fix for the whole mess is called "normalization".
Lists are represented by join tables in a one-to-many relationship. One t has many countries. This is a very big topic to get into, but here's a sketch.
Instead of having Country1, Country2, Country3, etc. in whatever t is (I'm going to call it Thing), you'd have a table called ThingCountries to represent the list.
create table ThingCountries (
ThingID integer references Thing(id),
Country text
);
Each Country belonging to a Thing would be inserted into ThingCountries along with the ID of the thing.
# Do this in a loop
insert into ThingCountries (ThingID, Country) values (?, ?)
They'd be retrieved with a join by linking Thing.ID and ThingCountries.ThingID.
select ThingCountries.Country from Thing
join ThingCountries on Thing.ID = ThingCountries.ThingID
where Thing.ID = ?
By querying ThingCountries you can quickly find out which Things have a certain Country.
select ThingID from ThingCountries
where Country = ?
They can be removed with a simple delete.
delete from ThingCountries where ThingID = ? and Country = ?
There's no need to know how many Country columns there are. There are no gaps to be filled in. There's no limit to how many Countries a Thing can have.
Later on down the road you might want to store information about each country, like its name and abbreviation, in which case you make a Country table.
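A minimal sketch of the join-table approach using Python's sqlite3 module. The Thing table's name column, the add_thing helper, and the sample data are hypothetical; the SELECT and DELETE statements are the ones from the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Thing (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE ThingCountries (
        ThingID INTEGER REFERENCES Thing(id),
        Country TEXT
    );
""")

def add_thing(conn, name, countries):
    """Insert a Thing, then one ThingCountries row per country in the list."""
    thing_id = conn.execute("INSERT INTO Thing (name) VALUES (?)", (name,)).lastrowid
    conn.executemany("INSERT INTO ThingCountries (ThingID, Country) VALUES (?, ?)",
                     [(thing_id, c) for c in countries])
    return thing_id

movie = add_thing(conn, "Movie A", ["USA", "France", "Italy"])
add_thing(conn, "Movie B", ["France"])

# All countries of one Thing.
countries = [r[0] for r in conn.execute("""
    SELECT ThingCountries.Country FROM Thing
    JOIN ThingCountries ON Thing.id = ThingCountries.ThingID
    WHERE Thing.id = ? ORDER BY ThingCountries.Country""", (movie,))]

# All Things that have a given Country.
french = [r[0] for r in conn.execute(
    "SELECT ThingID FROM ThingCountries WHERE Country = ? ORDER BY ThingID", ("France",))]

# Remove one country from one Thing.
conn.execute("DELETE FROM ThingCountries WHERE ThingID = ? AND Country = ?", (movie, "Italy"))
remaining = conn.execute(
    "SELECT COUNT(*) FROM ThingCountries WHERE ThingID = ?", (movie,)).fetchone()[0]

print(countries)  # ['France', 'Italy', 'USA']
print(french)     # [1, 2]
print(remaining)  # 2
```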
create table Country (
id integer primary key,
name text not null,
abbrev text not null
);
And then ThingCountries references Country by its id rather than storing the Country name.
create table ThingCountries (
ThingID integer references Thing(id),
CountryID integer references Country(id)
);
Now you can store whatever information you want about each Country, and it protects against typos (because the Country has to exist in the Country table).
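One caveat if the database is SQLite: it only enforces REFERENCES constraints when PRAGMA foreign_keys is switched on for the connection (it is off by default). A minimal sketch with Python's sqlite3 module and made-up rows, showing an insert with a nonexistent CountryID being rejected:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is on (set per connection).
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE Thing (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Country (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        abbrev TEXT NOT NULL
    );
    CREATE TABLE ThingCountries (
        ThingID INTEGER REFERENCES Thing(id),
        CountryID INTEGER REFERENCES Country(id)
    );
    INSERT INTO Thing VALUES (1, 'Movie A');
    INSERT INTO Country VALUES (1, 'France', 'FR'), (2, 'Italy', 'IT');
    INSERT INTO ThingCountries VALUES (1, 1), (1, 2);
""")

try:
    # CountryID 99 does not exist, so the foreign key rejects this row.
    conn.execute("INSERT INTO ThingCountries VALUES (1, 99)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# Look up the country names for Thing 1 through the Country table.
names = [r[0] for r in conn.execute("""
    SELECT Country.name FROM ThingCountries
    JOIN Country ON Country.id = ThingCountries.CountryID
    WHERE ThingCountries.ThingID = 1 ORDER BY Country.name""")]

print(rejected)  # True
print(names)     # ['France', 'Italy']
```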
Do the same thing for Genre and your problem goes away.
It's a bit awkward, but that's how SQL does it. Best to get used to it. Alternatively, some databases offer array types to make this simpler, such as PostgreSQL arrays.
More Reading:
Ten Common Database Design Mistakes, particularly "Ignoring Normalization"
Database normalization

Return a column only once

I execute a query in my DB:
SELECT table1.*, table2.* FROM table1 JOIN table2 USING(id);
In these two tables I have a common column "id". What do I have to change so that the column 'id' appears only once in the results instead of twice?
One solution is to list the columns I want in the query. But what if I want to avoid this (as there are many)?
Will it work for you to name specific columns you need from both tables? something like:
SELECT table1.id, table2.other_column1, table2.other_column2 FROM table1 JOIN table2 USING(id);
You are selecting all fields from both tables by using table1.* and table2.*, so each table's id column is returned.
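A related option not covered in the answer: with a USING join, a bare SELECT * returns the join column only once (standard SQL behaviour, also documented for SQLite, PostgreSQL and MySQL), while table1.*, table2.* still expands every column of both tables. A minimal sketch with Python's sqlite3 module; the extra columns a and b are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER, a TEXT);
    CREATE TABLE table2 (id INTEGER, b TEXT);
    INSERT INTO table1 VALUES (1, 'x');
    INSERT INTO table2 VALUES (1, 'y');
""")

def column_names(sql):
    """Return the result-set column names reported by the cursor."""
    cur = conn.execute(sql)
    return [d[0] for d in cur.description]

# Qualified stars: every column of both tables, so id appears twice.
both = column_names("SELECT table1.*, table2.* FROM table1 JOIN table2 USING(id)")
# Bare star with USING: the right-hand copy of id is omitted.
merged = column_names("SELECT * FROM table1 JOIN table2 USING(id)")

print(both)    # ['id', 'a', 'id', 'b']
print(merged)  # ['id', 'a', 'b']
```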

Comparing records of two tables and discarding matches

I am trying to implement an idea where I have two sql tables in a database.
Table Info which has a field Nationality and the other Table Exclusion which has a field Keyword.
| Info.Nationality | Exclusion.Keyword |
|---|---|
| British | France |
| resteraunt de France | Spanish |
| German | |
| Flag Italian | |
| Spanish rice | |
| Italian pasta | |
| Irish beef | |
In my web application I am creating a GridView4 and through a DataTable and SqlDataAdapter I am populating that GridView4 with the SQL command:
SELECT DISTINCT Nationality FROM Info WHERE Nationality NOT LIKE '%Spanish%'
That SQL statement retrieves all the distinct records in Info.Nationality that do not contain the word Spanish.
What I am currently doing in my VB.NET web app is adding two separate GridViews, one for each table: GridView2 holds DISTINCT Info.Nationality and GridView3 holds Exclusion.Keyword. A third, GridView4, displays the results of the SQL command above.
The idea is to retrieve all the distinct records from Info.Nationality that are not suppressed by the keywords in Exclusion.Keyword. So, with the SQL command above, GridView4 retrieves all the records that do not contain the word "Spanish".
I am doing all of this in a nested For loop: the outer loop takes each record of Info.Nationality one by one (e.g. For Each row As DataRow In Me.GridView2.Rows()) and compares it against the inner loop, which runs to the end of Exclusion.Keyword (e.g. For i = 0 To GridView3.Rows.Count - 1).
The problem is that the SQL statement requires me to specify the word to compare explicitly. I tried concatenating the Exclusion.Keyword records into a String named Keywords, replacing 'Spanish' after NOT LIKE with a parameter, and assigning it with cmd.Parameters.AddWithValue("@String", Keywords). However, this doesn't work: it only compares against the last record in the string and ignores the first.
The idea behind all of this is to display all the records of Info.Nationality in GridView4 which do not contain the keywords in Exclusion.Keyword.
Is there an easier or more efficient way to do this? I was thinking of an INNER JOIN with LIKE, but that is not my problem. My problem is how to compare each record of Info.Nationality, one by one, with all the records in Exclusion.Keyword, keeping the ones that do not match and discarding the ones that match.
Then, in GridView4, how can I edit the records so that the changes are not applied to Info.Nationality but are only inserted into Exclusion.Keyword?
In my ASP.NET web app I tried the following, which at first didn't work (SOLVED by adding ToString() after Text):
SELECT DISTINCT Nationality
FROM Info WHERE NOT EXISTS (SELECT * FROM Exclusion WHERE Info.Nationality LIKE '%' + @GridView + '%')
cmd.Parameters.AddWithValue("@GridView", GridView3.Rows(i).Cells(0).Text.ToString())
Here, GridView3 holds the Exclusion.Keyword data.
Would really appreciate your suggestions and thoughts around this.
You do not need to do this one-by-one, or "Row by agonizing row" as some DBAs are fond of describing this type of approach. There are lots of ways to write a single query that returns only the records from Info.Nationality that do not match the exclusion keywords.
My preference is to use the EXISTS clause and a correlated subquery:
SELECT Nationality
FROM Info I
WHERE NOT EXISTS(SELECT * FROM Exclusion WHERE I.Nationality LIKE '%' + Keyword + '%')
You can also express this as a left join.
SELECT I.Nationality
FROM Info I
LEFT OUTER JOIN Exclusion E
ON I.Nationality LIKE '%' + E.Keyword + '%'
WHERE E.Keyword IS NULL
The left join returns all the rows from Info and fills the Exclusion columns with nulls except where the join criteria match. By keeping only the rows where those values are null, you discard the matches.
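A minimal sketch of both queries against the sample data from the question, run in SQLite through Python's sqlite3 module. One assumption to note: the answer's '%' + Keyword + '%' is SQL Server syntax, and SQLite concatenates strings with || instead, so the sketch uses ||.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Info (Nationality TEXT);
    CREATE TABLE Exclusion (Keyword TEXT);
    INSERT INTO Info VALUES ('British'), ('resteraunt de France'), ('German'),
        ('Flag Italian'), ('Spanish rice'), ('Italian pasta'), ('Irish beef');
    INSERT INTO Exclusion VALUES ('France'), ('Spanish');
""")

# Correlated NOT EXISTS: keep a row only if no keyword occurs inside it.
not_exists = conn.execute("""
    SELECT DISTINCT I.Nationality
    FROM Info I
    WHERE NOT EXISTS (SELECT * FROM Exclusion E
                      WHERE I.Nationality LIKE '%' || E.Keyword || '%')
    ORDER BY I.Nationality
""").fetchall()

# LEFT JOIN on the LIKE condition, then keep only the unmatched rows.
left_join = conn.execute("""
    SELECT DISTINCT I.Nationality
    FROM Info I
    LEFT OUTER JOIN Exclusion E ON I.Nationality LIKE '%' || E.Keyword || '%'
    WHERE E.Keyword IS NULL
    ORDER BY I.Nationality
""").fetchall()

print([r[0] for r in not_exists])
# ['British', 'Flag Italian', 'German', 'Irish beef', 'Italian pasta']
```

Both forms return the same rows here; which one runs faster depends on the engine and the data, so it is worth timing both.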

Speed up SQL select in SQLite

I'm making a large database that, for the sake of this question, let's say, contains 3 tables:
A. Table "Employees" with fields:
id = INTEGER PRIMARY KEY AUTOINCREMENT
Others don't matter
B. Table "Job_Sites" with fields:
id = INTEGER PRIMARY KEY AUTOINCREMENT
Others don't matter
C. Table "Workdays" with fields:
id = INTEGER PRIMARY KEY AUTOINCREMENT
emp_id = a foreign key to Employees(id)
job_id = a foreign key to Job_Sites(id)
datew = INTEGER that stands for the actual workday, stored as a Unix timestamp (seconds since midnight UTC, Jan 1, 1970)
The most common operation in this database is to display workdays for a specific employee. I perform the following select statement:
SELECT * FROM Workdays WHERE emp_id='Actual Employee ID' AND job_id='Actual Job Site ID' AND datew>=D1 AND datew<D2
I need to point out that D1 and D2 are calculated for the beginning of the month in search and for the next month, respectively.
I actually have two questions:
Should I set any fields as indexes besides the primary keys? (Sorry, I seem to misunderstand the whole indexing concept.)
Is there any way to re-write the Select statement to maybe speed it up. For instance, most of the checks in it would be to see that the actual employee ID and job site ID match. Maybe there's a way to split it up?
PS. Forgot to say, I use SQLite in a Windows C++ application.
If you use the above query often, then you may get better performance by creating a multicolumn index containing the columns in the query:
CREATE INDEX WorkdaysLookupIndex ON Workdays (emp_id, job_id, datew);
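Put the columns compared with = first (emp_id, job_id) and the range column (datew) last: SQLite can then jump straight to the rows for one employee and job site and scan only the requested date range. You can check whether the index is actually used with EXPLAIN QUERY PLAN, but sometimes you just have to create the index and time your queries to see what is faster.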
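A minimal sketch to check that SQLite picks the multicolumn index, assuming a cut-down version of the schema from the question (key columns only) and run from Python's sqlite3 module rather than C++; the same SQL works unchanged through the C API. The parameter values are hypothetical, and the exact wording of the EXPLAIN QUERY PLAN detail text varies between SQLite versions, but it should name WorkdaysLookupIndex.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employees (id INTEGER PRIMARY KEY AUTOINCREMENT);
    CREATE TABLE Job_Sites (id INTEGER PRIMARY KEY AUTOINCREMENT);
    CREATE TABLE Workdays (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        emp_id INTEGER REFERENCES Employees(id),
        job_id INTEGER REFERENCES Job_Sites(id),
        datew INTEGER
    );
    -- Equality columns first, range column (datew) last.
    CREATE INDEX WorkdaysLookupIndex ON Workdays (emp_id, job_id, datew);
""")

query = """SELECT * FROM Workdays
           WHERE emp_id = ? AND job_id = ? AND datew >= ? AND datew < ?"""
# Hypothetical values: employee 1, job site 2, and D1/D2 as the Unix
# timestamps of 2024-03-01 and 2024-04-01 (midnight UTC).
params = (1, 2, 1709251200, 1711929600)

# The last column of each EXPLAIN QUERY PLAN row is the human-readable detail.
plan = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + query, params)]
print(plan)
```

Binding the IDs as integer parameters, rather than quoting them as strings in the SQL text, also lets the application reuse one prepared statement for every month it displays.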
