Is it mandatory to put a "distinct" field first in a query? - sqlite

Just out of curiosity: it looks like a distinct field must be placed ahead of any other fields. Am I wrong?
See this example in SQLite:
sqlite> select ip, distinct code from parser; # syntax error?
Error: near "distinct": syntax error
sqlite> select distinct code, ip from parser; # works
Why is that? Do I really have a syntax error?

There is no such thing as a "distinct field".
distinct applies to all fields in the query and therefore must appear immediately after select.
In other words, select distinct code, ip is really
select distinct
code,
ip
rather than
select
distinct code,
ip
It selects all distinct pairs of (code, ip). Thus the result set could include repeated values of code (each with a different value of ip).
It is not possible to apply distinct to a single field in the way you're trying to (group by might be a useful alternative, but we need to understand what it is exactly that you're trying to achieve).
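To make the difference concrete, here is a minimal sketch using Python's built-in sqlite3 module. The parser table and its sample rows are invented for illustration; only the column names come from the question. It shows that DISTINCT de-duplicates whole (code, ip) rows, while GROUP BY gives one row per code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE parser (ip TEXT, code INTEGER)")
# Hypothetical sample data: code 200 is seen from two different ips.
conn.executemany(
    "INSERT INTO parser (ip, code) VALUES (?, ?)",
    [("10.0.0.1", 200), ("10.0.0.1", 200), ("10.0.0.2", 200), ("10.0.0.3", 404)],
)

# DISTINCT applies to the whole row, so 200 shows up once per distinct ip.
distinct_pairs = conn.execute(
    "SELECT DISTINCT code, ip FROM parser ORDER BY code, ip"
).fetchall()

# GROUP BY yields one row per code; the aggregate decides which ip is shown.
one_per_code = conn.execute(
    "SELECT code, MIN(ip) FROM parser GROUP BY code ORDER BY code"
).fetchall()

print(distinct_pairs)
print(one_per_code)
```

Whether MIN(ip) is the right aggregate depends on what you actually want to see for each code.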

Related

How to get the id of a newly-created value with Diesel and SQLite?

Diesel's SqliteBackend does not implement the SupportsReturningClause trait, so the get_result method cannot be used to retrieve a newly created value.
Is there another way to find out the id of the inserted row? Python has a solution for this. The only solution I've found so far is to use a UUID for ids instead of an autoincrement field.
The underlying issue here is that SQLite did not support the SQL RETURNING clause before version 3.35.0; RETURNING would let you get the auto-generated id back as part of the insert statement itself.
As the OP only asked a general question, I cannot show examples of how to implement this using Diesel.
There are several ways to work around this issue. All of them require that you execute a second query.
Order by id and select just the largest id. That's the most direct solution. It also shows the problem with doing a second query: another insert can race with yours at any point in time, so you can get back the wrong id (at least if you don't use transactions).
Use the last_insert_rowid() SQL function to retrieve the row id of the last inserted row. Unless configured otherwise, that row id matches your autoincrement integer primary key. On the Diesel side you can use no_arg_sql_function!() to define the underlying SQL function in your crate.
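The second approach can be sketched outside Diesel, for example with Python's sqlite3 module (this is not Diesel code, and the users table is made up for the example). last_insert_rowid() is tracked per database connection, so inserts made through other connections do not affect the value you read back.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table, used only to demonstrate the function.
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT)")

with conn:  # the insert and the follow-up query share one transaction
    cur = conn.execute("INSERT INTO users (name) VALUES (?)", ("alice",))
    # Python exposes the value directly on the cursor...
    id_from_cursor = cur.lastrowid
    # ...and the same value is available through plain SQL.
    id_from_sql = conn.execute("SELECT last_insert_rowid()").fetchone()[0]

print(id_from_cursor, id_from_sql)
```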

Do not fail on missing column in a SQLite query

I have a simple query like this:
SELECT * FROM CUSTOMERS WHERE CUSTID LIKE '~' AND BANKNO LIKE '~'
The problem is, the customers-table might or might not contain the BANKNO column depending on circumstances I've no control over. If however BANKNO is not a column in CUSTOMERS, this query fails.
So my question is: is it possible to test whether the BANKNO column exists and, if so, include it in the query, and otherwise exclude it?
The query really has to be flexible.
A non-existent column in a SELECT to sqlite3 will always fail.
One option might be to put the "full" sql in a try block, and if it errors, execute the other sql.
Or, you could query PRAGMA table_info('CUSTOMERS') and interrogate the result to see if a column in question is in the database. Find the sqlite doc here https://www.sqlite.org/pragma.html#pragma_table_info.
I'm sure there are other options, but the bottom line is you need to know before the sql is executed that it contains only valid column names.
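A minimal sketch of the PRAGMA table_info approach, using Python's sqlite3 module. The helper names column_exists and find_customers are made up for this example, and the sample table deliberately lacks BANKNO. The query text is assembled only after checking which columns exist.

```python
import sqlite3

def column_exists(conn, table, column):
    # PRAGMA arguments cannot be bound as parameters, so `table` must be trusted.
    rows = conn.execute(f"PRAGMA table_info('{table}')").fetchall()
    # Each row is (cid, name, type, notnull, dflt_value, pk).
    return any(row[1].upper() == column.upper() for row in rows)

def find_customers(conn, custid_pattern, bankno_pattern):
    sql = "SELECT * FROM CUSTOMERS WHERE CUSTID LIKE ?"
    params = [custid_pattern]
    # Add the BANKNO filter only when the column is really there.
    if column_exists(conn, "CUSTOMERS", "BANKNO"):
        sql += " AND BANKNO LIKE ?"
        params.append(bankno_pattern)
    return conn.execute(sql, params).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE CUSTOMERS (CUSTID TEXT, NAME TEXT)")  # no BANKNO here
conn.execute("INSERT INTO CUSTOMERS VALUES ('C1', 'Ann')")
rows_without_bankno = find_customers(conn, "C%", "12%")
print(rows_without_bankno)
```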

Column ' ' in where clause is ambiguous Error in mysql

SELECT tbl_user.userid,
tbl_user.firstname,
tbl_user.lastname,
tbl_user.email,
tbl_user.created,
tbl_user.createdby,
tbl_organisation.organisationname
FROM tbl_user
INNER JOIN tbl_organisation
ON tbl_user.organisationid = tbl_organisation.organisationid
WHERE organisationid = @OrganisationID;
I am using this statement to do a databind, and I am getting an error:
Column 'OrganisationID' in where clause is ambiguous
What should I do? Is it wrong to give OrganisationID in tbl_user the same name as in tbl_organisation?
OrganisationID is a foreign key from tbl_Organisation.
Since you have two columns with the same name in two different tables (and that's not a problem; it's even recommended in many cases), you must tell MySQL which one you want to filter by.
Add the table name (or alias, if you were using table aliases) before the column name. In your case, either
WHERE tbl_user.OrganisationID
or
WHERE tbl_Organisation.OrganisationID
should work.
You just need to indicate which table you are targeting with that statement, like "tbl_user.OrganisationID". Otherwise the engine doesn't know which OrganisationID you meant.
It is not wrong to have the same column names in two tables. In many (even most) cases, it is actually preferred.
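The effect is easy to reproduce. The sketch below uses SQLite through Python's sqlite3 module as a stand-in for MySQL (an assumption made for runnability; SQLite rejects the unqualified column in the same way, with its own error message). The tables are trimmed-down versions of the ones in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tbl_organisation (organisationid INTEGER PRIMARY KEY, organisationname TEXT);
    CREATE TABLE tbl_user (userid INTEGER PRIMARY KEY, firstname TEXT, organisationid INTEGER);
    INSERT INTO tbl_organisation VALUES (1, 'Acme');
    INSERT INTO tbl_user VALUES (10, 'Ann', 1);
""")

# Only the column name in the WHERE clause varies between the two attempts.
join = """
    SELECT tbl_user.userid, tbl_organisation.organisationname
    FROM tbl_user
    INNER JOIN tbl_organisation
        ON tbl_user.organisationid = tbl_organisation.organisationid
    WHERE {column} = ?
"""

try:
    conn.execute(join.format(column="organisationid"), (1,))
    unqualified_failed = False
except sqlite3.OperationalError:
    # Both joined tables have an organisationid column, so the name is ambiguous.
    unqualified_failed = True

qualified_rows = conn.execute(
    join.format(column="tbl_user.organisationid"), (1,)
).fetchall()
print(unqualified_failed, qualified_rows)
```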

Filemaker Sql Queries against columns with spaces in the name

I have an ODBC DSN setup to hit a Filemaker database from my ASP.Net application. I'm trying to form a valid query where the column name has spaces in it. In T-SQL, you would enclose it in []. But I fail to get it to work in this case. Here's a valid query:
select * from ua_inventory where location like '%a%'
But this is not:
select * from ua_inventory where [item place] like '%a%'
I get the following error:
[DataDirect][ODBC SequeLink driver][ODBC Socket][DataDirect][ODBC FileMaker driver][FileMaker]Parse Error in SQL
Does anyone have a clue how to form queries where the table and/or columns have spaces in the name?
Thanks in advance
Here are some example queries:
SELECT DISTINCT LastNameFirst, "Full Name" FROM "UA Biographies" ORDER BY LastNameFirst
SELECT DISTINCT Categories FROM UA_Inventory ORDER BY Categories
The important thing to remember is that object names (table and column names) containing spaces need double quotes.
The back-and-forth comments at the bottom of this article really helped out:
http://www.nathanm.com/filemaker-pro-odbc-quirks/

Efficiently finding unique values in a database table

I've got a database table with a very large number of rows. This table represents messages that are logged by a system. Each message has a message type, which is stored in its own field in the table. I'm writing a website for querying this message log.
If I want to search by message type, then ideally I would have a drop-down box listing the message types that have come up in the database. Message types may change over time, so I can't hard-code the types into the drop-down; I'll have to do some sort of lookup.
Iterating over the entire table contents to find unique message values is obviously very stupid; however, being stupid in the database field, I'm here asking for a better way. Perhaps a separate lookup table, which the database occasionally updates and which lists just the unique message types, would be a better source for populating my drop-down.
Any suggestions would be much appreciated.
The platform I'm using is ASP.NET MVC and SQL Server 2005
Use a separate lookup table, with the id of the message type stored in your log. This will reduce the size and increase the efficiency of the log. It would also normalize your data.
Yep, I would definitely go with the separate lookup table. You can then populate it using something like:
INSERT TypeLookup (Type)
SELECT DISTINCT Type
FROM BigMassiveTable
You could then run a top-up job periodically to pull in new types from your main table that don't already exist in the lookup table.
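The top-up job could look roughly like this. It is a sketch in Python with SQLite standing in for SQL Server (an assumption made so the example runs anywhere); the NOT EXISTS filter is one way to skip types already present, and the top_up name is invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE BigMassiveTable (Id INTEGER PRIMARY KEY, Type TEXT);
    CREATE TABLE TypeLookup (Type TEXT PRIMARY KEY);
    INSERT INTO BigMassiveTable (Type) VALUES ('info'), ('error'), ('info');
""")

def top_up(conn):
    """Copy types that are not yet in TypeLookup and return the lookup contents."""
    with conn:
        conn.execute("""
            INSERT INTO TypeLookup (Type)
            SELECT DISTINCT b.Type
            FROM BigMassiveTable AS b
            WHERE NOT EXISTS (SELECT 1 FROM TypeLookup AS t WHERE t.Type = b.Type)
        """)
    return [row[0] for row in conn.execute("SELECT Type FROM TypeLookup ORDER BY Type")]

after_first_run = top_up(conn)
conn.execute("INSERT INTO BigMassiveTable (Type) VALUES ('warning')")
after_second_run = top_up(conn)  # only 'warning' is new this time
print(after_first_run, after_second_run)
```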
SELECT DISTINCT message_type
FROM message_log
is the most straightforward but not very efficient way.
If you have a list of types that can possibly appear in the log, use this:
SELECT message_type
FROM message_types mt
WHERE message_type IN
(
SELECT message_type
FROM message_log
)
This will be more efficient if message_log.message_type is indexed.
If you don't have this table but want to create one, and message_log.message_type is indexed, use a recursive CTE to emulate a loose index scan:
WITH rows (message_type) AS
(
SELECT MIN(message_type) AS mm
FROM message_log
UNION ALL
SELECT message_type
FROM (
SELECT mn.message_type, ROW_NUMBER() OVER (ORDER BY mn.message_type) AS rn
FROM rows r
JOIN message_log mn
ON mn.message_type > r.message_type
WHERE r.message_type IS NOT NULL
) q
WHERE rn = 1
)
SELECT message_type
FROM rows r
OPTION (MAXRECURSION 0)
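The same idea can be tried out in SQLite, as a rough sketch through Python's sqlite3 module. This is not the T-SQL above: it swaps the ROW_NUMBER() trick for a correlated MIN() subquery, which is how the pattern is usually written where that form is allowed. Each recursion step asks the index for the smallest message_type greater than the previous one, so the work grows with the number of distinct types rather than the number of rows. The index name and sample data are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE message_log (id INTEGER PRIMARY KEY, message_type TEXT);
    CREATE INDEX ix_message_log_type ON message_log (message_type);
""")
conn.executemany(
    "INSERT INTO message_log (message_type) VALUES (?)",
    [("info",)] * 1000 + [("error",)] * 10 + [("warning",)] * 100,
)

LOOSE_SCAN = """
    WITH RECURSIVE types (message_type) AS (
        SELECT MIN(message_type) FROM message_log
        UNION ALL
        -- Jump to the next larger value; yields NULL once the values run out.
        SELECT (SELECT MIN(l.message_type)
                FROM message_log AS l
                WHERE l.message_type > types.message_type)
        FROM types
        WHERE types.message_type IS NOT NULL
    )
    SELECT message_type FROM types WHERE message_type IS NOT NULL
"""
distinct_types = [row[0] for row in conn.execute(LOOSE_SCAN)]
print(distinct_types)
```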
I just wanted to state the obvious: normalize the data.
message_types
message_type | message_type_name
messages
message_id | message_type | message_type_name
Then you can just do without any cached DISTINCT:
For your dropdown
SELECT * FROM message_types
For your retrieval
SELECT * FROM messages WHERE message_type = ?
SELECT m.*, mt.message_type_name FROM messages AS m
JOIN message_types AS mt
ON ( m.message_type = mt.message_type)
I'm not sure why you would want a cached DISTINCT which you'll have to update, when you can slightly tweak the schema and have one with referential integrity (RI).
Create an index on the message type:
CREATE INDEX IX_Messages_MessageType ON Messages (MessageType)
Then to get a list of unique Message Types, you run:
SELECT DISTINCT MessageType
FROM Messages
ORDER BY MessageType
Because the index is physically sorted in order of MessageType, SQL Server can very quickly and efficiently scan through the index, picking up a list of unique message types.
It does not perform badly; it's what SQL Server is good at.
Admittedly, you can save some space by having a "message types" table. And if you only display a few messages at a time, then the bookmark lookup, as it joins back to the MessageTypes table, won't be a problem. But if you start displaying hundreds or thousands of messages at a time, then the join back to MessageTypes can get pretty expensive, and needlessly so, and it will be faster to have the MessageType stored with the message.
But I would have no problem with creating an index on the MessageType column and selecting distinct. SQL Server loves that sort of thing. If you find it to be a real load on your server once you're getting dozens of hits a second, then follow the other suggestion and cache them in memory.
My personal solution would be:
create the index
select distinct
and if I still had problems
cache in memory that expires after 30 seconds
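A minimal sketch of those three steps, written in Python with SQLite rather than ASP.NET and SQL Server (an assumption made so the example is self-contained). The MessageTypeCache class and its clock parameter are invented for illustration; the injectable clock only exists so that expiry can be shown without waiting 30 seconds.

```python
import sqlite3
import time

class MessageTypeCache:
    """Caches the DISTINCT result and re-queries once it is older than the TTL."""

    def __init__(self, conn, ttl_seconds=30.0, clock=time.monotonic):
        self._conn = conn
        self._ttl = ttl_seconds
        self._clock = clock
        self._values = None
        self._loaded_at = 0.0

    def get(self):
        now = self._clock()
        if self._values is None or now - self._loaded_at >= self._ttl:
            rows = self._conn.execute(
                "SELECT DISTINCT MessageType FROM Messages ORDER BY MessageType"
            )
            self._values = [row[0] for row in rows]
            self._loaded_at = now
        return self._values

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Messages (Id INTEGER PRIMARY KEY, MessageType TEXT)")
conn.execute("CREATE INDEX IX_Messages_MessageType ON Messages (MessageType)")
conn.execute("INSERT INTO Messages (MessageType) VALUES ('info'), ('error')")

fake_now = [0.0]  # stand-in clock for the demonstration
cache = MessageTypeCache(conn, ttl_seconds=30.0, clock=lambda: fake_now[0])
first = cache.get()
conn.execute("INSERT INTO Messages (MessageType) VALUES ('warning')")
still_cached = cache.get()  # within 30 seconds: the new type is not visible yet
fake_now[0] = 31.0
refreshed = cache.get()     # expired: the query runs again
print(first, still_cached, refreshed)
```

The trade-off is that a newly logged type can take up to the TTL to appear in the drop-down.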
As for the normalized/denormalized issue: normalizing saves space, at the cost of CPU when joins are constantly performed. But the logical point of normalization is to avoid duplicate data, which can lead to inconsistent data.
Are you planning on changing the text of a message type, which if you stored with the messages you would have to update all rows?
Or is there something to be said for the fact that at the time of the message the message type was "Client response requested"?
Have you considered an indexed view? Its result set is materialized and persists in storage so that the overhead of the lookup is separated from the rest of whatever you're trying to do.
SQL Server takes care of automagically updating the view whenever there is a data change which, in its opinion, would change the contents of the view, so in this respect it's less flexible than Oracle's materialized views.
The MessageType should be a Foreign Key in the main table to a definition table containing the message type codes and descriptions. This will greatly increase your lookup performance.
Something like
DECLARE @MessageTypes TABLE(
MessageTypeCode VARCHAR(10),
MessageTypeDescription VARCHAR(100)
)
DECLARE @Messages TABLE(
MessageTypeCode VARCHAR(10),
MessageValue VARCHAR(MAX),
MessageLogDate DATETIME,
AdditionalNotes VARCHAR(MAX)
)
From this design, your lookup should only query MessageTypes
As others have said, create a separate table of message types. When you add a record to the message table, check if the message type already exists in the table. If not, add it. In either case, then post the identifier from the message type table into the message table. This should give you normalized data. Yes, it's a little extra time when you add a record, but should be more efficient on retrieval.
If there are a lot more adds than reads, and if the "message type" is short, an entirely different approach would be to still create the separate message type table, but not reference it when doing adds, and only update it lazily, on demand.
Namely, (a) Include a time-stamp in each message record. (b) Keep a list of the message types found as of the last time you checked. (c) Each time you check, search for any new message types added since the last time, as in:
create table temp_new_types as
(select distinct message_type
from message
where timestamp>last_type_check
);
insert into message_type_list (message_type)
select message_type
from temp_new_types
where message_type not in (select message_type from message_type_list);
drop table temp_new_types;
Then store the timestamp of this check somewhere so you can use it the next time around.
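Put together, the lazy refresh might look like the sketch below, in Python with SQLite. The type_check_state table, the logged_at column and the refresh_types function are names invented for this example, and INSERT OR IGNORE takes the place of the temporary table plus NOT IN filter. Integer timestamps keep the demonstration simple.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE message (id INTEGER PRIMARY KEY, message_type TEXT, logged_at INTEGER);
    CREATE TABLE message_type_list (message_type TEXT PRIMARY KEY);
    -- Single-row table remembering when the types were last checked.
    CREATE TABLE type_check_state (last_type_check INTEGER NOT NULL);
    INSERT INTO type_check_state VALUES (0);
""")

def refresh_types(conn, now):
    """Scan only messages logged since the last check, then record the new check time."""
    with conn:
        (last_check,) = conn.execute(
            "SELECT last_type_check FROM type_check_state"
        ).fetchone()
        conn.execute(
            """
            INSERT OR IGNORE INTO message_type_list (message_type)
            SELECT DISTINCT message_type FROM message
            WHERE logged_at > ? AND logged_at <= ?
            """,
            (last_check, now),
        )
        conn.execute("UPDATE type_check_state SET last_type_check = ?", (now,))
    rows = conn.execute("SELECT message_type FROM message_type_list ORDER BY message_type")
    return [row[0] for row in rows]

conn.execute("INSERT INTO message (message_type, logged_at) VALUES ('info', 1), ('error', 2)")
after_first = refresh_types(conn, now=5)
conn.execute("INSERT INTO message (message_type, logged_at) VALUES ('info', 6), ('warning', 7)")
after_second = refresh_types(conn, now=10)  # only rows with logged_at in (5, 10] are scanned
print(after_first, after_second)
```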
The answer is to use DISTINCT, but the best solution differs with the size of the table. Thousands of rows? Millions? Billions? More? Each of these calls for a very different solution.
