Is this normal behavior for a unique index in SQLite?

I'm working with SQLite in Flash.
I have this unique index:
CREATE UNIQUE INDEX songsIndex ON songs ( DiscID, Artist, Title )
I have a parameterised recursive function set up to insert any new rows (single or multiple).
It works fine if I try to insert a row with the same DiscID, Artist and Title as an existing row - i.e. it ignores the duplicate row and tells me that 0 out of 1 records were updated - GOOD.
However, if, for example, the DiscID is blank but the Artist and Title are not, a new record is created even though there is already one with a blank DiscID and the same Artist and Title - BAD.
I traced out the DiscID prior to the insert, and Flash tells me it's undefined. So I've coded it to set anything undefined to "" (an empty string) to make sure it's truly an empty string being inserted - but subsequent inserts still ignore the unique index and add a brand new row even though the same row already exists.
What am I misunderstanding?
Thanks for your time and help.

SQLite allows NULLable fields to participate in UNIQUE indexes. If you have such an index, and if you add records such that two of the three columns have identical values and the other column is NULL in both records, SQLite will allow that, matching the behavior you're seeing.
Therefore the most likely explanation is that despite your effort to INSERT zero-length strings, you're actually still INSERTing NULLs.
Also, unless you've explicitly included OR IGNORE in your INSERT statements, the expected behavior of SQLite is to throw an error when you attempt to insert a duplicate INDEX value into a UNIQUE INDEX. Since you're not seeing that behavior, I'm guessing that Flash provides some kind of wrapper around SQLite that's hiding the true behavior from you (and could also be translating empty strings to NULL).
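If you want to see both behaviours side by side outside of Flash, here is a minimal sketch using Python's sqlite3 module (the table and values are just illustrative):

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE songs (DiscID TEXT, Artist TEXT, Title TEXT)")
conn.execute("CREATE UNIQUE INDEX songsIndex ON songs (DiscID, Artist, Title)")

# Two rows with a NULL DiscID: both inserts succeed, because NULLs never
# collide in a UNIQUE index.
conn.execute("INSERT INTO songs VALUES (NULL, 'Artist', 'Title')")
conn.execute("INSERT INTO songs VALUES (NULL, 'Artist', 'Title')")

# Two rows with an empty-string DiscID: the second insert violates the index.
conn.execute("INSERT INTO songs VALUES ('', 'Artist', 'Title')")
try:
    conn.execute("INSERT INTO songs VALUES ('', 'Artist', 'Title')")
except sqlite3.IntegrityError as e:
    print(e)  # e.g. UNIQUE constraint failed: songs.DiscID, songs.Artist, songs.Title

# With OR IGNORE the duplicate is skipped silently (0 rows affected), which
# matches the "0 out of 1 records updated" result described in the question.
conn.execute("INSERT OR IGNORE INTO songs VALUES ('', 'Artist', 'Title')")
print(conn.execute("SELECT COUNT(*) FROM songs").fetchone()[0])  # 3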

Larry's answer is great. To anyone having the same problem, here's the SQLite docs citation explaining that in this case all NULLs are treated as different values:
For the purposes of unique indices, all NULL values are considered different from all other NULL values and are thus unique. This is one of the two possible interpretations of the SQL-92 standard (the language in the standard is ambiguous). The interpretation used by SQLite is the same and is the interpretation followed by PostgreSQL, MySQL, Firebird, and Oracle. Informix and Microsoft SQL Server follow the other interpretation of the standard, which is that all NULL values are equal to one another.
See here: https://www.sqlite.org/lang_createindex.html

Related

Querying on Global Secondary indexes with a usage of contains operator

I've been reading the DynamoDB docs and was unable to understand whether it makes sense to query a Global Secondary Index using the 'contains' operator.
My problem is as follows: my DynamoDB document has a list of embedded objects, and every object has a 'code' field which is unique:
{
    "entities": [
        {"code": "entity1Code", "name": "entity1Name"},
        {"code": "entity2Code", "name": "entity2Name"}
    ]
}
I want to be able to get all documents that contain entities with entity.code = X.
For this purpose I'm considering adding a Global Secondary Index on a new attribute that would contain all the entity codes present in the document, separated by commas. So the example above would look like:
{
    "entities": [
        {"code": "entity1Code", "name": "entity1Name"},
        {"code": "entity2Code", "name": "entity2Name"}
    ],
    "entitiesGlobalSecondaryIndex": "entityCode1,entityCode2"
}
And then I would like to apply a filter expression on entitiesGlobalSecondaryIndex, something like: entitiesGlobalSecondaryIndex contains entityCode1.
Would this be efficient, or does using a global secondary index make no sense in this way, so that DynamoDB will simply check the condition against every document, much like a scan?
Any help is very much appreciated,
Thanks
The contains operator cannot be used on a partition key. For a query to use any sort of operator (contains, begins_with, >, <, etc.) it has to be applied to a range attribute - aka your sort key.
You can very well set up a GSI with some value as your PK and this code as your SK. However, GSIs are replications of the table - there is a slight potential for the data in a GSI to lag behind that of the master copy. If you don't query this GSI very often, you're probably safe from that.
However, if you are trying to do this to the entire table at once, then it's no better than a scan.
If what you need is for a specific code to return all of its documents at once, then you could create a GSI with that code as the PK. If you add a date field as the SK of this GSI, it would even be time sorted. If you query against that code in that index, you'll get every single one of them.
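For illustration, a rough boto3 sketch of that last idea - the table name "Documents", the GSI name "codeIndex" and its key attributes entityCode / createdAt are assumptions, not something from the question:

import boto3
from boto3.dynamodb.conditions import Key

# Assumed setup: table "Documents" with a GSI "codeIndex" whose partition key
# is entityCode and whose sort key is a createdAt date string.
table = boto3.resource("dynamodb").Table("Documents")

response = table.query(
    IndexName="codeIndex",
    KeyConditionExpression=Key("entityCode").eq("entity1Code"),
    ScanIndexForward=False,  # newest first, courtesy of the date sort key
)
items = response["Items"]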
Since you may have multiple codes, if there aren't too many per document, you could maybe use a Sparse Index - if a document has an entity with code "AAAA" then you also give it an attribute named AAAA (or AAAAflag or something). It is always null/does not exist unless the entities list contains that code. If you build a GSI on this AAAAflag attribute, it will only contain documents that contain that entity code, and ignore all documents where the attribute does not exist. This may work for you if you can also provide a good PK to keep the numbers well partitioned and if you don't have too many codes.
Filter expressions, by the way, are different from all of the above. Filter expressions are run on the data that would be returned, after it has already been read out of the table. This is useful if you have a multi-access-pattern setup but don't want a particular call to return all the documents associated with a particular PK - in the interest of keeping the data your code works with concise. The query with a filter expression still reads everything the key condition matches, but only returns what makes it past the filter.
If you are only querying against a particular PK at any given time and you want to know whether it contains any entities with code X, then a filter expression would work perfectly. Of course, this is only per PK and not for your entire table.
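As a (hypothetical) sketch of that per-PK filter, using the comma-separated attribute from the question - the "Documents" table and the docId key are assumed names:

import boto3
from boto3.dynamodb.conditions import Key, Attr

table = boto3.resource("dynamodb").Table("Documents")

response = table.query(
    KeyConditionExpression=Key("docId").eq("doc-123"),
    FilterExpression=Attr("entitiesGlobalSecondaryIndex").contains("entityCode1"),
)
# Read capacity is consumed for everything the key condition matched;
# only the items that pass the filter come back in Items.
items = response["Items"]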
If all you need is counts, then you could keep a count attribute on the document, or a meta document on that partition that contains these values and can be queried directly.
Lastly - and I have no idea whether this would work or not - if your entities attribute is a map type, you might very well be able to filter against the entity code, and maybe even with entities.code.contains(value) if it were an SK, but I do not know if this is possible.

Tying table records together in SQLite3

I am currently working on a database structure in SQLite Studio (not sure whether that's in itself important, but might as well mention), and error messages are making me wonder whether I'm just going at it the wrong way or there's some subtlety I'm missing.
Assume two tables, people-basics (person-ID, person-NAME, person-GENDER) and people-stats (person-ID, person-NAME, person-SIZE). What I'm looking into achieving is "Every record in people-basics corresponds to a single record in people-stats.", ideally with the added property that person-ID and person-NAME in people-stats reflect the associated person-ID and person-NAME in people-basics.
I've been assuming up to now that one would achieve this with Foreign Keys, but I've also been unable to get this to work.
When I add a person in people-basics, it works fine, but when I then go over to people-stats, no corresponding record exists, and if I try to create one and fill the foreign key column with the corresponding data, I get this message: "Cannot edit this cell. Details: Error while executing SQL query on database 'People': no such column: people-basics.person" (I think the message is truncated).
The DDL I currently have for my tables (auto-generated by SQLite Studio based on my GUI operations):
CREATE TABLE [people-basics] (
    [person-ID]     INTEGER PRIMARY KEY AUTOINCREMENT UNIQUE NOT NULL,
    [person-NAME]   TEXT UNIQUE NOT NULL,
    [person-GENDER] TEXT
);

CREATE TABLE [people-stats] (
    [person-NAME]   TEXT REFERENCES [people-basics] ([person-NAME]),
    [person-SIZE]   NUMERIC
);
(I've removed the person-ID column from people-stats for now as it seemed like I should only have one foreign key at a time, not sure whether that's true.)
Alright, that was a little silly.
The entire problem was solved by removing the hyphens from table names and column names. (So: peopleBasics instead of people-basics, etc.)
Ah well.
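For anyone landing here later, a sketch of the hyphen-free version with the person-ID foreign key put back (shown with Python's sqlite3 just to demonstrate enforcement; the camelCase names are illustrative, and note that SQLite only enforces foreign keys when the pragma is switched on):

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite

conn.execute("""
    CREATE TABLE peopleBasics (
        personID     INTEGER PRIMARY KEY AUTOINCREMENT,
        personNAME   TEXT UNIQUE NOT NULL,
        personGENDER TEXT
    )""")
conn.execute("""
    CREATE TABLE peopleStats (
        personID   INTEGER UNIQUE NOT NULL REFERENCES peopleBasics (personID),
        personNAME TEXT REFERENCES peopleBasics (personNAME),
        personSIZE NUMERIC
    )""")

conn.execute("INSERT INTO peopleBasics (personNAME, personGENDER) VALUES ('Alice', 'F')")
conn.execute("INSERT INTO peopleStats VALUES (1, 'Alice', 170)")

# A stats row pointing at a person that doesn't exist is now rejected.
try:
    conn.execute("INSERT INTO peopleStats VALUES (99, 'Nobody', 150)")
except sqlite3.IntegrityError as e:
    print(e)  # FOREIGN KEY constraint failed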

Is SQLite "Insert or Replace" slower than just "Insert"?

I am copying millions of rows to a table in another database. I am doing a few things with the data in between and have duplicates on a certain column that is used as a key in the destination table. Ignoring all the other solutions to fix this, I am testing out "Insert or Replace", and so far processing is going smoothly, but I am not sure whether this is faster than a normal "Insert" (in a case where there are no PK duplicates)?
The OR REPLACE clause works only if there is some UNIQUE (or PRIMARY KEY) constraint that could be violated.
This means that the database always has to check whether there is a duplicate, the only difference is what happens when a duplicate is found: report an error, or delete the old row.
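A small sketch with Python's sqlite3 (table and values are illustrative) that shows the check happening in both cases - the only difference is the conflict action:

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dest (key INTEGER PRIMARY KEY, value TEXT)")
conn.execute("INSERT INTO dest VALUES (1, 'first')")

# Plain INSERT: the duplicate key is detected and reported as an error.
try:
    conn.execute("INSERT INTO dest VALUES (1, 'second')")
except sqlite3.IntegrityError as e:
    print(e)  # e.g. UNIQUE constraint failed: dest.key

# INSERT OR REPLACE: the same duplicate check runs, but on conflict the old
# row is deleted and the new one inserted instead of raising an error.
conn.execute("INSERT OR REPLACE INTO dest VALUES (1, 'second')")
print(conn.execute("SELECT value FROM dest WHERE key = 1").fetchone())  # ('second',)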

How does GAE datastore index null values

I'm concerned about read performance; I want to know whether setting an indexed field to null is faster than giving it a value.
I have lots of items with a status field. The status can be "pending", "invalid", "banned", etc...
My typical request is to find items whose status is "ok" (or null). Since null fields are not saved to the datastore, it is already a win to avoid a "useless" default value that I can replace with null. So I already use less disk space.
But I was wondering: since the datastore is NoSQL, it doesn't know about the data structure and it doesn't know that a status column is missing. So how does it perform the status = null check?
Does it have to check all columns of each row trying to find my column, or is there some smarter mechanism?
For example, does it index an entry (null => entity key) when we pass a column explicitly saying it is null? (And if that's the case, does Objectify respect that and keep the field in the list when passing it to the native API if it's null?)
And mainly, which request is more efficient?
The low-level API (and Objectify) stores and indexes nulls if you specify that a field/property should be indexed. For Objectify, you can specify @Ignore(IfNull.class) or @Unindex(IfNull.class) if you want to alter this behavior. You are probably confusing this with documentation for other data access APIs.
Since GAE only allows you to query for indexed fields, your question is really: Is it better to index nulls and query for them, or to query for everything and filter out non-null values?
This is purely a question of sparsity. If the overwhelming majority of your records contain null values, then you're probably better off querying for everything and filtering out the ones you don't want manually. A handful of extra entity reads are probably cheaper than updating and storing an extra index. On the other hand, if null records are a small percentage of your data, then you will certainly want the index.
This indexing dilemma is not unique to GAE. All databases present this question with respect to low-cardinality fields; it's just that they'll do the table scan (testing and skipping rows) for you.
If you really want to fine-tune this behavior, read Objectify's documentation on Partial Indexes.
null is also treated as a value in the datastore and there will be entries for null values in the indexes. The Datastore docs say, "Datastore distinguishes between an entity that does not possess a property and one that possesses the property with a null value".
Datastore will never check all columns or all records. If you have the property indexed, it will get records from the index only. If it is not indexed, you cannot query by that property.
In terms of query performance, it should be the same, but you can always profile and check.
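For what it's worth, here's a rough sketch with the Cloud Datastore Python client rather than Objectify (the kind and property names are made up); it just illustrates that a stored null is an ordinary indexed value you can filter on, while an absent property has no index entry at all:

from google.cloud import datastore

client = datastore.Client()

# One entity with status explicitly set to null, one without the property at all.
with_null = datastore.Entity(client.key("Item"))
with_null["status"] = None
without_status = datastore.Entity(client.key("Item"))
client.put_multi([with_null, without_status])

# The filter is answered from the index on 'status'; only the entity that
# actually stored the null value has an index entry and is returned.
query = client.query(kind="Item")
query.add_filter("status", "=", None)
print(list(query.fetch()))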

tSQLt AssertEqualsTable does not check ordering

I have two tables defined for actual and expected, with exactly the same schema. I insert two rows into the expected table with, say, Ids of 2, 1.
I run
INSERT INTO actual EXEC tSQLt.ResultSetFilter 1, '{statement}'
to populate the actual then
EXEC tSQLt.AssertEqualsTable @expected = 'expected', @actual = 'actual'
to compare the results.
Even though the data is in a different order (Ids are 1, 2 in the actual), the test passes.
I confirmed that the data was different by adding SELECT * FROM actual and SELECT * FROM expected in the test and running the test on its own with tSQLt.Run '{test name}'.
Does anyone know if this is a known bug? Apparently it is supposed to check row by row, so the ordering should be checked. All the other columns returned are NULL; it is just the Id column that contains a value.
Unless an ORDER BY clause is specified in the SELECT statement, the order isn't guaranteed by SQL Server (see the top bullet point at this MSDN page) - although in practice it is often ordered as you might expect.
Because of this, I believe that tSQLt looking for non-identical and identical rows makes sense - but checking the order doesn't - otherwise the answer could change at the whim of SQL Server and the test would be meaningless (and worse - intermittently failing!). The tSQLt user guide on AssertEqualsTable states that it checks the content of the table, but not that it checks the ordering therein. What leads you to conclude that the order should be checked as well? I couldn't find mention of it.
If you need the order to be checked, you could insert both expected and actual results into a temporary table with an identity column (or use ROW_NUMBER) and check the resulting table - if the order is different, the identity columns will differ.
There is a similar method documented here on Greg M Lucas' blog.
Relying on the order returned by a table without an ORDER BY clause is not recommended (MSDN link), so I'd suggest including one in your application's call to the statement - or, if it's an SP, within it - if the order of the returned rows is important.
