SQLite and autoindexing

I recently began exploring indexing in SQLite, and I'm able to create an index on my desired columns.
Afterwards I looked at the database to confirm that the index had been created, only to find that SQLite had already automatically created an index for each of my tables:
"sqlite_autoindex_tablename_1"
These auto-generated indices cover two columns of each table: the two columns that make up my composite primary key. Is this a normal thing for SQLite to do when using composite primary keys?
Since I'll be doing most of my queries based on these two columns, does it make sense to manually create indices that are exactly the same thing?
I'm new to indices, so I really appreciate any support, feedback, or tips. Thank you!

SQLite requires an index to enforce the PRIMARY KEY constraint -- without an index, enforcing the constraint would slow dramatically as the table grows in size. Constraints and indexes are not identical, but I don't know of any relational database that does not automatically create an index to enforce primary keys. So yes, this is normal behavior for any relational database.
If the purpose of creating an index is to optimize searches where you have an indexable search term that involves the first column in the index then there's no reason to create an additional index on the column(s) -- SQLite will use the automatically created one.
If your searches will involve the second column in the index without including an indexable term for the first column, you will need to create your own index. Neither SQLite nor any other relational database I know of can generally use a composite index to optimize filtering when the leading columns of the index are not specified in the search.
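The behavior described above can be checked with a small sketch using Python's built-in sqlite3 module. The table `t`, its columns, and the index name `t_b` are made up for illustration; the exact wording of the query plan text differs between SQLite versions, so the sketch only prints it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table with a composite primary key on (a, b).
conn.execute("CREATE TABLE t (a INTEGER, b INTEGER, c TEXT, PRIMARY KEY (a, b))")

# The automatic index is listed in sqlite_master without any CREATE INDEX.
index_names = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'index' AND tbl_name = 't'")]

def plan(sql):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

plan_a = plan("SELECT * FROM t WHERE a = 1")  # leading column: index search
plan_b = plan("SELECT * FROM t WHERE b = 1")  # second column only: table scan

print(index_names)
print(plan_a)
print(plan_b)

# A separate index on b lets the second query use an index search as well.
conn.execute("CREATE INDEX t_b ON t (b)")
plan_b_indexed = plan("SELECT * FROM t WHERE b = 1")
print(plan_b_indexed)
```

So a manual index on (a, b) would only duplicate the automatic one, while an index starting with b adds something new.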

Related

DynamoDB using partition key in a global secondary index

I'm new to DynamoDB. My table has the partition key group_id and the sort key groupid_storeid_sortk.
I want to set up an additional access pattern using group_id and store_addrss_sortk.
Will reusing the partition key in the secondary index have any impact on performance, or would it be better to create a new attribute as the index's partition key, even though it would be duplicate data?
Thank you

It’s fine to use the same partition key attribute again as the PK for the GSI. No problem there.
For the future: You may want to watch some videos on single-table design and start using PK/SK as generic names since you might want to overload what’s inside them for different items. And then you might want GSI1PK/GSI1SK as the GSI keys.
That’s a style thing when you aim for some optimizations single-table design can bring.

An index is simply another table that you don't have to manage yourself. When you create an index, the service (DynamoDB, for example) creates a new table for you and manages the synchronization of the data between the tables.
In DynamoDB you have two types of secondary indexes, global and local. If you use the same partition key, you can use either of these options. However, you have to define a local secondary index (LSI) when you create the table, and you can't add it later. Only global secondary indexes (GSI) can be added after the table has been created. You can read more about it in the DynamoDB documentation.
Regarding performance, you need to consider the cost (read/write capacity) on top of the usual time considerations. Check whether you are writing a lot to the table and not only reading a lot, and based on that plan carefully which attributes to project into the new index. Remember that writes are considerably more expensive than reads, and that every write to the table is also propagated to each index that contains the item. The DynamoDB documentation covers best practices for projections.
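As a rough illustration of reusing the partition key in a GSI, here is a sketch of table parameters in the shape that boto3's `create_table` call accepts. The table name, index name, string attribute types, the `KEYS_ONLY` projection, and on-demand billing are all assumptions made for the example; only the three attribute names come from the question.

```python
# Sketch only: nothing here talks to AWS, it just builds the request parameters.
table_params = {
    "TableName": "stores",  # hypothetical name
    "AttributeDefinitions": [
        # Attribute types are assumed to be strings ("S").
        {"AttributeName": "group_id", "AttributeType": "S"},
        {"AttributeName": "groupid_storeid_sortk", "AttributeType": "S"},
        {"AttributeName": "store_addrss_sortk", "AttributeType": "S"},
    ],
    "KeySchema": [
        {"AttributeName": "group_id", "KeyType": "HASH"},
        {"AttributeName": "groupid_storeid_sortk", "KeyType": "RANGE"},
    ],
    "GlobalSecondaryIndexes": [{
        "IndexName": "by_store_address",  # hypothetical name
        "KeySchema": [
            # Same partition key as the base table, different sort key.
            {"AttributeName": "group_id", "KeyType": "HASH"},
            {"AttributeName": "store_addrss_sortk", "KeyType": "RANGE"},
        ],
        # Projecting only the keys keeps the write cost of the index low;
        # widen the projection if queries need more attributes.
        "Projection": {"ProjectionType": "KEYS_ONLY"},
    }],
    "BillingMode": "PAY_PER_REQUEST",
}

# With boto3 installed and credentials configured, this would be passed as:
#   boto3.client("dynamodb").create_table(**table_params)
gsi = table_params["GlobalSecondaryIndexes"][0]
print(gsi["KeySchema"])
```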

Does ClickHouse support quick retrieval of any column?

I tried to use ClickHouse to store 4 billion rows, deployed on a single machine with a 48-core CPU, 256 GB of memory, and a mechanical hard disk.
My data has ten columns, and I want to quickly search on any column through SQL statements, such as:
select * from table where key='mykeyword'; or select * from table where school='Yale';
I use ORDER BY to establish a sort key: order by (key, school, ...)
But when I search, only the first field of the sort key (key) performs well. When searching on other fields, the query is very slow or even runs out of memory (even though the memory allocation is already large).
So my question to the experts: does ClickHouse support high-performance search on each column through an index, similar to MySQL? I also tried to create a secondary index for each column, but the performance did not improve.

You should try to understand how sparse primary indexes work and how exactly the right ORDER BY clause in CREATE TABLE helps your query performance.
ClickHouse will never work the same way as MySQL.
Try to use PRIMARY KEY and ORDER BY in the CREATE TABLE statement, and put the fields with low value cardinality first in the PRIMARY KEY.
Don't use SELECT * ... to fetch all columns; it's really an antipattern.
Moreover, a secondary data skipping index may help you (but I'm not sure):
https://clickhouse.com/docs/en/engines/table-engines/mergetree-family/mergetree/#table_engine-mergetree-data_skipping-indexes
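To see why only the first sort-key column is fast, here is a toy Python model of a sparse primary index. It is a simplification for illustration, not ClickHouse's actual implementation: the granule size, the sample data, and both function names are made up, and real ClickHouse can sometimes skip granules on later key columns when the leading columns have low cardinality, which this model ignores.

```python
from bisect import bisect_left, bisect_right

GRANULE = 4  # toy granule size; ClickHouse's default index_granularity is 8192

# Rows stored sorted by (key, school), as ORDER BY (key, school) would do.
rows = sorted((f"k{i:03d}", ["Yale", "MIT", "Duke"][i % 3]) for i in range(20))

# Sparse index: only the first `key` value of each granule is kept in memory.
marks = [rows[i][0] for i in range(0, len(rows), GRANULE)]

def granules_for_key(key):
    """Granules that may contain `key`, found by binary search on the marks."""
    first = max(bisect_left(marks, key) - 1, 0)
    last = bisect_right(marks, key)
    return list(range(first, last))

def granules_for_school(school):
    """The marks say nothing about `school` alone, so every granule is read."""
    return list(range(len(marks)))

print(granules_for_key("k009"))          # a single granule to read
print(len(granules_for_school("Yale")))  # all granules must be read
```

In this model a filter on key touches one or two granules, while a filter on school touches all of them, which matches the slowdown described in the question.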

Does SQLite support bulk-loading (sort-then-indexing)?

When constructing an index tree from existing data, there is a bulk-loading algorithm, like
https://en.wikipedia.org/wiki/B%2B_tree#Bulk-loading
https://www.youtube.com/watch?v=HJgXVxsO5YU
When creating an index for a non-empty table, does SQLite use bulk-loading or build the index by insertions? From my performance test, it seems that SQLite builds the index by insertion, because inserting into an already indexed table and creating the index after insertion take a similar amount of time.
Do we know why bulk-loading is not used? Does it not work well in practice?

Bulk loading requires that the data is already sorted.
SQLite implements sorting by inserting the rows into a temporary index, so using it for bulk loading would not be productive.
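The performance test mentioned in the question can be repeated with a sketch like the following, using Python's built-in sqlite3 module. The table layout, row count, and the helper name `load` are assumptions; timings depend on the SQLite version and the machine, so the sketch prints them without claiming a particular result.

```python
import sqlite3
import time

N = 50_000
# Distinct keys in scrambled order, so the index is not fed pre-sorted data.
rows = [((i * 7919) % N, i) for i in range(N)]

def load(index_first):
    """Load all rows, creating the index before or after the inserts."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (k INTEGER, v INTEGER)")
    start = time.perf_counter()
    if index_first:
        conn.execute("CREATE INDEX t_k ON t (k)")
    with conn:  # one transaction for the whole batch
        conn.executemany("INSERT INTO t VALUES (?, ?)", rows)
    if not index_first:
        conn.execute("CREATE INDEX t_k ON t (k)")
    elapsed = time.perf_counter() - start
    count = conn.execute("SELECT count(*) FROM t").fetchone()[0]
    conn.close()
    return elapsed, count

time_before, count_before = load(index_first=True)
time_after, count_after = load(index_first=False)
print(f"index before inserts: {time_before:.3f}s")
print(f"index after inserts:  {time_after:.3f}s")
```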

Performance issue with primary key

I am populating a medium-sized table (60GB, 500 million rows). The process completes reasonably fast if the table has no primary key (~1 hour using bulk insert), but it takes ~10 times longer if I create that table with the primary key. I assume this is because it takes time to verify the uniqueness constraint and also update the index at each insert.
I thought a good workaround would be to add the primary key later, since indexation on the table that's already populated should be much faster compared to incremental indexation. But sqlite doesn't seem to have the option to add primary key after the table is created (not sure why?).
I guess I could just not use a primary key at all, and instead just add a unique index after the table is populated. Is there any disadvantage to that?
Or any better solution recommended?

From a purely technical point of view, a unique index has exactly the same effect as a primary key. (In SQLite, some primary keys allow NULLs for backwards compatibility.)
The only difference is that, with a separate unique index, the key constraint does not show up in the table definition itself, which might be a bad thing for documentation purposes.
Also see "Is CREATE UNIQUE INDEX or INTEGER PRIMARY KEY more performant in SQLite".

Run the bulk insert inside a transaction and you'll avoid quite a few things that slow inserts down.
I just found this, which is a great write-up on how to speed things up in sqlite3:
Improve INSERT-per-second performance of SQLite?
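Putting both answers together, a minimal sketch of the workaround might look like this: load the rows inside one transaction into a table without a primary key, then add a unique index. The table name `items`, its columns, and the row count are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# No PRIMARY KEY here, so inserts do not have to maintain an index.
conn.execute("CREATE TABLE items (id INTEGER, payload TEXT)")

# One transaction for the whole bulk load instead of one per row.
with conn:
    conn.executemany(
        "INSERT INTO items VALUES (?, ?)",
        ((i, f"row {i}") for i in range(10_000)),
    )

# Add the uniqueness guarantee once the table is populated.
conn.execute("CREATE UNIQUE INDEX items_id ON items (id)")

# From now on duplicates are rejected just as with a primary key.
try:
    with conn:
        conn.execute("INSERT INTO items VALUES (?, ?)", (42, "duplicate"))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT count(*) FROM items").fetchone()[0]
print(duplicate_rejected, row_count)
```

Note that CREATE UNIQUE INDEX itself fails if the loaded data already contains duplicates, so the check is deferred rather than skipped.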

Global indexes while renaming the partition name

I have an existing table with some indexes on it. I am going to partition that table using DBMS_REDEFINITION. I also have to rename the partitions every 24 hours.
Will renaming the partitions cause any problems with the global indexes?
Is it mandatory to have a primary key to perform interval partitioning?
I am using Oracle 11g.

Renaming partitions doesn't affect index status, global or otherwise. They stay valid if they were valid before the rename.
You don't need a primary key for interval partitioning. The constraints are the same as for range partitioning, with some restrictions. See Interval Partitioning in the concepts guide:
You can only specify one partitioning key column, and it must be of NUMBER or DATE type.
Interval partitioning is not supported for index-organized tables.
You cannot create a domain index on an interval-partitioned table.
Note that the names for the partitions created automatically on an interval-partitioned table are system-generated. You can rename them after they've been created, but you cannot, in 11gR2, have them created with a name of your choice.
