I am new to indexes and DB optimization. I know there is a simple index on one column:
CREATE index ON table(col)
Presumably a B-tree will be created and search capabilities will be improved.
But what happens with a two-column index? And why is the order of definition important?
CREATE index ON table(col1, col2)
Yes, a B-tree index will be created in most databases if you don't specify another type of index. A composite index is useful when your queries filter on the combination of the indexed columns.
The order of the columns in a composite index is important: searching with exact values for all the fields included in the index leads to minimal search time, but if you provide only some of the values, the index can be used only for the leading columns you supplied; the rest must be checked against the retrieved records.
I found the following example helpful for understanding:
In the phone book example with a composite index created on the columns (city, last_name, first_name), if we search by giving exact values for all three fields, the search time is minimal - but if we provide values for city and first_name only, the search uses only the city field to retrieve all matched records; a sequential lookup then checks the matches against first_name. So, to improve performance, one must ensure that the index is created in the order of the search columns.
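The leftmost-prefix behaviour described above can be demonstrated with a quick sketch (using SQLite purely for illustration; the phone_book table, index name, and data values are made up):

```python
import sqlite3

# Hypothetical phone-book table with a composite index, as in the example above.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE phone_book (city TEXT, last_name TEXT, first_name TEXT)")
con.execute("CREATE INDEX idx_city_last_first ON phone_book (city, last_name, first_name)")

def plan(sql):
    # EXPLAIN QUERY PLAN reports whether SQLite can use an index for a query.
    return " ".join(row[3] for row in con.execute("EXPLAIN QUERY PLAN " + sql))

# All three leading columns supplied: the index narrows the search fully.
print(plan("SELECT * FROM phone_book WHERE city='Oslo' AND last_name='Berg' AND first_name='Anna'"))

# Only city and first_name: the index is used for city alone;
# first_name must be checked by scanning the matching rows.
print(plan("SELECT * FROM phone_book WHERE city='Oslo' AND first_name='Anna'"))

# first_name alone skips the leading column, so the index cannot be used at all.
print(plan("SELECT * FROM phone_book WHERE first_name='Anna'"))
```

The plan for the second query mentions only the city field of the index, and the third query falls back to a full scan, which is exactly the behaviour the phone book analogy describes.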
Given two DynamoDB tables: Books and Words, how can I create an index that associates the two? Specifically, I'd like to query to get all Books that contain a certain Word, and query to get all Words that appear in a specific Book.
The objective is to avoid scanning an entire table for these queries.
Based on your question I can't tell if you only care about unique words or if you want every word including duplicates. I'll assume unique words.
This can be done with a single table and a Global Secondary Index.
Create a table called BookWords with a Hash key of bookId and a Sort key of word. If you Query this table with a bookId you will get all of the unique words in that book.
Create a Global Secondary Index with a Hash key of word and a Sort key of bookId. If you Query this index with a word you will get all of the bookIds of books that contain that word.
Depending on your use case, you will probably want to normalize the words. For example, is "Word" the same as "word"?
If you want all words, not just unique words, you can use a similar approach with a few small changes. Let me know.
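As a sketch of what that schema might look like, here are hypothetical boto3 create_table parameters for the table and index described above (the GSI name WordBooks, the projection, and the billing mode are assumptions, and no AWS call is made here):

```python
# Sketch of the BookWords table and its Global Secondary Index as a boto3
# create_table parameter dict. Key names (bookId, word) follow the answer;
# the index name and billing mode are placeholders.
book_words_schema = {
    "TableName": "BookWords",
    "AttributeDefinitions": [
        {"AttributeName": "bookId", "AttributeType": "S"},
        {"AttributeName": "word", "AttributeType": "S"},
    ],
    "KeySchema": [
        {"AttributeName": "bookId", "KeyType": "HASH"},  # Query by bookId -> its words
        {"AttributeName": "word", "KeyType": "RANGE"},
    ],
    "GlobalSecondaryIndexes": [
        {
            "IndexName": "WordBooks",
            "KeySchema": [
                {"AttributeName": "word", "KeyType": "HASH"},  # Query by word -> its books
                {"AttributeName": "bookId", "KeyType": "RANGE"},
            ],
            "Projection": {"ProjectionType": "KEYS_ONLY"},
        }
    ],
    "BillingMode": "PAY_PER_REQUEST",
}
```

You would pass this dict to `boto3.client("dynamodb").create_table(**book_words_schema)`; a Query on the base table answers "words in a book" and a Query on the WordBooks index answers "books containing a word".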
I want to make a unique constraint in Cassandra.
I want all the values in my column family to be unique.
ex:
name - rahul
phone - 123
address - abc
Now I want that no row with values equal to rahul, 123, or abc can be inserted again. Searching on DataStax I found that I can achieve this for the partition key with IF NOT EXISTS, but I haven't found a solution for keeping all three values unique.
That means if
name - jacob
phone - 123
address - qwe
this row should also not be inserted into my database, because the phone column has the same value as the row with name rahul.
The short answer is that constraints of any type are not supported in Cassandra. They are simply too expensive, as they must involve multiple nodes, thus defeating the purpose of having eventual consistency in the first place. If you needed to make a single column unique, then there could be a solution, but not for multiple unique columns. For the same reason there is no isolation and no consistency (the C and I from ACID). If you really need this type of enforcement with Cassandra, you will need to create some kind of synchronization layer in your application which intercepts all requests to the database and makes sure that the values are unique and all constraints are enforced. But this won't have anything to do with Cassandra.
I know this is an old question and the existing answer is correct (you can't do constraints in C*), but you can solve the problem using batched creates. Create one or more additional tables, each with the constrained column as the primary key, and then batch the creates, which is an atomic operation. If any of those column values already exist, the entire batch will fail.

For example, if the table is named Foo, also create Foo_by_Name (primary key Name), Foo_by_Phone (primary key Phone), and Foo_by_Address (primary key Address) tables. Then when you want to add a row, create a batch covering all 4 tables. You can either duplicate all of the columns in each table (handy if you want to fetch by Name, Phone, or Address), or you can have a single column of just the Name, Phone, or Address.
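To illustrate the idea, here is a minimal sketch that emulates the pattern in SQLite rather than Cassandra (in CQL you would use a BATCH of INSERTs instead of a relational transaction); the table names follow the answer, the data follows the question:

```python
import sqlite3

# Relational emulation of the multi-table uniqueness pattern (NOT actual CQL):
# one lookup table per constrained column, each keyed by that column, with all
# inserts grouped into a single atomic unit.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Foo (name TEXT, phone TEXT, address TEXT);
CREATE TABLE Foo_by_Name (name TEXT PRIMARY KEY);
CREATE TABLE Foo_by_Phone (phone TEXT PRIMARY KEY);
CREATE TABLE Foo_by_Address (address TEXT PRIMARY KEY);
""")

def add_row(name, phone, address):
    try:
        with con:  # one transaction: all four inserts succeed or none do
            con.execute("INSERT INTO Foo VALUES (?,?,?)", (name, phone, address))
            con.execute("INSERT INTO Foo_by_Name VALUES (?)", (name,))
            con.execute("INSERT INTO Foo_by_Phone VALUES (?)", (phone,))
            con.execute("INSERT INTO Foo_by_Address VALUES (?)", (address,))
        return True
    except sqlite3.IntegrityError:
        return False  # some constrained value already exists; everything rolled back

print(add_row("rahul", "123", "abc"))   # True
print(add_row("jacob", "123", "qwe"))   # False: phone 123 is already taken
```

The second insert fails as a whole, so the main Foo table never sees the conflicting row, which is the behaviour the batched-create trick is after.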
I'm new to DynamoDB - I already have an application where the data gets inserted, but I'm getting stuck on extracting the data.
Requirement:
There must be a unique table per customer
Insert documents into the table (each doc has a unique ID and a timestamp)
Get X number of documents based on timestamp (ordered ascending)
Delete individual documents based on unique ID
So far I have created a table with a composite key (S:id, N:timestamp). However, when I come to query it, I realise that since my id is unique, and I can't do a wildcard search on IDs, I won't be able to extract a range of items...
So, how should I design my table to satisfy this scenario?
Edit: Here's what I'm thinking:
Primary index will be composite: (s:customer_id, n:timestamp), where the customer ID will be the same within a table. This will enable me to extract data based on a time range.
Secondary index will be a hash (s:unique_doc_id), whereby I will be able to delete items using this index.
Does this sound like the correct solution? Thank you in advance.
You can satisfy the requirements like this:
Your primary key will be h:customer_id and r:unique_id. This makes sure all the elements in the table have different keys.
You will also have an attribute for timestamp and will have a Local Secondary Index on it.
You will use the LSI for requirement 3 and the BatchWriteItem API call to do a batch delete for requirement 4.
This solution doesn't require (1) - all the customers can stay in the same table. (Heads up - there is a limit-before-contact-us of 256 tables per account.)
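A sketch of that design as hypothetical boto3 create_table parameters (the table name Documents and the index name ByTimestamp are placeholders, as is the billing mode; nothing is sent to AWS here):

```python
# Sketch of the suggested schema: hash key customer_id, range key unique_id,
# plus a Local Secondary Index on timestamp for time-ordered reads.
documents_schema = {
    "TableName": "Documents",  # hypothetical name
    "AttributeDefinitions": [
        {"AttributeName": "customer_id", "AttributeType": "S"},
        {"AttributeName": "unique_id", "AttributeType": "S"},
        {"AttributeName": "timestamp", "AttributeType": "N"},
    ],
    "KeySchema": [
        {"AttributeName": "customer_id", "KeyType": "HASH"},
        {"AttributeName": "unique_id", "KeyType": "RANGE"},   # requirement 4: delete by id
    ],
    "LocalSecondaryIndexes": [
        {
            "IndexName": "ByTimestamp",
            "KeySchema": [
                {"AttributeName": "customer_id", "KeyType": "HASH"},
                {"AttributeName": "timestamp", "KeyType": "RANGE"},  # requirement 3: range by time
            ],
            "Projection": {"ProjectionType": "ALL"},
        }
    ],
    "BillingMode": "PAY_PER_REQUEST",
}
```

A Query against ByTimestamp with `ScanIndexForward=True` and a `Limit` returns the first X documents in ascending timestamp order, and a delete uses the base table's (customer_id, unique_id) key.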
I'm trying to get to grips with indexes. Given the table:
Books
------
ID (PK)
Title
CategoryID (FK)
AuthorID (FK)
In my ASP.NET pages, I have webpages that will fetch the books by author or by category. Would I create an index on CategoryID Asc, AuthorID Asc if I wanted to improve retrieval times?
Have I understood it correctly? If I use multiple columns as above, is that called a clustered index, or is that something else?
You should create two indexes, one for the CategoryID and one for the AuthorID. Having both in the same index is not what you need if you look for one or the other; you'd need that if you were always querying for both at the same time (e.g. category and author).
A clustered index controls the physical order of the data. Usually, if you have an identity column, using it as clustered index (which the primary key by default is) is just fine.
A clustered index means the data is stored in the table and on disk in the order the index specifies. A consequence is that only one clustered index can exist per table.
The index CategoryID Asc, AuthorID Asc will make lookups on specific categories faster, and would be ideal for lookups on specific categories with specific authors. But it is not ideal for author lookups alone, because finding an author would mean searching through every category. In that case two separate indexes would be better.
The appropriate index depends on what the query does. If you have a query joining against both category and author, then you may have use for an index with both fields; otherwise you may have more use for two separate indexes.
A clustered index is an index that decides the storage order of the records in the table, and has nothing to do with the number of fields it contains. You should already have a clustered index on the primary key, so you can't create another clustered index for that table.
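A quick sketch of the two-separate-indexes advice (using SQLite for illustration; the index names are made up, the Books columns follow the question):

```python
import sqlite3

# The Books table from the question, with one single-column index per FK.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Books (ID INTEGER PRIMARY KEY, Title TEXT, CategoryID INT, AuthorID INT)")
con.execute("CREATE INDEX idx_category ON Books (CategoryID)")
con.execute("CREATE INDEX idx_author ON Books (AuthorID)")

def plan(sql):
    # Show which index (if any) the query planner picks.
    return " ".join(row[3] for row in con.execute("EXPLAIN QUERY PLAN " + sql))

# Each single-column index serves its own lookup, so both page types are fast:
print(plan("SELECT * FROM Books WHERE CategoryID = 1"))
print(plan("SELECT * FROM Books WHERE AuthorID = 7"))
```

With a single composite index on (CategoryID, AuthorID) instead, the second query could not use the index, because AuthorID is not its leading column.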
I've read about primary, unique, clustered indexes etc. But I need to understand it via an example.
The image below is the auto-generated aspnet_Users table captured from SQL Server Web Admin Panel.
Auto-generated ASP.NET Users Table http://eggshelf.com/capture.jpg
Taking this as a model, I will create a custom table called Companies, and let's say the fields are: ID, Name, ShortName, Address, City, Country. No values can be duplicated for the ID, Name, and ShortName fields.
What is your approach on creating indexes for this table? Which should be clustered or non-clustered? Are indexes below logical to you?
Index       Columns           Primary  Unique  Clustered  Ign.Dup.Keys  Unique Key
----------------------------------------------------------------------------------
PK_ID       ID                True     True    False      False         False
Comp_Index  Name, ShortName   False    True    True       False         False
regards..
Indexes are not about table structure, but about access patterns.
You need to see how you query the data in the table and create your indexes accordingly.
The rule of thumb is to consider defining indexes on fields that are commonly used in the WHERE clause.
See this blog post on the subject.
Update
You can only define a single clustered index on a table. This is normally done on the identity field of the table, as you have in your example.
Other indexes will be non-clustered.
Regarding the other (non-clustered) index - it makes sense only if your queries contain both fields in the WHERE clause, or if the ORDER BY has a primary sort on Name (as opposed to a primary sort on ShortName). The reason is that this is how the index is stored - first on Name, then on ShortName.
If, however, you will use ShortName as the primary sort, or without Name in the WHERE clause, you are better off with two indexes, one for each.
Go and get a quick overall understanding of SQL Server Indexes by reading Brad's Sure Guide to Indexes
Typically, having not performed any query analysis your starting point will be:
The Primary Key column can make a good candidate for the Clustered Index (often dependent on the data type used and key width).
You should create Non-Clustered indexes on Foreign Key Columns.
You should create Non-Clustered indexes on SARG columns from your queries.
Then take a look at these generic index tips.
Oded is right - indexes (clustered and non-clustered) are all about performance, and choosing them needs intimate knowledge of the types of queries.
e.g. If both ShortName and Name are both queried independently, you might want to have separate Non Clustered indexes for ShortName and Name.
If you need to enforce uniqueness, use UNIQUE INDEX (or add UNIQUE CONSTRAINTs to ShortName and Name). ID is already unique as it is the PK.
You can also change the Clustered Index (from its default of ID) if you know more about how data from your Companies table will be fetched (e.g. cluster on City if it is common practice to fetch all Companies in a City at once).
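A minimal sketch of that uniqueness setup (in SQLite rather than SQL Server, so the syntax differs slightly; the index names are made up, the column names follow the question):

```python
import sqlite3

# The Companies table with unique indexes enforcing the no-duplicates rule
# on Name and ShortName; ID is already unique as the primary key.
con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE Companies (
    ID INTEGER PRIMARY KEY,
    Name TEXT, ShortName TEXT, Address TEXT, City TEXT, Country TEXT)""")
con.execute("CREATE UNIQUE INDEX UX_Companies_Name ON Companies (Name)")
con.execute("CREATE UNIQUE INDEX UX_Companies_ShortName ON Companies (ShortName)")

con.execute("INSERT INTO Companies (Name, ShortName) VALUES ('Acme Corporation', 'ACME')")
try:
    # A second row reusing the same ShortName violates the unique index.
    con.execute("INSERT INTO Companies (Name, ShortName) VALUES ('Acme Clone', 'ACME')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(duplicate_rejected)  # True
```

Whether those unique indexes should also be the clustered index is the separate, access-pattern question discussed above; uniqueness enforcement works the same either way.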