I've read about primary, unique, clustered indexes etc. But I need to understand it via an example.
The image below is the auto-generated aspnet_Users table captured from SQL Server Web Admin Panel.
Auto-generated ASP.NET Users Table http://eggshelf.com/capture.jpg
Taking this as a model, I will create a custom table called Companies; let's say the fields are ID, Name, ShortName, Address, City and Country. No duplicate values are allowed in the ID, Name and ShortName fields.
What is your approach to creating indexes for this table? Which should be clustered and which non-clustered? Do the indexes below look logical to you?
Index       Columns         Primary  Unique  Clustered  Ign.Dup.Keys  Unique Key
------------------------------------------------------------------------------------------
PK_ID       ID              True     True    False      False         False
Comp_Index  Name,ShortName  False    True    True       False         False
regards..
Indexes are not about table structure, but about access patterns.
You need to see how you query the data in the table and create your indexes accordingly.
The rule of thumb is to consider defining indexes on fields that are commonly used in the WHERE clause.
See this blog post on the subject.
Update
You can only define a single clustered index on a table. This is normally done on the identity field of the table, as you have in your example.
Other indexes will be non-clustered.
As for the other (non-clustered) index: a single index on both columns is appropriate if you only intend to run queries that contain both fields in the WHERE clause and whose ORDER BY sorts primarily on Name (as opposed to ShortName). The reason is that this is how the index is stored: first on Name, then on ShortName.
If, however, you will use ShortName as the primary sort, or without Name in the WHERE clause, you are better off with two indexes, one for each.
Go and get a quick overall understanding of SQL Server Indexes by reading Brad's Sure Guide to Indexes
Typically, having not performed any query analysis your starting point will be:
The Primary Key column can make a good candidate for the Clustered Index (often dependent on the data type used and the key width).
You should create Non-Clustered indexes on Foreign Key Columns.
You should create Non-Clustered indexes on SARG columns from your queries.
Then take a look at these generic index tips.
Oded is right: indices (clustered and non-clustered) are all about performance, and choosing them needs intimate knowledge of the types of queries you run.
E.g. if ShortName and Name are queried independently, you might want separate non-clustered indexes for ShortName and Name.
If you need to enforce uniqueness, use UNIQUE INDEX (or add UNIQUE CONSTRAINTs to ShortName and Name). ID is already unique as it is the PK.
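The uniqueness advice can be tried out in a small sketch. It uses Python's built-in `sqlite3` module as a stand-in for SQL Server (an assumption made only so the example runs anywhere; clustering is not modelled), and the index names and sample rows are hypothetical.

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Companies (
        ID INTEGER PRIMARY KEY,
        Name TEXT NOT NULL,
        ShortName TEXT NOT NULL,
        City TEXT
    )
""")
# One unique index per column, so each column must be unique on its own.
conn.execute("CREATE UNIQUE INDEX UX_Companies_Name ON Companies (Name)")
conn.execute("CREATE UNIQUE INDEX UX_Companies_ShortName ON Companies (ShortName)")


def try_insert(name, short_name):
    """Return True if the row was accepted, False if a unique index rejected it."""
    try:
        conn.execute(
            "INSERT INTO Companies (Name, ShortName) VALUES (?, ?)",
            (name, short_name),
        )
        return True
    except sqlite3.IntegrityError:
        return False


first = try_insert("Acme Corporation", "ACME")
same_name = try_insert("Acme Corporation", "ACO")  # duplicate Name only
same_short = try_insert("Acme Holdings", "ACME")   # duplicate ShortName only
print(first, same_name, same_short)
```

Note that a single unique index on (Name, ShortName) would accept both of the rejected rows, because only the pair would have to be unique.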
You can also change the Clustered Index (from its default of ID) if you know more about how data from your Companies table will be fetched (e.g. cluster on City if it is common practice to fetch all Companies in a City at once).
Related
I'm new to AWS DynamoDB and wanted to clarify something. Is it possible to query a table and filter based on a non-primary key attribute? My table looks like the following:
Store
Id: PrimaryKey
Name: simple string
Location: simple string
Now I want to query on the Name, but from what I know I have to give the key as well? Apart from that I can use a scan, but then I will be loading all the data.
From the docs:
The Query operation finds items based on primary key values. You can query any table or secondary index that has a composite primary key (a partition key and a sort key).
DynamoDB requires queries to always use the partition key.
In your case your options are:
create a Global Secondary Index that uses Name as a primary key
use a Scan + Filter if the table is relatively small, or if you expect the result set will include the majority of the records in the table
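As a rough sketch of the first option, the request for a Query against such an index might be built like this. The index name `Name-index` and the sample store name are hypothetical, and the function only builds the parameter dictionary; it does not contact AWS. `Name` is aliased as `#n` because it is a DynamoDB reserved word.

```python
def build_name_query(store_name, table="Store", index="Name-index"):
    """Build keyword arguments for a DynamoDB Query against a GSI keyed on Name.

    The table and index names are assumptions for illustration.
    """
    return {
        "TableName": table,
        "IndexName": index,
        # The key condition targets the index's partition key, not the table's.
        "KeyConditionExpression": "#n = :name",
        "ExpressionAttributeNames": {"#n": "Name"},
        "ExpressionAttributeValues": {":name": {"S": store_name}},
    }


params = build_name_query("Corner Shop")
print(params)
```

Assuming boto3 is installed and credentials are configured, these parameters could be passed to the low-level client as `boto3.client("dynamodb").query(**params)`.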
There are a few design principles that you can follow when using DynamoDB. If you are coming from a relational background, you have already run into the limitation that queries must go through primary key attributes.
Design your tables for querying and for separating hot and cold data.
Create Indexes for Querying from Non Key attributes (You have two options, Global Secondary Index which you can define at any time and Local Secondary Index which you need to specify at table creation time).
With the Global Secondary Index you can promote any NonKey attribute as the Partition Key for the Index and select another attribute for Sort Key for querying. For Local Secondary Index, you can promote any Non Key attribute as the Sort Key keeping the same Partition Key.
Using Indexes for query is important also to improve the efficiency in using provisioned throughput.
Although maintaining indexes consumes throughput, an index can also save read throughput: if you project only the attributes you need to read, the benefit can be large. Consider the following example.
Let's say you have a DynamoDB table with 40 KB items. If you read directly from the table to list 10 items, it consumes 100 read throughput units (10 units per item, since one unit reads up to 4 KB, multiplied by 10 items). If you define an index that projects only the attributes needed for the listing, at 4 KB per item, the same read consumes only 10 read throughput units (one unit per item), which makes a huge difference in cost.
With DynamoDB it's really important how you define indexes, to optimize not only for query capability but also for throughput.
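The arithmetic in the example can be checked with a few lines of Python. This is a simplified per-item model that follows the answer's own numbers (one unit per 4 KB, each item rounded up); the function name is made up for illustration, and details such as eventually consistent reads are ignored.

```python
import math

READ_UNIT_KB = 4  # one read unit covers up to 4 KB, as in the example above


def read_units(item_size_kb, item_count):
    """Read units for item_count items, rounding each item up to whole units."""
    return math.ceil(item_size_kb / READ_UNIT_KB) * item_count


table_cost = read_units(40, 10)  # full 40 KB items read from the table
index_cost = read_units(4, 10)   # 4 KB projections read from the index
print(table_cost, index_cost)    # 100 10
```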
You cannot query on a non-primary-key attribute in DynamoDB.
If you still want to do that, you can use a Scan, but Scan is a costly operation in DynamoDB. On a large table it will hurt performance and is not recommended, because it reads every item in the table and AWS charges you for every item it scans for that query.
There are two ways to achieve it:
Keep Store Id as the primary/partition key of the DynamoDB table and add Name or Location as the sort key (only one of them, since DynamoDB accepts only one attribute as the sort key by design).
Create Global Secondary Indexes for querying on the non-key attributes you need most frequently.
There are three projection options when creating a GSI in DynamoDB. In your case, select the INCLUDE option and add Name, Location and Store Id to the index.
KEYS_ONLY – Each item in the index consists only of the table partition key and sort key values, plus the index key values. The KEYS_ONLY option results in the smallest possible secondary index.
INCLUDE – In addition to the attributes described in KEYS_ONLY, the secondary index will include other non-key attributes that you specify.
ALL – The secondary index includes all of the attributes from the source table. Because all of the table data is duplicated in the index, an ALL projection results in the largest possible secondary index.
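For illustration, an INCLUDE projection might be declared as below. This is only a sketch of the index definition as a plain dictionary, in the shape used for the `GlobalSecondaryIndexes` entries of a table definition; the helper and the index name are hypothetical, and throughput settings (needed for provisioned tables) are left out.

```python
def gsi_spec(index_name, partition_key, extra_attributes):
    """Describe a GSI that projects its keys plus the listed non-key attributes."""
    return {
        "IndexName": index_name,
        "KeySchema": [{"AttributeName": partition_key, "KeyType": "HASH"}],
        "Projection": {
            "ProjectionType": "INCLUDE",
            # Table and index key attributes are always projected,
            # so only the additional attributes are listed here.
            "NonKeyAttributes": list(extra_attributes),
        },
    }


spec = gsi_spec("Name-index", "Name", ["Location"])
print(spec)
```

Because the table key (Id) and the index key (Name) are projected automatically, only Location has to be named in `NonKeyAttributes`.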
I am new to indexes and DB optimization. I know there is a simple index for one column:
CREATE index ON table(col)
Possibly a B-Tree will be created, and search capabilities will be improved.
But what happens with a two-column index? And why is the order of definition important?
CREATE index ON table(col1, col2)
Yes, a B-Tree index will be created in most databases if you don't specify another index type. A composite index is useful when queries filter on the combination of its columns and that combination is selective.
The order of the columns in a composite index is important: searching with exact values for all the indexed fields gives the minimal search time, but if only some values are provided, starting with the first field, the search uses only the leading field(s) to retrieve the matching records.
I found the following example to aid your understanding:
In the phone book example with a composite index created on the columns (city, last_name, first_name), if we search by giving exact values for all three fields, search time is minimal; but if we provide values for city and first_name only, the search uses only the city field to retrieve all matching records. Then a sequential lookup checks the match on first_name. So, to improve performance, one must ensure that the index is created in the order of the search columns.
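The phone book behaviour can be observed with a query planner. The sketch below uses SQLite through Python's `sqlite3` module purely as a convenient, runnable stand-in (an assumption; other databases report plans differently), with a hypothetical `phone_book` table and sample values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE phone_book (city TEXT, last_name TEXT, first_name TEXT, phone TEXT)"
)
conn.execute(
    "CREATE INDEX ix_phone_book ON phone_book (city, last_name, first_name)"
)


def plan(where, params):
    """Return SQLite's query plan text for a SELECT with the given WHERE clause."""
    rows = conn.execute(
        "EXPLAIN QUERY PLAN SELECT * FROM phone_book WHERE " + where, params
    )
    # The human-readable plan detail is the last column of each row.
    return " | ".join(row[-1] for row in rows)


all_three = plan(
    "city = ? AND last_name = ? AND first_name = ?", ("Oslo", "Hansen", "Kari")
)
city_first = plan("city = ? AND first_name = ?", ("Oslo", "Kari"))
first_only = plan("first_name = ?", ("Kari",))

for label, text in [
    ("all three", all_three),
    ("city + first_name", city_first),
    ("first_name only", first_only),
]:
    print(label, "->", text)
```

With all three columns the plan shows an index search on every column; with city and first_name it searches the index on city alone; with first_name only it falls back to scanning the table.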
I'm new to DynamoDB - I already have an application where the data gets inserted, but I'm getting stuck on extracting the data.
Requirement:
There must be a unique table per customer
Insert documents into the table (each doc has a unique ID and a timestamp)
Get X number of documents based on timestamp (ordered ascending)
Delete individual documents based on unique ID
So far I have created a table with composite key (S:id, N:timestamp). However when I come to query it, I realise that since my id is unique, because I can't do a wildcard search on ID I won't be able to extract a range of items...
So, how should I design my table to satisfy this scenario?
Edit: Here's what I'm thinking:
Primary index will be composite: (s:customer_id, n:timestamp) where customer ID will be the same within a table. This will enable me to extract data based on time range.
Secondary index will be hash (s: unique_doc_id) whereby I will be able to delete items using this index.
Does this sound like the correct solution? Thank you in advance.
You can satisfy the requirements like this:
Your primary key will be h:customer_id and r:unique_id. This makes sure all the elements in the table have different keys.
You will also have an attribute for timestamp and will have a Local Secondary Index on it.
You will use the LSI to do requirement 3 and batchWrite API call to do batch delete for requirement 4.
This solution doesn't require (1) - all the customers can stay in the same table (Heads up - There is a limit-before-contact-us of 256 tables per account)
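The design above might translate into request parameters along these lines. This is a sketch only: the table name `documents`, the index name `by-timestamp`, on-demand billing and the sample IDs are all assumptions, and the functions just build dictionaries in the shape the DynamoDB CreateTable, Query and DeleteItem operations accept, without calling AWS.

```python
def table_definition(table_name):
    """Table keyed on (customer_id, unique_id) with an LSI on timestamp."""
    return {
        "TableName": table_name,
        "AttributeDefinitions": [
            {"AttributeName": "customer_id", "AttributeType": "S"},
            {"AttributeName": "unique_id", "AttributeType": "S"},
            {"AttributeName": "timestamp", "AttributeType": "N"},
        ],
        "KeySchema": [
            {"AttributeName": "customer_id", "KeyType": "HASH"},
            {"AttributeName": "unique_id", "KeyType": "RANGE"},
        ],
        "LocalSecondaryIndexes": [
            {
                "IndexName": "by-timestamp",
                # An LSI keeps the table's partition key and swaps the sort key.
                "KeySchema": [
                    {"AttributeName": "customer_id", "KeyType": "HASH"},
                    {"AttributeName": "timestamp", "KeyType": "RANGE"},
                ],
                "Projection": {"ProjectionType": "ALL"},
            }
        ],
        "BillingMode": "PAY_PER_REQUEST",
    }


def oldest_documents_query(table_name, customer_id, limit):
    """Requirement 3: first `limit` documents in ascending timestamp order."""
    return {
        "TableName": table_name,
        "IndexName": "by-timestamp",
        "KeyConditionExpression": "customer_id = :c",
        "ExpressionAttributeValues": {":c": {"S": customer_id}},
        "ScanIndexForward": True,  # ascending by the index sort key
        "Limit": limit,
    }


def delete_document_request(table_name, customer_id, unique_id):
    """Requirement 4: delete one document by its full primary key."""
    return {
        "TableName": table_name,
        "Key": {"customer_id": {"S": customer_id}, "unique_id": {"S": unique_id}},
    }


definition = table_definition("documents")
query = oldest_documents_query("documents", "customer-1", 25)
delete = delete_document_request("documents", "customer-1", "doc-42")
print(query)
```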
I'm trying to get to grips with indexes. Given the table:
Books
------
ID (PK)
Title
CategoryID (FK)
AuthorID (FK)
In my ASP.NET pages I fetch books by author or by category. Would I create an index on CategoryID Asc, AuthorID Asc if I wanted to improve retrieval times?
Have I correctly understood it? If I use multiple columns as above, is that called a clustered index or is that something else?
You should create two indexes, one for the CategoryID and one for the AuthorID. Having both in the same index is not what you need if you look for one or the other; you'd need that if you were always querying for both at the same time (e.g. category and author).
A clustered index controls the physical order of the data. Usually, if you have an identity column, using it as clustered index (which the primary key by default is) is just fine.
A clustered index means the data is stored in the table and on disk (etc.) in the order the index specifies. A consequence of this is that only one clustered index can exist.
The index CategoryID Asc, AuthorID asc will make lookups on specific categories faster, and lookups on specific categories with specific authors would be ideal. But it is not ideal for author lookups alone because you will have to find authors for every category. In that case two separate indexes would be better.
The appropriate index depends on what the query does. If you have a query joining against both category and author, then you may have use for an index with both fields; otherwise you may have more use for two separate indexes.
A clustered index is an index that decides the storage order of the records in the table, and has nothing to do with the number of fields it contains. You should already have a clustered index on the primary key, so you can't create another clustered index for that table.
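The difference between one two-column index and two separate indexes can be seen in a query plan. The sketch below uses SQLite via Python's `sqlite3` as a runnable stand-in for SQL Server (an assumption; SQL Server's optimizer and plan output differ), and the index names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Books (ID INTEGER PRIMARY KEY, Title TEXT, "
    "CategoryID INTEGER, AuthorID INTEGER)"
)
# The two-column index proposed in the question.
conn.execute("CREATE INDEX IX_Books_Category_Author ON Books (CategoryID, AuthorID)")


def plan(column):
    """Return SQLite's plan text for looking up books by a single column."""
    rows = conn.execute(
        f"EXPLAIN QUERY PLAN SELECT * FROM Books WHERE {column} = ?", (1,)
    )
    return " | ".join(row[-1] for row in rows)


category = plan("CategoryID")     # leading column: the composite index helps
author_before = plan("AuthorID")  # second column alone: no index is used

# Add the separate index recommended in the answers, then look again.
conn.execute("CREATE INDEX IX_Books_Author ON Books (AuthorID)")
author_after = plan("AuthorID")

print(category)
print(author_before)
print(author_after)
```

Here the category lookup uses the composite index, while the author lookup only gets an index search once `IX_Books_Author` exists.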
I am using SQL Server 2005 with ASP.NET. I want server-side validation to restrict duplicate entries. I am using two tables, Companies and Branches. In the Branches table I maintain a foreign key, CompanyId. In Branches, the BranchName can be duplicated, but not within the same CompanyId.
Companies Table:
Columns: CompanyId (Primary Key), CompanyName
Branches Table :
Columns: BranchId(Primary Key), BranchName, CompanyId (Foreign Key).
CompanyId can repeat multiple times (a one-to-many relationship).
Which query should I use to allow duplicates, but not for the same CompanyId?
You want a constraint that enforces uniqueness against both the CompanyID and BranchName columns. This can either be the primary key for the table (as Tim has recommended), or a UNIQUE constraint:
ALTER TABLE Branches ADD
CONSTRAINT UQ_BranchNamesWithinCompanies UNIQUE (BranchName,CompanyID);
You can decide which order to put the columns within the constraint, based on how frequently searches are performed in the table based on the two columns. I.e. you're actually creating an index on these columns, so you may as well use it to improve some query performance.
The above ordering was based on a guess that you might search for branch names without reference to a particular company. If you're always searching within a company, and are performing prefix searches (e.g. CompanyID=21 and BranchName like 'Lon%'), then you'd want to reverse the order of the columns.
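The effect of the composite constraint can be checked quickly. The sketch below uses SQLite through Python's `sqlite3` instead of SQL Server 2005 (an assumption made so it runs without a server), declares the constraint inline rather than with ALTER TABLE, and uses made-up company and branch names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Companies (CompanyId INTEGER PRIMARY KEY, CompanyName TEXT)")
conn.execute("""
    CREATE TABLE Branches (
        BranchId INTEGER PRIMARY KEY,
        BranchName TEXT NOT NULL,
        CompanyId INTEGER NOT NULL REFERENCES Companies (CompanyId),
        CONSTRAINT UQ_BranchNamesWithinCompanies UNIQUE (BranchName, CompanyId)
    )
""")
conn.executemany("INSERT INTO Companies VALUES (?, ?)", [(1, "Alpha"), (2, "Beta")])


def add_branch(name, company_id):
    """Return True if the branch was inserted, False if the constraint rejected it."""
    try:
        conn.execute(
            "INSERT INTO Branches (BranchName, CompanyId) VALUES (?, ?)",
            (name, company_id),
        )
        return True
    except sqlite3.IntegrityError:
        return False


london_alpha = add_branch("London", 1)
london_beta = add_branch("London", 2)    # same name, different company
london_repeat = add_branch("London", 1)  # same name, same company
print(london_alpha, london_beta, london_repeat)
```

The same branch name is accepted for a second company but rejected when repeated within one company, which is the behaviour the question asks for.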
You could create a composite primary key from BranchName+CompanyId.
http://weblogs.sqlteam.com/jeffs/archive/2007/08/23/composite_primary_keys.aspx