SQL Server, is this correct index usage? - ASP.NET

I'm trying to get to grips with indexes. Given the table:
Books
------
ID (PK)
Title
CategoryID (FK)
AuthorID (FK)
In my ASP.NET application, I have pages that will fetch books by author, or by category. Would I create an index on CategoryID ASC, AuthorID ASC if I wanted to improve retrieval times?
Have I understood this correctly? If I use multiple columns as above, is that called a clustered index, or is that something else?

You should create two indexes, one on CategoryID and one on AuthorID. Having both in the same index is not what you need if you look up one or the other; a composite index is for queries that always filter on both at the same time (e.g. category and author together).
A clustered index controls the physical order of the data. If you have an identity column, using it as the clustered index (which the primary key is by default) is usually just fine.
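For example, a minimal sketch of the two single-column indexes (the index names here are my own choice):

CREATE NONCLUSTERED INDEX IX_Books_CategoryID ON Books (CategoryID);
CREATE NONCLUSTERED INDEX IX_Books_AuthorID ON Books (AuthorID);

Each index then serves its own page: the first helps WHERE CategoryID = @cat, the second helps WHERE AuthorID = @auth.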

A clustered index means the data in the table is stored on disk in the order the index specifies. A consequence of this is that only one clustered index can exist per table.
An index on CategoryID ASC, AuthorID ASC will make lookups on a specific category faster, and for lookups on a specific category combined with a specific author it would be ideal. It is not ideal for author lookups alone, though, because the author values are scattered across every category. For that access pattern, two separate indexes would be better.
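To illustrate, a sketch of the composite index and the queries it does and does not help (the index name is illustrative):

CREATE NONCLUSTERED INDEX IX_Books_Category_Author ON Books (CategoryID ASC, AuthorID ASC);

-- Can seek on the index (the leading column is present):
SELECT * FROM Books WHERE CategoryID = 3;
SELECT * FROM Books WHERE CategoryID = 3 AND AuthorID = 7;

-- Cannot seek on the index (AuthorID is not the leading column):
SELECT * FROM Books WHERE AuthorID = 7;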

The appropriate index depends on what the query does. If you have a query filtering on both category and author, then you may have use for an index with both fields; otherwise you may get more use out of two separate indexes.
A clustered index is an index that determines the storage order of the records in the table; it has nothing to do with the number of fields it contains. You should already have a clustered index on the primary key, so you can't create another clustered index for that table.

Related

How best to perform a query on primary partition key only, for a table which has both partition key and sort key?

OK, I have a table with a primary partition key (Employee ID) and sort key (Project ID). I want a list of all projects an employee works on, and also a list of all employees working on a project. The relationship is many-to-many. I have created the schema in AppSync (GraphQL), and AppSync created the required queries and mutations for the type (EmployeeProjects). ListEmployeeProjects takes a filter input with different attributes. My question is: when I do the two searches, on Employee ID or Project ID only, will each be a complete table scan? How efficient will that be? If it is a table scan, can I reduce the time complexity by creating indexes (GSI or LSI)? The end product will have a huge amount of data, so I cannot test the app with such data beforehand. My project works fine, but I am worried about problems that might arise later on with a lot of data. Can someone please help?
You don't need to (and should not) perform a Scan for this.
To get all of the projects an employee is working on, you just need to perform a Query on the base table, specifying employee ID as the partition key.
To get all of the employees on a project, you should create a GSI on the table. The partition key should be project ID and sort key should be employee ID. Then perform a Query on the GSI, using partition key of project ID.
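For illustration, a minimal boto3 sketch of the two queries (the table, index, and attribute names are my assumptions; AppSync resolvers issue equivalent DynamoDB operations):

import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource("dynamodb").Table("EmployeeProjects")  # assumed table name

# All projects for one employee: Query the base table on its partition key.
projects = table.query(
    KeyConditionExpression=Key("employeeId").eq("emp-001")
)["Items"]

# All employees on one project: Query the GSI whose partition key is project ID.
employees = table.query(
    IndexName="projectId-index",  # assumed GSI name
    KeyConditionExpression=Key("projectId").eq("proj-001")
)["Items"]

Neither call scans the table; both are key lookups whose cost scales with the items returned, not with the table size.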
In order to model this correctly, you will probably want three tables:
Employee Table
Project Table
Employee-Project reference table (i.e. just two attributes of employee ID and project ID)

Indexes for more than one column in DB

I am new to indexes and DB optimization. I know there is a simple index for one column:
CREATE INDEX idx ON table(col);
Presumably a B-tree will be created and search capabilities will be improved.
But what happens with a two-column index? And why is the order of definition important?
CREATE INDEX idx2 ON table(col1, col2);
Yes, a B-tree index will be created in most databases if you don't specify another index type. A composite index is useful when your queries filter on the combination of the composite columns, so that their combined selectivity can be exploited.
The order of the columns in a composite index is important: searching with exact values for all the fields included in the index gives minimal search time, but if you supply values for only some of the fields, the search can use the index only for a leading prefix of its columns.
I found the following example to aid your understanding:
In the phone book example with a composite index created on the columns (city, last_name, first_name), if we search by giving exact values for all three fields, search time is minimal. But if we provide values for city and first_name only, the search uses only the city field to retrieve all matched records; a sequential lookup then checks the match on first_name. So, to improve performance, one must ensure that the index is created in the order of the search columns.
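A hedged sketch of that example in SQL (table and index names assumed):

CREATE INDEX idx_phone_book ON phone_book (city, last_name, first_name);

-- All three columns supplied: the index narrows the search fully.
SELECT * FROM phone_book
WHERE city = 'Oslo' AND last_name = 'Hansen' AND first_name = 'Nina';

-- Only city and first_name supplied: the index helps with city alone;
-- first_name is then checked by scanning the matching rows.
SELECT * FROM phone_book
WHERE city = 'Oslo' AND first_name = 'Nina';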

How to design DynamoDB table to facilitate searching by time ranges, and deleting by unique ID

I'm new to DynamoDB - I already have an application where the data gets inserted, but I'm getting stuck on extracting the data.
Requirement:
There must be a unique table per customer
Insert documents into the table (each doc has a unique ID and a timestamp)
Get X number of documents based on timestamp (ordered ascending)
Delete individual documents based on unique ID
So far I have created a table with a composite key (S: id, N: timestamp). However, when I come to query it, I realise that since my id is unique and I can't do a wildcard search on it, I won't be able to extract a range of items...
So, how should I design my table to satisfy this scenario?
Edit: Here's what I'm thinking:
Primary index will be composite: (S: customer_id, N: timestamp), where the customer ID will be the same within a table. This will enable me to extract data based on a time range.
Secondary index will be a hash (S: unique_doc_id), whereby I will be able to delete items using this index.
Does this sound like the correct solution? Thank you in advance.
You can satisfy the requirements like this:
Your primary key will be customer_id (hash) and unique_id (range). This makes sure all the elements in the table have different keys.
You will also have an attribute for timestamp and will have a Local Secondary Index on it.
You will use the LSI for requirement 3, and the batchWrite API call to do batch deletes for requirement 4.
This solution doesn't require (1): all the customers can stay in the same table. (Heads up: there is a limit of 256 tables per account before you have to contact AWS.)
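A minimal boto3 sketch of that layout (all names are my assumptions):

import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource("dynamodb").Table("Documents")  # assumed table name

# Requirement 3: first X documents for a customer, ascending by timestamp,
# via the LSI (partition key customer_id, range key timestamp).
docs = table.query(
    IndexName="timestamp-index",           # assumed LSI name
    KeyConditionExpression=Key("customer_id").eq("cust-1"),
    ScanIndexForward=True,                 # ascending order
    Limit=10,
)["Items"]

# Requirement 4: delete documents by unique ID (batch_writer wraps batchWrite).
with table.batch_writer() as batch:
    batch.delete_item(Key={"customer_id": "cust-1", "unique_id": "doc-42"})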

Query that allows duplicate values in a table, but not for the same foreign key reference

I am using SQL Server 2005 with ASP.NET. I want server-side validation to restrict duplicate entries. I am using two tables, Companies and Branches. In the Branches table I maintain a foreign key, CompanyId. In Branches, the BranchName can be duplicated, but not for the same CompanyId.
Companies Table:
Columns: CompanyId (Primary Key), CompanyName
Branches Table :
Columns: BranchId(Primary Key), BranchName, CompanyId (Foreign Key).
CompanyId can repeat multiple times; it is a one-to-many relationship.
Which query do I use to allow duplicates, but not for the same CompanyId?
You want a constraint that enforces uniqueness across both the CompanyID and BranchName columns. This can either be the primary key for the table (as Tim has recommended), or a UNIQUE constraint:
ALTER TABLE Branches ADD
CONSTRAINT UQ_BranchNamesWithinCompanies UNIQUE (BranchName,CompanyID);
You can decide which order to put the columns in within the constraint, based on how frequently searches on the table use each of the two columns; you're actually creating an index on these columns, so you may as well use it to improve some query performance.
The above ordering was based on a guess that you might search for branch names without reference to a particular company. If you're always searching within a company and performing prefix searches (e.g. CompanyID = 21 AND BranchName LIKE 'Lon%'), then you'd want to reverse the order of the columns.
You could create a composite primary key from BranchName+CompanyId.
http://weblogs.sqlteam.com/jeffs/archive/2007/08/23/composite_primary_keys.aspx
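A hedged sketch of that alternative, assuming you drop the surrogate BranchId key (the constraint names are illustrative):

ALTER TABLE Branches DROP CONSTRAINT PK_Branches;  -- whatever your existing PK is named
ALTER TABLE Branches ADD CONSTRAINT PK_Branches PRIMARY KEY (CompanyId, BranchName);

Note this changes the table's key shape: any child tables would then reference the two-column key instead of a single BranchId.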

Setting indexed columns on a custom SQL table

I've read about primary, unique, clustered indexes etc. But I need to understand it via an example.
The image below is the auto-generated aspnet_Users table captured from SQL Server Web Admin Panel.
Auto-generated ASP.NET Users Table http://eggshelf.com/capture.jpg
Taking this as a model, I will create a custom table called Companies. Let's say the fields are: ID, Name, ShortName, Address, City, Country. No values can be duplicated in the ID, Name, and ShortName fields.
What is your approach on creating indexes for this table? Which should be clustered or non-clustered? Are indexes below logical to you?
Index       Columns         Primary  Unique  Clustered  Ign.Dup.Keys  Unique Key
--------------------------------------------------------------------------------
PK_ID       ID              True     True    False      False         False
Comp_Index  Name,ShortName  False    True    True       False         False
Regards.
Indexes are not about table structure, but about access patterns.
You need to see how you query the data in the table and create your indexes accordingly.
The rule of thumb is to consider defining indexes on fields that are commonly used in the WHERE clause.
See this blog post on the subject.
Update
You can only define a single clustered index on a table. This is normally done on the identity field of the table, as you have in your example.
Other indexes will be non-clustered.
As for the other (non-clustered) index: it makes sense if you intend to have only queries that contain both fields in the WHERE clause, with the ORDER BY having a primary sort on Name (as opposed to a primary sort on ShortName). The reason is that this is how the index will be stored: first on Name, then on ShortName.
If, however, you will use ShortName as the primary sort, or query it without Name in the WHERE clause, you are better off with two indexes, one for each column.
Go and get a quick overall understanding of SQL Server indexes by reading Brad's Sure Guide to Indexes.
Typically, if you have not yet performed any query analysis, your starting point will be:
The Primary Key column can make a good candidate for the Clustered Index (often dependent on the data type used and the key width).
You should create Non-Clustered indexes on Foreign Key Columns.
You should create Non-Clustered indexes on SARG columns from your queries.
Then take a look at these generic index tips.
Oded is right: indexes (clustered and non-clustered) are all about performance, and choosing them needs intimate knowledge of the types of queries you run.
E.g. if ShortName and Name are both queried independently, you might want separate non-clustered indexes for each.
If you need to enforce uniqueness, use UNIQUE INDEX (or add UNIQUE CONSTRAINTs to ShortName and Name). ID is already unique as it is the PK.
You can also change the clustered index (from its default of ID) if you know more about how data from your Companies table will be fetched (e.g. cluster on City if it is common practice to fetch all companies in a city at once).
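As a sketch against the Companies table described above (the index names are my own):

-- ID is already unique via the primary key (clustered by default).
CREATE UNIQUE NONCLUSTERED INDEX UX_Companies_Name ON Companies (Name);
CREATE UNIQUE NONCLUSTERED INDEX UX_Companies_ShortName ON Companies (ShortName);

-- Optional: cluster on City instead, if fetching all companies in a city
-- is the dominant pattern (requires declaring the PK as NONCLUSTERED first).
-- CREATE CLUSTERED INDEX CX_Companies_City ON Companies (City);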
