I'm designing a health care form database. We use a variety of forms, and the user id and the year are the unique identifiers. Currently I have one table per form, each with a user id and a year as the primary key, e.g.: table health_form_1, pk (user_id, year), various form-specific columns; table health_form_2, pk (user_id, year), various form-specific columns.
I feel weird looking at a set of tables that all have the same primary key. Is there a better way to do this?
Database tables shouldn't map to your forms. Rather, the tables should map to real-world entities that your system is modeling.
For instance, if you are working on a medical billing system, then you might have tables like:
Patient
Clinician
Invoice
etc...
Each of these tables would have its own primary key.
The problem with that approach is that if the business requirement ever changes (e.g., the user can create the same form more than once in one year), you are in the position of having to change what your primary key is, which can be especially problematic when it is also used as a foreign key elsewhere.
Instead, I would create a surrogate autoincrement primary key for each table, and create a unique index or constraint on the UserID and Year columns.
Additionally, many ORMs work much better with a single PK, and it can make your queries more succinct.
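As a rough sketch of the surrogate-key approach described above, here is what one form table could look like. It uses Python's built-in sqlite3 module purely as a convenient way to run it; the `notes` column is a made-up placeholder for the form-specific columns, and your real database's autoincrement syntax will differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE health_form_1 (
        id      INTEGER PRIMARY KEY,   -- surrogate key, assigned by SQLite
        user_id INTEGER NOT NULL,
        year    INTEGER NOT NULL,
        notes   TEXT,                  -- hypothetical form-specific column
        UNIQUE (user_id, year)         -- the business rule, kept as a constraint
    )
""")
conn.execute("INSERT INTO health_form_1 (user_id, year, notes) VALUES (1, 2023, 'first')")
conn.execute("INSERT INTO health_form_1 (user_id, year, notes) VALUES (1, 2024, 'second')")

# A second form for the same user and year violates the unique constraint.
try:
    conn.execute("INSERT INTO health_form_1 (user_id, year, notes) VALUES (1, 2023, 'dup')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

form_ids = [row[0] for row in conn.execute("SELECT id FROM health_form_1 ORDER BY id")]
print(duplicate_rejected, form_ids)
```

If the one-form-per-year rule is ever relaxed, only the UNIQUE constraint has to change; the `id` column that other tables reference stays the same.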
I have a table with a partition key (Employee ID) and a sort key (Project ID). I want a list of all projects an employee works on, and also a list of all employees working on a project. The relationship is many-to-many. I have created a schema in AppSync (GraphQL), and AppSync created the required queries and mutations for the type (EmployeeProjects). ListEmployeeProjects takes a filter input with different attributes. My question is: when I search on Employee ID only or on Project ID only, will it be a complete table scan? How efficient will that be? If it is a table scan, can I reduce the time complexity by creating indexes (GSI or LSI)? The end product will have a huge amount of data, so I cannot test the app with such data beforehand. My project works fine, but I am worried about the problems that might arise later with a lot of data. Can someone please help?
You don't need to (and should not) perform a Scan for this.
To get all of the projects an employee is working on, you just need to perform a Query on the base table, specifying employee ID as the partition key.
To get all of the employees on a project, you should create a GSI on the table. The partition key should be project ID and sort key should be employee ID. Then perform a Query on the GSI, using partition key of project ID.
In order to model this correctly you will probably want three tables
Employee Table
Project Table
Employee-Project reference table (i.e. just two attributes of employee ID and project ID)
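For what it's worth, here is a relational analogue of that three-table layout, sketched with Python's sqlite3 module. This is not DynamoDB or AppSync code; the table, column, and function names are assumptions for illustration. The extra index on (project_id, employee_id) plays roughly the role a GSI would, so both lookups avoid a full scan.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (employee_id INTEGER PRIMARY KEY, name  TEXT);
    CREATE TABLE project  (project_id  INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE employee_project (
        employee_id INTEGER NOT NULL REFERENCES employee(employee_id),
        project_id  INTEGER NOT NULL REFERENCES project(project_id),
        PRIMARY KEY (employee_id, project_id)
    );
    -- Reverse index so lookups by project are as cheap as lookups by employee.
    CREATE INDEX idx_project_employee ON employee_project (project_id, employee_id);
""")
conn.executemany("INSERT INTO employee VALUES (?, ?)", [(1, "Ann"), (2, "Raj")])
conn.executemany("INSERT INTO project VALUES (?, ?)", [(10, "Billing"), (11, "Portal")])
conn.executemany("INSERT INTO employee_project VALUES (?, ?)", [(1, 10), (1, 11), (2, 10)])

def projects_of(employee_id):
    """All project IDs for one employee (served by the primary key)."""
    rows = conn.execute(
        "SELECT project_id FROM employee_project WHERE employee_id = ? ORDER BY project_id",
        (employee_id,),
    )
    return [r[0] for r in rows]

def employees_on(project_id):
    """All employee IDs on one project (served by the reverse index)."""
    rows = conn.execute(
        "SELECT employee_id FROM employee_project WHERE project_id = ? ORDER BY employee_id",
        (project_id,),
    )
    return [r[0] for r in rows]

print(projects_of(1), employees_on(10))
```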
I am having some confusion on how to determine a primary key in regards to an ERD model.
Say for example,
I created the following table to keep track of employees salary.
Sal_His(Emp#, Salary, Reason, Raise-Date)
How would I determine which key would become the primary key?
A primary key can also be a combination of multiple fields.
In your case, Emp# and Raise-Date together might form the primary key.
EDIT At the logical level, those two fields form a compound primary key. That primary key uniquely identifies each row of the table (unless an employee can have multiple raises per day) and is irreducible, because neither of those fields alone is sufficient to uniquely identify your records.
When you get to the physical level, you might want to introduce a surrogate primary key (an ID) and create a unique index on the two columns (RaiseDate, Emp#).
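To make the compound key concrete, here is a minimal sketch using Python's sqlite3 module. The column names (`emp_no`, `raise_date`) are my own spelling of Emp# and Raise-Date, and the sample rows are invented. It also shows the caveat mentioned above: a second raise for the same employee on the same day is rejected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE sal_his (
        emp_no     INTEGER NOT NULL,
        raise_date TEXT    NOT NULL,   -- ISO date string, e.g. '2021-01-01'
        salary     REAL,
        reason     TEXT,
        PRIMARY KEY (emp_no, raise_date)   -- compound primary key
    )
""")
rows = [
    (7, "2020-01-01", 50000, "hire"),
    (7, "2021-01-01", 55000, "merit"),   # same employee, different date: fine
    (8, "2021-01-01", 48000, "hire"),    # same date, different employee: fine
]
conn.executemany(
    "INSERT INTO sal_his (emp_no, raise_date, salary, reason) VALUES (?, ?, ?, ?)", rows
)

# Same employee AND same date collides with the compound key.
try:
    conn.execute(
        "INSERT INTO sal_his (emp_no, raise_date, salary, reason) "
        "VALUES (7, '2021-01-01', 56000, 'correction')"
    )
    second_raise_same_day_allowed = True
except sqlite3.IntegrityError:
    second_raise_same_day_allowed = False

row_count = conn.execute("SELECT COUNT(*) FROM sal_his").fetchone()[0]
print(second_raise_same_day_allowed, row_count)
```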
You can find more information about the benefits and drawbacks of this approach here.
I only use an integer primary key ID for its "auto-increment" function.
What if I don't need auto-increment? Do I still need a primary key if I don't care about the uniqueness of records?
Example: Let's compare this table:
create table if not exists `table1`
(
name text primary key,
tel text,
address text
);
with this:
create table if not exists `table2`
(
name text,
tel text,
address text
);
table1 has a primary key and table2 doesn't. Will anything bad happen with table2?
I don't need the record to be unique.
SQLite is a relational database system. So it's all about relations. You build relations between tables on keys.
You can have tables without a primary key; it is not necessary for a table to have a primary key. But you will almost always want a primary key to show what makes a record unique in that table and to build relations.
In your example, what would it mean to have two identical records? They would mean the same person, no? Then how would you count how many persons named Anna are in the database? If you count five, how many of them are unique, and how many are mere duplicates? Such queries can be done properly, but they get overly complicated because of the missing primary key. And how would you build relations, say the cars a person drives? You would have a car table, and then how would you link it to the persons table in your example?
There are cases when you want a table without a primary key. These are usually log tables and the like. They are rare. Whenever you are creating a table without a primary key, ask yourself why this is the case. Maybe you are about to build something messy ;-)
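Here is a small sketch of the "how many Annas" problem, run through Python's sqlite3 module against the key-less table2 from the question. The phone numbers and addresses are invented sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table2 (name TEXT, tel TEXT, address TEXT)")

anna = ("Anna", "555-0100", "1 Main St")
conn.executemany(
    "INSERT INTO table2 VALUES (?, ?, ?)",
    [anna, anna, ("Anna", "555-0199", "9 Oak Ave")],  # the first two rows are identical
)

# A plain count cannot tell a duplicate row from a second person.
total_annas = conn.execute(
    "SELECT COUNT(*) FROM table2 WHERE name = 'Anna'"
).fetchone()[0]

# Without a key, every column has to take part in deciding what "the same" means.
distinct_annas = conn.execute(
    "SELECT COUNT(*) FROM "
    "(SELECT DISTINCT name, tel, address FROM table2 WHERE name = 'Anna')"
).fetchone()[0]

print(total_annas, distinct_annas)
```

Even the second number is only a guess at the number of people: two different Annas sharing a phone and address would still collapse into one row.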
You get auto-incrementing primary keys only when a column is declared as INTEGER PRIMARY KEY; other data types result in plain primary keys.
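A quick way to see the difference, sketched with Python's sqlite3 module (the table names here are made up for the demo):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE with_int_pk  (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE with_text_pk (name TEXT PRIMARY KEY, tel TEXT)")

# No id supplied: SQLite assigns one because the column is INTEGER PRIMARY KEY.
conn.executemany("INSERT INTO with_int_pk (name) VALUES (?)", [("Anna",), ("Bob",)])
auto_ids = [r[0] for r in conn.execute("SELECT id FROM with_int_pk ORDER BY id")]

# A TEXT primary key is a plain key: nothing is generated, but uniqueness is enforced.
conn.execute("INSERT INTO with_text_pk (name, tel) VALUES ('Anna', '555-0100')")
try:
    conn.execute("INSERT INTO with_text_pk (name, tel) VALUES ('Anna', '555-0199')")
    text_duplicate_rejected = False
except sqlite3.IntegrityError:
    text_duplicate_rejected = True

print(auto_ids, text_duplicate_rejected)
```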
You are not required to declare a PRIMARY KEY.
But even if you do not do this, there will be some column(s) used to identify and look up records.
The PRIMARY KEY declaration helps to document this, enforces uniqueness, and optimizes lookups through the implicit index.
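To illustrate the implicit index, this sketch asks SQLite for its query plan on the two tables from the question, using Python's sqlite3 module. The exact wording of the plan text varies between SQLite versions, so the code only looks for the index name and the word SCAN rather than matching whole strings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (name TEXT PRIMARY KEY, tel TEXT, address TEXT)")
conn.execute("CREATE TABLE table2 (name TEXT, tel TEXT, address TEXT)")

def plan_for(table):
    """Return the plan text SQLite reports for a lookup by name."""
    rows = conn.execute(
        f"EXPLAIN QUERY PLAN SELECT * FROM {table} WHERE name = 'Anna'"
    ).fetchall()
    # The human-readable detail is the last column of each plan row.
    return " | ".join(row[-1] for row in rows)

plan_with_pk = plan_for("table1")     # lookup goes through the automatic index
plan_without_pk = plan_for("table2")  # no index, so the whole table is scanned

print(plan_with_pk)
print(plan_without_pk)
```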
I am using SQL Server 2005 with ASP.NET. I want server-side validation to prevent duplicate entries. I have two tables, Companies and Branches. The Branches table has a foreign key, CompanyId. A BranchName may be duplicated across companies, but not within a particular CompanyId.
Companies Table:
Columns: CompanyId (Primary Key), CompanyName
Branches Table :
Columns: BranchId(Primary Key), BranchName, CompanyId (Foreign Key).
A CompanyId can repeat multiple times in Branches (a one-to-many relationship).
Which query should I use to allow duplicate branch names, but not for the same CompanyId?
You want a constraint that enforces uniqueness across both the CompanyID and BranchName columns. This can either be the primary key for the table (as Tim has recommended), or a UNIQUE constraint:
ALTER TABLE Branches ADD
CONSTRAINT UQ_BranchNamesWithinCompanies UNIQUE (BranchName,CompanyID);
You can decide which order to put the columns within the constraint, based on how frequently searches are performed in the table based on the two columns. I.e. you're actually creating an index on these columns, so you may as well use it to improve some query performance.
The above ordering was based on a guess that you might search for branch names without reference to a particular company. If you're always searching within a company, and are performing prefix searches (e.g. CompanyID=21 and BranchName like 'Lon%'), then you'd want to reverse the order of the columns.
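Here is a runnable sketch of how that constraint behaves. It uses Python's sqlite3 module as a stand-in, so the DDL is SQLite's rather than SQL Server 2005's, and the company and branch names are invented; the uniqueness rule itself works the same way.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Companies (CompanyId INTEGER PRIMARY KEY, CompanyName TEXT);
    CREATE TABLE Branches (
        BranchId   INTEGER PRIMARY KEY,
        BranchName TEXT    NOT NULL,
        CompanyId  INTEGER NOT NULL REFERENCES Companies(CompanyId),
        CONSTRAINT UQ_BranchNamesWithinCompanies UNIQUE (BranchName, CompanyId)
    );
""")
conn.executemany("INSERT INTO Companies VALUES (?, ?)", [(1, "Acme"), (2, "Globex")])

def add_branch(name, company_id):
    """Try to insert a branch; return True if the constraint allowed it."""
    try:
        conn.execute(
            "INSERT INTO Branches (BranchName, CompanyId) VALUES (?, ?)",
            (name, company_id),
        )
        return True
    except sqlite3.IntegrityError:
        return False

results = [
    add_branch("London", 1),  # first London branch for company 1
    add_branch("London", 2),  # same name, different company: allowed
    add_branch("London", 1),  # same name, same company: rejected
]
print(results)
```

On the application side you would catch the constraint violation (as `add_branch` does here) and turn it into a validation message, rather than checking for duplicates with a separate SELECT first.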
You could create a composite primary key from BranchName+CompanyId.
http://weblogs.sqlteam.com/jeffs/archive/2007/08/23/composite_primary_keys.aspx
I am using ASP.NET and the Entity Framework to make a website. I currently have a map table for a many to many relationship between... let's say users and soccer teams. So:
Users
Teams
UserTeams
Part 1: Is it best practice to use a composite key for the primary key of the map table? In other words:
UserTeams table
PK UserId
PK TeamId
PreferenceId
Part 2: The caveat is that I also have another table. Let's call it "UserTeamPredictions" that stores the user's predictions for a given team for each year. That table has a foreign key that points back to the map table. So it looks something like this:
UserTeamPredictions table
PK UserTeamPredictionId
FK UserId
FK TeamId
Prediction
PredictionYear
This seems to work fine in the Entity Framework; however, I have had some problems when referencing relationships in third-party controls that I use, like Telerik. Even though it might not be the ideal data setup, should I change the table structure/relationships so that it's easier to work with in code, with data binding and other things?
The change would be to add an integer primary key to the UserTeams map table, allowing the UserTeamPredictions table to reference the key directly, instead of through the composite key as it currently does:
UserTeams table
PK UserTeamId
FK UserId
FK TeamId
PreferenceId
UserTeamPredictions table
PK UserTeamPredictionId
FK UserTeamId
Prediction
PredictionYear
What do you think!?
You should change it. Search Stack Overflow for discussions on "natural keys" - it's almost universally agreed that surrogate keys are better, especially when using entity generation. Natural or composite keys do not play well with Entity Framework style data access layers in general. For example, Lightspeed and Subsonic both require that you have a single unique column as a PK... Lightspeed in its current version even goes so far as to insist that your column is called "Id", although that will be changing in the next version.
I would choose not to. I would use a surrogate key and put a unique index on the UserId and TeamId columns. I get really sick of composite keys when there are more than two, and rather than have a mix of composite and surrogate keys, I choose to go with all surrogate, meaningless autoincrement keys wherever possible.
This has the bonus of giving you good performance on joins, and means you always know the key for a given table (table name + ID), without having to reference the schema. Some ORM tools only work properly with single column rather than composite keys, too.
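A rough sketch of the surrogate-key version of the two tables from the question, again using Python's sqlite3 module rather than SQL Server and the Entity Framework. The column types and sample values are assumptions; the point is that the child table needs only one foreign key column, while the unique index still prevents duplicate user/team pairs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
conn.executescript("""
    CREATE TABLE UserTeams (
        UserTeamId   INTEGER PRIMARY KEY,   -- surrogate key
        UserId       INTEGER NOT NULL,
        TeamId       INTEGER NOT NULL,
        PreferenceId INTEGER,
        UNIQUE (UserId, TeamId)             -- the old composite key, kept as a rule
    );
    CREATE TABLE UserTeamPredictions (
        UserTeamPredictionId INTEGER PRIMARY KEY,
        UserTeamId     INTEGER NOT NULL REFERENCES UserTeams(UserTeamId),
        Prediction     TEXT,
        PredictionYear INTEGER
    );
""")

user_team_id = conn.execute(
    "INSERT INTO UserTeams (UserId, TeamId, PreferenceId) VALUES (1, 5, 3)"
).lastrowid
conn.execute(
    "INSERT INTO UserTeamPredictions (UserTeamId, Prediction, PredictionYear) VALUES (?, ?, ?)",
    (user_team_id, "wins the league", 2010),
)

# The join needs a single column on each side.
joined = conn.execute("""
    SELECT ut.UserId, ut.TeamId, p.Prediction, p.PredictionYear
    FROM UserTeamPredictions AS p
    JOIN UserTeams AS ut ON ut.UserTeamId = p.UserTeamId
""").fetchall()

# A prediction pointing at a non-existent UserTeams row is refused.
try:
    conn.execute(
        "INSERT INTO UserTeamPredictions (UserTeamId, Prediction, PredictionYear) "
        "VALUES (999, 'orphan', 2010)"
    )
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

print(joined, orphan_rejected)
```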