I'm creating an ASP.NET site that uses LINQ to SQL to create, edit and delete cars and race results. Each car has its own number, which has been set as the primary key. Each result has a result number, and there is a many-to-one relationship between results and cars.
To create a new car object I use the Car DataContext, which automatically updates the database as required via the DataContext.SubmitChanges() method. However, it won't insert the primary key value I supply; instead the database chooses a new one by incrementing the largest current value.
Since each car's number is important, is there any way to choose the primary key value using this method? Or should I make the car ID separate and use a separate piece of code to make sure the ID is unique?
As you alluded to in your question, keeping the Car number separate from its Id is the way to go. The reason is that two cars could at some point have the same number, in addition to the fact that the database is choosing its own value for the Id anyway.
Just add another field to your Car table to record its number and you should be good to go.
See Update primary key value using entity framework for more information.
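For illustration, a LINQ to SQL entity along those lines might look like the sketch below (class and property names are my assumptions, not from the question): the database keeps generating the Id, and the car's racing number lives in its own column backed by a unique constraint.

using System.Data.Linq.Mapping;

// Hypothetical mapping: the surrogate Id stays database-generated (IDENTITY),
// while the car's number is an ordinary column protected by a UNIQUE
// constraint declared on the table itself.
[Table(Name = "Cars")]
public class Car
{
    [Column(IsPrimaryKey = true, IsDbGenerated = true)]
    public int Id { get; set; }

    [Column]   // enforce uniqueness with a UNIQUE index on this column in the database
    public int CarNumber { get; set; }

    [Column]
    public string Driver { get; set; }
}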
In every Cosmos DB repository example I've seen, the id/row key has been generated like this: {partitionKey}:{Guid.NewGuid()}. I'm working on a web API where the user won't necessarily have any way of knowing what this random GUID is. But they will know the EmployeeId, ProjectId etc. of the respective object, so I'm wondering if there are any issues with using e.g. EmployeeId as both the partition key and the id?
There's nothing technically wrong with setting the id and the partition key to the same value; however, you will then have just one document per partition, and that's bad design IMHO, as all your read queries will be cross-partition queries (e.g. listing all employees).
One approach could be to set the partition key as the type of the entity (Employee, Project etc.) and then set the id as the unique identifier of the entity (employee id, project id etc.).
To be honest, if you know the partition key AND the item id, you can do a point read, which is the fastest way to fetch an item.
We also used to take the approach of using random GUIDs for all item ids, but this means you will always need to know both the id and the partition key. Sometimes a more functional key as the item id makes more sense, so give it some thought!
And remember, an item id is not globally unique; uniqueness is only enforced within a partition key.
So you could have two items with the same item id and different partition keys.
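As a rough illustration of that layout (partition key = entity type, id = the natural key), a point read with the Cosmos DB .NET SDK might look like this; the database, container, connection string, and property names are assumptions:

using System;
using Microsoft.Azure.Cosmos;

// Point read: both the id and the partition key are known up front,
// so no cross-partition query is needed.
CosmosClient client = new CosmosClient("<cosmos-connection-string>");
Container container = client.GetContainer("mydb", "items");

ItemResponse<Employee> response =
    await container.ReadItemAsync<Employee>("E-12345", new PartitionKey("Employee"));
Console.WriteLine(response.Resource.Name);

public class Employee
{
    public string id { get; set; }           // the EmployeeId itself, e.g. "E-12345"
    public string entityType { get; set; }   // "Employee" - used as the partition key
    public string Name { get; set; }
}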
Ok, I have a table with a primary partition key (Employee ID) and a sort key (Project ID). Now I want a list of all projects an employee works on, and also a list of all employees working on a project. The relationship is many-to-many. I have created the schema in AppSync (GraphQL), and AppSync created the required queries and mutations for the type (EmployeeProjects). The ListEmployeeProjects query takes a filter input with different attributes. My question is: when I search on Employee ID or Project ID only, will it be a complete table scan? How efficient will that be? If it is a table scan, can I reduce the time complexity by creating indexes (GSI or LSI)? The end product will have a huge amount of data, so I cannot test the app with such data beforehand. My project works fine, but I am worried about problems that might arise later on with a lot of data. Can someone please help.
You don't need to (and should not) perform a Scan for this.
To get all of the projects an employee is working on, you just need to perform a Query on the base table, specifying employee ID as the partition key.
To get all of the employees on a project, you should create a GSI on the table. The partition key should be project ID and sort key should be employee ID. Then perform a Query on the GSI, using partition key of project ID.
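A rough sketch of both queries with the AWS SDK for .NET follows; the table, index, and attribute names are assumptions:

using System;
using System.Collections.Generic;
using Amazon.DynamoDBv2;
using Amazon.DynamoDBv2.Model;

var client = new AmazonDynamoDBClient();

// All projects for one employee: Query the base table on its partition key.
var projects = await client.QueryAsync(new QueryRequest
{
    TableName = "EmployeeProjects",
    KeyConditionExpression = "EmployeeId = :e",
    ExpressionAttributeValues = new Dictionary<string, AttributeValue>
    {
        [":e"] = new AttributeValue { S = "EMP-1" }
    }
});

// All employees for one project: Query a GSI keyed by ProjectId.
var employees = await client.QueryAsync(new QueryRequest
{
    TableName = "EmployeeProjects",
    IndexName = "ProjectId-EmployeeId-index",   // the GSI described above
    KeyConditionExpression = "ProjectId = :p",
    ExpressionAttributeValues = new Dictionary<string, AttributeValue>
    {
        [":p"] = new AttributeValue { S = "PROJ-9" }
    }
});

Console.WriteLine($"{projects.Items.Count} projects, {employees.Items.Count} employees");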
In order to model this correctly you will probably want three tables:
Employee Table
Project Table
Employee-Project reference table (i.e. just two attributes of employee ID and project ID)
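If you go the reference-table route, a sketch of its definition could look like this (table, index, and attribute names are assumptions); the inverted GSI gives you the project-to-employees lookup described in the previous answer:

using System.Collections.Generic;
using Amazon.DynamoDBv2;
using Amazon.DynamoDBv2.Model;

var client = new AmazonDynamoDBClient();

// Employee-Project reference table: EmployeeId + ProjectId as the key,
// plus a GSI with the key order flipped so it can be queried from either side.
await client.CreateTableAsync(new CreateTableRequest
{
    TableName = "EmployeeProjects",
    BillingMode = BillingMode.PAY_PER_REQUEST,
    AttributeDefinitions = new List<AttributeDefinition>
    {
        new AttributeDefinition("EmployeeId", ScalarAttributeType.S),
        new AttributeDefinition("ProjectId", ScalarAttributeType.S)
    },
    KeySchema = new List<KeySchemaElement>
    {
        new KeySchemaElement("EmployeeId", KeyType.HASH),
        new KeySchemaElement("ProjectId", KeyType.RANGE)
    },
    GlobalSecondaryIndexes = new List<GlobalSecondaryIndex>
    {
        new GlobalSecondaryIndex
        {
            IndexName = "ProjectId-EmployeeId-index",
            KeySchema = new List<KeySchemaElement>
            {
                new KeySchemaElement("ProjectId", KeyType.HASH),
                new KeySchemaElement("EmployeeId", KeyType.RANGE)
            },
            Projection = new Projection { ProjectionType = ProjectionType.KEYS_ONLY }
        }
    }
});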
What is the Amazon-recommended way of changing the schema of a large table in a production DynamoDB?
Imagine a hypothetical case where we have a table Person, with primary hash key SSN. This table may contain 10 million items.
Now the news comes that due to the critical volume of identity thefts, the government of this hypothetical country has introduced another personal identification: Unique Personal Identifier, or UPI.
We have to add a UPI column and change the schema of the Person table so that the primary hash key is now UPI. We want to support both the current system, which uses SSN, and the new system, which uses UPI, for some time, so we need both of these columns to co-exist in the Person table.
What is the Amazon-recommended way to do this schema change?
There are a couple of approaches, but first you must understand that you cannot change the schema of an existing table. To get a different schema, you have to create a new table. You may be able to reuse your existing table, but the result would be the same as if you created a different table.
Lazy migration to the same table, without Streams. Every time you modify an entry in the Person table, create a new item keyed at UPI instead of SSN, and delete the old item keyed at SSN. This assumes that UPI draws from a different range of values than SSN: if SSN looks like XXX-XX-XXXX, then as long as UPI has a different number of digits, you will never have an overlap.
Lazy migration to the same table, using Streams. When Streams becomes generally available, you will be able to turn on a stream for your Person table. Create the stream with the NEW_AND_OLD_IMAGES stream view type and subscribe a Lambda function that, whenever it detects a change adding a UPI to an existing person, removes the item keyed at SSN and adds an item with the same attributes keyed at UPI. This approach has race conditions that can be mitigated by adding a version attribute (an atomic counter) to the item and conditioning the DeleteItem call on that version attribute.
Preemptive (scripted) migration to a different table, using Streams. Run a script that scans your table and adds a unique UPI to each Person item. Create a stream on the Person table with the NEW_AND_OLD_IMAGES stream view type and subscribe a Lambda function to it that writes each Person into a new Person_UPI table whenever it detects that a Person with a UPI was changed or that a UPI was added. Mutations on the base table usually take hundreds of milliseconds to appear as stream records, so you can do a hot failover to the new Person_UPI table in your application: reject requests for a few seconds, point your application at the Person_UPI table during that time, and re-enable requests.
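As a sketch of the conditional delete mentioned in the second approach (table, key, and attribute names are assumptions): the item keyed at SSN is only removed if its version attribute still matches the value that was read, so a concurrent writer makes the delete fail instead of losing data.

using System;
using System.Collections.Generic;
using Amazon.DynamoDBv2;
using Amazon.DynamoDBv2.Model;

var client = new AmazonDynamoDBClient();

try
{
    await client.DeleteItemAsync(new DeleteItemRequest
    {
        TableName = "Person",
        Key = new Dictionary<string, AttributeValue>
        {
            ["SSN"] = new AttributeValue { S = "123-45-6789" }   // placeholder key value
        },
        // Only delete if nobody bumped the version since the item was read.
        ConditionExpression = "Version = :expected",
        ExpressionAttributeValues = new Dictionary<string, AttributeValue>
        {
            [":expected"] = new AttributeValue { N = "4" }
        }
    });
}
catch (ConditionalCheckFailedException)
{
    Console.WriteLine("Item changed concurrently; retry or skip.");
}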
DynamoDB Streams enables us to migrate tables without any downtime. I've done this to great effect, and the steps I've followed are:
Create a new table (let us call this NewTable), with the desired key structure, LSIs, GSIs.
Enable DynamoDB Streams on the original table
Associate a Lambda to the stream, which pushes the record into NewTable. (This Lambda should trim off the migration flag added in Step 5.)
[Optional] Create a GSI on the original table to speed up scanning items. Ensure this GSI only has attributes: Primary Key, and Migrated (See Step 5).
Scan the GSI created in the previous step (or entire table) and use the following Filter:
FilterExpression = "attribute_not_exists(Migrated)"
Update each item in the table with a migration flag (i.e. "Migrated": { "S": "0" }), which sends it through the DynamoDB stream (use the UpdateItem API to ensure no data loss occurs); a rough sketch of this scan-and-flag step follows these steps.
NOTE: You may want to increase write capacity units on the table during the updates.
The Lambda will pick up all items, trim off the Migrated flag, and push them into NewTable (sketched further below).
Once all items have been migrated, repoint the code to the new table.
Remove the original table and the Lambda function once you're happy that all is good.
Following these steps should ensure you have no data loss and no downtime.
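As an illustration of the scan-and-flag step above (table, key, and attribute names are assumptions), the driver could look roughly like this:

using System.Collections.Generic;
using Amazon.DynamoDBv2;
using Amazon.DynamoDBv2.Model;

var client = new AmazonDynamoDBClient();
Dictionary<string, AttributeValue> startKey = null;

do
{
    // Only touch items that haven't been flagged yet (the optional GSI from the
    // earlier step could be scanned instead via IndexName to reduce the data read).
    var page = await client.ScanAsync(new ScanRequest
    {
        TableName = "OriginalTable",
        FilterExpression = "attribute_not_exists(Migrated)",
        ExclusiveStartKey = startKey
    });

    foreach (var item in page.Items)
    {
        // UpdateItem (not PutItem) so no attributes are lost; the write
        // pushes the full item through the stream to the Lambda.
        await client.UpdateItemAsync(new UpdateItemRequest
        {
            TableName = "OriginalTable",
            Key = new Dictionary<string, AttributeValue> { ["Id"] = item["Id"] },
            UpdateExpression = "SET Migrated = :zero",
            ExpressionAttributeValues = new Dictionary<string, AttributeValue>
            {
                [":zero"] = new AttributeValue { S = "0" }
            }
        });
    }

    startKey = page.LastEvaluatedKey;
} while (startKey != null && startKey.Count > 0);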
I've documented this on my blog, with code to assist:
https://www.abhayachauhan.com/2018/01/dynamodb-changing-table-schema/
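And a minimal sketch of the Lambda side of those steps (the stream-event plumbing is omitted; NewTable and the attribute name are assumptions):

using System.Collections.Generic;
using System.Threading.Tasks;
using Amazon.DynamoDBv2;
using Amazon.DynamoDBv2.Model;

public static class Migrator
{
    // Called with the NEW image of each stream record the Lambda receives.
    public static async Task CopyToNewTable(
        IAmazonDynamoDB client, Dictionary<string, AttributeValue> newImage)
    {
        // Trim off the migration flag so it never reaches the new table.
        newImage.Remove("Migrated");

        await client.PutItemAsync(new PutItemRequest
        {
            TableName = "NewTable",
            Item = newImage
        });
    }
}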
I'm using a variant of Alexander's third approach. Again, you create a new table that will be updated as the old table is updated. The difference is that you use code in the existing service to write to both tables while you're transitioning, instead of using a Lambda function. You may have custom persistence code that you don't want to reproduce in a temporary Lambda function, and it's likely that you'll have to write the service code for this new table anyway. Depending on your architecture, you may even be able to switch to the new table without downtime.
However, the nice part about using a Lambda function is that any load introduced by the additional writes to the new table would fall on the Lambda, not on the service.
If the changes involve changing the partition key, you can add a new GSI (global secondary index). Moreover, you can always add new columns/attributes to DynamoDB without needing to migrate tables.
I have two tables:
Asset
PK AssetID
FK EmpID
Employee
PK EmpID
Now I have a form to populate the Asset table, but the insert fails because of the foreign key constraint. What should I do?
Thanks, Tk
Foreign keys are created for a good reason - to prevent orphan rows at a minimum. Create the corresponding parent and then use the appropriate value as the foreign key value on the child table.
You should think about this update as a series of SQL statements, not just one statement. You'll process the statements in order of dependency; see the example below.
Asset
PK AssetID
AssetName
FK EmployeeID
etc...
Employee
PK EmployeeID
EmployeeName
etc...
If you want to "add" a new asset, you'll first need to know which employee it will be assigned to. If it will be assigned to a new employee, you'll need to add them first.
Here is an example of adding an asset named 'BOOK' for a new employee named 'Zach'.
DECLARE @EmployeeFK AS INT;
INSERT INTO Employee (EmployeeName) VALUES ('Zach');
SELECT @EmployeeFK = SCOPE_IDENTITY();  -- SCOPE_IDENTITY() is safer than @@IDENTITY here
INSERT INTO Asset (AssetName, EmployeeID) VALUES ('BOOK', @EmployeeFK);
The important thing to notice above is that we grab the new identity value (the EmployeeID) assigned to 'Zach', so we can use it when we add the new asset.
If I understand you correctly, are you trying to build the data graph locally before persisting it to the database? That is, create the parent and child records within the application and persist them all at once?
There are a couple of approaches to this. One approach people take is to use GUIDs as the unique identifiers for the data. That way you don't need to get the next ID from the database; you can just create the graph locally and persist the whole thing. There has been a long-running debate on this approach between software developers and database folks, because while it makes a lot of sense in many ways (hit the database less often, maintain relationships before persisting, uniquely identify data across systems), it turns out to be a significant resource hit on the database.
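A tiny sketch of the client-generated-GUID idea (class and property names are made up): both rows get their keys before anything is persisted, so the parent/child relationship can be wired up entirely in memory and saved in one unit of work, parent first.

using System;

class Employee
{
    public Guid Id { get; set; }
    public string Name { get; set; }
}

class Asset
{
    public Guid Id { get; set; }
    public string Name { get; set; }
    public Guid EmployeeId { get; set; }   // FK value known before any database round trip
}

class Demo
{
    static void Main()
    {
        // Keys are generated locally, so the whole graph can be built
        // up front and then persisted in dependency order.
        var employee = new Employee { Id = Guid.NewGuid(), Name = "Zach" };
        var asset = new Asset { Id = Guid.NewGuid(), Name = "BOOK", EmployeeId = employee.Id };
        Console.WriteLine($"{asset.Name} -> {employee.Name}");
    }
}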
Another approach is to use an ORM that will handle the persistence mapping for you. Something like NHibernate, for example. You would create your parent object and the child objects would just be properties on that. They wouldn't have any concept of foreign keys and IDs and such, they'd just be objects in code related by being set as properties on each other (such as a "blog post" object with a generic collection of "comment" objects, etc.). This graph would be handed off to the ORM which would use its knowledge of the mapping between the objects and the persistence to send it off to the database in the correct order, perhaps giving back the same object but with ID numbers populated.
Or is this not what you're asking? It's a little unclear, to be honest.
I am using ASP.NET and the Entity Framework to make a website. I currently have a map table for a many-to-many relationship between... let's say users and soccer teams. So:
Users
Teams
UserTeams
Part 1: Is it best practice to use a composite key for the primary key of the map table? In other words:
UserTeams table
PK UserId
PK TeamId
PreferenceId
Part 2: The caveat is that I also have another table. Let's call it "UserTeamPredictions" that stores the user's predictions for a given team for each year. That table has a foreign key that points back to the map table. So it looks something like this:
UserTeamPredictions table
PK UserTeamPredictionId
FK UserId
FK TeamId
Prediction
PredictionYear
This seems to work fine in the Entity Framework; however, I have had some problems when referencing these relationships in third-party controls that I use, like Telerik's. Even though it might not be the ideal data setup, should I change the table structure/relationships so that it's easier to work with in code, with data binding and other things?
The change would be to add an integer primary key to the UserTeams map table, allowing the UserTeamPredictions table to reference the key directly, instead of through the composite key as it currently does:
UserTeams table
PK UserTeamId
FK UserId
FK TeamId
PreferenceId
UserTeamPredictions table
PK UserTeamPredictionId
FK UserTeamId
Prediction
PredictionYear
What do you think!?
You should change it. Search Stack Overflow for discussions on "natural keys" - it's almost universally agreed that surrogate keys are better, especially when using entity generation. Natural or composite keys do not play well with Entity Framework-style DAL layers in general. For example, LightSpeed and SubSonic both require that you have a single unique column as a PK... LightSpeed in its current version even goes so far as to insist that the column is called "Id", although that will be changing in the next version.
I would choose not to. I would use a surrogate key and put a unique index on the UserId and TeamId columns. I get really sick of composite keys when there are more than two columns, and rather than have a mix of composite and surrogate keys, I choose to go with all-surrogate, meaningless auto-increment keys wherever possible.
This has the bonus of giving you good performance on joins, and means you always know the key for a given table (table name + ID) without having to reference the schema. Some ORM tools only work properly with single-column keys rather than composite keys, too.
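In a modern Entity Framework (Core) model, the surrogate-key-plus-unique-index layout from the last answer could be expressed roughly like this; the class names follow the question, but the fluent API shown is EF Core's, not the EF version the question was written against.

using Microsoft.EntityFrameworkCore;

public class UserTeam
{
    public int UserTeamId { get; set; }   // surrogate primary key
    public int UserId { get; set; }
    public int TeamId { get; set; }
    public int PreferenceId { get; set; }
}

public class UserTeamPrediction
{
    public int UserTeamPredictionId { get; set; }
    public int UserTeamId { get; set; }   // single-column FK back to the map table
    public string Prediction { get; set; }
    public int PredictionYear { get; set; }
}

public class LeagueContext : DbContext
{
    public DbSet<UserTeam> UserTeams { get; set; }
    public DbSet<UserTeamPrediction> UserTeamPredictions { get; set; }

    protected override void OnModelCreating(ModelBuilder modelBuilder)
    {
        // The unique index preserves the "one row per user per team" rule
        // that the composite key used to enforce.
        modelBuilder.Entity<UserTeam>()
            .HasIndex(ut => new { ut.UserId, ut.TeamId })
            .IsUnique();
    }
}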