I have a user object with these attributes:
id (key), name and email.
I am trying to make sure these attributes are unique in the DB.
How do I prevent a put/create/save operation from succeeding if either of the non-key attributes, email or name, already exists in the DB?
I have a table, tblUsers, with one key attribute, the id.
I then have two global secondary indexes, each with a single key attribute: email for the first index and name for the second.
I am using the Microsoft .NET Identity framework, which itself checks for existing users with a given name or email before creating a user.
The problem I foresee is the delay between checking for existing users and creating a new one. There is no guarantee that multiple threads won't end up creating two users with the same name or email.
DynamoDB can enforce uniqueness only for hash/range table keys (not for global secondary index keys).
So in your case there are two options:
1) enforce it at the application level - if your problem is safety, then use locks (cache locks)
2) don't use DynamoDB (maybe it doesn't meet your requirements)
I am using Ehcache locally to check for duplicates, with one extra check: if Ehcache is empty (because the cache has been reset for some reason), I repopulate it by querying DynamoDB.
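For what it's worth, if the cache-based check ever worries you, DynamoDB's own conditional writes can carry the uniqueness guarantee instead of a lock. A rough sketch (assuming TransactWriteItems is available to you; the marker table tblUserUniques and its key layout are made up for illustration):

# Sketch only: one transaction that creates the user and "reserves" the name and email
# via marker items. tblUserUniques and its single "id" key are hypothetical.
import boto3
from botocore.exceptions import ClientError

dynamodb = boto3.client("dynamodb")

def create_user(user_id, name, email):
    try:
        dynamodb.transact_write_items(TransactItems=[
            {"Put": {
                "TableName": "tblUsers",
                "Item": {"id": {"S": user_id}, "name": {"S": name}, "email": {"S": email}},
                "ConditionExpression": "attribute_not_exists(id)",
            }},
            {"Put": {
                "TableName": "tblUserUniques",
                "Item": {"id": {"S": "EMAIL#" + email}},
                "ConditionExpression": "attribute_not_exists(id)",
            }},
            {"Put": {
                "TableName": "tblUserUniques",
                "Item": {"id": {"S": "NAME#" + name}},
                "ConditionExpression": "attribute_not_exists(id)",
            }},
        ])
    except ClientError as e:
        if e.response["Error"]["Code"] == "TransactionCanceledException":
            raise ValueError("name or email already taken") from e
        raise

If any of the three conditions fails, the whole transaction is rejected, so two threads can never both claim the same name or email.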
I have a DynamoDB table.
How should I filter entries in the table where access.role = "ADMIN"?
You would be best served by setting up a Global Secondary Index (GSI). You set the partition key equal to that attribute, and the sort key equal to some other attribute that you can guarantee will be unique. Then you use your SDK of choice or the Query option in the console, select the index, and query for partition_key = ADMIN.
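For example, with boto3 it might look like the snippet below. Note that GSI key attributes must be top-level scalar values, so a nested access.role would need to be copied into a top-level attribute first; the table and index names here are made up.

# Illustrative only: "Users" and "role-index" are placeholder names.
import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource("dynamodb").Table("Users")

resp = table.query(
    IndexName="role-index",                        # GSI whose partition key is the role attribute
    KeyConditionExpression=Key("role").eq("ADMIN"),
)
admins = resp["Items"]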
However, be aware: indexes are a complete replication of the table. Dynamo is very good at this and relatively fast at doing so, but there is still the possibility that your index will be out of sync with the actual data. If you are not making the call against the index very often, you are pretty much fine. If you are calling it very often, then you should restructure your table.
Dynamo is not SQL. When setting up a Dynamo schema you have to consider how you will access your data - your access patterns. You should design your data with your partition key as the data you will have when looking up (i.e. I will always have a user ID number) and your sort keys as the individual documents related to that PK (i.e. a user has a document that is his profile data, a document that is his profile picture URL, a document that is a list of his friends' user numbers, a document that is ... etc.).
Then you use indexes for things like your question, which you won't be doing very often.
I have a JSON document with two properties: deviceIdentity and version.
The partition key for my collection is deviceIdentity.
My JSON documents come with different versions, and I want to keep all versions of the document.
Like:
deviceIdentity1, v1
deviceIdentity1, v2
Two documents should be there.
The problem is that since my PK is deviceIdentity, it always updates the existing record, even though I have defined a unique key constraint on deviceIdentity, version.
Any pointers will be of help!
I believe you are confusing partition key with primary key.
Partition key determines how data is scaled horizontally. This should not be unique, as otherwise any read except an exact document lookup would require scanning all partitions, which would be inefficient. In your case deviceIdentity may be a suitable candidate - all versions of the same device would fall into the same partition.
Primary key is your document identity (the field id). As you already noticed, there can be only one document with a given id. The id field MUST be unique per document you want to store. In your case, you could use a combined value like "deviceIdentity1, v2" as the identity. Or, you could use a technical unique id, like a GUID.
Also, note that by Unique keys in Azure Cosmos DB:
By creating a unique key policy when a container is created, you ensure the uniqueness of one or more values per partition key.
Meaning that if your partition key is deviceIdentity, then you don't have to duplicate deviceIdentity in the unique constraint part. A constraint on /version would suffice to ensure that every single partition/device has at most one document per version.
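A rough sketch of both points with the azure-cosmos Python SDK (the account, database and container names are placeholders): a combined id plus a unique key on /version scoped to the /deviceIdentity partition.

# Sketch only: endpoint, key, database and container names are placeholders.
from azure.cosmos import CosmosClient, PartitionKey

client = CosmosClient(url="https://<account>.documents.azure.com:443/", credential="<key>")
db = client.create_database_if_not_exists("devices-db")

container = db.create_container_if_not_exists(
    id="device-versions",
    partition_key=PartitionKey(path="/deviceIdentity"),
    unique_key_policy={"uniqueKeys": [{"paths": ["/version"]}]},
)

# Each (deviceIdentity, version) pair becomes its own document.
container.create_item({
    "id": "deviceIdentity1:v2",        # unique per document: a combined value or a GUID
    "deviceIdentity": "deviceIdentity1",
    "version": "v2",
})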
Thanks for all the answers.
The problem was that we have an old legacy system where "id" was already a heavily used property, but it did not have unique values.
So whenever a document came in with a different version, it was updating the existing one, because "id" in Cosmos has a predefined meaning: the upsert of any arriving document is done on the unique id value, and in our case id is never unique.
The solution we found:
Whenever a document arrives, we process it in an Azure Function and swap the "id" column with the unique "deviceIdentity" value before saving it, since the structure of the JSON cannot be changed, as stated by our client. For reading these documents we have exposed an API that does the swapping again and sends the document back to the requesting client as it was.
What is the Amazon-recommended way of changing the schema of a large table in a production DynamoDB?
Imagine a hypothetical case where we have a table Person, with primary hash key SSN. This table may contain 10 million items.
Now the news comes that due to the critical volume of identity thefts, the government of this hypothetical country has introduced another personal identification: Unique Personal Identifier, or UPI.
We have to add a UPI column and change the schema of the Person table so that the primary hash key is now UPI. We want to support, for some time, both the current system, which uses SSN, and the new system, which uses UPI, so we need both columns to co-exist in the Person table.
What is the Amazon-recommended way to do this schema change?
There are a couple of approaches, but first you must understand that you cannot change the schema of an existing table. To get a different schema, you have to create a new table. You may be able to reuse your existing table, but the result would be the same as if you created a different table.
Lazy migration to the same table, without Streams. Every time you modify an entry in the Person table, create a new item in the Person table using UPI and not SSN as the value for the hash key, and delete the old item keyed at SSN. This assumes that UPI draws from a different range of values than SSN. If SSN looks like XXX-XX-XXXX, then as long as UPI has a different number of digits than SSN, then you will never have an overlap.
Lazy migration to the same table, using Streams. When Streams becomes generally available, you will be able to turn on a Stream for your Person table. Create a stream with the NEW_AND_OLD_IMAGES stream view type, and whenever you detect a change to an item that adds a UPI to an existing person in the Person table, have a Lambda function remove the person keyed at SSN and add a person with the same attributes keyed at UPI. This approach has race conditions that can be mitigated by adding an atomic counter/version attribute to the item and conditioning the DeleteItem call on the version attribute.
Preemptive (scripted) migration to a different table, using Streams. Run a script that scans your table and adds a unique UPI to each Person-item in the Person table. Create a stream on Person table with the NEW_AND_OLD_IMAGES stream view type and subscribe a lambda function to that stream that writes all the new Persons in a new Person_UPI table when the lambda function detects that a Person with a UPI was changed or when a Person had a UPI added. Mutations on the base table usually take hundreds of milliseconds to appear in a stream as stream records, so you can do a hot failover to the new Person_UPI table in your application. Reject requests for a few seconds, point your application to the Person_UPI table during that time, and re-enable requests.
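As a rough illustration of the third option, the Lambda subscribed to the stream could be as small as this (Python; the Person_UPI table name comes from the description above, everything else is an assumption):

# Sketch only: copies stream records that carry a UPI into the new table.
import boto3

dynamodb = boto3.client("dynamodb")

def handler(event, context):
    for record in event["Records"]:
        if record["eventName"] not in ("INSERT", "MODIFY"):
            continue
        new_image = record["dynamodb"]["NewImage"]    # already in DynamoDB JSON form
        if "UPI" in new_image:
            # Person_UPI is keyed on UPI; the other attributes carry over unchanged.
            dynamodb.put_item(TableName="Person_UPI", Item=new_image)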
DynamoDB Streams enable us to migrate tables without any downtime. I've done this to great effect, and the steps I've followed are:
Create a new table (let us call this NewTable), with the desired key structure, LSIs, GSIs.
Enable DynamoDB Streams on the original table
Associate a Lambda to the Stream, which pushes the record into NewTable. (This Lambda should trim off the migration flag in Step 5)
[Optional] Create a GSI on the original table to speed up scanning items. Ensure this GSI only has attributes: Primary Key, and Migrated (See Step 5).
Scan the GSI created in the previous step (or entire table) and use the following Filter:
FilterExpression = "attribute_not_exists(Migrated)"
Update each item in the table with a migration flag (i.e. "Migrated": { "S": "0" }), which sends it to DynamoDB Streams (using the UpdateItem API, to ensure no data loss occurs).
NOTE: You may want to increase write capacity units on the table during the updates.
The Lambda will pick up all items, trim off the Migrated flag and push them into NewTable.
Once all items have been migrated, repoint the code to the new table.
Remove the original table and the Lambda function once you're happy that all is good.
Following these steps should ensure you have no data loss and no downtime.
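For illustration, the scan-and-touch part of the middle steps might look roughly like this with boto3 (table and key attribute names are placeholders):

# Sketch only: touches every un-migrated item so it flows through the stream to the Lambda.
import boto3
from boto3.dynamodb.conditions import Attr

table = boto3.resource("dynamodb").Table("OriginalTable")   # placeholder name

start_key = None
while True:
    kwargs = {"FilterExpression": Attr("Migrated").not_exists()}
    if start_key:
        kwargs["ExclusiveStartKey"] = start_key
    page = table.scan(**kwargs)
    for item in page["Items"]:
        # The update sends the item through the stream; the Lambda strips "Migrated"
        # and writes the item into NewTable.
        table.update_item(
            Key={"pk": item["pk"]},                          # replace with your real key attribute(s)
            UpdateExpression="SET Migrated = :m",
            ExpressionAttributeValues={":m": "0"},
        )
    start_key = page.get("LastEvaluatedKey")
    if not start_key:
        break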
I've documented this on my blog, with code to assist:
https://www.abhayachauhan.com/2018/01/dynamodb-changing-table-schema/
I'm using a variant of Alexander's third approach. Again, you create a new table that will be updated as the old table is updated. The difference is that you use code in the existing service to write to both tables while you're transitioning instead of using a lambda function. You may have custom persistence code that you don't want to reproduce in a temporary lambda function and it's likely that you'll have to write the service code for this new table anyway. Depending on your architecture, you may even be able to switch to the new table without downtime.
However, the nice part about using a lambda function is that any load introduced by additional writes to the new table would be on the lambda, not the service.
If the changes involve changing the partition key, you can add a new GSI (global secondary index). Moreover, you can always add new columns/attributes to DynamoDB without needing to migrate tables.
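For example, adding a GSI to a live table is a single UpdateTable call; a boto3 sketch (names are placeholders, and ProvisionedThroughput would be needed if the table is not on-demand):

# Sketch only: adds a GSI keyed on the new attribute to an existing table.
import boto3

dynamodb = boto3.client("dynamodb")

dynamodb.update_table(
    TableName="Person",
    AttributeDefinitions=[{"AttributeName": "UPI", "AttributeType": "S"}],
    GlobalSecondaryIndexUpdates=[{
        "Create": {
            "IndexName": "UPI-index",
            "KeySchema": [{"AttributeName": "UPI", "KeyType": "HASH"}],
            "Projection": {"ProjectionType": "ALL"},
        }
    }],
)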
Let's say I have two schemas: HR and Orders.
[HR].Employees        [Orders].Entries
--------------        ----------------
Id_Employee   ---->   Employee
Fullname              Id_Entry
Birthday              Description
                      Amount
As you can see, what I'd want is to be able to establish a cross-database foreign key, but when I try this using a database link, I get:
-- From [Orders]
ALTER TABLE Entries
ADD CONSTRAINT FK_Entries_Employees FOREIGN KEY (Employee)
REFERENCES Employees@HR;
COMMIT;
ORA-02021: DDL operations are not allowed on a remote database
Is there a way around this? It's a legacy database, so I can't change the existing schema.
For the NHibernate crowd: I would then use this relation to map NHibernate's domain objects.
One option would be to create a materialized view of Employees on [Orders] and then use that as the parent for the foreign key.
Of course, that has some drawbacks. In particular,
-- you won't be able to do a complete refresh of the materialized view without disabling the foreign key, so it will have to be fast refreshed.
-- keys entered into EMPLOYEES won't be available to ENTRIES until the materialized view refreshes. If that's critical, you may want to set it to refresh on commit.
Other alternatives are to handle the key enforcement yourself through a trigger or through a post cleanup process. Or convince the DBA's that these schemas can reside on the same database instance.
As far as I know constraints and referential integrity are only supported within one single database.
If you need to cross the boundaries of the database, you'd have to be creative. Maybe write some triggers checking for data in the other database, or enforce these constraints at the application level. But then you may encounter the problem of transaction scope being limited to a single database.
I have two tables:

asset            employee
-----            --------
assetid (PK)     empid (PK)
empid (FK)

Now, I have a form to populate the asset table, but it can't because of the foreign key constraint.
What should I do?
Foreign keys are created for a good reason - to prevent orphan rows at a minimum. Create the corresponding parent and then use the appropriate value as the foreign key value on the child table.
You should think about this update as a series of SQL statements, not just one statement. You'll process the statements in order of dependency; see the example below.
Asset
PK AssetID
AssetName
FK EmployeeID
etc...
Employee
PK EmployeeID
EmployeeName
etc...
If you want to "add" a new asset, you'll first need to know which employee it will be assigned to. If it will be assigned to a new employee, you'll need to add them first.
Here is an example of adding an asset named 'BOOK' for a new employee named 'Zach'.
DECLARE @EmployeeFK AS INT;
INSERT INTO EMPLOYEE (EmployeeName) VALUES ('Zach');
SELECT @EmployeeFK = @@IDENTITY;
INSERT INTO ASSET (AssetName, EmployeeID) VALUES ('BOOK', @EmployeeFK);
The important thing to notice above is that we grab the new identity (aka EmployeeID) assigned to 'Zach', so we can use it when we add the new asset.
If I understand you correctly, are you trying to build the data graph locally before persisting it to the database? That is, create the parent and child records within the application and persist it all at once?
There are a couple of approaches to this. One approach people take is to use GUIDs as the unique identifiers for the data. That way you don't need to get the next ID from the database; you can just create the graph locally and persist the whole thing. There's been a debate on this approach between the software and database camps for a long time, because while it makes a lot of sense in many ways (hit the database less often, maintain relationships before persisting, uniquely identify data across systems), it turns out to be a significant resource hit on the database.
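A minimal sketch of the GUID idea (Python purely for illustration; the column names match the example above):

# Sketch only: the application generates the keys, so parent and child can be built together.
import uuid

employee_id = str(uuid.uuid4())          # no round-trip to the database for an identity value

employee = {"EmployeeID": employee_id, "EmployeeName": "Zach"}
asset = {"AssetID": str(uuid.uuid4()), "AssetName": "BOOK", "EmployeeID": employee_id}

# Persist in dependency order (employee first), ideally inside one transaction.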
Another approach is to use an ORM that will handle the persistence mapping for you. Something like NHibernate, for example. You would create your parent object and the child objects would just be properties on that. They wouldn't have any concept of foreign keys and IDs and such, they'd just be objects in code related by being set as properties on each other (such as a "blog post" object with a generic collection of "comment" objects, etc.). This graph would be handed off to the ORM which would use its knowledge of the mapping between the objects and the persistence to send it off to the database in the correct order, perhaps giving back the same object but with ID numbers populated.
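The same shape in code, using SQLAlchemy here in place of NHibernate (a sketch; the mapping details are illustrative):

# Sketch only: the ORM orders the inserts and fills in the foreign key for us.
from sqlalchemy import ForeignKey, String, create_engine
from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column, relationship, Session

class Base(DeclarativeBase):
    pass

class Employee(Base):
    __tablename__ = "employee"
    id: Mapped[int] = mapped_column(primary_key=True)
    name: Mapped[str] = mapped_column(String(100))
    assets: Mapped[list["Asset"]] = relationship(back_populates="employee")

class Asset(Base):
    __tablename__ = "asset"
    id: Mapped[int] = mapped_column(primary_key=True)
    name: Mapped[str] = mapped_column(String(100))
    employee_id: Mapped[int] = mapped_column(ForeignKey("employee.id"))
    employee: Mapped["Employee"] = relationship(back_populates="assets")

engine = create_engine("sqlite://")      # in-memory database just for the demo
Base.metadata.create_all(engine)

with Session(engine) as session:
    zach = Employee(name="Zach", assets=[Asset(name="BOOK")])
    session.add(zach)                    # child rows are cascaded from the parent object
    session.commit()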
Or is this not what you're asking? It's a little unclear, to be honest.