I have a question about the NODE_NOTARY_COMMIT_LOG table used to record notary transactions. My first (bad) assumption was that TRANSACTION_ID was unique in this table, but it appears that this is not the case: I found two table entries with the same TRANSACTION_ID and CONSUMING_TRANSACTION_ID, but the CONSUMING_INPUT_INDEX was incremented, and the OUTPUT_INDEX was also different (the opposite of the consuming input index). Could someone explain to me how this works and how to determine uniqueness in the table? Thanks in advance :)
The NODE_NOTARY_COMMIT_LOG table is effectively a map of state reference to consuming transaction id. The column pair (TRANSACTION_ID, OUTPUT_INDEX) identifies the state: it's the id of the transaction that issued the state, and the state's position in outputs.
CONSUMING_TRANSACTION_ID and CONSUMING_INPUT_INDEX specify which transaction consumed the state, and the state's position in the inputs.
Note that since Corda 3.0 the CONSUMING_INPUT_INDEX is no longer recorded.
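For a concrete way to see this, here is a minimal sketch (my own illustration, not from the Corda documentation) that checks the uniqueness rule directly against the H2 table using ServiceHub.jdbcSession() from a flow or service; the column names are the ones from the question. For a healthy notary it should return no rows, because the commit log keys uniqueness on the (TRANSACTION_ID, OUTPUT_INDEX) pair, i.e. on the state reference, not on TRANSACTION_ID alone:

import net.corda.core.node.ServiceHub

// Sketch only: lists any state reference (issuing tx id + output index) that
// appears more than once in the notary commit log. Duplicate TRANSACTION_IDs
// are normal; duplicate (transaction_id, output_index) pairs are not.
fun findDuplicateStateRefs(serviceHub: ServiceHub): List<Pair<String, Int>> {
    val sql = """
        SELECT transaction_id, output_index, COUNT(*) AS hits
        FROM node_notary_commit_log
        GROUP BY transaction_id, output_index
        HAVING COUNT(*) > 1
    """.trimIndent()
    val duplicates = mutableListOf<Pair<String, Int>>()
    serviceHub.jdbcSession().prepareStatement(sql).use { stmt ->
        stmt.executeQuery().use { rs ->
            while (rs.next()) {
                duplicates.add(rs.getString("transaction_id") to rs.getInt("output_index"))
            }
        }
    }
    return duplicates
}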
Related
I'm using Corda 4.0. While using linear states, I found out that it is possible to create multiple linear states with the same linear id (external id and UUID).
As per my logic for non-fungible digital assets, the linear id is supposed to be unique, something like a unique token in blockchain implementations.
In the database, in the vault_linear_states table (same as vault_states and vault_fungible_states), I can see that the primary key constraint is defined as (output_index, transaction_id).
If a state is changed multiple times, there will be multiple entries for the same linear id.
The output_index column, as I see in the DB, is always 0.
Question:
1) What is the purpose of output_index? I didn't find any appropriate information.
2) How do I work properly with linear states in terms of uniqueness? Should I programmatically select and check before insert, or are there other ways to deal with that?
1) output_index is the position of the state in the outputs of the transaction that produced it, as one transaction can have multiple output states. The reason it's always 0 in your case is that each transaction has a single output; if there were more outputs they would be indexed 0, 1, 2 and so on.
2) First create the linear state and return its unique id. After that, use the same id to query the vault, use the returned state as the input state in the transaction builder, and add the updated one as the output state, creating a chain.
more info can be found here https://docs.corda.net/key-concepts-transactions.html
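As a rough illustration of point 2 (a sketch only; IOUState and IOUContract here are hypothetical stand-ins for your own linear state and contract, not code from the question's CorDapp), you fetch the current unconsumed state by its linear id and use it as the input of the update transaction:

import com.example.states.IOUState                     // hypothetical linear state
import net.corda.core.contracts.StateAndRef
import net.corda.core.contracts.UniqueIdentifier
import net.corda.core.node.ServiceHub
import net.corda.core.node.services.Vault
import net.corda.core.node.services.queryBy
import net.corda.core.node.services.vault.QueryCriteria

// Returns the single unconsumed state for this linear id; call it from a flow
// with the id returned by the issue flow, then add it as the transaction input
// and the modified copy as the output.
fun currentState(serviceHub: ServiceHub, linearId: UniqueIdentifier): StateAndRef<IOUState> {
    val criteria = QueryCriteria.LinearStateQueryCriteria(
        linearId = listOf(linearId),
        status = Vault.StateStatus.UNCONSUMED
    )
    return serviceHub.vaultService.queryBy<IOUState>(criteria).states.single()
}

// txBuilder.addInputState(currentState(serviceHub, linearId))
// txBuilder.addOutputState(updatedState, IOUContract.ID)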
What exactly is Room's @ForeignKey used for?
I know that it is used for linking two tables, so that whenever some update happens to the parent it updates children as well. For example,
onDelete = ForeignKey.CASCADE
I suppose it's nothing but my given definition (second paragraph), right?
The reason I am asking this question is that in OrmLite, for example, when you define foreign = true, you can join tables and fill the foreign field with data. You cannot do this with Room's @ForeignKey.
Here is a detailed explanation of what foreign does in OrmLite.
Am I right?
FKs (foreign keys) are a relational database concept. A FK says a table's subrows appear elsewhere uniquely. Equivalently, a FK says entities that participate in a relation(ship)/association participate uniquely in another. Those statements are equivalent because in a relational database a table represents entities/values that participate together per a relation(ship)/association--hence "the Relational Model" & "the Entity-Relationship Model".
The FK graph can be used for convenience/shorthand: default join conditions; preventing updates to invalid states; cascading updates; getting a unique value associated with an entity in the other relation(ship)/association; simultaneously setting values in one relation(ship)/association and the other one. FKs are wrongly called "relationships" and don't have to be known to query. They must be known to ask for a single value associated with an entity, but we can always just ask for a set of values, whether or not it might only ever have one element.
FKs, CKs (candidate keys), PKs (primary keys) & superkeys (unique column/field sets) are special cases of constraints, which are just conditions that are always true in every database state & (equivalently) business situation. They are determined by the relation(ship)s/associations & the valid business situations that can arise. When we tell the DBMS about them it can prevent updates to states that must be invalid because they violate them.
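To tie this back to Room: @ForeignKey only declares the constraint (and optional cascades) on the child table; it does not fill parent objects for you the way OrmLite's foreign = true does. For that you still write the join yourself, e.g. via @Relation or a @Query. A minimal sketch with hypothetical Parent/Child entities:

import androidx.room.ColumnInfo
import androidx.room.Embedded
import androidx.room.Entity
import androidx.room.ForeignKey
import androidx.room.Index
import androidx.room.PrimaryKey
import androidx.room.Relation

@Entity(tableName = "parent")
data class Parent(@PrimaryKey val id: Long, val name: String)

@Entity(
    tableName = "child",
    foreignKeys = [ForeignKey(
        entity = Parent::class,
        parentColumns = ["id"],
        childColumns = ["parentId"],
        onDelete = ForeignKey.CASCADE      // deleting a parent also deletes its children
    )],
    indices = [Index("parentId")]
)
data class Child(
    @PrimaryKey val id: Long,
    @ColumnInfo(name = "parentId") val parentId: Long,
    val label: String
)

// The FK above only enforces integrity; loading the related rows is a separate
// mechanism, e.g. a @Relation returned from a @Transaction @Query DAO method.
data class ParentWithChildren(
    @Embedded val parent: Parent,
    @Relation(parentColumn = "id", entityColumn = "parentId")
    val children: List<Child>
)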
What is the difference between an entity relationship model and a relational model?
Corda saves data into the vault. The vault is nothing but a database; by default it has support for the H2 database, and Corda stores states in the H2 tables as BLOBs. I performed a scan on the tables NODE_TRANSACTIONS, VAULT_LINEAR_STATES and VAULT_STATES. I ran the IOU example and performed several transactions. I truncated NODE_TRANSACTIONS and VAULT_LINEAR_STATES and tested the UI, but the UI was still showing the states' data. The data is coming from VAULT_STATES, but how it is shown is still a question, as there was no BLOB found in VAULT_STATES. My question is where exactly the state is referred to in the DB.
The NODE_TRANSACTIONS table maps each transaction ID to a blob of the transaction. This blob includes the transaction's output states, as well as the other components of the transaction.
The VAULT_STATES table references each state by the ID of the transaction that created it and its index in the outputs of that transaction. This (ID, output index) pair is then used to retrieve the state object from the corresponding blob in the NODE_TRANSACTIONS table.
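In API terms that (transaction id, output index) pair is a StateRef, and the node resolves it to the actual state by deserialising the transaction blob. A rough sketch (my own illustration, from a flow or service context):

import net.corda.core.contracts.ContractState
import net.corda.core.contracts.StateAndRef
import net.corda.core.contracts.StateRef
import net.corda.core.crypto.SecureHash
import net.corda.core.node.ServiceHub

// VAULT_STATES only stores (transaction_id, output_index); the state object
// itself is deserialised from the transaction blob kept in NODE_TRANSACTIONS.
fun resolveState(serviceHub: ServiceHub, txId: String, outputIndex: Int): StateAndRef<ContractState> {
    val ref = StateRef(SecureHash.parse(txId), outputIndex)
    return serviceHub.toStateAndRef(ref)
}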
I have a JSON document with two properties deviceIdentity, version.
Partition Key for my collection is deviceIdentity.
My JSON documents come with different versions, and I want to keep all versions of a document.
Like:
deviceIdentity1, v1
deviceIdentity1, v2
Two documents should be there.
Problem is since my PK is deviceIdentity, it is always updating the existing record even though I have defined a unique key constraint on deviceIdentity, version.
Any pointers will be of help!
I believe you are confusing partition key with primary key.
The partition key determines how data is scaled horizontally. This should not be unique, as otherwise any read except an exact document lookup would require scanning all partitions, which would be inefficient. In your case deviceIdentity may be a suitable candidate: all versions of the same device would fall into the same partition.
The primary key is your document identity (the field id). As you already noticed, there can be only one document with a given id. The id field MUST be unique per document you want to store. In your case, you could use a combination value like "deviceIdentity1, v2" as the identity. Or, you could use a technical unique id, like a GUID.
Also, note the following from Unique keys in Azure Cosmos DB:
By creating a unique key policy when a container is created, you ensure the uniqueness of one or more values per partition key.
Meaning that if your partition key is deviceIdentity, you don't have to duplicate deviceIdentity in the unique constraint. A constraint on /version alone would suffice to ensure that every single partition/device has at most one document per version.
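A rough sketch of that setup with the Azure Cosmos DB Java SDK v4 (the database, container and field names are my own placeholders, not from the question): partition key /deviceIdentity, a unique key on /version within each partition, and an id the application makes unique itself, e.g. a combination value or a GUID:

import com.azure.cosmos.CosmosClientBuilder
import com.azure.cosmos.models.CosmosContainerProperties
import com.azure.cosmos.models.UniqueKey
import com.azure.cosmos.models.UniqueKeyPolicy
import java.util.UUID

data class DeviceDoc(val id: String, val deviceIdentity: String, val version: String)

fun main() {
    val client = CosmosClientBuilder()
        .endpoint("<your-endpoint>")           // placeholders
        .key("<your-key>")
        .buildClient()
    client.createDatabaseIfNotExists("devices-db")
    val database = client.getDatabase("devices-db")

    // Partition key /deviceIdentity; /version must be unique within each partition.
    val properties = CosmosContainerProperties("deviceVersions", "/deviceIdentity")
    properties.setUniqueKeyPolicy(UniqueKeyPolicy().setUniqueKeys(listOf(UniqueKey(listOf("/version")))))
    database.createContainerIfNotExists(properties)

    val container = database.getContainer("deviceVersions")
    // id still has to be unique per document, so derive it (or use a GUID):
    container.createItem(DeviceDoc("deviceIdentity1_v1", "deviceIdentity1", "v1"))
    container.createItem(DeviceDoc(UUID.randomUUID().toString(), "deviceIdentity1", "v2"))
}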
Thanks for all the answers.
The problem was that we have an old legacy system where "id" was an already heavily used property, but it did not have unique values.
So whenever a document came in with a different version, it was updating the existing one, since "id" in Cosmos has a predefined meaning: the UPSERT of any arriving document is done on the unique id value, and in our case id is never unique.
Solution we found.
Whenever a document comes in, we process it in an Azure Function, swap the "id" property with the value of the unique "deviceIdentity" property and save it, as the structure of the JSON cannot be changed, as stated by our client. When reading these documents, we have exposed an API which does the swapping again and sends the document to the requesting client as it is.
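A stripped-down sketch of that swap, just the mapping step as I understand it from the description (the Azure Function and API plumbing are omitted, and the property names are assumptions):

// Exchanges the values of "id" and "deviceIdentity" so the JSON structure stays
// the same; applying it on the write path and again on the read path restores
// the original document for the client.
fun swapIdAndDeviceIdentity(doc: MutableMap<String, Any?>): MutableMap<String, Any?> {
    val originalId = doc["id"]
    doc["id"] = doc["deviceIdentity"]
    doc["deviceIdentity"] = originalId
    return doc
}

// Write path: swapIdAndDeviceIdentity(doc) before saving to Cosmos DB.
// Read path:  swapIdAndDeviceIdentity(doc) again before returning to the caller.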
What is the Amazon-recommended way of changing the schema of a large table in a production DynamoDB?
Imagine a hypothetical case where we have a table Person, with primary hash key SSN. This table may contain 10 million items.
Now the news comes that due to the critical volume of identity thefts, the government of this hypothetical country has introduced another personal identification: Unique Personal Identifier, or UPI.
We have to add a UPI column and change the schema of the Person table, so that the primary hash key is now UPI. We want to support, for some time, both the current system, which uses SSN, and the new system, which uses UPI; thus we need both of these columns to co-exist in the Person table.
What is the Amazon-recommended way to do this schema change?
There are a couple of approaches, but first you must understand that you cannot change the schema of an existing table. To get a different schema, you have to create a new table. You may be able to reuse your existing table, but the result would be the same as if you created a different table.
Lazy migration to the same table, without Streams. Every time you modify an entry in the Person table, create a new item in the Person table using UPI and not SSN as the value for the hash key, and delete the old item keyed at SSN. This assumes that UPI draws from a different range of values than SSN. If SSN looks like XXX-XX-XXXX, then as long as UPI has a different number of digits than SSN, you will never have an overlap.
Lazy migration to the same table, using Streams. When Streams becomes generally available, you will be able to turn on a stream for your Person table. Create a stream with the NEW_AND_OLD_IMAGES stream view type, and create a Lambda function that, whenever it detects a change that adds a UPI to an existing person in the Person table, removes the person keyed at SSN and adds a person with the same attributes keyed at UPI. This approach has race conditions that can be mitigated by adding an atomic counter-style version attribute to the item and conditioning the DeleteItem call on the version attribute (see the conditional-delete sketch after these approaches).
Preemptive (scripted) migration to a different table, using Streams. Run a script that scans your table and adds a unique UPI to each Person-item in the Person table. Create a stream on Person table with the NEW_AND_OLD_IMAGES stream view type and subscribe a lambda function to that stream that writes all the new Persons in a new Person_UPI table when the lambda function detects that a Person with a UPI was changed or when a Person had a UPI added. Mutations on the base table usually take hundreds of milliseconds to appear in a stream as stream records, so you can do a hot failover to the new Person_UPI table in your application. Reject requests for a few seconds, point your application to the Person_UPI table during that time, and re-enable requests.
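For the race condition mentioned in the Streams-based lazy migration, the conditional delete could look roughly like this (a sketch with the AWS SDK for Java v2; the version attribute name is an assumption):

import software.amazon.awssdk.services.dynamodb.DynamoDbClient
import software.amazon.awssdk.services.dynamodb.model.AttributeValue
import software.amazon.awssdk.services.dynamodb.model.ConditionalCheckFailedException
import software.amazon.awssdk.services.dynamodb.model.DeleteItemRequest

// Delete the SSN-keyed item only if it is still at the version we copied;
// if another writer bumped the version in the meantime, the delete fails and
// the migration step can be retried against the fresh item.
fun deleteOldSsnItem(ddb: DynamoDbClient, ssn: String, expectedVersion: Long) {
    val request = DeleteItemRequest.builder()
        .tableName("Person")
        .key(mapOf("SSN" to AttributeValue.builder().s(ssn).build()))
        .conditionExpression("version = :expected")
        .expressionAttributeValues(mapOf(":expected" to AttributeValue.builder().n(expectedVersion.toString()).build()))
        .build()
    try {
        ddb.deleteItem(request)
    } catch (e: ConditionalCheckFailedException) {
        // The item changed after we read it; re-read and retry.
    }
}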
DynamoDB Streams enable us to migrate tables without any downtime. I've done this to great effect, and the steps I've followed are:
1) Create a new table (let us call this NewTable), with the desired key structure, LSIs and GSIs.
2) Enable DynamoDB Streams on the original table.
3) Associate a Lambda to the stream, which pushes the record into NewTable. (This Lambda should trim off the migration flag from step 5.)
4) [Optional] Create a GSI on the original table to speed up scanning items. Ensure this GSI only has the attributes: primary key and Migrated (see step 5).
5) Scan the GSI created in the previous step (or the entire table) and use the following filter:
FilterExpression = "attribute_not_exists(Migrated)"
Update each item in the table with a migration flag (i.e. "Migrated": { "S": "0" }), which sends it to DynamoDB Streams (using the UpdateItem API, to ensure no data loss occurs). A rough sketch of this scan-and-flag pass is shown after these steps.
NOTE: You may want to increase the write capacity units on the table during the updates.
6) The Lambda will pick up all items, trim off the Migrated flag and push them into NewTable.
7) Once all items have been migrated, repoint the code to the new table.
8) Remove the original table, and the Lambda function, once you are happy all is good.
Following these steps should ensure you have no data loss and no downtime.
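Here is the sketch of the scan-and-flag pass from step 5 (AWS SDK for Java v2; the table name and key attribute are assumptions you would replace with your own):

import software.amazon.awssdk.services.dynamodb.DynamoDbClient
import software.amazon.awssdk.services.dynamodb.model.AttributeValue
import software.amazon.awssdk.services.dynamodb.model.ScanRequest
import software.amazon.awssdk.services.dynamodb.model.UpdateItemRequest

// Walk the table (or the helper GSI), touching every item that has not been
// migrated yet. Each UpdateItem writes a Migrated flag, which produces a stream
// record that the Lambda turns into a write against NewTable.
fun flagUnmigratedItems(ddb: DynamoDbClient, tableName: String, keyAttribute: String) {
    var lastKey: Map<String, AttributeValue> = emptyMap()
    do {
        val scanBuilder = ScanRequest.builder()
            .tableName(tableName)
            .filterExpression("attribute_not_exists(Migrated)")
        if (lastKey.isNotEmpty()) scanBuilder.exclusiveStartKey(lastKey)
        val page = ddb.scan(scanBuilder.build())

        for (item in page.items()) {
            val update = UpdateItemRequest.builder()
                .tableName(tableName)
                .key(mapOf(keyAttribute to item.getValue(keyAttribute)))
                .updateExpression("SET Migrated = :zero")
                .expressionAttributeValues(mapOf(":zero" to AttributeValue.builder().s("0").build()))
                .build()
            ddb.updateItem(update)
        }
        lastKey = page.lastEvaluatedKey() ?: emptyMap()
    } while (lastKey.isNotEmpty())
}

// Usage, assuming the question's Person table keyed by SSN:
// flagUnmigratedItems(DynamoDbClient.create(), "Person", "SSN")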
I've documented this on my blog, with code to assist:
https://www.abhayachauhan.com/2018/01/dynamodb-changing-table-schema/
I'm using a variant of Alexander's third approach. Again, you create a new table that will be updated as the old table is updated. The difference is that you use code in the existing service to write to both tables while you're transitioning instead of using a lambda function. You may have custom persistence code that you don't want to reproduce in a temporary lambda function and it's likely that you'll have to write the service code for this new table anyway. Depending on your architecture, you may even be able to switch to the new table without downtime.
However, the nice part about using a lambda function is that any load introduced by additional writes to the new table would be on the lambda, not the service.
If the change only involves querying by a different partition key, you can add a new GSI (global secondary index) instead of migrating. Moreover, you can always add new columns/attributes to DynamoDB items without needing to migrate the table.
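If that route fits, adding such an index might look roughly like this (a sketch with the AWS SDK for Java v2, reusing the hypothetical UPI attribute from the question; a provisioned-capacity table would also need .provisionedThroughput(...) on the create action):

import software.amazon.awssdk.services.dynamodb.DynamoDbClient
import software.amazon.awssdk.services.dynamodb.model.AttributeDefinition
import software.amazon.awssdk.services.dynamodb.model.CreateGlobalSecondaryIndexAction
import software.amazon.awssdk.services.dynamodb.model.GlobalSecondaryIndexUpdate
import software.amazon.awssdk.services.dynamodb.model.KeySchemaElement
import software.amazon.awssdk.services.dynamodb.model.KeyType
import software.amazon.awssdk.services.dynamodb.model.Projection
import software.amazon.awssdk.services.dynamodb.model.ProjectionType
import software.amazon.awssdk.services.dynamodb.model.ScalarAttributeType
import software.amazon.awssdk.services.dynamodb.model.UpdateTableRequest

// Add a GSI keyed by UPI to the existing Person table (keyed by SSN),
// so lookups by UPI work without a table migration.
fun addUpiIndex(ddb: DynamoDbClient) {
    val request = UpdateTableRequest.builder()
        .tableName("Person")
        .attributeDefinitions(
            AttributeDefinition.builder().attributeName("UPI").attributeType(ScalarAttributeType.S).build()
        )
        .globalSecondaryIndexUpdates(
            GlobalSecondaryIndexUpdate.builder()
                .create(
                    CreateGlobalSecondaryIndexAction.builder()
                        .indexName("UPI-index")
                        .keySchema(KeySchemaElement.builder().attributeName("UPI").keyType(KeyType.HASH).build())
                        .projection(Projection.builder().projectionType(ProjectionType.ALL).build())
                        .build()
                )
                .build()
        )
        .build()
    ddb.updateTable(request)
}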