Where exactly is a Corda state persisted in the DB? (IOU example)

Corda saves data into the vault, and the vault is just a database. By default the node uses an H2 database, and Corda stores states in an H2 table as a BLOB. I scanned the NODE_TRANSACTIONS, VAULT_LINEAR_STATES and VAULT_STATES tables, ran the IOU example and performed several transactions. I then truncated NODE_TRANSACTIONS and VAULT_LINEAR_STATES and tested in the UI, but the UI was still showing the states' data. The data appears to come from VAULT_STATES, but how is still a question: there is no BLOB in VAULT_STATES, so where exactly does it reference the state in the database?

The NODE_TRANSACTIONS table maps each transaction ID to a blob of the transaction. This blob includes the transaction's output states, as well as the other components of the transaction.
The VAULT_STATES table references each state by the ID of the transaction that created it and its index in the outputs of that transaction. This (ID, output index) pair is then used to retrieve the state object from the corresponding blob in the NODE_TRANSACTIONS table.
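To make that (transaction ID, output index) lookup concrete, here is a minimal Kotlin sketch of resolving a state the same way, via the node's ServiceHub. It assumes you are inside a flow and that the ID and index come from a VAULT_STATES row; the flow name itself is hypothetical.

```kotlin
import co.paralleluniverse.fibers.Suspendable
import net.corda.core.contracts.StateRef
import net.corda.core.crypto.SecureHash
import net.corda.core.flows.FlowLogic

// Hypothetical flow: resolves a state from the (transaction ID, output index)
// pair stored in VAULT_STATES, using the transaction blob in NODE_TRANSACTIONS.
class ResolveStateFlow(
    private val txId: String,    // value of VAULT_STATES.TRANSACTION_ID
    private val outputIndex: Int // value of VAULT_STATES.OUTPUT_INDEX
) : FlowLogic<String>() {

    @Suspendable
    override fun call(): String {
        val stateRef = StateRef(SecureHash.parse(txId), outputIndex)
        // loadState deserialises the transaction blob and returns the output
        // state at the given index.
        val transactionState = serviceHub.loadState(stateRef)
        return transactionState.data.toString()
    }
}
```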

Related

Where does vaultQuery over RPC get its output from in the DB, given a PersistentState?

Using the cordapp-example as a basis, where the IOUState is a QueryableState and is persisted: in the DB you will see a new table, IOU_States, with the column values you have defined.
Build the project and start the nodes
Create a Tx from PartyA to PartyB:
flow start ExampleFlow iouValue: 50, otherParty: "O=PartyB,L=New York,C=US"
Run vaultQuery on PartyA and take note of the displayed output (label it display 1):
run vaultQuery contractStateType: com.example.state.IOUState
Attach an H2 console to PartyA's DB
Run a search on the IOU_States table
You will see the IOUState object as a row item; note the value of 50
Run an update to change the value from 50 to 60
Run a search on the IOU_States table to confirm the change
Run vaultQuery on PartyA again and take note of the displayed output (label it display 2)
Result: display 1 = display 2
Questions:
1. When I have corrupted my persisted table, what exactly have I changed?
2. Does vaultQuery() query NODE_TRANSACTIONS instead and de-serialise the state from the blob?
3. In the VAULT_STATES table we used to have a Contract_states column, but it is no longer there. That is the snapshot we used to change to test data tampering previously. Where is the snapshot of the state kept now?
Changing your state's custom table does not corrupt the transaction. The vault query acts on the vault, so even if you update the data, any transaction that consumed the manipulated state will not show inconsistencies.
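To illustrate question 2, here is a minimal sketch of running the same vault query over RPC instead of the shell. The host, port and credentials are assumptions matching the cordapp-example defaults; the returned states are resolved from the transaction blobs that VAULT_STATES references, which is why editing the custom IOU_States table does not change the query output.

```kotlin
import com.example.state.IOUState
import net.corda.client.rpc.CordaRPCClient
import net.corda.core.utilities.NetworkHostAndPort

fun main() {
    // Placeholder RPC address and credentials for PartyA.
    val connection = CordaRPCClient(NetworkHostAndPort("localhost", 10006))
        .start("user1", "test")
    try {
        val proxy = connection.proxy

        // Equivalent of `run vaultQuery contractStateType: com.example.state.IOUState`
        // in the shell: returns the unconsumed IOUStates as the vault sees them.
        val page = proxy.vaultQuery(IOUState::class.java)
        page.states.forEach { stateAndRef ->
            println("${stateAndRef.ref} -> ${stateAndRef.state.data}")
        }
    } finally {
        connection.notifyServerAndClose()
    }
}
```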

In a Corda node, when and how do the following tables fit into the picture?

NODE_TRANSACTION_MAPPINGS table (TX_ID, STATE_MACHINE_RUN_ID),
NODE_CHECKPOINTS table (CHECKPOINT_ID, CHECKPOINT_VALUE) and
NODE_TRANSACTIONS table.
What I think I understand is that the Tx info is first added to the NODE_TRANSACTIONS table (irrespective of the validity of the Tx), then it gets added to NODE_TRANSACTION_MAPPINGS, and then we update the checkpoint. (What are these checkpoints, given that we update them at each step? An enum to understand them would help.)
Also, when do we put the Tx values into the NODE_TRANSACTIONS table? Do we update any table once we send/receive a message from Artemis?
In short, is there a transaction lifecycle documented somewhere, i.e. what gets updated after which step? That would make it easier to debug a transaction.
When ReceiveTransactionFlow is invoked, the following process happens for every individual transaction received:
The node creates a new DB transaction
As part of the existing DB transaction, the node adds a checkpoint to the NODE_CHECKPOINTS table
The node receives the Corda transaction from the counterparty
As part of the existing DB transaction, the node updates the NODE_TRANSACTIONS table
As part of the existing DB transaction, the node updates the NODE_TRANSACTION_MAPPINGS table
The node commits the DB transaction
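As a rough illustration of where this sequence is triggered, here is a Kotlin sketch of a responder flow that receives and records a transaction with ReceiveTransactionFlow. The class name is hypothetical, and a real CorDapp would annotate it with @InitiatedBy against its initiating flow.

```kotlin
import co.paralleluniverse.fibers.Suspendable
import net.corda.core.flows.FlowLogic
import net.corda.core.flows.FlowSession
import net.corda.core.flows.ReceiveTransactionFlow
import net.corda.core.node.StatesToRecord
import net.corda.core.transactions.SignedTransaction

// Hypothetical responder flow; in a real CorDapp it would carry
// @InitiatedBy(<initiating flow>) so the node invokes it automatically.
class ReceiveTransactionResponder(private val otherSide: FlowSession) : FlowLogic<SignedTransaction>() {

    @Suspendable
    override fun call(): SignedTransaction {
        // The flow suspends while waiting for the counterparty (a checkpoint is
        // written to NODE_CHECKPOINTS), then verifies and records the received
        // transaction, which updates NODE_TRANSACTIONS and the mapping table.
        return subFlow(ReceiveTransactionFlow(otherSide, true, StatesToRecord.ONLY_RELEVANT))
    }
}
```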

NODE_PROPERTIES table in the database

What is the purpose of the NODE_PROPERTIES table in the database, how does it get populated with key-value pairs, and how do we query it? And how do we query data in other NODE tables such as NODE_INFOS, NODE_NAMED_IDENTITIES and NODE_INFO_HOSTS? Is there any service-level function available in CordaRPCClient to do that? We would like to store some extra properties for each node.
The NODE_PROPERTIES table is used for internal purposes to store information that doesn't justify having its own table (currently, whether or not the node was in flow-drain mode when it was last stopped).
Feel free to store additional key-value pairs there, as long as they don't clash with keys used for internal purposes (a clash is unlikely, as we currently use long key-names to store information in this table).
You can get access to the node's database via the node's ServiceHub, which is available inside flows and services. The Flow DB sample shows an example of a service that connects, reads and writes directly to the node's database: https://github.com/corda/samples.
You can also connect directly to the node via JDBC (e.g. from a client or server). The node lists its JDBC database connection string at start-up. You can also set it in the node's configuration file, as shown here: https://docs.corda.net/corda-configuration-file.html#examples.
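As a concrete illustration, here is a minimal flow sketch that stores and reads back a custom key-value pair through serviceHub.jdbcSession(). The column names (PROPERTY_KEY, PROPERTY_VALUE) and the MERGE statement assume the default H2 schema; adjust for your database.

```kotlin
import co.paralleluniverse.fibers.Suspendable
import net.corda.core.flows.FlowLogic

// Stores a custom key-value pair in NODE_PROPERTIES and reads it back.
// Assumes the default H2 schema and column names; pick a key unlikely to
// clash with Corda's internal keys.
class StoreNodePropertyFlow(
    private val key: String,
    private val value: String
) : FlowLogic<String?>() {

    @Suspendable
    override fun call(): String? {
        val session = serviceHub.jdbcSession()

        // Insert or update the pair (H2 MERGE syntax).
        session.prepareStatement(
            "MERGE INTO NODE_PROPERTIES (PROPERTY_KEY, PROPERTY_VALUE) KEY (PROPERTY_KEY) VALUES (?, ?)"
        ).use { stmt ->
            stmt.setString(1, key)
            stmt.setString(2, value)
            stmt.executeUpdate()
        }

        // Read it back.
        var result: String? = null
        session.prepareStatement(
            "SELECT PROPERTY_VALUE FROM NODE_PROPERTIES WHERE PROPERTY_KEY = ?"
        ).use { stmt ->
            stmt.setString(1, key)
            stmt.executeQuery().use { rs ->
                if (rs.next()) result = rs.getString(1)
            }
        }
        return result
    }
}
```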

Corda - Notary Vault table and uniqueness

I have a question on the NODE_NOTARY_COMMIT_LOG table used to record notarised transactions. My first (bad) assumption was that TRANSACTION_ID was unique in this table, but that appears not to be the case: I found two table entries with the same TRANSACTION_ID and CONSUMING_TRANSACTION_ID, but the CONSUMING_INPUT_INDEX was incremented, and the OUTPUT_INDEX was also different (the opposite of the consuming input index). Could someone explain how this works and how to determine uniqueness in the table? Thanks in advance :)
The NODE_NOTARY_COMMIT_LOG table is effectively a map of state reference to consuming transaction ID. The column pair (TRANSACTION_ID, OUTPUT_INDEX) identifies the state: it is the ID of the transaction that issued the state, and the state's position in that transaction's outputs.
CONSUMING_TRANSACTION_ID and CONSUMING_INPUT_INDEX specify which transaction consumed the state, and the state's position in the consuming transaction's inputs.
Note that since Corda 3.0 the CONSUMING_INPUT_INDEX is no longer recorded.
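In other words, uniqueness is on the state reference, i.e. the (TRANSACTION_ID, OUTPUT_INDEX) pair, which is exactly Corda's StateRef. A simplified, in-memory Kotlin model (not the real notary implementation) of what the commit log enforces:

```kotlin
import net.corda.core.contracts.StateRef
import net.corda.core.crypto.SecureHash

// Simplified model of the notary commit log: each state reference
// (issuing TRANSACTION_ID + OUTPUT_INDEX) maps to at most one consuming tx.
class CommitLogModel {
    private val committed = mutableMapOf<StateRef, SecureHash>()

    /** Returns the previously recorded consuming tx ID on a double-spend attempt, else null. */
    fun commit(input: StateRef, consumingTxId: SecureHash): SecureHash? {
        val existing = committed[input]
        if (existing != null && existing != consumingTxId) {
            return existing // the state was already spent by another transaction
        }
        committed[input] = consumingTxId
        return null
    }
}
```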

Change the schema of a DynamoDB table: what is the best/recommended way?

What is the Amazon-recommended way of changing the schema of a large table in a production DynamoDB?
Imagine a hypothetical case where we have a table Person, with primary hash key SSN. This table may contain 10 million items.
Now the news comes that due to the critical volume of identity thefts, the government of this hypothetical country has introduced another personal identification: Unique Personal Identifier, or UPI.
We have to add a UPI column and change the schema of the Person table so that the primary hash key is now UPI. We want to support both the current system, which uses SSN, and the new system, which uses UPI, for some time, so we need both columns to co-exist in the Person table.
What is the Amazon-recommended way to do this schema change?
There are a couple of approaches, but first you must understand that you cannot change the schema of an existing table. To get a different schema, you have to create a new table. You may be able to reuse your existing table, but the result would be the same as if you created a different table.
1. Lazy migration to the same table, without Streams. Every time you modify an entry in the Person table, create a new item in the Person table keyed at UPI rather than SSN, and delete the old item keyed at SSN. This assumes that UPI draws from a different range of values than SSN: if SSN looks like XXX-XX-XXXX, then as long as UPI has a different number of digits, you will never have an overlap.
2. Lazy migration to the same table, using Streams. When Streams becomes generally available, you will be able to turn on a Stream for your Person table. Create a stream with the NEW_AND_OLD_IMAGES stream view type, and whenever you detect a change that adds a UPI to an existing person in the Person table, have a Lambda function remove the person keyed at SSN and add a person with the same attributes keyed at UPI. This approach has race conditions that can be mitigated by adding an atomic counter version attribute to the item and conditioning the DeleteItem call on that version attribute.
3. Preemptive (scripted) migration to a different table, using Streams. Run a script that scans your table and adds a unique UPI to each Person item in the Person table. Create a stream on the Person table with the NEW_AND_OLD_IMAGES stream view type and subscribe a Lambda function to it that writes each Person to a new Person_UPI table when it detects that a Person with a UPI was changed or had a UPI added. Mutations on the base table usually take hundreds of milliseconds to appear in the stream as stream records, so you can do a hot failover to the new Person_UPI table in your application: reject requests for a few seconds, point your application at the Person_UPI table during that time, and re-enable requests.
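For approach 3, the stream-to-new-table copy is typically a small Lambda. Here is a Kotlin sketch using the classic aws-lambda-java-events / AWS SDK for Java v1 types; the Person_UPI table name and UPI attribute come from the example above, everything else is an assumption.

```kotlin
import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClientBuilder
import com.amazonaws.services.lambda.runtime.Context
import com.amazonaws.services.lambda.runtime.RequestHandler
import com.amazonaws.services.lambda.runtime.events.DynamodbEvent

// Stream handler sketch: whenever an item carrying a UPI changes in Person,
// copy it into the UPI-keyed Person_UPI table.
class PersonUpiMigrationHandler : RequestHandler<DynamodbEvent, Unit> {

    private val dynamoDb = AmazonDynamoDBClientBuilder.defaultClient()

    override fun handleRequest(event: DynamodbEvent, context: Context) {
        for (record in event.records) {
            // REMOVE events have no new image; skip them.
            val newImage = record.dynamodb.newImage ?: continue
            if (!newImage.containsKey("UPI")) continue

            // Person_UPI has UPI as its hash key; extra attributes (SSN etc.)
            // are carried across unchanged.
            dynamoDb.putItem("Person_UPI", newImage)
        }
    }
}
```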
DynamoDB Streams enable us to migrate tables without any downtime. I've done this to great effect, and the steps I followed are:
1. Create a new table (let us call this NewTable) with the desired key structure, LSIs and GSIs.
2. Enable DynamoDB Streams on the original table.
3. Associate a Lambda to the Stream, which pushes the record into NewTable. (This Lambda should trim off the migration flag in Step 5.)
4. [Optional] Create a GSI on the original table to speed up scanning items. Ensure this GSI only has the attributes: primary key and Migrated (see Step 5).
5. Scan the GSI created in the previous step (or the entire table) using the filter FilterExpression = "attribute_not_exists(Migrated)", and update each item in the table with a migration flag (i.e. "Migrated": { "S": "0" }), which sends it through DynamoDB Streams (use the UpdateItem API to ensure no data loss occurs).
NOTE: You may want to increase the write capacity units on the table during the updates.
6. The Lambda will pick up all items, trim off the Migrated flag and push them into NewTable.
7. Once all items have been migrated, repoint the code to the new table.
8. Remove the original table, and the Lambda function, once you are happy all is good.
Following these steps should ensure you have no data loss and no downtime.
I've documented this on my blog, with code to assist:
https://www.abhayachauhan.com/2018/01/dynamodb-changing-table-schema/
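A Kotlin sketch of Step 5 under stated assumptions (original table Person, key attribute SSN, AWS SDK for Java v1): scan for items without the flag and touch each one with UpdateItem so it flows through the stream.

```kotlin
import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClientBuilder
import com.amazonaws.services.dynamodbv2.model.AttributeValue
import com.amazonaws.services.dynamodbv2.model.ScanRequest
import com.amazonaws.services.dynamodbv2.model.UpdateItemRequest

// Finds items without the Migrated flag and sets it, page by page, so each
// item appears in the stream and gets copied to the new table by the Lambda.
fun flagUnmigratedItems() {
    val dynamoDb = AmazonDynamoDBClientBuilder.defaultClient()
    var lastKey: Map<String, AttributeValue>? = null

    do {
        val scan = ScanRequest()
            .withTableName("Person")
            .withFilterExpression("attribute_not_exists(Migrated)")
            .withProjectionExpression("SSN")
            .withExclusiveStartKey(lastKey)
        val page = dynamoDb.scan(scan)

        for (item in page.items) {
            // Setting the flag is a write, so the item flows through the stream.
            dynamoDb.updateItem(
                UpdateItemRequest()
                    .withTableName("Person")
                    .withKey(mapOf("SSN" to item.getValue("SSN")))
                    .withUpdateExpression("SET Migrated = :zero")
                    .withExpressionAttributeValues(mapOf(":zero" to AttributeValue("0")))
            )
        }
        lastKey = page.lastEvaluatedKey
    } while (lastKey != null)
}
```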
I'm using a variant of Alexander's third approach. Again, you create a new table that will be updated as the old table is updated. The difference is that you use code in the existing service to write to both tables while you're transitioning instead of using a lambda function. You may have custom persistence code that you don't want to reproduce in a temporary lambda function and it's likely that you'll have to write the service code for this new table anyway. Depending on your architecture, you may even be able to switch to the new table without downtime.
However, the nice part about using a lambda function is that any load introduced by additional writes to the new table would be on the lambda, not the service.
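A minimal sketch of that dual-write idea, with hypothetical table names carried over from the earlier example:

```kotlin
import com.amazonaws.services.dynamodbv2.AmazonDynamoDB
import com.amazonaws.services.dynamodbv2.model.AttributeValue

// Hypothetical repository illustrating the dual-write transition: every save
// goes to both the old (SSN-keyed) and new (UPI-keyed) tables until cut-over.
class PersonRepository(private val dynamoDb: AmazonDynamoDB) {

    fun save(person: Map<String, AttributeValue>) {
        dynamoDb.putItem("Person", person)         // existing table, keyed on SSN
        if (person.containsKey("UPI")) {
            dynamoDb.putItem("Person_UPI", person) // new table, keyed on UPI
        }
    }
}
```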
If the changes involve changing the partition key, you can add a new GSI (global secondary index) instead. Moreover, you can always add new attributes to DynamoDB items without needing to migrate tables.
