Corda persistent API - corda

I am reading the documentation about Corda persistence, https://docs.corda.net/api-persistence.html, and there are several points that are not clear to me.
Am I right that the data is persisted in parallel with the vault storage? I.e. the vault storage is not changed, and new tables are added to store the data as well.
When we use the cordaRPCClient.vaultQueryBy method, will it work out by itself what to use: the vault or the data persisted in the custom database tables?
How is the choice made when, for example, only part of the data is available in the tables? Is there any way to tell Corda explicitly that the persisted data should be used for the query?

Here are the answers to your queries:
Yes, you are correct: new tables are created in the vault corresponding to your QueryableState. All states that need to be persisted to custom tables should implement the QueryableState interface.
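For reference, here is a minimal sketch of that pattern in Java. The class, schema and table names (IOUState, IOUSchemaV1, iou_states) are made up for illustration; only the QueryableState/MappedSchema/PersistentState machinery comes from Corda.

import net.corda.core.identity.AbstractParty;
import net.corda.core.identity.Party;
import net.corda.core.schemas.MappedSchema;
import net.corda.core.schemas.PersistentState;
import net.corda.core.schemas.QueryableState;

import javax.persistence.Column;
import javax.persistence.Entity;
import javax.persistence.Table;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Illustrative state: the node stores the serialised state in the vault as usual and,
// because it is a QueryableState, also writes one row per state into the iou_states table.
public class IOUState implements QueryableState {
    private final Party lender;
    private final Party borrower;
    private final long amount;

    public IOUState(Party lender, Party borrower, long amount) {
        this.lender = lender;
        this.borrower = borrower;
        this.amount = amount;
    }

    @Override
    public List<AbstractParty> getParticipants() {
        return Arrays.asList(lender, borrower);
    }

    @Override
    public Iterable<MappedSchema> supportedSchemas() {
        return Collections.singletonList(new IOUSchemaV1());
    }

    // Called by the node each time the state is recorded; the returned entity becomes a table row.
    @Override
    public PersistentState generateMappedObject(MappedSchema schema) {
        if (schema instanceof IOUSchemaV1) {
            return new IOUSchemaV1.PersistentIOU(
                    lender.getName().toString(), borrower.getName().toString(), amount);
        }
        throw new IllegalArgumentException("Unrecognised schema: " + schema);
    }

    public static class IOUSchemaV1 extends MappedSchema {
        public IOUSchemaV1() {
            super(IOUSchemaV1.class, 1, Collections.singletonList(PersistentIOU.class));
        }

        @Entity
        @Table(name = "iou_states")
        public static class PersistentIOU extends PersistentState {
            @Column(name = "lender") private String lender;
            @Column(name = "borrower") private String borrower;
            @Column(name = "amount") private long amount;

            public PersistentIOU(String lender, String borrower, long amount) {
                this.lender = lender;
                this.borrower = borrower;
                this.amount = amount;
            }

            // No-arg constructor required by Hibernate.
            public PersistentIOU() {
            }
        }
    }
}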
Your states are stored in the normal binary format as well; thus cordaRPCClient.vaultQueryBy always queries the vault for the ContractState, not the PersistentState. You could, however, query the custom database tables using a JDBC session or JPA.
Which part of the state needs to be persisted is a call you make depending on your requirements. The persisted data can be queried using custom JDBC queries or JPA. The vaultQuery API always works on ContractStates.
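As a hedged example of the JDBC-session route, a flow along the following lines could read rows from such a custom table. The table and column names (iou_states, lender, amount) are the illustrative ones from the sketch above, not anything Corda provides.

import co.paralleluniverse.fibers.Suspendable;
import net.corda.core.flows.FlowException;
import net.corda.core.flows.FlowLogic;
import net.corda.core.flows.StartableByRPC;

import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

@StartableByRPC
public class LendersAboveThresholdFlow extends FlowLogic<List<String>> {
    private final long threshold;

    public LendersAboveThresholdFlow(long threshold) {
        this.threshold = threshold;
    }

    @Suspendable
    @Override
    public List<String> call() throws FlowException {
        List<String> lenders = new ArrayList<>();
        // jdbcSession() hands back the node's current database connection, so the custom
        // table written by the QueryableState can be queried with plain SQL.
        String sql = "SELECT lender FROM iou_states WHERE amount > ?";
        try (PreparedStatement ps = getServiceHub().jdbcSession().prepareStatement(sql)) {
            ps.setLong(1, threshold);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    lenders.add(rs.getString("lender"));
                }
            }
        } catch (SQLException e) {
            throw new FlowException("Query against the custom table failed", e);
        }
        return lenders;
    }
}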

Related

Store a JSON in the PostgreSQL database of a Corda node

I have a scenario where I need to store a large JSON String in the PostgreSQL database of a Corda node.
Does a Corda node support this type of scenario?
Technically, I think you can store a large string as part of a state in Corda. However, it would perhaps not be a great idea to do so. The larger the string, the bigger the transaction, and thus the greater the network latency. Also, if the string becomes part of the state, it gets copied across the state's evolution, so any node that does not have the previous transactions of the state needs to download the complete backchain, i.e. all the previously consumed states, which adds further to the network latency.
It would be better to store a reference in the state instead, such as a hash of the JSON file.
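A minimal sketch of that suggestion is below. The state and field names (DocumentState, jsonHash) are made up for illustration; the JSON itself would live off-ledger, e.g. as an attachment or in an external store keyed by the hash.

import net.corda.core.contracts.ContractState;
import net.corda.core.crypto.SecureHash;
import net.corda.core.identity.AbstractParty;
import net.corda.core.identity.Party;

import java.util.Collections;
import java.util.List;

public class DocumentState implements ContractState {
    private final Party owner;
    // Only the hash of the large JSON goes on-ledger, e.g. SecureHash.sha256(jsonBytes);
    // the blob itself is stored elsewhere and looked up by this hash.
    private final SecureHash jsonHash;

    public DocumentState(Party owner, SecureHash jsonHash) {
        this.owner = owner;
        this.jsonHash = jsonHash;
    }

    public SecureHash getJsonHash() {
        return jsonHash;
    }

    @Override
    public List<AbstractParty> getParticipants() {
        return Collections.singletonList(owner);
    }
}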

What are the ways to access Flink state from outside the Flink cluster?

I am new to Apache Flink and am building a simple application where I read events from a Kinesis stream, say something like
class TestEvent {
    String id;
    DateTime created_at;
    Long amount;
}
and performing an aggregation (sum) on the field amount over the above stream, keyed by id. The transformation is equivalent to the SQL select sum(amount) from testevents group by id, where testevents are all the events received so far.
The aggregated result is stored in Flink state, and I want that result to be exposed via an API. Is there any way to do so?
PS: Can we store the Flink state in DynamoDB and create an API there? Or is there any other way to persist the state and expose it to the outside world?
I'd recommend ignoring state for now and instead looking at sinks as the primary way for a streaming application to output results.
If you are already using Kinesis for input, you could also use Kinesis to output the results from Flink. You can then use the Kinesis adapter for DynamoDB that is provided by AWS, as described further in a related Stack Overflow post.
Coming back to your original question: you can query Flink's state and ship a REST API together with your streaming application, but that's a whole lot of work that is not needed to achieve your goal. You could also access checkpointed/savepointed state through the state APIs, but again that's quite a bit of manual work that can be saved by going the usual route outlined above.
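To make the sink-based route concrete, here is a minimal, self-contained sketch of the keyed running sum. The Kinesis source and sink are replaced with fromElements() and print() so it runs standalone, and the class and job names are made up.

import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class RunningSumPerIdJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Stand-in for the Kinesis source of TestEvent(id, created_at, amount).
        DataStream<Tuple2<String, Long>> events = env
                .fromElements(Tuple2.of("a", 10L), Tuple2.of("b", 5L), Tuple2.of("a", 7L))
                .returns(Types.TUPLE(Types.STRING, Types.LONG));

        // Equivalent to: select sum(amount) from testevents group by id,
        // emitted as a continuously updated stream of (id, running sum).
        DataStream<Tuple2<String, Long>> sums = events
                .keyBy(t -> t.f0)
                .sum(1);

        // In the real job this would be a sink (e.g. back to Kinesis), from which the
        // AWS Kinesis-to-DynamoDB adapter can populate a table for your API to read.
        sums.print();

        env.execute("running-sum-per-id");
    }
}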
Flink's documentation provides some use cases for Queryable State.
You can also read the state offline using the State Processor API.

Get associated collections via RPC

We have two states stored in the Corda vault (policy and event). A policy can have many events associated with it.
We are attempting to get a joined result (as if we ran SQL with a JOIN statement) via the RPC client, and we can't find a graceful way: either we make several vault queries, or we just use a direct JDBC connection to the underlying database and extract the required data. Neither way looks appealing, and we wonder if there is a good way to extract the data.
As we cannot use JPA/Hibernate annotations to link objects inside the CorDapp, we just have the policy_id stored in the event state.
For more complex queries, it is fine and even expected that the user will query the node's database directly using the JDBC connection.
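For example, a plain JDBC query joining the two custom tables directly could look like the sketch below. The JDBC URL, credentials, and the table/column names (policy_states, event_states, policy_id) are assumptions standing in for whatever your QueryableState schemas actually define.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class PolicyEventJoinQuery {
    public static void main(String[] args) throws SQLException {
        // Connection details for the node's database (H2, PostgreSQL, ...).
        String url = "jdbc:postgresql://localhost:5432/corda_node";
        String sql = "SELECT p.policy_id, e.event_id, e.occurred_at "
                + "FROM policy_states p "
                + "JOIN event_states e ON e.policy_id = p.policy_id "
                + "WHERE p.policy_id = ?";
        try (Connection conn = DriverManager.getConnection(url, "corda", "corda_password");
             PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setString(1, "POL-001");
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    System.out.printf("policy %s -> event %s at %s%n",
                            rs.getString("policy_id"),
                            rs.getString("event_id"),
                            rs.getString("occurred_at"));
                }
            }
        }
    }
}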

DB SQL for the vault in Corda 3

In the Corda open-source documentation I read the following:
The ORM mapping is specified using the Java Persistence API (JPA) as annotations and is converted to database table rows by the node automatically every time a state is recorded in the node’s local vault as part of a transaction.
Presently the node includes an instance of the H2 database but any database that supports JDBC is a candidate and the node will in the future support a range of database implementations via their JDBC drivers. Much of the node internal state is also persisted there.
Can I replace the H2 DB with another SQL database using JDBC?
As I understand it, FinalityFlow is used to record the transaction in the local vault using the H2 DB.
If I implement a custom flow to record into a SQL DB, do I have to avoid the FinalityFlow call?
Yes, it is possible to run a node with a SQL database other than H2. In fact, support for PostgreSQL and SQL Server has been contributed by the open-source community. See the set-up instructions here. However, be aware that the Corda continuous integration pipeline does not run unit tests or integration tests against these databases, so they are used at your own risk.
Note that in both cases, you configure the node to use the alternative database via the configuration file, and it stores all its data in this alternative database (transactions, states, identities, etc.). You are not expected to access the database directly in a flow to do this, and can rely upon the standard ServiceHub operations and standard flows like FinalityFlow.
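For orientation, the PostgreSQL case from those set-up instructions boils down to a node.conf fragment along these lines (the host, database name and credentials below are placeholders):

dataSourceProperties = {
    dataSourceClassName = "org.postgresql.ds.PGSimpleDataSource"
    dataSource.url = "jdbc:postgresql://db-host:5432/corda_db"
    dataSource.user = "corda_user"
    dataSource.password = "corda_password"
}
database = {
    transactionIsolationLevel = READ_COMMITTED
}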

Can you query different databases on the same server using 1 NHibernate Session?

Does a new SessionFactory and Session object have to be created for each database? I have a data store for my application data and a separate data store for my employee security, which is used to validate users. Do I have to create a new SessionFactory and Session object for calls to the two different databases?
OK, so this doesn't answer your question directly, but it might offer an insight as to why you should create a separate session object for each data store.
This article explains how you can implement a thread-safe lazy singleton for each type of Session you need, so that you only have one session per data store, but it is shared across the entire application. So at most you are only ever going to have two session objects.
To answer your question directly, however: you will need one session object per database.
General case
The general-case answer is no: at the very least you need different sessions.
You may use a single session factory by using the OpenSession overload that takes an open connection as an argument, allowing you to switch databases for the sessions that require it.
This has some drawbacks, such as the lack of automatic connection release after transactions and the disabling of the second-level cache. In my opinion, it is better to have two session factories rather than supplying your own connection when opening a session.
Database specific cases
Depending on the database server you use, you may be able to use a single connection string to access both with NHibernate. If you can use a single connection string, then you can use a single session factory and the same session for accessing your entities split between the two databases.
Simplest case
Using SQL Server, you may have your two databases on the same SQL Server instance. In that case, you can use a single connection string and adjust the catalog attribute on your <class> mappings to tell NHibernate in which database each table is to be found. (schema can be used too, by appending a dot; it has been available in NHibernate for longer, so with an old version you may only have schema.)
Of course, the connection credentials must be valid for accessing both databases.
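For illustration, an hbm.xml fragment using the catalog attribute might look like this (the class, table and database names are made up for the example):

<hibernate-mapping xmlns="urn:nhibernate-mapping-2.2" assembly="MyApp" namespace="MyApp.Domain">
  <!-- The Employees table lives in the SecurityDb database on the same SQL Server instance. -->
  <class name="Employee" table="Employees" catalog="SecurityDb" schema="dbo">
    <id name="Id" column="EmployeeId">
      <generator class="identity" />
    </id>
    <property name="UserName" />
  </class>
</hibernate-mapping>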
Other cases
Still with SQL Server, if the second database is on another server, you may use a linked server. You would again adjust the catalog attribute on the classes that require it, specifying the appropriate linkedServerName.DbName.
Maybe other databases could have similar solutions.
