Corda Don't Know About Party - Notary - corda

I'm using Corda 3.1 with a self-compiled version of the obligation CorDapp example. The environment has a Spring Boot network map service, with party nodes and a notary node deployed to multiple AWS EC2 instances. Each node's persistence is backed by its own schema in a Postgres database.
I'm running into the following exception when starting the IssueObligation.kt flow (IOU) from the internal web server:
[INFO ] 2018-06-07T14:27:01,751Z [Node thread-1] flow.[d04d24bf-5aa7-472a-b336-2e72feff6abf].initiateSession - Initiating flow session with party O=Notary, L=Dover, C=US. Session id for tracing purposes is SessionId(toLong=7742727399076294852). {}
[WARN ] 2018-06-07T14:27:01,776Z [Node thread-1] flow.[d04d24bf-5aa7-472a-b336-2e72feff6abf].run - Terminated by unexpected exception {}
java.lang.IllegalArgumentException: Don't know about party O=Notary, L=Dover, C=US
at net.corda.node.services.statemachine.StateMachineManagerImpl.sendSessionMessage(StateMachineManagerImpl.kt:616) ~[corda-node-3.1-corda.jar:?]
at net.corda.node.services.statemachine.StateMachineManagerImpl.processSendRequest(StateMachineManagerImpl.kt:582) ~[corda-node-3.1-corda.jar:?]
at net.corda.node.services.statemachine.StateMachineManagerImpl.processIORequest(StateMachineManagerImpl.kt:569) ~[corda-node-3.1-corda.jar:?]
at net.corda.node.services.statemachine.StateMachineManagerImpl.access$processIORequest(StateMachineManagerImpl.kt:63) ~[corda-node-3.1-corda.jar:?]
at net.corda.node.services.statemachine.StateMachineManagerImpl$initFiber$2.invoke(StateMachineManagerImpl.kt:444) ~[corda-node-3.1-corda.jar:?]
at net.corda.node.services.statemachine.StateMachineManagerImpl$initFiber$2.invoke(StateMachineManagerImpl.kt:63) ~[corda-node-3.1-corda.jar:?]
at net.corda.node.services.statemachine.FlowStateMachineImpl$suspend$2.write(FlowStateMachineImpl.kt:507) ~[corda-node-3.1-corda.jar:?]
at co.paralleluniverse.fibers.Fiber$3.run(Fiber.java:1994) ~[quasar-core-0.7.9-jdk8.jar:0.7.9]
at co.paralleluniverse.fibers.Fiber.exec(Fiber.java:824) [quasar-core-0.7.9-jdk8.jar:0.7.9]
at co.paralleluniverse.fibers.RunnableFiberTask.doExec(RunnableFiberTask.java:100) [quasar-core-0.7.9-jdk8.jar:0.7.9]
at co.paralleluniverse.fibers.RunnableFiberTask.run(RunnableFiberTask.java:91) [quasar-core-0.7.9-jdk8.jar:0.7.9]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_171]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_171]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) [?:1.8.0_171]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) [?:1.8.0_171]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_171]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_171]
at net.corda.node.utilities.AffinityExecutor$ServiceAffinityExecutor$1$thread$1.run(AffinityExecutor.kt:62) [corda-node-3.1-corda.jar:?]
There isn't any other exception from the flow that points to exactly where this occurs, but it does happen after the party node successfully interacts with the other party node to issue the IOU. The network map service knows about the notary as it does report it in its list of nodes that have registered.
The party node knows about the notary because we don't see a failure from the flow when it executes this line:
val firstNotary get() = serviceHub.networkMapCache.notaryIdentities.firstOrNull() ?: throw FlowException("No available notary.")
Just trying to find out what steps I should take to troubleshoot the problem.

I have a similar setup and this has worked for me using dev mode.
Root cause: the network parameters file is just a file signed by the dev CA, endorsing the notary whose signed nodeInfo was produced with key X as the valid notary. The error is thrown because the notary's existing key on the VM may be X, while the network parameters file distributed to the other nodes endorses key Y.
So the notary cannot prove to the other nodes that it, holding key X, is the valid notary.
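The mismatch described above can be illustrated with a small model. This is a simplified sketch in plain Kotlin, not Corda's real classes: `Party`, `NetworkMapCache`, `addressOf`, the key labels and the address `notary:10002` are all assumptions made for illustration. It shows why picking the notary from `notaryIdentities` can succeed while sending to it still fails.

```kotlin
// Simplified model, not Corda's real classes: a party is a name plus a public key.
data class Party(val name: String, val owningKey: String)

class NetworkMapCache(
    // Notary identities as listed in the signed network parameters file.
    val notaryIdentities: List<Party>,
    // Addresses learned from the nodeInfo files the nodes actually published.
    private val addressesByParty: Map<Party, String>
) {
    // An unknown (name, key) pair cannot be routed to, so the lookup throws.
    fun addressOf(party: Party): String =
        addressesByParty[party]
            ?: throw IllegalArgumentException("Don't know about party ${party.name}")
}

fun main() {
    val name = "O=Notary, L=Dover, C=US"
    val cache = NetworkMapCache(
        notaryIdentities = listOf(Party(name, "key-Y")),                  // endorsed by the network parameters
        addressesByParty = mapOf(Party(name, "key-X") to "notary:10002")  // key the notary really holds
    )
    val notary = cache.notaryIdentities.first()   // succeeds: the list is not empty
    try {
        cache.addressOf(notary)                   // fails: no node info for (name, key-Y)
    } catch (e: IllegalArgumentException) {
        println(e.message)
    }
}
```

In this model the name matches in both places, but the lookup is keyed on name and key together, so the identity taken from the network parameters never resolves to an address.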
What the bootstrapper does:
1. Requests the notary node to regenerate a new dev cert/keys.
2. Requests the notary node to sign its node.conf using its keys to produce nodeInfo-${hash}.
3. Uses the dev CA to sign over the notary-signed nodeInfo-${hash} to produce the network parameters file.
To fix the problem:
Step 1 (optional): redistribute the nodeInfo-${hash}. This assumes that your notary and all nodes already have an existing certificates folder.
1. Delete all nodeInfo-${hash} files from all nodes.
2. Shut down all nodes and drop the node_identities table.
3. Run java -jar corda.jar --just-generate-node-info to generate the nodeInfo-${hash} file.
4. Redistribute the node info.
Step 2: regenerate the network parameters file by forcing bootstrapper.jar to use the notary's existing key.
1. Delete the network parameters file from all nodes.
2. Copy your notary's existing certificates folder and place it somewhere locally, in the folder structure shown after these steps.
3. Run java -jar network-bootstrapper.jar folder/
4. Redistribute the generated network parameters file.
5. Restart all nodes.
folder/                  // root folder containing the notary
├── notary_node/         // The notary's folder name (must be renamed this way)
│   └── certificates/    // The notary's certificates folder with the keys
└── notary_node.conf     // The notary config (must be renamed this way)

After changing the Obligation example to match the IOU example, we were able to get this to work. It turns out that setting a time window on the transaction triggers some kind of communication with the notary, and that communication causes the error. I do not know why.
.setTimeWindow(serviceHub.clock.instant(), 30.seconds)
Removing this line allows an obligation (IOU) to be written to the ledger, but the line seems to be required for the subsequent settlement command with the notary: as is, we always see an insufficient funds error even though cash has been issued. So this is still not answered.
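One possible explanation, offered as an assumption rather than a confirmed diagnosis: as far as I understand FinalityFlow in Corda 3.x, a transaction is only sent to the notary when it consumes input states or carries a time window. The sketch below models that rule in plain Kotlin; `Tx` and `needsNotarisation` are illustrative names, not Corda APIs.

```kotlin
// Simplified stand-in for a transaction; not Corda's WireTransaction.
data class Tx(val inputCount: Int, val hasTimeWindow: Boolean)

// Assumed rule: the notary is contacted if there are inputs to consume
// or a time window to check.
fun needsNotarisation(tx: Tx): Boolean = tx.inputCount > 0 || tx.hasTimeWindow

fun main() {
    println(needsNotarisation(Tx(inputCount = 0, hasTimeWindow = false))) // issuance, no time window: false
    println(needsNotarisation(Tx(inputCount = 0, hasTimeWindow = true)))  // issuance with setTimeWindow: true
    println(needsNotarisation(Tx(inputCount = 1, hasTimeWindow = false))) // settlement consuming a state: true
}
```

If that rule holds, removing the time window only avoids the notary for issuance. A settlement consumes an existing state, so it would still have to reach the notary, and the underlying "Don't know about party" problem would need fixing either way.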

Related

Replaying flow from last checkpoint error

Please note that my node is part of Corda TestNet and it's deployed on Google Cloud Platform.
I'm using the following:
1. Corda OS 4.1 (updated to 4.3-RC01)
2. Tokens SDK 1.1-SNAPSHOT (updated to 1.1-RC01)
3. OKHttp 3.5.0
I have a flow that does the following:
1. Make an HTTP call (using OKHttp as per the HTTP example from the Samples repo); during my local tests I got an error that the Client, Request, and Response objects cannot be serialized so I nullified those references and it worked.
2. Fetch the latest version of my EvolvableTokenType
3. Use that version to create new FungibleTokens
4. Call IssueTokens sub-flow
5. Sleep for one millisecond (to checkpoint the flow). I removed this and still got the same error.
6. Call UpdateEvolvableToken sub-flow (my token type has a field that tracks the number of issued tokens, so I update that field after each issue).
This all works when I run my flow tests or when I run the flow locally from the node terminal (against H2 and Postgres DB).
I deployed this CorDapp to GCP (Google Cloud Platform) and was able to run a different flow from that app, but when I run this flow I get the error below:
net.corda.node.services.statemachine.FlowTimeoutException: replaying flow from the last checkpoint
at net.corda.node.services.statemachine.SingleThreadedStateMachineManager$scheduleTimeoutException$$inlined$with$lambda$1.run(SingleThreadedStateMachineManager.kt:638) ~[corda-node-4.1.jar:?]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[?:1.8.0_222]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_222]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) ~[?:1.8.0_222]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) ~[?:1.8.0_222]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_222]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_222]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_222]
I removed the HTTP call (fearing it was the issue) and redeployed, re-ran on GCP and I still got that error.
Not sure but for a start Tokens 1.1 should be using Corda 4.3 :)

Use GCP debugger but got permission error

I added @google-cloud/debug-agent to my Node.js project, which is deployed on GKE.
But I got this error:
restify listening to http://[::]:80
@google-cloud/debug-agent Failed to re-register debuggee nodejs-bot: Error: The caller does not have permission
@google-cloud/debug-agent Failed to re-register debuggee nodejs-bot: Error: The caller does not have permission
@google-cloud/debug-agent Failed to re-register debuggee nodejs-bot: Error: The caller does not have permission
@google-cloud/debug-agent Failed to re-register debuggee nodejs-bot: Error: The caller does not have permission
I have checked that my GKE cluster has the debug permission. I don't know why the service doesn't have permission.
Here is the code I define in my index.ts:
import * as tracer from '@google-cloud/trace-agent';
tracer.start();
import * as debug from '@google-cloud/debug-agent';
debug.start();
This issue can be resolved by doing the following:
1. Create a new cluster with these permissions enabled (i.e. Cloud Debugger / Cloud Platform 'Enabled') and with the required scopes. [1]
Example:
$ gcloud container clusters create example-cluster-name --scopes https://www.googleapis.com/auth/cloud_debugger --zone
2. You can use the same YAML config files you used to deploy your original workloads in the new cluster. You must make sure you have the required scopes for this to work.
You can review how authentication and scopes work using [2] [3].
I found that the issue was caused by Workload Identity, so I disabled that feature to fix it.
I had chosen to enable Workload Identity. With it enabled, every pod that needs to connect to a GCP service needs a service account created for it. Otherwise, the request is denied.
https://cloud.google.com/kubernetes-engine/docs/how-to/workload-identity

BizTalk - Start 'Rule Engine Update Service' Errors

I'm on a QA environment, and can talk to the BizTalk Admin tomorrow. Apparently, I'm the first person to call a BizRule from an orchestration, and the orchestration is giving error:
Inner exception: The type initializer for 'Microsoft.RuleEngine.RuleEngineCache' threw an exception.
I checked, and the Rules Engine is configured on this machine. But the "Rule Engine Update Service" was not running. I tried to start it, and get this error in the "Services" tool:
"The Rule Engine Update Service service on Local Computer started and
then stopped. Some services stop automatically if they are not in use
by other services or programs."
I checked the event log, and for each time I tried to start it, I see this:
Service could not be started. :
System.Reflection.TargetInvocationException: Exception has been thrown
by the target of an invocation. System.InvalidCastException: Specified
cast is not valid.
Any ideas what I can do? I have admin privileges on the machine (BizTalk 2013/R2).
Our BizTalk Admin corrected the CacheEntries using RegEdit. The one on the right was the other server in the group that was working.

Registering custom oracle

I've followed the oracle example in the Corda docs to set up a simple price oracle for our CorDapp, but am having an issue when it comes to registering this oracle's service within our CorDapp.
The error being displayed within our main code is
java.lang.IllegalArgumentException: Corda service com.secLendModel.flow.oracle.Oracle does not exist
Following this error down the line, it appears that this is because the code segment below, in the constructor of our oracle, throws an exception, although the service seems to exist within our main testing code (i.e. when I print the oracle's advertised services, com.secLendModel.flow.oracle.Oracle is the only service it advertises)
constructor(services: PluginServiceHub) : this(services.myInfo.serviceIdentities(PriceType.type).first(), services)
The error for this constructor code is shown within node startup
W 09:45:34 1 Node.invoke - com.secLendModel.flow.oracle.Oracle does not have a type field, optimistically proceeding with install.
E 09:45:34 1 Node.installCordaServices - Corda service com.secLendModel.flow.oracle.Oracle failed to instantiate
java.util.NoSuchElementException: List is empty.
at kotlin.collections.CollectionsKt___CollectionsKt.first(_Collections.kt:178) ~[kotlin-stdlib-1.1.1.jar:1.1.1]
at com.secLendModel.flow.oracle.Oracle.<init>(Oracle.kt:24) ~[main/:?]
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) ~[?:1.8.0_131]
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) ~[?:1.8.0_131]
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) ~[?:1.8.0_131]
at java.lang.reflect.Constructor.newInstance(Constructor.java:423) ~[?:1.8.0_131]
at net.corda.node.internal.AbstractNode.installCordaService(AbstractNode.kt:322) ~[corda-node-0.13.0.jar:?]
at net.corda.node.internal.AbstractNode.installCordaServices(AbstractNode.kt:302) [corda-node-0.13.0.jar:?]
at net.corda.node.internal.AbstractNode.access$installCordaServices(AbstractNode.kt:99) [corda-node-0.13.0.jar:?]
at net.corda.node.internal.AbstractNode$start$3.invoke(AbstractNode.kt:254) [corda-node-0.13.0.jar:?]
at net.corda.node.internal.AbstractNode$start$3.invoke(AbstractNode.kt:99) [corda-node-0.13.0.jar:?]
at net.corda.node.internal.AbstractNode$initialiseDatabasePersistence$3.invoke(AbstractNode.kt:597) [corda-node-0.13.0.jar:?]
at net.corda.node.internal.AbstractNode$initialiseDatabasePersistence$3.invoke(AbstractNode.kt:99) [corda-node-0.13.0.jar:?]
at org.jetbrains.exposed.sql.transactions.ThreadLocalTransactionManagerKt.inTopLevelTransaction(ThreadLocalTransactionManager.kt:69) [exposed-0.5.0.jar:?]
at org.jetbrains.exposed.sql.transactions.ThreadLocalTransactionManagerKt.transaction(ThreadLocalTransactionManager.kt:57) [exposed-0.5.0.jar:?]
at net.corda.node.utilities.DatabaseSupportKt.transaction(DatabaseSupport.kt:48) [corda-node-0.13.0.jar:?]
at net.corda.node.internal.AbstractNode.initialiseDatabasePersistence(AbstractNode.kt:596) [corda-node-0.13.0.jar:?]
at net.corda.node.internal.Node.initialiseDatabasePersistence(Node.kt:304) [corda-node-0.13.0.jar:?]
at net.corda.node.internal.AbstractNode.start(AbstractNode.kt:223) [corda-node-0.13.0.jar:?]
at net.corda.node.internal.Node.start(Node.kt:310) [corda-node-0.13.0.jar:?]
at net.corda.node.internal.NodeStartup.startNode(NodeStartup.kt:103) [corda-node-0.13.0.jar:?]
at net.corda.node.internal.NodeStartup.run(NodeStartup.kt:81) [corda-node-0.13.0.jar:?]
at net.corda.node.Corda.main(Corda.kt:11) [corda-node-0.13.0.jar:?]
It appears this node setup happens within the AbstractNode class, under installCordaService. I am not sure if this could be an error there, an issue with the fact that PluginServiceHub is now deprecated, or if I am missing a step in my node setup.
The code used to set up the node is as follows
val oracle = startNode(ORACLE, advertisedServices = setOf(ServiceInfo(PriceType.type)))
oracleNode = oracle.get()
//This is what is reported as empty when instantiating the node, but it's definitely not. Could be something to do with AbstractNode
println(oracleNode.nodeInfo.serviceIdentities(PriceType.type).first())
setUpNodes()
It should be noted that once all the nodes are set up, the oracle node then displays no services
CN=Oracle SP,O=Oracle SP,L=Brisbane,C=AU started on localhost:20018
While our notary node does display services
CN=Notary Service,O=R3,OU=corda,L=Zurich,C=CH,OU=corda.notary.validating started on localhost:20009
The code for our oracle follows the example code, and it doesn't seem that there are any issues with it, but I can post it if need be.
This is from an older version of Corda that had the concept of service identities for services such as oracles.
The error message indicates that an exception is being thrown on this line:
`println(oracleNode.nodeInfo.serviceIdentities(PriceType.type).first())`
In particular, it indicates that you are trying to get the first item from a list with no elements, and therefore that the PriceType.type service identity has not been registered correctly.
You have probably failed to register the service identity in the node's node.conf file.
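The failure mode itself can be reproduced in plain Kotlin, independent of Corda. In the sketch below, `firstIdentityOrFail` and the `"price-oracle"` label are hypothetical names used only for illustration. The idea is to replace a bare `first()` with a guarded lookup so that a missing registration produces a descriptive error.

```kotlin
// Hypothetical helper, not part of Corda: fail with a descriptive message
// instead of the generic "List is empty."
fun <T> firstIdentityOrFail(identities: List<T>, serviceType: String): T =
    identities.firstOrNull()
        ?: throw IllegalStateException(
            "No service identity registered for '$serviceType'; check the node's advertised services"
        )

fun main() {
    val identities = emptyList<String>()   // stands in for an empty serviceIdentities(...) result
    try {
        identities.first()                 // throws NoSuchElementException, as in the stack trace
    } catch (e: NoSuchElementException) {
        println("first(): ${e.message}")
    }
    try {
        firstIdentityOrFail(identities, "price-oracle")
    } catch (e: IllegalStateException) {
        println("guarded: ${e.message}")
    }
}
```

A guard like this does not fix the missing registration, but it makes the startup log point at the actual cause.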

Suspended orchestration service instance re-throwing the same unexpected exception after Resume

I am getting the error below when I try to resume a Suspended (resumable) orchestration instance.
Scenario: A request went through a DB2 static solicit-response port and failed because access permission was denied. I can see two suspended instances in the admin console: one related to the port and another related to the orchestration. After fixing the credentials, the suspended port instance resumed, but the orchestration instance keeps failing.
Uncaught exception (see the 'inner exception' below) has suspended an instance of service 'Orchestration name'.
The service instance will remain suspended until administratively resumed or terminated.
If resumed the instance will continue from its last persisted state and may re-throw the same unexpected exception.
InstanceId: ca927086-465d-40e8-93fe-c3a0e4c161f7
Shape name:
ShapeId:
Exception thrown from: segment -1, progress -1
Inner exception: An error occurred while processing the message, refer to the details section for more information
Message ID: {96B72521-9833-48EF-BB2F-4A2E2265D697}
Instance ID: {F6FBC912-C9DC-489C-87F3-103FA1273FDC}
Error Description: The user does not have the authority to access the host resource. Check your authentication credentials or contact your system administrator. SQLSTATE: HY000, SQLCODE: -1000
Exception type: XlangSoapException
Source: Microsoft.XLANGs.BizTalk.Engine
Target Site: Void VerifyTransport(Microsoft.XLANGs.Core.Envelope, Int32, Microsoft.XLANGs.Core.Context)
The following is a stack trace that identifies the location where the exception occured
at Microsoft.BizTalk.XLANGs.BTXEngine.BTXPortBase.VerifyTransport(Envelope env, Int32 operationId, Context ctx)
at Microsoft.XLANGs.Core.Subscription.Receive(Segment s, Context ctx, Envelope& env, Boolean topOnly)
at Microsoft.XLANGs.Core.PortBase.GetMessageIdForSubscription(Subscription subscription, Segment currentSegment, Context cxt, Envelope& env, CachedObject location)
at Microsoft.XLANGs.Core.PortBase.GetMessageId(Subscription subscription, Segment currentSegment, Context cxt, Envelope& env, CachedObject location)
at (StopConditions stopOn)
at Microsoft.XLANGs.Core.SegmentScheduler.RunASegment(Segment s, StopConditions stopCond, Exception& exp)
For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.
Any thoughts how to fix this?
Creating the above scenario using samples:
1. Go to the BizTalk samples/orchestrations/consumeWebservice folder, install the ConsumeWebService application and publish POWebservice to IIS.
2. Change the IIS Directory security permissions for POWebservice: remove anonymous or any other access.
3. Now drop the message. You will see suspended messages because of HTTP status 401: Access Denied. Then give access to POWebservice, either anonymous or Windows.
4. Then resume the suspended instances. One will disappear, but the other one (the orchestration) won't.
The orchestration will continue to fail with the exception because when it was suspended, the last persistence point was the receipt of the exception. This means that the orchestration will re-start (when resumed) and re-throw the exception.
Here's an article discussing some points at which orchestration state is persisted to the database: http://blogs.msdn.com/b/sanket/archive/2006/11/12/understanding-persistence-points-in-biztalk-orchestration.aspx
You can manipulate this to some extent in your orchestration design, as Richard Seroter discusses here, but generally you would do better to use failed message routing, enabling you to handle the failed messages, and terminate the failed orchestration instance.
Please correct me if I'm wrong, but is this not just normal biztalk behavior? I am not 100% sure so please let me know if this is wrong:
The outbound messaging instance was suspended because the credentials the port was using to connect to the DB were wrong.
This caused the orchestrations making these calls to also suspend.
The suspended message instance was resumed and was processed correctly because the problem was fixed. So the call was made to the DB.
However, the orchestration instance may not be able to resume because when resumed it found itself at the most recent persistence point and the original error which was delivered back from the send port is still available to the orchestration, causing it to re-suspend.
In the error message, it actually says "If resumed the instance will continue from its last persisted state and may re-throw the same unexpected exception."
If you want to handle this sort of thing you could make the call to the database atomic. That way the orchestration will not persist itself at the point of making the DB call. If the orchestration then suspends it will resume at a point before the DB call is made, and will make the DB call as normal, which should succeed this time because you have fixed the original issue.
The only problem with this is if your DB call cannot be executed more than once with the same data without bad things happening (i.e. it is not idempotent).
I am not 100% sure of the above explanation. Please point out if my understanding is incorrect.
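The difference between the two resume behaviours described above can be sketched with a toy model. This is plain Kotlin, not BizTalk code. `Orchestration`, `runNonAtomic`, `resumeNonAtomic` and `resumeAtomic` are made-up names, and the model simply assumes the persistence behaviour as this answer describes it.

```kotlin
// Toy model of an orchestration instance; names are illustrative, not BizTalk APIs.
class Orchestration(private val callDb: () -> String) {
    // A fault received from the port becomes part of the persisted state.
    var pendingFault: String? = null
        private set
    var result: String? = null
        private set

    // Non-atomic call: the outcome of the call, including a fault, is persisted.
    fun runNonAtomic() {
        try { result = callDb() } catch (e: Exception) { pendingFault = e.message }
    }

    // Resume continues from the persisted state: a stored fault is re-thrown
    // and the call is not repeated.
    fun resumeNonAtomic() {
        pendingFault?.let { throw IllegalStateException(it) }
    }

    // Atomic scope: nothing inside it was persisted, so resume re-runs the call.
    fun resumeAtomic() {
        result = callDb()
    }
}

fun main() {
    var credentialsFixed = false
    val callDb = { if (credentialsFixed) "ok" else throw IllegalStateException("access denied") }

    val orch = Orchestration(callDb)
    orch.runNonAtomic()                                        // fails; the fault is persisted
    credentialsFixed = true                                    // the administrator fixes the credentials
    println(runCatching { orch.resumeNonAtomic() }.isFailure)  // true: the old fault is re-thrown
    orch.resumeAtomic()                                        // the call is made again
    println(orch.result)                                       // ok
}
```

Note that `resumeAtomic` repeats the database call, which is only safe when the call is idempotent.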
This scenario is not handled by Microsoft BizTalk = middleware FAIL.
You have to solve this at the orchestration design stage up front...
http://seroter.wordpress.com/2007/01/02/orchestration-handling-of-suspended-messages/

Resources