Corda - IllegalArgumentException: Don't know about party

We are seeing this error coming back from an initiating node:
I 17:49:50+0000 [Node thread-1] flow.[a3694ae6-ff1e-482e-af51-81cde48dbb94].initiateSession - Initiating flow session with party O=Notary, L=London, C=GB. Session id for tracing purposes is SessionId(toLong=3291272982884783111). {}
W 17:49:50+0000 [Node thread-1] flow.[a3694ae6-ff1e-482e-af51-81cde48dbb94].run - Terminated by unexpected exception {}
java.lang.IllegalArgumentException: Don't know about party O=Notary, L=London, C=GB
Is there a reason that the initiating node can't find the notary?

You can check the node's local network map cache via the CRaSH shell: run the command run networkMapSnapshot and check that the node can see the notary.
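A minimal check from the node's interactive (CRaSH) shell, assuming you have shell access to the initiating node, would look like the sketch below; the comments describe what each command should return:
// Dump the node's view of the network map; the notary's X.500 name should appear here.
run networkMapSnapshot
// List the notaries this node knows about (should include O=Notary, L=London, C=GB).
run notaryIdentities
If the notary is missing, make sure its node-info has been distributed to the node (for example via the network map server or the additional-node-infos directory).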

Related

Node thread-1 transactions.TransactionBuilder

[ERROR] 17:33:56+0800 [Node thread-1] transactions.TransactionBuilder. - The transaction currently built will not validate because of an unknown error most likely caused by a missing dependency in the transaction attachments.

Cannot run flow in corda node

I am running a Corda network on Kubernetes (Corda version 4.4) and I am trying to install and run a CorDapp.
The CorDapp I am trying to run is the Heartbeat one (from the GitHub Corda samples folder).
But whenever I try to start the flow using the command start StartHeartbeatFlow
I get the following error message:
[INFO] 11:00:32+0200 [pool-2-thread-11] shell.StartShellCommand.main - Executing command "start StartHeartbeatFlow <no arguments>",
start StartHeartbeatFlow: exception: com.heartbeat.StartHeartbeatFlow
Tue Apr 07 11:00:32 CEST 2020>>> [ERROR] 11:00:32+0200 [pool-2-thread-11] command.CRaSHSession.execute - Error while evaluating request 'start StartHeartbeatFlow' start StartHeartbeatFlow: exception: com.heartbeat.StartHeartbeatFlow [errorCode=1oe81or, moreInformationAt=https://errors.corda.net/OS/4.4/1oe81or]
Which doesn't really help me figure out how to solve the issue :/
Running flow list does list StartHeartbeatFlow, so it's not an issue with the installation of the CorDapp...
Has anyone encountered the same kind of issue?
Thanks!
Edit: Here are the logs from the Corda node when I execute the flow start StartHeartbeatFlow command.
corda#corda-node-corda-node-0:~/logs$ tail -f corda-node.log | grep -A 10 -B 10 "heartbeat"
[DEBUG] 2020-04-07T13:21:09,767Z [Thread-8 (ActiveMQ-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$5#2f4a89fa)] realm.AuthenticatingRealm. - Looked up AuthenticationInfo [rpcuser] from doGetAuthenticationInfo
[DEBUG] 2020-04-07T13:21:09,767Z [Thread-8 (ActiveMQ-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$5#2f4a89fa)] realm.AuthenticatingRealm. - AuthenticationInfo caching is disabled for info [rpcuser]. Submitted token: [org.apache.shiro.authc.UsernamePasswordToken - rpcuser, rememberMe=false].
[DEBUG] 2020-04-07T13:21:09,767Z [Thread-8 (ActiveMQ-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$5#2f4a89fa)] credential.SimpleCredentialsMatcher. - Performing credentials equality check for tokenCredentials of type [[C and accountCredentials of type [java.lang.String]
[DEBUG] 2020-04-07T13:21:09,767Z [Thread-8 (ActiveMQ-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$5#2f4a89fa)] credential.SimpleCredentialsMatcher. - Both credentials arguments can be easily converted to byte arrays. Performing array equals comparison
[DEBUG] 2020-04-07T13:21:09,767Z [Thread-8 (ActiveMQ-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$5#2f4a89fa)] authc.AbstractAuthenticator. - Authentication successful for token [org.apache.shiro.authc.UsernamePasswordToken - rpcuser, rememberMe=false]. Returned account [rpcuser]
[DEBUG] 2020-04-07T13:21:09,768Z [Thread-8 (ActiveMQ-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$5#2f4a89fa)] artemis.BrokerJaasLoginModule. - Login for rpcuser succeeded
[DEBUG] 2020-04-07T13:21:09,771Z [Thread-12 (ActiveMQ-client-global-threads)] rpc.RPCServer. - -> RPC by rpcuser -> registeredFlows
[DEBUG] 2020-04-07T13:21:09,772Z [Thread-12 (ActiveMQ-client-global-threads)] rpc.RPCServer. - Arguments: [] {actor_id=rpcuser, actor_owning_identity=OU=Regular Node, O=organization, L=Brussels, C=BE, actor_store_id=NODE_CONFIG, invocation_id=aac88106-cf60-4c63-b1b9-c5fac224b89a, invocation_timestamp=2020-04-07T13:21:09.771Z, origin=rpcuser, session_id=df6cc401-6f9f-41b5-9a18-790c28e33b06, session_timestamp=2020-04-07T13:11:30.204Z}
[DEBUG] 2020-04-07T13:21:09,772Z [rpc-server-handler-pool-1] realm.AuthorizingRealm. - No authorizationCache instance set. Checking for a cacheManager... {actor_id=rpcuser, actor_owning_identity=OU=Regular Node, O=organization, L=Brussels, C=BE, actor_store_id=NODE_CONFIG, invocation_id=aac88106-cf60-4c63-b1b9-c5fac224b89a, invocation_timestamp=2020-04-07T13:21:09.771Z, origin=rpcuser, session_id=df6cc401-6f9f-41b5-9a18-790c28e33b06, session_timestamp=2020-04-07T13:11:30.204Z}
[DEBUG] 2020-04-07T13:21:09,772Z [rpc-server-handler-pool-1] realm.AuthorizingRealm. - No cache or cacheManager properties have been set. Authorization cache cannot be obtained. {actor_id=rpcuser, actor_owning_identity=OU=Regular Node, O=organization, L=Brussels, C=BE, actor_store_id=NODE_CONFIG, invocation_id=aac88106-cf60-4c63-b1b9-c5fac224b89a, invocation_timestamp=2020-04-07T13:21:09.771Z, origin=rpcuser, session_id=df6cc401-6f9f-41b5-9a18-790c28e33b06, session_timestamp=2020-04-07T13:11:30.204Z}
[DEBUG] 2020-04-07T13:21:09,773Z [rpc-server-sender] rpc.RPCServer. - <- RPC <- RpcReply(id=10ea96d9-5c19-4200-a64e-1eb3903835ce, timestamp: 2020-04-07T13:21:09.748Z, entityType: Invocation, result=Success([com.heartbeat.StartHeartbeatFlow, net.corda.core.flows.ContractUpgradeFlow$Authorise, net.corda.core.flows.ContractUpgradeFlow$Deauthorise, net.corda.core.flows.ContractUpgradeFlow$Initiate]), deduplicationIdentity=9c974c01-08af-44d0-bdef-c609efee11a8)
[DEBUG] 2020-04-07T13:21:09,775Z [Thread-62 (ActiveMQ-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$5#2f4a89fa)] artemis.BrokerJaasLoginModule. - Processing login for SystemUsers/NodeRPC
[DEBUG] 2020-04-07T13:21:09,775Z [Thread-62 (ActiveMQ-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$5#2f4a89fa)] artemis.BrokerJaasLoginModule. - Login for SystemUsers/NodeRPC succeeded
[DEBUG] 2020-04-07T13:21:09,809Z [Network Map Updater Thread-1] pool.PoolBase. - HikariPool-1 - Reset (autoCommit) on connection org.postgresql.jdbc.PgConnection#6fa7296c
[DEBUG] 2020-04-07T13:21:10,560Z [RxIoScheduler-2] network.NodeInfoWatcher. - pollDirectory /opt/corda/additional-node-infos
[DEBUG] 2020-04-07T13:21:10,560Z [RxIoScheduler-2] network.NodeInfoWatcher. - Examining /opt/corda/additional-node-infos/nodeInfo-FEBE485DF04D12B91F70740AC3EDDDB1A0C5058B017C6DD6046A1AF37AB1687D
[DEBUG] 2020-04-07T13:21:10,560Z [RxIoScheduler-2] network.NodeInfoWatcher. - Read 0 NodeInfo files from /opt/corda/additional-node-infos
[DEBUG] 2020-04-07T13:21:10,560Z [RxIoScheduler-2] network.NodeInfoWatcher. - Number of removed NodeInfo files 0
[DEBUG] 2020-04-07T13:21:11,824Z [Network Map Updater Thread-1] pool.PoolBase. - HikariPool-1 - Reset (autoCommit) on connection org.postgresql.jdbc.PgConnection#6fa7296c
[DEBUG] 2020-04-07T13:21:13,844Z [Network Map Updater Thread-1] pool.PoolBase. - HikariPool-1 - Reset (autoCommit) on connection org.postgresql.jdbc.PgConnection#6fa7296c
[DEBUG] 2020-04-07T13:21:15,559Z [RxIoScheduler-2] network.NodeInfoWatcher. - pollDirectory /opt/corda/additional-node-infos
Can you update your question with more stack trace? Are there any other errors in your node's log file? I'm asking because I just tried the example and it worked for me.
Here's what I did:
Built the Java version:
// Browse to Java files.
cd /heartbeat/contracts-java
// Build the nodes (Notary and PartyA).
./../gradlew deployNodes
Start the nodes (I don't like using runnodes task, so I start each node individually):
// Terminal 1 (Notary).
cd /heartbeat/contracts-java/build/nodes/Notary
// Start the Notary.
java -jar corda.jar
// Terminal 2 (PartyA).
cd /heartbeat/contracts-java/build/nodes/PartyA
// Start PartyA.
java -jar corda.jar
Start the flow inside of PartyA's terminal. Notice that I use flow start instead of just start (like in your case); it's worth trying flow start, even though both should work:
flow start StartHeartbeatFlow
You will see that the flow completed (i.e. it created the SchedulableState that will start the flow again, which will lead to an endless loop until you shut down the node).
Now I can watch that flow being called again and again by typing the below in PartyA's terminal:
flow watch
I could invoke the flow from the standalone shell. I was having some weird issues with the /cordapp folder holding my CorDapp locally; I deleted it and recreated it, and now it works.

Initiating flow session failed

When we try to submit a transaction, it always fails. Below is the last error message we got from the Corda log. Can anyone help explain what this error means, and how should I troubleshoot it further? Thanks.
[INFO ] 2018-08-24T07:49:19,739Z [Node thread-1] flow.[c833dc79-501e-4484-9c43-a6924b472542].initiateSession - Initiating flow session with party O=CompanyC, L=Paris, C=FR. Session id for tracing purposes is SessionId(toLong=4256917187941908080). {}
[WARN ] 2018-08-24T07:50:01,777Z [Messaging DLGQRf63MNQ2zpywoVzUZ3eBVB4Yp5oaA5aYSogUwzuCCA] messaging.P2PMessagingClient.sendWithRetry - Reached the maximum number of retries (3) for message ClientMessageImpl[messageID=0, durable=true, address=internal.peers.DL2zA4g5QWv3dzx985Q9PMcvrNX4DUGv2pc7DcVjNgA8Hj,userID=null,properties=TypedProperties[platform-version=3,corda-vendor=Corda Open Source,release-version=3.2-corda,platform-topic=platform.session,_AMQ_DUPL_ID=8473dd65-96e3-4a45-8076-92016a03c56c]] redelivery to internal.peers.DL2zA4g5QWv3dzx985Q9PMcvrNX4DUGv2pc7DcVjNgA8Hj {}
[WARN ] 2018-08-24T07:50:01,808Z [Messaging DLGQRf63MNQ2zpywoVzUZ3eBVB4Yp5oaA5aYSogUwzuCCA] messaging.P2PMessagingClient.sendWithRetry - Reached the maximum number of retries (3) for message ClientMessageImpl[messageID=0, durable=true, address=internal.peers.DL2zA4g5QWv3dzx985Q9PMcvrNX4DUGv2pc7DcVjNgA8Hj,userID=null,properties=TypedProperties[platform-version=3,corda-vendor=Corda Open Source,release-version=3.2-corda,platform-topic=platform.session,_AMQ_DUPL_ID=66467ea0-56b9-4655-8311-f0806bf7fa97]] redelivery to internal.peers.DL2zA4g5QWv3dzx985Q9PMcvrNX4DUGv2pc7DcVjNgA8Hj {}
This error would occur if one node cannot reach another node, e.g. due to the node being down or incorrect firewall settings. Use a tool to check whether you can reach the receiving node's messaging port from the sending node's machine.
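For example, assuming the receiving node advertises its P2P address as companyc.example.com:10005 (a placeholder; take the real host and port from that node's node-info or the p2pAddress in its node.conf), you could test reachability from the sending node's machine with standard tools:
// Either of these will tell you whether the TCP port is reachable.
telnet companyc.example.com 10005
nc -vz companyc.example.com 10005
If the connection is refused or times out, check the firewall rules and the p2pAddress the receiving node advertises.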

Processing is hanging at notary

Currently, I am using Corda V3.1 and there is one issue whose root cause I could not figure out. The error occurs when the application processes a transaction; it hangs at the last step in the logs below:
>> Verifying contractCode constraints.
>> Signing transaction with our private key.
>> Collecting signatures from counterparties.
>> Done
>> Obtaining notary signature and recording transaction.
>> Requesting signature by notary service
>> Requesting signature by Notary service (hangs here)
I didn't make any changes, but it stopped working. From the log, I could see:
[INFO ] 2018-06-10T07:06:35,287Z [main] BasicInfo.printBasicNodeInfo - Node for "Notary" started up and registered in 42.91 sec {}
[INFO ] 2018-06-10T07:06:40,305Z [RxIoScheduler-2] network.PersistentNetworkMapCache.addNode - Adding node with info: NodeInfo(addresses=[[2002:aafc:ce75:1007:34eb:f37b:e811:c350]:10005], legalIdentitiesAndCerts=[O=CompanyA, L=London, C=GB], platformVersion=3, serial=1528610763747) {}
[INFO ] 2018-06-10T07:06:40,336Z [RxIoScheduler-2] network.PersistentNetworkMapCache.addNode - Previous node was identical to incoming one - doing nothing {}
[INFO ] 2018-06-10T07:06:40,336Z [RxIoScheduler-2] network.PersistentNetworkMapCache.addNode - Done adding node with info: NodeInfo(addresses=[[2002:aafc:ce75:1007:34eb:f37b:e811:c350]:10005], legalIdentitiesAndCerts=[O=CompanyA, L=London, C=GB], platformVersion=3, serial=1528610763747) {}
[INFO ] 2018-06-10T07:06:40,336Z [RxIoScheduler-2] network.PersistentNetworkMapCache.addNode - Adding node with info: NodeInfo(addresses=[[2002:aafc:ce75:1007:34eb:f37b:e811:c350]:10008], legalIdentitiesAndCerts=[O=CompanyB, L=New York, C=US], platformVersion=3, serial=1528610765829) {}
[INFO ] 2018-06-10T07:06:40,352Z [RxIoScheduler-2] network.PersistentNetworkMapCache.addNode - Previous node was identical to incoming one - doing nothing {}
[INFO ] 2018-06-10T07:06:40,352Z [RxIoScheduler-2] network.PersistentNetworkMapCache.addNode - Done adding node with info: NodeInfo(addresses=[[2002:aafc:ce75:1007:34eb:f37b:e811:c350]:10008], legalIdentitiesAndCerts=[O=CompanyB, L=New York, C=US], platformVersion=3, serial=1528610765829) {}
[INFO ] 2018-06-10T07:06:40,352Z [RxIoScheduler-2] network.PersistentNetworkMapCache.addNode - Adding node with info: NodeInfo(addresses=[[2002:aafc:ce75:1007:34eb:f37b:e811:c350]:10002], legalIdentitiesAndCerts=[O=Notary, L=London, C=GB], platformVersion=3, serial=1528610765215) {}
[INFO ] 2018-06-10T07:06:40,352Z [RxIoScheduler-2] network.PersistentNetworkMapCache.addNode - Discarding older nodeInfo for O=Notary, L=London, C=GB {}
[INFO ] 2018-06-10T07:06:53,654Z [nioEventLoopGroup-2-1] netty.AMQPClient.operationComplete - Failed to connect to [2002:aafc:ce75:1007:34eb:f37b:e811:c350]:10005 {}
[INFO ] 2018-06-10T07:06:54,663Z [nioEventLoopGroup-2-2] netty.AMQPClient.run - Retry connect to [2002:aafc:ce75:1007:34eb:f37b:e811:c350]:10005 {}
[INFO ] 2018-06-10T07:07:15,687Z [nioEventLoopGroup-2-3] netty.AMQPClient.operationComplete - Failed to connect to [2002:aafc:ce75:1007:34eb:f37b:e811:c350]:10005 {}
[INFO ] 2018-06-10T07:07:16,696Z [nioEventLoopGroup-2-4] netty.AMQPClient.run - Retry connect to [2002:aafc:ce75:1007:34eb:f37b:e811:c350]:10005 {}
[INFO ] 2018-06-10T07:07:37,720Z [nioEventLoopGroup-2-5] netty.AMQPClient.operationComplete - Failed to connect to [2002:aafc:ce75:1007:34eb:f37b:e811:c350]:10005 {}
[INFO ] 2018-06-10T07:07:38,728Z [nioEventLoopGroup-2-6] netty.AMQPClient.run - Retry connect to [2002:aafc:ce75:1007:34eb:f37b:e811:c350]:10005 {}
[INFO ] 2018-06-10T07:07:59,747Z [nioEventLoopGroup-2-7] netty.AMQPClient.operationComplete - Failed to connect to [2002:aafc:ce75:1007:34eb:f37b:e811:c350]:10005 {}
[INFO ] 2018-06-10T07:08:00,747Z [nioEventLoopGroup-2-8] netty.AMQPClient.run - Retry connect to [2002:aafc:ce75:1007:34eb:f37b:e811:c350]:10005 {}
[INFO ] 2018-06-10T07:08:21,768Z [nioEventLoopGroup-2-9] netty.AMQPClient.operationComplete - Failed to connect to [2002:aafc:ce75:1007:34eb:f37b:e811:c350]:10005 {}
[INFO ] 2018-06-10T07:08:22,779Z [nioEventLoopGroup-2-10] netty.AMQPClient.run - Retry connect to [2002:aafc:ce75:1007:34eb:f37b:e811:c350]:10005 {}
The last two log messages repeat again and again. The only way I have found to resolve it is to clean and re-deploy the nodes, but that is surely not the right approach. Is anyone able to help with this? Thanks a lot.
It's not clear from your description exactly how you were running your Corda nodes.
The issue is that the Corda nodes are having trouble communicating with each other, but it's not clear why. If everything was running on localhost, then this is really strange.
If you're running these in the cloud, then I'd try regenerating your node configuration, or take another look at the network map node, as it has definitely gotten wonky.
It could also be that the CorDapp you're trying to run is making mistakes when it executes on the nodes or the notary.
You may have an easier time getting this to work with some of the newer developer samples, which would also tell you whether Corda updates have solved this problem.
The most basic sample, which basically always works, is the Yo! CorDapp (https://github.com/corda/samples-java/tree/master/Basic/yo-cordapp). Try running it to see whether you can isolate the problem to your flows or to Corda itself.
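If you want to try that, a minimal sequence of commands would look roughly like this (the repository URL is the one above; deployNodes and runnodes are the standard Gradle tasks used by the samples, so adjust the paths if the sample layout has changed):
// Fetch the samples and build a small test network for the Yo! CorDapp.
git clone https://github.com/corda/samples-java.git
cd samples-java/Basic/yo-cordapp
./gradlew deployNodes
// Start all the generated nodes (or start each node's corda.jar individually, as above).
./build/nodes/runnodes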

Huge lag for WSO2 SVN Synchronizer to sync with manager updates on cluster

I'm running a test environment with WSO2 APIM 1.10.0 on a VM on my Windows PC. It is configured to use a MySQL-compatible server (MariaDB, which I run on my PC as well). Everything was working OK.
Recently I wanted to try a WSO2 cluster environment by setting up 3 VM's on my PC:
The first one is running the publisher, store, KM, etc. (I'm using offset 1, so ports are 9444, 9764, etc.)
The other two each run a gateway worker ("guest" port 8243, mapped to "host" ports 8243 and 8943).
I'm also running VisualSVN server to synchronize between them all.
On the manager node, SVN synchronizer is configured as:
<DeploymentSynchronizer>
<Enabled>true</Enabled>
<AutoCommit>true</AutoCommit>
<AutoCheckout>true</AutoCheckout>
<RepositoryType>svn</RepositoryType>
<SvnUrl>https://10.0.2.2/svn/apigw/</SvnUrl>
<SvnUser>...</SvnUser>
<SvnPassword>...</SvnPassword>
<SvnUrlAppendTenantId>true</SvnUrlAppendTenantId>
</DeploymentSynchronizer>
And on the worker nodes:
<DeploymentSynchronizer>
<Enabled>true</Enabled>
<AutoCommit>false</AutoCommit>
<AutoCheckout>true</AutoCheckout>
<RepositoryType>svn</RepositoryType>
<SvnUrl>https://10.0.2.2/svn/apigw/</SvnUrl>
<SvnUser>...</SvnUser>
<SvnPassword>...</SvnPassword>
<SvnUrlAppendTenantId>true</SvnUrlAppendTenantId>
</DeploymentSynchronizer>
Axis2 is configured for clustering with:
Manager node on port 4500
Worker nodes on ports 4100 and 4200
I checked using telnet and all ports are accessible from all nodes.
Changes to APIs on the manager are committed correctly to SVN. I checked both in VisualSVN and with a command-line SVN client. For example, after adding the API ofer3, revision 11 was created and was seen by the command-line SVN tool:
> svn.exe revert .
> svn.exe update . -r HEAD --depth=infinity
Updating '.':
U -1234\synapse-configs\default\api\admin--ofer3_v1.0.0.xml
Updated to revision 11.
But it takes about 10 minutes before changes made on the manager node are propagated to the workers.
For example, adding the ofer2 API at the manager at 16:29:
TID: [-1234] [] [2017-03-07 16:29:01,156] INFO {org.apache.synapse.rest.API} - Initializing API: admin--ofer2:v1.0.0
TID: [-1234] [] [2017-03-07 16:29:16,104] INFO {org.wso2.carbon.core.deployment.CarbonDeploymentSchedulerTask} - Sent [SynchronizeRepositoryRequest{tenantId=-1234, tenantDomain='carbon.super', messageId=64959660-b2e6-4293-ad9c-3b0d68229976}]
It arrived at the worker at 16:34, 5 minutes later:
TID: [-1234] [] [2017-03-07 16:34:14,134] INFO {org.apache.synapse.rest.API} - Initializing API: admin--ofer2:v1.0.0
TID: [-1234] [] [2017-03-07 16:34:14,134] INFO {org.apache.synapse.deployers.APIDeployer} - API named 'admin--ofer2:v1.0.0' has been deployed from file : /AppMount/wso2worker-1.10.0/repository/deployment/server/synapse-configs/default/api/admin--ofer2_v1.0.0.xml
And many times it took even longer (9-10 minutes).
I turned on the synchronizer's debug logging on the worker, expecting to see it trying to sync with the SVN repository every few seconds, but I only saw it trying to do so every few minutes.
I also tried with:
<SynchronizationPeriod>1</SynchronizationPeriod>
But it did not change anything.
As for log messages:
On the worker log, I see:
TID: [-1234] [] [2017-03-07 15:07:31,431] ERROR {org.apache.catalina.loader.WebappClassLoader} - The web application [/api/am/publisher/v0.9] appears to have started a TimerThread named [Timer-8] via the java.util.Timer API but has failed to stop it. To prevent a memory leak, the timer (and hence the associated thread) has been forcibly canceled.
But /api/am/publisher/v0.9 is the Publisher's REST API, which is not related.
Nothing else in the log seems interesting.
Final note: changes to tenants are not propagated at all. I have tenant #1 on the manager, and I do see it in the SVN repository, but on the worker the directory /AppMount/wso2/repository/tenants is empty. Only changes to carbon.super [-1234] are propagated. I'm not sure if that's the same issue or something else.
Any ideas will be highly appreciated.
FOLLOW UP #1, based on input from Pubci
Time is synchronized between all three nodes
Domain is identical in all three nodes (I left the default value)
axis2.xml of the manager (10.0.2.2 is the address of the VM host, so it serves as the "bridge" from one VM to the other):
<parameter name="domain">wso2.am.domain</parameter>
<parameter name="membershipScheme">wka</parameter>
<parameter name="localMemberPort">4500</parameter>
<members>
<member><hostName>127.0.0.1</hostName><port>4500</port></member>
<member><hostName>10.0.2.2</hostName><port>4100</port></member>
<member><hostName>10.0.2.2</hostName><port>4200</port></member>
</members>
axis2.xml of worker node 1:
<parameter name="domain">wso2.am.domain</parameter>
<parameter name="membershipScheme">wka</parameter>
<parameter name="localMemberPort">4500</parameter>
<members>
<member><hostName>10.0.2.2</hostName><port>4500</port></member>
<member><hostName>127.0.0.1</hostName><port>4100</port></member>
<member><hostName>10.0.2.2</hostName><port>4200</port></member>
</members>
When the worker is coming up, it lists the following members:
TID: [-1234] [] [2017-03-08 09:40:39,450] INFO {org.wso2.carbon.core.clustering.hazelcast.util.MemberUtils} - Added member: Host:10.0.2.2, Remote Host:null, Port: 4500, HTTP:-1, HTTPS:-1, Domain: null, Sub-domain:null, Active:true
TID: [-1234] [] [2017-03-08 09:40:39,450] INFO {org.wso2.carbon.core.clustering.hazelcast.util.MemberUtils} - Added member: Host:127.0.0.1, Remote Host:null, Port: 4100, HTTP:-1, HTTPS:-1, Domain: null, Sub-domain:null, Active:true
TID: [-1234] [] [2017-03-08 09:40:39,451] INFO {org.wso2.carbon.core.clustering.hazelcast.util.MemberUtils} - Added member: Host:10.0.2.2, Remote Host:null, Port: 4200, HTTP:-1, HTTPS:-1, Domain: null, Sub-domain:null, Active:true
Note the "Domain: null" in the log. Is this ok?
When the worker is coming up, it synchronizes correctly with the SVN repository:
TID: [-1234] [] [2017-03-08 09:40:51,184] DEBUG {org.wso2.carbon.deployment.synchronizer.subversion.SVNNotifyListener} - revert /AppMount/wso2/repository/deployment/server
TID: [-1234] [] [2017-03-08 09:40:58,139] DEBUG {org.wso2.carbon.deployment.synchronizer.subversion.SVNNotifyListener} - update /AppMount/wso2/repository/deployment/server -r HEAD --depth=infinity
TID: [-1234] [] [2017-03-08 09:40:59,766] DEBUG {org.wso2.carbon.deployment.synchronizer.subversion.SVNNotifyListener} - notify.at
TID: [-1234] [] [2017-03-08 09:41:00,103] DEBUG {org.wso2.carbon.deployment.synchronizer.subversion.SVNBasedArtifactRepository} - files were updated to revision number: 15 using SVN Kit
From then on, every 15 seconds the Carbon scheduler task says it runs an SVN sync:
TID: [-1234] [] [2017-03-08 09:41:45,213] DEBUG {org.wso2.carbon.core.deployment.CarbonDeploymentSchedulerTask} - Running deployment synchronizer update... tenant : carbon.super
But the SVN synchronizer does not seem to update the files in synapse-configs under deployment/server.
You mentioned this is because the message from the manager does not reach the worker.
I do see the manager send a message:
TID: [-1234] [] [2017-03-08 08:49:48,121] INFO {org.wso2.carbon.core.deployment.CarbonDeploymentSchedulerTask} - Sent [SynchronizeRepositoryRequest{tenantId=-1234, tenantDomain='carbon.super', messageId=a99ff1fc-58d8-44dd-8804-491216ae1a7c}]
Which debug logging should I enable to see whether the message arrives at the worker?
For troubleshooting, you can check the following:
Clustering configuration in axis2.xml - as you are running multiple profiles, you need to cluster all 3 nodes as one cluster, so the domain name should be the same on all 3 nodes.
Time should be synchronized between all 3 nodes.
Once you publish an API, a cluster message is sent to the worker nodes; only then will the worker nodes pull the update from SVN.
Regarding the error message you got on the manager node, please check the AuthManager configuration in api-manager.xml. It looks like you have set the value to admin/services; that value should be the Key Manager node's hostname. In your case, it should be the hostname of the manager node.
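For reference, the relevant block of api-manager.xml would look roughly like the sketch below. The element names are from the stock AuthManager section; the hostname and the offset-1 port (9444) are assumptions based on your setup, so substitute your own values and credentials:
<AuthManager>
    <!-- Point this at the Key Manager / manager node, not at admin/services -->
    <ServerURL>https://manager-hostname:9444/services/</ServerURL>
    <Username>...</Username>
    <Password>...</Password>
</AuthManager>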
Thanks to the input from Pubci I found the issue.
a) In axis2.xml of both manager and workers, the localMemberHost must be 10.0.2.2 (this is the gateway from the VM to the other hosts) and not 127.0.0.1:
<parameter name="localMemberHost">10.0.2.2</parameter>
b) Also in axis2.xml I enabled groupManagement, which was disabled. In the manager node:
<groupManagement enable="true">
<applicationDomain name="wso2.apim.domain"
description="APIM group"
agent="org.wso2.carbon.core.clustering.hazelcast.HazelcastGroupManagementAgent"
subDomain="mgt"
port="2233"/>
</groupManagement>
In the worker node:
<groupManagement enable="true">
<applicationDomain name="wso2.apim.domain"
description="APIM group"
agent="org.wso2.carbon.core.clustering.hazelcast.HazelcastGroupManagementAgent"
subDomain="worker"
port="2233"/>
</groupManagement>
(I'm using port 2233 instead of 2222, which is the default, as port 2222 is used for other purposes in my cluster).
Now in the manager log I see:
INFO {org.wso2.carbon.core.clustering.hazelcast.wka.WKABasedMembershipScheme} -
Member joined [6bf6ae47-bea4-4bc4-beec-9140a626781b]: /10.0.2.2:4200
And in the worker, following API changes, I do see the message coming in, also for tenants other than carbon.super:
INFO {org.wso2.carbon.core.clustering.hazelcast.HazelcastClusterMessageListener} -
Received ClusteringMessage: SynchronizeRepositoryRequest{tenantId=1, tenantDomain='0000s7.com', messageId=a573eeef-46d7-4a2b-bfc9-362296bb60d4}
Tips for anyone having issues with SVN sync on a cluster:
Make sure the list of members that Hazelcast displays when WSO2 is coming up is correct.
Make sure you see "Member joined" messages in the manager log when the worker is coming up.
Make sure you see the "Received ClusteringMessage" log at the worker after changes at the manager.
Debugging options to help you out (these typically go in repository/conf/log4j.properties):
log4j.logger.org.wso2.carbon.core.deployment=DEBUG
log4j.logger.org.wso2.carbon.deployment.synchronizer=DEBUG
