Replaying flow from last checkpoint error - corda

Please note that my node is part of Corda TestNet and it's deployed on Google Cloud Platform.
I'm using the following:
1. Corda OS 4.1 (updated to 4.3-RC01)
2. Tokens SDK 1.1-SNAPSHOT (updated to 1.1-RC01)
3. OkHttp 3.5.0
I have a flow that does the following (a rough sketch follows the list):
1. Make an HTTP call (using OkHttp, as per the HTTP example from the samples repo); during my local tests I got an error that the Client, Request, and Response objects cannot be serialized, so I nullified those references and it worked.
2. Fetch the latest version of my EvolvableTokenType
3. Use that version to create new FungibleTokens
4. Call IssueTokens sub-flow
5. Sleep for one millisecond (to force a checkpoint); I removed this step and still got the same error.
6. Call UpdateEvolvableToken sub-flow (my token type has a field that tracks the number of issued tokens, so after each issuance I update that field).
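For illustration, a rough sketch of a flow with this shape. The class, field, and variable names here are placeholders rather than my actual code; only the Tokens SDK flow names and DSL helpers come from the SDK. The point is that the OkHttp objects live in local variables before the first suspension, so the checkpoint serializer never sees them.

import co.paralleluniverse.fibers.Suspendable
import com.r3.corda.lib.tokens.contracts.utilities.heldBy
import com.r3.corda.lib.tokens.contracts.utilities.issuedBy
import com.r3.corda.lib.tokens.contracts.utilities.of
import com.r3.corda.lib.tokens.workflows.flows.rpc.IssueTokens
import com.r3.corda.lib.tokens.workflows.flows.rpc.UpdateEvolvableToken
import net.corda.core.flows.FlowLogic
import net.corda.core.flows.StartableByRPC
import net.corda.core.transactions.SignedTransaction
import okhttp3.OkHttpClient
import okhttp3.Request
import java.time.Duration

@StartableByRPC
class IssueWithHttpCheckFlow(private val apiUrl: String) : FlowLogic<SignedTransaction>() {
    @Suspendable
    override fun call(): SignedTransaction {
        // 1. HTTP call: client/request/response stay local, so nothing unserializable
        //    can end up in a checkpoint.
        val body = OkHttpClient().newCall(Request.Builder().url(apiUrl).build())
                .execute().use { it.body()?.string() }
        logger.info("External API returned: $body")

        // 2. Fetch the latest version of the evolvable token type
        //    (MyEvolvableTokenType is a stand-in for the real class).
        val tokenStateAndRef = serviceHub.vaultService
                .queryBy(MyEvolvableTokenType::class.java).states.single()
        val tokenType = tokenStateAndRef.state.data

        // 3 + 4. Create fungible tokens against that version and issue them.
        val token = 1_000 of tokenType.toPointer(MyEvolvableTokenType::class.java) issuedBy ourIdentity heldBy ourIdentity
        val stx = subFlow(IssueTokens(listOf(token)))

        // 5. Brief sleep to force a checkpoint (removing it makes no difference to the error).
        sleep(Duration.ofMillis(1))

        // 6. Update the issued-count field on the evolvable token type (field name assumed).
        subFlow(UpdateEvolvableToken(tokenStateAndRef, tokenType.copy(issued = tokenType.issued + 1_000)))
        return stx
    }
}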
This all works when I run my flow tests or when I run the flow locally from the node terminal (against H2 and Postgres DB).
I deployed this CorDapp to GCP (Google Cloud Platform) and was able to run a different flow from that app, but when I run this flow, I get the error below:
net.corda.node.services.statemachine.FlowTimeoutException: replaying flow from the last checkpoint
at net.corda.node.services.statemachine.SingleThreadedStateMachineManager$scheduleTimeoutException$$inlined$with$lambda$1.run(SingleThreadedStateMachineManager.kt:638) ~[corda-node-4.1.jar:?]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[?:1.8.0_222]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_222]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) ~[?:1.8.0_222]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) ~[?:1.8.0_222]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_222]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_222]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_222]
I removed the HTTP call (fearing it was the issue), redeployed, and re-ran on GCP, and I still got that error.

Not sure, but for a start, Tokens SDK 1.1 should be used with Corda 4.3 :)
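To spell that out, a minimal Gradle sketch of the version alignment (Kotlin DSL; the configuration names come from the cordapp Gradle plugin, and you would substitute the exact RC coordinates and repositories you are actually pulling 4.3-RC01 and 1.1-RC01 from):

// build.gradle.kts sketch: keep the platform and the Tokens SDK on matching release lines.
val cordaReleaseVersion = "4.3"
val tokensReleaseVersion = "1.1"

dependencies {
    // Corda platform the CorDapp compiles against
    "cordaCompile"("net.corda:corda-core:$cordaReleaseVersion")
    "cordaRuntime"("net.corda:corda-node-api:$cordaReleaseVersion")
    // Tokens SDK artifacts, kept on the same version line
    "cordapp"("com.r3.corda.lib.tokens:tokens-contracts:$tokensReleaseVersion")
    "cordapp"("com.r3.corda.lib.tokens:tokens-workflows:$tokensReleaseVersion")
}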

Related

My clone of draw.io fails doGet when authenticating to Google Drive

We have a clone of draw.io. We have made some modifications to the UI for increased functionality. This works on localhost. We then did the Google Drive authentication process, but it was done on version 12.1.7 of draw.io. That integrated with Google Drive, except when re-opening a previously saved file. [The error with that version is discussed here: https://stackoverflow.com/questions/62889536/no-access-control-allow-origin-header-is-present-on-the-requested-resource]
So we ported our changes to the recent version of draw.io, version 13.4.5 (https://github.com/jgraph/drawio/releases/tag/v13.4.5). Now we get a different error. When trying to authenticate <screen shot of pop-up just before error> we get:
"Login attempts failed. Please try again later" <screenshot of error message>. When we look at developer tools, we see this:
HTTP Status 500 - Can't make API call memcache.Set in a thread that is neither the original request thread nor a thread created by ThreadManager
type Exception report
message Can't make API call memcache.Set in a thread that is neither the original request thread nor a thread created by ThreadManager
description The server encountered an internal error that prevented it from fulfilling this request.
exception
com.google.apphosting.api.ApiProxy$CallNotFoundException: Can't make API call memcache.Set in a thread that is neither the original request thread nor a thread created by ThreadManager
com.google.apphosting.api.ApiProxy$CallNotFoundException.foreignThread(ApiProxy.java:800)
com.google.apphosting.api.ApiProxy$1.get(ApiProxy.java:175)
com.google.apphosting.api.ApiProxy$1.get(ApiProxy.java:172)
com.google.appengine.api.utils.FutureWrapper.get(FutureWrapper.java:89)
com.google.appengine.api.memcache.MemcacheServiceImpl.quietGet(MemcacheServiceImpl.java:26)
com.google.appengine.api.memcache.MemcacheServiceImpl.put(MemcacheServiceImpl.java:69)
com.google.appengine.api.memcache.stdimpl.GCache.put(GCache.java:157)
com.mxgraph.online.AbsAuthServlet.doGet(Unknown Source)
javax.servlet.http.HttpServlet.service(HttpServlet.java:622)
javax.servlet.http.HttpServlet.service(HttpServlet.java:729)
org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:52)
note The full stack trace of the root cause is available in the Apache Tomcat/8.0.32 (Ubuntu) logs.
Notice the line referencing com.mxgraph.online.AbsAuthServlet: it says doGet is throwing the error. I am new to servlets and don't know how to proceed; I did not find doGet in the file AbsAuthServlet.java.
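From what I can tell, the error message itself states the constraint: on App Engine standard, App Engine APIs such as memcache may only be called from the original request thread or from threads created through ThreadManager. A minimal Kotlin sketch of that pattern, just to illustrate the constraint (the helper name and keys are made up; this is not drawio code):

import com.google.appengine.api.ThreadManager
import com.google.appengine.api.memcache.MemcacheServiceFactory
import java.util.concurrent.Executors

// Threads built by ThreadManager's request thread factory are allowed to call App Engine APIs.
private fun cachePutOnRequestThreadFactory(key: String, value: Any) {
    val executor = Executors.newSingleThreadExecutor(ThreadManager.currentRequestThreadFactory())
    executor.submit {
        // Would throw CallNotFoundException if run on an arbitrary background thread.
        MemcacheServiceFactory.getMemcacheService().put(key, value)
    }.get()  // wait, because request-scoped threads must finish before the request ends
    executor.shutdown()
}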
Any help to solve this issue will be great.

InvalidPidMappingException in KafkaListener BeginTransaction

We have been getting lots of InvalidPidMappingException failures recently.
I came across https://github.com/spring-projects/spring-kafka/issues/660, but that issue is specific to producer-only publishing in a transaction.
In our case this happens on the consumer side, so it cannot be resolved with retries (it keeps failing on each rollback retry).
There's a fix in Kafka to address this exception, but it probably won't be released for another four months (v2.5.0). https://github.com/apache/kafka/pull/7115
Is there a way we can recover from these failures?
One option we're considering is to use a ContainerAwareErrorHandler to restart the listener container when one of these two specific exceptions occurs. Any issues with this approach? What would happen if another listener is processing another message at the same time?
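A rough Kotlin sketch of what we have in mind (the class name is ours; ContainerAwareErrorHandler and the container stop/start lifecycle are from spring-kafka):

import org.apache.kafka.clients.consumer.Consumer
import org.apache.kafka.clients.consumer.ConsumerRecord
import org.apache.kafka.common.errors.InvalidPidMappingException
import org.springframework.kafka.listener.ContainerAwareErrorHandler
import org.springframework.kafka.listener.MessageListenerContainer
import java.util.concurrent.Executors

// Restart the container only when the failure chain contains InvalidPidMappingException;
// everything else is rethrown so the normal rollback/retry path still applies.
class RestartOnInvalidPidMappingErrorHandler : ContainerAwareErrorHandler {
    private val restarter = Executors.newSingleThreadExecutor()

    override fun handle(thrownException: Exception, records: List<ConsumerRecord<*, *>>?,
                        consumer: Consumer<*, *>?, container: MessageListenerContainer) {
        var cause: Throwable? = thrownException
        while (cause != null && cause !is InvalidPidMappingException) cause = cause.cause
        if (cause == null) throw thrownException
        // Stop/start from another thread; stopping synchronously from the consumer thread can block.
        restarter.execute {
            container.stop()
            container.start()  // a restarted container gets a fresh transactional producer
        }
    }
}

The handler would be registered on the listener container factory; whether concurrent listeners on the same container lose in-flight work during the restart is exactly the open question above.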
org.apache.kafka.common.KafkaException: Cannot execute transactional method because we are in an error state
at org.apache.kafka.clients.producer.internals.TransactionManager.maybeFailWithError(TransactionManager.java:924)
at org.apache.kafka.clients.producer.internals.TransactionManager.beginTransaction(TransactionManager.java:290)
at org.apache.kafka.clients.producer.KafkaProducer.beginTransaction(KafkaProducer.java:642)
at org.springframework.kafka.core.DefaultKafkaProducerFactory$CloseSafeProducer.beginTransaction(DefaultKafkaProducerFactory.java:597)
at org.springframework.kafka.core.ProducerFactoryUtils.getTransactionalResourceHolder(ProducerFactoryUtils.java:101)
at org.springframework.kafka.transaction.KafkaTransactionManager.doBegin(KafkaTransactionManager.java:160)
at com.projectdrgn.common.messaging.config.KafkaProducerConfig$kafkaTransactionManager$kafkaTransactionManager$1.doBegin(KafkaProducerConfig.kt:65)
at org.springframework.transaction.support.AbstractPlatformTransactionManager.getTransaction(AbstractPlatformTransactionManager.java:376)
at org.springframework.transaction.support.TransactionTemplate.execute(TransactionTemplate.java:137)
at org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer.recordAfterRollback(KafkaMessageListenerContainer.java:1463)
at org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer.invokeRecordListenerInTx(KafkaMessageListenerContainer.java:1438)
at org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer.invokeRecordListener(KafkaMessageListenerContainer.java:1398)
at org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer.invokeListener(KafkaMessageListenerContainer.java:1165)
at org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer.pollAndInvoke(KafkaMessageListenerContainer.java:949)
at org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer.run(KafkaMessageListenerContainer.java:884)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.kafka.common.KafkaException: org.apache.kafka.common.errors.InvalidPidMappingException: The producer attempted to use a producer id which is not currently assigned to its transactional id.
at org.apache.kafka.clients.producer.internals.TransactionManager$AddPartitionsToTxnHandler.handleResponse(TransactionManager.java:1204)
at org.apache.kafka.clients.producer.internals.TransactionManager$TxnRequestHandler.onComplete(TransactionManager.java:1069)
at org.apache.kafka.clients.ClientResponse.onComplete(ClientResponse.java:109)
at org.apache.kafka.clients.NetworkClient.completeResponses(NetworkClient.java:561)
at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:553)
at org.apache.kafka.clients.producer.internals.Sender.runOnce(Sender.java:307)
at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:238)
... 1 common frames omitted
Caused by: org.apache.kafka.common.errors.InvalidPidMappingException: The producer attempted to use a producer id which is not currently assigned to its transactional id.

Firebase Strictmode resource leak

After converting from Crashlytics via Fabric to Crashlytics via Firebase, I started seeing the call stack below in debug runs where StrictMode is enabled to look for resource leaks.
StrictMode is in use with this code, only on debug builds:
StrictMode.setVmPolicy(new StrictMode.VmPolicy.Builder()
        .detectLeakedClosableObjects()
        .penaltyLog()
        .build());
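What I'm considering for now is filtering out this one known violation on API 28+ instead of loosening detection altogether; a Kotlin sketch (the io.fabric package check is my guess at what uniquely identifies this stack):

import android.os.Build
import android.os.StrictMode
import android.util.Log
import java.util.concurrent.Executors

fun installDebugVmPolicy() {
    if (Build.VERSION.SDK_INT >= Build.VERSION_CODES.P) {
        StrictMode.setVmPolicy(
            StrictMode.VmPolicy.Builder()
                .detectLeakedClosableObjects()
                .penaltyListener(Executors.newSingleThreadExecutor()) { violation ->
                    val stack = Log.getStackTraceString(violation)
                    if (!stack.contains("io.fabric.sdk.android")) {
                        Log.w("StrictMode", violation)  // still surface leaks that are actually ours
                    }
                }
                .build()
        )
    }
}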
I'm using this version of Fabric's Gradle tools in the project-level Gradle file:
classpath "io.fabric.tools:gradle:1.27.0"
and these versions of Firebase and Crashlytics in the module-level Gradle file:
implementation "com.google.firebase:firebase-core:16.0.7"
implementation "com.crashlytics.sdk.android:crashlytics:2.9.8"
During initialization, the Firebase instrumentation kicks off a background thread that makes settings calls using okhttp. When it does, StrictMode produces this call stack:
W/CrashlyticsCore: Received null settings, skipping report submission!
D/StrictMode: StrictMode policy violation: android.os.strictmode.LeakedClosableViolation: A resource was acquired at attached stack trace but never released. See java.io.Closeable for information on avoiding resource leaks.
at android.os.StrictMode$AndroidCloseGuardReporter.report(StrictMode.java:1786)
at dalvik.system.CloseGuard.warnIfOpen(CloseGuard.java:264)
at java.util.zip.Inflater.finalize(Inflater.java:398)
at java.lang.Daemons$FinalizerDaemon.doFinalize(Daemons.java:250)
at java.lang.Daemons$FinalizerDaemon.runInternal(Daemons.java:237)
at java.lang.Daemons$Daemon.run(Daemons.java:103)
at java.lang.Thread.run(Thread.java:764)
Caused by: java.lang.Throwable: Explicit termination method 'end' not called
at dalvik.system.CloseGuard.open(CloseGuard.java:221)
at java.util.zip.Inflater.<init>(Inflater.java:114)
at com.android.okhttp.okio.GzipSource.<init>(GzipSource.java:62)
at com.android.okhttp.internal.http.HttpEngine.unzip(HttpEngine.java:473)
at com.android.okhttp.internal.http.HttpEngine.readResponse(HttpEngine.java:648)
at com.android.okhttp.internal.huc.HttpURLConnectionImpl.execute(HttpURLConnectionImpl.java:471)
at com.android.okhttp.internal.huc.HttpURLConnectionImpl.getResponse(HttpURLConnectionImpl.java:407)
at com.android.okhttp.internal.huc.HttpURLConnectionImpl.getResponseCode(HttpURLConnectionImpl.java:538)
at com.android.okhttp.internal.huc.DelegatingHttpsURLConnection.getResponseCode(DelegatingHttpsURLConnection.java:105)
at com.android.okhttp.internal.huc.HttpsURLConnectionImpl.getResponseCode(HttpsURLConnectionImpl.java:26)
at io.fabric.sdk.android.services.network.HttpRequest.code(HttpRequest.java:1357)
at io.fabric.sdk.android.services.settings.DefaultSettingsSpiCall.handleResponse(DefaultSettingsSpiCall.java:104)
at io.fabric.sdk.android.services.settings.DefaultSettingsSpiCall.invoke(DefaultSettingsSpiCall.java:88)
at io.fabric.sdk.android.services.settings.DefaultSettingsController.loadSettingsData(DefaultSettingsController.java:90)
at io.fabric.sdk.android.services.settings.DefaultSettingsController.loadSettingsData(DefaultSettingsController.java:67)
at io.fabric.sdk.android.services.settings.Settings.loadSettingsData(Settings.java:153)
at io.fabric.sdk.android.Onboarding.retrieveSettingsData(Onboarding.java:126)
at io.fabric.sdk.android.Onboarding.doInBackground(Onboarding.java:99)
at io.fabric.sdk.android.Onboarding.doInBackground(Onboarding.java:45)
at io.fabric.sdk.android.InitializationTask.doInBackground(InitializationTask.java:63)
at io.fabric.sdk.android.InitializationTask.doInBackground(InitializationTask.java:28)
at io.fabric.sdk.android.services.concurrency.AsyncTask$2.call(AsyncTask.java:311)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:458)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1167)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:641)
at java.lang.Thread.run(Thread.java:764) 
D/FA: Event not sent since app measurement is disabled
I see this happening on pretty much every debug run of my app, during initial activity startup. However, support claims they don't see this.
Does anyone know what conditions cause Fabric to kick off this onboarding/settings thread?
I've seen similar StrictMode call stacks in these other posts. I can't tell if this leak is in Fabric code, or in the okhttp library they are using. Here are links to similar cases where people are seeing what looks to me like the same underlying resource leak:
StrictMode penalising for Firebase Ads
Crashlytics with StrictMode enabled (detect all) gives "untagged socket detected"
https://github.com/cloudant/sync-android/issues/577

Corda Don't Know About Party - Notary

I'm using Corda 3.1 with a self compiled version of the obligation cordapp example. The environment has a spring boot network map service deployed with party nodes and a notary node deployed to multiple AWS EC2 instances. Each node's persistence is backed by its own schema in a postgres database.
I'm running into the following exception when starting the IssueObligation.kt flow (IOU) from the internal web server:
[INFO ] 2018-06-07T14:27:01,751Z [Node thread-1] flow.[d04d24bf-5aa7-472a-b336-2e72feff6abf].initiateSession - Initiating flow session with party O=Notary, L=Dover, C=US. Session id for tracing purposes is SessionId(toLong=7742727399076294852). {}
[WARN ] 2018-06-07T14:27:01,776Z [Node thread-1] flow.[d04d24bf-5aa7-472a-b336-2e72feff6abf].run - Terminated by unexpected exception {}
java.lang.IllegalArgumentException: Don't know about party O=Notary, L=Dover, C=US
at net.corda.node.services.statemachine.StateMachineManagerImpl.sendSessionMessage(StateMachineManagerImpl.kt:616) ~[corda-node-3.1-corda.jar:?]
at net.corda.node.services.statemachine.StateMachineManagerImpl.processSendRequest(StateMachineManagerImpl.kt:582) ~[corda-node-3.1-corda.jar:?]
at net.corda.node.services.statemachine.StateMachineManagerImpl.processIORequest(StateMachineManagerImpl.kt:569) ~[corda-node-3.1-corda.jar:?]
at net.corda.node.services.statemachine.StateMachineManagerImpl.access$processIORequest(StateMachineManagerImpl.kt:63) ~[corda-node-3.1-corda.jar:?]
at net.corda.node.services.statemachine.StateMachineManagerImpl$initFiber$2.invoke(StateMachineManagerImpl.kt:444) ~[corda-node-3.1-corda.jar:?]
at net.corda.node.services.statemachine.StateMachineManagerImpl$initFiber$2.invoke(StateMachineManagerImpl.kt:63) ~[corda-node-3.1-corda.jar:?]
at net.corda.node.services.statemachine.FlowStateMachineImpl$suspend$2.write(FlowStateMachineImpl.kt:507) ~[corda-node-3.1-corda.jar:?]
at co.paralleluniverse.fibers.Fiber$3.run(Fiber.java:1994) ~[quasar-core-0.7.9-jdk8.jar:0.7.9]
at co.paralleluniverse.fibers.Fiber.exec(Fiber.java:824) [quasar-core-0.7.9-jdk8.jar:0.7.9]
at co.paralleluniverse.fibers.RunnableFiberTask.doExec(RunnableFiberTask.java:100) [quasar-core-0.7.9-jdk8.jar:0.7.9]
at co.paralleluniverse.fibers.RunnableFiberTask.run(RunnableFiberTask.java:91) [quasar-core-0.7.9-jdk8.jar:0.7.9]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_171]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_171]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) [?:1.8.0_171]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) [?:1.8.0_171]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_171]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_171]
at net.corda.node.utilities.AffinityExecutor$ServiceAffinityExecutor$1$thread$1.run(AffinityExecutor.kt:62) [corda-node-3.1-corda.jar:?]
There isn't any other exception from the flow that points to exactly where this occurs, but it does happen after the party node successfully interacts with the other party node to issue the IOU. The network map service knows about the notary, as it reports it in its list of registered nodes.
The party node knows about the notary because we don't see a failure from the flow when it executes this line:
val firstNotary get() = serviceHub.networkMapCache.notaryIdentities.firstOrNull() ?: throw FlowException("No available notary.")
Just trying to find out what steps I should take to troubleshoot the problem.
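One check I can run from inside a flow is whether the notary's identity actually resolves on the issuing node; a small sketch, with the X.500 name copied from the error message:

import net.corda.core.identity.CordaX500Name

// Inside any FlowLogic on the issuing node:
val notaryName = CordaX500Name.parse("O=Notary,L=Dover,C=US")
val nodeInfo = serviceHub.networkMapCache.getNodeByLegalName(notaryName)           // null => no node-info for the notary
val identity = serviceHub.identityService.wellKnownPartyFromX500Name(notaryName)   // null => identity not registered locally
logger.info("notary node-info=$nodeInfo identity=$identity notaries=${serviceHub.networkMapCache.notaryIdentities}")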
I have a similar setup and this has worked for me using dev mode.
Root cause: the network parameters file is simply a file signed by the dev CA that endorses a signedNodeInfo, i.e. it states that the node with key X is the valid notary. The error is thrown because your notary's existing key on the VM may be X, while the network parameters file distributed to the other nodes endorses key Y.
So the notary cannot prove to the other nodes that it, holding key X, is the valid notary.
What the bootstrapper does:
1. Requests the notary node to regenerate new dev certificates/keys.
2. Requests the notary node to sign its node info (generated from its node.conf) with those keys, producing nodeInfo-${hash}.
3. Uses the dev CA to sign over the notary-signed nodeInfo-${hash}, producing the network parameters file.
To fix the problem:
Redistribute the nodeInfo-${hash} files (optional), assuming the notary and all nodes already have an existing certificates folder:
1. Delete all nodeInfo-${hash} files from all nodes.
2. Shut down all nodes and drop the node_identities table.
3. Run java -jar corda.jar --just-generate-node-info to regenerate the nodeInfo-${hash} file.
4. Redistribute the node infos.
Regenerate the network parameters file, forcing bootstrapper.jar to use the notary's existing key:
1. Delete the network parameters file from all nodes.
2. Copy your notary's existing certificates folder and place it locally in the structure shown below.
3. Run java -jar network-bootstrapper.jar folder/
4. Redistribute the generated network parameters file.
5. Restart all nodes.
folder/                    // root folder containing the notary
└── notary_node             // the notary's folder name (must be renamed this way)
    ├── certificates        // the notary's existing certificates folder with the keys
    └── notary_node.conf    // the notary config (must be renamed this way)
After changing the Obligation example to match the IOU example, we were able to get this to work. It turns out that setting a time window on the transaction triggers some kind of communication with the notary, and that is what causes the error. I do not know why.
.setTimeWindow(serviceHub.clock.instant(), 30.seconds)
Although removing this line allows an obligation (IOU) to be written to the ledger, the line appears to be required for the subsequent settlement command with the notary; as it stands, we always see an insufficient-funds error even though cash has been issued. So this is still not fully answered.
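For context, a rough sketch of where such a line sits in the issue transaction (the state and command names are abbreviated from the obligation sample and partly assumed):

import net.corda.core.transactions.TransactionBuilder
import net.corda.core.utilities.seconds

// Removing setTimeWindow avoids the "Don't know about party" error; keeping it reproduces it.
val builder = TransactionBuilder(notary = firstNotary)
        .addOutputState(obligation, ObligationContract.OBLIGATION_CONTRACT_ID)
        .addCommand(ObligationContract.Commands.Issue(), obligation.participants.map { it.owningKey })
        .setTimeWindow(serviceHub.clock.instant(), 30.seconds)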

"KinesisClientLibIOException: Shard is not closed" on a DynamoDB Stream

I have a DynamoDB table to which I added a Stream. I created a Lambda to process this stream and test throughput, latency, etc. After finishing my tests I deleted the lambda's trigger.
Then I proceeded to test the same table with the Python MultiLangDaemon client, to compare and also to see if it could pick up where the Lambda left off.
The Daemon starts processing shards and blows up with the exception below. Searching for it, I only found this answer, which did not apply. I tried deleting the DynamoDB table used to track the worker and letting the MultiLangDaemon recreate it; the same thing happened.
Why does this happen and how can I recover without losing my data in the stream?
SEVERE: Caught exception:
com.amazonaws.services.kinesis.clientlibrary.exceptions.internal.KinesisClientLibIOException: Shard [shardId-00000001500614265247-3b7a2849, shardId-00000001500628985464-c896556e] is not closed. This can happen if we constructed the list of shards while a reshard operation was in progress.
at com.amazonaws.services.kinesis.clientlibrary.lib.worker.ShardSyncer.assertClosedShardsAreCoveredOrAbsent(ShardSyncer.java:206)
at com.amazonaws.services.kinesis.clientlibrary.lib.worker.ShardSyncer.cleanupLeasesOfFinishedShards(ShardSyncer.java:652)
at com.amazonaws.services.kinesis.clientlibrary.lib.worker.ShardSyncer.syncShardLeases(ShardSyncer.java:141)
at com.amazonaws.services.kinesis.clientlibrary.lib.worker.ShardSyncer.checkAndCreateLeasesForNewShards(ShardSyncer.java:88)
at com.amazonaws.services.kinesis.clientlibrary.lib.worker.ShutdownTask.call(ShutdownTask.java:122)
at com.amazonaws.services.kinesis.clientlibrary.lib.worker.MetricsCollectingTaskDecorator.call(MetricsCollectingTaskDecorator.java:49)
at com.amazonaws.services.kinesis.clientlibrary.lib.worker.MetricsCollectingTaskDecorator.call(MetricsCollectingTaskDecorator.java:24)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
