Issue in using pagination in R3 Corda

We are querying the vault and fetching more than 200 records per node, but we get the following error:
Please specify a PageSpecification as there are more results [201] than the default page size [200]
After increasing the page size from the default 200 to 400, I get a Java heap out-of-memory error.
Can you please help?

You can increase the heap size available to the node by following the instructions here: https://docs.corda.net/running-a-node.html#starting-an-individual-corda-node.
If you continue to get an out-of-memory exception, you should either increase the memory allocated to the Java process further, or inspect your node's database to see if the transaction objects are abnormally large.
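One way to give the node process a larger heap is via the node's configuration. As a sketch (the exact mechanism depends on your Corda version; check the linked page), the documented custom.jvmArgs block in node.conf passes JVM arguments such as -Xmx to the node:

```
custom = {
    // Hypothetical value; size to your workload and available RAM.
    jvmArgs = [ "-Xmx4g" ]
}
```

Alternatively, the flags can be passed directly when launching the node JAR from the command line.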

You can also add pagination logic to fetch a large number of records page by page. Please see this link:
https://docs.corda.net/docs/corda-os/4.5/api-vault-query.html
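The paging pattern from the linked docs can be sketched language-agnostically: request one page at a time until the reported total is covered, so no single result set holds the whole vault in memory. A minimal Python sketch, where query_page is a hypothetical stand-in for a vault query with a PageSpecification (Corda's real API is JVM-side):

```python
DEFAULT_PAGE_SIZE = 200  # Corda's default vault page size

def query_page(all_states, page_number, page_size):
    """Hypothetical stand-in for a vault query with a PageSpecification.
    Page numbers start at 1, as with Corda's DEFAULT_PAGE_NUM.
    Returns (page of results, total number of results available)."""
    start = (page_number - 1) * page_size
    return all_states[start:start + page_size], len(all_states)

def query_all(all_states, page_size=DEFAULT_PAGE_SIZE):
    """Fetch every record page by page instead of in one oversized query."""
    results, page_number = [], 1
    while True:
        page, total = query_page(all_states, page_number, page_size)
        results.extend(page)
        if page_number * page_size >= total:
            break
        page_number += 1
    return results

# 450 dummy records -> pages of 200, 200, and 50
records = list(range(450))
assert query_all(records) == records
```

The point is that the page size stays at a bounded value (here the default 200) regardless of how many records exist, which avoids both the PageSpecification error and the heap pressure of one huge result set.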

Related

How to prevent high memory consumption caused by an AQL query returning a large result set?

In our Artifactory Pro 7.38 instance I discovered very high memory usage that I hadn't seen before in Artifactory 6. I now have a memory dump whose stack trace reveals the cause of the memory consumption: when a certain AQL query is used to filter all artifacts by a date, the JDBC result set seems to become very large (20+ million items). While there are probably options to limit the result, I wonder how I can protect the instance against such situations. Is there a way to generally limit the size of the result set in terms of number of results? I read that there is at least support for passing a limit along with the AQL query, but is there something that can be done on the server side, such as enforcing pagination?
In Artifactory version 7.41.x, an improvement was added that lets the system kill long-running AQL queries exactly for this scenario, to avoid performance issues.
By default, the system will kill any query that runs for more than 15 minutes. If you want to change this default, you can add the following property to the system.properties file:
artifactory.aql.query.timeout.seconds - the query timeout for AQL; the default is 15 minutes (900 seconds)
In addition, as you mentioned, the query itself may be improvable. I recommend reading the wiki page on Limits and Pagination.
I hope this clarifies and helps.
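Building on the Limits and Pagination suggestion, the client can page through AQL results rather than pulling everything in one query. A Python sketch, where run_aql is a hypothetical stand-in for POSTing the query to Artifactory's AQL search endpoint (here it just simulates a repository of 1234 records), and .offset()/.limit() are the AQL clauses the wiki page describes:

```python
import re

PAGE_SIZE = 500  # assumed page size; tune to your instance

def run_aql(query):
    """Hypothetical stand-in for submitting an AQL query and
    returning the 'results' list. Simulates 1234 artifact records."""
    offset = int(re.search(r"\.offset\((\d+)\)", query).group(1))
    limit = int(re.search(r"\.limit\((\d+)\)", query).group(1))
    return list(range(1234))[offset:offset + limit]

def find_all_items(page_size=PAGE_SIZE):
    """Page through AQL results with .offset()/.limit() so the server
    never materializes the full result set for one request."""
    results, offset = [], 0
    while True:
        page = run_aql(
            f'items.find({{"type":"file"}}).offset({offset}).limit({page_size})')
        results.extend(page)
        if len(page) < page_size:
            break
        offset += page_size
    return results

assert len(find_all_items()) == 1234
```

Each request now has a bounded JDBC result set, which is the server-side pressure the question is about.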

AWS Neptune Gremlin query slowness on cold call

I'm currently running some queries with a big performance gap between the first call (up to 2 minutes) and the following ones (around 5 seconds).
This duration difference can be seen through the Gremlin REST API in both execution and profile mode.
As the query loads a big amount of data, I suspect the issue comes from Neptune's caching behaviour in its default configuration. I was not able to find any way to improve this through configuration and would be glad to have some advice on reducing the duration of the first call.
Context:
The Neptune database runs on a db.r5.8xlarge instance, and during execution the CPU always stays below 20%. I'm also the only user on this instance during the tests.
As we don't have differential inputs, the database is recreated on a weekly basis and switched to production once the loader has loaded everything. Our database therefore has a short lifetime.
The database contains slightly above 1,000,000,000 nodes and far more edges (probably around 10,000,000,000). Those edges are split across 10 edge labels, and most of them are not used in the current query.
Query :
// recordIds is a list of 50 ids.
g.V(recordIds).hasLabel("record")
// Convert local id to Neptune id.
.out('local_id')
// Go to the tree parent link (either myself if the edge comes back, or the real parent).
.bothE('tree_top_parent').inV()
// Clean duplicates.
.dedup()
// Follow the tree parent link backward to get all children; this step loads a big number of nodes belonging to the same tree.
.in('tree_top_parent')
.not(values('some flag').is('Q'))
// Limit not reached; the result is between 80k and 100k nodes.
.limit(200000)
// Convert back to local id for the 80k to 100k selected nodes.
.in('local_id')
.id()
Neptune's architecture comprises a shared cluster "volume" (where all data is persisted, replicated six times across three availability zones) and a series of decoupled compute instances (one writer and up to 15 read replicas in a single cluster). No data is persisted on the instances; however, approximately 65% of an instance's memory capacity is reserved for a buffer pool cache. As data is read from the underlying cluster volume, it is stored in the buffer pool cache until the cache fills. Once the cache fills, a least-recently-used (LRU) eviction policy clears buffer pool cache space for newer reads.
It is common for first reads to be slower due to the need to fetch objects from the underlying storage. One can improve this by issuing "prefetch" queries that pull in objects likely to be needed in the near future.
If you have a use case that is filling buffer pool cache and constantly seeing buffer pool cache misses (a metric one can see in the CloudWatch metrics for Neptune), then you may also want to consider using one of the "d" instance types (ex: r5d.8xlarge) and enabling the Lookup Cache feature [1]. This feature specifically focuses on improving access to property values/literals at query time by keeping them in a directly attached NVMe store on the instance.
[1] https://docs.aws.amazon.com/neptune/latest/userguide/feature-overview-lookup-cache.html
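The cold/warm gap described above can be illustrated with a toy LRU buffer pool. This is a sketch of the caching pattern only, not Neptune's actual implementation:

```python
from collections import OrderedDict

class BufferPool:
    """Toy LRU cache standing in for an instance's buffer pool cache:
    the first access to a page is a slow storage read (miss), a repeat
    access within capacity is served from memory (hit)."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = OrderedDict()
        self.misses = 0

    def read(self, page_id):
        if page_id in self.pages:
            self.pages.move_to_end(page_id)   # mark as recently used
            return "hit"
        self.misses += 1                      # slow fetch from cluster volume
        self.pages[page_id] = True
        if len(self.pages) > self.capacity:
            self.pages.popitem(last=False)    # evict least-recently-used page
        return "miss"

pool = BufferPool(capacity=100)
cold = [pool.read(i) for i in range(50)]   # first query run: all storage reads
warm = [pool.read(i) for i in range(50)]   # second run: served from cache
assert cold.count("miss") == 50 and warm.count("hit") == 50
```

This is why the first call pays minutes while the second pays seconds: the working set is already resident, until it is evicted by newer reads.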

Why does heap usage increase continuously?

I created an API that just returns the string "OK".
I tested it with the following configuration.
I monitored it with VisualVM and noticed that heap usage increases continuously.
Eventually, the JVM performs a GC.
Question:
Why does heap usage increase continuously?
Hello, I guess other objects are created inside Spring MVC, such as ThreadLocals.
Better than guessing, you can actually use jvisualvm to find what objects are on the heap via memory sampling. In the profiling section you can see the number of instances of each class and their total size. You can take snapshots while your load test runs and analyze them later.
You can even take a heap dump and analyze it with tools like Eclipse Memory Analyzer.

Why is Gremlin Server / JanusGraph ignoring some of my requests?

I'm using the Gremlin Python library to perform traversals against a JanusGraph deployment of Gremlin Server (the same also happens using just TinkerGraph). Some long traversals (with thousands of instructions) get no response: no errors, no timeouts, no log entries or errors on the server or client. Nothing.
The conditions for this silent treatment aren't clear. The described behaviour doesn't depend linearly on bytes or the number of instructions. For instance, this code hangs forever for me:
# gremlin_python imports needed for the snippet below
from gremlin_python.process.anonymous_traversal import traversal
from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection

g = traversal().withRemote(DriverRemoteConnection('ws://localhost:8182/gremlin', 't'))
g = g.inject("")
for i in range(0, 8000):
    g = g.constant("test")
print(f"submitting traversal with length={len(g.bytecode.step_instructions)}")
result = g.next()
print(f"done, got: {result}")  # this line is never reached
It doesn't depend on just the number of bytes in the request message, since the number of instructions beyond which I get no response doesn't change even with very large constant values in place of just "test". For instance, injecting 7,000 values each containing many paragraphs of Lorem Ipsum works as expected and returns in a few milliseconds.
While it shouldn't matter (since I should be getting a proper error instead of nothing), I've already increased server-side maxHeaderSize, maxChunkSize, maxContentLength etc. to ridiculously high numbers. Changing the serialization format (e.g. from GraphSONMessageSerializerV3d0 to GraphBinaryMessageSerializerV1) doesn't help either.
Note: I know that very long traversals are an anti-pattern in Gremlin, but sometimes it's not possible or very inefficient to structure traversals such that they can use injected values instead.
I've answered this question on gremlin-users not realizing it was also asked here on StackOverflow. For completeness, I'll duplicate my response here.
The issue is less related to bytes and string lengths and more to the length of the traversal chain (i.e. the number of steps in your traversal). You end up hitting a JVM limit on the stack size on the server. You can increase the stack size on the JVM by raising the -Xss value, which should allow you a longer traversal length. That will likely come with the need to re-examine other JVM settings like -Xmx and perhaps garbage collection options.
I do find it interesting that you don't get any error messages though - you should see a StackOverflowError somewhere, unless the server is just wholly bogged down by your request. I'd consider throwing more -Xmx at it to see if you can get it to respond with at least an error, or keep an eye on the server logs to see if it surfaces there.
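Besides raising -Xss, a workaround (my suggestion, not part of the answer above) is to cap the traversal chain length on the client by splitting the values into batches and submitting several shorter traversals, e.g. one g.inject(batch).unfold() per batch instead of 8,000 chained .constant() steps. The batching itself is plain Python:

```python
def batched(items, batch_size):
    """Split a long list of values into batches so that each submitted
    traversal stays below the chain length that overflows the server stack."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

values = [f"test-{i}" for i in range(8000)]
batches = list(batched(values, 1000))
# Each batch would be sent as one short traversal; the batch size (1000 here)
# is an assumed value to be tuned against the server's -Xss setting.
assert len(batches) == 8 and all(len(b) == 1000 for b in batches)
```

This trades one huge request for several small ones, which also sidesteps the "no response at all" failure mode since each request stays well within limits.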

H2 DB persistence.mv.db file size increases even on data cleanup in a CorDapp

The persistence.mv.db file size increases even after wiping out old data. Once the size exceeds 71 MB, we get a handshake timeout (Netty connection) and the nodes stop responding to REST services.
We have cleared data from tables like NODE_MESSAGE_IDS and NODE_OUR_KEY_PAIRS (which grow large due to a lot of hopping between six nodes and the generation of temporary key pairs per session), and similarly many other tables, e.g. NODE_TRANSACTIONS. Even after clearing them, the size increases.
Also, when we declare:
val session = serviceHub.jdbcSession()
session.autoCommit is false every time. I also tried setting it to true and executing SQL queries, but that did not decrease the database size.
This is in reference to the same project. We solved the pagination issue by removing data from the tables, but the DB size still increases, so it is not completely solved:
Buffer overflow issue when rows in vault is more than 200
There might be an issue with your flows, as the node is doing a lot of checkpointing.
Besides that, I cannot think of any other scenario that would cause the database to grow constantly.
