GridGain node discovery and GridCache - data grid

I start the GridGain node using G.start(gridConfiguration), and the node automatically joins the existing nodes. After this I start loading the GridCache (which is configured to be LOCAL).
This works fine, but is there a way to access the GridCache without doing the G.start(gridConfiguration)? I would like to load the LOCAL cache first and only have the node detected by other nodes once the cache has been loaded successfully.

You need to have GridGain started in order to use its APIs. After the grid is started, you can access the cache using the GridGain.grid().cache(...) method.
What you can do, for example, is use a distributed count down latch (GridCacheCountDownLatch), which works just like the java.util.concurrent.CountDownLatch class. You can then have the other nodes wait on the latch while your local cache is loading. Once loading is done, you call latch.countDown() and the other nodes will be able to proceed.
More information on the count down latch, as well as the other concurrent data structures in GridGain, can be found in the documentation.
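For illustration, a rough sketch of that pattern against a GridGain 6-style API follows. Note that the latch has to live in a distributed cache (e.g. replicated or partitioned), not in the LOCAL one, and that the cache name, latch name, loadLocalCache() helper and exact factory method below are assumptions that may differ in your version:

// On the node that loads the LOCAL cache:
GridCacheCountDownLatch latch = GridGain.grid().cache("replicatedCache")
    .dataStructures().countDownLatch("localCacheLoaded", 1, false, true); // name, count, autoDelete, create
loadLocalCache();   // hypothetical helper containing your own loading logic
latch.countDown();  // signal that loading has finished

// On the nodes that should wait for it:
GridCacheCountDownLatch latch = GridGain.grid().cache("replicatedCache")
    .dataStructures().countDownLatch("localCacheLoaded", 1, false, true);
latch.await();      // proceed only once the loading node has counted down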

Related

Google Cloud VM instance can't start and cannot be moved because of terminated state

I wanted to resize the RAM and CPU of my machine, so I stopped the VM instance, and when I tried to start it again I got an error:
The zone 'projects/freesarkarijobalerts/zones/asia-south1-a' does not have enough resources available to fulfill the request. Try a different zone, or try again later.
Here you can see the screenshot.
I tried to start the VM instance again today, but the result was the same and I got the same error message:
The zone 'projects/freesarkarijobalerts/zones/asia-south1-a' does not have enough resources available to fulfill the request. Try a different zone, or try again later.
Then I tried to move my instance to a different zone, but I got an error message:
sarkarijobalerts123#cloudshell:~ (freesarkarijobalerts)$ gcloud compute instances move wordpress-2-vm --zone=asia-south1-a --destination-zone=asia-south1-b
Moving gce instance wordpress-2-vm...failed.
ERROR: (gcloud.compute.instances.move) Instance cannot be moved while in state: TERMINATED
My website has been DOWN for a couple of days, please help me.
The standard procedure is to create a snapshot of the stopped VM instance's disk [1] and then create a new disk and instance from it in another zone [2].
[1] https://cloud.google.com/compute/docs/disks/create-snapshots
[2] https://cloud.google.com/compute/docs/disks/restore-and-delete-snapshots#restore_a_snapshot_of_a_persistent_disk_to_a_new_disk
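For example, the snapshot-and-restore procedure looks roughly like this with gcloud (assuming the boot disk carries the instance's name; the disk, snapshot and instance names here are placeholders, so check the linked docs for your exact case):

# Snapshot the stopped instance's boot disk in the exhausted zone
gcloud compute disks snapshot wordpress-2-vm --zone=asia-south1-a --snapshot-names=wp-snap
# Recreate the disk from the snapshot in another zone
gcloud compute disks create wordpress-2-vm-b --zone=asia-south1-b --source-snapshot=wp-snap
# Boot a new instance in that zone from the restored disk
gcloud compute instances create wordpress-2-vm-b --zone=asia-south1-b --disk=name=wordpress-2-vm-b,boot=yes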
Let's have a look at the cause of this issue:
When you stop an instance, it releases some resources, like vCPU and memory.
When you start an instance, it requests those resources back, and if there are not enough resources available in the zone you'll get an error message:
Error: The zone 'projects/freesarkarijobalerts/zones/asia-south1-a' does not have enough resources available to fulfill the request. Try a different zone, or try again later.
More information is available in the documentation:
If you receive a resource error (such as ZONE_RESOURCE_POOL_EXHAUSTED
or ZONE_RESOURCE_POOL_EXHAUSTED_WITH_DETAILS) when requesting new
resources, it means that the zone cannot currently accommodate your
request. This error is due to Compute Engine resource obtainability,
and is not due to your Compute Engine quota.
Resource availability depends on user demand and is therefore dynamic.
There are a few ways to solve your issue:
Move your instance to another zone by following the snapshot-and-restore instructions above.
Wait for a while and try to start your VM instance again.
Reserve resources for your VM by following the documentation, to avoid such issues in the future (an example command follows the quote below):
Create reservations for Virtual Machine (VM) instances in a specific
zone, using custom or predefined machine types, with or without
additional GPUs or local SSDs, to ensure resources are available for
your workloads when you need them. After you create a reservation, you
begin paying for the reserved resources immediately, and they remain
available for your project to use indefinitely, until the reservation
is deleted.
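A reservation along these lines would hold capacity in the zone (the reservation name, VM count and machine type below are placeholders):

gcloud compute reservations create wp-reservation --zone=asia-south1-a --vm-count=1 --machine-type=n1-standard-2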

Spring Kafka batch within time window

A Spring Boot environment listening to Kafka topics (@KafkaListener / @StreamListener).
The listener factory is configured to operate in batch mode:
ConcurrentKafkaListenerContainerFactory # setBatchListener
or via application.properties:
spring.kafka.listener.type=batch
How can I configure the framework so that, given two numbers N and T, it tries to fetch N records for the listener but won't wait more than T seconds, like described here: https://doc.akka.io/docs/akka/2.5/stream/operators/Source-or-Flow/groupedWithin.html
Some properties I've looked at:
max-poll-records ensures you won't get more than N records in a batch
fetch-min-size get at least this amount of data in a fetch request
fetch-max-wait but don't wait more than necessary
idleBetweenPolls just sleep a bit between polls :)
It seems like fetch-min-size combined with fetch-max-wait should do it but they compare bytes, not messages/records.
It is obviously possible to implement that by hand; I'm asking whether it's possible to configure Spring to do that for me.
It seems like fetch-min-size combined with fetch-max-wait should do it but they compare bytes, not messages/records.
That is correct, unfortunately, Kafka provides no mechanism such as fetch.min.records.
I don't anticipate that Spring would layer this functionality on top of the kafka-clients; it would be better to ask for a new feature in Kafka itself.
Spring does not manipulate the records returned from the poll at all, except that you can now specify subBatchPerPartition to get batches containing just one partition, in order to properly support zombie fencing when using exactly-once read/process/write.
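If you do roll it by hand, a minimal sketch could look like the following. This is not a Spring Kafka feature; the topic name, N, T and the process() method are placeholders, and it deliberately ignores offset management (with default container commits, records buffered across polls can be lost on a crash):

import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;

@Component
public class GroupedWithinListener {

    private static final int N = 500;                         // flush after N records ...
    private static final Duration T = Duration.ofSeconds(5);  // ... or after T, whichever comes first

    private final List<ConsumerRecord<String, String>> buffer = new ArrayList<>();
    private Instant windowStart = Instant.now();

    // Batch mode enabled via spring.kafka.listener.type=batch
    @KafkaListener(topics = "my-topic")
    public synchronized void onBatch(List<ConsumerRecord<String, String>> records) {
        if (buffer.isEmpty()) {
            windowStart = Instant.now();   // the window starts with the first buffered record
        }
        buffer.addAll(records);
        if (buffer.size() >= N || Duration.between(windowStart, Instant.now()).compareTo(T) >= 0) {
            flush();
        }
    }

    // Requires @EnableScheduling; flushes a non-empty buffer once T has passed with no new records
    @Scheduled(fixedDelay = 1000)
    public synchronized void flushIfStale() {
        if (!buffer.isEmpty() && Duration.between(windowStart, Instant.now()).compareTo(T) >= 0) {
            flush();
        }
    }

    private void flush() {
        process(new ArrayList<>(buffer));  // hand the grouped batch to your own logic
        buffer.clear();
    }

    private void process(List<ConsumerRecord<String, String>> batch) {
        // placeholder for the actual handling
    }
}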

How to resolve celery.backends.rpc.BacklogLimitExceeded error

I am using Celery with Flask. After working for a good long while, my Celery setup is showing a celery.backends.rpc.BacklogLimitExceeded error.
My config values are below:
CELERY_BROKER_URL = 'amqp://'
CELERY_TRACK_STARTED = True
CELERY_RESULT_BACKEND = 'rpc'
CELERY_RESULT_PERSISTENT = False
Can anyone explain why the error is appearing and how to resolve it?
I have checked the docs here, which don't provide any resolution for the issue.
Possibly because the process consuming the results is not keeping up with the process that is producing them? This can result in a large number of unprocessed results building up - this is the "backlog". When the size of the backlog exceeds an arbitrary limit, BacklogLimitExceeded is raised by Celery.
You could try adding more consumers to process the results, or setting a shorter value for the result_expires setting.
The discussion on this closed celery issue may help:
Seems like the database backends would be a much better fit for this purpose.
The amqp/RPC result backends need to send one message per state update, while for the database-based backends (redis, sqla, django, mongodb, cache, etc.) every new state update overwrites the old one.
The "amqp" result backend is not recommended at all since it creates one queue per task, which is required to mimic the database based backends where multiple processes can retrieve the result.
The RPC result backend is preferred for RPC-style calls where only the process that initiated the task can retrieve the result.
But if you want persistent multi-consumer result you should store them in a database.
Using RabbitMQ as a broker and Redis for results is a great combination, but using an SQL database for results works well too.
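In your configuration that would just be a backend swap plus an expiry, for example (the Redis URL is a placeholder, and in Celery 4+ lowercase-style settings the expiry option is named result_expires):

CELERY_RESULT_BACKEND = 'redis://localhost:6379/0'  # database-style backend instead of 'rpc'
CELERY_TASK_RESULT_EXPIRES = 3600                   # drop stored results after an hour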

How can the data of a Corda node be deleted without restarting the node?

When running Corda nodes for testing or demo purposes, I often find a need to delete all the node's data and start it again.
I know I can do this by:
Shutting down the node process
Deleting the node's persistence.mv.db file and artemis folder
Starting the node again
However, I would like to know if it is possible to delete the node's data without restarting the node, as this would be much faster.
It is not currently possible to delete the node's data without restarting the node.
If you are "resetting" the nodes for testing purposes, you should make sure that you are using the Corda testing APIs to allow your contracts and flows to be tested without actually starting a node. See the testing API docs here: https://docs.corda.net/api-testing.html.
One alternative to restarting the nodes would be to put the demo environment in a VMware Workstation VM, take a snapshot of the VM while the nodes are still "clean", run the demo, and then reload the snapshot.

How to lock node for deletion process

Within Alfresco, I want to delete a node, but I don't want it to be used by any other user in a cluster environment while I do so.
I know that I can use LockService to lock the node (in a cluster environment), as in the following lines:
lockService.lock(deleteNode);
nodeService.deleteNode(deleteNode);
lockService.unlock(deleteNode);
The last line may throw an exception because the node has already been deleted, and indeed the exception it raises is:
A system error happened during the operation: Node does not exist: workspace://SpacesStore/cb6473ed-1f0c-4fa3-bfdf-8f0bc86f3a12
So how can I handle concurrency in a cluster environment when deleting a node, so that two users cannot work on the same node at the same time when one wants to update it and the other wants to delete it?
Depending on your cluster environment (e.g. the same DB server used by all Alfresco instances), transactions will most likely be enough to ensure no stale content is used:
serverA(readNode)
serverB(deleteNode)
serverA(updateNode) <--- transaction failure
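A sketch of that transactional variant, assuming the standard transactionService and the nodeService/deleteNode names from the question:

// Run the delete in its own (new) read-write transaction; a concurrent update on
// another node will then fail its transaction instead of touching a deleted node.
RetryingTransactionHelper txHelper = transactionService.getRetryingTransactionHelper();
txHelper.doInTransaction(() -> {
    if (nodeService.exists(deleteNode)) {
        nodeService.deleteNode(deleteNode);
    }
    return null;
}, false, true);  // readOnly = false, requiresNew = true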
The JobLockService allows more control in case of more complex operations, which might involve multiple, dynamic nodes (or no nodes at all, e.g. sending emails or similar):
serverA(acquireLock)
serverB(acquireLock) <--- wait for the lock to be released
serverA(readNode1)
serverA(if something then updateNode2)
serverA(updateNode1)
serverA(releaseLock)
serverB(readNode2)
serverB(releaseLock)
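In code, that JobLockService pattern looks roughly like this (the lock QName and time-to-live are arbitrary example values):

QName lockQName = QName.createQName(NamespaceService.CONTENT_MODEL_1_0_URI, "deleteNodeJob");
String lockToken = jobLockService.getLock(lockQName, 30000L);  // throws LockAcquisitionException if already held
try {
    if (nodeService.exists(deleteNode)) {
        nodeService.deleteNode(deleteNode);
    }
} finally {
    jobLockService.releaseLock(lockToken, lockQName);
}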
