I'm currently investigating Raft in dotNext and would like to move from the fairly simplistic example, which registers all the nodes in the cluster at startup, to using an announcer to notify the leader when a new node has joined.
To my understanding, this means that I should start the initial node in ColdStart mode, and subsequent nodes should then use the ClusterMemberAnnouncer to add themselves to the cluster, like this:
services.AddTransient<ClusterMemberAnnouncer<UriEndPoint>>(serviceProvider => async (memberId, address, cancellationToken) =>
{
// Register the node with the configuration storage
var configurationStorage = serviceProvider.GetService<IClusterConfigurationStorage<UriEndPoint>>();
if (configurationStorage == null)
throw new Exception("Unable to resolve the IClusterConfigurationStorage when adding the new node member");
await configurationStorage.AddMemberAsync(memberId, address, cancellationToken);
});
It makes sense to me that the nodes should use a shared/persisted configuration storage so that when the second node tries to start up and announce itself, it can see the first, cold-started active node in the cluster. However, if I use the documented services.UsePersistentConfigurationStorage("configurationStorage") approach and then run the nodes in separate console windows, i.e. separate processes, the second node understandably says:
The process cannot access the file 'C:\Projects\RaftTest\configurationStorage\active.list' because it is being used by another process.
Has anyone perhaps got an example of using an announcer in dotNext Raft?
And does anyone know the best way (hopefully with an example) to use persistent cluster configuration storage so that separate processes (potentially running in different docker containers) are able to access the active list?
We have our custom Change Feed processor deployed in a single region in AKS with 5 instances, and things have always run fine in that single region. (Please note that each pod instance (feed processor) is assigned a unique instance name via .WithInstanceName(new GUID).)
We recently moved to a multi-region setup as follows:
EastUS AKS Cluster = 5 pods (5 feed processor with each unique instance name)
WestUS AKS Cluster = 5 pods (5 feed processor with each unique instance name)
Now, with the above setup, the results are not very consistent: sometimes after the AKS service deployment our feed processor stops receiving events for some of the collections.
To fix this we eventually have to delete the lease collection, and then everything starts working again.
We cannot go live with this workaround, so we need help resolving the issue.
Here is the code snippet:
Container leaseContainer = cosmosClient.GetContainer(databaseName, leaseContainerName);
changeFeedProcessor = cosmosClient.GetContainer(databaseName, sourceContainerName)
.GetChangeFeedProcessorBuilder(processorName: sourceContainerName, async (IReadOnlyCollection<TContainer> changes, CancellationToken cancellationToken) => await onChangesDelegate(changes, cancellationToken))
.WithInstanceName($"{Guid.NewGuid()}")
.WithLeaseContainer(leaseContainer)
.Build();
where leaseContainerName = "container-lease"
The problem is that you are mixing instances. If you want each region to work independently (the same document change goes to both groups of processors independently), set a different processorName.
When you define a cluster of machines with a particular processorName, lease container, and monitored container, you define a Deployment Unit. The change feed events are distributed across those machines.
If you deploy 2 clusters with the same values, then the 10 pods are now the same Deployment Unit, so the changes are spread across the 10 pods, meaning that a particular change will land in one of the instances in one of the regions and the other region will not see it.
You could, for example, set the region name as the processorName:
Container leaseContainer = cosmosClient.GetContainer(databaseName, leaseContainerName);
changeFeedProcessor = cosmosClient.GetContainer(databaseName, sourceContainerName)
.GetChangeFeedProcessorBuilder(processorName: regionName, async (IReadOnlyCollection<TContainer> changes, CancellationToken cancellationToken) => await onChangesDelegate(changes, cancellationToken))
.WithInstanceName($"{Guid.NewGuid()}")
.WithLeaseContainer(leaseContainer)
.Build();
I have been reading This Book on page 58 to understand how to do asynchronous event integration between microservices.
Using RabbitMQ and publish/subscribe patterns facilitates pushing events out to subscribers. However, given microservice architectures and Docker usage, I expect to have more than one instance of a microservice 'type' running. From what I understand, all instances will subscribe to the event and therefore would all receive it.
The book doesn't clearly explain how to ensure only one of the instances handle the request.
I have looked into the duplication section, but that describes a pattern for deduplicating within a service instance, not necessarily across instances...
Each microservice instance would subscribe using something similar to:
public void Subscribe<T, TH>()
where T : IntegrationEvent
where TH : IIntegrationEventHandler<T>
{
var eventName = _subsManager.GetEventKey<T>();
var containsKey = _subsManager.HasSubscriptionsForEvent(eventName);
if (!containsKey)
{
if (!_persistentConnection.IsConnected)
{
_persistentConnection.TryConnect();
}
using (var channel = _persistentConnection.CreateModel())
{
channel.QueueBind(queue: _queueName,
exchange: BROKER_NAME,
routingKey: eventName);
}
}
_subsManager.AddSubscription<T, TH>();
}
I need to understand how multiple microservice instances of the same 'type' can deduplicate without losing the message if the service goes down while processing.
From what I understand all instances will subscribe to the event and
therefore would all receive it.
Only one instance of a subscriber will process the message/event. When you have multiple instances of a service running and subscribed to the same queue/subscription, the first one to pick up the message makes it invisible to the others (in brokers such as SQS and Azure Service Bus this is the visibility timeout; in RabbitMQ the unacknowledged message is simply not redelivered while the consumer holds it). If the service instance is able to process the message in the given time, it tells the queue to delete the message (acknowledges it); if it is not able to process the message in time, or crashes, the message reappears in the queue for any instance to pick up again.
All of the standard brokers (RabbitMQ, SQS, Azure Service Bus, etc.) provide this behaviour out of the box.
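For illustration, here is a minimal competing-consumers sketch using the RabbitMQ .NET client (v6-style API; the queue name and handling are made up, not taken from eShopOnContainers). All instances of the same service type consume the same queue, messages are delivered round-robin, and because autoAck is false a message that was picked up but never acknowledged (for example because the instance crashed mid-processing) is redelivered to another instance:
using System;
using System.Text;
using RabbitMQ.Client;
using RabbitMQ.Client.Events;

var factory = new ConnectionFactory { HostName = "localhost" };
using var connection = factory.CreateConnection();
using var channel = connection.CreateModel();

// All instances of the same service type declare and consume the same queue.
channel.QueueDeclare(queue: "orders-service", durable: true, exclusive: false, autoDelete: false, arguments: null);
// Give each instance one unacked message at a time so work is spread evenly.
channel.BasicQos(prefetchSize: 0, prefetchCount: 1, global: false);

var consumer = new EventingBasicConsumer(channel);
consumer.Received += (sender, ea) =>
{
    var message = Encoding.UTF8.GetString(ea.Body.ToArray());
    // ... handle the integration event here ...
    // Ack only after successful processing; if this instance dies before the ack,
    // RabbitMQ redelivers the message to another instance.
    channel.BasicAck(deliveryTag: ea.DeliveryTag, multiple: false);
};
channel.BasicConsume(queue: "orders-service", autoAck: false, consumer: consumer);
Console.ReadLine(); // keep the consumer alive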
By the way, I have read this book and used the above code from eShopOnContainers, and it works the way I described.
You should look into the following pattern as well:
Competing Consumers pattern
Hope that helps!
All,
I am using the Change Feed Processor Library. I want to know the best way to handle service failures, along with exception/error scenarios, in the ProcessChangesAsync method. Below are the events I am referring to.
1) Service failure - the service hosting the processor library crashes in the middle of some operation. How do I restart processing from the same document (the one being processed when the failure happened)? Is there any built-in mechanism where the change feed will start with the last failed documents? E.g. let's assume the current batch has 10 docs, 5 are processed successfully, and then the service breaks because of a network failure or some other reason. Will my process start with the 6th document once the service is restarted? How do I achieve this?
2) Exceptions and errors - any errors in the ProcessChangesAsync method can be handled using a try/catch at the global level, but how do I persist those failed records and make them available for the next batch? Again, I'm looking for any built-in mechanism available in the change feed processor.
1) The Processor Library, by default, checkpoints after a successful run of ProcessChangesAsync. In the latest library version, you can customize the checkpointer to do manual checkpoints in case you need it. If for some reason the processor shuts down before checkpointing, then it will next start processing from the last successful checkpoint stored in the Leases collection. In your case, it will start with the first document again, so you will never lose a change but you could experience double processing (this is an "at least once" model).
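If you do want manual checkpoints, here is a rough sketch of that option; it assumes the v2 library's CheckpointFrequency.ExplicitCheckpoint setting and IChangeFeedObserverContext.CheckpointAsync(), so please verify both against the library version you are using:
// Processor options: turn off automatic checkpointing (assumes ExplicitCheckpoint exists in your version).
var options = new ChangeFeedProcessorOptions
{
    CheckpointFrequency = new CheckpointFrequency { ExplicitCheckpoint = true }
};

// Observer: checkpoint only after the whole batch has been handled safely.
public async Task ProcessChangesAsync(IChangeFeedObserverContext context, IReadOnlyList<Document> documents, CancellationToken cancellationToken)
{
    foreach (var document in documents)
    {
        // Do your work for the document.
    }

    // Records progress in the Leases collection; everything before this call is now "done".
    await context.CheckpointAsync();
}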
2) There is no built-in mechanism that you can leverage; handling exceptions within ProcessChangesAsync is your responsibility. You could not only add a global try/catch but also, in case you are looping over the documents, add a try/catch inside the loop to handle a failing document (maybe send it to a queue for later analysis/post-processing) without losing the batch. If you require logging for those errors (I'm assuming that's what you mean by persisting errors?), then the latest version is compatible with LibLog, so plugging in your own custom logging is as simple as:
using Microsoft.Azure.Documents.ChangeFeedProcessor.Logging;
var hostName = "SampleHost";
var tracelogProvider = new TraceLogProvider(); //You can use any provider supported by LibLog
using (tracelogProvider.OpenNestedContext(hostName))
{
LogProvider.SetCurrentLogProvider(tracelogProvider);
// After this, create IChangeFeedProcessor instance and start/stop it.
}
Source
Extra info for the comments
To avoid exceptions halting the batch or causing a batch to be reprocessed, you can have handling like this:
public async Task ProcessChangesAsync(IChangeFeedObserverContext context, IReadOnlyList<Document> documents, CancellationToken cancellationToken)
{
try
{
foreach(var document in documents)
{
try
{
// Do your work for the document
}
catch(Exception ex)
{
// Something happened with the current document, handle it, send it to a queue / another storage to analyze, log it. This catch will make the loop continue with the next.
}
}
}
catch(Exception ex)
{
// Something unhandled happened, log it and avoid throwing it again so the next batch is processed
}
}
I am developing a Qt5 server application and I am using the QAMQP library.
What I want to do is the following:
Another server should send a message whenever something about a user should change
My server, which is distributed among multiple machines and has multiple processes per machine, needs to be notified about these updates
The thing is, I am not sure about the architecture that I should build. I just know that whenever something about some user changes, the server needs to send a message to the RabbitMQ broker, and all my processes that are interested in updates for that particular user should get the message. But should I create one queue per process and bind it with a separate exchange for each user? Or maybe create, in each process, a separate queue for each user and bind that somehow to some exchange? Fanout exchanges come to mind, and one queue per process, I am just not sure about the queue-exchange relations even though I've spent quite some time trying to figure it out.
Update, in order to clarify things and describe my progress
I have a distributed application that needs to be notified for product changes. Those changes happen often and are tracked by another platform. I want to get those updates in my application.
In order to achieve that, each one of my application instances creates its own queue. Then, whenever an instance is interested in updates for a particular product, it creates an exchange for that product and binds it to the queue, like this:
Exchange type : 'direct'
Exchange name : 'product_update'
Routing key : 'PRODUCT_CODE'
Where PRODUCT_CODE is a string that represents the code of the product. In the platform that tracks the changes, I just publish messages with the corresponding exchanges.
The problem comes when I need to unsubscribe from a product's updates. I am using the QAMQP library, and in the destructor of QAMQP::Exchange there's an unconditional remove() call.
When that function is called, I get an error in the RabbitMQ log, which looks like this:
=ERROR REPORT==== 28-Jan-2014::08:41:35 ===
connection <0.937.0>, channel 7 - soft error:
{amqp_error,precondition_failed,
"exchange 'product_update' in vhost 'test-app' in use",
'exchange.delete'}
I am not sure how to properly unsubscribe. I know from the RabbitMQ web interface that I have only one exchange ('product_update') which has bindings to multiple queues with difference routing keys.
I can see that the call to remove() in QAMQP tries to delete the exchange, but since it's used by my other processes, it's still in use and cannot be removed, which I believe is OK.
But what should I do to delete the exchange object that I created? Should I first unbind it from the queue? I believe I should be able to delete the object without calling remove(), but I may be mistaken or I may be doing it wrong.
Also, if there's a better pattern for what I am trying to accomplish, please advise.
Here's some sample code, per request.
ProductUpdater::ProductUpdater(QObject* parent) : QObject(parent)
{
mClient = new QAMQP::Client(this);
mClient->setAutoReconnect(true);
mClient->open(mConnStr);
connect(mClient, SIGNAL(connected()), this, SLOT(amqp_connected()));
}
void ProductUpdater::amqp_connected()
{
mQueue = mClient->createQueue();
connect(mQueue, SIGNAL(declared()), this, SLOT(amqp_queue_declared()));
connect(mQueue, SIGNAL(messageReceived(QAMQP::Queue*)),
this, SLOT(message_received(QAMQP::Queue*)));
mQueue->setNoAck(false);
mQueue->declare(QString(), QAMQP::Queue::QueueOptions(QAMQP::Queue::AutoDelete));
}
void ProductUpdater::amqp_queue_declared()
{
mQueue->consume();
}
void ProductUpdater::amqp_exchange_declared()
{
QAMQP::Exchange* exchange = qobject_cast<QAMQP::Exchange*>(sender());
if (mKeys.contains(exchange))
mQueue->bind(exchange, mKeys.value(exchange));
}
void ProductUpdater::message_received(QAMQP::Queue* queue)
{
while (queue->hasMessage())
{
const QAMQP::MessagePtr message = queue->getMessage();
processMessage(message);
if (!queue->noAck())
queue->ack(message);
}
}
bool ProductUpdater::subscribe(const QString& productId)
{
if (!mClient)
return false;
foreach (const QString& id, mSubscriptions) {
if (id == productId)
return true; // already subscribed
}
QAMQP::Exchange* exchange = mClient->createExchange("product_update");
mSubscriptions.insert(productId, exchange);
connect(exchange, SIGNAL(declared()), this, SLOT(amqp_exchange_declared()));
exchange->declare(QStringLiteral("direct"));
return true;
}
void ProductUpdater::unsubscribe(const QString& productId)
{
if (!mSubscriptions.contains(productId))
return;
QAMQP::Exchange* exchange = mSubscriptions.take(productId);
if (exchange) {
// This may even be unnecessary...?
mQueue->unbind(exchange, productId);
// This will produce an error in the RabbitMQ log
// But if exchange isn't destroyed, we have a memory leak
// if we do exchange->deleteLater(); it'll also produce an error...
// exchange->remove();
}
}
Amy,
I think your doubt is related to the message distribution style (or patterns) and the exchange types available for RabbitMQ. So, I'll try to cover them all with a short explanation and you can decide which will fit best for your scenario (RabbitMQ tutorials explained in another way).
Work Queue
Using the default exchange and the queue name as the routing key, you can post messages directly to a queue. Once a message arrives at a queue, the consumers "compete" to grab it, which means a message is not delivered to more than one consumer. If there are multiple consumers listening to a single queue, the messages will be delivered in a round-robin fashion.
Use this approach when you have work to do and you want to scale across multiple servers/processes easily.
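A minimal work-queue sketch using the RabbitMQ .NET client for illustration (QAMQP exposes the same AMQP operations; the queue name is made up): the publisher posts to the default exchange (empty exchange name) with the queue name as routing key, and every process consuming that queue receives a share of the messages:
using System.Text;
using RabbitMQ.Client;

var factory = new ConnectionFactory { HostName = "localhost" };
using var connection = factory.CreateConnection();
using var channel = connection.CreateModel();

// One shared, durable queue; all worker processes consume from it.
channel.QueueDeclare(queue: "work", durable: true, exclusive: false, autoDelete: false, arguments: null);

// Publisher side: the default exchange ("") routes by queue name.
var body = Encoding.UTF8.GetBytes("one piece of work");
channel.BasicPublish(exchange: "", routingKey: "work", basicProperties: null, body: body);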
Publish/Subscribe
In this model, one single sent message may reach many consumers listening on their queues. For this scenario, where you must unselectively dispatch messages to all consumers, you can use a fanout exchange. These exchanges are "dumb" and act just like their name implies: like a fan. One thing enters and is replicated, without any intelligence, to all queues that are bound to the exchange. You could just as well use direct exchanges, but only if you need to do any filtering or routing on the messages.
Use this scenario when you have something like an event and you may need multiple servers, processes and consumers to handle that event, each one doing a task of a different nature to handle it. If you do not need any filtering/routing, use a fanout exchange for this scenario.
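A sketch of the fanout case, again with the .NET client for illustration (the exchange name user.changed is made up): each consumer gets its own server-named queue bound to the exchange, and every published message is copied to all of those queues:
using System.Text;
using RabbitMQ.Client;

var factory = new ConnectionFactory { HostName = "localhost" };
using var connection = factory.CreateConnection();
using var channel = connection.CreateModel();

channel.ExchangeDeclare(exchange: "user.changed", type: ExchangeType.Fanout);

// Consumer side: a private, server-named queue bound to the exchange receives every message.
var queueName = channel.QueueDeclare().QueueName;
channel.QueueBind(queue: queueName, exchange: "user.changed", routingKey: "");

// Publisher side: no routing decision at all, the exchange copies the message to all bound queues.
var body = Encoding.UTF8.GetBytes("user 42 changed");
channel.BasicPublish(exchange: "user.changed", routingKey: "", basicProperties: null, body: body);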
Routing / Topic
A particular case of the Publish/Subscribe model, where queues "listen" on the exchange using filters that may use pattern matching (topic) or exact matching (direct).
If you need pattern matching, use topic exchange type. If you don't, use direct.
When a queue "listens" to an exchange, a binding is used. In this binding, you may specify a binding key.
To deliver the message to the correct queues, the exchange examines the message's routing key. If it matches the binding key, the message is forwarded to that queue. The matching strategy depends on whether you are using a topic or a direct exchange, as said before.
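Applied to your product_update setup, the binding key is the product code: subscribing is a QueueBind on your own queue, and unsubscribing is the matching QueueUnbind, which removes only your binding and leaves the shared exchange alone (sketch with the .NET client; the exchange name and routing key are taken from your description):
using RabbitMQ.Client;

var factory = new ConnectionFactory { HostName = "localhost" };
using var connection = factory.CreateConnection();
using var channel = connection.CreateModel();

channel.ExchangeDeclare(exchange: "product_update", type: ExchangeType.Direct);
var queueName = channel.QueueDeclare().QueueName; // this process's private queue

// Subscribe to one product: bind the queue with the product code as the binding key.
channel.QueueBind(queue: queueName, exchange: "product_update", routingKey: "PRODUCT_CODE");

// Unsubscribe later: remove only the binding; the exchange stays because other processes still use it.
channel.QueueUnbind(queueName, "product_update", "PRODUCT_CODE", null);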
TL;DR:
For your scenario, if each process does something different with the User change event, use a single exchange of fanout type. Each class of handler declares the same queue name, bound to that exchange. This relates to the Publish/Subscribe model above. You can distribute work among consumers of the same class listening on the same queue name, even if they don't reside in the same process.
However, if all the consumers that are interested in the event perform the same task when handling it, use the Work Queue model.
Hope this helps,
Within Alfresco, I want to delete a node, but I don't want it to be used by any other users in a cluster environment while I do so.
I know that I can use the LockService to lock a node (in a cluster environment), as in the following lines:
lockService.lock(deleteNode);
nodeService.deleteNode(deleteNode);
lockService.unlock(deleteNode);
The last line may cause an exception because the node has already been deleted, and indeed it does; the exception is:
A system error happened during the operation: Node does not exist: workspace://SpacesStore/cb6473ed-1f0c-4fa3-bfdf-8f0bc86f3a12
So how do I ensure concurrency in a cluster environment when deleting a node, i.e. prevent two users from accessing the same node at the same time when one of them wants to update it and the other wants to delete it?
Depending on your cluster environment (e.g. the same DB server used by all Alfresco instances), transactions will most likely be enough to ensure no stale content is used:
serverA(readNode)
serverB(deleteNode)
serverA(updateNode) <--- transaction failure
The JobLockService allows more control in case of more complex operations, which might involve multiple, dynamic nodes (or no nodes at all, e.g. sending emails or similar):
serverA(acquireLock)
serverB(acquireLock) <--- wait for the lock to be released
serverA(readNode1)
serverA(if something then updateNode2)
serverA(updateNode1)
serverA(releaseLock)
serverB(readNode2)
serverB(releaseLock)