raft: log cleaning and AppendEntries

Log cleaning removes entries that no longer contribute to the current state. How, then, do we construct AppendEntries for entries that have been removed but are still needed by slow followers or new members? Do we need to modify AppendEntries so that it can carry non-consecutive entries, or should we use a snapshot instead?

Copycat implements a form of incremental compaction using an algorithm that's somewhat similar to the log cleaning algorithm described in the Raft dissertation. So, there is some precedent and code for how to do this. Copycat's incremental compaction algorithm differs from the one described in the Raft literature in that it retains the positions of entries in the log after compaction to take advantage of sequential reads rather than copying entries to the head of the log, but AppendEntries RPCs sent to followers can still be sent with gaps in the batch of entries.
We handle the missing entries simply by including the index with each entry in an AppendRequest batch. But this also requires some mechanism in the log to skip entries on followers. If a follower is receiving entries from a compacted segment of the leader's log, the follower must duplicate the structure of the leader's log by skipping entries as they're written to the log.
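To make that concrete, here is a minimal sketch of the idea. It is not Copycat's actual wire format; the Entry and AppendRequest types and the method names are invented for illustration:

import java.util.List;

// Hypothetical types for illustration; Copycat's real wire format differs.
record Entry(long index, long term, byte[] command) {}
record AppendRequest(long prevLogIndex, long prevLogTerm, List<Entry> entries) {}

class FollowerLog {
    private long lastIndex;

    // Append a batch that may contain gaps left by the leader's compaction.
    void append(AppendRequest request) {
        for (Entry entry : request.entries()) {
            // Skip forward to the entry's index, preserving the leader's layout.
            while (lastIndex + 1 < entry.index()) {
                writeSkippedSlot(++lastIndex); // reserve the slot of a compacted entry
            }
            writeEntry(entry);
            lastIndex = entry.index();
        }
    }

    private void writeSkippedSlot(long index) { /* mark index as compacted */ }
    private void writeEntry(Entry entry) { /* persist entry at entry.index() */ }
}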
There are some other challenges with incremental compaction in Raft that are not extensively described in the Raft literature, particularly with respect to handling tombstones. One of the issues with tombstones is that they can't be removed from the log until they've been applied on all servers. If a tombstone is committed and is removed from the leader's log before it's replicated to a follower (which may be partitioned), that follower may never delete its state. In Copycat, this necessitated adding a globalIndex to track the highest index stored on all servers.
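A sketch of that bookkeeping, assuming the leader already tracks Raft's usual matchIndex per follower (the names here are mine, not Copycat's):

import java.util.Map;

// The globalIndex is the highest index known to be stored on *all* servers;
// tombstones at or below it have been applied everywhere and can be compacted.
class GlobalIndexTracker {
    long computeGlobalIndex(Map<String, Long> matchIndexes, long leaderLastIndex) {
        long global = leaderLastIndex;
        for (long matchIndex : matchIndexes.values()) {
            global = Math.min(global, matchIndex);
        }
        return global;
    }
}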
But I digress; I realize I over-answered your question. Incremental compaction in Raft is an interesting and challenging problem. If you're interested in reading more about how it was solved in Copycat, I've written extensive documentation on Copycat's incremental compaction algorithm, including in-depth descriptions of the various issues with handling tombstones and approaches for implementing snapshots on top of an incremental compaction algorithm.
If you learn one thing from Copycat's documentation, it will likely be that there's a lot of complexity in incremental compaction in Raft. It took us many months to work out all the algorithms therein, but perhaps the lessons we learned can be of use to you.
For sure, implementing snapshots in Raft is significantly easier than incremental compaction algorithms like the one in Copycat. But there are still some complexities to it. For example, Java is not well suited to forking a process to prevent blocking during snapshots, and that was one of the reasons we chose to write an incremental compaction algorithm for Raft. Implementing support for large snapshots in Copycat would require either copying the full state machine state in memory or adding a leader transfer mechanism to ensure leaders are not blocked while snapshotting large state machines. Weigh the options against the reality of your environment.
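For what it's worth, here is a rough sketch of the "copy the full state machine state in memory" option, assuming a simple map-backed state machine; the copy blocks the apply thread only briefly, and serialization happens off-thread:

import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class SnapshottingStateMachine {
    private final Map<String, String> state = new HashMap<>();
    private final ExecutorService snapshotter = Executors.newSingleThreadExecutor();

    // Called from the thread that applies committed entries.
    void takeSnapshot(long lastAppliedIndex) {
        Map<String, String> copy = new HashMap<>(state); // brief pause, not a full block
        snapshotter.submit(() -> writeSnapshotToDisk(lastAppliedIndex, copy));
    }

    private void writeSnapshotToDisk(long index, Map<String, String> copy) {
        // serialize the copy together with the index it covers; omitted for brevity
    }
}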

Related

About the pattern to overcome the one update per second/entity limit on Google Datastore

I read this document and among several very relevant topics, some of them are key to a scalability problem I am facing.
Basically, the document states that it is possible to overcome the one-update-per-second ratio per entity, a limit that drove me to Redis in a use case that otherwise would not have demanded it.
"a (google) software engineer in the Datastore team had mentioned a technique to obtain much higher throughput than one update per second on an entity group"
"The basic idea of Job Aggregation is to use a single thread to process a batch of updates. Because there is only one thread and only one transaction open on the entity group, there are no transaction failures due to concurrent updates. You can find similar ideas in other storage products such as VoltDb and Redis."
This is very useful to me but I don't have any clue on how this works.
Could just creating a service and serialising upserts (via a pull queue) to a specific Kind solve the issue? How could Datastore be sure that no other thread would suddenly begin to upsert?
Thanks
It is important to keep in mind that Job Aggregation is not part of Datastore. As the documentation says, you need to use a single batch of updates. You can take a look at Batch operations to see how to upsert multiple entities.
About your second question, Datastore is not responsible for ensuring that no other thread begins to upsert; you must ensure that this does not happen in order to get better performance.
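As a rough illustration of the Job Aggregation idea (a sketch only, using the google-cloud-datastore Java client; the class and queue names are made up): a single thread drains queued updates and commits them as one batch in one transaction, so no two transactions ever contend on the entity group:

import com.google.cloud.datastore.Datastore;
import com.google.cloud.datastore.DatastoreOptions;
import com.google.cloud.datastore.Entity;
import com.google.cloud.datastore.Transaction;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class JobAggregator implements Runnable {
    private final Datastore datastore = DatastoreOptions.getDefaultInstance().getService();
    private final BlockingQueue<Entity> pending = new LinkedBlockingQueue<>();

    void submit(Entity update) { pending.add(update); } // called by request handlers

    @Override
    public void run() {
        while (true) {
            List<Entity> batch = new ArrayList<>();
            try {
                batch.add(pending.take());   // block until at least one update arrives
            } catch (InterruptedException e) {
                return;
            }
            pending.drainTo(batch, 499);     // aggregate, staying under per-commit limits
            Transaction txn = datastore.newTransaction();
            try {
                txn.put(batch.toArray(new Entity[0]));
                txn.commit();
            } finally {
                if (txn.isActive()) txn.rollback();
            }
        }
    }
}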
See Datastore best practices for other recommendations from Google on getting better performance.

Can I rely on Riak as master datastore in e-commerce

In the Riak documentation, there are often examples of how you could model your e-commerce datastore in a certain way. But here it is written:
In a production Riak cluster being hit by lots and lots of concurrent writes, value conflicts are inevitable, and Riak Data Types are not perfect, particularly in that they do not guarantee strong consistency and in that you cannot specify the rules yourself.
From http://docs.basho.com/riak/latest/theory/concepts/crdts/#Riak-Data-Types-Under-the-Hood, last paragraph.
So, is it safe enough to use Riak as the primary datastore in an e-commerce app, or is it better to use another database with stronger consistency?
Riak out of the box
In my opinion, out of the box Riak is not safe enough to use as the primary datastore in an e-commerce app. This is because of the eventually consistent nature of Riak (and of a lot of the NoSQL solutions).
According to the CAP theorem, distributed datastores (Riak being one of them) can guarantee at most 2 of:
Consistency (all nodes see the same data at the same time)
Availability (a guarantee that every request receives a response about whether it succeeded or failed)
Partition tolerance (the system continues to operate despite arbitrary partitioning due to network failures)
Riak specifically errs on the side of Availability and Partition tolerance, by having eventual consistency of the data held in its datastore.
What Riak can do for an e-commerce app
Using Riak out of the box, it would be a good source for the content about the items being sold in your e-commerce app (content that is generally written once and read lots is a great use case for Riak). However, maintaining:
the count of how many items are left
the money in a user's account
needs to be handled carefully in a distributed datastore.
Implementing consistency in an eventually consistent datastore
There are several methods you can use; they include:
Implement a serialization method when writing updates to values that need to be consistent (i.e. go through a single/controlled service that guarantees it will only update a single item sequentially); this would need to be done outside of Riak, in your API layer (see the sketch after this list)
Change the replication properties of your consistent buckets so that you can 'guarantee' you never retrieve out of date data
At the bucket level, you can choose how many copies of data you want to store in your cluster (N, or n_val), how many copies you wish to read from at one time (R, or r), and how many copies must be written to be considered a success (W, or w).
The above method is similar to using the strong consistency model available in the latest versions of Riak.
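Here is a minimal sketch of the first method, the serialization approach (generic Java, nothing Riak-specific; the names are invented): every update to a consistency-critical value is funneled through a single-threaded executor in your API layer, so writes apply strictly one at a time:

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.UnaryOperator;

class SerializedUpdater<V> {
    private final ExecutorService single = Executors.newSingleThreadExecutor();
    private V value; // touched only by the single executor thread

    SerializedUpdater(V initial) { this.value = initial; }

    Future<V> update(UnaryOperator<V> change) {
        return single.submit(() -> {
            value = change.apply(value);
            // here you would also write the new value back to Riak
            return value;
        });
    }
}

For example, decrementing stock becomes stockUpdater.update(n -> n - 1), and no two decrements can ever interleave.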
Important note: In all of these datastore systems (distributed or not), you will in general:
Read the current data
Make a decision based on the current value
Change the data (decrement the Item count)
If all three of the above actions cannot be done in an atomic way (either by locking, or by failing the third if the value was changed by something else), an e-commerce app is open to abuse. This issue exists in traditional SQL storage solutions too (which is why you have SQL transactions).
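In SQL stores, a common way to make the read-decide-change sequence atomic is to fold the decision into the statement itself. A JDBC sketch (the table and column names are invented):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

class ItemStore {
    // Returns true if an item was sold, false if stock had already run out.
    boolean sellOne(Connection conn, long itemId) throws SQLException {
        // The WHERE clause carries the decision, so no separate read is needed.
        String sql = "UPDATE items SET stock = stock - 1 WHERE id = ? AND stock > 0";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setLong(1, itemId);
            return ps.executeUpdate() == 1; // zero rows updated means sold out
        }
    }
}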

What documentation exists for DynamoDB's consistency model, CAP, partition recovery etc?

I'm considering using Amazon's DynamoDB. Naturally, if you're bothering to use a highly available distributed data store, you want to make sure your client deals with outages in a sensible way!
While I can find documentation describing Amazon's "Dynamo" database, it's my understanding that "DynamoDB" derives its name from Dynamo, but is not at all related in any other way.
For DynamoDB itself, the only documentation I can find is a brief forum post which basically says "retry 500 errors". For most other databases much more detailed information is available.
Where should I be looking to learn more about DynamoDB's outage handling?
While Amazon DynamoDB indeed lacks a detailed statement about its choices regarding the CAP theorem (I'm still hoping for a DynamoDB edition of Kyle Kingsbury's most excellent Jepsen series; Call me maybe: Cassandra analyzes a Dynamo-inspired database), Jeff Walker Code Ranger's answer to DynamoDB: Conditional writes vs. the CAP theorem confirms the lack of clear information in this area, but asserts that we can make some pretty strong inferences.
The referenced forum post in fact also suggests a strong emphasis on availability:
DynamoDB does indeed synchronously replicate across multiple availability zones within the region, and is therefore tolerant to a zone failure. If a zone becomes unavailable, you will still be able to use DynamoDB and the service will persist any successful writes that we have acknowledged (including writes we acknowledged at the time that the availability zone became unavailable).
The customer experience when a complete availability zone is lost ranges from no impact at all to delayed processing times in cases where failure detection and service-side redirection are necessary. The exact effects in the latter case depend on whether the customer uses the service's API directly or connects through one of our SDKs.
Other than that, Werner Vogels' posts on Dynamo/DynamoDB provide more insight:
Amazon's Dynamo - about the original paper
Amazon DynamoDB – a Fast and Scalable NoSQL Database Service Designed for Internet Scale Applications - main introductory article including:
History of NoSQL at Amazon – Dynamo
Lessons learned from Amazon's Dynamo
Introducing DynamoDB - this features the most relevant information regarding the subject matter
Durable and Highly Available. Amazon DynamoDB replicates its data over at least 3 different data centers so that the system can continue to operate and serve data even under complex failure scenarios.
Flexible. Amazon DynamoDB is an extremely flexible system that does not force its users into a particular data model or a particular consistency model. DynamoDB tables do not have a fixed schema but instead allow each data item to have any number of attributes, including multi-valued attributes. Developers can optionally use stronger consistency models when accessing the database, trading off some performance and availability for a simpler model. They can also take advantage of the atomic increment/decrement functionality of DynamoDB for counters. [emphasis mine]
DynamoDB One Year Later: Bigger, Better, and 85% Cheaper… - about improvements
Finally, Aditya Dasgupta's presentation about Amazon's Dynamo DB also analyzes its modus operandi regarding the CAP theorem.
Practical Guidance
In terms of practical guidance for retry handling, the DynamoDB team has meanwhile added a dedicated section about Handling Errors, including Error Retries and Exponential Backoff.
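A generic sketch of that guidance (this is not the AWS SDK's built-in retry logic; RetryableException stands in for whatever your client throws on 5xx responses):

import java.util.Random;
import java.util.concurrent.Callable;

class RetryableException extends Exception {}

class Backoff {
    private static final Random RANDOM = new Random();

    // Retries the call with exponentially growing, jittered delays.
    static <T> T withRetries(Callable<T> call, int maxAttempts) throws Exception {
        for (int attempt = 0; ; attempt++) {
            try {
                return call.call();
            } catch (RetryableException e) {
                if (attempt + 1 >= maxAttempts) throw e;
                long base = (1L << attempt) * 50;                    // 50ms, 100ms, 200ms, ...
                Thread.sleep(base / 2 + RANDOM.nextInt((int) base)); // jitter avoids thundering herds
            }
        }
    }
}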

SQLite and concurrency [closed]

I need to integrate a database into one of our products, and I wonder which one would be best suited to our needs (easy automatic deployment, no administration, good performance); SQLite seems to be a good solution. The problem is that the database could potentially face high concurrency: it is accessed through PHP (Apache) each time a client connects to the server the database is running on. Each client connects (and executes an INSERT query) approximately every 10 seconds, and there could be more than 100 clients running.
When executing an INSERT query, SQLite locks the entire database for a certain duration. Is there a way to compute that duration? If this is not possible, do you think SQLite (v3.3.7) is still suited to the above conditions?
I try to avoid emotive replies and hyperbole but I am truly astonished at the lack of knowledge about sqlite displayed on this page. Different database implementations serve different needs and from the operational specs you provide, sqlite3 seems ideal for your needs. To elaborate:
sqlite3 is fully ACID compliant, meaning it ensures atomic commits, which is something neither MySQL (good as it may be) nor Oracle can brag about. See more here
Also, sqlite3 has a deceptively simple mechanism for ensuring maximum concurrency (which is also thread-safe) as described in their File locking and Concurrency document.
By their (sqlite3 developers') own estimation, sqlite3 is capable of up to 50,000 INSERTs per second - a theoretical maximum which is limited by disk rotation speed. ACID compliance requires sqlite3 to confirm that a database commit has been written to disk, so an INSERT, UPDATE or DELETE transaction requires two full disk rotations, thereby effectively reducing the number of transactions to 60/s on a 7200 rpm disk drive. This is outlined in the sqlite FAQ linked in another answer, and the fact gives some idea of the engine's data throughput capability in production. But what about concurrent reading and writing?
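The arithmetic behind that figure: a 7200 rpm drive completes 7200 / 60 = 120 rotations per second, and at two rotations per committed transaction that gives 120 / 2 = 60 transactions per second.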
The File locking and Concurrency document linked earlier explains how sqlite3 avoids "writer starvation" - a condition whereby heavy database read access prevents a process/thread seeking to write to the database from acquiring a lock. The escalation of locking state from SHARED to PENDING to EXCLUSIVE happens as sqlite3 encounters an INSERT (or UPDATE or DELETE) statement and then again upon COMMIT, meaning that the full database lock is delayed until the last moment before an actual write is performed. The outcome of sqlite's clever file-locking mechanism is that should a writer join the queue (PENDING lock), existing reads (SHARED locks) complete, an EXCLUSIVE lock is granted to the writer process, and reading then resumes. This takes only a few milliseconds, meaning that the effective transaction throughput will hardly move from the 60/s rate quoted above.
I believe the default sqlite3 WAIT on an EXCLUSIVE lock is 3 seconds, so given the fact that 60 transactions per second is a reasonable expectation and that you seek to write to the database on average once every 10 seconds - I'd say sqlite3 is well up to the task and will only require the introduction of clustering once your traffic increases by a factor of 500.
Not bad and perfect for your requirement.
I don't think that SQLite would be a good solution for those requirements. SQLite is designed for local and lightweight use only, not to serve hundreds of requests.
I would recommend some other solution, for example MySQL or PostgreSQL; both can be scripted quite well. So, if I were you, I would put my efforts into the setup scripting.
To avoid the flame war between SQLite believers and haters, let me draw your attention to the often-referenced SQLite When-To-Use document (I believe it is considered a credible source). Here they state the following:
Situations Where A Client/Server RDBMS May Work Better
High Concurrency
SQLite supports an unlimited number of simultaneous readers, but it will only allow one writer at any instant in time. For many situations, this is not a problem. Writers queue up. Each application does its database work quickly and moves on, and no lock lasts for more than a few dozen milliseconds. But there are some applications that require more concurrency, and those applications may need to seek a different solution.
I think the question at hand involves many writes, and if the OP were to go with SQLite, it would result in a non-scalable solution.
Here is what SQLite has to say about appropriate uses of SQLite: http://www.sqlite.org/whentouse.html In particular, that page says SQLite is good for low to medium traffic sites, exactly the sort of application that you're contemplating.
Seems like SQLite would work for you, unless you expect substantial growth. Depending on what you do in each request, I would expect a query rate of 0.17 queries per second to be well within SQLite's capabilities.
For good user experience, you should design your site so that the queries needed to service a single request take ~200 milliseconds. To achieve this, result sets should probably not touch more than a few score rows, and should rely on indices, not full table scans. If you hit that, then you'll have enough headroom to serve 5 queries per second (at peak). That's 30x the requirement that you state in your question.
The SQLite FAQ covers this topic: http://www.sqlite.org/faq.html (See: "(5) Can multiple applications or multiple instances of the same application access a single database file at the same time?")
But for your particular use, you'd probably want to do some stress testing to verify it'll meet your needs. 100 concurrent users might be a bit much for SQLite.
I read through the FAQs, and it seems that SQLite has some pretty decent support for concurrency, but may require the use of transactions to be sure things are going to go well.
The comments above regarding Apache concurrency are correct: one Apache server can serve multiple requests, the number depending on how many processes are run. On most of the servers I run, it's set to 3-5 processes, while on larger installations it might reach 20. The point here is that SQLite can more than handle a small- to medium-traffic web site, as it can do thousands of inserts a second.
I plan on using SQLite for my current project, but to be safe, I fully intend on using BEGIN TRANSACTION and COMMIT for any writing or concurrency-sensitive parts.
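For illustration, here is what that looks like in Java with the sqlite-jdbc driver (a sketch; the file and table names are made up): set a busy timeout so a writer waits for the lock instead of failing immediately, and batch the INSERTs inside a single transaction so they cost one commit:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.Statement;

class SqliteWriter {
    void insertBatch(String[] messages) throws SQLException {
        try (Connection conn = DriverManager.getConnection("jdbc:sqlite:app.db")) {
            try (Statement s = conn.createStatement()) {
                s.execute("PRAGMA busy_timeout = 3000"); // wait up to 3s for the lock
            }
            conn.setAutoCommit(false);                   // BEGIN TRANSACTION
            try (PreparedStatement ps =
                     conn.prepareStatement("INSERT INTO log(message) VALUES (?)")) {
                for (String m : messages) {
                    ps.setString(1, m);
                    ps.executeUpdate();
                }
            }
            conn.commit();                               // COMMIT: one disk sync for the batch
        }
    }
}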
Bottom line, as usual, Read The Manual.
In addition to the discussion above about sqlite3, one more awesome feature, introduced in SQLite 3.7.0, is WAL mode, in which multiple processes can read while one process writes at the same time.
Have a look at http://www.sqlite.org/wal.html

What are common reasons for deadlocks?

Deadlocks are hard to find and very uncomfortable to remove.
How can I find error sources for deadlocks in my code? Are there any "deadlock patterns"?
In my particular case, it deals with databases, but this question applies to any kind of deadlock.
Update: This recent MSDN article, Tools And Techniques to Identify Concurrency Issues, might also be of interest.
Stephen Toub in the MSDN article Deadlock monitor states the following four conditions necessary for deadlocks to occur:
A limited number of a particular resource. In the case of a monitor in C# (what you use when you employ the lock keyword), this limited number is one, since a monitor is a mutual-exclusion lock (meaning only one thread can own a monitor at a time).
The ability to hold one resource and request another. In C#, this is akin to locking on one object and then locking on another before releasing the first lock, for example:
lock(a)
{
    ...
    lock(b)
    {
        ...
    }
}
No preemption capability. In C#, this means that one thread can't force another thread to release a lock.
A circular wait condition. This means that there is a cycle of threads, each of which is waiting for the next to release a resource before it can continue.
He goes on to explain that the way to avoid deadlocks is to avoid (or thwart) condition four.
Joe Duffy discusses several techniques for avoiding and detecting deadlocks, including one known as lock leveling. In lock leveling, locks are assigned numerical values, and threads must only acquire locks that have higher numbers than locks they have already acquired. This prevents the possibility of a cycle. It's also frequently difficult to do well in a typical software application today, and a failure to follow lock leveling on every lock acquisition invites deadlock.
The classic deadlock scenario is that A is holding lock X and wants to acquire lock Y, while B is holding lock Y and wants to acquire lock X. Since neither can complete what it is trying to do, both will end up waiting forever (unless timeouts are used).
In this case a deadlock can be avoided if A and B acquire the locks in the same order.
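A minimal Java sketch of that same-order fix (the Account class is invented for illustration; identity-hash ties would need an extra tie-breaker in production code):

class Account {
    private long balance;

    static void transfer(Account from, Account to, long amount) {
        // Pick a global order and always lock in it, so no circular wait can form.
        Account first = System.identityHashCode(from) <= System.identityHashCode(to) ? from : to;
        Account second = (first == from) ? to : from;
        synchronized (first) {
            synchronized (second) {
                from.balance -= amount;
                to.balance += amount;
            }
        }
    }
}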
No deadlock patterns to my knowledge (and 12 years of writing heavily multithreaded trading applications). But the TimedLock class has been of great help in finding deadlocks that exist in code without massive rework.
http://www.randomtree.org/eric/techblog/archives/2004/10/multithreading_is_hard.html
basically, (in dotnet/c#) you search/replace all your "lock(xxx)" statements with "using TimedLock.Lock(xxx)"
If a deadlock is ever detected (lock unable to be obtained within the specified timeout, defaults to 10 seconds), then an exception is thrown. My local version also immediately logs the stacktrace. Walk up the stacktrace (preferably debug build with line numbers) and you'll immediately see what locks were held at the point of failure, and which one it was attempting to get.
In dotnet 1.1, in a deadlock situation as described, as luck would have it all the threads which were locked would throw the exception at the same time. So you'd get 2+ stacktraces, and all the information necessary to fix the problem. (2.0+ may have changed the threading model internally enough to not be this lucky, I'm not sure)
Making sure all transactions affect tables in the same order is the key to avoiding the most common of deadlocks.
For example:
Transaction A
UPDATE Table A SET Foo = 'Bar'
UPDATE Table B SET Bar = 'Foo'
Transaction B
UPDATE Table B SET Bar = 'Foo'
UPDATE Table A SET Foo = 'Bar'
This is extremely likely to result in a deadlock, as Transaction A gets a lock on Table A and Transaction B gets a lock on Table B, so neither of them can get a lock for its second command until the other has finished.
All other forms of deadlocks are generally caused by high-intensity use and SQL Server deadlocking internally whilst allocating resources.
Yes - deadlocks occur when processes try to acquire resources in random order. If all your processes try to acquire the same resources in the same order, the possibilities for deadlocks are greatly reduced, if not eliminated.
Of course, this is not always easy to arrange...
The most common (according to my unscientific observations) DB deadlock scenario is very simple:
Two processes read something (a DB record, for example), and both acquire a shared lock on the associated resource (usually a DB page).
Both try to make an update, attempting to upgrade their locks to exclusive ones - voila, deadlock.
This can be avoided by specifying the "FOR UPDATE" clause (or similar, depending on your particular RDBMS) if the read is to be followed by an update. This way the process gets the exclusive lock from the start, making the above scenario impossible.
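A JDBC sketch of the FOR UPDATE fix (the table and column names are invented, and the exact clause depends on your RDBMS): the read takes the write-intent lock up front, so two readers can no longer deadlock while upgrading shared locks:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

class OrderRepository {
    void reprice(Connection conn, long orderId, double newPrice) throws SQLException {
        conn.setAutoCommit(false);
        try (PreparedStatement select = conn.prepareStatement(
                 "SELECT price FROM orders WHERE id = ? FOR UPDATE")) {
            select.setLong(1, orderId);
            try (ResultSet rs = select.executeQuery()) {
                if (!rs.next()) { conn.rollback(); return; } // no such order
            }
        }
        try (PreparedStatement update = conn.prepareStatement(
                 "UPDATE orders SET price = ? WHERE id = ?")) {
            update.setDouble(1, newPrice);
            update.setLong(2, orderId);
            update.executeUpdate();
        }
        conn.commit();
    }
}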
I recommend reading this article by Herb Sutter. It explains the reasons behind deadlocking issues and puts forward a framework to tackle the problem.
The typical scenario is mismatched update plans (tables not always updated in the same order). However, it is not unusual to have deadlocks under high processing volume.
I tend to accept deadlocks as a fact of life; they will happen one day or another, so I have my DAL prepared to handle and retry a deadlocked operation.
A deadlock is a condition that occurs when two processes are each waiting for the other to complete before proceeding. The result is that both processes hang. It most commonly occurs in multitasking and client/server environments.
Deadlock occurs mainly when multiple dependent locks exist, for example when one thread locks one mutex and another thread tries to lock the mutexes in the reverse order. One should pay attention to how mutexes are used to avoid deadlocks.
Be sure the operation is complete before releasing a lock. If you have multiple locks, such as an access order of A, B, C, the releasing order should also be A, B, C.
In my last project I faced a problem with deadlocks in an SQL Server database. The problem in finding the cause was that my software and a third-party software were using the same database and working on the same tables. It was very hard to find out what was causing the deadlocks. I ended up writing an SQL query to find out which processes and which SQL statements were causing the deadlocks. You can find that statement here: Deadlocks on SQL-Server
To avoid deadlock, there is an algorithm called the Banker's algorithm.
This one also provides helpful information on avoiding deadlock.

Resources