Can a Select Count(*) Affect Writes in Cassandra

I experienced a scenario where a select count(*) on a table every minute (yes, this should definitely be avoided) caused a huge increase in Cassandra writes to around 150K writes per second.
Can anyone explain this weird behavior? Why would a Select query significantly increase write count in Cassandra?
Thanks!

If you check
org.apache.cassandra.metrics:type=ReadRepair,name=RepairedBackground
and
org.apache.cassandra.metrics:type=ReadRepair,name=RepairedBlocking
metrics, you can see whether read repairs are sending mutations. Reading all the data to service the count(*) may be triggering a lot of read repairs if your data is inconsistent. If that's the case, lowering read_repair_chance and dclocal_read_repair_chance on the table (via ALTER TABLE) could reduce the load.
Other likely possibilities are:
You have tracing enabled (either globally or on the table) for some percentage of requests.
You use DSE and have slow query logging enabled.

A possible explanation could be found in the write path of an update:
During a write, Cassandra adds each new row to the database without checking whether a duplicate record exists. This policy makes it possible for many versions of the same row to exist in the database.
Then
Most Cassandra installations store replicas of each row on two or more nodes. Each node performs compaction independently. This means that even though out-of-date versions of a row have been dropped from one node, they may still exist on another node.
And finally:
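This is why Cassandra performs another round of comparisons during a read process. When a client requests data with a particular primary key, Cassandra retrieves many versions of the row from one or more replicas.
If those versions differ, the newest one is written back to the replicas that are out of date (a read repair), which is how a query that reads the whole table can end up generating writes.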
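To make the read-then-write effect concrete, here is a toy model in Python. It is only an illustration of the idea, not Cassandra's implementation: replicas are plain dictionaries, the timestamps are made up, and the function names (read_with_repair, scan_all) are hypothetical.

```python
# Toy model of read repair: each replica maps key -> (timestamp, value).
# Illustration only; this is not how Cassandra is actually implemented.

def read_with_repair(replicas, key):
    """Return the newest value for key and the number of repair writes sent."""
    versions = [replica.get(key) for replica in replicas]
    present = [v for v in versions if v is not None]
    if not present:
        return None, 0
    newest = max(present, key=lambda v: v[0])  # highest timestamp wins
    repair_writes = 0
    for replica, version in zip(replicas, versions):
        if version != newest:
            replica[key] = newest  # a mutation caused purely by a read
            repair_writes += 1
    return newest[1], repair_writes


def scan_all(replicas):
    """Read every key once, as a full-table count would, and total the repairs."""
    keys = set().union(*(replica.keys() for replica in replicas))
    total_repairs = 0
    for key in keys:
        _, repairs = read_with_repair(replicas, key)
        total_repairs += repairs
    return len(keys), total_repairs


replicas = [
    {"a": (2, "new"), "b": (1, "x")},
    {"a": (1, "old"), "b": (1, "x")},
    {"b": (1, "x")},  # this replica missed the write for "a" entirely
]
count, writes = scan_all(replicas)
print(count, writes)  # 2 keys counted, 2 repair writes sent
```

Running the scan a second time sends no repairs, because the first pass already brought the replicas in line. On a large, inconsistent table the first pass is the expensive one.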

Related

Cosmos DB - Slow COUNT

I am working on an existing Cosmos DB where the number of physical partitions is less than 100. Each contains around 30,000,000 documents. There is an indexing policy in place on "/*".
I'm just trying to get a total count from SQL API like so:
SELECT VALUE COUNT(1) FROM mycollection c
I have set EnableCrossPartitionQuery to true, and MaxDegreeOfParallelism to 100 (so as to at least cover the number of physical partitions AKA key ranges). The database is scaled to 50,000 RU. The query is running for HOURS. This does not make sense to me. An equivalent relational database would answer this question almost immediately. This is ridiculous.
What, if anything, can I change here? Am I doing something wrong?
Microsoft support ended up applying an update to the underlying instance. In this case, the update was in the development pipeline to be rolled out gradually. This instance got it earlier as a result of the support case. The update related to using indexes to service this type of query.

Is it ok to build architecture around regular creation/deletion of tables in DynamoDB?

I have a messaging app, where all messages are arranged into seasons by creation time. There could be billions of messages each season. I have a task to delete messages of old seasons. I thought of a solution, which involves DynamoDB table creation/deletion like this:
Each table contains messages of only one season
When season becomes 'old' and messages no longer needed, table is deleted
Is this a good pattern, and is it encouraged by Amazon?
ps: I'm asking, because I'm afraid of two things, met in different Amazon services -
In Amazon S3 you have to delete each item before you can fully delete bucket. When you have billions of items, it becomes a real pain.
In Amazon SQS there is a notion of 'unwanted behaviour'. When using SQS api you can act badly regarding SQS infrastructure (for example not polling messages) and thus could be penalized for it.
Yes, this is an acceptable design pattern, it actually follows a best practice put forward by the AWS team, but there are things to consider for your specific use case.
AWS has a limit of 256 tables per region, but this can be raised. If you are expecting to need multiple orders of magnitude more than this you should probably re-evaluate.
You can delete a DynamoDB table that still contains records. If you have a large number of records that you have to delete regularly, using a rolling set of tables is actually a best practice.
Creating and deleting tables is an asynchronous operation so you do not want to have your application depend on the time it takes for these operations to complete. Make sure you create tables well in advance of you needing them. Under normal circumstances tables create in just a few seconds to a few minutes, but under very, very rare outage circumstances I've seen it take hours.
The DynamoDB best practices documentation on Understand Access Patterns for Time Series Data states...
You can save on resources by storing "hot" items in one table with
higher throughput settings, and "cold" items in another table with
lower throughput settings. You can remove old items by simply deleting
the tables. You can optionally backup these tables to other storage
options such as Amazon Simple Storage Service (Amazon S3). Deleting an
entire table is significantly more efficient than removing items
one-by-one, which essentially doubles the write throughput as you do
as many delete operations as put operations.
It's perfectly acceptable to split your data the way you describe. You can delete a DynamoDB table regardless of its size or how many items it contains.
As far as I know there are no explicit SLAs for the time it takes to delete or create tables (meaning there is no way to know if it's going to take 2 seconds or 2 minutes or 20 minutes), but as long as your solution does not depend on this sort of timing you're fine.
In fact the idea of sharding your data based on age has the potential of significantly improving the performance of your application and will definitely help you control your costs.
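The rolling set of tables can be planned with a few lines of Python. This is a minimal sketch under stated assumptions: seasons are taken to be calendar quarters, the "messages_YYYY_qN" naming scheme is invented for the example, and the function only computes which tables to create or delete. The actual DynamoDB calls are left out.

```python
from datetime import date

# Hypothetical naming scheme: one table per season, e.g. "messages_2024_q1".
# Seasons are assumed to be calendar quarters purely for illustration.

def season_of(day):
    return day.year, (day.month - 1) // 3 + 1

def table_name(season, prefix="messages"):
    year, quarter = season
    return f"{prefix}_{year}_q{quarter}"

def next_season(season):
    year, quarter = season
    return (year, quarter + 1) if quarter < 4 else (year + 1, 1)

def previous_season(season):
    year, quarter = season
    return (year, quarter - 1) if quarter > 1 else (year - 1, 4)

def plan(today, existing_tables, keep=2):
    """Return (tables to create now, tables that are safe to delete).

    The next season's table is always wanted, so it exists well before
    the first message for it arrives. existing_tables should list only
    the season tables, since anything not wanted is marked for deletion.
    """
    current = season_of(today)
    wanted = {table_name(next_season(current))}
    season = current
    for _ in range(keep):
        wanted.add(table_name(season))
        season = previous_season(season)
    to_create = sorted(wanted - set(existing_tables))
    to_delete = sorted(set(existing_tables) - wanted)
    return to_create, to_delete

existing = ["messages_2023_q3", "messages_2023_q4", "messages_2024_q1"]
print(plan(date(2024, 2, 15), existing))
# (['messages_2024_q2'], ['messages_2023_q3'])
```

Running the plan from a scheduled job, rather than from the request path, keeps the application from depending on how long table creation or deletion takes.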

How can I improve performance while altering a large mysql table?

I have 600 million records in a table and I am not able to add a column to it: every time I try, it times out.
Suppose your MySQL database has a giant table with 600 million rows. Any schema operation, such as adding a unique key, altering a column, or even adding one more column, is a very cumbersome process that takes hours and sometimes ends in a server timeout. To overcome that, one has to come up with a very good migration plan, one of which I am jotting down below.
1) Suppose there is a table Orig_X to which I have to add a new column colNew with a default value of 0.
2) A dummy table Dummy_X is created, which is a replica of Orig_X except that it has the new column colNew.
3) Data is inserted from Orig_X into Dummy_X with the following settings.
4) Autocommit is set to zero, so that data is not committed after each insert statement, which would hinder performance.
5) Binary logging is set to zero for the session, so that no data is written to these logs.
6) After the data is inserted, both settings are set back to one.
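SET AUTOCOMMIT = 0;
SET sql_log_bin = 0;
INSERT INTO Dummy_X (col1, col2, col3, colNew)
SELECT col1, col2, col3, 0 FROM Orig_X; -- the literal 0 fills colNew
COMMIT; -- sql_log_bin cannot be changed inside an open transaction
SET sql_log_bin = 1;
SET AUTOCOMMIT = 1;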
7) Now the primary key can be created, including the newly added column if it is to be part of the primary key.
8) All the unique keys can now be created.
9) We can check the status of the server by issuing the following command
SHOW MASTER STATUS
10) It's also helpful to issue FLUSH LOGS so that MySQL closes the current log files and starts new ones.
11) To speed up repeated, identical SELECT queries, the query cache can be turned on where it is available (it was deprecated in MySQL 5.7.20 and removed in 8.0). Note that it does not speed up an INSERT ... SELECT like the one above.
SHOW VARIABLES LIKE 'have_query_cache';
query_cache_type = 1
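The copy-into-a-new-table steps above can be tried end to end with a small Python sketch. It uses the standard-library sqlite3 module as a stand-in for MySQL, so the session settings (autocommit, sql_log_bin) do not apply here. The final rename that puts the new table into service and the index name idx_dummy_pk are assumptions that are not part of the steps listed above.

```python
import sqlite3

# sqlite3 stands in for MySQL so the sketch runs anywhere; table and
# column names follow the Orig_X / Dummy_X example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Orig_X (col1 INTEGER, col2 TEXT, col3 TEXT)")
conn.executemany(
    "INSERT INTO Orig_X VALUES (?, ?, ?)",
    [(i, f"a{i}", f"b{i}") for i in range(1, 1001)],
)
conn.commit()

# New table: same columns plus colNew with its default value.
conn.execute(
    "CREATE TABLE Dummy_X ("
    "col1 INTEGER, col2 TEXT, col3 TEXT, colNew INTEGER NOT NULL DEFAULT 0)"
)

# Copy everything inside one transaction, committed once at the end.
with conn:
    conn.execute(
        "INSERT INTO Dummy_X (col1, col2, col3) "
        "SELECT col1, col2, col3 FROM Orig_X"
    )

# Build keys only after the load, then swap the names (assumed final step).
conn.execute("CREATE UNIQUE INDEX idx_dummy_pk ON Dummy_X (col1, colNew)")
conn.execute("ALTER TABLE Orig_X RENAME TO Orig_X_old")
conn.execute("ALTER TABLE Dummy_X RENAME TO Orig_X")
conn.commit()

rows, new_sum = conn.execute(
    "SELECT COUNT(*), SUM(colNew) FROM Orig_X"
).fetchone()
print(rows, new_sum)  # 1000 0
```

Keeping the old table around under a new name until the copy has been verified makes the swap easy to undo.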
Above were the steps of the migration strategy for the large table; below I am writing some steps to improve the performance of the database/queries.
1) Remove any unnecessary indexes on the table, and pay particular attention to UNIQUE indexes, as these disable change buffering. Don't use a UNIQUE index if you have no reason for that constraint; prefer a regular INDEX.
2) If bulk loading a fresh table, delay creating any indexes besides the PRIMARY KEY. If you create them once after all the data is loaded, then InnoDB is able to apply a pre-sort and bulk load process, which is both faster and typically results in more compact indexes.
3) More memory can actually help in performance optimization. If SHOW ENGINE INNODB STATUS shows any reads/s under BUFFER POOL AND MEMORY and the number of Free buffers (also under BUFFER POOL AND MEMORY) is zero, you could benefit from more (assuming you have sized innodb_buffer_pool_size correctly on your server).
4) Normally your database table gets re-indexed after every insert. That's some heavy lifting for your database, but when your queries are wrapped inside a transaction, the table does not get re-indexed until after the entire bulk is processed, saving a lot of work.
5) Most MySQL servers have query caching enabled. It's one of the most effective methods of improving performance that is quietly handled by the database engine. When the same query is executed multiple times, the result is fetched from the cache, which is quite fast.
6) Using the EXPLAIN keyword can give you insight on what MySQL is doing to execute your query. This can help you spot the bottlenecks and other problems with your query or table structures. The results of an EXPLAIN query will show you which indexes are being utilized, how the table is being scanned and sorted etc...
7) If your application contains many JOIN queries, you need to make sure that the columns you join by are indexed on both tables. This affects how MySQL internally optimizes the join operation.
8) In every table have an id column that is the PRIMARY KEY, AUTO_INCREMENT and one of the flavors of INT. Also preferably UNSIGNED, since the value cannot be negative.
9) Even if you have a user’s table that has a unique username field, do not make that your primary key. VARCHAR fields as primary keys are slower. And you will have a better structure in your code by referring to all users with their id's internally.
10) Normally when you perform a query from a script, it will wait for the execution of that query to finish before it can continue. You can change that by using unbuffered queries. This saves a considerable amount of memory with SQL queries that produce large result sets, and you can start working on the result set immediately after the first row has been retrieved as you don't have to wait until the complete SQL query has been performed.
11) With database engines, disk is perhaps the most significant bottleneck. Keeping things smaller and more compact is usually helpful in terms of performance, to reduce the amount of disk transfer.
12) The two main storage engines in MySQL are MyISAM and InnoDB, and each has its own pros and cons. MyISAM is good for read-heavy applications, but it doesn't scale very well when there are a lot of writes. Even if you are updating one field of one row, the whole table gets locked, and no other process can even read from it until that query is finished. MyISAM is very fast at calculating SELECT COUNT(*) types of queries. InnoDB tends to be a more complicated storage engine and can be slower than MyISAM for most small applications. But it supports row-based locking, which scales better. It also supports some more advanced features such as transactions.
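A few of the tips above (load in one transaction, create secondary indexes after the load, check the plan with EXPLAIN) can be combined in a short Python sketch. It again uses sqlite3 as a stand-in, so the statement is EXPLAIN QUERY PLAN rather than MySQL's EXPLAIN, and the events table and idx_events_user index are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (id INTEGER PRIMARY KEY, user_id INTEGER, payload TEXT)"
)

rows = [(i, i % 100, f"payload-{i}") for i in range(1, 10001)]

# One transaction for the whole batch: a single commit when the block exits.
with conn:
    conn.executemany("INSERT INTO events VALUES (?, ?, ?)", rows)

# The secondary index is created only after the data is loaded.
conn.execute("CREATE INDEX idx_events_user ON events (user_id)")

# Ask the engine how it would execute a lookup by user_id.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM events WHERE user_id = ?", (42,)
).fetchall()
plan_text = " ".join(str(row[-1]) for row in plan)  # last column is the detail

count = conn.execute(
    "SELECT COUNT(*) FROM events WHERE user_id = ?", (42,)
).fetchone()[0]
print(plan_text)
print(count)  # 100
```

If the plan text does not mention the index, the query is scanning the table, which is the kind of bottleneck the EXPLAIN tip is meant to reveal.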

DynamoDB: Conditional writes vs. the CAP theorem

Using DynamoDB, two independent clients trying to write to the same item at the same time, using conditional writes, and trying to change the value that the condition is referencing. Obviously, one of these writes is doomed to fail with the condition check; that's ok.
Suppose during the write operation, something bad happens, and some of the various DynamoDB nodes fail or lose connectivity to each other. What happens to my write operations?
Will they both block or fail (sacrifice of "A" in the CAP theorem)? Will they both appear to succeed and only later it turns out that one of them actually was ignored (sacrifice of "C")? Or will they somehow both work correctly due to some magic (consistent hashing?) going on in the DynamoDB system?
It just seems like a really hard problem, but I can't find anything discussing the possibility of availability issues with conditional writes (unlike with, for instance, consistent reads, where the possibility of availability reduction is explicit).
There is a lack of clear information in this area but we can make some pretty strong inferences. Many people assume that DynamoDB implements all of the ideas from its predecessor "Dynamo", but that doesn't seem to be the case and it is important to keep the two separated in your mind. The original Dynamo system was carefully described by Amazon in the Dynamo Paper. In thinking about these, it is also helpful if you are familiar with the distributed databases based on the Dynamo ideas, like Riak and Cassandra. In particular, Apache Cassandra provides a full range of trade-offs with respect to CAP.
By comparing DynamoDB, which is clearly distributed, to the options available in Cassandra, I think we can see where it is placed in the CAP space. According to Amazon "DynamoDB maintains multiple copies of each item to ensure durability. When you receive an 'operation successful' response to your write request, DynamoDB ensures that the write is durable on multiple servers. However, it takes time for the update to propagate to all copies." (Data Read and Consistency Considerations). Also, DynamoDB does not require the application to do conflict resolution the way Dynamo does. Assuming they want to provide as much availability as possible, since they say they are writing to multiple servers, writes in DynamoDB are equivalent to Cassandra's QUORUM level. Also, it would seem DynamoDB does not support hinted handoff, because that can lead to situations requiring conflict resolution. For maximum availability, an inconsistent read would only have to be at the equivalent of Cassandra's ONE level. However, to get a consistent read given the quorum writes would require a QUORUM level read (following the R + W > N rule for consistency). For more information on levels in Cassandra see About Data Consistency in Cassandra.
In summary, I conclude that:
Writes are "Quorum", so a majority of the nodes the row is replicated to must be available for the write to succeed
Inconsistent Reads are "One", so only a single node with the row need be available, but the data returned may be out of date
Consistent Reads are "Quorum", so a majority of the nodes the row is replicated to must be available for the read to succeed
So writes have the same availability as a consistent read.
To specifically address your question about two simultaneous conditional writes, one or both will fail depending on how many nodes are down. However, there will never be an inconsistency. The availability of the writes really has nothing to do with whether they are conditional or not I think.
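The R + W > N reasoning behind this summary can be checked with a little arithmetic. The sketch below assumes three replicas per item purely for illustration (the replica count is not stated above), and the helper names are hypothetical.

```python
def quorum(n):
    """Smallest majority of n replicas."""
    return n // 2 + 1

def is_strongly_consistent(n, r, w):
    """True when every read set must overlap every write set (R + W > N)."""
    return r + w > n

def tolerated_failures(n, required):
    """Replicas that can be down while `required` acknowledgements are still possible."""
    return n - required

N = 3  # assumed replica count, for illustration only
W = quorum(N)
for label, r in (("inconsistent read", 1), ("consistent read", quorum(N))):
    print(label, is_strongly_consistent(N, r, W), tolerated_failures(N, r))
print("write", tolerated_failures(N, W))
# inconsistent read False 2
# consistent read True 1
# write 1
```

With these numbers a write and a consistent read both survive one replica being down, while an inconsistent read survives two, which matches the conclusion that writes have the same availability as consistent reads.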

Reindexing a large SQL Server database to Lucene

We have a web service method which accepts some data and puts it in Lucene index. We use it to index new and updated entries from our asp.net web app.
These entries are stored in a large SQL Server table (20M rows and growing), and I need a way to reindex the whole table in case the current index gets deleted or corrupted. I'm not sure what the optimal way is to retrieve chunks of data from a large table. Currently, we use the fact that the table has a PK which is autoincrement, so we get chunks of 1000 rows until it starts to return nothing. Kind of like (in pseudo language):
i = 0
while (true)
{
SELECT col1, col2, col3 FROM mytable WHERE pk > i AND pk <= i + 1000
.... if result is empty 20 times in a row, break ....
.... otherwise send result to web service to reindex ....
i = i + 1000
}
This way, we don't need to SELECT COUNT(*), which would be a big performance killer, and we just move up the pk values until we stop getting any results. This has its downside: if we have a hole greater than 20,000 values somewhere in the table, it will stop indexing, assuming it has reached the end, but that's a tradeoff we have to live with for now.
Can anyone suggest a more efficient way of getting data from a table to index? I would assume we are not the first ones facing this problem - search engines are widely used nowadays :)
For what we do with Lucene, we rarely need to reindex everything. I can't remember coming across any case where the whole index was corrupted (Lucene is actually quite safe/good at this), but there have been many times when individual items needed to be reindexed for one reason or another. I'd say the most frequent reindexing patterns would be:
reindex items by given id (or set of ids)
reindex items by given period of time
The latter, of course, requires separate db index on the relevant date field(s) which should be a bit costly for 20M+ records but we decided to go for it (our biggest deployment had up to 10M records) as disk space is cheap these days anyway.
EDIT: added few explanations as per question author's comment.
If the source data structure changes, requiring reindexing of all records, our approach is to roll out new code which ensures all new data is correct (basically, it forms a correct Lucene Document from this moment on). After that we can reindex things in batches (either automatically or by hand) by providing the relevant period ranges. To a certain extent, this also applies to Lucene version changes.
Why is a COUNT(*) a performance killer? What about MAX(id)? I'm thinking that an index would provide the information needed for those queries. You do have an index on your primary key, right?
I actually just figured it out - I can use IDENT_CURRENT(table_name) to get the last generated id, and use that instead of MAX() or Count() - this method should blow the other two away :)
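Another way to walk the table, which is not what the posts above use, is to resume each chunk after the last primary key seen instead of stepping through fixed id ranges. Holes of any size are then skipped for free, and neither COUNT(*) nor the last generated id is needed. A minimal sketch follows, using sqlite3 as a stand-in for SQL Server (where the query would use TOP instead of LIMIT). The iter_chunks helper is hypothetical.

```python
import sqlite3

def iter_chunks(conn, chunk_size=1000):
    """Yield lists of rows ordered by pk, resuming after the last pk seen."""
    last_pk = 0  # assumes pk values are positive
    while True:
        rows = conn.execute(
            "SELECT pk, col1 FROM mytable WHERE pk > ? ORDER BY pk LIMIT ?",
            (last_pk, chunk_size),
        ).fetchall()
        if not rows:
            return  # nothing beyond last_pk, so the table is exhausted
        yield rows
        last_pk = rows[-1][0]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (pk INTEGER PRIMARY KEY, col1 TEXT)")
# Two dense runs separated by a hole far larger than 20,000 ids.
ids = list(range(1, 2501)) + list(range(100001, 100501))
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?)", [(i, f"row{i}") for i in ids]
)
conn.commit()

chunks = list(iter_chunks(conn))
print(len(chunks), sum(len(c) for c in chunks))  # 3 3000
```

Each chunk would be sent to the reindexing web service in place of the print. Because every query seeks directly into the primary key index, the cost per chunk stays roughly constant no matter how sparse the ids are.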
