Why is the MyISAM storage engine faster than the InnoDB storage engine?

Why is the MyISAM storage engine faster than the InnoDB storage engine?
What makes it faster, and what makes InnoDB slower?

MyISAM can be faster than InnoDB because it has a simpler design. It has also been around much longer, so it has had more time to accumulate incremental improvements and optimizations based on real-world usage.
MyISAM's design is based on ISAM: http://en.m.wikipedia.org/wiki/ISAM
The speed comes at a price, though: MyISAM does not support transactions or foreign keys. That makes it well suited, for example, to storing and analyzing log data and to serving as a backend for data mining and data warehouse technologies, while InnoDB is better for transaction processing and as the storage layer of information systems in general.
Wikipedia has a nice comparison of MySQL storage engines here: http://en.m.wikipedia.org/wiki/Comparison_of_MySQL_database_engines
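To make the transaction difference concrete, here is a minimal Python sketch, assuming a local MySQL server and the mysql-connector-python package (connection settings and table names are placeholders): the same rollback call undoes the InnoDB insert but leaves the MyISAM row in place.

    # Minimal sketch, assuming a local MySQL test database (placeholder credentials).
    import mysql.connector

    conn = mysql.connector.connect(
        host="localhost", user="test", password="test", database="test"  # placeholders
    )
    cur = conn.cursor()

    # Same table definition, different storage engines.
    cur.execute("CREATE TABLE log_myisam (id INT, msg TEXT) ENGINE=MyISAM")
    cur.execute("CREATE TABLE log_innodb (id INT, msg TEXT) ENGINE=InnoDB")

    # InnoDB honours transactions: this insert disappears after the rollback.
    cur.execute("INSERT INTO log_innodb VALUES (1, 'hello')")
    conn.rollback()

    # MyISAM does not: the row is written immediately, so the rollback is a no-op for it.
    cur.execute("INSERT INTO log_myisam VALUES (1, 'hello')")
    conn.rollback()

    cur.execute("SELECT COUNT(*) FROM log_innodb")
    print(cur.fetchone()[0])  # 0
    cur.execute("SELECT COUNT(*) FROM log_myisam")
    print(cur.fetchone()[0])  # 1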

Related

AWS Glue crawler is too slow

I am having a problem with AWS Glue.
I am trying to build a data catalog using a crawler, but it takes too long.
It also uses too much DynamoDB read capacity.
Normally read capacity consumption is low, but while the crawler was running it rose to 62.
I know the crawler only generates metadata, so why does it take so long?
After it had run for about an hour, I cancelled it because it was taking too long.
Is the crawler reading all the data from DynamoDB?
My DynamoDB table:
storage size: 66 GB
records: 158,296,668
read/write capacity mode: on-demand
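For what it's worth, a Glue crawler with a DynamoDB target does scan records from the table to infer the schema, which is what consumes read capacity and time. Below is a hedged boto3 sketch (crawler name, role ARN, database and table names are placeholders) of asking the crawler to sample rows instead of scanning everything and to cap its scan rate.

    # Hedged sketch using boto3; all names/ARNs below are placeholders.
    # scanAll=False asks the crawler to sample rows rather than scan the whole
    # table; scanRate limits the share of read capacity it may consume (its exact
    # effect with on-demand capacity mode is worth checking in the Glue docs).
    import boto3

    glue = boto3.client("glue", region_name="us-east-1")  # placeholder region

    glue.create_crawler(
        Name="dynamodb-catalog-crawler",                         # placeholder
        Role="arn:aws:iam::123456789012:role/GlueCrawlerRole",   # placeholder
        DatabaseName="my_catalog_db",                            # placeholder
        Targets={
            "DynamoDBTargets": [
                {"Path": "my-table", "scanAll": False, "scanRate": 0.5}
            ]
        },
    )
    glue.start_crawler(Name="dynamodb-catalog-crawler")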

Does SQLite have an auto-deletion feature in an in-memory database?

I want to start using an in-memory database with an auto-deletion feature enabled and set to a couple of hours.
I have a huge volume of data and a requirement to support different types of complex queries (only over data persisted in the last one or two hours).
I believe that an in-memory database can help me with that. SQLite is widely adopted and trusted, and it has an in-memory mode. But I didn't find anything about a time-based auto-deletion feature.
I see that auto-deletion is a common feature of in-memory databases (IMDBs), but I couldn't find any SQLite documentation about it.
I would like some feedback/direction from people with more experience with in-memory databases.
Thanks.
Full disclosure: I work for the vendor. eXtremeDB (an in-memory database system available since 2001) has a time-to-live (TTL) feature. It can be expressed either in terms of the maximum number of rows to be stored for a table, or the amount of elapsed time after which a row is purged (but only when a new row comes in). There are non-SQL and SQL APIs available.
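If you want to stay with SQLite: as far as I know it has no built-in time-based auto-deletion, but a TTL can be emulated with a timestamp column and a periodic DELETE. Here is a minimal sketch using Python's standard sqlite3 module (the schema and names are illustrative).

    # Minimal sketch: emulate a two-hour TTL in an in-memory SQLite database by
    # timestamping every row and periodically deleting expired rows.
    import sqlite3
    import time

    TTL_SECONDS = 2 * 60 * 60  # keep roughly the last two hours of data

    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT, created_at REAL NOT NULL)"
    )
    conn.execute("CREATE INDEX idx_events_created_at ON events(created_at)")

    def insert_event(payload: str) -> None:
        conn.execute(
            "INSERT INTO events (payload, created_at) VALUES (?, ?)",
            (payload, time.time()),
        )

    def purge_expired() -> int:
        # Call this from a periodic job (e.g. every few minutes).
        cur = conn.execute(
            "DELETE FROM events WHERE created_at < ?", (time.time() - TTL_SECONDS,)
        )
        return cur.rowcount

    insert_event("example")
    print(purge_expired())  # 0 until rows are older than the TTL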

What is DynamoDB's replication type?

What is the replication type of DynamoDB?
I'm assuming it is peer-to-peer based on online results, but can anyone confirm or deny this?
People often assume a connection between the Dynamo paper and DynamoDB and therefore claim that DynamoDB uses leaderless replication.
However, while DynamoDB is based on many of Dynamo's principles, it is not an implementation of Dynamo. See reference 4 of the Wikipedia article on Dynamo for a quote explaining that it uses single-leader replication.
Dynamo had a multi-leader design that required the client to resolve version conflicts, whereas DynamoDB uses synchronous replication across multiple data centers for high durability and availability.

What is the most efficient way to read N entities from an Azure Table structure

Background - will be using .NET 4.0, Azure SDK 1.7, Azure Table Storage
Problem
How can I most efficiently (i.e., with the fastest processing time) read N entities, where N is large (thousands to millions) and each entity is very small (<200 bytes), from a set of Azure tables, given that I know the PartitionKey and RowKey of each entity up front, i.e. [(P1,R1),(P2,R2),...,(PN,RN)]?
What is the most efficient way to 'batch' process such a request? Naturally, underneath there will be a need to async/parallelise the fetches without causing deadlocks through I/O or synchronisation locks. Ideally the CPU of the server making the calls to Azure Table Storage should reach >80% utilisation, as this processing should be CPU bound rather than I/O or memory bound.
Since you are asking for the "fastest" processing time to read from Azure Storage, here are some general tips that improved my performance (the top ones are the most important):
Ensure the Azure Storage account was created after July 2012. This is Gen2 of Azure Storage, and it includes storage on SSD drives.
In your case, table storage has increased scalability targets for partitions in Gen2 of Azure Storage: http://blogs.msdn.com/b/windowsazure/archive/2012/11/02/windows-azure-s-flat-network-storage-and-2012-scalability-targets.aspx
10 Gbps network vs 1 Gbps networks
A single partition can process 20,000 entities/second
.NET's default connection limit is low; change this number (I think this might be addressed in the new SDK, but I'm not sure): http://social.msdn.microsoft.com/Forums/en-US/windowsazuredata/thread/d84ba34b-b0e0-4961-a167-bbe7618beb83
You can "warm" Azure Storage: the more transactions it sees, the more of the controller/drive cache it will use. However, constantly hitting your storage in this way might be expensive.
You can use MULTIPLE Azure Storage accounts. This can distribute your load very efficiently (sharding): http://tk.azurewebsites.net/2012/08/26/hacking-azure-for-more-disk-performance/
There are several ways to architect/design in Table Storage. You have the partition key and the row key, but you also have the table itself. Remember, this is NoSQL, so you can have 100 tables with the same structure serving different data. That can be a performance boost in itself, and you can also store these tables in different Azure Storage accounts. RowKey -> PartitionKey -> Table -> multiple storage accounts can all be thought of as "indexes" for faster access.
I don't know your data, but since you will be searching on PartitionKey (I assume), maybe instead of storing 1,000,000 really small records per PartitionKey, you could put them in a zip file, fetch and unzip it quickly, and then parallel-query it with LINQ once it is on the local server. Playing with caching will always help, since you have a lot of small objects; you could probably hold entire partitions in memory. Another option might be to store one row per partition key with the column data serialized as binary or comma-separated values.
You say you are on the Azure 1.7 SDK. I had problems using the StorageClient 2.0 library with it, so I used the 1.8 SDK with the StorageClient 2.0 library. This is something to note (not necessarily for performance), since they may have improved the efficiency of the libraries over the last 2+ years.
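Not an answer for the .NET 4.0 / SDK 1.7 stack you mention, but to illustrate the parallel point-read idea, here is a sketch using the current azure-data-tables Python package and a thread pool (the connection string, table name and keys are placeholders).

    # Illustrative sketch only; connection string, table name and keys are placeholders.
    from concurrent.futures import ThreadPoolExecutor
    from azure.data.tables import TableClient

    CONN_STR = "DefaultEndpointsProtocol=https;AccountName=...;AccountKey=..."  # placeholder
    keys = [("P1", "R1"), ("P2", "R2")]  # the known (PartitionKey, RowKey) pairs

    table = TableClient.from_connection_string(CONN_STR, table_name="mytable")

    def fetch(pk_rk):
        pk, rk = pk_rk
        # A point lookup on (PartitionKey, RowKey) is the cheapest read Table Storage offers.
        return table.get_entity(partition_key=pk, row_key=rk)

    # The individual reads are latency-bound, so a generous thread pool keeps many in flight.
    with ThreadPoolExecutor(max_workers=32) as pool:
        entities = list(pool.map(fetch, keys))

    print(len(entities))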

Do non-relational OLAP engines exist?

Like a NoSQL database, but for OLAP. Open source, of course :)
Edit:
OLAP engines use a relational DB behind the scenes. For example, SAP BW can use Oracle, etc. What I meant was an OLAP engine WITHOUT this underlying relational DB. Sort of like Google BigTable with OLAP functionality.
The OLAP DB can be gigantic, since BigTable handles about the same amount of data, and I want to know if anybody has made a model for fusing both.
Palo is an open source OLAP engine that does not use an RDBMS as a backend. All data is stored in memory (and persisted to disk as CSV) so it's not suitable for very large amounts of data.
datacube is a new HBase-backed OLAP engine from Urban Airship.
This is like asking if there are non-brick houses. OLAP is non-relational, or rather, it's more of an application layer or optimized data structure that can live on a relational database.
You probably want to look at Pentaho - specifically Mondrian.
There are several non-relational OLAP engines; look at engines like Druid and Kylin. There is also a big-data analysis platform, Metatron Discovery, that uses them.
