HammerDB - how to do in-memory testing with mysql, TPCC mode?

I was doing TPCC testing with HammerDB 3.1 with MySQL as the backend.
Looking at the resource consumption during testing, I don't think it is doing in-memory testing, as there is a lot of I/O activity.
So my question is whether it is possible to do in-memory HammerDB testing with MySQL server, and how I can achieve that.
Otherwise I am using the default MySQL configuration, based on the HammerDB documentation.

For TPCC testing on relational databases, the I/O activity is divided into two major areas: the data area and the redo/transaction log (or WAL). Both are buffered in memory, but with a key difference. For the data area you have a buffer cache or pool into which you read the data blocks or pages; for MySQL and the InnoDB storage engine this is set with, for example, innodb_buffer_pool_size=64000M. At a basic level, during the test rampup you will read most of your data blocks from disk into this buffer pool, from where all of the operations on the blocks take place in memory. Periodically the modified blocks are written out to disk.

To prevent data being lost as a result of a failure, all of the changes are also written to the redo log, which is flushed to disk on commit. There is an in-memory log buffer where changes are queued and potentially flushed to disk together, but this buffer will be small because all the changes need to reach persistent media when they happen (so a log buffer larger than tens of MB will not fill before it is flushed). Therefore for the TPCC test you will see a lot of write activity to the redo log. If the persistent media (HDD or SSD) cannot keep up with these writes, it becomes the bottleneck that prevents you from achieving a higher transaction rate by adding more virtual users, and consequently you also need less memory in the data area, since by default one virtual user works mostly on one warehouse. If you want to increase the data area activity, the "use all warehouses" check-box will increase the number of warehouses that each virtual user will use.
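As an illustration only (the sizes below are placeholders to be scaled to your dataset and available RAM, not a recipe from the answer above), an InnoDB configuration along these lines keeps most of the data area in memory, while the redo settings control how much of the remaining I/O goes to the log:

# my.cnf sketch - illustrative values only
[mysqld]
innodb_buffer_pool_size = 64000M       # large enough to hold the TPCC working set
innodb_log_file_size = 4G              # a larger redo log smooths checkpoint writes
innodb_log_buffer_size = 64M           # in-memory redo buffer, flushed on commit
innodb_flush_log_at_trx_commit = 1     # 1 = flush redo on every commit (durable);
                                       # 2 or 0 relax flushing at the cost of losing
                                       # the most recent commits on a crash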

Related

SQLite shared cache

I have a huge (>10GB) sqlite database that is shared among many (up to CPU core count) processes (same executable). This is a specialized application, so RAM is not an issue, and I want to cache as much of the database in memory as possible. I have found out about PRAGMA cache_size and I am successfully using it, but this blows the RAM usage out of proportion, as each of the many processes has its own private cache.
Now, I found SQLite Shared-Cache Mode but I can't see if this applies to different processes or just threads in one process. I have run some tests which confirm the latter but I am not sure if I am doing something wrong or whether something else needs to be done to make this work.
That page explains that "the same cache can be shared across an entire process".
In theory, you could try to configure your OS so that the entire database is held in the file cache.
If the amount of data in individual queries is small, it might be worthwhile to use a client/server database so that the caching needs to be done only in the server process.
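As a minimal sketch of the two mechanisms discussed above (the file name is made up), PRAGMA cache_size controls the private page cache, while shared-cache mode is requested through a URI and, as noted, only shares that cache between connections inside one process:

import sqlite3

# Private page cache (what PRAGMA cache_size controls); a negative value
# is a size in KiB, so this asks for roughly 2 GiB for this process.
conn = sqlite3.connect("huge.db")
conn.execute("PRAGMA cache_size = -2097152")

# Shared-cache mode: connections opened this way share one page cache,
# but only within the same process, not across separate processes.
shared = sqlite3.connect("file:huge.db?cache=shared", uri=True)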

SQLite Concurrent Access

Does SQLite3 safely handle concurrent access by multiple processes
reading/writing from the same DB? Are there any platform exceptions to that?
If most of those concurrent accesses are reads (e.g. SELECT), SQLite can handle them very well. But if you start writing concurrently, lock contention could become an issue. A lot would then depend on how fast your filesystem is, since the SQLite engine itself is extremely fast and has many clever optimizations to minimize contention, especially in SQLite 3.
For most desktop/laptop/tablet/phone applications, SQLite is fast enough as there's not enough concurrency. (Firefox uses SQLite extensively for bookmarks, history, etc.)
For server applications, somebody some time ago said that anything less than 100K page views a day could be handled perfectly by a SQLite database in typical scenarios (e.g. blogs, forums), and I have yet to see any evidence to the contrary. In fact, with modern disks and processors, 95% of web sites and web services would work just fine with SQLite.
If you want really fast read/write access, use an in-memory SQLite database. RAM is several orders of magnitude faster than disk.
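For instance, a minimal sketch of such an in-memory database using Python's sqlite3 module (the table is invented); it lives entirely in the process's RAM and disappears when the connection closes:

import sqlite3

conn = sqlite3.connect(":memory:")   # no file, so no disk I/O at all
conn.execute("CREATE TABLE kv (k TEXT PRIMARY KEY, v TEXT)")
conn.execute("INSERT INTO kv VALUES ('a', '1')")
print(conn.execute("SELECT v FROM kv WHERE k = 'a'").fetchone())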
Yes it does.
Let's figure out why.
SQLite is transactional
All changes within a single transaction in SQLite either occur
completely or not at all
Such ACID support, as well as concurrent reads/writes, is provided in two ways: using so-called journaling (let's call it the "old way") or write-ahead logging (let's call it the "new way")
Journaling (Old Way)
In this mode SQLite uses DATABASE-LEVEL locking.
This is the crucial point to understand.
That means whenever it needs to read/write something it first acquires a lock on the ENTIRE database file.
Multiple readers can co-exist and read something in parallel.
During writing it makes sure an exclusive lock is acquired and no other process is reading/writing simultaneously and hence writes are safe.
(This is known as a multiple-readers-single-writer, or MRSW, lock)
This is why they say SQLite implements serializable transactions.
Troubles
Because it needs to lock the entire database every time, and everybody waits for the process handling the write, concurrency suffers and such concurrent writes/reads have fairly low performance.
Rollbacks/outages
Prior to writing something to the database file, SQLite first saves the chunk to be changed in a temporary file. If something crashes in the middle of writing into the database file, it picks up this temporary file and reverts the changes from it.
Write-Ahead Logging or WAL (New Way)
In this case all writes are appended to a temporary file (write-ahead log) and this file is periodically merged with the original database.
When SQLite is searching for something it would first check this temporary file and if nothing is found proceed with the main database file.
As a result, readers don’t compete with writers and performance is much better compared to the Old Way.
Caveats
SQLite heavily depends on the underlying filesystem's locking functionality, so it should be used with caution; more details here.
You're also likely to run into the "database is locked" error, especially in the journaled mode, so your app needs to be designed with this error in mind.
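As a hedged sketch of designing for that error in Python (the file name, retry counts and timeouts are arbitrary), set a busy timeout so SQLite waits for the lock instead of failing immediately, and still catch the error when the wait runs out:

import sqlite3, time

conn = sqlite3.connect("app.db", timeout=5.0)   # wait up to 5 s for locks
conn.execute("PRAGMA busy_timeout = 5000")      # same idea at the SQLite level

def write_with_retry(sql, params, retries=3):
    for attempt in range(retries):
        try:
            with conn:                           # commits on success, rolls back on error
                conn.execute(sql, params)
            return
        except sqlite3.OperationalError as e:    # "database is locked"
            if "locked" not in str(e) or attempt == retries - 1:
                raise
            time.sleep(0.1 * (attempt + 1))      # back off and try again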
Yes, SQLite handles concurrency well, but it isn't the best from a performance angle. From what I can tell, there are no exceptions to that. The details are on SQLite's site: https://www.sqlite.org/lockingv3.html
This statement is of interest: "The pager module makes sure changes happen all at once, that either all changes occur or none of them do, that two or more processes do not try to access the database in incompatible ways at the same time"
Nobody seems to have mentioned WAL (Write-Ahead Log) mode. Make sure the transactions are properly organised; with WAL mode switched on there is no need to keep the database locked for readers while an update is going on.
The only issue is that at some point the WAL needs to be re-incorporated into the main database, and it does this when the last connection to the database closes. With a very busy site you might find it takes a few seconds for all connections to close, but 100K hits per day should not be a problem.
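A minimal sketch of switching a database to WAL mode from Python (the journal mode is persistent, so it only needs to be set once per database file; the file name is made up):

import sqlite3

conn = sqlite3.connect("app.db")
mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
print(mode)                                        # prints "wal" once the switch succeeds
conn.execute("PRAGMA wal_checkpoint(TRUNCATE)")    # optional: fold the WAL back into the db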
In 2019, there are two new concurrent write options not released yet but available in separate branches.
"PRAGMA journal_mode = wal2"
The advantage of this journal mode over regular "wal" mode is that writers may continue writing to one wal file while the other is checkpointed.
BEGIN CONCURRENT - link to detailed doc
The BEGIN CONCURRENT enhancement allows multiple writers to process write transactions simultaneously if the database is in "wal" or "wal2" mode, although the system still serializes COMMIT commands.
When a write-transaction is opened with "BEGIN CONCURRENT", actually locking the database is deferred until a COMMIT is executed. This means that any number of transactions started with BEGIN CONCURRENT may proceed concurrently. The system uses optimistic page-level-locking to prevent conflicting concurrent transactions from being committed.
Together they are present in the begin-concurrent-wal2 branch, or each in its own separate branch.
SQLite has a readers-writer lock on the database level. Multiple connections (possibly owned by different processes) can read data from the same database at the same time, but only one can write to the database.
SQLite supports an unlimited number of simultaneous readers, but it will only allow one writer at any instant in time. For many situations, this is not a problem. Writers queue up. Each application does its database work quickly and moves on, and no lock lasts for more than a few dozen milliseconds. But there are some applications that require more concurrency, and those applications may need to seek a different solution. -- Appropriate Uses For SQLite # SQLite.org
The readers-writer lock enables independent transaction processing and it is implemented using exclusive and shared locks on the database level.
An exclusive lock must be obtained before a connection performs a write operation on a database. After the exclusive lock is obtained, both read and write operations from other connections are blocked till the lock is released again.
Implementation details for the case of concurrent writes
SQLite has a lock table that helps it lock the database as late as possible during a write operation, to ensure maximum concurrency.
The initial state is UNLOCKED; in this state the connection has not accessed the database yet. When a process has connected to a database, and even when a transaction has been started with BEGIN, the connection is still in the UNLOCKED state.
After the UNLOCKED state, the next state is the SHARED state. In order to be able to read (not write) data from the database, the connection must first enter the SHARED state, by getting a SHARED lock.
Multiple connections can obtain and maintain SHARED locks at the same time, so multiple connections can read data from the same database at the same time. But as long as even only one SHARED lock remains unreleased, no connection can successfully complete a write to the database.
If a connection wants to write to the database, it must first get a RESERVED lock.
Only a single RESERVED lock may be active at one time, though multiple SHARED locks can coexist with a single RESERVED lock. RESERVED differs from PENDING in that new SHARED locks can be acquired while there is a RESERVED lock. -- File Locking And Concurrency In SQLite Version 3 # SQLite.org
Once a connection obtains a RESERVED lock, it can start processing database modification operations, though these modifications are only made in a memory buffer rather than actually written to disk; the changes to the pages that were read are kept in that buffer.
When a connection wants to commit a modification (or transaction), it is necessary to upgrade the RESERVED lock to an EXCLUSIVE lock. To get that lock, it must first promote the RESERVED lock to a PENDING lock.
A PENDING lock means that the process holding the lock wants to write to the database as soon as possible and is just waiting on all current SHARED locks to clear so that it can get an EXCLUSIVE lock. No new SHARED locks are permitted against the database if a PENDING lock is active, though existing SHARED locks are allowed to continue.
An EXCLUSIVE lock is needed in order to write to the database file. Only one EXCLUSIVE lock is allowed on the file and no other locks of any kind are allowed to coexist with an EXCLUSIVE lock. In order to maximize concurrency, SQLite works to minimize the amount of time that EXCLUSIVE locks are held.
-- File Locking And Concurrency In SQLite Version 3 # SQLite.org
So you might say SQLite safely handles concurrent access by multiple processes writing to the same DB simply because it doesn't support it! You will get SQLITE_BUSY or SQLITE_LOCKED for the second writer when it hits the retry limit.
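A small sketch of that behaviour using two connections from one Python process (the file and table names are made up, and a short timeout is used so the failure shows up quickly):

import sqlite3

writer1 = sqlite3.connect("demo.db", timeout=0.5)
writer2 = sqlite3.connect("demo.db", timeout=0.5)
writer1.execute("CREATE TABLE IF NOT EXISTS t (x INTEGER)")
writer1.commit()

writer1.execute("BEGIN IMMEDIATE")           # takes the RESERVED lock
writer1.execute("INSERT INTO t VALUES (1)")

try:
    writer2.execute("BEGIN IMMEDIATE")       # second writer cannot get RESERVED
except sqlite3.OperationalError as e:
    print(e)                                 # "database is locked"

writer1.commit()                             # releases the lock; writer2 may now proceed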
This thread is old, but I think it would be good to share the results of my tests on SQLite:
I ran two instances of a Python program (different processes, same program) executing SELECT and UPDATE SQL commands within a transaction, with an EXCLUSIVE lock and a timeout of 10 seconds to get the lock, and the results were frustrating. Every instance did, in a 10,000-step loop (a rough sketch of the loop appears below):
connect to db with exclusive lock
select on one row to read counter
update the row with new value equal to counter incremented by 1
close connection to db
Even though SQLite granted an exclusive lock on the transaction, the total number of actually executed cycles was not 20,000 but less (the total number of iterations over the single counter, counted across both processes).
The Python program almost never threw an exception (only once, during a SELECT, across 20 executions).
The SQLite revision at the moment of the test was 3.6.20, with Python 3.3 on CentOS 6.5.
In my opinion it is better to find a more reliable product for this kind of job, or to restrict writes to SQLite to a single unique process/thread.
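A rough reconstruction of the loop described above (the file name and table layout are guesses; the original code was not posted):

import sqlite3

DB = "counter.db"    # assumed file, containing a one-row table: counter(id, value)

for _ in range(10000):
    conn = sqlite3.connect(DB, timeout=10, isolation_level="EXCLUSIVE")
    row = conn.execute("SELECT value FROM counter WHERE id = 1").fetchone()
    # With this module the SELECT above runs in autocommit mode; the EXCLUSIVE
    # transaction only begins at the UPDATE below, so two processes can read the
    # same value and one increment is lost.
    conn.execute("UPDATE counter SET value = ? WHERE id = 1", (row[0] + 1,))
    conn.commit()
    conn.close()

If the original program looked like this, the read happening outside the exclusive transaction would by itself explain the missing cycles, independent of SQLite's locking.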
It is natural that when you specify a name for the db, or even use an in-memory db, you will get this if you have concurrent access (especially writes).
In my case, I am using SQLite for testing, and it happens because there are several tests in the same solution.
You can make two improvements:
Delete the database before creating it: db.Database.EnsureDeletedAsync();
Use an empty string for the connection; in this case it will create a random name on each call:
{
  "ConnectionStrings": {
    "ConnectionType": "sqlite",
    "ConnectionString": ""
  }
}

Optimizing Put Performance in Berkeley DB

I just started playing with Berkeley DB a few days ago so I'm trying to see if there's something I've been missing when it comes to storing data as fast as possible.
Here's some info about the data:
- it comes in 512 byte chunks
- chunks come in order
- chunks will be deleted in FIFO order
- if I lose some data off the end because of a power failure, that's OK as long as the whole db isn't broken
After reading a bunch of the documentation, it seemed like a Queue db was exactly what I wanted.
However, after trying some test code, my fastest results were about 1 MByte per second just looping through a DB->put with DB_APPEND set. I also tried using transactions and bulk puts, but both of these slowed things down considerably, so I didn't pursue them for long. I was inserting into a fresh db created on a NAND flash chip on my Freescale i.MX35 dev board.
Since we're looking to get at least 2 MBytes per second write speed, I was wondering if there's something I've missed that can improve my speed, since I know that my hardware can write faster than this.
Try putting this into your DB_CONFIG:
set_flags DB_TXN_WRITE_NOSYNC
set_flags DB_TXN_NOSYNC
From my experience, these increase write performance a lot.
DB_TXN_NOSYNC
If set, Berkeley DB will not write or synchronously flush the log on transaction commit or prepare. This means that transactions exhibit the ACI (atomicity, consistency, and isolation) properties, but not D (durability); that is, database integrity will be maintained, but if the application or system fails, it is possible some number of the most recently committed transactions may be undone during recovery. The number of transactions at risk is governed by how many log updates can fit into the log buffer, how often the operating system flushes dirty buffers to disk, and how often the log is checkpointed.
Calling DB_ENV->set_flags with the DB_TXN_NOSYNC flag only affects the specified DB_ENV handle (and any other Berkeley DB handles opened within the scope of that handle). For consistent behavior across the environment, all DB_ENV handles opened in the environment must either set the DB_TXN_NOSYNC flag or the flag should be specified in the DB_CONFIG configuration file.
The DB_TXN_NOSYNC flag may be used to configure Berkeley DB at any time during the life of the application.
DB_TXN_WRITE_NOSYNC
If set, Berkeley DB will write, but will not synchronously flush, the log on transaction commit or prepare. This means that transactions exhibit the ACI (atomicity, consistency, and isolation) properties, but not D (durability); that is, database integrity will be maintained, but if the system fails, it is possible some number of the most recently committed transactions may be undone during recovery. The number of transactions at risk is governed by how often the system flushes dirty buffers to disk and how often the log is checkpointed.
Calling DB_ENV->set_flags with the DB_TXN_WRITE_NOSYNC flag only affects the specified DB_ENV handle (and any other Berkeley DB handles opened within the scope of that handle). For consistent behavior across the environment, all DB_ENV handles opened in the environment must either set the DB_TXN_WRITE_NOSYNC flag or the flag should be specified in the DB_CONFIG configuration file.
The DB_TXN_WRITE_NOSYNC flag may be used to configure Berkeley DB at any time during the life of the application.
See http://www.mathematik.uni-ulm.de/help/BerkeleyDB/api_c/env_set_flags.html for more details.
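As an illustrative sketch, not from the original answer, the same flags can also be set programmatically; this assumes the Python bsddb3 bindings and a fixed 512-byte-record Queue database matching the question (the environment path and file name are placeholders):

from bsddb3 import db

env = db.DBEnv()
# write the log on commit but do not fsync it (trades durability for speed)
env.set_flags(db.DB_TXN_WRITE_NOSYNC, 1)
env.open("/path/to/env",
         db.DB_CREATE | db.DB_INIT_MPOOL | db.DB_INIT_LOG |
         db.DB_INIT_LOCK | db.DB_INIT_TXN)

q = db.DB(env)
q.set_re_len(512)                               # Queue databases use fixed-length records
q.open("chunks.db", dbtype=db.DB_QUEUE, flags=db.DB_CREATE)

q.append(b"\x00" * 512)                         # appends at the tail, like DB_APPEND
q.close()
env.close()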
I suggest you use transactions / the TDS (transactional data store) if, as you mention, you cannot recreate the database if it gets corrupted (i.e. it isn't just a local cache). If you don't care about losing a few items in the event of a crash/power outage, then DB_TXN_WRITE_NOSYNC will improve TDS performance; your database will still be intact and recoverable.
If you store using BTREE and a numeric index (if you have no natural key), and watch out for endianness issues so you get good key locality and high page utilization, then you should be able to get way more than 2,000 inserts a second, especially to SSD, and especially if you use DbMultipleKeyDataBuilder to do bulk inserts.

Available RAM on shared hosting provider

I'm building a business app that will hold somewhere between 50,000 and 150,000 companies. Each company (db row) is represented with 4-5 properties/columns (title, location, ...). The ORM is LINQ2SQL.
I have to do some calculations, and for that I have a lot of queries for a specific company. Right now I go to the db every time I need something, and that produces 50-200 queries, depending on calculation complexity. I tried to put all companies into the cache, and for 10,000 rows (companies) in the db it takes around 5.5 MB of cache. In this scenario, I have only one query.
This application will be on a shared hosting server, so my resources are limited. I'm interested in what will happen if I try to load, let's say, 100,000 companies (rows, objects), or put that in the cache.
Is there any RAM limit that the average hosting company gives to an ASP.NET application? Does it depend on a dedicated Application Pool (I can put the app in a dedicated pool)?
Options are:
- load the whole table into C# objects. I did some memory profiling: 10,000 objects need 5 MB of RAM
- query the db to get referenced objects when needed.
The task is: for a given company A, build the tree of connected companies.
Table and columns:
Company : IdCompany, Title, Address, Contact
CompanyConnection: IdParentCompany, IdChildCompany
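For illustration only (Python rather than the poster's C#, and the function and row names are invented), once both tables are cached in memory the tree can be built with a dictionary lookup per connection row instead of one query per company:

from collections import defaultdict

def build_tree(root_id, connections):
    # connections: iterable of (IdParentCompany, IdChildCompany) rows
    children = defaultdict(list)
    for parent, child in connections:
        children[parent].append(child)

    def expand(company_id, seen=frozenset()):
        # guard against cycles in the connection table
        return {child: expand(child, seen | {company_id})
                for child in children[company_id]
                if child not in seen}

    return {root_id: expand(root_id)}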
Your shared host will likely be IIS 7 on Windows Server running as a virtual machine. This machine will behave as any ordinary machine would - it is not 'aware' of being shared or virtualised.
You should expect Windows to begin paging to disk when it is out of physical RAM, and out-of-memory errors only get thrown when the page file has filled the disk. Of course, you don't ever want to page any part of the warm cache to disk.
Windows itself can begin nagging you about being out of memory, but this does not have the same 'urgency': applications will continue to be able to request RAM, and it will continue to be given (albeit serviced from the page file).
If your application could crash and leave corrupt state or a partial transaction, then you should code defensively and check that memory is available before embarking upon an action.
Create the expected number of objects in a loop with pretend data and watch the memory consumption on the box - the Working Set of the worker process is the one to watch. You can do this in Task Manager.
Watch for Page Faults. These are events when a memory operation had to be directed to disk.
Also, very large sets of objects can cause long garbage collection cycles >1second. This can be a big issue in time-sensitive applications like trading and market data.
Hope that helps.
Update: I do a similar caching thang for a mega data-mining application.
Each ORM type has a GetObject method which uses a giant cache or goes to disk and then updates the cache: Person.GetPerson( check people cache, go to db, add to people cache )
Now my queries return just the unique keys of the results. Then each key is fetched using the above method. This is slow initially until the cache builds up but...
The point being that each query result points to the same instance in memory! This means the RAM footprint is much smaller due to sharing.
The query results are then cached, too. Of course.
Where objects are not immutable, each object-write updates its own instance in the giant cache but also causes all query caches that concern that type of object to void themselves!
Of course, in this application, writes are rare as its mainly reference data.
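The pattern described above is essentially cache-aside combined with an identity map. A rough sketch in Python (the names are invented; the original application is C#/LINQ2SQL):

class PersonStore:
    # Cache-aside plus identity map: every query result points at the same
    # in-memory instance, so the RAM footprint is shared across queries.

    def __init__(self, db):
        self.db = db                 # any object that can load/save/query people
        self.people = {}             # giant object cache keyed by id
        self.query_cache = {}        # query text -> list of ids

    def get_person(self, person_id):
        obj = self.people.get(person_id)
        if obj is None:                              # miss: go to the db, then cache
            obj = self.db.load_person(person_id)
            self.people[person_id] = obj
        return obj

    def query(self, sql):
        ids = self.query_cache.get(sql)
        if ids is None:                              # queries return only unique keys
            ids = self.db.run_query_for_ids(sql)
            self.query_cache[sql] = ids
        return [self.get_person(i) for i in ids]     # shared instances

    def save_person(self, obj):
        self.db.write_person(obj)
        self.people[obj.id] = obj                    # update the one shared instance
        self.query_cache.clear()                     # writes void the query caches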

Cache Persistence

I am using the ASP.NET caching mechanism for a web app whose data changes very frequently.
The cache holds chat participants and their messages, and it needs to keep track of participants' presence.
Data changes very frequently: participants go in and out, and messages are sent and received.
The cache provides me with solutions for:
- performance
- reducing the number of DDL operations in the database (SQL Server) - we had a problem with the transaction log getting full.
I want to continue working this way, but I cannot rely on the cache (I can lose all data when the Cache recycles, or some of the data when the memory gets full).
The option I see right now is to save the data to the database every time the cache changes, otherwise I will lose data.
But this means many SQL update/insert statements.
Someone advised me to persist to the database every N messages/changes, but that is still not a reliable solution; I still lose data.
Does anyone have an idea?
Thanks
Yaron
Fix your database capacity issues. If you need to be able to reliably save n changes per second, then your database needs to be able to handle n operations per second.
Anything else (including saving every few operations) will lead to some possibility of data loss. If you can accept that data loss risk, then that would work.
A distributed cache (project Velocity or otherwise) could also help (data is at least saved to multiple machines). But that needs extra hardware, and you could spend that on the capacity of the database.
Finally, rather than trying to cache changes, look for other opportunities to cache database reads; taking that load off might allow the writes to go through, at least until you get more usage.
