MySQL 5.0, InnoDB table, slow inserts under heavy traffic - innodb

I have an InnoDB table that stores user navigation details once a user logs in.
I use a simple INSERT statement for this purpose.
But sometimes this INSERT takes 15-24 seconds when there is heavy traffic; otherwise, for a single user, it completes in microseconds.
The server has 2GB RAM.
Below are the MySQL configuration details:
max_connections=500
# You can set .._buffer_pool_size up to 50 - 80 % of RAM but beware of setting memory usage too high
innodb_buffer_pool_size = 800M
innodb_additional_mem_pool_size = 20M
# Set .._log_file_size to 25 % of buffer pool size
innodb_log_file_size = 200M
innodb_log_buffer_size = 8M
innodb_flush_log_at_trx_commit = 2
table_cache = 90
query_cache_size = 256M
query_cache_limit = 256M
thread_cache_size = 16
sort_buffer_size = 64M
innodb_thread_concurrency=8
innodb_flush_method=O_DIRECT
innodb_buffer_pool_instances=8
Thanks.

As a first measure, have you considered upgrading? 5.0 is old. It has reached the end of its product lifecycle and has not seen any changes for two years. Serious improvements were made to many aspects of the DBMS in versions 5.1 and 5.5. You should seriously consider upgrading.
You might also want to try the tuning primer script as another angle on which options you can change.
You can also check with SHOW FULL PROCESSLIST which state individual MySQL threads are hanging in. Maybe you will spot something relevant.
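For example, while one of the slow INSERTs is running you could capture the thread states like this (a minimal sketch; note that INFORMATION_SCHEMA.PROCESSLIST only exists from 5.1 onward, so on 5.0 stick to the first statement):
SHOW FULL PROCESSLIST;
-- after an upgrade to 5.1+, the same data can be filtered and sorted:
SELECT ID, USER, STATE, TIME, INFO
FROM INFORMATION_SCHEMA.PROCESSLIST
WHERE COMMAND <> 'Sleep'
ORDER BY TIME DESC;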

Related

MariaDB in RAID 10 NVMe SSD slow read speeds

It would seem to me that we have a bottleneck we just can't seem to get over.
We have a setup which contains 4 NVMe drives in RAID 10.
We are using MariaDB 10.4.
We have indexes.
The workload that we have will be IO bound 99% of the time; there is no way around that fact.
What I have seen while watching the performance dashboard in MySQL Workbench is that both the SATA SSD and the NVMe SSD read at about 100MB/s for the same data set.
Now if I am searching through 200M rows (or pulling 200M), I would think that the InnoDB disk reads would go faster than 100MB/s.
I mean, these drives should be capable of reading 3GB/s, so I would at least expect to see something like 500MB/s.
The reality here is that I am seeing exactly the same speed on the NVMe that I see on the SATA SSD.
So the question I have is: how do I get these disks to be fully utilized?
Here are the only config settings outside of replication:
sql_mode = 'NO_ENGINE_SUBSTITUTION'
max_allowed_packet = 256M
innodb_file_per_table
innodb_buffer_pool_size = 100G
innodb_log_file_size = 128M
innodb_write_io_threads = 16 # Not sure these 2 lines actually do anything
innodb_read_io_threads = 16
"IO bound, there is no way around that fact"
Unless you are very confident in the suitability of your indexes, this seems a little presumptuous.
Assuming you're right, this would imply a 100% write workload, or a data size orders of magnitude larger than the available RAM combined with a uniform distribution of small accesses.
innodb_io_capacity is imposing its default limit, and your hardware is capable of more.
Also, if you are reading that frequently, your innodb_buffer_pool_size isn't sufficient.
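As a rough sketch of that direction (the numbers are assumptions to benchmark against your own array, not recommendations), the my.cnf additions might look like:
# defaults are 200 / 2000, far below what an NVMe RAID 10 can sustain
innodb_io_capacity = 4000
innodb_io_capacity_max = 8000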

The first DB call is much slower

The first call in the morning takes 15 seconds:
FOR EACH ... NO-LOCK:
END.
The second call takes only 1.5 seconds.
What causes this delay?
What can I log to identify it?
Even when I restart the DB I can't reproduce the behaviour of the first call.
(For complex queries I measure a difference of 15 minutes vs. 2 seconds.)
The most likely cause for this will be caching. There are two caches in place:
The -B buffer pool of the database, which caches database blocks in memory. It is a typical observation that, once this cache is warmed up after a restart of the DB server, queries execute much faster. Of course this all depends on the size of your DB and the size of the -B buffer pool; relatively small databases may fit into a relatively large -B buffer pool in large part.
The OS disk cache will also play its part in your observation.
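If you want to warm that -B buffer pool deliberately rather than let the first user of the day pay for it, one option (a sketch only; the database, table and field names are placeholders, and the -B value has to fit your RAM) is to start the broker with a generous -B and run a cheap scan of the hot tables right after startup:
proserve mydb -B 500000
FOR EACH customer FIELDS (custnum) NO-LOCK:
END.
The scan touches every block of the table once, pulling it into -B and the OS cache, so the first real query of the day already finds it in memory.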

Thrift driver OutOfMemory when running multiple Hive queries simultaneously

We use Spark2 Thrift in order to run Hive queries.
Thrift comes as part of HDP 2.6, and our Spark version is 2.1.0.2.6.0.3-8.
The more queries we run simultaneously, the faster we encounter OOM in the driver. These queries also contain JOINs and UNIONs.
From jstat it seems there is no memory leak; however, no matter how much memory is given to the driver, it never seems to be enough. The more queries that are run simultaneously, the faster the Thrift driver starts to perform full GCs until it crashes, since the full GC can't free the old memory (it is still in use).
The OOM never occurs in the executors, only in the driver.
Does anyone work with Thrift over Spark and encounter this problem? If so, how can the Thrift driver be configured not to crash with OOM when running several queries simultaneously?
These are the configurations we use:
Thrift spark driver:
spark.driver.memory=15g
Thrift spark executors:
spark.executor.memory=10g
num cores = 7
config params from /usr/hdp/current/spark2-thriftserver/conf/spark-thrift-sparkconf.conf:
spark.broadcast.blockSize 32m
spark.driver.extraLibraryPath /usr/hdp/current/hadoop-client/lib/native:/usr/hdp/current/hadoop-client/lib/native/Linux-amd64-64
spark.driver.maxResultSize 0
spark.dynamicAllocation.enabled true
spark.dynamicAllocation.executorIdleTimeout 45s
spark.dynamicAllocation.initialExecutors 2
spark.dynamicAllocation.maxExecutors 15
spark.dynamicAllocation.minExecutors 0
spark.dynamicAllocation.schedulerBacklogTimeout 1s
spark.eventLog.dir hdfs:///spark2-history/
spark.eventLog.enabled true
spark.executor.extraLibraryPath /usr/hdp/current/hadoop-client/lib/native:/usr/hdp/current/hadoop-client/lib/native/Linux-amd64-64
spark.executor.memory 10g
spark.files.maxPartitionBytes 268435456
spark.files.openCostInBytes 33554432
spark.hadoop.cacheConf false
spark.history.fs.logDirectory hdfs:///spark2-history/
spark.history.provider org.apache.spark.deploy.history.FsHistoryProvider
spark.kryoserializer.buffer.max 2000m
spark.master yarn-client
spark.memory.offHeap.enabled true
spark.memory.offHeap.size 104857600
spark.scheduler.allocation.file /usr/hdp/current/spark2-thriftserver/conf/spark-thrift-fairscheduler.xml
spark.scheduler.mode FAIR
spark.shuffle.service.enabled true
spark.sql.autoBroadcastJoinThreshold 1073741824
spark.sql.shuffle.partitions 100
spark.storage.memoryMapThreshold 8m
Try changing the scheduler mode to FIFO.
Also, don't forget there are two different zones in memory:
- storage
- execution
By default, 60% of the heap (spark.memory.fraction) is shared between the two and half of that (spark.memory.storageFraction) is reserved for storage, so if you never cache data, decrease the storage share to give more memory where it's needed (they say it is rebalanced automatically, but...).
Try decreasing the Spark shuffle partitions to 100, then to 10 if possible.
Try off-heap memory (never tested it, but it could help).
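A sketch of what those suggestions could look like in spark-thrift-sparkconf.conf (the fractions and sizes are assumptions to experiment with, not tested values):
spark.scheduler.mode FIFO
# leave more of the unified memory region to execution if nothing is cached
spark.memory.storageFraction 0.3
spark.sql.shuffle.partitions 10
spark.memory.offHeap.enabled true
spark.memory.offHeap.size 2g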

Encrypted Vs Unencrypted EBS Volumes AWS

We are testing a standard EBS volume and an encrypted EBS volume on an EBS-optimized m3.xlarge EC2 instance.
While analyzing the test results, we found that
the EBS volume with encryption takes less time for read, write, and read/write operations than the EBS volume without encryption.
I would expect added latency on the encrypted EBS volume because of the extra encryption overhead on every I/O request.
What could be the reason that encrypted EBS volumes are faster than normal EBS volumes?
The expected result would be that the plain EBS volume yields better results than the encrypted EBS volume.
Results:
Encrypted EBS results:
sysbench 0.4.12: multi-threaded system evaluation benchmark
Running the test with following options:
Number of threads: 8
Initializing random number generator from timer.
Extra file open flags: 16384
8 files, 512Mb each
4Gb total file size
Block size 16Kb
Calling fsync() at the end of test, Enabled.
Using synchronous I/O mode
Doing sequential write (creation) test
Threads started!
Done.
Operations performed: 0 Read, 262144 Write, 8 Other = 262152 Total
Read 0b Written 4Gb Total transferred 4Gb (11.018Mb/sec)
705.12 Requests/sec executed
Test execution summary:
total time: 371.7713s
total number of events: 262144
total time taken by event execution: 2973.6874
per-request statistics:
min: 1.06ms
avg: 11.34ms
max: 3461.45ms
approx. 95 percentile: 1.72ms
EBS results:
sysbench 0.4.12: multi-threaded system evaluation benchmark
Running the test with following options:
Number of threads: 8
Initializing random number generator from timer.
Extra file open flags: 16384
8 files, 512Mb each
4Gb total file size
Block size 16Kb
Calling fsync() at the end of test, Enabled.
Using synchronous I/O mode
Doing sequential write (creation) test
Threads started!
Done.
Operations performed: 0 Read, 262144 Write, 8 Other = 262152 Total
Read 0b Written 4Gb Total transferred 4Gb (6.3501Mb/sec)
406.41 Requests/sec executed
Test execution summary:
total time: 645.0251s
total number of events: 262144
total time taken by event execution: 5159.7466
per-request statistics:
min: 0.88ms
avg: 19.68ms
max: 5700.71ms
approx. 95 percentile: 6.31ms
Please help me resolve this issue.
That's certainly unexpected; your expectation is conceptually sound and is also confirmed by Amazon EBS Encryption:
[...] and you can expect the same provisioned IOPS performance on encrypted volumes as you would with unencrypted volumes with a minimal effect on latency. You can access encrypted Amazon EBS volumes the same way you access existing volumes; encryption and decryption are handled transparently and they require no additional action from you, your EC2 instance, or your application. [...] [emphasis mine]
Amazon EBS Volume Performance provides more details on EBS performance in general - from that angle, though this is pure speculation, maybe the use of encryption implies some default pre-warming as described in Pre-Warming Amazon EBS Volumes:
When you create any new EBS volume (General Purpose (SSD), Provisioned IOPS (SSD), or Magnetic) or restore a volume from a snapshot, the back-end storage blocks are allocated to you immediately. However, the first time you access a block of storage, it must be either wiped clean (for new volumes) or instantiated from its snapshot (for restored volumes) before you can access the block. This preliminary action takes time and can cause a 5 to 50 percent loss of IOPS for your volume the first time each block is accessed. [...]
Either way, I suggest rerunning the benchmark after pre-warming both new EBS volumes, in case you haven't done so already.
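For reference, the pre-warming pass the (older) EBS documentation describes boils down to touching every block once; a sketch, assuming the volume is attached as /dev/xvdf (the write variant destroys any data on the volume, so it is only for brand-new, empty volumes):
# new, empty volume: write every block once
sudo dd if=/dev/zero of=/dev/xvdf bs=1M
# volume restored from a snapshot: reading every block is sufficient
sudo dd if=/dev/xvdf of=/dev/null bs=1M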

APC is quick - then bogs down server

I am tentatively serving up a crap site with a good amount of traffic until new development is finished.
The server is a 4GB cloud server with 2 CPU cores running NGINX, PHP-FPM, APC, Memcached and a cloud database server (Rackspace).
The site, to let you know how bad it is, gave me an uncached load of 1.2 with JUST me roaming around on it quickly. Terrible. 170 queries per page, some with 2000 records or more. Terrible.
So, being on Joomla, I enabled APC. It QUICKLY snapped the site up to more than livable while we develop.
Now the site is live and consistently has 30-60 live visitors according to GA.
Here's the weird part. Regardless of whether I use APC or Memcached, the site runs quickly at first after restarting php-fpm, then after a while the load gradually climbs - the CPU is at 1.x, 2.x and upward - never coming back down even after visits subside a bit.
Why is this happening? I've scoured the internet looking for any consistent direction on php-fpm settings, APC settings, etc. It's such a mishmash out there, so I'm hoping for some sound advice on calculating and determining what the settings need to be as demand changes.
Below are my settings - at this point the only thing I can think of would be to cron "service php-fpm restart" every 30 minutes or so.
[APC]
apc.stat = 1
apc.max_file_size = 2M
apc.localcache = 1
apc.localcache.size = 128M
apc.shm_segments = 1
apc.ttl = 3600
apc.user_ttl = 600
apc.gc_ttl = 3600
apc.cache_by_default = 1
apc.filters =
apc.write_lock = 1
apc.num_files_hint = 7000
apc.user_entries_hint = 5000
apc.shm_size = 64M
apc.mmap_file_mask = /tmp/apc.XXXXXX
apc.include_once_override = 0
apc.file_update_protection = 2
apc.canonicalize = 1
apc.report_autofilter = 0
apc.stat_ctime = 0
apc.stat = 0
(this also ends up fragmenting pretty hard - I have apc.php available if anyone needs more information)
pm = dynamic
pm.max_children = 80
pm.start_servers = 32
pm.min_spare_servers = 16
pm.max_spare_servers = 56
pm.max_requests = 1000
(I've played with these... they never seem to make much difference, but I don't think I've found any sound advice either.)
Any help or pointers would be greatly appreciated :-/
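Not a full answer, but for the pm.* side the usual back-of-the-envelope sizing is available RAM divided by the average php-fpm worker size; a sketch with assumed numbers for a 4GB box (measure your own workers with the ps line below):
ps --no-headers -o rss -C php-fpm | awk '{sum+=$1; n++} END {print sum/n/1024, "MB avg per worker"}'
# e.g. 4096 MB total - ~1024 MB for NGINX, Memcached and the OS leaves ~3072 MB for PHP
# 3072 MB / ~60 MB per worker ≈ 50, which suggests pm.max_children closer to 50 than 80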
