Partitioning vs extra database - MariaDB

Where I work, we have a dilemma. We are using a database (MariaDB 10) that has one table that is growing very large (107.4 GiB as I write this, so about 1.181 million rows). This does of course affect the performance of the system.
A coworker and I had a discussion; he suggested partitioning that table. This will likely improve performance, but it does not reduce the size of the DB.
I, however, have been working on a cronjob that will move data older than 2 years from that table to an exact copy of the database in another location.
I feel that this is the more effective approach. I expect that it will not only improve performance (except while the cronjob is running), but I know it will also reduce the size of the table.
We don't expect that our customers are interested in this old data anyway.
The question is: what would you choose? I prefer my option, because the old data is not used anyway and it keeps the main DB a lot cleaner; my coworker prefers his solution because it means less load at all times and customers can still access the old data.
I have read some of the pros of partitioning, but I haven't found a comparison between partitioning and moving old data to another database/location.
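To make my option concrete, the move my cronjob would perform looks roughly like this (a sketch only; sensor_data and archive_db are placeholder names for the real table and the copy at the other location, and in practice I would run the copy and delete in small batches):

-- Sketch: move rows older than 2 years to the copy of the database.
INSERT INTO archive_db.sensor_data
    SELECT * FROM sensor_data
    WHERE datetime < NOW() - INTERVAL 2 YEAR;

DELETE FROM sensor_data
    WHERE datetime < NOW() - INTERVAL 2 YEAR;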
The table in question is used in several queries. This is the most important INSERT:
INSERT INTO ".$defaultDataTable." (
sensor_data_type_id,
sequence_number,
value,
flag,
datetime
) VALUES (
'".Database::esc($sdtid)."',
'".Database::esc($valueSequence)."',
'".Database::esc($value)."',
'".Database::esc($valueSensorDataFlagsExtended)."',
'".Database::esc($valueDateTime)."'
);
The data is selected in several pages of the application, but one example is the following.
SELECT
ws_sensor_data_type.sensor_data_type_id as sensor_data_type_id,
ws_sensor_data_type.name as sensor_data_type_name,
ws_sensor_data_type.equation_id as equation_id,
ws_sensor.name as sensor_name,
ws_equation.description as data_type_name,
ws_basestation.network_id as network_id,
ws_basestation.name as basestation_name,
ws_basestation.worldwide_id as worldwide_id,
ws_client.name as client_name,
ws_sensor.device_type_id as device_type,
ws_sensor.device_id as device_id
FROM
ws_sensor_data_type,
ws_sensor,
ws_basestation,
ws_client_basestation,
ws_client,
ws_equation
WHERE ws_sensor.sensor_id = ws_sensor_data_type.sensor_id
AND ws_sensor.basestation_id = ws_basestation.basestation_id
AND ws_basestation.basestation_id = ws_client_basestation.basestation_id
AND ws_client_basestation.client_id = ws_client.client_id
AND ws_sensor_data_type.equation_id = ws_equation.equation_id
AND ws_sensor_data_type.sensor_data_type_id = '".Database::esc($sdtid)."'
");
In this example, the data, along with some other information, is being selected to create a .CSV export file.
The CREATE TABLE statement will follow, as I am creating a copy of the development DB right now to test partitioning on.
We do not use UUIDs, so that should not be a problem.

It depends.
Partitioning does not inherently improve performance; only a very limited number of use cases show any performance improvement.
If you are only fetching "recent" rows from the table and you have adequate indexing, then "neither" is the answer -- your million rows could grow to a billion without any performance degradation.
If you are using UUIDs, you are doomed. Performance declines terribly once the data is too big to be cached.
You have done some "hand waving". So have I. If you want to continue this discussion, please provide more specifics. CREATE TABLE, sample queries, proposed partition mechanism, proposed mechanism for accessing 'old' data, etc.

Related

Is there any such thing as "too many indexes" when it comes to speed in SQLite3?

I wanted to improve the performance on my SQLite3 database. I went with the most extreme course of action first (just to see what would happen) and added an index to every column of every table in the database.
The database size more than doubled, and to my surprise, performance dropped drastically. Where I had previously gotten 4000 selects per second I now get ~50 selects per second.
This question is not specifically about my case. My question is: is it possible that adding indexes will decrease SELECT performance in SQLite3? I'm asking because I want to know whether my problem is that I added too many indexes, or whether I've made a mistake somewhere else that is causing the slowdown.
To be more specific about my case: the database increased from 140 MB to 280 MB and I have an SSD.
There are mechanisms by which additional indexes can cause a slowdown:
Most optimization decisions are designed for the worst case – when you're accessing data that is too large to fit into any cache and has to be loaded from disk.
If the data itself fits into the caches, but all the various indexes used by your queries are so large that the entire working set becomes too large, you will get more swapping.
SELECT queries will ignore any indexes that are not actually used.
However, INSERT/UPDATE/DELETE statements must update all indexes of the changed table, so every additional index will slow down such changes.
Use EXPLAIN QUERY PLAN to check which indexes are actually used by a query.
Read Query Planning and The SQLite Query Planner to understand how indexes can be used.
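For example (a minimal sketch; the table and index names here are invented):

-- Sketch: ask SQLite which index, if any, it will use for a query.
EXPLAIN QUERY PLAN
SELECT * FROM readings WHERE sensor_id = 42;

-- Output along the lines of
--   SEARCH readings USING INDEX idx_readings_sensor (sensor_id=?)
-- means the index is used, while a plain
--   SCAN readings
-- means the whole table is scanned and the index did not help.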

SQLite3 database performance

I want to create a database. Simple, I think: just to store a phone number, date, time and a note.
Is it better (for database performance) to use a new table for every phone number and its notes, or one table with all the information in it?
The right way is to normalize your data (hence, use as many tables as needed).
If you split your data into several tables (assuming you use indexes), write performance will be better.
Regarding read performance, it depends on the size of the data (namely the notes), but I would argue that having more tables is also better -- except if indexing is out of the question (no reason for that, really) and you would otherwise need to join tables to get the data. Even then, I don't think it would be a big trade-off.
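A minimal sketch of such a normalized layout (table and column names are just examples):

-- One row per phone number, one row per note; each note references its number.
CREATE TABLE phone_numbers (
    id     INTEGER PRIMARY KEY,
    number TEXT NOT NULL UNIQUE
);

CREATE TABLE notes (
    id              INTEGER PRIMARY KEY,
    phone_number_id INTEGER NOT NULL REFERENCES phone_numbers(id),
    created_at      TEXT NOT NULL,   -- date and time of the note
    note            TEXT
);

-- Index so that fetching all notes for one number stays fast.
CREATE INDEX idx_notes_phone_number ON notes(phone_number_id);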
SQLite can write millions of rows per second and read even more; are you sure you want to ask this question?

Calculate at runtime vs Lookup from SQL Server Table

I have an MVC application that needs to run several trillion calculations. Of those, I am interested in only about 8 million results. I have to do this work because I need to see an overall high and low score. I will save this data and store it in a single table of 16 floats. I also have a few indexes on this table for lookups. So far I have only processed 5% of my data.
As users enter data into my website, I have to do calculations based on their data. I have to determine the best and worst outcomes. This is only about 4 million calculations. Right now, that takes about a second or less to calculate on my local PC. Alternatively, it is a simple query that will always return two records from my stored data: the best and the worst. Right now, the query to get the results is the same speed as or faster than calculating the result, but I don't have all 8 million records yet. I am worried that the DB will get slow.
I was thinking I would use the Database Lookup, and if performance became an issue, switch to runtime calculation.
QUESTION: Should I just save myself the trouble and do the runtime calculation anyway?
I am not sure which option is more scalable. I don't expect a large user base for this website.
The site needs to be snappy.
Your question is a little vague to provide a clear-cut answer, but my guess is that using the DB to calculate the totals will be far more efficient than writing the code on the website. SQL Server will attempt to optimize the query to use as much of the server's resources as possible to make it more efficient. Your code won't do that unless you specifically write it to do so.
I would start by loading the data and doing tests before making an optimization strategy. You have no idea where the real bottlenecks of the system will be before you load data that is remotely close to what you are going to have to deal with.
If I understand the question, performing the calculation is more scalable, as it operates on that single data set. As you add data to a table, lookups will get slower even with indexes. The indexes also increase the table size and increase the time required to insert a record.
If I've understood you correctly, this is a question about caching - should you calculate on the fly, or lookup the results in a cache?
In most web architectures, your SQL database is a brilliant cache, right up to the point where it becomes a terrible cache. Scaling your (SQL) database is notoriously tricky - introducing clustering, sharding etc. becomes a production in its own right.
My - very general - advice is to use your relational database for managing transactional data, and to use caching technology for caching. 8 million records should fit into RAM on a decent server these days - and you can add web servers far more cheaply than scaling your database.

Partition Or Separate Table

I am designing a database for a fleet management system.
I will be getting n records every 3 seconds. Obviously, there will be millions of records in my table, where I am going to store the current information of each vehicle in the current_location table. Here performance is a BIG issue.
To solve this, I received the following suggestions:
Create a separate table for each vehicle.
Here a table would be created at run time, as soon as I click on 'create new table', and all the data related to a particular vehicle would be inserted into and retrieved from that particular table.
Go for partitioning.
Please answer the following questions about these solutions.
What is the difference between the two?
Which is best and why?
At what point will the number of rows in the tables cause performance issues?
Are there any other solutions?
Now, if I go for range partitioning in SQL Server 2008, how should I partition using a varchar(20) column?
I am planning to partition based on the vehicle number, e.g. 'MH30 Q 1234'.
In the vehicle number, only the '30' and the 'Q' parts change, so my question is: how should I write the partition function?
(Note: this question was originally asked for MySQL; I have since shifted from MySQL to SQL Server, with the same question.)
Definitely use partitioning. Why go to all the hassle of figuring out which table to use to answer a question when MySQL will do it for you? And good luck finding the current location of all of your trucks if you're not using partitioning (i.e. with a separate table per vehicle)!
Partitioning gives you the performance benefits of multiple tables, but with automatic pruning (selection of just the partitions needed to answer the query).
Nothing is ever "best". The question is: what is best for your problem?
That is impossible to answer here. You will just have to monitor your system for performance issues and adjust server settings or scale as necessary.
At least as far as MySQL is concerned, none of the other options is as good as partitioning!
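For the SQL Server 2008 part of the question, a range partition function on a varchar column looks roughly like the sketch below (the boundary values and table layout are invented and would have to match your real vehicle-number prefixes):

-- Sketch: partition current_location on the leading characters of the vehicle number.
CREATE PARTITION FUNCTION pf_vehicle_no (varchar(20))
AS RANGE RIGHT FOR VALUES ('MH10', 'MH20', 'MH30', 'MH40');

CREATE PARTITION SCHEME ps_vehicle_no
AS PARTITION pf_vehicle_no ALL TO ([PRIMARY]);

CREATE TABLE current_location (
    vehicle_no  varchar(20) NOT NULL,
    recorded_at datetime    NOT NULL,
    latitude    float       NOT NULL,
    longitude   float       NOT NULL
) ON ps_vehicle_no (vehicle_no);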
Don't bother with partitioning for 28,800 rows per day.
We don't (yet), even with over 5 million rows per day. (The "yet" means we have had no business input yet on what data-retention policy they want.)
There should be very little performance difference between making a separate table for each vehicle and making the vehicle ID the first field in the primary key. You get the same grouping on disk either way, and MySQL should have no trouble with millions of rows in a table.
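A sketch of that second option, with invented column names (InnoDB clusters rows by their primary key, so putting the vehicle ID first keeps each vehicle's rows together on disk):

CREATE TABLE current_location (
    vehicle_id  INT      NOT NULL,
    recorded_at DATETIME NOT NULL,
    latitude    DOUBLE   NOT NULL,
    longitude   DOUBLE   NOT NULL,
    PRIMARY KEY (vehicle_id, recorded_at)  -- vehicle ID first = rows grouped per vehicle
) ENGINE=InnoDB;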
Partitions are only useful if you have multiple disks on your machine and want to spread the load across disks.
So I guess my answer is: do neither. Designing this in up front seems like overkill.
One thing I do want to point out is that having one table (which you can partition later, when you need to) will be much easier to maintain, both in the database and in terms of querying the data.

How many rows can an SQLite table hold before queries become time consuming

I'm setting up a simple SQLite database to hold sensor readings. The tables will look something like this:
sensors
- id (pk)
- name
- description
- units
sensor_readings
- id (pk)
- sensor_id (fk to sensors)
- value (actual sensor value stored here)
- time (date/time the sensor sample was taken)
The application will be capturing about 100,000 sensor readings per month from about 30 different sensors, and I'd like to keep all sensor readings in the DB as long as possible.
Most queries will be in the form
SELECT * FROM sensor_readings WHERE sensor_id = x AND time > y AND time < z
This query will usually return about 100-1000 results.
So the question is: how big can the sensor_readings table get before the above query becomes too time-consuming (more than a couple of seconds on a standard PC)?
I know that one fix might be to create a separate sensor_readings table for each sensor, but I'd like to avoid this if it is unnecessary. Are there any other ways to optimize this DB schema?
If you're going to be using time in the queries, it's worthwhile adding an index to it. That would be the only optimization I would suggest based on your information.
100,000 insertions per month equates to about 2.3 per minute so another index won't be too onerous and it will speed up your queries. I'm assuming that's 100,000 insertions across all 30 sensors, not 100,000 for each sensor but, even if I'm mistaken, 70 insertions per minute should still be okay.
If performance does become an issue, you have the option to offload older data to a historical table (say, sensor_readings_old) and only do your queries on the non-historical table (sensor_readings).
Then you at least have all the data available without affecting the normal queries. If you really want to get at the older data, you can do so but you'll be aware that the queries for that may take a while longer.
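A sketch of both suggestions, using the schema from the question (the composite index is one reasonable choice given the sensor_id = x AND time > y AND time < z shape of the query; :cutoff is a placeholder for the archiving threshold):

-- Index covering the usual query: jump to one sensor, then scan its time range.
CREATE INDEX idx_readings_sensor_time ON sensor_readings (sensor_id, time);

-- Offloading old rows to a historical table, if that ever becomes necessary.
INSERT INTO sensor_readings_old SELECT * FROM sensor_readings WHERE time < :cutoff;
DELETE FROM sensor_readings WHERE time < :cutoff;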
Are you setting indexes properly? Besides that and reading http://web.utk.edu/~jplyon/sqlite/SQLite_optimization_FAQ.html, the only answer is 'you'll have to measure yourself' - especially since this will be heavily dependent on the hardware and on whether you're using an in-memory database or on disk, and on if you wrap inserts in transactions or not.
That being said, I've hit noticeable delays after a couple of tens of thousands of rows, but that was absolutely non-optimized. From reading a bit, I get the impression that there are people with hundreds of thousands of rows, with proper indexes etc., who have no problems at all.
SQLite now supports R-tree indexes (http://www.sqlite.org/rtree.html), ideal if you intend to do a lot of time-range queries.
Tom
I know I am coming to this late, but I thought this might be helpful for anyone that comes looking at this question later:
SQLite tends to be relatively fast on reads as long as it is only serving a single application/user at a time. Concurrency and blocking can become issues with multiple users or applications accessing it at the same time, and more robust databases like MS SQL Server tend to work better in a high-concurrency environment.
As others have said, I would definitely index the table if you are concerned about the speed of read queries. For your particular case, I would probably create one index that included both id and time.
You may also want to pay attention to the write speed. Insertion can be fast, but commits are slow, so you probably want to batch many insertions together into one transaction before hitting commit. This is discussed here: http://www.sqlite.org/faq.html#q19
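A sketch of that batching in plain SQL (the values are placeholders; the same pattern applies when going through the C API or language bindings):

-- One transaction means one sync to disk for the whole batch, instead of one per row.
BEGIN TRANSACTION;
INSERT INTO sensor_readings (sensor_id, value, time) VALUES (1, 20.5, '2010-01-01 12:00:00');
INSERT INTO sensor_readings (sensor_id, value, time) VALUES (1, 20.7, '2010-01-01 12:05:00');
-- ... many more rows ...
COMMIT;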

Resources