websql performance, can we shard tables - sqlite

I am using WebSQL to store data in a PhoneGap application. One of the tables has a lot of data, say from 2,000 to 10,000 rows, so when I read from this table, which is just a simple SELECT statement, it is very slow. I then debugged and found that as the size of the table increases, performance degrades dramatically. I read somewhere that to get performance you have to divide the table into smaller chunks; is that possible, and how?

One idea is to look for something to group the rows by, and consider breaking the data into separate tables based on some common category instead of one shared table for everything.
I would also consider fine-tuning the queries to make sure they are optimal for the given table.
Make sure you're not just running a bare SELECT without a WHERE clause to limit the result set.
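For illustration, here is a minimal sketch of that advice against a hypothetical readings table (the table, column, and index names are made up; in WebSQL the statements would be passed to executeSql, but the SQL itself is plain SQLite):
-- index the column used in the WHERE clause so SQLite can avoid a full table scan
CREATE INDEX IF NOT EXISTS idx_readings_category ON readings (category);
-- read only the rows you need instead of the whole table
SELECT id, category, payload
FROM readings
WHERE category = 'sensors'
ORDER BY id
LIMIT 100;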

Related

Should I use WITH instead of a JOIN on a table with a lot of data?

I have a MariaDB table which contains a lot of metadata and is very big in terms of bytes.
I have columns A, B in that table along with other columns.
I would like to join that table with another table (stuff) in order to get column C from it.
So I have something like:
SELECT metadata.A, metadata.B, stuff.C
FROM metadata
JOIN stuff ON metadata.D = stuff.D
This query sometimes takes a very long time. I suspect it's because (AFAIK, please correct me if I'm wrong) the JOIN stores the result of the join in some side table, and because the metadata table is very big it has to copy a lot of data even though I don't use it, so I thought about optimizing it with WITH as follows:
WITH m as (SELECT A,B,D FROM metadata),
s as (SELECT C,D FROM stuff)
SELECT * FROM m JOIN s ON m.D = s.D;
The execution plan is the same (using EXPLAIN), but I think it will be faster, since the side tables that will be created by WITH (again, AFAIK WITH also creates side tables, please correct me if I'm wrong) will be smaller and only contain the needed data.
Is my logic correct? Is there some way I can test that in MariaDB?
More likely, there is some form of cache speeding up one query or the other.
The Query cache is usually recognizable by a query time that is only about 1ms. It can be turned off via SELECT SQL_NO_CACHE ... to get a timing to compare against.
The other likely cache is the buffer_pool. Data is read from disk into the buffer_pool unless it is already there. The simple workaround for strange timings is to run the query twice and take the second 'time'.
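As a minimal sketch of that comparison in the MariaDB client, using the tables from the question:
-- run it twice and compare the second timing, so the buffer_pool is warm in both cases
SELECT SQL_NO_CACHE metadata.A, metadata.B, stuff.C
FROM metadata
JOIN stuff ON metadata.D = stuff.D;
-- unlike EXPLAIN, ANALYZE executes the statement and reports actual row counts and time per table
ANALYZE FORMAT=JSON
SELECT metadata.A, metadata.B, stuff.C
FROM metadata
JOIN stuff ON metadata.D = stuff.D;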
Your hypothesis that WITH creates 'small' temp tables falls apart because the work needed to read the original tables is the same with or without WITH.
Please provide SHOW CREATE TABLE for the two tables. There are a couple of datatype issues that may be involved -- big TEXTs or BLOBs.
The newly-added WITH opens up the possibility of recursive CTEs (and other things). And it provides a way to materialize a temp table that is used more than once. Neither of those applies in your query, so I would not expect any performance improvement.
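For contrast, a hypothetical sketch (not from the question) of a case where WITH can pay off, because the materialized result is referenced twice:
WITH totals AS (
    SELECT D, COUNT(*) AS cnt
    FROM metadata
    GROUP BY D
)
SELECT s.C, t.cnt
FROM stuff AS s
JOIN totals AS t ON t.D = s.D
-- the GROUP BY result is computed once and read twice; in your query each CTE is read only once
WHERE t.cnt > (SELECT AVG(cnt) FROM totals);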

Is there any such thing as "too many indexes" when it comes to speed in SQLite3?

I wanted to improve the performance on my SQLite3 database. I went with the most extreme course of action first (just to see what would happen) and added an index to every column of every table in the database.
The database size more than doubled, and to my surprise, performance dropped drastically. Where I had previously gotten 4000 selects per second I now get ~50 selects per second.
This question is not specifically about my case. My question is: is it possible that adding indexes will decrease SELECT performance in SQLite3? I'm asking because I want to know if my problem is that I added too many indexes, or if I've made a mistake somewhere else that is causing the slowdown.
To be more specific about my case: the database increased from 140 MB to 280 MB and I have an SSD.
There are mechanisms by which additional indexes could cause a slowdown:
Most optimization decisions are designed for the worst case – when you're accessing data that is too large to fit into any cache and has to be loaded from disk.
If the data itself fits into the caches, but all the various indexes used by your queries are so large that the entire working set becomes too large, you will get more swapping.
SELECT queries will ignore any indexes that are not actually used.
However, INSERT/UPDATE/DELETE statements must update all indexes of the changed table, so every additional index will slow down such changes.
Use EXPLAIN QUERY PLAN to check which indexes are actually used by a query.
Read Query Planning and The SQLite Query Planner to understand how indexes can be used.
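A small sketch of that check in the sqlite3 shell, with made-up table and index names (the exact output wording varies between SQLite versions):
EXPLAIN QUERY PLAN
SELECT * FROM orders WHERE customer_id = 42;
-- a plan line like "SEARCH orders USING INDEX idx_orders_customer (customer_id=?)" means the index is used
-- a line starting with "SCAN orders" means a full table scan, i.e. no index helps this query
DROP INDEX IF EXISTS idx_orders_comment;
-- indexes that never appear in any query plan only cost space and write time, so they can be dropped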

Should I be worried about the settings table getting huge?

I have a pretty fat settings table in SQL Server 2012, now with over 100 columns. As the name suggests, this table keeps track of all kinds of setting values within our website. It used to have fewer than 50 columns, but its size has now doubled.
The reason why I store setting values into database is because users will need to have ability to change these settings via UI.
Should I really be worried about this table getting bigger and bigger over time? Or will I have to find some other way to store settings data, e.g. saving it into files?
First, you don't need to store settings in a database in order to let users update them at runtime. You can simply store them in a settings file, for example an XML config file, that gets updated whenever the user makes changes; this works well.
If, however, the application is network based, and you want the settings to follow the user from machine to machine, it makes more sense to put it in a database.
Second, yes... 100 columns is huge. Instead of storing each setting in a separate column, you might consider storing each setting in a separate row, and then have a common row format which is ID, SettingName, SettingValue, (maybe) DefaultValue. Then your table can grow as large as you like.
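A minimal T-SQL sketch of that one-row-per-setting layout (the column names follow the answer; everything else is illustrative):
CREATE TABLE Settings (
    ID           INT IDENTITY(1,1) PRIMARY KEY,
    SettingName  NVARCHAR(100) NOT NULL UNIQUE,
    SettingValue NVARCHAR(MAX) NULL,
    DefaultValue NVARCHAR(MAX) NULL
);
-- adding a new setting becomes an INSERT instead of an ALTER TABLE
INSERT INTO Settings (SettingName, SettingValue, DefaultValue)
VALUES (N'HomePageTitle', N'Welcome', N'Welcome');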
We are using JSON to store user settings. The table contains only two columns: the user ID and the settings string. This string is quite long, but it doesn't matter. You can also use XML to store this data.
This is a worse solution if you want to modify the data by hand, but it is faster to fetch from your DB and process on the client or on the ASP.NET server.
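A sketch of that two-column layout with illustrative names (SQL Server 2012 has no native JSON support, so the string is simply stored and parsed by the application):
CREATE TABLE UserSettings (
    UserId       INT PRIMARY KEY,
    SettingsJson NVARCHAR(MAX) NOT NULL  -- e.g. N'{"theme":"dark","pageSize":25}'
);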
I am imagining that you are concerned about performance on huge tables?
One question is how many rows are in this table? 100 columns with 10,000 rows is no real problem. 100 columns over 10 million rows is a slightly different ballgame. Not worse or better, just different.
The same considerations apply for small and large tables:
1. Are you indexing properly?
2. Is your IO fine?
3. Is your space fine?
4. Are you querying efficiently?
There is no right answer for this; it would depend on why you have big column counts and whether it's hitting your overall performance.
We run 1000s of tables with > 150 columns and no problems, even with millions of rows between them, and I can't complain about performance.
And this is relatively de-normalized data, so lots of text.

sqlite3 insert into dynamic table

I am using sqlite3 (maybe sqlite4 in the future) and I need something like dynamic tables.
I have many tables with the same format: values_2012_12_27, values_2012_12_28, ... (the number of tables is dynamic) and I want to dynamically select the table that receives some data.
I am using sqlite3_prepare with INSERT INTO ? VALUES(?,?,?). Of course this fails to compile (syntax error near "?"). Is there a nice and simple way to do this in SQLite?
Thanks
Using SQL parameters is not possible for identifiers such as table or column names.
If you don't want to keep so many prepared statements around, just prepare them on the fly whenever you need one.
If your database were properly normalized, you would have a single big values table with an extra date column.
This organization is usually to be preferred, unless you have measured both and found that the better performance (if it actually exists) outweighs the overhead of managing multiple tables.
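A sketch of that normalized layout, using illustrative column names since the original per-day schema is not shown in the question:
-- one table replaces values_2012_12_27, values_2012_12_28, ...
CREATE TABLE daily_values (
    day TEXT NOT NULL,  -- e.g. '2012-12-27'
    a   REAL,
    b   REAL,
    c   REAL
);
CREATE INDEX idx_daily_values_day ON daily_values (day);
-- this statement can be prepared once; the date is just another parameter
INSERT INTO daily_values (day, a, b, c) VALUES (?, ?, ?, ?);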

Database design question: How to handle a huge amount of data in Oracle?

I have over 1,500,000 data entries, and the number is going to increase gradually over time. This huge amount of data would come from 150 regions.
Now, should I create 150 tables to manage this growing volume of data? Will that be efficient? I need fast operations. ASP.NET and Oracle will be used.
If all the data is the same, don't split it into different tables. Take a look at Oracle's table partitions. One hundred fifty partitions (or more), split out by region, is probably more in line with what you're going to be looking for.
I would also recommend you look at the Oracle Database Performance Tuning Tips & Techniques book and browse Ask Tom on Oracle's website.
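A hedged sketch of what list partitioning by region could look like (the table name, columns, and region values are made up):
CREATE TABLE entries (
    entry_id   NUMBER PRIMARY KEY,
    region_id  NUMBER NOT NULL,
    created_at DATE   NOT NULL,
    payload    VARCHAR2(4000)
)
PARTITION BY LIST (region_id) (
    PARTITION p_region_1 VALUES (1),
    PARTITION p_region_2 VALUES (2),
    -- ... one partition per region, up to the 150 regions ...
    PARTITION p_other    VALUES (DEFAULT)
);
-- a query that filters on region_id only has to touch the matching partition
SELECT COUNT(*) FROM entries WHERE region_id = 2;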
Only 1.5 M rows? Not a lot really...
Use one table; working out how to write a 150-way union across 150 tables will be murder.
1.5 million rows doesn't really seem like that much. How many people are accessing the table(s) at any given point? Do you have any indexes set up? If you expect it to grow much larger, you may want to look into database partitioning.
FWIW, I work with databases on a regular basis with 100M+ rows. It shouldn't be this bad unless you have thousands of people using it at a time.
One table per region is far from normalized; you're probably going to lose a bunch of efficiency there. One table per data entry site is pretty unusual too. Normalization is huge; it will save you a ton of time down the road, so I'd make sure you're not storing any duplicate data.
If you're using Oracle, you shouldn't need multiple tables. It'll support a lot more than 1.5 million rows. If you need to speed up data access, you can try a snowflake schema to pull in commonly accessed data.
If you mean 1,500,000 rows in a table then you do not have much to worry about. Oracle can handle much larger loads than that with ease.
If you need to identify the regions that the data came in, you can create a Region table and tie the ID from that to the big data table.
IMHO, you should post more details and we can help you better.
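For illustration, that Region lookup table might look like this (names are hypothetical):
CREATE TABLE regions (
    region_id   NUMBER PRIMARY KEY,
    region_name VARCHAR2(100) NOT NULL
);
CREATE TABLE data_entries (
    entry_id  NUMBER PRIMARY KEY,
    region_id NUMBER NOT NULL REFERENCES regions (region_id),
    payload   VARCHAR2(4000)
);
-- index the foreign key so per-region lookups stay fast as the table grows
CREATE INDEX idx_data_entries_region ON data_entries (region_id);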
A database with 2,000 rows can be slow. It all depends on your database design, indexes, keys, and most importantly the hardware configuration your database server is running on. The way your application uses this data is also important. Is it a read-intensive database or a transaction-intensive one? There is no right answer to what you are asking right now.
You first need to consider what operations are going to access the table. How will inserts be performed? Will the existing rows be updated, and if so how? By how much will the rows grow, and what percentage of them will grow? Will rows get deleted? By what criteria? How will you be selecting data? By what criteria and how many per query?
Data partitioning can be used for volumes of data much larger than 1.5M rows. Look into optimizing the SQL queries, batch processing, and storage of the data.
