Database design question: How to handle a huge amount of data in Oracle? - asp.net

I have over 1,500,000 data entries, and the number will increase gradually over time. This data comes from 150 regions.
Should I create 150 tables to manage this growing volume of data? Would that be efficient? I need fast operations. ASP.NET and Oracle will be used.

If all the data has the same structure, don't split it into different tables. Take a look at Oracle's table partitioning. One hundred fifty partitions (or more), split out by region, is probably more in line with what you're looking for.
I would also recommend you look at the Oracle Database Performance Tuning Tips & Techniques book and browse Ask Tom on Oracle's website.
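For illustration, a minimal sketch of list partitioning by region in Oracle (the table and column names here are assumptions, not from the question):

```sql
-- Hypothetical table, list-partitioned so each region's rows live in their own partition
CREATE TABLE data_entry_part (
    region_id NUMBER         NOT NULL,
    entry_id  NUMBER         NOT NULL,
    payload   VARCHAR2(4000)
)
PARTITION BY LIST (region_id) (
    PARTITION region_1     VALUES (1),
    PARTITION region_2     VALUES (2),
    -- ... one partition per region, up to 150 ...
    PARTITION region_other VALUES (DEFAULT)
);
```

Queries that filter on region_id then touch only the matching partition, which is the behavior you would otherwise be hand-rolling with 150 tables.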

Only 1.5 M rows? Not a lot really...
Use one table; working out how to write a 150-way union across 150 tables will be murder.

1.5 million rows doesn't really seem like that much. How many people are accessing the table(s) at any given point? Do you have any indexes set up? If you expect it to grow much larger, you may want to look into partitioning.
FWIW, I regularly work with databases of 100M+ rows. It shouldn't be that bad unless you have thousands of people using it at a time.

One table per region is far from normalized; you're probably going to lose a bunch of efficiency there. One table per data-entry site is pretty unusual, too. Normalization is huge and will save you a ton of time down the road, so I'd make sure you're not storing any duplicate data.
If you're using Oracle, you shouldn't need multiple tables. It will support a lot more than 1.5 million rows. If you need to speed up data access, you can try a snowflake schema to pull in commonly accessed data.

If you mean 1,500,000 rows in a table, then you do not have much to worry about. Oracle can handle much larger loads than that with ease.
If you need to identify which region the data came from, you can create a Region table and tie its ID to the big data table.
IMHO, you should post more details and we can help you better.
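A minimal sketch of that layout (table and column names are assumptions):

```sql
-- Lookup table for the 150 regions
CREATE TABLE region (
    region_id NUMBER        PRIMARY KEY,
    name      VARCHAR2(100) NOT NULL
);

-- The big table references the region by ID
CREATE TABLE data_entry (
    entry_id  NUMBER PRIMARY KEY,
    region_id NUMBER NOT NULL REFERENCES region (region_id),
    payload   VARCHAR2(4000)
);

-- Index the foreign key so per-region queries stay fast
CREATE INDEX idx_data_entry_region ON data_entry (region_id);
```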

A database with 2,000 rows can be slow. It all depends on your database design, indexes, and keys, and, most important, on the hardware configuration your database server is running on. How your application uses the data also matters: is it a read-intensive or a transaction-intensive database? There is no right answer to what you are asking right now.

You first need to consider which operations are going to access the table. How will inserts be performed? Will existing rows be updated, and if so, how? By how much will the rows grow, and what percentage of them will grow? Will rows get deleted, and by what criteria? How will you be selecting data: by what criteria, and how many rows per query?

Data partitioning can be used for volumes of data much larger than 1.5M rows. Look into optimizing the SQL queries, batch processing, and the storage of the data.

Related

SQLite with large data, is it best to split

I am using SQLite because this needs to be cross-platform. I have about 10 tables with a small amount of data (maybe a few dozen rows each), but then I also have a set of data which might have a million or more rows.
The small dataset isn't really modified that much, just queried, but the large data set will be queried and modified frequently.
Rather than have a single SQLite database with all the tables in it, I was wondering if splitting it into two databases might be smartest.
Basically I'd have one database, let's call it "settings", with the 10 tables in it. I'd then have another database, let's call it "userdata", with the million rows.
I'll be creating a third database called "audits" where I record each change to the "userdata" database. This database is expected to grow (for a short time period).
I am just wondering if people have an opinion as to whether it is a good idea to split my data into multiple databases or if I should just have one massive one.
My thinking is that queries on the "userdata" database might be slightly more efficient, since it will only have one table.
Note, this is not for long-term. It is for a short period of time. It will be queried and edited for about a week, then it is done.
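For what it's worth, if you do split the files, SQLite can still work with all of them over one connection via ATTACH. A minimal sketch, assuming the file names above and hypothetical table and column names:

```sql
-- Open userdata.db as the main database, then attach the other files
ATTACH DATABASE 'settings.db' AS settings;
ATTACH DATABASE 'audits.db'   AS audits;

-- Tables in attached databases are addressed via the schema alias;
-- "changes", "user_rows", and their columns are hypothetical
INSERT INTO audits.changes (row_id, changed_at)
SELECT id, CURRENT_TIMESTAMP FROM user_rows WHERE dirty = 1;
```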

websql performance, can we shard tables

I am using WebSQL to store data in a PhoneGap application. One of the tables has a lot of data, say 2,000 to 10,000 rows. When I read from this table, which is just a simple SELECT statement, it is very slow. I debugged and found that as the size of the table increases, performance decreases exponentially. I read somewhere that to get performance you have to divide the table into smaller chunks; is that possible, and how?
One idea is to look for something to group the rows by, and consider breaking the data into separate tables based on some common category, instead of a shared table for everything.
I would also consider fine tuning the queries to make sure they are optimal for the given table.
Make sure you're not just running a simple SELECT query without a WHERE clause to limit the result set.
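A minimal sketch of that advice (the table, column, and index names are assumptions):

```sql
-- Index the column you filter on, so the SELECT doesn't scan every row
CREATE INDEX IF NOT EXISTS idx_items_category ON items (category);

-- Filter and cap the result set instead of reading the whole table
SELECT * FROM items WHERE category = ? LIMIT 50;
```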

Split large sqlite table by sessionid field

I am relatively new to sql(ite), and I'm learning as I go while working on a new project.
We have got millions of transaction rows in one "data" table, one field being a "sessionid" field.
Since I want to concentrate on in-session activity for now, I primarily need to look only at transactions from the same sessions.
My intuition is that it would be a lot faster to separate the database by session into many single-session tables than to always query for a single sessionid and then proceed. My question: is that correct? Will that make a difference?
Even if not: could you help me out and tell me how I could split the one "data" table's rows into many session-specific tables, with the rows staying the same? Plus one table which relates sessionids to their tables?
Thanks!
A friend just told me the splitting-into-tables thing would be extremely inflexible, and that I should instead add an index on the sessionid column to access single sessions faster. Any thoughts on that, and on how to do it best?
First of all, are you seeing any specific performance bottleneck with it so far? If yes, please describe it.
Having one table per session will probably speed things up for lookups and for index maintenance on INSERTs.
SQLite doesn't impose a limit on the number of tables, so you should be okay.
One other solution that provides easier maintenance is to create one table per day or week.
Depending on how long your sessions last, this could be feasible or not.
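For the index approach your friend suggested, a minimal sketch (the "data" table and "sessionid" field are from the question):

```sql
-- One index turns per-session reads into cheap index range lookups
CREATE INDEX idx_data_sessionid ON data (sessionid);

-- Per-session queries then no longer scan the whole table
SELECT * FROM data WHERE sessionid = ?;
```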
Related: https://stackoverflow.com/a/811862/89771

Calculate at runtime vs Lookup from SQL Server Table

I have an MVC application that needs to run several trillion calculations. Of those, I am interested in only about 8 million results. I have to do this work because I need to see an overall high and low score. I will save this data and store it in a single table of 16 floats. I also have a few indexes on this table for lookups. So far I have only processed 5% of my data.
As users enter data into my website, I have to do calculations based on their data to determine the best and worst outcomes. This is only about 4 million calculations, and right now it takes about a second or less on my local PC. Alternatively, it is a simple query that will always return two records from my stored data: the best and the worst. Right now the query is the same speed as or faster than calculating the result, but I don't have all 8 million records yet, and I am worried that the DB will get slow.
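For illustration, a sketch of the lookup side described above (table and column names are assumptions):

```sql
-- Hypothetical parameter identifying the user's input
DECLARE @key INT = 42;

-- Fetch the stored extremes: one best row, one worst row
SELECT TOP (1) * FROM outcome WHERE input_key = @key ORDER BY score DESC;  -- best
SELECT TOP (1) * FROM outcome WHERE input_key = @key ORDER BY score ASC;   -- worst

-- A composite index keeps both lookups to a few page reads
CREATE INDEX idx_outcome_key_score ON outcome (input_key, score);
```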
I was thinking I would use the Database Lookup, and if performance became an issue, switch to runtime calculation.
QUESTION: Should I just save myself the trouble and do the runtime calculation anyway?
I am not sure which option is more scalable. I don't expect a large user base for this website.
The site needs to be snappy.
Your question is a little too vague to provide a clear-cut answer, but my guess is that using the DB to calculate the totals will be far more efficient than writing the code on the website. SQL Server will attempt to optimize the query to use as much of the server's resources as possible to make it more efficient. Your code won't do that unless you specifically write it to do so.
I would start by loading the data and doing tests before making an optimization strategy. You have no idea where the real bottlenecks of the system will be before you load data that is remotely close to what you are going to have to deal with.
If I understand the question, performing the calculation is more scalable, as it operates on that single data set. As you add data to a table, lookups will get slower even with indexes. The indexes also increase table size and the time required to insert a record.
If I've understood you correctly, this is a question about caching - should you calculate on the fly, or lookup the results in a cache?
In most web architectures, your SQL database is a brilliant cache, right up to the point where it becomes a terrible cache. Scaling your (SQL) database is notoriously tricky - introducing clustering, sharding etc. becomes a production in its own right.
My - very general - advice is to use your relational database for managing transactional data, and to use caching technology for caching. 8 million records should fit into RAM on a decent server these days - and you can add web servers far more cheaply than scaling your database.

Partition Or Separate Table

I am designing a database for a fleet management system.
I will be getting n records every 3 seconds. Obviously there will be millions of records in my current_location table, where I store each vehicle's current information. Performance is a BIG issue here.
To solve this, I received the following suggestions:
Create a separate table for each vehicle.
Here a table would be created at run time, as soon as I click "create new table", and all the data related to a particular vehicle would be inserted into and retrieved from that particular table.
Go for partitioning.
Please answer the following questions about these solutions.
What is difference between the two?
Which is best and why?
At what point will the number of rows in the tables cause performance issues?
Are there any other solutions?
Edit: this question was originally asked for MySQL; I have now shifted to SQL Server, with the same question.
If I go for range partitioning in SQL Server 2008, how do I partition on a varchar(20) column? I am planning to partition based on the vehicle number, e.g. MH30 Q 1234. In the vehicle number, only the "30" and the "Q" parts change, so my question is: how should I write the partition function?
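For reference, a range partition function on a varchar(20) column in SQL Server 2008 looks like the sketch below; the boundary values are just illustrative vehicle-number prefixes, not a recommendation:

```sql
-- Hypothetical partition function splitting on the vehicle-number prefix
CREATE PARTITION FUNCTION pf_vehicle_prefix (varchar(20))
AS RANGE RIGHT FOR VALUES ('MH10', 'MH20', 'MH30', 'MH40');

-- Map every partition to the same filegroup for simplicity
CREATE PARTITION SCHEME ps_vehicle_prefix
AS PARTITION pf_vehicle_prefix ALL TO ([PRIMARY]);

-- The table is then created on the partition scheme
CREATE TABLE current_location (
    vehicle_no  varchar(20) NOT NULL,
    recorded_at datetime    NOT NULL,
    lat         float,
    lon         float
) ON ps_vehicle_prefix (vehicle_no);
```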
Definitely use partitioning. Why go through all the hassle of figuring out which table to use to answer a question when MySQL will do it for you? And good luck finding the current location of all of your trucks if you're not using partitioning!
Partitioning gives you the performance benefits of multiple tables, but with automatic pruning (selection of just the partitions needed to answer the query).
Nothing is ever "best". The question is: what is best for your problem?
That is impossible to answer in advance. You will just have to monitor your system for performance issues and adjust server settings or scale as necessary.
At least as far as MySQL is concerned, none as good as partitioning!
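A minimal sketch of what that looks like in MySQL (table and column names are assumptions; EXPLAIN PARTITIONS is the MySQL 5.1-5.6 syntax):

```sql
-- Hash-partition by vehicle; the partition key must be part of the primary key
CREATE TABLE current_location (
    vehicle_id  INT      NOT NULL,
    recorded_at DATETIME NOT NULL,
    lat DOUBLE,
    lon DOUBLE,
    PRIMARY KEY (vehicle_id, recorded_at)
)
PARTITION BY HASH (vehicle_id)
PARTITIONS 16;

-- Shows that the optimizer prunes the query to a single partition
EXPLAIN PARTITIONS
SELECT * FROM current_location WHERE vehicle_id = 42;
```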
Don't bother with partitioning for 28,800 rows per day.
We don't (yet) with over 5 million per day. (The "yet" means we have no business input on what data retention policy they want)
There should be very little performance difference between making a separate table for each vehicle and making the vehicle ID the first field in the primary key. You get the same grouping on disk either way, and MySQL should have no trouble with millions of rows in a table.
Partitions are only useful if you have multiple disks on your machine and want to spread the load across disks.
So I guess my answer is: do neither. Designing this a priori seems like overkill.
One thing I want to point out is that having one table (which you can partition later when you need to) will be much easier to maintain both in the database and in terms of querying the data.
