I have a pretty fat settings table in SQL Server 2012, now with over 100 columns. As the name suggests, this table keeps track of all kinds of setting values within our website. It used to have fewer than 50 columns, but its size has since doubled.
The reason I store setting values in the database is that users need to be able to change these settings via the UI.
Should I really be worried about this table getting bigger and bigger over time? Or should I find some other way to store the settings data, e.g. saving it to files?
First, you don't need to store settings in a database in order to let users update them at runtime. You can simply store them in a settings file, e.g. an XML config file, that gets rewritten whenever the user makes changes; that works well.
If, however, the application is network based and you want the settings to follow the user from machine to machine, it makes more sense to put them in a database.
Second, yes... 100 columns is huge. Instead of storing each setting in a separate column, you might consider storing each setting in a separate row with a common row format: ID, SettingName, SettingValue, and (maybe) DefaultValue. Then your table can grow as large as you like without any further schema changes.
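A minimal sketch of that key/value layout (the table and column names here are just examples, not taken from the question):

    CREATE TABLE Settings
    (
        SettingId    INT IDENTITY(1,1) PRIMARY KEY,
        SettingName  NVARCHAR(100) NOT NULL UNIQUE,
        SettingValue NVARCHAR(MAX) NULL,
        DefaultValue NVARCHAR(MAX) NULL
    );

    -- Adding a new setting is now an INSERT rather than an ALTER TABLE:
    INSERT INTO Settings (SettingName, SettingValue, DefaultValue)
    VALUES (N'HomePageTitle', N'Welcome', N'Welcome');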
We are using JSON to store user settings. The table contains only two columns: the user ID and the settings string. The string can get quite long, but that doesn't matter. You could also use XML to store this data.
This is a worse solution if you need to edit the data by hand, but it is faster to fetch from the DB and process on the client or on the ASP.NET server.
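A rough sketch of that two-column layout (names assumed; note that SQL Server 2012 has no native JSON type, so the settings blob is simply stored as NVARCHAR(MAX) and parsed in application code):

    CREATE TABLE UserSettings
    (
        UserId       INT NOT NULL PRIMARY KEY,
        SettingsJson NVARCHAR(MAX) NOT NULL
    );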
I imagine you are concerned about performance on huge tables?
One question is how many rows are in this table. 100 columns with 10,000 rows is no real problem. 100 columns over 10 million rows is a slightly different ballgame. Not worse or better, just different.
The same considerations apply for small and large tables:
1. Are you indexing properly?
2. Is your I/O keeping up?
3. Do you have enough space?
4. Are you querying efficiently?
There is no right answer for this; it depends on why you have a big column count and whether it's hurting your overall performance.
We run thousands of tables with more than 150 columns with no problems, even with millions of rows among them, and I can't complain about performance.
And this is relatively denormalized data, so lots of text.
Related
I am using WebSQL to store data in a PhoneGap application. One of the tables has a lot of data, say 2,000 to 10,000 rows. When I read from this table, with just a simple SELECT statement, it is very slow. I debugged and found that as the size of the table increases, performance degrades dramatically. I read somewhere that to get good performance you have to divide the table into smaller chunks; is that possible, and how?
One idea is to look for something to group the rows by and consider breaking the data into separate tables based on some common category, instead of one shared table for everything.
I would also consider fine-tuning the queries to make sure they are optimal for the given table.
Make sure you're not just running a bare SELECT without a WHERE clause (or a LIMIT) to restrict the result set.
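Something along these lines, for example (hypothetical table and column names; WebSQL is SQLite under the hood, so LIMIT and ? placeholders work):

    -- Instead of pulling the whole table:
    --   SELECT * FROM orders;
    -- filter and page the result set:
    SELECT id, total
    FROM orders
    WHERE customer_id = ?
    ORDER BY created_at DESC
    LIMIT 50;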
I am relatively new to sql(ite), and I'm learning as I go while working on a new project.
We have millions of transaction rows in one "data" table, one field being a "sessionid" field.
Since I want to concentrate on in-session activity for now, I primarily need to look only at transactions from the same sessions.
My intuition is that it would be a lot faster to split the database by session into many single-session tables than to always query for a single sessionid and then proceed. My question: is that correct? Will it make a difference?
Even if not: could you tell me how I could split the rows of the one "data" table into many session-specific tables, with the rows themselves staying the same? Plus one table that maps sessionIds to their tables?
Thanks!
A friend just told me that the splitting-into-tables approach would be extremely inflexible, and that I should instead add a dedicated index on the sessionid column so single sessions can be accessed faster. Any thoughts on that, and how best to do it?
First of all, have you hit any specific performance bottleneck so far? If so, please describe it.
Having one table per session will probably speed things up, both for lookups and for index maintenance on INSERTs.
SQLite doesn't impose a limit on the number of tables, so you should be okay.
One other solution that provides easier maintenance is to create one table per day or week.
Depending on how long your sessions last, this could be feasible or not.
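For reference, the dedicated index mentioned in the question's edit is usually the simpler first step; a minimal sketch, using the table and column names from the question:

    -- Index the session column so single-session lookups don't scan the whole table:
    CREATE INDEX idx_data_sessionid ON data(sessionid);

    -- Queries like this can then use the index:
    SELECT * FROM data WHERE sessionid = ?;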
Related: https://stackoverflow.com/a/811862/89771
I am designing a web application that we estimate may have about 1,500 unique users per hour. (We have no stats for concurrent users.) I am using ASP.NET MVC3 with an Oracle 11g backend, and all retrieval will be through packaged stored procedures, not inline SQL. The application is read-only.
Table A has about 4 million records in it.
Table B has about 4.5 million records.
Table C has less than 200,000 records.
There are two other tiny lookup tables that are also linked to table A.
Tables B and C both have a 1-to-1 relationship to Table A. Tables A and B are required, C is not. Tables B and C contain many string columns (some up to 256 characters).
A search will always return 0, 1, or 2 records from Table A, with its mate in Table B and any related data in C and the lookup tables.
My data access process would create a connection and command, execute the query, return a reader, load the appropriate object from that reader, close the connection, and dispose.
My question is this....
Is it better (performance-wise) to return a single, wide record set all at once (using only one connection), or to query one table right after another (using one connection per query), returning narrower records and joining them in code?
EDIT:
Clarification: I will always need all the data I would bring over in either option. Both options eventually result in the same amount of data being displayed on the screen as was brought from the DB. But one would use a single connection getting everything at once (but wider, so maybe slower?), and the other would use multiple connections, one right after the other, getting smaller amounts at a time. I don't know whether the number of connections should influence the decision here.
Also - I have the freedom to denormalize the table design, if I decide it's appropriate.
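For concreteness, the single wide result set option would look roughly like this (hypothetical table and column names; in the real application this would live inside one of the packaged stored procedures):

    SELECT a.*, b.*, c.*
    FROM   table_a a
           JOIN table_b b ON b.a_id = a.id        -- B is required
           LEFT JOIN table_c c ON c.a_id = a.id   -- C is optional
    WHERE  a.search_key = :search_key;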
You only ever want to pull as much data as you need. Whichever way moves less data from the database over to your code is the way you want to go. I would pick your second suggestion.
-Edit-
Since you need to pull all of the records regardless, you will only want to establish a connection once. Since you're getting the same amount of data either way, you should try to save as much memory as possible by keeping the number of connections down.
I'm developing a quick side project that needs a users table, and I want them to be able to store profile data. I was already reaching for the ASP.NET profile provider when I realized that users will only ever have one profile.
I realize that frequently changing data will impact performance on things like indexes, but how frequent is too frequent?
If I have one profile change per month per user happening for say 1000 users, is that a lot?
Or are we talking more like users changing profile data on an hourly basis?
I realize this isn't an exact science, but I'm trying to gauge where that threshold lies. Since my users' profile data will probably rarely change, should I bother with the extra work now, or just wait a few decades for it to become a problem?
One thing to consider is how adding a large text column to a table will affect the layout of the rows. Some databases store large columns inline with the other fixed-size columns; this makes the rows variable-sized, which means more work for the database when it needs to pull a row off the disk. Other databases (such as PostgreSQL) store large text columns away from the fixed-size columns; this leads to fixed-size rows with quick access during table scans and the like, but an extra bit of work is needed to pull out the text columns.
1,000 users isn't that much in database terms, so there's probably nothing to worry about one way or the other. OTOH, little one-off side projects have a nasty habit of turning into real mission-critical projects when you're not looking, so doing it right from the beginning is a good idea.
I think Justin Cave has covered the index issue well enough.
As long as you structure your data access properly (i.e. all access to your user table goes through one isolated pile of code), changing your user schema later won't be much work anyway.
Does the profile information actually need to be indexed? Or are you just going to retrieve it based on the USER_ID of the table or some other indexed USER column? If the profile data isn't indexed, which seems likely to me, then there are no performance impacts on the other indexes on the table.
The only reason I can think of to be concerned about putting profile information in the table is if there is a lot of data compared to the necessary information to define a user and if the USER table needs to be full scanned for some reason. In that case, increasing the size of the table would adversely affect the performance of a table scan. Assuming that you don't have a use case where it's regularly going to make sense to do a full scan on the USERS table, and given that the table will only have 1000 rows, that's probably not a big deal.
I am building a UI for a large product catalog (millions of products).
I am using SQL Server, FREETEXT search, and ASP.NET MVC.
Tables are normalized and indexed. Most queries take less than a second to return.
The issue is this: say the user searches by keyword. On the search results page I need to display/query for:
The first 20 matching products (paged, sorted)
The total count of matching products, for paging
The list of stores across all matching products
The list of brands across all matching products
The list of colors across all matching products
Each query takes about 0.5 to 1 second. Altogether it is around 5 seconds.
I would like to get the whole page to load under 1 second.
There are several approaches:
Optimize the queries even more. I have already spent a lot of time on this, so I'm not sure it can be pushed much further.
Load products first, then load the rest of the information using AJAX. More of a workaround; it will need a UI revision.
Reorganize the data to be more report-friendly. I have already aggregated a lot of fields.
I checked out several similar sites, for example zappos.com. Not only do they display the same information I would like in under 1 second, they also include statistics (the number of results in each category).
The following is the search for keyword "white"
http://www.zappos.com/white
How do sites like Zappos and Amazon make their results, filters, and stats appear almost instantly?
So you asked specifically "how does Zappos.com do this". Here is the answer from our Search team.
An alternative for your issue would be to use a search index such as Solr. Basically, the way these work is that you load your data set into the system and it does a huge amount of indexing up front. My projects include product catalogs with 200+ data points for each of 140k products, and the average return time is less than 20 ms.
The search indexing system I would recommend is Solr, which is based on Lucene. Both projects are open source and free to use.
Solr fits your described use case perfectly in that it can do all of those things in one query. You can use facets (essentially a GROUP BY in SQL) to return the list of distinct values across all matching results. For keyword search, it also allows you to search across multiple fields in one query without performance degradation.
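To make the GROUP BY analogy concrete, a single brand facet returns roughly what this SQL would (hypothetical table and column names, and it assumes a full-text index exists), except that Solr computes every facet plus the paged results from its index in one request:

    SELECT brand, COUNT(*) AS matches
    FROM   products
    WHERE  FREETEXT(description, 'white')   -- the keyword search
    GROUP BY brand;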
You could try replacing your aggregate queries with indexed (materialized) views of those aggregates. This will pre-compute the aggregates and will be as fast as selecting regular row data.
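A minimal sketch of one such indexed view, for a per-brand product count (hypothetical table and column names; SQL Server requires SCHEMABINDING and COUNT_BIG(*) in an indexed view, and the unique clustered index is what actually materializes it). Note this pre-computes a fixed aggregate; keyword-dependent counts would still need filtering on top of it:

    CREATE VIEW dbo.ProductCountsByBrand
    WITH SCHEMABINDING
    AS
    SELECT BrandId, COUNT_BIG(*) AS ProductCount
    FROM   dbo.Products
    GROUP BY BrandId;
    GO

    CREATE UNIQUE CLUSTERED INDEX IX_ProductCountsByBrand
        ON dbo.ProductCountsByBrand (BrandId);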
0.5 sec is too long on appropriate hardware. I agree with Aaronaught: the first thing to do is to convert it into a single SQL statement, or possibly a stored procedure, to ensure it's compiled only once.
Analyze your queries to see whether you can create even better indexes (consider covering indexes), fine-tune existing indexes, and employ partitioning.
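A sketch of a covering index (hypothetical table and column names): the INCLUDE columns let such a query be answered from the index alone, without key lookups into the base table:

    CREATE NONCLUSTERED INDEX IX_Products_Brand_Covering
        ON dbo.Products (BrandId)
        INCLUDE (Name, Price, ColorId);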
Make sure you have an appropriate hardware configuration: data, log, tempdb, and even index files should be located on independent spindles. Make sure you have enough RAM and CPUs. I hope you are running a 64-bit platform.
After all this, if you still need more, analyze the most-used keywords and create aggregate result tables for the top 10 keywords.
As for Amazon: they most likely use superior hardware and also take advantage of CDNs. They also have thousands of servers serving up the content, so there are no performance bottlenecks; data is duplicated multiple times across several data centers.
As a completely separate approach, you may want to look into "in-memory" databases such as CACHE; this is about the fastest you can get on the DB side.