Is it possible to prototype server/SQL performance on paper for various loads?

I am trying to figure out whether a web development project is currently feasible. So far I have learned that the proposed database (30 million rows, 5 columns, about 3 GB of storage) is well within my budget in terms of storage. However, because users are expected to run a large number of queries against it, I am not sure whether the server can handle that load with adequate performance within my budget.
I will be using this grid (a live demo with performance benchmarks for 300,000 rows): entering a search term in the "product name" box and pressing Enter takes 1.6 seconds from query to rendered results. It seems to me (a newbie) that if 300,000 rows take 1.6 seconds in total, 30 million rows must take much longer, so I am trying to figure out:
1. how much the time would increase as more rows are added, up to 30 million;
2. how much the time would increase for each additional 1,000 people using the search grid at the same time;
3. what hardware is necessary to reduce the delays to an acceptable level.
Hopefully, once I figure that out, I can make a more realistic assessment of feasibility. FYI: the database need not be updated very regularly; it is mostly for read-only purposes.
Can this problem be prototyped on paper for these 3 points?
Even a wide ballpark estimate, without considering optimisation, would help: am I looking at hundreds of dollars for 5,000 users to get searches under 10 seconds each, thousands of dollars, or tens of thousands?
[It will be RadControls for AJAX Grid on one of these cloud-hosted servers: 4,096 MB RAM, 160 GB disk space, and either Microsoft® SQL Server® 2008 R2 or SQL Server 2012.]

You say the database need not be updated very regularly and is mostly for read-only purposes.
Your search filters allow substring searches (e.g. LIKE '%term%'), so ordinary database indexes cannot be used to seek to the matching rows, and the search will go row by row.
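If every search really is a row-by-row scan, its cost grows roughly linearly with the row count, which gives a crude on-paper model for the first two questions. This sketch assumes the 1.6 s demo figure is dominated by the scan (it also includes network and rendering time, so scaling it overstates things), and that the server can run a fixed number of scans in parallel (`workers`, a hypothetical figure) while the other searches queue:

```python
import math


def scan_time_estimate(baseline_seconds: float, baseline_rows: int, target_rows: int) -> float:
    """Scale a measured full-scan time linearly to a larger row count."""
    return baseline_seconds * target_rows / baseline_rows


def worst_case_wait(scan_seconds: float, concurrent_searches: int, workers: int) -> float:
    """Wait for the last of `concurrent_searches` searches submitted at the same
    moment, if `workers` scans can run in parallel and the rest queue up."""
    rounds = math.ceil(concurrent_searches / workers)
    return rounds * scan_seconds
```

For example, `scan_time_estimate(1.6, 300_000, 30_000_000)` gives roughly 160 seconds per search, and `worst_case_wait(160.0, 1000, 8)` gives 20,000 seconds for the last of 1,000 simultaneous searches on a hypothetical 8-way server. The model is deliberately pessimistic, but it shows that the cost of each search, not just the hardware budget, is what has to come down.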
It looks like your data would probably fit in about 5 GB of memory. I would store the whole thing in memory and search there.
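A minimal sketch of the in-memory approach, assuming only the product-name column is searched (the `Product` type and its fields are hypothetical; the question doesn't list its 5 columns). Names are lower-cased once at load time so each query is a plain case-insensitive substring scan that stops after `limit` hits. Python is used only for illustration: 30 million Python objects would need well over 5 GB, so a real implementation would keep the data in a more compact form.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Product:
    product_id: int  # hypothetical column
    name: str        # the "product name" column searched in the demo grid


class InMemoryProductIndex:
    """Keeps every row in RAM and answers substring queries by scanning."""

    def __init__(self, products: Iterable[Product]):
        # Lower-case once at load time so queries don't redo it per row
        self._rows = [(p.name.lower(), p) for p in products]

    def search(self, term: str, limit: int = 100) -> List[Product]:
        needle = term.lower()
        results: List[Product] = []
        for lowered, product in self._rows:
            if needle in lowered:
                results.append(product)
                if len(results) >= limit:
                    break  # a grid page only needs the first `limit` hits
        return results
```

Since the data is mostly read-only, the list can be built once at startup and shared by all requests without locking.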


Speeding up an ASP.NET app

I am developing an ASP.NET web application that searches for people across about thirty different databases (it is a law enforcement app). For example, a police officer searches for Fred Smith, DOB 01/01/1950 (not a real person), and the app returns any hits from all the databases.
For the majority of searches the speed is acceptable, i.e. if the average person has five hits, the average page load time is five seconds. However, some searches have hundreds and sometimes thousands of hits. I saw one search that took 25 minutes, which is obviously not acceptable.
Longer term, a data warehouse will probably be created so that all the data is in one database. In the meantime, what is the best strategy for speeding up searches in this scenario? I thought of caching, but the same person is rarely searched for twice within a short period. Are there any other ideas?
You haven't given enough details (what databases, what the front-end language is, how you are querying, whether there is indexing, etc.), but here are some preliminary suggestions, probably in increasing order of effort:
1. Heavily index all the databases on the key columns you search by, and run your searches against them.
2. Multithreading: spawn 30 threads (one per database) to run the searches in parallel, and start displaying results as each thread returns.
3. Have a back-end job consolidate all the data from the 30 databases into a single, fully indexed, denormalized table, and query that table.
4. Set up #3 with a Solr/Lucene-style indexing engine for even faster querying.
5. Use big data tooling, etc.
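Suggestion 2 can be sketched with a thread pool that runs one search per source and yields each source's hits as soon as that source finishes, so the page can render early results while slow databases are still working. This is a minimal Python sketch; `sources` and the per-source search functions are hypothetical stand-ins for the 30 real database queries (in ASP.NET the same pattern would be built on .NET's task APIs):

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, Iterator, List, Tuple

SearchFn = Callable[[str], List[str]]  # hypothetical: person -> list of hits


def search_all(sources: Dict[str, SearchFn], person: str,
               max_workers: int = 30) -> Iterator[Tuple[str, List[str]]]:
    """Run one search per source in parallel; yield (source, hits) as each finishes."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(fn, person): name for name, fn in sources.items()}
        for future in as_completed(futures):
            # result() re-raises any exception thrown by that source's search
            yield futures[future], future.result()
```

Threads suit this case because each search spends most of its time waiting on a database; note that this only hides latency, so the slowest source still bounds the total time.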
You are asking the wrong question. If your database searches are taking too long, it is because the columns you are searching are not indexed, or because of some other database-related issue. Displaying the results 'faster' is the least of your problems.
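To see what indexing changes for a lookup like the one in the question (last name, first name, DOB), SQLite's EXPLAIN QUERY PLAN can be used as a self-contained stand-in. The table and index names are hypothetical, and SQL Server has its own execution plans, but the principle is the same:

```python
import sqlite3


def make_people_db(with_index: bool) -> sqlite3.Connection:
    """In-memory stand-in for one of the searched databases (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("""CREATE TABLE people (
        id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT,
        dob TEXT, source TEXT)""")
    if with_index:
        # Composite index on exactly the columns the search filters on
        conn.execute("CREATE INDEX idx_people_name_dob "
                     "ON people (last_name, first_name, dob)")
    return conn


def person_search_plan(conn: sqlite3.Connection) -> str:
    """Return SQLite's plan for an exact name + DOB lookup."""
    rows = conn.execute(
        "EXPLAIN QUERY PLAN SELECT * FROM people "
        "WHERE last_name = ? AND first_name = ? AND dob = ?",
        ("Smith", "Fred", "1950-01-01")).fetchall()
    return " | ".join(row[-1] for row in rows)  # last column is the readable detail
```

Without the index the plan reports a full SCAN of `people`; with it, a SEARCH using `idx_people_name_dob` (exact wording varies by SQLite version).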
Show us some code and it's easier to help out.

Statistics on large table presented on the web

We have a large table of data with about 30 000 0000 rows, growing by about 100 000 rows a day, and that daily volume will increase over time.
Today we generate different reports directly from the database (MS-SQL 2012) and do a lot of calculations.
The problem is that this takes time. We have indexes and so on but people today want blazingly fast reports.
We also want to be able to change time periods, look at the data in different ways, and so on.
We only need to look at data that is one day old so we can take all the data from yesterday and do something with it to speed up the queries and reports.
So does anyone have good ideas for a solution that will be fast and still on the web, not in Excel or a BI tool?
Today all the reports are C# Web Forms pages with queries against MS SQL 2012 tables.
You have an OLTP system, and on a system like this you generally want to maximize throughput. Reporting requires latches and locks to be taken to read the data, which is a drag on your OLTP throughput, and what is good for reporting (additional indexes) is bad for your OLTP workload, because every extra index slows down writes. And don't even think that slapping WITH(NOLOCK) on the queries is going to alleviate that burden. ;)
As others have stated, you would probably want to look at separating the active data from the report data.
Partitioning the table could accomplish this if you have Enterprise Edition. Otherwise, you'll need some hackery like Partitioned Views, which may or may not work for you depending on how your data is accessed.
I would look at extracting the needed data out of the system at a regular interval and pushing it elsewhere. Whether that elsewhere is a different set of tables in the same database, a different catalog on the same server, or an entirely different server depends on a host of variables (cost, time to implement, complexity of the data, speed requirements, storage subsystem, etc.).
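One way to extract data at a regular interval is a nightly job that pre-aggregates yesterday's rows into a small summary table, which the web reports then read instead of the busy OLTP table. A minimal sketch using Python's built-in `sqlite3` so it runs standalone; the table and column names (`events`, `daily_summary`, `ts`, `category`, `amount`) are hypothetical, and on SQL Server the same statements would typically run from a scheduled SQL Server Agent job:

```python
import sqlite3


def refresh_daily_summary(conn: sqlite3.Connection, day: str) -> None:
    """Rebuild the pre-aggregated rows for one day ('YYYY-MM-DD').

    Assumes a hypothetical source table events(ts TEXT 'YYYY-MM-DD HH:MM:SS',
    category TEXT, amount REAL).
    """
    conn.execute("""CREATE TABLE IF NOT EXISTS daily_summary (
        day TEXT, category TEXT, row_count INTEGER, total REAL,
        PRIMARY KEY (day, category))""")
    with conn:  # delete + insert commit together, so re-running a day is safe
        conn.execute("DELETE FROM daily_summary WHERE day = ?", (day,))
        # Range predicate on ts (rather than date(ts) = ?) lets an index on ts be used
        conn.execute("""
            INSERT INTO daily_summary (day, category, row_count, total)
            SELECT ?, category, COUNT(*), SUM(amount)
            FROM events
            WHERE ts >= ? AND ts < date(?, '+1 day')
            GROUP BY category""", (day, day, day))
```

Reports that change the time period then aggregate over a few summary rows per day instead of rescanning the raw table.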
Since it sounds like you don't have very specific reporting requirements (currently you look at yesterday's data, but it would be nice to see more, etc.), I'd look at implementing columnstore indexes on the reporting tables. They provide amazing performance for aggregation queries, even compared with pre-aggregated tables, with the benefit that you don't have to commit to a specific grain (WTD, MTD, YTD, etc.). The downside is that in SQL Server 2012 a columnstore index is a read-only data structure (the table cannot be modified while the index exists), and it is a memory and CPU hog while the index is being built. SQL Server 2014 is going to introduce updatable columnstore indexes, which will be giggity, but that's some time off.

Handling extremely large amounts of data in web-based applications

What would be the best way to store a very large amount of data for a web-based application?
Each record has just 3 fields, but there will be around 144 million records a day - stored for one month - 4,464,000,000 records total. Let's round up to 5 billion.
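The figures above can be turned into a quick sizing estimate. The `bytes_per_record` value in the usage line is a made-up placeholder (the question only says each record has 3 fields), and the write rate is a daily average, so peak rates will be higher:

```python
SECONDS_PER_DAY = 24 * 60 * 60


def ingest_sizing(records_per_day: int, retention_days: int, bytes_per_record: int):
    """Return (total records retained, average writes/second, raw bytes)."""
    total_records = records_per_day * retention_days
    writes_per_second = records_per_day / SECONDS_PER_DAY
    raw_bytes = total_records * bytes_per_record  # excludes indexes and replication
    return total_records, writes_per_second, raw_bytes


# 144 million records/day kept for a 31-day month; 100 bytes/record is hypothetical
total, wps, raw = ingest_sizing(144_000_000, 31, bytes_per_record=100)
print(f"{total:,} records, ~{wps:,.0f} writes/s, ~{raw / 10**9:,.0f} GB raw")
# prints: 4,464,000,000 records, ~1,667 writes/s, ~446 GB raw
```

The sustained write rate, not just the total size, is what drives the choice of a write-optimized store.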
Data has to be searchable on keyword & return results as fast as possible to the end user.
Which programming language?
JSON / XML / Some Database System I've Never Heard Of?
What sort of infrastructure? Imagine this system is only serving the needs of a maximum of 1,000 users at the same time.
I assume the code is the same whether you're searching 10 records or 10 billion; you just have to be a whole lot more efficient. I also assume MySQL/PHP doesn't stand a chance, and that we're going to be paying a very large sum for a hosting solution.
Just need some guidance on where to start, really. Thank you!
There are many tools in the Big Data ecosystem (NoSQL databases, distributed computing, machine learning, search, etc.) that can form part of an answer to your question. Since your application will be write-heavy, I would advocate Apache Cassandra for its excellent write performance (although it requires more data modeling than a NoSQL document database such as MongoDB). You also need a Solr- or Elasticsearch-based search solution, and MapReduce for indexes and queries.
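Keyword search at this scale is what Lucene-based engines such as Solr and Elasticsearch are built for; at their core is an inverted index that maps each term to the records containing it, so a query touches only the matching records instead of scanning billions. A toy, single-machine sketch of that structure (class and method names are hypothetical; no ranking, sharding, or persistence):

```python
import re
from collections import defaultdict
from typing import Dict, Set

TOKEN_RE = re.compile(r"\w+")


class InvertedIndex:
    def __init__(self) -> None:
        self._postings: Dict[str, Set[int]] = defaultdict(set)  # term -> record ids

    def add(self, record_id: int, text: str) -> None:
        for token in TOKEN_RE.findall(text.lower()):
            self._postings[token].add(record_id)

    def search(self, query: str) -> Set[int]:
        """Return ids of records containing every keyword in the query (AND semantics)."""
        tokens = TOKEN_RE.findall(query.lower())
        if not tokens:
            return set()
        # .get avoids creating empty entries in the defaultdict for unknown terms
        result = set(self._postings.get(tokens[0], set()))
        for token in tokens[1:]:
            result &= self._postings.get(token, set())
        return result
```

Real engines add compressed posting lists, relevance scoring, and distribution across nodes, but the lookup principle is the same.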
The programming language doesn't matter unless you have business end users who will be writing queries against your Big Data, in which case you can give them something SQL-like such as Hive, or a higher-level dataflow language such as Pig. To get you started, the following (recent) link might give you some idea of how to pick an analytics stack based on your needs. Note that every database or distributed computing paradigm specializes in some particular use case:
How we picked our analytics stack
Also look at High Scalability for various use cases on how companies tackle their scalability problems.

Calculate at runtime vs Lookup from SQL Server Table

I have an MVC application that needs to run several trillion calculations. Of those, I am interested in only about 8 million results, but I have to do all the work because I need to find the overall high and low scores. I will save this data in a single table of 16 float columns, with a few indexes for lookups. So far I have only processed 5% of my data.
As users enter data into my website, I have to do calculations based on their data to determine the best and worst outcomes. This is only about 4 million calculations, which currently takes about a second or less on my local PC. Alternatively, it is a simple query against my stored data that always returns 2 records: the best and the worst. Right now the query is as fast as or faster than calculating the result, but I don't have all 8 million records yet, and I am worried that the database will get slow.
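For comparison, the runtime side of this trade-off is a single pass over the candidates that tracks the highest and lowest score; it needs no stored table or index. A minimal sketch, where `score` is a hypothetical stand-in for whatever calculation the site actually performs:

```python
from typing import Callable, Iterable, Tuple, TypeVar

T = TypeVar("T")


def best_and_worst(candidates: Iterable[T], score: Callable[[T], float]) -> Tuple[T, T]:
    """Return (best, worst) candidate in one pass, calling score() once per item."""
    it = iter(candidates)
    try:
        first = next(it)
    except StopIteration:
        raise ValueError("no candidates to score") from None
    best = worst = first
    best_score = worst_score = score(first)
    for candidate in it:
        s = score(candidate)
        if s > best_score:
            best, best_score = candidate, s
        elif s < worst_score:  # can't also beat best, since best_score >= worst_score
            worst, worst_score = candidate, s
    return best, worst
```

Its cost grows with the number of calculations per request (about 4 million here), not with the size of the stored table.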
I was thinking I would use the Database Lookup, and if performance became an issue, switch to runtime calculation.
QUESTION: Should I just save myself the trouble and do the runtime calculation anyway?
I am not sure which option is more scalable. I don't expect a large user base for this website.
The site needs to be snappy.
Your question is a little too vague for a clear-cut answer, but my guess is that using the database to calculate the totals will be far more efficient than writing that code in the website. SQL Server will try to optimize the query to make efficient use of the server's resources; your code won't do that unless you specifically write it to.
I would start by loading the data and doing tests before making an optimization strategy. You have no idea where the real bottlenecks of the system will be before you load data that is remotely close to what you are going to have to deal with.
If I understand the question, performing the calculation is more scalable, as it works only on that single data set. As you add data to a table, lookups will get slower, even with indexes. Indexes also increase the table size and the time required to insert a record.
If I've understood you correctly, this is a question about caching - should you calculate on the fly, or lookup the results in a cache?
In most web architectures, your SQL database is a brilliant cache, right up to the point where it becomes a terrible cache. Scaling your (SQL) database is notoriously tricky - introducing clustering, sharding etc. becomes a production in its own right.
My - very general - advice is to use your relational database for managing transactional data, and to use caching technology for caching. 8 million records should fit into RAM on a decent server these days - and you can add web servers far more cheaply than scaling your database.
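To put the "8 million records should fit into RAM" claim in numbers, here is a rough sketch assuming each of the 16 columns is an 8-byte double (SQL Server's `float` defaults to 8 bytes) and ignoring row, index, and cache-structure overhead:

```python
def raw_table_bytes(rows: int, float_columns: int, bytes_per_float: int = 8) -> int:
    """Raw payload size only: no row headers, indexes, or cache overhead."""
    return rows * float_columns * bytes_per_float


size = raw_table_bytes(8_000_000, 16)
print(f"{size:,} bytes = {size / 2**30:.2f} GiB")
# prints: 1,024,000,000 bytes = 0.95 GiB
```

Even with generous overhead on top of that, the data stays within the RAM of a single ordinary server.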

How many rows can an SQLite table hold before queries become time-consuming

I'm setting up a simple SQLite database to hold sensor readings. The tables will look something like this:
sensors:
- id (pk)
- name
- description
- units
sensor_readings:
- id (pk)
- sensor_id (fk to sensors)
- value (actual sensor value stored here)
- time (date/time the sensor sample was taken)
The application will be capturing about 100,000 sensor readings per month from about 30 different sensors, and I'd like to keep all sensor readings in the DB as long as possible.
Most queries will be in the form
SELECT * FROM sensor_readings WHERE sensor_id = x AND time > y AND time < z
This query will usually return about 100-1000 results.
So the question is: how big can the sensor_readings table get before the above query becomes too time-consuming (more than a couple of seconds on a standard PC)?
I know that one fix might be to create a separate sensor_readings table for each sensor, but I'd like to avoid this if it is unnecessary. Are there any other ways to optimize this DB schema?
If you're going to be using time in the queries, it's worthwhile adding an index to it. That would be the only optimization I would suggest based on your information.
100,000 insertions per month equates to about 2.3 per minute so another index won't be too onerous and it will speed up your queries. I'm assuming that's 100,000 insertions across all 30 sensors, not 100,000 for each sensor but, even if I'm mistaken, 70 insertions per minute should still be okay.
If performance does become an issue, you have the option to offload older data to a historical table (say, sensor_readings_old) and only do your queries on the non-historical table (sensor_readings).
Then you at least have all the data available without affecting the normal queries. If you really want to get at the older data, you can do so but you'll be aware that the queries for that may take a while longer.
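Offloading to a historical table can be done in a single transaction, so a reading is never in both tables or in neither. A minimal sketch with Python's `sqlite3`, assuming `time` is stored as an integer timestamp (the question doesn't say how it is stored) and using the `sensor_readings_old` name suggested above:

```python
import sqlite3


def archive_old_readings(conn: sqlite3.Connection, cutoff: int) -> int:
    """Move readings with time < cutoff into sensor_readings_old; return rows moved."""
    # First run only: creates an empty table with the same columns
    # (CREATE TABLE ... AS does not copy constraints or indexes)
    conn.execute("CREATE TABLE IF NOT EXISTS sensor_readings_old AS "
                 "SELECT * FROM sensor_readings WHERE 0")
    with conn:  # copy and delete commit together, or neither does
        conn.execute("INSERT INTO sensor_readings_old "
                     "SELECT * FROM sensor_readings WHERE time < ?", (cutoff,))
        moved = conn.execute("DELETE FROM sensor_readings WHERE time < ?",
                             (cutoff,)).rowcount
    return moved
```

Queries that occasionally need the full history can combine both tables with UNION ALL.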
Are you setting indexes properly? Beyond that, and reading up on it, the only real answer is 'you'll have to measure it yourself', especially since this depends heavily on the hardware, on whether you're using an in-memory or on-disk database, and on whether you wrap inserts in transactions.
That being said, I've hit noticeable delays after a couple of tens of thousands of rows, but that was completely unoptimized. From reading a bit, I get the impression that there are people with hundreds of thousands of rows and proper indexes who have no problems at all.
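SQLite supports R-tree indexes through its R*Tree module. They are designed for range queries (especially multi-dimensional ones), so they may be worth a look if you intend to do a lot of time-range queries.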
I know I am coming to this late, but I thought this might be helpful for anyone that comes looking at this question later:
SQLite tends to be relatively fast on reading as long as it is only serving a single application/user at a time. Concurrency and blocking can become issues with multiple users or applications accessing it at a single time and more robust databases like MS SQL Server tend to work better in a high concurrency environment.
As others have said, I would definitely index the table if you are concerned about the speed of read queries. For your particular case, I would probably create one index that includes both sensor_id and time.
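A sketch of that schema with the composite index, using Python's `sqlite3` to confirm the planner uses it for the query from the question. Storing `time` as an integer Unix timestamp is an assumption, and the index name is hypothetical:

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS sensors (
    id INTEGER PRIMARY KEY, name TEXT, description TEXT, units TEXT);
CREATE TABLE IF NOT EXISTS sensor_readings (
    id INTEGER PRIMARY KEY,
    sensor_id INTEGER REFERENCES sensors(id),
    value REAL,
    time INTEGER);  -- assumption: Unix timestamp in seconds
-- Equality column first, range column second, matching the WHERE clause
CREATE INDEX IF NOT EXISTS idx_readings_sensor_time
    ON sensor_readings (sensor_id, time);
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def range_query_plan(conn: sqlite3.Connection) -> str:
    """Return SQLite's plan for the sensor/time-range query."""
    rows = conn.execute(
        "EXPLAIN QUERY PLAN SELECT * FROM sensor_readings "
        "WHERE sensor_id = ? AND time > ? AND time < ?", (1, 0, 100)).fetchall()
    return " | ".join(row[-1] for row in rows)  # last column is the readable detail
```

The plan should report a SEARCH on `sensor_readings` using `idx_readings_sensor_time`, so each query reads only the 100-1000 matching index entries however large the table grows.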
You may also want to pay attention to write speed. Inserts can be fast, but commits are slow, so you probably want to batch many inserts together into one transaction before committing. This is discussed here:
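A minimal sketch of batched inserts with Python's `sqlite3`, committing once per chunk of readings instead of once per row. The batch size of 1000 is an arbitrary illustrative value, and `time` is again assumed to be an integer timestamp:

```python
import sqlite3
from itertools import islice
from typing import Iterable, Tuple

Reading = Tuple[int, float, int]  # (sensor_id, value, time); time assumed Unix seconds


def insert_readings_batched(conn: sqlite3.Connection, readings: Iterable[Reading],
                            batch_size: int = 1000) -> int:
    """Insert readings in chunks, one transaction per chunk; return rows inserted."""
    it = iter(readings)
    total = 0
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return total
        with conn:  # one transaction, and therefore one commit, per batch
            conn.executemany(
                "INSERT INTO sensor_readings (sensor_id, value, time) VALUES (?, ?, ?)",
                batch)
        total += len(batch)
```

At the write rates in the question this matters mainly for bulk loads or backfills, since each individual commit forces a sync to disk.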
