Speeding up database lookups using externally provided statistics during initialization - sqlite

I have a large table that does not fit in memory. I am considering moving the table to a database, for example sqlite. However, some of the lookups will be made very often, so those rows could stay in memory while the rest stay on the hard disk. I want to expedite table lookups by supplying some statistics during initialization, so that the number of lookups that go to the hard disk is reduced. I have tried this with sqlite, but I am not able to supply it with a list of external statistics. Any suggestions?
Thank you,
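One possible approach, as a minimal sketch only: keep a plain dictionary in front of sqlite and fill it at startup with the keys the external statistics say are hot. The table name `big_table`, its `key`/`value` columns, and the `WarmedLookup` class are all hypothetical names made up for illustration, and the in-memory connection stands in for a real database file.

```python
import sqlite3


class WarmedLookup:
    """Dictionary cache pre-filled with known-hot keys, backed by SQLite."""

    def __init__(self, conn, hot_keys):
        self.conn = conn
        self.cache = {}
        # Pre-load the rows that the external statistics mark as frequently used.
        for key in hot_keys:
            row = conn.execute(
                "SELECT value FROM big_table WHERE key = ?", (key,)
            ).fetchone()
            if row is not None:
                self.cache[key] = row[0]

    def get(self, key):
        # Hot keys are answered from memory; everything else goes to the database.
        if key in self.cache:
            return self.cache[key]
        row = self.conn.execute(
            "SELECT value FROM big_table WHERE key = ?", (key,)
        ).fetchone()
        return None if row is None else row[0]


conn = sqlite3.connect(":memory:")  # stand-in for an on-disk database file
conn.execute("CREATE TABLE big_table (key TEXT PRIMARY KEY, value REAL)")
conn.executemany(
    "INSERT INTO big_table VALUES (?, ?)",
    [(f"k{i}", i * 1.5) for i in range(1000)],
)

# Externally supplied statistics: keys known in advance to be requested often.
lookup = WarmedLookup(conn, hot_keys=["k1", "k2", "k3"])
```

Calling `lookup.get("k2")` is served from the dictionary, while `lookup.get("k500")` falls through to sqlite. The primary key on `key` keeps the fallback path an index lookup rather than a full scan.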

Related

Calculate at runtime vs Lookup from SQL Server Table

I have an MVC application that needs to run several trillion calculations. Of those, I am interested in only about 8 million results. I have to do this work because I need to see an overall high and low score. I will save this data and store it in a single table of 16 floats. I also have a few indexes on this table for lookups. So far I have only processed 5% of my data.
As users enter data into my website, I have to do calculations based on their data. I have to determine the Best and Worst outcomes. This is only about 4 million calculations. Right now, that takes about a second or less to calculate on my local PC. Or it is a simple query that will always return 2 records from my stored data. The Best and The Worst. Right now, the query to get the results is the same speed or faster than calculating the result, but I don't have all 8 million records yet. I am worried that the DB will get slow.
I was thinking I would use the Database Lookup, and if performance became an issue, switch to runtime calculation.
QUESTION: Should I just save myself the trouble and do the runtime calculation anyway?
I am not sure which option is more scalable. I don't expect a large user base for this website.
The site needs to be snappy.
Your question is a little too vague for a clear-cut answer, but my guess is that using the db to calculate the totals will be far more efficient than writing the code yourself on the website. SQL Server will attempt to optimize the query to use as much of the server resources as possible to make it more efficient. Your code won't do that unless you specifically write it to do so.
I would start by loading the data and doing tests before making an optimization strategy. You have no idea where the real bottlenecks of the system will be before you load data that is remotely close to what you are going to have to deal with.
If I understand the question, performing the calculation is more scalable, as it works on that single data set. As you add data to a table, lookups will get slower even with indexes. The indexes also increase the table size and the time required to insert a record.
If I've understood you correctly, this is a question about caching - should you calculate on the fly, or lookup the results in a cache?
In most web architectures, your SQL database is a brilliant cache, right up to the point where it becomes a terrible cache. Scaling your (SQL) database is notoriously tricky - introducing clustering, sharding etc. becomes a production in its own right.
My - very general - advice is to use your relational database for managing transactional data, and to use caching technology for caching. 8 million records should fit into RAM on a decent server these days - and you can add web servers far more cheaply than scaling your database.
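As a rough sketch of the "calculate once, then serve from a cache" idea, here is an in-process cache using Python's `functools.lru_cache`. The function `expensive_best_and_worst` is a hypothetical stand-in for the real calculation, and the `CALLS` counter exists only to show how often the expensive path actually runs.

```python
from functools import lru_cache

CALLS = {"count": 0}  # counts how often the expensive path really runs


def expensive_best_and_worst(user_input):
    """Hypothetical stand-in for the real best/worst calculation."""
    CALLS["count"] += 1
    scores = [(user_input * k) % 97 for k in range(1, 10_000)]
    return max(scores), min(scores)


@lru_cache(maxsize=100_000)
def best_and_worst(user_input):
    # Repeated requests with the same input are answered from memory.
    return expensive_best_and_worst(user_input)


first = best_and_worst(7)
second = best_and_worst(7)  # served from the cache, no recalculation
```

A cache like this lives inside one web server process, so each server warms its own copy. A shared cache server would be the next step if that becomes a problem.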

SQlite3 Optimization: Store external file name in db? Or just have a huge number of rows?

I am a newbie with no comp sci background. So please forgive me for whatever dumb stuff I may say. I am working on a solar power monitoring project to monitor the power output of the solar power systems my company installs. I am writing a client that will query the inverter (for power output, voltage output, current output, system errors/faults, etc--which constitutes one "reading") of each of our monitoring customers every 15 minutes for as long as they have their system--which means roughly 35k readings per year per customer. So I was thinking of organizing my sqlite3 database in one of the two following ways.
(1) Have the database be two tables, one table with regular customer information (name, email, etc) and another much bigger table where each row represents one reading and includes the customer id and timestamp of reading as identifiers. Which means roughly 35k rows will be being added to this bigger table per customer per year. (Data more than two years old will be pared down and archived.)
OR
(2) Store all readings in a csv file (one csv file per customer) and store the csv file name in my table with regular customer information
This database will be serving a website (built on rails if that makes any difference for options) where customers will be able to view their power output data. I want to minimize the amount of time it will take to load their output data on login. I basically don't have a clear idea of the amount of time it would take for my computer to open and read in lines from a text file versus open, look for (based on customer id) and read in the data from a huge sqlite3 table--and therefore am having trouble knowing how to judge between the two options above. Also I'm having trouble gauging the limits of sqlite3 where it functions optimally despite having read some about it (I don't think I have the background to understand the reading I did because it seems to say 100s of millions of rows are just fine when I read other people's comments seeming to say just the opposite.). I am also open to a completely different option as I'm not married to anything right now. Whatever makes things load faster. Thanks so much in advance!
Storing the parsed data in sqlite would definitely be a timesaver if you're doing any kind of repeated data mining on it. CSV Parsing overhead would almost instantly eat up any database space/time savings you'd gain.
As for efficiency, you'd have to test it. There's no one hard fast rule that says "use this database" or "use that database". It's ALWAYS a "depends on the scenario". SQLite may be perfect for you in this case, but be useless for someone else with a slightly different workload.
SQL databases in general do very well with large data sets, as long as the columns being queried are indexed. You should keep the readings in the same database. It will take far less time to obtain the data from the database than to parse CSV files. Databases are created for the purpose of storing and retrieving data; CSV files are not.
I use MySQL databases with tens of millions of rows per table and queries return results in fractions of a second. SQLite might be faster.
Just make sure you create indexes for what you will be searching.
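For the readings table in option 1, a minimal sketch of what "index what you search" could look like. The table and column names (`readings`, `customer_id`, `taken_at`, `power_w`) are assumptions for illustration, not taken from the question, and the data is synthetic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the real database file
conn.execute(
    """CREATE TABLE readings (
           customer_id INTEGER NOT NULL,
           taken_at    INTEGER NOT NULL,  -- Unix timestamp of the reading
           power_w     REAL
       )"""
)
# Composite index matching the lookup pattern: one customer, a time range.
conn.execute(
    "CREATE INDEX idx_readings_customer_time ON readings (customer_id, taken_at)"
)

# Synthetic data: 3 customers, one reading every 15 minutes (900 seconds).
rows = [(cust, t * 900, float(t % 50)) for cust in range(1, 4) for t in range(1000)]
conn.executemany("INSERT INTO readings VALUES (?, ?, ?)", rows)


def readings_for(customer_id, start, end):
    """Return (taken_at, power_w) rows for one customer within [start, end]."""
    return conn.execute(
        "SELECT taken_at, power_w FROM readings "
        "WHERE customer_id = ? AND taken_at BETWEEN ? AND ? "
        "ORDER BY taken_at",
        (customer_id, start, end),
    ).fetchall()


one_day = readings_for(2, 0, 86_400)
```

With the index on `(customer_id, taken_at)`, the database can jump straight to one customer's time range instead of scanning every reading from every customer.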
I would do option 1, but use a database server such as PostgreSQL instead of SQLite.
SQLite locks the whole database file while a write is in progress, so you may run into locking issues if you read and write to the table a lot. SQLite is better suited for single user applications on the desktop or in a smartphone.
You can easily have millions of rows without it causing any problems.

What data store technology/solution allows very fast inserts, lookups and 'selects'

Here's my problem.
I want to ingest lots and lots of data .... right now millions and later billions of rows.
I have been using MySQL and I am playing around with PostgreSQL for now.
Inserting is easy, but before I insert I want to check whether that particular record exists or not; if it does, I don't want to insert. As the DB grows this operation (obviously) takes longer and longer.
If my data was in a Hashmap the lookup would be O(1), so I thought I'd create a Hash index to help with lookups. But then I realised that if I have to compute the Hash again every time I will slow the process down massively (and if I don't compute the index I don't have O(1) lookup).
So I am in a quandary: is there a simple solution? Or a complex one? I am happy to try other datastores; however, I need to be able to do reasonably complex queries, e.g. something similar to SELECT statements with WHERE clauses, so I am not sure if no-sql solutions are applicable.
I am very much a novice, so I wouldn't be surprised if there is a trivial solution.
NoSQL stores are good at handling huge numbers of inserts and updates.
MongoDB has a really good feature for update/insert (called upsert) that acts depending on whether the document already exists.
Check out this page from the MongoDB docs:
http://www.mongodb.org/display/DOCS/Updating#Updating-UpsertswithModifiers
You can also check out the safe mode setting on the mongo connection, which you can set to false to get more efficient inserts.
http://www.mongodb.org/display/DOCS/Connections
You could use CouchDB. It's NoSQL, so you can't do queries per se, but you can create design documents that allow you to run map/reduce functions on your data.
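If you stay with a relational store, another option for the check-then-insert problem in the question is to let a unique index do the existence check as part of the insert itself. Below is a minimal sketch using Python's built-in sqlite3 purely for illustration; the `records` table and `insert_if_new` helper are hypothetical names. MySQL (`INSERT IGNORE`) and PostgreSQL (`INSERT ... ON CONFLICT DO NOTHING`) offer comparable statements.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # illustration only
# The primary key gives the table a unique index on record_key.
conn.execute("CREATE TABLE records (record_key TEXT PRIMARY KEY, payload TEXT)")


def insert_if_new(key, payload):
    """Insert the row unless the key already exists; return True if inserted."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO records (record_key, payload) VALUES (?, ?)",
        (key, payload),
    )
    # rowcount is 1 when a row was added and 0 when the insert was ignored.
    return cur.rowcount == 1


inserted_first = insert_if_new("a", "first")
inserted_again = insert_if_new("a", "second")  # same key, ignored
```

The existence check then costs one index lookup per insert, which grows slowly with table size, and there is no separate SELECT round trip.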

How can I improve the performance of the SQLite database?

Background: I am using an SQLite database in my flex application. The size of the database is 4 MB and it has 5 tables:
table 1 has 2500 records
table 2 has 8700 records
table 3 has 3000 records
table 4 has 5000 records
table 5 has 2000 records.
Problem: Whenever I run a select query on any table, it takes approximately 50 seconds to fetch data from the database tables. This has made the application quite slow and unresponsive while it fetches the data from the table.
How can I improve the performance of the SQLite database so that the time taken to fetch the data from the tables is reduced?
Thanks
As I said in a comment, without knowing what structures your database consists of and what queries you run against the data, there is nothing we can infer about why your queries take so much time.
However, here is some interesting reading about indexes: Use the index, Luke! It tells you what an index is, how you should design your indexes and what benefits you can harvest.
Also, if you can post the queries and the table schemas and cardinalities (not the contents) maybe it could help.
Are you using asynchronous or synchronous execution modes? The difference between them is that asynchronous execution runs in the background while your application continues to run. Your application will then have to listen for a dispatched event and then carry out any subsequent operations. In synchronous mode, however, the user will not be able to interact with the application until the database operation is complete since those operations run in the same execution sequence as the application. Synchronous mode is conceptually simpler to implement, but asynchronous mode will yield better usability.
The first time SQLStatement.execute() is called on a SQLStatement instance, the statement is prepared automatically before executing. Subsequent calls will execute faster as long as the SQLStatement.text property has not changed. Reusing the same SQLStatement instances is better than creating new instances again and again. If you need to change your queries, then consider using parameterized statements.
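The same idea (keep the statement text constant and vary only the parameters) can be sketched outside of Flex. The example below uses Python's sqlite3 module rather than the AIR `SQLStatement` API, so treat it as an analogy only; the `users` table and `email_for` helper are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # illustration only
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
conn.executemany(
    "INSERT INTO users (id, email) VALUES (?, ?)",
    [(i, f"user{i}@example.com") for i in range(1, 101)],
)

# One constant SQL text; only the bound parameter changes between calls.
FIND_EMAIL = "SELECT email FROM users WHERE id = ?"


def email_for(user_id):
    """Look up one email using the shared parameterized statement text."""
    row = conn.execute(FIND_EMAIL, (user_id,)).fetchone()
    return row[0] if row else None
```

Because the SQL text never changes, the driver can reuse its prepared form, and binding values as parameters also avoids building SQL strings by hand.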
You can also use techniques such as deferring what data you need at runtime. If you only need a subset of data, pull that back first and then retrieve other data as necessary. This may depend on your application scope and what needs you have to fulfill though.
Specifying the database along with the table names will prevent the runtime from checking each database to find a matching table if you have multiple databases. It also helps prevent the runtime from choosing the wrong database. Do SELECT email FROM main.users; instead of SELECT email FROM users; even if you only have one single database. (main is automatically assigned as the database name when you call SQLConnection.open.)
If you happen to be writing lots of changes to the database (multiple INSERT or UPDATE statements), then consider wrapping them in a transaction. Changes will be made in memory by the runtime and then written to disk. If you don't use a transaction, each statement will result in multiple disk writes to the database file, which can be slow and consume lots of time.
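A small sketch of the transaction advice, again in Python's sqlite3 rather than the Flex API, with a hypothetical `log` table. All inserts inside the `with conn:` block are committed together in one step.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # illustration only
conn.execute("CREATE TABLE log (id INTEGER PRIMARY KEY, message TEXT)")

rows = [(f"event {i}",) for i in range(10_000)]

# The connection context manager commits once when the block exits without
# error, and rolls back if an exception is raised inside the block.
with conn:
    conn.executemany("INSERT INTO log (message) VALUES (?)", rows)

# A failure inside the block undoes everything written in that block.
try:
    with conn:
        conn.execute("INSERT INTO log (message) VALUES ('will be rolled back')")
        raise RuntimeError("simulated failure")
except RuntimeError:
    pass

count = conn.execute("SELECT COUNT(*) FROM log").fetchone()[0]
```

Besides being faster on disk-backed databases, the single transaction means a batch is either fully written or not written at all.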
Try to avoid any schema changes. The table definition data is kept at the start of the database file. The runtime loads these definitions when the database connection is opened. Data added to tables is kept after the table definition data in the database file. If you make changes such as adding columns or tables, the new table definitions will be mixed in with table data in the database file. The effect of this is that the runtime will have to read the table definition data from different parts of the file rather than from the beginning. The SQLConnection.compact() method restructures the table definition data so it is at the beginning of the file, but its downside is that it can take a long time, more so if the database file is large.
Lastly, as Benoit pointed out in his comment, consider improving the SQL queries and table structure that you're using. It would be helpful to know whether your database structure and queries are the actual cause of the slow performance or not. My guess is that you're using synchronous execution. If you switch to asynchronous mode, you'll see better performance, but that doesn't mean it has to stop there.
The Adobe Flex documentation online has more information on improving database performance and best practices working with local SQL databases.
You could try indexing some of the columns used in the WHERE clause of your SELECT statements. You might also try minimizing usage of the LIKE keyword.
If you are joining your tables together, you might try simplifying the table relationships.
Like others have said, it's hard to get specific without knowing more about your schema and the SQL you are using.
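To check whether a query can actually use an index, SQLite's `EXPLAIN QUERY PLAN` is handy. Here is a minimal sketch in Python's sqlite3 with a hypothetical `users` table; the exact wording of the plan text differs between SQLite versions, so the example only compares the plan before and after creating the index.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # illustration only
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, name TEXT)")


def plan(sql):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)


query = "SELECT name FROM users WHERE email = 'a@example.com'"

plan_before = plan(query)  # no index on email yet: a full table scan
conn.execute("CREATE INDEX idx_users_email ON users (email)")
plan_after = plan(query)   # the plan now names idx_users_email
```

Printing `plan_before` and `plan_after` for your own slow queries is a quick way to see which WHERE columns would benefit from an index.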

Alternatives of Datatable

In my web application, I have a dynamic query that returns a huge amount of data to a datatable, and this query is often re-run with different parameters. As a result, the database is exhausted.
I want to load all records, with no parameters, into an object and perform queries (maybe with linq) on this object, so that the database will not be exhausted.
Which objects can be used instead of datatable?
This is one of my pet peeves - people who return all the data from the database.
There is absolutely no need for this unless you are doing reporting.
If you are doing reporting, then you need to increase your hardware capability so that the database can cope. This may also include tuning your database, rearranging tables, reindexing, regular rebuilding of indexes, updating statistics, archiving out old data, etc.
If you are NOT doing reporting, then start limiting how much data can be queried at any one time. Users DO NOT need to see massive quantities of data all at once. They need to see discrete amounts of data presented in a manageable and coherent way.
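One common way to limit how much is queried at a time is to page through results. A minimal sketch of keyset paging in Python's sqlite3 follows; the `orders` table and the `fetch_page` helper are hypothetical names used only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # illustration only
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL)")
conn.executemany(
    "INSERT INTO orders (id, total) VALUES (?, ?)",
    [(i, i * 2.0) for i in range(1, 1001)],
)


def fetch_page(after_id=0, page_size=50):
    """Return the next page of rows whose id is greater than after_id."""
    return conn.execute(
        "SELECT id, total FROM orders WHERE id > ? ORDER BY id LIMIT ?",
        (after_id, page_size),
    ).fetchall()


first_page = fetch_page()
# Continue from the last id seen instead of re-reading earlier rows.
second_page = fetch_page(after_id=first_page[-1][0])
```

Each request touches only `page_size` rows, so the cost stays roughly constant no matter how large the table grows.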
Another rule of thumb I like to observe is: let your database server do the work. It is made to manipulate lots of data, it is what it is good at, and it should have the power to do it. Pulling back loads of data to the client and then trying to manipulate that data on the client is a foolish thing to do. If your client machines are more powerful than the database server then you have issues.
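To illustrate letting the database do the work, here is a small sketch that asks for an aggregate instead of pulling every row back and summing on the client. It uses Python's sqlite3 and a made-up `sales` table purely as an example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # illustration only
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [
        ("north", 10.0),
        ("north", 15.0),
        ("south", 7.5),
        ("south", 2.5),
        ("south", 5.0),
    ],
)

# The database groups and sums; only one row per region crosses the wire.
totals = dict(
    conn.execute("SELECT region, SUM(amount) FROM sales GROUP BY region")
)
```

Here the client receives two rows instead of five; on a real table the difference would be a handful of rows instead of millions.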
Never ever do this (except for caching)!!!
You are trying to implement DB mechanisms, like
persistent storage
index search and query strategy
replication
and so on
Spend your time on DB optimization (optimal schema, indexes, queries, partitioning).
