How to plan for schema changes in an SQLite database?

I am currently developing an application that will store data in an SQLite database. The database will see far more reads than writes (in fact, it will be filled with data once, and then almost only reading will happen), so read performance is much more important. The schema I am currently developing is very likely to change in the future, with additional columns and tables being added. I do not have much experience with databases in general. My question is: specifically in SQLite, are there any pitfalls to consider when changing a schema? Are there any patterns or best practices for planning ahead for such cases?

Here are some suggestions:
Don't use SELECT * FROM ..., because the meaning of * changes with schema changes; explicitly name the columns your query uses.
Keep a schema version number in the database, and keep code in the application that converts from schema version N to version N+1; then all the code in the application works with the latest schema version. This may mean having default values to fill added columns (a sketch follows this list).
You can avoid copying tables for schema updates with SQLite version 3.2.0 or later, which supports ALTER TABLE ... ADD COLUMN.
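A minimal sketch of the version-number idea in C# with Microsoft.Data.Sqlite, using SQLite's built-in PRAGMA user_version counter (the table and column names here are invented for illustration):

    using Microsoft.Data.Sqlite;

    using var conn = new SqliteConnection("Data Source=app.db");
    conn.Open();

    // Read the schema version stored in the database file itself.
    long version;
    using (var versionCmd = conn.CreateCommand())
    {
        versionCmd.CommandText = "PRAGMA user_version;";
        version = (long)versionCmd.ExecuteScalar();
    }

    // Apply each migration step exactly once, in order.
    if (version < 1)
        Exec(conn, "CREATE TABLE documents (id INTEGER PRIMARY KEY, title TEXT);");
    if (version < 2)
        // A DEFAULT keeps existing rows valid when the column is added.
        Exec(conn, "ALTER TABLE documents ADD COLUMN rating INTEGER NOT NULL DEFAULT 0;");

    Exec(conn, "PRAGMA user_version = 2;");

    static void Exec(SqliteConnection conn, string sql)
    {
        using var cmd = conn.CreateCommand();
        cmd.CommandText = sql;
        cmd.ExecuteNonQuery();
    }

After a successful run the stored version matches the latest step, so older database files are upgraded in place the next time they are opened.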

Look into data-marts and star schema design. This might be overkill for your situation, but at least it will prevent you from designing at random.

Related

Decimal/Date Sorting With Entity Framework Core and SQLite

Using Entity Framework Core (Code First) and SQLite, Guid is stored as binary but Decimal and Date fields are stored as text with Microsoft's provider.
I can understand they might not want the imprecision of DOUBLE for currency amounts and thus use text.
What happens if I need to sort? Is Entity Framework Core smart enough to make sorting work as expected (but slower because it needs to parse everything!), or will it sort alphabetically instead of sorting by number? I don't want it to return 100 before 2.
I'll have to do things like "give me the latest order" so what's the best approach for that? I want to make sure it's going to work.
Am I better off switching to the System.Data.SQLite provider to store dates in UNIX format (which the Microsoft provider does not support)? And would I then have to do the parsing back and forth myself, or could it take care of that automatically?
I am still learning System.Data.SQLite myself, but I am aware that you can create and assign a custom collation to your column. The collation can be assigned either to the table column or only to a particular view or query, using standard SQLite SQL syntax and the COLLATE keyword.
This is not a complete example/tutorial, but for starters visit the Microsoft.Data.Sqlite docs. Also see this Stack Overflow answer. These are just hints, but they provide a consistent way to do this. Remember that SQLite is an in-process DB engine, so this should still be rather efficient and still allow working with the database in a normal fashion, without having to constantly inject custom logic between queries. Once you have the custom collation defined and properly registered, it should be rather seamless, with perhaps the only extra requirement being to append e.g. COLLATE customDecimal to your ORDER BY clauses.
The custom collation function converts the string values to an appropriate numeric type and returns the comparison result. It is very similar to the native .NET IComparer interface and Comparison<T> delegate.
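A minimal sketch of that approach with Microsoft.Data.Sqlite, whose SqliteConnection.CreateCollation registers exactly such a function (the table, column, and collation names are invented):

    using System;
    using System.Globalization;
    using Microsoft.Data.Sqlite;

    using var conn = new SqliteConnection("Data Source=orders.db");
    conn.Open();

    // Compare TEXT values as decimals instead of alphabetically,
    // so '100' no longer sorts before '2'.
    conn.CreateCollation("decimalText", (x, y) =>
        decimal.Parse(x, CultureInfo.InvariantCulture)
            .CompareTo(decimal.Parse(y, CultureInfo.InvariantCulture)));

    using var cmd = conn.CreateCommand();
    cmd.CommandText =
        "SELECT Id, Amount FROM Orders " +
        "ORDER BY Amount COLLATE decimalText DESC LIMIT 1;";
    using var reader = cmd.ExecuteReader();
    if (reader.Read())
        Console.WriteLine($"Largest order: {reader.GetInt64(0)}");

The same LIMIT 1 pattern with an ORDER BY on a date column would cover the "give me the latest order" case.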

Which one to use? EAV or Blobs in the database?

I am currently working to rework the data system of our application. Basically, it is designed so that people can add all the custom fields they want, with only a few constant/always-there fields.
Our current design is giving us plenty of maintenance problems. What we do is dynamically (at runtime) add a column to the database for each field. We have to maintain a meta table and other cruft to keep track of all of these dynamic columns.
Now we are looking at EAV, but it doesn't seem much better. Basically, we have many different types of fields, so there would be a StringValues table, an IntegerValues table, etc., which makes things that much worse.
I am wondering if using JSON or XML blobs in the database may be a better solution, specifically because in most use cases, when we retrieve anything out of these tables, we need the entire row. The problem is that we need to be able to create reports for this data as well. No solution really makes custom queries look easy, and searching across such a blob database will surely be a performance nightmare when reports are run.
Each "row" needs to have anywhere from about 15 to 100 (possibly more) attributes/columns associated with it.
We are using SQL Server 2008, and our application interfacing with the database is a C# web application (so, ASP.NET).
What do you think? Use EAV, or blobs, or something else entirely? (Also, yes, I know a schema-free database like MongoDB would be awesome here, but I can't convince my boss to use it.)
What about the xml datatype? Advanced querying is possible against this type.
We've used the xml type with good success. We do most of our heavy lifting at the code level, using LINQ to parse out values. Our schema is somewhat fixed, so that may not be an option for you.
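A minimal sketch of that code-level approach with LINQ to XML, assuming the xml column holds a simple attribute list (the fragment shape is invented):

    using System;
    using System.Linq;
    using System.Xml.Linq;

    // A value as it might come back from an xml column.
    string xmlFromDb =
        "<attrs><attr name=\"Color\">Red</attr><attr name=\"Size\">42</attr></attrs>";

    XElement attrs = XElement.Parse(xmlFromDb);

    // Pull one named field out of the blob in code rather than in SQL.
    string color = attrs.Elements("attr")
        .Where(a => (string)a.Attribute("name") == "Color")
        .Select(a => a.Value)
        .FirstOrDefault();

    Console.WriteLine(color); // Red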
One interesting feature of SQL Server is the sql_variant type. It's fully supported in .NET and quite easy to use. The advantage is that you don't need to create StringValue, IntValue, etc. columns, just one Value column that can contain all the simple types.
This very specific type favors the EAV option, IMHO.
It has some drawbacks, though (sorting, DISTINCT selects, etc.), so if you want to use it, make sure you read all the documentation and understand its limits.
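A minimal sketch of an EAV table built on sql_variant, driven from plain ADO.NET; the table and names are invented, and SqlDbType.Variant is the parameter type that maps to sql_variant:

    using System;
    using System.Data;
    using Microsoft.Data.SqlClient; // System.Data.SqlClient on .NET Framework

    using var conn = new SqlConnection("Server=.;Database=App;Integrated Security=true");
    conn.Open();

    // Assumed schema:
    // CREATE TABLE EntityAttributes (EntityId INT, Name NVARCHAR(100), Value SQL_VARIANT);
    using (var insert = conn.CreateCommand())
    {
        insert.CommandText =
            "INSERT INTO EntityAttributes (EntityId, Name, Value) VALUES (@id, @name, @value);";
        insert.Parameters.AddWithValue("@id", 1);
        insert.Parameters.AddWithValue("@name", "Size");
        insert.Parameters.Add("@value", SqlDbType.Variant).Value = 42; // stored as an int
        insert.ExecuteNonQuery();
    }

    using var select = conn.CreateCommand();
    select.CommandText = "SELECT Name, Value FROM EntityAttributes WHERE EntityId = 1;";
    using var reader = select.ExecuteReader();
    while (reader.Read())
        // GetValue returns the original CLR type (int, string, DateTime, ...).
        Console.WriteLine($"{reader.GetString(0)} = {reader.GetValue(1)}");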
Create a table with your known columns and "X" sparse columns, using sequential names such as DataColumn0001, DataColumn0002, etc. When a definition for a new column arrives, just rename a spare column and start inserting data. The great advantage of sparse columns is that they are indexable.
More info at this link.
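A minimal sketch of that pattern (all names invented; SPARSE and sp_rename are standard SQL Server features):

    using Microsoft.Data.SqlClient;

    using var conn = new SqlConnection("Server=.;Database=App;Integrated Security=true");
    conn.Open();

    // Known columns plus a pool of pre-created sparse columns.
    // Sparse columns cost almost nothing while NULL and can be indexed.
    Exec(conn, @"
        CREATE TABLE Records (
            Id INT IDENTITY PRIMARY KEY,
            Name NVARCHAR(200) NOT NULL,
            DataColumn0001 SQL_VARIANT SPARSE NULL,
            DataColumn0002 SQL_VARIANT SPARSE NULL
        );");

    // When a user defines a new field, claim the next free slot by renaming it.
    Exec(conn, "EXEC sp_rename 'Records.DataColumn0001', 'CustomerRating', 'COLUMN';");

    static void Exec(SqlConnection conn, string sql)
    {
        using var cmd = conn.CreateCommand();
        cmd.CommandText = sql;
        cmd.ExecuteNonQuery();
    }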
What you're doing is STUPID with a database that doesn't support your kind of data. You should work with a medium that meets your needs, which includes NoSQL databases such as RavenDB, MongoDB, DocumentDB, or CouchBase, or Postgres on the RDBMS side, to name several.
You are using the tool in a capacity it was not designed for, one in which it actively limits your chances of success. NoSQL database solutions frequently use JSON as the underlying storage because JSON is inherently schemaless. Want to add a property? Sure, go ahead. Want to add a whole sub-collection? Sure, go ahead. NoSQL databases were created, in part, specifically to remove the rigid schema requirements of RDBMSes.
2015 Edit: Postgres now natively supports JSON, so it is a viable option among RDBMSes. My answer still stands: you need to use the correct tool for the problem. It is a polyglot persistence world.

(LINQ-To-SQL) Creating classes first, database tables second, how to auto-connect the two?

I'm creating a data model first using the LINQ-To-SQL graphical designer by using right-click->Add->Class. My idea is that I'll set up everything first using test repositories, design the entire website, then as a final step, create a database using the LINQ-To-SQL classes as a model for the database tables and relationships. My reasoning is that it's easy to edit the classes, but hard to modify DB tables (especially if there's already data in them), so by doing the database part last, it becomes much easier to design the structure.
My question is, is there an automatic way to link the two once I have the DB tables created? I know you can manually fill out the class properties for the LINQ-To-SQL entities but this is pretty cumbersome if you have a lot of tables to deal with. The other option is to delete your manually-created classes and drag the tables from the database into the designer to auto-generate the classes, but I'm not sure if this is the best way of doing it.
Linq to Sql is intended to be a relatively thin ORM layer over a database. While you can of course just add properties to a data context and use them as a sort of mock, you are correct, it isn't really easy to work with.
Instead of relying solely on Linq to Sql generated classes to give you freedom from the database implementation, you may want to look into the repository design pattern. It allows you to have a smooth separation between your database, domain model, and your middle tier; I have used it on two projects now, and have been able to (for the most part) build everything top-down, leaving the actual database for last. Below is a link to a good tutorial on the pattern (better than I could scribble down here).
https://web.archive.org/web/20110503184234/http://blogs.hibernatingrhinos.com/nhibernate/archive/2008/10/08/the-repository-pattern.aspx
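A minimal sketch of the pattern over LINQ to SQL (the interface shape is invented; GetTable<T>, InsertOnSubmit, and SubmitChanges are the real System.Data.Linq APIs):

    using System;
    using System.Data.Linq;
    using System.Linq;
    using System.Linq.Expressions;

    // The domain layer only ever sees this interface.
    public interface IRepository<T> where T : class
    {
        IQueryable<T> Find(Expression<Func<T, bool>> predicate);
        void Add(T entity);
        void Save();
    }

    // LINQ to SQL implementation; a test double can implement
    // the same interface over an in-memory list instead.
    public class LinqToSqlRepository<T> : IRepository<T> where T : class
    {
        private readonly DataContext _context;
        public LinqToSqlRepository(DataContext context) => _context = context;

        public IQueryable<T> Find(Expression<Func<T, bool>> predicate)
            => _context.GetTable<T>().Where(predicate);

        public void Add(T entity) => _context.GetTable<T>().InsertOnSubmit(entity);

        public void Save() => _context.SubmitChanges();
    }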
Depending on your database permissions, you may call your data context's DeleteDatabase() and CreateDatabase() methods as an ungraceful way of resyncing your classes and tables. This is not much of an option when you have actual data in the database, but it does work while you are in the development stages.
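A minimal sketch of that resync, assuming a designer-generated MyDataContext class and a placeholder connection string; note that DeleteDatabase() destroys any existing data:

    using System.Data.Linq;

    // MyDataContext is the designer-generated DataContext subclass (hypothetical here).
    // Development only: drop and recreate the schema from the L2S model.
    using var db = new MyDataContext("Server=.;Database=Dev;Integrated Security=true");
    if (db.DatabaseExists())
        db.DeleteDatabase();
    db.CreateDatabase();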
Take a look at my add-in (which you can download from http://www.huagati.com/dbmltools/ , free 45-day trial licenses are also available from the same site).
It can generate SQL DDL diff scripts containing only the portions of the database that have changed in the L2S model (e.g. adding missing columns, missing tables, missing FKs, etc.), instead of the out-of-the-box L2S support for recreating the entire db from scratch.
It also supports syncing the other way; updating the model from the database.

Avoiding unnecessary updates in an UPDATE query

In our application, many pages include an "update" action, and when we update a table, we also update unnecessary columns that don't change.
I want to know whether there is a way to avoid these unnecessary column updates. We use stored procedures in .NET 2003. At the following link I found a solution, but it is not for stored procedures.
http://blogs.msdn.com/alexj/archive/2009/04/25/tip-15-how-to-avoid-loading-unnecessary-properties.aspx
Thanks
You can really only accomplish this with a good ORM tool that generates the update query for you. It will typically look at what changed and generate the query for only the columns that changed.
If you're using a stored procedure, then all of the column values get sent over to the database anyway when you call it, so you can't save anything there. The SP will probably just execute a run-of-the-mill UPDATE statement. The RDBMS then takes over; it won't physically change the data on disk if it's not different. It's smart enough for that.
So my answer, in short: don't worry about it. It's not really a big deal, it would require drastic changes to get what you want, and you wouldn't even see performance benefits.
When I was working at a financial software company, performance was vital. Some tables had hundreds of columns, and the update statements were costly. We created our own ORM layer (in java) which included an object cache. When we generated the update statement, we compared the current values of every field to the values as they were on load and only updated the changed fields.
Our db was SQL Server. I do not remember the exact performance improvement, but it was substantial and worth the investment. We also did bulk inserts and updates where possible.
I believe that Hibernate and the other big ORMs all do this sort of thing for you, if you do not want to write one yourself.
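A minimal sketch of that compare-on-save technique (the original answer's layer was Java; this is an invented C# equivalent for a hypothetical Orders table):

    using System;
    using System.Collections.Generic;
    using System.Linq;
    using Microsoft.Data.SqlClient;

    // Snapshot taken when the row was loaded vs. the values now.
    var original = new Dictionary<string, object> { ["Status"] = "OPEN", ["Amount"] = 10m };
    var current  = new Dictionary<string, object> { ["Status"] = "OPEN", ["Amount"] = 12m };

    // Keep only the columns whose values actually changed.
    var changed = current.Where(kv => !Equals(kv.Value, original[kv.Key])).ToList();
    if (changed.Count == 0) return; // nothing changed: skip the round trip entirely

    // Column names come from our own fixed snapshot, never from user input.
    var setList = string.Join(", ", changed.Select(kv => $"{kv.Key} = @{kv.Key}"));

    using var conn = new SqlConnection("Server=.;Database=App;Integrated Security=true");
    conn.Open();
    using var cmd = conn.CreateCommand();
    cmd.CommandText = $"UPDATE Orders SET {setList} WHERE Id = @Id;"; // only changed columns
    cmd.Parameters.AddWithValue("@Id", 42);
    foreach (var kv in changed)
        cmd.Parameters.AddWithValue("@" + kv.Key, kv.Value);
    cmd.ExecuteNonQuery();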

Saving MFC Model as SQLite database

I am playing with a CAD application using MFC. I was thinking it would be nice to save the document (model) as an SQLite database.
Advantages:
I avoid file format changes (SQLite takes care of that)
Free query engine
Undo stack is simplified (table name, column name, new value, and so on; see the sketch below)
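A minimal sketch of that undo idea, following SQLite's documented trigger-based undo/redo pattern: each change appends its inverse SQL statement to an undo log, and undoing means executing the newest entries. The points table and trigger name are invented, and the SQL is driven from C# here purely for illustration (the application itself would be MFC/C++):

    using Microsoft.Data.Sqlite;

    using var conn = new SqliteConnection("Data Source=model.db");
    conn.Open();

    using var cmd = conn.CreateCommand();
    cmd.CommandText = @"
        CREATE TABLE IF NOT EXISTS points (id INTEGER PRIMARY KEY, x REAL, y REAL);
        CREATE TEMP TABLE undolog (seq INTEGER PRIMARY KEY, sql TEXT);

        -- Record the inverse of every UPDATE so it can be replayed to undo.
        CREATE TEMP TRIGGER points_undo_update AFTER UPDATE ON points BEGIN
            INSERT INTO undolog (sql) VALUES (
                'UPDATE points SET x=' || old.x || ', y=' || old.y ||
                ' WHERE id=' || old.id
            );
        END;";
    cmd.ExecuteNonQuery();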
Opinions?
This is a fine idea. Sqlite is very pleasant to work with!
But remember the old truism (I can't find an authoritative source for where it originally comes from) that storing your data in a relational database is like parking your car by driving it into the garage, disassembling it, and putting each piece into a labeled cabinet.
Geometric data, consisting of points and lines and segments that refer to each other by name, is a good candidate for storage in database tables. But when you start having composite objects, with a hierarchy of subcomponents, it might require a lot less code just to use serialization and store/load the model with a single call.
So that would be a fine idea too.
But serialization in MFC is not nearly as much of a win as it is in, say, C#, so on balance I would go ahead and use SQL.
This is a great idea, but before you start I have a few recommendations:
Make sure that each database is uniquely identifiable in some way besides its file name, such as by having a table inside the database that describes the file.
Take a look at some of the MFC-based examples and wrappers already available before creating your own. The ones I have seen borrowed from each other to create a better result. Google: MFC SQLite Wrapper.
Using an SQLite database is also useful for maintaining state. Think ahead about how you would manage that, keeping in mind which features SQLite includes and which it is missing.
You can also think now about how you might extend your application to the web, by making sure your database table structure is easily exportable to other SQL database systems, as well as easy enough to feed into a backup system.
