Saving MFC Model as SQLite database - sqlite

I am playing with a CAD application using MFC. I was thinking it would be nice to save the document (model) as an SQLite database.
Advantages:
I avoid file format changes (SQLite takes care of that)
Free query engine
Undo stack is simplified (table name, column name, new value, and so on...)
Opinions?
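A rough sketch of the undo-stack idea, in case it helps the discussion. It uses Python's built-in sqlite3 module only to keep the example short; the SQL itself would be the same from an MFC application. The `points` table, the `undo_log` table and the `set_value`/`undo` helpers are all hypothetical names, and the log stores the old value (rather than the new one) because that is what an undo needs to restore.

```python
import sqlite3

# Hypothetical schema: one model table plus an undo log.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE points (id INTEGER PRIMARY KEY, x REAL, y REAL);
    CREATE TABLE undo_log (
        seq INTEGER PRIMARY KEY AUTOINCREMENT,
        table_name TEXT, column_name TEXT, row_id INTEGER, old_value
    );
""")

# Identifiers cannot be bound as SQL parameters, so restrict them to a whitelist.
ALLOWED = {("points", "x"), ("points", "y")}

def set_value(table, column, row_id, new_value):
    """Update one cell and record the previous value so it can be undone."""
    if (table, column) not in ALLOWED:
        raise ValueError("unknown table/column")
    with conn:  # one transaction: the log entry and the update go together
        (old,) = conn.execute(
            f"SELECT {column} FROM {table} WHERE id = ?", (row_id,)).fetchone()
        conn.execute(
            "INSERT INTO undo_log (table_name, column_name, row_id, old_value) "
            "VALUES (?, ?, ?, ?)", (table, column, row_id, old))
        conn.execute(
            f"UPDATE {table} SET {column} = ? WHERE id = ?", (new_value, row_id))

def undo():
    """Revert the most recent change; return False if the log is empty."""
    row = conn.execute(
        "SELECT seq, table_name, column_name, row_id, old_value "
        "FROM undo_log ORDER BY seq DESC LIMIT 1").fetchone()
    if row is None:
        return False
    seq, table, column, row_id, old = row
    with conn:
        conn.execute(
            f"UPDATE {table} SET {column} = ? WHERE id = ?", (old, row_id))
        conn.execute("DELETE FROM undo_log WHERE seq = ?", (seq,))
    return True

conn.execute("INSERT INTO points (id, x, y) VALUES (1, 0.0, 0.0)")
set_value("points", "x", 1, 5.0)
set_value("points", "x", 1, 7.5)
undo()
x_after_undo = conn.execute("SELECT x FROM points WHERE id = 1").fetchone()[0]
```

A redo stack would need the new value as well, so a real log would probably keep both.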

This is a fine idea. Sqlite is very pleasant to work with!
But remember the old truism (I can't get an authoritative answer from Google about where it originally came from) that storing your data in a relational database is like parking your car by driving it into the garage, disassembling it, and putting each piece into a labeled cabinet.
Geometric data, consisting of points and lines and segments that refer to each other by name, is a good candidate for storing in database tables. But when you start having composite objects, with a hierarchy of subcomponents, it might require a lot less code just to use serialization and store/load the model with a single call.
So that would be a fine idea too.
But serialization in MFC is not nearly as much of a win as it is in, say, C#, so on balance I would go ahead and use SQL.

This is a great idea but before you start I have a few recommendations:
Be careful that each database is uniquely identifiable in some way besides file name such as having a table that describes the file within the database.
Take a look at some of the MFC based examples and wrappers already available before creating your own. The ones I have seen borrowed from each other to produce a better result. Google: MFC SQLite Wrapper.
Using an SQLite database is also useful for maintaining state. Think ahead about how you would manage that, keeping in mind which features SQLite includes and which it lacks.
You can also think now about how you may extend your application to the web, by making sure your database table structure is easily exportable to other SQL database systems as well as easy to extend to a backup system.
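For the first recommendation, one possible shape is a small key/value table inside each document, plus SQLite's `user_version` pragma for the schema version. This is only a sketch: the `file_info` table, its keys and the `create_document`/`describe` helpers are made-up names, and Python's sqlite3 module is used just to keep it runnable in a few lines.

```python
import sqlite3
import uuid

APP_NAME = "example-cad"   # hypothetical application name
SCHEMA_VERSION = 1         # hypothetical; bump when the table layout changes

def create_document(path):
    """Create a new document database stamped with its own identity."""
    conn = sqlite3.connect(path)
    with conn:
        conn.execute("CREATE TABLE file_info (key TEXT PRIMARY KEY, value TEXT)")
        conn.executemany(
            "INSERT INTO file_info (key, value) VALUES (?, ?)",
            [("application", APP_NAME), ("document_id", str(uuid.uuid4()))],
        )
    # user_version is an integer stored in the SQLite file header.
    conn.execute(f"PRAGMA user_version = {SCHEMA_VERSION}")
    return conn

def describe(conn):
    """Read back the identity of a document, independent of its file name."""
    info = dict(conn.execute("SELECT key, value FROM file_info").fetchall())
    info["schema_version"] = conn.execute("PRAGMA user_version").fetchone()[0]
    return info

doc = create_document(":memory:")
info = describe(doc)
```

On open, the application can compare `info["application"]` and `info["schema_version"]` with what it expects before touching any other table.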

Related

SQL Server database or XML, what to choose for asp.net app?

I have a database with about 10,000 records. Each record has one text field (200 chars or so) and about 30 numeric fields.
The asp.net app only does some searching and sorting and displaying data in a grid. No data sharing between users, read only operations (no database updating), very little calculation.
Should I go with an XML file and use Linq or should I use an SQL Server database? Please give me explanations of your choice.
"Should I go with an XML file and use Linq or should I use an SQL Server database?"
TOTAL non issue - SQL.
"The asp.net app only does some searching and sorting"
Read up in a beginner SQL book on what an "INDEX" is. XML files have none, so SQL databases are a lot more efficient at sorting and filtering.
It really depends on your needs. Ask yourself the following questions.
Is your data set going to increase?
Is speed one of the most important things for your app?
Are you going to run complex queries?
Is the schema of your data going to change?
If the answer to most of the questions above is 'no' then feel free to use XML. SQL provides lots of features and is mainly intended for data storage and retrieval, while with XML you can store data but I would say its main use is data interoperability and exchange.
If your data set grows then SQL should be the choice, because you can create indexes on your dataset which will speed up retrieval of the data. Files are usually read serially and are thus slower for ad-hoc data searches.
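To make the index point concrete, here is a small sketch that asks the database how it would run a filter query before and after an index exists. It uses SQLite through Python's sqlite3 module as a stand-in for SQL Server (an assumption made only so the example is self-contained); the `records` table, the `n1` column and the index name are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, label TEXT, n1 REAL)")
conn.executemany(
    "INSERT INTO records (label, n1) VALUES (?, ?)",
    [(f"row {i}", (i * 37) % 1000) for i in range(10_000)],
)

QUERY = "SELECT label FROM records WHERE n1 = 42"

def query_plan(sql):
    """Return SQLite's description of how it would execute the statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)  # last column is the readable detail

plan_before = query_plan(QUERY)   # without an index: every row is examined
conn.execute("CREATE INDEX idx_records_n1 ON records (n1)")
plan_after = query_plan(QUERY)    # with the index: a direct lookup on n1

print(plan_before)
print(plan_after)
```

The exact wording of the plan text varies between SQLite versions, but the second plan should mention `idx_records_n1` while the first cannot.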
I think you'll find SQL to be much easier to develop and maintain. XML is great in some scenarios, but I've found it often presents a steady stream of headaches in the long term.
From a performance perspective alone, it's hard to say which approach would be better without knowing the details of your queries and schema. In general, though, SQL tends to win, since it's built for searching and sorting, where XML is not.
For a readonly dataset of that size, you might be able to read the whole thing into memory on the web server side, and do your searching and sorting there.
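As a rough illustration of the in-memory option, the sketch below keeps about 10,000 records in a plain list and filters and sorts them per request. It is written in Python rather than C# purely for brevity, and the field names and the `search` helper are invented for the example.

```python
from operator import itemgetter

# Hypothetical records: one text field and a couple of numeric fields each.
records = [
    {"name": f"item {i}", "price": (i * 7919) % 500, "stock": i % 13}
    for i in range(10_000)
]

def search(rows, text="", min_price=0, sort_key="price", descending=False):
    """Filter by substring and minimum price, then sort by one field."""
    hits = [r for r in rows if text in r["name"] and r["price"] >= min_price]
    return sorted(hits, key=itemgetter(sort_key), reverse=descending)

# One page of results for a grid: first 25 matches, cheapest first.
page = search(records, text="item 12", min_price=400)[:25]
```

This scans every record on each request, which is usually acceptable at this size but is exactly the work an index would avoid on a larger set.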

Store map key/values in a persistent file

I will be creating a structure more or less of the form:
type FileState struct {
LastModified int64
Hash string
Path string
}
I want to write these values to a file and read them in on subsequent calls. My initial plan is to read them into a map and lookup values (Hash and LastModified) using the key (Path). Is there a slick way of doing this in Go?
If not, what file format can you recommend? I have read about and experimented with some key/value file stores in previous projects, but not using Go. Right now, my requirements are probably fairly simple so a big database server system would be overkill. I just want something I can write to and read from quickly, easily, and portably (Windows, Mac, Linux). Because I have to deploy on multiple platforms I am trying to keep my non-go dependencies to a minimum.
I've considered XML, CSV, JSON. I've briefly looked at the gob package in Go and noticed a BSON package on the Go package dashboard, but I'm not sure if those apply.
My primary goal here is to get up and running quickly, which means the least amount of code I need to write along with ease of deployment.
As long as your entire data set fits in memory, you shouldn't have a problem. Using an in-memory map and writing snapshots to disk regularly (e.g. by using the gob package) is a good idea. The Practical Go Programming talk by Andrew Gerrand uses this technique.
If you need to access those files with different programs, using a popular encoding like json or csv is probably a good idea. If you just have to access those files from within Go, I would use the excellent gob package, which has a lot of nice features.
As soon as your data becomes bigger, it's not a good idea to always write the whole database to disk on every change. Also, your data might not fit into the RAM anymore. In that case, you might want to take a look at the leveldb key-value database package by Nigel Tao, another Go developer. It's currently under active development (but not yet usable), but it will also offer some advanced features like transactions and automatic compression. Also, the read/write throughput should be quite good because of the leveldb design.
There's an ordered, key-value persistence library for Go that I wrote called gkvlite -
https://github.com/steveyen/gkvlite
JSON is very simple but makes bigger files because of the repeated variable names. XML has no advantage. You should go with CSV, which is really simple too. Your program will be less than one page long.
But it depends, in fact, on your modifications. If you make a lot of modifications and must have them stored synchronously on disk, you may need something a little more complex than a single file. If your map is mainly read-only, or if you can afford to dump it to a file rarely (not every second), a single CSV file alongside an in-memory map will keep things simple and efficient.
BTW, use the csv package of go to do this.

Which one to use? EAV or Blobs in the database?

I am currently working to rework the data system of our application. Basically, it is designed so that people can add all the custom fields they want, with only a few constant/always-there fields.
Our current design is giving us plenty of maintenance problems. What we do is dynamically(at runtime) add a column to the database for each field. We have to have a meta table and other cruft to maintain all of these dynamic columns.
Now we are looking at EAV, but it doesn't seem much better. Basically, we have many different types of fields, so there would be a StringValues, IntegerValues, etc table... which makes things that much worse.
I am wondering if using JSON or XML blobs in the database may be a better solution, specifically because in most use cases, when we retrieve anything out of these tables, we need the entire row. The problem is that we need to be able to create reports for this data as well. No solution really makes custom queries look easy, and searching across such a blob database will surely be a performance nightmare when reports are run.
Each "row" needs to have anywhere from about 15 to 100(possibly more) attributes/columns associated with it.
We are using SQL Server 2008 and our application interfacing with the database is a C# web application(so, ASP.Net).
What do you think? Use EAV or blobs or something else entirely? (Also, yes, I know a schema free database like MongoDB would be awesome here, but I can't convince my boss to use it)
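To make the blob trade-off concrete, here is a minimal sketch of the JSON-blob layout. It uses SQLite and Python instead of SQL Server 2008 and C# (an assumption made only to keep it self-contained); the `item` table and the helper names are hypothetical.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
# Constant fields get real columns; all custom fields live in one JSON text column.
conn.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, name TEXT, custom TEXT)")

def save(item_id, name, custom_fields):
    with conn:
        conn.execute("INSERT OR REPLACE INTO item (id, name, custom) VALUES (?, ?, ?)",
                     (item_id, name, json.dumps(custom_fields)))

def load(item_id):
    """Fetching one whole record is a single-row read plus one parse."""
    name, blob = conn.execute(
        "SELECT name, custom FROM item WHERE id = ?", (item_id,)).fetchone()
    return {"name": name, **json.loads(blob)}

def report_total(field):
    """A report over one custom field has to read and parse every blob."""
    total = 0
    for (blob,) in conn.execute("SELECT custom FROM item"):
        total += json.loads(blob).get(field, 0)
    return total

save(1, "widget", {"weight": 3, "color": "red"})
save(2, "gadget", {"weight": 5})
whole_row = load(1)
total_weight = report_total("weight")
```

`load` is the cheap, common case described above; `report_total` shows why reporting is the worry, since nothing in this plain-text layout lets the database filter or index on a field inside the blob.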
What about the xml datatype? Advanced querying is possible against this type.
We've used the xml type with good success. We do most of our heavy lifting at the code level using linq to parse out values. Our schema is somewhat fixed, so that may not be an option for you.
One interesting feature of SQL Server is the sql_variant type. It's fully supported in .NET and quite easy to use. The advantage is that you don't need to create StringValue, IntValue, etc. columns, just one Value column that can contain all the simple types.
This very specific type favors the EAV option, IMHO.
It has some drawbacks though (sorting, distinct selects, etc.). So if you want to use it, make sure you read all the documentation and understand its limits.
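A minimal sketch of what a single-Value-column EAV layout looks like, for comparison. SQLite stands in for SQL Server here, and its untyped column is only a loose analogy for sql_variant, not the same feature; the `entity` and `attribute` tables and the helper functions are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE entity (id INTEGER PRIMARY KEY);
    CREATE TABLE attribute (
        entity_id INTEGER REFERENCES entity(id),
        name TEXT,
        value,                 -- no declared type: holds integers, reals or text
        PRIMARY KEY (entity_id, name)
    );
""")

def set_attr(entity_id, name, value):
    with conn:
        conn.execute("INSERT OR IGNORE INTO entity (id) VALUES (?)", (entity_id,))
        conn.execute("INSERT OR REPLACE INTO attribute VALUES (?, ?, ?)",
                     (entity_id, name, value))

def get_row(entity_id):
    """Rebuild the whole 'row' for one entity as a dict."""
    return dict(conn.execute(
        "SELECT name, value FROM attribute WHERE entity_id = ?", (entity_id,)))

def report(names):
    """Pivot the selected attributes into columns, one output row per entity."""
    cols = ", ".join("MAX(CASE WHEN name = ? THEN value END)" for _ in names)
    sql = (f"SELECT entity_id, {cols} FROM attribute "
           "GROUP BY entity_id ORDER BY entity_id")
    return conn.execute(sql, list(names)).fetchall()

set_attr(1, "color", "red")
set_attr(1, "wheels", 4)
set_attr(1, "weight", 12.5)
set_attr(2, "color", "blue")
rows = report(["color", "wheels"])
```

The pivot in `report` is the part that tends to get unwieldy as the number of custom fields grows.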
Create a table with your known columns and "X" sparse columns using a sequential name such as DataColumn0001, DataColumn0002, etc. When there is a definition for a new column just rename a column and start inserting data. The great advantage to the sparse column is it is indexable.
More info at this link.
What you're doing is STUPID with a database that doesn't support your data type. You should work with a medium that meets your needs, which includes NoSQL databases such as RavenDB, MongoDB, DocumentDB or Couchbase, or Postgres among RDBMSs, to name several.
You are inherently using the tool in a capacity it was not designed for, and one in which it specifically tries to keep you from succeeding. NoSQL database solutions frequently use JSON as an underlying storage because JSON is inherently schemaless. Want to add a property? Sure, go ahead. Want to add a whole sub collection? Sure, go ahead. NoSQL databases were, in part, created specifically to remove the rigid schema requirements of an RDBMS.
2015 Edit: Postgres now natively supports JSON. This is a viable option for RDBMS. My answer is still correct that you need to use the correct tool for the problem. It is a polyglot persistence world.

(LINQ-To-SQL) Creating classes first, database table second, how to auto-connect the two?

I'm creating a data model first using the LINQ-To-SQL graphical designer by using right-click->Add->Class. My idea is that I'll set up everything first using test repositories, design the entire website, then as a final step, create a database using the LINQ-To-SQL classes as a model for the database tables and relationships. My reasoning is that it's easy to edit the classes, but hard to modify DB tables (especially if there's already data in them), so by doing the database part last, it becomes much easier to design the structure.
My question is, is there an automatic way to link the two once I have the DB tables created? I know you can manually fill out the class properties for the LINQ-To-SQL entities but this is pretty cumbersome if you have a lot of tables to deal with. The other option is to delete your manually-created classes and drag the tables from the database into the designer to auto-generate the classes, but I'm not sure if this is the best way of doing it.
Linq to Sql is intended to be a relatively thin ORM layer over a database. While you can of course just add properties to a data context and use them as a sort of mock, you are correct, it isn't really easy to work with.
Instead of relying solely on Linq to Sql generated classes to give you freedom from the database implementation, you may want to look into the repository design pattern. It allows you to have a smooth separation between your database, domain model, and your middle tier; I have used it on two projects now, and have been able to (for the most part) build everything top-down, leaving the actual database for last. Below is a link to a good tutorial on the pattern (better than I could scribble down here).
https://web.archive.org/web/20110503184234/http://blogs.hibernatingrhinos.com/nhibernate/archive/2008/10/08/the-repository-pattern.aspx
Depending on your database permissions, you may call your datacontext's DeleteDatabase() and CreateDatabase() methods as an ungraceful way of resyncing your classes and tables. This is not much of an option when you have actual data in the database, but does work when you are in your development stages.
Take a look at my add-in (which you can download from http://www.huagati.com/dbmltools/ , free 45-day trial licenses are also available from the same site).
It can generate SQL-DDL diff scripts with the SQL-DDL statements for updating your database with only the portions that have changed in the L2S model (e.g. add missing columns, missing tables, missing FKs etc), instead of the L2S-out-of-the-box support for recreating the entire db from scratch.
It also supports syncing the other way; updating the model from the database.

Drawbacks to having (potentially) thousands of directories in a server instead of a database?

I'm trying to start using plain text files to store data on a server, rather than storing them all in a big MySQL database. The problem is that I would likely be generating thousands of folders and hundreds of thousands of files (if I ever have to scale).
What are the problems with doing this? Does it get really slow? Is it about the same performance as using a Database?
What I mean:
Instead of having a database that stores a blog table, then has a row that contains "author", "message" and "date" I would instead have:
A folder for the specific post, then *.txt files inside that folder that have "author", "message" and "date" stored in them.
This would be immensely slower to read than a database (file writes all happen at about the same speed--you can't store a write in memory).
Databases are optimized and meant to handle such large amounts of structured data. File systems are not. It would be a mistake to try to replicate a database with a file system. After all, you can index your database columns, but it's tough to index the file system without another tool.
Databases are built for rapid data access and retrieval. File systems are built for data storage. Use the right tool for the job. In this case, it's absolutely a database.
That being said, if you want to create HTML files for the posts and then store those locations in a DB so that you can easily get to them, then that's definitely a good solution (a la Movable Type).
But if you store these things on a file system, how can you find out your latest post? Most prolific author? Most controversial author? All of those things are trivial with a database, and very hard with a file system. Stick with the database, you'll be glad you did.
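For a sense of how small those queries are, here is a sketch against a `blog` table with the columns from the question. It uses SQLite through Python's sqlite3 module rather than MySQL, an assumption made only so it runs on its own; the sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE blog (id INTEGER PRIMARY KEY, author TEXT, message TEXT, date TEXT)")
conn.executemany(
    "INSERT INTO blog (author, message, date) VALUES (?, ?, ?)",
    [
        ("ann", "first", "2010-01-05"),
        ("bob", "hello", "2010-02-11"),
        ("ann", "again", "2010-03-02"),
    ],
)

# Latest post: ISO-8601 date strings sort correctly as plain text.
latest = conn.execute(
    "SELECT author, message FROM blog ORDER BY date DESC LIMIT 1").fetchone()

# Most prolific author: group and count.
most_prolific = conn.execute(
    "SELECT author, COUNT(*) AS n FROM blog "
    "GROUP BY author ORDER BY n DESC LIMIT 1").fetchone()
```

With an index on `date` or `author`, neither query needs to touch every row.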
It really depends:
What is the file size?
What durability requirements do you have?
How many updates do you perform?
What is the file system?
It is not obvious that MySQL would be faster:
I once did such a comparison for small objects, in order to use them as session storage for CppCMS, with one index (key only) and with two indexes (primary key and secondary timeout).
File System:             XFS        ext3
----------------------------------------
Writes/s:                322        20,000
Data Base \ Indexes:     Key Only   Key+Timeout
-----------------------------------------------
Berkeley DB              34,400     1,450
Sqlite No Sync           4,600      3,400
Sqlite Delayed Commit    20,800     11,700
As you can see, a plain ext3 file system was as fast as or faster than Sqlite3 for storing data, because it does not give you the D (durability) of ACID.
On the other hand, a DB gives you many, many important features you probably need, so I would not recommend using files as storage unless you really need to.
Remember, the DB is not always the bottleneck of the system.
Forget about long-winded answers, here are the simplest reasons why storing data in plaintext files is a bad idea:
It's near-impossible to query. How would you sort blog posts by date? You'd have to read all the files and compare their date, or maintain your own index file (basically, write your own database system.)
It's a nightmare to backup. tar cjf won't cut it, and if you try you may end up with an inconsistent snapshot.
There are probably a dozen other good reasons not to use files: it's hard to monitor performance, very hard to debug, near impossible to recover in case of error, there are no tools to handle them, etc...
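The "near-impossible to query" point can be seen in a short sketch of the folder-per-post layout from the question. The `write_post` and `posts_by_date` helpers and the folder names are hypothetical, and dates are assumed to be stored as ISO-8601 text so that string order matches date order.

```python
import tempfile
from pathlib import Path

def write_post(root, post_id, author, message, date):
    """One folder per post, one .txt file per field, as in the question."""
    folder = Path(root) / post_id
    folder.mkdir(parents=True)
    for name, text in (("author", author), ("message", message), ("date", date)):
        (folder / f"{name}.txt").write_text(text, encoding="utf-8")

def posts_by_date(root):
    """Sort posts newest first: every date.txt has to be opened and read."""
    dated = [
        ((folder / "date.txt").read_text(encoding="utf-8"), folder.name)
        for folder in Path(root).iterdir()
        if folder.is_dir()
    ]
    return [post_id for _, post_id in sorted(dated, reverse=True)]

root = tempfile.mkdtemp()
write_post(root, "post-1", "ann", "first", "2010-01-05")
write_post(root, "post-2", "bob", "hello", "2010-02-11")
write_post(root, "post-3", "ann", "again", "2010-03-02")
newest_first = posts_by_date(root)
```

Each call costs one file open per post, so avoiding that means keeping a separate index file up to date, which is the "write your own database system" step.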
I think the key here is that there will be NO indexing on your data. So retrieving anything in, say, a search would be ridiculously slow compared to an indexed database. Also, IO operations are expensive; a database could be (partially) in memory, which makes the data available much faster.
You don't really say why you won't use a database yourself... But in the scenario you are describing I would definitely use a DB over folder any day, for a couple of reasons. First of all, the blog scenario seems very simple but it is very easy to imagine that you, someday, would like to expand it with more functionality such as search, more post details, categories etc.
I think that growing the model would be harder to do in a folder structure than in a DB.
Also, databases are usually MUCH faster than file access due to indexing and memory caching.
IIRC Fudforum used the file-storage for speed reasons, it can be a lot faster to grab a file than to search a DB index, retrieve the data from the DB and send it to the user. You're trading the filesystem interface with the DB and DB-library interfaces.
However, that doesn't mean it will be faster or slower. I think you'll find writing is quicker on the filesystem, but reading faster on the DB for general issues. If, like fudforum, you have relatively immutable data that you want to show several posts in one, then a file-based approach may be a lot faster: eg they don't have to search for every related post, they stick it all in 1 text file and display it once. If you can employ that kind of optimisation, then your file-based approach will work.
Also, mail servers work in the file-based approach too, the Maildir format stores each email message as a file in a directory, not in a database.
One thing I would say though, you'll be better off storing everything in 1 file, not 3. The filesystem is better at reading (and caching) a single file than it is with multiple ones. So if you want to store each message as 3 parts, save them all in a single file, read it to get any of the parts and just display the one you want to show.
...and then you want to search all posts by an author and you get to read a million files instead of a simple SQL query...
Databases are NOT faster. Think about it: In the end they store the data in the filesystem as well. So the question of whether a database is faster depends strongly on the access path.
If you have only one access path, which correlates with your file structure, the file system might be way faster than a database. Just make sure you have some caching available for the filesystem.
Of course you do lose all the nice things of a database:
- transactions
- flexible ways to index data, and therefore access data in a flexible way reasonably fast.
- flexible (though ugly) query language
- high recoverability.
The scaling really depends on the filesystem used. AFAIK most file systems have some kind of upper limit on the number of files (in total or per directory), though on the newer ones this is often very high. For hundreds of thousands of files, with some directory structure to keep directories to a reasonable size, it should be possible to find a well-performing file system.
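One common way to keep directories to a reasonable size is to derive a nested path from a hash of the key. The sketch below is only an illustration under that assumption: the `shard_path`, `store` and `fetch` helpers are hypothetical names, and two levels of two hex characters give at most 256 x 256 = 65,536 leaf directories.

```python
import hashlib
import tempfile
from pathlib import Path

def shard_path(root, key):
    """Map a key to root/ab/cd/<digest>.txt using the first hex digits of its hash."""
    digest = hashlib.sha1(key.encode("utf-8")).hexdigest()
    return Path(root) / digest[:2] / digest[2:4] / f"{digest}.txt"

def store(root, key, text):
    path = shard_path(root, key)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(text, encoding="utf-8")
    return path

def fetch(root, key):
    """Direct access: the location is computed, nothing is searched."""
    return shard_path(root, key).read_text(encoding="utf-8")

root = tempfile.mkdtemp()
saved = store(root, "post-12345", "hello world")
content = fetch(root, "post-12345")
```

This only helps when the key is known in advance; any other access path still needs an index of some kind.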
@Eric's comment:
It depends on what you need. If you only need the content of exactly one file per query, and you can determine the location and name of the file in a deterministic way, direct access is faster than what a database does, which is roughly:
access a bunch of index entries, in order to
access a bunch of table rows (rdbms typically read blocks that contain multiple rows), in order to
pick a single row from the block.
If you look at it: you have indexes and additional rows in memory, which make your caching inefficient, so where is the speedup of a db supposed to come from?
Databases are great for the general case. But if you have a special case, there is almost always a special solution that is better in some sense.
If you would prefer to move away from an RDBMS, why don't you try one of the open source key-value or document DBs (non-relational DBs)?
From your post I understand that you are not going to rely on any of the ACID properties of a relational DB. It would be better to adopt one of the key-value DBs (MongoDB, CouchDB or Hypertable) instead of your own file system implementation; it will give better performance than the existing approaches.
Note: I am not an expert in this either. I have just started working with MongoDB and find it useful in similar scenarios, and wanted to share in case you are not aware of these approaches.

Resources