Could you make a case for using Berkeley DB XML

I'm trying to read through the documentation on Berkeley DB XML, and I think what I could really use is a developer's blog post or synopsis describing a problem for which the XML layer atop Berkeley DB turned out to be the exact prescription.
Maybe I'm not getting it, but it seems like they're both in-process DBs, and ultimately you will parse your XML into objects or data, so why not start by storing your data parsed, rather than as XML?

Ultimately I want my data stored in some reasonable format.
If that data starts as XML and I want to retrieve it using XQuery, then without the XML layer I have to write a lot of code to do the XQuery myself, and, perhaps even worse, I have to know my XML well enough to design a reasonable storage scheme for it.
Conversely, so long as the performance of the system allows, I can forget about that part of the back end and worry only about the layers from the XML document up (i.e., toward the user), leaving the rest as a black box. It gives me the Berkeley DB storage goodness, but I get to use it from a document-centric perspective.

Related

What serialization format should we use to store serialized objects in a SQL Server database

We are developing a customized caching solution that will use a SQL Server database to store cached objects. The hosting environment of the application does not provide an "in-memory" cache such as memcached or AppFabric, so we must use a SQL Server database.
While most of the cached objects will be simple types (int, string, dates, etc) we will need to also store more complex types such as DataSets, DataTables, generic collections and custom classes.
I have very little experience with .NET's native serialization and deserialization, but I figure we will have to serialize the objects into some form (binary, XML, JSON, etc.) to store them in the database and then deserialize them when we pull them out. I would like some expert opinions on what that "some form" should be.
We are using JSON.NET to serialize data into JSON for various AJAX requests. My initial thought was to serialize the cached data into JSON to store it in the database. However, I wanted to get a few opinions as to what would be best for performance and data integrity.
All three of the serialization options you mentioned (binary, JSON, or XML) are valid choices for a serialization format. There are many other serialization formats, but the three you mentioned are the most common. As to choosing between the three, here are some of the considerations:
If you store your data in a binary format in the database, it is not human readable if you ever want to look at it in SQL Server Management Studio or a text editor. You would have to write some sort of deserialization tool if you wanted to manually peruse the data.
Binary format will likely produce the smallest serialized objects, followed by JSON, with XML being the largest. The actual size differences will vary with your data structures.
As far as performance goes, binary serialization may be faster than JSON or XML. However, you would have to benchmark this with your data to see what the differences are.
I think there are excellent .NET libraries and BCL support for all three format types, so any choice should be doable.
So your choice would depend upon which factors are most important to you: CPU utilization, disk storage space, human readability, and/or personal preference.
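To illustrate how you might benchmark this yourself, here is a sketch in Go (the language used elsewhere on this page; the thread itself is .NET, and CachedItem is a made-up type). It serializes one value with the standard binary (gob), JSON, and XML encoders and compares the byte counts. Note that gob carries type metadata, so for very small values the binary form may not actually win; as noted above, sizes vary with your data structures.

// sizecompare.go - compare serialized sizes of one value in a binary
// format (gob), JSON, and XML. This only illustrates how you might
// measure the formats yourself; the numbers depend on your data.
package main

import (
    "bytes"
    "encoding/gob"
    "encoding/json"
    "encoding/xml"
    "fmt"
)

// CachedItem is a hypothetical cached object.
type CachedItem struct {
    ID   int
    Name string
    Tags []string
}

func main() {
    item := CachedItem{ID: 42, Name: "example", Tags: []string{"a", "b", "c"}}

    var bin bytes.Buffer
    if err := gob.NewEncoder(&bin).Encode(item); err != nil {
        panic(err)
    }
    j, _ := json.Marshal(item) // errors ignored for brevity in this sketch
    x, _ := xml.Marshal(item)

    fmt.Printf("binary (gob): %d bytes, JSON: %d bytes, XML: %d bytes\n",
        bin.Len(), len(j), len(x))
}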
We have used JSON extensively for serialization of our objects for storage in a database, using JSON.Net, and we like it a lot. It is handy sometimes to manually view the data via SSMS, and JSON is significantly more compact for our data than XML.
I won't repeat Joe's answer, as he is dead on. I want to add that binary serialization does increase the complexity if you upgrade the classes. It is manageable, but it takes a little work and requires you to dig into the binary serializer. With a text-based approach, by contrast, you could migrate the data using other options (with XML, for example, you could run XSLTs on it).
A cache must be small and fast, so I'd like to be more specific about what to use.
I suggest protobuf-net, which is the same thing SO uses. I use it, and the speed together with the size is really good; at least in my tests it was the smallest and fastest.
We use it for the same reason (for caching). After trying other serialization libraries, this one gave the fastest and smallest results. In a caching scheme you do not actually need to inspect what is inside by eye, because it is not configuration that you would hand-edit.
If you want to see what is in a cached object, you can write a simple function that prints it.

SQL Server database or XML, what to choose for asp.net app?

I have a database with about 10,000 records. Each record has one text field (200 chars or so) and about 30 numeric fields.
The asp.net app only does some searching and sorting and displaying data in a grid. No data sharing between users, read only operations (no database updating), very little calculation.
Should I go with an XML file and use Linq or should I use an SQL Server database? Please give me explanations of your choice.
Should I go with an XML file and use Linq or should I use an SQL Server database?
TOTAL non issue - SQL.
The asp.net app only does some searching and sorting
Read up in a beginner SQL book on what an "INDEX" is. XML files have none, so SQL databases are a lot more efficient at sorting and filtering.
It really depends on your needs. Ask yourself the following questions:
Is your data set going to increase?
Is speed one of the most important requirements of your app?
Are you going to run complex queries?
Is the schema of your data going to change?
If the answer to most of the questions above is 'no', then feel free to use XML. SQL provides lots of features and is mainly intended for data storage and retrieval, while XML can store data too, but its main use is data interoperability and exchange.
If your data set will grow, then SQL should be the choice, because you can create indexes on your data, which will speed up retrieval; files are usually read serially and are thus slower for ad-hoc searches.
I think you'll find SQL to be much easier to develop and maintain. XML is great in some scenarios, but I've found it often presents a steady stream of headaches in the long term.
From a performance perspective alone, it's hard to say which approach would be better without knowing the details of your queries and schema. In general, though, SQL tends to win, since it's built for searching and sorting, where XML is not.
For a readonly dataset of that size, you might be able to read the whole thing into memory on the web server side, and do your searching and sorting there.
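To make that last suggestion concrete, here is a minimal sketch of the load-into-memory approach. It is in Go (the language used elsewhere on this page, though the thread is about ASP.NET), and Record and its fields are invented for illustration: read all the records once, then filter and sort them with plain in-memory operations.

// inmem.go - filter and sort a small read-only dataset entirely in memory.
package main

import (
    "fmt"
    "sort"
    "strings"
)

// Record stands in for one row: a short text field plus a numeric field.
type Record struct {
    Text  string
    Score float64
}

func main() {
    // In a real app these would be loaded once at startup.
    records := []Record{
        {"alpha widget", 3.2},
        {"beta widget", 1.7},
        {"gamma gadget", 9.9},
    }

    // Search: keep the records whose text contains a term.
    var hits []Record
    for _, r := range records {
        if strings.Contains(r.Text, "widget") {
            hits = append(hits, r)
        }
    }

    // Sort the hits by a numeric field.
    sort.Slice(hits, func(i, j int) bool { return hits[i].Score < hits[j].Score })
    fmt.Println(hits)
}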

Store map key/values in a persistent file

I will be creating a structure more or less of the form:
type FileState struct {
    LastModified int64
    Hash         string
    Path         string
}
I want to write these values to a file and read them in on subsequent calls. My initial plan is to read them into a map and lookup values (Hash and LastModified) using the key (Path). Is there a slick way of doing this in Go?
If not, what file format can you recommend? I have read about and experimented with some key/value file stores in previous projects, but not using Go. Right now my requirements are probably fairly simple, so a big database server system would be overkill. I just want something I can write to and read from quickly, easily, and portably (Windows, Mac, Linux). Because I have to deploy on multiple platforms, I am trying to keep my non-Go dependencies to a minimum.
I've considered XML, CSV, JSON. I've briefly looked at the gob package in Go and noticed a BSON package on the Go package dashboard, but I'm not sure if those apply.
My primary goal here is to get up and running quickly, which means the least amount of code I need to write along with ease of deployment.
As long as your entire data set fits in memory, you shouldn't have a problem. Using an in-memory map and writing snapshots to disk regularly (e.g. by using the gob package) is a good idea. The Practical Go Programming talk by Andrew Gerrand uses this technique.
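Here is a minimal sketch of that snapshot technique for the FileState map from the question (the file name state.gob is just an example):

// snapshot.go - keep the data in an in-memory map keyed by Path and
// save/load the whole map with encoding/gob.
package main

import (
    "encoding/gob"
    "os"
)

// FileState as defined in the question.
type FileState struct {
    LastModified int64
    Hash         string
    Path         string
}

// save writes a snapshot of the whole map to disk.
func save(path string, m map[string]FileState) error {
    f, err := os.Create(path)
    if err != nil {
        return err
    }
    defer f.Close()
    return gob.NewEncoder(f).Encode(m)
}

// load reads a snapshot back into a map.
func load(path string) (map[string]FileState, error) {
    f, err := os.Open(path)
    if err != nil {
        return nil, err
    }
    defer f.Close()
    m := make(map[string]FileState)
    err = gob.NewDecoder(f).Decode(&m)
    return m, err
}

func main() {
    m := map[string]FileState{
        "/tmp/a.txt": {LastModified: 1700000000, Hash: "abc123", Path: "/tmp/a.txt"},
    }
    if err := save("state.gob", m); err != nil {
        panic(err)
    }
    loaded, err := load("state.gob")
    if err != nil {
        panic(err)
    }
    _ = loaded["/tmp/a.txt"] // lookup by the Path key
}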
If you need to access those files from different programs, using a popular encoding like JSON or CSV is probably a good idea. If you only have to access those files from within Go, I would use the excellent gob package, which has a lot of nice features.
As soon as your data becomes bigger, it's not a good idea to write the whole database to disk on every change, and your data might not fit into RAM anymore. In that case, you might want to take a look at the leveldb key-value database package by Nigel Tao, another Go developer. It's currently under active development (and not yet usable), but it will also offer some advanced features like transactions and automatic compression, and the read/write throughput should be quite good because of the leveldb design.
There's an ordered key-value persistence library for Go that I wrote, called gkvlite:
https://github.com/steveyen/gkvlite
JSON is very simple but makes bigger files because of the repeated variable names. XML has no advantage here. You should go with CSV, which is really simple too. Your program will be less than a page of code.
But it depends, in fact, upon your modifications. If you make a lot of modifications and must have them stored synchronously on disk, you may need something a little more complex than a single file. If your map is mainly read-only, or if you can afford to dump it to file rarely (not every second), a single CSV file alongside an in-memory map will keep things simple and efficient.
BTW, use Go's encoding/csv package to do this.
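A sketch of that CSV variant, reusing the FileState struct from the question (one row per entry: Path, Hash, LastModified; the file name state.csv is just an example):

// csvstore.go - persist the map as a CSV file and read it back.
package main

import (
    "encoding/csv"
    "os"
    "strconv"
)

// FileState as defined in the question.
type FileState struct {
    LastModified int64
    Hash         string
    Path         string
}

// saveCSV writes one row per map entry.
func saveCSV(path string, m map[string]FileState) error {
    f, err := os.Create(path)
    if err != nil {
        return err
    }
    defer f.Close()
    w := csv.NewWriter(f)
    for _, fs := range m {
        w.Write([]string{fs.Path, fs.Hash, strconv.FormatInt(fs.LastModified, 10)})
    }
    w.Flush()
    return w.Error()
}

// loadCSV reads the rows back into a map keyed by Path.
func loadCSV(path string) (map[string]FileState, error) {
    f, err := os.Open(path)
    if err != nil {
        return nil, err
    }
    defer f.Close()
    rows, err := csv.NewReader(f).ReadAll()
    if err != nil {
        return nil, err
    }
    m := make(map[string]FileState, len(rows))
    for _, row := range rows {
        t, err := strconv.ParseInt(row[2], 10, 64)
        if err != nil {
            return nil, err
        }
        m[row[0]] = FileState{Path: row[0], Hash: row[1], LastModified: t}
    }
    return m, nil
}

func main() {
    m := map[string]FileState{
        "/tmp/a.txt": {LastModified: 1700000000, Hash: "abc123", Path: "/tmp/a.txt"},
    }
    if err := saveCSV("state.csv", m); err != nil {
        panic(err)
    }
    if _, err := loadCSV("state.csv"); err != nil {
        panic(err)
    }
}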

Which is better caching data with a file or a sqlite database?

I am caching data in an application I am currently writing, and I was wondering which would be better for holding the cached data: a regular text file or an SQLite database? Thanks.
EDIT:
I am using Zend_Cache, so relationships are handled without the need for a database. What I am caching is XML strings that, if saved as regular files, can be as big as 60 KB.
Depends on the kind of data you're planning on storing.
If the data is related, and you'd want to retrieve the data based on those relationships...then use SQLite.
If the data is completely unrelated and you're just looking for a way to store and retrieve plain text...then use a plain text file. It won't have the added overhead of calling into SQLite.
If you don't need to persist the data in a file for any reason, and the data lends itself to key/value pair storage...you should look into something like memcached so you don't have to deal with file IO.
If you really CACHE data (not store or transfer it), it may be better to use a cache solution like memcached.
60k is so small you probably wouldn't even bother caching the data, just keep it in memory.
The only reason to write it out to disk is if you want to persist the data between sessions. In that case, using text/XML or SQLite won't make much difference.
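For what "just keep it in memory" looks like, here is a tiny sketch (in Go, for consistency with the code elsewhere on this page, though the question's Zend_Cache context is PHP): a map guarded by a read/write mutex.

// memcache.go - minimal in-memory cache: a map behind an RWMutex.
package main

import (
    "fmt"
    "sync"
)

type Cache struct {
    mu sync.RWMutex
    m  map[string]string
}

func NewCache() *Cache {
    return &Cache{m: make(map[string]string)}
}

// Get returns the cached value and whether it was present.
func (c *Cache) Get(key string) (string, bool) {
    c.mu.RLock()
    defer c.mu.RUnlock()
    v, ok := c.m[key]
    return v, ok
}

// Set stores or replaces a value.
func (c *Cache) Set(key, value string) {
    c.mu.Lock()
    defer c.mu.Unlock()
    c.m[key] = value
}

func main() {
    c := NewCache()
    c.Set("page:1", "<xml>...</xml>") // e.g. a cached XML string
    if v, ok := c.Get("page:1"); ok {
        fmt.Println(len(v), "bytes cached")
    }
}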

Saving MFC Model as SQLite database

I am playing with a CAD application using MFC. I was thinking it would be nice to save the document (model) as an SQLite database.
Advantages:
I avoid file format changes (SQLite takes care of that)
Free query engine
Undo stack is simplified (table name, column name, new value, and so on...)
Opinions?
This is a fine idea. Sqlite is very pleasant to work with!
But remember the old truism (I can't get an authoritative answer from Google about where it originally comes from) that storing your data in a relational database is like parking your car by driving it into the garage, disassembling it, and putting each piece into a labeled cabinet.
Geometric data, consisting of points and lines and segments that refer to each other by name, is a good candidate for storing in database tables. But once you have composite objects with a hierarchy of subcomponents, it might require a lot less code just to use serialization and store/load the model with a single call.
So that would be a fine idea too.
But serialization in MFC is not nearly as much of a win as it is in, say, C#, so on balance I would go ahead and use SQL.
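To make the table side of that concrete, here is a sketch of points and segments as SQLite tables. It is in Go (the language used elsewhere on this page) rather than MFC/C++, it assumes the third-party github.com/mattn/go-sqlite3 driver, and the table layout is invented for illustration.

// geom.go - geometric model stored in SQLite: named points, and segments
// that refer to points by name.
package main

import (
    "database/sql"

    _ "github.com/mattn/go-sqlite3" // registers the "sqlite3" driver
)

func main() {
    db, err := sql.Open("sqlite3", "model.db")
    if err != nil {
        panic(err)
    }
    defer db.Close()

    stmts := []string{
        `CREATE TABLE IF NOT EXISTS point (
            name TEXT PRIMARY KEY, x REAL, y REAL)`,
        `CREATE TABLE IF NOT EXISTS segment (
            name TEXT PRIMARY KEY,
            a TEXT REFERENCES point(name),
            b TEXT REFERENCES point(name))`,
        `INSERT OR REPLACE INTO point VALUES ('p1', 0, 0), ('p2', 10, 5)`,
        `INSERT OR REPLACE INTO segment VALUES ('s1', 'p1', 'p2')`,
    }
    for _, s := range stmts {
        if _, err := db.Exec(s); err != nil {
            panic(err)
        }
    }
}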
This is a great idea but before you start I have a few recommendations:
Make sure each database is uniquely identifiable in some way besides its file name, such as by having a table inside the database that describes the file.
Take a look at some of the MFC-based examples and wrappers already available before creating your own. The ones I have seen each borrowed from the others to create a better result. Google: MFC SQLite Wrapper.
Using an SQLite database is also useful for maintaining state. Think ahead about how you would manage that, keeping in mind which features SQLite includes and which it lacks.
You can also think now about how you might extend your application to the web, by making sure your database table structure is easily exportable to other SQL database systems, as well as easy enough to hook into a backup system.
