Which one to use? EAV or Blobs in the database? - asp.net

I am currently working to rework the data system of our application. Basically, it is designed so that people can add all the custom fields they want, with only a few constant/always-there fields.
Our current design is giving us plenty of maintenance problems. What we do is dynamically(at runtime) add a column to the database for each field. We have to have a meta table and other cruft to maintain all of these dynamic columns.
Now we are looking at EAV, but it doesn't seem much better. Basically, we have many different types of fields, so there would be a StringValues, IntegerValues, etc table... which makes things that much worse.
I am wondering if using JSON or XML blobs in the database may be a better solution, specifically because in most use cases, when we retrieve anything out of these tables, we need the entire row. The problems is that we need to be able to create reports for this data as well.. No solution really makes custom queries look easy. And searching across such a blob database will surely be a performance nightmare when reports are ran.
Each "row" needs to have anywhere from about 15 to 100(possibly more) attributes/columns associated with it.
We are using SQL Server 2008 and our application interfacing with the database is a C# web application(so, ASP.Net).
what do you think? Use EAV or blobs or something else entirely? (Also, yes, I know a schema free database like MongoDB would be awesome here, but I can't convince my boss to use it)

What about the xml datatype? Advanced querying is possible against this type.
We've used the xml type with good success. We do most of our heavy lifting at the code level using linq to parse out values. Our schema is somewhat fixed, so that may not be an option for you.

One interesting feature of SQL server is the sql_variant type. It's fully supported in .NET and quite easy to use. The advantages is you don't need to create StringValue, IntValue, etc... columns, just one Value column that can contain all the simple types.
This very specific type favors the EAV option, IMHO.
It has some drawbacks though (sorting, distinct selects, etc...). So if you want to use it, make sure you read all the documentation and understand its limit.

Create a table with your known columns and "X" sparse columns using a sequential name such as DataColumn0001, DataColumn0002, etc. When there is a definition for a new column just rename a column and start inserting data. The great advantage to the sparse column is it is indexable.
More info at this link.

What you're doing is STUPID with a database that doesn't support your data type. You should work with a medium that meets your needs which include NoSQL databases such as RavenDB, MongoDB, DocumentDB, CouchBase or Postgres in RDMBS to name several.
You are inherently using the tool in a capacity it was neither designed for, and one it specifically attempts to limit you from achieving success. NoSQL database solutions frequently use JSON as an underlying storage because JSON is inherently schemaless. Want to add a property? Sure go ahead, want to add a whole sub collection? Sure go ahead. NoSQL databases were in part, created specifically to remove rigid schema requirements of RDBMS.
2015 Edit: Postgres now natively supports JSON. This is a viable option for RDBMS. My answer is still correct that you need to use the correct tool for the problem. It is a polygot persistence world.

Related

Meteor Approach to Collections Considering Join Aren't Supported by Core Yet?

I'm creating my main project functionality right now so it's kind of a big decision to make in my project, I want efficient & scalable solution. I use different API's to fetch users products ultimately for 1 collection to display products information inside a table with possible merge by SKU TITLE from different sources.
I have thought of 2 approaches (In both approaches we add Meteor.userId() to collection insert so each users has it's own products:
1) to create each API it's own collection and fetch the products to it, after or in middle of the API query where I insert it to sourceXProducts also add the logic of merge products by sku and add it to main usersProducts Only the fields I need, and we have the collection of the sourceXproducts if we ever need anything we didn't really include to main usersProducts we can query it and get it so we basically keep all the information possible (because it can come handy)
source1Products = new Meteor.Collection('source1Products');
source2Products = new Meteor.Collection('source2Products');
usersProducts = new Meteor.Collection('usersProducts');
Pros: Honestly I'm not sure, It makes it organized also the way I learned Meteor it seems to be used a lot.
Cons: Meteor collection joins is not supported in core yet, So I have to use a meteor package such as: meteor-publish-composite which seems good but this way might hit performance
2) Create 1 collection and just insert everything the API resonse has and additional apiSource field so we can choose products from X user X api.
usersProducts = new Meteor.Collection('usersProducts');
Pros: No joins, possibly better performance
Cons: Not organized, It can become a large collection maybe it's not good for mongodb
3) Your ideas? :)
First, you should improve the question. You do not tell us anything precise about your schema. What are the entities you have and what type of relations are there and what type of joins do you think you will be doing. How often you will be doing them?
Second, you should rethink your schema and think in the terms of a non-relational database. I see many people coming from SQL world and then they simply design their schema in the same way. Wrong. MongoDB is not SQL and things you learned there you should not try to just reuse here. You should start using features like subdocuments and arrays which can help you solve many basic things you would do in SQL with joins. So, knowing your schema would help us help you design the schema. For example, see this answer and especially the comments for the discussion for a similar type of question you are asking here.
Third, have you evaluated various solutions which exist out there? There are many, but you have not shown us that you tried any of them and how it worked for you. What were pros and cons of them, for you and your project?
Fourth, if you are lazy to evaluate, you can just use peerlibrary:peerdb and peerlibrary:related. They are simply perfect. You should trust me. I am their author.

SQL Server database or XML, what to choose for asp.net app?

I have a database with about 10,000 records. Each record has one text field (200 chars or so) and about 30 numeric fields.
The asp.net app only does some searching and sorting and displaying data in a grid. No data sharing between users, read only operations (no database updating), very little calculation.
Should I go with an XML file and use Linq or should I use an SQL Server database? Please give me explanations of your choice.
Should I go with an XML file and use Linq or should I use an SQL Server database?
TOTAL non issue - SQL.
The asp.net app only does some searching and sorting
Read up on a beginner SQL book what an "INDEX" is. XML files have none - so SQL databases are a lot more efficient with sorting and filtering.
It really depends on your needs, Ask your self following questions.
Is your data set going to increase?
Is speed one of the most desired thing of your app?
Are you going to run complex queries?
Is the schema of your data going to change?
If answer to most of the questions above is 'no' then feel free to use XML. Sql provides lot's of features and mainly intended for data storage and retrieval, while with XML you can store data but I would say its main usage is data interoperability and exchange
If your data set increases then SQL should be a choice because you can create indexes on your dataset which will increase speed of retrieval for the data, files are usually read serially and thus are slower for ad-hoc data search.
I think you'll find SQL to be much easier to develop and maintain. XML is great in some scenarios, but I've found it often presents a steady stream of headaches in the long term.
From a performance perspective alone, it's hard to say which approach would be better without knowing the details of your queries and schema. In general, though, SQL tends to win, since it's built for searching and sorting, where XML is not.
For a readonly dataset of that size, you might be able to read the whole thing into memory on the web server side, and do your searching and sorting there.

How do I do this in Drupal?

Im currently evaluating Drupal to see if we can use it to replace our framework. My problem is I have this legacy tables which I would want to try to reflect in Drupal. It involves a join table. There's quite a lot of this kind of relationship in our existing web app so I am looking for possible ways to solve it.
Thank you for your insight!
There are several ways to do this, and it's hard to know which is best with no context about what you're actually doing with the data, but here are some options:
One way to do this is to make a content type representing each table (using CCK) with the foreign keys represented by type-specific node reference fields. Doing everything as nodes gives you a bunch of prebuilt functionality around nodes, but has a bit of overhead you may want to avoid.
Another option is to leave your database just like it is now. Drupal can do direct database queries, or you can use Data to expose your tables to Views.
Another option, if those referenced tables really only have 1 non-ID field, is to do the project_companies_assignments as nodes and do the other 3 as taxonomies. But this won't work if those are really more complex entities, and wouldn't be very flexible if they might become more complex.
What about using hook_views_api and exposing your legacy tables in hook_views_data? i tried something like this myself - not sure if that is what you want...
try and let me know if that works for you.
http://drupalwalla.blogspot.com/2011/09/how-do-you-expose-your-legacy-database.html
Going with Views and CCK, optionally with the additional Data module has one huge disadvantage: it comes with complexity.
My preferred alternative, is to write your own module. Drupal offers little help wrt database abstraction, it comes not with a proper ORM or such. But with some simple CRUD functions for the data in the database, a few simple forms in front, and a menu-callback with some pages to present the data, you can -quite often- get your datamodel worked out much faster then going the route of the overly complex, often poorly documented CCK and views modules. KISS.

(LINQ-To-SQL) Creating classes first, database table second, how to auto-connect the two?

I'm creating a data model first using the LINQ-To-SQL graphical designer by using right-click->Add->Class. My idea is that I'll set up everything first using test repositories, design the entire website, then as a final step, create a database using the LINQ-To-SQL classes as a model for the database tables and relationships. My reasoning is that it's easy to edit the classes, but hard to modify DB tables (especially if there's already data in them), so by doing the database part last, it becomes much easier to design the structure.
My question is, is there an automatic way to link the two once I have the DB tables created? I know you can manually fill out the class properties for the LINQ-To-SQL entities but this is pretty cumbersome if you have a lot of tables to deal with. The other option is to delete your manually-created classes and drag the tables from the database into the designer to auto-generate the classes, but I'm not sure if this is the best way of doing it.
Linq to Sql is intended to be a relatively thin ORM layer over a database. While you can of course just add properties to a data context and use them as a sort of mock, you are correct, it isn't really easy to work with.
Instead of relying solely on Linq to Sql generated classes to give you freedom from the database implementation, you may want to look into the repository design pattern. It allows you to have a smooth separation between your database, domain model, and your middle tier; I have used it on two projects now, and have been able to (for the most part) build everything top-down, leaving the actual database for last. Below is a link to a good tutorial on the pattern (better than I could scribble down here).
https://web.archive.org/web/20110503184234/http://blogs.hibernatingrhinos.com/nhibernate/archive/2008/10/08/the-repository-pattern.aspx
Depending on your database permissions, you may call your datacontext's DeleteDatabase() and CreateDatabase() methods as a ungraceful way of resyncing your classes and tables. This is not much of an option when you have actual data in the database, but does work when you are in your development stages.
Take a look at my add-in (which you can download from http://www.huagati.com/dbmltools/ , free 45-day trial licenses are also available from the same site).
It can generate SQL-DDL diff scripts with the SQL-DDL statements for updating your database with only the portions that has changed in the L2S model (e.g. add missing columns, missing tables, missing FKs etc), instead of the L2S-out-of-the-box support for recreating the entire db from scratch.
It also supports syncing the other way; updating the model from the database.

Saving MFC Model as SQLite database

I am playing with a CAD application using MFC. I was thinking it would be nice to save the document (model) as an SQLite database.
Advantages:
I avoid file format changes (SQLite takes care of that)
Free query engine
Undo stack is simplified (table name, column name, new value
and so on...)
Opinions?
This is a fine idea. Sqlite is very pleasant to work with!
But remember the old truism (I can't get an authoritative answer from Google about where it originally is from) that storing your data in a relational database is like parking your car by driving it into the garage, disassembling it, and putting each piece into a labeled cabinet.
Geometric data, consisting of points and lines and segments that refer to each other by name, is a good candidate for storing in database tables. But when you start having composite objects, with a heirarchy of subcomponents, it might require a lot less code just to use serialization and store/load the model with a single call.
So that would be a fine idea too.
But serialization in MFC is not nearly as much of a win as it is in, say, C#, so on balance I would go ahead and use SQL.
This is a great idea but before you start I have a few recommendations:
Be careful that each database is uniquely identifiable in some way besides file name such as having a table that describes the file within the database.
Take a look at some of the MFC based examples and wrappers already available before creating your own. The ones I have seen had borrowed on each to create a better result. Google: MFC SQLite Wrapper.
Using SQLite database is also useful for maintaining state. Think ahead about how you would manage keeping in mind what features are included and are missing in SQLite.
You can also think now about how you may extend your application to the web by making sure your database table structure is easily exportable to other SQL database systems- as well as easy enough to extend to a backup system.

Resources