Database structure of a triple store? - symfony

I want to use RDF/triples in my Symfony2 project to organize things (in my case, tags).
I imagine something like this:
ENTITY TAG <-------------- TAG_TAG --------------> ASSOCIATION_TYPE
        ^                     |
        |---------------------/
Fields:
TAG
    ID
    Tag (text)
    Description (text/html)
TAG_TAG
    ID
    *TAG1
    *TAG2
    *ASSOCIATION_TYPE
    ASSOCIATION_PARAM
This way, I would be able to:
Store triple associations.
Set different association types. For example, PHP is a Programming_language; stackoverflow.com is a website; but the Earth turns around the Sun.
Set parameters (to give more information about an association).
We could also consider a many-to-many relation between TAG_TAG and ASSOCIATION_TYPE, which would let us set several parameters.
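For reference, the TAG / TAG_TAG / ASSOCIATION_TYPE model above can be sketched in three tables. This is a minimal sketch using Python's built-in sqlite3 module rather than Symfony/Doctrine; the ASSOCIATION_TYPE columns and the predicate names (is_a, turns_around) are assumptions. It only illustrates the modeling question and is not a substitute for a real triple store, which the answers below recommend.

```python
import sqlite3

# Hypothetical SQLite version of the question's model. Table/column names
# follow the question; the association_type columns are an assumption.
SCHEMA = """
CREATE TABLE tag (
    id          INTEGER PRIMARY KEY,
    tag         TEXT NOT NULL UNIQUE,
    description TEXT
);
CREATE TABLE association_type (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE tag_tag (
    id                INTEGER PRIMARY KEY,
    tag1              INTEGER NOT NULL REFERENCES tag(id),
    tag2              INTEGER NOT NULL REFERENCES tag(id),
    association_type  INTEGER NOT NULL REFERENCES association_type(id),
    association_param TEXT
);
"""


def add_triple(conn, subject, predicate, obj, param=None):
    """Store (subject, predicate, object), creating tags and types as needed."""
    for name in (subject, obj):
        conn.execute("INSERT OR IGNORE INTO tag (tag) VALUES (?)", (name,))
    conn.execute("INSERT OR IGNORE INTO association_type (name) VALUES (?)", (predicate,))
    conn.execute(
        """INSERT INTO tag_tag (tag1, tag2, association_type, association_param)
           VALUES ((SELECT id FROM tag WHERE tag = ?),
                   (SELECT id FROM tag WHERE tag = ?),
                   (SELECT id FROM association_type WHERE name = ?), ?)""",
        (subject, obj, predicate, param),
    )


def objects_of(conn, subject, predicate):
    """Return every object tag linked to `subject` by `predicate`."""
    rows = conn.execute(
        """SELECT t2.tag FROM tag_tag tt
           JOIN tag t1 ON t1.id = tt.tag1
           JOIN tag t2 ON t2.id = tt.tag2
           JOIN association_type a ON a.id = tt.association_type
           WHERE t1.tag = ? AND a.name = ?
           ORDER BY t2.tag""",
        (subject, predicate),
    )
    return [r[0] for r in rows]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
add_triple(conn, "PHP", "is_a", "Programming_language")
add_triple(conn, "stackoverflow.com", "is_a", "website")
add_triple(conn, "Earth", "turns_around", "Sun")
print(objects_of(conn, "PHP", "is_a"))  # ['Programming_language']
```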
So I have several questions:
1) Do you think this is a good way to store triples efficiently?
2) Is there an RDF layer that can extract existing RDF/triple databases and populate my own?
3) Should I consider using a triple store like Sesame with Symfony?

To answer your questions:
1) I'm not entirely sure what you're asking. If you're asking whether that's a reasonable way to model your data, it's probably OK, but your diagram is not clear and you're a bit light on details. The best thing to do is to start with something that works. You can improve the modeling later without much hassle.
If you're asking about storing the triples yourself, don't. See my answer to #3.
2) There are many RDF libraries available: Jena and Sesame in Java, dotNetRDF for .NET, RDFLib in Python, Redland for C, etc.
3) Yes. Don't try to reinvent the wheel by building your own triple store. It's not an easy project, and you won't do better than even the worst existing triple store on any reasonable time scale.

As Michael said, please don't build your own triple store! There are several solutions available in PHP:
ARC2 provides a triple store based on MySQL.
The librdf extension provides a PHP wrapper for the standard RDF C library.
The Erfurt library is an abstraction library for connecting to the open-source Virtuoso Server triple store, but it also has its own triple store based on Zend DB that can be used with MySQL.

Related

How to enforce typing in gremlin/tinkerpop?

In something like SQL, when I create a table I can declare type constraints (strings of a certain length, booleans, etc.).
How do I do that in Gremlin? I am using a JavaScript implementation, and I know I can switch to TypeScript and add a lot of type enforcement on that side, but ideally I would also like to have type constraints on the database side too.
I know I can switch to typescript and add a lot of type enforcement on that side
There is an open issue for TypeScript at TINKERPOP-2027, and while there was some activity there, no one has really picked up the work.
I would also like to have type constraints on the database side too.
Constraints in the database are not a feature of TinkerPop. For 3.x we long ago committed to allowing graph providers who implement the TinkerPop interfaces to provide their own functionality for that. There are a lot of historical reasons for this which I won't detail, but the basic answer to your question is that if you want such functionality, you need to choose a graph that has it. Perhaps take a look at JanusGraph or DS Graph, as both have a fairly robust schema language.
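Since TinkerPop itself does not enforce property types, one stopgap is to validate properties in application code before they are sent to the graph. A minimal sketch in Python (the question uses JavaScript, but the check translates directly); VERTEX_SCHEMA and validate_properties are hypothetical names, not part of any TinkerPop API, and any client that writes to the graph without going through this check is not covered:

```python
# Hypothetical application-side schema: vertex label -> property -> (type, max length).
# This lives in client code; nothing here is a TinkerPop or Gremlin API.
VERTEX_SCHEMA = {
    "person": {
        "name": (str, 50),  # string, at most 50 characters
        "active": (bool, None),
        "age": (int, None),
    },
}


def validate_properties(label, props):
    """Return props unchanged if they match the schema for label, else raise ValueError."""
    schema = VERTEX_SCHEMA.get(label)
    if schema is None:
        raise ValueError(f"unknown vertex label: {label}")
    for key, value in props.items():
        if key not in schema:
            raise ValueError(f"unexpected property {key!r} on {label!r}")
        expected, max_len = schema[key]
        # bool is a subclass of int in Python, so reject True/False where an int is expected.
        if expected is int and isinstance(value, bool):
            raise ValueError(f"{key!r} must be int, not bool")
        if not isinstance(value, expected):
            raise ValueError(f"{key!r} must be {expected.__name__}")
        if max_len is not None and len(value) > max_len:
            raise ValueError(f"{key!r} is longer than {max_len} characters")
    return props


print(validate_properties("person", {"name": "marko", "age": 29}))
```

In practice you would call the validator before building a traversal such as g.addV('person').property('name', ...). A schema enforced by the graph itself, as the answer suggests, remains the only way to guarantee the types written by other clients.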

Which one to use? EAV or Blobs in the database?

I am currently working to rework the data system of our application. Basically, it is designed so that people can add all the custom fields they want, with only a few constant/always-there fields.
Our current design is giving us plenty of maintenance problems. We dynamically (at runtime) add a column to the database for each field, and we need a meta table and other cruft to maintain all of these dynamic columns.
Now we are looking at EAV, but it doesn't seem much better. We have many different types of fields, so there would be StringValues, IntegerValues, etc. tables, which makes things even worse.
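To make that cost concrete, here is a minimal sketch of the typed EAV layout just described, using SQLite in place of SQL Server 2008 so it runs self-contained; the table, column, and function names are hypothetical. Note how reading a single logical row already needs a UNION over every value table:

```python
import sqlite3

# Hypothetical EAV layout with one value table per data type, mirroring the
# StringValues / IntegerValues split described in the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE attribute (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL, data_type TEXT NOT NULL);
CREATE TABLE string_values (
    entity_id INTEGER, attribute_id INTEGER REFERENCES attribute(id), value TEXT,
    PRIMARY KEY (entity_id, attribute_id));
CREATE TABLE integer_values (
    entity_id INTEGER, attribute_id INTEGER REFERENCES attribute(id), value INTEGER,
    PRIMARY KEY (entity_id, attribute_id));
""")

# Fixed whitelist, so the table name interpolated below never comes from user input.
VALUE_TABLE = {"string": "string_values", "integer": "integer_values"}


def set_value(entity_id, name, data_type, value):
    conn.execute("INSERT OR IGNORE INTO attribute (name, data_type) VALUES (?, ?)", (name, data_type))
    (attr_id,) = conn.execute("SELECT id FROM attribute WHERE name = ?", (name,)).fetchone()
    table = VALUE_TABLE[data_type]
    conn.execute(f"INSERT OR REPLACE INTO {table} VALUES (?, ?, ?)", (entity_id, attr_id, value))


def get_entity(entity_id):
    """Reassemble one logical row: every typed value table must be queried and merged."""
    rows = conn.execute("""
        SELECT a.name, v.value FROM string_values v JOIN attribute a ON a.id = v.attribute_id
        WHERE v.entity_id = ?
        UNION ALL
        SELECT a.name, v.value FROM integer_values v JOIN attribute a ON a.id = v.attribute_id
        WHERE v.entity_id = ?""", (entity_id, entity_id))
    return dict(rows.fetchall())


set_value(1, "customer_name", "string", "Acme")
set_value(1, "employee_count", "integer", 250)
print(get_entity(1))
```

Every new data type adds another table and another UNION ALL branch to every read, which is the extra maintenance the question anticipates.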
I am wondering whether JSON or XML blobs in the database may be a better solution, specifically because in most use cases, when we retrieve anything from these tables, we need the entire row. The problem is that we also need to create reports on this data, and no solution really makes custom queries look easy. Searching across such a blob database will surely be a performance nightmare when reports are run.
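The trade-off in this paragraph can be sketched with a JSON text column, again using SQLite as a stand-in for SQL Server 2008; the record table and the save/load/report helpers are hypothetical. Loading a whole row is a single key lookup, but a report on a custom field has to fetch and parse every blob:

```python
import json
import sqlite3

# Hypothetical "blob" layout: a few fixed columns plus one JSON text column
# that holds all of the row's custom fields.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE record (id INTEGER PRIMARY KEY, created TEXT, custom TEXT)")


def save(record_id, created, custom_fields):
    conn.execute("INSERT OR REPLACE INTO record VALUES (?, ?, ?)",
                 (record_id, created, json.dumps(custom_fields)))


def load(record_id):
    """Whole-row retrieval: one primary-key lookup plus one JSON parse."""
    created, custom = conn.execute(
        "SELECT created, custom FROM record WHERE id = ?", (record_id,)).fetchone()
    return {"id": record_id, "created": created, **json.loads(custom)}


def report(field, predicate):
    """A report on a custom field must read and parse every blob (a full scan)."""
    matches = []
    for record_id, custom in conn.execute("SELECT id, custom FROM record ORDER BY id"):
        value = json.loads(custom).get(field)
        if value is not None and predicate(value):
            matches.append(record_id)
    return matches


save(1, "2010-01-05", {"region": "EU", "budget": 1200})
save(2, "2010-02-11", {"region": "US", "budget": 800, "priority": "high"})
print(load(2)["priority"])                   # high
print(report("budget", lambda b: b > 1000))  # [1]
```

The xml data type suggested in one of the answers lets the server query inside the document instead; this sketch does all of the parsing on the client.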
Each "row" needs to have anywhere from about 15 to 100 (possibly more) attributes/columns associated with it.
We are using SQL Server 2008, and the application interfacing with the database is a C# web application (ASP.NET).
What do you think? Use EAV, blobs, or something else entirely? (Also, yes, I know a schema-free database like MongoDB would be awesome here, but I can't convince my boss to use it.)
What about the xml data type? Advanced querying is possible against this type.
We've used the xml type with good success. We do most of our heavy lifting at the code level, using LINQ to parse out values. Our schema is somewhat fixed, so that may not be an option for you.
One interesting feature of SQL Server is the sql_variant type. It's fully supported in .NET and quite easy to use. The advantage is that you don't need separate StringValue, IntValue, etc. columns, just one Value column that can contain all the simple types.
This very specific type favors the EAV option, IMHO.
It has some drawbacks, though (sorting, distinct selects, etc.), so if you want to use it, make sure you read all the documentation and understand its limits.
Create a table with your known columns and "X" sparse columns with sequential names such as DataColumn0001, DataColumn0002, etc. When there is a definition for a new column, just rename a column and start inserting data. The great advantage of sparse columns is that they are indexable.
More info at this link.
What you're doing is STUPID with a database that doesn't support your data type. You should work with a medium that meets your needs, such as the NoSQL databases RavenDB, MongoDB, DocumentDB, and Couchbase, or Postgres among RDBMSs, to name several.
You are using the tool in a way it was not designed for, and one it specifically tries to prevent. NoSQL databases frequently use JSON as their underlying storage because JSON is inherently schemaless. Want to add a property? Sure, go ahead. Want to add a whole sub-collection? Sure, go ahead. NoSQL databases were, in part, created specifically to remove the rigid schema requirements of RDBMSs.
2015 edit: Postgres now natively supports JSON, which makes it a viable RDBMS option. My answer is still correct that you need to use the correct tool for the problem. It is a polyglot persistence world.

tool to create a domain model diagram

My requirement is to create a diagram that represents the object relationships stored in an XML file.
For example
This needs to be translated as: class abc has a field which is xyz. This hierarchy can be multi-level, and we need to represent:
a) the high-level structuring of classes
b) the contents of these classes.
I looked at some tools like UMLet, Violet, and Visio, but all of them require a lot of manual intervention. Is there a tool that can be configured to read from XML?
Try using Graphviz and the dot language.
http://www.graphviz.org/
You'll need to write a translation layer, but that shouldn't be too hard in the language of your choice.
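A translation layer of the kind this answer describes could look like the sketch below. The question's XML sample is not shown, so the <model>/<class>/<field> format here is an assumption; the output is plain dot source, which the Graphviz command-line tool can render, for example with dot -Tpng model.dot -o model.png.

```python
import xml.etree.ElementTree as ET

# Hypothetical input format; the question's own XML sample is not shown.
# A <field> whose type names another <class> becomes an arrow in the diagram,
# and a <class> nested inside another becomes a dashed "contains" arrow.
SAMPLE = """
<model>
  <class name="abc">
    <field name="xyz" type="Xyz"/>
    <field name="count" type="int"/>
  </class>
  <class name="Xyz">
    <field name="label" type="string"/>
  </class>
</model>
"""


def to_dot(xml_text):
    """Translate the XML model into Graphviz dot source (record-shaped class boxes)."""
    root = ET.fromstring(xml_text.strip())
    # iter() finds classes at any depth, so multi-level nesting still yields one node each.
    classes = {cls.get("name"): cls for cls in root.iter("class")}
    lines = ["digraph model {", "  node [shape=record];"]
    for name, cls in classes.items():
        # findall() takes only this class's direct <field> children; \l left-justifies lines.
        fields = "\\l".join(f"{fld.get('name')} : {fld.get('type')}" for fld in cls.findall("field"))
        lines.append(f'  "{name}" [label="{{{name}|{fields}\\l}}"];')
    for name, cls in classes.items():
        for fld in cls.findall("field"):
            if fld.get("type") in classes:  # field typed as another class -> association edge
                lines.append(f'  "{name}" -> "{fld.get("type")}" [label="{fld.get("name")}"];')
        for child in cls.findall("class"):  # nested class -> containment edge
            lines.append(f'  "{name}" -> "{child.get("name")}" [style=dashed];')
    lines.append("}")
    return "\n".join(lines)


print(to_dot(SAMPLE))
```

Names containing characters that are special in record labels (such as |, { or <) would need escaping; the sketch does not handle that.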
UModel might be a good pick for you on this one... http://www.altova.com/umodel/xml-schemas-in-uml.html

Sample Data Creation Tool (mainly for Databases)

I’m thinking through some database design concepts and believe that creating sample data simulating real-world volume of my application will help solidify some design decisions.
Does anyone know of a tool to create sample data? I'm looking for something that's database- and platform-neutral if possible (from MySQL to DB2 and Windows to UNIX) so I can test the design across different systems/architectures. I'm envisioning a tool that lets you:
point to a database table(s) (some configuration of the DSN, etc.)
introspect the fields and based on the field... (point-and-click or add some configuration)
have a means of expressing how to create sample data (MySQL Sample Data Creator is the kind of thing I envision, but I think there'd be more options, like commit frequency, to create very large data sets... millions or billions of rows... I don't think that tool would scale to the volume of data I want to create)
push a button and go (depending on your parameters, this may take a long time)
Any thoughts? Sure, I could write an app to do this, but it seems so generic that I shouldn't have to reinvent the wheel.
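For the "write an app" fallback, the core loop is small. A minimal sketch using SQLite, which introspects a table with PRAGMA table_info and commits in batches (the commit-frequency idea from the question); the type-to-generator mapping and the populate/random_value names are assumptions. PRAGMA table_info is SQLite-specific; other databases expose similar catalog views (for example INFORMATION_SCHEMA.COLUMNS in MySQL), so a database-neutral tool would need one introspection query per target.

```python
import random
import sqlite3
import string


def random_value(declared_type, rng):
    """Pick a value generator from the column's declared type (a rough mapping)."""
    t = declared_type.upper()
    if "INT" in t:
        return rng.randint(0, 1_000_000)
    if any(k in t for k in ("REAL", "FLOA", "DOUB", "DEC", "NUM")):
        return round(rng.uniform(0, 10_000), 2)
    return "".join(rng.choices(string.ascii_letters, k=12))


def populate(conn, table, rows, batch_size=10_000, seed=42):
    """Introspect `table`, then insert `rows` random rows, committing once per batch."""
    rng = random.Random(seed)
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) per column.
    info = conn.execute(f"PRAGMA table_info({table})").fetchall()
    # Skip an INTEGER PRIMARY KEY column: SQLite fills it in automatically.
    columns = [(name, ctype) for _, name, ctype, _, _, pk in info
               if not (pk and ctype.upper() == "INTEGER")]
    sql = "INSERT INTO {} ({}) VALUES ({})".format(
        table, ", ".join(name for name, _ in columns), ", ".join("?" for _ in columns))
    inserted = 0
    while inserted < rows:
        n = min(batch_size, rows - inserted)
        batch = [tuple(random_value(ctype, rng) for _, ctype in columns) for _ in range(n)]
        conn.executemany(sql, batch)
        conn.commit()  # the "commit frequency" knob from the question
        inserted += n
    return inserted


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, quantity INTEGER, price REAL)")
populate(conn, "orders", rows=25_000, batch_size=10_000)
print(conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0])  # 25000
```

The table name is interpolated into SQL, so it should come from configuration you control, not from user input.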
DBMonster is fine, but I prefer databene benerator, as I explained in this answer to a similar question.
Something like DBMonster?
This page also has a listing of many DB data generators.
I cannot help you with MySQL or DB2, but in case anyone gets to this answer with a need for MS SQL Server, I can recommend the Data Generator from Red Gate.
Our test data generator, Datanamic DB Data Generator can do this for you. Works with MySQL. It uses default "generator settings" when loading your tables the first time. You can then "fine-tune" the fields and/or choose other "generators".

Saving MFC Model as SQLite database

I am playing with a CAD application using MFC. I was thinking it would be nice to save the document (model) as an SQLite database.
Advantages:
I avoid file format changes (SQLite takes care of that)
Free query engine
Undo stack is simplified (table name, column name, new value, and so on...)
Opinions?
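The undo idea can be sketched with Python's sqlite3 module (the question uses MFC/C++, but the SQL is the same). Undoing an edit needs more than the table name, column name, and new value: the row id and the old value have to be recorded too. UndoableModel is a hypothetical helper, and the point table is an assumption:

```python
import sqlite3


class UndoableModel:
    """Hypothetical wrapper that records every cell edit so it can be undone."""

    def __init__(self, conn):
        self.conn = conn
        self.undo_stack = []  # entries: (table, row_id, column, old_value, new_value)

    def set_cell(self, table, row_id, column, new_value):
        # Table and column names are interpolated into SQL, so they must come
        # from the application's own schema, never from user input.
        (old_value,) = self.conn.execute(
            f"SELECT {column} FROM {table} WHERE id = ?", (row_id,)).fetchone()
        self.conn.execute(f"UPDATE {table} SET {column} = ? WHERE id = ?", (new_value, row_id))
        self.undo_stack.append((table, row_id, column, old_value, new_value))

    def undo(self):
        # new_value is kept in the entry so a redo stack could reuse it.
        table, row_id, column, old_value, _new_value = self.undo_stack.pop()
        self.conn.execute(f"UPDATE {table} SET {column} = ? WHERE id = ?", (old_value, row_id))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE point (id INTEGER PRIMARY KEY, x REAL, y REAL)")
conn.execute("INSERT INTO point VALUES (1, 0.0, 0.0)")
model = UndoableModel(conn)
model.set_cell("point", 1, "x", 5.0)
model.undo()
print(conn.execute("SELECT x FROM point WHERE id = 1").fetchone())  # (0.0,)
```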
This is a fine idea. SQLite is very pleasant to work with!
But remember the old truism (I can't get an authoritative answer from Google about where it originally is from) that storing your data in a relational database is like parking your car by driving it into the garage, disassembling it, and putting each piece into a labeled cabinet.
Geometric data, consisting of points and lines and segments that refer to each other by name, is a good candidate for storing in database tables. But when you start having composite objects, with a hierarchy of subcomponents, it might require a lot less code just to use serialization and store/load the model with a single call.
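A minimal sketch of the point/segment case, assuming a hypothetical point and segment schema: segments refer to their end points by id, and SQLite enforces those references only when PRAGMA foreign_keys is enabled on the connection.

```python
import math
import sqlite3

# Hypothetical CAD schema: segments refer to their end points by id.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces REFERENCES only when this is on
conn.executescript("""
CREATE TABLE point (id INTEGER PRIMARY KEY, x REAL NOT NULL, y REAL NOT NULL);
CREATE TABLE segment (
    id       INTEGER PRIMARY KEY,
    start_id INTEGER NOT NULL REFERENCES point(id),
    end_id   INTEGER NOT NULL REFERENCES point(id)
);
""")
conn.executemany("INSERT INTO point VALUES (?, ?, ?)", [(1, 0.0, 0.0), (2, 3.0, 4.0)])
conn.execute("INSERT INTO segment VALUES (1, 1, 2)")


def segment_length(segment_id):
    """Resolve the segment's end points by joining on their ids."""
    ax, ay, bx, by = conn.execute("""
        SELECT a.x, a.y, b.x, b.y
        FROM segment s
        JOIN point a ON a.id = s.start_id
        JOIN point b ON b.id = s.end_id
        WHERE s.id = ?""", (segment_id,)).fetchone()
    return math.hypot(bx - ax, by - ay)


print(segment_length(1))  # 5.0
```

A segment pointing at a missing point is rejected with sqlite3.IntegrityError, which is the kind of consistency check a hand-rolled file format would have to implement itself.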
So that would be a fine idea too.
But serialization in MFC is not nearly as much of a win as it is in, say, C#, so on balance I would go ahead and use SQL.
This is a great idea but before you start I have a few recommendations:
Make sure each database is uniquely identifiable by something other than its file name, for example a table within the database that describes the file.
Take a look at some of the MFC-based examples and wrappers already available before creating your own. The ones I have seen borrowed from each other to create a better result. Google: MFC SQLite Wrapper.
Using an SQLite database is also useful for maintaining state. Think ahead about how you would manage it, keeping in mind which features SQLite includes and which it lacks.
You can also think now about how you might extend your application to the web by making sure your table structure is easily exportable to other SQL database systems, as well as easy to extend to a backup system.
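For the first recommendation above (identifying each database by more than its file name), SQLite has header fields meant for this: PRAGMA application_id stores a 32-bit integer in the database file header, and PRAGMA user_version is a user-settable integer often used as a schema version. A minimal sketch combining those with a descriptive table inside the file; APP_ID, FORMAT_VERSION, the document_info table, and the helper names are hypothetical:

```python
import sqlite3
import uuid

# Hypothetical 32-bit id stamped into the SQLite file header ("CAD1" in ASCII).
APP_ID = 0x43414431
FORMAT_VERSION = 1  # hypothetical schema version of the document format


def create_document(path):
    conn = sqlite3.connect(path)
    # PRAGMA values cannot be bound as parameters, so these are trusted constants.
    conn.execute(f"PRAGMA application_id = {APP_ID}")
    conn.execute(f"PRAGMA user_version = {FORMAT_VERSION}")
    conn.execute("CREATE TABLE document_info (key TEXT PRIMARY KEY, value TEXT)")
    conn.execute("INSERT INTO document_info VALUES ('document_uuid', ?)", (str(uuid.uuid4()),))
    conn.commit()
    return conn


def is_our_document(conn):
    """Check the header stamp instead of trusting the file name or extension."""
    (app_id,) = conn.execute("PRAGMA application_id").fetchone()
    return app_id == APP_ID
```

The per-file UUID in document_info distinguishes two documents even after one has been copied and renamed, while application_id and user_version tell the application whether it can open the file at all.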
