How to enforce typing in gremlin/tinkerpop?

In something like SQL, when I create a table I can create type constraints (strings with certain lengths, booleans, etc.).
How do I do that in Gremlin? I am using a JavaScript implementation, and I know I can switch to TypeScript and add a lot of type enforcement on that side, but ideally I would also like to have type constraints on the database side.

On "I know I can switch to typescript and add a lot of type enforcement on that side":
There is an open issue for TypeScript at TINKERPOP-2027, and while there was some activity there, no one has really picked up the work.
On "I would also like to have type constraints on the database side too":
Constraints in the database are not a feature of TinkerPop. For 3.x, we long ago committed to letting graph providers who implement the TinkerPop interfaces supply that functionality themselves. There are a lot of historical reasons for that which I won't bother to detail, but the basic answer to your question is that if you want such functionality, you need to choose a graph that provides it. Perhaps take a look at JanusGraph or DSE Graph, as both have a fairly robust schema language.
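Where the chosen graph has no schema support, one application-side stopgap is to validate property types before they are handed to a traversal. The sketch below is plain JavaScript and makes no TinkerPop calls; `personSchema`, `validateProperties`, and the rule shape are hypothetical names made up for illustration.

```javascript
// Hypothetical per-label schema: property name -> rule.
const personSchema = {
  name: { type: "string", maxLength: 50, required: true },
  active: { type: "boolean", required: false },
};

// Returns a list of human-readable violations; an empty list means valid.
function validateProperties(schema, props) {
  const errors = [];
  for (const [key, rule] of Object.entries(schema)) {
    const value = props[key];
    if (value === undefined) {
      if (rule.required) errors.push(`${key}: missing required property`);
      continue;
    }
    if (typeof value !== rule.type) {
      errors.push(`${key}: expected ${rule.type}, got ${typeof value}`);
    } else if (rule.maxLength !== undefined && value.length > rule.maxLength) {
      errors.push(`${key}: longer than ${rule.maxLength} characters`);
    }
  }
  // Reject properties the schema does not know about.
  for (const key of Object.keys(props)) {
    if (!Object.prototype.hasOwnProperty.call(schema, key)) {
      errors.push(`${key}: unknown property`);
    }
  }
  return errors;
}
```

A caller would run `validateProperties` first and only build the traversal when the returned list is empty. This does not replace database-side constraints, since other clients can still write unchecked data.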

Protobuf 3 breaks contract additivity

I'm using Protobuf 3 along with gRPC in a distributed environment ("microservices").
Because Protobuf 3 does not support not-set/missing values, I've run into the following issue with contract additivity.
Imagine I have Service A and a couple of consumer services, B and C, owned by Team B and Team C.
If I add a field, say a boolean, to the contract of Service A, it will at first carry its default value, which gets written to the database as is.
Then Team B updates their service to use the updated contract and passes 'true' as the field value.
Team C still uses the old contract and calls the same service, so the value gets replaced with false. But Team C didn't mean to do that; they weren't even aware of the field.
Thus, Service A cannot extend its contract at all, because consumers that have not yet been updated, for whatever reason, can corrupt data and Service A can do nothing about it.
In Thrift, this is handled with a single check (.isSet()).
There are dirty workarounds like wrapping primitives in objects, but that forces library-implementation-specific checks by reference (at least in Java), which looks more like a poor hack than a robust solution. Also, eventually I would have to wrap everything in wrappers, which, as you can imagine, is not a great solution either.
What best practices do you use to manage such situations in Protobuf 3 in 2017? How do you manage and coordinate contract updates between teams and services? Thanks.
Note: this question is not exactly about how to implement detection of not-set/missing values, but rather about how to live without it and follow the Protobuf 3 philosophy.
I think the problem here is that trying to check for field presence this way is not really an idiomatic use of protocol buffers (not even in proto2). It sounds like you are trying to evolve your schema by adding new fields but not reading those new fields unless you're sure they came from an updated client. The idiomatic way is to do this instead: just make sure the defaults for the new fields are reasonable and maintain compatible behavior if they're not explicitly set. Then don't try to check for presence--just read the fields and older clients will get good default behavior.
To give you an example, let's say you're adding a new feature that can be enabled or disabled. The right way to do this would be to add a bool field in your request message called enable_new_feature. Since older clients don't know about this field, their requests will have this default to false and so they get the old behavior they're expecting. Adding a disable_new_feature field instead would probably be the wrong way to do it because then you would indeed break older clients by enabling something they didn't want.
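The effect of that default can be sketched without any protobuf library. The plain JavaScript below only imitates proto3 semantics, where an unset bool reads as false; `decodeRequest`, `handleRequest`, and the `query` field are hypothetical stand-ins, and only `enable_new_feature` comes from the example above.

```javascript
// Imitates proto3 decoding: a scalar field absent from the wire reads as its
// default value (false for bool, "" for string).
function decodeRequest(wireFields) {
  return {
    query: wireFields.query ?? "",
    enable_new_feature: wireFields.enable_new_feature ?? false,
  };
}

// The server reads the field directly, with no presence check.
function handleRequest(wireFields) {
  const request = decodeRequest(wireFields);
  return request.enable_new_feature ? "new behavior" : "old behavior";
}

// An old client has never heard of enable_new_feature and omits it.
const fromOldClient = handleRequest({ query: "widgets" });
// An updated client opts in explicitly.
const fromNewClient = handleRequest({ query: "widgets", enable_new_feature: true });
```

Because the default maps to the old behavior, the old client's request is handled exactly as it was before the field existed.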
Using oneof looks like a better/cleaner alternative to wrappers. See this answer to a similar question: https://stackoverflow.com/a/40552570/618259

Meteor Approach to Collections Considering Joins Aren't Supported by Core Yet?

I'm building my project's main functionality right now, so this is a big decision, and I want an efficient and scalable solution. I use several APIs to fetch users' products, ultimately into one collection, so I can display product information in a table, possibly merging products from different sources by SKU/title.
I have thought of 2 approaches. (In both, we add Meteor.userId() on insert so each user has their own products.)
1) Give each API its own collection and fetch its products into it. After, or in the middle of, the API query where I insert into sourceXProducts, I also run the logic that merges products by SKU and adds only the fields I need to the main usersProducts collection. We keep the sourceXProducts collections, so if we ever need something we didn't include in usersProducts we can query for it; we basically keep all the information possible (because it can come in handy).
source1Products = new Meteor.Collection('source1Products');
source2Products = new Meteor.Collection('source2Products');
usersProducts = new Meteor.Collection('usersProducts');
Pros: Honestly, I'm not sure. It keeps things organized, and from the way I learned Meteor it seems to be a common pattern.
Cons: Collection joins are not supported in Meteor core yet, so I have to use a package such as meteor-publish-composite, which seems good but might hurt performance.
2) Create 1 collection and insert everything the API response has, plus an additional apiSource field, so we can select the products of user X from API X.
usersProducts = new Meteor.Collection('usersProducts');
Pros: No joins, possibly better performance
Cons: Not organized. It can become a large collection, which may not be good for MongoDB.
3) Your ideas? :)
First, you should improve the question. You do not tell us anything precise about your schema. What entities do you have, what types of relations are there between them, what types of joins do you think you will be doing, and how often will you be doing them?
Second, you should rethink your schema and think in terms of a non-relational database. I see many people coming from the SQL world who simply design their schema the same way. Wrong. MongoDB is not SQL, and you should not try to just reuse the things you learned there. You should start using features like subdocuments and arrays, which can solve many basic things you would do in SQL with joins. So, knowing your schema would help us help you design it. For example, see this answer, and especially the comments, for a discussion of a similar type of question.
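As one illustration of the subdocument-and-array approach, the sketch below folds products fetched from several APIs into one document per user and SKU, keeping each source's raw payload in an embedded array. It is plain JavaScript with no Meteor or MongoDB calls, and the field names (`sku`, `title`, `apiSource`, `sources`) and the `mergeBySku` helper are assumptions, since the question does not give a schema.

```javascript
// Groups raw API products into one document per SKU for a given user.
// Each document embeds the per-source payloads instead of joining collections.
function mergeBySku(userId, rawProducts) {
  const bySku = new Map();
  for (const product of rawProducts) {
    let doc = bySku.get(product.sku);
    if (!doc) {
      // The first source seen for a SKU supplies the displayed title.
      doc = { userId, sku: product.sku, title: product.title, sources: [] };
      bySku.set(product.sku, doc);
    }
    // Keep everything the API returned, tagged with where it came from.
    doc.sources.push({ apiSource: product.apiSource, raw: product });
  }
  return [...bySku.values()];
}

const merged = mergeBySku("user-1", [
  { sku: "A1", title: "Blue mug", apiSource: "source1", price: 9 },
  { sku: "A1", title: "Mug, blue", apiSource: "source2", stock: 4 },
  { sku: "B2", title: "Red mug", apiSource: "source1", price: 11 },
]);
```

Each element of `merged` could then be stored as a single document in one collection, so rendering the product table would not need a join.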
Third, have you evaluated various solutions which exist out there? There are many, but you have not shown us that you tried any of them and how it worked for you. What were pros and cons of them, for you and your project?
Fourth, if you are too lazy to evaluate, you can just use peerlibrary:peerdb and peerlibrary:related. They are simply perfect. You should trust me. I am their author.

Database structure of a triple store?

I want to use RDF / triples in my Symfony2 project in order to organize things (in my case it is Tags).
I picture something like this:
ENTITY TAG <-------------- TAG_TAG --------------> ASSOCIATION_TYPE
        ^                     |
        |---------------------/
Fields:
TAG
  ID
  Tag (text)
  Description (text/html)
TAG_TAG
  ID
  *TAG1
  *TAG2
  *ASSOCIATION_TYPE
  ASSOCIATION_PARAM
Like this, I would be able to:
- store triple associations
- set different association types. For example, PHP is a Programming_language; stackoverflow.com is a website; but the Earth turns around the Sun.
- set parameters (which allow giving more information inside associations)
We could consider a many-to-many relation between TAG_TAG and ASSOCIATION_TYPE. That way we could set several parameters.
So I have several questions:
1) Do you think it's a good way to store triples efficiently?
2) Is there any RDF layer to extract existing RDF/triples databases and populate my own?
3) Should I consider using some kind of triple store like Sesame and use it with Symfony?
To answer your questions:
1) I'm not entirely sure what you're asking. If you're asking if that's a reasonable way to model your data, it's probably ok. But your diagram is not clear and you're a bit light on details. Best thing to do is just do something that works to start with. You can improve the modeling later without much of a hassle.
If you're asking about storage of triples, don't. See my response to #3.
2) There are many RDF libraries available: Jena and Sesame in Java, dotNetRDF for the .NET world, RDFLib in Python, Redland for C, etc.
3) Yes. Don't attempt to re-invent the wheel and build your own triple store. It's not an easy project and you won't do better than even the worst existing triple store on any reasonable time scale.
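To make the data shape concrete, the question's examples can be written as subject/predicate/object triples with a simple pattern match over them. This sketch only illustrates the model in plain JavaScript and is not a substitute for an existing triple store; the predicate names and the `match` helper are invented for the example.

```javascript
// Each TAG_TAG row from the question corresponds to one triple.
const triples = [
  { subject: "PHP", predicate: "is_a", object: "Programming_language" },
  { subject: "stackoverflow.com", predicate: "is_a", object: "website" },
  { subject: "Earth", predicate: "turns_around", object: "Sun" },
];

// Returns the triples matching a pattern; omitted fields act as wildcards.
function match(pattern) {
  return triples.filter((triple) =>
    Object.entries(pattern).every(([field, value]) => triple[field] === value)
  );
}

const isARelations = match({ predicate: "is_a" });
const aboutEarth = match({ subject: "Earth" });
```

A real store adds indexing, a query language such as SPARQL, and import/export of standard RDF formats on top of this shape, which is where most of the effort lies.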
As Michael said - please don't build your own triple store! There are several solutions available in PHP:
ARC2 provides a triple store based on MySQL
The librdf extension provides a PHP wrapper for the standard RDF C library
The Erfurt Library is an abstraction library for connecting with the open source Virtuoso Server triple store, but also has its own triple store based on Zend DB that can be used with MySQL.

Which one to use? EAV or Blobs in the database?

I am currently working to rework the data system of our application. Basically, it is designed so that people can add all the custom fields they want, with only a few constant/always-there fields.
Our current design is giving us plenty of maintenance problems. What we do is dynamically (at runtime) add a column to the database for each field. We have to have a meta table and other cruft to maintain all of these dynamic columns.
Now we are looking at EAV, but it doesn't seem much better. Basically, we have many different types of fields, so there would be a StringValues table, an IntegerValues table, and so on, which makes things that much worse.
I am wondering if using JSON or XML blobs in the database may be a better solution, specifically because in most use cases, when we retrieve anything from these tables, we need the entire row. The problem is that we also need to be able to create reports on this data. No solution really makes custom queries look easy, and searching across such a blob database will surely be a performance nightmare when reports are run.
Each "row" needs to have anywhere from about 15 to 100 (possibly more) attributes/columns associated with it.
We are using SQL Server 2008, and the application interfacing with the database is a C# web application (so, ASP.NET).
What do you think? Use EAV, blobs, or something else entirely? (Also, yes, I know a schema-free database like MongoDB would be awesome here, but I can't convince my boss to use it.)
What about the xml datatype? Advanced querying is possible against this type.
We've used the xml type with good success. We do most of our heavy lifting at the code level, using LINQ to parse out values. Our schema is somewhat fixed, so that may not be an option for you.
One interesting feature of SQL Server is the sql_variant type. It's fully supported in .NET and quite easy to use. The advantage is that you don't need to create StringValue, IntValue, etc. columns, just one Value column that can contain all the simple types.
This very specific type favors the EAV option, IMHO.
It has some drawbacks though (sorting, distinct selects, etc.). So if you want to use it, make sure you read all the documentation and understand its limits.
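For the reporting concern, EAV rows with a single Value column have to be pivoted back into one record per entity. The sketch below shows that pivot in plain JavaScript over in-memory rows; the row shape (`entityId`, `attribute`, `value`) and the `pivotEav` helper are assumptions for illustration, and in practice the pivot would more likely happen in SQL or in the C# layer.

```javascript
// Pivots (entityId, attribute, value) rows into one object per entity.
function pivotEav(rows) {
  const entities = new Map();
  for (const { entityId, attribute, value } of rows) {
    if (!entities.has(entityId)) entities.set(entityId, { id: entityId });
    entities.get(entityId)[attribute] = value;
  }
  return [...entities.values()];
}

// A single value column can hold mixed simple types, as sql_variant does.
const records = pivotEav([
  { entityId: 1, attribute: "name", value: "Acme" },
  { entityId: 1, attribute: "employees", value: 25 },
  { entityId: 2, attribute: "name", value: "Globex" },
  { entityId: 2, attribute: "active", value: true },
]);
```

Entities that lack an attribute simply have no such key after the pivot, which mirrors how missing EAV rows behave.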
Create a table with your known columns and "X" sparse columns using a sequential name such as DataColumn0001, DataColumn0002, etc. When there is a definition for a new column just rename a column and start inserting data. The great advantage to the sparse column is it is indexable.
More info at this link.
What you're doing is STUPID with a database that doesn't support your data type. You should work with a medium that meets your needs, which includes NoSQL databases such as RavenDB, MongoDB, DocumentDB, and Couchbase, or Postgres among RDBMSs, to name several.
You are inherently using the tool in a capacity it was not designed for, and one in which it specifically tries to keep you from succeeding. NoSQL database solutions frequently use JSON as an underlying storage format because JSON is inherently schemaless. Want to add a property? Sure, go ahead. Want to add a whole sub-collection? Sure, go ahead. NoSQL databases were, in part, created specifically to remove the rigid schema requirements of RDBMSs.
2015 Edit: Postgres now natively supports JSON. This is a viable option among RDBMSs. My answer is still correct that you need to use the correct tool for the problem. It is a polyglot persistence world.

How do I do this in Drupal?

I'm currently evaluating Drupal to see if we can use it to replace our framework. My problem is that I have these legacy tables which I would like to reflect in Drupal. They involve a join table. There are quite a lot of relationships of this kind in our existing web app, so I am looking for possible ways to handle them.
Thank you for your insight!
There are several ways to do this, and it's hard to know which is best with no context about what you're actually doing with the data, but here are some options:
One way to do this is to make a content type representing each table (using CCK) with the foreign keys represented by type-specific node reference fields. Doing everything as nodes gives you a bunch of prebuilt functionality around nodes, but has a bit of overhead you may want to avoid.
Another option is to leave your database just like it is now. Drupal can do direct database queries, or you can use Data to expose your tables to Views.
Another option, if those referenced tables really only have 1 non-ID field, is to do the project_companies_assignments as nodes and do the other 3 as taxonomies. But this won't work if those are really more complex entities, and wouldn't be very flexible if they might become more complex.
What about using hook_views_api and exposing your legacy tables in hook_views_data? I tried something like this myself; not sure if that is what you want.
Try it and let me know if that works for you.
http://drupalwalla.blogspot.com/2011/09/how-do-you-expose-your-legacy-database.html
Going with Views and CCK, optionally with the additional Data module, has one huge disadvantage: it comes with complexity.
My preferred alternative is to write your own module. Drupal offers little help with database abstraction; it does not come with a proper ORM or the like. But with some simple CRUD functions for the data in the database, a few simple forms in front, and a menu callback with some pages to present the data, you can quite often get your data model worked out much faster than by going the route of the overly complex, often poorly documented CCK and Views modules. KISS.
