Is there a way to reuse aggregate steps? - gremlin

I have a graph database storing different types of entities and I am building an API to fetch entities from the graph. It is however a bit more complicated since for each type of entity there is a set of rules that are used for fetching related entities as well as the original.
To do this I used an aggregate step to aggregate all related entities that I am fetching into a collection.
An added requirement is to fetch a batch of entities (and related entities). I was going to do this by changing the has step that is fetching entities to use P.within and map the aggregation to each of the found entities.
This works if I continue fetching a single entity, but if I want to fetch two then my result set will be correct for the first one but the result set for the second contains the results of the first one as well as its own results.
I think this is because the second one will simply add to the aggregated collection from the first one, since the aggregation key is the same.
I haven't found any way to clear the collection between the first and the second, nor any way to have a dynamic aggregation side effect key.
Code:
return graph.traversal().V()
    .hasLabel(ENTITY_LABEL)
    .has("entity_ref", P.within(entityRefs)) // entityRefs is a list of entities I am looking for
    .flatMap(
        __.aggregate("match")
            .sideEffect(
                // The logic that applies the rules lives here. It will add items to "match" collection.
            )
            .select("match")
            .fold()
    )
    .toStream()
    ...
The result should be a list of lists of entities where the first list of entities in the outer list contains results for the first entity in entityRefs, and the second list of entities contains results for the second entity in entityRefs.
Example:
I want to fetch the vertices for entity refs A and B and their related entities.
Let's say I expect the results to then be [[A, C], [B, D, E]], but I get the results [[A, C], [A, C, B, D, E]] (The second results contain the results from the first one).
Questions:
Is there a way to clear the "match" collection after the selection?
Is there a way to have dynamic side effect keys such that I create a collection for each entityRef?
Is there perhaps a different way I can do this?
Have I misidentified the problem?
EDIT:
This is an example that is a miniature version of the problem. The graph is set up like so:
g.addV('entity').property('id',1).property('type', 'customer').as('1').
  addV('entity').property('id',2).property('type', 'email').as('2').
  addV('entity').property('id',6).property('type', 'customer').as('6').
  addV('entity').property('id',3).property('type', 'product').as('3').
  addV('entity').property('id',4).property('type', 'subLocation').as('4').
  addV('entity').property('id',7).property('type', 'location').as('7').
  addV('entity').property('id',5).property('type', 'productGroup').as('5').
  addE('AKA').from('1').to('2').
  addE('AKA').from('2').to('6').
  addE('HOSTED_AT').from('3').to('4').
  addE('LOCATED_AT').from('4').to('7').
  addE('PART_OF').from('3').to('5').iterate()
I want to fetch a batch of entities, given their ids and fetch related entities. Which related entities should be returned is a function of the type of the original entity.
My current query is like this (slightly modified for this example):
g.V().
  hasLabel('entity').
  has('id', P.within(1,3)).
  flatMap(
    aggregate('match').
    sideEffect(
      choose(values('type')).
        option('customer',
          both('AKA').
          has('type', P.within('email', 'phone')).
          sideEffect(
            has('type', 'email').
            aggregate('match')).
          both('AKA').
          has('type', 'customer').
          aggregate('match')).
        option('product',
          bothE('HOSTED_AT', 'PART_OF').
          choose(label()).
            option('PART_OF',
              bothV().
              has('type', P.eq('productGroup')).
              aggregate('match')).
            option('HOSTED_AT',
              bothV().
              has('type', P.eq('subLocation')).
              aggregate('match').
              both('LOCATED_AT').
              has('type', P.eq('location')).
              aggregate('match')))
    ).
    select('match').
    unfold().
    dedup().
    values('id').
    fold()
  ).
  toList()
If I only fetch for one entity I get correct results. For id: 1 I get [1,2,6] and for id: 3 I get [3,5,4,7]. However, when I fetch for both I get:
==>[3,5,4,7]
==>[3,5,4,7,1,2,6]
The first result is correct, but the second contains the results for both ids.

You can leverage the group().by(key).by(value) traversal step, which is not too well documented, to be honest, but seems powerful.
That way you can drop the aggregate() side effect step that is causing you trouble. As an alternative way to collect multiple vertices matching some traversal into a list, I used union().
An example that uses the graph you posted (I only included the Customer option for brevity):
g.V().
  hasLabel("entity").
  has("id", P.within(1, 3)).
  <String, List<Entity>>group()
    .by("id")
    .by(choose(values("type"))
      .option("customer", union(
          identity(),
          both("AKA").has("type", "email"),
          both("AKA").has("type", P.within("email", "phone")).both("AKA").has("type", "customer"))
        .map((traversal) -> new Entity(traversal.get())) // Or whatever business class you have
        .fold()) // This is important to collect all 3 paths in the union together
      .option("product", union())) // product rules omitted for brevity
    .next()
This traversal has the obvious drawback of being a bit more verbose: it declares the step over 'AKA' from a Customer twice, whereas your traversal only declared it once.
It does, however, keep the by(value) part of the group() step separate between different keys, which is what we wanted.
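To make the difference concrete, here is a minimal plain-Java model of the two behaviours. It does not use TinkerPop at all: the class AggregateVsGroup, its method names and the hard-coded related() helper are hypothetical, and related() simply returns the ids the question expects for vertices 1 and 3 of the miniature graph. It only sketches why one shared accumulator leaks results between start vertices while a per-key grouping does not.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class AggregateVsGroup {
    // Hypothetical stand-in for the per-type rules: the related ids the
    // question expects for each start vertex of the miniature graph.
    static List<Integer> related(int id) {
        switch (id) {
            case 1: return List.of(1, 2, 6);
            case 3: return List.of(3, 5, 4, 7);
            default: return List.of(id);
        }
    }

    // Models aggregate('match'): one accumulator shared by every start vertex,
    // so each later result also contains everything collected before it.
    static List<List<Integer>> sharedAccumulator(List<Integer> ids) {
        List<Integer> match = new ArrayList<>();
        List<List<Integer>> results = new ArrayList<>();
        for (int id : ids) {
            match.addAll(related(id));
            results.add(new ArrayList<>(match)); // snapshot, like select('match')
        }
        return results;
    }

    // Models group().by(key).by(value): every key gets its own value list.
    static Map<Integer, List<Integer>> groupedByKey(List<Integer> ids) {
        Map<Integer, List<Integer>> results = new LinkedHashMap<>();
        for (int id : ids) {
            results.put(id, new ArrayList<>(related(id)));
        }
        return results;
    }

    public static void main(String[] args) {
        System.out.println(sharedAccumulator(List.of(3, 1)));
        System.out.println(groupedByKey(List.of(3, 1)));
    }
}
```

With the start order 3, 1 the shared accumulator reproduces the cumulative second list from the question, while the grouped version keeps [3, 5, 4, 7] and [1, 2, 6] apart.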

Related

How can I limit and sort on document ID in firestore?

I have a collection where the documents are uniquely identified by a date, and I want to get the n most recent documents. My first thought was to use the date as a document ID, and then my query would sort by ID in descending order. Something like .orderBy(FieldPath.documentId, descending: true).limit(n). This does not work, because it requires an index, which can't be created because __name__ only indexes are not supported.
My next attempt was to use .limitToLast(n) with the default sort, which is documented here.
By default, Cloud Firestore retrieves all documents that satisfy the query in ascending order by document ID
According to that snippet from the docs, .limitToLast(n) should work. However, because I didn't specify a sort, it says I can't limit the results. To fix this, I tried .orderBy(FieldPath.documentId).limitToLast(n), which should be equivalent. This, for some reason, gives me an error saying I need an index. I can't create it for the same reason I couldn't create the previous one, but I don't think I should need to because they must already have an index like that in order to implement the default ordering.
Should I just give up and copy the document ID into the document as a field, so I can sort that way? I know it should be easy from an algorithms perspective to do what I'm trying to do, but I haven't been able to figure out how to do it using the API. Am I missing something?
Edit: I didn't realize this was important, but I'm using the flutterfire firestore library.
A few points. It is ALWAYS a good practice to use random, well distributed documentId's in firestore for scale and efficiency. Related to that, there is effectively NO WAY to query by documentId, and the few circumstances where you can use it are VERY tricky (especially a range, which is possible but requires inequalities, and you can only do inequalities on one field). IF there's a reason to search on an ID, yes it is PERFECTLY appropriate to store it in the document as well - in fact, my wrapper library always does this.
The correct notation, btw, would be FieldPath.documentId() (method, not constant) - alternatively, __name__ - but I believe this only works in Queries. The reason it requested a new index is that without the () it assumed you had a field named FieldPath with a subfield named documentId.
Further: FieldPath.documentId() does NOT generate the documentId at the server - it generates the FULL PATH to the document - see Firestore collection group query on documentId for a more complete explanation.
So net:
=> documentId's should be as random as possible within a collection; it's generally best to let Firestore generate them for you.
=> a valid exception is when you have ONE AND ONLY ONE sub-document under another - for example, every "user" document might have one and only one "forms of Id" document as a subcollection. It is valid to use the SAME ID as the parent document in this exceptional case.
=> anything you want to query should be a FIELD in a document, and generally simple fields.
=> WORD TO THE WISE: Firestore "arrays" are ABSOLUTELY NOT ARRAYS. They are ORDERED LISTS, generally in the order they were added to the array. The SDK presents them to the CLIENT as arrays, but Firestore itself does not STORE them as ACTUAL ARRAYS - THE NUMBER YOU SEE IN THE CONSOLE is the order, not an index. Matching elements in an array (arrayContains, e.g.) requires matching the WHOLE element - if you store an ordered list of objects, you CANNOT query the "array" on sub-elements.
From what I've found:
FieldPath.documentId does not match on the documentId, but on the refPath (which it gets automatically if passed a document reference).
As such, since the documents are to be sorted by timestamp, it would be better to create a timestamp field value for createdAt rather than rely on a human-readable string, which is sorted character by character as text rather than by the date it represents.
From there, you can simply sort by date and limit to last. You can keep the document ID's as you intend.
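The difference between the two orderings can be sketched in plain Java. This is only an illustration and does not call Firestore: the class RecentDocs, the Doc shape and the sample ids are made up, and only the createdAt field name comes from the answer above. It shows that ordering human-readable, non-padded date strings compares characters, whereas ordering on a real timestamp compares dates.

```java
import java.time.Instant;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

final class RecentDocs {
    // Hypothetical document shape: a human-readable id plus a createdAt field.
    static final class Doc {
        final String id;
        final Instant createdAt;
        Doc(String id, Instant createdAt) {
            this.id = id;
            this.createdAt = createdAt;
        }
    }

    // Made-up sample data with non-padded date strings as ids.
    static List<Doc> sample() {
        return List.of(
                new Doc("2021-9-30", Instant.parse("2021-09-30T00:00:00Z")),
                new Doc("2021-10-1", Instant.parse("2021-10-01T00:00:00Z")),
                new Doc("2021-10-2", Instant.parse("2021-10-02T00:00:00Z")));
    }

    // Order by the timestamp field, newest first, and keep the first n ids.
    static List<String> mostRecentIds(List<Doc> docs, int n) {
        return docs.stream()
                .sorted(Comparator.comparing((Doc d) -> d.createdAt).reversed())
                .limit(n)
                .map(d -> d.id)
                .collect(Collectors.toList());
    }

    // Order by the string id instead: comparison is character by character.
    static List<String> topIdsByString(List<Doc> docs, int n) {
        return docs.stream()
                .map(d -> d.id)
                .sorted(Comparator.reverseOrder())
                .limit(n)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(mostRecentIds(sample(), 2));  // ordered by date
        System.out.println(topIdsByString(sample(), 2)); // ordered as text
    }
}
```

In this sample the timestamp ordering returns the two October documents, while the string ordering ranks "2021-9-30" first because '9' sorts after '1'. Zero-padded ISO dates would not have this problem, but a dedicated timestamp field avoids relying on the id format at all.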

Gremlin, how to return all vertex pairs that are connected by an edge with a specific label

Take a simple example of an airline connection graph, as in the picture below.
Can we come up with a Gremlin query that returns the pairs of cities connected by SW? Like [{ATL,CHI},{SFO,CHI},{DAL,CHI},{HSV,DAL}]
Looks like all you probably need is:
g.V().outE('SW').inV().path()
If you don't want the edge in the result you can use a flatMap:
g.V().flatMap(outE('SW').inV()).path()
To get back some properties rather than just vertices all you need to do is add a by modulator to the path step.
g.V().flatMap(outE('SW').inV()).path().by(valueMap())
This will return all the properties for every vertex. In a large result set this is not considered a best practice and you should explicitly ask for the properties you care about. There are many ways you can do this using values, project or valueMap. If you had a property called code representing the airport code you might do this.
g.V().
  flatMap(outE('SW').inV()).
  path().
    by(valueMap('code'))
or just
g.V().flatMap(outE('SW').inV()).
  path().
    by('code')
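For readers who want to see what the last traversal computes, here is a small plain-Java model. It is not Gremlin and does not use TinkerPop: the class LabeledEdges and the Edge type are hypothetical, and since the picture is not available, the SW edges and their directions are assumed from the pairs listed in the question. The extra DL edge is made up to show the label filter.

```java
import java.util.ArrayList;
import java.util.List;

final class LabeledEdges {
    // Hypothetical edge record: source airport code, label, target airport code.
    static final class Edge {
        final String from;
        final String label;
        final String to;
        Edge(String from, String label, String to) {
            this.from = from;
            this.label = label;
            this.to = to;
        }
    }

    // Assumed sample data, reconstructed from the pairs in the question.
    static List<Edge> sample() {
        return List.of(
                new Edge("ATL", "SW", "CHI"),
                new Edge("SFO", "SW", "CHI"),
                new Edge("DAL", "SW", "CHI"),
                new Edge("HSV", "SW", "DAL"),
                new Edge("ATL", "DL", "SFO")); // made-up edge with another label
    }

    // Roughly what flatMap(outE(label).inV()).path().by('code') yields:
    // one [from, to] pair of codes per edge carrying the given label.
    static List<List<String>> pairs(List<Edge> edges, String label) {
        List<List<String>> out = new ArrayList<>();
        for (Edge e : edges) {
            if (e.label.equals(label)) {
                out.add(List.of(e.from, e.to));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(pairs(sample(), "SW"));
    }
}
```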

User Object in a one-to-one relationship using primary key shared with foreign key

Iterations of this question have been asked in the past, but this presents unique challenges as it combines some of the issues in one larger problem.
I have an entity (User) that is used as the user class in my application, and another entity (UserExtra) in a one-to-one relationship with it. UserExtra's id is the same as User's: the foreign key is also the primary key.
When the user object is loaded (say by $this->getUser() or by {{ app.user }}), the UserExtra data is also loaded through a join. The whole point of having two entities is so I don't have to load all the data at once.
I even tried defining a custom UserLoaderInterface/UserProviderInterface Repository for User, making sure that refreshUser and loadUserByUsername would only load the User data (I'd like for the UserExtra data to sit in a proxy unless I explicitly need it) but when Doctrine goes to Hydrate the object, it issues an extra query to load the UserExtra data, thereby skipping the Proxy status.
Is there a way out of this?
There are many solutions to your issue:
1) Change the owning side and inverse side http://developer.happyr.com/choose-owning-side-in-onetoone-relation - I don't think that's right from a DB design perspective every time.
2) In functions like find, findAll, etc., the inverse side in OneToOne is joined automatically (it's always like fetch EAGER). But in DQL it does not work like fetch EAGER, and that costs additional queries. A possible solution is to join with the inverse entity every time.
3) If an alternative result format (i.e. getArrayResult()) is sufficient for some use-cases, that could also avoid this problem.
4) Change the inverse side to be OneToMany - just looks wrong, maybe could be a temporary workaround.
5) Force partial objects. No additional queries but also no lazy-loading: $query->setHint(Query::HINT_FORCE_PARTIAL_LOAD, true) - seems to me the only possible solution, but not without a price:
Partial objects are a little bit risky, because your entity does not behave normally. For example, if you do not specify in ->select() all the associations that you will use, you can get an error because your object will not be complete; all associations that were not explicitly selected will be null.
6) Not mapping the inverse bi-directional OneToOne association and either using an explicit service or a more active record approach - https://github.com/doctrine/doctrine2/pull/970#issuecomment-38383961 - and it looks like Doctrine closed the issue.
This question may help you: one to one relation load

How to handle duplicates in disconnected object graph?

I'm having a problem updating a disconnected POCO model in an ASP.NET application.
Let's say we have the following model:
Users
Districts
Orders
A user can be responsible for 0 or more districts, an order belongs to a district and a user can be the owner of an order.
When the user logs in, the user and the related districts are loaded. Later the user loads an order and sets himself as the owner of the order. The user (and related districts) and the order (and related district) are loaded in two different calls with two different DbContexts. When I save the order after the user has assigned himself to it, I get an exception saying that AcceptChanges cannot continue because the object's key values conflict with another object.
Which is not strange, since the same district can appear both in the list of districts the user is responsible for and on the order.
I've searched high and low for a solution to this problem, but the answers I have found seem to be one of:
1. Don't load the related entities of one of the objects; in my case that would be the districts of the user.
2. Don't assign the user to the order by using the objects, just set the foreign key id on the order object.
3. Use nHibernate, since it apparently handles it.
I tried 1 and that works, but I feel this is wrong because I then either have to load the user without its districts before relating it to the order, or do a shallow clone. This is fine for this simple case here, but the problem is that in my case a district might appear several more times in the graph. Also it seems pointless: since I have the objects, why not let me connect them and update the graph? The reason I need the entire graph for the order is that I need to display all the information to the user. So since I have all the objects, why should I need to either reload or shallow clone them to get this to work?
I tried using STE but I ran in to the same problem, since I cannot attach an object to a graph loaded by another context. So I am back at square 1.
I would assume that this is a common problem in anything but tutorial code. Yet I cannot seem to find any good solution to it, which makes me think that either I fundamentally do not understand how to use POCOs/EF or I suck at using Google to find an answer to this problem.
I've bought both of the "Programming Entity Framework" books from O'Reilly by Julia Lerman but cannot seem to find anything to solve my problem in those books either.
Is there anyone out there who can shed some light on how to handle graphs where some objects might be repeated and not necessarily loaded from the same context?
The reason why EF does not allow to have two entities with the same key being attached to a context is that EF cannot know which one is "valid". For example: You could have two District objects in your object graph, both with a key Id = 1, but the two have different Name property values. Which one represents the data that have to be saved to the database?
Now, you could say that it doesn't matter if both objects haven't changed, you just want to attach them to a context in state Unchanged, maybe to establish a relationship to another entity. It is true in this special case that duplicates might not be a problem. But I think, it is simply too complex to deal with all situations and different states the objects could have to decide if duplicate objects are causing ambiguities or not.
Anyway, EF implements a strict identity mapping between object reference identity and key property values and just doesn't allow to have more than one entity with a given key attached to a context.
I don't think there is a general solution for this kind of problem. I can only add a few more ideas in addition to the solutions in your question:
Attach the User to the context you are loading the order in:
context.Users.Attach(user); // attaches user AND user.Districts
var order = context.Orders.Include("Districts")
    .Single(o => o.Id == someOrderId);
// because the user's Districts are attached, no District with the same key
// will be loaded again, EF will use the already attached Districts to
// populate the order.Districts collection, thus avoiding duplicate Districts
order.Owner = user;
context.SaveChanges();
// it should work without exception
Attach only the entities to the context you need in order to perform a special update:
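using (var context = new MyContext())
{
    // "order" and "user" are the detached objects; only their keys are needed,
    // so attach lightweight stubs instead of the full object graph
    var orderStub = new Order { Id = order.Id };
    context.Orders.Attach(orderStub);
    var userStub = new User { Id = user.Id };
    context.Users.Attach(userStub);
    orderStub.Owner = userStub;
    context.SaveChanges();
}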
This would be enough to update the Owner relationship. You would not need the whole object graph for this procedure; you only need the correct primary key values of the entities the relationship has to be created for. It doesn't work that easily, of course, if you have more changes to save or don't know exactly what could have been changed.
Don't attach the object graph to the context at all. Instead load new entities from the database that represent the object graph currently stored in the database. Then update the loaded graph with your detached object graph and save the changes applied to the loaded (=attached) graph. An example of this procedure is shown here. It is safe and a very general pattern (but not generic) but it can be very complex for complex object graphs.
Traverse the object graph and replace the duplicate objects with a unique one, for example just the first one you have found with a given type and key. You could build a dictionary of unique objects that you look up to replace the duplicates. An example is here.
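The dictionary idea from the last suggestion can be sketched as follows. This is a language-neutral illustration written in plain Java rather than C#, and it does not touch Entity Framework: the class GraphDeduper, the simplified District, User and Order types and the "first instance seen wins" policy are all assumptions made for the example. It also assumes that duplicates with the same key carry the same data, which is exactly the ambiguity described above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class GraphDeduper {
    // Simplified, hypothetical model classes.
    static final class District {
        final int id;
        final String name;
        District(int id, String name) { this.id = id; this.name = name; }
    }

    static final class User {
        final int id;
        final List<District> districts = new ArrayList<>();
        User(int id) { this.id = id; }
    }

    static final class Order {
        final int id;
        District district;
        User owner;
        Order(int id) { this.id = id; }
    }

    // One canonical instance per (type, key); the first instance seen wins.
    private final Map<String, Object> seen = new HashMap<>();

    <T> T canonical(Class<T> type, int key, T candidate) {
        Object first = seen.computeIfAbsent(type.getSimpleName() + ":" + key, k -> candidate);
        return type.cast(first);
    }

    // Walk the graph reachable from the order and swap in canonical instances.
    void dedupe(Order order) {
        if (order.district != null) {
            order.district = canonical(District.class, order.district.id, order.district);
        }
        if (order.owner != null) {
            order.owner.districts.replaceAll(d -> canonical(District.class, d.id, d));
        }
    }
}
```

After dedupe(order), the order and its owner reference the same District object for any shared key, so the graph no longer contains two instances with the same identity. A real graph would need the same treatment for every entity type and every navigation property that can be reached.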

Link Tables - Code First - Entity Framework - Table-Mapping

I asked a related question previously on this forum. This question outlines the steps I have taken, different things I have tried and errors I have encountered. It might help someone.
Considering a mapping to a structure that involves a link table, it seems to me there is a quirk with Code First, Link Tables and TPH or perhaps just a lack of transparency.
I created a derived class with a [Table("")] attribute to map objects to the following sort of table structures:
(Case 1) Employees -> Attributes -> AttributeTypes
(Case 2) Employees -> EmployeeAttributeLink -> Attributes -> AttributeTypes
In the former case I achieved the results I wanted for TPH querying and saving. (I used data annotation attributes along with the fluent API to map the derived classes to the correct discriminator id column.)
However, in the second case I got this error:
'The entity types A and B cannot share table B because they are not in the same type hierarchy [OR] do not have a valid one to one foreign key relationship with matching primary keys between them. Need to have a 1-1 correspondence'
When I looked at the names of the tables that it was trying to map types to I could see it was confused. However I could not figure out what I was doing wrong as I had used the correct table mapping attribute above the inheriting classes (I didn't define all the sub types that could come from the discriminator - does that matter?).
I introduced some FK attributes, trying to address the second part of the [OR] in the error message. This led me to new problems, i.e. unable to determine principal/dependent... Then I tried to use the [InverseProperty] attributes... And then I started pulling my hair out.
Now, rolling back and removing the attributes, I decided not to rely on [Table("")] attribute and map the type to the table using the fluent api. This seems to work.
My question is: why do the [Table("")] attribute and the ToTable function on the fluent API behave differently? I would have thought they were interchangeable.
Thanks
