What is the best way to manage relations in ElasticSearch?

Sorry if this question has already been asked, but I couldn't find a clear answer on this subject.
I'm having trouble creating my Elasticsearch index; I'm not sure how to manage relations properly.
Let's say we have the following entities:
Product: id, reference
Book: id, name, product_id
Shirt: id, color, product_id
StockItem: id, supplier_id, product_id, quantity
I'd like to:
Find a shirt by its color
Find all books supplied by supplier_id 5
I wasn't able to work out whether I'm supposed to use multiple queries, nested objects, parent/child relations, etc. I couldn't find a proper tutorial that says "do it this way".
Currently I'm using nested objects, but I find it quite dirty to redefine, in each of my types, all the data I need.
Do you have any advice on this?
Thanks.

The key to searching and modeling relationships in Elasticsearch is to denormalize. This is because Lucene has a flat data model with no built-in support for relationships in your data.
Think of it from the perspective of your search results. What is the thing being searched for? What shows up in your search results? That is the thing you are searching against. If you want to filter or sort those things based on the values in a related object, then you need to pull those values in at indexing time.
If you're searching for shirts and want to filter by color, then your shirt documents should all have a color field on them. If you are searching books and want to filter to a certain supplier, then you should include the supplier name or ID as a field on your book documents.
Your choice of language and ES client may make this easier. For example, in Ruby, you can index the results of arbitrary method calls, allowing you to dynamically fetch from other associated models while indexing your data.
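To make the denormalization idea concrete, here is a minimal sketch that flattens the question's relational rows into one search document per book and per shirt, then builds the Elasticsearch query bodies for the two use cases. It assumes in-memory sample rows (all values are hypothetical), a hypothetical `supplier_ids` field on book documents, and that `color` is mapped as a `keyword` field so the `term` filter matches exactly.

```python
from typing import Dict, List

# Hypothetical sample rows mirroring the question's tables.
products = {1: {"id": 1, "reference": "BK-001"}, 2: {"id": 2, "reference": "SH-001"}}
books = [{"id": 10, "name": "Dune", "product_id": 1}]
shirts = [{"id": 20, "color": "red", "product_id": 2}]
stock_items = [
    {"id": 100, "supplier_id": 5, "product_id": 1, "quantity": 3},
    {"id": 101, "supplier_id": 7, "product_id": 1, "quantity": 2},
    {"id": 102, "supplier_id": 5, "product_id": 2, "quantity": 8},
]


def supplier_ids_for(product_id: int, stock: List[Dict]) -> List[int]:
    """Distinct supplier IDs that hold stock for one product."""
    return sorted({s["supplier_id"] for s in stock if s["product_id"] == product_id})


def book_document(book: Dict) -> Dict:
    """Flatten a book plus its product and stock rows into one search document."""
    product = products[book["product_id"]]
    return {
        "name": book["name"],
        "reference": product["reference"],
        # Pulled in at index time so "books from supplier 5" is a plain filter.
        "supplier_ids": supplier_ids_for(book["product_id"], stock_items),
    }


def shirt_document(shirt: Dict) -> Dict:
    """Shirt document carrying its own color plus the product reference."""
    product = products[shirt["product_id"]]
    return {"color": shirt["color"], "reference": product["reference"]}


def term_filter(field_name: str, value) -> Dict:
    """Query DSL body with a single exact-match filter.

    On an array field such as supplier_ids, a term query matches when any
    element equals the value.
    """
    return {"query": {"bool": {"filter": [{"term": {field_name: value}}]}}}


book_docs = [book_document(b) for b in books]
shirt_docs = [shirt_document(s) for s in shirts]
books_from_supplier_5 = term_filter("supplier_ids", 5)
red_shirts = term_filter("color", "red")
```

With the official Python client (8.x), the documents could be sent with something like `es.index(index="books", document=doc)` and searched with `es.search(index="books", query=books_from_supplier_5["query"])`; the index names here are assumptions. The trade-off is that a supplier change means reindexing the affected books, which is the usual price of denormalizing.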

Nested structures or parent/child relations are your best bet. I hope this blog post helps.
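As an illustration of the nested option, the sketch below builds an index mapping that embeds stock items in each book as `nested` objects, plus a `nested` query for books that supplier 5 currently has in stock. The `stock_items` field name and the in-stock condition are assumptions for illustration. The point of `nested` is that both conditions must match inside the same stock item; with a plain object array, Elasticsearch flattens the values and the supplier and quantity could match different items.

```python
from typing import Dict


def book_index_mapping() -> Dict:
    """Mapping with stock items stored as nested objects on each book."""
    return {
        "mappings": {
            "properties": {
                "name": {"type": "text"},
                "stock_items": {
                    # "nested" indexes each stock item as its own hidden
                    # document, keeping supplier_id and quantity paired.
                    "type": "nested",
                    "properties": {
                        "supplier_id": {"type": "integer"},
                        "quantity": {"type": "integer"},
                    },
                },
            }
        }
    }


def books_in_stock_from_supplier(supplier_id: int) -> Dict:
    """Books with at least one stock item from this supplier and quantity > 0."""
    return {
        "query": {
            "nested": {
                "path": "stock_items",
                "query": {
                    "bool": {
                        "filter": [
                            {"term": {"stock_items.supplier_id": supplier_id}},
                            {"range": {"stock_items.quantity": {"gt": 0}}},
                        ]
                    }
                },
            }
        }
    }


mapping = book_index_mapping()
query = books_in_stock_from_supplier(5)
```

A parent/child (`join` field) layout avoids reindexing the whole book when a stock item changes, at the cost of slower queries; which one fits depends on how often stock is updated.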

Related

Firebase Firestore Database design

I'm looking for the simplest database structure to store a given day, food type, food name and food price.
The part giving me trouble is that I feel the food name and food price need to be "tied together".
The pictures below are what I have experimented with, but I feel like I am over engineering the structure.

I think you need a better understanding of Firestore's concepts of collections and documents; this two-level structure often confuses developers who are more familiar with SQL-like databases.
A document contains fields and potentially collections. A collection is always a collection of documents.
So in your case, you might make a collection called days which contains the documents 'monday', 'tuesday', ...
The monday document might contain some other fields (the chef, the date, etc.) and a collection (of documents) called 'dessert'.
Then each of your dessert documents might include fields for description, price, etc.
So in general, if you are trying to define a list of multiple similar objects, you want a collection of documents. If you want to define fields within an object, store them in a document.
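The layout described above can be sketched as a list of document paths and the fields stored at each one. This is a minimal illustration with hypothetical sample data (chef name, dates, dessert IDs); it only shows how paths alternate collection/document/collection/document.

```python
from typing import Dict, List, Tuple


def plan_writes(menu: Dict[str, Dict]) -> List[Tuple[str, Dict]]:
    """Turn a nested menu description into (document_path, fields) pairs.

    Paths alternate collection/document, so a dessert lives at
    days/{day}/dessert/{dessert_id}.
    """
    writes = []
    for day, info in menu.items():
        day_path = f"days/{day}"
        # Day-level fields such as the chef and date go on the day document.
        writes.append((day_path, info["fields"]))
        # Each dessert is its own document in the day's 'dessert' subcollection,
        # which keeps its name and price together.
        for dessert_id, fields in info.get("dessert", {}).items():
            writes.append((f"{day_path}/dessert/{dessert_id}", fields))
    return writes


# Hypothetical sample data.
menu = {
    "monday": {
        "fields": {"chef": "Alice", "date": "2024-01-01"},
        "dessert": {
            "apple-pie": {"description": "Apple pie", "price": 4.5},
            "brownie": {"description": "Chocolate brownie", "price": 3.0},
        },
    }
}

writes = plan_writes(menu)
```

With the google-cloud-firestore client, each pair could then be written with something like `db.document(path).set(fields)`, since `Client.document` accepts a slash-delimited path.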

Firestore data modeling for library books + wishlist data

I'm working on a library app and am using Firestore with the following two (simplified) collections, books and wishes:
Book
- locationIds[] # Libraries where the book is in stock
Wish
- userId # User who has wishlisted a book
- bookId # Book that was wishlisted
The challenge: I would like to be able to make a query which gets a list of all Book IDs which have been wishlisted by a user AND are currently available in a library.
I can imagine two ways to solve this:
APPROACH 1
Copy the locationIds[] array to each Wish, containing the IDs of every location having a copy of that book.
My query would then be (pseudocode):
collection('wishes')
.where('userId' equals myUserId)
.where('locationIds' contains myLocationId)
But I expect my Wishes collection to be pretty large, and I don't like the idea of having to update the locationIds[] of all (maybe thousands) of wishes whenever a book's location changes.
APPROACH 2
Add a wishers[] array to each Book, containing the IDs of every user who has wishlisted it.
Then the query would look something like:
collection('books')
.where('locationIds' contains myLocationId)
.where('wishers' contains myUserId)
The problem with this is that the wishers array for a particular book may grow pretty huge (I'd like to support thousands of wishes on each book), and then this becomes a mess.
Help needed
In my opinion, neither of these approaches is ideal. If I had to pick one, I would probably go with Approach 1, simply because I don't want my Book object to contain such a huge array.
I'm sure I'm not the first person to come across this sort of problem, is there a better way?

You could try splitting the query into two separate requests. For instance, in pseudocode:
wishes = db.collection('wishes').where('userId', '==', myUserId)
book_ids = [wish.bookId for wish in wishes]
books = db.collection('books').where('bookId', 'in', book_ids)
result = [book.bookId for book in books if book.locationIds]
Note that this is just an example: the code probably doesn't work as-is, since I haven't tested it, and the 'in' operator only supported 10 values at the time of writing. But you get the idea. A good improvement would be to store the length of locationIds, or whether it is empty, in a separate attribute, so you could skip the last filtering step by querying the books with:
books = db.collection('books').where('bookId', 'in', book_ids).where('hasLocations', '==', True)
You would still have to iterate over the results to extract only the bookId.
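The two-request idea can be sketched with plain in-memory data so it runs without Firestore. Each call to the hypothetical `fetch_books_in` helper stands in for one `where('bookId', 'in', ids)` query, and the book IDs are split into batches because of the 'in' limit. The limit of 10 is the one quoted in the answer; check the current Firestore documentation, since it has changed over time. The sample data is hypothetical.

```python
from typing import Dict, Iterator, List, Optional

IN_LIMIT = 10  # limit quoted in the answer; verify against current docs


def chunks(items: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def fetch_books_in(books: Dict[str, Dict], ids: List[str]) -> Dict[str, Dict]:
    """Stand-in for one where('bookId', 'in', ids) query."""
    return {book_id: books[book_id] for book_id in ids if book_id in books}


def wished_and_available(user_id: str, wishes: List[Dict], books: Dict[str, Dict],
                         location_id: Optional[str] = None) -> List[str]:
    # Request 1: all wishes for this user, reduced to book IDs.
    book_ids = [w["bookId"] for w in wishes if w["userId"] == user_id]
    result = []
    # Request 2, batched: the wished books, at most IN_LIMIT IDs per query.
    for batch in chunks(book_ids, IN_LIMIT):
        for book_id, book in fetch_books_in(books, batch).items():
            locations = book["locationIds"]
            # Without a location, keep any book in stock somewhere (as in the answer).
            in_stock = location_id in locations if location_id else bool(locations)
            if in_stock:
                result.append(book_id)
    return result


# Hypothetical sample data.
books = {
    "b1": {"locationIds": ["lib1"]},
    "b2": {"locationIds": []},
    "b3": {"locationIds": ["lib1", "lib2"]},
}
wishes = [
    {"userId": "u1", "bookId": "b1"},
    {"userId": "u1", "bookId": "b2"},
    {"userId": "u1", "bookId": "b3"},
    {"userId": "u2", "bookId": "b1"},
]
```

The cost grows with the number of wishes per user rather than with the number of wishers per book, which is why this avoids both of the question's large-array problems.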
Also, you should avoid using arrays in Firestore since it doesn't have native support for them, as explained in their blog.
Is it mandatory to use NoSQL? You might be able to model this M:M relation better in SQL. Bear in mind that I'm no database expert, though.
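For comparison, here is a minimal relational sketch using Python's built-in sqlite3 module: one join table for where each book is stocked and one for wishes, so the question's query becomes a single join. The table and column names and the sample rows are hypothetical.

```python
import sqlite3


def build_schema(conn: sqlite3.Connection) -> None:
    """Books, plus two many-to-many join tables."""
    conn.executescript("""
        CREATE TABLE books (id TEXT PRIMARY KEY);
        CREATE TABLE book_locations (
            book_id TEXT REFERENCES books(id),
            location_id TEXT,
            PRIMARY KEY (book_id, location_id)
        );
        CREATE TABLE wishes (
            user_id TEXT,
            book_id TEXT REFERENCES books(id),
            PRIMARY KEY (user_id, book_id)
        );
    """)


def wished_books_at(conn: sqlite3.Connection, user_id: str, location_id: str) -> list:
    """Book IDs the user wishlisted that are stocked at the given location."""
    rows = conn.execute("""
        SELECT w.book_id
        FROM wishes AS w
        JOIN book_locations AS bl ON bl.book_id = w.book_id
        WHERE w.user_id = ? AND bl.location_id = ?
        ORDER BY w.book_id
    """, (user_id, location_id))
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
build_schema(conn)
conn.executemany("INSERT INTO books VALUES (?)", [("b1",), ("b2",), ("b3",)])
conn.executemany("INSERT INTO book_locations VALUES (?, ?)",
                 [("b1", "lib1"), ("b3", "lib1"), ("b3", "lib2")])
conn.executemany("INSERT INTO wishes VALUES (?, ?)",
                 [("u1", "b1"), ("u1", "b2"), ("u1", "b3"), ("u2", "b3")])
```

Moving a book to another library only touches its rows in book_locations; no wish rows need updating.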

Should DynamoDB adjacency lists use discrete partition keys to model each type of relationship?

Context
I am building a forum and investigating modeling the data with DynamoDB and adjacency lists. Some top-level entities (like users) might have multiple types of relationships with other top-level entities (like comments).
Requirements
For example, let's say we want to be able to do the following:
Users can like comments
Users can follow comments
Comments can display users that like it
Comments can display users that follow it
User profiles can show comments they like
User profiles can show comments they follow
So, we essentially have a many-to-many (user <=> comment) to many (like or follow).
Note: This example is deliberately stripped down, and in practice there will be many more relationships to model, so I'm trying to think of something extensible here.
Baseline
The following top-level data would likely be common in any adjacency list representation:
First_id (partition key)  Second_id (sort key)  Data
------------------------  --------------------  ------------
User-Harry                User-Harry            User data
User-Ron                  User-Ron              User data
User-Hermione             User-Hermione         User data
Comment-A                 Comment-A             Comment data
Comment-B                 Comment-B             Comment data
Comment-C                 Comment-C             Comment data
Furthermore, for each table below, there would be an equivalent Global Secondary Index with the partition and sort keys swapped.
Example Data
This is what I would like to model in DynamoDB:
Harry likes comment A
Harry likes comment B
Harry follows comment A
Ron likes comment B
Hermione follows comment C
Option 1
Use a third attribute to define the type of relationship:
First_id (partition key)  Second_id (sort key)  Data
------------------------  --------------------  ---------
Comment-A                 User-Harry            "LIKES"
Comment-B                 User-Harry            "LIKES"
Comment-A                 User-Harry            "FOLLOWS"
Comment-B                 User-Ron              "LIKES"
Comment-C                 User-Hermione         "FOLLOWS"
The downside to this approach is that query results contain redundant information, because they return extra items you may not care about. For example, if you want to query all the users that like a given comment, you also have to process all the users that follow that comment. Likewise, if you want to query all the comments that a user likes, you also have to process all the comments that the user follows.
Option 2
Modify the keys to represent the relationship:
First_id (partition key)  Second_id (sort key)
------------------------  --------------------
LikeComment-A             LikeUser-Harry
LikeComment-B             LikeUser-Harry
FollowComment-A           FollowUser-Harry
LikeComment-B             LikeUser-Ron
FollowComment-C           FollowUser-Hermione
This makes it efficient to query independently:
Comment likes
Comment follows
User likes
User follows
The downside is that the same top-level entity now has multiple keys, which might make things complex as more relationships are added.
Option 3
Skip adjacency lists altogether and use separate tables, maybe one for Users, one for Likes, and one for Follows.
Option 4
Traditional relational database. While I'm not planning on going this route because this is a personal project and I want to explore DynamoDB, if this is the right way to think about things, I'd love to hear why.
Conclusion
Thanks for reading this far! If there is anything I can do to simplify the question or clarify anything, please let me know :)
I've looked at the AWS best practices and this many-to-many SO post and neither appears to address the many-to-many (with many) relationship, so any resources or guidance greatly appreciated.

Your Option 1 is not possible because it does not have unique primary keys. In your sample data, you can see that you have two entries for (Comment-A, User-Harry).
Solution 1
The way to implement what you are looking for is by using slightly different attributes for your table and the GSI. If Harry likes Comment A, then your attributes should be:
hash_key: User-Harry
gsi_hash_key: Comment-A
sort_key_for_both: Likes-User-Harry-Comment-A
Now you have only one partition key value for your top level entities in both the table and the GSI, and you can query for a specific relationship type by using the begins_with operator.
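The key scheme of Solution 1 can be simulated in plain Python to check that it gives unique primary keys and supports per-relationship queries. The `query` helper is a hypothetical stand-in that mimics a DynamoDB Query with `partition = :v AND begins_with(sort_key_for_both, :prefix)`; attribute names follow the answer, and the items use the question's example data.

```python
from typing import Dict, List


def relation_item(relation: str, user: str, comment: str) -> Dict[str, str]:
    """One item per relationship, keyed as in Solution 1."""
    return {
        "hash_key": user,             # table partition key
        "gsi_hash_key": comment,      # GSI partition key
        # Shared sort key: relation type first so begins_with can select it.
        "sort_key_for_both": f"{relation}-{user}-{comment}",
    }


def query(items: List[Dict[str, str]], partition_attr: str, partition_value: str,
          sort_prefix: str) -> List[Dict[str, str]]:
    """Mimic: partition_attr = :v AND begins_with(sort_key_for_both, :prefix)."""
    return [
        item for item in items
        if item[partition_attr] == partition_value
        and item["sort_key_for_both"].startswith(sort_prefix)
    ]


items = [
    relation_item("Likes", "User-Harry", "Comment-A"),
    relation_item("Likes", "User-Harry", "Comment-B"),
    relation_item("Follows", "User-Harry", "Comment-A"),
    relation_item("Likes", "User-Ron", "Comment-B"),
]

# Table query: comments Harry likes (his follows are not returned).
harry_likes = [i["gsi_hash_key"] for i in query(items, "hash_key", "User-Harry", "Likes-")]
# GSI query: users who like Comment-B.
comment_b_likers = [i["hash_key"] for i in query(items, "gsi_hash_key", "Comment-B", "Likes-")]
```

Against a real table, the boto3 resource API expresses the same condition as `Key("hash_key").eq("User-Harry") & Key("sort_key_for_both").begins_with("Likes-")` passed to `table.query(KeyConditionExpression=...)`, adding `IndexName=...` for the GSI; the table and index names would be your own.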
Solution 2
You could make the relationship a top-level entity. For example, Likes-User-Harry-Comment-A would have two entries in the database because it is “adjacent to” both User-Harry and Comment A.
This allows you flexibility if you want to model more complex information about the relationships in the future (including the ability to describe the relationship between relationships, such as Likes-User-Ron-User-Harry Causes Follows-User-Ron-User-Harry).
However, this strategy requires more items to be stored in the database, and it means that saving a “like” (so that it can be queried) is not an atomic operation. (But you can work around that by writing only the relationship entity, and then using DynamoDB Streams + Lambda to write the two entries I mentioned at the beginning of this solution.)
Update: using DynamoDB Transactions, saving a "like" in this manner can actually be a fully ACID operation.
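As a sketch of the transactional approach, the code below builds the `TransactItems` payload that writes both entries of a “like” together, in the low-level attribute-value format used by `transact_write_items`. It assumes the relationship ID is the partition key and the adjacent entity is the sort key (matching the question's baseline, with keys swapped in the GSI); the table name and the `pk`/`sk` attribute names are hypothetical. The code only builds the request, so it runs without AWS.

```python
from typing import Any, Dict, List

TABLE_NAME = "forum"  # hypothetical table name


def put_edge(relationship_id: str, entity_id: str) -> Dict[str, Any]:
    """One adjacency-list item: the relationship (pk) next to one entity (sk)."""
    return {
        "Put": {
            "TableName": TABLE_NAME,
            "Item": {"pk": {"S": relationship_id}, "sk": {"S": entity_id}},
            # Fail the whole transaction if this edge already exists.
            "ConditionExpression": "attribute_not_exists(pk)",
        }
    }


def like_transaction(user_id: str, comment_id: str) -> List[Dict[str, Any]]:
    """Both entries for a like, to be written all-or-nothing."""
    relationship_id = f"Likes-{user_id}-{comment_id}"
    return [put_edge(relationship_id, user_id), put_edge(relationship_id, comment_id)]


transact_items = like_transaction("User-Harry", "Comment-A")
```

The payload could then be sent with `boto3.client("dynamodb").transact_write_items(TransactItems=transact_items)`; if either condition fails, neither item is written.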

Drupal Entity Reference based on custom field

I hope this question hasn't already been answered; I searched through lots of topics and didn't find an answer.
Here is the problem: I'm trying to link two nodes from different content types, and I'd like to use a field other than the Title of the other node as the reference.
The reason I don't want to use the node's Title is that I'm currently building a website for a movie theater. I'm using Feeds to import movies and movie times from an XML file. These are both represented by their own content type.
Each movie has its own unique ID taken from a database, and this is the field I'd like to use for joining the two nodes.
The aim is to display a movie alongside its movie times (there is more than one movie time node per movie, because a movie can be shown in 3D or non-3D, VO or translated...).
When I try to use Entity Reference, I can't find a way to fill (for example) a text field with this unique movie ID in order to use it as a relationship in Views.
I hope my problem and question are both understandable (excuse my English...), and that some of you can help me find a solution.
Thanks in advance.

I solved this problem by grouping fields based on Film ID.
I'll just need a bit of templating to wrap my view's output around all the fields, but it shouldn't be a problem ;)

You can switch the entity reference selection method to use a view instead of "Simple". This requires the Views module to be installed.

How to design a datastore database relationship

I have a question about designing a Datastore database; I'm using Objectify and trying to get optimal performance.
I need to create two entities, List and Listings, with a relationship between them. There will be 500,000 listings in total and 50,000 per list.
Looking at this https://code.google.com/p/objectify-appengine/wiki/IntroductionToObjectify#Multi-Value_Relationship
I see there are three ways to store a relationship:
one-to-one, many-to-one and multi-value relationship.
The multi-value relationship looks like it would work great, but appears to have a limit of 5,000 entries per entity (List?)
So I assume I should use the many-to-one method, but I question its performance, since I would have to query every listing and filter.
Can I have good performance doing what I'm attempting with datastore?
Any help at all would be great!

A multi-value relationship does not perform well in this case, because each value adds a new row to that property's index, which means longer write times. It also has a limit on the number of entries. It's useful when you only have a few values to store.
There is another type of relationship: entity group.
The criteria to choose between each method also depends on the type of queries you do and the frequency of updating entities.
Based on the information you provided, I recommend a many-to-one relationship.
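To illustrate why many-to-one need not mean scanning every listing, here is a toy in-memory model: each Listing stores the key of its parent List, and a single-property index on that key serves the equality query. This is a hypothetical sketch of the idea, not Objectify or Datastore code; in Objectify the rough equivalent would be an indexed reference field on Listing that you filter on.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Listing:
    id: int
    list_key: str  # many-to-one: each Listing points at its parent List
    title: str


@dataclass
class ListingStore:
    """Toy stand-in for a datastore with a single-property index on list_key."""
    by_id: Dict[int, Listing] = field(default_factory=dict)
    index: Dict[str, List[int]] = field(default_factory=lambda: defaultdict(list))

    def put(self, listing: Listing) -> None:
        # Keep the index consistent when an existing listing is rewritten.
        old = self.by_id.get(listing.id)
        if old is not None:
            self.index[old.list_key].remove(old.id)
        self.by_id[listing.id] = listing
        self.index[listing.list_key].append(listing.id)

    def query_by_list(self, list_key: str) -> List[Listing]:
        # Equality filter served from the index: the work depends on the
        # number of matching listings, not on the total number stored.
        return [self.by_id[i] for i in self.index.get(list_key, [])]


store = ListingStore()
store.put(Listing(1, "list-a", "first"))
store.put(Listing(2, "list-a", "second"))
store.put(Listing(3, "list-b", "third"))
```

Each listing carries only one index entry for its parent key, whereas a multi-value property on List would hold up to 50,000 values in a single entity.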
