Doctrine connections in mongodb to elasticsearch index

I'm using Symfony2 with Doctrine ODM (MongoDB). I need to create an Elasticsearch index, which in itself is not hard. Here is the structure of my Product collection (abridged):
{
"_id": ObjectId("5239656f60663de206b1053e"),
"category": {
"$ref": "Category",
"$id": ObjectId("50cb515760663d3577000043"),
"$db": "<dbName>"
},
"name": "<productName>"
}
Category collection:
{
"_id": ObjectId("50cb515760663d3577000043"),
"name": "<categoryName>"
}
The category field in the Product collection has three subfields ($ref, $id, $db), which Doctrine creates as a MongoDB DBRef. I need to create an index containing only productName and categoryName.
How do I do that?
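One common approach is to denormalise before indexing: resolve the DBRef to the category's name so each indexed document carries only the two fields you want. The sketch below shows just that flattening step in plain JavaScript; the helper name toSearchDocument and the in-memory categoriesById lookup are hypothetical stand-ins for however your application actually loads categories, and it is independent of any particular Symfony bundle or Elasticsearch client.

```javascript
// Hypothetical helper (not part of Doctrine or any Elasticsearch bundle):
// flattens a Product document whose "category" field is a DBRef into the
// two fields the search index should contain.
function toSearchDocument(product, categoriesById) {
  // A DBRef keeps the target collection in $ref and the target _id in $id.
  const ref = product.category;
  const category = ref ? categoriesById.get(String(ref.$id)) : undefined;
  return {
    productName: product.name,
    // null when the reference is missing or points to a deleted category
    categoryName: category ? category.name : null,
  };
}

// Sample data mirroring the question; ObjectIds are shown as plain strings.
const categoriesById = new Map([
  ["50cb515760663d3577000043", { _id: "50cb515760663d3577000043", name: "<categoryName>" }],
]);
const product = {
  _id: "5239656f60663de206b1053e",
  category: { $ref: "Category", $id: "50cb515760663d3577000043", $db: "<dbName>" },
  name: "<productName>",
};

console.log(JSON.stringify(toSearchDocument(product, categoriesById)));
// {"productName":"<productName>","categoryName":"<categoryName>"}
```

Resolving the reference at indexing time means queries against the index never need a join; the trade-off is that renaming a category requires reindexing the products that point to it.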

Related

Indexing the partition key in Azure Cosmos DB

Suppose I have the following data in my container:
{
"id": "1DBF704E-1623-4844-DC86-EFA729A5C048",
"firstName": "Wylie",
"lastName": "Ramsey",
"country": "AZ",
"city": "Tucson"
}
I use the field "id" as the item id and the field "country" as the partition key. When I query on a specific partition key:
SELECT * FROM c WHERE c.country = "AZ"
(get all the people in "AZ")
Should I add "country" to the index, or will I get it by default, since I declared "country" as my partition key?
Is there a difference when using the SDK (that is, adding the new PartitionKey("AZ") option and then sending the query mentioned above)?

I created a collection with 50,000 records and disabled indexing on all properties.
Indexing policy:
{
"indexingMode": "consistent",
"automatic": true,
"includedPaths": [], // Included nothing
"excludedPaths": [
{
"path": "/\"_etag\"/?"
},
{
"path": "/*" // Exclude all paths
}
]
}
Querying by id cost 2.85 RUs.
Querying by PartitionKey cost 580 RUs.
Indexing policy with PartitionKey (country) added:
{
"indexingMode": "consistent",
"automatic": true,
"includedPaths": [
{
"path": "/country/*"
}
],
"excludedPaths": [
{
"path": "/\"_etag\"/?"
},
{
"path": "/*" // Exclude all paths
}
]
}
Adding an index on the PartitionKey brought it down to 2.83 RUs.
So the answer is yes: if you have disabled the default indexing policy and need to search by partition key, you should add an index for it.
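The two experiments differ only in includedPaths. As a sketch, both policies can be generated from one helper (the function name buildIndexingPolicy is hypothetical); it returns a plain object with the same shape as the JSON above, and applying it to a container through the portal, an SDK or a template is not shown.

```javascript
// Builds a Cosmos DB indexing policy object that excludes all paths except
// the ones listed, matching the shape of the policies shown above.
function buildIndexingPolicy(includedPathList) {
  return {
    indexingMode: "consistent",
    automatic: true,
    includedPaths: includedPathList.map((path) => ({ path })),
    excludedPaths: [
      { path: '/"_etag"/?' }, // system property, excluded as in the policies above
      { path: "/*" },         // exclude everything not listed in includedPaths
    ],
  };
}

// First experiment: nothing indexed (partition-key query cost 580 RUs).
const noIndexPolicy = buildIndexingPolicy([]);
// Second experiment: only the partition key indexed (2.83 RUs).
const countryPolicy = buildIndexingPolicy(["/country/*"]);

console.log(JSON.stringify(countryPolicy, null, 2));
```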

In my opinion, it's good practice to query with the partition key in the Cosmos DB SQL API; here's the official doc related to it.
By the way, the Cosmos DB SQL API indexes all properties by default. If you'd like to override the default setting and customise the indexing policy, this doc may provide more details.

JSON API automatic reverse links

Is it a violation of the JSON-API spec to allow reverse relationships to be created automatically?
I need to create resources such that when I link A to B in a relationship, B is automatically linked back to A. That way I can traverse A to find all of its Bs and find the parent A from a B. However, I don't want to POST/PATCH to two relationships to get this right; I want to establish the relationship once.
I know that how the server maintains links/references, and how this behaviour is established, is an implementation detail, but I want to build the API in a way that doesn't violate the spec.
Assume I have the resources Books and Authors. Books have Authors and Authors have Books. Once I relate an Author to a Book, I need the reverse relationship to be created as well. Is it a violation of the spec in any way to assume that this reverse relationship can be created automatically by a single POST to the Books resource's relationship?
By way of example, starting with the book.
{
"data": {
"type": "books", "id": "123", "attributes": ...,
"links": { "self": "/books/123" },
"relationships": {
"authors": {
"links": {
"self": "/books/123/relationships/authors",
"related": "/books/123/authors"
}
}
}
}
}
And the author:
{
"data": {
"type": "authors", "id": "456", "attributes": ...,
"links": { "self": "/authors/456" },
"relationships": {
"books": {
"links": {
"self": "/authors/456/relationships/books",
"related": "/authors/456/books"
}
}
}
}
}
If I establish the link from a book to an author with a POST to /books/123/relationships/authors:
{
"data": [{ "type": "authors", "id": "456" }]
}
Do I need to explicitly do the same for author 456 with a POST to /authors/456/relationships/books?
{
"data": [{ "type": "books", "id": "123" }]
}
Or can I let the server build the relationship for me so that I can avoid the second POST and just see the automatic reverse relationship at GET /authors/456/relationships/books?

From the perspective of the spec, this is only one relationship represented from two different sides: author and book have a many-to-many relationship. This relationship can be represented in the author's resource object as well as in the book's resource object, and of course also via their relationship links. In fact, it would violate the spirit of the specification if the representations didn't match. One-sided relationships are another story, but in that case one side wouldn't know about the relationship at all (e.g. a book is associated with an author, but the author model does not know which books are associated with it).
A POST to either side of that relationship creates the relationship between the two records. It shouldn't matter which side is used to create it, or whether it's created as part of creating/updating a resource via its resource object or via a relationship link representing that relationship. The same applies to deleting the relationship.
Maybe an example will make this even clearer. Let's assume a book is created with a POST to /books?include=author with this payload:
{
"data": {
"type": "books",
"relationships": {
"author": {
"data": {
"type": "authors",
"id": "1"
}
}
}
}
}
The response may look like this:
{
"data": {
"type": "books",
"id": "7",
"relationships": {
"author": {
"data": { "type": "authors", "id": "1" }
}
}
},
"included": [
{
"type": "authors",
"id": "1",
"relationships": {
"books": {
"data": [
{ "type": "books", "id": "7" }
]
}
}
}
]
}
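To make the "one relationship, two sides" point concrete, here is a minimal in-memory sketch (the class and method names are hypothetical, not from any JSON:API library) in which the server stores each book/author link exactly once, so a POST from either side is immediately visible from the other, and a delete from either side removes it.

```javascript
class BookAuthorLinks {
  constructor() {
    // Single source of truth: bookId -> Set of authorIds.
    this.authorsByBook = new Map();
  }

  // Handles POST /books/{bookId}/relationships/authors
  addAuthorsToBook(bookId, authorIds) {
    if (!this.authorsByBook.has(bookId)) this.authorsByBook.set(bookId, new Set());
    for (const authorId of authorIds) this.authorsByBook.get(bookId).add(authorId);
  }

  // Handles POST /authors/{authorId}/relationships/books: same table, other side.
  addBooksToAuthor(authorId, bookIds) {
    for (const bookId of bookIds) this.addAuthorsToBook(bookId, [authorId]);
  }

  // A DELETE from either side removes the same stored link.
  removeLink(bookId, authorId) {
    const authors = this.authorsByBook.get(bookId);
    if (authors) authors.delete(authorId);
  }

  // Resource linkage for GET /books/{bookId}/relationships/authors
  authorsOf(bookId) {
    const authors = this.authorsByBook.get(bookId) || new Set();
    return { data: [...authors].map((id) => ({ type: "authors", id })) };
  }

  // Resource linkage for GET /authors/{authorId}/relationships/books
  booksOf(authorId) {
    const data = [];
    for (const [bookId, authors] of this.authorsByBook) {
      if (authors.has(authorId)) data.push({ type: "books", id: bookId });
    }
    return { data };
  }
}

const links = new BookAuthorLinks();
// One POST from the book side...
links.addAuthorsToBook("123", ["456"]);
// ...is already visible from the author side.
console.log(JSON.stringify(links.booksOf("456"))); // {"data":[{"type":"books","id":"123"}]}
```

Because the reverse side is derived from the same stored link rather than written separately, the two representations cannot drift apart.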

Azure search service index pointing multiple document db collections

How can I load data from two separate Azure Cosmos DB collections into a single Azure Search index? I need a way to join the data from the two collections, similar to an SQL inner join, and load the result into the Azure Search service.
I have two collections in Azure Cosmos DB.
One is for products; sample documents are below.
{
"description": null,
"links": [],
"replaces": "00000000-0000-0000-0000-000000000000",
"replacedBy": "00000000-0000-0000-0000-000000000000",
"productTypeId": "ccd0bc73-c4a1-41bf-9c96-454a5ba1d025",
"id": "a4853bf5-9c58-4fb5-a1ff-fc3ab575b4c8",
"name": "New Product",
"createDate": "2018-09-19T10:04:35.1951552Z",
"createdBy": "00000000-0000-0000-0000-000000000000",
"updateDate": "2018-10-05T13:46:24.7048358Z",
"updatedBy": "DIJdyXMudaqeAdsw1SiNyJKRIi7Ktio5#clients"
}
{
"description": null,
"links": [],
"replaces": "00000000-0000-0000-0000-000000000000",
"replacedBy": "00000000-0000-0000-0000-000000000000",
"productTypeId": "ccd0bc73-c4a1-41bf-9c96-454a5ba1d025",
"id": "b9b6c3bc-a8f8-470f-ac93-be589eb1da16",
"name": "New Product 2",
"createDate": "2018-09-19T11:02:02.6919008Z",
"createdBy": "00000000-0000-0000-0000-000000000000",
"updateDate": "2018-09-19T11:02:02.6919008Z",
"updatedBy": "00000000-0000-0000-0000-000000000000"
}
{
"description": null,
"links": [],
"replaces": "00000000-0000-0000-0000-000000000000",
"replacedBy": "00000000-0000-0000-0000-000000000000",
"productTypeId": "ccd0bc73-c4a1-41bf-9c96-454a5ba1d025",
"id": "98b3647a-3b40-4a00-bd0f-2a397bd48b68",
"name": "New Product 7",
"createDate": "2018-09-20T09:42:28.2913567Z",
"createdBy": "00000000-0000-0000-0000-000000000000",
"updateDate": "2018-09-20T09:42:28.2913567Z",
"updatedBy": "00000000-0000-0000-0000-000000000000"
}
The other collection is for ProductType, with this sample document:
{
"description": null,
"links": null,
"replaces": "00000000-0000-0000-0000-000000000000",
"replacedBy": "00000000-0000-0000-0000-000000000000",
"id": "ccd0bc73-c4a1-41bf-9c96-454a5ba1d025",
"name": "ProductType1_186",
"createDate": "2018-09-18T23:54:43.9395245Z",
"createdBy": "00000000-0000-0000-0000-000000000000",
"updateDate": "2018-10-05T13:29:44.019851Z",
"updatedBy": "DIJdyXMudaqeAdsw1SiNyJKRIi7Ktio5#clients"
}
The productTypeId field in the product collection refers to a ProductType document, and it is the field that links the two collections.
I want to load these two collections into the same Azure Search index, and I expect the index fields to be populated somewhat like below.

If you use product id as the key, you can simply point two indexers at the same index, and Azure Search will merge the documents automatically. For example, here are two indexer definitions that would merge their data into the same index:
{
"name" : "productIndexer",
"dataSourceName" : "productDataSource",
"targetIndexName" : "combinedIndex",
"schedule" : { "interval" : "PT2H" }
}
{
"name" : "sampleIndexer",
"dataSourceName" : "sampleDataSource",
"targetIndexName" : "combinedIndex",
"schedule" : { "interval" : "PT2H" }
}
Learn more about the Create Indexer API here.
However, it appears that the two collections share the same fields. This means that the fields from the document which gets indexed last will replace the fields from the document that got indexed first. To avoid this, I would recommend replacing the fields that match the 00000000-0000-0000-0000-000000000000 pattern with null in your Cosmos DB query. For example:
SELECT p.id, p.productTypeId, (p.createdBy != "00000000-0000-0000-0000-000000000000" ? p.createdBy : null) AS createdBy FROM products p
This exact query may not work for your use case. See the query syntax reference for more information.
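The overwrite problem can be illustrated with a simplified in-memory model (an assumption for illustration, not the Azure Search API): documents are matched on the key field, and each indexer's write merges its fields into whatever is already stored under that key, so a field present in both sources keeps the value written last. Like the answer, the sketch assumes the second source is keyed by product id.

```javascript
// Simplified model of two indexers merging into one index keyed by "id".
function mergeIntoIndex(index, docs, keyField) {
  for (const doc of docs) {
    const key = doc[keyField];
    // Fields in the incoming document overwrite fields already stored.
    index.set(key, { ...(index.get(key) || {}), ...doc });
  }
  return index;
}

const combinedIndex = new Map();
const productId = "a4853bf5-9c58-4fb5-a1ff-fc3ab575b4c8";

// productIndexer runs first.
mergeIntoIndex(combinedIndex, [
  { id: productId, name: "New Product", productTypeId: "ccd0bc73-c4a1-41bf-9c96-454a5ba1d025" },
], "id");

// The second indexer writes a document with the same key (assumed keyed by
// product id, as in the answer) and an overlapping "name" field.
mergeIntoIndex(combinedIndex, [{ id: productId, name: "ProductType1_186" }], "id");

// "name" is now "ProductType1_186": the product's own name was replaced,
// while the non-overlapping productTypeId survives.
console.log(JSON.stringify(combinedIndex.get(productId)));
```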
Please let me know if you have any questions, or something is not working as expected.
Thanks
Matt

Model Firebase realtime database in JSON schema

The Firebase database uses a subset of JSON, so it seems obvious to use JSON Schema to describe the data model. This would allow me to use tools that generate HTML forms or TypeScript models from it, or that generate random test data.
My question: How would one model key-value pairs in JSON schema, where the key is an id?
Example (borrowed from the Firebase documentation):
{
"users": {
"mchen": {
"name": "Mary Chen",
// index Mary's groups in her profile
"groups": {
// the value here doesn't matter, just that the key exists
"alpha": true,
"charlie": true
}
},
...
The group name here is used as a group id. In this reference (the groups object), as well as in the group object itself, the id is used as the property name.
The JSON schema for the above example is:
{
"$schema": "http://json-schema.org/draft-04/schema#",
"type": "object",
"properties": {
"users": {
"type": "object",
"properties": {
"mchen": {
"type": "object",
"properties": {
"name": {
"type": "string"
},
"groups": {
"type": "object",
"properties": {
"alpha": {
"type": "boolean"
},
"charlie": {
"type": "boolean"
}
}
}
}
}
}
}
}
}
What I would need for the example is something like the following, where NAME is a placeholder for the property name and NAME_TYPE defines its type.
{
"$schema": "http://json-schema.org/draft-04/schema#",
"type": "object",
"properties": {
"users": {
"type": "object",
"properties": {
NAME: {
"type": "object",
NAME_TYPE: "string",
"properties": {
"name": {
"type": "string"
},
"groups": {
"type": "object",
"properties": {
NAME: {
NAME_TYPE: "string",
"type": "boolean"
}
}
}
}
}
}
}
}
}
(Maybe I am on the completely wrong path here or maybe JSON schema isn't able to model the required structure.)

There are certainly arrays in Firebase but they are situational and should be used only in certain use cases and should generally be avoided.
The Firebase structure you posted is very common, and there are key:value pairs in your structure, so the question is a tad unclear, but I'll give it a shot.
'groups' is the parent key, and its values are the child key:value pairs group1:value, group2:value.
The group1, group2 keys you listed are essentially the same as the ids listed in the first example, except that they are not in an array. Arrays have sequential, hard-coded indexes (0th, 1st, 2nd, etc.), whereas keys in Firebase are open-ended and can generally be set to any alphanumeric value; they are used more to refer to a specific node than to enforce a particular order (I'm speaking generally here).
In the Firebase structure, those keys could be id0, id1, id2... or a, b, c... or a timestamp... or an auto-generated Firebase id (childByAutoId), which would also make them 'sequential'.
However, you can get into trouble assigning your own keys such as id0, id1, etc.:
id0
id1
id2
.
id9
id10
id11
The reality here is that the actual order will be
id0
id1
id10
id11
id2
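The ordering above is plain lexicographic (character-by-character) string ordering. The sketch below reproduces it with JavaScript's default string sort, which is assumed here to match how Firebase orders these non-numeric keys, and shows zero-padding as one possible workaround; the padding width of 3 is an arbitrary choice for illustration.

```javascript
// Keys id0 ... id11, as in the list above.
const keys = Array.from({ length: 12 }, (_, i) => `id${i}`);

// Default sort compares strings character by character.
const lexicographic = [...keys].sort();
console.log(lexicographic.join(" "));
// id0 id1 id10 id11 id2 id3 id4 id5 id6 id7 id8 id9

// Zero-padding the number makes string order agree with numeric order
// (id000, id001, id002, ..., id011).
const padded = keys.map((k) => "id" + k.slice(2).padStart(3, "0"));
console.log([...padded].sort().join(" "));
```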
The 'key' is that if you are using the keys to read data in sequentially, set them up as such. You may also want to consider generating your keys with childByAutoId (see docs for language specifics) and orderBy one of the child values such as a timestamp or index.
'groups': {
'auto-generated id': {
'name': 'alpha',
'index': 0,
'timestamp': '20160814074146'
...
},
'auto-generated id': {
'name': 'charlie',
'index': 1,
'timestamp': '20160814073600'
...
},
...
}
In the above case, I can orderBy name, index or timestamp.
Ordering by name or index reads the nodes in the order they are listed; ordering by timestamp loads the charlie node first. Using child values with orderBy is very flexible.
Also, you can limit the set of data you load with startingAt and endingAt. For example, suppose you want to load nodes starting at node 10 and ending at node 14. That is easily done with non-array JSON data, but not if the data is stored in an array, because the entire array must be read in.
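To illustrate the last two paragraphs, here is a plain-JavaScript model (a hypothetical helper, not the Firebase SDK's query API) that orders child nodes by one of their child values and optionally keeps only a start/end range of that value. The keys "-autoIdA" and "-autoIdB" are placeholders standing in for auto-generated ids.

```javascript
// Returns the children of `nodes` ordered by `child`, optionally limited to
// values in [start, end] (both inclusive, like a start/end range query).
function orderByChild(nodes, child, start, end) {
  return Object.entries(nodes)
    .filter(([, v]) => (start === undefined || v[child] >= start) &&
                       (end === undefined || v[child] <= end))
    .sort(([, a], [, b]) => (a[child] < b[child] ? -1 : a[child] > b[child] ? 1 : 0))
    .map(([key, v]) => ({ key, ...v }));
}

// Placeholder keys standing in for auto-generated ids.
const groups = {
  "-autoIdA": { name: "alpha", index: 0, timestamp: "20160814074146" },
  "-autoIdB": { name: "charlie", index: 1, timestamp: "20160814073600" },
};

console.log(orderByChild(groups, "index").map((g) => g.name).join(", "));     // alpha, charlie
console.log(orderByChild(groups, "timestamp").map((g) => g.name).join(", ")); // charlie, alpha

// Range example: 20 nodes, keep only index 10 through 14.
const many = {};
for (let i = 0; i < 20; i++) many[`-key${i}`] = { index: i };
console.log(orderByChild(many, "index", 10, 14).map((n) => n.index).join(", ")); // 10, 11, 12, 13, 14
```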

Document Db query filter for an attribute that contains an array

Using the sample JSON shown below, I am trying to retrieve all documents that contain at least one category (an array element wrapped underneath Categories) whose Text contains 'drink', with the following query, but the returned result is empty. Can someone help me get this right?
SELECT items.id
,items.description
,items.Categories
FROM items
WHERE ARRAY_CONTAINS(items.Categories.Category.Text, "drink")
{
"id": "1dbaf1d0-6549-11a0-88a8-001256957023",
"Categories": {
"Category": [{
"Type": "GS1",
"Id": "10000266",
"Text": "Stimulants/Energy Drinks Ready to Drink"
}, {
"Type": "GS2",
"Id": "10000266",
"Text": "Healthy Drink"
}]
}
}
Note: the JSON is a bit weird, with the array wrapped in an object itself; it was converted from XML, hence the result. So please assume I do not have any control over how this object is saved as JSON.

You need to flatten the document in your query to get the result you want by joining the array back to the main document. The query you want would look like this:
SELECT items.id, items.Categories
FROM items
JOIN Category IN items.Categories.Category
WHERE CONTAINS(LOWER(Category.Text), "drink")
However, because there is no concept of a DISTINCT query, this will produce duplicates equal to the number of Category items that contain the word "drink". So this query would produce your example document twice like this:
[
{
"id": "1dbaf1d0-6549-11a0-88a8-001256957023",
"Categories": {
"Category": [
{
"Type": "GS1",
"Id": "10000266",
"Text": "Stimulants/Energy Drinks Ready to Drink"
},
{
"Type": "GS2",
"Id": "10000266",
"Text": "Healthy Drink"
}
]
}
},
{
"id": "1dbaf1d0-6549-11a0-88a8-001256957023",
"Categories": {
"Category": [
{
"Type": "GS1",
"Id": "10000266",
"Text": "Stimulants/Energy Drinks Ready to Drink"
},
{
"Type": "GS2",
"Id": "10000266",
"Text": "Healthy Drink"
}
]
}
}
]
This could be problematic and expensive if the Categories array holds a lot of Category items that have "drink" in them.
You can cut that down if you are only interested in a single Category by changing the query to:
SELECT items.id, Category
FROM items
JOIN Category IN items.Categories.Category
WHERE CONTAINS(LOWER(Category.Text), "drink")
Which would produce a more concise result with only the id field repeated with each matching Category item showing up once:
[{
"id": "1dbaf1d0-6549-11a0-88a8-001256957023",
"Category": {
"Type": "GS1",
"Id": "10000266",
"Text": "Stimulants/Energy Drinks Ready to Drink"
}
},
{
"id": "1dbaf1d0-6549-11a0-88a8-001256957023",
"Category": {
"Type": "GS2",
"Id": "10000266",
"Text": "Healthy Drink"
}
}]
Otherwise, you will have to filter the results when you get them back from the query to remove duplicate documents.
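The JOIN-and-filter behaviour can be mimicked locally as a sketch (the helper names are hypothetical, and this runs in the client, not in Cosmos DB): one row is produced per matching Category, which is where the duplicates come from, and the rows are then collapsed back to one per document id.

```javascript
// Mimics: SELECT items.id, items.Categories FROM items
//         JOIN Category IN items.Categories.Category
//         WHERE CONTAINS(LOWER(Category.Text), term)
function joinMatchingCategories(items, term) {
  const rows = [];
  for (const item of items) {
    const categories = (item.Categories && item.Categories.Category) || [];
    for (const category of categories) {
      if (String(category.Text).toLowerCase().includes(term)) {
        rows.push({ id: item.id, Categories: item.Categories });
      }
    }
  }
  return rows;
}

// Client-side filtering: keep the first row for each document id.
function dedupeById(rows) {
  const seen = new Set();
  return rows.filter((row) => {
    if (seen.has(row.id)) return false;
    seen.add(row.id);
    return true;
  });
}

const items = [{
  id: "1dbaf1d0-6549-11a0-88a8-001256957023",
  Categories: { Category: [
    { Type: "GS1", Id: "10000266", Text: "Stimulants/Energy Drinks Ready to Drink" },
    { Type: "GS2", Id: "10000266", Text: "Healthy Drink" },
  ] },
}];

const rows = joinMatchingCategories(items, "drink");
console.log(rows.length, dedupeById(rows).length); // 2 1
```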

If it were me and I was building a production system with this requirement, I'd use Azure Search. Here is some info on hooking it up to DocumentDB.
If you don't want to do that, and we must live with the constraint that you can't change the shape of the documents, the only way I can think of to do this is to use a User Defined Function (UDF) like this:
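// Returns true if any element of the Category array (items.Categories.Category)
// has a Text value containing matchingString, ignoring case.
function GetItemsWithMatchingCategories(categories, matchingString) {
    if (!Array.isArray(categories) || typeof matchingString !== "string") {
        return false;
    }
    var lowerMatchingString = matchingString.toLowerCase();
    for (var index = 0; index < categories.length; index++) {
        var category = categories[index];
        // Skip entries that have no string Text property
        if (category && typeof category.Text === "string" &&
            category.Text.toLowerCase().indexOf(lowerMatchingString) >= 0) {
            return true;
        }
    }
    return false;
}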
Note, the code above was modified by the asker after actually trying it out so it's somewhat tested.
You would use it with a query like this:
SELECT * FROM items WHERE udf.GetItemsWithMatchingCategories(items.Categories.Category, "drink")
Also, note that this will result in a full table scan (unless you can combine it with other criteria that can use an index) which may or may not meet your performance/RU limit constraints.
