In an Azure DocumentDB document like this:
{
"id": "WakefieldFamily",
"parents": [
{ "familyName": "Wakefield", "givenName": "Robin" },
{ "familyName": "Miller", "givenName": "Ben" }
],
"children": [
{
"familyName": "Merriam",
"givenName": "Jesse",
"gender": "female",
"grade": 1,
"pets": [
{ "givenName": "Goofy" },
{ "givenName": "Shadow" }
]
},
{
"familyName": "Miller",
"givenName": "Lisa",
"gender": "female",
"grade": 8
}
],
"address": { "state": "NY", "county": "Manhattan", "city": "NY" },
"isRegistered": false
};
How do I query for children who have a pet whose given name is "Goofy"?
It looks like the following syntax is invalid:
Select * from root r
WHERE r.children.pets.givenName="Goofy"
Instead I need to do
Select * from root r
WHERE r.children[0].pets[0].givenName="Goofy"
which is not really searching through an array.
Any suggestions on how I should handle queries like these?
You should take advantage of DocumentDB's JOIN clause, which operates a bit differently than JOIN in RDBMSs (since DocumentDB deals with a denormalized data model of schema-free documents).
To put it simply, you can think of DocumentDB's JOIN as self-joins which can be used to form cross-products between nested JSON objects.
In the context of querying children whose pets given name is "Goofy", you can try:
SELECT
f.id AS familyName,
c AS child,
p.givenName AS petName
FROM Families f
JOIN c IN f.children
JOIN p IN c.pets
WHERE p.givenName = "Goofy"
Which returns:
[{
familyName: WakefieldFamily,
child: {
familyName: Merriam,
givenName: Jesse,
gender: female,
grade: 1,
pets: [{
givenName: Goofy
}, {
givenName: Shadow
}]
},
petName: Goofy
}]
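To make the cross-product idea concrete, here is a minimal JavaScript sketch that mimics what the two JOINs do over an in-memory copy of the document. It only illustrates the semantics and is not how DocumentDB executes the query; the helper name joinChildrenAndPets is made up for this example, and the document is trimmed to the fields the query touches.

```javascript
// Trimmed in-memory copy of the sample family document from the question.
const families = [{
  id: "WakefieldFamily",
  children: [
    { givenName: "Jesse", pets: [{ givenName: "Goofy" }, { givenName: "Shadow" }] },
    { givenName: "Lisa" } // this child has no pets array
  ]
}];

// Hypothetical helper that emulates:
//   FROM Families f JOIN c IN f.children JOIN p IN c.pets WHERE p.givenName = petName
function joinChildrenAndPets(docs, petName) {
  const rows = [];
  for (const f of docs) {
    for (const c of f.children || []) {
      // A child without pets contributes no rows, as in an inner join.
      for (const p of c.pets || []) {
        if (p.givenName === petName) {
          rows.push({ familyName: f.id, child: c, petName: p.givenName });
        }
      }
    }
  }
  return rows;
}

console.log(JSON.stringify(joinChildrenAndPets(families, "Goofy"), null, 2));
```

Each (family, child, pet) combination is one candidate row, and the WHERE clause filters those rows, which is why no array index is needed.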
Reference: http://azure.microsoft.com/en-us/documentation/articles/documentdb-sql-query/
Edit:
You can also use the ARRAY_CONTAINS function, which looks something like this:
SELECT food.id, food.description, food.tags
FROM food
WHERE food.id = "09052" or ARRAY_CONTAINS(food.tags.name, "blueberries")
I think the ARRAY_CONTAINS function has changed since this was answered in 2014. I had to use the following for it to work.
SELECT * FROM c
WHERE ARRAY_CONTAINS(c.Samples, {"TimeBasis":"5MIN_AV", "Value":"5.105"},true)
Samples is my JSON array and it contains objects with many properties including the two above.
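As an illustration of what the third argument changes, the sketch below imitates partial matching in JavaScript: an array element counts as a match when every property named in the search object has an equal value on the element, and any extra properties on the element are ignored. This is a simplified, shallow model written for this example, not the engine's implementation; the helper name arrayContainsPartial and the Unit property are assumptions.

```javascript
// Hypothetical helper: shallow model of ARRAY_CONTAINS(array, probe, true).
function arrayContainsPartial(array, probe) {
  if (!Array.isArray(array)) return false;
  return array.some(element =>
    element !== null && typeof element === "object" &&
    Object.keys(probe).every(key => element[key] === probe[key])
  );
}

// Made-up document; each sample carries more properties than the probe names.
const doc = {
  Samples: [
    { TimeBasis: "5MIN_AV", Value: "5.105", Unit: "ppm" },
    { TimeBasis: "1HR_AV", Value: "4.900", Unit: "ppm" }
  ]
};

// Both probe properties are found on the first sample.
console.log(arrayContainsPartial(doc.Samples, { TimeBasis: "5MIN_AV", Value: "5.105" })); // true
// No single sample has this combination of values.
console.log(arrayContainsPartial(doc.Samples, { TimeBasis: "5MIN_AV", Value: "4.900" })); // false
```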
The following is my Company object that I store in Cosmos DB. I have all the essential information about employees in Employees property. I also have a Departments property that both defines departments as well as its members.
{
"id": "company-a",
"name": "Company A",
"employees": [
{
"id": "john-smith",
"name": "John Smith",
"email": "john.smith@example.com"
},
{
"id": "jane-doe",
"name": "Jane Doe",
"email": "jane.doe@example.com"
},
{
"id": "brian-green",
"name": "Brian Green",
"email": "brian.green@example.com"
}
],
"departments": [
{
"id": "it",
"name": "IT Department",
"members": [
{
"id": "john-smith",
"name": "John Smith",
"isDepartmentHead": true
},
{
"id": "brian-green",
"name": "Brian Green",
"isDepartmentHead": false
}
]
},
{
"id": "hr",
"name": "HR Department",
"members": [
{
"id": "jane-doe",
"name": "Jane Doe",
"isDepartmentHead": true
}
]
}
]
}
I'm trying to return a list of the members of a particular department, including each employee's email, which will come from the employees property.
Here's what I did but this is including all employees in the output:
SELECT dm.id, dm.name, e.email, dm.isDepartmentHead
FROM Companies c
JOIN d IN c.departments
JOIN dm IN d.members
JOIN e IN c.employees
WHERE c.id = "company-a" AND d.id = "hr"
The correct output would be:
[
{
"id": "jane-doe",
"name": "Jane Doe",
"email": "jane.doe@example.com",
"isDepartmentHead": true
}
]
How do I form my SQL statement to get all members of a department AND include employees' email addresses?
I'm pretty sure you cannot write a query like this. You are trying to correlate data twice in the same query across two arrays, which I don't think is possible (at least I've never been successful doing this).
Even if this were possible, there are other issues with your data model. It will not scale. You need to avoid unbounded or very large arrays within documents (e.g. employees and departments), and you do not want to store unrelated data in the same document. The objective is to model data for high-concurrency operations in the way you use it, which reduces both latency and cost.
There are many ways in which you can remodel this data. If this is a very small data set, you could probably do something like the model below with a partition key of companyId (assuming that you always query within a single company). This will store all employees for one company in the same logical partition, which can store up to 20GB of data. I would also model this such that one document stores data specific to the company itself (address, phone number, number of employees, etc.) with the id and companyId having the same value. This lets you store materialized aggregates such as the number of employees and update them in a transaction. Also, since this approach mixes different types of entities in one container (a bonus of a NoSQL database), you need a discriminator property that allows you to filter for specific entities within the container so you can deserialize them directly into your model classes.
Here is a data model you could try (please note, you need to determine if this works for you by scaling it up to the amount of data you believe you will need to store. You also need to test and measure the RU/s cost for the CRUD operations you will execute with high concurrency).
Example company document:
{
"id": "aaaaa",
"companyId": "aaaaa",
"companyName": "Company A",
"type": "company",
"numberOfEmployees": 3,
"addresses": [
{
"address1": "123 Main Street",
"address2": "",
"city": "Los Angeles",
"state": "California",
"zip": "92568"
}
]
}
Then an employee document like this:
{
"id": "jane-doe",
"companyId": "aaaaa",
"type": "employee",
"employeeId": "jane-doe",
"employeeName": "Jane Doe",
"employeeEmail": "jane.doe@example.com",
"departmentId": "hr",
"departmentName": "HR Department",
"isDepartmentHead": true
}
Then last, here's the query to get the data you need.
SELECT
c.employeeId,
c.employeeName,
c.employeeEmail,
c.isDepartmentHead
FROM c
WHERE
c.companyId = "aaaaa" AND
c.type = "employee" AND
c.departmentId = "hr"
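Because company and employee documents share one container in this model, the type discriminator is what keeps them apart. The following JavaScript sketch applies the same three conditions to an in-memory array standing in for the container, using the companyId from the example documents. The helper name employeesInDepartment and the john-smith employee document are assumptions added for illustration.

```javascript
// In-memory stand-in for the container: one company document plus employee documents.
const container = [
  { id: "aaaaa", companyId: "aaaaa", type: "company", companyName: "Company A", numberOfEmployees: 3 },
  { id: "jane-doe", companyId: "aaaaa", type: "employee", employeeId: "jane-doe",
    employeeName: "Jane Doe", employeeEmail: "jane.doe@example.com",
    departmentId: "hr", isDepartmentHead: true },
  // Hypothetical second employee, in a different department.
  { id: "john-smith", companyId: "aaaaa", type: "employee", employeeId: "john-smith",
    employeeName: "John Smith", employeeEmail: "john.smith@example.com",
    departmentId: "it", isDepartmentHead: true }
];

// Hypothetical helper: same conditions as the WHERE clause, same projected properties.
function employeesInDepartment(docs, companyId, departmentId) {
  return docs
    .filter(d => d.companyId === companyId && d.type === "employee" && d.departmentId === departmentId)
    .map(d => ({
      employeeId: d.employeeId,
      employeeName: d.employeeName,
      employeeEmail: d.employeeEmail,
      isDepartmentHead: d.isDepartmentHead
    }));
}

console.log(employeesInDepartment(container, "aaaaa", "hr"));
```

Without the type condition, the company document would also have to be ruled out by some other means before deserializing the results into an employee class.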
I have a Firestore database structure like this:
(Articles is a collection, everything inside is arrays)
Articles:
[
{
"id": 1,
"reports": [
{
"locations": [
{
"location": "Sydney",
"country": "Australia"
},
{
"location": "Perth",
"country": "Australia"
}
],
},
{
"locations": [
{
"location": "King County, Washington",
"country": "USA"
}
],
}
]
},
{
"id": 2,
"reports": [
{
"locations": [
{
"location": "Brisbane",
"country": "Australia"
}
]
}
]
}
]
I'd like to create a query to return all Articles that mention a specific country.
Am I better off restructuring my database?
While Firestore's array-contains queries can get you close to this, you can't use them in your current data model as you're nesting arrays.
To allow the use-case you'll at the very least need to unnest one of those arrays into a subcollection of each article, so:
Articles (collection)
$article
reports (collection)
$report (document with a locations array)
At that point you can use a collection group query to search all reports collections, and then array-contains to find the relevant report documents:
var ref = firebase.firestore().collectionGroup("reports");
ref.where("locations", "array-contains", { location: "Sydney", country: "Australia" })
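The reason the whole { location, country } object has to be passed is that array-contains compares complete array elements. The sketch below imitates that rule on plain JavaScript objects, without the Firebase SDK. The names sameFlatObject and reportMatches are invented for this illustration, and the comparison only handles flat objects like the ones in the question.

```javascript
// Hypothetical helper: two flat objects are equal when they have the same keys and values.
function sameFlatObject(a, b) {
  const keysA = Object.keys(a);
  const keysB = Object.keys(b);
  return keysA.length === keysB.length && keysA.every(k => a[k] === b[k]);
}

// Hypothetical helper: a report matches when one of its locations equals the wanted element.
function reportMatches(report, wanted) {
  return (report.locations || []).some(loc => sameFlatObject(loc, wanted));
}

// The two reports of article 1 from the question.
const reports = [
  { locations: [{ location: "Sydney", country: "Australia" }, { location: "Perth", country: "Australia" }] },
  { locations: [{ location: "King County, Washington", country: "USA" }] }
];

// The complete element matches one report.
console.log(reports.filter(r => reportMatches(r, { location: "Sydney", country: "Australia" })).length); // 1
// A partial element matches nothing.
console.log(reports.filter(r => reportMatches(r, { country: "Australia" })).length); // 0
```

So with this structure, a partial object such as { country: "Australia" } would not find the Sydney or Perth reports.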
Never mind the below answer, which assumed you had only one level of array...
This type of query should be possible, as far as I can see, using the array-contains operator, keeping in mind that you need to specify the entire array item.
So something like:
collectionRef.where("reports.locations", "array-contains",
{ location: "Sydney", country: "Australia" })
The following json represents two documents in a Cosmos DB container.
How can I write a query that gets any document that has an item with an id of item_1 and a value of bar?
I've looked into ARRAY_CONTAINS, but I can't get it to work with arrays in arrays.
I've also tried a few things with any. I can't find any documentation on how to use it, but any seems to be a valid function, as I do get syntax highlighting for it in the Cosmos DB explorer in the Azure Portal.
For the any function I tried things like SELECT * FROM c WHERE c.pages.any(p, p.items.any(i, i.id = "item_1" AND i.value = "bar")).
The id fields are unique so if it's easier to find any document that contains any object with the right id and value, that would be fine too.
[
{
"type": "form",
"id": "form_a",
"pages": [
{
"name": "Page 1",
"id": "page_1",
"items": [
{
"id": "item_1",
"value": "foo"
}
]
}
]
},
{
"type": "form",
"id": "form_b",
"pages": [
{
"name": "Page 1",
"id": "page_1",
"items": [
{
"id": "item_1",
"value": "bar"
}
]
}
]
}
]
I think a JOIN can handle a WHERE clause on an array within an array. Please test the SQL below:
SELECT c.id FROM c
join pages in c.pages
where array_contains(pages.items,{"id": "item_1","value": "bar"},true)
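For comparison, the logic this query expresses is roughly what the any attempt in the question was reaching for. In plain JavaScript it corresponds to nested Array.prototype.some calls, sketched here over trimmed copies of the two sample documents. The helper name hasItem is made up for this example.

```javascript
// Trimmed copies of the two sample documents.
const forms = [
  { id: "form_a", pages: [{ id: "page_1", items: [{ id: "item_1", value: "foo" }] }] },
  { id: "form_b", pages: [{ id: "page_1", items: [{ id: "item_1", value: "bar" }] }] }
];

// Hypothetical helper: true when any page holds an item with the given id and value.
function hasItem(doc, itemId, value) {
  return (doc.pages || []).some(page =>
    (page.items || []).some(item => item.id === itemId && item.value === value));
}

const matchingIds = forms.filter(f => hasItem(f, "item_1", "bar")).map(f => f.id);
console.log(matchingIds); // [ 'form_b' ]
```

One difference worth keeping in mind: the JOIN emits one row per matching page, so a document with several matching pages would appear several times, whereas this in-memory version reports each document once.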
I am trying to extract a large amount of data from a JSON document. There are about 1500 nodes per document. When I attempt to load the body node I hit the 128 KB limit error. I have found a way to load the node, but I have to go all the way down to the array list: JsonExtractor("body.nprobe.items[*]"). The issue is that I then cannot access any other part of the JSON document, and I need the metadata, such as Id, SerialNumber, etc. Should the JSON file be changed in some way? The data I need is 3 levels down. The JSON below has been obfuscated and shortened; the real file is about 33K lines of formatted JSON, with about 1500 items and n-20 fields in each.
{
"headers": {
"pin": "12345",
"Type": "URJ201W-GNZGK",
"RTicks": "3345",
"SD": "211",
"Jov": "juju",
"Market": "Dal",
"Drst": "derre",
"Model": "qw22",
"DNum": "de34",
"API": "34f",
"Id": "821402444150002501"
},
"Id": "db5aacae3778",
"ModelType": "URJ",
"body": {
"uHeader": {
"ID": "821402444150002501",
"SerialNo": "ee028861",
"ServerName": "BRXTTY123"
},
"header": {
"form": 4,
"app": 0,
"Flg": 1,
"Version": 11056,
"uploadID": 1,
"uDate": "2016-04-14T18:29"
},
"nprobe": {
"items": [{
"purchaseDate": "2016-04-14T18:21:09",
"storeLoc": {
"latitude": 135.052335,
"longitude": 77.167005
},
"sr": {
"ticks": 3822,
"SkuId": "24",
"Data": {
"X": "0.00068",
"Y": "0.07246",
}
}
},
{
"purchaseDate": "2016-04-14T18:21:09",
"storeLoc": {
"latitude": 135.052335,
"longitude": 77.167005
},
"sr": {
"ticks": 3823,
"SkuId": "25",
"Data": {
"X": "0",
"Y": "2",
}
}
}]
}
},
"Messages": []
}
Thanks.
You'll have to use CROSS APPLY:
https://msdn.microsoft.com/en-us/library/azure/mt621307.aspx
and EXPLODE:
https://msdn.microsoft.com/en-us/library/azure/mt621306.aspx
See a worked out solution here:
https://github.com/algattik/USQLHackathon/blob/master/VS-Solution/USQLApplication/ciam-to-sqldw.usql
https://github.com/algattik/USQLHackathon/blob/master/Samples/Customer/customers%202016-08-10.json
-- IMPROVED ANSWER: --
As this solution will not work for you since your inner JSON is too large to fit in a string, you can parse the input twice:
DECLARE @input string = @"/so.json";
REFERENCE ASSEMBLY [Newtonsoft.Json];
REFERENCE ASSEMBLY [Microsoft.Analytics.Samples.Formats];
@meta =
EXTRACT Id string
FROM @input
USING new Microsoft.Analytics.Samples.Formats.Json.JsonExtractor();
@items =
EXTRACT purchaseDate string
FROM @input
USING new Microsoft.Analytics.Samples.Formats.Json.JsonExtractor("body.nprobe.items[*]");
@itemsFull =
SELECT Id,
purchaseDate
FROM @meta
CROSS JOIN @items;
OUTPUT @itemsFull
TO "/items_full.csv"
USING Outputters.Csv();
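The effect of the two extractions plus the CROSS JOIN can be pictured with a small JavaScript sketch: the first pass yields a single row of document-level metadata, the second yields one row per item, and combining them stamps the metadata onto every item row. This only models the shape of the result on an in-memory object (a trimmed copy of the sample) and says nothing about how U-SQL processes the file.

```javascript
// Trimmed in-memory copy of the sample document.
const input = {
  Id: "db5aacae3778",
  body: { nprobe: { items: [
    { purchaseDate: "2016-04-14T18:21:09", sr: { ticks: 3822 } },
    { purchaseDate: "2016-04-14T18:21:09", sr: { ticks: 3823 } }
  ] } }
};

// First pass: one row holding the top-level Id.
const meta = [{ Id: input.Id }];

// Second pass: one row per element of body.nprobe.items.
const items = input.body.nprobe.items.map(i => ({ purchaseDate: i.purchaseDate }));

// Cross join: every meta row is paired with every item row.
const itemsFull = meta.flatMap(m => items.map(i => ({ ...m, ...i })));
console.log(itemsFull);
```

The pairing is only meaningful here because the metadata pass produces exactly one row for the single input file; with more than one metadata row, a cross join would attach every Id to every item.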
With the sample JSON shown below, I am trying to retrieve all documents that contain at least one category (an array of objects wrapped underneath Categories) whose Text value contains 'drinks'. I am using the following query, but the returned result is empty. Can someone help me get this right?
SELECT items.id
,items.description
,items.Categories
FROM items
WHERE ARRAY_CONTAINS(items.Categories.Category.Text, "drink")
{
"id": "1dbaf1d0-6549-11a0-88a8-001256957023",
"Categories": {
"Category": [{
"Type": "GS1",
"Id": "10000266",
"Text": "Stimulants/Energy Drinks Ready to Drink"
}, {
"Type": "GS2",
"Id": "10000266",
"Text": "Healthy Drink"
}]
}
},
Note: The JSON is a bit weird in that the array is wrapped by an object. It was converted from XML, hence the result. So please assume I do not have any control over how this object is saved as JSON.
You need to flatten the document in your query to get the result you want by joining the array back to the main document. The query you want would look like this:
SELECT items.id, items.Categories
FROM items
JOIN Category IN items.Categories.Category
WHERE CONTAINS(LOWER(Category.Text), "drink")
However, because there is no concept of a DISTINCT query, this will produce duplicates equal to the number of Category items that contain the word "drink". So this query would produce your example document twice like this:
[
{
"id": "1dbaf1d0-6549-11a0-88a8-001256957023",
"Categories": {
"Category": [
{
"Type": "GS1",
"Id": "10000266",
"Text": "Stimulants/Energy Drinks Ready to Drink"
},
{
"Type": "GS2",
"Id": "10000266",
"Text": "Healthy Drink"
}
]
}
},
{
"id": "1dbaf1d0-6549-11a0-88a8-001256957023",
"Categories": {
"Category": [
{
"Type": "GS1",
"Id": "10000266",
"Text": "Stimulants/Energy Drinks Ready to Drink"
},
{
"Type": "GS2",
"Id": "10000266",
"Text": "Healthy Drink"
}
]
}
}
]
This could be problematic and expensive if the Categories array holds a lot of Category items that have "drink" in them.
You can cut that down if you are only interested in a single Category by changing the query to:
SELECT items.id, Category
FROM items
JOIN Category IN items.Categories.Category
WHERE CONTAINS(LOWER(Category.Text), "drink")
Which would produce a more concise result with only the id field repeated with each matching Category item showing up once:
[{
"id": "1dbaf1d0-6549-11a0-88a8-001256957023",
"Category": {
"Type": "GS1",
"Id": "10000266",
"Text": "Stimulants/Energy Drinks Ready to Drink"
}
},
{
"id": "1dbaf1d0-6549-11a0-88a8-001256957023",
"Category": {
"Type": "GS2",
"Id": "10000266",
"Text": "Healthy Drink"
}
}]
Otherwise, you will have to filter the results when you get them back from the query to remove duplicate documents.
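A minimal sketch of that client-side filtering in JavaScript, assuming the rows have already been fetched into an array and that the id property identifies a document. The helper name dedupeById and the second document id are made up for this example.

```javascript
// Hypothetical helper: keeps the first row seen for each document id.
function dedupeById(rows) {
  const firstById = new Map();
  for (const row of rows) {
    if (!firstById.has(row.id)) {
      firstById.set(row.id, row);
    }
  }
  return Array.from(firstById.values());
}

// Shape of a duplicated result, trimmed to the id for brevity;
// "another-document-id" is a made-up second document.
const queryResult = [
  { id: "1dbaf1d0-6549-11a0-88a8-001256957023" },
  { id: "1dbaf1d0-6549-11a0-88a8-001256957023" },
  { id: "another-document-id" }
];

const uniqueDocs = dedupeById(queryResult);
console.log(uniqueDocs.map(d => d.id));
```

This tidies up the result on the client only; the duplicate rows have still been returned by the query.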
If it were me and I was building a production system with this requirement, I'd use Azure Search. Here is some info on hooking it up to DocumentDB.
If you don't want to do that and we must live with the constraint that you can't change the shape of the documents, the only way I can think to do this is to use a User Defined Function (UDF) like this:
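function GetItemsWithMatchingCategories(categories, matchingString) {
// Only arrays are searched; anything else (including null or undefined) yields no match.
if (Array.isArray(categories)) {
var lowerMatchingString = matchingString.toLowerCase();
for (var index = 0; index < categories.length; index++) {
var category = categories[index];
// Case-insensitive substring match on the category text.
var categoryName = category.Text.toLowerCase();
if (categoryName.indexOf(lowerMatchingString) >= 0) {
return true;
}
}
}
// Return an explicit boolean so the WHERE clause never sees undefined.
return false;
}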
Note, the code above was modified by the asker after actually trying it out so it's somewhat tested.
You would use it with a query like this:
SELECT * FROM items WHERE udf.GetItemsWithMatchingCategories(items.Categories.Category, "drink")
Also, note that this will result in a full table scan (unless you can combine it with other criteria that can use an index) which may or may not meet your performance/RU limit constraints.