How to structure data in Firebase for filtering potentially large datasets?

I have a node in my Realtime Database with thousands of children, and I want to filter them in a way that is both fast and light on bandwidth (the latter is not that important yet, but it may become so as my data grows).
What is the best way to structure this so that I avoid fetching all the items and filtering on the client?
Here is what I'm planning to implement, but as I have no experience with Firebase or other NoSQL databases, I need some input :)
{
"items" : {
"item1" : {
"name" : "Item 1",
"filters": {
"filter1" : true,
"filter2" : true,
"filter3" : true
}
},
"item2" : {
"name" : "Item 2",
"filters": null
},
"item3" : {
"name" : "Item 3",
"filters": {
"filter2" : true
}
},
"item4" : {
"name" : "Item 4",
"filters": {
"filter2" : true,
"filter3" : true
}
},
"item5" : {
"name" : "Item 5",
"filters": {
"filter3" : true
}
}
// Thousands of items
},
"items_by_filter1" : {
"item1" : true
},
"items_by_filter2" : {
"item1" : true,
"item3" : true,
"item4" : true
},
"items_by_filter3" : {
"item1" : true,
"item4" : true,
"item5" : true
}
}
Am I overthinking this, or is this a good approach? What if I want to combine several filters? Should I follow the same approach and add a node like this for every filter combination (probably a struggle to maintain)?
"items_by_filter2_and_filter3" : {
"item1" : true,
"item4" : true
}
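One alternative to keeping a node per filter combination is to read the single-filter index nodes and intersect their keys on the client. The following is only a minimal sketch in Python, assuming the index nodes have already been fetched as plain dicts; the function name `items_matching_all` is hypothetical, the sample index data is derived from the filters on each item above, and the fetching itself is left out.

```python
def items_matching_all(indexes, filters):
    """Return the sorted item keys present in every requested filter index."""
    # One set of item keys per requested filter; a missing index is empty.
    key_sets = [set(indexes.get(f"items_by_{name}", {})) for name in filters]
    if not key_sets:
        return []
    return sorted(set.intersection(*key_sets))


# Index nodes as they might look after being read from the database.
indexes = {
    "items_by_filter1": {"item1": True},
    "items_by_filter2": {"item1": True, "item3": True, "item4": True},
    "items_by_filter3": {"item1": True, "item4": True, "item5": True},
}

print(items_matching_all(indexes, ["filter2", "filter3"]))  # ['item1', 'item4']
```

This downloads only keys, not the items themselves, so the cost grows with the size of the index nodes rather than with the size of the item data.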

Related

How to set .indexOn in firebase?

This is my json structure
{
"books" : {
"sample" : {
"eight" : {
"author" : "eighta",
"name" : "eight",
"sub" : {
"subauthor" : "eightauthor",
"subname" : "general"
}
},
"eleven" : {
"author" : "twelvea",
"name" : "twelve",
"sub" : {
"subauthor" : "elevenauthor",
"subname" : "general"
}
},
"five" : {
"author" : "fivea",
"name" : "five",
"sub" : {
"subauthor" : "fiveauthor",
"subname" : "fivesub"
}
},
"four" : {
"author" : "foura",
"name" : "four",
"sub" : {
"subauthor" : "fourauthor",
"subname" : "general"
}
},
"nine" : {
"author" : "ninea",
"name" : "nine",
"sub" : {
"subauthor" : "nineauthor",
"subname" : "ninesub"
}
},
"one" : {
"author" : "onea",
"name" : "one",
"sub" : {
"subauthor" : "oneauthor",
"subname" : "onesub"
}
},
"seven" : {
"author" : "seven",
"name" : "seven"
},
"six" : {
"author" : "sixa",
"name" : "six"
},
"ten" : {
"author" : "tena",
"name" : "ten"
},
"three" : {
"author" : "threea",
"name" : "three"
},
"two" : {
"author" : "twoa",
"name" : "two"
}
}
}
}
I want to fetch the entries whose subname equals general.
My index rules
{
/* Visit https://firebase.google.com/docs/database/security to learn more about security rules. */
"rules": {
".read": true,
".write": true,
"books": {
"$user_id": {
".indexOn": ["subname", "subauthor"]
}
}
}
}
https://myprojectpath/books/sample/eight.json?orderBy="subname"&equalTo="general"&print=pretty
The rule above works. But I need one generic API call that returns the entries whose subname is general. I can't pass eight.json, nine.json, ten.json on every call; a single call should return all entries whose subname is general.
If I understand you correctly you want to use a single query to search across all authors for their sub/subname property.
In that case you can define an index on books/sample for the sub/subname property of each child node:
"books": {
"sample": {
".indexOn": ["sub/subname", "sub/subauthor"]
}
}
The sample could be a $ variable here (such as $user_id), but the paths in .indexOn have to be known.
This creates an index under sample with the value of sub/subname for each child node, which you can then query with:
https://myprojectpath/books/sample.json?orderBy="sub/subname"&equalTo="general"&print=pretty
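When building such a URL in code, the orderBy and equalTo values need to be JSON-encoded (string values keep their double quotes) and then percent-encoded. A small sketch in Python using only the standard library; the helper name `build_query_url` is hypothetical, and the base URL is the placeholder from the question.

```python
import json
from urllib.parse import quote, urlencode


def build_query_url(base_url, path, order_by, equal_to):
    """Build a REST query URL; orderBy and equalTo values are JSON-encoded."""
    params = {
        "orderBy": json.dumps(order_by),   # "sub/subname" -> "\"sub/subname\""
        "equalTo": json.dumps(equal_to),   # strings keep quotes, True -> true
        "print": "pretty",
    }
    # quote_via=quote percent-encodes the quotes and the slash in the child path.
    return f"{base_url}/{path}.json?{urlencode(params, quote_via=quote)}"


url = build_query_url("https://myprojectpath", "books/sample", "sub/subname", "general")
print(url)
```

The printed URL is the percent-encoded form of the query shown above.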

Matching users with similar interest tags using Firebase, Elasticsearch and Flashlight

What is the best way to match documents by tags using elasticsearch using the following setup (or modifying the setup)?
I've got my users in a firebase database, they have associated tags that define their interests:
"users" : {
"bruce" : {
"martial art" : "Jeet Kune Do",
"name" : "Bruce Lee",
"nick" : "Little Phoenix",
"tags" : {
"android" : true,
"ios" : true
}
},
"chan" : {
"account_type" : "contractor",
"martial art" : "Kung Fu",
"name" : "Jackie Chan",
"nick" : "Cannonball",
"tags" : {
"ios" : true
}
},
"chuck" : {
"martial art" : "Chun Kuk Do",
"name" : "Carlos Ray Norris",
"nick" : "Chuck"
}}
Using Flashlight + the Firebase Admin SDK, I'm keeping an index up to date on Bonsai/Heroku, which should help me match users with similar interests or related products.
"firebase": {
"aliases": {},
"mappings": {
"user": {
"properties": {
"name": {
"type": "string"
},
"tags": {
"properties": {
"android": {
"type": "boolean"
},
"ios": {
"type": "boolean"
}
}
}
}
}
}...
For now I can query users with certain combination of tags:
{
"query": {
"bool": {
"must" : {
"type" : {
"value" : "user"
}
},
"should": [
{
"term": {
"tags.ios": true
}
},
{
"term": {
"tags.android": true
}
}
],
"minimum_should_match" : 1
}}}
This is great but what I'm looking for is a way to:
Given a user id find other users with similar tags ordered by _score.
There will be other _types apart from "user" that also use tags, for example products, so it would also be great to match products to users when they share some tags.
I get the feeling that, because I'm completely new to Elasticsearch, I'm approaching this the wrong way. Maybe it's the way the data is modeled?
The problem is that Firebase restricts this quite a lot. For instance, I cannot have arrays, which makes the tag modeling a bit weird and results in even weirder indexed data. Maybe one approach would be to transform the data before inserting it into the index?
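One way to approach the "given a user id" case is to read that user's tags first and generate the should clauses from them, so that users sharing more tags accumulate a higher _score. This is only a sketch under assumptions: the helper name `similar_users_query` is hypothetical, the `ids` clause used to exclude the user themselves should be checked against the Elasticsearch version in use, and sending the query is left out.

```python
def similar_users_query(user_id, user):
    """Build a bool query matching other users that share any of this user's tags."""
    # Firebase stores tags as a map of tag -> true; keep only the enabled ones.
    tags = sorted(tag for tag, enabled in user.get("tags", {}).items() if enabled)
    return {
        "query": {
            "bool": {
                "must": {"type": {"value": "user"}},
                # Assumption: exclude the user we started from by document id.
                "must_not": {"ids": {"values": [user_id]}},
                # One clause per tag; each matching clause adds to the score.
                "should": [{"term": {f"tags.{tag}": True}} for tag in tags],
                "minimum_should_match": 1,
            }
        }
    }


bruce = {"name": "Bruce Lee", "tags": {"android": True, "ios": True}}
query = similar_users_query("bruce", bruce)
print(query["query"]["bool"]["should"])
```

A user without tags produces an empty should list, so that case would need separate handling before querying.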

Can't write to Firebase database using rules

I'm trying to create a rule that allows some users to write but not all.
All users should be able to read 'menu' items, but only users listed in the store data should be able to write.
My data structure:
{
"category" : [ null, "Burger", "Drinks" ],
"menu" : [ null, {
"available" : true,
"category" : "1",
"description" : "item1 description",
"image" : "chicken_maharaja",
"name" : "New Chicken Maharaja",
"price" : 1300,
"store" : 1
}, {
"available" : true,
"category" : "1",
"description" : "item2 description",
"image" : "big_spicy_chicken_wrap",
"name" : "Big Spicy Chicken Wrap",
"price" : 120,
"store" : 1
}, {
"available" : true,
"category" : "2",
"description" : "item3 description",
"image" : "thumsup",
"name" : "Thumsup 100ml",
"price" : 40,
"store" : 1
}, {
"available" : true,
"category" : "2",
"description" : "item4 description",
"image" : "mccafe_ice_coffee",
"name" : "Ice Coffee",
"price" : 140,
"store" : 1
}, {
"available" : true,
"category" : "1",
"description" : "item5 description",
"image" : "mc_chicken",
"name" : "MC Chicken",
"price" : 190,
"store" : 1
}, {
"available" : true,
"category" : "2",
"description" : "item6 description",
"image" : "Smoothie",
"name" : "Smoothie",
"price" : 70,
"store" : 2
}, {
"available" : true,
"category" : "1",
"description" : "item8 description",
"image" : "salad_wrap",
"name" : "Salad Wrap",
"price" : 150,
"store" : 2
} ],
"stores" : [ null, {
"location" : "Campinas - Taquaral",
"name" : "Store 1",
"user" : {
"pyixsRTw9qdiuESt62YnmEYXQt13" : true
}
}, {
"location" : "São Paulo - Perdises",
"name" : "Store 2",
"user" : {
"LBNZ8Dwp2rdJtlSh0NC1ApdtbAl2" : true,
"TLomOgrd3gbjDdpDAqGiwl0lBhn2" : true
}
} ],
"userProfile" : {
"LBNZ8Dwp2rdJtlSh0NC1ApdtbAl2" : {
"birthDate" : "1974-02-10",
"email" : "asd#asd.com",
"firstName" : "João",
"lastName" : "Silva"
},
"pyixsRTw9qdiuESt62YnmEYXQt13" : {
"birthDate" : "1974-02-10",
"email" : "leandro.garcias#gmail.com",
"firstName" : "Leandro",
"lastName" : "Garcia"
}
}
}
My rule:
{
"rules": {
"menu": {
"$items": {
".read": "true",
".write": "root.child('stores').child('1').child(data.child('user').val()).hasChild(auth.uid)"
}
},
"stores": {
"$store": {
".read": "true",
".write": "root.child('stores').child('$store').child(data.child('user').val()).hasChild(auth.uid)"
}
}
}
}
The read is ok. :-) But I can't write.
Your newData doesn't have a child user so that check always fails. You probably mean:
"43268522": {
"menu": {
"$items": {
".read": "true",
".write": "root.child('stores').child('1').child('user').hasChild(auth.uid)"
}
}
}
You're probably looking for this rule:
".write": "
root.child('stores')
.child(newData.child('store').val())
.child('user')
.hasChild(auth.uid)"
So this uses the store property from the new data to look up if the current user is in the store they're trying to modify.
Unfortunately this rule won't work with your current data structure, since the value of store is a number, while the key of a store is a string: "1" !== 1.
The simplest solution is to store the store as a string, e.g.:
"store": "1"
You might want to consider that anyway, since you're now getting Firebase's array coercion, which is not helpful. For more on this see our blog post on Best Practices: Arrays in Firebase. I'd recommend storing stores using either push IDs, or simply prefixing them, e.g.
"stores": {
"store1": {
...
}
}
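The effect of the type mismatch can be illustrated outside the rules engine. The sketch below is plain Python that only mimics the lookup the rule performs (it is not how Firebase evaluates rules); the function name `can_write_menu_item` is hypothetical and the store data is taken from the question.

```python
def can_write_menu_item(db, uid, new_item):
    """Mimic the rule: is uid listed under stores/<new_item.store>/user?"""
    store_key = new_item.get("store")
    # Database keys are strings, so a numeric store value never finds a store.
    store = db.get("stores", {}).get(store_key, {})
    return uid in store.get("user", {})


db = {
    "stores": {
        "1": {"name": "Store 1", "user": {"pyixsRTw9qdiuESt62YnmEYXQt13": True}},
    }
}
uid = "pyixsRTw9qdiuESt62YnmEYXQt13"

print(can_write_menu_item(db, uid, {"name": "Smoothie", "store": 1}))    # False
print(can_write_menu_item(db, uid, {"name": "Smoothie", "store": "1"}))  # True
```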

Search by array values in Firebase

I use Firebase via REST API. I have following database structure:
{
"categories" : {
"Cat1" : {},
"Cat2" : {},
"Cat3" : {},
"Cat4" : {}
},
"items" : {
"item1" : {
"categories": ["Cat1", "Cat3"]
},
"item2" : {
"categories": ["Cat1", "Cat3"]
},
"item3" : {
"categories": ["Cat1", "Cat2", "Cat3"]
},
"item4" : {
"categories": ["Cat4"]
}
}
}
As you could see we have relations of type "N <-> N" between categories and items (one category could have several items and one item could be in several categories).
Now I want to get all items of Cat1 via Firebase REST API, but I can not do it.
As we know, Firebase stores arrays as maps with integer keys:
"categories": {
"0": "Cat1",
"1": "Cat2",
"2": "Cat3"
}
So, I added ".indexOn": ["categories/*"] to Realtime Database Rules and tried to call this:
curl 'https://...firebaseio.com/...json?orderBy="categories/*"&equalTo="Cat1"'
But I got only this: { }
So I think wildcards (or regular expressions) do not work in Firebase queries, because this did work:
".indexOn": ["categories/0"] in Realtime Database Rules and
curl 'https://...firebaseio.com/...json?orderBy="categories/0"&equalTo="Cat1"'
Of course, I could change the database model to something like this:
{
"categories" : {
"Cat1" : {},
"Cat2" : {},
"Cat3" : {},
"Cat4" : {}
},
"items" : {
"item1" : {},
"item2" : {},
"item3" : {},
"item4" : {}
},
"category-items": {
"Cat1": ["item1", "item2", "item3"],
"Cat2": ["item3"],
"Cat3": ["item1", "item2", "item3"],
"Cat4": ["item4"]
}
}
I could then get category-items and iterate through the Cat1 array, but that means calling the REST API read method too many times (one call for every item in the category), which is too expensive.
So, could anybody help me get all the items in a category with the original database model?
UPDATE
The final model is:
{
"categories" : {
"Cat1" : {},
"Cat2" : {},
"Cat3" : {},
"Cat4" : {}
},
"items" : {
"item1" : {
"Cat1": true,
"Cat3": true
},
"item2" : {
"Cat1": true,
"Cat3": true
},
"item3" : {
"Cat1": true,
"Cat2": true,
"Cat3": true
},
"item4" : {
"Cat4": true
}
}
}
Also I added
{
"rules": {
...
"items": {
".indexOn": [ "Cat1", "Cat2", "Cat3", "Cat4" ]
}
}
}
to Realtime Database Rules, and REST API call is
curl 'https://...firebaseio.com/items.json?orderBy="Cat1"&equalTo=true'
Thanks to Vladimir Gabrielyan
Here is the structure which I would suggest to have.
{
"categories" : {
"Cat1" : {
"items": {
"item1":{/*Some item info*/},
"item2":{/*Some item info*/}
}
},
"Cat2" : {
"items": {
"item3":{/*Some item info*/}
}
}
},
"items" : {
"item1" : {
"categories": {
"Cat1": true
}
},
"item3" : {
"categories": {
"Cat2": true,
"Cat3": true
}
}
}
}
You need to duplicate the item information under both categories/Cat1/items/{itemId} and items/{itemId}, but I think that is okay.
Also see this article. https://firebase.googleblog.com/2013/04/denormalizing-your-data-is-normal.html
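Keeping the duplicated copies in sync is easier when every write is prepared as one map of paths to values, which could then be sent as a single multi-location update. The following is a minimal sketch in Python that only builds that map; the function name `fan_out_item` is hypothetical, and sending the update is left out.

```python
def fan_out_item(item_id, item_info, category_ids):
    """Return a path -> value map that writes the item under every location."""
    # Mark each category on the item itself.
    updates = {
        f"items/{item_id}/categories/{cat}": True for cat in category_ids
    }
    # Duplicate the item information under each category it belongs to.
    for cat in category_ids:
        updates[f"categories/{cat}/items/{item_id}"] = dict(item_info)
    return updates


updates = fan_out_item("item3", {"name": "Item 3"}, ["Cat2", "Cat3"])
for path in sorted(updates):
    print(path, updates[path])
```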
Wow! Thank you very much! Your suggestion to replace
"item1" : {
"categories": ["Cat1", "Cat3"]
},
to
"item1" : {
"Cat1": true,
"Cat3": true
},
can solve the problem. I will then have to add every Cat to .indexOn in the Realtime Database Rules, but that is a much smaller problem than the original one.
But I think that
"categories" : {
"Cat1" : {
"items": {
"item1":{/*Some item info*/},
"item2":{/*Some item info*/}
}
},
"Cat2" : {
"items": {
"item3":{/*Some item info*/}
}
}
}
is not a good idea in my case, because we would then download a lot of unneeded data every time we read a Cat (when we only need the metadata of the Cat, not its list of items). So, I suggest the following model:
{
"categories" : {
"Cat1" : {},
"Cat2" : {},
"Cat3" : {},
"Cat4" : {}
},
"items" : {
"item1" : {
"Cat1": true,
"Cat3": true
},
"item2" : {
"Cat1": true,
"Cat3": true
},
"item3" : {
"Cat1": true,
"Cat2": true,
"Cat3": true
},
"item4" : {
"Cat4": true
}
}
}
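Moving from the array model to this one is a mechanical transformation, and the .indexOn list can be derived from the category keys. A small sketch in Python, assuming the data has been exported as plain dicts; the function names `to_flag_map` and `index_on` are hypothetical.

```python
def to_flag_map(item):
    """Replace a "categories" list with one boolean child per category."""
    flat = {key: value for key, value in item.items() if key != "categories"}
    flat.update({cat: True for cat in item.get("categories", [])})
    return flat


def index_on(categories):
    """Build the .indexOn list: one entry per category key."""
    return sorted(categories)


old_items = {
    "item1": {"categories": ["Cat1", "Cat3"]},
    "item4": {"categories": ["Cat4"]},
}
new_items = {key: to_flag_map(item) for key, item in old_items.items()}

print(new_items)
print(index_on({"Cat1": {}, "Cat2": {}, "Cat3": {}, "Cat4": {}}))
```

Every new category still has to be added to the rules, so the index list needs to be regenerated whenever the set of categories changes.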

Querying Elasticsearch parent/child documents

We work with two types of documents in Elasticsearch (ES): items and slots, where items are the parents of slot documents.
We define the index with the following command:
curl -XPOST 'localhost:9200/items' -d @itemsdef.json
where itemsdef.json has the following definition
{
"mappings" : {
"item" : {
"properties" : {
"id" : {"type" : "long" },
"name" : {
"type" : "string",
"_analyzer" : "textIndexAnalyzer"
},
"location" : {"type" : "geo_point" }
}
}
},
"settings" : {
"analysis" : {
"analyzer" : {
"activityIndexAnalyzer" : {
"alias" : ["activityQueryAnalyzer"],
"type" : "custom",
"tokenizer" : "whitespace",
"filter" : ["trim", "lowercase", "asciifolding", "spanish_stop", "spanish_synonym"]
},
"textIndexAnalyzer" : {
"type" : "custom",
"tokenizer" : "whitespace",
"filter" : ["word_delimiter_impl", "trim", "lowercase", "asciifolding", "spanish_stop", "spanish_synonym"]
},
"textQueryAnalyzer" : {
"type" : "custom",
"tokenizer" : "whitespace",
"filter" : ["trim", "lowercase", "asciifolding", "spanish_stop"]
}
},
"filter" : {
"spanish_stop" : {
"type" : "stop",
"ignore_case" : true,
"enable_position_increments" : true,
"stopwords_path" : "analysis/spanish-stopwords.txt"
},
"spanish_synonym" : {
"type" : "synonym",
"synonyms_path" : "analysis/spanish-synonyms.txt"
},
"word_delimiter_impl" : {
"type" : "word_delimiter",
"generate_word_parts" : true,
"generate_number_parts" : true,
"catenate_words" : true,
"catenate_numbers" : true,
"split_on_case_change" : false
}
}
}
}
}
Then we add the child document definition using the following command:
curl -XPOST 'localhost:9200/items/slot/_mapping' -d @slotsdef.json
Where slotsdef.json has the following definition:
{
"slot" : {
"_parent" : {"type" : "item"},
"_routing" : {
"required" : true,
"path" : "parent_id"
},
"properties": {
"id" : { "type" : "long" },
"parent_id" : { "type" : "long" },
"activity" : {
"type" : "string",
"_analyzer" : "activityIndexAnalyzer"
},
"day" : { "type" : "integer" },
"start" : { "type" : "integer" },
"end" : { "type" : "integer" }
}
}
}
Finally we perform a bulk index with the following command:
curl -XPOST 'localhost:9200/items/_bulk' --data-binary @testbulk.json
Where testbulk.json holds the following data:
{"index":{"_type": "item", "_id":35}}
{"location":[40.4,-3.6],"id":35,"name":"A Name"}
{"index":{"_type":"slot","_id":126,"_parent":35}}
{"id":126,"start":1330,"day":1,"end":1730,"activity":"An Activity","parent_id":35}
I'm trying to make the following query: search for all items within a certain distance to a location that have children (slots) in the specified days and within certain start and end ranges.
An item with more slots fulfilling the condition should score higher.
I tried starting from existing samples, but the docs are really scarce and it's hard to move forward.
Clues?
I don't think there is a way to write an efficient query that does this without moving location to the slots. You can do something like the following, but it can be quite inefficient for some data:
{
"query": {
"top_children" : {
"type": "slot",
"query" : {
"constant_score" : {
"query" : {
... your query for children goes here ...
}
}
},
"score" : "sum",
"factor" : 5,
"incremental_factor" : 2
}
},
"filter": {
"geo_distance" : {
"distance" : "200km",
"location" : {
"lat" : 40,
"lon" : -70
}
}
}
}
Basically, this query takes your range query or filter for the children, plus whatever other conditions you need, and wraps it in a constant_score query so that every matching child has a score of 1.0. The top_children query collects these children and accumulates their scores onto the parents. The filter then removes the parents that are too far away.
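The scoring idea can be illustrated outside Elasticsearch: each matching child counts as 1, the counts are summed per parent, and parents outside the radius are dropped. The sketch below is plain Python and not the Elasticsearch implementation; the function names, the explicit lat/lon fields, and the sample data beyond item 35 and slot 126 are assumptions made for illustration.

```python
import math
from collections import Counter


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points in kilometres."""
    radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def score_items(items, slots, center, max_km, day, start, end):
    """Score each nearby item by its number of matching slots, best first."""
    # Each matching slot contributes a constant score of 1 to its parent.
    counts = Counter(
        s["parent_id"] for s in slots
        if s["day"] == day and s["start"] >= start and s["end"] <= end
    )
    scored = [
        (item["id"], counts[item["id"]]) for item in items
        if counts[item["id"]] > 0
        and haversine_km(center[0], center[1], item["lat"], item["lon"]) <= max_km
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


items = [
    {"id": 35, "lat": 40.4, "lon": -3.6},
    {"id": 36, "lat": 40.41, "lon": -3.7},
    {"id": 37, "lat": 48.85, "lon": 2.35},  # far away, filtered out
]
slots = [
    {"id": 126, "parent_id": 35, "day": 1, "start": 1330, "end": 1730},
    {"id": 127, "parent_id": 36, "day": 1, "start": 1400, "end": 1500},
    {"id": 128, "parent_id": 36, "day": 1, "start": 1600, "end": 1700},
    {"id": 129, "parent_id": 37, "day": 1, "start": 1400, "end": 1500},
]

print(score_items(items, slots, (40.4, -3.7), 50, day=1, start=1300, end=1800))
```

Counting in application code like this only works for small data sets; it is meant to show what the accumulated score represents, not to replace the query.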

Resources