Here is the structure of my Realtime Database in Firebase:
{
"student1" : {
"name" : "somename",
"skillset" : [
"cpp",
"c",
"java"
],
other properties
},
"student2" : {
"name" : "somename",
"skillset" : [
"javascript",
"c",
"python"
],
other properties
},
"student3" : {
"name" : "somename",
"skillset" : [
"cpp",
"java"
],
other properties
},
"student4" : {
"name" : "somename",
"skillset" : [
"java",
"kotlin"
],
other properties
} }
I want to retrieve all the students who have every skill in a given set.
For example, with skills = ["cpp","java"],
the answer should be ["student1","student3"].
Your current structure allows you to easily determine the skills for a user. It does not, however, make it easy to determine the users for a skill. To allow that, you'll need to add a reverse index, which looks something like this:
skills: {
java: {
student1: true,
student3: true,
student4: true
},
kotlin: {
student4: true
}
...
}
With the above you can look up the user IDs for a skill, and from there look up each user. For more on this, see my answer here: Firebase query if child of child contains a value
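To keep that index in sync you would write it whenever a student's skills change. A minimal sketch of deriving the same `skills` index from the existing student data (for example in a one-off migration script), assuming the question's data shape; `buildSkillIndex` and `studentsForSkill` are hypothetical names and the sample data only mirrors the question:

```javascript
// Sample data shaped like the question (other properties omitted).
const students = {
  student1: { name: "somename", skillset: ["cpp", "c", "java"] },
  student2: { name: "somename", skillset: ["javascript", "c", "python"] },
  student3: { name: "somename", skillset: ["cpp", "java"] },
  student4: { name: "somename", skillset: ["java", "kotlin"] },
};

// Builds a reverse index { skill: { studentKey: true } } from the students object.
function buildSkillIndex(data) {
  const index = {};
  for (const [studentKey, student] of Object.entries(data)) {
    for (const skill of student.skillset || []) {
      if (!index[skill]) index[skill] = {};
      index[skill][studentKey] = true;
    }
  }
  return index;
}

// Returns the student keys listed under one skill (empty array if none).
function studentsForSkill(index, skill) {
  return Object.keys(index[skill] || {});
}

const index = buildSkillIndex(students);
console.log(studentsForSkill(index, "java").join(", ")); // → student1, student3, student4
```

Each key returned by `studentsForSkill` can then be used to load the corresponding student node.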
But this still won't allow you to query for users by multiple skills. To allow that, you'll have to add skill combinations to the new data structure. For example, with the above skills there is one user who knows both Kotlin and Java:
skills: {
java: {
student1: true,
student3: true,
student4: true
},
java_kotlin: {
student4: true
},
kotlin: {
student4: true
}
...
}
While this leads to extra data, it performs quite well in practice, since you can always directly access the data that you need (so there's no real database query needed).
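A sketch of generating those combination keys, assuming skills are sorted alphabetically and joined with `_` (matching the `java_kotlin` example) so that the same set of skills always maps to the same key. All function names are hypothetical. Note that a student with n skills produces 2^n - n - 1 combination keys, and skill names must be valid Realtime Database keys (no `.`, `$`, `#`, `[`, `]` or `/`) and must not themselves contain `_`:

```javascript
// Returns every combination of two or more skills as a sorted, "_"-joined key.
function skillCombinationKeys(skillset) {
  const skills = [...new Set(skillset)].sort();
  const keys = [];
  // Each bitmask from 1 to 2^n - 1 selects one non-empty subset of skills.
  for (let mask = 1; mask < 1 << skills.length; mask++) {
    const subset = skills.filter((_, i) => mask & (1 << i));
    if (subset.length >= 2) keys.push(subset.join("_"));
  }
  return keys;
}

// The key to read when looking up students for a given set of skills.
function combinationKey(skills) {
  return [...new Set(skills)].sort().join("_");
}

// Builds a multi-location update object that adds one student under each
// single skill and each skill combination in the "skills" node.
function skillIndexUpdates(studentKey, skillset) {
  const updates = {};
  const singles = [...new Set(skillset)];
  for (const key of [...singles, ...skillCombinationKeys(skillset)]) {
    updates[`skills/${key}/${studentKey}`] = true;
  }
  return updates;
}

console.log(skillCombinationKeys(["java", "kotlin"]).join(", ")); // → java_kotlin
console.log(combinationKey(["kotlin", "java"])); // → java_kotlin
console.log(skillIndexUpdates("student4", ["java", "kotlin"]));
```

Finding everyone who knows both Java and Kotlin is then a single read of `skills/` plus `combinationKey(["java", "kotlin"])`, regardless of the order in which the skills are given.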
That is not possible with this structure, as Firebase queries can only filter on one child property.
In your case, you would need to get all the data and filter it in code.
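A minimal sketch of that client-side filtering, assuming `students` is the plain object you get after reading the whole top-level node (for example via `snapshot.val()` in the JavaScript SDK). The function name `studentsWithAllSkills` is hypothetical, and the sample data only mirrors the question:

```javascript
// Sample data shaped like the question (other properties omitted).
const students = {
  student1: { name: "somename", skillset: ["cpp", "c", "java"] },
  student2: { name: "somename", skillset: ["javascript", "c", "python"] },
  student3: { name: "somename", skillset: ["cpp", "java"] },
  student4: { name: "somename", skillset: ["java", "kotlin"] },
};

// Returns the keys of all students whose skillset contains every required skill.
function studentsWithAllSkills(data, requiredSkills) {
  return Object.keys(data).filter((key) => {
    const skills = new Set(data[key].skillset || []);
    return requiredSkills.every((skill) => skills.has(skill));
  });
}

console.log(studentsWithAllSkills(students, ["cpp", "java"]).join(", ")); // → student1, student3
```

This downloads every student, so its cost grows with the size of the node; a reverse index avoids that.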
With Firebase, fanning out data to different nodes and paths is recommended, as in the example below from a Firebase sample:
{
"post-comments" : {
"PostId1" : {
"CommentID1" : {
"author" : "User1",
"text" : "Comment1!",
"uid" : "UserId1"
}
}
},
"posts" : {
"PostId1" : {
"author" : "user1",
"body" : "Firebase Mobile platform",
"starCount" : 1,
"stars" : {
"UserId1" : true
},
"title" : "About firebase",
"uid" : "UserId1"
}
},
"user-posts" : {
"UserId1" : {
"PostId1" : {
"author" : "user1",
"body" : "Firebase Mobile platform",
"starCount" : 1,
"stars" : {
"UserId1" : true
},
"title" : "About firebase",
"uid" : "UserId1"
}
}
},
"users" : {
"UserId1" : {
"email" : "user1#gmail.com",
"username" : "user1"
}
}
}
With multi-path updates we can atomically update all the paths for a post. However, if we want to delete a blog post in the above kind of schema, how can we do it atomically? I guess there is no multi-path delete. If the client loses its network connection while deleting, only a few of the paths would be deleted!
Also, suppose there is a requirement that when a user is deleted, we should remove that user's stars from all the posts they have starred. This becomes difficult, as there is no direct tracking of which posts a user has starred. Do we need to fan out the starring of posts as well, for example with a user-stars node? Then, while deleting the user, we would know all the activity the user has done and could act on it. Is there a better way of handling this?
"user-stars":{
"UserId1":{
"PostID1":true
}
}
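With such a `user-stars` node, the paths that hold a user's stars can be computed, and since writing `null` to a path in a multi-location update deletes it, they can all be cleared in one call. A sketch under assumptions: `buildUnstarAllUpdates` is hypothetical, `postAuthors` is a hypothetical map from post ID to author UID (needed to reach the copy under `user-posts`), and `starCount` is not adjusted here; it would need separate handling, for example a transaction per post:

```javascript
// userStars: the value of "user-stars/<uid>", e.g. { PostId1: true }.
// postAuthors: hypothetical map from post ID to the author's UID.
function buildUnstarAllUpdates(uid, userStars, postAuthors) {
  const updates = {};
  for (const postId of Object.keys(userStars || {})) {
    // null deletes the path when the object is passed to update().
    updates[`posts/${postId}/stars/${uid}`] = null;
    const authorUid = postAuthors[postId];
    if (authorUid) {
      updates[`user-posts/${authorUid}/${postId}/stars/${uid}`] = null;
    }
  }
  // Finally remove the user's own star index.
  updates[`user-stars/${uid}`] = null;
  return updates;
}

const updates = buildUnstarAllUpdates("UserId1", { PostId1: true }, { PostId1: "UserId1" });
console.log(Object.keys(updates).join("\n"));
// In the app (not run here): rootRef.update(updates), with rootRef pointing at the database root.
```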
In both cases, there seems to be no way to delete the data from multiple paths atomically or consistently (all or nothing).
In that case, the only option seems to be putting the delete command in Firebase Queue, which would resolve the task only once everything is deleted. That would be an eventually consistent option, which should be fine, but it is an expensive option that requires a server. Is there a better way?
You can implement a multi-path delete by writing a value of null to the paths.
So:
var updates = {
"user-posts/UserId1/PostId1": null,
"post-comments/PostId1": null,
"posts/PostId1": null
}
ref.update(updates);
I had already answered this before: Firebase -- Bulk delete child nodes
It's also quite explicitly mentioned in the documentation on deleting data:
You can also delete by specifying null as the value for another write operation such as set() or update(). You can use this technique with update() to delete multiple children in a single API call.
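The same idea extends to paths that depend on the post's content. A minimal sketch, assuming you first read `posts/<postId>` to learn the author (`uid`) and who starred it (`stars`), and assuming the `user-stars` node proposed in the question exists; `buildDeletePostUpdates` is a hypothetical helper. The read and the update are separate steps, so a star added in between would not be covered:

```javascript
// post: the value read from "posts/<postId>" (uses its "uid" and "stars").
function buildDeletePostUpdates(postId, post) {
  const updates = {
    [`posts/${postId}`]: null,
    [`user-posts/${post.uid}/${postId}`]: null,
    [`post-comments/${postId}`]: null,
  };
  // Also remove the post from each starring user's "user-stars" index.
  for (const starUid of Object.keys(post.stars || {})) {
    updates[`user-stars/${starUid}/${postId}`] = null;
  }
  return updates;
}

const post = { author: "user1", starCount: 1, stars: { UserId1: true }, title: "About firebase", uid: "UserId1" };
const updates = buildDeletePostUpdates("PostId1", post);
console.log(Object.keys(updates).join("\n"));
// In the app this object goes to a single call: ref.update(updates).
```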
I'm trying to get started with Firebase and I just want to make sure that this data structure is optimized for Firebase.
The conversation object/tree/whatever looks like this:
conversations: {
"-JRHTHaKuITFIhnj02kE": {
user_one_id: "054bd9ea-5e05-442b-a03d-4ff8e763030b",
user_two_id: "0b1b89b7-2580-4d39-ae6e-22ba6773e004",
user_one_name: "Christina",
user_two_name: "Conor",
user_one_typing: false,
user_two_typing: false,
last_message_text: "Hey girl, what are you doing?",
last_message_type: "TEXT",
last_message_date: 0
}
}
and the messages object looks like so:
messages: {
"-JRHTHaKuITFIhnj02kE": {
conversation: "-JRHTHaKuITFIhnj02kE",
sender: "054bd9ea-5e05-442b-a03d-4ff8e763030b",
message: "Hey girl, what are you doing?",
message_type: "TEXT",
message_date: 0
}
}
Is storing each user's name in the conversation object needed, or can I easily look up the user's name by their UID on the fly? Other than the name question, is this good? I don't want to get started with a really bad data structure.
Note: Yes, I know the UIDs for the conversation and message are the same; I got tired of making up variables.
I usually model the data that I need to show in a single screen in a single location in the database. That makes it possible to retrieve that data with a single read/listener.
Following that train of thought, it makes sense to keep the user name in the conversation node. In fact, I usually keep the user name in each message node too. The latter prevents the need for a lookup, although in this case I might be expanding the data model a bit far for the sake of keeping the code as simple as possible.
For the naming of the chat: if this is a fairly standard chat app, users may expect to have a persistent 1:1 chat with each other, so that every time you and I chat, we end up in the same room. A good approach for accomplishing that in the data model can be found in this answer: Best way to manage Chat channels in Firebase
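A sketch of one common way to get that "same room every time" behaviour, assuming the room key is derived from both user IDs in a fixed (sorted) order rather than generated with push(); the linked answer may differ in its details, and `chatRoomId` is a hypothetical name. It also assumes UIDs never contain the `_` separator:

```javascript
// Derives one room key from two UIDs. Sorting first means both participants
// compute the same key, whoever starts the chat.
function chatRoomId(uidA, uidB) {
  return [uidA, uidB].sort().join("_");
}

const christina = "054bd9ea-5e05-442b-a03d-4ff8e763030b";
const conor = "0b1b89b7-2580-4d39-ae6e-22ba6773e004";
console.log(chatRoomId(christina, conor) === chatRoomId(conor, christina)); // → true
```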
I don't think you structured it right. You should bear a complete "what if" analysis in mind.
I would recommend structuring it this way instead (I made it up for fun and have not really tested its performance under heavy traffic, but you can always denormalize further to increase performance when needed):
{
"conversation-messages" : {
"-JpntMPN_iPC3pKDUX9Z" : {
"-Jpnjg_7eom7pMG6LDe1" : {
"message" : "hey! Who are you?",
"timestamp" : 1432165992987,
"type" : "text",
"userId" : "user:-Jpnjcdp6YXM0auS1BAT"
},
"-JpnjibdwWpf1k-zS3SD" : {
"message" : "Arya Stark. You?",
"timestamp" : 1432166001453,
"type" : "text",
"userId" : "user:-OuJffgdYY0jshTFD"
},
"-JpnkqRjkz5oT9sTrKYU" : {
"message" : "no one. a man has no name.",
"timestamp" : 1432166295571,
"type" : "text",
"userId" : "user:-Jpnjcdp6YXM0auS1BAT"
}
}
},
"conversations-metadata" : { // to show the conversation list from all users for each user
"-JpntMPN_iPC3pKDUX9Z" : {
"id": "-JpntMPN_iPC3pKDUX9Z",
"date":995043959933,
"lastMsg": "no one. a man has no name.",
"messages_id": "-JpntMPN_iPC3pKDUX9Z"
}
},
"users" : {
"user:-Jpnjcdp6YXM0auS1BAT" : {
"id" : "user:-Jpnjcdp6YXM0auS1BAT",
"name" : "many-faced foo",
"ProfileImg" : "....",
"conversations":{
"user:-Yabba_Dabba_Doo" : {
"conversation_id": "-JpntMPN_iPC3pKDUX9Z",
"read" : false
}
}
},
"user:-Yabba_Dabba_Doo" : {
"id" : "user:-Yabba_Dabba_Doo",
"name" : "Arya Stark",
"ProfileImg" : "....",
"conversations":{
"user:-Jpnjcdp6YXM0auS1BAT" : {
"conversation_id": "-JpntMPN_iPC3pKDUX9Z",
"read" : true
}
}
}
}
}
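Because one conversation touches several nodes in this structure, starting it is a natural fit for a single multi-location update. A sketch, assuming the IDs and timestamp are supplied by the caller (in a real app they would typically come from `push()` keys and a server timestamp) and that the sender's entry starts as read; `buildStartConversationUpdates` is hypothetical:

```javascript
// Returns one multi-location update object that creates a conversation in the
// structure above: the message, the metadata, and a pointer per participant.
function buildStartConversationUpdates(conversationId, messageId, fromUid, toUid, text, timestamp) {
  return {
    [`conversation-messages/${conversationId}/${messageId}`]: {
      message: text,
      timestamp: timestamp,
      type: "text",
      userId: fromUid,
    },
    [`conversations-metadata/${conversationId}`]: {
      id: conversationId,
      date: timestamp,
      lastMsg: text,
      messages_id: conversationId,
    },
    // Assumption: the sender has read their own message, the recipient has not.
    [`users/${fromUid}/conversations/${toUid}`]: { conversation_id: conversationId, read: true },
    [`users/${toUid}/conversations/${fromUid}`]: { conversation_id: conversationId, read: false },
  };
}

const updates = buildStartConversationUpdates(
  "-JpntMPN_iPC3pKDUX9Z",
  "-Jpnjg_7eom7pMG6LDe1",
  "user:-Jpnjcdp6YXM0auS1BAT",
  "user:-Yabba_Dabba_Doo",
  "hey! Who are you?",
  1432165992987
);
console.log(Object.keys(updates).join("\n"));
// In the app: rootRef.update(updates), with rootRef pointing at the database root.
```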
Implementing an Android+Web(Angular)+Firebase app, which has a many-to-many relationship: User <-> Widget (Widgets can be shared to multiple users).
Considerations:
List all the Widgets that a User has.
A User can only see the Widgets which are shared to him/her.
Be able to see all Users to whom a given Widget is shared.
A single Widget can be owned/administered by multiple Users with equal rights (modify Widget and change to whom it is shared). Similar to how Google Drive does sharing to specific users.
One approach to implementing fetching (join-style) would be to follow this advice: https://www.firebase.com/docs/android/guide/structuring-data.html ("Joining Flattened Data"), using multiple listeners.
However, I have doubts about this approach, because I have discovered that data loading would be worryingly slow (at least on Android). I asked about it in another question: Firebase Android: slow "join" using many listeners, seems to contradict documentation.
So, this question is about another approach: per-user copies of all the Widgets that a user has, as used in the Firebase+Udacity tutorial "ShoppingList++" (https://www.firebase.com/blog/2015-12-07-udacity-course-firebase-essentials.html).
Their structure looks like this, in particular the userLists part:
"userLists" : {
"abc#gmail,com" : {
"-KBt0MDWbvXFwNvZJXTj" : {
"listName" : "Test List 1 Rename 2",
"owner" : "xyz#gmail,com",
"timestampCreated" : {
"timestamp" : 1456950573084
},
"timestampLastChanged" : {
"timestamp" : 1457044229747
},
"timestampLastChangedReverse" : {
"timestamp" : -1457044229747
}
}
},
"xyz#gmail,com" : {
"-KBt0MDWbvXFwNvZJXTj" : {
"listName" : "Test List 1 Rename 2",
"owner" : "xyz#gmail,com",
"timestampCreated" : {
"timestamp" : 1456950573084
},
"timestampLastChanged" : {
"timestamp" : 1457044229747
},
"timestampLastChangedReverse" : {
"timestamp" : -1457044229747
}
},
"-KByb0imU7hFzWTK4eoM" : {
"listName" : "List2",
"owner" : "xyz#gmail,com",
"timestampCreated" : {
"timestamp" : 1457044332539
},
"timestampLastChanged" : {
"timestamp" : 1457044332539
},
"timestampLastChangedReverse" : {
"timestamp" : -1457044332539
}
}
}
},
As you can see, the info for the shopping list "Test List 1 Rename 2" is copied to two places (one per user).
And here is the rest for completeness:
{
"ownerMappings" : {
"-KBt0MDWbvXFwNvZJXTj" : "xyz#gmail,com",
"-KByb0imU7hFzWTK4eoM" : "xyz#gmail,com"
},
"sharedWith" : {
"-KBt0MDWbvXFwNvZJXTj" : {
"abc#gmail,com" : {
"email" : "abc#gmail,com",
"hasLoggedInWithPassword" : false,
"name" : "Agenda TEST",
"timestampJoined" : {
"timestamp" : 1456950523145
}
}
}
},
"shoppingListItems" : {
"-KBt0MDWbvXFwNvZJXTj" : {
"-KBt0heZh-YDWIZNV7xs" : {
"bought" : false,
"itemName" : "item",
"owner" : "xyz#gmail,com"
}
}
},
"uidMappings" : {
"google:112894577549422030859" : "abc#gmail,com",
"google:117151367009479509658" : "xyz#gmail,com"
},
"userFriends" : {
"xyz#gmail,com" : {
"abc#gmail,com" : {
"email" : "abc#gmail,com",
"hasLoggedInWithPassword" : false,
"name" : "Agenda TEST",
"timestampJoined" : {
"timestamp" : 1456950523145
}
}
}
},
"users" : {
"abc#gmail,com" : {
"email" : "abc#gmail,com",
"hasLoggedInWithPassword" : false,
"name" : "Agenda TEST",
"timestampJoined" : {
"timestamp" : 1456950523145
}
},
"xyz#gmail,com" : {
"email" : "xyz#gmail,com",
"hasLoggedInWithPassword" : false,
"name" : "Karol Depka",
"timestampJoined" : {
"timestamp" : 1456952940258
}
}
}
}
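For reference, a rename in this layout has to touch every per-user copy, which a client can do with one multi-location update. A sketch, assuming the client has already read `ownerMappings/<listId>` and `sharedWith/<listId>` and passes the current time in as `now`; `buildRenameListUpdates` is hypothetical. The set of copies is whatever `sharedWith` held when the client computed it, which is exactly why the offline rename scenario in the questions below can leave a stale copy:

```javascript
// Renames one list in every per-user copy under "userLists". ownerEmail comes
// from "ownerMappings/<listId>", the keys of sharedWith from "sharedWith/<listId>".
function buildRenameListUpdates(listId, newName, ownerEmail, sharedWith, now) {
  const emails = new Set([ownerEmail, ...Object.keys(sharedWith || {})]);
  const updates = {};
  for (const email of emails) {
    const base = `userLists/${email}/${listId}`;
    updates[`${base}/listName`] = newName;
    updates[`${base}/timestampLastChanged/timestamp`] = now;
    updates[`${base}/timestampLastChangedReverse/timestamp`] = -now;
  }
  return updates;
}

const updates = buildRenameListUpdates(
  "-KBt0MDWbvXFwNvZJXTj",
  "Test List 1 Rename 3",
  "xyz#gmail,com",
  { "abc#gmail,com": { name: "Agenda TEST" } },
  1457044229747
);
console.log(Object.keys(updates).length); // → 6
```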
However, before I jump into implementing a similar structure in my app, I would like to clarify a few doubts.
Here are my interrelated questions:
1. In their ShoppingList++ app, they only permit a single "owner", assigned in the ownerMappings node. Thus no one else can rename the shopping list. I would like to have multiple "owners"/admins with equal rights. Would such a keep-copies-per-user structure still work for multiple owner/admin users, without risking data corruption/"desynchronization" or "pranks"?
2. Could data corruption arise in scenarios like this: User1 goes offline and renames Widget1 to Widget1Prim. While User1 is offline, User2 shares Widget1 with User3 (User3's copy would not yet be aware of the rename). User1 goes online and sends the info about the rename of Widget1 (only to his own and User2's copies, which the client code was aware of at the time of the rename, not updating User3's copy). Now, in a naive implementation, User3 would have the old name, while the others would have the new name. This would probably be rare, but it is still a bit worrying.
3. Could/should the data corruption scenario in point "2." be resolved by having some process (e.g. on AppEngine) listening to changes and ensuring proper propagation to all user copies?
4. And/or could/should the data corruption scenario in point "2." be resolved by having clients redundantly listen to both sharing changes and renames, and propagate the changes to the per-user copies, to handle the special case? Most of the time this would not be necessary, so it could result in a performance/bandwidth penalty and complicated code. Is it worth it?
5. Going forward, once we have multiple versions deployed "in the wild", wouldn't it become unwieldy to evolve the schema, given how much of the data-handling responsibility lies with the code in the clients? For example, if we add a new relationship that older client versions don't yet know about, doesn't it seem fragile? Does that bring us back to the server-side syncer-ensurerer process on e.g. AppEngine (described in question "3.")?
6. Would it be a good idea to also have a "master reference copy" of every Widget / shopping list, so as to give a good "source of truth" for any syncer-ensurerer type of operations that would update per-user copies?
7. Are there any special considerations/traps/blockers regarding rules.json / rules.bolt permissions for data structured in such a (redundant) way?
PS: I know about atomic multi-path updates via updateChildren() - would definitely use them.
Any other hints/observations welcome. TIA.
I suggest having only one copy of a widget for the entire system. It would have an origin user ID and a set of users that have access to it. The widget tree can hold user permissions and change history. Any time a change is made, a branch is added to the tree. Branches can then be "promoted" to the "master", kind of like in Git. This would guarantee data integrity because past versions are never changed or deleted. It would also simplify your fetches... I think :)
{
users: {
bob: {
widgets: {
xxx: {
widgetKey: xyz,
permissions: *,
lastEdit: ...
}
}
},
...
},
widgets: {
xyz: {
masterKey: abc,
data: {...},
owner: bob
},
...
},
widgetHistory: {
xyz: {
v1: {
data: {...}
},
v2: {...},
v3: {...}
},
123: {
...
},
...
}
}
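A sketch of how writes could look in that layout, assuming (hypothetically) that `masterKey` records which history entry is currently promoted and that version keys are chosen by the caller; the helper names and the `editor` and `createdAt` fields are illustrative only. New versions are only added, and promotion copies a version's data into `widgets`:

```javascript
// Adds a new history entry. The caller must pick an unused versionKey
// (e.g. a push() key) so that no earlier version is overwritten.
function buildAddVersionUpdates(widgetId, versionKey, data, editorUid, now) {
  return {
    [`widgetHistory/${widgetId}/${versionKey}`]: { data, editor: editorUid, createdAt: now },
  };
}

// Copies the chosen version into the "master" widget and records which one it was.
function buildPromoteUpdates(widgetId, versionKey, data) {
  return {
    [`widgets/${widgetId}/data`]: data,
    [`widgets/${widgetId}/masterKey`]: versionKey,
  };
}

const data = { title: "Widget 1" };
const updates = {
  ...buildAddVersionUpdates("xyz", "v4", data, "bob", 1457044229747),
  ...buildPromoteUpdates("xyz", "v4", data),
};
console.log(Object.keys(updates).join("\n"));
```

Combining both objects in one update adds and promotes a version in a single write; keeping them separate allows a review step before promotion.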
I was just looking in the docs but couldn't find anything.
So my web app has a structure that's similar to the one in this site.
For the sake of simplicity, let's say my app has only questions which are catalogued by tags. As suggested in the docs, we store our data with a flat, non-normalized structure (E.g.
{
"questions": {
...
},
"tags": {
"tag1": {
"name": "Tag1",
"questions": { "0": true, "1": true }
},
"tag2": {
"name": "Tag2",
"questions": { "2": true, "3": true }
}
}
}
), rather than a normalized structure without data replication like:
{
"questions": {
"0": { "title": ..., "tag": ... },
"1": { "title": ..., "tag": ... },
}
}
One of the advantages of using the first structure is that I can search for questions that have a certain tag without downloading the data of all the questions first: querying /tags/tag1/questions returns an object with the keys of all the matching questions. Now I need to load those questions, but how do I do that?
I don't want to make ten separate requests, one for every question; it seems a waste of time and performance, but I couldn't find a way to make Firebase filter by multiple keys. It seems I can only give Firebase one input at a time. I think (and I hope) I am missing something here. What is it?
If I really can't do this, how do I search by tags here?
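One way to avoid waiting on those reads one at a time is to start them all together and wait for all of them. A sketch, assuming a hypothetical `loadQuestion(key)` function that returns a promise for one question (with the JavaScript SDK that could be `ref.child(key).once("value")` followed by `snapshot.val()`); the in-memory loader below only stands in for the database:

```javascript
// Loads every question listed under one tag. All reads are started before
// any of them is awaited, then combined into { key: question }.
async function loadQuestionsForTag(tagQuestions, loadQuestion) {
  const keys = Object.keys(tagQuestions || {});
  const values = await Promise.all(keys.map((key) => loadQuestion(key)));
  return Object.fromEntries(keys.map((key, i) => [key, values[i]]));
}

// Stand-in for the database; a real loader would read questions/<key>.
const fakeQuestions = { 0: { title: "Q0" }, 1: { title: "Q1" }, 2: { title: "Q2" } };
const loadFake = async (key) => fakeQuestions[key];

loadQuestionsForTag({ 0: true, 1: true }, loadFake).then((questions) => {
  console.log(Object.keys(questions).join(", ")); // → 0, 1
});
```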
EDIT: Added my current query to the end
I have a large database of human names and am using Elasticsearch (via Symfony2's FOSElasticaBundle and Elastica) to do smarter searching of the names.
I have a full name field, and I want to index people's names with standard, ngram, and phonetic analyzers.
I've got the analyzers set up in Elasticsearch, and I can begin dumping data into the index. I'm wondering if the way I'm doing it here is the best way, or if I can apply the analyzers to a single field. The reason I ask is that when I do a GET /website/person/:id, I see all three fields in plain text. I was expecting to see the analyzed data here, although I guess it must only exist in an inverted index rather than on the document. Examples I've seen use multiple fields, but is it possible to add multiple analyzers to a single field?
My config.yml:
fos_elastica:
    clients:
        default: { host: %elastica_host%, port: %elastica_port% }
    indexes:
        website:
            settings:
                index:
                    analysis:
                        analyzer:
                            phonetic_analyzer:
                                type: "custom"
                                tokenizer: "lowercase"
                                filter: ["name_metaphone", "lowercase", "standard"]
                            ngram_analyzer:
                                type: "custom"
                                tokenizer: "lowercase"
                                filter : [ "name_ngram" ]
                        filter:
                            name_metaphone:
                                encoder: "metaphone"
                                replace: false
                                type: "phonetic"
                            name_ngram:
                                type: "nGram"
                                min_gram: 2
                                max_gram: 4
            client: default
            finder: ~
            types:
                person:
                    mappings:
                        name: ~
                        nameNGram:
                            analyzer: ngram_analyzer
                        namePhonetic:
                            analyzer: phonetic_analyzer
When I check the mapping it looks good:
{
"website" : {
"mappings" : {
"person" : {
"_meta" : {
"model" : "acme\\websiteBundle\\Entity\\Person"
},
"properties" : {
"name" : {
"type" : "string",
"store" : true
},
"nameNGram" : {
"type" : "string",
"store" : true,
"analyzer" : "ngram_analyzer"
},
"namePhonetic" : {
"type" : "string",
"store" : true,
"analyzer" : "phonetic_analyzer"
}
}
}
}
}
}
When I GET the document, I see that all three fields are stored in plain text. Maybe I need to set "store": false for these extra fields, or is the data not being analyzed properly?
{
"_index" : "website",
"_type" : "person",
"_id" : "1",
"_version" : 1,
"found" : true,
"_source":{
"name":"John Doe",
"namePhonetic":"John Doe",
"nameNGram":"John Doe"
}
}
EDIT: The solution I'm currently using, which still requires some refinement but tests well for most names
//Create the query object
$boolQuery = new \Elastica\Query\Bool();
//Boost exact name matches
$exactMatchQuery = new \Elastica\Query\Match();
$exactMatchQuery->setFieldParam('name', 'query', $name);
$exactMatchQuery->setFieldParam('name', 'boost', 10);
$boolQuery->addShould($exactMatchQuery);
//Create a basic Levenshtein distance query
$levenshteinMatchQuery = new \Elastica\Query\Match();
$levenshteinMatchQuery->setFieldParam('name', 'query', $name);
$levenshteinMatchQuery->setFieldParam('name', 'fuzziness', 1);
$boolQuery->addShould($levenshteinMatchQuery);
//Create a phonetic query, seeing if the name SOUNDS LIKE the name that was searched
$phoneticMatchQuery = new \Elastica\Query\Match();
$phoneticMatchQuery->setFieldParam('namePhonetic', 'query', $name);
$boolQuery->addShould($phoneticMatchQuery);
//Create an NGRAM query
$nGramMatchQuery = new \Elastica\Query\Match();
$nGramMatchQuery->setFieldParam('nameNGram', 'query', $name);
$nGramMatchQuery->setFieldParam('nameNGram', 'boost', 2);
$boolQuery->addMust($nGramMatchQuery);
return $boolQuery;
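For comparison, the same query expressed as plain Query DSL, for example to paste into a `_search` request while experimenting. This is a hand translation of the Elastica calls above (an assumption, not output captured from Elastica), written as a JavaScript object; `buildNameQuery` is a hypothetical name:

```javascript
// Hand translation of the Elastica bool query into the JSON Query DSL
// (e.g. as the body of POST /website/person/_search).
function buildNameQuery(name) {
  return {
    query: {
      bool: {
        should: [
          // Boost exact name matches.
          { match: { name: { query: name, boost: 10 } } },
          // Basic Levenshtein distance (fuzzy) match, one edit allowed.
          { match: { name: { query: name, fuzziness: 1 } } },
          // Does the name SOUND LIKE the searched name?
          { match: { namePhonetic: { query: name } } },
        ],
        // The n-gram match is required (addMust), with a boost of 2.
        must: [{ match: { nameNGram: { query: name, boost: 2 } } }],
      },
    },
  };
}

console.log(JSON.stringify(buildNameQuery("John Doe"), null, 2));
```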
No, you can't have multiple analyzers on a single field. The way you are doing it is the correct way of applying multiple analyzers: index the same value under a different field name for each analyzer.
The reason you are also getting namePhonetic and nameNGram in the _source field is the use of
"store" : true
It tells Elasticsearch that you need those extra fields in the response as well. Use
"store" : false
and that will solve your problem.
If you want to see the analyzed data for a field, you can use Elasticsearch's _analyze API:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/indices-analyze.html
Yes, these fields are stored in the inverted index after analysis.
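To get a feel for which terms end up in that inverted index for nameNGram, here is a rough approximation of ngram_analyzer from the question's config: the lowercase tokenizer splits on non-letters and lowercases, then the nGram filter with min_gram 2 and max_gram 4 emits substrings. It is only an illustration (ASCII letters only, and Lucene may emit the grams in a different order); the _analyze API remains the authoritative way to check. `approximateNgramTerms` is a hypothetical name:

```javascript
// Approximates ngram_analyzer: lowercase tokenizer, then nGram(min 2, max 4).
function approximateNgramTerms(text, minGram = 2, maxGram = 4) {
  // Split on anything that is not an ASCII letter and drop empty tokens.
  const tokens = text.toLowerCase().split(/[^a-z]+/).filter(Boolean);
  const terms = [];
  for (const token of tokens) {
    for (let start = 0; start < token.length; start++) {
      for (let len = minGram; len <= maxGram && start + len <= token.length; len++) {
        terms.push(token.slice(start, start + len));
      }
    }
  }
  return terms;
}

console.log(approximateNgramTerms("John Doe").join(" ")); // → jo joh john oh ohn hn do doe oe
```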
I hope I have answered all your doubts. Please let me know if you need more help on this.
Thanks