I'm wondering if this is possible, and if it's a good solution to my problem.
I want users to be able to subscribe to content. Each piece of content is associated with an ID, for instance:
'JavaScript': 1,
'C++': 2,
'Python': 3,
'Java': 4,
Let's say a user subscribes to 1, 3, and 4.
So their user JSON data would appear as:
'subscribed_to': [1,3,4]
Now in my Firestore I have posts. Each post gets assigned a content_id (1-4, for instance). When I query for the content this user is subscribed to, how do I do that as efficiently as possible?
This is indeed a complex but common case. I would recommend setting up a data structure similar to:
{
  "subscriptions": {
    javascript: { ... },
    python: { ... },
    java: { ... }
  },
  "users": {
    1: {
      subscribed_to: ['javascript', 'python']
    }
  }
}
It's very important that your subscribed_to prop stores the document names, because that is what allows you to query them (the docs).
The big problem: how do I query this data? There are no joins!
Case 1:
Assuming you already have your user data when the app loads...
const fetchDocs = async (collectionName, ids) => {
  // db = firebase.firestore(); one get() per doc, batched via Promise.all
  const fetches = ids.map(id => db.collection(collectionName).doc(id).get())
  const responses = await Promise.all(fetches)
  return responses.map(r => r.data())
}

const userSubscriptions = ['javascript', 'python']
const mySubscriptions = await fetchDocs('subscriptions', userSubscriptions)
Behind the scenes, the SDK will group the requests and do its best to deliver them together. This works well, but I'm 99% sure you still pay for each document read individually each time the user logs in.
Case 2:
Create a view/dashboard collection and pre-calculate the dashboard behind the scenes. This approach involves Cloud Functions that listen for changes on the user documents (and maybe the subscriptions as well) and copy each individual doc into another collection, let's say subscriptions_per_users. It is a more complex approach and would take more time to explain, but if you have a big application where costs are important and users subscribe to a lot of things, you might want to investigate it.
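A minimal Cloud Functions sketch of this fan-out idea (assuming a users/{uid} doc with a subscribed_to array and a target collection named subscriptions_per_users; names are illustrative, not a definitive implementation):

const functions = require('firebase-functions');
const admin = require('firebase-admin');
admin.initializeApp();

exports.syncSubscriptions = functions.firestore
  .document('users/{uid}')
  .onWrite(async (change, context) => {
    const after = change.after.exists ? change.after.data() : null;
    if (!after) return null; // user deleted; nothing to copy

    // Fetch every subscription doc the user points at
    const ids = after.subscribed_to || [];
    const snapshots = await Promise.all(
      ids.map(id => admin.firestore().doc(`subscriptions/${id}`).get())
    );

    // Store a denormalized copy the client can read with a single get()
    return admin.firestore()
      .doc(`subscriptions_per_users/${context.params.uid}`)
      .set({ docs: snapshots.filter(s => s.exists).map(s => s.data()) });
  });

The client then reads subscriptions_per_users/{uid} with a single document get instead of one read per subscription.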
Source: my own experience... a little Googling can also help; there seem to be a lot of opinions about it, so find what works best for you.
Related
I have some data in a Firebase Realtime Database that looks as follows:
users: {
$user1: {
email: "person#gmail.com"
name: "Bob"
},
$user2: {
email: "otherperson#gmail.com"
name: "Sally"
},
...
}
investments: {
$investment1: {
user: $user1,
ticker: "GOOG",
name: "Google Inc",
...
},
$investment2: {
user: $user1,
ticker: "AAPL",
name: "Apple Inc",
...
},
$investment3: {
user: $user2,
ticker: "TSLA",
name: "Tesla Inc",
...
},
...
}
Where $user1 represents the ID of user1 (generated by Firebase).
In my web application, I would like to:
Fetch all investments that belong to a particular user
Then, listen for any newly added investments that belong to that user
The reason I want to fetch all of the existing investments at once is so my application doesn't re-render each time the child_added listener callback is called.
For a small number of investments, the following code should work fine.
const userId = "..."; // userId of logged in user
let initialDataLoaded = false;
// For newly added investments
database
  .ref("investments")
  .orderByChild("user")
  .equalTo(userId)
  .on("child_added", (snapshot) => {
    if (initialDataLoaded) {
      const newlyAddedInvestment = snapshot.val();
      /* add new row to table */
    }
  });
// For existing investments
database
  .ref("investments")
  .orderByChild("user") // order by the "user" child, then match the userId
  .equalTo(userId)
  .once("value", (snapshot) => {
    const existingInvestments = snapshot.val();
    /* display existing investments in a table */
    initialDataLoaded = true;
  });
However, I am worried about what would happen if a user had a large number of existing investments. Particularly, I am worried about two things:
The child_added callback will be called for every existing investment. I know firebase guarantees that child_added events are triggered before value events, but can it guarantee that the code inside the child_added callback is executed before the code inside the value event callback is executed? If not, then isn't it possible that initialDataLoaded is set to true before all of the if (initialDataLoaded) {} lines are executed?
The child_added callback will be called for every existing investment. This could chew up a lot of bandwidth.
I read in another SO post that I could add a createdAt timestamp to all of the investments and then use a ref.orderByChild('createdAt').startAt(Firebase.ServerValue.TIMESTAMP) query to fetch only newly added investments. I like this solution because it scales to large datasets; however, since I am already ordering by user, I don't believe I can use this method (please correct me if I am wrong). If there is a way to retrieve only the user's investments that also start after Firebase.ServerValue.TIMESTAMP, that would also answer my question. If this is a limitation of Realtime Database, I might try Cloud Firestore, since I know it supports more complex queries, which would solve this problem.
I might be overthinking this, but I want a reliable solution that doesn't cause unnecessary re-renders and does not cause unnecessary event triggers.
Reading your post, I think I can help you with a little information.
Regarding the first question, about whether you can guarantee that the code in the child_added callback is executed before the code in the value callback: it depends. Depending on the code you put in each callback, one may take longer to run than the other, so there is no such guarantee.
Regarding the second question, you can check this page, which will help you pinpoint exactly which reference (including children) is hogging the bandwidth, and then check the part of the code that consumes that much bandwidth.
And for the last question, you might be able to use Firebase.ServerValue.TIMESTAMP with a filter on users, but after looking for a while it could be a little complicated to implement; maybe the easier solution would be to use Cloud Firestore, since it handles more complex queries, as you say.
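If you do try Cloud Firestore, a minimal sketch of the combined query might look like this (assuming each investment document has user and createdAt fields; Firestore will prompt you to create a composite index for this combination):

const db = firebase.firestore();

db.collection("investments")
  .where("user", "==", userId)
  .where("createdAt", ">", Date.now()) // only investments added from now on
  .onSnapshot((querySnapshot) => {
    querySnapshot.docChanges().forEach((change) => {
      if (change.type === "added") {
        /* add new row to table */
      }
    });
  });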
I am building a recommender system where I use Firebase to store and retrieve data about movies and user preferences.
Each movie can have several attributes, and the data looks as follows:
{
"titanic":
{"1997": 1, "english": 1, "dicaprio": 1, "romance": 1, "drama": 1 },
"inception":
{ "2010": 1, "english": 1, "dicaprio": 1, "adventure": 1, "scifi": 1}
...
}
To make the recommendations, my algorithm requires as input all the data (movies), which is matched against a user profile.
However, in production I need to retrieve over 10,000 movies. While the algorithm can handle this relatively fast, loading this data from Firebase takes a lot of time.
I retrieve the data as follows:
firebase.database().ref(moviesRef).on('value', function(snapshot) {
  // snapshot.val();
}, function(error) {
  console.log(error);
});
I am therefore wondering if you have any thoughts on how to speed things up. Are there any plugins or techniques known to solve this?
I am aware that denormalization could help split the data up, but the problem is really that I need ALL movies and ALL the corresponding attributes.
My suggestion would be to use Cloud Functions to handle this.
Solution 1 (Ideally)
If you can calculate suggestions every hour / day / week
You can use a Cloud Functions cron to fire daily / weekly and calculate recommendations per user every day / week. This way you can achieve a result more or less similar to what Spotify does with their weekly playlists / recommendations.
The main advantage is that your users wouldn't have to wait for all 10,000 movies to be downloaded: this would happen in a cloud function, every Sunday night for example, which would compile a list of 25 recommendations and save it into your user's data node, ready to download when the user accesses their profile.
Your cloud functions code would look like this:
var movies, allUsers;

exports.weekly_job = functions.pubsub.topic('weekly-tick').onPublish((event) => {
  getMoviesAndUsers();
});

function getMoviesAndUsers () {
  // once() instead of on(): we only need a single read per job run
  firebase.database().ref(moviesRef).once('value', function(snapshot) {
    movies = snapshot.val();
    firebase.database().ref(allUsersRef).once('value', function(snapshot) {
      allUsers = snapshot.val();
      createRecommendations();
    });
  });
}

function createRecommendations () {
  // do something magical with movies and allUsers here,
  // then write the recommendations to each user's profile, kind of like:
  userRef.update({"userRecommendations": {"reco1": "Her", "reco2": "Black Mirror"}});
  // etc.
}
Forgive the pseudo-code. I hope this gives an idea though.
Then on your frontend you would fetch only the userRecommendations node for each user. This way you shift the bandwidth and computation from the user's device to a cloud function. In terms of efficiency, without knowing how you calculate recommendations, I can't make any suggestions.
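For illustration, the frontend read could be as small as this (the users/{uid}/userRecommendations path is an assumption matching the sketch above):

// Read only the precomputed recommendations for the signed-in user
firebase.database()
  .ref('users/' + userId + '/userRecommendations')
  .once('value')
  .then(function(snapshot) {
    const recommendations = snapshot.val(); // e.g. {reco1: "Her", ...}
    /* render the recommendations list */
  });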
Solution 2
If you can't calculate suggestions every hour / day / week, and you have to do it each time the user accesses their recommendations panel
Then you can trigger a cloud function every time the user visits their recommendations page. A quick cheat I use for this is to write a value into the user's profile, like {getRecommendations: true}, once on page load, and then have a cloud function listen for changes to getRecommendations. As long as you have a structure like this:
userID > getRecommendations : true
And if you have proper security rules so that each user can only write to their own path, this method also gives you the correct userID making the request, so you know which user to calculate recommendations for. A cloud function can most likely pull 10,000 records faster, saving the user bandwidth, and it would write only the recommendations to the user's profile (similar to Solution 1 above). Your setup would look like this:
[Frontend Code]
//on pageload
userProfileRef.update({"getRecommendations" : true});
userRecommendationsRef.on('value', function(snapshot) { gotUserRecos(snapshot.val()); });
[Cloud Functions (Backend Code)]
exports.userRequestedRecommendations = functions.database.ref('/users/{uid}/getRecommendations').onWrite(event => {
  const uid = event.params.uid;
  firebase.database().ref(moviesRef).once('value', function(snapshot) {
    movies = snapshot.val();
    // userRefFromUID: a reference to this user's taste data, built from uid
    firebase.database().ref(userRefFromUID).once('value', function(snapshot) {
      usersMovieTasteInformation = snapshot.val();
      // do something magical with movies and the user's preferences here,
      // then:
      return userRecommendationsRef.update({"reco1": "Her", "reco2": "Black Mirror"});
    });
  });
});
Since your frontend will be listening for changes at userRecommendationsRef, as soon as your cloud function is done, your user will see the results. This might take a few seconds, so consider using a loading indicator.
P.S. 1: I ended up using more pseudo-code than originally intended and removed error handling etc., hoping that this generally gets the point across. If anything is unclear, comment and I'll be happy to clarify.
P.S. 2: I'm using a very similar flow for a mini-internal-service I built for one of my clients, and it's been happily operating for longer than a month now.
Firebase NoSQL best practice is to "avoid nesting data", but you said you don't want to change your data. So, for your situation, you can make a REST call to any particular node (the node of each movie) of your Firebase database.
Solution 1) You can create a fixed number of threads via a ThreadPoolExecutor. From each worker thread you can make an HTTP (REST) request as below. Based on your device's performance and memory, you can decide how many worker threads you want the ThreadPoolExecutor to manage. A code snippet might look like this:
/* creates threads on demand */
ThreadFactory threadFactory = Executors.defaultThreadFactory();

/* Creates a thread pool that creates new threads as needed, but reuses previously constructed threads when they are available */
ExecutorService threadPoolExecutor = Executors.newFixedThreadPool(10); /* 10 worker threads */

for (int i = 0; i < 100; i++) { /* load the first 100 movies */
    /* the lambda below needs an effectively final copy of the loop index */
    final int index = i;
    threadPoolExecutor.execute(() -> {
        /* OkHttp request */
        /* urlStr can be something like "https://earthquakesenotifications.firebaseio.com/movies?print=pretty" */
        /* Note: Firebase stores an index for every array element. Since all your
           movies are in the movies JSON array, the first worker thread can read
           movie 0, the second movie 1, and so on. */
        Request request = new Request.Builder().url(urlStr + "/" + index + ".json").build();
        try {
            /* OkHttpClient is the HTTP client used to execute the request */
            Response response = new OkHttpClient().newCall(request).execute();
            String str = response.body().string();
            /* parse str and collect the movie here */
        } catch (IOException e) {
            e.printStackTrace();
        }
    });
}
threadPoolExecutor.shutdown();
Solution 2) Solution 1 is not based on the listener-observer pattern, but Firebase does have push technology: whenever a particular node changes in the Firebase NoSQL JSON, every client holding a listener on that node receives the new data via onDataChange(DataSnapshot dataSnapshot) { }. For this you can attach a ValueEventListener to each movie's DatabaseReference, like below:
DatabaseReference moviesRef = FirebaseDatabase.getInstance().getReference().child("movies");

/* Attach a listener to each movie node; Firebase pushes new data to
   onDataChange() whenever that node changes. */
for (int i = 0; i < 100; i++) {
    moviesRef.child(String.valueOf(i)).addValueEventListener(new ValueEventListener() {
        @Override
        public void onDataChange(DataSnapshot snapshot) {
            /* Showing each movie as its own ListView/RecyclerView item is still slow,
               so store the movie in a Movies ArrayList; when everything has loaded,
               update the RecyclerView once. */
        }

        @Override
        public void onCancelled(DatabaseError databaseError) {
        }
    });
}
Although you stated that your algorithm needs all the movies and all their attributes, that does not mean it processes them all at once. Any computation unit has its limits, and within your algorithm you probably chunk the data into smaller parts that your computation unit can handle.
Having said that, if you want to speed things up, you can modify your algorithm to parallelize fetching and processing of the data/movies:
| fetch  | -> | process | -> | fetch  | ...
|chunk(1)|    |chunk(1) |    |chunk(3)|

(in parallel)  | fetch  | -> | process | ...
               |chunk(2)|    |chunk(2) |
With this approach, you can save almost the entire processing time (all but the last chunk) if processing really is faster than fetching (though you have not said how "relatively fast" your algorithm runs compared to fetching all the movies).
This "high-level" approach to your problem is probably your best option if fetching the movies is really slow, although it requires more work than simply activating a hypothetical "speed up" button in a library. It is a sound approach when dealing with large chunks of data.
How can we re-create a template while switching routes?
For example, I have a subscriber template. It detects when the user scrolls to the bottom of the display and subscribes to more data. It takes several parameters.
Example:
amazing_page.html
{{#each}}
{{amazing_topic}}
{{/each}}
{{>subscriber name='topics' count=5}}
subscriber.js
//rough sample code
Template.subscriber.onCreated(function() {
  var self = this;
  var type = Template.currentData().name;
  var count = Template.currentData().count;
  var user = Template.currentData().user;
  var skipCount = 0;
  self.autorun(function(c) {
    self.subscribe(type, skipCount, user);
    var block = true;
    $(window).scroll(function() {
      if (($(window).scrollTop() + $(window).height()) >= ($(document).height()) && block) {
        block = false;
        skipCount = skipCount + count;
        console.log(type);
        console.log(skipCount);
        self.subscribe(type, skipCount, user, {
          onReady: function() {
            block = true;
          },
          onStop: function() {
            console.log('stopped');
          }
        });
      }
    });
  });
});
I use this template with different parameters in different routes.
The problem is that if the user switches routes and scrolls down on one page, all the subscribers created on the other pages will actually run on this page as well. Moreover, they keep their incremented counter variables and execute all of their logic.
I found a poor workaround: comparing Route.getName (for example) with the subscriber's name parameter. It is not the best option. Can someone help me find a good practice for this? :)
Simple Example:
We have 3 different routes:
1) News
2) Videos
3) Topics
These route templates include the subscriber template, and subscription works fine on scroll.
Ok, now let's visit all of them: News, Videos, Topics.
Good, now scroll down and... I have three instances of the subscriber template, each subscribing to its own publication, because they are not destroyed when we switch routes.
As a result, when the user scrolls the Topics page, the subscriptions for News and Videos are triggered too, and data from those collections is fetched as well ;)
And this is the problem :)
UPD:
Looks like we found a solution. If I use Template.instance (autorun/subscribe), it starts working as expected, except in some strange cases :)
First of all, when I go to another route, on the next iteration (scroll down) it returns data from the old, destroyed template, plus an error. On the next iteration it starts subscribing to the correct data. Hmm... it looks like I have a mistake in the autorun section... or not?
Attached is a screenshot from the console.
It sounds like you have multiple subscriptions to the same collection and that therefore the list of documents shown in various contexts can change in unexpected ways. Meteor manages multiple subscriptions on the same collection by synchronizing the union of the selected documents.
The simplest way to manage each of your views is to make sure that the data context for a particular view uses a .find() with the query you need. This will typically be the same query that your publication is using.
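For example, a template helper can scope its view like this (template and collection names are illustrative):

Template.topicsPage.helpers({
  topics: function() {
    // Only documents matching this page's query are rendered, even if
    // other subscriptions have loaded extra documents into the collection.
    return Topics.find({}, { sort: { createdAt: -1 } });
  }
});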
A different but less efficient approach is to .stop() the subscription when you leave a view.
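A sketch of that second approach (names illustrative): Meteor.subscribe, unlike this.subscribe, does not stop automatically when a template is destroyed, so keep the handle and stop it yourself.

Template.topicsPage.onCreated(function() {
  // Keep the handle on the template instance...
  this.topicsHandle = Meteor.subscribe('topics', 5);
});

Template.topicsPage.onDestroyed(function() {
  // ...and stop it explicitly when leaving the view, so its documents
  // are removed from the client-side cache.
  this.topicsHandle.stop();
});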
Let's say that two users make changes to the same document while offline, but in different sections of the document. If user 2 goes back online after user 1, will the changes made by user 1 be lost?
In my database, each row contains a JS object, and one property of this object is an array. This array is bound to a series of check-boxes in the interface. What I would like is that if two users change those check-boxes, the latest change is kept for each check-box individually, based on the time when the change was made, not the time when the syncing occurred. Is GroundDB the appropriate tool to achieve this? Is there any means to add an event handler in which I can add some logic, triggered when syncing occurs, that would take care of the merging?
The short answer is "yes": none of the Ground DB versions have conflict resolution, since the logic is custom and depends on the desired conflict-resolution behaviour, e.g. whether you want to automate it or involve the user.
The old Ground DB simply relied on Meteor's conflict resolution (latest data to the server wins). I'm guessing you can see some issues with that, depending on the order in which clients come online.
Ground DB II doesn't have method resume; it's more or less just a way to cache data offline. It observes an observable source.
I guess you could create a middleware observer for GDB II, one that checks the local data before doing the update and updates the client and/or calls the server to update the server data. This way you would have a way to handle conflicts.
I think I remember writing some code that supported "deletedAt"/"updatedAt" for some types of conflict handling, but again, a conflict handler should be custom for the most part (opening the door for reusable conflict handlers might be useful).
Especially knowing when data is removed can be tricky if you don't "soft" delete via something like a "deletedAt" entity.
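As an illustration of the soft-delete idea (collection name is hypothetical):

// Instead of Bar.remove(docId), stamp the document so offline clients
// can detect the deletion when they come back online:
Bar.update(docId, { $set: { deletedAt: new Date() } });

// Normal queries then exclude soft-deleted documents:
Bar.find({ deletedAt: { $exists: false } });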
The "rc" branch is currently grounddb-caching-2016 version "2.0.0-rc.4",
I was thinking about something like:
(mind it's not tested, written directly in SO)
// Create the grounded collection
foo = new Ground.Collection('test');
// Make it observe a source (it's aware of createdAt/updatedAt and
// removedAt entities)
foo.observeSource(bar.find());
bar.find() returns a cursor with an observe function; our middleware should expose the same. Let's create a createMiddleWare helper for it:
function createMiddleWare(source, middleware) {
  const cursor = (typeof (source || {}).observe === 'function') ? source : source.find();
  return {
    observe: function(observerHandle) {
      const sourceObserverHandle = cursor.observe({
        added: doc => {
          middleware.added.call(observerHandle, doc);
        },
        // Note: Meteor cursors name this callback "changed", not "updated"
        changed: (doc, oldDoc) => {
          middleware.updated.call(observerHandle, doc, oldDoc);
        },
        removed: doc => {
          middleware.removed.call(observerHandle, doc);
        },
      });
      // Return stop handle
      return sourceObserverHandle;
    }
  };
}
Usage:
foo = new Ground.Collection('test');

foo.observeSource(createMiddleWare(bar.find(), {
  added: function(doc) {
    // just pass it through
    this.added(doc);
  },
  updated: function(doc, oldDoc) {
    const fooDoc = foo.findOne(doc._id);
    // Example of a simple conflict handler:
    if (fooDoc && doc.updatedAt < fooDoc.updatedAt) {
      // Seems like the foo doc is newer? Let's update the server...
      // (we'll just use the regular bar, since that's the Meteor
      // collection and foo is the grounded data)
      bar.update(doc._id, fooDoc);
    } else {
      // pass through
      this.updated(doc, oldDoc);
    }
  },
  removed: function(doc) {
    // again, just pass through for now
    this.removed(doc);
  }
}));
I have a meteor collection like this:
Cases = new Meteor.Collection('cases');
As well, I have registered users (max 10). I now want to be able to "give" a single case to a registered user and be sure that no other user gets that specific case.
The user works with the case (updating fields, deleting fields) and then sends it to some kind of archive; after submitting, the user should get a new case from the collection.
My thought was to have a field called "locked", which is initially set to false; the moment the case is displayed to a user, "locked" becomes true and the case is no longer returned:
return Cases.find({locked: false, done: false}, {limit: 1});
Any ideas how to do that in Meteor?
Thanks
You just need to attach an owner field (or similar) to the case. That would allow you to do things like:
Publish only unassigned cases, or cases owned by the requesting user, using something like:
Meteor.publish('cases/unassigned', function() {
  return Cases.find({owner: {$exists: false}});
});

Meteor.publish('cases/mine', function() {
  return Cases.find({owner: this.userId});
});
Not allow a user to update or delete a case if it's not assigned to them:
Cases.allow({
  update: function(userId, fieldNames, doc, modifier) {
    return userId === doc.owner;
  },
  // Note: Meteor's allow rules are insert/update/remove (not "delete")
  remove: function(userId, doc) {
    return userId === doc.owner;
  }
});
Obviously, these would need amending for things like super-users, and you probably need some methods defined to allow users to take cases (a sketch of such a method follows), but that's the general idea.
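A minimal sketch of such a method (untested; names illustrative), assigning an unowned case atomically on the server:

Meteor.methods({
  takeCase: function(caseId) {
    check(caseId, String);
    if (!this.userId) {
      throw new Meteor.Error('not-authorized');
    }
    // The selector only matches while the case is still unassigned,
    // so two concurrent callers cannot both win.
    var updated = Cases.update(
      { _id: caseId, owner: { $exists: false } },
      { $set: { owner: this.userId } }
    );
    if (!updated) {
      throw new Meteor.Error('case-taken', 'Someone else took this case first.');
    }
  }
});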
There are concurrency issues to deal with, to reliably allocate a case to only one person.
We need to solve two things:
1. Reliably assign the case to a user
2. Fetch the cases assigned to a user
Number 2. is easy, but depends on 1.
To solve 1., this should work:
var updated = Cases.update(
  {_id: <case-to-assign>, version: "ab92c91"},
  {$set: {
    assignedTo: Meteor.userId(),
    version: new Meteor.Collection.ObjectID()._str
  }}
);

if (updated) {
  // Successfully assigned
} else {
  // Failed to assign, probably because the record was changed first
}
Using this you can query for all of a users cases:
var cases = Cases.find({assignedTo: Meteor.userId()});
If 10 people try to get a case at the same time, the case should have a pre-set version field, and MongoDB will only let the .update succeed once. As soon as the version field changes (because an .update succeeded), the remaining updates will fail, since their version field no longer matches.
Now that the allocation has taken place reliably, fetching is very simple.
As suggested by @Kyll, the filtering of cases should be done inside a Meteor publication.
It would also make sense to perform the case-assignment inside a Meteor method.
UPDATE:
@richsilv's solution is simpler than this one, and works fine.
This solution is useful if you need to know who won immediately, without making further requests to the server.