While implementing a chat app, I've found that real-time pagination seems quite complex, with many edge cases given the constant addition and possible deletion of messages.
To avoid this, I'm wondering whether I can use a stream that fetches the most recent 50 messages and then, if the user scrolls up, increase the fetch limit to 200, and so on.
return firestore
.collection("chatrooms")
.doc(chatroomId)
.collection('messages')
.orderBy('timestamp', descending: true)
.limit(documentLimit)
.snapshots()
Would that logic result in:
50 + 200 = 250 reads, or 50 + 150 = 200 reads?
I've read the documentation, but I'm not too sure of the outcome in this case.
In the best case, you have local caching enabled and/or attach the new snapshot listener before removing the old one. In that case the 50 existing documents will be read from the cache, so you won't be charged for them as server-side document reads.
In the worst case, you've disabled local caching and attach the new listener after removing the old one. In that case, the 50 old documents are no longer present on the client, and thus also need to be read (and charged) from the server.
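To make the two cases concrete, here is a minimal sketch of the arithmetic. It is not an official billing formula: it assumes one read per document delivered from the server, that the collection holds at least 200 messages, and that documents still held by the client are not charged again, as described above. The name `estimateReads` is hypothetical.

```javascript
// Rough model of the two cases: documents the client already holds are
// assumed not to be billed again when the limit grows.
function estimateReads(previousLimit, newLimit, cachedLocally) {
  const firstListener = previousLimit;
  const secondListener = cachedLocally ? newLimit - previousLimit : newLimit;
  return { firstListener, secondListener, total: firstListener + secondListener };
}

const bestCase = estimateReads(50, 200, true);   // cache kept, listeners overlap
const worstCase = estimateReads(50, 200, false); // cache disabled, old listener removed first
console.log(bestCase.total);  // 200
console.log(worstCase.total); // 250
```

Under these assumptions the answer to the question is 200 reads in the best case and 250 in the worst.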
Imagine I am trying to build infinite scrolling similar to Instagram's by paginating posts, and I also listen for changes to each post that has been queried.
I start off by querying the first 10 posts:
void getPosts() {
  postStream = firestore
      .collection('posts')
      .limit(10)
      .orderBy('createdAt', descending: true)
      .snapshots()
      .listen((querySnapshot) {
    // ... code to convert each doc from JSON ...
  });
}
As the user scrolls down, I detect the scroll with the listener and query for more posts
void getMorePosts() {
  paginatedPostStream = firestore
      .collection('posts')
      .limit(10)
      .orderBy('createdAt', descending: true)
      .startAfterDocument(lastPostDocSnapshot)
      .snapshots()
      .listen((querySnapshot) {
    // ... code to convert each doc from JSON ...
  });
}
When the user no longer needs these posts, I cancel the streams in the following way:
postStream.cancel();
paginatedPostStream.cancel();
However, I realized that doing so only stops listening to the earliest and the latest paginated streams. For example, the user may have queried in the following way:
10 | 10 | 10 | 10 | 10
^                   ^
but cancelling only applies to the first and last 10 posts queried, because `paginatedPostStream` is overwritten on every call to getMorePosts(). How can I cancel the listeners more effectively to prevent leaks?
Any suggestion is welcomed. Thank you.
As discussed in the comments, you will need to manage all streams instead of just the first and last one. If you want to limit the resources being consumed, you will need to close each stream as it becomes obsolete.
There is no single best practice here, but most implementations I know of use a virtual window of a few (say 3-5) screens full of information. So if in your case you can display 10 documents on the screen at once, you'd keep listeners on 3-5 streams of 10 documents: the information that is currently displayed and 1 or 2 screenfuls up and down. Any stream outside of this virtual window can then be cancelled to reduce resource consumption.
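The virtual window described above can be sketched as follows. This is an illustration only, written in JavaScript rather than Dart: `ListenerWindow` and `subscribeToPage` are hypothetical names, and `subscribeToPage` is assumed to start a listener for one page and return a function that cancels it (in Flutter that would wrap `StreamSubscription.cancel()`).

```javascript
// Keeps listeners only for pages inside a sliding window around the
// page currently on screen.
class ListenerWindow {
  constructor(subscribeToPage, radius = 1) {
    this.subscribeToPage = subscribeToPage; // hypothetical: page index -> cancel function
    this.radius = radius;                   // pages kept above and below the visible one
    this.active = new Map();                // page index -> cancel function
  }

  // Call whenever the visible page changes.
  moveTo(visiblePage) {
    const first = Math.max(0, visiblePage - this.radius);
    const last = visiblePage + this.radius;

    // Cancel listeners that fell outside the window.
    for (const [page, cancel] of this.active) {
      if (page < first || page > last) {
        cancel();
        this.active.delete(page);
      }
    }
    // Start listeners for pages that entered the window.
    for (let page = first; page <= last; page++) {
      if (!this.active.has(page)) {
        this.active.set(page, this.subscribeToPage(page));
      }
    }
  }

  // Cancel everything, e.g. when the screen is closed.
  dispose() {
    for (const cancel of this.active.values()) cancel();
    this.active.clear();
  }
}

// Usage with a stand-in subscribe function that only records cancellations.
const cancelled = [];
const pages = new ListenerWindow((page) => () => cancelled.push(page), 1);
pages.moveTo(0); // listens to pages 0 and 1
pages.moveTo(3); // cancels 0 and 1, listens to 2, 3 and 4
console.log([...pages.active.keys()]); // pages 2, 3, 4
console.log(cancelled);                // pages 0, 1
```

Because every subscription is stored in the map, none is lost when a new page is loaded, which is the problem with keeping only two variables.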
Assume that in my app I've already read an entire collection with a snapshot listener.
If a document is added several seconds after the entire collection has been read, does it trigger a read of the entire collection, or just of the new document?
For example, a chat app between 2 people:
A collection (representing a chatroom) contains 4 documents (each representing a message) and has already been read by a user, hence 4 reads. If the person on the other side sends another message, does this mean another 5 reads take place (4 old documents and a brand new one), resulting in a total of 9 reads? Or is only the new document read, resulting in a total of 5 reads (4 from the beginning and another after the listener detected a new document inserted into the collection)?
Just to be clear, all of the steps described in the example (from the initial read) take several seconds.
I can't find a solution or a similar question online, and I can't tell whether the Firebase documentation answers this, no matter how much I search there.
EDIT WITH SOMEWHAT OF AN ANSWER:
After trying to figure out the exact numbers, I ran a test with the following result:
Adding another 10 documents to a collection of (say) 20 documents that a listener is attached to results in way more than 10 reads.
My conclusion: for chat-like implementations I would recommend using Firebase Realtime Database and not Firestore. With a ChildEventListener you can extract and read only new messages without the need to re-read several models that you've already pre-loaded.
EDIT CODE I'VE RUN TO TEST:
collectionRef.orderBy("timestamp", Query.Direction.ASCENDING).addSnapshotListener(new EventListener<QuerySnapshot>() {
    @Override
    public void onEvent(@Nullable QuerySnapshot value, @Nullable FirebaseFirestoreException error) {
        if (error != null)
            return;
        ArrayList<ChatMessage> justRead = new ArrayList<>();
        for (DocumentSnapshot snapshot : value) {
            ChatMessage data = snapshot.toObject(ChatMessage.class);
            justRead.add(data);
            Log.i(TAG, "msg: " + data.getMsg() + "\n" + "from cache:" + snapshot.getMetadata().isFromCache());
        }
    }
});
LOG:
2020-10-13 08:26:50.393 9275-9275/****** I/FirebaseLobbyViewModel: msg: first msg
from cache:true
2020-10-13 08:26:50.402 9275-9275/****** I/FirebaseLobbyViewModel: msg: second msg
from cache:true
2020-10-13 08:27:04.853 9275-9275/****** I/FirebaseLobbyViewModel: msg: first msg
from cache:false
2020-10-13 08:27:04.853 9275-9275/****** I/FirebaseLobbyViewModel: msg: second msg
from cache:false
2020-10-13 08:27:04.853 9275-9275/****** I/FirebaseLobbyViewModel: msg: third msg
from cache:false
You can see that I started the app with 2 messages in the collection, which had already been read and were therefore served from cache, but after sending the third message none of the 3 messages is reported as coming from cache.
Snapshot listeners only download the document data for documents that have changed since the last snapshot. They will not re-read the entire set of results again. The unchanged documents are delivered to your snapshot listener from memory, for as long as the listener remains added to the query. If you remove the listener and add it again, it will cause all matching documents to be read again.
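One way to mirror this on the client is to update the in-memory list from the changes in each snapshot instead of rebuilding it. The sketch below is an illustration, not SDK code: it assumes change objects shaped like `{ type, id, data }` with `type` being `added`, `modified` or `removed`, loosely modeled on the document changes the SDKs report, and `applyChanges` is a hypothetical helper.

```javascript
// Applies a list of changes to the messages already held in memory
// and returns the new list ordered by timestamp.
function applyChanges(messages, changes) {
  const byId = new Map(messages.map((m) => [m.id, m]));
  for (const change of changes) {
    if (change.type === "removed") {
      byId.delete(change.id);
    } else {
      // "added" and "modified" both store the latest data.
      byId.set(change.id, { id: change.id, ...change.data });
    }
  }
  return [...byId.values()].sort((a, b) => a.timestamp - b.timestamp);
}

// The two messages that were already loaded, plus one newly added message.
const initial = [
  { id: "m1", msg: "first msg", timestamp: 1 },
  { id: "m2", msg: "second msg", timestamp: 2 },
];
const updated = applyChanges(initial, [
  { type: "added", id: "m3", data: { msg: "third msg", timestamp: 3 } },
]);
console.log(updated.map((m) => m.msg)); // the three messages in timestamp order
```

Only the third message appears in the change list here; the first two are reused from memory.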
After trying to figure out the exact numbers, I ran a test with the following result: adding another 10 documents to a collection of (say) 20 documents that a listener is attached to results in about 20+ reads (more than 10, that's for sure). My conclusion: for chat-like implementations I would recommend using Firebase Realtime Database and not Firestore. With a ChildEventListener you can extract and read only new messages without the need to re-read several models that you've already pre-loaded.
After I run my Firestore test app, made with Flutter, I look at the Firestore usage statistics to see how many requests the app made. They show a minimum of 20 and up to 60 document reads for a single start of the test. The problem is that, as I understand it, the test should result in a maximum of ~1-3 reads.
I've read https://firebase.google.com/docs/firestore/pricing. It helped me understand the billing logic of Firestore, but following the logic in that article I should be making a maximum of ~5 reads.
This thread: Firestore - unexpected reads also suggests that the document reads may come from the open Firebase console viewing the documents. So I closed it before the test and opened it 30 minutes after. This did not change the result. I also set breakpoints, and the code executed only once.
I created a completely new Flutter project to test it.
This is the only part making read requests:
CollectionReference dbUsers = dbInstance.collection("Users");
var user = dbUsers
.where("docId", isEqualTo: fireAppUser.user.uid)
.limit(1)
.snapshots();
var _userSub = user.listen((value) {
if (value.documents.isNotEmpty && value.documents.first.data != null)
print(value.documents.first.data);
});
_userSub.cancel();
Below are my Firestore rules, which are on the default settings for now.
rules_version = '2';
service cloud.firestore {
match /databases/{database}/documents {
match /{document=**} {
allow read, write;
}
}
}
I have exactly 5 documents in my Database.
I expect to have a maximum of ~5 document reads. Please help me understand why this snippet causes an unexpected number of reads. What could cause it?
Edit: I forgot to append the cancel() call to the snippet.
That code can make a nearly unlimited number of reads, since it's adding a listener to the document. It will cost 1 read the first time you run it, and continue to incur reads as the listener remains added, as the document changes over time.
If the accounting doesn't make sense to you, contact Firebase support with your exact reproduction instructions.
I'm seeing slow performance with Firestore when retrieving basic data stored in a document: it is about 10 times slower than the Realtime Database.
Using Firestore, it takes an average of 3000 ms on the first call:
this.db.collection('testCol')
  .doc('testDoc')
  .valueChanges().forEach((data) => {
    console.log(data); // 3000 ms later
  });
Using the Realtime Database, it takes an average of 300 ms on the first call:
this.db.database.ref('/test').once('value').then(data => {
  console.log(data); // 300 ms later
});
This is a screenshot of the network console :
I'm running the Javascript SDK v4.50 with AngularFire2 v5.0 rc.2.
Has anyone else experienced this issue?
UPDATE: 12th Feb 2018 - iOS Firestore SDK v0.10.0
Similar to some other commenters, I've also noticed a slower response on the first get request (with subsequent requests taking ~100ms). For me it's not as bad as 30s, but maybe around 2-3s when I have good connectivity, which is enough to provide a bad user experience when my app starts up.
Firebase have advised that they're aware of this "cold start" issue and they're working on a long term fix for it - no ETA unfortunately. I think it's a separate issue that when I have poor connectivity, it can take ages (over 30s) before get requests decide to read from cache.
Whilst Firebase fix all these issues, I've started using the new disableNetwork() and enableNetwork() methods (available in Firestore v0.10.0) to manually control the online/offline state of Firebase. Though I've had to be very careful where I use it in my code, as there's a Firestore bug that can cause a crash under certain scenarios.
UPDATE: 15th Nov 2017 - iOS Firestore SDK v0.9.2
It seems the slow performance issue has now been fixed. I've re-run the tests described below and the time it takes for Firestore to return the 100 documents now seems to be consistently around 100ms.
Not sure if this was a fix in the latest SDK v0.9.2 or if it was a backend fix (or both), but I suggest everyone updates their Firebase pods. My app is noticeably more responsive - similar to the way it was on the Realtime DB.
I've also discovered Firestore to be much slower than Realtime DB, especially when reading from lots of documents.
Updated tests (with latest iOS Firestore SDK v0.9.0):
I set up a test project in iOS Swift using both RTDB and Firestore and ran 100 sequential read operations on each. For the RTDB, I tested the observeSingleEvent and observe methods on each of the 100 top level nodes. For Firestore, I used the getDocument and addSnapshotListener methods at each of the 100 documents in the TestCol collection. I ran the tests with disk persistence on and off. Please refer to the attached image, which shows the data structure for each database.
I ran the test 10 times for each database on the same device and a stable wifi network. Existing observers and listeners were destroyed before each new run.
Realtime DB observeSingleEvent method:
func rtdbObserveSingle() {
let start = UInt64(floor(Date().timeIntervalSince1970 * 1000))
print("Started reading from RTDB at: \(start)")
for i in 1...100 {
Database.database().reference().child(String(i)).observeSingleEvent(of: .value) { snapshot in
let time = UInt64(floor(Date().timeIntervalSince1970 * 1000))
let data = snapshot.value as? [String: String] ?? [:]
print("Data: \(data). Returned at: \(time)")
}
}
}
Realtime DB observe method:
func rtdbObserve() {
let start = UInt64(floor(Date().timeIntervalSince1970 * 1000))
print("Started reading from RTDB at: \(start)")
for i in 1...100 {
Database.database().reference().child(String(i)).observe(.value) { snapshot in
let time = UInt64(floor(Date().timeIntervalSince1970 * 1000))
let data = snapshot.value as? [String: String] ?? [:]
print("Data: \(data). Returned at: \(time)")
}
}
}
Firestore getDocument method:
func fsGetDocument() {
let start = UInt64(floor(Date().timeIntervalSince1970 * 1000))
print("Started reading from FS at: \(start)")
for i in 1...100 {
Firestore.firestore().collection("TestCol").document(String(i)).getDocument() { document, error in
let time = UInt64(floor(Date().timeIntervalSince1970 * 1000))
guard let document = document, document.exists && error == nil else {
print("Error: \(error?.localizedDescription ?? "nil"). Returned at: \(time)")
return
}
let data = document.data() as? [String: String] ?? [:]
print("Data: \(data). Returned at: \(time)")
}
}
}
Firestore addSnapshotListener method:
func fsAddSnapshotListener() {
let start = UInt64(floor(Date().timeIntervalSince1970 * 1000))
print("Started reading from FS at: \(start)")
for i in 1...100 {
Firestore.firestore().collection("TestCol").document(String(i)).addSnapshotListener() { document, error in
let time = UInt64(floor(Date().timeIntervalSince1970 * 1000))
guard let document = document, document.exists && error == nil else {
print("Error: \(error?.localizedDescription ?? "nil"). Returned at: \(time)")
return
}
let data = document.data() as? [String: String] ?? [:]
print("Data: \(data). Returned at: \(time)")
}
}
}
Each method essentially prints the unix timestamp in milliseconds when the method starts executing and then prints another unix timestamp when each read operation returns. I took the difference between the initial timestamp and the last timestamp to return.
RESULTS - Disk persistence disabled:
RESULTS - Disk persistence enabled:
Data Structure:
When the Firestore getDocument / addSnapshotListener methods get stuck, it seems to get stuck for durations that are roughly multiples of 30 seconds. Perhaps this could help the Firebase team isolate where in the SDK it's getting stuck?
Update Date March 02, 2018
It looks like this is a known issue and the engineers at Firestore are working on a fix. After a few email exchanges and code sharing with a Firestore engineer on this issue, this was his response as of today.
"You are actually correct. Upon further checking, this slowness on getDocuments() API is a known behavior in Cloud Firestore beta. Our engineers are aware of this performance issue tagged as "cold starts", but don't worry as we are doing our best to improve Firestore query performance.
We are already working on a long-term fix but I can't share any timelines or specifics at the moment. While Firestore is still on beta, expect that there will be more improvements to come."
So hopefully this will get knocked out soon.
Using Swift / iOS
After dealing with this for about 3 days, it seems the issue is definitely the get(), i.e. .getDocuments and .getDocument. Things I thought were causing the extreme yet intermittent delays, but which don't appear to be the cause:
Not so great network connectivity
Repeated calls via looping over .getDocument()
Chaining get() calls
Firestore Cold starting
Fetching multiple documents (Fetching 1 small doc caused 20sec delays)
Caching (I disabled offline persistence but this did nothing.)
I was able to rule all of these out as I noticed this issue didn't happen with every Firestore database call I was making. Only retrievals using get(). For kicks I replaced .getDocument with .addSnapshotListener to retrieve my data and voila. Instant retrieval each time including the first call. No cold starts. So far no issues with the .addSnapshotListener, only getDocument(s).
For now, I'm simply dropping the .getDocument() where time is of the essence and replacing it with .addSnapshotListener then using
for document in querySnapshot!.documents{
// do some magical unicorn stuff here with my document.data()
}
... in order to keep moving until this gets worked out by Firestore.
Almost 3 years later, firestore being well out of beta and I can confirm that this horrible problem still persists ;-(
On our mobile app we use the javascript / node.js firebase client. After a lot of testing to find out why our app's startup time is around 10sec we identified what to attribute 70% of that time to... Well, to firebase's and firestore's performance and cold start issues:
firebase.auth().onAuthStateChanged() fires approx. after 1.5 - 2sec, already quite bad.
If it returns a user, we use its ID to get the user document from firestore. This is the first call to firestore and the corresponding get() takes 4 - 5sec. Subsequent get() of the same or other documents take approx. 500ms.
So in total the user initialization takes 6 - 7 sec, completely unacceptable. And we can't do anything about it. We can't test disabling persistence, since in the javascript client there's no such option, persistence is always enabled by default, so not calling enablePersistence() won't change anything.
I had this issue until this morning. My Firestore query via iOS/Swift would take around 20 seconds to complete a simple, fully indexed query - with non-proportional query times for 1 item returned - all the way up to 3,000.
My solution was to disable offline data persistence. In my case, it didn't suit the needs of our Firestore database - which has large portions of its data updated every day.
iOS & Android users have this option enabled by default, whilst web users have it disabled by default. It makes Firestore seem insanely slow if you're querying a huge collection of documents. Basically it caches a copy of whichever data you're querying (and whichever collection you're querying - I believe it caches all documents within), which can lead to high memory usage.
In my case, it caused a huge wait for every query until the device had cached the data required - hence the non-proportional query times for the increasing numbers of items to return from the exact same collection. This is because it took the same amount of time to cache the collection in each query.
Offline Data - from the Cloud Firestore Docs
I performed some benchmarking to display this effect (with offline persistence enabled) from the same queried collection, but with different amounts of items returned using the .limit parameter:
Now at 100 items returned (with offline persistence disabled), my query takes less than 1 second to complete.
My Firestore query code is below:
let db = Firestore.firestore()
self.date = Date()
let ref = db.collection("collection").whereField("Int", isEqualTo: SomeInt).order(by: "AnotherInt", descending: true).limit(to: 100)
ref.getDocuments() { (querySnapshot, err) in
if let err = err {
print("Error getting documents: \(err)")
} else {
for document in querySnapshot!.documents {
let data = document.data()
//Do things
}
print("QUERY DONE")
let currentTime = Date()
let components = Calendar.current.dateComponents([.second], from: self.date, to: currentTime)
let seconds = components.second!
print("Elapsed time for Firestore query -> \(seconds)s")
// Benchmark result
}
}
From what I've seen while testing on a Nexus 5X emulator and a real Android phone (Huawei P8),
Firestore and Cloud Storage both give me a headache with slow responses
on the first document.get() and the first storage.getDownloadUrl().
Each request takes more than 60 seconds to respond. The slow response only happens on the real Android phone, not in the emulator, which is another strange thing.
After the first request, the remaining requests are smooth.
Here is the simple code where I see the slow response.
var dbuserref = dbFireStore.collection('user').where('email','==',email);
const querySnapshot = await dbuserref.get();
var url = await defaultStorage.ref(document.data().image_path).getDownloadURL();
I also found link that is researching the same.
https://reformatcode.com/code/android/firestore-document-get-performance
Firefeed is a very nice example of what can be achieved with Firebase - a fully client-side Twitter clone. There is this page: https://firefeed.io/about.html, where the logic behind the adopted data structure is explained. It helps a lot in understanding Firebase security rules.
At the end of the demo, there is this snippet of code:
var userid = info.id; // info is from the login() call earlier.
var sparkRef = firebase.child("sparks").push();
var sparkRefId = sparkRef.name();
// Add spark to global list.
sparkRef.set(spark);
// Add spark ID to user's list of posted sparks.
var currentUser = firebase.child("users").child(userid);
currentUser.child("sparks").child(sparkRefId).set(true);
// Add spark ID to the feed of everyone following this user.
currentUser.child("followers").once("value", function(list) {
list.forEach(function(follower) {
var childRef = firebase.child("users").child(follower.name());
childRef.child("feed").child(sparkRefId).set(true);
});
});
It shows how the writes are done in order to keep reads simple, as stated:
When we need to display the feed for a particular user, we only need to look in a single place
So I do understand that. But if we take a look at Twitter, we can see that some accounts have several million followers (the most followed is Katy Perry, with over 61 million!). What would happen with this structure and this approach? Whenever Katy posted a new tweet, it would require 61 million write operations. Wouldn't this simply kill the app? And even more, isn't it consuming a lot of unnecessary space?
With denormalized data, the only way to connect data is to write to every location it's read from. So yeah, to publish a tweet to 61 million followers would require 61 million writes.
You wouldn't do this in the browser. The server would listen for child_added events for new tweets, and then a cluster of workers would split up the load paginating a subset of followers at a time. You could potentially prioritize online users to get writes first.
With normalized data, you write the tweet once, but pay for the join on reads. If you cache the tweets in feeds to avoid hitting the database for each request, you're back to 61 million writes to Redis for every Katy Perry tweet. To push the tweet in real time, you need to write the tweet to a socket for every online follower anyway.
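The server-side fan-out described above, where workers each handle a subset of followers at a time, might look roughly like the sketch below. It is only an outline under assumptions: `followerBatches` and `feedWritesFor` are hypothetical helpers, the follower list is an in-memory array rather than a paginated query, and the write paths reuse the `users/<id>/feed/<sparkId>` layout from the Firefeed snippet.

```javascript
// Splits a follower list into fixed-size batches so that each worker
// handles one batch at a time.
function* followerBatches(followerIds, batchSize) {
  for (let i = 0; i < followerIds.length; i += batchSize) {
    yield followerIds.slice(i, i + batchSize);
  }
}

// Builds the feed writes for one batch as a path -> value map.
function feedWritesFor(batch, sparkId) {
  const writes = {};
  for (const followerId of batch) {
    writes[`users/${followerId}/feed/${sparkId}`] = true;
  }
  return writes;
}

// Five followers in batches of two give three batches.
const followers = ["alice", "bob", "carol", "dave", "erin"];
const batches = [...followerBatches(followers, 2)];
console.log(batches.length); // 3
console.log(feedWritesFor(batches[0], "spark1"));
// writes for users/alice/feed/spark1 and users/bob/feed/spark1
```

Each map could then be handed to a worker and written as one multi-location update, and batches containing online followers could be scheduled first.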