Checking how many times a document has been read in Firestore? - firebase

I am working on a video based app that keeps track of how many views that video has received. I originally planned on having a field for view_count in my document that I would write to after someone watches a video.
However, knowing how many writes that could end up leading to, I started to wonder if it's possible to see a breakdown of how many reads have been made for each document in a collection and use that number instead. Since the videos are short, I figured this would be an accurate number for the view count.
Is this possible to access this kind of data?

Firestore does not expose any per-document access metrics. The available monitoring options are shown on this page on monitoring usage.
If you want something beyond that you'll have to build it yourself, as you originally intended.

Related

How to find which kinds are not being used in Google Datastore

There's any way to list the kinds that are not being used in google's datastore by our app engine app without having to look into our code and/or logic? : )
I'm not talking about indexes, which I can list by issuing an
gcloud datastore indexes list
and then compare with the datastore-indexes.xml or index.yaml.
I tried to check datastore kinds statistics and other metadata but I could not find anything useful to help me on this matter.
Should I give up to find ways of datastore providing me useful stats and code something to keep collecting datastore statistics(like data size), during a huge period to have at least a clue of which kinds are not being used and then, only after this research, take a look into our app code to see if the kind Model was removed?
Example:
select bytes from __Stat_Kind__
Store it somewhere and keep updating for a period. If the Kind bytes size does not change than probably the kind is not being used anymore.
The idea is to do some cleaning in datastore.
I would like to find which kinds are not being used anymore, maybe for a long time or were created manually to be used once... You know, like a table in oracle that no one knows what is used for and then if we look into the statistics of that table we would see that this table was only used once 5 years ago. I'm trying to achieve the same in datastore, I want to know which kinds are not being used anymore or were used a while ago, then ask around and backup/delete it if no owner was found.
It's an interesting question.
I think you would be best-placed to audit your code and instill organizational practice that requires this documentation to be performed in future as a business|technical pre-prod requirement.
IIRC, Datastore doesn't automatically timestamp Entities and keys (rightly) aren't incremental. So there appears no intrinsic mechanism to track changes short of taking a snapshot (expensive) and comparing your in-flight and backup copies for changes (also expensive and inconclusive).
One challenge with identifying a Kind that appears to be non-changing is that it could be referenced (rarely) by another Kind and so, while it does not change, it is required.
Auditing your code and documenting it for posterity should not only provide you with a definitive answer (and identify owners) but it pays off a significant technical debt that has been incurred and avoids this and probably future problems (e.g. GDPR-like) requirements that will arise in the future.
Assuming you are referring to records being created/updated, then I can think of the following options
Via the Cloud Console (Datastore > Dashboard) - This lists all your 'Kinds' and the number of records in each Kind. Theoretically, you can take a screen shot and compare the counts so that you know which one has experienced an increase or not.
Use of Created/LastModified Date columns - I usually add these 2 columns to most of my datastore tables. If you have them, then you can have a stored function that queries them. For example, you run a query to sort all of your Kinds in descending order of creation (or last modified date) and you only pull the first record from each one. This tells you the last time a record was created or modified.
I would write a function as part of my App, put it behind a page which requires admin privilege (only app creator can run it) and then just clicking a link on my App would give me the information.

Update Partial Data In Firestore Document

I've built a real-time web editor similar to the one at firepad.io for my purposes, in a web-app.
I'm using firestore backend, utilizing it's real-time sync capabilities (Great!)
At this time, I store the content of my text editor as a document, the problem is that every time the content changes, I've to make a WRITE for the entire document in the firestore. I think it's a waste of network bandwidth, plus it's probably costly as well (I haven't done the cost evaluation yet).
Is there a way to make partial updates to a document in firestore?
Thank you.
You can update individual fields, each value being written entirely. That's as small of a write you can make. But a document write of any size is still billed as 1 write operation.

does accessing the firebase firestore database dashboard will be considered as read operation?

I am now in developing phase for the project. currently the project only using one Android app as the frontend. the query from Android using limit and pagination. but the total number of documents read is way above the expected number.
I am trying to figure this out, why the number of read documents is so big even though the user is only one (me). I am scared the project will not be feasible if the number of read is so big. thats why i need to figure out the firestore read behaviour
When I accessed the firestore dashboard, and select a collection like the image below, it will show blue loading indicator and then show all documents available. currently in the event collection I have 52 documents. I access all documents in the event collection like this for several times for debugging purpose.
so whenever i tap that event collection, I assume it will be counted as 52 read operation, so the read operation will not only come from Android device but also from the dashboard ? thats why the number of reads is so big. am I right ?
if thats the case....
say if I have 100000 documents in event collection, then whenever i tap that event collection, will i perform 100000 read operation as well ? is there a way to limit this dashboard read ?
so the read operation will not only come from the Android device but also from the dashboard? That's why the number of reads is so big. am I right?
Yes, you are right.
say if I have 100000 documents in event collection, then whenever I tap that event collection, will I perform 100000 read operation as well?
No, you'll be charged only for the number of documents that belong to the first page. Inside the Console, there is a pagination mechanism especially implemented for that. So you'll be not charged for all the documents that exist in your collection.
Is there a way to limit this dashboard read?
The limitation already exists but be aware that as much as you scroll down, you get more documents which means more read operations charged.
One thing to bear in mind about the Firebase console is that it reflects changes to visible documents in real time, and each one of those changes also costs you a read. So, if you leave the console open while documents are changing, you will accumulate reads over time, even if you aren't actively using the console. This is a common source of unexpected reads.

Cloud Firestore Data Structure

I am creating an application that uses cloud firestore to store data about "events" in our lab on several assets. We collected data for a few months and we are averaging about 2000 events per asset per month. Each event captures a few pieces of meta data that the user can query.
I imported all the data into firestore with a very simple layout at first.
Events (Collection of event data)
-> EventData (documents which contains a few fields for metadata)
From my understanding, even if the collection of events becomes quite large, for billing and speed of queries this won't be a problem (assuming I do some sort of pagination on the query results). The composite indexes are also very manageable with this structure.
The problem I see, is if someone goes and looks at the firestore console and brings that collection up, our read requests go through the roof. It seems that does a full read on the entire collection...which of course will kill us on billing as time goes on. I don't see this as a problem forever as eventually we should get to the point where everything is stable and won't need to go into the console very often, but what if someone does when we have a million or more records.
My next thought was to structure the database like this:
Events -> Assets -> {Asset_Name } -> {year_month} -> {Collection of
Document with field meta-data}
This certainly solves the issue of the ever growing collection of documents. The number of assets that we have is fixed, and the number of events is (effectively) capped to a maximum amount per month as well. The problem with this setup, however, is managing composite indexes. There are about 5 indexes needed for my original setup. I think this alternative setup means I would need to setup the same 5 indexes for each each collection of documents for every asset every month.
I thought maybe there could be a way to have a cloud function manage it for me (it doesn't appear there is an API for this). I think the number of indexes per project is also capped.
So, in the end, I am looking for recommendations on how to structure this database to limit reads if using the console, as well as keeping the indexes manageable. I am pretty new to NoSQL and perhaps I am just completely off.
I recommend you keep your structure as is if that's what's working for you. You should not need to optimize for reducing console reads. Console reads do count towards your usage but the console does not load the entire collection when you open the console.
The console loads just enough documents to let you scroll a bit and then it loads more documents if you scroll down. It will only load the entire collection if you scroll through the entire collection.

How to build large/busy RSS feed

I've been playing with RSS feeds this week, and for my next trick I want to build one for our internal application log. We have a centralized database table that our myriad batch and intranet apps use for posting log messages. I want to create an RSS feed off of this table, but I'm not sure how to handle the volume- there could be hundreds of entries per day even on a normal day. An exceptional make-you-want-to-quit kind of day might see a few thousand. Any thoughts?
I would make the feed a static file (you can easily serve thousands of these), regenerated periodically. Then you have a much broader choice, because it doesn't have to run below second, it can run even minutes. And users still get perfect download speed and reasonable update speed.
If you are building a system with notifications that must not be missed, then a pub-sub mechanism (using XMPP, one of the other protocols supported by ApacheMQ, or something similar) will be more suitable that a syndication mechanism. You need some measure of coupling between the system that is generating the notifications and ones that are consuming them, to ensure that consumers don't miss notifications.
(You can do this using RSS or Atom as a transport format, but it's probably not a common use case; you'd need to vary the notifications shown based on the consumer and which notifications it has previously seen.)
I'd split up the feeds as much as possible and let users recombine them as desired. If I were doing it I'd probably think about using Django and the syndication framework.
Django's models could probably handle representing the data structure of the tables you care about.
You could have a URL that catches everything, like: r'/rss/(?(\w*?)/)+' (I think that might work, but I can't test it now so it might not be perfect).
That way you could use URLs like (edited to cancel the auto-linking of example URLs):
http:// feedserver/rss/batch-file-output/
http:// feedserver/rss/support-tickets/
http:// feedserver/rss/batch-file-output/support-tickets/ (both of the first two combined into one)
Then in the view:
def get_batch_file_messages():
# Grab all the recent batch files messages here.
# Maybe cache the result and only regenerate every so often.
# Other feed functions here.
feed_mapping = { 'batch-file-output': get_batch_file_messages, }
def rss(request, *args):
items_to_display = []
for feed in args:
items_to_display += feed_mapping[feed]()
# Processing/returning the feed.
Having individual, chainable feeds means that users can subscribe to one feed at a time, or merge the ones they care about into one larger feed. Whatever's easier for them to read, they can do.
Without knowing your application, I can't offer specific advice.
That said, it's common in these sorts of systems to have a level of severity. You could have a query string parameter that you tack on to the end of the URL that specifies the severity. If set to "DEBUG" you would see every event, no matter how trivial. If you set it to "FATAL" you'd only see the events that that were "System Failure" in magnitude.
If there are still too many events, you may want to sub-divide your events in to some sort of category system. Again, I would have this as a query string parameter.
You can then have multiple RSS feeds for the various categories and severities. This should allow you to tune the level of alerts you get an acceptable level.
In this case, it's more of a manager's dashboard: how much work was put into support today, is there anything pressing in the log right now, and for when we first arrive in the morning as a measure of what went wrong with batch jobs overnight.
Okay, I decided how I'm gonna handle this. I'm using the timestamp field for each column and grouping by day. It takes a little bit of SQL-fu to make it happen since of course there's a full timestamp there and I need to be semi-intelligent about how I pick the log message to show from within the group, but it's not too bad. Further, I'm building it to let you select which application to monitor, and then showing every message (max 50) from a specific day.
That gets me down to something reasonable.
I'm still hoping for a good answer to the more generic question: "How do you syndicate many important messages, where missing a message could be a problem?"

Resources