I have just published an app that uses Firestore as a backend.
I want to change how the data is structured.
For example, if some documents are stored in subcollections like 'PostsCollection/userId/SubcolletionPosts/postIdDocument', I want to move all of those postIdDocument documents into the top-level collection 'PostsCollection'.
Obviously, doing so would prevent users of the previous app version from reading and writing the right collection, and all data would be lost.
Since I don't know how to approach this, I'd like to ask: what is the best approach, including the one big companies use, when changing the data structure of their projects?
So the approach I have used is document versioning. There is an explanation here.
You basically version your documents, so when your app reads them, it knows how to update those documents to get them to the desired version. In your case, you would have no version and need to get to version 1, which means moving the documents from the sub-collections to the top-level collection and removing the sub-collection before working with the document.
Yes, it is more work, but it allows an iterative approach to document changes. Sometimes, a script is written instead to update everything to the desired state and then new code is deployed 😛, which usually happens when someone wants it done yesterday. With many documents, that approach can have its own issues.
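A minimal sketch of that read-time migration, assuming an in-memory dict stands in for Firestore. The `schemaVersion` and `userId` field names and the `nested` key are hypothetical: a document without `schemaVersion` is treated as version 0 (still in the user's subcollection) and is moved to the top-level collection the first time it is read.

```python
from typing import Callable, Dict

CURRENT_VERSION = 1

def _v0_to_v1(doc: dict) -> dict:
    # v0 documents lived in the owner's subcollection; in v1 they are top-level,
    # so the caller must already have copied the owner ID into the document.
    return {**doc, "schemaVersion": 1}

# One migration per version step, so later versions can be chained on.
MIGRATIONS: Dict[int, Callable[[dict], dict]] = {0: _v0_to_v1}

def upgrade(doc: dict) -> dict:
    """Apply migrations one step at a time until the document is current."""
    version = doc.get("schemaVersion", 0)  # no version field means v0
    while version < CURRENT_VERSION:
        doc = MIGRATIONS[version](doc)
        version = doc["schemaVersion"]
    return doc

def read_post(db: dict, user_id: str, post_id: str) -> dict:
    """Read a post, moving it out of the legacy subcollection on first access."""
    top = db["PostsCollection"]
    if post_id in top:
        doc = top[post_id]
    else:
        # db["nested"][user_id] stands in for PostsCollection/{userId}/SubcolletionPosts
        nested = db["nested"].get(user_id, {})
        if post_id not in nested:
            raise KeyError(post_id)
        doc = {**nested.pop(post_id), "userId": user_id}
    doc = upgrade(doc)
    # In Firestore, do this write and the delete of the old document in one
    # batched write, so a failure cannot leave the post in neither place.
    top[post_id] = doc
    return doc
```

Keeping one migration function per version step means a later version 2 only needs a new `1 -> 2` entry; old documents walk through every step in order.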
I have a very large collection of approximately 2 million documents; all of them are outdated and need to be deleted.
I need to do this operation only once: for the new data I have a TTL (time-to-live) policy, so I won't run into this problem again.
Should I use the Firestore console UI to delete them, or is there a better way to do this? Is it possible to do this in one shot, or should I split it up?
There's no single way that is clearly better here.
The simplest option is probably to delete the documents from the console, but I often also use the Firebase CLI's firestore:delete command, and writing your own logic through the API is equally fine. Any of these can work; all of them need to read the documents before deleting them, and none of them is going to be significantly faster than the others.
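If you write your own logic, one minimal sketch with the google-cloud-firestore Python client is to page through the collection and delete each page in a batch. It relies only on `collection().limit().stream()`, `batch()`, `batch.delete()` and `batch.commit()`, so it can be exercised with a fake client. The batch size of 500 is an assumption that matches Firestore's long-documented per-batch write limit.

```python
def delete_collection(client, collection_name: str, batch_size: int = 500) -> int:
    """Delete every document in a top-level collection, one batch at a time.

    Assumes a google-cloud-firestore style client. Subcollections are NOT
    deleted, because deleting a document never deletes its subcollections.
    Returns the number of documents deleted.
    """
    coll = client.collection(collection_name)
    deleted = 0
    while True:
        # Re-query from the start each pass: the previous page is already gone.
        docs = list(coll.limit(batch_size).stream())
        if not docs:
            return deleted
        batch = client.batch()
        for snap in docs:
            batch.delete(snap.reference)
        batch.commit()
        deleted += len(docs)
```

Usage, assuming default credentials and a hypothetical collection name: `from google.cloud import firestore` then `delete_collection(firestore.Client(), "old_docs")`. For 2 million documents that is about 4,000 batches, and every document is still billed as one read and one delete.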
Is there any way to list the kinds in Google's Datastore that are not being used by our App Engine app, without having to look into our code and/or logic? :)
I'm not talking about indexes, which I can list by running
gcloud datastore indexes list
and then comparing with datastore-indexes.xml or index.yaml.
I tried checking Datastore kind statistics and other metadata, but I could not find anything that helps with this.
Should I give up on getting useful stats from Datastore and instead write something that keeps collecting Datastore statistics (like data size) over a long period, to get at least a clue about which kinds are not being used, and only after that research look into our app code to see whether the kind's Model was removed?
Example:
select bytes from __Stat_Kind__
Store it somewhere and keep updating it over a period. If a kind's size in bytes does not change, then the kind is probably not being used anymore.
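The snapshot-and-compare idea can be sketched with the google-cloud-datastore Python client, whose `__Stat_Kind__` entities expose `kind_name` and `bytes` properties. Where the snapshots are kept between runs is left out (a plain dict here), and because statistics are refreshed periodically rather than in real time, snapshots should be taken days or weeks apart. An unchanged size only suggests nothing was written; it says nothing about reads.

```python
from typing import Dict, List

def take_snapshot(client) -> Dict[str, int]:
    """Map kind name -> total bytes, read from the built-in __Stat_Kind__ statistics."""
    query = client.query(kind="__Stat_Kind__")
    return {entity["kind_name"]: entity["bytes"] for entity in query.fetch()}

def unchanged_kinds(before: Dict[str, int], after: Dict[str, int]) -> List[str]:
    """Kinds present in both snapshots whose size did not change (write-idle candidates)."""
    return sorted(k for k in before.keys() & after.keys() if before[k] == after[k])
```

Usage: take `before = take_snapshot(datastore.Client())` now, persist it, then pass it and a later snapshot to `unchanged_kinds`.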
The idea is to do some cleanup in Datastore.
I would like to find which kinds are not being used anymore, maybe for a long time, or were created manually to be used once. You know, like a table in Oracle that no one knows the purpose of, and when we look at that table's statistics we see it was only used once, 5 years ago. I'm trying to achieve the same in Datastore: I want to know which kinds are not used anymore or were last used a while ago, then ask around and back up or delete them if no owner is found.
It's an interesting question.
I think you would be best placed to audit your code and instill an organizational practice that requires this documentation in future as a business/technical pre-production requirement.
IIRC, Datastore doesn't automatically timestamp entities, and keys (rightly) aren't incremental. So there appears to be no intrinsic mechanism to track changes short of taking a snapshot (expensive) and comparing your in-flight and backup copies for changes (also expensive and inconclusive).
One challenge with identifying a Kind that appears to be unchanging is that it could be referenced (rarely) by another Kind, so even though it does not change, it is still required.
Auditing your code and documenting it for posterity should not only give you a definitive answer (and identify owners) but also pay off a significant technical debt, avoiding this problem and probably future ones, such as GDPR-like requirements that will arise.
Assuming you are referring to records being created or updated, I can think of the following options:
Via the Cloud Console (Datastore > Dashboard): this lists all your Kinds and the number of records in each. In theory, you can take a screenshot and compare the counts over time to see which Kinds have grown and which have not.
Use Created/LastModified date properties: I usually add these two properties to most of my Datastore kinds. If you have them, you can write a function that queries them. For example, for each Kind, run a query sorted in descending order of creation (or last-modified) date and pull only the first record. This tells you the last time a record in that Kind was created or modified.
I would write this function as part of my app, put it behind a page that requires admin privileges (so only the app's creator can run it), and then just clicking a link in my app would give me the information.
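A minimal sketch of that check with the google-cloud-datastore Python client, assuming each entity has a datetime property named `last_modified` (a hypothetical name; use whatever your models define). Entities that lack the property, or have it excluded from indexes, never appear in the sorted query, so such a kind reports `None`. The list of kinds could come from a `__kind__` metadata query.

```python
from datetime import datetime, timedelta
from typing import Dict, Iterable, Optional

def last_modified(client, kind: str, field: str = "last_modified") -> Optional[datetime]:
    """Newest value of `field` in a kind, or None if no entity has it indexed."""
    query = client.query(kind=kind)
    query.order = ["-" + field]  # descending: the first result is the most recent
    for entity in query.fetch(limit=1):
        return entity[field]
    return None

def stale_kinds(client, kinds: Iterable[str], now: datetime,
                max_age: timedelta) -> Dict[str, Optional[datetime]]:
    """Kinds whose latest modification is older than max_age, or unknown (None)."""
    report: Dict[str, Optional[datetime]] = {}
    for kind in kinds:
        latest = last_modified(client, kind)
        if latest is None or now - latest > max_age:
            report[kind] = latest
    return report
```

Datastore returns timezone-aware UTC datetimes, so `now` should be `datetime.now(timezone.utc)` when calling this against a real client.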
I have a few music albums - basically just files in folders - that I want to upload to Firebase Storage.
One would usually run a function after a file has been uploaded to create a document containing the metadata about the song, but that's where I'm stuck.
I can get most of the info I need by reading the tracks' ID3 tags, but in a NoSQL database I think I'm supposed to create not only a document for each track but also a document for each album with an array of all its tracks, or at least an array of all track IDs.
But when or how do I create the album document? Another example is the album cover: I want to save its URL inside the track document as well as in the corresponding album, but that means the artwork is the first thing I need to upload, because I can't add a URL that doesn't exist yet.
I feel like I have to get this right before I start because updating everything afterwards is a pain.
Is using upload functions really the way to go here, or is there a tool or another approach I'm missing?
Thank you very much!
You mentioned Firebase Storage, which is just a wrapper around Cloud Storage and is an object management system, not a database; I think you are referring to Firestore.
Since Firestore, as you mentioned, is a NoSQL database, there is no single correct schema structure; it will definitely depend on each specific use case. However, you can take a look at these docs, which explain how to architect your schema when moving from a SQL to a NoSQL format.
Among other things, the main points are:
In general, you can treat documents as lightweight JSON records
You have complete freedom over what fields you put in each document
After you create the first document in a collection, the collection exists. If you delete all of the documents in a collection, it no longer exists.
You can nest subcollections inside documents
Deleting a document does not delete its subcollections!
And finally, to get an idea of how to structure the information, you can take a look at this repo, where "NoSQL-Spotify by Luke Halley" explains a NoSQL schema based on Spotify. I think it should fit your needs, or at least give you a starting point.
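As one possible starting point (not the schema from that repo), here is a sketch that turns one album folder's already-parsed ID3 tags into an album document and its track documents. It assumes the tags are dicts with hypothetical keys `artist`, `album`, `title` and `track` (an int), and that the cover URL is already known, which means uploading the artwork first or filling the URL in later with an update. Deriving document IDs from names is also an assumption; auto-generated IDs work too.

```python
import re
from typing import Dict, List, Optional, Tuple

def slug(*parts: str) -> str:
    """Deterministic, URL-safe ID built from names (hypothetical ID scheme)."""
    text = "-".join(parts).lower()
    return re.sub(r"[^a-z0-9]+", "-", text).strip("-")

def build_album(tags: List[dict],
                cover_url: Optional[str]) -> Tuple[str, dict, Dict[str, dict]]:
    """Build (album_id, album_doc, {track_id: track_doc}) for one album folder."""
    first = tags[0]
    album_id = slug(first["artist"], first["album"])
    tracks: Dict[str, dict] = {}
    for tag in sorted(tags, key=lambda t: t["track"]):
        track_id = slug(album_id, str(tag["track"]))
        tracks[track_id] = {
            "title": tag["title"],
            "trackNumber": tag["track"],
            "albumId": album_id,
            "coverUrl": cover_url,  # duplicated so a track can be shown without reading its album
        }
    album = {
        "title": first["album"],
        "artist": first["artist"],
        "coverUrl": cover_url,
        "trackIds": list(tracks),  # keys were inserted in track-number order
    }
    return album_id, album, tracks
```

Because the IDs are deterministic, re-running this for a folder overwrites the same documents instead of creating duplicates, and the album plus its tracks can be written together in one batch.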
If I have User and Profile objects, what is the best way to structure my collections in Firestore, given that the following scenarios can take place?
Users have a single Profile
Users can update their Profile
Users can save other users' profiles
Users can delete their saved profiles
The same profile can't be saved twice
If Users and Profiles are separate collections, what is the best way to store saved profiles?
One way that came to mind was that each user has a subcollection called SavedProfiles. The ID of each document is the ID of the profile. Each saved profile only contains a reference to the user whose profile it is.
The other option was to do the same thing but store the whole profile of each saved profile.
The benefit of the first approach is that when users update their own profile, there's no need to update any saved copies of it, since only the reference is stored. However, reading a user's saved profiles (which will happen quite often) may require two read operations: one to get all the references, then one to fetch all the profiles those references point to (if that's even possible???). This seems quite expensive.
The second approach seems like the right way to go, as it solves the problem of reading all the saved profiles. But updating multiple saved profiles seems like an issue, as each user's saved profiles may be unique. I understand that it's possible to do a batch update, but will it be necessary to query every user in the DB for their saved profiles, check whether the updated profile exists, and if so update it? I'm not sure which way to go. I'm not very used to NoSQL data structures, and it already seems like I've done something wrong by using a subcollection, since it's advised to keep everything as denormalized as possible. So please let me know if the structure of my whole DB is wrong too, which is also quite possible...
Please provide some examples of how to get and update profiles/saved profiles.
Thank you.
Welcome to the conundrum that is designing a NoSQL database. There is no right or wrong answer, here. It's whatever works best for you.
As you have identified, querying will be much easier with your second option. You can easily create a Cloud Function that updates every saved copy of a profile whenever that profile is modified.
Your first option will require multiple reads from the database. It really depends on how you plan to scale this and how quickly you want your app to respond.
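For the first option, a sketch with the google-cloud-firestore Python client, assuming hypothetical collections `users/{uid}/savedProfiles` and `profiles`, where each saved entry stores a DocumentReference in a hypothetical `profileRef` field. Reading is two round trips rather than one per profile: one query for the saved entries, then a single `get_all` call (each profile is still billed as a read). Using the profile ID as the document ID covers the "can't be saved twice" rule.

```python
from typing import List

def _saved(client, user_id: str):
    """users/{user_id}/savedProfiles (hypothetical collection names)."""
    return client.collection("users").document(user_id).collection("savedProfiles")

def save_profile(client, user_id: str, profile_id: str) -> None:
    # The profile ID doubles as the document ID, so saving the same profile
    # again overwrites the existing entry instead of creating a duplicate.
    ref = client.collection("profiles").document(profile_id)
    _saved(client, user_id).document(profile_id).set({"profileRef": ref})

def unsave_profile(client, user_id: str, profile_id: str) -> None:
    _saved(client, user_id).document(profile_id).delete()

def load_saved_profiles(client, user_id: str) -> List[dict]:
    # Round trip 1: the user's saved entries, each holding a DocumentReference.
    refs = [snap.get("profileRef") for snap in _saved(client, user_id).stream()]
    if not refs:
        return []
    # Round trip 2: fetch every referenced profile in one get_all call. Results
    # may come back in any order; profiles deleted since then are skipped.
    return [snap.to_dict() for snap in client.get_all(refs) if snap.exists]
```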
Option 1 will give a slower user experience while all of the data is fetched. Option 2 will give a much faster user experience, but will require your Cloud Function to update every saved profile. However, this is a background task, so it wouldn't matter if it takes a few seconds.
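For Option 2, the work the Cloud Function does on a profile update is a fan-out: find every saved copy of the changed profile and overwrite it. Below is a minimal in-memory sketch, assuming each copy lives at `users/{uid}/savedProfiles/{profileId}` with a copied `profileId` field (hypothetical names). In Firestore, finding the copies across all users would typically be a collection group query on `savedProfiles` filtered on that field, with the writes sent out in batches.

```python
from typing import Dict

# users[uid]["savedProfiles"][profile_id] -> full copy of that profile's data
Users = Dict[str, Dict[str, Dict[str, dict]]]

def save_profile_copy(users: Users, user_id: str, profile_id: str, profile: dict) -> None:
    """Store a full copy; keying by profile ID means saving twice just overwrites."""
    saved = users.setdefault(user_id, {}).setdefault("savedProfiles", {})
    saved[profile_id] = {**profile, "profileId": profile_id}

def fan_out_update(users: Users, profile_id: str, new_profile: dict) -> int:
    """What an on-update trigger would do: refresh every saved copy. Returns copies updated."""
    updated = 0
    for user in users.values():
        saved = user.get("savedProfiles", {})
        if profile_id in saved:
            saved[profile_id] = {**new_profile, "profileId": profile_id}
            updated += 1
    return updated
```

The trade-off is the one described above: reads touch only the user's own subcollection, while each profile edit costs one write per user who saved it.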
From what I know, the Meteor framework stores part of the data on the client. It's clear how to do that for a personal to-do list, because it's small and you can just copy everything.
But how does it work for, say, a Q&A site similar to this one? The collection of questions is huge; you can't possibly copy it to the client. And you need filtering by tags and sorting by date and popularity.
How does the Meteor framework handle such a case? How does it partition data? Does it make sense to use Meteor for such a use case?
Have a look at the Meteor docs, in particular the publish and subscribe section. Here's a short example:
Imagine your database contains one million posts. But your client only needs something like:
the top 10 posts by popularity
the posts your friends made in the last hour
the posts for the group you are in
In other words, some subset of the larger collection. To get that subset, the client starts a subscription, for example: Meteor.subscribe('popularPosts'). Then on the server there will be a corresponding publish function like: Meteor.publish('popularPosts', function(){...}).
As the client moves around the app (changes routes), different subscriptions may be started and stopped.
The subset of documents is sent to the client and cached in memory in a MongoDB-like store called Minimongo. The client can then retrieve the documents as needed to render the page.
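A language-neutral model of that flow, written in Python rather than Meteor's JavaScript API, so every name here is hypothetical: a publication is a server-side query that returns a filtered, sorted, limited subset, and the client merges whatever its subscriptions send into one local cache that it queries again when rendering. In Meteor itself, the publish function would typically return a cursor such as `Posts.find({}, {sort: {score: -1}, limit: 10})`, with `score` an assumed field.

```python
from typing import Callable, Dict, List

Post = dict

def publish_popular(posts: List[Post], limit: int = 10) -> List[Post]:
    """Server side: send only the top `limit` posts by score."""
    return sorted(posts, key=lambda p: p["score"], reverse=True)[:limit]

def publish_group(posts: List[Post], group: str) -> List[Post]:
    """Server side: send only the posts for one group."""
    return [p for p in posts if p["group"] == group]

class ClientCache:
    """Client side: a tiny stand-in for Minimongo, keyed by document _id."""

    def __init__(self) -> None:
        self.docs: Dict[str, Post] = {}

    def merge(self, docs: List[Post]) -> None:
        # Documents from different subscriptions land in the same store.
        for d in docs:
            self.docs[d["_id"]] = d

    def find(self, predicate: Callable[[Post], bool]) -> List[Post]:
        # The client must filter again: the cache mixes every subscription.
        return [d for d in self.docs.values() if predicate(d)]
```

With a million posts on the server, the client here holds only what its two subscriptions sent, which is how tag filtering and sorting can stay server-side while rendering stays local.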