Firebase/Firestore Testing Pattern: mock fake data without code duplication (DRY)

This is a question about programming style and patterns when it comes to writing tests for complex systems built with Firebase/Firestore. I'm writing a web app using Firebase, TypeScript, Angular, Firestore, etc.
Objective:
I have written basic security rules tests to test my users collection. I'll also be testing the function that writes the document, and e2e testing that the document is written when a new user signs up. So far so good.
The tests are clean so far: I manually define a few user objects, write them to the database in beforeEach(), and run the tests. The tests depend on my knowing what data I wrote in the first place (the document IDs, the field values, etc.), and then I check that certain operations pass and others fail, depending on the custom claims provided. Similarly, with function tests and e2e tests, I'll be checking whether the correct data was written, generated, etc.
The next step would be to test the chat functionality. Here's where I run into code duplication issues.
Issue:
So let's say I want to test the chat functionality or the transaction functionality. For these modules to be tested, the database needs to contain fake user data, transaction data, test data, etc. Furthermore, within the tests I need access to the fields, document IDs, etc. that were written to the database, so that I can access those documents in the tests.
For example, whether or not a chat message can be written to firestore depends on whether certain fields exist on the user document.
This would require me to manually define all the objects I plan to write to the database in the same file as the tests themselves, so that I have access to what I wrote. As I test more complex and interdependent modules of the system, and since each module's tests live in a separate file, I either have to write out each object by hand or import it from another test file and write it. Each document has its payload and its ID that I need to keep track of, plus the fake user token objects I have to pass to Firestore (or the actual auth user records I have to import for online testing). This would mean a whole bunch of boilerplate and duplication, writing objects like this in all of my files:
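```typescript
import { faker } from '@faker-js/faker'; // assuming the @faker-js/faker package

const fakeUserPayload: User = {
  handle: 'username',
  email: faker.internet.email(),
  // ...etc. (the remaining User fields)
};
// And so on and on for all test users, chat docs, transaction docs, etc...
```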
Potential solution: I came up with a few potential solutions, but none of them seems to solve the problem.
For example, I thought I would write a module that simply populates the Firestore database at the start of each test. The module would have a userPayloadFactory and other loops (using the faker module) to automatically populate the DB with fake data.
Problem: if I did this, I wouldn't have access to the document IDs and field data in my tests, since they're automatically generated. I thought that maybe I could populate Firestore with fake data, then use an administrative DB connection to read the fake user documents and their IDs back into the tests, and use that data to conduct the tests. For example, I would find the user ID, then generate a chat document and test for correct data / success. Except it seems incredibly messy to write data in one module and then read it back in another, especially since most tests require a specific document to be written to test certain cases/scenarios. That makes auto-populated mock data useless, so we're back to square one, where I have to manually define and write out a large number of fake objects in order to test rules, functions and functionality.
Potential solution: I could keep the auth and Firestore data in a JSON backup file (so they remain static) and import them into the database with a shell command before each test suite. However, this has its own issues: it's not easy to dynamically generate new test cases or edge cases, and it's difficult to keep re-exporting and updating the JSON backup files as the project grows.
What is a better way to structure and write my tests so that I can automatically generate the documents and payloads I need, while having control and access over what gets written?
I'm hoping there's some kind of factory or pattern that can make this easier, scalable and more consistent and robust.

You're asking a really big question. Writing tests for large environments is a complex task, even more so when it's coupled to database state. I'll try to answer to the best of my knowledge, so take my words with a grain of salt.
I believe you're dealing with two similar yet different concerns: automatic creation of mock data, and edge cases that require a very specific document setup. The two are tightly coupled within the tests themselves, since you need both kinds of data to run them; however, their requirements differ from one another, and their creation should take that into account. Let's look at your potential solutions from that perspective.
The JSON backup provides a static, consistent dataset that lets you repeat the tests over time while being sure the environment hasn't changed, which makes it a good candidate for the edge-case problem. Its downside is that it's hard to maintain: any change to an object to accommodate TestA may break the expectations of TestB, which also relies on it, and you are almost certain to lose track of these nuances at some point. You can add new objects to accommodate code and test changes, but that can lead to a combinatorial explosion of objects to look after as your project grows. Finally, JSON files are not the way to go if you need dynamically generated data.
The factory method is a great option for creating arbitrary mock data, since fewer restrictions are placed on that data, so writing a generator seems like a good idea. You disliked this because you need to know your data while running the tests, but I think that's solvable: your test can load the factory module, create the data, and keep it in memory (or on disk) in addition to committing those changes to Firestore. There's no need to read the data back from the database.
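A minimal sketch of that idea, assuming a TypeScript test setup: the factory returns each generated document together with its ID and path, `seed` writes them, and the test keeps the returned objects in memory. `DocWriter`, `makeUser`, `seed` and `MemoryWriter` are hypothetical names, not Firebase APIs; in real tests `DocWriter.set` would wrap whatever Firestore handle you use (Admin SDK or an emulator test context). A counter stands in for faker so the data stays predictable.

```typescript
// Hypothetical User shape; replace with your real model.
interface User {
  handle: string;
  email: string;
  chatEnabled: boolean;
}

// A generated document: its ID, its Firestore path and the exact payload written.
interface Fixture<T> {
  id: string;
  path: string;
  data: T;
}

// Hypothetical writer abstraction (not a Firebase API). In real tests, set()
// would delegate to your Firestore handle (Admin SDK or an emulator test context).
interface DocWriter {
  set(path: string, data: object): Promise<void>;
}

let counter = 0;

// Deterministic stand-in for faker so expectations are easy to write.
function nextId(prefix: string): string {
  counter += 1;
  return `${prefix}-${counter}`;
}

// Factory: builds a valid user and remembers where it will be stored.
function makeUser(): Fixture<User> {
  const id = nextId("user");
  return {
    id,
    path: `users/${id}`,
    data: { handle: `handle_${id}`, email: `${id}@example.com`, chatEnabled: true },
  };
}

// Writes every fixture and hands the same objects back, so the test keeps
// IDs and payloads in memory instead of reading them back from Firestore.
async function seed<T extends object>(writer: DocWriter, fixtures: Fixture<T>[]): Promise<Fixture<T>[]> {
  await Promise.all(fixtures.map((f) => writer.set(f.path, f.data)));
  return fixtures;
}

// In-memory writer for exercising the factory without an emulator.
class MemoryWriter implements DocWriter {
  readonly docs = new Map<string, object>();
  async set(path: string, data: object): Promise<void> {
    this.docs.set(path, data);
  }
}
```

In a test you would write something like `const [alice, bob] = await seed(writer, [makeUser(), makeUser()]);` and then use `alice.id` and `alice.data` directly in your assertions.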
Your other concern was the corner-case documents, which is trickier because you need very specifically shaped data. You could handcraft the documents yourself, but then you end up with a poorly scalable solution. The alternative is to look for constraints/invariants in the shape of the edge-case documents that you can abstract into factory methods. The worst case here is that some edge cases share no similarity with the rest, and you'll need to write a whole method for each of them. I wouldn't consider that a downside, as it improves the modularity and maintainability of the factory.
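One way to turn edge cases into factory methods is a base factory with valid defaults plus small named variants that override or remove exactly the fields a scenario depends on. The `chatEnabled` field and the helper names below are illustrative assumptions, not part of the question's schema.

```typescript
// Hypothetical User shape; `chatEnabled` stands in for whatever field your chat rules check.
interface User {
  handle: string;
  email: string;
  chatEnabled?: boolean;
}

interface UserFixture {
  id: string;
  data: User;
}

let seq = 0;

// Base factory: valid defaults, any field can be overridden per test.
function buildUser(overrides: Partial<User> = {}): UserFixture {
  seq += 1;
  const id = `user-${seq}`;
  const data: User = {
    handle: `handle_${seq}`,
    email: `user${seq}@example.com`,
    chatEnabled: true,
    ...overrides,
  };
  return { id, data };
}

// Named edge-case factories: each one encodes a single scenario the rules care
// about, so a test asks for "a user without the chat flag" instead of
// hand-writing that document again.
const edgeCases = {
  userWithoutChatFlag(): UserFixture {
    const user = buildUser();
    delete user.data.chatEnabled;
    return user;
  },
  chatDisabledUser(): UserFixture {
    return buildUser({ chatEnabled: false });
  },
};
```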
Overall, I would stick with the Factory pattern, because it already offers techniques for following the DRY principle: it isolates the creation of distinct objects, decouples data creation from test execution, and helps avoid disruptive breakage as the tests evolve with the project.
With that said, a little research led me to this page about the Builder pattern, which you may find interesting. This thread about code duplication in tests might also be of interest. And finally, just to point out that Firebase has some testing functionality, which can be found here.
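For the Builder pattern, a sketch might look like the following: each step adds related documents and validates cross-document references, and `build()` returns both the typed documents and a flat path-to-payload map for seeding. The `ChatScenarioBuilder` class and the `users/` and `chats/` paths are hypothetical.

```typescript
// Hypothetical document shapes for a chat scenario.
interface UserDoc {
  handle: string;
  chatEnabled: boolean;
}

interface ChatDoc {
  createdBy: string;
  members: string[];
}

interface Scenario {
  users: Map<string, UserDoc>;
  chats: Map<string, ChatDoc>;
  // Flat path -> payload map, ready for whatever seeding code you use.
  writes: Map<string, object>;
}

// Each step adds documents and returns `this` so calls chain; build() returns
// every ID and payload so tests can assert against them directly.
class ChatScenarioBuilder {
  private users = new Map<string, UserDoc>();
  private chats = new Map<string, ChatDoc>();

  withUser(id: string, overrides: Partial<UserDoc> = {}): this {
    this.users.set(id, { handle: `handle_${id}`, chatEnabled: true, ...overrides });
    return this;
  }

  // Rejects chats that reference users the scenario never created.
  withChat(id: string, createdBy: string, otherMembers: string[]): this {
    const members = [createdBy, ...otherMembers];
    for (const m of members) {
      if (!this.users.has(m)) throw new Error(`unknown user "${m}": call withUser() first`);
    }
    this.chats.set(id, { createdBy, members });
    return this;
  }

  build(): Scenario {
    const users = new Map<string, UserDoc>();
    const chats = new Map<string, ChatDoc>();
    const writes = new Map<string, object>();
    this.users.forEach((doc, id) => {
      users.set(id, doc);
      writes.set(`users/${id}`, doc);
    });
    this.chats.forEach((doc, id) => {
      chats.set(id, doc);
      writes.set(`chats/${id}`, doc);
    });
    return { users, chats, writes };
  }
}
```

A test would then read as `new ChatScenarioBuilder().withUser("alice").withUser("bob", { chatEnabled: false }).withChat("chat1", "alice", ["bob"]).build()`, which states the scenario instead of spelling out every document.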
Hope this helps.

Related

Firestore database model for Notion-like modules

I have watched videos and read the Cloud Firestore documentation from Google's Firebase service, but coming from the Realtime Database I can't figure this out.
I have a web app in mind in which I want to store my providers from different categories of products. I want to perform a search query across all my products to find which providers I have for a given product, and eventually access that provider's info.
I am planning to use this structure for this purpose:
Providers (collection)
  Provider 1 (document)
    Name
    City
    Categories
  Provider 2
    Name
    City
Products (collection)
  Product 1 (document)
    Name
    Description
    Category
    Provider ID
  Product 2
    Name
    Description
    Category
    Provider ID
So my question is, is this approach the right way to access the provider info once I get the product I want?
I know this is possible in the Realtime Database: using the provider ID, I could look up that provider in the providers section. But with Firestore I am not sure if it's possible or if this is the right approach.
What is the correct way to structure this kind of data in Firestore?
You need to know that there is no "perfect", "the best" or "the correct" solution for structuring a Cloud Firestore database. The best and correct solution is the one that fits your needs and makes your job easier. Bear in mind, too, that there is no single "correct data structure" in the world of NoSQL databases. All data is modeled to allow the use cases that your app requires. This means that what works for one app may be insufficient for another, so there is no solution that is correct for everyone. An effective structure for a NoSQL database depends entirely on how you intend to query it.
The way you are structuring your data looks good to me. In general, there are two ways to achieve the same thing. The first is to keep a reference to the provider in the product object (as you already do); the second is to copy the entire provider object into the product document. This second technique is called denormalization and is quite a common practice when it comes to Firebase. We often duplicate data in NoSQL databases to support queries that may not be possible otherwise. For a better understanding, I recommend watching this video, Denormalization is normal with the Firebase Database. It's for the Firebase Realtime Database, but the same principles apply to Cloud Firestore.
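To make the trade-off concrete, here is a sketch of the two shapes, trimmed to a few fields, with a counter standing in for Firestore document reads. The type and function names are illustrative; the point is that the reference approach costs two reads to show a product with its provider, while the denormalized approach costs one.

```typescript
// Hypothetical shapes mirroring the question's Providers and Products collections.
interface Provider {
  name: string;
  city: string;
}

// Option 1: the product keeps a reference (the provider's document ID).
interface ProductRef {
  name: string;
  providerId: string;
}

// Option 2: the product embeds a denormalized copy of the provider.
interface ProductDenorm {
  name: string;
  provider: Provider & { id: string };
}

// Stand-in for a collection: every get() counts as one document read.
class CountingCollection<T> {
  reads = 0;
  private docs: Map<string, T>;
  constructor(docs: Map<string, T>) {
    this.docs = docs;
  }
  get(id: string): T | undefined {
    this.reads += 1;
    return this.docs.get(id);
  }
}

// Reference approach: one read for the product, a second for its provider.
function loadWithReference(
  productId: string,
  products: CountingCollection<ProductRef>,
  providers: CountingCollection<Provider>,
): { product: ProductRef; provider: Provider | undefined } | undefined {
  const product = products.get(productId);
  if (!product) return undefined;
  return { product, provider: providers.get(product.providerId) };
}

// Denormalized approach: a single read already carries the provider data.
function loadDenormalized(productId: string, products: CountingCollection<ProductDenorm>): ProductDenorm | undefined {
  return products.get(productId);
}
```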
Also, when you duplicate data, there is one thing to keep in mind: just as you add the data, you need to maintain it. In other words, if you want to update or delete a provider object, you need to do it everywhere it exists.
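Keeping duplicates in sync means every denormalized copy has to be rewritten when the source changes. This in-memory sketch (hypothetical names, with a `Map` standing in for the products collection) applies a provider change to every product that embeds it and returns the number of documents rewritten; in Firestore, each of those costs at least one read to find it plus one write.

```typescript
// Hypothetical shapes: each product embeds a copy of its provider.
interface Provider {
  name: string;
  city: string;
}

interface ProductDoc {
  name: string;
  provider: Provider & { id: string };
}

// Applies a provider change to every product holding a copy of that provider
// and returns how many product documents had to be rewritten.
function propagateProviderUpdate(
  products: Map<string, ProductDoc>,
  providerId: string,
  changes: Partial<Provider>,
): number {
  let writes = 0;
  products.forEach((doc, id) => {
    if (doc.provider.id !== providerId) return;
    // Replacing an existing key does not add entries, so iteration stays safe.
    products.set(id, { ...doc, provider: { ...doc.provider, ...changes } });
    writes += 1;
  });
  return writes;
}
```

The returned count is the quantity to watch: it grows with the number of products per provider, multiplied by how often providers change.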
You might wonder now, which technique is best. In a very general sense, the best way in which you can store references or duplicate data in a NoSQL database is completely dependent on your project's requirements.
So you should ask yourself some questions about the data you want to duplicate or simply keep as references:
Is the data static, or will it change over time?
If it does, do you need to update every duplicated instance of the data so they all stay in sync? This is what I mentioned earlier.
When it comes to Firestore, are you optimizing for performance or for cost?
If your duplicated data needs to change and stay in sync at the same time, you might have a hard time keeping all those duplicates up to date in the future. This may also mean spending a lot of money keeping all those documents fresh, as it will require a read and a write for each document for each change. In this case, holding only references is the winning option.
In this kind of approach, you write very little duplicated data (pretty much just the Provider ID). So that means that your code for writing this data is going to be quite simple and quite fast. But when reading the data, you will need to load the data from both collections, which means an extra database call. This typically isn't a big performance issue for reasonable numbers of documents, but definitely does require more code and more API calls.
If you need your queries to be very fast, you may prefer to duplicate more data so that the client only has to read one document per item queried, rather than multiple documents. But you may also be able to rely on local client caches to make this cheaper, depending on the data the client has to read.
In this approach, you duplicate all of a provider's data in each product document. This means that the code to write this data is more complex, and you're definitely storing more data: one extra provider object for each product document. You'll also need to figure out whether and how to keep it up to date in each document. On the other hand, reading a product document now gives you all the information about the provider in one read.
This is a common consideration in NoSQL databases: you'll often have to consider write performance and disk storage vs. reading performance and scalability.
For your choice of whether or not to duplicate some data, it is highly dependent on your data and its characteristics. You will have to think that through on a case-by-case basis.
So in the end, remember that both are valid approaches, and neither is inherently better than the other. It all depends on your use cases and how comfortable you are with this new technique of duplicating data. Data duplication is the key to faster reads, not just in Cloud Firestore or the Firebase Realtime Database but in general. Any time you add the same data to a different location, you're duplicating data in favor of faster read performance. Unfortunately, in return you get more complex updates and higher storage/memory usage. Note, however, that extra calls are not expensive in the Firebase Realtime Database, whereas in Firestore they are. How much data duplication versus how many extra database calls is optimal for you depends on your needs and your willingness to let go of the "Single Point of Definition" mindset, which is very subjective.
After finishing a few Firebase projects, I find that my reading code gets drastically simpler if I duplicate data. But of course, the writing code gets more complex at the same time. It's a trade-off between these two and your needs that determines the optimal solution for your app. Furthermore, to be even more precise you can also measure what is happening in your app using the existing tools and decide accordingly. I know that is not a concrete recommendation but that's software development. Everything is about measuring things.
Remember also that some database structures are easier to protect with security rules than others. So try to find a schema that can be easily secured using Cloud Firestore Security Rules.
Please also take a look at my answer from this post where I have explained more about collections, maps and arrays in Firestore.

Modeling document data and query performance

I have an aggregate data model (think of a Customer entity with Widgets that belong to it as a list of embedded entities).
When I search for customers (e.g. DocumentDBRepository.GetItemsAsync), that will hydrate the customer data model along with the widgets for each customer. For efficiency reasons, I don’t really need the customer search to consider the widgets.
Are there any strategies for this in document dbs (such as a “LiteCustomer” entity)? I suspect not as that is just the nature of the “schema-less” data I’ve told it to store in the first place, but interested to hear thoughts.
Is this simply a ‘non issue’?
First, a disclaimer: data modeling is hard. There are many nuances, and an SO question can never cover the entire business or everything left unsaid in both the Q and the A. There are no silver bullets. Regardless...
"LiteCustomer"
It's perfectly fine to have such a model in your client code. Your main Customer model may, and will, have many representations, most of them simple subsets of the full model. As with relational SQL, select only what you need: don't fetch data to the client that you don't need.
The SQL API provides handy SQL tools to compose the JSON of the returned documents for you.
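As a sketch of a "LiteCustomer" read model, assuming a Cosmos DB SQL API container where each customer document embeds a `widgets` array: the query string projects only the fields a search needs, and `toLite` is the client-side equivalent. Property names, the alias `c` and the `@term` parameter are assumptions about your schema.

```typescript
interface Widget {
  id: string;
  kind: string;
  config: Record<string, unknown>;
}

interface Customer {
  id: string;
  name: string;
  email: string;
  widgets: Widget[];
}

// Read model for search results: a subset of Customer with no widget payloads.
type LiteCustomer = Pick<Customer, "id" | "name"> & { widgetCount: number };

// Server-side projection so the widgets never leave the database
// (field names and the @term parameter are assumptions about your schema).
const liteCustomerQuery =
  "SELECT c.id, c.name, ARRAY_LENGTH(c.widgets) AS widgetCount FROM c WHERE CONTAINS(c.name, @term)";

// Client-side equivalent, handy in tests or when mapping an in-memory list.
function toLite(c: Customer): LiteCustomer {
  return { id: c.id, name: c.name, widgetCount: c.widgets.length };
}
```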
physical storage model may differ from domain model
Consider your usage scenarios. If many scenarios work with the customer without widgets (or vice versa), then consider storing widgets as separate document(s) in the storage model.
In DocDB, the question is often less about querying logic and more about what your application expects from modification logic. Indexed queries are fast, and every SQL query can easily do transformations (though cross-document joins are troublesome). For C(R)UD you have fewer options: it always operates on the full document. Overly large documents will end up with higher RU costs and more complex code.
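If widgets do end up as separate documents, the storage model and the domain model can be mapped explicitly. In this sketch, a `type` discriminator and a `customerId` field (both assumptions) let customer and widget documents live in the same container, while the application still works with the full `Customer` aggregate.

```typescript
interface Widget {
  id: string;
  kind: string;
}

interface Customer {
  id: string;
  name: string;
  widgets: Widget[];
}

// Storage model: one customer document plus one document per widget.
type StoredDoc =
  | { type: "customer"; id: string; name: string }
  | { type: "widget"; id: string; customerId: string; kind: string };

// Splits the domain aggregate into storage documents.
function toStorage(c: Customer): StoredDoc[] {
  const docs: StoredDoc[] = [{ type: "customer", id: c.id, name: c.name }];
  for (const w of c.widgets) {
    docs.push({ type: "widget", id: w.id, customerId: c.id, kind: w.kind });
  }
  return docs;
}

// Rebuilds the aggregate; a customer-only search can simply skip widget documents.
function fromStorage(docs: StoredDoc[], customerId: string): Customer | undefined {
  let customer: Customer | undefined;
  const widgets: Widget[] = [];
  for (const d of docs) {
    if (d.type === "customer" && d.id === customerId) customer = { id: d.id, name: d.name, widgets };
    if (d.type === "widget" && d.customerId === customerId) widgets.push({ id: d.id, kind: d.kind });
  }
  return customer;
}
```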
Questions to consider:
How often customer changes without widget count/details changing?
How often widgets change without customer changing?
Do widgets on customer change independently or always as a set?
When do you need transactional updates on customer+widget changes?
What would the queries look like? Can they be indexed?
Test.
True, changing model later is cumbersome in DocDB, but don't try to fix something before you know it's broken. If you are not sure you have an issue or not, then most likely fixing the maybe-issue is costlier than not fixing it.
If in doubt, generate loads of data and test it out.
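A quick way to "generate loads of data" is a small generator plus a size report, since larger documents tend to cost more RUs. This sketch uses arbitrary counts and field sizes, and measures JSON length in characters, which equals the byte count here only because the generated data is ASCII.

```typescript
interface Widget {
  id: string;
  kind: string;
  notes: string;
}

interface Customer {
  id: string;
  name: string;
  widgets: Widget[];
}

// Generates synthetic customers; counts and field sizes are arbitrary.
function generateCustomers(count: number, widgetsPerCustomer: number): Customer[] {
  const customers: Customer[] = [];
  for (let i = 0; i < count; i++) {
    const widgets: Widget[] = [];
    for (let j = 0; j < widgetsPerCustomer; j++) {
      widgets.push({ id: `w-${i}-${j}`, kind: j % 2 === 0 ? "basic" : "pro", notes: "x".repeat(50) });
    }
    customers.push({ id: `c-${i}`, name: `Customer ${i}`, widgets });
  }
  return customers;
}

// Length of the JSON a document would be stored as (equals bytes for ASCII-only data).
function jsonLength(value: unknown): number {
  return JSON.stringify(value).length;
}

// Compares the largest embedded document with the largest customer-only document.
function sizeReport(customers: Customer[]): { maxEmbedded: number; maxCustomerOnly: number } {
  const max = (xs: number[]) => xs.reduce((a, b) => Math.max(a, b), 0);
  return {
    maxEmbedded: max(customers.map(jsonLength)),
    maxCustomerOnly: max(customers.map((c) => jsonLength({ id: c.id, name: c.name }))),
  };
}
```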

Firebase data structure and url to use

I'm really new to Firebase and want to try out a simple mixed-client app (Android, JS) on it. I have a users table and a tasks table. The very first question that comes to mind is how to store them (and thus what the URLs should be). For example, for the tasks table, should I use:
/tasks/{userid}/task1, /tasks/{userid}/task2, ...
Or
/{userid}/tasks/task1, /{userid}/tasks/task2, ...
The next question, based on the answer to the first one: why use one version rather than the other?
In my opinion, the first version is good because domains are separated.
The second approach is good because data is stored per-user which may make some of the operations easier.
Any ideas/suggestions?
Update: for the current case, let's say there are the following features:
show list of tasks for each user
add new task to the list
edit/delete a task by user.
Simple operations.
This answer might come in late, but here's how I feel about the question after a year's experience with Firebase.
For your very first question, it totally depends on which data your application will mostly read, and how and in which order (kind of like sorting) you expect to read it.
Your first proposed data structure, that is "/tasks/{userid}/task1", "/tasks/{userid}/task2"..., is good if the application will often read the tasks per user, with the added advantage of being able to sort the data by any task "attribute", if I may call it that.
Say each task has a priority attribute; then:
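```javascript
// Get all of the signed-in user's tasks with a priority of 25
// (assumes a user is currently signed in).
const uid = firebase.auth().currentUser.uid;
const userTasksRef = firebase.database().ref(`tasks/${uid}`);
userTasksRef.orderByChild("priority").equalTo(25).on("value", (snapshot) => {
  snapshot.forEach((taskSnapshot) => {
    // do something important with taskSnapshot.val() here.
  });
});
```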
2. I'd strongly advise against the second approach. Generally, most if not all of the data associated with a user will be stored under the "/{userid}/" node, and because of how Firebase reads data, whenever you need a value at that path level you will download it together with everything else under that user's node (tasks and any other data included). I wouldn't want that behavior in my database. This approach does still let you store the tasks per user, or you can make multiple REST requests and collect the data you need one item at a time. If you run into this situation, consider fanning out the data structure. It remains a perfectly valid structure if your application never needs just a single value at the first level of the path, and instead always needs the whole block of data at that level together with everything at the deeper (2nd, 3rd, ...) levels.
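Fanning out, in the Realtime Database sense, usually means writing the same logical change to several top-level paths so that each read only downloads what it needs. This sketch builds such a multi-path update for the first structure; the `userTaskTitles` index node and the helper name are hypothetical additions. Writing `null` to a path deletes it, so the same helper covers deletes.

```typescript
interface Task {
  title: string;
  priority: number;
}

// Builds a multi-path update: the full task under tasks/{uid}/{taskId}, plus a
// lightweight title index under userTaskTitles/{uid}/{taskId} (hypothetical node)
// that a task list can read without downloading every task body.
// Passing null for the task removes both entries.
function buildTaskFanOut(uid: string, taskId: string, task: Task | null): Record<string, unknown> {
  return {
    [`tasks/${uid}/${taskId}`]: task,
    [`userTaskTitles/${uid}/${taskId}`]: task ? task.title : null,
  };
}
```

With the namespaced web SDK used above, the result would be passed to `firebase.database().ref().update(...)`, which applies all the paths in one atomic update.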
As for the use cases you've described, if the database structure you've given is your entire database structure, I'd say it isn't enough to cover them.
I suggest reading the docs here; their documentation is great and exhaustive.
If I had to pick, the first approach is the better way to model this use case in NoSQL, and more specifically in Firebase's NoSQL database.

MVC3 + MongoDB Architecture: Store models directly to database?

I am currently developing an MVC3 application using MongoDB, and I'm quite unsure how I should build the architecture. For example, my app has a page for managing a registered user's profile (name, email, and some attributes exposed in enum comboboxes), so I have a ManageProfileModel.cs with all the properties to manage. What's the proper way to use this data with MongoDB? Should I store the ManageProfileModel data in MongoDB, or do I have to add an additional layer of domain classes like User.cs, Invoice.cs, ... and store those objects in MongoDB (with these objects being used in the models I create)?
I am asking because a model for managing a user profile does not necessarily resemble a user (domain) object. My first approach is to store my (view)models directly in MongoDB. I am not sure whether it will be that easy to get at my (consistent) data later on.
Thanks!
I would store the models directly in Mongo as-is for most of your data. I'm sure you know this already, but Mongo focuses on denormalization, and so it's different than traditional relational databases that want you to normalize your data.
So for a profile, you might have a user, a set of invoices, a set of addresses etc. As you decide your data models, I would suggest the following:
Consider your UI. If you need user + profile + invoices, go ahead and make a document like that. Makes your life a lot easier.
Don't be afraid to have repeated information stored.
You will constantly be wondering if you should embed a document (adding addresses to user) or link to a document (put a list of references in an array referencing invoices). The rule I've heard that I think is good: If the data is constantly changing, make a link/reference. If it's immutable or slowly changing, embed it.
If your document will grow a lot over time, consider breaking it up. Mongo has to move your document in memory if it grows too big.
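As a sketch of the embed-versus-link decision, here are hypothetical document shapes where addresses are embedded and invoices are linked by ID, matching the two examples in the list above; the helper assembles the "user + invoices" view a profile page might need. Field names are assumptions, and TypeScript is used only for illustration (the question itself is C#).

```typescript
interface Address {
  street: string;
  city: string;
}

// Addresses are embedded; invoices are linked by ID to a separate collection.
interface UserDoc {
  _id: string;
  name: string;
  email: string;
  addresses: Address[];
  invoiceIds: string[];
}

interface InvoiceDoc {
  _id: string;
  userId: string;
  total: number;
}

// Assembles the "user + invoices" view from the two collections,
// skipping any referenced invoice that no longer exists.
function userWithInvoices(user: UserDoc, invoices: Map<string, InvoiceDoc>): UserDoc & { invoices: InvoiceDoc[] } {
  const resolved: InvoiceDoc[] = [];
  for (const id of user.invoiceIds) {
    const inv = invoices.get(id);
    if (inv) resolved.push(inv);
  }
  return { ...user, invoices: resolved };
}
```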

How to test this scenario?

I have a desktop application made in Flex using PureMVC multi-core, with SQLite as the back end. Now I want to write integration tests. The proxy layer makes database calls using the async methods of SQLConnection, and the result handler sends a notification. I want to test that the expected values were modified in the tables. Any ideas how this can be done?
If you're asking for ways to do unit testing in Flex, I suggest checking out FlexUnit as it is the most commonly used unit testing framework for Flex.
From a conceptual standpoint, you basically need to write a method to retrieve data from the database, either as part of your unit tests or by calling your actual encapsulated application classes. Many people use Data Access Objects and Data Gateways for this purpose.
I suggest running before-and-after checks. Retrieve data from the database and check its value. Then run your test. Then retrieve data from the database again to check the updated values. What your database call entails depends on the type of test: you may want to check the number of records in a table or the values of a specific record.
Flex's asynchronous nature makes this a little tricky.
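The before/after approach can be sketched independently of Flex. Here, `AsyncOp` is a hypothetical stand-in for a proxy call whose result handler fires later: the helper snapshots a value, runs the operation, and only re-reads and compares inside the completion callback, which is where the asynchronous nature matters. TypeScript is used for illustration; the same structure should carry over to ActionScript with FlexUnit's async support.

```typescript
// Hypothetical callback-style operation, mirroring an async SQLConnection call
// whose result handler reports completion (or an error) later.
type AsyncOp = (done: (error?: Error) => void) => void;

// Snapshot a value, run the operation, then re-read and compare only after
// the operation reports completion.
function verifyBeforeAfter<T>(
  read: () => T,
  op: AsyncOp,
  expectChange: (before: T, after: T) => boolean,
  report: (passed: boolean, reason?: string) => void,
): void {
  const before = read();
  op((error) => {
    if (error) {
      report(false, error.message);
      return;
    }
    const after = read();
    const ok = expectChange(before, after);
    report(ok, ok ? undefined : `before=${String(before)}, after=${String(after)}`);
  });
}
```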
I never thought testing database values from a UI was an area where unit testing shines, but I understand why it is necessary.
