I am currently developing a mvc3 application using mongodb. I am quite unsure on how i shall build the architecture. E.g. my app has a page used for managing the user profile for a registered user (like name, email, some attributes exposed inside enum-comboboxes). Hence i have a ManageProfileModel.cs with all properties to manage. What's the proper way to use the data with mongodb? Shall i store the ManageProfileModel data inside mongodb or do i have to add an additional layer containing domain classes like User.cs, Invoice.cs, ... and store these objects inside mongodb (these objects are being used in the models created)?
I am asking because a model for managing a user profile does not necessarily resemble a user (domain) object. My first approach is to store directly my (view)models inside mongodb. I am not sure if its that easy to get my (consistent) data at a later point.
Thanks!
I would store the models directly in Mongo as-is for most of your data. I'm sure you know this already, but Mongo focuses on denormalization, and so it's different than traditional relational databases that want you to normalize your data.
So for a profile, you might have a user, a set of invoices, a set of addresses etc. As you decide your data models, I would suggest the following:
Consider your UI. If you need user + profile + invoices, go ahead and make a document like that. Makes your life a lot easier.
Don't be afraid to have repeated information stored.
You will constantly be wondering if you should embed a document (adding addresses to user) or link to a document (put a list of references in an array referencing invoices). The rule I've heard that I think is good: If the data is constantly changing, make a link/reference. If it's immutable or slowly changing, embed it.
If your document will grow a lot over time, considering breaking it up. Mongo has to move your document in memory if it grows too big.
Related
This is a question about programming style and patterns when it comes to writing tests for complex systems written in Firebase/Firestore. I'm writing a web app using Firebase, Typescript, Angular, Firestore, etc...
Objective:
I have written basic security rules tests to test my users collection. I'll also be testing the function that writes the document, and e2e testing that the document is written when a new user signs up. So far so good.
The tests are clean so far - I manually define a few user objects, write it to the database before the tests with beforeEach() and run the tests. The tests depend on me knowing what data I wrote in the first place - what were the document ids, what were the field values etc... and then I check for certain operations to pass, certain operations to fail, depending on the custom claims provided. Similarly, with functions tests and e2e tests, I'll be checking if correct data was written, generated, etc...
The next step would be to test the chat functionality. Here's where I run into code duplication issues.
Issue:
So let's say I want to test the chat functionality or the transaction functionality. For these modules to be tested, the database needs to have fake user data, transaction data, test data etc... and, furthermore, within the tests, I need to have access to the fields, document Ids, etc.. that were written to the database, so I can access the documents for tests, etc...
For example, whether or not a chat message can be written to firestore depends on whether certain fields exist on the user document.
This would require me to manually define all the objects I plan to write to the database in the same file as the tests themselves, so I have access to what I wrote. As I test more complex and dependent modules of the system, since each test for each module is in a separate file, I either have to manually write out each object, or require it from another test file and write it. Each document has its payload and its Id that I need to keep track of, and even the fake user token objects I have to pass to firestore (or actual auth user records I have to import for online testing). This would mean a whole bunch of boilerplate code and duplication simply writing objects as such in all of my files:
const fakeUserPayload: User = {
handle: 'username',
email: faker.internet.email(),
...etc
}
// And so on and on for all test users, chat docs, transaction docs, etc...
Potential Solution: so there are a few potential solutions I came up with, but none seem to solve the problem.
For example, I thought I would write a module that simply populates the firestore database at the start of each test. The module would have a userPayloadFactory and other loops (using faker module) to automatically populate the Db with fake data.
Problem: If I did this, I wouldn't have access to the document Ids and field data in my tests, since it's automatically generated. I thought, maybe I could populate firestore with fake data, and then use an administrative db connection to read the fake user documents and their Ids back into the tests, and then use this data to conduct the tests. For example, I would find the user id and then generate a chat document and test for correct data / success. Except it seems incredibly messy to write data in one module and then read it back in another, especially since most tests require a specific document to be written to test for certain cases/scenarios. Which makes auto-populated mock data useless - so we're back to square one, where I have to manually define and write out a large number of fake objects in order to test rules, functions and functionality.
Potential Solution: I could maybe keep auth and firestore data in a JSON backup file, (so they remain static) and import them into the database with a shell command before each test suite. However, this has its own issues as it's not easy to dynamically generate new test cases or edge cases, and also difficult to continually re-export and update the JSON backup files as the project grows.
What is a better way to structure and write my tests so that I can automatically generate the documents and payloads I need, while having control and access over what gets written?
I'm hoping there's some kind of factory or pattern that can make this easier, scalable and more consistent and robust.
You're asking a really huge question, writing tests for big environments is a complex task and even more when it's coupled to database state. I will try to answer to the best of my knowledge so take my words with a grain of salt.
I believe you're dealing with two similar yet different concerns, automatic creation of mock data and edge-cases that require very specific document setup. These two are tightly coupled within the tests itself as you need both kinds of data to run them, however the requirements differ one another and therefore their creation should take that in account. Let's talk about your potential solutions from that perspective.
The JSON backup provides a static and consistent dataset that allows to repeat the tests over time being sure the environment hasn't changed and it's a good candidate to address the edge-case problem. It's downsides are that is hardly maintainable because any object modification to accommodate changes in TestA may break the expectations of TestB that also relies on it, it's almost assured you will loss the track of these nuances at some point; You can add new objects to accommodate code and test changes but this could lead to a combinatoric explosion of the objects you need to take care of as your project grows. Finally JSON files are not the way to go if you are going to require dynamically generated data.
The factory method is a great option to deal with the creation of arbitrary mock data since there are less restrictions placed on it, so writing a generator seems a good idea. You disliked this based on the fact that you need to know your data while running the tests but I think that's solvable. Your test might load the Factory module, then create the data and store it in-memory/HDD in addition to commit this changes to Firestore, there's no need to read the data from the database.
Your other other concern were the corner case documents which is trickier because you need very specifically shaped data. You might handcraft the documents yourself but then you got a poorly scalable solution. The alternative is trying to look for constraints/invariants in the shape of edge-case documents that you can abstract into factory methods. The worst scenario here is that when some edge-cases do not share any similarity with the rest you will need to write a whole method for each of these. I won't consider this a downside as it improves the modularity and maintainability of the Factory .
Overall, I would stick with the Factory pattern because it already offers techniques to follow the DRY principle by the means of isolating the creation of distinct objects, decoupling data creation from test execution and facilities to avoid disruptive breaks as the tests evolve with the project.
With that being said a little research got me to this page about the Builder Pattern that you may find interesting. Also this thread about code duplication in tests might be of interest. And finally just to comment out that Firebase has some testing functionality that can be found here.
Hope this helps.
I have seen videos and read the documentation of Cloud firestore, from Google Firebase service, but I can't figure this out coming from realtime database.
I have this web app in mind in which I want to store my providers from different category of products. I want perform a search query through all my products to find what providers I have for such product, and eventually access that provider info.
I am planning to use this structure for this purpose:
Providers ( Collection )
Provider 1 ( Document )
Name
City
Categories
Provider 2
Name
City
Products ( Collection )
Product 1 ( Document )
Name
Description
Category
Provider ID
Product 2
Name
Description
Category
Provider ID
So my question is, is this approach the right way to access the provider info once I get the product I want?
I know this is possible in the realtime database, using the provider ID I could search for that provider in the providers section, but with Firestore I am not sure if its possible or if this is right approach.
What is the correct way to structure this kind of data in Firestore?
You need to know that there is no "perfect", "the best" or "the correct" solution for structuring a Cloud Firestore database. The best and correct solution is the solution that fits your needs and makes your job easier. Bear also in mind that there is also no single "correct data structure" in the world of NoSQL databases. All data is modeled to allow the use-cases that your app requires. This means that what works for one app, may be insufficient for another app. So there is not a correct solution for everyone. An effective structure for a NoSQL type database is entirely dependent on how you intend to query it.
The way you are structuring your data looks good to me. In general, there are two ways in which you can achieve the same thing. The first one would be to keep a reference of the provider in the product object (as you already do) or to copy the entire provider object within the product document. This last technique is called denormalization and is a quite common practice when it comes to Firebase. So we often duplicate data in NoSQL databases, to suit queries that may not be possible otherwise. For a better understanding, I recommend you see this video, Denormalization is normal with the Firebase Database. It's for Firebase Realtime Database but the same principles apply to Cloud Firestore.
Also, when you are duplicating data, there is one thing that needs to keep in mind. In the same way, you are adding data, you need to maintain it. In other words, if you want to update/delete a provider object, you need to do it in every place that it exists.
You might wonder now, which technique is best. In a very general sense, the best way in which you can store references or duplicate data in a NoSQL database is completely dependent on your project's requirements.
So you should ask yourself some questions about the data you want to duplicate or simply keep it as references:
Is the static or will it change over time?
If it does, do you need to update every duplicated instance of the data so they all stay in sync? This is what I have also mentioned earlier.
When it comes to Firestore, are you optimizing for performance or cost?
If your duplicated data needs to change and stay in sync in the same time, then you might have a hard time in the future keeping all those duplicates up to date. This will also might imply you spend a lot of money keeping all those documents fresh, as it will require a read and write for each document for each change. In this case, holding only references will be the winning variant.
In this kind of approach, you write very little duplicated data (pretty much just the Provider ID). So that means that your code for writing this data is going to be quite simple and quite fast. But when reading the data, you will need to load the data from both collections, which means an extra database call. This typically isn't a big performance issue for reasonable numbers of documents, but definitely does require more code and more API calls.
If you need your queries to be very fast, you may want to prefer to duplicate more data so that the client only has to read one document per item queried, rather than multiple documents. But you may also be able to depend on local client caches makes this cheaper, depending on the data the client has to read.
In this approach, you duplicate all data for a provider for each product document. This means that the code to write this data is more complex, and you're definitely storing more data, one more provider object for each product document. And you'll need to figure out if and how to keep up to date on each document. But on the other hand, reading a product document now gives you all information about the provider document in one read.
This is a common consideration in NoSQL databases: you'll often have to consider write performance and disk storage vs. reading performance and scalability.
For your choice of whether or not to duplicate some data, it is highly dependent on your data and its characteristics. You will have to think that through on a case-by-case basis.
So in the end, remember that both are valid approaches, and neither of them is pertinently better than the other. It all depends on what your use-cases are and how comfortable you are with this new technique of duplicating data. Data duplication is the key to faster reads, not just in Cloud Firestore or Firebase Realtime Database but in general. Any time you add the same data to a different location, you're duplicating data in favor of faster read performance. Unfortunately in return, you have a more complex update and higher storage/memory usage. But you need to note that extra calls in Firebase real-time database, are not expensive, in Firestore are. How much duplication data versus extra database calls is optimal for you, depends on your needs and your willingness to let go of the "Single Point of Definition mindset", which can be called very subjective.
After finishing a few Firebase projects, I find that my reading code gets drastically simpler if I duplicate data. But of course, the writing code gets more complex at the same time. It's a trade-off between these two and your needs that determines the optimal solution for your app. Furthermore, to be even more precise you can also measure what is happening in your app using the existing tools and decide accordingly. I know that is not a concrete recommendation but that's software development. Everything is about measuring things.
Remember also, that some database structures are easier to be protected with some security rules. So try to find a schema that can be easily secured using Cloud Firestore Security Rules.
Please also take a look at my answer from this post where I have explained more about collections, maps and arrays in Firestore.
If I have User and Profile objects. What is the best way to structure my collections in firestore given that the follow scenarios can take place?
Users have a single Profile
Users can update their Profile
Users can save other users' profiles
Users can deleted their saved profiles
The same profile can't be saved twice
If Users and Profiles are separate collections, what is the best way to store saved profiles?
One way that came to mind was that each user has a sub collection called SavedProfiles. The id of each document is the id of the profile. Each saved Profile only contains a reference to the user who's profile it belongs to.
The other option was to do the same thing but store the whole profile of each saved profile.
The benefits of the first approach is that when a user updates their own profile there's no need to update any of the their profiles that have already been saved as it's only the reference that is stored. However, attempting to read a user's saved profiles may require two read operations (which will be quite often), one to get all the references then querying for all the profiles with those reference (if that's even possible???). This seems quite expensive.
The second approach seems like the right way to go as it solves the problem of reading all the saved profiles. But updating multiple saved profiles seems like an issue as each user's saved profiles may be unique. I understand that it's possible to do a batch update but will it be necessary to query each user in the db for their saved profiles and check if that updated profile exists, if so update it? I'm not too sure which way to go. I'm not super used to NoSQL data structures and it already seems like I've done something wrong since I've used a sub collection since it's advised to keep everything as denormalized as possible so please let me know if the structure to my whole db is wrong too, which is also quite possible...
Please provide some examples of how to get and update profiles/saved profiles.
Thank you.
Welcome to the conundrum that is designing a NoSQL database. There is no right or wrong answer, here. It's whatever works best for you.
As you have identified, querying will be much easier with your second option. You can easily create a Cloud Function which updates any profiles which have been modified.
Your first option will require multiple gets to the database. It really depends how you plan to scale this and how quick you want your app to run.
Option 1 will be a slow user experience, while all of the data is fetched. Option 2 will be a much faster user experience, but will requre your Cloud Function to update every saved profile. However, this is a background task so wouldn't matter if it takes a few seconds.
This is mobile app which can have different kind of users. I'm using realm only for the offline storage. Say I have two users A and B and a have a List Class. This class wont ever be shared, so different data for each user. How would i go in designing the schema? Considering versioning and migration.
A. Add a primary key for the List and assign it differently to user A and B.
B. Use two different realms
There is no one good way of defining your Realm schema and the solution to choose completely depends on the exact scenario.
If you want your users data to be completely independent of each other and you will never need to use a single query to retrieve both users data or to access some common data, then using separate Realm instances for each use seems like a good approach. It provides complete separation between your users data.
However, if your users might have some shared data or if you might end up making some statistics about all of your users even though their data is independent, using a single Realm instance is the way to go. In this case you should just create a one-to-many relationship between each of your users and whatever objects you want to store in your lists like this:
class User:Object {
let stuff = List<Stuff>()
}
For the sake of argument assume that I have a webform that allows a user to edit order details. User can perform the following functions:
Change shipping/payment details (all simple text/dropdowns)
Add/Remove/Edit products in the order - this is done with a grid
Add/Remove attachments
Products and attachments are stored in separate DB tables with foreign key to the order.
Entity Framework (4.0) is used as ORM.
I want to allow the users to make whatever changes they want to the order and only when they hit 'Save' do I want to commit the changes to the database. This is not a problem with textboxes/checkboxes etc. as I can just rely on ViewState to get the required information. However the grid is presenting a much larger problem for me as I can't figure out a nice and easy way to persist the changes the user made without committing the changes to the database. Storing the Order object tree in Session/ViewState is not really an option I'd like to go with as the objects could get very large.
So the question is - how can I go about preserving the changes the user made until ready to 'Save'.
Quick note - I have searched SO to try to find a solution, however all I found were suggestions to use Session and/or ViewState - both of which I would rather not use due to potential size of my object trees
If you have control over the schema of the database and the other applications that utilize order data, you could add a flag or status column to the orders table that differentiates between temporary and finalized orders. Then, you can simply store your intermediate changes to the database. There are other benefits as well; for example, a user that had a browser crash could return to the application and be able to resume the order process.
I think sticking to the database for storing data is the only reliable way to persist data, even temporary data. Using session state, control state, cookies, temporary files, etc., can introduce a lot of things that can go wrong, especially if your application resides in a web farm.
If using the Session is not your preferred solution, which is probably wise, the best possible solution would be to create your own temporary database tables (or as others have mentioned, add a temporary flag to your existing database tables) and persist the data there, storing a single identifier in the Session (or in a cookie) for later retrieval.
First, you may want to segregate your specific state management implementation into it's own class so that you don't have to replicate it throughout your systems.
Second, you may want to consider a hybrid approach - use session state (or cache) for a short time to avoid unnecessary trips to a DB or other external store. After some amount of inactivity, write the cached state out to disk or DB. The simplest way to do this, is to serialize your objects to text (using either serialization or a library like proto-buffers). This helps allow you to avoid creating redundant or duplicate data structure to capture the in-progress data relationally. If you don't need to query the content of this data - it's a reasonable approach.
As an aside, in the database world, the problem you describe is called a long running transaction. You essentially want to avoid making changes to the data until you reach a user-defined commit point. There are techniques you can use in the database layer, like hypothetical views and instead-of triggers to encapsulate the behavior that you aren't actually committing the change. The data is in the DB (in the real tables), but is only visible to the user operating on it. This is probably a more complicated implementation than you may be willing to undertake, and requires intrusive changes to your persistence layer and data model - but allows the application to be ignorant of the issue.
Have you considered storing the information in a JavaScript object and then sending that information to your server once the user hits save?
Use domain events to capture the users actions and then replay those actions over the snapshot of the order model ( effectively the current state of the order before the user started changing it).
Store each change as a series of events e.g. UserChangedShippingAddress, UserAlteredLineItem, UserDeletedLineItem, UserAddedLineItem.
These events can be saved after each postback and only need a link to the related order. Rebuilding the current state of the order is then as simple as replaying the events over the currently stored order objects.
When the user clicks save, you can replay the events and persist the updated order model to the database.
You are using the database - no session or viewstate is required therefore you can significantly reduce page-weight and server memory load at the expense of some page performance ( if you choose to rebuild the model on each postback ).
Maintenance is incredibly simple as due to the ease with which you can implement domain object, automated testing is easily used to ensure the system behaves as you expect it to (while also documenting your intentions for other developers).
Because you are leveraging the database, the solution scales well across multiple web servers.
Using this approach does not require any alterations to your existing domain model, therefore the impact on existing code is minimal. Biggest downside is getting your head around the concept of domain events and how they are used and abused =)
This is effectively the same approach as described by Freddy Rios, with a little more detail about how and some nice keyword for you to search with =)
http://jasondentler.com/blog/2009/11/simple-domain-events/ and http://www.udidahan.com/2009/06/14/domain-events-salvation/ are some good background reading about domain events. You may also want to read up on event sourcing as this is essentially what you would be doing ( snapshot object, record events, replay events, snapshot object again).
how about serializing your Domain object (contents of your grid/shopping cart) to JSON and storing it in a hidden variable ? Scottgu has a nice article on how to serialize objects to JSON. Scalable across a server farm and guess it would not add much payload to your page. May be you can write your own JSON serializer to do a "compact serialization" (you would not need product name,product ID, SKU id, etc, may be you can just "serialize" productID and quantity)
Have you considered using a User Profile? .Net comes with SqlProfileProvider right out of the box. This would allow you to, for each user, grab their profile and save the temporary data as a variable off in the profile. Unfortunately, I think this does require your "Order" to be serializable, but I believe all of the options except Session thus far would require the same.
The advantage of this is it would persist through crashes, sessions, server down time, etc and it's fairly easy to set up. Here's a site that runs through an example. Once you set it up, you may also find it useful for storing other user information such as preferences, favorites, watched items, etc.
You should be able to create a temp file and serialize the object to that, then save only the temp file name to the viewstate. Once they successfully save the record back to the database then you could remove the temp file.
Single server: serialize to the filesystem. This also allows you to let the user resume later.
Multiple server: serialize it but store the serialized value in the db.
This is something that's for that specific user, so when you persist it to the db you don't really need all the relational stuff for it.
Alternatively, if the set of data is v. large and the amount of changes is usually small, you can store the history of changes done by the user instead. With this you can also show the change history + support undo.
2 approaches - create a complex AJAX application that stores everything on the client and only submits the entire package of changes to the server. I did this once a few years ago with moderate success. The applicaiton is not something I would want to maintain though. You have a hard time syncing your client code with your server code and passing fields that are added/deleted/changed is nightmarish.
2nd approach is to store changes in the data base in a temp table or "pending" mode. Advantage is your code is more maintainable. Disadvantage is you have to have a way to clean up abandonded changes due to session timeout, power failures, other crashes. I would take this approach for any new development. You can have separate tables for "pending" and "committed" changes that opens up a whole new level of features you can add. What if? What changed? etc.
I would go for viewstate, regardless of what you've said before. If you only store the stuff you need, like { id: XX, numberOfProducts: 3 }, and ditch every item that is not selected by the user at this point; the viewstate size will hardly be an issue as long as you aren't storing the whole object tree.
When storing attachments, put them in a temporary storing location, and reference the filename in your viewstate. You can have a scheduled task that cleans the temp folder for every file that was last saved over 1 day ago or something.
This is basically the approach we use for storing information when users are adding floorplan information and attachments in our backend.
Are the end-users internal or external clients? If your clients are internal users, it may be worthwhile to look at an alternate set of technologies. Instead of webforms, consider using a platform like Silverlight and implementing a rich GUI there.
You could then store complex business objects within the applet, provide persistant "in progress" edit tracking across multiple sessions via offline storage and easily integrate with back-end services that providing saving / processing of the finalised order. All whilst maintaining access via the web (albeit closing out most *nix clients).
Alternatives include Adobe Flex or AJAX, depending on resources and needs.
How large do you consider large? If you are talking sessions-state (so it doesn't go back/fore to the actual user, like view-state) then state is often a pretty good option. Everything except the in-process state provider uses serialization, but you can influence how it is serialized. For example, I would tend to create a local model that represents just the state I care about (plus any id/rowversion information) for that operation (rather than the full domain entities, which may have extra overhead).
To reduce the serialization overhead further, I would consider using something like protobuf-net; this can be used as the implementation for ISerializable, allowing very light-weight serialized objects (generally much smaller than BinaryFormatter, XmlSerializer, etc), that are cheap to reconstruct at page requests.
When the page is finally saved, I would update my domain entities from the local model and submit the changes.
For info, to use a protobuf-net attributed object with the state serializers (typically BinaryFormatter), you can use:
// a simple, sessions-state friendly light-weight UI model object
[ProtoContract]
public class MyType {
[ProtoMember(1)]
public int Id {get;set;}
[ProtoMember(2)]
public string Name {get;set;}
[ProtoMember(3)]
public double Value {get;set;}
// etc
void ISerializable.GetObjectData(
SerializationInfo info,StreamingContext context)
{
Serializer.Serialize(info, this);
}
public MyType() {} // default constructor
protected MyType(SerializationInfo info, StreamingContext context)
{
Serializer.Merge(info, this);
}
}