another day, another problem :)
Since I woke up today I'm wondering about numbers. My experience does not allow me to answear my questions:
What is the small collection? what is the medium size collection? what is the huge collection?
I mean where're the lines? ex. between 0 to XX elements is small collection and so on... I read many articles, blog posts about using collections and almost everywhere we can read "this solution is good for small collections" etc. and I ask: What does that mean? I know there's no hard lines but I think we can name it more or less :)
There are no hard lines, because it depends on many different things.
In some situations the size of the objects stored in the collection matter, in others it doesn't.
In some situations the CPU/memory is fast enough that "small" can be as large as a million, in others a few dozens is already "mid-sized".
Personally and from my experience in my specific field I'd guesstimate these rules-of-thumb:
a small collections has at most 10-20 elements
a few hundered elements make a medium-sized collection
a collection with more than thousand elements is large-ish
Again: this is very subjective and situational.
Often "this works for small collections" is a synonym for "this has a non-linear runtime over the size of the collection" or (more specifically) "this is O(n^2)".
Related
Assuming a user has a thousand friends, but when calling a friend list on a specific screen, bringing in a thousand documents is expensive and time consuming. Even if pagination is performed, there will be a speed delay due to additional requests.
And according to the official documentation, you can put 1MB in documents, that is, about 1 million characters. However, what I worry about when using Arrays is that there will be situations where things get complicated in many ways.
Are there any exact standards?
You seem to be fully aware of the limitations of Firestore in this case. There is no new information that will help you here.
If you have a list of things to store in Firestore, and that list could exceed the max size of a document (1MB, as you correctly state), then you are going to have a problem. On the other hand, if you put all of those items in other documents, you're going to have to pay for all of those reads. That's the tradeoff -- you will have to decide which problem is worse. Typically, people choose to use separately document so they are limited by the size of a document. But that's your call to make.
You could try to shard that list across multiple documents somehow, but that will add much complexity to your code. Again, your call to make.
I have recently started exploring graph databases and Neo4J, and would like to work with my own data. At the moment I've hit some confusion. I've created an example image to illustrate my issue. In terms of efficiency, I'm wondering which option is better (and I want to get it right now in early days before I start handling larger amounts).
Option A: Using only the blue relationships, I can work out whether things are related to, or come under, the Ancient group. This process will be done many many times, however it is unlikely to be more than ~6 generations.
Option B: I implement the red relationships, so that it is much faster to work out if young structures belong to the Ancient group.
I'm trying not to use Labels in this scenario, as I'm trying to use labels for a specific purpose to simplify my life (linking structures across seperate networks), and I'm not sure if I should have a label to represent a node that already exists.
In summary, I'm wondering whether adding a whole new bunch of relationships, whilst taking more space, is worth it, or whether traversing to find all relatives is such a simple/inexpensive task that it isn't worth doing so. Or alternatively, both options are viable and this isn't a real issue at all. Thanks for reading.
I'd go with Option A. One of the strengths of Neo4j is that it traverses relationships very efficiently and quickly, and so, there is no need to materialise relationships (sometimes, relationships are materialised in complex and/or extremely large graphs, but this is not your case).
Not sure why you don't want to use labels? Labels serve to group nodes into sets of the same type, and are also index backed- this makes it much faster to find the starting point of your query (index lookup over full database scan).
First of, I know how Firestore works and have spent a lot of time, evaluating different approaches for a good structure. Still I am considering following scenario:
There is a database of known recipes. Users can add recipes, but they have to be confirmed to be real recipes and not just some variations. So every user can choose receipes from the user-generated list of recipes to state, that they know how to cook them (or add new ones).
Now I want users to share their list of receipes with others, but this is where I am not sure how this can be best accomplished using Firestore. The trick is, that I want to show all the recipes at once, and don't want to paginate them.
I am currently evaluating two possibilities:
Subcollections
Whenever a user shares his list, the user looking at said list will have to load the entire list of the recipes which can result in a high amount of document reads (I suppose realistically ~50, in very rare cases maybe 1000).
Pros:
More natural structure
Easier to maintain (e.g. deleting a recipe, checking if a specific one exists)
Easier to add fields (e.g. timeOfCreation, comment, personalRating, ...)
Cons:
Can result in a high amount of reads on the long run
Arrays
I could save every known recipe (the id and an imageURL) inside the user's document (or as a single subdocument "KnownRecipes") within an array. This array could be in form of
recipesKnown: [{rid: 293ndwa, imageURL: image1.com, timeAdded: 8371201332},
{rid: 9012831, imageURL: image1.com, timeAdded: 8371201871},
{rid: jd812da, imageURL: image1.com, timeAdded: 8371201118},
...
]
Pros:
I only need one document read whenever someone wants to see another user's list
Reading a user's list is probably faster
Cons:
It's hard to update a specific recipe (e.g. someone wants to change the imageURL: I need to change the list locally and send the entire document as an update to the server - since I cannot just change a single element in the array)
When a user decides to have around 1000 recipes (this will maybe never happen, but it could), the 1MiB limit of the Firestore limit could be reached. A possible workaround would be to create a seperate document and split those two arrays into these two documents.
For me, the idea with Subcollections seems to be the more "clean" solution to this problem, but maybe I am missing some arguments on why one of those solutions would be superior over the other.
My most common queries are as follows (ordered descending by importance):
Which recipes can a user cook
Add a recipe a user can cook to the user's list
Who can cook a specific recipe (there is a Recipe -> Cooks subcollection)
Update an existing recipe a user can cook
The answer to your question depends on the level of scalability you want to achieve.
If by design the amount of sub-data you want to store is limited and very low, you should use arrays, since you reduce the number of document reads, which means lower costs.
If your sub-data is supposed to increase "unlimitedly" over time, you should use sub-collections.
If you're building a database which is not supposed to scale in any direction (Proof of concept, very small business, etc.) just go with what you feel more comfortable with.
I'm researching the same question...
One of the questions is whether the data held in the document will be ever go pass 1MB that is the limit for a document. Researching a bit on how much it can be held in plain text in 1MB well it's a hell of a lot. Still if it were to be incredible bigger it would crash in the end. Thus if you think in a big-big way sub-collections.
If we had to use the Firebase element logic the answer would be sub-collections.
Still I guess the major point is the data pulled. If you call the user you will directly be pulling out that MB of data. Instead with a sub-collection it won't load, even if you loaded it you can still lazy-load.
I guess for the kind of setup you are doing sub-collections.
key is an additional collection's con/pro
key could help to avoid duplicates; but this requires thinking of what is duplicate's definition (which might change);
array's no-key behavior could be emulated via auto-id.
p.s. #Thomas's list of pros/cons in the question has been quite helpful.
I have a flex webapp that retrieves some names & addresses from a database. Project works fine but I'd like to make it faster. Instead of making a call to the database for each name request, I could pre-load all names into an array & filter the array when the user makes a request. Before I go down this route though I wanted to check if it is even feasible to have an application w/ 50,000 or 1 million elements in an array? What is the limit b/f it slows down the app? (I anticipate that it will have a lot to do w/ what else is going on in my app but for this sake lets just assume the app ONLY consists of this huge array).
Searching through a large array can be slower than necessary, particularly if you're talking about 1 million records.
Can you split it into a few still-large-but-smaller arrays? If you're always searching by account number, then divide them up based on the first digit or two digits.
To directly answer your question though, pure AS3 processing of a 50,000 element array should be fine. Once you get over 250,000 I'd think you need to break it up.
Displaying that many UI elements is different though. If you try to bind a chart to a dataProvider with 10,000 elements, it's too much. Same for a list or datagrid.
But for pure model data, not ui bound, I'd recommend up to 250,000 in my experience.
If your loading large amounts of data (not sure if your using Lists though), you could check out James Wards post about using AsyncListView with paging to grab the data in chuncks as its needed. Gonna try and implement something like this soon. His runnable example uses 100,000 rows with paging of 100 (works with HttpService/AMF type calls):
http://www.jamesward.com/2010/10/11/data-paging-in-flex-4/
Yes, you could probably stuff a few million items in an array if you wanted to, and the Flash player wouldn't yell at you. But do you really want to?
Is the application going to take longer to start if it has to download the entire database locally before being able to work? If the additional time needed to download that much data isn't significant, are a few database lookups really worth optimizing?
If you have a good use case to do this, you're going to have to pay attention to the way you use those data structures. Looping over the array to find an item is going to be a bit slow, so you'll want to create indexes locally, most likely by using a few hash structures. The more flexible you allow the search queries to be, the more interesting the indexing issues will be.
I would like to know a few practical use-cases (if they are not related/tied to any programming language it will be better).I can associate Sets, Lists and Maps to practical use cases.
For example if you wanted a glossary of a book where terms that you want are listed alphabetically and a location/page number is the value, you would use the collection TreeMap(OrderedMap which is a Map)
Somehow, I can't associate MultiSets with any "practical" usecase. Does someone know of any uses?
http://en.wikipedia.org/wiki/Multiset does not tell me enough :)
PS: If you guys think this should be community-wiki'ed it is okay. The only reason I did not do it was "There is a clear objective way to answer this question".
Lots of applications. For example, imagine a shopping cart. That can contain more than one instance of an item - i.e. 2 cpu's, 3 graphics boards, etc. So it is a Multi-set. One simple implementation is to also keep track of the number of items of each - i.e. keep around the info 2 cpu's, 3 graphics boards, etc.
I'm sure you can think of lots of other applications.
A multiset is useful in many situations in which you'd otherwise have a Map. Here are three examples.
Suppose you have a class Foo with an accessor getType(), and you want to know, for a collection of Foo instances, how many have each type.
Similarly, a system could perform various actions, and you could use a Multiset to keep track of how many times each action occurred.
Finally, to determine whether two collections contain the same elements, ignoring order but paying attention to how often instances are repeated, simply call
HashMultiset.create(collection1).equals(HashMultiset.create(collection2))
In some fields of Math, a set is treated as a multiset for all purposes. For example, in Linear Algebra, a set of vectors is teated as a multiset when testing for linear dependancy.
Thus, implementations of these fields should benefit from the usage of multisets.
You may say linear algebra isn't practical, but that is a whole different debate...
A Shopping Cart is a MultiSet. You can put several instances of the same item in a Shopping Cart when you want to buy more than one.