I'm developing a web platform that may reach several million users, and I need to store users' images and documents.
I'm using Rackspace, and now I need to define the file-storage logic on the Cloud Files service.
Rackspace allows creating up to 500,000 containers per account (reference: page 17, paragraph 4.2.2), and in addition they suggest limiting each container to 500,000 objects (reference: Best practice - Limit the Number of Objects in Your Container). What is the best practice for managing users' files?
One container per user doesn't seem to be a good solution because of the 500,000-container limit.
Rackspace suggests using virtual containers, but I'm a bit undecided about how to use them.
Thanks in advance.
If you will only be interacting with the files via API calls, having 200,000 objects in a container is fine (from my experience; I haven't had the need for anything larger).
If you want to try to use the web interface for ANY TASKS AT ALL, you need to have far, far fewer than that. The web interface does not break contents up by folder, so if you have 30,000 objects it will just paginate them and show them to you in alphabetical order. This is OK for containers with up to a few hundred objects, but beyond that the web interface is unusable.
If you have several million users, you can use some part of the user ID as a shard key to decide which container to use. See http://docs.mongodb.org/manual/core/sharding-internals/#sharding-internals-shard-keys for information about choosing a shard key. It's written for Mongo users, but it is applicable here. The takeaway is to pick an attribute that will distribute your users somewhat evenly, so you don't have one container that exceeds the maximum number of files you want to have per container.
One way is to use user IDs, which we can randomly assign, and shard based on the first digit. For this example, we'll use the UIDs 1234, 2234, 1123, and 2134. Say you want to break files up by the first digit of the UID: you'd save the files for users 1234 and 1123 in the container "files_group_1" and the files for users 2234 and 2134 in the "files_group_2" container.
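As a minimal Python sketch of that mapping (the files_group_ prefix and the first-digit rule are just this example's convention, not anything Rackspace prescribes):

```python
def container_for_uid(uid: str, prefix: str = "files_group_") -> str:
    """Map a user ID to a container name based on its first digit.

    Mirrors the example above: 1234 and 1123 map to 'files_group_1',
    2234 and 2134 map to 'files_group_2'.
    """
    if not uid or not uid[0].isdigit():
        raise ValueError(f"expected a numeric UID, got {uid!r}")
    return f"{prefix}{uid[0]}"


if __name__ == "__main__":
    for uid in ("1234", "2234", "1123", "2134"):
        print(uid, "->", container_for_uid(uid))
```

If your IDs aren't assigned randomly, hashing the full UID and taking it modulo a fixed shard count would distribute users more evenly than the first digit.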
Before picking a shard key, make sure you think about how many files users might store. If, for example, a user may store hundreds (or thousands) of files, then you will want to shard by a more unique key than the first digit of a UID.
Hope that helped.
Related
I am building a community-type app based on Firestore where users should have granular control over what kind of information they share with whom.
Users can have properties such as name, birthdate, etc., and for each of them they can decide to share it with one of the following groups/roles:
Private
Contacts
Admin (Admins of organizations that user is a member of)
Organization (Members of organizations that a user is a member of)
Public (All users of the app)
As documents in Firestore will always be retrieved as a whole, I already know that I somehow will have to segregate my user properties by access level.
I've got two approaches so far:
Approach 1
Store each user property in a separate document that contains an access-level field (see the sketch below, after the drawbacks)
Store some metadata in, for example /user/12345/meta/roles, so that I can point the security rules to those documents to validate access
Benefits:
Easy structure
Flexible
(Almost) no data duplication
Drawbacks:
Lots of document reads for getting a user's profile
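For concreteness, a tiny Python sketch of what that layout might look like (the paths, field names, and values are just my reading of Approach 1, not an established schema):

```python
# One document per user property, each carrying its own access level...
property_docs = {
    "user/12345/properties/name":      {"value": "Alice", "accessLevel": "public"},
    "user/12345/properties/birthdate": {"value": "1990-04-01", "accessLevel": "contacts"},
    "user/12345/properties/phone":     {"value": "+1 555 0100", "accessLevel": "private"},
}

# ...plus a metadata document the security rules can point to when validating access.
meta_docs = {
    "user/12345/meta/roles": {"organizations": ["org_abc"], "adminOf": []},
}
```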
Approach 2
Store the user profile in, for example, /user/12345/profile/private, duplicate the public information into /user/12345/profile/public, and do the same for each access level (see the sketch after the drawbacks below)
Benefits:
Reduced document reads
Drawbacks:
Complexity
It feels wrong to duplicate that much data
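A hedged sketch of how that duplication might be maintained with the google-cloud-firestore Python client; the field-to-level mapping is invented for the example, and cascading a field into broader levels (e.g. a public field also appearing in the contacts document) is left out for brevity:

```python
from google.cloud import firestore

# Which access level each profile field is shared at (example mapping, app-specific).
FIELD_LEVELS = {"name": "public", "birthdate": "contacts", "phone": "private"}

def write_profile(db: firestore.Client, user_id: str, profile: dict) -> None:
    """Write the full profile to /user/<id>/profile/private and duplicate
    each field into the document for the level it is shared at."""
    batch = db.batch()
    batch.set(db.document(f"user/{user_id}/profile/private"), profile, merge=True)
    for field, value in profile.items():
        level = FIELD_LEVELS.get(field, "private")
        if level != "private":
            batch.set(db.document(f"user/{user_id}/profile/{level}"), {field: value}, merge=True)
    batch.commit()
```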
Does anyone have any experience with this and any suggestions or alternative approaches they can share?
Follow-up question:
Let's say I store the list of members of an organization in a subcollection that is only accessible to members of the organization (for privacy reasons). Doesn't that mean that when querying that list of members from the client side, I have to do it "blindly", meaning I can't know whether the user can access that document until I actually try? The fact that the query might fail would tell me that the user is not actually a member of that organization.
Would you consider this kind of query, which is set up to fail, bad practice? Are there any alternatives that still allow me to keep the member list private?
I think you are moving from a SQL environment to NoSQL, which is why you feel Approach 2 is not the right way to proceed.
Actually, Approach 2 is the right way to proceed. There are a couple of advantages:
1.) Reduced document reads - more cost savings. Firestore charges by the number of reads and writes, so if you reduce reads and writes optimally, it's always the way to go. Also, the extra storage cost caused by the duplication will always be less than the cost of the additional reads as you scale up your application.
2.) In a NoSQL database you are allowed to duplicate data, provided it increases the read/search speed.
I don't see the second approach as complex, because that's the trade-off you make when choosing NoSQL over SQL.
I'm currently brainstorming and wondering if it's possible to easily communicate among multiple Firestore databases. If so, I could isolate collections, and therefore also isolate writes/updates on those collections from competing with other services, reducing the risk that I hit the 10,000-writes-per-second limit on a given database.
Conceptually, I figure I can capture the necessary information from one document in DB_A (including the doc_id) in a read and then set that document in DB_B with the matching doc_id.
In a working example, perhaps one page has a lot of content (documents) that I need to generate and I don't want those writes to compete with writes used in other services on my app. When a user visits this page, we show those documents from DB_A and if the user is interested in one of those documents, we can take that document that we've effectively already read, and now write it into DB_B where user-specific content lives. It seems practical enough. Are there any indexing problems / other problems that could come out of this solution that I'm not seeing?
In the example you give, the databases themselves are not communicating, but your app is communicating with multiple database instances. That is indeed possible. Since you can only have one Firestore instance per project, you will need to add multiple projects to your app.
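A minimal sketch of what that could look like with the firebase-admin Python SDK, assuming two projects (the project IDs are placeholders) and that the document is simply copied under the same ID:

```python
import firebase_admin
from firebase_admin import credentials, firestore

# One named app per project, so each Firestore client targets its own database.
app_a = firebase_admin.initialize_app(
    credentials.ApplicationDefault(), {"projectId": "my-project-a"}, name="db_a")
app_b = firebase_admin.initialize_app(
    credentials.ApplicationDefault(), {"projectId": "my-project-b"}, name="db_b")

db_a = firestore.client(app=app_a)
db_b = firestore.client(app=app_b)

def copy_document(collection: str, doc_id: str) -> None:
    """Read a document from DB_A and write it into DB_B under the same ID."""
    snapshot = db_a.collection(collection).document(doc_id).get()
    if snapshot.exists:
        db_b.collection(collection).document(doc_id).set(snapshot.to_dict())
```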
What you're describing is known as sharding, as each database becomes a shard of (a subset of) your entire data set.
Note that it is quite uncommon to shard Firestore. If you predict such a high volume of writes, also have a look at Firebase's Realtime Database, as that is typically better suited for use cases with many small writes. Firestore is more suited for use cases that have fewer, larger writes and many more readers. While you may still need to shard with the Realtime Database too (and possibly shard more to reach the same read capacity), it can have multiple database instances per project, making the process easier to manage.
Let's assume I've started a SaaS platform with Azure Cosmos DB as my backend. I set up a container and used the tenant ID (a GUID) as the partition key. Things work well until we get two large clients. The overall system, and the two large clients in particular, would benefit a lot if we could move the two large clients into a different container and use more fine-grained partition keys for them. This is a bit of a burden on us at the application level, but it's doable.
How do we move a whole "partition key" into a different container with a more fine-grained partition key? Is this something we can do "on the fly"? Do we need to take that tenant offline and use some sort of tool to migrate all the data? Is there a best practice?
So there's no built-in way of doing that, but there is a path forward; it's called the change feed. Basically, you can use the change feed to migrate all the data from the beginning of the database up to the last change. You would need to implement a filter that only picks up changes for that tenant's partition key, and you would also need to implement a way to distribute that tenant's data across several partitions. There are some limitations to the change feed, though.
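If a one-off bulk copy is acceptable instead of a change-feed based migration, a simplified sketch with the azure-cosmos Python SDK could look roughly like this; the container names, the tenantId/customerId fields, and the synthetic partitionKey are assumptions for the illustration:

```python
from azure.cosmos import CosmosClient

client = CosmosClient("https://<account>.documents.azure.com:443/", credential="<key>")
db = client.get_database_client("saas-db")                # placeholder names
source = db.get_container_client("tenants")               # partitioned by /tenantId
target = db.get_container_client("tenants-fine-grained")  # partitioned by /partitionKey

TENANT_ID = "3fa85f64-5717-4562-b3fc-2c963f66afa6"         # the large tenant to move

items = source.query_items(
    query="SELECT * FROM c WHERE c.tenantId = @tenant",
    parameters=[{"name": "@tenant", "value": TENANT_ID}],
    partition_key=TENANT_ID,
)
for item in items:
    # Drop Cosmos system properties and add a more fine-grained synthetic key
    # (tenant ID + customer ID) so this tenant's data spreads across partitions.
    doc = {k: v for k, v in item.items() if not k.startswith("_")}
    doc["partitionKey"] = f"{doc['tenantId']}_{doc.get('customerId', 'unknown')}"
    target.upsert_item(doc)
```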
I'm new to DocumentDB and I am trying to determine the best way to store documents. We are uploading documents every 15 minutes, and I need to keep them as easily separated by upload as possible. At first glance, I thought I could have a database and a collection for each upload. Then I discovered you can only have 3 collections per database. This leaves me with either adding a naming convention or trying to use folders and paths. According to the same source (http://azure.microsoft.com/en-us/documentation/articles/documentdb-limits/), we are limited to 100 paths per collection. This leaves folders. I have been looking, but I haven't found anything concrete on creating folders within a collection. The object API doesn't have an obvious add/create method.
Is this possible? If so, are we limited to how many (assuming I stay within the allowed collection/database size)?
You could define a sequential naming convention and create a range index in the collection's indexing policy. That way, if you need to retrieve a range of documents, the query will leverage DocumentDB's indexing capabilities efficiently.
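As a rough illustration of such a naming convention (the 15-minute batch ID format and the uploadBatch property are my own convention for the sketch, not anything DocumentDB requires):

```python
from datetime import datetime, timezone

def batch_id(ts: datetime) -> str:
    """Round a timestamp down to its 15-minute upload window, e.g. '20240101T0015'."""
    ts = ts.astimezone(timezone.utc)
    ts = ts.replace(minute=ts.minute - ts.minute % 15, second=0, microsecond=0)
    return ts.strftime("%Y%m%dT%H%M")

def make_document(doc_id: str, payload: dict, uploaded_at: datetime) -> dict:
    """Build a document whose id and uploadBatch property encode its upload window."""
    batch = batch_id(uploaded_at)
    return {"id": f"{batch}_{doc_id}", "uploadBatch": batch, **payload}

# All documents from one upload can then be fetched with a SQL-like query such as:
query = "SELECT * FROM c WHERE c.uploadBatch = '20240101T0015'"
```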
As a recommendation, you can examine the request charge response header (x-ms-request-charge) on the requests you fire off during your tests. This lets you gauge how efficient your setup is (how heavy it is on the database, which will translate into your cost structure for the service).
Sorry about the comment. What we ended up doing was just dumping everything into one collection. The Azure DocumentDB query language (i.e., SQL-like) seems robust enough to handle detailed queries, though I am not sure what the efficiency will be like once we have a ton of documents in there.
If I have 10 firefeeds in one app (one on each page), do I need 10 Firebases, or can they be nested under one? Is it possible to set up 1000+ firefeeds under one Firebase account?
I don't completely understand your question. Our firefeed demo app allows an unlimited number of users to sign up and they each get their own feed. E.g., here's mine. All of the data for it lives in a single Firebase.
If you're talking about having completely separate instances of firefeed under one Firebase (each with a distinct set of users and posts, etc.), you could do that too. Since Firebase's data structure allows nested children, you could nest each instance under its own location in the Firebase, e.g. /firefeed/1, /firefeed/2, etc. You would need to update the security rules to be aware of this extra level of nesting, but it shouldn't be too bad.
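A small sketch of writing into such nested instances with the firebase-admin Python SDK; the databaseURL, instance IDs, and the users/<uid>/posts sub-path are placeholders rather than firefeed's actual schema, and the security-rule changes are not shown:

```python
import firebase_admin
from firebase_admin import credentials, db

firebase_admin.initialize_app(
    credentials.ApplicationDefault(),
    {"databaseURL": "https://your-firebase.firebaseio.com"},  # placeholder URL
)

def add_post(instance_id: str, user_id: str, content: str) -> None:
    """Write a post under its own firefeed instance, e.g. /firefeed/1/users/<uid>/posts."""
    ref = db.reference(f"/firefeed/{instance_id}/users/{user_id}/posts")
    ref.push({"content": content})

add_post("1", "user_abc", "Hello from instance 1")
add_post("2", "user_xyz", "Hello from instance 2")
```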