I'm starting to work with the Clockify APIs and I'd like to know whether the different IDs are reliable. Is it a really bad idea to keep their IDs in my database to know what's what, or is that something that will work long term? Thank you
IDs in Clockify represent the identities of their respective entities. They don't change, and are unique across the board, so you can use them in your database if you so choose.
That being said, it's always good practice when dealing with outside data to assign your own IDs, so that you're not reliant on contracts you cannot enforce. Give every entity an id (your own) and an externalId or clockifyId, and you'll never be in a position where an outside change affects your domain logic.
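The id plus clockifyId idea above can be sketched roughly as follows. This is a minimal in-memory illustration, not Clockify client code: the Project and ProjectStore names and the upsert_from_clockify helper are hypothetical, and a real application would back this with its own database tables.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class Project:
    name: str
    clockify_id: str  # external ID as received from Clockify (assumed to be a string)
    # Our own ID, generated locally and never derived from the external one.
    id: str = field(default_factory=lambda: str(uuid.uuid4()))


class ProjectStore:
    """Hypothetical store that indexes projects by both IDs."""

    def __init__(self) -> None:
        self._by_id: dict[str, Project] = {}
        self._id_by_clockify_id: dict[str, str] = {}

    def upsert_from_clockify(self, clockify_id: str, name: str) -> Project:
        # Look the entity up by its external ID; domain code only ever sees our own ID.
        own_id = self._id_by_clockify_id.get(clockify_id)
        if own_id is not None:
            project = self._by_id[own_id]
            project.name = name
            return project
        project = Project(name=name, clockify_id=clockify_id)
        self._by_id[project.id] = project
        self._id_by_clockify_id[clockify_id] = project.id
        return project


store = ProjectStore()
first = store.upsert_from_clockify("abc123", "Website")
second = store.upsert_from_clockify("abc123", "Website v2")
```

Both calls resolve to the same local record, so the rest of the application can reference first.id and only the sync layer needs to know about clockify_id.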
The official recommendation from the team is, to my knowledge, to put all data types into a single collection, with something like a type=someType field on documents to distinguish types.
Now, if we assume large databases with partitioning where different object types can be:
Completely different fields (so no common field for partitioning)
Related (through reference)
How do we organize things so that things that should go together end up in the same partition?
For example, let's say we have:
User
BlogPost
BlogPostComment
If we store them as separate types with type=user|blogPost|blogPostComment, in the same collection, how do we ensure that a user, their blog posts and all the corresponding comments end up in the same partition?
Is there some best practice for this?
[UPDATE]
Can you ever avoid cross-partition queries completely? Should that be a goal? Or do you just try to minimize them?
For example, you can partition your data perfectly for 99% of cases/queries but then you need some dashboard to show aggregates from all-the-data. Is that something you just accept as inevitable and try to minimize or is it possible to avoid it completely?
I've written about this somewhat extensively in other similar questions regarding Cosmos.
Basically, when dealing with many different logical entity types in a single Cosmos collection the easiest option is to put a generic (or abstract, as you refer to it) partition key on all your documents. At this point it's the concern of the application to make sure that at runtime the appropriate value is chosen. I usually name this document property either partitionKey, routingKey or something similar.
This is extremely important when designing for optimal query efficiency as your choice of partition keys can have a huge impact on query and throughput performance. A generic key like this lets you design the optimal storage of your data as it benefits whatever application you're building.
Even something like tenant does not make sense as different tenants might have wildly different data size and access patterns. Instead you could include the tenantId at runtime as part of your partition key as a kind of composite.
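A minimal sketch of such a generic, composite partition key, using plain dictionaries rather than the Cosmos SDK. The choice of tenant ID plus owning user ID as the composite, the "|" separator, and the make_document helper are all assumptions made for illustration; the right composition depends on your own access patterns.

```python
def partition_key(tenant_id: str, owner_user_id: str) -> str:
    # Composite value computed by the application at runtime (assumed format).
    return f"{tenant_id}|{owner_user_id}"


def make_document(doc_type: str, doc_id: str, tenant_id: str,
                  owner_user_id: str, **fields) -> dict:
    # Every document carries the same generic properties, whatever its type.
    return {
        "id": doc_id,
        "type": doc_type,
        "partitionKey": partition_key(tenant_id, owner_user_id),
        **fields,
    }


# A user, that user's blog post, and a comment left on the post by someone else.
user = make_document("user", "u1", "t1", "u1", name="Alice")
post = make_document("blogPost", "p1", "t1", "u1", title="Hello")
comment = make_document("blogPostComment", "c1", "t1", "u1",
                        authorId="u2", postId="p1", text="Nice post")
```

In this sketch the comment is routed by the owner of the post, not by its author, so reading a user's posts together with their comments stays within one partition value, while listing everything a given commenter wrote would be a cross-partition query.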
UPDATE:
For certain query patterns it might be possible to serve them entirely out of a single partition. It's definitely not the end of the world if things end up going cross partition though. The system is still quick. If possible, limiting the number of partitions that need to be touched for a given query is ideal, but you're never going to get away from it 100% of the time.
A partition should hold data related to a group that is expected to grow, for instance a Tenant, which groups many documents (which can be of different types, as you have mentioned). So the partition key in this instance should be the TenantId. Partitioning is more about which group the data relates to than about the type of data. If the data is related to a User then you could use the UserId; however, many users may comment on the same posts, so it doesn't seem like a good candidate for a partition key unless the user info is de-normalized so that it doesn't have to relate back to the other users directly.
I'm looking at using Firebase for a small project, but one stumbling block I can't find an answer to is security as it relates to things like indexes for a purely client-side application.
For example, if I need an index for articles -- that is, not using priority -- for alternate sorting, how would I secure this?
The client would need access to the list that contains the article ids sorted appropriately, which as far as I can tell also means the client can then be malicious and completely reorder or delete that index, not just the article it posted.
For that matter, the same goes for setting priority, or really any kind of auxiliary data that is automatic and not user entered - a change date for example.
Am I missing something? Or are you forced to have a server component to accomplish that level of data security/integrity?
Edit: The simplest case of this I can think of, is something like a date created field on an article - What prevents the client from just setting that maliciously?
/competitions/1/clubs/5/players
/players/search?club_id=5
/players?club_id=5
When should I use a first-class URL for a resource, and when should I use a nested URL?
Update 1
Thanks for the answers so far. I'll try to clarify things a little further.
Competition and Club have a many-to-many relationship. Clubs can participate in multiple competitions. I guess that would make Club a first class entity, so the way to access a club would be for instance:
/clubs/33
But I also need to be able to access clubs that participate in a specific competition, so I need something like this too:
/competitions/2/clubs
But someone mentioned it isn't recommendable to make a resource accessible via multiple URI's. Doesn't this violate that?
Also, I presume a URI like this would not be preferable:
/competitions/2/clubs/33/players/5
But rather use this:
/clubs/33/players/5
Club has a one-to-many relationship with Player.
/competitions/1/clubs/5/players
As a URI is the identifier of a single resource, I would say the general rule is that if it is an object, it gets a 'first-class URL'.
I tend to use query parameters only when limiting/filtering lists, for example, /competitions/1/clubs/5/players?gender=MALE.
I use path elements if the relation "feels" like a tree or directory (a club has players: /clubs/berlin/players). Parameters are more like "tags"; I often use them for search filters (e.g. defenders of the 'berlin' club older than 22: /clubs/berlin/players?position=defender&age=22).
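The split between path elements and query parameters can be sketched as follows. This is only an illustration using the standard library: the PLAYERS_BY_CLUB data and the list_players function are hypothetical, and treating age as an "older than" filter is an assumption taken from the example above.

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical in-memory data standing in for a real data store.
PLAYERS_BY_CLUB = {
    "berlin": [
        {"name": "Ada", "position": "defender", "age": 25},
        {"name": "Ben", "position": "defender", "age": 20},
        {"name": "Cem", "position": "striker", "age": 30},
    ],
}


def list_players(url: str) -> list[dict]:
    parts = urlsplit(url)
    # The path identifies the resource: /clubs/{club}/players
    segments = [s for s in parts.path.split("/") if s]
    if len(segments) != 3 or segments[0] != "clubs" or segments[2] != "players":
        raise ValueError(f"unsupported path: {parts.path}")
    players = PLAYERS_BY_CLUB.get(segments[1], [])

    # Query parameters only narrow down the list that the path identifies.
    query = parse_qs(parts.query)
    if "position" in query:
        players = [p for p in players if p["position"] == query["position"][0]]
    if "age" in query:
        older_than = int(query["age"][0])  # assumption: age means "older than"
        players = [p for p in players if p["age"] > older_than]
    return players


defenders = list_players("/clubs/berlin/players?position=defender&age=22")
```

Dropping the query string returns the full collection for the same resource, which is why the filters do not need a URL of their own.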
I design URL structure by 'domain-importance'. The most basic concepts should go to the root. If possible, don't go too deep down the URL structure. I try not to duplicate or create alias collections which represent identical resources (that costs double maintenance in code and documentation).
Generally putting /clubs as root feels more natural: /clubs/{club_id}/players
I would only expose players through /competitions/{comp_id}/clubs/{club_id}/players if the set of players is different from /clubs/{club_id}/players, e.g. if during the competition several players are blocked or didn't make it into the match squad.
What do you mean by /competitions? Is it a tournament or a single match? If it is a single match with two clubs, maybe use home and away domain concepts: /competitions/{comp_id}/home-club and /competitions/{comp_id}/away-club.
Update-1 Answer
Here are my thoughts on your update question:
I guess /competitions/2/clubs is a subset of /clubs, since not every club competes in every competition. The two resources are therefore different, so two URLs are fine.
Thinking again, /competitions/2/clubs/33/players/5 should also be fine (but it is important that duplication is avoided in the server code). This URL should even be mandatory when the returned resource is a subset of /clubs/33/players (e.g. players are injured or the team-size limit has been hit for a specific competition).
I wouldn't put the ID numbers in the URL. They mean something only to those who actually know what they stand for; to everyone else they are meaningless numbers.
You should always choose descriptive and related words for your URL, because the URL helps convey information about the linked resource.
Instead of using meaningless ID numbers, choose a unique name representing the name of the team or the competition, for example
/competitions/worldcup/clubs/usa/players
But if you really need to send that kind of anonymous data in the URL, then I would prefer to see them in a query.
Use only meaningful text for the URL.
I'm designing an application where my Order objects need to have a sequential and user-friendly Id field. I'm avoiding the HiLo algorithm because of the rather large gaps it produces (see here). Naturally, Guid values would make my corporate users go bananas. I'm also avoiding Oracle sequences because of their major disadvantages:
(From: NHibernate POID Generators revealed)
Post insert generators, as the name suggest, assigns the id’s after the entity is stored in the database. A select statement is executed against database. They have many drawbacks, and in my opinion they must be used only on brownfield projects. Those generators are what WE DO NOT SUGGEST as NH Team.
Some of the drawbacks are the following:
Unit Of Work is broken with the use of those strategies. It doesn’t matter if you’re using FlushMode.Commit, each Save results in an insert statement against DB. As a best practice, we should defer insertions to the commit, but using a post insert generator makes it commit on save (which is what UoW doesn’t do).
Those strategies nullify batcher, you can’t take the advantage of sending multiple queries at once(as it must go to database at the time of Save).
Any ideas/experience on implementing user-friendly IDs without major gaps between them?
Edit:
User friendly Id fields are ones my corporate users can memorize and even discuss and/or have phone conversations talking about a particular Order by its code, e.g. "I'm calling to know why the order #1625 was denied.".
The Id doesn't need to be strictly gapless, but I am worried that my users would get confused when they see gaps like 100, 201, 305. For my older projects, I currently implement NHibernate using Oracle sequences which occasionally lose a few sequences when exceptions are thrown, but yet keep a rather tidy order to them. The downside to them is how they break the Unit of Work which results in additional hits to the database for every Save command with or without the Session.Flush.
One option would be to keep a key-table that simply stores an incrementing value. This can introduce a few problems, namely possible locking issues as well as additional hits to the database.
Another option might be to refine what you mean by "User-friendly Id". This could consist of a combination of a Date/Time and a customer-specific sequence (or including the customer id as well). Also, your order id does not necessarily have to be the actual key on the table. There is nothing to say that you can't use a surrogate key with a separate "calculated" column which represents the order id.
The bottom line is that it sounds like you want to use a surrogate key, but have the benefits of a natural key. It can be very difficult to have it both ways, and a lot comes down to how you actually plan on using the data, how users interpret the data, and personal preference.
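The key-table option combined with a surrogate key can be sketched like this. It uses SQLite through Python's sqlite3 module purely for illustration (the question is about NHibernate and Oracle), and the order_counter and orders tables, their columns, and the place_order function are all hypothetical names.

```python
import sqlite3
import uuid


def create_schema(conn: sqlite3.Connection) -> None:
    conn.executescript(
        """
        -- Key table: a single row holding the next user-facing order number.
        CREATE TABLE order_counter (
            id INTEGER PRIMARY KEY CHECK (id = 1),
            next_value INTEGER NOT NULL
        );
        INSERT INTO order_counter (id, next_value) VALUES (1, 1);

        -- The surrogate key (id) is separate from the user-friendly order_number.
        CREATE TABLE orders (
            id TEXT PRIMARY KEY,
            order_number INTEGER NOT NULL UNIQUE,
            description TEXT NOT NULL
        );
        """
    )


def place_order(conn: sqlite3.Connection, description: str) -> int:
    # One transaction: if the insert fails, the counter increment is rolled
    # back as well, so a failed order does not consume a number.
    with conn:
        conn.execute(
            "UPDATE order_counter SET next_value = next_value + 1 WHERE id = 1"
        )
        (next_value,) = conn.execute(
            "SELECT next_value FROM order_counter WHERE id = 1"
        ).fetchone()
        order_number = next_value - 1
        conn.execute(
            "INSERT INTO orders (id, order_number, description) VALUES (?, ?, ?)",
            (str(uuid.uuid4()), order_number, description),
        )
    return order_number


conn = sqlite3.connect(":memory:")
create_schema(conn)
```

The counter row stays locked for the duration of each transaction, which is the locking cost and extra database hit mentioned above; in exchange, numbers are handed out without gaps.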
I have company, customer, supplier etc tables which all have address information related columns.
I am trying to figure out if I should create a new table 'addresses' and separate all address columns to that.
Having address columns on all tables is easy to use and query, but I am not sure if it is the right way of doing it from a good design perspective; having the same columns repeated across several tables makes me wonder.
The content of the address is not important to me. I will not be checking or using these addresses in any decision-making processes; they are purely informational. Currently I am looking at 5 tables that have address information.
The answer to all design questions is this:
It depends.
So basically, in the Address case it depends on whether or not you will have more than 1 address per customer. If you will have more than 1, put it in a new Addresses table and give each address a CustomerID. It's overkill (most times, it depends!) to create a generic Address table and map it to the company/customer/supplier tables.
It's also often overkill (and dangerous) to map addresses in a many-to-many relationship between your objects (as addresses can seem to magically change on users if you do this).
The one big rule is: Keep it simple!
This is called Database Normalization. And yes, you want to split them up, if for no other reason than that doing it later will be much harder once you have code and queries in place.
As a rule, you should always design your database in 3rd Normal Form, even for simple apps (there will be a few cases where you won't for performance or logistic reasons, but starting out I would always try to make it 3rd Normal Form, and then learn to cheat after you know the right way of doing it).
EDIT: To expand on this and add some of the comments I have made on others' posts, I am a big believer in starting with a simple design when it comes to code and refactoring when it becomes clear that it is becoming too complex and more in-depth object-oriented principles would be appropriate. However, refactoring a database that is in production is not so simple. It is all about ROI. It is just too easy to design a normalized database from the outset to justify not doing it. The consequences of a poorly designed database can be catastrophic, and it is usually too late by the time you come to that realization.
Yes, you should separate the addresses to a table of their own. It's a smart thing to know to ask. The key here is that general format of addresses is the same, regardless of who it is; a customer, a company, a supplier... they all have the same fields for addresses.
What makes this worthwhile is the ability to treat addresses as an atomic element; that is, you can generalize all the functionality related to addresses and have it deal with just one table, as opposed to having to worry about it dealing with several tables, and the associated schema drift that can occur.
If you are using those addresses only within the scope of their own tables, there may be no real benefit to moving them to their own tables.
Basically, it doesn't sound like it's worth the effort.
If there's an overlap between tables (i.e. the same organization is entered in both the company and supplier tables), and the address should always be the same in both tables, then it's probably worth moving address off in to its own table and having foreign keys to it from your other three tables. That way, you only have to update it in one spot when it changes.
If the three tables are entirely independent from each other, then there's not really much to gain from moving the data to another table, so you might as well leave it alone.
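For the overlapping case described above, a shared address table with foreign keys might look roughly like this. The sketch uses SQLite via Python's sqlite3 module; the table and column names and the street_for helper are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE addresses (
        id INTEGER PRIMARY KEY,
        street TEXT NOT NULL,
        city TEXT NOT NULL
    );
    CREATE TABLE companies (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        address_id INTEGER NOT NULL REFERENCES addresses(id)
    );
    CREATE TABLE suppliers (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        address_id INTEGER NOT NULL REFERENCES addresses(id)
    );
    -- The same organization appears as both a company and a supplier.
    INSERT INTO addresses (id, street, city) VALUES (1, '1 Main St', 'Springfield');
    INSERT INTO companies (id, name, address_id) VALUES (1, 'Acme', 1);
    INSERT INTO suppliers (id, name, address_id) VALUES (1, 'Acme', 1);
    """
)


def street_for(table: str, row_id: int) -> str:
    # The table name is interpolated, so only allow known tables.
    if table not in ("companies", "suppliers"):
        raise ValueError(f"unknown table: {table}")
    row = conn.execute(
        f"SELECT a.street FROM {table} t "
        "JOIN addresses a ON a.id = t.address_id WHERE t.id = ?",
        (row_id,),
    ).fetchone()
    return row[0]


# The address changes once and both referencing rows see the new value.
conn.execute("UPDATE addresses SET street = ? WHERE id = ?", ("2 Oak Ave", 1))
```

If the tables never share an organization, the join buys nothing, which matches the advice to leave independent tables alone.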
I think it entirely depends on the purpose of the database. Admittedly all address information is structurally the same and from a theoretical standpoint should all be in a single table linked from the parent table by a key.
However from a performance and query perspective, keeping them in their respective tables does simplify things from a reporting standpoint.
I have a situation with my current company [logistics] where the addresses are actually logically the same - they're all locations regardless of whether they're a pickup location, delivery location, customer etc.
In my case, I'd say that they should most definitely all be in one table. But if it's looking at it from a supplier, customer, contact information standpoint, I'd say that while theoretically it's nice to have the addresses in one table, in practice it won't buy you a whole lot as the data is unlikely to be repeated.
I disagree with Dave. The many-to-many approach (Address <-> User) is both safe, and highly advantageous.
When a customer moves, the addresses in the Address table do NOT change. Instead, the new address is found in the Address table, and the customer etc. is linked to that record. If the new address isn't already in the table, it's added to it.
So do address records themselves ever change? Yes, in cases like these:
it turns out that the address has a typo
US postal service changes the street name
These are the very situations where putting all addresses in one table without repetition pays off; any other arrangement would require annoying and repetitive data entry.
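The "relink on move, edit only on correction" behaviour described above can be sketched as follows, again with SQLite through Python's sqlite3 module. The schema, the single line column, and the find_or_create_address, move_customer and address_lines helpers are hypothetical simplifications.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE addresses (id INTEGER PRIMARY KEY, line TEXT NOT NULL UNIQUE);
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE customer_addresses (
        customer_id INTEGER NOT NULL REFERENCES customers(id),
        address_id INTEGER NOT NULL REFERENCES addresses(id),
        PRIMARY KEY (customer_id, address_id)
    );
    -- Two customers share one address record, which contains a typo.
    INSERT INTO addresses (id, line) VALUES (1, '1 Main Stret');
    INSERT INTO customers (id, name) VALUES (1, 'Alice'), (2, 'Bob');
    INSERT INTO customer_addresses (customer_id, address_id) VALUES (1, 1), (2, 1);
    """
)


def find_or_create_address(line: str) -> int:
    row = conn.execute("SELECT id FROM addresses WHERE line = ?", (line,)).fetchone()
    if row is not None:
        return row[0]
    return conn.execute("INSERT INTO addresses (line) VALUES (?)", (line,)).lastrowid


def move_customer(customer_id: int, old_address_id: int, new_line: str) -> int:
    # Moving relinks the customer; the old address row itself is not edited.
    new_id = find_or_create_address(new_line)
    conn.execute(
        "UPDATE customer_addresses SET address_id = ? "
        "WHERE customer_id = ? AND address_id = ?",
        (new_id, customer_id, old_address_id),
    )
    return new_id


def address_lines(customer_id: int) -> list[str]:
    rows = conn.execute(
        "SELECT a.line FROM customer_addresses ca "
        "JOIN addresses a ON a.id = ca.address_id "
        "WHERE ca.customer_id = ? ORDER BY a.id",
        (customer_id,),
    ).fetchall()
    return [r[0] for r in rows]


move_customer(1, 1, "9 Elm Rd")  # Alice moves; Bob is unaffected
conn.execute("UPDATE addresses SET line = ? WHERE id = ?", ("1 Main Street", 1))  # typo fix
```

Only the typo correction touches an address row, and it is made once for everyone still linked to that address.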
Of course, if the database is abused, then it would be safer to avoid the many-to-many relationship. But by that token, if the database is in bad hands, it's better to just print everything out, store it in a file cabinet, and verify every transaction against the paper copy. So "protection against misuse" is not a good design principle, in my opinion.