I am having issues during a ProcessUpdate on an SSAS dimension. Here is, in simplified form, how the dimension is set up:
In a "Reservation" dimension, my DimensionKey attribute is the reservation itself (Key attribute: ReservationKey, Name attribute: Reservation Code). Attached to that member (via a flexible relationship) is a "ReservationAgent" member (Key: ReservationAgentCode, Name: ReservationAgentName), and attached to the "ReservationAgent" member is a "ReservationAgentCode" member, which is related by a rigid relationship to the "Reservation Agent Name".
Now I'm sure you'd agree that it is impossible for the "ReservationAgent" member to suddenly relate to a different "ReservationAgentCode", since the key of both attributes is the same SQL column, "ReservationAgentCode".
I have run into a situation where all Reservations of a given ReservationAgentCode were moved to a different ReservationAgentCode, so in essence the old ReservationAgentCode no longer exists. Remember here that the "Reservation" to "ReservationAgentCode" relationship is flexible.
Upon doing a ProcessUpdate on the dimension, SSAS gives me an error about not being able to change a rigid relationship between "ReservationAgent" and "ReservationAgentCode".
Since both the "ReservationAgent" and "ReservationAgentCode" members are effectively new, there is no movement of an old "ReservationAgent" to a different "ReservationAgentCode"; only the old members need to be deleted, in a sense.
Has anyone run into this kind of situation? Is there something I'm not quite understanding correctly? To me it seems like a glitch/bug, but before opening a Connect ticket with MS, I'd like to have feedback from the experts here.
Let me know if you need more info; I could also put together a quick example demonstrating my issue if needed.
Thanks in advance!
It's not a bug, David.
Process Update removes the old ReservationAgentCode member from the dimension (since the member-detection stage no longer sees it in the source). The server won't allow this deletion, because there is a rigid relationship further along ("a rigid attribute relationship is one where the member relationships are guaranteed to be fixed" - SQL Server 2008 R2 Performance Guide).
By the way, what is the benefit of using rigid relationships there (performance, processing time, or both)?
How would one validate a unique constraint using DDD? Let's say that an Entity has a property name that must be unique across the system, and there is a specific EntityRepository method nameExists(name): bool... This is what I found people suggest doing, because the repository is the abstraction of the collection of all the Entities and should be able to perform this check.
So before creating/adding the new Entity, the command / domain service could check for the existence of the newName against the repository, but I think that this will not always work because of concurrency.
In a concurrent scenario where two transactions are started simultaneously, the EntityRepository's nameExists method might return false for both transactions, and as a result of this two entries with the same name will be incorrectly inserted.
I am sure that I am missing something basic, but the answers I found all point to the repository's exists method. TBH, others say that a UNIQUE constraint should be put on the DB to catch the concurrency case, but what if one uses Event Sourcing or a persistence layer that does not have unique constraints?
Follow-up question
What if the uniqueness constraint is to be applied in different levels of a hierarchy?
A Container's name must be unique in the system and then Child names must be unique inside a Container.
Let's say that a transactional DB takes care of the uniqueness at the lowest possible level, what about the domain?
Should I still express the uniqueness logic at the domain level, e.g. with a Domain Service for the system-level uniqueness and embedding Child entities inside the Container entity and having a business rule (and therefore making Container the aggregate root)?
Or should I not bother with "replicating" the uniqueness in the domain and (given there are no other rules to apply between the two) split Container and Child? Will the domain lack expressiveness then?
I am sure that I am missing something basic
Not something basic.
The term we normally use for enforcing a constraint, like uniqueness, across a set of entities is set validation. Greg Young calls your attention to a specific question:
What is the business impact of having a failure?
Most set constraints fall into one of two categories:
constraints that need to be true when the system reaches steady state, but may not hold while work is in progress. In business processes, these are often handled by detecting conflicts in the stored data, and then invoking various mitigation processes to resolve the conflict.
constraints that need to be true always.
The first category includes things like double booking a seat on an airplane; it's not necessarily a problem unless both people show up, and even then you can handle it by bumping someone to another seat, or another flight.
In these cases, you make a best effort - you look at a recent copy of the set, make sure there are no conflicts there, then hope for the best (accepting that some percentage of the time, you'll have missed a change).
See Memories, Guesses and Apologies (Pat Helland, 2007).
The second category is the hard one; to ensure the invariant holds, you have to lock the entire set so that races don't allow two different writers to insert conflicting information.
Relational databases tend to be really good at set validation - putting the entire set into a single database is going to be the right answer (note the assumption that the set is small enough to fit into a single database -- trying to lock two databases at the same time is hard).
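As a concrete illustration of that point, here is a minimal sketch using SQLite (the table and column names are invented for the example): the UNIQUE constraint closes the race that a repository-level nameExists check leaves open, because the database enforces the set constraint atomically.

```python
import sqlite3

# In-memory database standing in for the persistence layer.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entity (id INTEGER PRIMARY KEY, name TEXT UNIQUE)")

def create_entity(name):
    # Two concurrent writers cannot both succeed with the same name:
    # the constraint rejects the second insert.
    try:
        conn.execute("INSERT INTO entity (name) VALUES (?)", (name,))
        conn.commit()
        return True
    except sqlite3.IntegrityError:
        return False

assert create_entity("alpha") is True
assert create_entity("alpha") is False  # duplicate rejected by the constraint
```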
Another possibility is to ensure that only one writer can update the set at any given time -- you don't have to worry about losing a race when you are the only one running in it.
Sometimes you can lock a smaller set -- imagine, for example, having a collection of locks with numbers, and the hash code for the name tells you which lock you have to grab.
The simplest version of this is when you can use the name itself as the aggregate identifier.
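The lock-striping idea above can be sketched in a few lines (the names try_reserve and the in-memory set are invented for illustration; a real system would hold the set in storage):

```python
import threading

N_LOCKS = 16
# A fixed pool of locks; a name always hashes to the same lock, so two
# writers competing for the same name serialize, while writers of
# unrelated names usually proceed in parallel.
locks = [threading.Lock() for _ in range(N_LOCKS)]
names = set()  # stands in for the stored set of names

def try_reserve(name):
    lock = locks[hash(name) % N_LOCKS]
    with lock:
        if name in names:
            return False  # lost the race, or name already taken
        names.add(name)
        return True
```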
if one uses Event Sourcing or a persistence layer that does not have unique constraints?
Sometimes, you introduce a persistent store dedicated to the set, just to ensure that you can maintain the invariant. See "microservices".
But if you can't change the database, and you can't use a database with the locking guarantees that you need, and the business absolutely has to have the set valid at all times... then you single thread that part of the work.
Everybody that wants to change a name puts a request into a queue, and the one thread responsible for managing the invariant certifies each and every change.
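A minimal sketch of that single-threaded certifier, assuming an in-memory queue (in a real deployment the queue and the set would be durable):

```python
import queue
import threading

requests = queue.Queue()
names = set()

def certifier():
    # The one thread that owns the set; every change goes through it,
    # so no race on the uniqueness invariant is possible.
    while True:
        name, reply = requests.get()
        if name is None:  # shutdown signal
            break
        if name in names:
            reply.put(False)
        else:
            names.add(name)
            reply.put(True)

worker = threading.Thread(target=certifier)
worker.start()

def request_name(name):
    # Callers enqueue a request and block until the certifier answers.
    reply = queue.Queue(maxsize=1)
    requests.put((name, reply))
    return reply.get()
```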
There's no magic; just hard work and trade offs.
I am currently working on a mobile application that uses the OneNote REST API, which sometimes has really high latency, so caching is one of the most important factors in my application's performance. To implement a traffic-efficient cache whose data stays up to date, a timestamp is needed for every entity that is not stable or whose count can grow (practically any entity fits these conditions). So, the question is whether timestamp properties (e.g. lastModified, lastModifiedTime, etc.) are genuinely absent from some entities, for example permissions or principal objects, or whether they are just hidden and it is possible to use $expand to get them.
If you don't see a timestamp in one of the entities, then the entity does not have it.
Adding timestamps to some of these entities can be challenging for us, as the datastore we're using might not have them, but I encourage you to suggest it on our UserVoice page:
https://onenote.uservoice.com/forums/245490-onenote-developer-apis
Here is an example used to explain associations in UML:
A person works for a company; a company has a number of offices.
But I am unable to understand the relationship between Person, Company, and Office classes. My understanding is:
a company consists of many persons as employees, but these classes exist independently, so that is a simple association with 0..* multiplicity on the Person class's end
a company has many offices, and those offices would not exist if there were no company, so that is a composition with Company as the parent class and 0..* multiplicity on the Office class's end.
But I am not sure about the 2nd point. Please correct me if I am wrong.
Thank you.
Why use composition or aggregation in this situation at all? The UML spec leaves the meaning of aggregation to the modeler. What do you want it to mean to your audience? And the meaning of composition is probably too strong for this situation. Thus, why use it here? I recommend you use a simple association.
If I were you, I would stay truer to the problem domain. In the world I know, Offices don't cease to exist when a Company goes out of business. Rather, a Company occupies some number of Offices for some limited period of time. If a Company goes out of business, the Offices get sold or leased to some other Company. The Offices are not burned to the ground.
If you aren't true to the problem domain in an application, then the shortcuts you take will become invalid when the customer "changes the requirements" for that application. The problem domain doesn't actually change much, just the shortcuts you are allowed to take. If you take shortcuts to satisfy requirements in a way that are misaligned with the problem domain, it is expensive to adjust the application. Your customer becomes unhappy and you wind up working overtime. Save yourself and everyone the trouble!
While Jim's answer is correct, I want to add some extra information. There are two main uses for aggregation:
Memory management
Database management
In the first case it gives a hint about how long objects shall live. This is directly related to memory usage. If the target language (like most modern languages) uses a garbage collector, you can simply ignore this model information.
In the second case, it's only partially a memory question. A composite aggregation in a database indicates that the aggregated elements need to be deleted along with the aggregating element. This is less a memory issue than, in most cases, a security issue. So here you have to think twice.
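That database reading of composite aggregation can be sketched with SQLite (the company/office tables are invented to match the question's example): deleting the aggregating row removes the aggregated rows with it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite needs this enabled explicitly
conn.execute("CREATE TABLE company (id INTEGER PRIMARY KEY)")
conn.execute("""
    CREATE TABLE office (
        id INTEGER PRIMARY KEY,
        company_id INTEGER REFERENCES company(id) ON DELETE CASCADE
    )
""")
conn.execute("INSERT INTO company (id) VALUES (1)")
conn.execute("INSERT INTO office (id, company_id) VALUES (10, 1), (11, 1)")

# Composite aggregation: deleting the company deletes its offices too.
conn.execute("DELETE FROM company WHERE id = 1")
remaining = conn.execute("SELECT COUNT(*) FROM office").fetchone()[0]
assert remaining == 0
```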
A shared aggregation however has a very esoteric meaning in all cases.
When representing data models through a RESTful interface, it is common practice to create top-level endpoints that correspond to a type/group of objects:
/users
/cars
We can reuse HTTP verbs to enable actions on these groups (GET to list, POST to create, etc.). And when representing a model with a "dependency" (meaning it can't exist without a "parent"), we can create deeper endpoints to represent that dependency relationship:
/users/[:id]/tokens
In this case, it makes sense to not have a top-level endpoint of /tokens, as they shouldn't be able to exist without the user.
Many-to-many relationships get a bit more tricky. If two models can have a many-to-many relationship but can also truly exist on their own, it makes sense to give both objects a top-level endpoint and a deeper endpoint for defining that relationship:
/users
/cars
/users/[:id]/cars
/cars/[:id]/users
We can then use PUT and DELETE methods to define those relationships through an HTTP interface: PUT /users/[:user_id]/cars/[:car_id]. It makes sense that running that PUT operation would create a data-model that somehow links the two objects (like a join table in a relational DB).
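A minimal in-memory sketch of that link resource (the function names are invented for illustration; a real service would map them onto the HTTP routes above):

```python
# user_cars acts like the join table behind
# PUT/DELETE /users/[:user_id]/cars/[:car_id].
user_cars = set()

def put_user_car(user_id, car_id):
    # Idempotent, like PUT: linking twice leaves exactly one link.
    user_cars.add((user_id, car_id))

def delete_user_car(user_id, car_id):
    user_cars.discard((user_id, car_id))

def get_user_cars(user_id):
    # GET /users/[:user_id]/cars -> list of linked car ids.
    return sorted(car for (uid, car) in user_cars if uid == user_id)

put_user_car(1, "car-a")
put_user_car(1, "car-b")
put_user_car(1, "car-a")  # idempotent repeat
assert get_user_cars(1) == ["car-a", "car-b"]
```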
The tricky part, then, becomes deciding on where to limit the interface to combat redundancy.
Do you allow a GET request to the second-level deep endpoints (GET /users/[:user_id]/cars/[:car_id])? Or do you require that they access the "car" from the top level GET /cars/[:id]?
Now, what if the many-to-many relationship contains meta information? How do you represent that and where do you return it?
For example, what if we wanted to keep track of how many times a user drove a certain car? Where would we return that information? If we return it at the nested endpoint, are we violating REST or being inconsistent if we return the meta information and not the resource? Do we embed the meta information in the requested resource through an attribute of some kind?
Pls advise. :P (but really, thanks)
This is really more of a personal design preference at this point IMHO.
I would personally opt for stopping at /users/[:user_id]/cars/ and then requiring a call to /cars/[:car_id] to get the car information.
If you are including relation specific metadata though, like "how many times a user drove a certain car?" it would make sense to keep that under a deeper relationship like /users/[:user_id]/cars/[:car_id].
Truthfully it's not an exact science. You have to do what is simplest, most expressive, yet still powerful enough for your data model.
You could create a new resource. Something like users/[:user_id]/cars/[:car_id]/stats, whose response includes {drivings_count: 123}. You'd probably only allow a GET of this resource.
I have a node type with a string property that will very often have the same value, e.g. millions of nodes with only 5 possible values for that string. I will be doing searches by that property.
My question would be what is better in terms of performance and memory:
a) Implement it as a node property and have lots of duplicates (and search using WHERE).
b) Implement it as 5 additional nodes, where all original nodes reference one of them (and search using additional MATCH).
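The two options might look like this in Cypher (a sketch only; the :Item label, status property, and :HAS_STATUS relationship type are invented for illustration):

```cypher
// Option (a): status stored as a node property, filtered with WHERE
// (fast only if an index exists on the property).
MATCH (n:Item)
WHERE n.status = 'COMPLETED'
RETURN n;

// Option (b): status modelled as 5 shared nodes, reached with an
// additional MATCH on a relationship, so no property scan is needed.
MATCH (n:Item)-[:HAS_STATUS]->(:Status {name: 'COMPLETED'})
RETURN n;
```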
Without knowing further details it's hard to give a general purpose answer.
From a performance perspective it's better to limit the search as early as possible. Even more beneficial if you do not have to look into properties for a traversal.
Given that, I assume it's better to move the lookup property into a separate node and use the value as the relationship type.
Use labels; this blog post is a good intro to this new Neo4j 2.0 feature:
Labels and Schema Indexes in Neo4j
I've thought about this problem a little as well. In my case, I had to represent state:
STARTED
IN_PROGRESS
SUBMITTED
COMPLETED
Overall, the Node + Relationship approach looks more appealing, in that only a single relationship reference needs to be maintained each time rather than a property string, and you don't need to scan an extra index that has to be maintained on the property (memory and performance would intuitively favor this approach).
Another advantage is that it easily supports a node being linked to multiple "special nodes". If you foresee a situation where this should be possible in your model, this is better than having to use a property array (and searching using "in").
In practice I found that the problem then became: how do you access these special nodes each time? Either you maintain some sort of constants reference holding the node IDs of these special nodes, so you can jump right to them in your START statement (this is what we do), or you search against a property of the special node each time (name, perhaps) and then traverse down its relationships. This doesn't make for the prettiest of Cypher queries.