How do I define a dimension so that null values in the FK are not ignored when showing all values? - olap

I am modeling an OLAP cube using Mondrian Schema Workbench and using Jaspersoft to present it. The cube is built upon a fact table with FKs to dimension tables.
Currently my fact table has nullable foreign keys to the dimensions, which I personally find interesting (and, as far as I know, whether to use nullable or non-nullable FKs is just a styling decision: https://dba.stackexchange.com/questions/3512/fact-table-foreign-keys-null ).
The problem is that when selecting ALL States (State is a dimension in my design), I get only the records that have a state, not the records without states (in which the state id is null).
Is Mondrian capable of returning the rows that have no state id? How can I define that?

I think you'll have to go with non-nullable FKs and a none / n/a / unknown etc. member if you want the ALL member to refer to all facts.
If you later want to write queries that only consider rows with real dimension values, you can exclude the none member again.
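A minimal sketch of this "unknown member" pattern, using Python's sqlite3 module as a stand-in for the warehouse. The table and column names (dim_state, fact_sales, state_id) and the surrogate key -1 are assumptions made for illustration, not part of the question's schema, and the plain SQL join only imitates how a total computed through the dimension loses rows with a null FK.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_state (state_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE fact_sales (id INTEGER PRIMARY KEY, state_id INTEGER, amount REAL);
    INSERT INTO dim_state VALUES (1, 'State A'), (2, 'State B');
    INSERT INTO fact_sales (state_id, amount) VALUES (1, 10.0), (2, 20.0), (NULL, 5.0);
""")

TOTAL_SQL = """
    SELECT SUM(f.amount)
    FROM fact_sales f
    JOIN dim_state d ON d.state_id = f.state_id
"""

# A fact row with a NULL foreign key has no dimension row to join to,
# so a total computed through the dimension drops it.
total_with_nulls = conn.execute(TOTAL_SQL).fetchone()[0]

# Add an explicit "Unknown" member (hypothetical key -1) and repoint the NULL facts.
conn.execute("INSERT INTO dim_state VALUES (-1, 'Unknown')")
conn.execute("UPDATE fact_sales SET state_id = -1 WHERE state_id IS NULL")
total_with_unknown = conn.execute(TOTAL_SQL).fetchone()[0]

# Queries that only want real states can exclude the member again.
total_known_only = conn.execute(TOTAL_SQL + " WHERE d.state_id <> -1").fetchone()[0]

print(total_with_nulls, total_with_unknown, total_known_only)
```

In a real load process the substitution would more likely happen in the ETL step, with the fact column then declared NOT NULL.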

Related

Can SQLite return default values for non-existent columns instead of error?

I know how to use IFNULL to get default values for non-existent rows or null values, but for creating queries that are compatible with older schema versions, it would be nice to be able to do this:
Schema v1: CREATE TABLE Employee (Name TEXT, Phone TEXT)
Schema v2: CREATE TABLE Employee (Name TEXT, Phone TEXT, Address TEXT)
Theoretical backward compatible query:
SELECT Name, Phone, IFNULL(Address, '') FROM Employee
Obviously this doesn't work for a file created with schema v1. Is there some way to do this though?
There are 2 alternative workflows, but both are rather annoying. Either 1) update the old db by adding missing columns (which would start with null values); or 2) build the query code dynamically based on schema version.
Create a temporary view that references a particular schema, substituting default values (or even transforming other data) for individual columns which differ between the base schemas.
SQLite views can even be made modifiable by defining appropriate triggers.
This still requires programming some conditional logic upon connection, but it would allow more uniform queries and interaction with different versions of the schema.
The suggested syntax might be convenient in some limited cases, but this approach is much more useful: it goes beyond simple "if column exists" checks and can dynamically transform one schema into another, for example by joining tables or providing more advanced logic for updates across differing schemas.
Pseudo code mixed with view definitions to demonstrate:
db <- Open database connection
db_schema <- determine schema version
If db_schema == 1 Then
    db.execute( "CREATE VIEW temp.EmployeeX AS
                 SELECT Name, Phone, '' AS Address
                 FROM main.Employee;" )
Else If db_schema == 2 Then
    db.execute( "CREATE VIEW temp.EmployeeX AS
                 SELECT Name, Phone, Address
                 FROM main.Employee;" )
End If

# Later in code
data <- db.getdata( "SELECT Name, Address
                     FROM EmployeeX" )
If you're really averse to conditional statements for the schema this may still be annoying, but it would at least reduce/eliminate conditional statements throughout the code--ideally occurring as part of the connection logic at one location in the code.
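The temporary-view approach can be sketched as runnable code with Python's sqlite3 module. This is one possible reading, not the only one: instead of a stored schema version it detects the Address column with PRAGMA table_info, and the helper name add_employee_view is hypothetical. Only the Employee tables and the EmployeeX view name come from the discussion above.

```python
import sqlite3


def add_employee_view(conn):
    """Create a connection-local view that hides the v1/v2 schema difference."""
    # PRAGMA table_info yields one row per column; index 1 is the column name.
    columns = {row[1] for row in conn.execute("PRAGMA main.table_info(Employee)")}
    # Substitute a default for the column that older files do not have.
    address_expr = "Address" if "Address" in columns else "'' AS Address"
    # A TEMP view exists only for this connection and may reference main tables.
    conn.execute(
        "CREATE TEMP VIEW EmployeeX AS "
        f"SELECT Name, Phone, {address_expr} FROM main.Employee"
    )


# Schema v1 file: no Address column.
old_db = sqlite3.connect(":memory:")
old_db.execute("CREATE TABLE Employee (Name TEXT, Phone TEXT)")
old_db.execute("INSERT INTO Employee VALUES ('Ann', '555-0100')")
add_employee_view(old_db)

# Schema v2 file: Address column present.
new_db = sqlite3.connect(":memory:")
new_db.execute("CREATE TABLE Employee (Name TEXT, Phone TEXT, Address TEXT)")
new_db.execute("INSERT INTO Employee VALUES ('Bob', '555-0101', '1 Main St')")
add_employee_view(new_db)

# The same query now works against both versions.
QUERY = "SELECT Name, Address FROM EmployeeX"
rows_v1 = old_db.execute(QUERY).fetchall()
rows_v2 = new_db.execute(QUERY).fetchall()
print(rows_v1, rows_v2)
```

The conditional logic runs once per connection, and every later query targets EmployeeX regardless of which schema the file was created with.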
You might further notice that this pattern is really what object-oriented programming is supposed to solve. There's no mention of the language in the question, but a well-designed object model could be created in a similar fashion so that all database access is done through a unified interface. The implementation details for different schemas are internal to different objects that derive (i.e. implement interfaces and/or inherit from base class) from a basic set of interfaces. Consider the language you're using to see if the problem could be solved this way.

What is Android Room foreign key used for?

What exactly is Room's @ForeignKey used for?
I know that it is used for linking two tables, so that whenever some update happens to the parent it updates children as well. For example,
onDelete = ForeignKey.CASCADE
I suppose it's nothing more than the definition I gave in the second paragraph, right?
The reason I am asking is that in OrmLite, for example, when you define foreign = true you can join tables and have the foreign field filled with data. You cannot do this with Room's @ForeignKey.
Here is a detailed explanation of what foreign does in OrmLite.
Am I right?
FKs (foreign keys) are a relational database concept. A FK says table subrows appear elsewhere uniquely. Equivalently, a FK says entities that participate in a relation(ship)/association participate uniquely in another. Those statements are equivalent because in a relational database a table represents entities/values that participate together per a relation(ship)/association--hence "the Relational Model" & "the Entity-Relationship Model".
The FK graph can be used for convenience/shorthand: default join conditions; preventing updates to invalid states; cascading updates; getting a unique value associated with an entity in the other relation(ship)/association; simultaneously setting values in one relation(ship)/association and the other one. FKs are wrongly called "relationships" and don't have to be known to query. They must be known to ask for a single value associated with an entity, but we can always just ask for a set of values whether or not it might always only ever have one element.
FKs, CKs (candidate keys), PKs (primary keys) & superkeys (unique column/field sets) are special cases of constraints, which are just conditions that are always true in every database state & (equivalently) business situation. They are determined by the relation(ship)s/associations & the valid business situations that can arise. When we tell the DBMS about them, it can prevent an update to a state that must be invalid because it violates them.
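Room is built on SQLite, so the constraint behavior described here (rejecting updates to invalid states, and cascading deletes as in onDelete = ForeignKey.CASCADE) can be sketched directly against SQLite with Python's sqlite3 module. The owner and pet tables are hypothetical, and this shows only the database-level constraint, not Room's generated code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is enabled.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE owner (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE pet (
        id INTEGER PRIMARY KEY,
        name TEXT,
        owner_id INTEGER REFERENCES owner(id) ON DELETE CASCADE
    );
    INSERT INTO owner VALUES (1, 'Ann');
    INSERT INTO pet VALUES (10, 'Rex', 1);
""")

# The DBMS refuses a child row that points at a parent that does not exist.
try:
    conn.execute("INSERT INTO pet VALUES (11, 'Tom', 99)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# Deleting the parent removes its children because of ON DELETE CASCADE.
conn.execute("DELETE FROM owner WHERE id = 1")
remaining_pets = conn.execute("SELECT COUNT(*) FROM pet").fetchone()[0]

print(rejected, remaining_pets)
```

Nothing in the constraint fetches related rows; reading the owner of a pet still takes a join or a second query.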
What is the difference between an entity relationship model and a relational model?

Reason to use anything other than RecId as a clustered index

Is there any reason to use an index other than RecId (SurrogateKey in AX2012) as the clustered index?
Confirmed by a quick Google search (*), one should consider at least 4 criteria when deciding on clustered indexes:
Index must be unique.
Index must be narrow (As few fields as possible - since these would be copied to every other index).
Index must be static (As updating the index field value(s) will cause SQL server to physically move the record to a new location)
Index must be ordered (Ascending / Descending).
RecId adheres to all of the above, in a better way than any index you can create yourself. Any index you create yourself will violate at least the 2nd and/or the 4th, since it would automatically include DataAreaId.
What I think...
Could it be that the option to set this is just a legacy property from AX3.0 or lower, and that its use could be deprecated now?
*TechNet SQL Server Index Design Guide and Effective Clustered Indexes
While RecId is a good choice, you can make a shorter key on, say, an int field in a global table (SaveDataPerCompany = No).
Access patterns matter: if you often access your customers by account number, you might as well store the records in that order.
Also, if you only have one index, as is often the case for group and parameter tables, you are not penalized for having a longer key; it needs to be stored somewhere anyway.
See also What do Clustered and Non clustered index actually mean?

How does GAE datastore index null values

I'm concerned about read performance. I want to know whether setting an indexed field to null is faster than giving it a value.
I have lots of items with a status field. The status can be "pending", "invalid", "banned", etc.
My typical request is to find the status "ok" (or null). Since null fields are not saved to the datastore, replacing a "useless" default value with null is already a win: I use less disk space.
But I was wondering: since the datastore is NoSQL, it doesn't know about the data structure and it doesn't know there is a missing status column. So how does it perform the status = null check?
Does it have to check all columns of each row trying to find my column, or is there some smarter mechanism?
For example, is there an index entry like (null=Entity,key) when we pass a column explicitly saying it is null? (If so, does Objectify respect that and keep the field in the list it passes to the native API when it's null?)
And mainly, which request is more efficient?
The low-level API (and Objectify) stores and indexes nulls if you specify that a field/property should be indexed. For Objectify, you can specify @Ignore(IfNull.class) or @Unindex(IfNull.class) if you want to alter this behavior. You are probably confusing this with documentation for other data access APIs.
Since GAE only allows you to query for indexed fields, your question is really: Is it better to index nulls and query for them, or to query for everything and filter out non-null values?
This is purely a question of sparsity. If the overwhelming majority of your records contain null values, then you're probably better off querying for everything and filtering out the ones you don't want manually. A handful of extra entity reads are probably cheaper than updating and storing an extra index. On the other hand, if null records are a small percentage of your data, then you will certainly want the index.
This indexing dilemma is not unique to GAE. All databases present this question with respect to low-cardinality fields; it's just that they'll do the table scan (testing and skipping rows) for you.
If you really want to fine-tune this behavior, read Objectify's documentation on Partial Indexes.
null is also treated as a value in the datastore, and there will be entries for null values in indexes. The Datastore documentation says, "Datastore distinguishes between an entity that does not possess a property and one that possesses the property with a null value".
Datastore will never check all columns or all records. If you have this property indexed, it will get records from the index only. If it is not indexed, you cannot query by that property.
In terms of query performance, it should be the same, but you can always profile and check.
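The distinction between a missing property and a null-valued property can be illustrated with a toy in-memory index. This is only a model of the semantics described above, written in plain Python; it is not the Datastore API or its actual index implementation, and build_index and the sample entities are made up for the example.

```python
from collections import defaultdict

# Three hypothetical entities keyed by id.
entities = {
    "k1": {"status": "pending"},
    "k2": {"status": None},  # property present, value is null
    "k3": {},                # property absent altogether
}


def build_index(entities, prop):
    """Map each stored value of `prop` to the keys of entities holding it."""
    index = defaultdict(set)
    for key, props in entities.items():
        if prop in props:  # an absent property produces no index entry
            index[props[prop]].add(key)
    return index


status_index = build_index(entities, "status")

# A "status = null" query is a single index lookup, not a scan of all entities.
null_matches = status_index[None]
print(null_matches)
```

In this model the entity without the property never appears in the index at all, which is why it cannot be found by any query on that property.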

EF4: Filtering out referenced entities that do not exist

I have an Entity Framework 4 design that allows referenced tables to be deleted (no cascade delete) without modifying the entities pointing to them. So for example entity A has a foreign key reference to entity B in the ID field. B can be deleted (and there are no FK constraints in the database to stop that), so if I look at A.B.ID it is always a valid field (since all this does is return the ID field in A) even if there is no record B with that ID due to a previous deletion. This is by design, I don't want cascading deletes, I need the A records to stick around for a while for auditing purposes.
The problem is that filtering out the non-existing deleted records is not as easy as it sounds. So for example if I do this:
from c in A
select c.B.somefield;
This results in an OUTER JOIN in the generated SQL, so it picks up all the A records even if they refer to missing B records. So the hack I've been using to solve this (since I can't figure out a better way!) is to add a where clause that checks a string field in the referenced B record. If that field in the B entity is null, then I assume B doesn't exist.
from c in A
where c.B.somestringfield != null
select c.B.somefield;
This seems to work IF B.somestringfield is a string. If it is an integer, this doesn't work!
This is all such a hack to me. I've thought of a few solutions but they are just not practical:
Query all tables that reference B when a B is deleted and null out their foreign keys. This is so ugly, I don't want to have to remember to do this if I add another entity that references B in the future. Not to mention a huge performance delay resolving all the references whenever I delete something.
Add a string field to every table that I can count on being there that I can check to see if the entity exists. Blech, I don't want to add a database field just for this.
Implement a soft delete and keep all the referential integrity intact - essentially set up cascading deletes, but this is going to result in huge database bloat since I can't clean up a massive amount of records due to the references. No go.
I thought I had this problem licked with the "check if a field in the referenced entity is null" trick but it breaks under conditions that I don't completely understand (what if I don't have any strings in the referenced table? What kinds of fields will work? Integers won't.)
As an example if I have an integer field "count" in entity B and I check to see if it's null like:
from c in A
where c.B.count != null
select c.B.count;
I get a bunch of records with null for count mixed in with the results, and in fact the query bombs out with an "InvalidOperationException: The cast to value type 'Int32' failed because the materialized value is null. Either the result type's generic parameter or the query must use a nullable type."
So I need to do
from c in A
where c.B.count != null
select new { count = (int?)c.B.count };
to even see the null records. It is pretty baffling to me how that query can return null records at all.
I just discovered something, if I do an explicit join like this, the SQL is INNER JOIN and everything works great:
from c in A
join j in B on c.B.ID equals j.ID
select c;
But this sucks. I'll have to modify a ton of queries to add explicit join clauses instead of enjoying the convenience of the relationship fields I get with the EF. Kinda defeats the purpose and adds a bunch more code to maintain.
Your first code snippet creates an OUTER JOIN because B is an optional navigation property of entity A. For a required navigation property EF would create an INNER JOIN (explained in more detail here: https://stackoverflow.com/a/7640489/270591).
So, the only alternative I see to your last code snippet (using explicit join in LINQ) - aside from using direct SQL - is to make your navigation property required.
This is still a very ugly hack in my opinion, and it might behave unexpectedly in other situations. By marking a navigation property as required or optional, EF attaches a "semantic meaning" to the relationship: if there is a foreign key != NULL, there must be a related entity, and EF expects that you have not removed the enforcement of the FK constraint in the database.
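The difference between the two join types on orphaned references can be reproduced at the SQL level. The sketch below uses Python's sqlite3 module with hypothetical tables A and B (B_ID and item_count are illustrative column names) and no FK constraint, mirroring the design in the question. It shows only what the database returns, not the SQL that EF actually generates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE B (ID INTEGER PRIMARY KEY, item_count INTEGER NOT NULL);
    -- No FK constraint on B_ID, so A rows may point at deleted B rows.
    CREATE TABLE A (ID INTEGER PRIMARY KEY, B_ID INTEGER);
    INSERT INTO B VALUES (1, 7);
    INSERT INTO A VALUES (100, 1), (101, 2);
""")

# Optional relationship: the outer join keeps the orphaned A row and
# fills B's columns with NULL, even though item_count is NOT NULL in B.
outer_rows = conn.execute(
    "SELECT A.ID, B.item_count FROM A "
    "LEFT OUTER JOIN B ON B.ID = A.B_ID ORDER BY A.ID"
).fetchall()

# Required relationship: the inner join drops A rows without a matching B.
inner_rows = conn.execute(
    "SELECT A.ID, B.item_count FROM A "
    "INNER JOIN B ON B.ID = A.B_ID ORDER BY A.ID"
).fetchall()

print(outer_rows)
print(inner_rows)
```

The NULL that the outer join produces for a non-nullable integer column is consistent with the cast error quoted in the question.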
