Mondrian: Help picking the right aggregator for Measures? - aggregate-functions

Some of the measures I calculate don't roll up the same way as something like sales or revenue. To calculate sales for a quarter, you sum the sales entries for each month falling in that quarter, so in the schema you would simply declare the aggregator for the Sales measure as sum.
Consider a table of employment entries: if an employee was employed in a given month, there is an entry in the table for that employee and month. We want to know the head count by month, quarter, or year. A measure like head count doesn't make sense to sum in the same way. The head count for a quarter is the head count at the end of the last month in that quarter. The head count for Q1 isn't the sum of Jan, Feb, and Mar; it's simply the head count on Mar 31. However, I don't see any choice among the available aggregators that would let me specify that.
Everything works fine for head count at the lowest time level, month, but summing it up to the quarter or year doesn't make sense.
So how is something like head count handled, given that the facts used to calculate it probably carry lots of other dimensions? Head count needs to be rolled up on some dimensions, but it can't be summed across all dated entries. I'm looking to encapsulate this logic in the schema so that users don't have to add extra filters every time they define a report using head count.
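To make the intended roll-up concrete, here is a minimal Python sketch of the behaviour described in the question (often called a semi-additive measure): along the time dimension the quarter takes the value of its last month instead of a sum. It is not Mondrian schema syntax; the monthly figures and the function names are made up for illustration.

```python
# Hypothetical monthly head counts, keyed by (year, month).
monthly_head_count = {
    (2013, 1): 100,
    (2013, 2): 104,
    (2013, 3): 98,
    (2013, 4): 101,
    (2013, 5): 103,
    (2013, 6): 107,
}


def quarter_of(month):
    """Return the quarter (1-4) that a month number (1-12) falls in."""
    return (month - 1) // 3 + 1


def head_count_by_quarter(monthly):
    """Roll up along time by keeping the value of the last month in each quarter."""
    latest = {}  # (year, quarter) -> (month, head count)
    for (year, month), count in monthly.items():
        key = (year, quarter_of(month))
        if key not in latest or month > latest[key][0]:
            latest[key] = (month, count)
    return {key: count for key, (_, count) in latest.items()}


quarterly = head_count_by_quarter(monthly_head_count)
print(quarterly)
```

A plain sum would report 302 for Q1 of this sample, while the closing-month rule reports the March value.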

No one attempted to answer this question because it was poorly asked, so I asked it again in what I think is a better form:
How to perform a Distinct Sum using MDX?

Related

How to generate matched-pairs based on dates?

I have a dataset that includes dates and count of reports. I am tasked with generating matched-pairs using these guidelines:
Reports will need to be matched to the week immediately prior to or following. (For example: Jan 23, 2000 will be matched with Jan 16, 2000 and Jan 30, 2000)
Holidays must not be included in the final matched-pairs generation.
I have been able to identify the holidays within the dataset but am still stuck on how to generate the matched pairs. Any advice would be much appreciated!
Example of the Data
I am making assumptions as I could not ask for clarifications.
Assumptions I made
a> You wanted a formula-based approach.
b> You wanted each date matched to the closest corresponding date in the previous week. For example, a Monday event should be matched to an event on the Monday of the previous week. Because the data set you gave shows multiple reports through the week, it was not clear which pattern of the previous week you wanted to match.
Solution based on Assumptions.
1> You can mathematically turn each date into a grouping by the week of the year it falls in, then match the groupings to one another. For example, 1/1/2003 would be 1.1 and a date such as 14/1/2003 would be 2.1.
You can then pattern match: if 1.1 pairs with 2.1, it's a match; if not, loop until you see an entry in the range 2.[0-9]. You can add an if statement to check whether the match falls on a holiday; if it does, continue the loop.
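As an alternative to the week-number encoding above, the same idea can be sketched with plain date arithmetic in Python: look exactly seven days before and after each date and skip anything flagged as a holiday. This swaps in a different technique from the one the answer describes; the function name and the sample holiday are assumptions made for illustration.

```python
from datetime import date, timedelta


def matched_pairs(report_dates, holidays):
    """Pair each non-holiday date with the dates exactly one week before and after it."""
    available = set(report_dates) - set(holidays)
    pairs = []
    for day in sorted(available):
        for offset in (-7, 7):
            candidate = day + timedelta(days=offset)
            if candidate in available:
                pairs.append((day, candidate))
    return pairs


dates = [date(2000, 1, 16), date(2000, 1, 23), date(2000, 1, 30)]

# Without holidays, Jan 23 is matched with both Jan 16 and Jan 30.
all_pairs = matched_pairs(dates, holidays=[])

# Treating Jan 30 as a holiday (purely hypothetical) removes it from every pair.
filtered_pairs = matched_pairs(dates, holidays=[date(2000, 1, 30)])
print(filtered_pairs)
```

Each pair appears once per direction, so a consumer that wants unordered pairs would need to deduplicate.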

Tableau Weighted Average Per Capita Calc not aggregating right

I am trying to create a simple revenue per person calc that works with different filters within the data. I have it working for a single record, however, it breaks and aggregates incorrectly with multiple records.
The formula I have now is simply Sum([Revenue]) / Sum([Attendance]). This works when I only have a single event selected. However, as soon as I select multiple shows it aggregates and doesn't do the weighted avg.
I'm making some assumptions here, but hopefully this will help you out. I've created an .xlsx file with the following data:
Event Revenue Attendance
Event 1 63761 6685
Event 2 24065 3613
Event 3 69325 4635
Event 4 41996 5414
Inside Tableau I've created the calculated column for Rev Per Person.
Finally, in the Analysis dropdown I've enabled Show Column Grand Totals. This gives me the following:
Simple Fix
The problem is that all of the column totals are being calculated using the SUM aggregation. This is the desired behavior for Revenue and Attendance, but for Rev Per Person, you want to display the average.
In Analysis / Totals / Total All Using you can configure the default aggregation. We don't want to set all of them here, but it's useful to know. Leave that where it is, and instead click on the Rev Per Person Grand Total value and change it from 'Automatic' to 'Average'.
Now you'll see a number much closer to the expected.
But it's not exactly what you expect. The average of all the Rev Per Person values gives us $9.73; but if you take the total Revenue / total Attendance you'd expect a value of $9.79.
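The gap between the two figures can be checked outside Tableau. This short Python sketch uses the four events from the table above and compares the plain average of the per-event ratios with the ratio of the totals, which is the attendance-weighted average; the variable names are only illustrative.

```python
# (event, revenue, attendance) rows taken from the sample data above.
events = [
    ("Event 1", 63761, 6685),
    ("Event 2", 24065, 3613),
    ("Event 3", 69325, 4635),
    ("Event 4", 41996, 5414),
]

# Average of the per-event ratios: every event counts equally.
ratios = [revenue / attendance for _, revenue, attendance in events]
mean_of_ratios = sum(ratios) / len(ratios)

# Ratio of the totals: every attendee counts equally (the weighted average).
total_revenue = sum(revenue for _, revenue, _ in events)
total_attendance = sum(attendance for _, _, attendance in events)
ratio_of_totals = total_revenue / total_attendance

print(round(mean_of_ratios, 2))   # 9.73
print(round(ratio_of_totals, 2))  # 9.79
```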
Slightly More Involved Fix
First, undo the simple fix so that all of the totals are back at 'Automatic'. Then modify the Rev Per Person calculation:
IF Size() > 1 THEN
// Grand Total
SUM([Revenue]/[Attendance])
ELSE
// Regular View
SUM([Revenue])/SUM([Attendance])
END
Size() is being used to determine if the calculation is being done for an individual cell or not.
More information on Size() and similar functions can be found on Tableau's website here - https://onlinehelp.tableau.com/current/pro/desktop/en-us/functions_functions_tablecalculation.html
Now I see the expected value of $9.79.

Dax for % of columntotal

I've got a table report with product category on the rows and years on the columns. In the values section, I want to show the number of sales. This works fine. But now I also want to show the % of column total for the product categories.
I use this DAX:
Measure := count(factSales[salesnr])/calculate(count(factSales[salesnr]);all(factSales))
But this yields the percentage of the grand total over all years. I want the percentage of the column total for each separate year.
The ALL(Table) function tells Power Pivot to ignore any filters applied to the whole table. So you're telling Power Pivot to count all the rows of the factSales table regardless of the Category or Year being filtered in the pivot table.
However, in your case, what you want is the total for ALL the categories within each year. To get that, you must use ALL(factsales[categories]). That way, you ignore only the filters on the categories and not the filters on the years.
Based on the previous explanation the dax formula would be:
Measure :=
count(factSales[salesnr]) / calculate(count(factSales[salesnr]);all(factsales[categories]))
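The difference between the two denominators can be illustrated outside Power Pivot. The following Python sketch uses a made-up list of sales rows and computes both percentages; it only mirrors the filter logic described above and is not DAX.

```python
from collections import Counter

# Hypothetical sales rows as (year, category); one row per sale.
sales = [
    (2012, "Bikes"), (2012, "Bikes"), (2012, "Bikes"), (2012, "Helmets"),
    (2013, "Bikes"), (2013, "Helmets"),
]

cell_counts = Counter(sales)                      # count per (year, category) cell
year_totals = Counter(year for year, _ in sales)  # category filter removed, year kept
grand_total = len(sales)                          # all filters removed

# Like ALL(factSales): every cell is divided by the same grand total.
pct_of_grand_total = {cell: n / grand_total for cell, n in cell_counts.items()}

# Like ALL(factsales[categories]): each cell is divided by its own year's total.
pct_of_column_total = {
    (year, category): n / year_totals[year]
    for (year, category), n in cell_counts.items()
}

print(pct_of_column_total)
```

With the per-year denominator, the percentages in each year column add up to 100%.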

how count the selected children of a hierarchy

I'm in this situation:
I have a cube with data aggregated by day.
The cube has a dimension with a hierarchy (year -> month -> day).
In this hierarchy, every day of the last 4 months is visible.
My problem is counting the selected days, because I need to do calculations based on the number of days observed (in a pivot).
Example:
There is data (amount orders) for the day 11/11/2013, but in the hierarchy I select the days from 10.11.2013 to 20.11.2013.
How do I calculate (amount orders) / 11?
11 is the number of days from the 10th to the 20th.
See my answer at MDX average fact items count by time dimension.
Having the day count as a measure in the cube would automatically apply all filters etc. without you having to take care of it.
You can use the Count() function to calculate the number of members in a set. Your set can consist of a range of dates if you use a colon between start and end date. Something like this:
Count({[Date].[Day].&[2013]&[11]&[10] : [Date].[Day].&[2013]&[11]&[20]})
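The expected result of that count can be sanity-checked with ordinary date arithmetic. This Python sketch counts the days in the inclusive range from the question and divides a made-up order amount by it; the amount and the function name are assumptions for illustration.

```python
from datetime import date


def days_in_range(start, end):
    """Number of days from start to end, counting both endpoints."""
    return (end - start).days + 1


selected_days = days_in_range(date(2013, 11, 10), date(2013, 11, 20))

order_amount = 550.0  # hypothetical total for the selected range
average_per_day = order_amount / selected_days

print(selected_days, average_per_day)
```

The range is inclusive on both ends, which is why the 10th to the 20th gives 11 days and not 10.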

Dates in SQLite3, with a twist (inaccurate dates)

I am working on genealogical software that stores its data in SQLite3 format. Everything works fine, except for one minor detail: the birth or death dates (etc.) are not always known to the exact day. So I have the following accuracies:
exact (YYYY-MM-DD)
month (YYYY-MM)
year (YYYY)
year (YYYY+/-5)
year (YYYY+/-10)
year (YYYY+/-50)
decade
century
Now, assuming I store everything in a single column, I end up with a problem. Since SQLite3 has the Julian Day function I was thinking to encode the accuracy in the fractional part of the REAL Julian Day (I don't need the hours anyway). That is fine, but it complicates the way SELECTs work, in fact it means that stuff I could otherwise offload to SQLite3 has to be implemented in application code.
What would be a reasonable method to store the inaccurate dates and be able to query them quickly?
Note: if it matters to anyone answering, the language used is Python, but I am asking in general.
When doing queries on those date values, the most common operation probably is to check whether a date might match another date.
For this, you always need the start and the end of the interval, so it would make sense to store these two values in the DB.
(Call them Start/End or Min/Max or Earliest/Latest or whatever makes sense.)
For example, to find people who might have been born one century ago:
... WHERE '1913-04-16' BETWEEN BirthDateMin AND BirthDateMax
Inequality comparisons can be done with one of the interval boundaries.
For example, to find people who might have been born more than one century ago:
... WHERE BirthDateMin < '1913-04-16'
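Since the question mentions Python, here is a minimal sketch of the interval approach using the standard sqlite3 module. The Person table and its sample rows are made up; only the BirthDateMin and BirthDateMax column names come from the queries above. Dates are stored as ISO text, which compares correctly as strings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Person (Name TEXT, BirthDateMin TEXT, BirthDateMax TEXT)"
)
conn.executemany(
    "INSERT INTO Person VALUES (?, ?, ?)",
    [
        ("exact day", "1913-04-16", "1913-04-16"),
        ("month only", "1913-04-01", "1913-04-30"),
        ("year only", "1912-01-01", "1912-12-31"),
        ("decade", "1910-01-01", "1919-12-31"),
    ],
)

# People who might have been born on the given day.
maybe_on_day = [
    row[0]
    for row in conn.execute(
        "SELECT Name FROM Person "
        "WHERE ? BETWEEN BirthDateMin AND BirthDateMax ORDER BY Name",
        ("1913-04-16",),
    )
]

# People who might have been born before the given day.
maybe_before = [
    row[0]
    for row in conn.execute(
        "SELECT Name FROM Person WHERE BirthDateMin < ? ORDER BY Name",
        ("1913-04-16",),
    )
]

print(maybe_on_day)
print(maybe_before)
```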
Just because you're storing date information doesn't mean that the built-in date type is the right one for you. Your data requirements (date inaccuracy) mean that it's probably more accurate and better long-term to do some custom date-handling work, and avoid using the built-in date data types.
Use two columns. One column is the approximate date, as accurate as possible, in SQLite format. The second column is the accuracy of the date in days. If the date is absolutely accurate, the second column is zero. If only the month is known, the date would be mid month and the second column 15 days. Etc. Date comparisons can be done by comparing against the date +/- the accuracy column.
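The two-column variant can be sketched the same way with Python's sqlite3 module and SQLite's julianday() function. The table name, column names, and sample rows here are assumptions chosen for illustration, with the accuracy stored as a number of days on either side of the approximate date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Person (Name TEXT, BirthDate TEXT, AccuracyDays INTEGER)"
)
conn.executemany(
    "INSERT INTO Person VALUES (?, ?, ?)",
    [
        ("exact", "1913-04-16", 0),   # known to the day
        ("month", "1913-04-15", 15),  # mid month, +/- 15 days
        ("year", "1950-07-01", 183),  # mid year, +/- about half a year
    ],
)


def possibly_born_on(day):
    """Names whose date +/- accuracy window contains the given ISO date."""
    query = (
        "SELECT Name FROM Person "
        "WHERE julianday(?) BETWEEN julianday(BirthDate) - AccuracyDays "
        "AND julianday(BirthDate) + AccuracyDays ORDER BY Name"
    )
    return [row[0] for row in conn.execute(query, (day,))]


print(possibly_born_on("1913-04-16"))
print(possibly_born_on("1913-04-01"))
```

Compared with storing explicit start and end dates, this layout computes the interval at query time, so an index on the date column helps less with range lookups.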

Resources