I'm in this situation.
I have a cube that has data aggregated by day.
This cube has a dimension with a hierarchy (year -> month -> day).
Every day of the last 4 months is visible in this hierarchy.
My problem is counting the selected days, because I need to do calculations based on the number of days observed (in a pivot).
Example:
There is data (amount orders) for the day 11.11.2013, but in the hierarchy I select the days from 10.11.2013 to 20.11.2013.
How do I calculate (amount orders) / 11?
11 is the number of days from the 10th to the 20th.
See my answer at MDX average fact items count by time dimension.
Having the day count as a measure in the cube would automatically apply all filters etc. without you having to take care of it.
You can use the Count() function to calculate the number of members in a set. Your set can consist of a range of dates if you use a colon between start and end date. Something like this:
Count({[Date].[Day].&[2013]&[11]&[10] : [Date].[Day].&[2013]&[11]&[20]})
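As a sanity check outside the cube, the same arithmetic can be sketched in Python. This is only an illustration: the order amount of 550 is a made-up figure, and days_in_range is a hypothetical helper. The point is that the divisor comes from the inclusive date range (11 days), not from the number of days that actually have data.

```python
from datetime import date


def days_in_range(start: date, end: date) -> int:
    """Number of calendar days from start to end, both ends inclusive."""
    return (end - start).days + 1


# The range selected in the question: 10.11.2013 to 20.11.2013.
selected_days = days_in_range(date(2013, 11, 10), date(2013, 11, 20))

# Hypothetical total of the measure over the selection.
amount_orders = 550.0

average_per_day = amount_orders / selected_days
print(selected_days, average_per_day)
```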
I am trying to create a simple revenue per person calc that works with different filters within the data. I have it working for a single record, however, it breaks and aggregates incorrectly with multiple records.
The formula I have now is simply Sum([Revenue]) / Sum([Attendance]). This works when I only have a single event selected. However, as soon as I select multiple shows it aggregates and doesn't do the weighted avg.
I'm making some assumptions here, but hopefully this will help you out. I've created an .xlsx file with the following data:
Event Revenue Attendance
Event 1 63761 6685
Event 2 24065 3613
Event 3 69325 4635
Event 4 41996 5414
Inside Tableau I've created the calculated column for Rev Per Person.
Finally, in the Analysis dropdown I've enabled Show Column Grand Totals. This gives me the following:
Simple Fix
The problem is that all of the column totals are being calculated using the SUM aggregation. This is the desired behavior for Revenue and Attendance, but for Rev Per Person, you want to display the average.
In Analysis / Totals / Total All Using you can configure the default aggregation. We don't want to change it for all of the totals here, but it's useful to know. Leave that setting where it is, and instead click on the Rev Per Person Grand Total value and change it from 'Automatic' to 'Average'.
Now you'll see a number much closer to the expected.
But it's not exactly what you expect. The average of all the Rev Per Person values gives us $9.73; but if you take the total Revenue / total Attendance you'd expect a value of $9.79.
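The gap between the two numbers can be reproduced outside Tableau. The following Python sketch uses the four events from the table above and compares the average of the per-event ratios with the ratio of the totals; the variable names are only illustrative.

```python
# (revenue, attendance) per event, taken from the sample data above.
events = {
    "Event 1": (63761, 6685),
    "Event 2": (24065, 3613),
    "Event 3": (69325, 4635),
    "Event 4": (41996, 5414),
}

# Average of the per-event Rev Per Person values (what 'Average' totals show).
per_event = [revenue / attendance for revenue, attendance in events.values()]
average_of_ratios = sum(per_event) / len(per_event)

# Total revenue divided by total attendance (the attendance-weighted figure).
total_revenue = sum(revenue for revenue, _ in events.values())
total_attendance = sum(attendance for _, attendance in events.values())
ratio_of_totals = total_revenue / total_attendance

print(f"{average_of_ratios:.2f}")  # 9.73
print(f"{ratio_of_totals:.2f}")    # 9.79
```

The second figure weights each event by its attendance, which is why it differs from the plain average.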
Slightly More Involved Fix
First, undo the simple fix. We'll keep all of the totals at 'Automatic'. Instead, we'll modify the Rev Per Person calculation.
IF Size() > 1 THEN
// Regular View
SUM([Revenue]/[Attendance])
ELSE
// Grand Total
SUM([Revenue])/SUM([Attendance])
END
Size() is being used to determine if the calculation is being done for an individual cell or not.
More information on Size() and similar functions can be found on Tableau's website here - https://onlinehelp.tableau.com/current/pro/desktop/en-us/functions_functions_tablecalculation.html
Now I see the expected value of $9.79.
I have researched this problem and have found the answer for a single query, where you can find the nth value of a single column by ordering it descending and using OFFSET 2. What I am trying to do is find the nth value for each group of rows. For example, I'm working with a database of bike share data. The database stores the duration and the date of each trip. I'm trying to find the 3rd longest duration for each day in the database. If I were going to find the max duration, I would use the following code.
SELECT DATE(start_date) trip_date, MAX(duration)
FROM trips
GROUP BY 1
I want the output to be something like this.
Date 3rd_duration
1/1/2017 334
1/2/2017 587
etc
If the value of the third longest duration is the same for two or more different trips, I would like the trip with the lowest trip_id to be ranked 3rd.
I'm working in SQLite.
Any help would be appreciated.
SQLite before version 3.25 and MySQL before version 8.0 have no built-in ROW_NUMBER function, so get ready for an ugly query. We can still work per date, but to find the third longest duration we can use a correlated subquery that counts, for each trip, how many trips on the same date lasted at least as long.
SELECT
    DATE(t1.start_date) AS start_date,
    t1.duration
FROM trips t1
WHERE
    (SELECT COUNT(*) FROM trips t2
     WHERE DATE(t2.start_date) = DATE(t1.start_date) AND
           t2.duration >= t1.duration) = 3;
Note that this approach might break down if, for a given date, more than one record can have the same duration. In that case you might get multiple results, or none at all, and a returned row might not actually be the third highest duration. In order to handle such ties, you should tell us what the logic is with regard to ties.
Demo here:
Rextester
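For the tie rule stated in the question (lowest trip_id wins), one possible variant is sketched below using Python's built-in sqlite3 module. It assumes a trip_id column and ISO-formatted start_date strings; the sample rows are made up. Instead of counting, the correlated subquery orders each day's trips by duration descending, then trip_id ascending, and picks the third row with LIMIT 1 OFFSET 2.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE trips (trip_id INTEGER PRIMARY KEY, start_date TEXT, duration INTEGER)"
)
# Hypothetical sample data; trips 3 and 4 tie for third place on 2017-01-01.
conn.executemany(
    "INSERT INTO trips VALUES (?, ?, ?)",
    [
        (1, "2017-01-01 08:00:00", 100),
        (2, "2017-01-01 09:00:00", 500),
        (3, "2017-01-01 10:00:00", 334),
        (4, "2017-01-01 11:00:00", 334),
        (5, "2017-01-01 12:00:00", 900),
        (6, "2017-01-02 08:00:00", 587),
        (7, "2017-01-02 09:00:00", 700),
        (8, "2017-01-02 10:00:00", 800),
        (9, "2017-01-03 08:00:00", 250),
        (10, "2017-01-03 09:00:00", 300),
    ],
)

QUERY = """
SELECT d.trip_date, t3.trip_id, t3.duration
FROM (SELECT DISTINCT DATE(start_date) AS trip_date FROM trips) AS d
JOIN trips AS t3
  ON t3.trip_id = (
      SELECT t.trip_id
      FROM trips AS t
      WHERE DATE(t.start_date) = d.trip_date
      ORDER BY t.duration DESC, t.trip_id ASC  -- ties go to the lowest trip_id
      LIMIT 1 OFFSET 2                         -- skip two rows, take the third
  )
ORDER BY d.trip_date
"""

rows = conn.execute(QUERY).fetchall()
print(rows)  # [('2017-01-01', 3, 334), ('2017-01-02', 6, 587)]
```

Days with fewer than three trips (2017-01-03 in the sample) produce no row, because the subquery returns NULL and the join finds no match.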
I've got a table report with product categories on the rows and years on the columns. In the values section, I want to show the number of sales. This works fine. But now I also want to show the % of column total for the product categories.
I use DAX:
Measure := count(factSales[salesnr])/calculate(count(factSales[salesnr]);all(factSales))
But this yields the percentage of the grand total over all years. I want the percentage of the column total for each separate year.
The ALL(Table) function tells Power Pivot to ignore any filters applied to the whole table. Therefore, you're telling Power Pivot to count all the rows of the factSales table regardless of the Category or Year being filtered on in the pivot table.
However, in your case, what you want is the count for ALL the categories in each year. Since you want the count over ALL the categories, you must use ALL(factsales[categories]). This way, you're ignoring only the filters on the categories and not the filters on the years.
Based on the previous explanation the dax formula would be:
Measure :=
count(factSales[salesnr]) / calculate(count(factSales[salesnr]);all(factsales[categories]))
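To make the difference concrete, here is a small Python sketch of the same "percent of column total" logic. The sales list is invented for illustration. The denominator is the count per year (category filter removed, year filter kept), which is the role ALL(factsales[categories]) plays in the measure.

```python
from collections import Counter

# Hypothetical fact rows: one (year, category) pair per sale.
sales = [
    (2012, "Bikes"), (2012, "Bikes"), (2012, "Helmets"), (2012, "Locks"),
    (2013, "Bikes"), (2013, "Helmets"),
]

# Numerator: sales per (year, category) cell of the pivot table.
count_by_cell = Counter(sales)

# Denominator: sales per year, ignoring the category.
count_by_year = Counter(year for year, _ in sales)

pct_of_column = {
    (year, category): n / count_by_year[year]
    for (year, category), n in count_by_cell.items()
}

for cell, share in sorted(pct_of_column.items()):
    print(cell, f"{share:.0%}")
```

Within each year the shares add up to 100%, which is what distinguishes a column total from the grand total.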
I have a dataset that has 59k entries recorded over 63 years, I need to identify clusters of events with the criteria being:
6 or more events within 6 hours
Each event has a unique ID, a time (HH:MM:SS) and a date (DD:MM:YY). The output would ideally have a cluster ID, the events that took place within each cluster, and the start and finish time and date.
Thinking about the problem in R, we would need to look at every date/time and count the number of events in the following 6 hours. If the number is 6 or greater, save the event IDs; if not, move on to the next date and perform the same task. I have taken a data extract that just contains EventID, Date, Time and Year.
https://dl.dropboxusercontent.com/u/16400709/StackOverflow/DataStack.csv
If I come up with anything in the meantime I will post below.
Update: Having taken a break to think about the problem I have a new approach.
Add 6 hours to the Date/Time of each event, then count the number of events that fall between that start and end time. If there are 6 or more, take the event IDs and assign them a cluster ID. Then move on to the next event and repeat 59k times as a loop.
Don't use clustering. It's the wrong tool. And the wrong term. You are not looking for abstract "clusters", but something much simpler and much more well defined. In particular, your data is 1 dimensional, which makes things a lot easier than the multivariate case omnipresent in clustering.
Instead, sort your data and use a sliding window.
If your data is sorted, and time[x+5] - time[x] < 6 hours, then the six events x through x+5 satisfy your condition.
Sorting is O(n log n), but highly optimized. The remainder is O(n) in a single pass over your data. This will beat every single clustering algorithm, because they don't exploit your data characteristics.
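A minimal Python sketch of the sort-and-slide idea follows. The function name find_bursts, its parameters and the sample events are hypothetical. Merging overlapping qualifying windows into one group is an assumption of this sketch, since the question does not say how overlapping groups should be reported.

```python
from datetime import datetime, timedelta


def find_bursts(events, min_events=6, window=timedelta(hours=6)):
    """events: iterable of (event_id, datetime) pairs, in any order.

    Returns one dict per group of events in which some run of `min_events`
    consecutive events spans less than `window`.
    """
    ordered = sorted(events, key=lambda e: e[1])
    ranges = []  # [first_index, last_index] of each merged group
    for i in range(len(ordered) - min_events + 1):
        j = i + min_events - 1
        # The test from the answer: time[x+5] - time[x] < 6 hours.
        if ordered[j][1] - ordered[i][1] < window:
            if ranges and i <= ranges[-1][1]:
                ranges[-1][1] = j      # overlaps the previous group: extend it
            else:
                ranges.append([i, j])  # start a new group
    return [
        {
            "cluster_id": n,
            "event_ids": [e[0] for e in ordered[a:b + 1]],
            "start": ordered[a][1],
            "end": ordered[b][1],
        }
        for n, (a, b) in enumerate(ranges, start=1)
    ]


base = datetime(2000, 1, 1)
sample = [(8, base + timedelta(days=2))]                            # isolated
sample += [(k + 1, base + timedelta(hours=k)) for k in range(6)]    # ids 1..6
sample += [(7, base + timedelta(hours=5, minutes=30))]              # joins the run
sample += [(9, base + timedelta(days=2, hours=7))]                  # isolated

bursts = find_bursts(sample)
print(bursts)
```

In the sample, events 1 to 7 form a single group running from 00:00 to 05:30, and events 8 and 9 are left out.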
I have a data set in BIRT Designer with two columns, one with day of week abbreviation names (Su, M, Tu, etc.) and the other with numerical representations of those days of the week starting at 0 and going to 6 (0, 1, 2, etc.). I want to determine what percentage of the total number of rows that each day of week represents. For example, if I have 100 total rows and 12 of those rows correspond to Su/0, 12% of the total rows are made up of Su.
I would like to perform this same calculation within BIRT and graph (bar graph) those percentages that each day consists of out of the total. I'm just learning how to use BIRT and assume that I need to do some scripting either when making my data set or when specifying the rows when making the chart. Any tips would be greatly appreciated.
Use computed columns.
Edit Data set > Computed Columns
The simplest way is to add one column that counts every row and, for each day of the week, a separate column that adds to a count only if the day of the week is a specific value:
// Evaluates to 1 for Sunday rows and 0 otherwise, so summing this column counts the Sundays
row["Day"] == "Su" ? 1 : 0
I should add that you can use a 'data' element in your table to compute the percentage. A 'Dynamic Text' item could also be used, but the data item gives you a bound value that you can make better use of later if needed.
Edit
To get a total row count, use a computed column; I named mine 'All'.
For the Expression use the value "1"
With some inspiration from James Jenkins I think I found my answer. It was pretty simple in the end, but all I needed to do was make a new computed column and instead of adding an expression, I simply set the Aggregation to "COUNT". That counts all of the rows in your table and puts that total on each row. That way you can use that total in any calculations that you may need to do. I have added a screenshot for clarity.
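Outside BIRT, the calculation the two computed columns perform can be sketched in a few lines of Python. The rows below are made up to match the example in the question (100 rows, 12 of them "Su"); total_rows plays the role of the COUNT aggregation and the per-day count is the sum of a 1/0 indicator.

```python
# Hypothetical data set: 100 rows of day-of-week abbreviations.
days = ["Su"] * 12 + ["M"] * 30 + ["Tu"] * 58

# Equivalent of the computed column with Aggregation set to COUNT.
total_rows = len(days)

# Equivalent of summing a per-day indicator column (1 if the day matches, else 0).
percent_by_day = {
    day: 100 * sum(1 if d == day else 0 for d in days) / total_rows
    for day in dict.fromkeys(days)  # distinct days, in first-seen order
}

print(percent_by_day)
```

These per-day percentages are the values a bar chart would plot.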