How can I aggregate by group over an aggregate in Tableau? - aggregate-functions

I'm trying to visualize the median profit as a proportion of sales for each day of the week. My data looks like this:
Date Category Profit Sales State
1/1 Book 3 6 NY
1/1 Toys 12 30 CA
1/2 Games 9 20 NY
1/2 Books 5 10 WA
I've created a calculated field "Profit_Prop" as SUM([Profit])/SUM([Sales]). I want to display the median daily value of profit_prop for Mondays, Tuesdays, etc.
I can kind of do this as a boxplot by adding WEEKDAY(Date) to Columns and Profit_Prop to Rows, then adding Date to Detail and changing granularity to Exact Date. But I just want to display the median without displaying a data point for each day.
I tried making another calculated field with MEDIAN([Profit_prop]), but I get "argument to MEDIAN is already an aggregation and cannot be further aggregated."

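Remove Date from the level of detail (the Detail shelf).
Then create a calculated field like the one below and use it instead of Profit_Prop:
MEDIAN(
    { INCLUDE [Date] :
        [Profit_Prop]
    }
)
The INCLUDE expression computes Profit_Prop once per date, and MEDIAN then aggregates those daily values.
Let me know how it goes.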
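To check the numbers outside Tableau, the logic of that calculation (one ratio per date, then the median of those ratios per weekday) can be sketched in plain Python. This is only an illustration: the rows, the year 2024, and the function name are assumptions, and only the column meanings come from the question.

```python
from collections import defaultdict
from datetime import date
from statistics import median

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

# Hypothetical rows: (date, profit, sales). The year is an assumption;
# the question only shows month/day.
rows = [
    (date(2024, 1, 1), 3, 6),
    (date(2024, 1, 1), 12, 30),
    (date(2024, 1, 2), 9, 20),
    (date(2024, 1, 2), 5, 10),
    (date(2024, 1, 8), 4, 16),
    (date(2024, 1, 9), 6, 10),
]

def median_ratio_by_weekday(rows):
    # Step 1: total profit and total sales per date (the per-date level).
    totals = defaultdict(lambda: [0.0, 0.0])
    for day, profit, sales in rows:
        totals[day][0] += profit
        totals[day][1] += sales
    # Step 2: one profit/sales ratio per date, grouped by weekday name.
    ratios = defaultdict(list)
    for day, (profit, sales) in totals.items():
        ratios[WEEKDAYS[day.weekday()]].append(profit / sales)
    # Step 3: median of the daily ratios for each weekday.
    return {weekday: median(values) for weekday, values in ratios.items()}

result = median_ratio_by_weekday(rows)
print(result)
```

Each weekday ends up with a single number, which is what the view shows once Date is no longer on Detail.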

MEDIAN cannot be applied to a calculated field that is already an aggregate; for that you need a table calculation instead.
Create a calculated field with the code below, where [Calculation1] stands for your aggregated calculated field:
WINDOW_MEDIAN([Calculation1], FIRST(), LAST())
Then set the computation to Table (down).

Related

Grouping data based on difference in days

I have a data frame with three columns: subid, test, and day. For each subject, I want to identify which tests happened within a time frame of x days and calculate the max change in test value. Please see the example below.
For each subject and a given test, I want to identify which tests happened within 3 days. Looking at the "Day" column, the value 1 won't belong to any group, because the subsequent test was done 6 days later. The values Day = 10, 7, 8, 9 should be identified as one group, and the max change among them should be calculated. Similarly, Day = 12, 11, 10, 9 should be identified as another group, and the max change among them should be calculated.
How can I do this using R? Thank you in advance.

Calculating new columns in PowerBI

I've got this table I've defined in PowerBI:
I'd like to define a new table which has the percentage of medals won by USA from the total of medals that were given that year for each sport.
An example:
Year Sport Percentage
1986 Aquatics 0.0%
How could I do it?
You can use SUMMARIZE() to calculate a new table:
NewTable =
SUMMARIZE(
    yourDataTable;
    [Year];
    [Sports];
    "Pct";
    DIVIDE(
        CALCULATE(
            COUNTROWS(yourDataTable);
            yourDataTable[Nat] = "USA"
        );
        CALCULATE(
            COUNTROWS(yourDataTable);
            ALLEXCEPT(
                yourDataTable;
                yourDataTable[Year];
                yourDataTable[Sports]
            )
        );
        0
    )
)
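The idea behind that table (USA rows divided by all rows within each year and sport) can be sketched in plain Python to make the grouping explicit. The medal rows and the function name below are hypothetical, and the sketch is not meant to reproduce DAX's handling of blanks.

```python
from collections import Counter

# Hypothetical medal rows: (year, sport, nationality).
medals = [
    (1986, "Aquatics", "GBR"),
    (1986, "Aquatics", "FRA"),
    (1988, "Aquatics", "USA"),
    (1988, "Aquatics", "GBR"),
    (1988, "Athletics", "USA"),
]

def country_share(medals, country="USA"):
    # Denominator: all medals per (year, sport) group.
    total = Counter((year, sport) for year, sport, _ in medals)
    # Numerator: medals won by the chosen country in the same group.
    won = Counter((year, sport) for year, sport, nat in medals if nat == country)
    # Every group gets a row, including groups where the share is 0.
    return {key: won[key] / total[key] for key in total}

shares = country_share(medals)
for (year, sport), share in sorted(shares.items()):
    print(year, sport, f"{share:.1%}")
```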
I know that an answer has already been accepted, but I feel that I should provide my suggested solution to utilize all of Power BI's capabilities.
By creating a calculated table, you are limited in what you can do with the data, in that it is hard coded to be filtered to USA and is only based on Year and Sport. While those are the current requirements, what if they change? Then you have to recode your table or make another one.
My suggestion is to use measures to accomplish this task, and here's how...
First, here is my set of sample data.
With that data, I created a simple measure that counts the rows to get the count of medals.
Medal Count = COUNTROWS(Olympics)
Throwing together a basic matrix with that measure we can see the data like this.
A second measure can then be created to get a percentage for a specific country.
Country Medal Percentage = DIVIDE([Medal Count], CALCULATE([Medal Count], ALL(Olympics[Country])), BLANK())
Adding that measure to the matrix we can start to see our percentages.
From that matrix, we can see that USA won 25% of all medals in 2000. And their 2 medals in Sport B made up 33.33% of all medals that year.
With this you can utilize slicers and the layout of the matrix to get the desired percentage. Here's a small example with a country and year slicer that shows the same numbers.
From here you are able to cut the data by any sport or year and see the percentage of any selected country (or countries).

Tableau - Average of Ranking based on Average

For a certain data range, for a specific dimension, I need to calculate the average value of a daily rank based on the average value.
First of all this is the starting point:
This is quite simple: for each day and category I get AVG(Value) and the rank based on that AVG(Value), computed using Category.
Now what I need is "just" a table with one row for each Category with the average value of that rank for the overall period.
Something like this:
Category Global Rank
A (blue) 1,6 (1+3+1+1+1+3)/6
B (orange) 2,3 (3+2+3+2+2+2)/6
C (red) 2,0 (2+1+2+3+3+1)/6
I tried using an LOD expression, but rank table calculations can't be used inside LOD expressions, so I'm wondering whether I'm missing something or whether this is even possible in Tableau.
Please find attached the twbx with the raw data here:
Any help would be appreciated.
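To pin down the target numbers independently of Tableau, the two steps (rank the categories within each day, then average each category's ranks over the period) can be sketched in Python. The daily ranks are taken from the table in the question; the ranking direction (rank 1 for the largest average), the sample averages, and the function name are assumptions.

```python
from statistics import mean

def rank_descending(values):
    # Assumption: rank 1 goes to the largest value. Ties are not handled.
    ordered = sorted(values, key=values.get, reverse=True)
    return {category: position
            for position, category in enumerate(ordered, start=1)}

# Step 1, shown for one hypothetical day of per-category averages.
one_day = rank_descending({"A": 5.0, "B": 2.0, "C": 3.5})

# Step 2: average the daily ranks listed in the question's table.
daily_ranks = {
    "A": [1, 3, 1, 1, 1, 3],
    "B": [3, 2, 3, 2, 2, 2],
    "C": [2, 1, 2, 3, 3, 1],
}
global_rank = {category: mean(ranks) for category, ranks in daily_ranks.items()}
print(one_day, global_rank)
```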

Moving average with dynamic window

I'm trying to add a new column to my data table that contains the average of some of the following rows. How many rows to be selected for the average however depends on the time stamp of the rows.
Here is some test data:
DT<-data.table(Weekstart=c(1,2,2,3,3,4,5,5,6,6,7,7,8,8,9,9),Art=c("a","b","a","b","a","a","a","b","b","a","b","a","b","a","b","a"),Demand=c(1:16))
I want to add a column with the mean of all demands that occurred in the weeks ("Weekstart") up to three weeks before the respective week (grouped by Art, excluding the current week).
With rollapply from zoo-library, it works like this:
setorder(DT,-Weekstart)
DT[,RollMean:=rollapply(Demand,width=list(1:3),partial=TRUE,FUN=mean,align="left",fill=NA),.(Art)]
The problem, however, is that some data is missing. In the example, Art b has no row for week 4, because there is no Demand in week 4. Since I want the average over the three prior weeks, not the three prior rows, the average is wrong. Instead, the result for Art b for week 6 should look like this:
DT[Art=="b"&Weekstart==6,RollMean:=6]
(6 instead of 14/3, because only Week 5 and Week 3 count: (8+4)/2)
Here is what I tried so far:
It would be possible to loop through the minima of the week of the following rows in order to create a vector that defines for each row, how wide the 'width' should be (the new column 'rollwidth'):
i<-3
DT[,rollwidth:=Weekstart-rollapply(Weekstart,width=list(1:3),partial=TRUE,FUN=min,align="left",fill=1),.(Art)]
while (max(DT[,Weekstart-rollapply(Weekstart,width=list(1:i),partial=TRUE,FUN=min,align="left",fill=NA),.(Art)][,V1],na.rm=TRUE)>3) {
i<-i-1
DT[rollwidth>3,rollwidth:=i]
}
But that seems very unprofessional (excuse my poor skills). And, unfortunately, rollapply with width built from rollwidth doesn't work as intended (it produces warnings, because 'rollwidth' is treated as the whole column of rollwidths rather than the value for the current row):
DT[,RollMean2:=rollapply(Demand,width=list(1:rollwidth),partial=TRUE,FUN=mean,align="left",fill=NA),.(Art)]
What does work is
DT[,RollMean3:=rollapply(Demand,width=rollwidth,partial=TRUE,FUN=mean,align="left",fill=NA),.(Art)]
but then again, the average includes the actual week (not what I want).
Does anybody know how to apply a criterion (i.e. the difference in the weeks shall be <= 3) instead of a number of rows to the argument width?
Any suggestions are appreciated!
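To make the desired criterion concrete, here is a minimal Python sketch (not a data.table solution) that selects rows by week difference instead of row count. The data is the test data from above; the function name and the `span` parameter are made up for illustration.

```python
# Test data from the question, as plain lists.
weekstart = [1, 2, 2, 3, 3, 4, 5, 5, 6, 6, 7, 7, 8, 8, 9, 9]
art = ["a", "b", "a", "b", "a", "a", "a", "b",
       "b", "a", "b", "a", "b", "a", "b", "a"]
demand = list(range(1, 17))

def prior_weeks_mean(weekstart, art, demand, span=3):
    """Mean demand of the same Art in the `span` weeks before each row's week."""
    result = []
    for week, group in zip(weekstart, art):
        # Keep rows of the same Art whose week lies in [week - span, week - 1];
        # the current week itself is excluded.
        window = [d for w, g, d in zip(weekstart, art, demand)
                  if g == group and week - span <= w < week]
        result.append(sum(window) / len(window) if window else None)
    return result

roll_mean = prior_weeks_mean(weekstart, art, demand)
print(roll_mean[8])  # Art b, week 6
```

The sketch compares every row with every other row, so it is quadratic in the number of rows; it is meant to state the rule, not to be fast.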

Stata-related graphic enquiry

I have a very basic question about Stata. I have a repeated cross section of individuals from year 1 to year 20. For each individual, by year, I have a year-specific variable, for instance GDP per capita in the country. This variable is defined for each individual in each year, so it takes only 20 unique values. I want to plot this variable as a function of time (say in a two-way plot). The twoway command does not work, because I have many more than 20 points for these 20 values: each value is repeated for all n people in the cross section in that year. How can I create a separate variable that extracts only the distinct values from the variable in its current form?
With a simple example of your data you could have saved yourself and others time. As it stands, your question is difficult to understand. As already pointed out, it lacks both code and example data. Please rewrite so others can easily find and use whatever is posted here.
My interpretation is you have panel data. The variable gdp is year-specific (in every panel the information is duplicated), but you'd like to graph it against time. Just tag one instance, and draw a graph conditional on that. An example:
clear
set more off
// not 20 years, but 3
input ///
id year gdp
1 1990 78
1 1991 90
1 1992 98
2 1990 78
2 1991 90
2 1992 98
end
egen tograph = tag(year)
twoway line gdp year if tograph
or
twoway line gdp year if id == 1
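The tagging idea (keep one observation per year and plot only those) can also be sketched outside Stata. The Python snippet below uses the example data from above; the function name is hypothetical, and it assumes, as in the example, that gdp is identical for all rows of a given year.

```python
# Example data from above: (id, year, gdp).
rows = [
    (1, 1990, 78), (1, 1991, 90), (1, 1992, 98),
    (2, 1990, 78), (2, 1991, 90), (2, 1992, 98),
]

def one_per_year(rows):
    # Keep the first gdp value seen for each year and ignore the duplicates.
    first_seen = {}
    for _id, year, gdp in rows:
        first_seen.setdefault(year, gdp)
    return sorted(first_seen.items())

series = one_per_year(rows)
print(series)
```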
This is a perfect case of panel data.
First set the panel. The command to set the panel in your case is the following:
xtset id year
You can then plot with the xtline command:
xtline gdp, t(year) i(id)
The above command plots an individual graph for each id over year. To get one graph for all ids for comparison, use the following command:
xtline gdp, overlay t(year) i(id)
