Calculating waterfall chart components - math

I’m trying to create a waterfall chart, but having trouble working out how to calculate the contributions from the components that gets you from budget to actual...
For example – I have a "budget" list of groceries, with a price and a quantity, and then I’ll have an "actual" grocery list with the final price and quantity… which looks like the below
Budget
Item
Price
Quantity
Apples
2.1
5
Oranges
3.4
7
Bananas
5.1
10
Mangos
15.3
3
Grapes
3.8
20
Total
4.6
45
Actual
Item
Price
Quantity
Apples
2.5
9
Oranges
3.7
6
Bananas
4.3
11
Mangos
13.3
4
Grapes
9.5
22
Total
6.8
52
So if the waterfall were to begin at the weighted average of $4.6 per grocery item, how do I calculate each grocery item's contribution to get to the actual weighted contribution to $6.8 per item? Is there a nice simple calculation to work this out, ensuring that it factors in changes to each item's price as well as changes in quantity...
hoping to achieve something like this waterfall:
Thanks

Notations:
i : item's number (in the example: 1 for Apples, ..., 5 for Grapes)
n : number if items (in the example: 5)
Pb[i] : budget price of the i-th item
Qb[i] : budget quantity of the i-th item
Qb = Qb[1] + ... + Qb[n] : sum of budget quantities (in the example: 45)
Pb = (Pb[1] * Qb[1] + ... + Pb[n] * Qb[n]) / Qb : average budget price (in the example: 4.6)
Pa[i] : actual price of the i-th item
Qa[i] : actual quantity of the i-th item
Qa = Qa[1] + ... + Qa[n] : sum of actual quantities (in the example: 52)
Pa = (Pa[1] * Qa[1] + ... + Pa[n] * Qa[n]) / Qa : average actual price (in the example: 6.8)
You would like to calculate each item's contribution to the difference between the average actual price and the average budget price. This can be done by rearranging the difference:
Pa - Pb
= (Pa[1] * Qa[1] + ... + Pa[n] * Qa[n]) / Qa - (Pb[1] * Qb[1] + ... + Pb[n] * Qb[n]) / Qb
= (Pa[1] * Qa[1] / Qa - Pb[1] * Qb[1] / Qb) + ... + (Pa[n] * Qa[n] / Qa - Pb[n] * Qb[n] / Qb)
That is, the contribution of the i-ts item is Pa[i] * Qa[i] / Qa - Pb[i] * Qb[i] / Qb. For your example, the numbers are the following:
Contribution
Apples 0.2
Oranges -0.1
Bananas -0.2
Mangos 0.0
Grapes 2.3
The sum of these contributions is 2.2, which equals to the difference between the average actual price (6.8) and the average budget price (4.6), as expected.

Related

Constraint issue with pyomo involving a scalar

working on an economic optimization problem with pyomo, I would like to add a constraint to prevent the product of the commodity quantity and its price to go below zero (<0), avoiding a negative revenue. It appears that all the data are in a dataframe and I can't setup a constraint like:
def positive_revenue(model, t)
return model.P * model.C >=0
model.positive_rev = Constraint(model.T, rule=positive_revenue)
The system returns the error that the price is a scalar and it cannot process it. Indeed the price is set as such in the model:
model.T = Set(doc='quarter of year', initialize=df.quarter.tolist(), ordered=True)
model.P = Param(initialize=df.price.tolist(), doc='Price for each quarter')
##while the commodity is:
model.C = Var(model.T, domain=NonNegativeReals)
I just would like to apply that for each timestep (quarter of hour here) that:
price(t) * model.C(t) >=0
Can someone help me to spot the issue ? Thanks
Here are more information:
df dataframe:
df time_stamp price Status imbalance
quarter
0 2021-01-01 00:00:00 64.84 Final 16
1 2021-01-01 00:15:00 13.96 Final 38
2 2021-01-01 00:30:00 12.40 Final 46
index = quarter from 0 till 35049, so it is ok
Here is the df.info()
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 time_stamp 35040 non-null datetime64[ns]
1 price 35040 non-null float64
2 Status 35040 non-null object
3 imbalance 35040 non-null int64
I modified the to_list() > to_dict() in model.T but still facing the same issue:
KeyError: "Cannot treat the scalar component 'P' as an indexed component" at the time model.T is defined in the model parameter, set and variables.
Here is the constraint where the system issues the error:
def revenue_positive(model,t):
for t in model.T:
return (model.C[t] * model.P[t]) >= 0
model.positive_revenue = Constraint(model.T,rule=revenue_positive)
Can't figure it out...any idea ?
UPDATE
Model works after dropping an unfortunate 'quarter' column somewhere...after I renamed the index as quarter.
It runs but i still get negative revenues, so the constraints seems not working at present, here is how it is written:
def revenue_positive(model,t):
for t in model.T:
return (model.C[t] * model.P[t]) >= 0
model.positive_revenue = Constraint(model.T,rule=revenue_positive)
What am I missing here ? Thanks for help, just beginning
Welcome to the site.
The problem you appear to be having is that you are not building your model parameter model.P as an indexed component. I believe you likely want it to be indexed by your set model.T.
When you make indexed params in pyomo you need to initialize it with some key:value pairing, like a python dictionary. You can make that from your data frame by re-indexing your data frame so that the quarter labels are the index values.
Caution: The construction you have for model.T and this assume there are no duplicates in the quarter names.
If you have duplicates (or get a warning) then you'll need to do something else. If the quarter labels are unique you can do this:
import pandas as pd
import pyomo.environ as pyo
df = pd.DataFrame({'qtr':['Q5', 'Q6', 'Q7'], 'price':[12.80, 11.50, 8.12]})
df.set_index('qtr', inplace=True)
print(df)
m = pyo.ConcreteModel()
m.T = pyo.Set(initialize=df.index.to_list())
m.price = pyo.Param(m.T, initialize=df['price'].to_dict())
m.pprint()
which should get you:
price
qtr
Q5 12.80
Q6 11.50
Q7 8.12
1 Set Declarations
T : Size=1, Index=None, Ordered=Insertion
Key : Dimen : Domain : Size : Members
None : 1 : Any : 3 : {'Q5', 'Q6', 'Q7'}
1 Param Declarations
price : Size=3, Index=T, Domain=Any, Default=None, Mutable=False
Key : Value
Q5 : 12.8
Q6 : 11.5
Q7 : 8.12
2 Declarations: T price
edit for clarity...
NOTE:
The first argument when you create a pyomo parameter is the indexing set. If this is not provided, pyomo assumes that it is a scalar. You are missing the set as shown in my example and highlighted with arrow here: :)
|
|
|
V
m.price = pyo.Param(m.T, initialize=df['price'].to_dict())
Also note, you will need to initialize model.P with a dictionary as I have in the example, not a list.

GameTheory package: Convert data frame of games to Coalition Set

I am looking to explore the GameTheory package from CRAN, but I would appreciate help in converting my data (in the form of a data frame of unique combinations and results) in to the required coalition object. The precursor to this I believe to be an ordered list of all coalition values (https://cran.r-project.org/web/packages/GameTheory/vignettes/GameTheory.pdf).
My real data has n ~ 30 'players', and unique combinations = large (say 1000 unique combinations), for which I have 1 and 0 identifiers to describe the combinations. This data is sparsely populated in that I do not have data for all combinations, but will assume combinations not described have zero value. I plan to have one specific 'player' who will appear in all combinations, and act as a baseline.
By way of example this is the data frame I am starting with:
require(GameTheory)
games <- read.csv('C:\\Users\\me\\Desktop\\SampleGames.csv', header = TRUE, row.names = 1)
games
n1 n2 n3 n4 Stakes Wins Success_Rate
1 1 1 0 0 800 60 7.50%
2 1 0 1 0 850 45 5.29%
3 1 0 0 1 150000 10 0.01%
4 1 1 1 0 300 25 8.33%
5 1 1 0 1 1800 65 3.61%
6 1 0 1 1 1900 55 2.89%
7 1 1 1 1 700 40 5.71%
8 1 0 0 0 3000000 10 0.00333%
where n1 is my universal player, and in this instance, I have described all combinations.
To calculate my 'base' coalition value from player {1} alone, I am looking to perform the calculation: 0.00333% (success rate) * all stakes, i.e.
0.00333% * (800 + 850 + 150000 + 300 + 1800 + 1900 + 700 + 3000000) = 105
I'll then have zero values for {2}, {3} and {4} as they never "play" alone in this example.
To calculate my first pair coalition value, I am looking to perform the calculation:
7.5%(800 + 300 + 1800 + 700) + 0.00333%(850 + 150000 + 1900 + 3000000) = 375
This is calculated as players {1,2} base win rate (7.5%) by the stakes they feature in, plus player {1} base win rate (0.00333%) by the combinations he features in that player {2} does not - i.e. exclusive sets.
This logic is repeated for the other unique combinations. For example row 4 would be the combination of {1,2,3} so the calculation is:
7.5%(800+1800) + 5.29%(850+1900) + 8.33%(300+700) + 0.00333%(3000000+150000) = 529 which descriptively is set {1,2} success rate% by Stakes for the combinations it appears in that {3} does not, {1,3} by where {2} does not feature, {1,2,3} by their occurrences, and the base player {1} by examples where neither {2} nor {3} occur.
My expected outcome therefore should look like this I believe:
c(105,0,0,0, 375,304,110,0,0,0, 529,283,246,0, 400)
where the first four numbers are the single player combinations {1} {2} {3} and {4}, the next six numbers are two player combinations {1,2} {1,3} {1,4} (and the null cases {2,3} {2,4} {3,4} which don't exist), then the next four are the three player combinations {1,2,3} {1,2,4} {1,3,4} and the null case {2,3,4}, and lastly the full combination set {1,2,3,4}.
I'd then feed this in to the DefineGame function of the package to create my coalitions object.
Appreciate any help: I have tried to be as descriptive as possible. I really don't know where to start on generating the necessary sets and set exclusions.

Crystal Reports Difference of group total

I have a report which has two groups. Group B always has only 2 values. I want to get the difference of total values of Item Type 01 and Item Type 02 to the Group B footer (Tot type01 - tot type02).
Help me to achieve this. I tried few formulas but non of them works for me
Month01 Month2
Group A
Group B
Item Type 01
ab 10 10
ac 20 30
ad 30 30
**Total** 60 70
Item Type 02
ab 10 20
ac 10 15
ad 20 5
**Total** 40 30
**Difference 20 40**
I want something like this
NumberVar sum01 := 0;
Numbervar sum02 := 0;
GroupName ({DataTable1.IncomeType}) = Type 01
Then
sum01 := Sum ({DataTable1.Month01}, {DataTable1.IncomeType})
if
GroupName ({DataTable1.IncomeType}) = Type 02
Then
sum02 := Sum ({DataTable1.Month01}, {DataTable1.IncomeType})
sum01 - sum02
I know this isn't correct. I used it to explain my question for you as much as possible.
Really appreciate your guidence
You can do this using arrays..
Take 2 arrays and store values for Month1 and Month2 and in group footer retrive and add those.
Create a formula #Month1Array and place in Item Type group footer after Month1 summary
Shared Numbervar array x;
x:=x+sum(Month1,Item GRoup);
1;
Create a formula #Month2Array and place in Item Type group footer after Month2 summary
Shared Numbervar array y;
y:=y+sum(Month2,Item GRoup);
1;
Now in the footer where you want to see the difference write below formula for
Create a formula #Month1
Shared Numbervar array x;
x[1]-x[2]
Create a formula #Month2
Shared Numbervar array y;
y[1]-y[2]

Dealing with conditionals in a better manner than deeply nested ifelse blocks

I'm trying to write some code to analyze my company's insurance plan offerings... but they're complicated! The PPO plan is straightforward, but the high deductible health plans are complicated, as they introduced a "split" deductible and out of pocket maximum (individual and total) for the family plans. It works like this:
Once the individual meets the individual deductible, he/she is covered at 90%
Once the remaining 1+ individuals on the plan meet the total deductible, the entire family is covered at 90%
The individual cannot satisfy the family deductible with only their medical expenses
I want to feed in a vector of expenses for my family members (there are four of them) and output the total cost for each plan. Below is a table of possible scenarios, with the following column codes:
ded_ind: did one individual meet the individual deductible?
ded_tot: was the total deductible reached?
oop_ind: was the individual out of pocket max reached
oop_tot: was the total out of pocket max reached?
exp_ind = the expenses of the highest spender
exp_rem = the expenses of the remaining /other/ family members (not the highest spender)
oop_max_ind = the level of expenses at which the individual has paid their out of pocket maximum (when ded_ind + 0.1 * exp_ind = out of pocket max for the individual
oop_max_fam = same as for individual, but for remaining family members
The table:
| ded_ind | oop_ind | ded_rem | oop_rem | formula
|---------+---------+---------+---------+---------------------------------------------------------------------------|
| 0 | 0 | 0 | 0 | exp_ind + exp_rem |
| 1 | 0 | 0 | 0 | ded_ind + 0.1 * (exp_ind - ded_ind) + exp_rem |
| 0 | 0 | 1 | 0 | exp_ind + ded_rem + 0.1 * (exp_rem - ded_rem) |
| 1 | 1 | 0 | 0 | oop_max_ind + exp_fam |
| 1 | 0 | 1 | 0 | ded_ind + 0.1 * (exp_ind - ded_ind) + ded_rem + 0.1 * (exp_rem - ded_rem) |
| 0 | 0 | 1 | 1 | oop_max_rem + exp_ind |
| 1 | 0 | 1 | 1 | ded_ind + 0.1 * (exp_ind - ded_ind) + oop_max_rem |
| 1 | 1 | 1 | 0 | oop_ind_max + ded_rem + 0.1 * (exp_rem - ded_rem) |
| 1 | 1 | 1 | 1 | oop_ind_max + oop_rem_max |
Omitted: 0 1 0 0, 0 0 0 1, 0 1 1 0, and 0 1 0 1 are not present, as oop_ind and oop_rem could not have been met if ded_ind and ded_rem, respectively, have not been met.
My current code is a somewhat massive ifelse loop like so (not the code, but what it does):
check if plan is ppo or hsa
if hsa plan
if exp_ind + exp_rem < ded_rem # didn't meet family deductible
if exp_ind < ded_ind # individual deductible also not met
cost = exp_ind + exp_rem
else is exp_ind > oop_ind_max # ded_ind met, is oop_ind?
ded_ind + 0.1 * (exp_ind - ded_ind) + exp_fam # didn't reach oop_max_ind
else oop_max_ind + exp_fam # reached oop_max_ind
else ...
After the else, the total is greater than the family deductible. I check to see if it was contributed by more than two people and just continue on like that.
My question, now that I've given some background to the problem: Is there a better way to manage conditional situations like this than ifelse loops to filter them down a bit at a time?
The code ends up seeming redundant, as one checks for some higher level conditions (consider the table where ded_rem is met or not met... one still has to check for ded_ind and oop_max_ind in both cases, and the code is the same... just positioned at two different places in the ifelse structure).
Could this be done with some sort of matrix operation? Are there other examples online of more clever ways to deal with filtering of conditions?
Many thanks for any suggestions.
P.S. I'm using R and will be creating an interactive with shiny so that other employees can input best and worst case scenarios for each of their family members and see which plan comes out ahead via a dot or bar chart.
The suggestion to convert to a binary value based on the result gave me an idea, which also helped me learn that one can do vectorized TRUE / FALSE checks (I guess that was probably obvious to many).
Here's my current idea:
expenses will be a vector of individual forecast medical expenses for the year (example of three people):
expenses <- c(1500, 100, 400)
We set exp_ind to the max value, and sum the rest for exp_rem
exp_ind <- max(expenses)
# [1] index of which() for cases with multiple max values
exp_rem <- sum(expenses[-which(expenses == exp_ind)[1]])
For any given plan, I can set up a vector with the cutoffs, for example:
individual deductible = 1000
individual out of pocket max = 2000 (need to incur 11k of expenses to get there)
family deductible = 2000
family out of pocket max = 4000 (need to incur 22k of expenses to get there)
Set those values:
ded_ind <- 1000
oop_max_ind <- 11000
ded_tot <- 2000
oop_max_tot <- 22000
cutoffs <- c(ded_ind, oop_max_ind, ded_tot, oop_max_tot)
Now we can check the input expense against the cutoffs:
result <- as.numeric(rep(c(exp_ind, exp_rem), each = 2) > cutoffs)
Last, convert to binary:
result_bin <- sum(2^(seq_along(result) - 1) * result)
Now I can set up functions for the possible outcomes based on the value in result_bin:
if(result_bin == 1) {cost <- ded_ind + 0.1 * (exp_ind - ded_ind) + exp_rem }
cost
[1] 1550
We can check this...
High spender would have paid his 1000 and then 10% of remaining 500 = 1050
Other members did not reach the family deductible and paid the full 400 + 100 = 500
Total: 1550
I still need to create a mapping of results_bin values to corresponding functions, but doing a vectorized check and converting a unique binary value is much, much better, in my opinion, than my ifelse nested mess.
I look at it like this: I'd have had to set the variables and write the functions anyway; this saves me 1) explicitly writing all the conditions, 2) the redundancy issue I was talking about in that one ends up writing identical "sibling" branches of parent splits in the ifelse structure, and lastly, 3) the code is far, far, far more easily followed.
Since this question is not very specific, here is a simpler example/answer:
# example data
test <- expand.grid(opt1=0:1,opt2=0:1)
# create a unique identifier to represent the binary variables
test$code <- with(allopts,paste(opt1,opt2,sep=""))
# create an input variable to be used in functions
test$var1 <- 1:4
# opt1 opt2 code var1
#1 0 0 00 1
#2 1 0 10 2
#3 0 1 01 3
#4 1 1 11 4
Respective functions to apply depending on binary conditions, along with intended results for each combo:
var1 + 10 #code 00 - intended result = 11
var1 + 100 #code 10 - intended result = 102
var1 + 1000 #code 01 - intended result = 1003
var1 + var1 #code 11 - intended result = 8
Use ifelse combinations to do the calculations:
test$result <- with(test,
ifelse(code == "00", var1 + 10,
ifelse(code == "10", var1 + 100,
ifelse(code == "01", var1 + 1000,
ifelse(code == "11", var1 + var1,
NA
)))))
Result:
opt1 opt2 code var1 result
1 0 0 00 1 11
2 1 0 10 2 102
3 0 1 01 3 1003
4 1 1 11 4 8

Can I calculate the average of these numbers?

I was wondering if it's possible to calculate the average of some numbers if I have this:
int currentCount = 12;
float currentScore = 6.1123 (this is a range of 1 <-> 10).
Now, if I receive another score (let's say 4.5), can I recalculate the average so it would be something like:
int currentCount now equals 13
float currentScore now equals ?????
or is this impossible and I still need to remember the list of scores?
The following formulas allow you to track averages just from stored average and count, as you requested.
currentScore = (currentScore * currentCount + newValue) / (currentCount + 1)
currentCount = currentCount + 1
This relies on the fact that your average is currently your sum divided by the count. So you simply multiply count by average to get the sum, add your new value and divide by (count+1), then increase count.
So, let's say you have the data {7,9,11,1,12} and the only thing you're keeping is the average and count. As each number is added, you get:
+--------+-------+----------------------+----------------------+
| Number | Count | Actual average | Calculated average |
+--------+-------+----------------------+----------------------+
| 7 | 1 | (7)/1 = 7 | (0 * 0 + 7) / 1 = 7 |
| 9 | 2 | (7+9)/2 = 8 | (7 * 1 + 9) / 2 = 8 |
| 11 | 3 | (7+9+11)/3 = 9 | (8 * 2 + 11) / 3 = 9 |
| 1 | 4 | (7+9+11+1)/4 = 7 | (9 * 3 + 1) / 4 = 7 |
| 12 | 5 | (7+9+11+1+12)/5 = 8 | (7 * 4 + 12) / 5 = 8 |
+--------+-------+----------------------+----------------------+
I like to store the sum and the count. It avoids an extra multiply each time.
current_sum += input;
current_count++;
current_average = current_sum/current_count;
It's quite easy really, when you look at the formula for the average: A1 + A2 + ... + AN/N. Now, If you have the old average and the N (numbers count) you can easily calculate the new average:
newScore = (currentScore * currentCount + someNewValue)/(currentCount + 1)
You can store currentCount and sumScore and you calculate sumScore/currentCount.
or... if you want to be silly, you can do it in one line :
current_average = (current_sum = current_sum + newValue) / ++current_count;
:)
float currentScore now equals (currentScore * (currentCount-1) + 4.5)/currentCount ?

Resources