I'm trying to create a waterfall chart, but I'm having trouble working out how to calculate the contributions from the components that get you from budget to actual...
For example, I have a "budget" list of groceries with a price and a quantity, and then an "actual" grocery list with the final price and quantity, which look like the below:
Budget

Item       Price   Quantity
Apples       2.1       5
Oranges      3.4       7
Bananas      5.1      10
Mangos      15.3       3
Grapes       3.8      20
Total        4.6      45
Actual

Item       Price   Quantity
Apples       2.5       9
Oranges      3.7       6
Bananas      4.3      11
Mangos      13.3       4
Grapes       9.5      22
Total        6.8      52
So if the waterfall were to begin at the weighted average of $4.6 per grocery item, how do I calculate each grocery item's contribution in getting to the actual weighted average of $6.8 per item? Is there a nice simple calculation to work this out, ensuring that it factors in changes to each item's price as well as changes in quantity...
Hoping to achieve something like this waterfall:
Thanks
Notations:
i : item's number (in the example: 1 for Apples, ..., 5 for Grapes)
n : number of items (in the example: 5)
Pb[i] : budget price of the i-th item
Qb[i] : budget quantity of the i-th item
Qb = Qb[1] + ... + Qb[n] : sum of budget quantities (in the example: 45)
Pb = (Pb[1] * Qb[1] + ... + Pb[n] * Qb[n]) / Qb : average budget price (in the example: 4.6)
Pa[i] : actual price of the i-th item
Qa[i] : actual quantity of the i-th item
Qa = Qa[1] + ... + Qa[n] : sum of actual quantities (in the example: 52)
Pa = (Pa[1] * Qa[1] + ... + Pa[n] * Qa[n]) / Qa : average actual price (in the example: 6.8)
You would like to calculate each item's contribution to the difference between the average actual price and the average budget price. This can be done by rearranging the difference:
Pa - Pb
= (Pa[1] * Qa[1] + ... + Pa[n] * Qa[n]) / Qa - (Pb[1] * Qb[1] + ... + Pb[n] * Qb[n]) / Qb
= (Pa[1] * Qa[1] / Qa - Pb[1] * Qb[1] / Qb) + ... + (Pa[n] * Qa[n] / Qa - Pb[n] * Qb[n] / Qb)
That is, the contribution of the i-th item is Pa[i] * Qa[i] / Qa - Pb[i] * Qb[i] / Qb. For your example, the numbers are the following:
Item       Contribution
Apples        0.2
Oranges      -0.1
Bananas      -0.2
Mangos        0.0
Grapes        2.3
The sum of these contributions is 2.2, which equals the difference between the average actual price (6.8) and the average budget price (4.6), as expected.
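If it helps, here is a small Python sketch of the same calculation using the numbers from the question (the variable names are mine):

# Contribution of each item to the change in the average price per item.
budget = {"Apples": (2.1, 5), "Oranges": (3.4, 7), "Bananas": (5.1, 10),
          "Mangos": (15.3, 3), "Grapes": (3.8, 20)}
actual = {"Apples": (2.5, 9), "Oranges": (3.7, 6), "Bananas": (4.3, 11),
          "Mangos": (13.3, 4), "Grapes": (9.5, 22)}

qb = sum(q for _, q in budget.values())   # total budget quantity: 45
qa = sum(q for _, q in actual.values())   # total actual quantity: 52

contributions = {item: actual[item][0] * actual[item][1] / qa
                       - budget[item][0] * budget[item][1] / qb
                 for item in budget}

for item, c in contributions.items():
    print("%-8s %5.1f" % (item, c))
print("Total    %5.1f" % sum(contributions.values()))   # ~2.2 = 6.8 - 4.6

The per-item values are the bars of the waterfall; they sum to the gap between the starting bar (4.6) and the ending bar (6.8).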
I have a Carbon/Graphite stack with some very basic retention schemas set up. These retention periods work fine, apart from a couple of statistics - these only appear to last for a week.
My storage-schemas.conf:
[carbon]
pattern = ^carbon\.
retentions = 60:90d
[collectd]
pattern = ^collectd.*
retentions = 10s:2d,1m:14d,5m:1y
And my storage-aggregation.conf:
[min]
pattern = \.min$
xFilesFactor = 0.1
aggregationMethod = min
[max]
pattern = \.max$
xFilesFactor = 0.1
aggregationMethod = max
[sum]
pattern = \.count$
xFilesFactor = 0
aggregationMethod = sum
[default_average]
pattern = .*
xFilesFactor = 0.5
aggregationMethod = average
All stats arrive prefixed with collectd., so the retention patterns are correct. When viewing an affected dashboard in Grafana I see the following in graphite's cache.log:
Thu Oct 13 11:25:16 2016 :: CarbonLink cache-query request for collectd.host_domain_com.openstack-keystone-totals.gauge-users-count returned 0 datapoints
Using whisper-info.py on an affected .wsp shows the following:
maxRetention: 31536000
xFilesFactor: 0.5
aggregationMethod: average
fileSize: 1710772
Archive 0
retention: 172800
secondsPerPoint: 10
points: 17280
size: 207360
offset: 52
Archive 1
retention: 1209600
secondsPerPoint: 60
points: 20160
size: 241920
offset: 207412
Archive 2
retention: 31536000
secondsPerPoint: 300
points: 105120
size: 1261440
offset: 449332
Can anyone suggest anything I may have missed?
So the answer to this comes down to a couple of issues. Firstly, the data points are being submitted with -count on the end of the name instead of .count, so they match the [default_average] rule rather than [sum]. Because we're not submitting data every 10 seconds (and because the default has an xFilesFactor of 0.5), the data is munged when it hits the first retention boundary: fewer than 50% of the expected data points are present, so a null value is stored instead.
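If renaming the metrics at the source isn't an option, one possible fix (just a sketch; the exact regex depends on how the names really arrive) is to widen the pattern in storage-aggregation.conf so it catches both suffixes:

[count]
pattern = [-.]count$
xFilesFactor = 0
aggregationMethod = sum

Note that, as far as I know, existing .wsp files keep the settings they were created with, so they would need to be recreated or adjusted with the whisper utilities for the change to take effect.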
I have created heatmap graphs using gnuplot.
I have data.dat:
avail reli perf
stop 181 20 121 10 34 20
jitter 18 20 17 20 13 20
limp 12 20 5 30 20 20
and gnuplot script:
set term pos eps font 20
unset key
set nocbtics
set cblabel "Score"
set cbtics scale 0
set cbrange [ 0.00000 : 110.00000 ] noreverse nowriteback
set palette defined ( 0.0 "#FFFFFF",\
1 "#FFCCCC",\
20.2 "#FF9999 ",\
30.3 "#FF6666",\
40.4 "#FF3333",\
50.5 "#FF0000",\
60.6 "#CC0000",\
70.7 "#C00000",\
80.8 "#B00000",\
90.9 "#990000",\
100.0 "#A00000")
set title "Faults"
set ylabel "Hardware Faults"
set xlabel "Aspects"
set size 1, 0.5
set output 'c11.eps'
YTICS="`awk 'BEGIN{getline}{printf "%s ",$1}' 'data2.dat'`"
XTICS="`head -1 'data2.dat'`"
set for [i=1:words(XTICS)] xtics ( word(XTICS,i) i-1 )
set for [i=1:words(YTICS)] ytics ( word(YTICS,i) i-1 )
plot "<awk '{$1=\"\"}1' 'data2.dat' | sed '1 d'" matrix w image, '' matrix using 1:2:($3==0 ? " " : sprintf("%.1d",$3)) with labels
#######^ replace the first field with nothing
################################## ^ delete first line
My output is:
Here I have ranges 1-20, 30-39, ..., 100 or more.
Now I have 2 values in every cell; e.g. stop and avail have 181 and 20: the 181 is a count and the 20 is a percentage. I want to create graphs whose colors are based on the percentages and whose labels come from the counts.
I have some experience creating graphs using for loops and modulo arithmetic to select the data, but here I have no idea how to create these graphs. Any suggestions? Thanks!
You can use every to skip columns.
plot ... every 2 only uses every second column, which is what you can use for the labels. For the colors, you must start with the second column (numbered 1), and you need every 2::1.
Following are the relevant changes only to your script:
set for [i=1:words(XTICS)] xtics ( word(XTICS,i) 2*i-1 )
plot "<awk '{$1=\"\"}1' 'data2.dat' | sed '1 d'" matrix every 2::1 w image, \
'' matrix using ($1+1):2:(sprintf('%d', $3)) every 2 with labels
The result with gnuplot 4.6.5 is:
I examined some MPEG-4 video headers and saw some byte arrays like below at the beginning:
00 00 01 B0 01 00 00 01 B5 89 13
I know the 00 00 01 parts, but what exactly do the B0 01 and B5 89 13 parts mean? Actually, if I put this byte array in front of an MPEG-4 stream, it works fine.
But I don't know whether those values will work with different MPEG-4 stream sources.
0x000001B0 -> Visual Object Sequence Start (VOSS) Code
0x000001B5 -> Visual Object Start (VOS) Code
You can find the complete MPEG-4 elementary video header details in the ISO/IEC 14496-2 documentation. Here are the details you asked for.
Visual Object Sequence Start (VOSS) Code
-> 4 bytes visual object sequence start code = long hex value of 0x000001B0
-> 8 bits profile/level indicator = 1 byte unsigned number
Visual Object Start (VOS) Code
-> 4 bytes visual object start code = long hex value of 0x000001B5
-> 1 bit has id marker flag = 1/4 nibble flag
_ID_Marker_Section_
-> 4 bits version id = 1 nibble unsigned value - only if marker is true
- version id types are ISO 14496-2 = 1
-> 3 bits visual object priority = 3/4 nibble unsigned value - only if marker is true
- priorities are 1 through to 7
-> 4 bits visual object type = 1 nibble unsigned value
- types are video = 1 ; still texture = 2 ; mesh = 3 ; face = 4
-> 1 bit video signal type = 1/4 nibble flag
- NOTE: if this is false Y has a sample range of 16 through to 235
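As a rough illustration, here is a small Python sketch that walks the example bytes from the question using the field layout above (the variable names are mine, and it only handles the case where the id marker flag is set, as it is in this example):

# Sketch only: decode the example header 00 00 01 B0 01 00 00 01 B5 89 13.
data = bytes.fromhex("000001B001000001B58913")

assert data[0:4] == b"\x00\x00\x01\xB0"   # visual object sequence start code
profile_level = data[4]                   # 8-bit profile/level indicator -> 1
assert data[5:9] == b"\x00\x00\x01\xB5"   # visual object start code

bits = int.from_bytes(data[9:11], "big")  # the 16 bits after the VOS code: 0x8913
has_id      = (bits >> 15) & 0x1          # 1 bit  id marker flag          -> 1
version_id  = (bits >> 11) & 0xF          # 4 bits version id              -> 1 (ISO 14496-2)
priority    = (bits >> 8)  & 0x7          # 3 bits visual object priority  -> 1
object_type = (bits >> 4)  & 0xF          # 4 bits visual object type      -> 1 (video)
signal_type = (bits >> 3)  & 0x1          # 1 bit  video signal type flag  -> 0

Whether these particular values suit a different MPEG-4 stream depends on that stream's profile/level and object type, so the same bytes won't necessarily be valid for every source.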
I have a DateTime structure for an old data format that I don't have access to any specs for. There is a field which indicates the datetime of the data, but it isn't in any format I recognize. It appears to be stored as a 32-bit integer that increments by 20 for each day. Has anyone ever run across something like this?
EDIT:
Example: 1088631936 DEC = 80 34 E3 40 00 00 00 00 HEX = 09/07/2007
EDIT:
First off, sorry for the delay. I had hoped to do stuff over the weekend, but was unable to.
Second, this date format is weirder than I initially thought. It appears to use some sort of exponential or logarithmic method, as the dates do not change at a constant rate.
Third, the defunct app that I have for interpreting these values only shows the date portion, so I don't know what the time portion is.
Example data:
(Hex values are big-endian, dates are mm/dd/yyyy)
0x40000000 = 01/01/1900
0x40010000 = 01/01/1900
0x40020000 = 01/01/1900
0x40030000 = 01/01/1900
0x40040000 = 01/01/1900
0x40050000 = 01/01/1900
0x40060000 = 01/01/1900
0x40070000 = 01/01/1900
0x40080000 = 01/02/1900
0x40090000 = 01/02/1900
0x400A0000 = 01/02/1900
0x400B0000 = 01/02/1900
0x400C0000 = 01/02/1900
0x400D0000 = 01/02/1900
0x400E0000 = 01/02/1900
0x400F0000 = 01/02/1900
0x40100000 = 01/03/1900
0x40110000 = 01/03/1900
0x40120000 = 01/03/1900
0x40130000 = 01/03/1900
0x40140000 = 01/04/1900
0x40150000 = 01/04/1900
0x40160000 = 01/04/1900
0x40170000 = 01/04/1900
0x40180000 = 01/05/1900
0x40190000 = 01/05/1900
0x401A0000 = 01/05/1900
0x401B0000 = 01/05/1900
0x401C0000 = 01/06/1900
0x401D0000 = 01/06/1900
0x401E0000 = 01/06/1900
0x401F0000 = 01/06/1900
0x40200000 = 01/07/1900
0x40210000 = 01/07/1900
0x40220000 = 01/08/1900
0x40230000 = 01/08/1900
....
0x40800000 = 05/26/1901
0x40810000 = 06/27/1901
0x40820000 = 07/29/1901
....
0x40D00000 = 11/08/1944
0x40D10000 = 08/29/1947
EDIT: I finally figured this out, but since I've already given up the points for the bounty, I'll hold off on the solution in case anyone wants to give it a shot.
BTW, there is no time component to this, it is purely for storing dates.
It's not an integer, it's a 32-bit floating point number. I haven't quite worked out the format yet; it's not IEEE.
Edit: got it. 1 bit sign, 11 bit exponent with an offset of 0x3ff, and 20 bit mantissa with an implied bit to the left. In C, assuming positive numbers only:
double offset = pow(2, (i >> 20) - 0x3ff) * (((i & 0xfffff) + 0x100000) / (double) 0x100000);
This yields 0x40000000 = 2.0, so the starting date must be 12/30/1899.
Edit again: since you were so kind as to accept my answer, and you seem concerned about speed, I thought I'd refine this a little. You don't need the fractional part of the real number, so we can convert straight to integer using only bitwise operations. In Python this time, complete with test results. I've included some intermediate values for better readability. In addition to the restriction of no negative numbers, this version might have problems when the exponent goes over 19, but this should keep you good until the year 3335.
>>> def IntFromReal32(i):
...     exponent = (i >> 20) - 0x3ff
...     mantissa = (i & 0xfffff) + 0x100000
...     return mantissa >> (20 - exponent)
>>> testdata = range(0x40000000,0x40240000,0x10000) + range(0x40800000,0x40830000,0x10000) + [1088631936]
>>> from datetime import date,timedelta
>>> for i in testdata:
...     print "0x%08x" % i, date(1899,12,30) + timedelta(IntFromReal32(i))
0x40000000 1900-01-01
0x40010000 1900-01-01
0x40020000 1900-01-01
0x40030000 1900-01-01
0x40040000 1900-01-01
0x40050000 1900-01-01
0x40060000 1900-01-01
0x40070000 1900-01-01
0x40080000 1900-01-02
0x40090000 1900-01-02
0x400a0000 1900-01-02
0x400b0000 1900-01-02
0x400c0000 1900-01-02
0x400d0000 1900-01-02
0x400e0000 1900-01-02
0x400f0000 1900-01-02
0x40100000 1900-01-03
0x40110000 1900-01-03
0x40120000 1900-01-03
0x40130000 1900-01-03
0x40140000 1900-01-04
0x40150000 1900-01-04
0x40160000 1900-01-04
0x40170000 1900-01-04
0x40180000 1900-01-05
0x40190000 1900-01-05
0x401a0000 1900-01-05
0x401b0000 1900-01-05
0x401c0000 1900-01-06
0x401d0000 1900-01-06
0x401e0000 1900-01-06
0x401f0000 1900-01-06
0x40200000 1900-01-07
0x40210000 1900-01-07
0x40220000 1900-01-08
0x40230000 1900-01-08
0x40800000 1901-05-26
0x40810000 1901-06-27
0x40820000 1901-07-29
0x40e33480 2007-09-07
Are you sure those values correspond to 09/07/2007?
I ask because 1088631936 is the number of seconds from the Unix (et al.) zero date, 01/01/1970 00:00:00, to 06/30/2004 21:45:36.
It seems reasonable to me to think the value is seconds since this usual zero date.
Edit: I know it is very possible that this is not the correct answer. It is just one approach (a valid one), but I think more info is needed (see the comments). I'm editing this (again) to bring the question to the front in the hope that somebody else will answer it or offer ideas. Me: with a fair, sporting and sharing spirit :D
I'd say that vmarquez is close.
Here are dates 2009-3-21 and 2009-3-22 as unix epochtime:
In [8]: time.strftime("%s", (2009, 3, 21, 1, 1, 0, 0,0,0))
Out[8]: '1237590060'
In [9]: time.strftime("%s", (2009, 3, 22, 1, 1, 0, 0,0,0))
Out[9]: '1237676460'
And here they are in hex:
In [10]: print("%0x %0x" % (1237590060, 1237676460))
49c4202c 49c571ac
If you take only the first 5 digits, the growth is 21, which kinda matches your format, no?
Some context would be useful. If your data file looks something like this file, literally or at least figuratively, vmarquez is on the money.
http://www.slac.stanford.edu/comp/net/bandwidth-tests/eventanalysis/all_100days_sep04/node1.niit.pk
That reference is data produced by the Available Bandwidth Estimation tool (ABwE) -- the curious item is that it actually contains that 1088631936 value as well as the context. That example
date time abw xtr dbcap avabw avxtr avdbcap rtt timestamp
06/30/04 14:43:48 1.000 0.000 1.100 1.042 0.003 1.095 384.387 1088631828
06/30/04 14:45:36 1.100 0.000 1.100 1.051 0.003 1.096 376.408 1088631936
06/30/04 14:47:23 1.000 0.000 1.100 1.043 0.003 1.097 375.196 1088632043
seems to have a seven hour offset from the suggested 21:45:36 time value. (Probably Stanford local, running on Daylight savings time.)
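For what it's worth, a quick Python check of that offset (seven hours behind UTC is UTC-7, i.e. Pacific Daylight Time):

from datetime import datetime, timedelta

ts = 1088631936
utc = datetime.utcfromtimestamp(ts)
print(utc)                        # 2004-06-30 21:45:36 (UTC)
print(utc - timedelta(hours=7))   # 2004-06-30 14:45:36, the time in the table row above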
Well, you've only shown us how your program uses 2 of the 8 digits, so we'll have to assume that the other 6 are ignored (because your program could be doing anything it wants with those other digits).
So, we could say that the input format is:
40mn0000
where m and n are two hex digits.
Then, the output is:
01/01/1900 + floor((2^(m+1)-2) + n*2^(m-3)) days
Explanation:
In each example, notice that incrementing n by 1 increases the number of days by 2^(m-3).
Notice that every time n goes from F to 0, m is incremented.
Using these two rules, and playing around with the numbers, you get the equation above.
(Except for floor, which was added because the output doesn't display fractional days).
I suppose you could rewrite this by replacing the two separate hex variables m and n with a single 2-digit hex number H. However, I think that would make the equation a lot uglier.
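Here is a quick Python sketch of that formula, checked against a few of the sample values above (it assumes values of the shape 0x40mn0000, and the function name is mine):

import math
from datetime import date, timedelta

def days_from_code(i):
    # m and n are the two hex digits after the leading 0x40
    m = (i >> 24) & 0xF
    n = (i >> 20) & 0xF
    return int(math.floor((2 ** (m + 1) - 2) + n * 2 ** (m - 3)))

base = date(1900, 1, 1)
for code in (0x40000000, 0x40080000, 0x40140000, 0x40220000, 0x40800000, 0x40D00000):
    print("0x%08X  %s" % (code, base + timedelta(days=days_from_code(code))))

# 0x40000000  1900-01-01
# 0x40080000  1900-01-02
# 0x40140000  1900-01-04
# 0x40220000  1900-01-08
# 0x40800000  1901-05-26
# 0x40D00000  1944-11-08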