Measure names containing "Total" have strange grand total calculation in cubes - olap

In programmatically building cubes for SQL Server Analysis Services using AMO, I've discovered that when a measure has "Total" in its title, the grand total in the cube is calculated by a distinct sum instead of just a sum (creating very strange results).
This doesn't occur when building cubes using DSO. Does anyone know why this could be happening?
Please pardon my use of python:
class MeasureSpec(MeasureSpec):
    def create(self, measureGroup, cube, dsv, factTable):
        log("creating measure:", self.name)
        measure = measureGroup.Measures.Add(self.name)
        measure.AggregateFunction = self.aggregateFunction
        measure.FormatString = self.format
        # Set datatype to integer for counts otherwise this is set to the same
        # type as the source column in createDataItem
        if self.aggregateFunction in (aggCount, aggDistinctCount):
            measure.DataType = MeasureDataType.Integer
        measure.Visible = self.isVisible
        measure.Source = createDataItem(dsv, factTable, self.column.getColumnName())

Here's what was going on: AMO was tagging the data column as byte for anything with "Total" in the name. The value was cycling to 32, which amazingly was the same number as a distinct sum on the column... Wow.

Related

Combining COUNTA() and AVERAGEX() in DAX / Power BI

I have a simple data set with training sessions for some athletes. Let's say I want to visualize how many training sessions are done as an average of the number of athletes, either in total or divided by the clubs that exist. I hope the data set is somewhat self-describing.
To norm the number of activities by the number of athletes I use two measures:
TotalSessions = COUNTA(Tab_Sessions[Session key])
AvgAthlete = AVERAGEX(VALUES(Tab_Sessions[Athlete]),[TotalSessions])
I give AvgAthlete as the desired value in both visuals shown below. If I filter on the clubs, the values are as expected, but with no filter applied I get some strange values.
What I guess happens is that since Athlete B doesn't do any strength, Athlete B is not included in the norming factor for strength. Is there a DAX function that can solve this?
If I didn't have the training sessions as a hierarchy (Type-Intensity), it would be pretty straightforward to do some kind of workaround with a calculated column, but that won't work with hierarchical categories. The expected results calculated in Excel are shown below:
Data set as csv:
Session key;Club;Athlete;Type;Intensity
001;Fast runners;A;Cardio;High
002;Fast runners;A;Strength;Low
003;Fast runners;B;Cardio;Low
004;Fast runners;B;Cardio;High
005;Fast runners;B;Cardio;High
006;Brutal boxers;C;Cardio;High
007;Brutal boxers;C;Strength;High
If you specifically want to aggregate this across whatever you have selected in your Club filter, you can write a simple measure that does that:
AvgAthlete =
VAR _athletes =
    CALCULATE (
        DISTINCTCOUNT ( 'Table'[Athlete] ),
        ALLEXCEPT ( 'Table', 'Table'[Club] )
    )
RETURN
    DIVIDE (
        [Sessions],
        _athletes
    )
Here we use a distinct count of values in the Athlete column, with all filters removed apart from the one on the Club column. This is, as far as I interpret your question, the denominator you are after.
The total number of sessions is then divided by this number of athletes.
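To check the arithmetic behind this measure outside Power BI, here is a minimal Python sketch using the sample rows from the question. The function name avg_per_athlete and its parameters are made up for illustration; it only mimics the idea (sessions in the current filter divided by the distinct athletes of the selected club, ignoring any Type or Intensity filter) and is not a model of the DAX engine.

```python
import csv
import io

CSV_DATA = """Session key;Club;Athlete;Type;Intensity
001;Fast runners;A;Cardio;High
002;Fast runners;A;Strength;Low
003;Fast runners;B;Cardio;Low
004;Fast runners;B;Cardio;High
005;Fast runners;B;Cardio;High
006;Brutal boxers;C;Cardio;High
007;Brutal boxers;C;Strength;High
"""

ROWS = list(csv.DictReader(io.StringIO(CSV_DATA), delimiter=";"))


def avg_per_athlete(rows, club=None, **filters):
    """Sessions matching all filters, divided by distinct athletes in the club.

    The denominator ignores every filter except `club`, which is the role
    ALLEXCEPT plays in the measure above.
    """
    in_club = [r for r in rows if club is None or r["Club"] == club]
    athletes = {r["Athlete"] for r in in_club}
    sessions = [r for r in in_club
                if all(r[col] == val for col, val in filters.items())]
    return len(sessions) / len(athletes) if athletes else None


print(avg_per_athlete(ROWS, Type="Cardio"))    # 5 sessions / 3 athletes
print(avg_per_athlete(ROWS, Type="Strength"))  # 2 sessions / 3 athletes
print(avg_per_athlete(ROWS, club="Fast runners", Type="Strength"))  # 1 / 2
```

Athlete B stays in the denominator for Strength even though B has no strength sessions, which is the behaviour the question asks for.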

Weighted randomization based on runtime data in System Verilog

Is there a way to do weighted randomization in SystemVerilog based on runtime data? Say I have a queue of integers and a queue of weights (unsigned integers), and I wish to select a random integer from the first queue according to the weights in the second queue.
int data[$] = '{10, 20, 30};
uint_t weights[$] = '{100, 200, 300};
Any random construct expects the weights to be hardcoded, as in:
constraint range { Var dist { [0:1] := 50 , [2:7] := 50 }; }
But in my case, I need to pick an element from an unknown number of elements.
PS: Assume the number of elements and weights will be the same always.
Unfortunately, the dist constraint only lets you choose from a fixed number of values.
Two approaches I can think of are:
1. Push each data value into a queue, using its weight as a repetition count. In your example, you wind up with a queue of 600 values. Randomly pick an index into the queue; the selected element has the distribution you want. An example is posted here.
2. Create an array of ranges, one for each weight. For your example the array would be uint_t ranges[][2] = '{{0,99},{100,299},{300,599}}. Then you could do the following in a constraint:
index inside {[0:weights.sum()-1]};
foreach (data[ii])
  index inside {[ranges[ii][0]:ranges[ii][1]]} -> value == data[ii];
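Both approaches can be sketched language-neutrally in Python to see why they give the same distribution. This is only an illustration of the idea, not SystemVerilog: the helper names value_for_index and pick are invented here, and the cumulative bounds list stands in for the ranges array.

```python
import random
from bisect import bisect_right
from itertools import accumulate

data = [10, 20, 30]
weights = [100, 200, 300]

# Approach 1: repeat each value `weight` times, then pick a random element.
expanded = [value for value, w in zip(data, weights) for _ in range(w)]

# Approach 2: cumulative upper bounds play the role of the ranges array,
# i.e. [0,99] -> 10, [100,299] -> 20, [300,599] -> 30.
bounds = list(accumulate(weights))  # [100, 300, 600]


def value_for_index(index):
    """Map an index in [0, sum(weights) - 1] to the value whose range holds it."""
    if not 0 <= index < bounds[-1]:
        raise ValueError("index out of range")
    return data[bisect_right(bounds, index)]


def pick(rng=random):
    """Draw one value with probability proportional to its weight."""
    return value_for_index(rng.randrange(bounds[-1]))


print(len(expanded), random.choice(expanded), pick())
```

The first approach costs memory proportional to the sum of the weights; the second only stores one bound per element.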

R: Continuous futures working backward

I want to create a continuous futures series, that is, to eliminate the gap between two consecutive contract series.
First, I want to download all individual contracts from the first one to the current one. The syntax is always the same:
Quandl("CME/INSTRUMENT_MONTHCODE_YEAR")
1. INSTRUMENT is GC (gold) in this case
2. MONTHCODE is G J M Q V Z
3. YEAR is from 1975 to 2017 (the actual contract)
With the data, I start from the oldest contract, in this case "CME/GCG1975", and the next contract, "CME/GCJ1975". Then I look at the last 6 rows of the first contract, GCG1975 (these are the most recent ones, because the dates are sorted in ascending order):
require(Quandl)
GCG1975 = Quandl("CME/GCG1975",order="asc", type="raw")
tail(GCG1975,6)
order can be "asc" or "desc" (ascending or descending); type can be "raw" (data frame), "ts", "xts" or "zoo".
And it outputs:
Image: quandl-1.png = Last values of GCG1975
Then I just want the 6th row from the end, and I want to drop the columns "Last" and "Change" (this could be done before processing each individual contract):
Image: quandl-2.png = Last 6th value GCG1975
Then I want to find the row with date 1975-02-18 (last 6th value GCG1975) in the next contract (GCJ1975):
Image: quandl-3.png = 1975-02-18 on GCJ1975
Then I compute the difference between the "Settle" of the G contract and the "Settle" of the J contract.
Difference_contract = 183.6 - 185.4
Difference_contract = -1.8
So that means that the next (J) contract is 1.8 points above the previous contract, so we have to add -1.8 to all the following values of the J contract (Open, High, Low, Settle), including the row for 1975-02-18:
Image: quandl-4.png = Differences between contracts
And then we have a continuous series like this:
Image: quandl-5.png = Continuous series
All these differences and sums to build the continuous series are applied from the oldest contract up to the current contract.
I think I can't post all of this because I don't have 10 reputation points and can only post 2 image links.
Any guidance would help; if you have any questions, ask me.
Thanks and hope everything is well.
RTA
Edit: I have uploaded the images and their links to my Dropbox, so please look there, because Stack Overflow doesn't allow more than 2 links in a post without 10 reputation points.
Dropbox file
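The adjustment step described in this question can be sketched as follows. This is a minimal Python illustration, not R/Quandl code: the function back_adjust is invented for the example, only the two Settle values on 1975-02-18 (183.6 and 185.4) come from the question, every other price is a made-up placeholder, and rounding to 2 decimals is an assumption to keep the floats tidy.

```python
def back_adjust(prev_contract, next_contract, roll_date,
                price_cols=("Open", "High", "Low", "Settle")):
    """Shift next_contract so its Settle matches prev_contract on roll_date.

    Both contracts are lists of dict rows with ISO "Date" strings.
    Returns the difference and the adjusted rows from roll_date onward.
    """
    prev_row = next(r for r in prev_contract if r["Date"] == roll_date)
    next_row = next(r for r in next_contract if r["Date"] == roll_date)
    diff = prev_row["Settle"] - next_row["Settle"]
    adjusted = []
    for row in next_contract:
        if row["Date"] < roll_date:
            continue  # before the roll date the previous contract is used
        shifted = dict(row)
        for col in price_cols:
            shifted[col] = round(row[col] + diff, 2)
        adjusted.append(shifted)
    return round(diff, 2), adjusted


# Hypothetical rows; only the Settle values on 1975-02-18 are from the post.
gcg = [{"Date": "1975-02-18", "Open": 183.0, "High": 184.0, "Low": 182.5, "Settle": 183.6}]
gcj = [
    {"Date": "1975-02-14", "Open": 184.0, "High": 185.0, "Low": 183.5, "Settle": 184.8},
    {"Date": "1975-02-18", "Open": 185.0, "High": 186.0, "Low": 184.5, "Settle": 185.4},
    {"Date": "1975-02-19", "Open": 185.5, "High": 187.0, "Low": 185.0, "Settle": 186.4},
]

diff, series = back_adjust(gcg, gcj, "1975-02-18")
print(diff)
for row in series:
    print(row)
```

Repeating this for each consecutive pair of contracts, from the oldest to the current one, would chain the adjustments into one series.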

SPSS Count depending on Conditions in several variables

I am quite new to SPSS and I need to count the number of certain errors made in a test (Stroop Test). There are three kinds of variables:
theCongruencies - can be 'I' or 'C' for incongruent or congruent
theWordkeys - code for a key that indicates the first letter of a word
thePressedKeys - code for the key pressed by the user
Each type exists 80 times, named e.g. theCongruencies_1 to theCongruencies_80.
I want to count how many times there is the same value in theWordKeys_x and thePressedKeys_x when theCongruencies_x has the value 'I'.
Example: theCongruencies_42 = 'I' theWordKeys_42 = 88 thePressedKeys_42 = 88
So I need to do something like this in my SPSS Code:
COMPUTE InhibErrs = COUNT(
IF(
theCongruencies_1 to theCongruencies_80 EQ 'I'
AND theWordkeys_1 to theWordkeys_80 EQ thePressedKeys_1 to thePressedKeys_80)).
execute.
Thanks a lot
Deego
Try this:
compute countVar=0.
do repeat theCongruencies=theCongruencies_1 to theCongruencies_80
/theWordkeys=theWordkeys_1 to theWordkeys_80
/thePressedKeys=thePressedKeys_1 to thePressedKeys_80.
compute countVar=sum(countVar, (theCongruencies="I" and theWordkeys=thePressedKeys)).
end repeat.
exe.
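For readers who want to verify the counting logic outside SPSS, here is a minimal Python sketch of what the loop computes for a single case (row). The function name count_inhibition_errors and the three-trial sample are hypothetical; only the last trial (I, 88, 88) is the example given in the question.

```python
def count_inhibition_errors(congruencies, word_keys, pressed_keys):
    """Count trials that are incongruent ('I') and where the pressed key
    equals the word key, mirroring the DO REPEAT loop above for one case."""
    return sum(
        1
        for c, w, p in zip(congruencies, word_keys, pressed_keys)
        if c == "I" and w == p
    )


# Hypothetical three-trial case; the last trial is the example from the question.
print(count_inhibition_errors(["C", "I", "I"], [70, 82, 88], [70, 71, 88]))
```

The congruent trial is ignored even though its keys match, and the incongruent trial with different keys is ignored too, so only the last trial is counted.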

R Optimisation - Integer Programming

I have tried to use the R package lpSolve, and in particular the lp.transport function, to solve an optimisation problem. In my fictitious example below I have 5 office sites that I need to resource with a minimum number of employees, and I have set up a cost matrix that gives the distance from each employee's home to each office. I want to minimize the total distance traveled to work whilst meeting the minimum number of employees per office.
Initially this was working, as I was treating all employees as equal (1). However, problems started to occur when I rated each employee by how efficient they are. For example, I now want to say that officeX needs the equivalent of 2 engineers, which might be made up of 4 engineers who are 50% efficient or 1 who is 200% efficient. When I do this, however, the solution found will split an employee across a number of offices. What I need is an additional constraint to impose that an employee can only be at 1 office.
Anyway, hopefully that is enough background. Here is my example:
library(lpSolve)
Employee <- c("Jim","John","Jonah","James","Jeremy","Jorge")
Office1 <- c(2.58321505105556, 5.13811249390279, 2.75943834864996,
6.73543614029559, 6.23080251653027, 9.00620341764497)
Office2 <- c(24.1757667923894, 19.9990724784926, 24.3538456922105,
27.9532073293925, 26.3310994833106, 14.6856664813007)
Office3 <- c(38.6957155251069, 37.9074293509861, 38.8271000719858,
40.3882569566947, 42.6658938732098, 34.2011184027657)
Office4 <- c(28.8754359274453, 30.396841941228, 28.9595182970988,
29.2042274337124, 33.3933900645023, 28.6340025144932)
Office5 <- c(49.8854888720157, 51.9164328512659, 49.948290261029,
49.4793138594302, 54.4908258333456, 50.1487397648236)
#create CostMatrix
costMat<-data.frame(Employee,Office1, Office2, Office3, Office4, Office5)
#efficiency is the worth of employees, eg if 1 they are working at 100%
#so if for example I wanted 5 Employees
#working in a office then I could choose 5 at 100% or 10 working at 50% etc...
efficiency<-c(0.8416298, 0.8207991, 0.7129663, 1.1406839, 1.3868177, 1.1989748)
#Uncomment next line to see the working version based on headcount
#efficiency<-c(1,1,1,1,1,1)
#Minimum is the minimum number of Employees we want in each office
minimum<-c(1, 1, 2, 1, 1)
#solve problem
opSol <- lp.transport(cost.mat = as.matrix(costMat[,-1]),
                      direction = "min",
                      col.signs = rep(">=", length(minimum)),
                      col.rhs = minimum,
                      row.signs = rep("==", length(efficiency)),
                      row.rhs = efficiency,
                      integers = NULL)
#view solution
opSol$solution
# My issue is that one employee is being spread across multiple areas;
# what I really want is an extra constraint that says that in a row there
# can only be 1 non-zero value.
I think this is no longer a transportation problem. However, you can still solve it as a MIP model:
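The model itself is not shown here, so the following is only one plausible reading of it, sketched in Python rather than R: each employee is assigned to exactly one office (a binary choice), every office must receive at least its minimum total efficiency, and the total distance is minimized. Instead of calling a MIP solver, the sketch simply enumerates all assignments, which is feasible only for tiny instances. The function name best_assignment and the small three-employee instance are hypothetical.

```python
from itertools import product


def best_assignment(cost, efficiency, minimum):
    """Brute-force the binary assignment model.

    cost[i][j]    distance from employee i to office j
    efficiency[i] worth of employee i
    minimum[j]    efficiency required at office j
    Returns (total_cost, offices) where offices[i] is the office index of
    employee i, or None if no whole-employee assignment is feasible.
    """
    n_emp, n_off = len(efficiency), len(minimum)
    best = None
    for offices in product(range(n_off), repeat=n_emp):
        staffed = [0.0] * n_off
        for i, j in enumerate(offices):
            staffed[j] += efficiency[i]
        if any(s < m for s, m in zip(staffed, minimum)):
            continue  # some office is below its required efficiency
        total = sum(cost[i][j] for i, j in enumerate(offices))
        if best is None or total < best[0]:
            best = (total, offices)
    return best


# Hypothetical instance: 3 employees, 2 offices, each office needs >= 1.
cost = [[1, 5], [4, 2], [3, 3]]
print(best_assignment(cost, [1.0, 0.5, 0.6], [1, 1]))
```

With the efficiencies and minimums from the question, the total available efficiency (about 6.10) barely exceeds the total required (6), and this sketch finds no assignment of whole employees that satisfies every office, so it returns None for that data. Relaxing the minimums or adding employees would be needed before a one-office-per-employee constraint can be met.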
