I am building a report in Power BI for all the calls we do.
Below is a simplified version of the data I work with (we produce 1250 calls per hour, so my real data is a lot bigger than this).
Every row is a call attempt: the first column identifies who the attempt was to, the second the week, the third the attempt number, and the last column the status of the phone call.
For example: We called ID 1 two times, the first time in week 1 which ended in status 310 (means callback) and the second time in week 2 which ended in status 710 (positive conversion).
The problem: I want to make a count of all the people (Call ID's) who are still waiting on a phone call (last call status = 310).
If I use: CALCULATE(DISTINCTCOUNT(data[ID]), data[Status] = 310) the result = 3. Which makes sense: PBI counts 3 times a 310 status.
But it should count only 1, because ID 1 & 2 have already been called back and have a positive result (710 & 711). So it needs to look at the highest attempt number per ID.
So I tried: CALCULATE(DISTINCTCOUNT(Blad1[ID]), FILTER(Blad1,MAX(Blad1[Attempt])), Blad1[Status] = 310) But this also results in a count of 3.
I've found solutions in which you make a calculated column, but I also want to combine this with a slicer on the weeknumber, so I can check what the callbacks in a specified week are.
So basically I need PBI to count the ID's with a certain status (310) with max attempt. Does anybody know how I can do this?
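To pin down the logic being asked for (this is not DAX, only a minimal Python sketch of the intended result), the idea is: find each ID's highest attempt, then count the IDs whose status on that attempt is 310. The rows below are hypothetical; only ID 1 follows the example in the post, and the `max_week` parameter is an assumed stand-in for the week slicer, interpreted here as "only look at attempts up to this week".

```python
# Hypothetical rows: (call_id, week, attempt, status).
# ID 1 matches the example above; the other rows are invented.
ROWS = [
    (1, 1, 1, 310),
    (1, 2, 2, 710),
    (2, 1, 1, 310),
    (2, 2, 2, 711),
    (3, 2, 1, 310),
]


def count_waiting(rows, status=310, max_week=None):
    """Count IDs whose latest attempt (up to max_week, if given) has `status`."""
    latest = {}  # call_id -> (attempt, status) of the highest attempt seen
    for call_id, week, attempt, row_status in rows:
        if max_week is not None and week > max_week:
            continue  # assumed slicer behaviour: ignore later weeks
        if call_id not in latest or attempt > latest[call_id][0]:
            latest[call_id] = (attempt, row_status)
    return sum(1 for _, s in latest.values() if s == status)


print(count_waiting(ROWS))              # 1: only ID 3 is still waiting
print(count_waiting(ROWS, max_week=1))  # 2: IDs 1 and 2 were waiting after week 1
```

A measure would need to do the same two steps in the current filter context, which is why a calculated column (fixed at refresh time) does not react to the slicer.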
I have a big dataset (around 100k rows) with 2 columns referencing a device_id and a date and the rest of the columns being attributes (e.g. device_repaired, device_replaced).
I'm building a ML algorithm to predict when a device will have to be maintained. To do so, I want to calculate certain features (e.g. device_reparations_on_last_3days, device_replacements_on_last_5days).
I have a function that subsets my dataset and returns a calculation:
For the specified device,
That happened before the day in question,
As long as there's enough data (e.g. if I want last 3 days, but only 2 records exist this returns NA).
Here's a sample of the data and the function outlined above:
data = data.frame(device_id=c(rep(1,5),rep(2,10))
,day=c(1:5,1:10)
,device_repaired=sample(0:1,15,replace=TRUE)
,device_replaced=sample(0:1,15,replace=TRUE))
# Example: How many times device 1 was repaired over the last 2 days before day 3
# => getCalculation(3,1,data,"device_repaired",2)
getCalculation <- function(fday,fdeviceid,fdata,fattribute,fpreviousdays){
# Subset dataset
df = subset(fdata,day<fday & day>(fday-fpreviousdays-1) & device_id==fdeviceid)
# Make sure there's enough data; if so, make calculation
if(nrow(df)<fpreviousdays){
calculation = NA
} else {
calculation = sum(df[,fattribute])
}
return(calculation)
}
My problem is that the number of attributes available (e.g. device_repaired) and of features to calculate (e.g. device_reparations_on_last_3days) has grown rapidly, and my script now takes around 4 hours to execute, since I need to loop over each row and calculate all these features.
I'd like to vectorize this logic using some apply approach which would also allow me to parallelize its execution, but I don't know if/how it's possible to add these arguments to a lapply function.
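On the question of passing the extra arguments: R's lapply forwards any arguments given after FUN on to the function. The sketch below shows the same idea in Python rather than R, purely as an illustration: the fixed arguments are bound once, and the resulting one-feature function is mapped over the rows. The function is a direct port of getCalculation; the data values and the name `repaired_last_2` are hypothetical.

```python
from functools import partial


def get_calculation(fday, fdeviceid, fdata, fattribute, fpreviousdays):
    """Sum `fattribute` over the `fpreviousdays` days before `fday` for one
    device; return None (R's NA) when fewer rows than that exist."""
    window = [
        row for row in fdata
        if row["device_id"] == fdeviceid
        and fday - fpreviousdays - 1 < row["day"] < fday
    ]
    if len(window) < fpreviousdays:
        return None
    return sum(row[fattribute] for row in window)


# Hypothetical data: one device, five days.
data = [
    {"device_id": 1, "day": d, "device_repaired": r}
    for d, r in zip(range(1, 6), [1, 0, 1, 1, 0])
]

# Bind the arguments that stay fixed for one feature ...
repaired_last_2 = partial(
    get_calculation, fdata=data, fattribute="device_repaired", fpreviousdays=2
)
# ... then map over the rows; only day and device vary.
features = [repaired_last_2(row["day"], row["device_id"]) for row in data]
print(features)  # [None, None, 1, 1, 2]
```

Because each call depends only on its own arguments, the same mapping could be handed to a parallel map without changing the function.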
I want to create a continuous futures series, that is to eliminate a gap between two series.
The first thing I want is to download all individual contracts from the beginning until now. The syntax is always the same:
Quandl("CME/INSTRUMENT_MONTHCODE_YEAR")
1. INSTRUMENT is GC (gold) in this case
2. MONTHCODE is G J M Q V Z
3. YEAR is from 1975 to 2017 (the current contract)
With the data, I start from the earliest contract, in this case "CME/GCG1975", and the contract after it, "CME/GCJ1975". I then look at the last 6 rows of the first contract, GCG1975 (these are the most recent ones, because the data is requested in ascending date order):
require(Quandl)
GCG1975 = Quandl("CME/GCG1975",order="asc", type="raw")
tail(GCG1975,6)
order can be "asc" or "desc" (ascending or descending); type can be raw (data frame), ts, xts or zoo.
And it outputs:
Image: quandl-1.png = Last values of GCG1975
Then I just want the 6th row counting from the end, and I want to drop the columns "Last" and "Change" (this could be done before processing each individual contract):
Image: quandl-2.png = Last 6th value GCG1975
Then I want to find the row with date 1975-02-18 (last 6th value GCG1975) in the next contract (GCJ1975):
Image: quandl-3.png = 1975-02-18 on GCJ1975
Then I compute the difference between the "Settle" of the G contract and the "Settle" of the J contract.
Difference_contract = 183.6 - 185.4
Difference_contract = -1.8
So the next (J) contract is 1.8 points above the previous (G) contract, and we have to add -1.8 to all the prices of the J contract (Open, High, Low, Settle) from that date on, including the 1975-02-18 row. Like this:
Image: quandl-4.png = Differences between contracts
And then we have a continuous series like this:
Image: quandl-5.png = Continuous series
All these differences and sums to make a continuous series are computed from the earliest contract through to the current contract.
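The splicing step described above can be sketched as follows. This is Python rather than R and makes several assumptions: each contract is a list of rows sorted by ascending date, only a "settle" column is shown (the real data would pass Open, High, Low and Settle as `price_cols`), and the front contract is kept up to the day before the roll date. All prices are invented except the two Settle values 183.6 and 185.4 quoted above; `splice` is a hypothetical helper name.

```python
def splice(front, nxt, rows_from_end=6, price_cols=("settle",)):
    """Back-adjust `nxt` so it joins `front` without a gap at the roll date."""
    roll_row = front[-rows_from_end]  # e.g. the 6th row from the end
    roll_date = roll_row["date"]
    next_row = next(r for r in nxt if r["date"] == roll_date)
    # Difference between the two Settle prices on the roll date.
    diff = round(roll_row["settle"] - next_row["settle"], 4)
    # Keep the front contract before the roll date ...
    kept = [r for r in front if r["date"] < roll_date]
    # ... and shift every price of the next contract from the roll date on.
    shifted = [
        {**r, **{c: round(r[c] + diff, 4) for c in price_cols}}
        for r in nxt if r["date"] >= roll_date
    ]
    return kept + shifted, diff


def rows(pairs):
    return [{"date": d, "settle": s} for d, s in pairs]


gcg = rows([("1975-02-14", 182.9), ("1975-02-18", 183.6), ("1975-02-19", 184.0),
            ("1975-02-20", 184.2), ("1975-02-21", 183.8), ("1975-02-24", 184.5),
            ("1975-02-25", 184.1)])
gcj = rows([("1975-02-14", 184.7), ("1975-02-18", 185.4), ("1975-02-19", 185.9)])

series, diff = splice(gcg, gcj)
print(diff)  # -1.8
```

To chain all contracts, the spliced series would become the new `front` for the following contract, so the adjustments accumulate from the earliest contract to the current one.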
I don't think I can post the images directly, because I don't have 10 reputation points and can only post 2 image links.
Any guidance would help, and if you have any questions, just ask.
Thanks and hope everything is well.
RTA
Edit: I have uploaded the images referenced in the post to my Dropbox, so please look there, because Stack Overflow doesn't allow posting more than 2 links without 10 reputation points.
Dropbox file
I'm a total beginner in Choco Solver. I want to make a simple shift scheduler.
I have set up integer variables like this:
IntVar day1 = model.intVar("day1", new int[] {0,1,2,3,4,5});
where 0, 1, ..., 5 are reference IDs of employees.
I have a total of 30 variables (one for every day of the month), since this is a monthly shift schedule.
I have set up constraints that, for example, do not allow an employee to be on shift two days in a row.
My question is:
How can I set up a constraint such that each employee has a minimum of 5 shifts, i.e. each value in the domain appears at least 5 times across all 30 variables?
Thank you!
There are several ways of doing this. Take a look at model.globalCardinality and model.count; these constraints let you count the number of times a value is used by a set of variables.
http://choco-solver.org/apidocs/org/chocosolver/solver/constraints/IConstraintFactory.html
For instance, model.count(3, vars, model.intVar(5,10)).post(); means that between 5 and 10 variables in vars should be equal to 3, so employee 3 should do between 5 and 10 shifts.
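To make the meaning of that constraint concrete, here is a plain Python check (not Choco code, and not a solver) of what a finished assignment must satisfy. The function names and the example schedule are hypothetical.

```python
from collections import Counter


def satisfies_count(value, assignment, low, high):
    """True if `value` is used between `low` and `high` times (inclusive)."""
    return low <= sum(1 for v in assignment if v == value) <= high


def every_employee_has_min_shifts(assignment, employees, minimum):
    """True if every employee ID appears at least `minimum` times."""
    counts = Counter(assignment)
    return all(counts[e] >= minimum for e in employees)


# Hypothetical 30-day schedule in which the 6 employees simply rotate.
schedule = [day % 6 for day in range(30)]

print(satisfies_count(3, schedule, 5, 10))                   # True
print(every_employee_has_min_shifts(schedule, range(6), 5))  # True
print(every_employee_has_min_shifts([0] * 30, range(6), 5))  # False
```

In the model itself, the second check corresponds to posting one count constraint per employee ID, or a single global cardinality constraint over all six values.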
I'm building a report using Report Builder. It uses Report Application Pascal, which is based on Delphi Object Pascal. I'm still learning this and struggling with a variable value.
I have a variable called 'duration' which contains the following script:
value := round(ReportWizardQuery['wodFinishDate'] - ReportWizardQuery['wodCreateDate']);
This gives me the result I want. It calculates the total number of days between the two dates.
What I'm trying to do then is to use the value of this 'duration' variable to find out if jobs (which are defined by the start and end date) have been completed on the same day, within 1-5 days, 6-10 days, etc.
I've created columns with these headings and placed a variable in each column in the detail band of the report. The code I have written in the variable for 'same-day' is:
if (duration = 0) then
value := 1;
Likewise for jobs being completed between 1 - 5 days
if (duration > 0 and < 6) then
value := 1;
But the variables are blank when the report is run. I have tried assigning the value of the 'duration' variable to the same-day variable, and it returns a weird number which is the same for each line in the report (99468080, or 10150660, etc.). This number changes each time I run the report and always seems to be 8 digits long.
Does anybody have any idea what I'm doing wrong, and how I can assign the value 1 to each variable if the duration variable = 0, or is between 1 and 5, etc.?
Thanks.
I have a vector of binary variables which state whether a product is on promotion in the period. I'm trying to work out how to calculate the duration of each promotion and the duration between promotions.
promo.flag = c(1,1,0,1,0,0,1,1,1,0,1,1,0)
So in other words: if promo.flag is same as previous period then running.total + 1, else running.total is reset to 1
I've tried playing with apply functions and cumsum but can't manage to get the conditional reset of running total working :-(
The output I need is:
promo.flag = c(1,1,0,1,0,0,1,1,1,0,1,1,0)
rolling.sum = c(1,2,1,1,1,2,1,2,3,1,1,2,0)
Can anybody shed any light on how to achieve this in R?
It sounds like you need run length encoding (via the rle command in base R).
unlist(sapply(rle(promo.flag)$lengths,seq))
Gives you a vector 1 2 1 1 1 2 1 2 3 1 1 2 1. Not sure what you're going for with the zero at the end, but I assume it's a terminal condition and easy to change after the fact.
This works because rle() returns a list of two components, one of which is named lengths and holds how many times each value is repeated in a row. seq, when fed a single integer, gives you a sequence from 1 to that number. sapply then calls seq on each of the numbers in rle()$lengths in turn, generating a list of mini sequences, and unlist turns that list into a vector.
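For comparison, the same run-length idea can be sketched in Python (not R) with itertools.groupby, which groups consecutive equal values much like rle() does. The function name `run_positions` is hypothetical.

```python
from itertools import groupby


def run_positions(flags):
    """Number each element by its position within its run of equal values."""
    out = []
    for _, run in groupby(flags):  # one group per run of equal values
        run_length = len(list(run))
        out.extend(range(1, run_length + 1))  # 1..run_length, like seq()
    return out


promo_flag = [1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 1, 1, 0]
print(run_positions(promo_flag))
# [1, 2, 1, 1, 1, 2, 1, 2, 3, 1, 1, 2, 1]
```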