Trouble calculating nth percentile with reductio - crossfilter

I've got some crossfilter data with dates (d) and values (v):
[
{d: "2013-07-26T00:00:00.000Z", v: 2.5},
{d: "2013-07-25T00:00:00.000Z", v: 2.64},
// ...and many more
]
I've created a group for the months in Crossfilter (crossfilter2#1.4.5):
months = cf.dimension((d) => {
const dateObj = new Date(d.d);
// use 1-12 instead of 0-11
return dateObj.getMonth() + 1;
});
monthsGroup = months.group();
So monthsGroup.all() returns an array of 12 objects, aggregated by month. I want those objects to include the min, max, and median, as well as the 25th and 75th percentile. Reductio (reductio#0.6.3) helps with the min, max, and median out of the box, so I've added a custom aggregator to add the 75th and 25th percentiles.
The following code works, but it's very slow:
const monthReducer = reductio()
.valueList(d => d.v)
.min(true)
.max(true)
.median(true)
.count(true)
.custom({
add(p) {
const valueList = p.valueList;
p.p75 = getPercentile(valueList, 75);
p.p25 = getPercentile(valueList, 25);
return p;
},
remove(p) {
const valueList = p.valueList;
p.p75 = getPercentile(valueList, 75);
p.p25 = getPercentile(valueList, 25);
return p;
},
initial(p) {
p.p75 = undefined;
p.p25 = undefined;
return p;
},
});
If I remove the .custom block, it's much faster. This runs the code for each item in the data, which is unnecessary because it only needs to look at the final valueList. Reductio has a barely-documented .post() hook that I think would do the trick here, but I can't get it working.
UPDATE: I got the post-processing hook callback to run, but it doesn't work the way I expected.
I tried registering a new post processor with an undocumented method I saw in the source:
// register post-processing function to add percentiles
reductio.registerPostProcessor('addPercentiles', (prior) => {
const all = prior();
return () => {
const updated = all.map((e) => {
const valueList = e.value.valueList;
e.value.p75 = getPercentile(valueList, 75);
e.value.p25 = getPercentile(valueList, 25);
return e;
});
return updated;
};
});
and adding it to the post() hook:
// run post-processing to add the 25th & 75th %iles
this.monthsGroup.post().addPercentiles()();
This appears to do what I want, but only once. It doesn't re-run the post hooks when a filter is applied to another dimension.
If median is just the 50th percentile, it should be trivial to also get the 25th and 75th. I feel like I'm close, but I'm obviously doing something wrong. How can I add these aggregations to the reductio reducer?

One solution is to add the quantiles manually, right before rendering the chart. I have a formatData function that does date/time formatting and restructures the data to be more d3-friendly. Since valueList is still available in every element of the array, I just added a couple of lines there to calculate the 25th and 75th percentiles.
Not ideal, but very easy!
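The question never shows getPercentile, so here is a minimal sketch of what the "couple of lines" in formatData might look like. It assumes the elements have the shape returned by group.all(), that is {key, value: {valueList}}. Both getPercentile (linear interpolation between closest ranks) and withPercentiles are hypothetical helpers, not part of reductio or crossfilter.

```javascript
// Hypothetical helper: p-th percentile using linear interpolation between ranks.
function getPercentile(values, p) {
  if (!values || values.length === 0) return undefined;
  // Sort a copy so the group's own valueList is never reordered.
  const sorted = [...values].sort((a, b) => a - b);
  const rank = (p / 100) * (sorted.length - 1);
  const lo = Math.floor(rank);
  const hi = Math.ceil(rank);
  return sorted[lo] + (sorted[hi] - sorted[lo]) * (rank - lo);
}

// Hypothetical helper: returns decorated copies of the group.all() elements,
// leaving the objects owned by crossfilter untouched.
function withPercentiles(groups) {
  return groups.map(({ key, value }) => ({
    key,
    value: {
      ...value,
      p25: getPercentile(value.valueList, 25),
      p75: getPercentile(value.valueList, 75),
    },
  }));
}

// Stand-in for monthsGroup.all(); real data would come from crossfilter.
const sample = [
  { key: 7, value: { valueList: [5, 1, 4, 2, 3] } },
  { key: 8, value: { valueList: [] } },
];
const result = withPercentiles(sample);
```

Because this runs once per render on the final valueList rather than once per added or removed record, it avoids the per-record cost of the .custom() reducer, and it picks up filter changes as long as it is called on every redraw.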


Retrieve and compare the style attribute of an element periodically using Cypress

I have a time indicator that travels over a timescale. The indicator's style attribute value changes every x milliseconds, and I need to get and store successive values and check that the previously captured value is greater than the latest one.
Initial value:
Latest value:
The logic is: starting from one point (left 10), the indicator moves to the left every second (left 0, -1, -2, -3, ...).
I tried a few approaches. One was to capture both values in the same 'cy.then', but in that case the element does not yet have the latest value. My current attempt is below: it fetches the value and, with the help of a regex, produces a 'comparable' number. How can I store and compare those values? Additionally, what is the best approach if more than two values need to be compared?
const BTN_CONTROL_TIMEINDICATOR = '#currentTimeIndicator'
static verifyTimeLapse() {
//wip
var initialVal, nextVal
initialVal = this.getAnyValueOfAnElement(BTN_CONTROL_TIMEINDICATOR)
cy.wait(500)
nextVal = this.getAnyValueOfAnElement(BTN_CONTROL_TIMEINDICATOR)
cy.log(initialVal > nextVal)
}
static getAnyValueOfAnElement(element) {
//wip
cy.get(element)
.then(($ele) => {
const val=$ele.attr('style').replace(/[^\d.-]/g, '')
cy.log(val)
// return does not work
})
}
cy.log:
Page objects don't work very well with the Cypress command queue. Here's what you might do with custom commands instead.
/* Get the numeric value of CSS left in px */
Cypress.Commands.add('getTimescaleValue', () => {
cy.get('#currentTimeIndicator')
.then($el => +$el[0].style.left.replace('px',''))
})
/* Get a sequence of time scale values */
Cypress.Commands.add('getTimescaleValues', ({numValues, waitBetween}) => {
const values = [];
Cypress._.times(numValues, () => { // repeat inner commands n times
cy.getTimescaleValue()
.then(value => values.push(value)) // save value
.wait(waitBetween)
})
return cy.wrap(values);
})
/* Assert a sequence of values are in descending order */
Cypress.Commands.add('valuesAreDescending', { prevSubject: true }, (values) => {
values.reduce((prev, current) => {
// reduce() without an initial value starts with prev = first element,
// so every adjacent pair is compared (including pairs where prev is 0)
expect(prev).to.be.gt(current) // assert pairs of values
return current
});
})
it('check the timeline', () => {
cy.getTimescaleValues({ numValues: 10, waitBetween: 100 })
.valuesAreDescending()
})
Log
assert
expected 63 to be above 58
assert
expected 58 to be above 48
assert
expected 48 to be above 43
assert
expected 43 to be above 33
assert
expected 33 to be above 23
assert
expected 23 to be above 18
assert
expected 18 to be above 13
assert
expected 13 to be above 3
assert
expected 3 to be above -2
Tested with
<div id="currentTimeIndicator" style="left:63px">Target</div>
<script>
const timer = setInterval(() => {
const div = document.querySelector('#currentTimeIndicator')
const left = +div.style.left.replace('px', '');
if (left < 0) {
clearInterval(timer)
return
}
const next = (left - 5) + 'px';
div.style.left = next;
}, 100)
</script>
If your app uses setInterval() for timing, you should be able to use cy.clock() and cy.tick() instead of .wait(waitBetween) to get more precise sampling and faster test execution.
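The parsing and comparison steps can also be kept as plain functions, which makes them easy to check outside the Cypress command queue. The sketch below is only an illustration: parseLeft and isStrictlyDescending are hypothetical names, and the regex assumes the inline style expresses left in px.

```javascript
// Hypothetical helper: extract the numeric `left` value (in px) from an
// inline style string such as "left:63px" or "top:0; left: -2px".
function parseLeft(style) {
  const match = /(?:^|;)\s*left\s*:\s*(-?\d+(?:\.\d+)?)px/.exec(style);
  return match ? Number(match[1]) : NaN;
}

// Hypothetical helper: true when each value is strictly greater than the next.
// Works for any number of samples and treats 0 like any other value.
function isStrictlyDescending(values) {
  return values.every((value, i) => i === 0 || values[i - 1] > value);
}

const samples = ['left:63px', 'left: 3px', 'top:0; left:0px', 'left:-2px'].map(parseLeft);
const movingLeft = isStrictlyDescending(samples);
```

Inside a test, the array collected by a command such as getTimescaleValues could be passed to isStrictlyDescending and asserted with a single expect.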
I don't know where the initial value comes from, but before it changes (on page load, as the first step of a click handler, etc.) you can store it in a data attribute, like this:
let item = document.querySelector("#currentTimeIndicator");
item.dataset.left = parseFloat(item.style.left);
console.log(item);
<div id="currentTimeIndicator" style="left:-20px"></div>

How to separate multiple columns from a range in an array?

I have a range of data in a Google Sheet and I want to store that data in an array using Apps Script. At the moment I can bring in the data easily enough and put it into an array with this code:
var sheetData = sheet.getSheetByName('Fruit').getRange('A1:C2').getValues()
However, this puts each row into an array. For example, [[Apple,Red,Round],[Banana,Yellow,Long]].
How can I arrange the array by columns so it would look: [[Apple,Banana],[Red,Yellow],[Round,Long]].
Thanks.
It looks like you have to transpose the array. You can create a function:
function transpose(data) {
return (data[0] || []).map (function (col , colIndex) {
return data.map (function (row) {
return row[colIndex];
});
});
}
and then pass the values obtained by .getValues() to that function:
var sheetData = transpose(sheet.getSheetByName('Fruit').getRange('A1:C2').getValues())
and check the log. See if that works for you?
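As a quick check of the transpose approach, here is the same idea written with arrow functions and applied to the sample data from the question. This is a sketch that assumes a runtime with ES2015 support (for example the Apps Script V8 runtime); the literal rows array stands in for the result of .getValues().

```javascript
// Same technique as above: map over the first row's column indexes,
// and for each index collect that cell from every row.
const transpose = (rows) =>
  (rows[0] || []).map((_, colIndex) => rows.map((row) => row[colIndex]));

// Stand-in for sheet.getRange('A1:C2').getValues()
const rows = [
  ['Apple', 'Red', 'Round'],
  ['Banana', 'Yellow', 'Long'],
];

const columns = transpose(rows);
console.log(JSON.stringify(columns));
// prints [["Apple","Banana"],["Red","Yellow"],["Round","Long"]]
```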
Use the Google Sheets API, which allows you to specify the primary dimension of the response. To do so, first you must enable the API and the advanced service
To acquire values most efficiently, use the spreadsheets.values endpoints, either get or batchGet as appropriate. Both calls accept optional arguments, one of which controls the orientation of the response:
const wb = SpreadsheetApp.getActive();
const valService = Sheets.Spreadsheets.Values;
const asColumn2D = { majorDimension: SpreadsheetApp.Dimension.COLUMNS };
const asRow2D = { majorDimension: SpreadsheetApp.Dimension.ROWS }; // this is the default
var sheet = wb.getSheetByName("some name");
var rgPrefix = "'" + sheet.getName() + "'!";
// spreadsheetId, range string, {optional arguments}
var single = valService.get(wb.getId(), rgPrefix + "A1:C30");
var singleAsCols = valService.get(wb.getId(), rgPrefix + "A1:C30", asColumn2D);
// spreadsheetId, {other arguments}
var batchAsCols = valService.batchGet(wb.getId(), {
ranges: [
rgPrefix + "A1:C30",
rgPrefix + "J8",
...
],
majorDimension: SpreadsheetApp.Dimension.COLUMNS
});
console.log({rowResp: single, colResp: singleAsCols, batchResponse: batchAsCols});
The reply will either be a ValueRange (using get) or an object wrapping several ValueRanges (if using batchGet). You can access the data (if any was present) at the ValueRange's values property. Note that trailing blanks are omitted.
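Because trailing blanks are omitted, columns read this way can have different lengths, and values can be missing entirely for an empty range. If equal-length columns are needed downstream, one option is to pad them. The sketch below uses a hypothetical padColumns helper and a hard-coded array standing in for a ValueRange's values property.

```javascript
// Hypothetical helper: pad (or trim) every column to exactly `length` cells.
function padColumns(columns, length, filler = '') {
  return columns.map((col) => {
    const padded = col.slice(0, length);
    while (padded.length < length) padded.push(filler);
    return padded;
  });
}

// Stand-in for response.values with majorDimension set to COLUMNS;
// the second and third columns had trailing blank cells.
const values = [['Apple', 'Banana', 'Cherry'], ['Red', 'Yellow'], ['Round']];

// `values || []` guards against a response with no data at all.
const padded = padColumns(values || [], 3);
```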
You can find more information in the Sheets API documentation, and other relevant Stack Overflow questions such as this one.

Performance server scripting

I have a table in which each customerKey value appears in multiple rows, each with a numeric value. I wrote a script that, for each row, scans the whole table to find all values assigned to the current customerKey and returns the highest one.
I have a performance problem: the script processes around 10 records per second. Any ideas how to improve this, or could you propose an alternative solution, please?
function getLastest() {
var date = app.models.magicMain.newQuery();
var date_all = date.run();
date_all.forEach(function(e) { // for every row of date_all
var temp = date_all.filter(function(x) {
return x.SubscriberKey === e.SubscriberKey; // find matching records for the current x.SubscriberKey
});
var dates = [];
temp.forEach(function(z) { // get all matching "dates"
dates.push(z.Date);
});
var finalValue = dates.reduce(function(a, b) { // get highest dates value (integer)
return Math.max(a, b);
});
var record = app.models.TempOperatoins.newRecord(); // save results to DB
record.email = e.SubscriberKey.toString() + " " + finalValue.toString();
app.saveRecords([record]);
});
}
The only suggestion I have would be to add:
var recordstosave = [];
At the top of your function.
Then replace app.saveRecords([record]) with recordstosave.push(record).
Finally, outside of your forEach callback, call app.saveRecords(recordstosave) once.
I saw major processing time improvements doing this rather than saving each record individually inside a loop.
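Beyond batching the saves, the nested filter inside forEach rescans the whole table for every row. A single pass that tracks the highest value per key avoids that. The sketch below is not from the answer above; it uses plain objects in place of App Maker records, and latestBySubscriber is a hypothetical name.

```javascript
// Hypothetical helper: one pass over the rows, keeping the highest Date per key.
function latestBySubscriber(rows) {
  var latest = {}; // SubscriberKey -> highest Date seen so far
  rows.forEach(function (row) {
    var key = String(row.SubscriberKey);
    var seen = Object.prototype.hasOwnProperty.call(latest, key);
    if (!seen || row.Date > latest[key]) {
      latest[key] = row.Date;
    }
  });
  return latest;
}

// Plain objects standing in for the records returned by query.run()
var sampleRows = [
  { SubscriberKey: 'a', Date: 3 },
  { SubscriberKey: 'a', Date: 7 },
  { SubscriberKey: 'b', Date: 5 },
  { SubscriberKey: 'a', Date: 1 }
];
var latest = latestBySubscriber(sampleRows);
```

This also yields one result per key rather than one per row, so far fewer records would need to be created and saved in a single batch afterwards.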

Crossfilter: how to build custom reduce functions when I want to access a specific array-value?

I have constructed my crossfilter-setup a bit different than in most examples I can find, namely:
I have data-array d with multiple data-sources included, among which is data1.
var cf = crossfilter(d3.range(0, d.data1.length));
Then I construct my dims like:
var dim = cf.dimension(function(i) { return d.data1[i].id; });
And I construct my groups like:
var group = dim.group().reduceSum(function(i) { return d.data1[i].total;});
This all works fine, but when I want to create custom reduce functions, the extra parameter i is giving me trouble.
var reduceAddPerc = function(p,v) {
p.sumOfSub += d.data1[i].var1;
p.sumOfTotal += d.data1[i].total;
p.finalVal = p.sumOfSub / p.sumOfTotal;
return p;
};
var reduceRemovePerc = function(p,v) {
p.sumOfSub -= d.data1[i].var1;
p.sumOfTotal -= d.data1[i].total;
p.finalVal = p.sumOfSub / p.sumOfTotal;
return p;
};
var reduceInitialPerc = function() {
return {sumOfSub:0, sumOfTotal:0, finalVal:0 };
};
And then defining the group with:
var group = dim.group().reduce(reduceAddPerc,reduceRemovePerc,reduceInitialPerc);
This obviously doesn't work, since the parameter i is not defined inside the function. I've tried adding the parameter as (p,v,i), wrapping the (p,v) function in an additional function that takes i, and creating an additional function(i) inside the (p,v) function, but I cannot get any of these to work.
Does anyone have any help to offer?
In the custom reduce functions, the v parameter is the record currently being "reduced". In this case, it should be your counter, so just use it where you would normally use i. Is that not working?
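A minimal sketch of that suggestion, assuming the setup from the question where the crossfilter holds indexes into d.data1. The sample data is made up, the reducers are called by hand here instead of through crossfilter, and the ratio helper (which guards against dividing by zero once a group is empty) is an addition not present in the original code.

```javascript
// Made-up sample data in the shape described in the question
const d = {
  data1: [
    { id: 'a', var1: 1, total: 4 },
    { id: 'a', var1: 3, total: 4 },
    { id: 'b', var1: 2, total: 10 },
  ],
};

// Hypothetical helper: avoid NaN when the group has no records left
const ratio = (p) => (p.sumOfTotal === 0 ? 0 : p.sumOfSub / p.sumOfTotal);

// v is the record being reduced, which in this setup is the index into d.data1
function reduceAddPerc(p, v) {
  p.sumOfSub += d.data1[v].var1;
  p.sumOfTotal += d.data1[v].total;
  p.finalVal = ratio(p);
  return p;
}
function reduceRemovePerc(p, v) {
  p.sumOfSub -= d.data1[v].var1;
  p.sumOfTotal -= d.data1[v].total;
  p.finalVal = ratio(p);
  return p;
}
function reduceInitialPerc() {
  return { sumOfSub: 0, sumOfTotal: 0, finalVal: 0 };
}

// Simulate what crossfilter does for group 'a' (indexes 0 and 1)
let p = reduceInitialPerc();
p = reduceAddPerc(p, 0);
p = reduceAddPerc(p, 1);
const afterAdds = p.finalVal;    // (1 + 3) / (4 + 4)
p = reduceRemovePerc(p, 1);
const afterRemove = p.finalVal;  // 1 / 4
```

The three functions would be passed to dim.group().reduce(reduceAddPerc, reduceRemovePerc, reduceInitialPerc) exactly as in the question.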

Crossfilter reduce :: find number of uniques

I am trying to create a custom reduce function for a dataset attribute group that would sum a number of unique values for another attribute.
For example, my dataset looks like a list of actions on projects by team members:
{ project:"Website Hosting", teamMember:"Sam", action:"email" },
{ project:"Website Hosting", teamMember:"Sam", action:"phoneCall" },
{ project:"Budget", teamMember:"Joe", action:"email" },
{ project:"Website Design", teamMember:"Joe", action:"design" },
{ project:"Budget", teamMember:"Sam", action:"email" }
So, team members work on a variable number of projects by performing one action per line. I have a dimension by team member, and would like to reduce it by the number of projects (uniques).
I tried the below (storing project in a uniques array) without success (sorry, this might hurt your eyes):
var teamMemberDimension = dataset.dimension(function(d) {
return d.teamMember;
});
var teamMemberDimensionGroup = teamMemberDimension.group().reduce(
// add
function(p,v) {
if( p.projects.indexOf(v.project) == -1 ) {
p.projects.push(v.project);
p.projectsCount += 1;
}
return p;
},
// remove
function(p,v) {
if( p.projects.indexOf(v.projects) != -1 ) {
p.projects.splice(p.projects.indexOf(v.projects), 1);
p.projectsCount -= 1;
}
return p;
},
// init
function(p,v) {
return { projects:[], projectsCount:0 }
}
);
Thanks a lot!
Edit after DJ Martin's answer ::
So, to be clearer, the numbers I am after here would be:
-----------
Sam : 2 (projects he is working on, no matter the number of actions)
Joe : 2 (projects he is working on, no matter the number of actions)
-----------
The answer provided by DJ Martin gets me there. But rather than hard coding a table, I would like to find a way to use these numbers for my DC.JS bar chart. When I was only using the number of actions (so just a reduceCount() ), I did it like below:
teamMemberChart.width(270)
.height(220)
.margins({top: 5, left: 10, right: 10, bottom: 20})
.dimension(teamMemberDimension)
.group(teamMemberDimensionGroup)
.colors(d3.scale.category20())
.elasticX(true)
.xAxis().ticks(4);
I guess there might be something to change in the group().
UPDATED ANSWER
Sorry I misunderstood the question... you are actually on the right track. You'll just need to maintain a count of each project so that your subtract function can know when to remove the value.
teamMemberGroup = teamMemberDimension.group().reduce(
function (p, d) {
if( d.project in p.projects)
p.projects[d.project]++;
else p.projects[d.project] = 1;
return p;
},
function (p, d) {
p.projects[d.project]--;
if(p.projects[d.project] === 0)
delete p.projects[d.project];
return p;
},
function () {
return {projects: {}};
});
Here is an updated fiddle: http://jsfiddle.net/djmartin_umich/3LyhL/
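To get a single number per team member for the bar chart, the reducer can also maintain a running count of distinct projects next to the per-project tallies. The sketch below is a variation on the answer's code rather than the answer itself; the projectsCount field is borrowed from the question's attempt, and the reducers are exercised by hand here instead of through crossfilter.

```javascript
function reduceAdd(p, v) {
  p.projects[v.project] = (p.projects[v.project] || 0) + 1;
  if (p.projects[v.project] === 1) p.projectsCount += 1; // first action on this project
  return p;
}
function reduceRemove(p, v) {
  p.projects[v.project] -= 1;
  if (p.projects[v.project] === 0) { // last action on this project filtered out
    delete p.projects[v.project];
    p.projectsCount -= 1;
  }
  return p;
}
function reduceInitial() {
  return { projects: {}, projectsCount: 0 };
}

// Sam's rows from the question
const samRows = [
  { project: 'Website Hosting', teamMember: 'Sam', action: 'email' },
  { project: 'Website Hosting', teamMember: 'Sam', action: 'phoneCall' },
  { project: 'Budget', teamMember: 'Sam', action: 'email' },
];

const sam = samRows.reduce(reduceAdd, reduceInitial());
const samCount = sam.projectsCount; // two distinct projects
```

With a group built from these reducers, the dc.js chart could read the number through .valueAccessor(function (d) { return d.value.projectsCount; }), since the group's value is now an object rather than a plain count.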
