How to filter empty groups from a reduction? - crossfilter

It appears to me that Crossfilter never excludes a group from the results of a reduction, even if the applied filters have excluded all the rows in that group. Groups that have had all of their rows filtered out simply return an aggregate value of 0 (or whatever reduceInitial returns).
The problem with this is that it makes it impossible to distinguish between groups that contain no rows and groups that do contain rows but just legitimately aggregate to a value of 0. Basically, there's no way (that I can see) to distinguish between a null value and a 0 aggregation.
Does anybody know of a built-in Crossfilter technique for achieving this? I did come up with a way to do this with my own custom reduceInitial/reduceAdd/reduceRemove method but it wasn't totally straight forward and it seemed to me that this is behavior that might/should be more native to Crossfilter's filtering semantics. So I'm wondering if there's a canonical way to achieve this.
I'll post my technique as an answer if it turns out that there is no built-in way to do this.

A simple way to accomplish this is to have both count and total be reduce attributes:
var dimGroup = dim.group().reduce(reduceAdd, reduceRemove, reduceInitial);
function reduceAdd(p, v) {
++p.count;
p.total += v.value;
return p;
}
function reduceRemove(p, v) {
--p.count;
p.total -= v.value;
return p;
}
function reduceInitial() {
return {count: 0, total: 0};
}
Empty groups will have zero counts, so retrieving only non-empty groups is easy:
dimGroup.top(Infinity).filter(function(d) { return d.value.count > 0; });

OK, there doesn't seem to be any obvious answer jumping out so I'll answer my own question and post the technique I used to solve this.
This example assumes that I've already created a dimension and grouping, which is passed in as groupDim. Because I want to be able to sum up any arbitrary numeric field, I also pass in fieldName so that it will be available in the closure scope of my the reduction functions.
One important characteristic of this technique is that it relies on there being a way to uniquely identify which group each row belongs to. Thinking in term of OLAP, this is essentially the "tuple" that defines a particular aggregation context. But it can be anything you want as long as it deterministically returns the same value for all data rows belonging to a given group.
The end result is that empty groups will have an aggregate value of "null" which can be easily detected for and filtered out after the fact. Any group with at least one row will have a numeric value (even if it happens to be zero).
Refinements or suggestions to this are more then welcome. Here's the code with comments inline:
function configureAggregateSum(groupDim, fieldName) {
function getGroupKey(datum) {
// Given datum return key corresponding to the group to which the datum belongs
}
// This object will keep track of the number of times each group had reduceAdd
// versus reduceRemove called. It is used to revert the running aggregate value
// back to "null" if the count hits zero. This is unfortunately necessary because
// Crossfilter filters as it is aggregating so reduceAdd can be called even if, in
// the end, all records in a group end up being filtered out.
//
var groupCount = {};
function reduceAdd(p, v) {
// Here's the code that keeps track of the invocation count per group
var groupKey = getGroupKey(v);
if (groupCount[groupKey] === undefined) { groupCount[groupKey] = 0; }
groupCount[groupKey]++;
// And here's the implementation of the add reduction (sum in my case)
// Note the check for null (our initial value)
var value = +v[fieldName];
return p === null ? value : p + value;
}
function reduceRemove(p, v) {
// This code keeps track of invocations of invocation count per group and, importantly,
// reverts value back to "null" if it hits 0 for the group. Essentially, if we detect
// that group has no records again we revert to the initial value.
var groupKey = getGroupKey(v);
groupCount[groupKey]--;
if (groupCount[groupKey] === 0) {
return null;
}
// And here's the code for the remove reduction (sum in my case)
var value = +v[fieldName];
return p - value;
}
function reduceInitial() {
return null;
}
// Once returned, can invoke all() or top() to get the values, which can then be filtered
// using a native Array.filter to remove the groups with null value.
return groupedDim.reduce(reduceAdd, reduceRemove, reduceInitial);
}

Related

Confusion about recursion for BST

Is there an easy way to understand when you can just call the recursive method vs having to set that recursive method to a variable?
For example...
Just calling the recursive function to traverse:
self.recurse(node.left)
self.recurse(node.right)
Having to set the recursive function to node.left and node.right:
node.left = self.recurse(node.left)
node.right = self.recurse(node.left)
Another example is to delete a node in a bst you have to set the recursive function to root.left and root.right... I get it but not completely... is there a easy way to understand when you can just call the recursive function vs having to set it to node.left, node.right..etc...?
def deleteNode(self, root: TreeNode, key:int) -> TreeNode:
if not root:
return root
if key < root.val:
root.left = self.deleteNode(root.left,key)
elif key > root.val:
root.right = self.deleteNode(root.right,key)
else:
if not root.left:
return root.right
elif not root.right:
return root.left
root.val = self.successor(root.right)
root.right = self.deleteNode(root.right,root.val)
return root
To understand this two above scenarios (Simple Recursive Call and Set result of Recursive call to a Variable), just try to understand the following code/function.
Let's say, you have a TREE, which contains a value in every node where value is either negative or positive. Now let's say, you are going to count how many nodes are there whose value is Positive.
The TREE structure for this problem is like following:
TREE{
Integer val;
TREE left = right = null;
}
Now you gave me this problem to solve. And I wrote a function/method which will count nodes with positive value. The function is following:
Integer countNodes(TREE node){
if(node == null){
return 0;
}else{
Integer count = 0; // which will count how many nodes are there with positive value
if(node.val >= 0){
count += 1; // if the value is positive I incremented count
}
// and we are checking every other nodes present in the TREE
getCount(node.left);
getCount(node.right);
// and return the final result
return count;
}
}
Now I returned my function to you, and you executed! But what! There is a big WRONG! It's giving wrong result, over and over again!!
But why???
Let's analysis.
if(node.val >= 0){
count += 1;
}
Up to that we were right! But the problem was, we were incremented the count, but wasn't use it! Each time we was calling function recursively, a new stack frame was created, a new variable named "count" was created, but we were not using this value!
To use the variable "count", we need to re-initialize the returned value of every recursive call to the variable, that's the way we can keep a link between the current stack-frame and the previous stack-frame and the previous of previous stack-frame and goes onn!.. we need to change little-bit in the function countNodes like following:
if(node.val >= 0){
count += 1;
}
count += getCount(node.left); // re-initialize count in each recursive call
count += getCount(node.right); // re-initialize count in each recursive call
return count;
Now everything we'll be alright! this code will work perfectly!
The above scenarios is implies your problem.
self.recurse(node.left)
self.recurse(node.right)
this is nothing but simple traversing over all the nodes.
But if you need to use the returned result of every recursion, you need to initialize/re-initialize the returned value to a variable. That's what is happening with:
node.left = self.recurse(node.left)
node.right = self.recurse(node.left)
I HOPE this long (bit long) explanation will help to go further. Happy Coding! : )

complex reduce sample unclear how the reduce works

Starting with complex reduce sample
I have trimmed it down to a single chart and I am trying to understand how the reduce works
I have made comments in the code that were not in the example denoting what I think is happening based on how I read the docs.
function groupArrayAdd(keyfn) {
var bisect = d3.bisector(keyfn); //set the bisector value function
//elements is the group that we are reducing,item is the current item
//this is a the reduce function being supplied to the reduce call on the group runAvgGroup for add below
return function(elements, item) {
//get the position of the key value for this element in the sorted array and put it there
var pos = bisect.right(elements, keyfn(item));
elements.splice(pos, 0, item);
return elements;
};
}
function groupArrayRemove(keyfn) {
var bisect = d3.bisector(keyfn);//set the bisector value function
//elements is the group that we are reducing,item is the current item
//this is a the reduce function being supplied to the reduce call on the group runAvgGroup for remove below
return function(elements, item) {
//get the position of the key value for this element in the sorted array and splice it out
var pos = bisect.left(elements, keyfn(item));
if(keyfn(elements[pos])===keyfn(item))
elements.splice(pos, 1);
return elements;
};
}
function groupArrayInit() {
//for each key found by the key function return this array?
return []; //the result array for where the data is being inserted in sorted order?
}
I am not quite sure my perception of how this is working is quite right. Some of the magic isn't showing itself. Am I correct that elements is the group the reduce function is being called on ? also the array in groupArrayInit() how is it being indirectly populated?
Part of me feels that the functions supplied to the reduce call are really array.map functions not array.reduce functions but I just can't quite put my finger on why. having read the docs I am just not making a connection here.
Any help would be appreciated.
Also have I missed Pens/Fiddles that are created for all these examples? like this one
http://dc-js.github.io/dc.js/examples/complex-reduce.html which is where I started with this but had to download the csv and manually convert to Json.
--------------Update
I added some print statements to try to clarify how the add function is working
function groupArrayAdd(keyfn) {
var bisect = d3.bisector(keyfn); //set the bisector value function
//elements is the group that we are reducing,item is the current item
//this is a the reduce function being supplied to the reduce call on the group runAvgGroup for add below
return function(elements, item) {
console.log("---Start Elements and Item and keyfn(item)----")
console.log(elements) //elements grouped by run?
console.log(item) //not seeing the pattern on what this is on each run
console.log(keyfn(item))
console.log("---End----")
//get the position of the key value for this element in the sorted array and put it there
var pos = bisect.right(elements, keyfn(item));
elements.splice(pos, 0, item);
return elements;
};
}
and to print out the group's contents
console.log("RunAvgGroup")
console.log(runAvgGroup.top(Infinity))
which results in
Which appears to be incorrect b/c the values are not sorted by key (the run number)?
And looking at the results of the print statements doesn't seem to help either.
This looks basically right to me. The issues are just conceptual.
Crossfilter’s group.reduce is not exactly like either Array.reduce or Array.map. Group.reduce defines methods for handling adding new records to a group or removing records from a group. So it is conceptually similar to an incremental Array.reduce that supports an reversal operation. This allows filters to be applied and removed.
Group.top returns your list of groups. The value property of these groups should be the elements value that your reduce functions return. The key of the group is the value returned by your group accessor (defined in the dimension.group call that creates your group) or your dimension accessor if you didn’t define a group accessor. Reduce functions work only on the group values and do not have direct access to the group key.
So check those values in the group.top output and hopefully you’ll see the lists of elements you expect.

How get random item from es6 Map or Set

I have a project that uses arrays of objects that I'm thinking of moving to es6 Sets or Maps.
I need to quickly get a random item from them (obviously trivial for my current arrays). How would I do this?
Maps and Sets are not well suited for random access. They are ordered and their length is known, but they are not indexed for access by an order index. As such, to get the Nth item in a Map or Set, you have to iterate through it to find that item.
The simple way to get a random item from a Set or Map would be to get the entire list of keys/items and then select a random one.
// get random item from a Set
function getRandomItem(set) {
let items = Array.from(set);
return items[Math.floor(Math.random() * items.length)];
}
You could make a version that would work with both a Set and a Map like this:
// returns random key from Set or Map
function getRandomKey(collection) {
let keys = Array.from(collection.keys());
return keys[Math.floor(Math.random() * keys.length)];
}
This is obviously not something that would perform well with a large Set or Map since it has to iterate all the keys and build a temporary array in order to select a random one.
Since both a Map and a Set have a known size, you could also select the random index based purely on the .size property and then you could iterate through the Map or Set until you got to the desired Nth item. For large collections, that might be a bit faster and would avoid creating the temporary array of keys at the expense of a little more code, though on average it would still be proportional to the size/2 of the collection.
// returns random key from Set or Map
function getRandomKey(collection) {
let index = Math.floor(Math.random() * collection.size);
let cntr = 0;
for (let key of collection.keys()) {
if (cntr++ === index) {
return key;
}
}
}
There's a short neat ES6+ version of the answer above:
const getRandomItem = iterable => iterable.get([...iterable.keys()][Math.floor(Math.random() * iterable.size)])
Works for Maps as well as for Sets (where keys() is an alias for value() method)
This is the short answer for Sets:
const getRandomItem = set => [...set][Math.floor(Math.random()*set.size)]

Counting the number of elements of an array but with a condition

I have an array with real numbers, say A. I have calculated the mean as np.mean(A)
Now I want to check how many elements fell below the mean and how many above.
for example
A = [ 1 2 3 5] so the average is 2.75. So, i have two elements below the average and two elements above.
Any help will be appreciated
Not sure if this is what you are looking for, but you could do:
function mean(array){
var sum=0;
for (item in array){
sum = sum + array[item];
}
return sum/(array.length)
}
function belowMean(array) {
return array.filter(function(item){
return item < mean(array);
});
}
var a=[1,2,3,4];
alert(mean(a));
alert(belowMean(a)); //you'll get an array with those elements below the mean.
alert(belowMean(a).length); //you'll get how many elements are below the mean.
It's ugly though, I'd rather modified the array prototype to tho so.
How about loop it twice? First time for average value and second time for your count?

Sorting in AdvancedDatagrid in Flex 3

I am using AdvancedDatagrid in Flex 3. One column of AdvancedDatagrid contains numbers and alphabets. When I sort this column, numbers come before alphabets (Default behavior of internal sorting of AdvancedDatagrid). But I want alphabets to come before number when I sort.
I know I will have to write the custom sort function. But can anybody give some idea on how to proceed.
Thanks in advance.
Use sortCompareFunction
The AdvancedDataGrid control uses this function to sort the elements of the data provider collection. The function signature of the callback function takes two parameters and has the following form:
mySortCompareFunction(obj1:Object, obj2:Object):int
obj1 — A data element to compare.
obj2 — Another data element to compare with obj1.
The function should return a value based on the comparison of the objects:
-1 if obj1 should appear before obj2 in ascending order.
0 if obj1 = obj2.
1 if obj1 should appear after obj2 in ascending order.
<mx:AdvancedDataGridColumn sortCompareFunction="mySort"
dataField="colData"/>
Try the following sort compare function.
public function mySort(obj1:Object, obj2:Object):int
{
var s1:String = obj1.colData;
var s2:String = obj2.colData;
var result:Number = s1.localeCompare(s2);
if(result != 0)
result = result > 0 ? 1 : -1;
if(s1.match(/^\d/))
{
if(s2.match(/^\d/))
return result;
else
return 1;
}
else if(s2.match(/^\d/))
return -1;
else
return result;
}
It checks the first character of strings and pushes the ones that start with a digit downwards in the sort order. It uses localeCompare to compare two strings if they both start with letters or digits - otherwise it says the one starting with a letter should come before the one with digit. Thus abc will precede 123 but a12 will still come before abc.
If you want a totally different sort where letters always precede numbers irrespective of their position in the string, you would have to write one from the scratch - String::charCodeAt might be a good place to start.

Resources