How to get a random item from an ES6 Map or Set

I have a project that uses arrays of objects that I'm thinking of moving to es6 Sets or Maps.
I need to quickly get a random item from them (obviously trivial for my current arrays). How would I do this?

Maps and Sets are not well suited for random access. They maintain insertion order and their size is known, but they are not indexed, so you cannot access the Nth item directly. To get the Nth item in a Map or Set, you have to iterate through it to find that item.
The simple way to get a random item from a Set or Map would be to get the entire list of keys/items and then select a random one.
// get random item from a Set
function getRandomItem(set) {
    let items = Array.from(set);
    return items[Math.floor(Math.random() * items.length)];
}
You could make a version that would work with both a Set and a Map like this:
// returns random key from Set or Map
function getRandomKey(collection) {
    let keys = Array.from(collection.keys());
    return keys[Math.floor(Math.random() * keys.length)];
}
This is obviously not something that would perform well with a large Set or Map since it has to iterate all the keys and build a temporary array in order to select a random one.
Since both a Map and a Set have a known size, you could also select the random index based purely on the .size property and then iterate through the Map or Set until you reach the desired Nth item. For large collections that might be a bit faster and avoids creating the temporary array of keys, at the expense of a little more code, though on average the lookup is still proportional to half the size of the collection.
// returns random key from Set or Map
function getRandomKey(collection) {
    let index = Math.floor(Math.random() * collection.size);
    let cntr = 0;
    for (let key of collection.keys()) {
        if (cntr++ === index) {
            return key;
        }
    }
}

There's a short neat ES6+ version of the answer above:
const getRandomItem = iterable => iterable.get([...iterable.keys()][Math.floor(Math.random() * iterable.size)])
This works for a Map; a plain Set has no .get() method (although its keys() is an alias for the values() method), so for a Set use the shorter version below.
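For example, using the one-liner above with a Map (the sample data is made up for illustration):
const ages = new Map([["alice", 34], ["bob", 27], ["carol", 41]]);
console.log(getRandomItem(ages)); // logs 34, 27, or 41 at random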

This is the short answer for Sets:
const getRandomItem = set => [...set][Math.floor(Math.random()*set.size)]

Related

Removing the first half of the entries in a LinkedHashMap without looping

I was going to use Hashtable, but an existing answer said only LinkedHashMap preserves insertion order. So it seems that I can get the insertion order from the entries or keys properties.
My question is: when the map has n elements and I want to remove the first n/2 of them, is there a better way than looping through the keys and repeatedly calling remove(key)? That is, something like this:
val a = LinkedHashMap<Int, Int>();
val n = 10;
for(i in 1 .. n)
{
    a[i] = i*10;
}
a.removeRange(0, n/2);
instead of
val a = LinkedHashMap<Int, Int>();
val n = 10;
for(i in 1 .. n)
{
    a[i] = i*10;
}
var i = 0;
var keysToRemove = ArrayList<Int>();
for(k in a.keys)
{
    if(i >= n/2)
        break;
    else
        i++
    keysToRemove.add(k);
}
for(k in keysToRemove)
{
    a.remove(k);
}
The purpose of this is that I use the map as a cache, and when the cache is full, I want to purge the oldest half of the entries. I do not have to use LinkedHashMap as long as I can:
Find the value using a key, efficiently.
Remove a range of entries at once.
There's no method in the class that makes this possible; the source code has no operations on ranges of keys or entries. Since the linking is built on top of the HashMap logic, each entry still has to be found individually by a hashed key lookup before it can be removed, so removing a range couldn't be done any faster in a LinkedHashMap (unlike, say, a LinkedList compared to an ArrayList).
For simpler code that's equivalent to what you're doing:
a.keys.take(a.size / 2).forEach(a::remove)
If you don't want to use a library for the cache, LinkedHashMap is designed so you can easily build your own by subclassing. For instance, a basic one that simply removes the oldest entry once adding elements pushes the map above a certain size:
class CacheHashMap<K, V>(private var maxSize: Int) : LinkedHashMap<K, V>() {
    override fun removeEldestEntry(eldest: MutableMap.MutableEntry<K, V>?): Boolean =
        size > maxSize
}
Also, if you set accessOrder to true in the constructor call, iteration order runs from least recently used to most recently used entry, which might be more apt for your situation than insertion order.
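For instance, a minimal sketch of such an access-ordered cache (the class name is made up; 16 and 0.75f are just the JDK defaults, spelled out because only the three-argument constructor exposes accessOrder):
class LruCacheMap<K, V>(private val maxSize: Int) : LinkedHashMap<K, V>(16, 0.75f, true) {
    // accessOrder = true: iteration runs from least to most recently accessed entry
    override fun removeEldestEntry(eldest: MutableMap.MutableEntry<K, V>?): Boolean =
        size > maxSize // evict the least recently used entry once the cap is exceeded
}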
EDIT: Sorry, I missed the part about using this as an LRU cache; for that use case, TreeMap will not be suitable.
If insertion order is just incidental for you, and what you want is in fact the actual order of comparable keys, you should use a TreeMap instead.
However, the specific use case of removing half the keys might not be supported directly. Instead, you will find methods that remove keys below/above a certain value and that return the highest/lowest keys.
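For example, a small sketch (the keys and values are made up) showing how a whole range of keys can be dropped through the headMap view, which is backed by the map itself:
val a = java.util.TreeMap<Int, Int>()
for (i in 1..10) a[i] = i * 10
// headMap(6) is a live view of every entry with key < 6; clearing it removes them from a
a.headMap(6).clear()
println(a.firstKey()) // 6, the lowest remaining key
println(a.keys)       // [6, 7, 8, 9, 10]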

Complex reduce example: unclear how the reduce works

Starting with the complex reduce example, I have trimmed it down to a single chart and I am trying to understand how the reduce works.
I have added comments in the code that were not in the example, noting what I think is happening based on how I read the docs.
function groupArrayAdd(keyfn) {
    var bisect = d3.bisector(keyfn); // set the bisector value function
    // elements is the group that we are reducing, item is the current item
    // this is the reduce function being supplied to the reduce call on the group runAvgGroup for add below
    return function(elements, item) {
        // get the position of the key value for this element in the sorted array and put it there
        var pos = bisect.right(elements, keyfn(item));
        elements.splice(pos, 0, item);
        return elements;
    };
}
function groupArrayRemove(keyfn) {
    var bisect = d3.bisector(keyfn); // set the bisector value function
    // elements is the group that we are reducing, item is the current item
    // this is the reduce function being supplied to the reduce call on the group runAvgGroup for remove below
    return function(elements, item) {
        // get the position of the key value for this element in the sorted array and splice it out
        var pos = bisect.left(elements, keyfn(item));
        if (keyfn(elements[pos]) === keyfn(item))
            elements.splice(pos, 1);
        return elements;
    };
}
function groupArrayInit() {
    // for each key found by the key function, return this array?
    return []; // the result array where the data is being inserted in sorted order?
}
I am not quite sure my perception of how this works is right; some of the magic isn't showing itself. Am I correct that elements is the group the reduce function is being called on? Also, how is the array returned by groupArrayInit() being indirectly populated?
Part of me feels that the functions supplied to the reduce call are really Array.map functions, not Array.reduce functions, but I just can't quite put my finger on why. Having read the docs, I am just not making a connection here.
Any help would be appreciated.
Also, have I missed Pens/Fiddles that were created for all these examples, like this one: http://dc-js.github.io/dc.js/examples/complex-reduce.html? That is where I started with this, but I had to download the CSV and manually convert it to JSON.
--------------Update
I added some print statements to try to clarify how the add function is working
function groupArrayAdd(keyfn) {
    var bisect = d3.bisector(keyfn); // set the bisector value function
    // elements is the group that we are reducing, item is the current item
    // this is the reduce function being supplied to the reduce call on the group runAvgGroup for add below
    return function(elements, item) {
        console.log("---Start Elements and Item and keyfn(item)----");
        console.log(elements); // elements grouped by run?
        console.log(item);     // not seeing the pattern of what this is on each run
        console.log(keyfn(item));
        console.log("---End----");
        // get the position of the key value for this element in the sorted array and put it there
        var pos = bisect.right(elements, keyfn(item));
        elements.splice(pos, 0, item);
        return elements;
    };
}
and to print out the group's contents
console.log("RunAvgGroup")
console.log(runAvgGroup.top(Infinity))
which results in output that appears to be incorrect, because the values are not sorted by key (the run number).
And looking at the results of the print statements doesn't seem to help either.
This looks basically right to me. The issues are just conceptual.
Crossfilter’s group.reduce is not exactly like either Array.reduce or Array.map. Group.reduce defines functions for handling the addition of new records to a group and the removal of records from a group. So it is conceptually similar to an incremental Array.reduce that supports a reversal operation. This allows filters to be applied and removed.
Group.top returns your list of groups. The value property of each of these groups should be the elements array that your reduce functions return. The key of the group is the value returned by your group accessor (defined in the dimension.group call that creates your group), or by your dimension accessor if you didn’t define a group accessor. Reduce functions work only on the group values and do not have direct access to the group key.
So check those values in the group.top output and hopefully you’ll see the lists of elements you expect.
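For orientation, here is a rough sketch of how those pieces fit together; it is not from the example itself, and the field names Run and Speed are assumptions:
var ndx = crossfilter(records);                                    // records is the raw data array
var runDimension = ndx.dimension(function(d) { return +d.Run; }); // the group key comes from here
var runAvgGroup = runDimension.group().reduce(                     // no group accessor, so key === +d.Run
    groupArrayAdd(function(d) { return d.Speed; }),                // add: keeps each group's value as a sorted array
    groupArrayRemove(function(d) { return d.Speed; }),             // remove: drops a record from that array
    groupArrayInit                                                 // init: each group's value starts as []
);
// Each entry of all()/top() is { key: <run>, value: <sorted array of records> }.
// Note that top() orders by the reduced value, not the key, so use all() to see groups in key order.
runAvgGroup.all().forEach(function(g) { console.log(g.key, g.value.length); });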

Swift dictionary keys are integers

I have an app that I'm converting from Swift 2 to Swift 3. One thing I need to do is the following:
dict.enumerateKeysAndObjects({ (key, rotationString, stop) -> Void in
    // code goes here to use key as an integer
})
How do I do that? rotationString will also be an integer.
Background:
In the Swift 2 version of the app, I was saving data to defaults like this:
func saveGame() {
    let defaults = UserDefaults.standard
    defaults.set(blackRotations, forKey: "blackRotations")
    defaults.set(whiteRotations, forKey: "whiteRotations")
etc.
Here, whiteRotations and blackRotations were NSDictionary objects where the actual data being saved had keys that were NSNumbers and values that were also NSNumbers.
So, what I need to do is to handle loading the saved game from UserDefaults. The values for blackRotations and whiteRotations were dictionaries when the game was saved. The structure of this dictionary was a bunch of integer pairs. You can think of them as integer lookup tables saved as dictionaries.
So all I really am asking for is how to load this data from UserDefaults and be able to treat both the keys and values as integers.
Possible solution? I'm still working on trying to get this to work, but I'll post here anyway, in the hope that it will help to illustrate what I am trying to do.
In the method that loads the saved state, I think I need to do something along these lines:
if let blackDict = (defaults.dictionary(forKey: "blackRotations") as? [String: Int]) {
    for (keyString, rotation) in blackDict {
        blackRotations[Int(keyString)!] = rotation
    }
}

Java 8: How to convert the following code to functional style?

Instead of using the for loop, how do I use the Java 8 Stream API on a list of booleans? How do I use methods such as forEach, reduce, etc.?
I want to get rid of the two variables totalRelevant and retrieved which I am using to maintain state, since in a lambda expression we can only reference (effectively) final variables from the enclosing lexical context.
import java.util.Arrays;
import java.util.List;

public class IRLab {
    public static void main(String[] args) {
        // predefined list of whether each document is relevant or not
        List<Boolean> documentRelivency = Arrays.asList(true, false, true, true, false);
        System.out.println("Precision\tRecall\tF-Measure");
        // variables for output
        double totalRelevant = 0.0;
        double retrieved = 0.0;
        for (int i = 0; i < documentRelivency.size(); ++i) {
            Boolean isRelevant = documentRelivency.get(i);
            // check if document is relevant
            if (isRelevant) totalRelevant += 1;
            // total number of retrieved documents will be equal to the
            // number of documents processed so far, i.e. retrieved = i + 1
            retrieved += 1;
            // compute the values using the formulas
            double precision = totalRelevant / retrieved;
            double recall = totalRelevant / totalRelevant;
            double fmeasure = (2 * precision * recall) / (precision + recall);
            // print the calculated values
            System.out.format("%9.2f\t%.2f\t%.2f\t\n", precision, recall, fmeasure);
        }
    }
}
How do I convert the above code to functional style using the Java 8 Stream API and lambda expressions? I need to maintain state for the two variables as above.
Generally, converting imperative code to functional code is only an improvement when you manage to get rid of the mutable state that makes the processing of one element depend on the processing of the previous one.
There are workarounds that allow you to incorporate mutable state, but you should first try to find a different representation of your problem that works without it. In your example, the processing of each element depends on two values, totalRelevant and retrieved. The latter is just an ascending number and can therefore be represented as a range, e.g. IntStream.range(startValue, endValue). The former stems from your list of boolean values and is the number of true values inside the sublist (0, retrieved) (inclusive).
You could recalculate that value without needing the previous value, but reiterating the list in each step could turn out to be expensive. So instead, collect your list into a single int representing a bitset first, i.e. [true, false, true, true, false] becomes 0b10110. Then you can get the number of one bits using intrinsic operations:
List<Boolean> documentRelivency = Arrays.asList(true, false, true, true, false);
int numBits = documentRelivency.size(), bitset = IntStream.range(0, numBits)
    .map(i -> documentRelivency.get(i) ? 1 << (numBits - i - 1) : 0)
    .reduce(0, (i, j) -> i | j);
System.out.println("Precision\tRecall\tF-Measure");
IntStream.rangeClosed(1, numBits)
    .mapToObj(retrieved -> {
        double totalRelevant = Integer.bitCount(bitset & (-1 << (numBits - retrieved)));
        return String.format("%9.2f\t%.2f\t%.2f",
            totalRelevant / retrieved, 1f, 2 / (1 + retrieved / totalRelevant));
    })
    .forEach(System.out::println);
This way, you have expressed the entire operation in a functional way where the processing of one element does not depend on the previous one. It could even run in parallel, though this would offer no benefit here.
If the list size exceeds 32, you have to resort to long, or java.util.BitSet for more than 64.
But the whole operation is more an example of how to change the thinking from “this is a number I increment in each iteration” to “I’m processing a continuous range of values” and from “this is a number I increment when the element is true” to “this is the count of true values in a range of this list”.
It's unclear why you need to change your code to lambdas. Currently it's quite short, and lambdas will not make it shorter or cleaner. However, if you really want to, you may encapsulate your shared state in a separate object:
static class Stats {
    private int totalRelevant, retrieved;

    public void add(boolean relevant) {
        if (relevant)
            totalRelevant++;
        retrieved++;
    }

    public double getPrecision() {
        return ((double) totalRelevant) / retrieved;
    }

    public double getRecall() {
        return 1.0; // ??? was totalRelevant/totalRelevant in the original code
    }

    public double getFMeasure() {
        double precision = getPrecision();
        double recall = getRecall();
        return (2 * precision * recall) / (precision + recall);
    }
}
And use it with a lambda like this:
Stats stats = new Stats();
documentRelivency.forEach(relevant -> {
    stats.add(relevant);
    System.out.format("%9.2f\t%.2f\t%.2f\t\n", stats.getPrecision(),
            stats.getRecall(), stats.getFMeasure());
});
A lambda is used here, but not the Stream API. It seems that involving the Stream API for such a problem is not a very good idea, as you need to output the intermediate states of a mutable container which has to be mutated in a strictly defined order. Well, if you desperately need the Stream API, replace .forEach with .stream().forEachOrdered.

How to filter empty groups from a reduction?

It appears to me that Crossfilter never excludes a group from the results of a reduction, even if the applied filters have excluded all the rows in that group. Groups that have had all of their rows filtered out simply return an aggregate value of 0 (or whatever reduceInitial returns).
The problem with this is that it makes it impossible to distinguish between groups that contain no rows and groups that do contain rows but just legitimately aggregate to a value of 0. Basically, there's no way (that I can see) to distinguish between a null value and a 0 aggregation.
Does anybody know of a built-in Crossfilter technique for achieving this? I did come up with a way to do this with my own custom reduceInitial/reduceAdd/reduceRemove methods, but it wasn't totally straightforward, and it seems to me that this is behavior that might/should be more native to Crossfilter's filtering semantics. So I'm wondering if there's a canonical way to achieve this.
I'll post my technique as an answer if it turns out that there is no built-in way to do this.
A simple way to accomplish this is to have both count and total be reduce attributes:
var dimGroup = dim.group().reduce(reduceAdd, reduceRemove, reduceInitial);

function reduceAdd(p, v) {
    ++p.count;
    p.total += v.value;
    return p;
}

function reduceRemove(p, v) {
    --p.count;
    p.total -= v.value;
    return p;
}

function reduceInitial() {
    return {count: 0, total: 0};
}
Empty groups will have zero counts, so retrieving only non-empty groups is easy:
dimGroup.top(Infinity).filter(function(d) { return d.value.count > 0; });
OK, there doesn't seem to be any obvious answer jumping out so I'll answer my own question and post the technique I used to solve this.
This example assumes that I've already created a dimension and grouping, which is passed in as groupDim. Because I want to be able to sum up any arbitrary numeric field, I also pass in fieldName so that it will be available in the closure scope of my reduction functions.
One important characteristic of this technique is that it relies on there being a way to uniquely identify which group each row belongs to. Thinking in terms of OLAP, this is essentially the "tuple" that defines a particular aggregation context. But it can be anything you want, as long as it deterministically returns the same value for all data rows belonging to a given group.
The end result is that empty groups will have an aggregate value of "null", which can be easily detected and filtered out after the fact. Any group with at least one row will have a numeric value (even if it happens to be zero).
Refinements or suggestions to this are more than welcome. Here's the code with comments inline:
function configureAggregateSum(groupDim, fieldName) {
    function getGroupKey(datum) {
        // Given a datum, return the key corresponding to the group to which the datum belongs
    }

    // This object will keep track of the number of times each group had reduceAdd
    // versus reduceRemove called. It is used to revert the running aggregate value
    // back to "null" if the count hits zero. This is unfortunately necessary because
    // Crossfilter filters as it is aggregating, so reduceAdd can be called even if, in
    // the end, all records in a group end up being filtered out.
    var groupCount = {};

    function reduceAdd(p, v) {
        // Here's the code that keeps track of the invocation count per group
        var groupKey = getGroupKey(v);
        if (groupCount[groupKey] === undefined) { groupCount[groupKey] = 0; }
        groupCount[groupKey]++;

        // And here's the implementation of the add reduction (sum in my case)
        // Note the check for null (our initial value)
        var value = +v[fieldName];
        return p === null ? value : p + value;
    }

    function reduceRemove(p, v) {
        // This code keeps track of the invocation count per group and, importantly,
        // reverts the value back to "null" if it hits 0 for the group. Essentially, if we
        // detect that the group has no records again, we revert to the initial value.
        var groupKey = getGroupKey(v);
        groupCount[groupKey]--;
        if (groupCount[groupKey] === 0) {
            return null;
        }

        // And here's the code for the remove reduction (sum in my case)
        var value = +v[fieldName];
        return p - value;
    }

    function reduceInitial() {
        return null;
    }

    // Once returned, you can invoke all() or top() to get the values, which can then be
    // filtered using a native Array.filter to remove the groups with a null value.
    return groupDim.reduce(reduceAdd, reduceRemove, reduceInitial);
}
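For completeness, a hypothetical usage of the function above (the dimension name cityDim and the field name "sales" are made up):
// cityDim is an existing crossfilter dimension; "sales" is the numeric field being summed
var salesByCity = configureAggregateSum(cityDim.group(), "sales");
// Groups whose rows have all been filtered out carry a null value, so drop them afterwards
var nonEmpty = salesByCity.top(Infinity).filter(function(d) { return d.value !== null; });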
