Java 8: How to convert the following code to functional style? - functional-programming

Instead of using the for loop, how do I use the Java 8 Stream API on a list of booleans? How do I use methods such as forEach, reduce, etc.?
I want to get rid of the two variables totalRelevant and retrieved, which I am using to maintain state.
The obstacle is that a lambda expression can only reference final (or effectively final) variables from its lexical context.
import java.util.Arrays;
import java.util.List;
public class IRLab {
public static void main(String[] args) {
// predefined list of either document is relevant or not
List<Boolean> documentRelivency = Arrays.asList(true, false, true, true, false);
System.out.println("Precision\tRecall\tF-Measure");
// variables for output
double totalRelevant = 0.0;
double retrieved = 0.0;
for (int i = 0; i < documentRelivency.size(); ++i) {
Boolean isRelevant = documentRelivency.get(i);
// check if document is relevant
if (isRelevant) totalRelevant += 1;
// total number of retrieved documents will be equal to
// number of document being processed currently, i.e. retrieved = i + 1
retrieved += 1;
// storing values using formulas
double precision = totalRelevant / retrieved;
double recall = totalRelevant / totalRelevant;
double fmeasure = (2 * precision * recall) / (precision + recall);
// Printing the final calculated values
System.out.format("%9.2f\t%.2f\t%.2f\t\n", precision, recall, fmeasure);
}
}
}
How do I convert above code to functional code using the Java 8 Stream API and Lambda Expressions? I need to maintain state for two variables as above.

Generally, converting imperative code to functional code will only be an improvement when you manage to get rid of the mutable state that causes the processing of one element to depend on the processing of the previous one.
There are workarounds that allow you to incorporate mutable state, but you should first try to find a different representation of your problem that works without it. In your example, the processing of each element depends on two values, totalRelevant and retrieved. The latter is just an ascending number and can therefore be represented as a range, e.g. IntStream.range(startValue, endValue). The former stems from your list of boolean values: it is the number of true values among the first retrieved elements of the list.
You could recalculate that value without needing the previous value, but re-iterating the list in each step could turn out to be expensive. So instead, first collect your list into a single int representing a bitset, i.e. [true, false, true, true, false] becomes 0b10110. Then you can get the number of one bits using intrinsic operations:
List<Boolean> documentRelivency = Arrays.asList(true, false, true, true, false);
int numBits = documentRelivency.size(),
    bitset = IntStream.range(0, numBits)
        .map(i -> documentRelivency.get(i) ? 1 << (numBits - i - 1) : 0)
        .reduce(0, (i, j) -> i | j);
System.out.println("Precision\tRecall\tF-Measure");
IntStream.rangeClosed(1, numBits)
    .mapToObj(retrieved -> {
        double totalRelevant = Integer.bitCount(bitset & (-1 << (numBits - retrieved)));
        return String.format("%9.2f\t%.2f\t%.2f",
            totalRelevant / retrieved, 1f, 2 / (1 + retrieved / totalRelevant));
    })
    .forEach(System.out::println);
This way, you have expressed the entire operation in a functional way where the processing of one element does not depend on the previous one. It could even run in parallel, though this would offer no benefit here.
If the list size exceeds 32, you have to resort to long, or java.util.BitSet for more than 64.
But the whole operation is more an example of how to change the thinking from “this is a number I increment in each iteration” to “I’m processing a continuous range of values” and from “this is a number I increment when the element is true” to “this is the count of true values in a range of this list”.
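For lists longer than 64 elements, the same idea can be sketched with java.util.BitSet. This is only an illustration under assumptions: the class and method names (BitSetPrecision, toBitSet, relevantAmongFirst, rows) are hypothetical, and recall is kept at the constant 1.0 from the code above. Note that bits.get(0, retrieved) copies the prefix on every step, so this variant trades some speed for the unlimited size.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class BitSetPrecision {
    // Collects the indices of all true elements into a BitSet (no 32/64 element limit).
    static BitSet toBitSet(List<Boolean> flags) {
        return IntStream.range(0, flags.size())
                .filter(i -> flags.get(i))
                .collect(BitSet::new, (bits, i) -> bits.set(i), BitSet::or);
    }

    // Number of true values among the first `retrieved` elements.
    static int relevantAmongFirst(BitSet bits, int retrieved) {
        return bits.get(0, retrieved).cardinality();
    }

    // One formatted row per prefix of the list; no step depends on the previous one.
    static List<String> rows(List<Boolean> flags) {
        BitSet bits = toBitSet(flags);
        return IntStream.rangeClosed(1, flags.size())
                .mapToObj(retrieved -> {
                    double totalRelevant = relevantAmongFirst(bits, retrieved);
                    return String.format(Locale.ROOT, "%9.2f\t%.2f\t%.2f",
                            totalRelevant / retrieved, 1.0,
                            2 / (1 + retrieved / totalRelevant));
                })
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println("Precision\tRecall\tF-Measure");
        rows(Arrays.asList(true, false, true, true, false)).forEach(System.out::println);
    }
}
```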

It's unclear why you need to change your code to lambdas. Currently it's quite short, and lambdas will not make it shorter or cleaner. However, if you really want to, you may encapsulate your shared state in a separate object:
static class Stats {
private int totalRelevant, retrieved;
public void add(boolean relevant) {
if(relevant)
totalRelevant++;
retrieved++;
}
public double getPrecision() {
return ((double)totalRelevant) / retrieved;
}
public double getRecall() {
return 1.0; // ??? was totalRelevant/totalRelevant in original code
}
public double getFMeasure() {
double precision = getPrecision();
double recall = getRecall();
return (2 * precision * recall) / (precision + recall);
}
}
And use with lambda like this:
Stats stats = new Stats();
documentRelivency.forEach(relevant -> {
stats.add(relevant);
System.out.format("%9.2f\t%.2f\t%.2f\t\n", stats.getPrecision(),
stats.getRecall(), stats.getFMeasure());
});
There is a lambda here, but no Stream API. Involving the Stream API for this problem does not seem like a good idea, as you need to output the intermediate states of a mutable container that must be mutated strictly in the given order. Well, if you desperately need the Stream API, replace .forEach with .stream().forEachOrdered.
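The forEachOrdered variant mentioned above might look like the following sketch. It assumes a trimmed-down copy of the Stats class (precision only) so that the block is self-contained, and it collects the rows into a list instead of printing them; the names OrderedStatsDemo and precisionRows are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class OrderedStatsDemo {
    // Compact stand-in for the Stats class shown above (precision only).
    static class Stats {
        private int totalRelevant, retrieved;

        void add(boolean relevant) {
            if (relevant) totalRelevant++;
            retrieved++;
        }

        double getPrecision() {
            return (double) totalRelevant / retrieved;
        }
    }

    static List<String> precisionRows(List<Boolean> documentRelivency) {
        Stats stats = new Stats();
        List<String> rows = new ArrayList<>();
        // forEachOrdered processes elements in encounter order,
        // which the mutable Stats object depends on.
        documentRelivency.stream().forEachOrdered(relevant -> {
            stats.add(relevant);
            rows.add(String.format(Locale.ROOT, "%.2f", stats.getPrecision()));
        });
        return rows;
    }
}
```

The state is still mutable here; the stream only drives the iteration.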

Related

Removing the first half of the entries in LinkedHashMap other than looping

I was going to use Hashtable, but an existing answer said that only LinkedHashMap preserves the insertion order. So, it seems that I can get the insertion order with the entries or keys properties.
My question is, when the map has n elements, if I want to remove the first n/2 elements, is there a better way than looping through the keys and repeatedly calling remove(key)? That is, something like this
val a = LinkedHashMap<Int, Int>();
val n = 10;
for(i in 1 .. n)
{
a[i] = i*10;
}
a.removeRange(0,n/2);
instead of
val a = LinkedHashMap<Int, Int>();
val n = 10;
for(i in 1 .. n)
{
a[i] = i*10;
}
var i = 0;
var keysToRemove= ArrayList<Int>();
for(k in a.keys)
{
    if(i >= n/2)
        break;
    i++
    keysToRemove.add(k);
}
for(k in keysToRemove)
{
a.remove(k);
}
The purpose of this is that I use the map as a cache, and when the cache is full, I want to purge the oldest half of the entries. I do not have to use LinkedHashMap as long as I can:
Find the value using a key, efficiently.
Remove a range of entries at once.
There's no method in the class that makes this possible. The source code doesn't have any operations for ranges of keys or entries. Since the linking is built on top of the HashMap logic, individual entries still have to be found individually by a hashed key lookup to remove them, so removing a range couldn't be done any faster in a LinkedHashMap; in that respect it is unlike the relationship between LinkedList and ArrayList.
For simpler code that's equivalent to what you're doing:
a.keys.take(a.size / 2).forEach(a::remove)
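On the JVM, Kotlin's LinkedHashMap is a type alias for java.util.LinkedHashMap, so the same purge can be sketched in Java with an iterator, which avoids building a temporary list of keys. This is a minimal sketch, not a faster algorithm: each removal is still an individual operation. The names PurgeOldestHalf and purgeOldestHalf are hypothetical.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

class PurgeOldestHalf {
    // Removes the first (oldest) half of the entries, following iteration order.
    static <K, V> void purgeOldestHalf(LinkedHashMap<K, V> map) {
        int toRemove = map.size() / 2;
        Iterator<Map.Entry<K, V>> it = map.entrySet().iterator();
        for (int i = 0; i < toRemove && it.hasNext(); i++) {
            it.next();
            it.remove(); // removes the entry most recently returned by next()
        }
    }
}
```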
If you don't want to use a library for a cache, LinkedHashMap is designed so you can easily build your own by subclassing. For instance, a basic one that simply removes the oldest entry when you add elements above a certain collection size:
class CacheHashMap<K, V>(private val maxSize: Int) : LinkedHashMap<K, V>() {
    // Called after each insertion; returning true evicts the eldest entry.
    override fun removeEldestEntry(eldest: MutableMap.MutableEntry<K, V>?): Boolean =
        size > maxSize
}
Also, if you set accessOrder to true in your constructor call, it orders entries from least recently used to most recently used, which might be more apt for your situation than insertion order.
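Combining both suggestions, an access-ordered cache could be sketched in Java as follows. The class name LruCache and the initial capacity and load factor values are assumptions chosen for illustration; the three-argument LinkedHashMap constructor and removeEldestEntry are part of the standard library.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxSize;

    LruCache(int maxSize) {
        // initialCapacity, loadFactor, accessOrder = true (least recently used first)
        super(16, 0.75f, true);
        this.maxSize = maxSize;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Evict the least recently used entry once the limit is exceeded.
        return size() > maxSize;
    }
}
```

With a limit of 2, putting "a" and "b", reading "a", and then putting "c" evicts "b", because "a" was used more recently.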
EDIT: Sorry, I missed the part about using this as an LRU cache; for that use case, TreeMap will not be suitable.
If insertion order is just incidental for you, and what you want is in fact the actual order of comparable keys, you should use a TreeMap instead.
However, the specific use case of removing half the keys might not be supported directly. You will rather find methods to remove keys below/above a certain value, and get the highest/lowest keys.
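As a sketch of the "remove keys below a certain value" style of operation, a TreeMap view can be cleared in one call. The names TreeMapRangeRemoval and removeBelow are hypothetical, and Integer keys are assumed for the example.

```java
import java.util.TreeMap;

class TreeMapRangeRemoval {
    // Removes every entry whose key is strictly less than `threshold`.
    static <V> void removeBelow(TreeMap<Integer, V> map, int threshold) {
        // headMap returns a live view, so clearing it removes the entries from the map.
        map.headMap(threshold, false).clear();
    }
}
```

This removes by key value, not by insertion age, so it only matches the cache use case when keys grow with insertion time.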

How to get a random item from an ES6 Map or Set

I have a project that uses arrays of objects that I'm thinking of moving to es6 Sets or Maps.
I need to quickly get a random item from them (obviously trivial for my current arrays). How would I do this?
Maps and Sets are not well suited for random access. They are ordered and their length is known, but they are not indexed for access by an order index. As such, to get the Nth item in a Map or Set, you have to iterate through it to find that item.
The simple way to get a random item from a Set or Map would be to get the entire list of keys/items and then select a random one.
// get random item from a Set
function getRandomItem(set) {
let items = Array.from(set);
return items[Math.floor(Math.random() * items.length)];
}
You could make a version that would work with both a Set and a Map like this:
// returns random key from Set or Map
function getRandomKey(collection) {
let keys = Array.from(collection.keys());
return keys[Math.floor(Math.random() * keys.length)];
}
This is obviously not something that would perform well with a large Set or Map since it has to iterate all the keys and build a temporary array in order to select a random one.
Since both a Map and a Set have a known size, you could also select the random index based purely on the .size property and then you could iterate through the Map or Set until you got to the desired Nth item. For large collections, that might be a bit faster and would avoid creating the temporary array of keys at the expense of a little more code, though on average it would still be proportional to the size/2 of the collection.
// returns random key from Set or Map
function getRandomKey(collection) {
let index = Math.floor(Math.random() * collection.size);
let cntr = 0;
for (let key of collection.keys()) {
if (cntr++ === index) {
return key;
}
}
}
There's a short neat ES6+ version of the answer above:
const getRandomItem = iterable => iterable.get([...iterable.keys()][Math.floor(Math.random() * iterable.size)])
This works for Maps only, returning the value stored under a random key. Sets have no get() method, although their keys() is an alias for the values() method.
This is the short answer for Sets:
const getRandomItem = set => [...set][Math.floor(Math.random()*set.size)]

Java8 - get by index but something similar to 'getOrDefault' for Map?

Is there a cleaner way to check whether a value is present at a particular index like list.getOrDefault(index, "defaultValue"). Or even do a default operation when the particular index is out of range of the list.
The normal way to do this is to check for size of the list before attempting this operation.
The standard List interface does not have this functionality. There is Iterables.get in Guava:
Iterables.get(iterable, position, defaultValue);
Returns the element at the specified position in iterable or
defaultValue if iterable contains fewer than position + 1 elements.
Throws IndexOutOfBoundsException if position is negative.
If this is functionality you intend to use a lot and can't afford to depend on third-party libraries, you could write your own static method (here inspired by the Guava Lists class):
public class Lists {
public static <E> E getOrDefault(int index, E defaultValue, List<E> list) {
if (index < 0) {
throw new IllegalArgumentException("index is less than 0: " + index);
}
return index <= list.size() - 1 ? list.get(index) : defaultValue;
}
}
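If a default operation is needed instead of a default value, one option is to return an Optional and let the caller decide. This is a sketch with hypothetical names (ListLookups, getOptional); unlike the method above, it treats a negative index as "absent" instead of throwing, and it also yields an empty Optional when the stored element is null.

```java
import java.util.List;
import java.util.Optional;

class ListLookups {
    // Hypothetical helper: empty when the index is out of range or the element is null.
    static <E> Optional<E> getOptional(List<E> list, int index) {
        if (index < 0 || index >= list.size()) {
            return Optional.empty();
        }
        return Optional.ofNullable(list.get(index));
    }
}
```

A caller could then write getOptional(list, index).orElse("defaultValue"), or use orElseGet with a supplier to compute the fallback lazily.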

Define a new mathematical function in TCL using Tcl_CreateMathFunc

I use TCL 8.4 and for that version I need to add a new mathematical function into TCL interpreter by using TCL library function, particularly Tcl_CreateMathFunc. But I could not find a single example of how it can be done. Please could you write for me a very simple example, assuming that in the C code you have a Tcl_Interp *interp to which you should add a math function (say, a function that multiplies two double numbers).
I once did some alternative implementations of random number generators for Tcl and you can look at some examples at the git repository. The files in generic implement both a tcl command and a tcl math function for each PRNG.
So for instance in the Mersenne Twister implementation, in the package init function we add the new function to the interpreter by declaring
Tcl_CreateMathFunc(interp, "mt_rand", 0, (Tcl_ValueType *)NULL, RandProc, (ClientData)state);
This registers the C function RandProc for us. In this case the function takes no arguments, but the seeding equivalent (srand) shows how to handle a single parameter.
/*
* A Tcl math function that implements rand() using the Mersenne Twister
* Pseudo-random number generator.
*/
static int
RandProc(ClientData clientData, Tcl_Interp *interp, Tcl_Value *args, Tcl_Value *resultPtr)
{
State * state = (State *)clientData;
if (! (state->flags & Initialized)) {
unsigned long seed;
/* This is based upon the standard Tcl rand() initializer */
seed = time(NULL) + ((long)Tcl_GetCurrentThread()<<12);
InitState(state, seed);
}
resultPtr->type = TCL_DOUBLE;
resultPtr->doubleValue = RandomDouble(state);
return TCL_OK;
}
Be aware that this is an API that is very unlikely to survive indefinitely (for reasons such as its weird types, inflexible argument handling, and the inability to easily use it from Tcl itself). However, here's how to do an add(x,y) with both arguments being doubles:
Registration
Tcl_ValueType types[2] = { TCL_DOUBLE, TCL_DOUBLE };
Tcl_CreateMathFunc(interp, "add", 2, types, AddFunc, NULL);
Implementation
static int AddFunc(ClientData ignored, Tcl_Interp *interp,
Tcl_Value *args, Tcl_Value *resultPtr) {
double x = args[0].doubleValue;
double y = args[1].doubleValue;
resultPtr->doubleValue = x + y;
resultPtr->type = TCL_DOUBLE;
return TCL_OK;
}
Note that because this API is always working with a fixed number of arguments to the function (and argument type conversions are handled for you) then the code you write can be pretty short. (Writing it to be type-flexible with TCL_EITHER — only permissible in the registration/declaration — makes things quite a lot more complex, and you really are stuck with a fixed argument count.)

What's the opposite of the term "closed over"?

Consider the following (C#) code. The lambda being passed to ConvolutedRand() is said to be "closed over" the variable named format. What term would you use to describe how the variable random is used within MyMethod()?
void MyMethod()
{
    // Initialized so the read after the lambda call compiles (definite assignment).
    int random = 0;
    string format = "The number {0} inside the lambda scope";
    ConvolutedRand(x =>
    {
        Console.WriteLine(format, x);
        random = x;
    });
    Console.WriteLine("The number is {0} outside the lambda scope", random);
}
void ConvolutedRand(Action<int> action)
{
    int random = new Random().Next();
    action(random);
}
I typically hear "bound" versus "free", in the context of a particular expression or lexical scope. The lambda closes over both format and random (which are 'free' in the lambda, which is why it closes over them). Inside MyMethod, both variables are just locally bound variables.
That would be a local variable IMO. (Perhaps there is a more scientific name, not free maybe?)
