Algorithm used to calculate hashcode for segments in ConcurrentHashMap in Java

What is the algorithm used to calculate the hash code for segments in ConcurrentHashMap in Java?

First, recall that ConcurrentHashMap is divided into a finite number of segments. Segment is a static final inner class of ConcurrentHashMap. A simplified definition of Segment looks like this:
/** The inner Segment class plays a significant role. **/
protected static final class Segment {
    protected int count;

    protected synchronized int getCount() {
        return this.count;
    }

    protected synchronized void synch() {}
}
/** Segment array declaration; by default I am taking 32 segments. **/
public final Segment[] segments = new Segment[32];
Let me explain using the put method of the ConcurrentHashMap class:
put(Object key, Object value)
Before placing an entry into one of those 32 segments, we need to calculate the hash code of its key.
First we calculate the hash of the key:
int hashVal = hash(key);

static int hash(Object x) {
    int h = x.hashCode();
    return (h << 7) - h + (h >>> 9) + (h >>> 17);
}
After getting hashVal we can pick the segment as below (0x1F is 31, so the lower five bits of the hash select one of the 32 segments):
Segment seg = segments[hashVal & 0x1F]; // segments is the array declared above
This is just for understanding; refer to the Oracle docs for the actual implementation.
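Putting the pieces together, here is a minimal, self-contained sketch of the segment-selection idea (my own illustration of the above, not the real JDK code):

public class SegmentDemo {
    // Same supplemental hash function as shown above
    static int hash(Object x) {
        int h = x.hashCode();
        return (h << 7) - h + (h >>> 9) + (h >>> 17);
    }

    public static void main(String[] args) {
        String key = "example";
        int hashVal = hash(key);
        // masking with 0x1F keeps the lower five bits, an index in 0..31
        int segmentIndex = hashVal & 0x1F;
        System.out.println(key + " maps to segment " + segmentIndex);
    }
}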

Related

Breadth first traversal of arbitrary graph with minimal memory

I have an enormous directed graph I need to traverse in search for the shortest path to a specific node from a given starting point. The graph in question does not exist explicitly; the child nodes are determined algorithmically from the parent nodes.
(To give an illustration: imagine a graph of chess positions. Each node is a chess position and its children are all the legal moves from that position.)
So I have a queue for open nodes, and every time I process the next node in the queue I enqueue all of its children. But since the graph can have cycles I also need to maintain a hashset of all visited nodes so I can check if I have visited one before.
This works okay, but since this graph is so large, I run into memory problems. All of the nodes in the queue are also stored in the hashset, which tends to be around 50% of the total number of visited nodes in practice in my case.
Is there some magical way to get rid of this redundancy while keeping the speed of the hashset? (Obviously, I could get rid of the redundancy by NOT hashing and just doing a linear search, but that is out of the question.)
I solved it by writing a class that stores the keys in a list and stores the indices of the keys in a hashtable. The next node "in the queue" is always the next node in the list, until you find what you're looking for or you've traversed the entire graph.
using System.Collections.Generic;

class IndexMap<T>
{
    private List<T> values;            // insertion-ordered storage; doubles as the "queue"
    private LinkedList<int>[] buckets; // hash buckets holding indices into 'values'

    public int Count { get; private set; } = 0;

    public IndexMap(int capacity)
    {
        values = new List<T>(capacity);
        buckets = new LinkedList<int>[NextPowerOfTwo(capacity)];
        for (int i = 0; i < buckets.Length; ++i)
            buckets[i] = new LinkedList<int>();
    }

    public void Add(T item) // assumes item is not yet in map
    {
        if (Count == buckets.Length)
            ReHash();
        int bucketIndex = item.GetHashCode() & (buckets.Length - 1);
        buckets[bucketIndex].AddFirst(Count++);
        values.Add(item);
    }

    public bool Contains(T item)
    {
        int bucketIndex = item.GetHashCode() & (buckets.Length - 1);
        foreach (int i in buckets[bucketIndex])
        {
            if (values[i].Equals(item))
                return true;
        }
        return false;
    }

    public T this[int index]
    {
        get => values[index];
    }

    private void ReHash()
    {
        LinkedList<int>[] newBuckets = new LinkedList<int>[2 * buckets.Length];
        for (int i = 0; i < newBuckets.Length; ++i)
            newBuckets[i] = new LinkedList<int>();
        for (int i = 0; i < buckets.Length; ++i)
        {
            foreach (int index in buckets[i])
            {
                int bucketIndex = values[index].GetHashCode() & (newBuckets.Length - 1);
                newBuckets[bucketIndex].AddFirst(index);
            }
            buckets[i] = null;
        }
        buckets = newBuckets;
    }

    private int NextPowerOfTwo(int n)
    {
        if (n < 1)
            return 1;
        if ((n & (n - 1)) == 0)
            return n;
        // start at 1, not 0: shifting 0 left never terminates the loop
        int output = 1;
        while (output < n)
            output <<= 1;
        return output;
    }
}
The old method of maintaining both an array of the open nodes and a hashtable of the visited nodes needed n*(1+a)*size(T) space, where a is the ratio of nodes_in_the_queue over total_nodes_found and size(T) is the size of a node.
This method needs n*(size(T) + size(int)). If your nodes are significantly larger than an int, this can save a lot.
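To put rough numbers on that (purely illustrative figures, not from the original post): with n = 10^8 visited nodes, a = 0.5 and size(T) = 64 bytes, the old approach needs about 10^8 * 1.5 * 64 ≈ 9.6 GB, while the index-map approach needs about 10^8 * (64 + 4) ≈ 6.8 GB, a saving of roughly 30%.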

Recursive factorial returns 0 for large input

The answer returned by the following Java code is 0. Can anyone help me find the error?
public class ComplexityOrder {
    public static void main(String[] args) {
        ComplexityOrder co = new ComplexityOrder();
        co.Order(1000);
    }

    public double Order(int n) {
        int[] a = new int[10];
        a[0] = Fact(n);
        System.out.println("Factorial " + a[0]);
        return a[0];
    }

    public static int Fact(int n) {
        if (n == 0 || n == 1) {
            return 1;
        } else {
            return n * Fact(n - 1);
        }
    }
}
The maximum value an int can hold is 2^31 - 1, and 1000! is far too big for an int. You can use java.math.BigInteger for the purpose. The BigInteger class allocates as much memory as it needs to hold all the bits of data it is asked to hold. There are, however, some practical limits, dictated by the memory available.
Using BigInteger your code will somewhat look like:
import java.math.BigInteger;

public class ComplexityOrder {
    public static void main(String[] args) {
        ComplexityOrder co = new ComplexityOrder();
        co.Order(1000);
    }

    public BigInteger Order(int n) {
        BigInteger[] a = new BigInteger[10];
        a[0] = fact(n);
        System.out.println("Factorial " + a[0]);
        return a[0];
    }

    public static BigInteger fact(int n) {
        if (n == 0 || n == 1) {
            return BigInteger.ONE;
        } else {
            return fact(n - 1).multiply(BigInteger.valueOf(n));
        }
    }
}
Also, I don't see any point in using the array.
That is because of overflow of the int variable, which can hold at most 2^31 - 1; Fact(1000) is far bigger than the maximum int. If BigInteger covers the range of numbers you need, you can use the BigInteger class instead of int; if you need to go beyond even that, you would have to implement your own string-based addition function to avoid overflow.
To be more specific ...
You are using standard integers, an n-bit signed binary number. You then compute 1000! This is a very large number compared to any standard integer representation. The prime factorization includes 2^994. This means that the resulting number, in binary, ends with a string of 994 zeroes.
When integer overflow isn't raised as an exception, the effect is, informally, that your result is reduced mod 2^n, where n is the length of the internal representation (usually 32 or 64 bits), with the upper half of the range mapped to negative numbers. A number whose binary form ends in at least n zeroes gets reduced to 0 (mod 2^n). That's what happened in your case, as your computer does not have 1024-bit integers. :-)
As others have already suggested, you can handle this by switching to BigInteger and adjusting your class to deal with the expanded range. Do note that it will be much slower, as you are beyond the hardware's native integer range, and the processing resembles doing all operations by hand in base 2^n. "Write down the 00110111001010010110110001010110, carry the 1, and on to the next column." :-)
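A quick way to see this in action (a small sketch added here for illustration; it is not from the original answer):

import java.math.BigInteger;

public class OverflowDemo {
    public static void main(String[] args) {
        BigInteger fact = BigInteger.ONE;
        for (int i = 2; i <= 1000; i++) {
            fact = fact.multiply(BigInteger.valueOf(i));
        }
        // intValue() keeps only the low 32 bits, which is exactly what
        // int arithmetic does on overflow; 1000! ends in 994 binary zeroes.
        System.out.println(fact.intValue());        // prints 0
        System.out.println(fact.getLowestSetBit()); // prints 994
    }
}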

Is the following approach dynamic programming

As far as I know, with DP you either start with the bigger problem and recursively come down, saving each value for future use, or you do it iteratively, saving values bottom-up. But what if I am doing it bottom-up while recursively going up?
Take, for example, the following question: Longest Common Subsequence.
Here's my solution
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class LongestCommonSubseq {
    /**
     * @param args
     */
    public static List<Character> list = new ArrayList<Character>();
    public static int[][] M = new int[7][7];

    public static void main(String[] args) {
        String s1 = "ABCDGH";
        String s2 = "AEDFHR";
        for (int i = 0; i <= 6; i++)
            for (int j = 0; j <= 6; j++)
                M[i][j] = -1;
        int max = getMax(s1, s2, 0, 0);
        System.out.println(max);
        Collections.sort(list);
        for (int i = 0; i < max; i++)
            System.out.println(list.get(i));
    }

    public static int getMax(String s1, String s2, int i, int j) {
        if (i >= s1.length() || j >= s2.length()) {
            M[i][j] = 0;
            return M[i][j];
        }
        if (M[i][j] != -1)
            return M[i][j];
        if (s1.charAt(i) == s2.charAt(j)) {
            M[i][j] = 1 + getMax(s1, s2, i + 1, j + 1);
            list.add(s1.charAt(i));
        } else {
            M[i][j] = max(getMax(s1, s2, i + 1, j), getMax(s1, s2, i, j + 1));
        }
        return M[i][j];
    }

    public static int max(int a, int b) {
        return a > b ? a : b;
    }
}
So you see, I am going from M[0][0] in the other direction, but I am not doing it iteratively.
But I guess it should be fine. Just needed to confirm.
Thanks
The direction does not matter. What is more important is that you go from the more general (complex) problem to simpler ones. What you have done is dynamic programming.
For dynamic programming it doesn't matter whether you follow the bottom-up or the top-down paradigm. The basic principle (which, as you have correctly noted, you are using) of dynamic programming is known as Bellman's Principle of Optimality:
Principle of Optimality: An optimal policy has the property that
whatever the initial state and initial decision are, the remaining
decisions must constitute an optimal policy with regard to the state
resulting from the first decision.
Resource: Wikipedia (http://en.wikipedia.org/wiki/Bellman_equation#Bellman.27s_Principle_of_Optimality)
A great approach to cut off some of these already-solved subproblems from the recursive call tree is to use caching, i.e. memoization (as in your code).
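For contrast, here is a minimal bottom-up (iterative) version of the same LCS length computation; this is my own sketch, not part of the original answer:

public static int lcsLength(String s1, String s2) {
    int[][] m = new int[s1.length() + 1][s2.length() + 1]; // extra row/column stays 0
    for (int i = s1.length() - 1; i >= 0; i--) {
        for (int j = s2.length() - 1; j >= 0; j--) {
            if (s1.charAt(i) == s2.charAt(j))
                m[i][j] = 1 + m[i + 1][j + 1];
            else
                m[i][j] = Math.max(m[i + 1][j], m[i][j + 1]);
        }
    }
    return m[0][0]; // same value the memoized getMax(s1, s2, 0, 0) produces
}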

How to measure the rate of events through a system

I need to measure the rate at which a software system is consuming messages from a message queue and report on that periodically.
Specifically, messages arrive from a message queueing system and I need to report (each second) on the number of messages received within a number of rolling windows - e.g. the last second, the last 5 seconds, the last 30 seconds, etc.
Whilst I'm sure I could build this, I'm not certain that I'd go about it in the most efficient manner! I'm also sure that there are libraries for doing this (I'm using the JVM, so Apache Commons Math springs to mind), but I don't even know the right words to Google for! :-)
Here is my solution based on exponential smoothing. It doesn't require any background threads. You would create 1 instance for each rolling window that you want to track. For each relevant event you would call newEvent on each instance.
public class WindowedEventRate {

    private double normalizedRate; // event rate per window
    private long windowSizeTicks;
    private long lastEventTicks;

    public WindowedEventRate(int aWindowSizeSeconds) {
        windowSizeTicks = aWindowSizeSeconds * 1000L;
        lastEventTicks = System.currentTimeMillis();
    }

    public double newEvent() {
        long currentTicks = System.currentTimeMillis();
        long period = currentTicks - lastEventTicks;
        lastEventTicks = currentTicks;
        if (period == 0) {
            period = 1; // guard: two events in the same millisecond would otherwise yield NaN
        }
        double normalizedFrequency = (double) windowSizeTicks / (double) period;
        double alpha = Math.min(1.0 / normalizedFrequency, 1.0);
        normalizedRate = (alpha * normalizedFrequency) + ((1.0 - alpha) * normalizedRate);
        return getRate();
    }

    public double getRate() {
        return normalizedRate * 1000L / windowSizeTicks;
    }
}
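A brief usage sketch (my own illustration, not part of the original answer): one instance per rolling window, fed from the message handler:

WindowedEventRate oneSecondRate = new WindowedEventRate(1);
WindowedEventRate thirtySecondRate = new WindowedEventRate(30);
// in the message handler, on every message:
//     oneSecondRate.newEvent();
//     thirtySecondRate.newEvent();
// when reporting:
//     System.out.println("msgs/sec (30s window): " + thirtySecondRate.getRate());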
This is what I ended up writing.
package com.example;

import java.util.Arrays;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

public class BucketCounter {

    private final Lock rollLock = new ReentrantLock();
    private final int[] bucketSizes; // window lengths, in roll() periods
    private final int[] buckets;     // current count per window
    private final int[] intervals;   // per-period history, newest first
    private final AtomicInteger incoming = new AtomicInteger(0);

    public BucketCounter(int... bucketSizes) {
        if (bucketSizes.length < 1) {
            throw new IllegalArgumentException("Must specify at least one bucket size");
        }
        this.bucketSizes = bucketSizes;
        this.buckets = new int[bucketSizes.length];
        Arrays.sort(bucketSizes);
        if (bucketSizes[0] < 1) {
            throw new IllegalArgumentException("Cannot have a bucket of size < 1");
        }
        intervals = new int[bucketSizes[bucketSizes.length - 1]];
    }

    public int count(int n) {
        return incoming.addAndGet(n);
    }

    public int[] roll() {
        final int toAdd = incoming.getAndSet(0);
        rollLock.lock();
        try {
            final int[] results = new int[buckets.length];
            for (int i = 0, n = buckets.length; i < n; i++) {
                // drop the count that just fell out of this window, add the new one
                results[i] = buckets[i] = buckets[i] - intervals[bucketSizes[i] - 1] + toAdd;
            }
            // shift the per-period history along by one slot
            System.arraycopy(intervals, 0, intervals, 1, intervals.length - 1);
            intervals[0] = toAdd;
            return results;
        } finally {
            rollLock.unlock();
        }
    }
}
Initialise it by passing the different time increments (e.g. 1, 5, 30). Then arrange for a background thread to call roll() every "time period". If you call it every second, then your buckets are 1, 5 and 30 seconds. If you call it every 5 seconds, then your buckets are 5, 25 and 150 seconds, etc. Basically, the buckets are expressed in "number of times roll() is called".
roll() also returns you an array of the current counts for each bucket. Note that these numbers are the raw counts, and are not averaged per time interval. You'll need to do that division yourself if you want to measure "rates" rather than "counts".
Finally, every time an event happens, call count(). I've set up a system with a few of these and I call count(1) on each message to count incoming messages, count(message.size()) on each message to count incoming byte rates, etc.
Hope that helps.
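To make the wiring concrete, here is a hypothetical setup sketch (class name and scheduling choices are mine, not from the original answer):

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class RateReporter {
    public static void main(String[] args) {
        BucketCounter counter = new BucketCounter(1, 5, 30); // 1s, 5s and 30s windows
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(() -> {
            int[] counts = counter.roll();
            System.out.printf("last 1s=%d, last 5s=%d, last 30s=%d%n",
                    counts[0], counts[1], counts[2]);
        }, 1, 1, TimeUnit.SECONDS);
        // elsewhere, on every received message: counter.count(1);
    }
}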
You could probably implement it as an interceptor, so search for interceptor combined with the message queue product name and the language name.

Size-limited queue that holds last N elements in Java

A very simple & quick question on Java libraries: is there a ready-made class that implements a Queue with a fixed maximum size - i.e. it always allows addition of elements, but it will silently remove head elements to accommodate space for newly added elements.
Of course, it's trivial to implement it manually:
import java.util.LinkedList;

public class LimitedQueue<E> extends LinkedList<E> {

    private int limit;

    public LimitedQueue(int limit) {
        this.limit = limit;
    }

    @Override
    public boolean add(E o) {
        super.add(o);
        while (size() > limit) {
            super.remove();
        }
        return true;
    }
}
As far as I see, there's no standard implementation in Java stdlibs, but may be there's one in Apache Commons or something like that?
Apache Commons Collections 4 has a CircularFifoQueue<> which is what you are looking for. Quoting the javadoc:
CircularFifoQueue is a first-in first-out queue with a fixed size that replaces its oldest element if full.
import java.util.Queue;
import org.apache.commons.collections4.queue.CircularFifoQueue;
Queue<Integer> fifo = new CircularFifoQueue<Integer>(2);
fifo.add(1);
fifo.add(2);
fifo.add(3);
System.out.println(fifo);
// Observe the result:
// [2, 3]
If you are using an older version of the Apache commons collections (3.x), you can use the CircularFifoBuffer which is basically the same thing without generics.
Update: updated answer following release of commons collections version 4 that supports generics.
Guava now has an EvictingQueue, a non-blocking queue which automatically evicts elements from the head of the queue when new elements are added and the queue is full.
import java.util.Queue;
import com.google.common.collect.EvictingQueue;
Queue<Integer> fifo = EvictingQueue.create(2);
fifo.add(1);
fifo.add(2);
fifo.add(3);
System.out.println(fifo);
// Observe the result:
// [2, 3]
I like @FractalizeR's solution, but I would in addition keep and return the value from super.add(o)!
import java.util.LinkedList;

public class LimitedQueue<E> extends LinkedList<E> {

    private int limit;

    public LimitedQueue(int limit) {
        this.limit = limit;
    }

    @Override
    public boolean add(E o) {
        boolean added = super.add(o);
        while (added && size() > limit) {
            super.remove();
        }
        return added;
    }
}
Use composition, not extends (yes, I mean extends, as in a reference to the extends keyword in Java, and yes this is inheritance). Composition is superior because it completely shields your implementation, allowing you to change the implementation without impacting the users of your class.
I recommend trying something like this (I'm typing directly into this window, so buyer beware of syntax errors):
public class LimitedSizeQueue<ElementType> implements Queue<ElementType>
{
    private int maxSize;
    private LinkedList<ElementType> storageArea;

    public LimitedSizeQueue(final int maxSize)
    {
        this.maxSize = maxSize;
        storageArea = new LinkedList<ElementType>();
    }

    public boolean offer(ElementType element)
    {
        if (storageArea.size() >= maxSize)
        {
            storageArea.removeLast(); // evict the oldest element to make room
        }
        storageArea.addFirst(element);
        return true;
    }

    // ... the rest of this class (the remaining Queue methods)
}
A better option (based on the answer by Asaf) might be to wrap the Apache Collections CircularFifoBuffer with a generic class. For example:
public class LimitedSizeQueue<ElementType> implements Queue<ElementType>
{
    private int maxSize;
    private CircularFifoBuffer storageArea;

    public LimitedSizeQueue(final int maxSize)
    {
        if (maxSize > 0)
        {
            this.maxSize = maxSize;
            storageArea = new CircularFifoBuffer(maxSize);
        }
        else
        {
            throw new IllegalArgumentException("blah blah blah");
        }
    }

    // ... implement the Queue interface using the CircularFifoBuffer class
}
The only thing I know of that has limited space is the BlockingQueue interface (implemented by e.g. the ArrayBlockingQueue class), but a full BlockingQueue does not remove the first element; instead it blocks the put operation until space is free (an element is removed by another thread).
To my knowledge, your trivial implementation is the easiest way to get such behaviour.
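A tiny sketch of that difference in behaviour (my illustration, not from the original answer):

import java.util.concurrent.ArrayBlockingQueue;

ArrayBlockingQueue<Integer> q = new ArrayBlockingQueue<>(2);
q.add(1);
q.add(2);
System.out.println(q.offer(3)); // false: a full BlockingQueue rejects (offer)
                                // or blocks (put) instead of evicting its head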
You can use a MinMaxPriorityQueue from Google Guava, from the javadoc:
A min-max priority queue can be configured with a maximum size. If so, each time the size of the queue exceeds that value, the queue automatically removes its greatest element according to its comparator (which might be the element that was just added). This is different from conventional bounded queues, which either block or reject new elements when full.
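For example (a sketch of my own, assuming natural ordering):

import com.google.common.collect.MinMaxPriorityQueue;

MinMaxPriorityQueue<Integer> queue = MinMaxPriorityQueue.maximumSize(2).create();
queue.add(1);
queue.add(2);
queue.add(3); // 3 is the greatest element, so it is evicted immediately
System.out.println(queue.contains(3)); // false; 1 and 2 remain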
An LRUMap is another possibility, also from Apache Commons.
http://commons.apache.org/collections/apidocs/org/apache/commons/collections/map/LRUMap.html
OK, I'll share this option (the code below is Groovy). This is a pretty performant option - it uses an array internally and reuses entries. It's thread-safe, and you can retrieve the contents as a List.
static class FixedSizeCircularReference<T> {

    T[] entries
    int cur = 0
    int size

    FixedSizeCircularReference(int size) {
        this.entries = new Object[size] as T[]
        this.size = size
    }

    synchronized void add(T entry) {
        entries[cur++] = entry
        if (cur >= size) {
            cur = 0 // wrap around and start overwriting the oldest slot
        }
    }

    List<T> asList() {
        int c = cur
        int s = size
        T[] e = entries.collect() as T[]
        List<T> list = new ArrayList<>()
        int oldest = (c == s - 1) ? 0 : c
        for (int i = 0; i < e.length; i++) {
            def entry = e[oldest + i < s ? oldest + i : oldest + i - s]
            if (entry) list.add(entry)
        }
        return list
    }
}
import java.util.ArrayDeque;

public class ArrayLimitedQueue<E> extends ArrayDeque<E> {

    private int limit;

    public ArrayLimitedQueue(int limit) {
        super(limit + 1);
        this.limit = limit;
    }

    @Override
    public boolean add(E o) {
        boolean added = super.add(o);
        while (added && size() > limit) {
            super.remove();
        }
        return added;
    }

    @Override
    public void addLast(E e) {
        super.addLast(e);
        while (size() > limit) {
            super.removeLast();
        }
    }

    @Override
    public boolean offerLast(E e) {
        boolean added = super.offerLast(e);
        while (added && size() > limit) {
            super.pollLast();
        }
        return added;
    }
}
