Removing the first half of the entries in a LinkedHashMap without looping

I was going to use a Hashtable, but an existing answer said only LinkedHashMap preserves insertion order. So it seems I can get the insertion order from the entries or keys properties.
My question is: when the map has n elements and I want to remove the first n/2 of them, is there a better way than looping through the keys and repeatedly calling remove(key)? That is, something like this:
val a = LinkedHashMap<Int, Int>()
val n = 10
for (i in 1..n) {
    a[i] = i * 10
}
a.removeRange(0, n / 2)
instead of
val a = LinkedHashMap<Int, Int>()
val n = 10
for (i in 1..n) {
    a[i] = i * 10
}
var i = 0
val keysToRemove = ArrayList<Int>()
for (k in a.keys) {
    if (i >= n / 2)
        break
    else
        i++
    keysToRemove.add(k)
}
for (k in keysToRemove) {
    a.remove(k)
}
The purpose of this is that I use the map as a cache, and when the cache is full, I want to purge the oldest half of the entries. I do not have to use LinkedHashMap as long as I can:
Find the value using a key, efficiently.
Remove a range of entries at once.

There's no method in the class that makes this possible. The source code doesn't have any operations for ranges of keys or entries. Since the linking is built on top of the HashMap logic, each entry still has to be found individually by a hashed key lookup to remove it, so removing a range couldn't be done any faster in a LinkedHashMap. The LinkedList-vs-ArrayList analogy does not carry over here.
For simpler code that's equivalent to what you're doing:
a.keys.take(a.size / 2).forEach(a::remove)
If you don't want to use a library for a cache, LinkedHashMap is designed so you can easily build your own by subclassing. For instance, a basic one that simply evicts the eldest entry once adding an element pushes the map above a certain size:
class CacheHashMap<K, V>(private val maxSize: Int) : LinkedHashMap<K, V>() {
    // Called after each put; returning true evicts the eldest (first-inserted) entry.
    override fun removeEldestEntry(eldest: MutableMap.MutableEntry<K, V>?): Boolean =
        size > maxSize
}
Also, if you set accessOrder to true in your constructor call, the map orders its entries from least recently used to most recently used, which might be more apt for your situation than insertion order.
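For illustration, here is a minimal sketch (the class name and values are made up) of an access-ordered variant using the three-argument LinkedHashMap constructor:

// accessOrder = true: iteration goes from least recently accessed to most recently accessed.
class LruCache<K, V>(private val maxSize: Int) : LinkedHashMap<K, V>(16, 0.75f, true) {
    override fun removeEldestEntry(eldest: MutableMap.MutableEntry<K, V>?): Boolean =
        size > maxSize
}

val cache = LruCache<Int, String>(2)
cache[1] = "a"
cache[2] = "b"
cache[1]            // touching 1 makes it the most recently used entry
cache[3] = "c"      // evicts 2, the least recently used entry
println(cache.keys) // [1, 3]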

EDIT: Sorry, I missed the part about using this as an LRU cache; for that use case, a TreeMap will not be suitable.
If insertion order is just incidental for you, and what you want is in fact the actual order of comparable keys, you should use a TreeMap instead.
However, the specific use case of removing half the entries is not supported directly; you will instead find methods for removing keys below/above a certain value and for getting the highest/lowest keys.
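For example, a small sketch (in Kotlin, using java.util.TreeMap; the threshold is arbitrary) of removing every entry whose key is below a given value in one call:

import java.util.TreeMap

val map = TreeMap<Int, String>()
for (i in 1..10) map[i] = "value$i"

// headMap(toKey) is a live view of the entries with keys < toKey;
// clearing the view removes those entries from the backing map.
map.headMap(6).clear()

println(map.keys) // [6, 7, 8, 9, 10]

Note that this removes entries by key value rather than by count, which is exactly the limitation mentioned above.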

Related

Kotlin - very frequent data removal and addition to a list causes NPE

I have a buffer that is actually an ArrayList<Object>.
This happens asynchronously:
The buffer list changes very frequently - 15-50 times in a single second - and the idea is that whenever there's an update, I remove the first element with buffer.removeAt(0) and append the new value with buffer.add(new).
At some point I call a function that does calculations with the buffer list. I go through the list element by element, and eventually I run into an NPE because an element has been removed asynchronously.
How do I solve this NPE? I was thinking of making a deep copy, but a deep copy means going through the buffer list and allocating data, so while I make the copy I can still run into the NPE.
How are problems like these solved?
How do I solve the NPE?
What would be a more optimized approach, since this is going to consume a lot of memory?
Code:
private fun observeFrequentData() {
    frequentData.observe(owner, Observer { data ->
        if (accelerationData == null) return@Observer
        GlobalScope.launch {
            val a = data[0].toDouble()
            val b = data[1].toDouble()
            val c = a + b
            val timestamp = System.currentTimeMillis()
            val customObj = CustomObj(c, timestamp)
            if (buffer.size >= 5000) {
                buffer.removeAt(0)
            }
            buffer.add(acceleration)
        }
    })
}

fun getBuffer() {
    val mappedData = buffer.map { it.smth } // NPE, it == null
}
If you are doing lots of removals from index 0 and inserts at the end, then ArrayList is probably not the container to use. You can consider using a LinkedList:
buffer.removeFirst();
and
buffer.add(acceleration);
Also note the following comment from the documentation regarding synchronization:
Note that this implementation is not synchronized. If multiple threads
access a linked list concurrently, and at least one of the threads
modifies the list structurally, it must be synchronized externally. (A
structural modification is any operation that adds or deletes one or
more elements; merely setting the value of an element is not a
structural modification.) This is typically accomplished by
synchronizing on some object that naturally encapsulates the list. If
no such object exists, the list should be "wrapped" using the
Collections.synchronizedList method. This is best done at creation
time, to prevent accidental unsynchronized access to the list:
List list = Collections.synchronizedList(new LinkedList(...));
Use the synchronized keyword on your piece of code, as @patrickf suggested.
To take care of performance, instead of making the whole method synchronized, you can wrap just the three buffer-related lines of code (size, removeAt and add) in a synchronized block.
Something like:

// ...
synchronized(buffer) {
    if (buffer.size >= 5000) {
        buffer.removeAt(0)
    }
    buffer.add(acceleration)
}
// ...
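Putting the two suggestions together, a rough sketch (assuming the buffer holds the question's CustomObj instances and a cap of 5000) could look like this:

import java.util.LinkedList

val buffer = LinkedList<CustomObj>()

// Writer side (inside the coroutine): guard the structural modifications.
fun append(item: CustomObj) {
    synchronized(buffer) {
        if (buffer.size >= 5000) buffer.removeFirst()
        buffer.add(item)
    }
}

// Reader side: take a snapshot under the same lock, then iterate the copy safely.
fun getBufferSnapshot(): List<CustomObj> = synchronized(buffer) { buffer.toList() }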
Hope this helps!

How do I mutate and optionally remove elements from a Vec without memory allocation?

I have a Player struct that contains a vec of Effect instances. I want to iterate over this vec, decrease the remaining time for each Effect, and then remove any effects whose remaining time reaches zero. So far so good. However, for any effect removed, I also want to pass it to Player's undo_effect() method, before destroying the effect instance.
This is part of a game loop, so I want to do this without any additional memory allocation if possible.
I've tried using a simple for loop as well as iterators, drain, retain, and filter, but I keep running into issues where self (the Player) would be mutably borrowed more than once, because modifying self.effects requires a mutable borrow, as does the undo_effect() method. The drain_filter() method in nightly looks useful here, but it was first proposed in 2017, so I'm not holding my breath on that one.
One approach that did compile (see below) was to use two vectors and alternate between them on each frame. Elements are popped from vec 1 and either pushed to vec 2 or passed to undo_effect() as appropriate. On the next game loop iteration, the direction is reversed. Since each vec never shrinks, the only allocations happen if they grow larger than before.
I started abstracting this as its own struct but want to check if there is a better (or easier) way.
This one won't compile. The self.undo_effect() call would borrow self as mutable twice.
struct Player {
    effects: Vec<Effect>
}

impl Player {
    fn update(&mut self, delta_time: f32) {
        for effect in &mut self.effects {
            effect.remaining -= delta_time;
            if effect.remaining <= 0.0 {
                effect.active = false;
            }
        }
        for effect in self.effects.iter_mut().filter(|e| !e.active) {
            self.undo_effect(effect);
        }
        self.effects.retain(|e| e.active);
    }
}
The below compiles ok - but is there a better way?
struct Player {
    effects: [Vec<Effect>; 2],
    index: usize
}

impl Player {
    fn update(&mut self, delta_time: f32) {
        let src_index = self.index;
        let target_index = if self.index == 0 { 1 } else { 0 };
        self.effects[target_index].clear(); // should be unnecessary.
        while !self.effects[src_index].is_empty() {
            if let Some(x) = self.effects[src_index].pop() {
                if x.active {
                    self.effects[target_index].push(x);
                } else {
                    self.undo_effect(&x);
                }
            }
        }
        self.index = target_index;
    }
}
Is there an iterator version that works without unnecessary memory allocations? I'd be ok with allocating memory only for the removed elements, since this will be much rarer.
Would an iterator be more efficient than the pop()/push() version?
EDIT 2020-02-23:
I ended up coming back to this and I found a slightly more robust solution, similar to the above but without the danger of requiring a target_index field.
std::mem::swap(&mut self.effects, &mut self.effects_cache);
self.effects.clear();
while !self.effects_cache.is_empty() {
    if let Some(x) = self.effects_cache.pop() {
        if x.active {
            self.effects.push(x);
        } else {
            self.undo_effect(&x);
        }
    }
}
Since self.effects_cache is not used outside this method and is not required to hold any particular value beforehand, the rest of the code can simply use self.effects, which will always be current.
The main issue is that you are borrowing a field (effects) of Player and trying to call undo_effect while this field is borrowed. As you noted, this does not work.
You already realized that you could juggle two vectors, but you could actually only juggle one (permanent) vector:
struct Player {
    effects: Vec<Effect>
}

impl Player {
    fn update(&mut self, delta_time: f32) {
        for effect in &mut self.effects {
            effect.remaining -= delta_time;
            if effect.remaining <= 0.0 {
                effect.active = false;
            }
        }

        // Temporarily remove effects from Player.
        let mut effects = std::mem::replace(&mut self.effects, vec!());

        // Call Player::undo_effect (no outstanding borrows).
        // `drain_filter` could also be used, for better efficiency.
        for effect in effects.iter_mut().filter(|e| !e.active) {
            self.undo_effect(effect);
        }

        // Restore effects.
        self.effects = effects;
        self.effects.retain(|e| e.active);
    }
}
This will not allocate because the default constructor of Vec does not allocate.
On the other hand, the double-vector solution might be more efficient as it allows a single pass over self.effects rather than two. YMMV.
If I understand you correctly, you have two questions:
How can I split a Vec into two Vecs (one with the elements that fulfill a predicate, the other with those that don't)?
Is it possible to do this without memory overhead?
There are multiple ways of splitting a Vec into two (or more):
You could use Iterator::partition, which will give you two distinct collections that can be used further.
There is the unstable Vec::drain_filter function, which does the same but on the Vec itself.
Use splitn (or splitn_mut), which will split your Vec/slice into at most n (2 in your case) subslices.
Depending on what you want to do, all of these solutions are applicable and good to use.
Is it possible without memory overhead? Not with the solutions above, because you need to create a second Vec which can hold the filtered items. But there is a solution: you can "sort" the Vec so that the first half contains all the items that fulfill the predicate (e.g. are not expired) and the second half contains those that fail it (are expired). You just need to count the number of items that fulfill the predicate.
Then you can use split_at (or split_at_mut) to split the Vec/slice into two distinct slices. Afterwards you can truncate the Vec to the length of the good items, and the other ones will be dropped.
The best answer is this one in C++.
[O]rder the indices vector, create two iterators into the data vector, one for reading and one for writing. Initialize the writing iterator to the first element to be removed, and the reading iterator to one beyond that one. Then in each step of the loop increment the iterators to the next value (writing) and next value not to be skipped (reading) and copy/move the elements. At the end of the loop call erase to discard the elements beyond the last written to position.
The Rust adaptation to your specific problem is to move the removed items out of the vector instead of just writing over them.
An alternative is to use a linked list instead of a vector to hold your Effect instances.

How to get a random item from an ES6 Map or Set

I have a project that uses arrays of objects that I'm thinking of moving to ES6 Sets or Maps.
I need to quickly get a random item from them (obviously trivial for my current arrays). How would I do this?
Maps and Sets are not well suited for random access. They are ordered and their length is known, but they are not indexed for access by an order index. As such, to get the Nth item in a Map or Set, you have to iterate through it to find that item.
The simple way to get a random item from a Set or Map would be to get the entire list of keys/items and then select a random one.
// get random item from a Set
function getRandomItem(set) {
    let items = Array.from(set);
    return items[Math.floor(Math.random() * items.length)];
}
You could make a version that would work with both a Set and a Map like this:
// returns random key from Set or Map
function getRandomKey(collection) {
    let keys = Array.from(collection.keys());
    return keys[Math.floor(Math.random() * keys.length)];
}
This is obviously not something that would perform well with a large Set or Map since it has to iterate all the keys and build a temporary array in order to select a random one.
Since both a Map and a Set have a known size, you could also select the random index based purely on the .size property and then you could iterate through the Map or Set until you got to the desired Nth item. For large collections, that might be a bit faster and would avoid creating the temporary array of keys at the expense of a little more code, though on average it would still be proportional to the size/2 of the collection.
// returns random key from Set or Map
function getRandomKey(collection) {
    let index = Math.floor(Math.random() * collection.size);
    let cntr = 0;
    for (let key of collection.keys()) {
        if (cntr++ === index) {
            return key;
        }
    }
}
There's a short neat ES6+ version of the answer above:
const getRandomItem = iterable => iterable.get([...iterable.keys()][Math.floor(Math.random() * iterable.size)])
This works for Maps; for Sets (where keys() is an alias for the values() method, but there is no get()), use the version below.
This is the short answer for Sets:
const getRandomItem = set => [...set][Math.floor(Math.random()*set.size)]

How can I simply check if a set of n numbers are all different?

I have n integers and I need a quick logic test to see that they are all different, and I don't want to compare every combination to find a match...any ideas on a nice and elegant approach?
I don't care what programming language your idea is in, I can convert!
Use a set data structure if your language supports it; you might also look at keeping a hash table of seen elements.
In Python you might try:
seen = {}
n_already_seen = n in seen
seen[n] = n
n_already_seen will be a boolean indicating if n has already been seen.
You don't have to check every combination, thanks to commutativity and transitivity; you can simply go down the list and check each entry against each entry that comes after it. For example:
bool areElementsUnique( int[] arr ) {
    for( int i = 0; i < arr.Length - 1; i++ ) {
        for( int j = i + 1; j < arr.Length; j++ ) {
            if( arr[i] == arr[j] ) return false;
        }
    }
    return true;
}
Note that the inner loop doesn't start from the beginning, but from the next element (i+1).
You can use a hash table or a Set type of data structure that uses hashing. Then you can insert all of the elements into the hash table or hash set, checking as you insert whether the element is already in the table/set. If for some reason you don't want to check as you go, you can just insert all the numbers and then check whether the size of the structure is less than n. If it is less than n, there had to be repeated elements; otherwise they were all unique.
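For instance, a one-line sketch of that size-comparison variant (shown here in Kotlin; the function name is made up):

// All n numbers are distinct exactly when de-duplicating them keeps the size at n.
fun allDistinct(numbers: IntArray): Boolean = numbers.toSet().size == numbers.size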
Here is a really compact Java solution. The time-complexity is amortized O(n) and the space complexity is also O(n).
public boolean areAllElementsUnique(int[] list)
{
    Set<Integer> set = new HashSet<Integer>();
    for (int number : list)
        if (set.contains(number))
            return false;
        else
            set.add(number);
    return true;
}

How to design/create a key for key/value storage?

I want to store serialized objects (or whatever) in a key/value cache.
Now I do something like this:
public string getValue(int param1, string param2, etc)
{
    string key = param1 + "_" + param2 + "_" + etc;
    string tmp = getFromCache(key);
    if (tmp == null)
    {
        tmp = getFromAnotherPlace();
        addToCache(key, tmp);
    }
    return tmp;
}
I think this can be awkward. How should I design the key?
If I understood the question, I think the simplest and smartest way to make a key is to use a one-way hash function such as MD5, SHA-1, etc.
There are at least two reasons for doing this:
The resulting key is unique for practical purposes (although both MD5 and SHA-1 have known weaknesses, so collisions are theoretically possible).
The resulting key has a fixed length!
You give your object (or the concatenated parameters) as the argument of the function and you have your key.
I don't know C# very well, but I am quite sure you can find a one-way hash function built in.
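As an illustration, a small sketch (written in Kotlin, using the JVM's java.security.MessageDigest; the helper name is made up) of hashing the concatenated parameters into a fixed-length key:

import java.security.MessageDigest

// Build a fixed-length cache key by hashing the joined parameters.
fun cacheKey(vararg parts: Any?): String {
    val raw = parts.joinToString("_")                     // e.g. "42_foo_bar"
    val digest = MessageDigest.getInstance("SHA-256").digest(raw.toByteArray())
    return digest.joinToString("") { "%02x".format(it) }  // hex-encode: always 64 chars
}

val key = cacheKey(42, "foo", "bar")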
First of all, your key seems to be composed of a lot of characters. Keep in mind that the key name also occupies memory (1 byte per character), so try to keep it as short as possible. I've seen situations where the key name was larger than the value, which can happen if you store an empty array or an empty value.
The key structure. I guess from your example that the object you want to store is identified by the params (one being the item id maybe, or maybe filters for a search [...]). Start with a prefix. The prefix should be the name of the object class (or a simplified name depicting the object in general).
Most of the time, keys will have a prefix + identifier. In your example you have multiple identifiers. If one of them is a unique id, go with only prefix + id and it should be enough.
If the object is large and you don't always use all of it, then change your strategy to multiple-key storage: use one main key for storing the most common values, or for storing the list of the object's components, whose values are stored under separate keys. Make use of pipelining and get the whole object in one connection using one "multiple" query:
mainKey = prefix + objectId;
object = getFromCache(mainKey);
startCachePipeline();
foreach (object[properties] as property) {
    object->property = getFromCache(prefix + objectId + property);
}
endCachePipeline();
The structure for an example "Person" object would then be something like:
person_33 = array(
    properties => array(age, height, weight)
);
person_33_age = 28;
person_33_height = 6;
person_33_weight = 150;
Memcached uses memory most efficiently when the objects stored inside are of similar sizes. The bigger the size difference between objects (not talking about one lost big object or singular cases, although memory gets wasted then as well), the more memory is wasted.
Hope it helps!
