Recently I tackled a problem which involved updating a large number of key values.
Naturally, I considered using a Map, with operations like Map.put/3.
However, this seemed insufficient, given the immutable nature of data structures in Elixir:
iex> m = Map.put(%{}, :a, 1)
%{a: 1}
iex> Map.put(m, :b, 2)
%{a: 1, b: 2}
iex> m
%{a: 1}
I then solved the problem by holding the state of the Map in a GenServer, and updating it via handle_cast/2.
Generally, is this the right approach, or was this too much here?
I then solved the problem by holding the state of the Map in a GenServer [...]
Generally, is this the right approach, or was this too much here?
It heavily depends on your goal. There are many different ways to store the state. Rebinding variables like:
m = Map.put(%{}, :a, 1)
#⇒ %{a: 1}
m = Map.put(m, :b, 2)
#⇒ %{a: 1, b: 2}
does not persist anything. It binds the local variable m to the value of the right-hand side, and as soon as control flow leaves the scope, that value becomes eligible for garbage collection. If you only need the aforementioned map within a single scope, a GenServer (or any other state holder) is overkill.
OTOH, if you need to store the state for a long time and share it between different scopes (e.g. between different processes), GenServer is the simplest way to accomplish that. In Elixir we have the Agent module to reduce the boilerplate for a GenServer used as simple in-memory storage, but my advice would be to always use GenServer: sooner or later Agent will become too limiting for your purposes.
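For illustration, here is a minimal sketch of a GenServer-backed key-value store (the module name KV and its client API are invented for this example):

defmodule KV do
  use GenServer

  # Client API
  def start_link(_opts \\ []), do: GenServer.start_link(__MODULE__, %{}, name: __MODULE__)
  def put(key, value), do: GenServer.cast(__MODULE__, {:put, key, value})
  def get(key), do: GenServer.call(__MODULE__, {:get, key})

  # Server callbacks
  @impl true
  def init(state), do: {:ok, state}

  @impl true
  def handle_cast({:put, key, value}, state), do: {:noreply, Map.put(state, key, value)}

  @impl true
  def handle_call({:get, key}, _from, state), do: {:reply, Map.get(state, key), state}
end

With that in place, KV.put(:a, 1) followed by KV.get(:a) returns 1 from any process on the node.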
Also, one might use the :ets module to keep an in-memory key-value store shared between processes.
:dets is its disk-based sibling, a way to preserve the state across restarts.
And, finally, mnesia is the OTP-native approach to sharing the state across both restarts and different nodes (in a distributed environment).
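A quick sketch of the :ets route from Elixir (the table name :my_kv is arbitrary):

# A named, public table is visible to every process on the node.
:ets.new(:my_kv, [:set, :named_table, :public])
:ets.insert(:my_kv, {:a, 1})
:ets.lookup(:my_kv, :a)
#⇒ [a: 1]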
Your first approach was right; you just got one thing wrong.
You should rebind the variable when you update the map, like here:
iex> m = Map.put(%{}, :a, 1)
%{a: 1}
iex> m = Map.put(m, :b, 2)
%{a: 1, b: 2}
iex> m
%{a: 1, b: 2}
But you gotta understand here that it doesn't mutate the variable: it creates a new map and rebinds it to the same variable.
Now, this approach is the simplest one, but you'd have to pass this map to every function that uses it. As an alternative, you may consider using the Agent module; all the info on what it is and what it is used for can be found in its docs.
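For completeness, a minimal Agent version of the same idea (just a sketch; the Agent docs cover the full API):

{:ok, pid} = Agent.start_link(fn -> %{} end)
Agent.update(pid, &Map.put(&1, :a, 1))
Agent.update(pid, &Map.put(&1, :b, 2))
Agent.get(pid, & &1)
#⇒ %{a: 1, b: 2}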
Related
I'm trying to wrap my head around the use cases for the RxJs operator groupBy and I'm concerned that in certain instances it may lead to a memory leak.
I'm familiar with groupBy in the traditional sense (synchronous list processing, for example). I'm going to write out a groupBy function to make reference to:
const groupBy = f => list =>
list.reduce((grouped, item) => {
const category = f(item);
if (!(category in grouped)) {
grouped[category] = [];
}
grouped[category].push(item);
return grouped;
}, {});
const oddsAndEvens = x => x % 2 === 0 ? 'EVEN' : 'ODD';

// Minimal right-to-left compose, so the snippet is self-contained.
const compose = (...fns) => x => fns.reduceRight((acc, f) => f(acc), x);

compose(
  console.log,
  groupBy(oddsAndEvens)
)([1, 2, 3, 4, 5, 6, 7, 8]);
// logs: { ODD: [ 1, 3, 5, 7 ], EVEN: [ 2, 4, 6, 8 ] }
Note that this is stateless in the broader scope. I'm assuming that RxJs does something similar to this, where, in place of EVEN and ODD, grouped observables would be returned, and that it keeps track of the groups statefully in something that behaves like a set. Correct me if I'm wrong, but the main point is that I think RxJs would have to maintain a stateful list of all groupings.
My question is: what happens if the number of grouping values (just EVEN and ODD in this example) is not finite? For example, a stream that gives you a unique identifier to maintain coherence over the life of the stream. If you were to group by this identifier, would RxJs's groupBy operator keep making more and more groups even though old identifiers will never be revisited again?
If your stream is infinite and your Key Selector can produce infinite groups, then yes, you have a memory leak.
You can set a Duration Selector for every grouped observable. The Duration Selector is created for each group and signals on the expiration of the group.
rxjs 5+: pass the Duration Selector as groupBy's 3rd parameter.
rxjs 4: use the groupByUntil operator instead.
Here is an example of an infinite stream, where each of the grouped Observables is closed after 3 seconds.
Rx.Observable.interval(200)
.groupBy(
x => Math.floor(x / 10),
x => x,
x$ => Rx.Observable.timer(3000).finally(() => console.log(`closing group ${x$.key}`))
)
.mergeMap(x$ => x$.map(x => `group ${x$.key}: ${x}`))
.subscribe(console.log)
<script src="https://cdnjs.cloudflare.com/ajax/libs/rxjs/5.5.8/Rx.js"></script>
My question is, what happens if the number of grouping values (just EVEN and ODD in this example) are not finite?
That can only happen in infinite streams (as there can't be more groups than values on the source stream). The answer is simple: you will keep creating new observables.
Each GroupedObservable lives exactly as long as the source (groups are completed when the source completes), as you can see in the docs.
Technically there is no memory leak here since you're actively observing an infinite observable. Once the source observable completes, so will all groups:
source$
.takeUntil(stop$)
.groupBy(…)
But in a less technical sense: grouping an infinite observable over a unique property without ever unsubscribing from the source won't do your memory usage a big favor, no.
If you were to group by this identifier, would RxJs's groupBy operator keep making more and more groups even though old identifiers will never be revisited again?
The thing to point out here is that there is nothing rxjs could do about this. It cannot know whether a group is done or whether it will receive another value at some point later on.
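If "done" can be defined as inactivity, the duration selector from the first answer is the usual fix. Here is a sketch that closes each group after five seconds of silence; source$ and the id key are assumptions for the example, and RxJS 5 prototype operators are used as in the answer above:

source$
  .groupBy(
    x => x.id,                    // potentially unbounded key space
    x => x,
    x$ => x$.debounceTime(5000)   // group completes after 5s without values
  )
  .mergeMap(x$ => x$.map(x => `group ${x$.key}: ${x.id}`))
  .subscribe(console.log);

A later value with the same id simply starts a fresh group, so memory stays bounded by the number of recently active keys.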
Isn't there already Send/Sync? The official documentation only mentions that they have something to do with data races.
Because of memory safety.
Consider this example (disregard the fact that this would result in an infinite loop if it compiled):
let mut list = vec![1, 2, 3];
for item in &list {
list.push(*item + 1);
println!("item = {}", item);
}
item is a reference to the memory held by list; it is of type &i32. You may read the value of that element by dereferencing it (*item).
What would happen to the reference in item if the push call were to reallocate the vector's memory to a different address?
The reference would then contain the old address. Any attempt to access it would involve reading some undefined chunk of memory. This violates a core Rust safety principle.
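If you do need the effect of the rejected loop, one safe pattern is to end the immutable borrow before mutating, e.g. by collecting the new elements first (a sketch, not the only option):

fn main() {
    let mut list = vec![1, 2, 3];

    // The immutable borrow ends when `collect` finishes,
    // so pushing afterwards is allowed.
    let additions: Vec<i32> = list.iter().map(|item| *item + 1).collect();
    list.extend(additions);

    println!("{:?}", list); // [1, 2, 3, 2, 3, 4]
}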
Isn't there already Send/Sync
Send and Sync are concerned with multiple threads. As you can see from the example above, you don't need threads to potentially produce invalid references.
I often find myself getting an error like this:
mismatched types: expected `collections::vec::Vec<u8>`, found `&[u8]` (expected struct collections::vec::Vec, found &-ptr)
As far as I know, one is mutable and one isn't, but I've no idea how to go between the types, i.e. take a &[u8] and make it a Vec<u8> or vice versa.
What's the difference between them? Is it the same as String and &str?
Is it the same as String and &str?
Yes. A Vec<T> is the owned variant of a &[T]. &[T] is a reference to a set of Ts laid out sequentially in memory (a.k.a. a slice). It represents a pointer to the beginning of the items and the number of items. A reference refers to something that you don't own, so the set of actions you can do with it are limited. There is a mutable variant (&mut [T]), which allows you to mutate the items in the slice. You can't change how many are in the slice though. Said another way, you can't mutate the slice itself.
take a &[u8] and make it a Vec
For this specific case:
let s: &[u8] = &[1, 2, 3]; // set this somewhere
let v: Vec<u8> = Vec::from(s); // equivalently: s.to_vec()
However, this has to allocate heap memory and copy each value into that memory. It's more expensive than going the other way, but might be the correct thing for a given situation.
or vice versa
let v = vec![1u8, 2, 3];
let s = v.as_slice();
This is basically "free" as v still owns the data, we are just handing out a reference to it. That's why many APIs try to take slices when it makes sense.
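To illustrate that advice, here is a sketch of a function that takes a slice; thanks to deref coercion it accepts vectors and borrowed arrays alike (the function name is invented):

// Accepts any contiguous sequence of u8 without taking ownership.
fn sum(bytes: &[u8]) -> u32 {
    bytes.iter().map(|&b| u32::from(b)).sum()
}

fn main() {
    let v = vec![1u8, 2, 3];
    println!("{}", sum(&v));           // &Vec<u8> coerces to &[u8]
    println!("{}", sum(&[4u8, 5, 6])); // a borrowed array works too
}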
The equivalent in a procedural language (e.g. Java) would be local variables (or instance variables) declared outside of a loop, which the loop body reads and updates. How can I do that in Erlang?
You pass the state as parameters in the recursive call. Example loop that receives N Msgs and returns them as a list:
loop(N) ->
    loop(N, 0, []).

loop(N, Count, Msgs) when Count < N ->
    receive
        Msg -> loop(N, Count + 1, [Msg | Msgs])
    end;
loop(_, _, Msgs) ->
    lists:reverse(Msgs).
I hope this wasn't a homework question, but I'm confused by the "two ways" in the subject.
The most proper way, of course, is to extend the recursive function definition with at least one extra argument to carry all the needed data. But if you can't do that, and you are sure that only one instance of such a recursive cycle will be in effect at a time (or that they will be properly stacked), and that the function invocations happen in the same process, then the process dictionary will help you: see put/2 and get/1 in the erlang module, and invent unique terms to be used as keys. But this is definitely a kind of hack.
One could invent more hacks, but all of them will be ugly. :)
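To make the hack concrete, here is a small sketch using the process dictionary in place of an accumulator argument (the key name is arbitrary; the extra-argument version above remains the proper way):

%% Sums the integers N..1, stashing the running total in the
%% process dictionary instead of threading it through arguments.
sum_down(N) ->
    put(sum_down_total, 0),
    do_sum(N),
    get(sum_down_total).

do_sum(0) -> ok;
do_sum(N) ->
    put(sum_down_total, get(sum_down_total) + N),
    do_sum(N - 1).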
When ranging over a map m that has concurrent writers, including ones that could delete from the map, is it not thread-safe to do this?
for k, v := range m { ... }
I'm thinking to be thread-safe I need to prevent other possible writers from changing the value v while I'm reading it, and (when using a mutex and because locking is a separate step) verify that the key k is still in the map. For example:
for k := range m {
m.mutex.RLock()
v, found := m[k]
m.mutex.RUnlock()
if found {
... // process v
}
}
(Assume that other writers are write-locking m before changing v.) Is there a better way?
Edit to add: I'm aware that maps aren't thread-safe. However, they are thread-safe in one way, according to the Go spec at http://golang.org/ref/spec#For_statements (search for "If map entries that have not yet been reached are deleted during iteration"). This page indicates that code using range needn't be concerned about other goroutines inserting into or deleting from the map. My question is: does this thread-safety extend to v, such that I can get v for reading only, using just for k, v := range m and no other thread-safe mechanism? I created some test code to try to force an app crash to prove that it doesn't work, but even running blatantly thread-unsafe code (lots of goroutines furiously modifying the same map value with no locking mechanism in place) I couldn't get Go to crash!
No, map operations are not atomic/thread-safe, as the commenter on your question pointed out; see the Go FAQ "Why are map operations not defined to be atomic?".
To make your access safe, you are encouraged to use Go's channels as a means of handing out a resource access token. The channel is used simply to pass around the token: anyone wanting to modify the map requests the token from the channel (blocking or non-blocking) and passes it back to the channel when done working with the map.
Iterating over and working with the map should be sufficiently simple and short, so you should be ok using just one token for full access.
If that is not the case, and you use the map for more complex stuff or a resource consumer needs more time with it, you may implement separate reader and writer access tokens. Then, at any given time, only one writer can access the map, but when no writer is active the token is passed to any number of readers, who will not modify the map (and can thus read simultaneously).
For an introduction to channels, see the Effective Go docs on channels.
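A minimal sketch of the single-token idea, using a one-slot channel as the token (the names are illustrative):

package main

import "fmt"

func main() {
    m := map[string]int{"a": 1}

    // A one-slot channel holds the access token: whoever receives
    // the token may touch the map; everyone else blocks until the
    // token is sent back.
    token := make(chan struct{}, 1)
    token <- struct{}{}

    done := make(chan struct{})
    go func() {
        <-token             // acquire
        m["b"] = 2          // safe: we hold the token
        token <- struct{}{} // release
        close(done)
    }()

    <-token // acquire before iterating
    for k, v := range m {
        fmt.Println(k, v)
    }
    token <- struct{}{} // release
    <-done
}

Whether the background write lands before or after the iteration depends on who grabs the token first, but either way the map is never accessed concurrently.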
You could use concurrent-map to handle the concurrency pains for you.
// import cmap "github.com/orcaman/concurrent-map"

// Create a new map (`map` is a Go keyword, so pick another name).
m := cmap.New()

// Add an item to the map: stores "bar" under the key "foo".
m.Set("foo", "bar")

// Retrieve an item from the map.
tmp, ok := m.Get("foo")

// Check whether the item exists.
if ok {
    // The map stores items as interface{}, hence the cast.
    bar := tmp.(string)
    _ = bar
}

// Remove the item under the key "foo".
m.Remove("foo")