Can the RxJs operator groupBy leak memory? - functional-programming

I'm trying to wrap my head around the use cases for the RxJs operator groupBy and I'm concerned that in certain instances it may lead to a memory leak.
I'm familiar with groupBy in the traditional sense (synchronous list processing, for example). I'll write out a groupBy function for reference:
const groupBy = f => list =>
  list.reduce((grouped, item) => {
    const category = f(item);
    if (!(category in grouped)) {
      grouped[category] = [];
    }
    grouped[category].push(item);
    return grouped;
  }, {});

const oddsAndEvens = x => x % 2 === 0 ? 'EVEN' : 'ODD';

// compose is assumed to come from a library such as Ramda
compose(
  console.log,
  groupBy(oddsAndEvens)
)([1, 2, 3, 4, 5, 6, 7, 8])
// logs: { ODD: [ 1, 3, 5, 7 ], EVEN: [ 2, 4, 6, 8 ] }
Note that this is stateless in the broader scope. I'm assuming that RxJS does something similar, except that in place of EVEN and ODD it returns observables, and that it keeps track of the groups statefully in something that behaves like a set. Correct me if I'm wrong; the main point is that I think RxJS would have to maintain a stateful list of all groupings.
My question is: what happens if the number of grouping values (just EVEN and ODD in this example) is not finite? For example, a stream that gives you a unique identifier to maintain coherence over the life of the stream. If you were to group by this identifier, would RxJS's groupBy operator keep making more and more groups even though old identifiers will never be revisited again?

If your stream is infinite and your Key Selector can produce infinite groups, then - yes, you have a memory leak.
You can set a Duration Selector for every grouped observable. The Duration Selector is created for each group and signals on the expiration of the group.
RxJS 5+: pass the Duration Selector as groupBy's 3rd parameter.
RxJS 4: use the groupByUntil operator instead.
Here is an example of an infinite stream, where each of the grouped Observables is closed after 3 seconds.
Rx.Observable.interval(200)
  .groupBy(
    x => Math.floor(x / 10),
    x => x,
    x$ => Rx.Observable.timer(3000).finally(() => console.log(`closing group ${x$.key}`))
  )
  .mergeMap(x$ => x$.map(x => `group ${x$.key}: ${x}`))
  .subscribe(console.log);

<script src="https://cdnjs.cloudflare.com/ajax/libs/rxjs/5.5.8/Rx.js"></script>

My question is: what happens if the number of grouping values (just EVEN and ODD in this example) is not finite?
That can only happen in infinite streams (as there can't be more groups than values on the source stream). The answer is simple: you will keep creating new observables.
Each GroupedObservable lives exactly as long as the source (groups are completed when the source completes), as you can see in the docs.
Technically there is no memory leak here since you're actively observing an infinite observable. Once the source observable completes, so will all groups:
source$
  .takeUntil(stop$)
  .groupBy(…)
But in a less technical sense: grouping an infinite observable over a unique property without ever unsubscribing from the source won't do your memory usage a big favor, no.
If you were to group by this identifier, would RxJS's groupBy operator keep making more and more groups even though old identifiers will never be revisited again?
The thing to point out here is that there is nothing rxjs could do about this. It cannot know whether a group is done or whether it will receive another value at some point later on.
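If old identifiers are truly never revisited, one workaround (my sketch, not from either answer, using the same RxJS 5 syntax as above) is a duration selector based on inactivity: each group is closed once it has been silent for some window, here 5 seconds, so stale groups are released instead of accumulating. Note that if a "closed" key does show up again later, groupBy simply starts a fresh group for it.
Rx.Observable.interval(200)
  .groupBy(
    x => Math.floor(x / 10),
    x => x,
    // duration selector: emits (closing the group) after 5s without values
    group$ => group$.debounceTime(5000)
  )
  .mergeMap(group$ => group$.map(x => `group ${group$.key}: ${x}`))
  .subscribe(console.log);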

Treating single and multiple elements the same way ("transparent" map operator)

I'm working on a programming language that is supposed to be easy, intuitive, and succinct (yeah, I know, I'm the first person to ever come up with that goal ;-) ).
One of the features that I am considering for simplifying the use of container types is to make the methods of the container's element type available on the container type itself, basically as a shortcut for invoking a map(...) method. The idea is that working with many elements should not be different from working with a single element: I can apply add(5) to a single number or to a whole list of numbers, and I shouldn't have to write slightly different code for the "one" versus the "many" scenario.
For example (Java pseudo-code):
import static java.math.BigInteger.*; // ZERO, ONE, ...
...
// NOTE: BigInteger has an add(BigInteger) method
Stream<BigInteger> numbers = Stream.of(ZERO, ONE, TWO, TEN);
Stream<BigInteger> one2Three11 = numbers.add(ONE); // = 1, 2, 3, 11
// this would be equivalent to: numbers.map(ONE::add)
As far as I can tell, the concept would not only apply to "container" types (streams, lists, sets...), but more generally to all functor-like types that have a map method (e.g., optionals, state monads, etc.).
The implementation approach would probably be more along the lines of syntactic sugar offered by the compiler rather than manipulating the actual types (Stream<BigInteger> obviously does not extend BigInteger, and even if it did, the "map-add" method would have to return a Stream<BigInteger> instead of a BigInteger, which would be incompatible with most languages' inheritance rules).
I have two questions regarding such a proposed feature:
(1) What are the known caveats with offering such a feature? Method name collisions between the container type and the element type are one problem that comes to mind (e.g., when I call add on a List<BigInteger> do I want to add an element to the list or do I want to add a number to all elements of the list? The argument type should clarify this, but it's something that could get tricky)
(2) Are there any existing languages that offer such a feature, and if so, how is this implemented under the hood? I did some research, and while pretty much every modern language has something like a map operator, I could not find any languages where the one-versus-many distinction would be completely transparent (which leads me to believe that there is some technical difficulty that I'm overlooking here)
NOTE: I am looking at this in a purely functional context that does not support mutable data (not sure if that matters for answering these questions)
Do you come from an object-oriented background? That's my guess, because you're thinking of map as a method belonging to each different "type" as opposed to thinking of the various things that are of the type functor.
Compare how TypeScript would handle this if map were a property of each individual functor:
declare const someOption: Option<number>
someOption.map(val => val * 2) // Option<number>

declare const someEither: Either<string, number>
someEither.map(val => val * 2) // Either<string, number>
someEither.mapLeft(err => 'ERROR') // Either<'ERROR', number>
You could also create a constant representing each individual functor (option, array, identity, either, async/Promise/Task, etc.), where these constants have map as a method. Then have a standalone map function that takes one of those "functor constants", the mapping function, and the starting value, and returns the new wrapped value:
declare const option: {
  map: <A, B>(f: (a: A) => B) => (o: Option<A>) => Option<B>
}

declare const someOption: Option<number>
map(option)(val => val * 2)(someOption) // Option<number>

declare const either: {
  map: <E, A, B>(f: (a: A) => B) => (e: Either<E, A>) => Either<E, B>
}

declare const someEither: Either<string, number>
map(either)(val => val * 2)(someEither) // Either<string, number>
Essentially, you have a functor map that uses its first parameter to identify which type you're going to be mapping over, and then you pass in the mapping function and the data.
However, with proper functional languages like Haskell, you don't have to pass in that "functor constant" because the language resolves it for you. I'm not fluent enough in Haskell to write out the examples, unfortunately, but that's a really nice benefit that means even less boilerplate. It also lets you write a lot of your code in "point-free" style, so refactoring becomes much easier: if your language doesn't make you manually specify the type being mapped over, you can take full advantage of map/chain/bind/etc.
Consider you initially write your code that makes a bunch of API calls over HTTP. So you use a hypothetical async monad. If your language is smart enough to know which type is being used, you could have some code like
import { map as asyncMap } from 'async' // module name illustrative
declare const apiCall: Async<number>
asyncMap(n => n*2)(apiCall) // Async<number>
Now you change your API so it's reading a file and you make it synchronous instead:
import { map as syncMap } from 'sync' // module name illustrative
declare const apiCall: Sync<number>
syncMap(n => n*2)(apiCall)
Look how you have to change multiple pieces of the code. Now imagine you have hundreds of files and tens of thousands of lines of code.
With a point-free style, you could do
import { map } from 'functor'
declare const apiCall: Async<number>
map(n => n*2)(apiCall)
and refactor to
import { map } from 'functor'
declare const apiCall: Sync<number>
map(n => n*2)(apiCall)
If you had a centralized location of your API calls, that would be the only place you're changing anything. Everything else is smart enough to recognize which functor you're using and apply map correctly.
As far as your concern about name collisions: that's a concern that will exist no matter your language or design. But in functional programming, add would be a combinator passed as the mapping function into fmap (the Haskell term) / map (the term in lots of imperative/OO languages). The function that adds a new element to the tail end of an array/list might be called snoc ("cons", from "construct", spelled backwards: cons prepends an element to your list, snoc appends). You could also call it push or append.
As far as your one-vs-many issue: these are not the same type. One is a list/array type, and the other is an identity type. The underlying code handling them would be different, as they are different functors (one contains a single element, while the other contains multiple elements).
I suppose you could create a language that disallows single elements by automatically wrapping them as single-element lists and then just using the list map. But this seems like a lot of work to make two very different things look the same.
Instead, the approach where you wrap single elements as an identity and multiple elements as a list/array, and then array and identity each have their own under-the-hood handler for the functor method map, would probably be better.
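To make that concrete, here is a minimal TypeScript sketch (the names Identity, identity, and array are illustrative, not from any particular library) of "one" and "many" sharing the same map shape while remaining distinct functors:
type Identity<A> = { value: A }

const identity = {
  // wrap a single element
  of: <A>(value: A): Identity<A> => ({ value }),
  map: <A, B>(f: (a: A) => B) => (i: Identity<A>): Identity<B> => ({ value: f(i.value) })
}

const array = {
  map: <A, B>(f: (a: A) => B) => (as: A[]): B[] => as.map(f)
}

identity.map((n: number) => n + 5)(identity.of(1)) // { value: 6 }
array.map((n: number) => n + 5)([1, 2, 3])         // [6, 7, 8]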

Concurrent updates in DynamoDB, are there any guarantees?

In general, if I want to be sure what happens when several threads make concurrent updates to the same item in DynamoDB, I should use conditional updates (i.e.,"optimistic locking"). I know that. But I was wondering if there is any other case when I can be sure that concurrent updates to the same item survive.
For example, in Cassandra, making concurrent updates to different attributes of the same item is fine, and both updates will eventually be available to read. Is the same true in DynamoDB? Or is it possible that only one of these updates survive?
A very similar question is what happens if I add, concurrently, two different values to a set or list in the same item. Am I guaranteed that I'll eventually see both values when I read this set or list, or is it possible that one of the additions will mask out the other during some sort of DynamoDB "conflict resolution" protocol?
I see a version of my second question was already asked here in the past (Are DynamoDB "set" values CDRTs?), but the answer referred to a not-very-clear FAQ entry which doesn't exist any more. What I would most like to see as an answer to my question is official DynamoDB documentation that says how DynamoDB handles concurrent updates when neither "conditional updates" nor "transactions" are involved, and in particular what happens in the above two examples. Absent such official documentation, does anyone have any real-world experience with such concurrent updates?
I just had the same question and came across this thread. Given that there was no answer I decided to test it myself.
The answer, as far as I can observe, is that as long as you are updating different attributes, the updates will eventually all succeed. It does take a little longer the more updates I push to the item, so they appear to be written in sequence rather than in parallel.
I also tried updating a single List attribute in parallel, and this failed as expected: the resulting list, once all queries had completed, was broken and only had some of the entries pushed to it.
The test I ran was pretty rudimentary and I might be missing something but I believe the conclusion to be correct.
For completeness, here is the script I used (Node.js):
const aws = require('aws-sdk');
const ddb = new aws.DynamoDB.DocumentClient();

const key = process.argv[2];
const num = process.argv[3];

run().then(() => {
  console.log('Done');
});

async function run() {
  const p = [];
  for (let i = 0; i < num; i++) {
    p.push(ddb.update({
      TableName: 'concurrency-test',
      Key: {x: key},
      UpdateExpression: 'SET #k = :v',
      ExpressionAttributeValues: {
        ':v': `test-${i}`
      },
      ExpressionAttributeNames: {
        '#k': `k${i}`
      }
    }).promise());
  }
  await Promise.all(p);
  const response = await ddb.get({TableName: 'concurrency-test', Key: {x: key}}).promise();
  const item = response.Item;
  console.log('keys', Object.keys(item).length);
}
Run like so:
node index.js {key} {number}
node index.js myKey 10
Timings:
10 updates: ~1.5s
100 updates: ~2s
1000 updates: ~10-20s (fluctuated a lot)
Worth noting: the metrics show a lot of throttled events, but these are handled internally by the Node.js SDK using exponential backoff, so once the dust settled everything was written as expected.
Your post contains quite a lot of questions.
There's a note in DynamoDB's manual:
All write requests are applied in the order in which they were received.
I assume the client sends the requests in the order in which the calls were made.
That should resolve the question of whether there are any guarantees. If you update different properties of an item in several requests, each updating only those properties, it should end up in the expected state (the 'sum' of the distinct changes).
If you, on the other hand, update the whole object, the last one will win.
DynamoDB's SDKs also offer a version attribute (for example, @DynamoDbVersionAttribute in the Java SDK) which you can use for optimistic locking to manage concurrent writes of whole objects.
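At the API level this boils down to a conditional update. A minimal sketch with the Node.js DocumentClient (reusing the ddb client and table from the test script above; the version attribute and values are illustrative):
ddb.update({
  TableName: 'concurrency-test',
  Key: { x: key },
  UpdateExpression: 'SET payload = :payload, version = :next',
  // rejected with ConditionalCheckFailedException if someone else bumped the version first
  ConditionExpression: 'version = :current',
  ExpressionAttributeValues: {
    ':payload': 'new state',
    ':current': 3,
    ':next': 4
  }
}).promise();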
For scenarios like auctions or parallel tick counts (such as "likes"), DynamoDB offers atomic counters.
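An atomic counter is just an ADD in the update expression, evaluated server-side against the stored value, so concurrent increments are not lost. A sketch (table and attribute names are made up):
ddb.update({
  TableName: 'posts',
  Key: { id: postId },
  UpdateExpression: 'ADD likes :inc', // applied atomically on the server
  ExpressionAttributeValues: { ':inc': 1 }
}).promise();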
If you update a list, it depends on whether you use DynamoDB's list type (L) or just a property that the client serializes into a string (S). If you read a property, change it, and write it back, and do that in parallel, the result is subject to eventual consistency: what you read may not be the latest write. Applied to lists, several times over, you'll end up with some of the elements added and some not (or, better said, added but then overwritten).
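A hedged workaround for lists (my addition, not from the answer above): do the append inside the update expression with list_append, so there is no client-side read-modify-write to race. Since writes to a single item are applied in order, concurrent appends should both survive:
ddb.update({
  TableName: 'concurrency-test',
  Key: { x: key },
  // the append happens server-side against the currently stored list
  UpdateExpression: 'SET mylist = list_append(if_not_exists(mylist, :empty), :vals)',
  ExpressionAttributeValues: {
    ':vals': ['new-entry'],
    ':empty': []
  }
}).promise();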

Result functions of type x => x are unnecessary?

There might be a gap in my understanding of how Reselect works.
If I understand it correctly the code beneath:
const getTemplates = (state) => state.templates.templates;

export const getTemplatesSelector = createSelector(
  [getTemplates],
  templates => templates
);
could just as well (or better), without losing anything, be written as:
export const getTemplatesSelector = (state) => state.templates.templates;
The reason for this, if I understand it correctly, is that Reselect checks its first argument and, if it receives the exact same object as before, returns a cached output. It does not check for value equality.
Reselect will only run templates => templates when getTemplates returns a new object, that is when state.templates.templates references a new object.
The output in this case will be exactly the same as the input, so nothing is gained from caching by using Reselect.
One can only gain performance from Reselect's caching when the result function (in this case templates => templates) itself returns a new object, for example via .filter or .map or something similar. In this case, though, the object returned is the same as the input; no changes are made, and thus we gain nothing from Reselect's memoization.
Is there anything wrong with what I have written?
I mainly want to make sure that I correctly understand how Reselect works.
-- Edits --
I forgot to mention that what I wrote assumes that state.templates.templates is updated immutably, that is, without mutation.
Yes, in your case reselect won't bring any benefit, since getTemplates is evaluated on each call.
The 2 most important scenarios where reselect shines are:
stabilizing the output of a selector when the result function returns a new object (which you mentioned)
improving a selector's performance when the result function is computationally expensive (see the sketch below)
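For instance, memoization pays off in a derived selector like the following sketch (getFilter and the name field are made up for illustration):
const getFilter = (state) => state.filter; // hypothetical input selector

export const getVisibleTemplatesSelector = createSelector(
  [getTemplates, getFilter],
  // runs (and allocates a new array) only when templates or filter change;
  // otherwise the previously returned array is reused, keeping references stable
  (templates, filter) => templates.filter(t => t.name.includes(filter))
);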

FRP vs. State Machine w/ Lenses for Game Loop

I'm trying to understand the practical difference between an FRP graph and a state machine with lenses, specifically for something like a game loop where the entire state is re-drawn every tick.
Using javascript syntax, the following implementations would both essentially work:
Option 1: State Machine w/ Lenses
//Using Sanctuary and partial.lenses (or Ramda) primitives
//Each update takes the state, modifies it with a lens, and returns it
let state = initialValues;

eventSource.addEventListener(update, () => {
  state = S.pipe([
    updateCharacter,
    updateBackground,
  ])(state); // the first call has the initial settings
  render(state);
});
Option 2: FRP
//Using Sodium primitives
//It's possible this isn't the best way to structure it, feel free to advise
cCharacter = sUpdate.accum(initialCharacter, updateCharacter)
cBackground = sUpdate.accum(initialBackground, updateBackground)
cState = cCharacter.lift(cBackground, mergeGameObjects)
cState.listen(render)
I see that Option 1 allows any update to get or set data anywhere in the game state; however, all the cells/behaviors in Option 2 could be adjusted to be of type GameState, and then the same thing applies. If this were the case, then I'm really confused about the difference, since that would then just boil down to:
cGameState = sUpdate
  .accum(initialGameState, S.pipe(...updates))
  .listen(render)
And then they're really very equivalent...
Another way to achieve that goal would be to store all the Cells in some global reference, and then any other cell could sample them for reading. New updates could be propagated for communicating. That solution also feels quite similar to Option 1 at the end of the day.
Is there a way to structure the FRP graph in such a way that it offers clear advantages over the event-driven state machine, in this scenario?
I'm not quite sure what your question is, also because you keep changing the second example in your explanatory text.
In any case, the key benefit of the FRP approach — as I see it — is the following: The game state depends on many things, but they are all listed explicitly on the right-hand side of the definition of cGameState.
In contrast, in the imperative style, you have a global variable state which may or may not be changed by code that is not shown in the snippet you just presented. For all I know, the next line could be
eventSource2.addEventListener(update, () => { state = state + 1; })
and the game state suddenly depends on a second event source, a fact that is not apparent from the snippet you showed. This cannot happen in the FRP example: All dependencies of cGameState are explicit on the right-hand side. (They may be very complicated, sure, but at least they are explicit.)
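The flip side is that a new dependency has to be wired in explicitly at the definition site. In Sodium-flavored pseudocode (a sketch; sOtherUpdate is made up), reacting to a second event source would look like:
// the second source is now visible right where cGameState is defined
sAllUpdates = sUpdate.orElse(sOtherUpdate)
cGameState = sAllUpdates.accum(initialGameState, S.pipe(...updates))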

How to do a simple notification system with redux-observable?

I'm trying to do a simple notification system with redux-observable. I'm new to rxjs so I'm having a hard time doing it.
What I'm trying to do is:
Dispatch an intent to display a notification
Detect the intent with an Epic
Dispatch the action that inserts the new notification
Wait 3 seconds
Dispatch another action that deletes the old notification
This is my Epic:
import { NOTIFICATION_DISPLAY_REQUESTED } from '../actions/actionTypes';
import { displayNotification, hideNotification } from '../actions/notifications';
export const requestNotificationEpic = (action$, store) =>
  action$.ofType(NOTIFICATION_DISPLAY_REQUESTED)
    .mapTo(displayNotification(action$.notification))
    .delay(3000)
    .mapTo(hideNotification(action$.notification));
What really happens is that NOTIFICATION_DISPLAY_REQUESTED is dispatched, and 3 seconds later, hideNotification is dispatched. displayNotification never happens.
I could just dispatch displayNotification from the view, delay 3 seconds and then dispatch hideNotification. But later I want to delete the last notification before adding a new one if there are more than 3 active notifications. That's why I dispatch displayNotification manually from inside the epic in this simple case.
So, how do I achieve this? Sorry if this is super simple question, I'm just new to all this and need some help.
Note: I know redux-saga exists; it's just that redux-observable made more sense to me.
If you're new to RxJS, this isn't so simple :)
Couple things up front:
Operator chains
An Epic is a function which takes a stream of actions and returns a stream of actions. Actions in, actions out. The functions you chain to transform matching actions are called operators. Chaining operators is a lot like chaining garden hoses or power cords--the values flow from one to the other. It's also very similar to just chaining regular functions like third(second(first())) except that Observables have an additional dimension of time, so the operators are applied on each value that flows through them.
So if you say stream.mapTo(x).mapTo(y), the fact that you first mapped to x is made meaningless when you .mapTo(y), since mapTo ignores the source's values and instead just maps to the one provided.
If instead you used map, it might become more apparent:
stream.map(value => 'a message').map(message => message + '!!!')
Just to be clear, this chaining of operators is RxJS, not specific to redux-observable, which is more a pattern of using idiomatic RxJS with a tiny amount of glue into redux.
action$ is an Observable (technically ActionsObservable)
The argument action$ is an Observable of actions, not an actual action itself. So action$.notification will be undefined. That's one of the reasons people commonly use the dollar sign suffix, to denote it is a stream of those things.
Consider only having 2 actions, not 3
Your example shows you using three actions: NOTIFICATION_DISPLAY_REQUESTED and two others to show and hide the notifications. In this case, the original intent action is basically the same as displayNotification(), because the latter would be dispatched synchronously after the former.
Consider only having two actions: one for "show this notification" and another for "hide this notification". While this isn't a rule, it can often simplify your code and increase performance, since your reducers don't have to run twice.
This is what it would look like in your case (name things however you'd like, of course):
export const displayNotificationEpic = (action$, store) =>
  action$.ofType(DISPLAY_NOTIFICATION)
    .delay(3000)
    .map(action => hideNotification(action.notification));

// UI code kicks it off somehow...
store.dispatch(displayNotification('hello world'));
Your reducers would then receive DISPLAY_NOTIFICATION and, 3 seconds later, HIDE_NOTIFICATION (or whatever you call them).
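For completeness, a minimal reducer sketch (the state shape is my assumption, not from the question):
const notifications = (state = [], action) => {
  switch (action.type) {
    case DISPLAY_NOTIFICATION:
      return [...state, action.notification];
    case HIDE_NOTIFICATION:
      return state.filter(n => n !== action.notification);
    default:
      return state;
  }
};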
Also, it's crucial to remember, from the redux-observable docs:
REMEMBER: Epics run alongside the normal Redux dispatch channel, after the reducers have already received them. When you map an action to another one, you are not preventing the original action from reaching the reducers; that action has already been through them!
Solution
Although I suggest using only two actions in this case (see above), I do want to directly answer your question! Since RxJS is a very flexible library there are many ways of accomplishing what you're asking for.
Here a couple:
One epic, using concat
The concat operator is used to subscribe to all the provided Observables one at a time, moving on to the next one only when the current one completes. It "drains" each Observable, one at a time.
If we wanted to create a stream that emits one action, waits 3000 ms then emits a different one, you could do this:
Observable.of(displayNotification(action.notification))
  .concat(
    Observable.of(hideNotification(action.notification))
      .delay(3000)
  )
Or this:
Observable.concat(
  Observable.of(displayNotification(action.notification)),
  Observable.of(hideNotification(action.notification))
    .delay(3000)
)
In this case, they have the exact same effect. The key is that we are applying the delay to a different Observable than the first, because we only want to delay the second action. We isolate them.
To use inside your epic, you'll need a merging strategy operator like mergeMap, switchMap, etc. These are very important to learn well as they're used very often in RxJS.
export const requestNotificationEpic = (action$, store) =>
  action$.ofType(NOTIFICATION_DISPLAY_REQUESTED)
    .mergeMap(action =>
      Observable.concat(
        Observable.of(displayNotification(action.notification)),
        Observable.of(hideNotification(action.notification))
          .delay(3000)
      )
    );
Two different epics
Another way of doing this would be to create two different epics: one responsible for mapping the first action to the second, the other for waiting 3 seconds before hiding.
export const requestNotificationEpic = (action$, store) =>
  action$.ofType(NOTIFICATION_DISPLAY_REQUESTED)
    .map(action => displayNotification(action.notification));

export const displayNotificationEpic = (action$, store) =>
  action$.ofType(DISPLAY_NOTIFICATION)
    .delay(3000)
    .map(action => hideNotification(action.notification));
This works because epics can match against all actions, even ones that other epics have emitted! This allows clean separation, composition, and testing.
This example (to me) better demonstrates that having two intent actions is unnecessary here, but there may be requirements you didn't provide that justify it.
If this was very confusing, I would recommend diving deep into RxJS first. Tutorials, videos, workshops, etc. This is only skimming the surface, it gets much much deeper, but the payout is great for most people who stick with it.
