Crossfilter temporary groups efficiency

I'm integrating Crossfilter with Vue and wonder about efficiency.
Whenever the state of the UI updates, I'm doing calculations using code like this one, to get various metrics from the dataset:
const originKeys = this.dimensions.origin.group().all().map(value => value.key)
At this point I realised that the group being created by the call is stored in the "registry" of the dimension every time the UI updates, but since I'm not storing the reference to the group, it's effectively "lost".
Then, whenever the data set updates, all of these "lost" groups do their calculations, even though the results are never used.
This is what I assumed, correct me if I'm wrong, please.
The calculations change according to the UI, so it's pointless to store references to the groups.
To overcome this issue, I created a simple helper function that creates a temporary group and disposes of it right after the calculation is done.
function temporaryGroup(dimension, calculator) {
  const group = dimension.group()
  const result = calculator(group, dimension)
  group.dispose()
  return result
}
Using it like this:
const originKeys = temporaryGroup(this.dimensions.origin, (group) => {
  return group.all().map(value => value.key)
})
The question is, is there a better (more efficient) way for temporary calculations like the one above?

The answer is no. Your stated assumptions are correct. That's not efficient, and there is no more efficient way of using this library for temporary groups.
Crossfilter is designed for quick interaction between a fixed set of dimensions and groups.
It's stateful by design, adding and removing just the specific rows from each group that have changed based on the changes to filters.
This matters especially for range-based filters: if you drag a brush interactively, a small segment of the domain is added and a small segment removed at each mousemove.
There are also array indices created to track the mapping from data -> key -> bin. One array of keys and one integer array of indices for the dimension, and one integer array of bin indices for the group. This makes updates fast, but it may be inefficient for temporary groups.
If you don't have a consistent set of charts, it would be more efficient in principle to do the calculation yourself, using typed arrays and e.g. d3-array.
On the other hand, if you are doing this to comply with Vue's data model, you might see if Vue has a concept similar to React's "context", where shared state is associated with a parent component. This is how e.g. react-dc-js holds onto crossfilter objects.
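For the example in the question, the distinct keys of a dimension, the do-it-yourself route suggested above is cheap: a single pass over the currently filtered records with a Set, with no group registry involved and nothing to dispose. A minimal sketch, assuming plain row objects and an accessor like the one you would pass to .dimension():

```javascript
// Distinct keys of an accessor over an array of records, computed directly
// instead of via a throwaway crossfilter group. One O(n) pass.
function distinctKeys(records, accessor) {
  const keys = new Set()
  for (const record of records) {
    keys.add(accessor(record))
  }
  return Array.from(keys).sort()
}

// Usage against the current filter state might look like:
// const originKeys = distinctKeys(cf.allFiltered(), d => d.origin)
```

This trades crossfilter's incremental bookkeeping for a full recomputation, which is exactly the right trade when the computation is ad hoc and would otherwise leave orphaned groups behind.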

Related

Doctrine innerOrder[int] column implementation for manual sort order control

I have two tables in my app's schema: Event and Game (one-to-many). Games are ordered by datetime field. But sometimes there can be games played in parallel (same datetime), but the user should be able to set their relative order.
I've added an innerOrder (int) field with a simple idea: it should get an autogenerated value that can be changed on reorder (swapped with a neighboring record). But I can't achieve this behavior with Doctrine: GeneratedValue can't be used twice / on a separate field (it just doesn't work that way).
On my next attempt I tried to do it without autogeneration. But I need some initial value on insert, for example MAX(innerOrder) (better yet, set automatically, of course).
I can't do it in prePersist or similar lifecycle methods, since I don't have access to the repository class there. And I don't want to do it with an additional query in the controller, not only because of the extra code I'd have to insert each time (get the max value from the table, set the inner order), but because I'm afraid of possible conflicts (when two users are adding Games in parallel).
How should I achieve expected behavior (maybe, I'm totally wrong here)?
There is no need to achieve this behavior with Doctrine; you can manage this value from the aggregate root. I.e. when you attach a Game to the Event, you can set its innerOrder value to the maximum of the currently attached games + 1. Conflicts can easily be avoided with different kinds of locks on the Event you edit (i.e. fetching it with a Doctrine write lock, or some kind of shared lock or mutex; see symfony/lock).
After that you can configure the relation to be fetched in the given order, as described in this documentation:
https://www.doctrine-project.org/projects/doctrine-orm/en/2.6/tutorials/ordered-associations.html
My two cents: when creating/modifying an event, you can check whether there's already one at the same time (the default innerOrder is 0, or even the count(*) of events at the same time). You can issue a warning when there's another event, ask for the order, or take the user to a form where they can manually reassign the order of the events.
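The aggregate-root idea is language-agnostic; here is a rough in-memory sketch (in JavaScript rather than PHP, with hypothetical Event/Game shapes) of the "max of currently attached games + 1" assignment and the neighbor swap:

```javascript
// Hypothetical sketch of the aggregate-root approach: the Event hands out
// innerOrder values itself, so no database autogeneration is needed.
class Game {
  constructor(name, datetime) {
    this.name = name
    this.datetime = datetime
    this.innerOrder = 0
  }
}

class Event {
  constructor() {
    this.games = []
  }

  // Assign max(innerOrder) + 1 among games at the same datetime.
  attachGame(game) {
    const siblings = this.games.filter(g => g.datetime === game.datetime)
    const max = siblings.reduce((m, g) => Math.max(m, g.innerOrder), 0)
    game.innerOrder = max + 1
    this.games.push(game)
  }

  // Reordering is just swapping innerOrder with a neighbor record.
  swapOrder(a, b) {
    ;[a.innerOrder, b.innerOrder] = [b.innerOrder, a.innerOrder]
  }
}
```

In Doctrine terms, attachGame would live on the Event entity, and the lock taken on the Event before calling it is what prevents two parallel inserts from computing the same maximum.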

Creating a leaderboard scenario on Cloud Firestore

I'm playing with the recently introduced Cloud Firestore and I was wondering if it's possible to get a document's index in a collection to create a leaderboard.
For example, let's say I want to retrieve a user's position in the leaderboard. I'd do something like this:
db.collection('leaderboard').doc('userId')
There, I'd have a rank field to display that user's position in the leaderboard. However, that means I'd have to create a Cloud Function to recalculate users' positions every time a score changes. Considering Firestore charges by the number of CRUD operations, that could get really expensive.
Is there a better way to do it? Let's say define a query field (i.e. score), then get that document's index in the collection's array?
You can't find the position of a document within a query result except by actually running the query and reading all of them. This makes sorting by score unworkable for just determining rank.
You could reduce the number of writes for updating ranks by grouping ranks into blocks of 10 or 100 and then only updating the rank group when a player moves between them. Absolute rank could be determined by sorting by score within the group.
If you stored these rank groups as single documents this could result in significant savings.
Note that as you increase the number of writers to a document this increases contention so you'd need to balance group size against expected score submission rates to get the result you want.
The techniques in aggregation queries could be adapted to this problem but they essentially have the same cost trade-off you described in your question.
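The rank-group idea above is mostly bookkeeping: each group document covers a contiguous block of ranks, and a user's absolute rank is the group's starting rank plus their position within the group. A hypothetical sketch of that lookup (the document shape is made up for illustration, not a Firestore API):

```javascript
// Hypothetical rank-group document: holds a block of scores plus the
// absolute rank at which the block starts. A player's rank is the
// group's startRank plus the number of higher scores inside the group.
function rankInGroup(group, userId) {
  const me = group.scores.find(s => s.userId === userId)
  if (!me) return null
  const higher = group.scores.filter(s => s.score > me.score).length
  return group.startRank + higher
}
```

Reads stay cheap (one document per lookup), and score submissions only rewrite the one group document they land in, which is where the contention trade-off mentioned above comes from.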
You can filter it in the database console. Top-right of the centre column, next to the 3 vertical dots.
You can also use this GitHub repo to find the query you need for your code: https://github.com/firebase/snippets-web/blob/f61ee63d407f4a71ef9e677284c292b0a083d723/firestore/test.firestore.js#L928-L928
If you wanted to rank users based on their 'highScores', for instance, you could use something like the following for the top 10 (and create an ordered list or similar, to represent the user's rank):
db.collection("leaderboard")
  .orderBy("highScore", "desc").limit(10) // this is the line you add to filter results
  .get()
  .then((snapshot) => {
    snapshot.docs.forEach((doc) => {
      console.log(doc.data());
      renderLeaderboard(doc);
    });
  });

Sync related data using angularfire/collection

I want to use angularFireCollection to keep a one-way sync with a list of data. The structure is a list of 'things' with various properties (e.g. 'Likes'), and users who each hold a subset of 'thing' keys (e.g. -jsdzsdrestofkey: true).
Using angularFireCollection (or Firebase's native 'on'), I can sync up all the things a particular user has... I can also grab (using Firebase's native 'once') each thing's properties to display.
In angular, however, I need to use $apply() to inject the property data into scope for each item in the user's 'thing' list. To keep things in sync, I suppose I can use firebase's on change event... But this all requires me to create new references for each thing in a user's list.
What is the best way to approach grabbing relational data in firebase, while keeping both the list and the relational data in sync?
Thanks!
Irfaan
If I understand correctly, it sounds like you should use FirebaseIndex and feed the index directly into an angularFireCollection. Then you wouldn't need to use $apply since the thing data will already be in the $scope, and everything will stay synced:
var index = new FirebaseIndex(fb.child('users/789/thing_list'), fb.child('things'));
$scope.things = angularFireCollection(index);
// $scope.things will contain the user's things with the associated thing data

How to handle duplicates in disconnected object graph?

I'm having a problem updating a disconnected POCO model in an ASP.NET application.
Lets say we have the following model:
Users
Districts
Orders
A user can be responsible for 0 or more districts, an order belongs to a district and a user can be the owner of an order.
When the user logs in, the user and the related districts are loaded. Later the user loads an order and sets himself as the owner of the order. The user (and related districts) and the order (and related district) are loaded in two different calls with two different DbContexts. When I save the order after the user has assigned himself to it, I get an exception saying that AcceptChanges cannot continue because the object's key values conflict with another object.
Which is not strange, since the same district can appear both in the list of districts the user is responsible and on the order.
I've searched high and low for a solution to this problem, but the answers I have found seems to be either:
1. Don't load the related entities of one of the objects; in my case that would be the districts of the user.
2. Don't assign the user to the order by using the objects; just set the foreign key id on the order object.
3. Use NHibernate, since it apparently handles this.
I tried 1 and that works, but I feel this is wrong because I then either have to load the user without its districts before relating it to the order, or do a shallow clone. This is fine for the simple case here, but the problem is that in my real case a district might appear several more times in the graph. It also seems pointless: I have the objects, so why not let me connect them and update the graph? The reason I need the entire graph for the order is that I need to display all the information to the user. So since I've got all the objects, why should I need to either reload or shallow clone them to get this to work?
I tried using STE but I ran in to the same problem, since I cannot attach an object to a graph loaded by another context. So I am back at square 1.
I would assume that this is a common problem in anything but tutorial code. Yet, I cannot seem to find any good solution to this. Which makes me think that either I do not under any circumstance understand using POCOs/EF or I suck at using google to find an answer to this problem.
I've bought both of the "Programming Entity Framework" books from O'Reilly by Julia Lerman but cannot seem to find anything to solve my problem in those books either.
Is there anyone out there who can shed some light on how to handle graphs where some objects might be repeated and not necessarily loaded from the same context.
The reason why EF does not allow to have two entities with the same key being attached to a context is that EF cannot know which one is "valid". For example: You could have two District objects in your object graph, both with a key Id = 1, but the two have different Name property values. Which one represents the data that have to be saved to the database?
Now, you could say that it doesn't matter if both objects haven't changed, you just want to attach them to a context in state Unchanged, maybe to establish a relationship to another entity. It is true in this special case that duplicates might not be a problem. But I think, it is simply too complex to deal with all situations and different states the objects could have to decide if duplicate objects are causing ambiguities or not.
Anyway, EF implements a strict identity mapping between object reference identity and key property values, and simply doesn't allow more than one entity with a given key to be attached to a context.
I don't think there is a general solution for this kind of problem. I can only add a few more ideas in addition to the solutions in your question:
Attach the User to the context you are loading the order in:
context.Users.Attach(user); // attaches user AND user.Districts
var order = context.Orders.Include("Districts")
    .Single(o => o.Id == someOrderId);
// Because the user's Districts are attached, no District with the same key
// will be loaded again; EF will use the already attached Districts to
// populate the order.Districts collection, thus avoiding duplicate Districts.
order.Owner = user;
context.SaveChanges();
// It should work without an exception.
Attach only the entities to the context you need in order to perform a special update:
using (var context = new MyContext())
{
    // Stub entities carrying only the key values of the detached objects
    var order = new Order { Id = someOrderId };
    context.Orders.Attach(order);
    var user = new User { Id = someUserId };
    context.Users.Attach(user);
    order.Owner = user;
    context.SaveChanges();
}
This would be enough to update the Owner relationship. You would not need the whole object graph for this procedure, you only need the correct primary key values of the entities the relationship has to be created for. It doesn't work that easy of course if you have more changes to save or don't know what exactly could have been changed.
Don't attach the object graph to the context at all. Instead load new entities from the database that represent the object graph currently stored in the database. Then update the loaded graph with your detached object graph and save the changes applied to the loaded (=attached) graph. An example of this procedure is shown here. It is safe and a very general pattern (but not generic) but it can be very complex for complex object graphs.
Traverse the object graph and replace the duplicate objects by a unique one, for example just the first one with type and key you have found. You could build a dictionary of unique objects that you lookup to replace the duplicates. An example is here.
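That last traversal idea can be sketched independently of EF. The following is a hypothetical JavaScript illustration (the linked examples are in C#): it keys entities by a type + id pair and rewrites references so each entity appears exactly once in the graph:

```javascript
// Walk an object graph and replace duplicate entities (same type + id)
// with the first instance encountered, so reference identity matches
// key identity. Assumes entities carry `type` and `id` properties;
// anything else is treated as a plain container and recursed into.
function dedupeGraph(node, seen = new Map()) {
  if (node === null || typeof node !== "object") return node
  if (Array.isArray(node)) return node.map(item => dedupeGraph(item, seen))
  if (node.type !== undefined && node.id !== undefined) {
    const key = `${node.type}:${node.id}`
    // Reuse the canonical instance; not recursing here also keeps
    // cycles through entities from looping forever.
    if (seen.has(key)) return seen.get(key)
    seen.set(key, node)
  }
  for (const prop of Object.keys(node)) {
    node[prop] = dedupeGraph(node[prop], seen)
  }
  return node
}
```

After this pass, the district hanging off the order and the "same" district in the user's list are one object, so attaching the graph no longer trips the identity map.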

Firebase better way of getting total number of records

From the Transactions doc, second paragraph:
"The intention here is for the client to increment the total number of chat messages sent (ignore for a moment that there are better ways of implementing this)."
What are some standard "better ways" of implementing this?
Specifically, I'm looking at trying to do things like retrieve the most recent 50 records. This requires that I start from the end of the list, so I need a way to determine what the last record is.
The options as I see them:
- use a transaction to update a counter each time a record is added, then use the counter value with setPriority() for ordering
- forEach() the parent and read all records, doing my own sorting/filtering on the client
- write server code to analyze Firebase tables and create indexed lists like "mostRecentMessages" and "totalNumberOfMessages"
Am I missing obvious choices?
To view the last 50 records in a list, simply call "limit()" as shown:
var data = new Firebase(...);
data.limit(50).on(...);
Firebase elements are ordered first by priority, and if priorities match (or none is set), lexicographically by name. The push() command automatically creates elements that are ordered chronologically, so if you're using push(), no additional work is needed to use limit().
To count the elements in a list, I would suggest adding a "value" callback and then iterating through the snapshot (or doing the transaction approach we mention). The note in the documentation actually refers to some upcoming features we haven't released yet which will allow you to count elements without loading them first.
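For the transaction-counter option from the question, the update function handed to transaction() is just a pure function of the current value. A minimal sketch (the ref URL is a placeholder, and the wiring is shown commented out since it needs a live Firebase connection):

```javascript
// Update function for a Firebase transaction maintaining a message counter.
// Firebase passes the current value (null if the node doesn't exist yet)
// and retries the function automatically on conflicting writes.
function incrementCounter(current) {
  return (current || 0) + 1
}

// With a live connection it would be wired up roughly like this:
// var counterRef = new Firebase('https://<your-app>.firebaseio.com/messageCount')
// counterRef.transaction(incrementCounter)
```

Because the function is retried rather than locked, it must depend only on its argument, which is why it is worth keeping it pure and separate like this.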