CouchDB: Merging Objects in Reduce Function

I'm new to CouchDB, so bear with me. I searched SO for an answer, but couldn't narrow it down to this specifically.
I have a map function which creates values for a user. The users have seen different product pages, and we want to tally the types and products they've seen.
// inside the map function
var emit_values = {};
emit_values.name = doc.name;
...
// record the product from this doc with an initial count of 1
emit_values.productsViewed = {};
emit_values.productsViewed[doc.product] = 1;
emit([doc.id, doc.customer], emit_values);
In the reduce function, I want to gather different values into that productsViewed object for that given user. So after the reduce, I have this:
productsViewed: {
    book1: 1,
    book3: 2,
    book8: 1
}
Unfortunately, doing this causes a reduce overflow error. According to other posts, this is because the productsViewed object grows in size in the reduce function, and Couch doesn't like that. Specifically:
A common mistake new CouchDB users make is attempting to construct complex aggregate values with a reduce function. Full reductions should result in a scalar value, like 5, and not, for instance, a JSON hash with a set of unique keys and the count of each.
So, I understand this is not the right way to do this in Couch. Does anyone have any insight into how to properly gather values into a document after reduce?

You simply build a view with the customer as the key
emit(doc.customer, doc.product);
Then you can call
/:db/_design/:name/_view/:name?key=":customer"
to get all products a user has viewed.
If a customer can have viewed a product several times, you can build a multi-key view
emit([doc.customer, doc.product], null);
and reduce it with the built-in _count function:
/:db/_design/:name/_view/:name?startkey=[":customer","\u0000"]&endkey=[":customer","\u9999"]&reduce=true&group_level=2
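For reference, a minimal design document for that multi-key view might look like this (the design doc and view names are placeholders):
{
  "_id": "_design/products",
  "views": {
    "by_customer_product": {
      "map": "function (doc) { if (doc.customer && doc.product) { emit([doc.customer, doc.product], null); } }",
      "reduce": "_count"
    }
  }
}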
You have to accept that you cannot "construct complex aggregate values" with CouchDB by requesting the view. If you want a data structure like your desired payload
productsViewed: {
    book1: 1,
    book3: 2,
    book8: 1
}
I recommend using an _update handler on the customer doc. Every request that logs a product visit then increments a counter on the customer's productsViewed property instead of creating a new doc.
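A minimal sketch of such an update handler (untested; the request body shape is an assumption):
function (doc, req) {
    // assumes the request body is JSON like {"product": "book1"}
    var body = JSON.parse(req.body);
    if (!doc) {
        // first visit for this customer; req.id carries the doc id from the URL
        doc = {_id: req.id, productsViewed: {}};
    }
    doc.productsViewed = doc.productsViewed || {};
    doc.productsViewed[body.product] = (doc.productsViewed[body.product] || 0) + 1;
    return [doc, JSON.stringify({ok: true})];
}
Each product visit would then be a PUT to /:db/_design/:name/_update/:handler/:customer instead of a new document.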

Related

Converting a "name" to a real, then converting it back to a string

I've run into some issues with my current GameMaker project. I've set up a simple "merge" functionality in my game, and I'm trying to improve its QoL.
So what happens when I'm merging is this:
var sabatons = instance_place(x, y, object_index);
if (instance_exists(sabatons)) {
    // create the next item at the merged item's last known position
    // (which object to create is what the question is about)
    instance_create_layer(sabatons.lastknownPosX, sabatons.lastknownPosY, "Instances", "");
    if (level >= 3) {
        scr_spawn_experience();
    }
    audio_play_sound(snd_sabatons_merge, 10, false);
    instance_destroy();         // destroy the calling instance
    instance_destroy(sabatons); // destroy the instance we merged with
}
What the code above does is check whether object_index matches what I'm trying to merge with, and if it does, it runs instance_create_layer, which is where my question comes in.
My objects are named with a digit at the end to keep track of their "level",
so basically obj_sabatons_1 means it's the first item in a chain, obj_sabatons_2 is the second, etc. What I need help with is converting whatever object I'm trying to merge with (object_index) to a string, but increased by 1, so I can put that result into instance_create_layer and the game will create the next level of item when I successfully merge my first two items :)
Thanks!!
Is there a reason the level has to be part of the name? If not, you could make the level a variable of the object obj_sabatons, so you can keep all the code related to obj_sabatons in one place (and then remove all those copied objects with digits at the end).
Making multiple copies of the same object can be disadvantageous in numerous ways. For example, if you want to raise the level cap, you'd have to create a new object for each level; a code change also means changing that code in every per-level object (unless you're making use of parent/child objects). Keeping the level as a variable would also make the process of merging objects into new ones easier; see the sketch below.
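A rough sketch of the idea (shown in JavaScript rather than GML, purely for illustration; the names and the per-level table are made up):
// one object type with a numeric level plus a per-level data table,
// instead of obj_sabatons_1, obj_sabatons_2, ...
var SABATON_LEVELS = {
    1: {sprite: "spr_sabatons_1"},
    2: {sprite: "spr_sabatons_2"},
    3: {sprite: "spr_sabatons_3"}
};

function mergeSabatons(a, b) {
    if (a.level !== b.level) return null;       // only equal levels merge
    var next = a.level + 1;
    if (!(next in SABATON_LEVELS)) return null; // already at the level cap
    return {level: next, sprite: SABATON_LEVELS[next].sprite};
}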

Records and multithreading, converting to pass through the wall

First, is there an official CS term for sending things between the front end and back end? I just made up "the wall" but I would like a cooler term.
So in App Maker it seems you cannot pass whole records through to the backend (although you can handle them on either end).
So basically what I was doing was:
get a set of records and divide them into chunks:
var records = app.datasources.filesToProcess.items;
then call the backend process once per chunk with:
google.script.run.withSuccessHandler(onSuccess).backendProcess(records, start, end);
This allows for a kind of multithreading. The problem is passing records. Is there an easy way to get just the IDs from a set of records client-side, so I can pass those as an array in place of the records? Passing the record object itself gives an error.
Just do the following:
var records = app.datasources.filesToProcess.items.map(function(item) {return item.id;});
and then, on the backend, you can use
function backendProcess(records, start, end) {
    // 'records' now holds an array of ids; fetch the full records server-side
    var items = app.models.YourModel.getRecords(records);
}
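Putting it together on the client, the chunked calls could look roughly like this (the chunk size and the onSuccess handler are assumptions):
var CHUNK_SIZE = 50; // assumed batch size
var ids = app.datasources.filesToProcess.items.map(function(item) {
    return item.id;
});
for (var start = 0; start < ids.length; start += CHUNK_SIZE) {
    var end = Math.min(start + CHUNK_SIZE, ids.length);
    // each backend call runs in parallel with the others
    google.script.run.withSuccessHandler(onSuccess).backendProcess(ids.slice(start, end), start, end);
}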

ArangoDB performance: edge vs. DOCUMENT()

I'm new to ArangoDB with graphs. I simply want to know whether it is faster to build edges or to use DOCUMENT() for very simple 1:1 connections where querying the graph is not needed.
LET a = DOCUMENT(@from)
FOR v IN OUTBOUND a CollectionAHasCollectionB
    RETURN MERGE(a, {b: v})
vs
LET a = DOCUMENT(@from)
RETURN MERGE(a, {b: DOCUMENT(a.bId)})
A simple benchmark you can try:
Create the collections products, categories and an edge collection has_category. Then generate some sample data (run each of the following as a separate query):
FOR i IN 1..10000
    INSERT {_key: TO_STRING(i), name: CONCAT("Product ", i)} INTO products

FOR i IN 1..10000
    INSERT {_key: TO_STRING(i), name: CONCAT("Category ", i)} INTO categories

FOR p IN products
    LET random_categories = (
        FOR c IN categories
            SORT RAND()
            LIMIT 5
            RETURN c._id
    )
    LET category_subset = SLICE(random_categories, 0, RAND()*5+1)
    UPDATE p WITH {
        categories: category_subset,
        categoriesEmbedded: DOCUMENT(category_subset)[*].name
    } INTO products
    FOR cat IN category_subset
        INSERT {_from: p._id, _to: cat} INTO has_category
Then compare the query times for the different approaches.
Graph traversal (depth 1..1):
FOR p IN products
    RETURN {
        product: p.name,
        categories: (FOR v IN OUTBOUND p has_category RETURN v.name)
    }
Look-up in the categories collection using DOCUMENT():
FOR p IN products
    RETURN {
        product: p.name,
        categories: DOCUMENT(p.categories)[*].name
    }
Using the directly embedded category names:
FOR p IN products
    RETURN {
        product: p.name,
        categories: p.categoriesEmbedded
    }
Graph traversal is the slowest of the three; the lookup in another collection is faster than the traversal; but by far the fastest query is the one with embedded category names.
If you query the categories for just one or a few products, however, the response times should be in the sub-millisecond range regardless of the data model and query approach, and therefore should not pose a performance problem.
The graph approach should be chosen if you need to query for paths with variable depth, long paths, shortest paths, etc. For your use case, it is not necessary. Whether the embedded approach is suitable or not is something you need to decide:
Is it acceptable to duplicate information, and potentially have inconsistencies in the data? (If you want to change a category name, you need to change it in every product record, instead of in just one category document that products refer to via its immutable ID.)
Is there a lot of additional information per category? If so, all of that data needs to be embedded into every product document that has the category, basically trading memory / storage space for performance.
Do you need to retrieve a list of all (distinct) categories often? This type of query is really cheap with the separate categories collection. With the embedded approach it is much less efficient, because you need to go over all products and collect the category info; see the sketch below.
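For instance, a rough comparison of the two "all distinct categories" queries, sketched with the arangojs driver (connection setup omitted; collection names are from the benchmark above):
const { Database } = require("arangojs"); // db below is a connected Database instance

async function distinctCategories(db) {
    // cheap: scan only the (small) categories collection
    const viaCollection = await db.query(
        "FOR c IN categories RETURN c.name"
    );
    // costly: scan every product and deduplicate the embedded names
    const viaEmbedded = await db.query(
        "FOR p IN products FOR n IN p.categoriesEmbedded RETURN DISTINCT n"
    );
    return {
        fromCollection: await viaCollection.all(),
        fromEmbedded: await viaEmbedded.all()
    };
}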
Bottom line: you should choose the data model and approach that fits your use case best. Thanks to ArangoDB's multi-model nature you can easily try another approach if your use case changes or you run into performance issues.
Generally speaking, the latter variant
LET a = DOCUMENT(@from)
RETURN MERGE(a, {b: DOCUMENT(a.bId)})
should have lower overhead than the full-featured traversal variant. This is because the DOCUMENT variant does a point lookup of a document, whereas the traversal variant is very general purpose: it can return zero to many results from a variable number of collections, needs to keep track of the path seen, etc.
When I tried both variants in a local test case, the non-traversal variant was also a lot faster, supporting this claim.
However, the traversal-based variant is more flexible: it can also be used should there be multiple edges (no 1:1 mapping) and for longer paths.

modify field value in a crossfilter after insertion

I need to modify a field value for all records in a crossfilter before inserting new records.
The API doesn't say anything about it. Is there a way to do that?
Even a hack would be really useful to me.
Looking at the code, the data array is held as a private local variable inside the crossfilter function, so there's no way to get at it directly.
That said, it looks like Crossfilter really tries to minimize the number of copies of the data it makes. Callback functions like the ones passed into crossfilter.dimension or dimension.filter are handed the actual records themselves from the data array (via the native Array.map), so any changes you make to those records will be made to the main records.
You obviously need to be very careful that you're not changing anything relied upon by the existing dimensions, filters or groups. Otherwise you'll end up with data that doesn't agree with the internal Crossfilter structures, and chaos will ensue.
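As a quick illustration of that caveat (field names are made up): reaching the actual records through a dimension and mutating a field that no dimension or group uses is relatively safe, while touching a dimension's key is not:
var ndx = crossfilter(data);
var byCategory = ndx.dimension(function(d) { return d.category; });
// with no filters applied, top(Infinity) returns every actual record object
byCategory.top(Infinity).forEach(function(d) {
    d.label = (d.name || "").toUpperCase(); // 'label' backs no dimension: OK
});
// mutating d.category here instead would desynchronize byCategory's internal index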
The cool thing about .remove is that it only removes entries matching the currently applied filters. So if you create a "unique dimension" that returns a unique value for every entry in your dataset (like an ID column), you can have a function like this:
function editEntry(id, changes) {
    uniqueDimension.filter(id); // filter down to the item you want to change
    var selectedEntry = uniqueDimension.top(1)[0]; // get the item
    _.extend(selectedEntry, changes); // apply the changes to it
    ndx.remove(); // remove all items passing the current filter (just this item)
    ndx.add([selectedEntry]); // re-add the modified item
    uniqueDimension.filter(null); // clear the filter
    dc.redrawAll(); // redraw the UI
}
At the very least you could do a cf.remove() and then re-add the adjusted data with cf.add().
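A sketch of that brute-force round trip (assumes no filters are active, so remove() clears every record, and adjust() is whatever change you need):
var all = anyDimension.top(Infinity); // all records when nothing is filtered
all.forEach(adjust);                  // mutate each record as needed
ndx.remove();                         // with no filters applied, removes every record
ndx.add(all);                         // re-insert the adjusted records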

AdvancedDataGrid (grouping) quick jump to row

I have a problem with the AdvancedDataGrid widget. When the dataProvider is an ArrayCollection (of arrays), the nth array (within the collection) is also the nth row within the grid, and I can jump to and display the i-th row by scripting
adg.selectedIndex = i;
adg.scrollToIndex(i);
Now, when I add a Grouping, the dataProvider ends up being a GroupingCollection2, and the index in the dataProvider's source no longer corresponds to the index in the ADG (which is understandable, because the data is being grouped).
How can I select and display a row in grouped data efficiently? Currently, I have to traverse the ADG and compare each found item with its data attributes in order to find the correct index of the row within the ADG, and then jump to it as above. This process is very slow. Any thoughts?
Edited later:
We already used a caching object, as Shaun suggests, but it still didn't make up for the search times. To fully construct an ordering of a list of things (which this problem equates to, since the grouping completely reorders the list), you always have to know the entire set. In the end we didn't solve the problem; the project is over now. I will accept Shaun's answer if no one posts a better way within three days.
Depending on what values you're comparing against, you can store the objects in a dictionary, with the lookup using the property (or properties) that would be searched for. This way you have a constant-time lookup for the object (no need to look at every single item). Say, for example, you're using a property called id on an object; then you can create an AS object like
var idLookup:Object = {};
for each (var myObject:Object in objects)
    idLookup[myObject.id] = myObject;
// Say you want multiple properties:
// idLookup[myObject.id] = {};
// idLookup[myObject.id][myObject.otherProp] = myObject;
Now, say the user types in an id: you go into the idLookup object at that id property and retrieve the object:
var myObject:Object = idLookup[userInput.text];
myAdg.expandItem(myObject, true);
See the documentation for AdvancedDataGrid.expandItem():
http://help.adobe.com/en_US/FlashPlatform/reference/actionscript/3/mx/controls/AdvancedDataGrid.html#expandItem()
I haven't done any thorough testing of this directly, but I use a similar concept for doing quick lookups for advanced filtering. Let me know if this helps at all or is going in the wrong direction. Also, if you could clarify a bit more what types/number of values you need to look up and whether multiple matches are possible, I may be able to provide a better answer.
Good luck,
Shaun
