I have a navbar as a map:
var navbar = map[string]navbarTab{
}
Where navbarTab has various properties, child items and so on. When I try to render the navbar (with for tabKey := range navbar) it shows up in a random order. I'm aware that range iterates over a map in a random order, but there appears to be no way to get an ordered list of keys or to iterate in the insertion order.
The playground link is here: http://play.golang.org/p/nSL1zhadg5 although it seems to not exhibit the same behavior.
How can I iterate over this map without breaking the insertion order?
The general concept of the map data structure is that it is a collection of key-value pairs. "Ordered" or "sorted" is nowhere mentioned.
Wikipedia definition:
In computer science, an associative array, map, symbol table, or dictionary is an abstract data type composed of a collection of (key, value) pairs, such that each possible key appears just once in the collection.
The map is one of the most useful data structures in computer science, so Go provides it as a built-in type. However, the language specification only specifies a general map (Map types):
A map is an unordered group of elements of one type, called the element type, indexed by a set of unique keys of another type, called the key type. The value of an uninitialized map is nil.
Note that the language specification not only leaves out the words "ordered" or "sorted", it explicitly states the opposite: "unordered". But why? Because this gives the runtime greater freedom to implement the map type. The language specification allows any map implementation, such as a hash map or a tree map. Note that the current (and previous) versions of Go use a hash map implementation, but you don't need to know that to use it.
The blog post Go maps in action is a must-read regarding this question.
Before Go 1, when a map was not changed, the runtime returned the keys in the same order when you iterated over its keys/entries multiple times. Note that this order could change if the map was modified, as the implementation might need to rehash to accommodate more entries. People started to rely on the stable iteration order (when the map was not changed), so starting with Go 1 the runtime randomizes map iteration order on purpose, to make developers aware that the order is not defined and cannot be relied on.
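To see this in action, here is a minimal sketch (the map contents are just placeholders): ranging over the same map twice may visit the keys in different orders, both within a single run and across runs:

package main

import "fmt"

func main() {
    tabs := map[string]string{"home": "/", "blog": "/blog", "about": "/about"}

    // Each range may visit the keys in a different order.
    for k := range tabs {
        fmt.Print(k, " ")
    }
    fmt.Println()

    for k := range tabs {
        fmt.Print(k, " ")
    }
    fmt.Println()
}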
What to do then?
If you need a sorted dataset (be it a collection of key-value pairs or anything else), whether in insertion order, in the natural order defined by the key type, or in an arbitrary order, a map is not the right choice. If you need a predefined order, slices (and arrays) are your friends. And if you also need to be able to look up elements by a key, you may additionally build a map from the slice to allow fast lookup of elements by key.
Whether you build the map first and then a slice in the proper order, or the slice first and then build a map from it, is entirely up to you.
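As a minimal sketch of that combination (the navbarTab type and its field are placeholders, not taken from the question), keep a slice of keys for the order and a map for the lookups:

package main

import "fmt"

type navbarTab struct {
    Title string
}

func main() {
    navbar := map[string]navbarTab{}
    var order []string // remembers insertion order

    add := func(key string, tab navbarTab) {
        navbar[key] = tab
        order = append(order, key)
    }

    add("home", navbarTab{Title: "Home"})
    add("blog", navbarTab{Title: "Blog"})
    add("about", navbarTab{Title: "About"})

    // Iterate in insertion order via the slice, look up values via the map.
    for _, key := range order {
        fmt.Println(key, "->", navbar[key].Title)
    }
}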
The aforementioned Go maps in action blog post has a section dedicated to Iteration order:
When iterating over a map with a range loop, the iteration order is not specified and is not guaranteed to be the same from one iteration to the next. Since Go 1 the runtime randomizes map iteration order, as programmers relied on the stable iteration order of the previous implementation. If you require a stable iteration order you must maintain a separate data structure that specifies that order. This example uses a separate sorted slice of keys to print a map[int]string in key order:
import "sort"
var m map[int]string
var keys []int
for k := range m {
keys = append(keys, k)
}
sort.Ints(keys)
for _, k := range keys {
fmt.Println("Key:", k, "Value:", m[k])
}
P.S.:
...although it seems to not exhibit the same behavior.
Seemingly you see the "same iteration order" on the Go Playground because the output of programs run on the Go Playground is cached. When a new, not-yet-seen program is executed, its output is saved. When the same program is submitted again, the saved output is served without running the code again. So it's not really the same iteration order that you see; it's the exact same output, produced without executing any of the code again.
P.S. #2
Although the for range iteration order is "random", there are notable exceptions in the standard library that do process maps in sorted key order, namely the encoding/json, text/template, html/template and fmt packages. For more details, see In Golang, why are iterations over maps random?
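As a small illustration (not from the original post): encoding/json writes map keys in sorted order, and fmt has printed maps with sorted keys since Go 1.12:

package main

import (
    "encoding/json"
    "fmt"
)

func main() {
    m := map[string]int{"banana": 2, "apple": 1, "cherry": 3}

    // encoding/json marshals map keys in sorted order.
    b, _ := json.Marshal(m)
    fmt.Println(string(b)) // {"apple":1,"banana":2,"cherry":3}

    // fmt prints maps with sorted keys as well (since Go 1.12).
    fmt.Println(m) // map[apple:1 banana:2 cherry:3]
}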
Go maps do not maintain the insertion order; you will have to implement this behavior yourself.
Example:
type NavigationMap struct {
    m    map[string]navbarTab
    keys []string
}

func NewNavigationMap() *NavigationMap { ... }

func (n *NavigationMap) Set(k string, v navbarTab) {
    n.m[k] = v
    n.keys = append(n.keys, k)
}
This example is not complete and does not cover all use cases (e.g. updating the insertion order on duplicate keys).
If your use case includes re-inserting the same key multiple times, use a variant like the following (it will not update the insertion order for key k if it was already in the map):
func (n *NavigationMap) Set(k string, v navbarTab) {
    _, present := n.m[k]
    n.m[k] = v
    if !present {
        n.keys = append(n.keys, k)
    }
}
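A short usage sketch; the Keys and Get accessors below are illustrative additions, not part of the original example:

// Illustrative accessors so callers can read the entries back in insertion order.
func (n *NavigationMap) Keys() []string         { return n.keys }
func (n *NavigationMap) Get(k string) navbarTab { return n.m[k] }

// Rendering loop (nav is assumed to be a properly initialized *NavigationMap):
//   for _, k := range nav.Keys() {
//       tab := nav.Get(k)
//       // render tab ...
//   }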
Choose the simplest thing that satisfies your requirements.
Related
We often use GraphBLAS for graph processing so we need to use the incidence matrix. I haven't been able to find a way to export this from Grakn to a csv or any file. Is this possible?
There isn't a built-in way to dump data to CSV in Grakn right now. However, we do highly encourage our community to contribute open source tooling for these kinds of tasks! Feel free to chat to us about it on our Discord.
As to how it can be done, conceptually it's pretty easy:
Query to stream all hyper-relations out:
match $r isa relation;
and then for each relation, we can pipeline another query (possibly in a new transaction, if you wish to keep memory usage lower):
match $r iid <iid of $r from previous query>; $r ($x); get $x;
which will get you everything playing a role in this particular hyper-relation $r.
If you also wish to extract attributes that are attached to the hyper-relation, you can use the following:
match $r iid <iid of $r from first query>; $r has $a; get $a;
In effect, we can use these steps to build up each column of the incidence matrix A.
There are a couple of important caveats I should bring up:
What you'll end up with will exclude all type information: the types of the hyper-relations, of the role players in those relations, the actual roles being played by the role players, and the attribute types owned.
==> It would be interesting to hear/discuss how one could encode type information for use in GraphBLAS.
In Graql, it's entirely possible to have relations participating in relations. In the worst case, this means all hyper-edges E will also be present in the set V. In practice only a few relations will play a role in other relations, so only a subset of E may be in V.
So the incidence matrix is equivalent to the nodes/edges array used in force graph visualisation. In this case it is pretty straightforward.
My approach would be slightly different from the above, as all I need to do is pull all of the things in the db (entities, relations, attributes), with
match $ting isa thing;
Now when I get my transaction back, for each $ting I want to pull all of the available properties using both local and remote methods if I am building a force graph viz; but for your incidence matrix, I really only care about pulling 3 bits of data:
The iid of the thing.
The attributes the thing may own.
The roles and their players, if the thing is a relation.
Essentially one tests each returned object to find out its type (e.g. entity, attribute, relation), and then uses some of the local and remote methods to get the data one wants. In Python, the code for pulling the data for relations looks like:
# pull relation data
elif thing.is_relation():
    rel = {}
    rel['type'] = 'relation'
    rel['symbol'] = key
    rel['G_id'] = thing.get_iid()
    rel['G_name'] = thing.get_type().get_label().name()
    att_obj = thing.as_remote(r_tx).get_has()
    att = []
    for a in att_obj:
        att.append(a.get_iid())
    rel['has'] = att
    links = thing.as_remote(r_tx).get_players_by_role_type()
    logger.debug(f' links are -> {links}')
    edges = {}
    for edge_key, edge_thing in links.items():
        logger.debug(f' edge key is -> {edge_key}')
        logger.debug(f' edge_thing is -> {list(edge_thing)}')
        edges[edge_key.get_label().name()] = [e.get_iid() for e in list(edge_thing)]
    rel['edges'] = edges
    res.append(rel)
    layer.append(rel)
    logger.debug(f'rel -> {rel}')
This then gives us a node array, which we can easily process to build an edges array (i.e. the links joining an object and the attributes it owns, or the links joining a relation to its role players). Thus, exporting your incidence matrix is pretty straightforward.
I'm new to ArangoDB with graphs. I simply want to know if it is faster to build edges or to use DOCUMENT() for very simple 1:1 connections where querying the graph is not needed?
LET a = DOCUMENT(#from)
FOR v IN OUTBOUND a
CollectionAHasCollectionB
RETURN MERGE(a,{b:v})
vs
LET a = DOCUMENT(#from)
RETURN MERGE(a, {b: DOCUMENT(a.bId)})
A simple benchmark you can try:
Create the collections products, categories and an edge collection has_category. Then generate some sample data:
FOR i IN 1..10000
    INSERT {_key: TO_STRING(i), name: CONCAT("Product ", i)} INTO products

FOR i IN 1..10000
    INSERT {_key: TO_STRING(i), name: CONCAT("Category ", i)} INTO categories

FOR p IN products
    LET random_categories = (
        FOR c IN categories
            SORT RAND()
            LIMIT 5
            RETURN c._id
    )
    LET category_subset = SLICE(random_categories, 0, RAND()*5+1)
    UPDATE p WITH {
        categories: category_subset,
        categoriesEmbedded: DOCUMENT(category_subset)[*].name
    } INTO products
    FOR cat IN category_subset
        INSERT {_from: p._id, _to: cat} INTO has_category
Then compare the query times for the different approaches.
Graph traversal (depth 1..1):
FOR p IN products
    RETURN {
        product: p.name,
        categories: (FOR v IN OUTBOUND p has_category RETURN v.name)
    }
Look-up in categories collection using DOCUMENT():
FOR p IN products
    RETURN {
        product: p.name,
        categories: DOCUMENT(p.categories)[*].name
    }
Using the directly embedded category names:
FOR p IN products
    RETURN {
        product: p.name,
        categories: p.categoriesEmbedded
    }
Graph traversal is the slowest of the three; the lookup in another collection is faster than the traversal; but by far the fastest query is the one with the embedded category names.
If you query the categories for just one or a few products, however, the response times should be in the sub-millisecond range regardless of the data model and query approach, and therefore not pose a performance problem.
The graph approach should be chosen if you need to query for paths with variable depth, long paths, shortest path etc. For your use case, it is not necessary. Whether the embedded approach is suitable or not is something you need to decide:
Is it acceptable to duplicate information, and potentially have inconsistencies in the data? (If you want to change a category name, you need to change it in all product records instead of in just one category document that products refer to via its immutable ID.)
Is there a lot of additional information per category? If so, all that data needs to be embedded into every product document that has that category - basically trading memory / storage space for performance
Do you need to retrieve a list of all (distinct) categories often? You can do this type of query really cheaply with the separate categories collection. With the embedded approach, it will be much less efficient, because you need to go over all products and collect the category info.
Bottom line: you should choose the data model and approach that fits your use case best. Thanks to ArangoDB's multi-model nature you can easily try another approach if your use case changes or you run into performance issues.
Generally speaking, the latter variant
LET a = DOCUMENT(#from)
RETURN MERGE(a, {b: DOCUMENT(a.bId)})
should have lower overhead than the full-featured traversal variant. This is because the DOCUMENT variant does a point lookup of a document, whereas the traversal variant is very general-purpose: it can return zero to many results from a variable number of collections, needs to keep track of the paths seen, etc.
When I tried both variants in a local test case, the non-traversal variant was also a lot faster, supporting this claim.
However, the traversal-based variant is more flexible: it can also be used should there be multiple edges (no 1:1 mapping) and for longer paths.
I would like to perform a search on the Zope catalog for objects with missing index key values. Is this possible?
For example consider the subsequent code lines:
from Products.CMFCore.utils import getToolByName
catalog = getToolByName(context, 'portal_catalog')
results = catalog.searchResults({'portal_type': 'Event', 'review_state': 'pending'})
What should I do if I'm interested in objects in which a certain item, instead of portal_type or review_state, has not been inserted?
You can search for both cases, but finding entries that are missing from an index altogether requires custom handling of the internal catalog data structures.
Indexes take a value from an object and index that. If there is an AttributeError or similar, the index stores nothing for that object; if the same field is also part of the returned columns (metadata), a MissingValue is used there to indicate that the field is empty for that record.
In the following examples I assume you have a variable catalog that points to the site's portal_catalog tool; e.g. the result of getToolByName(context, 'portal_catalog') or similar.
Searching for None
You can search for None in many indexes just fine:
catalog(myKeywordIndex=None)
The problem is that most index types ignore None as a value. Searching for None will thus fail on Date and Path indexes (they ignore None when indexing) and on Boolean indexes (they turn None into False when indexing).
Keyword indexes ignore None as well, unless it is part of a sequence. If the indexed method returns [None] it'll happily be indexed, but None on its own won't be.
Field indexes do store None in the index.
Note that each index can report its unique values, so you can check whether there are None values stored for a given index by calling:
catalog.uniqueValuesFor(indexname)
Searching for missing values
This is a little trickier. Each index keeps track of which objects it has indexed, to be able to remove data from the index when an object is removed, for example. At the same time, the catalog keeps track of which objects it has indexed as a whole.
Thus, we can calculate the difference between these two sets of information. This is what the catalog does all the time when you call the published APIs, but for this trick there is no public API. We'll need to reach into the catalog internals and grab these sets ourselves.
Luckily, these are all BTree sets, so the set operations are relatively efficient. Here is how I'd do it:
from BTrees.IIBTree import IISet, difference

def missing_entries_for_index(catalog, index_name):
    # Return the difference between catalog and index ids.
    index = catalog._catalog.getIndex(index_name)
    referenced = IISet(index.referencedObjects())  # works with any UnIndex-based index
    return (
        difference(IISet(catalog._catalog.paths), referenced),
        len(catalog) - len(referenced)
    )
The missing_entries_for_index function returns an IISet of catalog record ids together with its length; each id refers to a catalog record for which the named index has no entry. You can then use catalog.getpath to turn a record id into the full path of the object, catalog.getMetadataForRID to get a dictionary of its metadata values, catalog.getobject to retrieve the original object itself, or catalog._catalog[] to get catalog brains.
The following method will give you a catalog result set, just like you would get from a regular catalog search:
from Products.ZCatalog.Lazy import LazyMap

def not_indexed_results(catalog, index_name):
    rs, length = missing_entries_for_index(catalog, index_name)
    return LazyMap(catalog._catalog.__getitem__, rs.keys(), length)
Thanks Ago. Actually, reading the link you suggested, I discovered that it's not possible without a trick. I quote from PyPI:
Note that negative filtering on an index still restricts items to those having a value in the index. So with 10 documents, 5 of them in the foo index with a value of 1, a query for not 1 will return no items instead of the 5 items without a value. You need to index a dummy/default value if you want to consider all items for a particular index.
So it is necessary to give your item a default value and search for that.
Just wanted to confirm in what order we get the elements from different collections:
List
ArrayList: we get the elements back in the same sequence in which we put them in.
LinkedList: when we add an element with add(E e) it is appended at the end; when we iterate with an iterator it goes from the first element to the last. So we can say that we get the elements back in the same sequence in which we put them in.
Set
HashSet: no sequence (for getting the elements) is guaranteed. It will appear to be a random sequence.
TreeSet: we get the elements in their natural ordering, or according to the comparator defined at creation time.
Map
HashMap: no sequence (for getting the elements) is guaranteed. It will appear to be a random sequence.
TreeMap: we get the elements in the natural ordering of the keys, or according to the comparator defined at creation time.
Please let me know if this is correct.
Yup - apart from your use of the word random. The order from a hash set/map won't actually be random; it will just be implementation-specific and unpredictable. Not quite the same thing - in particular, you shouldn't use it as a source of randomness - but you're right that you shouldn't rely on it being any specific ordering.
This ordering is what you would get if you used an iterator. I think the Java docs are clear enough for this to be apparent. Mind you, this is not all on one page.
I have a problem with the AdvancedDataGrid widget. When the dataProvider is an ArrayCollection (of arrays), the nth array (within the collection) is also the nth row within the grid, and I can jump to and display the i-th row by scripting
adg.selectedIndex = i;
adg.scrollToIndex(i);
Now, when I add a Grouping, the dataProvider ends up being a GroupingCollection2, and the index in the dataProvider's source no longer corresponds to the index in the adg (which is understandable, because the data is being grouped).
How can I select and display a row in grouped data efficiently? Currently, I have to traverse the adg and compare each found item with its data attributes in order to find the correct index of the row within the adg, and jump to it like above. This process is very slow. Any thoughts?
edited later:
We already used a caching object as Shaun suggests, but it still didn't compensate for the search times. In order to fully construct a sorting of a list of things (which this problem equates to, as the list is completely reordered by the grouping), you always have to know the entire set. In the end we didn't solve that problem. The project is over now. I will accept Shaun's answer if no one knows a better way in three days.
Depending on what values you're comparing against, you can store the objects in a dictionary keyed by the property (or properties) that would be searched for; this way you have a constant-time look-up for the object (no need to look at every single item). Say, for example, you're using a property called id on an object; then you can create an AS object like
var idLookup:Object = {};
for each (var myObject:Object in objects)
    idLookup[myObject.id] = myObject;

// Say you want multiple properties:
// idLookup[myObject.id] = {};
// idLookup[myObject.id][myObject.otherProp] = myObject;
Now, say the user types in an id; you go into the idLookup object at that id property and retrieve the object:
var myObject:Object = idLookup[userInput.text];
myAdg.expandItem(myObject, true);
The expandItem() method used above is documented here:
http://help.adobe.com/en_US/FlashPlatform/reference/actionscript/3/mx/controls/AdvancedDataGrid.html#expandItem()
I haven't done any thorough testing of this directly, but I use a similar concept for doing quick look-ups for advanced filtering. Let me know if this helps at all or is going in the wrong direction. Also, if you could clarify a bit more what types/number of values you need to look up and whether there's the possibility of multiple matches etc., I may be able to provide a better answer.
Good luck,
Shaun