Unique or distinct for collections in Squeak (Smalltalk)

Is there a method like the shell's uniq for collections in Squeak?
I want to remove all duplicates from a collection and get a collection containing one of each distinct object.
For example:
before: #('cat' 'cat' 'dog' 'cat')
after (uniq): #('cat' 'dog')

There are quite a few ways of doing this, differing subtly depending on your needs. The most popular ones I can think of are:
aCollection asSet creates a new Set with all the elements of the collection, and sets by definition have only one instance of each element.
aCollection removeDuplicates removes the duplicates from the original collection itself. This only works if the collection is not of a fixed size (i.e. it doesn't work on arrays).
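To make the difference between the two options concrete, here is a rough Python analogue (not Squeak code). as_set mirrors asSet: it builds a new collection and does not keep the original order, since a Set is unordered. remove_duplicates mirrors removeDuplicates, assuming it keeps the first occurrence of each element in place. It has to shrink the receiver, which is why it cannot work on a fixed-size container such as an Array.

```python
from typing import Hashable, List, Set


def as_set(items: List[Hashable]) -> Set[Hashable]:
    """Rough analogue of Smalltalk's asSet: a NEW, unordered collection of distinct elements."""
    return set(items)


def remove_duplicates(items: List[Hashable]) -> List[Hashable]:
    """Rough analogue of removeDuplicates on a growable collection.

    Mutates `items` in place, keeping the first occurrence of each element
    in its original relative order, and returns the same list object.
    """
    seen = set()
    write = 0
    for item in items:
        if item not in seen:
            seen.add(item)
            items[write] = item  # write <= current read position, so this is safe
            write += 1
    del items[write:]  # shrink the list; impossible for a fixed-size container
    return items


pets = ["cat", "cat", "dog", "cat"]
print(as_set(pets))             # {'cat', 'dog'} in some arbitrary order
print(remove_duplicates(pets))  # ['cat', 'dog']
print(pets)                     # ['cat', 'dog'] (the original list was changed)
```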

Related

Finding all entity names in the deprecated Freebase

I'm training a few machine learning models that represent words as vectors, using Freebase as training data. Since the API has been deprecated, I'm working with the raw Freebase dump, which is a list of 3.1 billion triples containing more than 500 million distinct entities (subjects/objects), and I'd like to reduce this number.
I would like to remove all triples which simply denote names of subjects so that only triples containing MIDs remain. However, I've found multiple possible predicates that define the 'name' of an entity.
i) common.notable_for.display_name
ii) type.object.name
iii) /rdf-schema#label
I have three questions:
a) Is there any difference between the above predicates?
b) Are there any additional predicates which also describe the names of entities?
c) Apart from the triple where a name is defined, does the name ever appear in other triples, instead of the MID?
Thank you for your help!
You should concentrate only on type.object.name; that's the schema property holding the topic's name.
The /rdf-schema#label predicate is an equivalent of the name; it is not part of the Freebase schema.
The common.notable_for.display_name description is: "Localized/gender appropriate display name for the notable object." It is also a property within a CVT (compound value type), and it holds a different type of information: of all the types a topic has, which one is the most "important". As far as I remember, "Larry Page" was an "entrepreneur". So you don't need this property. Concentrate on type.object.name.
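For the original goal of shrinking the dump, the filtering step can be streamed so that the 3.1 billion triples never have to fit in memory. The following is a minimal Python sketch, assuming the triples have already been parsed into (subject, predicate, object) tuples; matching on a predicate suffix (so that both short names and full <...> URIs are caught) is an assumption about the dump's format, and the MIDs in the example are made up.

```python
from typing import Iterable, Iterator, Tuple

Triple = Tuple[str, str, str]

# The three name predicates listed in the question. Suffix matching is an
# assumption so that short forms and full URIs are both handled.
NAME_PREDICATE_SUFFIXES = (
    "type.object.name",
    "common.notable_for.display_name",
    "rdf-schema#label",
)


def is_name_triple(triple: Triple) -> bool:
    _, predicate, _ = triple
    # str.endswith accepts a tuple of candidate suffixes.
    return predicate.strip("<>").endswith(NAME_PREDICATE_SUFFIXES)


def drop_name_triples(triples: Iterable[Triple]) -> Iterator[Triple]:
    """Lazily yield only the triples whose predicate is not a name predicate."""
    return (t for t in triples if not is_name_triple(t))


sample = [
    ("m.0abc", "type.object.name", '"Larry Page"@en'),
    ("m.0abc", "people.person.profession", "m.0def"),
    ("m.0abc", "<http://www.w3.org/2000/01/rdf-schema#label>", '"Larry Page"@en'),
]
print(list(drop_name_triples(sample)))
# [('m.0abc', 'people.person.profession', 'm.0def')]
```

Because drop_name_triples returns a generator, it can be chained directly onto a line-by-line reader of the dump.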

How to merge List of two objects based on unique id?

I have a class
class Topic {
Integer id
String name
Integer numberPosts
}
and another one
class TopicDetails {
Integer id
Integer numberPosts
}
The second is actually a container for query results; that's why they are so similar.
I have two lists, List<Topic> and List<TopicDetails>. Objects are unique by id in both lists. The second list contains at most the same ids as the first.
I want to merge the data from the second list into the first. I understand that there are simple ways, such as:
iterating over both and checking ids to merge the details
using a map for the details.
But is there a better way to do this? The collections framework has many new methods, so I was thinking there may be an elegant way to do this in Groovy instead of the approaches mentioned above.
EDIT: I forgot to mention that the first list initially does not have the numberPosts information. That is why the second one exists, i.e. as a container for information from the database.
A List is still just a list. You can use lambda expressions and "find" the ID each time, but you gain nothing in efficiency. A map is the way to go, at least for one of the lists.
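The map-based approach looks roughly like this. It is a Python sketch rather than Groovy; the dataclasses only mirror the two classes in the question, with number_posts left as None until it is merged. Indexing one list by id once makes the merge linear, whereas "finding" the id in the other list for every topic is quadratic.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Topic:
    id: int
    name: str
    number_posts: Optional[int] = None  # unknown until merged from the details


@dataclass
class TopicDetails:
    id: int
    number_posts: int


def merge_details(topics: List[Topic], details: List[TopicDetails]) -> List[Topic]:
    # Index the details by id once: O(len(details)).
    by_id = {d.id: d for d in details}
    # One dictionary lookup per topic instead of a scan: O(len(topics)).
    for topic in topics:
        detail = by_id.get(topic.id)
        if detail is not None:
            topic.number_posts = detail.number_posts
    return topics


topics = [Topic(1, "groovy"), Topic(2, "java"), Topic(3, "scala")]
details = [TopicDetails(3, 7), TopicDetails(1, 2)]
merge_details(topics, details)
print([(t.id, t.number_posts) for t in topics])  # [(1, 2), (2, None), (3, 7)]
```

Topics without a matching details entry are simply left untouched, which fits the statement that the second list holds at most the ids of the first.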

In d3.js, how can I preserve index i (or coerce / force the use of group index j in its place) in data bindings subject to nested selections?

The context here is standard (index-i-based) data binding in d3.js, i.e. where indices have supposedly been preserved.
In my experience, selection mode, preservation of indices and data binding comprise a war zone. For all but the simplest cases, gain one and you lose one of the others. (Bloggers, this is an area which would benefit greatly from a rigorous truth table.)
For example, for nested selections of the form d3.selectAll().selectAll(), the only index available at the point of data binding is that of the group or parentNode, j, which, though common to both old and new selections, cannot be used.
Assuming d3.selectAll().select() is not an option (it finds only the first element and leaves all key/value pairs undefined), is there some means of coercing binding based on the j index? Some kind of key function, perhaps, but specifying use of the index j?
In the past I've overcome this problem by playing selection & indexing leapfrog using object filters, but frankly it's messy and opaque.
This may be founded on a misunderstanding (an obvious knock-on issue, for example, is where there are multiple elements at the given j index), but I'd be glad of any suggestions or insights.
Thug
Hard to believe, but in total it's taken me two weeks to find the answer: the so-called descendant combinator "A B".
The description provided just enough of a hint to warrant a test.
Simply put, it preserves the index i of selected element A (and does so even where A and B are separated by multiple interstitial svg:g group elements). For example:
d3.selectAll(".parent_class .child_class");
Using the selection's .each() method with an inline function is a way to access both child data and, via an outer scope, the parent's data. Something like this:
var parents = d3.selectAll('.parent-class').data(parentData);
parents.each(function(dParent, iParent) {
var currentParent = d3.select(this);
var children = currentParent
.selectAll('.child-class')
.data(dParent.children);
// Now, e.g. if child color should depend on the parent data (and child data):
children.attr('color', function(dChild, iChild) {
// Here you have access to the parent's and the child's datums,
// as well as their indices
});
});
Is this a solution to what you're trying to achieve?

Is it possible to define limit() for child objects?

I am looking to retrieve the first 10 of 1000 records in a dataset, but each of those records has a property holding 1000 records of its own. Is there a way to limit the grandchild to return only X records as well? Something like:
firebaseRef.limit(10).limit(10, childPropertyName).once(...)
(When I say 1000, it could be 1,000,000; I didn't want to include all the zeros.)
If not, are there any workarounds or strategies to deal with large nested sets?
One possibility is to de-nest them. The grandchild could be split out into its own list with the same key names as its former parent. Is that the best way to go?
No, there isn't. We're working on ways to do this but they won't be released for a while.
In the meantime, I'd suggest building a separate index that simply lists the names of the top-level children. Then you can do a "limit(10)" on that index, and then do a limit(10) on a path constructed using each of those keys.
So your code would look like this:
// on() rather than once(): a once("child_added") listener fires for the first child only
indexRef.limit(10).on("child_added", function(snapshot) {
dataRef.child(snapshot.name()).limit(10).once(...);
});
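The index-then-limit pattern can be simulated with plain dictionaries to show why it helps: the first limit only touches the small index of keys, and each second limit is applied to one child path at a time. This is an in-memory Python sketch, not the Firebase API; the key names are hypothetical, and ordering by key with sorted() merely stands in for the server-side limit.

```python
from typing import Dict, List

Children = Dict[str, int]


def limit_keys(mapping: Dict[str, object], n: int) -> List[str]:
    """Stand-in for limit(n): the first n keys, ordered by key (an assumption).

    In a real database the limit is applied server-side; sorted() only
    simulates it here.
    """
    return sorted(mapping)[:n]


def fetch_limited(index: Dict[str, bool], data: Dict[str, Children],
                  parent_limit: int, child_limit: int) -> Dict[str, Children]:
    """Read parent_limit keys from the index, then at most child_limit
    children under the data path for each of those keys."""
    result: Dict[str, Children] = {}
    for key in limit_keys(index, parent_limit):
        children = data.get(key, {})
        result[key] = {c: children[c] for c in limit_keys(children, child_limit)}
    return result


# Hypothetical data: 50 top-level records with 50 children each.
index = {f"r{i:03d}": True for i in range(50)}
data = {k: {f"c{j:03d}": j for j in range(50)} for k in index}
page = fetch_limited(index, data, parent_limit=10, child_limit=10)
print(len(page), sorted({len(v) for v in page.values()}))  # 10 [10]
```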

AdvancedDataGrid (grouping) quick jump to row

I have a problem with the AdvancedDataGrid widget. When the dataProvider is an ArrayCollection (of arrays), the nth array within the collection is also the nth row within the grid, and I can jump to and display the i-th row with:
adg.selectedIndex = i;
adg.scrollToIndex(i);
Now, when I add a grouping, the dataProvider becomes a GroupingCollection2, and the index in the dataProvider's source no longer corresponds to the index in the ADG (which is understandable, because the data is being grouped).
How can I select and display a row in grouped data efficiently? Currently, I have to traverse the adg and compare each found item with its data attributes in order to find the correct index of the row within the adg, and jump to it like above. This process is very slow. Any thoughts?
Edited later:
We already used a caching object as Shaun suggests, but it still didn't compensate for the search times. In order to fully construct a sorting of a list of things (which this problem equates to, as the list is completely reordered by the grouping), you always have to know the entire set. In the end we didn't solve that problem. The project is over now. I will accept Shaun's answer if no one knows a better way in three days.
Depending on which values you're comparing against, you can store the objects in a dictionary keyed on the property (or properties) that would be searched for. This way you have a constant-time lookup for the object (no need to look at every single item). Say, for example, you're using a property called id on an object; then you can create an AS object like this:
var idLookup:Object = {};
// "for each" iterates values; a plain for...in would iterate keys/indices
for each (var myObject:Object in objects)
    idLookup[myObject.id] = myObject;
// Say you want multiple properties (create each inner object only once):
//if (!idLookup[myObject.id]) idLookup[myObject.id] = {};
//idLookup[myObject.id][myObject.otherProp] = myObject;
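The same lookup-table idea, including the multi-property variant, can be sketched in Python. Dictionaries stand in for the AS3 objects, and the field names id and other_prop are hypothetical (other_prop plays the role of otherProp). Building the table costs one pass over the data; every lookup afterwards is an average constant-time dictionary access instead of a traversal of the grid.

```python
from collections import defaultdict
from typing import Any, Dict, List

Row = Dict[str, Any]


def build_id_lookup(rows: List[Row]) -> Dict[Any, Row]:
    """One pass to build; afterwards each lookup is an average O(1) dict access.
    If ids repeat, the last row with a given id wins."""
    return {row["id"]: row for row in rows}


def build_two_key_lookup(rows: List[Row], first: str = "id",
                         second: str = "other_prop") -> Dict[Any, Dict[Any, Row]]:
    """Nested variant keyed on two properties. defaultdict creates each inner
    table only once, so earlier entries under the same first key are kept."""
    lookup: Dict[Any, Dict[Any, Row]] = defaultdict(dict)
    for row in rows:
        lookup[row[first]][row[second]] = row
    return lookup


rows = [
    {"id": 1, "other_prop": "a"},
    {"id": 2, "other_prop": "b"},
    {"id": 1, "other_prop": "c"},
]
by_id = build_id_lookup(rows)
print(by_id[2])                            # {'id': 2, 'other_prop': 'b'}
print(build_two_key_lookup(rows)[1]["a"])  # {'id': 1, 'other_prop': 'a'}
```

When ids are not unique, the flat table silently keeps only the last match, which is exactly the case where the nested variant (or a list of matches per key) is needed.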
Now, say the user types in an id: you look up that id in the idLookup object and retrieve the object:
var myObject:Object = idLookup[userInput.text];
myAdg.expandItem(myObject, true);
See the expandItem() documentation for details:
http://help.adobe.com/en_US/FlashPlatform/reference/actionscript/3/mx/controls/AdvancedDataGrid.html#expandItem()
I haven't done any thorough testing of this directly, but I use a similar concept for quick lookups in advanced filtering. Let me know if this helps at all or is going in the wrong direction. Also, if you could clarify what types and how many values you need to look up, and whether multiple matches are possible, I may be able to provide a better answer.
Good luck,
Shaun
