We are working on a research project on question answering over a knowledge base, using the SimpleQuestions dataset (https://research.fb.com/projects/babi/).
We loaded the latest Freebase data dump into Virtuoso and queried entity names by their mids (using the relations type.object.name and common.topic.alias). However, the names of many entities cannot be found.
We also tried the KB provided by Sempre (https://github.com/percyliang/sempre), where we found more entity names, but still not all of them.
We suspect that these entities have been deleted. Is that true? If so, how can we continue to work on this problem?
The Freebase API has been deprecated, so entity names can no longer be obtained through it. However, Google has provided Freebase/Wikidata mappings for 2.1M entities. These can be used to map Freebase entities to Wikidata entities and obtain their names and other information.
Additionally, http://sameas.org/ provides some additional mappings for freebase entities.
Edit:
You can also read the Freebase Dump to get these mappings. I personally used the following properties to get the corresponding entity names:
ENTITY_GET_LABEL_ORDER = [
    '<http://rdf.freebase.com/ns/type.object.name>',
    '<http://rdf.freebase.com/ns/common.topic.alias>',
    '<http://rdf.freebase.com/key/en>',
    '<http://rdf.freebase.com/key/wikipedia.en>',
    '<http://rdf.freebase.com/key/wikipedia.en_title>',
]
Even after doing this, a fair number of mids (roughly 1-5k; I do not remember the exact figure) could not be mapped to names. Only a small number of questions involve them, so those questions can safely be removed. Google also provides an additional dump that contains some deleted tuples.
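To illustrate the lookup order above, here is a minimal sketch that picks one label per mid by scanning triples and keeping the value from the highest-priority property. It assumes each dump line is tab-separated as subject, predicate, object, and a trailing dot; the function name best_labels and the sample subjects are made up for illustration, and language filtering (for example keeping only @en values) is left out.

```python
# Properties to try, in priority order (copied from the list above).
ENTITY_GET_LABEL_ORDER = [
    '<http://rdf.freebase.com/ns/type.object.name>',
    '<http://rdf.freebase.com/ns/common.topic.alias>',
    '<http://rdf.freebase.com/key/en>',
    '<http://rdf.freebase.com/key/wikipedia.en>',
    '<http://rdf.freebase.com/key/wikipedia.en_title>',
]


def best_labels(lines):
    """Hypothetical helper: map each subject to the object of its
    highest-priority label property. Assumes tab-separated triples."""
    rank = {pred: i for i, pred in enumerate(ENTITY_GET_LABEL_ORDER)}
    best = {}  # subject -> (rank, object)
    for line in lines:
        parts = line.rstrip('\n').split('\t')
        if len(parts) < 3:
            continue  # skip malformed lines
        subj, pred, obj = parts[0], parts[1], parts[2]
        r = rank.get(pred)
        if r is None:
            continue  # not a label-like property
        if subj not in best or r < best[subj][0]:
            best[subj] = (r, obj)
    return {subj: obj for subj, (_, obj) in best.items()}


# Made-up sample triples (the subjects are placeholders, not real mids).
sample = [
    '<m.a>\t<http://rdf.freebase.com/key/en>\t"alpha_key"\t.',
    '<m.a>\t<http://rdf.freebase.com/ns/type.object.name>\t"Alpha"@en\t.',
    '<m.b>\t<http://rdf.freebase.com/key/wikipedia.en>\t"Beta_page"\t.',
    '<m.c>\t<http://rdf.freebase.com/ns/type.object.type>\t<x>\t.',
]
labels = best_labels(sample)
```

Subjects with none of the listed properties (like the third placeholder above) simply do not appear in the result, which makes it easy to count how many mids remain unnamed.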
I am trying to model a one-to-many relationship in which a user can have links to many objects in an organization bucket.
I would like to walk the links and return the results.
I am upgrading StackMob's Scala driver to support link walking: https://github.com/megamsys/scaliak
Any help would be greatly appreciated. The forums talk about using MapReduce.
Link walking is deprecated in the latest version of Riak, and will likely be removed in future versions. So it probably doesn't make sense to upgrade the Scala driver to support it.
The real question here is - how should you model a One to Many relationship in Riak? There are two main approaches to this, depending on if you have a read-heavy or a write-heavy use case.
1 - Links as Lists of Keys
You can store the list of links/associations as a separate object for easy retrieval. For example, if I have a users object stored at /buckets/users/keys/user-id-123:
{ id: "user-id-123", name: "Dmitri", ... }
I can then store the organizations that user belongs to (notice that I'm using the same key for the user and for their membership object) in /buckets/user-orgs/keys/user-id-123:
["organization-id-1", "organization-id-2", "organization-id-3"]
This allows me to answer the question of "Which organizations does this user belong to?" with a single GET to the user-orgs object (and, optionally, a multi-get to fetch each of the organization objects by their IDs).
Note: If you're using Riak 2.0 or above, you can use the new Riak Data Types (specifically, the Sets data type) to store that list of IDs. (The Sets data type provides operations to add, remove, and fetch elements in a way appropriate for distributed systems.)
Use this approach when you have a read-heavy use case (when the list of links is read frequently, but is not written to/updated frequently).
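A minimal sketch of this first approach follows. It uses plain Python dictionaries as a stand-in for Riak buckets, so put, get, and orgs_for_user are hypothetical helpers and not calls from any Riak client; the organization names are made up. The point is only the access pattern: one read for the membership list, then one read per organization.

```python
# In-memory stand-in for Riak buckets: {bucket: {key: value}}.
# This is NOT a Riak client; it only mimics key/value GET and PUT.
store = {"users": {}, "user-orgs": {}, "organizations": {}}


def put(bucket, key, value):
    store[bucket][key] = value


def get(bucket, key):
    return store[bucket].get(key)


def orgs_for_user(user_id):
    """One GET for the list of org IDs, then a multi-get for the orgs."""
    org_ids = get("user-orgs", user_id) or []
    return [get("organizations", org_id) for org_id in org_ids]


put("users", "user-id-123", {"id": "user-id-123", "name": "Dmitri"})
put("organizations", "organization-id-1", {"id": "organization-id-1", "name": "Org One"})
put("organizations", "organization-id-2", {"id": "organization-id-2", "name": "Org Two"})
# Same key as the user object, but in the user-orgs bucket.
put("user-orgs", "user-id-123", ["organization-id-1", "organization-id-2"])

orgs = orgs_for_user("user-id-123")
```

Updating the membership means rewriting the whole list object, which is why this layout suits lists that are read far more often than they change.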
2 - Search / Query for the links
The other main approach is to use indexes (preferably via the Solr-based Riak Search, or, for rare cases, via Secondary Indexes) and queries to retrieve one-to-many association objects.
So, if you had a user object stored at /buckets/users/keys/user-id-123:
{ id: "user-id-123", name: "Dmitri", ... }
You would then insert multiple "membership entry" objects into /buckets/user-orgs/, which must be search-enabled (meaning you have created a search index and associated it with the user-orgs bucket):
{user_id: "user-id-123", org_id: "organization-1"}
{user_id: "user-id-123", org_id: "organization-2"}
{user_id: "user-id-123", org_id: "organization-3"}
Afterwards, you can answer the question "Which organizations does the user belong to?" by issuing a Search query such as "Give me all of the objects in user-orgs where user_id equals user-id-123".
Incidentally, using Search / membership objects like this also allows you to model a Many-To-Many relationship (meaning, you can also answer the question "Which users belong to the organization organization-id-1?").
Because Search queries are more expensive than a single GET to fetch a membership list (like in the first strategy), you should use this strategy when you're not in a read-heavy use case (when the membership objects are updated often, but not read often), or when you need to also model the inverse relationship (many to many).
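The membership-object approach can be sketched as follows. A linear scan over a Python list stands in for the search index here, so the search function is a hypothetical placeholder rather than Riak Search or a Secondary Index query, and the second user ID is made up. It shows how the same set of entries answers the question in both directions.

```python
# One small object per (user, organization) pair, as in the entries above.
memberships = [
    {"user_id": "user-id-123", "org_id": "organization-1"},
    {"user_id": "user-id-123", "org_id": "organization-2"},
    {"user_id": "user-id-123", "org_id": "organization-3"},
    {"user_id": "user-id-456", "org_id": "organization-1"},  # made-up second user
]


def search(field, value):
    """Placeholder for an index query: return entries where field == value."""
    return [entry for entry in memberships if entry[field] == value]


def orgs_of(user_id):
    return [entry["org_id"] for entry in search("user_id", user_id)]


def users_of(org_id):
    return [entry["user_id"] for entry in search("org_id", org_id)]
```

Adding or removing a membership touches only one small object, which is the reason this layout suits write-heavy data.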
Note: Do not use Map/Reduce to model one-to-many relationships, and don't use the deprecated Link Walking mechanism (which uses Map/Reduce on the backend, anyways).
Should I be using entity_load or EntityFieldQuery to get entity ids from a custom entity?
I was going to use entity_load to pull all of the entities in question of a particular type, and grab their relevant information (but that seems like it could be inefficient).
EntityFieldQuery will only return an array of entity IDs. If that is all you need then EntityFieldQuery will be much faster.
If you need the field values, use entity_load. It is slow, but it is the Drupal way.
If there is a very large number of entities, you may run into timeouts. To overcome this, use Drupal's Batch API, or use the Database API to write a custom query that pulls in exactly the data you need in one query. The custom query is technically faster, but it requires more code and can break compatibility.
I am studying https://www.doctrine-project.org/projects/doctrine-orm/en/2.6/reference/working-with-associations.html but I cannot figure out what cascade merge does. I have seen elsewhere that
$new_object = $em->merge($object);
basically creates a new managed object based on $object. Is that correct?
$em->merge() is used to take an Entity which has been taken out of the context of the entity manager and 'reattach it'.
If the Entity was never managed, merge is equivalent to persist.
If the Entity was detached, or serialized (put in a cache perhaps) then merge more or less looks up the id of the entity in the data store and then starts tracking any changes to the entity from that point on.
Cascading a merge extends this behavior to associated entities of the one you are merging. This means that changes are cascaded to the associations and not just the entity being merged.
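As a rough mental model only, the behaviour described above can be mimicked with a toy identity map in Python. This is not Doctrine's implementation: ToyEntityManager, the dictionary-based entities, and the "children" key are all made up for illustration, and the non-cascading case is simplified to "associated objects are left alone".

```python
class ToyEntityManager:
    """Toy model of merge: copy state onto a managed instance and return it."""

    def __init__(self):
        self.identity_map = {}  # id -> managed instance (a dict)

    def merge(self, entity, cascade=False):
        # Look up (or create) the managed copy for this identifier.
        managed = self.identity_map.setdefault(entity["id"], {})
        # Copy scalar state from the detached object onto the managed copy.
        for key, value in entity.items():
            if key != "children":
                managed[key] = value
        # With cascade, associated objects are merged as well.
        if cascade:
            managed["children"] = [
                self.merge(child, cascade=True)
                for child in entity.get("children", [])
            ]
        return managed


em = ToyEntityManager()
em.identity_map[1] = {"id": 1, "name": "old"}
detached = {"id": 1, "name": "new", "children": [{"id": 2, "name": "child"}]}
managed = em.merge(detached, cascade=True)
```

In this model the object returned by merge is the managed copy, while the object that was passed in stays untracked, so later changes should be made to the returned value.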
I know this is an old question, but I think it is worth mentioning that $em->merge() is deprecated and will be removed soon. The deprecation notice reads:
Merge operation is deprecated and will be removed in Persistence 2.0.
Merging should be part of the business domain of an application rather
than a generic operation of ObjectManager.
Also, please read the v3 documentation on how entities are expected to be stored:
https://www.doctrine-project.org/projects/doctrine-orm/en/latest/cookbook/entities-in-session.html#entities-in-the-session
It is a good idea to avoid storing entities in serialized formats such
as $_SESSION: instead, store the entity identifiers or raw data.
I am attempting to develop a Freebase Explorer application. One part of the application allows a user to drill down through Freebase domains, then types, then type instances; finally, using the Freebase Topic API, I display the selected type instance. However, many of the type instance lists have "null" for the name and machine IDs for the id.
What combination of Freebase API calls can I use to return something of value/interest (human-readable) from a Freebase mid?
Where should I look in the Freebase site/wiki for help?
A machine ID (MID) can be used anywhere any other ID is used in Freebase. There's no requirement that an object have a name. "Something of value/interest" will depend a lot on the context, but the types and property values of an object help show how it's connected to the rest of the graph.
You might also look at the existing Freebase Schema Explorer app for ideas and inspiration.
Tom's explanation regarding machine IDs is spot on. Here is some additional information:
Domains and types are schema objects and it's preferable that you use human readable ids for these. Items "of interest" are usually topics, and those are all objects that are typed with /common/topic.
You can use MQL to get a list of types and domains, and then as you say use the Topic API - which will also be available in the new APIs - to get all the data for a given topic.
I am writing a vocabulary learning application.
I have a Wordset Entity.
I want it to contain a property, WordsToLearn: a collection of words for the CURRENT user to learn, meaning words that are either new (i.e. have no repetitions for the current user) or have a repetition due today or earlier.
How can I implement this?
Without this my object seems very incomplete.
Are entities limited to simple relationships, so that I should forget about it here and move it to a Wordset repository?
It would be very useful to be able to get that information (wordsToLearn) from the Wordset object.
Yes, entities are limited to these simple relationships. For more complex queries you have to use a WordsetRepository that you can pass your current_user object to and use that to get the desired entities in your controllers. You can use Doctrine's DQL to fetch 'real' entity objects instead of just SQL results.
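The shape of that repository method can be sketched in a language-neutral way; the Python below is only an illustration with in-memory data, not Doctrine or DQL. The class layout, the Repetition fields, and the method name words_to_learn are assumptions made for the example. The user and the current date are passed in explicitly, which is what keeps this logic out of the entity.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Repetition:
    user_id: int
    word: str
    due: date


class WordsetRepository:
    """Hypothetical repository holding wordsets and repetitions in memory."""

    def __init__(self, wordsets, repetitions):
        self.wordsets = wordsets        # {wordset_id: [word, ...]}
        self.repetitions = repetitions  # list of Repetition

    def words_to_learn(self, wordset_id, user_id, today):
        # Due date per word for this user only.
        due_by_word = {
            r.word: r.due for r in self.repetitions if r.user_id == user_id
        }
        # A word qualifies if it is new for the user or due today or earlier.
        return [
            word
            for word in self.wordsets[wordset_id]
            if word not in due_by_word or due_by_word[word] <= today
        ]


repo = WordsetRepository(
    wordsets={1: ["apple", "bridge", "cloud"]},
    repetitions=[
        Repetition(user_id=7, word="apple", due=date(2024, 1, 10)),
        Repetition(user_id=7, word="bridge", due=date(2024, 3, 1)),
    ],
)
todo = repo.words_to_learn(wordset_id=1, user_id=7, today=date(2024, 2, 1))
```

A controller would call the repository with the logged-in user instead of reading a property from the Wordset object.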