I think Google Cloud Vision has excellent accuracy, and through the API it is very easy to use. But I don't know how many training labels it has, which sometimes makes it difficult to use the scores efficiently and to compare them with other outputs.
In the official documentation, I couldn't find any description of the label list or anything similar.
Does anyone know how many labels it has and what rules govern them? (It seems to have some hierarchical structure.)
Related
I have several documents (PDF and TXT) on my notebook, and I want to construct a knowledge graph from them using Grakn.
Through Google I found a blog post, but there is no documentation or README explaining how to do that.
The blog also says "The script to mine text can be found on our GitHub repo here", but I am failing to understand what I have to do.
Can someone here advise me how to construct a knowledge graph from text using Grakn?
Grakn is a knowledge engine/network that represents knowledge through well-defined entities and relations (ontologies), so you need to use NLP (natural language processing) to make human language accessible to the graph. You may also need OCR (optical character recognition) to convert text embedded in images into plain text, and you should teach the network basic ontologies so it can understand the texts. You are, in effect, heading into Singularity territory.
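As a minimal sketch of the OCR step in Python (this assumes the Tesseract binary plus the pytesseract and Pillow packages are installed; for PDFs you would first render each page to an image, e.g. with pdf2image):

```python
# Minimal OCR sketch: turn a scanned page image into plain text with Tesseract.
from PIL import Image
import pytesseract

# "scanned_page.png" is a hypothetical file name - point this at your own page image.
text = pytesseract.image_to_string(Image.open("scanned_page.png"))
print(text)
```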
To give an example of how to go from a collection of text to a knowledge graph, let us assume that all of your text is concerned with a certain domain of knowledge - in the example of the blog post you mention, we are dealing with biomedical research publications.
A first step could be to find entities, or defined "things", in the text. To stick with the biomedical example, we could look for drugs and genes mentioned in the publications. This is called named-entity-recognition (NER), a technique applied in text-mining.
If a certain drug is often mentioned in the same publication as a particular gene, they "co-occur" and are likely related in some way. This would be an example of a relationship. The automated extraction of exactly how they are related is a difficult problem and is called relationship-extraction (RE).
Solutions for both NER and RE are usually domain-specific (ranging from simple matching of dictionary terms to AI models).
If you are interested in text-mining, a good place to start in Python is NLTK.
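As a rough sketch of the NER-plus-co-occurrence idea with NLTK (the chunker below is NLTK's general-purpose one and the example sentences are invented; a biomedical pipeline would swap in a domain-specific NER model or a dictionary of drug and gene names):

```python
# Sketch: find named entities per document and count how often pairs co-occur.
from itertools import combinations
from collections import Counter
import nltk

# One-time NLTK downloads needed: punkt, averaged_perceptron_tagger,
# maxent_ne_chunker, words.

documents = [
    "Barack Obama met Angela Merkel in Berlin.",
    "Angela Merkel later wrote about Barack Obama and Berlin.",
]

cooccurrence = Counter()
for doc in documents:
    tree = nltk.ne_chunk(nltk.pos_tag(nltk.word_tokenize(doc)))
    # Named-entity chunks are the subtrees below the sentence root.
    entities = {" ".join(token for token, _ in subtree.leaves())
                for subtree in tree.subtrees() if subtree.label() != "S"}
    for pair in combinations(sorted(entities), 2):
        cooccurrence[pair] += 1

# Pairs that co-occur frequently are candidate relationships for the graph.
print(cooccurrence.most_common())
```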
The idea of a knowledge graph is to put defined things, called entities, into defined relationships with one another to create context. After you have a list of entities found in all your documents, as well as their relationships (as in the example above, co-occurrence in a document or even a single sentence), you can define a schema, upload the entities and relationships into Grakn, and use all of its functionality to analyze your data.
For a tutorial on how to use Grakn with already extracted data, see here.
I have a set of 20'000 words and simple phrases. I need to take each word and determine its general concept, or category.
So if I take "hockey" it should fall into a large "Sports" category. If it's "Barack Obama" then it's "Politics". Here is a sample from my word list:
israel
illness
face
experts
throat
tory
moments
numerous
All the weird stuff can fall into "General" category.
That's my problem. Following are my thoughts, which you can probably ignore, because I have no good idea how to deal with the problem.
I am probably looking for some kind of open dictionary or API that can give the general concept of a word. I was thinking of taking a simple dictionary and running every word through it, parsing out category tags such as "Economics". But not all words have such tags.
I could point you to http://dbpedia.org/. It's an ontology built from the data of many Wikipedia info-boxes, and it has a SPARQL endpoint for queries. I used it two years ago, but the API seems to have changed, so I can't give you a definitive example right now. But it has pretty good documentation.
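That said, a rough sketch of the kind of category lookup you could run against the public endpoint might look like this (using the SPARQLWrapper Python package; the endpoint URL and the dct:subject predicate are assumptions based on the current DBpedia setup):

```python
# Sketch: ask DBpedia which categories the resource "Hockey" belongs to.
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("https://dbpedia.org/sparql")
sparql.setQuery("""
    SELECT ?category WHERE {
      <http://dbpedia.org/resource/Hockey> <http://purl.org/dc/terms/subject> ?category .
    }
""")
sparql.setReturnFormat(JSON)

for row in sparql.query().convert()["results"]["bindings"]:
    print(row["category"]["value"])  # e.g. category URIs related to sports
```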
It sounds like you want to do topic modeling. The R packages quanteda, Snowball, and tm are good places to start. A resource for doing topic modeling with the mallet package is here:
http://www.matthewjockers.net/materials/dh-2014-introduction-to-text-analysis-and-topic-modeling-with-r/
The general idea of topic modeling is that your words came from documents that were themselves about a certain topic. Topic modeling checks which words occur together in the same documents, and assumes that, over many documents, those words are probably about the same topic. Hopefully this helps.
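If you end up working in Python instead of R, the same idea looks roughly like this with gensim's LDA implementation (a sketch only; the toy documents and topic count are made up):

```python
# Minimal LDA sketch: infer topics from which words co-occur in documents.
from gensim import corpora, models

documents = [
    ["hockey", "team", "goal", "ice"],          # toy "sports" document
    ["election", "policy", "obama", "senate"],  # toy "politics" document
]

dictionary = corpora.Dictionary(documents)
corpus = [dictionary.doc2bow(doc) for doc in documents]

lda = models.LdaModel(corpus, num_topics=2, id2word=dictionary, passes=10)
for topic_id, words in lda.print_topics():
    print(topic_id, words)
```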
Caveat emptor - I'm neither a linguist nor a graph theorist; however, I am a [Java] developer wishing to use a graph database for persistence, and the following topic is of interest to me and, I hope, to others.
OK, the idea is to have some application or code to:
recognise the embedded relationship structures between named entities within a given piece of text
apply or expose these discovered relationships to usage within a Graph database structure.
In such a system, the text might essentially form a basic, layman-written graph schema of sorts. To better visualise this, here is some [very] basic text:
Andrew is married to Jane
Using the online CLAWS parts-of-speech tagger (POS), I'm given the following:
Andrew_NP0 is_VBZ married_AJ0 to_SENT Jane_NP0
According to 'The BNC Basic (C5) Tagset' (Oxford University), NP0 = 'proper noun', which is a name (as you know), but these NP0-tagged entries would lend themselves to becoming graph vertex instances/nodes (the end user could be further prompted to give these entries an encompassing 'type/description'). The verbs (VBZ) and adjectives (AJ0) might highlight graph relationships.
Once the end user has confirmed their graph representation, they might export it to GraphML, for re-import into a graph database such as Titan or Neo4j.
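A rough sketch of that pipeline in Python might look like the following (note that NLTK's default tagger uses the Penn Treebank tagset rather than CLAWS C5, and the output file name and the relation heuristic are invented for illustration):

```python
# Sketch: tag a sentence, treat proper nouns as nodes and the verb phrase
# between them as the edge label, then export to GraphML.
import nltk               # needs the punkt and averaged_perceptron_tagger data
import networkx as nx

sentence = "Andrew is married to Jane"
tagged = nltk.pos_tag(nltk.word_tokenize(sentence))   # e.g. [('Andrew', 'NNP'), ('is', 'VBZ'), ...]

nodes = [word for word, tag in tagged if tag.startswith("NNP")]
relation = " ".join(word for word, tag in tagged if tag.startswith("VB"))

graph = nx.Graph()
if len(nodes) >= 2:
    graph.add_edge(nodes[0], nodes[1], relation=relation or "related_to")

# GraphML output that graph-database tooling (e.g. Titan, or Neo4j via APOC)
# can import.
nx.write_graphml(graph, "sentence.graphml")
```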
So, the overall idea is to have a tool that allows a layman end user to create graph-theory-based database structures using everyday language.
Does such a tool exist already?
Some of my observations above were influenced, in some way, by the following tools (amongst others):
http://www.plantuml.com <- UML diagrams defined using a simple and intuitive language
http://www.planttext.com <- See plantuml
http://www.acqualia.com/soulver <- An NLP-based calculator and currency exchange tool, using natural sentence phrases
http://nlp.stanford.edu/software/tagger.shtml <- Stanford Log-linear Part-Of-Speech Tagger
Yes, this exists in many different places. Examples include OpenCalais (which was created by Reuters) and the AlchemyAPI. There are a bunch of other toolkits and APIs like NLTK and IBM's UIMA that don't present you with a finished solution, but a bunch of tools necessary to build a bespoke solution.
This is a very deep area, subject to ongoing research. I can't cover all of it here, but one thing to keep in mind is that solutions in this space are often highly specific to a certain "corpus" of documents. Software which does any arbitrary English text well doesn't really exist. Instead what you see is solutions that do it really well for business press releases. Or intelligence reports. Or newspaper articles. Or medical alerts. But not any, arbitrary text.
The area is also rife with problems; one of the big ones is known as "Named Entity Recognition", together with the closely related task of deciding whether two mentions refer to the same entity. Consider:
Andrew is married to Jane. Andrew bought eggs yesterday.
How many people are being discussed here? Is the second Andrew the same as the first? That's a very complicated and contextual question. But you better get it right, otherwise you might have more or fewer "person" nodes in your resulting graph than you expect.
I'm looking for a place API that can be used with a map API. Here are three APIs I've been thinking about:
- Google Maps/Places: https://developers.google.com/maps/
- Microsoft Bing: https://www.microsoft.com/maps/developers/mapapps.aspx
- Nokia Maps: http://api.maps.nokia.com/2.1.0/devguide/overview.html
They all seem likely to give good results. The application I'm going to work on is about travel information, so we would like to use the best API for finding sightseeing spots, accommodation, and restaurants; we don't care about dentists, grocery stores, etc., which are not related to travel.
Which one do you guys think would be the best for our needs? (if you think of another good API that I didn't mention, make sure to let me know!)
Thank you,
J
It is difficult to give an absolute answer here because the quality of the data behind each of the APIs will vary from place to place, and what is "best" will depend on the nature of your app and the problems it solves. For example, the extent of Google's data (to pick one of your options) is generally perceived to be stronger in the Americas and weaker in Europe. Another example I have heard of is a Brazilian company that decided on Nokia Maps because it had better coverage in rural areas, even though it was weaker in the big cities. And of course the breadth and quality of the data may change over time.
I would guess that your best option here would be to run a simple beauty contest.
Take as a starting point the code examples from the relevant API developer sites
Bing Search
Google Search
Nokia Search
Then modify the code to obtain the same results for some of your typical use cases (e.g. accommodations, restaurants), then score each API according to your criteria:
What sort of coverage is obtained in an area that is relevant for you?
How easy is it to modify the code?
Do you like the way the results are presented?
How easy is it to get more detailed information?
How much does a data plan for the API cost?
Then use the score card to work out which API is best for you - a simple weighted tally like the sketch below is usually enough.
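For example, a minimal weighted score card might look like this (the criteria, weights, and raw scores are invented placeholders to be replaced with your own findings):

```python
# Sketch of a weighted score card for comparing the three place APIs.
weights = {"coverage": 0.4, "ease_of_use": 0.2, "result_quality": 0.2, "detail": 0.1, "cost": 0.1}

scores = {  # 1 (poor) to 5 (good), per API and criterion - made-up numbers
    "google": {"coverage": 4, "ease_of_use": 5, "result_quality": 4, "detail": 4, "cost": 3},
    "bing":   {"coverage": 3, "ease_of_use": 4, "result_quality": 3, "detail": 3, "cost": 4},
    "nokia":  {"coverage": 5, "ease_of_use": 3, "result_quality": 4, "detail": 3, "cost": 4},
}

for api, s in scores.items():
    total = sum(weights[criterion] * s[criterion] for criterion in weights)
    print(f"{api}: {total:.2f}")
```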
Here is an example of the difference in coverage from all three APIs for "bookshops" in Berlin:
In this particular case the Nokia API returns more data, but a different result may be given if you look for say "Bookshops in Boston" - you need to decide which locations and which queries are most relevant to your application.
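As a sketch of how such a coverage check could be automated for one provider (the endpoint reflects Google's public Places text-search web service; the API key is a placeholder, and the other providers have their own equivalents to query the same way):

```python
# Sketch: count how many results one provider returns for a typical query.
import requests

API_KEY = "YOUR_KEY"  # placeholder - use your own key

resp = requests.get(
    "https://maps.googleapis.com/maps/api/place/textsearch/json",
    params={"query": "bookshops in Berlin", "key": API_KEY},
)
results = resp.json().get("results", [])
print(f"{len(results)} results returned")  # repeat the same query against each provider
```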
I think the following list of APIs will be helpful to you.
http://www.programmableweb.com/news/134-travel-apis-kayak-yahoo-travel-and-hotelscombined/2012/02/28
There are about 134 APIs there specifically meant for travel apps.
My teammates and I have a very challenging new project to do, and we are supposed to submit it next week. We don't have a single clue about how to do it, and really need help. We are undergraduate students, new to Information Retrieval and AI, and really need your ideas.
The project is roughly:
When an expert is cited in a document, find an expert with an opposing opinion & find out what he/she says about that topic.
We are free to use any programming language, but we are not concerned with the programming itself. We would like help getting started. Please give us a rough idea of how to design such a system and how to retrieve information from the internet. How should we get the cited expert's opinion, and then find an opposing opinion?
Simple: use Amazon's Mechanical Turk.
Without that (or an equivalent) you're in trouble. If there are no further constraints on the problem then you will need a full-blown AI, the kind that doesn't yet exist. If there are severe constraints then you might have a chance of doing this in a week. If the expert can be in any field (medicine, politics, history, fashion, science, comic books, etc.) then there will be no single, well-organized repository of essays. You'll have to use Google to find Dr. X's opinion. Once you find Dr. X's writing (and let's pray it's text, not audio) you'll have to do some kind of natural language processing to get the thrust of it, even if you're lucky enough to find a descriptive title ("Digital Photography Is Absolutely Great"). Then you have to figure out its opposite. What's the opposite of "Neil Gaiman draws on folklore for his story ideas"? Figuring out what opinion you're looking for will be a serious problem. After that, things actually get easier: you can google for the subject and use the same magic tools to find the opinion you're looking for.
So what do you have a chance of solving? A search for opinions that someone else has already organised into "pro" and "con". Some online political forums are organised that way. Wikipedia cites opposing views in a special section in some of its articles. Science journals print letters of rebuttal. Look around and you might find a site that is even more cut-and-dried. Choose a small enough arena and you'll have a tractable problem.
EDIT: Damn, Ben Dunlap beat me to all my major points in a comment. Sigh
Sounds like an NLP problem to me. For information about documents and citations, http://citeseerx.ist.psu.edu should be a good starting point.
For each paper, there are several citations that refer to it. At a minimum, you would have to scan the abstract of the paper and the abstracts of the citing papers, and run your own algorithm to determine whether any citation takes the opposing view. Maybe your professor can give you hints on an approximate heuristic, but as far as I know it is a really hard problem.
I would be watching this thread for more interesting approaches.
Automatically submit a Google search request similar to "expert_name sucks", "expert_name wrong", or something like that. Find the first result that has "PhD" with a document link in the same sentence and return the link.
I think you might be blowing this up a little too big... as an undergraduate project, I would approach it on a smaller scale.
Unless your specification says you must use actual internet resources, you would be better off creating your own database of short custom documents. Add metadata to each document stating the points it makes about certain topics.
Next, I would create a list of citations that link to each document and add some metadata representing that expert's stance on the topic. When someone reads a document, I would augment the list of citations with links to documents that take alternative views on that topic.
Basically it would consist of these tables:
Document (id, data)
DocumentPoints (documentId, topic, stance)
Citation (documentId, topic, stance)
And when someone loads up a document, the citations are pulled up as well. For each citation, you search DocumentPoints for the same topic with a different stance (a sketch of that lookup is below). The most difficult part of this project would be creating the 5 or 6 documents you need to populate your database. After that, the solution is trivial.
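A minimal sketch of that lookup, using sqlite3 in Python with the pseudo-schema above (the sample rows are invented purely for illustration):

```python
# Sketch: given a loaded document, find documents with an opposing stance
# on the topics its citations cover.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE Document (id INTEGER PRIMARY KEY, data TEXT);
    CREATE TABLE DocumentPoints (documentId INTEGER, topic TEXT, stance TEXT);
    CREATE TABLE Citation (documentId INTEGER, topic TEXT, stance TEXT);

    INSERT INTO Document VALUES (1, 'Dr. X on digital photography'),
                                (2, 'Dr. Y rebuttal');
    INSERT INTO DocumentPoints VALUES (1, 'digital photography', 'pro'),
                                      (2, 'digital photography', 'con');
    INSERT INTO Citation VALUES (1, 'digital photography', 'pro');
""")

rows = con.execute("""
    SELECT d.id, d.data, p.stance
    FROM Citation c
    JOIN DocumentPoints p ON p.topic = c.topic AND p.stance <> c.stance
    JOIN Document d ON d.id = p.documentId
    WHERE c.documentId = ?
""", (1,)).fetchall()

print(rows)  # -> [(2, 'Dr. Y rebuttal', 'con')]
```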
On a side note, most of these other answers are telling you to use some existing solution... don't do that unless the assignment tells you to. You'll be much better off understanding the problem and the various ways to solve it (this is definitely not the only or best one) if you work through the entire problem yourself. If the teacher asks you to do something not supported by whatever product you chose to build your solution on, you won't be able to fix it. If you had written it yourself, you could just as easily implement the new spec as well.