What are the symbols for specializations and generalizations etc. in crow foot notation - erd

My university is forcing me to learn from a terrible textbook on ERDs, in which they are using a notation I personally don't like because I've never seen it used before (the book is so bad it doesn't even say what notation they're using), and I'd rather learn it using a more common notation. Therefore I chose to learn it using crow's foot notation. (Please enlighten me if you think this is a bad decision.)
Now the textbook is covering is-a(n) relationships (a.k.a. specializations/generalizations) and I'd like to know how to represent one in a way that is consistent with Martin/crow's foot notation... I learned about it thanks to a YouTube video (https://www.youtube.com/watch?v=MTG1zl8PkXk) but I noticed he's not using the same notation I'm using.
So how do I represent a specialization or generalization in crow's foot notation? Or is crow's foot notation specific to cardinality only? In my textbook, a few pages ahead, I also see concepts like multidimensional relationships (entity A has the same relation with entity B as with entity C) and relationships that refer back to the same entity itself (so one employee can hire multiple other employees). Extra much love if you can show me how I should draw those as well :)
Unfortunately I could not find much information on this using search engines...

Try searching on "EER Diagram PDF". You'll get more images than you can shake a stick at. Some of them use crow's foot notation. Others don't.
The extra "E" stands for "Enhanced". This has to do with the fact that the original ER model did not have modeling conventions for Gen-spec (superclass/subclass) or for unions.
Unlike most people, I prefer to make a sharp distinction between diagrams that depict an ER model and ones that depict a relational model. Contrary to prevailing opinion, ER modeling isn't just "relational lite". It's a different model, with different purposes. You can look up the history if you are really interested.
I tend to use crow's foot notation in ER diagrams, and I always leave out junction boxes and foreign keys. This makes the diagram more useful for stakeholders who want to see the big picture.
I like arrowhead notation for relational diagrams. Foreign keys and junction boxes must be included in relational diagrams. They are part of the model, and implement relationships.
As far as a relational table design for gen-spec, I don't think you can beat Fowler's treatment of the subject. Try searching on "Fowler Class Table Inheritance" for an entry point into this aspect of the topic.
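As a rough illustration of the class table inheritance idea, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names (person, employee, student) are hypothetical examples, not taken from Fowler's text or from the question; the point is only that each subtype table shares its primary key with the supertype table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# Supertype table: attributes shared by every person (hypothetical schema).
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")

# Subtype tables: the primary key is also a foreign key to person.id,
# so every employee or student row "is a" person row.
conn.execute("""
    CREATE TABLE employee (
        person_id INTEGER PRIMARY KEY REFERENCES person(id),
        salary REAL NOT NULL
    )""")
conn.execute("""
    CREATE TABLE student (
        person_id INTEGER PRIMARY KEY REFERENCES person(id),
        major TEXT NOT NULL
    )""")

conn.execute("INSERT INTO person (id, name) VALUES (1, 'Ann')")
conn.execute("INSERT INTO employee (person_id, salary) VALUES (1, 50000.0)")
conn.execute("INSERT INTO person (id, name) VALUES (2, 'Bob')")
conn.execute("INSERT INTO student (person_id, major) VALUES (2, 'Physics')")

# Reading a subtype means joining it back to the supertype.
employees = conn.execute(
    "SELECT p.name, e.salary FROM person p JOIN employee e ON e.person_id = p.id"
).fetchall()
```

In this sketch, nothing stops one person from appearing in both subtype tables; whether that overlap is allowed is a modelling decision the schema above deliberately leaves open.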

Related

Graph database vertex/edge inference from a text (i.e. an informal Graph 'schema'), using Natural Language Processing (NLP) - does this exist?

Caveat Emptor - I'm neither a linguist nor a Graph theorist, however, I am a [Java] developer wishing to use a Graph database for persistence and the following topic is of interest to me, and I hope to others.
OK, the idea is to have some application or code to:
recognise the embedded relationship structures between named entities within a given piece of text
apply or expose these discovered relationships to usage within a Graph database structure.
In such a system, the text might essentially form a basic, layman-written graph schema of sorts. To better visualise this, here is some [very], basic text:
Andrew is married to Jane
Using the online CLAWS parts-of-speech tagger (POS), I'm given the following:
Andrew_NP0 is_VBZ married_AJ0 to_SENT Jane_NP0
According to 'The BNC Basic (C5) Tagset' (Oxford University), NP0 = 'Proper noun', which is a name (as you know), and these NP0-tagged entries would lend themselves to becoming graph vertex instances/nodes (the end user could be further prompted to give these entries an encompassing 'type/description'). The verb(s), VBZ, and adjective(s), AJ0, might highlight graph relationships.
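To make the idea concrete, here is a minimal sketch of that mapping in plain Python. It assumes the tagger output is a string of word_TAG tokens as shown above, treats NP0 tokens as vertices, and joins VBZ and AJ0 tokens into a relationship label. The function name tagged_to_edge and the two-vertex restriction are my own simplifications, not part of CLAWS or any graph database API.

```python
def tagged_to_edge(tagged_sentence):
    """Turn 'word_TAG' tokens into a (source, relation, target) triple, or None."""
    # Assumes every token contains an underscore separating word and tag.
    tokens = [tok.rsplit("_", 1) for tok in tagged_sentence.split()]
    # Proper nouns become candidate vertices.
    vertices = [word for word, tag in tokens if tag == "NP0"]
    # Verbs and adjectives are joined into a candidate relationship label.
    relation = "_".join(word for word, tag in tokens if tag in ("VBZ", "AJ0"))
    if len(vertices) != 2 or not relation:
        return None  # this sketch only handles simple two-entity sentences
    return (vertices[0], relation, vertices[1])


edge = tagged_to_edge("Andrew_NP0 is_VBZ married_AJ0 to_SENT Jane_NP0")
```

For the example sentence this yields the triple ("Andrew", "is_married", "Jane"), which the end user could then confirm or correct before anything is exported.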
Once the end user has confirmed their graph representation, they might export it to GraphML, for re-import into a graph database such as Titan or Neo4j.
So, the overall idea is to have a tool that allows a layman end user the ability to create Graph-theory-based database structures, using everyday language.
Does such a tool exist already?
Some of my observations above were influenced, in some way, by the following tools (amongst others):
http://www.plantuml.com <- UML diagrams defined using a simple and intuitive language
http://www.planttext.com <- See plantuml
http://www.acqualia.com/soulver <- An NLP-based calculator and currency exchange tool, using natural sentence phrases
http://nlp.stanford.edu/software/tagger.shtml <- Stanford Log-linear Part-Of-Speech Tagger
Yes, this exists in many different places. Examples include OpenCalais (which was created by Reuters) and the AlchemyAPI. There are a bunch of other toolkits and APIs like NLTK and IBM's UIMA that don't present you with a finished solution, but a bunch of tools necessary to build a bespoke solution.
This is a very deep area, subject to ongoing research. I can't cover all of it here, but one thing to keep in mind is that solutions in this space are often highly specific to a certain "corpus" of documents. Software which does any arbitrary English text well doesn't really exist. Instead what you see is solutions that do it really well for business press releases. Or intelligence reports. Or newspaper articles. Or medical alerts. But not any, arbitrary text.
The area is also rife with problems; one of the big ones is known as "Named Entity Recognition". Consider:
Andrew is married to Jane. Andrew bought eggs yesterday.
How many people are being discussed here? Is the second Andrew the same as the first? That's a very complicated and contextual question. But you better get it right, otherwise you might have more or fewer "person" nodes in your resulting graph than you expect.

Generating articles automatically

This question is to learn and understand whether a particular technology exists or not. Following is the scenario.
We are going to provide 200 English words. The software can add an additional 40 words, which is 20% of 200. Using these, the software should write dialogs: meaningful dialogs with no grammar mistakes.
For this, I looked into Spintax and article spinning. But you know what they do: they take existing articles and rewrite them. That is not the best way to do this (or is it? Please let me know if it is). So, is there any technology capable of doing this? Maybe the semantic theory that Google uses? Any proven AI method?
Please help.
To begin with, a word of caution: this is quite the forefront of research in natural language generation (NLG), and the state-of-the-art research publications are not nearly good enough to replace a human teacher. The problem is especially complicated for students with English as a second language (ESL), because they tend to think in their native tongue before mentally translating the knowledge into English. If we disregard this fearful prelude, the normal way to go about this is as follows:
NLG comprises three main components:
Content Planning
Sentence Planning
Surface Realization
Content Planning: This stage breaks down the high-level goal of communication into structured atomic goals. These atomic goals are small enough to be reached with a single step of communication (e.g. in a single clause).
Sentence Planning: Here, the actual lexemes (i.e. words or word-parts that bear clear semantics) are chosen to be a part of the atomic communicative goal. The lexemes are connected through predicate-argument structures. The sentence planning stage also decides upon sentence boundaries. (e.g. should the student write "I went there, but she was already gone." or "I went there to see her. She has already left." ... notice the different sentence boundaries and different lexemes, but both answers indicating the same meaning.)
Surface Realization: The semi-formed structure attained in the sentence planning step is morphed into a proper form by incorporating function words (determiners, auxiliaries, etc.) and inflections.
In your particular scenario, most of the words are already provided, so choosing the lexemes is going to be relatively simple. The predicate-argument structures connecting the lexemes need to be learned using a suitable probabilistic learning model (e.g. hidden Markov models). The surface realization, which ensures the final correct grammatical structure, should be a combination of grammar rules and statistical language models.
At a high level, note that content planning is language-agnostic (but it is, quite possibly, culture-dependent), while the last two stages are language-dependent.
As a final note, I would like to add that the choice of the 40 extra words is something I have glossed over, but it is no less important than the other parts of this process. In my opinion, these extra words should be chosen based on their syntagmatic relation to the 200 given words.
For further details, the two following papers provide a good start (complete with process flow architectures, examples, etc.):
Natural Language Generation in Dialog Systems
Stochastic Language Generation for Spoken Dialogue Systems
To better understand the notion of syntagmatic relations, I found Sahlgren's article on the distributional hypothesis extremely helpful. The distributional approach in his work can also be used to learn the predicate-argument structures I mentioned earlier.
Finally, to add a few available tools: take a look at this ACL list of NLG systems. I haven't used any of them, but I've heard good things about SPUD and OpenCCG.

Besides BM25, what other ranking functions exist?

Besides BM25, what other ranking functions exist? Where can I find information on this topic?
BM25 is one of the term-based ranking algorithms. Nowadays there are concept-based algorithms as well.
BM25 is the state of the art in term-based information retrieval; however, there are some challenges that term-based approaches cannot overcome, such as relating synonyms, matching an abbreviation, or recognizing homonyms.
Here are the examples:
synonym: "buy" and "purchase"
abbreviation: "Professor" and "Prof."
homonym:
bow – a long wooden stick with horse hair that is used to play certain string instruments such as the violin
bow – to bend forward at the waist in respect (e.g. "bow down")
To deal with these problems, some use concept-based models, such as those described in this article and this article.
Concept-based models mostly use dictionaries or external terminologies to identify concepts, and each has its own representation of concepts and its own weighting algorithm.
Vanilla tf-idf is what is often used. If you want to learn about these things, the best place to start is this book.
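For reference, here is a minimal sketch of BM25 scoring in plain Python, which also shows the synonym limitation described above. It assumes documents are already tokenized into lists of words, uses the common defaults k1 = 1.5 and b = 0.75, and uses the smoothed, non-negative IDF variant log((N - n + 0.5) / (n + 0.5) + 1); other IDF variants exist, and the function name bm25_scores is hypothetical rather than any library's API.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each tokenized document in docs against the tokenized query."""
    n_docs = len(docs)  # assumes at least one non-empty document
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue  # pure term matching: no credit for synonyms
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            # Term frequency saturates via k1; b controls length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = [
    "buy cheap violin bow".split(),
    "purchase a violin".split(),
    "bow down in respect".split(),
]
scores = bm25_scores(["buy", "violin"], docs)
```

Here the second document gets credit only for "violin": the term-based score cannot see that "purchase" means "buy", which is exactly the gap concept-based models try to close.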

How to get started on Information Extraction?

Could you recommend a training path to get started and become very good at Information Extraction? I started reading about it for one of my hobby projects and soon realized that I would have to be good at math (algebra, stats, probability). I have read some of the introductory books on different math topics (and it's so much fun). Looking for some guidance. Please help.
Update: Just to answer one of the comments: I am more interested in text Information Extraction.
Depending on the nature of your project, natural language processing and computational linguistics can both come in handy: they provide tools to measure and extract features from textual information, and to apply training, scoring, or classification.
Good introductory books include O'Reilly's Programming Collective Intelligence (the chapters on searching and ranking, document filtering, and maybe decision trees).
Suggested projects utilizing this knowledge: POS (part-of-speech) tagging and named entity recognition (the ability to recognize names, places, and dates in plain text). You can use Wikipedia as a training corpus, since most of the target information is already extracted in infoboxes; this might provide you with some limited amount of measurement feedback.
The other big hammer in IE is search, a field not to be underestimated. Again, O'Reilly's book provides some introduction to basic ranking; once you have a large corpus of indexed text, you can do some real IE tasks with it. Check out Peter Norvig's "Theorizing from Data" as a starting point and a very good motivator; maybe you could reimplement some of their results as a learning exercise.
As a forewarning, I think I'm obligated to tell you that information extraction is hard. The first 80% of any given task is usually trivial; however, the difficulty of each additional percentage point for IE tasks usually grows exponentially, in both development and research time. It's also quite underdocumented: most of the high-quality info is currently in obscure white papers (Google Scholar is your friend), so do check them out once you've got your hand burned a couple of times. But most importantly, do not let these obstacles throw you off; there are certainly big opportunities to make progress in this area.
I would recommend the excellent book Introduction to Information Retrieval by Christopher D. Manning, Prabhakar Raghavan and Hinrich Schütze. It covers a broad area of issues which form a great and up-to-date (2008) basis for Information Extraction and is available online in full text (under the given link).
I would suggest you take a look at the Natural Language Toolkit (nltk) and the NLTK Book. Both are available for free and are great learning tools.
You don't need to be good at math to do IE. Just understand how the algorithm works, experiment on the cases for which you need optimal results, work out the scale at which you need to reach your target accuracy, and go from there. You are basically working with algorithms, programming, and aspects of CS/AI/machine learning theory, not writing a PhD paper on a new machine-learning algorithm where you have to convince someone by way of mathematical principles why the algorithm works, so I totally disagree with that notion. There is a difference between practice and theory: as we all know, mathematicians are stuck more on theory than on the practicability of algorithms for producing workable business solutions.
You would, however, need to do some background reading, both books on NLP and journal papers, to find out what people found from their results. IE is a very context-specific domain, so you first need to define the context in which you are trying to extract information. How would you define this information? What is your structured model, supposing you are extracting from semi-structured and unstructured data sets?
You would then also want to weigh whether to approach your IE with a standard hand-written approach, which involves things like regular expressions and pattern matching, or with statistical machine learning approaches like Markov chains. You can even look at hybrid approaches.
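As a small illustration of the regular-expression end of that spectrum, here is a minimal sketch that pulls ISO-style dates (YYYY-MM-DD) out of free text using Python's standard re module. The pattern, the function name extract_dates, and the sample sentence are my own assumptions for illustration; real documents usually need many more patterns and some validation.

```python
import re

# Matches dates written as four digits, two digits, two digits (e.g. 2010-01-15).
DATE_PATTERN = re.compile(r"\b(\d{4})-(\d{2})-(\d{2})\b")


def extract_dates(text):
    """Return (year, month, day) integer tuples for ISO-style dates in text."""
    return [
        tuple(int(part) for part in match.groups())
        for match in DATE_PATTERN.finditer(text)
    ]


dates = extract_dates("Filed on 2010-01-15, amended 2010-03-12.")
```

Note that this sketch does no range checking, so a string like 2010-99-99 would also be returned; that kind of brittleness is one reason people move to statistical or hybrid approaches.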
A standard process model you can follow to do your extraction is to adapt a data/text mining approach:
pre-processing - define and standardize the data you want to extract from, across various or specific sources, and cleanse your data
segmentation/classification/clustering/association - your black box where most of your extraction work will be done
post-processing - cleanse your data into the form in which you want to store it or represent it as information
Also, you need to understand the difference between data and information, since you can reuse your discovered information as a source of data to build more information maps/trees/graphs. It is all very contextualized.
standard steps for: input->process->output
If you are using Java/C++ there are loads of frameworks and libraries available you can work with.
Perl would be an excellent language to do your NLP extraction work with if you want to do a lot of standard text extraction.
You may want to represent your data as XML or even as RDF graphs (Semantic Web), and for your defined contextual model you can build up relationship and association graphs that will most likely change as you make more and more extraction requests. Deploy it as a RESTful service, since you want to treat it as a resource for documents. You can even link it to taxonomized data sets and faceted search, say using Solr.
Good sources to read are:
Handbook of Computational Linguistics and Natural Language Processing
Foundations of Statistical Natural Language Processing
Information Extraction Applications in Prospect
An Introduction to Language Processing with Perl and Prolog
Speech and Language Processing (Jurafsky)
Text Mining Application Programming
The Text Mining Handbook
Taming Text
Algorithms of Intelligent Web
Building Search Applications
IEEE Journal
Make sure you do a thorough evaluation before deploying such applications/algorithms into production, as they can recursively increase your data storage requirements. You could use AWS/Hadoop for clustering and Mahout for large-scale classification, amongst others. Store your datasets in MongoDB, or unstructured dumps in Jackrabbit, etc. Try experimenting with prototypes first. There are various archives you can base your training on, such as the Reuters corpus, TIPSTER, TREC, etc. You can even check out AlchemyAPI, GATE, UIMA, OpenNLP, etc.
Building extractions from standard text is easier than say a web document so representation at pre-processing step becomes even more crucial to define what exactly it is you are trying to extract from a standardized document representation.
Standard measures include precision, recall, f1 measure amongst others.
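For concreteness, here is a minimal sketch of how those three measures are computed for an extraction task, treating both the system output and the gold standard as sets of extracted items. The function name and the sample entity sets are made up for illustration.

```python
def precision_recall_f1(predicted, gold):
    """Compute precision, recall, and F1 for two collections of extracted items."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    # Precision: what fraction of the extracted items are correct.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: what fraction of the correct items were extracted.
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1: harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


p, r, f1 = precision_recall_f1(
    {"Andrew", "Jane", "eggs"},
    {"Andrew", "Jane", "Reuters", "Oxford"},
)
```

In this made-up example, two of the three extracted items are correct (precision 2/3) and two of the four gold items were found (recall 1/2), giving an F1 of 4/7.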
I disagree with the people who recommend reading Programming Collective Intelligence. If you want to do anything of even moderate complexity, you need to be good at applied math and PCI gives you a false sense of confidence. For example, when it talks of SVM, it just says that libSVM is a good way of implementing them.
Now, libSVM is definitely a good package, but who cares about packages? What you need to know is why SVM gives the terrific results that it gives and how it is fundamentally different from the Bayesian way of thinking (and how Vapnik is a legend).
IMHO, there is no one solution to it. You should have a good grip on linear algebra, probability, and Bayesian theory. Bayes, I should add, is as important for this as oxygen is for human beings (it's a little exaggerated, but you get what I mean, right?). Also, get a good grip on machine learning. Just using other people's work is perfectly fine, but the moment you want to know why something was done the way it was, you will have to know something about ML.
Check these two for that:
http://pindancing.blogspot.com/2010/01/learning-about-machine-learniing.html
http://measuringmeasures.com/blog/2010/1/15/learning-about-statistical-learning.html
http://measuringmeasures.com/blog/2010/3/12/learning-about-machine-learning-2nd-ed.html
Okay, now that's three of them :)
The Wikipedia Information Extraction article is a quick introduction.
At a more academic level, you might want to skim a paper like Integrating Probabilistic Extraction Models and Data Mining to Discover Relations and Patterns in Text.
Take a look here if you need an enterprise-grade NER service. Developing an NER system (and training sets) is a very time-consuming and highly skilled task.
This is a little off topic, but you might want to read Programming Collective Intelligence from O'Reilly. It deals indirectly with text information extraction, and it doesn't assume much of a math background.

What are some good rigid body dynamics references?

I'm not a math guy in the least but I'm interested in learning about rigid body physics (for the purpose of implementing a basic 3d physics engine). In school I only took Maths through Algebra II, but I've done 3d dev for years so I have a fairly decent understanding of vectors, quaternions, matrices, etc. My real problem is reading complex formulas and such, so I'm looking for some decent rigid body dynamics references that will make some sense.
Anyone have any good references?
Physics for Game Programmers I think is better than Physics for Game Developers.
If you want something thick in your bookshelf (like I do), Eberly's 3D Game Engine Design and Erleben's Physics-Based Animation can accompany the above.
Chris Hecker has a nice set of articles on his website which were originally published in Game Developer Magazine. They start with 2D physics and progress to 3D.
Physically Based Modeling by David Baraff is also good, but is a bit heavier on the math.
I guess what you are looking for is Classical Mechanics, which describes motion in one, two, and three dimensions in a generalized manner.
I found a good introductory course on Classical Mechanics from the University of Texas.
I do not guarantee that you will be able to understand all the concepts there, but it will at least give you a basis for your plan. I advise you to consult a Physics professor to help you understand the math.
Good luck!
If you are already familiar (and comfortable) with
linear algebra
basic calculus
Newton's laws of motion
then 6DoF Rigid Body Dynamics is what you are looking for. It's a brief article written [disclaimer: by me] when I once had to develop a helicopter flight simulator.
Using a rotation matrix allows for extremely simple modelling equations, but there exists a simple mapping to and from a quaternion if you prefer that representation for other reasons.
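As an illustration of that mapping in one direction, here is a minimal sketch of the standard conversion from a unit quaternion to a 3x3 rotation matrix, in plain Python. It assumes the (w, x, y, z) component order and a normalized quaternion; it shows the textbook formula, not necessarily the exact form used in the linked article, and the function names are my own.

```python
import math


def quaternion_to_matrix(w, x, y, z):
    """Convert a unit quaternion (w, x, y, z) to a 3x3 rotation matrix (list of rows)."""
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]


def rotate(matrix, v):
    """Apply a 3x3 matrix to a 3-component vector."""
    return [sum(matrix[i][j] * v[j] for j in range(3)) for i in range(3)]


# A 90-degree rotation about the z axis: w = cos(angle / 2), z = sin(angle / 2).
half = math.radians(90) / 2
R = quaternion_to_matrix(math.cos(half), 0.0, 0.0, math.sin(half))
rotated = rotate(R, [1.0, 0.0, 0.0])
```

Rotating the x axis by 90 degrees about z should give the y axis (up to floating-point error), which is a handy sanity check when wiring this into a simulation.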
So that you don't rip your hair out in frustration (well, Baraff and Witkin's great math articles with the multi-dimensional matrices would do that sometimes), you can look at the easier online articles such as the ones published on Gamasutra.
Here are three of them:
http://www.gamasutra.com/resource_guide/20030121/kennedy_pfv.htm
http://www.gamasutra.com/features/19990702/data_structures_01.htm
http://www.gamasutra.com/resource_guide/20030121/jacobson_pfv.htm
You'll notice that they point to the resources mentioned above as part of their references. I would add that unless you need to solve systems of equations for multiple particles, articulated characters, or complex non-rigid objects, this might be enough to start with.
If, however, you are looking for more advanced physics and mathematics involving matrices and systems of equations, look up Witkin's and Baraff's home pages (I think they are both at Pixar, if I'm not mistaken), or start with Hecker (who tried quite a few practical methods and documented his results).

Resources