Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 8 years ago.
Here's what I have on my list so far. I'd like to know of others in the same vein, perhaps more technical, perhaps less:
Blown to Bits: Your Life, Liberty, and Happiness After the Digital Explosion - Abelson, Ledeen, and Lewis
Glut: Mastering Information Through the Ages - Wright
Information Rules - Varian and Shapiro
Web Dragons: Inside the Myths of Search Engine Technology - Witten, Gori, and Numerico
There are a few I've seen on text mining; they include:
Web Data Mining - Liu
Modern Information Retrieval - Baeza-Yates, Ribeiro-Neto
Also looking for blog recs like
or papers like
The Discovery of Structural Form
"SIGIR" - the conference
"TREC" - the conference
Baeza-Yates, Ribeiro-Neto, "Modern Information Retrieval" (1999)
Witten, "Managing Gigabytes" (1999)
van Rijsbergen, "Information Retrieval" (1979)
are the obvious "bibles" (as mentioned above).
Büttcher, Clarke, Cormack, "Information Retrieval: Implementing and Evaluating Search Engines" (2010)
is an interesting new textbook (student-level), full of bibliographic references. It contains a good explanation of parallel retrieval algorithms (sample chapter).
Croft, Metzler, Strohman, "Search Engines: Information Retrieval in Practice" (2009)
has good reviews; I didn't like it too much (read the sample chapters on Croft's homepage).
Voorhees, Harman, "TREC: Experiment and Evaluation in Information Retrieval" (2009)
is a good introduction to the TREC approach in evaluating IR.
Langville, Meyer, "Google's PageRank and Beyond: The Science of Search Engine Rankings" (2006)
explains how to efficiently compute PageRank.
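To give a feel for what computing PageRank involves, here is a minimal power-iteration sketch in Python. It assumes the commonly used damping factor of 0.85 and spreads the rank of dangling pages (pages with no out-links) uniformly over all pages; the function name and the dict-of-lists graph format are illustrative choices, not taken from the book.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=200):
    """Compute PageRank by power iteration.

    links maps each page to the list of pages it links to. Pages that
    only appear as link targets are included automatically. The rank of
    dangling pages (no out-links) is spread uniformly over all pages.
    """
    pages = sorted(set(links) | {q for outs in links.values() for q in outs})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(max_iter):
        dangling = sum(rank[p] for p in pages if not links.get(p))
        # Teleportation term plus the uniformly redistributed dangling mass.
        new = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        for p in pages:
            outs = links.get(p, [])
            if outs:
                share = damping * rank[p] / len(outs)
                for q in outs:
                    new[q] += share
        delta = sum(abs(new[p] - rank[p]) for p in pages)
        rank = new
        if delta < tol:
            break
    return rank


# c is linked from both a and b, so it ends up ranked highest, then a, then b.
ranks = pagerank({"a": ["b", "c"], "b": ["c"], "c": ["a"]})
print({p: round(r, 3) for p, r in ranks.items()})
```

Each iteration keeps the ranks summing to 1, so the result can be read directly as a probability distribution over pages.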
Introduction to Information Retrieval seems to be the recommended text these days for the underlying technology; it was released in 2008 and I haven't read it yet. (The full text is free online.) Managing Gigabytes, as TimB recommended, is my favorite older book; it's much better written than Modern Information Retrieval, though that's also worth a look. There's more you can find with the obvious search.
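As a small taste of that underlying technology, here is a minimal sketch of an inverted index with tf-idf ranking. It assumes naive lowercase whitespace tokenization and one common weighting variant, (1 + log tf) · log(N / df); the textbooks discuss many alternatives, and the function names here are only illustrative.

```python
import math
from collections import Counter, defaultdict


def build_index(docs):
    """Inverted index: term -> {doc_id: term frequency in that doc}."""
    index = defaultdict(dict)
    for doc_id, text in enumerate(docs):
        for term, tf in Counter(text.lower().split()).items():
            index[term][doc_id] = tf
    return index


def search(index, n_docs, query):
    """Rank documents by the summed tf-idf weight of the query terms.

    Weight = (1 + log tf) * log(n_docs / df); only documents that
    contain at least one query term are scored.
    """
    scores = defaultdict(float)
    for term in query.lower().split():
        postings = index.get(term)
        if not postings:
            continue
        idf = math.log(n_docs / len(postings))
        for doc_id, tf in postings.items():
            scores[doc_id] += (1 + math.log(tf)) * idf
    # Highest score first; ties broken by document id.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


docs = ["the cat sat", "the dog sat", "the cat and the cat"]
index = build_index(docs)
print(search(index, len(docs), "cat"))
```

Because the query only touches the postings of its own terms, documents that share no term with the query are never examined, which is the point of inverting the index.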
Managing Gigabytes - Witten, Moffat, and Bell: a quite detailed look at some of the technologies behind information retrieval, text and image compression. (Disclaimer: my university supervisor is the second author.)
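A central idea in compressing inverted files is to store the gaps between sorted document numbers rather than the numbers themselves, then encode each gap with a variable-length code so that small gaps get short codes. The sketch below uses the Elias gamma code and keeps the bits in a Python string purely for readability (a real implementation would pack them into bytes); the helper names are hypothetical.

```python
from itertools import accumulate


def to_gaps(doc_ids):
    """Sorted, distinct doc ids (>= 1) -> differences between neighbours."""
    return [d - prev for prev, d in zip([0] + doc_ids[:-1], doc_ids)]


def from_gaps(gaps):
    """Undo to_gaps with a running sum."""
    return list(accumulate(gaps))


def gamma_encode(numbers):
    """Elias gamma code: for n >= 1, (bit length - 1) zeros, then n in binary."""
    out = []
    for n in numbers:
        if n < 1:
            raise ValueError("gamma codes need integers >= 1")
        b = bin(n)[2:]
        out.append("0" * (len(b) - 1) + b)
    return "".join(out)


def gamma_decode(bits):
    """Inverse of gamma_encode for a well-formed bit string."""
    numbers, i = [], 0
    while i < len(bits):
        zeros = 0
        while bits[i] == "0":
            zeros += 1
            i += 1
        numbers.append(int(bits[i:i + zeros + 1], 2))
        i += zeros + 1
    return numbers


postings = [3, 7, 8, 20]
code = gamma_encode(to_gaps(postings))  # gaps 3, 4, 1, 12 -> 16 bits
assert from_gaps(gamma_decode(code)) == postings
print(code)
```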
You should also know about ACM's SIGIR, which organises an annual conference on information retrieval, and has a mailing list as well.
As for books: Introduction to Information Retrieval, as already mentioned.
I think the best advanced information is in the publications found on several academic sites and in conference papers (SIGIR, CIKM, SPIRE, WWW2009, ...).
Poly edu
University of Waterloo
Information Retrieval: Implementing and Evaluating Search Engines was published by MIT Press in 2010 and is a very good book for gaining practical knowledge of information retrieval. Stefan Büttcher, Charles Clarke, and Gordon Cormack are the authors. Büttcher was Clarke's doctoral student, and Clarke was Cormack's doctoral student. Altogether, the book draws on around 50 years of their combined IR research and experience. It's a must-read!
Closed. This question is off-topic. It is not currently accepting answers.
Want to improve this question? Update the question so it's on-topic for Stack Overflow.
Closed 10 years ago.
I studied mathematics, but that was a long time ago. I have been a programmer for 8 years, but since I started studying concepts in AI and data mining, I have found the theory very difficult to understand.
I have now wasted 2-3 years and gotten nowhere. I need to first understand the math concepts required to learn AI and data mining.
I don't know where to start. Which books and tutorials would you recommend I start with, from the AI point of view?
How should I go about acquiring the fundamentals needed to use AI and data mining concepts?
I got this list from the internet:
Matrix algebra: most machine learning models are represented as matrices and vectors. Concepts like eigenvectors and singular value decomposition appear all over the place.
Bayesian statistics: probability, Bayes' rule, common distributions (e.g., beta, Dirichlet, Gaussian), etc.
Multivariable calculus: most learning techniques use gradients and Hessians at their core to fit parameters. (If you want to get fancier, study numerical optimization.)
Information theory: entropy, KL divergence, etc. Just the basics here.
In limited cases, higher-level math can be useful. E.g., to understand manifold learning, you'll want to know some basic notions from geometry and topology. Occasionally abstract algebra is used (e.g., see "expectation semirings" for learning on hyper-graphs). I would learn these as-needed, but if you have a chance to learn them early it can't hurt.
Can anyone recommend some books on these topics?
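To make a few of the listed topics concrete, here is a small Python sketch of Bayes' rule over a discrete set of hypotheses, Shannon entropy, and KL divergence (both in bits). The function names are illustrative, and the medical-test numbers in the usage lines are made up.

```python
import math


def posterior(prior, likelihood):
    """Bayes' rule over a discrete set of hypotheses h.

    prior[h] = P(h) and likelihood[h] = P(data | h); returns
    P(h | data) = P(data | h) P(h) / sum over h' of P(data | h') P(h').
    """
    joint = {h: prior[h] * likelihood[h] for h in prior}
    evidence = sum(joint.values())
    return {h: p / evidence for h, p in joint.items()}


def entropy(p):
    """Shannon entropy in bits; zero-probability outcomes contribute nothing."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)


def kl_divergence(p, q):
    """KL(p || q) in bits; assumes q_i > 0 wherever p_i > 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Made-up numbers: a rare condition and an imperfect test.
prior = {"sick": 0.01, "healthy": 0.99}
likelihood = {"sick": 0.9, "healthy": 0.05}  # P(positive test | h)
print(posterior(prior, likelihood)["sick"])  # about 0.154
print(entropy([0.5, 0.5]))  # 1.0 bit: a fair coin
print(kl_divergence([0.9, 0.1], [0.5, 0.5]))
```

The first result is the classic base-rate surprise: even with a fairly accurate test, a positive result leaves the rare condition unlikely because the prior is so small.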
My resource for studying math:
You will be able to find a lot there on every field of math.
I agree with @Lostdreamer that the site has great material for learning various math concepts.
For an excellent introductory online course on machine learning, I highly recommend the Machine Learning course being offered online. It is taught by Stanford Professor Andrew Ng. You can watch the videos as many times as you need to understand the concepts.
The exercises and programming assignments help drive home the concepts.
I recommend that you register for it the next time it is offered. Here's a link to the course registration page.
Here's a link to a preview of the material in the course.
The course contains a basic review of linear algebra, including basic matrix concepts, which helped me review this material.
I highly recommend @HeatfanJohn's course. I've already taken it, without any knowledge of AI, and it turned out pretty well: the teacher is amazing and the course is extremely clear. Try it!
In addition, I took this other AI course at the same time. This one is much more general: you learn a bit about everything in AI, and no previous knowledge is required. If you are not used to doing math, this one is easier than the ML one (in ML you need to do exercises in MATLAB, which are sometimes a little tricky), and I found it more interesting as a general overview.
I highly recommend doing both.
Once you become addicted to AI (and you will for sure if you take these two courses!), I recommend
Udacity, an amazing free online computer science "university". The best teachers in the world teach you awesome things for free. If that is not awesome enough, the AI class teachers are behind this site: one is Google's research director (Peter Norvig), and the other (Sebastian Thrun) led the Stanford team whose self-driving car, Stanley, won the 2005 DARPA Grand Challenge. Awesome people.
Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 6 years ago.
I want to advertise OCaml to beginners, and I am looking for good tutorials in English: not ones you have only heard of, but ones you have actually tried and found useful.
I quite like the book Developing Applications With Objective Caml; I guess the title should be updated to reflect the 'OCaml' naming decision. It is old and therefore slightly out of date, but only in minor aspects: e.g., it presents the stream syntax as belonging to the core language, whereas it is now provided as a Camlp4 extension. The book is surprisingly complete, and there is a lot of meat already in chapters 2, 3 and 4.
This book covers a bit of system programming, but if that's what the reader is interested in, I would rather recommend the separate book Unix System Programming in OCaml, also translated into English by a community effort.
Finally, if one wants to discover the theoretical underpinnings of OCaml, I found the U3 book, Using, Understanding, and Unraveling the OCaml Language, to be a great resource. But it's only for readers who already know OCaml.
PS: I also have a very good opinion of Jason Hickey's Introduction to Objective Caml, but I can't say I have read it in full; I have only glanced at it. That's the problem with beginners' books: you can really read at most one good one.
For me, the primary one is:
$ apt-cache show ocaml-book-en
Package: ocaml-book-en
Source: ocaml-book
Version: 1.0-5
Installed-Size: 7061
Maintainer: Debian QA Group <>
Architecture: all
Recommends: www-browser | pdf-viewer
Description-en: English book: "Developing applications with Objective Caml"
This is the English translation of the O'Reilly's OCaml French
book "Developpement d'applications avec Objective Caml" that can
be found in the ocaml-book-fr package.
This package contains both the HTML and PDF version of the book.
There is also a great book on system programming in OCaml, and a cookbook-style resource here.
The tutorial I used when learning, and the one I always recommend to beginners, is this one (mirrored at ocamlcore, as the original site went down).
Here is a book intended for newcomers to programming, for those who know some programming but want to learn the functional paradigm, and for those who simply want to learn OCaml.
An OCaml port of the book How to Think Like a Computer Scientist has been created by Nicolas Monje.
According to the website, the PDF version of the book should be downloaded
From the book:
The goal of this book is to teach you to think like a computer scientist. This way of thinking combines some of the best features of mathematics, engineering, and natural science. Like mathematicians, computer scientists use formal languages to denote ideas (specifically computations). Like engineers, they design things, assembling components into systems and evaluating tradeoffs among alternatives. Like scientists, they observe the behavior of complex systems, form hypotheses, and test predictions.
The single most important skill for a computer scientist is problem solving. Problem solving means the ability to formulate problems, think creatively about solutions, and express a solution clearly and accurately. As it turns out, the process of learning to program is an excellent opportunity to practice problem-solving skills. That’s why this chapter is called, “The way of the program.”
On one level, you will be learning to program, a useful skill by itself. On another level, you will use programming as a means to an end. As we go along, that end will become clearer.
I've just started with OCaml, and these are the tutorials I find most helpful:
Documentation and user’s manual - most useful and official
Introduction to Caml - I used this one in my first days (recently), and it was really helpful because of its simplicity
I thought Jason Hickey's Introduction to Objective Caml was very good (the only actual text on the language I've read, and how I started). INRIA's documentation is nice as well; and reading module signatures by themselves is quite instructive once you get the hang of it ;)
Believe it or not, OCaml was the first language I (really) learned.
There is a new book, "Real World OCaml", by Yaron Minsky, Anil Madhavapeddy, and Jason Hickey, which is going to be published soon. A public beta is available for free on the website. Although the book is not finished yet, I didn't notice any major mistakes or irrelevancies.
It gave me a full-fledged understanding of OCaml. It contains lots of examples illustrating concepts and could easily be considered a tutorial. I also liked that it partly covers the standard modules (List, ListLabels, Map, Sys, String, and maybe some others).
"The Runtime System" section in this book is very useful. It provides details about the compiler implementation, memory management, linking with foreign code, and an intuition for the cost of language features. I consider the latter very important, because many functional programming books cover concepts without saying how expensive they are in terms of memory and time. I highly recommend this book, especially since there is a free online version.
Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Questions asking us to recommend or find a tool, library or favorite off-site resource are off-topic for Stack Overflow as they tend to attract opinionated answers and spam. Instead, describe the problem and what has been done so far to solve it.
Closed 8 years ago.
Assume you know a student who wants to study Machine Learning and Natural Language Processing.
What specific computer science subjects should they focus on and which programming languages are specifically designed to solve these types of problems?
I am not looking for your favorite subjects and tools, but rather industry standards.
Example: I'm guessing that knowing Prolog and Matlab might help them. They also might want to study Discrete Structures*, Calculus, and Statistics.
*Graphs and trees. Functions: properties, recursive definitions, solving recurrences. Relations: properties, equivalence, partial order. Proof techniques, inductive proof. Counting techniques and discrete probability. Logic: propositional calculus, first-order predicate calculus. Formal reasoning: natural deduction, resolution. Applications to program correctness and automatic reasoning. Introduction to algebraic structures in computing.
This related stackoverflow question has some nice answers: What are good starting points for someone interested in natural language processing?
This is a very big field. The prerequisites mostly consist of probability/statistics, linear algebra, and basic computer science, although Natural Language Processing requires a more intensive computer science background to start with (frequently covering some basic AI). Regarding specific languages: Lisp was created "as an afterthought" for doing AI research, while Prolog (with its roots in formal logic) is especially aimed at Natural Language Processing. Many courses use Prolog, Scheme, Matlab, R, or a functional language such as OCaml (used, for example, in this course at Cornell), as they are well suited to this kind of analysis.
Here are some more specific pointers:
For Machine Learning, Stanford CS 229: Machine Learning is great: it has everything, including full videos of the lectures (also on iTunes), course notes, and problem sets, and it was very well taught by Andrew Ng.
Note the prerequisites:
Students are expected to have the following background:
Knowledge of basic computer science principles and skills, at a level sufficient to write a reasonably non-trivial computer program.
Familiarity with the basic probability theory.
Familiarity with the basic linear algebra.
The course uses Matlab and/or Octave. It also recommends the following readings (although the course notes themselves are very complete):
Christopher Bishop, Pattern Recognition and Machine Learning. Springer, 2006.
Richard Duda, Peter Hart and David Stork, Pattern Classification, 2nd ed. John Wiley & Sons, 2001.
Tom Mitchell, Machine Learning. McGraw-Hill, 1997.
Richard Sutton and Andrew Barto, Reinforcement Learning: An Introduction. MIT Press, 1998.
For Natural Language Processing, the NLP group at Stanford provides many good resources. The introductory course Stanford CS 224: Natural Language Processing includes all the lectures online and has the following prerequisites:
Adequate experience with programming and formal structures. Programming projects will be written in Java 1.5, so knowledge of Java (or a willingness to learn on your own) is required.
Knowledge of standard concepts in artificial intelligence and/or computational linguistics.
Basic familiarity with logic, vector spaces, and probability.
Some recommended texts are:
Daniel Jurafsky and James H. Martin. 2008. Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics and Speech Recognition. Second Edition. Prentice Hall.
Christopher D. Manning and Hinrich Schütze. 1999. Foundations of Statistical Natural Language Processing. MIT Press.
James Allen. 1995. Natural Language Understanding. Benjamin/Cummings, 2nd ed.
Gerald Gazdar and Chris Mellish. 1989. Natural Language Processing in Prolog. Addison-Wesley. (this is available online for free)
Frederick Jelinek. 1998. Statistical Methods for Speech Recognition. MIT Press.
The prerequisite computational linguistics course requires basic computer programming and data structures knowledge, and uses the same textbooks. The required artificial intelligence course is also available online, along with all the lecture notes, and uses:
S. Russell and P. Norvig, Artificial Intelligence: A Modern Approach. Second Edition.
This is the standard Artificial Intelligence text and is also worth reading.
I use R for machine learning myself and really recommend it. For this, I would suggest looking at The Elements of Statistical Learning, for which the full text is available online for free. You may want to refer to the Machine Learning and Natural Language Processing task views on CRAN for specific functionality.
My recommendation would be any or all of these (depending on the student's level and area of interest):
The Oxford Handbook of Computational Linguistics:
Foundations of Statistical Natural Language Processing:
Introduction to Information Retrieval:
String algorithms, including suffix trees. Calculus and linear algebra. Various kinds of statistics. Artificial intelligence optimization algorithms. Data clustering techniques... and a million other things. This is a very active field right now, and what you need depends on what you intend to do.
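Suffix trees are fiddly to build by hand, so as a simpler stand-in (a different data structure, swapped in here for illustration) here is a suffix array: the start positions of all suffixes in sorted order, which supports the same kind of substring search with two binary searches. The construction below is the naive sort, fine for experiments but not for large corpora; the function names are hypothetical.

```python
def suffix_array(text):
    """Naive suffix array: start positions of suffixes in sorted order.

    Sorting whole suffixes costs O(n^2 log n) in the worst case.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text, sa, pattern):
    """All start positions of pattern, via two binary searches over sa.

    Sorted suffixes have sorted length-m prefixes, so the suffixes that
    start with pattern form one contiguous block of sa.
    """
    m = len(pattern)
    lo, hi = 0, len(sa)
    # First suffix whose m-prefix is >= pattern.
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start, hi = lo, len(sa)
    # First suffix whose m-prefix is > pattern.
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[start:lo])


sa = suffix_array("banana")
print(find_occurrences("banana", sa, "ana"))  # [1, 3]
```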
It doesn't really matter which language you choose to work in. Python, for instance, has NLTK, which is a pretty nice free package for tinkering with computational linguistics.
I would say probability and statistics is the most important prerequisite. Gaussian mixture models (GMMs) and hidden Markov models (HMMs) in particular are very important in both machine learning and natural language processing (of course, these subjects may be part of the course if it is introductory).
Then, I would say basic CS knowledge is also helpful, for example algorithms, formal languages, and basic complexity theory.
The Stanford CS 224: Natural Language Processing course mentioned above also has videos online (in addition to other course materials). The videos aren't linked from the course website, so many people may not notice them.
Jurafsky and Martin's Speech and Language Processing is very good. Unfortunately the draft second edition chapters are no longer free online now that it's been published :(
Also, if you're a decent programmer, it's never too early to toy around with NLP programs. NLTK (Python) comes to mind. It has a book you can read free online, which has also been published in print (by O'Reilly, I think).
How about Markdown and an Introduction to Parsing Expression Grammars (PEG) posted by cletus on his site cforcoding?
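To show what PEG semantics look like in code, here is a hand-written recursive-descent parser for a made-up arithmetic grammar (not the grammar from that article, and no PEG library is involved). Each function mirrors one rule: ordered choice `/` tries alternatives in order and commits to the first that succeeds, and `*` repeats greedily without backtracking into the repetition.

```python
DIGITS = "0123456789"


def parse_expr(s, pos):
    """Expr <- Term (('+' / '-') Term)*"""
    result = parse_term(s, pos)
    if result is None:
        return None
    value, pos = result
    while pos < len(s) and s[pos] in "+-":
        nxt = parse_term(s, pos + 1)
        if nxt is None:
            break  # repetition stops; pos still points at the operator
        rhs, new_pos = nxt
        value = value + rhs if s[pos] == "+" else value - rhs
        pos = new_pos
    return value, pos


def parse_term(s, pos):
    """Term <- Factor ('*' Factor)*"""
    result = parse_factor(s, pos)
    if result is None:
        return None
    value, pos = result
    while pos < len(s) and s[pos] == "*":
        nxt = parse_factor(s, pos + 1)
        if nxt is None:
            break
        rhs, pos = nxt
        value *= rhs
    return value, pos


def parse_factor(s, pos):
    """Factor <- Number / '(' Expr ')'   (ordered choice: Number first)"""
    end = pos
    while end < len(s) and s[end] in DIGITS:
        end += 1
    if end > pos:
        return int(s[pos:end]), end
    if pos < len(s) and s[pos] == "(":
        inner = parse_expr(s, pos + 1)
        if inner is not None and inner[1] < len(s) and s[inner[1]] == ")":
            return inner[0], inner[1] + 1
    return None


def evaluate(text):
    """Parse and evaluate; the whole input must be consumed."""
    s = text.replace(" ", "")
    result = parse_expr(s, 0)
    if result is None or result[1] != len(s):
        raise ValueError(f"syntax error in {text!r}")
    return result[0]


print(evaluate("2 + 3 * (4 - 1)"))  # 11
```

Each rule returns either `(value, next_position)` or `None` for failure, which is the usual way to hand-code a PEG; parser generators add memoization (packrat parsing) on top to guarantee linear time.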
ANTLR seems like a good place to start for natural language processing. I'm no expert, though.
Broad question, but I certainly think that knowledge of finite-state automata and hidden Markov models would be useful. That requires knowledge of statistical learning, Bayesian parameter estimation, and entropy.
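As an example of why HMMs are worth learning, here is a sketch of the Viterbi algorithm, which finds the most likely hidden-state sequence for a sequence of observations (the same dynamic program underlies HMM part-of-speech tagging). The two-state model and all its probabilities are made up for illustration, and the code assumes every probability it uses is nonzero.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Most likely hidden-state path for an HMM (Viterbi algorithm).

    Works with log probabilities to avoid underflow on long sequences;
    every probability actually used must be > 0 (math.log(0) would fail).
    """
    # best[t][s] = (log-prob of the best path ending in state s at time t,
    #               state at time t - 1 on that path)
    first = observations[0]
    best = [{s: (math.log(start_p[s]) + math.log(emit_p[s][first]), None)
             for s in states}]
    for obs in observations[1:]:
        prev = best[-1]
        column = {}
        for s in states:
            score, back = max((prev[r][0] + math.log(trans_p[r][s]), r)
                              for r in states)
            column[s] = (score + math.log(emit_p[s][obs]), back)
        best.append(column)
    # Start from the best final state and follow the back-pointers.
    state = max(states, key=lambda s: best[-1][s][0])
    path = [state]
    for column in reversed(best[1:]):
        state = column[state][1]
        path.append(state)
    return path[::-1]


# Toy model; all numbers are made up for illustration.
states = ("Healthy", "Fever")
start = {"Healthy": 0.6, "Fever": 0.4}
trans = {"Healthy": {"Healthy": 0.7, "Fever": 0.3},
         "Fever": {"Healthy": 0.4, "Fever": 0.6}}
emit = {"Healthy": {"normal": 0.5, "cold": 0.4, "dizzy": 0.1},
        "Fever": {"normal": 0.1, "cold": 0.3, "dizzy": 0.6}}
print(viterbi(("normal", "cold", "dizzy"), states, start, trans, emit))
# prints ['Healthy', 'Healthy', 'Fever']
```

The running time is O(T · S²) for T observations and S states, instead of the S^T cost of scoring every possible path.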
Latent semantic indexing is a commonly used tool in many machine learning problems. Some of the methods are rather easy to understand. There are a bunch of potential basic projects:
Find co-occurrences in text corpora for document/paragraph/sentence clustering.
Classify the mood of a text corpus.
Automatically annotate or summarize a document.
Find relationships among separate documents to automatically generate a "graph" among the documents.
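As a starting point for the first of these projects, here is a sketch of the raw ingredients LSI works from: word co-occurrence counts and bag-of-words document vectors compared by cosine similarity. This is not LSI itself (LSI would additionally take a truncated SVD of the term-document matrix), and the whitespace tokenization and function names are assumptions made for illustration.

```python
import math
from collections import Counter
from itertools import combinations


def cooccurrences(sentences):
    """Count how often each unordered pair of distinct words shares a sentence."""
    counts = Counter()
    for sentence in sentences:
        words = sorted(set(sentence.lower().split()))
        counts.update(combinations(words, 2))
    return counts


def cosine(doc_a, doc_b):
    """Cosine similarity between two documents' word-count vectors."""
    a, b = Counter(doc_a.lower().split()), Counter(doc_b.lower().split())
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


sentences = ["the cat sat on the mat", "the cat chased the dog"]
print(cooccurrences(sentences).most_common(3))
print(cosine(sentences[0], sentences[1]))
```

Documents whose pairwise cosine similarity is high are natural candidates for the same cluster, or for an edge in the document "graph" from the last project.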
EDIT: Nonnegative matrix factorization (NMF) is a tool that has grown considerably in popularity due to its simplicity and effectiveness. It's easy to understand. I currently research the use of NMF for music information retrieval; NMF has been shown to be useful for latent semantic indexing of text corpora as well. Here is one paper (PDF).
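To show how simple NMF can be, here is a plain-Python sketch of the well-known multiplicative update rules of Lee and Seung for the squared Frobenius error. It is a generic textbook variant, not the method of the linked paper; the random initialization, iteration count, and `eps` guard are assumptions.

```python
import random


def matmul(a, b):
    """Plain-list matrix product."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def nmf(v, rank, iterations=500, eps=1e-9, seed=0):
    """Approximate a nonnegative n x m matrix V as W (n x rank) times H (rank x m).

    Multiplicative updates for the squared Frobenius error:
        H <- H * (W^T V) / (W^T W H),   W <- W * (V H^T) / (W H H^T)
    (elementwise * and /). eps guards against division by zero.
    """
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() for _ in range(m)] for _ in range(rank)]
    for _ in range(iterations):
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(rank)]
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)]
             for i in range(n)]
    return w, h


v = [[1, 2, 0], [0, 1, 3], [1, 3, 3]]  # exactly rank 2
w, h = nmf(v, rank=2)
print([[round(x, 2) for x in row] for row in matmul(w, h)])
```

Because the updates only multiply by nonnegative ratios, W and H stay nonnegative automatically; for text, V would be a term-document matrix and the columns of W act as "topics".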
Prolog will only help them academically; it is also limited to logic-constraint and semantic NLP work. Prolog is not an industry-friendly language, so it is not yet practical in the real world. Matlab is also an academic tool: unless they are doing a lot of scientific or quantitative work, they wouldn't have much need for it.
To start off, they might want to pick up the Norvig book, enter the world of AI, and get a grounding in all the areas. They should understand some basic probability, statistics, databases, operating systems, and data structures, and most likely have experience with a programming language. They need to be able to prove to themselves why AI techniques work and where they don't. Then they can look at specific areas like machine learning and NLP in more detail. In fact, the Norvig book lists references after every chapter, so they already have a lot of further reading available. There is plenty of reference material on the internet, in books, and in journal papers for guidance.
Don't just read the book: try to build tools in a programming language and then derive 'meaningful' results. Did the learning algorithm actually learn as expected? If it didn't, why was this the case, and how could it be fixed?
I'm a totally blind individual who would like to learn more about the theory side of computer science. I've taken an intro data structures class and the general intro programming class, but would like to learn more about things such as software design, advanced data structures, and compiler design. I want to do this as self-study, not as part of college classes.
Unfortunately there aren’t many computer science textbooks available from Recordings for the Blind and Dyslexic, where I normally get my textbooks. I would appreciate any electronic resources, preferably free, that could help me get more of a computer science education, rather than the newest language or platform that a lot of programming sites appear to focus on.
You might find the Experiences of a Blind Computer Scientist a good read.
MIT OpenCourseWare would be a good resource for you, given the amount of video and audio it has.
Really though, for the core computer science topics I find it pretty hard to beat some of the better textbooks out there. Some offer digital versions of their book with purchase and some don't. For those that don't, I would just purchase the book and then download a digital e-book equivalent via a torrent site. Since you already own the book, I don't think this would be a major problem.
UC Berkeley has a couple of computer science courses online for free as MP3 and video files (including an RSS feed for each course). And if reading PDF files isn't an issue, you could check out O'Reilly's Safari.
The textbook for Structure and Interpretation of Computer Programs appears to be accessible. Software Engineering Radio is a good podcast that I listen to, but it has recently focused a lot on model-driven development and UML, which don't interest me. The UC Berkeley lectures are of varying quality; as with all other college classes, it depends on the professor. I've found I can follow along with the CS162 lectures fine, but not so much with CS61B. Part of this is because of the professor, and part is probably because 61B is more math-heavy, since it's a data structures class. Unfortunately the RSS feeds are useless, since the file names are meaningless. I used my podcatcher to download the entire lecture series, then used the converting capability of foobar2000 to rename the files with their track numbers so I could listen to them in order.
I've used Safari at work before, and it is accessible, although too expensive for me to get a yearly subscription. OpenCourseWare appears to have a lot of good stuff. Unfortunately I don't use iTunes, so instead of downloading each MP3 file individually I used the Firefox extension DownThemAll! with a custom filter to grab all the MP3 files at once from the specific course I wanted. Another series of books that looks useful is the data structures books by Bruno R. Preiss, several of which are available online.
Some of the equations are represented as graphics but I can often tell what the general idea is by context.
I wonder whether the Structure and Interpretation of Computer Programs video lectures by Hal Abelson and Gerald Jay Sussman would be of any use.
If the audio content is enough on its own without the video, they are an excellent digital resource.
The podcast Software Engineering Radio is excellent. Though not CS courseware, it is the most academic and intellectually stimulating podcast I have found about software development and computer science.
Personally, I am just blown away by the questioner. The challenge of programming alone is too much for most people, but doing it without the primary sense used in the task is amazing to me. What is ironic, though, is that I bet that, given this challenge, the questioner is still FAR more adept at most CS tasks than the people I work with day to day. Just saying.
I'm also a totally blind programmer, currently working for Microsoft. The most valuable resource for technical books is Safari. You can read thousands of computer science texts there. If you're in the USA, you can also get many of those titles for free from Bookshare. In both cases graphical images will be an issue, but there's no easy solution for that. Most good books have enough descriptive text that one can manage without the diagrams.
I too am a new blind programmer! I lost my vision only 5 years ago. Anyway, I have been programming in Visual Basic 2008 throughout the past year. It turned out to be more accessible than I had at first suspected.
I start a Java class next semester and the required text is a free online text! It is posted below.
Introduction to Programming Using Java, Fifth Edition
Can some of you seasoned blind programmers share with us any blogs or websites where other blind programmers can be found?
Check out this Stack Overflow question about podcasts.
A language called Quorum is a lot like Python but optimized across a few more syntactic details, and the corresponding development environment is designed with the blind in mind. This might fit especially well with the use case where most students are using Python.
A 2016 blog post about CS education (actually a response to another blog post) points to:
program-l discussion board for blind programmers at
The EPIQ conference for blind and other programmers interested in Quorum
Also, see other ideas in a similar question on another SO site:
Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 7 years ago.
Does anybody know of any resources (books, classes, lecture notes, or anything) about the general theory of computer algebra systems (e.g. Mathematica, SymPy)?
"Introductory" materials are preferred, but I realize that with such a specialized subject anything is bound to be fairly advanced.
"General Theory" of CAS is a pretty huge scope for a question. That being said, I'll do my best to cover as much as I can in the hopes that something helps you find what you're looking for :)
The proceedings of the ISSAC and SIGSAM groups would no doubt have some good stuff about techniques for building CAS systems. A list of various topics in the general area of CAS building is available here:
If you're more looking for information on how to code some of the math involved, I'm a fan of the "Numerical Recipes" series; it provides sample code and a reasonably decent explanation of math in a wide range of topics. Last I checked, an online version of an older revision of the book was available here: (Note that this is the "Numerical Recipes in C" form of the book; there are versions in other languages as well).
For building a CAS in general, one place to start might be here: "Building a computer algebra environment by composition of collaborative tools" by Kajler and Safir. Another place you might check is here: where a high-level description of how a few folks implemented a CAS is listed.
The other thing you might try is diving into the code for a few of the open source CAS projects that exist: YACAS (Yet Another Computer Algebra System : Java), Axiom, etc. I like the list here:
Hope something in there was useful!
The basics are nicely covered in PAIP; the source code is free online -- see particularly the source files with 'macsyma' in the name. Topics include rewrite-rule systems, simplification using canonical forms, integration and differentiation, and compiling and memoizing rewrite rules for speed.
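For a flavour of those topics, here is a tiny Python sketch of symbolic differentiation followed by rule-based simplification. It is not Norvig's code: expressions are nested tuples such as `("+", ("^", "x", 3), ("*", 2, "x"))`, a representation chosen here for illustration, and only `+`, `*`, and `^` with a constant exponent are handled.

```python
NUM = (int, float)


def deriv(e, x):
    """d/dx of e, for +, * and ^ with a constant exponent."""
    if isinstance(e, NUM):
        return 0
    if isinstance(e, str):
        return 1 if e == x else 0
    op, a, b = e
    if op == "+":
        return ("+", deriv(a, x), deriv(b, x))
    if op == "*":  # product rule
        return ("+", ("*", deriv(a, x), b), ("*", a, deriv(b, x)))
    if op == "^" and isinstance(b, NUM):  # power rule plus chain rule
        return ("*", ("*", b, ("^", a, b - 1)), deriv(a, x))
    raise ValueError(f"unsupported expression: {e!r}")


def simplify(e):
    """Apply a few rewrite rules bottom-up: constant folding, x+0, x*1, x*0, x^1, x^0."""
    if not isinstance(e, tuple):
        return e
    op, a, b = e[0], simplify(e[1]), simplify(e[2])
    if isinstance(a, NUM) and isinstance(b, NUM):
        if op == "+":
            return a + b
        if op == "*":
            return a * b
        if op == "^":
            return a ** b
    if op == "+":
        if a == 0:
            return b
        if b == 0:
            return a
    if op == "*":
        if a == 0 or b == 0:
            return 0
        if a == 1:
            return b
        if b == 1:
            return a
    if op == "^":
        if b == 0:
            return 1
        if b == 1:
            return a
    return (op, a, b)


expr = ("+", ("^", "x", 3), ("*", 2, "x"))  # x^3 + 2x
print(simplify(deriv(expr, "x")))  # ('+', ('*', 3, ('^', 'x', 2)), 2)
```

Without `simplify`, the raw derivative is cluttered with terms like `("*", 0, "x")`; keeping differentiation and simplification separate is what lets each stay small.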
I've found Algorithms for Computer Algebra by K.O. Geddes... to be pretty useful. I'm a junior undergrad with a light math background doing work on OpenAxiom (a CAS). Get ready for some heavy, heavy math, though. My best advice is to have a couple of books, if only to get a different perspective when you get "stuck".
It might help if you suggest what you're looking into, what areas you're interested in, etc.
Here's one link from Wikipedia: Computer Algebra Systems
And another here:
Here are two books which describe algorithms used for implementing computer algebra systems:
Computer Algebra and Symbolic Computation: Elementary Algorithms
Computer Algebra and Symbolic Computation: Mathematical Methods
I used these books to implement libraries for computer algebra in Scheme (MPL) and C# (Symbolism).
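A typical example of the algorithms such books describe is the Euclidean algorithm for polynomial GCDs. Here is a minimal Python sketch over the rationals, using exact `Fraction` arithmetic so no rounding creeps in. Polynomials are coefficient lists with `p[i]` the coefficient of x**i, a representation assumed here for illustration; real CAS implementations use more efficient GCD algorithms.

```python
from fractions import Fraction


def trim(p):
    """Drop trailing zero coefficients; p[i] is the coefficient of x**i."""
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p


def poly_divmod(a, b):
    """Long division over the rationals: returns (q, r) with a == q*b + r."""
    a = trim(Fraction(c) for c in a)
    b = trim(Fraction(c) for c in b)
    if not b:
        raise ZeroDivisionError("division by the zero polynomial")
    q = [Fraction(0)] * max(len(a) - len(b) + 1, 0)
    r = a
    while len(r) >= len(b):
        shift = len(r) - len(b)
        coef = r[-1] / b[-1]
        q[shift] = coef
        for i, c in enumerate(b):
            r[i + shift] -= coef * c
        r = trim(r)  # the leading term is now exactly zero
    return trim(q), r


def poly_gcd(a, b):
    """Monic greatest common divisor via the Euclidean algorithm."""
    a = trim(Fraction(c) for c in a)
    b = trim(Fraction(c) for c in b)
    while b:
        a, b = b, poly_divmod(a, b)[1]
    if not a:
        return []  # gcd(0, 0)
    return [c / a[-1] for c in a]


# gcd(x^2 - 1, x^2 + 2x + 1) = x + 1, i.e. coefficients [1, 1]
print(poly_gcd([-1, 0, 1], [1, 2, 1]))
```

Over the integers the intermediate coefficients of this naive approach can grow quickly, which is one reason CAS books go on to cover more refined GCD algorithms.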
You mention SymPy in your question so I'll speak to that briefly.
The project and community of SymPy are themselves actually very good resources.
There is a variety of expertise that regularly checks and responds to the mailing list.
The code is openly available on github.
The documentation is fairly complete and often includes academic citations.
If you're interested in CASs, come on by. The contributors like to talk about what they work on, and it's easy to get started and add your own contributions.