Technical choices in unmarshalling hash-consed data - functional-programming

There seems to be quite a bit of folklore knowledge floating about in restricted circles about the pitfalls of hash-consing combined with marshalling-unmarshalling of data. I am looking for citable references to these tidbits.
For instance, someone once pointed me to the aterm library and mentioned that its authors had clearly thought about this: the representation on disk is bottom-up (children of a node come before the node itself in the data stream). This is indeed the right way to do things when you need to re-share each node with a possibly identical node already in memory. That re-sharing pass needs to be done bottom-up, so the unmarshalling itself might as well be too, making it possible to do everything in a single pass.
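For what it's worth, here is a minimal OCaml sketch of what such a bottom-up format and single-pass re-sharing could look like; the wire format and all names are invented for illustration, not taken from aterm:

(* Hypothetical wire format, not aterm's actual one: each record describes one
   node and may only refer to nodes that appeared earlier in the stream. *)
type wire =
  | WLeaf of int              (* a leaf carrying an integer payload *)
  | WNode of int * int        (* children given as indices into the stream *)

type term = Leaf of int | Node of term * term

(* Hash-consing table: structurally equal terms map to a single representative.
   A real implementation would key on the children's unique ids rather than
   deep structural equality, but this keeps the sketch short. *)
let table : (term, term) Hashtbl.t = Hashtbl.create 1024

let hashcons (t : term) : term =
  match Hashtbl.find_opt table t with
  | Some shared -> shared
  | None -> Hashtbl.add table t t; t

(* Single pass over a bottom-up stream: because children precede parents,
   every child index already points at a maximally shared node by the time
   its parent is rebuilt.  Assumes a non-empty stream whose last record is
   the root. *)
let unmarshal (stream : wire list) : term =
  let nodes = Array.make (List.length stream) (Leaf 0) in
  List.iteri
    (fun i w ->
       let t =
         match w with
         | WLeaf n -> Leaf n
         | WNode (l, r) -> Node (nodes.(l), nodes.(r))
       in
       nodes.(i) <- hashcons t)
    stream;
  nodes.(Array.length nodes - 1)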
I am in the process of describing difficulties encountered in our own context, and the solutions we found. I would appreciate any citable reference to this kind of folklore knowledge. Some people have obviously run into these problems before (the aterm library is only one example), but I haven't found anything in writing. Even the little piece of information I have about aterm is hearsay. I am not worried that it's unreliable (you can't make this up), but "personal communication" and "look how it's done in the source code" are considered poor form in citations.
I have enough references on hash-consing alone. I am only interested in references where it interferes with other aspects of programming, such as marshalling or distribution.

OK, this may not be much more use, but Andrew Kennedy wrote a functional pearl called simply Pickling Combinators, which appeared in the Journal of Functional Programming 14(6):727-739, 2004. There is extensive discussion of structure sharing and how it is handled in pickles, but no direct discussion of how this problem might relate to hash-consing in the implementation of the language. Still, the article does discuss structure sharing in memory as well as in a pickle, so I hope it is better than nothing.
Martin Elsman had a follow-on paper in 2005 in Trends in Functional Programming; the title is Type-specialized serialization with sharing. The article deals primarily with hash-consing by the unpickler (deserializer), not with hash-consing in the implementation, but again it may be worth something.
The JFP paper is proprietary, but there appears to be a preprint on Andrew's web page.
Elsman's paper appears to be available through Google Scholar at http://tinyurl.com/yd5tw2b.
(In a previous life, I worked on a project to create ASCII pickles that people could read and edit. I stupidly failed to publish it, but I have retained an interest.)

I found one reference on marshalling in functional languages; not sure if it will be useful, but the authors are smart: http://tinyurl.com/yc3hob9
I believe that Matthias Blume and/or Andrew Appel did something on this, but I can't find the paper. I also believe I reviewed something once for the Journal of Functional Programming, but I can't remember if the paper was accepted or who wrote it.
I suggest you ask Matthias Blume, Andrew Appel, and Phil Wadler if they can help.

Coq V5.10 had hash-consing and marshalling/unmarshalling. I didn't find anything in published form, but the unmarshalling steps are referred to as "reinterning" in the source code. Coq unmarshalled values and then traversed them in order to re-create sharing: the obvious, and only, solution when all the language provides is an unmarshal function of type in_channel -> 'a.
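A sketch of that "unmarshal generically, then reintern" approach, with invented names and OCaml's standard Marshal module standing in for Coq's actual code:

type term = Leaf of int | Node of term * term

(* Hash-consing table: structurally equal terms map to one representative
   kept in memory. *)
let table : (term, term) Hashtbl.t = Hashtbl.create 1024

let hashcons (t : term) : term =
  match Hashtbl.find_opt table t with
  | Some shared -> shared
  | None -> Hashtbl.add table t t; t

(* Bottom-up traversal of an already unmarshalled value: children are
   reinterned before their parent, so sharing is restored in one walk. *)
let rec reintern (t : term) : term =
  match t with
  | Leaf _ -> hashcons t
  | Node (l, r) -> hashcons (Node (reintern l, reintern r))

(* Marshal.from_channel : in_channel -> 'a gives back an unshared copy;
   the traversal re-creates the sharing afterwards. *)
let read_term (ic : in_channel) : term =
  reintern (Marshal.from_channel ic)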

Related

NTRU Key Exchange example implementation

Are there any open-source implementations of NTRU-KE (Preferably in Java or C#) out there that I can use as a reference for implementing it in a different language?
The implementations listed on the Wikipedia page for NTRUEncrypt don't have it included, and there's a paper covering the algorithm here but the language is a bit too technical for me to be able to understand it fully.
Future readers, please prove me wrong (and post your own answer).
Given that it is pretty new (November 2013), there probably aren't any implementations at all. Even the authors of the paper might not have implemented it themselves (you could ask them, though). But as far as I can tell the protocol only uses operations that would have to be included in NTRUEncrypt implementations anyway. So it shouldn't be too difficult to write one yourself on top of an existing NTRU library. You can ask specific questions about the protocol here or on https://crypto.stackexchange.com. Probably you should try to understand the basics of NTRUEncrypt first, though.

functional programming and self-commenting code - is this really possible?

As I have a lot of spare time to spend at the moment, I read a few threads/comments on code comments and documentation here.
Like most people here, I too think that you should write your code so that it's easy to read and self-commenting as far as possible.
On the other hand I am a huge FP-fanboy - and yes if you write your code the right way it will be very readable in FP - or so I thought.
The problem is that tiny things make an awful lot of difference in the FP world. If your colleague doesn't fully understand FP, he might be able to "read" the indentation of the code but won't be able to modify or fully understand it. That holds for languages like Haskell, where a '.' or '$' makes a big difference, and also for languages like F# or even C# or VB.NET with lots of LINQ statements.
At first glance the problem might be that your peer just doesn't get the language and it's not the code's fault - on the other hand: who truly gets all of FP? Look at some papers concerning Haskell - the code is beautifully crafted and self-commenting, but just as in math you may have to chew on a line for several minutes before you get it.
Of course, in those papers there will be a block of text just after the code trying to clarify it...
So IMHO you have to comment your FP-code as long as you work in a shop where not every colleague has a PhD in CS ;)
What do you think?
PS: first post here - really looked for answers concerning this questions but didn't find any - please be gentle if I just didn't look hard enough :)
Functional languages greatly favor the development of self-documenting code, because you can freely rearrange the order of functions, and easily abstract out any given part of the code, assigning it an explanatory name.
Abstract, abstract, abstract is the key to mastering code complexity, and that's where the functional style shines. But there will always be things that cannot be expressed within the code itself.
One clear example is code for algorithms. It is unlikely that one can easily understand a complex algorithm just by looking at the implementation. Yes, functional languages make understanding simpler, because many gory details (trivial example: memory management) do not have to be coded explicitly, thus exposing the underlying logic more clearly.
However, this is no substitute for an explanation in natural language, which conveys in an intuitive way how it works (and sometimes a picture is worth a thousand words). This is because our brain needs to observe difficult concepts from different points of view in order to understand them fully.
What to comment also depends on your audience. Beginners, average programmers or wizards? There is no one-fits-all solution.
E.g. you should explain the meaning of a "." (function composition) in Haskell if you are writing tutorial code, but certainly that would be a redundant explanation for anyone who has gone past chapter one/two of any Haskell book.
On the other hand some specific algorithm, like say red-black trees, could be a given for some programmers, and something very mysterious for others. In the second case you should add many comments to the code, or point to a document with further explanations.
Finally, one should note that there is no consensus even among the masters. E.g. Dennis Ritchie was famous for being extremely parsimonious with comments, whereas Don Knuth is an advocate of "literate programming", where comments are as important as the code itself. A set of rules will never be a substitute for personal taste.

What's the easiest way to explain What is Hadoop and Map/Reduce?

It's very easy to explain NoSQL from a high-level view - it is basically "key-value" storage. Of course there are a thousand minor and important details, but in general it's just key-value storage.
What's the best way to explain Hadoop and Map/Reduce?
Maybe some "real world" example that is easy to give and compare, even for newbies? Thanks!
I recently found this great article describing MapReduce:
I’ve been planning on writing about Google’s MapReduce algorithm for some time but I couldn’t find a good practical example. Then we had a Northwest C++ Users Group presentation by Steve Yegge and a followup discussion and beers, and I had a little epiphany. Steve was talking about, among other things, the build process. And that’s just a bunch of algorithms that are perfect for explaining MapReduce.
The code examples are in C++, but the content is really language agnostic.
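For a newbie-friendly illustration, here is a tiny in-memory word count sketched in OCaml (rather than the article's C++); it only shows the map, shuffle, and reduce phases and ignores everything that makes Hadoop interesting, such as distribution, fault tolerance, and disk-backed shuffles:

(* Map phase: each input line becomes a list of (key, value) pairs. *)
let map_line (line : string) : (string * int) list =
  String.split_on_char ' ' line
  |> List.filter (fun w -> w <> "")
  |> List.map (fun w -> (w, 1))

(* Shuffle phase: group all values by key. *)
let shuffle (pairs : (string * int) list) : (string * int list) list =
  let tbl = Hashtbl.create 64 in
  List.iter
    (fun (k, v) ->
       let vs = try Hashtbl.find tbl k with Not_found -> [] in
       Hashtbl.replace tbl k (v :: vs))
    pairs;
  Hashtbl.fold (fun k vs acc -> (k, vs) :: acc) tbl []

(* Reduce phase: fold each key's values down to one result. *)
let reduce (key, values) = (key, List.fold_left ( + ) 0 values)

let word_count (lines : string list) : (string * int) list =
  lines |> List.concat_map map_line |> shuffle |> List.map reduce

(* Example: word_count ["the cat"; "the dog"]
   yields something like [("dog", 1); ("cat", 1); ("the", 2)]. *)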
Here's a great tutorial on map/reduce in general, explaining the background, basics and data flow. I'm finding it useful to explain Google's App Engine implementation as well.
http://developer.yahoo.com/hadoop/tutorial/module4.html

a question about a design

My teammates and I have a very challenging new project to do, and we are supposed to submit it next week. We don't have a single clue about how to do it, and really need help. We are undergraduate students, new to Information Retrieval and AI, and really need your ideas.
The project is roughly:
When an expert is cited in a document,
find an expert with an opposing
opinion & find out what he/she says
about that topic.
We are free to use any programming language, but the programming itself is not our concern; we would like help getting started. Please give us a rough idea of how to design such a system and how to retrieve information from the internet. How should we extract the cited expert's opinion, and then find an opposing one?
Simple: use Amazon's Mechanical Turk.
Without that (or an equivalent) you're in trouble. If there are no further constraints on the problem then you will need a full-blown AI, the kind that doesn't yet exist. If there are severe constraints then you might have a chance of doing this in a week. If the expert can be in any field (medicine, politics, history, fashion, science, comic books, etc.) then there will be no single, well-organized repository of essays. You'll have to use Google to find Dr. X's opinion. Once you find Dr. X's writing (and let's pray it's text, not audio) you'll have to do some kind of natural language processing to get the thrust of it, even if you're lucky enough to find a descriptive title ("Digital Photography Is Absolutely Great"). Then you have to figure out its opposite. What's the opposite of "Neil Gaiman draws on folklore for his story ideas"? Figuring out what opinion you're looking for will be a serious problem. After that, things actually get easier: you can google for the subject and use the same magic tools to find the opinion you're looking for.
So what do we have a chance of solving? A search for opinions that someone else has already organised into "pro" and "con". Some online political forums are organised that way. Wikipedia cites opposing views in a special section in some of its articles. Science journals print letters of rebuttal. Look around, you might find a site even more cut-and-dried. Choose a small enough arena and you'll have a tractable problem.
EDIT: Damn, Ben Dunlap beat me to all my major points in a comment. Sigh
Sounds like an NLP problem to me. As for the information about documents and cites, http://citeseerx.ist.psu.edu should be a good starting point.
For each paper, there are several citations which refer to it. At the very minimum, you have to scan the abstract of the paper and those of the citing papers and run your own algorithm to figure out whether any of them takes the opposing opinion. Maybe your professor can give you hints on some approximate heuristic, but as far as I know it is a really hard problem.
I would be watching this thread for more interesting approaches.
Automatically submit a Google search request similar to "expert_name sucks", "expert_name wrong", or something like that. Find the first result that has "PhD" with a document link in the same sentence and return the link.
I think you might be blowing this up a little too big... as an undergraduate project, I would approach it on a smaller scale.
Unless your specification says you must use actual internet resources, you would be better off creating your own database of custom short documents. Add metadata to each document stating the points it makes about certain topics.
Next, I would create a list of citations which link to each document and add some metadata representing that expert's stance on the topic. When someone reads a document, I would augment the list of citations with lists of links to documents which have alternative views on that topic.
Basically it would consist of these tables:
Document (id, data)
DocumentPoints (documentId, topic, stance)
Citation (documentId, topic, stance)
And when someone loads up a document, the citations are pulled up as well. For each citation, you search DocumentPoints for the same topic with a different stance. The most difficult part of this project would be creating the 5 or 6 documents you need to populate your database. After that the solution is trivial.
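If it helps, here is a rough sketch of that lookup, with the three "tables" modeled as in-memory lists of OCaml records; all type and field names are invented for illustration:

type stance = Pro | Con

type document = { doc_id : int; data : string }
type document_point = { dp_doc : int; dp_topic : string; dp_stance : stance }
type citation = { c_doc : int; c_topic : string; c_stance : stance }

(* Given one citation, find the documents that take a different stance
   on the same topic. *)
let opposing_documents
    (points : document_point list)
    (docs : document list)
    (cite : citation) : document list =
  points
  |> List.filter (fun p -> p.dp_topic = cite.c_topic && p.dp_stance <> cite.c_stance)
  |> List.filter_map (fun p -> List.find_opt (fun d -> d.doc_id = p.dp_doc) docs)

(* Example: for a citation { c_doc = 1; c_topic = "GMO"; c_stance = Pro },
   this returns every document whose DocumentPoints entry lists topic "GMO"
   with stance Con. *)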
On a side note, most of these other answers are telling you to use some existing solution... don't do that unless the assignment tells you to. You'll be much better off understanding the problem and the various ways to solve it (this is definitely not the only/best one) if you work through the entire problem yourself. When the teacher asks you to do something not supported by whatever product you chose to build your solution on, you won't be able to fix it. If you had written it yourself, you could just as easily implement the new spec as well.

Concepts that surprised you when you read SICP?

SICP - "Structure and Interpretation of Computer Programs"
An explanation of the concepts that surprised you would be nice.
Can someone explain metalinguistic abstraction?
SICP really drove home the point that it is possible to look at code and data as the same thing.
I understood this before when thinking about universal Turing machines (the input to a UTM is just a representation of a program) or the von Neumann architecture (where a single storage structure holds both code and data), but SICP made the idea much more clear. Scheme (Lisp) helped here, as the syntax for a program is exactly the same as the syntax for lists in general, namely S-expressions.
Once you have the "equivalence" of code and data, suddenly a lot of things become easy. For example, you can write programs that have different evaluation methods (lazy, nondeterministic, etc). Previously, I might have thought that this would require an extension to the programming language; in reality, I can just add it on to the language myself, thus allowing the core language to be minimal. As another example, you can similarly implement an object-oriented framework; again, this is something I might have naively thought would require modifying the language.
Incidentally, one thing I wish SICP had mentioned more: types. Type checking at compilation time is an amazing thing. The SICP implementation of object-oriented programming did not have this benefit.
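The effect is strongest in Scheme, where a program literally is an S-expression, but the flavour of treating code as ordinary data can be sketched in a few lines of any functional language; here is a toy OCaml version (everything in it is invented for illustration):

(* A toy "program" is just a value of an ordinary data type. *)
type expr =
  | Num of int
  | Add of expr * expr
  | Mul of expr * expr

(* An evaluator is an ordinary function over that data ... *)
let rec eval (e : expr) : int =
  match e with
  | Num n -> n
  | Add (a, b) -> eval a + eval b
  | Mul (a, b) -> eval a * eval b

(* ... and nothing stops us from writing a second interpreter that treats
   the same data differently, e.g. pretty-printing instead of evaluating. *)
let rec show (e : expr) : string =
  match e with
  | Num n -> string_of_int n
  | Add (a, b) -> "(" ^ show a ^ " + " ^ show b ^ ")"
  | Mul (a, b) -> "(" ^ show a ^ " * " ^ show b ^ ")"

(* eval (Add (Num 1, Mul (Num 2, Num 3)))  ==>  7
   show (Add (Num 1, Mul (Num 2, Num 3)))  ==>  "(1 + (2 * 3))" *)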
I haven't read the book yet; I have only watched the video course, but it taught me a lot. Functions as first-class citizens were mind-blowing for me. Executing a "variable" was something very new to me. After watching those videos, the way I see JavaScript and programming in general has greatly changed.
Oh, I think I've lied, the thing that really struck me was that + was a function.
I think the most surprising thing about SICP is to see how few primitives are actually required to make a Turing complete language--almost anything can be built from almost nothing.
Since we are discussing SICP, I'll put in my standard plug for the video lectures at http://groups.csail.mit.edu/mac/classes/6.001/abelson-sussman-lectures/, which are the best Introduction to Computer Science you could hope to get in 20 hours.
The one that I thought was really cool was streams with delayed evaluation. The example about generating primes struck me as really neat: like a "PEZ" dispenser that magically dispenses the next prime in the sequence.
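For comparison, here is a small sketch of the same idea using OCaml's lazy Seq type instead of Scheme streams, with simple trial division standing in for the book's sieve:

(* An infinite, lazily evaluated sequence of integers starting at n. *)
let rec ints_from (n : int) : int Seq.t =
  fun () -> Seq.Cons (n, ints_from (n + 1))

let is_prime n =
  let rec check d = d * d > n || (n mod d <> 0 && check (d + 1)) in
  n >= 2 && check 2

(* The "PEZ dispenser": primes are only computed as they are demanded. *)
let primes : int Seq.t = Seq.filter is_prime (ints_from 2)

(* Pull the first k elements out of the lazy sequence. *)
let rec take k (s : 'a Seq.t) : 'a list =
  if k = 0 then []
  else
    match s () with
    | Seq.Nil -> []
    | Seq.Cons (x, rest) -> x :: take (k - 1) rest

(* take 10 primes = [2; 3; 5; 7; 11; 13; 17; 19; 23; 29] *)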
One example of "the data and the code are the same thing" from A. Rex's answer struck me in a very deep way.
When I was taught Lisp back in Russia, our teachers told us that the language was about lists: car, cdr, cons. What really amazed me was the fact that you don't need those functions at all - you can write your own, given closures. So, Lisp is not about lists after all! That was a big surprise.
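The exercise being alluded to (building pairs out of nothing but closures) carries over directly; here is a quick OCaml rendering of the idea (OCaml's type system makes this encoding less general than in Scheme, but the point survives):

(* A "pair" is just a closure that remembers x and y and hands them to
   whatever selector it is given. *)
let cons x y = fun selector -> selector x y
let car p = p (fun x _ -> x)
let cdr p = p (fun _ y -> y)

(* car (cons 1 2) = 1
   cdr (cons 1 2) = 2 *)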
A concept I was completely unfamiliar with was the idea of coroutines, i.e. having two functions doing complementary work and having the program flow control alternate between them.
I was still in high school when I read SICP, and I had focused on the first and second chapters. For me at the time, I liked that you could express all those mathematical ideas in code, and have the computer do most of the dirty work.
When I was tutoring SICP, I was impressed by different aspects. For one, the conundrum that data and code are really the same thing, because code is executable data. The chapter on metalinguistic abstraction is mind-boggling to many and has many take-home messages. The first is that all the rules are arbitrary. This bothers some students, especially those who are physicists at heart. I think the beauty is not in the rules themselves, but in studying the consequences of the rules. A one-line change in code can mean the difference between lexical scoping and dynamic scoping.
Today, though SICP is still fun and insightful to many, I do understand that it's becoming dated. For one, it doesn't teach debugging skills and tools (I include type systems in there), which are essential for working on today's gigantic systems.
I was most surprised by how easy it is to implement languages: that one could write an interpreter for Scheme on a blackboard.
I came to feel recursion in a different sense after reading some of the chapters of SICP.
I am right now in the section "Sequences as Conventional Interfaces" and have found the concept of procedures as first-class citizens quite fascinating. Also, recursion is applied there in ways I have never seen in any other language.
Closures.
Coming from a primarily imperative background (Java, C#, etc. -- I only read SICP a year or so ago for the first time, and am re-reading it now), thinking in functional terms was a big revelation for me; it totally changed the way I think about my work today.
I read most of the book (without doing the exercises). What I learned is how to abstract the real world at a given level, and how to implement a language.
Each chapter has ideas that surprised me:
The first two chapters show two ways of abstracting the real world: abstraction with procedures, and abstraction with data.
Chapter 3 introduces time into the picture, which results in state. We try assignment, which raises problems; then we try streams.
Chapter 4 is about metalinguistic abstraction; in other words, we implement a new language by constructing an evaluator, which determines the meaning of expressions.
Since the evaluator in Chapter 4 is itself a Lisp program, it inherits the control structure of the underlying Lisp system. So in Chapter 5 we dive into the step-by-step operation of a real computer with the help of an abstract model, the register machine.
Thanks.
