Is it feasible to implement a Clean backend with LLVM - functional-programming

Would it be feasible to implement a backend for Clean using the LLVM toolkit? If not, what are the stumbling blocks?
Also, if you happened to know of a good reference for the "ABC assembler" used as an IR by the Clean compiler, please include it in your answer. Thanks.

Without any documentation of the ABC intermediate language, it's going to be tough (I have been unable to find any whatsoever).
It's definitely possible, however. As you hint yourself, you would need to reimplement the code generator to target LLVM instead; the scope of that work depends entirely on the complexity of the ABC language.
The LLVM backend for Haskell may serve as inspiration: http://hackage.haskell.org/trac/ghc/wiki/Commentary/Compiler/Backends/LLVM

You might be interested in the following articles (I'm having trouble finding them since the ST department screwed up their server config):
Smetsers, J.E.W. Compiling CLEAN to Abstract ABC-Machine Code, University of Nijmegen, Technical Report 89-20, October 1989. Describes how CLEAN is translated into (intermediate) ABC code.
Koopman P.W.M., Eekelen M.C.J.D. van, Nöcker E.G.J.M.H., Smetsers S., Plasmeijer M.J. (1990). 'The ABC-machine: A Sequential Stack-based Abstract Machine For Graph Rewriting'. Technical Report 90-22, University of Nijmegen.
Also see http://clean.cs.ru.nl/ST_Publications.

Related

SIGNAL vs Esterel vs Lustre

I'm very interested in dataflow and concurrency-focused languages. I've read up on the subject and I repeatedly see SIGNAL, Esterel, and Lustre mentioned, so I take it they're prominent players in those fields. However, many of the links to them in the resources I found are dead, and the languages don't seem very accessible. I managed to find a couple of compilers I can build from source (the Polychrony Toolset for SIGNAL and the Columbia Compiler for Esterel), but both ran into issues when I tried to build them with CMake. Even textbooks teaching these languages have been tough to come by.
With the background out of the way, my actual questions are: is anyone really familiar with this field of programming? Are these languages still big deals, or have they "died out" by now? Could it be they're just available to big companies with a hefty price tag, so the average programmer wouldn't really be able to pick those languages up?
I ran into a couple other dataflow/concurrent paradigm languages, such as Oz or E, but they seemed to be mostly for education and not suitable for real world projects. Not to say they aren't impressive languages, but their implementation was limited and it would be unlikely to see them in production contexts. Does anyone know of other languages in this field they can recommend that are actually accessible (have good documentation, tutorials, and an installable compiler to actually code in)? Or can anyone clarify a language such as Oz or E and hopefully show that they indeed are good enough for large real world projects?
None of the languages you mentioned is widespread. This means their compilers and runtimes have bugs, the community is small and can offer little help, and linking with general-purpose libraries can be problematic.
I recommend using an actively supported general-purpose language such as Java, Scala, Kotlin or C++. They all have libraries that support asynchronous computation, and dataflow amounts to little more than support for asynchronous procedure calls. You can even develop your own dataflow library. This is not that hard: I wrote a dataflow library for Java which is only 40 kilobytes of source code.
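To illustrate the claim that dataflow reduces to asynchronous calls, here is a minimal sketch. It is written in Python for brevity rather than Java, and the names `source`, `node` and `run_graph` are hypothetical, not taken from any existing dataflow library. Each node waits until all of its inputs have produced a value, then fires.

```python
import asyncio


async def source(value, delay):
    """A graph input that becomes available after `delay` seconds."""
    await asyncio.sleep(delay)
    return value


async def node(func, *inputs):
    """A dataflow node: fires `func` once every input has a value."""
    args = await asyncio.gather(*inputs)
    return func(*args)


async def main():
    a = source(3, 0.02)
    b = source(4, 0.01)           # arrives before `a`; order does not matter
    total = node(lambda x, y: x + y, a, b)
    scaled = node(lambda s, k: s * k, total, source(10, 0.0))
    return await scaled


def run_graph():
    """Run the small graph (a + b) * 10 and return its result."""
    return asyncio.run(main())
```

Calling `run_graph()` evaluates the graph. The inputs arrive in a different order than they are declared, yet each node only computes once all of its arguments are present, which is the essence of the dataflow firing rule.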
Have you tried Céu? It is a recent variant of Esterel, and compiles to C. It is simple to understand, and provides a reactive and concurrent structuring of control flow. Native C calls can be made by just prefixing them with an underscore ("_printf").
http://ceu-lang.org
Also, see the paper "Structured Synchronous Reactive Programming with Céu" for a nice overview.
http://www.ceu-lang.org/chico/ceu_mod15_pre.pdf
These academic languages have mostly disappeared as such, but they live on in industrial tools:
Esterel and Lustre are the basis of Ansys' SCADE.
Signal is used in 3DS' ControlBuild.
Esterel was used in Synopsys' ConcentricStudio.
Researchers also use Heptagon for synchronous-language studies on code generation, formal methods, and new concepts.

What do I need to do to get paid to Scheme?

I'm a big fan of functional programming in general, Schemes in particular, and PLT-Racket ideally. I am wondering what concrete steps are likely to get me into a position where coding Scheme (or some functional language) is the bulk of the work.
I'm actually quite interested in academia, but on the other hand, I don't feel like I necessarily have what it takes (at least not at the moment) to do a top-tier Ph.D in CS. I definitely would prefer to have some real-world experience putting complex systems together in Scheme either way. Does anyone have any advice for an aspiring Schemer?
Start writing some Scheme libraries, then blog about the libraries you've written and get noticed in the community.
This will always give you leverage when applying for a position; employers like to have some evidence of what you can do.
dalton has the right idea; you want to build something you can show off. To find out about needs, you could go to http://srfi.schemers.org/, which is an archive of proposals for Scheme libraries and other improvements to Scheme, and see what you think you can contribute to. Or make contact with the Racket team; you may be able to contribute to Racket directly.
If you want to leverage something popular and in the news: App Inventor is based on Google Blocks, which are in turn based on Kawa, which is a Scheme dialect [*].
If you can show off your skills by putting together blocks and making them available for the community...it's a natural way to take advantage both of your multi-language skills and something currently getting press coverage.
Regards,
Dak
[*] and I forgot to say that earlier, mea culpa!
Not going to accept my own answer because it is, in general, worse than the one @dalton gave, but!
I got a grant through Turbulence.org to write an art and thus was paid to scheme! Or racket, if you want to be a pedant. repo here...
F# is getting popular in the finance sector:
http://cs.hubfs.net/forums/thread/16004.aspx

Technical choices in unmarshalling hash-consed data

There seems to be quite a bit of folklore knowledge floating about in restricted circles about the pitfalls of hash-consing combined with marshalling-unmarshalling of data. I am looking for citable references to these tidbits.
For instance, someone once pointed me to library aterm and mentioned that the authors had clearly thought about this and that the representation on disk was bottom-up (children of a node come before the node itself in the data stream). This is indeed the right way to do things when you need to re-share each node (with a possible identical node already in memory). This re-sharing pass needs to be done bottom-up, so the unmarshalling itself might as well be, too, so that it's possible to do everything in a single pass.
I am in the process of describing difficulties encountered in our own context, and the solutions we found. I would appreciate any citable reference to the kind of aforementioned folklore knowledge. Some people obviously have encountered the problems before (the aterm library is only one example). But I didn't find anything in writing. Even the little piece of information I have about aterm is hearsay. I am not worried that it's unreliable (you can't make this up), but "personal communication" and "look how it's done in the source code" are considered poor form in citations.
I have enough references on hash-consing alone. I am only interested in references where it interferes with other aspects of programming, such as marshalling or distribution.
OK, this is not much more use, but Andrew Kennedy wrote a functional pearl called simply Pickling Combinators, which appears in the Journal of Functional Programming, (2004), 14:6:727-739. There is extensive discussion of structure sharing and how it is handled in pickles, but no direct discussion of how this problem might relate to hash-consing in the implementation of the language. But the article does discuss structure sharing in memory as well as in a pickle, so I hope it is better than nothing.
Martin Elsman had a follow-on paper in 2005 in Trends in Functional Programming; the title is Type-specialized serialization with sharing. The article deals primarily with hash-consing by the unpickler (deserializer), not with hash-consing in the implementation, but again it may be worth something.
The JFP paper is proprietary, but there appears to be a preprint on Andrew's web page.
Elsman's paper appears to be available through Google Scholar at http://tinyurl.com/yd5tw2b.
(In a previous life, I worked on a project to create ASCII pickles that people could read and edit. I stupidly failed to publish it, but I have retained an interest.)
I found one reference on marshalling in functional languages; not sure if it will be useful, but the authors are smart: http://tinyurl.com/yc3hob9
I believe that Matthias Blume and/or Andrew Appel did something on this, but I can't find the paper. I also believe I reviewed something once for the Journal of Functional Programming, but I can't remember if the paper was accepted or who wrote it.
I suggest you ask Matthias Blume, Andrew Appel, and Phil Wadler if they can help.
Coq V5.10 had hash-consing and marshaling/unmarshaling. I didn't find anything in published form, but the unmarshaling steps would be referred to as "reinterning" in the source code. Coq unmarshaled values and then traversed them in order to re-create sharing, the obvious and only solution when all the language provides is an unmarshal function of type in_channel -> 'a.
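For comparison, the two-pass approach (unmarshal first, then traverse to restore sharing) might look like the sketch below. It is in Python rather than OCaml and is not Coq's code: terms are modelled as plain `(label, children)` tuples, `pickle` stands in for the opaque unmarshal function, and `intern_term` and `reintern` are hypothetical names.

```python
import pickle

_interned = {}  # (label, identities of interned children) -> canonical term


def intern_term(label, children):
    """Return the canonical term for this label and these interned children."""
    key = (label, tuple(id(c) for c in children))
    return _interned.setdefault(key, (label, tuple(children)))


def reintern(term):
    """Post-order traversal: re-intern the children, then the node itself."""
    label, children = term
    return intern_term(label, [reintern(c) for c in children])


def load_shared(data):
    """Unmarshal an unshared copy, then restore sharing in a second pass."""
    return reintern(pickle.loads(data))
```

Two separate loads of the same bytes yield distinct objects, but after `reintern` both resolve to the same canonical term. On heavily shared inputs a memo keyed on the identity of already-visited nodes would be needed to avoid traversing shared subterms repeatedly; that is left out here.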

wav-to-midi conversion

I'm new to this field, but I need to perform a WAV-to-MIDI conversion in Java.
Is there a way to know exactly what steps are involved in WAV-to-MIDI conversion?
I have a very rough idea: you need to sample the WAV file, filter it, use an FFT for spectral analysis, extract features, and then write the extracted features to MIDI.
But I cannot find solid sources or papers on how to do all that.
Can someone give me clues on how and where to start?
Are there any open-source APIs available for this WAV-to-MIDI conversion process?
Thanks in advance.
It's a more involved process than you might imagine.
This research problem is often referred to as music transcription: the act of converting a low-level representation of music (e.g., waveform) into a higher-level representation such as MIDI or even sheet music.
The sophistication of your solution will depend upon the complexity of your input data. Tons of research papers address music transcription only on monophonic piano or drums... because they are easy to transcribe. (Relatively.) Violin is harder. Voice is even harder. Violin plus voice plus piano is much harder. A symphony is nearly impossible. You get the picture.
The basic elements of music transcription involve any of the following overlapping areas:
(multi)pitch estimation
instrument recognition, timbral modeling
rhythm detection
note onset/offset detection
form/structure modeling
Search for papers on "music transcription" on Google Scholar or from the ISMIR proceedings: http://www.ismir.net. If you are more interested in one of the above subtopics, I can point you further. Good luck.
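To give a feel for the simplest of these pieces, single-pitch estimation, here is a toy sketch in Python (not Java, and not a transcription system). It finds the strongest bin of a naive O(n²) DFT and maps that frequency to a MIDI note number with the standard relation note = 69 + 12·log2(f / 440). The function names are illustrative; real code would use an FFT library and a window function, and would still need onset detection and polyphony handling, which this ignores entirely.

```python
import cmath
import math


def dominant_frequency(samples, sample_rate):
    """Return the frequency in Hz of the strongest DFT bin, ignoring DC."""
    n = len(samples)
    best_bin, best_mag = 1, 0.0
    for k in range(1, n // 2):
        acc = sum(s * cmath.exp(-2j * math.pi * k * i / n)
                  for i, s in enumerate(samples))
        if abs(acc) > best_mag:
            best_bin, best_mag = k, abs(acc)
    return best_bin * sample_rate / n


def frequency_to_midi(freq_hz):
    """Map a frequency to the nearest MIDI note number (A4 = 440 Hz = 69)."""
    return round(69 + 12 * math.log2(freq_hz / 440.0))


def sine(freq_hz, sample_rate, n):
    """Synthesize n samples of a pure tone, standing in for decoded WAV data."""
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate) for i in range(n)]


tone = sine(440.0, 8000, 512)
note = frequency_to_midi(dominant_frequency(tone, 8000))
```

With only 512 samples at 8 kHz the bin spacing is about 15.6 Hz, so the frequency estimate is coarse; it is still close enough to land on the right semitone for this clean test tone, but low notes and real recordings need longer windows or peak interpolation.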
EDIT: That being said, there are existing solutions that we can all find on the web. Feel free to try them. But as you do, evaluate them with a critical eye and ear. What types of audio signals would cause transcription to fail?
EDIT 2: Ah, you are only doing this for piano. Okay, this is doable. Music transcription has advanced to the point where it can transcribe monophonic piano pretty well. A Rachmaninov concerto will still pose problems.
Our recommendations depend upon your end goal. You state "need to perform... in Java." So it sounds like you just want something to work regardless of how it gets you there. In that case, I agree 100% with others: use something that exists.
That's actually an interesting question; all of the MIR libraries I know are typically C/C++/Python/Matlab. But not Java. The EchoNest has a Java API, but I don't think it does note-level transcription. http://developer.echonest.com. (Edit: It does note-level transcription. The returned data includes pitch, timbre, beat, tatum, and more. But I find polyphony is still a problem.)
Oh, Marsyas is Java-based. Cool. I thought it was just C++. http://marsyas.info/ I recommend this. It's developed by George Tzanetakis, a professor in MIR. It does signal-level analysis and should be a good option.
Now, if this is for a fun learning experience, I think you can use the sound manipulation utilities in Java to experiment with the WAV signal and see what comes out.
EDIT: This page describes MIR software better than I can: The Tools We Use
For Matlab, you may be interested in the MIR Toolbox
Here is a nice page of common datasets: MIR Datasets
This is a very big undertaking for being new to the field, unless you mean you are familiar with signal analysis and feature detection in general and want to look more specifically into automatic transcription.
There is no API for WAV to MIDI conversion. Vamp is a framework for feature extraction plugins, but to do automatic transcription you would need to use all the functionality of the existing plugins, plus implement functionality that exists in none of them yet.
Browse through the descriptions of the plugins on the Vamp download page; any descriptions you do not understand are topics you should start researching if you want to do this.
If you don't need to automate this task (e.g., for a website where people can upload MP3s and get MIDI files back), then you should consider using a tool like Melodyne, which is already quite good at doing this. As Steve noted, this is a very difficult task to accomplish, and even the best algorithms and solutions available at the moment are not 100% reliable.
So if you are just doing studio work and need to do a few conversions, it will probably save you a bit of time (and lots of headache) to use a tool already designed for this task.
This field is still very much under development, but there are some (experimental) algorithms available.
You can install Sonic Annotator and use a few Vamp plugins.
For example:
./sonic-annotator file.wav -d vamp:qm-vamp-plugins:qm-transcription:transcription -w midi
./sonic-annotator file.wav -d vamp:silvet:silvet:notes -w midi
./sonic-annotator file.wav -d vamp:ua-vamp-plugins:mf0ua:mf0ua -w midi
Dolphin, sorry to be brusque, but you have completely underestimated the problem. What you want to achieve, a full piano sound transcription involving all parameters that were used while playing, would need an enormous amount of research with people who have worked in the field for many years. Even a group of PhDs in signal processing would have to invest a lot of work to even come close to what you mean. Music transcription has needed decades of work to work even halfway reliably. I'd suggest you pick a different problem which you can manage better than this.

The Clean programming language in the real world?

Are there any real world applications written in the Clean programming language? Either open source or proprietary.
This is not a direct answer, but when I last checked (and I find the language very interesting) I didn't find anything ready for real-world use.
The idealist in me always wants to try out new languages; very hot on my list (apart from the aforementioned very cool Clean language) are currently (in random order) IO, Fan and Scala...
But in the meantime I then get my pragmatism out and check the Tiobe Index. I know you can discuss it, but still: It tells me what I will be able to use in a year from now and what I possibly won't be able to use...
No pun intended!
I am using Clean together with the iTasks library to build workflow-based websites quite easily.
But I guess another problem with Clean is the lack of documentation and examples: "the Clean book" is from quite a few years back, and a lot of new features don't get documented except for the papers they publish.
The http://clean.cs.ru.nl/Projects page doesn't look promising :) It looks like just another research project with no real-world use to date.
As one of my professors at college was involved in the creation of Clean, it was no shock that he had created a real-world application. The rostering program of our university was written entirely in Clean.
The Clean IDE and the Clean compiler are written in Clean. (http://wiki.clean.cs.ru.nl/Download_Clean)
Cloogle, a search engine for Clean libraries, syntax, etc. (like Hoogle for Haskell) is written in Clean. Its source is on Radboud University's GitLab instance (web frontend; engine).
