Applicative programming and Common Lisp types

I've just started learning Common Lisp--and am rapidly falling in love with it--and I've just moved on to the type system. I seem to be developing a particular fondness for applicative programming.
As I understand it, in CL strings and lists are both sequences, but there don't seem to be any standard functions for mapping over a sequence, only over lists. I can see why such functions would be supplied for lists, what with lists being the fundamental datatype and all, but why was mapping not designed to work on sequences? As sequences are the more general type, it would seem more useful to target applicative functions at them rather than at lists. Or am I completely misunderstanding how it works?
Edit:
What I was feeling particularly confused about was the way that sequences -- the abstraction -- and lists -- an implementation -- seem to be muddled up in CL. The consensus seems to be that this is for historical reasons; Lisp has been around so long that you can pretty much map out the development of software-engineering practices through its functions and macros. Which functions apply to sequences and which to lists seems arbitrary at first glance, because CL has a mixture of pre-sequence-abstraction functions that operate only on lists and functions that do the same thing in a more general way on sequences. As someone who is just learning CL, I think it would be useful if authors introduced sequences first as the cleaner abstraction, and then brought in lists as the most fundamental implementation of that abstraction. Lists would still be needed as syntax, of course, but by the time it is necessary to state this explicitly, many readers would have worked it out by themselves, which would be quite an ego boost when starting out.

Why, there are a lot of functions working on sequences. Mapping over a sequence is done with MAP or MAP-INTO.
Look at the sequences section of the CLHS to find out more.
There is also a quick reference that is nicely organized.
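For instance, at the REPL (MAP's first argument names the result type):

(map 'string #'char-upcase "hello")        ; => "HELLO"
(map 'vector #'+ #(1 2 3) #(10 20 30))     ; => #(11 22 33)
(map 'list #'evenp #(1 2 3 4))             ; => (NIL T NIL T)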

Well, you are generally correct. Many of the classic functions work only on lists (mapcar, mapcan, append, assoc, nth, etc.). For some of these there are sequence equivalents (concatenate, some and every come to mind), and in some cases the list-only function is effectively superseded (e.g. nth for lists only vs. elt for all sequences). Quite a few standard functions, in fact, already work on any sequence (find, count, remove and length, for example).
CL is a bit of a mess. It's a big language, as in huge. Over 700 functions, AFAIK. And it's old. Some of these functions are deprecated by convention, and others are rarely, if ever, used.
Yes, it would be more useful to have the mapping functions be generic, applying as intended to all sequences. CL was simply not built that way. If it were built again today, I'm sure this would be considered, and it would look very different.
That said, you are not left completely in the cold. The loop macro works on sequences, as does iterate (a separate looping macro, which I happen to like more). This will get you far. For most practical purposes you will be using lists, and this won't be more than a pragmatic problem. If you do happen to lack a mapping function for vectors (or sequences in general), who's to stop you from writing it? (See the sketch below.)
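For example, a minimal sketch of such a helper (mapseq is a made-up name; plain MAP already covers most of this, the helper just preserves the kind of the input):

;; Map FN over SEQ, returning a fresh sequence of the same kind as SEQ.
(defun mapseq (fn seq)
  (etypecase seq
    (list   (mapcar fn seq))
    (string (map 'string fn seq))
    (vector (map 'vector fn seq))))

(mapseq #'1+ #(1 2 3))         ; => #(2 3 4)
(mapseq #'char-upcase "abc")   ; => "ABC"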

Related

R define a multidimensional object to behave like a vector

I would like to define a slightly more general version of a complex number in R. This should be a vector that has more than one component, accessible in a similar manner to using Re() and Im() for complex numbers. Is there a way to do this using S3/S4 classes?
I have read through the OO field guide among other resources, but most solutions seem focused on the use of lists as the fundamental building objects. However, I need vectors for use in data.frames and matrices. I was hoping to use complex numbers as a template, but they seem to be implemented largely in C. At this point, I don't even know where to start.
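One possible starting point is a rough S4 sketch along these lines (the class and accessor names are invented for illustration; making such objects behave like atomic vectors inside data.frames would additionally need methods such as length, [ and format):

library(methods)

## a two-component "generalized complex" class (names invented for this example)
setClass("gcplx", slots = c(a = "numeric", b = "numeric"))

gcplx <- function(a, b) new("gcplx", a = a, b = b)

## accessors in the spirit of Re() and Im()
setGeneric("partA", function(x) standardGeneric("partA"))
setGeneric("partB", function(x) standardGeneric("partB"))
setMethod("partA", "gcplx", function(x) x@a)
setMethod("partB", "gcplx", function(x) x@b)

z <- gcplx(c(1, 2, 3), c(4, 5, 6))
partA(z)   # 1 2 3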

Primitive function in R

Can someone explain the passage quoted below (shown in bold in the original)?
"Primitive functions are only found in the base package, and since they operate at a low level, they can be more efficient (primitive replacement functions don’t have to make copies), and can have different rules for argument matching (e.g., switch and call). This, however, comes at a cost of behaving differently from all other functions in R. Hence the R core team generally avoids creating them unless there is no other option.
Source: http://adv-r.had.co.nz/Functions.html#lexical-scoping
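For concreteness, the difference is easy to see at the R prompt: primitives have no R-level body, formals or environment.

is.primitive(sum)    # TRUE -- sum dispatches straight to C code
body(sum)            # NULL
formals(sum)         # NULL
environment(sum)     # NULL
is.primitive(mean)   # FALSE -- mean is an ordinary closure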

Rllvm and compiler packages: R compilation

This is a fairly general question about the future of R: Is there any hope of seeing a merger of compiler and Rllvm (from Omegahat), or another JIT compilation scheme for R? (I know there is Ra, but it has not been updated recently.)
In my tests the speed gains from compiler are marginal for "complicated" functions...
What matters isn't how complicated a function is but what kinds of computations it performs. The compiler will make the most difference for functions dominated by interpreter overhead, such as ones that perform mostly simple operations on scalar or other small data. In cases like that I have seen a factor of 3 for artificial examples and a bit better than a factor of 2 for some production code. Functions that spend most of their time in operations implemented in native code, like linear algebra operations, will see little benefit.
This is just the first release of the compiler and it will evolve over time. LLVM is one of several possible directions we will look at, but probably not for a while. In any case, I would expect using something like LLVM to provide further improvements in cases where the current compiler already makes a difference, but not to add much in cases where it does not.
(Moving from a comment to an answer ...)
This sounds more like a question for the R development mailing list. Based on my general impressions I would say "probably not". Are your complicated functions already based on heavily vectorized (and hence efficient) functions? I think a more promising direction for not-so-easily-automatically-optimized situations is the increased simplicity of embedding C++ etc. (i.e. Rcpp), with inline if necessary.
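As a rough illustration of the kind of function where the byte-code compiler helps most (scalar-heavy interpreted code; note that recent versions of R byte-compile functions automatically, so the gap may be much smaller there):

library(compiler)

## scalar arithmetic in a loop -- dominated by interpreter overhead
f <- function(n) {
  s <- 0
  for (i in seq_len(n)) s <- s + i * i
  s
}
fc <- cmpfun(f)

system.time(f(1e6))
system.time(fc(1e6))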

Coding practice in R : what are the advantages and disadvantages of different styles?

The recent questions regarding the use of require versus :: raised the question of which programming styles are used when programming in R, and what their advantages/disadvantages are. Browsing through source code or around the net, you see a lot of different styles on display.
The main trends in my code:
heavy vectorization: I play a lot with indices (and nested indices), which sometimes results in rather obscure code but is generally a lot faster than other solutions; see the toy snippet after this list.
e.g.: x[x < 5] <- 0 instead of x <- ifelse(x < 5, 0, x)
I tend to nest functions to avoid filling memory with temporary objects that I then need to clean up. Especially with functions manipulating large datasets this can be a real burden. e.g.: z <- cbind(x, as.numeric(factor(x))) instead of y <- as.numeric(factor(x)); z <- cbind(x, y)
I write a lot of custom functions, even if I use the code only once in e.g. an sapply. I believe it keeps the code more readable without creating objects that remain lying around.
I avoid loops at all costs, as I consider vectorization to be a lot cleaner (and faster)
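A toy version of the first two bullets (made-up data, purely to make the style concrete):

x <- c(1, 3, 7, 2, 9)
x[x < 5] <- 0                           # index-based replacement
x                                       # 0 0 7 0 9

f <- c("a", "b", "a", "c")
z <- cbind(f, as.numeric(factor(f)))    # nested calls, no temporary objects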
Yet, I've noticed that opinions on this differ, and some people tend to back away from what they would call my "Perl" way of programming (or even "Lisp", with all those brackets flying around in my code. I wouldn't go that far though).
What do you consider good coding practice in R?
What is your programming style, and how do you see its advantages and disadvantages?
What I do will depend on why I am writing the code. If I am writing a data analysis script for my research (day job), I want something that works but that is readable and understandable months or even years later. I don't care too much about compute times. Vectorizing with lapply et al. can lead to obfuscation, which I would like to avoid.
In such cases, I would use loops for a repetitive process if lapply made me jump through hoops to construct the appropriate anonymous function, for example. I would use the ifelse() in your first bullet because, to my mind at least, the intention of that call is easier to comprehend than the subset+replacement version. With my data analysis I am more concerned with getting things correct than necessarily with compute time --- there are always the weekends and nights when I'm not in the office and can run big jobs.
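The sort of trade-off described above, sketched with a hypothetical per-file function (process() and files are stand-ins, not real API):

## a plain loop reads naturally when the body needs the index
res <- vector("list", length(files))
for (i in seq_along(files)) {
  res[[i]] <- process(files[[i]], index = i)   # process() is hypothetical
}

## the lapply version does the same but needs an anonymous function
res <- lapply(seq_along(files), function(i) process(files[[i]], index = i))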
For your other bullets: I would tend not to inline/nest calls unless they were very trivial. If I spell out the steps explicitly, I find the code easier to read and therefore less likely to contain bugs.
I write custom functions all the time, especially if I am going to be calling the code equivalent of the function repeatedly in a loop or similar. That way I have encapsulated the code out of the main data analysis script into its own .R file, which helps keep the intention of the analysis separate from how the analysis is done. And if the function is useful, I have it for use in other projects, etc.
If I am writing code for a package, I might start with the same attitude as my data analysis (familiarity) to get something I know works, and only then go for the optimisation if I want to improve compute times.
The one thing I try to avoid doing, is being too clever when I code, whatever I am coding for. Ultimately I am never as clever as I think I am at times and if I keep things simple, I tend not to fall on my face as often as I might if I were trying to be clever.
I write functions (in standalone .R files) for the various chunks of code that conceptually do one thing. This keeps things short and sweet. I also find debugging somewhat easier, because traceback() tells you which function produced the error.
I too tend to avoid loops, except when it's absolutely necessary. I feel somewhat dirty if I use a for() loop. :) I try really hard to do everything vectorized or with the apply family. This is not always the best practice, especially if you need to explain the code to another person who is not as fluent with apply or vectorization.
Regarding the use of require vs ::, I tend to use both. If I only need one function from a certain package I use it via ::, but if I need several functions, I load the entire package. If there's a conflict in function names between packages, I try to remember and use ::.
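For example (MASS chosen purely as an illustration):

## one-off use of a single function, without attaching the package
m <- MASS::mvrnorm(10, mu = c(0, 0), Sigma = diag(2))

## several functions needed, so attach the whole package
library(MASS)
m <- mvrnorm(10, mu = c(0, 0), Sigma = diag(2))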
I try to find a function for every task I'm trying to achieve. I believe someone before me has thought of it and made a function that works better than anything I can come up with. This sometimes works, sometimes not so much.
I try to write my code so that I can understand it. This means I comment a lot and construct chunks of code so that they somehow follow the idea of what I'm trying to achieve. I often overwrite objects as the function progresses. I think this keeps the transparency of the task, especially if you're referring to these objects later in the function. I think about speed when computing time exceeds my patience. If a function takes so long to finish that I start browsing SO, I see if I can improve it.
I've found that a good editor with code folding and syntax coloring (I use Eclipse + StatET) saves me a lot of headaches.
Based on VitoshKa's post, I am adding that I use capitalizedWords (sensu Java) for function names and fullstop.delimited for variables. I see that I could have another style for function arguments.
Naming conventions are extremely important for the readability of code. Inspired by R's internal S4 style, here is what I use (see the sketch after this list):
camelCase for global functions and objects (like doSomething, getXyyy, upperLimit)
functions start with a verb
non-exported and helper functions always start with "."
local variables and functions are all lowercase, in "_" syntax (do_something, get_xyyy); this makes it easy to distinguish local from global and therefore leads to cleaner code
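A quick sketch of the convention (all names invented for the example):

upperLimit <- 10                    # global object: camelCase
doSomething <- function(x) x * 2    # global function: starts with a verb
.scaleInput <- function(x) x / 10   # helper, not exported: leading dot

computeTotal <- function(xs) {
  running_total <- 0                # local variable: lowercase "_" syntax
  for (x in xs) running_total <- running_total + .scaleInput(x)
  running_total
}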
For data juggling I try to use as much SQL as possible, at least for basic things like GROUP BY averages. I like R a lot, but sometimes it's no fun to realize that your search strategy was not good enough to find yet another function hidden in yet another package. For my cases the SQL dialects do not differ much, and the code is really transparent. Most of the time the threshold (when to start using R syntax instead) is rather intuitive to discover. e.g.:
require(RMySQL)
## con is assumed to be an open connection, e.g.
## con <- dbConnect(MySQL(), dbname = "mydb")

# selecting variables alongside conditions in SQL is really transparent,
# even if the conditional variables are not part of the selection
statement <- "SELECT id, v1, v2, v3, v4, v5 FROM mytable
              WHERE this = 5
              AND that != 6"
mydf <- dbGetQuery(con, statement)

# some simple things get really tricky (at least in MySQL), but are simple in R,
# e.g. the standard deviation of each table row:
dframe$rowsd <- apply(dframe, 1, sd)
So I consider it good practice, and really recommend using an SQL database for your data, in most use cases. I am also looking into TSdbi and saving time series in a relational database, but cannot really judge that yet.

Representing code algebraically

I have a number of small algorithms that I would like to write up in a paper. They are relatively short and concise. However, instead of writing them in pseudo-code (à la Cormen or even Knuth), I would like to write an algebraic representation of them (more linear, and it renders better in LaTeX). However, I cannot find resources on the best notation for this, if there is any: e.g., how do I represent a loop? An if? The addition of a tuple to a list?
Have any of you encountered this problem, and somehow solved it?
Thanks.
EDIT: Thanks, people. I think I did a poor job at phrasing the question. Here goes again, hoping I make it clearer: what is the common notation for talking about loops and if-then clauses in a mathematical notation? For instance, I can use $acc \leftarrow acc \cup \langle i,i+1 \rangle$ to represent the "add" method of a list.
Don't do this. You are deviating from what people expect to see when they read a paper about algorithms. You should follow expected practices; your ideas are more likely to get the attention that they deserve. When in Rome, do as the Romans do.
Formatting code (or pseudocode as it may be) in a LaTeXed paper is very easy. See, for example, Formatting code in LaTeX.
I see if-expressions in mathematical notation fairly often. The usual thing for a loop is a recurrence relation, or equivalently, a function defined recursively.
Here's how the Ackermann function is defined on Wikipedia, for instance:

$$A(m, n) = \begin{cases} n + 1 & \text{if } m = 0 \\ A(m - 1,\, 1) & \text{if } m > 0 \text{ and } n = 0 \\ A(m - 1,\, A(m, n - 1)) & \text{if } m > 0 \text{ and } n > 0 \end{cases}$$

This definition is nice because it feels mathematical in flavor, and yet you could clearly type it in almost exactly as written and have an implementation. It is not always possible to achieve that.
Other mathematical notations that correspond to loops include ∑-notation for summation and set-builder notation.
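For example, the accumulation loop "for i from 1 to n, add $x_i$ to total" is just $\mathit{total} = \sum_{i=1}^{n} x_i$, and a filtering loop corresponds to set-builder notation such as $\{x \in S : x > 0\}$.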
I hope this answers your question! But if your aim is to describe how something is done and have someone understand, I think it is probably a mistake to assume that mathematicians would prefer to see equations. I don't think they're interchangeable tools (despite Turing equivalence). If your algorithm involves mutable data structures, procedural code is probably going to be better than equations for explaining it.
I'd copy Knuth. Few know how to communicate better than him in a computer science setting.
A symbol for general loops does not exist; usually you will use the summation operator. "if" is represented using implications, and to "add a tuple to a list" you would use union.
However, in general, a bit of verbosity is not necessarily a bad thing. Sometimes, especially for complex algorithms, it is best to spell things out in plain English, using examples and diagrams. This is doubly true for non-coders.
Think about it: when you read a math text-book on Euclid's algorithm for GCD, or the sieve of Eratosthenes, how is it written? Usually, the algorithm itself is in prose, while the proof of the algorithm is where the mathematical symbols lie.
You might take a look at Haskell. Haskell formats well in LaTeX, has a nice algebraic syntax, and you can even compile a LaTeX file with Haskell in it, provided the code is wrapped in \begin{code} and \end{code}. See here: http://www.haskell.org/haskellwiki/Literate_programming. There are probably literate programming tools for other languages.
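As a tiny sketch of what such a literate source looks like (Euclid's GCD, mentioned in another answer here, written as a code chunk inside the LaTeX file):

\begin{code}
gcd' :: Integer -> Integer -> Integer
gcd' a 0 = a
gcd' a b = gcd' b (a `mod` b)
\end{code}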
Lisp started out as a mathematical notation for a model of computing, so that lecturers would have a better tool than Turing machines. By accident, it turned out that it could be implemented in assembly -- and thus Lisp, the programming language, was born.
But I don't think this is really what you are looking for, since the computing model that Lisp describes doesn't have loops: recursion is used instead. The syntax derives from algebra, where parentheses denote evaluate-this-and-substitute-the-result. Indeed, Lisp's model of computing is basically substitution -- which is essentially what algebra is.
Indeed, most functional languages, like Lisp, Haskell and Erlang, are rooted in mathematics. Haskell grew out of the typed lambda calculus, so Haskell, like Lisp, was born out of pure mathematics. But again, the syntax is probably not what you are used to.
You can certainly explain Lisp and Haskell syntax to mathematicians, and they will treat it as a "game". Language constructs like loops, recursion and conditionals can be derived from the rules of the game rather than blindly implemented as in other languages. This leads you into the realm of combinatory logic and the lambda calculus, other branches of mathematics. Indeed, there even the concept of numbers can be constructed out of the rules of the game rather than being a native part of the language (google Church numerals).
So have a look at Lisp/Scheme, Erlang and Haskell if you like. Erlang especially has a syntax close to what you want:
add(A, B) -> A + B.
But my recommendation is to write C-like pseudocode. It's sort of the lowest common denominator of programming languages, with a syntax that is clean and fairly easy to understand. And the function syntax even derives from functions in mathematics. Remember f(x)?
As a plus, mathematicians are used to writing C, statisticians are used to writing C (though generally they prefer R), physicists are used to writing C, programmers are used to at least looking at C (I know a few who've never touched C).
Actually, scratch that. You mention that your target audience is statisticians. Write in R.
Something like this website describes?
APL? The only problem is that few people can read it.