As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 10 years ago.
What are some good practices for programming in R?
Since R is a special-purpose language that I don't use all the time, I typically just hack together some quick scripts that do what I need.
But what are some tips for writing clean and efficient R code?
You already provide a hint by describing your current approach as 'hacking together quick scripts'. If you want best practices and structure, simply follow the established best practices from CRAN:
create a package; this opens the door to running R CMD check, which is very useful (a minimal sketch follows this list)
as many people have stated, having a package helps you in the code-writing stage too, as you are somewhat forced to document the code; that is a Good Thing (TM)
once you have a package, add code in the \examples{} section of the documentation, as this will be run during R CMD check and provides an easy entry into regression testing
once you get used to regression testing, start to use a package such as RUnit; that really is best practice
JD's pointer to the Google Style Guide is a good one too. It isn't the only style guide; Henrik's R Coding Convention, for example, precedes it by a few years, and there is also Hadley's riff on Google's style guide
Otherwise, the oldie-but-goldie 'do what your colleagues and coauthors do' also applies
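To make the first few points concrete, here is a minimal sketch of that workflow (the function and package names are made up for illustration; package.skeleton() and R CMD check are the standard tools):
# a function you want to ship
colMeansNA <- function(x) colMeans(x, na.rm = TRUE)
# create the package skeleton from the object in your workspace
package.skeleton(name = "mytools", list = "colMeansNA")
# then edit man/colMeansNA.Rd and fill in an \examples{} section, e.g.
#   \examples{ colMeansNA(data.frame(a = c(1, NA), b = 2:3)) }
# and from the shell:
#   R CMD build mytools
#   R CMD check mytools_1.0.tar.gz   # runs the examples, checks docs, etc.
# RUnit test files conventionally go under mytools/inst/unitTests/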
I recommend Josh Reich's Load, Clean, Func, Do workflow from this previous question.
In addition I recommend following coding guidelines such as Google's R Style Guide. Using a coding style guide makes reading the code later so much easier.
I completely agree with the existing answers, especially regarding the usage of packages. Packages require a lot of discipline, documentation, and structure, which really help to enforce best practices (along with R CMD check). You can also use the codetools package to help with this. Use the roxygen package for documentation.
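As a small illustration of roxygen-style inline documentation (my own made-up function; the #' tags are standard roxygen/roxygen2 syntax):
#' Trimmed summary of a numeric vector
#'
#' @param x a numeric vector
#' @param trim fraction of observations to trim from each end of x
#' @return a named numeric vector with the trimmed mean and the standard deviation
#' @examples
#' trimmed_summary(rnorm(100))
trimmed_summary <- function(x, trim = 0.1) {
  c(mean = mean(x, trim = trim), sd = sd(x))
}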
Beyond that, I recommend that you not only vectorize your code, but more particularly, make every effort to vectorize your functions, meaning that you should be able to provide vector arguments and get vectors returned (even from things like database calls). That will really improve your code efficiency and clarity in the long run.
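For instance, a function written with vectors in mind might look like this (a made-up sketch; the arithmetic operators and stopifnot() are base R and already handle vectors):
# scalar thinking: handles one value at a time
to_fahrenheit_scalar <- function(celsius) {
  if (celsius < -273.15) stop("below absolute zero")
  celsius * 9/5 + 32
}
# vectorised version: takes a vector, returns a vector of the same length
to_fahrenheit <- function(celsius) {
  stopifnot(all(celsius >= -273.15, na.rm = TRUE))
  celsius * 9/5 + 32
}
to_fahrenheit(c(0, 36.6, 100))   # 32.00 97.88 212.00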
Lastly, I really like to use something like Sweave to organize my code into clear literate reproducible research whenever writing a report. Along with this I recommend using the cache package.
For efficiency, prefer vector operations over for loops.
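A small illustrative comparison (not from the original answer):
x <- rnorm(1e6)
# loop version
total <- 0
for (i in seq_along(x)) total <- total + x[i]^2
# vectorised version: shorter and much faster
total2 <- sum(x^2)
all.equal(total, total2)   # TRUE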
This is good programming practice in general, but use a version control system such as SVN to manage your code.
Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Questions asking us to recommend or find a tool, library or favorite off-site resource are off-topic for Stack Overflow as they tend to attract opinionated answers and spam. Instead, describe the problem and what has been done so far to solve it.
Closed 9 years ago.
I've been using R for a little over a year now and it's been a successful venture. But all too often, I find there is something I can't figure out, simply because I don't know how to find it, or an example of it.
Stackoverflow,
Could you recommend a pathway for learning R in a manner that provides one with a toolset at their disposal to solve problems of a statistical nature?
There's a wealth of knowledge on the internet, between the r-project website and the mailing lists, but it seems to be "everywhere" and nowhere when you're actually looking for it.
For example, when I first started using R, I went through "Intro to R". Then I read the language definition (which obviously hasn't sunk in). But every time I ask a question on Stackoverflow I'm presented with some new badass function that is the solution to all my problems in the short term. My question is, how did you know these functions existed in the first place? And how does one go about finding them? Presumably, you read something or found some resources that detoured your learning onto the exponential part of the curve. What was it?
Obviously, R's functionality as a statistical tool is broad. For my own purposes I work mostly with economic or financial data. Hence, answers with this in mind would be most helpful.
Completely biased response: learn plyr, reshape2 and ggplot2. They will cover 90% of your data manipulation and visualisation needs. All three packages have a consistent philosophy of data (which the ggplot2 book touches upon), and are designed to be consistent and easier to learn.
Rather than learning many specialised functions, I really encourage you to learn about simple functions that can be flexibly composed to solve a wide range of problems. This is what plyr strives to do for data manipulation, and what ggplot2 strives to do for visualisation. It does mean you need to invest more time up front to learn a little about the underlying theory, but it's my belief that it will pay off handsomely in the long run.
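As a rough illustration of that philosophy, using the built-in mtcars data (my own sketch, assuming plyr and ggplot2 are installed):
library(plyr)
library(ggplot2)
# one simple verb (ddply) plus one simple summary, composed as needed
by_cyl <- ddply(mtcars, "cyl", summarise,
                mean_mpg = mean(mpg),
                n        = length(mpg))
by_cyl
# the same grammar-of-graphics idea on the plotting side
ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(cyl))) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE)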
This is how I learned R.
R resources:
To learn R, the most important resource is Google. Search for: “TOPIC r-project”, “TOPIC filetype:r”, or “TOPIC site:nabble.com”.
Second, look at the example code provided with most packages. Go to “http://bm2.genes.nig.ac.jp/”, search for a topic, and look at the example code. Run it and adapt it; this way you can often solve part of your problem. (A few of the built-in search commands are sketched after this list.)
Third: the r-help mailing list. Read the posts; the basic questions get asked over and over again. If you have a problem and you are completely stuck, ask a question on the mailing list.
Finally, look at the source code of the R packages. That’s the hardest part. If you can alter the code to your needs, you have mastered R ;-)
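For the search and example-code points above, a few built-in commands already go a long way (just a sketch of the standard help tools):
help.search("linear model")   # search the installed documentation by topic
RSiteSearch("garch")          # search the R-help archives and online docs
example(lm)                   # run the examples from a function's help page
vignette()                    # list the longer worked examples shipped with packages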
Some Tips:
R has a steep learning curve. That’s a feature ;-). It is designed to solve advanced problems, and in the end you are faster than you would be using an alternative to R.
Know every single R package and function that is relevant to your problem. The strength of R is that there are so many packages available (around 2000, I think). There is usually a package that is better suited or that already solves your problem. (Some help pages are badly written and hard to understand; I got used to it.)
R books are not helpful in learning R. Yes, that’s true. If you are an expert programmer and expert statistician, you don’t need any book on R (the only exception is Hadley Wickham’s ggplot2 book). If you are not, learn programming in general and/or advanced statistics.
Some R packages have known bugs which nobody will fix (the package owner left university, etc.). Just a warning: this can be tricky if you are looking for a bug in your code and the bug is in an R package.
I'll start with this:
My question is, how did you know these functions existed in the first place?
Simple - we tried to solve a similar problem and came across that function. It either suited or didn't suit our needs but we now know it's there. I haven't used R much personally but what you're describing is the learning curve for every programming language ever. Firstly, you learn the "grammar" i.e. what you can do. Then you try to do something. You find you can't.
At that stage a programmer has a number of options. What do I do personally? Depends. I'll try and look up that package/header/library/whatever's member functions to see if something suits my needs. I might Google it, because unless you're really pushing the boundaries someone somewhere has probably tried and failed to do it before and had their question answered. If you are pushing the boundaries, someone somewhere has probably tried and failed before, but got no answer. I might try a forum or two to see what happens. I personally don't use IRC much, but that's another option, as are mailing lists depending on how specialised the problem is.
I also have a folder on my computer full of books which I search through depending on the problem and a small library of books I look through/learnt from, which often contain practical, not-quite-there-but-adaptable examples.
My only comment would be that attempting to read the language specification is unlikely to be massively useful to you as a beginner. You won't fully understand what it means because you haven't pushed the bounds and tried things yet. For example, a novice in C might try this:
char c = '7';      /* the character '7', whose ASCII code is 55 */
int x = (int) c;   /* x is now 55, not 7 */
to convert the character '7' into an integer form. It's not a bad thought process; once you understand how characters and ASCII work, you see why the above gives you 55 rather than the 7 you wanted.
In short, I think this is going to be part of the learning process and I don't think you can cut it any shorter. The consolation is that, like any research, the more you do it the more you'll know where to look and what questions to ask in the various communities.
One of the things I do is follow the RSS feed of R questions on SO (https://stackoverflow.com/feeds/tag/r). Then I can browse what other people have asked/answered.
Often I will favourite a particular question/answer if I think I'll use it, or jot down the salient points into my notebook software (OneNote); occasionally I'll even try the question/answer out myself.
EDIT:
I'd also recommend Patrick Burns's book The R Inferno. It's not so much a training book as a description of all the gotchas and oooh moments Patrick has found (so far).
There's a free book you might be interested in: Introduction to Probability and Statistics Using R
Here is a good list of resources for learning R:
https://stats.stackexchange.com/questions/138/resources-for-learning-r
Also, that website in general is a good resource.
In general I would say that following a mailing list, or a help list is the best way I have found for learning new things. (That and the "R magazine": http://www.r-bloggers.com )
Learning the RODBC package to interact directly with Oracle data made a big impact at my job. My boss was amazed when I pulled Oracle data directly into R and cranked out a plot in only a few lines of code. Try doing that in Excel!
Moral of the story, learn how to pull in data and manipulate it within R. Then move to some of the cooler stuff like ggplot.
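A rough sketch of that workflow (the DSN, credentials, and table are placeholders, not real names; it assumes the RODBC package and an ODBC data source pointing at your Oracle instance):
library(RODBC)
ch  <- odbcConnect("ora_dsn", uid = "user", pwd = "secret")    # placeholder DSN and credentials
dat <- sqlQuery(ch, "SELECT region, SUM(sales) AS total_sales
                     FROM sales_history GROUP BY region")      # placeholder table
odbcClose(ch)
names(dat) <- tolower(names(dat))    # Oracle tends to return upper-case column names
barplot(dat$total_sales, names.arg = dat$region, main = "Sales by region")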
I can recommend Penn University's Introductory Course on R.
The ggplot chapter alone is worth reading - I found ggplot very confusing but this is a great explanation.
The book that helped my learning the most was The Art of R Programming. A lot of programming books can be dry. Since R is commonly an entry point to programming, it's important for the voice of the materials to resonate with the student. That book did just that with me. The voice felt very casual and I liked that.
Some interesting links:
Intro, links and examples: http://manuals.bioinformatics.ucr.edu/home/programming-in-r
A lot of documentation: https://en.wikibooks.org/wiki/R_Programming
R forum: http://r.789695.n4.nabble.com/
The [R] tag FAQ, right here on Stackoverflow, https://stackoverflow.com/questions/tagged/r?sort=frequent provides numerous reproducible examples that one can use to "learn by doing".
Most of the problems are very common and will eventually be something that you will have to look up as a beginner. The FAQ also provides highly literate (and experienced) examples of usage for a diverse range of functions and useful packages.
If you're new to R, and you prefer a more hands on approach to learning, the FAQ should not be overlooked as a potential resource for learning. Many of the questions also provide useful discussion surrounding paradigms of the language itself (vectorization, workflow, debugging are just a few examples).
Nearly every question in the FAQ is worth studying as a new user as it touches on elements that, speaking for myself, I wish I had been pointed to when I asked this question originally.
Just a few examples:
How to make a great R reproducible example
Grouping functions (tapply, by, aggregate) and the *apply family
Workflow for statistical analysis and report writing
How to sort a dataframe by multiple column(s)?
What is your favorite R debugging trick?
Closed 11 years ago.
There are many CASE tools, many software for diagrams, drawing, documenting. But can they replace old good paper?
Every day, all day! (Okay, not all day, but a lot)
I actually had a debate a while back on the value of pseudocode, and I was giving my input on how pen/paper and some pseudocode can work wonders at times :)
I use computers to solve easy design problems, but when I hit something really hard I break out the powerful tools - pen, paper and brain.
I use a whiteboard for design and pen-and-paper for TODOs.
Especially when it comes to doing some math before the implementation, there's nothing better than putting it down on paper first!
No software can ever replace the sheer ease of jotting down ideas and solution sketches using pen/paper! EVER!
Once you have your critical thinking down on a paper you can take your time to beautify them using fancy softwares and tools.
All the time I use pen and paper, I find them invaluable tools to programming! Making notes, etc, etc...
Using quick sketches is an invaluable tool in clarifying requirements with a client. You don't have to be Da Vinci to quickly encapsulate complex business logic or UI behaviors in some simple sketches. Leah Buley at Adaptive Path has great resources on sketching for UX. Programmers can learn these techniques as well. You'll save a lot of time using paper first, before sitting down in front of Visio.
I vastly prefer pencil & paper (or pen & markerboard) for real-time thinking. It can handle just about anything my brain thinks of. If I need to create any official artifacts, I'll take what I've drawn and set it up using a tool. But usually the initial copy is sufficient.
On a side note, I'm still not sure why just about everyone in college switched to laptops for taking notes. You don't have anywhere near the ability to express your thoughts in Word as you do on paper.
All the time, especially for complex logic with lots of conditional programming!
I always find it easier to jot down what I'm about to draw/model before using application tools.
All the time. When I want to draw/write something complex, I don't want to master a piece of software to do it. Also means there are no extra applications hogging up my system resources. Plus, there's something satisfying about writing at all angles on a piece of paper :).
Most of the time when I program you can see papers all over my desk; some are wrinkled on the floor and some are not.
I usually do my brainstorming and preliminary UML diagrams on paper.
If only I had a whiteboard... :)
I don't use pen and paper when working alone, but I always use them when working with other people, talking with customers and so on. I mainly use pencils to draw diagrams.
In my opinion, the most beautiful thing about programming, its heart, is designing a good algorithm or pseudocode.
I used to think that pen and paper could be a good idea, but I went ahead and typed the code anyway. Those were easy programs though, short ones.
I just approached the P vs. NP question, not that I expect to resolve it, but curiosity rules me.
You do not need to face such a big problem to use paper and pen, but since I got into that one I realized how important it is.
It saves time and makes you more efficient.
Generally, while you are programming you concentrate on small concerns like: is this variable an int...?
To get the big picture of the program, the best tool is a pen; it lets you concentrate on the problem first and then go on to the technical stuff: memory management, security, fast code...
If you go directly to the keyboard, you might spend lots of time creating a big, powerful function only to realize at the end that you do not need it, because it turns out that variable "a" will always be negative, or whatever.
But please trust me: I have only just started programming, and happily I have discovered the world of pen and paper.
I just realized that your question is actually not a yes/no question; it is about the comparison with diagramming and documenting tools.
Pen and paper before writing the program.
For documenting while you program, it is a good idea to use a computer. Of course you can document on paper, but having your code full of /* */ comments makes it faster and easier to read and edit again.
So there is a place for both things but stick with the pen at the beginning.
Closed 10 years ago.
I'm trying to evaluate the purchase of a statistical tool. This will be used in part by non-programming users (doing clinical studies) and in part by programmers, so I'm trying to find a good compromise between usability and automation. Of course, cost is an issue, but if I can build a solid case, we could probably buy a commercial package, so we're not totally limited to free options.
So far, our options are:
Statistica (which some non-programmers already know)
Matlab Statistics toolbox (programmers already use matlab)
R language (would need a UI for non-programmers)
Hack something into Excel (not fun, but that's what non-programmers do right now)
?...
What else is out there? What's the industry standard? What kind of distinctive features should I look for? What would you recommend, and why?
Ideally, we'd like a tool that can run both on Linux and Windows machines.
(I work in medical imaging, so we do both biostatistics, and software engineering statistics)
Hands down it's R. R is very programmer friendly. It has functional aspects and it's GNU.
S-PLUS and R are both based on the S language. They are similar, and in most cases you can run an S-PLUS program in R and vice versa.
SAS is another option, although geared more towards BI and the enterprise. SAS has a simpler syntax than R and in my opinion is easier to pick up for a non-programmer.
Other options include SPSS, Matlab, and even Excel.
I recommend R, personally. It's used by bioinformaticians and psychologists, I hear. Don't know what your field is though, so maybe it's a lousy choice. It is reasonably easy to use and learn.
Stata and SPSS tend to be the most commonly used packages in clinical studies. Both are pretty easy to pick up and use for non-technically minded folks but are generally flexible enough. I've used Stata more than any of the others and have been pretty happy with its options (supports both menu-based and command line operation, decent enough plugin system to get new user-created modules, good graphing support).
R is a little more daunting for newbie users, though it is popular with the biostatisticians. Since it's free, that's another nice point in its favor.
For a statistical package with a GUI which non-technical users can use, I would recommend that you go with "SAS Enterprise Guide". You will get the common and advanced SAS procedures, an excellent graphics facility and the ability to program for the technical users. I recommend that you start with the "SAS Learning Edition" (http://support.sas.com/learn/le/) which is a fully functional version of Enterprise Guide, but limited to processing 1000 rows at a time only. It is under $500, which makes it a pretty good deal.
I would look at S-Plus.
You get a strong programming environment (S-Plus Workbench, based upon the Eclipse platform), an intuitive GUI for non-programmers, and an extensive user community (including users of R, which was based upon the original S).
Visual Numerics is another option.
It sounds like you're trying to maximize multiple goals. You say "This will be used in part by non-programming users (doing clinical studies) and in part by programmers, so I'm trying to find a good compromise between usability and automation", with an implicit assumption that this will be the same tool in both cases, when that might not be realistic. What's the compromise for Word and LaTeX, for example?
Some different questions about the requirements:
Should it be extensible for programmers
Able to use C extensions
Easy to make new procedures and methods
What analysis are non-programmers going to want to use?
Graphics?
Ease of use for different groups
So my read on this:
Easy to extend: R/S-plus, Matlab/Octave (I happen to prefer R, but I do more stats and fewer matrix things)
Easy to use for normal people: Excel, custom wrapped R, SPSS
Also, R on windows has a limited GUI, which may or may not help your users.
If it were me, I'd go with a hybrid solution. Use R, and give non-programmers a cheat sheet that illustrates common tasks; or even better, write some wrapper functions with names like "image_summary" that automate their exploratory work.
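As an illustration, such a wrapper might look like this (purely a sketch; image_summary is the hypothetical name used above, and the CSV input is an assumption about how the non-programmers get their data):
image_summary <- function(file) {
  dat <- read.csv(file)                 # assumes they already export data as CSV
  print(summary(dat))                   # basic descriptive statistics
  num <- dat[sapply(dat, is.numeric)]
  if (ncol(num) > 1) pairs(num)         # quick scatterplot matrix of the numeric columns
  invisible(dat)
}
# non-programmers then only ever need to type something like:
#   image_summary("study42.csv")        # hypothetical file name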
For writing front end scripts for R, the RPy python wrappers might help as well.
SAS Enterprise Guide has good usability for non-programmers. Also, it has good options to connect to Excel. And for programmers, it's the most robust option out there. The SAS server runs on anything; Enterprise Guide, though, is Windows only.
Consider Excel one more time. It is well known and widely available. Refer to this book or this book.
This Wikipedia page compares the features available for several statistical packages, as well as their OS compatibility and pricing info (which seems a little out of date, but it gives an overall idea)
We ended up getting the Matlab Statistics toolbox (mainly because we already have some experience with Matlab in the team, and needed the tool anyway)
So far, it's doing what we need to do, and it's easily extended. Usage will show if non-programmers really use it, but so far it's looking good.
Closed 10 years ago.
Which Computer-aided Software Engineering tools do you use and why? In what ways do they increase your productivity or help you design your programs? Or, in case you do not use CASE tools, what are your reasons for this?
The best CASE tool I have worked with is Enterprise Architect from Sparx.
It's lightweight compared to Rose (easier to buy and cheaper too) but extremely powerful. You can do great UML diagrams, database models, or anything else you want, all in a nice and organised way.
It greatly helps in the initial stages of the elaboration process, as you can create a domain model, do some preliminary use cases, map them to the requirements, and present all of it in a nice way to the customer. It helps me think, and I refactor my design with it until I am satisfied enough to start proper documentation.
It is also very good for database models, as it can reverse-engineer most databases very neatly.
The only (but quite serious) drawback it has in my eyes is that its documentation generator is, to put it mildly, crap. Getting a proper document from it is almost impossible unless you invest a significant amount of work in the templates, and even then it is only OK.
I have used Rational Rose and a few other similar packages in the past. Mostly I have used them for the UML diagram elements and have not gone into the more detailed functionality such as code generation etc.
I mostly use them for aiding the design process and clarifying my own ideas. Often I find that, in trying to come up with a design for a componant, I end up needing to write down / draw what I want to happen so I can get a clear overview in my mind of what needs to happen and why. I have found that in a lot of cases, what I end up trying to draw is essentially the same as a predefined kind of diagram in UML, such as a Use Case Diagram etc. and by then adopting that style, it becomes easier to get my ideas on paper as I have some framework to work within.
So, I use CASE tools principally for their UML / design tools, at a highish, semi-abstract level.
Oracle Designer
Not using any. No money for them.
Closed 10 years ago.
I'm looking for a good 3D Mesh library
Should be able to read popular formats (OFF, OBJ...)
Should support both half-edge structure and a triangle soup
Should be tolerant to faults and illegal meshes.
Basic geometric operations - intersections, normal calculation, etc.
Most importantly - Should not be convoluted with endless template and inheritance hierarchies.
I've tried both CGAL and OpenMesh but both fail miserably in the last point.
Specifically CGAL which is impossible to follow even with the most advanced code analysis tools.
So far I'm seriously considering rolling my own.
My preference is C++ but I'm open to other options.
May I ask why the last point is a requirement?
Libraries written for public consumption are designed to be as generic as possible so that they are usable by the widest possible audience. In C++, this is often best done using templates. It would suck tremendously if you found a good library, only to discover it was useless for your purposes because it used floats instead of doubles.
CGAL, for example, appears to have adopted the well-known and well-tested STL paradigm of writing generic and extensible C++ libraries. This does indeed make it difficult to follow with code analysis tools; I doubt they're much good at following STL headers either.
But are you trying to use the library or modify it? Either way, they seem to have some extremely high-quality documentation (e.g. Kernel Manual) that should make it relatively simple to figure out what you need to do, without having to resort to reading their code.
Disclaimer: I know this isn't what you're asking for. But I don't think what you're looking for exists. It is extraordinarily rare to find open source code with documentation as good as what I've seen scanning through CGAL. I would strongly suggest that you take another look at it.
First, some general comments about your requirements:
reading OBJ or OFF files is very easy. You could implement it yourself, on top of a library providing the more geometric features, in a few minutes. On the other hand, the geometric part of such libraries is so much more tricky that you should certainly focus on your requirements which really deal with the geometric algorithms, and try to find something which suits your needs. Then, of course, if there is a tie, start considering this interface issue.
in terms of geometric operations, you ask for intersection. Do you mean primitive intersection? (for which good and simple algorithms can be found and implemented) Or computation of the intersection of two meshes? Or collision detection? (these are delicate questions, with no simple answer)
if you are more specific, from a higher level point of view, about the kind of tools you want to build, then people will be able to direct you to the right tool. Your requirements are too low-level.
As far as I understand your question, it seems to me that you do not clearly see the point of libraries like CGAL and OpenMesh. Such libraries may not provide all the higher level tools you need, but their aim is to provide you (especially in the CGAL case) with the geometric framework upon which you can build a geometric application. Such geometric frameworks are very delicate to design, especially because of the robustness issue, which is very specific to computational geometry. And without such a framework, building a robust application is a horrendous effort.
If you do not find a library which suits your need, you should seriously consider using a library such as CGAL as the underlying framework for your development. It will prevent the appearance of the robustness related problems, that you will typically only start noticing late in your development process, when changing the underlying framework will be painful. As an aside, CGAL has an extensive documentation, and a very active users' mailing-list.
If you do not know about robustness issues in geometry software, have a look at this page:
robustness issues
I don't know if it can be useful for you, but there is also another library, called the Mangrove TDS Library, freely available at http://mangrovetds.sourceforge.net. It supports any type of shape (2d, 3d, any dimension), with any domain (manifold, non-manifold, pseudo-manifolds, iqm complexes, simplicial complexes, and so on). It also supports non-regular shapes, i.e., shapes formed by pieces of different dimensionalities.
Its main property is that it is extensible, in the sense that any topological data structure is supported. It is a plugin, which can be changed and loaded at run-time.
Its implementation is based on the array-based indexing of entities, encoded in a data structure, supporting iterators. It also supports dynamic properties.
Finally, it supports an implicit representation of entities not directly encoded in the data structure (ghost entities), which improves the efficiency of topological queries.