What is your preferred style for naming variables in R? [closed] - r

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 6 years ago.
Improve this question
Which conventions for naming variables and functions do you favor in R code?
As far as I can tell, there are several different conventions, all of which coexist in cacophonous harmony:
1. Use of period separator, e.g.
stock.prices <- c(12.01, 10.12)
col.names <- c('symbol','price')
Pros: Has historical precedence in the R community, prevalent throughout the R core, and recommended by Google's R Style Guide.
Cons: Rife with object-oriented connotations, and confusing to R newbies
2. Use of underscores
stock_prices <- c(12.01, 10.12)
col_names <- c('symbol','price')
Pros: A common convention in many programming langs; favored by Hadley Wickham's Style Guide, and used in ggplot2 and plyr packages.
Cons: Not historically used by R programmers; is annoyingly mapped to '<-' operator in Emacs-Speaks-Statistics (alterable with 'ess-toggle-underscore').
3. Use of mixed capitalization (camelCase)
stockPrices <- c(12.01, 10.12)
colNames <- c('symbol','price')
Pros: Appears to have wide adoption in several language communities.
Cons: Has recent precedent, but not historically used (in either R base or its documentation).
Finally, as if it weren't confusing enough, I ought to point out that the Google Style Guide argues for dot notation for variables, but mixed capitalization for functions.
The lack of consistent style across R packages is problematic on several levels. From a developer standpoint, it makes maintaining and extending other's code difficult (esp. where its style is inconsistent with your own). From a R user standpoint, the inconsistent syntax steepens R's learning curve, by multiplying the ways a concept might be expressed (e.g. is that date casting function asDate(), as.date(), or as_date()? No, it's as.Date()).

Good previous answers so just a little to add here:
underscores are really annoying for ESS users; given that ESS is pretty widely used you won't see many underscores in code authored by ESS users (and that set includes a bunch of R Core as well as CRAN authors, excptions like Hadley notwithstanding);
dots are evil too because they can get mixed up in simple method dispatch; I believe I once read comments to this effect on one of the R list: dots are a historical artifact and no longer encouraged;
so we have a clear winner still standing in the last round: camelCase. I am also not sure if I really agree with the assertion of 'lacking precendent in the R community'.
And yes: pragmatism and consistency trump dogma. So whatever works and is used by colleagues and co-authors. After all, we still have white-space and braces to argue about :)

I did a survey of what naming conventions that are actually used on CRAN that got accepted to the R Journal :) Here is a graph summarizing the results:
Turns out (no surprises perhaps) that lowerCamelCase was most often used for function names and period.separated names most often used for parameters. To use UpperCamelCase, as advocated by Google's R style guide is really rare however, and it is a bit strange that they advocate using that naming convention.
The full paper is here:
http://journal.r-project.org/archive/2012-2/RJournal_2012-2_Baaaath.pdf

Underscores all the way! Contrary to popular opinion, there are a number of functions in base R that use underscores. Run grep("^[^\\.]*$", apropos("_"), value = T) to see them all.
I use the official Hadley style of coding ;)

I like camelCase when the camel actually provides something meaningful -- like the datatype.
dfProfitLoss, where df = dataframe
or
vdfMergedFiles(), where the function takes in a vector and spits out a dataframe
While I think _ really adds to the readability, there just seems to be too many issues with using .-_ or other characters in names. Especially if you work across several languages.

This comes down to personal preference, but I follow the google style guide because it's consistent with the style of the core team. I have yet to see an underscore in a variable in base R.

As I point out here:
How does the verbosity of identifiers affect the performance of a programmer?
it's worth bearing in mind how understandable your variable names are to your co-workers/users if they are non-native speakers...
For that reason I'd say underscores and periods are better than capitalisation, but as you point out consistency is essential within your script.

As others have mentioned, underscores will screw up a lot of folks. No, it's not verboten but it isn't particularly common either.
Using dots as a separator gets a little hairy with S3 classes and the like.
In my experience, it seems like a lot of the high muckity mucks of R prefer the use of camelCase, with some dot usage and a smattering of underscores.

I have a preference for mixedCapitals.
But I often use periods to indicate what the variable type is:
mixedCapitals.mat is a matrix.
mixedCapitals.lm is a linear model.
mixedCapitals.lst is a list object.
and so on.

Usually I rename my variables using a ix of underscores and a mixed capitalization (camelCase). Simple variables are naming using underscores, example:
PSOE_votes -> number of votes for the PSOE (political group of Spain).
PSOE_states -> Categorical, indicates the state where PSOE wins {Aragon, Andalucia, ...)
PSOE_political_force -> Categorial, indicates the position between political groups of PSOE {first, second, third)
PSOE_07 -> Union of PSOE_votes + PSOE_states + PSOE_political_force at 2007 (header -> votes, states, position)
If my variable is a result of to applied function in one/two Variables I using a mixed capitalization.
Example:
positionXstates <- xtabs(~states+position, PSOE_07)

Related

R Development: Use of `::` Operator for `base` Package

TLDR
Does rigorous best practice recommend that a proactive R developer explicitly disambiguate all base functions — even the ubiquitously common functions like c() or cat() — within the .R files of their package, using the package::function() convention?
Context
Though a novice developer, I need to create a (proprietary) package in R. The text R Packages (by the authoritative authors Hadley Wickham and Jenny Bryan) has proven extremely helpful (if occasionally deprecated).
I am keen on following best practices from the start, so as to save myself time and effort down the road. As described in the text, the use of the :: operator can prevent current and
future conflicts, by disambiguating functions whose names are overloaded. To wit, the authors are careful to introduce each function with the package::function() convention, and they recommend its general use within the .R files of one's package.
However, their code examples often call functions that hail from the base package yet are unaccompanied by base::. Many base functions, like the ubiquitous c() or cat(), are used by R programmers in their sleep and (I imagine) are unlikely to ever be overloaded by a presumptuous developer. Nonetheless, it is confusing to see (for example) the juxtaposition of base::with() against (the base function) print(), all within a few lines of text.
...(These functions are inspired by how base::with() works.)
f <- function(x, sig_digits) {
# imagine lots of code here
withr::with_options(
list(digits = sig_digits),
print(x)
)
# ... and a lot more code here
}
I understand that the purpose of base::with() is to unambiguously introduce the with() function to the reader. However, the absence of base:: (within the code itself) seems to stick out like a sore thumb, when the package is explicitly named for any function called from any other package. Given my inexperience, I don't feel comfortable assuming the authors' intent.
Question
Are the names of base functions sufficiently unique that using this convention — of calling base::function() for every function() within the base package — would not be worth it? That the risk of overloading the functions (at some point in the future) is far outweighed by the inconvenience (and sheer ugliness) of
my_vector <- base::c(1, 2, 3)
throughout one's .R files? If not, is there an established convention that would balance unambiguity with elegance?
As always, I am grateful for any help, especially on this, my first post to Stack Overflow.

Primitive function in R

Can someone give me explanation for the below sentence highlighted in Bold.
"Primitive functions are only found in the base package, and since they operate at a low level, they can be more efficient (primitive replacement functions don’t have to make copies), and can have different rules for argument matching (e.g., switch and call). This, however, comes at a cost of behaving differently from all other functions in R. Hence the R core team generally avoids creating them unless there is no other option.
Source Link:http://adv-r.had.co.nz/Functions.html#lexical-scoping

Abbreviations and functions in preparation for a programming contest [closed]

As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 11 years ago.
I am participating in a big programming competition tomorrow where I use R.
Time is the main factor (only 2 hours for 7 coding problems).
The problems are very mathematics related.
I would like to write "f" instead of "function" when I define a function.
This can be done and I had the code to do so, but I lost it and cannot find it.
Where do I find sin() functions for degrees input, not radian?
(optional) Is there any algorithm specific task view or libraries.
Any tip for a programming contest?
I prepared the following cheat sheet for the contest:
http://pastebin.com/h5xDLhvg
======== EDIT: ==========
So I finally have time to write down my lessons learned.
The programming contest was a lot of fun, but unfortunately I did not score very well. I was in the top 50%, but my aim was to be in the top 25%.
The main problem was that there was very little time to program, just 2 hours in total. But I had to read the problem descriptions and also I needed some time to paste the results in the web form, etc., so it was more like 90 Minutes of programming.
Hopefully the next contest in December will have extended time, like 3-4 hours. The organizers said that perhaps will be the case.
Also, there was no Internet access at the contest, and my mobile reception was not really working.
The main lesson for me is that you have to use a language you daily use in order to have a real chance. Especially, if there is only about 90 Minutes time to program. Since I use haskell more than R in my daily work, I think R was not the best choice. During the contest I mixed up haskell and R function definitions, and I made too many small typos to program fast enough.
What was great about the contest was, that there was about 20 000 bucks prize money in total for the about 80 participants. So the top 25% participants got from 500 to 1500 bucks each. Further, I think the top 15% get a job right away from one of the sponsor IT firms.
So it's a win-win situation. It's fun, plus you can get prize money. Further the IT firms are more than happy, because they have access to the top programmers.
I used the chance to speak to IT decision makers. One of them was from a larger bank. I boldly suggested that they consider switching to Scala for their development (switchung from Java). And also to consider using R and Haskell. It was fun, and they even said they already looked into Scala!
What was interesting to note was, that one of my best friends scored very good at the competition. He is only 19 years old, but he was well in the top 20% and got 500 bucks prize money. He beat me plus 6 of my colleges, who all have a respectable computer science degree. My friend programs more like hacker style, but he was very fast.
People in the top 10 used:
1) Java
2) C# and
3) C++
(No other programming language in the top 10!).
The only other programming language that scored reasonably well was Ruby, I think.
For the next contest the programming language of choice will probably be haskell. For one reason, it's just easier to find 2 team mates for haskell than for R programming. And up to 3 persons can form a team.
My ideal scenario would be a very light weight framework, where I could use multiple programming languages at once for the contest. That way, the main code can be written in haskell (which all team mates can program in). And some specific functions may be programmed in R, or in Mathematica, or even some other programming language (like python/sage).
This sounds a little bit overkill. But I think it would be very usefull. Like a function that has a matrix as a parameter and returns a matrix. Then this framework work generate automatically a RESTful service from the R code, so I could call the R function from any programming language. The matrix is just passed around as JSON data (or some other serialization). Okay, but this is off topic...
So finally some lessons learned as a bullet list:
don't bring food. you don't have time to eat, and there is a rich buffet afterwards
time is the limiting factor!
if you don't program R for a living, don't use R
look for contests where there is more time (3-4 hourss minimum!)
all in all, the concept of the contest is superb! Both for the participants, but also for the sponsors.
BIG THANKS to the help of 'Iterator' for his post!!
I'm going to answer a related, but different question. No offense, but your original suggestions don't seem very wise for a programming competition. Much of the time spent in such contexts is in devising an answer and in debugging (or, better, avoiding the need to debug).
Instead, I will answer this question: "What are the key resources in R that are useful for rapid prototyping, with a focus on being able to find resources quickly, being able to debug quickly, and being able to investigate data quickly? If I need to use numerical optimization methods and algebra systems, what should I investigate?"
Here are my answers:
Install RStudio or possibly Revolution Analytics' R, depending on which interface seems more appropriate to you. Both are good. The former has a very smooth GUI, the latter has a more intense interface, with more capabilities for managing code. Both have some nice properties over the "community" R regarding being able to look up information and navigate the help libraries quickly.
Get acquainted with example(), identify where to get vignettes and tutorials (from packages' pages on CRAN), and take a brief look at demo().
Use the sos library, and master findFn.
Look at the Task Views on CRAN - be sure you know about the tools for high performance computing (if that is going to be related) and the tools for optimization - it's quite common to need to use some kind of solver, and there's a task view for that.
If your code is running slowly during the prototyping or competition, you'll need to run Rprof(). Take that for a spin first. You may also benefit from using the compiler package if your code involves much iteration. In short: You do not want to wait on the computer. You might also look at foreach and doSMP or doMC if you can parcel the job to different cores. To aggregate results, become familiar with plyr and methods like ldply, as well as standard *apply functions, like lapply and apply; another good one to know is rapply. (If you have lots of stuff to process and it takes some time, look at mclapply or the .parallel argument for the plyr functions.)
On Stack Overflow: browse JD Long's questions - much of what you will discover that you do not know will have been asked by him before you thought to ask it. And there's an answer already there.
Create a number of little code templates for yourself. Master functions so that you don't need to learn these in a rush. Learn how to debug and step through these, using debug() and browser().
If you have to count things, learn how to use the hash package (akin to Perl and Python hash tables) and learn to use digest for keys that are too long to be used for hash (see this question for references)
If you are going to need to plot things, get some basic example plots prepared, using either plot or ggplot2, along with hist, boxplot, and some others. If you don't know ggplot2 already, then postpone, but you should become familiar with it. If you happen to use a lot of data, then be sure you know hexbin. If you will have to interact with data, then get to know iplots and the interesting tools there, such as iplot, ihist, and parallel coordinate plots (ipcp).
Be sure you know how to use lists, data frames, and matrices, including subscripting, lookups of entries based on (row, column) indices. (Again, be sure to investigate plyr for transforming and operating on some of these objects.)
Get acquainted with data.table() - it's exceptionally efficient for a lot of things you might do with data frames and matrices.
If you need to do symbolic mathematics, be sure you know the packages for that or else get another standalone tool for symbolic math. Ryacas is one package that appears to be useful.
Get the PDF of the R in a Nutshell, so that you can rapidly search through it for useful methods. Else, get the book itself. Various other books, such as Venables & Ripley, the R Cookbook, and others may be useful, depending on your experience.
If you've already mastered a good editor (e.g. emacs) or IDE (e.g. Eclipse), stick with it and look for bindings to R. Otherwise, a simple one you can begin using right away is Notepad++. Being able to do block selection is a very useful property in an editor. Being able to search through an entire directory hierarchy of code examples is another useful capability.
If you need to do anything involving database data, you may want to know RSQLite and sqldf, though these may not be relevant to a math competition.
Open a bunch of R instances so that you can try things out. :) [This is actually serious: by having multiple instances running, you can somewhat avoid latency associated with sequentially trying things out, waiting for results, and then debugging the results.]
For (1), you can do something like
f <- function(..., body)
{
dots <- substitute(...)
body <- substitute(body)
f <- function()
formals(f) <- dots
body(f) <- body
environment(f) <- parent.env(environment())
f
}
which lets you write, eg, g <- f(x, y, body=x+y) but I'm not sure how far that gets you.
For (2), you could just do:
sindeg <- function(x) sin(x*pi/180)

How did you experience the transition from SPSS to R? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 9 years ago.
Improve this question
The discussion in this question is the direct cause for me asking this question. The more general reason is the fact that I often have to explain R use to people that are only familiar with SPSS. I know most of the basics of SPSS, as we still use it in the base course statistics. But as I'm more of an R guy, it's difficult to know how SPSS users experience the first meeting with R.
I know there is the book R for SAS and SPSS users and that contains already some information. Yet, I would like to know what the more difficult parts are when you switch from SPSS to R.
Or in other words : if you would have to explain R in one day to SPSS users, which topics would you focus on? This is not a hypothetical question by the way (yeah, I know, it's not because one get paid for it that it always makes sense...).
Firstly, data manipulation has been the most challenging thing to learn coming from SPSS/SAS to R. I've found, personally, that getting the data in the right shape for an analysis is usually much more difficult than the analysis itself. Secondly, a true understanding of how to deal with categorical values through the use of factors. Lastly, summary statistics and descriptives can sometimes be challenging to get in a format that is transmutable to PPT or Excel which are what (my) clients generally expect/demand for reporting.
I would focus on:
1 Data manipulation
Understanding data structures. Import/Export. Then in-depth training on the use of packages like plyer, reshape with a particular focus on how to effectively use cast with formulas and melt with ids. How to apply numerical functions within a data.frame using ddply.
2 Factoring Data
In general, an explanation of dealing with recoding with, epicalc or a user-defined function. Also an explanation of the significance of factors, levels, and labels
3 Descriptives
Take a few minutes to introduce xtabs(), table(), prop.table() using cast() from reshape to create columnar tables of data that are more reasonably exported to Excel.
Graphics are optional, if you've done a good job of the above they should be able to get the data they need to create graphs in whatever software they are most comfortable with.
4 Graphics
If you've done a good job teaching the data manipulation, getting data into the shape needed for graphing should be pretty straightforward (or at least reproducible) at this point. ggplot2 is complicated and requires a day just by itself to be played with. But it is possible to give a quick overview of it. Alternatively, base graphics are simple to understand and the help is much more clear on what things do and how the syntax works.
Note: I left out statistical analysis. However, an overview of lm() and perhaps anova(), or cor() would be helpful as a start point. But this should be explained at the same time as data.manipulation.
Although I "wrote the book" on R to SPSS migration, that was aimed at programmers and most SPSS users that I know prefer to "point-and-click" instead. A graphical user interface like Deducer (or R Commander) can help them feel at home while teaching them how R programming code works if they want to see it. Deducer's Plot Builder also does a nice job letting you create complex plots easily, and if you want to learn to ggplot2 code, it will show you that as well. Ian did a great job with it!
However, while the SPSS graphical user interface covers 98% of what SPSS can do, Deducer covers perhaps 1% of what R can do. That's probably still 75% of what your average researcher needs, but R is so broad that to get the most out of it people will need to learn to program. The free version of my book, "R for SAS and SPSS Users" is only 80 pages & covers the areas of programming that I think are most likely to confuse beginners. It's at http://r4stats.com.
Just recently I've had a student who was somewhat versed in statistics and did some analysis beforehand in SPSS. I then showed him how to do the exact same thing in R. We went through the code and plotting, explaining and debating each line. He realized how easy and convenient it is to do it in R. Thus, R community grew by 1. :)
The biggest issue that the researchers I've dealt with have is the lack of point-and-click GUI. While there are a number of efforts out there in the R community, none of them have reached the ease-of-use/power level that SPSS has.
Since coding is second nature to R users, sometimes we forget that the majority of users of statistical software can't program (and would avoid it like the plague), even though they may have a strong practical understanding of statistics.
If I had one day to bring an SPSS user into R, I'd start them on Deducer. Deducer is an R GUI project (Self promotion note: I'm the author) that should feel very familiar to a user coming from SPSS. As they find themselves needing more advanced functions, they will naturally move to the command line to fulfill their needs.

How to describe a MATLAB function using mathematical notation?

Basically I have created two MATLAB functions which involve some basic signal processing and I need to describe how these functions work in a written report. It specifically requires me to describe the algorithms using mathematical notation.
Maths really isn't my strong point at all, in fact I'm quite surprised I've even been able to develop the functions in the first place. I'm quite worried about the situation at the moment, it's the last section of writing I need to complete but it is crucially important.
What I want to know is whether I'm going to have to grab a book and teach myself mathematical notation in a very short space of time or is there possibly an easier/quicker way to learn? (Yes I know reading a book should be simple enough, but maths + short time frame = major headache + stress)
I've searched through some threads on here already but I really don't know where to start!
Although your question is rather vague, and I have no idea what sorts of algorithms you have coded that you are trying to describe in equation form, here are a few pointers that may help:
Check the MATLAB documentation: If you are using built-in MATLAB functions, they will sometimes give an equation in the documentation that describes what they are doing internally. Some examples are the functions CONV, CORRCOEF, and FFT. If the function is rather complicated, it may not have an equation but instead have links to some papers describing the algorithm, which may themselves have equations for the algorithm. An example is the function HILBERT (which you can also find equations for on Wikipedia).
Find some lists of common mathematical symbols: Some standard symbols used to represent common mathematical operations can be found here.
Look at some sample pseudocode to see how it's done: For algorithms you yourself have coded up, you'll have to write them out in equation or pseudocode form. A paper that I've used often in my work is Templates for the Solution of Linear Systems, and it has some examples of pseudocode that may be helpful to you. I would suggest first looking at the list of symbols used in that paper (on page iv) to see some typical notations used to represent various mathematical operations. You can then look at some of the examples of pseudocode throughout the rest of the document, such as in the box on page 8.
I suggest that you learn a little bit of LaTeX and investigate Matlab's publish feature. You only need to learn enough LaTeX to write mathematical expressions. Then you have to write Matlab comments in your source file in LaTeX, but only for the bits you want to look like high-quality maths. Finally, open the Matlab editor on your .m file, and select File | Publish.
See Very Quick Intro to LaTeX and check your Matlab documentation for publish.
In addition to the answers already here, I would strongly advise using words in addition to forumlae in your report to describe the maths that you are presenting.
If I were marking a student's report and they explained the concepts of what they were doing correctly, but had poor or incorrect mathematical notation to back it up: this would lose them some marks, but would hopefully not impede my understanding of the hard work they've put in.
If they had poor/wrong maths, with no explanation of what they meant to say, this could jeapordise my understanding of their entire project and cost them a passing grade.
The reason you haven't found any useful threads is because most of the time, people are trying to turn maths into algorithms, not vice versa!
Starting from an arbitrary algorithm, sometimes pseudo-code, along with suitable comments, is the clearest (and possibly only) representation.

Resources