any R style guide / checker? - r

in Python I'm used to having my code "style-checked" by an automatic but configurable tool, called pep8, after the 8th Python enhancement proposal.
in R I don't know. Google has a style guide, but:
what do most R programmers actually use?
I still didn't find any program that performs those checks.
Dirk, Alex, in your answers you pointed me at pretty printers, but in my opinion that would overdo one thing and not do another: code would be automatically edited to follow the style, while no warnings are issued for poorly chosen identifiers.

There's a formatR package with tidy.source function. I use Emacs with ESS, and follow Hadley's style recommendations. It's hard to compare R with Python, since style is kind of mandatory in Python, unlike R. =)
EDITa simple demonstration:
code <- "fn <- function(x, y) { paste(x, '+', y, '-', x+y) }"
tidy.source(text = code)
## not run
fn <- function(x, y) {
paste(x, "+", y, "-", x + y)
}

I think if you want such a tool, you may have to write it yourself. The reason is that R does not have an equivalent to Python's PEP8; that is, an "official style guide" that has been handed down from on high and is universally followed by the majority of R programmers.
In addition there are a lot of stylistic inconsistencies in the R core itself; this is a consequence of the way in which R evolved as a language. For example, many functions in R core follow the form of foo.bar and were written before the S3 object system came along and used that notation for method dispatch. In hindsight, the naming of these functions should probably be changed in the interests of consistency and clarity, but it is too late to consider that now.
In summary, there is no official "style lint" tool for R because the R Core itself contains enough style lint, of which nothing can be done about, that writing one would be very difficult. For every rule--- "don't do this" ---there would have to be a long list of exceptions--- "except in this case, and this case, and this one, and ..., where it was done for historical purposes".

As for
what do most R programmers actually use
I suspect that quite a few people follow R Core who have a
R Coding standards section in the R Internals manual.
Which in a large sense falls back to these sensible Emacs defaults to be used along with ESS. Here is what I use and it is only minimally changed:
;;; C
(add-hook 'c-mode-hook
;;(lambda () (c-set-style "bsd")))
;;(lambda () (c-set-style "user"))) ; edd or maybe c++ ?
(lambda () (c-set-style "c++"))) ; edd or maybe c++ ?
;;;; ESS
(add-hook 'ess-mode-hook
(lambda ()
(ess-set-style 'C++)
;; Because
;; DEF GNU BSD K&R C++
;; ess-indent-level 2 2 8 5 4
;; ess-continued-statement-offset 2 2 8 5 4
;; ess-brace-offset 0 0 -8 -5 -4
;; ess-arg-function-offset 2 4 0 0 0
;; ess-expression-offset 4 2 8 5 4
;; ess-else-offset 0 0 0 0 0
;; ess-close-brace-offset 0 0 0 0 0
(add-hook 'local-write-file-hooks
(lambda ()
(ess-nuke-trailing-whitespace)))))
(setq ess-nuke-trailing-whitespace-p t)
As for a general, tool Xihui's formatR pretty-printer may indeed be the closest. Or just use ESS :)

lintr - highlights possible syntax and style issues/errors
CRAN Task View: Reproducible Research - Formatting Tools section contains other useful tools, particularly formatR which can automatically formt code.

The lint package gives warnings about stylistic problems, without correcting those.
Running the lint() command (using the default parameter values) gives you a list of warnings for all R files in the current directory.

I use styler and then lintr before I check anything into version control.
styler converts your code base to match a given style - the default matches the tidyverse style described here. It modifies alignments, and some syntax (<- over =). But, it doesn't rename variables or anything like that.
lintr is non-modifying. It just identifies lines of code that are inconsistent with your style guide. I use this within vim when I'm working on a package or a project to identify things that need a bit more human input to fix (renaming variables/functions etc)

RStudio has added a style checker at some point in the past. For instance, in version 1.1.463 you can enable the feature under General Options. Here's a screenshot:

Related

SBCL Compiler Diagnostic Messages (missing with a "when" with no body)

By accident, I recently came across a latent coding error in one of my functions, dealing with a when statement. A reduced paraphrase might look like:
(defparameter a 0)
(when (or (= a 0)
(= a 1)
(* a a)))
The problem is a misplaced parenthesis, so it actually is
(when (or (= a 0)
(= a 1)
(* a a)))
In a situation like this, wouldn't it be useful for the compiler to generate either a style warning or note? It seems to me that the meaning of a when statement normally implies a condition and a body, even though the body is strictly optional. Of course, a print pretty would have caught this in the editor, but I had copied it from elsewhere. Is there a reason that SBCL does not check for these kinds of mistakes?
a print pretty would have caught this in the editor
To discuss the options, I know about:
trivial-formatter will format the source code.
(trivial-formatter:fmt :your-system :supersede)
cl-indentify indents the source code. Has a command line utility. I tried it once and it was not bad, but different than Emacs' indentation, thus annoying for me.
$ cl-indentify bar.lisp
It links to lispindent but I was less happy with its result.
However, the best would be to not only format the code and re-read it ourselves, but to
run checks against a set of rules to warn against code smells
This is what proposes the lisp-critic. It can critique a function or a file. However:
(edit) it doesn't really have a Slime integration, we have to either critique a function or a whole file.
if you feel adventurous, see an utility of mine here. It could be an easier way to test snippets that you enter at the REPL.
it hasn't the rule about when without a body (we can easily add it)
And it would be best that the run failed with an error status code if it found a code smell. Again, a little project of mine in beta tries to do that, see here. It doesn't have much rules now, but I just pushed a check for this. You can call the script:
$colisper.sh tests/playground.lisp
it shows an error (but doesn't write it in-place by default):
|;; when with no body
|(when (or (= a 0)
| (= a 1)
!| (* a a))
!| (error "colisper found a 'when' with a missing body. (we should error the script without this rewrite!)"))
and returns with an exit code, so we can use it has a git hook or on a CI pipeline.
The problem is that if a human writes (when x) (or whatever that expands into, perhaps (if x (progn) nil)) this is probably a mistake, but when a program writes it it may well not be: it may be just some edge case that the program hasn't been smart enough to optimize completely away. And a huge amount of code that the compiler processes is written by programs, not humans.

What's the meaning of "#+" in the code of cl-mysql? [duplicate]

This question already has answers here:
operator #+ and #- in .sbclrc
(2 answers)
Closed 6 years ago.
Recently I tried to read code about cl-mysql, but got stuck with the #+.
Tried to google it, but not work, so turn to here
(defun make-lock (name)
#+sb-thread (sb-thread:make-mutex :name name)
#+ecl (mp:make-lock :name name)
#+armedbear (ext:make-thread-lock)
#+ (and clisp mt) (mt:make-mutex :name name)
#+allegro (mp:make-process-lock :name name))
And looks like it is for different backend lisp compiler. But still no idea why write something like this.
Anyone can help me make it clear, thx.
#+ is a reader-macro that checks if a keyword is in the special variable *FEATURES*. If it isn't there, the following form will be skipped over (by the reader; the compiler will never see it). There is also #- which does the opposite.
There are some things that aren't part of the Common Lisp standard, but are important enough that all (or most) implementations provide a non-standard extension for them. When you want to use them in code that needs to work on multiple implementations, you have to use read-time conditionals to provide the correct code for the current implementation. Mutexes (and threads in general) are one of those things.
Of course there may be features provided by third party libraries as well. The contents of *FEATURES* will look something like this:
(:SWANK :QUICKLISP :SB-BSD-SOCKETS-ADDRINFO :ASDF-PACKAGE-SYSTEM :ASDF3.1
:ASDF3 :ASDF2 :ASDF :OS-UNIX :NON-BASE-CHARS-EXIST-P :ASDF-UNICODE :64-BIT
:64-BIT-REGISTERS :ALIEN-CALLBACKS :ANSI-CL :ASH-RIGHT-VOPS
:C-STACK-IS-CONTROL-STACK :COMMON-LISP :COMPARE-AND-SWAP-VOPS
:COMPLEX-FLOAT-VOPS :CYCLE-COUNTER :ELF :FLOAT-EQL-VOPS
:FP-AND-PC-STANDARD-SAVE :GENCGC :IEEE-FLOATING-POINT :INLINE-CONSTANTS
:INTEGER-EQL-VOP :INTERLEAVED-RAW-SLOTS :LARGEFILE :LINKAGE-TABLE :LINUX
:LITTLE-ENDIAN :MEMORY-BARRIER-VOPS :MULTIPLY-HIGH-VOPS :OS-PROVIDES-DLADDR
:OS-PROVIDES-DLOPEN :OS-PROVIDES-GETPROTOBY-R :OS-PROVIDES-POLL
:OS-PROVIDES-PUTWC :OS-PROVIDES-SUSECONDS-T :PACKAGE-LOCAL-NICKNAMES
:PRECISE-ARG-COUNT-ERROR :RAW-INSTANCE-INIT-VOPS :SB-DOC :SB-EVAL :SB-FUTEX
:SB-LDB :SB-PACKAGE-LOCKS :SB-SIMD-PACK :SB-SOURCE-LOCATIONS :SB-TEST
:SB-THREAD :SB-UNICODE :SBCL :STACK-ALLOCATABLE-CLOSURES
:STACK-ALLOCATABLE-FIXED-OBJECTS :STACK-ALLOCATABLE-LISTS
:STACK-ALLOCATABLE-VECTORS :STACK-GROWS-DOWNWARD-NOT-UPWARD :SYMBOL-INFO-VOPS
:UNIX :UNWIND-TO-FRAME-AND-CALL-VOP :X86-64)
So if you wanted to write code that depends on Quicklisp for example, you could use #+quicklisp. If you wanted code that is only run if Quicklisp is not available, you'd use #-quicklisp.
You can also use a boolean expression of features. For example,
#+(or sbcl ecl) (format t "Foo!")
would print Foo! on either SBCL or ECL.
#+(and sbcl quicklisp) (format t "Bar!")
would only print Bar! on SBCL that has Quicklisp available.
One could imagine that we can write:
(defun make-lock (name)
(cond ((member :sb-thread *features)
(sb-thread:make-mutex :name name))
((member :ecl *features*)
(mp:make-lock :name name))
...))
But that does usually not work, because we can't read symbols when their package is not existing and some packages are implementation/library/application specific. Packages are not created at read time in a lazy/automatic fashion.
In Common Lisp, reading a symbol of a package, which does not exist, leads to an error:
CL-USER 1 > (read-from-string "foo:bar")
Error: Reader cannot find package FOO.
1 (continue) Create the FOO package.
2 Use another package instead of FOO.
3 Try finding package FOO again.
4 (abort) Return to level 0.
5 Return to top loop level 0.
In your example sb-thread:make-mutex is a symbol which makes sense in SBCL, but not in Allegro CL. Additionally the package SB-THREAD does not exist in Allegro CL. Thus Allegro CL needs to be protected from reading it. In this case, the symbol sb-thread:make-mutex will only be read, if the the feature sb-thread is present on the cl:*features* list. Which is likely only for SBCL, or a Lisp which claims to have sb-threads available.
The feature expressions here prevents the Lisp from trying to read symbols with unknown packages - the packages are unknown, because the respective software is not loaded or not available.

Defining aliases to standard Common Lisp functions?

Lisp is said to enable redefinitions of its core functions.
I want to define an alias to the function cl:documentation function, such that
(doc 'write 'function) === (documentation 'write 'function)
How can this be done and made permanent in SBCL?
Creating an Alias
You are not trying to redefine (i.e., change the definition of) the system function documentation, you want to define your own function with a shorter name which would do the same thing as the system function.
This can be done using fdefinition:
(setf (fdefinition 'doc) #'documentation)
How to make your change "permanent" in common lisp
There is no standard way, different implementation may do it differently, but, generally speaking, there are two common ways.
Add code to an init file - for beginners and casual users
SBCL
CLISP
Clozure
ECL
The code in question will be evaluated anew every time lisp starts.
Pro:
Easy to modify (just edit file)
Takes little disk space
Normal lisp invocation captures the change
Con:
Evaluated every time you start lisp (so, slows start up time if the code is slow)
Save image - for heavy-weight professionals
SBCL
CLISP
Clozure
ECL - not supported
The modified lisp world is saved to disk.
Pro:
Start uptime is unaffected
Con:
Requires re-dumping the world on each change
Lisp image is usually a large file (>10MB)
Must specify the image at invocation time
Even though #sds has already answered pretty thoroughly I just wanted to add that the utility library serapeum has defalias
I use a simple macro for this:
(defmacro alias (to fn)
`(setf (fdefinition ',to) #',fn))
e.g.
(alias neg -) => #<Compiled-function ... >
(neg 10) => -10
Other answers include detail about how to make this permanent.

Customizing the ESS environment for R

I am trying to optimize my ESS - R environment. So far I make use of the r-autoyas, I set intendation and stuff following style guides, in the mini-buffer there are eldoc hints for function arguments, and I have the option to press a key in order to find information about variable at point (more here).
Are there any other things you use in order to have a nice R environment? Maybe non-ESS people have some nice things to add (I got that info of variable at point from looking at an Eclipser). One example could be an easy way to insert "just-before-defined" variables without typing the variable name (should be something for that?).
(Please help me to change the question instead of "closing" the thread if it is not well formulated)
I am not using autoyas as I find auto-complete integration a better approach.
Insertion of previously defined symbols is a general emacs functionality called 'dabbrev-expand' and is bound to M-/. I have this in my .emacs to make it complete on full symbols:
(setq dabbrev-abbrev-char-regexp "\\sw\\|\\s_\\|s.")
(setq dabbrev-case-fold-search t)
Another thing which I use extensively is imenu-based-jump-to-symbol-definition. It offers similar functionality to emacs tags, but just for open buffers in the same mode as the current buffer. It also uses IDO for queries:
Put imenu-anywhere.el into your emacs load path and add this:
(require 'imenu-anywhere)
(global-set-key [?\M-o] 'imenu-anywhere)
Now, if I do M-o foo RET emacs jumps to the function/class/method/generic definition of 'foo' as long as 'foo' is defined in one of the open buffers. This of course works whenever a mode defines imenu-tags. ESS defines those, so you should not need to add more.
There is also somewhere a collection of R-yas templates. I didn't get around to starting using them but my guess is that it's a pretty efficient template insertion mechanism.
[edit] Activate tracebug:
(setq ess-use-tracebug t)

Make Emacs ESS follow R style guide

I've been using Emacs/ESS for quite a while, and I'm familiar with Hadley's R style recommendations. I'd like to follow these conventions in ESS, like those nice spaces around operators, space after comma and after if statement, before curly braces, etc.
Did anyone even bothered to follow this style guide at all? IMHO, official style recommendations are quite modest, and they say nothing about the style whatsoever. Google R style guide are too similar with the ones I use when I code in JavaScript, so it's a no-no.
Long story short: is there anyone with (e)LISP skills willing to implement (Hadley's) style guide for ESS?
I don't write Elisp, and I disagree with Hadley about the stylistic merits of underscores. Moreover, Hadley is still lost in the desert of not using the OneTrueEditor so we can expect no help from him on this on this issue.
But if you are open to follow R Core rather than Hadley, below is what the R Internals manual, section 8. "R Coding Standards" recommends. To me, it is R Core who defines R style first and foremost. Google's and Hadley's styles are nice secondary recommendations.
Anyway, back to Elisp. The following has served we well for many years, and I do like the fact that the basic R behaviour is similar to the Emacs C++ style as I happen to look at code in both modes a lot.
[...]
It is also important that code is written in a way that allows
others to understand it. This is particularly helpful for fixing
problems, and includes using self-descriptive variable names,
commenting the code, and also formatting it properly. The R Core Team
recommends to use a basic indentation of 4 for R and C (and most
likely also Perl) code, and 2 for documentation in Rd format. Emacs
(21 or later) users can implement this indentation style by putting
the following in one of their startup files, and using customization
to set the c-default-style' to"bsd"' and c-basic-offset' to4'.)
;;; ESS
(add-hook 'ess-mode-hook
(lambda ()
(ess-set-style 'C++ 'quiet)
;; Because
;; DEF GNU BSD K&R C++
;; ess-indent-level 2 2 8 5 4
;; ess-continued-statement-offset 2 2 8 5 4
;; ess-brace-offset 0 0 -8 -5 -4
;; ess-arg-function-offset 2 4 0 0 0
;; ess-expression-offset 4 2 8 5 4
;; ess-else-offset 0 0 0 0 0
;; ess-close-brace-offset 0 0 0 0 0
(add-hook 'local-write-file-hooks
(lambda ()
(ess-nuke-trailing-whitespace)))))
(setq ess-nuke-trailing-whitespace-p 'ask)
;; or even
;; (setq ess-nuke-trailing-whitespace-p t)
;;; Perl
(add-hook 'perl-mode-hook
(lambda () (setq perl-indent-level 4)))
(The `GNU' styles for Emacs' C and R modes use a basic indentation of
2, which has been determined not to display the structure clearly
enough when using narrow fonts.)
I think the only additions I regularly make are to follow the last commented-out snippet:
;; or even
(setq ess-nuke-trailing-whitespace-p t)
You can of course turn off the underscore toggle if you really need to code with underscores.
With the development version of ESS (to be released in September 2015), just add to your ess-mode-hook:
(ess-set-style 'RStudio)
RStudio also has a 'Vertically align arguments' checkbox that is set by default. If you want to reproduce the behaviour when this setting is unchecked (as in Hadley's code), you'll need to change ess-offset-arguments to prev-line:
(setq ess-offset-arguments 'prev-line)
Please file an issue on https://github.com/emacs-ess/ESS if you find an important discrepancy between ESS and RStudio indentation.
The good point of Hadley's guide is spaceing around operators (except maybe around /)
There is a smart-operator package which implements it for almost every operator.
This is my setup (uncoment operators which you want to use):
(setq smart-operator-mode-map
(let ((keymap (make-sparse-keymap)))
(define-key keymap "=" 'smart-operator-self-insert-command)
;; (define-key keymap "<" 'smart-operator-<)
;; (define-key keymap ">" 'smart-operator->)
;; (define-key keymap "%" 'smart-operator-%)
(define-key keymap "+" 'smart-operator-+)
;; (define-key keymap "-" 'smart-operator--)
;; (define-key keymap "*" 'smart-operator-*)
;; (define-key keymap "/" 'smart-operator-self-insert-command)
(define-key keymap "&" 'smart-operator-&)
(define-key keymap "|" 'smart-operator-self-insert-command)
;; (define-key keymap "!" 'smart-operator-self-insert-command)
;; (define-key keymap ":" 'smart-operator-:)
;; (define-key keymap "?" 'smart-operator-?)
(define-key keymap "," 'smart-operator-,)
;; (define-key keymap "." 'smart-operator-.)
keymap)
"Keymap used my `smart-operator-mode'.")
See also a nice discussion on R styles here.
[edit] I am also using the defacto camelCase style of R code for globals. The underscore-separated names for local variables - it's easy to differentiate.
There is a special subword mode in emacs which redefines all editing and navigation commands to be used on capitalized sub-words
(global-subword-mode)
Having the same style preference of the OP, I jumped here and I thank #lionel for your valid suggestions, but I had an issue coming from setting RStudio style in ~/emacs.d/init.el with:
(ess-set-style 'RStudio)
Emacs/ESS refused to apply the style (4-space indent instead of the 2-space, not discussing style here :) ).
My advice for the novice, using the following hook in your ~/emacs.d/init.el:
(add-hook 'find-file-hook 'my-r-style-hook)
(defun my-r-style-hook ()
(when (string-match (file-name-extension buffer-file-name) "[r|R]$")
(ess-set-style 'RStudio)))
will do the trick.
One more component of the style guide which hasn't been covered here so far
is indentation when a function runs over multiple lines. The style guide
says to use indentation like
long_function_name <- function(a = "a long argument",
b = "another argument", # X
c = "another long argument") {
# As usual code is indented by two spaces.
}
but using any of the default styles seems to indent line X by some fixed
number of spaces rather than up to the (. The solution is to set
ess-arg-function-offset to some non-number, e.g.
(set 'ess-arg-function-offset t)
This should be placed in a mode hook similar to #Dirk Eddelbuettel's answer. Also it must go after any calls to ess-set-style or it will be overridden.

Resources