How to easily see the syntax of functions in R? - r

I am struggling to learn R because I cannot find a good way to see the syntax of functions.
For example: I am practicing the rename function. I have a video guiding me, but it's not ideal. When I learned Python, it had the Shift + Tab option that showed exactly what is mandatory, the order of the arguments, etc. It was not perfect, but I knew I could get away with it 80% of the time.
Is there a good way to guide yourself in R, like Shift + Tab in Python?
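For reference, a minimal sketch of what base R offers from the console, using sd from the stats package as a stand-in (the same calls should work on rename once its package is loaded; which package that is, is an assumption here):

```r
# Print the signature: argument names, their order and their defaults.
# Arguments shown without "= value" have no default and must be supplied.
args(sd)

# The same information as a named list, handy for checking programmatically.
arg_names <- names(formals(sd))
arg_names

# The default of a single argument.
na_rm_default <- formals(sd)$na.rm
na_rm_default

# In an interactive session the help page gives the full description:
# ?sd
# help("rename", package = "dplyr")  # assumes dplyr is installed
```

args() is the closest thing to a quick signature pop-up; the help page adds the meaning of each argument and runnable examples.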

Related

Visualization of R-Workflow through Flowchart in Alteryx-Way

I'm wondering if there are any packages for R which help to visualize workflows/code the way Alteryx does. I find the visualization of the workflows within Alteryx quite helpful, but manually dragging and dropping the tools onto the canvas and setting the parameters just takes so much longer than writing the code in R. Also, some functionality within Alteryx is not yet sufficient and has to be implemented via the R/Python-Tool anyway.
During my search I found this post which goes into the same direction, but the suggested packages don't really match what I am looking for.
Best regards

improving rGL HTML performance with multiple figures (mfrow3d) + rglWidgets

i am using RGL to produce a panel of multiple figures through the mfrow3d command.
for the most part, the html produced from the call to writeWebGL is exemplary.
the one caveat is that for multiple figures (be it 6 or 16), i have noticed a bit of lag when attempting to manipulate any one of these figures (to pan/zoom/look around).
an example can be found here: http://fluxions.dydx.ie:1338/schiz.html (warning, 100MB html file haha).
i wanted to ask people here if there is anything i can do in terms of using the "reuse" argument that may speed up performance.
additionally, i wanted to ask if there is any benefit to using rglWidgets and if there is a small example someone could provide in porting a writeWebGL call produced from the following:
https://johnmuschelli.com/WebGL_Interactive_Paper/supp_1/supp_1_wrap.Rmd
to rglwidgets (in hopes that the reuse argument in widgets may improve performance due to my use of mfrow3d).
i am not familiar with how to capture a multi-figure layout with multiple calls to contour3d as a scene that widgets can use.
dr duncan murdoch has gotten back to me and said there probably is not a way to do this, so i guess i will close it.
he is very helpful and i thank him for his support.

Naming of Plot Commands in Sage

I've started teaching myself sage and I'm a bit confused about the naming of some commands in graphics. The most basic command for graphics is perhaps plot with its variants polar_plot, contour_plot, etc. However, I've also seen some variants of plot that are obtained from it by adding postfixes to it, for instance, plot_vector_field.
Does anyone know the reason why some graphical commands belong to the first category (prefix_plot) and some to the second (plot_postfix)? I'm asking this because if there is a good reason for this, then it can help me remember the names more easily, and if there is no special reason, this might be something to suggest as a change for future releases of sage, as it is open source.
PS This is my first question on stackoverflow and I hope this is the right place for asking it, otherwise please feel free to move it anywhere that you feel it might belong.

R bindings for Mapnik?

I frequently find myself doing some analysis in R and then wanting to make a quick map. The standard plot() function does a reasonable job of quick, but I quickly find that I need to go to ggplot2 when I want to make something that looks nice or has more complex symbology requirements. Ggplot2 is great, but it is sometimes cumbersome to convert a SpatialPolygonsDataFrame into the format required by Ggplot2. Ggplot2 can also be a tad slow when dealing with large maps that require specific projections.
It seems like I should be able to use Mapnik to plot spatial objects directly from R, but after exhausting my Google-fu, I cannot find any evidence of bindings. Rather than assume that such a thing doesn't exist, I thought I'd check here to see if anyone knows of an R - Mapnik binding.
The Mapnik FAQ explicitly mentions Python bindings -- as does the wiki -- with no mention of R, so I think you are correct that no (Mapnik-sponsored, at least) R bindings currently exist for Mapnik.
You might get a more satisfying (or at least more detailed) answer by asking on the Mapnik users list. They will know for certain if any projects exist to make R bindings for Mapnik, and if not, your interest may incite someone to investigate the possibility of generating bindings for R.
I would write the SpatialWotsitDataFrames to Shapefiles and then launch a Python Mapnik script. You could even use R to generate the Python script (package 'brew' is handy for making files from templates and inserting values from R).
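A rough sketch of that idea using only base R to fill in the script (no 'brew'), assuming the shapefile has already been written and that a Mapnik style file pointing at it exists. The file names style.xml, map.png and render_map.py are placeholders, not anything Mapnik requires:

```r
# Hypothetical inputs: a Mapnik XML style that references the exported shapefile,
# and the image file the map should be rendered to.
style_file <- "style.xml"
out_png <- "map.png"

# Build a small Python script line by line, inserting the values from R.
py_lines <- c(
  "import mapnik",
  "m = mapnik.Map(800, 600)",
  sprintf("mapnik.load_map(m, '%s')", style_file),
  "m.zoom_all()",
  sprintf("mapnik.render_to_file(m, '%s')", out_png)
)

# Write the script to a temporary location.
script_path <- file.path(tempdir(), "render_map.py")
writeLines(py_lines, script_path)

# Launch it only on a machine where Python and the Mapnik bindings are installed:
# system2("python", script_path)
```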

Coding practice in R : what are the advantages and disadvantages of different styles?

The recent questions regarding the use of require versus :: raised the question about which programming styles are used when programming in R, and what their advantages/disadvantages are. Browsing through the source code or browsing on the net, you see a lot of different styles displayed.
The main trends in my code :
Heavy vectorization: I play a lot with the indices (and nested indices), which sometimes results in rather obscure code but is generally a lot faster than other solutions.
eg: x[x < 5] <- 0 instead of x <- ifelse(x < 5, 0, x)
I tend to nest functions to avoid overloading the memory with temporary objects that I need to clean up. Especially with functions manipulating large datasets this can be a real burden. eg : y <- cbind(x,as.numeric(factor(x))) instead of y <- as.numeric(factor(x)) ; z <- cbind(x,y)
I write a lot of custom functions, even if I use the code only once, e.g. in an sapply. I believe it keeps the code more readable without creating objects that can remain lying around.
I avoid loops at all costs, as I consider vectorization to be a lot cleaner (and faster)
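To make the comparison concrete, here is a small sketch on a made-up vector showing the subset-and-replace form, the ifelse() form and an explicit loop giving the same result:

```r
x <- c(2, 7, 4, 9)

# Subset-and-replace: every value below 5 becomes 0.
replaced <- x
replaced[replaced < 5] <- 0

# The same operation written with ifelse().
with_ifelse <- ifelse(x < 5, 0, x)

# And as an explicit loop, for comparison.
looped <- x
for (i in seq_along(looped)) {
  if (looped[i] < 5) looped[i] <- 0
}

identical(replaced, with_ifelse)
identical(replaced, looped)
```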
Yet, I've noticed that opinions on this differ, and some people tend to back away from what they would call my "Perl" way of programming (or even "Lisp", with all those brackets flying around in my code; I wouldn't go that far though).
What do you consider good coding practice in R?
What is your programming style, and how do you see its advantages and disadvantages?
What I do will depend on why I am writing the code. If I am writing a data analysis script for my research (day job), I want something that works but that is readable and understandable months or even years later. I don't care too much about compute times. Vectorizing with lapply et al. can lead to obfuscation, which I would like to avoid.
In such cases, I would use loops for a repetitive process if lapply made me jump through hoops to construct the appropriate anonymous function for example. I would use the ifelse() in your first bullet because, to my mind at least, the intention of that call is easier to comprehend than the subset+replacement version. With my data analysis I am more concerned with getting things correct than necessarily with compute time --- there are always the weekends and nights when I'm not in the office when I can run big jobs.
For your other bullets; I would tend not to inline/nest calls unless they were very trivial. If I spell out the steps explicitly, I find the code easier to read and therefore less likely to contain bugs.
I write custom functions all the time, especially if I am going to be calling the code equivalent of the function repeatedly in a loop or similar. That way I have encapsulated the code out of the main data analysis script into its own .R file, which helps keep the intention of the analysis separate from how the analysis is done. And if the function is useful, I have it for use in other projects etc.
If I am writing code for a package, I might start with the same attitude as my data analysis (familiarity) to get something I know works, and only then go for the optimisation if I want to improve compute times.
The one thing I try to avoid doing, is being too clever when I code, whatever I am coding for. Ultimately I am never as clever as I think I am at times and if I keep things simple, I tend not to fall on my face as often as I might if I were trying to be clever.
I write functions (in standalone .R files) for various chunks of code that conceptually do one thing. This keeps things short and sweet. I found debugging somewhat easier, because traceback() gives you which function produced an error.
I too tend to avoid loops, except when it's absolutely necessary. I feel somewhat dirty if I use a for() loop. :) I try really hard to do everything vectorized or with the apply family. This is not always the best practice, especially if you need to explain the code to another person who is not as fluent in apply or vectorization.
Regarding the use of require vs ::, I tend to use both. If I only need one function from a certain package I use it via ::, but if I need several functions, I load the entire package. If there's a conflict in function names between packages, I try to remember and use ::.
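As a small illustration of the two approaches, here is a sketch using the tools package, which ships with R (the file name is made up):

```r
# Only one function needed: call it through :: and leave the search path alone.
ext_qualified <- tools::file_ext("results.csv")

# Several functions needed: attach the package once, then call them directly.
library(tools)
ext_attached <- file_ext("results.csv")
title <- toTitleCase("good coding practice")

# When two attached packages export the same name, pkg::fun() states
# explicitly which one is meant.
identical(ext_qualified, ext_attached)
```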
I try to find a function for every task I'm trying to achieve. I believe someone before me has thought of it and made a function that works better than anything I can come up with. This sometimes works, sometimes not so much.
I try to write my code so that I can understand it. This means I comment a lot and construct chunks of code so that they somehow follow the idea of what I'm trying to achieve. I often overwrite objects as the function progresses. I think this keeps the transparency of the task, especially if you're referring to these objects later in the function. I think about speed when computing time exceeds my patience. If a function takes so long to finish that I start browsing SO, I see if I can improve it.
I found out that a good syntax editor with code folding and syntax coloring (I use Eclipse + StatET) has saved me a lot of headaches.
Based on VitoshKa's post, I am adding that I use capitalizedWords (sensu Java) for function names and fullstop.delimited for variables. I see that I could have another style for function arguments.
Naming conventions are extremely important for the readability of the code. Inspired by R's S4 internal style here is what I use:
camelCase for global functions and objects (like doSomething, getXyyy, upperLimit)
functions start with a verb
not exported and helper functions always start with "."
local variables and functions are all in small letters and in "_" syntax (do_something, get_xyyy). This makes it easy to distinguish local from global and therefore leads to cleaner code.
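A short sketch of how these conventions might look together; all names here are invented for illustration:

```r
# Global object: camelCase.
upperLimit <- 10

# Helper that is not meant to be exported: name starts with ".".
.clipToRange <- function(values, lower, upper) {
  pmin(pmax(values, lower), upper)
}

# Global function: camelCase, starts with a verb.
clipValues <- function(values) {
  # Local variable: small letters with "_".
  clipped_values <- .clipToRange(values, 0, upperLimit)
  clipped_values
}

clipValues(c(-1, 5, 12))
```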
For data juggling I try to use as much SQL as possible, at least for the basic things like GROUP BY averages. I like R a lot, but sometimes it's no fun to realize that your search strategy was not good enough to find yet another function hidden in yet another package. For my cases SQL dialects do not differ much and the code is really transparent. Most of the time the threshold (when to start using R syntax) is rather intuitive to discover. e.g.
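library(RMySQL)
# open a connection; "mydb" is a placeholder, supply your own database and credentials
con <- dbConnect(MySQL(), dbname = "mydb")
# selection of variables alongside conditions in SQL is really transparent
# even if conditional variables are not part of the selection
statement <- "SELECT id,v1,v2,v3,v4,v5 FROM mytable
WHERE this=5
AND that != 6"
mydf <- dbGetQuery(con, statement)
# some simple things get really tricky (at least in MySQL), but simple in R
# standard deviation of table rows (over the value columns, not the id)
mydf$rowsd <- apply(mydf[, c("v1", "v2", "v3", "v4", "v5")], 1, sd)
dbDisconnect(con)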
So I consider it good practice and really recommend using a SQL database for your data for most use cases. I am also looking into TSdbi and saving time series in a relational database, but cannot really judge that yet.
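Since the example above needs a running MySQL server, here is a sketch of the R side on made-up data frames: the row-wise standard deviation, plus a base R analogue of a GROUP BY average for comparison (the column names are illustrative only):

```r
# Row-wise standard deviation, the part that is awkward in SQL.
scores <- data.frame(v1 = c(1, 2), v2 = c(2, 4), v3 = c(3, 6))
scores$rowsd <- apply(scores[, c("v1", "v2", "v3")], 1, sd)
scores

# Rough equivalent of: SELECT grp, AVG(value) FROM measurements GROUP BY grp
measurements <- data.frame(grp = c("a", "a", "b"), value = c(1, 3, 10))
group_means <- aggregate(value ~ grp, data = measurements, FUN = mean)
group_means
```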
