Sample Code for R? [closed] - r

As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 10 years ago.
Does anyone know a good online resource for examples of R code?
The programs do not have to be written for illustrative purposes. I am really just looking for some places where a bunch of R code has been written, to give me a sense of the syntax and capabilities of the language.
Edit: I have read the basic documentation on the main site, but was wondering if there were some code samples or even programs that show how R is used by different people.

Why not look at www.r-project.org under documentation and read at least the introduction? The language is sufficiently different from what you're used to that just looking at code samples won't be enough for you to pick it up. (At least, not beyond basic calculator-like functionality.)
If you want to look a bit deeper, you might want to look at CRAN, an online collection of R packages with source code: cran.r-project.org

I just found this question and thought I would add a few resources to it. I really like the Quick-R site:
http://www.statmethods.net/
Muenchen has written a book about using R if you come from SAS or SPSS. Originally it was an 80 page online doc that Springer encouraged him to make a 400+ page book out of. The original short form as well as the book are here:
http://rforsasandspssusers.com/
You've probably already seen these, but worth listing:
http://cran.r-project.org/doc/manuals/R-intro.pdf
http://cran.r-project.org/doc/contrib/Owen-TheRGuide.pdf
http://cran.r-project.org/doc/contrib/Kuhnert+Venables-R_Course_Notes.zip
I don't want to sound like a trite RTFM guy, but the help files generally have great short snips of working code as examples. I'm no R pro so I end up having to deconstruct the examples to understand them. That process, while tedious, is really useful.
Good luck!
EDIT: well I hesitated to be self linking (it feels a bit masturbatory) but here's my own list of R resources with descriptions and comments on each: http://www.cerebralmastication.com/?page_id=62

The Rosetta Code project shows R compared to other languages.

How about CRAN? You've got over a thousand packages of code to choose from.

The simplest way of seeing code is to:
install R
type "help.start()" or look at the online documentation to get the names of functions
type a function name, without parentheses, at the prompt
This will print the function's source code right at the prompt, and illustrate all manner of odd and interesting syntax corners.
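As a small illustration of that last step, here is a sketch using sd from the stats package (picked only as an example; any function written in R behaves the same way). Primitives such as sum are implemented in C, so they print only a one-line stub rather than R source.

```r
# Typing a function's name without parentheses prints its source.
sd

# The same pieces are also available as ordinary R objects.
arg_names <- names(formals(sd))   # argument names
src_lines <- deparse(body(sd))    # body, one string per line
print(arg_names)
```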

The Learning R blog has a lot of good examples. Lately, the author has been doing a visualization series, comparing Lattice and ggplot2.

It is hard to Google for R because the name is so short. Try http://rseek.org/, which provides an R-customized Google search instead. Search on examples, code in repositories, etc.

Some simple examples can be found at Mathesaurus - if you know e.g. Python or Matlab, look at the respective comparison charts to find the R idioms that correspond to your familiar idioms in the other language.

I use the R Graph Gallery. It has been a lot of help on graphing itself. Lots of good examples.
#R on Freenode has also been very useful.

http://had.co.nz/ggplot2/ has a lot of graphics with example code. And you only need one package to create almost every graph you need.

There is also the R Wiki which is slowly growing.

As you probably know, R and S are pretty similar (apart from the cost!).
I used to use both, and I highly recommend S Poetry.
I can also highly recommend the M.J. Crawley book, and the shorter Venables & Ripley one.

Here are links to the R project group on LinkedIn. I put together this list of links and a lot of people have found it useful (some have also made very useful additions).

Use Google Code Search with command "lang:r" and your keyword(s)

Steve McIntyre at http://www.climateaudit.org/ is a big fan of R and often posts working code.
There is a scripts category, and "Statistics and R" lists some other resources.

Related

how to define variables after importing excel sheet in R

I imported an Excel sheet into R and now I do not know how to solve the question, as it is my first time with R.
As this looks like an assignment/homework question, and you mention this is your 1st time with R, I think you would benefit more from looking at an in-depth introduction to R than a quick answer here. This site seems to be a good introduction: https://intro2r.com/index.html . The site recommends RStudio, which is far more intuitive and easy to use than base R.
There is also often good documentation on basic functions within R itself. Type ? into the console before any command and it will direct you to some helpful information. For example, you may find these useful to get started.
?hist
?plot
?min
?max
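To make the link between an imported sheet and those functions concrete, here is a minimal sketch. The data frame scores and its columns are made up for illustration and simply stand in for whatever the Excel import produced.

```r
# Hypothetical data standing in for an imported Excel sheet.
scores <- data.frame(student = c("a", "b", "c", "d"),
                     marks   = c(55, 72, 64, 90))

# Each column is reached with `$`; store it in a variable if convenient.
marks <- scores$marks

lowest  <- min(marks)
highest <- max(marks)
print(c(lowest, highest))

# hist(marks) and plot(marks) would draw the corresponding graphs.
```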

How can I measure my usage of R? [closed]

Closed 9 years ago.
I'm writing an annual report for uni, in which I would like to detail how my usage of R has increased over the past year. I'm looking for metrics that I can use to describe my usage of R. Some possible metrics to describe usage:
number of lines of code in history
number of errors
hours spent using program
number of times a particular function has been called
number of plots made
So my question is: can I extract any of the above from R, or can I extract any other metrics which would demonstrate my usage of R?
First, I'm not sure that this question is at all suited to Stack Overflow. Second, I think that the metrics you've identified are not really suitable. Let's look at the ones you've shortlisted so far:
Number of lines of code in history
You make a lot of tweaks to your code. They accumulate in your history. Your history now has a lot of lines of code. Does that reflect positively on your usage of R? Or, you like to write code like the following in R:
temp <- 0
for (i in 1:10) {
  temp <- temp + i
}
print(temp)
while a person familiar with R would just write sum(1:10). One line versus five. Can we really say that number of lines is a good metric?
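A quick check that the two versions really are interchangeable (a sketch; the variable names are arbitrary):

```r
# Explicit loop, as in the five-line version above.
temp <- 0
for (i in 1:10) {
  temp <- temp + i
}

# Vectorized one-liner.
total <- sum(1:10)

# Same value, very different line counts.
print(c(loop = temp, vectorized = total))
print(temp == total)
```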
Number of errors
Maybe there is some merit to this. But are you going to classify errors in some way? Is a missing or misplaced bracket forgivable? What about times when no error or warning is issued but R behaves in a way that you might not have expected, thus leading to unexpected results (for example, assuming that numeric(0) and factor(0) would behave the same way)? See here for some R gotchas, several of which won't provide any indication of an error, but would certainly lead to erroneous analysis. How would they be analyzed with this metric?
Number of hours spent using the program
Again, debatable. How do you measure the number of hours? Time spent coding? Time the computer spends processing your code? Time it took you to figure out how to program your problem?
Number of times a particular function has been called
I don't understand this metric at all. Do more obscure functions get a higher weight (for example, if you are one of those who use vapply while the rest of the schmucks use sapply, do you get bonus points for using vapply because it can be safer (and sometimes faster) to use?)
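For readers who have not met the distinction, here is a small sketch of why vapply can be the safer choice: it is told the result type up front, so edge cases such as empty input do not silently change the type of the result.

```r
empty <- list()

# sapply() guesses the result type, so the empty case comes back as a list.
s <- sapply(empty, length)

# vapply() is given the type and always returns it.
v <- vapply(empty, length, integer(1))

print(class(s))  # "list"
print(class(v))  # "integer"
```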
Number of plots made
Sorry, but again, I don't understand this metric at all. First of all, not all plots are created equal! There are several in the data visualization field who feel that a lot of software ruined data visualization because some software (a very popular spreadsheet program, in particular) made it so easy for people to quickly make gaudy plots. With R, they are less gaudy by default, but that in itself doesn't make them good. So, if you're just measuring the number of plots churned out without some other criteria for quality assessment, then I'm not sure how this metric is useful.
And, from your comment to your question:
Actually...stack overflow reputation points might be as good as anything!
Eh... The only time I really use R is to answer questions on Stack Overflow (unfortunately true). At the same time, almost all my reputation points here are from the questions I've answered in the R tag. Sure, there are some users here that I would really trust, but sometimes, I don't even trust myself, so I don't know if that's a good indicator of your usage of R.
Lots of users have also complained that Stack Overflow voting is totally wacky, so I'm not sure that you really can use "reputation" as a valid measure of skill. For example, there's an ongoing discussion among regular users here that answers to "easy" questions get voted up very quickly (because they are easy to verify, often without even running the code) while answers to "complicated" questions don't yield votes proportional to the effort taken to answer the question. Case in point: Why the heck do I have a "Guru" badge for an answer that is essentially a reordered version of data already easily available with two minutes on Google? I'm not particularly proud of that answer, and it certainly doesn't say anything about my "usage" of R.
Now, to make this so that it might qualify as an answer and not just an extended comment on your question itself, the biggest thing that I would consider valid, but not sure how to measure it, would be something like how active you are in the R community. There are many ways to get involved with R, from writing or contributing to packages, filing bug reports, conducting workshops to help others make the switch to R, and so on.
I'm not suggesting that you need to write a book, as several others here have done, or to become a legendary package developer with a cult of underscore followers, but you can take small steps. For instance, although I'm a writing teacher, I have held workshops for students and written a few "getting started tips" just to introduce them to using R, so they can consider adding it to their toolkit. Many other users here regularly blog about their experiences working with R and, again, as this is part of a community, they learn a lot in the process.
Finally, a couple more ideas:
@PaulHiemstra suggested in his comment that you could "mention the percentage of your programming work you do in R." I would extend that concept as follows: (1) try to measure how much of your work overall is done in R and tools complementary to R (obvious ones like Sweave/knitr/LaTeX come to mind), and (2) try to measure how much of an impact using R has had on improving your overall skills (with the logic being that good programming is often accompanied by logical thought, careful organization, good documentation, and so on).
Related to the previous point, try to see how your usage of R has changed with time. Has your behavior changed from manually redoing the same steps to writing functions yet? Have you then gone back and adapted those functions so that, instead of solving a specific problem you had at a given point in time, they can be used more generally by a larger audience? These are pretty significant changes, particularly if you had started from scratch with the language, and they can be a bit more meaningful than the ideas you presented in your question.
So, to summarize, a lot of the somewhat easily quantifiable things that you've identified in your question will probably lead to very meaningless analysis. I feel that the qualitative inputs you make would be much more valuable.
Another metric: take an old and complex piece of code (I don't know if you have one) and rewrite it from scratch. Use the difference in computation time as the metric.
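A minimal sketch of that comparison using system.time(). The two functions below are invented stand-ins for the "old" and "rewritten" code, and a single timing is noisy, so in practice you would repeat the measurement a few times.

```r
n <- 1e5

# Stand-in for the old code: an explicit loop.
old_version <- function(n) {
  total <- 0
  for (i in seq_len(n)) total <- total + i
  total
}

# Stand-in for the rewrite: vectorized (as.numeric avoids integer overflow).
new_version <- function(n) sum(as.numeric(seq_len(n)))

t_old <- system.time(r_old <- old_version(n))["elapsed"]
t_new <- system.time(r_new <- new_version(n))["elapsed"]

stopifnot(r_old == r_new)   # same answer before comparing speed
print(c(old = t_old, new = t_new))
```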

Abbreviations and functions in preparation for a programming contest [closed]

Closed 11 years ago.
I am participating in a big programming competition tomorrow where I use R.
Time is the main factor (only 2 hours for 7 coding problems).
The problems are very mathematics related.
1) I would like to write "f" instead of "function" when I define a function. This can be done and I had the code to do so, but I lost it and cannot find it.
2) Where do I find sin() functions that take input in degrees, not radians?
3) (optional) Are there any algorithm-specific task views or libraries?
4) Any tips for a programming contest?
I prepared the following cheat sheet for the contest:
http://pastebin.com/h5xDLhvg
======== EDIT: ==========
So I finally have time to write down my lessons learned.
The programming contest was a lot of fun, but unfortunately I did not score very well. I was in the top 50%, but my aim was to be in the top 25%.
The main problem was that there was very little time to program, just 2 hours in total. But I had to read the problem descriptions and I also needed some time to paste the results into the web form, etc., so it was more like 90 minutes of programming.
Hopefully the next contest in December will have extended time, like 3-4 hours. The organizers said that this will perhaps be the case.
Also, there was no Internet access at the contest, and my mobile reception was not really working.
The main lesson for me is that you have to use a language you use daily in order to have a real chance, especially if there are only about 90 minutes to program. Since I use Haskell more than R in my daily work, I think R was not the best choice. During the contest I mixed up Haskell and R function definitions, and I made too many small typos to program fast enough.
What was great about the contest was that there was about 20,000 bucks of prize money in total for the roughly 80 participants. So the top 25% of participants got from 500 to 1500 bucks each. Further, I think the top 15% get a job right away from one of the sponsor IT firms.
So it's a win-win situation. It's fun, plus you can get prize money. Further the IT firms are more than happy, because they have access to the top programmers.
I used the chance to speak to IT decision makers. One of them was from a larger bank. I boldly suggested that they consider switching to Scala for their development (switching from Java), and also that they consider using R and Haskell. It was fun, and they even said they had already looked into Scala!
What was interesting to note was that one of my best friends scored very well at the competition. He is only 19 years old, but he was well in the top 20% and got 500 bucks of prize money. He beat me plus 6 of my colleagues, who all have a respectable computer science degree. My friend programs more in a hacker style, but he was very fast.
People in the top 10 used:
1) Java
2) C# and
3) C++
(No other programming language in the top 10!).
The only other programming language that scored reasonably well was Ruby, I think.
For the next contest the programming language of choice will probably be Haskell. For one reason, it's just easier to find 2 team mates for Haskell than for R programming. And up to 3 persons can form a team.
My ideal scenario would be a very lightweight framework, where I could use multiple programming languages at once for the contest. That way, the main code can be written in Haskell (which all team mates can program in). And some specific functions may be programmed in R, or in Mathematica, or even some other programming language (like Python/Sage).
This sounds like a bit of overkill, but I think it would be very useful. Take a function that has a matrix as a parameter and returns a matrix. This framework would automatically generate a RESTful service from the R code, so I could call the R function from any programming language. The matrix is just passed around as JSON data (or some other serialization). Okay, but this is off topic...
So finally some lessons learned as a bullet list:
don't bring food. you don't have time to eat, and there is a rich buffet afterwards
time is the limiting factor!
if you don't program R for a living, don't use R
look for contests where there is more time (3-4 hours minimum!)
all in all, the concept of the contest is superb! Both for the participants, but also for the sponsors.
BIG THANKS to the help of 'Iterator' for his post!!
I'm going to answer a related, but different question. No offense, but your original suggestions don't seem very wise for a programming competition. Much of the time spent in such contexts is in devising an answer and in debugging (or, better, avoiding the need to debug).
Instead, I will answer this question: "What are the key resources in R that are useful for rapid prototyping, with a focus on being able to find resources quickly, being able to debug quickly, and being able to investigate data quickly? If I need to use numerical optimization methods and algebra systems, what should I investigate?"
Here are my answers:
Install RStudio or possibly Revolution Analytics' R, depending on which interface seems more appropriate to you. Both are good. The former has a very smooth GUI, the latter has a more intense interface, with more capabilities for managing code. Both have some nice properties over the "community" R regarding being able to look up information and navigate the help libraries quickly.
Get acquainted with example(), identify where to get vignettes and tutorials (from packages' pages on CRAN), and take a brief look at demo().
Use the sos library, and master findFn.
Look at the Task Views on CRAN - be sure you know about the tools for high performance computing (if that is going to be related) and the tools for optimization - it's quite common to need to use some kind of solver, and there's a task view for that.
If your code is running slowly during the prototyping or competition, you'll need to run Rprof(). Take that for a spin first. You may also benefit from using the compiler package if your code involves much iteration. In short: You do not want to wait on the computer. You might also look at foreach and doSMP or doMC if you can parcel the job to different cores. To aggregate results, become familiar with plyr and methods like ldply, as well as standard *apply functions, like lapply and apply; another good one to know is rapply. (If you have lots of stuff to process and it takes some time, look at mclapply or the .parallel argument for the plyr functions.)
On Stack Overflow: browse JD Long's questions - much of what you will discover that you do not know will have been asked by him before you thought to ask it. And there's an answer already there.
Create a number of little code templates for yourself. Master functions so that you don't need to learn these in a rush. Learn how to debug and step through these, using debug() and browser().
If you have to count things, learn how to use the hash package (akin to Perl and Python hash tables) and learn to use digest for keys that are too long to be used for hash (see this question for references).
If you are going to need to plot things, get some basic example plots prepared, using either plot or ggplot2, along with hist, boxplot, and some others. If you don't know ggplot2 already, then postpone, but you should become familiar with it. If you happen to use a lot of data, then be sure you know hexbin. If you will have to interact with data, then get to know iplots and the interesting tools there, such as iplot, ihist, and parallel coordinate plots (ipcp).
Be sure you know how to use lists, data frames, and matrices, including subscripting, lookups of entries based on (row, column) indices. (Again, be sure to investigate plyr for transforming and operating on some of these objects.)
Get acquainted with data.table() - it's exceptionally efficient for a lot of things you might do with data frames and matrices.
If you need to do symbolic mathematics, be sure you know the packages for that or else get another standalone tool for symbolic math. Ryacas is one package that appears to be useful.
Get the PDF of R in a Nutshell, so that you can rapidly search through it for useful methods. Else, get the book itself. Various other books, such as Venables & Ripley, the R Cookbook, and others may be useful, depending on your experience.
If you've already mastered a good editor (e.g. emacs) or IDE (e.g. Eclipse), stick with it and look for bindings to R. Otherwise, a simple one you can begin using right away is Notepad++. Being able to do block selection is a very useful property in an editor. Being able to search through an entire directory hierarchy of code examples is another useful capability.
If you need to do anything involving database data, you may want to know RSQLite and sqldf, though these may not be relevant to a math competition.
Open a bunch of R instances so that you can try things out. :) [This is actually serious: by having multiple instances running, you can somewhat avoid latency associated with sequentially trying things out, waiting for results, and then debugging the results.]
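A few of the points above (subscripting matrices and data frames, and the standard *apply functions) can be sketched as the kind of small template worth having on hand before a contest. This uses base R only, and the object names are arbitrary.

```r
# Matrix lookup by (row, column) index.
m <- matrix(1:6, nrow = 2)          # filled column by column
entry <- m[2, 3]                    # row 2, column 3

# Several lookups at once with a two-column index matrix.
idx <- cbind(c(1, 2), c(1, 3))      # (1,1) and (2,3)
entries <- m[idx]

# Data frame subscripting by condition and column name.
df <- data.frame(id = 1:4, value = c(10, 20, 30, 40))
big <- df[df$value > 15, "id"]

# apply-family one-liners.
col_sums <- apply(m, 2, sum)
squares  <- vapply(1:3, function(k) k^2, numeric(1))
```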
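For (1), you can do something like
f <- function(..., body)
{
    # capture the argument names and the body without evaluating them
    dots <- match.call(expand.dots = FALSE)$...
    args <- vapply(dots, deparse, character(1))
    body <- substitute(body)
    # one empty (no-default) formal per captured name
    fmls <- rep(alist(x = ), length(args))
    names(fmls) <- args
    g <- function() NULL
    formals(g) <- fmls
    body(g) <- body
    environment(g) <- parent.env(environment())
    g
}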
which lets you write, e.g., g <- f(x, y, body=x+y), but I'm not sure how far that gets you.
For (2), you could just do:
sindeg <- function(x) sin(x*pi/180)
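A short usage sketch follows. The companion helpers cosdeg, tandeg and sindeg2 are names invented here, not part of base R. Base R does provide sinpi(x), which computes sin(pi * x) and avoids the rounding error that appears at multiples of 180 degrees.

```r
sindeg <- function(x) sin(x * pi / 180)   # as defined above

# Companion helpers in the same style (hypothetical names).
cosdeg <- function(x) cos(x * pi / 180)
tandeg <- function(x) tan(x * pi / 180)

# Variant built on base R's sinpi(), which is exact at multiples of 180.
sindeg2 <- function(x) sinpi(x / 180)

print(sindeg(30))
print(sindeg(180))    # tiny nonzero value from floating point
print(sindeg2(180))   # exactly 0
```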

Any documentation for optimizing the performance of R? [duplicate]

This question already has answers here:
Speed up the loop operation in R
(10 answers)
Closed 9 years ago.
I'm fairly new to R, and one thing that has struck me is that it's running fairly slow. Is there any documentation for optimizing R? For example, optimizing Python is described very well here. In my particular case I'm interested in optimizing R for batch jobs.
I have tried Googling for an answer of course, but it's not exactly easy to Google for R info since R is a pretty generic little search pattern.
For a start, you should take a look at The R Inferno by Patrick Burns.
Then the best idea is to ask more detailed questions here.
Yes, R is a bit awkward for a search term, so try RSiteSearch("performance") within R - this will search within lots of R docs sources.
A simple Google search on 'efficient programming in r' reveals the following excellent resources. The first resource is great as it provides a comparison of the bad, good and best ways of programming a task in R. The second resource is more generic.
http://perswww.kuleuven.be/~u0044882/Research/slidesR.pdf
http://www.bioconductor.org/help/course-materials/2010/BioC2010/EfficientRProgramming.pdf
If you are looking at more specific areas of optimizing your R code, specify them more clearly and I am sure you will find an expert here!
"It's running fairly slow" is very vague. There are many techniques for using R in the most efficient way. The general rule is "avoid loops, and vectorize", but there is much more, such as ensuring objects are pre-allocated rather than resized on the fly.
It really depends on what you are doing, so please be more specific. The standard documentation has plenty of tips for the basics and your question does not really give opportunity for someone to do any more than regurgitate those.
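To make "vectorize" and "pre-allocate" concrete, here is a small sketch comparing three ways of building the same vector of squares. The function names are made up; wrapping each call in system.time() shows the difference as n grows.

```r
n <- 1e4

# Growing a vector one element at a time forces repeated copying.
grow <- function(n) {
  out <- numeric(0)
  for (i in seq_len(n)) out <- c(out, i^2)
  out
}

# Pre-allocating the full length up front avoids that.
prealloc <- function(n) {
  out <- numeric(n)
  for (i in seq_len(n)) out[i] <- i^2
  out
}

# Vectorized: no explicit loop at all.
vectorized <- function(n) seq_len(n)^2

# All three produce the same result.
stopifnot(isTRUE(all.equal(grow(n), prealloc(n))),
          isTRUE(all.equal(prealloc(n), vectorized(n))))
```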
When standard R really is limited for your needs you can write directly in a compiled language such as C, or use advanced interfaces such as Rcpp. For other tools and techniques that extend beyond the basic R toolkit consult the "High Performance Computing" Task View on CRAN.

What useful R package doesn't currently exist? [closed]

Closed 11 years ago.
I have been working on a few R packages for some general tools that aren't currently available in R: blogging, report delivery, logging, and scheduling. This led me to wonder: what are the most important things that people wish existed in R that currently aren't available?
My hope is that we can use this to pinpoint some gaps, and possibly work on them collaboratively.
I'm a former Mathematica junkie, and one thing that I really miss is the notebook style interface. When I did my research with notebooks, papers would almost write themselves as I did my analysis. But now that I'm using R, I find that documenting my work to be quite tedious.
For people that are not so familiar with Mathematica, you have documents called "notebooks" that can contain code, text, equations, and the results from executed code (which can be equations, text, graphics, or interactive tools). Everything can be neatly organized into styled subsections or sections that are collapsable. You can have multiple open documents that integrate with a single shared kernel.
While I don't think a full-blown Mathematica style interface is entirely necessary, some interactive document system that would support text (for description), code, code output, and embedded image output would be a real boon to researchers.
A Real-Time R package would be my choice, using C Streaming perhaps.
Also I'd like a more robust web development package. Nothing as extensive as Ruby on Rails but something a bit better than Sweave combined with R2HTML, that can run on RApache. I think this needs to be a huge area of emphasis for R in general.
I realize LaTeX is better markup for certain academia but in general I think HTML should be the markup language of choice. More needs to be done in terms of R Web Apps, so applications can be hosted on huge RAM remotely and R can start being used for SaaS data applications and other graphics choices.
Interfaces to any of the new-fangled 'Web 2.0' databases that use key-value pairs rather than the standard RDBMS. A non-exhaustive list (in alphabetical order) would be
Cassandra Project
CouchDB
MongoDB
Project Voldemort
Redis
Tokyo Cabinet
and it would of course be nice if we had a DBI-alike abstraction on top of this. Jeff has started with RBerkeley but that uses the older-school Oracle BerkeleyDB backend rather than one of those new things.
An output device which produces Javascript code, perhaps using the protovis library.
As a programmer and writer of libraries for colleagues, I was definitely missing a logging package. I googled and asked around, here too, then wrote one myself. It is on r-forge, here, and it's called "logging" :)
I use it and I'm obviously still developing it.
There are few libraries to interface with databases in general, and there is no ORM library.
RMySQL is useful, but you have to write the SQL queries manually and there is no way to generate them as in an ORM. Moreover, it is specific to MySQL.
Another kind of library that R still lacks, for me, is a good system for reading command line arguments: there is R getopt, but it is nothing like, for example, argparse in Python.
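For context, this is roughly what hand-rolled argument handling looks like with base R's commandArgs(). It is a minimal sketch that only understands "--name=value" arguments; parse_args is a made-up helper, not an existing package function.

```r
# Parse "--name=value" style arguments into a named list (base R only).
parse_args <- function(args = commandArgs(trailingOnly = TRUE)) {
  opts <- grep("^--[^=]+=", args, value = TRUE)
  keys <- sub("^--([^=]+)=.*$", "\\1", opts)
  vals <- sub("^--[^=]+=", "", opts)
  as.list(setNames(vals, keys))
}

# Simulated command line; in a script the values would come from e.g.
#   Rscript myscript.R --input=data.csv --n=10
opts <- parse_args(c("--input=data.csv", "--n=10"))
n <- as.integer(opts$n)   # values arrive as strings
print(opts$input)
```

Everything argparse does beyond this (types, defaults, help text, validation) has to be written by hand, which is the gap the answer describes.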
A natural interface to the .NET framework would be awesome, though I suspect that that might be a lot of work.
EDIT:
Syntax highlighting from within RGui would also be wonderful.
ANOTHER EDIT:
R.NET now exists to integrate R with .NET.
A FRAQ package for FRequently Asked Questions, a la fortune(). R-help would be so much fun: "Try this, library(FRAQ); faq("lattice won't print")", etc.
See also.
A wiki package that adds wiki-like documentation to R packages. You'd have an inst/wiki subdirectory with plain text files in markdown, asciidoc, textile, with embedded R code. With the right incantation, these files would be executed (think brew and/or asciidoc packages), and the relevant output uploaded to a given repository online (github, googlecode, etc.). Another function could take care of synchronizing the changes made online, typically via svn or git.
Suddenly you have a wiki documentation for your package with reproducible examples (could even be hooked to R CMD check).
EDIT 2012:
... and now the knitr package would make this process even easier and neater
I would like to see a more straightforward way for users to embed another programming language within R. As an example, in some Common Lisp implementations one can write a function with embedded C code like this:
(defun sample (x)
  (ffi:c-inline (n1 n2) (:int :int) (values :int :int) "{
    int n1 = #0, n2 = #1, out1 = 0, out2 = 1;
    while (n1 <= n2) {
      out1 += n1;
      out2 *= n1;
      n1++;
    }
    #(return 0)= out1;
    #(return 1)= out2;
  }"
  :side-effects nil))
It would be good if one could write an R function with embedded C or lisp code (more interested in the latter) in a similar way.
A native .NET interface to RGUI. R(D)Com is based on COM, and it only allows exchanging matrices, not more complex structures.
I would very much like a line profiler. This exists in Matlab and Python, and is very useful for finding bits of code that take a lot of time or are executed more (or less) than expected. A lot of my code involves function optimizations and how many times something iterates may not be known in advance (though most iterations are constrained or specified).
The call stack is useful if all of your code is in R and is very simple, but as I recently posted, it takes a painstaking effort if your code is complex.
It's quite easy to develop a line profiler for a given bit of code. A naive way is to index every line (or just pre-specified sections) and insert a call to log proc.time() at that line. In a loop, I simply enumerate sections of code and store in a 2 dimensional list the proc.time values for section i in iteration k. [See update below: this isn't actually a way to do a line profiler for all kinds of code.]
One can use such a tool to find hotspots, anomalies (e.g. code that should be O(n) but is really O(n^2)), code that may benefit from memoization (a line profiler doesn't tell you this, but it lets you know where to look), code that is mistakenly inside a loop, and more.
Update 1: Inserting a timing line between every function line is slightly erroneous: the definition of a line of code is not simply code separated by whitespace. Being able to parse the code into an AST is necessary for knowing where operations begin and end. As discussed in some of the answers to this question, there are some tools (namely, showTree and walkCode in the codetools package) for doing this. Simply applying a regular expression to source code would be a very bad thing to do.
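The naive section-timing approach described above can be sketched as follows. The section names and the work done in each section are placeholders, and a matrix is used here in place of the 2 dimensional list for brevity.

```r
n_iter <- 5
sections <- c("setup", "compute", "summarise")   # hypothetical section names
elapsed <- matrix(0, nrow = n_iter, ncol = length(sections),
                  dimnames = list(NULL, sections))

for (k in seq_len(n_iter)) {
  t0 <- proc.time()[["elapsed"]]
  x <- rnorm(1e4)                        # section 1
  t1 <- proc.time()[["elapsed"]]
  y <- cumsum(sort(x))                   # section 2
  t2 <- proc.time()[["elapsed"]]
  s <- mean(y)                           # section 3
  t3 <- proc.time()[["elapsed"]]
  # row k holds the time spent in each section during iteration k
  elapsed[k, ] <- c(t1 - t0, t2 - t1, t3 - t2)
}

# Total time per section across all iterations.
print(colSums(elapsed))
```

As the update notes, this only works where you place the timing calls by hand between complete statements; it is not a general line profiler.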
