Any documentation for optimizing the performance of R? [duplicate] - r

This question already has answers here:
Speed up the loop operation in R
(10 answers)
Closed 9 years ago.
I'm fairly new to R, and one thing that has struck me is that it runs fairly slowly. Is there any documentation for optimizing R? For example, optimizing Python is described very well here. In my particular case I'm interested in optimizing R for batch jobs.
I have tried Googling for an answer, of course, but it's not exactly easy to Google for R information, since "R" is a pretty generic little search pattern.

For a start, you should take a look at The R Inferno by Patrick Burns.
Then the best idea is to ask more detailed questions here.

Yes, R is a bit awkward for a search term, so try RSiteSearch("performance") within R - this will search within lots of R docs sources.

A simple Google search on 'efficient programming in R' reveals the following excellent resources. The first resource is great as it provides a comparison of the bad, good and best ways of programming a task in R; the second resource is more generic.
http://perswww.kuleuven.be/~u0044882/Research/slidesR.pdf
http://www.bioconductor.org/help/course-materials/2010/BioC2010/EfficientRProgramming.pdf
If you are looking at more specific areas of optimizing your R code, specify them more clearly and I am sure you will find an expert here!

"It's running fairly slow" is very vague. There are many techniques for using R in the most efficient way, the general rule is "avoid loops, and vectorize" - but there is so much more such as ensuring objects are pre-allocated rather than resized on the fly.
It really depends on what you are doing, so please be more specific. The standard documentation has plenty of tips for the basics and your question does not really give opportunity for someone to do any more than regurgitate those.
When standard R really is limited for your needs you can write directly in a compiled language such as C, or use advanced interfaces such as Rcpp. For other tools and techniques that extend beyond the basic R toolkit consult the "High Performance Computing" Task View on CRAN.
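As a small illustration of the "pre-allocate and vectorize" advice above (a minimal sketch, not from the original answer), compare three ways of filling a vector:

n <- 1e5

# worst: grow the object inside the loop
x <- c()
for (i in 1:n) x <- c(x, i^2)

# better: pre-allocate, then fill
x <- numeric(n)
for (i in 1:n) x[i] <- i^2

# best: vectorize and skip the loop entirely
x <- (1:n)^2

Wrapping each variant in system.time() shows the difference immediately.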

Related

Abbreviations and functions in preparation for a programming contest [closed]

As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 11 years ago.
I am participating in a big programming competition tomorrow where I use R.
Time is the main factor (only 2 hours for 7 coding problems).
The problems are very mathematics related.
1) I would like to write "f" instead of "function" when I define a function. This can be done and I had the code to do so, but I lost it and cannot find it.
2) Where do I find sin() functions that take degrees as input, not radians?
3) (optional) Are there any algorithm-specific task views or libraries?
4) Any tips for a programming contest?
I prepared the following cheat sheet for the contest:
http://pastebin.com/h5xDLhvg
======== EDIT: ==========
So I finally have time to write down my lessons learned.
The programming contest was a lot of fun, but unfortunately I did not score very well. I was in the top 50%, but my aim was to be in the top 25%.
The main problem was that there was very little time to program, just 2 hours in total. But I had to read the problem descriptions, and I also needed some time to paste the results into the web form, etc., so it was more like 90 minutes of programming.
Hopefully the next contest in December will have extended time, like 3-4 hours. The organizers said that this may well be the case.
Also, there was no Internet access at the contest, and my mobile reception was not really working.
The main lesson for me is that you have to use a language you use daily in order to have a real chance, especially if there are only about 90 minutes to program. Since I use Haskell more than R in my daily work, I think R was not the best choice. During the contest I mixed up Haskell and R function definitions, and I made too many small typos to program fast enough.
What was great about the contest was that there was about 20,000 bucks of prize money in total for the roughly 80 participants. So the top 25% of participants got from 500 to 1500 bucks each. Further, I think the top 15% got a job offer right away from one of the sponsoring IT firms.
So it's a win-win situation. It's fun, plus you can get prize money. Further, the IT firms are more than happy, because they get access to the top programmers.
I used the chance to speak to IT decision makers. One of them was from a larger bank. I boldly suggested that they consider switching to Scala for their development (switching from Java), and also that they consider using R and Haskell. It was fun, and they even said they had already looked into Scala!
What was interesting to note was that one of my best friends scored very well in the competition. He is only 19 years old, but he was well within the top 20% and got 500 bucks of prize money. He beat me plus six of my colleagues, who all have a respectable computer science degree. My friend programs more in a hacker style, but he was very fast.
People in the top 10 used:
1) Java
2) C# and
3) C++
(No other programming language in the top 10!).
The only other programming language that scored reasonably well was Ruby, I think.
For the next contest the programming language of choice will probably be Haskell. For one thing, it's just easier to find two teammates for Haskell than for R programming, and up to three people can form a team.
My ideal scenario would be a very lightweight framework where I could use multiple programming languages at once for the contest. That way, the main code could be written in Haskell (which all teammates can program in), and some specific functions could be programmed in R, or in Mathematica, or even some other programming language (like Python/Sage).
This sounds like a bit of overkill, but I think it would be very useful. Think of a function that takes a matrix as a parameter and returns a matrix. The framework would then automatically generate a RESTful service from the R code, so I could call the R function from any programming language, with the matrix just passed around as JSON data (or some other serialization). Okay, but this is off topic...
So finally some lessons learned as a bullet list:
Don't bring food. You don't have time to eat, and there is a rich buffet afterwards.
Time is the limiting factor!
If you don't program R for a living, don't use R.
Look for contests where there is more time (3-4 hours minimum!).
All in all, the concept of the contest is superb, both for the participants and for the sponsors.
BIG THANKS to 'Iterator' for his post!!
I'm going to answer a related, but different question. No offense, but your original suggestions don't seem very wise for a programming competition. Much of the time spent in such contexts is in devising an answer and in debugging (or, better, avoiding the need to debug).
Instead, I will answer this question: "What are the key resources in R that are useful for rapid prototyping, with a focus on being able to find resources quickly, being able to debug quickly, and being able to investigate data quickly? If I need to use numerical optimization methods and algebra systems, what should I investigate?"
Here are my answers:
Install RStudio or possibly Revolution Analytics' R, depending on which interface seems more appropriate to you. Both are good. The former has a very smooth GUI, the latter has a more intense interface, with more capabilities for managing code. Both have some nice properties over the "community" R regarding being able to look up information and navigate the help libraries quickly.
Get acquainted with example(), identify where to get vignettes and tutorials (from packages' pages on CRAN), and take a brief look at demo().
Use the sos library, and master findFn.
Look at the Task Views on CRAN - be sure you know about the tools for high performance computing (if that is going to be related) and the tools for optimization - it's quite common to need to use some kind of solver, and there's a task view for that.
If your code is running slowly during the prototyping or competition, you'll need to run Rprof(). Take that for a spin first. You may also benefit from using the compiler package if your code involves much iteration. In short: You do not want to wait on the computer. You might also look at foreach and doSMP or doMC if you can parcel the job to different cores. To aggregate results, become familiar with plyr and methods like ldply, as well as standard *apply functions, like lapply and apply; another good one to know is rapply. (If you have lots of stuff to process and it takes some time, look at mclapply or the .parallel argument for the plyr functions.)
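As a quick sketch of the profiling workflow mentioned above (the function below is a made-up example, not from the original answer):

# a deliberately wasteful function: grows a vector inside a loop
slow_fn <- function(n) {
  out <- c()
  for (i in seq_len(n)) out <- c(out, sqrt(i))
  out
}

Rprof("profile.out")          # start profiling to a file
invisible(slow_fn(5e4))
Rprof(NULL)                   # stop profiling
summaryRprof("profile.out")   # see where the time went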
On Stack Overflow: browse JD Long's questions - much of what you will discover that you do not know will have been asked by him before you thought to ask it. And there's an answer already there.
Create a number of little code templates for yourself. Master functions so that you don't need to learn these in a rush. Learn how to debug and step through these, using debug() and browser().
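A tiny sketch of that debugging workflow (the example function is made up):

buggy <- function(x) {
  y <- x * 2
  z <- y + "a"        # this line will fail
  z
}
debug(buggy)          # flag the function: the next call drops into the browser
# buggy(3)            # step through with n, inspect variables, quit with Q
undebug(buggy)

# alternatively, put browser() inside a function to pause at that exact point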
If you have to count things, learn how to use the hash package (akin to Perl and Python hash tables) and learn to use digest for keys that are too long to be used for hash (see this question for references)
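A small sketch of the counting idea, using a plain environment as the hash table (the hash package wraps the same mechanism) and digest() to shorten unwieldy keys; the names here are illustrative:

library(digest)

counts <- new.env(hash = TRUE)   # environments are backed by hash tables in R

add_count <- function(key) {
  k <- digest(key)               # hash a long/awkward key to a short string
  old <- if (exists(k, envir = counts, inherits = FALSE)) get(k, envir = counts) else 0
  assign(k, old + 1, envir = counts)
}

add_count("user:12345/session:abcdef")
add_count("user:12345/session:abcdef")
get(digest("user:12345/session:abcdef"), envir = counts)   # 2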
If you are going to need to plot things, get some basic example plots prepared, using either plot or ggplot2, along with hist, boxplot, and some others. If you don't know ggplot2 already, then postpone, but you should become familiar with it. If you happen to use a lot of data, then be sure you know hexbin. If you will have to interact with data, then get to know iplots and the interesting tools there, such as iplot, ihist, and parallel coordinate plots (ipcp).
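A few ready-made base-graphics templates of the kind described above (a sketch with simulated data):

x <- rnorm(200); y <- 2 * x + rnorm(200)
hist(x)                          # distribution of one variable
plot(x, y, main = "y vs x")      # scatterplot
boxplot(y ~ cut(x, 3))           # boxplots by group
# the ggplot2 scatterplot, if that package is loaded: qplot(x, y)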
Be sure you know how to use lists, data frames, and matrices, including subscripting, lookups of entries based on (row, column) indices. (Again, be sure to investigate plyr for transforming and operating on some of these objects.)
Get acquainted with data.table() - it's exceptionally efficient for a lot of things you might do with data frames and matrices.
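A quick reminder sketch for the subscripting and data.table points above (illustrative values):

df <- data.frame(grp = rep(c("a", "b"), each = 3), val = 1:6)
df[df$grp == "a", "val"]           # rows by condition, column by name
m <- matrix(1:6, nrow = 2)
m[2, 3]                            # (row, column) indexing

library(data.table)
dt <- as.data.table(df)
dt[val > 2, mean(val), by = grp]   # filter, aggregate and group in one call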
If you need to do symbolic mathematics, be sure you know the packages for that or else get another standalone tool for symbolic math. Ryacas is one package that appears to be useful.
Get the PDF of the R in a Nutshell, so that you can rapidly search through it for useful methods. Else, get the book itself. Various other books, such as Venables & Ripley, the R Cookbook, and others may be useful, depending on your experience.
If you've already mastered a good editor (e.g. emacs) or IDE (e.g. Eclipse), stick with it and look for bindings to R. Otherwise, a simple one you can begin using right away is Notepad++. Being able to do block selection is a very useful property in an editor. Being able to search through an entire directory hierarchy of code examples is another useful capability.
If you need to do anything involving database data, you may want to know RSQLite and sqldf, though these may not be relevant to a math competition.
Open a bunch of R instances so that you can try things out. :) [This is actually serious: by having multiple instances running, you can somewhat avoid latency associated with sequentially trying things out, waiting for results, and then debugging the results.]
For (1), you can do something like
f <- function(..., body)
{
    # capture the argument names and the body without evaluating them
    argnames <- sapply(eval(substitute(alist(...))), deparse)
    body <- substitute(body)
    g <- function() NULL
    # give g the requested arguments (with empty defaults) and the supplied body
    formals(g) <- setNames(rep(alist(x = ), length(argnames)), argnames)
    body(g) <- body
    environment(g) <- parent.env(environment())
    g
}
which lets you write, e.g., g <- f(x, y, body = x + y), but I'm not sure how far that gets you.
For (2), you could just do:
sindeg <- function(x) sin(x*pi/180)

How did you experience the transition from SPSS to R? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 9 years ago.
The discussion in this question is the direct cause for me asking this one. The more general reason is the fact that I often have to explain R use to people who are only familiar with SPSS. I know most of the basics of SPSS, as we still use it in the introductory statistics course, but as I'm more of an R guy, it's difficult to know how SPSS users experience their first meeting with R.
I know there is the book R for SAS and SPSS Users, and that already contains some information. Yet, I would like to know what the more difficult parts are when you switch from SPSS to R.
Or in other words: if you had to explain R in one day to SPSS users, which topics would you focus on? This is not a hypothetical question, by the way (yeah, I know, getting paid for something doesn't mean it always makes sense...).
Firstly, data manipulation has been the most challenging thing to learn coming from SPSS/SAS to R. I've found, personally, that getting the data in the right shape for an analysis is usually much more difficult than the analysis itself. Secondly, a true understanding of how to deal with categorical values through the use of factors. Lastly, summary statistics and descriptives can sometimes be challenging to get in a format that is easily transferred to PPT or Excel, which are what (my) clients generally expect/demand for reporting.
I would focus on:
1 Data manipulation
Understanding data structures. Import/export. Then in-depth training on the use of packages like plyr and reshape, with a particular focus on how to effectively use cast with formulas and melt with ids, and on how to apply numerical functions within a data.frame using ddply.
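A rough sketch of that workflow with plyr and reshape (package names as above; the data and column names are made up):

library(plyr)
library(reshape)

df <- data.frame(id = 1:4,
                 group = c("a", "a", "b", "b"),
                 x = c(1, 2, 3, 4),
                 y = c(10, 20, 30, 40))

molten <- melt(df, id.vars = c("id", "group"))        # melt with ids
cast(molten, group ~ variable, fun.aggregate = mean)  # cast with a formula

# apply numerical summaries within a data.frame by group
ddply(df, .(group), summarise, mean_x = mean(x), mean_y = mean(y))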
2 Factoring Data
In general, an explanation of how to deal with recoding, using epicalc or a user-defined function, plus an explanation of the significance of factors, levels, and labels.
3 Descriptives
Take a few minutes to introduce xtabs(), table(), and prop.table(), then use cast() from reshape to create columnar tables of data that are more easily exported to Excel.
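For example, a minimal descriptives sketch in base R (using a built-in data set):

tab <- table(mtcars$cyl, mtcars$gear)
tab                                  # counts of cylinders by gears
prop.table(tab, margin = 1)          # row proportions
xtabs(~ cyl + gear, data = mtcars)   # the formula interface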
Graphics are optional, if you've done a good job of the above they should be able to get the data they need to create graphs in whatever software they are most comfortable with.
4 Graphics
If you've done a good job teaching the data manipulation, getting data into the shape needed for graphing should be pretty straightforward (or at least reproducible) at this point. ggplot2 is complicated and requires a day just by itself to be played with. But it is possible to give a quick overview of it. Alternatively, base graphics are simple to understand and the help is much more clear on what things do and how the syntax works.
Note: I left out statistical analysis. However, an overview of lm() and perhaps anova() or cor() would be a helpful starting point, but this should be explained at the same time as data manipulation.
Although I "wrote the book" on R to SPSS migration, that was aimed at programmers, and most SPSS users that I know prefer to "point-and-click" instead. A graphical user interface like Deducer (or R Commander) can help them feel at home while teaching them how R programming code works, if they want to see it. Deducer's Plot Builder also does a nice job of letting you create complex plots easily, and if you want to learn the ggplot2 code, it will show you that as well. Ian did a great job with it!
However, while the SPSS graphical user interface covers 98% of what SPSS can do, Deducer covers perhaps 1% of what R can do. That's probably still 75% of what your average researcher needs, but R is so broad that to get the most out of it people will need to learn to program. The free version of my book, "R for SAS and SPSS Users" is only 80 pages & covers the areas of programming that I think are most likely to confuse beginners. It's at http://r4stats.com.
Just recently I had a student who was somewhat versed in statistics and had done some analysis beforehand in SPSS. I then showed him how to do the exact same thing in R. We went through the code and plotting, explaining and debating each line. He realized how easy and convenient it is to do it in R. Thus, the R community grew by 1. :)
The biggest issue that the researchers I've dealt with have is the lack of point-and-click GUI. While there are a number of efforts out there in the R community, none of them have reached the ease-of-use/power level that SPSS has.
Since coding is second nature to R users, sometimes we forget that the majority of users of statistical software can't program (and would avoid it like the plague), even though they may have a strong practical understanding of statistics.
If I had one day to bring an SPSS user into R, I'd start them on Deducer. Deducer is an R GUI project (Self promotion note: I'm the author) that should feel very familiar to a user coming from SPSS. As they find themselves needing more advanced functions, they will naturally move to the command line to fulfill their needs.

R and SPSS difference

I will be analysing a vast amount of network-traffic-related data shortly, and will pre-process the data in order to analyse it. I have found that R and SPSS are among the most popular tools for statistical analysis. I will also be generating quite a lot of graphs and charts. Therefore, I was wondering what the basic differences are between these two tools.
I am not asking which one is better, but just wanted to know what are the difference in terms of workflow between the two (besides the fact that SPSS has a GUI). I will be mostly working with scripts in either case anyway so I wanted to know about the other differences.
Here is something that I posted to the R-help mailing list a while back, but I think that it gives a good high-level overview of the general difference between R and SPSS:
When talking about the user friendliness of computer software I like the analogy of cars vs. buses:
Buses are very easy to use: you just need to know which bus to get on, where to get on, and where to get off (and you need to pay your fare). Cars on the other hand require much more work: you need to have some type of map or directions (even if the map is in your head), you need to put gas in every now and then, and you need to know the rules of the road (have some type of driver's licence). The big advantage of the car is that it can take you a bunch of places that the bus does not go, and it is quicker for some trips that would require transferring between buses.
Using this analogy, programs like SPSS are buses: easy to use for the standard things, but very frustrating if you want to do something that is not already preprogrammed.
R is a 4-wheel-drive SUV (though environmentally friendly) with a bike on the back, a kayak on top, good walking and running shoes in the passenger seat, and mountain climbing and spelunking gear in the back.
R can take you anywhere you want to go if you take the time to learn how to use the equipment, but that is going to take longer than learning where the bus stops are in SPSS.
There are GUIs for R that make it a bit easier to use, but they also limit the functionality that can be used that easily. SPSS does have scripting, which takes it beyond being a mere bus, but the general philosophy of SPSS steers people towards the GUI rather than the scripts.
I work at a company that uses SPSS for the majority of our data analysis, and for a variety of reasons - I have started trying to use R for more and more of my own analysis. Some of the biggest differences I have run into include:
Output of tables - SPSS has basic tables, general tables, custom tables, etc. that are all output to that nifty data viewer or whatever they call it. These can relatively easily be transferred to Word documents or Excel sheets for further analysis/presentation. The equivalent in R involves learning LaTeX or using something like odfWeave or LyX.
Labeling of data - SPSS does a pretty good job with variable labels and value labels. I haven't found a robust solution in R to accomplish this same task.
You mention that you are going to be scripting most of your work, and personally I find SPSS's scripting syntax absolutely horrendous, to the point that I've stopped working with SPSS whenever possible. R syntax seems much more logical and follows programming standards more closely AND there is a very active community to rely on should you run into trouble (SO for instance). I haven't found a good SPSS community to ask questions of when I run into problems.
Others have pointed out some of the big differences in terms of cost and functionality of the programs. If you have to collaborate with others, their comfort level with SPSS or R should play a factor as you don't want to be the only one in your group that can work on or edit a script that you wrote in the future.
If you are going to be learning R, this post on the stats exchange website has a bunch of great resources for learning R: https://stats.stackexchange.com/questions/138/resources-for-learning-r
The initial workflow for SPSS involves justifying writing a big fat cheque. R is freely available.
R has a single language for 'scripting', but don't think of it like that, R is really a programming language with great data manipulation, statistics, and graphics functionality built in. SPSS has 'Syntax', 'Scripts' and is also scriptable in Python.
Another biggie is that SPSS squeezes its data into a spreadsheet-like table structure. Dealing with other data structures is probably very hard, but comes naturally to R. I wouldn't know where to start handling network-graph-type data in SPSS, but there's a package to do it in R.
Also, with R you can integrate your workflow with your reporting by using Sweave - you write a document with embedded bits of R code that generate plots or tables, run the file through the system, and out comes the report as a PDF. Great for when you want to do a weekly report, or when you do a body of work and then the boss gives you an updated data set: re-run, read it over, it's done.
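As a rough illustration (a minimal, assumed .Rnw file, not taken from the original answer), a Sweave document mixes LaTeX and R chunks like this:

\documentclass{article}
\begin{document}
\section{Weekly report}

<<summary>>=
summary(cars)
@

<<speed-plot, fig=TRUE>>=
plot(cars$speed, cars$dist)
@

\end{document}

Running Sweave("report.Rnw") in R executes the chunks and writes a .tex file ready for pdflatex.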
But you know, your call...
Well, are you a decent programmer? If you are, then it's worthwhile to learn R. You can do more with your data, both in terms of manipulation and statistical modeling, than you can with SPSS, and your graphs will likely be better too. On the other hand, if you've never really programmed before, or find the idea of spending several months becoming a programmer intimidating, you'll probably get more value out of SPSS. The level of stuff that you can do with R without diving into its power as a full-fledged programming language probably doesn't justify the effort.
There's another option -- collaborate. Do you know someone you can work with on your project (you don't say whether it's academic or industry, but either way...), who knows R well?
There's an interesting (and reasonably fair) comparison between a number of stats tools here
http://anyall.org/blog/2009/02/comparison-of-data-analysis-packages-r-matlab-scipy-excel-sas-spss-stata/
I work with both in a company and can say the following:
If you have a large team of different people (not all data scientists), SPSS is useful because it is relatively easy to understand. For example, if users are going to run a model to get an output (sales estimates, etc.), SPSS is clear and easy to use.
That said, I find R better in almost every other sense:
R is faster (although, sometimes debatable)
As stated previously, the syntax in SPSS is awful (I can't stress this enough). On the other hand, R can be painful to learn, but there are tons of resources online and in the end it pays off much more because of the different things you can do.
Again, like everyone else says, the sky is the limit with R. Tons of packages, resources and, more importantly, the independence to do as you please. In my organization we have some very high-level functions that get a lot done. The hard part is creating them once, but then they perform complicated tasks that SPSS would tangle up in a never-ending web of canvases. This is especially true for things like loops.
It is often overlooked, but R also has plenty of features to cooperate between teams (github integration with RStudio, and easy package building with devtools).
Actually, if everyone in your organization knows R, all you need is to maintain a basic package on GitHub to share everything. This of course is not the norm, which is why I think SPSS, although a worse product, still has a market.
I have no data for it, but from my experience I can tell you one thing:
SPSS is a lot slower than R. (And by a lot, I really mean a lot.)
The magnitude of the difference is probably as big as the one between C++ and R.
For example, I never have to wait longer than a couple of seconds in R. Using SPSS and similar data, I had calculations that took longer than 10 minutes.
As an unrelated side note: in my eyes, the recent discussion on the speed of R somehow overlooked this point (i.e., the comparison with SPSS). Furthermore, I am astonished how this discussion popped up for a while and then silently disappeared again.
There are some great responses above, but I will try to provide my 2 cents. My department completely relies on SPSS for our work, but in recent months, I have been making a conscious effort to learn R; in part, for some of the reasons itemized above (speed, vast data structures, available packages, etc.)
That said, here are a few things I have picked up along the way:
Unless you have some experience programming, I think creating summary tables with CTABLES beats any available option in R. To date, I am unaware of a package that can replicate what can be created using Custom Tables.
SPSS does appear to be slower when scripting, and yes, SPSS syntax is terrible. That said, I have found that scripts in SPSS can always be improved by using the EXECUTE command sparingly.
SPSS and R can interface with each other, although it appears to be one-way (only when using R inside of SPSS, not the other way around). That said, I have found this to be of little use other than when I want to use ggplot2 or some other advanced data-management techniques. (I despise SPSS macros.)
I have long felt that "reporting" work created in SPSS is far inferior to other solutions. As mentioned above, if you can leverage LaTeX and Sweave, you will be very happy with your efficient workflows.
I have been able to do some advanced analysis by leveraging OMS in SPSS. Almost everything can be routed to a new dataset, but I have found that most SPSS users don't use this functionality. Also, when looking at examples in R, it just feels "easier" than using OMS.
In short, I find myself using SPSS when I can't figure it out quickly in R, but I sincerely have every intention of getting away from SPSS and using R entirely at some point in the near future.
SPSS provides a GUI to easily integrate existing R programs or develop new ones. For more info, see the SPSS Community on IBM Developer Works.
@Henrik, I did the same task you mentioned (C++ and R) in SPSS, and it turned out that SPSS is faster than R on this one. In my case SPSS is approx. 7 times faster. I am surprised about it.
Here is the code I used in SPSS.
data list free
/x (f8.3).
begin data
1
end data.
comp n = 1e6.
comp t1 = $time.
loop #rep = 1 to 10.
comp x = 1.
loop #i=1 to n.
comp x = 1/(1+x).
end loop.
end loop.
comp t2 = $time.
comp elapsed = t2 - t1.
form elapsed (f8.2).
exe.
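For comparison, a minimal R version of the same loop might look like this (an assumption about what was timed, not code from the original posts):

system.time({
  for (rep in 1:10) {
    x <- 1
    for (i in 1:1e6) {
      x <- 1 / (1 + x)   # same update as the SPSS loop above
    }
  }
})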
Check out this video on why it is good to combine SPSS and R...
Link
http://bluemixanalytics.wordpress.com/2014/08/29/7-good-reasons-to-combine-ibm-spss-analytics-and-r/
If you have a compatible copy of R installed, you can connect to it from IBM SPSS Modeler and carry out model building and model scoring using custom R algorithms that can be deployed in IBM SPSS Modeler. You must also have a copy of IBM SPSS Modeler - Essentials for R installed. IBM SPSS Modeler - Essentials for R provides you with tools you need to start developing custom R applications for use with IBM SPSS Modeler.
The truth is: both packages are useful if you do data analysis professionally. Sure, R / RStudio has more statistical methods implemented than SPSS. But SPSS is much easier to use and gives more information per button click, and it is therefore faster to use whenever a particular analysis is implemented in both R and SPSS.
In the modern age, neither CPU nor memory is the most valuable resource: the researcher's time is. Also, tables in SPSS are more visually pleasing, in my opinion.
In summary, R and SPSS complement each other well.

What useful R package doesn't currently exist? [closed]

As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 11 years ago.
I have been working on a few R packages for some general tools that aren't currently available in R: blogging, report delivery, logging, and scheduling. This led me to wonder: what are the most important things that people wish existed in R that currently aren't available?
My hope is that we can use this to pinpoint some gaps, and possibly work on them collaboratively.
I'm a former Mathematica junkie, and one thing that I really miss is the notebook-style interface. When I did my research with notebooks, papers would almost write themselves as I did my analysis. But now that I'm using R, I find documenting my work to be quite tedious.
For people who are not so familiar with Mathematica: you have documents called "notebooks" that can contain code, text, equations, and the results from executed code (which can be equations, text, graphics, or interactive tools). Everything can be neatly organized into styled subsections or sections that are collapsible. You can have multiple open documents that integrate with a single shared kernel.
While I don't think a full-blown Mathematica style interface is entirely necessary, some interactive document system that would support text (for description), code, code output, and embedded image output would be a real boon to researchers.
A Real-Time R package would be my choice, using C Streaming perhaps.
Also I'd like a more robust web development package. Nothing as extensive as Ruby on Rails but something a bit better than Sweave combined with R2HTML, that can run on RApache. I think this needs to be a huge area of emphasis for R in general.
I realize LaTeX is better markup for certain academic uses, but in general I think HTML should be the markup language of choice. More needs to be done in terms of R web apps, so applications can be hosted remotely on machines with huge RAM, and R can start being used for SaaS data applications and other graphics choices.
Interfaces to any of the new-fangled 'Web 2.0' databases that use key-value pairs rather than a standard RDBMS. A non-exhaustive list (in alphabetical order) would be
Cassandra Project
CouchDB
MongoDB
Project Voldemort
Redis
Tokyo Cabinet
and it would of course be nice if we had a DBI-like abstraction on top of this. Jeff has started with RBerkeley, but that uses the older-school Oracle BerkeleyDB backend rather than one of those new things.
An output device which produces Javascript code, perhaps using the protovis library.
As a programmer and writer of libraries for colleagues, I was definitely missing a logging package. I googled and asked around, here too, then wrote one myself. It is on R-Forge, here, and it's called "logging" :)
I use it and I'm obviously still developing it.
There are few libraries to interface with databases in general, and there is no ORM library.
RMySQL is useful, but you have to write the SQL queries manually and there is no way to generate them as in an ORM. Moreover, it is specific to MySQL.
Another library set that R still doesn't have, for me, is a good system for reading command-line arguments: there is R getopt, but it is nothing like, for example, argparse in Python.
A natural interface to the .NET framework would be awesome, though I suspect that that might be a lot of work.
EDIT:
Syntax highlighting from within RGui would also be wonderful.
ANOTHER EDIT:
R.NET now exists to integrate R with .NET.
A FRAQ package for FRequently Asked Questions, a la fortune(). R-help would be so much fun: "Try this, library(FRAQ); faq("lattice won't print"), etc.
See also.
A wiki package that adds wiki-like documentation to R packages. You'd have an inst/wiki subdirectory with plain text files in markdown, asciidoc, or textile, with embedded R code. With the right incantation, these files would be executed (think brew and/or asciidoc packages), and the relevant output uploaded to a given repository online (github, googlecode, etc.). Another function could take care of synchronizing the changes made online, typically via svn or git.
Suddenly you have a wiki documentation for your package with reproducible examples (could even be hooked to R CMD check).
EDIT 2012:
... and now the knitr package would make this process even easier and neater
I would like to see the possibility of embedding another programming language within R in a more straightforward way for users. As an example, in some Common Lisp implementations one can write a function with embedded C code like this:
(defun sample (n1 n2)
  (ffi:c-inline (n1 n2) (:int :int) (values :int :int) "{
    int n1 = #0, n2 = #1, out1 = 0, out2 = 1;
    while (n1 <= n2) {
      out1 += n1;
      out2 *= n1;
      n1++;
    }
    #(return 0)= out1;
    #(return 1)= out2;
    }"
   :side-effects nil))
It would be good if one could write an R function with embedded C or Lisp code (I am more interested in the latter) in a similar way.
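For the C side, something close to this is already available outside base R. As a hedged sketch (not part of the original post, and the function name is made up), the Rcpp package mentioned in an earlier answer lets you write:

library(Rcpp)

# inline C++ doing the same sum/product as the Lisp example above
cppFunction('
IntegerVector sample2(int n1, int n2) {
  int out1 = 0, out2 = 1;
  while (n1 <= n2) {
    out1 += n1;
    out2 *= n1;
    n1++;
  }
  return IntegerVector::create(out1, out2);
}')

sample2(1, 5)   # 15 120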
A native .NET interface to RGui. R(D)COM is based on COM, and it only allows you to exchange matrices, not more complex structures.
I would very much like a line profiler. This exists in Matlab and Python, and is very useful for finding bits of code that take a lot of time or are executed more (or less) than expected. A lot of my code involves function optimizations and how many times something iterates may not be known in advance (though most iterations are constrained or specified).
The call stack is useful if all of your code is in R and is very simple, but as I recently posted about it, it takes a painstaking effort if your code is complex.
It's quite easy to develop a line profiler for a given bit of code. A naive way is to index every line (or just pre-specified sections) and insert a call that logs proc.time() at that line. In a loop, I simply enumerate sections of code and store the proc.time values for section i in iteration k in a two-dimensional structure. [See update below: this isn't actually a way to do a line profiler for all kinds of code.]
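A rough sketch of that section-timing idea (illustrative only; the function and names are made up):

time_sections <- function(n_iter, sections) {
  # sections: a named list of zero-argument functions, one per code section
  times <- matrix(NA_real_, nrow = length(sections), ncol = n_iter,
                  dimnames = list(names(sections), NULL))
  for (k in seq_len(n_iter)) {
    for (i in seq_along(sections)) {
      t0 <- proc.time()[["elapsed"]]
      sections[[i]]()
      times[i, k] <- proc.time()[["elapsed"]] - t0
    }
  }
  times
}

# example: compare growing a vector with pre-allocating it
times <- time_sections(5, list(
  grow     = function() { x <- c(); for (j in 1:1e4) x <- c(x, j) },
  prealloc = function() { x <- numeric(1e4); for (j in 1:1e4) x[j] <- j }
))
rowMeans(times)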
One can use such a tool to find hotspots, anomalies (e.g. code that should be O(n) but is really O(n^2)), code that may benefit from memoization (a line profiler doesn't tell you this, but it lets you know where to look), code that is mistakenly inside a loop, and more.
Update 1: Inserting a timing line between every function line is slightly erroneous: the definition of a line of code is not simply code separated by whitespace. Being able to parse the code into an AST is necessary for knowing where operations begin and end. As discussed in some of the answers to this question, there are some tools (namely, showTree and walkCode in the codetools package) for doing this. Simply applying a regular expression to source code would be a very bad thing to do.

Sample Code for R? [closed]

As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 10 years ago.
Does anyone know a good online resource for example of R code?
The programs do not have to be written for illustrative purposes; I am really just looking for places where a bunch of R code has been written, to give me a sense of the syntax and capabilities of the language.
Edit: I have read the basic documentation on the main site, but was wondering if there was some code samples or even programs that show how R is used by different people.
Why not look at www.r-project.org under documentation and read at least the introduction? The language is sufficiently different from what you're used to that just looking at code samples won't be enough for you to pick it up. (At least, not beyond basic calculator-like functionality.)
If you want to look a bit deeper, you might want to look at CRAN: an online collection of R modules with source code: cran.r-project.org
I just found this question and thought I would add a few resources to it. I really like the Quick-R site:
http://www.statmethods.net/
Muenchen has written a book about using R if you come from SAS or SPSS. Originally it was an 80 page online doc that Springer encouraged him to make a 400+ page book out of. The original short form as well as the book are here:
http://rforsasandspssusers.com/
You've probably already seen these, but worth listing:
http://cran.r-project.org/doc/manuals/R-intro.pdf
http://cran.r-project.org/doc/contrib/Owen-TheRGuide.pdf
http://cran.r-project.org/doc/contrib/Kuhnert+Venables-R_Course_Notes.zip
I don't want to sound like a trite RTFM guy, but the help files generally have great short snips of working code as examples. I'm no R pro so I end up having to deconstruct the examples to understand them. That process, while tedious, is really useful.
Good luck!
EDIT: well I hesitated to be self linking (it feels a bit masturbatory) but here's my own list of R resources with descriptions and comments on each: http://www.cerebralmastication.com/?page_id=62
The Rosetta Code project shows R compared to other languages.
How about CRAN? You've got over a thousand packages of code to choose from.
The simplest way of seeing code is to
install R
type "help.start()" or look at online documentation, to get names of functions
type the function name at the prompt
This will print the source code right at the prompt, and illustrate all manner of odd and interesting syntax corners.
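For example (a small sketch, not from the original answer):

rev                      # typing a function's name without () prints its source
lm                       # a longer modelling function
methods(print)           # list the methods behind a generic
getAnywhere(print.lm)    # show a method even if it is not exported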
The Learning R blog has a lot of good examples. Lately, the author has been doing a visualization series, comparing Lattice and ggplot2.
It is hard to Google "R" because the name is too short. Try http://rseek.org/, which provides an R-customized Google search instead. Search for examples, code in repositories, etc.
Some simple examples can be found at Mathesaurus - if you know e.g. Python or Matlab, look at the respective comparison charts to find the R idioms that correspond to your familiar idioms in the other language.
I use the R Graph Gallery. It has been a lot of help on graphing itself. Lots of good examples.
#R on Freenode has also been very useful.
http://had.co.nz/ggplot2/ has a lot of graphics with example code. And you only need one package to create almost every graph you need.
There is also the R Wiki which is slowly growing.
As you probably know, R and S are pretty similar (apart from the cost!).
I used to use both, and I highly recommend S Poetry.
I can also highly recommend the M.J. Crawley book, and the shorter Venables & Ripley one.
Here are links to the R project group on LinkedIn. I put together this list of links and a lot of people have found it useful (some have also made very useful additions).
Use Google Code Search with command "lang:r" and your keyword(s)
Steve McIntyre at http://www.climateaudit.org/ is a big fan of R and often posts working code.
There is a scripts category, and the Statistics and R lists some other resources
