How did you experience the transition from SPSS to R? [closed] - r

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 9 years ago.
The discussion in this question is the direct cause for me asking this question. The more general reason is the fact that I often have to explain R use to people that are only familiar with SPSS. I know most of the basics of SPSS, as we still use it in the base course statistics. But as I'm more of an R guy, it's difficult to know how SPSS users experience the first meeting with R.
I know there is the book R for SAS and SPSS users, which already contains some information. Yet I would like to know what the more difficult parts are when you switch from SPSS to R.
In other words: if you had to explain R in one day to SPSS users, which topics would you focus on? This is not a hypothetical question, by the way (yeah, I know, just because one gets paid for it doesn't mean it always makes sense...).

Firstly, data manipulation has been the most challenging thing to learn coming from SPSS/SAS to R. I've found, personally, that getting the data in the right shape for an analysis is usually much more difficult than the analysis itself. Secondly, it takes time to truly understand how to deal with categorical values through the use of factors. Lastly, summary statistics and descriptives can sometimes be challenging to get into a format that is transferable to PPT or Excel, which are what (my) clients generally expect/demand for reporting.
I would focus on:
1 Data manipulation
Understanding data structures. Import/Export. Then in-depth training on the use of packages like plyr and reshape, with a particular focus on how to effectively use cast with formulas and melt with ids. How to apply numerical functions within a data.frame using ddply.
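As a rough illustration of that reshaping workflow, here is a minimal sketch in base R. It swaps in reshape() and aggregate() from the stats package for melt(), cast() and ddply(), so that it runs without extra packages; the data frame and its column names are made up for the example.

```r
# Hypothetical wide survey data: one row per respondent
wide <- data.frame(
  id    = 1:4,
  group = c("a", "a", "b", "b"),
  q1    = c(3, 5, 2, 4),
  q2    = c(4, 4, 1, 5)
)

# Wide -> long (roughly what melt() does): one row per id/question pair
long <- reshape(
  wide,
  direction = "long",
  idvar     = "id",
  varying   = c("q1", "q2"),
  v.names   = "score",
  timevar   = "question",
  times     = c("q1", "q2")
)
rownames(long) <- NULL

# Group-wise summary (roughly what ddply() does): mean score per group and question
means <- aggregate(score ~ group + question, data = long, FUN = mean)

# Long -> columnar table (roughly what cast() does): groups in rows, questions in columns
wide_again <- xtabs(score ~ group + question, data = means)
print(wide_again)
```

The same three steps (reshape to long, summarise by group, spread back out to columns) are what the plyr/reshape functions mentioned above are designed to make shorter.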
2 Factoring Data
In general, an explanation of how to recode variables with epicalc or a user-defined function. Also an explanation of the significance of factors, levels, and labels.
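To make the levels/labels distinction concrete, here is a small base R sketch. The coding scheme (1 = male, 2 = female, 9 = missing) and the age bands are assumptions invented for the example, in the style of an SPSS codebook.

```r
# Hypothetical SPSS-style coded variable: 1 = male, 2 = female, 9 = missing
sex_code <- c(1, 2, 2, 9, 1)

# levels = the codes stored in the data, labels = what they should be called;
# any code not listed in levels (here 9) becomes NA
sex <- factor(sex_code, levels = c(1, 2), labels = c("male", "female"))
print(levels(sex))
print(table(sex, useNA = "ifany"))

# Recoding a numeric variable into categories with cut()
age <- c(23, 37, 64)
age_group <- cut(age, breaks = c(0, 30, 60, Inf),
                 labels = c("young", "middle", "older"))
print(age_group)
```

SPSS users tend to expect value labels to be decoration on top of numbers; in R the factor is the labelled version, and the original codes are gone unless you keep the numeric column as well.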
3 Descriptives
Take a few minutes to introduce xtabs(), table(), and prop.table(), then use cast() from reshape to create columnar tables of data that are more easily exported to Excel.
Graphics are optional: if you've done a good job of the above, they should be able to get the data they need to create graphs in whatever software they are most comfortable with.
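A minimal sketch of the descriptives step, using only base R. The survey data are invented, and the CSV is written to a temporary file purely for illustration; in practice you would pick a real path that Excel can open.

```r
# Hypothetical survey responses
survey <- data.frame(
  group  = c("a", "a", "b", "b", "b"),
  answer = c("yes", "no", "yes", "yes", "no")
)

# Cross-tabulation of counts
counts <- xtabs(~ group + answer, data = survey)
print(counts)

# Row percentages: each group sums to 100
row_pct <- round(100 * prop.table(counts, margin = 1), 1)

# Turn the table into a plain data frame and write it out for Excel
out <- as.data.frame.matrix(row_pct)
csv_file <- tempfile(fileext = ".csv")
write.csv(out, file = csv_file)
print(out)
```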
4 Graphics
If you've done a good job teaching the data manipulation, getting data into the shape needed for graphing should be pretty straightforward (or at least reproducible) at this point. ggplot2 is complicated and requires a day just by itself to be played with. But it is possible to give a quick overview of it. Alternatively, base graphics are simple to understand and the help is much more clear on what things do and how the syntax works.
Note: I left out statistical analysis. However, an overview of lm() and perhaps anova() or cor() would be helpful as a starting point. But this should be explained at the same time as data manipulation.
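For that starting point, a sketch along these lines might be enough. It uses the built-in mtcars data set rather than anything from the question, so the variables are only placeholders.

```r
# Simple linear regression on a built-in data set
fit <- lm(mpg ~ wt, data = mtcars)
print(coef(fit))      # intercept and slope
print(anova(fit))     # analysis of variance table for the model

# Correlation between the same two variables
r <- cor(mtcars$mpg, mtcars$wt)
print(r)

# For a single predictor, the squared correlation equals the model's R-squared
print(all.equal(r^2, summary(fit)$r.squared))
```

The point worth stressing to SPSS users is that fit is an object: nothing is printed until you ask for it, and coef(), anova() and summary() are different views of the same fitted model.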

Although I "wrote the book" on R to SPSS migration, that was aimed at programmers and most SPSS users that I know prefer to "point-and-click" instead. A graphical user interface like Deducer (or R Commander) can help them feel at home while teaching them how R programming code works if they want to see it. Deducer's Plot Builder also does a nice job letting you create complex plots easily, and if you want to learn ggplot2 code, it will show you that as well. Ian did a great job with it!
However, while the SPSS graphical user interface covers 98% of what SPSS can do, Deducer covers perhaps 1% of what R can do. That's probably still 75% of what your average researcher needs, but R is so broad that to get the most out of it people will need to learn to program. The free version of my book, "R for SAS and SPSS Users" is only 80 pages & covers the areas of programming that I think are most likely to confuse beginners. It's at http://r4stats.com.

Just recently I've had a student who was somewhat versed in statistics and did some analysis beforehand in SPSS. I then showed him how to do the exact same thing in R. We went through the code and plotting, explaining and debating each line. He realized how easy and convenient it is to do it in R. Thus, R community grew by 1. :)

The biggest issue that the researchers I've dealt with have is the lack of point-and-click GUI. While there are a number of efforts out there in the R community, none of them have reached the ease-of-use/power level that SPSS has.
Since coding is second nature to R users, sometimes we forget that the majority of users of statistical software can't program (and would avoid it like the plague), even though they may have a strong practical understanding of statistics.
If I had one day to bring an SPSS user into R, I'd start them on Deducer. Deducer is an R GUI project (Self promotion note: I'm the author) that should feel very familiar to a user coming from SPSS. As they find themselves needing more advanced functions, they will naturally move to the command line to fulfill their needs.

Related

How can I measure my usage of R? [closed]

As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 9 years ago.
I'm writing an annual report for uni, in which I would like to detail how my usage of R has increased over the past year. I'm looking for metrics that I can use to describe my usage of R. Some possible metrics to describe usage:
number of lines of code in history
number of errors
hours spent using program
number of times a particular function has been called
number of plots made
So my question is: can I extract any of the above from R, or can I extract any other metrics which would demonstrate my usage of R?
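For what it's worth, some of these can be approximated from saved scripts. The sketch below counts lines and function calls statically from code text using parse() and getParseData() from the utils package; the snippet of code being analysed is made up, and a static count says nothing about how often a function was actually run. In an interactive console, savehistory() can write the command history to a file that could be fed through the same helper.

```r
# Count how often each function is called in a piece of R code (static count)
count_calls <- function(code) {
  parsed <- parse(text = code, keep.source = TRUE)
  pd <- getParseData(parsed)
  table(pd$text[pd$token == "SYMBOL_FUNCTION_CALL"])
}

# Hypothetical script contents; readLines("my_script.R") would work the same way
code <- c(
  "x <- rnorm(10)",
  "plot(x)",
  "m <- mean(x)",
  "plot(sort(x))"
)

n_lines <- length(code)
calls <- count_calls(code)
print(n_lines)
print(calls)
```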
First, I'm not sure that this question is at all suited to Stack Overflow. Second, I think that the metrics you've identified are not really suitable. Let's look at the ones you've shortlisted so far:
Number of lines of code in history
You make a lot of tweaks to your code. They accumulate in your history. Your history now has a lot of lines of code. Does that reflect positively on your usage of R? Or, you like to write code like the following in R:
temp <- 0
for (i in 1:10) {
    temp <- temp + i
}
print(temp)
while a person familiar with R would just write sum(1:10). One line versus five. Can we really say that number of lines is a good metric?
Number of errors
Maybe there is some merit to this. But are you going to classify errors in some way? Is a missing or misplaced bracket forgivable? What about times when no error or warning is issued but R behaves in a way that you might not have expected, thus leading to unexpected results (for example, assuming that numeric(0) and factor(0) would behave the same way). See here for some R gotchas, several of which won't provide any indication of an error, but would certainly lead to erroneous analysis. How would they be analyzed with this metric?
Number of hours spent using the program
Again, debatable. How do you measure the number of hours? Time spent coding? Time the computer spends processing your code? Time it took you to figure out how to program your problem?
Number of times a particular function has been called
I don't understand this metric at all. Do more obscure functions get a higher weight (for example, if you are one of those who use vapply while the rest of the schmucks use sapply, do you get bonus points for using vapply because it can be safer (and sometimes faster) to use?)
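To show what "safer" means here, a minimal sketch: with empty input, sapply() cannot know what type you wanted, while vapply() returns the type you declared.

```r
empty <- list()

# sapply() guesses the return type; with nothing to guess from it gives a list
s <- sapply(empty, length)

# vapply() is told the type up front, so the result is always an integer vector
v <- vapply(empty, length, integer(1))

print(class(s))
print(class(v))

# With non-empty input the two agree
x <- list(a = 1:3, b = 1:5)
print(identical(sapply(x, length), vapply(x, length, integer(1))))
```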
Number of plots made
Sorry, but again, I don't understand this metric at all. First of all, not all plots are created equally! There are several in the data visualization field who feel that a lot of software ruined data visualization because some software (a very popular spreadsheet program, in particular) made it so easy for people to quickly make gaudy plots. With R, they are less gaudy by default, but that in itself doesn't make it good. So, if you're just measuring the number of plots churned out without some other criteria for quality assessment, then I'm not sure how this metric is useful.
And, from your comment to your question:
Actually...stack overflow reputation points might be as good as anything!
Eh... The only time I really use R is to answer questions on Stack Overflow (unfortunately true). At the same time, almost all my reputation points here are from the questions I've answered in the R tag. Sure, there are some users here that I would really trust, but sometimes, I don't even trust myself, so I don't know if that's a good indicator of your usage of R.
Lots of users have also complained that Stack Overflow voting is totally wacky, so I'm not sure that you really can use "reputation" as a valid measure of skill. For example, there's an ongoing discussion among regular users here that answers to "easy" questions get voted up very quickly (because they are easy to verify, often without even running the code) while answers to "complicated" questions don't yield votes proportional to the effort taken to answer the question. Case in point: why the heck do I have a "Guru" badge for an answer that is essentially a reordered version of data already easily available with two minutes on Google? I'm not particularly proud of that answer, and it certainly doesn't say anything about my "usage" of R.
Now, to make this so that it might qualify as an answer and not just an extended comment on your question itself, the biggest thing that I would consider valid, but not sure how to measure it, would be something like how active you are in the R community. There are many ways to get involved with R, from writing or contributing to packages, filing bug reports, conducting workshops to help others make the switch to R, and so on.
I'm not suggesting that you need to write a book, as several others here have done, or to become a legendary package developer with a cult of underscore followers, but you can take small steps. For instance, although I'm a writing teacher, I have held workshops for students and written a few "getting started tips" just to introduce them to using R, so they can consider adding it to their toolkit. Many other users here regularly blog about their experiences working with R and, again, as this is part of a community, they learn a lot in the process.
Finally, a couple of more ideas:
@PaulHiemstra suggested in his comment that you could "mention the percentage of your programming work you do in R." I would extend that concept as follows: (1) try to measure how much of your work overall is done in R and tools complementary to R (obvious ones like Sweave/knitr/LaTeX come to mind), and (2) try to measure how much of an impact using R has had on improving your overall skills (with the logic being that good programming is often accompanied by logical thought, careful organization, good documentation, and so on).
Related to the previous point, try to see how your usage of R has changed with time. Has your behavior changed from manually redoing the same steps to writing functions yet? Have you then gone back and adapted those functions so that, instead of solving a specific problem you had at a given point in time, they can be used more generally by a larger audience? These are pretty significant changes, particularly if you had started from scratch with the language, and they can be a bit more meaningful than the ideas you presented in your question.
So, to summarize, a lot of the somewhat easily quantifiable things that you've identified in your question will probably lead to very meaningless analysis. I feel that the qualitative inputs you make would be much more valuable.
Another metric: take an old and complex piece of code (I don't know if you have one) and rewrite it from scratch. Use the difference in computation time as a metric.

Are there any guidelines for when reproducible code should be included into a publication?

Given the stress toward reproducible science, I was wondering if my recent work warrants the inclusion of example code in the publication. The datasets that I am using are quite big, so it wouldn't necessarily make sense to publish those. However, the statistical methods that I apply within R are not generally known to my audience (although I would think that they should be).
I'm using empirical orthogonal function analysis (EOF) and generalized additive models (GAM) within my analysis. GAM, in particular, is widely used in ecological studies, but less so within the physical sciences - my work spans both disciplines.
I definitely refer to the R packages that I use, and it wouldn't really be difficult for a reviewer / reader to look for those references (and included examples) themselves. So, my question is, what situations are most appropriate for the inclusion of reproducible code in a publication?
Code is the most accurate representation of what you actually did. Therefore, in my view you should always aim to publish code alongside your article.
However, editor resistance to this is pretty strong. The fear is that if the reviewer had access to the code, then the journal looks pretty bad if a substantive coding mistake is later found. This is not a hypothetical fear, given the Levitt paper, etc.
Knuth has some strong views on literate programming that you should be able to cite as justification. If you can't convince the journal to accept your code as an integral piece of the publication, consider publishing it on your personal website (the approach taken by e.g. Raj Chetty for many of his papers) or publish it as an R package.
Finally, here's a note I wrote to my programming students:
Consider publishing your code. Doing so will act as a commitment
device which will encourage good habits--habits that make your own
work easier. Publishing your code also makes it easier for others to
extend your analysis, which can result in more citations of your work.
Releasing your code is good academic practice as well: it is the
truest testament to your analysis. And offering your program to the
world shows off the beautiful coding skills which you are about to
acquire.
A basic tenet of science is reproducibility. So the answer would be to "include" code required to conduct your analysis to every paper/publication that is based on data analysis.
I say "include" because you don't need to put the R code directly into the paper. Many if not most journals allow supplementary material, which is an option. Alternatively, supply your script to one of the many science data archiving sites (such as Figshare) and then (and here is the killer!) cite your own script using the DOI that Figshare gives to your deposited script. If you can post the data too, then all the better; Figshare doesn't really care too much about big data sets.
The above applies to code where you are using other packages and your R script does things like loads and formats data, calls functions from other packages and then plots or displays output/results. If you have developed new R code to implement a particular method then I would say package the code as an R package and submit that to CRAN or r-forge or something like that.
From your description, the former (deposit the analysis script in a repo) would be most appropriate.
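A deposited analysis script is more useful if it is self-contained. As a minimal sketch (the simulated data, the seed value and the output file name are all placeholders, not anything from the question), such a script might fix the random seed and record the R and package versions it was run with:

```r
# Fix the random number stream so stochastic steps give the same result each run
set.seed(2012)

# Stand-in for the real analysis: simulate data and fit a model
x <- runif(50)
y <- 2 * x + rnorm(50, sd = 0.1)
fit <- lm(y ~ x)
print(coef(fit))

# Record the R version and loaded packages alongside the results
info_file <- file.path(tempdir(), "session-info.txt")
writeLines(capture.output(sessionInfo()), info_file)
```

Depositing the session information file together with the script lets a reader see which package versions produced the published numbers.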
We recently had a discussion at our research institute regarding reproducible research. The incentive came from the Nature editorial (http://arstechnica.com/science/2012/02/science-code-should-be-open-source-according-to-editorial/) which argued that all your code should be published. I wholeheartedly agree with this. Even though your dataset is very big, publishing the R code that you used to create your results makes it crystal clear what you did. Often the methods section of a paper does not contain sufficient detail to reproduce the result; the code is quite a help in this case.

Abbreviations and functions in preparation for a programming contest [closed]

Closed 11 years ago.
I am participating in a big programming competition tomorrow where I use R.
Time is the main factor (only 2 hours for 7 coding problems).
The problems are very mathematics related.
1) I would like to write "f" instead of "function" when I define a function. This can be done and I had the code to do so, but I lost it and cannot find it.
2) Where do I find sin() functions for input in degrees, not radians?
3) (optional) Are there any algorithm-specific task views or libraries?
4) Any tips for a programming contest?
I prepared the following cheat sheet for the contest:
http://pastebin.com/h5xDLhvg
======== EDIT: ==========
So I finally have time to write down my lessons learned.
The programming contest was a lot of fun, but unfortunately I did not score very well. I was in the top 50%, but my aim was to be in the top 25%.
The main problem was that there was very little time to program, just 2 hours in total. But I had to read the problem descriptions and I also needed some time to paste the results into the web form, etc., so it was more like 90 minutes of programming.
Hopefully the next contest in December will have extended time, like 3-4 hours. The organizers said that this will perhaps be the case.
Also, there was no Internet access at the contest, and my mobile reception was not really working.
The main lesson for me is that you have to use a language you use daily in order to have a real chance, especially if there are only about 90 minutes to program. Since I use Haskell more than R in my daily work, I think R was not the best choice. During the contest I mixed up Haskell and R function definitions, and I made too many small typos to program fast enough.
What was great about the contest was that there was about 20 000 bucks in prize money in total for the roughly 80 participants. So the top 25% of participants got from 500 to 1500 bucks each. Further, I think the top 15% get a job right away from one of the sponsor IT firms.
So it's a win-win situation. It's fun, plus you can get prize money. Further the IT firms are more than happy, because they have access to the top programmers.
I used the chance to speak to IT decision makers. One of them was from a larger bank. I boldly suggested that they consider switching to Scala for their development (switching from Java), and also consider using R and Haskell. It was fun, and they even said they had already looked into Scala!
What was interesting to note was that one of my best friends scored very well at the competition. He is only 19 years old, but he was well in the top 20% and got 500 bucks prize money. He beat me plus 6 of my colleagues, who all have a respectable computer science degree. My friend programs more in a hacker style, but he was very fast.
People in the top 10 used:
1) Java
2) C# and
3) C++
(No other programming language in the top 10!).
The only other programming language that scored reasonably well was Ruby, I think.
For the next contest the programming language of choice will probably be Haskell. For one thing, it's just easier to find 2 team mates for Haskell than for R programming. And up to 3 persons can form a team.
My ideal scenario would be a very lightweight framework, where I could use multiple programming languages at once for the contest. That way, the main code can be written in Haskell (which all team mates can program in), and some specific functions may be programmed in R, or in Mathematica, or even some other programming language (like Python/Sage).
This sounds a little bit like overkill, but I think it would be very useful. Take a function that has a matrix as a parameter and returns a matrix. This framework would then automatically generate a RESTful service from the R code, so I could call the R function from any programming language. The matrix is just passed around as JSON data (or some other serialization). Okay, but this is off topic...
So finally some lessons learned as a bullet list:
don't bring food. you don't have time to eat, and there is a rich buffet afterwards
time is the limiting factor!
if you don't program R for a living, don't use R
look for contests where there is more time (3-4 hours minimum!)
all in all, the concept of the contest is superb! Both for the participants, but also for the sponsors.
BIG THANKS to the help of 'Iterator' for his post!!
I'm going to answer a related, but different question. No offense, but your original suggestions don't seem very wise for a programming competition. Much of the time spent in such contexts is in devising an answer and in debugging (or, better, avoiding the need to debug).
Instead, I will answer this question: "What are the key resources in R that are useful for rapid prototyping, with a focus on being able to find resources quickly, being able to debug quickly, and being able to investigate data quickly? If I need to use numerical optimization methods and algebra systems, what should I investigate?"
Here are my answers:
Install RStudio or possibly Revolution Analytics' R, depending on which interface seems more appropriate to you. Both are good. The former has a very smooth GUI, the latter has a more intense interface, with more capabilities for managing code. Both have some nice properties over the "community" R regarding being able to look up information and navigate the help libraries quickly.
Get acquainted with example(), identify where to get vignettes and tutorials (from packages' pages on CRAN), and take a brief look at demo().
Use the sos library, and master findFn.
Look at the Task Views on CRAN - be sure you know about the tools for high performance computing (if that is going to be related) and the tools for optimization - it's quite common to need to use some kind of solver, and there's a task view for that.
If your code is running slowly during the prototyping or competition, you'll need to run Rprof(). Take that for a spin first. You may also benefit from using the compiler package if your code involves much iteration. In short: You do not want to wait on the computer. You might also look at foreach and doSMP or doMC if you can parcel the job to different cores. To aggregate results, become familiar with plyr and methods like ldply, as well as standard *apply functions, like lapply and apply; another good one to know is rapply. (If you have lots of stuff to process and it takes some time, look at mclapply or the .parallel argument for the plyr functions.)
On Stack Overflow: browse JD Long's questions - much of what you will discover that you do not know will have been asked by him before you thought to ask it. And there's an answer already there.
Create a number of little code templates for yourself. Master functions so that you don't need to learn these in a rush. Learn how to debug and step through these, using debug() and browser().
If you have to count things, learn how to use the hash package (akin to Perl and Python hash tables) and learn to use digest for keys that are too long to be used for hash (see this question for references)
If you are going to need to plot things, get some basic example plots prepared, using either plot or ggplot2, along with hist, boxplot, and some others. If you don't know ggplot2 already, then postpone, but you should become familiar with it. If you happen to use a lot of data, then be sure you know hexbin. If you will have to interact with data, then get to know iplots and the interesting tools there, such as iplot, ihist, and parallel coordinate plots (ipcp).
Be sure you know how to use lists, data frames, and matrices, including subscripting, lookups of entries based on (row, column) indices. (Again, be sure to investigate plyr for transforming and operating on some of these objects.)
Get acquainted with data.table() - it's exceptionally efficient for a lot of things you might do with data frames and matrices.
If you need to do symbolic mathematics, be sure you know the packages for that or else get another standalone tool for symbolic math. Ryacas is one package that appears to be useful.
Get the PDF of R in a Nutshell, so that you can rapidly search through it for useful methods. Otherwise, get the book itself. Various other books, such as Venables & Ripley, the R Cookbook, and others may be useful, depending on your experience.
If you've already mastered a good editor (e.g. emacs) or IDE (e.g. Eclipse), stick with it and look for bindings to R. Otherwise, a simple one you can begin using right away is Notepad++. Being able to do block selection is a very useful property in an editor. Being able to search through an entire directory hierarchy of code examples is another useful capability.
If you need to do anything involving database data, you may want to know RSQLite and sqldf, though these may not be relevant to a math competition.
Open a bunch of R instances so that you can try things out. :) [This is actually serious: by having multiple instances running, you can somewhat avoid latency associated with sequentially trying things out, waiting for results, and then debugging the results.]
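For the counting tip in the list above, a base R environment can stand in for a hash table when the hash package is not at hand. This is only a sketch of the idea with made-up input, not the hash package's own interface; for simple tallies, table() does the same job in one call.

```r
words <- c("a", "b", "a", "c", "b", "a")

# An environment behaves like a mutable hash table keyed by strings
counts <- new.env(hash = TRUE)
for (w in words) {
  old <- if (exists(w, envir = counts, inherits = FALSE)) {
    get(w, envir = counts)
  } else {
    0
  }
  assign(w, old + 1, envir = counts)
}

# Read the tallies back out as a named list
result <- mget(sort(ls(counts)), envir = counts)
print(result)

# The one-line alternative for plain counting
tab <- table(words)
print(tab)
```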
For (1), you can do something like
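f <- function(..., body)
{
    # names of the unevaluated arguments, e.g. "x" and "y"
    args <- vapply(as.list(substitute(list(...)))[-1], deparse, character(1))
    body <- substitute(body)
    g <- function() NULL
    # one empty (no default) formal argument per name
    fmls <- rep(alist(x = ), length(args))
    names(fmls) <- args
    formals(g) <- fmls
    body(g) <- body
    # free variables are looked up where f() was called
    environment(g) <- parent.frame()
    g
}
which lets you write, e.g., g <- f(x, y, body=x+y) but I'm not sure how far that gets you.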
For (2), you could just do:
sindeg <- function(x) sin(x*pi/180)
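A quick check of that helper, plus a variant that is my own addition rather than part of the answer: base R's sinpi(x) computes sin(pi * x), which avoids the small rounding error at multiples of 180 degrees. The name sindeg2 is made up for the example.

```r
sindeg <- function(x) sin(x * pi / 180)

# Variant using sinpi(), which computes sin(pi * x) directly
sindeg2 <- function(x) sinpi(x / 180)

print(sindeg(30))
print(sindeg(180))    # a tiny nonzero number, because pi is not exact in floating point
print(sindeg2(180))   # exactly zero
```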

Any documentation for optimizing the performance of R? [duplicate]

This question already has answers here:
Speed up the loop operation in R
(10 answers)
Closed 9 years ago.
I'm fairly new to R, and one thing that has struck me is that it runs fairly slowly. Is there any documentation for optimizing R? For example, optimizing Python is described very well here. In my particular case I'm interested in optimizing R for batch jobs.
I have tried Googling for an answer of course, but it's not exactly easy to Google for R info since R is a pretty generic little search pattern.
For a start, you should take a look at The R Inferno by Patrick Burns.
Then the best idea is to ask more detailed questions here.
Yes, R is a bit awkward for a search term, so try RSiteSearch("performance") within R - this will search within lots of R docs sources.
A simple Google search on 'efficient programming in r' reveals the following excellent resources. The first resource is great as it provides a comparison of the bad, good and best ways of programming a task in R. The second resource is more generic.
http://perswww.kuleuven.be/~u0044882/Research/slidesR.pdf
http://www.bioconductor.org/help/course-materials/2010/BioC2010/EfficientRProgramming.pdf
If you are looking at more specific areas of optimizing your R code, specify them more clearly and I am sure you will find an expert here!
"It's running fairly slow" is very vague. There are many techniques for using R in the most efficient way. The general rule is "avoid loops, and vectorize", but there is much more to it, such as ensuring objects are pre-allocated rather than resized on the fly.
It really depends on what you are doing, so please be more specific. The standard documentation has plenty of tips for the basics and your question does not really give opportunity for someone to do any more than regurgitate those.
When standard R really is limited for your needs you can write directly in a compiled language such as C, or use advanced interfaces such as Rcpp. For other tools and techniques that extend beyond the basic R toolkit consult the "High Performance Computing" Task View on CRAN.
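To illustrate the "pre-allocate" and "vectorize" advice, here is a small sketch that computes the same result three ways. The function names are made up for the example, and no timings are quoted because they depend on the machine; run the system.time() calls yourself to compare.

```r
n <- 1e4

# Growing a vector inside a loop: the vector is copied on every iteration
grow <- function(n) {
  out <- c()
  for (i in seq_len(n)) out <- c(out, i^2)
  out
}

# Pre-allocating the result and filling it in place
prealloc <- function(n) {
  out <- numeric(n)
  for (i in seq_len(n)) out[i] <- i^2
  out
}

# Vectorised: no explicit loop at all
vectorised <- function(n) seq_len(n)^2

# All three give the same answer
same <- identical(grow(n), prealloc(n)) && identical(prealloc(n), vectorised(n))
print(same)

# Compare the elapsed times on your own machine
print(system.time(grow(n)))
print(system.time(prealloc(n)))
print(system.time(vectorised(n)))
```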

R and SPSS difference

I will be analysing a vast amount of network-traffic-related data shortly, and will pre-process the data in order to analyse it. I have found that R and SPSS are among the most popular tools for statistical analysis. I will also be generating quite a lot of graphs and charts. Therefore, I was wondering what the basic difference between these two software packages is.
I am not asking which one is better, but just wanted to know what are the difference in terms of workflow between the two (besides the fact that SPSS has a GUI). I will be mostly working with scripts in either case anyway so I wanted to know about the other differences.
Here is something that I posted to the R-help mailing list a while back, but I think that it gives a good high level overview of the general difference in R and SPSS:
When talking about user friendliness
of computer software I like the
analogy of cars vs. busses:
Busses are very easy to use, you just
need to know which bus to get on,
where to get on, and where to get off
(and you need to pay your fare). Cars
on the other hand require much more
work, you need to have some type of
map or directions (even if the map is
in your head), you need to put gas in
every now and then, you need to know
the rules of the road (have some type
of drivers licence). The big advantage
of the car is that it can take you a
bunch of places that the bus does not
go and it is quicker for some trips
that would require transferring between
busses.
Using this analogy programs like SPSS
are busses, easy to use for the
standard things, but very frustrating
if you want to do something that is
not already preprogrammed.
R is a 4-wheel drive SUV (though
environmentally friendly) with a bike
on the back, a kayak on top, good
walking and running shoes in the
passenger seat, and mountain climbing
and spelunking gear in the back.
R can take you anywhere you want to go
if you take time to learn how to use
the equipment, but that is going to
take longer than learning where the
bus stops are in SPSS.
There are GUIs for R that make it a bit easier to use, but also limit the functionality that can be used that easily. SPSS does have scripting which takes it beyond being a mere bus, but the general philosophy of SPSS steers people towards the GUI rather than the scripts.
I work at a company that uses SPSS for the majority of our data analysis, and for a variety of reasons - I have started trying to use R for more and more of my own analysis. Some of the biggest differences I have run into include:
Output of tables - SPSS has basic tables, general tables, custom tables, etc. that are all output to that nifty data viewer or whatever they call it. These can relatively easily be transported to Word documents or Excel sheets for further analysis / presentation. The equivalent in R involves learning LaTeX or using odfWeave or LyX or something of that nature.
Labeling of data - SPSS does a pretty good job with the variable labels and value labels. I haven't found a robust solution for R to accomplish this same task.
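On the labeling point, base R does offer a partial workaround, sketched below with invented data: factor labels play the role of SPSS value labels, and an attribute can carry a variable label. The attribute name "label" is just a convention assumed here, and plain attributes like this can be dropped by subsetting and other operations, so this is not a robust equivalent.

```r
# Hypothetical coded survey item: 1 = No, 2 = Yes
df <- data.frame(q1 = c(1, 2, 2, 1))

# Value labels: map the stored codes to readable labels with a factor
df$q1 <- factor(df$q1, levels = c(1, 2), labels = c("No", "Yes"))

# Variable label: stored as a plain attribute on the column
attr(df$q1, "label") <- "Owns a car"

print(table(df$q1))
print(attr(df$q1, "label"))
```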
You mention that you are going to be scripting most of your work, and personally I find SPSS's scripting syntax absolutely horrendous, to the point that I've stopped working with SPSS whenever possible. R syntax seems much more logical and follows programming standards more closely AND there is a very active community to rely on should you run into trouble (SO for instance). I haven't found a good SPSS community to ask questions of when I run into problems.
Others have pointed out some of the big differences in terms of cost and functionality of the programs. If you have to collaborate with others, their comfort level with SPSS or R should play a factor as you don't want to be the only one in your group that can work on or edit a script that you wrote in the future.
If you are going to be learning R, this post on the stats exchange website has a bunch of great resources for learning R: https://stats.stackexchange.com/questions/138/resources-for-learning-r
The initial workflow for SPSS involves justifying writing a big fat cheque. R is freely available.
R has a single language for 'scripting', but don't think of it like that, R is really a programming language with great data manipulation, statistics, and graphics functionality built in. SPSS has 'Syntax', 'Scripts' and is also scriptable in Python.
Another biggie is that SPSS squeezes its data into a spreadsheety table structure. Dealing with other data structures is probably very hard, but comes naturally to R. I wouldn't know where to start handling network graph type data in SPSS, but there's a package to do it for R.
Also with R you can integrate your workflow with your reporting by using Sweave - you write a document with embedded bits of R code that generate plots or tables, run the file through the system and out comes the report as a PDF. Great for when you want to do a weekly report, or you do a body of work and then the boss gives you an updated data set. Re-run, read it over, it's done.
But you know, your call...
Well, are you a decent programmer? If you are, then it's worthwhile to learn R. You can do more with your data, both in terms of manipulation and statistical modeling, than you can with SPSS, and your graphs will likely be better too. On the other hand, if you've never really programmed before, or find the idea of spending several months becoming a programmer intimidating, you'll probably get more value out of SPSS. The level of stuff that you can do with R without diving into its power as a full-fledged programming language probably doesn't justify the effort.
There's another option -- collaborate. Do you know someone you can work with on your project (you don't say whether it's academic or industry, but either way...), who knows R well?
There's an interesting (and reasonably fair) comparison between a number of stats tools here
http://anyall.org/blog/2009/02/comparison-of-data-analysis-packages-r-matlab-scipy-excel-sas-spss-stata/
I work with both in a company and can say the following:
If you have a large team of different people (not all data scientists), SPSS is useful because it is relatively easy to understand. For example, if users are going to run a model to get an output (sales estimates, etc.), SPSS is clear and easy to use.
That said, I find R better in almost every other sense:
R is faster (although this is sometimes debatable).
As stated previously, the syntax in SPSS is awful (I can't stress this enough). On the other hand, R can be painful to learn, but there are tons of resources online and in the end it pays off much more because of the different things you can do.
Again, like everyone else says, the sky is the limit with R. Tons of packages, resources and, more importantly, independence to do as you please. In my organization we have some very high-level functions that get a lot done. The hard part is creating them once, but then they perform complicated tasks that SPSS would tangle in a never-ending web of canvas. This is especially true for things like loops.
It is often overlooked, but R also has plenty of features to cooperate between teams (github integration with RStudio, and easy package building with devtools).
Actually, if everyone in your organization knows R, all you need is to maintain a basic package on github to share everything. This of course is not the norm, which is why I think SPSS, although a worse product, still has a market.
I have no data for it, but from my experience I can tell you one thing:
SPSS is a lot slower than R. (And with a lot, I really mean a lot)
The magnitude of the difference is probably as big as the one between C++ and R.
For example, I never have to wait longer than a couple of seconds in R. Using SPSS and similar data, I had calculations that took longer than 10 minutes.
As an unrelated side note: In my eyes, in the recent discussion on the speed of R, this point was somehow overlooked (i.e., the comparison with SPSS). Furthermore, I am astonished how this discussion popped up for a while and silently disappeared again.
There are some great responses above, but I will try to provide my 2 cents. My department completely relies on SPSS for our work, but in recent months, I have been making a conscious effort to learn R; in part, for some of the reasons itemized above (speed, vast data structures, available packages, etc.)
That said, here are a few things I have picked up along the way:
Unless you have some experience programming, I think creating summary tables in CTABLES destroys any available option in R. To date, I am unaware of a package that can replicate what can be created using Custom Tables.
SPSS does appear to be slower when scripting, and yes, SPSS syntax is terrible. That said, I have found that scripts in SPSS can always be improved by using the EXECUTE command sparingly.
SPSS and R can interface with each other, although it appears that it's one way (only when using R inside of SPSS, not the other way around). That said, I have found this to be of little use other than if I want to use ggplot2 or for some other advanced data management techniques. (I despise SPSS macros).
I have long felt that "reporting" work created in SPSS is far inferior to other solutions. As mentioned above, if you can leverage LaTeX and Sweave, you will be very happy with your efficient workflows.
I have been able to do some advanced analysis by leveraging OMS in SPSS. Almost everything can be routed to a new dataset, but I have found that most SPSS users don't use this functionality. Also, when looking at examples in R, it just feels "easier" than using OMS.
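On the summary-table point in this list, base R can produce simple cross-tabulations with totals and row percentages, though nothing here claims to match what Custom Tables can do. A minimal sketch, assuming a made-up data frame whose column names are purely illustrative:

```r
# Hypothetical data; column names and values are made up for illustration.
responses <- data.frame(
  region = c("North", "South", "North", "East", "South", "North"),
  answer = c("Yes", "No", "Yes", "Yes", "Yes", "No")
)

# Two-way frequency table of region by answer.
freq <- xtabs(~ region + answer, data = responses)

# Same table with row and column totals added.
counts <- addmargins(freq)

# Row percentages (each region sums to 100).
row_pct <- round(100 * prop.table(freq, margin = 1), 1)

print(counts)
print(row_pct)

# Write the counts to a CSV file that Excel can open.
out_file <- file.path(tempdir(), "region_by_answer.csv")
write.csv(as.data.frame.matrix(counts), out_file)
```

The CSV route gets the numbers into Excel, but the formatting and nesting that Custom Tables offers would still have to be done by hand or with an add-on package.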
In short, I find myself using SPSS when I can't figure it out quickly in R, but I sincerely have every intention of getting away from SPSS and using R entirely at some point in the near future.
SPSS provides a GUI to easily integrate existing R programs or develop new ones. For more info, see the SPSS Community on IBM developerWorks.
@Henrik, I did the same task you mentioned (C++ and R) in SPSS, and it turned out that SPSS is faster than R on this one. In my case SPSS is approx. 7 times faster. I am surprised about it.
Here is the code I used in SPSS.
data list free
/x (f8.3).
begin data
1
end data.
comp n = 1e6.
comp t1 = $time.
loop #rep = 1 to 10.
comp x = 1.
loop #i=1 to n.
comp x = 1/(1+x).
end loop.
end loop.
comp t2 = $time.
comp elapsed = t2 - t1.
form elapsed (f8.2).
exe.
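For readers who want to time the same computation in R, here is a direct transcription of the SPSS loop above. It is only a sketch: it is not necessarily the R code used in the original C++ and R comparison, the variable names are my own, and timings will vary by machine and R version.

```r
n <- 1e6     # iterations of the inner loop, as in the SPSS code
reps <- 10   # repetitions of the whole computation

elapsed <- system.time({
  for (r in seq_len(reps)) {
    x <- 1
    for (i in seq_len(n)) {
      # x converges toward (sqrt(5) - 1) / 2
      x <- 1 / (1 + x)
    }
  }
})[["elapsed"]]

cat("final x:", x, "\n")
cat("elapsed seconds:", elapsed, "\n")
```

Note that this times an explicit interpreted loop, which is the style of code where R tends to be at its slowest, so the result says little about typical vectorized R code.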
Check out this video on why it is good to combine SPSS and R:
http://bluemixanalytics.wordpress.com/2014/08/29/7-good-reasons-to-combine-ibm-spss-analytics-and-r/
If you have a compatible copy of R installed, you can connect to it from IBM SPSS Modeler and carry out model building and model scoring using custom R algorithms that can be deployed in IBM SPSS Modeler. You must also have a copy of IBM SPSS Modeler - Essentials for R installed. IBM SPSS Modeler - Essentials for R provides you with tools you need to start developing custom R applications for use with IBM SPSS Modeler.
The truth is: both packages are useful if you do data analysis professionally. Sure, R / RStudio has more statistical methods implemented than SPSS. But SPSS is much easier to use and gives more information per button click. Therefore, it is faster to work with whenever a particular analysis is implemented in both R and SPSS.
In the modern age, neither CPU nor memory is the most valuable resource. The researcher's time is the most valuable resource. Also, tables in SPSS are more visually pleasing, in my opinion.
In summary, R and SPSS complement each other well.

Resources