Abbreviations and functions in preparation for a programming contest [closed] - r

As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 11 years ago.
I am participating in a big programming competition tomorrow where I use R.
Time is the main factor (only 2 hours for 7 coding problems).
The problems are very mathematics related.
I would like to write "f" instead of "function" when I define a function.
This can be done and I had the code to do so, but I lost it and cannot find it.
Where do I find sin() functions for degrees input, not radian?
(optional) Is there any algorithm specific task view or libraries.
Any tip for a programming contest?
I prepared the following cheat sheet for the contest:
http://pastebin.com/h5xDLhvg
======== EDIT: ==========
So I finally have time to write down my lessons learned.
The programming contest was a lot of fun, but unfortunately I did not score very well. I was in the top 50%, but my aim was to be in the top 25%.
The main problem was that there was very little time to program, just 2 hours in total. But I had to read the problem descriptions and also I needed some time to paste the results in the web form, etc., so it was more like 90 Minutes of programming.
Hopefully the next contest in December will have extended time, like 3-4 hours. The organizers said that perhaps will be the case.
Also, there was no Internet access at the contest, and my mobile reception was not really working.
The main lesson for me is that you have to use a language you daily use in order to have a real chance. Especially, if there is only about 90 Minutes time to program. Since I use haskell more than R in my daily work, I think R was not the best choice. During the contest I mixed up haskell and R function definitions, and I made too many small typos to program fast enough.
What was great about the contest was, that there was about 20 000 bucks prize money in total for the about 80 participants. So the top 25% participants got from 500 to 1500 bucks each. Further, I think the top 15% get a job right away from one of the sponsor IT firms.
So it's a win-win situation. It's fun, plus you can get prize money. Further the IT firms are more than happy, because they have access to the top programmers.
I used the chance to speak to IT decision makers. One of them was from a larger bank. I boldly suggested that they consider switching to Scala for their development (switchung from Java). And also to consider using R and Haskell. It was fun, and they even said they already looked into Scala!
What was interesting to note was, that one of my best friends scored very good at the competition. He is only 19 years old, but he was well in the top 20% and got 500 bucks prize money. He beat me plus 6 of my colleges, who all have a respectable computer science degree. My friend programs more like hacker style, but he was very fast.
People in the top 10 used:
1) Java
2) C# and
3) C++
(No other programming language in the top 10!).
The only other programming language that scored reasonably well was Ruby, I think.
For the next contest the programming language of choice will probably be haskell. For one reason, it's just easier to find 2 team mates for haskell than for R programming. And up to 3 persons can form a team.
My ideal scenario would be a very light weight framework, where I could use multiple programming languages at once for the contest. That way, the main code can be written in haskell (which all team mates can program in). And some specific functions may be programmed in R, or in Mathematica, or even some other programming language (like python/sage).
This sounds a little bit overkill. But I think it would be very usefull. Like a function that has a matrix as a parameter and returns a matrix. Then this framework work generate automatically a RESTful service from the R code, so I could call the R function from any programming language. The matrix is just passed around as JSON data (or some other serialization). Okay, but this is off topic...
So finally some lessons learned as a bullet list:
don't bring food. you don't have time to eat, and there is a rich buffet afterwards
time is the limiting factor!
if you don't program R for a living, don't use R
look for contests where there is more time (3-4 hourss minimum!)
all in all, the concept of the contest is superb! Both for the participants, but also for the sponsors.
BIG THANKS to the help of 'Iterator' for his post!!

I'm going to answer a related, but different question. No offense, but your original suggestions don't seem very wise for a programming competition. Much of the time spent in such contexts is in devising an answer and in debugging (or, better, avoiding the need to debug).
Instead, I will answer this question: "What are the key resources in R that are useful for rapid prototyping, with a focus on being able to find resources quickly, being able to debug quickly, and being able to investigate data quickly? If I need to use numerical optimization methods and algebra systems, what should I investigate?"
Here are my answers:
Install RStudio or possibly Revolution Analytics' R, depending on which interface seems more appropriate to you. Both are good. The former has a very smooth GUI, the latter has a more intense interface, with more capabilities for managing code. Both have some nice properties over the "community" R regarding being able to look up information and navigate the help libraries quickly.
Get acquainted with example(), identify where to get vignettes and tutorials (from packages' pages on CRAN), and take a brief look at demo().
Use the sos library, and master findFn.
Look at the Task Views on CRAN - be sure you know about the tools for high performance computing (if that is going to be related) and the tools for optimization - it's quite common to need to use some kind of solver, and there's a task view for that.
If your code is running slowly during the prototyping or competition, you'll need to run Rprof(). Take that for a spin first. You may also benefit from using the compiler package if your code involves much iteration. In short: You do not want to wait on the computer. You might also look at foreach and doSMP or doMC if you can parcel the job to different cores. To aggregate results, become familiar with plyr and methods like ldply, as well as standard *apply functions, like lapply and apply; another good one to know is rapply. (If you have lots of stuff to process and it takes some time, look at mclapply or the .parallel argument for the plyr functions.)
On Stack Overflow: browse JD Long's questions - much of what you will discover that you do not know will have been asked by him before you thought to ask it. And there's an answer already there.
Create a number of little code templates for yourself. Master functions so that you don't need to learn these in a rush. Learn how to debug and step through these, using debug() and browser().
If you have to count things, learn how to use the hash package (akin to Perl and Python hash tables) and learn to use digest for keys that are too long to be used for hash (see this question for references)
If you are going to need to plot things, get some basic example plots prepared, using either plot or ggplot2, along with hist, boxplot, and some others. If you don't know ggplot2 already, then postpone, but you should become familiar with it. If you happen to use a lot of data, then be sure you know hexbin. If you will have to interact with data, then get to know iplots and the interesting tools there, such as iplot, ihist, and parallel coordinate plots (ipcp).
Be sure you know how to use lists, data frames, and matrices, including subscripting, lookups of entries based on (row, column) indices. (Again, be sure to investigate plyr for transforming and operating on some of these objects.)
Get acquainted with data.table() - it's exceptionally efficient for a lot of things you might do with data frames and matrices.
If you need to do symbolic mathematics, be sure you know the packages for that or else get another standalone tool for symbolic math. Ryacas is one package that appears to be useful.
Get the PDF of the R in a Nutshell, so that you can rapidly search through it for useful methods. Else, get the book itself. Various other books, such as Venables & Ripley, the R Cookbook, and others may be useful, depending on your experience.
If you've already mastered a good editor (e.g. emacs) or IDE (e.g. Eclipse), stick with it and look for bindings to R. Otherwise, a simple one you can begin using right away is Notepad++. Being able to do block selection is a very useful property in an editor. Being able to search through an entire directory hierarchy of code examples is another useful capability.
If you need to do anything involving database data, you may want to know RSQLite and sqldf, though these may not be relevant to a math competition.
Open a bunch of R instances so that you can try things out. :) [This is actually serious: by having multiple instances running, you can somewhat avoid latency associated with sequentially trying things out, waiting for results, and then debugging the results.]

For (1), you can do something like
f <- function(..., body)
{
dots <- substitute(...)
body <- substitute(body)
f <- function()
formals(f) <- dots
body(f) <- body
environment(f) <- parent.env(environment())
f
}
which lets you write, eg, g <- f(x, y, body=x+y) but I'm not sure how far that gets you.

For (2), you could just do:
sindeg <- function(x) sin(x*pi/180)

Related

How can I measure my usage of R? [closed]

As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 9 years ago.
I'm writing an annual report for uni, in which I would like to detail how my usage of R has increased over the past year. I'm looking metrics that I can use to describe my usage of R. Some possible metrics to describe usage:
number of lines of code in history
number of errors
hours spent using program
number of times a particular function has been called
number of plots made
So my question is: can I extract any of the above from R, or can I extract any other metrics which would demonstrate my usage of R?
First, I'm not sure that this question is at all suited to Stack Overflow. Second, I think that the metrics you've identified are not really suitable. Let's look at the ones you've shortlisted so far:
Number of lines of code in history
You make a lot of tweaks to your code. They accumulate in your history. Your history now has a lot of lines of code. Does that reflect positively of your usage of R? Or, you like to write code like the following in R:
temp <- 0
for (i in 1:10) {
temp <- temp + i
}
print(temp)
while a person familiar with R would just write sum(1:10). One line versus five. Can we really say that number of lines is a good metric?
Number of errors
Maybe there is some merit to this. But are you going to classify errors in some way? Is a missing or misplaced bracket forgivable? What about times when no error or warning is issued but R behaves in a way that you might not have expected, thus leading to unexpected results (for example, assuming that numeric(0) and factor(0) would behave the same way). See here for some R gotchas, several of which won't provide any indication of an error, but would certainly lead to erroneous analysis. How would they be analyzed with this metric?
Number of hours spent using the program
Again, debatable. How do you measure the number of hours? Time spent coding? Time the computer spends processing your code? Time it took you to figure out how to program your problem?
Number of times a particular function has been called
I don't understand this metric at all. Do more obscure functions get a higher weight (for example, if you are one of those who use vapply while the rest of the schmucks use sapply, do you get bonus points for using vapply because it can be safer (and sometimes faster) to use?)
Number of plots made
Sorry, but again, I don't understand this metric at all. First of all, not all plots are created equally! There are several in the data visualization field who feel that a lot of software ruined data visualization because some software (a very popular spreadsheet program, in particular) made it so easy for people to quickly make gaudy plots. With R, they are less gaudy by default, but that in itself doesn't make it good. So, if you're just measuring the number of plots churned out without some other criteria for quality assessment, then I'm not sure how this metric is useful.
And, from your comment to your question:
Actually...stack overflow reputation points might be as good as anything!
Eh... The only time I really use R is to answer questions on Stack Overflow (unfortunately true). At the same time, almost all my reputation points here are from the questions I've answered in the R tag. Sure, there are some users here that I would really trust, but sometimes, I don't even trust myself, so I don't know if that's a good indicator of your usage of R.
Lots of users have also complained that Stack Overflow voting is totally wacky, so I'm not sure that you really can use "reputation" as a valid measure of skill. For example, there's an ongoing discussion among regular users here that answers to "easy" questions get voted up very quickly (because they are easy to verify, often without even running the code) while answers to "complicated" questions don't yield votes proportional to the effort taken to answer the question. Case in point: Why the heck do I have a "Guru" badge for an answer that is essentially a reordered version of data already easily available with two minutes on Google. I'm not particularly proud of that answer, and it certainly doesn't say anything about my "usage" of R.
Now, to make this so that it might qualify as an answer and not just an extended comment on your question itself, the biggest thing that I would consider valid, but not sure how to measure it, would be something like how active you are in the R community. There are many ways to get involved with R, from writing or contributing to packages, filing bug reports, conducting workshops to help others make the switch to R, and so on.
I'm not suggesting that you need to write a book, as several others here have done, or to become a legendary package developer with a cult of underscore followers, but you can take small steps. For instance, although I'm a writing teacher, I have held workshops for students and written a few "getting started tips" just to introduce them to using R, so they can consider adding it to their toolkit. Many other users here regularly blog about their experiences working with R and, again, as this is part of a community, they learn a lot in the process.
Finally, a couple of more ideas:
#PaulHiemstra suggested in his comment that you could "mention the percentage of your programming work you do in R." I would extend that concept as follows: (1) try to measure how much of your work overall is done in R and tools complementary to R (obvious ones like Sweave/knitr/LaTeX come to mind), and (2) try to measure how much of an impact using R has had on improving your overall skills (with the logic being that good programming is often accompanied by logical thought, careful organization, good documentation, and so on).
Related to the previous point, try to see how your usage of R has changed with time. Has your behavior changed from manually redoing the same steps to writing functions yet? Have you then gone back and adapted those functions so that, instead of solving a specific problem you had at a given point in time, they can be used more generally by a larger audience? These are pretty significant changes, particularly if you had started from scratch with the language, and they can be a bit more meaningful than the ideas you presented in your question.
So, to summarize, a lot of the somewhat easily quantifiable things that you've identified in your question will probably lead to very meaningless analysis. I feel that the qualitative inputs you make would be much more valuable.
Another metric: Get an old and complex (don't know if you have one) code and redo it from 0. Use the difference of computation time as metric.

How did you experience the transition from SPSS to R? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 9 years ago.
Improve this question
The discussion in this question is the direct cause for me asking this question. The more general reason is the fact that I often have to explain R use to people that are only familiar with SPSS. I know most of the basics of SPSS, as we still use it in the base course statistics. But as I'm more of an R guy, it's difficult to know how SPSS users experience the first meeting with R.
I know there is the book R for SAS and SPSS users and that contains already some information. Yet, I would like to know what the more difficult parts are when you switch from SPSS to R.
Or in other words : if you would have to explain R in one day to SPSS users, which topics would you focus on? This is not a hypothetical question by the way (yeah, I know, it's not because one get paid for it that it always makes sense...).
Firstly, data manipulation has been the most challenging thing to learn coming from SPSS/SAS to R. I've found, personally, that getting the data in the right shape for an analysis is usually much more difficult than the analysis itself. Secondly, a true understanding of how to deal with categorical values through the use of factors. Lastly, summary statistics and descriptives can sometimes be challenging to get in a format that is transmutable to PPT or Excel which are what (my) clients generally expect/demand for reporting.
I would focus on:
1 Data manipulation
Understanding data structures. Import/Export. Then in-depth training on the use of packages like plyer, reshape with a particular focus on how to effectively use cast with formulas and melt with ids. How to apply numerical functions within a data.frame using ddply.
2 Factoring Data
In general, an explanation of dealing with recoding with, epicalc or a user-defined function. Also an explanation of the significance of factors, levels, and labels
3 Descriptives
Take a few minutes to introduce xtabs(), table(), prop.table() using cast() from reshape to create columnar tables of data that are more reasonably exported to Excel.
Graphics are optional, if you've done a good job of the above they should be able to get the data they need to create graphs in whatever software they are most comfortable with.
4 Graphics
If you've done a good job teaching the data manipulation, getting data into the shape needed for graphing should be pretty straightforward (or at least reproducible) at this point. ggplot2 is complicated and requires a day just by itself to be played with. But it is possible to give a quick overview of it. Alternatively, base graphics are simple to understand and the help is much more clear on what things do and how the syntax works.
Note: I left out statistical analysis. However, an overview of lm() and perhaps anova(), or cor() would be helpful as a start point. But this should be explained at the same time as data.manipulation.
Although I "wrote the book" on R to SPSS migration, that was aimed at programmers and most SPSS users that I know prefer to "point-and-click" instead. A graphical user interface like Deducer (or R Commander) can help them feel at home while teaching them how R programming code works if they want to see it. Deducer's Plot Builder also does a nice job letting you create complex plots easily, and if you want to learn to ggplot2 code, it will show you that as well. Ian did a great job with it!
However, while the SPSS graphical user interface covers 98% of what SPSS can do, Deducer covers perhaps 1% of what R can do. That's probably still 75% of what your average researcher needs, but R is so broad that to get the most out of it people will need to learn to program. The free version of my book, "R for SAS and SPSS Users" is only 80 pages & covers the areas of programming that I think are most likely to confuse beginners. It's at http://r4stats.com.
Just recently I've had a student who was somewhat versed in statistics and did some analysis beforehand in SPSS. I then showed him how to do the exact same thing in R. We went through the code and plotting, explaining and debating each line. He realized how easy and convenient it is to do it in R. Thus, R community grew by 1. :)
The biggest issue that the researchers I've dealt with have is the lack of point-and-click GUI. While there are a number of efforts out there in the R community, none of them have reached the ease-of-use/power level that SPSS has.
Since coding is second nature to R users, sometimes we forget that the majority of users of statistical software can't program (and would avoid it like the plague), even though they may have a strong practical understanding of statistics.
If I had one day to bring an SPSS user into R, I'd start them on Deducer. Deducer is an R GUI project (Self promotion note: I'm the author) that should feel very familiar to a user coming from SPSS. As they find themselves needing more advanced functions, they will naturally move to the command line to fulfill their needs.

R and SPSS difference

I will be analysing vast amount of network traffic related data shortly, and will pre-process the data in order to analyse it. I have found that R and SPSS are among the most popular tools for statistical analysis. I will also be generating quite a lot of graphs and charts. Therefore, I was wondering what is the basic difference between these two softwares.
I am not asking which one is better, but just wanted to know what are the difference in terms of workflow between the two (besides the fact that SPSS has a GUI). I will be mostly working with scripts in either case anyway so I wanted to know about the other differences.
Here is something that I posted to the R-help mailing list a while back, but I think that it gives a good high level overview of the general difference in R and SPSS:
When talking about user friendlyness
of computer software I like the
analogy of cars vs. busses:
Busses are very easy to use, you just
need to know which bus to get on,
where to get on, and where to get off
(and you need to pay your fare). Cars
on the other hand require much more
work, you need to have some type of
map or directions (even if the map is
in your head), you need to put gas in
every now and then, you need to know
the rules of the road (have some type
of drivers licence). The big advantage
of the car is that it can take you a
bunch of places that the bus does not
go and it is quicker for some trips
that would require transfering between
busses.
Using this analogy programs like SPSS
are busses, easy to use for the
standard things, but very frustrating
if you want to do something that is
not already preprogrammed.
R is a 4-wheel drive SUV (though
environmentally friendly) with a bike
on the back, a kayak on top, good
walking and running shoes in the
pasenger seat, and mountain climbing
and spelunking gear in the back.
R can take you anywhere you want to go
if you take time to leard how to use
the equipment, but that is going to
take longer than learning where the
bus stops are in SPSS.
There are GUIs for R that make it a bit easier to use, but also limit the functionality that can be used that easily. SPSS does have scripting which takes it beyond being a mere bus, but the general phylosophy of SPSS steers people towards the GUI rather than the scripts.
I work at a company that uses SPSS for the majority of our data analysis, and for a variety of reasons - I have started trying to use R for more and more of my own analysis. Some of the biggest differences I have run into include:
Output of tables - SPSS has basic tables, general tables, custom tables, etc that are all output to that nifty data viewer or whatever they call it. These can relatively easily be transported to Word Documents or Excel sheets for further analysis / presentation. The equivalent function in R involves learning LaTex or using a odfWeave or Lyx or something of that nature.
Labeling of data --> SPSS does a pretty good job with the variable labels and value labels. I haven't found a robust solution for R to accomplish this same task.
You mention that you are going to be scripting most of your work, and personally I find SPSS's scripting syntax absolutely horrendous, to the point that I've stopped working with SPSS whenever possible. R syntax seems much more logical and follows programming standards more closely AND there is a very active community to rely on should you run into trouble (SO for instance). I haven't found a good SPSS community to ask questions of when I run into problems.
Others have pointed out some of the big differences in terms of cost and functionality of the programs. If you have to collaborate with others, their comfort level with SPSS or R should play a factor as you don't want to be the only one in your group that can work on or edit a script that you wrote in the future.
If you are going to be learning R, this post on the stats exchange website has a bunch of great resources for learning R: https://stats.stackexchange.com/questions/138/resources-for-learning-r
The initial workflow for SPSS involves justifying writing a big fat cheque. R is freely available.
R has a single language for 'scripting', but don't think of it like that, R is really a programming language with great data manipulation, statistics, and graphics functionality built in. SPSS has 'Syntax', 'Scripts' and is also scriptable in Python.
Another biggie is that SPSS squeezes its data into a spreadsheety table structure. Dealing with other data structures is probably very hard, but comes naturally to R. I wouldn't know where to start handling network graph type data in SPSS, but there's a package to do it for R.
Also with R you can integrate your workflow with your reporting by using Sweave - you write a document with embedded bits of R code that generate plots or tables, run the file through the system and out comes the report as a PDF. Great for when you want to do a weekly report, or you do a body of work and then the boss gives you an updated data set. Re-run, read it over, its done.
But you know, your call...
Well, are you a decent programmer? If you are, then it's worthwhile to learn R. You can do more with your data, both in terms of manipulation and statistical modeling, than you can with SPSS, and your graphs will likely be better too. On the other hand, if you've never really programmed before, or find the idea of spending several months becoming a programmer intimidating, you'll probably get more value out of SPSS. The level of stuff that you can do with R without diving into its power as a full-fledged programming language probably doesn't justify the effort.
There's another option -- collaborate. Do you know someone you can work with on your project (you don't say whether it's academic or industry, but either way...), who knows R well?
There's an interesting (and reasonably fair) comparison between a number of stats tools here
http://anyall.org/blog/2009/02/comparison-of-data-analysis-packages-r-matlab-scipy-excel-sas-spss-stata/
I work with both in a company and can say the following:
If you have a large team of different people (not all data scientists), SPSS is useful because it is plain (relatively) to understand. For example, if users are going to run a model to get an output (sales estimates, etc), SPSS is clear and easy to use.
That said, I find R better in almost every other sense:
R is faster (although, sometimes debatable)
As stated previously, the syntax in SPSS is aweful (I can't stress this enough). On the other hand, R can be painful to learn, but there are tons of resources online and in the end it pays much more because of the different things you can do.
Again, like everyone else says, the sky is the limit with R. Tons of packages, resources and more importantly: indepedence to do as you please. In my organization we have some very high level functions that get a lot done. The hard part is creating them once, but then they perform complicated tasks that SPSS would tangle in a never ending web of canvas. This is specially true for things like loops.
It is often overlooked, but R also has plenty of features to cooperate between teams (github integration with RStudio, and easy package building with devtools).
Actually, if everyone in your organization knows R, all you need is to maintain a basic package on github to share everything. This of course is not the norm, which is why I think SPSS, although a worst product, still has a market.
I have not data for it, but from my experience I can tell you one thing:
SPSS is a lot slower than R. (And with a lot, I really mean a lot)
The magnitude of the difference is probably as big as the one between C++ and R.
For example, I never have to wait longer than a couple of seconds in R. Using SPSS and similar data, I had calculations that took longer than 10 minutes.
As an unrelated side note: In my eyes, in the recent discussion on the speed of R, this point was somehow overlooked (i.e., the comparison with SPSS). Furthermore, I am astonished how this discussion popped up for a while and silently disappeared again.
There are some great responses above, but I will try to provide my 2 cents. My department completely relies on SPSS for our work, but in recent months, I have been making a conscious effort to learn R; in part, for some of the reasons itemized above (speed, vast data structures, available packages, etc.)
That said, here are a few things I have picked up along the way:
Unless you have some experience programming, I think creating summary tables in CTABLES destroys any available option in R. To date, I am unaware package that can replicate what can be created using Custom Tables.
SPSS does appear to be slower when scripting, and yes, SPSS syntax is terrible. That said, I have found that scipts in SPSS can always be improved but using the EXECUTE command sparingly.
SPSS and R can interface with each other, although it appears that it's one way (only when using R inside of SPSS, not the other way around). That said, I have found this to be of little use other than if I want to use ggplot2 or for some other advanced data management techniques. (I despise SPSS macros).
I have long felt that "reporting" work created in SPSS is far inferior to other solutions. As mentioned above, if you can leverage LaTex and Sweave, you will be very happy with your efficient workflows.
I have been able to do some advanced analysis by leveraging OMS in SPSS. Almost everything can be routed to a new dataset, but I have found that most SPSS users don't use this functionality. Also, when looking at examples in R, it just feels "easier" than using OMS.
In short, I find myself using SPSS when I can't figure it out quickly in R, but I sincerely have every intention of getting away from SPSS and using R entirely at some point in the near future.
SPSS provides a GUI to easily integrate existing R programs or develop new ones. For more info, see the SPSS Community on IBM Developer Works.
#Henrik, I did the same task you have mentioned (C++ and R) on SPSS. And it turned out that SPSS is faster compared to R on this one. In my case SPSS is aprox. 7 times faster. I am surprised about it.
Here is a code I used in SPSS.
data list free
/x (f8.3).
begin data
1
end data.
comp n = 1e6.
comp t1 = $time.
loop #rep = 1 to 10.
comp x = 1.
loop #i=1 to n.
comp x = 1/(1+x).
end loop.
end loop.
comp t2 = $time.
comp elipsed = t2 - t1.
form elipsed (f8.2).
exe.
Check out this video why is good to combine SPSS and R...
Link
http://bluemixanalytics.wordpress.com/2014/08/29/7-good-reasons-to-combine-ibm-spss-analytics-and-r/
If you have a compatible copy of R installed, you can connect to it from IBM SPSS Modeler and carry out model building and model scoring using custom R algorithms that can be deployed in IBM SPSS Modeler. You must also have a copy of IBM SPSS Modeler - Essentials for R installed. IBM SPSS Modeler - Essentials for R provides you with tools you need to start developing custom R applications for use with IBM SPSS Modeler.
The truth is: both packages are useful if you do data analysis professionally. Sure, R / RStudio has more statistical methods implemented than SPSS. But SPSS is much easier to use and gives more information per each button click. And, therefore, it is faster to exploit whenever a particular analysis is implemented in both R and SPSS.
In the modern age, neither CPU nor memory is the most valuable resource. Researcher's time is the most valuable resource. Also, tables in SPSS are more visually pleasing, in my opinion.
In summary, R and SPSS complement each other well.

In what areas of programming is a knowledge of mathematics helpful? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 3 years ago.
Improve this question
For example, math logic, graph theory.
Everyone around tells me that math is necessary for programmer. I saw a lot of threads where people say that they used linear algebra and some other math, but no one described concrete cases when they used it.
I know that there are similar threads, but I couldn't see any description of such a case.
Computer graphics.
It's all matrix multiplication, vector spaces, affine spaces, projection, etc. Lots and lots of algebra.
For more information, here's the Wikipedia article on projection, along with the more specific case of 3D projection, with all of its various matrices. OpenGL, a common computer graphics library, is an example of applying affine matrix operations to transform and project objects onto a computer screen.
I think that a lot of programmers use more math than they think they do. It's just that it comes so intuitively to them that they don't even think about it. For instance, every time you write an if statement are you not using your Discrete Math knowledge?
In graphic world you need a lot of transformations.
In cryptography you need geometry and number theory.
In AI, you need algebra.
And statistics in financial environments.
Computer theory needs math theory: actually almost all the founders are from Maths.
Given a list of locations with latitudes and longitudes, sort the list in order from closest to farthest from a specific position.
All applications that deal with money need math.
I can't think of a single app that I have written that didn't require math at some point.
I wrote a parser compiler a few months back, and that's full of graph-theory. This was only designed to be slightly more powerful than regular expressions (in that multiple matches were allowed, and some other features were added), but even such a simple compiler requires loop detection, finite state automata, and tons more math.
Implementing the Advanced Encryption Standard (AES) algorithm required some basic understanding of finite field math. See act 4 of my blog post on it for details (code sample included).
I've used a lot of algebra when writing business apps.
Simple Examples
BMI = weight / (height * height);
compensation = 10 * hours * ((pratio * 2.3) + tratio);
A few years ago, I had a DSP project that had to compute a real radix-2 FFT of size N, in a given time. The vendor-supplied real radix-2 FFT wouldn't run in the allocated time, but their complex FFT of size N/2 would. It is easy to feed the real data into the complex FFT. Getting the answers out afterwards is not so easy: it is called post-weaving, or post-unweaving, or unweaving. Deriving the unweave equations from the FFT and complex number theory was not fun. Going from there to tightly-optimized DSP code was equally not fun.
Naturally, the signal I was measuring did not match the FFT sample size, which causes artifacts. The standard fix is to apply a Hanning window. This causes other artifacts. As part of understanding (and testing) that code, I had to understand the artifacts caused by the Hanning window, so I could interpret the results and decide whether the code was working or not.
I've used tons of math in various projects, including:
Graph theory for dealing with dependencies in large systems (e.g. a Makefile is a kind of directed graph)
Statistics and linear regression in profiling performance bottlenecks
Coordinate transformations in geospatial applications
In scientific computing, project requirements are often stated in algebraic form, especially for computationally intensive code
And that's just off the top of my head.
And of course, anything involving "pure" computer science (algorithms, computational complexity, lambda calculus) tends to look more and more like math the deeper you go.
In answering this image-comparison-algorithm question, I drew on lots of knowledge of math, some of it from other answers and web searches (where I had to apply my own knowledge to filter the information), and some from my own engineering training and lengthy programming background.
General Mindforming
Solving Problems - One fundamental method of math, independent of the area, is transofrming an unknown problem into a known one. Even if you don't have the same problems, you need the same skill. In math, as in programming, virtually everything has different representations. Understanding the equivalence between algorithms, problems or solutions that are completely different on the surface helps you avoid the hard parts.
(A similar thing happens in physics: to solve a kinematic problem, choice of the coordinate system is often the difference between one and ten pages full of formulas, even though problem and solution are identical.)
Precision of Language / Logical reasoning - Math has a very terse yet precise language. Learning to deal with that will prepare you for computers doing what you say, not what you meant. Also, the same precision is required to analyse if a specification is sufficient, to check a piece of code if it covers all possible cases, etc.
Beauty and elegance - This may be the argument that's hardest to grasp. I found the notion of "beauty" in code is very close to the one found in math. A beautiful proof is one whose idea is immediately convincing, and the proof itself is merely executing a sequence of executing the next obvious step.
The same goes for an elegant implementation.
(Most mathematicians I've encountered have a faible for putting the "Aha!" - effect at the end rather than at the beginning. As have most elite geeks).
You can learn these skills without one lesson of math, of course. But math ahs perfected this for centuries.
Applied Skills
Examples:
- Not having to run calc.exe for a quick estimation of memory requirements
- Some basic statistics to tell a valid performance measurement from a shot in the dark
- deducing a formula for a sequence of values, rather than hardcoding them
- Getting a feeling for what c*O(N log N) means.
- Recursion is the same as proof by inductance
(that list would probably go on if I'd actively watch myself for items for a day. This part is admittedly harder than I thought. Further suggestions welcome ;))
Where I use it
The company I work for does a lot of data acquisition, and our claim to fame (comapred to our competition) is the brain muscle that goes into extracting something useful out of the data. While I'm mostly unconcerned with that, I get enough math thrown my way. Before that, I've implemented and validated random number generators for statistical applications, implemented a differential equation solver, wrote simulations for selected laws of physics. And probably more.
I wrote some hash functions for mapping airline codes and flight numbers with good efficiency into a fairly limited number of data slots.
I went through a fair number of primes before finding numbers that worked well with my data. Testing required some statistics and estimates of probabilities.
In machine learning: we use Bayesian (and other probabilistic) models all the time, and we use quadratic programming in the form of Support Vector Machines, not to mention all kinds of mathematical transformations for the various kernel functions. Calculus (derivatives) factors into perceptron learning. Not to mention a whole theory of determining the accuracy of a machine learning classifier.
In artifical intelligence: constraint satisfaction, and logic weigh very heavily.
I was using co-ordinate geometry to solve a problem of finding the visible part of a stack of windows, not exactly overlapping on one another.
There are many other situations, but this is the one that I got from the top of my head. Inherently all operations that we do is mathematics or at least depends on/related to mathematics.
Thats why its important to know mathematics to have a more clearer understanding of things :)
Infact in some cases a lot of math has gone into our common sense that we don't notice that we are using math to solve a particular problem, since we have been using it for so long!
Thanks
-Graphics (matrices, translations, shaders, integral approximations, curves, etc, etc,...infinite dots)
-Algorithm Complexity calculations (specially in line of business' applications)
-Pointer Arithmetics
-Cryptographic under field arithmetics etc.
-GIS (triangles, squares algorithms like delone, bounding boxes, and many many etc)
-Performance monitor counters and the functions they describe
-Functional Programming (simply that, not saying more :))
-......
I used Combinatorials to stuff 20 bits of data into 14 bits of space.
Machine Vision or Computer Vision requires a thorough knowledge of probability and statistics. Object detection/recognition and many supervised segmentation techniques are based on Bayesian inference. Heavy on linear algebra too.
As an engineer, I'm trying really hard to think of an instance when I did not need math. Same story when I was a grad student. Granted, I'm not a programmer, but I use computers a lot.
Games and simulations need lots of maths - fluid dynamics, in particular, for things like flames, fog and smoke.
As an e-commerce developer, I have to use math every day for programming. At the very least, basic algebra.
There are other apps I've had to write for vector based image generation that require a strong knowledge of Geometry, Calculus and Trigonometry.
Then there is bit-masking...
Converting hexadecimal to base ten in your head...
Estimating load potential of an application...
Yep, if someone is no good with math, they're probably not a very good programmer.
Modern communications would completely collapse without math. If you want to make your head explode sometime, look up Galois fields, error correcting codes, and data compression. Then symbol constellations, band-limited interpolation functions (I'm talking about sinc and raised-cosine functions, not the simple linear and bicubic stuff), Fourier transforms, clock recovery, minimally-ambiguous symbol training sequences, Rayleigh and/or Ricean fading, and Kalman filtering. All of those involve math that makes my head hurt bad, and I got a Masters in Electrical Engineering. And that's just off the top of my head, from my wireless communications class.
The amount of math required to make your cell phone work is huge. To make a 3G cell phone with Internet access is staggering. To prove with sufficient confidence that an algorithm will work in most all cases sometimes takes people's careers.
But... if you're only ever going to work with this stuff as black boxes imported from a library (at their mercy, really), well, you might get away with just knowing enough algebra to debug mismatched parentheses. And there are a lot more of those jobs than the hard ones... but at the same time, the hard jobs are harder to find a replacement for.
Examples that I've personally coded:
wrote a simple video game where one spaceship shoots a laser at another ship. To know if the ship was in the laser's path, I used basic algebra y=mx+b to calculate if the paths intersect. (I was a child when I did this and was quite amazed that something that was taught on a chalkboard (algebra) could be applied to computer programming.)
calculating mortgage balances and repayment schedules with logarithms
analyzing consumer buying choices by calculating combinatorics
trigonometry to simulate camera lens behavior
Fourier Transform to analyze digital music files (WAV files)
stock market analysis with statistics (linear regressions)
using logarithms to understand binary search traversals and also disk space savings when using packing information into bit fields. (I don't calculate logarithms in actual code, but I figure them out during "design" to see if it's feasible to even bother coding it.)
None of my projects (so far) have required topics such as calculus, differential equations, or matrices. I didn't study mathematics in school but if a project requires math, I just reference my math books and if I'm stuck, I search google.
Edited to add: I think it's more realistic for some people to have a programming challenge motivate the learning of particular math subjects. For others, they enjoy math for its own sake and can learn it ahead of time to apply to future programming problems. I'm of the first type. For example, I studied logarithms in high school but didn't understand their power until I started doing programming and all of sudden, they seem to pop up all over the place.
The recurring theme I see from these responses is that this is clearly context-dependent.
If you're writing a 3D graphics engine then you'd be well advised to brush up on your vectors and matrices. If you're writing a simple e-commerce website then you'll get away with basic algebra.
So depending on what you want to do, you may not need any more math than you did to post your question(!), or you might conceivably need a PhD (i.e. you would like to write a custom geometry kernel for turbine fan blade design).
One time I was writing something for my Commodore 64 (I forget what, I must have been 6 years old) and I wanted to center some text horizontally on the screen.
I worked out the formula using a combination of math and trial-and-error; years later I would tackle such problems using actual algebra.
Drawing, moving, and guidance of missiles and guns and lasers and gravity bombs and whatnot in this little 2d video game I made: wordwarvi
Lots of uses sine/cosine, and their inverses, (via lookup tables... I'm old, ok?)
Any geo based site/app will need math. A simple example is "Show me all Bob's Pizzas within 10 miles of me" functionality on a website. You will need math to return lat/lons that occur within a 10 mile radius.
This is primarily a question whose answer will depend on the problem domain. Some problems require oodles of math and some require only addition and subtraction. Right now, I have a pet project which might require graph theory, not for the math so much as to get the basic vocabulary and concepts in my head.
If you're doing flight simulations and anything 3D, say hello to quaternions! If you're doing electrical engineering, you will be using trig and complex numbers. If you're doing a mortgage calculator, you will be doing discrete math. If you're doing an optimization problem, where you attempt to get the most profits from your widget factory, you will be doing what is called linear programming. If you are doing some operations involving, say, network addresses, welcome to the kind of bit-focused math that comes along with it. And that's just for the high-level languages.
If you are delving into highly-optimized data structures and implementing them yourself, you will probably do more math than if you were just grabbing a library.
Part of being a good programmer is being familiar with the domain in which you are programming. If you are working on software for Fidelity Mutual, you probably would need to know engineering economics. If you are developing software for Gallup, you probably need to know statistics. LucasArts... probably Linear Algebra. NASA... Differential Equations.
The thing about software engineering is you are almost always expected to wear many hats.
More or less anything having to do with finding the best layout, optimization, or object relationships is graph theory. You may not immediately think of it as such, but regardless - you're using math!
An explicit example: I wrote a node-based shader editor and optimizer, which took a set of linked nodes and converted them into shader code. Finding the correct order to output the code in such that all inputs for a certain node were available before that node needed them involved graph theory.
And like others have said, anything having to do with graphics implicitly requires knowledge of linear algebra, coordinate spaces transformations, and plenty of other subtopics of mathematics. Take a look at any recent graphics whitepaper, especially those involving lighting. Integrals? Infinite series?! Graph theory? Node traversal optimization? Yep, all of these are commonly used in graphics.
Also note that just because you don't realize that you're using some sort of mathematics when you're writing or designing software, doesn't mean that you aren't, and actually understanding the mathematics behind how and why algorithms and data structures work the way they do can often help you find elegant solutions to non-trivial problems.
In years of webapp development I didn't have much need with the Math API. As far as I can recall, I have ever only used the Math#min() and Math#max() of the Math API.
For example
if (i < 0) {
i = 0;
}
if (i > 10) {
i = 10;
}
can be done as
i = Math.max(0, Math.min(i, 10));

What useful R package doesn't currently exist? [closed]

As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 11 years ago.
I have been working on a few R packages for some general tools that aren't currently available in R: blogging, report delivery, logging, and scheduling. This led me to wonder: what are the most important things that people wish existed in R that currently aren't available?
My hope is that we can use this to pinpoint some gaps, and possibly work on them collaboratively.
I'm a former Mathematica junkie, and one thing that I really miss is the notebook style interface. When I did my research with notebooks, papers would almost write themselves as I did my analysis. But now that I'm using R, I find that documenting my work to be quite tedious.
For people that are not so familiar with Mathematica, you have documents called "notebooks" that can contain code, text, equations, and the results from executed code (which can be equations, text, graphics, or interactive tools). Everything can be neatly organized into styled subsections or sections that are collapsable. You can have multiple open documents that integrate with a single shared kernel.
While I don't think a full-blown Mathematica style interface is entirely necessary, some interactive document system that would support text (for description), code, code output, and embedded image output would be a real boon to researchers.
A Real-Time R package would be my choice, using C Streaming perhaps.
Also I'd like a more robust web development package. Nothing as extensive as Ruby on Rails but something a bit better than Sweave combined with R2HTML, that can run on RApache. I think this needs to be a huge area of emphasis for R in general.
I realize LaTeX is better markup for certain academia but in general I think HTML should be the markup language of choice. More needs to be done in terms of R Web Apps, so applications can be hosted on huge RAM remotely and R can start being used for SaaS data applications and other graphics choices.
Interfaces to any of the new-fangled 'Web 2.0' databases that use key-value pairs rather than the standard RDMS. A non-exhaustive list (in alphabetical order) would be
Cassandra Project
CouchDB
MongoDB
Project Voldemort
Redis
Tokyo Cabinet
and it would of course be nice if we had a DBI-alike abstraction on top of this. Jeff has started with RBerkeley but that use the older-school Oracle BerkeleyDB backend rather than one of those new things.
An output device which produces Javascript code, perhaps using the protovis library.
as a programmer and writer of libraries for colleagues, I was definitely missing a logging package, I googled and asked around, here too, then wrote one myself. it is on r-forge, here, and it s called "logging" :)
I use it and I'm obviously still developing it.
There are few libraries to interface with database in general, and there is not ORM library.
RMySQL is useful, but you have to write the SQL queries manually and there is not a way to generate them as in a ORM. Morevoer, it is only specific to MySQL.
Another library set that R still doesn't have, for me, it is a good system for reading command line arguments: there is R getopt but it is nothing like, for example, argparse in python.
A natural interface to the .NET framework would be awesome, though I suspect that that might be a lot of work.
EDIT:
Syntax highlighting from within RGui would also be wonderful.
ANOTHER EDIT:
R.NET now exists to integrate R with .NET.
A FRAQ package for FRequently Asked Questions, a la fortune(). R-help would be so much fun: "Try this, library(FRAQ); faq("lattice won't print"), etc.
See also.
A wiki package that adds wiki-like documentation to R packages. You'd have a inst/wiki subdirectory with plain text files in markdown, asciidoc, textile, with embedded R code. With the right incantation, these files would be executed (think brew and/or asciidoc packages), and the relevant output uploaded to a given repository online (github, googlecode, etc.). Another function could take care of synchronizing the changes made online, typically via svn or git.
Suddenly you have a wiki documentation for your package with reproducible examples (could even be hooked to R CMD check).
EDIT 2012:
... and now the knitr package would make this process even easier and neater
I would like to see a possibility to embed another programming language within R in a more straightforward way by the users. I give this as an example in some common-lisp implementations one could write a function with embedded C code like this:
(defun sample (x)
(ffi:c-inline (n1 n2) (:int :int) (values :int :int) "{
int n1 = #0, n2 = #1, out1 = 0, out2 = 1;
while (n1 <= n2) {
out1 += n1;
out2 *= n1;
n1++;
}
#(return 0)= out1;
#(return 1)= out2;
}"
:side-effects nil))
It would be good if one could write an R function with embedded C or lisp code (more interested in the latter) in a similar way.
A native .NET interface to RGUI. R(D)Com is based on COM, and it only allows to exchange matrices, not more complex structures.
I would very much like a line profiler. This exists in Matlab and Python, and is very useful for finding bits of code that take a lot of time or are executed more (or less) than expected. A lot of my code involves function optimizations and how many times something iterates may not be known in advance (though most iterations are constrained or specified).
The call stack is useful if all of your code is in R and is very simple, but as I recently posted about it, it takes a painstaking effort if your code is complex.
It's quite easy to develop a line profiler for a given bit of code. A naive way is to index every line (or just pre-specified sections) and insert a call to log proc.time() that line. In a loop, I simply enumerate sections of code and store in a 2 dimensional list the proc.time values for section i in iteration k. [See update below: this isn't actually a way to do a line profiler for all kinds of code.]
One can use such a tool to find hotspots, anomalies (e.g. code that should be O(n) but is really O(n^2)), code that may benefit from memoization (a line profiler doesn't tell you this, but it lets you know where to look), code that is mistakenly inside a loop, and more.
Update 1: Inserting a timing line between every function line is slightly erroneous: the definition of a line of code is not simply code separated by whitespace. Being able to parse the code into an AST is necessary for knowing where operations begin and end. As discussed in some of the answers to this question, there are some tools (namely, showTree and walkCode in the codetools package) for doing this. Simply applying a regular expression to source code would be a very bad thing to do.

Resources