Statistical tools for programmers [closed] - math

As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 10 years ago.
I'm trying to evaluate the purchase of a statistical tool. This will be used in part by non-programming users (doing clinical studies) and in part by programmers, so I'm trying to find a good compromise between usability and automation. Of course, cost is an issue, but if I can build a solid case, we could probably buy a commercial package, so we're not totally limited to free options.
So far, our options are:
Statistica (which some non-programmers already know)
Matlab Statistics toolbox (programmers already use matlab)
R language (would need a UI for non-programmers)
Hack something into Excel (not fun, but that's what non-programmers do right now)
?...
What else is out there? What's the industry standard? What kind of distinctive features should I look for? What would you recommend, and why?
Ideally, we'd like a tool that can run both on Linux and Windows machines.
(I work in medical imaging, so we do both biostatistics, and software engineering statistics)

Hands down it's R. R is very programmer friendly. It has functional aspects and it's GNU.
S-PLUS and R are both based on the S language. They are similar, and in most cases you can run an S-PLUS program in R and vice versa.
SAS is another option, although geared more towards BI and enterprise. SAS has a simpler syntax than R and in my opinion is easier to pick up for a non-programmer.
Other options include SPSS, Matlab, and even Excel.

I recommend R, personally. It's used by bioinformaticians and psychologists, I hear. Don't know what your field is though, so maybe it's a lousy choice. It is reasonably easy to use and learn.

Stata and SPSS tend to be the most commonly used packages in clinical studies. Both are pretty easy to pick up and use for non-technically minded folks but are generally flexible enough. I've used Stata more than any of the others and have been pretty happy with its options (supports both menu-based and command line operation, decent enough plugin system to get new user-created modules, good graphing support).
R is a little more daunting for newbie users, though it is popular with the biostatisticians. Since it's free, that's another nice point in its favor.

For a statistical package with a GUI which non-technical users can use, I would recommend that you go with "SAS Enterprise Guide". You will get the common and advanced SAS procedures, an excellent graphics facility and the ability to program for the technical users. I recommend that you start with the "SAS Learning Edition" (http://support.sas.com/learn/le/) which is a fully functional version of Enterprise Guide, but limited to processing 1000 rows at a time only. It is under $500, which makes it a pretty good deal.

I would look at S-Plus.
You get a strong programming environment (S-Plus Workbench, based upon the Eclipse platform), an intuitive GUI for non-programmers, and an extensive user community (including users of R, which was based upon the original S).

Visual Numerics is another option.

It sounds like you're trying to maximize multiple goals. You say "This will be used in part by non-programming users (doing clinical studies) and in part by programmers, so I'm trying to find a good compromise between usability and automation", with an implicit assumption that this will be the same tool in both cases, when that might not be realistic. What's the compromise for Word and LaTeX, for example?
Some different questions about the requirements:
Should it be extensible for programmers
Able to use C extensions
Easy to make new procedures and methods
What analysis are non-programmers going to want to use?
Graphics?
Ease of use for different groups
So my read on this:
Easy to extend: R/S-plus, Matlab/Octave (I happen to prefer R, but I do more stats and fewer matrix things)
Easy to use for normal people: Excel, custom wrapped R, SPSS
Also, R on Windows has a limited GUI, which may or may not help your users.
If it were me, I'd go with a hybrid solution. Use R, and give non-programmers a cheat sheet that illustrates common tasks, or even better, write some wrapper functions with names like "image_summary" that automate their exploratory work.
For writing front end scripts for R, the RPy python wrappers might help as well.
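As a rough illustration of the wrapper-function idea, here is a minimal Python sketch. The answer above suggests writing such wrappers in R; Python is used here only because the answer also mentions Python front ends. The name "image_summary" comes from the answer, but its signature, its input (a flat sequence of pixel intensities), and the statistics it returns are assumptions made for the example.

```python
import statistics


def image_summary(pixel_values):
    """Return basic descriptive statistics for a flat sequence of pixel intensities.

    Hypothetical helper: the input format and the chosen statistics are
    assumptions for illustration, not part of any existing package.
    """
    values = list(pixel_values)
    if not values:
        raise ValueError("image_summary needs at least one value")
    return {
        "n": len(values),
        "min": min(values),
        "max": max(values),
        "mean": statistics.fmean(values),
        "median": statistics.median(values),
        # The sample standard deviation is undefined for a single value.
        "stdev": statistics.stdev(values) if len(values) > 1 else 0.0,
    }


if __name__ == "__main__":
    print(image_summary([12, 15, 15, 20, 48]))
```

A non-programmer would then only need to remember one call per common task, while programmers remain free to extend or replace what happens inside the wrapper.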

SAS Enterprise Guide has good usability for non-programmers. It also has good options for connecting to Excel. And for programmers, it's the most robust option out there. The SAS server runs on anything, though Enterprise Guide is Windows-only.

Consider Excel one more time. It is well known and widely available. Refer to this book or this book.

This Wikipedia page compares the features available for several statistical packages, as well as their OS compatibility and pricing info (which seems a little out of date, but it gives an overall idea)

We ended up getting the Matlab Statistics toolbox (mainly because we already have some experience with Matlab in the team, and needed the tool anyway)
So far, it's doing what we need to do, and it's easily expansible. Usage will show if non-programmers really use it, but so far it's looking good.

Related

R vs Pentaho Spoon as an ETL tool [closed]

Closed 10 years ago.
Background (sorry it's so long):
I've been tasked with maintaining an ETL that collects a variety of online advertising data, around 20-30 MB a day, and appends it to tables in MySQL. Outside contractors built the ETL with Pentaho Spoon (kitchen, kettle?). The ETL consists of about 250 jobs and transformations (.ktr, .kjb), each with about 5 to 25 steps. Something goes wrong in this large process very often. I've found that writing R scripts to do the transform and load is much more efficient. In fact, I think the ETL could be reduced to well under 1000 lines of code besides calls with RMySQL (i.e. plyr!). Perhaps Python would be used to extract the data from the web.
My use of R has led to some resistance. The computer programmers that designed the ETL don't know R so couldn't be called if I leave, and moreover a lot of time was invested in the Spoon ETL. Also, a layman can more easily follow the steps visually in Spoon, than in the R scripts. For my part, I think we are getting bogged down by the ETL. However, I don't have a large say in the matter as I don't have a background in computer science.
Please comment if you have any insights on the following. Please know I have been researching this for months and have read many opinions, but nothing as concise or reliable as SO usually provides:
R has been called not as scalable by some at the company. I think the opposite mostly because of the logging capabilities. Spoon has limited pure logging output, whereas all R scripts can be sinked into a daily log. Fixing and avoiding mistakes in the .ktrs is very tedious, but easy with setting flags and/or searching through the R log. Any thoughts on this?
This leads to a big picture question. What is the point of ETLs like Pentaho? This post Do I need a ETL?, leads me to believe that if you use R or other so-called OOL, there is no reason to have a tool like Pentaho. Can someone please confirm this if so? I really need a second opinion here. If this is so who uses tools like Pentaho? Is it simply people without the programming background, or someone else? I do see a fair amount of Pentaho questions on SO.
It is true that a lot more people use R than Pentaho, right? This http://www.kdnuggets.com/2012/05/top-analytics-data-mining-big-data-software.html makes it look so. To be honest, I was surprised that Pentaho was 5th, which makes me doubly wonder who uses Pentaho and whether my doubts about its use in my work setting are misplaced.
Thanks for any responses. I don't mean any condescension towards Spoon or Spoon users; I am just really confused and in need of outside opinions.
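To make the logging point concrete, here is a minimal sketch of a scripted transform step that writes everything to one log file per day. It is in Python rather than R, and the function names, the row format (a list of dicts), and the "rejected row" rule are all assumptions for illustration; nothing here is taken from the actual ETL.

```python
import datetime
import logging
from pathlib import Path


def make_daily_logger(log_dir, name="etl"):
    """Create a logger that appends to one file per calendar day.

    Hypothetical helper: the directory layout and file naming are assumptions.
    """
    Path(log_dir).mkdir(parents=True, exist_ok=True)
    log_path = Path(log_dir) / f"{name}-{datetime.date.today():%Y-%m-%d}.log"
    logger = logging.getLogger(f"{name}.{log_path}")
    logger.setLevel(logging.INFO)
    if not logger.handlers:  # avoid duplicate handlers on repeated calls
        handler = logging.FileHandler(log_path, encoding="utf-8")
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(message)s")
        )
        logger.addHandler(handler)
    return logger, log_path


def drop_incomplete_rows(rows, required, logger):
    """Keep rows that have every required field; log each rejected row."""
    kept = []
    for i, row in enumerate(rows):
        missing = [key for key in required if row.get(key) in (None, "")]
        if missing:
            logger.warning("row %d rejected, missing %s", i, missing)
        else:
            kept.append(row)
    logger.info("kept %d of %d rows", len(kept), len(rows))
    return kept
```

Searching the day's log file for "rejected" then shows which input rows were dropped and why, which is the kind of flag-and-search workflow described above.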
R as an ETL tool? That's a new one, but whatever floats your boat.
I would say this though, if you can get 250 jobs and transformations down to under 1000 lines of R I would say your ETL is poorly written.
Along with this you have to think about supportability and scalability. Both of which I would imagine would be far easier with a graphical tool like Spoon rather than R code.
Personally I think you are misguided and the question you ask is poorly written, but that's a different argument.
Regarding your points, PDI's logging is very good and you can log pretty much however you like, all into one large database table if you like a consolidated log.
ETLs won't be going away, even with the advent of the love of unstructured data storage pools like HDFS. Also think about data analysis done outside R: if you want reporting or OLAP over the top of your data, it will still need transforming regardless.
Is it true that more people use R than Pentaho? What sort of question is that? By Pentaho I assume you mean PDI? How can that ever be compared? A data analysis tool vs an ETL tool, and you want to count users? Eh? If on the other hand you mean R vs Pentaho as a whole, then I would guess no. You are looking at a report on R vs Weka and making it fit your ETL argument. That doesn't wash in a month of Sundays.
==EDIT==
Okay, so you have around 1000 lines of R & Python code currently. As your boss's requirements expand, this slowly grows over time, and because you are trying to hit deadlines, the new code isn't written as cleanly or documented as well as the code you currently have in place. So over time this grows to, say, 5000 lines plus a few Python scripts. Then one day you get hit by a bus, and some new person has to come in and manage your code... where do they start, how do they make changes?
Virtually anyone with a modicum of data experience could make a change to a PDI ETL should they be required to, whereas it would take someone with in-depth R knowledge to make changes to what you have done.
ETL tools are designed to be quick and easy to use, they also offer far more than R can provide in terms of data connectivity to different systems (non db or file based, for example), although I guess this is why people resort to python etc.
That said there is room for both, there is an R plugin for PDI kicking around in the community I've seen demonstrated.
On top of that I've seen enough TSQL to ETL migrations over the years to know from experience, that even though maintaining your ETL in code may seem practical in the short term, in the long term it just brings more pain.
On the other hand if you can code 250 PDI transformations down to 1000 lines of R, your ETL is likely bloated through bad design by your predecessor.
If you'd like me to give an opinion on your existing PDI ETL structure, that can also be arranged.
Tom

A prototyping language with the ability to be fast [closed]

Closed 11 years ago.
as an engineering student with a strong mathematical background, i'm dealing with problems like these at university:
(numerical) Simulations
AI Problems
Robotics
Control Systems
and some more
as you can see some are just numerical ones, others have to process some kinds of symbols.
currently i'm working with java, but i'm not very pleased with it (can't say exactly why, probably a personal taste) and now i'm searching for a programming language in which i can easily prototype new algorithms, like for example in python, without caring about low level stuff, but which has the ability to speed things up if necessary, e.g. with concurrent/parallel programming, etc. (writing it in python and rewriting it in C/C++ isn't really an option i prefer...)
to sum it up:
easy to prototype, but
the ability to speed algorithms up
syntax without boilerplate stuff like in java
syntax which is easy to read (i know this could be achieved with most languages, but some languages encourage you more...)
i've looked around at sites like http://rosettacode.org/ and picked 2 or 3 favorites: Go, Lisp (and maybe Haskell), but other recommendations are welcome
Common Lisp using SBCL is pretty fast if you take the time to make it fast.
Why does this fit what you want?
symbolic computations
good number handling
compiles to native on demand by default.
I would use python together with cython: http://www.cython.org for speeding up your code. For symbolic computations you have http://code.google.com/p/sympy/
Try Clojure; it fulfills most of your requirements.
Uses Java libraries, compiles to Java bytecode, and has plugins for Java IDEs, so some of your existing knowledge about Java and its ecosystem will come in handy.
Very concise, readable, and ease of prototyping is extremely high.
Great support for different concurrency strategies.
Performance is getting better fast; typical stuff is within a speed factor of 2 of Java, and slow things can typically be made fast with minimally confounding changes (e.g. a few type hints here and there to use Java primitives.)
An alternative to Common Lisp would be an implementation of Scheme. My favorite so far is Racket.
http://racket-lang.org/
When I first got into Lisp I started with scheme and ended up being able to learn it within a matter of days. Also Lisp-wise Racket is a pretty complete language and has a decent IDE in DrRacket.
How about F#?
F# is a remarkable language for prototyping for the following reasons:
F# has an interactive mode allowing you to evaluate blocks of code directly, without compiling your entire project.
Type inference helps keep code small, and makes refactoring your type hierarchy relatively painless. This may not be so important in production code, but I found that to be very valuable during prototyping.
F# integration with .NET makes it easy to prototype extensions of your existing products. In the all-too-common case when a prototype becomes a product (due to time constraints), it's also easy to integrate your F# code within your .NET product.
If prototyping makes up a significant part of your overall development process, then F# can really help you speed up your coding.
I don't think F# will produce code that is significantly faster than other .NET languages. The functional style of programming, in particular purity (no side-effects), can be applied to other programming languages, meaning it is just as easy to write concurrent programs in other languages as well. It does however "feel more natural" to do so in F#.
F# has the Option type, which can be used in place of null values. Code reliability with respect to null-pointer exceptions can be guaranteed at compile time, which is a huge benefit.
Finally, be advised that F# is still in development, and suffers issues, some of which may disappear over time, but not all. See for instance what devhawk and Oliver Sturm have to say about it (in particular about linear scoping and interdependent classes, other issues like overloading, better Visual Studio integration have already been addressed).
this is stated in article: https://stackoverflow.com/questions/328329/why-should-i-use-f
by JOH

What is a good language to develop in for simple, yet customizable math programs?

I'm writing to ask for some guidance on choosing a language and course of action in learning programming. I apologize if this type of question is inappropriate for Cross Validated, please advise me to another forum if that is the case.
I've seen thread after thread with questions from newbies, asking, "What is the best language to start with?" and then it always starts a flame war or someone just answers, "There's no best language, it's best to pick one and start learning it." My question is a little bit more focused than that.
First off, I've been programming my whole life, in very limited capacities. My deepest training was in C++. Whilst in my EECS degree program, I resolved to never be a software developer because I couldn't stand not interacting with people for such long periods of time. Instead I realized I wanted to be a math teacher, and so that is the path I have taken.
But now that I'm well down that path, I've started to realize that perhaps I could develop my own software to help me in the classroom. If I want to demonstrate the Euclidean algorithm, what better way than to have a piece of software that breaks down the process? Students could run that software as part of their studies, and the advanced students might even develop programs for themselves. Or, with an iPad in hand, why not have an app that lets students take their own attendance? It would certainly streamline some of the needs of classroom management.
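For a sense of how small such a classroom demonstration can be, here is a minimal Python sketch that records every division step of the Euclidean algorithm instead of only returning the result. The function name and the step format are assumptions chosen for the example.

```python
def euclid_steps(a, b):
    """Return gcd(a, b) and the division steps a = q*b + r that lead to it.

    Hypothetical teaching helper; each step is a tuple (a, q, b, r).
    """
    if a < 0 or b < 0:
        raise ValueError("expected non-negative integers")
    steps = []
    while b != 0:
        q, r = divmod(a, b)
        steps.append((a, q, b, r))
        a, b = b, r  # the next step divides the old divisor by the remainder
    return a, steps


if __name__ == "__main__":
    gcd, steps = euclid_steps(1071, 462)
    for a, q, b, r in steps:
        print(f"{a} = {q} * {b} + {r}")
    print("gcd =", gcd)
```

Printing the steps line by line mirrors what a student would write on paper, so the program's output can be checked directly against a hand calculation.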
There's obviously a lot of great stuff already out there for math, and for education, but I want a way to more directly create things specific to my lectures. If I'm teaching a specific way of calculating a percent, I want to create an app that aligns with my teaching style, not just another calculator app that requires the student to learn twice.
The most I use in class right now is iWork Numbers/Microsoft Excel for my stats class. Students can learn the basic statistical functions, and turn some of their data into graphs.
I have dabbled a bit with R, and used Maple in college. I've started the basic tutorials for OS X/iOS development and have actually made good progress making an OS X app that takes a text string, converts it to numbers, and performs encryption using modular addition and multiplication. I sometimes use Wolfram|Alpha to save myself some time in getting quick solutions to equations or base conversions. I know of MATLAB and Mathematica, and recently people have been telling me to check into Python or Ruby. I also know basic HTML and, while it's forgotten now, learned JavaScript and Perl in college.
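One common scheme that fits "convert text to numbers, then encrypt with modular multiplication and addition" is the affine cipher. Whether the app described above does exactly this is an assumption; the Python sketch below only shows how little code the idea needs in a scripting language. The function names and the choice of a 26-letter alphabet are made up for the example.

```python
import math
import string

ALPHABET = string.ascii_uppercase
M = len(ALPHABET)  # modulus: 26 letters


def _check_key(a):
    # The multiplier must be invertible modulo 26, or decryption is impossible.
    if math.gcd(a, M) != 1:
        raise ValueError(f"a={a} shares a factor with {M}")


def affine_encrypt(text, a, b):
    """Map each letter x to (a*x + b) mod 26; other characters pass through."""
    _check_key(a)
    out = []
    for ch in text.upper():
        if ch in ALPHABET:
            x = ALPHABET.index(ch)
            out.append(ALPHABET[(a * x + b) % M])
        else:
            out.append(ch)
    return "".join(out)


def affine_decrypt(text, a, b):
    """Invert affine_encrypt using the modular inverse of a."""
    _check_key(a)
    a_inv = pow(a, -1, M)  # modular inverse, available in Python 3.8+
    out = []
    for ch in text.upper():
        if ch in ALPHABET:
            y = ALPHABET.index(ch)
            out.append(ALPHABET[(a_inv * (y - b)) % M])
        else:
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    secret = affine_encrypt("HELLO", 5, 8)
    print(secret, affine_decrypt(secret, 5, 8))
```

The key check is itself a small number theory lesson: a multiplier such as 13 fails because it shares a factor with 26, which ties the cipher back to the gcd.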
If I keep on the path of Obj-C/Cocoa, I think it will have great benefits. Unfortunately, anything I produced for Mac would only be usable on a Mac, so it wouldn't be universal for all of my students. Perhaps then learning a web language would be better. Second, I'm wondering if the primary use is mathematical, then perhaps my time would be better spent learning Mathematica Programming Language, or R, or something based less on GUI and more on simple coding of algorithms, maybe Python or Ruby?
It seems that Mathematica already has a lot of demos for different math concepts, so why reinvent the wheel is also a question I have. I think overall, it would be good to have more control and design things the way I need. And then, if I do want to make an "Attendance" app or something else, I would already have the programming experience to more easily design something for my iPad or MacBook.
The related question to this is what is a good language to teach to my students? In his TED talk, Conrad Wolfram says one of the best ways to check the understanding of a student is have them write a program. But if Mathematica does the math virtually automatically for them, then I'm not sure that will get the deeper experience of working out logic for themselves, like you do when you're writing C, or a traditional procedural language.
I know that programming takes time to learn, but I also know that at this point, my goal is not to be able to make an app like "Tiny Wings." With the app store ease, some of my work may be an extra revenue stream, but I see myself as more of a hobbyist, and now teacher looking to software development specifically for its ability to help me demonstrate mathematical concepts.
I think I will push ahead with Obj-C/Cocoa for OSX/iOS, but if anyone has some better guidance regarding all of the other available stuff, it would be much appreciated. I don't think I would want to go fully to the web (I like apps), but perhaps someone could suggest a nice way of bridging what I produce in XCode to a universal web version. For example, if you come up with an algorithm in obj-c is it easiest to transition that to ruby and run it online, or is there another approach that works better?
Mathematica is pretty awesome for the first part of your question. I've used the interactive mode (Manipulate[]) for explaining things to my colleagues (and myself). It makes really nice dynamic figures and is fairly expressive (although your code can end up looking like line noise). It is very powerful, but it does far less for you than you might think. It's pretty intuitive, which is a good thing for teaching.
You could use Scala if you want an "easy" way to make a domain specific language for teaching. Python seems to confuse people as a first programming language. Objective C seems like a completely random choice to me.
Mathematica then. It's worth the price. But anything that is interpreted and has an interactive shell is probably better than a compiled language. BBC BASIC?
Nothing beats Haskell for general-purpose mathematical programming. The wiki's quite extensive and the IRC channel (#haskell on Freenode) is great for asking questions. If you statically link your binaries on compilation, you should be able to run your programs on just about any system (with a few exceptions, e.g., libgmp).
Haskell code reads (roughly) like mathematical notation once you get the hang of it, so it can really help to tie things together for your students who are motivated to write their own programs. The purely functional style can be beneficial, as well, since it focuses less on I/O and the marshalling of data (perfectly useful in applications, perhaps less so in pure math), and more on the actual creation and refinement of functions and algorithms. You can even compose functions just as you would on paper.
If you want to get really serious, you could also look into Coq or Agda, but those might be a bit much for most classes.
For a Haskell program idea for an educator, check out this link.
A nice list of arguments can also be found at:
Eleven Reasons to use Haskell as a Mathematician and the book The Haskell Road to Logic, Maths and Programming

How does software development compare with statistical programming/analysis? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 9 years ago.
Statistical analysis/programming is writing code. Whether the analysis is descriptive or inferential, you write code to import data, clean it, analyse it, and compile a report.
Analyzing the data can involve many twists and turns of statistical procedures, and angles from which you look at your data. At the end, you have many files, with many lines of code, performing tasks on your data. Some of it is reusable, and you encapsulate it as a "good to have" function.
This process of "statistical analysis" feels to me like "programming", but I am not sure it feels the same to everyone.
From the Wikipedia article on Software development:
The term software development is often used to refer to the activity of computer programming, which is the process of writing and maintaining the source code, whereas the broader sense of the term includes all that is involved between the conception of the desired software through to the final manifestation of the software. Therefore, software development may include research, new development, modification, reuse, re-engineering, maintenance, or any other activities that result in software products. For larger software systems, usually developed by a team of people, some form of process is typically followed to guide the stages of production of the software.
According to this simplistic definition (and my humble opinion), this sounds very much like building a statistical analysis. But I imagine it is not that simple.
Which leads me to my question: what differences can you outline between the two activities?
It can be in terms of the technical aspects, the different strategies or work styles, and whatever else you think is relevant.
This question came to me from the following threads:
How do you combine "Revision Control" with "Workflow" for R?
How to organize large R programs?
Workflow for statistical analysis and report writing
As I said in my response to your other question, what you're describing is programming. So the short answer is: there is no difference. The slightly longer answer is that statistical and scientific computing should require even more controls around development than other programming.
A certain percentage of statistical analysis can be done using Excel, or in a point-and-click approach using SPSS, SAS, Matlab, or S-Plus (for instance). A more sophisticated analysis done using one of those programs (or R) that involves programming is clearly a form of software development. And this kind of statistical computing can benefit immensely from following all the best practices from software development: source control, documentation, a project plan, scope document, bug tracking/change control, etc.
Moreover, there are different kinds of statistical analyses that can follow different approaches, as with any programming project:
Exploratory data analysis should follow an iterative methodology, like the Agile methodology. In this case, when you don't know explicitly the steps involved up front, it's critical to use a development methodology that is adaptive and self-reflective.
A more routine kind of analysis (e.g. a government annual survey such as the Census) could follow a more traditional methodology such as the waterfall approach, since it would be following a very clear set of steps that are mostly known in advance.
I would suggest that any statistician would benefit from reading a book like "Code Complete" (look at the other top books in this post): the more organized you are with your analysis, the greater the likelihood of success.
Statistical analysis in some sense requires even more good practices around version control and documentation than other programming. If your program is just serving some business need, then the algorithm or software used is really of secondary importance so long as the program functions the way the specifications require. On the other hand, with scientific and statistical computing, accuracy and reproducibility are paramount. This is one of John Chambers' (the creator of the S language) major emphases in "Software for Data Analysis". That is another reason to add literate programming (e.g. with Sweave) as an important tool in the statistician's toolkit.
Perhaps the common denominator is "problem solving."
Beyond that, i doubt i could provide any insight, but i can at least provide a limited answer from personal experience.
This issue arises for us in hiring--i.e., do we hire a programmer and teach them statistics or do we hire a statistics person and teach them to program? Ideally we could find someone fluent in both disciplines, and indeed, that's the third net we cast, but rarely with any success.
Here's an example. The most stable distinction between the two activities (software dev & statistical analysis) is probably their respective outputs, or project deliverables. For instance, in my group someone is conducting the statistical analysis on the results of our split-path and factorial experiments (e.g., from the t-test results, whether the difference is significant, or whether the test ought to continue). That analysis will be sent to the marketing department which they'll use to modify the web pages comprising the Site with a view towards improving conversion. A second task involves the abstraction of and partial automation of those analyses so the results can be processed in near-real time.
For the first task, we'll assign a statistician; for the second, a programmer. The business problem we are trying to solve is the same for both tasks, yet for the first, the crux is statistics, for the second, the statistics problems have been largely solved and the crux is a core programming task (I/O).
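As a small illustration of what "abstracting and partially automating" such an analysis might look like, here is a Python sketch that computes Welch's t statistic and its approximate degrees of freedom for two samples, using only the standard library. The choice of Welch's variant and the function name are assumptions; the answer only says a t-test is involved. Turning t into a p-value needs the t distribution, which the standard library does not provide, so that step is left out.

```python
import math
import statistics


def welch_t(sample_a, sample_b):
    """Return (t, df) for Welch's unequal-variance t-test on two samples.

    Hypothetical helper for illustration; it does not compute a p-value.
    """
    na, nb = len(sample_a), len(sample_b)
    if na < 2 or nb < 2:
        raise ValueError("each sample needs at least two observations")
    # Sample variances (n - 1 in the denominator).
    va = statistics.variance(sample_a)
    vb = statistics.variance(sample_b)
    se2 = va / na + vb / nb  # squared standard error of the mean difference
    if se2 == 0:
        raise ValueError("both samples have zero variance")
    t = (statistics.fmean(sample_a) - statistics.fmean(sample_b)) / math.sqrt(se2)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = se2 ** 2 / ((va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))
    return t, df


if __name__ == "__main__":
    control = [1, 2, 3, 4, 5]
    variant = [2, 4, 6, 8, 10]
    print(welch_t(control, variant))
```

This matches the split described above: the formula is settled statistics, while the work of feeding it experiment data in near-real time is mostly plumbing.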
Notice also how the tools associated with the two activities have evolved so that the distinction between the two (software dev & data analysis) is further blurred: mainstream development languages are being adapted for use as domain-specific analytical tools, while at the same time frameworks continue to be developed that enable non-developers to quickly build lightweight, task-oriented applications in DSLs.
For instance, python, a general purpose development language has R bindings (RPy2) which along with its native interactive interpreter (IDLE), substantially facilitates Python's use in statistical analysis, while at the same time, there is a clear trend in R package development toward (web) application development: R Bindings for Qt, gWidgetsWWW, and RApache--are all R Packages directed to Client or Web App development, and whose initial release was (i think) w/in the past 18 months. Aside from that, since at least the last quarter of last year, i've noticed an accelerating frequency of blog posts, presentations, etc. on the subject of Web app development in R.
Finally, i wonder if your question is perhaps evidence of the growing popularity of R. Here's what i mean. A decade ago, when my employer purchased a site license, i began learning and using one of the major statistical computing products (no point here in saying which one, it begins with "S"). i found it unnatural and inflexible. Unlike Perl (which i was using at the time) this tool was not an extension of my brain (which isn't an optional attribute of an analytical tool, to me it's more or less the definition of one). Interacting with this System was more like using a vending machine--i selected some statistical function i wanted and then waited for the "output", which was often an impressive set of high-impact, full-color charts and tables. Nearly always though what i wanted was to modify my input or use that output for the next analytical step. That seemed to require another, separate trip to the vending machine. The fact that this tool was context-aware--i.e., it knew statistics--while Perl didn't, didn't compensate for the awkward interaction. Statistical analysis done this way would never be confused with software development. (Again, this is just a summary of my own experience, i don't claim it can be abstracted. It's also not a polemic against any (or all) commercial data analysis platforms--millions use them and they've earned zillions for the people who created them, so let's assume it was my own limitations that caused the failure to bond.)
I had never heard of R until about 18 months ago, and i only discovered it while scanning PyPI (The Web Interface to Python's external package repository) for statistics libraries for python. There i came across RPy, which seemed brilliant but required a dependency called "R" (RPy of course is really just a set of Python bindings to R).
Perhaps R appeals to programmers and non-programmers equally; still, for a programmer/analyst, this was a godsend. It hit everything on my wish list for a data analysis platform: an engine based on a full-featured, general programming language (which in this case is a proven Scheme descendant), an underlying functional paradigm, built-in interactive interpreter, native data types built from the ground up for data analysis, and the domain knowledge baked in. Data analysis became more like coding. Life was good.
If you are using R, then you'll likely be writing code to solve your statistical questions, so in this sense, statistical analysis is a subset of programming.
On the other hand, there are plenty of SPSS users who have never ventured beyond a bit of pointing and clicking to solve their stats problems. This feels less like programming to me.

What best practices do you use for programming in R? [closed]

Closed 10 years ago.
What are some good practices for programming in R?
Since R is a special-purpose language that I don't use all the time, I typically just hack together some quick scripts that do what I need.
But what are some tips for writing clean and efficient R code?
You already provide some hints by stating your approach is 'hack quick scripts'. If you want best practices and structure, simply follow the established best practices from CRAN:
create a package, this opens the door to running R CMD check which is very useful
as many people have stated, having a package helps you in the code writing stage too as you are somewhat forced to document the code; that is a Good Thing (TM)
once you have a package, add code in the \examples{} section of the documentation, as this will be run during R CMD check and provides an easy entry to regression testing
once you get used to regression testing, start to use a package such as RUnit; that really is best practices
JD's pointer to the Google Style Guide is a good one too. That isn't the only style guide as e.g. Henrik's R Coding Convention precedes it by a few years; and there is also Hadley's riff on Google's style guide
Otherwise, the oldie-but-goldie 'do what your colleagues and coauthors do' also applies
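The "examples that run as tests" and "unit test package" ideas above are not specific to R. As a loose analogy only, here is a Python sketch in which documentation examples double as tests (doctest) and a separate test case pins down known results (unittest), playing roles similar to \examples{} plus R CMD check and RUnit. The function and the test class are made up for the illustration.

```python
import unittest


def percent_change(old, new):
    """Return the percentage change from old to new.

    The examples below are executed when the file is run through doctest,
    so the documentation cannot silently drift away from the code.

    >>> percent_change(50, 75)
    50.0
    >>> percent_change(80, 60)
    -25.0
    """
    if old == 0:
        raise ValueError("old must be non-zero")
    return (new - old) / old * 100.0


class PercentChangeRegressionTest(unittest.TestCase):
    """Pins down known results so later edits cannot change them unnoticed."""

    def test_known_values(self):
        self.assertAlmostEqual(percent_change(200, 210), 5.0)

    def test_zero_baseline_is_rejected(self):
        with self.assertRaises(ValueError):
            percent_change(0, 10)
```

Saved as a file, this can be checked with `python -m doctest file.py` for the documentation examples and `python -m unittest file.py` for the test case.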
I recommend Josh Reich's Load, Clean, Func, Do workflow from this previous question.
In addition I recommend following coding guidelines such as Google's R Style Guide. Using a coding style guide makes reading the code later so much easier.
I completely agree with the existing answers, especially regarding the usage of packages. Packages require a lot of discipline, documentation, and structure, which really help to enforce best practices (along with R CMD check). You can also use the codetools package to help with this. Use the roxygen package for documentation.
Beyond that, I recommend that you not only vectorize your code, but more particularly, make every effort to vectorize your functions, meaning that you should be able to provide vector arguments and get vectors returned (even from things like database calls). That will really improve your code efficiency and clarity in the long run.
Lastly, I really like to use something like Sweave to organize my code into clear literate reproducible research whenever writing a report. Along with this I recommend using the cache package.
For efficiency, prefer vector operations over for loops.
This is good programming practice in general, but use a version control system such as SVN to manage your code.
