Is there any code self-documenting system for R?
I think writing documentation is a very important part of any statistical analysis. There are always important details in your code, or in the steps of data cleaning, that are not reflected in the final analysis report. Is there an efficient self-documenting system (or approach) in R that can help me document my code, including my comments on the code, together with structure files describing the datasets (or tables) it uses?
Beyond using Sweave or knitr in R, is there any other way of doing that?
I'd suggest bundling up your code and data sets in an R package. There's a steep learning curve the first time you do it, but if you're at the point where you're asking, "how can I better manage this code documentation thing", you're likely ready to take the plunge.
Plus, doesn't the idea of typing ?myOwnFunction or ?myOwnDataset, and having the appropriate help file pop up (just as it does when you do ?mean or ?iris) sound appealing?
You might try writing your code as a Sweave or knitr file, which contains LaTeX text along with R code. Compiling such a file executes your code and produces a PDF containing your text, your code, and the code's output.
If you choose to organise your analysis as a package, you can use roxygen2 to document both your code and your data.
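A minimal sketch of what roxygen2 comments look like, reusing the hypothetical myOwnFunction name from the earlier answer:

#' Trimmed mean of a numeric vector
#'
#' @param x A numeric vector.
#' @param trim Fraction of observations to drop from each end.
#' @return A single numeric value.
#' @export
myOwnFunction <- function(x, trim = 0.1) {
  mean(x, trim = trim, na.rm = TRUE)  # na.rm added so NAs don't break it
}

Running devtools::document() (or roxygen2::roxygenise()) on the package then turns these comments into the .Rd help file behind ?myOwnFunction.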
Related
I am in the process of using R to automate a number of graphs produced where I work, which currently live in Excel.
Note that for now, I am not able to convince my colleagues that doing the graphs directly in R is the best solution, so the answer cannot be "use ggplot2", although I will push for it.
So in the meantime, my path is to download, update and tidy data in R, then export it to an existing Excel file where the graph is already constructed.
The way I have been trying to do that is through openxlsx, which seems to be the most frequent recommendation (for instance here).
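For reference, the basic round trip I am attempting looks roughly like this (the file name, sheet name, and updated_df are placeholders):

library(openxlsx)
wb <- loadWorkbook("report.xlsx")              # existing file containing the graph
writeData(wb, sheet = "data", x = updated_df)  # refresh the graph's source range
saveWorkbook(wb, "report.xlsx", overwrite = TRUE)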
However, I am encountering an issue that I cannot solve this way (I asked a question about it there, but it did not inspire many answers!).
Therefore, I am going to try other approaches, but I keep being directed to the aforementioned solution. What are the existing alternatives?
I am very sorry if this is a stupid question, but I was wondering: say you have an HTML file created from R Markdown that contains some graph; more specifically, the HTML document is a flexdashboard. Is it possible to recover the original data frame used to create the graph from this HTML file alone?
I don't know if I have managed to explain it, but I am worried about the security of the original data frame.
I suppose the real question is: how does R Markdown knit the R chunks? Is only the output of the code incorporated into the HTML document, or is everything? I always assumed that, because knitr creates a temporary .md file containing the code and its output, the final document also contains only the code and the output, not the original data frame used to make the graphs, statistics, tables, etc. But I may be completely wrong, and I would like to know the truth.
If anybody has the answer (and maybe some tips on how to secure the flexdashboards), I would be very thankful. I hope I managed to explain my problem in a comprehensible way.
Is there any way in R to write a macro like one would in SAS? That is, I want to write a macro with some input variable (corresponding to a row in a dataset) so I can quickly make a plot of certain characteristics from said row. Any information regarding a package/method to do so would be greatly appreciated.
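For illustration, I imagine the R analogue of such a macro might be an ordinary function, something like this (the column names here are made up):

# Hypothetical: plot selected characteristics of one row of a data frame
plot_row <- function(df, i, cols = c("height", "weight", "age")) {
  vals <- unlist(df[i, cols])  # one row -> named numeric vector
  barplot(vals, main = paste("Characteristics of row", i))
}
# plot_row(mydata, 3) would then plot the third row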
R will generate some very, very, very basic code for you. If you have RStudio installed, you can click File > Import Dataset > From ..., point to your file, and click 'Open'. RStudio will automatically create the code to do the import. Again, this is very basic. You really need to know how to code to do anything useful.
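For example, importing a CSV that way generates something along these lines (the path is hypothetical, and newer RStudio versions may use readr::read_csv() instead):

dataset <- read.csv("~/data/mydata.csv")  # generated import call
View(dataset)                             # opens the data viewer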
You get out of it what you put into it, so spend some time learning this stuff, and inevitably you'll learn a ton. I've found that it's very helpful to read through people's questions that are posted here, and try to solve the problem yourself. You'll learn a lot that way and you'll see what the current trends are. Reading books is great, of course, but sometimes I feel like some authors are too academic, and in the real world, sometimes it's done differently than what you see in textbooks.
The exams package is a really fantastic tool for generating exams from R.
I am interested in the possibility of using it for (programming) assignments. The main difference from an exam is that, besides solutions, I'd also like hints to be included in the PDF/HTML output file.
Typically I put the hints for the (sub)questions in a separate section at the end of the PDF assignment (using a separate LaTeX section), but this requires manual labour. The hints are there for students to consult if they need help getting started on a particular exercise, without them having to look at the solutions directly.
An assignment might look like:
Question 1
Question 2 ...
Question 10
Hints to all questions
I'd be open to changing the exact format, as long as it is possible to look up hints without looking up the answer, and it remains optional to read the hints.
So in fact I am looking for an intermediate "hints" section between the "question" and "solution" sections, which is present for some questions but not for all.
My questions: Is this already possible? If not, how could this be implemented using the exams package?
R/exams does not have dedicated/native support for this kind of assignment, so it isn't available out of the box. If you want this kind of processing, you have to arrange it yourself, using LaTeX for PDF output or CSS for HTML output.
In LaTeX I think it should be possible to do what you want using the newfloat and endfloat packages in the LaTeX template that you pass to exams2pdf(). Any LaTeX template needs to provide {question} and {solution} environments, e.g., the plain.tex template shipped with the package has
\newenvironment{question}{\item \textbf{Problem}\newline}{}
\newenvironment{solution}{\textbf{Solution}\newline}{}
with the exercises embedded as
\begin{enumerate}
%% \exinput{exercises}
\end{enumerate}
Now instead of the \newenvironment{solution}... you could use
\usepackage{newfloat,endfloat}
\DeclareFloatingEnvironment{hint}
\DeclareDelayedFloat{hint}{Hint}
\DeclareFloatingEnvironment{solution}
\DeclareDelayedFloat{solution}{Solution}
This defines two new floating environments, {hint} and {solution}, which are then declared as delayed floats. You would then need to customize these environments regarding the text displayed within the questions and in the listing at the end. I'm not sure whether this gets you exactly what you want, but hopefully it is a useful place to start from.
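Hypothetically, if the modified template were saved as assignment.tex, you would pass it to exams2pdf() like this (the exercise file name is a placeholder):

library("exams")
exams2pdf("exercise1.Rnw", template = "assignment.tex")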
The discussion in that question is the direct cause of me asking this one. The more general reason is that I often have to explain the use of R to people who are only familiar with SPSS. I know most of the basics of SPSS, as we still use it in the introductory statistics course, but as I'm more of an R guy, it's difficult to know how SPSS users experience their first meeting with R.
I know there is the book R for SAS and SPSS Users, which already contains some information. Yet I would like to know what the more difficult parts are when you switch from SPSS to R.
Or in other words: if you had to explain R in one day to SPSS users, which topics would you focus on? This is not a hypothetical question, by the way (yeah, I know, just because one gets paid for it doesn't mean it always makes sense...).
Firstly, data manipulation has been the most challenging thing to learn coming from SPSS/SAS to R. I've found, personally, that getting the data into the right shape for an analysis is usually much more difficult than the analysis itself. Secondly, a true understanding of how to deal with categorical values through the use of factors. Lastly, summary statistics and descriptives can sometimes be challenging to get into a format that can be carried over to PPT or Excel, which are what (my) clients generally expect/demand for reporting.
I would focus on:
1 Data manipulation
Understanding data structures. Import/export. Then in-depth training on the use of packages like plyr and reshape, with a particular focus on how to effectively use cast() with formulas and melt() with ids, and on how to apply numerical functions within a data.frame using ddply().
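For instance, a minimal sketch of that workflow, using the built-in airquality data:

library(reshape)  # provides melt() and cast()
library(plyr)     # provides ddply()

aql <- melt(airquality, id = c("Month", "Day"))  # wide -> long
cast(aql, Month ~ variable, mean, na.rm = TRUE)  # long -> wide summary table
ddply(airquality, .(Month), summarise,
      mean_temp = mean(Temp))                    # per-group statistic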
2 Factoring Data
In general, an explanation of how to handle recoding, with epicalc or a user-defined function. Also an explanation of the significance of factors, levels, and labels.
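For instance, recoding by hand comes down to a factor() call (the codes and labels here are made up):

grp <- c(1, 2, 2, 3, 1)                        # hypothetical numeric codes
f <- factor(grp, levels = c(1, 2, 3),
            labels = c("low", "mid", "high"))   # attach meaningful labels
levels(f)  # "low" "mid" "high"
table(f)   # counts per level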
3 Descriptives
Take a few minutes to introduce xtabs(), table(), and prop.table(), and show how cast() from reshape can create columnar tables of data that are more easily exported to Excel.
Graphics are optional: if you've done a good job with the above, they should be able to get the data they need to create graphs in whatever software they are most comfortable with.
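A quick illustration of those descriptives with the built-in mtcars data:

tab <- xtabs(~ cyl + gear, data = mtcars)  # cross-tabulation via formula
tab
prop.table(tab, margin = 1)                # row proportions
library(reshape)
cast(melt(mtcars, id.vars = "cyl", measure.vars = "mpg"),
     cyl ~ variable, mean)                 # columnar summary, easy to export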
4 Graphics
If you've done a good job teaching data manipulation, getting data into the shape needed for graphing should be pretty straightforward (or at least reproducible) at this point. ggplot2 is complicated and deserves a full day of its own, but it is possible to give a quick overview of it. Alternatively, base graphics are simple to understand, and their help pages are much clearer about what things do and how the syntax works.
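For example, the same scatterplot in both systems (again with the built-in mtcars data):

plot(mpg ~ wt, data = mtcars)  # base graphics one-liner

library(ggplot2)
ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point()                 # the ggplot2 equivalent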
Note: I left out statistical analysis. However, an overview of lm() and perhaps anova() or cor() would be helpful as a starting point, and it should be explained at the same time as the data manipulation.
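A sketch of those starting points, using mtcars once more:

fit <- lm(mpg ~ wt + cyl, data = mtcars)  # linear model
summary(fit)                              # coefficients, fit statistics
anova(fit)                                # analysis-of-variance table
cor(mtcars$mpg, mtcars$wt)                # simple correlation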
Although I "wrote the book" on R to SPSS migration, that was aimed at programmers and most SPSS users that I know prefer to "point-and-click" instead. A graphical user interface like Deducer (or R Commander) can help them feel at home while teaching them how R programming code works if they want to see it. Deducer's Plot Builder also does a nice job letting you create complex plots easily, and if you want to learn to ggplot2 code, it will show you that as well. Ian did a great job with it!
However, while the SPSS graphical user interface covers 98% of what SPSS can do, Deducer covers perhaps 1% of what R can do. That's probably still 75% of what your average researcher needs, but R is so broad that to get the most out of it people will need to learn to program. The free version of my book, "R for SAS and SPSS Users", is only 80 pages and covers the areas of programming that I think are most likely to confuse beginners. It's at http://r4stats.com.
Just recently I had a student who was somewhat versed in statistics and had done some analysis beforehand in SPSS. I then showed him how to do the exact same thing in R. We went through the code and the plotting, explaining and debating each line. He realized how easy and convenient it is to do in R. Thus, the R community grew by one. :)
The biggest issue that the researchers I've dealt with have is the lack of point-and-click GUI. While there are a number of efforts out there in the R community, none of them have reached the ease-of-use/power level that SPSS has.
Since coding is second nature to R users, sometimes we forget that the majority of users of statistical software can't program (and would avoid it like the plague), even though they may have a strong practical understanding of statistics.
If I had one day to bring an SPSS user into R, I'd start them on Deducer. Deducer is an R GUI project (Self promotion note: I'm the author) that should feel very familiar to a user coming from SPSS. As they find themselves needing more advanced functions, they will naturally move to the command line to fulfill their needs.