Importing quiz questions created using the R exams package into canvas - r-exams

I have been using the R exams package to create exams for my introductory statistics course this semester. It is really a great tool! I've been able to create several questions from scratch and import them into Canvas without issue. However, there are some questions that give me problems when I try to import them (e.g., the anova and boxplot examples that are included in the package). I can successfully import if I use:
R> library("exams")
R> set.seed(1)
R> exams2canvas("anova.Rmd")
However, I sometimes run into problems when trying to create many versions of the same question:
R> library("exams")
R> exams2canvas("anova.Rmd", n=50)

TL;DR
The source of the problems is multiple-choice exercises with no correct alternative. These are not supported by learning management systems like Canvas or Moodle, and hence exercises for these systems must ensure at least one correct alternative and one wrong alternative.
Demo exercises
Some of the demo exercises in R/exams did not restrict the number of correct/wrong alternatives to a minimum of one. So from time to time it could happen that no alternative is correct. Up to version 2.3-6 of R/exams this affects the following exercises:
anova,
automaton,
boxplots,
cholesky,
relfreq,
scatterplot.
All of these have been adapted in version 2.4-0 (which was the development version of the package at the time of writing this answer).
Background
Multiple-choice exercises without correct alternatives are straightforward to handle without partial credits when the entire answer pattern must be fully correct. However, when using partial credits, no positive points can be obtained when there are no correct alternatives.
When we created the demo exercises in R/exams we adapted exercises from an environment where we did not use partial credits. But learning management systems like Moodle or Canvas expect at least one correct (and typically also one wrong) alternative for scoring it correctly with partial credits.
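A minimal sketch of how an exercise could guarantee at least one correct and one wrong alternative: the TRUE/FALSE solution pattern is simply redrawn until it contains both. The helper name `draw_solution` and its `prob` argument are hypothetical and only for illustration. This is not the code used in the adapted demo exercises.

```r
# Hypothetical helper: draw a TRUE/FALSE solution pattern for k alternatives,
# redrawing until there is at least one correct and one wrong alternative.
draw_solution <- function(k, prob = 0.5) {
  stopifnot(k >= 2)
  repeat {
    sol <- runif(k) < prob
    if (any(sol) && !all(sol)) return(sol)
  }
}

set.seed(1)
# Simulate 50 random versions of an exercise with 5 alternatives each.
patterns <- replicate(50, draw_solution(5), simplify = FALSE)

# Check that every version mixes correct and wrong alternatives.
all(vapply(patterns, function(s) any(s) && !all(s), logical(1)))
```

In an actual exercise file the resulting logical vector would then be used to build the solution metainformation, for example via mchoice2string() from the exams package.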


How can I generate exams that contain randomly-generated single-choice answers using R/exams package?

I am interested in using R/exams package in order to generate tests composed of 'single-choice' questions. The three most important things that I am looking for are:
- being able to randomly select one (or more) out of a set of exercises for each participant
- being able to randomly shuffle answer alternatives
- being able to randomly select numbers, text blocks, graphics using the R programming language.
I have followed the basic R/exams tutorials and was able to generate their demo exams, but I was not yet able to find a full tutorial on how to achieve these goals. I am a beginner R programmer and I would, therefore, need a step-by-step tutorial.
If there are any suggestions of such tutorials here I would really appreciate any help.
Thank you
All things you are looking for can be accomplished with R/exams. There is not one step-by-step tutorial that illustrates everything, though. But there are quite a few bits and pieces that should get you started.
Do you want to generate written single-choice exams or do you want to conduct your tests in some learning management system like Moodle or so? If you're looking for written exams, then exams2nops() is the most complete solution, see:
http://www.R-exams.org/tutorials/exams2nops/
For setting up single-choice exercises based on numeric questions, a step-by-step tutorial is: http://www.R-exams.org/tutorials/static_num_schoice/
If you prefer an arithmetic illustration rather than one from economics, there is:
http://www.R-exams.org/general/user2019/
For selecting one out of a set of exercises for each participant, you need to define an exam with a list of exercises, e.g.,
exm <- list(
  c("a.Rmd", "b.Rmd", "c.Rmd"),
  c("d.Rmd", "e.Rmd")
)
When using exams2xyz(exm) then you will get an exam with two exercises. The first one is a random sample of a-c and the second one a random sample of d-e.
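To make the sampling idea concrete, here is a small base-R illustration that mimics the selection step: for each participant, one file is drawn from each group in the list. The helper `draw_exam` is hypothetical and only imitates the behavior described above. It is not how the package implements it internally.

```r
exm <- list(
  c("a.Rmd", "b.Rmd", "c.Rmd"),
  c("d.Rmd", "e.Rmd")
)

# Hypothetical helper: pick one exercise file from each group.
draw_exam <- function(exm) {
  vapply(exm, function(files) sample(files, 1), character(1))
}

set.seed(42)
# Three participants, two exercises each (one row per participant).
versions <- t(replicate(3, draw_exam(exm)))
versions
```

Each row always contains one of a-c in the first column and one of d-e in the second, which is the structure the exam list defines.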
I suggest you try to get started with these, keeping it simple in the beginning. I.e., instead of accomplishing all tasks immediately, try to take them one by one.

Multivariate Dynamic Time Warping (DTW) with R

I'm currently dealing with multivariate dynamic time warping (DTW) in R. The best library I found so far is the dtw package as described here: http://dtw.r-forge.r-project.org/
But I do not know how multivariate dtw is actually implemented and it is also not described in the description of the package. All in all, I would like to know if it implements DTWD (Dependent DTW) or DTWI (Independent DTW).
Does anyone have an idea or a suggestion how to find out which of these two approaches the package uses? Or are there libraries which allow me to choose the variant?
You can always look at the companion papers that come with the packages for a more theoretical grounding of the packages
https://www.jstatsoft.org/article/view/v031i07
Additionally, you can always look into the "guts" of the functions to understand how things are implemented in the code
https://www.rdocumentation.org/packages/dtw/versions/1.20-1/source
As far as this question in general, perhaps it is better suited for https://stats.stackexchange.com/ where the questions are more methodological than programming.
The two types of multivariate dynamic time warping are carefully explained here
https://www.cs.ucr.edu/~eamonn/Multi-Dimensional_DTW_Journal.pdf
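To see the difference between the two variants in code, here is a minimal base-R sketch. It assumes a squared-difference local cost and the basic step pattern (match, insertion, deletion). All function names are hypothetical, and this is not the implementation of the dtw package. The independent variant warps each dimension separately and sums the distances. The dependent variant sums the local costs over dimensions first and then finds a single warping path.

```r
# Accumulated DTW cost for a given local cost matrix (basic step pattern).
dtw_cost <- function(cost) {
  n <- nrow(cost); m <- ncol(cost)
  acc <- matrix(Inf, n + 1, m + 1)
  acc[1, 1] <- 0
  for (i in seq_len(n)) {
    for (j in seq_len(m)) {
      acc[i + 1, j + 1] <- cost[i, j] +
        min(acc[i, j], acc[i, j + 1], acc[i + 1, j])
    }
  }
  acc[n + 1, m + 1]
}

# Squared differences between all pairs of time points in one dimension.
local_cost <- function(a, b) outer(a, b, function(u, v) (u - v)^2)

# Independent variant: one warping path per dimension, distances summed.
dtw_independent <- function(x, y) {
  sum(vapply(seq_len(ncol(x)),
             function(k) dtw_cost(local_cost(x[, k], y[, k])),
             numeric(1)))
}

# Dependent variant: one shared warping path over the summed local costs.
dtw_dependent <- function(x, y) {
  cost <- Reduce(`+`, lapply(seq_len(ncol(x)),
                             function(k) local_cost(x[, k], y[, k])))
  dtw_cost(cost)
}

# Two bivariate series (rows = time points, columns = dimensions).
x <- cbind(c(0, 1, 2, 1, 0), c(0, 0, 1, 2, 1))
y <- cbind(c(0, 0, 1, 2, 1), c(0, 1, 2, 1, 0))
c(independent = dtw_independent(x, y), dependent = dtw_dependent(x, y))
```

Because the independent variant may choose a different path in each dimension, its value can never exceed the dependent one for the same local cost. Comparing both numbers against a package's output on a small example is one way to find out which variant it uses.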

How to make publishable tables and plots using R? [duplicate]

There are a range of tools available for creating publication quality tables using R, Sweave, and LaTeX.
In particular, there are helper functions like latex in the Hmisc package, and xtable in the xtable package. I've also often written my own code so that I could have complete control over table formatting (e.g., see this example).
However, when preparing publication quality tables a range of issues often arise:
how and when to apply numeric formatting
how to precisely control alignment of columns and cells
how to precisely control cell borders
how to convert variable labels to variable names
and so on
Beyond the high level issues of specifying the desired table format, there are issues of implementation.
When should a helper function such as xtable be used?
Which helper function should be used in a given situation?
How can the default output of helper functions be customised to particular requirements?
Question
It seems to me that the above issues are deserving of a detailed textbook-style introduction.
Are there any online or offline resources that provide a detailed overview of how to produce publication quality tables using R, Sweave, and LaTeX, and that address the issues discussed above?
Just to tie this up with a nice little bow at the time of current writing, the best existing tutorials on publication-quality tables and usage scenarios appear to be an amalgamation of these documents:
A Sweave example (source)
The Joy of Sweave: A Beginner's Guide to Reproducible Research with Sweave (source)
Latex and R via Sweave: An example document how to use Sweave (source)
Sweave = R · LaTeX2 (source)
The xtable gallery (source)
The Sweave Homepage
LaTeX documentation
Going beyond the scope of what currently exists, you may want to ask the author of The Joy of Sweave for a document on publication-quality tables specifically. It seems like he's gone above and beyond this problem in his research. In addition to the questions you've raised, this space specifically could use a style guide that, flatly, does not currently exist.
And, as mentioned in the question errata, this is a perfect example of a question for https://tex.stackexchange.com/. I encourage you to continue to ask specific questions there when you run into any difficulties in your current projects.
The package stargazer can create publication-quality tables (including with templates designed to resemble existing academic journals) from commonly used R statistical functions and packages (lm, glm, plm, svyglm, survival, pscl, AER, and others). It is also good for creating summary statistics tables, and can directly output data frame content as well.
There is a tabular function in the tables package which addresses formatting, alignment and label operations. The package has a vignette which is a good starting point.
xtable has worked fine for me so far.
In combination with siunitx, and when necessary, longtable, it can produce pretty effective tables, in my opinion. With packages like booktabs and caption, the aesthetics can be pleasing too.
I am not sure this level of detail was asked for by the OP, but for what it's worth, the basic implementation could be something along these lines: https://tex.stackexchange.com/questions/41067/caption-for-longtable-in-sweave/41183#41183 (my own answer to another question).
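For readers who want to see what "complete control" over numeric formatting and column alignment can look like, here is a small base-R sketch that writes a booktabs-style tabular from a data frame. The helper `df_to_booktabs` is hypothetical and is not part of xtable or any other package mentioned here. It assumes the LaTeX document loads booktabs and does not escape special characters in the data.

```r
# Hypothetical helper: turn a data frame into booktabs-style LaTeX lines.
df_to_booktabs <- function(df, digits = 2) {
  is_num <- vapply(df, is.numeric, logical(1))
  # Numeric columns are right-aligned, all other columns left-aligned.
  align <- paste(ifelse(is_num, "r", "l"), collapse = "")
  # Format numbers with a fixed number of decimals; keep text as is.
  cells <- lapply(seq_along(df), function(j) {
    if (is_num[j]) formatC(df[[j]], format = "f", digits = digits)
    else as.character(df[[j]])
  })
  body <- do.call(paste, c(cells, sep = " & "))
  c(
    paste0("\\begin{tabular}{", align, "}"),
    "\\toprule",
    paste0(paste(names(df), collapse = " & "), " \\\\"),
    "\\midrule",
    paste0(body, " \\\\"),
    "\\bottomrule",
    "\\end{tabular}"
  )
}

tab <- df_to_booktabs(
  data.frame(Group = c("A", "B"), Mean = c(1.234, 10.5), SD = c(0.1, 2))
)
writeLines(tab)
```

The same idea scales to siunitx columns by changing the alignment string, but for anything beyond simple cases a dedicated package is likely the better choice.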
I highly recommend ConTeXt which makes use of the TABLE package. There is a Table overview in contextgarden and an exhaustive manual.

Are there any guidelines for when reproducible code should be included into a publication?

Given the stress toward reproducible science, I was wondering if my recent work warrants the inclusion of example code in the publication. The datasets that I am using are quite big, so it wouldn't necessarily make sense to publish those. However, the statistical methods that I apply within R are not generally known to my audience (although I would think that they should be).
I'm using empirical orthogonal function analysis (EOF) and generalized additive models (GAM) within my analysis. GAM, in particular, is widely used in ecological studies, but less so within the physical sciences - my work spans both disciplines.
I definitely refer to the R packages that I use, and it wouldn't really be difficult for a reviewer / reader to look for those references (and included examples) themselves. So, my question is, what situations are most appropriate for the inclusion of reproducible code in a publication?
Code is the most accurate representation of what you actually did. Therefore, in my view you should always aim to publish code alongside your article.
However, editor resistance to this is pretty strong. The fear is that if the reviewer had access to the code, then the journal looks pretty bad if a substantive coding mistake is later found. This is not a hypothetical fear, given the Levitt paper, etc.
Knuth has some strong views on literate programming that you should be able to cite as justification. If you can't convince the journal to accept your code as an integral piece of the publication, consider publishing it on your personal website (the approach taken by e.g. Raj Chetty for many of his papers) or publish it as an R package.
Finally, here's a note I wrote to my programming students:
Consider publishing your code. Doing so will act as a commitment
device which will encourage good habits--habits that make your own
work easier. Publishing your code also makes it easier for others to
extend your analysis, which can result in more citations of your work.
Releasing your code is good academic practice as well: it is the
truest testament to your analysis. And offering your program to the
world shows off the beautiful coding skills which you are about to
acquire.
A basic tenet of science is reproducibility. So the answer would be to "include" code required to conduct your analysis to every paper/publication that is based on data analysis.
I say "include" because you don't need to put the R code directly into the paper. Many if not most journals allow supplementary material, which is an option. Alternatively, supply your script to one of the many science data archiving sites (such as Figshare) and then (and here is the killer!) cite your own script using the DOI that Figshare gives to your deposited script. If you can post the data too, then all the better; Figshare doesn't really care too much about big data sets.
The above applies to code where you are using other packages and your R script does things like loads and formats data, calls functions from other packages and then plots or displays output/results. If you have developed new R code to implement a particular method then I would say package the code as an R package and submit that to CRAN or r-forge or something like that.
From your description, the former (deposit the analysis script in a repo) would be most appropriate.
We recently had a discussion at our research institute regarding reproducible research. The incentive came from the Nature editorial (http://arstechnica.com/science/2012/02/science-code-should-be-open-source-according-to-editorial/) which argued that all your code should be published. I wholeheartedly agree with this. Even though your dataset is very big, publishing the R code that you used to create your results makes it crystal clear what you did. Often the methods section of a paper does not contain sufficient detail to reproduce the result, and the code is quite a help in this case.

Comparing R to Matlab for Data Mining

Instead of starting to code in Matlab, I recently started learning R, mainly because it is open-source. I am currently working in data mining and machine learning field. I found many machine learning algorithms implemented in R, and I am still exploring different packages implemented in R.
I have a quick question: how do you compare R to Matlab for data mining applications in terms of popularity, pros and cons, industry and academic acceptance, etc.? Which one would you choose and why?
I went through various comparisons of Matlab vs. R against various metrics, but I am specifically interested in its applicability to data mining and ML.
Since both languages are pretty new to me, I was just wondering if R would be a good choice or not.
I appreciate any kind of suggestions.
For the past three years or so, i have used R daily, and the largest portion of that daily use is spent on Machine Learning/Data Mining problems.
I was an exclusive Matlab user while in University; at the time i thought it was
an excellent set of tools/platform. I am sure it is today as well.
The Neural Network Toolbox, the Optimization Toolbox, Statistics Toolbox,
and Curve Fitting Toolbox are each highly desirable (if not essential)
for someone using MATLAB for ML/Data Mining work, yet they are all separate from
the base MATLAB environment--in other words, they have to be purchased separately.
My Top 5 list for Learning ML/Data Mining in R:
Mining Association Rules in R
This refers to a couple of things: First, a group of R packages whose names all begin with arules (available from CRAN); you can find the complete list (arules, arulesViz, etc.) on the Project Homepage. Second, all of these packages are based on a data-mining technique known as Market-Basket Analysis and alternatively as Association Rules. In many respects, this family of algorithms is the essence of data-mining--exhaustively traverse large transaction databases and find above-average associations or correlations among the fields (variables or features) in those databases. In practice, you connect them to a data source and let them run overnight. The central R package in the set mentioned above is called arules; on the CRAN package page for arules, you will find links to a couple of excellent secondary sources (vignettes in R's lexicon) on the arules package and on the Association Rules technique in general.
The standard reference, The Elements of Statistical
Learning by Hastie et al.
The most current edition of this book is available in digital form for free. Likewise, at the book's website (linked to just above) are all data sets used in ESL, available for free download. (As an aside, i have the free digital version; i also purchased the hardback version from BN.com; all of the color plots in the digital version are reproduced in the hardbound version.) ESL contains thorough introductions to at least one exemplar from most of the major
ML rubrics--e.g., neural networks, SVM, KNN; unsupervised
techniques (LDA, PCA, MDS, SOM, clustering), numerous flavors of regression, CART,
Bayesian techniques, as well as model aggregation techniques (Boosting, Bagging)
and model tuning (regularization). Finally, get the R Package that accompanies the book from CRAN (which will save the trouble of having to download and enter the datasets).
CRAN Task View: Machine Learning
The +3,500 Packages available
for R are divided up by domain into about 30 package families or 'Task Views'. Machine Learning
is one of these families. The Machine Learning Task View contains about 50 or so
Packages. Some of these Packages are part of the core distribution, including e1071
(a sprawling ML package that includes working code for quite a few of
the usual ML categories.)
Revolution Analytics Blog
With particular focus on the posts tagged with Predictive Analytics
ML in R tutorial comprised of slide deck and R code by Josh Reich
A thorough study of the code would, by itself, be an excellent introduction to ML in R.
And one final resource that i think is excellent, but didn't make it into the top 5:
A Guide to Getting Started in Machine Learning [in R]
posted at the blog A Beautiful WWW
Please look at the CRAN Task Views and in particular at the CRAN Task View on Machine Learning and Statistical Learning which summarises this nicely.
Both Matlab and R are good if you are doing matrix-heavy operations. Because they can use highly optimized low-level code (BLAS libraries and such) for this.
However, there is more to data-mining than just crunching matrixes. A lot of people totally neglect the whole data organization aspect of data mining (as opposed to say, plain machine learning).
And once you get to data organization, R and Matlab are a pain. Try implementing an R*-tree in R or Matlab to take an O(n^2) algorithm down to O(n log n) runtime. First of all, it totally goes against the way R and Matlab are designed (use bulk math operations wherever possible); secondly, it will kill your performance. Interpreted R code, for example, seems to run at around 50% of the speed of the C code (try R built-in k-means vs. flexclust k-means); and the BLAS libraries are optimized to an insane level, exploiting cache sizes, data alignment, advanced CPU features. If you are adventurous, try implementing a manual matrix multiplication in R or Matlab, and benchmark it against the native one.
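The suggested experiment can be sketched in a few lines of base R. The function name `manual_matmul` and the matrix size are arbitrary choices for illustration. Timings depend on the machine and the BLAS library R is linked against, so none are shown here.

```r
# Naive triple-loop matrix multiplication in interpreted R code.
manual_matmul <- function(A, B) {
  stopifnot(ncol(A) == nrow(B))
  C <- matrix(0, nrow(A), ncol(B))
  for (i in seq_len(nrow(A))) {
    for (j in seq_len(ncol(B))) {
      for (k in seq_len(ncol(A))) {
        C[i, j] <- C[i, j] + A[i, k] * B[k, j]
      }
    }
  }
  C
}

set.seed(1)
n <- 60
A <- matrix(rnorm(n * n), n)
B <- matrix(rnorm(n * n), n)

# Both approaches give the same result up to floating-point tolerance.
same <- isTRUE(all.equal(manual_matmul(A, B), A %*% B))

# Compare the interpreted loop with the native (BLAS-backed) operator.
system.time(manual_matmul(A, B))
system.time(A %*% B)
```

Increasing `n` makes the gap between the two timings more visible.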
Don't get me wrong. There is a lot of stuff where R and matlab are just elegant and excellent for prototyping. You can solve a lot of things in just 10 lines of code, and get a decent performance out of it. Writing the same thing by hand would be hundreds of lines, and probably 10x slower. But sometimes you can optimize by a level of complexity, which for large data sets does beat the optimized matrix operations of R and matlab.
If you want to scale up to "Hadoop size" in the long run, you will have to think about data layout and organization, too, unless all you need is a linear scan over the data. But then, you could just be sampling, too!
Yesterday I found two new books about data mining. These books, published in a series titled ‘Data Mining’, present in-depth descriptions of novel mining algorithms and many useful applications. In addition to covering each section in depth, the two books offer useful hints and strategies for solving the problems in the following chapters. The progress of data mining technology and its wide popularity establish the need for a comprehensive text on the subject. The books are “New Fundamental Technologies in Data Mining” (http://www.intechopen.com/books/show/title/new-fundamental-technologies-in-data-mining) and “Knowledge-Oriented Applications in Data Mining” (http://www.intechopen.com/books/show/title/knowledge-oriented-applications-in-data-mining). These are open access books, so you can download them for free or just read them on the online reading platform like I do. Cheers!
We should not forget the origins of these two pieces of software: scientific computation and signal processing led to Matlab, while statistics led to R.
I used matlab a lot in University since we have one installed on Unix and open to all students. However, the price for Matlab is too high especially compared to free R. If your major focus is not on matrix computation and signal processing, R should work well for your needs.
I think it also depends in which field of study you are. I know of people in coastal research that use a lot of Matlab. Using R in this group would make your life more difficult. If a colleague has solved a problem, you can't use it because he fixed it using Matlab.
I would also look at the capabilities of each when you are dealing with large amounts of data. I know that R can have problems with this, and might be restrictive if you are used to an iterative data mining process, for example looking at multiple models concurrently. I don't know if MATLAB has a data limitation.
I admit to favoring MATLAB for data mining problems, and I give some of my reasoning here:
Why MATLAB for Data Mining?
I will admit to only a passing familiarity with R/S-Plus, but I'll make the following observations:
R definitely has more of a statistical focus than MATLAB. I prefer building my own tools in MATLAB, so that I know exactly what they're doing, and I can customize them, but this is more of a necessity in MATLAB than it would be in R.
Code for new statistical techniques (spatial statistics, robust statistics, etc.) often appears early in S-Plus (I assume that this carries over to R, at least some).
Some years ago, I found the commercial version of R, S-Plus, to have an extremely limited capacity for data. I cannot say what the state of R/S-Plus is today, but you may want to check if your data will fit into such tools comfortably.

Resources