Is it possible to remove the letter prefix ("a. ", "b. ", etc.) from the possible answers in multiple choice tests? - r-exams

When producing multiple-choice questions, R/exams prefixes the possible answers with lowercase letters. Is it possible to change this behaviour when using exams2qti21() so that the answers are displayed without this prefix?
e.g. to go from
a. 12
b. 35
c. 15
d. 25
to simply,
12
35
15
25
I would like to do this because our content management system, "itsLearning", can randomise the possible answers (per student), and the inclusion of the letter prefixes messes this up.

You can do this by setting the enumerate argument to FALSE for the mchoice and/or schoice questions. By default, the setting of mchoice is also propagated to schoice. So this should do what you want:
exams2qti21(..., mchoice = list(enumerate = FALSE))
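For example, assuming a hypothetical exercise file myquestion.Rmd, both question types can also be set explicitly (schoice would otherwise simply inherit the mchoice setting):
## hypothetical file name; enumerate = FALSE drops the "a.", "b.", ... prefixes
exams2qti21("myquestion.Rmd",
  mchoice = list(enumerate = FALSE),
  schoice = list(enumerate = FALSE))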
As an additional comment:
Letting the learning management system do the randomization is more efficient if the exercises and choice lists are otherwise static. Then you just need to upload one exercise and re-use it because the learning management system does the shuffling.
Letting the exams2xyz() interface from R/exams do the shuffling, on the other hand, gives you far more options than most learning management systems support. In particular you can generate the choice lists fully dynamically (as in deriv2 or tstat2) or you can do subsampling from a large static list (as in capitals). In both cases I would switch off the shuffling in the learning management system.

Related

How to grade exams/questions manually?

What I'd like to do:
I would like to use r-exams in the following procedure:
Providing electronic exams in PDF format to students (using exams2pdf(...))
Let the students upload excel file with their answers
Grade the answers (using nops_eval(...))
My Question:
Is calling the function nops_eval() the preferred way to manually grade questions in R/exams?
If not, which way is preferred?
What I have tried:
I'm aware of the exams2nops() function, and I know that it gives back an .rds file where the correct answers are stored. Hence, I basically have what I need. However, I found that procedure to be not very straightforward, as the correct answers are buried rather deeply inside the .rds file.
Overview
You are right that there is no readily available system for administering/grading exams outside of a standard learning management system (LMS) like Moodle or Canvas, etc. R/exams does provide some building blocks for the grading, though, especially exams_eval(). This can be complemented with tools like Google forms etc. Below I start with the "hard facts" regarding exams_eval() even though this is a bit technical. But then I also provide some comments regarding such approaches.
Using exams_eval()
Let us consider a concrete example
eval <- exams_eval(partial = TRUE, negative = FALSE, rule = "false2")
indicating that you want partial credits for multiple-choice exercises but that the overall points per item must not become negative. A correctly ticked box yields 1/#correct points and an incorrectly ticked box -1/#false points. The only exception is where there is only one false item (ticking it would then cancel all points): in that case -1/2 is used.
The resulting object eval is a list with the input parameters (partial, negative, rule) and three functions checkanswer(), pointvec(), pointsum(). Imagine that you have the correct answer pattern
cor <- "10100"
The associated points for correctly and incorrectly ticked boxes would be:
eval$pointvec(cor)
## pos neg
## 0.5000000 -0.3333333
Thus, for the following answer pattern you get:
ans <- "11100"
eval$checkanswer(cor, ans)
## [1] 1 -1 1 0 0
eval$pointsum(cor, ans)
## [1] 0.6666667
The latter would still need to be multiplied by the overall points assigned to that exercise. For numeric answers you can only get 100% or 0%:
eval$pointsum(1.23, 1.25, tolerance = 0.05)
## [1] 1
eval$pointsum(1.23, 1.25, tolerance = 0.01)
## [1] 0
Similarly, string answers are either correct or false:
eval$pointsum("foo", "foo")
## [1] 1
eval$pointsum("foo", "bar")
## [1] 0
Exercise metainformation
To obtain the relevant pieces of information for a given exercise, you can access the metainformation from the nested list that all exams2xyz() interfaces return:
x <- exams2xyz(...)
For example, you can then extract the metainfo for the i-th random replication of the j-th exercise as:
x[[i]][[j]]$metainfo
This contains the correct $solution, the $type, and also the $tolerance etc. Sure, this is somewhat long and inconvenient to type interactively but should be easy enough to cycle through programmatically. This is what nops_eval(), for example, does based on the .rds file containing exactly the information in x.
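As a minimal sketch (assuming x is the return value from above), such a loop could simply print the type and solution for every replication i and exercise j:
## sketch: cycle programmatically through all replications and exercises in x
for(i in seq_along(x)) {
  for(j in seq_along(x[[i]])) {
    m <- x[[i]][[j]]$metainfo
    cat(i, j, m$type, toString(m$solution), "\n")
  }
}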
Administering exams without a full LMS
My usual advice here is to try to leverage your university's services (if available, of course). Yes, there can be problems with the bandwidth/stability etc. but you can have all of the same if you're running your own system (been there, done that). Specifically, a discussion of Moodle vs. PDF exams mailed around is available here:
Create fillable PDF form with exams2nops
http://www.R-exams.org/general/distancelearning/#replacing-written-exams
If I were to provide my exams outside of an LMS I would use HTML, though, and not PDF. In HTML it is much easier to embed additional information (data, links, etc.) than in PDF. Also, HTML can be viewed on mobile devices much more easily.
For collecting the answers, some R/exams users employ Google forms, see e.g.:
https://R-Forge.R-project.org/forum/forum.php?thread_id=34076&forum_id=4377&group_id=1337. Others have been interested in using learnr or webex for that:
http://www.R-exams.org/general/distancelearning/#going-forward.
Regarding privacy, though, I would be very surprised if any of these are better than using the university's LMS.

Grading multiple choice and cloze questions created with exams2moodle()

I am using exams2moodle() from R/exams to create multiple choice and cloze questions in Moodle. Before preparing exams, I would like to be certain how Moodle computes grades.
It seems to me that in multiple choice questions the default setting in the evaluation policy is partial = TRUE, rule = "false", negative = FALSE. Is that correct?
For the cloze questions, it seems that the overall grade assigned to the cloze question is divided equally among the subquestions. I wonder if there is some way to give unequal weight to the single sub-questions.
Thank you in advance for any help!
Overview
Essentially you are correct. However, I will discuss a few details below that are Moodle-specific because Moodle does not support the full flexibility of the exams_eval() strategies from R/exams.
Multiple-choice (mchoice) questions
Moodle only supports evaluations with partial credits and thus setting partial = FALSE is not possible.
The evaluations in Moodle are always such that not ticking a box yields zero points. Only ticking a box can yield points (positive or negative).
Ticking a correct box always gives the proportion 1/#ncorrect of the overall number of points, where #ncorrect is the number of correct answer alternatives.
Ticking an incorrect box can give a negative amount of points. The precise amount is controlled by the rule specification. For rule = "false" an incorrectly ticked box gives the proportion -1/#incorrect = -1/(n - #ncorrect).
As this is pretty harsh in the case of exactly one incorrect alternative (ticking it erases all points), there is rule = "false2" which is the default. An incorrectly ticked box still subtracts the proportion 1/#incorrect except in the case where #incorrect = 1 when 1/2 is subtracted.
The overall amount of points from all boxes combined cannot become negative! It may be displayed as negative in some situations but is actually treated as 0. Thus, the R/exams argument negative = TRUE would be ignored; it is implicitly always negative = FALSE.
Because only ticking correct answer alternatives can yield positive points, there needs to be at least one correct answer alternative. If a multiple-choice question without correct answer alternatives is used, it always yields zero points.
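If you do want to deviate from these defaults, my understanding is that the evaluation policy can be supplied through the per-type argument lists of exams2moodle(); treat the placement of the eval list below (and the file name) as an assumption, not a definitive API description:
## sketch (argument placement is my assumption): use the harsher rule = "false" for mchoice
exams2moodle("myexercise.Rmd",
  mchoice = list(eval = list(partial = TRUE, rule = "false")))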
Single-choice (schoice) questions
Essentially, the same rules as above apply. The main difference is that Moodle uses radio buttons (rather than check boxes) for single-choice questions. Thus, participants can only select one alternative and not several ones.
The other important difference is that the sum of points can become negative! (Unlike for multiple-choice questions.)
exams2moodle() currently uses the same default evaluation strategy rule = "false2" for single-choice questions. This assures that the expected number of points from random guessing strategies is zero. (However, we are thinking about changing the default to rule = "none", which seems to be more commonly used, i.e., selecting an incorrect alternative just gives zero points.)
Cloze questions
The different parts of a cloze question are evaluated separately and then simply summed up.
For schoice or mchoice items within a cloze question, the same rules as above apply. Thus, points can become negative for single-choice questions but not for multiple-choice questions. The default is, however, not to use negative points.
Note also that mchoice items within a cloze are currently not supported very well by R/exams because they were not supported at all by Moodle until recently (see "Cloze question combining mchoice and num import in Moodle"). We hope to improve this in future versions.
By default, the overall number of points is split equally among the items within a cloze. Since version 2.4-0, however, expoints can be set to a vector of the same length as the number of items, giving the points for each of them.
Marks
The assignment of marks to the points achieved cannot be controlled from exams2moodle() in R/exams. Thus, you have to configure this in Moodle when putting together the quiz there.

How do I insert text before a group of exercises in an exam?

I am very new to R and to R/exams. I've finally figured out basic things like compiling a simple exam with exams2pdf and exams2canvas, and I've figured out how to arrange exercises such that this group of X exercises gets randomized in the exam and others don't.
In my normal written exams, sometimes I have a group of exercises that require some introductory text (e.g., a brief case study on which the next few questions are based, or a specific set of instructions for the questions that follow).
How do I create this chunk of text using R/exams and Rmd files?
I can't figure out if it's a matter of creating a particular Rmd file and then simply adding that to the list when creating the exam (like a dummy file of sorts that only shows text, but isn't numbered), or if I have to do something with the particular tex template I'm using.
There's a post on R-forge about embedding a plain LaTeX file between exercises that seems to get at what I'm asking, but I'm using Rmd files to create exercises, not Rnw files, and so, frankly, I just don't understand it.
Thank you for any help.
There are two strategies for this:
1. Separate exercise files in the same sequence
Always use the same sequence of exercises, say, ex1.Rmd, ex2.Rmd, ex3.Rmd where ex1.Rmd creates and describes the setting and ex2.Rmd and ex3.Rmd simply re-use the variables created internally by ex1.Rmd. In the exams2xyz() interface you have to assure that all exercises are processed in the same environment, e.g., the global environment:
exams2pdf(c("ex1.Rmd", "ex2.Rmd", "ex3.Rmd"), envir = .GlobalEnv)
For .Rnw exercises this is not necessary because they are always processed in the global environment anyway.
2. Cloze exercises
Instead of separate exercise files, combine all exercises in a single "cloze" exercise ex123.Rmd with three sub-items. For a simple exercise with two sub-items, see: http://www.R-exams.org/templates/lm/
Which strategy to use?
For exams2pdf() both strategies work and it is more a matter of taste whether one prefers all exercises together in a single file or split across separate files. However, for other exams2xyz() interfaces only one or neither of these strategies works:
exams2pdf(): 1 + 2
exams2html(): 1 + 2
exams2nops(): 1
exams2moodle(): 2
exams2openolat(): 2
exams2blackboard(): -
exams2canvas(): -
Basically, strategy 1 is only guaranteed to work for interfaces that generate separate files for separate exams like exams2pdf(), exams2nops(), etc. However, for interfaces that create pools of exercises for learning management systems like exams2moodle(), exams2canvas(), etc. it typically cannot be assured that the same random replication is drawn for all three exercises. (Thus, if there are two random replications per exercise, A and B, participants might not get A/A/A or B/B/B but A/B/A.)
Hence, if ex1/2/3 are multiple-choice exercises that you want to print and scan automatically, then you could use exams2nops() in combination with strategy 1. However, strategy 2 would not work because cloze exercises cannot be scanned automatically in exams2nops().
In contrast, if you want to use Moodle, then exams2moodle() could be combined with strategy 2, whereas strategy 1 would not work (see above).
As you are interested in Canvas export: In Canvas neither of the two strategies works. It doesn't support cloze exercises. And to the best of my knowledge it is not straightforward to assure that exercises are sampled "in sync".

Adding skipped/ungraded open-ended questions

Is there a way to include open-ended/free-form questions that are ungraded or skipped by r-exams?
Use case: we want to have an exam with mostly multiple choice questions using the package and its grading capability, but also have 5-10 open ended questions that are printed in the same exam. Ideally, r-exams would provide the grade for the first MCQ section, and we could manually add the grade of the open-ended questions.
I forked the package and made some small changes that allow one to control how many questions are printed on the first page and to remove the string-question pages.
The new parameters are number_of_closed_questions and include_string_pages. It is far from ideal, but works for me.
As an example, let us have six multiple-choice/single-choice questions and one essay question (essayreg):
# install devtools if you do not have it!
# install the fork
devtools::install_github("johannes-titz/exams")
library("exams")
myexam <- list(
  "tstat2.Rnw",
  "ttest.Rnw",
  "relfreq.Rnw",
  "anova.Rnw",
  c("boxplots.Rnw", "scatterplot.Rnw"),
  "cholesky.Rnw",
  "essayreg.Rnw"
)
set.seed(403)
ex1 <- exams2nops(myexam, n = 2,
  dir = "nops_pdf", name = "demo", date = "2015-07-29",
  number_of_closed_questions = 6, include_string_pages = FALSE)
This will produce only 6 questions on the front page (instead of 7) and will also exclude the string-question pages.
If you want normal behavior, just exclude the new parameters. Obviously, one will have to set the number of closed questions manually, so one should be really careful.
I guess one could automatically detect how many string questions are loaded and from this determine the number of open-ended/closed-ended questions, but I currently do not have the time to write this and the presented solution is usable for my case.
I am not 100% sure that the scans will work this way, but I assume there should not be any big problems as I did not really change much. Maybe Achim Zeileis could comment on that? See my commit: https://github.com/johannes-titz/exams/commit/def044e7e171ea032df3553acec0ea0590ae7f5e
There is built-in support for up to three open-ended "string" questions that are printed on a separate sheet that has to be marked by hand. The resulting sheet can then be scanned and evaluated along with the main sheet using nops_scan() and nops_eval(). It's on the wish list for the package to extend that number but it hasn't been implemented yet.
Another "trick" you could do is to use the pages= argument of exams2nops() to include a separate PDF sheet with the extra questions. But this would have to be handled completely separately "by hand" afterwards.

Fuzzy matching of product names

I need to automatically match product names (cameras, laptops, tv-s etc) that come from different sources to a canonical name in the database.
For example "Canon PowerShot a20IS", "NEW powershot A20 IS from Canon" and "Digital Camera Canon PS A20IS"
should all match "Canon PowerShot A20 IS". I've worked with Levenshtein distance with some added heuristics (removing obvious common words, assigning higher cost to number changes, etc.), which works to some extent, but not well enough unfortunately.
The main problem is that even single-letter changes in relevant keywords can make a huge difference, but it's not easy to detect which are the relevant keywords. Consider for example three product names:
Lenovo T400
Lenovo R400
New Lenovo T-400, Core 2 Duo
The first two are ridiculously similar strings by any standard (ok, soundex might help to distinguish the T and R in this case, but the names might as well be 400T and 400R), while the first and the third are quite far from each other as strings, but are the same product.
Obviously, the matching algorithm cannot be 100% precise; my goal is to automatically match around 80% of the names with high confidence.
Any ideas or references are much appreciated.
I think this will boil down to distinguishing key words such as Lenovo from chaff such as New.
I would run some analysis over the database of names to identify key words. You could use code similar to that used to generate a word cloud.
Then I would hand-edit the list to remove anything obviously chaff, like maybe New is actually common but not key.
Then you will have a list of key words that can be used to help identify similarities. You would associate the "raw" name with its keywords, and use those keywords when comparing two or more raw names for similarities (literally, percentage of shared keywords).
Not a perfect solution by any stretch, but I don't think you are expecting one?
The key understanding here is that you do have a proper distance metric. That is in fact not your problem at all. Your problem is in classification.
Let me give you an example. Say you have 20 entries for the Foo X1 and 20 for the Foo Y1. You can safely assume they are two groups. On the other hand, if you have 39 entries for the Bar X1 and 1 for the Bar Y1, you should treat them as a single group.
Now, the distance X1 <-> Y1 is the same in both examples, so why is there a difference in the classification? That is because Bar Y1 is an outlier, whereas Foo Y1 isn't.
The funny part is that you do not actually need to do a whole lot of work to determine these groups up front. You simply do a recursive classification. You start out with one node per group, and then add a supernode for the two closest nodes. In the supernode, store the best assumption, the size of its subtree and the variation in it. As many of your strings will be identical, you'll soon get large subtrees with identical entries. Recursion ends with the supernode at the root of the tree.
Now map the canonical names against this tree. You'll quickly see that each will match an entire subtree. Now, use the distances between these trees to pick the distance cutoff for that entry. If you have both Foo X1 and Foo Y1 products in the database, the cut-off distance will need to be lower to reflect that.
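A minimal base-R sketch of this bottom-up grouping, using adist() for the string distance and hclust() for the agglomeration (the names and the cutoff are made up):
## toy product names; adist() gives pairwise Levenshtein distances
nm <- c("Lenovo T400", "New Lenovo T-400, Core 2 Duo", "Lenovo R400",
  "Canon PowerShot A20 IS", "Digital Camera Canon PS A20IS")
d <- adist(tolower(nm))
dimnames(d) <- list(nm, nm)
## agglomerative clustering: repeatedly merge the two closest (super)nodes
hc <- hclust(as.dist(d), method = "average")
## cut the tree at a distance threshold to obtain candidate groups;
## the cutoff would have to be tuned per canonical name as described above
cutree(hc, h = 10)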
edg's answer is in the right direction, I think - you need to distinguish key words from fluff.
Context matters. To take your example, Core 2 Duo is fluff when looking at two instances of a T400, but not when looking at a CPU OEM package.
If you can mark in your database which parts of the canonical form of a product name are more important and must appear in one form or another to identify a product, you should do that. Maybe through the use of some sort of semantic markup? Can you afford to have a human mark up the database?
You can try to define equivalency classes for things like "T-400", "T400", "T 400" etc. Maybe a set of rules that say "numbers bind more strongly than letters attached to those numbers."
Breaking down into cases based on manufacturer, model number, etc. might be a good approach. I would recommend that you look at techniques for term spotting to try and accomplish that: http://www.worldcat.org/isbn/9780262100854
Designing everything in a flexible framework that's mostly rule driven, where the rules can be modified based on your needs and emerging bad patterns (read: things that break your algorithm) would be a good idea, as well. This way you'd be able to improve the system's performance based on real world data.
You might be able to make use of a trigram search for this. I must admit I've never seen an algorithm for implementing such an index, but I have seen it working in pharmaceutical applications, where it copes very well indeed with badly misspelt drug names. You might be able to apply the same kind of logic to this problem.
This is a problem of record linkage. The dedupe python library provides a complete implementation, but even if you don't use python, the documentation has a good overview of how to approach this problem.
Briefly, within the standard paradigm, this task is broken into three stages
1. Compare the fields, in this case just the name. You can use one or more comparators for this, for example an edit distance like the Levenshtein distance or something like the cosine distance that compares the number of common words (a rough sketch of this stage follows after this list).
2. Turn an array of distance scores into a probability that a pair of records are truly about the same thing.
3. Cluster those pairwise probability scores into groups of records that likely all refer to the same thing.
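As a rough base-R sketch of the first stage (two simple comparators for a single pair of names; the token-overlap score is a Jaccard-style stand-in for the word-based comparator):
## stage 1 sketch: compare one pair of names with two comparators
a <- "Canon PowerShot a20IS"
b <- "Digital Camera Canon PS A20IS"
## normalized edit similarity (Levenshtein via adist)
edit_sim <- 1 - drop(adist(tolower(a), tolower(b))) / max(nchar(a), nchar(b))
## token overlap: share of words the two names have in common
ta <- strsplit(tolower(a), "\\s+")[[1]]
tb <- strsplit(tolower(b), "\\s+")[[1]]
token_sim <- length(intersect(ta, tb)) / length(union(ta, tb))
c(edit_sim = edit_sim, token_sim = token_sim)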
You might want to create logic that ignores the letter/number combination of model numbers (since they're nigh always extremely similar).
I don't have any experience with this type of problem, but I think a very naive implementation would be to tokenize the search term and search for matches that happen to contain any of the tokens.
"Canon PowerShot A20 IS", for example, tokenizes into:
Canon
Powershot
A20
IS
which would match each of the other items you want to show up in the results. Of course, this strategy will likely produce a whole lot of false matches as well.
Another strategy would be to store "keywords" with each item, such as "camera", "canon", "digital camera", and searching based on items that have matching keywords. In addition, if you stored other attributes such as Maker, Brand, etc., you could search on each of these.
Spell checking algorithms come to mind.
Although I could not find a good sample implementation, I believe you can modify a basic spell-checking algorithm to come up with satisfactory results, i.e., working with words as the unit instead of characters.
The bits and pieces left in my memory:
Strip out all common words (a, an, the, new). What is "common" depends on context.
Take the first letter of each word and its length and make that a word key.
When a suspect word comes up, look for words with the same or similar word key (a tiny sketch follows below).
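A tiny R sketch of that word-key idea (the key definition is just the heuristic from memory above; the stop-word list is illustrative):
## word key = first letter of each word plus its length, after stripping common words
common <- c("a", "an", "the", "new", "from")
word_key <- function(x) {
  w <- setdiff(strsplit(tolower(x), "[^a-z0-9]+")[[1]], common)
  paste0(substr(w, 1, 1), nchar(w))
}
word_key("NEW powershot A20 IS from Canon")
## [1] "p9" "a3" "i2" "c5"
## candidate matches are then names whose keys largely overlap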
It might not solve your problems directly... but you say you were looking for ideas, right?
:-)
That is exactly the problem I'm working on in my spare time. What I came up with is:
based on keywords, narrow down the scope of the search; in this case you could have some hierarchy:
type --> company --> model
so that you'd match "Digital Camera" for the type and "Canon" for the company, leaving you with a much narrower scope to search.
You could work this down even further by introducing product lines etc.
But the main point is, this probably has to be done iteratively.
We can use the Datadecision service for matching products.
It will allow you to automatically match your product data using statistical algorithms. This operation is done after defining a threshold score of confidence.
All data that cannot be automatically matched will have to be manually reviewed through a dedicated user interface.
The online service uses lookup tables to store synonyms as well as your manual matching history. This allows you to improve the data matching automation next time you import new data.
I worked on the exact same thing in the past. What I did was use an NLP method, a TF-IDF vectorizer, to assign weights to each word. For example, in your case:
Canon PowerShot a20IS
Canon --> weight = 0.05 (not a very distinguishing word)
PowerShot --> weight = 0.37 (can be distinguishing)
a20IS --> weight = 0.96 (very distinguishing)
This will tell your model which words to care about and which words to ignore. I had quite good matches thanks to TF-IDF.
But note this: a20IS cannot be recognized as a20 IS, so you may consider using some kind of regex to filter such cases.
After that, you can use a numeric calculation like cosine similarity.
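A compact base-R sketch of that pipeline (the weights shown above are illustrative; this just demonstrates the mechanics on three made-up name variants):
## TF-IDF weighting of name tokens plus cosine similarity
nm <- c("Canon PowerShot a20IS", "NEW powershot A20 IS from Canon",
  "Digital Camera Canon PS A20IS")
tok <- lapply(tolower(nm), function(x) strsplit(x, "\\s+")[[1]])
vocab <- sort(unique(unlist(tok)))
## term-frequency matrix: one row per name, one column per token
tf <- t(sapply(tok, function(w) table(factor(w, levels = vocab))))
## inverse document frequency down-weights tokens occurring in every name (e.g. "canon")
idf <- log(length(tok) / colSums(tf > 0))
tfidf <- sweep(tf, 2, idf, "*")
## cosine similarity between the first two names
cosine <- function(u, v) sum(u * v) / sqrt(sum(u^2) * sum(v^2))
cosine(tfidf[1, ], tfidf[2, ])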

Resources