How to change the correct answer of a question and replicate exams from scratch when exshuffle is on (package 'exams')?

I have the following question, where the argument of \exsolution is {0010} but it should be {1000}
\begin{question}
What is the capital of Italy
\begin{answerlist}
\item Rome
\item Paris
\item Vienna
\item Madrid
\end{answerlist}
\end{question}
\extype{schoice}
\exsolution{0010}
\exshuffle{4}
I have corrected the error and re-run the processes of creating and grading exams from scratch using the same seed. Unfortunately, the sequence of answers in this question changes (note \exshuffle{4}), so the grades assigned to this particular question are wrong. All other questions are OK.

Because of the way exshuffle is implemented, it is not easy to just change the {answerlist} and/or exsolution and get the right resulting exam.
Instead, I would recommend going through the meta-information and fixing it there. I presume that you are generating the exams with exams2nops() and have stored the RDS file with the meta-information, right? I will produce such a file via:
set.seed(1)
exams2nops(c("capitals.Rnw", "italy.Rnw", "switzerland.Rnw"), n = 5, dir = ".")
Thus, there are five exams with 3 exercises each with your problematic exercise italy.Rnw in second place. The metainformation is stored in metainfo.rds which we can read again via
x <- readRDS("metainfo.rds")
Now x is a list of 5 elements (exams), each of which has 3 elements (exercises), which have elements question, questionlist, solution, solutionlist, metainfo, and supplements. Here, we need to inspect the questionlist in order to fix the metainfo$solution. Currently, Vienna is marked as being correct:
x[[1]][[2]]$questionlist
## [1] "Madrid" "Vienna" "Rome" "Paris"
x[[1]][[2]]$metainfo$solution
## [1] FALSE TRUE FALSE FALSE
However, it should be Rome:
x[[1]][[2]]$questionlist == "Rome"
## [1] FALSE FALSE TRUE FALSE
So we can loop through this and save the result. Just to be safe, we also store the original RDS file:
x <- readRDS("metainfo.rds")
file.copy("metainfo.rds", "metainfo-orig.rds")
for(i in seq_along(x)) {
x[[i]][[2]]$metainfo$solution <- x[[i]][[2]]$questionlist == "Rome"
}
saveRDS(x, "metainfo.rds")
Final remark: There is also an element metainfo$string that is used when extracting exams_metainfo(). If we wanted to use that, we would need to fix the $string as well. But for nops_eval() it is sufficient to fix the $solution.
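The loop above can be tried on a small mock object before touching the real file. The following is only a sketch: make_exam() is a hypothetical helper and the shuffled answer orders are invented for illustration; only the element names questionlist and metainfo$solution are taken from the answer above. It applies the same fix and checks that every exam now marks exactly Rome as correct.

```r
# Mock of the structure described above: a list of exams, each a list of
# exercises with $questionlist and $metainfo$solution.
# make_exam() is a hypothetical helper, not part of the 'exams' package.
make_exam <- function(answers, wrong_solution) {
  filler <- list(questionlist = NULL, metainfo = list(solution = NULL))
  italy <- list(questionlist = answers,
                metainfo = list(solution = wrong_solution))
  list(filler, italy, filler)  # the problematic exercise sits in second place
}

x <- list(
  make_exam(c("Madrid", "Vienna", "Rome", "Paris"), c(FALSE, TRUE, FALSE, FALSE)),
  make_exam(c("Rome", "Paris", "Madrid", "Vienna"), c(FALSE, FALSE, FALSE, TRUE))
)

# Same fix as in the answer: recompute the solution from the shuffled answers
for (i in seq_along(x)) {
  x[[i]][[2]]$metainfo$solution <- x[[i]][[2]]$questionlist == "Rome"
}

# Sanity check: exactly one TRUE per exam, and it points at "Rome"
ok <- vapply(x, function(exam) {
  ex <- exam[[2]]
  sum(ex$metainfo$solution) == 1 &&
    ex$questionlist[ex$metainfo$solution] == "Rome"
}, logical(1))
ok
```

Running the same check on the real list read from metainfo.rds, before calling saveRDS(), would catch a misspelled answer or a wrong exercise position.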

Related

How to use quanteda to find instances of appearance of certain words before certain others in a sentence

As an R newbie, by using quanteda I am trying to find instances when a certain word sequentially appears somewhere before another certain word in a sentence. To be more specific, I am looking for instances when the word "investors" is located somewhere before the word "shall" in a sentence, in a corpus consisting of an international treaty concluded between Morocco and Nigeria (the text can be found here: https://edit.wti.org/app.php/document/show/bde2bcf4-e20b-4d05-a3f1-5b9eb86d3b3b).
The problem is that sometimes there are multiple words between these two words. For instance, sometimes it is written as "investors and investments shall". I tried to apply similar solutions offered on this website. When I tried the solution on (Keyword in context (kwic) for skipgrams?) and ran the following code:
kwic(corpus_mar_nga, phrase("investors * shall"))
I get 0 observations since this counts only instances when there is only one word between "investors" and "shall".
And when I follow another solution offered on (Is it possible to use `kwic` function to find words near to each other?) and ran the following code:
toks <- tokens(corpus_mar_nga)
toks_investors <- tokens_select(toks, "investors", window = 10)
kwic(toks_investors, "shall")
I also get instances where "investors" appears after "shall", which changes the context fundamentally, since in that case the subject of the sentence is something different.
In the end, in addition to instances of "investors shall", I should also be getting, for example, instances that read "Investors, their investment and host state authorities shall", but I can't do that with the code above.
Could anyone offer me a solution on this issue?
Huge thanks in advance!
Good question. Here are two methods: one relying on regular expressions on the corpus text, and the second (as @Kohei_Watanabe suggests in the comment) using the window argument of tokens_select().
First, create some sample text.
library("quanteda")
## Package version: 2.1.2
# sample text
txt <- c("The investors and their supporters shall do something.
Shall we tell the investors? Investors shall invest.
Shall someone else do something?")
Now reshape this into sentences, since your search occurs within sentence.
# reshape to sentences
corp <- txt %>%
corpus() %>%
corpus_reshape(to = "sentences")
Method 1 uses regular expressions. We add a boundary (\\b) before "investors", and the .+ says one or more of any character in between "investors" and "shall". (This would not catch newlines, but corpus_reshape(x, to = "sentences") will remove them.)
# method 1: regular expressions
corp$flag <- stringi::stri_detect_regex(corp, "\\binvestors.+shall",
case_insensitive = TRUE
)
print(corpus_subset(corp, flag == TRUE), -1, -1)
## Corpus consisting of 2 documents and 1 docvar.
## text1.1 :
## "The investors and their supporters shall do something."
##
## text1.2 :
## "Investors shall invest."
A second method applies tokens_select() with an asymmetric window, followed by kwic(). First we select all documents (which are sentences) containing "investors", discarding the tokens before it and keeping all tokens after it; 1000 tokens after should be enough. Then we apply kwic() to "shall", keeping all context words. Any match must come after "investors", because the selection step dropped everything that preceded it.
# method 2: tokens_select()
toks <- tokens(corp)
tokens_select(toks, "investors", window = c(0, 1000)) %>%
kwic("shall", window = 1000)
##
## [text1.1, 5] investors and their supporters | shall | do something.
## [text1.3, 2] Investors | shall | invest.
The choice depends on what suits your needs best.
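For comparison, the regular-expression idea of method 1 can be sketched in base R without quanteda, assuming the text has already been split into sentences. The pattern below is a variation, not the one from the answer: it adds word boundaries around both words so that, for example, "shallow" would not count as "shall", and it uses .* so that "investors shall" with only a space in between still matches.

```r
# Sentences from the sample text above, already split by hand
sentences <- c(
  "The investors and their supporters shall do something.",
  "Shall we tell the investors?",
  "Investors shall invest.",
  "Shall someone else do something?"
)

# "investors" as a whole word, followed later in the sentence by "shall"
pattern <- "\\binvestors\\b.*\\bshall\\b"
hit <- grepl(pattern, sentences, ignore.case = TRUE, perl = TRUE)

sentences[hit]
```

Only the first and third sentences are flagged; the second contains both words but in the wrong order.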

Making an mchoice question behave as schoice in `exams2moodle()` of package exams

I'm using the R/exams package to create questionnaires for Moodle with the exams2moodle() function.
I would like to create an mchoice question with, for instance, 5 true answers and 10 false answers. However, I would like this mchoice question to behave as an schoice question; that is, an schoice question should ultimately be created from the mchoice.
The resulting schoice question would have 1 true answer (taken randomly from the 5 true answers in the mchoice question) and 3 false answers (from the 10 false answers in the mchoice).
I think this is possible within the R/exams package, or at least I remember having seen it, but I cannot get it to work. Thanks
See here:
exshuffle is set to 5 so that 1 correct and 4 random wrong
alternatives are subsampled and shuffled
An MWE (in .Rnw):
\exname{Test}
\extype{schoice}
\exsolution{11100000} % true, true, true and the others are false
\exshuffle{5}
\begin{question}
Question text.
\begin{answerlist}
\item a
\item b
\item c
\item 1
\item 2
\item 3
\item 4
\item 5
\end{answerlist}
\end{question}
TL;DR: As already explained by @uzsolt, you simply need to set exshuffle to 5 and extype to schoice. Then the sampling will be performed as you indicated.
Worked example: For illustration you can consider the capitals exercise provided with the R/exams package: http://www.R-exams.org/templates/capitals/ (added in package version 2.3-5).
As also discussed in an accompanying YouTube video (https://www.youtube.com/watch?v=XI5xG7Y0hQ0), this exercise is included in the package as an mchoice exercise with six false and five true answer alternatives. As exshuffle is set to 5, this will randomly select five answer alternatives, making sure that at least one is true and at least one is false.
But if you change the same exercise template to schoice, it will employ the sampling that you described: only one of the true answer alternatives is selected, along with four of the false ones.
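To make the described schoice sampling concrete, here is a base-R illustration using the answer list from the MWE above. It only mimics the behaviour described in this answer (one true alternative plus four false ones, then shuffled); it is not the package's actual implementation, and the variable names are made up.

```r
# Illustration only: mimics the sampling described above, not the package code
set.seed(1)
solution <- c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE)  # \exsolution{11100000}
answers  <- c("a", "b", "c", "1", "2", "3", "4", "5")

true_idx  <- which(solution)
false_idx <- which(!solution)

# One correct alternative plus four wrong ones. Indexing via sample.int()
# avoids the sample(x, 1) pitfall when x happens to have length one.
picked <- c(true_idx[sample.int(length(true_idx), 1)],
            false_idx[sample.int(length(false_idx), 4)])

# Shuffle the order of the five selected alternatives
picked <- picked[sample.int(length(picked))]

answers[picked]   # the five alternatives shown to the student
solution[picked]  # exactly one TRUE among them
```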

Is there no "multiple match vector" function in R?

I was trying to find a "readily available" function to do the following:
> my_array = c(5,9,11,10,6,5,9,13)
> my_array
[1] 5 9 11 10 6 5 9 13
> my_test <- c(5, 6)
> new_match_function(my_test, my_array)
[1] 1 5 6
# or instead, maybe:
# [[1]]
# [1] 1 6
# [[2]]
# [1] 5
For my purposes, %in% is close enough, since it will return:
> my_array %in% my_test
[1] TRUE FALSE FALSE FALSE TRUE TRUE FALSE FALSE
and I could just do:
> seq(length(my_array))[my_array %in% my_test]
[1] 1 5 6
But it just seems that something like match should provide this capability: a means to return multiple elements from the match.
If I were to create a package simply to provide this solution, it would not be strongly adopted (for good reason... this tiny use case is not worth installing a package).
Is there a solution already available? If not, where is a good place for me to add this? As I showed, it's easy enough to solve without a new function, but for match to not allow for multiple matches seems crazy. I'd ideally like to either:
Find out that I'm wrong and there is a direct function to accomplish this, or
Be able to alter match itself so that it can return multiple occurrences.
But my impression (right or wrong) has been that any adjustments to the base code are more trouble than they are worth.
For simple cases, which(my_array %in% my_test) or lapply(my_test, function(x) which(my_array==x)) works fine, but those are not the most efficient.
For the first case (just knowing which elements match, not which test values they correspond to), the fastmatch package may help: it has the %fin% (fast-in) function, which keeps a hash table of your array so that subsequent lookups are more efficient.
For the second case, there is findMatches in the Bioconductor package S4Vectors. (https://bioconductor.org/packages/release/bioc/html/S4Vectors.html)
Note that this function doesn't return a list but a Hits object. To get a list, you need the Bioconductor package IRanges as well (and use as.list). (https://bioconductor.org/packages/release/bioc/html/IRanges.html)
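For reference, both shapes of result asked for in the question can be produced in base R alone. This is a sketch rather than a benchmarked solution: the split() variant indexes all positions once and then looks up each test value by name, which assumes the values convert cleanly to character (fine for the whole numbers used here, less safe for arbitrary doubles).

```r
my_array <- c(5, 9, 11, 10, 6, 5, 9, 13)
my_test  <- c(5, 6)

# Flat result: all positions in my_array whose value occurs in my_test
flat <- which(my_array %in% my_test)
flat

# Grouped result: positions per test value
by_value <- split(seq_along(my_array), my_array)   # positions keyed by value
grouped  <- unname(by_value[as.character(my_test)])
grouped
```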

R in simple terms - why do I have to feel like such an idiot? [closed]

My question is simple: every reference I find in books and on the internet for learning R programming is presented in a very linear way with no context. When I try to learn things like functions, I see the code and my brain just freezes because it's looking for something to relate these R terms to and I have no frame of reference. I have a PhD and did a lot of statistics for my dissertation, but that was years ago when we were using different programming languages, and when it comes to R, I don't know why I can't get this into my head. Is there someone who can explain in plain English an example of this simple code? So for example:
above <- function(x, n){
use <- x > n
x[use]
}
x <- 1:20
above(x, 12)
## [1] 13 14 15 16 17 18 19 20
I'm trying to understand what's going on in this code but simply don't. As a result, I could never just write this code on my own because I don't have the language in my head that explains what is happening with this. I get stuck at the first line:
above <- function(x, n) {
Can someone just explain this code sample in plain English so I have some kind of context for understanding what I'm looking at and why I'm doing what I'm doing in this code? And what I mean by plain English is, walking through the code, step by step and not just repeating the official terms from R like vector and function and array and all these other things, but telling me, in a common sense way, what this means.
Given your background (PhD in statistics), the best way to understand this is in mathematical terms.
Mathematically speaking, you are defining a parametric function named above that extracts all elements of a vector x that are above a certain value n. You are just filtering the set or vector x.
In sets notation you can write something like :
above:{x,n} --> {y in x ; y>n}
Now, going through the code and paraphrasing it (math on the left, its R equivalent on the right):
Math                       R
----------------           ---------------------
above: (x,n)        <--->  above <- function(x, n)
{y in x ; y>n}      <--->  x[x > n]
To wrap all the statements together in a function, you need to follow this syntax:
function_name <- function(arg1,arg2) { statements}
Applying the above to this example (we have one statement here) :
above <- function(x,n) { x[x>n]}
Finally calling this function is exactly the same thing as calling a mathematical function.
above(x,2)
ok I will try, if this is too detailed let me know, but I tried to go really slowly:
above <- function(x, n)
this defines a function, which is just some procedure which produces some output given some input. The <- means assign what is on the right hand side to what is on the left hand side, or in other words put everything on the right into the object on the left; for example, container <- 1 puts 1 into container. In this case we put a function inside the object above.
function(x, n): everything in the parentheses specifies what inputs the function takes, so this one takes two variables, x and n.
Now we come to the body of the function, which defines what it does with the inputs x and n. The body of the function is everything inside the curly braces:
{
use <- x > n
x[use]
}
so let's explain that piece by piece:
use <- x > n
this part again puts what's on the right side into the object on the left. And what is happening on the right hand side? A comparison returning TRUE if x is bigger than n and FALSE if x is equal to or smaller than n. So if x is 5 and n is 3 the result will be TRUE, and this value will get stored inside use, so use contains TRUE now. If we have more than one value inside x, then every value inside x will get compared to n, so for example if x = c(1, 2, 3) and n = 2
then we have
1 > 2 FALSE
2 > 2 FALSE
3 > 2 TRUE
so use will contain FALSE, FALSE, TRUE
x[use]
now we are taking a part of x, the square brackets specify which parts of x we want, so in my example case x has 3 elements and use has 3 elements if we combine them we have:
x use
1 FALSE
2 FALSE
3 TRUE
so now we say I don't want 1 and 2 but I do want 3, and the result is 3
so now we have defined the function, now we call it, or in normal words we use it:
x <- 1:20
above(x, 12)
first we assign the numbers 1 through 20 to x, and then we tell the function above to execute (do everything inside its curly braces) with the inputs x = 1:20 and n = 12, so in other words we do the following:
above(x, 12)
execute the function above with the inputs x = 1:20 and n = 12
use <- 1:20 > 12
compare every number from 1:20 to 12 and return for each comparison TRUE if the number is in fact bigger than 12 and FALSE otherwise, then store all the results inside use
x[use]
now give me the corresponding elements of x for which the vector use contains TRUE
so:
x use
1 FALSE
2 FALSE
3 FALSE
4 FALSE
5 FALSE
6 FALSE
7 FALSE
8 FALSE
9 FALSE
10 FALSE
11 FALSE
12 FALSE
13 TRUE
14 TRUE
15 TRUE
16 TRUE
17 TRUE
18 TRUE
19 TRUE
20 TRUE
so we get the numbers 13:20 back as a result
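The walk-through above can be replayed line by line at the R console, which makes the intermediate vector use visible. This is just the small three-element example from this answer written out as code:

```r
x <- c(1, 2, 3)
n <- 2

use <- x > n   # compares every element of x to n, one by one
use            # FALSE for 1 and 2, TRUE for 3

x[use]         # keeps only the elements of x where use is TRUE
```

Typing the name of an object on its own line, as with use above, prints its current contents, which is a handy way to inspect each step of a function body.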
I'll give it a crack too. A few basic points that should get you going in the right direction.
1) The idea of a function. Basically, a function is reusable code. Say I know that in my analysis for some bizarre reason I will often want to add two numbers, multiply them by a third, and divide them by a fourth. (Just suspend disbelief here.) So one way I could do that would just be to write the operation over and over, as follows:
(75 + 93)*4/18
(847 + 3)*3.1415/2.7182
(999 + 380302)*-6901834529/2.5
But that's tedious and error-prone. (What happens if I forget a parenthesis?) Alternatively, I can just define a function that takes whatever numbers I feed into it and carries out the operation. In R:
stupidMath <- function(a, b, c, d){
  result <- (a + b)*c/d
  result  # last expression: this is the value the function gives back
}
That code says "I'd like to store this series of commands and attach them to the name "stupidMath." That's called defining a function, and when you define a function, the series of commands is just stored in memory---it doesn't actually do anything until you "call" it. "Calling" it is just ordering it to run, and when you do so, you give it "arguments" ---the stuff in the parentheses in the first line are the arguments it expects, i.e., in my example, it wants four distinct pieces of data, which will be called 'a', 'b', 'c', and 'd'.
Then it'll do the things it's supposed to do with whatever you give it. "The things it's supposed to do" is the stuff in the curly brackets {} --- that's the "body" of the function, which describes what to do with the arguments you give it. So now, whenever you want to carry out that mathematical operation you can just "call" the function. To do the first computation, for example, you'd just write stupidMath(75, 93, 4, 18). Then the function gets executed, treating 75 as 'a', 93 as 'b', and so forth.
In your example, the function is named "above" and it takes two arguments, denoted 'x' and 'n'.
2) The "assignment operator": R is unique among major programming languages in using <- -- that's equivalent to = in most other languages, i.e., it says "the name on the left has the value on the right." Conceptually, it's just like how a variable in algebra works.
3) So the "body" of the function (the stuff in the curly brackets) first assigns the name "use" to the expression x > n. What's going on there? Well, an expression is something that the computer evaluates to get data. So remember that when you call the function, you give it values for x and n. The first thing this function does is figure out whether x is greater than n. If it is, it evaluates the expression x > n as TRUE. Otherwise, FALSE.
So if you were to define the function in your example and then call it with above(10, 5), then the first line of the body would set the local variable (don't worry right now about what a 'local' variable is) 'use' to be 'TRUE'. This is a boolean value.
Then the next line of the function is a "filter." Filtering is a long topic in R, but basically, R thinks of everything as a "vector," that is, a bunch of pieces of data in a row. A vector in R can be like a vector in linear algebra, i.e., (1, 2, 3, 4, 5, 99) is a vector, but it can also be of stuff other than numbers. For now let's just focus on numbers.
The wacky thing about R (one of the many wacky things about R) is that it treats a single number (a "scalar" in linear algebra terms) just as a vector with only one item in it.
Ok, so why did I just go into that? Because in lots of places in R, a vector and a scalar are interchangeable.
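A quick way to see this for yourself, as a small sketch with made-up variable names: a single number behaves exactly like a vector that happens to have one element, and the same comparison works on one value or on many.

```r
a <- 5
len_a <- length(a)            # a "scalar" has length one
same  <- identical(a, c(5))   # 5 and c(5) are the very same object type

one  <- a > 3                 # comparison on a single value
many <- c(5, 10) > 7          # same operator, applied element by element

len_a
same
one
many
```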
So in your example code, instead of giving a scalar for the first argument, when we call the function we've given 'above' a vector for its first argument. R likes vectors. R really likes vectors. (Just talk to R people for a while. They're all obsessed with doing every goddamn thing in terms of a vector.) So it's no problem to pass a vector for the first argument. But what that means is that the variable 'use' is going to be a vector too. Specifically, 'use' is going to be a vector of booleans, i.e., of TRUE or FALSE for each individual value of x.
To take a simpler version: suppose you said:
mynums <- c(5, 10)
myresult <- above(mynums, 7)
When the code runs, the first thing it's going to do is define that 'use' variable. But x is a vector now, not a scalar (the c(5, 10) code said "make a vector with two elements, and fill them with the numbers 5 and 10"), so R's going to go ahead and carry out the comparison for each element of x. Since 5 is less than 7 and 10 is greater than 7, use becomes the two-item vector of boolean values (FALSE, TRUE).
Ok, now we can talk about filtering. So a vector of boolean values is called a 'logical vector.' And the code x[use] says "filter x by the stuff in the variable use." When you tell R to filter something by a logical vector, it spits back out the elements of the thing being filtered which correspond to the values of 'TRUE'
So in the example just given:
mynums <- c(5, 10)
myresult <- above(mynums, 7)
the value of myresult will just be 10. Why? Because the function filtered 'x' by the logical vector 'use,' 'x' was (5, 10), and 'use' was (FALSE, TRUE); since the second element of the logical was the only true, you only got the second element of x.
And that gets assigned to the variable myresult because myresult <- above(mynums, 7) means "assign the name myresult to the value of above(mynums, 7)"
voila.

Subsetting by [ ] returns a plurality of elements when only a single element was expected [R]. Why?

Consider this 5kb data set listing the 1960 Billboard Hot 100
Load the data set into R:
initFile<-read.csv("Billboard Hot100 1960.csv", header = TRUE, sep = ",")
Typing in initFile$Title[1] yields the following:
[1] Theme from A Summer Place
100 Levels: A Million to One ... Young Emotions
Which is about three times more information than expected - I was under the impression that only the string directly to the right of the [1] row identifier should be returned.
After further inspection it appears that initFile$Title[1] is not a string after all,
typeof(initFile$Title[1])
[1] "integer"
Finally, forcing the element to character returns the desired output.
as.character(initFile$Title[[1]])
[1] "Theme from A Summer Place"
Why is it that initFile$Title[1] doesn't return a single string element in the first place? Other languages like MATLAB seem to be more succinct in their element access routines - is there a more compact way of accessing this information that doesn't require reminding [r] that it's looking at characters?
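The output shown in the question (a "Levels:" line and typeof() returning "integer") is what R prints for a factor, which is how read.csv() stored text columns by default before R 4.0.0. The sketch below reproduces this without the original file, using an inline three-row stand-in built from titles that appear in the question; the variable names are made up. It also shows that passing stringsAsFactors = FALSE yields plain character strings, so no conversion is needed.

```r
# Inline stand-in for the CSV file (three titles taken from the question)
csv <- "Title
Theme from A Summer Place
A Million to One
Young Emotions"

# Text columns read as factors: integer codes plus a table of levels
as_factor <- read.csv(text = csv, stringsAsFactors = TRUE)
kind <- typeof(as_factor$Title[1])             # storage type of a factor
first_chr <- as.character(as_factor$Title[1])  # convert code back to label

# Text columns read as plain character strings
as_text <- read.csv(text = csv, stringsAsFactors = FALSE)
first_txt <- as_text$Title[1]

kind
first_chr
first_txt
```

With the original file, the equivalent would be adding stringsAsFactors = FALSE to the read.csv() call in the question.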
