Real data not found in R/exams

I am trying to develop an exam based on the results of a logit model fitted to a real data set. I try to load the data set, fit the model, and include some values extracted from the model using the `r varname` inline syntax.
I first developed a small example using artificial data generated within the exercise. That worked fine and this is the corresponding Rmd file:
```{r data generation, echo = FALSE, results = "hide"}
library(tidyverse)
d <- tibble(y = rbinom(100, 1, 0.6), x1 = rnorm(100), x2=rnorm(100))
# randomize exams
nsize <- sample(50:150, 1)
sampled_dat <- sample(1:nrow(d), nsize, replace = TRUE)
fd <- d[sampled_dat, ]
fmodel <- glm(y ~ x1 + x2, data = fd, family = binomial("logit"))
```
Question
========
`r nrow(fd)`
```{r}
summary(fmodel)
```
Choose the correct answer.
Answerlist
----------
* sol1 `r nrow(fd)`
* sol2
Meta-information
================
exname: bdvDeviance
extype: schoice
exsolution: 10
exshuffle: TRUE
This worked as expected when launching
elearn_exam <- c("ess3.Rmd")
set.seed(1234567)
exams2nops(elearn_exam, n = 2, language = "en",
institution = "U", title = "Exam",
dir = "nops_pdf", name = "BDV", date = "2018-01-08", duplex = FALSE)
However, this is the analogous exercise loading a real data set:
```{r data generation, echo = FALSE, results = "hide"}
load("d.Rdata")
# randomize exams
nsize <- sample(180:250, 1)
sampled_dat <- sample(1:nrow(d), nsize, replace = TRUE)
fd <- d[sampled_dat, ]
logitModel <- glm(Adopted ~ CultArea + Trained + LabRice+ Education + ExtContact, data = fd, family=binomial("logit"))
```
Question
========
`r nrow(fd)`
Choose the correct answer.
Answerlist
----------
* When adding variables, the deviance did not change. The variables did not bring any useful information.
* sol2 `r nrow(fd)`
Meta-information
================
exname: bdvDeviance
extype: schoice
exsolution: 10
exshuffle: TRUE
This time, I get the following error:
> elearn_exam <- c("ess4.Rmd")
> set.seed(1234567)
> exams2nops(elearn_exam, n = 2, language = "en",
+ institution = "Uu", title = "Exam",
+ dir = "nops_pdf", name = "BDV_R", date = "2018-01-08", duplex = FALSE)
Quitting from lines 14-35 (ess4.Rmd)
Error in nrow(fd) : object 'fd' not found
I do not understand what the problem is in the second case. Apparently, the fd object is not found when it is referenced in `r nrow(fd)`. The problem does not come from the regression because that works fine when knitting the Rmd file.

Your second example using the real data set just loads the corresponding data file via load("d.Rdata"), assuming that it is in the current working directory. However, when using any exams2xyz() interface, the exercises are processed in a temporary directory in order not to clutter the user's workspace. Hence, the d.Rdata file is not found in that directory and cannot be loaded. Because of this, the fd object is never created and cannot be inserted. In short, the `r nrow(fd)` code is working fine; the problem is loading the data.
To avoid this problem, you must either specify the full absolute path to your data file in load("/path/to/d.Rdata") or you need to copy the data to the temporary directory before loading it. For the latter, there is the convenience function include_supplement() that copies supplementary files to the temporary directory. By default, it takes them from the directory the exercise resides in. So you simply need to add:
include_supplement("d.Rdata")
before loading the data file. Note that when the file is not in the exercise directory itself but some sub-directory you can add the argument recursive = TRUE. Then sub-directories are searched recursively.
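The mechanism can be reproduced outside of R/exams with base R alone. The sketch below is only an illustration under assumed names: exercise_dir and work_dir are hypothetical stand-ins for the exercise directory and the temporary processing directory, the data set is made up, and file.copy() plays the role that include_supplement("d.Rdata") plays inside an exercise.

```r
# Hypothetical stand-ins for the exercise directory and the temporary
# directory in which the exercise is processed.
exercise_dir <- file.path(tempdir(), "exercise_dir")
work_dir     <- file.path(tempdir(), "work_dir")
dir.create(exercise_dir, showWarnings = FALSE)
dir.create(work_dir, showWarnings = FALSE)
unlink(file.path(work_dir, "d.Rdata"))  # start from a clean processing directory

# A small made-up data set saved next to the exercise.
d <- data.frame(y = c(0, 1, 1, 0), x1 = c(0.2, 1.5, -0.3, 0.8))
save(d, file = file.path(exercise_dir, "d.Rdata"))
rm(d)

# Processing happens elsewhere, so the relative path no longer resolves.
old_wd <- setwd(work_dir)
found_before <- file.exists("d.Rdata")  # FALSE at this point

# Copying the file into the processing directory makes load() work again.
file.copy(file.path(exercise_dir, "d.Rdata"), "d.Rdata")
found_after <- file.exists("d.Rdata")
load("d.Rdata")
setwd(old_wd)
```

Inside the exercise itself, the equivalent is simply include_supplement("d.Rdata") followed by load("d.Rdata") at the top of the first chunk.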

Related

Error in w2v_train(trainFile = file_train, modelFile = model, stopWordsFile = file_stopwords (full error text below)

Full error text: Error in w2v_train(trainFile = file_train, modelFile = model, stopWordsFile = file_stopwords, : Expecting a single string value: [type=closure; extent=1].
I am trying to run a word embedding analysis using this data https://www.kaggle.com/datasets/therohk/million-headlines?resource=download to obtain:
top 25 closest words to focus word
plot these 25 words
compare same analysis with different data (JSTOR data on articles with "populism" https://constellate.org/dataset/f53e497b-844e-2b60-ec2f-b9c54d2e334e?unigrams=political,%20social)
I loaded all the data and necessary packages, as well as pre-processing the ABCNews data for the analysis. (See code)
#Loading necessary packages
install.packages(c("tidyverse", "tidytext", "word2vec", "Rtsne", "future", "jstor", "magrittr", "ggplot2", "dplyr"))
library("tidyverse")
library("tidytext")
library("word2vec")
library("Rtsne")
library("future")
library("jstor")
library(magrittr)
library("ggplot2")
library("dplyr")
#Preprocessing abcnews data
##Select text data from csv file ABC NEWS FILE
head(abcnews_pop)
abc_pop_text <- abcnews_pop %>%
select("headline_text")
head(abc_pop_text)
I then used the following code to process the embedding:
#ABCNews data
text_news<-abc_pop_text%>%
txt_clean_word2vec(.,ascii = TRUE, alpha = TRUE, tolower = TRUE, trim = TRUE)
set.seed(123456789)
news_model<-word2vec(x=text, type = "cbow", dim = 500, iter = 50)
embedding_news<-as.matrix(news_model)
The first statement (text_news<-abc_pop...) ran smoothly. However, the second one (set.seed(123456789) news_model...) throws this error:
Error in w2v_train(trainFile = file_train, modelFile = model, stopWordsFile = file_stopwords, : Expecting a single string value: [type=closure; extent=1].
Does anyone know how to address this?
I had made a mistake in naming my objects: the word2vec() call refers to text, but the cleaned data is stored in text_news. This has been resolved, thank you.
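For readers hitting the same message: a likely reading, assuming the code is exactly as posted, is that no object called text was ever created, so R resolved x = text to the plotting function graphics::text, which is what type=closure refers to. The base R sketch below shows that lookup. The two headlines are made up, check_corpus() is a hypothetical helper, and the word2vec() call in the final comment is a suggestion rather than tested output.

```r
# No user-defined object `text` exists, so the name resolves to the
# plotting function from the graphics package.
text_is_function <- is.function(text)
text_type <- typeof(text)  # functions defined in R code have type "closure"

# The cleaned headlines were stored under a different name (made-up values).
text_news <- c("council backs new water plan", "storm warning for coast")
news_is_character <- is.character(text_news)

# Hypothetical guard that fails early with a clearer message.
check_corpus <- function(x) {
  if (!is.character(x)) {
    stop("expected a character vector, got an object of type '", typeof(x), "'")
  }
  invisible(x)
}
check_corpus(text_news)

# Passing the intended object would then look like:
# news_model <- word2vec(x = text_news, type = "cbow", dim = 500, iter = 50)
```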

How does R Markdown automatically format print effects into dataframes? Or how can I access special print methods?

I'm working with the WRS2 package and there are cases where it'll output its analysis (bwtrim) into a list with a special class of the analysis type class = "bwtrim". I can't as.data.frame() it, but I found that there is a custom print method called print.bwtrim associated with it.
As an example let's say this is the output: bwtrim.out <- bwtrim(...). When I run the analysis output in an Rmarkdown chunk, it seems to "steal" part of the text output and make it into a dataframe.
So here's my question, how can I either access print.bwtrim or how does R markdown automatically format certain outputs into dataframes? Because I'd like to take this outputted dataframe and use it for other purposes.
Update: Here is a minimal working example. Put the following in a chunk in an Rmd file.
```{r}
library(WRS2)
df <-
data.frame(
subject = rep(c(1:100), each = 2),
group = rep(c("treatment", "control"), each = 2),
timepoint = rep(c("pre", "post"), times = 2),
dv = rnorm(200, mean = 2)
)
analysis <- WRS2::bwtrim(dv ~ group * timepoint,
id = subject,
data = df,
tr = .2)
analysis
```
With this, a data.frame automatically shows up in the chunk afterwards and it shows all the values very nicely. My main question is how I can get this data.frame for my own uses. If you do str(analysis), you see that it's a list. If you do class(analysis), you get "bwtrim". If you do methods(class = "bwtrim"), you get the print method. And methods(print) will have a line that says print.bwtrim*. But I can't seem to figure out how to call print.bwtrim myself.
Regarding what Rmarkdown is doing, compare the following
If you run this in a chunk, it actually steals the data.frame part and puts it into a separate figure.
```{r}
capture.output(analysis)
```
However, if you run the same line in the console, the entire output comes out properly. What's also interesting is that if you try to assign it to another object, the output will be stolen before it can be assigned.
Compare x when you run the following in either a chunk or the console.
```{r}
x<-capture.output(analysis)
```
This is what I get from the chunk approach when I call x
[1] "Call:"
[2] "WRS2::bwtrim(formula = dv ~ group * timepoint, id = subject, "
[3] " data = df, tr = 0.2)"
[4] ""
[5] ""
This is what I get when I do it all in the console
[1] "Call:"
[2] "WRS2::bwtrim(formula = dv ~ group * timepoint, id = subject, "
[3] " data = df, tr = 0.2)"
[4] ""
[5] " value df1 df2 p.value"
[6] "group 1.0397 1 56.2774 0.3123"
[7] "timepoint 0.0001 1 57.8269 0.9904"
[8] "group:timepoint 0.5316 1 57.8269 0.4689"
[9] ""
My question is: how can I call whatever RStudio/R Markdown is doing to make data.frames, so that I can easily get a data.frame myself?
Update 2: This is probably not a bug, as discussed here https://github.com/rstudio/rmarkdown/issues/1150.
Update 3: You can access the method by using WRS2:::print.bwtrim(analysis), though I'm still interested in what Rmarkdown is doing.
Update 4: It might not be the case that Rmarkdown is stealing the output and automatically making dataframes from it, as you can see when you call x after you've already captured the output. Looking at WRS2:::print.bwtrim, it prints a dataframe that it creates, which I'm guessing Rmarkdown recognizes then formats it out.
See below for the print.bwtrim.
function (x, ...)
{
cat("Call:\n")
print(x$call)
cat("\n")
dfx <- data.frame(value = c(x$Qa, x$Qb, x$Qab), df1 = c(x$A.df[1],
x$B.df[1], x$AB.df[1]), df2 = c(x$A.df[2], x$B.df[2],
x$AB.df[2]), p.value = c(x$A.p.value, x$B.p.value, x$AB.p.value))
rownames(dfx) <- c(x$varnames[2], x$varnames[3], paste0(x$varnames[2],
":", x$varnames[3]))
dfx <- round(dfx, 4)
print(dfx)
cat("\n")
}
<bytecode: 0x000001f587dc6078>
<environment: namespace:WRS2>
In R Markdown documents, automatic printing is done by knitr::knit_print rather than print. I don't think there's a knit_print.bwtrim method defined, so it will use the default method, which is defined as
function (x, ..., inline = FALSE)
{
if (inline)
x
else normal_print(x)
}
and normal_print will call print().
You are asking why the output is different. I don't see that when I knit the document to html_document, but I do see it with html_notebook. I don't know the details of what is being done, but if you look at https://rmarkdown.rstudio.com/r_notebook_format.html you can see a discussion of "output source functions", which manipulate chunks to produce different output.
The fancy output you're seeing looks a lot like what knitr::knit_print does for a dataframe, so maybe html_notebook is substituting that in place of print.
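Since the print.bwtrim source shown in the question builds its table purely from fields of the object, the same data frame can be assembled directly without capturing any printed output. The sketch below assumes the field names from that printed source. bwtrim_to_df() is a hypothetical helper, and mock is a made-up stand-in (filled with the values from the console output in the question) so that the example runs without WRS2 installed.

```r
# Hypothetical helper that mirrors the body of print.bwtrim,
# but returns the data frame instead of printing it.
bwtrim_to_df <- function(x, digits = 4) {
  dfx <- data.frame(
    value   = c(x$Qa, x$Qb, x$Qab),
    df1     = c(x$A.df[1], x$B.df[1], x$AB.df[1]),
    df2     = c(x$A.df[2], x$B.df[2], x$AB.df[2]),
    p.value = c(x$A.p.value, x$B.p.value, x$AB.p.value)
  )
  rownames(dfx) <- c(x$varnames[2], x$varnames[3],
                     paste0(x$varnames[2], ":", x$varnames[3]))
  round(dfx, digits)
}

# Made-up stand-in with the same fields as a bwtrim result.
mock <- structure(list(
  Qa = 1.0397, Qb = 0.0001, Qab = 0.5316,
  A.df = c(1, 56.2774), B.df = c(1, 57.8269), AB.df = c(1, 57.8269),
  A.p.value = 0.3123, B.p.value = 0.9904, AB.p.value = 0.4689,
  varnames = c("dv", "group", "timepoint")
), class = "bwtrim")

result <- bwtrim_to_df(mock)
```

With the real object from the question, the call would be bwtrim_to_df(analysis).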

Is it possible to add more than four answer options to exams2nops exams?

Is it possible to have more than 4 answer options on exams2nops exams? I tried to set the option nchoice = 6 but it did not have any effect. I have 6 answer options in the corresponding *.rmd exercise.
One example:
exams2nops(questions, n = 1, nsamp = 1, encoding = "UTF-8", blank = 0, nchoice = 6, duplex = T, reglength = 5L, points = 4, replacement = T,schoice = list(eval = ee))
And the exercise *.rmd:
```{r}
df <- readRDS(file = "some.rds")
variable <- names(df[,4:7]) %>% sample(1)
measCT<- ifelse(variable==names(df)[4],"Mean",
ifelse(variable==names(df)[5],"Mean",
ifelse(variable==names(df)[6],"Median",
ifelse(variable==names(df)[7],"Median",NaN))))
measuresTC <- c("Mode", "Percentile 25", "Percentile 75", "Median", "Mean", "Geometric mean")
options_answers <- paste0(c(measCT,measuresTC[!measuresTC %in% measCT]))
solutions <- c(T,F,F,F,F,F)
```
Question
========
`r paste0("Some question about the ", variable)`
```{r questionlist, echo = FALSE, results = "asis"}
exams::answerlist(unlist(options_answers), markup = "markdown")
```
Meta-information
================
exname: 1_1
extype: schoice
exsolution: `r paste(solutions, collapse = "|")`
exshuffle: 4
The produced pdf always presents four options...
Answer
Currently, exams2nops() only supports up to five choice alternatives.
Further comments
Optionally supporting more choice alternatives is on the wishlist for NOPS exercises but it's not very likely to be implemented in the near(er) future. (Changes in NOPS exercises require quite a bit of work because generation, scanning, and evaluation all have to be in sync and thoroughly tested etc.)
In your example, there are always exactly four choice alternatives because you set exshuffle to 4. Thus, four alternatives are always randomly selected. If you want five alternatives, you can set it to exshuffle: 5. And if you specify a number > 5, then you get an error from exams2nops():
Error in exams2nops(questions) :
the following exercises have length < 2 or > 5: ...
Setting the nchoice argument has no effect because it is not an argument of exams2nops() but of make_nops_template(). When you call exams2nops(), the following steps happen internally:
1. Determine how many choice alternatives there are per exercise.
2. Set up a LaTeX template with the correct number of choices via make_nops_template().
3. Call exams2pdf(..., template = ...) with the template created in the previous step.
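To make the effect of a numeric exshuffle concrete, here is a rough base R sketch of the idea for a single-choice item: keep the correct alternative, draw the remaining ones at random, and shuffle the result. It is only an approximation written for illustration, not the code R/exams uses, and pick_alternatives() is a hypothetical name.

```r
# Hypothetical helper: select k alternatives, always including the correct one.
pick_alternatives <- function(answers, solutions, k) {
  stopifnot(length(answers) == length(solutions), sum(solutions) == 1,
            k >= 2, k <= length(answers))
  correct   <- which(solutions)
  wrong_idx <- which(!solutions)
  wrong     <- wrong_idx[sample.int(length(wrong_idx), k - 1)]
  keep      <- c(correct, wrong)[sample.int(k)]  # shuffle the selected ones
  data.frame(answer = answers[keep], solution = solutions[keep])
}

answers   <- c("Mean", "Mode", "Percentile 25", "Percentile 75",
               "Median", "Geometric mean")
solutions <- c(TRUE, FALSE, FALSE, FALSE, FALSE, FALSE)

set.seed(1)
four <- pick_alternatives(answers, solutions, k = 4)  # as with exshuffle: 4
five <- pick_alternatives(answers, solutions, k = 5)  # as with exshuffle: 5
```

With six alternatives in the exercise, five is the largest value that still fits the NOPS limit described above.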

Can a Y/N prompt in the RStudio Console be deactivated?

I'm using a function from an R package called RAC (R Package for Aqua Culture). It generates a Y/N prompt in the console window prior to execution. Is there a way to deactivate the prompt or automatically answer N every time?
The function Bass_pop_main will generate:
Do you want to change the inputs? [y/n]
Here's an example:
library(RAC)
setwd("../RAC_seabass") #working directory
userpath <- "../RAC_seabass" #userpath
Bass_pop_skeleton(userpath) #create input and output folders
forcings <- Bass_pop_dataloader(userpath) #load environmental variables
output <- Bass_pop_main("../RAC", forcings) #run growth model
I am not aware of any setting you can supply externally that answers "No" automatically every time. However, you can take the source code of Bass_pop_main, remove the prompt, and use the modified copy instead. The source code is shown when you enter Bass_pop_main in the console.
library(RAC)
Bass_pop_main_revised <- function (userpath, forcings) {
rm(list = ls())
cat("Sea Bass population bioenergetic model\n")
cat(" \n")
currentpath = getwd()
out_pre <- Bass_pop_pre(userpath, forcings)
Param = out_pre[[1]]
Tint = out_pre[[2]]
Gint = out_pre[[3]]
Food = out_pre[[4]]
IC = out_pre[[5]]
times = out_pre[[6]]
Dates = out_pre[[7]]
N = out_pre[[8]]
CS = out_pre[[9]]
out_RKsolver <- Bass_pop_loop(Param, Tint, Gint, Food, IC, times, N, userpath)
out_post <- Bass_pop_post(userpath, out_RKsolver, times, Dates, N, CS)
cat(" ")
cat("End")
return(out_post)
}
Now use the Bass_pop_main_revised function instead of Bass_pop_main and it will never ask for input.
setwd("../RAC_seabass")
userpath <- "../RAC_seabass"
Bass_pop_skeleton(userpath)
forcings <- Bass_pop_dataloader(userpath)
output <- Bass_pop_main_revised("../RAC", forcings)
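If you maintain such a function yourself, a common way to avoid keeping an edited copy is to make the prompt optional. The sketch below is generic base R, not RAC code: run_model() and its ask argument are hypothetical, and the model step is a placeholder.

```r
# Hypothetical function whose prompt can be switched off by the caller.
run_model <- function(inputs, ask = interactive()) {
  if (ask) {
    answer <- readline("Do you want to change the inputs? [y/n] ")
    if (identical(tolower(trimws(answer)), "y")) {
      stop("edit the input files and call run_model() again")
    }
  }
  sum(inputs)  # placeholder for the actual model run
}

# Scripts and batch jobs skip the prompt explicitly.
output <- run_model(c(1, 2, 3), ask = FALSE)
```

Defaulting ask to interactive() keeps the prompt for console use while letting non-interactive runs proceed without waiting for input.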

Using R Hmisc summary/summaryM latex command within Knitr Markdown pdf

I have been trying to get the Hmisc latex.summary and latex.summaryM examples to work within a pdf document created using Knitr in RStudio, but I keep getting error messages. The example data is:
library(Hmisc) # provides label(), units() and mChoice() used below
options(digits=3)
set.seed(173)
sex <- factor(sample(c("m","f"), 500, rep=TRUE))
country <- factor(sample(c('US', 'Canada'), 500, rep=TRUE))
age <- rnorm(500, 50, 5)
sbp <- rnorm(500, 120, 12)
label(sbp) <- 'Systolic BP'
units(sbp) <- "mmHg"
treatment <- factor(sample(c("Drug","Placebo"), 500, rep=TRUE))
sbp[1] <- NA
# Generate a 3-choice variable; each of 3 variables has 5 possible levels
symp <- c('Headache','Stomach Ache','Hangnail',
'Muscle Ache','Depressed')
symptom1 <- sample(symp, 500,TRUE)
symptom2 <- sample(symp, 500,TRUE)
symptom3 <- sample(symp, 500,TRUE)
Symptoms <- mChoice(symptom1, symptom2, symptom3, label='Primary Symptoms')
And I want to create a pdf document that contains the tables
tab1 <- summary(sex ~ treatment + Symptoms, fun=table)
tab2 <- summaryM(age + sex + sbp + Symptoms ~ treatment,
groups='treatment', test=TRUE)
I'm running R version 3.5.2 (2018-12-20), RStudio 1.1.463, Hmisc_4.2-0, and have installed tinytex using tinytex::install_tinytex().
After a few hours trial and error I discovered how, and am posting the code below in case it helps others.
The following code works for me. Note:
The relsize LaTeX package is required when the Hmisc::units attribute is used; without it, compilation fails with the following error.
! Undefined control sequence.
<recently read> \smaller
The mylatex function is taken from https://stackoverflow.com/a/31443576/4241780, and is required for removing unwanted output.
The option file = "" is needed to prevent the error
Error in system(comd, intern = TRUE, wait = TRUE) : 'yap' not found
Calls: <Anonymous> ... print -> print.latex -> show.latex -> show.dvi -> system
The use of the where = "!htbp" option ensures that the tables remain where they are placed and do not float to the top of the page (by default where = "!tbp") https://tex.stackexchange.com/a/2282.
---
title: "Untitled"
author: "Author"
date: "15 April 2019"
output:
  pdf_document:
    extra_dependencies: ["relsize"]
---
```{r setup, include=FALSE}
library(Hmisc)
library(dplyr)
mylatex <- function (...) {
  o <- capture.output(latex(file = "", where = "!htbp", ...))
  # this will strip /all/ line-only comments; or if you're only
  # interested in stripping the first such comment you could
  # adjust accordingly
  o <- grep('^%', o, invert = TRUE, value = TRUE)
  cat(o, sep = '\n')
}
```
```{r data}
# As in question above ...
```
Here is the first table
```{r tab1, results = "asis"}
tab1 <- summary(sex ~ treatment + Symptoms, fun=table)
mylatex(tab1)
```
Here is the second table
```{r tab2, results = "asis"}
tab2 <- summaryM(age + sex + sbp + Symptoms ~ treatment, test=TRUE)
mylatex(tab2)
```
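The comment-stripping step inside mylatex can be tried in isolation. The lines below are made up to resemble captured LaTeX output, so this is only a sketch of what the grep() call does, not actual Hmisc output.

```r
# Made-up lines resembling captured LaTeX output; the first is a comment line.
o <- c("%latex.default(tab1, file = \"\")%",
       "\\begin{table}[!htbp]",
       "\\caption{Example}",
       "\\end{table}")

# Keep every line that does not start with a percent sign.
kept <- grep("^%", o, invert = TRUE, value = TRUE)
cat(kept, sep = "\n")
```

Only the three table lines remain, which is what gets written into the document by the results = "asis" chunks.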
