error-safe templating with brew / whisker - r

An external program needs an input file with some control parameters, and I wish to generate those automatically using R. Usually, I simply use paste("parameter1: ", param1, ...) to create the long string of text, and output to a file, but the script rapidly becomes unreadable. This problem is probably well suited to whisker,
library(whisker)
template= 'Hello {{name}}
You have just won ${{value}}!
'
data <- list( name = "Chris", value= 124)
whisker.render(template, data)
My issue here, is that there's no safe-checking that data contains all the required variables, e.g.
whisker.render(template, data[-1])
will silently ignore the fact that I forgot to specify a name. My end-program will crash, however, if I fail to produce a complete config file.
Another templating system is provided by brew; it has the advantage of actually evaluating things, and potentially this can also help detect missing variables,
library(brew)
template2 = 'Hello <%= name %>
You have just won $<%= value %>!
'
data <- list( name = "Chris", value= 124)
own_brew <- function(template, values){
attach(values, pos=2)
out = capture.output(brew(text = template))
detach(values, pos=2)
cat(out, sep='\n')
invisible(out)
}
own_brew(template2, data)
own_brew(template2, data[-1]) # error
However, I am stuck with two issues:
attach() ... detach() is not ideal, (gives warnings every now and then), or at least I don't know how to use it properly. I tried to define an environment for brew(), but it was too restrictive and didn't know about base functions anymore...
even though an error occurs, a string is still returned by the function. I tried to wrap the call in try() but I have no experience in error handling. How do I tell it to quit the function producing no output?
Edit: I have updated the brew solution to use a new environment instead of attach(), and stop execution in case of an error. (?capture.output suggests that it was not the right function to use here, since "An attempt is made to write output as far as possible to file if there is an error in evaluating the expressions"...)
own_brew <- function(template, values, file=""){
env <- as.environment(values)
parent.env(env) <- .GlobalEnv
a <- textConnection("cout", "w")
out <- try(brew(text = template, envir=env, output=a))
if(inherits(out, "try-error")){
close(a)
stop()
}
cat(cout, file=file, sep="\n")
close(a)
invisible(cout)
}
There must be an easier way with tryCatch, but I can't understand a single thing in its help page.
I welcome other suggestions on the more general problem.

Using regular expressions to retrieve the variable names from the template, you could validate before rendering, e.g.,
render <- function(template, data) {
vars <- unlist(regmatches(template, gregexpr('(?<=\\{\\{)[[:alnum:]_.]+(?=\\}\\})', template, perl=TRUE)))
stopifnot(all(vars %in% names(data)))
whisker.render(template, data)
}
render(template, data)

The new glue package provides another alternative,
library(glue)
template <- 'Hello {name} You have just won ${value}!'
data <- list( name = "Chris", value= 124)
glue_data(template, .x=data)
# Hello Chris You have just won $124!
glue_data(template, .x=data[-1])
# Error in eval(expr, envir, enclos) : object 'name' not found

Since version 1.1.0 (on CRAN 19 Aug, 2016), the stringr package includes the str_interp() function (unfortunately not mentioned in the NEWS file of the release).
template <- "Hello ${name} You have just won $${value}!"
data <- list( name = "Chris", value= 124)
stringr::str_interp(template, data)
[1] "Hello Chris You have just won $124!"
stringr::str_interp(template, data[-1L])
Error in FUN(X[[i]], ...) : object 'name' not found

While preparing the stringr answer I noticed that OP's questions concerning the usage of brew() hasn't been addressed so far. In particular, the OP was asking how to provide his data to the environment and how to prevent a character string being returned in case of an error.
The OP has created a function own_brew() which wraps the call to brew(). Although, there are alternative package available now, I feel the original question deserves an answer.
This is my attempt to improve baptiste's version:
own_brew <- function(template, values, file=""){
a <- textConnection("cout", "w")
out <- brew::brew(text = template, envir=list2env(values), output=a)
close(a)
if (inherits(out, "try-error")) stop()
cat(cout, file=file, sep="\n")
invisible(cout)
}
The main differences are that list2env() is used to pass the list of values to brew() and that the call to try() is avoided by testing the return value out for an error.
template <- "Hello <%= name %> You have just won $<%= value %>!"
data <- list( name = "Chris", value= 124)
own_brew(template, data)
Hello Chris You have just won $124!
own_brew(template, data[-1L])
Error in cat(name) : object 'name' not found
Error in own_brew(template, data[-1L]) :

The new jinjar package will raise an error if a data variable is missing. It also provides the ability to explicitly handle missing values in your template.
library(jinjar)
# if name missing, raise error
template <- 'Hello {{ name }}
You have just won ${{ value }}!'
render(template, value = 124)
#> Error: [inja.exception.render_error] (at 1:10) variable 'name' not found
# if name missing, use default
template <- 'Hello {{ default(name, "world") }}
You have just won ${{ value }}!'
render(template, value = 124)
#> [1] "Hello world\nYou have just won $124!"
# if name missing, skip section
template <- '{% if exists("name") %}Hello {{ name }}{% endif -%}
You have just won ${{ value }}!'
render(template, value = 124)
#> [1] "You have just won $124!"
Created on 2022-06-25 by the reprex package (v2.0.1)

Related

googledrive::drive_mv gives error "Parent specified via 'path' is invalid: x Does not exist"

This is a weird one and I am hoping someone can figure it out. I have written a function that uses googlesheets4 and googledrive. One thing I'm trying to do is move a googledrive document (spreadsheet) from the base folder to a specified folder. I had this working perfectly yesterday so I don't know what happened as it just didn't when I came in this morning.
The weird thing is that if I step through the function, it works fine. It's just when I run the function all at once that I get the error.
I am using a folder ID instead of a name and using drive_find to get the correct folder ID. I am also using a sheet ID instead of a name. The folder already exists and like I said, it was working yesterday.
outFolder <- 'exact_outFolder_name_without_slashes'
createGoogleSheets <- function(
outFolder
){
folder_id <- googledrive::drive_find(n_max = 10, pattern = outFolder)$id
data <- data.frame(Name = c("Sally", "Sue"), Data = c("data1", "data2"))
sheet_id <- NA
nameDate <- NA
tempData <- data.frame()
for (i in 1:nrow(data)){
nameDate <- data[i, "Name"]
tempData <- data[i, ]
googlesheets4::gs4_create(name = nameDate, sheets = list(sheet1 = tempData)
sheet_id <- googledrive::drive_find(type = "spreadsheet", n_max = 10, pattern = nameDate)$id
googledrive::drive_mv(file = as_id(sheet_id), path = as_id(folder_id))
} end 'for'
} end 'function'
I don't think this will be a reproducible example. The offending code is within the for loop that is within the function and it works fine when I run through it step by step. folder_id is defined within the function but outside of the for loop. sheet_id is within the for loop. When I move folder_id into the for loop, it still doesn't work although I don't know why it would change anything. These are just the things I have tried. I do have the proper authorization for google drive and googlesheets4 by using:
googledrive::drive_auth()
googlesheets4::gs4_auth(token = drive_token())
<error/rlang_error>
Error in as_parent():
! Parent specified via path is invalid:
x Does not exist.
Backtrace:
global createGoogleSheets(inputFile, outPath, addNames)
googledrive::drive_mv(file = as_id(sheet_id), path = as_id(folder_id))
googledrive:::as_parent(path)
Run rlang::last_trace() to see the full context.
Backtrace:
x
-global createGoogleSheets(inputFile, outPath, addNames)
-googledrive::drive_mv(file = as_id(sheet_id), path = as_id(folder_id))
\-googledrive:::as_parent(path)
\-googledrive:::drive_abort(c(invalid_parent, x = "Does not exist."))
\-cli::cli_abort(message = message, ..., .envir = .envir)
\-rlang::abort(message, ..., call = call, use_cli_format = TRUE)
I have tried changing the folder_id to the exact path of my google drive W:/My Drive... and got the same error. I should mention I have also tried deleting the folder and re-creating it fresh.
Anybody have any ideas?
Thank you in advance for your help!
I can't comment because I don't have the reputation yet, but I believe you're missing a parenthesis in your for-loop.
You need that SECOND parenthesis below:
for (i in 1:nrow(tempData) ) {
...
}

Why does my function work just at the beginning of its code?

The problem I'm having here is that, apparently, the only lines of code that the function is executing is
library(rvest)
library(RCurl)
and
url <-paste("https://www.confaz.fazenda.gov.br/legislacao/boletim-do-icms/",estate,"/",year,month,sep="") as you guys can see at the end of the code.
So I think that the function can't attach values to any of the variables. Can you guys tell me how I could solve this?
I know that I could see what is happening with more detail using debug, but I'm having difficulty with that too.
icms_data <- function(estate, year, month){
# Creating a data frame
icms<- data.frame(NULL)
library(rvest)
library(RCurl)
#downloading the webpage with the arguments from the function(estate, year and month)
url <-paste("https://www.confaz.fazenda.gov.br/legislacao/boletim-do-icms/",estate,"/",year,month,sep="")
#ignore token validation
options(RCurlOptions =
list(capath = system.file("CurlSSL",
"cacert.pem",
package = "RCurl"),
ssl.verifypeer = FALSE))
y1<-getURL(url)
y <- read_html(y1)
a<- y %>%
html_nodes("#formfield-form-widgets-icms_primario div") %>%
html_text()
if(all.equal(a,character(0))==TRUE)
{
a=0
} else
{
a<-substr(a,4,100)
a = type.convert(a, na.strings = "NA", as.is = F, dec = ",",numerals = "no.loss")
}
b<- y %>%
html_nodes("#formfield-form-widgets-icms_secundario div") %>%
html_text()
if(all.equal(b,character(0))==TRUE)
{
b=0
} else
{
b<-substr(b,4,100)
b = type.convert(b, na.strings = "NA", as.is = F, dec = ",",numerals = "no.loss")
}
#puting the information scraped into the data frame
df<-data.frame(estate,year,month,a,b)
icms<-rbind(icms,df)
print(paste(url))
}
> icms_data("SP","2018", "01")
Loading required package: xml2
Loading required package: bitops
[1] "https://www.confaz.fazenda.gov.br/legislacao/boletim-do-icms/SP/201801"
Firstly, as your output contains the printed URL, it looks like the entire function body is executed.
Judging by the name of your function, I assume you want it to return the variable icms.
R is a functional programming language and as such functions return their last executed expression as their result.
You should thus put icms or return(icms) at the very end of your function:
icms_data <- function(...){
<everything else you wrote>
icms<-rbind(icms,df)
print(paste(url))
icms
}
Some more background info: variable assignments that you do inside a function using <- or = are local variables to the function environment, meaning they will not be available outside of the function body. If you want these variables outside of the function you need to (a) return them as described above or (b) assign them to a different environment (for example set "global variables" using <<-). Option (b) should generally be avoided unless you know the implications of what you are doing in detail, as it can otherwise cause name conflicts that are very hard to debug.

R - `try` in conjunction with capturing ALL console output?

Here's a piece of code I'm working with:
install.package('BiocManager');BiocManager::install('UniProt.ws')
requireNamespace('UniProt.ws')
uniprot_object <- UniProt.ws::UniProt.ws(
UniProt.ws::availableUniprotSpecies(
pattern = '^Homo sapiens$')$`taxon ID`)
query_results <- try(
UniProt.ws::select(
x = uniprot_object,
keys = 'BAA08084.1',
keytype = 'EMBL/GENBANK/DDBJ',
columns = c('ENSEMBL','UNIPROTKB')))
This particular key/keytype combination is non-productive and produces the following output:
Getting mapping data for BAA08084.1 ... and ACC
error while trying to retrieve data in chunk 1:
no lines available in input
continuing to try
Error in `colnames<-`(`*tmp*`, value = `*vtmp*`) :
attempt to set 'colnames' on an object with less than two dimensions
Of the two [eE]rrors reported only the second is a 'proper' R error object and given the use of try accordingly captured in the variable query_result.
I am, however, desperate to capture the other error bit (no lines available in input) to inform downstream programmatic processes.
After playing with a plethora of capture.output, sink, purrr::quietly, etc. options found by startpaging (googling), I continue to fail capturing that bit. How can I do that?
As #Csd suggested, you could use tryCatch. The message that you are after is printed by the message() function in R, not stop(), so try() will ignore it. To capture output from message(), use code like this:
query_results <- tryCatch(
UniProt.ws::select(
x = uniprot_object,
keys = 'BAA08084.1',
keytype = 'EMBL/GENBANK/DDBJ',
columns = c('ENSEMBL','UNIPROTKB')),
message = function(e) conditionMessage(e))
This will abort evaluation when it gets any message, and return the message in query_results. If you are doing more than debugging, you probably want the message saved, but evaluation to continue. In that case, use withCallingHandlers instead. For example,
saveMessages <- c()
query_results <- withCallingHandlers(
UniProt.ws::select(
x = uniprot_object,
keys = 'BAA08084.1',
keytype = 'EMBL/GENBANK/DDBJ',
columns = c('ENSEMBL','UNIPROTKB')),
message = function(e)
saveMessages <<- c(saveMessages, conditionMessage(e)))
When I run this version, query_results is unchanged (because the later error aborted execution), but the messages are saved:
saveMessages
[1] "Getting mapping data for BAA08084.1 ... and ACC\n"
[2] "error while trying to retrieve data in chunk 1:\n no lines available in input\ncontinuing to try\n"
Based on #user2554330 s most excellent answer, I constructed an ugly thing that does exactly what I want:
try to execute the statement
don't fail fatally
leave no ugly messages
allow me access to errors and messages
So here it is in all it's despicable glory:
saveMessages <- c()
query_results <- suppressMessages(
withCallingHandlers(
try(
UniProt.ws::select(
x = uniprot_object,
keys = 'BAA08084.1',
keytype = 'EMBL/GENBANK/DDBJ',
columns = c('ENSEMBL','UNIPROTKB')),
silent = TRUE),
message = function(e)
saveMessages <<- c(saveMessages, conditionMessage(e))))

Using 'ignore' argument in hunspell function

I'm attempting to exclude some words when running hunspell_check on a text block in Rstudio.
ignore_me <- c("Daniel")
hunspell_check(unlist(some_text), ignore = ignore_me, dict = dictionary("en_GB"))
However, whenever I run I get the following error:
Error in hunspell_check(unlist(some_text, dict = dictionary("en_GB"), :
unused argument (ignore = ignore_me))
I've had a look around SO and trawled the documenation but am struggling to figure what's gone wrong.
It looks like you’ve missed a closing bracket after some_text, so it’s passinng ignore as an argument to unlist() rather than hunspell_check().
UPDATE: Ok, I think you were looking at an old version of the documentation. At least that's what I did at first (https://www.rdocumentation.org/packages/hunspell/versions/1.1/topics/hunspell_check). In the current version, 2.9, ignore is no longer an argument for hunspell_check(). Instead, use add_words in the call to dictionary():
library(hunspell)
some_text <- list("hello", "there", "Daniell")
hunspell_check(unlist(some_text), dict = dictionary("en_GB"))
# [1] TRUE TRUE FALSE
ignore_me <- "Daniell"
hunspell_check(unlist(some_text), dict = dictionary("en_GB", add_words = ignore_me))
# [1] TRUE TRUE TRUE

Get function's title from documentation

I would like to get the title of a base function (e.g.: rnorm) in one of my scripts. That is included in the documentation, but I have no idea how to "grab" it.
I mean the line given in the RD files as \title{} or the top line in documentation.
Is there any simple way to do this without calling Rd_db function from tools and parse all RD files -- as having a very big overhead for this simple stuff? Other thing: I tried with parse_Rd too, but:
I do not know which Rd file holds my function,
I have no Rd files on my system (just rdb, rdx and rds).
So a function to parse the (offline) documentation would be the best :)
POC demo:
> get.title("rnorm")
[1] "The Normal Distribution"
If you look at the code for help, you see that the function index.search seems to be what is pulling in the location of the help files, and that the default for the associated find.packages() function is NULL. Turns out tha tthere is neither a help fo that function nor is exposed, so I tested the usual suspects for which package it was in (base, tools, utils), and ended up with "utils:
utils:::index.search("+", find.package())
#[1] "/Library/Frameworks/R.framework/Resources/library/base/help/Arithmetic"
So:
ghelp <- utils:::index.search("+", find.package())
gsub("^.+/", "", ghelp)
#[1] "Arithmetic"
ghelp <- utils:::index.search("rnorm", find.package())
gsub("^.+/", "", ghelp)
#[1] "Normal"
What you are asking for is \title{Title}, but here I have shown you how to find the specific Rd file to parse and is sounds as though you already know how to do that.
EDIT: #Hadley has provided a method for getting all of the help text, once you know the package name, so applying that to the index.search() value above:
target <- gsub("^.+/library/(.+)/help.+$", "\\1", utils:::index.search("rnorm",
find.package()))
doc.txt <- pkg_topic(target, "rnorm") # assuming both of Hadley's functions are here
print(doc.txt[[1]][[1]][1])
#[1] "The Normal Distribution"
It's not completely obvious what you want, but the code below will get the Rd data structure corresponding to the the topic you're interested in - you can then manipulate that to extract whatever you want.
There may be simpler ways, but unfortunately very little of the needed coded is exported and documented. I really wish there was a base help package.
pkg_topic <- function(package, topic, file = NULL) {
# Find "file" name given topic name/alias
if (is.null(file)) {
topics <- pkg_topics_index(package)
topic_page <- subset(topics, alias == topic, select = file)$file
if(length(topic_page) < 1)
topic_page <- subset(topics, file == topic, select = file)$file
stopifnot(length(topic_page) >= 1)
file <- topic_page[1]
}
rdb_path <- file.path(system.file("help", package = package), package)
tools:::fetchRdDB(rdb_path, file)
}
pkg_topics_index <- function(package) {
help_path <- system.file("help", package = package)
file_path <- file.path(help_path, "AnIndex")
if (length(readLines(file_path, n = 1)) < 1) {
return(NULL)
}
topics <- read.table(file_path, sep = "\t",
stringsAsFactors = FALSE, comment.char = "", quote = "", header = FALSE)
names(topics) <- c("alias", "file")
topics[complete.cases(topics), ]
}

Resources