Using 'ignore' argument in hunspell function - r

I'm attempting to exclude some words when running hunspell_check on a text block in Rstudio.
ignore_me <- c("Daniel")
hunspell_check(unlist(some_text), ignore = ignore_me, dict = dictionary("en_GB"))
However, whenever I run I get the following error:
Error in hunspell_check(unlist(some_text, dict = dictionary("en_GB"), :
unused argument (ignore = ignore_me))
I've had a look around SO and trawled the documenation but am struggling to figure what's gone wrong.

It looks like you’ve missed a closing bracket after some_text, so it’s passinng ignore as an argument to unlist() rather than hunspell_check().
UPDATE: Ok, I think you were looking at an old version of the documentation. At least that's what I did at first (https://www.rdocumentation.org/packages/hunspell/versions/1.1/topics/hunspell_check). In the current version, 2.9, ignore is no longer an argument for hunspell_check(). Instead, use add_words in the call to dictionary():
library(hunspell)
some_text <- list("hello", "there", "Daniell")
hunspell_check(unlist(some_text), dict = dictionary("en_GB"))
# [1] TRUE TRUE FALSE
ignore_me <- "Daniell"
hunspell_check(unlist(some_text), dict = dictionary("en_GB", add_words = ignore_me))
# [1] TRUE TRUE TRUE

Related

R S3 generic methods set to visible FALSE

I have an R object lf which is an element of the class tbl_lazy:
library(dbplyr)
lf <- lazy_frame(a = TRUE, b = 1, c = 2, d = "z", con = simulate_hana())
>class(lf)
[1] "tbl_HDB" "tbl_lazy" "tbl"
With the help of the sloop package, I can see that the generic function print.tbl_lazy is set to visible = FALSE. This seems to be the reason why printing print.tbl_lazy returns Error: object 'print.tbl_lazy' not found.
generic class visible source
<chr> <chr> <lgl> <chr>
11 print tbl_lazy FALSE registered S3method
When I debug print I see the call to print.lazy and can now see the content of print.tbl_lazy.
debugging in: function (x, ...)
UseMethod("print")(x)
debug: UseMethod("print")
Browse[2]> n
debugging in: print.tbl_lazy(x)
debug: {
show_query(x)
}
My question is why are all the methods of the class tbl_lazy set to visible = FALSE and what are the consequences of this? It would appear to me, while it may have some advantages, whatever they might be, it makes the code of the method more difficult to access, which in a language like R, used by so many non technical users, seems to be a big disadvantage.
I wasn't able to find any documentation on this.

R-How dos the pos() function work for parts-of-speech tagging

I'm new to R and confused with the way the pos() function works. Here's why:
Example:
library(qdap)
s1<-c("Hello World")
pos(s1)
This produces the correct output saying the word count
wrd.cnt - 2
NN -1(50%)
UH-1(50%)
whereas the following to operations throws errors:
s2<-"Hello"
pos(s2)
Error in apply(pro, 2, paster, digits = digits, symbol = s.ymb, override = override) :
dim(X) must have a positive length
s3<-c("Hello Hello")
pos(s3)
Error in apply(pro, 2, paster, digits = digits, symbol = s.ymb, override = override) :
dim(X) must have a positive length
I'm not able to understand why this is caused.
You have found a bug in this version of qdap cause by not using drop = FALSE while indexing.
The dev version will behave as expected. You can download it easily with this code:
if (!require("pacman")) install.packages("pacman"); library(pacman)
p_install_gh("trinker/qdap")
The following has been added to the NEWS file as well:
pos threw an error if only one word was passed to text.var. Fix:
drop = FALSE has been added to data frame indexing. Caught by
StackOverflow user G_1991 R-How dos the pos() function work for parts-of-speech tagging.
Here's the updated output:
library(qdap)
s1<-c("Hello World")
pos(s1)
## wrd.cnt NN UH
## 1 2 1(50%) 1(50%)
s2<-"Hello"
pos(s2)
## wrd.cnt UH
## 1 1 1(100%)

How to get the queue number from CONDOR into your R job

I think I have a simple problem because I was looking up and down the internet and couldn't find someone else asking this question:
My university has a Condor set-up. I want to run several repetitions of the same code (e.g. 100 times). My R code has a routine to store the results in a file, i.e.:
write.csv(res, file=paste(paste(paste(format(Sys.time(), '%y%m%d'),'res', queue, sep="_"), sep='/'),'.csv',sep='',collapse=''))
res are my results (a data.frame), I indicate that this file contains the results with 'res' and finally I want to add the queue number of this calculation (otherwise files would be replaced, wouldn't they?). It should look like: 140109_res_1.csv, 140109_res_2.csv, ...
My submit file to condor looks like this:
universe = vanilla
executable = /usr/bin/R
arguments = --vanilla
log = testR.log
error = testR.err
input = run_condor.r
output = testR$(Process).txt
requirements = (opsys == "LINUX") && (arch == "X86_64") && (HAS_R_2_13 =?= True)
request_memory = 1000
should_transfer_files = YES
transfer_executable = FALSE
when_to_transfer_output = ON_EXIT
queue 3
I wonder how do I get the 'queue' number into my R code? I tried a simple example with
print(queue)
print(Queue)
But there is no object found called queue or Queue. Any suggestions?
Best wishes,
Marco
Okay, I solved the problem. This is how it goes:
I had to change my submit file. I changed the slot arguments to:
arguments = --vanilla --args $(Process)
Now the process number is forwarded to the R code. There you retrieve it with the following line. The value will be stored as a character. Therefore, you should convert it to a numeric value (also check whether a number like 10 is passed on as '1' and '0' in which case you should also collapse the values).
run <- commandArgs(TRUE)
Here is an example of the code I let run.
> run <- commandArgs(TRUE)
> run
[1] "0"
> class(run)
[1] "character"
> try(as.numeric(run))
[1] 0
> try(run <- as.numeric(paste(run, collapse='')) )
> try(print(run))
[1] 0
> try(write(run, paste(run,'csv', sep='.')))
You can also find information how to pass on variables/arguments to your code here: http://research.cs.wisc.edu/htcondor/manual/v7.6/condor_submit.html
I hope this helps anyone.
Cheers and thanks for all other commenters!
Marco

error-safe templating with brew / whisker

An external program needs an input file with some control parameters, and I wish to generate those automatically using R. Usually, I simply use paste("parameter1: ", param1, ...) to create the long string of text, and output to a file, but the script rapidly becomes unreadable. This problem is probably well suited to whisker,
library(whisker)
template= 'Hello {{name}}
You have just won ${{value}}!
'
data <- list( name = "Chris", value= 124)
whisker.render(template, data)
My issue here, is that there's no safe-checking that data contains all the required variables, e.g.
whisker.render(template, data[-1])
will silently ignore the fact that I forgot to specify a name. My end-program will crash, however, if I fail to produce a complete config file.
Another templating system is provided by brew; it has the advantage of actually evaluating things, and potentially this can also help detect missing variables,
library(brew)
template2 = 'Hello <%= name %>
You have just won $<%= value %>!
'
data <- list( name = "Chris", value= 124)
own_brew <- function(template, values){
attach(values, pos=2)
out = capture.output(brew(text = template))
detach(values, pos=2)
cat(out, sep='\n')
invisible(out)
}
own_brew(template2, data)
own_brew(template2, data[-1]) # error
However, I am stuck with two issues:
attach() ... detach() is not ideal, (gives warnings every now and then), or at least I don't know how to use it properly. I tried to define an environment for brew(), but it was too restrictive and didn't know about base functions anymore...
even though an error occurs, a string is still returned by the function. I tried to wrap the call in try() but I have no experience in error handling. How do I tell it to quit the function producing no output?
Edit: I have updated the brew solution to use a new environment instead of attach(), and stop execution in case of an error. (?capture.output suggests that it was not the right function to use here, since "An attempt is made to write output as far as possible to file if there is an error in evaluating the expressions"...)
own_brew <- function(template, values, file=""){
env <- as.environment(values)
parent.env(env) <- .GlobalEnv
a <- textConnection("cout", "w")
out <- try(brew(text = template, envir=env, output=a))
if(inherits(out, "try-error")){
close(a)
stop()
}
cat(cout, file=file, sep="\n")
close(a)
invisible(cout)
}
There must be an easier way with tryCatch, but I can't understand a single thing in its help page.
I welcome other suggestions on the more general problem.
Using regular expressions to retrieve the variable names from the template, you could validate before rendering, e.g.,
render <- function(template, data) {
vars <- unlist(regmatches(template, gregexpr('(?<=\\{\\{)[[:alnum:]_.]+(?=\\}\\})', template, perl=TRUE)))
stopifnot(all(vars %in% names(data)))
whisker.render(template, data)
}
render(template, data)
The new glue package provides another alternative,
library(glue)
template <- 'Hello {name} You have just won ${value}!'
data <- list( name = "Chris", value= 124)
glue_data(template, .x=data)
# Hello Chris You have just won $124!
glue_data(template, .x=data[-1])
# Error in eval(expr, envir, enclos) : object 'name' not found
Since version 1.1.0 (on CRAN 19 Aug, 2016), the stringr package includes the str_interp() function (unfortunately not mentioned in the NEWS file of the release).
template <- "Hello ${name} You have just won $${value}!"
data <- list( name = "Chris", value= 124)
stringr::str_interp(template, data)
[1] "Hello Chris You have just won $124!"
stringr::str_interp(template, data[-1L])
Error in FUN(X[[i]], ...) : object 'name' not found
While preparing the stringr answer I noticed that OP's questions concerning the usage of brew() hasn't been addressed so far. In particular, the OP was asking how to provide his data to the environment and how to prevent a character string being returned in case of an error.
The OP has created a function own_brew() which wraps the call to brew(). Although, there are alternative package available now, I feel the original question deserves an answer.
This is my attempt to improve baptiste's version:
own_brew <- function(template, values, file=""){
a <- textConnection("cout", "w")
out <- brew::brew(text = template, envir=list2env(values), output=a)
close(a)
if (inherits(out, "try-error")) stop()
cat(cout, file=file, sep="\n")
invisible(cout)
}
The main differences are that list2env() is used to pass the list of values to brew() and that the call to try() is avoided by testing the return value out for an error.
template <- "Hello <%= name %> You have just won $<%= value %>!"
data <- list( name = "Chris", value= 124)
own_brew(template, data)
Hello Chris You have just won $124!
own_brew(template, data[-1L])
Error in cat(name) : object 'name' not found
Error in own_brew(template, data[-1L]) :
The new jinjar package will raise an error if a data variable is missing. It also provides the ability to explicitly handle missing values in your template.
library(jinjar)
# if name missing, raise error
template <- 'Hello {{ name }}
You have just won ${{ value }}!'
render(template, value = 124)
#> Error: [inja.exception.render_error] (at 1:10) variable 'name' not found
# if name missing, use default
template <- 'Hello {{ default(name, "world") }}
You have just won ${{ value }}!'
render(template, value = 124)
#> [1] "Hello world\nYou have just won $124!"
# if name missing, skip section
template <- '{% if exists("name") %}Hello {{ name }}{% endif -%}
You have just won ${{ value }}!'
render(template, value = 124)
#> [1] "You have just won $124!"
Created on 2022-06-25 by the reprex package (v2.0.1)

Get function's title from documentation

I would like to get the title of a base function (e.g.: rnorm) in one of my scripts. That is included in the documentation, but I have no idea how to "grab" it.
I mean the line given in the RD files as \title{} or the top line in documentation.
Is there any simple way to do this without calling Rd_db function from tools and parse all RD files -- as having a very big overhead for this simple stuff? Other thing: I tried with parse_Rd too, but:
I do not know which Rd file holds my function,
I have no Rd files on my system (just rdb, rdx and rds).
So a function to parse the (offline) documentation would be the best :)
POC demo:
> get.title("rnorm")
[1] "The Normal Distribution"
If you look at the code for help, you see that the function index.search seems to be what is pulling in the location of the help files, and that the default for the associated find.packages() function is NULL. Turns out tha tthere is neither a help fo that function nor is exposed, so I tested the usual suspects for which package it was in (base, tools, utils), and ended up with "utils:
utils:::index.search("+", find.package())
#[1] "/Library/Frameworks/R.framework/Resources/library/base/help/Arithmetic"
So:
ghelp <- utils:::index.search("+", find.package())
gsub("^.+/", "", ghelp)
#[1] "Arithmetic"
ghelp <- utils:::index.search("rnorm", find.package())
gsub("^.+/", "", ghelp)
#[1] "Normal"
What you are asking for is \title{Title}, but here I have shown you how to find the specific Rd file to parse and is sounds as though you already know how to do that.
EDIT: #Hadley has provided a method for getting all of the help text, once you know the package name, so applying that to the index.search() value above:
target <- gsub("^.+/library/(.+)/help.+$", "\\1", utils:::index.search("rnorm",
find.package()))
doc.txt <- pkg_topic(target, "rnorm") # assuming both of Hadley's functions are here
print(doc.txt[[1]][[1]][1])
#[1] "The Normal Distribution"
It's not completely obvious what you want, but the code below will get the Rd data structure corresponding to the the topic you're interested in - you can then manipulate that to extract whatever you want.
There may be simpler ways, but unfortunately very little of the needed coded is exported and documented. I really wish there was a base help package.
pkg_topic <- function(package, topic, file = NULL) {
# Find "file" name given topic name/alias
if (is.null(file)) {
topics <- pkg_topics_index(package)
topic_page <- subset(topics, alias == topic, select = file)$file
if(length(topic_page) < 1)
topic_page <- subset(topics, file == topic, select = file)$file
stopifnot(length(topic_page) >= 1)
file <- topic_page[1]
}
rdb_path <- file.path(system.file("help", package = package), package)
tools:::fetchRdDB(rdb_path, file)
}
pkg_topics_index <- function(package) {
help_path <- system.file("help", package = package)
file_path <- file.path(help_path, "AnIndex")
if (length(readLines(file_path, n = 1)) < 1) {
return(NULL)
}
topics <- read.table(file_path, sep = "\t",
stringsAsFactors = FALSE, comment.char = "", quote = "", header = FALSE)
names(topics) <- c("alias", "file")
topics[complete.cases(topics), ]
}

Resources