I have just started diving into NLP and want to use hunspell to perform tokenization. However, so far I have not been able to use hunspell properly, since it returns "false" every time I call the function "hunspell_check".
I installed hunspell several times and checked whether the dictionaries are actually present (they are). I also tried other hunspell functions (like "hunspell()"), but they do not work either. Interestingly, I do not get an error message of any kind.
> hunspell_check("work")
[1] FALSE
> dictionary(lang = "en_US")
<hunspell dictionary>
affix: C:\Users\NilsKlähn\Documents\R\win-library\3.6\hunspell\dict\en_US.aff
dictionary: C:\Users\NilsKlähn\Documents\R\win-library\3.6\hunspell\dict\en_US.dic
encoding: ISO8859-1
wordchars: ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyzÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝÞàáâãäåæçèéêëìíîïðñòóôõöøùúûüýþ
added: 0 custom words
I expect the function hunspell_check("work") to return TRUE instead of FALSE, since the word is spelled correctly. The dictionary itself seems to be fine, though.
I am coding in R and, for stability when I have to deploy something, I call every function with the syntax package::function(arguments) to avoid the conflicts that can arise when using a lot of packages. It has helped me a lot over the years.
I know that if is a reserved word, so technically speaking it is impossible (or at least it should be, to my knowledge) for someone to define an object and name it if.
I am also aware that it belongs to the control flow statements (which I think are a different "thing"), and given the previous consideration I am also aware that the following questions might be useless. My purely technical doubts are:
Why does the function class return "function" when I enclose if in backticks?
Why do I get an error without backticks? And last but most important:
Why am I unable to access it via the usual base::if() syntax?
As I said, these are most likely useless questions, but at this point I am curious about the underlying details.
> class(if)
Error: unexpected ')' in "class(if)"
> class(`if`)
[1] "function"
> base::if(T) T
Error: unexpected 'if' in "base::if"
> if(T) T
[1] TRUE
> base::if(`T`) T
Error: unexpected 'if' in "base::if"
if-with-backticks actually returns .Primitive("if")
The R language definition section on "Internal vs Primitive" specifies that .Primitive objects include
“Special functions” which really are language elements, but implemented as primitive functions:
{ ( if for while repeat break next
return function quote switch
The reason that a naked "if" without backticks or base::if don't work is that the "language elements" above are treated as special cases by R's parser. Once you have typed base::, R's parser expects the next symbol to be a regular symbol that can be looked up in the base namespace. base::if, base::for, and base::( all return errors because R does not expect these special elements to occur at this position in the input stream; they are syntactically incorrect.
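As a small sketch of the point above (the variable names result and prim are just for illustration), quoting if with backticks makes the parser treat it as an ordinary symbol, so it can be called like any other function and looked up in the base namespace:

```r
# Backticks make the parser read `if` as a regular symbol instead of a keyword.
# Called as a function, its arguments are: condition, "then" value, "else" value.
result <- `if`(TRUE, "yes", "no")   # same as: if (TRUE) "yes" else "no"
print(result)

# With backticks, the package::name syntax is also accepted by the parser.
prim <- base::`if`
print(is.primitive(prim))
```

The same quoting works for the other language elements listed above, for example base::`for` or base::`(`.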
I was going through swirl() again as a refresher, and I noticed that the author of swirl says the command ?matrix is the correct form for calling up a help screen. But when I run ?matrix(), it still works. Is there a difference between having and not having a pair of parentheses?
It's not specific to the swirl environment (about which I was entirely unaware until 5 minutes ago). That is standard for R. The help page for the ? shortcut says:
Arguments
topic
Usually, a name or character string specifying the topic for which help is sought.
Alternatively, a function call to ask for documentation on a corresponding S4 method: see the section on S4 method documentation. The calls pkg::topic and pkg:::topic are treated specially, and look for help on topic in package pkg.
It is something like the second option that is being invoked with the command:
?matrix()
Since ?? is actually a different shortcut, one needs to use this code to bring up the help page for ?, just as one needs to use quoted strings for help with for, if, next or any of the other reserved words in R:
?'?' # See ?Reserved
This is not based on a "fuzzy logic" search in the help system. Using help instead of ? gets a different response:
> help("str()")
No documentation for ‘str()’ in specified packages and libraries:
you could try ‘??str()’
You can see the full code for the ? function by typing `?` (with the backticks) at the command line. Here I am just showing how it starts the language-level processing of the expressions given to it:
`?`
function (e1, e2)
{
if (missing(e2)) {
type <- NULL
topicExpr <- substitute(e1)
}
#further output omitted
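To illustrate what substitute() does in that first step, here is a minimal sketch. The function show_topic is hypothetical and not part of R's help system; it only mimics the capture of the unevaluated argument:

```r
# Hypothetical helper: capture the argument without evaluating it,
# the way `?` does with substitute(e1).
show_topic <- function(e1) {
  topicExpr <- substitute(e1)
  list(class = class(topicExpr), text = deparse(topicExpr))
}

str(show_topic(matrix))    # a bare name (symbol)
str(show_topic(matrix()))  # an unevaluated function call
```

Because the argument arrives as a language object rather than a value, the help function can tell a name apart from a call and handle each case separately.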
Running matrix, or in general any function name without parentheses, prints the source code of that function.
From their quickstart guide I got this following sample
alert cpu.is.too.high {
template = test
$metric = q("sum:rate{counter,,1}:os.cpu{host=your-system-here}", "1h", "")
$avgcpu = avg($metric)
crit = $avgcpu > 80
warn = $avgcpu > 60
}
I would guess it's a perlish DSL. What is the name of this language?
We just call it "Bosun's expression language", and it is documented at http://bosun.org/expressions.html. As you said, it is a custom DSL. It currently has the following qualities:
It is not imperative. The language itself actually lacks true variables, the "$foo" are just text replacement
It is functional
It is well typed (functions accept and return specific types. Since the DSL is for alerting, we believe it is important to catch as many errors as possible at parse time.)
The implementation of the parser and lexer is based on the guts of text/template. A map function that applies an expression to every item in a series, for an entire seriesSet, is in the works, so the language is still evolving. But I don't think we will change the underlying design choices mentioned above (except maybe to use real variables instead of text replacement at some point).
I don't know what is happening, but I can't seem to add a constant to a vector. For example, typing in the console c(1,2,3,4)+5 returns 15 instead of (6,7,8,9). What am I doing wrong?
Thank you for your help.
Someone.... probably you ... has redefined the "+" function. It's easy to do:
> `+` <- function(x,y) sum(x,y)
> c(1,2,3,4)+5
[1] 15
It's easy to fix. Just use rm():
> rm(`+`)
> c(1,2,3,4)+5
[1] 6 7 8 9
EDIT: The comments (which raised the alternate possibility that c had instead been redefined as sum) are prompting me to add information about how to examine and recover from the alternative possibilities. To determine which of the two functions in the expression c(1,2,3,4) + 5 is the culprit, type their names (with backticks enclosing +) and check whether you get the proper definitions:
> `+`
function (e1, e2) .Primitive("+")
> c
function (..., recursive = FALSE) .Primitive("c")
Using rm on the culprit (the one that doesn't match the output above) remains the quickest solution. Using a global rm is an in-session brainwipe:
rm(list=ls())
# all user defined objects, including user-defined functions will be removed
The advice to quit and restart would not work in some situations. If you quit-with-save, the current function definitions would be preserved. If you had earlier quit-with-save from a session where the redefinition occurred, then not saving in this session would not have fixed the problem, either. The results of a prior session are held in a file named ".RData", and this file is invisible for both Mac and Windows users because the OS file viewer (Mac's Finder.app or MS's Windows Explorer) will not display file names that begin with a "dot". I suspect that Linux users get to see them by default since using ls in a Terminal session will show them. (It's easy to find ways to change that behavior on a Mac, and that is the way I run my device.) Deleting the .RData file is helpful in this instance, as well as in the situation where your R session crashes on startup.
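The check for a redefined function can also be done programmatically. The following is only a sketch: the helper is_masked is a hypothetical name, and a scratch environment e stands in for the workspace so that the example does not touch your real global environment.

```r
# Hypothetical helper: is there an object called `name` defined directly in
# `env` (by default the user's workspace), masking the base definition?
is_masked <- function(name, env = globalenv()) {
  exists(name, envir = env, inherits = FALSE)
}

# Scratch environment used in place of the workspace for this demonstration.
e <- new.env()
assign("+", function(x, y) sum(x, y), envir = e)

print(is_masked("+", e))                          # the redefined operator
print(is_masked("c", e))                          # c was not redefined
print(identical(get("+", envir = e), base::`+`))  # differs from the primitive

# Removing the culprit restores the lookup of the base version.
rm(list = "+", envir = e)
print(is_masked("+", e))
```

In a real session you would call is_masked("+") and is_masked("c") with the default environment and then rm() whichever one is reported as masked.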
I am trying to do the following:
try(htmlParse(ip[1], T),
where I define ip[1] as:
ip[1] = paste('http://en.wikipedia.org/wiki/George_Clooney')
I want to check whether htmlParse worked or not. For many names in my list there will be no Wikipedia page, so I need to be able to check for that and replace ip[1] with NA if the wiki page does not exist.
Can someone please advise how I can do that? I tried using the command geterrmessage(), but I am not sure how to flush it every time I change the name of the celebrity.
Currently I have the following:
if(!isTRUE(as.logical(grep(ip[1],err)))) {
ip[1] = NA
}
else {
This is definitely incorrect, as it is not running the logical statement I want.
Thanks
Amar
This simple example should help you out, I think:
res <- try(log("a"),silent = TRUE)
class(res) == "try-error"
[1] TRUE
The basic idea is that try returns (invisibly) an object of class "try-error" when there's an error. Otherwise, res will contain the result of the expression you pass to try, i.e.
res <- try(log(2),silent = TRUE)
res
[1] 0.6931472
Spend some time reading ?try carefully, including the examples (which aren't as simple as they could be, I guess). As GSee notes below, a more idiomatic way to check if an error is thrown is to use inherits(res,'try-error').
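Applied to the question, the pattern might look like the sketch below. The helper value_or_na is a hypothetical name, and log stands in for htmlParse so that the example runs without the XML package or a network connection:

```r
# Hypothetical helper: call f(x) and return NA instead of stopping on error.
value_or_na <- function(f, x) {
  res <- try(f(x), silent = TRUE)
  if (inherits(res, "try-error")) NA else res
}

# log() is only a stand-in for htmlParse(): it fails on character input.
inputs  <- list(1, "a", exp(2))
results <- lapply(inputs, function(x) value_or_na(log, x))
str(results)
```

Since each call to try produces its own result object, there is no need to inspect or reset geterrmessage() between iterations.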
I would try to download all the names (existing or not) from the wiki and save them in separate files. I would then grep for the string "Wikipedia does not have an article with this exact name", and for the non-existing ones I would get a TRUE value. In this way I believe you can tell whether the parser worked or the name didn't exist. Additionally, you can sort the downloaded files by size in case you suspect that something went wrong. Corrupted ones have a smaller size.
Additionally, I would use the tryCatch function to handle the error case. Note that the error argument must be a handler function:
x<-3
tryCatch(x>5,error=function(e) print("this is an error"))
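As a slightly fuller sketch of tryCatch, the handlers can return a fallback value directly. The function safe_log is a hypothetical name used only for illustration:

```r
# Hypothetical helper: return NA_real_ when log() signals a warning or an error.
safe_log <- function(x) {
  tryCatch(
    log(x),
    warning = function(w) NA_real_,  # e.g. log(-1) warns "NaNs produced"
    error   = function(e) NA_real_   # e.g. log("a") is an error
  )
}

print(safe_log(1))
print(safe_log(-1))
print(safe_log("a"))
```

The value returned by whichever handler runs becomes the value of the whole tryCatch call, so no separate class check is needed afterwards.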
Here's a function that evaluates an expression and returns TRUE if it works and FALSE if it doesn't. You can also assign variables inside the expression.
try_catch <- function(exprs) {!inherits(try(eval(exprs)), "try-error")}
try_catch(out <- log("a")) # returns FALSE
out # Error: object 'out' not found
try_catch(out <- log(1)) # returns TRUE
out # out = 0