Automate Response at Prompt in R interactive - r

Please see below my reference to a previous question asked along these lines.
I am running the library taxize in R. Taxize includes a function for getting a stable number associated with a scientific name, get_tsn().
I can run this in interactive mode or non-interactive mode so that I am either
prompted or not, respectively, to choose among multiple hits.
Interactive:
> tax.num <- get_tsn("Acer rubrum", ask=TRUE)
Retrieving data for taxon 'Acer rubrum'
tsn target commonNames nameUsage
1 28728 Acer rubrum red maple accepted
2 28730 Acer rubrum ssp. drummondii NA not accepted
3 526853 Acer rubrum var. drummondii Drummond's maple accepted
...
More than one TSN found for taxon 'Acer rubrum'!
Enter rownumber of taxon (other inputs will return 'NA'):
Non-interactive:
> tax.num <- get_tsn("Acer rubrum", ask=TRUE)
Retrieving data for taxon 'Acer rubrum'
Warning message:
> 1 result; no direct match found
I need to run this library in interactive mode so that I do not get an empty result when there is more than one match. However, babysitting this script is totally unrealistic for the size of my data, which are in the millions of scientific names. Thus, I want to automate a response to the prompt so that the answer is always 1. This will be the right answer for probably 99% of cases and will ultimately still lead to the right answer downstream in 100% of cases for reasons that are probably beyond the scope of this question.
Thus, how can I automate the response to always be 1?
I looked at this question and tried modifying my code accordingly.
options(httr_oauth_cache=T)
tax.num <- get_tsn("Acer rubrum",ask=T)
However, this gave the same result shown for interactive mode above.
Your help is appreciated.

UPDATE: Ignore below. Obviously Nathan Werth posted the best answer in a comment above.
tax.num <- get_tsn_(searchterm = "Acer rubrum", rows = 1)
works wonderfully!
...
I decided to modify the source code to handle this. I suspect that there is a more desirable solution, but this one meets my needs.
Thus, in the file get_tsn.R from the source, I replaced the following block of code
# prompt
message("\n\n")
print(tsn_df)
message("\nMore than one TSN found for taxon '", x, "'!\n
Enter rownumber of taxon (other inputs will return 'NA'):\n")
# prompt
take <- scan(n = 1, quiet = TRUE, what = 'raw')
with
take <- 1
I could have deleted other echoing to screen bits, that are unnecessary and now not true.
The revised function, which I tested using trace("get_tsn",edit=TRUE), returns as follows:
> print(tax.num)
[1] "28728"
attr(,"match")
[1] "found"
attr(,"multiple_matches")
[1] TRUE
attr(,"pattern_match")
[1] FALSE
attr(,"uri")
[1] "http://www.itis.gov/servlet/SingleRpt/SingleRpt?
search_topic=TSN&search_value=28728"
attr(,"class")
[1] "tsn"
I will recompile and install it on Linux now with the edit for use with this particular project.
I still welcome other, better answers.

Related

Accessing API with for-loop randomly has encoding error, which breaks loop in R

I'm trying to access an API from iNaturalist to download some citizen science data. I'm using the package rinat to get this done (see vignette). The loop below is, essentially, pulling all observations for one species, in one state, in one year iteratively on a per-month basis, then summing the number of observations for that year (input parameters subset from my actual script for convenience).
require(rinat)
state_ids <- c(18, 14)
bird_ids <- c(14886,1409)
months <- c(1:12)
final_nums <- vector()
for(i in 1:length(state_ids)){
total_count <- vector()
for(j in 1:length(months)){
monthly <- get_inat_obs(place_id=state_ids[i],
taxon_id=bird_ids[i],
year=2019,
month = months[j])
total_count <- append(total, length(monthly$scientific_name))
print(paste("done with month", months[j], "in state", state_ids[i]))
}
final_nums <- append(final_nums, sum(total_count))
print(paste("done with state", state_ids[i]))
}
Occasionally, and seemingly randomly, I get the following error:
No encoding supplied: defaulting to UTF-8.
Error in if (!x$headers$`content-type` == "text/csv; charset=utf-8") { :
argument is of length zero
This ends up breaking the loop or makes the loop run without actually pulling any real data. Is this an issue with my script, or the API, or something else? I've tried manually supplying encoding information to the get_inat_obs() function, but it doesn't accept that as an argument. Thank you in advance!
I don't believe this is an error in your script. The issue is with the api most likely.
the error argument is of length zero is a common error when you try to make a comparison that has no length. For example:
if(logical(0) == "TEST") print("WORKED!!")
#Error in if (logical(0) == "TEST") print("WORKED!!") :
# argument is of length zero
I did some a few greps on their source code to see where this if statement is and it seems to be within inat_handle line 211 in get_inate_obs.R
This would suggest that the authors did not expect for
!x$headers$`content-type` == 'text/csv; charset=utf-8'
to evaluate to logical(0), but more specifically
x$headers$`content-type`
to be NULL.
I would suggest making a bug report on their GitHub and recommend they change the specified line to:
if(is.null(x$headers$`content-type`) || !x$headers$`content-type` == 'text/csv; charset=utf-8'){
Suggesting a bug is usually more well received if you have a reproducible example.
Also, you could totally make this change yourself locally by cloning out the git repository, editing the file, rebuild the package, and then confirm if you no longer get an error in your code.

automate input to user query in R

I apologize if this question has been asked with terminology I don't recognize but it doesn't appear to be.
I am using the function comm2sci in the library taxize to search for the scientific names for a database of over 120,000 rows of common names. Here is a subset of 10:
commnames <- c("WESTERN CAPERCAILLIE", "AARDVARK", "AARDWOLF", "ABACO ISLAND BOA",
"ABBOTT'S DAY GECKO", "ABDIM'S STORK", "ABRONIA GRAMINEA", "ABYSSINIAN BLUE
WINGED GOOSE",
"ABYSSINIAN CAT", "ABYSSINIAN GROUND HORNBILL")
When searching with the NCBI database in this function, it asks for user input if the common name is generic/general and not species specific, for example the following call will ask for clarification for "AARDVARK" by entering '1', '2' or 'return' for 'NA'.
install.packages("taxize")
library(taxize)
ncbioutput <- comm2sci(commnames, db = "ncbi")###querying ncbi database
Because of this, I cannot rely on this function to find the names of the 120000 species without me sitting and entering 'return' every few minutes. I know this question sounds taxize specific - but I've had this situation in the past with other functions as well. My question is: is there a general way to place the comm2sci call in a conditional statement that will return a specific value when user input is prompted? Or otherwise write a function that will return some input when prompted?
All searches related to this tell me how to ask for user input but not how to override user queries. These are two of the question threads I've found, but I can't seem to apply them to my situation: Make R wait for console input?, Switch R script from non-interactive to interactive
I hope this was clear. Thank you very much for your time!
So the get_* functions, used internally, all by default ask for user input when there is > 1 option. But, all of those functions have a sister function with an underscore, e.g., get_uid_ that do not prompt for input, and return all data. You can use that to get all the data, then process however you like.
Made some changes to comm2sci, so update first: devtools::install_github("ropensci/taxize")
Here's an example.
library(taxize)
commnames <- c("WESTERN CAPERCAILLIE", "AARDVARK", "AARDWOLF", "ABACO ISLAND BOA",
"ABBOTT'S DAY GECKO", "ABDIM'S STORK", "ABRONIA GRAMINEA",
"ABYSSINIAN BLUE WINGED GOOSE",
"ABYSSINIAN CAT", "ABYSSINIAN GROUND HORNBILL")
Then use get_uid_ to get all data
ids <- get_uid_(commnames)
Process the results in ids as you like. Here, for brevity, we'll just grab first row of each
ids <- lapply(ids, function(z) z[1,])
Then grab the uid's out
ids <- as.uid(unname(vapply(ids, "[[", "", "uid")), check = FALSE)
And pass to comm2sci
comm2sci(ids)
$`100830`
[1] "Tetrao urogallus"
$`9818`
[1] "Orycteropus afer"
$`9680`
[1] "Proteles cristatus"
$`51745`
[1] "Chilabothrus exsul"
$`8565`
[1] "Gekko"
$`39789`
[1] "Ciconia abdimii"
$`278977`
[1] "Abronia graminea"
$`8865`
[1] "Cyanochen cyanopterus"
$`9685`
[1] "Felis catus"
$`153643`
[1] "Bucorvus abyssinicus"
Note that NCBI returns common names from get_uid/get_uid_, so you can just go ahead and pluck those out if you want

Writing help information for user defined functions in R

I frequently use user defined functions in my code.
RStudio supports the automatic completion of code using the Tab key. I find this amazing because I always can read quickly what is supposed to go in the (...) of functions/calls.
However, my user defined functions just show the parameters, no additional info and obviously, no help page.
This isn't so much pain for me but I would like to share code I think it would be useful to have some information at hand besides the #coments in every line.
Nowadays, when I share, my lines usually look like this
myfun <- function(x1,x2,x3,...){
# This is a function for this and that
# x1 is a factor, x2 is an integer ...
# This line of code is useful for transformation of x2 by x1
some code here
# Now we do this other thing
more code
# This is where the magic happens
return (magic)
}
I think this line by line comment is great but I'd like to improve it and make some things handy just like every other function.
Not really an answer, but if you are interested in exploring this further, you should start at the rcompgen-help page (although that's not a function name) and also examine the code of:
rc.settings
Also, executing this allows you to see what the .CompletionEnv has in it for currently loaded packages:
names(rc.status())
#-----
[1] "attached_packages" "comps" "linebuffer" "start"
[5] "options" "help_topics" "isFirstArg" "fileName"
[9] "end" "token" "fguess" "settings"
And if you just look at:
rc.status()$help_topics
... you see the character items that the tab-completion mechanism uses for matching. On my machine at the moment there are 8881 items in that vector.

`data.table` error: "reorder received irregular lengthed list" in setkey

I have a fairly basic data.table in R, with 250k rows and 90 columns. I am trying to key the data.table on one of the columns which is of class character. When I call:
setkey(my.dt,my.column)
I receive the following cryptic error message:
"Error in setkeyv(x, cols, verbose=verbose) :
reorder received irregular lengthed list"
I have found a source-code commit with this message, but can't quite decipher what it means. My key column contains no NA or blank values, seems perfectly reasonable to look at (it contains stock tickers), and behaves well with the default order() command.
Even more frustrating, the following code completes correctly:
first.dt <- my.dt[1:100000]
setkey(first.dt,my.column)
second.dt <- my.dt[100001:nrow(my.dt]
setkey(second.dt,my.column)
I have no idea what could be going on here. Any tips?
Edit 1: I have confirmed every value in the key fits a fairly standard format:
> length(grep("[A-Z]{3,4}\\.[A-Z]{2}",my.dt$my.column)) == nrow(my.dt)
[1] TRUE
Edit 2: My system info is below (note that I'm actually using Windows 7). I am using data.table version 1.8.
> Sys.info()
sysname release version nodename machine login
"Windows" "Server 2008 x64" "build 7600" "WIN-9RH28AH0CKG" "x86-64" "Administrator"
user effective_user
"Administrator" "Administrator"
Please run :
sapply(my.dt, length)
I suspect that one or more columns have a different length to the first column, and that's an invalid data.table. It won't be one of the first 5 because your .Internal(inspect(my.dt)) (thanks) shows those and they're ok.
If so, there is this bug fix in v1.8.1 :
o rbind() of DT with an irregular list() now recycles the list items
correctly, #2003. Test added.
Any chance there's an rbind() at an earlier point to create my.dt together with an irregular lengthed list? If not, please step through your code running the sapply(my.dt,length) to see where the invalidly lengthed column is being created. Armed with that we can make a work around and also fix the potential bug. Thanks.
EDIT :
The original cryptic error message is now improved in v1.8.1, as follows :
DT = list(a=6:1,b=4:1)
setattr(DT,"class",c("data.table","data.frame"))
setkey(DT,a)
Error in setkeyv(x, cols, verbose = verbose) :
Column 2 is length 4 which differs from length of column 1 (6). Invalid
data.table. Check NEWS link at top of ?data.table for latest bug fixes. If
not already reported and fixed, please report to datatable-help.
NB: This method to create a data.table is not recommended because it lets you create an invalid data.table. Unless, you are really sure the list is regular and you really do need speed (i.e. for speed you want to avoid the checks that as.data.table() and data.table() do), or you need to demonstrate an invalid data.table, as I'm doing here.

R: How can I disable truncation of listing of package functions?

How can I list all of the results that used to occur when typing packageName<tab>, i.e. the full list offered via auto-completion? In R 2.15.0, I get the following for Matrix::<tab>:
> library(Matrix)
> Matrix::
Matrix::.__C__abIndex Matrix::.__C__atomicVector Matrix::.__C__BunchKaufman Matrix::.__C__CHMfactor Matrix::.__C__CHMsimpl
Matrix::.__C__CHMsuper Matrix::.__C__Cholesky Matrix::.__C__CholeskyFactorization Matrix::.__C__compMatrix Matrix::.__C__corMatrix
Matrix::.__C__CsparseMatrix Matrix::.__C__dCHMsimpl Matrix::.__C__dCHMsuper Matrix::.__C__ddenseMatrix Matrix::.__C__ddiMatrix
Matrix::.__C__denseLU Matrix::.__C__denseMatrix Matrix::.__C__dgCMatrix Matrix::.__C__dgeMatrix Matrix::.__C__dgRMatrix
Matrix::.__C__dgTMatrix Matrix::.__C__diagonalMatrix Matrix::.__C__dMatrix Matrix::.__C__dpoMatrix Matrix::.__C__dppMatrix
Matrix::.__C__dsCMatrix Matrix::.__C__dsparseMatrix Matrix::.__C__dsparseVector Matrix::.__C__dspMatrix Matrix::.__C__dsRMatrix
Matrix::.__C__dsTMatrix Matrix::.__C__dsyMatrix Matrix::.__C__dtCMatrix Matrix::.__C__dtpMatrix Matrix::.__C__dtrMatrix
Matrix::.__C__dtRMatrix Matrix::.__C__dtTMatrix Matrix::.__C__generalMatrix Matrix::.__C__iMatrix Matrix::.__C__index
Matrix::.__C__isparseVector Matrix::.__C__ldenseMatrix Matrix::.__C__ldiMatrix Matrix::.__C__lgCMatrix Matrix::.__C__lgeMatrix
Matrix::.__C__lgRMatrix Matrix::.__C__lgTMatrix Matrix::.__C__lMatrix Matrix::.__C__lsCMatrix Matrix::.__C__lsparseMatrix
[...truncated]
That [...truncated] message is irritating and I want to produce the full listing. Which option/flag/knob/configuration/incantation do I need to invoke in order to avoid the truncation? I have this impression that I used to see the full list, but not anymore - perhaps that was on a different OS (e.g. Linux).
I know that ls("package:Matrix") is one useful approach, but it is not the same as setting an option, and the list is different.
Unfortunately, on Windows, it looks like this behavior is hard-wired into the C code used to construct the console. So the answer seems to be that "no, you can't disable it" (at least not without modifying the sources and then recompiling R from scratch).
Here are the relevant lines from $RHOME/src/gnuwin32/console.c:
909 static void performCompletion(control c)
910 {
911 ConsoleData p = getdata(c);
912 int i, alen, alen2, max_show = 10, cursor_position = p->c - prompt_wid;
...
...
1001 if (alen > max_show)
1002 consolewrites(c, "\n[...truncated]\n");
You are correct that on some other platforms, all of the results are printed out. (I often use Emacs, for instance, and it pops all results of tab completion up in a separate buffer).
As an interesting side note, rcompgen, the backend that actually performs the tab-completion (as opposed to printing results to the console) does always find all completions. It's just that Windows doesn't then print them out for us to see.
You can verify that this happens even on Windows by typing:
library(Matrix)
Matrix::
## Then type <TAB> <TAB>
## Then type <RET>
rc.status() ## Careful not to use tab-completion to complete rc.status !
matches <- rc.status()$comps
length(matches) # -> 288
matches # -> lots of symbols starting with 'Matrix::'
For more details about about the backend, and the functions and options that control its behavior, see ?rcompgen.

Resources