Get function's title from documentation - r

I would like to get the title of a base function (e.g. rnorm) in one of my scripts. It is included in the documentation, but I have no idea how to "grab" it.
I mean the line given in the Rd files as \title{}, i.e. the top line of the documentation.
Is there a simple way to do this without calling the Rd_db function from tools and parsing all Rd files, which has a very big overhead for such a simple task? I also tried parse_Rd, but:
I do not know which Rd file holds my function,
I have no Rd files on my system (just rdb, rdx and rds).
So a function that parses the (offline) documentation would be best :)
POC demo:
> get.title("rnorm")
[1] "The Normal Distribution"

If you look at the code for help, you see that the function index.search seems to be what pulls in the location of the help files, and that the default for the associated find.package() call is NULL. It turns out that there is no help page for index.search, nor is it exported, so I tested the usual suspects for which package it was in (base, tools, utils), and ended up with utils:
utils:::index.search("+", find.package())
#[1] "/Library/Frameworks/R.framework/Resources/library/base/help/Arithmetic"
So:
ghelp <- utils:::index.search("+", find.package())
gsub("^.+/", "", ghelp)
#[1] "Arithmetic"
ghelp <- utils:::index.search("rnorm", find.package())
gsub("^.+/", "", ghelp)
#[1] "Normal"
What you are asking for is \title{Title}, but here I have shown you how to find the specific Rd file to parse, and it sounds as though you already know how to do that.
EDIT: @Hadley has provided a method for getting all of the help text once you know the package name, so applying that to the index.search() value above:
target <- gsub("^.+/library/(.+)/help.+$", "\\1", utils:::index.search("rnorm",
find.package()))
doc.txt <- pkg_topic(target, "rnorm") # assuming both of Hadley's functions are here
print(doc.txt[[1]][[1]][1])
#[1] "The Normal Distribution"

It's not completely obvious what you want, but the code below will get the Rd data structure corresponding to the topic you're interested in. You can then manipulate that to extract whatever you want.
There may be simpler ways, but unfortunately very little of the needed code is exported and documented. I really wish there was a base help package.
pkg_topic <- function(package, topic, file = NULL) {
  # Find "file" name given topic name/alias
  if (is.null(file)) {
    topics <- pkg_topics_index(package)
    topic_page <- subset(topics, alias == topic, select = file)$file
    if (length(topic_page) < 1)
      topic_page <- subset(topics, file == topic, select = file)$file
    stopifnot(length(topic_page) >= 1)
    file <- topic_page[1]
  }
  rdb_path <- file.path(system.file("help", package = package), package)
  tools:::fetchRdDB(rdb_path, file)
}
pkg_topics_index <- function(package) {
  help_path <- system.file("help", package = package)
  file_path <- file.path(help_path, "AnIndex")
  if (length(readLines(file_path, n = 1)) < 1) {
    return(NULL)
  }
  topics <- read.table(file_path, sep = "\t",
    stringsAsFactors = FALSE, comment.char = "", quote = "", header = FALSE)
  names(topics) <- c("alias", "file")
  topics[complete.cases(topics), ]
}
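Putting the two answers together, here is a minimal sketch of the get.title() helper the question asks for. It is only one possible approach, under these assumptions: help() is used to locate the help page (instead of calling utils:::index.search directly), the name get_title is made up for illustration, and tools:::fetchRdDB is the same unexported function used above, so it may change between R versions.

```r
get_title <- function(topic) {
  # help() returns the path(s) of the matching help page, e.g.
  # ".../library/stats/help/Normal"; keep the first match
  path <- as.character(utils::help(topic))
  if (length(path) == 0) stop("no documentation found for '", topic, "'")
  path <- path[1]

  pkg_dir <- dirname(dirname(path))  # ".../library/<package>"
  pkg     <- basename(pkg_dir)       # "<package>"

  # Unexported helper (assumption: signature unchanged in your R version)
  rd <- unclass(tools:::fetchRdDB(file.path(pkg_dir, "help", pkg), basename(path)))

  # Each top-level element of an Rd object carries its tag in "Rd_tag"
  tags <- vapply(rd, function(x) {
    tag <- attr(x, "Rd_tag")
    if (is.null(tag)) "" else tag
  }, character(1))

  title <- rd[[which(tags == "\\title")[1]]]
  trimws(paste(unlist(title), collapse = ""))
}

get_title("rnorm")
```

Because only one Rd object is fetched from the package's help database, this avoids parsing every Rd file with Rd_db.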

Related

googledrive::drive_mv gives error "Parent specified via 'path' is invalid: x Does not exist"

This is a weird one and I am hoping someone can figure it out. I have written a function that uses googlesheets4 and googledrive. One thing I'm trying to do is move a googledrive document (spreadsheet) from the base folder to a specified folder. I had this working perfectly yesterday so I don't know what happened as it just didn't when I came in this morning.
The weird thing is that if I step through the function, it works fine. It's just when I run the function all at once that I get the error.
I am using a folder ID instead of a name and using drive_find to get the correct folder ID. I am also using a sheet ID instead of a name. The folder already exists and like I said, it was working yesterday.
outFolder <- 'exact_outFolder_name_without_slashes'
createGoogleSheets <- function(
outFolder
){
folder_id <- googledrive::drive_find(n_max = 10, pattern = outFolder)$id
data <- data.frame(Name = c("Sally", "Sue"), Data = c("data1", "data2"))
sheet_id <- NA
nameDate <- NA
tempData <- data.frame()
for (i in 1:nrow(data)){
nameDate <- data[i, "Name"]
tempData <- data[i, ]
googlesheets4::gs4_create(name = nameDate, sheets = list(sheet1 = tempData)
sheet_id <- googledrive::drive_find(type = "spreadsheet", n_max = 10, pattern = nameDate)$id
googledrive::drive_mv(file = as_id(sheet_id), path = as_id(folder_id))
} # end 'for'
} # end 'function'
I don't think this will be a reproducible example. The offending code is within the for loop that is within the function and it works fine when I run through it step by step. folder_id is defined within the function but outside of the for loop. sheet_id is within the for loop. When I move folder_id into the for loop, it still doesn't work although I don't know why it would change anything. These are just the things I have tried. I do have the proper authorization for google drive and googlesheets4 by using:
googledrive::drive_auth()
googlesheets4::gs4_auth(token = drive_token())
<error/rlang_error>
Error in as_parent():
! Parent specified via path is invalid:
x Does not exist.
Backtrace:
global createGoogleSheets(inputFile, outPath, addNames)
googledrive::drive_mv(file = as_id(sheet_id), path = as_id(folder_id))
googledrive:::as_parent(path)
Run rlang::last_trace() to see the full context.
Backtrace:
x
-global createGoogleSheets(inputFile, outPath, addNames)
-googledrive::drive_mv(file = as_id(sheet_id), path = as_id(folder_id))
\-googledrive:::as_parent(path)
\-googledrive:::drive_abort(c(invalid_parent, x = "Does not exist."))
\-cli::cli_abort(message = message, ..., .envir = .envir)
\-rlang::abort(message, ..., call = call, use_cli_format = TRUE)
I have tried changing the folder_id to the exact path of my google drive W:/My Drive... and got the same error. I should mention I have also tried deleting the folder and re-creating it fresh.
Anybody have any ideas?
Thank you in advance for your help!
I can't comment because I don't have the reputation yet, but I believe you're missing a parenthesis in your for-loop.
You need that SECOND parenthesis below:
for (i in 1:nrow(tempData) ) {
...
}
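Separately from the parenthesis, one thing that may be worth ruling out (an assumption, not a confirmed diagnosis) is the lookup by name: drive_find() with a pattern can return zero or several matches, and in either case the id passed on to drive_mv() is not a single valid folder. The sketch below checks the folder lookup up front and reuses the id returned by gs4_create() instead of searching for the new sheet. The function name create_and_move_sheets is made up for illustration, and the code needs an authenticated session to run.

```r
create_and_move_sheets <- function(data, folder_name) {
  # Look the folder up once and fail early if the match is not unique
  folder <- googledrive::drive_find(pattern = folder_name, type = "folder", n_max = 10)
  if (nrow(folder) != 1) {
    stop("Expected exactly one folder matching '", folder_name,
         "', found ", nrow(folder))
  }

  for (i in seq_len(nrow(data))) {
    # gs4_create() returns the id of the new spreadsheet, so there is
    # no need to search for it by name afterwards
    ss <- googlesheets4::gs4_create(
      name = data[i, "Name"],
      sheets = list(sheet1 = data[i, ])
    )
    googledrive::drive_mv(file = ss, path = googledrive::as_id(folder$id))
  }
  invisible(NULL)
}
```

If the stop() fires when the function is run in one go but not when stepping through, that would point at the folder lookup rather than at drive_mv() itself.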

R askYesNo function: print a variable which is a list + strings

I'm a complete newbie in programming with R and stuck on what I believe is actually a very simple question. I've borrowed some code snippets and put them together, and everything seems to work, except that printing the names of the packages that have to be installed from GitHub doesn't.
How can I print a variable which is a list together with strings in the askYesNo function? I tried {} and [], doubled them, and tried "" and .format as in Python; nothing worked.
My code follows, please help :)
not_installed = my_packages[!(my_packages %in% installed.packages()[ , "Package"])]
if(length(not_installed)) install.packages(not_installed)
if(length(not_installed != installed.packages()))
still_not_installed = list(not_installed)
Ask = askYesNo("$still_not_installed + cannot be install from CRAN. \n Load from GitHub?",
default = TRUE, prompts = getOption("askYesNo"), gettext(c("Yes", "No", "Cancel")))
if(Ask == TRUE)
p_load_gh("muschellij2/aal", "taiyun/corrplot/blob/master/R/corrplot-package.R",
install = TRUE, dependencies = TRUE)
Do you think this is a proper solution to search for not installed packages and load them?
Your approach to checking for installed packages is not ideal in itself. It will not detect if a package is missing dependencies, for example. We can use require, which automatically checks whether the package is actually usable.
Then we can just build the message with paste.
I assume you will also paste the package into your p_load_gh function, but I don't know the syntax of that particular one.
my_packages <- c("test", "test2")
for (p in my_packages) {
  # require() returns FALSE (with a warning) if the package cannot be loaded
  test <- suppressWarnings(require(p, character.only = TRUE))
  if (!test) {
    print(paste("Package", p, "not found. Installing Package!"))
    install.packages(p)
  }
  test <- suppressWarnings(require(p, character.only = TRUE))
  if (!test) {
    Ask = askYesNo(
      paste("Package", p, "not installable from CRAN. \n Load from GitHub?"),
      default = TRUE,
      prompts = getOption("askYesNo", gettext(c("Yes", "No", "Cancel")))
    )
    # askYesNo() returns NA on "Cancel", so test with isTRUE()
    if (isTRUE(Ask)) p_load_gh("muschellij2/aal", "taiyun/corrplot/blob/master/R/corrplot-package.R",
      install = TRUE, dependencies = TRUE)
  }
}
You can build a message string with paste. toString will nicely concatenate and comma-separate a vector, and then we can paste that on to the rest of your message:
Ask = askYesNo(
  msg = paste(toString(still_not_installed),
              "cannot be installed from CRAN. \n Load from GitHub?"),
  default = TRUE,
  prompts = getOption("askYesNo", gettext(c("Yes", "No", "Cancel")))
)
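To see what the message looks like before it reaches askYesNo, here is a small sketch of the string building on its own. The package names are taken from the GitHub repositories in the question and are only placeholders.

```r
# Placeholder vector standing in for the packages that are still missing
still_not_installed <- c("aal", "corrplot")

# toString() collapses the vector into one comma-separated string;
# paste() then joins it to the rest of the message with a space
msg <- paste(toString(still_not_installed),
             "cannot be installed from CRAN.\nLoad from GitHub?")

cat(msg, "\n")
```

The same msg can then be passed as the first argument of askYesNo.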
I think you've got a bigger issue. Line 1: you get the subset of my_packages that are not installed, good. Line 2: you try to install them, fine. Line 3: this is bad. != does element-wise comparison, so you're testing whether the first not_installed package is not equal to the first installed package (alphabetically), then comparing the second to the second, and so on. Then you're testing whether the resulting boolean vector has any length, which it will. Instead, I would suggest simply repeating Line 1 to get the updated list of uninstalled packages. And you don't need to list() them; keep them as a character vector.
Also, we should nest the if(Ask == TRUE) check inside the if() that tests whether GitHub packages are needed at all.
not_installed = my_packages[!(my_packages %in% installed.packages()[ , "Package"])]
if(length(not_installed)) install.packages(not_installed)
still_not_installed = my_packages[!(my_packages %in% installed.packages()[, "Package"])]
if(length(still_not_installed)) {
  Ask = askYesNo(
    msg = paste(toString(still_not_installed),
                "cannot be installed from CRAN. \n Load from GitHub?"),
    default = TRUE,
    prompts = getOption("askYesNo", gettext(c("Yes", "No", "Cancel")))
  )
  # askYesNo() returns NA on "Cancel", so test with isTRUE()
  if(isTRUE(Ask)) {
    # This code could still be improved, it assumes if we get to
    # this point that both packages are missing, but it might
    # only be one of them...
    p_load_gh(
      "muschellij2/aal",
      "taiyun/corrplot/blob/master/R/corrplot-package.R",
      install = TRUE, dependencies = TRUE
    )
  }
}
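As a variation on the check itself, the sketch below uses requireNamespace() rather than comparing against installed.packages(). It tests whether each package can actually be loaded, without attaching it. The helper name missing_packages and the fake package name are made up for illustration.

```r
# Return the subset of pkgs whose namespace cannot be loaded
missing_packages <- function(pkgs) {
  ok <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!ok]
}

# "stats" ships with R; the second name is a deliberately non-existent package
missing_packages(c("stats", "notARealPackage12345"))
```

Calling the helper once before install.packages() and once after gives the not_installed and still_not_installed vectors used above.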

Why does my function work just at the beginning of its code?

The problem I'm having here is that, apparently, the only lines of code that the function executes are
library(rvest)
library(RCurl)
and
url <-paste("https://www.confaz.fazenda.gov.br/legislacao/boletim-do-icms/",estate,"/",year,month,sep="") as you guys can see at the end of the code.
So I think that the function can't attach values to any of the variables. Can you guys tell me how I could solve this?
I know that I could see what is happening with more detail using debug, but I'm having difficulty with that too.
icms_data <- function(estate, year, month){
# Creating a data frame
icms<- data.frame(NULL)
library(rvest)
library(RCurl)
#downloading the webpage with the arguments from the function(estate, year and month)
url <-paste("https://www.confaz.fazenda.gov.br/legislacao/boletim-do-icms/",estate,"/",year,month,sep="")
#ignore token validation
options(RCurlOptions =
list(capath = system.file("CurlSSL",
"cacert.pem",
package = "RCurl"),
ssl.verifypeer = FALSE))
y1<-getURL(url)
y <- read_html(y1)
a<- y %>%
html_nodes("#formfield-form-widgets-icms_primario div") %>%
html_text()
if(all.equal(a,character(0))==TRUE)
{
a=0
} else
{
a<-substr(a,4,100)
a = type.convert(a, na.strings = "NA", as.is = F, dec = ",",numerals = "no.loss")
}
b<- y %>%
html_nodes("#formfield-form-widgets-icms_secundario div") %>%
html_text()
if(all.equal(b,character(0))==TRUE)
{
b=0
} else
{
b<-substr(b,4,100)
b = type.convert(b, na.strings = "NA", as.is = F, dec = ",",numerals = "no.loss")
}
#puting the information scraped into the data frame
df<-data.frame(estate,year,month,a,b)
icms<-rbind(icms,df)
print(paste(url))
}
> icms_data("SP","2018", "01")
Loading required package: xml2
Loading required package: bitops
[1] "https://www.confaz.fazenda.gov.br/legislacao/boletim-do-icms/SP/201801"
Firstly, as your output contains the printed URL, it looks like the entire function body is executed.
Judging by the name of your function, I assume you want it to return the variable icms.
R is a functional programming language and as such functions return their last executed expression as their result.
You should thus put icms or return(icms) at the very end of your function:
icms_data <- function(estate, year, month){
  # ... everything else you wrote ...
  icms<-rbind(icms,df)
  print(paste(url))
  icms
}
Some more background info: variable assignments that you do inside a function using <- or = are local variables to the function environment, meaning they will not be available outside of the function body. If you want these variables outside of the function you need to (a) return them as described above or (b) assign them to a different environment (for example set "global variables" using <<-). Option (b) should generally be avoided unless you know the implications of what you are doing in detail, as it can otherwise cause name conflicts that are very hard to debug.
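A small self-contained illustration of both points, with made-up function names: the value of a function is its last evaluated expression, and variables assigned inside the function are not visible outside it.

```r
# Last expression is print(), so the function returns the printed string
build_no_return <- function() {
  result <- data.frame(x = 1)
  print("done")
}

# Same body, but the data frame is the last expression and is returned
build_with_return <- function() {
  result <- data.frame(x = 1)
  print("done")
  result
}

out1 <- build_no_return()
out2 <- build_with_return()

class(out1)  # a character string, not a data frame
class(out2)  # the data frame built inside the function
```

In neither case does a variable called result appear in the calling environment; only the returned value is available, and only if it is assigned.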

How to output results of 'msa' package in R to fasta

I am using the R package msa, a core Bioconductor package, for multiple sequence alignment. Within msa, I am using the MUSCLE alignment algorithm to align protein sequences.
library(msa)
myalign <- msa("test.fa", method=c("Muscle"), type="protein",verbose=FALSE)
The test.fa file is a standard fasta as follows (truncated, for brevity):
>sp|P31749|AKT1_HUMAN_RAC
MSDVAIVKEGWLHKRGEYIKTWRPRYFLL
>sp|P31799|AKT1_HUMAN_RAC
MSVVAIVKEGWLHKRGEYIKTWRFLL
When I run the code on the file, I get:
MUSCLE 3.8.31
Call:
msa("test.fa", method = c("Muscle"), type = "protein", verbose = FALSE)
MsaAAMultipleAlignment with 2 rows and 480 columns
aln
[1] MSDVAIVKEGWLHKRGEYIKTWRPRYFLL
[2] MSVVAIVKEGWLHKRGEYIKTWR---FLL
Con MS?VAIVKEGWLHKRGEYIKTWR???FLL
As you can see, a very reasonable alignment.
I want to write the gapped alignment, preferably without the consensus sequence (e.g., Con row), to a fasta file. So, I want:
>sp|P31749|AKT1_HUMAN_RAC
MSDVAIVKEGWLHKRGEYIKTWRPRYFLL
>sp|P31799|AKT1_HUMAN_RAC
MSVVAIVKEGWLHKRGEYIKTWR---FLL
I checked the msa help, and the package does not seem to have a built in method for writing out to any file type, fasta or otherwise.
The seqinr package looks somewhat promising, because maybe it could read this output as an msf format, albeit a weird one. However, seqinr seems to need a file read in as a starting point. I can't even save this using write(myalign, ...).
I wrote a function:
alignment2Fasta <- function(alignment, filename) {
  sink(filename)
  n <- length(rownames(alignment))
  for(i in seq(1, n)) {
    cat(paste0('>', rownames(alignment)[i]))
    cat('\n')
    the.sequence <- toString(unmasked(alignment)[[i]])
    cat(the.sequence)
    cat('\n')
  }
  sink(NULL)
}
Usage:
mySeqs <- readAAStringSet('test.fa')
myAlignment <- msa(mySeqs)
alignment2Fasta(myAlignment, 'out.fasta')
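The same idea can be sketched without sink(), using only base R on a named character vector. This is an illustration, not part of msa or Biostrings: the name write_fasta is made up, and the assumption (not run here) is that as.character(unmasked(myAlignment)) gives such a named vector of gapped sequences. The example data are the two aligned sequences from the question.

```r
# Write a named character vector of sequences as FASTA (one line per sequence)
write_fasta <- function(seqs, path) {
  stopifnot(is.character(seqs), !is.null(names(seqs)))
  # rbind() + as.vector() interleaves ">name" header lines with sequence lines
  lines <- as.vector(rbind(paste0(">", names(seqs)), unname(seqs)))
  writeLines(lines, path)
  invisible(path)
}

aligned <- c(
  "sp|P31749|AKT1_HUMAN_RAC" = "MSDVAIVKEGWLHKRGEYIKTWRPRYFLL",
  "sp|P31799|AKT1_HUMAN_RAC" = "MSVVAIVKEGWLHKRGEYIKTWR---FLL"
)

out <- tempfile(fileext = ".fasta")
write_fasta(aligned, out)
cat(readLines(out), sep = "\n")
```

Since the consensus row is not part of the named vector, it never reaches the file.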
I think you ought to follow the examples in the help pages that show input with a specific read function first, then work with the alignment:
mySeqs <- readAAStringSet("test.fa")
myAlignment <- msa(mySeqs)
Then the rownames function will deliver the sequence names:
rownames(myAlignment)
[1] "sp|P31749|AKT1_HUMAN_RAC" "sp|P31799|AKT1_HUMAN_RAC"
(Not what you asked for but possibly useful in the future.) Then if you execute:
detail(myAlignment) #function actually in Biostrings
.... you get a text file in interactive mode that you can save
2 29
sp|P31749|AKT1_HUMAN_RAC MSDVAIVKEG WLHKRGEYIK TWRPRYFLL
sp|P31799|AKT1_HUMAN_RAC MSVVAIVKEG WLHKRGEYIK TWR---FLL
If you want to try hacking together a function that writes the file from code, then look at the code of the Biostrings detail function that is being used:
> showMethods( f= 'detail')
Function: detail (package Biostrings)
x="ANY"
x="MsaAAMultipleAlignment"
(inherited from: x="MultipleAlignment")
x="MultipleAlignment"
showMethods( f= 'detail', classes='MultipleAlignment', includeDefs=TRUE)
Function: detail (package Biostrings)
x="MultipleAlignment"
function (x, ...)
{
.local <- function (x, invertColMask = FALSE, hideMaskedCols = TRUE)
{
FH <- tempfile(pattern = "tmpFile", tmpdir = tempdir())
.write.MultAlign(x, FH, invertColMask = invertColMask,
showRowNames = TRUE, hideMaskedCols = hideMaskedCols)
file.show(FH)
}
.local(x, ...)
}
You may use the export.fasta function from the bios2mds library.
# reading of the multiple sequence alignment of human GPCRS in FASTA format:
aln <- import.fasta(system.file("msa/human_gpcr.fa", package = "bios2mds"))
export.fasta(aln)
You can first convert your msa alignment ("AAStringSet") into an "align" object, and then export it as fasta as follows:
library(msa)
library(bios2mds)
mysequences <-readAAStringSet("test.fa")
alignCW <- msa(mysequences)
#https://rdrr.io/bioc/msa/man/msaConvert.html
alignCW_as_align <- msaConvert(alignCW, "bios2mds::align")
export.fasta(alignCW_as_align, outfile = "test_alignment.fa", ncol = 60, open = "w")

Importing data into R (rdata) from Github

I want to put some R code plus the associated data file (RData) on Github.
So far, everything works okay. But when people clone the repository, I want them to be able to run the code immediately. At the moment, this isn't possible because they will have to change their work directory (setwd) to directory that the RData file was cloned (i.e. downloaded) to.
Therefore, I thought it might be easier if I changed the R code so that it links to the RData file on GitHub. But I cannot get this to work using the following snippet. I think perhaps there is a text / binary issue.
x <- RCurl::getURL("https://github.com/thefactmachine/hex-binning-gis-data/raw/master/popDensity.RData")
y <- load(x)
Any help would be appreciated.
Thanks
This works for me:
githubURL <- "https://github.com/thefactmachine/hex-binning-gis-data/raw/master/popDensity.RData"
load(url(githubURL))
head(df)
# X Y Z
# 1 16602794 -4183983 94.92019
# 2 16602814 -4183983 91.15794
# 3 16602834 -4183983 87.44995
# 4 16602854 -4183983 83.79617
# 5 16602874 -4183983 80.19643
# 6 16602894 -4183983 76.65052
EDIT Response to OP comment.
From the documentation:
Note that the https:// URL scheme is not supported except on Windows.
So you could try this:
download.file(githubURL,"myfile")
load("myfile")
which works for me as well, but this will clutter your working directory. If that doesn't work, try setting method="curl" in the call to download.file(...).
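To avoid cluttering the working directory, the download can go to a temporary file and the objects can be loaded into their own environment. This is a minimal sketch, assuming download.file() can reach the URL with its default method on your system; the function name load_rdata_url is made up for illustration.

```r
load_rdata_url <- function(url, envir = new.env()) {
  tmp <- tempfile(fileext = ".RData")
  on.exit(unlink(tmp))  # remove the temporary file when the function exits
  # mode = "wb" keeps the binary RData file intact on all platforms
  download.file(url, tmp, mode = "wb", quiet = TRUE)
  load(tmp, envir = envir)
  envir
}

# Usage (not run here, needs network access):
# e <- load_rdata_url(githubURL)
# head(e$df)
```

Passing envir = globalenv() instead reproduces the behaviour of a plain load() call.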
I've had trouble with this before as well, and the solution I've found to be the most reliable is to use a tiny modification of source_url from the fantastic devtools [1] package. This works for me (on a Mac).
load_url <- function (url, ..., sha1 = NULL) {
# based very closely on code for devtools::source_url
stopifnot(is.character(url), length(url) == 1)
temp_file <- tempfile()
on.exit(unlink(temp_file))
request <- httr::GET(url)
httr::stop_for_status(request)
writeBin(httr::content(request, type = "raw"), temp_file)
file_sha1 <- digest::digest(file = temp_file, algo = "sha1")
if (is.null(sha1)) {
message("SHA-1 hash of file is ", file_sha1)
}
else {
if (nchar(sha1) < 6) {
stop("Supplied SHA-1 hash is too short (must be at least 6 characters)")
}
file_sha1 <- substr(file_sha1, 1, nchar(sha1))
if (!identical(file_sha1, sha1)) {
stop("SHA-1 hash of downloaded file (", file_sha1,
")\n does not match expected value (", sha1,
")", call. = FALSE)
}
}
load(temp_file, envir = .GlobalEnv)
}
I use a very similar modification to get text files from github using read.table, etc. Note that you need to use the "raw" version of the github URL (which you included in your question).
[1] https://github.com/hadley/devtools
load takes a filename.
x <- RCurl::getURL("https://github.com/thefactmachine/hex-binning-gis-data/raw/master/popDensity.RData")
writeLines(x, tmp <- tempfile())
y <- load(tmp)

Resources