Accessing "examples" subdirectory in R-package - r

I am using a CRAN package which contains a subdirectory "examples/" containing a file "ex.txt". How do I access this file?
I tried
require("XX")
read.table(paste(.path.package("XX"), "/examples/ex.txt", sep=""), header=TRUE, sep="\t")
but then the file is not found. When I look in the installation directory of the package, I indeed see no "examples/" subdirectory. However, when I run R CMD check and R CMD INSTALL on the package source, I get no warnings about the "examples/" subdirectory. So the package installs without problems, but omits the examples. What do I have to do in order to access the files in "examples/"?

At first I misread your question and thought you were the package author. The problem is that, as you noticed, the examples directory doesn't get copied when the package is installed. A solution would be for the package authors to put the folder in inst/examples instead of examples. Since you don't have control over that, we can create a workaround by downloading the source and using that instead.
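# Downloads the source tarball for a package and
# extracts it to a temporary directory
downloadAndExtract <- function(package, tdir = tempdir()){
  # type = "source" matters on Windows/macOS, where the default may be a binary build
  down <- download.packages(package, destdir = tdir, type = "source")
  targz <- down[, 2]  # second column holds the path of the downloaded file
  untar(targz, exdir = tdir)
  file.path(tdir, package)
}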
path <- downloadAndExtract("XX")
filepath <- file.path(path, "examples", "ex.txt")
dat <- read.table(filepath, header = TRUE, sep = "\t")
Clearly this isn't ideal but since you won't find that file in the installed package we need to resort to some sort of workaround...
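For reference, if the package authors did move the folder to inst/examples, the file would end up under examples/ in the installed package and could be located with system.file(). A minimal sketch, assuming that layout; the helper name find_pkg_file is hypothetical, and the base package "stats" with its DESCRIPTION file is used only so the example runs on any installation:

```r
# Hypothetical helper: locate a file inside an *installed* package.
# system.file() returns "" when the file is absent, so check explicitly.
find_pkg_file <- function(pkg, ...) {
  path <- system.file(..., package = pkg)
  if (!nzchar(path)) {
    stop("File not found in installed package '", pkg, "'")
  }
  path
}

# Runs anywhere: every R installation ships the 'stats' package.
desc_path <- find_pkg_file("stats", "DESCRIPTION")

# Top-level entries that were actually installed for the package
installed_entries <- list.files(system.file(package = "stats"))

# With the (assumed) inst/examples layout, the original call would become:
# read.table(find_pkg_file("XX", "examples", "ex.txt"), header = TRUE, sep = "\t")
```

Listing the installed top-level entries is also a quick way to confirm whether a directory such as examples/ survived installation.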

Related

How do I read an .tar.xz file?

I downloaded the Gwern Branwen dataset here: https://www.gwern.net/DNM-archives
I'm trying to read the dataset in R and I'm having a lot of trouble. I tried to open one of the files in the dataset called "1776.tar.xz" and I think I "unzipped" it with untar() but I'm not getting anything past that.
untar("C:/User/user/Downloads/dnmarchives/1776.tar.xz",
files = NULL,
list = FALSE, exdir = ".",
compressed = "xz", extras = NULL, verbose = FALSE, restore_times = TRUE,
tar = Sys.getenv("TAR"))
Edit: Thanks for all of the comments so far! The code is in base R. I have multiple datasets that I downloaded from Gwern's website. I'm just trying to open one to explore.
Base R includes the function untar(). On my Ubuntu 19.10 machine running a default installation of R 3.6.2, the following was enough.
fls <- list.files(pattern = "\\.xz")
untar(fls[1], verbose = TRUE)
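If it is unclear whether untar() did anything, it can help to list the archive contents first and extract into a known directory. The following is a self-contained sketch that builds a tiny .tar.xz in a temporary directory so it can be run without the original data; the file names demo.txt, demo.tar.xz and the demo_src/demo_out directories are made up for illustration, and tar = "internal" is chosen here so the example does not depend on an external tar program:

```r
# Build a small tab-separated file and pack it into a .tar.xz (illustrative only)
src_dir <- file.path(tempdir(), "demo_src")
dir.create(src_dir, showWarnings = FALSE)
writeLines(c("a\tb", "1\t2"), file.path(src_dir, "demo.txt"))

tarfile <- file.path(tempdir(), "demo.tar.xz")
old_wd <- setwd(src_dir)
tar(tarfile, files = "demo.txt", compression = "xz", tar = "internal")
setwd(old_wd)

# 1. List what is inside the archive without extracting it
contents <- untar(tarfile, list = TRUE, tar = "internal")

# 2. Extract into an explicit directory, then read the file from there
out_dir <- file.path(tempdir(), "demo_out")
untar(tarfile, exdir = out_dir, tar = "internal")
dat <- read.table(file.path(out_dir, "demo.txt"), header = TRUE, sep = "\t")
```

After extraction, list.files(out_dir, recursive = TRUE) shows where the files landed, which is usually the missing step when "nothing happens" after untar().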
Note.
In the question, "dataset" is singular but there were several datasets (plural) on that website. To download the files I used
args <- "--verbose rsync://78.46.86.149:873/dnmarchives/grams.tar.xz rsync://78.46.86.149:873/dnmarchives/grams-20150714-20160417.tar.xz ./"
cmd <- "rsync"
od <- getwd()
setwd('~/tmp')
system2(cmd, args)
setwd(od)  # restore the original working directory
Thanks everyone! Not sure what was wrong with r for a bit but I reinstalled. I ended up unzipping manually and loading up the files.
I find that base R's untar() is a bit unreliable and/or slow on Windows.
What worked very well for me (on all platforms) was
library(archive)
archive_extract("C:/User/user/Downloads/dnmarchives/1776.tar.xz",
dir="C:/User/user/Downloads/dnmarchives")
It supports 'tar', 'ZIP', '7-zip', 'RAR', 'CAB', 'gzip', 'bzip2', 'compress', 'lzma' and 'xz' formats.
It can also read a CSV file inside an archive directly, without extracting it first (read_csv() and cols() come from the readr package):
read_csv(archive_read("C:/User/user/Downloads/dnmarchives/1776.tar.xz", file = 1), col_types = cols())
On Debian or Ubuntu, first install the package xz-utils
$ sudo apt-get install xz-utils
Extract a .tar.xz the same way you would extract any tar.__ file.
$ tar -xf file.tar.xz
Done.

Can I get the URL of what will be used by install.packages?

When running install.packages("any_package") on Windows, I get the message:
trying URL
'somepath.zip'
I would like to get this path without downloading anything. Is that possible?
In other words, I'd like to get the CRAN link to the Windows binary of the latest release (ideally by calling a new function with the same parameters as install.packages and getting the proper URL(s) as output).
I would need a way that works from the R console (no manual checking of the CRAN page etc).
I am not sure if this is what you are looking for. This builds the URL from the repository information and constructs the file name from the list of available packages.
# get repository name
repos <- getOption("repos")
# get URL for the binary package
# contrib.url(repos, "both")
contriburl <- contrib.url(repos, "binary")
# "https://mirrors.nics.utk.edu/cran/bin/windows/contrib/3.5"
# make data.frame of available packages (queried from the same binary URL)
df <- as.data.frame(available.packages(contriburl = contriburl))
# find package of interest
pkg <- "tidyr" # example
# ofinterest <- grep(pkg, df$Package)
ofinterest <- match(pkg, df$Package) # returns a single value
# assemble name, assumes it is always a zip file
name <- paste0(df[ofinterest,]$Package, "_", df[ofinterest,]$Version, ".zip")
# make final URL
finalurl <- paste0(contriburl, "/", name)
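The same idea can be wrapped in a function that handles several packages at once. This is only a sketch: the helper name build_pkg_url is hypothetical, the ".zip" extension is an assumption that holds for Windows binaries, and it relies on the "Repository" column that available.packages() adds to its result. A small made-up matrix stands in for the real one so the example runs offline:

```r
# Hypothetical helper: build download URLs from an available.packages()-style
# matrix. 'ext' is an assumption: ".zip" for Windows binaries, ".tar.gz" for source.
build_pkg_url <- function(pkg, db, ext = ".zip") {
  missing_pkgs <- setdiff(pkg, rownames(db))
  if (length(missing_pkgs) > 0) {
    stop("Not available: ", paste(missing_pkgs, collapse = ", "))
  }
  paste0(db[pkg, "Repository"], "/", pkg, "_", db[pkg, "Version"], ext)
}

# Offline stand-in for the matrix (made-up repository URL and version)
db <- matrix(
  c("tidyr", "0.8.1", "https://example.org/bin/windows/contrib/3.5"),
  nrow = 1,
  dimnames = list("tidyr", c("Package", "Version", "Repository"))
)
url <- build_pkg_url("tidyr", db)

# Real use (needs network access):
# db <- available.packages(contriburl = contrib.url(getOption("repos"), "binary"))
# build_pkg_url(c("tidyr", "data.table"), db)
```

Taking the matrix as an argument keeps the network call separate from the string assembly, so the latter can be checked without a connection.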
Here are a couple of functions which, respectively:
get the latest R version from RStudio's CRAN mirror
get the URL of the latest released Windows binary
The first is a variation of code I found in the installr package. It seems there's no clean way of getting the latest version, so we have to scrape a webpage.
The second is really just @Dave2e's code optimized and refactored into a function (with a fix for outdated R versions), so please direct upvotes to his answer.
get_package_url <- function(pkg){
  version <- try(
    available.packages()[pkg,"Version"],
    silent = TRUE)
  if(inherits(version,"try-error"))
    stop("Package '",pkg,"' is not available")
  contriburl <- contrib.url(getOption("repos"), "binary")
  url <- file.path(
    dirname(contriburl),
    get_last_R_version(2),
    paste0(pkg,"_",version,".zip"))
  url
}
get_last_R_version <- function(n=3){
  page <- readLines(
    "https://cran.rstudio.com/bin/windows/base/",
    warn = FALSE)
  line <- grep("R-[0-9.]+.+-win\\.exe", page,value=TRUE)
  long <- gsub("^.*?R-([0-9.]+.+)-win\\.exe.*$","\\1",line)
  paste(strsplit(long,"\\.")[[1]][1:n], collapse=".")
}
get_package_url("data.table")
# on my system with R 3.3.1
# [1] "https://lib.ugent.be/CRAN/bin/windows/contrib/3.5/data.table_1.11.4.zip"
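The reason the last path component is replaced is that contrib.url() derives that directory from the version of the running R session, so an outdated R points at an outdated binary directory. The sketch below only illustrates how a "major.minor" series such as "3.5" relates to a full version number; the helper name r_series is hypothetical and is not part of the answer's code:

```r
# Hypothetical helper: "major.minor" series of an R version,
# i.e. the directory name used under bin/windows/contrib/
r_series <- function(v = getRversion()) {
  parts <- unclass(as.package_version(v))[[1]]
  paste(parts[1:2], collapse = ".")
}

running_series <- r_series()         # series of the R session in use
old_series     <- r_series("3.3.1")  # series for an arbitrary version string
```

Comparing r_series() with get_last_R_version(2) would show whether the local installation is behind the latest release series.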

install.packages does not deal with whitespace in file path

A simple change in the example vignette from this site illustrates my problem.
The code below runs without problems because there is no whitespace in the path.
#miniCRAN example
library("miniCRAN")
# use Revolution Analytics CRAN mirror
revolution <- c(CRAN = "http://cran.microsoft.com")
# Specify list of packages to download
pkgs <- c("foreach")
pkgList <- pkgDep(pkgs, repos = revolution, type = "source", suggests = FALSE)
pkgList
# Create temporary folder for miniCRAN
dir.create(pth <- file.path("C:", "RTEMP", "miniCRAN"), recursive=TRUE)
# Make repo for source and win.binary
makeRepo(pkgList, path = pth, repos = revolution, type = c("source", "win.binary"))
# List all files in miniCRAN
list.files(pth, recursive = TRUE, full.names = FALSE)
#install packages from your local repository
install.packages(pkgs, repos = paste0("file:///", pth), type = "source")
But if we change the following line so it has a space character, then it will fail on install.packages.
# Create temporary folder for miniCRAN
dir.create(pth <- file.path("C:", "WHITE SPACE", "miniCRAN"), recursive=TRUE)
Looks to me like the pth string gets split up. Is there any way around this, other than changing folder names in my filesystem? I tried to replace " " with "%20" but that did not help. I am on a Windows system, btw.
Warning: invalid package 'C:/WHITE'
Warning: invalid package 'SPACE/miniCRAN/src/contrib/foreach_1.4.4.tar.gz'
Error: ERROR: no packages specified
Firstly, I think file.path("C:", "WHITE SPACE", "miniCRAN") is not a valid path, because there's no slash after C:.
Anyway, to use install.packages with a path containing white spaces, use shortPathName:
shortPathName(file.path("C:/", "WHITE SPACE", "miniCRAN"))
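Put together with the question's install.packages() call, this might look like the sketch below. It is untested against miniCRAN itself; the helper name local_repo_url is hypothetical, and since shortPathName() exists only on Windows, the call is guarded so the function can still be sourced on other platforms:

```r
# Hypothetical helper: turn a local directory into a file:/// repository URL,
# using the 8.3 short name on Windows so the URL contains no spaces.
local_repo_url <- function(path) {
  if (.Platform$OS.type == "windows") {
    path <- utils::shortPathName(path)  # Windows-only function
  }
  # shortPathName() may return backslashes; URLs need forward slashes
  path <- gsub("\\", "/", path, fixed = TRUE)
  paste0("file:///", path)
}

repo_url <- local_repo_url("C:/RTEMP/miniCRAN")

# Assumed usage with the objects from the question:
# install.packages(pkgs, repos = local_repo_url(pth), type = "source")
```

Note that short names are only generated for paths that exist, and only on volumes where 8.3 name creation is enabled.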

issue with get_rollit_source

I tried to use get_rollit_source from the RcppRoll package as follows:
library(RcppRoll)
get_rollit_source(roll_max,edit=TRUE,RStudio=TRUE)
I get an error:
Error in get("outFile", envir = environment(fun)) :
object 'outFile' not found
I tried
outFile="C:/myDir/Test.cpp"
get_rollit_source(roll_max,edit=TRUE,RStudio=FALSE,outFile=outFile)
I get an error:
Error in get_rollit_source(roll_max, edit = TRUE, RStudio = FALSE, outFile = outFile) :
File does not exist!
How can I fix this issue?
I noticed that the RcppRoll folder in the R library doesn't contain any src directory. Should I download it?
get_rollit_source only works for 'custom' functions. For things baked into the package, you could just download + read the source code (you can download the source tarball here, or go to the GitHub repo).
Anyway, something like the following should work:
rolling_sqsum <- rollit(final_trans = "x * x")
get_rollit_source(rolling_sqsum)
(I wrote this package quite a while back when I was still learning R / Rcpp so there are definitely some rough edges...)

Using inst/extdata with vignette during package checking R 2.14.0

I have a package which contains a csv file which I put in inst/extdata per R-exts. This file is needed for the vignette. If I Sweave the vignette directly, all works well. When I run R --vanilla CMD check however, the check process can't find the file. I know it has been moved into an .Rcheck directory during checking and this is probably part of the problem. But I don't know how to set it up so both direct Sweave and vignette building/checking works.
The vignette contains a line like this:
EC1 <- dot2HPD(file = "../inst/extdata/E_coli/ecoli.dot",
node.inst = "../inst/extdata/E_coli/NodeInst.csv",
and the function dot2HPD accesses the file via:
ni <- read.csv(node.inst)
Here's the error message:
> tab <- read.csv("../inst/extdata/E_coli/NodeInst.csv")
Warning in file(file, "rt") :
cannot open file '../inst/extdata/E_coli/NodeInst.csv': No such file or directory
When sourcing ‘HiveR.R’:
Error: cannot open the connection
Execution halted
By the way, this is related to this question but that info seems outdated and doesn't quite cover this territory.
I'm on a Mac.
Have you tried using system.file instead of hardcoded relative paths? Note that the inst/ level is dropped when the package is installed, so the path starts at extdata:
EC1 <- dot2HPD(file = system.file("extdata", "E_coli", "ecoli.dot", package = "your_package_name"))
node.inst <- system.file("extdata", "E_coli", "NodeInst.csv", package = "your_package_name")
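Because system.file() silently returns an empty string when nothing is found, a wrong path only surfaces later as a confusing read.csv() error. One option is to pass mustWork = TRUE so the vignette fails at the lookup itself. A small sketch; the base package "stats" stands in for your package so the example can be run as-is, and the missing file name is made up:

```r
# mustWork = TRUE turns a silent "" into an immediate error
desc_path <- system.file("DESCRIPTION", package = "stats", mustWork = TRUE)

# A path that does not exist now raises an error instead of returning ""
missing_path <- tryCatch(
  system.file("extdata", "no_such_file.csv", package = "stats", mustWork = TRUE),
  error = function(e) NA_character_
)
```

In a vignette the tryCatch() would normally be left out, so that a missing file stops the build with a clear message.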

Resources