data.table fread error - gzip file - set temporary directory - r

I'm attempting to read a .gz file using data.table's fread function. I have tried the syntax suggested here:
dt = fread("gunzip -c myfile.gz")
but I get a verbose error message:
Error in fread("gunzip -c myfile.gz") :
File is empty: C:\Users\MARK~1.MUR\AppData\Local\Temp\RtmpIBawPA\file498c1c4114ef
In addition: Warning messages:
1: running command 'C:\Windows\system32\cmd.exe /c (gunzip -c myfile.gz) > C:\Users\MARK~1.MUR\AppData\Local\Temp\RtmpIBawPA\file498c1c4114ef' had status 1
2: In shell(paste("(", input, ") > ", tt, sep = "")) :
'(gunzip -c 180227.2101.2017.MRE.csv.gz) > C:\Users\MARK~1.MUR\AppData\Local\Temp\RtmpIBawPA\file498c1c4114ef' execution failed with error code 1
My guess is that my IT masters are denying access to a temporary file (?). If that is the case, how do I set the temporary file path to, say, the current directory for the unzip?

As you are on a Windows PC, you probably don't have the gunzip command-line tool, and that would explain the error: the shell command fails with status 1, so fread is left with an empty temporary file.
A possible solution might be to unzip first and then read with fread. The following example works on my Windows VM:
library(data.table)
write.csv(mtcars, 'mtcars.csv')
zip('mtcars.csv.zip', 'mtcars.csv')
unzip('mtcars.csv.zip')
fread('mtcars.csv')
For .gz files, you can use the gunzip function from R.utils. The following example works for me:
write.csv(mtcars, gzfile('mtcars2.csv.gz'))
library(R.utils)
gunzip('mtcars2.csv.gz')
fread('mtcars2.csv')
Consequently, you might need something like this:
library(R.utils)
gunzip('myfile.gz', destname = 'myfile.csv')  # without destname, the output would be named 'myfile'
fread('myfile.csv')
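If neither gunzip nor R.utils is available, base R can do the decompression itself. gzfile() opens a decompressing connection, so no external program and no shell command are involved. Here is a minimal sketch. It assumes the file is a CSV small enough to hold in memory as text, and it assumes a data.table version recent enough to support fread(text = ...). The helper name read_gz_csv is hypothetical.

```r
library(data.table)

# Hypothetical helper: decompress inside R via a gzfile() connection,
# then pass the lines to fread(). No gunzip or cmd.exe call is made.
read_gz_csv <- function(path, ...) {
  con <- gzfile(path, open = "rt")
  on.exit(close(con))
  fread(text = readLines(con), ...)
}

# Self-contained demo: write a gzipped copy of mtcars to a temp file
gz_path <- tempfile(fileext = ".csv.gz")
write.csv(mtcars, gzfile(gz_path), row.names = FALSE)

dt <- read_gz_csv(gz_path)
print(dim(dt))
```

For the file in the question, the call would be read_gz_csv("180227.2101.2017.MRE.csv.gz"). For very large files, decompressing to disk first (as above) avoids holding all the text in memory.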

Try read_csv() from the readr package, which handles .gz automatically:
library(readr)
library(data.table)
dt = as.data.table(read_csv("myfile.gz"))
(or another read_* function if it's not a csv)

Related

Spaces in paths in batch mode R

I'm trying to get an R script to run from a batch file so it can be nice and clean for other users. Currently, you drag and drop a CSV file onto the batch file and it passes the file name to the R script for input.
When there's a space in the file path/name, it works fine in RStudio but causes problems when I call it from the batch file: it tries to open only the part of the path before the space.
I've tried to reformat the file path from within R by using shortPathName(inputPath) and by replacing spaces with "\ " but it doesn't seem to work.
At the moment, the script is launched with
"%~dp0\R-3.6.0\bin\R.exe" CMD BATCH "--args %~1" "%~dp0\Script.R"
with the script containing
args <- commandArgs(TRUE)
inputPath <- args[1]
inputPath <- shortPathName(inputPath)
inputData <- read.csv(inputPath)
It runs fine from within RStudio but crashes when launched from the batch producing this error message in the output file:
Error in file(file, "rt") : cannot open the connection
Calls: read.csv -> read.table -> file
In addition: Warning message:
In file(file, "rt") :
cannot open file 'file path up to the space': No such file or directory
Execution halted
By no means an R expert, but I'd try
"%~dp0\R-3.6.0\bin\R.exe" CMD BATCH "--args %~s1" "%~dp0\Script.R"
The %~s1 should supply the short filename as the argument.
After trying several formulations of the batch file and some debugging, I found that the batch file was passing only the part of the path before the space as the first argument.
Having found that running R in CMD BATCH mode is no longer advisable, I switched to Rscript:
"%~dp0\R-3.6.0\bin\Rscript.exe" --vanilla "%~dp0\Script.R" "%~1"
This passes the argument to R in quotes, and therefore with the space intact.
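On the R side, it can help to fail loudly when the path arrives split into several arguments, instead of letting read.csv() try to open a fragment. Below is a minimal sketch of what Script.R could contain. The function name read_input_csv is hypothetical.

```r
# Hypothetical Script.R helper: expects exactly one argument, the CSV path.
read_input_csv <- function(args = commandArgs(trailingOnly = TRUE)) {
  if (length(args) != 1L) {
    # An unquoted path that was split at its spaces shows up here
    # as several arguments
    stop(sprintf("Expected 1 argument (the CSV path) but got %d: %s",
                 length(args), paste(shQuote(args), collapse = " ")))
  }
  # mustWork = TRUE gives a clear error if the file does not exist
  path <- normalizePath(args[[1]], mustWork = TRUE)
  read.csv(path)
}

# When launched via Rscript, the script would then just call:
# inputData <- read_input_csv()
```

Passing args explicitly also makes it easy to test the function from RStudio with a path that contains spaces.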
Since v3.5.1, R accepts file paths with spaces.

Extracting file from LZMA archive with R

I am trying to extract a file from an LZMA archive downloaded from an API containing JSON files, using R. On my computer I can extract the file manually in Windows Explorer with no problems.
Here's my code currently (API details removed):
tempFile <- tempfile()
destDir <- "extracted-files"
if (!dir.exists(destDir)) dir.create(destDir)
download.file("api_url.tar.xz", destfile = tempFile)
untar(tempFile, exdir = destDir)
When I attempt to extract the file, I receive the following error messages:
/usr/bin/tar: This does not look like a tar archive
/usr/bin/tar: Skipping to next header
/usr/bin/tar: Exiting with failure status due to previous errors
Warning messages:
1: running command 'tar.exe -xf "C:\Users\XXX\AppData\Local\Temp\RtmpMncPWp\file2eec75e23a15" -C "extracted-files"' had status 2
2: In untar(tempFile, exdir = destDir) :
‘tar.exe -xf "C:\Users\XXX\AppData\Local\Temp\RtmpMncPWp\file2eec75e23a15" -C "extracted-files"’ returned error code 2
I am using Windows 10 with R version 3.3.1 (2016-06-21).
Using library(archive), one can also read a particular CSV file within an archive without having to extract it first:
library(archive)
library(readr)
read_csv(archive_read("api_url.tar.xz", file = 1), col_types = cols()) # adjust file=XX as appropriate
This is quite a bit faster.
To unzip everything one can use
archive_extract("api_url.tar.xz", dir=XXX)
That worked very well for me and is faster than the built-in untar(). It also works on all platforms. It supports 'tar', 'ZIP', '7-zip', 'RAR', 'CAB', 'gzip', 'bzip2', 'compress', 'lzma' and 'xz' formats.
SOLVED:
While it seemed to work perfectly on Mac, for it to work on Windows you need to open the compressed .xz file connection for reading in binary mode, before passing it to untar():
download.file(url, tmp)
zz <- xzfile(tmp, open = "rb")
untar(zz, exdir = destDir)
close(zz)
An alternative and even simpler solution is to set the mode parameter of download.file() so the file is written in binary mode:
download.file(url, destfile = tmp, mode = "wb")
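Putting the two fixes together, here is a minimal sketch with a download helper that writes in binary mode and extracts through an xzfile() connection. The names extract_tar_xz and download_and_extract are hypothetical, and the URL in the usage comment is the same placeholder as in the question. As far as I know, untar() falls back to R's internal tar code when it is given a connection, which would also sidestep the failing external tar.exe.

```r
# Hypothetical helper: extract a .tar.xz through a binary xzfile() connection.
extract_tar_xz <- function(path, dest_dir) {
  dir.create(dest_dir, showWarnings = FALSE, recursive = TRUE)
  con <- xzfile(path, open = "rb")
  on.exit(close(con))
  untar(con, exdir = dest_dir)   # untar() accepts a connection here
  invisible(list.files(dest_dir, recursive = TRUE))
}

# Hypothetical wrapper: mode = "wb" stops Windows from treating the
# download as text and corrupting the archive.
download_and_extract <- function(url, dest_dir) {
  tmp <- tempfile(fileext = ".tar.xz")
  on.exit(unlink(tmp))
  download.file(url, destfile = tmp, mode = "wb")
  extract_tar_xz(tmp, dest_dir)
}

# Usage (placeholder URL from the question):
# download_and_extract("api_url.tar.xz", "extracted-files")
```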

Set 'texi2dvi' for 'R CMD Rd2pdf'

My texi2dvi is apparently in a place where R CMD Rd2pdf doesn't expect it. Mine is at /usr/local/bin/texi2dvi, and it's being looked for at /usr/local/opt/texinfo/bin/texi2dvi:
[KenMacBook:~/git] % \R CMD Rd2pdf missing
Hmm ... looks like a package
Converting Rd files to LaTeX
Creating pdf output from LaTeX ...
Error in texi2dvi(file = file, pdf = TRUE, clean = clean, quiet = quiet, :
Running 'texi2dvi' on 'Rd2.tex' failed.
Messages:
sh: /usr/local/opt/texinfo/bin/texi2dvi: No such file or directory
Output:
Error in texi2dvi(file = file, pdf = TRUE, clean = clean, quiet = quiet, :
Running 'texi2dvi' on 'Rd2.tex' failed.
Messages:
sh: /usr/local/opt/texinfo/bin/texi2dvi: No such file or directory
Output:
Error in running tools::texi2pdf()
I can work around this by running R_TEXI2DVICMD=/usr/local/bin/texi2dvi R CMD Rd2pdf, and then the docs are built correctly.
I'd like to put that setting in my .Rprofile so that things like RStudio (which won't read my .zshrc) and other random R sessions will see the setting. But neither of the following seems to have any effect in my .Rprofile:
Sys.setenv(R_TEXI2DVICMD='/usr/local/bin/texi2dvi')
options(texi2dvi='/usr/local/bin/texi2dvi')
I'm guessing .Rprofile doesn't get read by R CMD commands, is that correct? Is there an appropriate place to put my settings?
UPDATE:
Since Dirk doubts my doubting of .Rprofile for affecting R CMD Rd2pdf :-), here's my evidence:
[KenMacBook:~/git] % tail -n2 ~/.Rprofile
Sys.setenv(TEXI2DVI='/no/where')
cat("End of RProfile\n")
[KenMacBook:~/git] % Rscript -e '2+2'
End of RProfile
[1] 4
[KenMacBook:~/git] % R CMD Rd2pdf missing
Hmm ... looks like a package
Converting Rd files to LaTeX
Creating pdf output from LaTeX ...
Error in texi2dvi(file = file, pdf = TRUE, clean = clean, quiet = quiet, :
Running 'texi2dvi' on 'Rd2.tex' failed.
Messages:
sh: /usr/local/opt/texinfo/bin/texi2dvi: No such file or directory
Output:
Error in texi2dvi(file = file, pdf = TRUE, clean = clean, quiet = quiet, :
Running 'texi2dvi' on 'Rd2.tex' failed.
Messages:
sh: /usr/local/opt/texinfo/bin/texi2dvi: No such file or directory
Output:
Error in running tools::texi2pdf()
Notice that the file's settings are respected in a normal R session, but setting TEXI2DVI has no effect here.
That seems wrong as /usr/local/bin/texi2dvi should be in the $PATH.
I have
edd#max:~$ grep texi2dvi /etc/R/Renviron
## used for options("texi2dvi")
R_TEXI2DVICMD=${R_TEXI2DVICMD-${TEXI2DVI-'/usr/bin/texi2dvi'}}
edd#max:~$
Note that if you want to set the TEXI2DVI environment variable, you probably have to set it before you start R (think ~/.bash_profile).
Here is an example explicitly setting TEXI2DVI:
edd#max:/tmp$ TEXI2DVI=/no/where R CMD Rd2pdf Rcpp-package.Rd
Converting Rd files to LaTeX ...
Rcpp-package.Rd
Creating pdf output from LaTeX ...
Error in texi2dvi(file = file, pdf = TRUE, clean = clean, quiet = quiet, :
Running 'texi2dvi' on 'Rd2.tex' failed.
Messages:
sh: 1: /no/where: not found
Output:
Error in texi2dvi(file = file, pdf = TRUE, clean = clean, quiet = quiet, :
Running 'texi2dvi' on 'Rd2.tex' failed.
Messages:
sh: 1: /no/where: not found
Output:
Error in running tools::texi2pdf()
edd#max:/tmp$
As you can see, it is respected.
Edit: Also, let's not forget Renviron and Renviron.site, so you have plenty of choices for where to set this.
Edit 2: As you seem to doubt ~/.Rprofile:
edd#max:~$ tail -1 .Rprofile
cat("End of .Rprofile\n")
edd#max:~$ Rscript -e '2+2'
End of .Rprofile
[1] 4
edd#max:~$
I had the same problem and figured out how to fix it. I think it has something to do with a previous MacPorts installation interfering with the PATH when R has been installed using Homebrew (assuming you're on OS X).
Run the following in terminal:
defaults write com.apple.finder AppleShowAllFiles TRUE
Then go Apple > Force Quit > Finder > Relaunch. You'll now be able to see hidden files.
In your user directory there may be a file named .profile. In this file, I commented out the following line (i.e. put a # in front of it, as shown):
#export PATH=/opt/local/bin:/opt/local/sbin:$PATH
Then navigate to your R home directory (get this by running R.home() in R):
R.home()
[1] "/usr/local/Cellar/r/3.2.4_1/R.framework/Resources"
Then modify the following line in the etc/Renviron file under that directory:
R_TEXI2DVICMD=${R_TEXI2DVICMD-${TEXI2DVI-'/usr/local/bin/texi2dvi'}}
This fixed it for me.
To make Finder hide these files again, run
defaults write com.apple.finder AppleShowAllFiles FALSE
in the terminal and relaunch it.
Hope that helps.
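Instead of browsing with Finder, you can locate and inspect the same file from R itself. Here is a small sketch for Unix-alikes such as macOS. R.home("etc") gives the etc directory of the running installation, and the code only reads the file without changing anything.

```r
# Path of the Renviron file used by this R installation
renviron <- file.path(R.home("etc"), "Renviron")
print(renviron)

if (file.exists(renviron)) {
  # Show the line(s) that decide the default texi2dvi command
  texi_lines <- grep("TEXI2DVI", readLines(renviron), value = TRUE)
  print(texi_lines)
} else {
  texi_lines <- character(0)
  message("No Renviron found at ", renviron)
}
```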
The back-and-forth with Dirk produced some solutions, but they're pretty buried, so I summarize them here.
Diagnosis: etc/Renviron contains stale info.
My /usr/local/Cellar/r/3.2.2_1/R.framework/Versions/3.2/Resources/etc/Renviron file (installed using Homebrew) contains this line:
R_TEXI2DVICMD=${R_TEXI2DVICMD-${TEXI2DVI-'/usr/local/opt/texinfo/bin/texi2dvi'}}
That's a remnant of someone (possibly me, possibly Homebrew's R package creator) who installed MacTeX in the default location, and then that path got frozen in time in the Renviron file. My texi2dvi is now at /usr/local/bin/texi2dvi, so this value needs to be overridden somehow.
1) $HOME/.Rprofile and $HOME/.Renviron won't help.
They don't take effect soon enough for R to notice them. R sets options("texi2dvi") based on the environment it sees at startup:
% tail -n2 ~/.Rprofile
Sys.setenv(TEXI2DVI='/no/where')
options(texi2dvi='/no/where/else')
% cat ~/.Renviron
TEXI2DVI=/no/where/at/all
% R CMD Rd2pdf myPackageDirectory # Still no joy
Hmm ... looks like a package
Converting Rd files to LaTeX
Creating pdf output from LaTeX ...
Error in texi2dvi(file = file, pdf = TRUE, clean = clean, quiet = quiet, :
Running 'texi2dvi' on 'Rd2.tex' failed.
Messages:
sh: /usr/local/opt/texinfo/bin/texi2dvi: No such file or directory
...
See "Initialization at Start of an R Session" for more info about startup files. As shown above, though, the information in that document about overriding R_HOME/etc/Renviron is either incorrect or incomplete for this situation. Perhaps the section about R_CHECK_ENVIRON and R_BUILD_ENVIRON should be amended to also say something about R CMD Rd2*, but I'm not sure whether that's what's going on; I only know this isn't a solution.
2) $HOME/.zshrc (and friends) won't help.
On OS X, your shell startup file is not consulted when you launch GUI apps. You could use defaults write or launchctl setenv to change the TEXI2DVI variable so that it's set when R launches, but you'd also have to stick it in your shell startup file for processes not started by launchd, which is icky. I also version my dotfiles, and I don't like sticking this bit of configuration in the launchctl ether where I can't easily remember it's there. But launchctl is presumably one solution to this.
3) etc/Renviron.site won't help.
This one is surprising - I expected it to work:
% cat /usr/local/Cellar/r/3.2.2_1/R.framework/Resources/etc/Renviron.site
R_TEXI2DVICMD=/usr/local/bin/texi2dvi
TEXI2DVI=/usr/local/bin/texi2dvi
% \R CMD Rd2pdf myPackageDirectory
Hmm ... looks like a package
Converting Rd files to LaTeX
Creating pdf output from LaTeX ...
Error in texi2dvi(file = file, pdf = TRUE, clean = clean, quiet = quiet, :
Running 'texi2dvi' on 'Rd2.tex' failed.
Messages:
sh: /usr/local/opt/texinfo/bin/texi2dvi: No such file or directory
So Renviron.site's settings aren't taking effect here. I'm getting impatient, so I didn't try to diagnose why.
4) Editing etc/Renviron as a last resort
So this finally works:
% grep TEXI2DVI /usr/local/Cellar/r/3.2.2_1/R.framework/Resources/etc/Renviron
TEXI2DVI=/usr/local/bin/texi2dvi ## Added by Ken
R_TEXI2DVICMD=${R_TEXI2DVICMD-${TEXI2DVI-'/usr/local/opt/texinfo/bin/texi2dvi'}}
I don't like it much, because I'll lose those settings next time I upgrade R. And R's documentation specifically says "do not change ‘R_HOME/etc/Renviron’ itself". But at least it works.
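To see which of these settings a particular R process actually picked up, here is a small diagnostic sketch. The function name texi2dvi_settings is hypothetical. Running it both from RStudio and via Rscript -e shows whether the two environments disagree. It only reads settings and changes nothing.

```r
# Hypothetical diagnostic: what does this R process see for texi2dvi?
texi2dvi_settings <- function() {
  opt <- getOption("texi2dvi")
  c(
    R_TEXI2DVICMD = Sys.getenv("R_TEXI2DVICMD", unset = NA),
    TEXI2DVI      = Sys.getenv("TEXI2DVI", unset = NA),
    option        = if (is.null(opt)) NA_character_ else as.character(opt),
    on_path       = unname(Sys.which("texi2dvi"))  # "" if not on PATH
  )
}

print(texi2dvi_settings())
```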

Error in untar( ) while using R

I am new to the R programming language and am having basic issues with it. I want to untar a file, but it has not been able to work for me.
Here is the code that I enter:
untar("CD_data.tar", exdir="data")
It then returns the following error message:
/bin/sh: /usr/bin/gnutar: No such file or directory
Warning message:
In untar("CD_data.tar", exdir = "data") :
‘/usr/bin/gnutar -xf 'CD_data.tar' -C 'data'’ returned error code 127
Please help! Thanks!
R on OS X 10.9 (Mavericks) seems to set a wrong TAR environment variable.
You can fix this by adding the following to your .Rprofile (or executing it manually):
Sys.setenv(TAR = '/usr/bin/tar')
Alternatively, you can provide the tar path as an argument when calling untar.
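For example, the self-contained sketch below first builds a stand-in CD_data.tar in a temporary directory. It then extracts the archive with R's internal tar implementation (tar = "internal"), which needs no external gnutar. The commented line passes tar = "/usr/bin/tar" instead, which assumes that binary exists on your Mac.

```r
work <- tempfile("untar-demo-")
dir.create(work)
old_wd <- setwd(work)

# Build a tiny stand-in for CD_data.tar so the example runs anywhere
writeLines("hello", "readme.txt")
tar("CD_data.tar", files = "readme.txt", tar = "internal")

# Extract with R's built-in implementation: no external tar program needed
untar("CD_data.tar", exdir = "data", tar = "internal")
print(list.files("data"))

# Alternatively, name the system tar explicitly (path assumed to exist):
# untar("CD_data.tar", exdir = "data", tar = "/usr/bin/tar")

setwd(old_wd)
```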
My 2 cents: you are using a Mac and have not installed tar. You are getting error code 127 because the command is not found on your $PATH and it's not a shell built-in (on Unix, tar would usually be installed already).
In other words, you need to install tar.
Or run it on Linux.

osmar package in R (OpenStreetMap)

The osmar package in R has a demo file called demo("navigator"). It is provided to illustrate package capabilities and functions. When I run the script, I hit the following line and error:
R> muc <- get_osm(muc_bbox, src)
sh: osmosis: command not found
Error in file(con, "r") : cannot open the connection
In addition: Warning message:
In file(con, "r") :
cannot open file '/var/folders/81/4k487q0969q1d8rfd1pyhyr40000gs/T//RtmpdgZSOy/file13a473cb904c': No such file or directory
The command is intended to convert an osmosis data object to an osmar object. I have properly installed osmosis for Mac OS X and updated the PATH in my bash shell to point to the osmosis executable.
I'm not sure what the error message means and how best to respond. Any help appreciated
Brad
Have you restarted R? It looks like osmosis isn't in your path, although you do mention that you set that. Make sure that you can run one of the osmosis commands in Terminal:
osmosis --read-xml SloveniaGarmin.osm --tee 4 --bounding-box left=15 top=46 --write-xml SloveniaGarminSE.osm --bounding-box left=15 bottom=46 --write-xml SloveniaGarminNE.osm --bounding-box right=15 top=46 --write-xml SloveniaGarminSW.osm --bounding-box right=15 bottom=46 --write-xml SloveniaGarminNW.osm
The example is irrelevant, as long as it doesn't say osmosis file not found.
Also, make sure you have gzip in your path. I am almost certain that it is installed by default, but the demo relies on it to run. Just open a Terminal and type gzip to make sure it is there.
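You can make the same check from inside R. This matters because an R session started from the Dock or from RStudio does not necessarily inherit the PATH you set in your bash startup files. Here is a minimal sketch. The helper name check_tools is hypothetical, and the directory in the commented Sys.setenv() line is a placeholder for wherever osmosis actually lives.

```r
# Hypothetical helper: which external programs can this R session find?
check_tools <- function(tools = c("osmosis", "gzip")) {
  found <- Sys.which(tools)            # "" for anything not on R's PATH
  missing_tools <- names(found)[found == ""]
  if (length(missing_tools) > 0) {
    message("Not found on R's PATH: ", paste(missing_tools, collapse = ", "),
            "\nR's PATH is: ", Sys.getenv("PATH"))
  }
  found != ""                          # named logical vector
}

check_tools()

# If osmosis is missing, append its directory for this session only
# (placeholder path; adjust to your installation):
# Sys.setenv(PATH = paste(Sys.getenv("PATH"), "/path/to/osmosis/bin",
#                         sep = .Platform$path.sep))
```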
Finally, if you need to debug this, then run this:
library(osmar)
download.file("http://osmar.r-forge.r-project.org/muenchen.osm.gz","muenchen.osm.gz")
system("gzip -d muenchen.osm.gz")
# At this point, check the directory listed by getwd(). It should contain muenchen.osm.
src <- osmsource_osmosis(file = "muenchen.osm",osmosis = "osmosis")
muc_bbox <- center_bbox(11.575278, 48.137222, 3000, 3000)
debug(osmar:::get_osm_data.osmosis)
get_osm(muc_bbox, src)
# Press Enter till you get to
# request <- osm_request(source, what, destination)
# Then type request to get the command it is sending.
After pressing Enter once and then typing request, you will get the string it is sending to your OS. It should be something like:
osmosis --read-xml enableDateParsing=no file=muenchen.osm --bounding-box top=48.1507120588903 left=11.5551240885889 bottom=48.1237319411097 right=11.5954319114111 --write-xml file=<your path>
Try pasting this into your Terminal. It should work from any directory.
Oh, and type undebug(osmar:::get_osm_data.osmosis) to stop debugging. Type Q to exit the debugger.
Hey, I just got this working. The problem is not the PATH entry for osmosis. It is the system call the script makes to the gzip application to decompress the .gz file it downloaded earlier. The error therefore appears when gzip is not installed on your machine or is not on the PATH, and installing gzip and adding it to the PATH fixes it. Alternatively, you can decompress the file manually into the same directory and run the script again.
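If installing gzip is not an option, you can also decompress the file from R itself before rerunning the demo. Here is a minimal base-R sketch. The helper name gunzip_base is hypothetical, and R.utils::gunzip() does the same job if that package is installed. Unlike gzip -d, this helper leaves the original .gz file in place.

```r
# Hypothetical helper: decompress a .gz file in chunks using only base R.
gunzip_base <- function(gz_path, dest_path = sub("\\.gz$", "", gz_path)) {
  in_con <- gzfile(gz_path, open = "rb")
  on.exit(close(in_con), add = TRUE)
  out_con <- file(dest_path, open = "wb")
  on.exit(close(out_con), add = TRUE)
  repeat {
    chunk <- readBin(in_con, what = "raw", n = 1e6)  # up to ~1 MB at a time
    if (length(chunk) == 0) break
    writeBin(chunk, out_con)
  }
  invisible(dest_path)
}

# Could stand in for system("gzip -d muenchen.osm.gz") in the debugging steps:
# gunzip_base("muenchen.osm.gz")   # writes muenchen.osm next to it
```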
