I downloaded a .csv file and saved it on my desktop. Now, to work with it, I am supposed to use the read.table() or read.csv() functions to load the file into R. How do I find the file path for input into a line like this:
yy_2 <- read.csv(file =....., header = TRUE, stringsAsFactors = FALSE)
I use a MacBook Pro, if that helps.
On MacOS, this is most likely to be
fdir <- file.path("~/Desktop")
(~ is Unix shorthand for your home directory.) You can try list.files(fdir) to see if the files are there. Alternatively, you could try file.choose() as suggested in comments above, although that can only select a file, not a directory. This seems to be a long-standing gap in R: see e.g. this mailing list post from 2012, which suggests dirname(file.choose()) or this function:
choose.dir <- function() {
system("osascript -e 'tell app \"R\" to POSIX path of (choose folder with prompt \"Choose Folder:\")' > /tmp/R_folder",
intern = FALSE, ignore.stderr = TRUE)
p <- system("cat /tmp/R_folder && rm -f /tmp/R_folder", intern = TRUE)
return(ifelse(length(p), p, NA))
}
which appears to crash RStudio (!) but works in the R console on MacOS for me ...
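Putting the pieces together for the original question, here is a minimal sketch. The file name my_data.csv is a hypothetical placeholder for whatever the downloaded file is called, and the file.exists() guard is only there so the snippet does not fail when the file is missing.

```r
# Build the full path to a file on the Desktop.
# "my_data.csv" is a hypothetical name; replace it with the real one.
csv_path <- path.expand(file.path("~", "Desktop", "my_data.csv"))

if (file.exists(csv_path)) {
  yy_2 <- read.csv(file = csv_path, header = TRUE, stringsAsFactors = FALSE)
} else {
  # Nothing found at that path: say so instead of failing inside read.csv()
  message("No file at ", csv_path, "; file.choose() can be used to pick it interactively")
}
```

path.expand() turns the leading ~ into the full home directory, which is handy when the path has to be shown to the user or passed to an external tool.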
I am trying to read a csv in a zip file by using the command fread("unzip -cq file.zip") which works perfectly when the file is in my working directory.
But when I try the command by specifying the path of the file without changing the directory say fread("unzip -cq C:/Users/My user/file.zip") I get an error saying the following unzip: cannot find either C:/Users/My or C:/Users/My.zip
The reason why this happens is that there are spaces in my path but what would be the workaround?
The only option that I have thought is to just change to the directory where each file is located and read it from there but this is not ideal.
I use shQuote for this, like...
fread_zip = function(fp, silent=FALSE){
qfp = shQuote(fp)
patt = "unzip -cq %s"
thecall = sprintf(patt, qfp)
if (!silent) cat("The call:", thecall, sep="\n")
fread(thecall)
}
Defining a pattern and then substituting in with sprintf can keep things readable and easier to manage. For example, I have a similar wrapper for .tar.gz files (which apparently need to be unzipped twice with a | pipe between the steps).
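To see what the quoting step does to the path from the question, here is a small sketch. It forces type = "sh" only so that the result is the same on every platform; that choice is an assumption made for the demonstration, and the default shQuote(fp) used in the function above picks the style for the current system.

```r
# Path with a space, taken from the question
zip_path <- "C:/Users/My user/file.zip"

# Quote it so the shell treats it as a single argument
quoted <- shQuote(zip_path, type = "sh")

# Substitute the quoted path into the command pattern
cmd <- sprintf("unzip -cq %s", quoted)
print(cmd)
# [1] "unzip -cq 'C:/Users/My user/file.zip'"
```

Without the quotes the shell splits the command at the space, which is exactly why unzip reported that it could not find C:/Users/My.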
If your zip contains multiple csvs, fread isn't set up to read them all (though there's an open issue). My workaround for that case currently looks like...
library(magrittr)
fread_zips = function(fp, unzip_dir = file.path(dirname(fp), sprintf("csvtemp_%s", sub("\\.zip$", "", basename(fp)))), silent = FALSE, do_cleanup = TRUE){
# only tested on windows
# fp should be the path to mycsvs.zip
# unzip_dir should be used only for CSVs from inside the zip
dir.create(unzip_dir, showWarnings = FALSE)
# unzip
unzip(fp, overwrite = TRUE, exdir = unzip_dir)
# list files, read separately
# not looking recursively, since csvs should be only one level deep
fns = list.files(unzip_dir)
if (!all(tools::file_ext(fns) == "csv")) stop("fp should contain only CSVs")
res = lapply(fns %>% setNames(file.path(unzip_dir, .), .), fread)
if (do_cleanup) unlink(unzip_dir, recursive = TRUE)
res
}
So, because we're not passing a command-line call directly to fread, there's no need for shQuote here. I wrote and used this function yesterday, so there are probably still some oversights or bugs.
The magrittr %>% pipe part could be written as setNames(file.path(unzip_dir, fns), fns) instead.
Try to assign the location to a variable and use paste to call the zip file like below:
myVar<-"C:/Users/Myuser/"
fread(paste0("unzip -cq ",myVar,"file.zip"))
I am working on a project which is developed by our team. We share the codes in a repository. Every team member is using his/her own machine with his/her own working directories. That is why we use relative paths in our projects. Usually we use something like
setwd("MyUser/MyProject/MyWD/myCodesDir") # local
...
MyReportingPath <- "../ReportsDir" # in repository
Now I try to render a markdown report to this directory:
rmarkdown::render(input = "relevantPath/ReportingHTML.Rmd",
output_file = paste0(MyReportingPath, "/ReportingHTML.html"))
This doesn't work. It only works if I type in the full path of the output file ("/home/User/..../ReportingHTML.html")
This is one of the issues I would like to clarify: is there any possibility to use relative paths in any way for Markdown?
The second issue is that if I type in a non-existent directory in output_file, pandoc throws an error instead of creating this directory with my output file. Is there any way to create the output directory dynamically? (Other than running system(paste0("mkdir ", reportPath), intern = T) before rendering.)
P.S. It is important for me to render the markdown document in a separate R function, where I create the whole environment which is inherited by my Markdown document.
Trivial issue - since you're using paste0 you need to provide the / delimiter between your output directory and output file.
You wrote:
rmarkdown::render(input = "relevantPath/ReportingHTML.Rmd",
output_file = paste0(MyReportingPath, "ReportingHTML.html"))
Instead, try:
rmarkdown::render(input = "relevantPath/ReportingHTML.Rmd",
output_file = paste0(MyReportingPath, "/", "ReportingHTML.html"))
More broadly:
For your first issue (setting the path for the input file) - I also suggest using here::here(). If you need to navigate up from your working directory you can break down the path as follows:
parent_dir <- paste(head(unlist(strsplit(here::here(), "/", fixed = TRUE)), -1), collapse = "/")
grandparent_dir <- paste(head(unlist(strsplit(here::here(), "/", fixed = TRUE)), -2), collapse = "/")
However - it might be easier to set the working directory to a higher level, then build up your code and results directories, for example:
project_dir <- here::here()
codefile <- paste(project_dir, "code", "myreport.Rmd", sep = "/")
outfile <- paste(project_dir, "results", "myreport.html", sep = "/")
rmarkdown::render(input = codefile,
output_file = outfile)
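For your second issue (creating the directory for output) - using dir.create(MyReportingPath, recursive = TRUE) will create the output directory and any intermediate levels. Note that the variable is passed unquoted; dir.create("MyReportingPath") would create a directory literally named MyReportingPath. You will get a warning if the directory already exists, which can be suppressed using showWarnings = FALSE.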
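A minimal sketch of the directory-creation step, assuming a hypothetical nested report directory under tempdir() so that it can be run anywhere. The render() call is left as a comment because it needs the rmarkdown package and a real .Rmd file.

```r
# Hypothetical output location; in a real project this would be MyReportingPath
report_dir <- file.path(tempdir(), "ReportsDir", "nested")

# Create the directory and any missing parents; stay quiet if it already exists
dir.create(report_dir, recursive = TRUE, showWarnings = FALSE)

# Resolve to an absolute path now that the directory exists
output_file <- file.path(normalizePath(report_dir), "ReportingHTML.html")
print(output_file)

# rmarkdown::render(input = "relevantPath/ReportingHTML.Rmd",
#                   output_file = output_file)
```

Creating the directory first also means normalizePath() can be used safely, since it only resolves paths that already exist.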
I just ran into this myself, and as it turns out, there's one additional complication here: You really do need to use an absolute path for the output if your input file being rendered isn't in your current working directory. You also need to use absolute paths for anything else during rendering, for example images like ![](path.png).
This looks to be because rmarkdown::render temporarily sets your working directory to the directory containing the input file, so it interprets a relative directory as relative to that path, not your initial working directory. (EDIT: There's currently an open issue for this on github.)
For example if you have this setup:
subdir/test.Rmd
outdir
And you do this:
rmarkdown::render(
input = "subdir/test.Rmd",
output_file = "outdir/out.html")
You get the error:
Error: The directory 'outdir' does not not exist.
Execution halted
Instead, you could do:
output_path <- file.path(normalizePath("."), "outdir/out.html")
rmarkdown::render(
input = "subdir/test.Rmd",
output_file = output_path)
...and that should work.
You might think you could just use normalizePath("outdir/out.html"), but you can't, because that function only works when the path already exists. You also might think you could do this:
rmarkdown::render(
input = "subdir/test.Rmd",
output_file = file.path(normalizePath("."), "outdir/out.html"))
but you can't, because R only gets around to interpreting the value of output_file once the working directory has already been changed.
I would like to set the working directory to the path of current script programmatically but first I need to get the path of current script.
So I would like to be able to do:
current_path = ...retrieve the path of current script ...
setwd(current_path)
Just like the RStudio menu does:
So far I tried:
initial.options <- commandArgs(trailingOnly = FALSE)
file.arg.name <- "--file="
script.name <- sub(file.arg.name, "", initial.options[grep(file.arg.name, initial.options)])
script.basename <- dirname(script.name)
script.name returns NULL
source("script.R", chdir = TRUE)
Returns:
Error in file(filename, "r", encoding = encoding) : cannot open the
connection In addition: Warning message: In file(filename, "r",
encoding = encoding) : cannot open file '/script.R': No such file or
directory
dirname(parent.frame(2)$ofile)
Returns: Error in dirname(parent.frame(2)$ofile) : a character vector argument expected
...because parent.frame is null
frame_files <- lapply(sys.frames(), function(x) x$ofile)
frame_files <- Filter(Negate(is.null), frame_files)
PATH <- dirname(frame_files[[length(frame_files)]])
Returns: Null because frame_files is a list of 0
thisFile <- function() {
cmdArgs <- commandArgs(trailingOnly = FALSE)
needle <- "--file="
match <- grep(needle, cmdArgs)
if (length(match) > 0) {
# Rscript
return(normalizePath(sub(needle, "", cmdArgs[match])))
} else {
# 'source'd via R console
return(normalizePath(sys.frames()[[1]]$ofile))
}
}
Returns: Error in path.expand(path) : invalid 'path' argument
Also I saw all answers from here, here, here and here.
No joy.
Working with RStudio 1.1.383
EDIT: It would be great if there was no need for an external library to achieve this.
In RStudio, you can get the path to the file currently shown in the source pane using
rstudioapi::getSourceEditorContext()$path
If you only want the directory, use
dirname(rstudioapi::getSourceEditorContext()$path)
If you want the name of the file that's been run by source(filename), that's a little harder. You need to look for the variable srcfile somewhere back in the stack. How far back depends on how you write things, but it's around 4 steps back: for example,
fi <- tempfile()
writeLines("f()", fi)
f <- function() print(sys.frame(-4)$srcfile)
source(fi)
fi
should print the same thing on the last two lines.
Update March 2019
Based on the answers by Alexis Lucattini and user2554330, this version works from both the command line and RStudio. It also avoids the "as_tibble" deprecation message.
library(tidyverse)
getCurrentFileLocation <- function()
{
this_file <- commandArgs() %>%
tibble::enframe(name = NULL) %>%
tidyr::separate(col=value, into=c("key", "value"), sep="=", fill='right') %>%
dplyr::filter(key == "--file") %>%
dplyr::pull(value)
if (length(this_file)==0)
{
this_file <- rstudioapi::getSourceEditorContext()$path
}
return(dirname(this_file))
}
TLDR: The here package (available on CRAN) helps you build a path from a project's root directory. R projects configured with here() can be shared with colleagues working on different laptops or servers and paths built relative to the project's root directory will still work. The development version is at github.com/r-lib/here.
With git
You certainly store your R code in a directory. This directory is probably part of a git repository and/or an RStudio project. I would recommend building all paths relative to that project's root directory. For example, let's say that you have an R script that creates reusable plotting functions and an R Markdown notebook that loads that script and plots graphs in a nice (so nice) document. The project tree would look something like this
├── notebooks
│ ├── analysis.Rmd
├── R
│ ├── prepare_data.R
│ ├── prepare_figures.R
From the analysis.Rmd notebook, you would import plotting function with here() as such:
source(file.path(here::here("R"), "prepare_figures.R"))
Why?
Hadley Wickham in a Stackoverflow
comment:
"You should never use setwd() in R code - it basically defeats the idea of
using a working directory because you can no longer easily move your code
between computers. – hadley Nov 20 '10 at 23:44 "
From the Ode to the here package:
Do you:
Have setwd() in your scripts? PLEASE STOP DOING THAT.
This makes your script very fragile, hard-wired to exactly one time and place. As soon as you rename or move directories, it breaks. Or maybe you get a new computer? Or maybe someone else needs to run your code?
[...]
Classic problem presentation: Awkwardness around building paths and/or setting working directory in projects with subdirectories. Especially if you use R Markdown and knitr, which trips up alot of people with its default behavior of “working directory = directory where this file lives”. [...]
Install the here package:
install.packages("here")
library(here)
here()
here("construct","a","path")
Documentation of the here() function:
Starting with the current working directory during package load time,
here will walk the directory hierarchy upwards until it finds
a directory that satisfies at least one of the following conditions:
contains a file matching [.]Rproj$ with contents matching ^Version: in
the first line
[... other options ...]
contains a directory .git
Once established, the root directory doesn't change during the active
R session. here() then appends the arguments to the root directory.
The development version of the here package is available on github.
What about files outside the project directory?
If you are loading or sourcing files outside the project directory, the recommended way is to use an environment variable at the Operating System level. Other users of your R code on different laptops or servers would need to set the same environment variable. The advantage is that it is portable.
data_path <- Sys.getenv("PROJECT_DATA")
df <- read.csv(file.path(data_path, "file_name.csv"))
Note: There is a long list of environmental variables which can affect an R session.
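One possible way to make the environment-variable approach fail early with a clear message is sketched below. The helper name get_data_dir is hypothetical, and the Sys.setenv() call exists only so the demo runs in any session; normally the variable would be set at the operating system level as described above.

```r
# Read a data directory from an environment variable, stopping if it is unset.
# get_data_dir is a hypothetical helper, not part of any package.
get_data_dir <- function(var = "PROJECT_DATA") {
  value <- Sys.getenv(var, unset = NA)
  if (is.na(value) || !nzchar(value)) {
    stop("Environment variable ", var, " is not set", call. = FALSE)
  }
  value
}

# Demo only: set the variable for this session
Sys.setenv(PROJECT_DATA = tempdir())

data_path <- get_data_dir()
csv_file <- file.path(data_path, "file_name.csv")
print(csv_file)
```

Stopping with a named variable in the message tends to be easier to debug than the "cannot open file" error that read.csv() gives when the path is built from an empty string.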
What about many projects sourcing each other?
It's time to create an R package.
If you're running an Rscript through the command-line etc
Rscript /path/to/script.R
The function below will assign this_file to /path/to/script.R
library(tidyverse)
get_this_file <- function() {
commandArgs() %>%
tibble::enframe(name = NULL) %>%
tidyr::separate(
col = value, into = c("key", "value"), sep = "=", fill = "right"
) %>%
dplyr::filter(key == "--file") %>%
dplyr::pull(value)
}
this_file <- get_this_file()
print(this_file)
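For anyone who would rather avoid the tidyverse dependency, the same lookup can be sketched in base R. The function name script_from_args is hypothetical, and it takes the argument vector as a parameter purely so it can be tried out without actually launching Rscript.

```r
# Extract the value of the --file= argument from a vector of command-line
# arguments. Returns NA if no such argument is present (e.g. interactive use).
script_from_args <- function(args = commandArgs(trailingOnly = FALSE)) {
  prefix <- "--file="
  hit <- args[startsWith(args, prefix)]
  if (length(hit) == 0) {
    return(NA_character_)
  }
  # Drop the "--file=" prefix and keep the rest
  substring(hit[1], nchar(prefix) + 1)
}

# Simulated arguments, similar to what Rscript passes along
print(script_from_args(c("R", "--no-echo", "--file=/path/to/script.R")))
# [1] "/path/to/script.R"
```

Called with no arguments inside a script run via Rscript, it reads the real command line through commandArgs().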
Here is a custom function to obtain the path of a file in R, RStudio, or from an Rscript:
stub <- function() {}
thisPath <- function() {
cmdArgs <- commandArgs(trailingOnly = FALSE)
if (length(grep("^-f$", cmdArgs)) > 0) {
# R console option: the script path is the argument that follows "-f"
normalizePath(dirname(cmdArgs[grep("^-f$", cmdArgs) + 1]))[1]
} else if (length(grep("^--file=", cmdArgs)) > 0) {
# Rscript/R console option
normalizePath(dirname(sub("^--file=", "", cmdArgs[grep("^--file=", cmdArgs)])))[1]
} else if (Sys.getenv("RSTUDIO") == "1") {
# RStudio
dirname(rstudioapi::getSourceEditorContext()$path)
} else if (is.null(attr(stub, "srcref")) == FALSE) {
# 'source'd via R console
dirname(normalizePath(attr(attr(stub, "srcref"), "srcfile")$filename))
} else {
stop("Cannot find file path")
}
}
https://gist.github.com/jasonsychau/ff6bc78a33bf3fd1c6bd4fa78bbf42e7
Another option to get the current script path is funr::get_script_path(), and you don't need to run your script from RStudio for it to work.
I had trouble with all of these because they rely on libraries that I couldn't use (because of packrat) until after setting the working directory (which was why I needed to get the path to begin with).
So, here's an approach that just uses base R. (EDITED to handle windows \ characters in addition to / in paths)
args = commandArgs()
scriptName = args[substr(args,1,7) == '--file=']
if (length(scriptName) == 0) {
scriptName <- rstudioapi::getSourceEditorContext()$path
} else {
scriptName <- substr(scriptName, 8, nchar(scriptName))
}
pathName = substr(
scriptName,
1,
nchar(scriptName) - nchar(strsplit(scriptName, '.*[/|\\]')[[1]][2])
)
If you don't want to use (or have to remember) code, simply hover over the script and the path will appear
The following solves the problem for three cases: RStudio source Button, RStudio R console (source(...), if the file is still in the Source pane) or the OS console via Rscript:
this_file = gsub("--file=", "", commandArgs()[grepl("--file", commandArgs())])
if (length(this_file) > 0){
wd <- paste(head(strsplit(this_file, '[/|\\]')[[1]], -1), collapse = .Platform$file.sep)
}else{
wd <- dirname(rstudioapi::getSourceEditorContext()$path)
}
print(wd)
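The following code gives the directory of the running script when it is run from RStudio. From the command line with the Rscript command it falls back to the current working directory, so it only gives the script's directory if you launch Rscript from there:
# Install rstudioapi if it is missing, before calling anything from it
if (!requireNamespace("rstudioapi", quietly = TRUE)) {
install.packages("rstudioapi")
}
if (rstudioapi::isAvailable()) {
# Running inside RStudio: use the directory of the active document
wdir <- dirname(rstudioapi::getActiveDocumentContext()$path)
}else{
# Otherwise fall back to the current working directory
wdir <- getwd()
}
setwd(wdir)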
Does anyone know if it's possible to derive the filename/filepath of an R program? I'm looking for something similar to "%sysfunc(GetOption(SYSIN))" in SAS which will return the filepath of a SAS program (running in batch mode). Can I do anything similar in R?
The best I've been able to come up with so far is to add the filename and current directory using shortcut keys in the text editor I use (PSPad). Is there an easier way to do this?
Here's my example:
progname<-"Iris data listing"
# You must use either double-backslashes or forward slashes in pathnames
progdir<-"F:\\R Programming\\Word output\\"
# Set the working directory to the program location
setwd(progdir)
# Make the ReporteRs package available for creating Word output
library(ReporteRs)
# Load the "Iris" provided with R
data("iris")
options('ReporteRs-fontsize'=8, 'ReporteRs-default-font'='Arial')
# Initialize the Word output object
doc <- docx()
# Add a title
doc <- addTitle(doc,"A sample listing",level=1)
# Create a nicely formatted listing, style similar to Journal
listing<-vanilla.table(iris)
# Add the listing to the Word output
doc <- addFlexTable(doc, listing)
# Create the Word output file
writeDoc( doc, file = paste0(progdir,progname,".docx"))
This works fairly well, both in batch and in RStudio. I'd really appreciate a better solution though
The link to Rscript: Determine path of the executing script provided by @Juan Bosco contained most of the information I needed. One problem it didn't address was running an R program in RStudio (sourcing in RStudio was discussed and solved). I found that this problem could be dealt with using rstudioapi::getActiveDocumentContext()$path.
It's also noteworthy that the solutions for batch mode won't work using
Rterm.exe --no-restore --no-save < %1 > %1.out 2>&1
The solutions require that the --file= option be used, e.g.
D:\R\R-3.3.2\bin\x64\Rterm.exe --no-restore --no-save --file="%~1.R" > "%~1.out" 2>&1 R_LIBS=D:/R/library
Here's a new version of the get_script_path function posted by @aprstar. This has been modified to also work in RStudio (note that it requires the rstudioapi library).
# Based on "get_script_path" function by aprstar, Aug 14 '15 at 18:46
# https://stackoverflow.com/questions/1815606/rscript-determine-path-of-the-executing-script
# That solution didn't work for programs executed directly in RStudio
# Requires the rstudioapi package
# Assumes programs executed in batch have used the "--file=" option
GetProgramPath <- function() {
cmdArgs = commandArgs(trailingOnly = FALSE)
needle = "--file="
match = grep(needle, cmdArgs)
if (cmdArgs[1] == "RStudio") {
# An interactive session in RStudio
# Requires rstudioapi::getActiveDocumentContext
return(normalizePath(rstudioapi::getActiveDocumentContext()$path))
}
else if (length(match) > 0) {
# Batch mode using Rscript or rterm.exe with the "--file=" option
return(normalizePath(sub(needle, "", cmdArgs[match])))
}
else {
ls_vars = ls(sys.frames()[[1]])
if ("fileName" %in% ls_vars) {
# Source'd via RStudio
return(normalizePath(sys.frames()[[1]]$fileName))
}
else {
# Source'd via R console
return(normalizePath(sys.frames()[[1]]$ofile))
}
}
}
I placed this in my .Rprofile file. Now I can get the file information in either batch mode or in RStudio using the following code. I haven't tried it using source() but that should work too.
# "GetProgramPath()" returns the full path name of the file being executed
progpath<-GetProgramPath()
# Get the filename without the ".R" extension
progname<-tools::file_path_sans_ext(basename(progpath))
# Get the file directory
progdir<-dirname(progpath)
# Set the working directory to the program location
setwd(progdir)
I am using Rscript to plot some figures from a given CSV file in some directory, which is not necessarily my current working directory. I can call it as follows:
./script.r ../some_directory/inputfile.csv
Now I want to output my figures in the same directory (../some_directory), but I have no idea how to do that. I tried to get the absolute path for the input file because from this I could construct the output path, but I couldn't find out how to do that.
normalizePath() #Converts file paths to canonical user-understandable form
or
library(tools)
file_path_as_absolute()
The question is very old but it still misses a working solution. So here is my answer:
Use normalizePath(dirname(f)).
The example below lists all the files and directories in the current directory.
dir <- "."
allFiles <- list.files(dir)
for(f in allFiles){
print(paste(normalizePath(dirname(f)), fsep = .Platform$file.sep, f, sep = ""))
}
Where:
normalizePath(dirname(f)) gives the absolute path of the parent directory. So the individual file names should be added to the path.
.Platform is used to have an OS-portable code. (here)
file.sep gives "the file separator used on your platform: "/" on both Unix-alikes and on Windows (but not on the former port to Classic Mac OS)." (here)
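Warning: This may cause some problems if not used with caution. For instance, say this is the path: A/B/a_file and the working directory is A. Because list.files(dir) returns bare file names, dirname(f) is "." and normalizePath() resolves it against the working directory rather than against dir. Then the code below: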
dir <- "B"
allFiles <- list.files(dir)
for(f in allFiles){
print(paste(normalizePath(dirname(f)), fsep = .Platform$file.sep, f, sep = ""))
}
would give:
> A/a_file
however, it should be:
> A/B/a_file
Here is the solution:
args = commandArgs(TRUE)
results_file = args[1]
output_path = dirname(normalizePath(results_file))
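As a sketch of how that directory might then be used to place a figure next to the input file: the helper name sibling_path and the output name figure.png are hypothetical, and a temporary CSV stands in for the real command-line argument so the snippet runs on its own.

```r
# Build a path for a new file in the same directory as an existing input file.
# sibling_path is a hypothetical helper name.
sibling_path <- function(input_file, new_name) {
  file.path(dirname(normalizePath(input_file, mustWork = TRUE)), new_name)
}

# Stand-in for args[1]: a small CSV written to a temporary location
demo_input <- tempfile(fileext = ".csv")
write.csv(data.frame(x = 1:3), demo_input, row.names = FALSE)

figure_path <- sibling_path(demo_input, "figure.png")
print(figure_path)

# In the real script one might then open the device with, for example:
# png(figure_path); plot(...); dev.off()
```

Passing mustWork = TRUE makes normalizePath() stop with an error if the input file does not exist, rather than returning an unresolved path with a warning.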
To get the absolute path(s) from file(s)
Why not combine the base R function file.path() with the answer that @Marius gave? This appears marginally simpler, works with a vector of files (files), and takes care of system-specific separators. Using basename(files) for the last component keeps the result correct when an element of files already contains a directory part:
file.path(normalizePath(dirname(files)), basename(files))
And wrapped inside a function (abspath):
abspath <- function(files) file.path(normalizePath(dirname(files)), basename(files))
For instance:
> setwd("~/test")
> list.files()
[1] "file1.txt" "file2.txt"
And then:
> abspath(files)
[1] "/home/myself/test/file1.txt" "/home/myself/test/file2.txt"
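Here is a self-contained sketch of the same idea that can be run without an existing ~/test directory. The function name to_absolute and the demo directory are hypothetical; it takes basename() for the final component so that an input such as sub/file1.txt does not get its directory part repeated.

```r
# Turn relative file paths into absolute ones (the files must exist,
# or at least their parent directories must).
to_absolute <- function(files) {
  file.path(normalizePath(dirname(files)), basename(files))
}

# Set up a throwaway directory tree for the demonstration
demo_dir <- file.path(tempdir(), "abspath_demo")
dir.create(file.path(demo_dir, "sub"), recursive = TRUE, showWarnings = FALSE)
file.create(file.path(demo_dir, "sub", "file1.txt"))

# Resolve a relative path from inside the demo directory, then restore the
# previous working directory
old_wd <- setwd(demo_dir)
result <- to_absolute("sub/file1.txt")
setwd(old_wd)

print(result)
print(file.exists(result))
```

Because the result is absolute, it still points at the file after the working directory has been changed back.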
I see that people gave pieces of the solution, but not all of it.
I have used this:
outputFile = file.path(normalizePath(dirname(inputFile)), "my_file.ext")
Hope it helps.
fs::path_abs() is my preferred way. It avoids the backslashes of normalizePath().