Global path conversion from Windows to macOS (R language example) - r

I tried to find some already existing solutions - no success so far.
The problem
There are some projects, all run in R, by a group of people, where each team member uses Windows as the main operating system.
Nearly each script file uses the following command at the very beginning
setwd("Z://00-00-00/path/to/project")
The scripts rely on shared disk space under the path Z://00-00-00/. Since I work on macOS, my paths look like /common-drive/path/to/project, so the question is:
Is there a way to include a command/script in some sort of file like ~/.bashrc, or maybe in some R-related settings, that will detect Windows-style absolute file paths and convert them to macOS-style paths?
What I think should run is:
path.to.be.used <- "Z://00-00-00/path/to/project"
str_replace(path.to.be.used, "Z://00-00-00/", "/common-drive/")
However, all scripts have the path hard-coded directly in setwd, so I cannot change each file by hand. That is why I am looking for a workaround that will convert these paths silently.
Does anyone have an idea how to do this? Is there any way to control, at the system or RStudio level, whether the path should be converted?
Thank you for your time and help!

As others said in the comments, you should convince your co-workers not to do that. However, that's often difficult, so here's a hack solution (mentioned by @MrFlick):
setwd <- function(dir) {
  newdir <- sub("Z://00-00-00/", "/common-drive/", dir)
  cat("Requested ", dir, ", using ", newdir, "\n")
  base::setwd(newdir)
}
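One possible way to keep this wrapper out of the shared scripts is to define it in a personal startup file such as ~/.Rprofile and to rewrite paths only on macOS. The following is a minimal sketch under that assumption; translate_path is a hypothetical helper name, and both prefixes are simply the ones quoted in the question.

```r
# Hypothetical helper: map the Windows share prefix to the macOS mount point.
# Both prefixes come from the question; adjust them to the real mount.
translate_path <- function(dir,
                           from = "Z://00-00-00/",
                           to = "/common-drive/") {
  # fixed = TRUE treats `from` as a literal string, not a regular expression
  sub(from, to, dir, fixed = TRUE)
}

# Mask setwd() only on macOS ("Darwin"); other systems keep base::setwd.
if (Sys.info()[["sysname"]] == "Darwin") {
  setwd <- function(dir) {
    base::setwd(translate_path(dir))
  }
}

translate_path("Z://00-00-00/path/to/project")
```

Paths that do not start with the Windows prefix pass through unchanged, so the wrapper should be harmless for scripts that already use macOS paths. A script that clears the global environment would also remove the wrapper.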

Related

R command dir.create and file.path

I've just started learning R and am confused by the following question given in the course:
Create a directory in the current working directory called “testdir2” and a subdirectory for it called “testdir3”, all in one command by using dir.create() and file.path().
I couldn't get it to accept my answer and then found another site online giving the answers. This is the answer the other site gave:
dir.create(file.path('testdir2', 'testdir3'), recursive = TRUE)
After copy/pasting this answer it still didn't let me progress in the course. Is there something wrong with the answer?
Also, why would I want to use file.path to create the folders? Would it not make more sense to do this:
dir.create("testdir2/testdir3", recursive = TRUE)
What is the purpose of using the file.path function to create folders?
The purpose of using file.path() to create folders is so that you can write a function, script, or package that can be used by people on different kinds of computers. Different kinds of computers, or platforms, use different file separators. Unix systems, including Macs, use the forward slash: /. Windows systems use the backslash: \.
Try looking at .Platform and you'll see there are a number of variables that you can refer to in order to create platform independent code. .Machine is another one.
Anyhow, the idea is that file.path(dir1, dir2, dir3) can create a valid path no matter what kind of platform R is running on.
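As a small illustration of that idea, the sketch below builds the nested path from its components and creates both levels in one call. It works under a temporary directory, which is an assumption made only so the demo leaves the real working directory untouched.

```r
# Build the nested path from its components instead of typing the separator.
nested <- file.path("testdir2", "testdir3")

# file.path() joins its arguments with .Platform$file.sep by default.
same_sep <- identical(
  nested,
  paste("testdir2", "testdir3", sep = .Platform$file.sep)
)

# Create both levels in one call under a throwaway root directory.
root <- tempfile("demo_")
created <- dir.create(file.path(root, nested), recursive = TRUE)
dir.exists(file.path(root, "testdir2", "testdir3"))
```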
As to why an auto grader didn't accept your answer, they can be very finicky, especially about hidden whitespace characters you can sometimes pick up when copying and pasting. Sometimes they test the output that your command produces, but sometimes (bad) auto graders just test the input, so even if your command would produce the same behavior, if it's not exactly the same, the auto grader won't accept it.
I got this code to work:
dir.create(file.path("testdir2","testdir3"), recursive = TRUE)
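I think you just needed double quotes around "testdir2" and "testdir3". R itself treats single and double quotes the same, so this can only matter to a grader that compares the typed input literally.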

How can I set `path.expand` to begin at my working directory?

I'm using a Mac. path.expand('~') resolves to a folder several levels above my desired working directory. For example:
path.expand('~')
[1] "/Users/my.name"
I'd like to change it to something like this:
path.expand('~')
[1] "/Users/my.name/drive/R/project/sub.folder"
How can I go about this?
Thank you.
The tilde is special in all Unix-like systems (including macOS) in that it refers to what the operating system considers the home directory (via the environment variable HOME).
There are two types of answers to this. Can it be done? Perhaps, sure even. Should it be done? There will likely be unintended consequences (that may be hard to troubleshoot and/or workaround), so likely not.
This works on my ubuntu box:
me@mybox:/some/path$ Rscript -e 'Sys.getenv("HOME")'
[1] "/home/me"
me@mybox:/some/path$ HOME=/tmp/ Rscript -e 'Sys.getenv("HOME")'
[1] "/tmp/"
me@mybox:/some/path$ Rscript -e 'Sys.setenv(HOME="/tmp/");Sys.getenv("HOME")'
[1] "/tmp/"
(This notably does not work as well on Windows ... which is not very unix-y of it!)
So you can try overriding it with either:
Sys.setenv(HOME = "/Users/my.name/drive/R/project/sub.folder"), or
Set the HOME variable in your working environment before starting R.
This might have unintended consequences. For instance, R looks for ~/.Rprofile, and git and other commands look for ~/.gitconfig and similar files.
My recommended way ahead would be to define a variable and change directory to it when needed. If you use RStudio, then its "Projects" can always start you in the correct directory. If not, and you still want this "special directory" available to you, perhaps add this to your /Users/username/.Rprofile (in your "actual" homedir):
.specialdir <- "/Users/my.name/drive/R/project/sub.folder"
and, whenever you need to go there, use setwd(.specialdir). One side-effect of this is that any of your code, functions, reports, or anything else that uses this variable will no longer be reproducible.
A way to easily reference your files without needing to change the HOME directory is to use the here package. This basically uses a heuristic to find the project root based on where your script is. Normally it looks for an RStudio project file (.Rproj) or for a .git directory if your project is a git repository. It's easy to use and robust to moving machines or accidental use of setwd, or even forgetting to set HOME on a different machine/profile.
If your data file some_data.csv is stored in /Users/my.name/drive/R/project/sub.folder/some_data.csv, where project is the root folder for the project:
here::here()
[1] "/Users/my.name/drive/R/project"
here::here("sub.folder", "some_data.csv")
[1] "/Users/my.name/drive/R/project/sub.folder/some_data.csv"
and you can use it as a drop in replacement for the path, as in:
data <- read_csv(here::here("sub.folder", "some_data.csv"))

risks of using setwd() in a script?

I've heard it said that it is bad practice to use setwd() in a script.
What are the risks/dangers associated with it?
What are better alternatives?
It's an issue of reproducible code. If you specify a directory that doesn't exist on someone else's computer, then they can't use your code. This is particularly bad with absolute file paths, and particularly bad with Windows file paths (which are absolutely impossible to replicate on a Unix system).
My preferred solution is to specify that the user should be in the relevant directory on their own system before starting to run the code. If for your own convenience you want to put a setwd(...) right at the top of your code, where other people can notice it and comment it out as appropriate, but the rest of your code assumes only relative paths from that starting directory, that's OK with me.
Yihui Xie (author of knitr) feels particularly strongly about this:
https://groups.google.com/forum/?fromgroups=#!topic/knitr/knM0VWoexT0
Whenever you want to manipulate files, they are assumed to be under
the same directory of your source (e.g. Rnw documents). Then you can
always use relative paths and you will never need to setwd(). Using
setwd() contradicts with the principle of reproducibility, e.g. you
use setwd('foo/bar/') and the directory may not exist in other
people's computers. See FAQ 7:
https://github.com/yihui/knitr/blob/master/FAQ.md
And from the aforementioned FAQ 7:
You'd better not do this [change working directory inside knitr code
chunks]. Your working directory is always getwd() (all output files
will be written here), but the code chunks are evaluated under the
directory where your input document comes from. Changing working
directories while running R code is a bad practice in general. See #38
for a discussion. You should also try to avoid absolute directories
whenever possible (use relative directories instead), because it makes
things less reproducible.
See also: https://github.com/yihui/knitr/issues/38
I can't think of any particular issues with using setwd() in a script run on a server I manage as it does return an error which can be trapped with try(), and you can manage it. I have used setwd() when being lazy about paths - see below!
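To make the point about trapping the error concrete, here is a minimal sketch. It uses tryCatch() rather than the try() mentioned above, and safe_setwd is a hypothetical helper name rather than anything from base R.

```r
# Hypothetical helper: try to change directory and report a failure
# instead of stopping the script. Returns TRUE on success, FALSE otherwise.
safe_setwd <- function(dir) {
  tryCatch({
    setwd(dir)
    TRUE
  }, error = function(e) {
    message("Could not change to ", dir, ": ", conditionMessage(e))
    FALSE
  })
}

old <- getwd()
# This directory is assumed not to exist, so the call should fail cleanly.
ok <- safe_setwd(file.path(tempdir(), "no-such-directory"))
identical(getwd(), old)  # the working directory is left unchanged
```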
I use file.path() extensively in scripts, production or otherwise, for example when working across the files in an input directory and putting the output graphics and reports elsewhere. This would be a bit tedious using setwd(). So something along the lines of... (untested)
kInDir <- '~/Indir'
kOutDir <- '~/Outdir'
flist <- dir(path=kInDir, pattern='^[a-z]{2,5}\\.csv$')
# note I could have used full.names=T - but it's easier not to...
for (fnam in flist) {
  # full path to the report file created
  sfnam <- file.path(kOutDir, sub('\\.csv$', '_report.txt', fnam))
  # full path to the csv file that will be created
  ofnam <- file.path(kOutDir, sub('\\.csv$', '_b.csv', fnam))
  #
  # ok... we're going to process this CSV file...
  r1 <- read.csv(file.path(kInDir, fnam))
  #
  # we'll put the output from the analysis into this report file
  sink(sfnam, split=TRUE)
  # process it... into a new data.frame k1
  # blah blah blah...
  #
  write.csv(k1, file=ofnam, row.names=FALSE)
  sink() # turn off this particular report file
}
Toward the better alternatives question:
I mainly use R for individual projects (meaning I'm the primary analyst). However, we do use these in projects which sometimes need to be shared with others.
RStudio - Projects
I have found RStudio's Projects functionality goes a long way to keeping your files organized. If other users also adopt RStudio, they will have the nice feeling of being able to open a single file ("*.Rproj") and have the project load in the same state you last saved it to.
ProjectTemplate
On top of this, I've found a new tool, ProjectTemplate that goes a step further! The technique the author developed is used to provide structure to what you are doing. Please go over to the website for more detail.
Though problems with setwd() have been targeted, I would like to add one more to the what are the alternatives part of the question. We often work with git where the relative path is very convenient
setrelwd <- function(rel_path) {
  curr_dir <- getwd()
  abs_path <- file.path(curr_dir, rel_path)
  if (dir.exists(abs_path)) {
    setwd(abs_path)
  } else {
    warning('Directory does not exist. Please create it first.')
  }
}
> setrelwd("Summer2016")
Warning message:
In setrelwd("Summer2016") : Directory does not exist. Please create it first.
Also if you don't want to see the warning message but create a folder right away see Check existence of directory and create if doesn't exist
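For the variant that creates the folder right away, a minimal sketch could look like this. ensure_dir is a hypothetical helper name, and the demo runs under tempdir(), an assumption made only to avoid touching real project folders.

```r
# Hypothetical helper: make sure a directory exists, creating it if needed,
# and return its path so the call can be nested inside other calls.
ensure_dir <- function(path) {
  if (!dir.exists(path)) {
    dir.create(path, recursive = TRUE)
  }
  path
}

# Demo: create a nested folder, then call again to show it is idempotent.
target <- ensure_dir(file.path(tempdir(), "Summer2016", "data"))
target_again <- ensure_dir(target)
dir.exists(target)
```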
To make things a bit more portable where I work, we all put this in our .Rprofile:
hdrive <- switch(Sys.info()[["sysname"]],
  "Linux" = "/mnt/hdrive",
  "Windows" = "H:/",
  "Darwin" = "/Volumes/hdrive/mnt/hdrive"
)
So I always have that variable to get me to our shared drive. Then in my script we can write
setwd(file.path(hdrive, "relative", "path"))
So that gets us around some of the problems that others are talking about.
I personally added the following code. I use Sys.info() and any() with unique information.
First step is to use Sys.info() and find the unique identifier for your computer.
if (any(Sys.info() == "COMPUTER1")) {
  setwd("c:/Users/user1/repos/project/")
}
if (any(Sys.info() == "COMPUTER2")) {
  setwd("/home/user1/repos/project/")
}
and just add the name of the computer to the if statement and add the correct path. Just add a new if for each machine.
For reproduction it does not change anyone's working directory unless they are that specific user.
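The chain of if statements can also be written as a lookup table keyed by the machine name, which may be easier to extend. This is only a sketch: project_dirs and project_dir_for are hypothetical names, and the machine names and paths are placeholders modelled on the example above.

```r
# Hypothetical mapping from machine name to project directory.
project_dirs <- c(
  COMPUTER1 = "c:/Users/user1/repos/project/",
  COMPUTER2 = "/home/user1/repos/project/"
)

# Return the directory registered for `node`, or NULL for an unknown machine.
project_dir_for <- function(node = Sys.info()[["nodename"]]) {
  if (node %in% names(project_dirs)) project_dirs[[node]] else NULL
}

proj <- project_dir_for()
# Only change directory on a machine that is listed and has the folder.
if (!is.null(proj) && dir.exists(proj)) setwd(proj)
```

On any machine that is not in the table, nothing happens, which matches the behavior described above.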

How to open a local html file from R in an operating system independent way?

For demonstration purposes, assume that the file is called test.html and is in the working directory.
initial thoughts
system('gnome-open test.html')
This works on Ubuntu
browseURL(paste('file://', getwd(),'test.html', sep='/'))
This works on Ubuntu, but it feels like a bit of a hack and I'm not certain whether it would work on Windows.
You might find my open.file.in.OS function useful, sources can be found here.
A short summary about what this function does:
Check platform
Based on platform, call:
shell.exec on Windows
open with system on Mac
and xdg-open with system on other Unix-like operating system
Uses shQuote on the provided file
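The steps in that summary could be sketched roughly as follows. This is a simplified illustration, not the source of the function described above; opener_for and open_file are hypothetical names.

```r
# Hypothetical helper: choose the mechanism that opens a file on a platform.
opener_for <- function(sysname = Sys.info()[["sysname"]]) {
  switch(sysname,
    Windows = "shell.exec",  # handled by R's shell.exec(), Windows only
    Darwin  = "open",        # macOS command-line opener
    "xdg-open"               # fallback for other Unix-like systems
  )
}

# Hypothetical helper: open `path` with the platform's default application.
open_file <- function(path) {
  opener <- opener_for()
  if (opener == "shell.exec") {
    shell.exec(path)
  } else {
    # shQuote() protects spaces and special characters in the path
    system(paste(opener, shQuote(path)))
  }
}

opener_for("Darwin")
```

Calling open_file("test.html") would then hand the file to whatever application the operating system associates with HTML files.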
Update: See now the openFileInOS in the pander package.
library(pander)
openFileInOS("d:/del/dt/a.html")
References: this function is a forked version of David Hajage's convert function can be found here.
I just wanted to pull the answer given by @daroczig out of the comments and into an answer. If @daroczig wants to post this as a separate answer, I'll delete this copy.
openHTML <- function(x) browseURL(paste0('file://', file.path(getwd(), x)))
Use the file.path function to construct the file path.
file.path(..., fsep = .Platform$file.sep)
...: character vectors.
fsep: the path separator to use.
By default it will use the current os path separator.
For example
> file.path ("", "home", "phoxis", "paragraph")
[1] "/home/phoxis/paragraph"
This generates my file "/home/phoxis/paragraph"
Note the blank string "" at the beginning. This forces an extra "/" to be added at the start, which in my case generates the absolute path. Adjust it to generate an absolute or relative path as needed, and have a look at ?file.path
I think this will fulfil your needs

get filename and path of `source`d file

How can a sourced or Sweaved file find out its own path?
Background:
I work a lot with .R scripts or .Rnw files.
My projects are organized in a directory structure, but the path of the project's base directory frequently varies between different computers (e.g. because I just do parts of data analysis for someone else, and their directory structure is different from mine: I have projects base directories ~/Projects/StudentName/ or ~/Projects/Studentname/Projectname and most students who have just their one Project usually have it under ~/Measurements/ or ~/DataAnalysis/ or something the like - which wouldn't work for me).
So a line like
setwd (my.own.path ())
would be incredibly useful, as it would ensure that the working directory is the base path of the project regardless of where that project actually is, without the user having to think about setting the working directory.
Let me clarify: I look for a solution that works with pressing the editor's/IDE's source or Sweave Keyboard shortcut of the unthinking user.
Just FYI, knitr will setwd() to the dir of the input file when (and only when) evaluating the code chunks, i.e. if you call knit('path/to/input.Rnw'), the working dir will be temporarily switched to path/to/. If you want to know the input dir in code chunks, currently you can call an unexported function knitr:::input_dir() (I may export it in the future).
Starting from gsk3's and Seb's suggestions, here's an idea:
the combination of username (login) and IP or name of the computer could be used to select the right directory.
That leads to something like:
setwd (switch (paste (Sys.info () [c ("user", "nodename")], collapse="."),
  user.laptop = "~/Messungen",
  user2.server = "~/Projekte/Projekt/"
))
So there is an automatic solution, that
works with source
works with Sweave
even works for interactive sessions where the commands are sent line by line
the combination of user and nodename of course needs to be specific
the paths need to be edited by hand, though.
Improvements welcome!
Update:
Gabor Grothendieck answered the following to a related question on r-help today:
this.dir <- dirname(parent.frame(2)$ofile)
setwd(this.dir)
which will work for source.
Another update: I now do most of the data analysis work in RStudio. RStudio's projects basically solve the problem: RStudio changes the working directory to the project root directory every time I switch between projects.
I can therefore put the project directory as far down my directory tree as I want (and the students can also put their copy wherever they want) and sync the data files and scripts/.Rnws via version control (We use a private git server). The RStudio project files are kept out of the version control, i.e. .gitignore contains .Rproj.user.
Obviously, within the project, the directory structure needs to be synchronized.
You can use sys.calls() to get the command used to source the file. Then you need a bit of trickery using regular expressions to get the pathname, bearing in mind that source("something/filename") could have used either the absolute or relative path. Here's a first attempt at putting all the pieces together: try inserting the following lines at the top of a source file.
whereFrom=sys.calls()[[1]]
# This should be an expression that looks something like
# source("pathname/myfilename.R")
whereFrom=as.character(whereFrom[2]) # get the pathname/filename
whereFrom=paste(getwd(),whereFrom,sep="/") # prefix it with the current working directory
pathnameIndex=gregexpr(".*/",whereFrom) # we want the string up to the final '/'
pathnameLength=attr(pathnameIndex[[1]],"match.length")
whereFrom=substr(whereFrom,1,pathnameLength-1)
print(whereFrom) # or "setwd(whereFrom)" to set the working directory
It's not very robust—for instance, it will fail on windows with source("pathname\\filename"), and I haven't tested what happens if you have one file sourcing another file—but you might be able to build a solution on top of this.
I have no direct solution how to obtain the directory of the file itself but if you have a limited range of directories and directory structures you can probably use
if(file.exists("c:/somedir")==TRUE){setwd("c:/somedir")}
You could check out the pattern of the directory in question and then set the dir. Does this help you?
An additional problem is that the working directory is a global variable, which can be changed by any script, so if your script calls another script, it will have to set the wd back. In RStudio I use Session -> Set Working Directory -> To Source File Location (I know, it's not ideal), and then my script does
wd = getwd ()
...
source ("mySubDir/myOtherScript.R", chdir=TRUE); setwd (wd)
...
source ("anotherSubDir/anotherScript.R", chdir=TRUE); setwd (wd)
In this way one can maintain a stack of working directories. I would love to see this implemented in the language itself.
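Until something like that exists in the language, the save-and-restore step can be wrapped in a small function so it cannot be forgotten. The sketch below uses on.exit() for the restore; run_in_dir is a hypothetical name (the withr package offers with_dir(), which follows the same idea).

```r
# Hypothetical helper: evaluate `code` with `dir` as the working directory,
# then restore the previous directory even if `code` raises an error.
run_in_dir <- function(dir, code) {
  old <- setwd(dir)                # setwd() returns the previous directory
  on.exit(setwd(old), add = TRUE)
  force(code)                      # `code` is lazy, so it runs inside `dir`
}

before <- getwd()
inside <- run_in_dir(tempdir(), getwd())
after <- getwd()
identical(before, after)  # the caller's working directory is restored
```

Nested calls restore their directories in reverse order, which gives the stack-like behavior described above.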
This answer works for source and also inside nvim-R - I have no idea if it works with knitr and similar things. Any feedback appreciated.
If you have multiple scripts source-ing each other, it is important to get the correct one. That is, the largest i for which sys.frame(i)$ofile exists.
get.full.path.to.this.sourced.script = function() {
  for(i in sys.nframe():1) {      # Go through all the call frames,
                                  # in *reverse* order.
    x = sys.frame(i)$ofile
    if(!is.null(x))               # if $ofile exists,
      return(normalizePath(x))    # then return the full absolute path
  }
}
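A small self-contained check of the nested case is sketched below: an outer script sources an inner one, and each records its own path. The function is restated under the hypothetical name get_script_path so the block runs on its own, and the temporary script files are an assumption of the demo.

```r
# Same idea as above: walk the call frames from the innermost outwards and
# return the first `ofile` found (source() keeps its file argument there).
get_script_path <- function() {
  for (i in rev(seq_len(sys.nframe()))) {
    x <- sys.frame(i)$ofile
    if (!is.null(x)) return(normalizePath(x))
  }
  NULL
}

# Demo scripts written to temporary files.
inner <- tempfile(fileext = ".R")
outer <- tempfile(fileext = ".R")
writeLines("inner_path <- get_script_path()", inner)
writeLines(c(
  sprintf("source(%s, local = TRUE)", deparse(inner)),
  "outer_path <- get_script_path()"
), outer)

# local = TRUE evaluates each script in the calling environment.
source(outer, local = TRUE)

identical(inner_path, normalizePath(inner))
identical(outer_path, normalizePath(outer))
```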

Resources