R command dir.create and file.path

I've just started learning R and am confused by the following question given in the course:
Create a directory in the current working directory called “testdir2” and a subdirectory for it called “testdir3”, all in one command by using dir.create() and file.path().
I couldn't get it to accept my answer and then found another site online giving the answers. This is the answer the other site gave:
dir.create(file.path('testdir2', 'testdir3'), recursive = TRUE)
After copy/pasting this answer it still didn't let me progress in the course. Is there something wrong with the answer?
Also, why would I want to use file.path to create the folders? Would it not make more sense to do this:
dir.create("testdir2/testdir3", recursive = TRUE)
What is the purpose of using the file.path function to create folders?

The purpose of using file.path() to create folders is to let you write a function, script, or package that works for people on different kinds of computers. Different kinds of computers, or platforms, use different file separators. Unix systems, including macOS, use the forward slash: /. Windows natively uses the backslash: \, although R on Windows also accepts forward slashes, and file.path() uses / as its default separator there too.
Try looking at .Platform and you'll see there are a number of variables that you can refer to in order to create platform independent code. .Machine is another one.
Anyhow, the idea is that file.path(dir1, dir2, dir3) can create a valid path no matter what kind of platform R is running on.
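As a minimal sketch of the idea above, the following builds the nested path with file.path() and creates both levels in one dir.create() call. It works inside a temporary directory (the `base` variable is only for this illustration) so that nothing is created in the real working directory.

```r
# Work inside a throwaway directory; `base` is a hypothetical location for this demo.
base <- tempfile("demo_")
dir.create(base)

# Build the nested path from its components, using the platform's separator.
nested <- file.path(base, "testdir2", "testdir3")

# recursive = TRUE creates testdir2 first, then testdir3 inside it.
created <- dir.create(nested, recursive = TRUE)

print(.Platform$file.sep)   # the separator file.path() uses by default
print(dir.exists(nested))
```

Without recursive = TRUE the call would fail, because the parent testdir2 does not exist yet.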
As to why an auto grader didn't accept your answer, they can be very finicky, especially about hidden whitespace characters you can sometimes pick up when copying and pasting. Sometimes they test the output that your command produces, but sometimes (bad) auto graders just test the input, so even if your command would produce the same behavior, if it's not exactly the same, the auto grader won't accept it.

I got this code to work:
dir.create(file.path("testdir2","testdir3"), recursive = TRUE)
I think you just needed double quotes around "testdir2" and "testdir3".

Related

Global path conversions from Windows to macOS (R language example)

I tried to find some already existing solutions - no success so far.
The problem
There are some projects, all run in R, by a group of people, where each team member uses Windows as the main operating system.
Nearly each script file uses the following command at the very beginning
setwd("Z://00-00-00/path/to/project")
What is used here is some common disk space under the path Z://00-00-00/. Since I work on macOS, my paths are /common-drive/path/to/project, so the question is:
Is there a way to include a command/script in some sort of file like ~/.bashrc, or maybe some R-related settings, that will convert Windows-like absolute file paths to macOS-like paths when it detects them?
What I think should run is:
path.to.be.used <- "Z://00-00-00/path/to/project"
str_replace(path.to.be.used, "Z://00-00-00/", "/common-drive/")
however, all scripts have the path hard-coded directly in setwd, so I cannot change each file by hand. That is why I am trying to find out some workaround that will convert these paths in a "silent mode".
Does anyone have an idea how to do this? Is there any way to control, at the system or RStudio level, whether the path should be converted?
Thank you for your time and help!

As others said in the comments, you should convince your co-workers not to do that. However, that's often difficult, so here's a hack solution (mentioned by @MrFlick):
setwd <- function(dir) {
  newdir <- sub("Z://00-00-00/", "/common-drive/", dir)
  cat("Requested ", dir, ", using ", newdir, "\n")
  base::setwd(newdir)
}
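
The translation step of this hack can be checked on its own, without changing any directory. The sketch below is an illustration only: translate_path is a hypothetical helper name, and the two prefixes are the ones from the question. It rewrites the prefix only when the path actually starts with it and leaves every other path untouched.

```r
# Hypothetical helper: map the Windows share prefix to the macOS mount point.
# The default prefixes are taken from the question; adjust them for a real setup.
translate_path <- function(dir,
                           from = "Z://00-00-00/",
                           to = "/common-drive/") {
  if (startsWith(dir, from)) {
    # Keep everything after the matched prefix and prepend the new root.
    paste0(to, substring(dir, nchar(from) + 1))
  } else {
    dir
  }
}

print(translate_path("Z://00-00-00/path/to/project"))
print(translate_path("relative/path"))
```

A masking setwd() defined in a personal .Rprofile could call such a helper before handing the result to base::setwd().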

how can i get the current file directory in R

I have seen many related answers here, but I didn't find a proper way to solve my problem on Windows.
I know there is a similar question.
I understand that setwd() can point to the directory I want. However, my R script may be moved to another directory without any modification, so I want to find the directory of the current file. The script contains expressions like source(...), where the sourced file and the executing file sit under the same parent directory in an R project. How can I do this?
Any help appreciated.

You can get your current directory using the getwd() function and give it a name, say:
cpath = getwd()
Another useful function is file.path, which helps you build new paths with simple syntax. For example, to get the directory one level "above" the current directory, you can use:
upp.dir = file.path(cpath, "..")
This gives upp.dir as "Your_Current_Dir/..". How about a folder (called Folder_A) inside the current directory? Use:
folderA = file.path(cpath, "Folder_A")
These may help you navigate the file system easily.
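A short sketch of building paths relative to the working directory. Note that the variable holding the directory is passed unquoted, so that its value (not its name) ends up in the path. dirname() is used here as one alternative way to get the parent directory; Folder_A is just the example name from above and is not created.

```r
cpath <- getwd()

# Parent directory: strip the last path component.
parent_dir <- dirname(cpath)

# A folder inside the current directory (only the path is built, nothing is created).
folder_a <- file.path(cpath, "Folder_A")

print(parent_dir)
print(folder_a)
```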

Basically, if you write scripts and those scripts depend on where they are, then you are Doing It Wrong.
Write code in packages. Parameterise functions to make them generally applicable. If you have folders with data in, then make one of those parameters a folder.
A script called with source() cannot reliably locate itself, but that shouldn't be a problem, because WHATEVER CALLED THE SCRIPT knows where the script is (it has to, or how else can it call it?) so it could pass that as a parameter. Something like:
> youarehere = "C:/foo/"
> source("C:/foo/bar.R")
and now bar.R can do setwd(youarehere) and it will work, even if it is badly written such that it relies on sourcing other code in its containing folder.
Or you can do:
> setwd(youarehere)
> source("bar.R")
in your calling function.
But really, it's a fail, it's a sign of badly written code. Use functions, write packages, use devtools, it's really not that hard, then your code will work anywhere and you won't be writing stupid scripts that are a twisty turny maze of source() calls.
Stay classy.
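
The advice above to make the data folder a parameter can be sketched roughly as follows. The function name count_csv_rows and the demo files are made up for this illustration; the point is only that the function never asks where it lives, because the caller tells it where the data is.

```r
# Hypothetical example: the folder is an argument, so the code works from anywhere.
count_csv_rows <- function(data_dir) {
  files <- list.files(data_dir, pattern = "\\.csv$", full.names = TRUE)
  vapply(files, function(f) nrow(read.csv(f)), integer(1), USE.NAMES = FALSE)
}

# Demo data in a temporary folder, so the sketch is self-contained.
demo_dir <- tempfile("data_")
dir.create(demo_dir)
write.csv(data.frame(x = 1:3), file.path(demo_dir, "a.csv"), row.names = FALSE)
write.csv(data.frame(x = 1:5), file.path(demo_dir, "b.csv"), row.names = FALSE)

rows <- count_csv_rows(demo_dir)
print(rows)
```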

risks of using setwd() in a script?

I've heard it said that it is bad practice to use setwd() in a script.
What are the risks/dangers associated with it?
What are better alternatives?

It's an issue of reproducible code. If you specify a directory that doesn't exist on someone else's computer, then they can't use your code. This is particularly bad with absolute file paths, and particularly bad with Windows file paths (which are absolutely impossible to replicate on a Unix system).
My preferred solution is to specify that the user should be in the relevant directory on their own system before starting to run the code. If for your own convenience you want to put a setwd(...) right at the top of your code, where other people can notice it and comment it out as appropriate, but the rest of your code assumes only relative paths from that starting directory, that's OK with me.
Yihui Xie (author of knitr) feels particularly strongly about this:
https://groups.google.com/forum/?fromgroups=#!topic/knitr/knM0VWoexT0
Whenever you want to manipulate files, they are assumed to be under
the same directory of your source (e.g. Rnw documents). Then you can
always use relative paths and you will never need to setwd(). Using
setwd() contradicts with the principle of reproducibility, e.g. you
use setwd('foo/bar/') and the directory may not exist in other
people's computers. See FAQ 7:
https://github.com/yihui/knitr/blob/master/FAQ.md
And from the aforementioned FAQ 7:
You'd better not do this [change working directory inside knitr code
chunks]. Your working directory is always getwd() (all output files
will be written here), but the code chunks are evaluated under the
directory where your input document comes from. Changing working
directories while running R code is a bad practice in general. See #38
for a discussion. You should also try to avoid absolute directories
whenever possible (use relative directories instead), because it makes
things less reproducible.
See also: https://github.com/yihui/knitr/issues/38

I can't think of any particular issues with using setwd() in a script run on a server I manage as it does return an error which can be trapped with try(), and you can manage it. I have used setwd() when being lazy about paths - see below!
I use file.path() extensively in scripts, production or otherwise, working across the files in an input directory and putting the output graphics and reports elsewhere. So something along the lines of the following (untested). This would be a bit tedious using setwd().
kInDir <- '~/Indir'
kOutDir <- '~/Outdir'
flist <- dir(path=kInDir, pattern='^[a-z]{2,5}\\.csv$')
# note I could have used full.names=T - but it's easier not to...
for (fnam in flist) {
  # full path to the report file created
  # (the dot is escaped and anchored so only the extension is replaced)
  sfnam <- file.path(kOutDir, sub('\\.csv$', '_report.txt', fnam))
  # full path to the csv file that will be created
  ofnam <- file.path(kOutDir, sub('\\.csv$', '_b.csv', fnam))
  #
  # ok... we're going to process this CSV file...
  r1 <- read.csv(file.path(kInDir, fnam))
  #
  # we'll put the output from the analysis into this report file
  sink(sfnam, split=TRUE)
  # process it... into a new data.frame k1
  # blah blah blah... (placeholder: k1 must be created here)
  #
  write.csv(k1, file=ofnam, row.names=FALSE)
  sink() # turn off this particular report file
}

Toward the better alternatives question:
I mainly use R for individual projects (meaning I'm the primary analyst). However, we do use these in projects which sometimes need to be shared with others.
RStudio - Projects
I have found RStudio's Projects functionality goes a long way to keeping your files organized. If other users also adopt RStudio, they will have the nice feeling of being able to open a single file ("*.Rproj") and have the project load in the same state you last saved it to.
ProjectTemplate
On top of this, I've found a new tool, ProjectTemplate that goes a step further! The technique the author developed is used to provide structure to what you are doing. Please go over to the website for more detail.

Though the problems with setwd() have been covered, I would like to add one more to the "what are the alternatives" part of the question. We often work with git, where relative paths are very convenient:
setrelwd <- function(rel_path) {
  curr_dir <- getwd()
  abs_path <- file.path(curr_dir, rel_path)
  if (dir.exists(abs_path)) {
    setwd(abs_path)
  } else {
    warning('Directory does not exist. Please create it first.')
  }
}
> setrelwd("Summer2016")
Warning message:
In setrelwd("Summer2016") : Directory does not exist. Please create it first.
Also if you don't want to see the warning message but create a folder right away see Check existence of directory and create if doesn't exist
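For the "create it right away" variant mentioned above, a minimal sketch could look like this. ensure_dir is a hypothetical helper name, and the demo runs in a temporary location rather than in a real repository.

```r
# Hypothetical helper: create the directory (and any missing parents) if needed.
ensure_dir <- function(path) {
  if (!dir.exists(path)) {
    ok <- dir.create(path, recursive = TRUE)
    if (!ok) stop("Could not create directory: ", path)
  }
  invisible(path)
}

target <- file.path(tempfile("git_"), "Summer2016")
ensure_dir(target)   # creates the folder
ensure_dir(target)   # second call is a no-op
print(dir.exists(target))
```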

To make things a bit more portable where I work, we all put this in an .Rprofile:
hdrive <- switch(Sys.info()[["sysname"]],
  "Linux" = "/mnt/hdrive",
  "Windows" = "H:/",
  "Darwin" = "/Volumes/hdrive/mnt/hdrive"
)
So I always have that variable to get me to our shared drive. Then in my script we can write
setwd(file.path(hdrive, "relative/path"))
So that gets us around some of the problems that others are talking about.
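
One way to make this lookup easier to test is to wrap it in a function that takes the platform name as an argument. The sketch below is an assumption-laden illustration: shared_root is a hypothetical name, the three paths are the ones from the answer above, and the final unnamed argument to switch() acts as the default for unknown platforms.

```r
# Hypothetical wrapper around the platform lookup; paths are from the answer above.
shared_root <- function(sysname = Sys.info()[["sysname"]]) {
  switch(sysname,
    Linux   = "/mnt/hdrive",
    Windows = "H:/",
    Darwin  = "/Volumes/hdrive/mnt/hdrive",
    stop("No shared-drive location known for platform: ", sysname)
  )
}

# Build a project path below the shared root without pasting separators by hand.
project_dir <- file.path(shared_root("Linux"), "relative", "path")
print(project_dir)
```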

I personally added the following code. I use Sys.info() and any() with unique information.
First step is to use Sys.info() and find the unique identifier for your computer.
if (any(Sys.info() == "COMPUTER1")) {
  setwd("c:/Users/user1/repos/project/")
}
if (any(Sys.info() == "COMPUTER2")) {
  setwd("/home/user1/repos/project/")
}
and just add the name of the computer to the if statement and add the correct path. Just add a new if for each machine.
For reproduction it does not change anyone's working directory unless they are that specific user.

get filename and path of `source`d file

How can a sourced or Sweaved file find out its own path?
Background:
I work a lot with .R scripts or .Rnw files.
My projects are organized in a directory structure, but the path of the project's base directory frequently varies between different computers (e.g. because I just do parts of data analysis for someone else, and their directory structure is different from mine: I have projects base directories ~/Projects/StudentName/ or ~/Projects/Studentname/Projectname and most students who have just their one Project usually have it under ~/Measurements/ or ~/DataAnalysis/ or something the like - which wouldn't work for me).
So a line like
setwd (my.own.path ())
would be incredibly useful as it would allow to ensure the working directory is the base path of the project regardless of where that project actually is. Without the need that the user must think of setting the working directory.
Let me clarify: I look for a solution that works with pressing the editor's/IDE's source or Sweave Keyboard shortcut of the unthinking user.

Just FYI, knitr will setwd() to the dir of the input file when (and only when) evaluating the code chunks, i.e. if you call knit('path/to/input.Rnw'), the working dir will be temporarily switched to path/to/. If you want to know the input dir in code chunks, currently you can call an unexported function knitr:::input_dir() (I may export it in the future).

Starting from gsk3's and Seb's suggestions, here's an idea:
the combination of username (login) and IP or name of the computer could be used to select the right directory.
That leads to something like:
setwd (switch (paste (Sys.info () [c ("user", "nodename")], collapse="."),
               user.laptop = "~/Messungen",
               user2.server = "~/Projekte/Projekt/"
))
So there is an automatic solution that works with source, works with Sweave, and even works for interactive sessions where the commands are sent line by line. The combination of user and nodename of course needs to be specific, and the paths need to be edited by hand, though.
Improvements welcome!
Update:
Gabor Grothendieck answered the following to a related question on r-help today:
this.dir <- dirname(parent.frame(2)$ofile)
setwd(this.dir)
which will work for source.
Another update: I now do most of the data analysis work in RStudio. RStudio's projects basically solve the problem: RStudio changes the working directory to the project root directory every time I switch between projects.
I can therefore put the project directory as far down my directory tree as I want (and the students can also put their copy wherever they want) and sync the data files and scripts/.Rnws via version control (We use a private git server). The RStudio project files are kept out of the version control, i.e. .gitignore contains .Rproj.user.
Obviously, within the project, the directory structure needs to be synchronized.

You can use sys.calls() to get the command used to source the file. Then you need a bit of trickery using regular expressions to get the pathname, bearing in mind that source("something/filename") could have used either the absolute or relative path. Here's a first attempt at putting all the pieces together: try inserting the following lines at the top of a source file.
whereFrom=sys.calls()[[1]]
# This should be an expression that looks something like
# source("pathname/myfilename.R")
whereFrom=as.character(whereFrom[2]) # get the pathname/filename
whereFrom=paste(getwd(),whereFrom,sep="/") # prefix it with the current working directory
pathnameIndex=gregexpr(".*/",whereFrom) # we want the string up to the final '/'
pathnameLength=attr(pathnameIndex[[1]],"match.length")
whereFrom=substr(whereFrom,1,pathnameLength-1)
print(whereFrom) # or "setwd(whereFrom)" to set the working directory
It's not very robust—for instance, it will fail on windows with source("pathname\\filename"), and I haven't tested what happens if you have one file sourcing another file—but you might be able to build a solution on top of this.

I have no direct solution how to obtain the directory of the file itself but if you have a limited range of directories and directory structures you can probably use
if (file.exists("c:/somedir")) setwd("c:/somedir")
You could check out the pattern of the directory in question and then set the dir. Does this help you?

An additional problem is that the working directory is a global variable, which can be changed by any script, so if your script calls another script, it will have to set the wd back. In RStudio I use Session -> Set Working Directory -> To Source File Location (I know, it's not ideal), and then my script does
wd = getwd ()
...
source ("mySubDir/myOtherScript.R", chdir=TRUE); setwd (wd)
...
source ("anotherSubDir/anotherScript.R", chdir=TRUE); setwd (wd)
In this way one can maintain a stack of working directories. I would love to see this implemented in the language itself.
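The save-and-restore pattern above can be wrapped in a small helper so that the previous directory is restored even if the code fails. This is a sketch only: with_dir is defined here for illustration (the withr package offers a function of the same name), and it relies on setwd() returning the previous directory and on on.exit() running when the function exits.

```r
# Illustrative helper: run `code` with `dir` as the working directory,
# then always switch back to where we were.
with_dir <- function(dir, code) {
  old <- setwd(dir)                 # setwd() invisibly returns the previous directory
  on.exit(setwd(old), add = TRUE)   # restored on normal exit and on error
  force(code)                       # `code` is evaluated lazily, i.e. after the setwd()
}

start <- getwd()
target <- tempfile("wd_")
dir.create(target)

inside <- with_dir(target, basename(getwd()))
print(inside)                       # name of the temporary directory
print(identical(getwd(), start))    # back where we started
```

Nested calls restore their directories in reverse order, which gives the stack-like behaviour described above.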

This answer works for source and also inside nvim-R - I have no idea if it works with knitr and similar things. Any feedback appreciated.
If you have multiple scripts source-ing each other, it is important to get the correct one. That is, the largest i for which sys.frame(i)$ofile exists.
get.full.path.to.this.sourced.script = function() {
  for (i in sys.nframe():1) {       # Go through all the call frames,
                                    # in *reverse* order.
    x = sys.frame(i)$ofile
    if (!is.null(x))                # if $ofile exists,
      return(normalizePath(x))      # then return the full absolute path
  }
}
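
To try this approach out, the sketch below writes a tiny script to a temporary folder and sources it. The file name where_am_i.R and the helper name this_script are made up for the demo. Note that the technique relies on the ofile variable that source() happens to keep in its own frame, which is an implementation detail rather than a documented interface, so treat it as an assumption that may not hold everywhere.

```r
# Write a small script that looks for `ofile` in the call frames, innermost first.
script_dir <- tempfile("proj_")
dir.create(script_dir)
script <- file.path(script_dir, "where_am_i.R")

writeLines(c(
  "this_script <- function() {",
  "  for (i in sys.nframe():1) {",
  "    x <- sys.frame(i)$ofile",
  "    if (!is.null(x)) return(normalizePath(x))",
  "  }",
  "  NULL",
  "}",
  "this_script()"
), script)

# source() returns the value of the last expression in the file.
result <- source(script, local = new.env())$value
print(basename(result))
print(basename(dirname(result)) == basename(script_dir))
```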

When to use `source()` or `attach()`

Part of my project directory structure looks like:
\projects\project\main.R
\projects\project\src
where \src contains a bunch of 1-function-per-file, project-specific functions.
Q: What's the best practice way to add these functions to the working directory projects\project?
There are a few solutions I see:
1. attach("./src"). I'm trying to avoid this because (1) the Google Styleguide recommends avoiding the use of attach() and (2) I receive the
Warning messages:
1: Reading Unix style database directory (./tmp) from Splus on Windows: may
have problems finding some datasets, especially those whose names
differ only by case (file tmp-script1.ssc should not have been made by
Splus on Windows) in: exists(name, where = db)
when doing this.
2. lapply(paste("./src/",list.files("./src/"),sep=""),source). This works perfectly fine, it just seems clunky. There has to be a better way, right?
3. Refer to my functions by their full name ./src/myfunc. This will get ugly very quick. I'm sure there's a better way.
4. Get rid of the ./src part of my directory and just throw all the functions in the main working directory. The problem with this is that I'd prefer to keep with a directory structure that is close to that of John Myles White's ProjectTemplate.
5. Throw all the functions in one file, ./src/func.R and source that. I guess this approach avoids the ugliness of "2." above, but I'd really like to have one function per file. Just seems cleaner that way.

Try
lapply(list.files("src", full.names = TRUE), source)
EDIT
or
lapply(Sys.glob("src/*"), source)
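A self-contained sketch of the same idea, with two assumptions added for illustration: only files ending in .R are picked up, and the functions are loaded into a separate environment (called helpers here) rather than the global workspace. The demo project and its two one-function files are created in a temporary folder.

```r
# Set up a throwaway project with one function per file under src/.
src_dir <- file.path(tempfile("project_"), "src")
dir.create(src_dir, recursive = TRUE)
writeLines("add_one <- function(x) x + 1", file.path(src_dir, "add_one.R"))
writeLines("double_it <- function(x) x * 2", file.path(src_dir, "double_it.R"))

# Source every .R file in src/ into its own environment.
helpers <- new.env()
r_files <- list.files(src_dir, pattern = "\\.[Rr]$", full.names = TRUE)
for (f in r_files) source(f, local = helpers)

print(sort(ls(helpers)))
print(helpers$double_it(helpers$add_one(2)))
```

Dropping the local argument loads the functions into the global environment, which matches the one-liners above.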

If you don't want to put everything into a local package, then I'd go for option 2.
