When to use `source()` or `attach()`

Part of my project directory structure looks like:
\projects\project\main.R
\projects\project\src
where \src contains a bunch of 1-function-per-file, project-specific functions.
Q: What's the best practice way to add these functions to the working directory projects\project?
There are a few solutions I see:

1. attach("./src"). I'm trying to avoid this because (1) the Google Style Guide recommends avoiding the use of attach() and (2) I receive the following when doing this:

Warning messages:
1: Reading Unix style database directory (./tmp) from Splus on Windows: may
have problems finding some datasets, especially those whose names
differ only by case (file tmp-script1.ssc should not have been made by
Splus on Windows) in: exists(name, where = db)

2. lapply(paste("./src/", list.files("./src/"), sep = ""), source). This works perfectly fine; it just seems clunky. There has to be a better way, right?
3. Refer to my functions by their full name, ./src/myfunc. This will get ugly very quickly. I'm sure there's a better way.
4. Get rid of the ./src part of my directory and just throw all the functions into the main working directory. The problem with this is that I'd prefer to keep a directory structure close to that of John Myles White's ProjectTemplate.
5. Throw all the functions into one file, ./src/func.R, and source that. I guess this approach avoids the ugliness of "2." above, but I'd really like to have one function per file. It just seems cleaner that way.

Try
lapply(list.files("src", full.names = TRUE), source)
EDIT
or
lapply(Sys.glob("src/*"), source)
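If you end up doing this in several projects, a small wrapper keeps it tidy. A sketch - sourceDir() is a made-up name here, not a base function, although the examples in ?source include a similar helper:

sourceDir <- function(path, pattern = "\\.[Rr]$") {
  # source every file in `path` whose name matches `pattern`
  for (f in list.files(path, pattern = pattern, full.names = TRUE)) {
    source(f)
  }
  invisible(path)
}

sourceDir("./src")   # sources every .R/.r file under ./src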

If you don't want to put everything into a local package, then I'd go for option 2.

RDCOMClient log file

I have been using RDCOMClient for a while now to interact with vendor software. For the most part it has worked fine. Recently, however, I have needed to loop through many operations (several hundred), and I am running into problems with the RDCOM.err file growing to a very large size (easily GBs). This file is put in C:\ with no apparent option to change that. Is there some way that I can suppress this output or specify another location for the file to go? I don't need any of the output in the file, so suppressing it would be best.
EDIT: I tried adding a file.remove() call to my script, but R has the file locked. The only way I can get the lock released is to restart R.
Thanks.
Setting the permissions to read only was going to be my suggested hack.
A slightly more elegant approach is to edit one line of the C code in the package in src/RUtils.h from
#define errorLog(a,...) fprintf(getErrorFILE(), a, ##__VA_ARGS__); fflush(getErrorFILE());
to
#define errorLog(a, ...) {}
However, I've pushed some simple updates to the package on GitHub that add a writeErrors() function, which you can use to toggle whether errors are written. This allows the logging to be turned on and off dynamically.
So
library(RDCOMClient)
writeErrors(FALSE)
will turn off the error logging to the file.
I found a workaround for this. I created the files C:\RDCOM.err and C:\RDCOM_server.err and marked them both as read-only. I am not sure if there is a better way to accomplish this, but for now I am running without logging.
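If it helps, that read-only trick can also be scripted. A sketch, assuming Sys.chmod()'s limited behaviour on Windows (it only toggles the read-only attribute) is enough for this purpose:

# Create the log files if they are missing, then mark them read-only so they cannot grow
for (f in c("C:/RDCOM.err", "C:/RDCOM_server.err")) {
  if (!file.exists(f)) file.create(f)
  Sys.chmod(f, mode = "0444")
}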

R command dir.create and file.path

I've just started learning R and am confused by the following question given in the course:
Create a directory in the current working directory called “testdir2” and a subdirectory for it called “testdir3”, all in one command by using dir.create() and file.path().
I couldn't get it to accept my answer and then found another site online giving the answers. This is the answer the other site gave:
dir.create(file.path('testdir2', 'testdir3'), recursive = TRUE)
After copy/pasting this answer it still didn't let me progress in the course. Is there something wrong with the answer?
Also, why would I want to use file.path() to create the folders? Would it not make more sense to do this:
dir.create("testdir2/testdir3", recursive = TRUE)
What is the purpose of using the file.path function to create folders?
The purpose of using file.path() to create folders is so that you can write a function, script, or package that can be used by people who are using different kinds of computers. Different kinds of computers, or platforms, use different file separators. Unix systems use the forward slash, /; this includes Macs. Windows systems use the backslash, \.
Try looking at .Platform and you'll see there are a number of variables that you can refer to in order to create platform independent code. .Machine is another one.
Anyhow, the idea is that file.path(dir1, dir2, dir3) can create a valid path no matter what kind of platform R is running on.
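For illustration, the relevant pieces are easy to inspect from the console:

p <- file.path("testdir2", "testdir3")   # built using .Platform$file.sep
p                                        # "testdir2/testdir3"

.Platform$file.sep                       # the separator R uses when building paths
.Platform$OS.type                        # "unix" or "windows"

dir.create(p, recursive = TRUE)          # creates both directories in one call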
As to why an auto grader didn't accept your answer, they can be very finicky, especially about hidden whitespace characters you can sometimes pick up when copying and pasting. Sometimes they test the output that your command produces, but sometimes (bad) auto graders just test the input, so even if your command would produce the same behavior, if it's not exactly the same, the auto grader won't accept it.
I got this code to work:
dir.create(file.path("testdir2","testdir3"), recursive = TRUE)
I think you just needed double quotes around "testdir2" and "testdir3".

Sourcing So many R scripts

I have lots of .R scripts that I want to source all at once. I have written a function like the one below to source them.
sourcer <- function() {
  source("wil.r")
  source("k.r")
  source("l.r")
}
Please can anyone tell me how to get this code activated and how to call each function any time I want to use it?
In addition to the answer by @user2885462, if the amount of R code you need to source becomes bigger, you might want to wrap the code into an R package. This provides a convenient way of loading the code, and allows you to add tests, documentation, etc. Reading the official package writing tutorial is a good place to start for that.
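A minimal sketch of that route, assuming the usethis and devtools packages are installed (the package name and path below are placeholders):

usethis::create_package("~/myhelpers")   # scaffold an empty package skeleton
# move wil.r, k.r and l.r into ~/myhelpers/R/, then, during development:
devtools::load_all("~/myhelpers")        # load every function without installing
# once the code stabilises, install it and load it with library(myhelpers):
# devtools::install("~/myhelpers")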
For an individual project, I like to have all (or most) of my R functions in separate .r files, all in the same folder: e.g., AllFunctions
Then at the beginning of my main code I run the following line, which sources all the .r files (and files with other source extensions, if they exist - which they usually don't) in the AllFunctions folder:
for (nm in list.files("AllFunctions", pattern = "\\.[RrSsQq]$")) source(file.path("AllFunctions", nm))

How can I get the current file directory in R

I have seen many related answers here, but I didn't find a proper way to solve my problem on a Windows system...
I know the link to the similar question.
I know that setwd() can point to the directory I want; however, my R script may be moved to another directory without any modification, so I want to know the directory of the current file. There are expressions like source(...) in the script, and the sourced file and the executing file sit under the same parent directory in the R project. How can I do this?
Any help appreciated.
You can get your current directory using the getwd() function and give it a name, say:
cpath <- getwd()
Another useful function is file.path(), which can help you build paths to other directories with simple syntax. For example, if you want the directory that is one level "above" the current directory, you can use:
upp.dir <- file.path(cpath, "..")
This gives upp.dir as "Your_Current_Dir/..", i.e. the parent of the current directory. How about another folder (called Folder_A) inside the current directory? Use:
folderA <- file.path(cpath, "Folder_A")
These may help you navigate the file system easily.
Basically, if you write scripts and those scripts depend on where they are, then you are Doing It Wrong.
Write code in packages. Parameterise functions to make them generally applicable. If you have folders with data in, then make one of those parameters a folder.
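As a sketch of that style (the function and file layout here are illustrative, not from the question):

# The data folder is an argument, not an assumption baked into the script
summarise_folder <- function(data_dir) {
  files <- list.files(data_dir, pattern = "\\.csv$", full.names = TRUE)
  lapply(files, function(f) summary(read.csv(f)))
}

summarise_folder("C:/foo/data")   # the caller decides where the data lives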
A script called with source() cannot reliably locate itself, but that shouldn't be a problem, because WHATEVER CALLED THE SCRIPT knows where the script is (it has to, or how else can it call it?) so it could pass that as a parameter. Something like:
> youarehere <- "C:/foo/"
> source("C:/foo/bar.R")
and now bar.R can do setwd(youarehere) and it will work, even if it is badly written such that it relies on sourcing other code in its containing folder.
Or you can do:
> setwd(youarehere)
> source("bar.R")
in your calling function.
But really, it's a fail; it's a sign of badly written code. Use functions, write packages, use devtools; it's really not that hard, and then your code will work anywhere and you won't be writing stupid scripts that are a twisty, turny maze of source() calls.
Stay classy.

risks of using setwd() in a script?

I've heard it said that it is bad practice to use setwd() in a script.
What are the risks/dangers associated with it?
What are better alternatives?
It's an issue of reproducible code. If you specify a directory that doesn't exist on someone else's computer, then they can't use your code. This is particularly bad with absolute file paths, and particularly bad with Windows file paths (which are absolutely impossible to replicate on a Unix system).
My preferred solution is to specify that the user should be in the relevant directory on their own system before starting to run the code. If for your own convenience you want to put a setwd(...) right at the top of your code, where other people can notice it and comment it out as appropriate, but the rest of your code assumes only relative paths from that starting directory, that's OK with me.
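In other words, something like this (the paths are illustrative):

# setwd("C:/Users/me/projects/project")   # convenience only; collaborators edit or comment this out

dat <- read.csv(file.path("data", "input.csv"))                      # everything else is relative
write.csv(dat, file.path("output", "result.csv"), row.names = FALSE)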
Yihui Xie (author of knitr) feels particularly strongly about this:
https://groups.google.com/forum/?fromgroups=#!topic/knitr/knM0VWoexT0
Whenever you want to manipulate files, they are assumed to be under
the same directory of your source (e.g. Rnw documents). Then you can
always use relative paths and you will never need to setwd(). Using
setwd() contradicts with the principle of reproducibility, e.g. you
use setwd('foo/bar/') and the directory may not exist in other
people's computers. See FAQ 7:
https://github.com/yihui/knitr/blob/master/FAQ.md
And from the aforementioned FAQ 7:
You'd better not do this [change working directory inside knitr code
chunks]. Your working directory is always getwd() (all output files
will be written here), but the code chunks are evaluated under the
directory where your input document comes from. Changing working
directories while running R code is a bad practice in general. See #38
for a discussion. You should also try to avoid absolute directories
whenever possible (use relative directories instead), because it makes
things less reproducible.
See also: https://github.com/yihui/knitr/issues/38
I can't think of any particular issues with using setwd() in a script run on a server I manage, since it returns an error that can be trapped with try(), so you can handle failures. I have used setwd() when being lazy about paths - see below!
I use file.path() extensively in scripts, production or otherwise: working across the files in an input directory and putting the output graphics and reports elsewhere. So, something along the lines of the following (untested); this would be a bit tedious using setwd().
kInDir <- '~/Indir'
kOutDir <- '~/Outdir'

flist <- dir(path = kInDir, pattern = '^[a-z]{2,5}\\.csv$')
# note I could have used full.names = TRUE - but it's easier not to...
for (fnam in flist) {
  # full path to the report file created
  sfnam <- file.path(kOutDir, gsub('.csv', '_report.txt', fnam))
  # full path to the csv file that will be created
  ofnam <- file.path(kOutDir, gsub('.csv', '_b.csv', fnam))
  #
  # ok... we're going to process this CSV file...
  r1 <- read.csv(file.path(kInDir, fnam))
  #
  # we'll put the output from the analysis into this report file
  sink(sfnam, split = TRUE)
  # process it... into a new data.frame k1
  # blah blah blah...
  #
  write.csv(k1, file = ofnam, row.names = FALSE)
  sink()   # turn off this particular report file
}
Toward the better alternatives question:
I mainly use R for individual projects (meaning I'm the primary analyst). However, we do use these in projects which sometimes need to be shared with others.
RStudio - Projects
I have found RStudio's Projects functionality goes a long way to keeping your files organized. If other users also adopt RStudio, they will have the nice feeling of being able to open a single file ("*.Rproj") and have the project load in the same state you last saved it to.
ProjectTemplate
On top of this, I've found a new tool, ProjectTemplate that goes a step further! The technique the author developed is used to provide structure to what you are doing. Please go over to the website for more detail.
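A rough sketch of getting started with it, assuming the ProjectTemplate package is installed:

library(ProjectTemplate)
create.project("my-analysis")   # scaffold the standard directory structure
setwd("my-analysis")
load.project()                  # load data and helper code per the project's configuration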
Though the problems with setwd() have been covered, I would like to add one more option to the "what are the better alternatives" part of the question. We often work with git, where relative paths are very convenient:
setrelwd <- function(rel_path) {
  curr_dir <- getwd()
  abs_path <- file.path(curr_dir, rel_path)
  if (dir.exists(abs_path)) {
    setwd(abs_path)
  } else {
    warning('Directory does not exist. Please create it first.')
  }
}
> setrelwd("Summer2016")
Warning message:
In setrelwd("Summer2016") : Directory does not exist. Please create it first.
Also, if you don't want to see the warning message but would rather create the folder right away, see Check existence of directory and create if doesn't exist.
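A variant that creates the directory instead of warning might look like this (a sketch, not taken from the linked answer):

setrelwd2 <- function(rel_path) {
  abs_path <- file.path(getwd(), rel_path)
  if (!dir.exists(abs_path)) dir.create(abs_path, recursive = TRUE)
  setwd(abs_path)
}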
To make things a bit more portable, where I work we all put this in an .Rprofile:
hdrive <- switch(Sys.info()[[1]],
                 'Linux'   = "/mnt/hdrive",
                 'Windows' = "H:/",
                 'Darwin'  = "/Volumes/hdrive/mnt/hdrive")
So I always have that variable to get me to our shared drive. Then in my script we can write
setwd(paste(hdrive, "/relative/path/", sep = "/"))
So that gets us around some of the problems that others are talking about.
I personally added the following code to my scripts. I use Sys.info() and any() with information that is unique to each machine.
The first step is to use Sys.info() to find a unique identifier for your computer.
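For example, the host name is usually the easiest identifier to check:

Sys.info()[["nodename"]]   # e.g. "COMPUTER1" - the machine's host name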
if (any(Sys.info() == "COMPUTER1")) {
  setwd("C:/Users/user1/repos/project/")
}
if (any(Sys.info() == "COMPUTER2")) {
  setwd("/home/user1/repos/project/")
}
Then just add the name of each computer to an if statement with the correct path; add a new if block for each machine.
For reproducibility, this does not change anyone else's working directory unless they are on that specific machine.
