Is it possible to define a cross-platform working directory for R? - r

I am teaching R tutorials in person w/ a large number of undergraduate R novices. I am also trying to format my notes on RPubs so that they can easily be used by other people. Nothing derails things faster than people mis-specifying working directories or saving spreadsheet files to someplace different than their working directory.
Is it possible to define a working directory that is universal across platforms? For example, a line of code or a function like
setwd( someplace that is likely to exist on every computer)
This could involve a function that finds some place the usually exists on all computers, such as the desktop, downloads folder, or R directory.

In general your best bet is to go for the user's home area,
setwd("~")
path.expand("~")
Since you are teaching novices, a common problem is that students notice the R package directory ~/R/ and assume that they should put their scripts this directory; thereby creating odd bugs. To avoid this, I would go for
dir.create("~/RCourse", FALSE)
setwd("~/RCourse")
If you use RStudio, you could get them to create an RStudio project.
In the past, I have come across situations where this doesn't work. For example, some people have their home area as a network drive, but can't connect to the internet or get through a firewall.

You asked about
someplace that is likely to exist on every computer
and yes, R works hard to ensure this returns a valid directory: tempdir().
The main danger, though, may be that this directory will vanish after the session (unless you override the default behaviour of removing the per-session temporary directory at end). Until then, it works.
Still, this can be useful. I sometimes use that to write temporary files I don't want to clutter in the current directory, or ~.
Otherwise #csgillespie gave you a good answer pertaining to $HOME aka ~.

Related

Unable to use correct file paths in R/RStudio

Disclaimer: I am very new here.
I am trying to learn R via RStudio through a tutorial and very early have encountered an extremely frustrating issue: when I am trying to use the read.table function, the program consistently reads my files (written as "~/Desktop/R/FILENAME") as going through the path "C:/Users/Chris/Documents/Desktop/R/FILENAME". Note that the program is considering my Desktop folder to be through my documents folder, which is preventing me from reading any files. I have already set and re-set my working directory multiple times and even re-downloaded R and RStudio and I still encounter this error.
When I enter the entire file path instead of using the "~" shortcut, the program is successfully able to access the files, but I don't want to have to type out the full file path every single time I need to access a file.
Does anyone know how to fix this issue? Is there any further internal issue with how my computer is viewing the desktop in relation to my other files?
I've attached a pic.
Best,
Chris L.
The ~ will tell R to look in your default directory, which in Windows is your Documents folder, this is why you are getting this error. You can change the default directory in the RStudio settings or your R profile. It just depends on how you want to set up your project. For example:
Put all the files in the working directory (getwd() will tell you the working directory for the project). Then you can just call the files with the filename, and you will get tab completion (awesome!). You can change the working directory with setwd(), but remember to use the full path not just ~/XX. This might be the easiest for you if you want to minimise typing.
If you use a lot of scripts, or work on multiple computers or cross-platform, the above solution isn't quite as good. In this situation, you can keep all your files in a base directory, and then in your script use the file.path function to construct the paths:
base_dir <- 'C:/Desktop/R/'
read.table(file.path(base_dir, "FILENAME"))
I actually keep the base_dir assignemnt as a code snippet in RStudio, so I can easily insert it into scripts and know explicitly what is going on, as opposed to configuring it in RStudio or R profile. There is a conditional in the code snippet which detects the platform and assigns the directory correctly.
When R reports "cannot open the connection" it means either of two things:
The file does not exist at that location - you can verify whether the file is there by pasting the full path echoed back in the error message into windows file manager. Sometimes the error is as simple as an extra subdirectory. (This seems to be the problem with your current code - Windows Desktop is never nested in Documents).
If the file exists at the location, then R does not have permission to access the folder. This requires changing Windows folder permissions to grant R read and write permission to the folder.
In windows, if you launch RStudio from the folder you consider the "project workspace home", then all path references can use the dot as "relative to workspace home", e.g. "./data/inputfile.csv"

Relative paths in R: how to avoid my computer being set on fire?

A while back I was reading an article about improving project workflow. The advice was not to use setwd or my computer would burn:
If the first line of your R script is
setwd("C:\Users\jenny\path\that\only\I\have")
I will come into your office and SET YOUR COMPUTER ON FIRE 🔥.
I started using the here package and it worked great until I started to schedule scripts using cronR. After asking this question my laptop was again threatened with arson:
If the first line of your #rstats script is wd <- here(), I will come
into your lab and SET YOUR COMPUTER ON FIRE.
Fearing for my laptop's safety I started using the method suggested in the answer to get relative file paths:
wd <- Sys.getenv("HOME")
wd <- file.path(wd, "projects", "my_proj")
Which worked for me but not people I was working with who didn't have the same projects directory. So now I'm confused. What is the safest / best way get relative file paths so that a project can be portable?
There are quite a few options: 1, 2. My requirements are to source functions/scripts and read/write csv files. Perhaps the rprojroot package is the best bet?
Create an RStudio project and then reference all files with relative paths from the project's root folder. That way, all users will open the project and automatically have the correct working directory.
There are many ways to organize code and data for use with R. Given that the "arsonist" described in the OP has rejected at least two approaches for locating the project files in an R script, the best next step is to ask the arsonist how s/he performs this function, and adjust your code and file structures accordingly.
UPDATE: Since the "arsonists" appear to be someone who writes on Tidyverse.org (see Tidyverse article in OP) and an answer on SO (see additional links in OP), your computer appears to be relatively safe.
If you are sharing code or executing it with batch processes where the "user" is someone other than you, a useful approach is to place the code, data, and configuration under version control, and develop a runbook to explain how others can retrieve the components and execute them on another computer.
As noted in the comments to the OP, there's nothing wrong with here::here() if its use can be made reliable through documentation in a runbook.
I structure all of my R code into Projects within RStudio, which are organized into a gitrepositories directory. All of the projects can be accessed as subdirectories from the gitrepositories directory. If I need to share a project, I make the project accessible to other users on GitHub.
In my R code I reference external files as subdirectories from the project root directory, such as ./data/gen01.csv.
There are two parts to this question:
how to load data from a relative path, and
how to load code from a relative path
For most use cases (including when invoking tools from a CRON job or similar) the location of the data should either be specified by the user (via command line arguments, standard input or environment variables) or should be relative to the current working directory (getwd() in R).
… Unless the data is a fixed part of the project itself — more on this below.
Loading code from a path that’s relative to other code is simply not supported by base R. For example, source('xyz.r') won’t source an xyz.r file from the project. It will always try to load it from the current working directory, whatever that happens to be. Which is pretty much never what you want. And as you’ve noticed, the ‘here’ package also doesn’t always work.
R basically only works when code is only loaded from packages. But packages aren’t suitable for all types of projects. R has no built-in solution for those other cases. I recommend using ‘box’ modules to solve this. ‘box’ provides a modern module system for R, which means that you can have R projects consisting of multiple code files (and nested sub-projects), without having to wrap them in packages. Loading code inside the same relative path in a module is as simple as
box::use(./xyz)
This always works, as you’d expect from a modern module system, and doesn’t require ‘here’ or similar hacks.
OK, back to the point about data that’s bundled with a project itself. If your project is an R package, you’d use system.file() to load that data. However, this once again doesn’t work for non-package projects. But if you use ‘box’ modules to structure your project, you can use box::file() to load data that’s associated with a module.
Packages such as ‘here’ or ‘rprojroot’, while well-intended, are essentially hacks to work around limitations in R’s handling of non-package code. The proper solution is to make non-package code into a first-class citizen of the R world, and ‘box’ does that.
You can check docs of RSuite package (https://RSuite.io). It is working with script_path that points to currently run R script. I use it to make relative paths using 'file.path' command

How can I share (GitHub) my code (R) with sensitive information (passwords)?

Imagine you are using a package that uses an access token. Maybe one from rOpenSci.
My current approach is to source a file at the beginning that is in the .gitignore. It, hence, gets ignored and I can share without worries.
source("never-commit-password.R")
However, there is still a danger that it might be uploaded via .RData because I left it in the workspace.
What is leading practice that trades off convenience with safety?

How i Connect the data that's in my folder without writing the specific path like:"C:\\Users\\Dima\\Desktop\\NewData\\..."

I am writing a script that's Requires Data Which is in my computer folder.
But eventually this script will be used in another computer, by another person.
I can't tell him to change all the paths to the data in the script.
How i Connect the data that's in my folder without writing the specific path
Like:"C:\Users\Dima\Desktop\NewData\..."
The best way of making your code shareable depends upon your use case.
As Carl Witthoft pointed out, most code should be encapsulated in functions. These functions can then be packaged into packages and easily redistributed on other peoples's machines. Writing packages is easier than you think.
For one off analyses, scripts are appropriate. How you make them user-independent depends on who your users are. If your are sharing the script with colleagues, try to keep your data on a network drive, then the link to the data will be the same for everyone. If you are sharing your script with the world, then keep your data on the internet, and the link to the data will be a hyperlink, again, the same for everyone.
If you are sharing your script with a few people who don't have access to a common drive, and you can't put your data on the internet, then some directory manipulation is acceptable.
Change your working directory to the root of where your project files are.
setwd("c:/Users/Dima/My Project")
Then you can reference the location of the data using relative paths.
data_file <- "Data/My data file.csv"
my_data <- read.csv(data_file)
Assuming that you keep the directory structure within your project the same, then you only need to change the call to setwd on each machine.
Also note that the special location "~" refers to your user home directory. Try
normalizePath("~")
That way, if you keep your project in that location, you can avoid reference to "Dima" entirely.

.goutputstream-XXXXX - possible to relocate?

I've been trying to create a union file system for a college project. One of its features that differentiates it from unionfs is the fact that there are no copy-ups. This means that if a file is located in a certain branch, it will remain there even if it is written to.
But my current problem with that is the fact that .goutputstream-XXXXX are created, renamed, and deleted whenever a write operation occurs. This is actually OK if the file being written to is in the highest priority branch (i.e. the default branch where files can be created), but makes my kernel crash if I try to write to a file in a lower branch.
How do I deal with this? How can I rig it so that all .goutputstream-XXXXX files are written to only one location? These .goutputstream-XXXXX files seem to be intricately connected to the files they correspond too, and seem to work only the same directory as the file being written to.
I also noticed that .goutputstream-XXXXX files appear when a directory is read. What are they for, anyway?
There has been a bug submitted to the ubuntu launchpad in which the creation of .goutputstream-xxxxx files is discussed.
https://bugs.launchpad.net/ubuntu/+source/lightdm/+bug/984785
From what i see now, these files are created when shutting down without preceding logout, but several other sources may occur, like evince or maybe gedit.
maybe lightdm has something to do with the creation of these files.
which distribution did you use?
maybe changing the distribution would help.
.goutputstream-XXXXX created by gedit and there is no simple way (menu or settings) to relocate them.

Resources