Let's say my colleagues and I have a shared directory, such as a SharePoint drive. Our file path to any given directory, say OurProject1, will be the same, the only difference being our username.
So for example my path will be: "C:/Users/JohnLennon/SharedDrive/SharedData/baseline_data"
While theirs will be: "C:/Users/RingoStarr/SharedDrive/SharedData/baseline_data"
I am trying to write a function that will allow any of my colleagues who have mapped the shared drive to run a script that accesses data in the shared directory without having to manually input their username. Keep in mind that the project directory is not the shared drive: if I share this script with a colleague it will be kept outside of the shared directory, so file paths relative to the project won't work.
I have been trying to approach this using an absolute file path set temporarily within the function that infers the first half of the directory path from getwd(). So the function looks a bit like this:
wd <- getwd() # get the user's working dir
usr <- substr(wd, 1, 20) # extract the root down to the username, including the trailing "/"
paste(usr, "SharedDrive/SharedData/baseline_data", sep = "") # prefix this onto the shared directory path
This works fine for RingoStarr, who has the same number of characters in his username as JohnLennon, but what about GeorgeHarrison, or all the other users? Counting characters on line two is clearly a limited approach.
I am looking for a modification to line two that will navigate "blindly" from the working directory, which we assume to be a subdirectory of "C:/Users/Username/", to two levels below the root directory (i.e. the Username directory). ".." won't work here as we don't know whereabouts within the Username directory getwd() is.
I am also open to a different approach to the problem if one exists
Instead of substr, you can try strsplit and then paste with the collapse argument:
wd_split <- strsplit(wd, "\\/")
wd_split
# [[1]]
# [1] "C:" "Users" "JohnLennon" "SharedDrive" "SharedData"
usr <- paste(wd_split[[1]][1:3], collapse = "/")
usr
# "C:/Users/JohnLennon"
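Putting the strsplit idea together with the shared-directory suffix from the question, a minimal sketch might look like this. The function name shared_data_path and its arguments are made up for illustration, and it assumes the working directory sits somewhere under "C:/Users/<username>/".

```r
# Hypothetical helper: build the shared-data path from any working
# directory located below "C:/Users/<username>/".
shared_data_path <- function(wd = getwd(),
                             suffix = "SharedDrive/SharedData/baseline_data") {
  parts <- strsplit(wd, "/", fixed = TRUE)[[1]]
  # Keep the drive, "Users" and the username, whatever its length
  user_root <- paste(parts[1:3], collapse = "/")
  file.path(user_root, suffix)
}

shared_data_path("C:/Users/GeorgeHarrison/Documents/analysis")
# [1] "C:/Users/GeorgeHarrison/SharedDrive/SharedData/baseline_data"
```

On Windows, Sys.getenv("USERPROFILE") usually holds the profile folder directly, which may avoid relying on getwd() at all.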
In my main folder I have many subfolders (AA, BB, CC, DD, etc.), and every one of them contains a script with the same name, run_script.R. I want to run this script in every folder; the number of folders can vary.
My code works, but it only runs the script in the first folder, and I want it to run in every folder.
Also, when I use setwd(folder) I get this error:
Error in setwd(folder) : cannot change working directory
data_folder <- "C:/Users/mosho/Desktop/New folder (2)/"
allfolders <- data.frame(Folders = list.dirs(path = data_folder, recursive = F, full.names = F))
r_scripts <- "run_script.R"
for (folder in allfolders$Folders) {
#setwd(folder)
message(folder)
source(paste0(data_folder,folder,"/",r_scripts))
}
You are on the right path. I made some minor tweaks to your script that should resolve the issue. The points missing from your script are:
allfolders contains only the folder names, not the full paths. setwd() needs the full path; passing just the folder name results in an error unless your current working directory contains that folder. In any case, it is best practice to work with full path names.
Storing allfolders as a plain character vector rather than a data frame makes it much simpler to iterate over.
Below is my worked example:
I created some dummy folders (DIC01, DIC02, DIC03...) under the path "C:\Users\XXXXXX\Documents\TEST MAIN" and placed run_script.R inside each one. This run_script.R contains the simple code print("Hello World !!")
Next I set the initial working directory to the path where all the folders are, i.e. "C:\Users\XXXXXX\Documents\TEST MAIN", and listed the directories within this path as a character vector instead of a data frame. The for loop then iterates over the folder names. Inside it we reset the working directory using the full path to each folder and source the R code.
data_folder <- "C:\\Users\\XXXXXX\\Documents\\TEST MAIN"
setwd(data_folder)
allfolders <- list.dirs(path = data_folder, recursive = F, full.names = F)
r_scripts <- "run_script.R"
for (folder in allfolders) {
print(folder)
setwd(paste0(data_folder,"\\",folder))
source(paste0(data_folder,"\\",folder,"\\",r_scripts))
}
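As a variation on the loop above, here is a rough sketch that avoids calling setwd() by hand. It relies on full.names = TRUE and on the chdir argument of source(); the function name run_in_subfolders is invented for illustration.

```r
# Hypothetical helper: source the same script from every immediate subfolder.
run_in_subfolders <- function(main_dir, script = "run_script.R") {
  # full.names = TRUE returns complete paths, so no manual setwd() is needed
  folders <- list.dirs(main_dir, recursive = FALSE, full.names = TRUE)
  for (folder in folders) {
    script_path <- file.path(folder, script)
    if (!file.exists(script_path)) next  # skip folders without the script
    message("Running ", script_path)
    # chdir = TRUE switches to the script's folder only while it runs
    source(script_path, chdir = TRUE)
  }
  invisible(folders)
}

# Example call (path taken from the question):
# run_in_subfolders("C:/Users/mosho/Desktop/New folder (2)")
```

Because source() restores the working directory afterwards, relative paths used inside each run_script.R resolve against that script's own folder.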
The result I get after execution looks something like this: first the name of the directory, then the output of the script.
I hope this resolves your problem. If so, upvote the answer and let me know.
I have used googledrive functions successfully to access xlsx spreadsheets on my own google drive - so
drive_download(file = "DIRECTOR_TM/Faculty/Faculty Productivity/Faculty productivity.xlsx",
overwrite=TRUE)
works and saves a local copy of the file for me to run analyses on.
Mid year we switched to using team drives and the equivalent
drive_download(file = "Director/Faculty/Faculty Productivity/Faculty productivity.xlsx",
overwrite=TRUE)
doesn't work - I get an error that says "Error: 'file' does not identify at least one Drive file."
So I have tried using the team_drive_get function - and am confused
Director <- team_drive_get("Director")
does work - I get a tibble with one observation. But the file I want is in a subdirectory of the "Director" team drive. So I tried
TeamDrive <- team_drive_get("Director/Faculty/Faculty Productivity/")
but the result is a tibble with 0 observations.
How do I get access to a file in a subdirectory on a team drive?
googledrive identifies objects by ID in a flattened file structure for your team, so you don't need to know the subdirectory. If you know the name of your file, you just need to search the team drive and find its ID. (Your specific question about subdirectories, which is how I found this, is addressed further below.)
library(googledrive)
# environment variables
FILENAME <- "your_file_name"
TEAM_DRIVE_NAME <- "your_team_name_here"
# get file(s)
gdrive_files_df <- drive_find(team_drive = TEAM_DRIVE_NAME)
drive_download(
as_id(gdrive_files_df[gdrive_files_df$name == FILENAME,]$id),
overwrite = TRUE
)
Alternatively, this is what you can do if you do need to find the specific ID of a subdirectory (perhaps for an upload where there is no existing ID for the file).
# environment variables
FILEPATH <- "your_file_path"
TEAM_SUBDIRECTORY <- "your_subdirectory"
# grab the ID of your subdirectory and upload to that directory
drive_upload(
FILEPATH,
path = as_id(gdrive_files_df[gdrive_files_df$name == TEAM_SUBDIRECTORY,]$id),
name = FILENAME
)
I have an object called wanted.bam containing the names of the .bam files I want from three of my directories: path1, path2 and path3. I loop over these directories, look for the wanted files in each one, and apply a FUNCTION to every matching file. The loop works for all the matched files in the first directory, but when it moves on to the next directory it breaks with an error:
Error in value[[3L]](cond) :
failed to open BamFile: file(s) do not exist:
'sort.bam'
my code:
bam.dir<- c("path1","path2","path3")
for (j in 1:length(bam.dir)){
all.bam.files <- list.files(bam.dir[j])
all.bam.files <- grep(wanted.names, all.bam.files, value=TRUE)
print(paste("The wanted number of bam files in this directory:", (length(all.bam.files))))
if(length(all.bam.files)==0){
next
}else{
setwd(bam.dir[j])
}
print(paste("The working directory number:",j,":",(getwd())))
## ****using another loop here for each file to implement a function*****
all.FAD<- {}
for(i in 1:length(all.bam.files)){
output<- FUNCTION(all.bam.files[i])
}
}
You probably don't want to be changing the working directory like this. Instead, use the full.names=TRUE option in list.files to return the full path of each file. Then you can just use read.csv, or whatever, on the full path name without needing to change directory. Your code is failing because once you call setwd(), the relative path to the next directory is resolved against the new working directory, so it no longer points where you expect.
If you want to keep changing directories, just make sure you set the directory back to the base directory at the end of the loop.
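A minimal sketch of the full.names = TRUE approach follows. The helper name find_wanted_files is invented for illustration, and it assumes the wanted object holds exact file names rather than regular expressions.

```r
# Hypothetical helper: collect full paths of wanted files across directories.
find_wanted_files <- function(dirs, wanted) {
  # list.files() accepts several directories at once;
  # full.names = TRUE keeps the directory in each result
  all_files <- list.files(dirs, full.names = TRUE)
  all_files[basename(all_files) %in% wanted]
}

# Example call, using the names from the question:
# bam_paths <- find_wanted_files(c("path1", "path2", "path3"), wanted.bam)
# results   <- lapply(bam_paths, FUNCTION)
```

Since every element is a complete path, the working directory never has to change inside the loop.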
I've been working on an R project (projectA) that I want to hand over to a colleague. What would be the best way to handle workspace references in the scripts? To illustrate, let's say projectA consists of several R scripts that each read input and write output to certain directories (dirs). All dirs are contained within my local Dropbox. The I/O part of the scripts looks as follows:
# Script 1.
# Give input and output names and dirs:
dat1Dir <- "D:/Dropbox/ProjectA/source1/"
dat1In <- "foo1.asc"
dat2Dir <- "D:/Dropbox/ProjectA/source2/"
dat2In <- "foo2.asc"
outDir <- "D:/Dropbox/ProjectA/output1/"
outName <- "fooOut1.asc"
# Read data
setwd(dat1Dir)
dat1 <- read.table(dat1In)
setwd(dat2Dir)
dat2 <- read.table(dat2In)
# do stuff with dat1 and dat2 that result in new data foo
# Write new data foo to file
setwd(outDir)
write.table(foo, outName)
# Script 2.
# Give input and output names and dirs
dat1Dir <- "D:/Dropbox/ProjectA/output1/"
dat1In <- "fooOut1.asc"
outDir <- "D:/Dropbox/ProjectA/output2/"
outName <- "fooOut2.asc"
Etc. Each script reads and writes data from/to file, and subsequent scripts read the output of previous scripts. The question is: how can I ensure that the directory strings remain valid after transfer to another user?
Let's say we copy the ProjectA folder, including subfolders, to another PC, where it is stored at, e.g., C:/Users/foo/my documents/. Ideally, I would have a function FindDir() that finds the location of the lowest common folder in the project, here "ProjectA", so that I can replace every directory string with:
dat1Dir <- paste(FindDir(), "ProjectA/source1", sep= "")
So that:
# At my own PC
dat1Dir <- paste(FindDir(), "ProjectA/source1", sep= "")
> "D:/Dropbox/ProjectA/source1/"
# At my colleague's PC
dat1Dir <- paste(FindDir(), "ProjectA/source1", sep= "")
> "C:/Users/foo/my documents/ProjectA/source1/"
Or perhaps there is a different way? Our work IT infrastructure currently does not allow using a shared disc. I'll put helper-functions in an 'official' R project (ie, hosted on R forge), but I'd like to use scripts when many I/O parameters are required and because the code can easily be viewed and commented.
Many thanks in advance!
You should be able to do this by using relative directory paths. This is what I do for my R projects that I have in Dropbox and that I edit/run on both my Windows and OS X machines where the Dropbox folder is D:/Dropbox and /Users/robin/Dropbox respectively.
To do this, you'll need to
Set the current working directory in R (either in the first line of your script, or interactively at the console before running), using setwd('/Users/robin/Dropbox') (see the full docs for that command).
Change your paths to relative paths, which means they contain just the part of the path below the current directory: in this case 'ProjectA/source1' if you've set your current directory to your Dropbox folder, or just 'source1' if you've set your current directory to the ProjectA folder (which is a better idea).
Then everything should just work!
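As a rough sketch of the relative-path idea, assuming the working directory has already been set to the ProjectA folder, the directory strings from the question could be built like this. The names project_dir and project_path are made up for illustration.

```r
# Hypothetical layout: the only machine-specific value is project_dir.
project_dir <- getwd()  # assumes the working directory is the ProjectA folder

# Build every path relative to the project root
project_path <- function(...) file.path(project_dir, ...)

dat1_file <- project_path("source1", "foo1.asc")
out_file  <- project_path("output1", "fooOut1.asc")

# The scripts would then read and write without any setwd() calls:
# dat1 <- read.table(dat1_file)
# write.table(foo, out_file)
```

With this layout, moving the project to another machine only changes where the working directory is set.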
You may also be interested in an R library that I love called ProjectTemplate - it gives you really nice functionality for making self-contained projects for this sort of work in R, and they're entirely reproducible, moveable between computers and so on. I've written an introductory blog post which may be useful.
How do I get the path to parent directory in R?
I have to write an R script that takes input from a directory in the parent directory and outputs data into another directory in the parent folder. So, if I could find path to parent folder, then I could do this.
You can use dirname on getwd to drop the last (deepest) component of your current directory, leaving the path to its parent:
dirname(getwd())
[1] "C:/Documents and Settings"
In fact, dirname can be nested to go up several parent folders:
Path="FolderA/FolderB/FolderC/FolderD"
dirname(Path)
"FolderA/FolderB/FolderC"
dirname(dirname(Path))
"FolderA/FolderB"
And so on...
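The nesting can be wrapped in a small helper. This is only a sketch, and the name parent_dir is invented for illustration.

```r
# Hypothetical helper: apply dirname() n times to climb n levels.
parent_dir <- function(path, n = 1) {
  for (i in seq_len(n)) path <- dirname(path)
  path
}

parent_dir("FolderA/FolderB/FolderC/FolderD", 2)
# [1] "FolderA/FolderB"
```

Note that dirname() returns "." once a relative path has no components left to drop.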
I assume you mean parent directory of R's working directory?
The simplest solution is probably as follows.
wd <- getwd()
setwd("..")
parent <- getwd()
setwd(wd)
This saves the working directory, changes it to its parent, gets the result in parent, and resets the working directory again. This saves having to deal with the vagaries of root directories, home directories, and other OS-specific features, which would probably require a bunch of fiddling with regexes.
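The same steps can be sketched as a function that restores the working directory even if something goes wrong in between. The name get_parent_wd is made up for illustration.

```r
# Hypothetical wrapper: the working directory is restored even on error.
get_parent_wd <- function() {
  old <- setwd("..")    # setwd() invisibly returns the previous directory
  on.exit(setwd(old))   # restore it when the function exits
  getwd()
}

parent <- get_parent_wd()
```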
Possibly these two tips may help
"~/" # after the forward slash you "are" in your home folder
then on windows
"C:/" # you are in your main hard drive
"G:/" # you are just in another hard drive :-)
on unix you can do something similar with
"/etc/"
then you can go down into any sub directory you need
Or as @Hong Ooi suggests you can go up to the parent dir of your working directory with
"../"
NB: just after the final forward slash, press Tab and you'll see all the files and folders. Very handy, especially in RStudio.
Another possibility:
parts <- unlist(strsplit(getwd(), .Platform$file.sep))
do.call(file.path, as.list(parts[-length(parts)]))
This splits the filepath into directories, drops the last directory, and then recombines the parts into a filepath again.
You could simply use ".." like output_dir <- paste(input_dir, "..", "out", sep = .Platform$file.sep), or using the fs package (install.packages("fs")):
input_dir <- "base/input"
parent_dir <- fs::path(input_dir, "..") # "base/input/.."
output_dir <- fs::path(input_dir, "..", "out") # "base/input/../out"
# to shorten the path (avoid "input/../") you could use `fs::path_norm`:
fs::path_norm(fs::path(input_dir, "..", "out")) # "base/out"
# in case input is a symlink and you want the parent directory of the target, look at `fs::path_real`
In RStudio you could navigate to your code directory and "Set As Working Directory" in Files. And then ".." will work.