Can I work with parallel working directories in R, or can I change the working directory in a loop to access the files from different folders?
I find it easier to have a single working directory. You can find out what it is using the
getwd()
function. Typically, my working directory is something like:
~/colin/project1/R
You can change your working directory using
setwd()
You can easily access other files using the full path. In particular, I find
## List files in the current directory
list.files()
## Give full paths
list.files(full.names = TRUE)
## List files in the species1 directory
list.files("species1/", full.names = TRUE)
very handy.
Don't change the working directory in a loop; instead, loop over the directories and use file.path() to build the path to the file you want. Something like:
for (path in c("data1", "data2", "data3")) {
  for (file in c("file1.txt", "file2.txt")) {
    fullPath <- file.path(path, file)
    doSomethingWith(fullPath)
  }
}
That will loop over data1/file1.txt, data1/file2.txt, and so on. Note that file.path() also handles differences in path separators between operating systems, so don't try to glue path components together with paste(): you'll get it wrong.
Related
I have already finished my R Markdown document and I'm trying to tidy up the workspace a little. This isn't strictly necessary, more of an organizational habit (I'm not even sure it's good practice), so that I can keep the data separate from the scripts and other R- and git-related files.
I have a bunch of .csv files for data that I used. Previously they were on (for example)
C:/Users/Documents/Project
which is what I set as my working directory. But now I want them in
C:/Users/Documents/Project/Data
The problem is that this breaks the following code, because the files are no longer in the working directory.
#create one big dataframe by unioning all the data
bigfile <- vroom(list.files(pattern = "*.csv"))
I've tried pointing list.files() at the folder where the CSVs are, but no luck.
bigfile <- vroom(list.files(path = "C:/Users/Documents/Project/Data", pattern = "*.csv"))
Error: 'data1.csv' does not exist in current working directory ('C:/Users/Documents/Project').
Is there a way to only access the /Data folder once for creating my dataframe with vroom() instead of changing the working directory multiple times?
You can list files, including those in all subdirectories (Data in particular), using list.files(pattern = "*.csv", recursive = TRUE).
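For example, a minimal sketch (note that the pattern argument to list.files() is a regular expression, so "\\.csv$" is a safer pattern than the glob-style "*.csv"):
library(vroom)

# Paths come back relative to the working directory, so vroom()
# can find the files without any setwd()
bigfile <- vroom(list.files(pattern = "\\.csv$", recursive = TRUE))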
Best practices
Have one directory of raw and only raw data (the stuff you measured)
Have another directory of external data (e.g. reference databases). This is something you can remove afterwards and redownload if required.
Have another directory for the source code
Put only the source code directory under version control, plus one other file containing checksums of the raw and external data to prove integrity (see the sketch after this list)
Everything else must be reproducible from the raw data and the source code, and can be removed after the project. You may want to keep small result files (e.g. tables) that take a long time to reproduce.
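As a sketch of the checksum point, base R's tools::md5sum() can record the sums; the file and directory names here are placeholders:
# Record MD5 sums of the raw data so integrity can be verified later
raw_files <- list.files("raw_data", recursive = TRUE, full.names = TRUE)
sums <- tools::md5sum(raw_files)
write.csv(data.frame(file = names(sums), md5 = unname(sums)),
          "checksums.csv", row.names = FALSE)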
You can list the files and capture the full file paths, right?
bigfile <- vroom(list.files(path = "C:/Users/Documents/Project/Data", pattern = "*.csv", full.names = TRUE))
and that should read the files in that directory without reference to your working directory
Try one of these:
# list all csv files within Data within current directory
Sys.glob("Data/*.csv")
# list all csv files within immediate subdirectories of current directory
Sys.glob("*/*.csv")
If you only have csv files then these would also work, though they seem less desirable. They might be useful if you quickly want to review which files and directories are there. (Be very careful not to use the second one inside statements that delete files: if you are not in the directory you think you are in, you can wind up deleting files you did not intend to delete. The first one carries the same risk but is a bit safer, since it would only delete the wrong files if the directory you are in happens to have a Data subdirectory.)
# list all files & directories within Data within current directory
Sys.glob("Data/*")
# list all files & directories within immediate subdirectories of current directory
Sys.glob("*/*")
If the subfolder always has the same name (or the same number of characters), you should be able to do it with substring. In your example, "Data" has 4 characters (5 with the /), so the following code should do it:
Repository <- substring(getwd(), 1, nchar(getwd())-5)
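If the subfolder name varies, a sketch of an alternative that avoids counting characters: dirname() strips the last path component whatever its length.
# Parent directory of the current working directory
Repository <- dirname(getwd())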
I am trying to copy Java files by listing them from a folder (guava-master) and its sub-folders (which is why I used the recursive argument) using the code below:
filenames <- list.files("C:/Users/shahr/Documents/master_unzip/guava-master/", pattern="*.java", recursive = TRUE)
The above list is fine, but then...
I tried to copy them into a new folder using the code below:
file.copy(filenames, "C:/Users/shahr/Documents/")
However, the output I receive is FALSE for every file and no files are copied. Am I making a mistake?
Many thanks.
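A likely explanation, offered as an untested sketch: list.files(recursive = TRUE) returns paths relative to the listed folder, not to your working directory, so file.copy() cannot find them and returns FALSE. Asking for full names (and tightening the pattern, which is a regular expression) should fix it:
# full.names = TRUE makes the returned paths usable from anywhere
filenames <- list.files("C:/Users/shahr/Documents/master_unzip/guava-master/",
                        pattern = "\\.java$", recursive = TRUE,
                        full.names = TRUE)
file.copy(filenames, "C:/Users/shahr/Documents/")
One caveat: files with the same basename in different subfolders will collide in the flat target directory, and file.copy() will report FALSE for those too unless overwrite = TRUE is set.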
I use the googlesheets package. The default directory for spreadsheets is the root of Google Drive. I suppose I can specify the directory, as with a "normal" directory path, but I don't know how to do that.
gs_new(title = "MyData")            # exports to the root
gs_new(title = "Something/MyData")  # what I want: export to the specified directory
I'm also interested in this question. I will try the following to see if it works. If not, I may try the 'googledrive' package on top of, or instead of, the 'googlesheets' package to create sheets inside a folder hierarchy. That way I can loop through a list of subfolders, creating the files inside each one until all subfolders have their new files.
So here's my thinking... When I have time to test this out, I'll let you know!
for (path in file_paths) {
  setwd(path)
  for (file in files) {
    gs_new(file)
  }
}
Of course, get your parent folder as a string and use list.files("string", full.names = TRUE). Then, if you have any subfolders (assuming they're created already), it'll return a vector you can loop through. If you just want to create one workbook at one location, simply setting the working directory might work. Again, I'll need to test this with multiple methods.
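For what it's worth, an untested sketch of the googledrive route mentioned above, assuming the newer googlesheets4 package for sheet creation (both package choices are assumptions, not something from the original answer): create the sheet at the Drive root, then move it into the target folder.
library(googlesheets4)
library(googledrive)

sheet <- gs4_create("MyData")         # created at the Drive root
drive_mv(sheet, path = "Something/")  # trailing slash marks a folder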
There is a folder that contains some files of interest. Using R, I want to call every file in the folder. I can do it individually for each file as follows:
source("filename.r")
But is there any way to specify all such files in the folder in one stroke?
I would use the list.files() command in combination with a for loop. Make sure you have set your working directory. You can also specify a directory in list.files().
Files <- list.files()
for (i in seq_along(Files)) {
  assign(paste("File_", i, sep = ""), read.csv(Files[i]))
}
Obviously, the second input in my assign() call can differ depending on what type of files you want to read.
I have a list of folders. In each folder there is an identical R script that has to run on the files in that folder. I wrote the script once and copied it into each folder. The problem is that I have around 100 folders, so it is impossible to setwd() into each working directory by hand. I would like to know if it is possible to set the current working directory with, for example, a "." like this:
setwd("/User/myname/./")
or in another easy way that tells R the current working directory instead of typing the folder name every time.
how about this?
# set the working directory to the main folder containing all the directories
setwd( "/user/yourdir/" )
# pull all files and folders (including subfolders) into a character vector
# keep ONLY the files that END with ".R" or ".r"
r.scripts <- list.files( pattern=".*\\.[rR]$" , recursive = TRUE )
# look at the contents.. now you've got just the R scripts..
# i think that's what you want?
r.scripts
# and you can loop through and source() each one
for ( i in r.scripts ) source( i )
As far as I understand, you want to trigger a batch of R scripts, where the scripts are distributed across a number of folders.
Personally, I would probably write a shell script (or OS equivalent) to do this, rather than doing it in R.
for dir in /directoriesLocation/*/
do
cat $dir/scriptName.R | R --slave --args $arg1 $arg2
done
where $dir iterates over the directories containing the R script scriptName.R.
In addition to the other great answers the source function has a chdir argument that will temporarily change the working directory to the one that the sourced file is in.
One option would be to create a vector with the filenames (including paths) for each of your script files, using list.files and/or other tools. Then source each of those files, letting source with chdir handle setting the working directory for you.
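A short sketch of that combination (the top-level folder is a placeholder):
# Gather every .R/.r script under the top folder, then source each one
# with chdir = TRUE so it runs with its own folder as the working directory
scripts <- list.files("/User/myname", pattern = "\\.[rR]$",
                      recursive = TRUE, full.names = TRUE)
for (s in scripts) source(s, chdir = TRUE)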