Specifying folder paths in R - r

I have a folder structure which looks like the following. I have a file in the Data folder which needs to be processed. After processing, I'll be writing two files, to the Results folder and the Log folder. (I use R in a Windows environment.)
Below is my R code for setting the folder paths for files:
setwd("E:/Assignment")
myData = read.csv("E:/Assignment/Data/accident_rates.csv")
myResultsFileLoc = "E:/Assignment/Results/accident_rates_accumilated.csv"
myLogFileLoc = "E:/Assignment/Results/accident_rates_accumilated_Log.csv"
write.table(results, myResultsFileLoc,...)
write.table(results, myLogFileLoc ,...)
As I have already set the working directory to "E:/Assignment", I don't want to repeat this part for the subfolders when setting the paths for other files. Is there any way to tell R that a folder is a subfolder of the working directory (without specifying the complete path)? I tried the methods below, but they give me errors.
myResultsFileLoc = "/Results/accident_rates_accumilated.csv"
myResultsFileLoc = "~/Results/accident_rates_accumilated.csv"
I'm looking for this solution because my folder structure might expand, and it would be a bit difficult to maintain the folder path for each and every file in the subfolders.

Related

How to import an external dataset into a Moodle question?

I would like to import an external dataset using read.table() (or any other function for reading files) and then randomize or sample over it. The file is stored in a subfolder within the parent folder that contains the exercise *.Rmd files. I am working within an RStudio project. I tried placing the dataset at different levels of the folder structure. Relative paths did not work, but absolute paths did.
My folder structure is:
$home/project_name/exercises # It contains the RMD files
$home/project_name/exercises/data # It contains data files that I want to process
$home/project_name/datasets # this folder could eventually contain the dataset I want to process
To make this code more portable, I would like to know how to manage relative paths within *.Rmd files for the knitting process.
The exercises are copied to a temporary directory and processed there. Hence, the easiest option is to copy these files to the temporary directory using include_supplement("file.csv"). By default this assumes that the file.csv is in the same directory that the exercise itself resides in. If it is in a subdirectory you can use include_supplement("file.csv", recursive = TRUE) and then subdirectories are searched recursively for file.csv.
After using include_supplement(), the copied file is available locally and can be processed with read.table() or also included in the exercise as a supplementary file. See http://www.R-exams.org/templates/Rlogo/ for a worked example. However, note that the Rlogo template explicitly specifies the directory from which the file should be copied. This is not necessary if that directory is the same as the exercise or a subdirectory.

Copying files of specific format from a folder and sub-folders and paste them into a new folder

I am trying to copy Java files by listing them from a folder (guava-master) and sub-folders (that is the reason I used the recursive function) using the code below
filenames <- list.files("C:/Users/shahr/Documents/master_unzip/guava-master/", pattern="*.java", recursive = TRUE)
The above list is fine but then...
I tried to paste them into a new folder using the code below
file.copy(filenames, "C:/Users/shahr/Documents/")
However, the output I receive is FALSE for all the files and I don't see any files being copied. Am I making any mistake?
many thanks.

R - Specify the directory using the package googlesheets

I use the googlesheets package. The default directory for spreadsheets is the root of Google Drive. I guess that I can specify the directory - like for a "normal" directory path - but I don't know how to do that.
gs_new(title = "MyData") # export to the root
gs_new(title = "Something/MyData") # export to the specified directory
I'm also interested in this question. I will try the following to see if it works. If not, I may try to use the 'googledrive' package on top of, or in replacement of, the 'googlesheets' package to do sheet creation in a list folder hierarchy. This way I can loop through a list of subfolders while creating any files inside them until all subfolders have their new files created.
So here's my thinking... When I have time to test this out, I'll let you know!
for(path in file_paths){
  setwd(path)
  for(file in files){
    gs_new(file)
  }
}
Of course, get your parent folder as a string and use list.files("string", full.names=TRUE). Then, if you have any subfolders (assuming they're created already), it'll return a list in which to loop through. If you just want to create one workbook at one location, simply setting the working directory might work. Again, I'll need to test this in multiple methods.

Iterate through folders, then subfolders and print filenames with path to text file

I am trying to use Python to create the files needed to run some other software in batch mode.
For part of this I need to produce a text file that loads the needed data files into the software.
My problem is that the files I need to enter into this text file are stored in a set of structured folders.
I need to loop over a set of folders (up to 20), each of which could contain up to 3 more folders that contain the files I need. The bottom level of the folders contains a set of files needed for each run of the software. The text file should have the path and name of these files printed line by line, followed by an instruction line, and then move on to the next set of files from a folder, and so on until all of the sub-level folders have been checked.
Charles' answer is good, but can be improved upon to increase speed and efficiency. Each item produced by os.walk() (See docs) is a tuple of three items. Those items are:
The working directory
A list of strings naming any sub-directories present in the working directory
A list of files present in the working directory
Knowing this, much of Charles' code can be condensed by modifying the for loop:
import os
def list_files(dir):
    r = []
    for root, dirs, files in os.walk(dir):
        for name in files:
            r.append(os.path.join(root, name))
    return r
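Applied to the task in the question, the same loop could write the text file directly. This is only a minimal sketch: the function name write_load_file, its parameters, and the "RUN" instruction text are hypothetical placeholders, since the question does not say what the instruction line should contain.

```python
import os

def write_load_file(top, out_path, instruction="RUN"):
    """For each folder under top that holds files, write one line per
    file path followed by a single instruction line."""
    with open(out_path, "w") as out:
        for root, dirs, files in os.walk(top):
            dirs.sort()  # sorting in place makes os.walk visit sub-folders in a stable order
            if not files:
                continue  # this folder only contains other folders
            for name in sorted(files):
                out.write(os.path.join(root, name) + "\n")
            # hypothetical instruction line; replace with what the software expects
            out.write(instruction + "\n")
```

Keep out_path outside of top, otherwise the output file itself can show up in the listing on a later run.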
Use os.walk(). The following will output a list of all files within "dir" and its subdirectories. The results can be manipulated to suit your needs:
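import os
def list_files(dir):
    r = []
    # first element of every tuple from os.walk() is a directory path
    subdirs = [x[0] for x in os.walk(dir)]
    for subdir in subdirs:
        # Python 2 only: .next() returns the first (root, dirs, files) tuple
        files = os.walk(subdir).next()[2]
        if (len(files) > 0):
            for file in files:
                r.append(os.path.join(subdir, file))
    return r
For Python 3, change os.walk(subdir).next() to os.walk(subdir).__next__(), or use the built-in next(os.walk(subdir)), which works in both Python 2.6+ and Python 3.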
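For reference, a Python 3 version of the function above might look like the following sketch. It uses the built-in next() rather than calling __next__() directly; the parameter is renamed to top (an arbitrary choice) to avoid shadowing the built-in dir.

```python
import os

def list_files(top):
    """Return the paths of all files in top and its subdirectories (Python 3)."""
    result = []
    # first element of each tuple yielded by os.walk() is a directory path
    subdirs = [entry[0] for entry in os.walk(top)]
    for subdir in subdirs:
        # next() fetches the first (root, dirs, files) tuple,
        # which describes subdir itself; index 2 is its list of files
        files = next(os.walk(subdir))[2]
        for name in files:
            result.append(os.path.join(subdir, name))
    return result
```

Note that this starts a new walk for every subdirectory, which is the extra work the single-loop version avoids.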
This helps when you only want files with a specific extension. My subfolders contain many files, but I am only interested in the Parquet files.
import os
dir = r'/home/output/'
def list_files(dir):
    r = []
    for root, dirs, files in os.walk(dir):
        for name in files:
            filepath = root + os.sep + name
            if filepath.endswith(".snappy.parquet"):
                r.append(os.path.join(root, name))
    return r
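If more than one extension is of interest, the filter can be made a parameter. The sketch below assumes a hypothetical function name, list_files_with_suffix, and relies on str.endswith() accepting either a single string or a tuple of strings.

```python
import os

def list_files_with_suffix(top, suffixes):
    """Return sorted paths under top whose file name ends with suffixes.

    suffixes may be one string or a tuple of strings."""
    matches = []
    for root, dirs, files in os.walk(top):
        for name in files:
            # endswith() checks every suffix when given a tuple
            if name.endswith(suffixes):
                matches.append(os.path.join(root, name))
    return sorted(matches)
```

For example, list_files_with_suffix("/home/output/", ".snappy.parquet") selects the same files as the function above, while passing (".csv", ".parquet") would select both kinds.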

Can I work with multiple working directories in R?

Can I work with parallel working directories in R, or can I change the working directory in a loop to access the files from different folders?
I find it easier to have a single working directory. You find out what that is using the
getwd()
function. Typically, my working directory is something like:
~/colin/project1/R
You can change your working directory using
setwd()
You can easily access other files using the full path. In particular, I find
##List files in current directory
list.files()
##Give full path
list.files(full.names=TRUE)
##list files in the species1 directory
list.files("species1/", full.names=TRUE)
very handy.
Don't change the working directory in a loop. Instead, loop over the directories and use file.path to build the path to the file you want. Something like:
for(path in c("data1","data2","data3")){
  for(file in c("file1.txt","file2.txt")){
    fullPath = file.path(path,file)
    doSomethingWith(fullPath)
  }
}
That will loop over data1/file1.txt, data1/file2.txt and so on. Note it will also handle differences between path separators in different operating systems - don't try and paste file path components together with paste because you'll get it wrong.
