R - Specify the directory using the package googlesheets - r

I use the googlesheets package. The default directory for spreadsheets is the root of Google Drive. I guess that I can specify the directory - like for a "normal" directory path - but I don't know how to do that.
gs_new(title = "MyData") # export to the root
gs_new(title = "Something/MyData") # export to the specified directory

I'm also interested in this question. I will try the following to see if it works. If not, I may try to use the 'googledrive' package on top of, or in replacement of, the 'googlesheets' package to do sheet creation in a list folder hierarchy. This way I can loop through a list of subfolders while creating any files inside them until all subfolders have their new files created.
So here's my thinking... When I have time to test this out, I'll let you know!
for(path in file_paths){
  setwd(path)
  for(file in files){
    gs_new(file)
  }
}
Of course, get your parent folder as a string and use list.files("string", full.names=TRUE). Then, if you have any subfolders (assuming they're created already), it'll return a character vector of paths to loop through. If you just want to create one workbook at one location, simply setting the working directory might work. Again, I'll need to test this with multiple methods.

Related

Is there a way to reference files in a folder within the working directory in R?

I have already finished my RMarkdown document and I'm trying to clean up the workspace a little. This isn't strictly necessary; it's more of an organizational practice (and I'm not even sure it's a good one) so that I can keep the data separate from scripts and other R- and git-related files.
I have a bunch of .csv files for data that I used. Previously they were in (for example)
C:/Users/Documents/Project
which is what I set as my working directory. But now I want them in
C:/Users/Documents/Project/Data
The only problem is that this breaks the following code, because the files are no longer in the working directory.
#create one big dataframe by unioning all the data
bigfile <- vroom(list.files(pattern = "*.csv"))
I've tried adding a full path to list.files() to where the csvs are but no luck.
bigfile <- vroom(list.files(path = "C:/Users/Documents/Project/Data", pattern = "*.csv"))
Error: 'data1.csv' does not exist in current working directory ('C:/Users/Documents/Project').
Is there a way to only access the /Data folder once for creating my dataframe with vroom() instead of changing the working directory multiple times?
You can list files including those in all subdirectories (Data in particular) using list.files(pattern = "*.csv", recursive = TRUE)
Best practices
Have one directory of raw and only raw data (the stuff you measured)
Have another directory of external data (e.g. reference databases). This is something you can remove afterwards and redownload if required.
Have another directory for the source code
Put only the source code directory under version control, plus one other file containing checksums of the raw and external data to prove integrity
Everything else must be reproducible using the raw data and the source code, so it can be removed after the project. You may want to keep small result files (e.g. tables) that take a long time to reproduce.
You can list the files and capture the full file path, right?
bigfile <- vroom(list.files(path = "C:/Users/Documents/Project/Data", pattern = "*.csv", full.names = TRUE))
and that should read the files in the directory without reference to your wd
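A minimal, self-contained sketch of why full.names = TRUE helps. It uses a temporary directory and base R's read.csv() in place of vroom() so it runs without extra packages; the folder, file names, and contents are made up for illustration.

```r
# Build a throwaway "Data" folder with two small CSV files (hypothetical data).
data_dir <- file.path(tempdir(), "Data")
dir.create(data_dir, showWarnings = FALSE)
write.csv(data.frame(id = 1:2, value = c(10, 20)),
          file.path(data_dir, "data1.csv"), row.names = FALSE)
write.csv(data.frame(id = 3:4, value = c(30, 40)),
          file.path(data_dir, "data2.csv"), row.names = FALSE)

# Without full.names, only bare file names come back, so a reader would
# look for them in the working directory and fail to find them.
bare_names <- list.files(path = data_dir, pattern = "\\.csv$")

# With full.names = TRUE, each entry includes the directory, so the
# files can be read no matter what getwd() returns.
full_paths <- list.files(path = data_dir, pattern = "\\.csv$", full.names = TRUE)

# Read every file and stack the rows into one data frame.
bigfile <- do.call(rbind, lapply(full_paths, read.csv))
```

The pattern here is written as the regular expression "\\.csv$" because list.files() interprets pattern as a regex rather than a shell glob.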
Try one of these:
# list all csv files within Data within current directory
Sys.glob("Data/*.csv")
# list all csv files within immediate subdirectories of current directory
Sys.glob("*/*.csv")
If you only have csv files then these would also work, but they seem less desirable. They might be useful, though, if you quickly want to review what files and directories are there. (I would be very careful not to use the second one within statements that delete files: if you are not in the directory you think you are in, you can wind up deleting files you did not intend to delete. The first one might too, but it is a bit safer since it would only lead to deleting the wrong files if the directory you are in does have a Data subdirectory.)
# list all files & directories within Data within current directory
Sys.glob("Data/*")
# list all files & directories within immediate subdirectories of current directory
Sys.glob("*/*")
If the subfolder always has the same name (or the same number of characters), you should be able to do it thanks to substring. In your example, "Data" has 4 characters (5 with the /), so the following code should do:
Repository <- substring(getwd(), 1, nchar(getwd())-5)
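The character-counting approach depends on the subfolder name having a fixed length. As an alternative sketch (not the method proposed above), base R's dirname() drops the last path component whatever its length. The path below is hypothetical and stands in for getwd().

```r
# Hypothetical path standing in for getwd() when the working directory is Data.
wd <- "C:/Users/Documents/Project/Data"

# Character-counting approach: strip "/Data" (5 characters) from the end.
by_substring <- substring(wd, 1, nchar(wd) - 5)

# dirname() removes the last component regardless of how long it is.
by_dirname <- dirname(wd)
```

Both give the parent folder here, but only the dirname() version keeps working if the subfolder is renamed.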

Is there a way to change all the directory path for all R files?

I recently got a new computer and moved all my work files over to it. The main issue is that the file structure is slightly different from that of my previous computer. Therefore, for my R code to work correctly, I'd need to change the path in setwd() for each of my files. Is there an efficient way to do this? Or is there a better practice for setting the directory or reading files into R?
I highly recommend the here package. Much more efficient way to organize, find, read, and collaborate with/across R files compared to setwd(), which is bound by local use and paths.
1) If all the paths are set in setwd commands then, as a first step, define your own setwd that checks whether the argument is an old path and, if so, replaces it with a new one. The software otherwise does not need to change, so it can be done quickly.
setwd <- function(dir) {
  if (dir == "oldpath1") dir <- "newpath1"
  else if (dir == "oldpath2") dir <- "newpath2"
  # etc
  base::setwd(dir)
}
2) To fix this for the future as well, instead of the above, define the paths as options and put them in your .Rprofile file.
setwd <- function(dir) {
  if (dir == "oldpath1") dir <- getOption("MYPROJ_PATH1")
  else if (dir == "oldpath2") dir <- getOption("MYPROJ_PATH2")
  # etc
  base::setwd(dir)
}
and in your .Rprofile
options(MYPROJ_PATH1 = "...whatever...")
options(MYPROJ_PATH2 = "...whatever...")
# etc
Then if you move computers again or change paths for any reason, it is just a matter of setting the options in the .Rprofile.
An additional benefit is that if you forget where things are, such as when returning to a project that you have not worked on for some time, the key paths for all your projects are located in your .Rprofile.
The .Rprofile is usually in the path shown by this R command
path.expand("~/.Rprofile")
but can be placed in certain other locations as discussed in ?Startup.
3) Over time you might want to remove the setwd command defined above and replace each use of setwd using code like this:
myproj_path1 <- getOption("MYPROJ_PATH1")
setwd(myproj_path1)
Also you might be able to simplify things if everything in the project can be put in a single directory tree in which case you could just change the root of the tree and keep all other directories as fixed relative path offsets which do not change in moving to another computer. Thus there is only one root directory that needs to be changed each time you move.
root <- getOption("MYPROJ_ROOT")
path1 <- file.path(root, "relative_path1")
path2 <- file.path(root, "relative_path2")
Smaller projects can often do that but if there are several projects that share resources that likely won't be possible. For example you may have a database directory that is shared and other directories that are not. At any rate you can try to reduce the number of root paths as much as possible by fixing relative paths to the degree feasible and only changing roots.
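A small runnable sketch of the single-root idea. The option name MYPROJ_ROOT follows the snippet above; the sub-directory names and the example file are hypothetical, and a temporary directory stands in for the real project root that would normally be set in .Rprofile.

```r
# In practice this line lives in .Rprofile; tempdir() is a stand-in root.
options(MYPROJ_ROOT = tempdir())

root <- getOption("MYPROJ_ROOT")
if (is.null(root)) stop("Option MYPROJ_ROOT is not set; define it in .Rprofile")

# Hypothetical sub-directories, fixed relative to the root.
data_dir    <- file.path(root, "data")
scripts_dir <- file.path(root, "scripts")

# Write and read a file through the root-relative path, without setwd().
dir.create(data_dir, showWarnings = FALSE)
write.csv(data.frame(x = 1:3), file.path(data_dir, "example.csv"),
          row.names = FALSE)
example <- read.csv(file.path(data_dir, "example.csv"))
```

Moving to another computer then only requires changing the one options() line.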

How to import an external dataset into a Moodle question?

I would like to import an external dataset using read.table() (or any other function for reading files) and then randomize or sample over it. The file is stored in a subfolder within the parent folder that contains the exercises (*.Rmd). I am working within an RStudio project. I tried placing the dataset at different levels of the folder structure. Using relative paths did not work, but absolute paths did.
My folder structure is:
$home/project_name/exercises # It contains the RMD files
$home/project_name/exercises/data # It contains data files that I want to process
$home/project_name/datasets # this folder could eventually contain the dataset I want to process
To make this code more portable, I would like to know how to manage relative paths within *.Rmd files for the knitting process.
The exercises are copied to a temporary directory and processed there. Hence, the easiest option is to copy these files to the temporary directory using include_supplement("file.csv"). By default this assumes that the file.csv is in the same directory that the exercise itself resides in. If it is in a subdirectory you can use include_supplement("file.csv", recursive = TRUE) and then subdirectories are searched recursively for file.csv.
After using include_supplement(), the copied file is available locally and can be processed with read.table() or also included in the exercise as a supplementary file. See http://www.R-exams.org/templates/Rlogo/ for a worked example. However, note that the Rlogo template explicitly specifies the directory from which the file should be copied. This is not necessary if that directory is the same as the exercise or a subdirectory.

Can I work with multiple working directories in R?

Can I work with parallel working directories in R, or can I change the working directory in a loop to access the files from different folders?
I find it easier to have a single working directory. You find out what that is using the
getwd()
function. Typically, my working directory is something like:
~/colin/project1/R
You can change your working directory using
setwd()
You can easily access other files using the full path. In particular, I find
##List files in current directory
list.files()
##Give full path
list.files(full.names=TRUE)
##list files in the species1 directory
list.files("species1/", full.names=TRUE)
very handy.
Don't change the working directory in a loop. Instead, loop over the directories and use file.path to get to the file you want. Something like:
for(path in c("data1","data2","data3")){
  for(file in c("file1.txt","file2.txt")){
    fullPath = file.path(path,file)
    doSomethingWith(fullPath)
  }
}
That will loop over data1/file1.txt, data1/file2.txt and so on. Note it will also handle differences between path separators in different operating systems - don't try and paste file path components together with paste because you'll get it wrong.
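For a version of the loop above that can be run as-is, here is a sketch that first creates the folders and files under a temporary directory and then reads them back. The base folder name and file contents are made up, and readLines() stands in for the doSomethingWith() placeholder.

```r
# Create hypothetical directories and files under a temporary base folder.
base  <- file.path(tempdir(), "loop_demo")
dirs  <- c("data1", "data2", "data3")
files <- c("file1.txt", "file2.txt")
for (d in dirs) {
  dir.create(file.path(base, d), recursive = TRUE, showWarnings = FALSE)
  for (f in files) writeLines(paste(d, f), file.path(base, d, f))
}

# Loop over directories and files without ever calling setwd().
contents <- character(0)
for (d in dirs) {
  for (f in files) {
    full_path <- file.path(base, d, f)
    contents <- c(contents, readLines(full_path))
  }
}
```

The working directory is the same before and after the loop, so later code that relies on it is unaffected.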

Where to store .xls file for xlsReadWrite in R

I am relatively new to R and am having some trouble accessing my data. I have created my test.xls file in My Documents. How do I access it from R?
library(xlsReadWrite)
DF1 <- read.xls("test.xls") # read 1st sheet
Set the working directory with:
setwd("C:/Documents and Settings/yourname/My Documents")
This link may be useful as a method of making working folders per project and then placing all relevant info in that folder. It's a nice tutorial for making project files that contain everything you need. This is one approach.
http://www.dangoldstein.com/flash/Rtutorial2/Rtutorial2.html
The setwd() is another approach. I use a combination of the two in my work.
