There is a folder that contains some files of interest. Using R, I want to source every file in that folder. I can do it individually for each file as follows:
source("filename.r")
But is there any way to specify all such files in the folder in one stroke?
I would use list.files() in combination with a for loop. Make sure you have set your working directory; you can also pass a directory to list.files().
Files <- list.files()
for (i in seq_along(Files)) {
  assign(paste("File_", i, sep = ""), read.csv(Files[i]))
}
Obviously, the second argument to my assign() call can differ depending on what type of files you are trying to read.
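Since the question is about source() specifically, a minimal sketch (the folder path here is just a placeholder) would be:

# source every .R file in the given folder
for (f in list.files("path/to/folder", pattern = "\\.[rR]$", full.names = TRUE)) {
  source(f)
}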
I have already finished my RMarkdown document and I'm trying to clean up the workspace a little. This isn't strictly necessary, more of an organizational practice (which I'm not even sure is a good practice), so that I can keep the data separate from the scripts and other R- and git-related files.
I have a bunch of .csv files for the data that I used. Previously they were in (for example)
C:/Users/Documents/Project
which is what I set as my working directory. But now I want them in
C:/Users/Documents/Project/Data
The problem is that this breaks the following code, because the files are no longer in the working directory.
#create one big dataframe by unioning all the data
bigfile <- vroom(list.files(pattern = "*.csv"))
I've tried giving list.files() the full path to where the csvs are, but no luck.
bigfile <- vroom(list.files(path = "C:/Users/Documents/Project/Data", pattern = "*.csv"))
Error: 'data1.csv' does not exist in current working directory ('C:/Users/Documents/Project').
Is there a way to only access the /Data folder once for creating my dataframe with vroom() instead of changing the working directory multiple times?
You can list files in all subdirectories (Data in particular) using list.files(pattern = "\\.csv$", recursive = TRUE). Note that pattern is a regular expression, so "\\.csv$" matches the extension more precisely than "*.csv".
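For example, a minimal sketch (assuming the csvs live under Data/ below the working directory; vroom() accepts the relative paths this returns):

library(vroom)
# paths like "Data/data1.csv" resolve from the working directory
bigfile <- vroom(list.files(pattern = "\\.csv$", recursive = TRUE))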
Best practices
Have one directory of raw and only raw data (the stuff you measured)
Have another directory of external data (e.g. reference databases). This is something you can remove afterwards and redownload if required.
Have another directory for the source code
Put only the source code directory under version control, plus one other file containing checksums of the raw and external data to prove integrity (see the sketch after this list)
Everything else must be reproducible from the raw data and the source code, and can be removed after the project. You may want to keep small result files (e.g. tables) that take a long time to reproduce.
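For the checksum file mentioned above, a minimal sketch using base R's tools::md5sum() (the directory and file names here are just examples):

# record one hash per raw-data file so integrity can be verified later
raw_files <- list.files("raw-data", full.names = TRUE, recursive = TRUE)
writeLines(paste(tools::md5sum(raw_files), raw_files), "raw-data.md5")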
You can list the files and capture the full file path, right?
bigfile <- vroom(list.files(path = "C:/Users/Documents/Project/Data", pattern = "*.csv", full.names = TRUE))
and that should read the files in that directory without reference to your working directory.
Try one of these:
# list all csv files within Data within current directory
Sys.glob("Data/*.csv")
# list all csv files within immediate subdirectories of current directory
Sys.glob("*/*.csv")
If you only have csv files then these would also work, but they seem less desirable. They might be useful, though, if you quickly want to review what files and directories are there. (I would be very careful not to use the second one in statements that delete files: if you are not in the directory you think you are in, you can wind up deleting files you did not intend to delete. The first one might too, but it is a bit safer, since it would only lead to deleting the wrong files if the directory you are in happens to have a Data subdirectory.)
# list all files & directories within Data within current directory
Sys.glob("Data/*")
# list all files & directories within immediate subdirectories of current directory
Sys.glob("*/*")
If the subfolder always has the same name (or the same number of characters), you can strip it off with substring(). In your example, "Data" has 4 characters (5 with the /), so the following should do:
Repository <- substring(getwd(), 1, nchar(getwd())-5)
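An alternative that does not depend on the subfolder's length (not part of the original answer) is base R's dirname():

# dirname() drops the last path component, whatever its length
Repository <- dirname(getwd())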
I do not know how to create a reproducible example of this issue, since it would require access to a folder on my computer (if anyone has a suggestion, I am happy to follow up). What I would like to do is load each file in a folder without referencing it by name, only by the order in which it appears in the folder, and I would like to do it in R if possible (although Python would be fine as well). I want to do this because the folder has 40 or so files that I need to format all the same way, but they all have different names.
In other words I want a code that would look something like this:
setwd(folder)
for(i in 1:number of files in folder){
load file i
process file i
rbind(master file,file i)
}
Obviously this is not actual code, merely a framework, which is why I did not put it in code formatting. The line I do not know how to do is the first line in the loop (load file i). Is it possible to do this, or do I need to load every file individually by its specific name?
I think you're looking for list.files()
ff <- list.files()
for (i in seq_along(ff)) {
  print(ff[i])
  dat <- read.csv(ff[i])
  # process dat here
}
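Putting that together with the framework from the question, a sketch along these lines (the processing step is a placeholder) would build the master file:

files <- list.files(pattern = "\\.csv$")  # assuming the 40-odd files are csvs
processed <- lapply(files, function(f) {
  dat <- read.csv(f)
  # apply the same formatting to every file here
  dat
})
master <- do.call(rbind, processed)       # stack everything into one data frame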
I am trying to copy Java files by listing them from a folder (guava-master) and its sub-folders (which is why I use the recursive argument) with the code below
filenames <- list.files("C:/Users/shahr/Documents/master_unzip/guava-master/", pattern="*.java", recursive = TRUE)
The above list is fine, but then...
I tried to copy them to a new folder using the code below
file.copy(filenames, "C:/Users/shahr/Documents/")
However, the output I receive is FALSE for all the files and I don't see any files being copied. Am I making a mistake?
Many thanks.
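A likely cause, worth checking: with recursive = TRUE, list.files() returns paths relative to the directory being listed, not to the working directory, so file.copy() cannot find the source files and returns FALSE. Passing full.names = TRUE should fix it:

# full.names = TRUE returns paths that file.copy() can resolve
filenames <- list.files("C:/Users/shahr/Documents/master_unzip/guava-master/",
                        pattern = "\\.java$", recursive = TRUE, full.names = TRUE)
file.copy(filenames, "C:/Users/shahr/Documents/")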
I use the googlesheets package. The default directory for new spreadsheets is the root of Google Drive. I would guess that I can specify a directory, as with a "normal" directory path, but I don't know how to do that.
gs_new(title = "MyData") # export to the root
gs_new(title = "Something/MyData") # export to the specified directory
I'm also interested in this question. I will try the following to see if it works; if not, I may try the 'googledrive' package on top of, or instead of, the 'googlesheets' package to create sheets inside a folder hierarchy. That way I could loop through a list of subfolders, creating the files inside each one, until all subfolders have their new files.
So here's my thinking... When I have time to test this out, I'll let you know!
for (path in file_paths) {
  setwd(path)
  for (file in files) {
    gs_new(file)
  }
}
Of course, get your parent folder as a string and use list.files("string", full.names = TRUE). Then, if you have any subfolders (assuming they're created already), it'll return a vector you can loop through. If you just want to create one workbook at one location, simply setting the working directory might work. Again, I'll need to test this in multiple ways.
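For what it's worth, a sketch of the 'googledrive' route (assuming the target folder "Something" already exists on Drive; drive_mv() and as_id() are part of googledrive's API):

library(googlesheets)
library(googledrive)

ss <- gs_new(title = "MyData")                      # created at the Drive root
drive_mv(as_id(ss$sheet_key), path = "Something/")  # then move it into the folder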
Can I work with parallel working directories in R, or can I change the working directory in a loop to access the files from different folders?
I find it easier to have a single working directory. You can find out what that is using the
getwd()
function. Typically, my working directory is something like:
~/colin/project1/R
You can change your working directory using
setwd()
You can easily access other files using the full path. In particular, I find
## List files in the current directory
list.files()
## Give full paths
list.files(full.names = TRUE)
## List files in the species1 directory
list.files("species1/", full.names = TRUE)
very handy.
Don't change the working directory in a loop; instead, loop over the directories and use file.path() to build the path to the file you want. Something like:
for(path in c("data1","data2","data3")){
for(file in c("file1.txt","file2.txt")){
fullPath = file.path(path,file)
doSomethingWith(fullPath)
}
}
That will loop over data1/file1.txt, data1/file2.txt, and so on. Note that it will also handle differences between path separators on different operating systems; don't try to paste file path components together with paste(), because you'll get it wrong.
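A quick illustration of the difference (results shown as comments):

file.path("data1", "file1.txt")  # "data1/file1.txt", with the correct separator
paste("data1", "file1.txt")      # "data1 file1.txt" -- a space, not a separator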