Read data from the same-named subfolder across different parent folders into R

I have multiple folders under a common directory, "~/Desktop/Data/". Each folder in the Data directory is named differently, like so:
/Desktop
  /Data
    /File1/Data1/
    /File2/Data1/
    /File3/Data1/
The File folders are named differently, but they all contain a data subfolder with the same name. I have .dta files in each of these data subfolders that I would like to read into R.
EDIT: I should also note the contents of each File folder:
../Filex
  /Data1 -- what I want to read from
  /Data2
  /Data3
  /Code
with /Filex/Data1 being the main folder of interest. All File folders are structured this way.
I have consulted multiple Stack Overflow threads, and so far I have only figured out how to list the files when all the File folders share the same name. I am unsure how to read the data into R when these File folders are named slightly differently.
I have tried this so far, but I get an empty vector in return:
files <- dir("~/Desktop/Data/*/Data/", recursive=TRUE, full.names=TRUE, pattern="\\.dta$")
For actual data, downloading files from ICPSR might help in replicating the issue.
EDIT: I am working on macOS 10.15.5.
Thank you so much for your assistance!

Try
files <- dir("~/Desktop/Data", pattern = "\\.dta$", full.names = TRUE, recursive = TRUE)
# to make sure /Data is there, as suggested by @Martin Gal:
files[grepl("Data/", files)]
This Regex tester and this Regex cheatsheet have been very useful to come to the solution.
Tested under Windows:
files <- dir('c:/temp', pattern = "\\.dta$", full.names = TRUE, recursive = TRUE)
files[grepl("Data/", files)]
[1] "c:/temp/File1/Data/test2.dta" "c:/temp/File2/Data/test.dta"
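If you then want the .dta files themselves in R, here is a minimal sketch, assuming the haven package is installed and filtering on "Data1/" to match the folder names in the question's edit:
library(haven)                              # read_dta() handles modern Stata files
dta_files <- files[grepl("Data1/", files)]  # keep only files under a Data1 folder
datasets <- lapply(dta_files, read_dta)     # one data frame (tibble) per file
names(datasets) <- basename(dta_files)      # label each by its file name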

Related

Compress files from a directory in gzip format (*.gz)

I have a directory that contains files with different file extensions, and I have to compress them one by one because the 7z program does not let me do it in bulk. For example:
file1.xyz
file2.rrr
file3.qwe
file250.pep
Expected output:
file1.xyz.gz
file2.rrr.gz
file3.qwe.gz
file250.pep.gz
Any idea how to do this in R? Thank you.
Yes, you can do this in R. Assuming your files are in a subdirectory called files:
files <- dir("./files/", full.names = TRUE)   # list every file in ./files
lapply(files, R.utils::gzip, remove = FALSE)  # gzip each one, keeping the original
Note that remove = FALSE is important if you do not want the original file deleted after compression. The documentation lists other options, e.g. whether you want to overwrite existing files of the same name.
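For example, to keep the originals and also overwrite any existing .gz files of the same name (both arguments are documented in ?R.utils::gzip):
lapply(files, R.utils::gzip, remove = FALSE, overwrite = TRUE)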

Uploading, reading and naming multiple files from a Network directory in R

I am trying to read several .xlsx files from a network directory that has this path:
\\181.01.2\Global_Office_Net\Accounting
Inside this folder there are several other folders (around 15), and each of those contains several files. Each folder does have a .xlsx file whose name starts with "overall_counts_123"; the "123" could be any number, but the name will always start with "overall_counts". My goal is to load all of these files into RStudio and tag them "file1", "file2", etc. I apologize if I'm not being clear; let me set an example:
If there are 3 folders in the directory and each folder has "n" files that start with "overall_counts", I would like to get only something like this:
\\181.01.2\Global_Office_Net\Accounting\folder1\overall_counts1.xlsx
\\181.01.2\Global_Office_Net\Accounting\folder2\overall_counts1.xlsx
\\181.01.2\Global_Office_Net\Accounting\folder2\overall_counts15.xlsx
\\181.01.2\Global_Office_Net\Accounting\folder3\overall_counts1008.xlsx
I'm using this code:
file_paths <- fs::dir_ls("\\\\181.01.2\\Global_Office_Net\\Accounting")
FILES <- file_paths %>%
  map(function(path) {
    read_xlsx(path)
  })
But instead of looking in each folder for the files that start with "overall_counts", this loads everything and returns it as a list. What I am really looking for is to have each desired file loaded as file1, file2 and so on, in separate data frames. I would be so thankful if you could reference an article on how to load files based on a criterion for the file name and load them separately. Thank you so much, guys; I truly owe you this one.
You can try this:
Get a list of the files that match, looking in all the subfolders (recursive = TRUE); full.names = TRUE keeps the complete paths so the files can be read directly:
filenames <- list.files(path = "\\\\181.01.2\\Global_Office_Net\\Accounting\\",
                        pattern = "overall_counts[0-9]+\\.xlsx$",
                        recursive = TRUE,
                        full.names = TRUE)
Tag with "file1", "file2", etc.:
fns <- setNames(filenames, paste0("file", seq_along(filenames)))
Now read these files into R:
library(readxl)
dfs <- lapply(fns, read_xlsx)
This results in a list of dataframes (tibbles) like so: dfs$file1, dfs$file2, etc.
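If you prefer separate data frames file1, file2, ... in your workspace rather than one list, base R's list2env() can unpack it (keeping the list is usually easier to work with, though):
list2env(dfs, envir = .GlobalEnv)  # creates file1, file2, ... as individual objects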

Looping through folder and finding specific file in R

I am trying to loop through many folders in a directory, looking for a particular XML file buried in one of the folders. I would then like to save the location of that file and run my code against it (I will not include that code here). What I am asking is how to loop through all the folders and then open the specific file.
For example:
My main folder would be: C:\Parsing
It has two folders named "folder1" and "folder2".
each folder has an xml file that I am interested in, lets say its called "needed.xml"
I would like to have a script that loops through the directory and finds those particular files.
Do you know how I could do that in R?
Using list.files and grepl you could look recursively through all sub-folders:
rootPath <- "C:/Parsing"
listFiles <- list.files(rootPath, recursive = TRUE)
searchFileName <- "needed.xml"
presentFile <- listFiles[grepl(searchFileName, listFiles, fixed = TRUE)]
if (length(presentFile)) cat("File", searchFileName, "is present at", presentFile, "\n")
Is this what you're looking for?
require(XML)
fol <- list.files("C:/Parsing")
for (i in fol) {
  dir <- paste("C:/Parsing/", i, "/needed.xml", sep = "")
  if (file.exists(dir)) {
    needed <- xmlToList(dir)
  }
}
This will locate your XML file and read it into R as a list. It wasn't clear from your question whether you wanted the output to be the data itself or just the directory location, which could then be supplied to another function/script. If you just want the location, remove the xmlToList call.
I would do something like this (replace the \.xml$ pattern with ^needed\.xml$ if you only want that one file):
list.files(path = "C:/Parsing", pattern = "\\.xml$", recursive = TRUE, full.names = TRUE)
This will recursively look for files with the extension .xml under C:/Parsing and return the full paths of the matched files.
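Putting the pieces together, a sketch that finds every needed.xml under C:/Parsing and parses each one with XML::xmlToList(), as in the answer above:
library(XML)
paths <- list.files(path = "C:/Parsing", pattern = "^needed\\.xml$",
                    recursive = TRUE, full.names = TRUE)
needed_list <- lapply(paths, xmlToList)  # one parsed list per matched file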

Proper phrasing for a loop to convert all .dta files to .csv in a directory

So I have a single instance of dta-to-csv conversion, and I need to repeat it for all files in a directory. Great help on SO, but I'm still not quite there. Here's the single instance:
#Load Foreign Library
library(foreign)
## Set working directory in which the .dta files can be found
setwd("~/Desktop")
## Single File Convert
write.csv(read.dta("example.dta"), file = "example.csv")
From here, I figure I use something like:
## Get list of all the files
file_list <- dir(pattern = "\\.dta$", recursive = FALSE, ignore.case = TRUE)
## Get the number of files
n <- length(file_list)
## Loop through each file
for(i in 1:n) file_list[[i]]
But I'm not sure of the proper syntax, expressions, etc. After reviewing the great solutions below, I'm just confused (not necessarily getting errors) and about to do it manually. Any quick tips for an elegant way to go through each file in a directory and convert it?
Answers reviewed include:
Convert Stata .dta file to CSV without Stata software
applying R script prepared for single file to multiple files in the directory
Reading multiple files from a directory, R
THANKS!!
Got the answer: Here's the final code:
## CONVERT ALL FILES IN A DIRECTORY
## Load Foreign Library
library(foreign)
## Set working directory in which the .dta files can be found
setwd("~/Desktop")
## Convert all files in wd from DTA to CSV
### Note: alter the write/read functions for different file types. dta->csv used in this specific example
for (f in Sys.glob('*.dta'))
  write.csv(read.dta(f), file = gsub('dta$', 'csv', f))
If the files are in your current working directory, one way would be to use Sys.glob to get the names, then loop over this vector.
for (f in Sys.glob('*.dta'))
  write.csv(read.dta(f), file = gsub('dta$', 'csv', f))
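Note that foreign::read.dta only understands Stata formats up to version 12; for newer .dta files the same loop works with the haven package (an alternative, assuming it is installed):
library(haven)
for (f in Sys.glob('*.dta'))
  write.csv(read_dta(f), file = gsub('dta$', 'csv', f))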

How can I read multiple files from multiple directories into R for processing?

I am running a simulation study and need to process and save the results from several text files. I have the data organized in such a way that there are sub-directories, and within each sub-directory, I need to process and get individual results for 1000 data files. This is very easy to do in SAS using macros. However, I am new to R and cannot figure out how to do it. Below is what I am trying to accomplish:
DATA Folder -> DC1 -> DC1R1.txt ... DC1R1000.txt
            -> DC2 -> DC2R1.txt ... DC2R1000.txt
Any help would be greatly appreciated!
I'm not near a computer with R right now, but read the help for file-related functions:
The dir function will list the files and directories. It has a recursive argument.
list.files is an alias for dir. The file.info function will tell you (among other things) if a path is a directory and file.path will combine path parts.
The basename and dirname functions might also be useful.
Note that all these functions are vectorized.
EDIT: Now at a computer, so here's an example:
# Make a function to process each file
processFile <- function(f) {
  df <- read.csv(f)
  # ...and do stuff...
  file.info(f)$size  # dummy result
}
# Find all .csv files
files <- dir("/foo/bar/", recursive=TRUE, full.names=TRUE, pattern="\\.csv$")
# Apply the function to all files.
result <- sapply(files, processFile)
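Because sapply was given a character vector, the result is named by file path, so individual results are easy to pick out (the path below is hypothetical):
result["/foo/bar/sub/one.csv"]  # dummy result for one particular file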
If you need to run the same analysis on each of the files, then you can access them in one shot using list.files(recursive = TRUE). This assumes you have already set your working directory to the Data Folder; recursive = TRUE lists files within subdirectories as well.
filenames <- list.files("path/to/files", recursive=TRUE)
This will give you all the files residing under one folder and sub folders under it.
You can use Perl's glob() function to get a list of files and send it to R using, e.g., the RSPerl interface.
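Base R's Sys.glob() offers similar wildcard matching without leaving R; a sketch against the DC1/DC2 layout above (the exact path is an assumption based on the question):
files <- Sys.glob("Data Folder/DC*/DC*R*.txt")  # every DC*R*.txt inside the DC* subdirectories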
