R: list.files() does not find files in "special folder" - r

I have the following problem: I want to list all files recursively in a given folder. But this folder contains a somewhat special subfolder that list.files() can't look into. However, fs::dir_ls() is able to look into it. See the example:
> list.files(path, recursive = TRUE)
[1] "???" "archive_folders.R"
[3] "archived_folder/archived_file.txt"
>
> dir_ls(path, recurse = TRUE)
U:/Eigene Dateien/R/archive_folders/archive_folders.R
U:/Eigene Dateien/R/archive_folders/archived_folder
U:/Eigene Dateien/R/archive_folders/archived_folder/archived_file.txt
U:/Eigene Dateien/R/archive_folders/ааа
U:/Eigene Dateien/R/archive_folders/ааа/archived_file.txt
I'm working on Windows 7, and looking at the properties of the aaa folder did not give a hint about the problem. So my question is twofold:
Any ideas on what might be so special about the aaa folder?
Is there any way that list.files() can find the files inside this special folder?
EDIT:
The name of the folder ааа is in fact not aaa. Sounds confusing? The folder's name consists of U+0430 (Cyrillic small letter a), not the usual Latin letter a (U+0061).
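One way to spot such lookalike names is to compare the code points directly; a minimal sketch, using utf8ToInt() on literal strings built from the two characters:

```r
# "\u0430" is the Cyrillic small letter a; "a" is Latin U+0061.
cyrillic <- "\u0430\u0430\u0430"  # visually looks like "aaa"
latin    <- "aaa"

utf8ToInt(cyrillic)  # 1072 1072 1072 (hex 0x430)
utf8ToInt(latin)     # 97 97 97 (hex 0x61)

identical(cyrillic, latin)  # FALSE, despite looking identical
```

Running utf8ToInt() on a suspicious folder name returned by fs::dir_ls() would reveal the same mismatch.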

I had a similar problem. I don't know why, but list.files() simply didn't work on my previous computer. I solved it using dir(), which is part of base R.
dir(path, recursive = TRUE)
Otherwise, you can try this to see if changing the working directory changes the result:
setwd(path)
dir(, recursive = TRUE)
list.files(, recursive = TRUE)
As for your question regarding the folder, I have no idea why this happens.

Related

Read in data from same subfolder in different subfolders to R

I have multiple folders that all share the common directory "~/Desktop/Data/". Each File folder in the Data directory is different, such that:
/Desktop
/Data
/File1/Data1/
/File2/Data1/
/File3/Data1/
The File folders are different, but they all contain a Data folder with the same name. I have .dta files in each of the Data subfolders that I would like to read into R.
EDIT: I should also note the contents in the File folders to be:
../Filex
/Data1 -- What I want to read from
/Data2
/Data3
/Code
with /Filex/Data1 being the main folder of interest. All File folders are structured this way.
I have consulted multiple Stack Overflow threads and so far have only figured out how to list all the files if the File folders had been named the same. However, I am unsure how to read the data into R when these File folders are named slightly differently.
I have tried this so far, but I get an empty set in return:
files <- dir("~/Desktop/Data/*/Data/", recursive=TRUE, full.names=TRUE, pattern="\\.dta$")
For actual data, downloading files from ICPSR might help in replicating the issue.
EDIT: I am working on macOS 10.15.5.
Thank you so much for your assistance!
Try
files <- dir("~/Desktop/Data", pattern = "\\.dta$", full.names = TRUE, recursive = TRUE)
# to make sure /Data is there, as suggested by @Martin Gal:
files[grepl("Data/", files)]
This Regex tester and this Regex cheatsheet have been very useful to come to the solution.
Tested under Windows:
files <- dir('c:/temp', pattern = "\\.dta$", full.names = TRUE, recursive = TRUE)
files[grepl("Data/",files)]
[1] "c:/temp/File1/Data/test2.dta" "c:/temp/File2/Data/test.dta"

R renaming file extension

I have tried looking at File extension renaming in R and using the script there without any luck. My question is much the same.
I have a bunch of files with a file extension that I want to change. I have used the following code but cannot get the last step to work.
I know similar questions have been asked before but I'm simply stuck and therefore reaching out anyway.
startingDir<-"/Users/anders/Documents/Juni 2019/DATA"
endDir<-"/Users/anders/Documents/Juni 2019/DATA/formatted"
#List over files in startingDir with the extension .zipwblibcurl that I want to replace
old_files<-list.files(startingDir,pattern = "\\.zipwblibcurl")
#View(old_files)
#Renaming the file extension and making a new list i R changing the file extension from .zipwblibcurl to .zip
new_files <- gsub(".zipwblibcurl", ".zip", old_files)
#View(new_files)
#Replacing the old files in the startingDir. Eventually I would like to move them to endDir. For simplicity I have just tried it as in the other post, without any luck:
file.rename( old_files, new_files)
After running file.rename I get the output FALSE for every entry.
The full answer here, including the comment from @StéphaneLaurent: make sure that you have full.names = TRUE inside list.files(); otherwise only the file name is captured, not the path to the file.
Full working snippet:
old <- list.files(startingDir,
                  pattern = "\\.zipwblibcurl$",
                  full.names = TRUE)
# replace the extension in the file names
new <- gsub("\\.zipwblibcurl$", ".zip", old)
# rename the old file names to the new file names
file.rename(old, new)
Like @StéphaneLaurent said, it's most likely that R tries to look for the files in the current working directory and can't find them. You can correct this by building the paths explicitly:
file.rename(file.path(startingDir, old_files), file.path(endDir, new_files))
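Putting the whole thing together, a runnable sketch that renames the extension and moves the files in one step (it assumes the startingDir and endDir paths from the question already exist):

```r
# startingDir and endDir as defined in the question.
# List bare file names, then build explicit source and destination
# paths, so files are renamed *and* moved to endDir in one call.
old_files <- list.files(startingDir, pattern = "\\.zipwblibcurl$")
new_files <- sub("\\.zipwblibcurl$", ".zip", old_files)

file.rename(file.path(startingDir, old_files),
            file.path(endDir, new_files))
```

file.rename() returns a logical vector, one TRUE per file successfully moved, which makes it easy to check that nothing silently failed.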

Deleting files from a directory using a list [duplicate]

Is there any way to automatically delete all files or folders with a few lines of R?
I am aware of the unlink() and file.remove() functions, but for those you need to define a character vector containing exactly the names of the files you want to delete. I am looking for something that lists all the files or folders within a specific path (e.g. 'C:/Temp') and then deletes all files with a certain name (regardless of extension).
Any help is very much appreciated!
Maybe you're just looking for a combination of file.remove and list.files? Something like:
do.call(file.remove, list(list.files("C:/Temp", full.names = TRUE)))
And I guess you can filter the list of files down to those whose names match a certain pattern using grep or grepl, no?
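A minimal sketch of that filtering idea, deleting every file with a given base name regardless of extension ("C:/Temp" is the asker's example path; the file name "oldlog" is a placeholder):

```r
dir_path <- "C:/Temp"   # the asker's example directory
name     <- "oldlog"    # hypothetical base name of the files to delete

# "^oldlog\." matches "oldlog.txt", "oldlog.csv", ...,
# but not "myoldlog.txt"
targets <- list.files(dir_path,
                      pattern = paste0("^", name, "\\."),
                      full.names = TRUE)
file.remove(targets)
```

Printing `targets` before calling file.remove() gives a chance to verify the match before anything is deleted.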
For all files in a known path you can:
unlink("path/*")
dir_to_clean <- tempdir() # or wherever

# create some junk to test it with
file.create(file.path(
  dir_to_clean,
  paste("test", 1:5, "txt", sep = ".")
))

# now remove them (no need for messing about with do.call)
file.remove(dir(
  dir_to_clean,
  pattern = "^test\\.[0-9]\\.txt$",
  full.names = TRUE
))
You can also use unlink as an alternative to file.remove.
Using a combination of dir and grep, this isn't too bad. It could probably be turned into a function that also tells you which files are to be deleted and gives you a chance to abort if it's not what you expected.
# Which directory?
mydir <- "C:/Test"
# What phrase do you want contained in
# the files to be deleted?
deletephrase <- "deleteme"
# Look at directory
dir(mydir)
# Figure out which files should be deleted
id <- grep(deletephrase, dir(mydir))
# Get the full path of the files to be deleted
todelete <- dir(mydir, full.names = TRUE)[id]
# BALEETED
unlink(todelete)
To delete everything inside the folder, but keep the folder empty
unlink("path/*", recursive = T, force = T)
To delete everything inside the folder, and also delete the folder
unlink("path", recursive = T, force = T)
Use force = T to override any read-only/hidden/etc. issues.
I quite like here::here for finding my way through folders (especially if I'm switching between inline evaluation and knit versions of an Rmarkdown notebook)... yet another solution:
# Batch remove files
# Match files in chosen directory with specified regex
files <- dir(here::here("your_folder"), "your_pattern")
# Remove matched files (file.path supplies the "/" that paste0 would omit)
unlink(file.path(here::here("your_folder"), files))

Looping through folder and finding specific file in R

I am trying to loop through many folders in a directory, looking for a particular xml file buried in one of the folders. I would then like to save the location of that file and run my code against it (I will not include that code here). What I am asking is how to loop through all the folders and then open that specific file.
For example:
My main folder would be: C:\Parsing
It has two folders named "folder1" and "folder2".
Each folder has an xml file that I am interested in; let's say it's called "needed.xml".
I would like to have a script that loops through the directory and finds those particular files.
Do you know how I could do that in R?
Using list.files and grepl you could look recursively through all sub-folders:
rootPath <- "C:/Parsing"
listFiles <- list.files(rootPath, recursive = TRUE, full.names = TRUE)
searchFileName <- "needed.xml"
presentFile <- listFiles[grepl(searchFileName, listFiles, fixed = TRUE)]
if (length(presentFile)) cat("File", searchFileName, "is present at", presentFile, "\n")
Is this what you're looking for?
require(XML)
fol <- list.files("C:/Parsing")
for (i in fol){
dir <- paste("C:/Parsing" , i, "/needed.xml", sep = "")
if(file.exists(dir) == T){
needed <- xmlToList(dir)
}
}
This will locate your xml file and read it into R as a list. I wasn't clear from your question if you wanted the output to be the data itself or just the directory location of your data which could then be supplied to another function/script. If you just want the location, remove the 'xmlToList' function.
I would do something like this (replace \\.xml$ with your file name if you want):
list.files(path = "C:/Parsing", pattern = "\\.xml$", recursive = TRUE, full.names = TRUE)
This will recursively look for files with the extension .xml under C:/Parsing and return the full paths of the matched files.
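Since the asker knows the exact file name, the pattern can also be anchored so that only files literally called "needed.xml" are returned; a sketch:

```r
# Anchored pattern: matches "needed.xml" exactly,
# not e.g. "not-needed.xml" or "needed.xml.bak"
xml_paths <- list.files("C:/Parsing", pattern = "^needed\\.xml$",
                        recursive = TRUE, full.names = TRUE)

# Each hit can then be parsed, e.g. with the XML package from the
# answer above (assumed installed):
# for (p in xml_paths) needed <- XML::xmlToList(p)
```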

Change directory/ path in beginning of syntax will change all the following identical directories

I have a working directory:
setwd("C:/Patient migration")
then I have other directories where I save my workspace data and where I get the source data from.
C:/Patient migration/source data
C:/Patient migration/workspace
These directories appear many times in the syntax (as part of complete path names), and other people should be able to work with my syntax as well.
Such a directory later on in the syntax would look like this:
save (SCICases2010,file="C:/Patient migration/Workspace/SCICases2010.RData")
Data22 <- read.table(file = "C:/Patient migration/source data/DATA_BFS_MS_GEO_NiNo_2010_2.dat", sep = "|", header = TRUE)
Is it possible to change a directory once, for example at the beginning, so that all the identical directories further down in the syntax change as well?
My goal is to name the 2 or 3 directories at the beginning of my syntax. Other users can change those, and consequently all the other occurrences of the directories in the syntax change as well.
Do you understand what I want to do? Is there perhaps a smarter way to do it?
I don't really want all this data in the working directory.
Hopefully somebody can help. Thanks a lot!
Maybe you can first assign the paths to names at the beginning of your syntax, like this:
source.file <- "C:/Patient migration/source data"
work.file <- "C:/Patient migration/workspace"
Then you can just use the names of those paths rather than typing them every time.
Other users of your syntax can set the file paths at the beginning and need not change the following code any more.
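A sketch of how those names are then used further down, with file.path() joining the pieces (directory and file names taken from the question):

```r
# Define the base directories once; on another machine only
# these two lines need editing.
source.file <- "C:/Patient migration/source data"
work.file   <- "C:/Patient migration/Workspace"

# Later in the syntax, build each full path from the variables:
# save(SCICases2010, file = file.path(work.file, "SCICases2010.RData"))
# Data22 <- read.table(file.path(source.file,
#                                "DATA_BFS_MS_GEO_NiNo_2010_2.dat"),
#                      sep = "|", header = TRUE)
```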
I found a solution that works for me. I use relative paths which start with the subfolder the data comes from or where the output goes. This subfolder lies in the working directory.
That way I just need to change the working directory; everything else can stay the same.
save (SCICases2010,file="C:/Patient migration/Workspace/SCICases2010.RData")
becomes
save (SCICases2010,file="Patient migration/Workspace/SCICases2010.RData")
and
Data22 <- read.table(file = "C:/Patient migration/source data/DATA_BFS_MS_GEO_NiNo_2010_2.dat", sep = "|", header = TRUE)
becomes
Data22 <- read.table(file = "source data/DATA_BFS_MS_GEO_NiNo_2010_2.dat", sep = "|", header = TRUE)
