Create directory structure using template in a text file - r

Suppose I have a text file text.txt that contains several lines of text indented using single spaces (not tabs) like:
My folder1/
My folder2/
My folder3/
file1.md
My folder4/
For example, an actual directory template might look like:
Proposal/
rules/
proposal/
document.tex
figs/
tabs/
ref/
log/
src/
master.r
crdat/
andat/
temp/
data/
raw/
clean/
Admin/
budget/
contract/
invoices/
receipts/
team/
hiring/
vitae/
gantt/
forms/
misc/
Study/
document.tex
figs/
tabs/
ref/
log/
src/
master.r
crdat/
andat/
temp/
data/
raw/
clean/
Dissemination/
presentations/
conference1/
submission.md
slides.tex
notes.md
admin/
registration/
travel/
receipts/
program/
forms/
conference2/
manuscripts/
journal1/
submission/
letter_v1.tex
manuscript_v1.pdf
replication_v1.zip
comments/
reviewer1.txt
R&R/
letter_v2.tex
manuscript_v2.tex
diff.tex
replication_v2.zip
journal2/
master_notes.md
TODOs.md
README.md
Essentially, the file is a template for a directory structure. The template is set by the user, who could use different templates for different projects, so the names are generic. The only constraints are that the hierarchy is established using spaces and that folder names end in a forward slash.
I want to write a function that takes any such directory template as input and creates the directory structure in the current working directory. The pseudocode is as follows:
lines <- readLines("text.txt")
last.indent <- 0
for (i in lines) {
  # Create the directory structure by looking at the leading characters (indentation)
  # and the last character (folders end in /),
  # using dir.create(i) or file.create(i)
}

I suspect you're actually doing something more complicated. In which case this may not work:
library(qdap); library(reports)
x <- readLines(n=13) ## you'd read this in from a file I assume
My folder1/
My folder2/
My folder3/
file1.md
My folder4/
My folder1b/
My folder2/
My folder3/
file1.zip
My folder1c/
My folder2/
My folder3/
file1.pdf
## You didn't ask for this but this keeps the directories contained
new <- folder(new)
setwd(new)
dirs <- which(substring(x, 1, 1) != " ")
lens <- lapply(seq_along(dirs), function(i) {
dirs[i]:c(tail(dirs - 1, -1), length(x))[i]
})
paths <- split(x, rep(1:length(dirs), sapply(lens, length)))
path <- sapply(paths, function(x) paste(Trim(x), collapse=""))
is.dir <- sapply(path, function(x) substring(x, nchar(x))) == "/"
lapply(seq_along(is.dir), function(i){
out <- folder(folder.name =ifelse(is.dir[i],
path[i], dirname(path[i])))
if(!is.dir[i]) {
cat("", file = paste0(path[i], "_hold_.txt"))
}
})
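A base R alternative that needs no extra packages can be sketched as follows. It assumes one leading space per nesting level and a trailing slash on folder names, as described in the question. The names create_tree and root are hypothetical, chosen only for this illustration.

```r
# Hypothetical helper: build a directory tree from an indented template.
# Assumes one leading space per nesting level and folder names ending in "/".
create_tree <- function(lines, root = ".") {
  lines <- lines[nzchar(trimws(lines))]                 # skip blank lines
  depth <- nchar(lines) - nchar(sub("^ +", "", lines))  # number of leading spaces
  name  <- trimws(lines)
  parents <- character(0)  # parents[k] is the current folder at depth k - 1
  created <- character(length(lines))
  for (i in seq_along(lines)) {
    d <- depth[i]
    if (d > length(parents)) {
      stop("line ", i, " is indented deeper than its parent: ", lines[i])
    }
    base <- if (d == 0) root else parents[d]
    path <- file.path(base, sub("/$", "", name[i]))
    if (grepl("/$", name[i])) {
      dir.create(path, recursive = TRUE, showWarnings = FALSE)
      parents <- c(parents[seq_len(d)], path)  # this folder becomes the parent for depth d + 1
    } else {
      file.create(path)
    }
    created[i] <- path
  }
  invisible(created)
}
```

With the template from the question this would be called as create_tree(readLines("text.txt")). Keeping a vector of current parents means each line only needs its own indentation depth, not a comparison with the previous line.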

Related

Copy/move files by folder name pattern in R

I have a folder (~/PATH/MYFOLDER) with a lot of subfolders and files.
Subfolders are named, for example, as: LClass_orgx, LClass_orgy, LClass_phyw, LClass_detz, LClass_appq
Each subfolder contains a lot of image files (*.png and/or *.jpg).
In ~/PATH/ I have folders named with part of the subfolder names: orgx, orgy, phyw, detz, appq
I would like to copy the image files from the subfolders LClass_orgx, LClass_orgy, LClass_phyw, LClass_detz, LClass_appq to the respective folders orgx, orgy, phyw, detz, appq.
Any help would be great.
Thanks all.
You can use sub to remove "MYFOLDER/LClass_" from the file names. Because the match is fixed, the capitalization has to be exactly that of the folder names. Something like this:
from = list.files(
  path = "~/PATH/MYFOLDER",
  pattern = "(png|jpg)$",
  recursive = TRUE,
  full.names = TRUE
)
to = sub(x = from, pattern = "MYFOLDER/LClass_", replacement = "", fixed = TRUE)
file.copy(from = from, to = to)
This should take input from list.files like "~/PATH/MYFOLDER/LClass_orgx/file.jpg" (from) and change it to "~/PATH/orgx/file.jpg" (to), and then copy it accordingly. You could then use file.remove to delete the old ones. (Potentially you could do this all at once with file.rename, but it seems safer to copy and take a minute to check that things look right before deleting the old ones.)
If you need to be more specific in the sources, you could modify the list.files(pattern) to specify the source directories you mention, LClass_orgx, LClass_orgy, LClass_phyw, LClass_detz, LClass_appq.
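The approach above can be wrapped in a small function that also reports what was copied. This is only a sketch under the folder layout from the question. The function name copy_images and its arguments root, src and prefix are hypothetical.

```r
# Hypothetical helper; root, src and prefix mirror the layout in the question.
copy_images <- function(root, src = "MYFOLDER", prefix = "LClass_") {
  from <- list.files(file.path(root, src), pattern = "\\.(png|jpg)$",
                     recursive = TRUE, full.names = TRUE)
  # Drop "MYFOLDER/LClass_" so ".../MYFOLDER/LClass_orgx/a.png" becomes ".../orgx/a.png"
  to <- sub(paste0(src, "/", prefix), "", from, fixed = TRUE)
  # file.copy() does not create missing target folders, so create them if needed
  for (d in unique(dirname(to))) {
    dir.create(d, recursive = TRUE, showWarnings = FALSE)
  }
  ok <- file.copy(from, to)
  # One row per file, so the result can be checked before deleting anything
  data.frame(from = from, to = to, copied = ok)
}
```

Calling copy_images("~/PATH") and inspecting the copied column gives a quick check that everything landed where expected before the originals are removed.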

make file.exists() case insensitive

I have a line of code in my script that checks if a file exists (actually, many files, this one line gets looped for a bunch of different files):
file.exists(Sys.glob(file.path(getwd(), "files", "*name*")))
This looks for any file in the directory /files/ that has "name" in it, e.g. "filename.csv". However, some of my files are named "fileName.csv" or "thisfileNAME.csv", and they do not get recognized. How can I make file.exists treat this check in a case-insensitive way?
In my other code I usually make any imported names or lists lowercase immediately with the tolower function, but I don't see any option to include that in the file.exists function.
Suggested solution using list.files:
If we have many files, we might want to do this only once; otherwise we can put it in the function (and pass path_to_root_directory instead of found_files to the function):
found_files <- list.files(path_to_root_directory, recursive=FALSE)
Behaviour as file.exists (return value is boolean):
fileExIsTs <- function(file_path, found_files) {
return(tolower(file_path) %in% tolower(found_files))
}
Return value is file with spelling as found in directory or character(0) if no match:
fileExIsTs <- function(file_path, found_files) {
return(found_files[tolower(found_files) %in% tolower(file_path)])
}
Edit:
New solution to fit new requirements:
keywordExists <- function(keyword, found_files) {
return(any(grepl(keyword, found_files, ignore.case=TRUE)))
}
keywordExists("NaMe", found_files=c("filename.csv", "morefilenames.csv"))
Returns:
[1] TRUE
Or
The return value is the matching files, spelled as found in the directory, or character(0) if there is no match:
keywordExists2 <- function(keyword, found_files) {
  return(found_files[grepl(keyword, found_files, ignore.case=TRUE)])
}
keywordExists2("NaMe", found_files=c("filename.csv", "morefilenames.csv"))
Returns:
[1] "filename.csv" "morefilenames.csv"
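The following should return 1 if any file name in the working directory contains "name" in any case, and 0 if none does (grepl takes a regular expression rather than a glob, so the asterisks are not needed, and ignore.case is an argument of grepl):
max(grepl("name", list.files(), ignore.case = TRUE))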
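Another option is to let list.files do the case-insensitive matching itself through its ignore.case argument. The sketch below assumes the keyword is meant as a regular expression. The name file_exists_ci and the dir argument are hypothetical.

```r
# Hypothetical helper: TRUE if any file name in `dir` matches `keyword`,
# ignoring case. `keyword` is treated as a regular expression.
file_exists_ci <- function(keyword, dir = ".") {
  length(list.files(dir, pattern = keyword, ignore.case = TRUE)) > 0
}
```

For the example in the question this would be called as file_exists_ci("name", file.path(getwd(), "files")).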

Append to file names in folder

How to append filenames in a folder
Filenames:
abc.wav
wjejrt.wav
13567tin.wav
Desired Output
abc_ENG.wav
wjejrt_ENG.wav
13567tin_ENG.wav
Tried this line code below but getting an error, maybe because I don't know the right use of file.rename function. Please help...
file.rename(list.files(pattern="*.wav"), paste0("_ENG"))
With base R you can do:
Filenames <- c("abc.wav", "wjejrt.wav", "13567tin.wav")
Fnames_new <- sub(".wav", "_ENG.wav", Filenames, fixed = TRUE)
file.rename(Filenames, Fnames_new)
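If the folder may contain other extensions as well, the new names can be built from the pieces of each file name instead of a fixed ".wav" substitution. This is a minimal sketch using file_ext and file_path_sans_ext from the tools package that ships with R. The function name add_suffix is hypothetical.

```r
# Hypothetical helper: insert `suffix` between the base name and the extension.
add_suffix <- function(files, suffix = "_ENG") {
  ext <- tools::file_ext(files)
  paste0(tools::file_path_sans_ext(files), suffix,
         ifelse(nzchar(ext), paste0(".", ext), ""))  # keep extension-less names intact
}
```

The actual renaming would then be files <- list.files(pattern = "\\.wav$") followed by file.rename(files, add_suffix(files)).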
Since you tagged Python, you could use os.rename() to rename your files:
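from os import listdir, rename
from os.path import join, splitext

# Folder containing the .wav files.
# "." is the directory the script is being run in; change it to any path you want
path_to_folder = "."

for f in listdir(path_to_folder):
    if f.endswith(".wav"):
        name, ext = splitext(f)
        # Join with the folder so the rename also works when path_to_folder is not "."
        rename(join(path_to_folder, f), join(path_to_folder, name + "_ENG" + ext))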
You can try this one
^.*(?=\\.wav)
Explanation
^ - Anchor to start of string.
.* - Match anything except new line.
(?=\\.wav) - Positive look ahead matches .wav.
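Change your code to this (the look-ahead needs perl = TRUE, so it is applied with sub rather than inside list.files):
files <- list.files(pattern = "\\.wav$")
file.rename(files, sub("^(.*)(?=\\.wav$)", "\\1_ENG", files, perl = TRUE))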

How to delete all files in a directory?

My script :
#RequireAdmin
FileDelete("C:\Users\Administrator\Desktop\temp\")
I want to delete all files in that directory. I also tried :
#RequireAdmin
DirRemove("C:\Users\Administrator\Desktop\temp\")
But it's not working, any suggestion?
The syntax for FileDelete() is FileDelete("filename"), so it expects a file name, not only a directory. You can also use wildcards (* and ?) in the file name.
DirRemove() works as follows: DirRemove ( "path" [, recurse = 0] ). With recurse=0 (default), deletes the folder, but only if it is empty. With recurse=1 removes files and subdirectories (like the DOS DelTree command).
Maybe you misunderstood the flag to use:
; Remove only the empty folder "Folder_path"
DirRemove("Folder_Path")
; Remove folder "Folder_Path" with all subfolder and all files within
DirRemove("Folder_Path", 1)
If this doesn't work it's a matter of system rights. If you want to delete files without deleting containing folder:
#include <File.au3>
; Get all files in folder and delete them:
Local $aFilesInRoot = _FileListToArray("Your_Path", "*", 1, True) ; "*"=filter, 1=$FLTA_FILES = Return files only, True=returns full path
For $i = 1 To $aFilesInRoot[0]
    FileDelete($aFilesInRoot[$i]) ; delete each file in turn
Next
; Get all subfolders under root and delete them:
Local $aFolderInRoot = _FileListToArray("Your_Path", "*", 2, True) ; 2=$FLTA_FOLDERS = Return Folders only
For $i = 1 To $aFolderInRoot[0]
    DirRemove($aFolderInRoot[$i], 1) ; remove each subfolder with its contents
Next
But isn't it easier to delete everything with a single DirRemove() call and then recreate the folder?

How to create a sitemap.xml file using R and the {XML} package?

I have a vector of links from which I would like to create a sitemap.xml file (file protocol is available from here: http://www.sitemaps.org/protocol.html)
I understand the sitemap.xml protocol (it is rather simple), but I'm not sure what is the smartest way to use the {XML} package for it.
A simple example:
links <- c("http://r-statistics.com",
"http://www.r-statistics.com/on/r/",
"http://www.r-statistics.com/on/ubuntu/")
How can "links" be used to construct a sitemap.xml file?
Is something like this what you are looking for? (It uses the httr package to get the last-modified date and writes the XML directly with the very useful whisker package.)
require(whisker)
require(httr)
tpl <- '<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
{{#links}}
<url>
<loc>{{{loc}}}</loc>
<lastmod>{{{lastmod}}}</lastmod>
<changefreq>{{{changefreq}}}</changefreq>
<priority>{{{priority}}}</priority>
</url>
{{/links}}
</urlset>
'
links <- c("http://r-statistics.com", "http://www.r-statistics.com/on/r/", "http://www.r-statistics.com/on/ubuntu/")
map_links <- function(l) {
tmp <- GET(l)
d <- tmp$headers[['last-modified']]
list(loc=l,
lastmod=format(as.Date(d,format="%a, %d %b %Y %H:%M:%S")),
changefreq="monthly",
priority="0.8")
}
links <- lapply(links, map_links)
cat(whisker.render(tpl))
I could not use @jverzani's solution, because I wasn't able to create a valid XML file from the cat output. Thus I created an alternative.
## Input a data.frame with 4 columns: loc, lastmod, changefreq, and priority
## This data.frame is named sm in the code below
library(XML)
doc <- newXMLDoc()
root <- newXMLNode("urlset", doc = doc)
temp <- newXMLNamespace(root, "http://www.sitemaps.org/schemas/sitemap/0.9")
temp <- newXMLNamespace(root, "http://www.google.com/schemas/sitemap-image/1.1", "image")
for (i in 1:nrow(sm))
{
urlNode <- newXMLNode("url", parent = root)
newXMLNode("loc", sm$loc[i], parent = urlNode)
newXMLNode("lastmod", sm$lastmod[i], parent = urlNode)
newXMLNode("changefreq", sm$changefreq[i], parent = urlNode)
newXMLNode("priority", sm$priority[i], parent = urlNode)
rm(i, urlNode)
}
saveXML(doc, file="sitemap.xml")
rm(doc, root, temp)
browseURL("sitemap.xml")
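For the plain vector of links in the question, a dependency-free variant is also possible. The sketch below is not the {XML} approach asked for: it builds the file as text in base R, writes only the loc element for each URL (the other child elements are optional in the sitemap protocol) and escapes the characters that would otherwise break the XML. The name write_sitemap and its file argument are hypothetical.

```r
# Hypothetical helper: write a minimal sitemap for a character vector of URLs.
write_sitemap <- function(links, file = "sitemap.xml") {
  # Escape characters that are special in XML text
  esc <- function(x) {
    x <- gsub("&", "&amp;", x, fixed = TRUE)
    x <- gsub("<", "&lt;", x, fixed = TRUE)
    gsub(">", "&gt;", x, fixed = TRUE)
  }
  xml <- c('<?xml version="1.0" encoding="UTF-8"?>',
           '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
           sprintf("  <url><loc>%s</loc></url>", esc(links)),
           "</urlset>")
  writeLines(xml, file)
  invisible(xml)
}
```

With the example vector this would be write_sitemap(links), which writes sitemap.xml to the working directory.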

Resources