R renaming file extension

I have tried looking at File extension renaming in R and using the script without any luck. My question is very much the same.
I have a bunch of files with a file extension that I want to change. I have used the following code but cannot get the last step to work.
I know similar questions have been asked before but I'm simply stuck and therefore reaching out anyway.
startingDir <- "/Users/anders/Documents/Juni 2019/DATA"
endDir <- "/Users/anders/Documents/Juni 2019/DATA/formatted"
# List of files in startingDir with the extension .zipwblibcurl that I want to replace
old_files <- list.files(startingDir, pattern = "\\.zipwblibcurl")
#View(old_files)
# Renaming the file extension, making a new list in R that changes the extension from .zipwblibcurl to .zip
new_files <- gsub(".zipwblibcurl", ".zip", old_files)
#View(new_files)
# Replacing the old files in startingDir. Eventually I would like to move them to endDir. For simplicity I have just tried as in the other post, without any luck:
file.rename(old_files, new_files)
After running file.rename I get the output FALSE for every entry.

The full answer here, including the comment from @Stéphane Laurent: make sure that you have full.names = TRUE inside list.files(); otherwise only the file name is captured, not the path to the file.
Full working snippet:
old <- list.files(startingDir,
                  pattern = "\\.zipwblibcurl",
                  full.names = TRUE)
# Replace the file names
new <- gsub(".zipwblibcurl", ".zip", old)
# Rename the old file names to the new file names
file.rename(old, new)

Like @Stéphane Laurent said, it's most likely that R looks in the current working directory for the files and can't find them. You can correct this by prepending the directories:
file.rename(paste(startingDir, old_files, sep = "/"), paste(endDir, new_files, sep = "/"))
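For completeness, a minimal sketch of the rename-and-move in one step (assuming the formatted endDir from the question already exists):
# Sketch: build full source and destination paths, then rename/move in a single call
old_files <- list.files(startingDir, pattern = "\\.zipwblibcurl$", full.names = TRUE)
new_files <- file.path(endDir, sub("\\.zipwblibcurl$", ".zip", basename(old_files)))
file.rename(old_files, new_files)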

Related

Creating objects from all .xlsx documents in working directory

I am trying to create objects from all files in the working directory, named after the original files. I tried the following, but could not solve the problems that came up.
# - SETTING WD
getwd()
setwd("PATH TO THE FILE")
library(readxl)
# - CREATING OBJECTS
file_objects <- list.files()
xlsx_objects <- unlist(grep(".xlsx", file_objects, value = T))
for (i in xlsx_objects) {
  xlsx_objects[i] <- read_xlsx(xlsx_objects[i], header = T)
}
I tried pasting the [i]th item of xlsx_objects together with the path to the WD, but that only produced a list of the file names of the documents in the WD.
I also found information that read.csv can read only one file at a time, but that should be fine with a for loop, right? It reads one file per iteration.
Using lapply (as described on this forum) I was able to get the data into the environment, but the header argument didn't work, and I lost the names of my documents in an object that doesn't have the desired structure. I am looking to have these files in separate objects without calling every document explicitly.
IIUC, you could do something like:
files <- list.files("PATH TO THE FILE", full.names = TRUE, pattern = "xlsx")
list_files <- purrr::map(files, readxl::read_excel)
(You can't use read.csv to read excel files)
Also, I recommend reading about R Projects so you don't have to use setwd() ever again; setwd() makes your code harder to reproduce down the pipeline.
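If you really do want one object per file, named after the original document, here is a rough sketch building on the approach above (the naming and list2env step are my own addition, not from the original answer):
library(readxl)
files <- list.files("PATH TO THE FILE", pattern = "\\.xlsx$", full.names = TRUE)
list_files <- lapply(files, read_excel)
# Name each element after its file (without the extension), then create one object per file
names(list_files) <- tools::file_path_sans_ext(basename(files))
list2env(list_files, envir = .GlobalEnv)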

Deleting files from a directory using a list [duplicate]

Is there any way to automatically delete all files or folders with few R command lines?
I am aware of the unlink() or file.remove() functions, but for those you need to define a character vector with exactly all the names of the files you want to delete. I am looking more for something that lists all the files or folders within a specific path (e.g. 'C:/Temp') and then delete all files with a certain name (regardless of its extension).
Any help is very much appreciated!
Maybe you're just looking for a combination of file.remove and list.files? Perhaps something like:
do.call(file.remove, list(list.files("C:/Temp", full.names = TRUE)))
And I guess you can filter the list of files down to those whose names match a certain pattern using grep or grepl, no?
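For instance, a small sketch of that filtering idea (the phrase "report" is just a placeholder):
all_files <- list.files("C:/Temp", full.names = TRUE)
# Keep only the files whose name contains the phrase, regardless of extension
file.remove(all_files[grepl("report", basename(all_files))])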
For all files in a known path you can:
unlink("path/*")
dir_to_clean <- tempdir() # or wherever
# Create some junk to test it with
file.create(file.path(
  dir_to_clean,
  paste("test", 1:5, "txt", sep = ".")
))
# Now remove them (no need for messing about with do.call)
file.remove(dir(
  dir_to_clean,
  pattern = "^test\\.[0-9]\\.txt$",
  full.names = TRUE
))
You can also use unlink as an alternative to file.remove.
Using a combination of dir and grep this isn't too bad. This could probably be turned into a function that also tells you which files are to be deleted and gives you a chance to abort if it's not what you expected.
# Which directory?
mydir <- "C:/Test"
# What phrase do you want contained in
# the files to be deleted?
deletephrase <- "deleteme"
# Look at directory
dir(mydir)
# Figure out which files should be deleted
id <- grep(deletephrase, dir(mydir))
# Get the full path of the files to be deleted
todelete <- dir(mydir, full.names = TRUE)[id]
# BALEETED
unlink(todelete)
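As a rough sketch of the "show the files and give you a chance to abort" function mentioned above (the function name and prompt wording are my own, not from any package):
delete_matching <- function(path, phrase) {
  todelete <- dir(path, pattern = phrase, full.names = TRUE)
  if (length(todelete) == 0) return(invisible(character(0)))
  print(todelete)
  if (tolower(readline("Delete these files? (y/n): ")) == "y") unlink(todelete)
  invisible(todelete)
}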
To delete everything inside the folder, but keep the folder empty
unlink("path/*", recursive = T, force = T)
To delete everything inside the folder, and also delete the folder
unlink("path", recursive = T, force = T)
Use force = T to override any read-only/hidden-file issues.
I quite like here::here for finding my way through folders (especially if I'm switching between inline evaluation and knit versions of an R Markdown notebook). Yet another solution:
# Batch remove files
# Match files in chosen directory with specified regex
files <- dir(here::here("your_folder"), "your_pattern")
# Remove matched files (file.path adds the path separator that paste0 would omit)
unlink(file.path(here::here("your_folder"), files))

How to read a csv file or load an excel workbook by ignoring some characters in the file path?

I'm writing a loop script which involves reading a file from a workbook (using the package XLConnect). The challenge is that the file names contain characters (representing time) that I want to ignore.
For example, here are 3 paths to those files:
G://User//Documents//daily_data//Op_Schedule_20160520_132025.xlsx
G://User//Documents//daily_data//Op_Schedule_20160521_142805.xlsx
G://User//Documents//daily_data//Op_Schedule_20160522_103052.xlsx
I need to import hundreds of those files. I can easily account for the character string representing the date (e.g. 20160522), but not the time.
Is there a way to tell R to ignore some characters located in the file path? Here is how I was thinking of writing my script (the "???" is where I need help). I know a loop is probably not the most efficient way, but I'm open to suggestions, should you have any:
require(XLConnect)
path <- "G://User//Documents//daily_data//Op_Schedule_"
wd.seq <- format(seq(as.Date("2014-01-01"), as.Date("2016-12-31"), "days"), format = "%Y%m%d")
scheduleList <- rep(list(matrix(1, 1, 1)), length(wd.seq))
for (i in 1:length(wd.seq)) {
  wb <- loadWorkbook(file = paste0(path, wd.seq[i], "???", ".xlsx"))
  scheduleList[[i]] <- readWorksheet(wb, sheet = '=SCHEDULE', header = TRUE)
}
Thanks for reading and suggestions, if any.
Mathieu
I don't know if this is helpful, but if you want to read all the files in a certain directory (which it seems to me is what you're after), you can read all the file names into a list using the list.files() function, for example:
fileList <- list.files("G://User//Documents//daily_data//")
And then load the xlsx files looping through the list with a for loop
for (i in fileList) {
  loadWorkbook(file = i)
}
I haven't used the XLConnect functions before, so that exact code probably doesn't work, but the loop will iterate through all the files in that directory, and you can construct your loading call using the i variable for the file name (it won't be an absolute path, though, so you might need paste to add the first part of the file path).
I realize there might be other files in the directory that are not Excel files; you could use grepl to select only files containing "Op_Schedule_":
fileListClean <- fileList[grepl("Op_Schedule_",fileList)]
or perhaps only selecting .xlsx files in the directory:
fileListClean <- fileList[grepl(".xlsx",fileList)]
Edit to fit your reply:
Since you need to fit it to a sequence, you can do it as you did earlier:
wd.seq = format(seq(as.Date("2014-01-01"),as.Date("2016-12-31"),"days"),format="%Y%m%d")
wd.seq2 <- paste("Op_Schedule_", wd.seq, sep = "")
And then use grepl to pick only the files starting with those prefixes:
fileListClean <- fileList[grepl(paste(wd.seq2, collapse = "|"), fileList)]
Full disclosure: The last part i got from this SO answer: grep using a character vector with multiple patterns
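Alternatively, a minimal sketch that matches the date/time stamp directly in list.files (this assumes the names always follow the Op_Schedule_YYYYMMDD_HHMMSS.xlsx form):
fileListClean <- list.files("G://User//Documents//daily_data//",
                            pattern = "^Op_Schedule_\\d{8}_\\d{6}\\.xlsx$",
                            full.names = TRUE)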

R download.file() rename the downloaded file, if the filename already exists

In R, I am trying to download files off the internet using the download.file() command in a simple script (I am a complete newbie). The files download properly. However, if a file already exists at the download destination, I'd like to rename the downloaded file with an increment, rather than overwrite it, which seems to be the default behaviour.
nse.url = "https://www1.nseindia.com/content/historical/DERIVATIVES/2016/FEB/fo04FEB2016bhav.csv.zip"
nse.folder = "D:/R/Download files from Internet/"
nse.destfile = paste0(nse.folder,"fo04FEB2016bhav.csv.zip")
download.file(nse.url,nse.destfile,mode = "wb",method = "libcurl")
Problem with respect to this specific code: if "fo04FEB2016bhav.csv.zip" already exists, can I get, say, "fo04FEB2016bhav.csv(2).zip"?
A general answer to the problem (and not just to the code mentioned above) would be appreciated, as such a bottleneck could come up in other situations too.
The function below will automatically assign the filename based on the file being downloaded. It will check the folder you are downloading to for the presence of a similarly named file. If it finds a match, it will add an incrementation and download to the new filename.
ekstroem's suggestion to fiddle with the curl settings is probably a much better approach, but I wasn't clever enough to figure out how to make that work.
download_without_overwrite <- function(url, folder) {
  filename <- basename(url)
  base <- tools::file_path_sans_ext(filename)
  ext <- tools::file_ext(filename)

  file_exists <- grepl(base, list.files(folder), fixed = TRUE)

  if (any(file_exists)) {
    filename <- paste0(base, " (", sum(file_exists), ")", ".", ext)
  }

  download.file(url, file.path(folder, filename), mode = "wb", method = "libcurl")
}
download_without_overwrite(
  url = "https://raw.githubusercontent.com/nutterb/redcapAPI/master/README.md",
  folder = "[path_to_folder]"
)
Try this:
nse.url <- "https://www1.nseindia.com/content/historical/DERIVATIVES/2016/FEB/fo04FEB2016bhav.csv.zip"
nse.folder <- "D:/R/Download files from Internet/"

# Get file name from url, with file extension
fname.x <- gsub(".*/(.*)", "\\1", nse.url)
# Get file name from url, without file extension
fname <- gsub("(.*)\\.csv.*", "\\1", fname.x)
# Get extension of file from url
xt <- gsub(".*(\\.csv.*)", "\\1", fname.x)

# How many times does the file already exist in the folder
exist.times <- sum(grepl(fname, list.files(path = nse.folder)))

if (exist.times) {
  # If it does, increment by 1
  fname.x <- paste0(fname, "(", exist.times + 1, ")", xt)
}

nse.destfile <- paste0(nse.folder, fname.x)
download.file(nse.url, nse.destfile, mode = "wb", method = "libcurl")
Issues
This approach will not work in cases where part of the file name already exists in the folder. For example, if you have url/test.csv.zip and the folder contains a file testABC1234blahblah.csv.zip, it will think the file already exists and save it as test(2).csv.zip.
You will need to change the "How many times does the file already exist in the folder" part of the code accordingly.
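One rough way around that (my own variant, not part of the answer above) is to stop counting partial matches and instead keep incrementing until an exact file name is free:
# Sketch: probe exact names with file.exists instead of counting grepl matches
candidate <- fname.x
n <- 1
while (file.exists(file.path(nse.folder, candidate))) {
  n <- n + 1
  candidate <- paste0(fname, "(", n, ")", xt)
}
nse.destfile <- file.path(nse.folder, candidate)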
This is not a proper answer and shouldn't be considered as such, but the comment section above was too small to write it all.
I thought the -O -n options to curl could be used, but now that I have looked at it more closely it turns out that this isn't implemented yet. wget, on the other hand, automatically increments the file name when downloading a file that already exists. However, setting method = "wget" doesn't work with download.file because you are forced to set the destination file name, and once you do that you overwrite the automatic file increments.
I like the solution that #Benjamin provided. Alternatively, you can use
system(paste0("wget ", nse.url))
to get the file through the system (provided that you have wget installed) and let wget handle the increment.
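If you also want the file to land in nse.folder rather than the current working directory, wget's -P (directory prefix) option can be passed the same way (this assumes wget is on your PATH):
system(paste0("wget -P ", shQuote(nse.folder), " ", nse.url))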

Change directory/path at the beginning of the syntax so that all the following identical directories change

I have a working directory:
setwd("C:/Patient migration")
Then I have other directories where I save my workspace data and where I get the source data from:
C:/Patient migration/source data
C:/Patient migration/workspace
These directories appear many times in the syntax (as part of complete path names), and other people should be able to work with my syntax as well.
Such a directory later on in the syntax looks like this:
save (SCICases2010,file="C:/Patient migration/Workspace/SCICases2010.RData")
Data22 <- read.table(file = "C:/Patient migration/source data/DATA_BFS_MS_GEO_NiNo_2010_2.dat", sep = "|", header = TRUE)
Is it possible to change a directory once, for example at the beginning, so that all occurrences of the same directory further down in the syntax change as well?
My goal is to name the two or three directories at the beginning of my syntax. Other users can change those, and consequently all the other directory references in the syntax change too.
Do you understand what I want to do? Is there perhaps a smarter way to do it?
I don't really want all this data in the working directory.
Hopefully somebody can help. Thanks a lot!
Maybe you can first assign the paths to names at the beginning of your syntax, like this:
source.file <- "C:/Patient migration/source data"
work.file <- "C:/Patient migration/workspace"
Then you can just use the names of those paths rather than typing them out every time.
Other users of your syntax can set the file paths at the beginning and need not change the following code any more.
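A small sketch of how the rest of the syntax would then use those names (file.path just glues the folder and the file name together):
source.file <- "C:/Patient migration/source data"
work.file <- "C:/Patient migration/workspace"

Data22 <- read.table(file = file.path(source.file, "DATA_BFS_MS_GEO_NiNo_2010_2.dat"),
                     sep = "|", header = TRUE)
save(SCICases2010, file = file.path(work.file, "SCICases2010.RData"))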
I found a solution that works for me. I use relative paths that start with the subfolder the data I need comes from, or the subfolder the output goes to. This subfolder lies in the working directory.
That way I just need to change the working directory; everything else can stay the same.
save (SCICases2010,file="C:/Patient migration/Workspace/SCICases2010.RData")
becomes
save(SCICases2010, file = "Workspace/SCICases2010.RData")
and
Data22 <- read.table(file = "C:/Patient migration/source data/DATA_BFS_MS_GEO_NiNo_2010_2.dat", sep = "|", header = TRUE)
becomes
Data22 <- read.table(file = "source data/DATA_BFS_MS_GEO_NiNo_2010_2.dat", sep = "|", header = TRUE)
