How to rename multiple files inside a loop in R

I have downloaded one photo of each deputy. In total, I have 513 photos (but I hosted a file with 271 of them). Each photo is named with the deputy's ID, and I want to rename it to the deputy's name. For example, the file "66179.jpg" would become "norma-ayub.jpg".
I have a column with the IDs ("uri") and one with their names ("name_lower"). I tried using the "destfile" argument of download.file(), but it only accepts a string. I couldn't figure out how to work with file.rename().
And rename_r_to_R changes only the file extension.
I am a beginner in R.
CSV file:
https://gist.github.com/gabrielacaesar/3648cd61a02a3e407bf29b7410b92cec
Photos:
https://github.com/gabrielacaesar/studyingR/blob/master/chamber-of-deputies-17jan2019-files.zip
(It's not necessary to download the ZIP file; when you run the code below, you also get the photos, but it takes some time to download them.)
library(data.table)
library(httr)
library(jsonlite)

deputados <- fread("dep-legislatura56-14jan2019.csv")

i <- 1
while (i <= nrow(deputados)) {
  tryCatch({
    url <- deputados$uri[i]
    api_content <- rawToChar(GET(url)$content)
    pessoa_info <- jsonlite::fromJSON(api_content)
    pessoa_foto <- pessoa_info$dados$ultimoStatus$urlFoto
    download.file(pessoa_foto, basename(pessoa_foto), mode = "wb")
    Sys.sleep(0.5)
  }, error = function(e) return(NULL)
  )
  i <- i + 1
}
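For what it's worth, one way to avoid a separate renaming step is to build the destination file name from name_lower at download time. A minimal sketch, assuming the rows of the CSV line up so that deputados$name_lower[i] belongs to deputados$uri[i]:
for (i in seq_len(nrow(deputados))) {
  tryCatch({
    api_content <- rawToChar(GET(deputados$uri[i])$content)
    pessoa_foto <- jsonlite::fromJSON(api_content)$dados$ultimoStatus$urlFoto
    # name the file after the deputy instead of after the photo URL
    destino <- paste0(deputados$name_lower[i], ".jpg")   # e.g. "norma-ayub.jpg"
    download.file(pessoa_foto, destfile = destino, mode = "wb")
    Sys.sleep(0.5)
  }, error = function(e) NULL)
}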

I downloaded the files you provided and either read them directly into R (the CSV) or unzipped them into a new folder (the photos):
df <- data.table::fread(
  "https://gist.githubusercontent.com/gabrielacaesar/3648cd61a02a3e407bf29b7410b92cec/raw/1d682d8fcdefce40ff95dbe57b05fa83a9c5e723/chamber-of-deputies-17jan2019",
  sep = ",",
  header = TRUE)

download.file("https://github.com/gabrielacaesar/studyingR/raw/master/chamber-of-deputies-17jan2019-files.zip",
              destfile = "temp.zip")
dir.create("photos")
unzip("temp.zip", exdir = "photos")
Then I use list.files to get the file names of all photos, match them with the dataset, and rename the photos. This runs very fast, and the last bit reports whether renaming each file was successful.
photos <- list.files(
  path = "photos",
  recursive = TRUE,
  full.names = TRUE
)

for (p in photos) {
  id <- basename(p)
  id <- gsub("\\.jpg$", "", id)
  name <- df$name_lower[match(id, basename(df$uri))]
  fname <- paste0(dirname(p), "/", name, ".jpg")
  success <- file.rename(p, fname)
  # optional: report whether the rename worked
  cat(
    "renaming",
    basename(p),
    "to",
    name,
    "successful:",
    ifelse(success, "Yes", "No"),
    "\n"
  )
}
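Since file.rename() is vectorised, the loop is optional; a compact sketch of the same renaming done in one call (same assumptions about df$uri and df$name_lower):
ids <- gsub("\\.jpg$", "", basename(photos))
new_names <- df$name_lower[match(ids, basename(df$uri))]
file.rename(photos, file.path(dirname(photos), paste0(new_names, ".jpg")))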

Related

Move csv files to specific named folder in archive

I have 16 folders, each named after a specific person, in the Risk/Archive/ folder, and I want to copy my Excel files (whose names also contain the person's name) from the Risk/ folder into the matching folder under Risk/Archive/.
I'm using the code below, but it doesn't accomplish what I want.
library(openxlsx)  # for read.xlsx()

f <- list.files('Risk/')
d <- list.dirs('Risk/Archive')
if (length(f) > 0) {
  File <- lapply(paste0('Risk/', f), function(i) {
    x <- read.xlsx(i, sheet = 1, startRow = 2, colNames = TRUE,
                   check.names = FALSE, cols = c(1:73))
    file.copy(from = i, to = 'Risk/Archive/',
              overwrite = TRUE, recursive = FALSE, copy.mode = TRUE)
    x
  })
  File <- do.call("rbind.data.frame", File)
}
There might be a better way to do this, but if I understand correctly, I think this should do the trick:
# Get the list of people's names (one archive subfolder per person)
names <- list.dirs(path = "./Risk/Archive",
                   full.names = FALSE,
                   recursive = FALSE)

# Get the list of files to copy
files <- list.files(path = "./Risk",
                    pattern = "\\.xlsx$",
                    full.names = TRUE)

# Loop through each name and move the file for that person
for (i in seq_along(names)) {
  # Current name in loop
  cname <- names[i]
  # Get index of the file that contains the current name
  name.idx <- grep(files, pattern = cname)
  # Get the file path for the file that matches the current name
  fpath <- files[name.idx]
  # Build the path of the archive folder for the current name
  name.path <- paste0("./Risk/Archive/", cname)
  # Copy the file from the "Risk" folder to the "Archive" folder for the current name
  file.copy(from = fpath,
            to = name.path,
            overwrite = TRUE)
  # Remove the original file after archiving
  file.remove(fpath)
  # Output message
  cat(paste0("Moved file for: ", cname, "\n"))
}
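Because file.copy() and file.remove() are vectorised too, the loop can be collapsed if you are happy to assume exactly one matching .xlsx file per person; a rough sketch under that assumption:
# Assumes exactly one .xlsx file matches each person's name
idx <- sapply(names, function(n) grep(n, files)[1])
dest <- file.path("./Risk/Archive", names, basename(files[idx]))
file.copy(files[idx], dest, overwrite = TRUE)
file.remove(files[idx])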

Add a suffix to filenames based on subfolder names within a directory in R

I have a number of (sub)folders stored within a directory folder. Each subfolder contains 5-35 .jpg aerial photograph files that are named by flightline name and number (e.g. bej-3-83). I would like to add a suffix to each of these files based on the subfolder it is stored in. For example, if 'bej-3-83' is stored within the 'T13N_10W' subfolder, I would like my R script to rename 'bej-3-83' as 'bej-3-83-T13N_10W', and so forth for each file stored within each subfolder.
I can partially accomplish this process, albeit with more manual input than I'd like, using this script:
folder = "C:\\...\\T23N_R14W"
files <- list.files(folder,pattern = "\\.jpg$",full.names = T)
files
sapply(files,FUN=function(eachPath){
file.rename(from=eachPath,to= sub(pattern="_clip", paste0("_T23N_R14W"),eachPath))
})
But as you can see, this script manually pastes in the subfolder name, which isn't useful when you're trying to create a script that does the whole job in one fell swoop.
I'm seeing similar questions and answers which utilize 'pushd' and 'popd', and I've linked two of those threads below. I'm trying to read as much as I can on these commands, but so far the process of making them work has me stuck.
How to rename files in folders to foldername using batch file
Rename Files Based On Folder Name
Sincerely,
Henry
You might have to change dir_separator to a backslash on Windows:
make_filename <- function(file_path) {
  s <- unlist(strsplit(file_path, dir_separator))
  fname <- gsub('\\.jpg$', '', s[length(s)])
  parent_dir <- s[(length(s) - 1)]
  new_fname <- paste0(parent_dir, "_", fname, '.jpg')
  path <- paste(s[-length(s)], collapse = dir_separator)
  return(paste(path, new_fname, sep = dir_separator))
}

folder <- './data'
dir_separator <- '/'
files <- paste0(folder, dir_separator, list.files(folder, recursive = TRUE))
sapply(files, function(x) file.rename(from = x, to = make_filename(x)))
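A slightly simpler variant of the same idea uses basename()/dirname() for the path handling, so there is no separator to worry about at all (make_filename2 is just an illustrative name):
make_filename2 <- function(file_path) {
  parent_dir <- basename(dirname(file_path))          # name of the containing subfolder
  fname <- gsub("\\.jpg$", "", basename(file_path))   # file name without the extension
  file.path(dirname(file_path), paste0(parent_dir, "_", fname, ".jpg"))
}
files <- list.files("./data", pattern = "\\.jpg$", recursive = TRUE, full.names = TRUE)
sapply(files, function(x) file.rename(from = x, to = make_filename2(x)))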
A recursive approach:
Pass the path to the root folder containing your files, and the extension of the files you want to rename, to rename_batch.
The defaults are the working directory and "jpeg".
library(stringr)

# An auxiliary function: appends "-<extra>" before the file extension
rename_file <- function(str, extra) {
  file_name <- tools::file_path_sans_ext(str)
  file_ext <- tools::file_ext(str)
  return(paste0(file_name, '-', extra, '.', file_ext))
}

rename_batch <- function(path = "./",
                         extension = 'jpeg') {
  # Separate files from folders
  l <- list.files(path)
  files <- l[grepl(paste0("\\.", extension, "$"), l)]
  folders <- list.dirs(path, full.names = FALSE, recursive = FALSE)
  present_folder <- stringr::str_extract(path, '(?<=/)([^/]+)$')

  # Check if there is a / at the end of path and remove it for consistency
  path_len <- nchar(path)
  last <- substr(path, path_len, path_len)
  if (last == '/') {
    path <- substr(path, 1, path_len - 1)
  }

  if (length(files) > 0) {
    file_update <- paste0(path, '/', files)
    file.rename(file_update, rename_file(file_update, present_folder))
  }

  if (length(folders) > 0) {
    for (i in paste0(path, '/', folders)) {
      cat('Renaming in:', i, '\n')
      rename_batch(i, extension)
    }
  }
}
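Usage might then look like this, assuming the aerial photographs sit in subfolders of ./photos and have a .jpg extension (so bej-3-83.jpg inside T13N_10W becomes bej-3-83-T13N_10W.jpg):
# ./photos/T13N_10W/bej-3-83.jpg  ->  ./photos/T13N_10W/bej-3-83-T13N_10W.jpg
rename_batch(path = "./photos", extension = "jpg")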

Change data source from FTP server to local directory

I am working with AIMS model developed by APEC Climate center. The model downloads data from ftp server and then calls the LoadCmip5DataFromAdss function from datasource.R to load data into the model.
#do.call("LoadCmip5DataFromAdss", parameters)
On GitHub I found the source code for LoadCmip5DataFromAdss, which gives the path of an FTP server to download data from:
LoadCmip5DataFromAdss <- function(dbdir, NtlCode) {
  fname <- paste("cmip5_daily_", NtlCode, ".zip", sep = "")
  if (nchar(NtlCode) == 4 && substr(NtlCode, 1, 2) == "US") {
    adss <- "ftp://cis.apcc21.org/CMIP5DB/US/"
  } else {
    adss <- "ftp://cis.apcc21.org/CMIP5DB/"
  }
I want to get the data from a local directory instead of downloading it, because downloading takes a lot of time. How do I do that?
Also, where do I find the file containing LoadCmip5DataFromAdss on my PC? In the setup, only datasource.R is given.
All that function does is download the ZIP file ("cmip5_daily_" + whatever you specified for NtlCode + ".zip") to the directory you specified for dbdir, unzip it there, and remove the ZIP file. Here's the whole function from rSQM:
LoadCmip5DataFromAdss <- function(dbdir, NtlCode) {
  fname <- paste("cmip5_daily_", NtlCode, ".zip", sep = "")
  if (nchar(NtlCode) == 4 && substr(NtlCode, 1, 2) == "US") {
    adss <- "ftp://cis.apcc21.org/CMIP5DB/US/"
  } else {
    adss <- "ftp://cis.apcc21.org/CMIP5DB/"
  }
  srcfname <- paste(adss, fname, sep = "")
  dstfname <- paste(dbdir, "/", fname, sep = "")
  download.file(srcfname, dstfname, mode = "wb")
  unzip(dstfname, exdir = dbdir)
  unlink(dstfname, force = TRUE)
  cat("CMIP5 scenario data at", NtlCode, "is successfully loaded.\n")
}
You can just do something like:
unzip(YOUR_LOCAL_NtlCode_ZIP_FILE, exdir = WHERE_YOUR_dbdir_IS)
instead of using that function.
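If you would rather keep a function with the same shape as the original, a small local drop-in could look like this (LoadCmip5DataFromLocal and srcdir are made-up names for illustration, not part of rSQM):
# Hypothetical local variant: unzips cmip5_daily_<NtlCode>.zip from a local folder
LoadCmip5DataFromLocal <- function(dbdir, NtlCode, srcdir) {
  fname <- paste0("cmip5_daily_", NtlCode, ".zip")
  unzip(file.path(srcdir, fname), exdir = dbdir)
  cat("CMIP5 scenario data at", NtlCode, "loaded from", srcdir, "\n")
}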

Converting twitteR results to data frame

I have a simple for loop to write the past 100 tweets of a few usernames to .csv files:
library(twitteR)

mclist <- read.table('usernames.txt')
for (mc in mclist) {
  tweets <- userTimeline(mc, n = 100)
  df <- do.call("rbind", lapply(tweets, as.data.frame))
  write.csv(df, file = paste("Desktop/", mc, ".csv", sep = ""), row.names = FALSE)
}
I mostly followed what I've read on Stack Overflow, but I keep getting this error message:
Error in file(file, ifelse(append, "a", "w")) :
invalid 'description' argument
In addition: Warning message:
In if (file == "") file <- stdout() else if (is.character(file)) { :
the condition has length > 1 and only the first element will be used
Where did I go wrong?
I just cleaned up the code a bit, and everything started working.
Step 1: Let's set the working directory and load the 'twitteR' package.
library(twitteR)
setwd("C:/Users/Dinre/Desktop") # Replace with your desired directory
Step 2: First, we need to load a list of user names from a flat text file. I'm assuming that each line in the text file has one username, like so:
[contents of usernames.txt]
edclef
notch
dkanaga
Let's load it using the 'scan' function to read each line into an array:
mclist <- scan("usernames.txt", what="", sep="\n")
Step 3: We'll loop through the usernames, just like you did before, but we're not going to refer to the directory, since we're going to use the same directory for output as for input. The original code had a syntax error in attempting to refer to the desktop directory, and we're just going to sidestep that.
for (mc in mclist) {
  tweets <- userTimeline(mc, n = 100)
  df <- do.call("rbind", lapply(tweets, as.data.frame))
  write.csv(df, file = paste(mc, ".csv", sep = ""), row.names = FALSE)
}
I end up with three files on the desktop, and all the data seems to be correct.
edclef.csv
notch.csv
dkanaga.csv
Update: If you really want to refer to different directories within your code, use the '.' character to refer to the current working directory. For instance, if your working directory is your Windows user profile, you would refer to the 'Desktop' folder like so:
setwd("C:/Users/Dinre")
...
write.csv(df, file=paste("./Desktop/". mc, ".csv", sep = ""), row.names = F)
There's a convenience function in the twitteR package, twListToDF, which handles converting the list of tweets to a data.frame.
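So inside the loop, something along these lines should do the conversion (a sketch, not tested here):
tweets <- userTimeline(mc, n = 100)
df <- twListToDF(tweets)   # replaces do.call("rbind", lapply(tweets, as.data.frame))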
Since your mclist is a data.frame, you can replace your for loop with apply:
apply(mclist, 1, function(mc) {
  tweets <- userTimeline(mc, n = 100)
  df <- do.call("rbind", lapply(tweets, as.data.frame))
  write.csv(df, file = paste("Desktop/", mc, ".csv", sep = ""),  ##!! Change Desktop to
            ## something like Desktop/tweets/
            row.names = FALSE)
})
PS:
The userTimeline function will only work if the requested user has a public timeline, or if you have previously registered an OAuth object using registerTwitterOAuth.

Recursively ftp download, then extract gz files

I have a multiple-step file download process I would like to do within R. I have got the middle step, but not the first and third...
# STEP 1 Recursively find all the files at an ftp site
# ftp://prism.oregonstate.edu//pub/prism/pacisl/grids
all_paths <- #### a recursive listing of the ftp path contents??? ####
# STEP 2 Choose all the ones whose filename starts with "hi"
all_files <- sapply(sapply(strsplit(all_paths, "/"), rev), "[", 1)
hawaii_log <- substr(all_files, 1, 2) == "hi"
hi_paths <- all_paths[hawaii_log]
hi_files <- all_files[hawaii_log]
# STEP 3 Download & extract from gz format into a single directory
mapply(download.file, url = hi_paths, destfile = hi_files)
## and now how to extract from gz format?
For part 1, RCurl might be helpful. The getURL function retrieves one or more URLs; dirlistonly lists the contents of the directory without retrieving the files. The rest of the function creates the next level of URLs:
library(RCurl)

getContent <- function(dirs) {
  urls <- paste(dirs, "/", sep = "")
  fls <- strsplit(getURL(urls, dirlistonly = TRUE), "\r?\n")
  ok <- sapply(fls, length) > 0
  unlist(mapply(paste, urls[ok], fls[ok], sep = "", SIMPLIFY = FALSE),
         use.names = FALSE)
}
So starting with
dirs <- "ftp://prism.oregonstate.edu//pub/prism/pacisl/grids"
we can invoke this function and look for things that look like directories, continuing until done
fls <- character()
while (length(dirs)) {
  message(length(dirs))
  urls <- getContent(dirs)
  isgz <- grepl("gz$", urls)
  fls <- append(fls, urls[isgz])
  dirs <- urls[!isgz]
}
We could then use getURL again, but this time on fls (or elements of fls, in a loop) to retrieve the actual files. Or, maybe better, open a url connection and use gzcon to decompress and process the file on the fly, along the lines of:
con <- gzcon(url(fls[1], "r"))
meta <- readLines(con, 7)
data <- scan(con, integer())
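To finish step 3 once the mapply(download.file, ...) call has fetched the files, R.utils::gunzip() (an extra dependency, if that's acceptable) can decompress each one into a single directory; a sketch:
# Decompress each downloaded .gz into a single directory, ./hi_grids
dir.create("hi_grids", showWarnings = FALSE)
for (f in hi_files) {
  R.utils::gunzip(f, destname = file.path("hi_grids", sub("\\.gz$", "", f)),
                  remove = FALSE)
}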
I can read the contents of the FTP page if I start R with the internet2 option, i.e.
C:\Program Files\R\R-2.12\bin\x64\Rgui.exe --internet2
(The shortcut used to start R on Windows can be modified to add the internet2 argument via right-click / Properties / Target, or you can just run that at the command line, which is the obvious route on GNU/Linux.)
The text on that page can be read like this:
download.file("ftp://prism.oregonstate.edu//pub/prism/pacisl/grids", "f.txt")
txt <- readLines("f.txt")
It's a little more work to parse out the Directory listings, then read them recursively for the underlying files.
## (something like)
dirlines <- txt[grep("Directory <A HREF=", txt)]
## split and extract text after "grids/"
split1 <- sapply(strsplit(dirlines, "grids/"), function(x) rev(x)[1])
## split and extract remaining text after "/"
sapply(strsplit(split1, "/"), function(x) x[1])
[1] "dem" "ppt" "tdmean" "tmax" "tmin"
It's about here that this stops seeming very attractive and gets a bit laborious, so I would actually recommend a different option. There is no doubt a better solution, perhaps with RCurl, but I would recommend learning to use an FTP client for you and your users. Command-line ftp, anonymous logins, and mget all work pretty easily.
The internet2 option was explained for a similar ftp site here:
https://stat.ethz.ch/pipermail/r-help/2009-January/184647.html
ftp.root <- where are the files
dropbox.root <- where to put the files
#=====================================================================
# Function that downloads files from URL
#=====================================================================
fdownload <- function(sourcelink) {

  targetlink <- paste(dropbox.root, substr(sourcelink, nchar(ftp.root) + 1,
                                           nchar(sourcelink)), sep = '')

  # list of contents
  filenames <- getURL(sourcelink, ftp.use.epsv = FALSE, dirlistonly = TRUE)
  filenames <- strsplit(filenames, "\n")
  filenames <- unlist(filenames)
  files <- filenames[grep('\\.', filenames)]
  dirs <- setdiff(filenames, files)
  if (length(dirs) != 0) {
    dirs <- paste(sourcelink, dirs, '/', sep = '')
  }

  # files
  for (filename in files) {
    sourcefile <- paste(sourcelink, filename, sep = '')
    targetfile <- paste(targetlink, filename, sep = '')
    download.file(sourcefile, targetfile)
  }

  # subfolders
  for (dirname in dirs) {
    fdownload(dirname)
  }
}
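A call would then look something like this (the local target path is just a placeholder; both roots need a trailing slash because the function pastes file names straight onto them, and the target folders must already exist):
library(RCurl)  # fdownload() uses RCurl::getURL()

ftp.root <- "ftp://prism.oregonstate.edu//pub/prism/pacisl/grids/"  # where the files are
dropbox.root <- "C:/prism-data/"                                    # where to put them (placeholder)
dir.create(dropbox.root, recursive = TRUE, showWarnings = FALSE)
fdownload(ftp.root)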
