How to create multiple tempdirs in a single R session? - r

I need to create multiple temp directories during a single R session but every time I call tempdir() I get the same directory.
Is there an easy way to ensure that every call will give me a new temp directory?

Use dir.create(tempfile()) to create a uniquely named directory inside the R temporary directory. Repeat as necessary.

You can only have one tempdir. But you could create subdirectories in it and use those instead.
If you want to automate the creation of those subdirectories (instead of having to name them manually), you could use:
if(dir.exists(paste0(tempdir(), "/1"))) {
dir.create(paste0(
tempdir(), paste0(
"/", as.character(as.numeric(sub(paste0(
tempdir(), "/"
),
"", tail(list.dirs(tempdir()), 1))) + 1))))
} else {
dir.create(paste0(tempdir(), "/1"))
}
This expression will name the first subdirectory 1 and any subsequent one with increment of 1 (so 2, 3, etc.).
This way you do not have to keep track of how many subdirectories you have already created and you can use this expression in a function, etc.

Related

scp_download to download multiple files based on a pattern?

I need to download many files from a server (specifically tectia) ideally using the ssh package. These files all follow the a predictable pattern across multiple sub folders. The filepath is formatted like this
/directory/subfolder/A001/abcde001.csv
Where A001 counts up alongside the last 3 digits of the filename (/A002/abcde002.csv and so on)
In the vignette for scp_download it states that the files parameter may contain wildcards so I have tried to do something like
scp_download(session, "/directory/subfolder/A.*/abcde.*[.]csv", to=tempdir())
and
scp_download(session, "directory/subfolder/A\\d{3}/abcde\\d{3}[.]csv", to=tempdir())
but no matter which combination of patterns or wildcards I can think of (which isn't many) I only get something like
Warning: SSH warning: scp: /directory/subfolder/A\d{3}/abcde\d{3}[.]csv: No such file or directory
What I'm hoping to do is either find a way to do pattern matching here, or to find a way to store tectia directories as a string to be read by scp_download. I've made sure that my session is connected properly and it works without attempting to pattern match, which it does.
I had the same problem. The problem is that when you use * in your pattern it gets escaped when you send it to the server. However, when you request a special file name like this /directory/subfolder/A001/abcde001.csv, it works fine.
Finally I changed my code based on the below steps:
I got the list of files/folders using ls command with ssh_exec_wait function and then store them on a variable.
Download files in the variable separately
session <- ssh_connect("username#ip",passwd="password")
files<-capture.output(ssh_exec_wait(session, command = 'ls /directory/subfolder/A001/*'))
dnc1<- scp_download(session, files[1], to = paste0(getwd(),"/data/"))
dnc2<- scp_download(session, files[2], to = paste0(getwd(),"/data/"))
dnc3<- scp_download(session, files[3], to = paste0(getwd(),"/data/"))
The bottom 3 commands can be done in a loop as this could be hundreds or thousands of records.

Create multiple directories based on files found

I'm trying to create directories to store image sequences based on a 'find' command.
Let's say there are 2 image sequences in different locations within the 'test' directory
test_123.####.dpx
test_abc.####.dpx
I would run something like:
testDir=$(find /Users/Tim/test/ -type f -name "test*.dpx")
and it would return all of the files as listed above.
What I would like to do is create two directories named test_123 and test_abc.
mkdir /Users/Tim/test/scan/${testDir:t:r:r}
If I run this then it will only create one directory, presumably based on the first result.
How would I be able to make this work to create directories that share the same base name for an unlimited number of results? (not just two as in the case of this example).
if it's not important that it's done in zsh, you could just do it in python like this:
#!/usr/bin/python3
import os
x = 0
y = 0
if input(f"{os.getcwd()} is current working dir. press y to continue \n") == "y":
for file in os.listdir(): # get files in current folder
split = file.split(".") # split by .
if len(split) > 1: # only files with more than one . are considered
if split[0] not in os.listdir(): # if the folder it fits doesn't exist
os.mkdir(split[0]) # make the folder
print(f"made new folder {split[0]}")
y = y + 1
os.rename(file, os.path.join(split[0],file)) # then move it there - OS agnostic path generated!
x = x + 1
print(f"moved {x} files to {y} folders!")
I added a check before it runs, just to prevent random people who find this from wreaking havoc. The if len(split) > 1: should also prevent some accidents by making sure it's only files with at least 2 dots that are moved, as that's an unusual naming scheme.

Moving large amounts of files from one large folder to several smaller folders using R

I have over 7,000 .wav files in one folder which need to be split up into groups of 12 and placed into separate smaller folders.
The files correspond to 1-minute recordings taken every 5 minutes, so every 12 files corresponds to 1 hour.
The files are stored on my PC in the working directory: "E:/Audiomoth Files/Winter/Rural/Emma/"
Examples of the file names are as follows:
20210111_000000.wav
20210111_000500.wav
20210111_001000.wav
20210111_001500.wav
20210111_002000.wav
20210111_002500.wav
20210111_003000.wav
20210111_003500.wav
20210111_004000.wav
20210111_004500.wav
20210111_005000.wav
20210111_005500.wav
which would be one hour, then
20210111_010000.wav
20210111_010500.wav
20210111_011000.wav
and so on.
I need the files split into groups of 12 and then I need a new folder to be created in: "E:/Audiomoth Files/Winter/Rural/Emma/Organised Files"
With the new folders named 'Hour 1', 'Hour 2' and so on.
What is the exact code I need to do this?
As is probably very obvious I'm a complete beginner with R so if the answer could be spelt out in layman's terms that would be brilliant.
Thank you in advance
Something like this?
I intentionally used copy instead of cut in order to prevent data from being lost. I edited the answer so the files will keep their old names. I order to give them new names, replace name in the last line by "Part_", i, ".wav", for example.
# get a list of the paths to all the files
old_files <- list.files("E:/Audiomoth Files/Winter/Rural/Emma/", pattern = "\\.wav$", full.names = TRUE)
# create new directory
dir.create("E:/Audiomoth Files/Winter/Rural/Emma/Organised Files")
# start a loop, repeat as often as there are groups of 12 within the list of files
for(i in 1:(round(length(old_files)/12)+1)){
# create a directory for the hour
directory <- paste("E:/Audiomoth Files/Winter/Rural/Emma/Organised Files", "/Hour_", i, sep = "")
dir.create(directory)
# select the files that are to copy (I guess it will start with 1*12-11 = 1st file
# and end with i*12 = 12th file)
filesToCopy <- old_files[(i*12-11):(i*12)]
# for those files run another loop:
for(file in 1:12){
# get the name of the file
name <- basename(filesToCopy[file])
# copy the file to the current directory
file.copy(filesToCopy[file], paste(directory, "/", name, sep = ""))
}
}
When you're not entirely sure, I'd recommend to copy the files instead of moving them directly (which is what I hope this script here does). You can delete them manually, later on. After you checked that everything worked well and all data is where it should be. Otherwise data can be lost due to even small errors, which we do not want to happen.

If statement for directory starting with specific character

I have a script in which I call R and depending on the directory I specify I want it to carry out a different process. One directory starts with L and the other with S. I have numerous directories that either start with L or S and they all end differently.
I specify the directory in bash and run a script like so:
./script L_dir
or
./script S_dir
So within my R script I have it set up as such:
args <- commandArgs(TRUE)
img_dir <- args[1]
if(img_dir == "^L*"){
do_process_1
} else {
do_process_2
}
Everything works fine except that no matter what directory I specify, the process called will always be do_process_2.
I have looked at this question and tried to adapt it but can't get it to work.
After changing my code to
if(grepl("^LM*", img_dir)){
do_process_1
} else {
do_process_2
}
it worked. Be careful if you change it to the above and it still carries out process_2. This may be because what you are looking for, in my case ^L*, may also be in your second directory name i.e. dir_L = LMNOP, dir_S = STUVLJH. But once i specified ^LM* it did what i wanted it to do.

How to create a new output file in R if a file with that name already exists?

I am trying to run an R-script file using windows task scheduler that runs it every two hours. What I am trying to do is gather some tweets through Twitter API and run a sentiment analysis that produces two graphs and saves it in a directory. The problem is, when the script is run again it replaces the already existing files with that name in the directory.
As an example, when I used the pdf("file") function, it ran fine for the first time as no file with that name already existED in the directory. Problem is I want the R-script to be running every other hour. So, I need some solution that creates a new file in the directory instead of replacing that file. Just like what happens when a file is downloaded multiple times from Google Chrome.
I'd just time-stamp the file name.
> filename = paste("output-",now(),sep="")
> filename
[1] "output-2014-08-21 16:02:45"
Use any of the standard date formatting functions to customise to taste - maybe you don't want spaces and colons in your file names:
> filename = paste("output-",format(Sys.time(), "%a-%b-%d-%H-%M-%S-%Y"),sep="")
> filename
[1] "output-Thu-Aug-21-16-03-30-2014"
If you want the behaviour of adding a number to the file name, then something like this:
serialNext = function(prefix){
if(!file.exists(prefix)){return(prefix)}
i=1
repeat {
f = paste(prefix,i,sep=".")
if(!file.exists(f)){return(f)}
i=i+1
}
}
Usage. First, "foo" doesn't exist, so it returns "foo":
> serialNext("foo")
[1] "foo"
Write a file called "foo":
> cat("fnord",file="foo")
Now it returns "foo.1":
> serialNext("foo")
[1] "foo.1"
Create that, then it returns "foo.2" and so on...
> cat("fnord",file="foo.1")
> serialNext("foo")
[1] "foo.2"
This kind of thing can break if more than one process might be writing a new file though - if both processes check at the same time there's a window of opportunity where both processes don't see "foo.2" and think they can both create it. The same thing will happen with timestamps if you have two processes trying to write new files at the same time.
Both these issues can be resolved by generating a random UUID and pasting that on the filename, otherwise you need something that's atomic at the operating system level.
But for a twice-hourly job I reckon a timestamp down to minutes is probably enough.
See ?files for file manipulation functions. You can check if file exists with file.exists, and then either rename the existing file, or create a different name for the new one.

Resources