Create multiple directories based on files found - zsh

I'm trying to create directories to store image sequences based on a 'find' command.
Let's say there are 2 image sequences in different locations within the 'test' directory
test_123.####.dpx
test_abc.####.dpx
I would run something like:
testDir=$(find /Users/Tim/test/ -type f -name "test*.dpx")
and it would return all of the files as listed above.
What I would like to do is create two directories named test_123 and test_abc.
mkdir /Users/Tim/test/scan/${testDir:t:r:r}
If I run this then it will only create one directory, presumably based on the first result.
How would I be able to make this work to create directories that share the same base name for an unlimited number of results? (not just two as in the case of this example).

if it's not important that it's done in zsh, you could just do it in python like this:
#!/usr/bin/python3
import os
x = 0
y = 0
if input(f"{os.getcwd()} is current working dir. press y to continue \n") == "y":
for file in os.listdir(): # get files in current folder
split = file.split(".") # split by .
if len(split) > 1: # only files with more than one . are considered
if split[0] not in os.listdir(): # if the folder it fits doesn't exist
os.mkdir(split[0]) # make the folder
print(f"made new folder {split[0]}")
y = y + 1
os.rename(file, os.path.join(split[0],file)) # then move it there - OS agnostic path generated!
x = x + 1
print(f"moved {x} files to {y} folders!")
I added a check before it runs, just to prevent random people who find this from wreaking havoc. The if len(split) > 1: should also prevent some accidents by making sure it's only files with at least 2 dots that are moved, as that's an unusual naming scheme.

Related

pdf2image: how to remove the '0001' in jpg file names? (Solved)

My goal is to convert a multi page pdf file into a number of .jpg files, in such a way that the images are directly written to the hard-disk/SSD in stead of stored into memory.
In python 3.11 :
from pdf2image import convert_from_path
poppler_path = r".\poppler-22.12.0\Library\bin"
images = convert_from_path('test.pdf', output_folder='.', output_file = 'test',
poppler_path=poppler_path, paths_only = True)
pdf2image generates files with the following names
'test_0001-1.jpg',
'test_0001-2.jpg',
etc
Problem:
I would like to have the files have names without the suffix '_0001-' (eg. 'test1.jpg').
The only way so far seems to be to use convert_from_path WITHOUT output_folder and then
save each images by images.save. But in this way the images are stored first into memory, which easyly can become a lot of Mbytes.
Is it possible to change the way pdf2image generates the file names when saving images directly to files?
I'm not familiar if Poppler already has some parameters to customize the generated file names, but you can always do this:
Run the command in an empty directory (e.g. in tempfile.TemporaryDirectory())
After command finishes, list the contents of the directory and store the result in a list
Iterate over the list with a regex that will match the numbers, and create a dict for the mapping (integer to file name)
At this point you are free to rename the files to whatever you like, or to process them.
The benefit of this solution is that it's neutral, robust and works for many similar scenarios.
hi have a look at your codebase in file generators.py ,
I got mine from def counter_generator(prefix="", suffix="", padding_goal=4):
at line 41 you have :
....
#threadsafe
def counter_generator(prefix="", suffix="", padding_goal=4):
"""Returns a joined prefix, iteration number, and suffix"""
i = 0
while True:
i += 1
yield str(prefix) + str(i).zfill(padding_goal) + str(suffix)
....
think you need to play with the yield line zfill() :
The Python String zfill() method is used to fill the string with zeroes on its left until it reaches a certain width; else called Padding. If the prefix of this string is a sign character (+ or -), the zeroes are added after the sign character rather than before.
The Python String zfill() method does not fill the string if the length of the string is greater than the total width of the string after padding.
Note: The zfill() method works similar to the rjust() method if we assign '0' to the fillchar parameter of the rjust() method.
https://www.tutorialspoint.com/python/string_zfill.htm
Just use poppler utilities direct (or xpdf pdftopng) so simply call it via a shell (add other options like -r 200 as desired for resolutions other than 150)
I recommend PNG as better image fidelity, however if you want .jpg replace "-png" below with "-jpg" (direct answer as asked would be pdftoppm -jpg -f 1 -l 9 -sep "" test.pdf "test") but do follow the below enhancement for file sorting. Windows file sorting needs leading zeros otherwise sort in zip or folder is 1,10,11...2,20...., which is often undesirable.
"path to bin\pdftoppm" -png "path to \in.pdf" "name"
Result =
name-1.png
name-2.png etc.
adding digits is limited compared to other apps so if you want "name-01.png" you need to only output pages 1-9 as
\bin>pdftoppm -png -f 1 -l 9 -sep "0" in.pdf "name-"
then for pages 10 to ## use say for up to 99 page file use default (it will only use the page numbers that are available)
\bin>pdftoppm -png -f 10 -l 99 in.pdf "name"
thus for 12 pages this would produce only -10 -11 and -12 as required
likewise, for up to 9999 pages you need 4 calls, if you don't want - simply delete it. For different output directory adjust output accordingly.
set "name=%~dpn1"
set "bin=path to Poppler\Release-22.12.0-0\poppler-22.12.0\Library\bin"
"%bin%\pdftoppm" -png -r 200 -f 1 -l 9 -sep "0" "%name%.pdf" "%name%-00"
"%bin%\pdftoppm" -png -r 200 -f 10 -l 99 -sep "0" "%name%.pdf" "%name%-0"
"%bin%\pdftoppm" -png -r 200 -f 100 -l 999 -sep "0" "%name%.pdf" "%name%-"
"%bin%\pdftoppm" -png -r 200 -f 1000 -l 9999 -sep "" "%name%.pdf" "%name%-"
in say example for 12 page above the worst case would be last calls replies
Wrong page range given: the first page (100) can not be after the last page (12). and same for 1000 Thus, those warnings can be ignored.
Those 4 lines could be in a windows or OS script batch file (for sendto or drag and drop) that accepts arguments then very simply use in system or python by call pdf2png.bat input.pdf for each file and output will in that simple case be same directory.

Moving large amounts of files from one large folder to several smaller folders using R

I have over 7,000 .wav files in one folder which need to be split up into groups of 12 and placed into separate smaller folders.
The files correspond to 1-minute recordings taken every 5 minutes, so every 12 files corresponds to 1 hour.
The files are stored on my PC in the working directory: "E:/Audiomoth Files/Winter/Rural/Emma/"
Examples of the file names are as follows:
20210111_000000.wav
20210111_000500.wav
20210111_001000.wav
20210111_001500.wav
20210111_002000.wav
20210111_002500.wav
20210111_003000.wav
20210111_003500.wav
20210111_004000.wav
20210111_004500.wav
20210111_005000.wav
20210111_005500.wav
which would be one hour, then
20210111_010000.wav
20210111_010500.wav
20210111_011000.wav
and so on.
I need the files split into groups of 12 and then I need a new folder to be created in: "E:/Audiomoth Files/Winter/Rural/Emma/Organised Files"
With the new folders named 'Hour 1', 'Hour 2' and so on.
What is the exact code I need to do this?
As is probably very obvious I'm a complete beginner with R so if the answer could be spelt out in layman's terms that would be brilliant.
Thank you in advance
Something like this?
I intentionally used copy instead of cut in order to prevent data from being lost. I edited the answer so the files will keep their old names. I order to give them new names, replace name in the last line by "Part_", i, ".wav", for example.
# get a list of the paths to all the files
old_files <- list.files("E:/Audiomoth Files/Winter/Rural/Emma/", pattern = "\\.wav$", full.names = TRUE)
# create new directory
dir.create("E:/Audiomoth Files/Winter/Rural/Emma/Organised Files")
# start a loop, repeat as often as there are groups of 12 within the list of files
for(i in 1:(round(length(old_files)/12)+1)){
# create a directory for the hour
directory <- paste("E:/Audiomoth Files/Winter/Rural/Emma/Organised Files", "/Hour_", i, sep = "")
dir.create(directory)
# select the files that are to copy (I guess it will start with 1*12-11 = 1st file
# and end with i*12 = 12th file)
filesToCopy <- old_files[(i*12-11):(i*12)]
# for those files run another loop:
for(file in 1:12){
# get the name of the file
name <- basename(filesToCopy[file])
# copy the file to the current directory
file.copy(filesToCopy[file], paste(directory, "/", name, sep = ""))
}
}
When you're not entirely sure, I'd recommend to copy the files instead of moving them directly (which is what I hope this script here does). You can delete them manually, later on. After you checked that everything worked well and all data is where it should be. Otherwise data can be lost due to even small errors, which we do not want to happen.

Moving files between folders when folder names match partially (in R or VBA)

I'm trying to solve the following problem
I have 9 folders titled PROS_2010 to PROS_2019. Each of them has about 500 subfolders with names structured as follows e.g. PROS_201001211_FIRM NAME_number. Each subfolder has a variety of pdf files with different names.
I have created in VBA another folder called sample with about 400 subfolders, each of which is named a specific FIRM NAME. For this I used the following code:
Sub MakeFolders()
Dim Rng As Range
Dim maxRows, maxCols, r, c As Integer
Set Rng = Selection
maxRows = Rng.Rows.Count
maxCols = Rng.Columns.Count
For c = 1 To maxCols
r = 1
Do While r <= maxRows
If Len(Dir(ActiveWorkbook.Path & "\" & Rng(r, c), vbDirectory)) = 0 Then
MkDir (ActiveWorkbook.Path & "\" & Rng(r, c))
On Error Resume Next
End If
r = r + 1
Loop
Next c
End Sub
I now want to move all the pdf files that are in the original subfolders PROS_201001211_FIRM NAME_number to the folders titled FIRM NAME only.
Basically, each original subfolder contains a report about a firm for a specific year (2010 to 2019) and I want to get all the firm reports for all years in a single folder titled FIRM NAME
To make it easier I already have an excel file that basically has the complete list of subfolders that looks like this:
Data structure: Company name is the name of the folder in which I want to move the files that are currently in "attachment folder". attachment1 is the pdf file name (which always changes so ideally the code would pluck all the files in attachment folder and move them to the file with company name
Thanks in advance,
Simon
OK
So thanks to the help of a mate I found it is super easy to solve this problem using the "command" command in windows
Basically create a text file (in notepad) that has the following structure
move "original pdf file directory" "new pdf file location\"
...
Repeat the structure for each file (which requires some basic excel string manipulations)
Then save the .txt file as a .cmd file and open it.
Done

User independent data import

Say X and Y are working on a project which requieres a file data.csv. They have some common file within a cloud service named main.R. Now assume that within main.R X and Y respectively are importing data via
# uncomment first line if you are X, otherwise uncomment second line
# data <- read.csv("C:/User/X/Documents/cloud/project/data.csv")
# data <- read.csv("C:/User/Y/Desktop/cloud/project/data.csv")
Instead of uncomment one of the lines depending who is running the script I'd like to have one command in total which is universal and referes to the part of the file path which they have in common, say,
data <- read.csv(".../cloud/project/data.csv")
Any idea how to accomplish this?
Try this:
#check if directory exists
dataDir <-
if(dir.exists(paste0("C:/Users/",Sys.info()["effective_user"], "/Documents/cloud/project/"))){
paste0("C:/Users/", Sys.info()["effective_user"],"/Documents/cloud/project/")
} else if(dir.exists(paste0("C:/Users/", Sys.info()["effective_user"],"/Desktop/cloud/project/"))) {
paste0("C:/Users/", Sys.info()["effective_user"],"/Desktop/cloud/project/")
}
#if exists then read in
if(!is.null(dataDir)){ read.csv(paste0(dataDir,"data.csv")) }

How to create a new output file in R if a file with that name already exists?

I am trying to run an R-script file using windows task scheduler that runs it every two hours. What I am trying to do is gather some tweets through Twitter API and run a sentiment analysis that produces two graphs and saves it in a directory. The problem is, when the script is run again it replaces the already existing files with that name in the directory.
As an example, when I used the pdf("file") function, it ran fine for the first time as no file with that name already existED in the directory. Problem is I want the R-script to be running every other hour. So, I need some solution that creates a new file in the directory instead of replacing that file. Just like what happens when a file is downloaded multiple times from Google Chrome.
I'd just time-stamp the file name.
> filename = paste("output-",now(),sep="")
> filename
[1] "output-2014-08-21 16:02:45"
Use any of the standard date formatting functions to customise to taste - maybe you don't want spaces and colons in your file names:
> filename = paste("output-",format(Sys.time(), "%a-%b-%d-%H-%M-%S-%Y"),sep="")
> filename
[1] "output-Thu-Aug-21-16-03-30-2014"
If you want the behaviour of adding a number to the file name, then something like this:
serialNext = function(prefix){
if(!file.exists(prefix)){return(prefix)}
i=1
repeat {
f = paste(prefix,i,sep=".")
if(!file.exists(f)){return(f)}
i=i+1
}
}
Usage. First, "foo" doesn't exist, so it returns "foo":
> serialNext("foo")
[1] "foo"
Write a file called "foo":
> cat("fnord",file="foo")
Now it returns "foo.1":
> serialNext("foo")
[1] "foo.1"
Create that, then it returns "foo.2" and so on...
> cat("fnord",file="foo.1")
> serialNext("foo")
[1] "foo.2"
This kind of thing can break if more than one process might be writing a new file though - if both processes check at the same time there's a window of opportunity where both processes don't see "foo.2" and think they can both create it. The same thing will happen with timestamps if you have two processes trying to write new files at the same time.
Both these issues can be resolved by generating a random UUID and pasting that on the filename, otherwise you need something that's atomic at the operating system level.
But for a twice-hourly job I reckon a timestamp down to minutes is probably enough.
See ?files for file manipulation functions. You can check if file exists with file.exists, and then either rename the existing file, or create a different name for the new one.

Resources