Moving files between folders when folder names match partially (in R or VBA)

I'm trying to solve the following problem.
I have 9 folders titled PROS_2010 to PROS_2019. Each of them has about 500 subfolders with names structured like PROS_201001211_FIRM NAME_number. Each subfolder contains a variety of PDF files with different names.
I have used VBA to create another folder called sample with about 400 subfolders, each named after a specific FIRM NAME. For this I used the following code:
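Sub MakeFolders()
    ' Create one folder per selected cell, next to the active workbook.
    Dim Rng As Range
    ' Give every variable its own type; "Dim a, b As Integer" types only b.
    Dim maxRows As Long, maxCols As Long, r As Long, c As Long
    Dim folderPath As String
    Set Rng = Selection
    maxRows = Rng.Rows.Count
    maxCols = Rng.Columns.Count
    ' Keep going if a cell does not hold a valid folder name.
    On Error Resume Next
    For c = 1 To maxCols
        For r = 1 To maxRows
            folderPath = ActiveWorkbook.Path & "\" & Rng(r, c)
            ' Only create the folder if it does not exist yet.
            If Len(Dir(folderPath, vbDirectory)) = 0 Then
                MkDir folderPath
            End If
        Next r
    Next c
End Sub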
I now want to move all the PDF files from the original subfolders (PROS_201001211_FIRM NAME_number) into the folders titled FIRM NAME only.
Basically, each original subfolder contains a report about a firm for a specific year (2010 to 2019), and I want to get all the firm reports for all years into a single folder titled FIRM NAME.
To make it easier, I already have an Excel file with the complete list of subfolders, which looks like this:
Data structure: Company name is the name of the folder into which I want to move the files that are currently in "attachment folder". attachment1 is the PDF file name (which always changes, so ideally the code would pick up all the files in the attachment folder and move them to the folder with the company name).
Thanks in advance,
Simon

OK.
Thanks to the help of a mate, I found that this is easy to solve using the move command from the Windows Command Prompt.
Create a text file (in Notepad) with one line per file, structured as follows:
move "original pdf file directory" "new pdf file location\"
...
Repeat the structure for each file (which requires some basic Excel string manipulation).
Then save the .txt file as a .cmd file and run it.
Done.
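The same move can also be scripted directly instead of generating a .cmd file. The following is a minimal Python sketch, not the method used above. It assumes every source subfolder is named PROS_<digits>_<FIRM NAME>_<digits>, that the year folders sit under one common root, and that a name clash between years should be resolved by prefixing the source folder name. The function name collect_firm_pdfs and that clash rule are illustrative choices, not part of the original question.

```python
import re
import shutil
from pathlib import Path

# Assumed naming scheme: PROS_<digits>_<FIRM NAME>_<digits>
SUBFOLDER_RE = re.compile(r"^PROS_\d+_(?P<firm>.+)_\d+$")


def collect_firm_pdfs(source_root, target_root):
    """Move every PDF from PROS_*_FIRM NAME_number subfolders into target_root/FIRM NAME."""
    source_root, target_root = Path(source_root), Path(target_root)
    moved = 0
    # First level: year folders (PROS_2010 ...); second level: report subfolders.
    for sub in sorted(source_root.glob("PROS_*/PROS_*")):
        match = SUBFOLDER_RE.match(sub.name)
        if not sub.is_dir() or match is None:
            continue
        firm_dir = target_root / match.group("firm")
        firm_dir.mkdir(parents=True, exist_ok=True)
        # Note: "*.pdf" is case-sensitive on case-sensitive file systems.
        for pdf in sorted(sub.glob("*.pdf")):
            dest = firm_dir / pdf.name
            if dest.exists():
                # Avoid overwriting a same-named report from another year.
                dest = firm_dir / f"{sub.name}_{pdf.name}"
            shutil.move(str(pdf), str(dest))
            moved += 1
    return moved
```

Calling collect_firm_pdfs with the folder that holds PROS_2010 to PROS_2019 and the sample folder returns the number of files moved. Trying it on a copy of the data first is the safer route.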

Related

AppleScript: when moving a file into a folder, is it possible to move duplicates inside that folder?

I have an AppleScript which moves all files within a folder into a new folder according to the first 6 characters of the filenames.
If the folder already exists, the files get moved into that existing folder instead.
If the file already exists in the new/existing folder, the file to be moved into the folder gets a "copy" extension to the filename – which is not in the script, it's a system function ;-)
My question:
Is it possible to move all existing (duplicate) files into a subfolder (named "_alt" for example), before the other files get moved within the folder (to prevent the "copy"-extension)? I would like to keep both files, the original files and the newly moved files.
Thanks.
This is my existing script:
set mgFilesFolder to (choose folder with prompt "Choose folder…")
(* get the files of the mgFilesFolder folder *)
tell application "Finder" to set fileList to files of mgFilesFolder
(* iterate over every file *)
repeat with i from 1 to number of items in fileList
    set this_item to item i of fileList
    set this_file_Name to name of this_item as string
    set thisFileCode to characters 1 thru 6 of this_file_Name as text
    log thisFileCode
    tell application "Finder"
        try
            set nf to make new folder at mgFilesFolder with properties {name:thisFileCode}
        end try
    end tell
end repeat
set sourceFolder to mgFilesFolder
set destinationFolder to mgFilesFolder
tell application "Finder"
    set the_files to (get files of mgFilesFolder)
    repeat with this_file in the_files
        set the_name to name of this_file
        set the_short_name to items 1 thru 6 of the_name as string
        if exists folder the_short_name of mgFilesFolder then move this_file to folder the_short_name of mgFilesFolder
    end repeat
end tell
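The requested behaviour (park an existing file in "_alt" before the new one arrives) can be sketched as follows. This is written in Python rather than AppleScript and is only an illustration of the logic. The function name sort_by_prefix is hypothetical, and the sketch assumes that an older copy already sitting in "_alt" may be replaced.

```python
from pathlib import Path


def sort_by_prefix(folder, prefix_len=6, alt_name="_alt"):
    """Move each file into a subfolder named after its first prefix_len characters.

    A file already present under the same name is first moved into the
    alt_name subfolder, so both versions are kept.
    """
    folder = Path(folder)
    # Snapshot the file list first; the folder changes while files are moved.
    for item in [p for p in folder.iterdir() if p.is_file()]:
        if len(item.name) <= prefix_len:
            # The folder name would collide with the file name itself.
            continue
        target_dir = folder / item.name[:prefix_len]
        target_dir.mkdir(exist_ok=True)
        existing = target_dir / item.name
        if existing.exists():
            alt_dir = target_dir / alt_name
            alt_dir.mkdir(exist_ok=True)
            # Assumption: an older copy already in _alt may be replaced.
            existing.replace(alt_dir / item.name)
        item.replace(existing)
```

Because the old file leaves its place before the new one arrives, the system never has to append "copy" to a name.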

Create multiple directories based on files found

I'm trying to create directories to store image sequences based on a 'find' command.
Let's say there are 2 image sequences in different locations within the 'test' directory
test_123.####.dpx
test_abc.####.dpx
I would run something like:
testDir=$(find /Users/Tim/test/ -type f -name "test*.dpx")
and it would return all of the files as listed above.
What I would like to do is create two directories named test_123 and test_abc.
mkdir /Users/Tim/test/scan/${testDir:t:r:r}
If I run this then it will only create one directory, presumably based on the first result.
How would I be able to make this work to create directories that share the same base name for an unlimited number of results? (not just two as in the case of this example).
If it's not important that it's done in zsh, you could just do it in Python like this:
#!/usr/bin/python3
import os
x = 0
y = 0
if input(f"{os.getcwd()} is current working dir. press y to continue \n") == "y":
    for file in os.listdir():  # get files in current folder
        split = file.split(".")  # split by .
        if len(split) > 2:  # only files with more than one . are considered
            if split[0] not in os.listdir():  # if the folder it fits doesn't exist
                os.mkdir(split[0])  # make the folder
                print(f"made new folder {split[0]}")
                y = y + 1
            os.rename(file, os.path.join(split[0], file))  # then move it there - OS agnostic path generated!
            x = x + 1
    print(f"moved {x} files to {y} folders!")
I added a check before it runs, just to prevent random people who find this from wreaking havoc. The if len(split) > 2: should also prevent some accidents by making sure only files with at least 2 dots are moved, as that's an unusual naming scheme.
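The script above only looks at the current folder, whereas the question searches a whole tree with find. A possible variant, sketched under the assumption that the base name is everything before the first dot and that the directories should all be created under one scan folder (make_sequence_dirs and its parameters are hypothetical names):

```python
from pathlib import Path


def make_sequence_dirs(search_root, scan_root, pattern="test*.dpx"):
    """Create one directory under scan_root per image-sequence base name."""
    search_root, scan_root = Path(search_root), Path(scan_root)
    # "test_123.0001.dpx" -> "test_123": keep everything before the first dot.
    base_names = {
        p.name.split(".")[0] for p in search_root.rglob(pattern) if p.is_file()
    }
    for name in sorted(base_names):
        (scan_root / name).mkdir(parents=True, exist_ok=True)
    return sorted(base_names)
```

Collecting the names in a set first means each directory is created once, however many frames the sequence has.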

Moving large amounts of files from one large folder to several smaller folders using R

I have over 7,000 .wav files in one folder which need to be split up into groups of 12 and placed into separate smaller folders.
The files correspond to 1-minute recordings taken every 5 minutes, so every 12 files corresponds to 1 hour.
The files are stored on my PC in the working directory: "E:/Audiomoth Files/Winter/Rural/Emma/"
Examples of the file names are as follows:
20210111_000000.wav
20210111_000500.wav
20210111_001000.wav
20210111_001500.wav
20210111_002000.wav
20210111_002500.wav
20210111_003000.wav
20210111_003500.wav
20210111_004000.wav
20210111_004500.wav
20210111_005000.wav
20210111_005500.wav
which would be one hour, then
20210111_010000.wav
20210111_010500.wav
20210111_011000.wav
and so on.
I need the files split into groups of 12 and then I need a new folder to be created in: "E:/Audiomoth Files/Winter/Rural/Emma/Organised Files"
With the new folders named 'Hour 1', 'Hour 2' and so on.
What is the exact code I need to do this?
As is probably very obvious I'm a complete beginner with R so if the answer could be spelt out in layman's terms that would be brilliant.
Thank you in advance
Something like this?
I intentionally used copy instead of cut in order to prevent data from being lost. I edited the answer so the files keep their old names. In order to give them new names, replace name in the last line with "Part_", i, ".wav", for example.
# get a list of the paths to all the files
old_files <- list.files("E:/Audiomoth Files/Winter/Rural/Emma/", pattern = "\\.wav$", full.names = TRUE)
# create new directory
dir.create("E:/Audiomoth Files/Winter/Rural/Emma/Organised Files")
# start a loop, repeat once for each group of 12 (the last group may be smaller)
for(i in seq_len(ceiling(length(old_files)/12))){
  # create a directory for the hour
  directory <- paste("E:/Audiomoth Files/Winter/Rural/Emma/Organised Files", "/Hour_", i, sep = "")
  dir.create(directory)
  # select the files to copy: from file i*12-11 up to file i*12,
  # but never past the end of the list
  filesToCopy <- old_files[(i*12-11):min(i*12, length(old_files))]
  # for those files run another loop:
  for(file in seq_along(filesToCopy)){
    # get the name of the file
    name <- basename(filesToCopy[file])
    # copy the file to the current directory
    file.copy(filesToCopy[file], paste(directory, "/", name, sep = ""))
  }
}
When you're not entirely sure, I'd recommend copying the files instead of moving them directly (which is what I hope this script does). You can delete the originals manually later on, after you have checked that everything worked and all data is where it should be. Otherwise data can be lost due to even small errors, which we do not want to happen.
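For comparison, the same grouping idea (sort the files, take them 12 at a time, copy rather than move) can be sketched in Python. This is an illustration only. The function name copy_in_groups is hypothetical, and the Hour_ folder naming follows the R code above.

```python
import shutil
from pathlib import Path


def copy_in_groups(source_dir, dest_root, group_size=12, pattern="*.wav"):
    """Copy files into dest_root/Hour_1, Hour_2, ... in groups of group_size."""
    # Timestamped names such as 20210111_000500.wav sort chronologically.
    files = sorted(Path(source_dir).glob(pattern))
    dest_root = Path(dest_root)
    for start in range(0, len(files), group_size):
        hour_dir = dest_root / f"Hour_{start // group_size + 1}"
        hour_dir.mkdir(parents=True, exist_ok=True)
        # The slice simply ends early for the last, possibly smaller, group.
        for src in files[start:start + group_size]:
            shutil.copy2(src, hour_dir / src.name)  # copy, do not move
    return len(files)
```

Slicing never runs past the end of the list, so a final group with fewer than 12 files needs no special handling.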

How to find a common variable in a large number of databases using Stata

I have a large number of databases (82) in Stata, each containing around 1300 variables and several thousand observations. Some of these databases contain variables that give the mean or standard deviation of certain concepts. For example, a variable in such a dataset could be called "leverage_mean". I want to know which datasets contain variables called concept_mean or concept_sd, without having to go through every dataset by hand.
I was thinking that maybe there is a way to loop through the databases looking for variables containing "mean" or "sd"; unfortunately, I have no idea how to do this. I'm using R and Stata datafiles.
Yes, you can do this with a loop in Stata as well as in R. First, you should check out the Stata command ds and the package findname, which will do many of the things described here and much more. But to show you what is happening "under the hood", I'll show Stata code that can achieve this below:
/*Set your current directory to the location of your databases*/
cd "[your cd here]"
Save the names of the 82 databases to a local macro called "filelist" using Stata's dir function for macros. NOTE: you don't specify what kind of file your database files are, so I'm assuming .xls. This command saves the names of all files with extension ".xls" into the list. What type of file you save into the list and how you import your database will depend on what type of files you are reading in.
local filelist : dir . files "*.xls"
Then loop over all files to show which ones contain variables that end with "_sd" or "_mean".
foreach file of local filelist {
    /*import the data*/
    import excel "`file'", firstrow clear case(lower)
    /*collect the variables that end with "_sd" or "_mean"; each pattern is
      checked separately because describe fails if a pattern matches nothing*/
    local found
    foreach pat in *_sd *_mean {
        capture quietly describe `pat', varlist
        if _rc == 0 local found `found' `r(varlist)'
    }
    if "`found'" != "" {
        /*If the database contains variables of interest, display the database file name and variables on screen*/
        display "Database `file' contains variables: `found'"
    }
}
Final note, this loop will only display the database name and variables of interest contained within it. If you want to perform actions on the data, or do anything else, those actions need to be included in the position of the final "display" command (which you may or may not ultimately actually need).
You can use filelist (from SSC) to create a dataset of files. To install filelist, type in Stata's Command window:
ssc install filelist
With a list of datasets in memory, you can then loop over each file and use describe to get a list of variables for each file. You can store this list of variables in a single string variable. For example, the following will collect the names of all Stata datasets shipped with Stata and then store for each the variables they contain:
findfile "auto.dta"
local base_dir = subinstr("`r(fn)'", "/a/auto.dta", "", 1)
dis "`base_dir'"
filelist, dir("`base_dir'") pattern("*.dta")
gen variables = ""
local nmatch = _N
qui forvalues i = 1/`nmatch' {
    local f = dirname[`i'] + "/" + filename[`i']
    describe using "`f'", varlist
    replace variables = " `r(varlist)' " in `i'
}
leftalign // also from SSC, to install: ssc install leftalign
Once you have all this information in the data in memory, you can easily search for specific variables. For example:
. list filename if strpos(variables, " rep78 ")
     +-----------+
     | filename  |
     |-----------|
 13. | auto.dta  |
 14. | auto2.dta |
     +-----------+
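The spaces stored around the variable list, and around the name being searched for, are what make this search match whole variable names only. A small Python sketch of that padding trick, using a made-up catalogue rather than real Stata files (the function name and the sample data in the usage note are hypothetical):

```python
def files_with_variable(variables_by_file, name):
    """Return the files whose variable list contains name as a whole word."""
    needle = f" {name} "
    return [
        filename
        for filename, names in variables_by_file.items()
        # Pad the joined list with spaces so the first and last names match too.
        if needle in f" {' '.join(names)} "
    ]
```

With a catalogue such as {"a.dta": ["rep78", "price"], "b.dta": ["rep78_mean"]}, searching for "rep78" returns only a.dta, whereas a plain substring test without the padding would also pick up b.dta.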
The lookfor_all package (SSC) is there for that purpose:
cd "pathtodirectory"
lookfor_all leverage_mean
Just make sure the file extensions are in lowercase (.dta), not uppercase.

How to create a new output file in R if a file with that name already exists?

I am trying to run an R script with Windows Task Scheduler every two hours. The script gathers some tweets through the Twitter API and runs a sentiment analysis that produces two graphs and saves them in a directory. The problem is that when the script runs again, it replaces the existing files with the same name in the directory.
As an example, when I used the pdf("file") function, it ran fine the first time because no file with that name existed in the directory. But I want the R script to run every two hours, so I need a solution that creates a new file in the directory instead of replacing the existing one, just like what happens when a file is downloaded multiple times from Google Chrome.
I'd just time-stamp the file name (now() here comes from the lubridate package):
> filename = paste("output-",now(),sep="")
> filename
[1] "output-2014-08-21 16:02:45"
Use any of the standard date formatting functions to customise to taste - maybe you don't want spaces and colons in your file names:
> filename = paste("output-",format(Sys.time(), "%a-%b-%d-%H-%M-%S-%Y"),sep="")
> filename
[1] "output-Thu-Aug-21-16-03-30-2014"
If you want the behaviour of adding a number to the file name, then something like this:
serialNext = function(prefix){
    if(!file.exists(prefix)){return(prefix)}
    i=1
    repeat {
        f = paste(prefix,i,sep=".")
        if(!file.exists(f)){return(f)}
        i=i+1
    }
}
Usage. First, "foo" doesn't exist, so it returns "foo":
> serialNext("foo")
[1] "foo"
Write a file called "foo":
> cat("fnord",file="foo")
Now it returns "foo.1":
> serialNext("foo")
[1] "foo.1"
Create that, then it returns "foo.2" and so on...
> cat("fnord",file="foo.1")
> serialNext("foo")
[1] "foo.2"
This kind of thing can break if more than one process might be writing a new file though - if both processes check at the same time there's a window of opportunity where both processes don't see "foo.2" and think they can both create it. The same thing will happen with timestamps if you have two processes trying to write new files at the same time.
Both these issues can be resolved by generating a random UUID and pasting that on the filename, otherwise you need something that's atomic at the operating system level.
But for a job that runs every two hours I reckon a timestamp down to minutes is probably enough.
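To illustrate what "atomic at the operating system level" can look like, here is a minimal sketch in Python rather than R. It mirrors the numbering scheme of serialNext but creates the file in exclusive mode, so the existence check and the creation are a single step. The function name create_next_file is hypothetical, and the guarantee depends on the file system honouring exclusive creation.

```python
import itertools
from pathlib import Path


def create_next_file(prefix):
    """Create and return the first free path among prefix, prefix.1, prefix.2, ..."""
    candidates = itertools.chain(
        [prefix], (f"{prefix}.{i}" for i in itertools.count(1))
    )
    for name in candidates:
        try:
            # Mode "x" creates the file and fails if it already exists,
            # so there is no gap between the check and the creation.
            with open(name, "x"):
                pass
            return Path(name)
        except FileExistsError:
            continue
```

A second process running the same code at the same moment gets FileExistsError for the name the first process won and moves on to the next number.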
See ?files for file manipulation functions. You can check whether a file exists with file.exists, and then either rename the existing file or create a different name for the new one.

Resources