I believe this is pretty simple, but so far I've had no luck.
I have a directory containing many large files, which I split and move to a temp subdirectory. I then process the files there using script.R.
# create temp dir
mkdir -p "$1/temp"
# split and move files to temp
for file in "$1"/*; do
    split --verbose -b 10M --numeric-suffixes "$file" "$file"
    mv -t "$1/temp" "$1/"*[0-99]
done
# process files in /temp
script.R "$1/temp"
The splitting results in nearly 8,000 files, and for some reason the whole process crashes past a couple of thousand of them. This is a problem I have no idea how to construct a question for. :)
When I test this on a smaller number of files it runs smoothly, which is why I would like to perform the whole thing in chunks.
So how do I split, let's say, 10 files at a time, process them, and then move on to the next 10 files?
I believe this can be achieved using xargs, nested for loops and other approaches... but welp, I'm a GNU noob.
Thanks in advance.
Could you please try this script:
cd "$1" || exit
# create temp dir
rm -rf temp && mkdir temp || exit
# split files to temp
for file in *; do
if [ -f "$file" ]; then
split --verbose -b 10M --numeric-suffixes "$file" temp/"$file"
fi
done
# process files in temp
script.R temp
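If the crash really is load-related, here is a rough sketch of the batching you asked about, assuming script.R can be pointed at any directory (the batch directory name is my invention): split ten source files at a time into a scratch directory, process it, then empty it before the next round.
#!/bin/bash
cd "$1" || exit
mkdir -p batch
n=0
for file in *; do
    [ -f "$file" ] || continue
    split -b 10M --numeric-suffixes "$file" batch/"$file"
    n=$((n+1))
    if [ "$n" -eq 10 ]; then
        script.R batch    # process this batch of split pieces
        rm -f batch/*     # clear the scratch directory for the next ten files
        n=0
    fi
done
[ "$n" -gt 0 ] && script.R batch    # process any leftover partial batch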
I have this script in a .bat file that processes any .mp4 file and creates a new one with the same name but with -no-Logo appended, for each video file.
The problem is that I have 150 folders, each with video files that I want to run this script on. Is there a way to run it on each folder one by one, doing the same task?
script
for %%a in ("*.mp4") do ffmpeg -i "%%a" -filter:v "crop=1280:700:0:0" -c:a copy "%%~na-old.mp4"
for %%a in ("*.mp4") do ffmpeg -i "%%~na-old.mp4" -vf scale=1280:720,setsar=1:1 "%%~na-no-Logo.mp4"
for %%a in ("*.mp4") do del "%%~na-old.mp4"
I don't want it to run 150 times at once; that would kill my PC. I want it to go one folder at a time.
If I understood your question correctly, I would use two for loops. However, it is not clear whether you are using Linux bash or a Windows .bat file, because you selected the bash tag but mention using .bat.
The script below processes one MP4 at a time anyway: one folder, then the next, until all are done.
for dir in */; do
    printf '\n\nFolder processed: %s\n\n' "$dir"
    count=0
    for mp4file in "$dir"*.mp4; do
        ffmpeg -y -i "$mp4file" \
            -loglevel error \
            -v quiet -stats \
            ... # place the rest of your ffmpeg command here
        count=$((count+1))
    done
    printf '\n MP4 processed count = %s\n' "$count"
done
You can continue from there.
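As a concrete starting point, here is a hedged translation of your three .bat commands into that bash loop (filters and suffixes copied from your question; the skip guards and the -y overwrite flag are my additions):
for dir in */; do
    for f in "$dir"*.mp4; do
        [ -e "$f" ] || continue                                 # folder had no .mp4 files
        case "$f" in *-old.mp4|*-no-Logo.mp4) continue ;; esac  # skip outputs of earlier runs
        base="${f%.mp4}"
        ffmpeg -y -i "$f" -filter:v "crop=1280:700:0:0" -c:a copy "${base}-old.mp4"
        ffmpeg -y -i "${base}-old.mp4" -vf "scale=1280:720,setsar=1:1" "${base}-no-Logo.mp4"
        rm -f "${base}-old.mp4"
    done
done
Since the glob is expanded once before each inner loop starts, the files created during a pass are not re-processed within it; the guards keep outputs of previous runs from being picked up again.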
It's a simple question that I can't seem to figure out. I'm on a Mac running Big Sur with all the latest updates, and I'm using Terminal to run these commands. If there's a better way, please let me know.
This is, in basic terms, what I'm trying to do: I want rsync to recursively go through a source directory (which in this case would ideally be an entire drive), find any files modified within the last 24 hours, and copy those to another drive while preserving the folder structure. So if I have:
/Volumes/Drive1/Folder1/File1.file
/Volumes/Drive1/Folder1/File2.file
/Volumes/Drive1/Folder1/File3.file
And File1 has been modified in the last 24 hours but the other two haven't, I want it to copy just that file, so that on the second drive I wind up with:
/Volumes/Drive2/Folder1/File1.file
But without copying File2 and File3.
I've tried a lot of different solutions and strings, but I'm running into problems. The closest I've been able to get is this:
find /Volumes/Drive1/ -type f -mtime -1 -exec cp -a "{}" /Volumes/Drive2/ \;
The problem is that while this one does go through Drive1 and find all the files newer than a day like I want, when it copies them it just dumps them all into the root of Drive2.
This one also seems to come close:
rsync --progress --files-from=<(find /Volumes/Drive1/ -mtime -1 -type f -exec basename {} \;) /Volumes/Drive1/ /Volumes/Drive2/
This one also identifies all the files modified in the last 24 hours, but instead of copying them it gives an error, "link_stat (filename and path) failed: no such file or directory (2)."
I've spent several days trying to figure out what I'm doing wrong but I can't figure it out. Help please!
I think this'll work:
srcDir=/Volumes/Drive1
destDir=/Volumes/Drive2
(cd "$srcDir" && find . -type f -mtime -1 -print0) |
while IFS= read -r -d $'\0' filepath; do
    mkdir -p "$(dirname "$destDir/$filepath")"
    cp -a "$srcDir/$filepath" "$destDir/$filepath"
done
Explanation:
Using cd "$srcDir"; find . -whatever will generate relative paths (starting with "./") from the source directory to the found files; that means appending the results to $srcDir and $destDir will give the full source and destination paths for each file.
Putting it in parentheses makes it run in a subshell, so the cd won't affect other commands. Coupling cd and find with && means that if cd fails, find won't run (it would otherwise run in the wrong place, generate a list of the wrong files, and generally cause trouble).
Using -print0 and while IFS= read -r -d $'\0' is a standard weird-filename-safe way of iterating over found files (see BashFAQ #20). Note that if anything in the loop reads from standard input (e.g. cp -i asking for confirmation), it'll steal part of the file list; if this is a worry, use this variant (instead of the pipe) to send the file list over file descriptor #3 instead of standard input:
while IFS= read -r -d $'\0' filepath <&3; do
    ...
done 3< <(cd "$srcDir" && find . -type f -mtime -1 -print0)
Finally, mkdir -p is used to make sure the destination directory exists, and then cp to copy the file.
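If you'd rather stick with rsync, the likely cause of your link_stat error is that --files-from expects paths relative to the source argument, while basename was stripping the directory part. A small untested sketch of that fix:
cd /Volumes/Drive1 &&
rsync -a --progress --files-from=<(find . -type f -mtime -1) . /Volumes/Drive2/
With relative paths like ./Folder1/File1.file in the list, rsync recreates the needed parent directories on the destination by itself.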
I am in the root directory, and I am creating a script that will take me from root > Home > Logs and, inside Logs, delete 3 log files.
The script will check if they exist; if yes, it will delete them.
I am facing some syntax problems, if you could help.
Thanks
My code:
#!/bin/sh
cd Home/Log
if [ -e error1.log ]
then
rm error1
fi
if [ -e error2.log ]
then
rm error1
fi
if [ -e error3.log ]
then
rm error1
fi
When I execute the file in root using ./delete, here is what I am getting as errors:
$ ./delete
: No such file or directoryme/Log
./delete: line 14: syntax error near unexpected token `fi'
"I am in root directory"
When writing a script, it's almost always better not to assume things like that. If you know where the files are, and it's not important that they're somewhere relative to what happens to be your current working directory, just name them.
Here are three ways you could accomplish what you want safely.
#!/bin/sh
dir=/Home/Log
rm -f ${dir}/error1.log ${dir}/error2.log ${dir}/error3.log
or
#!/bin/sh
dir=/Home/Log
rm -f ${dir}/error{1,2,3}.log
or
#!/bin/sh
set -e
cd /Home/Log && rm -f error1.log error2.log error3.log
For anything nontrivial, set -e is your friend. In your example nothing happens later in the script, but what you don't want is to keep going thinking you've changed directories when you haven't, and wind up scribbling somewhere you didn't intend. Many have lost much that way.
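To make that failure mode concrete, this is the unguarded version of the same script (hypothetical paths):
#!/bin/sh
cd /Home/Log        # if this fails (e.g. the directory is missing)...
rm -f error1.log    # ...this still runs, in whatever directory you started in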
I'm new to Unix. I have searched a lot of info but still don't know how to do this in bash.
What I know is to use this command, ls -tr|xargs -i ksh -c "mv {} ../tmp/", to move files one by one.
Now I need to make a script that sorts all of these files by system date and moves the 1000 oldest into a directory.
Example files are like these:
KPK.AWQ07102011.66.6708.01
KPK.AWQ07102011.68.6708.01
KPK.EER07102011.561.8312.13
KPK.WWS07102011.806.3287.13
----------- This is the script that I have created -----------
if [ ! -d /app/RAID/Source_Files/test/testfolder ] then
echo "test directory does not exist!"
mkdir /app/RAID/Source_Files/calvin/testfolder
echo "unused_file directory created!"
fi
echo "Moving xx oldest files to test directory"
ls -tr /app/RAID/Source_Files/test/*.Z|head -1000|xargs -i ksh -c "mv {} /app/RAID/Source_Files/test/testfolder/"
The problems with this script are:
1) Unix prompts a syntax error at 'if'.
2) The move command works, but it creates a new file named testfolder instead of moving into the directory testfolder (testfolder has already been created in this path).
Can anyone give me a hand? Thanks.
Could this help?
mv `ls -tr|head -1000` ../tmp/
head -n takes the first n lines of the previous command's output (here, the 1000 oldest files). The backticks allow the result of the ls and head commands to be used as arguments to mv.
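As for the script in the question itself, a hedged sketch of the two fixes: POSIX sh needs a ';' or newline between ']' and 'then', and the mkdir path in your script (.../calvin/testfolder) does not match the path being tested (.../test/testfolder), so the target directory never existed and mv created a plain file named testfolder instead:
#!/bin/sh
dest=/app/RAID/Source_Files/test/testfolder
if [ ! -d "$dest" ]; then
    echo "test directory does not exist!"
    mkdir -p "$dest"
    echo "unused_file directory created!"
fi
echo "Moving 1000 oldest files to test directory"
ls -tr /app/RAID/Source_Files/test/*.Z | head -1000 | xargs -i ksh -c "mv {} $dest/"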
I have a tar.gz file about 13 GB in size. It contains about 1.2 million documents. When I untar it, all these files sit in one single directory, and any read from this directory takes ages. Is there any way I can split the files from the tar into multiple new folders?
E.g.: I would like to create new folders named [1,2,...], each holding 1000 files.
This is a quick and dirty solution but it does the job in Bash without using any temporary files.
i=0   # file counter
dir=0 # folder name counter
mkdir $dir
tar -tzvf YOURFILE.tar.gz |
    cut -d ' ' -f12 | # get the filenames contained in the archive
    while read filename
    do
        i=$((i+1))
        if [ $i -eq 1000 ] # new folder for every 1000 files
        then
            i=0 # reset the file counter
            dir=$((dir+1))
            mkdir $dir
        fi
        tar -C $dir -xvzf YOURFILE.tar.gz "$filename"
    done
Same as a one-liner:
i=0; dir=0; mkdir $dir; tar -tzvf YOURFILE.tar.gz | cut -d ' ' -f12 | while read filename; do i=$((i+1)); if [ $i -eq 1000 ]; then i=0; dir=$((dir+1)); mkdir $dir; fi; tar -C $dir -xvzf YOURFILE.tar.gz "$filename"; done
Depending on your shell settings, the cut -d ' ' -f12 part for retrieving the last column (the filename) of tar's content output could cause a problem, and you would have to modify it.
It worked with 1000 files but if you have 1.2 million documents in the archive, consider testing this with something smaller first.
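One possible way around the cut issue, as a small sketch assuming no member names contain newlines: drop the -v flag, since tar -t without -v prints only the member names, so no column needs to be cut at all:
tar -tzf YOURFILE.tar.gz | while IFS= read -r filename
do
    printf '%s\n' "$filename" # plug the counting/extracting loop body in here
done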
1) Obtain the filename list with --list.
2) Make files containing filenames with grep.
3) Untar only those files using --files-from.
Thus:
tar --list -f archive.tar > allfiles.txt
grep '^1' allfiles.txt > files1.txt
tar -xvf archive.tar --files-from=files1.txt
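If you want fixed groups of 1000 rather than grouping by the leading character of the name, a sketch assuming GNU split and a flat archive: chop the list into 1000-line chunks, then extract each chunk into its own numbered folder. The archive is read once per chunk of 1000 instead of once per file.
tar --list -f archive.tar > allfiles.txt
split -l 1000 -d -a 4 allfiles.txt chunk.   # chunk.0000, chunk.0001, ...
n=1
for list in chunk.*; do
    mkdir -p "$n"
    tar -xf archive.tar -C "$n" --files-from="$list"
    n=$((n+1))
done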
If you have GNU tar you might be able to make use of the --checkpoint and --checkpoint-action options. I have not tested this, but I'm thinking something like:
# UNTESTED
cd /base/dir
mkdir $(printf "dest%u\n" {0..1500}) # probably more than you need
ln -s dest0 linkname
tar -C linkname ... --checkpoint=1000 \
    --checkpoint-action='sleep=1' \
    --checkpoint-action='exec=ln -snf dest%u linkname' ...
You can look at the man page and see if there are options like that. Worst comes to worst, just extract the files you need (maybe using --exclude) and put them into your folders.
tar doesn't provide that capability directly. It only restores its files into the same structure from which the archive was originally generated.
Can you modify the source directory to create the desired structure there, and then tar the tree? If not, you could untar the files as they are in the archive and then post-process that directory with a script that moves the files into the desired arrangement. Given the number of files this will take some time, but at least it can be done in the background.
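A hedged sketch of that post-processing step, assuming the archive extracted flat into a single directory (called extracted here as a placeholder) and using the folder names 1, 2, ... from the question:
cd extracted || exit
i=0
dir=1
mkdir -p "$dir"
for f in *; do
    [ -f "$f" ] || continue   # skip the numbered folders themselves
    mv "$f" "$dir"/
    i=$((i+1))
    if [ "$i" -eq 1000 ]; then # start a new folder every 1000 files
        i=0
        dir=$((dir+1))
        mkdir -p "$dir"
    fi
done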