How to recursively diff without traversing filesystems?

I'd like to create a patchset for two directory trees, both of which contain (bind-)mounts that should be ignored. Is there any diff -r option similar to rsync's -x, --one-file-system? Or is another tool more appropriate for this? I considered using rsync --compare-dest, but the problem is that a "diff" directory obtained this way contains no information about file deletions.
Background: I want to store the modifications made to a Gentoo stage3 archive I chrooted into.

As a workaround, I currently waste a lot of time by running rsync twice:
ORIG=/path/to/original
MOD=/path/to/modded
# find the modified/added files:
mkdir modded && rsync -axP --prune-empty-dirs --compare-dest="$ORIG" "$MOD"/ modded
# the other way around, includes both deleted and modded files
mkdir deleted && rsync -axP --prune-empty-dirs --compare-dest="$MOD" "$ORIG"/ deleted
# find the modded files and remove them
for i in $(find deleted -type f); do [ -e "modded${i#deleted}" ] && rm "$i"; done
# delete the empty directories
find modded deleted -type d -empty -delete
# create a list of the deleted files
cd deleted && find . -type f > ../deleted.list && cd ..
# tar the modifications
cd modded && tar czf ../modded.tgz . && cd ..
rm -rf deleted modded
Now modded.tgz contains the files that were modified/added, while deleted.list contains the names of deleted files, so to apply them run:
tar xf modded.tgz
while read -r line; do rm -- "$line"; done < deleted.list
This can probably also be used to create a patchfile instead...
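A possible single-pass alternative (just a sketch, not tested against bind mounts inside a chroot): an rsync dry run with --itemize-changes reports additions, modifications and deletions in one go, which could then be post-processed into the modded/deleted lists:
# dry run: itemize what it would take to turn $ORIG into $MOD, staying on one
# filesystem; lines starting with "*deleting" are files removed in $MOD, the
# remaining itemized lines are files added or changed in $MOD
rsync -axn --delete --itemize-changes "$MOD"/ "$ORIG"/ > changes.list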

Related

Mac OS: How to use RSYNC to copy files modified within the last 24 hours and keep folder structure?

It's a simple question that I can't seem to figure out. I'm on a Mac with Big Sur with all the latest updates, and I'm going through Terminal to get these commands to run. If there's a better way please let me know.
This is, in basic terms, what I'm trying to do--I want RSYNC to recursively go through a source directory (which in this case would ideally be an entire drive), find any files modified within the last 24 hours, and copy those to another drive, while preserving the folder structure. So if I have:
/Volumes/Drive1/Folder1/File1.file
/Volumes/Drive1/Folder1/File2.file
/Volumes/Drive1/Folder1/File3.file
Say File1 has been modified in the last 24 hours but the other two haven't; I want it to copy that file, so that on the second drive I wind up with:
/Volumes/Drive2/Folder1/File1.file
But without copying File2 and File3.
I've tried a lot of different solutions and strings, but I'm running into problems. The closest I've been able to get is this:
find /Volumes/Drive1/ -type f -mtime -1 -exec cp -a "{}" /Volumes/Drive2/ \;
The problem is that while this one does go through Drive1 and find all the files newer than a day like I want, when it copies them it just dumps them all into the root of Drive2.
This one also seems to come close:
rsync --progress --files-from=<(find /Volumes/Drive1/ -mtime -1 -type f -exec basename {} \;) /Volumes/Drive1/ /Volumes/Drive2/
This one also identifies all the files modified in the last 24 hours, but instead of copying them it gives an error, "link_stat (filename and path) failed: no such file or directory (2)."
I've spent several days trying to figure out what I'm doing wrong but I can't figure it out. Help please!
I think this'll work:
srcDir=/Volumes/Drive1
destDir=/Volumes/Drive2
(cd "$srcDir" && find . -type f -mtime -1 -print0) |
while IFS= read -r -d $'\0' filepath; do
mkdir -p "$(dirname "$destDir/$filepath")"
cp -a "$srcDir/$filepath" "$destDir/$filepath"
done
Explanation:
Using cd "$srcDir"; find . -whatever will generate relative paths (starting with "./") from the source directory to the found files; that means appending the results to $srcDir and $destDir will give the full source and destination paths for each file.
Putting it in parentheses makes it run in a subshell, so the cd won't affect other commands. Coupling cd and find with && means that if cd fails, it won't run find (which would run in the wrong place, generate a list of the wrong files, and generally cause trouble).
Using -print0 and while IFS= read -r -d $'\0' is a standard weird-filename-safe way of iterating over found files (see BashFAQ #20). Note that if anything in the loop reads from standard input (e.g. cp -i asking for confirmation), it'll steal part of the file list; if this is a worry, use this variant (instead of the pipe) to send the file list over file descriptor #3 instead of standard input:
while IFS= read -r -d $'\0' filepath <&3; do
    ...
done 3< <(cd "$srcDir" && find . -type f -mtime -1 -print0)
Finally, mkdir -p is used to make sure the destination directory exists, and then cp to copy the file.
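If you would rather stick with rsync, a minimal sketch along the lines of the original attempt should also work, feeding relative paths (instead of basenames) to --files-from; this assumes the stock macOS rsync, which supports --files-from, and is untested on Big Sur:
cd /Volumes/Drive1 &&
    find . -type f -mtime -1 |
    rsync -a --progress --files-from=- . /Volumes/Drive2/
With --files-from, each listed path is taken relative to the source argument (. here), and rsync creates the needed parent directories on the destination, which preserves the folder structure.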

Writing a loop to go into subdirectory folders and then to copy and relabel folder names in Unix

Dear Stackoverflowers,
I am having problems writing a loop that goes into a directory with multiple folders and copies and
relabels the name of each folder. Each of the folders is labelled in the same way to start, followed by different numbers,
so the structure is:
groupfolder1
    123456789_ab_1234
    123456789_ab_1235
    123456789_ab_1236
    123456789_ab_1237
groupfolder2
    123456789_cd_1310
    123456789_cd_1321
    123456789_cd_1322
    123456789_cd_1323
I want to go into each group folder and, for each subfolder (e.g., 123456789_ab_1234), make a new folder with the same contents but labelled with the trailing number (e.g., sub-1234).
I have been trying to learn Unix but am struggling to move from abstract exercises to real-life problems, so I would really appreciate responses and any explanation of how you came to a solution.
Regards E
If I understand correctly, you want something like:
for d in *
do
    if test -d "$d"
    then
        mkdir "/tmp/sub-$d"
        (cd "$d" ; tar cf - .) | (cd "/tmp/sub-$d" ; tar xf -)
        mv "/tmp/sub-$d" "$d"/.
    fi
done
Above, I assume you are already in groupfolder1, for instance.
I don't know how all your directories are laid out; maybe you can add an enclosing loop that starts at the parent level containing groupfolder1 / groupfolder2 etc. and goes into them automatically, like this:
for dd in *
do
    if test -d "$dd"
    then
        cd "$dd"
        for d in *
        do
            if test -d "$d"
            then
                mkdir "/tmp/sub-$d"
                (cd "$d" ; tar cf - .) | (cd "/tmp/sub-$d" ; tar xf -)
                mv "/tmp/sub-$d" "$d"/.
            fi
        done
        cd ..
    fi
done
Whoops, I posted this in the wrong section. Thank you for the explanation.
I tried this:
cd /media/Eunice/'Drive B'/groupfolder1/
for d in *
do
    if test -d $d
    then
        mkdir /tmp/sub-$d
        (cd $d ; tar cf - .) | (cd /tmp/sub-$d ; tar xf -)
        mv /tmp/sub-$d $d/.
    fi
done
And received this error.
mkdir: cannot create directory ‘/tmp/sub-123456789_ab_1234’: No such file or directory
-bash: cd: /tmp/sub-123456789_ab_1234: No such file or directory
My concern is that $d here refers to the folder 123456789_ab_1234, and for the purposes of making the new folder I need to take just the numbers after 123456789_ab_ to build the label, such as sub-1234. Is there a way to easily reduce $d to 1234?
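The trailing number can be stripped off with shell parameter expansion. A minimal sketch along the lines of the answer above (assuming the part you want is always whatever follows the last underscore):
for d in *
do
    if test -d "$d"
    then
        suffix=${d##*_}                            # 123456789_ab_1234 -> 1234
        mkdir "/tmp/sub-$suffix"
        (cd "$d" ; tar cf - .) | (cd "/tmp/sub-$suffix" ; tar xf -)
        mv "/tmp/sub-$suffix" "$d"/.
    fi
done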

Unix Copy Recursive Including All Directories

I have the following two directories:
~/A
    drawable/
        imageb.png
    new/
        newimage.png
~/B
    drawable/
        imagec.png
When I use the cp -r ~/A/* ~/B command, newimage.png with its new/ folder is copied across to ~/B; however, imageb.png is not copied into ~/B/drawable.
Could you explain why this is the case and how I can get around this?
Use tar instead of cp:
(cd A ; tar cf - *) | (cd B ; tar xf -)
or more compactly (if you're using GNU tar):
tar cC A -f - . | tar xC B -f -
If you are on Linux you can use the -r option.
eg: cp -r ~/A/. ~/B/
If you are on BSD you could use the -R option.
eg: cp -R ~/A/. ~/B/
For more information on exactly what option you should pass, refer man cp
Also note that if you do not have permission to read a file, it will not be copied.

Find and tar files on Solaris

I've got a little problem with my bash script. I'm a newbie in the Unix world, so I find it difficult to deal with this exercise. What I have to do is find files on a Solaris server with a specific name, modified within a specific time, and archive them in one .tar file. The first two points are easy, but I'm having a nightmare trying to archive them. The thing is, I keep archiving the whole directory tree leading to the file (with the file at the end) into the .tar file, but I need just the file. My code looks like this:
find ~ -name "$maska" -mtime -$dni | xargs -t -L 1 tar -cvf $3 -C
where $maska is the name of the file, $dni refers to the modification time, and $3 is just the archive name. I found out about the -C switch, which lets me jump into the folder where the desired file is, but when I use it with xargs, it seems to just jump there and do nothing else.
So my question is:
1) Is there any possibility of achieving my goal this way?
Please remember, I don't have GNU tar, and I HAVE TO use the commands tar and find.
Edit: I'd like to specify my problem in more detail. When I use the script for, for example, file a, it should look for it starting from the point shown in the script (it's ~), and everything it finds should go into one tar file.
What I got right now is (I'm in /home/me/Scripts):
-bash-3.2$ ./Script.sh a 1000 backup
a /home/me/Program/Test/a/ 0K
a /home/me/Program/Test/a/a.c 1K
a /home/me/Program/Test/a/a.out 8K
So script has done some packing. Next I want to see my packed file, so:
-bash-3.2$ tar -tf backup
/home/me/Program/Test/a/
/home/me/Program/Test/a/a.c
/home/me/Program/Test/a/a.out
And that's the problem. The tar file has all the paths in it, so if I untar it, instead of getting just the files I wanted to archive, they get restored to their old places. For visualisation:
-bash-3.2$ ls
Script.sh* Script.sh~* backup
-bash-3.2$ tar -xvf backup
x /home/me/Program/Test/a, 0 bytes, 0 tape blocks
x /home/me/Program/Test/a/a.c, 39 bytes, 1 tape blocks
x /home/me/Program/Test/a/a.out, 7928 bytes, 16 tape blocks
-bash-3.2$ ls
Script.sh* Script.sh~* backup
That's the problem.
So all I want is to pack all the desired files (a in the example above) into one tar file without those paths, so it will simply untar in the directory where I run Script.sh.
I'm not sure I understand what you want, but this might be it:
find ~ -name "$maska" -mtime -$dni -exec tar cvf $3 {} +
Edit: second attempt, after you wrote that the main issue is the absolute paths:
( cd ~; find . -name "$maska" -type f -mtime -$dni -exec tar cvf $3 {} + )
Edit: third attempt, after you wrote that you want no path at all in the archive, that $maska is a directory name, and that $3 needs to be in the current directory:
mkdir ~/foo && \
find ~ -name "$maska" -type d -mtime -$dni -exec sh -c 'ln -s $1/* ~/foo/' sh {} \; && \
( cd ~/foo ; tar chf - * ) > $3 && \
rm -rf ~/foo
Replace ~/foo by ~/somethingElse if ~/foo already exists for some reason.
Maybe you can do something like this:
#!/bin/bash
find ~ -name "$maska" -mtime -$dni -print0 | while IFS= read -r -d $'\0' file; do
    d=$(dirname "$file")
    f=$(basename "$file")
    echo "$d: $f" # show directory and file for debug purposes
    tar -rvf tarball.tar -C"$d" "$f"
done
I don't have a Solaris box at hand for testing :-)
First of all, my assumptions:
1. "one tar file", like you said, and
2. no absolute paths, i.e., if you back up ~/dir/file, you should be able to test-extract it in /tmp and obtain /tmp/dir/file.
If the problem is the full paths, you should replace
find ~ # etc
with
cd ~ || exit
find . # etc
If the tar archive isn't an absolute name, instead, it should be something like
(
cd ~ || exit
find . etc etc | xargs tar cf - etc etc
) > $3
Explanation
"(...)" runs a subshell, meaning some of the tings you change in there have no effects outside of the parens; the current directory is one of them, so "(cd whatever; foo)" means you run another shell, change its current directory, run foo from there, and then you're back in your script which never changed directory.
"cd ~ || exit" is paranoia, it means "cd ~; if that fails, exit".
"." is an alias meaning "the current directory, whatever that is"; play with "find ." vs "find ~" if you don't know what it means, you'll understand it better than if I explained it here.
"tar cf -" means that you create the tar archive on standard output; I think the syntax is portable enough, you may have to replace "-" with "/dev/stdout" or whatever works on solaris (the simplest solution is simply "tar", without the "c" command, but it's ugly to read).
The final "> $3", outside of the parens, is output redirection: rather than writing the output to the terminal, you save it into a file.
So the whole script reads like this:
- open a subshell
- change the subshell's current directory to ~
- in the subshell, find the files newer than requested, archive them, and write the contents of the resulting tar archive to standard output
- the subshell's stdout is saved to $3; because the redirection is outside the parens, relative paths are resolved relatively to your script's $PWD, meaning that eg if you run the script from the /tmp directory you'll get a tar archive in the /tmp directory (it would be in ~ if the redirection happened in the subshell).
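Putting those pieces together with the options from your original command, the whole thing might look like this (a sketch, untested on Solaris; be aware that if the file list is long enough for xargs to run tar more than once, the concatenated output may not extract fully):
(
    cd ~ || exit
    find . -name "$maska" -type f -mtime -"$dni" | xargs tar cf -
) > "$3"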
If I misunderstood your question, the solution doesn't work, or the explanation isn't clear, let me know (the answer is too long, but I already know that :).
The pax command will output tar-compatible archives and has the flexibility you need to rewrite pathnames.
find ~ -name "$maska" -mtime -$dni | pax -w -x ustar -f "$3" -s '!.*/!!'
Here is what the options mean, paraphrasing from the man page:
-w write the contents of the file operands to the standard output (or to the pathname specified by the -f option) in an archive format.
-x ustar the output archive format is the extended tar interchange format specified in the IEEE POSIX standard.
-s '!.*/!!' Modifies file operands according to the substitution expression, using regular expression syntax. Here, it deletes all characters in each file name from the beginning to the final /.
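As a quick sanity check afterwards (using the archive name from the question; this describes the expected effect and is not verified on Solaris):
tar -tf backup   # entries should now appear as bare filenames, with no leading directories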

UNIX untar content into multiple folders

I have a tar.gz file about 13GB in size. It contains about 1.2 million documents. When I untar this, all these files sit in one single directory, and any read from this directory takes ages. Is there any way I can split the files from the tar into multiple new folders?
e.g.: I would like to create new folders named [1,2,...] each having 1000 files.
This is a quick and dirty solution but it does the job in Bash without using any temporary files.
i=0   # file counter
dir=0 # folder name counter
mkdir $dir
tar -tzvf YOURFILE.tar.gz |
cut -d ' ' -f12 | # get the filenames contained in the archive
while read filename
do
    i=$((i+1))
    if [ $i == 1000 ] # new folder for every 1000 files
    then
        i=0 # reset the file counter
        dir=$((dir+1))
        mkdir $dir
    fi
    tar -C $dir -xvzf YOURFILE.tar.gz $filename
done
Same as a one liner:
i=0; dir=0; mkdir $dir; tar -tzvf YOURFILE.tar.gz | cut -d ' ' -f12 | while read filename; do i=$((i+1)); if [ $i == 1000 ]; then i=0; dir=$((dir+1)); mkdir $dir; fi; tar -C $dir -xvzf YOURFILE.tar.gz $filename; done
Depending on your shell settings the "cut -d ' ' -f12" part for retrieving the last column (filename) of tar's content output could cause a problem and you would have to modify that.
It worked with 1000 files but if you have 1.2 million documents in the archive, consider testing this with something smaller first.
Obtain the filename list with --list
Make files containing the filenames with grep
Untar only these files using --files-from
Thus:
tar --list -f archive.tar > allfiles.txt
grep '^1' allfiles.txt > files1.txt
tar -xvf archive.tar --files-from=files1.txt
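If you want groups of exactly 1000 files rather than grouping by the first character of the name, the same --list/--files-from idea can be combined with split. A rough sketch (assumes GNU tar and filenames without embedded newlines; each pass still rescans the archive, so it is slow for a 13GB tar.gz, but far fewer passes than one per file):
tar --list -f archive.tar > allfiles.txt
split -l 1000 allfiles.txt chunk.          # produces chunk.aa, chunk.ab, ...
n=0
for list in chunk.*
do
    n=$((n+1))
    mkdir "$n"
    tar -xf archive.tar -C "$n" --files-from="$list"
done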
If you have GNU tar you might be able to make use of the --checkpoint and --checkpoint-action options. I have not tested this, but I'm thinking something like:
# UNTESTED
cd /base/dir
mkdir $(printf "dir%04d\n" {1..1500}) # probably more than you need
ln -s dest0 linkname
tar -C linkname ... --checkpoint=1000 \
    --checkpoint-action='sleep=1' \
    --checkpoint-action='exec=ln -snf dest%u linkname' ...
You can look at the man page and see if there are options like that. Worst comes to worst, just extract the files you need (maybe using --exclude) and put them into your folders.
tar doesn't provide that capability directly. It only restores its files into the same structure from which it was originally generated.
Can you modify the source directory to create the desired structure there and then tar the tree? If not, you could untar the files as they are in the archive and then post-process that directory using a script to move the files into the desired arrangement. Given the number of files, this will take some time, but at least it can be done in the background.
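A rough sketch of that post-processing step (hypothetical extraction path; assumes everything sits as plain files directly in that directory):
cd /path/to/extracted || exit
i=0
dir=0
mkdir "$dir"
for f in *
do
    [ -f "$f" ] || continue        # skip the numbered directories themselves
    mv -- "$f" "$dir"/
    i=$((i+1))
    if [ "$i" -eq 1000 ]           # start a new folder every 1000 files
    then
        i=0
        dir=$((dir+1))
        mkdir "$dir"
    fi
done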
