diff multiple originals versus backups - unix

I have a folder with 5 files. I decide I want to do some search and replace on them using sed. Problem is, I need to keep track of the changes. So I make a backup folder "backup_folder" which has copies of all the original files.
There are now 10 files total. 5 originals and 5 backups.
I would like to run a sed command over the originals and then compare them to the backup to keep track of changes.
would this be as simple as
diff * ./backup_folder/*
The above code doesn't work but it illustrates the concept. Is there a better way to do this?

Perhaps put your backup folder in a different location (i.e. not a subdirectory of your current folder) - maybe in the parent of your current folder. Then a simple:
diff -r ../backup_folder .
or
diff -r /path/to/backup_folder .
should work.

for f in *; do [ -f "$f" ] && diff "$f" ./backup_folder/"$f"; done
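Putting the backup, the sed pass, and the comparison together, a rough untested sketch (the expression 's/old/new/' is only a placeholder, and GNU sed's -i is assumed):
mkdir -p backup_folder
for f in *; do
    [ -f "$f" ] || continue              # skip backup_folder itself
    cp "$f" backup_folder/               # keep the pristine copy
    sed -i 's/old/new/' "$f"             # edit in place
    diff backup_folder/"$f" "$f"         # show what changed in this file
done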

Related

Script Issues with find -> tar/gzip

I am currently working on a script, to store/backup our old files, so that we have more space on our server. This script will be used as a cronjob to backup the stuff every week. My script currently looks like this:
#!/bin/bash
currentDate=$(date '+%Y%m%d%T' | sed -e 's/://g')
find /Directory1/ -type f -mtime +90 | xargs tar cvf - | gzip > /Directory2/Backup$currentDate.tar.gz
find /Directory1/ -type f -mtime +90 -exec rm {} \;
The script is at first saving the current date + timestamp (without ":") as a variable. Afterwards it searches for files older than 90 days, tars them and finally makes a gzip out of them, which has the name "Backup$currentDate.tar.gz".
Then it's supposed to find the files again and remove them.
I do however have some issues here:
Directory1 consists of multiple directories. It does find the files and creates the gz file, but while some files are zipped properly (for instance /DirName1/DirName2/DirName3/File), others appear directly in the "root" dir. What could be the issue here?
Is there a way to tell the script to only create the gz file if files are found? Because currently we get gz files even if there was nothing found, leading to empty directories.
Can I somehow use the find output later on (store it in a variable?), so that the remove at the end really only targets the files found in the step before? Because if the third step takes, let's say, an hour and the last step gets executed after it's finished, it could potentially remove files that weren't older than 90 days before but are now, so they are never backed up, but then deleted (highly unlikely, but not impossible).
If there's anything else you need to know, feel free to ask ^^
Best regards
I've "rephrased" your original code a bit. I don't have an AIX machine to test anything, so DO NOT cut and paste this. Using this code, you should be able to address your issues. To wit:
It makes a record of what files it intends to operate on ($BFILES).
This record can be used to check for empty tar files.
This record can be used to see why your find is producing "funny" output. It wouldn't surprise me to find that xargs hit a space character.
This record can be used to delete exactly the files archived.
As a child, I had a serious accident with xargs and have avoided it ever since. Maybe there is a safe version out there.
#!/bin/bash
# I don't have an AIX machine to test this, so exit immediately until
# someone can proof this code.
exit 1
currentDate=$(date '+%Y%m%d%T' | sed -e 's/://g')
BFILES=/tmp/Backup$currentDate.files
find /Directory1 -type f -mtime +90 -print > $BFILES
# Here is the time to proofread the file list, $BFILES
# The AIX page I read lists the '-L' option to take filenames from an
# input file.
#tar -c -v -L $BFILES -f - | gzip -9 > /Directory2/Backup$currentDate.tar.gz
# I've found xargs to be sketchy unless you are very careful about
# quoting. I would rather loop over the input file one well-quoted
# line at a time than use the faster, less safe xargs. But here it is.
#xargs rm < $BFILES
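If GNU tar is available (its -T option is a swap for the AIX -L mentioned above, so treat this as an untested sketch), the same idea with the empty-list check and the exact-delete step filled in; file names containing newlines would still defeat it:
#!/bin/bash
# Untested sketch: GNU tar's -T reads the file names to archive from a file.
set -o pipefail
currentDate=$(date '+%Y%m%d%H%M%S')       # same timestamp, without the sed
BFILES=/tmp/Backup$currentDate.files

find /Directory1 -type f -mtime +90 -print > "$BFILES"

# Only build an archive if the file list is non-empty.
if [ -s "$BFILES" ]; then
    if tar -c -v -T "$BFILES" -f - | gzip -9 > "/Directory2/Backup$currentDate.tar.gz"; then
        # Delete exactly the files that were archived, one well-quoted line at a time.
        while IFS= read -r f; do
            rm -f -- "$f"
        done < "$BFILES"
    fi
fi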

Copy folders to new folder with different ending

I have a huge number of folders all with different names but same ending.
Like this:
blabla_ending1
Now I want to copy all those folders and give them another ending (ending2). I tried this, but it did not work the way I wanted:
cp -r *_ending1 *_ending2
Somehow I need to specify that the second * depends on the first one. Maybe I am also unaware of the precise meaning of *. I know it's very basic, but I could not find any help yet.
I can't think of a simple command to achieve that. However, the following will achieve the desired result:
for path in *_ending1; do
    newpath=$(echo "$path" | sed 's/_ending1$/_ending2/')
    cp -r "$path" "$newpath"
done
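In bash you can also drop the sed subshell and use parameter expansion; a small untested sketch of the same loop:
for path in *_ending1; do
    # ${path%_ending1} strips the old suffix so the new one can be appended
    cp -r "$path" "${path%_ending1}_ending2"
done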

Makefile rule depend on directory content changes

Using Make, is there a nice way to depend on a directory's contents?
Essentially I have some generated code which the application code depends on. The generated code only needs to change if the contents of a directory changes, not necessarily if the files within change their content. So if a file is removed or added or renamed I need the rule to run.
My first thought is to generate a text file listing of the directory and diff that against the last listing. A change means rerunning the build. I think I will have to pass off the generate-and-diff part to a bash script.
I am hoping someone in their infinite intelligence might have an easier solution.
Kudos to gjulianm who got me on the right track. His solution works perfectly for a single directory.
To get it working recursively I did the following.
ASSET_DIRS = $(shell find ../../assets/ -type d)
ASSET_FILES = $(shell find ../../assets/ -type f -name '*')
codegen: ../../assets/ $(ASSET_DIRS) $(ASSET_FILES)
generate-my-code
It appears that any change to the directories or files (add, delete, rename, modify) will now cause this rule to run. File names containing spaces are likely still an issue here.
Let's say your directory is called dir, then this makefile will do what you want:
FILES = $(wildcard dir/*)
codegen: dir # Add $(FILES) here if you want the rule to run on file changes too.
generate-my-code
As the comment says, you can also add the FILES variable if you want the code to depend on file contents too.
A disadvantage of having the rule depend on a directory is that any change to that directory will cause the rule to be out-of-date — including creating generated files in that directory. So unless you segregate source and target files into different directories, the rule will trigger on every make.
Here is an alternative approach that allows you to specify a subset of files for which additions, deletions, and changes are relevant. Suppose for example that only *.foo files are relevant.
# replace indentation with tabs if copy-pasting
.PHONY: codegen
codegen:
find . -name '*.foo' |sort >.filelist.new
diff .filelist.current .filelist.new || cp -f .filelist.new .filelist.current
rm -f .filelist.new
$(MAKE) generate
generate: .filelist.current $(shell cat .filelist.current)
generate-my-code
.PHONY: clean
clean:
rm -f .filelist.*
The second line in the codegen rule ensures that .filelist.current is only modified when the list of relevant files changes, avoiding false-positive triggering of the generate rule.

Locating most recently updated file recursively in UNIX

For a website I'm working on I want to be able to automatically update the "This page was last modified:" section in the footer as I'm doing my nightly git commit. Essentially I plan on writing a shell script to run at midnight each night which will do all of my general server maintenance. Most of these tasks I already know how to automate, but I have a file (footer.php) which is included in every page and displays the date the site was last updated. I want to be able to recursively look through my website and check the timestamp on every file, then if any of these were edited after the date in footer.php I want to update this date.
All I need is a UNIX command that will recursively iterate through my files and return ONLY the date of the last modification. I don't need file names or what changes were made, I just need to know a single day (and hopefully time) that the most recently updated file was changed.
I know using "ls -l" and "cut" I could iterate through every folder to do this, but I was hoping for a quicker-running and easier command. Preferably a single-line shell command (possibly with a -R parameter)
The find outputs all the modification times in Unix epoch format; sort orders them and tail keeps the biggest.
Converting into whatever date format is wanted is left as an exercise for the reader:
find /path -type f -iname "*.php" -printf "%T@\n" | sort -n | tail -1
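If you do want the conversion, here is a hedged one-liner assuming GNU date (its -d @N form reads seconds since the epoch):
# %T@ prints fractional epoch seconds; cut drops the fraction for date -d
date -d @"$(find /path -type f -iname '*.php' -printf '%T@\n' | sort -n | tail -1 | cut -d. -f1)"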
GNU find
find /path -type f -iname "*.php" -printf "%T+\n" | sort | tail -1
check the find man page to play with other -printf specifiers.
You might want to look at an inotify script that updates the footer every time any other file is modified, instead of looking all through the file system for new updates.
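A rough, untested sketch of that inotify idea, using inotifywait from the inotify-tools package (SITE, the footer wording, and the date format are all assumptions):
#!/bin/bash
SITE=/var/www/site    # hypothetical document root
inotifywait -m -r -e modify,create,delete,move "$SITE" |
while read -r _event; do
    # newest modification time in the tree, ignoring footer.php itself
    newest=$(find "$SITE" -type f ! -name footer.php -printf '%T@\n' | sort -n | tail -1 | cut -d. -f1)
    footer=$(stat -c %Y "$SITE/footer.php")
    # only rewrite the footer when some other file is newer, so our own
    # edit of footer.php does not retrigger the loop forever
    if [ "${newest:-0}" -gt "$footer" ]; then
        sed -i "s|This page was last modified:.*|This page was last modified: $(date '+%Y-%m-%d %H:%M')|" \
            "$SITE/footer.php"
    fi
done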

Can I symlink multiple directories into one?

I have a feeling that I already know the answer to this one, but I thought I'd check.
I have a number of different folders:
images_a/
images_b/
images_c/
Can I create some sort of symlink such that this new directory has the contents of all those directories? That is, this new "images_all" would contain all the files in images_a, images_b and images_c?
No. You would have to symbolically link all the individual files.
What you could do is to create a job to run periodically which basically removes all of the existing symbolic links in images_all, then re-creates the links for all files from the three other directories. It's a bit of a kludge, something like this (note the ../ in the link target, since each link is resolved relative to images_all):
rm -f images_all/*
for i in images_[abc]/* ; do ln -s "../$i" images_all/"$(basename "$i")" ; done
Note that, while this job is running, it may appear to other processes that the files have temporarily disappeared.
You will also need to watch out for the case where a single file name exists in two or more of the directories.
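A quick sketch to spot such clashes before (re)building the merged directory:
# print any file name that appears in more than one of the source directories
for i in images_[abc]/*; do basename "$i"; done | sort | uniq -d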
Having come back to this question after a while, it also occurs to me that you can minimise the time during which the files are not available.
If you create the links in a different directory first and then do two relatively fast mv operations, the window during which the files are unavailable shrinks to almost nothing. Something like:
mkdir images_new
for i in images_[abc]/* ; do
    ln -s "../$i" images_new/"$(basename "$i")"
done
# These next two commands are the minimal-time switchover.
mv images_all images_old
mv images_new images_all
rm -rf images_old
I haven't tested that so anyone implementing it will have to confirm the suitability or otherwise.
You could try a unioning file system like unionfs!
http://www.filesystems.org/project-unionfs.html
http://aufs.sourceforge.net/
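On a reasonably recent Linux kernel the in-tree alternative is overlayfs; a hedged example of a read-only merge (absolute paths assumed, and images_all must already exist):
# read-only union: no upperdir/workdir, so the merged view cannot be written to
sudo mount -t overlay overlay \
    -o lowerdir=/path/to/images_a:/path/to/images_b:/path/to/images_c \
    /path/to/images_all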
To add on to paxdiablo's great answer, I think you could use cp -s
(-s or --symbolic-link)
which makes symbolic links instead of literal copies,
to maybe speed up or simplify the bulk adding of symlinks to the "merge" folder A of the files from folders B and C.
(I have not tested this, though.)
I can't recall off the top of my head, but I'm sure there is some option for cp to NOT overwrite existing files, so only symlinks of new files will be "cp -s"ed.
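For reference, the no-overwrite option being reached for is -n (--no-clobber) in GNU cp. A hedged, untested sketch combining it with -s (directory names taken from the question; cp -s wants absolute source paths, hence $PWD):
# -s makes symlinks instead of copies; -n leaves any existing entries alone
mkdir -p images_all
cp -sn "$PWD"/images_a/* "$PWD"/images_b/* "$PWD"/images_c/* images_all/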
