rsync filtering - unix

I use an rsync command to sync two directories (remote to local).
The command, run from a Python script, is:
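import os
# Adjacent string literals are concatenated; the trailing spaces keep the options apart.
os.system('rsync --verbose --progress --stats --recursive '
          '--copy-links --times --include="*/" --include="*good_name*.good_ext*" '
          '--exclude-from "/myhome/mydir/src/rsync.exclude" '
          '%s %s' % (remotepath, localpath))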
I want to exclude certain directories that contain the same kind of files I otherwise want to include.
I want to include, recursively,
any_dir_name/any_file_name.good
but exclude any and all files that are in
bad_dir_name/
I used --exclude-from, and here is my exclude-from file:
*
/*.bad_dir_name/
Unfortunately it doesn't work. I suspect it may have something to do with --include="*/" but if I remove it the command doesn't sync any files at all.

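I got it. I used -vv to find out which rule caused the directory to show up in the sync list. Note that rsync filter patterns are shell-style wildcards, not regular expressions.
I changed my include statement from "*/" to
--include="*[^.bad_dir_name]/"
and it works for my directory names. Be aware that [^.bad_dir_name] is a character class: the pattern matches any directory whose name does not end in one of the listed characters, so it would also skip unrelated directories whose names end in, say, "a" or "e".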
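rsync acts on the first filter rule that matches a path, so one way to sketch the intended filtering is to list the exclude for bad_dir_name/ ahead of the catch-all directory include. The helper name filter_rules, the rules file location, and the paths in the usage comment are illustrative assumptions, not part of the original command.

```shell
# Sketch: print one filter rule per line, in the order rsync should see them.
# The first matching rule wins, so the exclude for bad_dir_name/ comes
# before the include that lets rsync descend into every other directory.
filter_rules() {
    printf '%s\n' \
        '- bad_dir_name/' \
        '+ */' \
        '+ *good_name*.good_ext*' \
        '- *'
}

# Usage (hypothetical paths): write the rules to a file and merge them.
# filter_rules > /tmp/rsync.rules
# rsync --recursive --times --copy-links --filter='merge /tmp/rsync.rules' \
#     remote:/src/ /local/dst/
```

Keeping all rules in one ordered list avoids having to reason about how --include options and a separate --exclude-from file interleave.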

Related

Folderstructure with rsync in bash

I searched the forum but didn't find a post that matches my problem. Maybe there is one, and you can point me to it.
My problem is that I want to sync a folder with the command rsync -a -v. I have 5 different machines. On every machine there is a scratch folder that I want to sync into the folder ~/work_dir/scratch_maschines, and inside scratch_maschines there should be a folder for maschine_a, maschine_b and so on.
On the machines the path is always the same: /scratch/my_name. When I use this command for the first two machines:
rsync -a -v --exclude='*.chk' --exclude='*.rwf' --exclude='*.fchk' --delete sp02:/scratch/my_name ~/work_dir/scratch_maschine01; rsync -a -v --exclude='*.chk' --exclude='*.rwf' --exclude='*.fchk' --delete maschine02:/scratch/my_name ~/work_dir/scratch_maschine02
I get the folders scratch_maschine01 and scratch_maschine02 in my working directory, but my data is not directly inside them: each contains a folder my_name, and that folder contains the data. So my question is: how can I use rsync to get the files from the scratch directories straight into the folder for each machine?
You might want to consider reformulating your commands similar to the following:
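START="$(pwd)"
SOURCE="sp02:/scratch/my_name"
REMOTE="${HOME}/work_dir/scratch_maschine01"
# SOURCE is on another host, so it is passed to rsync directly (cd cannot enter it).
# The excludes are written inline so the quoted patterns reach rsync intact.
{ rsync --recursive -v --delete \
    --exclude='*.chk' --exclude='*.rwf' --exclude='*.fchk' \
    "${SOURCE}/" "${REMOTE}/"
} >"${START}/job.log" 2>"${START}/job.err"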
The key elements there are:
--recursive, which makes rsync descend into all content and subdirectories of the SOURCE directory.
the / behind ${SOURCE}, which tells rsync to copy the contents of the SOURCE directory, but not the directory itself.
the / behind ${REMOTE}, which makes explicit that the destination is a directory to deposit content into. rsync creates the last path component if it is missing but not any missing parent directories, so create the destination beforehand if you want to be sure files do not land somewhere unexpected.
The above approach lends itself to a function that can be called in a loop with pre-attempt condition checks, with the variable assignments for each destination grouped under their own heading (for example, in a case statement).
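A minimal sketch of that function form follows, assuming the layout from the question (one folder per machine under ~/work_dir/scratch_maschines). The function name sync_scratch, the DRY_RUN switch, and the host names in the usage comment are assumptions introduced for illustration.

```shell
# Sketch: pull /scratch/my_name from one machine into its own local folder.
# Set DRY_RUN=1 to print the rsync command instead of running it.
sync_scratch() {
    host=$1
    src="${host}:/scratch/my_name/"      # trailing slash: contents only
    dest="${HOME}/work_dir/scratch_maschines/${host}/"
    mkdir -p "${dest}" || return 1       # pre-attempt check: destination exists
    ${DRY_RUN:+echo} rsync --archive --verbose --delete \
        --exclude='*.chk' --exclude='*.rwf' --exclude='*.fchk' \
        "${src}" "${dest}"
}

# Usage (host names are placeholders):
# for host in maschine_a maschine_b; do sync_scratch "${host}" || break; done
```

Because --delete removes files at the destination, trying the loop with DRY_RUN=1 first shows exactly which source and destination each call would use.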
Using such an approach with meaningful labels for variables lends itself to a type of implicit documentation, making the code more meaningful to someone not familiar with the code, as well as a refresher for yourself after a long period of not working or using the code.
I try to avoid "~" because it is not expanded inside double quotes, and I prefer to always enclose variable definitions in double quotes to avoid issues with paths that include spaces or other unexpected characters. That way, you can be sure your paths are interpreted correctly by the commands in your scripts.
Lastly, I prefer to use the long form for the rsync options (and almost every other command) so that I don't have to refer to the manual every time to translate the single-character options when trying to understand what is coded, if the need arises for troubleshooting unexpected errors (I have always had poor memory).
My own backup command is as follows. The only reason why the
${PathMirror}${dirC}/
is not encapsulated in single quotes within the double quotes for COM is because I know those variables all evaluate to non-complex strings which cannot be misinterpreted.

rsync skip subdirectory path but not contents

I want to use rsync to copy some files from a folder structure and have the structure modified slightly in the new location. Below is what I currently have and what I'm trying to achieve.
Folders:
Parent/A/1/a,b,c
Parent/A/2/j,k,l
Parent/A/3/x,y,z
Parent/B/1/a1,b1,c1
Parent/B/2/j1,k1,l1
Parent/B/3/x1,y1,z1
In the new location what I want is
Parent/A/x,y,z
Parent/B/x1,y1,z1
what I have is
PathToParent/A/3/x,y,z
PathToParent/B/3/x1,y1,z1
after using the following command:
rsync -avzP --exclude='*/1' --exclude='*/2' ../Parent/ remote:../ParentPath/
I can easily work around this issue, but I was hoping that rsync had an option that would let me do this in a single command.
Thanks in advance!
No, it can't do that transformation.
You can put multiple rsync invocations in a script, however ...
rsync -a Parent/A/3/ remote:../ParentPath/A/
rsync -a Parent/B/3/ remote:../ParentPath/B/
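If there are more group directories than A and B, the per-directory invocations can be generated in a loop. This is only a sketch: the function name flatten_sync and the DRY_RUN switch are assumptions, and the subdirectory name 3 is hard-coded as in the question.

```shell
# Sketch: copy the contents of <group>/3/ into <group>/ at the destination,
# one rsync call per group directory (A, B, ...).
# Set DRY_RUN=1 to print the commands instead of running them.
flatten_sync() {
    parent=$1      # local parent directory, e.g. Parent
    dest=$2        # destination prefix, e.g. remote:../ParentPath
    for dir in "${parent}"/*/; do
        [ -d "${dir}3" ] || continue     # skip groups without a 3/ folder
        name=$(basename "${dir}")
        ${DRY_RUN:+echo} rsync -a "${dir}3/" "${dest}/${name}/" || return 1
    done
}

# Usage: flatten_sync Parent remote:../ParentPath
```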

Rsync previous half-copied files?

I found that rsync behaves differently in the following two situations:
(1) All the files were copied with rsync; running rsync again is fast (it skips all the files).
(2) The files were copied with cp; running rsync afterwards is slow (or may it start from scratch?).
So my question is: does rsync store anything internally about the files so that it can avoid checking them again?
rsync keeps no internal record. By default it decides whether a file needs transferring with a quick check of size and modification time. rsync -a (archive mode, which I presume you ran) preserves file attributes, including the modification time; plain cp does not (cp -p does). So after cp the destination files probably carry a later modification time than the sources, the quick check fails, and rsync either recopies them or has to check the contents.
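The size-and-modification-time comparison can be imitated in a few lines of shell to see why a plain cp copy looks "changed". This is a rough approximation for illustration, not rsync's actual code; the function name is an assumption, and the -nt test operator is widely supported but not part of POSIX.

```shell
# Sketch of a quick check: two files count as unchanged when both
# size and modification time match.
same_size_and_mtime() {
    size_a=$(( $(wc -c < "$1") ))
    size_b=$(( $(wc -c < "$2") ))
    [ "${size_a}" -eq "${size_b}" ] || return 1
    # Neither file may be newer than the other.
    if [ "$1" -nt "$2" ] || [ "$2" -nt "$1" ]; then
        return 1
    fi
    return 0
}

# Usage:
# cp -p original copy && same_size_and_mtime original copy && echo "would be skipped"
```

A copy made with cp -p keeps the original modification time and passes this check, while a plain cp copy gets the current time and fails it. rsync --checksum compares file contents instead, at the cost of reading every file.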

Have rsync only report files which were updated

When rsync prints out the details of what it did for each file (using one of the verbose flags) it seems to include both files that were updated and files that were not updated. For example a snippet of my output using the -v flag looks like this:
rforms.php is uptodate
robots.txt is uptodate
sorry.html
thankyou.html is uptodate
I'm only interested in the files that were updated. In the above case that's sorry.html. rsync also prints directory names as it enters them, even if no file in that directory was updated. Is there a way to filter out up-to-date files, and directories with no updated files, from this output?
You can pipe it through grep:
rsync -vv (your other rsync options here) | grep -v 'uptodate'
Rsync's output can be extensively customized; take a look at rsync --info=help. -v is a fairly coarse way to get information from a modern rsync.
In your case, I'm not sure exactly what you consider "updated" to mean. For example, does it include files deleted on the receiver? Only files and directories, or pipes and symlinks too? Modification/access times, or only content?
As a simple test I suggest you look at: rsync --info=name1 <other opts>.
Here's my take... (work-proven and very happy with it.)
rsync -arzihv --stats --progress \
/media/frank/foo/ \
/mnt/backup_drive/ | grep -E '^[^.]|^$'
The important bit is the -i for itemize.
The grep lets all output lines pass (including any summary from -h --stats and the empty lines before it, which helps legibility) except those starting with a dot. Those are the lines that describe unchanged files:
A . means that the item is not being updated (though it
might have attributes that are being modified).

How do I synchronize in both directions?

I want to use rsync to synchronize two directories in both directions.
I refer to synchronization in classical sense
(not how it is meant in rsync manuals):
I want to update the directories in both directions,
depending on which of them is newer.
Can this be done with rsync (preferably in a Linux way)?
If not, what other solutions exist?
Just run it twice, with "newer" mode (-u or --update flag) plus -t (to copy file modified time), -r (for recursive folders), and -v (for verbose output to see what it is doing):
rsync -rtuv /path/to/dir_a/* /path/to/dir_b
rsync -rtuv /path/to/dir_b/* /path/to/dir_a
This won't handle deletes, but I'm not sure there is a good solution to that problem with only periodic sync'ing.
Do you know Unison File Synchronizer?
Unison is a file-synchronization tool
for Unix and Windows. It allows two
replicas of a collection of files and
directories to be stored on different
hosts (or different disks on the same
host), modified separately, and then
brought up to date by propagating the
changes in each replica to the other. ...
Note also that it is resilient to failure:
Unison is resilient to failure. It is
careful to leave the replicas and its
own private structures in a sensible
state at all times, even in case of
abnormal termination or communication failures.
You need to run rsync twice, and I recommend running it with -au:
rsync -au /local/source/* /remote/destination
rsync -au /remote/destination/* /local/source
-a (a for archive) is a shortcut for -rlptgoD:
-r Recurse into sub directories
-l Also sync symbolic links
-p Also sync file permissions
-t Also sync file modification times
-g Also sync file groups
-o Also sync file owner
-D Also sync device and special files
Basically whenever you want to create an identical one-to-one copy using rsync, you should always use -a as that's what most users expect to happen when they talk about "syncing". Other answers here seem to overlook that sometimes the content of a file stays unchanged but its owner may have changed or its access permissions may have changed and in that case rsync would not sync the file which could be fatal.
But you also require -u as that tells rsync to completely leave any file/folder alone, in case it exists already at the destination and has a newer last modification date. Without -u rsync would sync regardless if a file/folder is newer or not.
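The two passes can be wrapped in one small function. This is a sketch under the same limitation (deletions are not propagated); the function name sync_both and the DRY_RUN switch are assumptions added for illustration.

```shell
# Sketch: update two directories in both directions; for each file the
# newer copy wins. Set DRY_RUN=1 to print the commands only.
sync_both() {
    a=${1%/}    # strip one trailing slash so exactly one is added below
    b=${2%/}
    ${DRY_RUN:+echo} rsync --archive --update "${a}/" "${b}/" &&
    ${DRY_RUN:+echo} rsync --archive --update "${b}/" "${a}/"
}

# Usage: sync_both /local/source /remote/destination
```

The sketch uses a trailing slash on each source rather than /*, because the shell's * does not match hidden files by default.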
Please note that this solution cannot handle deleted files. Handling deletes is not easy. Consider the following situation: a file is missing at the source. How should rsync know whether that file once existed there and has been deleted (in which case it must be deleted at the destination as well) or whether it never existed at the source (in which case it must be copied from the destination)? These two situations look identical to rsync, so it cannot know how to react correctly. Syncing the other way round does not help, as it leads to the same situation: a file exists at the source but not at the destination. Why? Has it never existed at the destination, or has it been deleted there? Both cases look identical to rsync.
Sync tools that can reliably sync deleted files usually manage a sync log about all past sync operations. If that log reveals that there once was a file and has been synced but now it is missing, it's clear that it has been deleted. If there never was such a file according to the log, it must be synced. By storing all log entries with timestamps, it's even possible that a deleted file comes back and gets deleted multiple times yet the sync tool will always know what to do and the result is always correct. rsync has no such log, it only relies on the current file state of two sides of the operation.
You can, however, build yourself a sync command using rsync and a bit of POSIX shell scripting that gets very close to a sync tool as described above. As I needed such a tool myself, here is an answer on Stack Overflow that guides you through the creation of such a script.
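The core of the log idea can be sketched with a saved file list: anything present in the list from the previous sync but absent now must have been deleted since. The function names and the snapshot file are assumptions for illustration, and the sketch does not handle file names containing newlines.

```shell
# Sketch: record the current file list of directory $1 in snapshot file $2.
save_snapshot() {
    ( cd "$1" && find . -type f | LC_ALL=C sort ) > "$2"
}

# Sketch: list paths recorded at the last sync that no longer exist.
# $1 = directory to inspect, $2 = snapshot saved after the previous sync.
deleted_since_last_sync() {
    dir=$1
    previous_list=$2
    current_list=$(mktemp) || return 1
    ( cd "${dir}" && find . -type f | LC_ALL=C sort ) > "${current_list}"
    # comm -23 prints lines that appear only in the first (previous) list.
    LC_ALL=C comm -23 "${previous_list}" "${current_list}"
    rm -f "${current_list}"
}

# Usage (keep the snapshot outside the synced directory):
# deleted_since_last_sync /path/to/dir_a /path/to/dir_a.snapshot
# ... sync, then: save_snapshot /path/to/dir_a /path/to/dir_a.snapshot
```

Paths reported for one side could then be removed from the other side before the two rsync passes run, which is the step plain rsync cannot decide on its own.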
Thanks jsight
rsync -urtv --progress dir_a/ dir_b/ && rsync -urtv --progress dir_b/ dir_a/
This runs the second sync immediately after the first one finishes, so for a huge directory structure you don't need to sit in front of the PC waiting to start it. The trailing slashes make rsync merge the contents of the two directories instead of nesting one inside the other, and -t preserves modification times so that -u compares meaningful timestamps. If the structure is huge, drop the verbose and progress options:
rsync -urt dir_a/ dir_b/ && rsync -urt dir_b/ dir_a/
Use rsync <OPTIONS> [hostname:]source-dir [hostname:]dest-dir
for example:
rsync -pogtEtvr --progress --bwlimit=2000 xxx-files different-stuff
This will sync xxx-files to different-stuff/xxx-files. If different-stuff/xxx-files does not exist, rsync creates it, i.e. copies it.
-pogtEtv - just bunch of options to preserve file metadata, plus v - verbose and r - recursive
--progress - show progress of syncing in real time - super useful if you copy big files
--bwlimit=2000 - sets maximum speed of copying/syncing (bw = bandwidth)
P.S. rsync matters most when you work over a network; on a local machine you can use commands like cp.
Good Luck!
What you need is Rclone. Rclone ("rsync for cloud storage") is a command-line program to sync files and directories to and from different cloud storage providers (Box, Dropbox, FTP, etc.) and local filesystems. Rclone's sync command is a one-way mirror.
Another, more graphical solution that includes real-time syncing is FreeFileSync, which ships with the program RealTimeSync. FreeFileSync supports two-way (bidirectional) syncing, including handling deletes.
I had the same question and ended up using git. It might not fit your situation, but if anyone finds this topic with the same question, a version control system is worth considering.
I'm using rsync with inotifywait.
Whenever a file changes, rsync is executed.
inotifywait -m --exclude "$_LOG_FILE" -r -e create,delete,delete_self,modify,moved_to --format "%w%f" "$folder"
You need to run inotifywait on both hosts. Please check the inotifywait examples.
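The inotifywait command above only prints the changed paths; something still has to start rsync for each event. One possible way to connect the two is sketched below. The function name sync_on_events, the DRY_RUN switch, and the host and paths in the usage comment are assumptions.

```shell
# Sketch: run one rsync per change event read from standard input.
# Set DRY_RUN=1 to print the commands instead of running them.
sync_on_events() {
    src=${1%/}
    dest=${2%/}
    while IFS= read -r changed_path; do
        # The event path is only a trigger; rsync works out what to send.
        # stdin is redirected so rsync (or ssh) cannot swallow pending events.
        ${DRY_RUN:+echo} rsync --archive --update "${src}/" "${dest}/" < /dev/null
    done
}

# Usage (folder and host are placeholders):
# inotifywait -m -r -e create,delete,modify,moved_to --format '%w%f' "$folder" |
#     sync_on_events "$folder" otherhost:/path/to/folder
```

Running a full rsync per event is simple but can be wasteful during bursts of changes; batching events before syncing would be a reasonable refinement.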
