How to rsync specific files, from multiple folders - rsync

I have two source folders,
/home/epi/folder1/
/home/epi/folder2/
and a destination folder,
/home/epi/destination/
I want to move *.txt files from the two source folders into destination; sometimes there are 5 files, sometimes 100k.
I want to move at most 40k files combined, no matter how many are actually there. They could be in either folder1 or folder2, so as a first step I generate a list.txt, using find and head, that contains up to 40k filenames.
I've tried a few things, but I'm struggling to move only the specific files in the list, because they could be in either folder. If I use --files-from, it doesn't seem to allow multiple source directories: it takes one source, then expects the destination. If I leave it out, I can specify two source folders and move all the files, but then I'm moving everything.
I've also tried `cat list.txt`, but 40k filenames is too long an argument list to pass to rsync. It works fine for smaller numbers, but I don't want the overhead of calling rsync multiple times on the same list cut into chunks.
a)
rsync -O -av --stats --files-from=list.txt --remove-source-files --log-file=test.log /home/epi/destination
b)
rsync -O -av `cat list.txt` --stats --remove-source-files --log-file=log.txt --progress --temp-dir=/temp /home/epi/destination
With a) I get an
`rsync error: syntax or usage error (code 1) at options.c(1652) [client=3.0.6]`
With b), due to the size of the argument list, I get
`rsync: /usr/bin/rsync: cannot execute [Argument list too long]`
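One approach worth sketching here (an assumption on my part, not something from the original post): --files-from interprets the listed paths relative to the single source directory you give it, so you can use the common parent /home/epi as the source and put folder1/... or folder2/... prefixes into list.txt. Note that --files-from implies --relative, so the files land under destination/folder1/ and destination/folder2/ rather than being flattened into destination/ directly.

# build a list of at most 40k *.txt paths, relative to /home/epi
# (assumes filenames contain no newlines)
find /home/epi/folder1 /home/epi/folder2 -maxdepth 1 -name '*.txt' \
  | head -n 40000 \
  | sed 's|^/home/epi/||' > list.txt

# the filenames are read from list.txt, so no oversized argument list is built
rsync -O -av --stats --remove-source-files --log-file=test.log \
  --files-from=list.txt /home/epi/ /home/epi/destination/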

Related

Rsync all files (recursively) from one dir to another, maintaining only a portion of the original dir structure

I have two directories:
Directory #1, 'C'
C's absolute path:
/A/B/C
Directory #2, 'T'
T's absolute path:
/Q/R/T
I want to use rsync to copy all files, recursively, from C into T, while maintaining the original directory structure, but only from B onwards.
Example to make it clearer: suppose 'B' has only 3 files nested within it:
/A/B/f1.txt
/A/B/C/f2.txt
/A/B/C/D/f3.txt
Then I want to end up with only f2.txt and f3.txt being copied over, with the final filepaths as follows (notice how I keep the directory structure, only from B onwards):
/Q/R/T/B/C/f2.txt
/Q/R/T/B/C/D/f3.txt
Here is the catch: I must execute the rsync cmd from within /Q/R/. So when I execute this command, my pwd must be /Q/R/.
Can anyone help me figure out how to do this?
[If I did not have this constraint on where my cwd must be, I could cd to /A/B and then execute: rsync . /Q/R/T/ --recursive --relative. Unfortunately, I cannot do that, for reasons that would take a lot of pointless explaining here. And when I try to execute rsync /A/. /Q/R/T/ --recursive --relative, I end up with not only everything within A, but also that first part of the dir structure (/A/) that I don't want. (Note: in the real-life scenario the dir structure is much more complex than this; this is just the general problem.)]
The rsync command includes a couple of options which are suitable for this scenario. They are:
--include=PATTERN - Don't exclude files matching PATTERN
--exclude=PATTERN - Exclude files matching PATTERN
An excellent description and examples of the --exclude flag can be found in the FILTER RULES section of the rsync man page.
Solution
Given the directory structure provided in your question, and with your pwd set to /Q/R/, running the following command should meet your requirement:
rsync ../../A/ T/ --recursive --include A/B/** --exclude B/*.*
Edit:
If you do want /A/B/f1.txt copied to /Q/R/T/B/f1.txt (it's unclear from your question, since you don't show it in the "I want to end up with" example), then omit the --exclude B/*.* part, so the complete command is reduced to:
rsync ../../A/ T/ --recursive --include A/B/**
or reduced even further in complexity to just:
rsync ../../A/** T/ --recursive
Explanation of the command
../../A/
The first argument provides the path to the source directory, i.e. its position in the directory tree relative to your pwd of /Q/R.
T/
The second argument provides the path to the destination directory; again, this is relative to the pwd of /Q/R.
--recursive
The first option is to recurse into the directories.
--include A/B/**
This says that you want to include all the assets (files/folders), however many levels deep, from within the folder named B which resides inside folder A.
--exclude B/*.*
This says that you want to exclude any assets (files/folders) whose name includes a dot plus extension and which reside directly inside folder B (at the top level). This will prevent the file named f1.txt from being copied. You could be even more specific here and use --exclude B/f1.txt instead; however, I'm assuming that in real life you may have additional files you want to exclude here too.
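Since include/exclude interactions can be surprising, it may help to preview what the command above would transfer before running it for real, by adding rsync's --dry-run and --verbose options:

rsync ../../A/ T/ --recursive --include 'A/B/**' --exclude 'B/*.*' --dry-run --verbose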
Additional notes
Both the --include and --exclude options can be utilized multiple times. This can be very useful for some scenarios too as it enables you to be specific about what to include and/or exclude during the copy process.
For example, let's assume that your source directory /A/B/ (as described in your question) also contains a folder named X, so its path is A/B/X.
Let's say that we also do not want to copy this folder named X (in the same way as you currently do not want to copy /A/B/f1.txt).
For this scenario we add another --exclude option as follows:
rsync ../../A/ T/ --recursive --include A/B/** --exclude B/*.* --exclude X/
Note the additional --exclude X/ at the end.
You mention...
(Note: in the real-life scenario the dir structure is much more complex than this; this is just the general problem.)
...in your question, so you may find it necessary to add additional --exclude=PATTERN options to truly meet your requirements.
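As an aside (not part of the original answer): rsync's --relative option also honours a "/./" marker in the source path (rsync 2.6.7 or later), which marks where the preserved part of the path should begin. A minimal sketch for the layout in the question, run from /Q/R as required:

# everything after the /./ is recreated under T/, giving
# /Q/R/T/B/C/f2.txt and /Q/R/T/B/C/D/f3.txt; /A/B/f1.txt is not copied
rsync --recursive --relative /A/./B/C/ T/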
Grunt
As you have included the gruntjs tag with your question, you may want to consider utilizing plug-ins which can run shell commands like rsync, such as:
grunt-shell
grunt-exec

UNIX command to move multiple files to multiple subdirectories?

I work in an X11 window on a Mac OS X machine. I have hundreds of files in one directory, each file name containing a substring such as "1970", "1971", ..., "2014", indicating the year the file is for. I have just created subdirectories named "1970", "1971", ..., "2014".
What is the one-line UNIX command that would move all the files into the subdirectories corresponding to their years?
If the year-name sub-directories are in the single directory that currently contains all the files, then you should be able to use something like this, assuming that the current directory is that single directory:
shopt -s nullglob
for year in {1970..2014}
do
    mv *?${year}* $year
    mv *${year}?* $year
done
The globbing insists on at least one more character in the name to be moved than just the year, either before or after the year, to prevent an attempt to move 1970 into itself (which would fail). You need two mv commands to prevent a-1970-b from matching both glob expressions (which would cause the second to fail as the file would have already been removed). Using globbing like this preserves spaces etc in file names correctly. (Using command substitution, etc, does not.)
The shopt -s nullglob command means that if nothing matches a given glob, the glob expands to nothing. That will generate a usage error from mv (a nuisance), but is otherwise harmless. You could decide to filter such error messages if you really want to; you probably don't want to send all error messages to /dev/null, though.
Since you're on a Mac, you don't have GNU mv with the very useful -t target option.
You said you need a single line command; replace each newline except the one after do with a semicolon; replace the newline after do with a space.
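Spelled out, that one-line form would be:

shopt -s nullglob; for year in {1970..2014}; do mv *?${year}* $year; mv *${year}?* $year; done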
If you know that the year is never at the beginning or end of the file name, you can use a single mv *?${year}?* $year command.

Makefile rule depend on directory content changes

Using Make, is there a nice way to depend on a directory's contents?
Essentially, I have some generated code which the application code depends on. The generated code only needs to change if the contents of a directory change, not necessarily if the files within change their contents. So if a file is removed, added, or renamed, I need the rule to run.
My first thought is to generate a text listing of the directory and diff that against the last listing; a change means rerunning the build. I think I will have to pass off the listing and diff part to a bash script.
I am hoping someone, in their infinite intelligence, might have an easier solution.
Kudos to gjulianm, who got me on the right track. His solution works perfectly for a single directory.
To get it working recursively, I did the following.
ASSET_DIRS = $(shell find ../../assets/ -type d)
ASSET_FILES = $(shell find ../../assets/ -type f -name '*')

codegen: ../../assets/ $(ASSET_DIRS) $(ASSET_FILES)
	generate-my-code
It appears that any change to the directories or files (add, delete, rename, modify) will now cause this rule to run. There is likely some issue with file names containing spaces here.
Let's say your directory is called dir; then this makefile will do what you want:
FILES = $(wildcard dir/*)

codegen: dir # Add $(FILES) here if you want the rule to run on file changes too.
	generate-my-code
As the comment says, you can also add the FILES variable if you want the code to depend on file contents too.
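Spelled out, the variant that also depends on file contents (same placeholder names as above) would be:

FILES = $(wildcard dir/*)

codegen: dir $(FILES)
	generate-my-code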
A disadvantage of having the rule depend on a directory is that any change to that directory will cause the rule to be out-of-date — including creating generated files in that directory. So unless you segregate source and target files into different directories, the rule will trigger on every make.
Here is an alternative approach that allows you to specify a subset of files for which additions, deletions, and changes are relevant. Suppose for example that only *.foo files are relevant.
# replace indentation with tabs if copy-pasting
.PHONY: codegen
codegen:
	find . -name '*.foo' | sort > .filelist.new
	diff .filelist.current .filelist.new || cp -f .filelist.new .filelist.current
	rm -f .filelist.new
	$(MAKE) generate

generate: .filelist.current $(shell cat .filelist.current)
	generate-my-code

.PHONY: clean
clean:
	rm -f .filelist.*
The second line in the codegen rule ensures that .filelist.current is only modified when the list of relevant files changes, avoiding false-positive triggering of the generate rule.

Make rsync exclude all directories that contain a file with a specific name

I would like rsync to exclude all directories that contain a file with a specific name, say ".rsync-exclude", independent of the contents of the ".rsync-exclude" file.
If the file ".rsync-exclude" contained just "*", I could use rsync -r SRC DEST --filter='dir-merge,- .rsync-exclude'.
However, the directory should be excluded independent of the contents of the ".rsync-exclude" file (it should at least be possible to leave the ".rsync-exclude" file empty).
Any ideas?
rsync does not support this (at least the manpage does not mention anything), but you can do it in two steps:
run find to find the .rsync-exclude files
pipe this list to --exclude-from (or use a temporary file)
--exclude-from=FILE
This option is related to the --exclude option, but it specifies a FILE that contains exclude patterns
(one per line). Blank lines in the file and lines starting with ';' or '#' are ignored. If FILE is -,
the list will be read from standard input.
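Put together, a minimal sketch of that two-step approach might look like this (/path/to/SRC and /path/to/DEST are placeholders; it also assumes no path contains a newline):

# list the directories that contain a .rsync-exclude file, rewrite them as
# exclude patterns anchored to the transfer root, and feed them to rsync
# on standard input
cd /path/to/SRC &&
  find . -name '.rsync-exclude' -exec dirname {} \; | sed 's|^\.||' |
  rsync -r --exclude-from=- ./ /path/to/DEST/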
Alternatively, if you do not mind putting something in the files, you can use:
-F     The -F option is a shorthand for adding two --filter rules to your command. The first time it is used
       is a shorthand for this rule:
           --filter='dir-merge /.rsync-filter'
       This tells rsync to look for per-directory .rsync-filter files that have been sprinkled through the
       hierarchy and use their rules to filter the files in the transfer. If -F is repeated, it is a
       shorthand for this rule:
           --filter='exclude .rsync-filter'
       This filters out the .rsync-filter files themselves from the transfer.
       See the FILTER RULES section for detailed information on how these options work.
Old question, but I had the same one..
You can add the following filter:
--filter="dir-merge,n- .rsync-exclude"
Now you can place a .rsync-exclude file in any folder and write the names of the files and folders you want to exclude, line by line. For example:
#.rsync-exclude file
folderYouWantToExclude
allFilesThatStartWithXY*
someSpecialImage.png
So you can use patterns in there too.
What you can't do is:
#.rsync-exclude file
folder/someFileYouWantToExlude
Hope it helps! Cheers
For completeness: if you just want to exclude a single directory that you know by name, a plain --exclude is enough:
rsync -avz --exclude 'dir' /source /destination

With RSYNC, how do includes and excludes combine?

I want to rsync everything in /Volumes/B/, except for Cache directories, which I want to exclude globally. Also, I don't want to rsync anything else under /Volumes/.
I have the following exclusion file:
+ /Volumes/B/***
- Cache/
- /Volumes/*
The first and third lines seem to work correctly, except that rsync also picks up all Cache dirs under /Volumes/B/ (i.e. /Volumes/B/***/Cache/).
What am I missing?
rsync reads the exclusion file top-down while traversing the directories.
When it visits the Cache dirs, rsync acts on the first matching pattern.
The first matching pattern is "+ /Volumes/B/***", so Cache is included.
The rule is:
when you have rules for particular subdirectories, put them first.
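Applied to the filter file from the question, a reordered sketch would be:

- Cache/
+ /Volumes/B/***
- /Volumes/*

Now any Cache directory matches the exclude rule before the broader "+ /Volumes/B/***" include is consulted.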
A simple step-by-step explanation of how rsync applies filter rules can be found in the FILTER RULES section of the rsync man page.
