rsync transferred file name differs from source file name - rsync

This problem is not specific to rsync. If I touch a file named /media/KINGSTON/seventeen. then what is created is /media/KINGSTON/seventeen instead. Can someone explain why?
dmesg
. . .
ugen3.2: <Kingston DataTraveler 3.0> at usbus3
umass0 on uhub6
umass0: <Kingston DataTraveler 3.0, class 0/0, rev 2.10/1.10, addr 2> on usbus3
umass0: SCSI over Bulk-Only; quirks = 0x8100
umass0:5:0: Attached to scbus5
da0 at umass-sim0 bus 0 scbus5 target 0 lun 0
da0: <Kingston DataTraveler 3.0 PMAP> Removable Direct Access SPC-4 SCSI device
da0: Serial Number 485B39472CCAB171D76F0DF0
da0: 40.000MB/s transfers
da0: 118272MB (242221056 512 byte sectors)
da0: quirks=0x2<NO_6_BYTE>
gpart show /dev/da0*
=> 63 242220993 da0 MBR (116G)
63 31041 - free - (15M)
31104 242189952 1 !12 [active] (115G)
I am backing up a cyrus-imap mailstore using rsync. Cyrus imap message file names are numbers followed by a dot (####.). When these message files are transferred using rsync on a FreeBSD-11.2 host the trailing dot is removed on the target file name (####. becomes ####). Is there some way to prevent this behaviour?
rsync \
--copy-links \
--no-group \
--no-perms \
--progress \
--protect-args \
--modify-window=1 \
--recursive \
--times \
--update \
--verbose \
./Documents/Personal/IMAP \
/media/KINGSTON/Documents/Personal/IMAP
It appears from further testing that this behaviour is dependent upon the destination. When copied from and to the system hdd the trailing dot appears in the target file name. When the target is a USB key then the dot disappears from the target.

A trailing dot or space is not permitted in a valid MS Windows file name. When files are copied to a FAT-formatted USB drive, their names are silently altered to meet this requirement.
From Microsoft file naming conventions (https://learn.microsoft.com/en-us/windows/win32/fileio/naming-a-file):
Do not end a file or directory name with a space or a period. Although
the underlying file system may support such names, the Windows shell
and user interface does not. However, it is acceptable to specify a
period as the first character of a name. For example, ".temp".
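To see in advance which names a FAT target would alter, one option is to list every name in the source tree that ends in a dot or a space. This is only a sketch: the function name find_fat_unsafe_names and the demo files are made up for illustration.

```shell
#!/usr/bin/env bash
# List names under a directory that end in a dot or a space,
# i.e. names a FAT target would not store unchanged.
# (find_fat_unsafe_names is a hypothetical helper, not an rsync feature.)
find_fat_unsafe_names() {
    find "$1" \( -name '*.' -o -name '* ' \) -print
}

# Demo on a throwaway directory.
demo=$(mktemp -d)
touch "$demo/17." "$demo/18" "$demo/notes.txt"
unsafe=$(find_fat_unsafe_names "$demo")
echo "$unsafe"
rm -rf "$demo"
```

If the list is not empty, one way to keep the names intact is to store the tree as a single tar archive on the USB drive, so that the file names live inside the archive instead of in the FAT directory.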

Related

rsync : how to copy only latest file from target to source

We have a main Linux server, say M, where we have files like below (for 2 months, and new files arriving daily)
Folder1
PROCESS1_20211117.txt.gz
PROCESS1_20211118.txt.gz
..
..
PROCESS1_20220114.txt.gz
PROCESS1_20220115.txt.gz
We want to copy only the latest file on our processing server, say P.
So as of now, we were using the below command, on our processing server.
rsync --ignore-existing -azvh -rpgoDe ssh user@M:${TargetServerPath}/${PROCSS_NAME}_*txt.gz ${SourceServerPath}
This process worked fine until now, but from now, in the processing server, we can keep files only up to 3 days. However, in our main server, we can keep files for 2 months.
So when we remove older files from the processing server, the rsync command copies all files from main server to the processing server.
How can I change rsync command to copy only latest file from Main server?
*Note: the example above is only for one file. We have multiple files on which we have to use the same command. Hence we cannot hardcode any filename.
What I tried:
There are multiple solutions, but they all seem to apply when I want to copy the latest file from the server I am running rsync on, not from the remote server.
I also tried running the command below to get the latest file from the main server, but I cannot pass variables to SSH in my company, as it is not allowed. So the command works if I pass an individual path/file name, but it does not work with variables.
ssh M 'ls -1 ${TargetServerPath}/${PROCSS_NAME}_*txt.gz|tail -1'
Would really appreciate any suggestions on how to implement this solution.
OS: Linux 3.10.0-1160.31.1.el7.x86_64
ssh quoting is confusing: the command string is parsed once by the local shell and again by the remote shell, so you have to quote it for both.
The printf %q trick helps here: use it to quote the relevant parts.
file=$(
ssh M "ls -1 $(printf "%q" "${TargetServerPath}/${PROCSS_NAME}")_*.txt.gz" |
tail -1
)
rsync --ignore-existing -azvh -rpgoDe ssh user@M:"$file" "${SourceServerPath}"
Or, maybe nicer, run tail -n1 on the remote so that the minimum amount of data is transferred (we only need one filename, not all of them). Invoke an explicit shell and pass the variables as shell arguments:
file=$(ssh M "$(printf "%q " bash -c \
'ls -1 "$1"_*.txt.gz | tail -n1' \
'_' "${TargetServerPath}/${PROCSS_NAME}"
)")
Overall, I recommend defining a function and using declare -f:
sshqfunc() { echo "bash -c $(printf "%q" "$(declare -f "$1"); $1 \"\$@\"")"; };
work() {
ls -1 "$1"_*txt.gz | tail -1
}
tmp=$(ssh M "$(sshqfunc work)" _ "${TargetServerPath}/${PROCSS_NAME}")
or you can also use the mighty declare to transfer variables to remote - then run your command inside single quotes:
ssh M "
$(declare -p TargetServerPath PROCSS_NAME);
"'
ls -1 ${TargetServerPath}/${PROCSS_NAME}_*txt.gz | tail -1
'
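The quoting idea can be tried without a remote host. The sketch below uses bash -c as a local stand-in for the remote shell, on the assumption that the remote login shell is bash; the demo directory (with a space in its name) and the variable remote_cmd are made up for illustration, while TargetServerPath and PROCSS_NAME are the names from the question.

```shell
#!/usr/bin/env bash
# Local stand-in for the remote host: "bash -c" parses the string the same
# way a remote bash would parse the command string that ssh sends it.
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/dir with space"
touch "$demo_dir/dir with space/PROCESS1_20220114.txt.gz" \
      "$demo_dir/dir with space/PROCESS1_20220115.txt.gz"

TargetServerPath="$demo_dir/dir with space"
PROCSS_NAME="PROCESS1"

# Quote only the variable part with %q; the glob stays outside the quoted
# part so the (stand-in) remote shell still expands it.
remote_cmd="ls -1 $(printf '%q' "${TargetServerPath}/${PROCSS_NAME}")_*.txt.gz | tail -n 1"
latest=$(bash -c "$remote_cmd")
echo "$latest"

rm -rf "$demo_dir"
```

Because the date stamp in the file name sorts lexicographically, the last line of the ls output is the newest file. With a real host, the bash -c call would become ssh M "$remote_cmd".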

How should I deal with "sph2pipe command not found" error message?

I'm trying to use the sph2pipe tool to convert SPH files into wav or mp3 files. Although I have downloaded and installed the tool from here: https://www.ldc.upenn.edu/language-resources/tools/sphere-conversion-tools
I still don't see any program that I can use.
On Windows 10, after downloading sph2pipe and clicking the .exe file, a window popped up briefly and then disappeared. I can't find any program called sph2pipe on the system, and there is no command named sph2pipe either.
On Mac, I downloaded the program (I forget from where), and after clicking the executable file I got this output:
Last login: Tue May 8 18:57:21 on ttys001
Pennys-MBP:~ me$ /Users/me/Downloads/SPH/sph2pipe_v2.5/sph2pipe ; exit;
Usage: sph2pipe [-h hdr] [-t|-s b:e] [-c 1|2] [-p|-u|-a] [-f typ] infile [outfile]
default conditions (for 'sph2pipe infile'):
input file contains sphere header
output full duration of input file
output all channels from input file
output same sample coding as input file
output format is WAV on Wintel machines, SPH elsewhere
output is written to stdout
optional controls (items bracketed separately above can be combined):
-h hdr -- treat infile as headerless, get sphere info from file 'hdr'
-t b:e -- output portion between b and e sec (floating point)
-s b:e -- output portion between b and e samples (integer)
-c 1 -- only output first channel
-c 2 -- only output second channel
-p -- force conversion to 16-bit linear pcm
-u -- force conversion to 8-bit ulaw
-a -- force conversion to 8-bit alaw
-f typ -- select alternate output header format 'typ'
five types: sph, raw, au, rif(wav), aif(mac)
logout
Saving session...
...copying shared history...
...saving history...truncating history files...
...completed.
[Process completed]
But when I type sph2pipe in my terminal, I still get the response:
-bash: sph2pipe: command not found
Can somebody help me? I need to do the conversion very soon.
Thank you!
I figured it out:
sph2pipe.exe file file.wav
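For the Mac case, the usage text above shows that the binary does run when called by its full path, so "command not found" most likely means its directory is not on PATH. The sketch below illustrates the two usual fixes with a hypothetical stub script (sph2pipe_stub) standing in for the real binary, so it runs anywhere.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for the downloaded binary.
tool_dir=$(mktemp -d)
printf '#!/bin/sh\necho "stub called with: $*"\n' > "$tool_dir/sph2pipe_stub"
chmod +x "$tool_dir/sph2pipe_stub"

# Not on PATH yet: the shell cannot find it by its bare name.
before="found"
if ! command -v sph2pipe_stub >/dev/null 2>&1; then
    before="not found"
fi

# Option 1: call it by its full path.
by_path=$("$tool_dir/sph2pipe_stub" -f wav in.sph out.wav)

# Option 2: add its directory to PATH for this session.
PATH="$tool_dir:$PATH"
after=$(command -v sph2pipe_stub)

echo "$before / $by_path / $after"
```

With the real tool, option 1 would look like /Users/me/Downloads/SPH/sph2pipe_v2.5/sph2pipe -f wav infile.sph outfile.wav, where -f wav is assumed to select the "rif(wav)" header type listed in the usage text.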

Why is rsync daemon truncating this path?

I'm trying to synchronize a set of remote files via an rsync daemon, but the resulting path is missing the initial path element.
$ rsync -HRavP ftp.ncbi.nih.gov::refseq/H_sapiens/README 2015-05-11/
receiving incremental file list
created directory 2015-05-11
H_sapiens/
H_sapiens/README
4,850 100% 4.63MB/s 0:00:00 (xfr#1, to-chk=0/2)
sent 51 bytes received 5,639 bytes 3,793.33 bytes/sec
total size is 4,850 speedup is 0.85
$ tree 2015-05-11/
2015-05-11/
└── H_sapiens
└── README
Notice that the resulting tree is missing the first part of the remote path ("refseq").
I realize that I can append the first element of the remote path to the destination path, but it seems unlikely (to me) that this is the intended behavior of rsync.
It's worth noting for comparison that rsync -HRavP refseq/H_sapiens/README 2015-05-11/ (where the source is a local file) correctly creates the full relative path under the destination directory.
See rsync description:
CONNECTING TO AN RSYNC SERVER
...
Using rsync in this way is the same as using it with rsh or ssh except that:
You use a double colon :: instead of a single colon to separate the hostname from the path.
The first word of the "path" is actually a module name.
You can get all module names with
rsync -HRavP ftp.ncbi.nih.gov::
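Since the first word after :: is the module name and not part of the path, -R only reproduces what follows it. If the module name should also appear in the local tree, it can be appended to the destination, as the question already suggests. The sketch below splits the source string with plain parameter expansion; the variable names are illustrative, and the rsync command is printed instead of executed so no network access is needed.

```shell
#!/usr/bin/env bash
src='ftp.ncbi.nih.gov::refseq/H_sapiens/README'
dest='2015-05-11'

rest=${src#*::}            # everything after the double colon
module=${rest%%/*}         # first word: the module name
path_in_module=${rest#*/}  # the path that -R actually reproduces

echo "module: $module"
echo "path:   $path_in_module"

# Printed only: put the module name into the destination by hand.
printf 'rsync -HRavP %s %s/%s/\n' "$src" "$dest" "$module"
```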

How to rsync only a specific list of files?

I've about 50 or so files in various sub-directories that I'd like to push to a remote server. I figured rsync would be able to do this for me using the --include-from option. Without the --exclude="*" option, all the files in the directory are being synced, with the option, no files are.
rsync -avP -e ssh --include-from=deploy/rsync_include.txt --exclude=* ./ root@0.0.0.0:/var/www/ --dry-run
I'm running it as dry initially and 0.0.0.0 is obviously replaced by the IP of the remote server. The contents of rsync_include.txt is a new line separated list of relative paths to the files I want to upload.
Is there a better way of doing this that is escaping me on a Monday morning?
There is a flag --files-from that does exactly what you want. From man rsync:
--files-from=FILE
Using this option allows you to specify the exact list of files to transfer (as read from the specified FILE or - for standard input). It also tweaks the default behavior of rsync to make transferring just the specified files and directories easier:
The --relative (-R) option is implied, which preserves the path information that is specified for each item in the file (use --no-relative or --no-R if you want to turn that off).
The --dirs (-d) option is implied, which will create directories specified in the list on the destination rather than noisily skipping them (use --no-dirs or --no-d if you want to turn that off).
The --archive (-a) option’s behavior does not imply --recursive (-r), so specify it explicitly, if you want it.
These side-effects change the default state of rsync, so the position of the --files-from option on the command-line has no bearing on how other options are parsed (e.g. -a works the same before or after --files-from, as does --no-R and all other options).
The filenames that are read from the FILE are all relative to the source dir -- any leading slashes are removed and no ".." references are allowed to go higher than the source dir. For example, take this command:
rsync -a --files-from=/tmp/foo /usr remote:/backup
If /tmp/foo contains the string "bin" (or even "/bin"), the /usr/bin directory will be created as /backup/bin on the remote host. If it contains "bin/" (note the trailing slash), the immediate contents of the directory would also be sent (without needing to be explicitly mentioned in the file -- this began in version 2.6.4). In both cases, if the -r option was enabled, that dir’s entire hierarchy would also be transferred (keep in mind that -r needs to be specified explicitly with --files-from, since it is not implied by -a). Also note that the effect of the (enabled by default) --relative option is to duplicate only the path info that is read from the file -- it does not force the duplication of the source-spec path (/usr in this case).
In addition, the --files-from file can be read from the remote host instead of the local host if you specify a "host:" in front of the file (the host must match one end of the transfer). As a short-cut, you can specify just a prefix of ":" to mean "use the remote end of the transfer". For example:
rsync -a --files-from=:/path/file-list src:/ /tmp/copy
This would copy all the files specified in the /path/file-list file that was located on the remote "src" host.
If the --iconv and --protect-args options are specified and the --files-from filenames are being sent from one host to another, the filenames will be translated from the sending host’s charset to the receiving host’s charset.
NOTE: sorting the list of files in the --files-from input helps rsync to be more efficient, as it will avoid re-visiting the path elements that are shared between adjacent entries. If the input is not sorted, some path elements (implied directories) may end up being scanned multiple times, and rsync will eventually unduplicate them after they get turned into file-list elements.
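Following the note about sorting, a list for --files-from can be generated with paths relative to the source directory and already sorted. This is a minimal sketch, assuming a throwaway source and destination created with mktemp; make_file_list is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Print regular files under $1, relative to $1, in sorted order.
# (make_file_list is a hypothetical helper, not part of rsync.)
make_file_list() {
    ( cd "$1" && find . -type f | sed 's|^\./||' | LC_ALL=C sort )
}

# Throwaway source and destination for the demo.
src=$(mktemp -d)
dst=$(mktemp -d)
mkdir -p "$src/b/sub" "$src/a"
touch "$src/b/sub/two.txt" "$src/a/one.txt" "$src/b/three.txt"

list=$(mktemp)
make_file_list "$src" > "$list"
cat "$list"

# Run the transfer only if rsync is installed on this machine.
if command -v rsync >/dev/null 2>&1; then
    rsync -a --files-from="$list" "$src" "$dst"
fi
```

Running find from inside the source directory keeps every entry relative to it, which is how the excerpt above says the entries are interpreted.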
For the record, none of the answers above helped except for one. To summarize, you can do the backup operation using --files-from= by using either:
rsync -aSvuc `cat rsync-src-files` /mnt/d/rsync_test/
OR
rsync -aSvuc --recursive --files-from=rsync-src-files . /mnt/d/rsync_test/
The former command is self-explanatory, apart from the content of the file rsync-src-files, which I elaborate on below. If you want to use the latter version, keep the following four remarks in mind:
You need to specify both --files-from and the source directory.
You need to explicitly specify --recursive.
The file rsync-src-files is a user-created file; it was placed within the src directory for this test.
The file rsync-src-files lists the files and folders to copy, taken relative to the source directory. IMPORTANT: Make sure there are no trailing spaces or blank lines in the file. In the example below, there are only two lines, not three (I figured this out by chance). Content of rsync-src-files is:
folderName1
folderName2
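To guard against the trailing spaces and blank lines mentioned above, the list can be cleaned before it is handed to rsync. A minimal sketch, assuming a list file like rsync-src-files; clean_file_list is a made-up helper name and the demo input is written to a temporary file.

```shell
#!/usr/bin/env bash
# Strip trailing whitespace from every line, then drop empty lines.
# (clean_file_list is a hypothetical helper.)
clean_file_list() {
    sed -e 's/[[:space:]]*$//' -e '/^$/d' "$1"
}

# Demo input: trailing spaces on the first line and two blank lines.
raw=$(mktemp)
printf 'folderName1  \n\nfolderName2\n\n' > "$raw"

cleaned=$(clean_file_list "$raw")
printf '%s\n' "$cleaned"
rm -f "$raw"
```

The cleaned output can be redirected into a new file and passed to --files-from in place of the original.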
If the list given to --files-from= contains absolute paths, use / as the source directory to keep those paths intact. Your command would then look something like this:
rsync -av --files-from=/path/to/file / /tmp/
This is useful when there are a large number of files and you want to copy them all to some path. You would find the files and write the output to a file like this:
find /var -name '*.log' > file
$ date
Wed 24 Apr 2019 09:54:53 AM PDT
$ rsync --version
rsync version 3.1.3 protocol version 31
...
Syntax: rsync <args> <file_and_or_folder_list> <source_dir> <destination_dir/>
Folder names - WITH a trailing /; e.g. Cancer - Evolution/ - are provided in a file (e.g. my_folder_list):
# comment: /mnt/Vancouver/my_folder_list
# comment: 2019-04-24
some_file
another_file
Cancer/
Cancer - Evolution/
Cancer - Genomic Variants/
Cancer - Metastasis (EMT Transition ...)/
Cancer Pathways, Networks/
Catabolism - Autophagy; Phagosomes; Mitophagy/
so those are the "source" (files and/or) folders, to be rsync'd.
Note that if you don't include the trailing / shown above, rsync creates the target folders, but they are empty.
Those folder names provided in the <file_and_or_folder_list> are appended to the rest of their path: <src_dir> = /home/victoria/RESEARCH - NEWS (here, on a different partition), thus providing the complete folder path to rsync; e.g.: ... /home/victoria/RESEARCH - NEWS/Cancer - Evolution/ ...
[ I'm editing this answer some time later (2022-07), and I can't recall if the path provided to <src_dir> is /home/victoria/RESEARCH - NEWS or /home/victoria/RESEARCH - NEWS/ - providing the correct concatenated path. I believe it's the former; if it doesn't work, use the latter. ]
Note that you also need to use --files-from= ..., NOT --include-from= ...
Again the rsync syntax is:
rsync <args> <file_and_or_folder_list> <source_dir> <destination_dir/>
so,
rsync -aqP --delete --files-from=/mnt/Vancouver/my_folder_list "/home/victoria/RESEARCH - NEWS" $DEST_DIR/
where
<args> is -aqP --delete
<file_and_or_folder_list> is --files-from=/mnt/Vancouver/my_folder_list
<source_dir> is "/home/victoria/RESEARCH - NEWS"
<destination_dir/> is $DEST_DIR/ (note the trailing / added to the variable name)
In my BASH script, for coding flexibility I defined variable $DEST_DIR in two parts as follows.
BASEDIR="/mnt/Vancouver"
DEST_DIR=$BASEDIR/data
echo $DEST_DIR ## /mnt/Vancouver/data
## To clarify, here is $DEST_DIR with / appended to the variable name:
echo $DEST_DIR/ ## /mnt/Vancouver/data/
echo $DEST_DIR/apple/banana ## /mnt/Vancouver/data/apple/banana
However, you can more simply specify the destination path:
via a BASH variable: DEST_DIR=/mnt/Vancouver/data
note that in the rsync expression above, / is appended to $DEST_DIR (i.e. $DEST_DIR/ is actually $DEST_DIR + /), giving the destination directory path /mnt/Vancouver/data/
explicitly state the destination path: /mnt/Vancouver/data/
rsync options used: ## man rsync or rsync -h
-a : archive: equals -rlptgoD (no -H,-A,-X)
-r : recursive
-l : copy symlinks as symlinks
-p : preserve permissions
-t : preserve modification times
-g : preserve group
-o : preserve owner (super-user only)
-D : same as --devices --specials
-P : same as --partial --progress
-q : quiet (https://serverfault.com/questions/547106/run-totally-silent-rsync)
--delete
This tells rsync to delete extraneous files from the RECEIVING SIDE (ones
that AREN’T ON THE SENDING SIDE), but only for the directories that are
being synchronized. You must have asked rsync to send the whole directory
(e.g. "dir" or "dir/") without using a wildcard for the directory’s contents
(e.g. "dir/*") since the wildcard is expanded by the shell and rsync thus
gets a request to transfer individual files, not the files’ parent directory.
Files that are excluded from the transfer are also excluded from being
deleted unless you use the --delete-excluded option or mark the rules as
only matching on the sending side (see the include/exclude modifiers in the
FILTER RULES section). ...
Edit: atp's answer below is better. Please use that one!
You might have an easier time, if you're looking for a specific list of files, putting them directly on the command line instead:
# rsync -avP -e ssh `cat deploy/rsync_include.txt` root@0.0.0.0:/var/www/
This is assuming, however, that your list isn't so long that the command line length will be a problem and that the rsync_include.txt file contains just real paths (i.e. no comments, and no regexps).
None of these answers worked for me, when all I had was a list of directories. Then I stumbled upon the solution! You have to add -r to --files-from because -a will not be recursive in this scenario (who knew?!).
rsync -aruRP --files-from=directory.list . ../new/location
I had a similar task: to rsync all files modified after a given date, but excluding some directories. It was difficult to build an all-in-one one-liner, so I divided the problem into smaller pieces.
Final solution:
find ~/sourceDIR -type f -newermt "DD MMM YYYY HH:MM:SS" | egrep -v "/\..|Downloads|FOO" > FileList.txt
rsync -v --files-from=FileList.txt ~/sourceDIR /Destination
First I use find -L ~/sourceDIR -type f -newermt "DD MMM YYYY HH:MM:SS". I tried adding a regex to the find line to exclude name patterns, but my flavor of Linux (Mint) seems not to understand negated regexes in find. I tried a number of regex flavors; none worked as desired.
So I ended up with egrep -v, which excludes patterns the easy way. My rsync does not copy directories like /.cache or /.config, plus some others I explicitly named.
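A variant of the same two-step idea is sketched below with find run from inside the source directory, so that the list holds paths relative to it. The directory layout, the cut-off date and the exclusion pattern are made up for the demo; -newermt is assumed to be available, as it is in GNU and BSD find.

```shell
#!/usr/bin/env bash
# Throwaway source tree: one new file, one old file, two excluded dirs.
src=$(mktemp -d)
mkdir -p "$src/.cache" "$src/Downloads" "$src/docs"
touch "$src/docs/new.txt" "$src/.cache/junk" "$src/Downloads/big.iso"
touch -t 202001010000 "$src/docs/old.txt"   # mtime: 1 Jan 2020

# Files newer than the cut-off, relative to $src, minus dot-dirs and Downloads.
list=$(mktemp)
( cd "$src" && find . -type f -newermt "2021-01-01 00:00:00" ) |
    sed 's|^\./||' |
    grep -Ev '(^|/)\.|Downloads' > "$list"
cat "$list"

# The list could then be used as in the answer above, for example:
#   rsync -v --files-from="$list" "$src" /Destination
```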
This answer is not the direct answer for the question.
But it should help you figure out which solution fits best for your problem.
When analysing the problem you should activate the debug option -vv
Then rsync will output which files are included or excluded by which pattern:
building file list ...
[sender] hiding file FILE1 because of pattern FILE1*
[sender] showing file FILE2 because of pattern *

Copying from Dropbox to UNIX machine using rsync or similar?

Is there any way I can copy data directly from the Dropbox servers to a UNIX server, without being root or having Dropbox software installed there?
Something like:
rsync -aP myusername@dropbox.com:somepath/ .
(The reason for wanting to do this is that the transfer speed between the UNIX server and the Dropbox server would be much faster since they are both on the backbone, than between my local machine and the UNIX server, which have a home broadband connection).
You can use wget to download the file from Dropbox to the server from the command line, with something like:
wget http://dl.dropbox.com/u/12345678/largefile.zip
If you are on a slow link and the download cuts off before all of the file is downloaded, you can resume download from wherever it stopped, by using the -c option with wget. Just do:
wget -c http://dl.dropbox.com/u/12345678/largefile.zip
Try rclone https://github.com/dropbox/dbxcli/issues/60#issuecomment-497713363
install
$ curl -OJN https://downloads.rclone.org/rclone-current-linux-amd64.zip
$ unzip rclone-current-linux-amd64.zip
$ cp rclone-v1.47.0-linux-amd64/rclone ~/bin/
config https://rclone.org/dropbox/
$ rclone config
2019/05/31 15:00:07 NOTICE: Config file "/home/roman/.config/rclone/rclone.conf" not found - using defaults
No remotes found - make a new one
n) New remote
s) Set configuration password
q) Quit config
n/s/q> n
name> dropbox
Type of storage to configure.
Enter a string value. Press Enter for the default ("").
Choose a number from below, or type in your own value
1 / A stackable unification remote, which can appear to merge the contents of several remotes
\ "union"
2 / Alias for a existing remote
\ "alias"
3 / Amazon Drive
\ "amazon cloud drive"
4 / Amazon S3 Compliant Storage Provider (AWS, Alibaba, Ceph, Digital Ocean, Dreamhost, IBM COS, Minio, etc)
\ "s3"
5 / Backblaze B2
\ "b2"
6 / Box
\ "box"
7 / Cache a remote
\ "cache"
8 / Dropbox
\ "dropbox"
9 / Encrypt/Decrypt a remote
\ "crypt"
10 / FTP Connection
\ "ftp"
11 / Google Cloud Storage (this is not Google Drive)
\ "google cloud storage"
12 / Google Drive
\ "drive"
13 / Hubic
\ "hubic"
14 / JottaCloud
\ "jottacloud"
15 / Koofr
\ "koofr"
16 / Local Disk
\ "local"
17 / Mega
\ "mega"
18 / Microsoft Azure Blob Storage
\ "azureblob"
19 / Microsoft OneDrive
\ "onedrive"
20 / OpenDrive
\ "opendrive"
21 / Openstack Swift (Rackspace Cloud Files, Memset Memstore, OVH)
\ "swift"
22 / Pcloud
\ "pcloud"
23 / QingCloud Object Storage
\ "qingstor"
24 / SSH/SFTP Connection
\ "sftp"
25 / Webdav
\ "webdav"
26 / Yandex Disk
\ "yandex"
27 / http Connection
\ "http"
Storage> 8
** See help for dropbox backend at: https://rclone.org/dropbox/ **
Dropbox App Client Id
Leave blank normally.
Enter a string value. Press Enter for the default ("").
client_id>
Dropbox App Client Secret
Leave blank normally.
Enter a string value. Press Enter for the default ("").
client_secret>
Edit advanced config? (y/n)
y) Yes
n) No
y/n> n
Remote config
Use auto config?
* Say Y if not sure
* Say N if you are working on a remote or headless machine
y) Yes
n) No
y/n>
y/n> y
If your browser doesn't open automatically go to the following link: http://127.0.0.1:53682/auth
Log in and authorize rclone for access
Waiting for code...
Got code
--------------------
[dropbox]
type = dropbox
token = {"access_token":"<token>","token_type":"bearer","expiry":"0001-01-01T00:00:00Z"}
--------------------
y) Yes this is OK
e) Edit this remote
d) Delete this remote
y/e/d> y
Current remotes:
Name Type
==== ====
dropbox dropbox
e) Edit existing remote
n) New remote
d) Delete remote
r) Rename remote
c) Copy remote
s) Set configuration password
q) Quit config
e/n/d/r/c/s/q> q
copy all files from dropbox https://rclone.org/docs/
$ rclone copy --dry-run dropbox:/ .
$ rclone copy dropbox:/ .
I don't believe you have SSH access to the Dropbox server. Think about it, that would mean the Dropbox server would have thousands if not millions of SSH accounts, just in the chance that a user would use it. Additionally, a default Unix user has access to read so much of the Unix OS and restricting it to just a few commands is somewhat of a big deal. Imagine if your Unix user was somehow able to see the data of another Unix user including their Dropbox data. Good idea though.

Resources