I have some files in a directory on a remote host and I want to pull them to the local host with rsync in an atomic manner at the directory level (in a distributed setup). One way I can think of is the trivial approach of backing up the files on the local host and then replacing the old files with the new ones, but that is not efficient in terms of disk space, e.g. when the files total 10 GB and the diff is just 100 MB.
Is there a way to store just the rsync diff in a temporary location on the local host and then update the files on the local host?
You could do it like this:
Run rsync between the local host and a temp folder on the remote host. To make sure you only transfer the diff, use the --link-dest option and point it at the real folder on the remote host.
You'd basically have a command like this:
rsync --link-dest="/var/www" --archive "/localhost/path/www/" "remote@example.com:/var/www_update_20131129"
(With /var/www being the files to update and /var/www_update_20131129/ being the "temp" folder)
Once the rsync operation is done, you can swap the www_update_20131129/ and real www/ folders on the remote host (possibly by soft-linking www/ to www_update_20131129/).
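To make that final swap close to atomic, a minimal sketch (assuming /var/www on the remote host is already a symlink to the previous release directory rather than a plain directory):

ln -sfn /var/www_update_20131129 /var/www.new   # stage a symlink pointing at the new tree
mv -T /var/www.new /var/www                     # rename(2) replaces the old symlink atomically

If /var/www is currently a real directory, it would have to be converted to a symlink once before this pattern can be used.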
I'm copying files from an OpenShift pod to a UNIX server. The files are gigabytes in size. I'm using oc rsync on the Unix server, but it uses the /tmp directory as a cache directory while copying. The file size is greater than the size of /tmp, so I'm getting "no space left on device".
Is there any way to redirect the /tmp cache to a different folder, or can we avoid the cache entirely?
You can try setting the TMP or TEMP variable to point to another directory with enough space.
I am sure the documentation mentions the proper variable (if it's not one of the two above).
The following worked for me.
export TMPDIR="folder where data should be cached"
oc rsync pod:source_path target_path
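If you prefer not to export the variable for the whole session, the same thing can be done per command (a sketch, with /data/oc-cache as a hypothetical scratch directory that has enough free space):

TMPDIR=/data/oc-cache oc rsync pod:source_path target_path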
Thanks to @Romeo Ninov for pointing me in the right direction.
I need to load Hive partitions from staging folders. Currently we copy and then delete. Can I use mv instead?
I am told that I cannot use mv if the folders are EAR (Encryption At Rest). How can I tell whether a folder is EAR'ed?
I'm assuming the feature you are using for encryption at rest is HDFS transparent encryption (see the Cloudera 5.14 docs).
There is a command to list all the zones configured for encryption, listZones, but it requires admin privileges. However, if you just need to check one file at a time, you should be able to run getFileEncryptionInfo without those privileges.
For example:
hdfs crypto -getFileEncryptionInfo -path /path/to/my/file
As for whether you can move files, it looks like the answer to that is no. From the "Rename and Trash considerations" section of the transparent encryption documentation:
HDFS restricts file and directory renames across encryption zone boundaries. This includes renaming an encrypted file / directory into an unencrypted directory (e.g., hdfs dfs mv /zone/encryptedFile /home/bob), renaming an unencrypted file or directory into an encryption zone (e.g., hdfs dfs mv /home/bob/unEncryptedFile /zone), and renaming between two different encryption zones (e.g., hdfs dfs mv /home/alice/zone1/foo /home/alice/zone2).
and
A rename is only allowed if the source and destination paths are in the same encryption zone, or both paths are unencrypted (not in any encryption zone).
So it looks like using cp and rm is your best bet.
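A minimal sketch of that copy-and-delete step with the hdfs CLI (the staging and warehouse paths below are hypothetical placeholders):

hdfs dfs -cp /staging/mytable/dt=2019-01-01 /warehouse/mytable/dt=2019-01-01
hdfs dfs -rm -r -skipTrash /staging/mytable/dt=2019-01-01

-skipTrash avoids the move into .Trash, which can itself fail across an encryption zone boundary.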
I have two VMs: dev and prod.
I want to use rsync to copy a dump file from prod and then restore it on dev. I'm using this command to copy:
rsync -rave user@ip:/home/user/dumps /home/anotheruser/workspace/someapp/dumps
The same command successfully copies static files (.html, .css) from another directory, but in this case only the folder itself is created, without the file:
/home/anotheruser/workspace/someapp/dumps
but I'm expecting:
/home/anotheruser/workspace/someapp/dumps/dumpfile
What is going wrong? dumpfile exists there: user@ip:/home/user/dumps/dumpfile.
The command you want is probably this:
rsync -av user@ip:/home/user/dumps/ /home/anotheruser/workspace/someapp/dumps/
I've
removed the r because it's implied by the a anyway.
removed the e because that was probably your problem; it requires a parameter that you haven't given (see the example after this list).
added the / at the end of the pathnames to make sure they're treated as directories.
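For reference, -e expects the remote shell command as its argument. If you did need it, say to use a non-standard SSH port (a hypothetical example):

rsync -av -e "ssh -p 2222" user@ip:/home/user/dumps/ /home/anotheruser/workspace/someapp/dumps/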
I'm using something like
rsync -av --remove-source-files user@host:/remotedir/* localdir
to move about 100 quite large files from a server to a local machine, which takes about one day. With the command above, rsync deletes the source files after the last file has been transferred.
I'm looking for a way to delete each of the roughly 100 remote files immediately after it has been transferred. I'm not forced to use rsync; anything that works over an ssh connection would be fine.
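One possible sketch is to loop over the remote files and transfer them one at a time, so --remove-source-files deletes each file as soon as its own transfer succeeds (this assumes the remote file names contain no spaces and that /remotedir holds only regular files):

for f in $(ssh user@host 'ls /remotedir'); do
    rsync -av --remove-source-files "user@host:/remotedir/$f" localdir/
done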
The following command works great for me for a single file:
scp your_username@remotehost.edu:foobar.txt /some/local/directory
What I want to do is do it recursively (i.e. for all subdirectories and files under a given path on the server), merge folders and overwrite files that already exist locally, and finally download only those files on the server that are smaller than a certain size (e.g. 10 MB).
How could I do that?
Use rsync.
Your command is likely to look like this:
rsync -az --max-size=10m your_username@remotehost.edu:foobar.txt /some/local/directory
-a (archive mode - the sync is recursive, transfers ownership, attributes, symlinks among other things)
-z (compresses transfer)
--max-size (only copies files up to a certain size)
There are many more flags which may be suitable. Check out the docs for more details - http://linux.die.net/man/1/rsync
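Since the goal is to copy a whole path recursively, the source would normally be a directory rather than a single file; a variant with placeholder paths (the trailing slash copies the directory's contents):

rsync -az --max-size=10m your_username@remotehost.edu:/path/on/server/ /some/local/directory/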
First option: use rsync.
Second option: it's not going to be a one-liner, but it can be done in three or four lines (a sketch follows the list):
Create a tar archive on the remote system using ssh.
Copy the tar from remote system with scp.
Untar the archive locally.
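A rough sketch of those three steps (host and paths are placeholders):

ssh your_username@remotehost.edu 'tar -czf /tmp/files.tar.gz -C /path/on/server .'
scp your_username@remotehost.edu:/tmp/files.tar.gz /some/local/directory/
tar -xzf /some/local/directory/files.tar.gz -C /some/local/directory/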
If the creation of the archive gets a bit complicated and involves using find and/or tar with several options, it is quite practical to write a script that does the job locally, upload it to the server with scp, and only then execute it remotely with ssh.