Can I used HDFS mv on encrypted folders

Can I used HDFS mv on encrypted folders - encryption

I need to load hive partitions from staging folders. Currently we copy and delete. Can I use mv?
I am told that I can not use mv if the folders are EAR (Encryption At Rest). How to tell if a folder is EAR'ed?

I'm assuming the feature you are using for encryption at rest is HDFS transparent encryption (see cloudera 5.14 docs).
There is a command to get all the zones configured for encryption, listZones, but that command requires admin privileges. However, if you just need to check the permission of one file at a time, you should be able to run getFileEncryptionInfo without these permissions.
For example
hdfs crypto -getFileEncryptionInfo -path /path/to/my/file
As for whether you can move files, it looks like the answer to that is no. From the "Rename and Trash considerations" section of the transparent encryption documentation:
HDFS restricts file and directory renames across encryption zone boundaries. This includes renaming an encrypted file / directory into an unencrypted directory (e.g., hdfs dfs mv /zone/encryptedFile /home/bob), renaming an unencrypted file or directory into an encryption zone (e.g., hdfs dfs mv /home/bob/unEncryptedFile /zone), and renaming between two different encryption zones (e.g., hdfs dfs mv /home/alice/zone1/foo /home/alice/zone2).
and
A rename is only allowed if the source and destination paths are in the same encryption zone, or both paths are unencrypted (not in any encryption zone).
So it looks like using cp and rm is your best bet.

Related

How to rewrite the destination path with rsync?

Hi I want to use rsync to move data from sourcehost to desthost but the directory structure is not 100% what I want.
sourcehost:
crappy_dir
another_crappy_dir
i_like_this_one
and_this too
file
desthost should have:
my_cool_dir
i_like_this_one
and_this too
file
this is my files_to_include.txt:
/crappy_dir/another_crappy_dir/i_like_this_one/and_this_too/file
and my current test rsync command:
desthost# rsync -aAHXv -e ssh --files-from=files_to_include.txt sourcehost: /my_cool_dir
but it creates
/my_cool_dir/crappy_dir/another_crappy_dir/i_like_this_one/and_this_too/file
is there any option in rsync to re-write the destination path as I want to? let's say some magical perl-like-regexp like --magical-dest-transformation "s#/crappy_dir/another_crappy_dir/#/#" will do it. I couldn't come up with a good --rsync-command option either. suggestions are welcomed.
Note: this is a several terabytes multi host copy that will take some days to do, a "simple mv" after copying is not good enough because I'll re-run rsync several times. I need it to be smart enough to "peer up" the files.

is there any option in rsync to re-write the destination path as I want to?
No because the sync part of rsync depends on path, filename, and other meta characteristics as well for not just copying but also deleting.
a "simple mv" after copying is not good enough
How about a simple mv before copying? If that's not feasible, why not make a linked folder structure on the source and then rsync that folder structure to the destination?
I need it to be smart enough to "peer up" the files
Did you consider making one rsync command for each file? This way you just have to transform source and dest folders once but can re-run the rsync lines multiple times to peer up.

Unable to copy using cp -R command

When I am doing cp -R at my system
cp -R \[Bark\]\ Java/ /Volumes/Seagate\ Backup\ Plus\ Drive/
its throwing this message
cp: /Volumes/Seagate Backup Plus Drive/Advanced: Read-only file system
cp: [Bark] Java//Advanced:unable to copy extended attributes to /Volumes/Seagate Backup Plus Drive/Advanced: Read-only file system.
I guess I need to make the backup disk in write mode
How to change the mode of the backup disk.

Mounting on many UNIX systems is controlled by the etc/fstab file (see here). This usually specifies mount options for each device.
You need to change the mount options for the device you're interested in.
If you examine the /etc/fstab file on your system, you should see something along the lines of ro in the options column.
If you change that to rw and then remount it, it should be writeable.
You may want to change it back when you're done so as to protect the information on it.
For Mac OSX, fstab is still used but the file systems may also be under the control of automount - look up the man pages for automount and autofs.conf for information on how to configure those.

Atomic rsync at directory level with minimum temporary storage

I have some files on remote host (in a directory) and I want to perform rsync in an atomic manner at directory level to pull files on local host (In a distributed setup). One way I could think of is a very trivial case when I can take files backup on local host and then replace the old files with the new files, but the approach is not efficient as far as disk space is concerned. e.g. files size is 10GB and diff is just 100 MB.
Is there a way to store just the rsync diff on local host in temporary location and then update the files on local host?

You could do it like this:
Run rsync between local host and a temp folder in remote host. To make sure you only get the diff, use the --link-dest option and link to the real folder in remote host.
You'd basically have a command like this:
rsync --link-dest="/var/www" --archive "/localhost/path/www/" "remote#example.com:/var/www_update_20131129"
(With /var/www being the files to update and /var/www_update_20131129/ being the "temp" folder)
Once the rsync operation is done, you can swap the www_update_20131129/ and real www/ folders in remote host (possibly by soft-linking www/ to www_update_20131129/).

synchronise local directories over ssh

The following command works great for me for a single file:
scp your_username#remotehost.edu:foobar.txt /some/local/directory
What I want to do is do it recursive (i.e. for all subdirectories / subfiles of a given path on server), merge folders and overwrite files that already exist locally, and finally downland only those files on server that are smaller than a certain value (e.g. 10 mb).
How could I do that?

Use rsync.
Your command is likely to look like this:
rsync -az --max-size=10m your_username#remotehost.edu:foobar.txt /some/local/directory
-a (archive mode - the sync is recursive, transfers ownership, attributes, symlinks among other things)
-z (compresses transfer)
--max-size (only copies files up to a certain size)
There are many more flags which may be suitable. Checkout the docs for more details - http://linux.die.net/man/1/rsync

First option: use rsync.
Second option, and it's not going to be a one liner, but can be done in three or four lines:
Create a tar archive on the remote system using ssh.
Copy the tar from remote system with scp.
Untar the archive locally.
If the creation of the archive gets a bit complicated and involves using find and/or tar with several options it is quite practical to create a script which would do that locally, upload it on the server with scp, and only then execute remotely with ssh.

Copy or rsync command

The following command is working as expected...
cp -ur /home/abc/* /mnt/windowsabc/
Does rsync has any advantage over it? Is there a better way to keep to backup folder in sync every 24 hours?

Rsync is better since it will only copy only the updated parts of the updated file, instead of the whole file. It also uses compression and encryption if you want. Check out this tutorial.

rsync is not necessarily more efficient, due to the more detailed inventory of files and blocks it performs. The algorithm is fantastic at what it does, but you need to understand your problem to know if it is really going to be the best choice.
On a very large file system (say many thousands or millions of files) where files tend to be added but not updated, "cp -u" will likely be more efficient. cp makes the decision to copy solely on metadata and can simply get to the business of copying.
Note that you might want some buffering, e.g. by using tar rather than straight cp, depending on the size of the files, network performance, other disk activity, etc. I find the following idea very useful:
tar cf - . | tar xCf directory -
Metadata itself may actually become a significant overhead on very large (cluster) file systems, but rsync and cp will share this problem.
rsync seems to frequently be the preferred tool (and in general purpose applications is my usual default choice), but there are probably many people who blindly use rsync without thinking it through.

The command as written will create new directories and files with the current date and time stamp, and yourself as the owner. If you are the only user on your system and you are doing this daily it may not matter much. But if preserving those attributes matters to you, you can modify your command with
cp -pur /home/abc/* /mnt/windowsabc/
The -p will preserve ownership, timestamps, and mode of the file. This can be pretty important depending on what you're backing up.
The alternative command with rsync would be
rsync -avh /home/abc/* /mnt/windowsabc
With rsync, -a indicates "archive" which preserves all those attributes mentioned above. -v indicates "verbose" which just lists what it's doing with each file as it runs. -z is left out here for local copies, but is for compression, which will help if you are backing up over a network. Finally, the -h tells rsync to report sizes in human-readable formats like MB,GB,etc.
Out of curiosity, I ran one copy to prime the system and avoid biasing against the first run, then I timed the following on a test run of 1GB of files from an internal SSD drive to a USB-connected HDD. These simply copied to empty target directories.
cp -pur : 19.5 seconds
rsync -ah : 19.6 seconds
rsync -azh : 61.5 seconds
Both commands seem to be about the same, although zipping and unzipping obviously tax the system where bandwidth is not a bottleneck.

Especially if you use a copy-on-write filesystem like BTRFS or ZFS, rsync is much better.
I use BTRFS, and I have this in my ~/.bashrc:
alias cp="rsync -ah --inplace --no-whole-file --info=progress2"
The important flag here for CoW FSs like BTRFS is --inplace because it only copies the changed part of the files, doesn't create new inodes for small changes between files, etc. See this.

It's not really a question of what's more efficient.
The commands 'rsync', and 'cp' are not equivalent and achieve different goals.
1- rsync can preserve the time of creation of existing files. (using -a option)
2- rsync will run multiprocess and transfer using either local sockets or network sockets. (i.e. fork itself into multiple processes)
3- The multiprocessing, and threading will increase your throughput when copying large number of small files, and even with multiple larger files.
So bottom line is rsync is for large data, and cp is for smaller local copying. (MB to small GB range). When you start getting into multiple GB or in the TB range, go with rsync. And of course network copies, rsync all the way.

For a local copy, the only advantage of rsync is that it will avoid copying if the file already exists in the destination directory. The definition of "already exists" is (a) same file name (b) same size (c) same timestamp. (Maybe same owner/group; I am not sure...)
The "rsync algorithm" is great for incremental updates of a file over a slow network link, but it will not buy you much for a local copy, as it needs to read the existing (partial) file to run it's "diff" computation.
So if you are running this sort of command frequently, and the set of changed files is small relative to the total number of files, you should find that rsync is faster than cp. (Also rsync has a --delete option that you might find useful.)

Keep in mind that while transferring files internally on a machine i.e not network transfer, using the -z flag can have a massive difference in the time taken for the transfer.
Transfer within same machine
Case 1: With -z flag:
TAR took: 9.48345208168
Encryption took: 2.79352903366
CP took = 5.07273387909
Rsync took = 30.5113282204
Case 2: Without the -z flag:
TAR took: 10.7535531521
Encryption took: 3.0386879921
CP took = 4.85565590858
Rsync took = 4.94515299797

if you are using cp doesn't save existing files when copying folders of the same name. Lets say you have this folders:
/myFolder
someTextFile.txt
/someOtherFolder
/myFolder
wellHelloThere.txt
Then you copy one over the other:
cp /someOtherFolder/myFolder /myFolder
result:
/myFolder
wellHelloThere.txt
This is at least what happens on macOS and I wanted to preserve the diff files so I used rsync.

I will prefer to use rsync with the following options
rsync -avhW --no-compress --progress --info=progress2 <src directory> <dst directory>
The above parameters can be defined as follows :
-a for the archive to preserves ownership, permissions, etc.
-v for verbose
-h for human-readable
-W for copying whole files only
--no-compress as there's no lack of bandwidth between local devices
--progress to see the progress of large files
--info=progress2 to see the overall progress
source directory path
destination directory path

rsync is much much better compared to cp because rsync copies whole files/directory only the first time. The next time when you use rsync command with the same files/directory, only new changes are copied to the destination folder, not the entire files are copied.

I used rsynk to transfer 330G data from a local HD to a external HD via USB 3.0. It took me three days. The transfer rate went down to 800 Kb/s and rised to 50 M/s for a while only after pausing the job. It is a typical overbuffering issue. Bad experience for local file tranfers: as the name indicates, (R)sync stands for REMOTE-sync (optimized for tranfers via network). As often happens, I discovered the "-z" flag only after I wondered about the issue and looked for an understandment

Develop Reference

r css asp.net wordpress firebase qt symfony nginx http apache-flex

Can I used HDFS mv on encrypted folders - encryption

I need to load hive partitions from staging folders. Currently we copy and delete. Can I use mv? I am told that I can not use mv if the folders are EAR (Encryption At Rest). How to tell if a folder is EAR'ed?

Related

How to rewrite the destination path with rsync?

Unable to copy using cp -R command

Atomic rsync at directory level with minimum temporary storage

synchronise local directories over ssh

Copy or rsync command

Categories

Resources