I need to transfer files from a remote Linux server directly to HDFS.
I have a keytab placed on the remote server, and after running kinit the ticket is active; however, I cannot browse the HDFS folders. I know that from the edge nodes I can copy files directly to HDFS, but I need to skip the edge node and transfer the files straight to HDFS.
How can we achieve this?
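If the remote server can reach the cluster's NameNode over HTTP, one option is the WebHDFS REST API with SPNEGO (Kerberos) authentication, which reuses the ticket obtained with kinit. A minimal sketch, where the NameNode host, port (9870 on Hadoop 3, 50070 on Hadoop 2), HDFS path, keytab and principal are all placeholders for your environment:
kinit -kt /path/to/your.keytab your_principal@YOUR.REALM
# Step 1: ask the NameNode where to write; it answers with a redirect to a DataNode.
LOCATION=$(curl -s -i -X PUT --negotiate -u : \
  "http://namenode.example.com:9870/webhdfs/v1/landing/file1.txt?op=CREATE" \
  | awk '/^Location:/ {print $2}' | tr -d '\r')
# Step 2: send the file body to that DataNode URL.
curl -X PUT --negotiate -u : -T file1.txt "$LOCATION"
Note this only works if WebHDFS (or HttpFS) is enabled and the DataNodes are reachable from the remote server; otherwise you need a Hadoop client and configuration installed on that machine.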
Let's assume a couple of things first. You have one machine on which the external hard drive is mounted (named DISK) and one cluster of machines with SSH access to the master (in the commands below, master stands for the user@hostname part of the master machine). You run the script on the machine with the drive. The data on the drive consists of multiple directories with multiple files in each (around 100); the numbers don't matter, they are just there to justify the loops. The path to the data is stored in the ${DIR} variable (on Linux it would be /media/DISK and on Mac OS X /Volumes/DISK). Here is what the script looks like:
DIR=/Volumes/DISK
# Stream every file of every top-level directory on the drive through ssh
# straight into HDFS, one file at a time.
for d in "${DIR}"/*/; do
  d=$(basename "${d}")
  for f in "${DIR}/${d}"/*; do
    f=$(basename "${f}")
    cat "${DIR}/${d}/${f}" | ssh master "hadoop fs -put - /path/on/hdfs/${d}/${f}"
  done
done
Note that we go over each file and we copy it into a specific file because the HDFS API for put requires that "when source is stdin, destination must be a file."
Unfortunately, it takes forever. When I came back the next morning, it had only done a fifth of the data (100 GB) and was still running... basically 20 minutes per directory! I ended up going with the solution of copying the data temporarily onto one of the machines and then copying it locally to HDFS. For space reasons, I did it one folder at a time, deleting the temporary folder immediately afterwards. Here is what the script looks like:
DIR=/Volumes/DISK
PTH=/path/on/one/machine/of/the/cluster
for d in "${DIR}"/*/; do
  d=$(basename "${d}")
  # Copy the folder onto the cluster node, load it into HDFS from there,
  # then delete the temporary copy to free the space.
  scp -r -q "${DIR}/${d}" master:"${PTH}/"
  ssh master "hadoop fs -copyFromLocal ${PTH}/${d} /path/on/hdfs/"
  ssh master "rm -rf ${PTH}/${d}"
done
Hope it helps!
I'd like to copy files from/to a remote server, in different directories.
For example, I want to run these 4 commands at once.
scp remote:A/1.txt local:A/1.txt
scp remote:A/2.txt local:A/2.txt
scp remote:B/1.txt local:B/1.txt
scp remote:C/1.txt local:C/1.txt
What is the easiest way to do that?
Copy multiple files from remote to local:
$ scp your_username@remote.edu:/some/remote/directory/\{a,b,c\} ./
Copy multiple files from local to remote:
$ scp foo.txt bar.txt your_username@remotehost.edu:~
$ scp {foo,bar}.txt your_username@remotehost.edu:~
$ scp *.txt your_username@remotehost.edu:~
Copy multiple files from remote to remote:
$ scp your_username@remote1.edu:/some/remote/directory/foobar.txt \
your_username@remote2.edu:/some/remote/directory/
Source: http://www.hypexr.org/linux_scp_help.php
From local to server:
scp file1.txt file2.sh username@ip.of.server.copyto:~/pathtoupload
From server to local (up to OpenSSH v9.0):
scp -T username@ip.of.server.copyfrom:"file1.txt file2.txt" "~/yourpathtocopy"
From server to local (OpenSSH v9.0+):
scp -OT username@ip.of.server.copyfrom:"file1.txt file2.txt" "~/yourpathtocopy"
From man 1 scp:
-O Use the legacy SCP protocol for file transfers instead of the SFTP protocol. Forcing the use of the
SCP protocol may be necessary for servers that do not implement SFTP, for backwards-compatibility for
particular filename wildcard patterns and for expanding paths with a ‘~’ prefix for older SFTP
servers.
HISTORY
Since OpenSSH 9.0, scp has used the SFTP protocol for transfers by default.
You can copy whole directories using the -r switch, so if you can isolate your files into their own directory, you can copy everything at once.
scp -r ./dir-with-files user@remote-server:upload-path
scp -r user@remote-server:path-to-dir-with-files download-path
so, for instance:
scp -r root@192.168.1.100:/var/log ~/backup-logs
Or if there are just a few of them, you can use:
scp 1.txt 2.txt 3.log user@remote-server:upload-path
As Jiri mentioned, you can use scp -r user@host:/some/remote/path /some/local/path to copy files recursively. This assumes that there's a single directory containing all of the files you want to transfer (and nothing else).
However, SFTP provides an alternative if you want to transfer files from multiple different directories, and the destinations are not identical:
sftp user@host << EOF
get /some/remote/path1/file1 /some/local/path1/file1
get /some/remote/path2/file2 /some/local/path2/file2
get /some/remote/path3/file3 /some/local/path3/file3
EOF
This uses the "here doc" syntax to define a sequence of SFTP input commands. As an alternative, you could put the SFTP commands into a text file and execute sftp user#host -b batchFile.txt
The answers with {file1,file2,file3} work only with bash (remotely or locally).
The real way is:
scp user@remote:'/path1/file1 /path2/file2 /path3/file3' /localPath
After playing with scp for a while I have found the most robust solution:
(Beware of the single and double quotation marks)
Local to remote:
scp -r "FILE1" "FILE2" HOST:'"DIR"'
Remote to local:
scp -r HOST:'"FILE1" "FILE2"' "DIR"
Notice that whatever comes after "HOST:" is sent to the remote and parsed there, so we must make sure it is not processed by the local shell. That is where the single quotation marks come in. The double quotation marks are used to handle spaces in the file names.
If the files are all in the same directory, we can use * to match them all, such as:
scp -r "DIR_IN"/*.txt HOST:'"DIR"'
scp -r HOST:'"DIR_IN"/*.txt' "DIR"
Compared to using the "{}" syntax which is supported only by some shells, this one is universal
The simplest way is
local$ scp remote:{A/1,A/2,B/3,C/4}.txt ./
So the {..} list can include directories (A, B and C here are directories; "1.txt" and "2.txt" are file names in those directories).
Although it would copy all these four files into one local directory - not sure if that's what you wanted.
In the above case you will end up with the remote files A/1.txt, A/2.txt, B/3.txt and C/4.txt copied over to a single local directory, with the file names ./1.txt, ./2.txt, ./3.txt and ./4.txt.
Problem: Copying multiple directories from remote server to local machine using a single SCP command and retaining each directory as it is in the remote server.
Solution: SCP can do this easily. It also solves the annoying problem of entering the password multiple times when using SCP with multiple folders, which saves a lot of time!
e.g.
# copies folders t1, t2, t3 from `test` to your local working directory
# note that there shouldn't be any space in between the folder names;
# we also escape the braces.
# please note the dot at the end of the SCP command
~$ cd ~/working/directory
~$ scp -r username@contact.server.de:/work/datasets/images/test/\{t1,t2,t3\} .
PS: Motivated by this great answer: scp or sftp copy multiple files with single command
Based on the comments, this also works fine in Git Bash on Windows
You can do it this way:
scp hostname@serverNameOrServerIp:/path/to/files/\{file1,file2,file3\}.fileExtension ./
This will download all the listed filenames to whatever local directory you're on.
Make sure not to put spaces between the filenames; separate them with just a comma (,).
Copy multiple directories:
scp -r dir1 dir2 dir3 admin@127.0.0.1:~/
It's simpler without using scp:
tar cf - file1 ... file_n | ssh user@server 'tar xf -'
This also lets you do things like compress the stream (ssh -C) or, since OpenSSH 7.3, use -J to jump through one or more proxy servers.
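For example, a sketch combining those options, where the jump host and paths are hypothetical:
tar cf - dir1 dir2 | ssh -C -J jumpuser@jumphost user@server 'cd /data/incoming && tar xf -'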
Avoid using passwords by copying your public key to ~/.ssh/authorized_keys (on the server) with ssh-copy-id (on the client).
scp remote:"[A-C]/[12].txt" local:
NOTE: I apologize in advance for answering only a portion of the above question. However, I found these commands to be useful for my current unix needs.
Uploading specific files from a local machine to a remote machine:
~/Desktop/dump_files$ scp file1.txt file2.txt lab1.cpp etc.ext your-user-id@remotemachine.edu:Folder1/DestinationFolderForFiles/
Uploading an entire directory from a local machine to a remote machine:
~$ scp -r Desktop/dump_files your-user-id@remotemachine.edu:Folder1/DestinationFolderForFiles/
Downloading an entire directory from a remote machine to a local machine:
~/Desktop$ scp -r your-user-id@remote.host.edu:Public/web/ Desktop/
In my case, I am restricted to only using the sftp command.
So I had to use a batch file with sftp. I created a script like the following. It assumes you are working in the /tmp directory and that you want to put the files in destdir_on_remote_system on the remote system. It also only works with a noninteractive login: you need to set up public/private keys so you can log in without entering a password. Change as needed.
#!/bin/bash
cd /tmp
# start script with list of files to transfer
ls -1 fileset1* > batchfile1
ls -1 fileset2* >> batchfile1
sed -i -e 's/^/put /' batchfile1
echo "cd destdir_on_remote_system" > batchfile
cat batchfile1 >> batchfile
rm batchfile1
sftp -b batchfile user@host
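For illustration, with hypothetical file names the generated batchfile would end up looking roughly like this:
cd destdir_on_remote_system
put fileset1_part1.dat
put fileset1_part2.dat
put fileset2_part1.dat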
In the specific case where all the files share the same base name but have a different suffix (say the number of a rotated log file), you can use the following:
scp user_name@ip.of.remote.machine:/some/log/folder/some_log_file.* ./
This will copy all files named some_log_file.* from the given folder on the remote, i.e. some_log_file.1, some_log_file.2, some_log_file.3, and so on.
In my case there were too many files with unrelated names.
I ended up using:
$ for i in $(ssh remote 'ls ~/dir'); do scp remote:~/dir/$i ./$i; done
1.txt 100% 322KB 1.2MB/s 00:00
2.txt 100% 33KB 460.7KB/s 00:00
3.txt 100% 61KB 572.1KB/s 00:00
$
scp uses ssh for data transfer with the same authentication and provides the same security as ssh.
A best practice here is to implement SSH keys and public key authentication. With this, you can write your scripts without worrying about authentication. Simple as that.
See WHAT IS SSH-KEYGEN
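A minimal sketch of that setup, assuming standard OpenSSH tooling and a placeholder user@host:
ssh-keygen -t ed25519                   # generate a key pair; accept the defaults, optionally set a passphrase
ssh-copy-id user@host                   # install the public key into ~/.ssh/authorized_keys on the server
scp file1.txt file2.txt user@host:~/    # later scp/ssh calls no longer prompt for a password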
serverHomeDir='/home/somepath/ftp/'
backupDirAbsolutePath=${serverHomeDir}'_sqldump_'
backupDbName1='2021-08-27-03-56-somesite-latin2.sql'
backupDbName2='2021-08-27-03-56-somesite-latin1.sql'
backupDbName3='2021-08-27-03-56-somesite-utf8.sql'
backupDbName4='2021-08-27-03-56-somesite-utf8mb4.sql'
scp -i ~/.ssh/id_rsa.pub user@server.domain.com:${backupDirAbsolutePath}/"{$backupDbName1,$backupDbName2,$backupDbName3,$backupDbName4}" .
. - at the end will download the files to the current dir
-i ~/.ssh/id_rsa.pub - assuming that you established ssh access to your server with that key pair
scp -r root@ip-address:/root/dir/ C:\Users\your-name\Downloads\
The -r lets you download all the files inside the dir directory of your remote server.
I want to use the --password-file option that comes with rsync. I don't want to use SSH public/private key authentication. I have tried this command:
rsync -avz --progress --password-file=pass.txt source destination
This says:
The --password-file option may only be used when accessing an rsync daemon.
So, I tried using:
rsync -avz --progress --password-file=pass.txt source destination rsyncd --daemon
But this returns various errors like unknown options. Is my syntax correct? How do I set up an rsync daemon on my Debian machine?
That is correct,
--password-file is only applicable when connecting to an rsync daemon.
You probably haven't set it up in the daemon itself though; the password you set there and the one you use during the call must match.
Edit /etc/rsyncd.secrets, set the owner/group of that file to root:root, and make sure it is not world-readable (e.g. chmod 600); with the default strict modes setting, the daemon refuses a secrets file that other users can read.
#/etc/rsyncd.secrets
root:YourSecretestPassword
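The daemon side also needs a module definition. Here is a minimal sketch of /etc/rsyncd.conf, where the module name, path and user are assumptions (the auth user must match an entry in the secrets file):
#/etc/rsyncd.conf
[module]
    path = /srv/rsync/module
    auth users = root
    secrets file = /etc/rsyncd.secrets
    read only = false
Start the daemon with rsync --daemon, or enable the rsync service that ships with the Debian package.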
To connect to an rsync daemon, use a double colon (instead of the single colon used with SSH), followed by the module name and the file or folder to synchronize:
RSYNC_PASSWORD="YourSecretestPassword" rsync -rtv user@remotehost::module/source/ destination/
NOTE:
this means giving up SSH encryption; although the password itself is not sent across the network in plain text, your data is ...
this is already insecure as is; never use the same password as any of your user accounts.
For a better understanding of its inner workings (how to give specific IPs/processes the ability to upload to specified areas of the filesystem without the need for a user account): http://transamrit.net/docs/rsync/
After trying for a while, I got this to work. Since I'm copying from my live server (and router data) to the local server on my laptop as a backup user, the unencrypted password isn't a problem; it stays on my wired, secured laptop at home. First you need to install sshpass (on CentOS: yum install sshpass), then create a backup user and assign it a temporary password. I listed the -p option in case your SSH port differs from the default.
sshpass -p 'password' rsync -vaurP -e 'ssh -p 2222' backup@???.your.ip.???:/somedir/public_data/temp/ /your/localdata/temp
I understand SSH keys are the better permanent alternative, but this is a quick way to back up and restore on the go. It works if you are not too concerned about security and more concerned about getting your data backed up locally, as in an emergency or data recovery. You can change the backup user's password once the backup is completed. It is much faster to set up when your servers keep changing IPs, users and configuration (routers with non-static IPs, or clients' remote routers and servers that you back up locally and can't always reach over SSH; some of my clients don't even have SSH installed and don't want the hassle of creating public keys, or only grant access on a temporary basis). If you want to do the restore, just reverse the setup: from the same shell, swap the source and target directories and create another backup user with the same temporary password on the target. When finished, delete the backup user or change its password on the target and/or source servers. You can protect this further, as I have done, by replacing the inline password with a one-line password file read by a bash script in a multi-server environment; the -f "/path/to/passwordfile" option keeps the password out of the shell history. Regards
NOTE: If you want to update only modified files, then you should use these parameters: -h -v -r -P -t, as described here: https://unix.stackexchange.com/questions/67539/how-to-rsync-only-new-files
rsync -arv -e \
"sshpass -f '/your/pass.txt' ssh -o StrictHostKeyChecking=no" \
--progress /your/source id@IP:/your/destination
You may have to install sshpass if you don't have it.
Command-line sftp on my Ubuntu doesn't have recursive put implemented. I found some debate from 2004 about implementing such a feature with an -R option switch, so I see some sort of self-made recursion as the only option.
I.e.
iterate through directory listing
cd into directories
mkdir them if nonexistent
put files
I'm planning on doing this with bash, but any other language would suffice.
Rsync or scp is not an option because I don't have shell access to the server, only sftp.
Look at lftp. It's a powerful file transfer client which supports ftp, ftps, http, https, hftp, fish (file transfer over an ssh shell session) and sftp. It has an ftp-like interactive interface, but also lets you specify all commands on the command line. Look at the mput (non-recursive but handles glob patterns) and mirror (a poor man's rsync) commands.
I use it with a server which only handles sftp uploads like this:
lftp -c "open -u $MYUSER,$MYPASSWORD sftp://$TARGET ; mirror -R $SOME_DIRECTORY"
While I think lftp is the best option if it's available, I got stuck on an ancient install of CentOS and needed to do a recursive put via SFTP only. Here's what I did:
find dir -type d -exec echo 'mkdir {}' \; | sftp user@host
find dir -type f -exec echo 'put {} {}' \; | sftp user@host
So basically make sure all the directories exist and then send the files over.
The GUI FTP client FileZilla also supports SFTP, including uploading and downloading whole directories.
My Ubuntu 12.04 comes with put -r in sftp.
On the command line you can do that by using the putty-tools package.
It comes with an sftp replacement called psftp.
It supports mput -r which copies a local directory to the remote recursively.
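A short sketch of driving it non-interactively, where upload.scr and the paths are hypothetical:
psftp -b upload.scr user@host
with upload.scr containing:
cd /remote/target/dir
mput -r localdir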
I guess you can do this with bash, but it's going to be a lot of work. Instead, I suggest having a look at Python and the Chilkat library.
In Java, you can use edtFTPj/PRO, our commercial product, to transfer recursively via SFTP. Alternatively you might want to consider SCP - that generally supports recursion and runs over SSH.
How about sshfs?
Combined, of course, with cp -r.
Or, failing that, rsync -r by itself.
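A quick sketch of the sshfs approach on Linux, with a hypothetical mount point and paths:
mkdir -p ~/mnt/remote
sshfs user@host:/remote/dir ~/mnt/remote    # mount the remote directory locally
cp -r localdir ~/mnt/remote/                # plain recursive copy now works
fusermount -u ~/mnt/remote                  # unmount when done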
After lots of googling and good answers, I used Transmit's syncing for the job. Not a very good solution, but it does the job.
Here is how:
sftp -r <host>
password: <pass>
cd <remote dir> # moves to remote dest dir
put -r localdir/* # creates dir and copies files over
Is it possible to have an SSH session use all your local configuration files (.bash_profile, .vimrc, etc.) on login? That way you would have the same configuration for, say, editing files in vim in the remote session.
I just came across two alternatives to simply doing a git clone of your dotfiles. I take no credit for either of these, and I can't say I've used either extensively, so I don't know whether there are pitfalls.
sshrc
sshrc is a tool (actually just a big bash function) that copies over local rc-files without permanently writing them to the remote user's $HOME - the idea being that it might be a shared admin account that other people use. It appears to be customizable for different remote hosts as well.
.ssh/config and LocalCommand
This blog post suggests a way to automatically run a command when you login to a remote host. It tars and pipes a set of files to the remote, then un-tars them on the remote's $HOME:
Your local ~/.ssh/config would look like this:
Host *
PermitLocalCommand yes
LocalCommand tar c -C${HOME} .bashrc .bash_profile .exports .aliases .inputrc .vimrc .screenrc \
| ssh -o PermitLocalCommand=no %n "tar mx -C${HOME}"
You could modify the above to only run the command on certain hosts (instead of the * wildcard) or customize for different hosts as well. There might be a fair amount of duplication per host with this method - although you could package the whole tar c ... | ssh .. "tar mx .." into a script maybe.
Note the above looks like it clobbers the same files on the remote when you connect, so use with caution.
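For example, a hypothetical wrapper script around the same idea, which pushes a couple of rc files and then opens the session:
#!/bin/bash
# Pass the host as the first argument; the list of rc files is just an example.
host="$1"
tar cf - -C "$HOME" .bashrc .vimrc | ssh -o PermitLocalCommand=no "$host" 'cd && tar xmf -'
exec ssh "$host"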
Use a dotfiles.git repo
What I do is keep all my config files in a dotfiles.git on a central server.
You can set it up so that when you ssh into a remote machine, you automatically pull the latest version of the dotfiles. I do something like this:
ssh myhost
cd ~/dotfiles
git pull --rebase
cd ~
ln -sf dotfiles/$username/linux/.* .
Note:
To put that in a shell script, you can automate the process of executing commands on a remote machine by piping them to ssh (see the sketch after these notes).
The "$username" is there so that you can share your config files with other people you're working with.
The "ln -sf" creates symbolic links to all your dotfiles, overwriting any local ones, such that ~/.emacs is linked to the version controlled file ~/dotfiles/$username/.emacs.
The use of a "linux" subdirectory is just to allow for configuration changes across platforms. I also have a mac directory under dotfiles/$username/mac. Most of the files in the /mac directory are symlinked from the linux directory as it's very similar, but there are some exceptions.
Finally, note that you can make this even more sophisticated with hostnames and the like rather than just a generic 'linux'. With a dotfiles.git, you can also raid dotfiles from your friends, which is awesome -- everyone has their own set of little tricks and hacks.
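A minimal sketch of that automation, assuming the same ~/dotfiles layout as above:
ssh myhost 'bash -s' <<'EOF'
# $username is the same placeholder used in the steps above
cd ~/dotfiles
git pull --rebase
cd ~
ln -sf dotfiles/$username/linux/.* .
EOF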
No, because it's not SSH using your config files, but the remote shell.
I suggest keeping your config files in Subversion or some other VCS. Here's how I do it.
Well, no, because as Andy Lester says, the remote machine is the one doing the work, and it has no access back to your local machine to get .vimrc ...
On the other hand, you could use sshfs to mount the remote file system locally and edit the files locally. This doesn't require you to install anything on the remote machine. Not sure how efficient it is, maybe not great for editing big files over slow links.
Or Komodo IDE has a neat "Open >> Remote File" option which lets you edit files on remote machines by scp-ing them back and forth automatically.
I do this kind of thing every day. I have about 15 bash rc files and .vimrc, a few vim plugin scripts, .screenrc and some other rc files. I have a sync script (written in bash) which uses the cool rsync command to sync all these files to remote servers. Every time I update some files on my main server, I call the script to sync them to the remote servers.
Setting up an svn/git/hg repository on the main server also works for me, but my remote servers need to be repeatedly reinstalled for testing, so I find it more convenient to use rsync.
A few years ago I also used the rdist tool, which can also meet the requirement most of the time. But now I prefer rsync, as it supports incremental sync, which is very efficient.
ssh can be configured to pass certain environment variables through to the remote side. And since most shells check some environment variables for additional settings to apply, you can hack that into applying some local settings remotely. But it's a bit complicated, and most administrators turn off ssh environment variable pass-through in the sshd config anyway.
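A sketch of the mechanism, assuming the server's sshd actually permits it; the variable name is arbitrary:
# client side, in ~/.ssh/config
Host myhost
    SendEnv MY_SETTINGS
# server side, in /etc/ssh/sshd_config (needs admin access and an sshd reload)
AcceptEnv MY_SETTINGS
A remote ~/.bashrc could then check $MY_SETTINGS and apply whatever it encodes.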
You could always just copy the files to the machine before connecting with ssh:
#!/bin/bash
scp ~/.bash_profile ~/.vimrc user@host:
ssh user@host
This works best if you are using keys to login and no one else logs in as that user.
Here's a simple bash script I've used for this purpose. It syncs over some folders I like to have copied using rsync and then adds the ~/bin folder to the remote machine's .bashrc if it's not there already. It works best if you have copied your ssh keys to each server. I use this approach instead of a "dotfiles repo" as lots of the servers I connect to don't have git on them.
So to use it, you'd do something like this:
./bin_sync_to_machine.sh server1
bin_sync_to_machine.sh
function show_help()
{
echo ""
echo "usage: SERVER {SERVER2 SERVER3 etc...}"
echo ""
exit
}
if [ "$1" == "help" ]
then
show_help
fi
if [ -z "$1" ]
then
show_help
fi
# Sync ~/bin and some dot files to remote server using rsync
for SERVER in $*; do
rsync -avrz --progress ~/bin/ -e ssh $SERVER:~/bin
rsync -avrz --progress ~/.vim/ -e ssh $SERVER:~/.vim
rsync -avrz --progress ~/.vimrc -e ssh $SERVER:~/.vimrc
rsync -avrz --progress ~/.aliases $SERVER:~/.aliases
rsync -avrz --progress ~/.aliases $SERVER:~/.bash_aliases
# Ensure remote server has ~/bin in the path
ssh $SERVER '~/bin/path_add_to_path.sh'
done
path_add_to_path.sh
pathadd() {
if [ -d "$1" ] && [[ ":$PATH:" != *":$1:"* ]]; then
PATH="${PATH:+"$PATH:"}$1"
fi
}
# Add to current path if running in a shell
pathadd ~/bin
# Add to ~/.bashrc
if ! grep -q PATH:~/bin ~/.bashrc; then
echo "PATH=\$PATH:~/bin" >> ~/.bashrc
fi
if ! grep -q "source ~/.aliases" ~/.bashrc; then
echo "source ~/.aliases" >> ~/.bashrc
fi
I wrote an extremely simple tool for this that lets you natively transport your .vimrc file whenever you ssh, by using sshd's built-in config options in a non-standard way.
No additional svn, scp, copy/paste, etc. required.
It is simple, lightweight, and works by default on all server configurations I have tested so far.
https://github.com/gWOLF3/viSSHous
I think that https://github.com/fsquillace/kyrat does what you need.
I wrote it a long time ago, before sshrc was born, and it has several benefits compared to sshrc:
It does not depend on xxd on either host (which can be unavailable on the remote host)
Kyrat uses a more efficient encoding algorithm
It is just ~20 lines of code (really easy to understand!)
No need of root access or any installations to the remote host
For instance:
$> echo "alias q=exit" > ~/.config/kyrat/bashrc
$> kyrat myuser#myserver.com
myserver.com $> q
exit