I need to transfer files from a remote Linux server directly to HDFS.
I have a keytab placed on the remote server; after the kinit command it is activated, however I cannot browse the HDFS folders. I know that from edge nodes I can copy files directly to HDFS, but I need to skip the edge node and transfer the files directly to HDFS.
How can we achieve this?
Let's assume a couple of things first. You have one machine on which the external hard drive is mounted (named DISK) and one cluster of machines with ssh access to the master (we denote by master in the command line the user@hostname part of the master machine). You run the script on the machine with the drive. The data on the drive consists of multiple directories with multiple files in each (like 100); the numbers don't matter, it's just to justify the loops. The path to the data will be stored in the ${DIR} variable (on Linux it would be /media/DISK and on Mac OS X /Volumes/DISK). Here is what the script looks like:
DIR=/Volumes/DISK;
for d in $(ls ${DIR}/);
do
for f in $(ls ${DIR}/${d}/);
do
cat ${DIR}/${d}/${f} | ssh master "hadoop fs -put - /path/on/hdfs/${d}/${f}";
done;
done;
Note that we go over each file and we copy it into a specific file because the HDFS API for put requires that "when source is stdin, destination must be a file."
Unfortunately, it takes forever. When I came back the next morning, it had only done a fifth of the data (100GB) and was still running... basically taking 20 minutes per directory! I ended up going forward with the solution of copying the data temporarily onto one of the machines and then copying it locally to HDFS. For space reasons, I did it one folder at a time, deleting the temporary folder immediately after. Here is what the script looks like:
DIR=/Volumes/DISK;
PTH=/path/on/one/machine/of/the/cluster;
for d in $(ls ${DIR}/);
do
scp -r -q ${DIR}/${d} master:${PTH}/
ssh master "hadoop fs -copyFromLocal ${PTH}/${d} /path/on/hdfs/";
ssh master "rm -rf ${PTH}/${d}";
done;
Hope it helps!
I am able to copy a file from remote Perforce server to my local machine using this command
p4 print -o <output_filename> <remote_filename>
However, I have two issues here.
First, I have to manually specify my output filename.
Second, I cannot print out all files and sub-directories within a directory.
What I want to do is copy a specific directory recursively from the Perforce server to a directory on my local machine without creating a workspace on my local machine.
Easy answer: Discard the requirement to not create a workspace on your local machine, and create a workspace on your local machine. It is way easier and faster to create a workspace, sync -p, and delete the workspace than it is to re-implement p4 sync.
p4 --field "View=//depot/path/... //TEMPCLIENT/..." --field "Root=C:\output_path" client -o TEMPCLIENT | p4 client -i
p4 -c TEMPCLIENT sync -p
p4 client -d TEMPCLIENT
Hard answer: Implement your own view mapping logic. Use p4 files //depot/path/... to get the list of files, then use a regex to map each depot file into a local file, then run a p4 print command for each pair. It'd look something like:
p4 -F %depotFile% files //depot/path/... | (some gnarly piece of awk/sed/perl that takes "depotFile" as input and produces "print -o localFile depotFile" as output) | p4 -x - run
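For illustration, here is an untested sketch of that middle step using sed, where //depot/path/ and the local directory /tmp/out are just placeholders to adapt to your situation:
p4 -F %depotFile% files //depot/path/... | sed 's|^//depot/path/\(.*\)$|print -o /tmp/out/\1 //depot/path/\1|' | p4 -x - run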
I'd like to copy files from/to remote server in different directories.
For example, I want to run these 4 commands at once.
scp remote:A/1.txt local:A/1.txt
scp remote:A/2.txt local:A/2.txt
scp remote:B/1.txt local:B/1.txt
scp remote:C/1.txt local:C/1.txt
What is the easiest way to do that?
Copy multiple files from remote to local:
$ scp your_username@remote.edu:/some/remote/directory/\{a,b,c\} ./
Copy multiple files from local to remote:
$ scp foo.txt bar.txt your_username@remotehost.edu:~
$ scp {foo,bar}.txt your_username@remotehost.edu:~
$ scp *.txt your_username@remotehost.edu:~
Copy multiple files from remote to remote:
$ scp your_username@remote1.edu:/some/remote/directory/foobar.txt \
your_username@remote2.edu:/some/remote/directory/
Source: http://www.hypexr.org/linux_scp_help.php
From local to server:
scp file1.txt file2.sh username@ip.of.server.copyto:~/pathtoupload
From server to local (up to OpenSSH v9.0):
scp -T username@ip.of.server.copyfrom:"file1.txt file2.txt" "~/yourpathtocopy"
From server to local (OpenSSH v9.0+):
scp -OT username@ip.of.server.copyfrom:"file1.txt file2.txt" "~/yourpathtocopy"
From man 1 scp:
-O Use the legacy SCP protocol for file transfers instead of the SFTP protocol. Forcing the use of the
SCP protocol may be necessary for servers that do not implement SFTP, for backwards-compatibility for
particular filename wildcard patterns and for expanding paths with a ‘~’ prefix for older SFTP
servers.
HISTORY
Since OpenSSH 9.0, scp has used the SFTP protocol for transfers by default.
You can copy whole directories using the -r switch, so if you can isolate your files into their own directory, you can copy everything at once.
scp -r ./dir-with-files user@remote-server:upload-path
scp -r user@remote-server:path-to-dir-with-files download-path
so for instance
scp -r root@192.168.1.100:/var/log ~/backup-logs
Or if there are just a few of them, you can use:
scp 1.txt 2.txt 3.log user@remote-server:upload-path
As Jiri mentioned, you can use scp -r user@host:/some/remote/path /some/local/path to copy files recursively. This assumes that there's a single directory containing all of the files you want to transfer (and nothing else).
However, SFTP provides an alternative if you want to transfer files from multiple different directories, and the destinations are not identical:
sftp user@host << EOF
get /some/remote/path1/file1 /some/local/path1/file1
get /some/remote/path2/file2 /some/local/path2/file2
get /some/remote/path3/file3 /some/local/path3/file3
EOF
This uses the "here doc" syntax to define a sequence of SFTP input commands. As an alternative, you could put the SFTP commands into a text file and execute sftp user#host -b batchFile.txt
The answers with {file1,file2,file3} work only with bash (on the remote side or locally).
The real way is:
scp user@remote:'/path1/file1 /path2/file2 /path3/file3' /localPath
After playing with scp for a while I have found the most robust solution:
(Beware of the single and double quotation marks)
Local to remote:
scp -r "FILE1" "FILE2" HOST:'"DIR"'
Remote to local:
scp -r HOST:'"FILE1" "FILE2"' "DIR"
Notice that whatever comes after "HOST:" will be sent to the remote and parsed there, so we must make sure it is not processed by the local shell. That is why the single quotation marks come in. The double quotation marks are used to handle spaces in the file names.
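For instance, a concrete remote-to-local example of the pattern above, with made-up file names containing spaces:
scp -r user@host:'"/data/My Report.pdf" "/data/Notes 2021.txt"' "./local dir"
The single quotes keep the local shell from touching the remote argument; the embedded double quotes are what the remote shell sees, so each file name with spaces stays together.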
If files are all in the same directory, we can use * to match them all, such as
scp -r "DIR_IN"/*.txt HOST:'"DIR"'
scp -r HOST:'"DIR_IN"/*.txt' "DIR"
Compared to the "{}" syntax, which is supported only by some shells, this one is universal.
The simplest way is
local$ scp remote:{A/1,A/2,B/3,C/4}.txt ./
So the { .. } list can include directories (A, B and C here are directories; "1.txt" and "2.txt" are file names in those directories).
Although it would copy all these four files into one local directory - not sure if that's what you wanted.
In the above case you will end up with the remote files A/1.txt, A/2.txt, B/3.txt and C/4.txt copied over to a single local directory, with the file names ./1.txt, ./2.txt, ./3.txt and ./4.txt.
Problem: Copying multiple directories from remote server to local machine using a single SCP command and retaining each directory as it is in the remote server.
Solution: SCP can do this easily. This solves the annoying problem of entering the password multiple times when using SCP with multiple folders. Consequently, this also saves a lot of time!
e.g.
# copies folders t1, t2, t3 from `test` to your local working directory
# note that there shouldn't be any space in between the folder names;
# we also escape the braces.
# please note the dot at the end of the SCP command
~$ cd ~/working/directory
~$ scp -r username@contact.server.de:/work/datasets/images/test/\{t1,t2,t3\} .
PS: Motivated by this great answer: scp or sftp copy multiple files with single command
Based on the comments, this also works fine in Git Bash on Windows
You can do it this way:
scp username@serverNameOrServerIp:/path/to/files/\{file1,file2,file3\}.fileExtension ./
This will download all the listed filenames to whatever local directory you're on.
Make sure not to put spaces between the filenames; use only a comma (,) to separate them.
Copy multiple directories:
scp -r dir1 dir2 dir3 admin@127.0.0.1:~/
It's simpler without using scp:
tar cf - file1 ... file_n | ssh user@server 'tar xf -'
This also lets you do things like compress the stream (-C) or (since OpenSSH v7.3) use -J to jump through one (or more) proxy servers.
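For instance, a small sketch combining both options (user, jumphost and server are placeholders):
tar cf - file1 file2 | ssh -C -J user@jumphost user@server 'tar xf -'
This compresses the stream in transit and hops through jumphost before extracting into the destination server's home directory.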
Avoid using passwords by copying your public key to ~/.ssh/authorized_keys (on the server) with ssh-copy-id (on the client).
scp remote:"[A-C]/[12].txt" local:
NOTE: I apologize in advance for answering only a portion of the above question. However, I found these commands to be useful for my current unix needs.
Uploading specific files from a local machine to a remote machine:
~/Desktop/dump_files$ scp file1.txt file2.txt lab1.cpp etc.ext your-user-id@remotemachine.edu:Folder1/DestinationFolderForFiles/
Uploading an entire directory from a local machine to a remote machine:
~$ scp -r Desktop/dump_files your-user-id@remotemachine.edu:Folder1/DestinationFolderForFiles/
Downloading an entire directory from a remote machine to a local machine:
~/Desktop$ scp -r your-user-id@remote.host.edu:Public/web/ Desktop/
In my case, I am restricted to only using the sftp command.
So, I had to use a batch file with sftp. I created a script such as the following. This assumes you are working in the /tmp directory and you want to put the files in destdir_on_remote_system on the remote system. It also only works with a noninteractive login: you need to set up public/private keys so you can log in without entering a password. Change as needed.
#!/bin/bash
cd /tmp
# start script with list of files to transfer
ls -1 fileset1* > batchfile1
ls -1 fileset2* >> batchfile1
sed -i -e 's/^/put /' batchfile1
echo "cd destdir_on_remote_system" > batchfile
cat batchfile1 >> batchfile
rm batchfile1
sftp -b batchfile user@host
In the specific case where all the files share the same base name but have different suffixes (say, the number of a rotated log file), you can use the following:
scp user_name@ip.of.remote.machine:/some/log/folder/some_log_file.* ./
This will copy all files named some_log_file.* from the given folder on the remote machine, i.e. some_log_file.1, some_log_file.2, some_log_file.3, and so on.
In my case there were too many files with unrelated names.
I ended up using:
$ for i in $(ssh remote 'ls ~/dir'); do scp remote:~/dir/$i ./$i; done
1.txt 100% 322KB 1.2MB/s 00:00
2.txt 100% 33KB 460.7KB/s 00:00
3.txt 100% 61KB 572.1KB/s 00:00
$
scp uses ssh for data transfer with the same authentication and provides the same security as ssh.
A best practice here is to implement SSH keys and public key authentication. With this, you can write your scripts without worrying about authentication. Simple as that.
See WHAT IS SSH-KEYGEN
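A minimal sketch of that setup (user@server is a placeholder), run once on the client:
ssh-keygen -t ed25519
ssh-copy-id user@server
ssh user@server 'hostname'
The first command generates a key pair, the second appends the public key to ~/.ssh/authorized_keys on the server, and the third should now log in without prompting for a password.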
serverHomeDir='/home/somepath/ftp/'
backupDirAbsolutePath=${serverHomeDir}'_sqldump_'
backupDbName1='2021-08-27-03-56-somesite-latin2.sql'
backupDbName2='2021-08-27-03-56-somesite-latin1.sql'
backupDbName3='2021-08-27-03-56-somesite-utf8.sql'
backupDbName4='2021-08-27-03-56-somesite-utf8mb4.sql'
scp -i ~/.ssh/id_rsa user@server.domain.com:${backupDirAbsolutePath}/"{$backupDbName1,$backupDbName2,$backupDbName3,$backupDbName4}" .
. - at the end will download the files to the current dir
-i ~/.ssh/id_rsa - assuming you have set up key-based ssh access to your server (the matching .pub key is in its authorized_keys)
scp -r root@ip-address:/root/dir/ C:\Users\your-name\Downloads\
The -r will let you download all the files inside the dir directory of your remote server.
I am writing a shell script to put data into hadoop as soon as it is generated. I can ssh to my master node, copy the files to a folder over there and then put them into hadoop. I am looking for a shell command to get rid of copying the file to the local disk on the master node. To better explain what I need, below you can find what I have so far:
1) copy the file to the master node's local disk:
scp test.txt username@masternode:/folderName/
I have already setup SSH connection using keys. So no password is needed to do this.
2) I can use ssh to remotely execute the hadoop put command:
ssh username@masternode "hadoop dfs -put /folderName/test.txt hadoopFolderName/"
What I am looking for is how to pipe/combine these two steps into one and skip the local copy of the file on masterNode's local disk.
Thanks.
In other words, I want to pipe several commands together so that I can avoid the intermediate copy on the master node.
Try this (untested):
cat test.txt | ssh username@masternode "hadoop dfs -put - hadoopFoldername/test.txt"
I've used similar tricks to copy directories around:
tar cf - . | ssh remote "(cd /destination && tar xvf -)"
This sends the output of local-tar into the input of remote-tar.
Is the node on which you generate the data able to reach each of your cluster nodes (the NameNode and all the DataNodes)?
If you do have data connectivity then you can just execute the hadoop fs -put command from the machine where the data is generated (assuming you have the hadoop binaries installed there too):
#> hadoop fs -fs masternode:8020 -put test.bin hadoopFolderName/
Hadoop provides a couple of REST interfaces. Check Hoop and WebHDFS. Using them from non-Hadoop environments, you should be able to copy the file into HDFS without staging it on the master.
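As an untested sketch of the WebHDFS flow (host names, port and user.name are placeholders; the HTTP port differs between Hadoop versions), creating a file is a two-step PUT:
# step 1: ask the NameNode where to write; it replies with a 307 redirect
curl -i -X PUT "http://namenode:50070/webhdfs/v1/path/on/hdfs/test.txt?op=CREATE&user.name=someuser"
# step 2: upload the data to the DataNode URL returned in the Location header of step 1
curl -i -X PUT -T test.txt "LOCATION_URL_FROM_STEP_1"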
Create a pipe and then do the transfer through the pipe. This way the file is not stored locally.
mkfifo transfer_pipe
scp user@remote:remote_file transfer_pipe &
hdfs dfs -put transfer_pipe <hdfs_path>
(untested)
Since the node where you create your data has access to the internet, you could perhaps install the Hadoop client software and add the node to the cluster temporarily. After a normal hadoop fs -put, disconnect and remove your temporary node; the Hadoop system will then automatically replicate your file blocks inside the cluster.
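A rough, untested sketch of that idea, assuming the Hadoop binaries and the cluster's configuration files are already in place on the data-generating machine (Hadoop 3.x daemon syntax; Hadoop 2.x uses hadoop-daemon.sh start datanode):
# join the cluster as a temporary DataNode
hdfs --daemon start datanode
# write the data; HDFS replicates the blocks to the other DataNodes
hadoop fs -put test.bin hadoopFolderName/
# leave the cluster again once the copy is done
hdfs --daemon stop datanode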