rsync: how to copy only the latest file from target to source - unix

We have a main Linux server, say M, where we have files like below (for 2 months, and new files arriving daily)
Folder1
PROCESS1_20211117.txt.gz
PROCESS1_20211118.txt.gz
..
..
PROCESS1_20220114.txt.gz
PROCESS1_20220115.txt.gz
We want to copy only the latest file to our processing server, say P.
Until now, we have been using the command below on our processing server:
rsync --ignore-existing -azvh -rpgoDe ssh user@M:${TargetServerPath}/${PROCSS_NAME}_*txt.gz ${SourceServerPath}
This worked fine until now, but going forward we can keep files on the processing server for only up to 3 days, whereas the main server keeps files for 2 months.
So when we remove the older files from the processing server, the rsync command copies all the files from the main server back to the processing server.
How can I change the rsync command to copy only the latest file from the main server?
*Note: the example above is only for one file. We have multiple files on which we have to use the same command. Hence we cannot hardcode any filename.
What I tried:
There are multiple solutions, but they all seem to cover copying the latest file from the server I am running rsync on, not from the remote server.
I also tried running the command below to get the latest file from the main server, but I cannot pass variables to SSH in my company, as it is not allowed. So the command below works if I pass an individual path/file name, but it does not work with variables.
ssh M 'ls -1 ${TargetServerPath}/${PROCSS_NAME}_*txt.gz|tail -1'
Would really appreciate any suggestions on how to implement this solution.
OS: Linux 3.10.0-1160.31.1.el7.x86_64

ssh quoting is confusing: the command line is parsed again by the remote shell, so to quote it properly you have to quote it twice, and the outer quoting happens locally.
The handy printf %q trick helps here: use it to quote the relevant parts.
file=$(
ssh M "ls -1 $(printf "%q" "${getServerPath}/${PROCSS_NAME}")_*.txt.gz" |
tail -1
)
rsync --ignore-existing -azvh -rpgoDe ssh user@M:"$file" "${SourceServerPath}"
Or, maybe nicer, run tail -n1 on the remote side so that the minimum amount of data is transferred (we only need one filename, not all of them), invoke an explicit shell and pass the variables as shell arguments:
file=$(ssh M "$(printf "%q " bash -c \
'ls -1 "$1"_*.txt.gz | tail -n1' \
'_' "${TargetServerPath}/${PROCSS_NAME}"
)")
Overall, I recommend writing a function and using declare -f:
sshqfunc() { echo "bash -c $(printf "%q" "$(declare -f "$1"); $1 \"\$@\"")"; };
work() {
ls -1 "$1"_*txt.gz | tail -1
}
tmp=$(ssh M "$(sshqfunc work)" _ "${TargetServerPath}/${PROCSS_NAME}")
Or you can also use the mighty declare to transfer variables to the remote side, then run your command inside single quotes:
ssh M "
$(declare -p TargetServerPath PROCSS_NAME);
"'
ls -1 ${TargetServerPath}/${PROCSS_NAME}_*txt.gz | tail -1
'
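Whichever variant you use, the captured filename can then be handed to a single-file rsync call. A rough sketch reusing the question's paths and a simplified option set (untested):
file=$(ssh M "$(sshqfunc work)" _ "${TargetServerPath}/${PROCSS_NAME}")
rsync --ignore-existing -azvh -e ssh "user@M:${file}" "${SourceServerPath}"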

Related

SCP issue with multiple files - UNIX

I am getting an error when copying multiple files. The command below copies only the first file and gives an error for the rest of the files. Can someone please help me out?
Command:
scp $host:$(ssh -n $host "find /incoming -mmin -120 -name 2018*") /incoming/
Result:
user@host:~/scripts/OTA$ scp $host:$(ssh -n $host "find /incoming -mmin -120 -name 2018*") /incoming/
Password:
Password:
2018084session_event 100% |**********************************************************************************************************| 9765 KB 00:00
cp: cannot access /incoming/2018084session_event_log.195-10.45.40.9
cp: cannot access /incoming/2018084session_event_log.195-10.45.40.9_2_3
Your command uses Command Substitution to generate a list of files. Your assumption is that there is some magic in the "source" notation for scp that would cause multiple members of the list generated by your find command to be assumed to live on $host, when in fact your command might expand into something like:
scp remotehost:/incoming/someoldfile anotheroldfile /incoming
Only the first file is being copied from $host, because none of the rest include $host: at the beginning of the path. They're not found in your local /incoming directory, hence the error.
Oh, and in addition, you haven't escaped the asterisk in the find command, so 2018* may expand to multiple files that are in the login directory for the user in question. I can't tell from here; it depends on your OS and shell configuration.
I should point out that you are providing yet another example of the classic Parsing LS problem. Special characters WILL break your command. The "better" solution usually offered for this problem tends to be to use a for loop, but that's not really what you're looking for. Instead, I'd recommend making a tar of the files you're looking for. Something like this might do:
ssh "$host" "find /incoming -mmin -120 -name 2018\* -exec tar -cf - {} \+" |
tar -xvf - -C /incoming
What does this do?
ssh runs a remote find command with your criteria.
find feeds the list of filenames (regardless of special characters) to a tar command as options.
The tar command sends its result to stdout (-f -).
That output is then piped into another tar running on your local machine, which extracts the stream.
If your tar doesn't support -C, you can either remove it and run a cd /incoming before the ssh, or you might be able to replace that pipe segment with a curly-braced command: { cd /incoming && tar -xvf -; }
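Put together, that variant would look something like this (same find criteria as above; untested):
ssh "$host" "find /incoming -mmin -120 -name 2018\* -exec tar -cf - {} \+" |
{ cd /incoming && tar -xvf -; }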
The curly brace notation assumes a POSIX-like shell (bash, zsh, etc). The rest of this should probably work equally well in csh if that's what you're stuck with.
Limited warranty: Best Effort Only. Untested on animals or computers. Your mileage may vary. May contain nuts.
If this doesn't work for you, poke at it until it does.

Run multiple instances of RStudio in a web browser

I have RStudio server installed on a remote aws server (ubuntu) and want to run several projects at the same time (one of which takes lots of time to finish). On Windows there is a simple GUI solution like 'Open Project in New Window'. Is there something similar for rstudio server?
Simple question, but I failed to find a solution, except this related question for Macs, which offers
Run multiple rstudio sessions using projects
but how?
While running batch scripts is certainly a good option, it's not the only solution. Sometimes you may still want interactive use in different sessions rather than having to do everything as batch scripts.
Nothing stops you from running multiple instances of RStudio Server on your Ubuntu server on different ports. (I find this particularly easy to do by launching RStudio through docker, as outlined here.) Because an instance will keep running even when you close the browser window, you can easily launch several instances and switch between them. You'll just have to log in again when you switch.
Unfortunately, RStudio Server still prevents you from having multiple instances open in the browser at the same time (see the help forum). This is not a big issue, as you just have to log in again, but you can work around it by using different browsers.
EDIT: Multiple instances are fine, as long as they are not on the same browser, same browser-user AND on the same IP address. e.g. a session on 127.0.0.1 and another on 0.0.0.0 would be fine. More importantly, the instances keep on running even if they are not 'open', so this really isn't a problem. The only thing to note about this is you would have to log back in to access the instance.
As for projects, you'll see you can switch between projects using the 'Projects' button on the top right, but while this will preserve your other sessions, I do not think it actually supports simultaneous code execution. You need multiple instances of the R environment running to actually do that.
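For example, with the docker approach mentioned above, something along these lines would give you two independent instances on ports 8787 and 8788 (the image name and password are just one possible setup):
# two independent RStudio Server containers, reachable on different host ports
docker run -d -p 8787:8787 -e PASSWORD=yourpassword --name rstudio1 rocker/rstudio
docker run -d -p 8788:8787 -e PASSWORD=yourpassword --name rstudio2 rocker/rstudio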
UPDATE 2020: Okay, it's now 2020 and there are lots of ways to do this.
For running scripts or functions in a new R environment, check out:
the callr package
The RStudio jobs panel
Run new R sessions or scripts from one or more terminal sessions in the RStudio terminal panel
Log out and log in to the RStudio server as a different user (requires multiple users to be set up in the container; obviously not a good workflow for a single user, but just noting that many different users can access the same RStudio Server instance without problems).
Of course, spinning up multiple docker sessions on different ports is still a good option as well. Note that many of the ways listed above still do not allow you to restart the main R session, which prevents you from reloading installed packages, switching between projects, etc, which is clearly not ideal. I think it would be fantastic if switching between projects in an RStudio (server) session would allow jobs in the previously active project to keep running in the background, but have no idea if that's in the cards for the open source version.
Often you don't need several instances of RStudio - in this case just save your code in a .R file and launch it from the Ubuntu command prompt (maybe using screen):
Rscript script.R
That will launch a separate R session which will do the work without freezing your Rstudio. You can pass arguments too, for example
# script.R -
args <- commandArgs(trailingOnly = TRUE)
if (length(args) == 0) {
start = '2015-08-01'
} else {
start = args[1]
}
console -
Rscript script.R 2015-11-01
I think you need R Studio Server Pro to be able to log in with multiple users/sessions.
You can see the comparison table below for reference.
https://www.rstudio.com/products/rstudio-server-pro/
Installing another instance of rstudio server is less than ideal.
Linux server admins, fear not. You just need root access or a kind admin.
Create a group to use: groupadd Rwarrior
Create an additional user with the same home directory as your primary RStudio login:
useradd -d /home/user1 user2
Add primary and new user into Rwarrior group:
gpasswd -a user2 Rwarrior
gpasswd -a user1 Rwarrior
Take care of the permissions for your primary home directory:
cd /home
chown -R user1:Rwarrior /home/user1
chmod -R 770 /home/user1
chmod g+s /home/user1
Set password for the new user:
passwd user2
Open a new browser window in incognito/private browsing mode and login to Rstudio with the new user you created. Enjoy.
I run multiple RStudio servers by isolating them in Singularity instances. Download the Singularity image with the command singularity pull shub://nickjer/singularity-rstudio
I use two scripts:
run-rserver.sh:
Find a free port
#!/bin/env bash
set -ue
thisdir="$(dirname "${BASH_SOURCE[0]}")"
# Return 0 if the port $1 is free, else return 1
is_port_free(){
port="$1"
set +e
netstat -an |
grep --color=none "^tcp.*LISTEN\s*$" | \
awk '{gsub("^.*:","",$4);print $4}' | \
grep -q "^$port\$"
r="$?"
set -e
if [ "$r" = 0 ]; then return 1; else return 0; fi
}
# Find a free port
find_free_port(){
local lower_port="$1"
local upper_port="$2"
for ((port=lower_port; port <= upper_port; port++)); do
if is_port_free "$port"; then r=free; else r=used; fi
if [ "$r" = "used" -a "$port" = "$upper_port" ]; then
echo "Ports $lower_port to $upper_port are all in use" >&2
exit 1
fi
if [ "$r" = "free" ]; then break; fi
done
echo $port
}
port=$(find_free_port 8080 8200)
echo "Access RStudio Server on http://localhost:$port" >&2
"$thisdir/cexec" \
rserver \
--www-address 127.0.0.1 \
--www-port $port
cexec:
Create a dedicated config directory for each instance
Create a dedicated temporary directory for each instance
Use the singularity instance mechanism to avoid forked R sessions being adopted by PID 1 and staying around after the rserver has shut down. Instead, they become children of the Singularity instance and are killed when that shuts down.
Map the current directory to the directory /data inside the container and set that as the home folder (this step might not be necessary if you don't care about reproducible paths on every machine)
#!/usr/bin/env bash
# Execute a command in the container
set -ue
if [ "${1-}" = "--help" ]; then
cat <<'EOF'
Usage: cexec command [args...]
Execute `command` in the container. This script starts the Singularity
container and executes the given command therein. The project root is mapped
to the folder `/data` inside the container. Moreover, a temporary directory
is provided at `/tmp` that is removed after the end of the script.
EOF
exit 0
fi
thisdir="$(dirname "${BASH_SOURCE[0]}")"
container="rserver_200403.sif"
# Create a temporary directory
tmpdir="$(mktemp -d -t cexec-XXXXXXXX)"
# We delete this directory afterwards, so it's important that $tmpdir
# really holds the path to an empty, temporary dir, and nothing else
# (for example, not an empty string or the home dir)!
if [[ ! "$tmpdir" || ! -d "$tmpdir" ]]; then
echo "Error: Could not create temp dir $tmpdir"
exit 1
fi
# check if temp dir is empty (this might be superfluous, see
# https://codereview.stackexchange.com/questions/238439)
tmpcontent="$(ls -A "$tmpdir")"
if [ ! -z "$tmpcontent" ]; then
echo "Error: Temp dir '$tmpdir' is not empty"
exit 1
fi
# Start Singularity instance
instancename="$(basename "$tmpdir")"
# Maybe also superfluous (like above)
rundir="$(readlink -f "$thisdir/.run/$instancename")"
if [ -e "$rundir" ]; then
echo "Error: Runtime directory '$rundir' exists already!" >&2
exit 1
fi
mkdir -p "$rundir"
singularity instance start \
--contain \
-W "$tmpdir" \
-H "$thisdir:/data" \
-B "$rundir:/data/.rstudio" \
-B "$thisdir/.rstudio/monitored/user-settings:/data/.rstudio/monitored/user-settings" \
"$container" \
"$instancename"
# Delete the temporary directory after the end of the script
trap "singularity instance stop '$instancename'; rm -rf '$tmpdir'; rm -rf '$rundir'" EXIT
singularity exec \
--pwd "/data" \
"instance://$instancename" \
"$#"

Using file locks with rsync

From the rsync manual documentation I see that by using the option rsync-path, it is possible to specify what program is to be run on the remote machine to start up rsync. In particular, the program could be a wrapper script which calls the actual rsync command in the middle, but which does some actions before and/or after the rsync invocation. One possible interesting use would be to acquire/release a lock (e.g., a flock), so that the operations of rsync at the remote end could be co-ordinated with another process at the far end which is contending for write access to the same files. There could be multiple rsync processes simultaneously holding the shared lock (I am aware of potential for starvation but am not concerned about that right now). The 'writer' process I'm dealing with would just be changing a few hard-links, so it would not block the rsync process for any significant lengh of time.
I have looked at other co-ordination approaches, e.g., implementing a custom remote locking protocol between the client and server, but they all involve more development work and/or are unsatisfactory for other reasons, which is why I am interested in the wrapper/(f)lock approach.
My questions are:
1) Is this a reasonable way to solve the problem of co-ordinating rsync 'readers' with another, 'writer' process accessing the same directory?
2) Can you also put a wrapper around rsync when using the inetd (or xinetd) daemon approach to running rsync, by adding a line something like the following to /etc/inetd.conf (as per the rsyncd.conf man page):
rsync stream tcp nowait root /usr/bin/rsync rsyncd --daemon
but replacing /usr/bin/rsync with the path to your rsync-lookalike wrapper, which in this case would be a C/C++ program that seizes a lock, forks off rsync, waits for rsync to complete, then releases the lock.
Thanks,
Tom
One potential catch with the wrapper approach: the remote process seems to be called with extra arguments, which are appended to whatever command line you specify with --rsync-path. So if you need to pass arguments, something like the following style is needed.
#! /bin/sh
lock_target=$1
shift
if ! lockfile ${lock_target}.lock ; then exit 1 ; fi
trap "rm -f ${lock_target}.lock" EXIT HUP TERM INT
/usr/bin/rsync "$@"
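On the client side you would then point --rsync-path at the wrapper and pass the lock target as its first argument; a hypothetical invocation (the wrapper path, lock file and paths are made up):
rsync -a --rsync-path='/usr/local/bin/rsync-locked /var/lock/incoming' user@remotehost:/srv/data/ /local/mirror/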
Thanks to the question and the comments. Armed with your ideas I solved it (for me) using --rsync-path, but without any wrapper scripts on the remote host, simply by putting the whole payload script into --rsync-path, with a few tricks.
This particular example uses rsync to pull data from a remote host while holding a flock on the remote host; e.g., the remote host dumps data periodically while also holding the flock, so dump and pull must not be interleaved.
Points to note
rsync will append its arguments to the end of whatever command you specify in --rsync-path, so the command needs to cope with that; for that I rely on bash shell features on both the pulling and remote hosts.
Any pre- and post-processing on the remote host must not write to STDOUT, because that would corrupt the rsync protocol and rsync would bail out. Any error output should go to STDERR, and it will turn up on the pulling host as rsync STDERR output. This is why there is '1>&2' in all the error handling.
This probably relies on the remote command spawned by rsync being run by bash, because I think good old sh does not support arrays. It works for me between RHEL7 boxes. A possible workaround is proposed at the end.
With that in mind, here is my simplified, concept-only rehash (I've not run this particular script; my full solution has extra layers that distract attention from the main point).
The script on the pulling host:
#!/bin/bash
function rsync_wrap() {
{
flock --exclusive --timeout ${LOCK_TIMEOUT} 100 || {
echo "Failed to lock: ${LOCK_TIMEOUT}" 1>&2
return 1
}
# call real rsync with original arguments
rsync "$#"
exit_code=$?
if [ ${exit_code} -eq 0 ]; then
# Do clean up when success, e.g.:
# rm -f "${LOCK_FILE}"
# rm -rf /eg/purge/data
: # no-op so the branch is not empty (bash does not allow an empty then/else)
else
# Do clean up when failed
: # no-op placeholder
fi
# Note, return is important, do not let it fall out
return ${exit_code}
} 100<"${LOCK_FILE}"
echo "Failed to open lock file: ${LOCK_FILE}" 1>&2
return 1
}
# Define vars
LOCK_FILE=/var/somedir/name.lock; # or /dev/shm/name.lock
LOCK_TIMEOUT=600; #in seconds
# Build remote command, define vars and functions inside the command
remote_cmd="
# this approach deals with crazy chars in variables and function code
$( declare -p LOCK_FILE )
$( declare -p LOCK_TIMEOUT )
$( declare -f rsync_wrap )
rsync_wrap "
local_cmd=(
rsync
-a
--rsync-path="${remote_cmd}"
# I want to handle network timeouts in SSH, not in rsync,
# because rsync does not know that waiting for lock is expected
-e "ssh -o BatchMode=yes -o ServerAliveCountMax=3 -o ServerAliveInterval=30 ${IDENTITY_FILE:+ -i '${IDENTITY_FILE}'}"
/remote/source/path
/local/destination/path/
)
# Do it
"${local_cmd[#]}"
If remote side executes --rsync-path in something other than bash then maybe the whole remote command could be wrapped in something like:
local_cmd="bash -c '${local_cmd//\'/\'\\\'\'}'"
As per the comments to the original post, it is indeed feasible to use the wrapper approach to implement (f)locks around rsync at the server end.

scp or sftp copy multiple files with single command

I'd like to copy files from/to remote server in different directories.
For example, I want to run these 4 commands at once.
scp remote:A/1.txt local:A/1.txt
scp remote:A/2.txt local:A/2.txt
scp remote:B/1.txt local:B/1.txt
scp remote:C/1.txt local:C/1.txt
What is the easiest way to do that?
Copy multiple files from remote to local:
$ scp your_username@remote.edu:/some/remote/directory/\{a,b,c\} ./
Copy multiple files from local to remote:
$ scp foo.txt bar.txt your_username@remotehost.edu:~
$ scp {foo,bar}.txt your_username@remotehost.edu:~
$ scp *.txt your_username@remotehost.edu:~
Copy multiple files from remote to remote:
$ scp your_username@remote1.edu:/some/remote/directory/foobar.txt \
your_username@remote2.edu:/some/remote/directory/
Source: http://www.hypexr.org/linux_scp_help.php
From local to server:
scp file1.txt file2.sh username@ip.of.server.copyto:~/pathtoupload
From server to local (up to OpenSSH v9.0):
scp -T username@ip.of.server.copyfrom:"file1.txt file2.txt" "~/yourpathtocopy"
From server to local (OpenSSH v9.0+):
scp -OT username@ip.of.server.copyfrom:"file1.txt file2.txt" "~/yourpathtocopy"
From man 1 scp:
-O Use the legacy SCP protocol for file transfers instead of the SFTP protocol. Forcing the use of the
SCP protocol may be necessary for servers that do not implement SFTP, for backwards-compatibility for
particular filename wildcard patterns and for expanding paths with a ‘~’ prefix for older SFTP
servers.
HISTORY
Since OpenSSH 9.0, scp has used the SFTP protocol for transfers by default.
You can copy whole directories using the -r switch, so if you can isolate your files into their own directory, you can copy everything at once.
scp -r ./dir-with-files user@remote-server:upload-path
scp -r user@remote-server:path-to-dir-with-files download-path
so for instance
scp -r root@192.168.1.100:/var/log ~/backup-logs
Or if there are just a few of them, you can use:
scp 1.txt 2.txt 3.log user@remote-server:upload-path
As Jiri mentioned, you can use scp -r user@host:/some/remote/path /some/local/path to copy files recursively. This assumes that there's a single directory containing all of the files you want to transfer (and nothing else).
However, SFTP provides an alternative if you want to transfer files from multiple different directories, and the destinations are not identical:
sftp user@host << EOF
get /some/remote/path1/file1 /some/local/path1/file1
get /some/remote/path2/file2 /some/local/path2/file2
get /some/remote/path3/file3 /some/local/path3/file3
EOF
This uses the "here doc" syntax to define a sequence of SFTP input commands. As an alternative, you could put the SFTP commands into a text file and execute sftp user#host -b batchFile.txt
The answers with {file1,file2,file3} work only with bash (remotely or locally).
The real way is:
scp user@remote:'/path1/file1 /path2/file2 /path3/file3' /localPath
After playing with scp for a while I have found the most robust solution:
(Beware of the single and double quotation marks)
Local to remote:
scp -r "FILE1" "FILE2" HOST:'"DIR"'
Remote to local:
scp -r HOST:'"FILE1" "FILE2"' "DIR"
Notice that whatever comes after "HOST:" will be sent to the remote and parsed there. So we must make sure it is not processed by the local shell. That is why the single quotation marks come in. The double quotation marks are used to handle spaces in the file names.
If files are all in the same directory, we can use * to match them all, such as
scp -r "DIR_IN"/*.txt HOST:'"DIR"'
scp -r HOST:'"DIR_IN"/*.txt' "DIR"
Compared to the "{}" syntax, which is supported only by some shells, this one is universal.
The simplest way is
local$ scp remote:{A/1,A/2,B/3,C/4}.txt ./
So the {..} list can include directories (A, B and C here are directories; "1.txt" and "2.txt" are file names in those directories).
Although it would copy all these four files into one local directory - not sure if that's what you wanted.
In the above case you will end up with the remote files A/1.txt, A/2.txt, B/3.txt and C/4.txt copied over to a single local directory, with file names ./1.txt, ./2.txt, ./3.txt and ./4.txt.
Problem: Copying multiple directories from remote server to local machine using a single SCP command and retaining each directory as it is in the remote server.
Solution: SCP can do this easily. This solves the annoying problem of entering password multiple times when using SCP with multiple folders. Consequently, this also saves a lot of time!
e.g.
# copies folders t1, t2, t3 from `test` to your local working directory
# note that there shouldn't be any space in between the folder names;
# we also escape the braces.
# please note the dot at the end of the SCP command
~$ cd ~/working/directory
~$ scp -r username@contact.server.de:/work/datasets/images/test/\{t1,t2,t3\} .
PS: Motivated by this great answer: scp or sftp copy multiple files with single command
Based on the comments, this also works fine in Git Bash on Windows
You can do this way:
scp hostname@serverNameOrServerIp:/path/to/files/\{file1,file2,file3\}.fileExtension ./
This will download all the listed filenames to whatever local directory you're on.
Make sure not to put spaces between the filenames; only use a comma (,).
Copy multiple directories:
scp -r dir1 dir2 dir3 admin@127.0.0.1:~/
It is simpler without using scp:
tar cf - file1 ... file_n | ssh user@server 'tar xf -'
This also lets you do things like compress the stream (-C) or (since OpenSSH v7.3) use -J to jump through one (or more) proxy servers.
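For instance, combining both options might look like this (jumphost and the file names are placeholders):
tar cf - file1 file2 | ssh -C -J user@jumphost user@server 'tar xf -'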
Avoid using passwords by copying your public key to ~/.ssh/authorized_keys (on the server) with ssh-copy-id (on the client).
Posted also here (with more details) and here.
scp remote:"[A-C]/[12].txt" local:
NOTE: I apologize in advance for answering only a portion of the above question. However, I found these commands to be useful for my current unix needs.
Uploading specific files from a local machine to a remote machine:
~/Desktop/dump_files$ scp file1.txt file2.txt lab1.cpp etc.ext your-user-id@remotemachine.edu:Folder1/DestinationFolderForFiles/
Uploading an entire directory from a local machine to a remote machine:
~$ scp -r Desktop/dump_files your-user-id@remotemachine.edu:Folder1/DestinationFolderForFiles/
Downloading an entire directory from a remote machine to a local machine:
~/Desktop$ scp -r your-user-id@remote.host.edu:Public/web/ Desktop/
In my case, I am restricted to only using the sftp command.
So, I had to use a batchfile with sftp. I created a script such as the following. This assumes you are working in the /tmp directory, and you want to put the files in the destdir_on_remote_system on the remote system. This also only works with a noninteractive login. You need to set up public/private keys so you can login without entering a password. Change as needed.
#!/bin/bash
cd /tmp
# start script with list of files to transfer
ls -1 fileset1* > batchfile1
ls -1 fileset2* >> batchfile1
sed -i -e 's/^/put /' batchfile1
echo "cd destdir_on_remote_system" > batchfile
cat batchfile1 >> batchfile
rm batchfile1
sftp -b batchfile user@host
In the specific case where all the files have the same extension but with different suffix (say number of log file) you use the following:
scp user_name@ip.of.remote.machine:/some/log/folder/some_log_file.* ./
This will copy all files named some_log_file from the given folder on the remote, i.e. some_log_file.1, some_log_file.2, some_log_file.3, ...
In my case there were too many files with unrelated names.
I ended up using:
$ for i in $(ssh remote 'ls ~/dir'); do scp remote:~/dir/$i ./$i; done
1.txt 100% 322KB 1.2MB/s 00:00
2.txt 100% 33KB 460.7KB/s 00:00
3.txt 100% 61KB 572.1KB/s 00:00
$
scp uses ssh for data transfer with the same authentication and provides the same security as ssh.
A best practice here is to implement "SSH KEYS AND PUBLIC KEY AUTHENTICATION". With this, you can write your scripts without worrying about authentication. Simple as that.
See WHAT IS SSH-KEYGEN
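In practice that is just two commands on the client (the server name is a placeholder):
ssh-keygen -t ed25519            # generate a key pair; accept the defaults
ssh-copy-id user@remote-server   # install the public key on the server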
serverHomeDir='/home/somepath/ftp/'
backupDirAbsolutePath=${serverHomeDir}'_sqldump_'
backupDbName1='2021-08-27-03-56-somesite-latin2.sql'
backupDbName2='2021-08-27-03-56-somesite-latin1.sql'
backupDbName3='2021-08-27-03-56-somesite-utf8.sql'
backupDbName4='2021-08-27-03-56-somesite-utf8mb4.sql'
scp -i ~/.ssh/id_rsa.pub user@server.domain.com:${backupDirAbsolutePath}/"{$backupDbName1,$backupDbName2,$backupDbName3,$backupDbName4}" .
. - at the end will download the files to current dir
-i ~/.ssh/id_rsa.pub - assuming that you established ssh to your server with .pub key
scp -r root@ip-address:/root/dir/ C:\Users\your-name\Downloads\
the -r will let you download all the files inside the dir directory of your remote server

Most powerful examples of Unix commands or scripts every programmer should know

There are many things that all programmers should know, but I am particularly interested in the Unix/Linux commands that we should all know. For accomplishing tasks that we may come up against at some point such as refactoring, reporting, network updates etc.
The reason I am curious is because, having previously worked as a software tester at a software company while studying for my degree, I noticed that all of the developers (who were developing Windows software) had 2 computers.
To their left was their Windows XP development machine, and to the right was a Linux box. I think it was Ubuntu. Anyway they told me that they used it because it provided powerful unix operations that Windows couldn't do in their development process.
This makes me curious to know, as a software engineer what do you believe are some of the most powerful scripts/commands/uses that you can perform on a Unix/Linux operating system that every programmer should know for solving real world tasks that may not necessarily relate to writing code?
We all know what sed, awk and grep do. I am interested in some actual Unix/Linux scripting pieces that have solved a difficult problem for you, so that other programmers may benefit. Please provide your story and source.
I am sure there are numerous examples like this that people keep in their 'Scripts' folder.
Update: People seem to be misinterpreting the question. I am not asking for the names of individual unix commands, rather UNIX code snippets that have solved a problem for you.
Best answers from the Community
Traverse a directory tree and print out paths to any files that match a regular expression:
find . -exec grep -l -e 'myregex' {} \; >> outfile.txt
Invoke the default editor (Nano/Vim)
(works on most Unix systems including Mac OS X)
The default editor is whatever your EDITOR environment variable is set to, e.g. export EDITOR=/usr/bin/pico, set in ~/.profile under Mac OS X.
Ctrl+x Ctrl+e
List all running network connections (including which app they belong to)
lsof -i -nP
Clear the terminal's command history (another of my favourites)
history -c
I find commandlinefu.com to be an excellent resource for various shell scripting recipes.
Examples
Common
# Run the last command as root
sudo !!
# Rapidly invoke an editor to write a long, complex, or tricky command
ctrl-x ctrl-e
# Execute a command at a given time
echo "ls -l" | at midnight
Esoteric
# output your microphone to a remote computer's speaker
dd if=/dev/dsp | ssh -c arcfour -C username@host dd of=/dev/dsp
How to exit VI
:wq
Saves the file and ends the misery.
Alternative of ":wq" is ":x" to save and close the vi editor.
grep
awk
sed
perl
find
A lot of Unix power comes from its ability to manipulate text files and filter data. Of course, you can get all of these commands for Windows. They are just not native in the OS, like they are in Unix.
And the ability to chain commands together with pipes, etc. This can create extremely powerful one-line commands from simple tools.
Your shell is the most powerful tool you have available
being able to write simple loops etc
understanding file globbing (e.g. *.java etc.)
being able to put together commands via pipes, subshells, redirection, etc.
Having that level of shell knowledge allows you to do enormous amounts on the command line, without having to record info via temporary text files, copy/paste etc., and to leverage off the huge number of utility programs that permit slicing/dicing of data.
Unix Power Tools will show you so much of this. Every time I open my copy I find something new.
I use this so much I am actually ashamed of myself. Remove spaces from all filenames and replace them with an underscore:
[removespaces.sh]
#!/bin/bash
find . -type f -name "* *" | while IFS= read -r file
do
mv "$file" "${file// /_}"
done
My personal favorite is the lsof command.
"lsof" can be used to list opened file descriptors, sockets, and pipes.
I find it extremely useful when trying to figure out which processes have used which ports/files on my machine.
Example: List all internet connections without hostname resolution and without port to port name conversion.
lsof -i -nP
http://www.manpagez.com/man/8/lsof/
If you make a typo in a long command, you can rerun the command with a substitution (in bash):
mkdir ~/aewseomeDirectory
you can see that "awesome" is mispelled, you can type the following to re run the command with the typo corrected
^aew^awe
it then outputs what it substituted (mkdir ~/aweseomeDirectory) and runs the command. (don't forget to undo the damage you did with the incorrect command!)
The tr command is the most under-appreciated command in Unix:
#Convert all input to upper case
ls | tr a-z A-Z
#take the output and put into a single line
ls | tr "\n" " "
#get rid of all numbers
ls -lt | tr -d 0-9
When solving problems on faulty Linux boxes, by far the most common key sequence I end up typing is Alt+SysRq R E I S U B.
The power of these tools (grep, find, awk, sed) comes from their versatility, so giving a particular case seems quite useless.
man is the most powerful command, because then you can understand what you type instead of just blindly copy-pasting from Stack Overflow.
Examples are welcome, but there are already topics for this.
My most used:
grep something_to_find * -R
which can be replaced by ack and
find | xargs
find with results piped into xargs can be very powerful
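A typical instance of that pattern, with a made-up file pattern and search string:
# list every .java file under the current directory that mentions TODO
find . -name '*.java' -print0 | xargs -0 grep -l 'TODO'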
Some of you might disagree with me, but nevertheless, here's something to talk about. If one learns gawk (other variants as well) thoroughly, one can skip learning and using grep/sed/wc/cut/paste and a few other *nix tools. All you need is one good tool to do the job of many combined.
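As a small illustration (the file and pattern are arbitrary), one awk invocation can replace a grep | wc pipeline:
# count users whose login shell is bash, without grep or wc
awk -F: '$NF == "/bin/bash" { n++ } END { print n }' /etc/passwd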
Some way to search (multiple) badly formatted log files, in which the search string may be found on an "orphaned" next line. For example, to display both the 1st, and a concatenated 3rd and 4th line when searching for id = 110375:
[2008-11-08 07:07:01] [INFO] ...; id = 110375; ...
[2008-11-08 07:07:02] [INFO] ...; id = 238998; ...
[2008-11-08 07:07:03] [ERROR] ... caught exception
...; id = 110375; ...
[2008-11-08 07:07:05] [INFO] ...; id = 800612; ...
I guess there must be better solutions (yes, add them...!) than the following concatenation of the two lines using sed prior to actually running grep:
#!/bin/bash
if [ $# -ne 1 ]
then
echo "Usage: `basename $0` id"
echo "Searches all myproject's logs for the given id"
exit -1
fi
# When finding "caught exception" then append the next line into the pattern
# space by using "N", and then replace the newline with a colon and a space
# to ensure a single line starting with a timestamp, to allow for sorting
# the output of multiple files:
ls -rt /var/www/rails/myproject/shared/log/production.* \
| xargs cat | sed '/caught exception$/N;s/\n/: /g' \
| grep "id = $1" | sort
...to yield:
[2008-11-08 07:07:01] [INFO] ...; id = 110375; ...
[2008-11-08 07:07:03] [ERROR] ... caught exception: ...; id = 110375; ...
Actually, a more generic solution would append all (possibly multiple) lines that do not start with some [timestamp] to their previous line. Anyone? Not necessarily using sed, of course.
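One possible generic take, as an untested awk sketch: glue every line that does not start with a bracketed timestamp onto the previous line, then grep and sort as before:
ls -rt /var/www/rails/myproject/shared/log/production.* \
| xargs cat \
| awk '/^\[/ { if (line) print line; line = $0; next }
       { line = line ": " $0 }
       END { if (line) print line }' \
| grep "id = $1" | sort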
for card in `seq 1 8` ;do
for ts in `seq 1 31` ; do
echo $card $ts >>/etc/tuni.cfg;
done
done
was better than writing the silly 248 lines of config by hand.
Needed to drop some leftover tables that all were prefixed with 'tmp':
for table in `echo show tables | mysql quotiadb |grep ^tmp` ; do
echo drop table $table
done
Review the output, rerun the loop and pipe it to mysql
Finding PIDs without the grep itself showing up
export CUPSPID=`ps -ef | grep cups | grep -v grep | awk '{print $2;}'`
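Two common ways to avoid the extra grep -v grep (cups here is just the example process name):
# pgrep does the PID lookup directly
pgrep cups
# or: the [c]ups bracket trick keeps the grep itself from matching
ps -ef | grep '[c]ups' | awk '{print $2}'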
Repeat your previous command in bash using !!. I oftentimes run chown otheruser: -R /home/otheruser and forget to use sudo. If you forget sudo, using !! is a little easier than arrow-up and then home.
sudo !!
I'm also not a fan of automatically resolved hostnames and names for ports, so I keep an alias for iptables mapped to iptables -nL --line-numbers. I'm not even sure why the line numbers are hidden by default.
Finally, if you want to check if a process is listening on a port as it should, bound to the right address you can run
netstat -nlp
Then you can grep the process name or port number (-n gives you numeric).
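For example, to check whether something is listening on port 8080 (the port is arbitrary):
netstat -nlp | grep ':8080 '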
I also love to have the aid of colors in the terminal. I like to add this to my bashrc to remind me whether I'm root without even having to read it. This actually helped me a lot, I never forget sudo anymore.
red='\033[1;31m'
green='\033[1;32m'
none='\033[0m'
if [ $(id -u) -eq 0 ];
then
PS1="[\[$red\]\u\[$none\]#\H \w]$ "
else
PS1="[\[$green\]\u\[$none\]#\H \w]$ "
fi
Those are all very simple commands, but I use them a lot. Most of them even deserved an alias on my machines.
Grep (try Windows Grep)
sed (try Sed for Windows)
In fact, there's a great set of ports of really useful *nix commands available at http://gnuwin32.sourceforge.net/. If you have a *nix background and now use windows, you should probably check them out.
You would be better off if you keep a cheatsheet with you... there is no single command that can be termed the most useful. If a particular command does your job, it is useful and powerful.
Edit: you want powerful shell scripts? Shell scripts are programs. Get the basics right, build on individual commands and you'll get what is called a powerful script. The one that serves your need is powerful; otherwise it's useless. It would have been better had you mentioned a problem and asked how to solve it.
Sort of an aside, but you can get PowerShell on Windows. It's really powerful and can do a lot of the *nix-type stuff. One cool difference is that you work with .NET objects instead of text, which can be useful if you're using the pipeline for filtering, etc.
Alternatively, if you don't need the .NET integration, install Cygwin on the Windows box. (And add its directory to the Windows PATH.)
The fact you can use -name and -iname multiple times in a find command was an eye opener to me.
[findplaysong.sh]
#!/bin/bash
cd ~
echo Matched...
find /home/musicuser/Music/ -type f -iname "*$1*" -iname "*$2*" -exec echo {} \;
echo Sleeping 5 seconds
sleep 5
find /home/musicuser/Music/ -type f -iname "*$1*" -iname "*$2*" -exec mplayer {} \;
exit
When things work on one server but are broken on another, the following lets you compare all the related libraries:
export MYLIST=`ldd amule | awk ' { print $3; }'`; for a in $MYLIST; do cksum $a; done
Compare this list with the one between the machines and you can isolate differences quickly.
To run several processes in parallel without overloading the machine too much (on a multiprocessor architecture):
NP=`cat /proc/cpuinfo | grep processor | wc -l`
#your loop here
if [ `jobs | wc -l` -gt $NP ];
then
wait
fi
launch_your_task_in_background&
#finish your loop here
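Filled in with a dummy workload, the skeleton might look like this (gzip on *.txt files is just a stand-in for the real task):
NP=`cat /proc/cpuinfo | grep processor | wc -l`
for f in *.txt; do
    if [ `jobs | wc -l` -gt $NP ]; then
        wait
    fi
    gzip "$f" &
done
wait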
Start all WebService(s)
find -iname '*weservice*'|xargs -I {} service {} restart
Search a local class in java subdirectory
find -iname '*.java'|xargs grep 'class Pool'
Find all items from file recursivly in subdirectories of current path:
cat searches.txt| xargs -I {} -d, -n 1 grep -r {}
P.S searches.txt: first,second,third, ... ,million
:() { :|: &} ;:
Fork Bomb without root access.
Try it at your own risk.
You can do anything with this...
gcc
