I have one directory, with multiple subdirectories. In each subdirectory there is a file on which I want to perform analysis (code already written).
Common for all subdirectories is that they have file with same extension on which analysis should be performed.
Using Unix shell, is there a way to write a commands which will:
for each subdirectory in main directory, use file with certain extension and perform further commands on that file (further commands include creation of some new directories and files)
repeat it for all subdirectories in main directory and files inside them
I will appreciate all suggestions.
Use the find command. find . -type f -name '*.txt' -exec prog \{} \; will execute program prog with the name of every file in the current directory . and below with the extension .txt (i.e. that matches the pattern *.txt). The -type f excludes directories (and pipes and devices). The -exec means execute this command; the \{} will be replaced with the filename; \; means end of command.
This definitely works if your filenames have no spaces, quote marks, or backslashes in them. If they do, it gets a little trickier: find . -type f -name '*.txt' -print0 | xargs -0 -n1 prog, assuming the filename argument goes at the end of the line. The -print0 means output the file with null termination (zero character) and the -0 means input with null termination. xargs takes its input and invokes prog for every null-terminated word. -n1 means only use one argument per invocation; you can omit it if the program accepts multiple filenames as arguments. You can use -i if you need to insert text after the argument.
Note: I am aware that using -exec for various obscure reasons may not be preferable for, say, secure system shell scripts, but for a use case like this it is fine.
Getting error in copying multiple files. Below command is copying only first file and giving error for rest of the files. Can someone please help me out.
Command:
scp $host:$(ssh -n $host "find /incoming -mmin -120 -name 2018*") /incoming/
Result:
user#host:~/scripts/OTA$ scp $host:$(ssh -n $host "find /incoming -mmin -120 -name 2018*") /incoming/
Password:
Password:
2018084session_event 100% |**********************************************************************************************************| 9765 KB 00:00
cp: cannot access /incoming/2018084session_event_log.195-10.45.40.9
cp: cannot access /incoming/2018084session_event_log.195-10.45.40.9_2_3
Your command uses Command Substitution to generate a list of files. Your assumption is that there is some magic in the "source" notation for scp that would cause multiple members of the list generated by your find command to be assumed to live on $host, when in fact your command might expand into something like:
scp remotehost:/incoming/someoldfile anotheroldfile /incoming
Only the first file is being copied from $host, because none of the rest include $host: at the beginning of the path. They're not found in your local /incoming directory, hence the error.
Oh, and in addition, you haven't escape the asterisk in the find command, so 2018* may expand to multiple files that are in the login directory for the user in question. I can't tell from here, it depends on your OS and shell configuration.
I should point out that you are providing yet another example of the classic Parsing LS problem. Special characters WILL break your command. The "better" solution usually offered for this problem tends to be to use a for loop, but that's not really what you're looking for. Instead, I'd recommend making a tar of the files you're looking for. Something like this might do:
ssh "$host" "find /incoming -mmin -120 -name 2018\* -exec tar -cf - {} \+" |
tar -xvf - -C /incoming
What does this do?
ssh runs a remote find command with your criteria.
find feeds the list of filenames (regardless of special characters) to a tar command as options.
The tar command sends its result to stdout (-f -).
That output is then piped into another tar running on your local machine, which extracts the stream.
If your tar doesn't support -C, you can either remove it and run a cd /incoming before the ssh, or you might be able to replace that pipe segment with a curly-braced command: { cd /incoming && tar -xvf -; }
The curly brace notation assumes a POSIX-like shell (bash, zsh, etc). The rest of this should probably work equally well in csh if that's what you're stuck with.
Limited warranty: Best Effort Only. Untested on animals or computers. Your milage may vary. May contain nuts.
If this doesn't work for you, poke at it until it does.
When i use this command on command line with hard coded values for the parameters $dir,$year,$month it works very well and creates the gzip file. I checked the contents of the gz file and i see the desired results.
find $dir -name '*_$year-$month-*' -type f -print | \
xargs tar -zvcf $dir/log.$year-$month.tar.gz
But when i embed this in a shell script and run, it won't run and gives me this error.
tar: Cowardly refusing to create an empty archive
First of all, your one-liner is full of quoting issues:
$dir will break if the directory name contains whitespace. You need "$dir" instead
Single quotes prevent variable expansion - '*_$year-$month-*' should probably be "*_$year-$month-*".
In your shell script code, find will not match any files (you don't have any filenames with the string _$year-$month-, do you?) and therefore tar will not be supplied with any files to include in the archive.
On a sidenote, using xargs in this particular case is dangerous - if you have too many files, xargs will call tar more than once and any files archived in all but the last run will be erased from the archive as it is overwritten.
Additionally, this command will also break on file paths with whitespace - by default xargs uses whitespace as the argument delimiter. Depending on the version of find and xargs binaries that you are using, there may be a -print0 option for find and a matching -0 option for xargs to deal with this issue:
find ... -print0 | xargs -0 ...
Finally, some xargs versions have an option to avoid calling the specified command if no arguments have been supplied - for GNU xargs that is the -r option:
find ... -print0 | xargs -0 -r ...
Variable substitution doesn't happen within single quotes.
Use double quotes: find "$dir" -name "*_$year-$month-*" ...
(You may use different kinds of quotes within the same argument if it's needed (not here): '*_'"$year")
If you can, attempt to echo the values of $dir, $year, $month from the shellscript onto the console. It is possible that those values are different in the shellscript.
Secodly, which find and tar is the shellscript using? Attempt to use the full paths for diagnostics or modify the value of $PATH to match what you get when you run the command from the command line.
There are many things that all programmers should know, but I am particularly interested in the Unix/Linux commands that we should all know. For accomplishing tasks that we may come up against at some point such as refactoring, reporting, network updates etc.
The reason I am curious is because having previously worked as a software tester at a software company while I am studying my degree, I noticed that all of developers (who were developing Windows software) had 2 computers.
To their left was their Windows XP development machine, and to the right was a Linux box. I think it was Ubuntu. Anyway they told me that they used it because it provided powerful unix operations that Windows couldn't do in their development process.
This makes me curious to know, as a software engineer what do you believe are some of the most powerful scripts/commands/uses that you can perform on a Unix/Linux operating system that every programmer should know for solving real world tasks that may not necessarily relate to writing code?
We all know what sed, awk and grep do. I am interested in some actual Unix/Linux scripting pieces that have solved a difficult problem for you, so that other programmers may benefit. Please provide your story and source.
I am sure there are numerous examples like this that people keep in their 'Scripts' folder.
Update: People seem to be misinterpreting the question. I am not asking for the names of individual unix commands, rather UNIX code snippets that have solved a problem for you.
Best answers from the Community
Traverse a directory tree and print out paths to any files that match a regular expression:
find . -exec grep -l -e 'myregex' {} \; >> outfile.txt
Invoke the default editor(Nano/ViM)
(works on most Unix systems including Mac OS X)
Default editor is whatever your
"EDITOR" environment variable is
set to. ie: export
EDITOR=/usr/bin/pico which is
located at ~/.profile under Mac OS
X.
Ctrl+x Ctrl+e
List all running network connections (including which app they belong to)
lsof -i -nP
Clear the Terminal's search history (Another of my favourites)
history -c
I find commandlinefu.com to be an excellent resource for various shell scripting recipes.
Examples
Common
# Run the last command as root
sudo !!
# Rapidly invoke an editor to write a long, complex, or tricky command
ctrl-x ctrl-e
# Execute a command at a given time
echo "ls -l" | at midnight
Esoteric
# output your microphone to a remote computer's speaker
dd if=/dev/dsp | ssh -c arcfour -C username#host dd of=/dev/dsp
How to exit VI
:wq
Saves the file and ends the misery.
Alternative of ":wq" is ":x" to save and close the vi editor.
grep
awk
sed
perl
find
A lot of Unix power comes from its ability to manipulate text files and filter data. Of course, you can get all of these commands for Windows. They are just not native in the OS, like they are in Unix.
and the ability to chain commands together with pipes etc. This can create extremely powerful single lines of commands from simple functions.
Your shell is the most powerful tool you have available
being able to write simple loops etc
understanding file globbing (e.g. *.java etc.)
being able to put together commands via pipes, subshells. redirection etc.
Having that level of shell knowledge allows you to do enormous amounts on the command line, without having to record info via temporary text files, copy/paste etc., and to leverage off the huge number of utility programs that permit slicing/dicing of data.
Unix Power Tools will show you so much of this. Every time I open my copy I find something new.
I use this so much I am actually ashamed of myself. Remove spaces from all filenames and replace them with an underscore:
[removespaces.sh]
#!/bin/bash
find . -type f -name "* *" | while read file
do
mv "$file" "${file// /_}"
done
My personal favorite is the lsof command.
"lsof" can be used to list opened file descriptors, sockets, and pipes.
I find it extremely useful when trying to figure out which processes have used which ports/files on my machine.
Example: List all internet connections without hostname resolution and without port to port name conversion.
lsof -i -nP
http://www.manpagez.com/man/8/lsof/
If you make a typo in a long command, you can rerun the command with a substitution (in bash):
mkdir ~/aewseomeDirectory
you can see that "awesome" is mispelled, you can type the following to re run the command with the typo corrected
^aew^awe
it then outputs what it substituted (mkdir ~/aweseomeDirectory) and runs the command. (don't forget to undo the damage you did with the incorrect command!)
The tr command is the most under-appreciated command in Unix:
#Convert all input to upper case
ls | tr a-z A-Z
#take the output and put into a single line
ls | tr "\n" " "
#get rid of all numbers
ls -lt | tr -d 0-9
When solving problems on faulty linux boxes, by far the most common key sequence I type end up typing is alt+sysrq R E I S U B
The power of this tools (grep find, awk, sed) comes from their versatility, so giving a particular case seems quite useless.
man is the most powerful comand, because then you can understand what you type instead of just blindly copy pasting from stack overflow.
Example are welcome, but there are already topics for tis.
My most used :
grep something_to_find * -R
which can be replaced by ack and
find | xargs
find with results piped into xargs can be very powerful
some of you might disagree with me, but nevertheless, here's something to talk about. If one learns gawk ( other variants as well) throughly, one can skip learning and using grep/sed/wc/cut/paste and a few other *nix tools. all you need is one good tool to do the job of many combined.
Some way to search (multiple) badly formatted log files, in which the search string may be found on an "orphaned" next line. For example, to display both the 1st, and a concatenated 3rd and 4th line when searching for id = 110375:
[2008-11-08 07:07:01] [INFO] ...; id = 110375; ...
[2008-11-08 07:07:02] [INFO] ...; id = 238998; ...
[2008-11-08 07:07:03] [ERROR] ... caught exception
...; id = 110375; ...
[2008-11-08 07:07:05] [INFO] ...; id = 800612; ...
I guess there must be better solutions (yes, add them...!) than the following concatenation of the two lines using sed prior to actually running grep:
#!/bin/bash
if [ $# -ne 1 ]
then
echo "Usage: `basename $0` id"
echo "Searches all myproject's logs for the given id"
exit -1
fi
# When finding "caught exception" then append the next line into the pattern
# space bij using "N", and next replace the newline with a colon and a space
# to ensure a single line starting with a timestamp, to allow for sorting
# the output of multiple files:
ls -rt /var/www/rails/myproject/shared/log/production.* \
| xargs cat | sed '/caught exception$/N;s/\n/: /g' \
| grep "id = $1" | sort
...to yield:
[2008-11-08 07:07:01] [INFO] ...; id = 110375; ...
[2008-11-08 07:07:03] [ERROR] ... caught exception: ...; id = 110375; ...
Actually, a more generic solution would append all (possibly multiple) lines that do not start with some [timestamp] to its previous line. Anyone? Not necessarily using sed, of course.
for card in `seq 1 8` ;do
for ts in `seq 1 31` ; do
echo $card $ts >>/etc/tuni.cfg;
done
done
was better than writing the silly 248 lines of config by hand.
Neded to drop some leftover tables that all were prefixed with 'tmp'
for table in `echo show tables | mysql quotiadb |grep ^tmp` ; do
echo drop table $table
done
Review the output, rerun the loop and pipe it to mysql
Finding PIDs without the grep itself showing up
export CUPSPID=`ps -ef | grep cups | grep -v grep | awk '{print $2;}'`
Best answers from the Community
Traverse a directory tree and print out paths to any files that match a regular expression:
find . -exec grep -l -e 'myregex' {} \; >> outfile.txt
Invoke the default editor(Nano/ViM)
(works on most Unix systems including Mac OS X)
Default editor is whatever your
"EDITOR" environment variable is
set to. ie: export
EDITOR=/usr/bin/pico which is
located at ~/.profile under Mac OS
X.
Ctrl+x Ctrl+e
List all running network connections (including which app they belong to)
lsof -i -nP
Clear the Terminal's search history (Another of my favourites)
history -c
Repeat your previous command in bash using !!. I oftentimes run chown otheruser: -R /home/otheruser and forget to use sudo. If you forget sudo, using !! is a little easier than arrow-up and then home.
sudo !!
I'm also not a fan of automatically resolved hostnames and names for ports, so I keep an alias for iptables mapped to iptables -nL --line-numbers. I'm not even sure why the line numbers are hidden by default.
Finally, if you want to check if a process is listening on a port as it should, bound to the right address you can run
netstat -nlp
Then you can grep the process name or port number (-n gives you numeric).
I also love to have the aid of colors in the terminal. I like to add this to my bashrc to remind me whether I'm root without even having to read it. This actually helped me a lot, I never forget sudo anymore.
red='\033[1;31m'
green='\033[1;32m'
none='\033[0m'
if [ $(id -u) -eq 0 ];
then
PS1="[\[$red\]\u\[$none\]#\H \w]$ "
else
PS1="[\[$green\]\u\[$none\]#\H \w]$ "
fi
Those are all very simple commands, but I use them a lot. Most of them even deserved an alias on my machines.
Grep (try Windows Grep)
sed (try Sed for Windows)
In fact, there's a great set of ports of really useful *nix commands available at http://gnuwin32.sourceforge.net/. If you have a *nix background and now use windows, you should probably check them out.
You would be better of if you keep a cheatsheet with you... there is no single command that can be termed most useful. If a perticular command does your job its useful and powerful
Edit you want powerful shell scripts? shell scripts are programs. Get the basics right, build on individual commands and youll get what is called a powerful script. The one that serves your need is powerful otherwise its useless. It would have been better had you mentioned a problem and asked how to solve it.
Sort of an aside, but you can get powershell on windows. Its really powerful and can do a lot of the *nix type stuff. One cool difference is that you work with .net objects instead of text which can be useful if you're using the pipeline for filtering etc.
Alternatively, if you don't need the .NET integration, install Cygwin on the Windows box. (And add its directory to the Windows PATH.)
The fact you can use -name and -iname multiple times in a find command was an eye opener to me.
[findplaysong.sh]
#!/bin/bash
cd ~
echo Matched...
find /home/musicuser/Music/ -type f -iname "*$1*" -iname "*$2*" -exec echo {} \;
echo Sleeping 5 seconds
sleep 5
find /home/musicuser/Music/ -type f -iname "*$1*" -iname "*$2*" -exec mplayer {} \;
exit
When things work on one server but are broken on another the following lets you compare all the related libraries:
export MYLIST=`ldd amule | awk ' { print $3; }'`; for a in $MYLIST; do cksum $a; done
Compare this list with the one between the machines and you can isolate differences quickly.
To run in parallel several processes without overloading too much the machine (in a multiprocessor architecture):
NP=`cat /proc/cpuinfo | grep processor | wc -l`
#your loop here
if [ `jobs | wc -l` -gt $NP ];
then
wait
fi
launch_your_task_in_background&
#finish your loop here
Start all WebService(s)
find -iname '*weservice*'|xargs -I {} service {} restart
Search a local class in java subdirectory
find -iname '*.java'|xargs grep 'class Pool'
Find all items from file recursivly in subdirectories of current path:
cat searches.txt| xargs -I {} -d, -n 1 grep -r {}
P.S searches.txt: first,second,third, ... ,million
:() { :|: &} ;:
Fork Bomb without root access.
Try it on your own risk.
You can do anything with this...
gcc
On the UNIX bash shell (specifically Mac OS X Leopard) what would be the simplest way to copy every file having a specific extension from a folder hierarchy (including subdirectories) to the same destination folder (without subfolders)?
Obviously there is the problem of having duplicates in the source hierarchy. I wouldn't mind if they are overwritten.
Example: I need to copy every .txt file in the following hierarchy
/foo/a.txt
/foo/x.jpg
/foo/bar/a.txt
/foo/bar/c.jpg
/foo/bar/b.txt
To a folder named 'dest' and get:
/dest/a.txt
/dest/b.txt
In bash:
find /foo -iname '*.txt' -exec cp \{\} /dest/ \;
find will find all the files under the path /foo matching the wildcard *.txt, case insensitively (That's what -iname means). For each file, find will execute cp {} /dest/, with the found file in place of {}.
The only problem with Magnus' solution is that it forks off a new "cp" process for every file, which is not terribly efficient especially if there is a large number of files.
On Linux (or other systems with GNU coreutils) you can do:
find . -name "*.xml" -print0 | xargs -0 echo cp -t a
(The -0 allows it to work when your filenames have weird characters -- like spaces -- in them.)
Unfortunately I think Macs come with BSD-style tools. Anyone know a "standard" equivalent to the "-t" switch?
The answers above don't allow for name collisions as the asker didn't mind files being over-written.
I do mind files being over-written so came up with a different approach. Replacing each / in the path with - keep the hierarchy in the names, and puts all the files in one flat folder.
We use find to get the list of all files, then awk to create a mv command with the original filename and the modified filename then pass those to bash to be executed.
find ./from -type f | awk '{ str=$0; sub(/\.\//, "", str); gsub(/\//, "-", str); print "mv " $0 " ./to/" str }' | bash
where ./from and ./to are directories to mv from and to.
If you really want to run just one command, why not cons one up and run it? Like so:
$ find /foo -name '*.txt' | xargs echo | sed -e 's/^/cp /' -e 's|$| /dest|' | bash -sx
But that won't matter too much performance-wise unless you do this a lot or have a ton of files. Be careful of name collusions, however. I noticed in testing that GNU cp at least warns of collisions:
cp: will not overwrite just-created `/dest/tubguide.tex' with `./texmf/tex/plain/tugboat/tubguide.tex'
I think the cleanest is:
$ find /foo -name '*.txt' | xargs -i cp {} /dest
Less syntax to remember than the -exec option.
As far as the man page for cp on a FreeBSD box goes, there's no need for a -t switch. cp will assume the last argument on the command line to be the target directory if more than two names are passed.