sort -t $'\t' equivalent compatible with POSIX sh?

I am trying to use a for loop over multiple files in my directory with a piped command, but it does not seem to work. When I run the same command on a single file, it works. Where am I going wrong?
for x in *summary-FDR0.05 ; do sort -t $'\t' -k8,8rn $x | head -n 50000 | sortBed -i > sorted_top_50k_$x.bed; done
All my files end with summary-FDR0.05. When I run
sort -t $'\t' -k8,8rn sample13-summary-FDR0.05 | head -n 50000 | sortBed -i > sorted_top_50k_S_13_O1_122*K27ac.bed
This seems to work well. May I know where I am going wrong?
Error:
sort: multi-character tab `$\\t'
Thanks

$'\t' is ANSI-C quoting, a bash/ksh/zsh extension; a plain POSIX sh does not expand it, so sort receives the literal characters $\t, which is exactly what the "multi-character tab" error is complaining about. For POSIX compatibility, replace $'\t' with "$(printf "\t")".
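Applied to the loop from the question, a POSIX-sh-compatible version might look like this (a sketch: it only swaps out $'\t' and adds quoting, keeping the rest of the pipeline exactly as posted):

tab=$(printf '\t')
for x in *summary-FDR0.05; do
    # numeric sort on field 8, descending, then keep the top 50000 lines
    sort -t "$tab" -k8,8rn "$x" | head -n 50000 | sortBed -i > "sorted_top_50k_${x}.bed"
done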

What is the easiest way to grep 'man grep' for flags?

I use grep a lot, but I would love to improve a bit.
Regarding the question: I wanted to narrow down the man entry to find the explanation of what the -v in grep -v 'pattern' filename stood for, namely this:
-v, --invert-match
Selected lines are those not matching any of the specified patterns.
Thus, to find the next five lines after the line which contains -v I tried:
man grep | grep -A 5 -v
and
man grep | grep -A 5 '-v'
but they return:
usage: grep [-abcDEFGHhIiJLlmnOoqRSsUVvwxZ] [-A num] [-B num] [-C[num]]
[-e pattern] [-f file] [--binary-files=value] [--color=when]
[--context[=num]] [--directories=action] [--label] [--line-buffered]
[--null] [pattern] [file ...]
This confuses me since:
man grep | grep -A 5 'Selected'
and
man grep | grep -A 5 Selected
do work.
What is wrong in my approach? Is there any easier way to achieve what I need?
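(A quick note on what was going wrong: the quotes around '-v' are consumed by the shell, so grep still receives an argument that starts with a dash and parses it as an option, hence the usage message. The standard ways to pass such a pattern are -e and the -- end-of-options marker:
man grep | grep -A 5 -e '-v'
man grep | grep -A 5 -- '-v'
Both hand -v to grep as a pattern rather than an option.)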
One approach is to parse the Info documents for the command directly. If you run info grep (or another command) you will often find much more detailed and better-structured documentation, which will let you pinpoint just the section you need.
Here's a function that will print out the relevant Info section for an option/variable/etc:
info_search() {
    # Dump every node of the Info manual for "$1", then use awk in paragraph
    # mode (RS='') to print the paragraph that begins with the quoted
    # option/variable name "$2".
    info --subnodes "$1" -o - 2>&- \
        | awk -v RS='' "/(^|\n)(‘|'|\`)$2((,|\[| ).*)?(’|')\n/"
}
This should work on Linux/macOS/BSD. Output is like:
$ info_search grep -v
‘-v’
‘--invert-match’
Invert the sense of matching, to select non-matching lines. (‘-v’
is specified by POSIX.)
$ info_search gawk RS
'RS == "\n"'
Records are separated by the newline character ('\n'). In effect,
every line in the data file is a separate record, including blank
...
$ info_search bash -i
`-i'
Force the shell to run interactively. Interactive shells are
...

unix sort descending order

I want to sort a tab-delimited file in descending order according to the 5th field of the records.
I tried
sort -r -k5n filename
But it didn't work.
The presence of the n option attached to the -k5 causes the global -r option to be ignored for that field. You have to specify both n and r at the same level (globally or locally).
sort -t $'\t' -k5,5rn
or
sort -rn -t $'\t' -k5,5
If you want to sort only on the 5th field, use -k5,5.
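A quick toy illustration with two tab-separated lines whose 5th field is numeric:
printf 'a\tb\tc\td\t2\ne\tf\tg\th\t10\n' | sort -t $'\t' -k5,5rn
prints the line ending in 10 before the line ending in 2, i.e. descending numeric order on field 5 alone.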
Also, use the -t command-line switch to set the delimiter to a tab. Note that a bare \t does not work: the shell strips the backslash and sort receives a literal t as its separator, so you have to hand it a real tab character, e.g. with bash's $'\t' quoting:
sort -k5,5 -r -n -t $'\t' filename
The man page for sort states:
-t, --field-separator=SEP
use SEP instead of non-blank to blank transition
Finally, this SO question Unix Sort with Tab Delimiter might be helpful.
To list files based on size in ascending order:
find ./ -size +1000M -exec ls -tlrh {} \; | awk '{print $5, $9}' | sort -n

Unix - Need to cut a file which has multiple blanks as delimiter - awk or cut?

I need to get the records from a text file in Unix. The delimiter is multiple blanks. For example:
2U2133 1239
1290fsdsf 3234
From this, I need to extract
1239
3234
The delimiter for all records will be always 3 blanks.
I need to do this in a unix script (.scr) and write the output to another file, or use it as input to a do-while loop. I tried the following:
while read readline
do
    read_int=`echo "$readline"`
    cnt_exc=`grep "$read_int" ${Directory path}/file1.txt| wc -l`
    if [ $cnt_exc -gt 0 ]
    then
        int_1=0
    else
        int_2=0
    fi
done < awk -F' ' '{ print $2 }' ${Directoty path}/test_file.txt
test_file.txt is the input file and file1.txt is a lookup file. But the above does not work, and gives me syntax errors near awk -F.
I also tried writing the output to a file. The following worked on the command line:
more test_file.txt | awk -F' ' '{ print $2 }' > output.txt
This works and writes the records to output.txt on the command line, but the same command does not work from inside the unix script (it is a .scr file).
Please let me know where I am going wrong and how I can resolve this.
Thanks,
Visakh
The job of replacing multiple delimiters with just one is left to tr:
cat <file_name> | tr -s ' ' | cut -d ' ' -f 2
tr translates or deletes characters, and is perfectly suited to prepare your data for cut to work properly.
The manual states:
-s, --squeeze-repeats
replace each sequence of a repeated character that is
listed in the last specified SET, with a single occurrence
of that character
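With the sample data from the question, a toy run looks like this:
printf '2U2133   1239\n1290fsdsf   3234\n' | tr -s ' ' | cut -d ' ' -f 2
1239
3234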
It depends on the version or implementation of cut on your machine. Some versions support an option, usually -i, that means 'ignore blank fields' or, equivalently, allow multiple separators between fields. If that's supported, use:
cut -i -d' ' -f 2 data.file
If not (and it is not universal — and maybe not even widespread, since neither GNU nor MacOS X have the option), then using awk is better and more portable.
You need to pipe the output of awk into your loop, though:
awk -F' ' '{print $2}' ${Directory_path}/test_file.txt |
while read readline
do
    read_int=`echo "$readline"`
    cnt_exc=`grep "$read_int" ${Directory_path}/file1.txt | wc -l`
    if [ $cnt_exc -gt 0 ]
    then int_1=0
    else int_2=0
    fi
done
The only residual issue is whether the while loop runs in a sub-shell and therefore does not modify your main shell's variables, only its own copies of them.
With bash, you can use process substitution:
while read readline
do
    read_int=`echo "$readline"`
    cnt_exc=`grep "$read_int" ${Directory_path}/file1.txt | wc -l`
    if [ $cnt_exc -gt 0 ]
    then int_1=0
    else int_2=0
    fi
done < <(awk -F' ' '{print $2}' ${Directory_path}/test_file.txt)
This leaves the while loop in the current shell, but arranges for the output of the command to appear as if from a file.
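A minimal sketch of the difference in bash (counting the lines read by the loop):
# pipe version: the loop runs in a subshell, so the counter is lost
n=0
printf 'a\nb\n' | while read x; do n=$((n+1)); done
echo "$n"    # prints 0
# process substitution: the loop runs in the current shell
n=0
while read x; do n=$((n+1)); done < <(printf 'a\nb\n')
echo "$n"    # prints 2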
The blank in ${Directory path} is not normally legal — unless it is another Bash feature I've missed out on; you also had a typo (Directoty) in one place.
Other ways of doing the same thing aside, the error in your program is this: You cannot redirect from (<) the output of another program. Turn your script around and use a pipe like this:
awk -F' ' '{ print $2 }' ${Directory path}/test_file.txt | while read readline
etc.
Besides, the use of "readline" as a variable name may or may not get you into problems.
In this particular case, you can use the following line to get your second column. A lone space in the pattern would turn each of the three blanks into its own tab and leave empty fields in between, so collapse each run of blanks into a single tab instead (GNU sed understands \t in the replacement):
sed 's/ \{1,\}/\t/g' <file_name> | cut -f 2
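The same toy run as above, with sed doing the squeezing (GNU sed):
printf '2U2133   1239\n1290fsdsf   3234\n' | sed 's/ \{1,\}/\t/g' | cut -f 2
1239
3234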
In bash you can start from something like this (with three blanks between the columns, cut -d " " sees the second column as field 4):
for n in `cat ${Directory_path}/test_file.txt | cut -d " " -f 4`
do
    grep -c "$n" ${Directory_path}/file*.txt
done
This should have been a comment, but since I cannot comment yet, I am adding this here.
This is from an excellent answer here: https://stackoverflow.com/a/4483833/3138875
tr -s ' ' <text.txt | cut -d ' ' -f4
tr -s '<character>' squeezes multiple repeated instances of <character> into one.
It's not working in the script because of the typo ("Directoty" instead of "Directory") in the last line of your script.
Cut isn't flexible enough. I usually use Perl for that:
cat file.txt | perl -F'   ' -ane 'print $F[1]."\n"'
(The -a switch autosplits each line into @F, -n loops over the input, and -F sets the split pattern.) Instead of a triple space after -F you can put any Perl regular expression. You access fields as $F[n], where n is the field number (counting starts at zero). This way there is no need for sed or tr.
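A quick toy check with the question's data:
printf '2U2133   1239\n1290fsdsf   3234\n' | perl -F'   ' -ane 'print $F[1]."\n"'
1239
3234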

Passing output from one command as argument to another [duplicate]

I have this for loop:
for i in `ls -1 access.log*`; do tail $i |awk {'print $4'} |cut -d: -f 1 |grep - $i > $i.output; done
ls will give access.log, access.log.1, access.log.2 etc.
tail will give me the last line of each file, which looks like: 192.168.1.23 - - [08/Oct/2010:14:05:04 +0300] etc. etc. etc
awk+cut will extract the date (08/Oct/2010 - but different in each access.log), which will allow me to grep for it and redirect the output to a separate file.
But I cannot seem to pass the output of awk+cut to grep.
The reason for all this is that those access logs include lines with more than one date (06/Oct, 07/Oct, 08/Oct) and I just need the lines with the most recent date.
How can I achieve this?
Thank you.
As a sidenote, tail displays the last 10 lines.
A possible solution would be to grep this way (using tail -n 1 for just the last line and quoting the command substitution):
for i in access.log*; do grep "$(tail -n 1 "$i" | awk '{print $4}' | cut -d: -f 1 | sed 's/\[/\\[/')" "$i" > "$i.output"; done
why don't you break it up into steps??
for file in access.log*
do
    what=$(tail "$file" | awk '{print $4}' | cut -d: -f 1)
    grep "$what" "$file" >> output
done
You shouldn't use ls that way. Also, ls -l gives you information you don't need. The -f option to grep will allow you to pipe the pattern to grep. Always quote variables that contain filenames.
for i in access.log*; do awk 'END {sub(":.*","",$4); print substr($4,2)}' "$i" | grep -f - "$i" > "$i.output"; done
I also eliminated tail and cut since AWK can do their jobs.
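To see what that awk program extracts, here is a toy run on a line shaped like the one in the question:
printf '192.168.1.23 - - [08/Oct/2010:14:05:04 +0300] ...\n' | awk 'END {sub(":.*","",$4); print substr($4,2)}'
08/Oct/2010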
Umm...
Use xargs or backticks.
man xargs
or
http://tldp.org/LDP/Bash-Beginners-Guide/html/sect_03_04.html , section 3.4.5. Command substitution
you can try:
grep "$(stuff to get piped over to be grep-ed)" file
I haven't tried this, but applied here the idea would look something like this (per file, with grep -F so the leading [ of the date is matched literally rather than as a bracket expression):
for i in access.log*; do grep -F "$(tail "$i" | awk '{print $4}' | cut -d: -f 1)" "$i" > "$i.output"; done

Using lsof to get a list of file names

EDIT 1
I'm having problems using the arguments given. Maybe it is the way I'm passing my arguments through NSTask? Any suggestions as to how I can do this?
NSTask *file_Task = [NSTask new];
[file_Task setLaunchPath:@"/usr/sbin/lsof"];
[file_Task setArguments:[NSArray arrayWithObjects:@"+p", the_Pid, nil]];
Good Afternoon Fellow Coders....
I'm using the following command:
lsof +p 13812
to get the list of files accessed by a process. The thing is, it gives me a lot of additional information that I don't want, such as TYPE, DEVICE, etc.
Is there an argument that I can add to the above command so that I get ONLY the NAME?
Thank you, thank you thank you! :)
Eric
You can use:
lsof -Fn +p 12345
This will output a list of lines, with the first being p followed by the process ID,
and all following lines consisting of n followed by the file name.
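For example, the output might look something like this (hypothetical PID and file names):
$ lsof -Fn +p 12345
p12345
n/usr/lib/libSystem.B.dylib
n/private/var/log/system.log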
If you'd like to quickly preprocess this, you can do something similar to the following:
lsof -Fn +p 12345 | tail -n +2 | cut -c2-
See the lsof man page for more information, specifically under the OUTPUT FOR OTHER PROGRAMS heading.
try:
lsof | tr -s ' ' | cut -d' ' -f9
lsof +p 9174 | awk '{ print $9 }'
Listing the currently playing song (nfs file, accessed by user mpd):
$ sudo lsof -N -a -u mpd -Fn |
sed '/^n/!d; s/^n//'
/R/audio/[...] Jay-Jay Johanson , So Tell The Girls That I Am Back.mp3
The sed part deletes any lines not starting with n and removes n in the final output.
