Print labels using awk - unix

On my FreeBSD 10.1 I'm writing a little piece of code that basically calls ls and automatically breaks the results down into something like this:
directory:
2.4M .git
528K src
380K dist
184K test
file:
856K CONDUCT.md
20K README.md
........
Only directories and regular files need to be listed; . and .. can be skipped, but hidden files must be included, and each group has to be sorted from largest to smallest separately.
The challenge is to complete it as a one-line command without using $(cmd), &&, ||, >, >>, <, ;, & and using at most 12 pipes (backquotes count toward the limit as well).
Currently my progress is:
ls -Alh | sort -d -h -r |
awk 'BEGIN {print "Directories:"}
NR>1 {if(substr($1,1,1)~"d")print" "$5" "$9}'
which prints output only up to the last directory entry. Since awk runs its rules once per record, I can't find a way to print "Files:" just once and then print the remaining output.

Well, you may have to store the files in an array and print at the end:
ls -Alh | sed 1d |
sort -h -k5r |
awk 'BEGIN {print "Directories:"}
/^d/ {print "\t"$5"\t"$9}          # directories: print immediately
/^-/ {f[n++] = "\t"$5"\t"$9}       # regular files: collect for later
END {print "Files:";
for (i=0; i<n; ++i) print f[i]}'
One additional problem you'll need to work out: files and dirs may have spaces in the name, and the simple $9 will be insufficient for that case.
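One rough way around that, assuming the layout above (size in column 5, name starting at field 9), is to rebuild the name from the whole record instead of using $9. An untested sketch:
ls -Alh | sed 1d | sort -h -k5r |
awk 'BEGIN {print "Directories:"}
{name=$0; for (i=1; i<9; i++) sub(/^[^ ]+ +/, "", name)}   # strip the first 8 fields, keep the rest as the name
/^d/ {print "\t"$5"\t"name}
/^-/ {f[n++]="\t"$5"\t"name}
END {print "Files:"; for (i=0; i<n; ++i) print f[i]}'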

Related

Awk command to perform action on lines excluding 1st and last

I have multiple MS excel files in csv format in a particular directory.
I want to update the value of one particular column in all the rows of the csv files.
Also, the action should not be operated on 1st and last line.
So far I have come up with the below code for one file:
awk -F, 'NR>2{$2=300;}1' OFS=, test.csv
But I am facing difficulty in excluding the last line.
Also, I need to perform the same for all the files in the directory.
So far I have tried the below but have not been able to replace that string value using awk.
1)
2)
This may do:
awk -F, 't{print t} {a=t=$0} NR>1{$2=300;t=$0} END {print a}' OFS=, test.csv
$ cat file
1,a,b
2,c,d
3,e,f
$ awk 'BEGIN{FS=OFS=","} NR>1{print (NR>2 ? chgd : orig)} {orig=$0; $2=300; chgd=$0} END{print orig}' file
1,a,b
2,300,d
3,e,f
You could simplify the script a bit by reading the file twice:
awk 'BEGIN{FS=OFS=","} NR==FNR {c=NR;next} !(FNR==1||FNR==c){$2=200} 1' file file
This uses the NR==FNR section merely to count lines, giving you a simple expression for determining whether to update the field in question.
And if you have GNU awk available, you might save a few CPU cycles by not reassigning the c variable for every line, using something like this:
gawk 'BEGIN{FS=OFS=","} ENDFILE {c=FNR} NR==FNR{next} !(FNR==1||FNR==c){$2=200} 1' file file
This still reads the file twice, but assigns c only after each file is read.
If you want, you can emulate the ENDFILE condition in non-GNU awk using NR>FNR && FNR==1 if you only have two files, then set c=NR-1. It won't perform as well.
I haven't tested the speed difference between these two, but I suspect it would be negligible except in cases of truly obscenely large files.
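A sketch of that emulation, assuming the same file is passed twice as above (untested):
awk 'BEGIN{FS=OFS=","} NR>FNR && FNR==1 {c=NR-1} NR==FNR{next} !(FNR==1||FNR==c){$2=200} 1' file file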
Thanks all,
I got it to work. Below is the command:
awk -v sq="" -F, 't{print t} {a=t=$0} NR>2{$3=sq"ops_data"sq;t=$0} END {print a}' OFS=, test1.csv

unix combine grep w and v command

I want to search a file and include the text #!/bin/bash, but exclude any other line that has a # sign. These two commands: grep -w '#!/bin/bash' file and grep -v '^#' file each do one part of this job. I would like this to be a single command, so here's what I've tried.
grep -w '#!/bin/bash' | grep -v '^#' file
This excludes lines beginning with #, but doesn't include the line #!/bin/bash
grep -w '#!/bin/bash' -v '^#' file
This just prints every line but #!/bin/bash
grep "^[^#]\|^#\!/bin/bash$" test.sh
Explanation:
^[^#] matches lines that start with something other than #
\| is an OR
^#\!/bin/bash$ is the exact line #!/bin/bash
So it looks as if you're trying to strip comments from bash files without removing their shebang.
The grep command can search for regular expressions, but isn't so good at applying rules of logic. You could do something like this:
grep -v '^#[^!]' input.sh
But you'd fail to strip comments that are affixed to the ends of lines. Note that I'm being a little more liberal with this regex, since it's entirely possible that a script might use something other than /bin/bash for its shebang. :-)
Another possibility would be to use awk. This lets you apply logic that cannot be expressed within a regular expression. For example, if you want to keep the commented line only if it is a shebang on the first line of the file, and remove all other comments, awk can express that as follows:
awk '
NR==1 && /^#!/; # if we're on the first line and find shebang, print.
/^#/ { next } # if this is a comment line, skip it.
1 # print everything else.
' input.sh
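For example, on a made-up input.sh it behaves like this (the same program collapsed onto one line):
$ cat input.sh
#!/bin/bash
# print a greeting
echo "hello"
$ awk 'NR==1 && /^#!/; /^#/ {next} 1' input.sh
#!/bin/bash
echo "hello"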

ls and xargs to output specific file extensions

I am trying to use ls and xargs to print specific file extensions, .bam and .vcf, without the path. The below is close, but when I pipe the two ls commands together I get the error below. Run separately they work fine, except each file is printed on its own line (my actual data has hundreds of files, so columns would make it easier to read). Thank you :).
files in directory
1.bam
1.vcf
2.bam
2.vcf
command with error
ls /home/cmccabe/Desktop/NGS/test/R_folder/*.bam | xargs -n1 basename | ls /home/cmccabe/Desktop/NGS/test/R_folder/*.vcf | xargs -n1 basename >> /home/cmccabe/Desktop/NGS/test/log
xargs: basename: terminated by signal 13
desired output
1.bam 1.vcf
2.bam 2.vcf
You cannot pipe output into ls and have it add that to its own output. Give all the parameters to a single ls and it will output everything:
ls *.a *.b *.c | xargs ...
ls isn't really doing anything for you currently, it's the shell that's listing all your files. Since you're piping ls's output around, you're actually vulnerable to dangerous file names.
basename can take multiple arguments with the -a option:
basename -a "path/to/files/"*.{bam,vcf}
To print that in two columns, you could use printf via xargs, with sort for... sorting. The -z or -0 flags throughout cause null bytes to be used as the filename separators:
basename -az "path/to/files/"*.{bam,vcf} | sort -z | xargs -0n 2 printf "%b\t%b\n"
If you're going to be doing any more processing after printing to columns, you may want to replace the %bs in the printf format with %qs. That will escape non-printable characters in the output, but might look a bit ugly to human eyes.
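With the four files from the question (and assuming GNU basename, since -a and -z are coreutils options), that should produce the desired layout:
$ basename -az /home/cmccabe/Desktop/NGS/test/R_folder/*.{bam,vcf} | sort -z | xargs -0n 2 printf "%b\t%b\n"
1.bam   1.vcf
2.bam   2.vcf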

Unix - Need to cut a file which has multiple blanks as delimiter - awk or cut?

I need to get the records from a text file in Unix. The delimiter is multiple blanks. For example:
2U2133   1239
1290fsdsf   3234
From this, I need to extract
1239
3234
The delimiter for all records will be always 3 blanks.
I need to do this in a Unix script (.scr) and write the output to another file or use it as an input to a do-while loop. I tried the below:
while read readline
do
read_int=`echo "$readline"`
cnt_exc=`grep "$read_int" ${Directory path}/file1.txt| wc -l`
if [ $cnt_exc -gt 0 ]
then
int_1=0
else
int_2=0
fi
done < awk -F' ' '{ print $2 }' ${Directoty path}/test_file.txt
test_file.txt is the input file and file1.txt is a lookup file. But the above does not work and gives me syntax errors near awk -F.
I tried writing the output to a file. The following worked on the command line:
more test_file.txt | awk -F' ' '{ print $2 }' > output.txt
This works and writes the records to output.txt on the command line. But the same command does not work in the Unix script (it is a .scr file).
Please let me know where I am going wrong and how I can resolve this.
Thanks,
Visakh
The job of replacing multiple delimiters with just one is left to tr:
cat <file_name> | tr -s ' ' | cut -d ' ' -f 2
tr translates or deletes characters, and is perfectly suited to prepare your data for cut to work properly.
The manual states:
-s, --squeeze-repeats
replace each sequence of a repeated character that is
listed in the last specified SET, with a single occurrence
of that character
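For example, on the question's test_file.txt (assuming the three-blank layout shown above) this should give:
$ tr -s ' ' < test_file.txt | cut -d ' ' -f 2
1239
3234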
It depends on the version or implementation of cut on your machine. Some versions support an option, usually -i, that means 'ignore blank fields' or, equivalently, allow multiple separators between fields. If that's supported, use:
cut -i -d' ' -f 2 data.file
If not (and it is not universal, and maybe not even widespread, since neither GNU nor Mac OS X has the option), then using awk is better and more portable.
You need to pipe the output of awk into your loop, though:
awk -F' ' '{print $2}' ${Directory_path}/test_file.txt |
while read readline
do
read_int=`echo "$readline"`
cnt_exc=`grep "$read_int" ${Directory_path}/file1.txt| wc -l`
if [ $cnt_exc -gt 0 ]
then int_1=0
else int_2=0
fi
done
The only residual issue is whether the while loop runs in a sub-shell and therefore does not modify your main shell script's variables, just its own copy of those variables.
With bash, you can use process substitution:
while read readline
do
read_int=`echo "$readline"`
cnt_exc=`grep "$read_int" ${Directory_path}/file1.txt| wc -l`
if [ $cnt_exc -gt 0 ]
then int_1=0
else int_2=0
fi
done < <(awk -F' ' '{print $2}' ${Directory_path}/test_file.txt)
This leaves the while loop in the current shell, but arranges for the output of the command to appear as if from a file.
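A small made-up illustration of the difference: a counter updated inside the loop is still visible afterwards, which would not be the case if you piped into the loop instead.
count=0
while read -r readline
do
    count=$((count + 1))
done < <(awk -F' ' '{print $2}' ${Directory_path}/test_file.txt)
echo "processed $count records"   # still set, because the loop ran in the current shell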
The blank in ${Directory path} is not normally legal (unless it is another Bash feature I've missed out on); you also had a typo (Directoty) in one place.
Other ways of doing the same thing aside, the error in your program is this: You cannot redirect from (<) the output of another program. Turn your script around and use a pipe like this:
awk -F' ' '{ print $2 }' ${Directory path}/test_file.txt | while read readline
etc.
Besides, the use of "readline" as a variable name may or may not get you into problems.
In this particular case, you can use the following line
sed 's/   /\t/g' <file_name> | cut -f 2
to get your second column.
In bash you can start from something like this:
for n in `cat ${Directory_path}/test_file.txt | cut -d " " -f 4`
{
grep -c "$n" ${Directory_path}/file*.txt
}
This should have been a comment, but since I cannot comment yet, I am adding this here.
This is from an excellent answer here: https://stackoverflow.com/a/4483833/3138875
tr -s ' ' <text.txt | cut -d ' ' -f4
tr -s '<character>' squeezes multiple repeated instances of <character> into one.
It's not working in the script because of the typo in "Directoty path" (Directoty instead of Directory) on the last line of your script.
Cut isn't flexible enough. I usually use Perl for that:
cat file.txt | perl -F'   ' -ane 'print $F[1]."\n"'
Instead of a triple space after -F you can put any Perl regular expression. You access fields as $F[n], where n is the field number (counting starts at zero). This way there is no need for sed or tr.
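On the sample data from the question, that should give:
$ cat test_file.txt | perl -F'   ' -ane 'print $F[1]."\n"'
1239
3234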

Interpret as fixed string/literal and not regex using sed

For grep there's a fixed string option, -F (fgrep) to turn off regex interpretation of the search string.
Is there a similar facility for sed? I couldn't find anything in the man page. A recommendation of another gnu/linux tool would also be fine.
I'm using sed for the find and replace functionality: sed -i "s/abc/def/g"
Do you have to use sed? If you're writing a bash script, you can do
#!/bin/bash
pattern='abc'
replace='def'
file=/path/to/file
tmpfile="${TMPDIR:-/tmp}/$( basename "$file" ).$$"
while read -r line
do
echo "${line//$pattern/$replace}"
done < "$file" > "$tmpfile" && mv "$tmpfile" "$file"
With an older Bourne shell (such as ksh88 or POSIX sh), you may not have that cool ${var/pattern/replace} structure, but you do have ${var#pattern} and ${var%pattern}, which can be used to split the string up and then reassemble it. If you need to do that, you're in for a lot more code - but it's really not too bad.
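For what it's worth, here is a rough sketch of that split-and-reassemble idea. It replaces only the first literal occurrence of $pattern in $line (you would loop to replace them all), and I have not tested it on a real ksh88:
# replace the first literal occurrence of $pattern in $line with $replace
case $line in
    *"$pattern"*)
        line=${line%%"$pattern"*}$replace${line#*"$pattern"}
        ;;
esac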
If you're not in a shell script already, you could pretty easily make the pattern, replace, and filename parameters and just call this. :)
PS: The ${TMPDIR:-/tmp} structure uses $TMPDIR if that's set in your environment, or uses /tmp if the variable isn't set. I like to stick the PID of the current process on the end of the filename in the hopes that it'll be slightly more unique. You should probably use mktemp or similar in the "real world", but this is ok for a quick example, and the mktemp binary isn't always available.
Option 1) Escape regexp characters. E.g. sed 's/\$0\.0/0/g' will replace all occurrences of $0.0 with 0.
Option 2) Use perl -p -e in conjunction with quotemeta. E.g. perl -p -e 's/\\./,/gi' will replace all occurrences of . with ,.
You can use option 2 in scripts like this:
SEARCH="C++"
REPLACE="C#"
cat $FILELIST | perl -p -e "s/\\Q$SEARCH\\E/$REPLACE/g" > $NEWLIST
If you're not opposed to Ruby or long lines, you could use this:
alias replace='ruby -e "File.write(ARGV[0], File.read(ARGV[0]).gsub(ARGV[1]) { ARGV[2] })"'
replace test3.txt abc def
This loads the whole file into memory, performs the replacements and saves it back to disk. Should probably not be used for massive files.
If you don't want to escape your string, you can reach your goal in 2 steps:
fgrep the line (getting the line number) you want to replace, and
afterwards use sed for replacing this line.
E.g.
#!/bin/sh
PATTERN='foo*[)*abc' # we need it literal
LINENUMBER="$( fgrep -n "$PATTERN" "$FILE" | cut -d':' -f1 )"
NEWSTRING='my new string'
sed -i "${LINENUMBER}s/.*/$NEWSTRING/" "$FILE"
You can do this in two lines of bash code if you're OK with reading the whole file into memory. This is quite flexible -- the pattern and replacement can contain newlines to match across lines if needed. It also preserves any trailing newline or lack thereof, which a simple loop with read does not.
mapfile -d '' < file
printf '%s' "${MAPFILE//"$pat"/"$rep"}" > file
For completeness, if the file can contain null bytes (\0), we need to extend the above, and it becomes
mapfile -d '' < <(cat file; printf '\0')
last=${MAPFILE[-1]}; unset "MAPFILE[-1]"
printf '%s\0' "${MAPFILE[@]//"$pat"/"$rep"}" > file
printf '%s' "${last//"$pat"/"$rep"}" >> file
perl -i.orig -pse 'while (($i = index($_,$s)) >= 0) { substr($_,$i,length($s), $r) }' -- \
-s='$_REQUEST['\'old\'']' -r='$_REQUEST['\'new\'']' sample.txt
-i.orig in-place modification with backup.
-p print lines from the input file by default
-s enable rudimentary parsing of command line arguments
-e run this script
index($_,$s) search for the $s string
substr($_,$i,length($s), $r) replace the string
while (($i = index($_,$s)) >= 0) repeat as long as the string is still found
-- end of perl parameters
-s='$_REQUEST['\'old\'']', -r='$_REQUEST['\'new\'']' - set $s,$r
You still need to "escape" ' chars but the rest should be straightforward.
Note: this started as an answer to How to pass special character string to sed hence the $_REQUEST['old'] strings, however this question is a bit more appropriately formulated.
You should be using replace instead of sed.
From the man page:
The replace utility program changes strings in place in files or on the
standard input.
Invoke replace in one of the following ways:
shell> replace from to [from to] ... -- file_name [file_name] ...
shell> replace from to [from to] ... < file_name
from represents a string to look for and to represents its replacement.
There can be one or more pairs of strings.
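Following that synopsis, replacing abc with def in place would look something like this (note that replace ships with MySQL/MariaDB, so it may not be installed everywhere):
replace abc def -- file.txt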
