How to print number of particular columns in shell script? - unix

I have a text file temp1; say it has more than 20 columns of numerical values, like this:
1,0,3,0,5........,
1,0,5,0,8........,
3,0,6,0,3........,
5,0,6,0,4........,
.................,
I want to remove every column whose total (sum) is zero and redirect the remaining columns to a new file.
I.e., in the example above the 2nd and 4th columns each have a total of zero, so I need to remove those two columns and redirect the rest to a separate file.
Can anyone help me, please?

$ cat file
1,0,3,0,5
1,0,5,0,8
3,0,6,0,3
5,0,6,0,4
$ awk -f tst.awk file
1,3,5
1,5,8
3,6,3
5,6,4
$ cat tst.awk
BEGIN { FS = "," }
{
    for (j=1; j<=NF; j++) {
        val[NR,j] = $j
        sum[j] += val[NR,j]
    }
}
END {
    for (i=1; i<=NR; i++) {
        ofs = ""
        for (j=1; j<=NF; j++) {
            if (sum[j]) {
                printf "%s%s", ofs, val[i,j]
                ofs = FS
            }
        }
        print ""
    }
}
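For a quick check, the same cache-and-filter logic as tst.awk can be run as a one-shot pipeline, with a couple of sample rows fed inline (plain POSIX awk, nothing gawk-specific):

```shell
# Cache every field and every column sum; in END, print only the columns
# whose sum is non-zero.
out=$(printf '1,0,3,0,5\n1,0,5,0,8\n' | awk '
    BEGIN { FS = "," }
    { for (j=1; j<=NF; j++) { val[NR,j] = $j; sum[j] += $j } }
    END {
        for (i=1; i<=NR; i++) {
            ofs = ""
            for (j=1; j<=NF; j++)
                if (sum[j]) { printf "%s%s", ofs, val[i,j]; ofs = FS }
            print ""
        }
    }')
printf '%s\n' "$out"   # columns 2 and 4 (all-zero) are dropped: 1,3,5 / 1,5,8
```

Note this buffers the whole file in memory, which is the price of making only one pass over the input.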

You can use awk. (The following is ugly but, I hope, readable. That's the goal. I'll let better awkists enhance/reduce it further.)
If the data is in the file /path/to/zefile:
awk -F',' '
    FNR==NR { for (col=1; col<=NF; col++) {
                  if ($col != 0) { wewantthiscolumn[col]=1 }
              }
              next
            }
            { for (col=1; col<=NF; col++) {
                  if (wewantthiscolumn[col]==1) { printf("%s,", $col) }
              }
              print ""
            }' /path/to/zefile /path/to/zefile | sed -e 's/,$//'
The idea: we launch awk on /path/to/zefile /path/to/zefile (hence it reads the file twice).
On the first pass, we build the "wewantthiscolumn" array. An entry is set to 1 as soon as that column contains something different from 0. The "next" ensures we only do this while FNR (the record number within the CURRENT file) == NR (the total record number), which is true only during the first pass.
On the second pass (we go directly to the 2nd { } block, as now NR > FNR), we display only the column values $col for which wewantthiscolumn[col]==1, each followed by a "," (so there is a small problem: the last column also gets a "," after it).
Then we pass the output through sed to get rid of the trailing ",".
I am not sure there isn't a much better way: can awk delete a field, so it could delete field col on the 2nd pass? Then it would be much easier to print the resulting $0, setting OFS=',' to separate the fields with commas.
This would make the 2nd pass (note: gensub is gawk-only, it returns the new string rather than editing in place, and reassigning a field rebuilds $0 with OFS, so OFS must also be set to ','):
awk -F',' -v OFS=',' '
    FNR==NR { for (col=1; col<=NF; col++) {
                  if ($col != 0) { wewantthiscolumn[col]=1 }
              }
              next
            }
            { for (col=1; col<=NF; col++) {
                  if (wewantthiscolumn[col] != 1)
                      $col = "DELETETHIS"
              }
              $0 = gensub(/,DELETETHIS/, "", "g")
              $0 = gensub(/DELETETHIS,/, "", "g")
              print $0
            }' /path/to/zefile /path/to/zefile
I didn't want to assume no columns could be empty, hence the "DELETETHIS" marker to make sure only the relevant fields are deleted... But this means the 1st way is in fact simpler: only print the fields you need, then get rid of the "," at the end of each line.
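A sketch of the comma-free variant of that idea: instead of printing a trailing "," and sed-stripping it afterwards, rebuild each output line from the kept fields only (the /tmp path and two-row sample here are just for the demo; substitute your own file):

```shell
# Two passes over the same file: pass 1 marks columns with any non-zero
# value, pass 2 joins only the kept fields with commas as it prints.
printf '1,0,3\n2,0,4\n' > /tmp/zefile_demo
out=$(awk -F',' '
    FNR==NR { for (c=1; c<=NF; c++) if ($c != 0) keep[c]=1; next }
    { line = ""
      for (c=1; c<=NF; c++)
          if (keep[c]) line = line (line == "" ? "" : ",") $c
      print line
    }' /tmp/zefile_demo /tmp/zefile_demo)
printf '%s\n' "$out"   # prints: 1,3 then 2,4
```

This avoids both the trailing comma and the extra sed process.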

Here's one way using awk. Run like:
awk -f ./script.awk file{,}
Contents of script.awk:
BEGIN {
    FS=","
}
FNR==NR {
    for (i=1; i<=NF; i++) {
        if ($i != 0) {
            a[i]
        }
    }
    next
}
{
    for (j=1; j<=NF; j++) {
        if (j in a) {
            printf "%s%s", $j, (j==NF ? RS : FS)
        }
    }
}
Alternatively, here's the one-liner:
awk -F, 'FNR==NR { for(i=1;i<=NF;i++) if ($i != 0) a[i]; next } { for(j=1;j<=NF;j++) if (j in a) printf "%s%s", $j, (j==NF ? RS : FS) }' file{,}
Contents of file:
1,0,3,0,5,0
1,0,5,0,8,1
3,0,6,0,3,2
5,0,6,0,4,5
Results:
1,3,5,0
1,5,8,1
3,6,3,2
5,6,4,5
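A note on the `file{,}` argument used above: it is shell brace expansion, not awk syntax. The shell expands it to `file file` before awk ever runs, which is what makes awk read the same file twice (this is a bash/ksh/zsh feature; plain POSIX sh does not do it):

```shell
# Brace expansion duplicates the word; awk then sees the file name twice.
bash -c 'echo file{,}'    # prints: file file
bash -c 'echo file{,,}'   # prints: file file file
```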

A solution using python:
#!/usr/bin/env python
def transpose(grid):
    return zip(*grid)

def removeBlankRows(grid):
    return [list(row) for row in grid if any(map(int, row))]

grid = []
with open("input.csv") as fd:
    for line in fd:
        grid.append(line.strip().split(','))

data = removeBlankRows(transpose(removeBlankRows(transpose(grid))))

for i in data:
    print(",".join(i))
input:
1,0,3,0,5
1,0,5,0,8
3,0,6,0,3
5,0,6,0,4
output:
1,3,5
1,5,8
3,6,3
5,6,4
input:
1,0,3,0,5
1,0,5,0,8
3,0,6,0,3
5,0,6,1,4
output:
1,3,0,5
1,5,0,8
3,6,0,3
5,6,1,4

Related

Trying to force an entry in an array to be an array

I am trying to create an associative array of associative arrays in gawk, and what I initially tried was:
options[key][subkey] = 1
However, when it got to this line, I unceremoniously received the error fatal: attempt to use scalar 'option["Declaration"]' as an array ("Declaration" being one of the main keys that my program uses, although I presume the exact value is irrelevant. At this particular point in the program, there was no "Declaration" entry assigned, although there were entries which had "Declaration" as a subkey on other entries, which may be meaningful).
So with a bit of googling, I found another stackoverflow question that looked like it answered my issue, so I put the following code immediately above it:
if (typeof(options[key]) != "array") {
    options[key] = 0;
    delete options[key];
    split("", options[key]);
}
However, this does not work either, instead now giving me the error: fatal: split: second argument is not an array
What am I doing wrong?
EDIT: Note, that I cannot use a basic 2-dimensional array here... for what I am doing, it is important that I am using one associative array to another because I need to be able to later identify the subkeys that were used on a given key.
Pursuant to requests below, I am posting the relevant functions that use the associative array, which may help clarify what is going on.
function add_concrete(key, concrete) {
    if (key == concrete) {
        return;
    }
    if (length(options[key]) > 0) {
        for (i in options[key]) {
            add_concrete(i, concrete);
        }
    }
    contains[key][concrete] = 1
}

function add_options(name, value) {
    subkey = trim(name);
    if (subkey == "") {
        return;
    }
    if (match(value, ";") > 0) {
        exporting = 0;
    }
    split(value, args, /[ |;]*/);
    for (i in args) {
        key = trim(args[i]);
        if (key != "") {
            print("Adding " name " to " key);
            options[key][subkey] = 1
            if (concrete[key]) {
                add_concrete(subkey, key);
            }
        }
    }
}
Sorry, cooking at the same time. As you didn't post much, I don't have much to work with, but without "initialization":
$ awk 'BEGIN {
    options[key] = 0;
    delete options[key];
    # options[key][1]        # cant see me
    split("", options[key]);
}'
awk: cmd. line:5: fatal: split: second argument is not an array
But with "initialization":
$ awk 'BEGIN {
    options[key] = 0;
    delete options[key];
    options[key][1]          # can see me
    split("", options[key]);
}'
$_ # see this cursor happily blinking without any error
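The same "initialization" trick as a runnable one-liner, for anyone who wants to verify it: referencing a subscript first commits options[key] to array type, after which split() accepts it (this needs gawk 4 or later, since arrays of arrays are a gawk extension):

```shell
# Referencing options[key][1] forces gawk to type options[key] as an array;
# split() on it is then legal instead of fatal.
gawk 'BEGIN {
    options[key][1]                  # reference forces array type
    delete options[key][1]           # drop the dummy element again
    n = split("a b", options[key])   # now fine: 2nd argument is an array
    print n
}'
```

This should print 2 (the number of fields split out of "a b") rather than dying with "second argument is not an array".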

Premature end of file lex

When I tried to compile it using make, it gives me an error:
premature end of file in lex.l at line no 17.
%option noyywrap
%{
#include "grammer.tab.h"
%}
name ([0-9])
whitespace [ \r\t\v\f]
linefeed \n
%%
{name} { return NAME; }
":" { return COLON; }
"->" { return RIGHT_ARROW; }
"{" { return LEFT_BRACE;}
"}" { return RIGHT_BRACE;}
";" { return SEMICOLON;}
{whitespace}
{linefeed} ++yylineno;
%%
Can someone kindly help me?
You usually get this error from lex (or flex) when the last line is not terminated by a newline.
To resolve the error just put a blank line at the end of the file.
(The same is also true for yacc/bison)
I also note you have a missing action for the pattern {whitespace}. I suggest you might try:
{whitespace} ; /* No action */
%%
/* End of the file */
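One way to check for (and repair) the missing final newline, which is the usual cause of this lex error. This is only a sketch in POSIX shell; the demo file name is arbitrary:

```shell
# If the last byte of the file is not a newline, "tail -c 1" prints a
# non-empty string (command substitution strips a trailing newline).
printf '%%option noyywrap' > /tmp/demo.l    # a file with no final newline
if [ -n "$(tail -c 1 /tmp/demo.l)" ]; then
    printf '\n' >> /tmp/demo.l              # append exactly one newline
fi
[ -z "$(tail -c 1 /tmp/demo.l)" ] && echo "file now ends with a newline"
```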

Join lines based on a starting value using UNIX commands

Here I am again, with another UNIX requirement (as my knowledge in UNIX is limited to basic commands).
I have a file that looks like this (and has about 30 million lines)
123456789012,PID=1,AID=2,EQOSID=1,PDPTY=IPV4,PDPCH=2-0
123456789012,PID=2,AID=1,EQOSID=1,PDPTY=IPV4,PDPCH=2-0
123456789012,PID=3,AID=8,EQOSID=1,PDPTY=IPV4,PDPCH=2-0
234567890123,PID=2,AID=1,EQOSID=1,PDPTY=IPV4,PDPCH=2-0
234567890123,PID=3,AID=8,EQOSID=1,PDPTY=IPV4,PDPCH=2-0
345678901234,PID=1,AID=2,EQOSID=1,PDPTY=IPV4,PDPCH=2-0
345678901234,PID=2,AID=1,EQOSID=1,PDPTY=IPV4,PDPCH=2-0
345678901234,PID=3,AID=8,EQOSID=1,PDPTY=IPV4,PDPCH=2-0
456789012345,PID=2,AID=1,EQOSID=1,PDPTY=IPV4,PDPCH=2-0
567890123456,PID=2,AID=1,EQOSID=1,PDPTY=IPV4,PDPCH=2-0
567890123456,PID=3,AID=8,EQOSID=1,PDPTY=IPV4,PDPCH=2-0
The final output should be like this (without the first value repeating in the joined portions)
123456789012,PID=1,AID=2,EQOSID=1,PDPTY=IPV4,PDPCH=2-0,PID=2,AID=1,EQOSID=1,PDPTY=IPV4,PDPCH=2-0,PID=3,AID=8,EQOSID=1,PDPTY=IPV4,PDPCH=2-0
234567890123,PID=2,AID=1,EQOSID=1,PDPTY=IPV4,PDPCH=2-0,PID=3,AID=8,EQOSID=1,PDPTY=IPV4,PDPCH=2-0
345678901234,PID=1,AID=2,EQOSID=1,PDPTY=IPV4,PDPCH=2-0,PID=2,AID=1,EQOSID=1,PDPTY=IPV4,PDPCH=2-0,PID=3,AID=8,EQOSID=1,PDPTY=IPV4,PDPCH=2-0
456789012345,PID=2,AID=1,EQOSID=1,PDPTY=IPV4,PDPCH=2-0
567890123456,PID=2,AID=1,EQOSID=1,PDPTY=IPV4,PDPCH=2-0,PID=3,AID=8,EQOSID=1,PDPTY=IPV4,PDPCH=2-0
However, if the above output is a bit complicated, an output like below is also fine. Because I can load the file into Oracle11g and get rid of the redundant columns.
123456789012,PID=1,AID=2,EQOSID=1,PDPTY=IPV4,PDPCH=2-0,123456789012,PID=2,AID=1,EQOSID=1,PDPTY=IPV4,PDPCH=2-0,123456789012,PID=3,AID=8,EQOSID=1,PDPTY=IPV4,PDPCH=2-0
234567890123,PID=2,AID=1,EQOSID=1,PDPTY=IPV4,PDPCH=2-0,234567890123,PID=3,AID=8,EQOSID=1,PDPTY=IPV4,PDPCH=2-0
345678901234,PID=1,AID=2,EQOSID=1,PDPTY=IPV4,PDPCH=2-0,345678901234,PID=2,AID=1,EQOSID=1,PDPTY=IPV4,PDPCH=2-0,345678901234,PID=3,AID=8,EQOSID=1,PDPTY=IPV4,PDPCH=2-0
456789012345,PID=2,AID=1,EQOSID=1,PDPTY=IPV4,PDPCH=2-0
567890123456,PID=2,AID=1,EQOSID=1,PDPTY=IPV4,PDPCH=2-0,567890123456,PID=3,AID=8,EQOSID=1,PDPTY=IPV4,PDPCH=2-0
Using awk is sufficient; it is a control-break report of sorts. Since the lines with the same key are grouped together — a very important point — it is fairly simple.
awk -F, '$1 != saved {
             if (saved != "") print saved "," list
             saved = $1
             list = ""
             pad = ""
         }
         {
             for (i = 2; i <= NF; i++) { list = list pad $i; pad = "," }
         }
         END { if (saved != "") print saved "," list }'
You can feed the data as standard input or list the files to be processed after the final single quote.
Sample output:
123456789012,PID=1,AID=2,EQOSID=1,PDPTY=IPV4,PDPCH=2-0,PID=2,AID=1,EQOSID=1,PDPTY=IPV4,PDPCH=2-0,PID=3,AID=8,EQOSID=1,PDPTY=IPV4,PDPCH=2-0
234567890123,PID=2,AID=1,EQOSID=1,PDPTY=IPV4,PDPCH=2-0,PID=3,AID=8,EQOSID=1,PDPTY=IPV4,PDPCH=2-0
345678901234,PID=1,AID=2,EQOSID=1,PDPTY=IPV4,PDPCH=2-0,PID=2,AID=1,EQOSID=1,PDPTY=IPV4,PDPCH=2-0,PID=3,AID=8,EQOSID=1,PDPTY=IPV4,PDPCH=2-0
456789012345,PID=2,AID=1,EQOSID=1,PDPTY=IPV4,PDPCH=2-0
567890123456,PID=2,AID=1,EQOSID=1,PDPTY=IPV4,PDPCH=2-0,PID=3,AID=8,EQOSID=1,PDPTY=IPV4,PDPCH=2-0
The code uses saved to keep track of the key column value it is accumulating. When the key column changes, it prints out the saved values (if there are any) and resets for the new set of lines. At the end, it prints out the saved values (if there are any). The code therefore deals with an empty file gracefully.
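The control-break idea can be exercised on a tiny grouped sample fed through standard input (the keys A/B here are made up; as with the question's data, lines with the same key must be adjacent). The separator is reset only when the key changes, so continuation lines keep their commas:

```shell
# Accumulate fields 2..NF per key; flush the accumulated list whenever
# the key in field 1 changes, and once more at end of input.
out=$(printf 'A,x=1\nA,x=2\nB,y=3\n' | awk -F, '
    $1 != saved {
        if (saved != "") print saved "," list
        saved = $1; list = ""; pad = ""
    }
    { for (i = 2; i <= NF; i++) { list = list pad $i; pad = "," } }
    END { if (saved != "") print saved "," list }')
printf '%s\n' "$out"   # prints: A,x=1,x=2 then B,y=3
```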
Perl options
#!/usr/bin/env perl
use strict;
use warnings;

my $saved = "";
my $list;
while (<>)
{
    chomp;
    my($key, $value) = ($_ =~ m/^([^,]+)(,.*)/);
    if ($key ne $saved)
    {
        print "$saved$list\n" if $saved;
        $saved = $key;
        $list = "";
    }
    $list .= $value;
}
print "$saved$list\n" if $saved;
Or, if you really want to, you can save yourself writing the loop (and using strict and warnings) with:
perl -n -e 'chomp;
    ($key,$value) = ($_ =~ m/^([^,]+)(,.*)/);
    if ($key ne $saved)
    {
        print "$saved$list\n" if $saved;
        $saved = $key;
        $list = "";
    }
    $list .= $value;
} END {
    print "$saved$list\n" if $saved;'
That could be squished down to a single (rather long) line. The } END { is a piece of Perl weirdness; the -n option creates a loop while (<>) { … } and interpolates the script in the -e argument into it, so the } in } END { terminates that loop and then creates an END block which is ended by the } that Perl provided. Yes, documented and supported; yes, extremely weird (so I wouldn't do it; I'd use the Perl script shown first).
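The implicit loop that -n adds can actually be made visible with the B::Deparse backend, if you want to see what Perl compiled (the exact formatting of the output varies between Perl versions, so treat the comment below as approximate):

```shell
# Deparse shows the while (<>) wrapper that -n injects around the -e code.
perl -MO=Deparse -n -e 'print' 2>/dev/null
# shows something like:  LINE: while (defined($_ = <ARGV>)) { print $_; }
```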
This awk script does what you want:
BEGIN { FS = OFS = "," }
NR == 1 { a[++n] = $1 }
a[1] != $1 { for(i=1; i<=n; ++i) printf "%s%s", a[i], (i<n?OFS:ORS); n = 1 }
{ a[1] = $1; for(i=2;i<=NF;++i) a[++n] = $i }
END { for(i=1; i<=n; ++i) printf "%s%s", a[i], (i<n?OFS:ORS) }
It stores all of the fields with the same first column in an array. When the first column differs, it prints out all of the elements of the array. Use it like awk -f join.awk file.
Output:
123456789012,PID=1,AID=2,EQOSID=1,PDPTY=IPV4,PDPCH=2-0,PID=2,AID=1,EQOSID=1,PDPTY=IPV4,PDPCH=2-0,PID=3,AID=8,EQOSID=1,PDPTY=IPV4,PDPCH=2-0
234567890123,PID=2,AID=1,EQOSID=1,PDPTY=IPV4,PDPCH=2-0,PID=3,AID=8,EQOSID=1,PDPTY=IPV4,PDPCH=2-0
345678901234,PID=1,AID=2,EQOSID=1,PDPTY=IPV4,PDPCH=2-0,PID=2,AID=1,EQOSID=1,PDPTY=IPV4,PDPCH=2-0,PID=3,AID=8,EQOSID=1,PDPTY=IPV4,PDPCH=2-0
456789012345,PID=2,AID=1,EQOSID=1,PDPTY=IPV4,PDPCH=2-0
567890123456,PID=2,AID=1,EQOSID=1,PDPTY=IPV4,PDPCH=2-0,PID=3,AID=8,EQOSID=1,PDPTY=IPV4,PDPCH=2-0
Here are some Python options, if you decide to go that route. The first works for multiple input files and non-sequential identical indices; the second doesn't read the whole file into memory.
(Note: I know it is not the convention, but I intentionally use UpperCase for variables to make it clear what is a user-defined variable and what is a special Python word.)
#! /usr/bin/env python
# -*- coding: utf-8 -*-
"""
concatenate comma-separated values based on first value

Usage:
    catfile.py *.txt > output.dat
"""
import sys

if len(sys.argv) < 2:
    sys.stderr.write(__doc__)
else:
    FileList = sys.argv[1:]
    IndexList = []
    OutDict = {}
    for FileName in FileList:
        with open(FileName) as FStream:
            for Line in FStream:
                if Line.strip():
                    Ind, TheRest = Line.rstrip().split(",", 1)
                    if Ind not in IndexList:
                        IndexList.append(Ind)
                    OutDict[Ind] = OutDict.get(Ind, "") + "," + TheRest
    for Ind in IndexList:
        print(Ind + OutDict[Ind])
Here is a different version which doesn't load the whole file into memory, but requires that the identical indices all occur in order; it also only runs on one file:
#! /usr/bin/env python
# -*- coding: utf-8 -*-
"""
concatenate comma-separated values based on first value

Usage:
    catfile.py *.txt > output.dat
"""
import sys

if len(sys.argv) < 2:
    sys.stderr.write(__doc__)
else:
    FileName = sys.argv[1]
    OutString = ''
    PrevInd = ''
    FirstLine = True
    with open(FileName) as FStream:
        for Line in FStream:
            if "," in Line:
                Ind, TheRest = Line.rstrip().split(",", 1)
                if Ind != PrevInd:
                    if not FirstLine:
                        print(PrevInd + OutString)
                    PrevInd = Ind
                    OutString = "," + TheRest
                    FirstLine = False
                else:
                    OutString += "," + TheRest
    if not FirstLine:
        print(PrevInd + OutString)
More generally, you can run these by saving them as, say, catfile.py and then doing python catfile.py inputfile.txt > outputfile.txt. Or, for a longer-term solution, make a scripts directory, add it to your $PATH, and make the scripts executable with chmod u+x catfile.py; then you can just type the name of the script from any directory. But that is another topic you would want to research.
A way without array:
BEGIN { FS = OFS = "," ; ORS = "" }
{
    if (lid == $1) { $1 = "" ; print $0 }
    else { print sep $0 ; lid = $1 ; sep = "\n" }
}
END { if (NR) printf "\n" }
Note: if you don't need a newline at the end, remove the END block.
This might work for you (GNU sed):
sort file | sed -r ':a;$!N;s/^(([^,]*),.*)\n\2/\1/;ta;P;D'
Sort the file (if need be) and then delete newline and key where duplicates appear.
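The sort-then-join pipeline can be tried on a toy sample to see what the sed loop does (this assumes GNU sed, since it uses -r and the `:a;$!N;...;ta` idiom from the answer; the A/B keys are made up):

```shell
# Each s/// joins a line with the following line when the following line
# starts with the same key, dropping the newline and the duplicate key.
out=$(printf 'A,1\nA,2\nB,3\n' | sort |
      sed -r ':a;$!N;s/^(([^,]*),.*)\n\2/\1/;ta;P;D')
printf '%s\n' "$out"   # prints: A,1,2 then B,3
```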

Check if the current time falls within defined time range on UNIX

Consider the below PSEUDO-CODE:
#!/bin/ksh
rangeStartTime_hr=13
rangeStartTime_min=56
rangeEndTime_hr=15
rangeEndTime_min=05

getCurrentMinute() {
    return `date +%M | sed -e 's/0*//'`;
    # Used sed to remove the padded 0 on the left. On successfully find&replacing
    # the first match it returns the resultant string.
    # date command does not provide minutes in long integer format, on Solaris.
}

getCurrentHour() {
    return `date +%l`; # %l hour (1..12)
}

checkIfWithinRange() {
    if [[ getCurrentHour -ge $rangeStartTime_hr &&
          getCurrentMinute -ge $rangeStartTime_min ]]; then
        # Ahead of start time.
        if [[ getCurrentHour -le $rangeEndTime_hr &&
              getCurrentMinute -le $rangeEndTime_min ]]; then
            # Within the time range.
            return 0;
        else
            return 1;
        fi
    else
        return 1;
    fi
}
Is there a better way of implementing checkIfWithinRange()? Are there any inbuilt functions in UNIX that make it easier to do the above? I am new to korn scripting and would appreciate your inputs.
The return command is used to return an exit status, not an arbitrary string. This is unlike many other languages. You use stdout to pass data:
getCurrentMinute() {
    date +%M | sed -e 's/^0//'
    # make sure sed only removes zero from the beginning of the line;
    # in the case of "00" don't be too greedy, so only remove one 0
}
Also, you need more syntax to invoke the function. Currently you are comparing the literal string "getCurrentMinute" in the if condition:
if [[ $(getCurrentMinute) -ge $rangeStartTime_min && ...
I would do it a bit differently:
start=13:56
end=15:05

checkIfWithinRange() {
    current=$(date +%H:%M)  # Gets the current time in the format 05:18
    [[ ($start = $current || $start < $current) && ($current = $end || $current < $end) ]]
}

if checkIfWithinRange; then
    do something
fi
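The reason plain string comparison works there: zero-padded HH:MM strings sort lexicographically in exactly the same order as chronologically. Here is the same check as a POSIX-sh sketch using expr (the function name in_range and the sample times are my own, for illustration):

```shell
# expr's <= does a string comparison for non-numeric operands and prints
# 1 (true) or 0 (false); zero-padded HH:MM makes string order = time order.
in_range() {   # in_range START END CURRENT, all in zero-padded HH:MM
    [ "$(expr "$1" \<= "$3")" = 1 ] && [ "$(expr "$3" \<= "$2")" = 1 ]
}
in_range 13:56 15:05 14:30 && echo "within range"   # prints: within range
in_range 13:56 15:05 15:06 || echo "out of range"   # prints: out of range
```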

How to print specific column in row wise using unix?

I have one input file which is given below.
Values,series,setupresultcode,nameofresultcode,resultcode
2,9184200,,serviceSetupResultempty,2001
11,9184200,0,successfulReleasedByService,2001
194,9184200,1,successfulDisconnectedByCallingParty,2001
101,9184200,2,successfulDisconnectByCalledParty,2001
2,9184201,0,successfulReleasedByService,2001
78,9184201,1,successfulDisconnectedByCallingParty,2001
32,9184201,2,successfulDisconnectByCalledParty,2001
4,9184202,0,successfulReleasedByService,2001
63,9184202,1,successfulDisconnectedByCallingParty,2001
37,9184202,2,successfulDisconnectByCalledParty,2001
I want output as given below:
Series,successfulReleasedByService,successfulDisconnectedByCallingParty,successfulDisconnectByCalledParty,serviceSetupResultempty
9184200,11,194,101,2
9184202,4,63,37,
Keep the series as the common value: print the value of the first column with respect to each result code, i.e. the third (integer) or fourth (string) column in the input file.
For example: the second column of the data holds a number of series; take 9184200. That series has 4 setupresultcode values (empty, 0, 1, 2), and the name of each result code is given in the 4th column. If the resultcode is 0, i.e. successfulReleasedByService, then I want to print the value 11 with respect to series 9184200.
Something like this might work although I haven't tested it, regard it as some kind of pseudo code.
#!/bin/awk -f
BEGIN {
    FS = OFS = ","
}
{
    # This part will be executed for every line
    if ($3 == "0" || $3 == "1" || $3 == "2") {
        # If the series has already been added, concat the results
        if ($2 in seriesarray) {
            seriesarray[$2] = seriesarray[$2] "," $1
        }
        # If it's a new series
        else {
            seriesarray[$2] = $1
        }
    }
}
END {
    # Iterate over the series and print the series id and the concatenated results
    for (series in seriesarray) {
        print series, seriesarray[series]
    }
}
This would yield something like
9184200,11,194,101
9184201,2,78,32
9184202,4,63,37
