I am trying to retrieve the relevant rows from a text file, but I am unsure how to do it.
Below is a sample of the .txt file.
Name : Alice
Age : 23
Email : Alice#email.com
Name : John
Age : 24
Name : Peter
Age: 25
Email :Peter#email.com
As seen above, I only want to take the data for Alice and Peter, because John's record is incomplete (it is missing the Email line).
So the output should just be:
Name : Alice
Age : 23
Email : Alice#email.com
Name : Peter
Age: 25
Email :Peter#email.com
Just print the records that have 3 lines:
$ awk -v RS= -v ORS='\n\n' -F'\n' 'NF==3' file
Name : Alice
Age : 23
Email : Alice#email.com
Name : Peter
Age: 25
Email :Peter#email.com
You can even automate it to figure out how many lines each record should have instead of hard-coding 3:
$ awk -v RS= -v ORS='\n\n' -F'\n' 'NR==FNR{m=(NF>m?NF:m);next} NF==m' file file
Name : Alice
Age : 23
Email : Alice#email.com
Name : Peter
Age: 25
Email :Peter#email.com
That last one assumes there's at least one record in your file that IS complete.
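The two-pass trick works because NR (the global record count) equals FNR (the per-file record count) only while the first copy of the file is being read. A quick sketch, with a reduced sample written to a temp file (the filename and data are assumptions for illustration):

```shell
# First pass (NR==FNR): find the maximum field count m across records.
# Second pass: print only the records that have exactly m fields.
printf 'Name : Alice\nAge : 23\nEmail : a@x\n\nName : John\nAge : 24\n' > sample.txt
awk -v RS= -F'\n' 'NR==FNR{m=(NF>m?NF:m);next} NF==m' sample.txt sample.txt
# prints only Alice's 3-line record
```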
With GNU grep:
grep -Poz '^Name.*\n^Age.*\n^Email.*(\n^$)*' file
Output:
Name : Alice
Age : 23
Email : Alice#email.com
Name : Peter
Age: 25
Email :Peter#email.com
You can use the following awk command:
awk '/Name *:/ && /Age *:/ && /Email *:/' RS='' ORS='\n\n' file
(The patterns allow optional spaces before the colon, since the spacing in the sample data is inconsistent, e.g. Age: 25.)
From man awk:
If RS is set to the null string, then records
are separated by blank lines.
This makes awk operate on records rather than lines. /Name *:/ && /Age *:/ && /Email *:/ checks whether a record contains all three required fields. If it does, awk prints the record.
awk solution:
awk '/Name/{ n=$0 }n && /Age/{ a=$0; rn=NR }a && /Email/ && (NR-rn == 1){ print n RS a RS $0 RS }' file
The output:
Name : Alice
Age : 23
Email : Alice#email.com
Name : Peter
Age: 25
Email :Peter#email.com
Very terse Perl:
perl -00 -lne 'print if tr/\n/\n/ == 2' file.txt
I am trying to check whether each line in a file has the same length (or number of fields).
I am doing the following but it does not seem to work.
NR==1 {length=NF}
NR>1 && NF!=length {print}
Can this be done with an awk one-liner? A longer program is also fine.
A sample of input would be:
12 34 54 56
12 89 34 33
12
29 56 42 42
My expected output would be "yes" or "no", depending on whether all lines have the same number of fields.
You could try this command, which checks the number of fields in each line and compares it to the number of fields in the first line:
awk 'NR==1{a=NF; b=0} (NR>1 && NF!=a){print "No"; b=1; exit 1}END{if (b==0) print "Yes"}' test.txt
Checking aborts at the first line whose number of fields differs from that of the first line of input.
For input
12 43 43
12 32
you will get "No"
Try:
awk 'BEGIN{a="yes"} last!="" && NF!=last{a="no"; exit} {last=NF} END{print a}' file
How it works
BEGIN{a="yes"}
This initializes the variable a to yes. (We assume all lines have the same number of fields until proven otherwise.)
last!="" && NF!=last{a="no"; exit}
If last has been assigned a value and the number of fields on the current line is not the same as last, then set a to no and exit.
{last=NF}
Update last to the number of fields on the current line.
END{print a}
Before exiting, print a.
Examples
$ cat file1
2 34 54 56
12 89 34 33
12
29 56 42 42
$ awk 'BEGIN{a="yes"} last!="" && NF!=last{a="no"; exit} {last=NF} END{print a}' file1
no
$ cat file2
2 34 54 56
12 89 34 33
29 56 42 42
$ awk 'BEGIN{a="yes"} last!="" && NF!=last{a="no"; exit} {last=NF} END{print a}' file2
yes
I am assuming that you want to check whether all lines have the same number of fields; if that is the case, then try the following.
awk '
FNR==1{
value=NF
count++
next
}
{
count=NF==value?++count:count
}
END{
if(count==FNR){
print "All lines are of same fields"
}
else{
print "All lines are NOT of same fields."
}
}
' Input_file
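A quick run of the script above against a file containing one uneven line (the sample data and the Input_file name are assumptions for the demo):

```shell
# Line 3 has only one field, so count (2) never catches up to FNR (3)
printf '12 34 54 56\n12 89 34 33\n12\n' > Input_file
awk '
FNR==1{ value=NF; count++; next }
{ count=NF==value?++count:count }
END{
  if(count==FNR){ print "All lines are of same fields" }
  else          { print "All lines are NOT of same fields." }
}' Input_file
# prints: All lines are NOT of same fields.
```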
Additional (only if required): in case you want to print the contents of the file (when all of its lines have the same number of fields) along with the message, then try the following.
awk '
{
val=val?val ORS $0:$0
}
FNR==1{
value=NF
count++
next
}
{
count=NF==value?++count:count
}
END{
if(count==FNR){
print "All lines are of same fields" ORS val
}
else{
print "All lines are NOT of same fields."
}
}
' Input_file
This should do:
$ awk 'NR==1{p=NF} p!=NF{s=1; exit} END{print s?"No":"Yes"}' file
However, setting the exit status would be better if this will be part of a workflow.
Since equality is transitive, there is no need to keep any NF but the first line's; and using 0 as the success value means s needs no explicit initialization.
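A sketch of that exit-status variant (the filename is an assumption): with no END block, exit 1 becomes the script's exit status, so it composes directly with && and ||:

```shell
printf '1 2 3\n4 5\n' > file
awk 'NR==1{p=NF} p!=NF{exit 1}' file && echo Yes || echo No
# prints: No  (line 2 has 2 fields, line 1 has 3)
```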
An efficient "even fields" shell function, using sed to construct a whole-line regex (based on the first line of input) to feed to GNU grep, which looks for field-count mismatches:
# Usage: ef filename
ef() { sed '1s/[^ ]*/[^ ]*/g;q' "$1" | grep -v -x -m 1 -q -f - "$1" \
&& echo no || echo yes ; }
For files with uneven fields grep -m 1 quits after the first non-uniform line -- so if the file is a million lines long, but the mismatch occurs on line #2, grep only needs to read two lines, not a million. On the other hand, if there's no mismatch grep would have to read a million lines.
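To see what the sed step feeds to grep, run it alone: every field of line 1 is replaced with the pattern [^ ]* (spacing preserved), producing a template regex that grep then uses to hunt for lines that do not match it:

```shell
# Each of the four fields of line 1 becomes the pattern [^ ]*
printf '12 34 54 56\n' | sed '1s/[^ ]*/[^ ]*/g;q'
```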
I want to create a table that looks something like this, where I display a # at the front of every $2 plus a title header, but I do not want to store the header in my file:
Name ID Gender
John Smith #4567734D Male
Kyle Toh #2437882L Male
Julianne .T #0324153T Female
I got a result like this instead:
Name ID Gender
#
Name ID Gender
John Smith #4567734D Male
Name ID Gender
Kyle Toh #2437882L Male
Name ID Gender
Julianne .T #0324153T Female
By using this command:
awk -F: ' {print "Name:ID:Gender\n"}
{print $1":#"$2":"$3}' data.txt |column -s ":" -t
Print the headers only in the first line:
awk -F: 'NR==1 {print "Name:ID:Gender\n"} {print $1":#"$2":"$3}'
or:
awk -F: 'BEGIN {print "Name:ID:Gender\n"} {print $1":#"$2":"$3}'
Explanation:
An awk program consists of expressions in the form of:
CONDITION { ACTIONS } [ CONDITION { ACTIONS }, ... ]
... where CONDITION and ACTIONS are optional. If CONDITION is missing, like:
{ ACTIONS }
... then ACTIONS would be applied to any line of input. In the case above we have two expressions:
# Only executed if NR==1
NR==1 {print "Name:ID:Gender\n"}
# CONDITION is missing. ACTIONS will be applied to any line of input
{print $1":#"$2":"$3}
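Putting the NR==1 fix together with the column step from the question (the data.txt contents here are an assumption reconstructed from the desired output; the trailing \n is dropped from the header string so no blank line appears after it):

```shell
printf 'John Smith:4567734D:Male\nKyle Toh:2437882L:Male\n' > data.txt
# Header is printed once (NR==1), then every line gets # prefixed to $2
awk -F: 'NR==1{print "Name:ID:Gender"} {print $1":#"$2":"$3}' data.txt | column -s ":" -t
```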
I need to extract into a third file (output.txt) all rows of one file (Data.txt) that contain, in one of their columns, any entry from a list (List.txt).
Data.txt (tab delimited)
some_data more_data other_data here yet_more_data etc
A B 2 Gee;Whiz;Hello 13 12
A B 2 Gee;Whizz;Hi 56 32
E 4 Btm;Lol 16 2
T 3 Whizz 13 3
List.txt
Gee
Whiz
Lol
Ideally, output.txt would look like:
some_data more_data other_data here yet_more_data etc
A B 2 Gee;Whiz;Hello 13 12
A B 2 Gee;Whizz;Hi 56 32
E 4 Btm;Lol 16 2
So I tried a shell script
for ids in List.txt
do
grep $ids Data.txt >> output.txt
done
except I typed out everything (cut and paste actually) in List.txt in said script.
Unfortunately it gave me an output.txt that included the last line as well, I assume because 'Whizz' contains 'Whiz'.
I also tried cat Data.txt | egrep -F "List.txt", which resulted in grep: conflicting matchers specified -- I suppose that was too naive of me. The actual files: List.txt contains a sorted list of 985 words, Data.txt has 115576 rows with 17 columns.
Some help/guidance would be much appreciated thanks.
Try something like this (iterating over the contents of List.txt, not the filename):
for ids in $(cat List.txt)
do
grep "[TAB;]$ids[TAB;]" Data.txt >> output.txt
done
But it has two drawbacks:
"Data.txt" is scanned multiple times
You can get one line multiple times.
If that is a problem, try a two-step version:
sed -e "s/.*/[TAB;]&[TAB;]/" List.txt > List_mod.txt
grep -f List_mod.txt Data.txt > output.txt
Note:
A literal TAB character can be inserted on the command line with Ctrl-V followed by the Tab key, or typed directly in an editor. Check that your editor does not convert tabs into runs of spaces.
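A variant of the two-step version that avoids typing a literal Tab at all, by storing one in a shell variable first (filenames as in the question; & is the portable sed way to refer to the whole match):

```shell
TAB=$(printf '\t')                                       # a literal tab character
sed "s/.*/[${TAB};]&[${TAB};]/" List.txt > List_mod.txt  # wrap each word in [TAB;]...[TAB;]
grep -f List_mod.txt Data.txt > output.txt               # single pass, no duplicate lines
```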
The UNIX tool for general text processing is "awk":
awk '
NR==FNR { list[$0]; next }
{
for (word in list) {
if ($0 ~ "[\t;]" word "[\t;]") {
print
next
}
}
}
' List.txt Data.txt > output.txt
The title is self-explanatory. I am calling a web service which returns a String like this:
First Name="Kunal";Middle Name="";Last Name="Bhowmick";Address 1="HGB";Address 2="cvf";Address 3="tfg";City="DF";State="KL";Country="MN";Postal Code="0012";Telephone="(+98)6589745623"
Now I have to write a shell script to create a CSV file named CSV_Output.csv, and the file must be formatted with the String content.
The format must be something like this :
Field Name (in yellow color)    Value (in yellow color)
First Name Kunal
Middle Name
Last Name Bhowmick
Address 1 HGB
Address 2 cvf
Address 3 tfg
City DF
State KL
Country MN
Postal Code 0012
Telephone (+98)6589745623
Now I can easily generate a file using redirection (>>), but how can I create and format a CSV file in the format shown above?
Sorry to be blunt, but I have no code to show, as I do not understand what to use here.
Kindly provide some suggestions (sample code). Any help is greatly appreciated.
An awk one-liner can convert the format:
awk -v RS="\\n|;" -v OFS="\t" -F= '{gsub(/"/,"");$1=$1}7' file
If you want the output to look better, you can pass it to column and change the OFS, like:
awk -v RS="\\n|;" -v OFS="#" -F= '{gsub(/"/,"");$1=$1}7' file|column -s"#" -t
the output is:
kent$ awk -v RS="\\n|;" -v OFS="#" -F= '{gsub(/"/,"");$1=$1}7' f|column -s"#" -t
First Name Kunal
Middle Name
Last Name Bhowmick
Address 1 HGB
Address 2 cvf
Address 3 tfg
City DF
State KL
Country MN
Postal Code 0012
Telephone (+98)6589745623
Short explanation:
awk              #awk command
-v RS="\\n|;"    #set the record separator to \n (newline) or ; (semicolon); a regex RS is a GNU awk feature
-v OFS="\t"      #set output field separator: <tab>
-F=              #set "=" as input field separator
'{gsub(/"/,"");  #remove all double quotes
$1=$1}           #reassign $1 so awk rebuilds the line with the given OFS
7'               #any non-zero pattern: print the whole line
Can be achieved using tr and column:
$ cat input
First Name="Kunal";Middle Name="";Last Name="Bhowmick";Address 1="HGB";Address 2="cvf";Address 3="tfg";City="DF";State="KL";Country="MN";Postal Code="0012";Telephone="(+98)6589745623"
$ cat input | tr ";" "\n" | column -s= -t | tr -d \"
First Name Kunal
Middle Name
Last Name Bhowmick
Address 1 HGB
Address 2 cvf
Address 3 tfg
City DF
State KL
Country MN
Postal Code 0012
Telephone (+98)6589745623
Split the input on ;; pipe the output to column, specifying = as the delimiter; get rid of the quotes!
EDIT: I didn't realize that you want a CSV. In that event, use:
$ cat input | tr ";" "\n" | tr "=" "\t" | tr -d \"
which will result in TAB-delimited output.
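If you actually need comma-separated values rather than tabs (an assumption about the target format), swap the second tr:

```shell
# ; -> newline, = -> comma, then strip the double quotes
tr ';' '\n' < input | tr '=' ',' | tr -d '"'
```

Note this naive conversion does not re-quote fields, so a value containing a comma would break the CSV.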
file.txt is as below:
gui : 789
gui : 789
gui : 789
gui : 789
abc : 120
The following gives output:
$ grep -n "gui : 789" file.txt | cut -f1 -d:
1
2
3
4
If there are N such gui : 789 lines, how do I store their line numbers?
You can use this awk one-liner:
awk '/gui : 789/{print NR}' file
To process this inside a loop:
while read -r l
do
echo "$l"
done < <(awk '/gui : 789/{print NR}' file)
EDIT: These commands will work for any number of matches in the file. To store the line numbers output above in an array:
arr=( $(awk '/gui : 789/{print NR}' file) )
Later on, process these array elements as:
echo ${arr[0]}
echo ${arr[1]}
...
echo ${arr[5]}
Like this:
LINES=$(grep -n "gui : 789" file.txt | cut -f1 -d:)
The "LINES" variable will have: "1 2 3 4".
Note: your question was very generic. This answer will work for Bash or Korn Shell.
If you want to do processing with each line, you can do something like:
grep -n "gui : 789" file.txt | cut -f1 -d: | while read lineno; do
: # process using $lineno
done
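If all you need is the count N itself rather than the individual line numbers, grep -c reports the number of matching lines directly (the sample file is recreated here for illustration):

```shell
printf 'gui : 789\ngui : 789\nabc : 120\ngui : 789\n' > file.txt
grep -c "gui : 789" file.txt
# prints: 3
```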