extracting first line from file using awk command - unix

I've been going through an online UNIX course and have come across this question which I'm stuck on. Would appreciate any help!
You are provided with a set of files each one of which contains personal details about an individual. Each file is laid out in the following format, with one file per individual:
name:Niko Tanaka
age:41
occupation:Doctor
I know the answer has to be in the form:
n=$(awk -F: ' / /{print }' filename)

n=$(awk -F: '/name/{print $2}' infile)
Whatever is inside the / / is a regular expression. In this case you just want to match the line that contains 'name'.
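As a quick check against the sample file above (assuming it is saved as infile), and anchoring the pattern so that a hypothetical field like nickname: could not also match:

n=$(awk -F: '/^name:/{print $2}' infile)
echo "$n"    # prints: Niko Tanaka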

Related

Bash/R searching columns in a huge table

I have a huge table I want to extract information from. First, I want to extract a certain line based on a pattern, which I've done successfully with grep. However, this line has loads of columns, and I'm interested only in a couple of them that match a certain pattern (a partial match at the beginning of the string). Is it possible to extract both the matching columns and their positions (the nth column)? Hope I was clear enough.
Languages: Preferably in bash but I can also work in R, alternatively I'm open to suggestions if you think another language can be more helpful.
Thanks!
Awk is perfect for stuff like this. To help you write a script I think we need more details. But I'm guessing you'll want to use the print feature of awk. To print out the nth column of a file "your_file" (the 3rd, say), do:
awk '{print $3}' your_file
(A literal $n in the script wouldn't work as a placeholder: an unset awk variable n evaluates to 0, and $0 is the whole line. Substitute the real column number, or pass it in with -v as below.)
In solving your problem you may also want to loop over all N columns, which you can do via (replace N with the actual number of columns; bash brace expansion needs a literal number):
for i in {1..N}; do
    awk -v col="$i" '{print $col}' your_file
done
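To get at what was actually asked, though (the matching columns and their positions), it's cheaper to loop over the fields inside awk itself than to rerun awk once per column. A minimal sketch, assuming whitespace-separated columns and two placeholder patterns you'd replace with your own (LINEPAT for the pattern that picks the line, PREFIX for the start-of-string partial match):

awk '/LINEPAT/ {
    for (i = 1; i <= NF; i++)     # scan every column of the matched line
        if ($i ~ /^PREFIX/)       # partial match anchored at the start
            print i, $i           # column number, then the value
}' your_file

This reads the file once and prints one line per matching column.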

AWK Command to Add values

I have a scenario where I am trying to add three fields in a line that I receive in my input file. I have written something which I feel doesn't follow best Unix practice, and I need some suggestions on how to write it as well as possible. Attached is my sample line.
My questions are:
Is it possible to add all three fields using one awk command?
The input file may not contain some of these fields (based on the scenario); can awk handle this? Should I do any checks?
Sometimes the values may contain "~" at the end; how do I consider only the numeric part?
Input File1 Sample Line
CLP*4304096*20181231*BH*0*AH>444*158330.97~*FB*0*SS>02*0*SS>03*0*J1*0~
Input File2 sample line
CLP*4304096*20181231*BH*0*AH>444*158330.97*FB*0
Script I have written
clp=$(awk -F'[*]' '/CLP/{print $7}' $file)
ss02=$(awk -F'[*]' '/CLP/{print $11}' $file)
ss03=$(awk -F'[*]' '/CLP/{print $13}' $file)
clpsum=clp+ss02+ss03
I know it's not the best way; please let me know how I can handle both the file 1 scenario (where the value has a trailing ~, e.g. 158330.97~) and the file 2 scenario.
Thanks!
in 1 awk command:
awk 'BEGIN{FS="*"}{var1=$7;var2=$11;var3=$13; var4=var1+var2+var3; printf("var4 = %.2f\n",var4)}' file.txt
This works as long as the values are always in the same field positions; you'd want a more robust approach if files can arrive with the numbers in different fields. Note that the trailing "~" takes care of itself: when awk uses a string like "158330.97~" in arithmetic, it converts the leading numeric prefix and ignores the rest, so no stripping is needed. Hope this helps.
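As a sketch of that more robust direction, on the assumption (mine, not guaranteed by the format) that each amount immediately follows its label, you can locate SS>02 and SS>03 by name instead of by fixed position, and keep $7 as the base amount:

awk -F'[*]' '/^CLP/ {
    sum = $7 + 0                          # +0 coerces "158330.97~" to 158330.97
    for (i = 1; i < NF; i++)              # find each label by name...
        if ($i == "SS>02" || $i == "SS>03")
            sum += $(i + 1)               # ...and add the field after it
    printf "%.2f\n", sum
}' "$file"

If a label is absent, the loop simply adds nothing for it, which covers the file 2 case. Doing the addition inside awk also sidesteps the clpsum=clp+ss02+ss03 line from the original script: a plain shell assignment performs no arithmetic at all, and $((...)) arithmetic is integer-only, so decimal values like 158330.97 need awk (or bc) anyway.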

How to delete lines from a file that start with certain words

My file is a CSV; on the Unix server it looks like the format below.
"Product_Package_Map_10302017.csv","451","2017-10-30 05:02:26"
"Targeting_10302017.csv","13","2017-10-30 05:02:26",
"Targeting_Options_10302017.csv","42","2017-10-30 05:02:27"
I want to delete a particular line based on filename keyword.
You can use grep -v:
grep -v '^"Product_Package_Map_10302017.csv"' file > file.filtered
'^"Product_Package_Map_10302017.csv"' matches the string "Product_Package_Map_10302017.csv" exactly at the line beginning
or sed can do it in-place:
sed -i '/^"Product_Package_Map_10302017.csv"/d' file
See this related post for other alternatives:
Delete lines in a text file that contain a specific string
See this previous question. A grep-based answer would be my first choice but, as you can see, there are many ways to address this one!

grep: how to show the next lines after the matched one until a blank line [not possible!]

I have a dictionary (not python dict) consisting of many text files like this:
##Berlin
-capital of Germany
-3.5 million inhabitants
##Earth
-planet
How can I show one entry of the dictionary with the facts?
Thank you!
You can't. grep doesn't have a way of showing a variable amount of context. You can use -A to show a set number of lines after the match, such as -A3 to show three lines after a match, but it can't be a variable number of lines.
You could write a quick Perl program to read from the file in "paragraph mode" and then print blocks that match a regular expression.
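A minimal sketch of that Perl approach, assuming the dictionary is saved as file.dict (-00 switches Perl into paragraph mode, so each blank-line-separated block is read as one record):

perl -00 -ne 'print if /^##Earth/i' file.dict

Because each entry from its ## heading down to the blank line is a single record, one match prints the heading together with all of its facts.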
as Andy Lester pointed out, you can't have grep show a variable amount of context, but a short awk statement might do what you're hoping for.
if your example file were named file.dict:
awk -v term="earth" 'BEGIN{IGNORECASE=1}{if($0 ~ "##"term){loop=1} if($0 ~ /^$/){loop=0} if(loop == 1){print $0}}' *.dict
returns:
##Earth
-planet
just change the variable term to the entry you're looking for.
assuming three things:
dictionary files have same extension (.dict for example purposes)
dictionary files are all in same directory (where command is called)
you're running gawk (IGNORECASE is a GNU awk extension; in other awks the match above stays case-sensitive)
If your grep supports perl regular expressions, you can do it like this:
grep -iPzo '(?s)##Berlin.*?\n(\n|$)'
See this answer for more on this pattern.
You could also do it with GNU sed like this:
query=berlin
sed -n "/$query/I"'{ :a; $p; N; /\n$/!ba; p; }'
That is, when case-insensitive $query is found, print until an empty line is found (/\n$/) or the end of file ($p).
Output in both cases (minor difference in whitespace):
##Berlin
-capital of Germany
-3.5 million inhabitants

How do you make awk ignore special characters in the input file?

Ok, so here is the issue. I am trying to create an awk program that adds a few characters to a column in a file. Simple enough, but the problem is the file contains characters awk interprets as escape or special characters, such as \ ^ & and /... I want awk to act as if all the characters between the field separators (or any non-field or non-record-separator characters, really) are simply supposed to be there and don't convey special information. I don't want it to interpret any part of the file in any special way. Is there a way to do this?
Judging from your comments, it seems that you are telling awk to use the file as if it were a program rather than treating it as data. Try:
awk -F\| '{print $2}' NH3
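To see that awk really does treat its input as plain data, here is a tiny check (the filename NH3 comes from the question; the sample line is invented):

printf '%s\n' 'a\b|^c&/d' > NH3
awk -F\| '{print $2}' NH3    # prints: ^c&/d

The backslash, caret, ampersand and slash pass through untouched: awk only interprets the program text between the single quotes, never the contents of the data file.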
