I am trying to read a parameter from a URL. I can extract it from a single line, but I don't know how to loop in awk; can someone help?
I have a file with 1000+ entries like:
http://projectreporter.nih.gov/project_info_details.cfm?aid=7714687&icde=0
http://projectreporter.nih.gov/project_info_description.cfm?aid=7896503&icde=0
http://projectreporter.nih.gov/project_info_details.cfm?aid=7895320&icde=0
http://projectreporter.nih.gov/project_info_details.cfm?aid=2675186&icde=9195637
I am trying to retrieve only "aid=xxxxxxx". I used the following command, but it only gives me the "aid" from the last line:
awk '{match($0,"aid=([^ &]+)",a)}END{print a[1]}' file1.txt > outputFile.txt
How do I do the same in a loop so I can get all the occurrences?
Any help would be appreciated.
This should work, with a little fine-tuning of your attempted code:
awk 'match($0,/aid[^&]*/){print substr($0,RSTART,RLENGTH)}' Input_file
In case a single line can have multiple occurrences of aid and you want to print all of them, try the following:
awk '
{
while(match($0,/aid[^&]*/)){
print substr($0,RSTART,RLENGTH)
$0=substr($0,RSTART+RLENGTH)
}
}
' Input_file
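As a quick sanity check, here is the single-match form run against a couple of the sample URLs from the question (the file name Input_file follows the answer's convention; one match per line is enough for this data):

```shell
# Sample data from the question
cat > Input_file <<'EOF'
http://projectreporter.nih.gov/project_info_details.cfm?aid=7714687&icde=0
http://projectreporter.nih.gov/project_info_description.cfm?aid=7896503&icde=0
EOF

# Print the aid=... portion of each line
awk 'match($0,/aid[^&]*/){print substr($0,RSTART,RLENGTH)}' Input_file
```

With GNU grep, `grep -o 'aid=[^&]*' Input_file` gives the same result.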
I have a scenario where I am trying to add three fields in a line that I receive in my input file. What I have written doesn't feel like the best Unix practice, and I need some suggestions on how to write it as well as possible. Sample lines are below.
My questions are:
Is it possible to add all three fields using one awk command?
The input file may not contain some of these fields (based on the scenario); can awk handle this? Should I add any checks?
Sometimes the values may contain "~" at the end; how do I consider only the numeric part?
Input File1 Sample Line
CLP*4304096*20181231*BH*0*AH>444*158330.97~*FB*0*SS>02*0*SS>03*0*J1*0~
Input File2 sample line
CLP*4304096*20181231*BH*0*AH>444*158330.97*FB*0
Script I have written
clp=$(awk -F'[*]' '/CLP/{print $7}' $file)
ss02=$(awk -F'[*]' '/CLP/{print $11}' $file)
ss03=$(awk -F'[*]' '/CLP/{print $13}' $file)
clpsum=clp+ss02+ss03
I know it's not the best way. Please let me know how I can handle both the input file 1 scenario (it has 158330.97~) and the file 2 scenario.
Thanks!
In one awk command:
awk 'BEGIN{FS="*"}{var1=$7;var2=$11;var3=$13; var4=var1+var2+var3; printf("var4 = %.2f\n",var4)}' file.txt
This works as long as the fields are in the same positions; you might want a more robust answer if you need to handle files that come in with the numbers in different fields, etc. Hope this helps in any way.
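For the trailing "~" issue from the question, one hedged option is to lean on awk's numeric coercion: adding 0 to a field keeps only its leading numeric part, and an empty or missing field simply evaluates to 0. A minimal sketch using the question's File1 sample line:

```shell
# The +0 coercion drops the trailing "~" on 158330.97~ and turns
# non-numeric or missing fields into 0
printf 'CLP*4304096*20181231*BH*0*AH>444*158330.97~*FB*0*SS>02*0*SS>03*0*J1*0~\n' |
awk -F'*' '/CLP/{printf "%.2f\n", ($7+0)+($11+0)+($13+0)}'
```

This prints 158330.97 for the File1 line; the File2 line works the same way since fields that don't exist contribute 0 to the sum.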
I have a dataset which I am trying to select the first 10 columns from, and the last 27 columns from (from the 125th column onwards to the final 152nd column).
awk 'BEGIN{FS="\t"} { printf $1,$2,$3,$4,$5,$6,$7,$8,$9,$10; for(i=125; i<=NF; ++i) printf $i""FS; print ""}' Bigdata.txt > Smalldata.txt
Trying this code gives me the first 12 columns (with their data) plus the headers for all 152 columns from my original big data file. How do I select both columns 1-10 and 125-152 into a new file? I am new to Linux and any guidance would be appreciated.
Don't reinvent the wheel: if you already know the number of columns, cut is the tool for this task.
$ cut -f1-10,125-152 bigdata
Tab is the default delimiter.
If you don't know the number of columns, awk comes to the rescue!
$ cut -f1-10,$(awk '{print NF-27"-"NF; exit}' file) file
awk prints the end of the range by reading the first line of the file.
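To sanity-check the cut approach, here is a throwaway sketch that builds a one-row, 152-column tab-separated file (the file name bigdata is illustrative):

```shell
# One row holding the numbers 1..152, tab-separated
seq 1 152 | paste -s - > bigdata

# Keep columns 1-10 and 125-152 (38 columns in total)
cut -f1-10,125-152 bigdata
```

Column 11 of the output is the original column 125, which makes it easy to eyeball that the ranges are right.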
Using the KISS principle
awk 'BEGIN{FS=OFS="\t"}
{ c=""; for(i=1;i<=10;++i) { printf c $i; c=OFS}
for(i=NF-27;i<=NF;++i) { printf c $i }
printf ORS }' file
Could you please try the following; since no samples were provided, I couldn't test it. You need not write field values 1...10 manually: you can use a loop for that too.
awk 'BEGIN{FS=OFS="\t"}{for(i=1;i<=10;i++){printf("%s%s",$i,OFS)};for(i=(NF-27);i<=NF;i++){printf("%s%s",$i,i==NF?ORS:OFS)}}' Input_file > output_file
Also, you need not worry about headers here: we are simply printing the lines, with no logic applied to specific lines, so no special handling of the 1st line is needed.
EDIT: One more point: it seems you want the column values from the different ranges to appear on a single output line (for a single input line). The code above handles this, since it prints the separator between field values and prints a newline only after the last field; that way all the fields taken from one Input_file line stay on the same output line.
Explanation: Adding detailed explanation here.
awk ' ##Starting awk program here.
BEGIN{ ##Starting BEGIN section here, which will be executed before Input_file is getting read.
FS=OFS="\t" ##Setting FS and OFS as TAB here.
} ##Closing BEGIN section here for this awk code.
{ ##Starting a new BLOCK which will be executed when Input_file is being read.
for(i=1;i<=10;i++){ ##Running a for loop which will run 10 times from i=1 to i=10 value.
printf("%s%s",$i,OFS) ##Printing value of specific field with OFS value.
} ##Closing for loop BLOCK here.
for(i=(NF-27);i<=NF;i++){ ##Starting a for loop which will run for 27 last fields only as per OP requirements.
printf("%s%s",$i,i==NF?ORS:OFS) ##Printing the field value; the condition i==NF prints ORS (a newline) after the last field of the line and OFS (a tab) otherwise.
} ##Closing block for, for loop now.
}' Input_file > output_file ##Mentioning Input_file name here, whose output is going into output_file.
Sorry to ask this; it might be a trivial question. I tried an awk script as well, but I am new to that.
I have a list of IDs in a file, ids.txt:
1xre23
223dsf
234ewe
and a log file with FIX messages which might contain those IDs.
Sample log file abc.log:
35=D^A54=1xre23^A22=s^A120=GBP^A
35=D^A54=abcd23^A22=s^A120=GBP^A
35=D^A54=234ewe^A22=s^A120=GBP^A
35=D^A54=xyzw23^A22=s^A120=GBP^A
35=D^A54=223dsf^A22=s^A120=GBP^A
I want to check how many IDs matched in that log file.
There are almost 10K IDs, and the log file is around 300MB.
The sample output I am looking for is:
output:
35=D^A54=1xre23^A22=s^A120=GBP^A
35=D^A54=234ewe^A22=s^A120=GBP^A
35=D^A54=223dsf^A22=s^A120=GBP^A
Try something like this with the grep command:
grep -w -f ids.txt abc.log
Output:
35=D^A54=1xre23^A22=s^A120=GBP^A
35=D^A54=234ewe^A22=s^A120=GBP^A
35=D^A54=223dsf^A22=s^A120=GBP^A
If you'd like to use awk, this should do it:
awk -F"[=^]" 'FNR==NR {a[$0];next} $4 in a' ids.txt abc.log
35=D^A54=1xre23^A22=s^A120=GBP^A
35=D^A54=234ewe^A22=s^A120=GBP^A
35=D^A54=223dsf^A22=s^A120=GBP^A
This stores the ids.txt entries in array a.
If the fourth field (separated by = and ^) is in the array of IDs, the line is printed.
You can also do it the other way around, scanning each whole log line for any of the IDs:
awk 'FNR==NR {a[$0]; next} {for (i in a) if ($0 ~ i) {print; next}}' ids.txt abc.log
35=D^A54=1xre23^A22=s^A120=GBP^A
35=D^A54=234ewe^A22=s^A120=GBP^A
35=D^A54=223dsf^A22=s^A120=GBP^A
This stores all the IDs from ids.txt in array a.
Then it tests whether each log line contains any of the IDs (treated as regular expressions).
If yes, it prints the line; the next stops scanning after the first matching ID.
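Since the question also asks how many IDs matched, a hedged follow-up: with GNU grep, -o prints each matched ID by itself, so piping through sort -u | wc -l counts the distinct IDs that actually occur (-F treats the IDs as fixed strings rather than regexes, and -w keeps 1xre23 from matching inside a longer token):

```shell
# Files rebuilt from the question's samples
cat > ids.txt <<'EOF'
1xre23
223dsf
234ewe
EOF
cat > abc.log <<'EOF'
35=D^A54=1xre23^A22=s^A120=GBP^A
35=D^A54=abcd23^A22=s^A120=GBP^A
35=D^A54=234ewe^A22=s^A120=GBP^A
35=D^A54=xyzw23^A22=s^A120=GBP^A
35=D^A54=223dsf^A22=s^A120=GBP^A
EOF

# Count distinct IDs from ids.txt that appear in the log
grep -oFw -f ids.txt abc.log | sort -u | wc -l
```

For this sample the count is 3. The -F flag also matters for speed with 10K patterns against a 300MB log.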
I've been going through an online UNIX course and have come across this question, which I'm stuck on. I would appreciate any help!
You are provided with a set of files each one of which contains personal details about an individual. Each file is laid out in the following format, with one file per individual:
name:Niko Tanaka
age:41
occupation:Doctor
I know the answer has to be in the form:
n=$(awk -F: ' / /{print }' filename)
n=$(awk -F: '/name/{print $2}' infile)
Whatever is inside of / / is a regular expression. In this case you just want to match the line that contains 'name'.
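A quick end-to-end sketch using the question's sample record (the file name infile is illustrative):

```shell
cat > infile <<'EOF'
name:Niko Tanaka
age:41
occupation:Doctor
EOF

# $2 is the second colon-separated field on the line matching /name/
n=$(awk -F: '/name/{print $2}' infile)
echo "$n"
```

This prints Niko Tanaka. Note that $2 is only the second field, so a value that itself contained a colon would be truncated at that colon.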
Let's say you have a file containing texts (from 1 to N) separated by a $.
How can I split the file so the end result is N files?
text1 with newlines $
text2 $etc... $
textN
I'm thinking of something with awk or sed, but is there an available Unix tool that already performs that kind of task?
awk 'BEGIN{RS="$"; ORS=""} { textNumber++; print $0 > ("text" textNumber ".out") }' fileName
Thanks to Bill Karwin for the idea.
Edit: Added the ORS="" to avoid printing a newline at the end of each file.
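A small sketch of that approach on the question's sample input (the parentheses around the output file name are added here because some awks reject an unparenthesized concatenation after >):

```shell
cat > fileName <<'EOF'
text1 with newlines $
text2 $etc... $
textN
EOF

# Each "$"-terminated chunk goes to its own textN.out file
awk 'BEGIN{RS="$"; ORS=""} { textNumber++; print $0 > ("text" textNumber ".out") }' fileName

ls text*.out
```

Note that the newline after each $ ends up at the start of the next chunk, and the three separators here yield four output files.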
Maybe split -p pattern?
Hmm, that may not be exactly what you want. It doesn't split a line; it only starts a new file when it sees the pattern. And it seems to be supported only on BSD-related systems.
You could use something like:
awk 'BEGIN {RS = "$"} { ... }'
edit: You might find some inspiration for the { ... } part here:
http://www.gnu.org/manual/gawk/html_node/Split-Program.html
edit: Thanks to the comment from dmckee, but csplit also seems to copy the whole line on which the pattern occurs.
If I'm reading this right, the UNIX cut command can be used for this.
cut -d '$' -f 1- filename
I might have the syntax slightly off, but that should tell cut that you're using $-separated fields and to return fields 1 through the end.
The $ needs to be quoted (or escaped) so the shell doesn't interpret it.
awk -vRS="$" '{ print $0 > ("text" t++ ".out") }' ORS="" file
The split command splits files by line or byte counts, but the csplit command will let you split files based on regular expressions as well.