How to extract this numerical value from the text file in unix - unix

I want to extract the value of VALUE_ID in the below text and store it in a variable.
MSG : SUCCESS! ABCDEFGHIJK
VALUE_ID: 775
Please note that there is a space after : in VALUE_ID.
Can we use awk for this or is there any easier way?

Here is a possible solution:
awk '$1 == "VALUE_ID:" {id=$2}' input_file
This seems fairly pointless to me, since id only exists inside awk and nothing is printed. If you describe your needs more precisely, then I could help you better.

With awk:
var=$(awk '$1 == "VALUE_ID:" {print $2}' File)
Inside the awk script, we check whether the first field of the line is VALUE_ID:. If it is, we print the second field, which is separated from the first by a space. The output is saved in the bash variable var, which will contain 775.
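The capture above can be tried end to end with the short sketch below. The temporary file and the variable names input_file and value_id are assumptions made for illustration; the added exit simply stops reading once the line is found.

```shell
# Hypothetical sample file reproducing the two lines from the question.
input_file=$(mktemp)
printf 'MSG : SUCCESS! ABCDEFGHIJK\nVALUE_ID: 775\n' > "$input_file"

# Print the second field of the line whose first field is exactly "VALUE_ID:",
# then stop reading the rest of the file.
value_id=$(awk '$1 == "VALUE_ID:" { print $2; exit }' "$input_file")

echo "VALUE_ID is $value_id"   # prints: VALUE_ID is 775
rm -f "$input_file"
```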

Related

How to read a value from recursive xml attribute in Unix using sed/awk/grep only

I have config.xml. Here I need to retrieve the value of the attribute from the xpath
/domain/server/name
I can only use grep/sed/awk. Need Help
The content of the xml is below where I need to retrieve the Server Name only.
<domain>
<server>
<name>AdminServer</name>
<port>1234</port>
</server>
<server>
<name>M1Server</name>
<port>5678</port>
</server>
<machine>
<name>machine01</name>
</machine>
<machine>
<name>machine02</name>
</machine>
</domain>
The output should be :
AdminServer
M1Server
I tried to do,
sed -ne '/<\/name>/ { s/<[^>]*>(.*)<\/name>/\1/; p }' config.xml
sed is only for simple substitutions on individual lines, doing anything else with sed is strictly for mental exercise, not for real code. That's not what you are trying to do so you shouldn't even be considering sed. Just use awk:
$ awk -F'[<>]' 'p=="server" && $2=="name"{print $3} {p=$2}' file
AdminServer
M1Server
That will work with any awk on any UNIX box. If that's not all you need then edit your question to provide more truly representative sample input and expected output.
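As a rough check of the previous-tag approach, the sketch below runs the same awk program against an indented copy of the sample. The temporary file and the variable names config and server_names are assumptions. Note that the program only prints a name when the <name> line immediately follows the <server> line, so it relies on that layout.

```shell
# Hypothetical copy of the sample XML, indented to show that leading
# whitespace lands in $1 and does not disturb the tag name in $2.
config=$(mktemp)
cat > "$config" <<'EOF'
<domain>
  <server>
    <name>AdminServer</name>
    <port>1234</port>
  </server>
  <server>
    <name>M1Server</name>
    <port>5678</port>
  </server>
  <machine>
    <name>machine01</name>
  </machine>
</domain>
EOF

# p holds the tag seen on the previous line; print the text of <name>
# only when that previous tag was <server>.
server_names=$(awk -F'[<>]' 'p == "server" && $2 == "name" { print $3 } { p = $2 }' "$config")

printf '%s\n' "$server_names"
rm -f "$config"
```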
Try this command. Name your xml and supply that file as an input.
awk '/<server>/,/<\/server>/' < name.xml | grep "name" | cut -d ">" -f2 | cut -d "<" -f1
Output:
AdminServer
M1Server
Based on your sample Input_file shown, could you please try following.
awk -F"[><]" '/<\/server>/{a="";next} /<server>/{a=1;next} a && /<name>/{print $3}' Input_file
sed -n '/<server>/{n;s/\s*<[^>]*>//gp}' config.xml
For example, for the first match:
1. /<server>/
matches the line that contains "<server>", giving " <server>".
2. n
the "n" command moves to the next line; after it runs, the pattern space holds " <name>AdminServer</name>".
3. s/\s*<[^>]*>//gp
replaces every "\s*<[^>]*>" with "" and then prints the pattern space. (\s is a GNU sed extension.)
Type "info sed" for more sed commands.
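You can get the desired output with just sed, as long as the substitution is restricted to the <server> blocks (otherwise the machine names are printed too):
sed -n '/<server>/,/<\/server>/s:.*<name>\(.*\)</name>.*:\1:p' config.xml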
I feel dirty parsing XML in awk.
The following finds the correct depth of entry with the right tag name. It does not verify the path, though it depends on the elements you specified. While this works on your example data, it makes certain ugly assumptions and it's not guaranteed to work elsewhere:
awk -F'[<>]' '$2~/^(domain|server|name)$/{n++} n==3&&$2=="name"{print $3} $0~/<\/(domain|server|name)>/{n--}' input.xml
A better solution would be to parse the XML itself.
$ awk -F'[<>]' -v check="domain.server.name" '$2~/^[a-z]/ { path=path "." $2 } substr(path,2)==check { print path " = " $3 } $0~/<\// { sub(/\.[^.]+$/,"",path) }' input.xml
.domain.server.name = AdminServer
.domain.server.name = M1Server
Here it is split out for easier commenting.
$ awk -F'[<>]' -v check="domain.server.name" '
# Split fields around pointy brackets. Supply a path to check.
$2~/^[a-z]/ { # If we see an open tag,
path=path "." $2 # append the current tag to our path.
}
substr(path,2)==check { # If we match the given path,
print path " = " $3 # print the result.
}
$0~/<\// { # If the line contains a close tag,
sub(/\.[^.]+$/,"",path) # drop the last element of the path.
}
' input.xml
Note that this solution barfs horribly if you feed it badly formatted XML. The recognition of tags could be improved, but may be sufficient if you have consistently formatted XML. It may barf horribly for other reasons too. Do not do this. Install the correct tools to parse XML properly.

Field separator to use only when it is not escaped, using awk

I have one question. Suppose I am using "=" as the field separator, and my string contains, for example,
abc=def\=jkl
If I use = as the field separator, it will split into 3 fields:
abc def\ jkl
But since I have escaped the 2nd "=", my output should be
abc def\=jkl
Can anyone please suggest how I can achieve this?
Thanks in advance
I find it simplest to just convert the offending string to some other string or character that doesn't appear in your input records (I tend to use RS if it's not a regexp* since that cannot appear within a record, or the awk builtin SUBSEP otherwise since if that appears in your input you have other problems) and then process as normal other than converting back within each field when necessary, e.g.:
$ cat file
abc=def\=jkl
$ awk -F= '{
gsub(/\\=/,RS)
for (i=1; i<=NF; i++) {
gsub(RS,"\\=",$i)
print i":"$i
}
}' file
1:abc
2:def\=jkl
* The issue with using RS if it is an RE (i.e. multiple characters) is that the gsub(RS...) within the loop could match a string that didn't get resolved to a record separator initially, e.g.
$ echo "aa" | gawk -v RS='a$' '{gsub(RS,"foo",$1); print "$1=<"$1">"}'
$1=<afoo>
When the RS is a single character, e.g. the default newline, that cannot happen so it's safe to use.
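The placeholder idea described above can also be sketched with SUBSEP instead of RS. The input line and the variable names line and fields are made up for illustration, and the sketch assumes the SUBSEP character never occurs in the data.

```shell
# Hypothetical input with two escaped separators.
line='abc=def\=jkl=foo\=bar'

fields=$(printf '%s\n' "$line" | awk -F= '{
    gsub(/\\=/, SUBSEP)              # hide each escaped "="; $0 is then re-split on "="
    for (i = 1; i <= NF; i++) {
        gsub(SUBSEP, "\\=", $i)      # put the escaped form back inside the field
        print i ":" $i
    }
}')

printf '%s\n' "$fields"
```

Because the escaped separators are hidden before awk re-splits the record, only the three unescaped "=" boundaries produce fields.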
If it is like the example in your question, it could be done.
awk doesn't support look-around regex. So it would be a bit difficult to get what you want by setting FS.
If I were you, I would do some preprocessing to make the data easier for awk to handle. Or you could read the line and use other awk functions, e.g. gensub(), to remove the = characters you don't want in the result, and then split... But I guess you want to achieve the goal by playing with the field separator, so I won't give those solutions here.
However, it could be done with the FPAT variable (a GNU awk feature).
awk -vFPAT='\\w*(\\\\=)?\\w*' '...' file
this will work for your example. I am not sure if it will work for your real data.
let's make an example, to split this string: "abc=def\=jkl=foo\=bar=baz"
kent$ echo "abc=def\=jkl=foo\=bar=baz"|awk -vFPAT='\\w*(\\\\=)?\\w*' '{for(i=1;i<=NF;i++)print $i}'
abc
def\=jkl
foo\=bar
baz
I think you want that result, don't you?
my awk version:
kent$ awk --version|head -1
GNU Awk 4.0.2

How do you make awk ignore special characters in the input file?

OK, so here is the issue. I am trying to create an awk program that adds a few characters to a column in a file. Simple enough, but the problem is that the file contains characters awk interprets as escape or special characters, such as \ ^ & and /... I want awk to act as if all characters between the field separators (or any character that is not a field or record separator, actually) are simply supposed to be there and don't convey special information. I don't want it to interpret any of the file in any special way. Is there a way to do this?
Judging from your comments, it seems that you are telling awk to use the file as if it were a program rather than treating it as data. Try:
awk -F\| '{print $2}' NH3
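To illustrate that awk leaves field contents alone when the file is passed as data, here is a small sketch. The sample line, the "_x" suffix and the variable names data and result are invented; only the pipe delimiter comes from the answer above.

```shell
# Hypothetical data line containing \ ^ & and / inside the fields.
data=$(mktemp)
printf '%s\n' 'a\b|c^d&e/f|g' > "$data"

# The awk program is given on the command line and the file is only data,
# so the characters inside each field are copied through unchanged.
# Append a literal suffix to the second "|"-separated column.
result=$(awk -F'|' -v OFS='|' '{ $2 = $2 "_x"; print }' "$data")

printf '%s\n' "$result"   # prints: a\b|c^d&e/f_x|g
rm -f "$data"
```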

How can I extract a substring from the results of a cut command in unix?

I have a file that is '|' delimited. One of the fields within the file is a time stamp. The field is in the following format: MM-dd-yyyy HH:mm:ss I'd like to be able to print to a file unique dates. I can use the cut command (cut -f1 -d'|' _file_name_ |sort|uniq) to extract unique dates. However, with the time portion of the field, I'm seeing hundreds of results. After I run the cut command, I'd like to take the substring of the first eleven characters to display unique dates. I tried using an awk command such as:
awk ' { print substr($1,1-11) }' | cut -f1 -d'|' _file_name_ |sort|uniq > _output_file_
I'm having no luck. Am I going about this the wrong way? Is there a more simple way of extracting the data I need. Any help would be appreciated.
cut -c1-11 will display characters 1-11 of each input line.
if the date is the first (space separated) field in the file, then the list of unique dates is just:
cut -f1 -d' ' filename | sort -u
Update: in addition to @shellter's correct answer, I'll just present an alternative to demonstrate other awk facilities:
awk -F'|' '{split($10, a, " "); date[a[1]]++} END {for (d in date) print d}' filename
You're almost there. This is based on the idea that the date time stamp is in field 1.
Edit: changed the field to 10, and used the -u option to sort instead of a separate uniq process.
You don't need the cut, awk will do that for you.
awk -F"|" ' { print substr($10,1,11) }' _file_name_ |sort -u > _output_file_
I hope this helps.
P.S. as you appear to be a new user, if you get an answer that helps you please remember to mark it as accepted, or give it a + (or -) as a useful answer
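A small sketch of the substr approach follows. The sample records are invented, and the timestamp is placed in field 1 for brevity (the answers above use field 10). Since MM-dd-yyyy is ten characters long, the sketch takes ten characters rather than eleven, which leaves out the trailing space.

```shell
# Hypothetical pipe-delimited records with the timestamp in the first field.
log=$(mktemp)
cat > "$log" <<'EOF'
01-15-2013 10:00:01|alpha
01-15-2013 11:30:45|beta
01-16-2013 09:12:00|gamma
EOF

# Keep only the date part of the timestamp, then de-duplicate.
dates=$(awk -F'|' '{ print substr($1, 1, 10) }' "$log" | sort -u)

printf '%s\n' "$dates"
rm -f "$log"
```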

How do you split a file based on a token?

Let's say you have a file containing texts (from 1 to N) separated by a $.
How can I split the file so the end result is N files?
text1 with newlines $
text2 $etc... $
textN
I'm thinking something with awk or sed but is there any available unix app that already perform that kind of task?
awk 'BEGIN{RS="$"; ORS=""} { textNumber++; print $0 > ("text" textNumber ".out") }' fileName
Thanks to Bill Karwin for the idea.
Edit: added ORS="" to avoid printing a newline at the end of each file.
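The sketch below tries the RS="$" approach in a scratch directory. The directory, the sample text and the variable names workdir, count and second are assumptions. It also closes each output file after writing, which helps when N is large enough to hit the open-file limit.

```shell
# Hypothetical input: three texts separated by "$".
workdir=$(mktemp -d)
printf 'text1 with\nnewlines $text2 $text3\n' > "$workdir/input.txt"

# One record per "$"-separated text; ORS="" avoids adding an extra newline.
awk -v dir="$workdir" 'BEGIN { RS = "$"; ORS = "" }
{
    n++
    out = dir "/text" n ".out"   # parenthesis-free: build the name first
    print $0 > out
    close(out)                   # release the file handle before the next record
}' "$workdir/input.txt"

count=$(ls "$workdir"/text*.out | wc -l | tr -d ' ')
second=$(cat "$workdir/text2.out")
echo "$count files written"
rm -rf "$workdir"
```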
Maybe split -p pattern?
Hmm. That may not be exactly what you want. It doesn't split a line, it only starts a new file when it sees the pattern. And it seems to be supported only on BSD-related systems.
You could use something like:
awk 'BEGIN {RS = "$"} { ... }'
edit: You might find some inspiration for the { ... } part here:
http://www.gnu.org/manual/gawk/html_node/Split-Program.html
edit: Thanks to comment from dmckee, but csplit also seems to copy the whole line on which the pattern occurs.
If I'm reading this right, the UNIX cut command can be used for this.
cut -d $ -f 1- filename
I might have the syntax slightly off, but that should tell cut that you're using $ separated fields and to return fields 1 through the end.
You may need to escape the $.
awk -vRS="$" '{ print $0 > ("text" t++ ".out") }' ORS="" file
Using the split command we can split on strings,
but the csplit command will allow you to split files based on regular expressions as well.

Resources