Extracting multiple values using xmlstarlet

How can I extract "failed" from all elements and add them up?
<gateway>
  <smscs>
    <count>3</count>
    <smsc>
      <id>a</id>
      <received><sms>0</sms><dlr>0</dlr></received>
      <sent><sms>10537</sms><dlr>0</dlr></sent>
      <failed>13</failed>
      <queued>6272</queued>
    </smsc>
    <smsc>
      <id>b</id>
      <received><sms>0</sms><dlr>0</dlr></received>
      <sent><sms>10530</sms><dlr>0</dlr></sent>
      <failed>10</failed>
      <queued>6284</queued>
    </smsc>
    <smsc>
      <id>c</id>
      <received><sms>0</sms><dlr>0</dlr></received>
      <sent><sms>10679</sms><dlr>0</dlr></sent>
      <failed>11</failed>
      <queued>6291</queued>
    </smsc>
  </smscs>
</gateway>

I simply used
xmlstarlet sel -t -v "sum(/gateway/smscs/smsc/failed)" -n input.xml
which returned
34
The idea is to use the sum() function which takes a node-set and returns the sum of all the elements' string-values converted to numbers.
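If you also want the per-smsc values alongside the total, a single sel template can print both; a sketch against the sample above (the output format is my own choice, not from the original answer):
xmlstarlet sel -t \
  -m "/gateway/smscs/smsc" -v "concat(id, ': ', failed)" -n -b \
  -o "total: " -v "sum(/gateway/smscs/smsc/failed)" -n input.xml
which should print a: 13, b: 10, c: 11, and then total: 34.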

Solved with
xmlstarlet sel -t -m "gateway/smscs/smsc/failed" -v "." -n input.xml | awk '{s+=$1} END {print s}'
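For reference, the -m/-v template on its own emits one failed value per line, which is exactly what the awk stage then sums:
$ xmlstarlet sel -t -m "gateway/smscs/smsc/failed" -v "." -n input.xml
13
10
11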

Related

Merge a string to a line extracted from a text file in UNIX

I want to merge a string, ABC, into a line that I have extracted from a file.
The following command extracts lines 20-25 of file_ABC, takes only the first column, and transposes it into a single row (line).
sed -n '20,25p' < file_ABC | awk '{print $1}' | paste -s
This is the result:
2727778 14734 0 0 0 2713044
I would like to add the string ABC at the first position of this line:
ABC 2727778 14734 0 0 0 2713044
Any suggestion on how to do that?
A quick hack would be to use something like
printf 'ABC\t%s\n' "$(sed -n '20,25p' < file_ABC | awk '{print $1}' | paste -s)"
You could modify your initial command instead to use awk for everything, though:
awk '
BEGIN {printf "ABC"}
NR>=20 && NR<=25 {printf "\t%s", $1}
END {print ""}
' file_ABC
This might work for you (GNU sed):
sed '20,25{s/\s.*//;H};$!d;x;s/^/ABC/;s/\n/ /g' file
Gather up the first column fields by appending them to the hold space for rows 20 to 25 only. At the end of the file prepend ABC and replace the introduced newlines by spaces.
For fun, bash only
filename=file_ABC
words=("${filename##*_}")
i=0
while read -r word rest_of_line; do
((++i < 20 )) && continue
(( i > 25 )) && break
words+=("$word")
done < "$filename"
join() { local IFS=$1; shift; echo "$*"; }
join $'\t' "${words[@]}"
But this will be much slower than a single awk call.
If you want to keep it all in one script:
$ awk 'BEGIN {line="ABC"}
NR>=20 && NR<=25 {line=line FS $1}
NR==25 {print line; exit}' file
Improved version, as suggested by @EdMorton:
$ awk 'NR>=20 {line=line OFS $1}
NR==25 {print "ABC" line; exit}' file

How to grep for the whole string after a word in the string is matched

I have this
-Xmx10240m -Xms10240m -verbose:gc -XX:+CMSParallelRemarkEnabled -XX:+UseParNewGC -XX:+ScavengeBeforeFullGC -Dsun.net.inetaddr.ttl=3600 -XX:CMSInitiatingOccupancyFraction=70 -XX:+UseCMSInitiatingOccupancyOnly -XX:+PrintTenuringDistribution -XX:SurvivorRatio=6 -XX:+UseConcMarkSweepGC -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintHeapAtGC -XX:PermSize=512m -XX:MaxPermSize=512m -Xloggc:/www/logs/jboss/macys-navapp_master_prod_cellA_m01/gc-log.txt -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/www/logs/heapdump/navapp_master_prod_cellA__m01/navapp_master_prod_cellA_m01.hprof -Djava.net.preferIPv4Stack=true -Dorg.jboss.resolver.warning=true -Djboss.modules.system.pkgs=org.jboss.byteman -Djava.awt.headless=true -XX:+UseCompressedOops -Dclient.encoding.override=ISO-8859-1 -XX:+DisableExplicitGC -Dorg.apache.jasper.Constants.USE_INSTANCE_MANAGER_FOR_TAGS=false -Dorg.apache.jasper.Constants.USE_INSTANCE_MANAGER -Dorg.apache.jasper.runtime.JspFactoryImpl.USE_POOL=false -Dorg.apache.jasper.runtime.BodyContentImpl.LIMIT_BUFFER=true -XX:NewSize=3072m -XX:MaxNewSize=3072m -agentpath:/www/a/apps/dynatrace/dt.so=name=server1_ProdCellA_master_m1,server=ct_collector:9998 -Dfile.encoding=ISO-8859-1 -Dsdp.configuration.home=/www/apps/properties -XX:+UseLargePages -Dzookeeper.sasl.client=false
I want to grep the whole string starting at the match "-agentpath", i.e. get back:
-agentpath:/www/a/apps/dynatrace/dt.so=name=server1_ProdCellA_master_m1,server=ct_collector:9998
This is the current command I'm working with, but it's not working:
cat cached_java_opts | awk '/-agentpath/ {print $(NF)}'
Thank you.
awk solution:
awk -F'-agentpath:' '{split($2,a," "); print FS a[1]}' infile
-agentpath:/www/a/apps/dynatrace/dt.so=name=server1_ProdCellA_master_m1,server=ct_collector:9998
grep: more or less the same as already answered.
grep -oP '\-agentpath:.*?\s' infile
-agentpath:/www/a/apps/dynatrace/dt.so=name=server1_ProdCellA_master_m1,server=ct_collector:9998
launch grep like this:
grep -o '\-agentpath[^ ]*' yourfile
The -o option prints only the matching pattern (not the matching line). Since the pattern is configured to expand until the first space, you'll get the entire argument (this works because it is not the last argument of the command line). Maybe it could be improved with grep -oE '\-agentpath([^ ]*|.*$)'
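A quick check suggests the character-class version also copes when -agentpath happens to be the last token on the line (the inputs below are hypothetical):
printf '%s\n' '-Xmx1g -agentpath:/opt/dt.so=a,b -Dfoo=bar' \
              '-Xmx1g -Dfoo=bar -agentpath:/opt/dt.so=a,b' \
  | grep -o '\-agentpath[^ ]*'
# -agentpath:/opt/dt.so=a,b
# -agentpath:/opt/dt.so=a,b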

Find word count using wc and assign to a variable

I want the word count (the wc -w value) to be assigned to a variable.
I've tried something like this, but I'm getting an error. What is wrong?
winget="this is the first line"
wdCount=$winget | wc -w
echo $wdCount
You need to use $(...) to assign the result:
wdCount=$(echo "$winget" | wc -w)
Or you could avoid echo by using a here-string:
wdCount=$(wc -w <<< "$winget")
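A minimal end-to-end check (bash, using the corrected command):
winget="this is the first line"
wdCount=$(wc -w <<< "$winget")
echo "$wdCount"    # prints 5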
You can get the word count without the filename appearing in the output by using redirection:
num_of_words=$(< "$file" wc -w)
See https://unix.stackexchange.com/a/126999/320461
You can use this to store the word count in a variable:
word_count=$(wc -w filename.txt | awk '{print $1}')

Is there a way to ignore header lines in a UNIX sort?

I have a fixed-width-field file which I'm trying to sort using the UNIX (Cygwin, in my case) sort utility.
The problem is there is a two-line header at the top of the file which is being sorted to the bottom of the file (as each header line begins with a colon).
Is there a way to tell sort either "pass the first two lines through unsorted" or to specify an ordering which sorts the colon lines to the top? The remaining lines always start with a 6-digit number (which is actually the key I'm sorting on), if that helps.
Example:
:0:12345
:1:6:2:3:8:4:2
010005TSTDOG_FOOD01
500123TSTMY_RADAR00
222334NOTALINEOUT01
477821USASHUTTLES21
325611LVEANOTHERS00
should sort to:
:0:12345
:1:6:2:3:8:4:2
010005TSTDOG_FOOD01
222334NOTALINEOUT01
325611LVEANOTHERS00
477821USASHUTTLES21
500123TSTMY_RADAR00
(head -n 2 <file> && tail -n +3 <file> | sort) > newfile
The parentheses create a subshell, wrapping up the stdout so you can pipe it or redirect it as if it had come from a single command.
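For instance, the grouped output can be piped onward like any single stream (file names hypothetical):
(head -n 2 data.txt && tail -n +3 data.txt | sort) | gzip > data.sorted.gz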
If you don't mind using awk, you can take advantage of awk's built-in pipe abilities
eg.
extract_data | awk 'NR<3{print $0;next}{print $0| "sort -r"}'
This prints the first two lines verbatim and pipes the rest through sort.
Note that this has the very specific advantage of being able to selectively sort parts of a piped input. All the other methods suggested will only sort plain files which can be read multiple times; this works on anything.
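A self-contained demonstration on a pure stream, with a two-line header:
printf 'H1\nH2\nb\nc\na\n' | awk 'NR<3{print $0; next}{print $0 | "sort"}'
# H1
# H2
# a
# b
# c
The header lines go straight to stdout, while sort buffers its input and only prints once awk finishes and closes the pipe, so the sorted body always lands after the header.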
In simple cases, sed can do the job elegantly:
your_script | (sed -u 1q; sort)
or equivalently,
cat your_data | (sed -u 1q; sort)
The key is in the 1q -- print first line (header) and quit (leaving the rest of the input to sort).
For the example given, 2q will do the trick.
The -u switch (unbuffered) is required for those seds (notably, GNU's) that would otherwise read the input in chunks, thereby consuming data that you want to go through sort instead.
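For example, with a one-line header and numeric data:
printf 'header\n3\n1\n2\n' | (sed -u 1q; sort -n)
# header
# 1
# 2
# 3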
Here is a version that works on piped data:
(read -r; printf "%s\n" "$REPLY"; sort)
If your header has multiple lines:
(for i in $(seq $HEADER_ROWS); do read -r; printf "%s\n" "$REPLY"; done; sort)
You can use tail -n +3 <file> | sort ... (tail will output the file contents from the 3rd line).
head -2 <your_file> && nawk 'NR>2' <your_file> | sort
example:
> cat temp
10
8
1
2
3
4
5
> head -2 temp && nawk 'NR>2' temp | sort -r
10
8
5
4
3
2
1
It only takes 2 lines of code...
head -1 test.txt > a.tmp;
tail -n+2 test.txt | sort -n >> a.tmp;
For numeric data, -n is required; for an alphabetic sort, it is not.
Example file:
$ cat test.txt
header
8
5
100
1
-1
Result:
$ cat a.tmp
header
-1
1
5
8
100
So here's a bash function whose arguments are exactly like sort's, supporting both files and pipes.
function skip_header_sort() {
    if [[ $# -gt 0 ]] && [[ -f ${@: -1} ]]; then
        local file=${@: -1}
        set -- "${@:1:$(($#-1))}"
    fi
    awk -v sargs="$*" 'NR<2{print; next}{print | "sort "sargs}' $file
}
How it works: this line checks whether there is at least one argument and whether the last argument is a file.
if [[ $# -gt 0 ]] && [[ -f ${@: -1} ]]; then
This saves the file into a separate variable, since we're about to erase the last argument.
local file=${@: -1}
Here we remove the last argument, since we don't want to pass it as a sort argument.
set -- "${@:1:$(($#-1))}"
Finally, we do the awk part, passing the arguments (minus the last argument, if it was a file) to sort inside awk. This was originally suggested by Dave, and modified here to take sort arguments. We rely on the fact that $file will be empty if we're piping, and thus ignored (which is why it is deliberately left unquoted).
awk -v sargs="$*" 'NR<2{print; next}{print | "sort "sargs}' $file
Example usage with a comma separated file.
$ cat /tmp/test
A,B,C
0,1,2
1,2,0
2,0,1
# SORT NUMERICALLY SECOND COLUMN
$ skip_header_sort -t, -nk2 /tmp/test
A,B,C
2,0,1
0,1,2
1,2,0
# SORT REVERSE NUMERICALLY THIRD COLUMN
$ cat /tmp/test | skip_header_sort -t, -nrk3
A,B,C
0,1,2
2,0,1
1,2,0
Here's a bash shell function derived from the other answers. It handles both files and pipes. The first argument is the file name or '-' for stdin; remaining arguments are passed to sort. A couple of examples:
$ hsort myfile.txt
$ head -n 100 myfile.txt | hsort -
$ hsort myfile.txt -k 2,2 | head -n 20 | hsort - -r
The shell function:
hsort ()
{
    if [ "$1" == "-h" ]; then
        echo "Sort a file or standard input, treating the first line as a header.";
        echo "The first argument is the file or '-' for standard input. Additional";
        echo "arguments to sort follow the first argument, including other files.";
        echo "File syntax : $ hsort file [sort-options] [file...]";
        echo "STDIN syntax: $ hsort - [sort-options] [file...]";
        return 0;
    elif [ -f "$1" ]; then
        local file=$1;
        shift;
        (head -n 1 "$file" && tail -n +2 "$file" | sort "$@");
    elif [ "$1" == "-" ]; then
        shift;
        (read -r; printf "%s\n" "$REPLY"; sort "$@");
    else
        >&2 echo "Error. File not found: $1";
        >&2 echo "Use either 'hsort <file> [sort-options]' or 'hsort - [sort-options]'";
        return 1;
    fi
}
This is the same approach as Ian Sherbin's answer, but here is my implementation:
cut -d'|' -f3,4,7 $arg1 | uniq > filetmp.tc
head -1 filetmp.tc > file.tc;
tail -n+2 filetmp.tc | sort -t"|" -k2,2 >> file.tc;
Another simple variation on all the others, reading the file only once (note that this relies on head leaving the remainder of the input for sort, which holds for seekable inputs such as regular files, but not for pipes):
HEADER_LINES=2
(head -n $HEADER_LINES; sort) < data-file.dat
With Python:
import sys

HEADER_ROWS = 2
for _ in range(HEADER_ROWS):
    sys.stdout.write(next(sys.stdin))
for row in sorted(sys.stdin):
    sys.stdout.write(row)
cat file_name.txt | sed 1d | sort
Note that this discards the header line entirely rather than keeping it at the top, so combine it with one of the approaches above if the header must be preserved.

How do I get a distinct list of special characters from a file using grep or sed?

I have a file which contains about 30,000 records delimited by '|'. I need to get a distinct list of only the special characters from the file.
For Eg:
123|fasdf|%df&|pap,came|!
234|%^&asdf|34|'":|
My output should be:
|%&,!^'":
Any help would be greatly appreciated.
Thanks,
Velraj.
grep -o '[|%&,!^":]' input | sort -u
You have to list all your special characters inside brackets.
This will return each unique special character on its own line. If you really need a string with these characters you have to remove newlines afterwards, e.g.:
grep -o '[|%&,!^":]' input | sort -u | tr -d '\n'
UPDATE:
If you need to remove all characters which are not from 'a-zA-Z0-9' set then you can use this one:
grep -o '[^a-zA-Z0-9]' input | sort -u | tr -d '\n'
echo "123|fasdf|%df&|pap,came|! 234|%^&asdf|34|'\":|" \
| { tr -d '[[:alnum:]]'; printf "\n"; } \
| sed 's/\(.\)/\1_/g' \
| awk -v 'RS=_' '{print $0}' \
| sort -u \
| awk '{printf $0}END{printf "\n"}'
output
!"%&',:^||
You can replace the first line echo .... with cat fileName
