Using `position()` to extract specific element - xmlstarlet

I have a data file in a format similar to this:
<!-- mydata.xml -->
<alldata>
<data id="first">
<coord><x>0</x><y>5</y></coord>
<coord><x>1</x><y>4</y></coord>
<coord><x>2</x><y>3</y></coord>
</data>
<data id="second">
<coord><x>0</x><y>2</y></coord>
<coord><x>1</x><y>1</y></coord>
<coord><x>2</x><y>0</y></coord>
</data>
</alldata>
As the x values are the same in all the datasets in my XML files, I would like to extract the data to CSV format like this:
x;first y;second y
0;5;2
1;4;1
2;3;0
Naively, I tried to match each <coord> in the first <data> element and use position() to pick the corresponding <y> from the <coord> elements of the <data> element whose id is "second":
xml sel -T -t -m "/alldata/data[@id='first']/coord" -v "concat(x,';',y,';',../../data[@id='second']/coord[position()]/y,';',position())" -n mydata.xml
This outputs the <y> of the first <coord> on every line, even though position() is incremented on each line:
0;5;2;1
1;4;2;2
2;3;2;3
How can I achieve what I set out to do?

position() gives the position of the context node, which depends on where it is called. Inside a predicate, the context is the predicate's own node list, so ../../data[@id='second']/coord[position()] actually means "every coord under the second data that is at its own position", which is all of them; XPath 1.0 string conversion then takes only the first one.
To refer to the coord you're looping on, you can use the XSLT function current(). You can't get its loop position with position() inside the predicate, because there position() always refers to the predicate's node list, but you can count() the preceding-sibling nodes instead:
xml sel -T -t -m "/alldata/data[@id='first']/coord" -v "concat(x,';',y,';',../../data[@id='second']/coord[count(current()/preceding-sibling::*)+1]/y)" -n mydata.xml
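Combined with the header line from the question, a complete command might look like the sketch below. It assumes the binary is installed as xmlstarlet (the question invokes it as xml, which is the name some distributions use), and the function name csv_from_mydata is hypothetical. The -o option prints a literal string, so the header is emitted once before the -m loop starts.

```shell
#!/bin/sh
# Hypothetical helper: csv_from_mydata FILE
# Prints "x;first y;second y" followed by one CSV row per <coord> of the "first" dataset.
# Assumes the binary is named xmlstarlet (some distributions install it as xml).
csv_from_mydata() {
    xmlstarlet sel -T -t \
        -o 'x;first y;second y' -n \
        -m "/alldata/data[@id='first']/coord" \
        -v "concat(x, ';', y, ';', ../../data[@id='second']/coord[count(current()/preceding-sibling::coord) + 1]/y)" -n \
        "$1"
}

# Usage: csv_from_mydata mydata.xml
```

Counting preceding-sibling::coord rather than * keeps the index correct even if other element types are ever mixed in among the coord elements.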

Related

How to get the placeholder's value which is stored in a different file (same directory) using JSch exec

With these conditions:
I cannot use any XML parser tool, as I don't have permission (read-only access).
My xmllint version does not support XPath, and I cannot update it (read-only access).
I don't have xmlstarlet and cannot install it.
I run my script through a Java JSch exec channel (I have to run it there).
So we have 3 files in a directory.
sample.xml
values1.properties
values2.properties
The contents of the files are as follows:
sample.xml
<block>
<name>Bob</name>
<address>USA</address>
<email>$BOB_EMAIL</email>
<phone>1234567</phone>
</block>
<block>
<name>Peter</name>
<address>France</address>
<cell>123123123</cell>
<drinks>Coke</drinks>
<car>$PETER_CAR</car>
<bike>Mountain bike</bike>
</block>
<block>
<name>George</name>
<hobby>$GEORGE_HOBBY</hobby>
<phone>$GEORGE_PHONE</phone>
</block>
values1.properties
JOE_EMAIL=joe@google.com
BOB_EMAIL=bob@hotshot.com
JACK_EMAIL=jack@jill.com
MARY_EMAIL=mary@rose.com
PETER_EMAIL=qwert1@abc.com
GEORGE_PHONE=Samsung
values2.properties
JOE_CAR=Honda
DAISY_CAR=Toyota
PETER_CAR=Mazda
TOM_CAR=Audi
BOB_CAR=Ferrari
GEORGE_HOBBY=Tennis
I use this command to extract the XML block and convert it to properties-file format:
NAME="Bob"
sed -n '/name>'${NAME}'/,/<\/block>/s/.*<\(.*\)>\(.*\)<.*/\1=\2/p' sample.xml
OUTPUT:
name=Bob
address=USA
email=$BOB_EMAIL
phone=1234567
How do I get the value of $BOB_EMAIL from values1.properties and values2.properties, assuming I do not know which of the two (or possibly more) properties files it is in? It should also work for other names; for example, if I entered
Name=Peter
in the script, it should get
name=Peter
address=France
cell=123123123
drinks=Coke
car=$PETER_CAR
bike=Mountain bike
and the key to look up would be PETER_CAR.
EXPECTED OUTPUT (The user only needs to input 1 Name at a time and the output expected is one set of data in properties format with the $PLACEHOLDER replaced with the value from the properties file):
User Input: Name=Bob
name=Bob
address=USA
email=bob@hotshot.com
phone=1234567
User Input: Name=Peter
name=Peter
address=France
cell=123123123
drinks=Coke
car=Mazda
bike=Mountain bike
Ultimately, the script I need has this logic:
for every word starting with $ in the output of sed -n '/name>'${name}'/,/<\/block>/s/.*<\(.*\)>\(.*\)<.*/\1=\2/p' sample.xml,
search for the value of that word in all of the properties files in that directory (or in specified properties files),
then replace the $ word with the value found in the properties file.
PARTIALLY WORKING ANSWER:
Walter A's answer works on the command line (PuTTY) but not through the JSch exec channel.
I keep getting the error No value found for token 'var'.
The solution below looks in the properties files many times, so I think there is a faster solution to the problem.
Still, it will get you started, and with small files you might be happy with it.
# The question has bash and ksh tags; choose the shebang line you want.
# Make sure it is the first line without space or ^M after it.
#!/bin/ksh
#!/bin/bash
# Remove next line (debugging) when all is working
# set -x
for name in Bob Peter; do
sed -n '/name>'${name}'/,/<\/block>/s/.*<\(.*\)>\(.*\)<.*/\1=\2/p' sample.xml |
while IFS='$' read -r line var; do
if [ -n "${var}" ]; then
echo "${line}$(grep "^${var}=" values[12].properties | cut -d= -f2-)"
else
echo "${line}"
fi
done
echo
done
EDIT: Commented two possible shebang lines, set -x and added output.
Result:
name=Bob
address=USA
email=bob@hotshot.com
phone=1234567
name=Peter
address=France
cell=123123123
drinks=Coke
car=Mazda
bike=Mountain bike
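The loop above runs grep once per placeholder. A single-pass alternative, swapping the grep loop for awk, reads every properties file once into an array and then rewrites each $KEY token in the sed output. This is a sketch, not a tested drop-in for the JSch setup: the function name expand_block and its argument order are hypothetical, it assumes KEY=value lines with no spaces around the =, and placeholders with no matching key are left unchanged.

```shell
#!/bin/sh
# Hypothetical helper: expand_block NAME XMLFILE PROPS...
# Prints NAME's <block> as key=value lines, replacing each $KEY placeholder
# with its value from the given properties files (unknown keys are kept as-is).
expand_block() {
    name=$1
    xml=$2
    shift 2
    sed -n '/<name>'"$name"'</,/<\/block>/s/.*<\(.*\)>\(.*\)<.*/\1=\2/p' "$xml" |
    awk '
        # Phase 1 (properties files): store KEY=value, splitting at the first "=".
        phase != 2 {
            eq = index($0, "=")
            if (eq > 1) value[substr($0, 1, eq - 1)] = substr($0, eq + 1)
            next
        }
        # Phase 2 (sed output on stdin): substitute every $KEY token.
        {
            out = ""
            rest = $0
            while (match(rest, /\$[A-Za-z_][A-Za-z0-9_]*/)) {
                key = substr(rest, RSTART + 1, RLENGTH - 1)
                rep = (key in value) ? value[key] : substr(rest, RSTART, RLENGTH)
                out = out substr(rest, 1, RSTART - 1) rep
                rest = substr(rest, RSTART + RLENGTH)
            }
            print out rest
        }
    ' "$@" phase=2 -
}

# Usage: expand_block Bob sample.xml values*.properties
```

The phase=2 operand is a standard awk command-line assignment: it takes effect after the properties files have been read and before standard input (-) is processed, which is how the script tells the two kinds of input apart.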
. values1.properties
. values2.properties
sed -n '/name>'${NAME}'/,/<\/block>/s/.*<\(.*\)>\(.*\)<.*/echo \1="\2"/p' sample.xml >output
. output
Dangerous, and not the way I would prefer to do it.
A sed based version:
$ temp_properties=`mktemp`
$ NAME=Bob
$ sed '/./{s/^/s|$/;s/=/|/;s/$/|g/}' values*.properties > $temp_properties
$ sed -n '/name>'${NAME}'/,/<\/block>/s/.*<\(.*\)>\(.*\)<.*/\1=\2/p' sample.xml | sed -f $temp_properties
Gives:
name=Bob
address=USA
email=bob@hotshot.com
phone=1234567
It is open to script injection. However, if you trust the values*.properties files and the contents of the NAME variable, you are good to go.

Extracting node values with xmlstarlet

I have this XML file, and I want to extract the values of all the nodes one by one using XMLStarlet in a shell script:
<service>
<imageScroll>
<imageName>Photo_Gallerie_1.jpg</imageName>
</imageScroll>
<imageScroll>
<imageName>Photo_Gallerie_2.jpg</imageName>
</imageScroll>
<imageScroll>
<imageName>Photo_Gallerie_3.jpg</imageName>
</imageScroll>
</service>
xmlstarlet sel -t -m "//imageName" -v . -n your.xml
output:
Photo_Gallerie_1.jpg
Photo_Gallerie_2.jpg
Photo_Gallerie_3.jpg
Is that what you needed?
sel: select mode
-t: output template (this is pretty much required)
-m: for each match of the following XPath expression
"//imageName": the double slash means the node can be anywhere in the tree, and imageName is the name of the node you want
-v: prints the value of an expression relative to the current node; the . represents the current element in the iteration (you could put the name of the node there, but it's generally easier this way)
-n: adds a newline after every value you match
That was the solution I found, and it did the job perfectly:
imagescroller=`xmlstarlet sel -t -m "//root/services/service/imageScroll[rank_of_the_desired_item]" -v imageName -n myfile.xml`
Sorry for the late reply.
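To process every image one at a time without knowing in advance how many there are, one option is to count the imageScroll elements first and then loop over the ranks. This is a sketch assuming xmlstarlet is installed; the function name each_image is hypothetical, and the XPath should be adapted to your file's real structure (the command above uses //root/services/service/...). It writes (//imageScroll)[$i] with parentheses so the index counts across the whole document rather than among each parent's children.

```shell
#!/bin/sh
# Hypothetical helper: each_image FILE
# Prints "rank: imageName" for every <imageScroll>, one xmlstarlet call per item.
each_image() {
    file=$1
    # Number of <imageScroll> elements anywhere in the document.
    count=$(xmlstarlet sel -T -t -v 'count(//imageScroll)' "$file")
    i=1
    while [ "$i" -le "$count" ]; do
        # Parentheses make [$i] index the whole node-set, not each parent's children.
        name=$(xmlstarlet sel -T -t -v "(//imageScroll)[$i]/imageName" "$file")
        printf '%s: %s\n' "$i" "$name"
        i=$((i + 1))
    done
}
```

For large files, piping the single -m "//imageName" -v . -n command from the first answer into a while read loop avoids re-parsing the file for every item.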

How to find out the content of an XML file using Unix sed/awk?

I have an XML file (MyXML.xml) like this:
<?xml version="1.0" encoding="UTF-8"?>
<S:Envelope xmlns:S="http://schemas.xmlsoap.org/soap/envelope/">
<S:Body>
<ns3:GetAllInfoFromRest xmlns:ns2="http://com.lanuk.cfe/b2_7/service/objects" xmlns:ns3="http://com.lanuk.cfe/b2_7/service/operations">
1111,GH43567,Hamburger,GET,278598655,\n000001, ,Kunal,Bhyuo,Ramond,856 K. 98 Rd, , ,Tripura,AGT,INDIA,856987, ,S,S,S,8956,\666666
</ns3:GetAllInfoFromRest>
</S:Body>
</S:Envelope>
Now I need to strip out the SOAP envelope and all the tags, with their attributes, from this XML and get only the string response 1111,GH43567,Hamburger,GET,278598655,\n000001, ,Kunal,Bhyuo,Ramond,856 K. 98 Rd, , ,Tripura,AGT,INDIA,856987, ,S,S,S,8956,\666666.
How can I do it with awk or sed?
I tried it this way:
$ xgawk -lxml 'XMLATTR["xmlns:ns3"]=="http://com.lanuk.cfe/b2_7/service/operations"{print $2}' MyXML.xml
But obviously I am making a mistake somewhere, because it is not working.
Can someone suggest another way around this?
Using awk
awk '{gsub(/<[^>]*>/,"")}NF{$1=$1;print}' file.xml
1111,GH43567,Hamburger,GET,278598655,\n000001, ,Kunal,Bhyuo,Ramond,856 K. 98 Rd, , ,Tripura,AGT,INDIA,856987, ,S,S,S,8956,\666666
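The gsub call removes everything that starts with < and ends with >, so, for example, <S:Body> is removed. The NF pattern prints only lines that still contain data, dropping blank lines. $1=$1 rebuilds the record, which removes leading and trailing spaces (and squeezes runs of interior whitespace to single spaces).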
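Since the question also asks about sed, a roughly equivalent sed sketch deletes the tags, trims each line, and drops lines left empty; the function name strip_tags is hypothetical. Unlike the awk $1=$1 trick, it does not squeeze runs of interior whitespace. Like the awk version, it assumes no tag spans multiple lines and no > appears inside an attribute value.

```shell
#!/bin/sh
# Hypothetical helper: strip_tags FILE
# Removes every <...> tag, trims leading/trailing whitespace, and drops blank lines.
strip_tags() {
    sed -e 's/<[^>]*>//g' \
        -e 's/^[[:space:]]*//' \
        -e 's/[[:space:]]*$//' \
        -e '/^$/d' \
        "$1"
}

# Usage: strip_tags MyXML.xml
```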
You might want to look into xmlstarlet (http://xmlstar.sourceforge.net/), a command-line XML toolkit. xmlstarlet can convert the XML into the PYX format, which is essentially a flattened XML representation with one line per tag. Then you can use grep, sed, etc. to extract what you want.

Patchfiles - Editing Lines, rather than Replacing?

Is it possible to create a diff patchfile that will edit lines themselves, rather than replacing an entire line?
For example, I have the following line:
<foo:ListeningPortBar>3423</foo:ListeningPortBar>
and I want to change this to:
<cat:LoremIpsum>3423</cat:LoremIpsum>
That is, I want to change the text around the actual port number, but preserve the port number - I need to apply this patch across a number of files, all with different port numbers - I simply want to change the tags, keeping whatever port number is in there currently.
How can you achieve this please?
Thanks,
Victor
It doesn't really matter if a patch replaces the entire line or just characters in the line (the end result is the same, no...?), but I don't think this is a "patch" question. See below for a simpler solution using "sed".
For example, assume:
$ cat f1.txt
<xml>
<foo:ListeningPortBar>3423</foo:ListeningPortBar>
</xml>
$ cat f2.txt
<xml>
<cat:LoremIpsum>3423</cat:LoremIpsum>
</xml>
Then, literally the patch would be:
$ diff -u f1.txt f2.txt
--- f1.txt 2012-07-08 03:14:39.328328048 -0700
+++ f2.txt 2012-07-08 03:14:30.618177130 -0700
@@ -1,3 +1,3 @@
 <xml>
-<foo:ListeningPortBar>3423</foo:ListeningPortBar>
+<cat:LoremIpsum>3423</cat:LoremIpsum>
 </xml>
This patch file could be used as a template, modified with correct values for all your files that need to be updated, and applied to all the files individually. That sounds like more work than necessary.
On the other hand, just use "sed":
$ sed 's/<foo:ListeningPortBar>\([0-9]*\)<\/foo:ListeningPortBar>/<cat:LoremIpsum>\1<\/cat:LoremIpsum>/' f1.txt
<xml>
<cat:LoremIpsum>3423</cat:LoremIpsum>
</xml>
Since you have XML, using xsltproc is another alternative, but again probably overkill for this simple search-and-replace task.
To use this in a script, you'd do something like (replacing "etc/etc" with the sed above):
for f in $(find dir -name "*.xml" -exec grep -q 'foo:ListeningPortBar' {} \; -print)
do
sed -i.bak 's/etc/etc/g' "$f"
done
...and then verify that the ".bak" files actually differ from the modified files.
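The match test and the edit can also be folded into a single find command, which avoids word-splitting file names and only touches files that contain the old tag. A sketch, with the hypothetical function name rename_port_tags; it assumes each tag pair sits on one line as in the example, and relies on sed -i.bak, which both GNU and BSD sed accept.

```shell
#!/bin/sh
# Hypothetical helper: rename_port_tags DIR
# In every *.xml under DIR that mentions foo:ListeningPortBar, rewrite
# <foo:ListeningPortBar>N</foo:ListeningPortBar> to <cat:LoremIpsum>N</cat:LoremIpsum>,
# keeping the original content as FILE.bak.
rename_port_tags() {
    find "$1" -name '*.xml' \
        -exec grep -q 'foo:ListeningPortBar' {} \; \
        -exec sed -i.bak 's/<foo:ListeningPortBar>\([0-9]*\)<\/foo:ListeningPortBar>/<cat:LoremIpsum>\1<\/cat:LoremIpsum>/g' {} \;
}

# Usage: rename_port_tags dir
```

Because find evaluates its expressions left to right, the second -exec runs only when grep -q succeeds, so files without the old tag get no .bak copy.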

script to extract the details from xml

If I have an XML file like the one below:
<soap env="abc" id="xyz">
<emp>acdf</emp>
<Workinstance name="ab" id="ab1">
<x>1</x>
<y>2</y>
</Workinstance>
<projectinstance name="cd" id="cd1">
<u>1</u>
<v>2</v>
</projectinstance>
</soap>
I want to extract the id attribute of Workinstance using a Unix script.
I tried grep, but it retrieves the whole XML file.
Can someone help me get it?
You might want to consider something like XMLStarlet, which lets you query XML with XPath expressions.
Parsing XML with regular expressions is essentially impossible even under the best of conditions, so the sooner you give up on trying to do this with grep, the better off you're likely to be.
XMLStarlet seems to be the tool I was looking for!
To extract your attribute, try the following:
cat your_file.xml | xmlstarlet sel -t -v 'soap/Workinstance/@id'
"soap/Workinstance/@id" is an XPath expression that selects the id attribute of the Workinstance element. The -v flag tells xmlstarlet to print the extracted text to standard output.
If you have Ruby
$ ruby -ne 'print $_.gsub(/.*id=\"|\".*$/,"" ) if /<Workinstance/' file
ab1
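If neither XMLStarlet nor Ruby is available, a plain sed sketch can pull the attribute out, under the strong assumption that each <Workinstance ...> start tag sits on a single line and uses double quotes; the function name workinstance_id is hypothetical. As the first answer warns, this is regex matching rather than XML parsing, so other layouts will break it.

```shell
#!/bin/sh
# Hypothetical helper: workinstance_id FILE
# Prints the id="..." value of each <Workinstance ...> start tag (one tag per line assumed).
workinstance_id() {
    sed -n 's/.*<Workinstance[^>]* id="\([^"]*\)".*/\1/p' "$1"
}

# Usage: workinstance_id file
```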
