updating xml, using xmlstarlet, with a sequence of letters - xmlstarlet

Given input.xml, I want to update it to produce output.xml (both shown below), but xmlstarlet fails.
First I tried to find the correct XSLT function to get the needed values, which led to this:
xmlstarlet sel -t -m //field -v . -o "=" -v 'substring("ABCDEFGHIJK",position(),1)' -n input.xml
output:
5 =A
3 =B
2 =C
4 =D
55 =E
42 =F
This made me believe that I should be able to update this XML with the following command:
xmlstarlet ed -u //field -x 'substring("ABCDEFGHIJK",position(),1)' input.xml
But I got:
Invalid context position
Segmentation fault
I tried xmlstarlet on Windows 11 and on Ubuntu 20.04; both crashed with a core dump.
I am interested in another solution using XmlStarlet.
FILES
input.xml
<root>
<field> 5 </field>
<field> 3 </field>
<field> 2 </field>
<field> 4 </field>
<field> 55 </field>
<field> 42 </field>
</root>
(desired) output.xml
<root>
<field>A</field>
<field>B</field>
<field>C</field>
<field>D</field>
<field>E</field>
<field>F</field>
</root>

position() works with xmlstarlet select's -m (--match) option (i.e. xsl:for-each) which determines the context position.
xmlstarlet select --indent -t \
-e '{name(*)}' \
-m '//field' -e '{name()}' -v 'substring("ABCDEFGHIJK",position(),1)' \
file.xml
With xmlstarlet edit's -u (--update) you can use a sibling node count, e.g.
xmlstarlet edit -O \
-u '//field' -x 'substring("ABCDEFGHIJK",1+count(preceding-sibling::field),1)' \
file.xml
or
xmlstarlet edit -O \
-u '//field' -x 'substring("ABCDEFGHIJK",count(preceding-sibling::* | self::*),1)' \
file.xml
Each of these commands produces the desired output. Line continuation chars added for readability.
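For a quick end-to-end check, the sibling-count command can be run against a throwaway copy of the sample input. This is only a minimal sketch: the function name letter_fields and the temporary file are illustrative assumptions, and the demo skips itself when xmlstarlet is not on the PATH.

```shell
# Hypothetical wrapper around the sibling-count command shown above.
letter_fields() {
    xmlstarlet edit -O \
        -u '//field' \
        -x 'substring("ABCDEFGHIJK",1+count(preceding-sibling::field),1)' \
        "$1"
}

# Demo on a shortened copy of input.xml; skipped if xmlstarlet is missing.
if command -v xmlstarlet >/dev/null 2>&1; then
    tmp=$(mktemp)
    printf '%s\n' '<root>' '<field> 5 </field>' '<field> 3 </field>' \
        '<field> 42 </field>' '</root>' > "$tmp"
    letter_fields "$tmp"
    rm -f "$tmp"
fi
```

Because each field's letter is computed from its own preceding siblings, the expression does not depend on a context position, which is what position() needs and what edit's -u does not provide.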

Related

xmlstarlet add element with namespace and attributes

I'm trying to add a node with a namespace and an attribute to an xml, but it fails if I try to do it as multiple commands in one execution of xmlstarlet:
<?xml version="1.0"?>
<levela xmlns:xi="http://www.w3.org/2001/XInclude">
<levelb>
</levelb>
</levela>
xmlstarlet ed -L -s /levela/levelb -t elem -n xi:input -i //xi:input -t attr -n "href" -v "aHref" file.xml
I'm trying to get:
<?xml version="1.0"?>
<levela xmlns:xi="http://www.w3.org/2001/XInclude">
<levelb>
<xi:input href="aHref"/>
</levelb>
</levela>
But the attribute isn't added. So I get:
<?xml version="1.0"?>
<levela xmlns:xi="http://www.w3.org/2001/XInclude">
<levelb>
<xi:input/>
</levelb>
</levela>
It works if I run it as two executions like this:
xmlstarlet ed -L -s /levela/levelb -t elem -n xi:input file.xml
xmlstarlet ed -L -i //xi:input -t attr -n "href" -v "aHref" file.xml
It also works if I add a tag without a namespace e.g:
xmlstarlet ed -L -s /levela/levelb -t elem -n levelc -i //levelc -t attr -n "href" -v "aHref" file.xml
<?xml version="1.0"?>
<levela xmlns:xi="http://www.w3.org/2001/XInclude">
<levelb>
<levelc href="aHref"/>
</levelb>
</levela>
What am I doing wrong? Why doesn't it work with the namespace?
This will do it:
xmlstarlet edit \
-s '/levela/levelb' -t elem -n 'xi:input' \
-s '$prev' -t attr -n 'href' -v 'aHref' \
file.xml
xmlstarlet edit code can use the convenience $prev (aka
$xstar:prev) variable to refer to the node created by the most
recent -i (--insert), -a (--append), or -s (--subnode) option.
Examples of $prev are given in
doc/xmlstarlet.txt and
the source code's
examples/ed-backref*.
Attributes can be added using -i, -a, or -s.
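To confirm the result, the edited document can be piped into xmlstarlet sel and the attribute read back. A minimal sketch, assuming xmlstarlet is installed; the function name add_xi_input and the temporary file are hypothetical, and -N binds the xi prefix for the query.

```shell
# Hypothetical wrapper: add <xi:input href="aHref"/> under /levela/levelb.
add_xi_input() {
    xmlstarlet edit \
        -s '/levela/levelb' -t elem -n 'xi:input' \
        -s '$prev' -t attr -n 'href' -v 'aHref' \
        "$1"
}

# Demo; skipped if xmlstarlet is missing.
if command -v xmlstarlet >/dev/null 2>&1; then
    tmp=$(mktemp)
    printf '%s\n' '<levela xmlns:xi="http://www.w3.org/2001/XInclude">' \
        '<levelb>' '</levelb>' '</levela>' > "$tmp"
    # The edited XML is re-parsed by sel, so xi:input is matched by namespace.
    add_xi_input "$tmp" |
        xmlstarlet sel -N xi='http://www.w3.org/2001/XInclude' \
            -t -v '//xi:input/@href' -n
    rm -f "$tmp"
fi
```

The query runs in a second process on serialized output, the same situation as the two-step workaround in the question, so it sees xi:input as a properly namespaced element.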
What am I doing wrong? Why doesn't it work with the namespace?
Update 2022-04-15
The -i '//xi:input' … syntax you use is perfectly logical. As your
own 2 alternative commands suggest it's the namespace xi that
triggers the omission and there's a hint in the edInsert function in
the source code's
src/xml_edit.c
where it says NULL /* TODO: NS */.
When you've worked with xmlstarlet for some
time you come to accept its limitations (or not); in this case the
$prev back reference is useful. I wouldn't expect that TODO to
go away anytime soon.
(end update)
Well, I think xmlstarlet edit looks upon node naming as a user
responsibility, as the following example suggests,
printf '<v/>' |
xmlstarlet edit --omit-decl \
-s '*' -t elem -n 'undeclared:qname' -v 'x' \
-s '*' -t elem -n '!--' -v ' wotsinaname ' \
-s '$prev' -t attr -n ' "" ' -v '' \
-s '*' -t elem -n ' <&> ' -v 'harrumph!'
the output of which is clearly not XML:
<v>
<undeclared:qname>x</undeclared:qname>
<!-- "" =""> wotsinaname </!-->
< <&> >harrumph!</ <&> >
</v>
If you want to indent the new element, for example:
xmlstarlet edit \
-s '/levela/levelb' -t elem -n 'xi:input' \
--var newnd '$prev' \
-s '$prev' -t attr -n 'href' -v 'aHref' \
-a '$newnd' -t text -n ignored -v '' \
-u '$prev' -x '(//text())[1][normalize-space()=""]' \
file.xml
The -x XPath expression grabs the first text node provided it
contains nothing but whitespace, i.e. the first child node of levela.
The --var name xpath option to define an xmlstarlet edit
variable is mentioned in
doc/xmlstarlet.txt
but not in the user's guide.
I used xmlstarlet version 1.6.1.
It seems you can't insert an attribute and attribute value into a namespaced node... Maybe someone smarter can figure out something else, but the only way I could get around that, at least in this case, is this:
xmlstarlet ed -N xi="http://www.w3.org/2001/XInclude" --subnode "//levela/levelb" \
--type elem -n "xi:input" --insert "//levela/levelb/*" --type attr --name "href" \
--value "aHref" file.xml

dynamically pass string to Rscript argument with sed

I wrote a script in R that has several arguments. I want to iterate over 20 directories and execute my script on each while passing in a substring from the file path as my -n argument using sed. I ran the following:
find . -name 'xray_data' -exec sh -c 'Rscript /Users/Caitlin/Desktop/DeMMO_Pubs/DeMMO_NativeRock/DeMMO_NativeRock/R/scipts/dataStitchR.R -f {} -b "{}/SEM_images" -c "{}/../coordinates.txt" -z ".tif" -m ".tif" -a "Unknown|SEM|Os" -d "overview" -y "overview" --overview "overview.*tif" -p FALSE -n "`sed -e 's/.*DeMMO.*[/]\(.*\)_.*[/]xray_data/\1/' "{}"`"' sh {} \;
which results in this error:
ubs/DeMMO_NativeRock/DeMMO_NativeRock/R/scipts/dataStitchR.R -f {} -b "{}/SEM_images" -c "{}/../coordinates.txt" -z ".tif" -m ".tif" -a "Unknown|SEM|Os" -d "overview" -y "overview" --overview "overview.*tif" -p FALSE -n "`sed -e 's/.*DeMMO.*[/]\(.*\)_.*[/]xray_data/\1/' "{}"`"' sh {} \;
sh: command substitution: line 0: syntax error near unexpected token `('
sh: command substitution: line 0: `sed -e s/.*DeMMO.*[/](.*)_.*[/]xray_data/1/ "./DeMMO1/D1T3rep_Dec2019_Ellison/xray_data"'
When I try to use sed with my pattern on an example file path, it works:
echo "./DeMMO1/D1T1exp_Dec2019_Poorman/xray_data" | sed -e 's/.*DeMMO.*[/]\(.*\)_.*[/]xray_data/\1/'
which produces the correct substring:
D1T1exp_Dec2019
I think there's an issue with trying to use single quotes inside the interpreted string but I don't know how to deal with this. I have tried replacing the single quotes around the sed pattern with double quotes as well as removing the single quotes, both result in this error:
sed: RE error: illegal byte sequence
How should I extract the substring from the file path dynamically in this case?
To loop through the output of find.
while IFS= read -ru "$fd" -d '' files; do
echo "$files" ##: do whatever you want to do with the files here.
done {fd}< <(find . -type f -name 'xray_data' -print0)
No embedded commands in quotes.
It reads from an automatically allocated file descriptor ({fd}) instead of stdin, in case something inside the loop consumes stdin.
Also, -print0 delimits the paths with null bytes, so spaces, tabs, and newlines in path and file names are handled safely.
A good start is to put an echo in front of every command you want to run on the files, so you can see what is going to be executed before it happens.
This is the solution that ultimately worked for me due to issues with quotes in sed:
for dir in `find . -name 'xray_data'`;
do sampleID="`basename $(dirname $dir) | cut -f1 -d'_'`";
Rscript /Users/Caitlin/Desktop/DeMMO_Pubs/DeMMO_NativeRock/DeMMO_NativeRock/R/scipts/dataStitchR.R -f "$dir" -b "$dir/SEM_images" -c "$dir/../coordinates.txt" -z ".tif" -m ".tif" -a "Unknown|SEM|Os" -d "overview" -y "overview" --overview "overview.*tif" -p FALSE -n "$sampleID";
done
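The quoting trouble in the original command comes from nesting a single-quoted sed script inside the single-quoted sh -c string, and from handing sed the path as a file name rather than as input text. One way around both, sketched here, is to pass the path to sh as a positional parameter and trim it with parameter expansion. The helper name sample_id and the demo directory are hypothetical, and echo stands in for the real Rscript call.

```shell
# Hypothetical helper: print the sample ID for an xray_data path.
sample_id() {
    parent=$(basename "$(dirname "$1")")   # e.g. D1T1exp_Dec2019_Poorman
    printf '%s\n' "${parent%_*}"           # drop the last "_" and what follows
}

sample_id "./DeMMO1/D1T1exp_Dec2019_Poorman/xray_data"

# The same expansion inside find -exec: {} arrives as $1 and is never
# spliced into the script text. Remove the echo to run the real command.
demo=$(mktemp -d)
mkdir -p "$demo/DeMMO1/D1T1exp_Dec2019_Poorman/xray_data"
find "$demo" -name 'xray_data' -exec sh -c '
    parent=$(basename "$(dirname "$1")")
    echo Rscript dataStitchR.R -f "$1" -n "${parent%_*}"
' sh {} \;
rm -rf "$demo"
```

"${parent%_*}" reproduces what the sed pattern returns (D1T1exp_Dec2019). Use "${parent%%_*}" instead to get only the part before the first underscore, which is what cut -f1 -d'_' yields.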

Insert element in XML document at specific position with xmlstarlet

I'm writing a bash script to edit Tomcat's server.xml file. I have it successfully adding a Connector node. To run this example, download and unpack Apache Tomcat 9, go into the conf directory where there is a server.xml file, and run:
xmlstarlet edit -P --inplace \
--subnode "/Server/Service" \
--type elem -n ConnectorNew -v "" \
--insert //ConnectorNew --type attr -n "port" -v "443" \
--insert //ConnectorNew --type attr -n "protocol" -v "org.apache.coyote.http11.Http11NioProtocol" \
--insert //ConnectorNew --type attr -n "keystoreFile" -v "example-key.pem" \
--insert //ConnectorNew --type attr -n "sslProtocol" -v "TLS" \
--insert //ConnectorNew --type attr -n "SSLEnabled" -v "true" \
--subnode "/Server/Service/ConnectorNew" \
--type elem -n "UpgradeProtocolNew" -v "" \
--insert //UpgradeProtocolNew --type attr -n "className" -v "org.apache.coyote.http2.Http2Protocol" \
--rename //ConnectorNew -v Connector \
--rename //UpgradeProtocolNew -v UpgradeProtocol server.xml
which is pretty cool! Upon running that there will now be a TLS Connector on port 443 with the given example key. That would run as usual assuming the key file exists and it's running as root (real server deployments shouldn't run as root but should use jsvc instead).
However that shows up at the very end of the Service element. I would like ideally to put it in the file after the last existing Connector element so the file looks normal. I don't think order of Connector elements has any effect on Tomcat, although I would like it to look like a normal config file that other people would expect, when they go looking for connector elements.
I assume there's some way to do this with xmlstarlet but I couldn't figure it out.
I hope I can avoid using xslt features to do this because I don't want to have to learn and manage another technology to get this script done.
Thank you!
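If you already have a Connector defined in your server.xml, you can replace --subnode "/Server/Service" with --append /Server/Service/Connector, which inserts the new element right after the existing Connector. If the file contains several Connector elements, that XPath matches each of them, so use /Server/Service/Connector[last()] to append only after the last one.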
xmlstarlet edit -P --inplace \
--append /Server/Service/Connector \
--type elem -n ConnectorNew -v "" \
--insert //ConnectorNew --type attr -n "port" -v "443" \
...
If this is the first Connector to be inserted, use --insert /Server/Service/Engine instead. The Connector element is then inserted before the Engine element, which is where Connectors usually sit in the default server.xml:
xmlstarlet edit -P --inplace \
--insert /Server/Service/Engine \
--type elem -n ConnectorNew -v "" \
--insert //ConnectorNew --type attr -n "port" -v "443" \
...
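As a variation on the placement idea, the new element can be appended after the last existing Connector and then addressed through a variable instead of a temporary name. This is a sketch only, assuming xmlstarlet supports $prev and --var as in version 1.6.1; the function name add_tls_connector and the cut-down server.xml are hypothetical.

```shell
# Hypothetical wrapper: append a TLS Connector after the last existing one.
add_tls_connector() {
    xmlstarlet edit -P \
        -a '/Server/Service/Connector[last()]' -t elem -n Connector -v '' \
        --var conn '$prev' \
        -i '$conn' -t attr -n port -v 443 \
        -i '$conn' -t attr -n SSLEnabled -v true \
        "$1"
}

# Demo on a cut-down server.xml; skipped if xmlstarlet is missing.
if command -v xmlstarlet >/dev/null 2>&1; then
    tmp=$(mktemp)
    printf '%s\n' '<Server>' '<Service name="Catalina">' \
        '<Connector port="8080"/>' '<Connector port="8009"/>' \
        '<Engine name="Catalina"/>' '</Service>' '</Server>' > "$tmp"
    add_tls_connector "$tmp"
    rm -f "$tmp"
fi
```

Saving $prev in a variable keeps a handle on the new element, because $prev itself moves on to each attribute as it is inserted. No ConnectorNew placeholder or --rename step is needed in this form.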
You may also want to delete all XML comments before you start editing server.xml, so that you have a clean and readable file:
xmlstarlet ed -L -d '//comment()' server.xml
If you do so, you need to insert a space before the closing "/>", otherwise Tomcat will complain that server.xml is corrupt:
sed -i "s/\"\/>/\" \/>/g" server.xml

Grep hex characters in a file

I am having some difficulty finding the number of hex characters in a file. For example:
grep -o \x02 file | wc -l
0
There should be about 3M matches here, but it doesn't seem like the \x02 character is being recognized here. For example (in python):
>>> s=open('file').read()
>>> s.count('\x02')
2932267
The answer by Mark Setchell may be OK for macOS but doesn't seem to work on Debian using bash (tested with bash 4.4, grep 2.27).
I could get a match using the -P directive (for Perl regex)
user#host:~ $ printf '\x02\n3\n\x02' | grep -c -P '\x02'
2
user#host:~ $ printf '\x02\n3\n\x02' | grep -c -P '\xFF' #same input, different pattern
0
user#host:~ $ printf '\x02\n3\n\xff' | grep -c -P '\xFF' #match with unmatching case
2
Hope this helps
This seems to do what you want on macOS:
printf "\x02\n3\n\x02" | grep -c "\x02"
2
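Two things are worth separating here. Unquoted, the shell turns \x02 into plain x02 before grep ever sees it, and grep -c counts matching lines while the Python count() call counts occurrences. A portable sketch that counts occurrences of the byte itself, using only printf, tr, and wc (the helper name count_x02 is hypothetical):

```shell
# Hypothetical helper: count occurrences of byte 0x02 (octal 002) on stdin.
count_x02() {
    # Delete every byte except 0x02, then count what is left.
    # LC_ALL=C makes tr treat the input as raw bytes.
    LC_ALL=C tr -cd '\002' | wc -c
}

# Three 0x02 bytes spread over two lines.
printf '\002a\002\nb\002\n' | count_x02
```

For a file, run count_x02 < file. The sample input contains three 0x02 bytes on two lines, so an occurrence count and a line count differ for it.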

UNIX for loop with some options

I have for loop:
for mnt in `cat $file.txt`
do
grep -h -i -A 3 -B 4 *log | grep -v "10001" >> extrafile.txt
done
What do -A 3 and -B 4 mean?
-A is After and -B is Before, each followed by a number of lines.
After and Before. After and before what?
No wonder the grep is confusing: You don't mention the "${mnt}" you are searching for. When I improve your script (moving input and output to the end, outside the loop, and using ${mnt}), the script looks like
while read -r mnt; do
grep -h -i -A 3 -B 4 "${mnt}" *log | grep -v "10001"
done < "${file}.txt" >> extrafile.txt
For every pattern read from $file.txt you get the matching lines plus their context (3 lines after, 4 lines before), with all lines containing 10001 removed.
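To see what -A and -B select, it helps to run grep on input where the line numbers are obvious. A small sketch with made-up input (the function name context_demo is hypothetical); it asks for 1 line after and 2 lines before the line that is exactly 5:

```shell
# Hypothetical demo: the lines 1..10, with context around the line "5".
context_demo() {
    printf '%s\n' 1 2 3 4 5 6 7 8 9 10 | grep -A 1 -B 2 -x '5'
}

context_demo
```

This prints the lines 3, 4, 5 and 6: the match itself, two lines before it and one line after it. When several matches are far apart, grep typically separates the groups with a line containing --.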

Resources