sed multiline remove before pattern - unix

Hi, I have a big log file from which I am trying to extract the XML payloads written into it.
The log file resembles this:
2016/01/01 bladh bqskjdqskldjqsdlqskdjqlskdj dazihzmkldjkdjqslkjd
2016/01/01: qsdhqsdlkqsmdjqsldjqslkdjqlskdjqslkdjqslkdjqskdjqsd
2016/01/01: qsjdqmlskdmlqskdmcxxxx [qskjd][qsdjqslkdj] Payload :[<LOG><a>a</a>
<b>b</b>
<c>c</c>
<id>XXXXX</id>
<d>d</d>
</LOG>]]
2016/01/01 bladh bqskjdqskldjqsdlqskdjqlskdj dazihzmkldjkdjqslkjd
2016/01/01: qsdhqsdlkqsmdjqsldjqslkdjqlskdjqslkdjqslkdjqskdjqsd
2016/01/01: qsjdqmlskdmlqskdmcxxxx [qskjd][qsdjqslkdj] Payload :[<LOG> <a>a</a>
<b>b</b>
<c>c</c>
<id>YYYYY</id>
<d>d</d>
</LOG>]]
qskdmqlskdqlsdqlskdqlsdk
qsdlkqsdlkqsdmlkqsdlk
For now I am using
sed -n '/<LOG/{:start /<\/LOG/!{N;b start};/XXXXX/p}' logFile
and I am getting this
2016/01/01: qsjdqmlskdmlqskdmcxxxx [qskjd][qsdjqslkdj] Payload :[<LOG><a>a</a>
<b>b</b>
<c>c</c>
<id>XXXXX</id>
<d>d</d>
</LOG>]]
I would like to retrieve only the XML, without the log prefix and the trailing ]], and get:
<LOG>
<a>a</a>
<b>b</b>
<c>c</c>
<id>XXXXX</id>
<d>d</d>
</LOG>
Thanks in advance

Solution in TXR:
@(repeat)
@ (skip)Payload :[<@tag>@preamble
@ (collect)
@middle
@ (last)
</@tag>]]
@ (end)
@ (output)
<@tag>
@(trim-str preamble)
@ (repeat)
@middle
@ (end)
</@tag>
@ (end)
@(end)
Run:
$ txr extract.txr data
<LOG>
<a>a</a>
<b>b</b>
<c>c</c>
<id>XXXXX</id>
<d>d</d>
</LOG>
<LOG>
<a>a</a>
<b>b</b>
<c>c</c>
<id>YYYYY</id>
<d>d</d>
</LOG>

Try this:
sed -n '/<LOG/{:a;/<\/LOG/!{N;ba};s/.*\(<LOG>\)\(.*XXXXX.*<\/LOG>\).*/\1\n\2/p}' logFile
It should do the job, but keep in mind that sed is not the right tool for parsing XML. When you have to parse valid XML files, consider using xmlstarlet or xmllint.

This might work for you (GNU sed):
sed -nr '/<LOG>/,/<\/LOG>/{s/.*(<LOG>)\s*/\1\n/;s/(<\/LOG>).*/\1/;p}' file
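Use sed's grep-like option (-n) to inhibit printing unless explicitly requested, and use the range feature /.../,/.../ to select each block, then top and tail it: the first substitution trims everything before <LOG>, the second trims everything after </LOG>. Note that this prints every <LOG> block, not only the one containing XXXXX.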
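
A portable variant of the same top-and-tail idea, in case GNU sed is not available. This is only a sketch in POSIX awk: the function name extract_log is made up for illustration, and it assumes, as in the sample log, that each payload starts with <LOG> and ends with </LOG> and that blocks are not nested.

```shell
# extract_log ID FILE: print each <LOG>...</LOG> block whose text contains ID,
# with the log prefix before <LOG> and the "]]" after </LOG> removed.
extract_log() {
    awk -v id="$1" '
        /<LOG>/ {
            sub(/.*<LOG>[ \t]*/, "")    # drop the prefix and the opening tag
            buf = "<LOG>"               # start a fresh block on its own line
            inblk = 1
        }
        inblk && /<\/LOG>/ {
            sub(/<\/LOG>.*/, "</LOG>")  # drop whatever follows the closing tag
            buf = buf "\n" $0
            if (index(buf, id)) print buf
            inblk = 0
            next
        }
        inblk && $0 != "" { buf = buf "\n" $0 }
    ' "$2"
}

# Usage: extract_log XXXXX logFile
```

Because the block is buffered until </LOG> is seen, the id test runs once per block, which is what the N;ba loop does in the sed versions.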

Related

Extracting and modifying XML with deep structure from Linux command line

I would like to select and change a value in an XML file. I'm trying to use xmlstarlet for this.
I have this file
<?xml version='1.0' encoding='UTF-8'?>
<DeviceDescription xmlns="http://www.3s-software.com/schemas/DeviceDescription-1.0.xsd">
<House>
<Id>
<Number>1</Number>
</Id>
</House>
<Car>
<Id>
<Number>2</Number>
</Id>
</Car>
</DeviceDescription>
My problem is the xmlns= attribute, which xmlstarlet is picky about. Without this attribute I can use
xmlstarlet sel -t -v '/DeviceDescription/House/Id/Number' /tmp/x.xml
I found that I can use the default namespace like this, but that returns both Ids
xmlstarlet sel -t -m "//_:Id" -v '_:Number' /tmp/x.xml
How do I specify a full path?
To match only the House Id, add it to the -m argument:
xml sel -t -m '//_:House/_:Id' -v '_:Number' /tmp/x.xml
If you want to name the namespace explicitly, bind a prefix to it with -N, e.g.:
xml sel -N ns="http://www.3s-software.com/schemas/DeviceDescription-1.0.xsd" \
-t -v 'ns:DeviceDescription/ns:House/ns:Id/ns:Number' /tmp/x.xml
So to update the value:
xml ed -N ns="http://www.3s-software.com/schemas/DeviceDescription-1.0.xsd" \
-u 'ns:DeviceDescription/ns:House/ns:Id/ns:Number' -v 3 /tmp/x.xml
Output:
<?xml version="1.0" encoding="UTF-8"?>
<DeviceDescription xmlns="http://www.3s-software.com/schemas/DeviceDescription-1.0.xsd">
<House>
<Id>
<Number>3</Number>
</Id>
</House>
<Car>
<Id>
<Number>2</Number>
</Id>
</Car>
</DeviceDescription>
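
The ed command writes the edited document to standard output and leaves the file untouched. If the file itself should change, xmlstarlet ed has an in-place option, -L (long form --inplace). A minimal sketch, assuming the binary is installed as xmlstarlet (some systems name it xml); the function name set_house_number is made up for illustration.

```shell
# set_house_number FILE VALUE: rewrite House/Id/Number inside FILE itself.
# Assumption: the binary is called "xmlstarlet"; replace it with "xml" if needed.
set_house_number() {
    xmlstarlet ed -L \
        -N ns="http://www.3s-software.com/schemas/DeviceDescription-1.0.xsd" \
        -u '/ns:DeviceDescription/ns:House/ns:Id/ns:Number' -v "$2" \
        "$1"
}

# Usage: set_house_number /tmp/x.xml 3
```

It is probably worth keeping a copy of the file before trying this, since -L overwrites it.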

adding 1 user with htpasswd in 2 different servers using ssh connection

I'm working with Apache Camel and I want to add one user on two different servers. I also want to test whether ssh.redundancy=true. This is my code:
<simple>${headers.op} == 1</simple>
<doTry id="try-cmd-httpd">
<setBody id="httpd.cmd.htpasswd">
<simple>htpasswd -b /etc/httpd/passwords ${header.login} ${header.passwd} {{httpd.io_redir}}</simple>
</setBody>
<to id="to_exec_htpaswd" uri="ssh://{{ssh.user}}:{{ssh.passwd}}@{{ssh.host}}:{{ssh.port}}"/>
<log id="htpasswdResp_log" message="response: ${body}"/>
<to id="to_exec_htpaswd2" uri="ssh://{{ssh.user}}:{{ssh.passwd}}@{{ssh.host2}}:{{ssh.port}}"/>
<log id="htpasswdResp_log2" message="response: ${body}"/>
I found the solution: add a choice on the ssh.redundancy parameter and invoke the SSH endpoint a second time when it is true.
<choice>
<when id="redundancytrue">
<simple>{{ssh.redundancy}} == "true"</simple>
<setBody id="httpd.cmd.htpasswd">
<simple>htpasswd -b /etc/httpd/passwords ${header.login} ${header.passwd} {{httpd.io_redir}}</simple>
</setBody>
<to id="to_exec_htpaswd2" uri="ssh://{{ssh.user}}:{{ssh.passwd}}@{{ssh.host2}}:{{ssh.port}}"/>
</when>
</choice>

xmlstarlet "does not work" for XMLs with namespaces

I'm using mediainfo to get some XML information about a movie:
mediainfo --Output=XML Krtek\ a\ buldozer-jdvwqZUEbhc.mkv | xmlstarlet format
whose output is:
<?xml version="1.0" encoding="UTF-8"?>
<MediaInfo xmlns="https://mediaarea.net/mediainfo" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="https://mediaarea.net/mediainfo https://mediaarea.net/mediainfo/mediainfo_2_0.xsd" version="2.0">
<creatingLibrary version="18.03" url="https://mediaarea.net/MediaInfo">MediaInfoLib</creatingLibrary>
<media ref="Krtek a buldozer-jdvwqZUEbhc.mkv">
<track type="General">
<UniqueID>101120522676894244607292274887483611459</UniqueID>
<VideoCount>1</VideoCount>
<AudioCount>1</AudioCount>
<FileExtension>mkv</FileExtension>
<Format>Matroska</Format>
<Format_Version>4</Format_Version>
<FileSize>60132643</FileSize>
<Duration>374.101</Duration>
<OverallBitRate>1285912</OverallBitRate>
<FrameRate>25.000</FrameRate>
<FrameCount>9352</FrameCount>
<IsStreamable>Yes</IsStreamable>
<File_Modified_Date>UTC 2018-10-15 07:09:29</File_Modified_Date>
<File_Modified_Date_Local>2018-10-15 09:09:29</File_Modified_Date_Local>
<Encoded_Application>Lavf57.71.100</Encoded_Application>
<Encoded_Library>Lavf57.71.100</Encoded_Library>
<extra>
<ErrorDetectionType>Per level 1</ErrorDetectionType>
</extra>
</track>
<track type="Video">
<StreamOrder>0</StreamOrder>
<ID>1</ID>
<UniqueID>1</UniqueID>
<Format>AVC</Format>
<Format_Profile>High</Format_Profile>
<Format_Level>4</Format_Level>
<Format_Settings_CABAC>Yes</Format_Settings_CABAC>
<Format_Settings_RefFrames>3</Format_Settings_RefFrames>
<CodecID>V_MPEG4/ISO/AVC</CodecID>
<Duration>374.080000000</Duration>
<Width>1920</Width>
<Height>1080</Height>
<Stored_Height>1088</Stored_Height>
<Sampled_Width>1920</Sampled_Width>
<Sampled_Height>1080</Sampled_Height>
<PixelAspectRatio>1.000</PixelAspectRatio>
<DisplayAspectRatio>1.778</DisplayAspectRatio>
<FrameRate_Mode>CFR</FrameRate_Mode>
<FrameRate_Mode_Original>VFR</FrameRate_Mode_Original>
<FrameRate>25.000</FrameRate>
<FrameCount>9352</FrameCount>
<ColorSpace>YUV</ColorSpace>
<ChromaSubsampling>4:2:0</ChromaSubsampling>
<BitDepth>8</BitDepth>
<ScanType>Progressive</ScanType>
<Delay>0.000</Delay>
<Default>Yes</Default>
<Forced>No</Forced>
<colour_range>Limited</colour_range>
<colour_description_present>Yes</colour_description_present>
<colour_primaries>BT.709</colour_primaries>
<transfer_characteristics>BT.709</transfer_characteristics>
<matrix_coefficients>BT.709</matrix_coefficients>
</track>
<track type="Audio">
<StreamOrder>1</StreamOrder>
<ID>2</ID>
<UniqueID>2</UniqueID>
<Format>Opus</Format>
<CodecID>A_OPUS</CodecID>
<Duration>374.101000000</Duration>
<Channels>2</Channels>
<ChannelPositions>Front: L R</ChannelPositions>
<SamplingRate>48000</SamplingRate>
<SamplingCount>17956848</SamplingCount>
<BitDepth>32</BitDepth>
<Compression_Mode>Lossy</Compression_Mode>
<Delay>0.000</Delay>
<Delay_Source>Container</Delay_Source>
<Language>en</Language>
<Default>Yes</Default>
<Forced>No</Forced>
</track>
</media>
</MediaInfo>
Now say that I want to get all IDs:
... | xmlstarlet sel -t -v "//ID"
and nothing is printed. What? Why? Well, it turned out that if I remove all attributes from the tag on the second line, the same selection command works. Now I understand that xmlstarlet (probably) works just fine and that I'm just missing some magic flag or syntax so that it can process XML with declared namespaces. Can someone advise?
You need to declare the namespace with the -N option, binding it to a prefix, and then use that prefix on each element name in the query, like <prefix>:<element>:
... | xmlstarlet sel -N n="https://mediaarea.net/mediainfo" -t -v "//n:ID"
From the help page:
-N <name>=<value>
- predefine namespaces (name without 'xmlns:')
ex: xsql=urn:oracle-xsql
Multiple -N options are allowed.

decode value in xml using shell

I am trying to decode a value in XML; please find a sample below. There will be multiple blocks like this one. I need to find the <base64> tag, decode its contents and generate the same output with the decoded value. I am just starting on the script.
<SOAP-ENV:Body>
<log-entry serial="abcde" domain="abc">
<date>Tue Oct 17 2017</date>
<time utc="abcde">14:14:30</time>
<type>all</type>
<class>ccccc</class>
<object>Web_Token</object>
<level num="5">notice</level>
<transaction>xxxxx</transaction>
<global-transaction-id>xxxxx</global-transaction-id>
<client>X.X.X.X</client>
<message>
<base64>**encodeddata**</base64>
</message>
</log-entry>
</SOAP-ENV:Body>
I need this output:
<SOAP-ENV:Body>
<log-entry serial="abcde" domain="abc">
<date>Tue Oct 17 2017</date>
<time utc="abcde">14:14:30</time>
<type>all</type>
<class>ccccc</class>
<object>Web_Token</object>
<level num="5">notice</level>
<transaction>xxxxx</transaction>
<global-transaction-id>xxxxx</global-transaction-id>
<client>X.X.X.X</client>
<message>
<base64>**decodeddata**</base64>
</message>
</log-entry>
</SOAP-ENV:Body>
I am working on this iteratively and started with decoding the value:
sed -n 's/<base64>\(.*\)<\/base64>/\1/p' log.txt | base64 --decode
thanks.
Try this :
xmllint --xpath '//message/base64/text()' file.xml 2>/dev/null |
base64 -d -
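
The command above prints only the decoded value. To get the whole document back with the value decoded in place, as the question asks, something along these lines might work. It is a sketch under assumptions: each <base64>...</base64> pair sits on a single line, as in the sample; the decoded text contains no characters that would need XML escaping; and the function name decode_base64_tags is made up for illustration.

```shell
# decode_base64_tags FILE: copy FILE to stdout, replacing the content of each
# single-line <base64>...</base64> element with its decoded text.
decode_base64_tags() {
    while IFS= read -r line || [ -n "$line" ]; do
        case $line in
            *'<base64>'*'</base64>'*)
                prefix=${line%%'<base64>'*}      # text before the opening tag
                rest=${line#*'<base64>'}         # text after the opening tag
                payload=${rest%%'</base64>'*}    # the encoded value
                suffix=${rest#*'</base64>'}      # text after the closing tag
                # Command substitution strips trailing newlines of the decoded text.
                decoded=$(printf '%s' "$payload" | base64 --decode)
                printf '%s<base64>%s</base64>%s\n' "$prefix" "$decoded" "$suffix"
                ;;
            *)
                printf '%s\n' "$line"
                ;;
        esac
    done < "$1"
}

# Usage: decode_base64_tags log.txt > decoded.txt
```

Only the first <base64> element on a line is handled; for anything less regular than the sample, an XML-aware tool is the safer choice.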

how to find the string starting with # in the xml files through unix

I have several XML files in the directory /opt/app/rty/servers/tr/current/ops/config
Let's say there are three files named
abc.xml
bv.xml
ert.xml
Inside these XML files there can be many tags like the ones shown below
<bean id="sdrt" class="com.interfaces.send.erty">
<property name="eprocDependentOnClientAddress"><value>#argon.tdw.client.address#</value></property>
<property name="eprocDependentOnClientAddressEod"><value>tyu</value></property>
</bean>
My objective is to search all XML files in the directory /opt/app/rty/servers/tr/current/ops/config and verify that, in every file, no value inside a property tag starts with # and ends with #.
For example, in the XML above the line below is correct
<property name="eprocDependentOnClientAddressEod"><value>tyu</value></property>
but the line below is not correct
<property name="eprocDependentOnClientAddress"><value>#argon.tdw.client.address#</value></property>
Please advise which Unix command will list the names of the XML files in which a value inside a property tag starts with #.
To find files whose content has a value that starts with <value># and ends with #</value>, use this shell script:
#!/bin/bash
# Report every XML file in the current directory as clean or not.
for file in *.xml; do
    # -E: extended regular expressions; -q: no output, only the exit status
    if grep -Eq '<value>#[^<]*#</value>' "$file"; then
        printf 'File %s has issues.\n' "$file"
    else
        printf 'File %s is clean.\n' "$file"
    fi
done
And if the # characters need to be removed from within the <value> tag:
#!/bin/bash
for file in *.xml; do
    if grep -Eq '<value>#[^<]*#</value>' "$file"; then
        printf 'File %s has issues.\n' "$file"
        # Strip the leading and trailing # but keep the <value> tags themselves.
        sed -E 's|<value>#([^<]*)#</value>|<value>\1</value>|g' "$file" > "$file.replaced"
    fi
done
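
If only the file names are needed, which is what the question asks for, grep -l can list them directly. A minimal sketch: the function name find_placeholder_files is made up for illustration, and the search descends into subdirectories, which may or may not be wanted.

```shell
# find_placeholder_files DIR: print the XML files under DIR that contain
# a <value> whose text starts and ends with #.
find_placeholder_files() {
    find "$1" -type f -name '*.xml' \
        -exec grep -lE '<value>#[^<]*#</value>' {} +
}

# Usage: find_placeholder_files /opt/app/rty/servers/tr/current/ops/config
```

Using find instead of a *.xml glob avoids an error from grep when the directory contains no XML files at all.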

Resources