If I have an XML file like the one below:
<soap env="abc" id="xyz">
<emp>acdf</emp>
<Workinstance name="ab" id="ab1">
<x>1</x>
<y>2</y>
</Workinstance>
<projectinstance name="cd" id="cd1">
<u>1</u>
<v>2</v>
</projectinstance>
</soap>
I want to extract the id attribute of the Workinstance element using a Unix script.
I tried grep, but it retrieves the whole XML file.
Can someone help me figure out how to get it?
You might want to consider something like XMLStarlet, which implements the XPath/XQuery specifications.
Parsing XML with regular expressions is essentially impossible even under the best of conditions, so the sooner you give up on trying to do this with grep, the better off you're likely to be.
XMLStarlet seems to be the tool I was looking for!
To extract your attribute, try the following:
cat your_file.xml | xmlstarlet sel -t -v '/soap/Workinstance/@id'
The "/soap/Workinstance/@id" part is an XPath expression that selects the id attribute of the Workinstance element; the -v flag asks xmlstarlet to print the extracted value to standard output.
If you have Ruby
$ ruby -ne 'print $_.gsub(/.*id=\"|\".*$/,"" ) if /<Workinstance/' file
ab1
Related
I'm trying to download a bunch of files via FTP with wget. I could do this manually for each of the variables I'm interested in, or I was wondering if I could specify these in an "or"-type conditional statement in the file path name.
For example, I would like to download all files that contain the strings "NRRS412", "NRRS443", "NRRS490", etc. I had planned to do individual calls to wget for each of these, like this:
wget -r -A "L3m*NRRS412*.nc" ftp://username:password#ftp.address
I cannot simply use "L3m*NRRS*.nc", as there are other "NRRS" strings that I don't want.
Is there a way to download all of my target strings in a single call to wget?
Thanks for any help
OK, I figured out the solution, which is to create several possible strings separated by commas:
wget -r -A "L3m*NRRS412*.nc, L3m*NRRS43*.nc, L3m*NRRS490*.nc" ftp://username:password#ftp.address
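If a single accept list ever gives you trouble, a loop over the patterns is an equivalent sketch of the individual calls originally planned (the credentials and ftp.address are the placeholders from the question):
for p in NRRS412 NRRS443 NRRS490; do
    wget -r -A "L3m*${p}*.nc" ftp://username:password@ftp.address
done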
I'm learning shell scripting. Let's say abcd-2.1.1.4.jar is a file name. I want to extract the version, i.e. "2.1.1.4". I tried with the "cut" syntax:
echo "abcd-2.1.1.4.jar" | cut -d'-' -f2 returns 2.1.1.4.jar, not just the version, and I can't combine two different delimiters to strip the .jar part as well.
Is there any other way to achieve that?
Thank you.
You'd better try sed: echo abcd-2.1.1.4.jar | sed 's/.*-\([0-9.]\+\)\.jar/\1/'
Hoping you're using GNU sed, otherwise it might be a little different.
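If you'd rather avoid external tools entirely, a pure-shell sketch using parameter expansion also works (assuming the version always sits between the first "-" and a ".jar" suffix):
f=abcd-2.1.1.4.jar
v=${f#*-}     # strip up to and including the first "-", leaving 2.1.1.4.jar
v=${v%.jar}   # strip the .jar suffix, leaving 2.1.1.4
echo "$v"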
I want to use ls output as pipe input, so I need to escape the file names. When I use GNU ls, it works well. What's the equivalent in BSD ls? I'm hoping for output like this:
$ gls --quoting-style escape t*1
text\ 1 text1
Why are/were you trying to use ls in a pipeline? You should probably be using find (maybe with -print0 and xargs -0, or -exec).
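For instance, a minimal sketch of that approach (the t*1 pattern is borrowed from the gls example above; both GNU and BSD find support -print0):
$ find . -maxdepth 1 -name 't*1' -print0 | xargs -0 ls -ld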
I suppose you could use ls -1f and then run the output through vis (or some similar filter) with some appropriate options to add the necessary quoting or escaping of your choice, but without knowing what you are feeding filenames into, and what (if any) other options you would want to use with ls, it's impossible to give much better guidance.
From the FreeBSD man page for ls there is no such option; however, you can try -m, which will give you comma-separated stream output:
-m Stream output format; list files across the page, separated by
commas.
I tried it on OS X and it gave me:
$ ls -m
Hello World, Hello World.txt, foo.txt
That is a lot easier to parse from a script (as long as the file names don't themselves contain commas).
I want to search and replace a string in several files via the bash console.
Here is the command I use to find a string in a file:
grep "string" * -r
So the above is for searching; now I need a command to replace the string.
Is that even possible?
http://www.grymoire.com/Unix/Sed.html
It's cranky and difficult, but it's one way to do it.
Here's an example:
sed -i 's/ugly/beautiful/g' /home/bruno/old-friends/sue.txt
This replaces ugly with beautiful in sue.txt.
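To cover the "several files" part of the question, one common combination is to let grep find the files and sed edit them in place. A sketch, assuming GNU tools (BSD sed wants -i '' instead of -i), with "string" and "replacement" as placeholders:
grep -rl "string" . | xargs sed -i 's/string/replacement/g'
If the file names may contain spaces, grep -rlZ together with xargs -0 is the safer variant.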
I have a text file (more correctly, a "German style" CSV file, i.e. semicolon-separated, decimal comma) which has a date and the value of a measurement on each line.
There are stretches of faulty values which I want to remove before further work. I'd like to store these cuts in some script so that my corrections are documented and I can replay those corrections if necessary.
The lines look like this:
28.01.2005 14:48:38;5,166
28.01.2005 14:50:38;2,916
28.01.2005 14:52:38;0,000
28.01.2005 14:54:38;0,000
(long stretch of values that should be removed; could also be something else beside 0)
01.02.2005 00:11:43;0,000
01.02.2005 00:13:43;1,333
01.02.2005 00:15:43;3,250
Now I'd like to store a list of begin and end patterns like 28.01.2005 14:52:38 + 01.02.2005 00:11:43, and the script would cut the lines matching these begin/end pairs and everything that's between them.
I'm thinking about hacking an awk script, but perhaps I'm missing an already existing tool.
Have a look at sed:
sed '/start_pat/,/end_pat/d'
will delete lines between start_pat and end_pat (inclusive).
To delete multiple such pairs, you can combine them with multiple -e options:
sed -e '/s1/,/e1/d' -e '/s2/,/e2/d' -e '/s3/,/e3/d' ...
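With the sample data above, the first begin/end pair would look like this (dots escaped because . is a regex metacharacter; data.csv is a placeholder name, and this is untested):
sed '/28\.01\.2005 14:52:38/,/01\.02\.2005 00:11:43/d' data.csv > cleaned.csv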
Firstly, why do you need to keep a record of what you have done? Why not keep a backup of the original file, or take a diff between the old & new files, or put it under source control?
For the actual changes I suggest using Vim.
The Vim :global command (abbreviated to :g) can be used to run :ex commands on lines that match a regex. This is in many ways more powerful than awk since the commands can then refer to ranges relative to the matching line, plus you have the full text processing power of Vim at your disposal.
For example, this will do something close to what you want (untested, so caveat emptor):
:g!/^\d\d\.\d\d\.\d\d\d\d/ -1write >> tmp.txt | delete
This matches lines that do NOT start with a date (the ! negates the match), appends the previous line to the file tmp.txt, then deletes the current line.
You will probably end up with duplicate lines in tmp.txt, but they can be removed by running the file through uniq.
You can also use awk:
awk '/start/,/end/ {next} 1' file
(lines in the start/end range are skipped; everything else is printed)
I would seriously suggest learning the basics of perl (i.e. not the OO stuff). It will repay you in bucket-loads.
It is fast and simple to write a bit of perl to do this (and many other such tasks) once you have grasped the fundamentals, which if you are used to using awk, sed, grep etc are pretty simple.
You won't have to remember how to use lots of different tools and where you would previously have used multiple tools piped together to solve a problem, you can just use a single perl script (usually much faster to execute).
And, perl is installed on virtually every unix/linux distro now.
(that sed is neat though :-)
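For what it's worth, a one-line perl sketch of the same range deletion, using the flip-flop operator with the patterns from the sample data (untested; data.csv is a placeholder):
$ perl -ne 'print unless /28\.01\.2005 14:52:38/ .. /01\.02\.2005 00:11:43/' data.csv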
Use grep -v (to print non-matching lines).
Sorry - I thought you just wanted the lines without 0,000 at the end.