How can I exclude the following line in unix diff?
<xs:element attribute1="0" attribute2="return" attribute3="true" type="ax277:ResponseDataBean"/>
I need to exclude all cases where the number after 'ax' is different. For example, the following diff should be excluded:
File 1:
<xs:element attribute1="0" attribute2="return" attribute3="true" type="ax277:ResponseDataBean"/>
File 2:
<xs:element attribute1="0" attribute2="return" attribute3="true" type="ax111:ResponseDataBean"/>
I have tried:
diff -I 'type="ax^' file1 file2
But it is still displaying those lines.
You should really use XML tools for handling XML: sooner or later you will run into one of these files written with a line break where you don't expect one, and the line you want to exclude will no longer match a single-line pattern.
However, one way to do what you are asking would be to exclude those lines from the input that diff considers to begin with. Something along these lines, using bash process substitution:
diff <(grep -vE 'type="ax[0-9]+:' file1) <(grep -vE 'type="ax[0-9]+:' file2)
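If your diff is GNU diff, the -I option from the question can also be made to work once the regular expression actually matches the varying part; just keep in mind that -I only suppresses a hunk when every changed line in it matches the expression, so these lines will still show up when they are mixed with other changes. A minimal sketch:
diff -I 'type="ax[0-9]*:' file1 file2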
I have multiple MS Excel files in CSV format in a particular directory.
I want to update the value of one particular column in all the rows of the CSV files.
Also, the change should not be applied to the first and last lines.
So far I have come up with the code below:
awk -F, 'NR>2{$2=300;}1' OFS=, test.csv
But I am facing difficulty in excluding the last line.
Also, I need to perform the same operation on all the files in the directory.
I have also tried a couple of other awk variants, but have not been able to replace that string value with them.
This may do it:
awk -F, 't{print t} {a=t=$0} NR>1{$2=300;t=$0} END {print a}' OFS=, test.csv
$ cat file
1,a,b
2,c,d
3,e,f
$ awk 'BEGIN{FS=OFS=","} NR>1{print (NR>2 ? chgd : orig)} {orig=$0; $2=300; chgd=$0} END{print orig}' file
1,a,b
2,300,d
3,e,f
You could simplify the script a bit by reading the file twice:
awk 'BEGIN{FS=OFS=","} NR==FNR {c=NR;next} !(FNR==1||FNR==c){$2=200} 1' file file
This uses the NR==FNR section merely to count lines, giving you a simple expression for determining whether to update the field in question.
And if you have GNU awk available, you might save a few CPU cycles by not reassigning the c variable for every line, using something like this:
gawk 'BEGIN{FS=OFS=","} ENDFILE {c=FNR} NR==FNR{next} !(FNR==1||FNR==c){$2=200} 1' file file
This still reads the file twice, but assigns c only after each file is read.
If you want, you can emulate the ENDFILE condition in non-GNU awk using NR>FNR && FNR==1 (when there are only two files) and setting c=NR-1 there, as sketched below. It won't perform quite as well.
I haven't tested the speed difference between these two, but I suspect it would be negligible except in cases of truly obscenely large files.
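For reference, a minimal sketch of that emulation, assuming the same two-pass invocation as above:
awk 'BEGIN{FS=OFS=","} NR==FNR{next} NR>FNR && FNR==1{c=NR-1} !(FNR==1||FNR==c){$2=200} 1' file file
On the second pass, NR runs ahead of FNR by the length of the first file, so c ends up holding the number of the last line.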
Thanks all,
I got it to work. Below is the command:
awk -v sq="" -F, 't{print t} {a=t=$0} NR>2{$3=sq"ops_data"sq;t=$0} END {print a}' OFS=, test1.csv
I am trying to rename multiple files with extension xyz[n] to extension xyz
example :
mv *.xyz[1] to *.xyz
but the error comes back as "*.xyz No such file or directory"
I don't know whether mv can work on * directly like that, but this would work:
find ./ -name "*.xyz\[*\]" | while IFS= read -r line
do
mv "$line" "${line%.*}.xyz"
done
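If the files are all in the current directory, a plain shell loop over a glob is a simpler sketch of the same idea (the brackets are escaped so they are matched literally):
for f in *.xyz\[1\]; do mv "$f" "${f%.*}.xyz"; done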
Let's say we have some files as shown below. Now I want to remove the part -(ab...) from those file names.
> ls -1 foo*
foo-bar-(ab-4529111094).txt
foo-bar-foo-bar-(ab-189534).txt
foo-bar-foo-bar-bar-(ab-24937932201).txt
So the expected file names would be :
> ls -1 foo*
foo-bar-foo-bar-bar.txt
foo-bar-foo-bar.txt
foo-bar.txt
>
Below is a simple way to do it.
> ls -1 | nawk '/foo-bar-/{old=$0;gsub(/-\(.*\)/,"",$0);system("mv \""old"\" "$0)}'
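If you would rather not parse ls output, a plain shell loop with parameter expansion is another sketch of the same rename, assuming the names really follow the pattern shown above:
for f in foo*-\(ab-*\).txt; do mv "$f" "${f%-(*}.txt"; done
Here ${f%-(*} strips everything from the last "-(" onwards, and ".txt" is appended back on.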
Here is another way using the automated tools of StringSolver. Let us say your first file is named abc.xyz[1], a second is named def.xyz[1], and a third is named ghi.jpg (not the same extension as the previous two).
First, filter the files you want by giving examples (ok and notok are any words such that the first describes the accepted files):
filter abc.xyz[1] ok def.xyz[1] ok ghi.jpg notok
Then perform the move with the filter it created:
mv abc.xyz[1] abc.xyz
mv --filter --all
The second line generalizes the first transformation to all files ending with .xyz[1].
The last two lines can also be combined into just one, which performs the move and immediately generalizes it:
mv --filter --all abc.xyz[1] abc.xyz
DISCLAIMER: I am a co-author of this work for academic purposes. Other examples are available on YouTube.
I don't think mv can operate on multiple files like that without a loop.
Use the rename command instead. It uses regular expressions, is easy to use once mastered, and is more powerful.
rename 's/^text-to-replace/new-text-you-want/' text-to-replace*
e.g. to rename all .jar files in a directory to .jar_bak:
rename 's/\.jar$/.jar_bak/' *.jar
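Applied to the question at hand, and assuming the Perl-based rename (the variant that takes an s/// expression), something like this should work; the brackets are escaped both in the expression and in the glob:
rename 's/\.xyz\[1\]$/.xyz/' *.xyz\[1\]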
Is it possible to create a diff patchfile that will edit lines themselves, rather than replacing an entire line?
For example, I have the following line:
<foo:ListeningPortBar>3423</foo:ListeningPortBar>
and I want to change this to:
<cat:LoremIpsum>3423</cat:LoremIpsum>
That is, I want to change the text around the actual port number, but preserve the port number - I need to apply this patch across a number of files, all with different port numbers - I simply want to change the tags, keeping whatever port number is in there currently.
How can you achieve this please?
Thanks,
Victor
It doesn't really matter if a patch replaces the entire line or just characters in the line (the end result is the same, no...?), but I don't think this is a "patch" question. See below for a simpler solution using "sed".
For example, assume:
$ cat f1.txt
<xml>
<foo:ListeningPortBar>3423</foo:ListeningPortBar>
</xml>
$ cat f2.txt
<xml>
<cat:LoremIpsum>3423</cat:LoremIpsum>
</xml>
Then, literally the patch would be:
$ diff -u f1.txt f2.txt
--- f1.txt 2012-07-08 03:14:39.328328048 -0700
+++ f2.txt 2012-07-08 03:14:30.618177130 -0700
@@ -1,3 +1,3 @@
<xml>
-<foo:ListeningPortBar>3423</foo:ListeningPortBar>
+<cat:LoremIpsum>3423</cat:LoremIpsum>
</xml>
This patch file could be used as a template, modified with correct values for all your files that need to be updated, and applied to all the files individually. That sounds like more work than necessary.
On the other hand, just use "sed":
$ sed 's/<foo:ListeningPortBar>\([0-9]*\)<\/foo:ListeningPortBar>/<cat:LoremIpsum>\1<\/cat:LoremIpsum>/' f1.txt
<xml>
<cat:LoremIpsum>3423</cat:LoremIpsum>
</xml>
Since you have XML, using xsltproc is another alternative, but again probably overkill for this simple search-and-replace task.
To use this in a script, you'd do something like (replacing "etc/etc" with the sed above):
for f in $(find dir -name "*.xml" -exec grep -q 'foo:ListeningPortBar' {} \; -print)
do
sed -i.bak 's/etc/etc/g' "$f"
done
...and then verify that the ".bak" files are actually different than the modified files.
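If it helps, a quick way to spot-check that is to diff each modified file against its backup; a sketch, assuming the same dir and .bak suffix as above:
for f in $(find dir -name "*.xml.bak"); do diff -q "${f%.bak}" "$f"; done
Pairs that differ are reported; identical pairs print nothing.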
I've been going through an online UNIX course and have come across this question which I'm stuck on. Would appreciate any help!
You are provided with a set of files each one of which contains personal details about an individual. Each file is laid out in the following format, with one file per individual:
name:Niko Tanaka
age:41
occupation:Doctor
I know the answer has to be in the form:
n=$(awk -F: ' / /{print }' filename)
n=$(awk -F: '/name/{print $2}' infile)
Whatever is inside the / / is a regular expression. In this case you just want to match the line that contains 'name'.
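For example, run against the sample file above (assuming it is saved as infile), it prints just the value after the colon:
$ awk -F: '/name/{print $2}' infile
Niko Tanaka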
I have a text file (more correctly, a "German style" CSV file, i.e. semicolon-separated, decimal comma) which has a date and the value of a measurement on each line.
There are stretches of faulty values which I want to remove before further work. I'd like to store these cuts in some script so that my corrections are documented and I can replay those corrections if necessary.
The lines look like this:
28.01.2005 14:48:38;5,166
28.01.2005 14:50:38;2,916
28.01.2005 14:52:38;0,000
28.01.2005 14:54:38;0,000
(long stretch of values that should be removed; could also be something else beside 0)
01.02.2005 00:11:43;0,000
01.02.2005 00:13:43;1,333
01.02.2005 00:15:43;3,250
Now I'd like to store a list of begin and end patterns like 28.01.2005 14:52:38 + 01.02.2005 00:11:43, and the script would cut the lines matching these begin/end pairs and everything that's between them.
I'm thinking about hacking an awk script, but perhaps I'm missing an already existing tool.
Have a look at sed:
sed '/start_pat/,/end_pat/d'
will delete lines between start_pat and end_pat (inclusive).
To delete multiple such pairs, you can combine them with multiple -e options:
sed -e '/s1/,/e1/d' -e '/s2/,/e2/d' -e '/s3/,/e3/d' ...
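For the sample above, a concrete sketch with the dots escaped (and assuming the data lives in a file I'll call data.csv) would be:
sed '/28\.01\.2005 14:52:38/,/01\.02\.2005 00:11:43/d' data.csv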
Firstly, why do you need to keep a record of what you have done? Why not keep a backup of the original file, or take a diff between the old & new files, or put it under source control?
For the actual changes I suggest using Vim.
The Vim :global command (abbreviated to :g) can be used to run :ex commands on lines that match a regex. This is in many ways more powerful than awk since the commands can then refer to ranges relative to the matching line, plus you have the full text processing power of Vim at your disposal.
For example, this will do something close to what you want (untested, so caveat emptor):
:g!/^\d\d\.\d\d\.\d\d\d\d/ -1 write >> tmp.txt | delete
This matches lines that do NOT start with a date (the ! negates the match), appends the previous line to the file tmp.txt, then deletes the current line.
You will probably end up with duplicate lines in tmp.txt, but they can be removed by running the file through uniq.
You can also use awk; a range pattern plus next skips the block:
awk '/start/,/end/{next} 1' file
I would seriously suggest learning the basics of perl (i.e. not the OO stuff). It will repay you in bucket-loads.
It is fast and simple to write a bit of perl to do this (and many other such tasks) once you have grasped the fundamentals, which are pretty simple if you are already used to awk, sed, grep and the like.
You won't have to remember how to use lots of different tools, and where you would previously have piped several tools together to solve a problem, you can use a single perl script (usually much faster to execute).
And, perl is installed on virtually every unix/linux distro now.
(that sed is neat though :-)
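For what it's worth, a sketch of such a perl one-liner for this particular cut, using the range (flip-flop) operator and again assuming the data sits in data.csv:
perl -ne 'print unless /28\.01\.2005 14:52:38/ .. /01\.02\.2005 00:11:43/' data.csv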
Use grep -v (print non-matching lines).
Sorry - thought you just wanted lines without 0,000 at the end
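For completeness, that would have looked something like this (again assuming the data is in data.csv):
grep -v ';0,000$' data.csv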