How to delete one specific line in a file and modify the next line in unix?

I have a text file and there was a mistake when it was created. To fix it I need to delete a line containing a specific unique string and delete the characters in the following line that precede the # symbol. I was able to do this with sed and cut, but they only output that one line, not the thousands of other lines in my file. Here is an example of the part of the file that needs fixing. I know the line numbers (delete 45603341 and modify 45603342) where this mistake occurs.
#HWI-1KL135:70:C305EACXX:5:2105:6727:102841 1:N:0:CAGATC
CCAAGTGTCACCTCTTTTATTTATTGATTT#HWI-1KL135:70:C305EACXX:5:1101:1178:2203 1:N:0:CAGATC
I need the output to look like this and for it to leave the rest of the file intact.
#HWI-1KL135:70:C305EACXX:5:1101:1178:2203 1:N:0:CAGATC
Thanks!

How about:
sed -i -e '45603341d;45603342s/^.*\(#.*\)$/\1/' <filename>
where you replace <filename> with the name of your file.
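As a sketch of how this behaves, here is the same script run on a small three-line sample, with line numbers 1 and 2 standing in for 45603341/45603342 (the file name and contents below are made up for illustration):

```shell
# Build a sample: line 1 is the header to delete, line 2 has the stray
# sequence glued in front of the next header (hypothetical data).
printf '%s\n' \
  '#HWI-old 1:N:0:CAGATC' \
  'CCAAGT#HWI-new 1:N:0:CAGATC' \
  'some other line' > sample.txt

# Same script as above with 1 and 2 as the line numbers; -i is dropped
# here so the result is previewed on stdout instead of edited in place.
sed -e '1d;2s/^.*\(#.*\)$/\1/' sample.txt
# -> #HWI-new 1:N:0:CAGATC
#    some other line
```

Note the greedy `.*` before `\(#.*\)` means the kept text starts at the last # on the line, which is what you want when the prefix contains no # of its own.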

If you want to change a particular line and delete the line above it, then run:
sed -ri '45603342s/^([^#]*)(#.*)$/\2/g; 45603341d' aa
Example:
$ cat aa
#HWI-1KL135:70:C305EACXX:5:2105:6727:102841 1:N:0:CAGATC
CCAAGTGTCACCTCTTTTATTTATTGATTT#HWI-1KL135:70:C305EACXX:5:1101:1178:2203 1:N:0:CAGATC
$ sed -r '2s/^([^#]*)(#.*)$/\2/g; 1d' aa
#HWI-1KL135:70:C305EACXX:5:1101:1178:2203 1:N:0:CAGATC

This might work for you (GNU sed):
sed '45603341!b;N;s/^.*\n[^#]*//' file
Leave any line other than 45603341 as is. On that line, append the following line, then remove everything from the start of the pattern space up to the first # of the appended line.
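A sketch of the same idea on a two-line sample, with line number 1 standing in for 45603341 (file name and contents are made up):

```shell
printf '%s\n' \
  '#HWI-old 1:N:0:CAGATC' \
  'CCAAGT#HWI-new 1:N:0:CAGATC' > sample2.txt

# On line 1, N pulls in line 2; the substitution deletes the old header,
# the embedded newline, and the non-# prefix of the appended line.
sed '1!b;N;s/^.*\n[^#]*//' sample2.txt
# -> #HWI-new 1:N:0:CAGATC
```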

An alternative approach to sed is to use Vim macros (this also works on Windows). The main disadvantage is that you cannot integrate them into scripts the way you can with sed. The main advantage is that they allow for complex replacements like "search for this pattern, then clear the line, go down 3 lines, move to column 40, swap lines, ...". If you are already familiar with Vim, it's also much more intuitive.
In this particular case you will have to do something like
qq (start macro recording)
/^#HWI.*CAGATC$ (search pattern)
dd (delete line)
vw (select word)
d (delete selected word)
q (end macro)
To run the macro 100 times:
100@q

Related

Remove duplicate lines based on starting pattern using bash

I'm trying to remove duplicates in a list of Jira tickets that follow the following syntax:
XXXX-12345: a description
where 12345 is a pattern like [0-9]+ and the XXXX is constant. For example, the following list:
XXXX-1111: a description
XXXX-2222: another description
XXXX-1111: yet another description
should get cleaned up like this:
XXXX-1111: a description
XXXX-2222: another description
I've been trying with sed, but while what I had worked on Mac, it didn't on Linux. I think it'd be easier with awk, but I'm not an expert in either.
I tried:
sed -r '$!N; /^XXXX-[0-9]+\n\1/!P; D' file
This simple awk should get the output:
awk '!seen[$1]++' file
XXXX-1111: a description
XXXX-2222: another description
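The `!seen[$1]++` idiom works because `seen[$1]` is 0 (false) the first time a key appears, so `!seen[$1]` is true and the line prints; the post-increment then bumps the counter so later duplicates are suppressed. Note that with default field splitting, `$1` here is the whole `XXXX-1111:` token including the colon. A quick pipe sketch:

```shell
printf '%s\n' \
  'XXXX-1111: a description' \
  'XXXX-2222: another description' \
  'XXXX-1111: yet another description' |
awk '!seen[$1]++'
# -> XXXX-1111: a description
#    XXXX-2222: another description
```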
If the digits are the only thing defining a dup, you could do:
awk -F: '{split($1,arr,/-/); if (seen[arr[2]]++) next} 1' file
If the XXXX is always the same, you can simplify to:
awk -F: '!seen[$1]++' file
Either prints:
XXXX-1111: a description
XXXX-2222: another description
This might work for you (GNU sed):
sed -nE 'G;/^([^:]*:).*\n\1/d;P;h' file
-nE turn on explicit printing and extended regexps.
G append unique lines from the hold space to the current line.
/^([^:]*:).*\n\1/d If the current line key already exists, delete it.
P otherwise, print the current line and
h store unique lines in the hold space
N.B. Your sed solution would work (not as is but with some tweaking) but only if the file(s) were sorted by the key.
sed -E 'N;/^([^:]*:).*\n\1/!P;D' file
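A quick check of the hold-space version on the question's sample data (GNU sed, since `.` must match across the embedded newlines):

```shell
printf '%s\n' \
  'XXXX-1111: a description' \
  'XXXX-2222: another description' \
  'XXXX-1111: yet another description' |
sed -nE 'G;/^([^:]*:).*\n\1/d;P;h'
# -> XXXX-1111: a description
#    XXXX-2222: another description
```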

Unix Text Processing - how to remove part of a file name from the results?

I'm searching through text files using grep and sed commands and I also want the file names displayed before my results. However, I'm trying to remove part of the file name when it is displayed.
The file names are formatted like this: aja_EPL_1999_03_01.txt
I want to have only the date without the beginning letters and without the .txt extension.
I've been searching for an answer and it seems like it's possible to do that with a sed or a grep command by using something like this to look forward and back and extract between _ and .txt:
(?<=_)\d+(?=\.)
But I must be doing something wrong, because it hasn't worked for me and I possibly have to add something as well, so that it doesn't extract only the first number, but the whole date. Thanks in advance.
Edit: Adding also the working command I've used just in case. I imagine whatever command is needed would have to go at the beginning?
sed '/^$/d' *.txt | grep -P '(^([A-ZÖÄÜÕŠŽ].*)?[Pp][Aa][Ll]{2}.*[^\.]$)' *.txt --colour -A 1
The results look like this:
aja_EPL_1999_03_02.txt:PALLILENNUD : korraga üritavad ümbermaailmalendu kaks meeskonda
A desired output would be this:
1999_03_02:PALLILENNUD : korraga üritavad ümbermaailmalendu kaks meeskonda
First off, you might want to think about your regular expression. While you say the one you have works, I wonder if it could be simplified. You told us:
(^([A-ZÖÄÜÕŠŽ].*)?[Pp][Aa][Ll]{2}.*[^\.]$)
It looks to me as if this is intended to match lines that contain a case-insensitive "PALL", possibly preceded by any number of other characters that start with a capital letter, and that must not end in a dot. So valid lines might be any of:
PALLILENNUD : korraga üritavad etc etc
Õlu on kena. Do I have appalling speling?
Peeter Pall is a limnologist at EMU!
If you'd care to narrow down this description a little and perhaps provide some examples of lines that should be matched or skipped, we may be able to do better. For instance, your outer parentheses are probably unnecessary.
Now, let's clarify what your pipe isn't doing.
sed '/^$/d' *.txt
This reads all your .txt files as an input stream, deletes any empty lines, and prints the output to stdout.
grep -P 'regex' *.txt --otheroptions
This reads all your .txt files, and prints any lines that match regex. It does not read stdin.
So, in the command line you're using right now, your sed command is utterly ignored, as sed's output is not being read by grep. You COULD instruct grep to read from both files and stdin:
$ echo "hello" > x.txt
$ echo "world" | grep "o" x.txt -
x.txt:hello
(standard input):world
But that's not what you're doing.
By default, when grep reads from multiple files, it will precede each match with the name of the file from whence that match originated. That's also what you're seeing in my example above -- two inputs, one x.txt and the other - a.k.a. stdin, separated by a colon from the match they supplied.
While grep does include the most minuscule capability for filtering (with -o, or \K in GNU grep's optional Perl-compatible REs), it does NOT provide you with any options for formatting the filename. Since you can't do this within grep itself, you're limited to either parsing the output you've got or using some other tool.
Parsing is easy, if your filenames are predictably structured as they seem to be from the two examples you've provided.
For this, we can ignore the fact that these lines contain a filename and data. For the purpose of the filter, they are a stream which follows a pattern. It looks like you want to strip off all characters from the beginning of each line up to but not including the first digit. You can do this by piping through sed:
sed 's/^[^0-9]*//'
Or you can achieve the same effect by using grep's minimal filtering to return every match starting from the first digit:
grep -o '[0-9].*'
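For example, on a line shaped like your output (the text after the colon is made up). Note that both filters leave the `.txt` from the filename in place, so reaching the exact output shown in the question needs one more substitution, sketched here on the assumption that the filename always ends in `.txt`:

```shell
line='aja_EPL_1999_03_02.txt:PALLILENNUD : text'

# Strip everything before the first digit:
echo "$line" | sed 's/^[^0-9]*//'
# -> 1999_03_02.txt:PALLILENNUD : text

# Same effect with grep's minimal filtering:
echo "$line" | grep -o '[0-9].*'
# -> 1999_03_02.txt:PALLILENNUD : text

# Dropping the .txt as well (assumes the filename always ends in .txt):
echo "$line" | sed 's/^[^0-9]*//; s/\.txt:/:/'
# -> 1999_03_02:PALLILENNUD : text
```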
If this kind of pipe-fitting is not to your liking, you may want to replace your entire grep with something in awk that combines functionality:
$ awk '
/[\.]$/ {next} # skip lines ending in backslash or dot
/^([A-ZÖÄÜÕŠŽ].*)?PALL/ { # lines to match
f=FILENAME
sub(/^[^0-9]*/,"",f) # strip unwanted part of filename, like sed
printf "%s:%s\n", f, $0
getline # simulate the "-A 1" from grep
printf "%s:%s\n", f, $0
}' *.txt
Note that I haven't tested this, because I don't have your data to work with.
Also, awk doesn't include any of the fancy terminal-dependent colourization that GNU grep provides through the --colour option.

How to delete lines from a file that start with certain words

My file is a CSV; it looks like the format below on a Unix server.
"Product_Package_Map_10302017.csv","451","2017-10-30 05:02:26"
"Targeting_10302017.csv","13","2017-10-30 05:02:26",
"Targeting_Options_10302017.csv","42","2017-10-30 05:02:27"
I want to delete a particular line based on filename keyword.
You can use grep -v:
grep -v '^"Product_Package_Map_10302017.csv"' file > file.filtered
'^"Product_Package_Map_10302017.csv"' matches the string "Product_Package_Map_10302017.csv" exactly at the line beginning
or sed can do it in-place:
sed -i '/^"Product_Package_Map_10302017.csv"/d' file
See this related post for other alternatives:
Delete lines in a text file that contain a specific string
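One small caveat: the unescaped dots in these patterns match any character, so in principle they could match an unintended line; escaping them is slightly safer. A sketch on the question's data:

```shell
# Sample file from the question (truncated to two lines):
printf '%s\n' \
  '"Product_Package_Map_10302017.csv","451","2017-10-30 05:02:26"' \
  '"Targeting_10302017.csv","13","2017-10-30 05:02:26",' > file

# Dot escaped so it matches only a literal "." in the filename:
grep -v '^"Product_Package_Map_10302017\.csv"' file
# -> "Targeting_10302017.csv","13","2017-10-30 05:02:26",
```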
See this previous question. A grep-based answer would be my first choice but, as you can see, there are many ways to address this one!
(Would have just commented, but my 'rep' is not yet high enough)

Adding Text using ed to the End of a Specific Line within a File

I have two files, both containing tens of thousands of lines. I'm currently taking a string (Z1234562) from file_one.txt and checking whether it's in file_two.txt. If it's found in file_two.txt, I return the line number the match was on -- in this case, line 1235. I have this working already.
file_one.txt
Line 1> [...]_Z1234562
file_two.txt
Line 1234> [...],Z1234561,[...]
Line 1235> [...],Z1234562,[...]
Line 1236> [...],Z1234563,[...]
However, I want to now append to line 1235 the string ,Yes. So that on file_two.txt I have
Line 1235> [...],Z1234562,[...],Yes
With the help of Glenn Jackman's answer to this other question, I was able to figure out how to use the ed editor to add text before and after a specific line within a file. However, I haven't been able to figure out whether ed can add text to the end of a line within a file. Reading the documentation, I'm not sure it can. So far, based off this AIX site, this is what I have:
(echo '15022a'; echo 'Text to add.'; echo '.'; echo 'w'; echo 'q') | ed -s filename.txt
This appends the string Text to add. after line 15,022. I'm wondering if there is an insert equivalent to this append.
The reason I'm not using sed is because I'm on an AIX system and I can't seem to get what this forum has working. However, I'm not sure if the sed command in this forum only solves adding text before or after a line and not at the end of the line, which I already have working.
My next approach would be to remove the return character at the end of the line I want to append to, append the text, and then re-add the return character, but I don't want to reinvent the wheel before exhausting my options. Maybe I'm missing something; I want to do this as soon as possible, so any help would be appreciated. I'm starting not to like these AIX systems. Maybe awk can help me, but I'm less familiar with awk.
I wrote a small binary search subroutine using Perl in order to find the line that I want to append to. I'm not sure if sed, ed, grep, awk, etc. use binary search but that's why I'm not using ed or sed's pattern-replace searches. I want it to be as fast as possible so I'm open to a different approach.
Here is a general recipe I have used for invoking ed from a script:
(echo '/pattern/s/$/ new text/'
echo w ) | ed filename
This invokes ed on filename, searches for a line containing pattern, and appends "new text" at the end of that line. Season to taste.
You said you were having trouble with sed, but for the record, here's the same sort of thing using sed:
sed '/pattern/s/$/ new text/' filename > filename.modified
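Applied to the question's data, with Z1234562 as the pattern and ,Yes as the appended text (the surrounding fields are made-up placeholders):

```shell
# Hypothetical file_two.txt with one record per line:
printf '%s\n' \
  'aaa,Z1234561,bbb' \
  'aaa,Z1234562,bbb' \
  'aaa,Z1234563,bbb' > file_two.txt

# Append ",Yes" to the end of the line containing Z1234562:
sed '/Z1234562/s/$/,Yes/' file_two.txt
# -> aaa,Z1234561,bbb
#    aaa,Z1234562,bbb,Yes
#    aaa,Z1234563,bbb
```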
You can use the j command
(.,.+1)j
Joins the addressed lines. The addressed lines are deleted from the buffer and replaced by a single line containing their joined text. The current address is set to the resultant line.
So you just have to modify your previous command:
cat << EOF | ed -s filename.txt
15022a
Text to add.
.
-1,.j
wq
EOF
First we create a test file foo:
$ cat > foo
foo
bar
If you already know the line number you want to edit, e.g. previous example and line number 2 using sed:
$ sed '2s/$/foo/' foo
foo
barfoo
In awk you'd command:
$ awk 'NR==2 {sub(/$/,"foo")} 1' foo
foo
barfoo
In perl:
$ perl -p -e 's/$/foo/ if $. == 2' foo
foo
barfoo

Modify grep output with find and replace cmd

I use grep to filter a big log file into a smaller one, but the output log file still contains a long directory path that is common to every line. I have to do a find-and-replace every time.
Isn't there any way I can do grep -r "format" log.log | and pipe straight into some find-and-replace step?
Sed will do what you want. Basic syntax to replace all the matches of foo with bar in-place in $file is:
sed -i 's/foo/bar/g' $file
If you're just wanting to delete rather than replace, simply leave out the 'bar' (so s/foo//g).
See this tutorial for a lot more detail, such as regex support.
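Since the thing being deleted is a directory path, picking a delimiter other than / saves a lot of escaping. A sketch with a made-up path and log contents (note that -i behaves differently on BSD/macOS sed, which needs -i ''):

```shell
# Hypothetical log file with a long common directory prefix:
printf '%s\n' \
  '/opt/app/logs/web.log: ERROR timeout' \
  '/opt/app/logs/db.log: ERROR deadlock' > small.log

# Delete the prefix in place, using "," as the s/// delimiter
# so the slashes in the path need no escaping (GNU sed):
sed -i 's,/opt/app/logs/,,g' small.log
cat small.log
# -> web.log: ERROR timeout
#    db.log: ERROR deadlock
```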
sed -n '/match/s/pattern/repl/p'
Will print all the lines that match the regex match, with all instances of pattern replaced by repl. Since your lines may contain paths, you will probably want to use a different delimiter. / is customary, but you can also do:
sed -n '\#match#s##repl#p'
In the second case, omitting pattern will cause match to be used for the pattern to be replaced.
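A sketch of that second form with a made-up path: `\#…#` uses # as the address delimiter (handy when the pattern is a path), and the empty `s##…#` reuses that same pattern as the text to replace; here the replacement is empty, so the prefix is simply deleted:

```shell
echo '/srv/app/logs/web.log: ERROR timeout' |
sed -n '\#/srv/app/logs/#s###p'
# -> web.log: ERROR timeout
```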
