Delete line containing a specific string starting with dollar sign using unix sed

I am very new to Unix.
I have a parameter file Parameter.prm containing following lines.
$$ErrorTable1=ErrorTable1
$$Filename1_New=FileNew.txt
$$Filename1_Old=FileOld.txt
$$ErrorTable2=ErrorTable2
$$Filename2_New=FileNew.txt
$$Filename2_Old=FileOld.txt
$$ErrorTable3=ErrorTable3
$$Filename3_New=FileNew.txt
$$Filename3_Old=FileOld.txt
I want to get the output as
$$ErrorTable1=ErrorTable1
$$ErrorTable2=ErrorTable2
$$ErrorTable3=ErrorTable3
Basically, I need to delete the lines starting with $$Filename.
Since $ is a special character, I am not able to match it as a literal string. How can I accomplish this using sed?

With sed:
$ sed '/$$Filename/d' infile
$$ErrorTable1=ErrorTable1
$$ErrorTable2=ErrorTable2
$$ErrorTable3=ErrorTable3
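The /$$Filename/ part is the address, i.e., the command following it is executed for every line matching it. Inside single quotes the shell leaves $$ alone, and sed treats $ as an end-of-line anchor only at the end of a pattern, so here both dollar signs match literally. The command is d, which deletes the line. Lines that don't match are just printed as is.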
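Since the question asks for lines starting with $$Filename, a slightly stricter variant anchors the match and escapes the dollar signs explicitly. This is only a sketch: the temporary file and its shortened contents are made up for illustration and stand in for Parameter.prm.

```shell
# Shortened stand-in for Parameter.prm (illustration only)
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
$$ErrorTable1=ErrorTable1
$$Filename1_New=FileNew.txt
$$Filename1_Old=FileOld.txt
$$ErrorTable2=ErrorTable2
EOF

# ^ anchors the match at the start of the line; \$ is a literal dollar sign.
# The single quotes keep the shell from expanding $$ to its process ID.
result=$(sed '/^\$\$Filename/d' "$tmp")
printf '%s\n' "$result"
rm -f "$tmp"
```

With the anchor, a line that merely contains $$Filename somewhere in the middle is left alone.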

Extracting information from a textfile based on pattern search is a job for grep:
grep ErrorTable file
or even
grep -F '$$ErrorTable' file
-F tells grep to treat the search term as a fixed string instead of a regular expression.
Just to answer your question, if a regular expression needs to search for characters which have a special meaning in the regex language, you need to escape them:
grep '\$\$ErrorTable' file
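The same idea can be turned around to delete rather than extract: -v inverts the match, so grep prints every line that does not contain the fixed string. A minimal sketch, with a made-up sample file standing in for the real parameter file:

```shell
# Shortened stand-in for the parameter file (illustration only)
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
$$ErrorTable1=ErrorTable1
$$Filename1_New=FileNew.txt
$$ErrorTable2=ErrorTable2
EOF

# -F: treat the pattern as a fixed string, -v: print the lines that do NOT match
kept=$(grep -vF '$$Filename' "$tmp")
printf '%s\n' "$kept"
rm -f "$tmp"
```

Note that a fixed string cannot be anchored, so this drops any line containing $$Filename anywhere, not only at the start.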

Related

unix SED command to replace part of key value pair

We have a requirement where I need to replace part of a parameter value in our configuration file.
Example
key1=123-456
I need to replace the value after the hyphen with a new value.
I got a command which is being used in other projects, but I am not sure how it works.
Command
[test]$ cat test_sed_key_value.txt
key1=123-456
[test]$ sed -i -e '/key1/ s/-.*$/-789/' test_sed_key_value.txt
[test]$
[test]$ cat test_sed_key_value.txt
key1=123-789
[test]$
It would be helpful if someone could explain how the above command works, or whether there is a simpler way to do this using sed.
Here is a list of parts of that commandline, each followed by a short explanation:
sed
which tool to use
-i
flag: edit in place, i.e. write the result back to the processed file (without keeping a backup copy of the input file)
-e
expression parameter: the sed code to apply follows
/key1/
"address": only process lines on which this regex applies, i.e. those containing the text "key1"
s/replacethis/withthis/
command: do a search-and-replace; "replacethis" and "withthis" are explained in the next two entries
-.*$
regex: (what is actually in the commandline instead of "replacethis") a regular expression representing a "minus" followed by anything, in any number, until the end of the line
-789
literal: (what is actually in the commandline instead of "withthis") simply that string "-789"
test_sed_key_value.txt
file parameter: process this file
I cannot think of any way to do this simpler. The shown command already uses some assumptions on the formatting of the input file.
I'd add to Yunnosch's answer that here the "replacethis" is a regexp:
-.*$
See the GNU sed manual for an overview of the syntax of sed's regular expressions.
Asterisk means any number of repetitions of the previous thing, dot means any character, so .* means any sequence of characters (possibly empty).
$ is the end of the line.
You might want to be a bit more restrictive, since here you'd lose something in a line like this one for instance:
key1=123-456, key2=abc-def
replacing it by:
key1=123-789
removing completely the key2 part (since the .* takes all characters after the first dash until end of line).
So depending on the format of your values, you might prefer something like
-[0-9]*
(without the $), meaning a sequence of digits after the -
or
-[0-9a-zA-Z_]*
meaning a sequence of digits, letters or underscores after the -
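To make the difference concrete, here is a small sketch that runs both variants on the two-key line from above. It works on a string rather than a file, so -i is left out; the variable names are only for illustration.

```shell
line='key1=123-456, key2=abc-def'

# Greedy version: .* runs to the end of the line, so the key2 part is lost
greedy=$(printf '%s\n' "$line" | sed '/key1/ s/-.*$/-789/')

# Restricted version: only the digits right after the first dash are replaced
narrow=$(printf '%s\n' "$line" | sed '/key1/ s/-[0-9]*/-789/')

printf '%s\n' "$greedy" "$narrow"
```

The first result is key1=123-789, the second keeps the rest of the line: key1=123-789, key2=abc-def.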

Removing comments from a datafile. What are the differences?

Let's say that you would like to remove comments from a datafile using one of two methods:
cat file.dat | sed -e "s/\#.*//"
cat file.dat | grep -v "#"
How do these individual methods work, and what is the difference between them? Would it also be possible for a person to write the clean data to a new file, while avoiding any possible warnings or error messages to end up in that datafile? If so, how would you go about doing this?
How do these individual methods work, and what is the difference
between them?
Both remove comments, but they do not behave the same. Your sed command replaces everything from the first # to the end of the line with an empty string, so any text before the # is kept (and a line that is only a comment becomes an empty line). grep -v, on the other hand, drops every line that contains a # anywhere, including any data before it.
The man pages give more details:
man grep:
-v, --invert-match
Invert the sense of matching, to select non-matching lines. (-v is specified by POSIX.)
man sed:
s/regexp/replacement/
Attempt to match regexp against the pattern space. If successful, replace that portion matched with replacement. The replacement may contain the special character & to refer to that portion of the pattern space which matched, and the special escapes \1 through \9 to refer to the corresponding matching sub-expressions in the regexp.
Would it also be possible for a person to write the clean data to a
new file, while avoiding any possible warnings or error messages to
end up in that datafile?
Yes, you can redirect the errors by appending 2>/dev/null to either command.
If so, how would you go about doing this?
For example, append 2>/dev/null 1>output_file to the command: the data goes to output_file and any error messages are discarded.
Explanation of the sed command (for understanding purposes only). Note that there is no need to use cat and then sed; you could run sed -e "s/\#.*//" Input_file instead.
sed -e " ##Start sed; -e adds the following script to the commands to be executed.
s/ ##s starts a substitution; the regexp follows.
\#.* ##Match from the first # through the end of the line.
//" ##Replace the matched text with an empty string.
That grep -v will lose all the lines that have # on them, for example:
$ cat file
first
# second
thi # rd
so
$ grep -v "#" file
first
drops every line that has a # on it, which is not what you want here. Instead, you could use:
$ grep -o "^[^#]*" file
first
thi
which keeps the text before the #, just as the sed command does, except that you won't get empty lines. From man grep:
-o, --only-matching
Print only the matched (non-empty) parts of a matching line,
with each such part on a separate output line.
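The three approaches can be compared side by side while also writing the clean data to new files and keeping error messages out of them. This is a sketch only; the temporary directory and the output file names are made up for illustration.

```shell
tmp=$(mktemp -d)
printf '%s\n' 'first' '# second' 'thi # rd' > "$tmp/file.dat"

# sed blanks out the comment but keeps the line: 3 lines, one of them empty.
# Errors go to a separate log instead of the data file.
sed -e 's/#.*//' "$tmp/file.dat" > "$tmp/sed.out" 2> "$tmp/sed.err"

# grep -v drops every line that contains a #: 1 line
grep -v '#' "$tmp/file.dat" > "$tmp/grep_v.out" 2> /dev/null

# grep -o keeps only the non-empty text before the first #: 2 lines
grep -o '^[^#]*' "$tmp/file.dat" > "$tmp/grep_o.out" 2> /dev/null

# Show how many lines each method kept
wc -l "$tmp"/*.out
```

Only standard output is redirected into the .out files, so a warning such as "No such file or directory" would end up in sed.err or be discarded, never in the data.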

programmatic grep command output

Is there a way to get XML or equivalent output from the grep command that can be passed on to other programs?
For example, grep can give the file names, line numbers and context of the pattern matched.
Filename and line number extraction can be done using some split command with delimiter ':'. However, if the filename contains ':' character (I know it is weird, but there is a possibility), it would need lot more processing.
With the context (grep -C option), it becomes even more complex. If the context of two matches overlaps, grep optimizes the output and it will be difficult to separate.
So I am wondering if grep command can simply generate an XML or JSON like output that other programs can just load.
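GNU grep has an option -Z (--null) that makes the output unambiguous: it writes a NUL byte after each file name instead of the usual separator character, and a NUL byte cannot occur in a file name. This does not give you XML or JSON, but it does make the file name safe to split off.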
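A minimal sketch of how that might be used, assuming GNU grep and a made-up file whose name contains a colon. The NUL is turned into a newline here, which is safe as long as file names contain no newlines; a consumer that reads NUL-delimited input directly would not need that assumption.

```shell
tmp=$(mktemp -d)
printf 'hello\nworld\n' > "$tmp/odd:name.txt"

# -H always prints the file name, -n adds line numbers,
# -Z (GNU grep) writes a NUL byte after the file name instead of ':'.
# tr turns that NUL into a newline, so each match becomes two lines:
# the file name, then "lineno:text".
records=$(cd "$tmp" && grep -HnZ 'o' -- *.txt | tr '\0' '\n')
printf '%s\n' "$records"
```

Each odd-numbered output line is then a complete file name, colons included, and the following line starts with the line number.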

Unix Text Processing - how to remove part of a file name from the results?

I'm searching through text files using grep and sed commands and I also want the file names displayed before my results. However, I'm trying to remove part of the file name when it is displayed.
The file names are formatted like this: aja_EPL_1999_03_01.txt
I want to have only the date without the beginning letters and without the .txt extension.
I've been searching for an answer and it seems like it's possible to do that with a sed or a grep command by using something like this to look forward and back and extract between _ and .txt:
(?<=_)\d+(?=\.)
But I must be doing something wrong, because it hasn't worked for me and I possibly have to add something as well, so that it doesn't extract only the first number, but the whole date. Thanks in advance.
Edit: Adding also the working command I've used just in case. I imagine whatever command is needed would have to go at the beginning?
sed '/^$/d' *.txt | grep -P '(^([A-ZÖÄÜÕŠŽ].*)?[Pp][Aa][Ll]{2}.*[^\.]$)' *.txt --colour -A 1
The results look like this:
aja_EPL_1999_03_02.txt:PALLILENNUD : korraga üritavad ümbermaailmalendu kaks meeskonda
A desired output would be this:
1999_03_02:PALLILENNUD : korraga üritavad ümbermaailmalendu kaks meeskonda
First off, you might want to think about your regular expression. While the one you have you say works, I wonder if it could be simplified. You told us:
(^([A-ZÖÄÜÕŠŽ].*)?[Pp][Aa][Ll]{2}.*[^\.]$)
It looks to me as if this is intended to match lines that start with a case insensitive "PALL", possibly preceded by any number of other characters that start with a capital letter, and that lines must not end in a dot (with -P, the backslash inside the brackets merely escapes the dot). So valid lines might be any of:
PALLILENNUD : korraga üritavad etc etc
Õlu on kena. Do I have appalling speling?
Peeter Pall is a limnologist at EMU!
If you'd care to narrow down this description a little and perhaps provide some examples of lines that should be matched or skipped, we may be able to do better. For instance, your outer parentheses are probably unnecessary.
Now, let's clarify what your pipe isn't doing.
sed '/^$/d' *.txt
This reads all your .txt files as an input stream, deletes any empty lines, and prints the output to stdout.
grep -P 'regex' *.txt --otheroptions
This reads all your .txt files, and prints any lines that match regex. It does not read stdin.
So .. in the command line you're using right now, your sed command is utterly ignored, as sed's output is not being read by grep. You COULD instruct grep to read from both files and stdin:
$ echo "hello" > x.txt
$ echo "world" | grep "o" x.txt -
x.txt:hello
(standard input):world
But that's not what you're doing.
By default, when grep reads from multiple files, it will precede each match with the name of the file from whence that match originated. That's also what you're seeing in my example above -- two inputs, one x.txt and the other - a.k.a. stdin, separated by a colon from the match they supplied.
While grep does include the most minuscule capability for filtering (with -o, or GNU grep's \K with optional Perl compatible RE), it does NOT provide you with any options for formatting the filename. Since you can't change how grep prints the filename, you're limited to either parsing the output you've got, or using some other tool.
Parsing is easy, if your filenames are predictably structured as they seem to be from the two examples you've provided.
For this, we can ignore that these lines contain a file and data. For the purpose of the filter, they are a stream which follows a pattern. It looks like you want to strip off all characters from the beginning of each line up to and not including the first digit. You can do this by piping through sed:
sed 's/^[^0-9]*//'
Or you can achieve the same effect by using grep's minimal filtering to return every match starting from the first digit:
grep -o '[0-9].*'
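Both filters above leave the .txt extension in place. If the extension should go as well, as in the desired output from the question, one possible sketch is a single sed substitution that captures the date and drops what surrounds it. It assumes the file names always look like the example, with a digits-and-underscores date directly before .txt.

```shell
line='aja_EPL_1999_03_02.txt:PALLILENNUD : korraga üritavad ümbermaailmalendu kaks meeskonda'

# ^[^0-9]*        everything before the first digit (dropped)
# \([0-9_]*\)     the date part, kept as \1
# \.txt:          the extension and grep's separator, replaced by a plain ':'
short=$(printf '%s\n' "$line" | sed 's/^[^0-9]*\([0-9_]*\)\.txt:/\1:/')
printf '%s\n' "$short"
```

Context lines printed by -A 1 use a - instead of a : after the file name, so they would need a second substitution with \.txt- if they should be shortened too.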
If this kind of pipe-fitting is not to your liking, you may want to replace your entire grep with something in awk that combines functionality:
$ awk '
  /\.$/ {next}                      # skip lines ending in a dot
  /^([A-ZÖÄÜÕŠŽ].*)?PALL/ {         # lines to match
    f=FILENAME
    sub(/^[^0-9]*/,"",f)            # strip unwanted leading part of filename, like sed
    sub(/\.txt$/,"",f)              # strip the extension as well
    printf "%s:%s\n", f, $0
    if ((getline) > 0)              # simulate the "-A 1" from grep
      printf "%s:%s\n", f, $0
  }' *.txt
Note that I haven't tested this, because I don't have your data to work with.
Also, awk doesn't include any of the fancy terminal-dependent colourization that GNU grep provides through the --colour option.

Separating E-mail addresses from a text file using tcsh

I have a text file which contains E-mail addresses surrounded by a lot of garbage.
I need to separate just the E-mail addresses, and write each address on a separate line (or separate them with commas).
The text file looks like this:
per#netvision.net אירית שנהב;רוני אשכול 99; מרכז האולפן 99; דפני אלפר; תים רון; (eina#gmail.com) אינה דגן 9303; (ori#gmail.com) אילן דור 9406; 9304 אורי
My idea is to "catch" all the words that start and end with an English letter ([A-Z]), because all the garbage around them is not in English letters.
Can someone show me how to write this script?
I would do this using grep -o. It's not precisely "in" tcsh, but you can use grep from any script. The -o option causes grep to return only the text matched by the regexp.
It looks as if your input file currently separates records using a semi-colon. This is important, since grep reads things line-by-line. So we will use tr to replace your record separators with newlines to ensure that grep sees each record.
ghoti#pc> cat strip_email
#!/bin/tcsh
setenv inputfile emails.txt
setenv re_email '[[:alnum:]][[:alnum:]_%=+-]*#([[:alnum:]]([[:alnum:]-])+\.)+[[:alnum:]]{2,}'
tr ';' '\n' < $inputfile | grep -Eo "$re_email"
ghoti#pc> ./strip_email
per#netvision.net
eina#gmail.com
ori#gmail.com
ghoti#pc>
You can adapt the regular expression to whatever suits you. On a unix or linux system, you can check man pages: re_format(7) or regex(7) for documentation. The [:alnum:] piece is documented there, as well as isalnum.
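The question also mentions a comma-separated list as an alternative. A possible sketch in plain sh rather than tcsh: since -o already prints each match on its own line, the matches can be joined with paste. The sample line is made up with ASCII filler, and # is kept as the separator exactly as in the data above.

```shell
# Same expression as above, with # as the separator used in the sample data
re='[[:alnum:]][[:alnum:]_%=+-]*#([[:alnum:]][[:alnum:]-]+\.)+[[:alnum:]]{2,}'

# Made-up sample line (illustration only)
sample='per#netvision.net foo; (eina#gmail.com) bar 9303; (ori#gmail.com) baz'

# grep -Eo prints one match per line; paste -s joins them with commas
joined=$(printf '%s\n' "$sample" | grep -Eo "$re" | paste -sd, -)
printf '%s\n' "$joined"
```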

Resources