How to split a string delimited by new lines in /bin/sh


I have a string like FIRSTNAME\nLASTNAME in a file. I want to put the first name and last name in their own variables. I have to use /bin/sh, not bash.
How can I easily do this?

You can use parameter expansion operators to strip the \n and everything after it (leaving the first name), or everything up to and including it (leaving the last name).
$ s="Stealth\nRabbi"
$ first=${s%\\n*}
$ last=${s#*\\n}
$ echo "$first"
Stealth
$ echo "$last"
Rabbi
Note that there are no literal newlines involved; you have two characters, \ and n.
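Since the question says the string is stored in a file, here is a minimal POSIX sh sketch that reads it before applying the same two expansions. The file names names.txt and names2.txt are hypothetical, created only to keep the example self-contained. The second part assumes the other possible reading of the question, a file holding a real newline, where two read calls are enough.

```shell
#!/bin/sh
# Hypothetical sample file holding the literal two characters \ and n
# (printf's %s does not interpret backslash escapes in its argument).
printf '%s\n' 'Stealth\nRabbi' > names.txt

# read -r keeps the backslash; IFS= keeps leading/trailing blanks.
IFS= read -r s < names.txt
first=${s%\\n*}   # drop the last \n and everything after it
last=${s#*\\n}    # drop everything up to and including the first \n
printf '%s\n' "$first" "$last"

# If the file instead contains a real newline (two separate lines),
# read each line into its own variable from the same redirection.
printf 'Stealth\nRabbi\n' > names2.txt
{ IFS= read -r first2; IFS= read -r last2; } < names2.txt
printf '%s\n' "$first2" "$last2"
```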

Related

Linux - Get Substring from 1st occurrence of character

FILE1.TXT
0020220101
or
01 20220101
I need to extract the date part from the file, where the text starts from 2.
Options tried:
t_FILE_DT1='awk -F"2" '{PRINT $NF}' FILE1.TXT'
t_FILE_DT2='cut -d'2' -f2- FILE1.TXT'
echo "$t_FILE_DT1"
echo "$t_FILE_DT2"
1st output : 0101
2nd output : 0220101
Expected Output: 20220101
I'm new to Linux scripting. Could someone help me see where I'm going wrong?

Use grep like so:
printf '%s\n' 0020220101 '01 20220101' | grep -P -o '\d{8}\b'
20220101
20220101
Here, GNU grep uses the following options:
-P : Use Perl regexes.
-o : Print the matches only (1 match per line), not the entire lines.
SEE ALSO:
grep manual
perlre - Perl regular expressions

Using any awk:
$ awk '{print substr($0,length()-7)}' file
20220101
20220101
The above was run on this input file:
$ cat file
0020220101
01 20220101
Regarding PRINT $NF in your question - PRINT != print. Get out of the habit of using all-caps unless you're writing Cobol. See correct-bash-and-shell-script-variable-capitalization for some reasons.
The 2 in your scripts is telling awk and cut to use the character 2 as the field separator, so each will carve up the input into substrings everywhere a 2 occurs.
The 's in your question are single quotes, which make strings literal. You were intending to use backticks, `cmd`, but those are deprecated in favor of $(cmd) anyway.
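Putting those points together, the question's first command could be rewritten roughly as below: lowercase print, $(...) instead of plain single quotes so the command actually runs, and the last-8-characters substr approach from the awk answer instead of 2 as a field separator. This is a sketch assuming the date is always the last 8 characters of each line; the sample FILE1.TXT is created only so the example runs.

```shell
#!/bin/sh
# Sample input taken from the question.
printf '%s\n' '0020220101' '01 20220101' > FILE1.TXT

# $(...) runs the command and captures its output, whereas '...'
# would only store the command text itself in the variable.
t_FILE_DT1=$(awk '{ print substr($0, length($0) - 7) }' FILE1.TXT)
echo "$t_FILE_DT1"
```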

Instead of looking for what comes "after" the 2 (and having to worry about whether a space is involved as well), think about extracting the last 8 characters, which you know for a fact are your date.
#!/bin/bash
# Note: ${line: -8} is a bash/zsh substring expansion, not POSIX sh.
input="/path/to/txt/file/FILE1.TXT"
while IFS= read -r line
do
    # The last 8 characters of $line are the date, so there is
    # no need to worry about exact matching or spaces.
    myDate=${line: -8}
    echo "$myDate"
done < "$input"
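${line: -8} is not available in plain /bin/sh. Under POSIX sh, a minimal sketch of the same last-8-characters idea combines a suffix removal and a prefix removal. The relative FILE1.TXT path and its sample contents stand in for the real file.

```shell
#!/bin/sh
# Stand-in for the real file; replace with the actual path.
input=FILE1.TXT
printf '%s\n' '0020220101' '01 20220101' > "$input"

while IFS= read -r line
do
    # ${line%????????} is the line minus its last 8 characters;
    # removing that (quoted, so taken literally) prefix from $line
    # leaves just the last 8 characters.
    myDate=${line#"${line%????????}"}
    echo "$myDate"
done < "$input"
```

If a line is shorter than 8 characters, ${line%????????} matches nothing and myDate ends up empty, so you may want to check for that.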

About the cut and awk commands that you tried:
Using awk -F"2" '{PRINT $NF}' file sets the field separator to 2, and $NF is the last field, so printing the value of the last field gives 0101.
Using cut -d'2' -f2- file also uses 2 as the delimiter and then prints all fields from the second one onward, which gives 0220101.
If you want to match the 2 followed by 7 digits until the end of the string:
awk '
match ($0, /2[0-9]{7}$/) {
print substr($0, RSTART, RLENGTH)
}
' file
Output
20220101

The accepted answer shows how to extract the first eight digits, but that's not what you asked.
grep -o '2.*' file
will extract from the first occurrence of 2, and
grep -o '2[0-9]*' file
will extract every run of digits that starts with a 2. If you specifically want eight digits, try
grep -Eo '2[0-9]{7}' file
maybe also with the -w option if you only want to accept a match between two word boundaries. If you specifically want only the digits from the first occurrence of 2 onward, maybe try
sed -n 's/[^2]*\(2[0-9]*\).*/\1/p' file
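Since the title asks for the substring from the first occurrence of a character, the grep -o '2.*' idea can also be done with POSIX parameter expansion alone, without starting an external command. A minimal sketch, using the two sample lines from the question:

```shell
#!/bin/sh
for line in '0020220101' '01 20220101'
do
    # ${line%%2*} is everything before the first 2; removing it as a
    # literal prefix leaves the substring starting at the first 2.
    rest=${line#"${line%%2*}"}
    echo "$rest"
done
```

A line with no 2 in it comes out empty, because ${line%%2*} then leaves the whole line and the prefix removal strips all of it.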

How to search for a string containing $ and ; in Linux

I ran an ldapsearch command, and its output is redirected to a file (overwriting the same file every time the command is run). I have to search for 3 strings, 2 of which contain $ and ;.
Contents in the file example.txt
<some lines above>
changenumber;demo$host-example_good1$fine-example_good2
changenumber;echo$fine-example_good2$host-example_good1
changenumber;echo$host-example_good1$fine-example_good2
changenumber;demo$fine-example_good2$host-example_good1
<some lines below>
<end of file>
Tried below commands
awk -F";|$" '/echo$host-example_good1$fine-example_good2'/ example.txt
awk -F"[$;]" '/demo$host-example_good1$fine-example_good2'/ example.txt
Output: nothing is displayed
Expected output
changenumber;demo$host-example_good1$fine-example_good2
changenumber;echo$host-example_good1$fine-example_good2

$ has a special meaning in regular expressions (it anchors the end of the line), so you need to escape it.
awk -F'[;$]' '/demo\$host-example_good1\$fine-example_good2/' example.txt
Go to regular-expressions.info to read a tutorial about regular expressions.

To search for a literal string, use a string function, not a regexp:
$ awk 'index($0,"$host-example_good1$fine-example_good2")' file
changenumber;demo$host-example_good1$fine-example_good2
changenumber;echo$host-example_good1$fine-example_good2
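The grep counterpart of index() is grep -F, which also treats its pattern as a fixed string rather than a regexp, so neither $ nor ; needs escaping. A minimal sketch, recreating example.txt from the four sample lines in the question:

```shell
#!/bin/sh
# Sample lines from the question; single quotes stop the shell
# from expanding $host, $fine, etc.
printf '%s\n' \
  'changenumber;demo$host-example_good1$fine-example_good2' \
  'changenumber;echo$fine-example_good2$host-example_good1' \
  'changenumber;echo$host-example_good1$fine-example_good2' \
  'changenumber;demo$fine-example_good2$host-example_good1' > example.txt

# -F: fixed-string match, so $ and ; are ordinary characters.
grep -F '$host-example_good1$fine-example_good2' example.txt
```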

Zsh backslash madness?

Zsh seems to do some weird backslashing when you try to echo a bunch of backslashes. I cannot seem to figure out a very clear pattern to this. Any reasons for this madness? Of course, if I actually wanted to use backslashes properly, then I'd use proper quoting etc, but why does this happen in the first place?
Here's a small example to show the same:
$ echo \\
\
$ echo \\ \\
\ \
$ echo \\ \\ \\
\ \ \
$ echo \\ \\ \\ \\
\ \ \ \
$ echo \\\\ \\ \\
\ \ \
$ echo \\\\\\ \\
\\ \
$ echo \\\\\\\\
\\
I initially independently discovered this a while ago, but was reminded of it by this tweet by Zach Riggle.

In the first step, the echo command is not special. The command line is parsed by rules that are independent of what command is being executed. The overall effect of this step is to convert your command from a series of characters to a series of words.
The two general parsing rules you need to know to understand this example are: the space character separates words, and the backslash character escapes special characters, including itself.
So the command echo \\ becomes a list of 2 words:
echo
\
The first backslash escapes the second one, resulting in a single backslash being in the second word.
echo \\ \\ \\ \\
becomes this list of words:
echo
\
\
\
\
Now command line parsing is done. Only now does the shell look for a command named by the first word. Up until now, the fact that the command is echo has been irrelevant. If you'd said cat \\ \\ \\ \\, cat would be invoked with 4 argument words, each containing a single backslash.
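To see the word list the shell produces before any command-specific backslash handling, printf with a %s format is handy, because %s prints its arguments as-is without interpreting escapes. A minimal POSIX sh sketch (the same parsing rule applies in zsh):

```shell
#!/bin/sh
# Print each argument word on its own line, wrapped in <>, so the
# result of the shell's backslash removal is visible.
printf '<%s>\n' \\ \\\\ \\\\\\
# <\>
# <\\>
# <\\\>
```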
Normally when you run echo you'll be getting the shell builtin command. The zsh builtin echo has configurable behavior. I like to use setopt BSD_ECHO to select BSD-style echo behavior, but from your sample output it appears you are in the default mode, SysV-style.
BSD-style echo doesn't do any backslash processing, it would just print them as it received them.
SysV echo processes backslash escapes like in C strings - \t becomes a tab character, \r becomes a carriage return, etc. Also \c is interpreted as "end the output without a newline".
So if you said echo a\\tb, the shell parsing would leave a single backslash in the argument word given to echo, and echo would interpret a\tb and print a and b separated by a tab. It would be more readable if written as echo 'a\tb', using apostrophes to provide quoting at the shell-command-parsing level. Likewise, echo \\\\ is two backslashes after command line parsing, so echo sees \\ and outputs one backslash. If you wanted to print a\tb literally without using another form of quoting, you'd have to say echo a\\\\tb.
So the shell has a simple rule - two backslashes on the command line to make one backslash in the argument word. And echo has a simple rule - two backslashes in the argument word to make one backslash in the output.
But there's a problem... when echo does its thing, a backslash followed by t means output a tab, a backslash followed by a backslash means output a backslash... but there are lots of combinations that don't mean anything. A backslash followed by T for example is not a valid escape sequence. In C it would be a warning or an error. But the echo command tries to be more tolerant.
Try echo \\T or echo '\T' and you will discover that a backslash followed by anything that doesn't have a defined meaning as a backslash escape will just cause echo to output both characters as-is.
Which brings us to the last case: what if the backslash isn't followed by anything at all? What if it's the last character in the argument word? In that case, echo just outputs the backslash.
So in summary, two backslashes in the argument word result in one backslash in the output. But one backslash in the argument word also results in one backslash in the output, if it is the last character in the word or if the backslash together with the next character doesn't form a valid escape sequence.
The command line echo \\\\ thus becomes the word list
echo
\\
which outputs a single backslash "properly", with quoting applied at all levels.
The command line echo \\ becomes the word list
echo
\
which outputs a single backslash "messily", because echo found a stray backslash at the end of the argument and was generous enough to output it for you even though it wasn't escaped.
The rest of the examples should be clear from these principles.

Using Grep in Unix to find dollar values from $10.00-$99.99

I'm trying to write the correct grep command to search a file for occurrences of any dollar value from $10.00 to $99.99. My main concern is whether the $ symbol needs to be escaped with \. So far I have this
grep '$[1-9][0-9]\.[0-9][0-9]' file
just wondering if it should be
grep '\$[1-9][0-9]\.[0-9][0-9]' file
instead.

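Escape both. The . needs escaping because an unescaped . matches any single character, not just a literal period. For $, POSIX basic regexes (plain grep) treat it as an anchor only at the end of the pattern, so a leading $ should still match a literal dollar sign, but \$ is clearer and is required with grep -E, where $ is always an anchor.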
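A quick way to check the escaped version is to run it on a few sample lines. prices.txt and its contents are hypothetical test data; the $45x67 line shows why the period must be escaped.

```shell
#!/bin/sh
# Hypothetical sample file; single quotes keep the shell from expanding $5 etc.
printf '%s\n' 'cost $5.00' 'cost $10.00' 'cost $99.99' 'cost $100.00' 'cost $45x67' > prices.txt

# \$ is a literal dollar sign and \. a literal period;
# [1-9][0-9] limits the whole-dollar part to 10-99.
grep '\$[1-9][0-9]\.[0-9][0-9]' prices.txt
# cost $10.00
# cost $99.99
```

Dropping the backslash before the period would let cost $45x67 match as well, since an unescaped . accepts any character there.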

Parsing line continuations

What is the simplest way to parse line continuation characters? This seems like such a basic action that I'm surprised there's no basic command for doing this. 'while read' and 'while read -r' loops don't do what I want, and the easiest solution I've found is the sed solution below. Is there a way to do this with something basic like tr?
$ cat input
Output should be \
one line with a '\' character.
$ while read l; do echo $l; done < input
Output should be one line with a '' character.
$ while read -r l; do echo $l; done < input
Output should be \
one line with a '\' character.
$ sed '/\\$/{N; s/\\\n//;}' input
Output should be one line with a '\' character.
$ perl -0777 -pe 's/\\\n//s' input
Output should be one line with a '\' character.

If by "simplest" you mean concise and legible, I'd suggest your perl-ism with one small modification:
$ perl -pe 's/\\\n//' /tmp/line-cont
There's no need for the possibly memory-intensive -0777 (whole-file slurp mode) switch.
If, however, by "simplest" you mean not leaving the shell, this will suffice:
$ while IFS= read -r LINE || [ -n "$LINE" ]; do
    case $LINE in
      *\\) printf '%s' "${LINE%\\}" ;;  # strip line-continuation, no newline yet
      *)   printf '%s\n' "$LINE" ;;     # emit newline for non-continued lines
    esac
  done < /tmp/input
(I prefer printf '%s' "$USER_INPUT" to echo "$USER_INPUT" because echo cannot portably be told to stop looking for switches or backslash escapes, and printf is commonly a built-in anyway.)
Just tuck that in a user-defined function and never be revolted by it again. Caution: this latter approach will add a trailing newline to a file which lacks one.

The regex way looks like the way to go.
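One caveat on the sed command in the question: it joins only one continuation per line, so two continued lines in a row still leave a stray backslash. A commonly used looping variant keeps appending lines while the pattern space ends in a backslash. A minimal sketch, with input.txt as hypothetical sample data:

```shell
#!/bin/sh
# Hypothetical input: two consecutive continuation lines, then a normal line.
printf '%s\n' 'one \' 'two \' 'three' 'four' > input.txt

# :a is a label. While the pattern space ends in a backslash, append
# the next line (N), delete the backslash-newline pair, and branch back to a.
sed -e ':a' -e '/\\$/{' -e 'N' -e 's/\\\n//' -e 'ba' -e '}' input.txt
# one two three
# four
```

If the very last line ends in a backslash, N has no line to append; GNU sed then prints what it has, while a strictly POSIX sed may quit without printing it.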

I would go with the Perl solution simply because it will likely be the most extensible if you want to add more functionality later.
