Linux - Get Substring from 1st occurrence of character


FILE1.TXT
0020220101
or
01 20220101
I need to extract the date part from the file, i.e. the text starting from the 2.
Options tried:
t_FILE_DT1='awk -F"2" '{PRINT $NF}' FILE1.TXT'
t_FILE_DT2='cut -d'2' -f2- FILE1.TXT'
echo "$t_FILE_DT1"
echo "$t_FILE_DT2"
1st output : 0101
2nd output : 0220101
Expected Output: 20220101
I'm new to Linux scripting. Could someone help me see where I'm going wrong?

Use grep like so:
printf '%s\n' '0020220101' '01 20220101' | grep -P -o '\d{8}\b'
20220101
20220101
Here, GNU grep uses the following options:
-P : Use Perl regexes.
-o : Print the matches only (1 match per line), not the entire lines.
SEE ALSO:
grep manual
perlre - Perl regular expressions

Using any awk:
$ awk '{print substr($0,length()-7)}' file
20220101
20220101
The above was run on this input file:
$ cat file
0020220101
01 20220101
Regarding PRINT $NF in your question: awk is case-sensitive, so PRINT != print. Get out of the habit of using all-caps unless you're writing Cobol. See correct-bash-and-shell-script-variable-capitalization for some reasons.
The 2 in your scripts tells awk and cut to use the character 2 as the field separator, so each will carve up the input into substrings everywhere a 2 occurs.
The 's in your question are single quotes, which make strings literal. You were intending to use backticks, `cmd`, but those are deprecated in favor of $(cmd) anyway.
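Putting those fixes together (a lowercase print, a command that does not split on 2, and $(...) instead of single quotes), a minimal sketch of the assignment might look like this. The helper name last8 is hypothetical, and a printf of the first sample line stands in for FILE1.TXT:

```shell
# Hypothetical helper: print the last 8 characters of every input line.
# Reads the files given as arguments, or stdin when there are none.
last8() {
  awk '{ print substr($0, length($0) - 7) }' "$@"
}

# Use $(...) for command substitution; with the real file this would be
#   t_file_dt=$(last8 FILE1.TXT)
t_file_dt=$(printf '%s\n' '0020220101' | last8)
echo "$t_file_dt"
```

The same helper also handles the 01 20220101 variant, because it only counts characters from the end of the line.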

Instead of looking for what comes "after" the 2 (and having to worry about whether a space is involved as well), think about extracting the last 8 characters, which you know for a fact are your date:
input="/path/to/txt/file/FILE1.TXT"
while IFS= read -r line
do
    # Take the last 8 characters of $line. You KNOW this is the date,
    # so there is no need to worry about exact matching or spaces.
    # ${var: -N} is bash syntax; the space before -8 is required.
    myDate=${line: -8}
    echo "$myDate"
done < "$input"

About the cut and awk commands that you tried:
awk -F"2" '{print $NF}' file (with a lowercase print) sets the field separator to 2, and $NF is the last field, so it prints the value of the last field, which is 0101.
cut -d'2' -f2- file also uses 2 as the delimiter and then prints all fields starting at the second one, which gives 0220101.
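To see why, it helps to print the fields that awk creates for the first sample line. This is only a diagnostic sketch using the sample value from the question:

```shell
# Print every field awk sees when "2" is the field separator.
printf '%s\n' '0020220101' | awk -F'2' '{
  for (i = 1; i <= NF; i++) printf "field %d: [%s]\n", i, $i
}'

# cut splits the same way; -f2- prints fields 2..end rejoined with "2".
printf '%s\n' '0020220101' | cut -d'2' -f2-
```

The awk part reports four fields: 00, 0, an empty field (between the adjacent 22), and 0101, so $NF is 0101. The cut part prints 0220101.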
If you want to match a 2 followed by 7 digits at the end of the line:
awk '
match ($0, /2[0-9]{7}$/) {
print substr($0, RSTART, RLENGTH)
}
' file
Output
20220101

The accepted answer shows how to extract the first eight digits, but that's not what you asked.
grep -o '2.*' file
will extract from the first occurrence of 2, and
grep -o '2[0-9]*' file
will extract each run of digits that starts with a 2 (the match is greedy, so in 0020220101 that is the single run 20220101). If you specifically want eight digits, try
grep -Eo '2[0-9]{7}' file
maybe also with the -w option if you only want to accept a match between two word boundaries. If you specifically want only the digits after the first occurrence of 2, maybe try
sed -n 's/[^2]*\(2[0-9]*\).*/\1/p' file
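The literal question, "substring from the first occurrence of a character", can also be answered without any external command, using POSIX parameter expansion. A minimal sketch; the function name from_first_2 is hypothetical:

```shell
# Hypothetical helper: print everything from the first "2" onward,
# using only POSIX parameter expansion (no awk, sed or cut).
from_first_2() {
  while IFS= read -r line; do
    prefix=${line%%2*}              # text before the first "2"
    printf '%s\n' "${line#"$prefix"}"  # strip that prefix literally
  done
}

printf '%s\n' '0020220101' '01 20220101' | from_first_2
```

Both sample lines print 20220101. A line that contains no 2 at all comes out empty, since the whole line is treated as the prefix.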

Related

How to cut from one position to another if two positions match in Unix

I have a file like this
ABCDEFGH
IJKLMNOP
QRSTUFWH
if 6th position = F and if 8th position = H
I want to cut from position 2 to 4
So the output should be
BCD
RST
I can copy the records matching the pattern to another file using this:
grep '^.....F.H' f1.txt > f2.txt
What I want is only positions 2 to 4 of the lines that match the pattern.
Please help
Thank you

Could you please try the following?
awk 'substr($0,6,1)=="F" && substr($0,8,1)=="H"{print substr($0,2,3)}' Input_file
Since you added the Solaris tag to your question, try changing awk to /usr/xpg4/bin/awk if you are on Solaris.

This might work for you (GNU sed):
sed -En 's/^.(...).F.H.*/\1/p' file
It uses pattern matching with a group and a back reference to extract the required string.

This awk should work on most systems (gawk, mawk, BWK awk), although POSIX itself leaves an empty FS unspecified:
awk '$6=="F" && $8=="H" {print $2$3$4}' FS="" file
BCD
RST
Setting the field separator to the empty string makes every character its own field, so you only need to test fields 6 and 8 and then print fields 2 to 4.
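The grep from the question can also be reused directly by piping its matches into cut -c, which selects character positions. A minimal sketch; the function name cut_if_fh is hypothetical:

```shell
# Hypothetical helper: keep lines with F at position 6 and H at position 8
# (the grep from the question), then print characters 2 to 4.
cut_if_fh() {
  grep '^.....F.H' "$@" | cut -c2-4
}

printf '%s\n' ABCDEFGH IJKLMNOP QRSTUFWH | cut_if_fh
```

This prints BCD and RST, matching the expected output in the question.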

How to read the nth line and mth field of a text file in Unix

Suppose I have a |-delimited file:
Line1: 1|2|3|4
Line2: 5|6|7|8
Line3: 9|9|1|0
Now I need to read the 3rd field of the second line, which is 7 in the example above. How can I do that using cut or sed? I'm new to Unix, please help.

A job for awk:
awk -F '|' 'NR==2{print $3}' file
or
awk -F '|' -v row=2 -v col=3 'NR==row{print $col}' file
Output:
7

This should work:
sed -n '2p' file |awk -F '|' '{print $3}'
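Since the question asks specifically about cut or sed, the awk in the pipeline above can be swapped for cut. A minimal sketch; the function name nth_field and its argument order are hypothetical:

```shell
# Hypothetical helper: print field COL of line ROW from stdin,
# using only sed and cut.
#   usage: nth_field ROW COL < file
nth_field() {
  sed -n "${1}p" | cut -d'|' -f"$2"
}

printf '%s\n' '1|2|3|4' '5|6|7|8' '9|9|1|0' | nth_field 2 3
```

With the sample data this prints 7.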

This might work for you (GNU sed):
sed -rn '2s/^(([^|]*)\|?){3}.*/\2/p' file
Turn off automatic printing with the -n option and enable extended regular expressions (easier to write) with the -r option. Use pattern matching and back references to replace the whole of the second line with its third field, and print the result.
The address of the substitution command limits it to the second line only.
The regexp matches a run of non-delimiter characters followed by a delimiter, repeated a fixed number of times (here 3). The inner (second) group captures only the non-delimiter characters. Each repetition overwrites the previous capture, so \2 holds the value from the last repetition, i.e. the third field; the .* consumes the remainder of the line, so only the third field is printed.
N.B. No delimiter follows the final column, so the delimiter is made optional: \|?

Retrieving a variable name that starts with a specific string

I have a variable name that appears in multiple locations of a text file. This variable will always start with the same string but not always end with the same characters. For example, it can be var_name or var_name_TEXT.
I'm looking for a way to extract the first occurrence in the text file of this string starting with var_name and ending with , (but I don't want the comma in the output).
Example1: var_name, some_other_var, another_one, ....
Output: var_name
Example2: var_name_TEXT, some_other_var, another_one, ...
Output: var_name_TEXT

grep -oPm1 '\bvar_name[^, ]*(?=,)' file | head -1
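Match and output only variables that start with var_name and are followed by a comma, without including the comma in the output (that is the (?=,) lookahead). -m1 makes grep quit after the first matching line, and head -1 picks the first match on that line if there is more than one.
P.S. You have to exclude the space in the character class as well ([^, ]), otherwise the match could include text after a space.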

I suggest GNU grep:
grep -o '\bvar_name[^,]*' file | head -n 1

All you need is (GNU awk):
$ awk 'match($0,/\<var_name[^,]*/,a){print a[0]; exit}' file
var_name_TEXT

To print the field only (i.e., var_name or var_name_TEXT only; not the line containing it) you could use awk:
awk -F, '{for (i=1;i<=NF;i++) if ($i~/^var_name/) print $i}' file
If you actually have spaces before or after the commas (as you show in your example), you can change the awk field separator:
awk -F"[, ]+" '{for (i=1;i<=NF;i++) if ($i~/^var_name/) print $i}' file
You can also use GNU grep with a word boundary assertion:
grep -o '\bvar_name[^,]*' file
Or, to print the whole lines that contain it, GNU awk:
awk '/\<var_name/' file
If you only want the first one, add exit to awk or -m 1 to grep to stop after the first matching line.
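If neither GNU grep nor gawk is available, plain awk's two-argument match() together with RSTART and RLENGTH can do the same job. A minimal sketch; the function name first_var_name is hypothetical, and since POSIX awk has no word-boundary operator, this version would also match something like my_var_name:

```shell
# Hypothetical helper: print the first var_name... token in the input
# (stopping before a comma or space) and quit.
first_var_name() {
  awk 'match($0, /var_name[^, ]*/) { print substr($0, RSTART, RLENGTH); exit }' "$@"
}

printf '%s\n' 'var_name_TEXT, some_other_var, another_one' | first_var_name
```

This prints var_name_TEXT; for the first example line it prints var_name.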

How to grep the nth string

How to use "grep" shell command to show specific word from a line starting with a specific word.
Ex:
I want to print the string "myFTPpath/folderName/" from the line starting with searchStr in the line below.
searchStr:somestring:myFTPpath/folderName/:somestring

Something like this with awk:
awk -F: '/^searchStr/{print $3}' File
From all the lines starting with searchStr, print the 3rd field (field separator set to :).
Sample:
AMD$ cat File
someStr:somestring:myFTPpath/folderName/:somestring
someStr:somestring:myFTPpath/folderName/:somestring
searchStr:somestring:myFTPpath/folderName/:somestring
someStr:somestring:myFTPpath/folderName/:somestring
AMD$ awk -F: '/^searchStr/{print $3}' File
myFTPpath/folderName/

Remember that grep isn't the only tool that can usefully do searches.
In this particular case, where the lines are naturally broken into fields, awk is probably the best solution, as A.M.D's answer suggests.
For more general case edits, however, remember sed's -n option, which suppresses printing out a line after edits:
sed -n 's/searchStr:[^:]*:\([^:]*\):.*/\1/p' input-file
The -n suppresses automatic printing of the line, and the trailing /p flag explicitly prints out lines on which there is a substitution.
This matching pattern is fiddly – use awk in this fielded case – but don't forget sed -n.

You could get the desired output with grep itself, but you need to enable the -P and -o options.
$ echo 'searchStr:somestring:myFTPpath/folderName/:somestring' | grep -oP '^searchStr:[^:]*:\K[^:]*'
myFTPpath/folderName/
\K discards everything matched before it from the reported match, so only the characters matched by the part of the pattern after \K are printed. \K is used here instead of a positive lookbehind, because PCRE does not allow an unbounded variable-length lookbehind such as one containing [^:]*.
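For completeness, the same extraction can be sketched in plain POSIX shell by letting read split on colons. The function name third_field_of_searchstr is hypothetical; note that, unlike /^searchStr/, it requires the first field to be exactly searchStr:

```shell
# Hypothetical helper: print the 3rd ":"-separated field of every line
# whose first field is exactly "searchStr".
third_field_of_searchstr() {
  while IFS=: read -r key second third rest; do
    if [ "$key" = searchStr ]; then
      printf '%s\n' "$third"
    fi
  done
}

printf '%s\n' \
  'someStr:somestring:myFTPpath/folderName/:somestring' \
  'searchStr:somestring:myFTPpath/folderName/:somestring' |
  third_field_of_searchstr
```

This prints myFTPpath/folderName/ once, for the searchStr line only.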

How can I delete the second word of every line of top(1) output?

I have a formatted list of processes (top output) and I'd like to remove unnecessary information. How can I remove, for example, the second word and its following whitespace from each line?
Example:
1 a hello
2 b hi
3 c ahoi
I'd like to delete a, b and c.

You can use the cut command.
cut -d' ' -f2 --complement file
--complement inverts the selection: -f2 alone would select the second field, and with --complement cut prints all fields except the second. This is useful when you have a variable number of fields.
--complement is a GNU cut option. If it is not available, the following does the same:
cut -d' ' -f1,3- file
Meaning: print the first field and then fields 3 to the end, i.e. exclude the second field and print the rest.
Edit:
If you prefer awk, you can do: awk '{$2=""; print $0}' file
This sets the second field to an empty string and prints each line.
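With that awk command, two spaces are left where the second field was, because awk rebuilds the line with OFS between every field, including the now-empty one. A minimal sketch of one way to squeeze the gap; the function name drop_second is hypothetical:

```shell
# Hypothetical helper: drop the 2nd field and close the gap it leaves.
#   $2 = ""  -> rebuilds $0 as "1  hello" (double space)
#   $0 = $0  -> re-splits that line, so the empty field disappears
#   $1 = $1  -> rebuilds $0 again with single OFS separators
drop_second() {
  awk '{ $2 = ""; $0 = $0; $1 = $1; print }' "$@"
}

printf '%s\n' '1 a hello' '2 b hi' '3 c ahoi' | drop_second
```

Be aware that this also collapses every other run of whitespace on the line to a single space, which changes the column alignment of real top output.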

Using sed to substitute the second column:
sed -r 's/(\w+\s+)\w+\s+(.*)/\1\2/' file
1 hello
2 hi
3 ahoi
Explanation:
(\w+\s+) # Capture the first word and trailing whitespace
\w+\s+ # Match the second word and trailing whitespace
(.*) # Capture everything else on the line
\1\2 # Replace with the captured groups
Notes: Use the -i option to save the results back to the file. -r enables extended regular expressions; check the man page, as it could be -E depending on the implementation. \w and \s are GNU sed extensions.
Or use awk to only print the specified columns:
$ awk '{print $1, $3}' file
1 hello
2 hi
3 ahoi
Both solutions have their merits. The awk solution is nice for a small, fixed number of columns, but you need to use a temp file to store the changes (awk '{print $1, $3}' file > tmp; mv tmp file), whereas the sed solution is more flexible, as the number of columns isn't an issue and the -i option does the edit in place.
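The temp-file step for awk can be wrapped in a small function. This is a sketch under stated assumptions: the name keep_cols_1_3 is hypothetical, and it relies on mktemp(1), which is common but not part of POSIX. Copying back with cat instead of mv keeps the original file's permissions:

```shell
# Hypothetical helper: keep columns 1 and 3 of FILE, rewriting FILE in place.
keep_cols_1_3() {
  tmp=$(mktemp) || return 1
  awk '{ print $1, $3 }' "$1" > "$tmp" && cat "$tmp" > "$1"
  status=$?
  rm -f "$tmp"
  return "$status"
}

# Demo on a throwaway file holding the sample data.
demo=$(mktemp)
printf '%s\n' '1 a hello' '2 b hi' '3 c ahoi' > "$demo"
keep_cols_1_3 "$demo"
cat "$demo"
```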

One way using sed:
sed 's/ [^ ]*//' file
Results:
1 hello
2 hi
3 ahoi

Using Bash:
$ while read -r f1 f2 f3
> do
> echo "$f1 $f3"
> done < file
1 hello
2 hi
3 ahoi

This might work for you (GNU sed):
sed -r 's/\S+\s+//2' file
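\S and \s are GNU extensions, so on other seds the same idea can be written with POSIX bracket expressions and a basic regular expression. A minimal sketch; the function name drop_second_word is hypothetical. The trailing 2 flag replaces only the second "word plus whitespace" match on each line:

```shell
# Hypothetical helper: portable version of sed -r 's/\S+\s+//2'.
drop_second_word() {
  sed 's/[^[:space:]]\{1,\}[[:space:]]\{1,\}//2' "$@"
}

printf '%s\n' '1 a hello' '2 b hi' '3 c ahoi' | drop_second_word
```

This prints 1 hello, 2 hi and 3 ahoi, one per line.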
