Unix - show some characters in file by line number - unix

I have a very huge file and need to look at a few characters in the middle of some huge line.
Is there a way to show easily characters from n1 position to n2 position in line number l in some file?
I think there should be some way to do it with sed, just cannot find corresponding option.

You better use awk:
awk 'NR==line_number {print substr($0,start_position,num_of_characters_to_show)}' file
For example, print 5 characters starting from the 2nd character in the line 2:
$ cat a
1234567890
abcdefghij
$ awk 'NR==2 {print substr($0,2,5)}' a
bcdef
If you really need to use sed, you can use something like:
$ sed -rn '2{s/^.{1}(.{5}).*$/\1/;p}' a
bcdef
This matches 2-1=1 digits after the beginning of the line and then catches 5 to print them back. And all of this is done just in the line 2, so we use -n to prevent the default print of the line.

The elegance of UNIX has always lain in its ability to string together relatively simple programs into pipelines to achieve complexity. You can do a sed-only solution but it's not likely to be as readable as a pipeline.
To that end, you can use a combination of sed to get a specific line and cut to get character positions on that line:
pax> echo '12345
...> abcde
...> fghij' | sed -n 2p | cut -c2-4
bcd
If you just want to use a single tool, awk can do it:
pax> echo '12345
...> abcde
...> fghij' | awk 'NR==2{print substr($0,2,3);exit}'
bcd
So can Perl:
pax> echo '12345
...> abcde
...> fghij' | perl -ne 'if($.==2){print substr($_,1,3); exit}'
In both those cases, it exits after the relevant line to avoid processing the rest of the file.

One solution using only sed, that insert newlines characters just before and after the substring and uses them as flags to remove all content not between them, like:
sed -n '2 { s/.\{5\}/&\n/; s/.\{2\}/&\n/; s/[^\n]*\n//; s/\n.*//; p; q }' infile
Assuming infile like:
1234567890
abcdefghij
It yields:
cde
Not that range is from 2 to 5 but start counting from zero and it excludes the end (so characters 2, 3 and 4). You can handle with it or do some arithmetic just before the command.

Related

Linux - Get Substring from 1st occurence of character

FILE1.TXT
0020220101
or
01 20220101
Need to extra date part from file where text starts from 2
Options tried:
t_FILE_DT1='awk -F"2" '{PRINT $NF}' FILE1.TXT'
t_FILE_DT2='cut -d'2' -f2- FILE1.TXT'
echo "$t_FILE_DT1"
echo "$t_FILE_DT2"
1st output : 0101
2nd output : 0220101
Expected Output: 20220101
Im new to linux scripting. Could some one help guide where Im going wrong?
Use grep like so:
echo "0020220101\n01 20220101" | grep -P -o '\d{8}\b'
20220101
20220101
Here, GNU grep uses the following options:
-P : Use Perl regexes.
-o : Print the matches only (1 match per line), not the entire lines.
SEE ALSO:
grep manual
perlre - Perl regular expressions
Using any awk:
$ awk '{print substr($0,length()-7)}' file
20220101
20220101
The above was run on this input file:
$ cat file
0020220101
01 20220101
Regarding PRINT $NF in your question - PRINT != print. Get out of the habit of using all-caps unless you're writing Cobol. See correct-bash-and-shell-script-variable-capitalization for some reasons.
The 2 in your scripts is telling awka and cut to use the character 2 as the field separator so each will carve up the input into substrings everywhere a 2 occurs.
The 's in your question are single quotes used to make strings literal, you were intending to use backticks, `cmd`, but those are deprecated in favor of $(cmd) anyway.
I would instead of looking for "after" the 2 .. (not having to worry about whether there is a space involved as well) )
Think instead about extracting the last 8 characters, which you know for fact is your date ..
input="/path/to/txt/file/FILE1.TXT"
while IFS= read -r line
do
# read in the last 8 characters of $line .. You KNOW this is the date ..
# No need to worry about exact matching at that point, or spaces ..
myDate=${line: -8}
echo "$myDate"
done < "$input"
About the cut and awk commands that you tried:
Using awk -F"2" '{PRINT $NF}' file will set the field separator to 2, and $NF is the last field, so printing the value of the last field is 0101
Using cut -d'2' -f2- file uses a delimiter of 2 as well, and then print all fields starting at the second field, which is 0220101
If you want to match the 2 followed by 7 digits until the end of the string:
awk '
match ($0, /2[0-9]{7}$/) {
print substr($0, RSTART, RLENGTH)
}
' file
Output
20220101
The accepted answer shows how to extract the first eight digits, but that's not what you asked.
grep -o '2.*' file
will extract from the first occurrence of 2, and
grep -o '2[0-9]*' file
will extract all the digits after every occurrence of 2. If you specifically want eight digits, try
grep -Eo '2[0-9]{7}'
maybe also with a -w option if you want to only accept a match between two word boundaries. If you specifically want only digits after the first occurrence of 2, maybe try
sed -n 's/[^2]*\(2[0-9]*\).*/\1/p' file

Sed/Awk: how to insert text in a series pattern in a line

need to insert '\N' between whereever 2 sequencial commas in the line like below:
"abc,,,,5,,,3.2,,"
to:
"abc,\N,\N,\N,5,\N,\N,3.2,\N,"
Also, the number of the consequencial comma is not fixed, maybe 6, 7 or more. Need a flexible way to handle it.
Didn't find a clear solution from the google.
You can just use the following sed command:
sed 's/,,/,\\N,/g;s/,,/,\\N,/g;'
Demo:
$ echo 'abc,,,,5,,,3.2,,' | sed 's/,,/,\\N,/g;s/,,/,\\N,/g;s/,,/,\\N,/g'
abc,\N,\N,\N,5,\N,\N,3.2,\N,
Explanations:
s/,,/,\\N,/g will replace ,, by ,\N, globally on the string, you will have however to do two passes on the pattern space to be sure that all the replacements took place giving the commands: s/,,/,\\N,/g;s/,,/,\\N,/g;.
Additional notes:
To answer to your doubts about this approach not being flexible, I have prepared the following input file.
$ cat input_comma.txt
abc,,,,5,,,3.2,,
,,,,,,def,
1,,,,,,1.2
6commas,,,,,,
7commas,,,,,,,
As you can see, it does not matter how many successive commas are present in the input:
$ sed 's/,,/,\\N,/g;s/,,/,\\N,/g;s/,,/,\\N,/g' input_comma.txt
abc,\N,\N,\N,5,\N,\N,3.2,\N,
,\N,\N,\N,\N,\N,def,
1,\N,\N,\N,\N,\N,1.2
6commas,\N,\N,\N,\N,\N,
7commas,\N,\N,\N,\N,\N,\N,
With awk a similar approach in 2 passes can be implemented in the same way:
$ echo "test,,,mmm,,,,aa,," | awk '{gsub(/\,\,/,",\\N,");gsub(/\,\,/,",\\N,")} 1'
test,\N,\N,mmm,\N,\N,\N,aa,\N,
Could you please try following once.
awk '{gsub(/\,\,/,",\\N,");gsub(/\,\,/,",\\N,")} 1' Input_file
With perl:
perl -pe '1 while s/,,/,\\N,/g'

How to read from a file which starts with particular string using unix

I have a file which has one particular string which never repeats and all my data starts from this string. My requirement is to read all data beneath this string(say [string-start]) and redirect the data read into another file.
#Krishna Kanth: This command may be helpful . Try it :
sed -e 's/^.*\(search-string\)/\1/' input-file > output-file
#Landys:
I tried using below command but found parsing error for the same.
$ sed -ne 'H;1h;${g;s/.*\START-OF-DATA//g;p}' < file.txt > file.out
sed: 0602-404 Function H;1h;${g;s/.*\START-OF-DATA//g;p} cannot be parsed.
Please suggest!!!
It's easy to achieve it with sed in one line.
sed -ne 'H;1h;${g;s/.*string-start//;p}' input.txt > output.txt
Here's the decomposition.
-ne run the following script with quiet mode.
h/H - copy/append pattern space to hold space, and H will append \n to hold space first.
H;1h - just used for copy all text to hold space, 1 match the first line.
s/.../.../ - used to replace text before string-start as empty, which means delete it.
p - print the current pattern space.
${...} - match the last line.
For example, the input.txt is as follows.
abc
def
ghistring-startjkl
mno
pqr
The output.txt will be as follows.
jkl
mno
pqr

How can I delete the second word of every line of top(1) output?

I have a formatted list of processes (top output) and I'd like to remove unnecessary information. How can I remove for example the second word+whitespace of each line.
Example:
1 a hello
2 b hi
3 c ahoi
Id like to delete a b and c.
You can use cut command.
cut -d' ' -f2 --complement file
--complement does the inverse. i.e. with -f2 second field was choosen. And with --complement if prints all fields except the second. This is useful when you have variable number of fields.
GNU's cut has the option --complement. In case, --complement is not available then, the following does the same:
cut -d' ' -f1,3- file
Meaning: print first field and then print from 3rd to the end i.e. Excludes second field and prints the rest.
Edit:
If you prefer awk you can do: awk {$2=""; print $0}' file
This sets the second to empty and prints the whole line (one-by-one).
Using sed to substitute the second column:
sed -r 's/(\w+\s+)\w+\s+(.*)/\1\2/' file
1 hello
2 hi
3 ahoi
Explanation:
(\w+\s+) # Capture the first word and trailing whitespace
\w+\s+ # Match the second word and trailing whitespace
(.*) # Capture everything else on the line
\1\2 # Replace with the captured groups
Notes: Use the -i option to save the results back to the file, -r is for extended regular expressions, check the man as it could be -E depending on implementation.
Or use awk to only print the specified columns:
$ awk '{print $1, $3}' file
1 hello
2 hi
3 ahoi
Both solutions have there merits, the awk solution is nice for a small fixed number of columns but you need to use a temp file to store the changes awk '{print $1, $3}' file > tmp; mv tmp file where as the sed solution is more flexible as columns aren't an issue and the -i option does the edit in place.
One way using sed:
sed 's/ [^ ]*//' file
Results:
1 hello
2 hi
3 ahoi
Using Bash:
$ while read f1 f2 f3
> do
> echo $f1 $f3
> done < file
1 hello
2 hi
3 ahoi
This might work for you (GNU sed):
sed -r 's/\S+\s+//2' file

Delete specific line number(s) from a text file using sed?

I want to delete one or more specific line numbers from a file. How would I do this using sed?
If you want to delete lines from 5 through 10 and line 12th:
sed -e '5,10d;12d' file
This will print the results to the screen. If you want to save the results to the same file:
sed -i.bak -e '5,10d;12d' file
This will store the unmodified file as file.bak, and delete the given lines.
Note: Line numbers start at 1. The first line of the file is 1, not 0.
You can delete a particular single line with its line number by
sed -i '33d' file
This will delete the line on 33 line number and save the updated file.
and awk as well
awk 'NR!~/^(5|10|25)$/' file
$ cat foo
1
2
3
4
5
$ sed -e '2d;4d' foo
1
3
5
$
This is very often a symptom of an antipattern. The tool which produced the line numbers may well be replaced with one which deletes the lines right away. For example;
grep -nh error logfile | cut -d: -f1 | deletelines logfile
(where deletelines is the utility you are imagining you need) is the same as
grep -v error logfile
Having said that, if you are in a situation where you genuinely need to perform this task, you can generate a simple sed script from the file of line numbers. Humorously (but perhaps slightly confusingly) you can do this with sed.
sed 's%$%d%' linenumbers
This accepts a file of line numbers, one per line, and produces, on standard output, the same line numbers with d appended after each. This is a valid sed script, which we can save to a file, or (on some platforms) pipe to another sed instance:
sed 's%$%d%' linenumbers | sed -f - logfile
On some platforms, sed -f does not understand the option argument - to mean standard input, so you have to redirect the script to a temporary file, and clean it up when you are done, or maybe replace the lone dash with /dev/stdin or /proc/$pid/fd/1 if your OS (or shell) has that.
As always, you can add -i before the -f option to have sed edit the target file in place, instead of producing the result on standard output. On *BSDish platforms (including OSX) you need to supply an explicit argument to -i as well; a common idiom is to supply an empty argument; -i ''.
The shortest, deleting the first line in sed
sed -i '1d' file
As Brian states here, <address><command> is used, <address> is <1> and <command> <d>.
I would like to propose a generalization with awk.
When the file is made by blocks of a fixed size
and the lines to delete are repeated for each block,
awk can work fine in such a way
awk '{nl=((NR-1)%2000)+1; if ( (nl<714) || ((nl>1025)&&(nl<1029)) ) print $0}'
OriginFile.dat > MyOutputCuttedFile.dat
In this example the size for the block is 2000 and I want to print the lines [1..713] and [1026..1029].
NR is the variable used by awk to store the current line number.
% gives the remainder (or modulus) of the division of two integers;
nl=((NR-1)%BLOCKSIZE)+1 Here we write in the variable nl the line number inside the current block. (see below)
|| and && are the logical operator OR and AND.
print $0 writes the full line
Why ((NR-1)%BLOCKSIZE)+1:
(NR-1) We need a shift of one because 1%3=1, 2%3=2, but 3%3=0.
+1 We add again 1 because we want to restore the desired order.
+-----+------+----------+------------+
| NR | NR%3 | (NR-1)%3 | (NR-1)%3+1 |
+-----+------+----------+------------+
| 1 | 1 | 0 | 1 |
| 2 | 2 | 1 | 2 |
| 3 | 0 | 2 | 3 |
| 4 | 1 | 0 | 1 |
+-----+------+----------+------------+
cat -b /etc/passwd | sed -E 's/^( )+(<line_number>)(\t)(.*)/--removed---/g;s/^( )+([0-9]+)(\t)//g'
cat -b -> print lines with numbers
s/^( )+(<line_number>)(\t)(.*)//g -> replace line number to null (remove line)
s/^( )+([0-9]+)(\t)//g #remove numbers the cat printed

Resources