How do I extract lines from a file using their line number on unix?

Using sed or similar how would you extract lines from a file? If I wanted lines 1, 5, 1010, 20503 from a file, how would I get these 4 lines?
What if I have a fairly large number of lines I need to extract?
If I had a file with 100 lines, each representing a line number that I wanted to extract from another file, how would I do that?

Something like "sed -n '1p;5p;1010p;20503p' file". Run "man sed" for details.
For your second question, I'd transform the input file into a bunch of sed(1) commands to print the lines I wanted.
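A minimal sketch of that second idea, assuming the line numbers live in a file called line_num_file and the data in data_file (both names and the generated sample data are made up for illustration): turn every number N into the sed command Np, save the result as a script, and hand it to sed with -f. Because the commands are read from a file, the command-line length limit does not come into play.

```shell
# Throwaway working directory with sample data (hypothetical file names).
tmpdir=$(mktemp -d)
awk 'BEGIN { for (i = 1; i <= 30000; i++) print "row " i }' > "$tmpdir/data_file"
printf '%s\n' 1 5 1010 20503 > "$tmpdir/line_num_file"

# Append "p" to every line number, giving one sed print command per line.
sed 's/$/p/' "$tmpdir/line_num_file" > "$tmpdir/extract.sed"

# -n suppresses the default output; -f reads the commands from the script file.
result=$(sed -n -f "$tmpdir/extract.sed" "$tmpdir/data_file")
printf '%s\n' "$result"

rm -rf "$tmpdir"
```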

with awk it's as simple as:
awk 'NR==1 || NR==5 || NR==1010' "file"

@OP, you can do this more easily and more efficiently with awk. So for your first question:
awk 'NR~/^(1|2|5|1010)$/{print}' file
for 2nd question
awk 'FNR==NR{a[$1];next}(FNR in a){print}' file_with_linenr file
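To see the two-file command in action, here is a small sketch on generated data (the sample contents are an assumption for illustration; the file names follow the command above). The line numbers are deliberately listed out of order to show that the output follows the order of the data file, not the order of the list.

```shell
tmpdir=$(mktemp -d)
awk 'BEGIN { for (i = 1; i <= 2000; i++) print "row " i }' > "$tmpdir/file"
printf '%s\n' 1010 5 1 > "$tmpdir/file_with_linenr"

# First file: remember each wanted line number as an array key.
# Second file: print a line when its number (FNR) is one of those keys.
picked=$(awk 'FNR == NR { want[$1]; next } (FNR in want)' \
    "$tmpdir/file_with_linenr" "$tmpdir/file")
printf '%s\n' "$picked"

rm -rf "$tmpdir"
```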

This ain't pretty and it could exceed command length limits under some circumstances*:
sed -n "$(while read a; do echo "${a}p;"; done < line_num_file)" data_file
Or its much slower but more attractive, and possibly more well-behaved, sibling:
while read a; do echo "${a}p;"; done < line_num_file | xargs -I{} sed -n \{\} data_file
A variation:
xargs -a line_num_file -I{} sed -n \{\}p\; data_file
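You can speed up the xargs versions a little bit by adding the -P option with some large argument like, say, 83 or maybe 419 or even 1177, but 10 seems as good as any. Note that with -P the extracted lines may no longer come out in the order of the line-number file.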
*xargs --show-limits </dev/null can be instructive

I'd investigate Perl, since it has the regexp facilities of sed plus the programming model surrounding it to allow you to read a file line by line, count the lines and extract according to what you want (including from a file of line numbers).
my $row = 1;
while (<STDIN>) {
    # The current line is in $_; check $row against a suitable list here.
    $row++;
}

In Perl:
perl -ne 'print if $. =~ m/^(1|5|1010|20503)$/' file

Related

How can I identify lines from a delimited file, based on a lookup file in unix

Assume that there are two files
File1 - lookup.txt
CAN
USD
INR
EUR
Another file Input.txt
1~Canada~CAN
2~United States of America~USD
3~Brazil~BRL
Both files may be very large, hypothetically several thousand records each. I'm trying to pick out the records in Input.txt whose code appears in the lookup file.
The expected output should be
1~Canada~CAN
2~United States of America~USD
I tried to do something like below
#!/bin/sh
lookupFile=$1 #lookup.txt
inputFile=$2 #input.txt
outputFile=$3 #output.txt
while IFS= read -r line
do
awk -F'~' '{if ($3==$line) print >> $outputFile}' $inputFile
done < "$lookupFile"
But I'm getting error like
awk: cmd. line:1: (FILENAME=input.txt FNR=2) fatal: can't redirect to
How can I fix this issue? Also, if the files are really huge, with several thousand records to search, is this an efficient way?
With your shown samples, please try the following awk code. This can be done in a single awk invocation; we just need to set the field separator to ~ before input.txt is read.
awk 'FNR==NR{arr[$0];next} ($3 in arr)' lookup.txt FS="~" input.txt
Explanation:
awk ' ##starting awk program from here.
FNR==NR{ ##Checking condition which will be TRUE when lookup.txt is being read.
arr[$0] ##Creating array arr with $0 as index.
next ##next to skip all further statements from here.
}
($3 in arr) ##If $3 is present in arr then print that line.
' lookup.txt FS="~" input.txt ##Mentioning Input_files and setting FS to ~ before input.txt
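As for the error in the original script: inside the single-quoted awk program, $line and $outputFile are awk expressions, not shell variables, so awk ends up redirecting to an empty file name. A minimal sketch of a repaired loop, assuming the sample files from the question, passes the value in with -v and lets the shell do the redirection. It still rescans input.txt once per lookup line, which is why the single-pass command above is the better choice for large files.

```shell
tmpdir=$(mktemp -d)
printf '%s\n' CAN USD INR EUR > "$tmpdir/lookup.txt"
printf '%s\n' '1~Canada~CAN' '2~United States of America~USD' '3~Brazil~BRL' \
    > "$tmpdir/input.txt"
outputFile="$tmpdir/output.txt"
: > "$outputFile"                      # start with an empty output file

while IFS= read -r line; do
    # -v copies the shell value into the awk variable "code".
    awk -F'~' -v code="$line" '$3 == code' "$tmpdir/input.txt" >> "$outputFile"
done < "$tmpdir/lookup.txt"

result=$(cat "$outputFile")
printf '%s\n' "$result"
rm -rf "$tmpdir"
```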
A non-awk solution that you could compare with on the performance point of view:
$ grep -wFf lookup.txt input.txt
1~Canada~CAN
2~United States of America~USD
Warning: this does not match only on the last word. So if some values in lookup.txt can also be found elsewhere in input.txt, prefer another solution. Or, if it contains nothing that could be interpreted as a regular expression operator, preprocess lookup.txt before grep. Example with bash, sed and grep:
$ grep -f <( sed 's/.*/~&$/' lookup.txt ) input.txt
1~Canada~CAN
2~United States of America~USD

How to find a pattern using sed?

How can I combine multiple filters using sed?
Here's my data set
sex,city,age
male,london,32
male,manchester,32
male,oxford,64
female,oxford,23
female,london,33
male,oxford,45
I want to identify all lines which contain MALE AND OXFORD. Here's my approach:
sed -n '/male/,/oxford/p' file
Thanks
You can associate a block with the first check and put the second in there. For example:
sed -n '/male/ { /oxford/ p; }' file
Or invert the check and action:
sed '/male/!d; /oxford/!d' file
However, since (as @Jotne points out) lines that contain female also contain male and you probably don't want to match them, the patterns should at least be amended to contain word boundaries:
sed -n '/\<male\>/ { /\<oxford\>/ p; }' file
sed '/\<male\>/!d; /\<oxford\>/!d' file
But since that looks like comma-separated data and the check is probably not meant to test whether someone went to male university, it would probably be best to use a stricter check with awk:
awk -F, '$1 == "male" && $2 == "oxford"' file
This checks not only if a line contains male and oxford but also if they are in the appropriate fields. The same can be achieved, somewhat less prettily, with sed by using
sed '/^male,oxford,/!d' file
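To make the difference concrete, this sketch runs the plain substring check and the field-based check against the data set from the question (the temporary directory is only there to keep the example self-contained):

```shell
tmpdir=$(mktemp -d)
cat > "$tmpdir/file" <<'EOF'
sex,city,age
male,london,32
male,manchester,32
male,oxford,64
female,oxford,23
female,london,33
male,oxford,45
EOF

# Substring match: "female" contains "male", so the female,oxford row slips in.
loose=$(sed -n '/male/ { /oxford/ p; }' "$tmpdir/file")

# Field match: only rows whose first field is exactly "male" qualify.
strict=$(awk -F, '$1 == "male" && $2 == "oxford"' "$tmpdir/file")

printf 'loose:\n%s\nstrict:\n%s\n' "$loose" "$strict"
rm -rf "$tmpdir"
```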
A single sed command can be used to solve this. Let's look at two variations of using sed:
$ sed -e 's/^\(male,oxford,.*\)$/\1/;t;d' file
male,oxford,64
male,oxford,45
$ sed -e 's/^male,oxford,\(.*\)$/\1/;t;d' file
64
45
Both have essentially the same regex:
^male,oxford,.*$
The interesting features are the capture group placement (either the whole line or just the age portion) and the use of ;t;d to discard non matching lines.
By doing it this way, we can avoid the requirement of using awk or grep to solve this problem.
You can use awk
awk -F, '/\<male\>/ && /\<oxford\>/' file
male,oxford,64
male,oxford,45
It uses the word anchor to prevent hit on female.

Efficient way to add two lines at the beginning of a very large file

I have a group of very large (a couple of GB's each) text files. I need to add two lines at the beginning of each of these files.
I tried using sed with the following command
sed -i '1iFirstLine'
sed -i '2iSecondLine'
The problem with sed is that it loops through the entire file, even though it only has to add two lines at the beginning, and therefore it takes a lot of time.
Is there an alternate way to do this more efficiently, without reading the entire file?
You should try
echo "FirstLine" > newfile.txt
echo "SecondLine" >> newfile.txt
cat oldfile.txt >> newfile.txt
mv newfile.txt oldfile.txt
This one works perfectly and is extremely fast too.
perl -pi -e '$.=0 if eof;print "first line\nsecond line\n" if ($.==1)' *.txt
Adding at the beginning is not possible without file rewrite (contrary to appending to the end). You simply cannot "shift" file content as no filesystem supports that. So you should do:
echo -e "line 1\nLine2" > tmp.txt
cat tmp.txt oldbigfile.txt > newbigfile.txt
rm oldbigfile.txt
mv newbigfile.txt oldbigfile.txt
Note you need enough diskspace to hold both files for a while.
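Since there is a whole group of files, the rewrite can be wrapped in a small function and called in a loop. This is only a sketch: prepend_two_lines is a made-up helper name, and the two header lines are the placeholders from the question. The file is still copied once, and the replaced file takes on the temporary file's permissions.

```shell
# Hypothetical helper: prepend two fixed lines to the file named in $1.
prepend_two_lines() {
    target=$1
    # Create the temporary file next to the target so that mv is a rename
    # on the same filesystem rather than a second copy.
    tmp=$(mktemp "${target}.XXXXXX") || return 1
    { printf '%s\n' "FirstLine" "SecondLine"; cat "$target"; } > "$tmp" &&
        mv "$tmp" "$target"
}

# Small demonstration on a throwaway file.
demo_dir=$(mktemp -d)
printf '%s\n' "old line 1" "old line 2" > "$demo_dir/big.txt"
prepend_two_lines "$demo_dir/big.txt"
demo_result=$(cat "$demo_dir/big.txt")
printf '%s\n' "$demo_result"
rm -rf "$demo_dir"
```

For the real files, something like for f in *.txt; do prepend_two_lines "$f"; done would apply it to each one in turn.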

Unix command to prepend text to a file

Is there a Unix command to prepend some string data to a text file?
Something like:
prepend "to be prepended" text.txt
printf '%s\n%s\n' "to be prepended" "$(cat text.txt)" >text.txt
sed -i.old '1s;^;to be prepended;' inFile
-i writes the change in place and takes a backup if an extension is given (in this case, .old).
1s;^;to be prepended; substitutes the beginning of the first line by the given replacement string, using ; as a command delimiter.
Process Substitution
I'm surprised no one mentioned this.
cat <(echo "before") text.txt > newfile.txt
which is arguably more natural than the accepted answer (printing something and piping it into a substitution command is lexicographically counter-intuitive).
...and hijacking what ryan said above, with sponge you don't need a temporary file:
sudo apt-get install moreutils
cat <(echo "to be prepended") text.txt | sponge text.txt
EDIT: Looks like this doesn't work in Bourne Shell /bin/sh
Here String (zsh only)
Using a here-string - <<<, you can do:
<<< "to be prepended" < text.txt | sponge text.txt
This is one possibility:
(echo "to be prepended"; cat text.txt) > newfile.txt
you'll probably not easily get around an intermediate file.
Alternatives (can be cumbersome with shell escaping):
sed -i '0,/^/s//to be prepended/' text.txt
If it's acceptable to replace the input file:
Note:
Doing so may have unexpected side effects, notably potentially replacing a symlink with a regular file, ending up with different permissions on the file, and changing the file's creation (birth) date.
sed -i, as in Prince John Wesley's answer, tries to at least restore the original permissions, but the other limitations apply as well.
Here's a simple alternative that uses a temporary file (it avoids reading the whole input file into memory the way that shime's solution does):
{ printf 'to be prepended'; cat text.txt; } > tmp.txt && mv tmp.txt text.txt
Using a group command ({ ...; ...; }) is slightly more efficient than using a subshell ((...; ...)), as in 0xC0000022L's solution.
The advantages are:
It's easy to control whether the new text should be directly prepended to the first line or whether it should be inserted as new line(s) (simply append \n to the printf argument).
Unlike the sed solution, it works if the input file is empty (0 bytes).
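The empty-file case is easy to check. In this sketch (the temporary directory is an assumption to keep it self-contained), sed has no first line to apply 1s to and therefore prints nothing, while the group command still writes the new text:

```shell
tmpdir=$(mktemp -d)
: > "$tmpdir/text.txt"                 # create a 0-byte input file

# sed never sees a line 1, so the substitution never runs.
sed_out=$(sed '1s/^/to be prepended/' "$tmpdir/text.txt")

# The group command does not depend on the input having any lines.
{ printf 'to be prepended'; cat "$tmpdir/text.txt"; } > "$tmpdir/tmp.txt" &&
    mv "$tmpdir/tmp.txt" "$tmpdir/text.txt"
group_out=$(cat "$tmpdir/text.txt")

printf 'sed: [%s]\ngroup command: [%s]\n' "$sed_out" "$group_out"
rm -rf "$tmpdir"
```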
The sed solution can be simplified if the intent is to prepend one or more whole lines to the existing content (assuming the input file is non-empty):
sed's i function inserts whole lines:
With GNU sed:
# Prepends 'to be prepended' *followed by a newline*, i.e. inserts a new line.
# To prepend multiple lines, use '\n' as part of the text.
# -i.old creates a backup of the input file with extension '.old'
sed -i.old '1 i\to be prepended' inFile
A portable variant that also works with macOS / BSD sed:
# Prepends 'to be prepended' *followed by a newline*
# To prepend multiple lines, escape the ends of intermediate
# lines with '\'
sed -i.old -e '1 i\
to be prepended' inFile
Note that the literal newline after the \ is required.
If the input file must be edited in place (preserving its inode with all its attributes):
Using the venerable ed POSIX utility:
Note:
ed invariably reads the input file as a whole into memory first.
To prepend directly to the first line (as with sed, this won't work if the input file is completely empty (0 bytes)):
ed -s text.txt <<EOF
1 s/^/to be prepended/
w
EOF
-s suppresses ed's status messages.
Note how the commands are provided to ed as a multi-line here-document (<<EOF\n...\nEOF), i.e., via stdin; by default string expansion is performed in such documents (shell variables are interpolated); quote the opening delimiter to suppress that (e.g., <<'EOF').
1 makes the 1st line the current line
function s performs a regex-based string substitution on the current line, as in sed; you may include literal newlines in the substitution text, but they must be \-escaped.
w writes the result back to the input file (for testing, replace w with ,p to only print the result, without modifying the input file).
To prepend one or more whole lines:
As with sed, the i function invariably adds a trailing newline to the text to be inserted.
ed -s text.txt <<EOF
0 i
line 1
line 2
.
w
EOF
0 i makes 0 (the beginning of the file) the current line and starts insert mode (i); note that line numbers are otherwise 1-based.
The following lines are the text to insert before the current line, terminated with . on its own line.
This will work to form the output. The - means standard input, which is provided via the pipe from echo.
echo -e "to be prepended \n another line" | cat - text.txt
To rewrite the file, a temporary file is required, as you cannot pipe back into the input file.
echo "to be prepended" | cat - text.txt > text.txt.tmp
mv text.txt.tmp text.txt
Prefer Adam's answer
We can make this easier by using sponge. Now we don't need to create a temporary file and rename it:
echo -e "to be prepended \n another line" | cat - text.txt | sponge text.txt
Probably nothing built-in, but you could write your own pretty easily, like this:
#!/bin/bash
echo -n "$1" > /tmp/tmpfile.$$
cat "$2" >> /tmp/tmpfile.$$
mv /tmp/tmpfile.$$ "$2"
Something like that at least...
Editor's note:
This command will result in data loss if the input file happens to be larger than your system's pipeline buffer size, which is typically 64 KB nowadays. See the comments for details.
In some circumstances the text to prepend may be available only from stdin.
Then this combination shall work.
echo "to be prepended" | cat - text.txt | tee text.txt
If you want to omit tee output, then append > /dev/null.
Another way using sed:
sed -i.old '1 {i to be prepended
}' inFile
If the line to be prepended is multiline:
sed -i.old '1 {i\
to be prepended\
multiline
}' inFile
Solution:
printf '%s\n%s' 'text to prepend' "$(cat file.txt)" > file.txt
Note that this is safe on all kinds of input, because there are no expansions. For example, if you want to prepend !@#$%^&*()ugly text\n\t\n, it will just work:
printf '%s\n%s' '!@#$%^&*()ugly text\n\t\n' "$(cat file.txt)" > file.txt
The last part left for consideration is whitespace removal at end of file during command substitution "$(cat file.txt)". All work-arounds for this are relatively complex. If you want to preserve newlines at end of file.txt, see this: https://stackoverflow.com/a/22607352/1091436
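The stripping is easy to observe by comparing byte counts before and after a round trip through command substitution. A small sketch, using a made-up sample file:

```shell
tmpdir=$(mktemp -d)
printf 'body\n\n\n' > "$tmpdir/file.txt"        # "body" plus three newlines

# $(( ... )) strips the padding that some wc implementations print.
before=$(( $(wc -c < "$tmpdir/file.txt") ))

# Command substitution drops every trailing newline.
content=$(cat "$tmpdir/file.txt")
printf '%s' "$content" > "$tmpdir/copy.txt"
after=$(( $(wc -c < "$tmpdir/copy.txt") ))

printf 'bytes before: %d, bytes after: %d\n' "$before" "$after"
rm -rf "$tmpdir"
```

Only newlines at the very end are lost; spaces and newlines in the middle of the file survive.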
As tested in Bash (in Ubuntu), if starting with a test file via;
echo "Original Line" > test_file.txt
you can execute;
echo "$(echo "New Line"; cat test_file.txt)" > test_file.txt
or, if the version of bash is too old for $(), you can use backticks;
echo "`echo "New Line"; cat test_file.txt`" > test_file.txt
and receive the following contents of "test_file.txt";
New Line
Original Line
No intermediary file, just bash/echo.
Another fairly straight forward solution is:
$ echo -e "string\n$(cat file)"
% echo blaha > blaha
% echo fizz > fizz
% cat blaha fizz > buzz
% cat buzz
blaha
fizz
You can do that easily with awk
cat text.txt|awk '{print "to be prepended"$0}'
It seems like the question is about prepending a string to the file not each line of the file, in this case as suggested by Tom Ekberg the following command should be used instead.
awk 'BEGIN{print "to be prepended"} {print $0}' text.txt
If you like vi/vim, this may be more your style.
printf '0i\n%s\n.\nwq\n' prepend-text | ed file
For future readers who want to prepend one or more lines of text (with variables or even subshell code) and keep it readable and formatted, you may enjoy this:
echo "Lonely string" > my-file.txt
Then run
cat <<EOF > my-file.txt
Hello, there!
$(cat my-file.txt)
EOF
Results of cat my-file.txt:
Hello, there!
Lonely string
This works because the read of my-file.txt happens first and in a subshell. I use this trick all the time to prepend important rules to config files in Docker containers rather than copy over entire config files.
you can use variables
Even though a bunch of answers here work pretty well, I want to contribute this one-liner, just for completeness. At least it is easy to keep in mind and maybe contributes to some general understanding of bash for some people.
PREPEND="new line 1"; FILE="text.txt"; printf "${PREPEND}\n`cat $FILE`" > $FILE
In this snippet, just replace text.txt with the text file you want to prepend to and new line 1 with the text to prepend.
example
$ printf "old line 1\nold line 2" > text.txt
$ cat text.txt; echo ""
old line 1
old line 2
$ PREPEND="new line 1"; FILE="text.txt"; printf "${PREPEND}\n`cat $FILE`" > $FILE
$ cat text.txt; echo ""
new line 1
old line 1
old line 2
$
# create a file with content..
echo foo > /tmp/foo
# prepend a line containing "jim" to the file
sed -i "1s/^/jim\n/" /tmp/foo
# verify the content of the file has the new line prepended to it
cat /tmp/foo
I'd recommend defining a function and then importing and using that where needed.
prepend_to_file() {
    file=$1
    text=$2
    # Create the file first if it does not exist yet.
    if ! [[ -f $file ]]; then
        touch "$file"
    fi
    echo "$text" | cat - "$file" > "$file.new"
    mv -f "$file.new" "$file"
}
Then use it like so:
prepend_to_file test.txt "This is first"
prepend_to_file test.txt "This is second"
Your file contents will then be:
This is second
This is first
I'm about to use this approach for implementing a change log updater.
With ex,
ex - "$file" << PREPEND
1
i
prepended text
.
wq
PREPEND
The ex commands are
1 Go to the first line of the file
i Begin insert mode (text goes in before the current line)
. End insert mode
wq Save (write) and quit

Interpret as fixed string/literal and not regex using sed

For grep there's a fixed string option, -F (fgrep) to turn off regex interpretation of the search string.
Is there a similar facility for sed? I couldn't find anything in the man. A recommendation of another gnu/linux tool would also be fine.
I'm using sed for the find and replace functionality: sed -i "s/abc/def/g"
Do you have to use sed? If you're writing a bash script, you can do
#!/bin/bash
pattern='abc'
replace='def'
file=/path/to/file
tmpfile="${TMPDIR:-/tmp}/$( basename "$file" ).$$"
while IFS= read -r line
do
# Quoting $pattern makes bash match it literally instead of as a glob.
printf '%s\n' "${line//"$pattern"/$replace}"
done < "$file" > "$tmpfile" && mv "$tmpfile" "$file"
With an older Bourne shell (such as ksh88 or POSIX sh), you may not have that cool ${var/pattern/replace} structure, but you do have ${var#pattern} and ${var%pattern}, which can be used to split the string up and then reassemble it. If you need to do that, you're in for a lot more code - but it's really not too bad.
If you're not in a shell script already, you could pretty easily make the pattern, replace, and filename parameters and just call this. :)
PS: The ${TMPDIR:-/tmp} structure uses $TMPDIR if that's set in your environment, or uses /tmp if the variable isn't set. I like to stick the PID of the current process on the end of the filename in the hopes that it'll be slightly more unique. You should probably use mktemp or similar in the "real world", but this is ok for a quick example, and the mktemp binary isn't always available.
Option 1) Escape regexp characters. E.g. sed 's/\$0\.0/0/g' will replace all occurrences of $0.0 with 0.
Option 2) Use perl -p -e in conjunction with quotemeta (\Q...\E). E.g. perl -p -e 's/\Q.\E/,/g' will replace all occurrences of . with ,.
You can use option 2 in scripts like this:
SEARCH="C++"
REPLACE="C#"
cat $FILELIST | perl -p -e "s/\\Q$SEARCH\\E/$REPLACE/g" > $NEWLIST
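Option 1 can also be automated. The following is a sketch rather than a complete solution: it assumes a single-line search string and basic (non-extended) regex syntax, and the variable names and sample text are made up. It backslash-escapes the characters that are special in a basic regular expression, plus the / delimiter, before building the s command, and does the same for \, / and & in the replacement.

```shell
pattern='$0.0'
replacement='0'

# Escape ] [ \ / . * ^ $ in the search string. The helper sed commands use |
# as their delimiter so that / can sit inside the bracket expression.
escaped_pattern=$(printf '%s\n' "$pattern" | sed 's|[][\/.*^$]|\\&|g')

# In the replacement part only \ / and & are special.
escaped_replacement=$(printf '%s\n' "$replacement" | sed 's|[\/&]|\\&|g')

result=$(printf '%s\n' 'price $0.0, not $000' |
    sed "s/$escaped_pattern/$escaped_replacement/g")
printf '%s\n' "$result"
```

Without the escaping, the . in the search string would match any character and the $ would not be treated reliably as a literal dollar sign.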
If you're not opposed to Ruby or long lines, you could use this:
alias replace='ruby -e "File.write(ARGV[0], File.read(ARGV[0]).gsub(ARGV[1]) { ARGV[2] })"'
replace test3.txt abc def
This loads the whole file into memory, performs the replacements and saves it back to disk. Should probably not be used for massive files.
If you don't want to escape your string, you can reach your goal in 2 steps:
fgrep the line (getting the line number) you want to replace, and
afterwards use sed for replacing this line.
E.g.
#!/bin/sh
PATTERN='foo*[)*abc' # we need it literal
LINENUMBER="$( fgrep -n "$PATTERN" "$FILE" | cut -d':' -f1 )"
NEWSTRING='my new string'
sed -i "${LINENUMBER}s/.*/$NEWSTRING/" "$FILE"
You can do this in two lines of bash code if you're OK with reading the whole file into memory. This is quite flexible -- the pattern and replacement can contain newlines to match across lines if needed. It also preserves any trailing newline or lack thereof, which a simple loop with read does not.
mapfile -d '' < file
printf '%s' "${MAPFILE//"$pat"/"$rep"}" > file
For completeness, if the file can contain null bytes (\0), we need to extend the above, and it becomes
mapfile -d '' < <(cat file; printf '\0')
last=${MAPFILE[-1]}; unset "MAPFILE[-1]"
printf '%s\0' "${MAPFILE[@]//"$pat"/"$rep"}" > file
printf '%s' "${last//"$pat"/"$rep"}" >> file
perl -i.orig -pse 'while (($i = index($_,$s)) >= 0) { substr($_,$i,length($s), $r)}' -- \
-s='$_REQUEST['\'old\'']' -r='$_REQUEST['\'new\'']' sample.txt
-i.orig in-place modification with backup.
-p print lines from the input file by default
-s enable rudimentary parsing of command line arguments
-e run this script
index($_,$s) search for the $s string
substr($_,$i,length($s), $r) replace the string
while (($i = index($_,$s)) >= 0) repeat until
-- end of perl parameters
-s='$_REQUEST['\'old\'']', -r='$_REQUEST['\'new\'']' - set $s,$r
You still need to "escape" ' chars but the rest should be straight forward.
Note: this started as an answer to How to pass special character string to sed hence the $_REQUEST['old'] strings, however this question is a bit more appropriately formulated.
You should be using replace instead of sed.
From the man page:
The replace utility program changes strings in place in files or on the
standard input.
Invoke replace in one of the following ways:
shell> replace from to [from to] ... -- file_name [file_name] ...
shell> replace from to [from to] ... < file_name
from represents a string to look for and to represents its replacement.
There can be one or more pairs of strings.
