Efficient way to add two lines at the beginning of a very large file - unix

I have a group of very large (a couple of GB's each) text files. I need to add two lines at the beginning of each of these files.
I tried using sed with the following commands:
sed -i '1iFirstLine' file.txt
sed -i '2iSecondLine' file.txt
The problem with sed is that it loops through the entire file even though it only has to add two lines at the beginning, and therefore it takes a lot of time.
Is there an alternate way to do this more efficiently, without reading the entire file?

You should try
echo "1iFirstLine" > newfile.txt
echo "2iSecondLine" >> newfile.txt
cat oldfile.txt >> newfile.txt
mv newfile.txt oldfile.txt

This works perfectly and is extremely fast too. (The $.=0 if eof resets Perl's line counter at the end of each file, so the two lines are inserted at the top of every file matched by *.txt, not just the first.)
perl -pi -e '$.=0 if eof;print "first line\nsecond line\n" if ($.==1)' *.txt

Adding at the beginning is not possible without rewriting the file (contrary to appending to the end). You simply cannot "shift" file content, as no filesystem supports that. So you should do:
echo -e "line 1\nline 2" > tmp.txt
cat tmp.txt oldbigfile.txt > newbigfile.txt
rm oldbigfile.txt
mv newbigfile.txt oldbigfile.txt
Note you need enough disk space to hold both files for a while.
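If you'd rather skip the intermediate header file, a minimal sketch of the same idea (filenames are placeholders) groups the commands so a single redirection captures both:
{ printf 'FirstLine\nSecondLine\n'; cat oldbigfile.txt; } > newbigfile.txt && mv newbigfile.txt oldbigfile.txt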

Related

How to cat using part of a filename in terminal?

I'm using terminal on OS 10.X. I have some data files of the format:
mbh5.0_mrg4.54545454545_period0.000722172513951.params.dat
mbh5.0_mrg4.54545454545_period0.00077271543854.params.dat
mbh5.0_mrg4.59090909091_period-0.000355232058085.params.dat
mbh5.0_mrg4.59090909091_period-0.000402015664015.params.dat
I know that there will be some files with similar numbers after mbh and mrg, but I won't know ahead of time what the numbers will be or how many similarly numbered ones there will be. My goal is to cat all the data from all the files with similar numbers after mbh and mrg into one data file. So from the above I would want to do something like...
cat mbh5.0_mrg4.54545454545*dat > mbh5.0_mrg4.54545454545.dat
cat mbh5.0_mrg4.5909090909*dat > mbh5.0_mrg4.5909090909.dat
I want to automate this process because there will be many such files.
What would be the best way to do this? I've been looking into sed, but I don't have a solution yet.
for file in *.params.dat; do
    prefix=${file%_*}
    cat "$file" >> "$prefix.dat"
done
The expansion ${file%_*} removes the last underscore and everything after it from the end of $file and saves the result in the prefix variable. (Ref: http://www.gnu.org/software/bash/manual/bashref.html#Shell-Parameter-Expansion)
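For example, with one of the filenames from the question:
file=mbh5.0_mrg4.54545454545_period0.000722172513951.params.dat
echo "${file%_*}"    # prints mbh5.0_mrg4.54545454545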
It's not 100% clear to me what you're trying to achieve here, but if you want to aggregate files into a file with the same number after "mbh5.0_mrg4." then you can do the following.
ls -l mbh5.0_mrg4* | awk '{print "cat " $9 " >> mbh5.0_mrg4." substr($9,13,11) ".dat" }' | /bin/bash
The "ls -l" lists the files and the awk takes the 9th column (the filename) from the result of the ls. With some string concatenation the result is passed to /bin/bash to be executed. Note the ">>": with a plain ">", each cat would overwrite the output file instead of aggregating into it.
This is a Linux bash script, so it assumes you have /bin/bash; I'm not 100% familiar with OS X. This script also assumes that the number you're grouping on is always in the same place in the filename (substr($9,13,11) picks out the 11 digits after "mbh5.0_mrg4."). I think you can change /bin/bash to almost any shell you have installed.

Changing a line in a file on UNIX

I have created a file named "asd.txt" on a UNIX-based system.
I added four lines using the echo command.
Now, I would like to change the first line of this file.
I am not allowed to use any text editors, such as vi.
I have to do this using only the command line. Can anyone help?
Thanks.
Here is how you could do it with sed.
sed '1 s/search/replace/' asd.txt
If you are feeling up to it and have GNU sed, use the -i switch to do it in place.
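For example (GNU sed; use a suffix such as -i.bak if you want the original kept as a backup):
sed -i '1 s/search/replace/' asd.txt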
If you want to replace the entire first line, how about doing this?
echo "Here is my new first line" && sed '1d' asd.txt
For both of these commands you can redirect the output to a new file using the > operator.
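Note that for the echo-and-sed pair a plain > after the sed would capture only sed's output, so group the two commands; a sketch:
{ echo "Here is my new first line"; sed '1d' asd.txt; } > asd.txt.new && mv asd.txt.new asd.txt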
#!/bin/bash
# Print a replacement first line, then everything after line 1 of foo.txt.
cat <(echo "Replacement") <(tail -n +2 foo.txt)

In-place processing with grep

I've got a script that calls grep to process a text file. Currently I am doing something like this.
$ grep 'SomeRegEx' myfile.txt > myfile.txt.temp
$ mv myfile.txt.temp myfile.txt
I'm wondering if there is any way to do in-place processing, as in store the results to the same original file without having to create a temporary file and then replace the original with the temp file when processing is done.
Of course I welcome comments as to why this should or should not be done, but I'm mainly interested in whether it can be done. In this example I'm using grep, but I'm interested about Unix tools in general. Thanks!
sponge (in the moreutils package on Debian/Ubuntu) reads its entire input until EOF before writing it to the file, so you can grep a file and write the result back to itself without truncating it first.
Like this:
grep 'pattern' file | sponge file
Perl has the -i switch, and so do sed and Ruby:
sed -i.bak -n '/SomeRegex/p' file
ruby -i.bak -ne 'print if /SomeRegex/' file
But note that all these ever do is create "temp" files behind the scenes that you just don't see, that's all.
Other ways, besides grep
awk
awk '/someRegex/' file > t && mv t file
bash
while read -r line; do case "$line" in *someregex*) echo "$line";; esac; done < file > t && mv t file
No, in general it can't be done in Unix like this. You can only create/truncate (with >) or append to a file (with >>). Once truncated, the old contents would be lost.
In general, this can't be done. But Perl has the -i switch:
perl -i -ne 'print if /SomeRegEx/' myfile.txt
Writing -i.bak will cause the original to be saved in myfile.txt.bak.
(Of course internally, Perl just does basically what you're already doing -- there's no special magic involved.)
To edit file in-place using vim-way, try:
$ ex -s +'%!grep foo' -cxa myfile.txt
Alternatively use sed or gawk.
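If your gawk is version 4.1 or newer, its inplace extension gives sed-style in-place behavior (it too writes a temp file behind the scenes):
gawk -i inplace '/SomeRegEx/' myfile.txt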
Most installations of sed can do in-place editing, check the man page, you probably want the -i flag.
Store in a variable and then assign it to the original file:
A=$(grep 'Something' aux.log) && echo "${A}" > aux.log
(This holds the whole file in memory, and the command substitution strips trailing newlines.)
Take a look at my slides "Field Guide To the Perl Command-Line Options" at http://petdance.com/perl/command-line-options.pdf for more ideas on what you can do in place with Perl.
cat myfile.txt | grep 'sometext' > myfile.txt
Beware: despite appearances, this does not work. The shell truncates myfile.txt when it sets up the > redirection, before cat reads it, so you will typically be left with an empty file. Use a temporary file or sponge as described above.

Unix: prepending a file without a dummy-file?

I do not want:
$ cat file > dummy
$ cat header dummy > file
I want similar to the command below but to the beginning, not to the end:
$ cat header >> file
You can't append to the beginning of a file without rewriting the file. The first way you gave is the correct way to do this.
This is easy to do in sed if you can embed the header string directly in the command:
$ sed -i "1iheader1,header2,header3"
Or if you really want to read it from a file, you can do so with bash's help:
$ sed -i "1i$(<header)" file
BEWARE that "-i" overwrites the input file with the results. If you want sed to make a backup, change it to "-i.bak" or similar, and of course always test first with sample data in a temp directory to be sure you understand what's going to happen before you apply to your real data.
The whole dummy-file thing is pretty annoying. Here's a one-liner solution that I just tried out which seems to work.
echo "`cat header file`" > file
The backticks make the part inside the quotes execute first, so the shell doesn't complain about the output file also being an input file. It seems related to hhh's solution but a bit shorter. I suppose if the files are really large this might cause problems, though, since the expanded text must fit within the shell's command and memory limits before the original file is overwritten.
You can't prepend to a file without reading all the contents of the file and writing a new file with your prepended text plus the contents of the file. Think of a file in Unix as a stream of bytes: it's easy to append to the end of a stream, but there is no easy operation to "rewind" the stream and write to it. Even a seek operation to the beginning of the file will overwrite the beginning of the file with any data you write.
One possibility is to use a here-document:
cat > "prependedfile" << ENDENDEND
prepended line(s)
`cat "file"`
ENDENDEND
There may be a memory limitation to this trick.
Thanks to the right search term! (Use echo -e so that bash's echo interprets the \n escapes.)
echo -e "include .headers.java\n$(cat fileObject.java)" > fileObject.java
Then with a header file:
echo -e "$(cat .headers.java)\n\n$(cat fileObject.java)" > fileObject.java
If you want to prepend "header" to "file", why not append "file" to "header"?
cat file >> header
(Follow up with mv header file if the result needs to replace the original; note this consumes the header file.)
Below is a simple csh attempt to solve this problem. This "prepend.sh" script takes two parameters:
$1 - The file containing the prepended wording.
$2 - The original/target file to be modified.
#!/bin/csh
if (-e ./tmp.txt) then
    rm ./tmp.txt
endif
cat $1 > ./tmp.txt
cat $2 >> ./tmp.txt
mv $2 $2.bak
mv ./tmp.txt $2
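Hypothetical usage, assuming the header lines live in header.txt and the big file is data.txt:
chmod +x prepend.sh
./prepend.sh header.txt data.txt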

How do I extract lines from a file using their line number on unix?

Using sed or similar how would you extract lines from a file? If I wanted lines 1, 5, 1010, 20503 from a file, how would I get these 4 lines?
What if I have a fairly large number of lines I need to extract?
If I had a file with 100 lines, each representing a line number that I wanted to extract from another file, how would I do that?
Something like "sed -n '1p;5p;1010p;20503p'. Execute the command "man sed" for details.
For your second question, I'd transform the input file into a bunch of sed(1) commands to print the lines I wanted.
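A sketch of that transformation (GNU sed, since it reads a script from stdin with -f -; filenames are placeholders): append a p to each line number, then feed the result back to sed as its script.
sed 's/$/p/' line_num_file | sed -n -f - data_file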
With awk it's as simple as:
awk 'NR==1 || NR==5 || NR==1010 || NR==20503' "file"
#OP, you can do this easier and more efficiently with awk. so for your first question
awk 'NR~/^(1|5|1010|20503)$/{print}' file
for 2nd question
awk 'FNR==NR{a[$1];next}(FNR in a){print}' file_with_linenr file
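For instance, with the numbers from the question stored one per line (filenames here are placeholders):
printf '1\n5\n1010\n20503\n' > line_numbers.txt
awk 'FNR==NR{a[$1];next}(FNR in a){print}' line_numbers.txt data.txt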
This ain't pretty and it could exceed command length limits under some circumstances*:
sed -n "$(while read a; do echo "${a}p;"; done < line_num_file)" data_file
Or its much slower but more attractive, and possibly more well-behaved, sibling:
while read a; do echo "${a}p;"; done < line_num_file | xargs -I{} sed -n \{\} data_file
A variation:
xargs -a line_num_file -I{} sed -n \{\}p\; data_file
You can speed up the xarg versions a little bit by adding the -P option with some large argument like, say, 83 or maybe 419 or even 1177, but 10 seems as good as any.
*xargs --show-limits </dev/null can be instructive
I'd investigate Perl, since it has the regexp facilities of sed plus the programming model surrounding it to allow you to read a file line by line, count the lines and extract according to what you want (including from a file of line numbers).
my %want = map { $_ => 1 } (1, 5, 1010, 20503);
my $row = 1;
while (<STDIN>) {
    # The current line is in $_; print it when $row is in our list.
    print if $want{$row};
    $row++;
}
In Perl:
perl -ne 'print if $. =~ m/^(1|5|1010|20503)$/' file
