Unix: prepending a file without a dummy file?

I do not want:
$ cat file > dummy; cat header dummy > file
I want something like the command below, but writing to the beginning of the file rather than the end:
$ cat header >> file

You can't append to the beginning of a file without rewriting the file. The first way you gave is the correct way to do this.
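For reference, a minimal sketch of that rewrite using a temporary file (header and file are placeholder names):

$ tmp=$(mktemp ./prepend.XXXXXX) && cat header file > "$tmp" && mv "$tmp" file

Creating the temp file in the same directory keeps the final mv an atomic rename rather than a cross-filesystem copy.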

This is easy to do in sed if you can embed the header string directly in the command:
$ sed -i "1iheader1,header2,header3" file
Or if you really want to read it from a file, you can do so with bash's help:
$ sed -i "1i$(<header)" file
BEWARE that -i overwrites the input file with the results. If you want sed to make a backup, change it to -i.bak or similar. And of course, always test first with sample data in a temp directory, so you understand what is going to happen before you apply it to your real data.

The whole dummy file thing is pretty annoying. Here's a 1-liner solution that I just tried out which seems to work.
echo "`cat header file`" > file
The backticks make the command substitution run before the redirection truncates the output file, so the shell doesn't complain about the output file being an input file. It seems related to hhh's solution but a bit shorter. If the files are really large this might cause problems, though: the substituted text has to be held in the shell's memory while the original is overwritten, and very long command lines can run into system limits (I've seen the shell complain about commands being too long before).
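A slightly more robust spelling of the same trick, assuming a POSIX shell, uses printf and $( ) instead of echo and backticks (echo can mangle backslashes and leading dashes); the same size caveat applies:

printf '%s\n' "$(cat header file)" > file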

You can't prepend to a file without reading the entire contents of the file and writing a new file with your prepended text plus the contents of the file. Think of a file in Unix as a stream of bytes: it's easy to append to the end of a stream, but there is no easy operation to "rewind" the stream and write to it. Even a seek to the beginning of the file will overwrite the beginning of the file with any data you write, not insert it.
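You can watch that overwrite-on-seek behaviour with dd (a throwaway demonstration, not a solution):

$ printf 'ABCDEF' > demo
$ printf 'XY' | dd of=demo conv=notrunc 2>/dev/null
$ cat demo
XYCDEF

conv=notrunc writes at offset 0 without truncating the file: the first two bytes are replaced, nothing is inserted.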

One possibility is to use a here-document:
cat > "prependedfile" << ENDENDEND
prepended line(s)
`cat "file"`
ENDENDEND
There may be a memory limitation to this trick.

Thanks to the right search term!
echo -e "include .headers.java\n$(cat fileObject.java)" > fileObject.java
Then with a file:
echo -e "$(cat .headers.java)\n\n$(cat fileObject.java)" > fileObject.java
Note the -e: bash's echo does not interpret \n escapes without it.

If you want to prepend "header" to "file", why not append "file" to "header" instead?
cat file >> header
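If the result should end up under the original name, follow it with a rename (this sacrifices the header file; copy it first if you need to reuse it):

cat file >> header && mv header file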

Below is a simple C-shell attempt to solve this problem. This "prepend.sh" script takes two parameters:
$1 - The file containing the text to prepend.
$2 - The original/target file to be modified.
#!/bin/csh
# Remove any stale temp file left over from a previous run.
if (-e ./tmp.txt) then
rm ./tmp.txt
endif
# Build the new file: header first, then the original contents.
cat $1 > ./tmp.txt
cat $2 >> ./tmp.txt
# Keep a backup of the original, then move the result into place.
mv $2 $2.bak
mv ./tmp.txt $2
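Usage, with hypothetical file names:

$ chmod +x prepend.sh
$ ./prepend.sh header.txt data.txt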

Related

How to handle a file having header in between the records after removing duplicates from the file

We have a file which has been processed by a unix command to remove duplicates. After the de-duplication, the new file has the header in between the records. Please help to solve this; thanks in advance for any input.
Unix command: sort -u >
I would do something like this:
grep "headers" file.txt > output.txt
grep -v "headers" file.txt >> output.txt
(where file.txt is the de-duplicated file and "headers" is a pattern matching the header line).
The idea is the following: first take the headers and put them into output.txt, and afterwards take everything which is not a header and append it to that output file.
First you need to put the information into the output file (which means creating the output file, hence the single > character); secondly you need to append the information to the already existing output file (hence the double >> characters).
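A quick worked example, with a hypothetical header line ID,NAME:

$ printf 'b\na\nID,NAME\nc\n' > file.txt
$ grep '^ID,NAME' file.txt > output.txt
$ grep -v '^ID,NAME' file.txt >> output.txt
$ cat output.txt
ID,NAME
b
a
c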

UNIX how to use the base of an input file as part of an output file

I use UNIX fairly infrequently, so I apologize if this seems like an easy question. I am trying to loop through subdirectories and files, generate an output from the specific files that the loop grabs, and then pipe that output to a file in another directory whose name will be identifiable from the input file. So far I have:
for file in /home/sub_directory1/samples/SSTC*/
do
samtools depth -r chr9:218026635-21994999 < $file > /home/sub_directory_2/level_2/${file}_out
done
I was hoping to generate an output from file_1_novoalign.bam in sub_directory1/samples/SSTC*/ and to send that output to /home/sub_directory_2/level_2/ as an output file called file_1_novoalign_out.bam; however, it doesn't work - it says 'bash: /home/sub_directory_2/level_2/file_1_novoalign.bam.out: No such file or directory'.
I would ideally like to strip off the '_novoalign.bam' part of the outfile name and replace it with '_out.txt'. I'm sure this will be easy for a regular unix user, but I have searched and can't find a quick answer and don't really have time to spend ages searching. Thanks in advance for any suggestions building on the code I have so far; any alternate suggestions are welcome.
p.s. I don't have permission to write files to the directory containing the input folders
Below is an explanation for filenames without spaces, keeping it simple.
When you want files, not directories, you should end your for-loop glob with * and not */.
When you only want to process files ending with _novoalign.bam, you should say so in the glob.
The easiest way to replace part of the filename is sed; a dollar sign anchors the match to the end of the string. The complete script will be
OUTDIR=/home/sub_directory_2/level_2
for file in /home/sub_directory1/samples/SSTC/*_novoalign.bam; do
echo Debug: Inputfile including path: ${file}
OUTPUTFILE=$(basename $file | sed -e 's/_novoalign\.bam$/_out.txt/')
echo Debug: Outputfile without path: ${OUTPUTFILE}
samtools depth -r chr9:218026635-21994999 < ${file} > ${OUTDIR}/${OUTPUTFILE}
done
Note 1:
You can use parameter expansion like file=${fullfile##*/} to get the filename without path, but you will forget the syntax in one hour.
Easier to remember are basename and dirname, but you still have to do some processing.
Note 2:
When your script first changes the directory to /home/sub_directory_2/level_2 you can skip the basename call.
When all the files in the dir are to be processed, you can use the asterisk.
When all files have at most one underscore, you can use cut.
You might want to add some error handling. When you want the STDERR from samtools in your outputfile, add 2>&1.
These will turn your script into
OUTDIR=/home/sub_directory_2/level_2
cd /home/sub_directory1/samples/SSTC
for file in *; do
echo Debug: Inputfile: ${file}
OUTPUTFILE="$(basename $file | cut -d_ -f1)_out.txt"
echo Debug: Outputfile: ${OUTPUTFILE}
samtools depth -r chr9:218026635-21994999 < ${file} > ${OUTDIR}/${OUTPUTFILE} 2>&1
done
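As an aside, the basename and sed/cut calls can be replaced entirely by the parameter expansion mentioned in Note 1 (a sketch using the same hypothetical paths):

file=/home/sub_directory1/samples/SSTC/sample1_novoalign.bam
name=${file##*/}                          # strip the path -> sample1_novoalign.bam
OUTPUTFILE=${name%_novoalign.bam}_out.txt # swap the suffix -> sample1_out.txt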

Converting Filename to Filename_Inode

I'm writing my first script. It takes a file and moves it to another folder, except that I want to change the filename to filename_inode instead of just filename, in case there are any files with the same name.
I've figured out how to show this by creating the following 4 variables
inode=$(ls -i $1 | cut -c1-7) #lists the file the user types, cuts the inode from it
space="_" #used to put in between the filename and inode
bname=$(basename $1) #gets the basename of the file without the directory etc
bnamespaceinode=$bname$space$inode #combines the 3 values into one variable
echo "$bnamespaceinode" #prints filename_inode to the window
So the bottom echo shows filename_inode, which is what I want, except now when I try to move this using mv or cp I'm getting errors.
I don't think it's anything wrong with the syntax I'm using for the mv and cp commands, so I'm thinking I need to concatenate the 3 variables into a new file, or use the result of the first and then append the other 2 to that file?
I've tried both of the above but am still not having any luck; any ideas?
Thanks
Without clearer examples, I guess this could work:
TARGETDIR=/my/target/directory
mv "$1" "$TARGETDIR/$(basename "$1" | sed 's/_.*/_inode/')"
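If the goal is to append the actual inode number rather than the literal text _inode, a sketch like this may be closer to what the asker wants (stat -c %i is GNU coreutils; BSD/macOS spells it stat -f %i):

TARGETDIR=/my/target/directory   # hypothetical destination
inode=$(stat -c %i "$1")         # the file's inode number
mv "$1" "$TARGETDIR/$(basename "$1")_$inode"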

Efficient way to add two lines at the beginning of a very large file

I have a group of very large (a couple of GB's each) text files. I need to add two lines at the beginning of each of these files.
I tried using sed with the following commands:
sed -i '1iFirstLine' file
sed -i '2iSecondLine' file
The problem with sed is that it loops through the entire file even when it only has to add two lines at the beginning, and therefore it takes a lot of time.
Is there an alternate way to do this more efficiently, without reading the entire file?
You should try
echo "FirstLine" > newfile.txt
echo "SecondLine" >> newfile.txt
cat oldfile.txt >> newfile.txt
mv newfile.txt oldfile.txt
This one works perfectly and it's extremely fast too:
perl -pi -e '$.=0 if eof;print "first line\nsecond line\n" if ($.==1)' *.txt
(-pi rewrites each file in place; $. is the current line number, reset at each end-of-file, so the two lines are printed just before line 1 of every file.)
Adding at the beginning is not possible without rewriting the file (contrary to appending to the end). You simply cannot "shift" file content, as no filesystem supports that. So you should do:
echo -e "line 1\nline 2" > tmp.txt
cat tmp.txt oldbigfile.txt > newbigfile.txt
rm oldbigfile.txt
mv newbigfile.txt oldbigfile.txt
Note that you need enough disk space to hold both files for a while.

How do i read, modify and write to the same file without involving a temporary file in zsh?

I like keeping my history files uncluttered. Since zsh has excellent history-searching features, there is no need to save the commands that I use repeatedly (e.g., finger, pwd, ls, etc.) multiple times. To strip the history file of all duplicate lines, I did sort .zhistory | uniq -du. Now I'd like to write this back to the same file, so that if I simply put this in my .zshrc, my history is trimmed and clean every time I log in. If I try sort .zhistory | uniq -du > .zhistory, the resulting file is empty! On the other hand, if I do sort .zhistory | uniq -du > tempfile, it writes to tempfile correctly. Any idea how I can write to the same file?
You might be able to use a variable:
file='.zhistory' && var=$(sort -u "$file") && echo "$var" > "$file"
The reason you can't write to the same file is that the redirection occurs first and truncates the file before the utility ever sees it.
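You can watch the truncation happen with any command; GNU cat even notices, but by then the file is already empty:

$ echo data > f
$ cat f > f
cat: f: input file is output file
$ cat f
$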
You can prevent duplicate lines in the first place. Use setopt with one or more of the following settings (from man zshoptions; see the example after the list):
HIST_EXPIRE_DUPS_FIRST
If the internal history needs to be trimmed to add the current
command line, setting this option will cause the oldest history
event that has a duplicate to be lost before losing a unique
event from the list. You should be sure to set the value of
HISTSIZE to a larger number than SAVEHIST in order to give you
some room for the duplicated events, otherwise this option will
behave just like HIST_IGNORE_ALL_DUPS once the history fills up
with unique events.
HIST_FIND_NO_DUPS
When searching for history entries in the line editor, do not
display duplicates of a line previously found, even if the
duplicates are not contiguous.
HIST_IGNORE_ALL_DUPS
If a new command line being added to the history list duplicates
an older one, the older command is removed from the list (even
if it is not the previous event).
HIST_IGNORE_DUPS (-h)
Do not enter command lines into the history list if they are
duplicates of the previous event.
HIST_SAVE_NO_DUPS
When writing out the history file, older commands that duplicate
newer ones are omitted.
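For instance, a minimal sketch for your .zshrc using two of the options above:

# Don't record a command that duplicates any older history entry,
# and omit remaining duplicates when writing the history file.
setopt HIST_IGNORE_ALL_DUPS HIST_SAVE_NO_DUPS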
The program sponge (from the moreutils package) can be useful for writing back to the same file you read.
(For the example's sake, pretend you don't know about sed -i.)
echo "say what again" > file
sed s/what/woot/ file > file
Too bad: file is now empty, you lost your file.
echo "say what again" > file
sed s/what/woot/ file | sponge file
does what you want
(Be careful not to write sponge > file or the file will be empty again.)
The fact that I didn't have an answer to this question annoyed me sufficiently that I wrote one - call this inplace and put it executably on your path:
#! /bin/bash
BACKUP_EXT=
while getopts "b:" flag
do
    case "$flag" in
        b) BACKUP_EXT="$OPTARG" ;;
    esac
done
shift $((OPTIND - 1))
CMD="$1"
shift
for filename in "$@"
do
    TMP_FILE="$(mktemp)"
    # Run the command with the target file on stdin, output to the temp file.
    bash -c "$CMD" <"$filename" >"$TMP_FILE"
    # Optionally keep a backup of the original.
    if [[ -n "$BACKUP_EXT" ]]
    then
        mv "$filename" "$filename.$BACKUP_EXT"
    fi
    mv "$TMP_FILE" "$filename"
done
You may now say:
inplace 'sort | uniq -du' .zhistory
Incidentally, there's a way to do that uniqification without having to sort - but that's an answer for another question!
