The best way in Unix to add a header to multiple files in a directory? - unix

Before anyone flags it: I am confident this is not a duplicate of the existing question about adding a header to multiple files in Unix (the question is here: Adding header into multiple text files). This is about optimising the solution I am currently using.
I have numerous directories in which I have over 20000 files and for each file I want to add the same header.
What I have been doing is:
sed -i '1ichr\tpos\tref\talt\treffrq\tinfo\trs\tpval\teffalt\tgene' *.txt
Now, this does work exactly as I want it to, but there have been a couple of issues.
First, this seems to be an extremely slow way of doing it, and it can take a long time to get through all 20K+ files.
Second, and more frustratingly, my connection to the server occasionally times out during this long process, so the command doesn't finish and I end up with some files having the header and the rest not. If I then started again from the top, a number of files would get the header twice, so I actually have to recreate the files so that I can add the header to all of them at once.
So, what I am wondering is if there is a better/quicker solution to this problem. The question I linked above seems like it would actually be slower (given that it seems like there is more the command line needs to do at each file as it is going through a loop) and so doesn't seem like it would fix this.

Don't use -i. If you get interrupted, you can't tell which files have already been modified. Instead, write the results to a separate directory:
mkdir -p ../output-dir
for file in *.txt; do
sed '1ichr\tpos\tref\talt\treffrq\tinfo\trs\tpval\teffalt\tgene' "$file" > ../output-dir/"$file"
done
When you're done, you can rename the directories if you wish. This doesn't address the connection issue (ThoriumBR's suggestion of nohup is good for that), but when it happens you can recover state more easily.
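Another way to make a rerun safe after an interruption is to skip any file whose first line is already the header. A minimal sketch, assuming POSIX sh; the function name add_header_once and the temporary-file suffix are made up for illustration:

```shell
#!/bin/sh
# The header from the question, with real tab characters.
header=$(printf 'chr\tpos\tref\talt\treffrq\tinfo\trs\tpval\teffalt\tgene')

# add_header_once FILE... (hypothetical helper)
# Prepends $header to each file unless its first line already equals it.
add_header_once() {
    for file in "$@"; do
        [ -f "$file" ] || continue
        first=
        # read returns non-zero on an empty file; that is fine here.
        IFS= read -r first < "$file" || :
        [ "$first" = "$header" ] && continue
        tmp="$file.tmp.$$"
        # Build the new file next to the original, then rename it into place,
        # so an interruption never leaves a half-written original.
        { printf '%s\n' "$header"; cat "$file"; } > "$tmp" && mv "$tmp" "$file"
    done
}
```

Called as add_header_once *.txt, this can simply be run again after a dropped connection: files that already start with the header are left untouched.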

First, adding a header is slow: nothing can be inserted at the start of a file in place, so the entire contents of every file have to be rewritten. Adding a trailer would be very fast.
Second, use nohup:
nohup - run a command immune to hangups, with output to a non-tty
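Using nohup sed -i '1ichr\tpos\tref\talt\treffrq\tinfo\trs\tpval\teffalt\tgene' *.txt & will keep the command running in the background even if the server times you out. nohup makes the command ignore the hangup signal; the trailing & is what puts it in the background.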
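A minimal sketch of wrapping this up so that the output goes to a log file of your choosing instead of nohup.out. The helper name run_detached and the log file name are hypothetical:

```shell
#!/bin/sh
# run_detached LOGFILE COMMAND [ARG...] (hypothetical helper)
# Starts COMMAND immune to hangups, in the background, logging to LOGFILE.
run_detached() {
    log=$1
    shift
    nohup "$@" > "$log" 2>&1 &
    # $! is the PID of the background job just started.
    echo "started PID $!, logging to $log"
}
```

For the command from the question this would be run_detached header.log sed -i '1ichr\tpos\tref\talt\treffrq\tinfo\trs\tpval\teffalt\tgene' *.txt, after which it should be safe to log out.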

Related

How can I tail -f but only in whole lines?

I have a constantly updating huge log file (MainLog).
I want to create another file which is only the last n lines of the log file BUT also updating.
If I use:
tail -f MainLog > RecentLog
I get ALMOST what I want except RecentLog is written as MainLog is available and might at any point only have part of the last MainLog line.
How can I specify to tail that I only want it to write when a WHOLE line is available?
By default, tail outputs whole lines unless you use the -c switch to count characters. Something like
tail -n 20 -f MainLog > RecentLog
(substituting the number of lines you want written to the second file initially for "20") should work as you want.
But if it doesn't, piping the output through grep with line buffering (GNU grep's --line-buffered option) may fix this. See this question.
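If RecentLog only needs to be refreshed periodically rather than streamed, another option is to rebuild it each time and drop any unfinished final line. This is a sketch of a different technique from tail -f, assuming POSIX sh; snapshot_tail and its arguments are hypothetical names:

```shell
#!/bin/sh
# snapshot_tail N SRC DST (hypothetical helper)
# Writes the last N lines of SRC to DST, omitting a final line that has
# no terminating newline yet, and replaces DST with a single rename.
snapshot_tail() {
    n=$1 src=$2 dst=$3
    tmp="$dst.tmp.$$"
    # read fails on a last line without a newline, so that line is skipped.
    tail -n "$n" "$src" | while IFS= read -r line; do
        printf '%s\n' "$line"
    done > "$tmp" && mv "$tmp" "$dst"
}
```

Run from a loop or a scheduler, e.g. snapshot_tail 20 MainLog RecentLog. When the last line of MainLog is still being written, RecentLog holds one line fewer than requested.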
After many attempts, the only solution for multiple files that worked (fantastically well) for me is the fdlinecombine command. It's a small binary that reads multiple file descriptors and prints data to stdout linewise.
My use case is spawning multiple long-running ssh commands in the background and following their output, without having the lines garbled or interrupted in between.

Unix Shell Script: sleep command not working

I have a scenario in which I need to download files with the curl command and want my script to pause for some time before downloading the second one. I used the sleep command like this:
sleep 3m
but it is not working.
Any ideas?
Thanks in advance.
Make sure your text editor is ending each line with only \n and not \r\n. This is typical if you are writing the script on Windows.
Use Notepad++ (Windows) and go to Edit | EOL Conversion | UNIX, then save it. If the script is already on the machine, I have read here [talk.maemo.org/showthread.php?t=67836] that you can use [tr -d "\r" < oldname.sh > newname.sh] to remove the problem. If you want to see whether a script has the problem, use [od -c yourscript.sh]; \r will occur before every \n.
Other problems I have seen it cause: cd /dir/dir fails with [cd: 1: can't cd to /dir/dir], and copy scriptfile.sh newfilename produces a file called newfilenameX, where X is an invisible character (make sure you can delete it before trying this). If the file is on a network share, a Windows machine can see the character. For a successful test, make sure the command is not on the last line of the script.
Until I figured it out (I knew I had to ask Google about something that may manifest in various ways), I thought there was an issue with the Linux version I was using (sleep not working in a script???).
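The check and the fix described above can be wrapped in two small functions. A sketch, assuming POSIX sh; has_cr and strip_cr are made-up names:

```shell
#!/bin/sh
# A literal carriage return, for use as a grep pattern.
cr=$(printf '\r')

# has_cr FILE: succeeds if FILE contains any carriage return.
has_cr() {
    grep -q "$cr" "$1"
}

# strip_cr OLD NEW: writes OLD to NEW with every carriage return removed.
strip_cr() {
    tr -d '\r' < "$1" > "$2"
}
```

For example, has_cr oldname.sh && strip_cr oldname.sh newname.sh converts the script only when it actually needs it.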
Are you sure you are using sleep the right way? Based on your description, you should be invoking it as:
sleep 180
Is this the way you are doing it?
You might also want to consider wget command as it has an explicit --wait flag, so you might avoid having the loop in the first place.
while read -r urlname
do
curl ..........${urlname}....
sleep 180 #180 seconds is 3 minutes
done < file_with_several_url_to_be_fetched
?

Vim execute a command and send out buffer over stdout [duplicate]

This question already has answers here:
Redirect ex command to STDOUT in vim
(3 answers)
Closed 9 years ago.
Here's how you can automate vim in an interesting way:
vim -c '0,$d | r source.txt | 1d | w | q' dest.txt
This uses vim ex commands to erase dest.txt, read source.txt into the buffer, erase the first line (which ends up as a blank line due to the way r works), write to the file (dest.txt), and then quit.
This (as far as I can tell) skips the entire vim terminal UI from loading and is conceptually a little like having a vimscript interpreter.
Now I'd love to be able to take this just one little step further to abuse the capabilities of vim: I want a script to peer at the currently edited changes of an opened file (as part of an interactive automation shell script) which exist in the vim *.swp swapfiles, apply the changes through vim's recover command, and then obtain the output.
Of course it would be perfectly serviceable to use an actual file, e.g. orig_file.txt is being edited in vim in another terminal; my script could do this at each point that the swapfile is detected to change:
cp orig_file.txt orig_file_ephemeral.txt
cp .orig_file.txt.swp .orig_file.txt_ephemeral.txt.swp
vim -c 'recover | w | q'
At this point orig_file_ephemeral.txt shall contain the content of the vim buffer from the other process in which editing is taking place, and we obtained this data without requiring any direct interaction with said process. Which is neat.
Of course for practical purposes it would probably make more sense to do exactly that, and just have the primary vim participate in the process. It would be splitting the functionality for the script out into the configuration of vim, which is a downside, but it would be more straightforward conceptually and computationally as it already has the buffer contents readily available for writing, and it should be straightforward to do so as I believe there exists an autocommand we can use (though whether that autocommand is run prior to saving the swapfile or not remains to be seen).
Either way, for the sake of completeness I'm curious to know if there exists an ex command to write stuff to the STDOUT of vim. Or if this even makes any sense.
I think it perhaps makes no sense as STDOUT is bound to be the actual terminal, e.g. it is where vim sends out its "view" of its UI and the buffer, and everything, to the terminal. So that for example if any of the vim -c 'vimscript commands' commands produce vim errors, I'll be seeing vim's terminal output to display these errors over STDOUT.
Therefore it may only be practical to use a file. But maybe there's some kind of craziness like !tee /dev/fd/3 I could do?
In addition, there is a wrinkle with this roundabout approach, which is that vim presents a Warning: Original file may have been changed error in bright red background text for about a second, and this is surely due to renaming the file. I can likely work around that by doing this work inside of a sub-directory while keeping the filenames identical.
That's the p command (and where the p in grep comes from):
ex -sc '%p|q' file
Would be a bit like cat file.

Keep log file of shell script execution until past few days

I am appending the standard output and error of a shell script run on a Unix box, as shown below:
/home/mydir/shellScript.sh >> /home/mydir/shellScript.log 2>&1
Now I am looking for a way to keep logs going back, say, 30 days; otherwise the log file will keep growing.
I would appreciate any recommendations.
This kind of thing is generally done with a tool such as logrotate.
For example, with Apache's logs, I've seen it used to:
Once per day, move the current file to another (so there is one log file per day), gzipping the previous day's file
Delete archived files that are more than 1 week old
So I suppose you could use it to get what you're asking for.
Is this a long-running script (e.g. a daemon)? Or does it do something and then exit quickly? You could build the log file's name dynamically from today's date, so a new file is created whenever the date changes:
#!/bin/sh
# Name the log after today's date (YYYY-MM-DD).
now=`date +%F`
/home/mydir/shellScript.sh >> /home/mydir/shellScript-$now.log 2>&1
# Remove the log from exactly 30 days ago (--date is a GNU date option).
previous=`date --date='30 days ago' +%F`
rm -f /home/mydir/shellScript-$previous.log 2>&1
(added stale log removal).
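Removing only the file dated exactly 30 days ago leaves older files behind if the script ever skips a day. A sketch of an alternative that prunes by modification time instead, assuming POSIX find; prune_logs is a hypothetical name:

```shell
#!/bin/sh
# prune_logs DIR DAYS (hypothetical helper)
# Deletes shellScript-*.log files under DIR last modified more than DAYS days ago.
prune_logs() {
    find "$1" -type f -name 'shellScript-*.log' -mtime +"$2" -exec rm -f {} +
}
```

Here prune_logs /home/mydir 30 would replace the last two lines of the script above. Note that find also descends into subdirectories of DIR.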
Pascal MARTIN is correct: it is a simple matter to put a configuration file into /etc/logrotate.d, or to add an entry to the end of the file /etc/logrotate.conf, as logrotate is included stock in most Unix systems. The configuration format is easy to understand and takes roughly 5 minutes with the man page to learn. I recommend it as the easiest and most maintainable solution.
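As a rough illustration of what such a configuration could look like for the log in the question, here is a sketch that writes a stanza to a file of your choosing. The helper name write_logrotate_conf and the choice of directives are assumptions, not something taken from the question:

```shell
#!/bin/sh
# write_logrotate_conf DEST (hypothetical helper)
# Writes a logrotate stanza for the script's log to DEST.
write_logrotate_conf() {
    cat > "$1" <<'EOF'
/home/mydir/shellScript.log {
    daily
    rotate 30
    compress
    missingok
    notifempty
}
EOF
}
```

Run as root with a destination such as /etc/logrotate.d/shellScript. With daily and rotate 30, logrotate keeps 30 rotated files, which is roughly 30 days of history. Since the script reopens its log with >> on every run, rotation by rename should be sufficient here.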
There's not a lot of context to your problem included.
I agree with both of the offered solutions.
I would also point you to my 2 rather long-winded ;-) discourses on naming and managing logfiles.
Bash piping output and input of a program
command line wisdom for 2 panel file manager user
I hope these help.

paste without temporary files in Unix

I'm trying to use the Unix command paste, which is like a column-appending form of cat, and came across a puzzle I've never known how to solve in Unix.
How can you use the outputs of two different programs as the input for another program (without using temporary files)?
Ideally, I'd do this (without using temporary files):
./progA > tmpA;
./progB > tmpB; paste tmpA tmpB
This seems to come up relatively frequently for me, but I can't figure out how to use the output from two different programs (progA and progB) as input to another without using temporary files (tmpA and tmpB).
For commands like paste, simply using paste $(./progA) $(./progB) (in bash notation) won't do the trick, because paste reads from files or stdin, so the substituted output would be treated as file names rather than data.
The reason I'm wary of the temporary files is that I don't want to have jobs running in parallel to cause problems by using the same file; ensuring a unique file name is sometimes difficult.
I'm currently using bash, but would be curious to see solutions for any Unix shell.
And most importantly, am I even approaching the problem in the correct way?
Cheers!
You do not need temp files under bash, try this:
paste <(./progA) <(./progB)
See "Process Substitution" in the Bash manual.
Use named pipes (FIFOs) like this:
mkfifo fA
mkfifo fB
progA > fA &
progB > fB &
paste fA fB
rm fA fB
The process substitution for Bash does a similar thing transparently, so use this only if you have a different shell.
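To avoid name clashes between parallel jobs, the FIFOs can be created inside a fresh temporary directory. A sketch, assuming POSIX sh plus mktemp -d (not in POSIX but widely available); paste_two is a hypothetical name, and each command is passed as a string to sh -c:

```shell
#!/bin/sh
# paste_two CMD_A CMD_B (hypothetical helper)
# Runs both command strings and pastes their outputs side by side,
# using FIFOs in a private temporary directory.
paste_two() {
    dir=$(mktemp -d) || return 1
    mkfifo "$dir/a" "$dir/b" || { rm -rf "$dir"; return 1; }
    # Each writer blocks until paste opens its FIFO for reading.
    sh -c "$1" > "$dir/a" &
    sh -c "$2" > "$dir/b" &
    paste "$dir/a" "$dir/b"
    status=$?
    wait
    rm -rf "$dir"
    return $status
}
```

For the programs in the question this would be paste_two ./progA ./progB.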
Holy moly, I recently found out that in some instances you can get process substitution to work by setting the following inside a bash script (should you need to):
set +o posix
http://www.linuxjournal.com/content/shell-process-redirection
From link:
"Process substitution is not a POSIX compliant feature and so it may have to be enabled via: set +o posix"
I was stuck for many hours, until I had done this. Here's hoping that this additional tidbit will help.
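Works in all shells, although both outputs arrive on a single stdin, so paste sees the lines of progA followed by those of progB rather than two inputs to merge column-wise: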
{
progA
progB
} | paste
