paste without temporary files in Unix - unix

I'm trying to use the Unix command paste, which is like a column-appending form of cat, and came across a puzzle I've never known how to solve in Unix.
How can you use the outputs of two different programs as the input for another program (without using temporary files)?
Ideally, I'd do this (without using temporary files):
./progA > tmpA;
./progB > tmpB; paste tmpA tmpB
This seems to come up relatively frequently for me, but I can't figure out how to use the output from two different programs (progA and progB) as input to another without using temporary files (tmpA and tmpB).
For commands like paste, simply using paste $(./progA) $(./progB) (in bash notation) won't do the trick, because it can read from files or stdin.
The reason I'm wary of the temporary files is that I don't want to have jobs running in parallel to cause problems by using the same file; ensuring a unique file name is sometimes difficult.
I'm currently using bash, but would be curious to see solutions for any Unix shell.
And most importantly, am I even approaching the problem in the correct way?
Cheers!

You do not need temp files under bash, try this:
paste <(./progA) <(./progB)
See "Process Substitution" in the Bash manual.

Use named pipes (FIFOs) like this:
mkfifo fA
mkfifo fB
progA > fA &
progB > fB &
paste fA fB
rm fA fB
The process substitution for Bash does a similar thing transparently, so use this only if you have a different shell.

Holy moly, I recent found out that in some instances, you can get your process substitution to work if you set the following inside of a bash script (should you need to):
set +o posix
http://www.linuxjournal.com/content/shell-process-redirection
From link:
"Process substitution is not a POSIX compliant feature and so it may have to be enabled via: set +o posix"
I was stuck for many hours, until I had done this. Here's hoping that this additional tidbit will help.

Works in all shells.
{
progA
progB
} | paste

Related

The best way in Unix to add a header to multiple files in a directory?

Before anyone else checks, I am confident this is not a duplicate of the existing question of how to add a header in Unix to multiple files (the question is here: Adding header into multiple text files). This is more about optimisation of a solution I am currently using for this current issue.
I have numerous directories in which I have over 20000 files and for each file I want to add the same header.
What I have been doing is:
sed -i '1ichr\tpos\tref\talt\treffrq\tinfo\trs\tpval\teffalt\tgene' *.txt
Now, this does work exactly as I want it to, but there have been a couple of issues.
First is that this seems to be an extremely slow method of doing this and it can take a pretty long time to get through all 20K+ files.
Second, and more frustratingly, occasionally my connection to the server I am using has timed out during this long process meaning that the command won't finish running, so I end up with half the files having the header and half not. And if I started from the top again this would mean a number of the files would have the header twice so I actually have to go through a process of creating them again so I can add the header all at once.
So, what I am wondering is if there is a better/quicker solution to this problem. The question I linked above seems like it would actually be slower (given that it seems like there is more the command line needs to do at each file as it is going through a loop) and so doesn't seem like it would fix this.
Don't use -i. It confuses things when you get interrupted. Instead, use
mkdir -p ../output-dir
for file in *.txt; do
sed '1ichr\tpos\tref\talt\treffrq\tinfo\trs\tpval\teffalt\tgene' "$file" > ../output-dir/"$file"
done
When you're done, you can rename the directories if you wish. This doesn't address the connection issue (ThoriumBR's suggestion of nohup is good for that), but when it happens you can recover state more easily.
First, adding a header is slow. You have to move the entire file contents to add something at the start. Adding a trailer would be very fast.
Second, use nohup:
nohup - run a command immune to hangups, with output to a non-tty
Using nohup sed -i '1ichr\tpos\tref\talt\treffrq\tinfo\trs\tpval\teffalt\tgene' *.txt will keep the command running on the background even if the server times you out.

Unix Shell Script: sleep command not working

i have a scenario in which i need to download files through curl command and want my script to pause for some time before downloading the second one. I used sleep command like
sleep 3m
but it is not working.
any idea ???
thanks in advance.
Make sure your text editor is not putting a /r /n and only a /n for every new line. This is typical if you are writing the script on windows.
Use notepad++ (windows) and go to edit|EOL convention|UNIX then save it. If you are stuck with it on the computer, i have read from here [talk.maemo.org/showthread.php?t=67836] that you can use [tr -d "\r" < oldname.sh > newname.sh] to remove the problem. if you want to see if it has the problem use [od -c yourscript.sh] and /r will occur before any /n.
Other problems I have seen it cause is cd /dir/dir and you get [cd: 1: can't cd to /dir/dir] or copy scriptfile.sh newfilename the resulting file will be called newfilenameX where X is an invisible character (ensure you can delete it before trying it), if the file is on a network share, a windows machine can see the character. Ensure it is not the last line for a successful test.
Until i figured it out (i knew i had to ask google for something that may manifest in various ways) i thought that there was an issue with this linux version i was using (sleep not working in a script???).
Are you sure you are using sleep the right way? Based on your description, you should be invoking it as:
sleep 180
Is this the way you are doing it?
You might also want to consider wget command as it has an explicit --wait flag, so you might avoid having the loop in the first place.
while read -r urlname
do
curl ..........${urlname}....
sleep 180 #180 seconds is 3 minutes
done < file_with_several_url_to_be_fetched
?

Loop "paste" function over multiple files in the same folder

I'm trying to concatenate horizontally a number of files (1000) *.txt in a folder.
How can I loop over the files using the "paste" function?
NB: all the *.txt files are in the same directory.
Why loop? You can use wildcards.
paste *.txt > combined.txt
In general, it would be a question of just calling paste *.txt (and redirecting the output: paste *.txt > output.txt, as #zx did). Try it, but you'll be generating some enormously long lines. If paste can`t handle the line length you'll be generating, you'll have to reproduce its effect using a scripting language that has no line length limit, like perl or python.
Another possible sticking point is if your shell can't handle this many arguments in the expansion of the glob *.txt. Again, you can solve that with a script. It's easy to do so if that's your situation, let us know here.
PS. Given what paste does, looping is not going to do it for you: You (presumably) need the file contents side by side in the output, not one after the other.

Viewing Unix Log Files

We are having a discussion at work, what is the best UNIX command tool that to view log files. One side says use LESS, the other says use MORE. Is one better than the other?
A common problem is that logs have too many processes writing to them, I prefer to filter my log files and control the output using:
tail -f /var/log/<some logfile> | grep <some identifier> | more
This combination of commands allows you to watch an active log file without getting overwhelmed by the output.
I opt for less. A reason for this is that (with aid of lessopen) it can read gzipped log (as archived by logrotate).
As an example with this single command I can read in time ordered mode dpkg log, without treating differently gzipped ones:
less $(ls -rt /var/log/dpkg.log*) | less
Multitail is the best option, because you can view multiple logs at the same time. It also colors stuff, and you can set up regex to highlight entries you're looking for.
You can use any program: less, nano, vi, tail, cat etc, they differ in functionality.
There are also many log viewers: gnome-system-log, kiwi etc (they can sort log by date / type etc)
Less is more. Although since when I'm looking at my logs I'm typically searching for something specific or just interested in the last few events I find myself using cat, pipes and grep or tail rather than more or less.
less is the best, imo. It is light weight compared to an editor, it allows forward and backward navigation, it has powerful search capabilities, and many more things. Hit 'h' for help. It's well worth the time getting familiar with it.
On my Mac, using the standard terminal windows, there's one difference between less and more, namely, after exiting:
less leaves less mess on my screen
more leaves more useful information on my screen
Consequently, if I think I might want to do something with the material I'm viewing after the viewer finishes (for example, copy'n'paste operations), I use more; if I don't want to use the material after I've finished, then I use less.
The primary advantage of less is the ability to scroll backwards; therefore, I tend to use less rather than more, but both have uses for me. YMMV (YMWV; W = Will in this case!).
As your question was generically about 'Unix systems', keep into account that
in some cases you have no choice, for old systems you have only MORE available,
but not LESS.
LESS is part of the GNU tools, MORE comes from the UCB times.
Turn on grep's line buffering mode.
Using tail (Live monitoring)
tail -f fileName
Using less (Live monitoring)
less +F fileName
Using tail & grep
tail -f fileName | grep --line-buffered my_pattern
Using less & grep
less +F fileName | grep --line-buffered my_pattern
Using watch & tail to highlight new lines
watch -d tail fileName
Note: For linux systems.

Remove lines which are between given patterns from a file (using Unix tools)

I have a text file (more correctly, a “German style“ CSV file, i.e. semicolon-separated, decimal comma) which has a date and the value of a measurement on each line.
There are stretches of faulty values which I want to remove before further work. I'd like to store these cuts in some script so that my corrections are documented and I can replay those corrections if necessary.
The lines look like this:
28.01.2005 14:48:38;5,166
28.01.2005 14:50:38;2,916
28.01.2005 14:52:38;0,000
28.01.2005 14:54:38;0,000
(long stretch of values that should be removed; could also be something else beside 0)
01.02.2005 00:11:43;0,000
01.02.2005 00:13:43;1,333
01.02.2005 00:15:43;3,250
Now I'd like to store a list of begin and end patterns like 28.01.2005 14:52:38 + 01.02.2005 00:11:43, and the script would cut the lines matching these begin/end pairs and everything that's between them.
I'm thinking about hacking an awk script, but perhaps I'm missing an already existing tool.
Have a look at sed:
sed '/start_pat/,/end_pat/d'
will delete lines between start_pat and end_pat (inclusive).
To delete multiple such pairs, you can combine them with multiple -e options:
sed -e '/s1/,/e1/d' -e '/s2/,/e2/d' -e '/s3/,/e3/d' ...
Firstly, why do you need to keep a record of what you have done? Why not keep a backup of the original file, or take a diff between the old & new files, or put it under source control?
For the actual changes I suggest using Vim.
The Vim :global command (abbreviated to :g) can be used to run :ex commands on lines that match a regex. This is in many ways more powerful than awk since the commands can then refer to ranges relative to the matching line, plus you have the full text processing power of Vim at your disposal.
For example, this will do something close to what you want (untested, so caveat emptor):
:g!/^\d\d\.\d\d\.\d\d\d\d/ -1 write tmp.txt >> | delete
This matches lines that do NOT start with a date (the ! negates the match), appends the previous line to the file tmp.txt, then deletes the current line.
You will probably end up with duplicate lines in tmp.txt, but they can be removed by running the file through uniq.
you are also use awk
awk '/start/,/end/' file
I would seriously suggest learning the basics of perl (i.e. not the OO stuff). It will repay you in bucket-loads.
It is fast and simple to write a bit of perl to do this (and many other such tasks) once you have grasped the fundamentals, which if you are used to using awk, sed, grep etc are pretty simple.
You won't have to remember how to use lots of different tools and where you would previously have used multiple tools piped together to solve a problem, you can just use a single perl script (usually much faster to execute).
And, perl is installed on virtually every unix/linux distro now.
(that sed is neat though :-)
use grep -L (print none matching lines)
Sorry - thought you just wanted lines without 0,000 at the end

Resources