Loop "paste" function over multiple files in the same folder - unix

I'm trying to concatenate horizontally a number of files (1000) *.txt in a folder.
How can I loop over the files using the "paste" function?
NB: all the *.txt files are in the same directory.

Why loop? You can use wildcards.
paste *.txt > combined.txt

In general, it would be a question of just calling paste *.txt (and redirecting the output: paste *.txt > output.txt, as #zx did). Try it, but you'll be generating some enormously long lines. If paste can`t handle the line length you'll be generating, you'll have to reproduce its effect using a scripting language that has no line length limit, like perl or python.
Another possible sticking point is if your shell can't handle this many arguments in the expansion of the glob *.txt. Again, you can solve that with a script. It's easy to do so if that's your situation, let us know here.
PS. Given what paste does, looping is not going to do it for you: You (presumably) need the file contents side by side in the output, not one after the other.

Related

Need to rename mulitiple files recursively using zmv

So I'm battling with a script at the moment. I'm using zsh. I've tried various combinations, but not coming right. Trying to change file names recursively. So basically I have a variable: file1.
What I'm trying to do is something like this:
zmv -W ${file1}/'**/*(test)*' ${file1}/'**/*red*'
This should change any file or folder in subdirectories recursively from test to red. Hence:
if $file1= /var/log then it should change:
/var/log/jump/greentest.txt to /var/log/jump/greenred.txt
also
/var/log/jump/1/1/test/test.xyz to /var/log/jump/1/1/red/red.xyz
Basically if I did a search:
ls **/*test* it would list all the files and folders recursively that had the word 'test' contained within them. With the zmv solution, I'd like to "find" those instances and change test to red.
How can I do this?
I hope you try these things first with -n....
Aside from this, the only part which looks wrong to me are the parenthesis around test. You introduce a new pattern variable for something which is not a wildcard, but your -W alreay implicitly introduces groups and references. Hence I would try it with
zmv -Wn $file1/'**/*test*' $file1/'**/*red*'
and if it works, remove the -n.

Split files linux and then grep

I'd like to split a file and grep each piece without writing them to indvidual files.
I've attempted a couple variations of split and grep and no such luck; any suggestions?
Something along the lines of:
split -b SIZE filename | grep "string"
I've attempted grep/fgrep to find the string but my shell complains that the files are too large. See: use fgrep instead
There is no point in splitting the file if you plan to [linearly] search each of the pieces anyway (assuming that's the only thing you are doing with it). Consider running grep on the entire file.
If however you plan to utilize the fact that the file is split later on, then the typical way would be:
Create a temporary directory and step into it
Run split/csplit on the original file
Use for loop over written fragment to do your processing.

Remove lines which are between given patterns from a file (using Unix tools)

I have a text file (more correctly, a “German style“ CSV file, i.e. semicolon-separated, decimal comma) which has a date and the value of a measurement on each line.
There are stretches of faulty values which I want to remove before further work. I'd like to store these cuts in some script so that my corrections are documented and I can replay those corrections if necessary.
The lines look like this:
28.01.2005 14:48:38;5,166
28.01.2005 14:50:38;2,916
28.01.2005 14:52:38;0,000
28.01.2005 14:54:38;0,000
(long stretch of values that should be removed; could also be something else beside 0)
01.02.2005 00:11:43;0,000
01.02.2005 00:13:43;1,333
01.02.2005 00:15:43;3,250
Now I'd like to store a list of begin and end patterns like 28.01.2005 14:52:38 + 01.02.2005 00:11:43, and the script would cut the lines matching these begin/end pairs and everything that's between them.
I'm thinking about hacking an awk script, but perhaps I'm missing an already existing tool.
Have a look at sed:
sed '/start_pat/,/end_pat/d'
will delete lines between start_pat and end_pat (inclusive).
To delete multiple such pairs, you can combine them with multiple -e options:
sed -e '/s1/,/e1/d' -e '/s2/,/e2/d' -e '/s3/,/e3/d' ...
Firstly, why do you need to keep a record of what you have done? Why not keep a backup of the original file, or take a diff between the old & new files, or put it under source control?
For the actual changes I suggest using Vim.
The Vim :global command (abbreviated to :g) can be used to run :ex commands on lines that match a regex. This is in many ways more powerful than awk since the commands can then refer to ranges relative to the matching line, plus you have the full text processing power of Vim at your disposal.
For example, this will do something close to what you want (untested, so caveat emptor):
:g!/^\d\d\.\d\d\.\d\d\d\d/ -1 write tmp.txt >> | delete
This matches lines that do NOT start with a date (the ! negates the match), appends the previous line to the file tmp.txt, then deletes the current line.
You will probably end up with duplicate lines in tmp.txt, but they can be removed by running the file through uniq.
you are also use awk
awk '/start/,/end/' file
I would seriously suggest learning the basics of perl (i.e. not the OO stuff). It will repay you in bucket-loads.
It is fast and simple to write a bit of perl to do this (and many other such tasks) once you have grasped the fundamentals, which if you are used to using awk, sed, grep etc are pretty simple.
You won't have to remember how to use lots of different tools and where you would previously have used multiple tools piped together to solve a problem, you can just use a single perl script (usually much faster to execute).
And, perl is installed on virtually every unix/linux distro now.
(that sed is neat though :-)
use grep -L (print none matching lines)
Sorry - thought you just wanted lines without 0,000 at the end

paste without temporary files in Unix

I'm trying to use the Unix command paste, which is like a column-appending form of cat, and came across a puzzle I've never known how to solve in Unix.
How can you use the outputs of two different programs as the input for another program (without using temporary files)?
Ideally, I'd do this (without using temporary files):
./progA > tmpA;
./progB > tmpB; paste tmpA tmpB
This seems to come up relatively frequently for me, but I can't figure out how to use the output from two different programs (progA and progB) as input to another without using temporary files (tmpA and tmpB).
For commands like paste, simply using paste $(./progA) $(./progB) (in bash notation) won't do the trick, because it can read from files or stdin.
The reason I'm wary of the temporary files is that I don't want to have jobs running in parallel to cause problems by using the same file; ensuring a unique file name is sometimes difficult.
I'm currently using bash, but would be curious to see solutions for any Unix shell.
And most importantly, am I even approaching the problem in the correct way?
Cheers!
You do not need temp files under bash, try this:
paste <(./progA) <(./progB)
See "Process Substitution" in the Bash manual.
Use named pipes (FIFOs) like this:
mkfifo fA
mkfifo fB
progA > fA &
progB > fB &
paste fA fB
rm fA fB
The process substitution for Bash does a similar thing transparently, so use this only if you have a different shell.
Holy moly, I recent found out that in some instances, you can get your process substitution to work if you set the following inside of a bash script (should you need to):
set +o posix
http://www.linuxjournal.com/content/shell-process-redirection
From link:
"Process substitution is not a POSIX compliant feature and so it may have to be enabled via: set +o posix"
I was stuck for many hours, until I had done this. Here's hoping that this additional tidbit will help.
Works in all shells.
{
progA
progB
} | paste

Unable to combine pwd with filenames in Zsh

Problem: to combine PATHs with filenames, such that I can easily source many files.
I have two files A and B. ls gives their names clearly.
I run
pwd `ls`
I get the error message
too many arguments
I did not find an option for pwd which would allow me to have more than one argument.
How can you combine pwd's output to filenames.
echo $PWD/*
In addition to sigjuice's answer, if, as you state, you need it for sourcing many files, simply use
source ./*
It'll probably burn less CPU cycles because the shell doesn't have to create absolute path names for each file.

Resources