How to cat using part of a filename in terminal? - unix

I'm using terminal on OS 10.X. I have some data files of the format:
mbh5.0_mrg4.54545454545_period0.000722172513951.params.dat
mbh5.0_mrg4.54545454545_period0.00077271543854.params.dat
mbh5.0_mrg4.59090909091_period-0.000355232058085.params.dat
mbh5.0_mrg4.59090909091_period-0.000402015664015.params.dat
I know that there will be some files with similar numbers after mbh and mrg, but I won't know ahead of time what the numbers will be or how many similarly numbered ones there will be. My goal is to cat all the data from all the files with similar numbers after mbh and mrg into one data file. So from the above I would want to do something like...
cat mbh5.0_mrg4.54545454545*dat > mbh5.0_mrg4.54545454545.dat
cat mbh5.0_mrg4.5909090909*dat > mbh5.0_mrg4.5909090909.dat
I want to automate this process because there will be many such files.
What would be the best way to do this? I've been looking into sed, but I don't have a solution yet.

for file in *.params.dat; do
    prefix=${file%_*}
    cat "$file" >> "$prefix.dat"
done
The ${file%_*} expansion removes the last underscore and everything after it from the end of $file, and the result is saved in the prefix variable. Because the loop appends with >>, delete any old output files before running it again. (Ref: http://www.gnu.org/software/bash/manual/bashref.html#Shell-Parameter-Expansion)
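As a minimal sketch of what that expansion does, the following runs the same loop in a scratch directory. The scratch directory, the shortened file names and the one-letter file contents are made up for illustration; only the loop itself comes from the answer above.

```shell
# Work in a throwaway directory so no real data is touched (illustrative setup).
workdir=$(mktemp -d "${TMPDIR:-/tmp}/catdemo.XXXXXX")
cd "$workdir" || exit 1

# Hypothetical sample files that follow the naming pattern from the question.
printf 'a\n' > mbh5.0_mrg4.5_period0.1.params.dat
printf 'b\n' > mbh5.0_mrg4.5_period0.2.params.dat
printf 'c\n' > mbh5.0_mrg4.6_period-0.3.params.dat

for file in *.params.dat; do
    # Remove the shortest suffix matching "_*": the last underscore and what follows.
    prefix=${file%_*}
    cat "$file" >> "$prefix.dat"
done

# One output file per mbh/mrg pair.
ls mbh5.0_mrg4.5.dat mbh5.0_mrg4.6.dat
```

The glob is expanded once, before the loop starts, so the newly created output files are not picked up as inputs.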

It's not 100% clear to me what you're trying to achieve here but if you want to aggregate files into a file with the same number after "mbh5.0_mrg4." then you can do the following.
ls -l mbh5.0_mrg4* | awk '{print "cat " $9 " >> mbh5.0_mrg4." substr($9,13,11) ".dat" }' | /bin/bash
The "ls -l" lists the files and awk takes the 9th column (the file name) from its output. substr($9,13,11) picks the 11 characters that follow "mbh5.0_mrg4.", and with some string concatenation the resulting cat commands are passed to /bin/bash to be executed. The commands append with >> so that each file in a group is added to the output file instead of overwriting it.
This is a Linux bash script, so it assumes you have /bin/bash; I'm not 100% familiar with OS X. The script also assumes that the number you're grouping on is always in the same place in the filename. I think you can change /bin/bash to almost any shell you have installed.
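The same fixed-position idea can also be sketched as a loop over a glob, which avoids parsing ls output and generating commands as text. This is a variation rather than the method above: it assumes, as that answer does, that the grouping number always occupies characters 13 to 23 of the name, and the sample files and their contents are hypothetical.

```shell
workdir=$(mktemp -d "${TMPDIR:-/tmp}/catdemo.XXXXXX")
cd "$workdir" || exit 1

# Hypothetical sample files; the period values are shortened for readability.
printf 'a\n' > mbh5.0_mrg4.54545454545_period0.1.params.dat
printf 'b\n' > mbh5.0_mrg4.54545454545_period0.2.params.dat
printf 'c\n' > mbh5.0_mrg4.59090909091_period-0.3.params.dat

for f in mbh5.0_mrg4.*.params.dat; do
    # Characters 13-23 are the 11 digits that follow "mbh5.0_mrg4.".
    key=$(printf '%s' "$f" | cut -c13-23)
    cat "$f" >> "mbh5.0_mrg4.$key.dat"
done

ls
```

If the numbers can vary in length, the ${file%_*} approach is the safer choice because it does not depend on character positions.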

Related

Trim a file name in Unix

I have a file with name
ROCKET_25_08:00.csv
I want to trim the name of the file to
ROCKET_25_.csv
I tried mv, but a single mv is not enough because there may be more than one such file.
I want the name up to the second _.
How can I do that in Unix?
Please advise.
There are some utilities that provide more flexible renaming. But one solution that uses nothing other than the included UNIX tools (like sed) would be:
ls -d * | sed -re 's/^([^_]*_[^_]*_)(.*)(\....)$/mv -v \1\2\3 \1\3/' | bash
This only works on the current directory; it won't process subdirectories.
It's not at all clear what you are actually trying to do, but if you just want to remove text between the last underscore and the period, you can do:
f=ROCKET_25_08:00.csv
echo ${f%_*}_.csv
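To apply that expansion to several files at once, a loop along these lines could be used. It is a sketch under assumptions: the two sample file names are invented, and the guard that skips a rename when the target already exists is an added precaution, not something the question asked for.

```shell
workdir=$(mktemp -d "${TMPDIR:-/tmp}/trimdemo.XXXXXX")
cd "$workdir" || exit 1
touch ROCKET_25_08:00.csv ROCKET_26_09:30.csv   # hypothetical sample files

for f in *_*_*.csv; do
    # Drop everything from the last underscore onward, then re-append "_.csv".
    new="${f%_*}_.csv"
    # Skip the rename if the target already exists, to avoid overwriting a file.
    [ -e "$new" ] || mv -- "$f" "$new"
done

ls
```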

Why not pipe list of file names into cat?

What is the design rationale that cat doesn't take list of file names from pipe input? Why did the designers choose that the following does not work?
ls *.txt | cat
Instead of this, they chose that we need to pass the file names as argument to cat as:
ls *.txt | xargs cat
When you say ls *.txt | cat doesn't work, you should say that it doesn't work as you expect. In fact, it works exactly the way it was designed to work.
From man:
cat - Concatenate FILE(s), or standard input, to standard output
Suppose the following output:
$ ls *.txt
file1.txt
file2.txt
... the input to cat will be:
file1.txt
file2.txt
...and that's exactly what cat writes to standard output.
In some shells, it's equivalent to:
cat <(ls *.txt)
or
ls *.txt > tmpfile; cat tmpfile
So cat is really working the way its designers intended.
On the other hand, you are expecting cat to interpret its input as a set of filenames whose contents it should read and concatenate, but when you pipe to cat, that input is treated as a single file.
In short, cat is a command that, like echo, cp and a few others, does not turn a piped input stream into arguments.
So xargs is used to pass the input stream as arguments to the command.
More details here: http://en.wikipedia.org/wiki/Xargs
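The difference can be seen in a small experiment. This is only a sketch: the two files and their contents are invented, and printf stands in for ls so that the list of names is fixed.

```shell
workdir=$(mktemp -d "${TMPDIR:-/tmp}/xargsdemo.XXXXXX")
cd "$workdir" || exit 1
printf 'hello\n' > file1.txt
printf 'world\n' > file2.txt

# With no arguments, cat copies its standard input, so it echoes the names.
names=$(printf '%s\n' file1.txt file2.txt | cat)

# xargs turns the words on its input into arguments, so cat opens the files.
contents=$(printf '%s\n' file1.txt file2.txt | xargs cat)

printf 'piped to cat:\n%s\n' "$names"
printf 'piped to xargs cat:\n%s\n' "$contents"
```

Note that xargs splits its input on whitespace by default, so file names containing spaces need extra care.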
As a former Unix SA and now a Python developer, I would compare xargs to StringIO/cStringIO in Python, as it helps in a similar way.
As for your question of why they didn't allow stream input, here is what I think:
Nobody but the designers could answer this.
I believe, however, that cat is meant to print the content of a file to stdout, while echo is meant to print the content of a string to stdout.
Each of these commands had a specific role when it was created.

Efficient way to add two lines at the beginning of a very large file

I have a group of very large text files (a couple of GB each). I need to add two lines at the beginning of each of these files.
I tried using sed with the following command
sed -i '1iFirstLine'
sed -i '2iSecondLine'
The problem with sed is that it reads through the entire file even though it only has to add two lines at the beginning, and therefore it takes a lot of time.
Is there an alternate way to do this more efficiently, without reading the entire file?
You should try
echo "FirstLine" > newfile.txt
echo "SecondLine" >> newfile.txt
cat oldfile.txt >> newfile.txt
mv newfile.txt oldfile.txt
This one works perfectly and it's extremely fast too.
perl -pi -e '$.=0 if eof;print "first line\nsecond line\n" if ($.==1)' *.txt
Adding lines at the beginning is not possible without rewriting the file (unlike appending to the end). You simply cannot "shift" the file content, as filesystems do not generally support that. So you should do:
echo -e "line 1\nLine2" > tmp.txt
cat tmp.txt oldbigfile.txt > newbigfile.txt
rm oldbigfile.txt
mv newbigfile.txt oldbigfile.txt
Note that you need enough disk space to hold both files for a while.
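The rewrite can be done in a single pass over the data. The following is a minimal sketch, not a faster algorithm: the file name, its tiny contents and the two header lines are placeholders, and the temporary file is created next to the original so that the final mv stays on the same filesystem.

```shell
workdir=$(mktemp -d "${TMPDIR:-/tmp}/prepend.XXXXXX")
big="$workdir/oldbigfile.txt"              # hypothetical stand-in for the large file
printf 'data 1\ndata 2\n' > "$big"

tmp=$(mktemp "$workdir/new.XXXXXX")
# Write the two new lines, then stream the original file after them.
# The original is replaced only if the copy succeeded.
{ printf '%s\n' 'FirstLine' 'SecondLine'; cat "$big"; } > "$tmp" && mv "$tmp" "$big"

cat "$big"
```

Using printf instead of echo -e avoids differences in how shells handle escape sequences in echo.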

unix command to read line from a file by passing line number

I am looking for a unix command to get a single line by passing line number to a big file (with around 5 million records). For example to get 10th line, I want to do something like
command file-name 10
Is there any such command available? We can do this by looping through each record but that will be time consuming process.
This forum entry suggests:
sed -n '52p' (file)
for printing the 52nd line of a file.
There are a lot of ways to do this and related tasks.
If you want multiple lines to be printed:
sed -n -e 'Np' -e 'Mp'
where N and M are the numbers of the only lines that will be printed. See "10 Awesome Examples for Viewing Huge Log Files in Unix".
command | sed -n '10p'
or
sed -n '10p' file
You could do something like:
head -n<lineno> <file> | tail -n1
That would give you the first <lineno> lines, then keep only the last line of that output (your line).
Edit: It seems all the solutions here are pretty slow. However, by definition you have to iterate through all the preceding records: files are byte-oriented, so the operating system has no way to jump straight to a given line. (In some sense, all these programs do is count the number of \n or \r characters.) In lieu of a great answer, I'll also present the timings of several of these commands on my system!
[mjschultz@mawdryn ~]$ time sed -n '145430980p' br.txt
0b10010011111111010001101111010111
real 0m25.871s
user 0m17.315s
sys 0m2.360s
[mjschultz@mawdryn ~]$ time head -n 145430980 br.txt | tail -n1
0b10010011111111010001101111010111
real 0m41.112s
user 0m39.385s
sys 0m4.291s
[mjschultz@mawdryn ~]$ time awk 'NR==145430980{print;exit}' br.txt
0b10010011111111010001101111010111
real 2m8.835s
user 1m38.076s
sys 0m3.337s
So, on my system, it looks like the sed -n '<lineno>p' <file> solution is fastest!
You can use awk:
awk 'NR==10{print;exit}' file
Put an exit after printing the 10th line so that awk won't process the 5 million records file further.
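The same early exit is available in sed through its q command. The helper below is a sketch that matches the "command file-name 10" shape asked for in the question; the name nth_line and the generated sample file are made up for illustration, and no timing is claimed for it.

```shell
# nth_line FILE N: print line N of FILE, then stop reading (hypothetical helper).
nth_line() {
    sed -n "${2}{p;q;}" "$1"
}

# Build a small sample file with 100 numbered records.
sample=$(mktemp "${TMPDIR:-/tmp}/lines.XXXXXX")
awk 'BEGIN { for (i = 1; i <= 100; i++) print "record " i }' > "$sample"

nth_line "$sample" 10
```

Every line before the requested one still has to be read, so the cost grows with the line number, not with the file size.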

Unix [Homework]: Get a list of /home/user/ directories in /etc/passwd

I'm very new to Unix, and currently taking a class learning the basics of the system and its commands.
I'm looking for a single command line to list off all of the user home directories in alphabetical order from the /etc/passwd directory. This applies only to the home directories, and not the contents within them. There should be no duplicate entries. I've tried many permutations of commands such as the following:
sort -d | find /etc/passwd /home/* -type -d | uniq | less
I've tried using -path, -name, removing -type, using -prune, and changing the search pattern to things like /home/*/$, but haven't gotten good results once. At best I can get a list of my own directory (complete with every directory inside it, which is bad), and the directories of the other students on the server (without the contained directories, which is good). I just can't get it to display the /home/user directories and nothing else for my own account.
Many thanks in advance.
/etc/passwd is a file. The home directory is usually field (column) 6, where ":" is the delimiter. When you are dealing with a file structure that uses distinct characters as delimiters, you should use a tool that can break your data down into fields for easier manipulation. awk, cut and similar tools can do the job, and so can the shell itself with the IFS variable set. For example:
awk -F":" '{print $6}' /etc/passwd | sort
cut -d":" -f6 /etc/passwd | sort
Using the shell to read the file:
while IFS=":" read -r a b c d e home_dir g
do
    echo "$home_dir"
done < /etc/passwd | sort
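Since the question also asks for no duplicate entries, sort -u can replace plain sort. The sketch below runs against a small made-up file in passwd format instead of the real /etc/passwd, so that the result is predictable; the user names and paths in it are invented.

```shell
sample=$(mktemp "${TMPDIR:-/tmp}/passwd.XXXXXX")
# Hypothetical entries; two accounts deliberately share one home directory.
cat > "$sample" <<'EOF'
root:x:0:0:root:/root:/bin/sh
bob:x:1001:1001:Bob:/home/bob:/bin/sh
alice:x:1000:1000:Alice:/home/alice:/bin/sh
svc:x:999:999::/home/alice:/usr/sbin/nologin
EOF

# Field 6 is the home directory; sort -u sorts and drops duplicate lines.
homes=$(cut -d: -f6 "$sample" | sort -u)
printf '%s\n' "$homes"
```

Replacing "$sample" with /etc/passwd gives the single command line for the real file.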
I think the tools you want are grep, tr and awk. Grep will give you lines from the file that actually contain home directories. tr will let you break up the delimiter into spaces, which makes each line easier to parse.
Awk is just one program that would help you display the results that you want.
Good luck :)
Another hint, try ls --color=auto /etc, passwd isn't the kind of file that you think it is. Directories show up in blue.
In Unix, find is a command for finding files under one or more directories. I think you are looking for a command for finding lines within a file that match a pattern? Look into the command grep.
sed 's|\(.[^:]*\):\(.[^:]*\):\(.*\):\(.[^:]*\):\(.[^:]*\)|\4|' /etc/passwd|sort
I think all this processing could be avoided. There is a utility to list directory contents.
ls -1 /home
If you'd like the order of the sorting reversed
ls -1r /home
Granted, this lists just the directory names and doesn't include the '/home/' prefix, but that can be added back easily enough if desired with something like this:
ls -1 /home | (while read -r line; do echo "/home/$line"; done)
I used something like:
ls -l -d $(cut -d':' -f6 /etc/passwd) 2>/dev/null | sort -u
The only thing I didn't do is sort alphabetically; I haven't figured that out yet.

Resources