How can I use the command line to combine multiple text files? - unix

Say I have 3 or more files; is there any way I can combine them into a single document? Example below.
File1:
abc123
File2:
2468, def
File3:
zyx987
I want the outcome to be
CombinedFile:
abc123 2468, def zyx987

There are different ways:
I tested with f1, f2, f3. If the names follow the pattern fXX, it can be done like this:
$ paste f*        # default delimiter is a tab
abc123	2468, def	zyx987
$ paste -d' ' f*  # set space as delimiter
abc123 2468, def zyx987
$ cat f*
abc123
2468, def
zyx987
If you want the output to be a file, just add > result
$ cat f* > result
$ cat result
abc123
2468, def
zyx987
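Note that cat keeps each file's trailing newline, so result above has three lines. To get the single-line CombinedFile the question asks for, redirect paste instead (the same commands as above, just redirected):
$ paste -d' ' f* > CombinedFile
$ cat CombinedFile
abc123 2468, def zyx987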

Here is another way, using pr. First the input files:
$ head f*
==> f1 <==
abc123
==> f2 <==
2468, def
==> f3 <==
zyx987
$ pr -mts' ' f{1,2,3}
abc123 2468, def zyx987
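For reference, -m merges the files in parallel columns, -t suppresses pr's page headers, and -s' ' sets the column separator to a single space. The brace expansion f{1,2,3} is just shell shorthand for f1 f2 f3, so the command can also be written out explicitly:
$ pr -mts' ' f1 f2 f3
abc123 2468, def zyx987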

grep string without anything surrounding it

I have a directory with multiple files, let's call these 1.txt, 2.txt, etc. Each file consists of 3 columns: an ID, a lat, and a lon. Unfortunately, some of the IDs may themselves be numbers (e.g., 1346, 248, 67912, etc.). I am trying to count the number of instances a station ID occurs among all files (1.txt, 2.txt, etc.) based on a master file (masterfile.txt). So far I have:
while IFS='' read -r line || [[ -n "$line" ]]
do
    cat * | grep -w -c "$line" >> counting_filename.txt
done < masterfile.txt
This works great. However, if the lat and/or lon contains the particular ID, that is counted too. For example, if I am looking for the station ID 4575, a lat of '47.4575' or a lon of '-77.4575' also goes toward the count. Therefore, there are two solutions I could think of that I can't figure out:
1) Just grep the first column of instances, or
2) grep while NOT including the leading '.'
For example:
1.txt
4575 39.4575 -77.51
5010 38.3498 -78.4575
LAMS 38.4444 -78.3126
2.txt
3124 39.1010 -79.4575
4575 39.4575 -77.5010
PAOQ 39.2222 -78.0032
If I ran the above command, I would get a count of 6 for 4575, 2 for 5010, 1 for LAMS, 1 for 3124, and 1 for PAOQ.
What is desired is: 2 for 4575, 1 for 5010, 1 for LAMS, 1 for 3124, and 1 for PAOQ.
Any thoughts?
You're using the wrong tools - a small, simple awk script will be far more robust, efficient, and portable than a mixture of shell loops, read, grep, etc.
It's not clear what masterfile.txt is for; from your example it looks like this is all you need:
$ awk '{cnt[$1]++} END{for (id in cnt) print id, cnt[id]}' 1.txt 2.txt
LAMS 1
PAOQ 1
3124 1
4575 2
5010 1
If you need masterfile.txt to list a specific set of IDs rather than just producing counts for all IDs as above then you can do that too:
$ cat masterfile.txt
4575
3124
PAOQ
BLAH
$ awk 'NR==FNR{ids[$1];next} $1 in ids{cnt[$1]++} END{for (id in cnt) print id, cnt[id]}' masterfile.txt 1.txt 2.txt
PAOQ 1
3124 1
4575 2
$ awk 'NR==FNR{ids[$1];next} $1 in ids{cnt[$1]++} END{for (id in ids) print id, cnt[id]+0}' masterfile.txt 1.txt 2.txt
BLAH 0
PAOQ 1
3124 1
4575 2
I added BLAH to show the different options you have for handling an ID from masterfile.txt that doesn't appear in your other files.
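If you would rather stay close to the original grep loop, anchoring the pattern to the start of the line restricts the match to the first column. A minimal sketch, assuming the data files are 1.txt and 2.txt as in the example (and that the IDs contain no regex metacharacters):
while IFS='' read -r line || [[ -n "$line" ]]
do
    cat 1.txt 2.txt | grep -c -E "^${line}([[:space:]]|$)" >> counting_filename.txt
done < masterfile.txt
This counts only lines whose first field is exactly the ID, so a lat of 47.4575 no longer matches 4575.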

print a range of lines if they match a pattern

My input
1
abc
1cde
efg
xxx
1
abc
pattern1
pattern2
efg
xxx
1
abc
cde
efg
xxx
my expected output (print the block starting at 1 if it contains pattern1 and pattern2):
1
abc
pattern1
pattern2
efg
xxx
So far I have:
sed -n '/^1/ {x;/pattern1/ {N;/\n.*pattern2/p};d} $/^1/ {h;/pattern1/ {N;/\n.*pattern2/p};d}}H' My file
BTW, my file is very big, so please show me a method that can do it quickly.
Thanks so much.
sed is for s/old/new/ - that is all. For anything else you should be using awk.
It looks like your expected output can't actually be produced from your sample input, so this is a guess and untested since we don't have anything concrete to test against, but it sounds like you might want:
awk -v RS= -v ORS='\n\n' '/pattern1/ && /pattern2/' file
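For what it's worth, here is how that behaves on a small test, assuming the blocks in the real file are separated by blank lines: RS= puts awk in paragraph mode, so each blank-line-separated block is one record, and a record is printed only when both regexes match somewhere inside it.
$ cat file
1
abc
pattern1
pattern2
efg
xxx

1
abc
cde
efg
xxx
$ awk -v RS= -v ORS='\n\n' '/pattern1/ && /pattern2/' file
1
abc
pattern1
pattern2
efg
xxx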

How to handle "-" in grep?

It's been asked several times, but it's not clear to me yet.
I have the following text in a file (data.txt, tab-delimited):
ABC 12
ABC-AS 14
DEF 18
DEF-AS 9
Now I want to search for ABC and DEF, but I do not want ABC-AS or DEF-AS in the result.
grep -w ABC data.txt
Output:
ABC 12
ABC-AS 14
grep --no-group-separator -w "ABC" data.txt
ABC 12
ABC-AS 14
grep --group-separator="\t" -w "ABC" data.txt
ABC 12
ABC-AS 14
With a regex
grep -E "(ABC|DEF)[^\-]" data.txt
Details
(ABC|DEF): Match "ABC" or "DEF"
[^\-]: Anything except "-"
Output
ABC 12
DEF 18
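One caveat: [^\-] demands some character after the ID, so an ID at the very end of a line would be missed, and it only excludes "-" specifically. Anchoring to the start of the line and requiring whitespace after the ID is tighter (a sketch, assuming the IDs are always in the first column):
grep -E "^(ABC|DEF)[[:space:]]" data.txt
ABC 12
DEF 18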
Try grep --line-regexp (-x), which selects only matches that exactly match a whole line. The IDs here share the line with a second field, so apply it to the first column:
cut -f1 data.txt | grep --line-regexp "ABC"
ABC
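If awk is acceptable, comparing the first field exactly sidesteps the word-boundary rules around "-" entirely (a sketch, assuming tab-delimited input as stated):
awk -F'\t' '$1 == "ABC"' data.txt
ABC 12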

Convert specific column of file into upper case in unix (without using awk and sed)

My file is as below
file name = test
1 abc
2 xyz
3 pqr
How can I convert the second column of the file to upper case without using awk or sed?
You can use tr to transform from lowercase to uppercase. cut will extract the single columns and paste will combine the separated columns again.
Assumption: Columns are delimited by tabs.
paste <(cut -f1 file) <(cut -f2 file | tr '[:lower:]' '[:upper:]')
Replace file with your file name (that is test in your case).
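If the columns are separated by single spaces rather than tabs (as the sample suggests), the same idea works with an explicit delimiter; a sketch:
paste -d' ' <(cut -d' ' -f1 test) <(cut -d' ' -f2 test | tr '[:lower:]' '[:upper:]')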
In pure bash
#!/bin/bash
while read -r col1 col2
do
    printf '%s %s\n' "$col1" "${col2^^}"
done < file > output-file
Input-file
$ cat file
1 abc
2 xyz
3 pqr
Output-file
$ cat output-file
1 ABC
2 XYZ
3 PQR
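Note that the ${col2^^} expansion needs bash 4 or newer. On an older bash, the same loop can shell out to tr instead (a sketch under that assumption):
#!/bin/bash
while read -r col1 col2
do
    printf '%s %s\n' "$col1" "$(tr '[:lower:]' '[:upper:]' <<< "$col2")"
done < file > output-file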

AWK to use multiple spaces as delimiter

I am using the command below to join two files on their first two columns.
awk 'NR==FNR{a[$1,$2]=substr($0,3);next} ($1,$2) in a{print $0, a[$1,$2] > "br0102_3.txt"}' br01.txt br02.txt
Now, by default awk uses whitespace as the field separator, but a field in my file may itself contain a single space between two words, while the fields are separated by two or more spaces, e.g.
File 1:
ABCD  TEXT1 TEXT2  123123112312312312312312312312312312
BCDEFG  TEXT3TEXT4  133123123123123123123123123125423423
QWERT  TEXT5TEXT6  123123123123125456678786789698758567
File 2:
ABCD  TEXT1 TEXT2  12312312312312312312312312312
BCDEFG  TEXT3TEXT4  31242342342342342342342342343
MNHT  TEXT8 TEXT9  31242342342342342342342342343
I want the result file to be:
ABCD  TEXT1 TEXT2  123123112312312312312312312312312312 12312312312312312312312312312
BCDEFG  TEXT3TEXT4  133123123123123123123123123125423423 31242342342342342342342342343
QWERT  TEXT5TEXT6  123123123123125456678786789698758567
MNHT  TEXT8 TEXT9  31242342342342342342342342343
Any hints?
awk supports a regular expression as the value of FS so you can specify a regular expression that matches at least two spaces. Something like -F '[[:space:]][[:space:]]+'.
$ awk '{print NF}' File2
4
3
4
$ awk -F '[[:space:]][[:space:]]+' '{print NF}' File2
3
3
3
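Applied to the join from the question, that FS keeps "TEXT1 TEXT2" together as a single field, so ($1,$2) really is the ID plus the text. A sketch of the idea only (inner join, no handling of unmatched lines, output redirection omitted):
awk -F'[[:space:]][[:space:]]+' 'NR==FNR{a[$1,$2]=$3; next} ($1,$2) in a{print $0, a[$1,$2]}' br01.txt br02.txt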
You are using fixed-width fields, so you should be using GNU awk's FIELDWIDTHS (or similar) to separate the fields, e.g. if the 2nd field is the 15 chars from char 8 to char 22 inclusive in this file:
$ cat file
abc    def ghi        klm
AAAAAAAB C D E F G H IJJJJ
abc     def ghi       klm
$ awk -v FIELDWIDTHS="7 15 4" '{print "<" $2 ">"}' file
<def ghi        >
<B C D E F G H I>
< def ghi       >
Any solution that relies on a certain number of spaces between fields will fail when you have 1 or zero spaces between your fields.
If you want to strip leading/trailing blanks from your target field(s):
$ awk -v FIELDWIDTHS="7 15 4" '{gsub(/^\s+|\s+$/,"",$2); print "<" $2 ">"}' file
<def ghi>
<B C D E F G H I>
<def ghi>
awk automatically treats runs of multiple spaces as a single separator if the field separator is set to " " (which is also awk's default).
Thus, this simply works:
awk -F' ' '{ print $2 }'
to get the second column if you have a table like the one mentioned.
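A quick way to confirm that behavior (FS=" " is in fact awk's default, and it splits on runs of blanks rather than on a literal single space):
$ echo 'a   b  c' | awk -F' ' '{print NF}'
3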
