I have the following tab-separated file:
1 879375 879375
1 899892 899892
1 949363 949363
1 949523 949523
1 949696 949696
1 949739 949739
1 955619 955619
1 957605 957605
1 957693 957693
and I have used the following Unix command to add 1 to each value in column 3:
awk '{$3+=1}1' file > new_file
However, the new file loses its tab separators, and I would like to keep them.
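You are on the right path. You need to set FS (field separator) and OFS (output field separator) to \t in your code. When you assign to a field such as $3, awk rebuilds the whole line using OFS, which defaults to a single space; that is why the tabs are lost.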
awk 'BEGIN{FS=OFS="\t"} {$3+=1}1' Input_file
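For comparison, here is a rough Python sketch of the same split, modify and rejoin step that the command above performs. The function name increment_third_column is hypothetical, and the sketch assumes every line has at least three tab-separated fields with an integer in the third one (awk would silently treat a non-numeric field as 0; this raises an error instead).

```python
def increment_third_column(line: str, sep: str = "\t") -> str:
    """Add 1 to the third field and rejoin with the same separator.

    Mirrors awk 'BEGIN{FS=OFS="\t"} {$3+=1}1': split on FS, modify one
    field, then join with OFS. Assumes the third field is an integer.
    """
    fields = line.rstrip("\n").split(sep)
    fields[2] = str(int(fields[2]) + 1)
    return sep.join(fields)
```

To process a whole file, call it on each input line (for example each line of sys.stdin) and print the result.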
I would like to split the following file based on the pattern ABC:
ABC
4
5
6
ABC
1
2
3
ABC
1
2
3
4
ABC
8
2
3
to get file1:
ABC
4
5
6
file2:
ABC
1
2
3
etc.
According to man csplit, the syntax is: csplit my_file /regex/ {num}.
I can split this file with csplit my_file '/^ABC$/' {2}, but this requires me to give a number for {num}. When I try {*}, which is supposed to repeat the pattern as many times as possible, I get the error:
csplit: *}: bad repetition count
I am using zsh.
To split a file on a pattern like this, I would turn to awk:
awk 'BEGIN { i = 0 }
/^ABC/ { ++i }
{ print >> ("file" i) }' < input
This reads lines from the file named input. Before any lines are read, the BEGIN section explicitly initializes the variable i to zero; awk variables default to zero anyway, but it never hurts to be explicit. The variable i is the index used to number the output files.
Each line that starts with "ABC" then increments i.
Every line of the file, including the ABC lines themselves, is then printed (in append mode) to the file whose name is the text "file" followed by the current value of i, giving file1, file2, file3 and so on (any lines before the first ABC would go to file0). Because of append mode, delete any existing output files before rerunning the command, or the new output will be added to the old.
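For illustration, the same chunking logic can be sketched in Python. The helper names split_on_pattern and write_chunks are hypothetical, and the sketch assumes the whole input fits in memory. Unlike the awk version, it opens each output file in write mode, so rerunning it overwrites old output instead of appending to it.

```python
import re
from typing import Iterable, List


def split_on_pattern(lines: Iterable[str], pattern: str = r"^ABC") -> List[List[str]]:
    """Group lines into chunks, starting a new chunk at each matching line.

    As in the awk one-liner, the matching line becomes the first line of its
    chunk, and any lines before the first match end up in chunk 0 ("file0").
    """
    regex = re.compile(pattern)
    chunks: List[List[str]] = [[]]
    for line in lines:
        line = line.rstrip("\n")
        if regex.search(line):
            chunks.append([])  # start a new chunk, like ++i in awk
        chunks[-1].append(line)
    return chunks


def write_chunks(chunks: List[List[str]], prefix: str = "file") -> None:
    """Write chunk i to <prefix><i>, skipping chunk 0 when it is empty."""
    for i, chunk in enumerate(chunks):
        if i == 0 and not chunk:
            continue
        with open(f"{prefix}{i}", "w") as fh:
            fh.writelines(line + "\n" for line in chunk)
```

Calling write_chunks(split_on_pattern(open("input"))) would produce file1, file2, and so on.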
We have a fixed-width file with these columns:
Col1 length 10
Col2 length 10
Col3 length 30
Col4 length 40
Sample records:
ABC 123 xyz. 5171-5261,51617
ABC. 1234. Xxy. 81651-61761
Col4 can hold one or more comma-separated values within its 40 characters. If a record has only one value, it goes to the output file unchanged.
If it has more than one value, i.e. comma-separated values such as 5171-5261,51617, the output file should contain one record per value.
For example, the first record becomes:
ABC. 123. Xyz. 5171-5261
ABC 123. Xyz. 51617
What is the most efficient way to do this?
Right now we are using while and for loops, but execution takes a very long time because we split each record as we read it.
The output file can be comma-separated or fixed-width.
awk is your friend here.
A single line of GNU awk (gawk, which provides FIELDWIDTHS) will achieve what you need:
awk -v FIELDWIDTHS="10 10 30 40" '{ if (match($4,",")) { split($4,array,","); for (i in array) { print $1,$2,$3,array[i]; }; } else { print $1,$2,$3,$4 }; }' samp.dat
For ease of reading, here is the same program spread over several lines:
{
    if (match($4, ",")) {
        split($4, array, ",");
        for (i in array) {
            print $1, $2, $3, array[i];
        }
    } else {
        print $1, $2, $3, $4
    }
}
Testing with the sample data you supplied gives:
ABC 123 xyz. 5171-5261
ABC 123 xyz. 51617
ABC. 1234. Xxy. 81651-61761
How it works:
awk reads your file one line at a time.
The FIELDWIDTHS variable (a gawk extension) cuts each line into fields of the given widths, so we can reference each column as $1, $2, and so on.
Now that we have our columns, we can look for a comma in the fourth field with match($4,",").
If we find one we make an array of the values in the fourth field that are separated by commas with split($4,array,",").
Then we loop through this array and print multiple lines of output, one for each element of the array.
If the fourth field has no comma, the else clause prints a single line.
This process repeats for each line in your fixed width file.
NOTE:
awk's for (i in array) loop does not guarantee to visit array elements in the order they were stored.
This means that your output might come out as
ABC 123 xyz. 51617
ABC 123 xyz. 5171-5261
ABC. 1234. Xxy. 81651-61761
i.e. 5171-5261,51617 in the input data produced a line from the second value before the first.
If the ordering is important to you, loop over the indices numerically instead (n = split($4, array, ","); for (i = 1; i <= n; i++) ...), or use the code below, which first converts your input data to CSV and then produces the output in the original order.
awk -v FIELDWIDTHS="10 10 30 40" '{print $1,$2,$3,$4}' OFS=',' samp.dat > samp.csv
awk -F',' '{ for (i=4; i<=NF; i++) { print $1,$2,$3,$i } }' samp.csv
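If you would rather not depend on the gawk-only FIELDWIDTHS, the same slicing can be sketched in Python. This assumes the 10/10/30/40 widths from the question and treats trailing spaces inside each column as padding; slice_fixed_width and explode_last_column are hypothetical names. Because str.split returns a list in input order, the values come out in their original order without a second CSV pass.

```python
from typing import Iterator, List, Sequence, Tuple

# Column widths from the question: Col1..Col4 = 10, 10, 30, 40 characters.
WIDTHS: Tuple[int, ...] = (10, 10, 30, 40)


def slice_fixed_width(line: str, widths: Sequence[int] = WIDTHS) -> List[str]:
    """Cut a line into consecutive fields of the given widths."""
    fields: List[str] = []
    start = 0
    for width in widths:
        fields.append(line[start:start + width])
        start += width
    return fields


def explode_last_column(line: str, widths: Sequence[int] = WIDTHS) -> Iterator[List[str]]:
    """Yield one record per comma-separated value in the last column.

    Padding is stripped from every field; values keep their input order.
    """
    *head, last = [f.rstrip() for f in slice_fixed_width(line.rstrip("\n"), widths)]
    for value in last.split(","):
        yield head + [value]
```

Each yielded record can be written as CSV with ",".join(record), or padded back to fixed width with str.ljust.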
I have a text file:
$ head train_test_split.txt
1 0
2 1
3 0
4 1
5 1
What I want to do is save to train.txt the first-column values of the rows whose second-column value is 1.
The first-column values of the rows with 1 in the second column are 2, 4 and 5, so in my train.txt file I want:
2
4
5
How can I do this easily in Unix?
You can use awk for this:
awk '$2 == 1 { print $1 }' train_test_split.txt > train.txt
That is, $2 == 1 is a filter matching lines where the 2nd column is 1, and print $1 prints the first column.
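The same filter can be sketched in Python, assuming whitespace-separated columns; the function name first_column_where_second_is is hypothetical. One difference worth knowing: this compares the second field as a string, whereas awk's $2 == 1 compares numerically when the field looks like a number, so awk would also match a value such as 01.

```python
from typing import Iterable, List


def first_column_where_second_is(lines: Iterable[str], value: str = "1") -> List[str]:
    """Return column 1 of every line whose column 2 equals `value` (string match)."""
    result: List[str] = []
    for line in lines:
        fields = line.split()  # split on runs of whitespace, like awk's default FS
        if len(fields) >= 2 and fields[1] == value:
            result.append(fields[0])
    return result
```

Writing "\n".join(result) + "\n" to train.txt gives the requested file.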
In Perl:
$ perl -lane 'print "$F[0]" if $F[1]==1' file
Or GNU grep:
$ grep -oP '^(\S+)(?=[ \t]+1$)' file
But awk is the simplest choice here. Use awk...
There are 3 files in a directory. How can I print the 1st line of the first file, the 3rd line of the second file and the 4th line of the third file with a Unix command?
I tried cat filename.txt | sed -n 1p, but that only works for one file. How can I handle all three files at once?
Using awk: at the beginning of each file (FNR==1), f is incremented to keep track of which file we're dealing with; then we just combine that with the required record number within each file (FNR):
$ awk 'FNR==1 {f++} f==1&&FNR==1 || f==2&&FNR==3 || f==3&&FNR==4' 1 2 3
11
23
34
Contents of the first file (the others are similar):
$ cat 1
11
12
13
14
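A Python sketch of the same idea pairs each file with the line number wanted from it, much like the f and FNR pairing above. The names nth_item and pick_lines are hypothetical; nth_item works on any iterable of lines, including an open file object, and stops reading as soon as it reaches the wanted line.

```python
from itertools import islice
from typing import Iterable, List, Optional, Sequence


def nth_item(lines: Iterable[str], n: int) -> Optional[str]:
    """Return line n (1-based) without its newline, or None if there are fewer lines."""
    line = next(islice(lines, n - 1, n), None)
    return None if line is None else line.rstrip("\n")


def pick_lines(paths: Sequence[str], line_numbers: Sequence[int]) -> List[Optional[str]]:
    """Read line line_numbers[k] from paths[k] for every k."""
    picked: List[Optional[str]] = []
    for path, n in zip(paths, line_numbers):
        with open(path) as fh:
            picked.append(nth_item(fh, n))
    return picked
```

For the files in the question this would be called as pick_lines(["1", "2", "3"], [1, 3, 4]).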