We have a file which has been processed by a Unix command to remove duplicates. After the de-duplication, the new file has the header in between the records. Please help me solve this; thanks in advance for any input.
Unix command: sort -u >
I would do something like this:
grep "headers" >output.txt
grep -v "headers" >>output.txt
The idea is the following: first take the headers and put them into output.txt, and afterwards take everything which is not a header and put it into that output file.
First you need to put the header line into the output file (which means creating the output file, hence the single > character); secondly you need to append the rest of the information to the already existing output file (hence the double >> characters).
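A minimal sketch tying this back to the original sort -u step, assuming the un-deduplicated input is called data.txt and its header line contains the word "headers" (adjust both to your actual file and header text):

grep "headers" data.txt > output.txt                  # write the header line first (creates output.txt)
grep -v "headers" data.txt | sort -u >> output.txt    # de-duplicate the remaining records and append them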
This question might have been asked a million times before, but I didn't see my exact case.
Suppose a text file contains:
a
ab
bac
Now I want to grep on ‘a’ and have a hit only on the 1st line. After the ‘a’ there’s always a [tab] character.
Anyone any ideas?
Thanks!
Ronald
Try this:
head -1 *.txt | grep -P "a\t"
head will give you the specified number of lines from each file (all .txt files in my example), and grep -P uses regular expressions as defined by Perl (Perl uses \t for a tab).
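If your grep doesn't support -P, a hedged alternative is to pass a literal tab character instead, assuming your data is in file.txt (the $'...' form works in bash, ksh and zsh):

grep "a$(printf '\t')" file.txt    # embed a real tab via printf
grep $'a\t' file.txt               # ANSI-C quoting in bash/ksh/zsh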
May I know how to write a Unix command to remove the last character of two specific fields (columns 28 and 30) for all rows, across multiple files?
Example of File 1 before removal:
0,0,1,14289067,10114404,145,60104212839,1,1,1,8801971507671,1,60104212839,1,8801971507671F,4,170523,170523,1,1,235045,235045,0,0,255,1,0,BMRBGBO,0,BWGKPEI,16758,2,6,00000000000,8801971507671,0,0,,FFFFFFFFFFFFFFFFFFFFFFFF,3
1,14286085,10114405,142,601124225298,1,1,1,1062895388906858,1,601124225298,1,1062895388906858F,41,170523,170523,1,1,235045,235045,0,1,255,1,0,BINDMAO,0,BWGKPAI,39285,2,6,00000000000,62895388906858,0,,FFFFFFFFFFFFFFFFFFFFFFFF,2
After removal of the last character in fields 28 and 30 in File 1:
0,0,1,14289067,10114404,145,60104212839,1,1,1,8801971507671,1,60104212839,1,8801971507671F,4,170523,170523,1,1,235045,235045,0,0,255,1,0,BMRBGB,0,BWGKPE,16758,2,6,00000000000,8801971507671,0,0,,FFFFFFFFFFFFFFFFFFFFFFFF,3
1,14286085,10114405,142,601124225298,1,1,1,1062895388906858,1,601124225298,1,1062895388906858F,41,170523,170523,1,1,235045,235045,0,1,255,1,0,BINDMA,0,BWGKPA,39285,2,6,00000000000,62895388906858,0,,FFFFFFFFFFFFFFFFFFFFFFFF,2
I want to then proceed to the next file, File 2 and repeat the same process as above. Then this should continue until all files in a directory are completed.
Any help is much appreciated. Thank you!
You can possibly cut the field out, specifying the delimiter as a comma, and then use a sed substitution to remove the last character:
cut -d',' -f28 filename | sed 's/.$//'
Note that this prints only field 28; see the awk sketch below for rewriting the fields in place.
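If the goal is to rewrite whole rows in every file, here is a minimal sketch with awk, assuming comma-delimited files matched by a *.csv glob (adjust the glob to your actual file names):

# Strip the last character of fields 28 and 30 in each file, keeping everything else intact.
for f in *.csv; do
    awk 'BEGIN { FS = OFS = "," } { sub(/.$/, "", $28); sub(/.$/, "", $30) } 1' "$f" > "$f.tmp" && mv "$f.tmp" "$f"
done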
Sorry to ask this; it might be a trivial question. I tried an awk script as well, but I'm new to it.
I have a list of Ids in a file i.e. ids.txt
1xre23
223dsf
234ewe
and a log file with FIX messages which might contain those ids.
sample: log file abc.log
35=D^A54=1xre23^A22=s^A120=GBP^A
35=D^A54=abcd23^A22=s^A120=GBP^A
35=D^A54=234ewe^A22=s^A120=GBP^A
35=D^A54=xyzw23^A22=s^A120=GBP^A
35=D^A54=223dsf^A22=s^A120=GBP^A
I want to check how many IDs matched in that log file.
There are almost 10K IDs, and the log file is around 300 MB.
The sample output I am looking for:
35=D^A54=1xre23^A22=s^A120=GBP^A
35=D^A54=234ewe^A22=s^A120=GBP^A
35=D^A54=223dsf^A22=s^A120=GBP^A
Try something like this with the grep command:
grep -w -f ids.txt abc.log
Output:
35=D^A54=1xre23^A22=s^A120=GBP^A
35=D^A54=234ewe^A22=s^A120=GBP^A
35=D^A54=223dsf^A22=s^A120=GBP^A
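With almost 10K IDs, it may also be worth adding -F so the patterns are treated as fixed strings rather than regular expressions, which is typically faster for large pattern lists (the result is the same as long as the IDs contain no regex metacharacters):

grep -Fw -f ids.txt abc.log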
If you'd like to use awk, this should do:
awk -F"[=^]" 'FNR==NR {a[$0];next} $4 in a' ids.txt abc.log
35=D^A54=1xre23^A22=s^A120=GBP^A
35=D^A54=234ewe^A22=s^A120=GBP^A
35=D^A54=223dsf^A22=s^A120=GBP^A
This stores the IDs from ids.txt in the array a.
If the fourth field (with fields separated by = and ^) is in the array, the line is printed.
You can also do it the other way around:
awk 'FNR==NR {a[$0];next} {for (i in a) if (i ~ $0) print i}' abc.log ids.txt
35=D^A54=1xre23^A22=s^A120=GBP^A
35=D^A54=234ewe^A22=s^A120=GBP^A
35=D^A54=223dsf^A22=s^A120=GBP^A
Store all lines from abc.log in the array a.
Then, for each ID in ids.txt, test whether a stored log line contains that ID.
If yes, print the log line.
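Since the original goal was a count of how many IDs matched, here is a small follow-up sketch (assuming each ID appears as a whole word in the log):

grep -oFw -f ids.txt abc.log | sort -u | wc -l    # number of distinct IDs found in the log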
I'm using the terminal on OS X 10.x. I have some data files of the format:
mbh5.0_mrg4.54545454545_period0.000722172513951.params.dat
mbh5.0_mrg4.54545454545_period0.00077271543854.params.dat
mbh5.0_mrg4.59090909091_period-0.000355232058085.params.dat
mbh5.0_mrg4.59090909091_period-0.000402015664015.params.dat
I know that there will be some files with similar numbers after mbh and mrg, but I won't know ahead of time what the numbers will be or how many similarly numbered ones there will be. My goal is to cat all the data from all the files with similar numbers after mbh and mrg into one data file. So from the above I would want to do something like...
cat mbh5.0_mrg4.54545454545*dat > mbh5.0_mrg4.54545454545.dat
cat mbh5.0_mrg4.5909090909*dat > mbh5.0_mrg4.5909090909.dat
I want to automate this process because there will be many such files.
What would be the best way to do this? I've been looking into sed, but I don't have a solution yet.
for file in *.params.dat; do
prefix=${file%_*}
cat "$file" >> "$prefix.dat"
done
The ${file%_*} expansion removes the last underscore and the text after it from the end of $file and saves the result in the prefix variable; for example, mbh5.0_mrg4.54545454545_period0.000722172513951.params.dat becomes mbh5.0_mrg4.54545454545. (Ref: http://www.gnu.org/software/bash/manual/bashref.html#Shell-Parameter-Expansion)
It's not 100% clear to me what you're trying to achieve here, but if you want to aggregate files into one file per number after "mbh5.0_mrg4.", then you can do the following.
ls -l mbh5.0_mrg4* | awk '{print "cat " $9 " >> mbh5.0_mrg4." substr($9,13,11) ".dat" }' | /bin/bash
The "ls -s" lists the file and the "awk" takes the 9th column from the result of the ls. With some string concatenation the result is passed to /bin/bash to be executed.
This is a linux bash script, so assuming you have /bind/bash, I'm not 100% famililar with OS X. This script also assumes that the number youre grouping on is always in the same place in the filename. I think you can change /bin/bash to almost any shell you have installed.
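Parsing ls output is fragile if filenames ever contain spaces; a hedged alternative, in the spirit of the loop shown earlier, is to glob the files directly and derive the group name from the filename itself (this assumes every filename contains "_period", as in the examples above):

for f in mbh*_mrg*_period*.params.dat; do
    cat "$f" >> "${f%_period*}.dat"    # everything before "_period" names the group
done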