Split data in a line using unix - unix

How do you use unix to create a CSV file where each field is a column?
My data is:
>A::LOLLLL rank=1 x=2 y=9 length=10
and the desired output is:
Column 1   Column 2   Column 3
>A LOLLLL 10
I tried using awk '{print $1}'input_file to separate the fields, but the terminal reads out "command not found". I wanted to use this to turn each field I am interested in into a separate .txt file, so I could change the extension to .csv and combine them manually. Is there an easier way to do this?

Using awk you can do this:
echo ">A::LOLLLL rank=1 x=2 y=9 length=10" | awk -F"[: =]" '{print $1,$3,$NF}' OFS="\t"
>A LOLLLL 10
To get to separate files:
awk -F"[: =]" '{print $1 >"c1.csv";print $3 >"c2.csv";print $NF >"c3.csv"}' file

Related

find similar rows in a text file in unix system

I have a file named tt.txt and the contents of this file is as follows:
fdgs
jhds
fdgs
I am trying to get the similar rows as the output in a text file.
My expected output is:
fdgs
fdgs
To do so, I used this command:
uniq -u tt.txt > output.txt
but it returns:
fdgs
jhds
fdgs
do you know how to fix it?
If by "similar row" you mean rows with the same content:
From the uniq manpage: uniq only filters matching lines that are adjacent, so you need to sort the input first and use the -D option to print all duplicated lines, as below. Note that -D is limited to the GNU implementation, and doing this prints the output in a different order than the input.
sort tt.txt | uniq -D
If you want the output in the original order, you need to carry the input line number along and sort by it again at the end, like this:
cat -n tt.txt | sort -k 2 | uniq -f 1 -D | sort -k1,1n | sed -E 's/^\s*[0-9]+\s+//'
cat -n prints the content with the line number prepended
sort -k 2 sorts the input starting at the 2nd column (the content)
uniq -f 1 ignores the first column (the line number) when comparing
sort -k1,1n sorts the output back by the original line number, numerically
sed -E 's/^\s*[0-9]+\s+//' deletes the leading line-number column
The uniq -u command outputs only the lines that are unique in the input, which is exactly the opposite of what you want.
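To illustrate the difference on the sorted sample:
$ sort tt.txt | uniq -u    # lines that occur exactly once
jhds
$ sort tt.txt | uniq -D    # all lines that occur more than once
fdgs
fdgs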
One in awk:
$ awk '++seen[$0]==2;seen[$0]>1' file
fdgs
fdgs
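Spelled out with comments, the same logic reads (a sketch): when a line is seen for the second time it is printed twice, once on behalf of the first occurrence and once for the current duplicate, and every later repeat is printed once.
awk '{
    count[$0]++                    # count how many times this exact line has appeared
    if (count[$0] == 2) print      # second sighting: print for the first occurrence...
    if (count[$0] > 1)  print      # ...and print the current duplicate itself
}' file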

Unix: display student records

Contents of the sample input file (input.txt), starting from the following line:
Name|Class|School Name
Deepu|First|Meridian
Neethu|Second|Meridian
Sethu|First|DAV
Theekshana|Second|DAV
Teju|First|Sangamithra
I need to output the details of the student with the school name Sangamithra
in the format below. I am new to unix, so I need help.
Desired output:
Sangamithra|First|Teju
I think you are looking for something like this:
awk -F\| '{print $3"|"$2"|"$1}' filename
School Name|Class|Name
Meridian|First|Deepu
Meridian|Second|Neethu
DAV|First|Sethu
DAV|Second|Theekshana
Sangamithra|First|Teju
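To restrict that to the Sangamithra record and match the desired output exactly, the reordering can be combined with a filter on the third field (a sketch along the same lines):
awk -F\| '$3=="Sangamithra" {print $3"|"$2"|"$1}' input.txt
Sangamithra|First|Teju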
If you're just interested in the matching record, this can be achieved using grep:
grep "Sangamithra" input.txt
If you want the fields reordered so the school name comes first, you might need awk:
grep "Sangamithra" input.txt | awk -F "|" '{print $3"|"$2"|"$1}'

How to grep content of file and create another file with grepped content?

I want to grep the content of a file matching a particular text, save all the records that match that text to a new file, and also make sure the matched content is removed from the original file.
296949657|QL|163744584|163744581|20441||
292465754|RE|W757|3012|301316469|00|
296950717|RC|7264|00001|013|27082856203|
292465754|QL|191427266|191427266|16405||
296950717|RC|7264|AETNAACTIVE|HHRPPO|27082856203|
299850356|RC|7700|153447|0891185100102-A|W19007007201|
292465754|RE|W757|3029|301316469|00|
299850356|RC|7700|153447|0891185100104-A|W19007007201|
293695591|QL|743559415|743559410|18452||
297348183|RC|6602|E924|0048|CD101699303|
297348183|RC|6602|E924|0051|CD101699303|
108327882|QL|613440276|613440275|17435||
I have written an awk command and it works as expected for small files, but for larger files it does not work as expected... I am sure I have missed something...
awk '{print $0 > ($0~/RC/?"RC_RECORDS":"TEST.DAT")}' TEST.DAT
Any thoughts on how to fix this?
Update 1
Now in the above file, I always want to check whether the value of column two is RC; if it matches, move that record to the RC_RECORDS file, and if the value is RE, move it to RE_RECORDS. How can this be done?
Case 1:
So for example if i have records as
108327882|RE|613440276|613440275|RC||
then it should go to RE_RECORDS file.
Case 2:
108327882|RC|613440276|613440275|RE||
then it should go to RC_RECORDS
Case 3:
108327882|QL|613440276|613440275|RC||
then it should not go to either RE_RECORDS or RC_RECORDS
Case 4:
108327882|QL|613440276|613440275|RE||
then it should not go to either RE_RECORDS or RC_RECORDS
My hunch is:
awk '/\|RC\|/ {print > "RC_RECORDS.DAT";next} {print > "NEWTEST.DAT"}' TEST.DAT | awk '$2 == "RC"'
awk '/\|RE\|/ {print > "RE_RECORDS.DAT";next} {print > "FINAL_NEWTEST.DAT"}' NEWTEST.DAT | awk '$2 == "RE"'
but I wanted to check if there's a better and quicker solution out there that can be used.
I think this is what you want:
Option 1
awk -F'|' '
$2=="RC" {print >"RC_RECORDS.TXT";next}
$2=="RE" {print >"RE_RECORDS.TXT";next}
{print >"OTHER_RECORDS.TXT"}' file
You can put it all on one line if you prefer, like this:
awk -F'|' '$2=="RC"{print >"RC_RECORDS.TXT";next} $2=="RE"{print >"RE_RECORDS.TXT";next}{print >"OTHER_RECORDS.TXT"}' file
Option 2
Or you can see how grep compares for speed/readability:
grep -E "^[[:alnum:]]+\|RC\|" file > RC_RECORDS.TXT &
grep -E "^[[:alnum:]]+\|RE\|" file > RE_RECORDS.TXT &
grep -vE "^[[:alnum:]]+\|R[CE]" file > OTHER_RECORDS.TXT &
wait
Option 3
This solution uses 2 awk processes and maybe achieves better parallelism in the I/O. The first awk extracts the RC records to a file and passes the rest onwards. The second awk extracts the RE records to a file and passes the rest on to be written to the OTHER_RECORDS.TXT file.
awk -F'|' '$2=="RC"{print >"RC_RECORDS.TXT";next} 1' file | awk -F'|' '$2=="RE"{print >"RE_RECORDS.TXT";next} 1' > OTHER_RECORDS.TXT
I created an 88M record file (3 GB), and ran some tests on a desktop iMac as follows:
Option 1: 65 seconds
Option 2: 92 seconds
Option 3: 53 seconds
Your mileage may vary.
My file looks like this, i.e. 33% RE records, 33% RC records and the rest junk:
00000000|RE|abcdef|ghijkl|mnopq|rstu
00000001|RC|abcdef|ghijkl|mnopq|rstu
00000002|XX|abcdef|ghijkl|mnopq|rstu
00000003|RE|abcdef|ghijkl|mnopq|rstu
00000004|RC|abcdef|ghijkl|mnopq|rstu
00000005|XX|abcdef|ghijkl|mnopq|rstu
00000006|RE|abcdef|ghijkl|mnopq|rstu
00000007|RC|abcdef|ghijkl|mnopq|rstu
00000008|XX|abcdef|ghijkl|mnopq|rstu
00000009|RE|abcdef|ghijkl|mnopq|rstu
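For reference, a file with the same repeating RE/RC/XX pattern can be generated with a short awk loop (a sketch, not the exact script used for the timings; adjust the count to taste):
awk 'BEGIN {
    t[0]="RE"; t[1]="RC"; t[2]="XX"
    for (i = 0; i < 88000000; i++)
        printf "%08d|%s|abcdef|ghijkl|mnopq|rstu\n", i, t[i%3]
}' > file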
Sanity Check
wc -l *TXT
29333333 OTHER_RECORDS.TXT
29333333 RC_RECORDS.TXT
29333334 RE_RECORDS.TXT
88000000 total

Combining two awk commands in single command

I want to combine these two commands and invoke them as a single command.
In the first command I am storing the 4th column of the x.CSV file (separator: ,) in the z.csv file.
awk -F, '{print $4}' x.CSV > z.csv
In the second command, I want to find the unique first-column values of the z.csv file (separator: space).
awk -F\ '{print $1}' z.csv|sort|uniq
I want to combine these two commands into a single command. How can I do that?
Pipe the output of the first awk to the second awk:
awk -F, '{print $4}' x.CSV | awk -F\ '{print $1}' |sort|uniq
or, as Avinash Raj suggested,
awk -F, '{print $4}' x.CSV | awk -F\ '{print $1}' | sort -u
Assuming that the content of z.csv is actually wanted, rather than just an artefact of the way you're currently implementing your program, you can use:
awk -F, '{ print $4 > "z.csv"
split($4, f, " ")
f4[f[1]] = 1
}
END { for (i in f4) print i }' x.CSV
The split function breaks field 4 on spaces, and (associative) array f4 records the key value. The loop at the end prints out the distinct values, unsorted. If you need them sorted, you can either use GNU awk's built-in sort functions or (if you don't have an awk with built-in sort functions) write your own in awk, or pipe the output to sort.
With GNU awk, you can replace the END block with:
END { n = asorti(f4); for (i = 1; i <= n; i++) print f4[i] }
If you don't want the z.csv file, then (a) you could have used a pipe in the first place, and (b) you can simply remove the print $4 > "z.csv" line.
awk '{split($4,b," "); a[b[1]]=1} END { for( i in a) print i }' FS=, x.CSV
This does not sort the data, but it's not clear if you actually want it sorted or merely needed that to get unique entries. If you do want it sorted, pipe it to sort.

How can I delete the second word of every line of top(1) output?

I have a formatted list of processes (top output) and I'd like to remove unnecessary information. How can I remove, for example, the second word and the whitespace after it from each line?
Example:
1 a hello
2 b hi
3 c ahoi
I'd like to delete a, b and c.
You can use the cut command.
cut -d' ' -f2 --complement file
--complement does the inverse: with -f2 the second field is chosen, and with --complement it prints all fields except the second. This is useful when you have a variable number of fields.
GNU cut has the --complement option. In case --complement is not available, the following does the same:
cut -d' ' -f1,3- file
Meaning: print the first field, then print from the 3rd field to the end, i.e. exclude the second field and print the rest.
Edit:
If you prefer awk you can do: awk '{$2=""; print $0}' file
This sets the second field to empty and prints the whole line, line by line. (Note that the field separator is left in place, so the output has two spaces where the second word used to be.)
Using sed to substitute the second column:
sed -r 's/(\w+\s+)\w+\s+(.*)/\1\2/' file
1 hello
2 hi
3 ahoi
Explanation:
(\w+\s+) # Capture the first word and trailing whitespace
\w+\s+ # Match the second word and trailing whitespace
(.*) # Capture everything else on the line
\1\2 # Replace with the captured groups
Notes: use the -i option to save the results back to the file; -r enables extended regular expressions; check the man page, as it could be -E depending on the implementation.
Or use awk to only print the specified columns:
$ awk '{print $1, $3}' file
1 hello
2 hi
3 ahoi
Both solutions have their merits. The awk solution is nice for a small fixed number of columns, but you need a temp file to store the changes (awk '{print $1, $3}' file > tmp; mv tmp file), whereas the sed solution is more flexible, as the number of columns isn't an issue and the -i option edits the file in place.
One way using sed:
sed 's/ [^ ]*//' file
Results:
1 hello
2 hi
3 ahoi
Using Bash:
$ while read f1 f2 f3
> do
> echo $f1 $f3
> done < file
1 hello
2 hi
3 ahoi
This might work for you (GNU sed):
sed -r 's/\S+\s+//2' file
The trailing 2 tells sed to remove the second occurrence of the pattern, i.e. the second word together with the whitespace after it.
