unix command to print every 2nd line of duplicate

I have a text file that has 110132 lines and looks like this,
b3694658:heccc 238622
b3769025:heccc 238622
b3694659:heccc 238623
b3769026:heccc 238623
b3694660:heccc 238624
b3769027:heccc 238624
b3694661:heccc 238625
b3769028:heccc 238625
Notice that every 2nd line has a duplicate entry at heccc etc. I want an output that only has the 2nd occurrence of each duplicate, so it would look like this:
b3769025:heccc 238622
b3769026:heccc 238623
b3769027:heccc 238624
b3769028:heccc 238625
Thanks for your help!

It appears that you are just looking to output unique values. If that is so, just do this:
sort textfile | uniq

uniq -f1 file.txt
almost does it in this case: -f1 tells uniq to skip the first field when comparing, so each pair collapses to one line. Note, though, that uniq keeps the first line of each group rather than the second; see the awk sketch below for that. See how the -f and -s options of the uniq command work.
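If you specifically want the second occurrence of each pair, as in the expected output above, a small awk sketch does it: it prints a line only when its second field has already been seen once, so for each duplicated pair only the later line survives.
awk 'seen[$2]++' file.txt
Here seen[$2] is 0 (false) the first time a value appears and 1 (true) the second time, because the ++ increments it only after the test.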

Related

Remove data in file1 against file2

This might be the worst example ever given on StackOverflow, but my purpose is to remove every line in File2 that matches a word in File1, ignoring case and matching anywhere in the line. For example, Cats#123:bob would be removed from File2 because the word Cat appears in File1. So regardless of case, if a matching word is found it should eradicate the entire line.
Input (File1):
Cat
Dog
Horse
Wheel
MainFile (File2)
Cats#123:bob
dog#1:truth
Horse-1:fairytale
Wheel:tremendous
Divination:maximus
Desired output
Divination:maximus
As the output shows, only "Divination:maximus" should be output, as no matching words were found in File1. I generally prefer sed or awk, as I use Cygwin, but any suggestions are welcome; I can answer any questions you may have, thanks.
Here's what I've tried so far, but it's not working: the wrong lines are being output. I'm fairly inexperienced, so I don't know how to build on the syntax below; maybe it's completely irrelevant to the job at hand.
grep -avf file1.txt file2.txt > output.txt
The grep command can do that for you:
grep -v -i -f file1 file2
The -f file1 tells grep to read its patterns from file1
The -i flag makes the match case-insensitive
The -v flag inverts the match, selecting lines that do not contain those patterns
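Since the question mentions a preference for sed or awk, here is a rough awk equivalent (a sketch only: unlike grep -f, it treats each File1 line as a literal, case-insensitive substring rather than as a regular expression):
awk 'NR==FNR { pat[tolower($0)]; next } { low = tolower($0); for (p in pat) if (index(low, p)) next } 1' file1 file2
The index() call does literal substring matching, so any regex metacharacters in file1 are harmless; with grep -f you would need -F for the same effect.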

find similar rows in a text file in unix system

I have a file named tt.txt and the contents of this file is as follows:
fdgs
jhds
fdgs
I am trying to get the similar (duplicated) rows as the output in a text file.
my expected output is:
fdgs
fdgs
to do so, I used this command:
uniq -u tt.txt > output.txt
but it returns:
fdgs
jhds
fdgs
do you know how to fix it?
If by "similar row" you mean rows with the same content:
As the uniq manpage says, uniq only filters matching lines that are adjacent, so you need to sort the input first and then use the -D option to print all duplicated lines, as below. Note that -D is limited to the GNU implementation, and doing this prints the output in a different order from the input.
sort tt.txt | uniq -D
If you want the output in the original order, you need to remember the input line numbers and sort by them again, like this:
cat -n tt.txt | sort -k2 | uniq -f1 -D | sort -k1,1n | sed -E 's/^\s+[0-9]+\s+//'
cat -n prints the content with line numbers
sort -k2 sorts the input starting at the 2nd column
uniq -f1 -D ignores the first column (the line number) when looking for duplicates
sort -k1,1n sorts the output back numerically by the original line number
sed -E 's/^\s+[0-9]+\s+//' deletes the leading line-number column (the -E matters: in basic regular expressions, + would be a literal plus)
The uniq -u command outputs only the lines that are not repeated, which is the complete opposite of what you want.
One in awk (the first condition prints a line on its second occurrence, the second prints it on every occurrence after that, so each duplicated line appears once per occurrence, in input order):
$ awk '++seen[$0]==2;seen[$0]>1' file
fdgs
fdgs
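If you want to avoid the GNU-only uniq -D and keep the original order without the line-numbering dance, a two-pass awk sketch reads the file twice: the first pass counts every line, the second prints the lines whose count is greater than 1.
awk 'NR==FNR { cnt[$0]++; next } cnt[$0] > 1' tt.txt tt.txt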

Customizing print output after getting a column using 'cut' command

I'm trying to print the first column of output in a "customized" way, after executing a program that prints out a table. I know how to get the first column from the output, but I want to print each row between single quotes. So, right now I have the commands that can get me the first column:
./genTable | cut -f2 | xargs -0
What can I add to this command so that it prints the values between quotes? For example, the output right now looks like
apple
cider
vinegar
I want it to look like
'apple'
'cider'
'vinegar'
I'd use Perl:
./genTable | perl -nwla -e 'print "\x27$F[1]\x27"'
(\x27 is the single-quote character; spelling it as an escape avoids fighting the shell's own quoting.)
I'd use awk ;-) , i.e.
./genTable | awk -v singleQ="'" '{print singleQ $1 singleQ}'
And of course, if you want it super-minimalist, change all references from singleQ to Q ;-)
output
'apple'
'cider'
'vinegar'
IHTH
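A sed variant is also possible (a sketch, assuming the same cut output as above): the & in the replacement stands for the whole matched line, so each line is rewritten with a quote on either side.
./genTable | cut -f2 | sed "s/.*/'&'/"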

how to take substring in ksh

I have a file named "output.txt" having data in format:
400949703|2000025967912|20130614010652|20130614131543
355949737|2144050263|20120407100407|20120407101307
355499738|2144500262|20110911010901|20110911135601
I am executing an awk command as shown below:
awk -F"|" '{num1="`echo $3| cut -c1-8`"; print $num1}' output.txt
My expected output is :
20130614
20120407
20110911
But I am getting as output what is actually the input:
400949703|2000025967912|20130614010652|20130614131543
355949737|2144050263|20120407100407|20120407101307
355499738|2144500262|20110911010901|20110911135601
Not able to find out the reason. My task is to compare the first 8 characters of the 3rd and 4th columns, but I am stuck at this part.
Experts, kindly help me see where I am going wrong.
What about using cut twice?
$ cut -d'|' -f4 file | cut -c-8
20130614
20120407
20110911
The first cut gets the 4th field, using | as the delimiter.
The second gets the first 8 characters (note that cut -c-8 is the same as your cut -c1-8).
You're mixing shell and awk; one tool is enough:
awk -F'|' '{a = substr($3, 1, 8); if (a == substr($4, 1, 8)) print a}' output.txt
Take the substrings of columns 3 and 4, compare them, and print if they match.
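Since the title asks about ksh specifically: if you would rather stay in the shell, ksh93 parameter expansion can take the substring directly. A sketch, assuming the output.txt format shown above (note that the ${var:offset:length} form needs ksh93, not ksh88):
while IFS='|' read -r f1 f2 f3 f4; do
    a=${f3:0:8}    # first 8 characters of the 3rd column
    b=${f4:0:8}    # first 8 characters of the 4th column
    [[ $a == "$b" ]] && print "$a"
done < output.txt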

Key/value pair update in awk

I have a basic CSV that contains key/value pairs, the first two columns being the key and the third column the value.
Example file1:
12389472,1,136-7402
23247984,1,136-7402
23247984,2,136-7402
34578897,1,136-7402
And in another file I have a list of keys that need their value changed in the first file. I'm trying to change the value to 136-7425
Example file2:
23247984,1
23247984,2
Here's what I'm currently doing:
/usr/xpg4/bin/awk '{FS=",";OFS=","}NR==FNR{a[$1,$2]="136-7425";next}{$3=a[$1,$2]}1' file2 file1 > output
Which is working but it's leaving the value blank for keys not found in file2. I'd like to only change the value for keys present in file2, and leave the current value for keys not found.
Can anyone point out what I'm doing wrong? Or perhaps there's an easier way to accomplish this.
Thanks!
Looks like you're just zapping the third field for keys that don't exist in the first file. Try this:
awk 'BEGIN{FS=OFS=","} NR==FNR{a[$1,$2]="136-7425";next} ($1,$2) in a{$3=a[$1,$2]} 1' file2 file1 > output
or (see comments below):
awk 'BEGIN{FS=OFS=","} NR==FNR{seen[$1,$2]++;next} seen[$1,$2]{$3="136-7425"} 1' file2 file1 > output
Setting FS in a BEGIN block matters here: assigning it inside the main loop, as in your command, happens after the first record has already been split, so the first line of file2 is keyed wrongly and never matches.
FYI an array named seen[] is also commonly used to remove duplicates from input, e.g.:
awk '!seen[$0]++' file
which prints each line only the first time it appears.
this line should work for you:
awk -F, -v OFS="," 'NR==FNR{a[$1,$2]=1;next}a[$1,$2]{$3="136-7425"}7' file2 file1
(the trailing 7 is just a true condition, like the more conventional 1, so every line is printed)
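With the sample file1 and file2 above, any of these commands should produce:
12389472,1,136-7402
23247984,1,136-7425
23247984,2,136-7425
34578897,1,136-7402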
