I have a file where column one has a list of family identifiers:
AB
AB
AB
AB
SAR
SAR
EAR
Is there a way that I can create a new column where each repeat is numbered, creating a new label for each repeat, i.e.:
AB_1
AB_2
AB_3
AB_4
SAR_1
SAR_2
EAR_1
I am able to do this in SAS but am looking for a bash option (possibly awk). Here is the SAS version:
data file;
set file;
count+1;
by name;
if first.name then count=1;
new_name=compress(name||'_'||count);
run;
$ awk '{print $1"_"++a[$1]}' file
AB_1
AB_2
AB_3
AB_4
SAR_1
SAR_2
EAR_1
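A short note on how this works: a[$1] is an associative array keyed by the first column, and ++a[$1] increments that key's counter before printing it. If you would rather keep the original column and append the numbered label next to it, a minimal variant of the same one-liner:
awk '{print $1, $1"_"++a[$1]}' file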
Try this one-liner:
awk '{a[$0]++;print $0"_"a[$0]}' file
With your input data:
kent$ echo "AB
AB
AB
AB
SAR
SAR
EAR"|awk '{a[$0]++;print $0"_"a[$0]}'
AB_1
AB_2
AB_3
AB_4
SAR_1
SAR_2
EAR_1
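Note that, unlike the SAS BY-group approach, the awk counter does not require the input to be sorted or grouped; repeats are numbered per key wherever they appear. A quick check with out-of-order input (same one-liner):
$ printf 'AB\nSAR\nAB\n' | awk '{print $1"_"++a[$1]}'
AB_1
SAR_1
AB_2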
I'm trying to use Awk to compare the content of two large CSV files (more than 5000 rows each) but I can't get what I want.
Here is my problem:
I have a first file (File1) with a list of names ($1) and cities ($2) whose structure looks like this:
john;london
marc;paris
karen;new york
ben;london
vic;dublin
I have a second file (File2) with other information, where some of the names ($3) from File1 appear:
45456;windows;john;454646
47764;mac;zack;470093
41225;mac;ben;622101
12634;windows;ben;218996
7856;windows;karen;637294
12;mac;finn;878317
2315;windows;beverly;221167
445;windows;lilly;12316
3232;mac;john;601316
4546;mac;fish;305035
487;windows;vic;447421
46464;mac;karen;468154
I would like to extract from File2 all the lines whose names appear in File1, while adding the city associated with each name in File1. Here is an example of the result I am looking for:
45456;windows;john;454646;london
3232;mac;john;601316;london
7856;windows;karen;637294;new york
46464;mac;karen;468154;new york
41225;mac;ben;622101;london
12634;windows;ben;218996;london
487;windows;vic;447421;dublin
Could you help me?
Build an associative array from the first file, making the name the index and the city the value. For the second file, check if the name is in the array; if yes, print the line and append the city.
awk -F';' 'NR==FNR{a[$1]=$2}$3 in a{print $0";"a[$3]}' File1 File2
45456;windows;john;454646;london
41225;mac;ben;622101;london
12634;windows;ben;218996;london
7856;windows;karen;637294;new york
3232;mac;john;601316;london
487;windows;vic;447421;dublin
46464;mac;karen;468154;new york
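For readability, here is the same one-liner expanded into a commented script (a sketch, same logic and file names):
awk -F';' '
    NR==FNR {            # true only while reading the first file, File1
        city[$1] = $2    # map name -> city
        next
    }
    $3 in city {         # name from File2 found in File1?
        print $0 ";" city[$3]
    }
' File1 File2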
With bash, GNU sort and GNU join (join needs both inputs sorted on the join field, hence the process substitutions):
join -t ';' -1 1 -2 3 <(sort File1) <(sort -t ';' -k 3 File2) -o 2.1,2.2,1.1,2.4,1.2
Output:
12634;windows;ben;218996;london
41225;mac;ben;622101;london
45456;windows;john;454646;london
3232;mac;john;601316;london
46464;mac;karen;468154;new york
7856;windows;karen;637294;new york
487;windows;vic;447421;dublin
Using csvkit (assuming it is installed):
csvjoin -d ';' -H -c 3,1 File2 File1 | csvformat -D ';' | tail -n +2
Here -H supplies generic headers for the headerless input, -c 3,1 joins column 3 of File2 to column 1 of File1, and tail -n +2 drops the generated header row.
I would like to delete multiple repeated columns from a huge file (about 1 million).
The columns that I want to delete all have the same column name, A; the others have different, unique names. Say:
A B2 A B3
1.1 AA 1.2 AA
2.1 AB 4.3 CT
2.2 AC 6.4 GT
So the column headers are A, B2, A, B3, ...
How could I delete the columns named A from the data?
Another in awk:
$ awk '
NR==1 {                    # on the header line
    split($0,a)            # map field number -> header name
    for(i in a)
        if(a[i]=="A")
            delete a[i]    # drop every column named A
}
{
    for(i=1;i<=NF;i++)     # print only the fields whose index survived
        printf "%s",(i in a?$i OFS:"")
    printf ORS
}' file
B2 B3
AA AA
AB CT
AC GT
I'm not sure I'm understanding your question correctly, but here is a (GNU) awk solution that deletes all duplicate columns (keeping only the first occurrence):
#!/usr/bin/awk -f
NR==1 {
    seen[$1] = 1                  # header names already seen
    cols[0] = 1                   # field numbers to keep; always keep column 1
    for (i=2; i<=NF; i++) {
        if (!($i in seen)) {
            seen[$i] = 1
            cols[length(cols)] = i
        }
    }
}
{
    for (i=0; i<length(cols); i++)
        printf "%s ", $(cols[i])  # print the field as data, not as a format string
    printf "\n"
}
For the first line (NR==1), we find all non-duplicate columns (preserving the order), and for all the other lines, we just print out the columns (fields) we selected before (cols array holds column/field indexes we wish to keep).
$ ./filter.awk file
A B2 B3
1.1 AA AA
2.1 AB CT
2.2 AC GT
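One portability note: calling length() on an array is a GNU awk extension (hence the "(GNU)" above). A POSIX-portable variant of the same idea could track the count explicitly; an untested sketch:
#!/usr/bin/awk -f
NR==1 {
    n = 0
    seen[$1] = 1
    cols[n++] = 1
    for (i=2; i<=NF; i++)
        if (!($i in seen)) { seen[$i] = 1; cols[n++] = i }
}
{
    for (i=0; i<n; i++) printf "%s ", $(cols[i])
    print ""
}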
cut -d' ' -f $(head -1 filename|tr ' ' '\n'|awk '{if(!seen[$0]++) print NR}'|paste -s -d ',') filename
This keeps only the first occurrence of each column name: head -1 takes the header line, tr splits it into one name per line, awk prints the field numbers of names it has not seen before, paste joins those numbers with commas, and cut selects the corresponding fields.
The question is solved by the James Brown code. I added
#!/usr/bin/awk -f
as the first line of his code and corrected a tiny typo at the end of the code (an extra ' was deleted).
I am sorry, I did not have time to try all the other suggestions.
With my best wishes
I want to compare each cell's size/length and change its content depending on its length.
The current table has the format:
AB
CD
AB
AB
CD
155668/01
AB
1233/10
I want to replace the cells whose length is more than 2 with DE.
Output
AB
CD
AB
AB
CD
DE
AB
DE
I tried
awk -F "," '{ if($(#1) > "2") {print "DE"} else {print $1 }}'
but it says "syntax error".
If I use wc -m in place of $(#1), the output is the same as the input.
The easiest way is to use sed:
sed '/^..$/!s/.*/DE/' file
In awk, you could say:
awk '!/^..$/ { $0 = "DE" } 1' file
In both cases, the idea is the same: if the line does not consist of exactly two characters, replace the whole line with DE. In the case of sed, the whole line is .*, in the case of awk, it is $0.
Try this:
$ awk '{print (length($1)>2?"DE":$1)}' f
AB
CD
AB
AB
CD
DE
AB
DE
The idiomatic way would be:
awk 'length($1) > 2 { $1 = "DE" } 1'
(The trailing 1 is an always-true condition whose default action is to print the record.)
My file (named test) is as below:
1 abc
2 xyz
3 pqr
How can I convert the second column of the file to upper case without using awk or sed?
You can use tr to transform lowercase to uppercase; cut extracts the individual columns, and paste combines the separated columns again.
Assumption: Columns are delimited by tabs.
paste <(cut -f1 file) <(cut -f2 file | tr '[:lower:]' '[:upper:]')
Replace file with your file name (that is test in your case).
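If the columns are separated by a single space rather than tabs (as the sample suggests), the same idea works with explicit delimiters; a small variant, not from the original answer:
paste -d' ' <(cut -d' ' -f1 test) <(cut -d' ' -f2 test | tr '[:lower:]' '[:upper:]')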
In pure bash (the ${var^^} case conversion requires bash 4.0 or later):
#!/bin/bash
# read the two columns; ${col2^^} upper-cases the second one
while read -r col1 col2
do
    printf "%s%7s\n" "$col1" "${col2^^}"
done < file > output-file
Input-file
$ cat file
1 abc
2 xyz
3 pqr
Output-file
$ cat output-file
1 ABC
2 XYZ
3 PQR
The title is self-explanatory. I am calling a web service which returns a String like this:
First Name="Kunal";Middle Name="";Last Name="Bhowmick";Address 1="HGB";Address 2="cvf";Address 3="tfg";City="DF";State="KL";Country="MN";Postal Code="0012";Telephone="(+98)6589745623"
Now I have to write a shell script to create a CSV file named CSV_Output.csv, and the file must be formatted with the String content.
The format must be something like this :
Field Name (in yellow color)    Value (in yellow color)
First Name Kunal
Middle Name
Last Name Bhowmick
Address 1 HGB
Address 2 cvf
Address 3 tfg
City DF
State KL
Country MN
Postal Code 0012
Telephone (+98)6589745623
Now I can easily generate a CSV file using redirection (>>), but how can I create and format a CSV file like the format shown above?
Sorry to be blunt, but I have no code to show, as I do not understand what to use here.
Kindly provide some suggestions (sample code). Any help is greatly appreciated.
An awk one-liner can convert the format:
awk -v RS="\\n|;" -v OFS="\t" -F= '{gsub(/"/,"");$1=$1}7' file
If you want the output to look better, you can pipe the output to column and change the OFS, like:
awk -v RS="\\n|;" -v OFS="#" -F= '{gsub(/"/,"");$1=$1}7' file|column -s"#" -t
The output is:
kent$ awk -v RS="\\n|;" -v OFS="#" -F= '{gsub(/"/,"");$1=$1}7' f|column -s"#" -t
First Name Kunal
Middle Name
Last Name Bhowmick
Address 1 HGB
Address 2 cvf
Address 3 tfg
City DF
State KL
Country MN
Postal Code 0012
Telephone (+98)6589745623
Short explanation:
awk                 # the awk command
-v RS="\\n|;"       # set the record separator to \n (newline) or ; (semicolon); a regex RS is not POSIX, but works in GNU awk
-v OFS="\t"         # set the output field separator to <tab>
-F=                 # set "=" as the input field separator
'{gsub(/"/,"");     # remove all double quotes
$1=$1}              # $1=$1 makes awk rebuild the line with the given OFS
7'                  # any non-zero value is a true condition, so awk prints the whole (rebuilt) line
Can be achieved using tr and column:
$ cat input
First Name="Kunal";Middle Name="";Last Name="Bhowmick";Address 1="HGB";Address 2="cvf";Address 3="tfg";City="DF";State="KL";Country="MN";Postal Code="0012";Telephone="(+98)6589745623"
$ cat input | tr ";" "\n" | column -s= -t | tr -d \"
First Name Kunal
Middle Name
Last Name Bhowmick
Address 1 HGB
Address 2 cvf
Address 3 tfg
City DF
State KL
Country MN
Postal Code 0012
Telephone (+98)6589745623
Split the input on semicolons, pipe the output to column specifying = as the delimiter, then get rid of the quotes!
EDIT: Didn't realize that you wanted a CSV. In that event, use:
$ cat input | tr ";" "\n" | tr "=" "\t" | tr -d \"
which will produce TAB-delimited output.
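For an actual comma-separated CSV_Output.csv, the same pipeline can swap the TAB for a comma; a minimal sketch that does not handle values containing commas or quotes:
tr ";" "\n" < input | tr "=" "," | tr -d \" > CSV_Output.csv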