I have an existing CSV file. I want to modify it so that the filename is included as the first column of each row.
Example
file.csv
1,love,anger
Modified CSV
file.csv
file.csv,1,love,anger
Can we do this with a one-liner in awk or another Unix tool?
Thanks a lot in advance
Another one:
$ awk '{print FILENAME (NF?",":"") $0}' file
It's just as simple as: awk '{if($0) printf("%s,%s\n", FILENAME, $0); else print FILENAME;}' file.csv
where file.csv is the input file name.
UPD: I modified the command, adding a condition to deal correctly with empty lines.
Related
I have multiple MS Excel files in CSV format in a particular directory.
I want to update the value of one particular column in all the rows of the CSV files.
Also, the change should not be applied to the first and last lines.
So far I have come up with the code below:
awk -F, 'NR>2{$2=300;}1' OFS=, test.csv
But I am facing difficulty in excluding the last line.
Also, I need to perform the same for all the files in the directory.
I have also tried a couple of other approaches to replace that string value using awk, but have not been able to succeed.
This may do:
awk -F, 't{print t} {a=t=$0} NR>1{$2=300;t=$0} END {print a}' OFS=, test.csv
$ cat file
1,a,b
2,c,d
3,e,f
$ awk 'BEGIN{FS=OFS=","} NR>1{print (NR>2 ? chgd : orig)} {orig=$0; $2=300; chgd=$0} END{print orig}' file
1,a,b
2,300,d
3,e,f
You could simplify the script a bit by reading the file twice:
awk 'BEGIN{FS=OFS=","} NR==FNR {c=NR;next} !(FNR==1||FNR==c){$2=200} 1' file file
This uses the NR==FNR section merely to count lines, giving you a simple expression for determining whether to update the field in question.
And if you have GNU awk available, you might save a few CPU cycles by not reassigning the c variable for every line, using something like this:
gawk 'BEGIN{FS=OFS=","} ENDFILE {c=FNR} NR==FNR{next} !(FNR==1||FNR==c){$2=200} 1' file file
This still reads the file twice, but assigns c only after each file is read.
If you want, you can emulate the ENDFILE condition in non-GNU awk using NR>FNR && FNR==1 if you only have two files, then set c=NR-1. It won't perform as well.
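For example, a sketch of that emulation:
awk 'BEGIN{FS=OFS=","} NR>FNR && FNR==1{c=NR-1} NR==FNR{next} !(FNR==1||FNR==c){$2=200} 1' file file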
I haven't tested the speed difference between these two, but I suspect it would be negligible except in cases of truly obscenely large files.
Thanks all,
I got it to work. Below is the command (sq passes a literal single quote into the script):
awk -v sq="'" -F, 't{print t} {a=t=$0} NR>2{$3=sq"ops_data"sq;t=$0} END {print a}' OFS=, test1.csv
Could you please help in fetching duplicates from a specific column of a CSV file in Unix?
I tried the uniq utility, but it works only with a txt file.
Please suggest.
Try sorting the values before applying uniq. With the -d flag, uniq prints only the duplicated values:
awk -F ',' '{print $1}' filename | sort | uniq -d
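If you also need the number of occurrences of each duplicated value, uniq can combine -c with -d:
awk -F ',' '{print $1}' filename | sort | uniq -cd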
How can I handle a dynamic number of columns in the CSV file? I want to do an aggregation on particular columns, e.g. cust_id, notification_type, count, and the output should be stored in another CSV file.
I have tried this:
awk 'BEGIN{FS=OFS=","}{a[$2 OFS $3]+=$4}END{for(i in a)print i,a[i]}' file_name
It is a single-line command, but I want a proper script.
It should be wrapped in a script invoked like:
script_name.sh input_file output_folder
Sample file (the actual file may be GBs in size):
1,A,OTC,1
2,B,RC,1
3,C,PB,1
4,A,OTC,1
5,A,RC,1
6,B,RC,1
Output should be this:
1,A,OTC,2
,RC,1
2,B,RC,1
3,C,PB,1
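A minimal wrapper sketch along those lines, reusing the one-liner above unchanged (the output file name aggregated.csv is illustrative):
#!/bin/sh
# usage: script_name.sh input_file output_folder
input_file="$1"
output_folder="$2"

# aggregate on columns 2 and 3, summing column 4
awk 'BEGIN{FS=OFS=","}{a[$2 OFS $3]+=$4}END{for(i in a)print i,a[i]}' "$input_file" > "$output_folder/aggregated.csv"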
I have a directory with multiple files with the extension .failed.
These files have the following format:
file1.failed:
FHEAD|4525|20170109000000|20170125024831
THEAD|150001021|20170109121206||
TDETL|4000785067||1|EA|||RETURN|||N
TTAIL|1
THEAD|150001022|20170109012801||
TDETL|4000804525||1|EA|||RETURN|||N
TTAIL|1
FTAIL|6
I need to extract all the text between THEAD| and |2 to an output file.
I'm trying the following, and it works only if there is a single file in the directory:
sed -n 's:.*THEAD|\(.*\)|2.*:\1:p' <*.failed >transactions.log
The output is:
transactions.log:
150001021
150001022
Now how can I do the same for multiple files?
Also, is it possible to add the filename to the output file?
Expected output:
file1.failed
150001021
150001022
file2.failed
150001023
150001024
150001025
In awk:
$ awk -F\| 'FNR==1{print FILENAME} $1=="THEAD"{print $2}' foo foo
foo
150001021
150001022
foo
150001021
150001022
On the first record of each file it prints out the filename and after that it prints the second field on records that start with THEAD. Replace foo with all required files.
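Applied to the files from the question, that would be something like:
awk -F\| 'FNR==1{print FILENAME} $1=="THEAD"{print $2}' *.failed > transactions.log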
This might work for you (GNU sed):
sed -sn '1F;s/^THEAD|\([^|]*\)|.*/\1/p' file1 file2 file3 ...
The -n and -s options invoke the grep-like nature (print nothing unless asked) and treat each file's addresses separately. The F command displays the current file name, on the first line of each file only. The substitution extracts and prints the value between the required strings.
I am splitting a very large CSV file into parts. Whenever I run the following command, the file doesn't split completely; instead I get the following error. How can I avoid this and split the whole file?
awk -F, '{print > $2}' test1.csv
awk: YY1 makes too many open files
input record number 31608, file test1.csv
source line number 1
Just close the files after writing:
awk -F, '{print > $2; close($2)}' test1.csv
You must have a lot of lines. Are you sure that the second column repeats enough to group those records into a manageable number of files? Anyway, awk is holding the files open until the end. You'll need a process that closes the file handles when not in use.
Perl to the rescue. Again.
#!/usr/bin/perl
use strict;
use warnings;

while ( <> ) {
    my @content = split /,/, $_;                          # split the CSV line on commas
    open my $out, '>>', $content[1] or die "whoops: $!";  # append to a file named after column 2
    print $out $_;
    close $out;
}
usage: script.pl your_monster_file.csv
This outputs each entire line into a file in the current directory named after the value of the second CSV column, assuming no quoted fields etc.