Trying to copy data from vertica table into csv using vsql - unix

I was trying to export data from a Vertica table to a CSV file, but some values contain a comma "," which pushes data into the next column.
vsql -h server_address -p 5433 -U username -w password -F $',' -A -o sadumpfile_3.csv -c
"select full_name from company_name;" -P footer=off
Vertica table data and expected CSV:
full_name
----------
Samsun pvt, ltd
Apple inc
abc, pvt ltd
Output sadumpfile_3.csv:
full_name
-------------  ---------
Samsun pvt     ltd
Apple inc
abc            pvt ltd
Thanks in advance

Default behaviour (I have the four environment variables VSQL_USER, VSQL_PASSWORD, VSQL_HOST and VSQL_DATABASE set):
marco ~/1/Vertica/supp $ vsql -c "select full_name from company_name"
full_name
-----------------
Apple inc
Samsun pvt, ltd
abc, pvt ltd
(3 rows)
The simplest way to achieve what you were trying:
marco ~/1/Vertica/supp $ vsql -F ',' -A -c "select full_name from company_name;" -Pfooter
full_name
Apple inc
Samsun pvt, ltd
abc, pvt ltd
Note that the only commas are the ones already existing in the strings. If you only export one column, there's no field delimiter in the output.
I can only suppose that you want to have the output so that you can, for example, import it into Excel as CSV. If the field delimiter exists in a string, you would need to enclose the string with (usually double) quotes.
Vertica has a function that encloses a string with double quotes: QUOTE_IDENT():
marco ~/1/Vertica/supp $ vsql -F ',' -A \
-c "select QUOTE_IDENT(full_name) AS full_name from company_name;" -Pfooter
full_name
"Apple inc"
"Samsun pvt, ltd"
"abc, pvt ltd"

Related

Comparison of two CSV files

I'm trying to use Awk to compare the content of two large CSV files (more than 5000 rows each) but I can't get what I want.
Here is my problem:
I have a first file (File1) with a list of names ($1) and cities ($2) whose structure looks like this:
john;london
marc;paris
karen;new york
ben;london
vic;dublin
I have a second file (File2) with other information where we find some names ($3) of File1:
45456;windows;john;454646
47764;mac;zack;470093
41225;mac;ben;622101
12634;windows;ben;218996
7856;windows;karen;637294
12;mac;finn;878317
2315;windows;beverly;221167
445;windows;lilly;12316
3232;mac;john;601316
4546;mac;fish;305035
487;windows;vic;447421
46464;mac;karen;468154
I would like to extract from File2 all the lines whose names appear in File1 while adding the cities associated to each name in File1. Here is an example of the result I am looking for:
45456;windows;john;454646;london
3232;mac;john;601316;london
7856;windows;karen;637294;new york
46464;mac;karen;468154;new york
41225;mac;ben;622101;london
12634;windows;ben;218996;london
487;windows;vic;447421;dublin
Could you help me?
Build an associative array from the first file, making the name the index and the city the value. For the second file, check if the name features in the array; if yes, print the line and append the city.
awk -F';' 'NR==FNR{a[$1]=$2}$3 in a{print $0";"a[$3]}' File1 File2
45456;windows;john;454646;london
41225;mac;ben;622101;london
12634;windows;ben;218996;london
7856;windows;karen;637294;new york
3232;mac;john;601316;london
487;windows;vic;447421;dublin
46464;mac;karen;468154;new york
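If you also wanted to keep the File2 lines whose name has no match in File1 (with an empty city field), a small variation of the same idea should work (a sketch):
awk -F';' 'NR==FNR{a[$1]=$2; next} {print $0 ";" ($3 in a ? a[$3] : "")}' File1 File2
Using ($3 in a ? ... : "") instead of a bare a[$3] avoids creating empty array entries as a side effect of the lookup.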
With bash, GNU sort and GNU join:
join -t ';' -1 1 -2 3 <(sort File1) <(sort -t ';' -k 3 File2) -o 2.1,2.2,1.1,2.4,1.2
Output:
12634;windows;ben;218996;london
41225;mac;ben;622101;london
45456;windows;john;454646;london
3232;mac;john;601316;london
46464;mac;karen;468154;new york
7856;windows;karen;637294;new york
487;windows;vic;447421;dublin
Using csvkit
csvjoin -d ';' -H -c 3,1 File2 File1 | csvformat -D ';' | tail -n +2
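Here -H tells csvkit the files have no header row, so it generates placeholder column names; that generated header is what tail -n +2 strips off again, and csvformat -D ';' turns the default comma output back into semicolons.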

Finding a Column in File1 present in File2 in awk

I've 2 files as below
File1
USA,China,India,Canada
File2
Iran|XXXXXX|Iranian
Iraq|YYYYYY|Iraquian
Saudi|ZZZZZ|Saudi is a Rich Country
USA|AAAAAA|USA is United States of America.
India|IIII|India got freedom from British.
Scot|SSSSS|Canada Mexio.
How can I read each value in File1 and check whether it matches the first delimited field in File2 using awk? I've tried this but I could not achieve it. Please help.
For the above input, the output should be:
USA Matches
China Not Matched
India Matches
Canada Not Matches
Could you please try the following.
awk 'FNR==NR{a[$1];next} {for(i=1;i<=NF;i++){if($i in a){print $i,"Matches"} else {print $i,"Not Matches."}}}' FS="|" Input_file2 FS="," Input_file1
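Note the trick in the argument list: FS="|" and FS="," are awk variable assignments placed between the file names, so each takes effect just before the file that follows it is read, letting each input file use its own field separator.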
You can try Perl also
$ cat vinoth1
USA,China,India,Canada
$ cat vinoth2
Iran|XXXXXX|Iranian
Iraq|YYYYYY|Iraquian
Saudi|ZZZZZ|Saudi is a Rich Country
USA|AAAAAA|USA is United States of America.
India|IIII|India got freedom from British.
Scot|SSSSS|Canada Mexio.
$ perl -F, -lane ' BEGIN { $x=qx(cat vinoth2) } print $_,$x=~/^$_/m ? " matches" : " not matches" for(@F) ' vinoth1
USA matches
China not matches
India matches
Canada not matches

Command to perform full outer join with duplicate entries in key/join column

I have three files. I need to join them based on one column and perform some transformations.
file1.dat (Column 1 is used for joining)
123,is1,ric1,col1,smbc1
123,is2,ric1,col1,smbc1
234,is3,ric3,col3,smbc2
345,is4,ric4,,smbc2
345,is4,,col5,smbc2
file2.dat (Column 1 is used for joining)
123,abc
234,bcd
file3.dat (Column 4 is used for joining)
r0c1,r0c2,r0c3,123,r0c5,r0c6,r0c7,r0c8
r2c1,r2c2,r2c3,123,r2c5,r2c6,r2c7,r2c8
r3c1,r3c2,r3c3,234,r3c5,r3c6,r3c7,r3c8
r4c1,r4c2,r4c3,345,r4c5,r4c6,r4c7,r4c8
Expected Output (output.dat)
123,r0c5,is1,ric1,smbc1,abc,r0c8,r0c6,col1,r0c7,r0c1,r0c2,r0c3
123,r0c5,is2,ric1,smbc1,abc,r0c8,r0c6,col1,r0c7,r0c1,r0c2,r0c3
123,r2c5,is1,ric1,smbc1,abc,r2c8,r2c6,col1,r2c7,r2c1,r2c2,r2c3
123,r2c5,is2,ric1,smbc1,abc,r2c8,r2c6,col1,r2c7,r2c1,r2c2,r2c3
234,r3c5,is3,ric3,smbc2,bcd,r3c8,r3c6,col3,r3c7,r3c1,r3c2,r3c3
345,r4c5,is4,ric4,smbc2,N/A,r4c8,r4c6,N/A,r4c7,r4c1,r4c2,r4c3
345,r4c5,is4,N/A,smbc2,N/A,r4c8,r4c6,col5,r4c7,r4c1,r4c2,r4c3
I wrote the following awk command.
awk '
BEGIN {FS=OFS=","}
FILENAME == ARGV[1] { temp_join_one[$1] = $2"|"$3"|"$4"|"$5; next}
FILENAME == ARGV[2] { exchtbunload[$1] = $2; next}
FILENAME == ARGV[3] { s_temp_join_one = temp_join_one[$4];
split(s_temp_join_one, array_temp_join_one,"|");
v3=(array_temp_join_one[1]==""?"N/A":array_temp_join_one[1]);
v4=(array_temp_join_one[2]==""?"N/A":array_temp_join_one[2]);
v5=(array_temp_join_one[4]==""?"N/A":array_temp_join_one[4]);
v6=(exchtbunload[$4]==""?"N/A":exchtbunload[$4]);
v9=(array_temp_join_one[3]==""?"N/A":array_temp_join_one[3]);
v11=($2==""?"N/A":$2);
print $4, $5, v3, v4, v5, v6, $8, $6, v9, $7, $1, v11, $3 >
"output.dat" }
' file1.dat file2.dat file3.dat
I need to join all three files.
The final output file should have all the values from file3 irrespective of whether they are in other two files and the corresponding columns should be empty(or N/A) if it is not present in other two files. (The order of the columns is not a very big problem. I can use awk to rearrange them.)
But my problem is, as the key is not unique, I am not getting the expected output. My output has only three lines.
I tried to apply the solution suggested using a join condition. It works with smaller files, but the files I have are close to 3-5 GB in size, and they are in numerical order, not lexicographical order. Sorting them looks like it would take a lot of time.
Any suggestion would be helpful.
Thanks in advance.
With join, assuming the files are sorted by the key:
$ join -t, -1 1 -2 4 <(join -t, -a1 -a2 -e "N/A" -o1.1,1.2,1.3,1.4,1.5,2.1 file1 file2) \
file3 -o1.1,2.5,1.2,1.3,1.5,1.6,2.8,2.6,1.4,2.7,2.2,2.3
123,r0c5,is1,ric1,smbc1,123,r0c8,r0c6,col1,r0c7,r0c2,r0c3
123,r2c5,is1,ric1,smbc1,123,r2c8,r2c6,col1,r2c7,r2c2,r2c3
123,r0c5,is2,ric1,smbc1,123,r0c8,r0c6,col1,r0c7,r0c2,r0c3
123,r2c5,is2,ric1,smbc1,123,r2c8,r2c6,col1,r2c7,r2c2,r2c3
234,r3c5,is3,ric3,smbc2,234,r3c8,r3c6,col3,r3c7,r3c2,r3c3
345,r4c5,is4,ric4,smbc2,N/A,r4c8,r4c6,N/A,r4c7,r4c2,r4c3
345,r4c5,is4,N/A,smbc2,N/A,r4c8,r4c6,col5,r4c7,r4c2,r4c3
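join needs its inputs in lexicographic order on the key, so if your files are only in numerical order you may have to re-sort them first, along these lines (a sketch; writes sorted copies next to the originals):
sort -t, -k1,1 file1.dat > file1.srt
sort -t, -k1,1 file2.dat > file2.srt
sort -t, -k4,4 file3.dat > file3.srt
and then run the join against the .srt files.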
I really like the answer using join, but it does require that the files are sorted by the key column. Here's a version that doesn't have that restriction. Working under the theory that the best tool for doing database-like things is a database, it imports the CSV files into tables of a temporary SQLite database and then runs a SELECT on them to get your desired output:
(edit: Revised version based on new information about the data)
#!/bin/sh
# Usage: ./merge.sh file1.dat file2.dat file3.dat > output.dat
file1=$1
file2=$2
file3=$3
rm -f scratch.db
sqlite3 -batch -noheader -csv -nullvalue "N/A" scratch.db <<EOF | perl -pe 's#(?:^|,)\K""(?=,|$)#N/A#g'
CREATE TABLE file1(f1_1 INTEGER, f1_2, f1_3, f1_4, f1_5);
CREATE TABLE file2(f2_1 INTEGER, f2_2);
CREATE TABLE file3(f3_1, f3_2, f3_3, f3_4 INTEGER, f3_5, f3_6, f3_7, f3_8);
.import $file1 file1
.import $file2 file2
.import $file3 file3
-- Build indexes to speed up joining and sorting gigs of data.
CREATE INDEX file1_idx ON file1(f1_1);
CREATE INDEX file2_idx ON file2(f2_1);
CREATE INDEX file3_idx ON file3(f3_4);
SELECT f3_4, f3_5, f1_2, f1_3, f1_5, f2_2, f3_8, f3_6, f1_4, f3_7, f3_1
, f3_2, f3_3
FROM file3
LEFT JOIN file1 ON f1_1 = f3_4
LEFT JOIN file2 ON f2_1 = f3_4
ORDER BY f3_4;
EOF
rm -f scratch.db
Note: This will use a temporary database file that's going to be the size of all your data and then some because of indexes. If you're space constrained, I have an idea for doing it without temporary files, given the information that the join columns are sorted numerically, but it's enough work that I'm not going to bother unless asked.

Retrieving multiple rows only if data is complete

I am trying to retrieve the relevant rows from a text file; however, I am unsure of how to get it done.
Below are the sample lines in the .txt file.
Name : Alice
Age : 23
Email : Alice@email.com
Name : John
Age : 24
Name : Peter
Age: 25
Email :Peter@email.com
As seen above, I am only interested in the data of Alice and Peter, because John's information is not complete (missing the email row).
So the output should just be:
Name : Alice
Age : 23
Email : Alice@email.com
Name : Peter
Age: 25
Email :Peter@email.com
Just print the records that have 3 lines:
$ awk -v RS= -v ORS='\n\n' -F'\n' 'NF==3' file
Name : Alice
Age : 23
Email : Alice@email.com
Name : Peter
Age: 25
Email :Peter@email.com
You can even automate it to figure out how many lines each record should have instead of hard-coding 3 (the file is passed twice: the first pass finds the maximum record size m, the second prints only records of that size):
$ awk -v RS= -v ORS='\n\n' -F'\n' 'NR==FNR{m=(NF>m?NF:m);next} NF==m' file file
Name : Alice
Age : 23
Email : Alice@email.com
Name : Peter
Age: 25
Email :Peter@email.com
That last assumes there's at least one record in your file that IS complete.
With GNU grep:
grep -Poz '^Name.*\n^Age.*\n^Email.*(\n^$)*' file
Output:
Name : Alice
Age : 23
Email : Alice@email.com
Name : Peter
Age: 25
Email :Peter@email.com
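Note that -z also makes grep terminate each match with a NUL byte instead of a newline, so you may want to strip those before further processing, e.g.:
grep -Poz '^Name.*\n^Age.*\n^Email.*(\n^$)*' file | tr -d '\0'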
You can use the following awk command:
awk '/Name ?:/&&/Age ?:/&&/Email ?:/' RS='' ORS='\n\n' file
Following man awk:
If RS is set to the null string, then records
are separated by blank lines.
This makes awk operate on records rather than lines. /Name ?:/&&/Age ?:/&&/Email ?:/ checks whether the record contains all required fields; the optional space copes with the inconsistent spacing of "Age: 25" in the input. If that is true, awk will print the record.
awk solution:
awk '/Name/{ n=$0 }n && /Age/{ a=$0; rn=NR }a && /Email/ && (NR-rn == 1){ print n RS a RS $0 RS }' file
The output:
Name : Alice
Age : 23
Email : Alice@email.com
Name : Peter
Age: 25
Email :Peter@email.com
very terse perl:
perl -00 -lne 'print if tr/\n/\n/ == 2' file.txt
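-00 puts perl into paragraph mode, so each blank-line-separated record arrives in $_ as one string; tr/\n/\n/ is the usual idiom for counting the newlines it contains, and a complete three-line record has exactly 2 of them (the trailing one is chomped by -l).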

Create a CSV File with specific format from a String obtained after WS Invocation

The title is self-explanatory. I am calling a web service which returns a String like this:
First Name="Kunal";Middle Name="";Last Name="Bhowmick";Address 1="HGB";Address 2="cvf";Address 3="tfg";City="DF";State="KL";Country="MN";Postal Code="0012";Telephone="(+98)6589745623"
Now I have to write a shell script to create a CSV file named CSV_Output.csv, and the file must be formatted with the String content.
The format must be something like this:
Field Name (in yellow color)    Value (in yellow color)
First Name Kunal
Middle Name
Last Name Bhowmick
Address 1 HGB
Address 2 cvf
Address 3 tfg
City DF
State KL
Country MN
Postal Code 0012
Telephone (+98)6589745623
Now I can easily generate a CSV file using redirection (>>), but how can I create and format a CSV file like the format shown above?
Sorry to be blunt, but I have no code to show, as I do not understand what to use here.
Kindly provide some suggestions (sample code). Any help is greatly appreciated.
An awk one-liner can convert the format:
awk -v RS="\\n|;" -v OFS="\t" -F= '{gsub(/"/,"");$1=$1}7' file
if you want the output to look better, you could pass the output to column and change the OFS like:
awk -v RS="\\n|;" -v OFS="#" -F= '{gsub(/"/,"");$1=$1}7' file|column -s"#" -t
the output is:
kent$ awk -v RS="\\n|;" -v OFS="#" -F= '{gsub(/"/,"");$1=$1}7' f|column -s"#" -t
First Name Kunal
Middle Name
Last Name Bhowmick
Address 1 HGB
Address 2 cvf
Address 3 tfg
City DF
State KL
Country MN
Postal Code 0012
Telephone (+98)6589745623
short explanation:
awk                  # the awk command
-v RS="\\n|;"        # set the record separator to \n (newline) or ; (semicolon)
-v OFS="\t"          # set the output field separator: <tab>
-F=                  # set "=" as the field separator
'{gsub(/"/,"");      # remove all double quotes
$1=$1}               # $1=$1 lets awk rebuild the line with the given OFS
7'                   # a non-zero value: print the whole line
Can be achieved using tr and column:
$ cat input
First Name="Kunal";Middle Name="";Last Name="Bhowmick";Address 1="HGB";Address 2="cvf";Address 3="tfg";City="DF";State="KL";Country="MN";Postal Code="0012";Telephone="(+98)6589745623"
$ cat input | tr ";" "\n" | column -s= -t | tr -d \"
First Name Kunal
Middle Name
Last Name Bhowmick
Address 1 HGB
Address 2 cvf
Address 3 tfg
City DF
State KL
Country MN
Postal Code 0012
Telephone (+98)6589745623
Split the input on ";"; pipe the output to column, specifying = as the delimiter; then get rid of the quotes!
EDIT: Didn't realize that you want a CSV. In that event, use:
$ cat input | tr ";" "\n" | tr "=" "\t" | tr -d \"
which will result in TAB-delimited output.
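If any value could itself contain a comma, a true CSV with every field quoted is safer; one way to get that (a sketch) with the same tools plus awk:
tr ';' '\n' < input | awk -F= '{ gsub(/"/, "", $2); printf "\"%s\",\"%s\"\n", $1, $2 }'
This strips the existing double quotes from each value and re-emits both the field name and the value inside quotes of their own.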
