Trying to copy data from a Vertica table into CSV using vsql - unix
I was trying to export data from a Vertica table to a CSV file, but some values contain a comma (","), which pushes the rest of the value into the next column.
vsql -h server_address -p 5433 -U username -w password -F $',' -A \
  -o sadumpfile_3.csv -c "select full_name from company_name;" -P footer=off
Vertica table data and expected CSV:
full_name
----------
Samsun pvt, ltd
Apple inc
abc, pvt ltd
Output sadumpfile_3.csv:
full_name
------------- ---------
Samsunpvt ltd
Apple inc
abc pvt ltd
Thanks in advance
Default behaviour (I have the four environment variables VSQL_USER, VSQL_PASSWORD, VSQL_HOST and VSQL_DATABASE set):
marco ~/1/Vertica/supp $ vsql -c "select full_name from company_name"
full_name
-----------------
Apple inc
Samsun pvt, ltd
abc, pvt ltd
(3 rows)
The simplest way to achieve what you were trying:
marco ~/1/Vertica/supp $ vsql -F ',' -A -c "select full_name from company_name;" -Pfooter
full_name
Apple inc
Samsun pvt, ltd
abc, pvt ltd
Note that the only commas are the ones already existing in the strings. If you only export one column, there's no field delimiter in the output.
I can only suppose that you want the output like this so that you can, for example, import it into Excel as CSV. If the field delimiter occurs inside a string, you need to enclose the string in (usually double) quotes.
Vertica has a function that encloses a string with double quotes: QUOTE_IDENT():
marco ~/1/Vertica/supp $ vsql -F ',' -A \
-c "select QUOTE_IDENT(full_name) AS full_name from company_name;" -Pfooter
full_name
"Apple inc"
"Samsun pvt, ltd"
"abc, pvt ltd"
Related
Comparison of two CSV files
I'm trying to use awk to compare the contents of two large CSV files (more than 5000 rows each) but I can't get what I want. Here is my problem: I have a first file (File1) with a list of names ($1) and cities ($2) whose structure looks like this:

john;london
marc;paris
karen;new york
ben;london
vic;dublin

I have a second file (File2) with other information where we find some names ($3) of File1:

45456;windows;john;454646
47764;mac;zack;470093
41225;mac;ben;622101
12634;windows;ben;218996
7856;windows;karen;637294
12;mac;finn;878317
2315;windows;beverly;221167
445;windows;lilly;12316
3232;mac;john;601316
4546;mac;fish;305035
487;windows;vic;447421
46464;mac;karen;468154

I would like to extract from File2 all the lines whose names appear in File1, while adding the city associated with each name in File1. Here is an example of the result I am looking for:

45456;windows;john;454646;london
3232;mac;john;601316;london
7856;windows;karen;637294;new york
46464;mac;karen;468154;new york
41225;mac;ben;622101;london
12634;windows;ben;218996;london
487;windows;vic;447421;dublin

Could you help me?
Build an associative array from the first file, making the name the index and the city the value. For the second file, check if the name features in the array; if yes, print the line and append the city.

awk -F';' 'NR==FNR{a[$1]=$2}$3 in a{print $0";"a[$3]}' File1 File2
45456;windows;john;454646;london
41225;mac;ben;622101;london
12634;windows;ben;218996;london
7856;windows;karen;637294;new york
3232;mac;john;601316;london
487;windows;vic;447421;dublin
46464;mac;karen;468154;new york
With bash, GNU sort and GNU join:

join -t ';' -1 1 -2 3 <(sort File1) <(sort -t ';' -k 3 File2) -o 2.1,2.2,1.1,2.4,1.2

Output:

12634;windows;ben;218996;london
41225;mac;ben;622101;london
45456;windows;john;454646;london
3232;mac;john;601316;london
46464;mac;karen;468154;new york
7856;windows;karen;637294;new york
487;windows;vic;447421;dublin
Using csvkit:

csvjoin -d ';' -H -c 3,1 File2 File1 | csvformat -D ';' | tail -n +2
Finding a Column in File1 present in File2 in awk
I've 2 files as below:

File1
USA,China,India,Canada

File2
Iran|XXXXXX|Iranian
Iraq|YYYYYY|Iraquian
Saudi|ZZZZZ|Saudi is a Rich Country
USA|AAAAAA|USA is United States of America.
India|IIII|India got freedom from British.
Scot|SSSSS|Canada Mexio.

How can I read each value in File1 and check whether it matches the first pipe-delimited field in File2, using awk? I've tried but could not achieve it. Please help. For the above input the output should be:

USA Matches
China Not Matched
India Matches
Canada Not Matches
Could you please try the following:

awk 'FNR==NR{a[$1];next} {for(i=1;i<=NF;i++){if($i in a){print $i,"Matches"} else {print $i,"Not Matches."}}}' FS="|" Input_file2 FS="," Input_file1
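The trick worth spelling out: awk processes var=value assignments that appear between file arguments when it reaches them, so FS switches from "|" to "," between the two input files. The same command laid out with comments (a sketch only; the behavior is unchanged):

awk '
FNR==NR { a[$1]; next }          # reading Input_file2 (FS="|"): remember each first field as an array key
{                                # reading Input_file1 (FS=","):
  for (i=1; i<=NF; i++)          #   loop over the comma-separated country names
    if ($i in a) print $i, "Matches"
    else         print $i, "Not Matches."
}
' FS="|" Input_file2 FS="," Input_file1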
You can try Perl also:

$ cat vinoth1
USA,China,India,Canada

$ cat vinoth2
Iran|XXXXXX|Iranian
Iraq|YYYYYY|Iraquian
Saudi|ZZZZZ|Saudi is a Rich Country
USA|AAAAAA|USA is United States of America.
India|IIII|India got freedom from British.
Scot|SSSSS|Canada Mexio.

$ perl -F, -lane ' BEGIN { $x=qx(cat vinoth2) } print $_, $x=~/^$_/m ? " matches" : " not matches" for (@F) ' vinoth1
USA matches
China not matches
India matches
Canada not matches
Command to perform full outer join with duplicate entries in key/join column
I have three files. I need to join them based on one column and perform some transformations.

file1.dat (column 1 is used for joining)
123,is1,ric1,col1,smbc1
123,is2,ric1,col1,smbc1
234,is3,ric3,col3,smbc2
345,is4,ric4,,smbc2
345,is4,,col5,smbc2

file2.dat (column 1 is used for joining)
123,abc
234,bcd

file3.dat (column 4 is used for joining)
r0c1,r0c2,r0c3,123,r0c5,r0c6,r0c7,r0c8
r2c1,r2c2,r2c3,123,r2c5,r2c6,r2c7,r2c8
r3c1,r3c2,r3c3,234,r3c5,r3c6,r3c7,r3c8
r4c1,r4c2,r4c3,345,r4c5,r4c6,r4c7,r4c8

Expected output (output.dat)
123,r0c5,is1,ric1,smbc1,abc,r0c8,r0c6,col1,r0c7,r0c1,r0c2,r0c3
123,r0c5,is2,ric1,smbc1,abc,r0c8,r0c6,col1,r0c7,r0c1,r0c2,r0c3
123,r2c5,is1,ric1,smbc1,abc,r2c8,r2c6,col1,r2c7,r2c1,r2c2,r2c3
123,r2c5,is2,ric1,smbc1,abc,r2c8,r2c6,col1,r2c7,r2c1,r2c2,r2c3
234,r3c5,is3,ric3,smbc2,bcd,r3c8,r3c6,col3,r3c7,r3c1,r3c2,r3c3
345,r4c5,is4,ric4,smbc2,N/A,r4c8,r4c6,N/A,r4c7,r4c1,r4c2,r4c3
345,r4c5,is4,N/A,smbc2,N/A,r4c8,r4c6,col5,r4c7,r4c1,r4c2,r4c3

I wrote the following awk command:

awk '
BEGIN { FS=OFS="," }
FILENAME == ARGV[1] { temp_join_one[$1] = $2"|"$3"|"$4"|"$5; next }
FILENAME == ARGV[2] { exchtbunload[$1] = $2; next }
FILENAME == ARGV[3] {
    s_temp_join_one = temp_join_one[$4];
    split(s_temp_join_one, array_temp_join_one, "|");
    v3 = (array_temp_join_one[1]=="" ? "N/A" : array_temp_join_one[1]);
    v4 = (array_temp_join_one[2]=="" ? "N/A" : array_temp_join_one[2]);
    v5 = (array_temp_join_one[4]=="" ? "N/A" : array_temp_join_one[4]);
    v6 = (exchtbunload[$4]=="" ? "N/A" : exchtbunload[$4]);
    v9 = (array_temp_join_one[3]=="" ? "N/A" : array_temp_join_one[3]);
    v11 = ($2=="" ? "N/A" : $2);
    print $4, $5, v3, v4, v5, v6, $8, $6, v9, $7, $1, v11, $3 > "output.dat"
}
' file1.dat file2.dat file3.dat

I need to join all three files. The final output file should have all the values from file3 irrespective of whether they are in the other two files, and the corresponding columns should be empty (or N/A) if they are not present in the other two files. (The order of the columns is not a very big problem; I can use awk to rearrange them.)

But my problem is that, as the key is not unique, I am not getting the expected output. My output has only three lines. I tried to apply the solution suggested using a join condition. It works with smaller files, but the files I have are close to 3-5 GB in size, and they are sorted in numerical order, not lexicographical order. Sorting them looks like it would take a lot of time. Any suggestion would be helpful. Thanks in advance.
With join, assuming the files are sorted by the key:

$ join -t, -1 1 -2 4 <(join -t, -a1 -a2 -e "N/A" -o1.1,1.2,1.3,1.4,1.5,2.1 file1 file2) \
      file3 -o1.1,2.5,1.2,1.3,1.5,1.6,2.8,2.6,1.4,2.7,2.2,2.3
123,r0c5,is1,ric1,smbc1,123,r0c8,r0c6,col1,r0c7,r0c2,r0c3
123,r2c5,is1,ric1,smbc1,123,r2c8,r2c6,col1,r2c7,r2c2,r2c3
123,r0c5,is2,ric1,smbc1,123,r0c8,r0c6,col1,r0c7,r0c2,r0c3
123,r2c5,is2,ric1,smbc1,123,r2c8,r2c6,col1,r2c7,r2c2,r2c3
234,r3c5,is3,ric3,smbc2,234,r3c8,r3c6,col3,r3c7,r3c2,r3c3
345,r4c5,is4,ric4,smbc2,N/A,r4c8,r4c6,N/A,r4c7,r4c2,r4c3
345,r4c5,is4,N/A,smbc2,N/A,r4c8,r4c6,col5,r4c7,r4c2,r4c3
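Since the question mentions the files are sorted numerically rather than lexicographically, join (which expects lexicographic order) may report out-of-order input on the real data. A sketch of the same pipeline with the keys re-sorted on the fly, at the cost of sort passes over the multi-gigabyte files:

join -t, -1 1 -2 4 \
  <(join -t, -a1 -a2 -e "N/A" -o1.1,1.2,1.3,1.4,1.5,2.1 \
      <(sort -t, -k1,1 file1) <(sort -t, -k1,1 file2)) \
  <(sort -t, -k4,4 file3) \
  -o1.1,2.5,1.2,1.3,1.5,1.6,2.8,2.6,1.4,2.7,2.2,2.3

With GNU sort, the -S (buffer size) and --parallel options can help tune the sorts for inputs of this size.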
I really like the answer using join, but it does require that the files are sorted by the key column. Here's a version that doesn't have that restriction. Working under the theory that the best tool for doing database-like things is a database, it imports the CSV files into tables of a temporary SQLite database and then runs a SELECT on them to get your desired output:

(Edit: revised version based on new information about the data.)

#!/bin/sh
# Usage: ./merge.sh file1.dat file2.dat file3.dat > output.dat
file1=$1
file2=$2
file3=$3
rm -f scratch.db
sqlite3 -batch -noheader -csv -nullvalue "N/A" scratch.db <<EOF | perl -pe 's#(?:^|,)\K""(?=,|$)#N/A#g'
CREATE TABLE file1(f1_1 INTEGER, f1_2, f1_3, f1_4, f1_5);
CREATE TABLE file2(f2_1 INTEGER, f2_2);
CREATE TABLE file3(f3_1, f3_2, f3_3, f3_4 INTEGER, f3_5, f3_6, f3_7, f3_8);
.import $file1 file1
.import $file2 file2
.import $file3 file3
-- Build indexes to speed up joining and sorting gigs of data.
CREATE INDEX file1_idx ON file1(f1_1);
CREATE INDEX file2_idx ON file2(f2_1);
CREATE INDEX file3_idx ON file3(f3_4);
SELECT f3_4, f3_5, f1_2, f1_3, f1_5, f2_2, f3_8, f3_6, f1_4, f3_7, f3_1, f3_2, f3_3
FROM file3
LEFT JOIN file1 ON f1_1 = f3_4
LEFT JOIN file2 ON f2_1 = f3_4
ORDER BY f3_4;
EOF
rm -f scratch.db

Note: this will use a temporary database file that's going to be the size of all your data and then some, because of the indexes. If you're space constrained, I have an idea for doing it without temporary files, given the information that the join columns are sorted numerically, but it's enough work that I'm not going to bother unless asked.
Retrieving multiple rows only if data is complete
I am trying to retrieve only the relevant rows from a text file, but I am unsure how to get it done. Below are the sample lines in the .txt file:

Name : Alice
Age : 23
Email : Alice@email.com

Name : John
Age : 24

Name : Peter
Age: 25
Email :Peter@email.com

As seen above, I am only interested in the data for Alice and Peter, because John's information is incomplete (missing the Email row). So the output should just be:

Name : Alice
Age : 23
Email : Alice@email.com

Name : Peter
Age: 25
Email :Peter@email.com
Just print the records that have 3 lines:

$ awk -v RS= -v ORS='\n\n' -F'\n' 'NF==3' file
Name : Alice
Age : 23
Email : Alice@email.com

Name : Peter
Age: 25
Email :Peter@email.com

You can even automate it to figure out how many lines each record should have, instead of hard-coding 3:

$ awk -v RS= -v ORS='\n\n' -F'\n' 'NR==FNR{m=(NF>m?NF:m);next} NF==m' file file
Name : Alice
Age : 23
Email : Alice@email.com

Name : Peter
Age: 25
Email :Peter@email.com

That last one assumes there's at least one record in your file that IS complete.
With GNU grep:

grep -Poz '^Name.*\n^Age.*\n^Email.*(\n^$)*' file

Output:

Name : Alice
Age : 23
Email : Alice@email.com

Name : Peter
Age: 25
Email :Peter@email.com
You can use the following awk command:

awk '/Name :/&&/Age :/&&/Email :/' RS='' ORS='\n\n' file

From man awk: if RS is set to the null string, then records are separated by blank lines. This makes awk operate on records rather than lines. /Name :/&&/Age :/&&/Email :/ checks whether the record contains all required fields; if it does, awk prints the record.
awk solution:

awk '/Name/{ n=$0 }n && /Age/{ a=$0; rn=NR }a && /Email/ && (NR-rn == 1){ print n RS a RS $0 RS }' file

The output:

Name : Alice
Age : 23
Email : Alice@email.com

Name : Peter
Age: 25
Email :Peter@email.com
Very terse perl:

perl -00 -lne 'print if tr/\n/\n/ == 2' file.txt

(-00 puts perl in paragraph mode, so each blank-line-separated record is read as one unit; tr/\n/\n/ counts the newlines in it, so only three-line records are printed.)
Create a CSV File with specific format from a String obtained after WS Invocation
The title is self explanatory. I am calling a web service which returns a String like this:

First Name="Kunal";Middle Name="";Last Name="Bhowmick";Address 1="HGB";Address 2="cvf";Address 3="tfg";City="DF";State="KL";Country="MN";Postal Code="0012";Telephone="(+98)6589745623"

Now I have to write a shell script to create a CSV file named CSV_Output.csv, and the file must be formatted with the String content. The format must be something like this:

Field Name (in yellow color)   Value (in yellow color)
First Name                     Kunal
Middle Name
Last Name                      Bhowmick
Address 1                      HGB
Address 2                      cvf
Address 3                      tfg
City                           DF
State                          KL
Country                        MN
Postal Code                    0012
Telephone                      (+98)6589745623

Now I can easily generate a CSV file using redirection (>>), but how can I create and format a CSV file like the format shown above?

Sorry to be blunt, but I have no code to show, as I am not understanding what to use here. Kindly provide some suggestions (sample code). Any help is greatly appreciated.
An awk one-liner can convert the format:

awk -v RS="\\n|;" -v OFS="\t" -F= '{gsub(/"/,"");$1=$1}7' file

If you want the output to look better, you can pass it to column and change the OFS, like:

awk -v RS="\\n|;" -v OFS="#" -F= '{gsub(/"/,"");$1=$1}7' file | column -s"#" -t

The output:

kent$ awk -v RS="\\n|;" -v OFS="#" -F= '{gsub(/"/,"");$1=$1}7' f | column -s"#" -t
First Name   Kunal
Middle Name
Last Name    Bhowmick
Address 1    HGB
Address 2    cvf
Address 3    tfg
City         DF
State        KL
Country      MN
Postal Code  0012
Telephone    (+98)6589745623

Short explanation:

awk                  # awk command
-v RS="\\n|;"        # set record separator: \n (newline) or ; (semicolon)
-v OFS="\t"          # set output field separator: <tab>
-F=                  # set "=" as field separator
'{gsub(/"/,"");      # remove all double quotes
$1=$1}               # $1=$1, to let awk reformat the line with the given OFS
7'                   # a non-zero number, to print the whole line
Can be achieved using tr and column:

$ cat input
First Name="Kunal";Middle Name="";Last Name="Bhowmick";Address 1="HGB";Address 2="cvf";Address 3="tfg";City="DF";State="KL";Country="MN";Postal Code="0012";Telephone="(+98)6589745623"

$ cat input | tr ";" "\n" | column -s= -t | tr -d \"
First Name       Kunal
Middle Name
Last Name        Bhowmick
Address 1        HGB
Address 2        cvf
Address 3        tfg
City             DF
State            KL
Country          MN
Postal Code      0012
Telephone        (+98)6589745623

Split the input on ;, pipe the output to column specifying = as the delimiter, and get rid of the quotes.

EDIT: Didn't realize that you want a CSV. In that event, use:

$ cat input | tr ";" "\n" | tr "=" "\t" | tr -d \"

which will produce TAB-delimited output.