How to send each hive query's output to a separate file? - unix

I have a .sql file containing hundreds of hive queries, and I want each query's output in its own file: for the 1st query an abc.txt file gets created, for the 2nd query an xyz.txt file gets created, and so on. For 100 queries, 100 output files with their respective results.

If your main .sql file has semicolon-separated SQL queries, you could use an awk command like this to generate a separate hive command, with its own output file, for each query:
tr '\n' ' ' < yourqueryfile | awk 'BEGIN {RS=";"} \
NF {gsub(/(^ +| +$)/, "", $0); printf "hive -e \"%s\" >OUT_"NR".txt\n", $0}'
RS=";" - sets record separator to ";"
tr - to replace newlines between queries to single space.
gsub - to trim leading and trailing spaces.
The command will generate multiple hive command lines like this.
hive -e "select 1" >OUT_2.txt
hive -e "select 3 from ( select 4 )" >OUT_7.txt
hive -e "select name from t union select n from t2" >OUT_9.txt
hive -e "select * from c" >OUT_4.txt

Related

Convert hive console output to text or csv

I need to perform a count on a Hive table, output the result into a text file, and drop it at another location as a trigger.
The hive output currently looks like this:
+-------------+----------+
| _c0 | _c1 |
+-------------+----------+
| 2020-03-01 | 3203500 |
+-------------+----------+
I tried options like the following:
hive -e 'select CURRENT_DATE, count(*) from db.table;' | sed 's/[[:space:]]\+/,/g' > /trigger/trigger_file.txt
But it's not giving the expected result. What else can I try?
The expected outcome inside the .txt file is as follows:
2020-03-01,3203500
You may replace your sed command with 
awk -F'[| ]+' '$2 ~ /[0-9]{4}-[0-9]{2}-[0-9]{2}/{print $2","$3}'
The -F'[| ]+' sets the field separator to the [| ]+ regex, which matches one or more space or pipe characters; the filter then grabs all records whose second field matches a date-like pattern ([0-9]{4}-[0-9]{2}-[0-9]{2}) and prints their second and third column values with a comma in between.
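For example, feeding the boxed hive output shown above through this filter keeps only the data row:
hive -e 'select CURRENT_DATE, count(*) from db.table;' | awk -F'[| ]+' '$2 ~ /[0-9]{4}-[0-9]{2}-[0-9]{2}/{print $2","$3}' > /trigger/trigger_file.txt
which leaves just 2020-03-01,3203500 in the file.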
To avoid having to post-process the output with sed etc., try this approach using concat_ws(',', col1, col2, ...): the result will already be comma-separated. Note that concat_ws expects string arguments, so non-string columns such as the count need a cast:
hive -e "select CONCAT_WS(',', CAST(CURRENT_DATE AS STRING), CAST(count(*) AS STRING)) from Mytable" > /home/user/Mycsv.csv
Hive also provides a built-in command to write query results into files:
INSERT OVERWRITE LOCAL DIRECTORY '/home/docs/temp' ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' select * from db.table;
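Note that this writes the result as one or more delimited files (typically named like 000000_0) under /home/docs/temp rather than as a single named .csv, so you may still need to combine them, for example:
cat /home/docs/temp/* > /home/docs/out.csv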
Another way:
hive -S -e 'set hive.cli.print.header=false; select * from db.table' | sed 's/[[:space:]]\+/,/g' > /home/docs/temp.csv

extract set of lines using separator from file in UNIX

I have the below 3 queries in a single flat file. I want to extract one query at a time from the input file (e.g. the 2nd query). The separator for each query is ";" (semicolon). Please suggest how I can do this.
input file: query.sql
select * from
DBNAME.table1;
select * from
DBNAME.table2
;
select * from
DBNAME.table3
WHERE date<= current_date-30;
The output should be:
Output file: query_out.sql
select * from
DBNAME.table2
;
You can use this awk,
awk 'BEGIN{RS=";"} NR==2{print $0}' yourfile.sql > output.sql
sat's answer does not trim blank lines between two SQL queries, nor does it output the semicolon ending a query.
Provided you are using gawk (or any flavour of awk allowing RS to be a regular expression), the following will probably suit your needs:
awk 'BEGIN {RS=";[[:space:]]*"} NR==2 {printf "%s;\n",$0}' yourfile.sql

how to pass a large list of ids to the where clause in sqlite3 batch mode?

I have a large list of ids I want to use for a query in sqlite3. I can loop over them one by one in the shell, perl or R, or do clever hacking with xargs or comma-concatenation in order to query in batches more efficiently, but I was wondering if there is a way of loading the file directly into a temp table, or doing a where in ([read file]). What is the standard way of dealing with this common situation?
You mean sqlite3 the command-line shell, right?
Yes, that one has an option to load a file, in some-separator-delimited format, into a (temporary) table.
create temporary table ids (id integer primary key);
.import ids.txt ids
Here ids.txt would have each id on a separate line. If you wanted to import multiple columns, you'd have to set the separator with the .separator command beforehand, but for a single column it doesn't matter.
Note that integer primary key is faster than any other column type, because it aliases the rowid, and sqlite stores tables in b-trees indexed by rowid. So use it if you can.
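For instance, a two-column tab-separated file could be imported and then used for the "where in" query like this (a minimal sketch; ids.txt is assumed to hold an id and a label per line, and mytable/mycolumn stand in for your real table and column):
create temporary table ids (id integer primary key, label text);
.separator "\t"
.import ids.txt ids
select * from mytable where mycolumn in (select id from ids);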
I came up with a long one-liner solution: create a string of comma-separated, quoted elements and pipe it to xargs. xargs will put as many of them as allowed on the command line and repeat the command as many times as necessary.
If I have a tab-separated file with my elements in the second column:
perl -lane 'push @x, $F[1]; END{ print join(",", map { "\x27$_\x27" } @x) }' file.txt |\
xargs -t -I{} sqlite3 -separator $'\t' -header database.db \
"select * from xtable where mycolumn in ({}) and myothercolumn = 'foo' order by bar"

Export from sqlite to csv using shell script

I'm making a shell script to export a sqlite query to a csv file, just like this:
#!/bin/bash
./bin/sqlite3 ./sys/xserve_sqlite.db ".headers on"
./bin/sqlite3 ./sys/xserve_sqlite.db ".mode csv"
./bin/sqlite3 ./sys/xserve_sqlite.db ".output out.csv"
./bin/sqlite3 ./sys/xserve_sqlite.db "select * from eS1100_sensor_results;"
./bin/sqlite3 ./sys/xserve_sqlite.db ".exit"
When executing the script, the output appears on the screen instead of being saved to "out.csv". The same method works when typed at the command line, but I don't know why the shell script fails to export the data to the file.
What am I doing wrong?
Instead of the dot commands, you could use sqlite3 command options:
sqlite3 -header -csv my_db.db "select * from my_table;" > out.csv
This makes it a one-liner.
Also, you can run a sql script file:
sqlite3 -header -csv my_db.db < my_script.sql > out.csv
Use sqlite3 -help to see the list of available options.
sqlite3
You have a separate call to sqlite3 for each line; by the time your select runs, your .output out.csv has been forgotten.
Try:
#!/bin/bash
./bin/sqlite3 ./sys/xserve_sqlite.db <<!
.headers on
.mode csv
.output out.csv
select * from eS1100_sensor_results;
!
instead.
sh/bash methods
You can either call your script with a redirection:
$ your_script >out.csv
or you can insert the following as a first line in your script:
exec >out.csv
The former method allows you to specify a different filename on each run, while the latter always outputs to a specific filename. In both cases the .output out.csv line can be dropped from the script.
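For example, the second method applied to the heredoc script above might look like this (a sketch; the .output line is gone, since stdout is already redirected to the file):
#!/bin/bash
exec >out.csv
./bin/sqlite3 ./sys/xserve_sqlite.db <<!
.headers on
.mode csv
select * from eS1100_sensor_results;
!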
I recently created a shell script that takes the tables from a db file and converts them into csv files.
https://github.com/darrentu/convert-db-to-csv
Feel free to ask me any questions on my script :)
Although the question is about shell scripts, I think it will help those who just want to transfer data from an sqlite3 database to a csv file.
I found a very convenient way to do it with the Firefox browser, using the SQLite Manager extension.
Simply connect to your sqlite database file in Firefox (SQLite Manager -> Connect Database) and then Table -> Export Table. You will be offered some more options that you can just click and try.
In the end you get a csv file with the table you have chosen to export.
Using the command line on Linux:
$ sqlite3 yourdatabase.db       # open your sqlite database first
sqlite> .tables                 # list the tables, if any have been created
sqlite> .schema                 # if you want to check the schema of a table
# once you find your table(s), do the following:
sqlite> .headers on             # export along with headers (column names)
sqlite> .mode csv               # output format is csv
sqlite> .output example.csv     # the file name to export to
sqlite> SELECT * from events;   # select the entire table, or only the rows required
sqlite> .quit                   # finally quit the sqlite3 shell
Now search your system for the example.csv file and you will find it.
In one line:
sqlite3 -header -csv ./sys/xserve_sqlite.db "select * from eS1100_sensor_results;" >./out.csv
A synthesis of the answers till now:
function sqlite2csv_table() {
  local db="${1}" table="${2}" output="${3}"
  if test -z "$output" ; then
    output="${db%.*}_${table}.csv"
  fi
  [[ "$output" =~ \.csv$ ]] || output+='.csv'
  echo "$0: outputting table '$table' to '$output'"
  sqlite3 -header -csv "$db" "select * from ${table};" > "$output" || return $?
}
function sqlite2csv() {
  local db="${1}" o="${2}"
  local tables=($(sqlite3 "$db" ".tables"))
  local table
  for table in "${tables[@]}" ; do
    sqlite2csv_table "$db" "$table" "${o}_${table}.csv"
  done
}
Usage:
sqlite2csv some.db [/path/to/output]
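For example, for a database some.db containing tables t1 and t2 (hypothetical names), sqlite2csv some.db /tmp/out would write /tmp/out_t1.csv and /tmp/out_t2.csv.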

How to automate a process with the sqlite3.exe command line tool?

I'm trying to bulk load a lot of data (5.5 million rows) into an SQLite database file.
Loading via INSERTs seems to be far too slow, so I'm trying to use the sqlite3 command line tool and the .import command.
It works perfectly if I enter the commands by hand, but I can't for the life of me work out how to automate it from a script ( .bat file or python script; I'm working on a Windows machine ).
The commands I issue at the command line are these:
> sqlite3 database.db
sqlite> CREATE TABLE log_entry ( <snip> );
sqlite> .separator "\t"
sqlite> .import logfile.log log_entry
But nothing I try will get this to work from a bat file or python script.
I've been trying things like:
sqlite3 "database.db" .separator "\t" .import logfile.log log_entry
echo '.separator "\t" .import logfile.log log_entry' | sqlite3 database.db
Surely I can do this somehow?
Create a text file with the lines you want to enter into the sqlite command line program, like this:
CREATE TABLE log_entry ( <snip> );
.separator "\t"
.import logfile.log log_entry
and then just call sqlite3 database.db < commands.txt
Alternatively, you can put everything in one shell script file (thus simplifying maintenance) using a heredoc, import.sh:
#!/bin/bash --
sqlite3 -batch "$1" <<"EOF"
CREATE TABLE log_entry ( <snip> );
.separator "\t"
.import logfile.log log_entry
EOF
...and run it:
import.sh database.db
It makes it easier to maintain just one script file.
By the way, if you need to run it under Windows, PowerShell also features heredocs (here-strings).
This approach also helps to work around the lack of script parameter support in a plain command file: you can use bash variables:
#!/bin/bash --
table_name=log_entry
sqlite3 -batch "$1" <<EOF
CREATE TABLE ${table_name} ( <snip> );
.separator "\t"
.import logfile.log ${table_name}
EOF
Or even do a trick like this:
#!/bin/bash --
table_name=$2
sqlite3 -batch "$1" <<EOF
CREATE TABLE ${table_name} ( <snip> );
.separator "\t"
.import logfile.log ${table_name}
EOF
...and run it: import.sh database.db log_entry
Create a separate text file containing all the commands you would normally type into the sqlite3 shell app:
CREATE TABLE log_entry ( <snip> );
.separator "\t"
.import /path/to/logfile.log log_entry
Save it as, say, impscript.sql.
Create a batch file which calls the sqlite3 shell with that script:
sqlite3.exe yourdatabase.db < /path/to/impscript.sql
Call the batch file.
On a side note - when importing, make sure to wrap the INSERTs in a transaction! That will give you an instant 10,000% speedup.
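A minimal sketch of what that wrapping looks like for the INSERT-based loading mentioned in the question (the VALUES lists are placeholders for your actual row data):
BEGIN TRANSACTION;
INSERT INTO log_entry VALUES (...);
-- ...millions more INSERTs, one per row...
COMMIT;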
I just recently had a similar problem while converting Firefox's cookies.sqlite to a text file (for some downloading tool) and stumbled across this question.
I wanted to do that with a single shell line, and this would be my solution applied to the above-mentioned problem:
echo -e ".mode tabs\n.import logfile.log log_entry" | sqlite3 database.db
I haven't tested that line yet, but it worked fine with the Firefox problem I mentioned above (btw, via Bash on Mac OSX):
echo -e ".mode tabs\nselect host, case when host glob '.*' then 'TRUE' else 'FALSE' end, path, case when isSecure then 'TRUE' else 'FALSE' end, expiry, name, value from moz_cookies;" | sqlite3 cookies.sqlite
sqlite3 abc.db ".read scriptname.sql"
At this point, I'm not sure what else I can add, other than that I had some trouble getting a unix environment variable into the bash script suggested by nad2000.
Running this:
bash dbmake.sh database.db <(sed '1d' $DATA/logfile.log | head -n 1000)
I needed to import from stdin as a workaround, and I found this solution:
sqlite3 "$1" <<"EOF"
CREATE TABLE log_entry ( <snip> );
EOF
sqlite3 -separator $'\t' "$1" ".import $2 log_entry"
By adding the second sqlite3 line, I was able to pass the $2 from Unix into the file parameter for .import, full path and everything.
On Windows, this should work:
(echo CREATE TABLE log_entry ( <snip> ); & echo .separator "\t" & echo .import logfile.log log_entry) | sqlite3.exe database.db
I haven't tested this particular command, but from my own attempts at piping multiple commands I found that the key was to enclose the echoed commands within parentheses. That being said, you may need to tweak the above command to also escape some of those characters. For example:
(echo CREATE TABLE log_entry ^( ^<snip^> ^); & echo .separator "\t" & echo .import logfile.log log_entry) | sqlite3.exe database.db
I'm not sure if the escaping is needed in this case, but it is highly probable, since the parentheses may conflict with the enclosing ones, and the "less than" and "greater than" symbols are usually interpreted as input or output redirection, which may also conflict. An extensive list of character escapes can be found here: http://www.robvanderwoude.com/escapechars.php
Here trans is the table name and trans.csv is a csv file in which I have 1959 rows of data. Note that the separator has to be set in the same sqlite3 invocation as the .import; each invocation is a fresh session, so a .separator given in a previous call is forgotten:
$ sqlite3 -separator ',' abc.db ".import trans.csv trans"
$ sqlite3 abc.db "select count(*) from trans;"
1959
But it's not possible to write it all on one line the way you wrote it in the question.
