I'm loading a CSV into an SQLite database like this:
sqlite3 /path/to/output.db < /path/to/sqlite_commands.sql
The sqlite command file looks like this:
sqlite_commands.sql
CREATE TABLE products (
"c1" TEXT PRIMARY KEY NOT NULL,
"c2" TEXT,
"c3" TEXT
);
.mode csv
.import /tmp/csv_with_dups.csv products
and the CSV looks like this:
/tmp/csv_with_dups.csv
c1,c2,c3
a,b,c
b,c,d
c,d,e
d,e,f
a,a,b
e,f,g
I am getting errors on stderr:
/tmp/csv_with_dups.csv.tmp:6: INSERT failed: UNIQUE constraint failed: products.c1
I want to silence this error, as we know that some CSVs have duplicates (the CSV is generated by a separate mechanism on a very large data set and cannot be validated for duplicates at that stage).
I've tried adding this line per the documentation
.log off
also tried
.log stderr|off
also tried
.log stderr off
sqlite3
.help
...
.log FILE|off Turn logging on or off. FILE can be stderr/stdout
...
The "INSERT failed" message is always printed to stderr.
You could ignore stderr, but that would also suppress all other error messages:
sqlite3 ... 2>/dev/null
Alternatively, generate the SQL commands yourself so that you can use INSERT OR IGNORE:
import sys
import csv

def quote_sql_str(s):
    # Escape embedded single quotes by doubling them.
    return "'" + s.replace("'", "''") + "'"

print('BEGIN;')
with open(sys.argv[1], newline='') as file:
    reader = csv.reader(file)
    next(reader)  # skip the header row shown in the sample CSV
    for row in reader:
        print('INSERT OR IGNORE INTO products VALUES({});'
              .format(','.join([quote_sql_str(s) for s in row])))
print('COMMIT;')
python script.py csv_with_dups.csv | sqlite3 /path/to/output.db
Alternatively, import into a temporary table without constraints, then copy into the real table with INSERT OR IGNORE.
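As a rough sketch of that staging-table idea using only Python's standard sqlite3 and csv modules (the staging table name, the header skip, and the file paths are assumptions based on the question):

import csv
import sqlite3

conn = sqlite3.connect('/path/to/output.db')

# Stage the raw CSV rows in a table that has no constraints.
conn.execute('CREATE TEMP TABLE staging (c1 TEXT, c2 TEXT, c3 TEXT)')
with open('/tmp/csv_with_dups.csv', newline='') as f:
    rows = csv.reader(f)
    next(rows)  # skip the header row
    conn.executemany('INSERT INTO staging VALUES (?, ?, ?)', rows)

# Copy into the real table, silently dropping rows with duplicate primary keys.
conn.execute('INSERT OR IGNORE INTO products SELECT * FROM staging')
conn.commit()
conn.close()

Because the staging table has no PRIMARY KEY, every CSV row loads cleanly, and the INSERT OR IGNORE step drops the duplicates without printing anything.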
Related
I am using DataGrip or SQLiteStudio (database managers) to run a series of queries in a database which guide me to the information that I require. The queries work well and the results are shown in the console of the database manager. However, I need to export the results that appear in the database manager console into a CSV file.
I have seen that everybody works directly in the shell, but I need (I have to) use a DB manager to run the queries (so far the queries that I need to run in one step are about 600 lines).
In the sqlite3 shell I am able to run (and it works):
(.headers on)
(.mode csv)
(.output C:/filename.csv)
(select * from "6000_1000_Results";)
(.output stdout)
However, running this code in the SQL editor of the DB manager doesn't work at all.
--(.....)
--(around 600 lines before)
--(.....)
"Material ID",
"Material Name",
SUM("Quantity of Material") Quantity
FROM
"6000_1000_Results_Temp"
GROUP BY
"DataCenterID", "Material ID";
------------------------------------------------------------
--(HERE IS WHERE I NEED TO EXPORT THE RESULTS TO A CSV FILE)
------------------------------------------------------------
.headers on
.mode csv
.output C:/NextCloudLuis/TemproDB.git/csvtest.csv
select * from "6000_1000_Results";
.output stdout
.show
DROP TABLE IF EXISTS "6000_1000_Results_Temp";
DROP TABLE IF EXISTS "6000_1000_Results";
DataGrip does not show any error; it runs the queries in a few seconds, but there's no file anywhere. SQLiteStudio gives a syntax error.
Finally I solved this issue with the following steps:
I run all the queries from Python using the sqlite3 library. The results of all the queries are saved in a pandas DataFrame, which is then exported to CSV and XLSX files.
Here is the Python code:
import sqlite3
import queries

conn = sqlite3.connect("tempro.db")            # make the database connection from Python
level4000 = queries.level4000to1000(conn)      # call the function in queries.py
level4000.to_csv('Level4000to1000.csv')        # export the result to CSV
level4000.to_excel('Level4000to1000.xlsx')     # export the result to XLSX
conn.close()
and here is the Python file where I save all the queries (queries.py):
import sqlite3
from sqlite3 import Error
import pandas as pd

def level4000to1000(conn):
    cur = conn.cursor()
    cur.executescript(
        """
        /* Here I put all the 600 lines of queries */
        DROP TABLE IF EXISTS "4000_1000_Results_Temp";
        DROP TABLE IF EXISTS "4000_1000_Results";
        /* Here more and more lines */
        --To keep the results from all queries
        CREATE TABLE "4000_1000_Results_Temp" (
            "DeviceID" INTEGER,
            "Device Name" TEXT,
            SUM("Quantity of Material") Quantity
        FROM
            "4000_1000_Results_Temp"
        GROUP BY
            "DeviceID", "Material ID";
        """)
    df = pd.read_sql_query('''SELECT * FROM "4000_1000_Results";''', conn)
    cur.executescript("""DROP TABLE IF EXISTS "4000_1000_Results_Temp";
                         DROP TABLE IF EXISTS "4000_1000_Results";""")
    return df  # returns a DataFrame with the results from the queries
In the end, it seems that there is no way to export results to file formats such as CSV using SQL code alone from inside a database manager.
I have a text file with a .sql extension which contains code for building a table and populating values into its columns.
How would I generate a .db SQLite file out of that?
Assuming you are using SQLite via the command-line shell, use .read FILENAME to run the SQL; you'd then use the .save FILENAME command to save the database (including the .db extension). Alternately, before using the .read command you could use .open FILENAME.
e.g. Using the file C:/User/Mike/mysql.sql as :-
CREATE TABLE IF NOT EXISTS mytable (id INTEGER PRIMARY KEY, mydata TEXT);
INSERT INTO mytable (mydata) VALUES('Fred'),('Mary'),('Sue'),('Tom');
SELECT * FROM mytable;
Starting a command window and then :-
Microsoft Windows [Version 10.0.17134.407]
(c) 2018 Microsoft Corporation. All rights reserved.
C:\Users\Mike>SQLITE3
SQLite version 3.22.0 2018-01-22 18:45:57
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.
sqlite> .read mysql.sql
1|Fred
2|Mary
3|Sue
4|Tom
sqlite>
.read mysql.sql was manually input after issuing the SQLITE3 command.
PS .help results in :-
sqlite> .help
.auth ON|OFF Show authorizer callbacks
.backup ?DB? FILE Backup DB (default "main") to FILE
.bail on|off Stop after hitting an error. Default OFF
.binary on|off Turn binary output on or off. Default OFF
.cd DIRECTORY Change the working directory to DIRECTORY
.changes on|off Show number of rows changed by SQL
.check GLOB Fail if output since .testcase does not match
.clone NEWDB Clone data into NEWDB from the existing database
.databases List names and files of attached databases
.dbinfo ?DB? Show status information about the database
.dump ?TABLE? ... Dump the database in an SQL text format
If TABLE specified, only dump tables matching
LIKE pattern TABLE.
.echo on|off Turn command echo on or off
.eqp on|off|full Enable or disable automatic EXPLAIN QUERY PLAN
.excel Display the output of next command in a spreadsheet
.exit Exit this program
.expert EXPERIMENTAL. Suggest indexes for specified queries
.fullschema ?--indent? Show schema and the content of sqlite_stat tables
.headers on|off Turn display of headers on or off
.help Show this message
.import FILE TABLE Import data from FILE into TABLE
.imposter INDEX TABLE Create imposter table TABLE on index INDEX
.indexes ?TABLE? Show names of all indexes
If TABLE specified, only show indexes for tables
matching LIKE pattern TABLE.
.limit ?LIMIT? ?VAL? Display or change the value of an SQLITE_LIMIT
.lint OPTIONS Report potential schema issues. Options:
fkey-indexes Find missing foreign key indexes
.log FILE|off Turn logging on or off. FILE can be stderr/stdout
.mode MODE ?TABLE? Set output mode where MODE is one of:
ascii Columns/rows delimited by 0x1F and 0x1E
csv Comma-separated values
column Left-aligned columns. (See .width)
html HTML <table> code
insert SQL insert statements for TABLE
line One value per line
list Values delimited by "|"
quote Escape answers as for SQL
tabs Tab-separated values
tcl TCL list elements
.nullvalue STRING Use STRING in place of NULL values
.once (-e|-x|FILE) Output for the next SQL command only to FILE
or invoke system text editor (-e) or spreadsheet (-x)
on the output.
.open ?OPTIONS? ?FILE? Close existing database and reopen FILE
The --new option starts with an empty file
.output ?FILE? Send output to FILE or stdout
.print STRING... Print literal STRING
.prompt MAIN CONTINUE Replace the standard prompts
.quit Exit this program
.read FILENAME Execute SQL in FILENAME
.restore ?DB? FILE Restore content of DB (default "main") from FILE
.save FILE Write in-memory database into FILE
.scanstats on|off Turn sqlite3_stmt_scanstatus() metrics on or off
.schema ?PATTERN? Show the CREATE statements matching PATTERN
Add --indent for pretty-printing
.selftest ?--init? Run tests defined in the SELFTEST table
.separator COL ?ROW? Change the column separator and optionally the row
separator for both the output mode and .import
.sha3sum ?OPTIONS...? Compute a SHA3 hash of database content
.shell CMD ARGS... Run CMD ARGS... in a system shell
.show Show the current values for various settings
.stats ?on|off? Show stats or turn stats on or off
.system CMD ARGS... Run CMD ARGS... in a system shell
.tables ?TABLE? List names of tables
If TABLE specified, only list tables matching
LIKE pattern TABLE.
.testcase NAME Begin redirecting output to 'testcase-out.txt'
.timeout MS Try opening locked tables for MS milliseconds
.timer on|off Turn SQL timer on or off
.trace FILE|off Output each SQL statement as it is run
.vfsinfo ?AUX? Information about the top-level VFS
.vfslist List all available VFSes
.vfsname ?AUX? Print the name of the VFS stack
.width NUM1 NUM2 ... Set column widths for "column" mode
Negative values right-justify
sqlite>
The entire process (less creating the file mysql.sql) :-
C:\Users\Mike>dir
Volume in drive C has no label.
Volume Serial Number is 14E1-AC1D
Directory of C:\Users\Mike
19/11/2018 11:37 AM <DIR> .
19/11/2018 11:37 AM <DIR> ..
14/11/2018 07:48 PM <DIR> Links
14/11/2018 07:48 PM <DIR> Music
19/11/2018 11:26 AM 168 mysql.sql
21/08/2017 06:02 PM <DIR> Nero
10 File(s) 82,031 bytes
34 Dir(s) 149,798,195,200 bytes free
C:\Users\Mike>SQLITE3
SQLite version 3.22.0 2018-01-22 18:45:57
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.
sqlite> .open mydb.db
sqlite> .read mysql.sql
1|Fred
2|Mary
3|Sue
4|Tom
sqlite>
C:\Users\Mike>dir
Volume in drive C has no label.
Volume Serial Number is 14E1-AC1D
Directory of C:\Users\Mike
19/11/2018 11:39 AM <DIR> .
19/11/2018 11:39 AM <DIR> ..
14/11/2018 07:48 PM <DIR> Music
19/11/2018 11:39 AM 12,288 mydb.db
19/11/2018 11:26 AM 168 mysql.sql
21/08/2017 06:02 PM <DIR> Nero
11 File(s) 94,319 bytes
34 Dir(s) 149,797,101,568 bytes free
New post:
I already read the tutorial and I found this script:
.LOGMECH LDAP;
.LOGON xx.xx.xx.xx/username,password;
.LOGTABLE dbname.LOG_tablename;
DATABASE dbname;
.BEGIN EXPORT SESSIONS 2;
.EXPORT OUTFILE D:\test.txt
MODE RECORD format text;
select a.my_date,b.name2,a.value from dbsource.tablesource a
inner join dbname.ANG_tablename b
on a.name1=b.name2
where value=59000
and a.my_date >= 01/12/2015
;
.END EXPORT;
.LOGOFF;
but it does not seem to work:
D:\>bteq < dodol.txt
BTEQ 15.00.00.00 Tue Jan 05 14:40:52 2016 PID: 4452
+---------+---------+---------+---------+---------+---------+---------+----
.LOGMECH LDAP;
+---------+---------+---------+---------+---------+---------+---------+----
.LOGON xx.xx.xx.xx/username,
*** Logon successfully completed.
*** Teradata Database Release is 13.10.07.12
*** Teradata Database Version is 13.10.07.12
*** Transaction Semantics are BTET.
*** Session Character Set Name is 'ASCII'.
*** Total elapsed time was 4 seconds.
+---------+---------+---------+---------+---------+---------+---------+----
.LOGTABLE dbname.LOG_tablename;
*** Error: Unrecognized command 'LOGTABLE'.
+---------+---------+---------+---------+---------+---------+---------+----
DATABASE dbname;
*** New default database accepted.
*** Total elapsed time was 2 seconds.
+---------+---------+---------+---------+---------+---------+---------+----
.BEGIN EXPORT SESSIONS 2;
*** Error: Unrecognized command 'BEGIN'.
+---------+---------+---------+---------+---------+---------+---------+----
.EXPORT OUTFILE D:\test.txt
*** Warning: No data format given. Assuming REPORT carries over.
*** Error: Expected FILE or DDNAME keyword, not 'OUTFILE'.
+---------+---------+---------+---------+---------+---------+---------+----
MODE RECORD format text;
MODE RECORD format text;
$
*** Failure 3706 Syntax error: expected something between the beginning of
the request and the 'MODE' keyword.
Statement# 2, Info =6
*** Total elapsed time was 1 second.
+---------+---------+---------+---------+---------+---------+---------+----
select a.my_date,b.name2,a.value from dbsource.tablesource a
inner join dbname.ANG_tablename b
on a.name1=b.name2
where value=59000
and a.my_date >= 01/12/2015
;
Old post:
I am new to Teradata. I found MLOAD to upload big data; now I have a question: is there an option to use cmd (Win7) to export data from Teradata to xxx.txt?
--- sample
select a.data1,b.data2,a.data3 from room1.REPORT_DAILY a
inner join room1.andaikan_saja b
on a.likeme=b.data2
where revenue=30000
and content_id like '%super%'
and a.trx_date >= 01/12/2015
;
this is my MLOAD script up.txt:
.LOGMECH LDAP;
.LOGON xx.xx.xx.xx/username,mypassword;
.LOGTABLE mydatabase.LOG_my_table;
SET QUERY_BAND = 'ApplicationName=TD-Subscriber-RechargeLoad; Version=01.00.00.00;' FOR SESSION;
.BEGIN IMPORT MLOAD
TABLES mydatabase.my_table
WORKTABLES mydatabase.WT_my_table
ERRORTABLES mydatabase.ET_my_table mydatabase.UV_my_table;
.LAYOUT LAYOUT_DATA INDICATORS;
.FIELD number * VARCHAR(20);
.DML LABEL DML_INSERT;
INSERT INTO mydatabase.my_table
(
number =:number
);
.IMPORT INFILE "D:\folderdata\data.txt"
LAYOUT LAYOUT_DATA
FORMAT VARTEXT
APPLY DML_INSERT;
.END MLOAD;
.LOGOFF &SYSRC;
I need a solution to export a file to my laptop, just like the query I put under the ---sample title above.
I use that script from Teradata SQL Assistant, and I am searching for a cmd script.
If it's just a few MB and an ad-hoc export you can use SQL Assistant: set the delimiter in Tools-Options-Export/Import, maybe modify the settings in Tools-Options-Export, and then click File-Export Results before submitting your SELECT. (Similar in TD Studio)
Otherwise the easiest way to extract data in a readable delimited format is TPT, either Export for large amounts of data (GBs) or SQL Selector (MBs). TPT is available for most Operating Systems including Windows.
There's a nice User Guide with lots of example scripts:
Job Example 12: Extracting Rows and Sending Them in Delimited Format
In your case you'll define a generic template file like this:
DEFINE JOB EXPORT_DELIMITED_FILE
DESCRIPTION 'Export rows from a Teradata table to a delimited file'
(
APPLY TO OPERATOR ($FILE_WRITER() ATTR (Format = 'DELIMITED'))
SELECT * FROM OPERATOR ($SELECTOR ATTR (SelectStmt = #ExportSelectStmt));
);
Change $SELECTOR to $EXPORT for larger exports.
Then you just need a job variable file like this:
SourceTdpId = 'your system'
,SourceUserName = 'your user'
,SourceUserPassword = 'your password'
,FileWriterFileName = 'xxx.txt'
,ExportSelectStmt = 'select a.data1,b.data2,a.data3 from room1.REPORT_DAILY a
inner join room1.andaikan_saja b
on a.likeme=b.data2
where revenue=30000
and content_id like ''%super%''
and a.trx_date >= DATE ''2015-12-01'' -- modified this to a valid date literal
;'
The only bad part is that you have to double any single quotes within your select, e.g. '%super%' -> ''%super%''.
Finally you run a cmd:
tbuild -f your_template_file -v your_job_var_file
Depending on the volume of data you wish to extract from Teradata you can use Teradata BTEQ or the Teradata Parallel Transport (TPT) utility with the EXPORT operator from the command line to extract the data.
The TPT utility is the eventual replacement for the legacy Teradata Load and Unload utilities (FastLoad, MultiLoad, FastExport, and TPump) and provides an easier mechanism to produce delimited flat files over FastExport. TPT is fairly flexible and effective for exporting large volumes of data to channel or network attached clients.
Teradata BTEQ can perform lightweight load and unload functions. The BTEQ manual is pretty good at providing you an overview of how to use the various commands to produce a semi-structured report or data extract. It doesn't have a simple command to produce a delimited flat file. If you review the manual's overview of the EXPORT command you should get a good feel for how BTEQ behaves when working with channel or network attached clients.
I have the following problem: I have a .csv file with data (around 30 MB). I would like to load the content of that file into my database, more specifically into my IPBlock table, which looks like this:
startIP: Int
endIP: Int
LocationID: Int
and the content of the file looks like this:
"16777216","16777471","17"
"16777472","16778239","49"
"16778240","16778495","14409"
I try to execute this query:
LOAD DATA LOCAL INFILE 'C:\Users\Molu\Desktop\GeoLiteCity_20131203\test.csv'
INTO TABLE IPBlock
FIELDS TERMINATED BY ','
ENCLOSED BY '"'
LINES TERMINATED BY '\n'
(startIP , endIP, LocationID);
and I got the following errors:
The LOAD DATA SQL construct or statement is not supported.
and
Error Source:".Net sqlClient Data Provider" Error message "Incorrect syntax near LOCAL"
I already tried a version with a double "\" like C:\\Users\\Molu, and with and without the "LOCAL" keyword (the only difference there is that the error message becomes "Incorrect syntax near INFILE").
Do you have any ideas? Thanks in advance.
There is no LOAD DATA LOCAL INFILE in SQL Server; it is a MySQL feature. You should rather be using the bcp (Bulk Copy) utility to do the same.
See here for how to use it:
https://msdn.microsoft.com/en-us/library/ms188365.aspx
(OR)
Use Bulk insert like this way
BULK INSERT IPBlock
FROM 'C:\Users\Molu\Desktop\GeoLiteCity_20131203\test.csv'
WITH
(
    FIELDTERMINATOR = ',',
    ROWTERMINATOR = '\n'
)
GO
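If the CSV cannot be placed somewhere the SQL Server service can read it (BULK INSERT resolves the path on the server, not on the client), a client-side load from Python with pyodbc is one alternative. This is only a rough sketch under that assumption; the driver name and connection string are placeholders, and the column order follows the IPBlock definition above:

import csv
import pyodbc

# Placeholder connection string; adjust driver, server, and database for your setup.
conn = pyodbc.connect('DRIVER={ODBC Driver 17 for SQL Server};'
                      'SERVER=myserver;DATABASE=mydb;Trusted_Connection=yes;')
cursor = conn.cursor()
cursor.fast_executemany = True  # send the parameter rows in batches rather than one at a time

# csv.reader strips the surrounding double quotes from each field.
with open(r'C:\Users\Molu\Desktop\GeoLiteCity_20131203\test.csv', newline='') as f:
    rows = [(int(start), int(end), int(loc)) for start, end, loc in csv.reader(f)]

cursor.executemany(
    'INSERT INTO IPBlock (startIP, endIP, LocationID) VALUES (?, ?, ?)', rows)
conn.commit()
conn.close()

This trades the speed of a server-side bulk load for not needing file access on the server, so it is only sensible for modest file sizes like the 30 MB in the question.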
Is it possible to execute multiple queries at the same time in Impala? If yes, how does Impala handle it?
I would certainly suggest doing some tests of your own, but I was not able to get multiple queries to execute in a single call.
I was using an Impala connection and reading the query from a .sql file. This works for single commands:
from impala.dbapi import connect
# actual server and port changed for this post for security
conn = connect(host='impala server', port=11111, auth_mechanism='GSSAPI')
cursor = conn.cursor()
cursor.execute((open("sandbox/z_temp.sql").read()))
This is the error I received.
HiveServer2Error: AnalysisException: Syntax error in line 2:
This is what the SQL looked like in the .sql file.
Select * FROM database1.table1;
Select * FROM database1.table2;
I was able to run multiple commands by putting the SQL commands in separate .sql files and iterating over all of the .sql files in a specified folder.
import glob
import pandas as pd

# Create a list of file names for the recon .sql files; this list will be sorted.
# Numbers at the beginning of each filename are important so that the files are
# executed in the correct order.
file_names = glob.glob('folder/*.sql')
asc_names = sorted(file_names, reverse=False)

# error log dataframe to print, or write to file at the end of the job
df_log = pd.DataFrame(columns=['test_name', 'test_status'])

for file_name in asc_names:
    str_filename = str(file_name)
    print(str_filename)
    query = open(str_filename).read()
    cursor = conn.cursor()  # conn is the impyla connection created above
    try:
        # Each SQL command must be executed separately
        cursor.execute(query)
        df_id = pd.DataFrame([{'test_name': str_filename[-40:], 'test_status': 'PASS'}])
        df_log = df_log.append(df_id, ignore_index=True)
    except:
        df_id = pd.DataFrame([{'test_name': str_filename[-40:], 'test_status': 'FAIL'}])
        df_log = df_log.append(df_id, ignore_index=True)
        continue
Another way to do this would be to have all of the SQL statements in one .sql file separated by semicolons, then loop through the file, splitting the statements out by ; and running them one at a time.
from impala.dbapi import connect
from impala.util import as_pandas

conn = connect(host='impalaserver', port=11111, auth_mechanism='GSSAPI')
cursor = conn.cursor()

# Split the SQL statements from one file separated by ';'.
# Note: the last command must not have a semicolon at the end.
sql_file = open("sandbox/temp.sql").read()
sql = sql_file.split(';')

for cmd in sql:
    # This gets rid of the line-break characters you may have.
    cmd = cmd.replace('\r', '')
    cmd = cmd.replace('\n', ' ')
    # This runs your SQL commands one at a time.
    cursor.execute(cmd)
    print(cmd)
Impala can execute multiple queries at the same time as long as it doesn't hit the memory cap.
You can issue a command like impala-shell -f <<file_name>>, where the file has multiple queries, each complete query separated by a semicolon (;).
If you are a Python geek, you can even try the impyla package to create multiple connections and run all your queries at once:
pip install impyla
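As a rough illustration of that "multiple connections" idea, here is a sketch that runs each query in its own thread with its own impyla connection; the host, port, auth mechanism, and query list are placeholders and must match your environment:

from concurrent.futures import ThreadPoolExecutor
from impala.dbapi import connect

queries = [
    'SELECT * FROM database1.table1',
    'SELECT * FROM database1.table2',
]

def run_query(sql):
    # Each thread opens its own connection so the queries can run concurrently.
    conn = connect(host='impala server', port=11111, auth_mechanism='GSSAPI')
    try:
        cursor = conn.cursor()
        cursor.execute(sql)
        return cursor.fetchall()
    finally:
        conn.close()

with ThreadPoolExecutor(max_workers=len(queries)) as pool:
    results = list(pool.map(run_query, queries))

Whether the queries actually run in parallel on the cluster still depends on Impala's admission control and memory limits, as noted above.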