I want to investigate how to access a SQLite DB from SAS. What's the easiest way of doing this? Is there a SAS product that we can license to do that? I don't want to use the ODBC driver, as it seems to have been written a long time ago and is not officially part of SQLite.
SAS supports reading data from pipes (in a unix environment). Essentially, you can set up a filename statement to execute an sqlite command in the host environment, then process the command output as if reading it from a text file.
SAS Support page: http://support.sas.com/documentation/cdl/en/hostunx/61879/HTML/default/viewer.htm#pipe.htm
Example:
*----------------------------------------------
* (1) Write a command in place of the file path
* --> important: the 'pipe' option makes this work
*----------------------------------------------;
filename QUERY pipe 'sqlite3 database_file "select * from table_name"';
*----------------------------------------------
* (2) Use a datastep to read the output from sqlite
*----------------------------------------------;
options linesize=max; *to prevent truncation of results;
data table_name;
infile QUERY delimiter='|' missover dsd lrecl=32767;
length
numeric_id 8
numeric_field 8
character_field_1 $40
character_field_2 $20
wide_character_field $500
;
* note: numeric_field is numeric per the LENGTH statement, so it takes no $ modifier;
input
numeric_id
numeric_field
character_field_1 $
character_field_2 $
wide_character_field $
;
run;
*----------------------------------------------
* (3) View the results, process data etc.
*----------------------------------------------;
proc contents data=table_name; run;
proc means data=table_name; run;
proc print data=table_name; run;
I have a Sqoop job run via an Oozie coordinator. After a major upgrade we can no longer use the Hive CLI and were told to use Beeline. I'm not sure how to do this. Here is the current process:
I have a hive file: hive_ddl.hql
use schema_name;
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
SET hive.exec.max.dynamic.partitions=100000;
SET hive.exec.max.dynamic.partitions.pernode=100000;
SET mapreduce.map.memory.mb=16384;
SET mapreduce.map.java.opts=-Xmx16G;
SET hive.exec.compress.output=true;
SET mapreduce.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec;
drop table if exists table_name_stg purge;
create external table if not exists table_name_stg
(
col1 string,
col2 string,
...
)
row format delimited
fields terminated by '\001'
stored as textfile
location 'my/location/table_name_stg';
drop table if exists table_name purge;
create table if not exists table_name
stored as parquet
tblproperties('parquet.compress'='snappy') as
select * from table_name_stg;
drop table if exists table_name_stg purge;
This is pretty straightforward: make a staging table, then use it to build the final table.
It's then called in a .sh file like this:
hive cli -f $HOME/my/path/hive_ddl.hql
I'm new to most of this and not sure what Beeline is, and I couldn't find any examples of how to use it to accomplish the same thing my Hive CLI call does. I'm hoping it's as simple as calling the hive_ddl.hql file differently, rather than having to rewrite everything.
Any help is greatly appreciated.
Beeline is a command-line shell supported in Hive. In your case you can replace the hive cli call with a beeline command in the same .sh file. It would look roughly like the line given below.
beeline -u hiveJDBCUrl -f test.hql
You can explore more of the beeline command options at the link below:
https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients#HiveServer2Clients-Beeline%E2%80%93CommandLineShell
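For example, the .sh call could become something like the sketch below. The JDBC URL, host, port, schema, and user are placeholders for whatever your HiveServer2 instance actually exposes, and the authentication flags depend on how your cluster is secured.
# hypothetical HiveServer2 endpoint; adjust host, port, schema and auth options to your environment
beeline -u "jdbc:hive2://hiveserver2-host:10000/schema_name" -n my_user -f $HOME/my/path/hive_ddl.hql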
I'm working on writing a batch file to pull data from multiple SQLite3 databases. I need to set 2 system environment variables and then use those two variables within my SELECT statement. This will run against multiple databases that share the same schema but cover different site locations, currently 14 different locations. What I have come up with so far is my main queryrun.bat file:
#echo off
setx /m DateStart '20200101'
setx /m DateEnd '20200103'
sqlite3 SiteID40.db < query40.dat
TIMEOUT /t 3
sqlite3 SiteID41.db < query41.dat
Then the query#.dat files look like this:
.mode list
.separator ,
.output results40.csv
SELECT * FROM FilesActive WHERE CompactDate >= %DateStart% and CompactDate <= %DateEnd%;
.quit
The %DateStart% and %DateEnd% are where I want to insert the system variables of the same name. I have tried both $DateStart and %DateStart% with no luck.
Any constructive help is appreciated. I must reiterate, though, that this is SQLite3, NOT MS SQL Server or MySQL.
Using the .param command in sqlite, something like this should work:
Change the .bat call from sqlite3 SiteID40.db < query40.dat to
sqlite3 SiteID40.db ".param set :DateStart %DateStart%" ".param set :DateEnd %DateEnd%" ".read query.dat"
Change the query#.dat file SELECTs to:
SELECT * FROM FilesActive WHERE CompactDate >= :DateStart and CompactDate <= :DateEnd;
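Putting it together, the main queryrun.bat would look roughly like the sketch below. One assumption worth flagging: setx /m writes the variables for future sessions only, so plain set is used here to make the values visible to the same run.
@echo off
rem set the date range for this run (set affects the current session; setx /m only affects new sessions)
set DateStart=20200101
set DateEnd=20200103
rem pass the dates in as SQLite parameters, then run each site's query file
sqlite3 SiteID40.db ".param set :DateStart %DateStart%" ".param set :DateEnd %DateEnd%" ".read query40.dat"
TIMEOUT /t 3
sqlite3 SiteID41.db ".param set :DateStart %DateStart%" ".param set :DateEnd %DateEnd%" ".read query41.dat"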
I am exploring writing functions in Tcl for SQLite (https://github.com/pawelsalawa/sqlitestudio/wiki/ScriptingTcl).
I wanted to try a basic example found on the official SQLite page (http://sqlite.org/tclsqlite.html):
db eval {SELECT * FROM MyTable ORDER BY MyID} values {
parray values
puts ""
}
I get the following error:
Error while requesting the database « -- » : invalid command name "parray"
Help is very welcome :)
SqliteStudio does not seem to fully initialise Tcl the way you would expect from a non-embedded installation:
Using external Tcl packages or modules is not possible, because Tcl
interpreters are not initialized with "init.tcl".
See Wiki.
Background
Standard Tcl sources init.tcl early, as part of a Tcl interpreter's initialisation. init.tcl, in turn, registers a number of Tcl procs for autoloading. parray is one of those lazily acquired procs.
Ways forward
I am not familiar with SqliteStudio. Why not stick with SQLite's standard Tcl frontend, which gives you full Tcl and ships with Tcl distributions out of the box? But this certainly depends on your requirements.
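For comparison, the same loop runs unchanged under a stock tclsh with SQLite's standard Tcl binding, where parray is autoloaded as usual. A minimal sketch (database path and table name are placeholders):
package require sqlite3          ;# the Tcl binding that ships with SQLite
sqlite3 db /path/to/my_database.db
db eval {SELECT * FROM MyTable ORDER BY MyID} values {
    parray values
    puts ""
}
db close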
That said, you could attempt to force-load init.tcl in SqliteStudio's embedded Tcl, but I don't know (and can't test) whether the distribution has pruned these scripts or relocated them. Off the top of my head (untested):
source [file join $tcl_library init.tcl]
# ...
db eval {SELECT * FROM MyTable ORDER BY MyID} values {
parray values
puts ""
}
Background: I am developing an R script that pulls data from a MySQL database, performs a logistic regression and then inserts the predictions back into the database. I want the entire system to be self-contained in the script in case of database failure. This includes all MySQL stored procedures that the script depends on to aggregate the data on the backend, since these would be deleted in such a database failure.
Question: I'm having trouble creating a stored procedure from an R script. I am running the following:
library(RMySQL)  # provides the MySQL driver for DBI
mySQLDriver <- dbDriver("MySQL")
connect <- dbConnect(mySQLDriver, group = connection)
query <-
"
DROP PROCEDURE IF EXISTS Test.Tester;
DELIMITER //
CREATE PROCEDURE Test.Tester()
BEGIN
/***DO DATA AGGREGATION***/
END //
DELIMITER ;
"
sendQuery <- dbSendQuery(connect, query)
dbClearResult(dbListResults(connect)[[1]])
dbDisconnect(connect)
However, I get the following error, which seems to involve the DELIMITER change.
Error in .local(conn, statement, ...) :
could not run statement: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'DELIMITER //
CREATE PROCEDURE Test.Tester()
BEGIN
/***DO DATA AGGREGATION***/
EN' at line 2
What I've Done: I have spent quite a bit of time searching for the answer, but have come up with nothing. What am I missing?
Just wanted to follow up on this string of comments. Thank you for your thoughts on this issue. I have a couple of Python scripts that need this functionality, and I began researching the same topic for Python. I found a question that points to the answer. It states:
"The DELIMITER command is a MySQL shell client builtin, and it's recognized only by that program (and MySQL Query Browser). It's not necessary to use DELIMITER if you execute SQL statements directly through an API.
The purpose of DELIMITER is to help you avoid ambiguity about the termination of the CREATE FUNCTION statement, when the statement itself can contain semicolon characters. This is important in the shell client, where by default a semicolon terminates an SQL statement. You need to set the statement terminator to some other character in order to submit the body of a function (or trigger or procedure)."
Hence the following code will run in R:
library(RMySQL)  # provides the MySQL driver for DBI
mySQLDriver <- dbDriver("MySQL")
connect <- dbConnect(mySQLDriver, group = connection)
query <-
"
CREATE PROCEDURE Test.Tester()
BEGIN
/***DO DATA AGGREGATION***/
END
"
sendQuery <- dbSendQuery(connect, query)
dbClearResult(dbListResults(connect)[[1]])
dbDisconnect(connect)
New post:
I already read the tutorial and found this script:
.LOGMECH LDAP;
.LOGON xx.xx.xx.xx/username,password;
.LOGTABLE dbname.LOG_tablename;
DATABASE dbname;
.BEGIN EXPORT SESSIONS 2;
.EXPORT OUTFILE D:\test.txt
MODE RECORD format text;
select a.my_date,b.name2,a.value from dbsource.tablesource a
inner join dbname.ANG_tablename b
on a.name1=b.name2
where value=59000
and a.my_date >= 01/12/2015
;
.END EXPORT;
.LOGOFF;
but it is not working:
D:\>bteq < dodol.txt
BTEQ 15.00.00.00 Tue Jan 05 14:40:52 2016 PID: 4452
+---------+---------+---------+---------+---------+---------+---------+----
.LOGMECH LDAP;
+---------+---------+---------+---------+---------+---------+---------+----
.LOGON xx.xx.xx.xx/username,
*** Logon successfully completed.
*** Teradata Database Release is 13.10.07.12
*** Teradata Database Version is 13.10.07.12
*** Transaction Semantics are BTET.
*** Session Character Set Name is 'ASCII'.
*** Total elapsed time was 4 seconds.
+---------+---------+---------+---------+---------+---------+---------+----
.LOGTABLE dbname.LOG_tablename;
*** Error: Unrecognized command 'LOGTABLE'.
+---------+---------+---------+---------+---------+---------+---------+----
DATABASE dbname;
*** New default database accepted.
*** Total elapsed time was 2 seconds.
+---------+---------+---------+---------+---------+---------+---------+----
.BEGIN EXPORT SESSIONS 2;
*** Error: Unrecognized command 'BEGIN'.
+---------+---------+---------+---------+---------+---------+---------+----
.EXPORT OUTFILE D:\test.txt
*** Warning: No data format given. Assuming REPORT carries over.
*** Error: Expected FILE or DDNAME keyword, not 'OUTFILE'.
+---------+---------+---------+---------+---------+---------+---------+----
MODE RECORD format text;
MODE RECORD format text;
$
*** Failure 3706 Syntax error: expected something between the beginning of
the request and the 'MODE' keyword.
Statement# 2, Info =6
*** Total elapsed time was 1 second.
+---------+---------+---------+---------+---------+---------+---------+----
select a.my_date,b.name2,a.value from dbsource.tablesource a
inner join dbname.ANG_tablename b
on a.name1=b.name2
where value=59000
and a.my_date >= 01/12/2015
;
Old post:
I am new to Teradata. I have found MLOAD for uploading big data. Now I have a question: is there an option to use cmd (Win7) to export data from Teradata to xxx.txt?
--- sample
select a.data1,b.data2,a.data3 from room1.REPORT_DAILY a
inner join room1.andaikan_saja b
on a.likeme=b.data2
where revenue=30000
and content_id like '%super%'
and a.trx_date >= 01/12/2015
;
This is my MLOAD up.txt:
.LOGMECH LDAP;
.LOGON xx.xx.xx.xx/username,mypassword;
.LOGTABLE mydatabase.LOG_my_table;
SET QUERY_BAND = 'ApplicationName=TD-Subscriber-RechargeLoad; Version=01.00.00.00;' FOR SESSION;
.BEGIN IMPORT MLOAD
TABLES mydatabase.my_table
WORKTABLES mydatabase.WT_my_table
ERRORTABLES mydatabase.ET_my_table mydatabase.UV_my_table;
.LAYOUT LAYOUT_DATA INDICATORS;
.FIELD number * VARCHAR(20);
.DML LABEL DML_INSERT;
INSERT INTO mydatabase.my_table
(
number =:number
);
.IMPORT INFILE "D:\folderdata\data.txt"
LAYOUT LAYOUT_DATA
FORMAT VARTEXT
APPLY DML_INSERT;
.END MLOAD;
.LOGOFF &SYSRC;
I need a solution to export a file to my laptop, just like the query I put under the ---sample title above.
I ran that query from Teradata SQL Assistant, and I am searching for a cmd script.
If it's just a few MB and an ad-hoc export you can use SQL Assistant: set the delimiter in Tools-Options-Export/Import, maybe modify the settings in Tools-Options-Export, and then click File-Export Results before submitting your SELECT. (Similar in TD Studio.)
Otherwise the easiest way to extract data in a readable delimited format is TPT, either Export for large amounts of data (GBs) or SQL Selector (MBs). TPT is available for most Operating Systems including Windows.
There's a nice User Guide with lots of example scripts:
Job Example 12: Extracting Rows and Sending Them in Delimited Format
In your case you'll define a generic template file like this:
DEFINE JOB EXPORT_DELIMITED_FILE
DESCRIPTION 'Export rows from a Teradata table to a delimited file'
(
APPLY TO OPERATOR ($FILE_WRITER() ATTR (Format = 'DELIMITED'))
SELECT * FROM OPERATOR ($SELECTOR ATTR (SelectStmt = @ExportSelectStmt));
);
Change $SELECTOR to $EXPORT for larger exports.
Then you just need a job variable file like this:
SourceTdpId = 'your system'
,SourceUserName = 'your user'
,SourceUserPassword = 'your password'
,FileWriterFileName = 'xxx.txt'
,ExportSelectStmt = 'select a.data1,b.data2,a.data3 from room1.REPORT_DAILY a
inner join room1.andaikan_saja b
on a.likeme=b.data2
where revenue=30000
and content_id like ''%super%''
and a.trx_date >= DATE ''2015-12-01'' -- modified this to a valid date literal
;'
The only bad part is that you have to double any single quotes within your select, e.g. '%super%' -> ''%super%''.
Finally you run it from cmd:
tbuild -f your_template_file -v your_job_var_file
Depending on the volume of data you wish to extract from Teradata, you can use Teradata BTEQ or the Teradata Parallel Transporter (TPT) utility with the EXPORT operator from the command line to extract the data.
The TPT utility is the eventual replacement for the legacy Teradata Load and Unload utilities (FastLoad, MultiLoad, FastExport, and TPump) and provides an easier mechanism to produce delimited flat files over FastExport. TPT is fairly flexible and effective for exporting large volumes of data to channel or network attached clients.
Teradata BTEQ can perform lightweight load and unload functions. The BTEQ manual is pretty good at giving you an overview of how to use the various commands to produce a semi-structured report or data extract. It doesn't have a simple command to produce a delimited flat file. If you review the manual's overview of the EXPORT command you should get a good feel for how BTEQ behaves when working with channel- or network-attached clients.
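As a rough illustration only, a minimal BTEQ report-style export could look something like the sketch below. This is pieced together from the BTEQ commands discussed above, not a tested script; the logon details, file path, and query are placeholders taken from the earlier example.
.LOGMECH LDAP;
.LOGON xx.xx.xx.xx/username,password;
.SET WIDTH 65531
.EXPORT REPORT FILE = D:\test.txt
select a.my_date, b.name2, a.value
from dbsource.tablesource a
inner join dbname.ANG_tablename b
on a.name1 = b.name2
where a.value = 59000
and a.my_date >= DATE '2015-12-01';
.EXPORT RESET
.LOGOFF;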