I'm using a Jupyter notebook in Watson Studio to query data from Db2 using the ibm_db_dbi Python library.
import ibm_db_dbi as dbi
import pandas as pd

db2_pipeline_connection_connection = dbi.connect(db2_pipeline_connection_dsn)
query = 'SELECT count("SALE_ID") FROM "DS_DATA"."SALES"'
db2_data = pd.read_sql_query(query, con=db2_pipeline_connection_connection)
db2_pipeline_connection_connection.close()
The result is 721152, which is correct, but when I try the query below
db2_pipeline_connection_connection = dbi.connect(db2_pipeline_connection_dsn)
query = 'SELECT * FROM "DS_DATA"."SALES"'
db2_data = pd.read_sql_query(query, con=db2_pipeline_connection_connection)
db2_pipeline_connection_connection.close()
it only returns 270845 rows.
How can I get the entire database using this method?
The equivalent pandas code I have used for connecting to Teradata is:
database = config.get('Teradata connection', 'database')
host = config.get('Teradata connection', 'host')
user = config.get('Teradata connection', 'user')
pwd = config.get('Teradata connection', 'pwd')

with teradatasql.connect(host=host, user=user, password=pwd) as connect:
    query1 = "SELECT * FROM {}.{}".format(database, tables)
    df = pd.read_sql_query(query1, connect)
Now I need to use the Dask library to load big data, as an alternative to pandas.
Please suggest a way to connect to Teradata in the same manner.
Teradata appears to have a SQLAlchemy dialect, so you should be able to install that, set your connection string appropriately, and use Dask's existing read_sql_table function.
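For example, here is a minimal sketch of that route, assuming the teradatasqlalchemy dialect is installed and using placeholder credentials, table and column names:

import dask.dataframe as dd

# placeholder connection URI for the Teradata SQLAlchemy dialect
uri = "teradatasql://user:pwd@host"

# read_sql_table splits the table into partitions on an indexed numeric/date column
ddf = dd.read_sql_table(
    "my_table",            # placeholder table name
    uri,
    schema="my_database",  # placeholder database/schema name
    index_col="id",        # placeholder column used to partition the table
    npartitions=20,
)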
Alternatively, you could do this by hand: you need to decide on a set of conditions which will partition the data for you, each partition being small enough for your workers to handle. Then you can make a set of partitions and combine them into a dataframe as follows:
import dask
import dask.dataframe as dd
import pandas as pd
import teradatasql

def get_part(condition):
    with teradatasql.connect(host=host, user=user, password=pwd) as connect:
        query1 = "SELECT * FROM {}.{} WHERE {}".format(database, tables, condition)
        return pd.read_sql_query(query1, connect)

parts = [dask.delayed(get_part)(cond) for cond in conditions]
df = dd.from_delayed(parts)
(ideally, you can derive the meta= parameter for from_delayed beforehand, perhaps by getting the first 10 rows of the original query).
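For instance, a sketch of deriving meta from a small sample, assuming Teradata's TOP syntax is available and reusing the placeholder connection details above:

# pull a tiny sample just to capture column names and dtypes
with teradatasql.connect(host=host, user=user, password=pwd) as connect:
    meta = pd.read_sql_query(
        "SELECT TOP 10 * FROM {}.{}".format(database, tables), connect)

df = dd.from_delayed(parts, meta=meta)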
I have R linked to an Access database using the ODBC and DBI packages. I scoured the internet and couldn't find a way to write an update query, so I'm using the dbSendStatement function to update entries individually. Combined with a for loop, this effectively works like an update query, with one snag: when I try to update any field in the database that is text, I get an error that says "[Microsoft][ODBC Microsoft Access Driver] One of your parameters is invalid."
DBI::dbSendStatement(conn = dB.Connection, statement = paste("UPDATE DC_FIMs_BLDG_Lvl SET kWh_Rate_Type = ",dquote(BLDG.LVL.Details[i,5])," WHERE FIM_ID = ",BLDG.LVL.Details[i,1]," AND BUILDING_ID = ",BLDG.LVL.Details[i,2],";", sep = ""))
If it's easier, when pasted, the code reads like this:
DBI::dbSendStatement(conn = dB.Connection, statement = paste("UPDATE DC_FIMs_BLDG_Lvl SET kWh_Rate_Type = “Incremental” WHERE FIM_ID = 26242807 AND BUILDING_ID = 515;", sep = ""))
I'm using the odbc package to connect to an MS SQL Server:
con <- dbConnect(odbc::odbc(),
                 Driver = "ODBC Driver 13 for SQL Server",
                 Server = "server",
                 Database = "database",
                 UID = "user",
                 PWD = "pass",
                 Port = 1111)
This server has many tables, so I'm using dbListTables(con) to search for the ones containing a certain substring. But once I find them I need to discover which schema they are in to be able to query them. I'm currently doing this manually (looking for the name of the table in each schema), but is there any way I can get the schema of all tables that match a string?
Consider running an SQL query with a LIKE search against the built-in INFORMATION_SCHEMA metadata views, provided your user has sufficient privileges.
SELECT TABLE_SCHEMA, TABLE_NAME
FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_NAME LIKE '%some string%'
Call the above from R with odbc, using a parameterized query for the wildcard search:
# PREPARED STATEMENT
strSQL <- paste("SELECT TABLE_SCHEMA, TABLE_NAME",
                "FROM INFORMATION_SCHEMA.TABLES",
                "WHERE TABLE_NAME LIKE ?SEARCH")

# SAFELY INTERPOLATED QUERY
query <- sqlInterpolate(con, strSQL, SEARCH = '%some string%')

# DATA FRAME BUILT FROM RESULT SET
schema_names_df <- dbGetQuery(con, query)
I found a workaround using the RODBC package:
library('RODBC')
# First connect to the DB
dbconn <- odbcDriverConnect(paste0("driver={ODBC Driver xx for SQL Server};",
                                   "server=server;database=database;",
                                   "uid=username;pwd=password"))
# Now fetch the DB tables
sqlTables(dbconn)
For my specific DB I get:
names(sqlTables(dbconn))
[1] "TABLE_CAT" "TABLE_SCHEM" "TABLE_NAME" "TABLE_TYPE" "REMARKS"
The following piece of code creates two databases:
import sqlite3
db = 'brazil'
conn = sqlite3.connect(db+'.db')
c = conn.cursor()
qCreate = """
CREATE TABLE states
(zip_id numeric NOT NULL,
state_name text NOT NULL,
CONSTRAINT pk_brazil
PRIMARY KEY (zip_id)) """
c.execute(qCreate)
conn.commit()
conn.close()
db = 'rio'
conn = sqlite3.connect(db+'.db')
c = conn.cursor()
qCreate = """CREATE TABLE rio_de_janeiro
(zip_id numeric NOT NULL,
beach_name text NOT NULL,
CONSTRAINT pk_rio
PRIMARY KEY (zip_id))
"""
c.execute(qCreate)
conn.commit()
conn.close()
The following piece of code attaches the database RIO to the database BRAZIL and prints all the databases (Rio and Brazil).
db = 'brazil'
conn = sqlite3.connect(db+'.db')
c = conn.cursor()
qCreate = """ATTACH DATABASE ? AS competition """
c.execute(qCreate, ('rio.db',))
c.execute("PRAGMA database_list")
data = c.fetchall()
print(data)
conn.commit()
conn.close()
However, the following piece of code prints only the Brazil database:
db = 'brazil'
conn = sqlite3.connect(db+'.db')
c = conn.cursor()
c.execute("PRAGMA database_list")
data = c.fetchall()
print(data)
conn.commit()
conn.close()
The attached database is no longer attached.
The sqlite3 documentation hints at this:
The ATTACH DATABASE statement adds another database file to the current database connection.
Do I have to attach the database every time?
I planned to use attached databases for schemas, but maybe I should try something else?
I am using Python in the Pythonista app on iOS.
Almost all settings you can change in SQLite apply only to the current connection, i.e., are not saved in the database file.
So you have to re-ATTACH any databases whenever you have re-opened the main database.
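For example, a minimal sketch of re-attaching on every new connection, using the same brazil.db and rio.db file names as in the question:

import sqlite3

def open_brazil():
    # ATTACH is per-connection state, so it must be re-issued after each connect()
    conn = sqlite3.connect('brazil.db')
    conn.execute("ATTACH DATABASE ? AS competition", ('rio.db',))
    return conn

conn = open_brazil()
print(conn.execute("PRAGMA database_list").fetchall())  # lists main and competition
conn.close()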
Using attached databases makes sense only if you must use multiple database files due to some external constraint. In most cases, you should use only a single database.
SQLite does not have schemas. If you want to emulate them with attached databases, you have to live with the limitations of that approach.
I am writing data into a table belonging to another schema using the function sqlSave() from the package RODBC. The user of the other schema has created an alias with the same name as the original table. My user has enough rights to write into the table. The database is Oracle 11g.
This is how I write:
sqlSave(channel, object, tablename = table, safer=TRUE, rownames = FALSE, append = TRUE, verbose = FALSE, nastring = NULL, fast = TRUE)
When I run sqlSave() I get an error message from the Oracle DB. If I look at the SQL that R sends to the DB, I see that R duplicates the columns of the object I am trying to write. The SQL looks like this:
insert into table (column_A, column_B, column_A, column_B)
If the alias is removed and I prefix the table with the schema, then I do not get an error message; however, R does not execute the query at all.
sqlSave(channel, object, tablename = schema.table, safer=TRUE, rownames = FALSE, append = TRUE, verbose = FALSE, nastring = NULL, fast = TRUE)
Then I get:
insert into table (column_A, column_B) values(?,?)
The only thing that has worked so far is to give the table an alias different from the table name. In that case I manage to write into the table.
I would very much appreciate it if anybody could suggest a solution to my problem.
Thanks in advance for your response.