Context:
A Python 3.6 script updates an SQLite database several times a day using the sqlite3 module.
The database is about 500 MB, and each update adds about 250 KB.
Issue:
I deliver every updated version of the database and would like to reduce the size of the data transferred. In other words, I would like to transfer only the updated content (through a kind of patch).
The sqldiff.exe utility program could be used for that; however, it requires creating a local copy of the database every time I update it.
Question:
Is there a way, using Python (through the DB-API 2.0 interface or by other means), to generate this kind of patch while updating the database?
First thoughts:
Wouldn't it be possible to write a patch (e.g. a list of actions to be performed to update the database) based on the cursor before or while performing the commit?
import sqlite3
# Open database
conn = sqlite3.connect('mydb.db')
cur = conn.cursor()
# Insert/Update data
new_data = 3.14
cur.execute('INSERT INTO mytable VALUES (?)', (new_data,))
# KEEP TRACK & Save (commit) the changes
conn.dump_planned_actions()  # ????? (hypothetical: sqlite3 has no such method)
conn.commit()
conn.close()
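There is no dump_planned_actions() in sqlite3, but the idea can be approximated in plain Python. The sketch below assumes every modification goes through a small wrapper (RecordingConnection and apply_patch are hypothetical names, not part of sqlite3) that records each modifying statement together with its parameters, moves them into the patch only when commit() succeeds, and serializes the patch as JSON. It assumes all parameters are JSON-serializable (text, numbers, None); BLOB values would need extra encoding.

```python
import json
import sqlite3


class RecordingConnection:
    """Hypothetical wrapper: executes statements on a sqlite3 connection and
    keeps (sql, params) pairs for every statement that changes the database."""

    MODIFYING = ("INSERT", "UPDATE", "DELETE", "REPLACE", "CREATE", "DROP", "ALTER")

    def __init__(self, conn):
        self.conn = conn
        self.pending = []  # modifying statements since the last commit
        self.patch = []    # modifying statements that have been committed

    def execute(self, sql, params=()):
        cur = self.conn.execute(sql, params)
        if sql.lstrip().upper().startswith(self.MODIFYING):
            self.pending.append([sql, list(params)])
        return cur

    def commit(self):
        self.conn.commit()
        # Only committed work becomes part of the patch
        self.patch.extend(self.pending)
        self.pending = []

    def rollback(self):
        self.conn.rollback()
        self.pending = []

    def dump_patch(self):
        return json.dumps(self.patch)


def apply_patch(conn, patch_json):
    """Replay a patch produced by dump_patch() on another copy of the database."""
    for sql, params in json.loads(patch_json):
        conn.execute(sql, params)
    conn.commit()


# Usage: record on the "master" database, replay on a client copy
master = RecordingConnection(sqlite3.connect(":memory:"))
master.execute("CREATE TABLE mytable (value REAL)")
master.execute("INSERT INTO mytable VALUES (?)", (3.14,))
master.commit()
patch = master.dump_patch()

client = sqlite3.connect(":memory:")
apply_patch(client, patch)
print(client.execute("SELECT value FROM mytable").fetchall())  # [(3.14,)]
```

Keeping the SQL text and the parameters separate means values never have to be quoted into the SQL string when the patch is replayed.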
The following snippet shows the workaround I found.
It relies on the sqlite3 method set_trace_callback to log all the SQL statements sent, and on executescript to apply these statements.
import sqlite3


class DBTraceCallbackHandler(object):
    """Class handling callbacks in order to log the SQL statement history."""

    def __init__(self):
        self.sql_statements = []

    def instance_handler(self, event):
        # Called by sqlite3 for every statement executed on the connection
        self.sql_statements.append(str(event))


def database_modification(cursor):
    # user-defined
    pass


# Statement prefixes that modify the database and must go into the patch
MODIFYING_KEYWORDS = ('INSERT', 'UPDATE', 'DELETE', 'REPLACE', 'CREATE', 'DROP', 'ALTER')


def create_patch(db_path):
    # Opening connection
    conn = sqlite3.connect(db_path)
    c = conn.cursor()
    # Start tracing SQL
    callback_handler = DBTraceCallbackHandler()
    conn.set_trace_callback(callback_handler.instance_handler)
    # Modification of database
    database_modification(c)
    # End of modification of database
    conn.commit()
    c.close()
    conn.close()
    # Generating the patch: keep only the SQL statements that modify the db
    modifying = [sql for sql in callback_handler.sql_statements
                 if sql.lstrip().upper().startswith(MODIFYING_KEYWORDS)]
    return ';\n'.join(modifying) + ';\n'


def apply_patch(db_path, sql_script):
    # Opening connection
    conn = sqlite3.connect(db_path)
    c = conn.cursor()
    # Modification of database: apply the SQL script
    c.executescript(sql_script)
    # End of modification of database
    conn.commit()
    c.close()
    conn.close()
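A different technique from the tracing approach is to let SQLite log changes itself with triggers, so the log does not depend on which connection issued the statements. The minimal sketch below assumes a single table mytable(id, value) with an integer primary key; the patch_log table, the trigger names and the export_patch helper are hypothetical. Each trigger builds the equivalent SQL statement with SQLite's quote() function, and export_patch returns the log as a script that executescript can apply, then clears the log.

```python
import sqlite3

# Hypothetical schema plus change-logging triggers for one table
SETUP = """
CREATE TABLE IF NOT EXISTS mytable (id INTEGER PRIMARY KEY, value REAL);
CREATE TABLE IF NOT EXISTS patch_log (seq INTEGER PRIMARY KEY AUTOINCREMENT, stmt TEXT);
CREATE TRIGGER IF NOT EXISTS mytable_ins AFTER INSERT ON mytable BEGIN
    INSERT INTO patch_log (stmt) VALUES (
        'INSERT INTO mytable (id, value) VALUES (' || quote(NEW.id) || ', ' || quote(NEW.value) || ')');
END;
CREATE TRIGGER IF NOT EXISTS mytable_upd AFTER UPDATE ON mytable BEGIN
    INSERT INTO patch_log (stmt) VALUES (
        'UPDATE mytable SET value = ' || quote(NEW.value) || ' WHERE id = ' || quote(OLD.id));
END;
CREATE TRIGGER IF NOT EXISTS mytable_del AFTER DELETE ON mytable BEGIN
    INSERT INTO patch_log (stmt) VALUES ('DELETE FROM mytable WHERE id = ' || quote(OLD.id));
END;
"""


def export_patch(conn):
    """Return the logged statements as one SQL script and clear the log."""
    rows = conn.execute("SELECT stmt FROM patch_log ORDER BY seq").fetchall()
    conn.execute("DELETE FROM patch_log")
    conn.commit()
    return "".join(stmt + ";\n" for (stmt,) in rows)


# Usage: modify the master database, then replay the patch on a client copy
master = sqlite3.connect(":memory:")
master.executescript(SETUP)
master.execute("INSERT INTO mytable (value) VALUES (?)", (3.14,))
master.execute("UPDATE mytable SET value = ? WHERE id = 1", (2.71,))
master.commit()
patch = export_patch(master)

client = sqlite3.connect(":memory:")
client.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, value REAL)")
client.executescript(patch)
print(client.execute("SELECT id, value FROM mytable").fetchall())  # [(1, 2.71)]
```

Triggers add one extra write per modification and must be written for each table; this sketch does not handle UPDATEs that change id.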
I want to call a MariaDB stored procedure from Azure Data Factory (ADF).
How can this be achieved? Are there any other services that can be integrated with ADF to call these MariaDB procedures?
I tried calling the procedure by writing the query in a Lookup activity.
It fails with this error:
ErrorCode=InvalidParameter,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=The value of the property 'columns' is invalid: 'Value cannot be null.
Parameter name: columns'.,Source=,''Type=System.ArgumentNullException,Message=Value cannot be null.
Parameter name: columns,Source=Microsoft.DataTransfer.Common,'
The Lookup activity reads and returns the content of the query. I tried to reproduce this by creating three stored procedures in Azure Database for MariaDB.
The first stored procedure updates the data in the table.
DELIMITER $$
CREATE PROCEDURE update_inventory()
BEGIN
UPDATE inventory SET quantity = 150
WHERE id = 1;
END$$
DELIMITER ;
When this procedure is called in an ADF Lookup activity, the error occurs.
The second stored procedure contains a SELECT query.
DELIMITER $$
CREATE PROCEDURE select_inventory()
BEGIN
select * from inventory;
END$$
DELIMITER ;
When this SP is called, the ADF pipeline executes successfully.
To execute a stored procedure containing UPDATE statements (or any other statements), a SELECT statement is added to the stored procedure.
DELIMITER $$
CREATE PROCEDURE update_select_inventory()
BEGIN
UPDATE inventory SET quantity = 150
WHERE id = 1;
select * from inventory;
END$$
DELIMITER ;
When this stored procedure is called through the Lookup activity, it executes successfully.
Try adding a SELECT statement to the stored procedure and executing it in the Lookup activity, or add a SELECT statement after the CALL statement.
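As a rough analogy only (ADF's internals are not shown here), Python's DB-API exposes the same distinction: a statement that returns no result set carries no column metadata, while a SELECT does. The sketch uses sqlite3 with a made-up inventory table. cursor.description is None after the UPDATE and lists the column names after the SELECT, which is consistent with a Lookup activity reporting that 'columns' is null until a SELECT is added.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE inventory (id INTEGER PRIMARY KEY, quantity INTEGER)")
conn.execute("INSERT INTO inventory VALUES (1, 100)")

# A data-modifying statement produces no result set, hence no column metadata
cur = conn.execute("UPDATE inventory SET quantity = 150 WHERE id = 1")
print(cur.description)  # None

# Following it with a SELECT gives the caller a result set with named columns
cur = conn.execute("SELECT * FROM inventory")
print([col[0] for col in cur.description])  # ['id', 'quantity']
print(cur.fetchall())  # [(1, 150)]
```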
By selecting the 'Query' option, you can call the stored procedure using a Lookup activity. From your error message, it looks like the 'columns' parameter is missing when the stored procedure is called.
Did you try executing the same code using a client tool like MySQL Workbench? If you can execute the stored procedure from other client tools, you should be able to execute it using the Lookup activity as well.
I tested this on my end and was able to execute the stored procedure using a Lookup activity.
Does anyone know how I can save the output of the “HELP VOLATILE TABLE” statement as a table that I can use in a query later on?
My goal is to save a list of all volatile tables that are currently present.
I tried to use “HELP VOLATILE TABLE” in a CTE, but it doesn’t work: it refuses to run. Any help is appreciated.
Update: It seems HELP/SHOW statements can only return data to the client; they can’t be used in a query.
It seems it is possible to write an external stored procedure in, e.g., Java that FETCHES this data and INSERTS it into a global temporary table.
My question is whether someone knows how to write such an external stored procedure in Java, and how to import and use it.
For those using SAS, this is easy to do.
PROC SQL NOPRINT;
CONNECT TO TERADATA ( &connect_string. );
CREATE TABLE work.volatile_tables AS  /* hypothetical output table name */
SELECT *
FROM CONNECTION TO TERADATA
(HELP VOLATILE TABLE);
DISCONNECT FROM TERADATA;
QUIT;
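For those using Python, the same can be done easily too.
import teradatasql
import pandas as pd

# Host and credentials are placeholders; fill in your own
with teradatasql.connect('{"host":"your_host_name"}', user="", password="") as con:
    # Volatile tables are session-scoped: this lists only those created in this session
    df = pd.read_sql("HELP VOLATILE TABLE", con)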
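Because HELP statements only return rows to the client, one way to keep their output is to fetch it client-side and insert it into a table over the same connection. The sketch below is generic DB-API code demonstrated with sqlite3 as a stand-in, where PRAGMA table_info plays the role of HELP VOLATILE TABLE; copy_result_into_table and saved_info are hypothetical names, and the code assumes a driver using the qmark (?) parameter style, as sqlite3 does. With Teradata, the target table (for example a global temporary table) would have to exist already with matching columns, and HELP VOLATILE TABLE plus the insert must run in the session that created the volatile tables.

```python
import sqlite3


def copy_result_into_table(conn, source_sql, target_table):
    """Hypothetical helper: run a statement whose result only reaches the
    client, then insert its rows into an existing table with the same
    number of columns. Returns the number of rows copied."""
    cur = conn.cursor()
    cur.execute(source_sql)
    rows = cur.fetchall()
    if rows:
        placeholders = ", ".join("?" for _ in cur.description)
        cur.executemany(
            "INSERT INTO {} VALUES ({})".format(target_table, placeholders), rows
        )
    conn.commit()
    return len(rows)


# Demo with SQLite as a stand-in: PRAGMA table_info plays the role of HELP
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE inventory (id INTEGER, quantity INTEGER)")
conn.execute(
    "CREATE TABLE saved_info (cid, name, type, notnull_flag, dflt_value, pk)"
)
n = copy_result_into_table(conn, "PRAGMA table_info(inventory)", "saved_info")
print(n)  # 2
print(conn.execute("SELECT name FROM saved_info ORDER BY cid").fetchall())
```

The table name is formatted into the SQL string, so it should come from trusted code rather than user input.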
I have a Netezza SQL server that I connect to using DBI::dbConnect. The server has multiple databases, which we will call db1 and db2.
I would like to use dbplyr as much as possible and avoid writing SQL code in RODBC::sqlQuery(), but I am not sure how to do the following:
1) How to read a table in db1, work on it and have the server write the result into a table in db2 without going through my desktop?
2) How to do a left join between a table in db1 and another in db2 ?
It looks like there might be a way to connect to database ="SYSTEM" instead of database = "db1" or "db2", but I am not sure what a next step would be.
con <- dbConnect(odbc::odbc(),
driver = "NetezzaSQL",
database = "SYSTEM",
uid = Sys.getenv("netezza_username"),
pwd = Sys.getenv("netezza_password"),
server = "NETEZZA_SERVER",
port = 5480)
I work around this problem on SQL Server using in_schema and dbExecute as follows, assuming Netezza is not too different.
Part 1: shared connection
The first problem is to connect to both tables via the same connection. If we use different connections, then joining the two tables results in data being copied from one connection to the other, which is very slow.
con <- dbConnect(...) # as required by your database
table_1 <- dplyr::tbl(con, from = dbplyr::in_schema("db1", "table_name_1"))
table_2 <- dplyr::tbl(con, from = dbplyr::in_schema("db2.schema2", "table_name_2"))
While in_schema is intended for passing schema names you can also use it for passing the database name (or both with a dot in between).
The following should now work without issue:
# check connection
head(table_1)
head(table_2)
# test join code
left_join(table_1, table_2, by = "id") %>% show_query()
# check left join
left_join(table_1, table_2, by = "id") %>% head()
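The benefit of a shared connection can be illustrated with SQLite, where ATTACH DATABASE plays roughly the role that database-qualified names (db1.table, db2.schema.table) play on SQL Server or Netezza. In this sketch, written with Python's sqlite3 rather than R, both tables are reachable through one connection, so the LEFT JOIN runs entirely inside the database engine and no rows are copied between connections. The table names mirror the example above; the data is made up.

```python
import sqlite3

con = sqlite3.connect(":memory:")                 # "main" plays the role of db1
con.execute("ATTACH DATABASE ':memory:' AS db2")  # a second, separate database

con.execute("CREATE TABLE main.table_name_1 (id INTEGER, a TEXT)")
con.execute("CREATE TABLE db2.table_name_2 (id INTEGER, b TEXT)")
con.executemany("INSERT INTO main.table_name_1 VALUES (?, ?)", [(1, "x"), (2, "y")])
con.execute("INSERT INTO db2.table_name_2 VALUES (1, 'z')")

# Schema-qualified names let one query join across both databases
rows = con.execute(
    "SELECT t1.id, t1.a, t2.b "
    "FROM main.table_name_1 AS t1 "
    "LEFT JOIN db2.table_name_2 AS t2 ON t1.id = t2.id "
    "ORDER BY t1.id"
).fetchall()
print(rows)  # [(1, 'x', 'z'), (2, 'y', None)]
```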
Part 2: write to database
A remote table is defined by two things:
1) the connection;
2) the code of the current query (e.g. the result of show_query).
We can use these with dbExecute to write to the database. My example uses SQL Server (which uses INTO as the keyword); you'll have to adapt it to your own environment if the SQL syntax is different.
# optional, extract connection from table-to-save
con <- table_to_save$src$con
# SQL query
sql_query <- paste0("SELECT *\n",
"INTO db1.new_table \n", # the database and name you want to save
"FROM (\n",
dbplyr::sql_render(table_to_save),
"\n) subquery_alias")
# run query
dbExecute(con, as.character(sql_query))
The idea is to create a query that can be executed by the database that will write the new table. I have done this by treating the existing query as a subquery of the SELECT ... INTO ... FROM (...) subquery_alias pattern.
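The same wrap-the-query-as-a-subquery idea can be sketched in Python against SQLite, which uses CREATE TABLE ... AS SELECT rather than SELECT ... INTO. The save_query_as_table and table_exists helpers and the inventory data are hypothetical. The helper checks sqlite_master first, because creating a table that already exists raises an error.

```python
import sqlite3


def table_exists(con, name):
    row = con.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?", (name,)
    ).fetchone()
    return row is not None


def save_query_as_table(con, query_sql, new_table):
    """Wrap an existing SELECT as a subquery and let the database itself
    write the result into new_table; no rows pass through the client."""
    if table_exists(con, new_table):
        raise ValueError("table {} already exists".format(new_table))
    con.execute(
        "CREATE TABLE {} AS SELECT * FROM (\n{}\n) AS subquery_alias".format(
            new_table, query_sql
        )
    )
    con.commit()


# Usage with made-up data
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE inventory (id INTEGER, quantity INTEGER)")
con.executemany("INSERT INTO inventory VALUES (?, ?)", [(1, 150), (2, 20)])
query = "SELECT id, quantity FROM inventory WHERE quantity > 100"
save_query_as_table(con, query, "big_inventory")
print(con.execute("SELECT * FROM big_inventory").fetchall())  # [(1, 150)]
```

The new table name is interpolated into the SQL string, so it must come from trusted code, not user input.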
Notes:
If the sql query produced by show_query or sql_render would work when you access the database directly then the above should work (all that changes is the command is arriving via R instead of via the sql console).
The functions I have written to smooth this process for me can be found here. They also include appending, deleting, compressing, indexing, and handling views.
Writing a table via dbExecute will error if the table already exists in the database, so I recommend checking for this first.
I use this work around in other places, but inserting the database name with in_schema has not worked for creating views. To create (or delete) a view I have to ensure the connection is to the database where I want the view.
I have been working with an SQLite DB for some time but want to integrate my code into web2py, especially the DAL. How do I rewrite code like this using the web2py DAL?
import sqlite3

name = input('Please Type your Question: ').lower().split()
name2 = name[:]
conn = sqlite3.connect("foods.db")
cursor = conn.cursor()
for item in name2:
    # Insert one word at a time (a list cannot be bound to a single ? placeholder)
    cursor.execute("INSERT INTO INPUT33 (NAME) VALUES (?);", (item,))
    conn.commit()
    cursor.execute("select MAX(rowid) from [input33];")
    m = cursor.fetchone()[0]
    print(m)
    cursor.execute("DELETE FROM INPUT33 WHERE NAME = ?", (item,))
conn.commit()
conn.close()
I do not quite understand the question, so I apologize in advance for any misunderstanding.
Web2py is a web MVC framework, and you should follow that pattern when designing your application. With that in mind, using a console-related function like input makes no sense. Also, you shouldn't use the same component both to extract user-interaction data and to handle the database connection and data access/manipulation.
If your intent is simply to convert your snippet from the sqlite3 module to pyDAL, you just need to install it (pip install pydal) and change your code to something like:
# Do your imports
from pydal import DAL, Field

# Connect to your database
db = DAL('sqlite://foods.db')
# Define your table model and fields; migrate=False because INPUT33 already exists in foods.db
db.define_table('input33', Field('NAME'), migrate=False)
# Insert one row per word (name2 is the word list from your original snippet)
for item in name2:
    db.input33.insert(NAME=item)
# Every insert/delete/update needs a commit to persist your changes
db.commit()
Full documentation can be found here.
I have an audit record table that I am writing to. I am connecting to MyDb, which has a stored procedure called 'CreateAudit'. It is a pass-through stored procedure to another database on the same machine, MyOtherDB, which also has a stored procedure called 'CreateAudit'.
In other words, in MyDb I have CreateAudit, which does the following: EXEC MyOtherDB.dbo.CreateAudit.
I call the MyDb CreateAudit stored procedure from my application, using SubSonic as the DAL. The first time I call it, I call it with the following (pseudocode):
int openStatus, closeStatus = 0;
openStatus = Convert.ToInt32(SPs.LogAccess(userId, "OPENED"));
closeStatus = Convert.ToInt32(SPs.LogAccess(userId, "CLOSED"));
This is simplified, but this is what LogAccess calls:
ALTER procedure [dbo].[LogAccess]
    @UserID uniqueidentifier,
    @Action varchar(10),
    @Status integer output
as
DECLARE @mStatus INT
EXEC [MyOtherDb].[dbo].[LogAccess]
    @UserID = @UserID,
    @Action = @Action,
    @Status = @mStatus OUTPUT
select @mStatus
My second stored procedure call is supposed to mark the record that was created by CreateAudit(recordId, "Opened") with a status of closed.
This works great if I run them independently of one another, or even if I paste them into Query Analyzer. However, when they execute from the application, the record is not marked as "Closed".
When I run SQL Profiler, I see that both queries ran, and if I copy the queries out and run them from Query Analyzer, the record gets marked as closed 100% of the time!
When I run it from the application, about once every 20 times the record is successfully marked closed; the other 19 times nothing happens, but I do not get an error!
Is it possible for the .NET app to skip over the output from the first stored procedure and start executing the second stored procedure before the record in the first is created?
When I add a "WAITFOR DELAY '00:00:00:003'" to the top of my stored procedure, the record is also closed 100% of the time.
My head is spinning. Any ideas why this is happening?
Thanks for any responses; I'm very interested in hearing how this can happen.
In your first stored proc, try having the EXEC statement wait for a return value from the second stored proc. My suspicion is that your first SP is firing off the second stored proc and then immediately returning control to your .NET code, which leads to the concurrency issue the commenter above mentioned. (That is to say, the second SP hasn't finished running yet by the time your next DB call is made!)
SP1: EXEC @retval = SP2 ....