How to set journal-mode of Sqlite3 in ruby-on-rails - sqlite

I try to set journal-mode of Sqlite3 in my Ruby-on-Rails project.
It seems that ruby-on-rails uses default journal-mode of sqlite, i.e. 'delete', as i saw a journal file in folder 'db' when I updated database and it was deleted when update was done. I hope to set jouranl-mode to be "WAL" or "memory".
I tried Sqlite command line
PRAGMA main.journal_mode=WAL
but it does not affect application in ruby-on-rails.
Finally I made it by changing source code of sqlite3_adapter.rb
I changed a function in file : activerecord-5.1.4/lib/active_record/connection_adapters/sqlite3_adapter.rb
def configure_connection
# here are original codes
execute("PRAGMA journal_mode = WAL", "SCHEMA")
end
Because configure_connection is called by initialize of SQLite3Adapter
It does not sound nice solution although it works.
Is there any nicer way to set journal-mode of Sqlite3 in Ruby-on-Rails(version is 5.1.4)? For example configuration options

It has been a while since I've had to do this, but you should be able to use an initializer so you don't need to patch the source.
Putting something like this in config/initializers/configure_sqlite_journal.rb
if c = ::ActiveRecord::Base.connection
c.execute 'PRAGMA journal_mode = WAL'
end
Should do what you want

I found a better answer in here: Speed up your Rails sqlite database for large dataset, more performance and configure:
The code( I put this into config/initializers/speedup_sqlite3.rb ):
if ::ActiveRecord::Base.connection_config[:adapter] == 'sqlite3'
if c = ::ActiveRecord::Base.connection
# see http://www.sqlite.org/pragma.html for details
# Page size of the database. The page size must be a power of two between 512 and 65536 inclusive
c.execute 'PRAGMA main.page_size=4096;'
# Suggested maximum number of database disk pages that SQLite will hold in memory at once per open database file
c.execute 'PRAGMA main.cache_size=10000;'
# Database connection locking-mode. The locking-mode is either NORMAL or EXCLUSIVE
c.execute 'PRAGMA main.locking_mode=EXCLUSIVE;'
# Setting of the "synchronous" flag, "NORMAL" means sync less often but still more than none
c.execute 'PRAGMA main.synchronous=NORMAL;'
# Journal mode for database, WAL=write-ahead log
c.execute 'PRAGMA main.journal_mode=WAL;'
# Storage location for temporary tables, indices, views, triggers
c.execute 'PRAGMA main.temp_store = MEMORY;'
end
end

Related

Oracle PLSQL Print Bulk Batch Output After Each Batch Completion [duplicate]

I have an SQL script that is called from within a shell script and takes a long time to run. It currently contains dbms_output.put_line statements at various points. The output from these print statements appear in the log files, but only once the script has completed.
Is there any way to ensure that the output appears in the log file as the script is running?
Not really. The way DBMS_OUTPUT works is this: Your PL/SQL block executes on the database server with no interaction with the client. So when you call PUT_LINE, it is just putting that text into a buffer in memory on the server. When your PL/SQL block completes, control is returned to the client (I'm assuming SQLPlus in this case); at that point the client gets the text out of the buffer by calling GET_LINE, and displays it.
So the only way you can make the output appear in the log file more frequently is to break up a large PL/SQL block into multiple smaller blocks, so control is returned to the client more often. This may not be practical depending on what your code is doing.
Other alternatives are to use UTL_FILE to write to a text file, which can be flushed whenever you like, or use an autonomous-transaction procedure to insert debug statements into a database table and commit after each one.
If it is possible to you, you should replace the calls to dbms_output.put_line by your own function.
Here is the code for this function WRITE_LOG -- if you want to have the ability to choose between 2 logging solutions:
write logs to a table in an autonomous transaction
CREATE OR REPLACE PROCEDURE to_dbg_table(p_log varchar2)
-- table mode:
-- requires
-- CREATE TABLE dbg (u varchar2(200) --- username
-- , d timestamp --- date
-- , l varchar2(4000) --- log
-- );
AS
pragma autonomous_transaction;
BEGIN
insert into dbg(u, d, l) values (user, sysdate, p_log);
commit;
END to_dbg_table;
/
or write directly to the DB server that hosts your database
This uses the Oracle directory TMP_DIR
CREATE OR REPLACE PROCEDURE to_dbg_file(p_fname varchar2, p_log varchar2)
-- file mode:
-- requires
--- CREATE OR REPLACE DIRECTORY TMP_DIR as '/directory/where/oracle/can/write/on/DB_server/';
AS
l_file utl_file.file_type;
BEGIN
l_file := utl_file.fopen('TMP_DIR', p_fname, 'A');
utl_file.put_line(l_file, p_log);
utl_file.fflush(l_file);
utl_file.fclose(l_file);
END to_dbg_file;
/
WRITE_LOG
Then the WRITE_LOG procedure which can switch between the 2 uses, or be deactivated to avoid performances loss (g_DEBUG:=FALSE).
CREATE OR REPLACE PROCEDURE write_log(p_log varchar2) AS
-- g_DEBUG can be set as a package variable defaulted to FALSE
-- then change it when debugging is required
g_DEBUG boolean := true;
-- the log file name can be set with several methods...
g_logfname varchar2(32767) := 'my_output.log';
-- choose between 2 logging solutions:
-- file mode:
g_TYPE varchar2(7):= 'file';
-- table mode:
--g_TYPE varchar2(7):= 'table';
-----------------------------------------------------------------
BEGIN
if g_DEBUG then
if g_TYPE='file' then
to_dbg_file(g_logfname, p_log);
elsif g_TYPE='table' then
to_dbg_table(p_log);
end if;
end if;
END write_log;
/
And here is how to test the above:
1) Launch this (file mode) from your SQLPLUS:
BEGIN
write_log('this is a test');
for i in 1..100 loop
DBMS_LOCK.sleep(1);
write_log('iter=' || i);
end loop;
write_log('test complete');
END;
/
2) on the database server, open a shell and
tail -f -n500 /directory/where/oracle/can/write/on/DB_server/my_output.log
Two alternatives:
You can insert your logging details in a logging table by using an autonomous transaction. You can query this logging table in another SQLPLUS/Toad/sql developer etc... session. You have to use an autonomous transaction to make it possible to commit your logging without interfering the transaction handling in your main sql script.
Another alternative is to use a pipelined function that returns your logging information. See here for an example: http://berxblog.blogspot.com/2009/01/pipelined-function-vs-dbmsoutput.html When you use a pipelined function you don't have to use another SQLPLUS/Toad/sql developer etc... session.
the buffer of DBMS_OUTPUT is read when the procedure DBMS_OUTPUT.get_line is called. If your client application is SQL*Plus, it means it will only get flushed once the procedure finishes.
You can apply the method described in this SO to write the DBMS_OUTPUT buffer to a file.
Set session metadata MODULE and/or ACTION using dbms_application_info().
Monitor with OEM, for example:
Module: ArchiveData
Action: xxx of xxxx
If you have access to system shell from PL/SQL environment you can call netcat:
BEGIN RUN_SHELL('echo "'||p_msg||'" | nc '||p_host||' '||p_port||' -w 5'); END;
p_msg - is a log message
v_host is a host running python script that reads data from socket on port v_port.
I used this design when I wrote aplogr for real-time shell and pl/sql logs monitoring.

How can I log sql execution results in airflow?

I use airflow python operators to execute sql queries against a redshift/postgres database. In order to debug, I'd like the DAG to return the results of the sql execution, similar to what you would see if executing locally in a console:
I'm using psycop2 to create a connection/cursor and execute the sql. Having this logged would be extremely helpful to confirm the parsed parameterized sql, and confirm that data was actually inserted (I have painfully experiences issues where differences in environments caused unexpected behavior)
I do not have deep knowledge of airflow or the low level workings of the python DBAPI, but the pscyopg2 documentation does seem to refer to some methods and connection configurations that may allow this.
I find it very perplexing that this is difficult to do, as I'd imagine it would be a primary use case of running ETLs on this platform. I've heard suggestions to simply create additional tasks that query the table before and after, but this seems clunky and ineffective.
Could anyone please explain how this may be possible, and if not, explain why? Alternate methods of achieving similar results welcome. Thanks!
So far I have tried the connection.status_message() method, but it only seems to return the first line of the sql and not the results. I have also attempted to create a logging cursor, which produces the sql, but not the console results
import logging
import psycopg2 as pg
from psycopg2.extras import LoggingConnection
conn = pg.connect(
connection_factory=LoggingConnection,
...
)
conn.autocommit = True
logging.basicConfig(level=logging.DEBUG)
logger = logging.getLogger(__name__)
logger.addHandler(logging.StreamHandler(sys.stdout))
conn.initialize(logger)
cur = conn.cursor()
sql = """
INSERT INTO mytable (
SELECT *
FROM other_table
);
"""
cur.execute(sql)
I'd like the logger to return something like:
sql> INSERT INTO mytable (
SELECT ...
[2019-07-25 23:00:54] 912 rows affected in 4 s 442 ms
Let's assume you are writing an operator that uses postgres hook to do something in sql.
Anything printed inside an operator is logged.
So, if you want to log the statement, just print the statement in your operator.
print(sql)
If you want to log the result, fetch the result and print the result.
E.g.
result = cur.fetchall()
for row in result:
print(row)
Alternatively you can use self.log.info in place of print, where self refers to the operator instance.
Ok, so after some trial and error I've found a method that works for my setup and objective. To recap, my goal is to run ETL's via python scripts, orchestrated in Airflow. Referring to the documentation for statusmessage:
Read-only attribute containing the message returned by the last command:
The key is to manage logging in context with transactions executed on the server. In order for me to do this, I had to specifically set con.autocommit = False, and wrap SQL blocks with BEGIN TRANSACTION; and END TRANSACTION;. If you insert cur.statusmessage directly following a statement that deletes or inserts, you will get a response such as 'INSERT 0 92380'.
This still isn't as verbose as I would prefer, but it is a much better than nothing, and is very useful for troubleshooting ETL issues within Airflow logs.
Side notes:
- When autocommit is set to False, you must explicitly commit transactions.
- It may not be necessary to state transaction begin/end in your SQL. It may depend on your DB version.
con = psy.connect(...)
con.autocommit = False
cur = con.cursor()
try:
cur.execute([some_sql])
logging.info(f"Cursor statusmessage: {cur.statusmessage})
except:
con.rollback()
finally:
con.close()
There is some buried functionality within psycopg2 that I'm sure can be utilized, but the documentation is pretty thin and there are no clear examples. If anyone has suggestions on how to utilize things such as logobjects, or returning join PID to somehow retrieve additional information.

How to check which SQL query is so CPU intensive

Is there any possible way to check which query is so CPU intensive in _sqlsrv2 process?
Something which give me information about executed query in that process in that moment.
Is there any way to terminate that query without killing _sqlsrv2 process?
I cannot find any official materials in that subject.
Thank You for any help.
You could look into client database-request caching.
Code examples below assume you have ABL access to the environment. If not you will have to use SQL instead but it shouldn't be to hard to "translate" the code below
I haven't used this a lot myself but I wouldn't be surprised if it has some impact on performance.
You need to start caching in the active connection. This can be done in the connection itself or remotely via VST tables (as long as your remote session is connected to the same database) so you need to be able to identify your connections. This can be done via the process ID.
Generally how to enable the caching:
/* "_myconnection" is your current connection. You shouldn't do this */
FIND _myconnection NO-LOCK.
FIND _connect WHERE _connect-usr = _myconnection._MyConn-userid.
/* Start caching */
_connect._Connect-CachingType = 3.
DISPLAY _connect WITH FRAME x1 SIDE-LABELS WIDTH 100 1 COLUMN.
/* End caching */
_connect._Connect-CachingType = 0.
You need to identify your process first, via top or another program.
Then you can do something like:
/* Assuming pid 21966 */
FIND FIRST _connect NO-LOCK WHERE _Connect._Connect-Pid = 21966 NO-ERROR.
IF AVAILABLE _Connect THEN
DISPLAY _connect.
You could also look at the _Connect-Type. It should be 'SQLC' for SQL connections.
FOR EACH _Connect NO-LOCK WHERE _Connect._connect-type = "SQLC":
DISPLAY _connect._connect-type.
END.
Best of all would be to do this in a separate environment. If you can't at least try it in a test environment first.
Here's a good guide.
You can use a Select like this:
select
c."_Connect-type",
c."_Connect-PID" as 'PID',
c."_connect-ipaddress" as 'IP',
c."_Connect-CacheInfo"
from
pub."_connect" c
where
c."_Connect-CacheInfo" is not null
But first you need to enable connection cache, follow this example

SQLite : Pragma cache_spill = false not working

I was getting a "database locked" error while reading as I was executing long transaction. Since Cache_Spill is the reason for this exclusive lock, I wanted to avoid exclusive lock on the database . So I needed to set cache_spill = false.
But the following code seems not to be working.
iretval = sqlite3_exec(pSqlHandle,
"PRAGMA cache_spill=false",ZERO,ZERO,ZERO);
if(iretval != SQLITE_OK)
{
iretval = DB_QUERY_FAILED;
goto db_close;
}
I am not getting any error.But I am still getting the lock at the same position while reading.
EDIT
Just to check, I intentionally mistyped as PRAGMA cahe_spill=false and I didn't get any error. So the first question is, am I executing the pragma correctly?
EDIT
I am executing it after opening the database.
Unknown PRAGMAs are ignored.
(PRAGMA cache_spill was introduced in SQLite 3.8.0.)
You cannot prevent writing to disk when SQLite runs out of memory in its page cache; PRAGMA cache_spill prevents only write before that.
You also have to increate the page cache size.

R and RJDBC: Using dbSendUpdate results in ORA-01000: maximum open cursors exceeded

I have encountered an error when trying to insert thousands of rows with R/RJDBC and the dbSendUpdate command on an Oracle database.
Problem can be reproduced by creating a test table with
CREATE TABLE mytest (ID NUMBER(10) not null);
and then executing the following R script
library(RJDBC)
drv<-JDBC("oracle.jdbc.OracleDriver","ojdbc-11.1.0.7.0.jar") # place your JDBC driver here
conn <- dbConnect(drv, "jdbc:oracle:thin:#MYSERVICE", "myuser", "mypasswd") # place your connection details here
for (i in 1:10000) {
dbSendUpdate(conn,"INSERT INTO mytest VALUES (?)",i))
}
Searching the Internet provided the information, that one should close result cursors, which is obvious (e.g. see java.sql.SQLException: - ORA-01000: maximum open cursors exceeded or Unable to resolve error - java.sql.SQLException: ORA-01000: maximum open cursors exceeded).
But the help file for ??dbSendUpdate claims for not using result cursors at all:
.. that dbSendUpdate is used with DBML queries and thus doesn't return any result set.
Therefore this behavior doesn't make much sense to me :-(
Can anybody help?
Thanks, a lot!
PS: Found something interessting in the RJDBC documentation http://www.rforge.net/RJDBC/
Note that the life time of a connection, result set, driver etc. is determined by the lifetime of the corresponding R object. Once the R handle goes out of scope (or if removed explicitly by rm) and is garbage-collected in R, the corresponding connection or result set is closed and released. This is important for databases that have limited resources (like Oracle) - you may need to add gc() by hand to force garbage collection if there could be many open objects. The only exception are drivers which stay registered in the JDBC even after the corresponding R object is released as there is currently no way to unload a JDBC driver (in RJDBC).
But again, even inserting gc() within the loop will produce the same beahvoir.
Finally we realized this being a bug in package RJDBC. If you are willing to patch RJDBC you can use the patched sources available at https://github.com/Starfox899/RJDBC (as long as they are not imported into the package).

Resources