Heavy archive log generation in PostgreSQL - postgresql-9.1

We are using PostgreSQL 9.1 on our production server. For the past month, our database has been generating nearly 35 GB of archive logs daily. To investigate, we monitored which queries were running at the times the archive logs were generated, and then ran VACUUM (FREEZE, ANALYZE) on the whole database, but it had no visible effect on archive log generation. We suspect one table is responsible: at the 9th and 39th minute of every hour, the same DELETE statement runs, and each execution deletes the entire contents of the table. As a test, we ran EXPLAIN (ANALYZE, BUFFERS) against the statement and found read = 576 MB and written = 328 MB. But on our production server, shared_buffers = 24 MB.
So every time it runs, it pulls data blocks from disk into the 24 MB of shared buffers, flushes them back to disk, and then pulls the next batch of blocks into shared buffers.
Is the heavy archive log generation therefore caused by this frequent flushing of data blocks? Should we increase shared_buffers on our production server to get rid of the heavy archive log generation?
In our production server, work_mem = 1 MB
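For reference, both settings can be confirmed directly from psql; a trivial check (the comments just restate the values mentioned above):
SHOW shared_buffers;   -- 24MB on our production server
SHOW work_mem;         -- 1MB on our production server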
Here is the output of EXPLAIN(ANALYZE,BUFFERS) for your information:
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------
 Delete on table_a  (cost=83.35..171.85 rows=2060 width=12) (actual time=11949.929..11949.929 rows=0 loops=1)
   Buffers: shared hit=375963 read=73999 written=42030
   ->  Hash Semi Join  (cost=83.35..171.85 rows=2060 width=12) (actual time=1.028..12.570 rows=2060 loops=1)
         Hash Cond: (public.table_a.id = public.table_a.id)
         Buffers: shared hit=46 read=30 written=18
         ->  Seq Scan on table_a  (cost=0.00..57.60 rows=2060 width=10) (actual time=0.007..5.009 rows=2060 loops=1)
               Buffers: shared hit=7 read=30 written=18
         ->  Hash  (cost=57.60..57.60 rows=2060 width=10) (actual time=0.973..0.973 rows=2060 loops=1)
               Buckets: 1024  Batches: 1  Memory Usage: 89kB
               Buffers: shared hit=37
               ->  Seq Scan on table_a  (cost=0.00..57.60 rows=2060 width=10) (actual time=0.002..0.463 rows=2060 loops=1)
                     Buffers: shared hit=37
 Total runtime: 11950.028 ms

https://www.postgresql.org/docs/9.1/static/wal-intro.html
Briefly, WAL's central concept is that changes to data files (where
tables and indexes reside) must be written only after those changes
have been logged, that is, after log records describing the changes
have been flushed to permanent storage. If we follow this procedure,
we do not need to flush data pages to disk on every transaction
commit, because we know that in the event of a crash we will be able
to recover the database using the log: any changes that have not been
applied to the data pages can be redone from the log records. (This is
roll-forward recovery, also known as REDO.)
So, in short: INSERT, UPDATE, DELETE, VACUUM, CLUSTER, ALTER .. SET TABLESPACE (and similar commands) produce WAL. The more blocks are rewritten, the more WAL is created. shared_buffers will not affect it.
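You can confirm how much WAL one run of the statement produces by sampling the WAL position before and after it; a minimal sketch for 9.1 (the DELETE below is only a placeholder for the real statement, and newer releases also have pg_xlog_location_diff() to compute the byte difference directly):
SELECT pg_current_xlog_location();   -- note the value, e.g. 0/5A0D2B8 (yours will differ)

-- placeholder for the actual DELETE that runs at the 9th and 39th minute
DELETE FROM table_a WHERE id IN (SELECT id FROM table_a);

SELECT pg_current_xlog_location();   -- the distance from the first sample
                                     -- is the WAL generated by the statement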

Related

Sizing requirements for Oracle downstream mining database while using Oracle Golden gate Downstream Integrated Capture

I just want to know the role of the Oracle downstream mining database machine in OGG downstream Integrated Capture mode. To be specific, I want to know whether the mining DB also stores data, or whether it only processes the archive logs received from the source and forwards the processed data to the target without storing it.
For example, if I have 1000 tables totalling 15 TB in the source system and I only want to replicate one 1 MB table to the target, do all 1000 tables (15 TB) need to exist in the downstream mining DB, do none of them need to exist there, or does only the 1 MB table of interest need to exist in the downstream mining DB?
Thanks
I don't have enough points to add a comment, so answering here.
There is no need for the source table(s) or data files on the log mining server. Only the redo or archive logs are shipped/transported from the source DB to the log mining server.

Allocate specific storage space to MariaDB users

I am using an Amazon EC2 instance and a MariaDB database.
Suppose that I have 20GB database storage and 10 user accounts. I want each account to have 2GB database storage.
How can I achieve this?
I don't think there is any builtin way to monitor or report disk usage by user.
If you limit each user to one database:
GRANT ALL PRIVILEGES ON user1_db.* TO 'user1'@'...' ...;
and periodically run a query like
SELECT table_schema AS db_name,
SUM(data_length+index_length) / 1073741824 DB_Gigabyte
FROM information_schema.tables
GROUP BY table_schema;
to get a list of how many GB is in each database. From there, it is up to you to deal with any over-eating users.
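If you want that report to flag only the offenders, a HAVING clause on the same query works; a small sketch assuming the one-database-per-user convention above and the 2 GB quota:
SELECT table_schema AS db_name,
       SUM(data_length + index_length) / 1073741824 AS db_gigabyte
FROM   information_schema.tables
GROUP  BY table_schema
HAVING SUM(data_length + index_length) > 2 * 1073741824;  -- databases over the 2 GB quota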
Caution: Because of temp tables, free space, etc., it is quite possible for a user to temporarily exceed the limit. Also, there are cases where inserting one small row will grow the allocation by a megabyte or more. So, be lenient, else both you and the user will be puzzled at what is going on.
Be sure to have innodb_file_per_table = ON so that you can more easily deal with bloated users.

Simple query on 100 rows executes very slowly

I use ASP.NET MVC with SQL Server. The query lives in my repository class. Sometimes the query executes in 10 seconds, sometimes in 3 minutes!! Why? I used SQL Server Profiler, but I really don't understand what the cause could be or how to find it.
Query:
SELECT
[Extent1].[Id] AS [Id],
[Extent1].[FirstAddressId] AS [FirstAddressId],
[Extent1].[SecondAddressId] AS [SecondAddressId],
[Extent1].[Distance] AS [Distance],
[Extent1].[JsonRoute] AS [JsonRoute]
FROM [dbo].[AddressXAddressDistances] AS [Extent1]
Check your query plan. Just run your SELECT statement in SQL Server Management Studio to obtain the actual query plan. More info here: Query plan.
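To compare a fast run with a slow run of the same statement, it can also help to capture I/O and timing statistics in Management Studio; a minimal sketch using the query from the question:
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

SELECT [Extent1].[Id], [Extent1].[FirstAddressId], [Extent1].[SecondAddressId],
       [Extent1].[Distance], [Extent1].[JsonRoute]
FROM [dbo].[AddressXAddressDistances] AS [Extent1];

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
Comparing logical reads and elapsed time between a 10-second run and a 3-minute run should show whether the extra time goes into reading pages or into waiting.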
If the plans are the same but response time differs significantly between calls, then the issue is probably locks at the database level (or heavy concurrent workloads). I mean, for instance, an inappropriate transaction isolation level, or reports running at the same time that consume too many resources (or take locks "because of something" to ensure some data consistency enforced by some developer).
Many factors influence performance (including the memory available at the moment of query execution).
You can also run a few queries to analyze the quality of your statistics (or just update all of them using EXEC sp_updatestats), and analyze the fragmentation of your indexes. In my experience, locks, outdated statistics, or fragmented indexes can force SQL Server to choose a very inefficient query plan.
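Something along these lines covers both checks; a sketch only, assuming the table from the question lives in the current database:
-- Refresh statistics for the whole database
EXEC sp_updatestats;

-- Check index fragmentation for the table in question
SELECT i.name AS index_name,
       ps.avg_fragmentation_in_percent,
       ps.page_count
FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID(N'dbo.AddressXAddressDistances'), NULL, NULL, 'LIMITED') AS ps
JOIN sys.indexes AS i
  ON i.object_id = ps.object_id AND i.index_id = ps.index_id;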
Some info on active locks: Active locks on table
Additional info 1:
If you are the only user of this DB and it's on your local machine (you use SQL Server Express), the issue with locks is less likely than other problems. Try opening the SQL Server event log. It's available in SQL Server Management Studio on the left side (tree) under your engine instance: Management / SQL Server Logs / Current. Do you see any unusual entries there? Also try reviewing the system log (using the Event Viewer app); in case of hardware problems you should see some info there as well. By the way: how many rows do you have in the table? Also review the behavior of your disks in Process Explorer or Performance Monitor. If the disk queue length is too big, it can be the main source of the problem (in that case, look at which apps are stressing the disk)...
More info on locks:
SELECT
[spid] = session_Id
, ecid
, [blockedBy] = blocking_session_id
, [database] = DB_NAME(sp.dbid)
, [user] = nt_username
, [status] = er.status
, [wait] = wait_type
, [current stmt] =
SUBSTRING (
qt.text,
er.statement_start_offset/2,
(CASE
WHEN er.statement_end_offset = -1 THEN DATALENGTH(qt.text)
ELSE er.statement_end_offset
END - er.statement_start_offset)/2)
,[current batch] = qt.text
, reads
, logical_reads
, cpu
, [time elapsed (min)] = DATEDIFF(mi, start_time, GETDATE())
, program = program_name
, hostname
--, nt_domain
, start_time
, qt.objectid
FROM sys.dm_exec_requests er
INNER JOIN sys.sysprocesses sp ON er.session_id = sp.spid
CROSS APPLY sys.dm_exec_sql_text(er.sql_handle)as qt
WHERE session_Id > 50 -- Ignore system spids.
AND session_Id NOT IN (@@SPID) -- Ignore this current statement.
ORDER BY 1, 2
GO
Before you waste any more time on this, you should realize that something like the time a query takes in development is essentially meaningless. In development, you're running a single-threaded web server in IIS Express, which means that you've also got VS running, sitting on roughly 2-4 GB of RAM. Together with that, you're running a SQL Server instance, that's fighting the system for both RAM and hard drive time. You haven't given any specs of your system, but if you also happen to be sporting a consumer-class 5400 or 7200 RPM platter-style drive rather than an SSD, that's going to severely impact performance as well. Then, we haven't even got into what else might be running on this system. Photoshop? Outlook? Your favorite playlists of MP3s decoding in the background? What's Windows doing? It might be downloading/applying updates, indexing your drive for search, etc. None of that applies any more when you move into production (or at least shouldn't). In production, you should have a dedicated server with 4-8 GB of RAM and an SSD or enterprise-class 15,000+ RPM platter drive devoted just to SQL Server, so it can spit out query results at lightning speeds.
Long and short: if you want to gauge the website/query performance of your application, you need to deploy it to a facsimile of what you'll be running in production. There you can pound the hell out of it and get some real data you can actually do something with. Trying to profile your app in development is just a total waste of time.

Clarification regarding journal_size_limit in SQLite

If I set journal_size_limit = 67110000 (64 MiB), will I be able to:
work with / commit transactions over that value (somewhat unlikely)
successfully perform a VACUUM (even if the database is 3 GiB or more)?
The VACUUM command works by copying the contents of the database into
a temporary database file and then overwriting the original with the
contents of the temporary file. When overwriting the original, a
rollback journal or write-ahead log WAL file is used just as it would
be for any other database transaction. This means that when
VACUUMing a database, as much as twice the size of the original
database file is required in free disk space.
It's not entirely clear in the documentation, and I would appreciate if someone could tell me for sure.
The journal_size_limit is not an upper limit on the transaction journal; it is an upper limit for an inactive transaction journal.
After a transaction has finished, the journal is not needed, but not deleting the journal can make things faster because the file system does not need to free this data and then reallocate it for the next transaction.
The purpose of this setting is to limit the size of unused journal data.
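For reference, the limit is set per connection with a PRAGMA; a minimal sketch using the value from the question:
PRAGMA journal_size_limit = 67110000;  -- cap the journal left on disk at roughly 64 MiB
VACUUM;                                -- may still need up to ~2x the database size while it runs
PRAGMA journal_size_limit;             -- reports the currently configured limit
The limit only controls what is left on disk after the transaction finishes; while the VACUUM runs, it still uses as much journal and temporary space as it needs, per the documentation quoted above.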

Plugging another relational DB to OpenDS

Currently I'm working on a project with OpenDS. I have to load more than 200k entries into OpenDS, but unfortunately it fails at random times once the load exceeds about 10k-15k entries.
When I google that particular error (alert ID 9896233: JE Database Environment corresponding to backend id userRoot is corrupt. Restart the Directory Server to reopen the Environment), it seems like the OpenDS backend DB [Berkeley DB] is not that reliable when adding a massive number of entries. How can I plug a reliable commercial or open-source relational DB [Oracle / H2] into OpenDS? Is it just configuration, or do I have to change the OpenDS code?
First, you should be aware that Oracle has pulled the plug on the OpenDS project and it is now completely stalled. Development continues as open source in the OpenDJ project: http://opendj.forgerock.org.
That said, I believe there is a problem with your environment. When I was still working on OpenDS, our basic stress test was importing and running a very high load against 10 million users. 200K entries is not a massive number. My daily OpenDJ tests on my laptop are done with 100K to 1M entries. We have customers running OpenDJ in production with more than 20M entries, growing 40% every 6 months!
Berkeley DB has proven to be very scalable and reliable.
Things you might want to check: what is the maximum number of files that can be opened by a single process on your machine? Linux defaults to 1024, and that limit is easy to hit with OpenDS or OpenDJ. Are you using a local filesystem? Berkeley DB is not supported on networked filesystems such as NFS or other NAS.
Finally, check the logs/errors file and your system logs. Chances are that one of them contains a message with the root cause of the problem (most likely logs/errors).
Kind regards,
Ludovic Poitou
ForgeRock - Product Manager for OpenDJ
