Sizing requirements for the Oracle downstream mining database when using Oracle GoldenGate Downstream Integrated Capture - oracle-golden-gate

I just want to know the role of the Oracle downstream mining database machine in OGG downstream integrated capture mode. To be specific, I want to know whether the mining DB also stores data, or whether it only processes the archive logs received from the source and forwards the processed data to the target without storing it.
For example, if I have 1000 tables totalling 15 TB in the source system and I just want to replicate one table of 1 MB to the target, do all 1000 tables (15 TB) need to exist in the downstream mining DB, do none of them need to exist there, or does only the table of interest (1 MB) need to exist in the downstream mining DB?
Thanks

I don't have enough points to add a comment, so answering here.
None of the source tables or data files need to exist on the log mining server. Only redo/archive logs are shipped/transported from the source DB to the log mining server.
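For reference, redo transport from the source to the downstream mining database is configured with archive destination parameters rather than by copying any table data. A minimal sketch, assuming a downstream database reachable via the placeholder TNS alias DOWNMINE and a placeholder directory /u01/foreign_archive; a complete setup also needs LOG_ARCHIVE_CONFIG and, for real-time capture, standby redo logs on the downstream database:

    -- On the source database: ship redo to the downstream mining database
    ALTER SYSTEM SET LOG_ARCHIVE_DEST_2='SERVICE=DOWNMINE ASYNC NOREGISTER VALID_FOR=(ONLINE_LOGFILE,PRIMARY_ROLE)' SCOPE=BOTH;
    ALTER SYSTEM SET LOG_ARCHIVE_DEST_STATE_2=ENABLE SCOPE=BOTH;

    -- On the downstream mining database: a local destination for the foreign logs it receives
    ALTER SYSTEM SET LOG_ARCHIVE_DEST_2='LOCATION=/u01/foreign_archive VALID_FOR=(STANDBY_LOGFILE,PRIMARY_ROLE)' SCOPE=BOTH;

So sizing on the downstream side is driven mainly by the volume of redo/archive logs you retain there, not by the 15 TB of source tables.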

Related

Kafka Connector for Oracle Database Source

I want to build a Kafka Connector in order to retrieve records from a database at near real time. My database is the Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 and the tables have millions of records. First of all, I would like to add the minimum load to my database using CDC. Secondly, I would like to retrieve records based on a LastUpdate field which has value after a certain date.
Searching the Confluent site, the only open-source connector I found was the "Kafka Connect JDBC" connector. I think this connector doesn't have a CDC mechanism, and it isn't possible to retrieve millions of records when the connector starts for the first time. The alternative solution I thought of is Debezium, but there is no Debezium Oracle connector on the Confluent site and I believe it is still in beta.
Which solution would you suggest? Is something wrong with my assumptions about Kafka Connect JDBC or the Debezium connector? Is there any other solution?
For query-based CDC, which is less efficient, you can use the JDBC source connector.
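To illustrate what query-based CDC means in practice, in timestamp mode the JDBC source connector effectively polls with a query along these lines (a sketch using the LastUpdate column from the question; the table name and the bind value, i.e. the highest timestamp seen in the previous poll, are placeholders):

    -- Hypothetical polling query issued on each cycle of a timestamp-based JDBC source connector
    SELECT *
      FROM my_table
     WHERE LastUpdate > :last_seen_timestamp
     ORDER BY LastUpdate;

Because it polls, it adds query load to the database and cannot see deletes, which is why the log-based options below are usually preferred.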
For log-based CDC I am aware of a couple of options; however, some of them require a license:
1) Attunity Replicate that allows users to use a graphical interface to create real-time data pipelines from producer systems into Apache Kafka, without having to do any manual coding or scripting. I have been using Attunity Replicate for Oracle -> Kafka for a couple of years and was very satisfied.
2) Oracle GoldenGate, which requires a license.
3) Oracle LogMiner, which does not require any license and is used by both Attunity and kafka-connect-oracle, a Kafka source connector for capturing all row-based DML changes from an Oracle database and streaming these changes to Kafka. Its change data capture logic is based on the Oracle LogMiner solution (see the sketch after this list).
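To give a feel for what log-based CDC with LogMiner looks like under the hood, here is a rough SQL/PL/SQL sketch (the log file name, schema name, and log selection are placeholders, and the dictionary handling is simplified):

    -- Register an archived log and start LogMiner using the online catalog
    BEGIN
      DBMS_LOGMNR.ADD_LOGFILE(LOGFILENAME => '/u01/arch/1_123_456.arc', OPTIONS => DBMS_LOGMNR.NEW);
      DBMS_LOGMNR.START_LOGMNR(OPTIONS => DBMS_LOGMNR.DICT_FROM_ONLINE_CATALOG);
    END;
    /

    -- Row-level DML changes, including the SQL needed to replay them
    SELECT scn, timestamp, operation, seg_owner, table_name, sql_redo
      FROM v$logmnr_contents
     WHERE seg_owner = 'MYSCHEMA'
       AND operation IN ('INSERT', 'UPDATE', 'DELETE');

    -- Stop the session when done
    BEGIN
      DBMS_LOGMNR.END_LOGMNR;
    END;
    /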
We have numerous customers using IBM's IIDR (InfoSphere Data Replication) product to replicate data from Oracle databases (as well as z mainframe, IBM i, SQL Server, etc.) into Kafka.
Regardless of which source is used, data can be normalized into one of many formats in Kafka. An example of an included, selectable format is...
https://www.ibm.com/support/knowledgecenter/en/SSTRGZ_11.4.0/com.ibm.cdcdoc.cdckafka.doc/tasks/kcopauditavrosinglerow.html
The solution is highly scalable and has been measured replicating changes at rates in the hundreds of thousands of rows per second.
We also have a proprietary ability to reconstitute data written in parallel to Kafka back into its original source order. So, despite data having been written to numerous partitions and topics, the original total order can be known. This functionality is known as the TCC (transactionally consistent consumer).
See the video and slides here...
https://kafka-summit.org/sessions/exactly-once-replication-database-kafka-cloud/

cosmosdb - archive data older than n years into cold storage

I researched several places and could not find any direction on what options there are to archive old data from Cosmos DB into cold storage. I see that for DynamoDB in AWS it is mentioned that you can move DynamoDB data into S3, but I am not sure what the options are for Cosmos DB. I understand there is a time-to-live option where the data will be deleted after a certain time, but I am interested in archiving rather than deleting. Any direction would be greatly appreciated. Thanks
I don't think there is a single-click built-in feature in CosmosDB to achieve that.
Still, since you mentioned you'd appreciate any direction, I suggest you consider the DocumentDB Data Migration Tool.
Notes about Data Migration Tool:
you can specify a query to extract only the cold data (for example, by a creation date stored within the documents) - see the example query after these notes.
it supports exporting to various targets (JSON file, blob storage, DB, another Cosmos DB collection, etc.).
it compacts the data in the process - it can merge documents into a single array document and zip it.
once you have the configuration set up, you can script the tool to be triggered automatically using your favorite scheduling tool.
you can easily reverse the source and target to restore the cold data to the active store (or to dev, test, backup, etc.).
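For the "query to extract only the cold data" note above, here is a sketch of the kind of Cosmos DB SQL query you could plug into the tool, assuming each document carries a hypothetical creationDate property stored as an ISO-8601 string:

    SELECT * FROM c WHERE c.creationDate < '2017-01-01T00:00:00Z'

The same cut-off value can then be reused by the deletion step described below, so that only documents that were actually exported are removed.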
To remove the exported data you could use the mentioned TTL feature, but that could cause data loss should your export step fail. I would suggest writing and executing a stored procedure to query and delete all exported documents with a single call. That SP would not execute automatically but could be included in the automation script and executed only if the data was exported successfully first.
See: Azure Cosmos DB server-side programming: Stored procedures, database triggers, and UDFs.
UPDATE:
These days Cosmos DB has added the Change Feed, which really simplifies writing a carbon copy somewhere else.

How to validate data in Teradata from Oracle

My source data is in Oracle and the target data is in Teradata. Can you please suggest an easy and quick way to validate the data? There are 900 tables. If possible, can you provide syntax too?
There is a product available known as the Teradata Gateway that works with Oracle and allows you to access Teradata in a "heterogeneous" manner. This may not be the most effective way to compare the data.
Ultimately, your requirements sound more process-driven; to do this effectively, the source data would need to be compared/validated against stage tables in the Teradata environment after your ETL/ELT process has completed.
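If the source data is staged in Teradata (e.g. each Oracle table loaded into a STG_ copy alongside its target), a per-table validation can be scripted with row counts plus a set difference; a sketch with placeholder table names:

    -- Row-count comparison between the staged source data and the target table
    SELECT 'stg_customer' AS tbl, COUNT(*) AS row_count FROM stg_customer
    UNION ALL
    SELECT 'customer', COUNT(*) FROM customer;

    -- Rows present in the staged copy but missing or different in the target
    SELECT * FROM stg_customer
    MINUS
    SELECT * FROM customer;

Generating these statements for all 900 tables from the DBC dictionary views (e.g. DBC.TablesV) or from a hand-maintained list keeps the effort manageable.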

Clarification regarding journal_size_limit in SQLite

If I set journal_size_limit = 67110000 (64 MiB) will I be able to:
work with / commit transactions over that value (somewhat unlikely)?
successfully perform a VACUUM (even if the database is 3 GiB or more)?
The VACUUM command works by copying the contents of the database into a temporary database file and then overwriting the original with the contents of the temporary file. When overwriting the original, a rollback journal or write-ahead log (WAL) file is used just as it would be for any other database transaction. This means that when VACUUMing a database, as much as twice the size of the original database file is required in free disk space.
It's not entirely clear in the documentation, and I would appreciate if someone could tell me for sure.
The journal_size_limit is not an upper limit on the transaction journal; it is an upper limit for an inactive transaction journal.
After a transaction has finished, the journal is not needed, but not deleting the journal can make things faster because the file system does not need to free this data and then reallocate it for the next transaction.
The purpose of this setting is to limit the size of unused journal data.
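A small sketch of how this plays out in practice (the limit is the value from the question):

    -- Per-connection setting; SQLite returns the new limit in bytes
    PRAGMA journal_size_limit = 67110000;

    -- A transaction or VACUUM larger than the limit still succeeds: the journal/WAL
    -- grows as needed while it is active, and is only truncated back down to the
    -- limit once it becomes inactive.
    VACUUM;

So both points in the question hold: large transactions and VACUUM are not capped by the setting; it only controls how much journal space is left behind afterwards.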

some basic oracle concepts

Hi:
In our new application we have to use Oracle as the DB, and we used MySQL/SQL Server before. When I come to Oracle I am confused by its concepts, for example: tablespace, object, schema, table, index, procedure, database link, ... :(
And the schema seems to be closely tied to the user, which I cannot make sense of.
When we used MySQL, I just knew that one database contains many tables and has many users, and each user has different permissions on different tables.
But in Oracle, everything is different.
Can anyone tell me some basic concepts of Oracle, and point me to some quick-start docs?
Oracle has specific meanings for commonly-used terms, and you're right, it is confusing. I'll build a hierarchy of terms from the bottom up:
Database - In Oracle, the database is the collection of files that make up your overall collection of data. To get a handle on what Oracle means, picture the database management system (dbms) in a non-running state. All those files are your "database."
Instance - When you start the Oracle software, all those files become active, things get loaded into memory, and there's an entity to which you can connect. Many people would use the term "database" to describe a running dbms, but, once everything is up-and-running, Oracle calls it an, "instance."
Tablespace - An abstraction that allows you to think about a chunk of storage without worrying about the physical details. When you create a user, you ask Oracle to put that user's data in a specific tablespace. Oracle manages storage via the tablespace metaphor.
Data file - The physical files that actually store the data. Data files are grouped into tablespaces. If you use all the storage you have allocated to a user, or group of users, you add data files (or make the existing files bigger) to the tablespace they're configured to use.
User - An abstraction that encapsulates the privileges, authentication information, and default storage areas for an account that can log on to an Oracle instance.
Schema - The tables, indices, constraints, triggers, etc. that are owned by a particular user. There is a one-to-one correspondence between users and schemas. The schema has the same name as the user. The difference between the two is that the user concept is all about account information, while the schema concept deals with logical database objects.
This is a very simplified list of terms. There are different states of "running" for an Oracle instance, for example, and it's easy to get into very nuanced discussions of what things mean. Here's a practical exercise that will let you put your hands on these things and will make the distinctions clearer (a SQL sketch of the steps follows the list):
Start an already-created Oracle instance. This step will transform a group of files, or as Oracle would say, a database, into a running Oracle instance.
Create a tablespace with the CREATE TABLESPACE command. You'll have to specify some data files to put into the tablespace, as well as some storage parameters.
Create a user with the CREATE USER command. You'll see that the items you have to specify have to do with passwords, privileges, quotas, and the like. Specify that the user's data be stored in the tablespace you created in step 2.
Connect to the instance using the credentials of the new user from step 3. Type "SELECT * FROM CAT". Nothing should come back yet. Your user has a schema, but it's empty.
Run a CREATE TABLE command. INSERT some data into the table. The schema now contains some objects.
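A minimal SQL sketch of the exercise above (the file path, names, password, and sizes are placeholders, and the exact options vary by version and configuration):

    -- Step 2: a tablespace backed by one data file
    CREATE TABLESPACE app_data
      DATAFILE '/u01/oradata/ORCL/app_data01.dbf' SIZE 100M AUTOEXTEND ON;

    -- Step 3: a user whose objects go to that tablespace by default
    CREATE USER app_user IDENTIFIED BY "change_me"
      DEFAULT TABLESPACE app_data
      QUOTA UNLIMITED ON app_data;
    GRANT CREATE SESSION, CREATE TABLE TO app_user;

    -- Steps 4 and 5: connect as APP_USER, then
    SELECT * FROM cat;                    -- the schema exists but is empty

    CREATE TABLE t (id NUMBER PRIMARY KEY, txt VARCHAR2(50));
    INSERT INTO t VALUES (1, 'hello');
    COMMIT;

    SELECT * FROM cat;                    -- now lists the table T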
tablespaces: these are basically storage definitions. When defining a table or index, etc., you can specify storage options simply by putting your table in a specific tablespace.
table, index, procedure: these are pretty much the same as in other databases
user, schema: explained well before
database link: you can join table A in instance A and table B in instance B using a database link between the two instances, while logged in to one of them (see the example after this list)
object: has properties (like the columns in a table) and methods that operate on those properties (pretty much like in OO design); these are not widely used
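A hedged sketch of the database link item above (the link name, TNS alias, and credentials are placeholders):

    -- Run while connected to instance A; 'REMOTE_B' is a TNS alias that points at instance B
    CREATE DATABASE LINK b_link
      CONNECT TO remote_user IDENTIFIED BY remote_pwd
      USING 'REMOTE_B';

    -- Join a local table with a table that lives in instance B
    SELECT a.id, b.descr
      FROM table_a a
      JOIN table_b@b_link b ON b.id = a.id;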
A few links:
Start page for 11g rel 2 docs http://www.oracle.com/pls/db112/homepage
Database concepts, Table of contents http://download.oracle.com/docs/cd/E11882_01/server.112/e16508/toc.htm
