Update a column in a table from another table with 6 billion rows in Oracle 11g - oracle11g

I have two tables, Table A and Table B. Both tables are about 500 GB in size. Some of their columns are as below.
Table A
ID
Type
DateModified
I added a new column CID to Table A; its values are available in Table B.
Table B
ID
CID
DateGenerated
Table A is partitioned on DateModified; Table B is not partitioned. My task is to get the CID from Table B and update it in Table A. Both tables have billions of records.
I have tried MERGE and plain SQL, but it is too slow and cannot be completed within 2 days.

Adding a new column to an existing table causes row fragmentation. Updating the new column to some value will probably cause massive row chaining, partitioned or not. And yes, that is slow, even when there are sufficient indexes etc.
Recommended approach:
You are on Enterprise Edition since you have partitioning, so you might be able to solve this using the schema versions functionality.
But if this is a one-time action and you do not know that feature well, I would use a "create table ... as" approach: build the table from scratch and then switch it in when ready. Take care not to miss any trickle-loaded transactions. With partitioning it will be fast (writing 500 GB at, say, 50 MB/s on a strong server is not unrealistic, taking about 3 hours).
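A minimal sketch of that approach, assuming Table A is range-partitioned on DateModified and the tables join on ID (table names, partition boundaries, and the parallel degree are all illustrative):

-- Rebuild Table A in one pass, picking up CID from Table B as we go.
CREATE TABLE a_new
  NOLOGGING
  PARTITION BY RANGE (datemodified)
  ( PARTITION p2011 VALUES LESS THAN (DATE '2012-01-01'),
    PARTITION pmax  VALUES LESS THAN (MAXVALUE) )
  PARALLEL 8
AS
SELECT /*+ PARALLEL(a 8) PARALLEL(b 8) USE_HASH(a b) */
       a.id, a.type, a.datemodified, b.cid
  FROM table_a a
  LEFT JOIN table_b b ON b.id = a.id;

-- After verifying the copy (and catching up any trickle-loaded rows),
-- recreate indexes, constraints, and grants on A_NEW, then swap the names:
-- RENAME table_a TO table_a_old;
-- RENAME a_new TO table_a;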

Related

Change a large dynamodb table to use LSIs instead of GSIs

I have a live table in dynamo with about 28 million records in it.
The table has a number of GSIs that I'd like to change to LSIs; however, LSIs can only be created when the table is created.
I need to create a new table and migrate the data with minimum downtime. I was thinking I'd do the following:
Create the new table with the correct indexes.
Update the code to write records to the old and new table. When this starts, take a note of the timestamp for the first record.
Write a simple process to sync existing data for anything with a create date prior to my first date.
I'd have to add a lock field to the new table to prevent race conditions when an existing record is updated.
When it's all synced we'd swap to using the new table.
I think that will work, but it's fairly complicated and feels prone to error. Has anyone found a better way to do this?
Here is an approach:
(Let's refer to the table with GSIs as oldTable and the new table with LSIs as newTable).
Create newTable with the required LSIs.
Create a DynamoDB trigger on the oldTable so that every new record written to the oldTable is also inserted into the newTable. (This logic needs to live in an AWS Lambda function.)
Make your application point to the newTable.
Migrate all the records from oldTable to newTable.

Oracle 11g express - Choose the position where a column is inserted?

Let's say I already have 3 columns A, B, C in my table Tb. I want to add a new column M between B and C. How can I do this?
After adding M, my table should look like A B M C, and NOT A B C M.
The simple answer is that you can't. Columns are always added at the end. However, you shouldn't care about the order of columns in a table since you should always be explicitly listing columns in your queries and in your DML. And you should always have an interface layer (a view, for example) where, if order is important, you can add the new column in the appropriate place.
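For example, a view (the name is illustrative) lets you present the columns in whatever order callers expect, regardless of the physical order in the table:

CREATE OR REPLACE VIEW tb_v AS
SELECT a, b, m, c
  FROM tb;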
If you are really determined, you can create a new table with the new column order, move the data to the new table, drop the old table, and rename the new table. You'll need to recreate any indexes, constraints, or triggers on the table. Something like
-- Add the new column (it is appended at the end of TB for now).
ALTER TABLE tb
  ADD( M NUMBER );

-- Rebuild the table with the columns in the desired order.
CREATE TABLE tb_new
AS
SELECT a, b, m, c
  FROM tb;

-- Swap the tables; remember to recreate indexes, constraints,
-- grants, and triggers on the new TB afterwards.
DROP TABLE tb;

ALTER TABLE tb_new
  RENAME TO tb;
I'm not sure whether it's an option in the express edition (I tend to doubt it is, but I don't have an XE database handy to verify), but you could also potentially use the DBMS_REDEFINITION package, as Barbara shows in that example. Behind the scenes, Oracle is doing basically the same thing as above, but with some added materialized view logs to allow applications to continue to access the table during the operation.
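A rough sketch of the DBMS_REDEFINITION route, assuming a schema named MY_SCHEMA and illustrative column types (the interim table must already exist with the desired column order):

-- Interim table with the columns in the order you want.
CREATE TABLE tb_interim (a NUMBER, b NUMBER, m NUMBER, c NUMBER);

DECLARE
  l_errors PLS_INTEGER;
BEGIN
  -- Verify the table can be redefined (by default this requires a primary key on TB).
  DBMS_REDEFINITION.CAN_REDEF_TABLE('MY_SCHEMA', 'TB');
  -- Start copying data into the interim table; columns are mapped by name.
  DBMS_REDEFINITION.START_REDEF_TABLE('MY_SCHEMA', 'TB', 'TB_INTERIM');
  -- Clone indexes, constraints, triggers, and grants onto the interim table.
  DBMS_REDEFINITION.COPY_TABLE_DEPENDENTS('MY_SCHEMA', 'TB', 'TB_INTERIM',
                                          num_errors => l_errors);
  -- Swap the two tables.
  DBMS_REDEFINITION.FINISH_REDEF_TABLE('MY_SCHEMA', 'TB', 'TB_INTERIM');
END;
/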
If you find yourself caring about the order of columns in a table, though, you're much better off stopping to figure out what you've done wrong rather than continuing to move forward on either path. It should be exceptionally, exceptionally rare that you would care about the physical order of columns in a table.

Seeking advice on how to structure the SQL Server 2008 DB table with a large amount of data?

I am planning a web application (programmed using ASP.NET) that manages the database of logged events. The database will be managed in an SQL Server 2008. Each event may come from a set of, let's call them, "units." A user will be able to add and remove these "units" via the ASP.NET interface.
Each of the "units" can potentially log up to a million entries, or maybe even more. (The cut off will be administered via a date. For instance:
DELETE FROM [tbl] WHERE [date] < '01-01-2011'
The question I have is: what is the best way to structure such a database?
By placing all entries for all "units" in a single table like this:
CREATE TABLE tblLogCommon (id INT PRIMARY KEY,
idUnit INT,
dtIn DATETIME2, dtOut DATETIME2, etc INT)
Or, by separating tables for each "unit":
CREATE TABLE tblLogUnit_1 (id INT PRIMARY KEY, dtIn DATETIME2, dtOut DATETIME2, etc INT)
CREATE TABLE tblLogUnit_2 (id INT PRIMARY KEY, dtIn DATETIME2, dtOut DATETIME2, etc INT)
CREATE TABLE tblLogUnit_3 (id INT PRIMARY KEY, dtIn DATETIME2, dtOut DATETIME2, etc INT)
--and so on
CREATE TABLE tblLogUnit_N (id INT PRIMARY KEY, dtIn DATETIME2, dtOut DATETIME2, etc INT)
Approach #1 seems simpler from the standpoint of referencing entries, because with approach #2 I'll have to deal with a variable number N of tables (as I said, users will be allowed to add and remove "units").
But approach #1 may make later access to those log entries very inefficient. I will have to generate reports from those logs via the ASP.NET interface.
So I'd like to hear your take on this before I begin coding?
EDIT: I didn't realize that the number of columns in a table makes a difference. My bad! The actual number of columns in a table is 16.
I would go with approach #1, as the table does not seem very large (width-wise), and you could apply indexes to improve searching/selecting.
Further to this, you could also look at partitioned tables and indexes.
Creating Partitioned Tables and Indexes
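A rough sketch of a partitioned variant of tblLogCommon (boundary dates and filegroup placement are illustrative; in SQL Server 2008, table partitioning requires Enterprise Edition):

-- Partition the log table by month on dtIn; purging old data then becomes
-- a metadata operation (SWITCH/MERGE) rather than a large DELETE.
CREATE PARTITION FUNCTION pfLogByMonth (DATETIME2)
    AS RANGE RIGHT FOR VALUES ('2011-01-01', '2011-02-01', '2011-03-01');

CREATE PARTITION SCHEME psLogByMonth
    AS PARTITION pfLogByMonth ALL TO ([PRIMARY]);

CREATE TABLE tblLogCommon (
    id     INT IDENTITY(1,1) NOT NULL,
    idUnit INT NOT NULL,
    dtIn   DATETIME2 NOT NULL,
    dtOut  DATETIME2 NULL,
    etc    INT NULL,
    CONSTRAINT PK_tblLogCommon PRIMARY KEY CLUSTERED (dtIn, id)
) ON psLogByMonth (dtIn);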
Splitting into separate tables is going to yield better insert and search speed.
With one table, the difference is an index on idUnit. With that index, search speed is going to be nearly as fast as with separate tables (and you can search across idUnits in a single query). Where one table is going to take a hit is inserts, but that is a small hit.
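If you stay with one table, that index might look something like this (the index name and included columns are illustrative):

-- Lets per-unit and per-date-range queries seek instead of scanning the whole table.
CREATE NONCLUSTERED INDEX IX_tblLogCommon_idUnit_dtIn
    ON tblLogCommon (idUnit, dtIn)
    INCLUDE (dtOut, etc);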
A lot depends on how you intend to use this data. If you split the data into multiple tables, will you be querying over multiple tables, or will all your queries be within the defined date range? How often will data be inserted and updated?
In other words, there's no correct answer!
Also, can you afford a license for SQL enterprise in order to use partitioned tables?
I did some tests on the actual data with SQL Server 2008 Express, using a local connection with no network latency. The test machine: desktop, Windows 7 Ultimate 64-bit; CPU: i7, 2.8 GHz, 4 cores; RAM: 8 GB; HDD (OS): 1 TB, 260 GB free.
First, all records were placed in a single table (approach #1). All records were generated with random data. A complex SELECT statement processing each particular unitID was run two times (one immediately after the other), with CPU load 12% to 16% and RAM load 53% to 62%. Here's the outcome:
UnitID NumRecords Complex_SELECT_Timing
1 486,810 1m:26s / 1m:13s
3 1,538,800 1m:13s / 0m:51s
4 497,860 0m:30s / 0m:24s
5 497,860 1m:20s / 0m:50s
Then the same records were separated into four tables with identical structure (approach #2). I then ran the same SELECT statement two times as before, on the same PC, with identical CPU and RAM loads. Next are the results:
Table NumRecords Complex_SELECT_Timing
t1 486,810 0m:19s / 0m:12s
t3 1,538,800 0m:42s / 0m:38s
t4 497,860 0m:03s / 0m:01s
t5 497,860 0m:15s / 0m:12s
I thought I'd share this with whoever is interested. This pretty much gives you the answer...
Thanks everyone who contributed!

Understanding the ORA_ROWSCN behavior in Oracle

So this is essentially a follow-up question on Finding duplicate records.
We perform data imports from text files every day, and we ended up importing 10163 records spread across 182 files twice. On running the query mentioned above to find duplicates, the total count of records we got is 10174, which is 11 records more than the files contain. I considered the possibility that two records that are exactly the same, and both valid, were being counted by the query as well. So I thought it would be best to use a timestamp field and simply find all the records that were inserted today (and hence ended up adding duplicate rows). I used ORA_ROWSCN in the following query:
select count(*)
  from my_table
 where TRUNC(SCN_TO_TIMESTAMP(ORA_ROWSCN)) = DATE '2012-03-01';
However, the count is still higher, i.e. 10168. Now, I am pretty sure that the total number of lines across the files is 10163, from running wc -l *.txt in the folder that contains all the files.
Is it possible to find out which rows are actually inserted twice?
By default, ORA_ROWSCN is stored at the block level, not at the row level. It is only stored at the row level if the table was originally built with ROWDEPENDENCIES enabled. Assuming that you can fit many rows of your table in a single block and that you're not using the APPEND hint to insert the new data above the existing high water mark of the table, you are likely inserting new data into blocks that already have some existing data in them. By default, that is going to change the ORA_ROWSCN of every row in the block causing your query to count more rows than were actually inserted.
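For what it's worth, row-level SCN tracking has to be requested when the table is created; a minimal illustration (the column definitions are made up):

-- ROWDEPENDENCIES stores a per-row SCN, at the cost of a few extra bytes per row.
CREATE TABLE my_table_rd (
  id      NUMBER PRIMARY KEY,
  payload VARCHAR2(100)
) ROWDEPENDENCIES;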
Since ORA_ROWSCN is only guaranteed to be an upper-bound on the last time there was DML on a row, it would be much more common to determine how many rows were inserted today by adding a CREATE_DATE column to the table that defaults to SYSDATE or to rely on SQL%ROWCOUNT after your INSERT ran (assuming, of course, that you are using a single INSERT statement to insert all the rows).
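A minimal sketch of the CREATE_DATE approach against the table from the question (the column name is illustrative):

-- Record the insert time of each row from now on.
ALTER TABLE my_table
  ADD (create_date DATE DEFAULT SYSDATE NOT NULL);

-- Rows loaded today can then be counted directly:
SELECT COUNT(*)
  FROM my_table
 WHERE create_date >= TRUNC(SYSDATE);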
Generally, using ORA_ROWSCN and the SCN_TO_TIMESTAMP function is going to be a problematic way to identify when a row was inserted, even if the table is built with ROWDEPENDENCIES. ORA_ROWSCN returns an Oracle SCN, which is a System Change Number. This is a unique identifier for a particular change (i.e. a transaction). As such, there is no direct link between an SCN and a time: my database might be generating SCNs a million times more quickly than yours, and my SCN 1 may be years apart from your SCN 1. The Oracle background process SMON maintains a table that maps SCN values to approximate timestamps, but it only maintains that data for a limited period of time; otherwise, your database would end up with a multi-billion row table that was just storing SCN-to-timestamp mappings. If the row was inserted more than, say, a week ago (the exact limit depends on the database and database version), SCN_TO_TIMESTAMP won't be able to convert the SCN to a timestamp and will return an error.

Easy Way to split up a large Table in MS Access

I have a table in an MS Access 2010 database that can easily be split up into multiple tables. However, I don't know how to do that and still keep all the data linked together. Does anyone know an easy way to do this?
I ended up just writing a bunch of Update and Append queries to create smaller tables and keep all the data synced.
You would have to migrate to another database system, like MS SQL Server or MySQL; you can't do replication in MS Access...
Not sure what you mean by "split up into multiple tables".
Do the two tables have the same structure? Do you want to divide the table into two parts, meaning if the original table has fields A, B, C, D, then you want to split it into Table1: A, B and
Table2: C, D?
Anyway, I googled it a bit and the links below might be what you are looking for. Check them.
Split a table into related tables (MDB)
How hard is it to split a table in Access into two smaller tables?
Where do you run into trouble with the table analyzer wizard? Maybe you can work around the issue you are running into.
However, if the table analyzer wizard isn't working out, you might also consider the tactics described in http://office.microsoft.com/en-us/access-help/resolve-and-help-prevent-duplicate-data-HA010341696.aspx.
In Microsoft Access, under Database Tools > Analyze Table, I use the wizard to split a large table into multiple normalized tables. Hope that helps.
Hmmm, can't you just make a copy of the table, then delete the opposite fields from each copy, leaving the data split the way you want? Just make sure that both tables keep the exact same AutoNumber field, and use that field to reference the other.
It may not be the most efficient way to do it, but I solved a similar issue the following way:
a) Procedure that creates a new table via SQL:
CREATE TABLE t002 (ID002 INTEGER PRIMARY KEY, CONSTRAINT SomeName FOREIGN KEY (ID002) REFERENCES t001(ID001));
The two tables are related to each other through the foreign key.
b) Procedure that adds the necessary fields to the new table (t002). In the following sample code, let's use just one field and call it [MyFieldName].
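For this step, something like the following Access DDL could work (the TEXT(255) type is just an assumption for the example field):

ALTER TABLE t002 ADD COLUMN MyFieldName TEXT(255);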
c) Procedure to append all values of field ID001 from Table t001 to field ID002 in Table t002, via SQL:
INSERT INTO t002 (ID002) SELECT t001.ID001 FROM t001;
d) Procedure to transfer values from the fields in t001 to the corresponding fields in t002, via SQL:
UPDATE t001 INNER JOIN t002 ON t001.ID001 = t002.ID002 SET t002.MyFieldName = t001.MyFieldName;
e) Procedure to remove (drop) the fields in question in Table t001, via SQL:
ALTER TABLE t001 DROP COLUMN MyFieldName;
f) Procedure that calls them all one after the other. Fieldnames are fed into the process as parameters in the call to Procedure f.
It is quite a bunch of coding, but it did the job for me.
