SQL Server Data Archiving - asp.net

I have a SQL Azure database on which I need to perform some data archiving operation.
The plan is to move all the irrelevant data from the actual tables into Archive_* tables.
I have tables which have up to 8-9 million records.
One option is to write a stored procedure that inserts the data into the new Archive_* tables and also deletes it from the actual tables.
But this operation is really time consuming, running for more than 3 hours.
I am in a situation where I can't have more than an hour's downtime.
How can I make this archiving faster?

You can use Azure Automation to schedule execution of a stored procedure every day at the same time, during the maintenance window, where this stored procedure archives only the oldest week or month of data each time it runs. The stored procedure should archive only data older than X weeks/months/years. Please read this article to create the runbook. In a few days you will have all the old data archived, and the runbook will continue to do the job from then on.
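A minimal sketch of what such a procedure could look like (the Orders/Archive_Orders table names, the CreatedDate column and the retention period are assumptions for illustration; adapt them to your schema):

CREATE PROCEDURE dbo.ArchiveOldestWeek
    @RetentionMonths int = 12
AS
BEGIN
    SET NOCOUNT ON;

    -- Anything older than this date is eligible for archiving
    DECLARE @Cutoff datetime2 = DATEADD(MONTH, -@RetentionMonths, SYSUTCDATETIME());

    -- Oldest remaining week of archivable data; nothing to do if the table is already clean
    DECLARE @SliceStart datetime2 = (SELECT MIN(CreatedDate) FROM dbo.Orders WHERE CreatedDate < @Cutoff);
    IF @SliceStart IS NULL RETURN;
    DECLARE @SliceEnd datetime2 = DATEADD(DAY, 7, @SliceStart);

    BEGIN TRANSACTION;

    -- Copy one week's slice into the archive table
    INSERT INTO dbo.Archive_Orders (OrderId, CustomerId, CreatedDate)
    SELECT OrderId, CustomerId, CreatedDate
    FROM dbo.Orders
    WHERE CreatedDate < @SliceEnd AND CreatedDate < @Cutoff;

    -- Remove the archived slice from the source table
    DELETE FROM dbo.Orders
    WHERE CreatedDate < @SliceEnd AND CreatedDate < @Cutoff;

    COMMIT TRANSACTION;
END

The Azure Automation runbook then only needs to call this procedure once per maintenance window.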

You can't make it faster, but you can make it seamless. The first option is to have a separate task that moves data in portions from the source to the archive tables. To prevent table lock escalation and overall performance degradation, I would suggest limiting the size of a single transaction: e.g. start a transaction, insert N records into the archive table, delete those records from the source table, commit the transaction. Continue for a few days until all the necessary data is transferred. The advantage of this approach is that if there is some kind of failure, you can restart the archival process and it will continue from the point of failure.
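A rough T-SQL sketch of such a batch loop, here implemented as a single DELETE ... OUTPUT INTO per batch so the copy and the delete stay atomic (the table names, columns and cutoff date are assumptions):

DECLARE @BatchSize int = 10000;
DECLARE @Rows int = 1;

WHILE @Rows > 0
BEGIN
    BEGIN TRANSACTION;

    -- Move one small batch; locks are held only for the duration of this batch
    DELETE TOP (@BatchSize)
    FROM dbo.Orders
    OUTPUT deleted.OrderId, deleted.CustomerId, deleted.CreatedDate
        INTO dbo.Archive_Orders (OrderId, CustomerId, CreatedDate)
    WHERE CreatedDate < '20200101';   -- whatever marks a row as archivable

    SET @Rows = @@ROWCOUNT;

    COMMIT TRANSACTION;
END

If the process is interrupted, rerunning the loop simply continues with whatever archivable rows are still left in the source table.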
The second option, which does not exclude the first one, really depends on how critical the performance of the source tables is for you and how many updates are happening to them. If that is not a problem, you can write triggers that copy every inserted/updated record into an archive table. Then, when you want to clean up, all you need to do is delete the obsolete records from the source tables; their copies will already be in the archive tables.
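For example, a trigger along these lines (again, the names are illustrative; it assumes the archive table tolerates multiple versions of a row, otherwise replace the insert with a MERGE/upsert):

CREATE TRIGGER dbo.trg_Orders_Archive
ON dbo.Orders
AFTER INSERT, UPDATE
AS
BEGIN
    SET NOCOUNT ON;

    -- Copy every inserted or updated row into the archive table,
    -- so the later cleanup only has to delete from the source table
    INSERT INTO dbo.Archive_Orders (OrderId, CustomerId, CreatedDate)
    SELECT OrderId, CustomerId, CreatedDate
    FROM inserted;
END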
In both cases you will not need any downtime.

Related

Inserting Result of Time-Consuming Stored Procedures to Tables

I have a website that runs a stored procedure when you open the home page. That stored procedure processes data from 4 related tables and returns a result. Since the number of DB records has increased, the stored procedure can take more than 10 seconds to complete, which is too much for a home page.
So I think inserting the result of the stored procedure into a new table regularly and using that table for the home page could be a good way to solve the problem, but I am not sure whether it is good practice for SQL Server.
Is there any better solution for my case?
Edit: Those 4 tables are updated every 15 minutes with about 30 inserts.
If you are willing to have a "designated victim" update the cache as needed (which may also cause other users to wait) you can do something like this in a stored procedure (SP):
Start a transaction to block access to the cache.
Check the date/time of the cache entries. (This requires either adding a CacheUpdated column to the cache table or storing the value elsewhere.)
If the cached data is sufficiently recent then return the data and end the transaction.
Delete the cached data and run a new query to refill it with an appropriate CacheUpdated date/time.
Return the cached data and end the transaction.
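A T-SQL sketch of that procedure (the HomePageCache table, its columns and the 15-minute freshness window are assumptions based on the question; a placeholder join stands in for the expensive 4-table query):

CREATE PROCEDURE dbo.GetHomePageData
AS
BEGIN
    SET NOCOUNT ON;

    BEGIN TRANSACTION;

    -- Exclusive lock on the cache table: the "designated victim" makes other callers wait
    IF EXISTS (SELECT 1 FROM dbo.HomePageCache WITH (TABLOCKX, HOLDLOCK)
               WHERE CacheUpdated >= DATEADD(MINUTE, -15, SYSUTCDATETIME()))
    BEGIN
        SELECT Col1, Col2 FROM dbo.HomePageCache;  -- cache is fresh enough, just return it
        COMMIT TRANSACTION;
        RETURN;
    END

    -- Cache is stale: rebuild it inside the same transaction
    DELETE FROM dbo.HomePageCache;

    INSERT INTO dbo.HomePageCache (Col1, Col2, CacheUpdated)
    SELECT a.Col1, b.Col2, SYSUTCDATETIME()
    FROM dbo.Table1 AS a
    JOIN dbo.Table2 AS b ON b.Table1Id = a.Table1Id;  -- the slow 4-table query goes here

    SELECT Col1, Col2 FROM dbo.HomePageCache;

    COMMIT TRANSACTION;
END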
If the update time becomes too long for users to wait, or the cache rebuild blocks too many users, you can run a stored procedure at a scheduled interval by creating a job in SQL Server Agent. The SP would:
Save the current date/time, e.g. in a variable @Now.
Run the query to update the cache, marking each row with CacheUpdated = @Now.
Delete any cache rows where CacheUpdated <> @Now.
The corresponding SP for users would simply return the oldest set of data, i.e. Min( CacheUpdated ) rows. If there is only one set, that's what they get. If an update is in progress then they'll get the older complete set, not the work in progress.
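A sketch of both halves, reusing the hypothetical HomePageCache schema from above:

-- Refresh procedure, run on a schedule (e.g. by a SQL Server Agent job)
CREATE PROCEDURE dbo.RefreshHomePageCache
AS
BEGIN
    SET NOCOUNT ON;
    DECLARE @Now datetime2 = SYSUTCDATETIME();

    -- Build the new generation, tagging every row with the same timestamp
    INSERT INTO dbo.HomePageCache (Col1, Col2, CacheUpdated)
    SELECT a.Col1, b.Col2, @Now
    FROM dbo.Table1 AS a
    JOIN dbo.Table2 AS b ON b.Table1Id = a.Table1Id;  -- the slow query goes here

    -- Remove every older generation once the new one is complete
    DELETE FROM dbo.HomePageCache WHERE CacheUpdated <> @Now;
END
GO

-- Reader procedure: always return the oldest complete generation
CREATE PROCEDURE dbo.GetHomePageDataCached
AS
BEGIN
    SET NOCOUNT ON;
    SELECT Col1, Col2
    FROM dbo.HomePageCache
    WHERE CacheUpdated = (SELECT MIN(CacheUpdated) FROM dbo.HomePageCache);
END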
As far as you have explained your issue, I see no problem in doing that, but you would need to explain more: since we don't know what type of data you are collecting and how it grows over time, it is hard to suggest a better solution.

Should ANALYZE be run in a transaction?

In sqlite (specifically version 3), should ANALYZE be run in a transaction?
If so, and I'm at the end of a long transaction that made lots of changes, is it okay to run ANALYZE in that same transaction or should that transaction be committed first and begin another transaction for the ANALYZE?
The documentation doesn't say anything about this one way or another.
ANALYZE reads the data from indexed columns and writes statistical information into some internal table.
This is somewhat similar to the following query:
INSERT OR REPLACE INTO sqlite_statXXX
SELECT 'MyTable', 'MyColumn', COUNT(*), AVG(MyColumn) FROM MyTable
done once for every indexed column.
As with any other SQL statement that writes a small amount of data to the database, the transaction overhead will be much larger than the actual effort of writing the data itself.
In your case, there is no need for the changed data to become visible before the updated statistics do, so you could just as well run the ANALYZE in the same transaction.
If the database is so big that ANALYZE runs for a long time, it might make sense to delay its execution until later when it does not conflict with more important transactions.
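A minimal sketch, reusing the MyTable/MyColumn names from the example above:

BEGIN;
-- ... the long series of data changes ...
UPDATE MyTable SET MyColumn = MyColumn + 1;  -- stands in for the real changes
-- Refresh the statistics inside the same transaction
ANALYZE;
COMMIT;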

Updating records one by one

I have 5000 records. I calculate the salary of one user and then update his data in the database, so it is taking quite long to update all 5000 records. I want to calculate all the users' salaries first and then update the records in the DB.
Is there any other way we can update the DB in a single click?
It really depends on how you are managing your data access layer and what data you need for the calculation. Do you have all the data you need in just one table, or do you need to fetch some other data from other tables for each record?
One way is to retrieve each record, do the calculation in a transaction and then store the result in the database. In this way you can also take advantage of an Ajax UI to inform the user about the progress of the calculation. Here you should use SqlDataReader to fetch the data, as it is very optimized and has less overhead than a DataSet or DataTable, and it also avoids several type casts. In addition, you can optimize further by taking advantage of the TPL, or make it configurable to fetch/update N records at a time. This approach works if you have the IDs of the records. You also need a field on your records to track the calculation, so that in case of a disconnection, a crash or an iisreset you can resume the calculation instead of rerunning it from scratch.
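If the calculation itself can be expressed in SQL, the batched update can be pushed entirely into the database. A rough T-SQL sketch with hypothetical Employees/TimeSheets tables, a SalaryCalculatedOn tracking column and a batch size of 500:

DECLARE @BatchSize int = 500;

-- Calculate and store the salary for the next N unprocessed employees;
-- the SalaryCalculatedOn column lets a crashed or interrupted run resume where it stopped.
UPDATE e
SET e.Salary = calc.NewSalary,
    e.SalaryCalculatedOn = SYSUTCDATETIME()
FROM dbo.Employees AS e
JOIN (
    SELECT t.EmployeeId, SUM(t.HoursWorked * t.HourlyRate) AS NewSalary
    FROM dbo.TimeSheets AS t
    GROUP BY t.EmployeeId
) AS calc ON calc.EmployeeId = e.EmployeeId
WHERE e.EmployeeId IN (
    SELECT TOP (@BatchSize) EmployeeId
    FROM dbo.Employees
    WHERE SalaryCalculatedOn IS NULL
    ORDER BY EmployeeId
);

Run it in a loop (or from the application between progress updates) until no rows remain with SalaryCalculatedOn IS NULL.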

Attaching two memory databases

I am collecting data every second and storing it in a ":memory:" database. Inserting data into this database is done inside a transaction.
Every time a request is sent to the server, the server reads data from the first memory database, does some calculation, stores the result in the second database and sends it back to the client. For this, I am creating another ":memory:" database to store the aggregated information from the first DB. I cannot use the same DB because I need to do some large calculation to get the aggregated result. This cannot be done inside the transaction (because if one collection cycle takes 5 seconds I will lose the 4 seconds of data in between). I cannot create the table in the same database because I will not be able to write the aggregated data while it is collecting and inserting the original data (it is inside a transaction and it is collecting every second).
-- Sometimes I want to retrieve data from both the databases. How can I link these two memory databases? Using the ATTACH DATABASE statement I can attach the second DB to the first one, but the problem is: the next time a request comes, how will I check whether the second DB exists or not?
-- Suppose I attach the second memory DB to the first one. Will it lock the second database when we write data to the first DB?
-- Is there any other way to store this aggregated data?
As far as I understand your idea, I don't think that you need two databases at all. I suppose you are misinterpreting the idea of transactions in SQL.
If you begin a transaction, other processes will still be allowed to read data. If you are only reading data, you probably don't need a database lock.
A possible workflow could look as the following.
Insert some data into the database (use a transaction just for the insertion process)
Perform heavy calculations on the database (but do not use a transaction, otherwise it will prevent other processes from inserting any data into your database). Even if this step includes really heavy computation, you can still insert and read data from another process, as SELECT statements will not lock your database.
Write results to the database (again, by using a transaction)
Just make sure that heavy calculations are not performed within a transaction.
If you want a more detailed description of this solution, look at the documentation about the file locking behaviour of sqlite3: http://www.sqlite.org/lockingv3.html
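A minimal SQLite sketch of that workflow, using hypothetical samples and aggregates tables:

CREATE TABLE IF NOT EXISTS samples (ts INTEGER, value REAL);
CREATE TABLE IF NOT EXISTS aggregates (minute INTEGER, avg_value REAL);

-- 1) Collect the per-second sample; the transaction covers only the insert
BEGIN;
INSERT INTO samples (ts, value) VALUES (CAST(strftime('%s','now') AS INTEGER), 42.0);
COMMIT;

-- 2) Heavy aggregation, run outside an explicit transaction as suggested above
SELECT ts / 60 AS minute, AVG(value) AS avg_value
FROM samples
GROUP BY ts / 60;

-- 3) Store the aggregated result, again in its own short transaction
BEGIN;
INSERT INTO aggregates (minute, avg_value)
SELECT ts / 60, AVG(value) FROM samples GROUP BY ts / 60;
COMMIT;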

Teradata Change data capture

My team is thinking about developing a real time application (a bunch of charts, gauges etc) reading from the database. At the backend we have a high volume Teradata database. We expect some other applications to be constantly feeding in data into this database.
Now we are wondering about how to feed in the changes from the database to the application. Polling from the application would not be a viable option in our case.
Are there any tools that are available within Teradata that would help us achieve this?
Any directions on this would be greatly appreciated
We faced a similar requirement, but in our case the client asked us to provide daily changes to a purchase orders table. That meant we had to run a batch of scripts every day to capture the changes occurring in the table.
So we started to collect data every day and store it in a sparse history format in another table. The process is simple: we store the purchase order detail record against the first day's date in the history table. The next day we compare that day's feed record against the history record and identify any change. If any of the purchase order record's columns have changed, we collect that record and keep it in a final reporting table which is shown to the client.
If you run the batch scripts only once a day and there is more than one change to a record within a day, this method cannot give you the full set of changes. For that you may need to run the batch scripts more than once a day, depending on your requirement.
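A rough Teradata SQL sketch of the daily compare (po_feed, po_history, po_changes and the tracked columns are hypothetical names):

-- Rows in today's feed that are new or differ from yesterday's history snapshot
INSERT INTO report_db.po_changes (po_number, status, amount, change_date)
SELECT f.po_number, f.status, f.amount, CURRENT_DATE
FROM stage_db.po_feed AS f
LEFT JOIN hist_db.po_history AS h
       ON h.po_number = f.po_number
      AND h.snapshot_date = CURRENT_DATE - 1
WHERE h.po_number IS NULL           -- brand new purchase order
   OR f.status <> h.status          -- or one of the tracked columns changed
   OR f.amount <> h.amount;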
Please let us know if you find any other solution. Hope this helps.
There is a change data capture tool from wisdomforce.
http://www.wisdomforce.com/resources/docs/databasesync/DatabaseSyncBestPracticesforTeradata.pdf
It would probably work in this case.
Are triggers that call stored procedures an option? Something along these lines (a rough sketch with a hypothetical procedure name; check the exact triggered-action syntax supported by your Teradata version):
CREATE TRIGGER db_name.trigger_name
AFTER INSERT ON db_name.tbl_name
REFERENCING NEW TABLE AS new_rows
FOR EACH STATEMENT
(CALL db_name.push_changes_sp();)
Theoretically speaking, you can write external stored procedures which call UDFs written in Java or C/C++, etc., and those can push the row data to your application in near real time.
