Kusto database creation datetime

We need to clean up Kusto databases in our cluster that were created but are not being used, and therefore have a size of 0. I'm planning to use a PowerShell script for this: a combination of the Get-AzKustoDatabase command to filter out the databases with size 0, and the Remove-AzKustoDatabase command to remove them.
However, before removing a database I need to know that it was not created recently (say, within the last 10 days). Is there any way to identify a Kusto database's creation date?
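For what it's worth, a minimal PowerShell sketch of that cleanup flow (the resource group and cluster names are placeholders, and the exact property that exposes the database size may differ between Az.Kusto versions, so verify it against your module's output):
# Placeholders: adjust resource group and cluster names to your environment.
$resourceGroup = "my-rg"
$clusterName   = "mycluster"

Get-AzKustoDatabase -ResourceGroupName $resourceGroup -ClusterName $clusterName |
    Where-Object { $_.Statistic.Size -eq 0 } |   # assumed size property; verify on your module version
    ForEach-Object {
        # The ARM name is usually "<cluster>/<database>", so take the last segment.
        $dbName = ($_.Name -split '/')[-1]
        # Dry run first: review what would be removed, then drop -WhatIf.
        Remove-AzKustoDatabase -ResourceGroupName $resourceGroup -ClusterName $clusterName -Name $dbName -WhatIf
    }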

If the database was created during the last 365 days, you can find an entry for its creation on the Journal: https://learn.microsoft.com/en-us/azure/data-explorer/kusto/management/journal
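For example, a hedged sketch of the kind of query you could run (the database name is a placeholder, and the exact Event value for a database creation should be checked against the journal output on your cluster):
// Cluster-level journal; look for the creation event of the database in question.
.show cluster journal
| where Database == "MyDatabase"
| sort by EventTimestamp asc
| take 10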

Related

Column pruning on parquet files defined as an external table

Context: We store historical data in Azure Data Lake as versioned parquet files from our existing Databricks pipeline where we write to different Delta tables. One particular log source is about 18 GB a day in parquet. I have read through the documentation and executed some queries using Kusto.Explorer on the external table I have defined for that log source. In the query summary window of Kusto.Explorer I see that I download the entire folder when I search it, even when using the project operator. The only exception to that seems to be when I use the take operator.
Question: Is it possible to prune columns to reduce the amount of data being fetched from external storage? Whether during external table creation or using an operator at query time.
Background: The reason I ask is that in Databricks it is possible to use a SELECT statement to fetch only the columns I'm interested in, which reduces the query time significantly.
As David wrote above, the optimization does happen on the Kusto side, but there's a bug with the "Downloaded Size" metric: it presents the total data size, regardless of the selected columns. We'll fix it. Thanks for reporting.
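For reference, the query pattern in question looks roughly like this (the external table and column names are hypothetical); per the note above, the column pruning is applied server-side even though Kusto.Explorer's "Downloaded Size" metric currently reports the full data size:
// Only the projected columns are actually needed; names are placeholders.
external_table("LogSourceParquet")
| where Timestamp > ago(1d)
| project Timestamp, Level, Message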

SQL Server Data Archiving

I have a SQL Azure database on which I need to perform some data archiving operation.
The plan is to move all the irrelevant data from the actual tables into Archive_* tables.
I have tables which have up to 8-9 million records.
One option is to write a stored procedure that inserts the data into the new Archive_* tables and also deletes it from the actual tables.
But this operation is really time-consuming and runs for more than 3 hours.
I am in a situation where I can't have more than an hour's downtime.
How can I make this archiving faster?
You can use Azure Automation to schedule execution of a stored procedure every day at the same time, during a maintenance window, where this stored procedure archives only the oldest week or month of data each time it runs. The stored procedure should archive only data older than X weeks/months/years. Please read this article to create the runbook. In a few days you will have all the old data archived, and the runbook will continue to do the job from then on.
You can't make it faster, but you can make it seamless. The first option is to have a separate task that moves data in portions from the source to the archive tables. In order to prevent lock escalation and overall performance degradation, I would suggest you limit the size of a single transaction: e.g. start a transaction, insert N records into the archive table, delete those records from the source table, and commit. Continue for a few days until all the necessary data is transferred. The advantage of this approach is that if there is some kind of failure, you can restart the archival process and it will continue from the point of the failure.
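A minimal sketch of that batched approach, assuming hypothetical Orders / Archive_Orders tables and an age-based archive condition; it uses DELETE ... OUTPUT to do the insert and delete of each batch in a single statement, which is one way to keep the per-transaction footprint small:
-- Hypothetical table names and archive condition; the archive table must have a
-- compatible column layout. Tune @BatchSize to stay well below the ~5000-lock
-- escalation threshold.
DECLARE @BatchSize int;
DECLARE @Rows int;
SET @BatchSize = 5000;
SET @Rows = 1;

WHILE @Rows > 0
BEGIN
    BEGIN TRANSACTION;

    -- Move one batch: the deleted rows are written into the archive table.
    DELETE TOP (@BatchSize) FROM dbo.Orders
    OUTPUT deleted.* INTO dbo.Archive_Orders
    WHERE OrderDate < DATEADD(MONTH, -6, GETDATE());

    SET @Rows = @@ROWCOUNT;   -- capture before COMMIT resets it

    COMMIT TRANSACTION;
END;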
The second option, which does not exclude the first one, really depends on how critical the performance of the source tables is for you and how many updates are happening on them. If it is not a problem, you can write triggers that pour every inserted/updated record into an archive table, as sketched below. Then, when you want to clean up, all you need to do is delete the obsolete records from the source tables; their copies will already be in the archive tables.
In both cases you will not need any downtime.
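A hedged sketch of the trigger-based variant, again with hypothetical table names; an equivalent AFTER UPDATE trigger would be needed to capture updates as well:
-- Copy every newly inserted row into the archive table as it arrives.
CREATE TRIGGER dbo.trg_Orders_Archive
ON dbo.Orders
AFTER INSERT
AS
BEGIN
    SET NOCOUNT ON;
    INSERT INTO dbo.Archive_Orders
    SELECT * FROM inserted;   -- assumes the archive table mirrors the source schema
END;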

Bulk insert in datastage using teradata connector

I am new to DataStage. I created a simple job to get data from a .ds file and load it into Teradata using the Teradata connector. In the properties of the Teradata connector I set
access_method=Bulk, max_session=2, min_session=1, load_type=load,
max_buffer_size=10000 and max_partition_sessions=1,
but the job stays continuously in the running state without displaying the number of rows transferred, whereas when I choose access_method=immediate it starts to proceed. Can anyone suggest the right way to do the load in parallel?
Are you aware of the nodes you are using? Add the APT_CONFIG_FILE environment variable and check from Director how many nodes are used; N nodes means 2N processes. This might help you.
If this doesn't help, try looking into the source database connector stage to see if increasing values for any options helps.
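For context, a configuration file referenced by APT_CONFIG_FILE looks roughly like the sketch below (host names and paths are placeholders); the number of node entries is what determines the degree of parallelism the job runs with:
{
    node "node1"
    {
        fastname "etl-host"
        pools ""
        resource disk "/data/datasets" {pools ""}
        resource scratchdisk "/data/scratch" {pools ""}
    }
    node "node2"
    {
        fastname "etl-host"
        pools ""
        resource disk "/data/datasets" {pools ""}
        resource scratchdisk "/data/scratch" {pools ""}
    }
}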

SQL Server 2005 frequent deadlocks

We have a web application built using ASP.NET 4.0 (C#), and we are using SQL Server 2005 as the backend.
The application itself is a workflow engine where each record is attested by 4 role bearers over 18 days in a month.
We get roughly 200k records on the 1st of each month.
During those 18 days, some people are viewing and attesting records while the system admin might be changing the ownership of those records.
My question, or worry, is that we often get deadlock issues in the database.
Some users may have 10,000 records in their kitty and try to attest all of them in one go, while the system admin may also change ownership in bulk for a few thousand records, and at that point we get a deadlock; even when two or more users with loads of accounts try to attest, we get deadlocks.
We are extensively using stored procedures with transactions. Is there a way to code for such situations, or to simply avoid deadlocks?
Apologies for asking in such a haphazard manner, but any hints or tips are welcome; if you need more info to understand the issue, let me know.
Thanks
A few suggestions:
1) Use the same order for reading/writing data from/into tables.
Example #1 (read-write deadlock): Avoid creating a stored procedure usp_ReadA_WriteB that reads from A and then writes into B and another stored procedure usp_ReadB_WriteA that reads from B and then writes into A. Read this blog post please.
Example #2 (write-write deadlock): Avoid creating a stored procedure usp_WriteA_WriteB that writes data into table A and then into table B and another stored procedure usp_WriteB_writeA that writes data into the same tables: table B and then into table A.
2) Minimize the duration of transactions. Minimize the number of affected rows to reduce the number of locks. Be mindful of the ~5,000-lock threshold for lock escalation.
3) Optimize your queries. For example, look for [Clustered] {Index|Table} Scan, {Key|RID} Lookup and Sort operators in execution plans. Use indices, but also try to minimize the number of indices and the size of every index (start by minimizing the index key's size). Please read this blog post.
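As a hedged illustration of points 1 and 2 (all table, column, and procedure names here are hypothetical): have every procedure touch the shared tables in the same order and work in small batches, so each transaction holds far fewer than 5,000 locks.
-- Hypothetical attestation procedure: small batches, tables always touched
-- in the same order (Records first, then AuditLog). The ownership-change
-- procedure would follow the same table order.
CREATE PROCEDURE dbo.usp_AttestRecords
    @UserId int
AS
BEGIN
    SET NOCOUNT ON;
    DECLARE @Rows int;
    SET @Rows = 1;

    WHILE @Rows > 0
    BEGIN
        BEGIN TRANSACTION;

        UPDATE TOP (1000) dbo.Records      -- TOP in UPDATE is available from SQL Server 2005
        SET Attested = 1, AttestedBy = @UserId
        WHERE OwnerId = @UserId AND Attested = 0;

        SET @Rows = @@ROWCOUNT;

        INSERT INTO dbo.AuditLog (UserId, Action, RowsAffected, LoggedAt)
        VALUES (@UserId, 'ATTEST', @Rows, GETDATE());

        COMMIT TRANSACTION;
    END
END;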

Teradata Change data capture

My team is thinking about developing a real time application (a bunch of charts, gauges etc) reading from the database. At the backend we have a high volume Teradata database. We expect some other applications to be constantly feeding in data into this database.
Now we are wondering about how to feed in the changes from the database to the application. Polling from the application would not be a viable option in our case.
Are there any tools that are available within Teradata that would help us achieve this?
Any direction on this would be greatly appreciated.
We faced a similar requirement, but in our case the client asked us to provide daily changes to a purchase orders table. That meant we had to run a batch of scripts every day to capture the changes occurring to the table.
So we started to collect data every day and store it in a sparse history format in another table. The process is simple: we store a purchase order details record against the first day's date in the history table. The next day we compare that day's feed record against the history record and identify any change in it. If any of the purchase order columns changed, we collect that record and keep it in a final reporting table which is shown to the client.
If you run the batch scripts only once a day and there is more than one change to a record within a day, this method cannot give you all the changes. For that you may need to run the batch scripts more than once a day, based on your requirement.
Please let us know if you find any other solution. Hope this helps.
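A minimal SQL sketch of that daily compare, with hypothetical table and column names (in practice you would compare every tracked column, not just one):
-- Rows whose tracked columns differ from the stored history version
-- are copied into the reporting table.
INSERT INTO rpt.po_changes (po_number, old_status, new_status, change_date)
SELECT h.po_number,
       h.status,
       f.status,
       CURRENT_DATE
FROM   hist.po_history AS h
JOIN   stg.po_daily_feed AS f
       ON f.po_number = h.po_number
WHERE  f.status <> h.status;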
There is a change data capture tool from wisdomforce.
http://www.wisdomforce.com/resources/docs/databasesync/DatabaseSyncBestPracticesforTeradata.pdf
It would probably work in this case.
Are triggers with stored procedures an option?
-- Sketch of a row trigger (names are placeholders): the new row goes into a change table.
CREATE TRIGGER db_name.trigger_name
AFTER INSERT ON db_name.tbl_name
REFERENCING NEW AS new_row
FOR EACH ROW
(INSERT INTO db_name.change_log VALUES (new_row.id, CURRENT_TIMESTAMP););
Theoretically speaking, you can write external stored procedures which may call UDFs written in Java or C/C++, etc., that can push the row data to your application in near real time.
