Azure Synapse replicated to Cosmos DB? - azure-cosmosdb

We have a Azure data warehouse db2(Azure Synapse) that will need to be consumed by read only users around the world, and we would like to replicate the needed objects from the data warehouse potentially to a cosmos DB. Is this possible, and if so what are the available options? (transactional, merege, etc)

Synapse is mainly about getting your data to do analysis. I dont think it has a direct export option, the kind you have described above.
However, what you can do, is to use 'Azure Stream Analytics' and then you should be able to integrate/stream whatever you want to any destination you need, like an app or a database ands so on.
more details here - https://learn.microsoft.com/en-us/azure/sql-data-warehouse/sql-data-warehouse-integrate-azure-stream-analytics
I think you can also pull the data into BI, and perhaps setup some kind of a automatic export from there.
more details here - https://learn.microsoft.com/en-us/azure/sql-data-warehouse/sql-data-warehouse-get-started-visualize-with-power-bi

Related

Transfer huge data from maria db to ssms when their is data change in mariadb

I did tried using schedular for Databricks notebook but it is creating unnecessary of loading data. The data in mariadb changes randomly it is not fixed , if I try pipeline I cant call a trigger for change in data and transfer of data from one Database to another.
Please help me with any pipeline ideas , azure datafactory ideas or python codes as well so that I can transfer tables when their are changes in Mariadb
One way to trigger pipeline is to use event-based trigger.
Creating event-based trigger in Azure Data Factory. Create trigger and select Type as Custom events.
Refer - https://www.mssqltips.com/sqlservertip/6063/create-event-based-trigger-in-azure-data-factory/
Second way is to use logic app. This is the best approach for your query.
Refer this answer by Trent Tamura

Verification of export parallelization

When it comes to export, we have the following property options which affect concurrency of the export either to storage directly or to external table (documentation link):-
distribution
distributed
spread
concurrency
query_fanout_nodes_percent
Say, I tweak these options and increase/decrease concurrency based on shards or nodes, is there any Kusto command that will allow me to exactly see how many of these parallel threads of export (whether it's based on per_shard or per_node or some percent) are running? The command .show operation details doesn't show these details , it just shows how many separate export commands are issued by client and not the related parallelization details.
As it stands now, there is no additional information that the system will provide regarding the threads used in the export operation in the same way that this information is not available for queries.
Can you add to your question the benefit of having such information? Is it to track the progress of the command? In any case, if this is something that you feel is missing from the service please open a new item or vote for an existing item in the Azure Data Explorer user voice

GCP encryption thru Beam / Dataflow APIs for Bigquery and Cloud SQL

Context: We are trying to load some CSV format data into GCP BigQuery using GCP Dataflow (Apache Beam). As a part of this for the first time (for each table) creating the BQ tables thru BigQueryIO API. One of the customer requirement is the data on GCP needs to be encrypted using Customer supplied/managed Encryption keys.
Problem Statement: We are not able to find any way to specify the "Custom Encryption Keys" thru APIs while creating Tables. The GCP documentation details about how to specify the Custom encryption keys thru GCP BQ Console but could not find anything for specifying it thru APIs from within DataFlow Code.
Code Snippet:
String tableSpec = new StringBuilder().append(PipelineConstants.PROJECT_ID).append(":")
.append(dataValue.getKey().target_dataset).append(".").append(dataValue.getKey().target_table_name)
.toString();
ValueProvider<String> valueProvider = StaticValueProvider.of("gs://bucket/folder/");
dataValue.getValue().apply(Count.globally()).apply(ParDo.of(new RowCount(dataValue.getKey())))
.apply(ParDo.of(new SourceAudit(runId)));
dataValue.getValue().apply(ParDo.of(new PreProcessing(dataValue.getKey())))
.apply(ParDo.of(new FixedToDelimited(dataValue.getKey())))
.apply(ParDo.of(new CreateTableRow(dataValue.getKey(), runId, timeStamp)))
.apply(BigQueryIO.writeTableRows().to(tableSpec)
.withSchema(CreateTableRow.getSchema(dataValue.getKey()))
.withCustomGcsTempLocation(valueProvider)
.withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED)
.withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND));
Query: If anybody could let us know
If this is possible to provide encryption key thru Beam API?
If its not possible with the current version what could be the possible work
around?
Kindly let know if additional information is required.
Customer supplied encryption keys is a new feature, not all libraries have been updated to support it yet.
If you know the table name in advance, you can use UI/CLI or API to create table, then run your normal flow to load data into that table. That might be a work around for you.
https://cloud.google.com/bigquery/docs/customer-managed-encryption#create_table
API to create table: https://cloud.google.com/bigquery/docs/reference/rest/v2/tables/insert
You need to set this section on table object:
"encryptionConfiguration": {
"kmsKeyName": string
}
More details on table: https://cloud.google.com/bigquery/docs/reference/rest/v2/tables#resource

cosmosdb - archive data older than n years into cold storage

I researched several places and could not find any direction on what options are there to archive old data from cosmosdb into a cold storage. I see for DynamoDb in AWS it is mentioned that you can move dynamodb data into S3. But not sure what options are for cosmosdb. I understand there is time to live option where the data will be deleted after certain date but I am interested in archiving versus deleting. Any direction would be greatly appreciated. Thanks
I don't think there is a single-click built-in feature in CosmosDB to achieve that.
Still, as you mentioned appreciating any directions, then I suggest you consider DocumentDB Data Migration Tool.
Notes about Data Migration Tool:
you can specify a query to extract only the cold-data (for example, by creation date stored within documents).
supports exporting export to various targets (JSON file, blob
storage, DB, another cosmosDB collection, etc..),
compacts the data in the process - can merge documents into single array document and zip it.
Once you have the configuration set up you can script this
to be triggered automatically using your favorite scheduling tool.
you can easily reverse the source and target to restore the cold data to active store (or to dev, test, backup, etc).
To remove exported data you could use the mentioned TTL feature, but that could cause data loss should your export step fail. I would suggest writing and executing a Stored Procedure to query and delete all exported documents with single call. That SP would not execute automatically but could be included in the automation script and executed only if data was exported successfully first.
See: Azure Cosmos DB server-side programming: Stored procedures, database triggers, and UDFs.
UPDATE:
These days CosmosDB has added Change feed. this really simplifies writing a carbon copy somewhere else.

How to validate data in Teradata from Oracle

My source data is in Oracle and target data is in Teradata.Can you please provide me the easy and quick way to validate data .There are 900 tables.If possible can you provide syntax too
There is a product available known as the Teradata Gateway that works with Oracle and allows you to access Teradata in a "heterogeneous" manner. This may not be the most effective way to compare the data.
Ultimately what your requirements sound more process driven and to be done effectively would require the source data to be compared/validated as stage tables on the Teradata environment after your ETL/ELT process has completed.

Resources