GCP encryption through Beam / Dataflow APIs for BigQuery and Cloud SQL

Context: We are trying to load CSV data into GCP BigQuery using GCP Dataflow (Apache Beam). As part of this, the BigQuery tables are created through the BigQueryIO API the first time each table is loaded. One of the customer requirements is that the data on GCP must be encrypted using customer-supplied/managed encryption keys.
Problem Statement: We cannot find any way to specify custom encryption keys through the APIs while creating tables. The GCP documentation explains how to specify custom encryption keys through the BigQuery console, but we could not find anything about specifying them through the APIs from within Dataflow code.
Code Snippet:
String tableSpec = new StringBuilder().append(PipelineConstants.PROJECT_ID).append(":")
        .append(dataValue.getKey().target_dataset).append(".").append(dataValue.getKey().target_table_name)
        .toString();
ValueProvider<String> valueProvider = StaticValueProvider.of("gs://bucket/folder/");
dataValue.getValue().apply(Count.globally()).apply(ParDo.of(new RowCount(dataValue.getKey())))
        .apply(ParDo.of(new SourceAudit(runId)));
dataValue.getValue().apply(ParDo.of(new PreProcessing(dataValue.getKey())))
        .apply(ParDo.of(new FixedToDelimited(dataValue.getKey())))
        .apply(ParDo.of(new CreateTableRow(dataValue.getKey(), runId, timeStamp)))
        .apply(BigQueryIO.writeTableRows().to(tableSpec)
                .withSchema(CreateTableRow.getSchema(dataValue.getKey()))
                .withCustomGcsTempLocation(valueProvider)
                .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED)
                .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND));
Query: Could anybody let us know:
Is it possible to provide an encryption key through the Beam API?
If it is not possible with the current version, what could be a possible workaround?
Kindly let us know if additional information is required.

Customer-managed encryption keys are a new feature, and not all libraries have been updated to support them yet.
If you know the table name in advance, you can create the table through the UI, CLI, or API, then run your normal flow to load data into that table. That might be a workaround for you.
https://cloud.google.com/bigquery/docs/customer-managed-encryption#create_table
API to create table: https://cloud.google.com/bigquery/docs/reference/rest/v2/tables/insert
You need to set this section on the table object:
"encryptionConfiguration": {
  "kmsKeyName": string
}
More details on table: https://cloud.google.com/bigquery/docs/reference/rest/v2/tables#resource
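For illustration, here is a minimal Java sketch of the create-table call described above, using only the JDK's java.net.http classes. It builds (but does not send) a tables.insert request whose body carries the encryptionConfiguration section. The class name CmekTableRequest, the project, dataset, and table names, the key path, and the ACCESS_TOKEN placeholder are all assumptions made up for the example; obtaining an OAuth token and adding the table schema are left out.

```java
import java.net.URI;
import java.net.http.HttpRequest;

/** Sketch: build a BigQuery tables.insert request that attaches a KMS key to the new table. */
final class CmekTableRequest {

    // tables.insert endpoint, as described in the REST reference linked above.
    static String tablesUrl(String projectId, String datasetId) {
        return "https://bigquery.googleapis.com/bigquery/v2/projects/"
                + projectId + "/datasets/" + datasetId + "/tables";
    }

    // Minimal table resource: a table reference plus the encryptionConfiguration section.
    // Assumption: the values need no JSON escaping; a real table would also carry a "schema".
    static String tableJson(String projectId, String datasetId, String tableId, String kmsKeyName) {
        return "{"
                + "\"tableReference\":{"
                + "\"projectId\":\"" + projectId + "\","
                + "\"datasetId\":\"" + datasetId + "\","
                + "\"tableId\":\"" + tableId + "\"},"
                + "\"encryptionConfiguration\":{"
                + "\"kmsKeyName\":\"" + kmsKeyName + "\"}"
                + "}";
    }

    // Builds the POST request; sending it (HttpClient.send) is left to the caller.
    static HttpRequest buildRequest(String accessToken, String projectId, String datasetId,
                                    String tableId, String kmsKeyName) {
        return HttpRequest.newBuilder(URI.create(tablesUrl(projectId, datasetId)))
                .header("Authorization", "Bearer " + accessToken)
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(
                        tableJson(projectId, datasetId, tableId, kmsKeyName)))
                .build();
    }

    public static void main(String[] args) {
        // All names below are hypothetical placeholders.
        HttpRequest request = buildRequest("ACCESS_TOKEN", "my-project", "my_dataset", "my_table",
                "projects/my-project/locations/us/keyRings/my-ring/cryptoKeys/my-key");
        System.out.println(request.method() + " " + request.uri());
    }
}
```

Once the table exists with the key attached, the pipeline's CREATE_IF_NEEDED disposition should find it and only append rows, which is the workaround suggested above.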

Related

Azure Synapse replicated to Cosmos DB?

We have an Azure data warehouse db2 (Azure Synapse) that will need to be consumed by read-only users around the world, and we would like to replicate the needed objects from the data warehouse, potentially to a Cosmos DB. Is this possible, and if so, what are the available options (transactional, merge, etc.)?
Synapse is mainly about bringing your data together for analysis. I don't think it has a direct export option of the kind you have described.
However, you can use Azure Stream Analytics to integrate or stream whatever you want to any destination you need, such as an app or a database.
More details here: https://learn.microsoft.com/en-us/azure/sql-data-warehouse/sql-data-warehouse-integrate-azure-stream-analytics
I think you can also pull the data into Power BI and perhaps set up some kind of automatic export from there.
More details here: https://learn.microsoft.com/en-us/azure/sql-data-warehouse/sql-data-warehouse-get-started-visualize-with-power-bi

How to programmatically create a database in ADX using Java

I am using REST API (https://learn.microsoft.com/en-us/azure/kusto/api/rest/request) to interact with the database in ADX.
I want to create more databases in the same cluster. How should I do it using Java?
I am not using the Java SDK. I have relied on the REST APIs so far.
I think I cannot create a new database using the REST API, so I am looking for an alternative.
It would have been really helpful if there was a command like ".create table tablename" just for the database.
Clusters and databases can be managed using the "Control Plane", aka the ARM APIs. These APIs have libraries in different languages (as well as REST).
For instance, for the Java library use this link; for C#, use this link.
Example of how to create a database with the C# library (Java should be very similar):
var database = managementClient.Databases.CreateOrUpdate(resourceGroup, clusterName, databaseName, new Database(location, softDeletePeriod: softDeletePeriod, hotCachePeriod: hotCachePeriod));
Read more here
I think you'll need to use the Azure ARM REST API since the database is treated as a resource. From that point you can interact with it through the ADX APIs.
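Since the question is about plain REST from Java, here is a rough sketch of what the ARM call could look like with the JDK's java.net.http classes. It only builds the PUT request for the database resource and does not send it. The class name AdxDatabaseRequest, the api-version value, the resource names, and the ACCESS_TOKEN placeholder are assumptions for the example; check the Microsoft.Kusto REST reference for the api-version and body fields that apply to your subscription.

```java
import java.net.URI;
import java.net.http.HttpRequest;

/** Sketch: build an ARM "create or update database" request for an ADX cluster. */
final class AdxDatabaseRequest {

    // Assumption: use an api-version that the Microsoft.Kusto provider actually lists for you.
    static final String API_VERSION = "2023-08-15";

    // The database is an ARM resource nested under the cluster resource.
    static String databaseUrl(String subscriptionId, String resourceGroup,
                              String clusterName, String databaseName) {
        return "https://management.azure.com/subscriptions/" + subscriptionId
                + "/resourceGroups/" + resourceGroup
                + "/providers/Microsoft.Kusto/clusters/" + clusterName
                + "/databases/" + databaseName
                + "?api-version=" + API_VERSION;
    }

    // Periods are ISO 8601 durations, e.g. "P365D"; values are assumed to need no JSON escaping.
    static String databaseJson(String location, String softDeletePeriod, String hotCachePeriod) {
        return "{"
                + "\"kind\":\"ReadWrite\","
                + "\"location\":\"" + location + "\","
                + "\"properties\":{"
                + "\"softDeletePeriod\":\"" + softDeletePeriod + "\","
                + "\"hotCachePeriod\":\"" + hotCachePeriod + "\"}"
                + "}";
    }

    // Builds the PUT request; sending it (HttpClient.send) is left to the caller.
    static HttpRequest buildRequest(String accessToken, String subscriptionId, String resourceGroup,
                                    String clusterName, String databaseName, String location) {
        return HttpRequest.newBuilder(
                        URI.create(databaseUrl(subscriptionId, resourceGroup, clusterName, databaseName)))
                .header("Authorization", "Bearer " + accessToken)
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString(databaseJson(location, "P365D", "P31D")))
                .build();
    }

    public static void main(String[] args) {
        // All identifiers below are hypothetical placeholders.
        HttpRequest request = buildRequest("ACCESS_TOKEN", "00000000-0000-0000-0000-000000000000",
                "my-rg", "mycluster", "mydatabase", "West Europe");
        System.out.println(request.method() + " " + request.uri());
    }
}
```

The bearer token here would be an Azure AD token for the ARM resource (https://management.azure.com/), which is separate from the token used against the cluster's own query endpoint.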

How to delete all data in a partition?

I have a CosmosDB collection with a number of different partitions. I want to delete all of the data in one of the partitions so I tried to run the command:
db.myCollection.deleteAll({PartitionKey: 'pop-9q'})
Where PartitionKey is the field that I partition/shard based on. But when I execute this it returns the not very helpful message:
ERROR: An Error has occurred
Why would I be getting this message and how can I either get more details on the cause or find a resolution?
At this time, you cannot perform a bulk delete. Please upvote and comment on this feature request: Add the ability to delete ALL data in a partition
Additionally, which API are you consuming? For Gremlin API you could execute something like the following: g.V().drop()
The Microsoft.Azure.Cosmos SDK has added this ability - currently only available as a preview feature (which requires you to opt-in via the portal)
See here for more details:
https://learn.microsoft.com/en-us/azure/cosmos-db/nosql/how-to-delete-by-partition-key?tabs=dotnet-example
Sample code included there:
// Get reference to the container
var container = cosmosClient.GetContainer("DatabaseName", "ContainerName");
// Delete by logical partition key
ResponseMessage deleteResponse = await container.DeleteAllItemsByPartitionKeyStreamAsync(new PartitionKey("Contoso"));
if (deleteResponse.IsSuccessStatusCode) {
    Console.WriteLine($"Delete all documents with partition key operation has successfully started");
}
As @Mike said, a "delete all data" feature is not yet supported in the Cosmos DB SQL API or Mongo API. I notice that you have already commented on the link above. As a workaround, you can use a bulk-delete stored procedure for the Cosmos DB SQL API.
(sample code: https://gist.github.com/deepumi/2a23c5380202bddf0b85e83baf5833be)
For the Mongo API, unfortunately, stored procedures are not supported either. You could create an Azure HTTP-triggered Function that runs the bulk-delete code whenever you want, or merge that code into your program.

ASP.NET Azure Blob Geographically Redundant Storage - How to use?

I have been searching for an answer on MS, SE and Google and cannot find it. I want to use the GRS option for Azure Storage (Cloud Block Blobs) but I cannot figure out how to properly do that.
I created my storage object in Azure and chose the GRS option.
I get that I have a primary and secondary connection string and know how to get that from the Azure portal.
What I do not know, in ASP.NET 4.0, is how to set both connection strings in the CloudBlobClient and gracefully handle the primary storage being unavailable.
-- What exception is thrown, and where, when the primary is unavailable? Is it thrown when I create the client, or when I try to get a blob reference?
-- How do I then use the secondary?
Do I just have to catch any exception and then try the secondary connection string in a new CloudBlobClient if the primary does not work? Or is there anything in the API for this? I would think there would be, but I cannot find it.
None of the "How to use Azure Storage" tutorials I have seen go into this. Most of the documentation seems to date from before mid-2014 when this feature became generally available.
This blog post should help you. In short, if you want to read from both primary and secondary, you need to enable RA-GRS, which gives read access to the secondary. If you are using our storage client libraries, you can also enable a retry policy that first tries to read from the primary and then from the secondary if the first read fails.
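To make the "primary first, then secondary" idea concrete, here is a small language-neutral sketch written in Java with no storage SDK involved. It is not the storage client library's retry policy, only an illustration of the fallback order. The class and method names are hypothetical, and the two suppliers stand in for whatever read call your client makes against each endpoint. With RA-GRS enabled, the secondary read endpoint uses the account name with a "-secondary" suffix.

```java
import java.util.function.Supplier;

/** Sketch: try a read against the primary endpoint, fall back to the RA-GRS secondary. */
final class PrimaryThenSecondary {

    static String primaryBlobHost(String account) {
        return account + ".blob.core.windows.net";
    }

    // RA-GRS exposes the read-only secondary under "<account>-secondary".
    static String secondaryBlobHost(String account) {
        return account + "-secondary.blob.core.windows.net";
    }

    // Runs the primary read; if it throws, runs the secondary read instead.
    // If both fail, the secondary failure is thrown with the primary one attached.
    static <T> T read(Supplier<T> primary, Supplier<T> secondary) {
        try {
            return primary.get();
        } catch (RuntimeException primaryFailure) {
            try {
                return secondary.get();
            } catch (RuntimeException secondaryFailure) {
                secondaryFailure.addSuppressed(primaryFailure);
                throw secondaryFailure;
            }
        }
    }

    public static void main(String[] args) {
        // Simulated outage: the primary read throws, so the secondary answer is used.
        String result = PrimaryThenSecondary.<String>read(
                () -> { throw new IllegalStateException("primary unavailable"); },
                () -> "served from " + secondaryBlobHost("myaccount"));
        System.out.println(result);
    }
}
```

In practice the client library's own location and retry settings should be preferred over hand-rolled fallback, since they also handle timeouts and transient errors.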

How to Read and Write to Amazon ElastiCache using ASP.NET

I would like to know which commands let me write and read data to and from Amazon ElastiCache using the ASP.NET SDK. I've looked at the online documentation but couldn't figure out how it is done.
What I did in the code: I created two keys in the web.config to store the ID and access password.
AmazonElastiCacheClient client = new AmazonElastiCacheClient(ElasticCache_Id, ElasticCache_Pass);
This initializes the AmazonElastiCacheClient object and passes the credential strings.
I need sample code that demonstrates how to put data into and retrieve data from the ElastiCache cluster. Thanks.
It looks like the AWS SDK can only be used to manage ElastiCache clusters, not to read or write cached data.
You can use any memcached client to read and write to ElastiCache, since memcached is the underlying technology.
Here is an example:
http://geekswithblogs.net/shaunxu/archive/2010/04/07/first-round-playing-with-memcached.aspx
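To show what a memcached client does under the hood, here is a small Java sketch of the memcached text protocol's set and get commands. It only formats the commands and parses a reply; no socket is opened. The class name MemcachedText and the key and value are made up for the example, and the parser assumes a single-key reply with an ASCII value. A real client library handles connections, binary values, and multiple nodes for you.

```java
import java.nio.charset.StandardCharsets;

/** Sketch: format memcached text-protocol commands and parse a single-key get reply. */
final class MemcachedText {

    // "set <key> <flags> <exptime> <bytes>\r\n<data>\r\n"; the server answers "STORED\r\n".
    static String setCommand(String key, int flags, int exptimeSeconds, String value) {
        int length = value.getBytes(StandardCharsets.UTF_8).length;
        return "set " + key + " " + flags + " " + exptimeSeconds + " " + length + "\r\n"
                + value + "\r\n";
    }

    // "get <key>\r\n"
    static String getCommand(String key) {
        return "get " + key + "\r\n";
    }

    // A hit looks like "VALUE <key> <flags> <bytes>\r\n<data>\r\nEND\r\n"; a miss is "END\r\n".
    // Assumption: the value is ASCII, so its byte length equals its character length.
    static String parseGetReply(String reply) {
        if (!reply.startsWith("VALUE ")) {
            return null; // cache miss
        }
        int headerEnd = reply.indexOf("\r\n");
        String[] header = reply.substring(0, headerEnd).split(" ");
        int length = Integer.parseInt(header[3]);
        int dataStart = headerEnd + 2;
        return reply.substring(dataStart, dataStart + length);
    }

    public static void main(String[] args) {
        // These strings would be written to a TCP connection to a cache node (memcached's default port is 11211).
        System.out.print(setCommand("greeting", 0, 60, "hello"));
        System.out.print(getCommand("greeting"));
        System.out.println(parseGetReply("VALUE greeting 0 5\r\nhello\r\nEND\r\n"));
    }
}
```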
