Verification of export parallelization - azure-data-explorer

For export, we have the following property options that affect the concurrency of the export, whether it goes directly to storage or to an external table (documentation link):
distribution
distributed
spread
concurrency
query_fanout_nodes_percent
Say I tweak these options and increase or decrease concurrency based on shards or nodes: is there any Kusto command that will let me see exactly how many of these parallel export threads (whether per_shard, per_node, or some percentage) are running? The command .show operation details doesn't show this; it only shows how many separate export commands were issued by the client, not the related parallelization details.
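For reference, the kind of command being tuned looks roughly like this; a sketch using the azure-kusto-data Python client, where the cluster, database, external table name, and property values are placeholders, and the exact set of supported properties depends on the export target per the documentation linked above:

```python
# A sketch, assuming the azure-kusto-data package; cluster, database, table
# names and the property values below are placeholders, not recommendations.
from azure.kusto.data import KustoClient, KustoConnectionStringBuilder

kcsb = KustoConnectionStringBuilder.with_az_cli_authentication(
    "https://mycluster.westeurope.kusto.windows.net")
client = KustoClient(kcsb)

# Management command: export to an external table with concurrency-related
# properties set explicitly.
export_cmd = """
.export async to table MyExternalTable
with (distribution="per_shard", concurrency=16, spread=1)
<| MySourceTable | where Timestamp > ago(1d)
"""
response = client.execute_mgmt("MyDatabase", export_cmd)
print(response.primary_results[0])  # operation id of the async export
```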

As it stands now, the system does not provide any additional information about the threads used in an export operation, in the same way that this information is not available for queries.
Can you add to your question what the benefit of having such information would be? Is it to track the progress of the command? In any case, if this is something you feel is missing from the service, please open a new item or vote for an existing item in the Azure Data Explorer user voice.

Related

Setting offer throughput or autopilot on container is not supported for serverless accounts. Azure Cosmos DB migration tool

After selecting all the details correctly, the Migration tool returns an error related to the throughput value, and setting it to 0 or -1 does not help.
The workaround to migrate data using the tool is to first create the collection in the Azure portal for Cosmos DB and then run the Migration tool with the same details; it will then add all the rows to the collection you created. The main issue here was the creation of a new collection, but I do not know why it returns something related to throughput, which I think has nothing to do with it.
In general, setting an explicit value for offer_throughput is not allowed for serverless accounts. So either omit that value, in which case the default is applied, or change your account type.
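For example, with the azure-cosmos (v4) Python SDK you would simply not pass offer_throughput when creating the container on a serverless account; a sketch, where the account URL, key, and names are placeholders:

```python
# A sketch, assuming the azure-cosmos (v4) Python SDK; URL, key, and names
# are placeholders for your own serverless account.
from azure.cosmos import CosmosClient, PartitionKey

client = CosmosClient("https://myaccount.documents.azure.com:443/", credential="<key>")
database = client.create_database_if_not_exists("mydb")

# No offer_throughput argument: on a serverless account the service manages
# capacity, and passing an explicit throughput value is rejected.
container = database.create_container_if_not_exists(
    id="mycollection",
    partition_key=PartitionKey(path="/id"),
)
```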
Related issues (still open as of 23/02/2022):
https://learn.microsoft.com/en-us/answers/questions/94814/cosmos-quick-start-gt-create-items-contaner-gt-htt.html
https://github.com/Azure/azure-cosmos-dotnet-v2/issues/861

Azure Synapse replicated to Cosmos DB?

We have an Azure data warehouse db2 (Azure Synapse) that will need to be consumed by read-only users around the world, and we would like to replicate the needed objects from the data warehouse, potentially to a Cosmos DB. Is this possible, and if so, what are the available options (transactional, merge, etc.)?
Synapse is mainly about getting your data in for analysis. I don't think it has a direct export option of the kind you have described above.
However, what you can do is use Azure Stream Analytics; with it you should be able to integrate/stream whatever you want to any destination you need, like an app, a database, and so on.
More details here: https://learn.microsoft.com/en-us/azure/sql-data-warehouse/sql-data-warehouse-integrate-azure-stream-analytics
I think you can also pull the data into Power BI and perhaps set up some kind of automatic export from there.
More details here: https://learn.microsoft.com/en-us/azure/sql-data-warehouse/sql-data-warehouse-get-started-visualize-with-power-bi

cosmosdb - archive data older than n years into cold storage

I have researched several places and could not find any direction on what options there are to archive old data from Cosmos DB into cold storage. I see that for DynamoDB in AWS it is mentioned that you can move DynamoDB data into S3, but I am not sure what the options are for Cosmos DB. I understand there is a time-to-live option where the data will be deleted after a certain date, but I am interested in archiving rather than deleting. Any direction would be greatly appreciated. Thanks.
I don't think there is a single-click built-in feature in CosmosDB to achieve that.
Still, since you mentioned appreciating any direction, I suggest you consider the DocumentDB Data Migration Tool.
Notes about the Data Migration Tool:
You can specify a query to extract only the cold data (for example, by a creation date stored within the documents); see the sketch after these notes.
It supports exporting to various targets (a JSON file, blob storage, a database, another Cosmos DB collection, etc.).
It can compact the data in the process: documents can be merged into a single array document and zipped.
Once you have the configuration set up, you can script the export to be triggered automatically using your favorite scheduling tool.
You can easily reverse the source and target to restore the cold data to the active store (or to dev, test, backup, etc.).
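If you prefer scripting the same extract-by-date idea directly rather than using the Migration Tool, a minimal sketch with the azure-cosmos and azure-storage-blob Python packages could look like this (connection strings, names, and the date field are placeholders):

```python
# A sketch of the same idea as the Migration Tool's query-based export,
# assuming azure-cosmos and azure-storage-blob; all names are placeholders.
import json
from azure.cosmos import CosmosClient
from azure.storage.blob import BlobClient

cosmos = CosmosClient("https://myaccount.documents.azure.com:443/", credential="<key>")
container = cosmos.get_database_client("mydb").get_container_client("mycollection")

# Query only the cold documents, e.g. by a creationDate property you store yourself.
cold_docs = list(container.query_items(
    query="SELECT * FROM c WHERE c.creationDate < @cutoff",
    parameters=[{"name": "@cutoff", "value": "2019-01-01"}],
    enable_cross_partition_query=True,
))

# Write the batch as a single JSON blob into cool/archive-tier storage.
blob = BlobClient.from_connection_string(
    "<storage-connection-string>", container_name="archive", blob_name="cold-2019.json")
blob.upload_blob(json.dumps(cold_docs), overwrite=True)
```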
To remove the exported data you could use the mentioned TTL feature, but that could cause data loss should your export step fail. I would suggest writing and executing a stored procedure to query and delete all exported documents with a single call. That stored procedure would not execute automatically, but it could be included in the automation script and executed only if the data was exported successfully first.
See: Azure Cosmos DB server-side programming: Stored procedures, database triggers, and UDFs.
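For the cleanup step, the automation script could invoke such a stored procedure only after a successful export; a sketch with the Python SDK, where the stored procedure name and partition key value are hypothetical:

```python
# A sketch, assuming azure-cosmos v4 and a pre-created stored procedure
# (the name "bulkDeleteOlderThan" is hypothetical) that deletes documents
# older than the supplied cutoff within one partition.
from azure.cosmos import CosmosClient

container = (CosmosClient("https://myaccount.documents.azure.com:443/", credential="<key>")
             .get_database_client("mydb")
             .get_container_client("mycollection"))

export_succeeded = True  # set by the preceding export step
if export_succeeded:
    result = container.scripts.execute_stored_procedure(
        sproc="bulkDeleteOlderThan",
        partition_key="tenant-1",
        params=["2019-01-01"],
    )
    print(result)
```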
UPDATE:
These days Cosmos DB has added the change feed. This really simplifies writing a carbon copy somewhere else.
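For example, with the Python SDK the change feed can be read and copied incrementally; a sketch, where the target write is left abstract:

```python
# A sketch of copying changes elsewhere via the change feed,
# assuming azure-cosmos v4; names are placeholders.
from azure.cosmos import CosmosClient

source = (CosmosClient("https://myaccount.documents.azure.com:443/", credential="<key>")
          .get_database_client("mydb")
          .get_container_client("mycollection"))

# Read the change feed from the beginning; in a real job you would persist
# the continuation token and poll periodically.
for doc in source.query_items_change_feed(is_start_from_beginning=True):
    archive_somewhere(doc)  # hypothetical helper that writes the carbon copy
```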

DynamoDB limitations when deploying MoonMail

I'm trying to deploy MoonMail on AWS. However, I receive this exception from CloudFormation:
Subscriber limit exceeded: Only 10 tables can be created, updated, or deleted simultaneously
Is there another way to deploy without opening a support case and asking them to remove my limit?
This is an AWS limit for APIs: (link)
API-Specific Limits
CreateTable/UpdateTable/DeleteTable
In general, you can have up to 10 CreateTable, UpdateTable, and DeleteTable requests running simultaneously (in any combination). In other words, the total number of tables in the CREATING, UPDATING or DELETING state cannot exceed 10.
The only exception is when you are creating a table with one or more secondary indexes. You can have up to 5 such requests running at a time; however, if the table or index specifications are complex, DynamoDB might temporarily reduce the number of concurrent requests below 5.
You could try to open a support request with AWS to raise this limit for your account, but I don't feel this is necessary. It seems that you could create the DynamoDB tables a priori, using the AWS CLI or an AWS SDK, and use MoonMail with read-only access to those tables. Using the SDK (example), you could create those tables sequentially, without hitting this simultaneous-creation limit.
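For example, with boto3 you can create each table and wait for it to become ACTIVE before creating the next one; a sketch, where the table names and key schema are placeholders rather than MoonMail's real definitions from s-resources-cf.json:

```python
# A sketch, assuming boto3; the table names and key schema below are
# placeholders, not MoonMail's actual table definitions.
import boto3

dynamodb = boto3.client("dynamodb", region_name="us-east-1")
tables = ["MoonMail-Campaigns", "MoonMail-Recipients"]  # ...plus the remaining tables

for name in tables:
    dynamodb.create_table(
        TableName=name,
        AttributeDefinitions=[{"AttributeName": "id", "AttributeType": "S"}],
        KeySchema=[{"AttributeName": "id", "KeyType": "HASH"}],
        ProvisionedThroughput={"ReadCapacityUnits": 1, "WriteCapacityUnits": 1},
    )
    # Wait until the table is ACTIVE so at most one table is ever in the
    # CREATING state, well below the 10-request concurrency limit.
    dynamodb.get_waiter("table_exists").wait(TableName=name)
```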
Another option is to edit the s-resources-cf.json file to include only 10 tables and deploy. After that, add the missing tables and deploy again.
Whatever solution you apply, consider creating an issue ticket in MoonMail's repo, because as it stands now, it does not work on the first try (there are 12 tables in the resources file).

How could Bosun fit my use case?

I need an alerting system where I can define my own metrics and thresholds to report anomalies (basically alerting on the basis of logs and data in a DB). I explored Bosun but am not sure how to make it work. I have the following issues:
There are pre-defined items which are all system level, but I couldn't find a way to add new items, i.e. custom items.
How will Bosun ingest data other than via scollector? As I understand it, could I use Logstash as a data source and skip OpenTSDB entirely (I really don't like the HBase dependency)?
By items I think you mean metrics. Bosun learns about metrics and their tag relationships when you do one of the following:
Relay opentsdb data through Bosun (http://bosun.org/api#sending-data)
Get copies of metrics sent to the api/index route http://bosun.org/api#apiindex
There are also metadata routes, which tell Bosun about the metric, such as whether it is a counter or gauge, its unit, and its description.
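For example, a custom metric (what you call an item) can be pushed directly to Bosun's OpenTSDB-compatible ingestion route referenced in the sending-data link above; a sketch, where the host, port 8070, and the metric/tag names are assumptions:

```python
# A sketch of pushing a custom metric to Bosun's OpenTSDB-compatible
# /api/put route (host, port, metric and tag names are assumptions).
import json
import time
import urllib.request

datapoint = {
    "metric": "myapp.orders.failed",   # your custom metric
    "timestamp": int(time.time()),     # Unix seconds
    "value": 3,
    "tags": {"host": "web01", "env": "prod"},
}
req = urllib.request.Request(
    "http://bosun.example.com:8070/api/put",
    data=json.dumps(datapoint).encode(),
    headers={"Content-Type": "application/json"},
)
urllib.request.urlopen(req)
```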
The Logstash datasource will be deprecated in favor of an Elastic datasource in the coming 0.5.0 release. The Elastic replacement is better (but requires ES 2+). To use those expressions, see the raw documentation (the bosun.org docs will be updated next release): https://raw.githubusercontent.com/bosun-monitor/bosun/master/docs/expressions.md. To add it, you would have something like the following in the config:
elasticHosts=http://ny-lselastic01.ds.stackexchange.com:9200,http://ny-lselastic02.ds.stackexchange.com:9200,http://ny-lselastic03.ds.stackexchange.com:9200
The functions to query various backends are only loaded into the expression library when the backend is configured.

Resources