Azure Data Factory - Can we order the CopyData source before ingestion - azure-cosmosdb

I have a case where I need to ingest CSV files into CosmosDb.
So I have one DataSet to process the CSV and another to prepare the Cosmos DB schema.
In the pipeline, I have a CopyData activity that maps from the CSV and then writes to Cosmos DB.
In the CopyData source parameter, I specify the Azure Blob Storage where the CSV files are stored.
Until now, there was no problem.
The thing is, I now need a way to ensure that the blobs are ingested in alphabetical order by file name.
Is there a way?

It's hard to sort by file names in ADF.
One way to achieve it:
Save all of your file names in a CSV file, use a Sort transformation in a Data Flow to sort and overwrite this file, then use a Lookup activity and a ForEach activity to copy the blobs to Cosmos DB.
Another way:
Pass the childItems array from the Get Metadata activity's output to an Azure Function, sort the file names inside the function, then loop over the function's output with a ForEach activity and copy the blobs to Cosmos DB. A sketch of such a function follows.
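A minimal sketch of such a function (in-process .NET, HTTP trigger), assuming the pipeline posts the Get Metadata childItems array as the request body; the function name and the fileNames response property are illustrative assumptions, not part of the original answer:

```csharp
// Sketch: HTTP-triggered Azure Function that sorts the "childItems" file names alphabetically.
// Assumes the pipeline posts JSON like: [{ "name": "b.csv", "type": "File" }, { "name": "a.csv", "type": "File" }]
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Threading.Tasks;
using Microsoft.AspNetCore.Http;
using Microsoft.AspNetCore.Mvc;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.Http;
using Newtonsoft.Json;

public static class SortFileNames
{
    // Shape of one entry in Get Metadata's childItems output.
    public class ChildItem
    {
        public string name { get; set; }
        public string type { get; set; }
    }

    [FunctionName("SortFileNames")]
    public static async Task<IActionResult> Run(
        [HttpTrigger(AuthorizationLevel.Function, "post")] HttpRequest req)
    {
        string body = await new StreamReader(req.Body).ReadToEndAsync();
        var items = JsonConvert.DeserializeObject<List<ChildItem>>(body) ?? new List<ChildItem>();

        // Keep only files and sort their names alphabetically.
        var sorted = items
            .Where(i => i.type == "File")
            .Select(i => i.name)
            .OrderBy(n => n, StringComparer.OrdinalIgnoreCase)
            .ToList();

        // Return a JSON object (not a bare array) so the result is easy to consume from the pipeline.
        return new OkObjectResult(new { fileNames = sorted });
    }
}
```

The ForEach activity can then iterate over the sorted list in the Azure Function activity's output and run the copy once per file.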

Related

How to ingest CSV data into Kusto

I need to ingest the CSV data into Kusto using a Kusto query for a geographical map visualization, but I couldn't find any query to ingest CSV data. Please help me with this.
Thank you
Try the one-click ingestion wizard or the ingest-from-storage commands (these require the data to be in an Azure Blob or ADLS Gen2 file).
You can ingest using a query as well, as described here: https://learn.microsoft.com/en-us/azure/data-explorer/kusto/management/data-ingestion/ingest-from-storage
You will need to place your files in Azure Blob Storage or Azure Data Lake Storage Gen2.
However, the one-click ingestion wizard is advisable, as it ingests data via the Data Management component, which offers better queuing and ingestion orchestration than ingesting directly into the database/table using the command shared above.
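If you would rather queue the ingestion from code than use the wizard, a rough sketch with the Kusto.Ingest .NET client could look like this (the cluster URI, database, table, blob SAS URI, and AAD app credentials are placeholders, and exact class or method names may vary by package version):

```csharp
// Rough sketch: queue a CSV blob for ingestion with the Microsoft.Azure.Kusto.Ingest package.
// All names below are placeholders.
using System.Threading.Tasks;
using Kusto.Data;
using Kusto.Data.Common;
using Kusto.Ingest;

public static class IngestCsvBlob
{
    public static async Task RunAsync()
    {
        // Point at the cluster's ingestion endpoint (note the "ingest-" prefix).
        var kcsb = new KustoConnectionStringBuilder("https://ingest-mycluster.westeurope.kusto.windows.net")
            .WithAadApplicationKeyAuthentication("<appId>", "<appKey>", "<tenantId>");

        // A queued ingest client routes the work through the Data Management component.
        using (var ingestClient = KustoIngestFactory.CreateQueuedIngestClient(kcsb))
        {
            var properties = new KustoIngestionProperties("MyDatabase", "MyTable")
            {
                Format = DataSourceFormat.csv
            };

            // The blob must already be in Azure Blob Storage / ADLS Gen2;
            // pass a URI the service can read (e.g. including a SAS token).
            await ingestClient.IngestFromStorageAsync(
                "https://mystorage.blob.core.windows.net/container/data.csv?<sas-token>",
                properties);
        }
    }
}
```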

Can Cosmos DB read data from a file blob, CSV, or JSON file at a batch size?

I am currently researching reading data with Cosmos DB. Our current approach uses a .NET Core C# application with the Cosmos DB SDK to read all of the data from a blob, CSV, or JSON file, and then, in a for loop, pull each item's information from Cosmos DB one by one and compare/insert/update. This feels inefficient.
We're curious whether Cosmos DB can read a batch of data (say, 5,000 records) from a blob, CSV, or JSON file and, similar to SQL Server, do a bulk insert or merge directly within Cosmos DB. Basically, the goal is to avoid doing the same operation one by one for each item against Cosmos DB.
I've noticed and researched BulkExecutor as well; BulkUpdate looks like a more straightforward way of directly updating an item without considering whether it should be updated. In my case, for example, if I have 1,000 items and only 300 items' properties have changed, I only need to update those 300 items without touching the remaining 700. Basically, I need a way to have Cosmos DB do the data comparison over a collection rather than in a loop over each single item; it could either perform the update itself or output a collection that I can use for updating later.
Would a (.NET + SDK) application be able to do that, or could a Cosmos DB stored procedure handle a similar job? Any other Azure tool is welcome as well!
What you are looking for is the Cosmos DB Bulk Executor library.
It is designed to operate on millions of records in bulk and it is very efficient.
You can find the .NET documentation here.
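For illustration, the current .NET SDK (Microsoft.Azure.Cosmos v3) has this bulk support built in, so a minimal sketch of bulk upserts might look like the following (the endpoint, key, database/container names, and item shape are placeholders, not from the original answer):

```csharp
// Minimal sketch of bulk upserts with the Azure Cosmos DB .NET SDK v3 (Microsoft.Azure.Cosmos).
// Endpoint, key, database/container names, and the item shape are placeholders.
using System.Collections.Generic;
using System.Threading.Tasks;
using Microsoft.Azure.Cosmos;

public class Item
{
    public string id { get; set; }
    public string pk { get; set; }      // partition key value
    public string payload { get; set; }
}

public static class BulkUpsert
{
    public static async Task RunAsync(IEnumerable<Item> items)
    {
        // AllowBulkExecution makes the SDK group concurrent point operations into bulk batches.
        var client = new CosmosClient(
            "https://myaccount.documents.azure.com:443/",
            "<account-key>",
            new CosmosClientOptions { AllowBulkExecution = true });

        Container container = client.GetContainer("MyDatabase", "MyContainer");

        var tasks = new List<Task>();
        foreach (var item in items)
        {
            // Fire the upserts concurrently; the SDK batches them behind the scenes.
            tasks.Add(container.UpsertItemAsync(item, new PartitionKey(item.pk)));
        }
        await Task.WhenAll(tasks);
    }
}
```

Note that upserts write items unconditionally; if you truly want to touch only the ~300 changed documents, you still need to decide on the client side which items changed before issuing the writes.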

Can we insert data into the Firebase Realtime Database?

One child node of my Firebase Realtime Database has become huge (around 20 GB), and I need to purge it and insert last month's extracted data from the backup into the Realtime Database using the Python Admin SDK.
In the documentation, I see the following options:
set - Write or replace data to a defined path, like messages/users/<username>
update - Update some of the keys for a defined path without replacing all of the data
push - Add to a list of data in the database. Every time you push a new node onto a list, your database generates a unique key, like messages/users/<unique-user-id>/<username>
transaction - Use transactions when working with complex data that could be corrupted by concurrent updates
However, I want to add/insert the data from the Firebase backup. It has to be an insert because the app is used in production and I cannot afford to overwrite data.
Is there any method available to insert/add the data without overwriting existing data?
Any help/support is greatly appreciated.
There is no way to do this in Firebase Realtime Database without reading the current value of each location.
The only operation that allows you to update data based on its existing value is a transaction. A Firebase transaction gives you the (likely) current value at a location, and you then return what the new value should become.
But if the data you're restoring is (largely) the same as the data you have in the database, you might be able to use an update() call with sufficiently deep paths.

cosmosdb - archive data older than n years into cold storage

I researched in several places and could not find any direction on what options there are to archive old data from Cosmos DB into cold storage. I see that for DynamoDB in AWS it is mentioned that you can move DynamoDB data into S3, but I am not sure what the options are for Cosmos DB. I understand there is a time-to-live option where data is deleted after a certain date, but I am interested in archiving rather than deleting. Any direction would be greatly appreciated. Thanks.
I don't think there is a single-click, built-in feature in Cosmos DB to achieve that.
Still, since you said you'd appreciate any direction, I suggest you consider the DocumentDB Data Migration Tool.
Notes about the Data Migration Tool:
You can specify a query to extract only the cold data (for example, by a creation date stored within the documents).
It supports exporting to various targets (JSON file, blob storage, DB, another Cosmos DB collection, etc.).
It compacts the data in the process - it can merge documents into a single array document and zip it.
Once you have the configuration set up, you can script it to be triggered automatically using your favorite scheduling tool.
You can easily reverse the source and target to restore the cold data to the active store (or to dev, test, backup, etc.).
To remove the exported data you could use the TTL feature mentioned above, but that could cause data loss should your export step fail. I would suggest writing and executing a stored procedure that queries and deletes all exported documents in a single call. That SP would not execute automatically, but it could be included in the automation script and executed only if the data was exported successfully first.
See: Azure Cosmos DB server-side programming: Stored procedures, database triggers, and UDFs.
UPDATE:
These days Cosmos DB has added the change feed, which really simplifies writing a carbon copy somewhere else.
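As a rough illustration of that approach, a change feed processor in the .NET SDK can hand every changed document to an archival step (the container names, document type, and ArchiveAsync placeholder are assumptions):

```csharp
// Minimal sketch: use the change feed processor (Microsoft.Azure.Cosmos, .NET SDK v3)
// to copy every changed document to a cold store. Names and ArchiveAsync are placeholders.
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Azure.Cosmos;

public static class ChangeFeedArchiver
{
    public static async Task<ChangeFeedProcessor> StartAsync(CosmosClient client)
    {
        Container monitored = client.GetContainer("MyDatabase", "Items");
        // The lease container tracks progress; it must exist and be partitioned on /id.
        Container leases = client.GetContainer("MyDatabase", "leases");

        ChangeFeedProcessor processor = monitored
            .GetChangeFeedProcessorBuilder<dynamic>("coldArchiver", HandleChangesAsync)
            .WithInstanceName("archiver-host-1")
            .WithLeaseContainer(leases)
            .Build();

        await processor.StartAsync();
        return processor;
    }

    private static async Task HandleChangesAsync(
        IReadOnlyCollection<dynamic> changes, CancellationToken cancellationToken)
    {
        foreach (var doc in changes)
        {
            // Hypothetical archival step, e.g. append the document to a blob in cool/archive storage.
            await ArchiveAsync(doc, cancellationToken);
        }
    }

    private static Task ArchiveAsync(dynamic doc, CancellationToken ct) => Task.CompletedTask; // placeholder
}
```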

What is the best way to upload a CSV file into an MS SQL table?

Several approaches:
Use a SQL bulk-import stored procedure and call the stored proc with the file path.
Use SqlBulkCopy in the System.Data.SqlClient DLL.
Read the file line by line and then insert into the table row by row.
Any other ways?
Which one is best? I just want the user to select a file on an ASP.NET web page and then click an Upload button to store the file in the DB.
Secondly, do I need to move the file into the server's memory before the file is copied into the DB table?
The DB won't know about the file, because everything should be decoupled and layered; saving the file to some shared location just adds overhead, clean-up work, etc.
Yes
Surely you want this to be an atomic operation. Inserting row by row, 10k rows would mean 10k round trips with a client-side transaction running; if it's not atomic, you'd need staging tables and clean-ups.
Parse the CSV in C# and send it to the DB with a table-valued parameter (see the sketch below). Otherwise, there's probably nothing sane and/or realistic...
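A minimal sketch of that table-valued-parameter approach, assuming a user-defined table type dbo.CsvRowType and a stored procedure dbo.ImportCsvRows already exist on the SQL side (those names and the two-column CSV shape are illustrative only):

```csharp
// Sketch: parse a CSV into a DataTable and pass it to a stored procedure as a table-valued parameter.
// Assumes a SQL user-defined table type dbo.CsvRowType(Name nvarchar(100), Amount int)
// and a proc dbo.ImportCsvRows(@Rows dbo.CsvRowType READONLY) that does the insert/merge.
using System.Data;
using System.Data.SqlClient;
using System.IO;

public static class CsvUpload
{
    public static void Upload(string connectionString, Stream csvStream)
    {
        var table = new DataTable();
        table.Columns.Add("Name", typeof(string));
        table.Columns.Add("Amount", typeof(int));

        using (var reader = new StreamReader(csvStream))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                var parts = line.Split(',');   // naive parsing; use a CSV library for quoted fields
                table.Rows.Add(parts[0], int.Parse(parts[1]));
            }
        }

        using (var connection = new SqlConnection(connectionString))
        using (var command = new SqlCommand("dbo.ImportCsvRows", connection))
        {
            command.CommandType = CommandType.StoredProcedure;

            // One round trip; the proc can wrap its insert/merge in a single transaction.
            var tvp = command.Parameters.AddWithValue("@Rows", table);
            tvp.SqlDbType = SqlDbType.Structured;
            tvp.TypeName = "dbo.CsvRowType";

            connection.Open();
            command.ExecuteNonQuery();
        }
    }
}
```

If a plain insert into a single table is all you need, the SqlBulkCopy option from the list above works the same way: load the rows into a DataTable (or an IDataReader) and call WriteToServer.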
