I managed to ingest data successfully using the code below:
var kcsbDM = new KustoConnectionStringBuilder(
        "https://test123.southeastasia.kusto.windows.net",
        "testdb")
    .WithAadApplicationTokenAuthentication(acquireTokenTask.AccessToken);

using (var ingestClient = KustoIngestFactory.CreateDirectIngestClient(kcsbDM))
{
    var ingestProps = new KustoQueuedIngestionProperties("testdb", "TraceLog");
    ingestProps.ReportLevel = IngestionReportLevel.FailuresOnly;
    ingestProps.ReportMethod = IngestionReportMethod.Queue;
    ingestProps.Format = DataSourceFormat.json;
    // generate data stream and column mapping
    ingestProps.IngestionMapping = new IngestionMapping() { IngestionMappings = columnMappings };
    var ingestionResult = ingestClient.IngestFromStream(memStream, ingestProps);
}
When I try to use the QueuedClient and IngestFromStreamAsync, the code executes successfully, but no data is ingested into the database even after 30 minutes:
var kcsbDM = new KustoConnectionStringBuilder(
        "https://ingest-test123.southeastasia.kusto.windows.net",
        "testdb")
    .WithAadApplicationTokenAuthentication(acquireTokenTask.AccessToken);

using (var ingestClient = KustoIngestFactory.CreateQueuedIngestClient(kcsbDM))
{
    var ingestProps = new KustoQueuedIngestionProperties("testdb", "TraceLog");
    ingestProps.ReportLevel = IngestionReportLevel.FailuresOnly;
    ingestProps.ReportMethod = IngestionReportMethod.Queue;
    ingestProps.Format = DataSourceFormat.json;
    // generate data stream and column mapping
    ingestProps.IngestionMapping = new IngestionMapping() { IngestionMappings = columnMappings };
    var ingestionResult = ingestClient.IngestFromStreamAsync(memStream, ingestProps);
}
Try running .show ingestion failures against the "https://test123.southeastasia.kusto.windows.net" endpoint and see if there are any ingestion errors.
Also, since you set the Queue reporting method, you can get the detailed result by reading from the queue:
ingestProps.ReportLevel = IngestionReportLevel.FailuresOnly;
ingestProps.ReportMethod = IngestionReportMethod.Queue;
(In the first example you used KustoQueuedIngestionProperties, but with the direct ingest client you should use KustoIngestionProperties. KustoQueuedIngestionProperties has additional properties, such as ReportLevel and ReportMethod, that are ignored by the direct ingest client.)
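For the direct client, a minimal sketch of what that could look like, reusing the memStream and columnMappings variables from your snippet (illustrative only, not tested code):
var directProps = new KustoIngestionProperties("testdb", "TraceLog")
{
    Format = DataSourceFormat.json,
    IngestionMapping = new IngestionMapping() { IngestionMappings = columnMappings }
};
ingestClient.IngestFromStream(memStream, directProps);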
For the queued example, could you please change the line to:
var ingestionResult = await ingestClient.IngestFromStreamAsync(memStream, ingestProps);
Also please note that queued ingestion has a batching stage of up to 5 minutes before the data is actually ingested (see the IngestionBatching policy). You can inspect the effective policy on your table with:
.show table TraceLog policy ingestionbatching
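If the default batching window is too long for your scenario, the policy can be tightened; a hedged example, where the thresholds below are illustrative assumptions rather than recommendations:
.alter table TraceLog policy ingestionbatching
'{"MaximumBatchingTimeSpan": "00:00:30", "MaximumNumberOfItems": 500, "MaximumRawDataSizeMB": 1024}'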
I finally found the reason: stream ingestion needs to be enabled on the table:
.alter table TraceLog policy streamingingestion enable
See the Azure documentation for details.
Enabling the streamingingestion policy is actually only needed if:
- streaming ingestion is turned on in the cluster (Azure portal), and
- the code is using CreateManagedStreamingIngestClient.
The ManagedStreamingIngestClient will first try stream-ingesting the data; if that fails a few times, it falls back to the queued client. If the data being ingested is small (under 4 MB), this client is the recommended choice; see the sketch below.
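A minimal sketch of creating such a client, reusing the engine and ingest (DM) connection string builders from the snippets above (the exact factory overload may vary between Kusto.Ingest versions, so treat this as an assumption):
var engineKcsb = new KustoConnectionStringBuilder("https://test123.southeastasia.kusto.windows.net", "testdb")
    .WithAadApplicationTokenAuthentication(acquireTokenTask.AccessToken);
var dmKcsb = new KustoConnectionStringBuilder("https://ingest-test123.southeastasia.kusto.windows.net", "testdb")
    .WithAadApplicationTokenAuthentication(acquireTokenTask.AccessToken);

using (var managedClient = KustoIngestFactory.CreateManagedStreamingIngestClient(engineKcsb, dmKcsb))
{
    var props = new KustoQueuedIngestionProperties("testdb", "TraceLog") { Format = DataSourceFormat.json };
    await managedClient.IngestFromStreamAsync(memStream, props);
}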
If using the queued client, you can try:
.show commands-and-queries | where StartedOn > ago(20m) and Text contains "{YourTableName}" and CommandType == "DataIngestPull"
This can show you the command that was executed; however, it can have a latency of more than 5 minutes.
Finally, you can check the status with any client you use. Do this:
StreamDescription description = new StreamDescription
{
    SourceId = Guid.NewGuid(),
    Stream = dataStream
};
Then you have the source id. Ingest by calling this:
var checker = await client.IngestFromStreamAsync(description, ingestProps);
After that, call:
var statusCheck = checker.GetIngestionStatusBySourceId(description.SourceId.Value);
This lets you figure out the status of the ingestion job. It's best wrapped in a separate task so you can keep polling every few seconds, for example, as in the sketch below.
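A minimal polling sketch along those lines; it assumes status reporting is enabled (e.g. ReportMethod = Table for the queued client) and reuses the checker and description variables from above:
var statusCheck = checker.GetIngestionStatusBySourceId(description.SourceId.Value);
while (statusCheck.Status == Status.Pending)
{
    // poll every few seconds until the ingestion reaches a terminal state
    await Task.Delay(TimeSpan.FromSeconds(5));
    statusCheck = checker.GetIngestionStatusBySourceId(description.SourceId.Value);
}
Console.WriteLine($"Ingestion finished with status: {statusCheck.Status}");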
I read an article about IAsyncEnumerable, more specifically regarding a Cosmos DB data source:
public async IAsyncEnumerable<T> Get<T>(string containerName, string sqlQuery)
{
    var container = GetContainer(containerName);
    using FeedIterator<T> iterator = container.GetItemQueryIterator<T>(sqlQuery);
    while (iterator.HasMoreResults)
    {
        foreach (var item in await iterator.ReadNextAsync())
        {
            yield return item;
        }
    }
}
I am wondering how Cosmos DB handles this, compared to paging, let's say 100 documents at a time. We have had some "429 - Request rate too large" errors in the past and I don't wish to create new ones.
So, how will this affect server load/performance?
I don't see a big difference from the server's perspective between the client streaming (and doing some quick checks) and the old way of getting all documents with while (iterator.HasMoreResults) and collecting the items in a list.
The SDK will retrieve batches of documents whose size can be adjusted using the QueryRequestOptions and changing the MaxItemCount (which defaults to 100 if not set). It has no option, though, to throttle the RU usage apart from running into the 429 error and using the retry mechanism the SDK offers to retry a while later. Depending on how generously you configure the retry mechanism, it'll retry often and long enough to get a proper response.
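For example, your Get<T> method could pass the page size through explicitly; a minimal sketch where the MaxItemCount of 500 is just an illustrative assumption:
public async IAsyncEnumerable<T> Get<T>(string containerName, string sqlQuery)
{
    var container = GetContainer(containerName);
    var options = new QueryRequestOptions { MaxItemCount = 500 }; // documents fetched per page/batch
    using FeedIterator<T> iterator = container.GetItemQueryIterator<T>(sqlQuery, requestOptions: options);
    while (iterator.HasMoreResults)
    {
        foreach (var item in await iterator.ReadNextAsync())
        {
            yield return item;
        }
    }
}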
If you have a situation where you want to limit the RU usage, e.g. there are multiple processes using your Cosmos DB and you don't want those to result in 429 errors, you would have to write the logic yourself.
An example of how something like that could look:
var qry = container
    .GetItemLinqQueryable<Item>(requestOptions: new() { MaxItemCount = 2000 })
    .ToFeedIterator();

var results = new List<Item>();
var stopwatch = new Stopwatch();
var targetRuMsRate = 200d / 1000; // target 200 RU/s
var previousElapsed = 0L;
var delay = 0;

stopwatch.Start();
var totalCharge = 0d;
while (qry.HasMoreResults)
{
    if (delay > 0)
    {
        await Task.Delay(delay);
    }

    previousElapsed = stopwatch.ElapsedMilliseconds;
    var response = await qry.ReadNextAsync();
    var charge = response.RequestCharge;
    var elapsed = stopwatch.ElapsedMilliseconds;
    var delta = elapsed - previousElapsed;

    delay = (int)((charge - targetRuMsRate * delta) / targetRuMsRate);
    foreach (var item in response)
    {
        results.Add(item);
    }
}
Edit:
Internally the SDK will call the underlying Cosmos REST API. Once your code reaches iterator.ReadNextAsync(), it calls the query documents method in the background. If you dig into the source code or intercept the message sent to HttpClient, you can observe that the resulting request lacks the x-ms-max-item-count header that determines the number of documents it will try to retrieve (unless you have specified a MaxItemCount yourself). According to the Microsoft docs it defaults to 100 if not set:
Query requests support pagination through the x-ms-max-item-count and x-ms-continuation request headers. The x-ms-max-item-count header specifies the maximum number of values that can be returned by the query execution. This can be between 1 and 1000, and is configured with a default of 100.
I am trying to get transfer-related data from Stripe:
TransferService service = new TransferService();
TransferListOptions stripeTransferList = new TransferListOptions
{
    Destination = accountId,
    Limit = 100
};
var list = await service.ListAsync(stripeTransferList);
var finalData = list.FirstOrDefault(x => x.DestinationPaymentId == paymentId);
So when I try to search for paymentId in that list, I am not able to find it, because the page limit is only 100:
Limit = 100
How can I fetch all the data and filter from that?
The stripe-dotnet library supports automatic pagination and it's documented here.
This lets you paginate through all the data in your account based on specific criteria that you passed as parameters. For example you can list all Transfer objects made to a specific connected account this way:
TransferService service = new TransferService();
TransferListOptions listOptions = new TransferListOptions
{
    Destination = "acct_123456",
    Limit = 100
};
foreach (var transfer in service.ListAutoPaging(listOptions))
{
    // Do something with this Transfer
}
Now, this allows you to iterate over every Transfer but if you have a lot of data this could be quite slow. An alternative would be to start from the charge id, the py_123456 that you have from your connected account. If you know which account this charge was created on, you can fetch it directly via the API. This is done using the Retrieve Charge API along with passing the connected account id as documented here.
The Charge resource has the source_transfer property which is the id of the Transfer (tr_123) from the platform that created this charge. You can also use the Expand feature, which lets you fetch the entire Transfer object back to get some detailed information about it.
The code would look like this:
var chargeOptions = new ChargeGetOptions();
chargeOptions.AddExpand("source_transfer");
var requestOptions = new RequestOptions();
requestOptions.StripeAccount = "acct_12345";
var service = new ChargeService();
Charge charge = service.Get("py_123456", chargeOptions, requestOptions);

// Access information about the charge or the associated transfer
var transferId = charge.SourceTransfer.Id;
var transferAmount = charge.SourceTransfer.Amount;
I'm now implementing a program to migrate a large amount of data to ADX based on the Ingest from Storage feature of ADX, and I need to check the status of each ingestion request when it finishes, but I'm facing an issue.
Based on the MS documentation here:
If I set persistDetails = true, for example with the command below, it should save the ingestion status, but currently this setting seems not to work (with or without it):
.ingest async into table MigrateTable
(
h'correct blob url link'
)
with (
jsonMappingReference = 'table_mapping',
format = 'json',
persistDetails = true
)
The above command returns an OperationId, and when I use it to check the operation status after the ingest task finishes, I always get this error message:
Error An admin command cannot be executed due to an invalid state: State='Operation 'DataIngestPull' does not persist its operation results' clientRequestId: KustoWebV2;
Can someone clarify the root cause of this for me? To me it seems like a bug in ADX.
Ingesting data directly against the Data Engine, by running .ingest commands, is usually not recommended, compared to using Queued Ingestion (motivation included in the link). Using Kusto's ingestion client library allows you to track the ingestion status.
Some tools/services already do that for you, and you can consider using them directly, e.g. LightIngest or Azure Data Factory.
If you don't follow the first option, you can still look up the state/status of your command using the operation ID you get when using the async keyword, by using .show operations.
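For example (the placeholder below should be replaced with the GUID returned by your .ingest async command):
.show operations <operation-id>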
You can also use the client request ID to filter the result set of .show commands to view the state/status of your command.
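A hedged example, assuming the client request ID is surfaced in the ClientActivityId column of the .show commands output:
.show commands
| where ClientActivityId contains "<your-client-request-id>"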
If you're interested in looking specifically at failures, .show ingestion failures is also available for you.
The persistDetails option you specified in your .ingest command actually has no effect - as mentioned in the docs:
Not all control commands persist their results, and those that do usually do so by default on asynchronous executions only (using the async keyword). Please search the documentation for the specific command and check if it does (see, for example data export).
============ Update: sample code following the suggestion from Yoni ============
It turns out another member of my team had messed up the access rights in ADX; after fixing that, everything works fine.
I just have one concern related to PartiallySucceeded that needs clarification from @yoni or someone with better knowledge of it.
try
{
    var ingestProps = new KustoQueuedIngestionProperties(model.DatabaseName, model.IngestTableName)
    {
        ReportLevel = IngestionReportLevel.FailuresAndSuccesses,
        ReportMethod = IngestionReportMethod.Table,
        FlushImmediately = true,
        JSONMappingReference = model.IngestMappingName,
        AdditionalProperties = new Dictionary<string, string>
        {
            { "jsonMappingReference", $"{model.IngestMappingName}" },
            { "format", "json" }
        }
    };

    var sourceId = Guid.NewGuid();
    var clientResult = await IngestClient.IngestFromStorageAsync(model.FileBlobUrl, ingestProps, new StorageSourceOptions
    {
        DeleteSourceOnSuccess = true,
        SourceId = sourceId
    });

    var ingestionStatus = clientResult.GetIngestionStatusBySourceId(sourceId);
    while (ingestionStatus.Status == Status.Pending)
    {
        await Task.Delay(WaitingInterval);
        ingestionStatus = clientResult.GetIngestionStatusBySourceId(sourceId);
    }

    if (ingestionStatus.Status == Status.Succeeded)
    {
        return true;
    }

    LogUtils.TraceError(_logger, $"Error when ingest blob file events, error: {ingestionStatus.ErrorCode.FastGetDescription()}");
    return false;
}
catch (Exception e)
{
    return false;
}
I have an error when I'm trying to use BulkExecutor to update one of the properties in Cosmos DB. The error message is: "Index was out of range. Must be non-negative and less than the size of the collection. Parameter name: index"
Important point: I don't have a partition key defined on my collection.
Here is my code:
SetUpdateOperation<string> player1NameUpdateOperation = new SetUpdateOperation<string>("Player1Name", name);

var updateOperations = new List<UpdateOperation>();
updateOperations.Add(player1NameUpdateOperation);

var updateItems = new List<UpdateItem>();
foreach (var match in list)
{
    string id = match.id;
    updateItems.Add(new UpdateItem(id, null, updateOperations));
}

var executor = new Microsoft.Azure.CosmosDB.BulkExecutor.BulkExecutor(_client, _collection);
await executor.InitializeAsync();

var executeResult = await executor.BulkUpdateAsync(updateItems);
var count = executeResult.NumberOfDocumentsUpdated;
What am I missing?
If I run the bulk executor on a collection without a partition key, I get the same error. If I run it against a collection that does have one and I specify it, the bulk executor works fine.
Pretty sure they just don't support it right now through the BulkExecutor API; just use the normal Cosmos API for updating the document as a workaround for now, e.g. along the lines of the sketch below.
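A minimal sketch of that workaround using the v2 SDK (Microsoft.Azure.DocumentDB) that BulkExecutor builds on; the database/collection names and the read-modify-replace flow here are illustrative assumptions, not your exact schema:
foreach (var match in list)
{
    // Read the document, set the property, and replace it (no partition key needed on this collection)
    var docUri = UriFactory.CreateDocumentUri("MyDatabase", "MyCollection", match.id);
    var response = await _client.ReadDocumentAsync(docUri);
    Document doc = response.Resource;
    doc.SetPropertyValue("Player1Name", name);
    await _client.ReplaceDocumentAsync(docUri, doc);
}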
We use the following code to communicate with DynamoDB:
using (var client = AWSClientFactory.CreateAmazonDynamoDBClient(RegionEndpoint.USEast1))
{
    var table = Table.LoadTable(client, "Users");
    var item = await table.GetItemAsync(id);
}
How can I force AWS.SDK for .NET to always use SSL connections? This is very important as we want to store user data including passwords in DynamoDB.
Here is a code snippet that will allow you to set the endpoint explicitly:
AmazonDynamoDBConfig config = new AmazonDynamoDBConfig();
config.ServiceURL = "https://dynamodb.us-east-1.amazonaws.com";
var client = new AmazonDynamoDBClient(config);
Feel free to replace the region (us-east-1) with any other AWS region.
This example is based on the documentation available here on setting the endpoint: https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/TestingDotNetApiSamples.html
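Alternatively, you can keep specifying the region and rely on the client config's UseHttp flag, which to my knowledge defaults to false (i.e. HTTPS); a hedged sketch:
// UseHttp = false (the default) makes the SDK use HTTPS against the regional endpoint
var config = new AmazonDynamoDBConfig
{
    RegionEndpoint = RegionEndpoint.USEast1,
    UseHttp = false
};
var client = new AmazonDynamoDBClient(config);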