Kusto Streaming Ingest: ErrorReason=Not Found - azure-data-explorer

I'm trying to ingest data with client.IngestFromStreamAsync or client.IngestFromStream from my C# application, but I always get the following error:
MonitoredActivityContext=(ActivityType=KustoManagedStreamingIngestClient.IngestFromStream, Timestamp=2021-02-08T17:57:52.0547851Z, ParentActivityId=570e5455-1c3d-4cb4-82ff-a06761e66a30, TimeSinceStarted=1921,1077 [ms])IngestionSourceId=f182ab29-812a-4b44-9d06-1951c7aa972f
IngestionSource=Stream
Error=Not Found (404-NotFound): . This normally represents a permanent error, and retrying is unlikely to help.
Error details:
DataSource='https://table.southcentralus.kusto.windows.net/v1/rest/ingest/bla/table?streamFormat=json&mappingName=JsonMapping',
DatabaseName=,
ClientRequestId='KI.KustoManagedStreamingIngestClient.IngestFromStream.ad8c8892-7495-483d-90bc-8585483445fa;73864817-246c-479c-a2da-138aca01b9a2;f182ab29-812a-4b44-9d06-1951c7aa972f',
ActivityId='00000000-0000-0000-0000-000000000000,
Timestamp='2021-02-08T17:57:53.9596167Z'.
This is how I define the ingestion mapping:
var kustoIngestionProperties = new KustoIngestionProperties(databaseName: databaseName, tableName: rtable)
{
    Format = DataSourceFormat.json,
    IngestionMapping = new IngestionMapping()
    {
        IngestionMappingReference = "JsonMapping",
        IngestionMappingKind = Kusto.Data.Ingestion.IngestionMappingKind.Json
    }
};
Before referencing the mapping I create it like this:
.create table ingest_table ingestion json mapping 'JsonMapping' '[{"column":"Timestamp","Properties":{"path":"$.Timestamp"}},{"column":"AskRatioVar","Properties":{"path":"$.AskRatioVar"}},{"column":"score_BidRatioVar","Properties":{"path":"$.score_BidkRatioVar"}},]'
Any ideas what could cause the error?
All streaming examples seem to be outdated here: https://github.com/Azure/azure-kusto-samples-dotnet/tree/master/client/StreamingIngestionSample
Thank you

You should verify that the names of the database, table, and ingestion mapping you're passing actually exist in your cluster.
Specifically, in the activity you referenced, you reference an ingestion mapping named JsonAnomalyMapping1, whereas the table only has a mapping named JsonAnomalyMapping.
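As a minimal sketch of one way to check this from C# (assuming the Kusto.Data package, a placeholder cluster URI and database name, and AAD user-prompt authentication; swap in whatever your app actually uses), you can list the JSON ingestion mappings defined on the table and compare them against the name you pass as IngestionMappingReference:

using System;
using Kusto.Data;
using Kusto.Data.Common;
using Kusto.Data.Net.Client;

var kcsb = new KustoConnectionStringBuilder("https://<cluster>.southcentralus.kusto.windows.net")
    .WithAadUserPromptAuthentication();
var databaseName = "<your database>"; // placeholder

using (var adminClient = KustoClientFactory.CreateCslAdminProvider(kcsb))
// List every JSON ingestion mapping defined on the table
using (var reader = adminClient.ExecuteControlCommand(databaseName, ".show table ingest_table ingestion json mappings"))
{
    while (reader.Read())
    {
        // The name you pass as IngestionMappingReference must appear in the Name column
        Console.WriteLine(reader["Name"]);
    }
}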

Related

AWS Java SDK DynamoDB, how to get attribute values from ExecuteStatementRequest response?

I'm using the Java AWS SDK to query a DynamoDB table using ExecuteStatementRequest, but I don't know how to fetch the returned attribute values from the response.
Given I have the following query:
var response2 = client.executeStatement(ExecuteStatementRequest.builder()
    .statement("""
        UPDATE "my-table"
        SET thresholdValue= thresholdValue + 12.5
        WHERE assignmentId='item1#123#item2#456#item3#789'
        RETURNING ALL NEW *
        """)
    .build());
System.out.println(response2.toString());
System.out.println(response2.getValueForField("Items", Collections.class)); // Doesn't cast to
This query executes fine and returns the updated attributes as part of the response; however, I can't find a way to get these values out of the response object using Java.
How can I do that?
I have found how to do it; however, I'm not sure if this is the intended way, as the documentation doesn't provide any examples.
List items = response2.getValueForField("Items", List.class).get();
for (Object item : items) {
    var values = (Map<String, AttributeValue>) item;
    System.out.println(values.get("assignmentId").s());
    System.out.println(values.get("thresholdValue").n());
}

Ingest from storage with persistDetails = true does not save the ingest status result

I'm implementing a program to migrate a large amount of data to ADX based on the Ingest from Storage feature of ADX, and I need to check the status of each ingestion request when it finishes, but I'm facing an issue.
Based on the MS documentation here:
If I set persistDetails = true, for example with the command below, it should save the ingestion status, but currently this setting does not seem to work (with or without it):
.ingest async into table MigrateTable
(
    h'correct blob url link'
)
with (
    jsonMappingReference = 'table_mapping',
    format = 'json',
    persistDetails = true
)
The above command returns an OperationId, and when I use it to check the operation status after the ingest task finishes, I always get this error message:
Error An admin command cannot be executed due to an invalid state: State='Operation 'DataIngestPull' does not persist its operation results' clientRequestId: KustoWebV2;
Can someone clarify the root cause of this for me? To me it seems like a bug in ADX.
1. Ingesting data directly against the Data Engine, by running .ingest commands, is usually not recommended, compared to using Queued Ingestion (motivation included in the link). Using Kusto's ingestion client library allows you to track the ingestion status.
2. Some tools/services already do that for you, and you can consider using them directly, e.g. LightIngest, Azure Data Factory.
3. If you don't follow option 1, you can still look for the state/status of your command using the operation ID you get when using the async keyword, by using .show operations.
4. You can also use the client request ID to filter the result set of .show commands to view the state/status of your command.
5. If you're interested in looking specifically at failures, .show ingestion failures is also available for you.
The persistDetails option you specified in your .ingest command actually has no effect - as mentioned in the docs:
Not all control commands persist their results, and those that do usually do so by default on asynchronous executions only (using the async keyword). Please search the documentation for the specific command and check if it does (see, for example data export).
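As a minimal sketch of option 3 above (assuming the Kusto.Data admin client, a placeholder cluster URI, and that the operation ID below is whatever GUID your .ingest async command returned; the State/Status column names follow the documented .show operations output), you could poll .show operations until the command leaves the InProgress state:

using System;
using Kusto.Data;
using Kusto.Data.Common;
using Kusto.Data.Net.Client;

var kcsb = new KustoConnectionStringBuilder("https://<cluster>.kusto.windows.net")
    .WithAadUserPromptAuthentication();
var operationId = "<operation id returned by .ingest async>"; // placeholder

using (var adminClient = KustoClientFactory.CreateCslAdminProvider(kcsb))
{
    string state;
    do
    {
        // .show operations <id> reports the command's State (e.g. InProgress, Completed, Failed) and a Status message
        using (var reader = adminClient.ExecuteControlCommand($".show operations {operationId}"))
        {
            reader.Read();
            state = reader["State"].ToString();
            Console.WriteLine($"{state}: {reader["Status"]}");
        }
        if (state == "InProgress")
        {
            System.Threading.Thread.Sleep(TimeSpan.FromSeconds(10));
        }
    } while (state == "InProgress");
}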
============ Update: sample code following the suggestion from Yoni ============
It turns out another member of my team messed up the access rights on ADX; after fixing that, everything works fine.
I just have one concern related to PartiallySucceeded that needs clarification from #yoni or someone with better knowledge of it.
try
{
    var ingestProps = new KustoQueuedIngestionProperties(model.DatabaseName, model.IngestTableName)
    {
        ReportLevel = IngestionReportLevel.FailuresAndSuccesses,
        ReportMethod = IngestionReportMethod.Table,
        FlushImmediately = true,
        JSONMappingReference = model.IngestMappingName,
        AdditionalProperties = new Dictionary<string, string>
        {
            { "jsonMappingReference", $"{model.IngestMappingName}" },
            { "format", "json" }
        }
    };
    var sourceId = Guid.NewGuid();
    var clientResult = await IngestClient.IngestFromStorageAsync(model.FileBlobUrl, ingestProps, new StorageSourceOptions
    {
        DeleteSourceOnSuccess = true,
        SourceId = sourceId
    });
    var ingestionStatus = clientResult.GetIngestionStatusBySourceId(sourceId);
    while (ingestionStatus.Status == Status.Pending)
    {
        await Task.Delay(WaitingInterval);
        ingestionStatus = clientResult.GetIngestionStatusBySourceId(sourceId);
    }
    if (ingestionStatus.Status == Status.Succeeded)
    {
        return true;
    }
    LogUtils.TraceError(_logger, $"Error when ingest blob file events, error: {ingestionStatus.ErrorCode.FastGetDescription()}");
    return false;
}
catch (Exception e)
{
    return false;
}

How to fetch all records from Azure Cosmos DB using a query

I want to fetch more than 100 records from Azure Cosmos DB using a SELECT query.
I am writing a stored procedure and using a SELECT query to fetch the records.
This is my stored procedure -
function getall() {
    var context = getContext();
    var response = context.getResponse();
    var collection = context.getCollection();
    var collectionLink = collection.getSelfLink();
    var filterQuery = 'SELECT * FROM c';
    collection.queryDocuments(collectionLink, filterQuery, { pageSize: -1 },
        function(err, documents) {
            response.setBody(response.getBody() + JSON.stringify(documents));
        }
    );
}
Initially, it was working with a small amount of data in the database.
But with a large amount of data, the stored procedure throws this exception:
Encountered exception while executing function. Exception = Error: Resulting message would be too large because of "Body". Return from script with current message and use continuation token to call the script again or modify your script. Stack trace: Error: Resulting message would be too large because of "Body". Return from script with current message and use continuation token to call the script again or modify your script.
Document DB imposes limits on Response page size.
This link summarizes some of those limits:
Azure DocumentDb Storage Limits - what exactly do they mean?
You can paginate your data using continuation tokens. The DocumentDB SDK supports reading paginated data seamlessly.
https://azure.microsoft.com/en-us/blog/documentdb-paging-support-with-top-and-more-query-improvements/
Are you using the .NET SDK to retrieve the data returned by your stored procedure? If so, take advantage of HasMoreResults. It automatically fetches result pages of the allowed size, thus avoiding the error you posted. Loop through it until there are no more results.
http://www.kevinkuszyk.com/2016/08/19/paging-through-query-results-in-azure-documentdb/
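A minimal sketch of that pattern, assuming the classic Microsoft.Azure.DocumentDB SDK (DocumentClient) and placeholder account, key, database, and collection names (adjust them to your own):

using System;
using System.Collections.Generic;
using Microsoft.Azure.Documents.Client;
using Microsoft.Azure.Documents.Linq;

var client = new DocumentClient(new Uri("https://<account>.documents.azure.com:443/"), "<auth key>");
var collectionUri = UriFactory.CreateDocumentCollectionUri("<database>", "<collection>");

// Query page by page instead of building one oversized response; MaxItemCount = -1 lets the service pick the page size
var query = client.CreateDocumentQuery<dynamic>(
        collectionUri,
        "SELECT * FROM c",
        new FeedOptions { MaxItemCount = -1, EnableCrossPartitionQuery = true })
    .AsDocumentQuery();

var allDocuments = new List<dynamic>();
while (query.HasMoreResults)
{
    // Each call returns one page; the SDK carries the continuation token for you
    var page = await query.ExecuteNextAsync<dynamic>();
    allDocuments.AddRange(page);
}
Console.WriteLine($"Fetched {allDocuments.Count} documents");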

AddToSet operation requires a target array field

Trying to make use of Azure DocumentDB/CosmosDB using the MongoDB driver. I have learned that there are many limitations, as the full set of features is not currently implemented. I want to use aggregate functions, specifically $group, and .distinct, but I don't think those are available yet. As a workaround, I am trying to maintain a separate "tracking" document to enable "distinct". I am trying to update a document using $addToSet, but getting the following:
MongoError: Message: {"Errors":["Encountered exception while executing function. Exception = Error: AddToSet operation requires a target array field.\r\nStack trace: Error: AddToSet operation requires a target array field.\n at arrayAddToSet (__.sys.commonUpdate.js:2907:25)\n at handleUpdate (__.sys.commonUpdate.js:2649:29)\n at processOneResult (__.sys.commonUpdate.js:2484:25)\n at queryCallback (__.sys.commonUpdate.js:2461:21)\n at Anonymous function (__.sys.commonUpdate.js:619:29)"]}
The update command I am using:
var usersDocument = collection.updateOne(
    { "type": "users" },
    { $addToSet: { users: "someone#gmail.com" } },
    function(err, count, status) {
        console.log("updateOne err: " + err)
        console.log("updateOne count: " + count)
        console.log("updateOne status: " + status)
    }
)
This seems to me to be a pretty straightforward command, pulled from the Mongo documentation with the fields adjusted as needed. Maybe I am missing something really basic?
My ultimate goal was to make sure that my code was portable, so that I could move it into a Mongo cluster if I so desired (and not be locked into anything Azure-specific). To get started and not have to manage a multi-server cluster, Azure CosmosDB looked like a great jumpstart, but the limitations are maddening.
UPDATE:
Now that I have fixed my document and I actually have a field with an array, $addToSet is just replacing the value, rather than adding to the array. I'll create a new question for that.
Yup, something basic. The error message was actually correct. After inspecting the existing document, I found:
{ "users": "[]" }
And changed it to:
{ "users": [] }
Now it is working.

How to get the table name in an AWS DynamoDB trigger function?

I am new to AWS and working on creating a Lambda function in Python. The function will read the DynamoDB table stream and write to a file in S3, where the name of the file should be the name of the table.
Can someone please tell me how to get the table name from the trigger that is invoking the Lambda function?
Thanks for the help.
Since you mentioned you are new to AWS, I am going to answer descriptively.
I am assuming that you have set the 'Stream enabled' setting for your DynamoDB table to 'Yes', and have set this up as an event source for your Lambda function.
This is how I got the table name from the stream that invoked my Lambda function:
import json

def lambda_handler(event, context):
    print(json.dumps(event, indent=2))  # Shows what's in the event object
    for record in event['Records']:
        ddbARN = record['eventSourceARN']
        ddbTable = ddbARN.split(':')[5].split('/')[1]
        print("DynamoDB table name: " + ddbTable)
    return 'Successfully processed records.'
Basically, the event object, which contains all the information about the particular DynamoDB stream record responsible for this Lambda invocation, includes a parameter eventSourceARN. This eventSourceARN is the ARN (Amazon Resource Name) that uniquely identifies the DynamoDB table from which the event occurred.
This is a sample value for eventSourceARN -
arn:aws:dynamodb:us-east-1:111111111111:table/test/stream/2020-10-10T08:18:22.385
Notice the segment test in the ARN above; this is the table name you are looking for.
In the line ddbTable = ddbARN.split(':')[5].split('/')[1] above, I have tried to split the entire ARN by ':' first, and then by '/' in order to get the value test. Once you have this value, you can call S3 APIs to write to a file in S3 with the same name.
Hope this helps.
Please note that eventSourceARN is not always provided. From my testing today, I didn't see eventSourceARN present in the record. You can also refer to these links:
Issue: https://github.com/aws/aws-sdk-js/issues/2226
API: https://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_streams_Record.html
One way to do it is via pattern matching in Scala using a regex:
import scala.util.matching.Regex

val ddbArnRegex: Regex = """arn:aws:dynamodb:(.+):(.+):table/(.+)/stream/(.+)""".r

def parseTableName(ddbARN: String): Option[String] = {
  if (ddbARN == null) None
  else ddbARN match {
    case ddbArnRegex(_, _, table, _) => Some(table)
    case _ => None
  }
}
