Ingest from storage with persistDetails = true does not save the ingestion status result - azure-data-explorer

I'm implementing a program to migrate a large amount of data to ADX, based on the Ingest from Storage feature of ADX, and I need to check the status of each ingestion request when it finishes, but I'm facing an issue.
Based on the MS documentation here:
If I set persistDetails = true, for example with the command below, it should save the ingestion status, but currently this setting does not seem to work (with or without it):
.ingest async into table MigrateTable
(
    h'correct blob url link'
)
with (
    jsonMappingReference = 'table_mapping',
    format = 'json',
    persistDetails = true
)
The above command returns an OperationId, and when I use it to check the ingestion status after the ingest task finishes, I always get this error message:
Error An admin command cannot be executed due to an invalid state: State='Operation 'DataIngestPull' does not persist its operation results' clientRequestId: KustoWebV2;
Can someone clarify the root cause of this for me? To me it looks like a bug in ADX.

Ingesting data directly against the Data Engine by running .ingest commands is usually not recommended, compared to using Queued Ingestion (the motivation is included in the link). Using Kusto's ingestion client library allows you to track the ingestion status.
Some tools/services already do that for you, and you can consider using them directly, e.g. LightIngest or Azure Data Factory.
If you don't follow option 1, you can still look up the state/status of your command, using the operation ID you get when using the async keyword, via .show operations (see the sketch after this list).
You can also use the client request ID to filter the result set of .show commands to view the state/status of your command.
If you're interested in looking specifically at failures, .show ingestion failures is also available for you.
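For example, here is a minimal C# sketch of polling .show operations with the Kusto.Data admin client. The cluster URL, database name, and operation ID below are placeholders, not values from the question:

using Kusto.Data;
using Kusto.Data.Net.Client;

// Placeholders: use your own cluster URL, database, and authentication.
var kcsb = new KustoConnectionStringBuilder("https://mycluster.kusto.windows.net", "MyDatabase")
    .WithAadUserPromptAuthentication();

// The GUID returned by the '.ingest async ...' command.
var operationId = Guid.Parse("00000000-0000-0000-0000-000000000000");

using var adminClient = KustoClientFactory.CreateCslAdminProvider(kcsb);
using var reader = adminClient.ExecuteControlCommand($".show operations {operationId}");
while (reader.Read())
{
    // The result set includes State and Status columns for the operation.
    Console.WriteLine($"{reader["State"]} - {reader["Status"]}");
}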
The persistDetails option you specified in your .ingest command actually has no effect - as mentioned in the docs:
Not all control commands persist their results, and those that do usually do so by default on asynchronous executions only (using the async keyword). Please search the documentation for the specific command and check if it does (see, for example data export).

============ Update: sample code following Yoni's suggestion ============
It turns out another member of my team had messed up the access rights on ADX; after fixing that, everything works fine.
I just have one concern, related to PartiallySucceeded, that needs clarification from @yoni or someone with better knowledge of it (see the note after the code).
try
{
    var ingestProps = new KustoQueuedIngestionProperties(model.DatabaseName, model.IngestTableName)
    {
        // Table-based reporting is what makes GetIngestionStatusBySourceId work below.
        ReportLevel = IngestionReportLevel.FailuresAndSuccesses,
        ReportMethod = IngestionReportMethod.Table,
        FlushImmediately = true,
        JSONMappingReference = model.IngestMappingName,
        AdditionalProperties = new Dictionary<string, string>
        {
            { "jsonMappingReference", $"{model.IngestMappingName}" },
            { "format", "json" }
        }
    };
    var sourceId = Guid.NewGuid();
    var clientResult = await IngestClient.IngestFromStorageAsync(model.FileBlobUrl, ingestProps, new StorageSourceOptions
    {
        DeleteSourceOnSuccess = true,
        SourceId = sourceId
    });
    // Poll until the ingestion leaves the Pending state.
    var ingestionStatus = clientResult.GetIngestionStatusBySourceId(sourceId);
    while (ingestionStatus.Status == Status.Pending)
    {
        await Task.Delay(WaitingInterval);
        ingestionStatus = clientResult.GetIngestionStatusBySourceId(sourceId);
    }
    if (ingestionStatus.Status == Status.Succeeded)
    {
        return true;
    }
    LogUtils.TraceError(_logger, $"Error when ingesting blob file events, error: {ingestionStatus.ErrorCode.FastGetDescription()}");
    return false;
}
catch (Exception e)
{
    // Log instead of silently swallowing the exception.
    LogUtils.TraceError(_logger, $"Exception when ingesting blob file events: {e}");
    return false;
}
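On the PartiallySucceeded concern: my understanding (an assumption on my part, not something confirmed in the docs quoted above) is that it means only part of the source data was ingested. The loop above treats anything other than Succeeded as a failure, so a branch along these lines would at least surface it explicitly:

switch (ingestionStatus.Status)
{
    case Status.Succeeded:
        return true;
    case Status.PartiallySucceeded:
        // Assumption: some records were ingested and some failed;
        // log it distinctly so it can be investigated or retried.
        LogUtils.TraceError(_logger, $"Ingestion partially succeeded for source {sourceId}");
        return false;
    default:
        LogUtils.TraceError(_logger, $"Ingestion failed, error: {ingestionStatus.ErrorCode.FastGetDescription()}");
        return false;
}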

Related

Is it a good / working practice to use Firebase's documentReference.get(GetOptions(source: cache)) in Flutter?

My issue was that, with the default GetOptions (omitting the parameter), a request like the following could take seconds, if not minutes, if the client is offline:
await docRef.get()...
If I check whether the client is offline and, in that case, purposefully change the Source to Source.cache, I get performance that is at least as good as, if not better than, when the client is online.
Source _source = Source.serverAndCache;
try {
  final result = await InternetAddress.lookup('example.com');
  if (result.isNotEmpty && result[0].rawAddress.isNotEmpty) {
    _source = Source.serverAndCache;
  }
} on SocketException catch (_) {
  _source = Source.cache;
}
and then use this variable in the following way:
docRef.get(GetOptions(source: _source))
.then(...
This code works perfectly for me now, but I am not sure whether there are any cases in which using it like this could raise issues.
It also seems like a lot of boilerplate code (I refactored it into a function so I can use it in any database method, but still...).
If there are no issues with this, why isn't this the Firebase default, given that after trying the server for an unpredictably long time it switches to cache anyway?

How to return failed task result in continuation task [duplicate]

This question already has an answer here:
How to return failed task result in continuation task?
(1 answer)
Closed 2 years ago.
I'm writing my first app in Kotlin and am using Firestore & Firebase Storage. In the process of deleting a document, I want to delete all files in Storage that the document references (as it is the only reference to them in my case). If the Storage delete fails, I want to abort the document delete, in order to avoid orphan files in my Storage. I also want to do everything in "one Task", to allow showing a progress bar properly. My simplified code looks like this:
fun deleteItem(id: String): Task<Void>? {
    val deleteTask = deleteItemMedia(id)
    return continueWithTaskOrInNew(deleteTask) { task ->
        if (task?.isSuccessful != false) { ... }
    }
}

fun deleteItemMedia(id: String): Task<Void>? =
    getItem(id)?.continueWithTask { task ->
        if (task.isSuccessful)
            task.result?.toObject(ItemModel::class.java)?.let { deleteFiles(it.media) }
        else ???
    }

fun deleteFiles(filesList: List<String>): Task<Void>? {
    var deleteTask: Task<Void>? = null
    for (file in filesList) deleteTask = continueWithTaskOrInNew(deleteTask) { task ->
        if (task?.isSuccessful != false) deleteFile(file)
        else task
    }
    return deleteTask
}

fun deleteFile(fileName: String) = Firebase.storage.getReferenceFromUrl(fileName).delete()

fun getItem(id: String): Task<DocumentSnapshot>? {
    val docRef = userDocRef?.collection(COLLECTION_PATH)?.document(id)
    return docRef?.get()
        ?.addOnCompleteListener { ... }
}

fun <ResultT, TContinuationResult> continueWithTaskOrInNew(
    task: Task<ResultT>?,
    continuation: (Task<ResultT>?) -> Task<TContinuationResult>?
) = task?.continueWithTask { continuation.invoke(task) } ?: continuation.invoke(null)

data class ItemModel(
    @DocumentId val id: String = "",
    var title: String = "",
    var media: List<String> = listOf()
)
My problem comes in the deleteItemMedia function (the "???" at the end). If the get task failed, I want to return a task that tells my deleteItem function to abort the deletion (task.isSuccessful == false). I cannot return the get task itself (replacing "???" with "task" in the code), because its type (Task<DocumentSnapshot>) differs from the type of the delete task (Task<Void>). I cannot return null, as null is returned in the case of no media at all, which is a valid case for me (the document should be deleted). Is there a way to create a new "failed Task"?
In the process of deleting a document I want to delete all files in Storage that the document references (as it is the only reference to them in my case).
There is no API that does that. You have to perform both delete operations yourself.
I also want to do everything in "one Task", to allow showing a progress bar properly.
Unfortunately, this is not possible in a single go. If you are thinking of an atomic operation, that is also not possible, because none of the Firebase services support this kind of cross-product transactional operation. What you need to do is: get the document, get the references to the files in Storage, delete the document, and as soon as the delete operation completes, delete the files. You can definitely reduce the risk by trying to roll back the data from the client, but you cannot make these operations atomic, in "one Task". At some point there will be an Exception that the client cannot roll back.
If the Storage delete fails, I want to abort the document delete, in order to avoid orphan files in my Storage.
To avoid that, first try not to have incomplete data. For instance, when you read the document and get the corresponding Storage URLs, don't blindly assume that all those files actually exist. A file can be unavailable for many reasons (it was previously deleted, the service is unavailable for some reason, etc.).
Another approach might be to use Cloud Functions for Firebase: delete the desired document and use an onDelete function to delete the corresponding files from Storage. Meaning, if the document delete fails, the files in Storage won't be deleted; if the document delete succeeds, the Cloud Function is triggered and the files are deleted from Storage. This approach drastically reduces the chance of a failure between the document delete operation and the deletion of the files from Storage, but it doesn't eliminate that chance 100%.
Besides that, the most common approach to avoid failures is to make your code as robust as you possibly can against failure and do frequent database cleanups.

IngestFromStreamAsync method does not work

I managed to ingest data successfully using the code below:
var kcsbDM = new KustoConnectionStringBuilder(
        "https://test123.southeastasia.kusto.windows.net", "testdb")
    .WithAadApplicationTokenAuthentication(acquireTokenTask.AccessToken);
using (var ingestClient = KustoIngestFactory.CreateDirectIngestClient(kcsbDM))
{
    var ingestProps = new KustoQueuedIngestionProperties("testdb", "TraceLog");
    ingestProps.ReportLevel = IngestionReportLevel.FailuresOnly;
    ingestProps.ReportMethod = IngestionReportMethod.Queue;
    ingestProps.Format = DataSourceFormat.json;
    // generate datastream and columnmapping
    ingestProps.IngestionMapping = new IngestionMapping() { IngestionMappings = columnMappings };
    var ingestionResult = ingestClient.IngestFromStream(memStream, ingestProps);
}
When I try to use the queued client and IngestFromStreamAsync, the code executes successfully, but no data is ingested into the database, even after 30 minutes:
var kcsbDM = new KustoConnectionStringBuilder(
        "https://ingest-test123.southeastasia.kusto.windows.net", "testdb")
    .WithAadApplicationTokenAuthentication(acquireTokenTask.AccessToken);
using (var ingestClient = KustoIngestFactory.CreateQueuedIngestClient(kcsbDM))
{
    var ingestProps = new KustoQueuedIngestionProperties("testdb", "TraceLog");
    ingestProps.ReportLevel = IngestionReportLevel.FailuresOnly;
    ingestProps.ReportMethod = IngestionReportMethod.Queue;
    ingestProps.Format = DataSourceFormat.json;
    // generate datastream and columnmapping
    ingestProps.IngestionMapping = new IngestionMapping() { IngestionMappings = columnMappings };
    var ingestionResult = ingestClient.IngestFromStreamAsync(memStream, ingestProps);
}
Try running .show ingestion failures on the "https://test123.southeastasia.kusto.windows.net" endpoint to see if there are ingestion errors.
Also, since you set the Queue reporting method, you can get the detailed result by reading from the queue:
ingestProps.ReportLevel = IngestionReportLevel.FailuresOnly;
ingestProps.ReportMethod = IngestionReportMethod.Queue;
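With these settings, the queued ingest client can read failure messages back from the status queue; a minimal sketch, assuming ingestClient is the IKustoQueuedIngestClient created above (exact property names may vary slightly between SDK versions):

// Peek at recently reported ingestion failures without removing them from the queue.
var failures = await ingestClient.PeekTopIngestionFailuresAsync(messagesLimit: 32);
foreach (var failure in failures)
{
    Console.WriteLine(failure.Info.Details);
}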
(In your first example you used KustoQueuedIngestionProperties; with the direct client you should use KustoIngestionProperties. KustoQueuedIngestionProperties has additional properties, such as ReportLevel and ReportMethod, that will be ignored by the direct ingest client.)
Could you please change the line to:
var ingestionResult = await ingestClient.IngestFromStreamAsync(memStream, ingestProps);
Also, please note that queued ingestion has a batching stage of up to 5 minutes before the data is actually ingested:
IngestionBatching policy
.show table <TableName> policy ingestionbatching
I finally found the reason: stream ingestion needs to be enabled on the table:
.alter table TraceLog policy streamingingestion enable
See the Azure documentation for details.
Enabling the streamingingestion policy is actually only needed if:
1. stream ingestion is turned on in the cluster (in the Azure portal), and
2. the code is using CreateManagedStreamingIngestClient.
The ManagedStreamingIngestClient will first try stream-ingesting the data; if that fails a few times, it falls back to the queued client.
If the data being ingested is small (under 4 MB), it's recommended to use this client; see the sketch below.
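A minimal sketch of creating such a client, reusing the cluster URLs from the question. In the SDK versions I've used, the factory takes both the engine and the ingest (DM) connection string builders; treat this as an assumption to verify against your SDK version:

var engineKcsb = new KustoConnectionStringBuilder(
        "https://test123.southeastasia.kusto.windows.net", "testdb")
    .WithAadApplicationTokenAuthentication(acquireTokenTask.AccessToken);
var dmKcsb = new KustoConnectionStringBuilder(
        "https://ingest-test123.southeastasia.kusto.windows.net", "testdb")
    .WithAadApplicationTokenAuthentication(acquireTokenTask.AccessToken);

// Tries streaming ingestion first; falls back to queued ingestion after repeated failures.
using (var managedClient = KustoIngestFactory.CreateManagedStreamingIngestClient(engineKcsb, dmKcsb))
{
    var ingestionResult = await managedClient.IngestFromStreamAsync(memStream, ingestProps);
}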
If you are using the queued client, you can try:
.show commands-and-queries | where StartedOn > ago(20m) and Text contains "{YourTableName}" and CommandType == "DataIngestPull"
This gives you the command that was executed; however, it can have a latency of more than 5 minutes.
Finally, with any client you use, you can check the status as follows. First build a stream description:
StreamDescription description = new StreamDescription
{
    SourceId = Guid.NewGuid(),
    Stream = dataStream
};
Now you have the source ID. Ingest by calling:
var checker = await client.IngestFromStreamAsync(description, ingestProps);
and after that, call:
var statusCheck = checker.GetIngestionStatusBySourceId(description.SourceId.Value);
to figure out the status of this ingestion job (note that table-level status reporting, as in the earlier update sample, is what makes this return meaningful results). It's better to wrap this in a separate task, so you can keep checking every few seconds, for example, as sketched below.
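A sketch of that polling loop, following the same pattern as the update sample earlier in this thread (the 5-second interval is an arbitrary choice):

// Poll the ingestion status by source ID until it leaves the Pending state.
var status = checker.GetIngestionStatusBySourceId(description.SourceId.Value);
while (status.Status == Status.Pending)
{
    await Task.Delay(TimeSpan.FromSeconds(5)); // check every few seconds
    status = checker.GetIngestionStatusBySourceId(description.SourceId.Value);
}
Console.WriteLine($"Final ingestion status: {status.Status}");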

Azure Mobile Service Sync with select clause not supported?

I've had a sync operation in place in my Xamarin Forms app for a long time now, and only in the past couple of weeks has it started throwing exceptions, which makes me think it may be a service change or something introduced in an update.
On startup I sync all data with my Azure Mobile Service using:
await this.client.SyncContext.PushAsync();
if (signedInUser != Guid.Empty)
{
    await this.MyTable.PullAsync(
        "myWorkoutsOnly",
        this.MyTable.CreateQuery().Select(u => u.UserId == signedInUser));
}
And as I say, I’ve never had an issue with this code. Now though, I’m getting:
System.ArgumentException: Pull query with select clause is not supported
I only want to sync the data that matches the signed in user, so is there another way to achieve this?
I only want to sync the data that matches the signed in user, so is there another way to achieve this?
Note that Where filters rows (which is what you want here), whereas Select projects columns. You could leverage the code below to achieve your purpose:
var queryName = $"incsync_{UserId}";
var query = table.CreateQuery()
.Where(u => u.UserId == signedInUser);
await table.PullAsync(queryName, query);
As for the Select method, you could use it to retrieve only specific properties into your local store, instead of all the properties in your online table, as follows:
var queryName = $"incsync:s:{typeof(T).Name}";
var query = table.CreateQuery()
.Select(r => new { r.Text, r.Complete, r.UpdatedAt, r.Version });
await table.PullAsync(queryName, query);
For more details, you could refer to Adrian Hall's book, in particular the chapter on Query Management.
UPDATE:
As Joagwa commented, you could change your server-side code to limit data retrieval to the logged-in user. For more details, refer to Data Projection and Queries > Per-User Data.

Asynchronous problems with IndexedDB

I am having a problem with a function that uses IndexedDB, where I need to change the status of some meetings. The search feature collects the ID of each checked meeting; then, in a for() loop, I iterate over the array that contains the IDs and, on each pass, access the database with the current ID. The following code is an example:
var val = [];
var checkbox = $('input:checkbox[class^=checkReunioes]:checked');
if (checkbox.length > 0) {
    checkbox.each(function() {
        val.push($(this).val());
    });
}
for (var i = 0; i < val.length; i++) {
    var transaction = db.transaction(["tbl_REUNIOES"], "readwrite").objectStore("tbl_REUNIOES");
    var request = transaction.get(val[i]);
    request.onerror = function(event) {
        alert("BAD");
    };
    request.onsuccess = function(event) {
        var data = request.result;
        data.FLG_STATU_REUNI = 'I';
        var codigo_igreja = localStorage.getItem("igreja");
        var dataJSON = JSON.stringify(data);
        enviarFilaSincronismo("tbl_REUNIOES", "U", dataJSON, " WHERE COD_IDENT_REUNI = '" + val[i] + "' and COD_IDENT_IGREJ = '" + codigo_igreja + "'");
        var requestUpdate = transaction.put(data);
        requestUpdate.onerror = function(event) {
            alert("OK");
        };
        requestUpdate.onsuccess = function(event) {
            $("#listReunioes").html("");
            serchAll(w_key_celula);
        };
    };
}
In my view, the problem occurs because IndexedDB is asynchronous: the loop moves on to the next lookup even before the first one has finished.
But what can I do about this?
What is the good practice for a case like this?
If you are inexperienced with writing asynchronous code, a good general rule to consider is to never define functions inside loops. Do not set request.onsuccess to a function from within the for loop.
You can perform multiple get and put requests on the same transaction when you do not expect the individual requests to fail for data-related reasons, such as the violation of a uniqueness constraint of an index, or because you are performing many thousands of requests on the same transaction and reaching processing limits.
You might find that using IDBObjectStore.prototype.openCursor together with IDBCursor.prototype.update is more convenient than using IDBObjectStore.prototype.get and IDBObjectStore.prototype.put.
Your example code indicates that a successful get request means that data was retrieved, when in fact, this is not what actually happens. A successful get request just means that a request occurred without errors (e.g. against an object store that exists, against a database that is not blocked by other requests, against a database connection that is still valid). It does not mean that an object matched your get request query. You should be checking for whether the request's result object is defined, and use that check as a determination of whether an object matched your get query, and not simply that a successful request occurred.
You might want to spend more time organizing your code into smaller functions that use clearer names. Your example code is difficult to read.
It looks like you are using some type of global db variable. If you are not well experienced with writing asynchronous code, avoid using a global db variable. There is no guarantee the db variable will be defined and open when you decide to access it, which could lead to an unexpected error.
