Difference between "Download to a stream" and "Download from a stream" in Azure blob storage [duplicate] - asp.net

What is the difference between the OpenReadAsync and DownloadToStreamAsync methods of CloudBlockBlob in Azure Blob Storage? I searched Google but could not find an answer.

Both OpenReadAsync and DownloadToStreamAsync initiate an asynchronous operation to retrieve the blob's contents as a stream.
Based on my testing, the following sections should give you a better understanding of the two:
 
Basic Concepts
DownloadToStreamAsync: Initiates an asynchronous operation to download the contents of a blob to a stream.
OpenReadAsync: Initiates an asynchronous operation to download the contents of a blob to a stream.
 
Usage
a) DownloadToStreamAsync
Sample Code:
using (var fs = new FileStream(<yourLocalFilePath>, FileMode.Create))
{
    await blob.DownloadToStreamAsync(fs);
}
 
b) OpenReadAsync
Sample Code:
// Set the buffer for reading from a blob stream; the default value is 4MB.
blob.StreamMinimumReadSizeInBytes = 10 * 1024 * 1024; // 10MB
using (var blobStream = await blob.OpenReadAsync())
{
    using (var fs = new FileStream(localFile, FileMode.Create))
    {
        await blobStream.CopyToAsync(fs);
    }
}
Capturing Network Requests via Fiddler
a) DownloadToStreamAsync (screenshot omitted)
b) OpenReadAsync (screenshot omitted)
According to the captures above, DownloadToStreamAsync sends a single GET request to retrieve the blob, while OpenReadAsync sends multiple requests, sized according to the Blob.StreamMinimumReadSizeInBytes you have set (or its default value).

The difference between DownloadToStreamAsync and OpenReadAsync is that DownloadToStreamAsync will download the contents of the blob to the stream before returning, but OpenReadAsync will not trigger a download until the stream is consumed.
For example, if you use this to return a file stream from an ASP.NET Core service, you should use OpenReadAsync and not DownloadToStreamAsync:
Example with DownloadToStreamAsync (not recommended in this case):
Stream target = new MemoryStream(); // Could be FileStream
await blob.DownloadToStreamAsync(target); // Returns when streaming (downloading) is finished. This requires the whole blob to be kept in memory before returning!
_logger.Log(LogLevel.Debug, $"DownloadToStreamAsync: Length: {target.Length} Position: {target.Position}"); // Output: DownloadToStreamAsync: Length: 517000 Position: 517000
target.Position = 0; // Rewind before returning Stream:
return File(target, contentType: blob.Properties.ContentType, fileDownloadName: blob.Name, lastModified: blob.Properties.LastModified, entityTag: null);
Example with OpenReadAsync (recommended in this case):
// Do NOT put the stream in a using (or close it), as this will close the stream before ASP.NET finishes consuming it.
Stream blobStream = await blob.OpenReadAsync(); // Returns when the stream has been opened
_logger.Log(LogLevel.Debug, $"OpenReadAsync: Length: {blobStream.Length} Position: {blobStream.Position}"); // Output: OpenReadAsync: Length: 517000 Position: 0
return File(blobStream, contentType: blob.Properties.ContentType, fileDownloadName: blob.Name, lastModified: blob.Properties.LastModified, entityTag: null);

Answer from a member of Microsoft Azure (here):
The difference between DownloadStreamingAsync and OpenReadAsync is that the former gives you a network stream (wrapped with a few layers, but effectively think about it as a network stream) which holds on to a single connection, while the latter fetches the payload in chunks and buffers it, issuing multiple requests to fetch the content. Picking one over the other depends on the scenario: if the consuming code is fast and you have a good, broad network link to the storage account, then the former might be the better choice, as you avoid multiple request-response exchanges; but if the consumer is slow, then the latter might be a good idea, as it releases the connection back to the pool right after reading and buffering the next chunk. We recommend perf testing your app with both to reveal which is the best choice if it's not obvious.
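Note that the quote above refers to the newer Azure.Storage.Blobs (v12) client rather than CloudBlockBlob. A minimal sketch of the two approaches it compares, assuming an existing BlobClient named blobClient and a hypothetical local target path:

// DownloadStreamingAsync: one response stream held over a single connection.
using (BlobDownloadStreamingResult download = (await blobClient.DownloadStreamingAsync()).Value)
using (FileStream file = File.Create("blob-copy.bin"))
{
    await download.Content.CopyToAsync(file);
}

// OpenReadAsync: a stream that fetches the blob in buffered chunks,
// issuing multiple range requests as it is read.
using (Stream blobStream = await blobClient.OpenReadAsync())
using (FileStream file = File.Create("blob-copy.bin"))
{
    await blobStream.CopyToAsync(file);
}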

OpenReadAsync returns a Task<Stream>, so you use it with await.
Sample test method:
CloudBlobContainer container = GetRandomContainerReference();
try
{
    await container.CreateAsync();
    CloudBlockBlob blob = container.GetBlockBlobReference("blob1");
    using (MemoryStream wholeBlob = new MemoryStream(buffer))
    {
        await blob.UploadFromStreamAsync(wholeBlob);
    }
    using (MemoryStream wholeBlob = new MemoryStream(buffer))
    {
        using (var blobStream = await blob.OpenReadAsync())
        {
            await TestHelper.AssertStreamsAreEqualAsync(wholeBlob, blobStream);
        }
    }
}
DownloadToStreamAsync is a virtual method (it can be overridden) that returns a Task and takes a stream object as input.
Sample usage:
await blob.DownloadToStreamAsync(memoryStream);

Related

Server performance question about streaming from Cosmos DB

I read the article here about IAsyncEnumerable, more specifically in the context of a Cosmos DB data source:
public async IAsyncEnumerable<T> Get<T>(string containerName, string sqlQuery)
{
    var container = GetContainer(containerName);
    using FeedIterator<T> iterator = container.GetItemQueryIterator<T>(sqlQuery);
    while (iterator.HasMoreResults)
    {
        foreach (var item in await iterator.ReadNextAsync())
        {
            yield return item;
        }
    }
}
I am wondering how Cosmos DB handles this compared to paging, let's say 100 documents at a time. We have had some "429 - Request rate too large" errors in the past, and I don't wish to create new ones.
So, how will this affect server load/performance?
I don't see a big difference from the server's perspective between the client streaming (and doing some quick checks) and the old way: getting all documents with while (iterator.HasMoreResults) and collecting the items in a list.
The SDK will retrieve batches of documents whose size can be adjusted using QueryRequestOptions and changing MaxItemCount (which defaults to 100 if not set). It has no option, though, to throttle the RU usage, apart from running into the 429 error and using the retry mechanism the SDK offers to retry a while later. Depending on how generously you configure the retry mechanism, it will retry often and long enough to get a proper response; the retry behavior can be tuned when creating the client, as shown in the sketch below.
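If you only need to make the SDK's built-in 429 handling more forgiving (rather than throttling yourself), a minimal sketch of tuning those options when creating the client could look like this (the endpoint and key are placeholders):

var client = new CosmosClient(
    "https://<your-account>.documents.azure.com:443/",
    "<your-account-key>",
    new CosmosClientOptions
    {
        // How many times a request throttled with 429 is retried by the SDK
        MaxRetryAttemptsOnRateLimitedRequests = 9,
        // Total time the SDK may spend waiting/retrying a throttled request
        MaxRetryWaitTimeOnRateLimitedRequests = TimeSpan.FromSeconds(30)
    });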
If you have a situation where you want to limit the RU usage, for example because multiple processes use your Cosmos DB account and you don't want them to run into 429 errors, you would have to write the logic yourself.
An example of how something like that could look:
var qry = container
    .GetItemLinqQueryable<Item>(requestOptions: new() { MaxItemCount = 2000 })
    .ToFeedIterator();

var results = new List<Item>();
var stopwatch = new Stopwatch();
var targetRuMsRate = 200d / 1000; // target 200 RU/s
var previousElapsed = 0L;
var delay = 0;

stopwatch.Start();
var totalCharge = 0d;
while (qry.HasMoreResults)
{
    if (delay > 0)
    {
        await Task.Delay(delay);
    }
    previousElapsed = stopwatch.ElapsedMilliseconds;
    var response = await qry.ReadNextAsync();
    var charge = response.RequestCharge;
    var elapsed = stopwatch.ElapsedMilliseconds;
    var delta = elapsed - previousElapsed;
    delay = (int)((charge - targetRuMsRate * delta) / targetRuMsRate);
    foreach (var item in response)
    {
        results.Add(item);
    }
}
Edit:
Internally the SDK calls the underlying Cosmos REST API. Once your code reaches iterator.ReadNextAsync(), it calls the query documents method in the background. If you dig into the source code or intercept the message sent to HttpClient, you can observe that the resulting request lacks the x-ms-max-item-count header that determines the number of documents it will try to retrieve (unless you have specified a MaxItemCount yourself). According to the Microsoft docs, it defaults to 100 if not set:
Query requests support pagination through the x-ms-max-item-count and x-ms-continuation request headers. The x-ms-max-item-count header specifies the maximum number of values that can be returned by the query execution. This can be between 1 and 1000, and is configured with a default of 100.
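As a minimal sketch, mirroring the Get<T> method above, setting the page size explicitly (so the header is sent with a chosen value) only requires passing QueryRequestOptions to the iterator; container, T and sqlQuery are assumed from that snippet:

using FeedIterator<T> iterator = container.GetItemQueryIterator<T>(
    sqlQuery,
    requestOptions: new QueryRequestOptions { MaxItemCount = 50 }); // sends x-ms-max-item-count: 50

while (iterator.HasMoreResults)
{
    FeedResponse<T> page = await iterator.ReadNextAsync();
    // page.Count items were returned in this page, costing page.RequestCharge RUs
}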

How can I generate a URL to a new Application Insights query?

I have a process that generates Application Insights telemetry. I would like to provide a link to a query in Application Insights. However, it is not the same query every time; the parameters change. I know I can share a link to an existing query, but how do I generate such a link to a new query?
In the Application Insights query editor, there is an option called Copy link to query. The URL generated from this action has the following format:
https://portal.azure.com/##TENANT_ID/blade/Microsoft_Azure_Monitoring_Logs/LogsBlade/resourceId/%2Fsubscriptions%2FSUBSCRIPTION_ID%2FresourceGroups%2FRESOURCE_GROUP%2Fproviders%2Fmicrosoft.insights%2Fcomponents%2FAPPINSIGHTS_INSTANCE_NAME/source/LogsBlade.AnalyticsShareLinkToQuery/q/ENCODED_BASE64_KQL_QUERY/timespan/TIMESPAN
The placeholders (shown in uppercase) have the following values:
TENANT_ID: Your Tenant ID
SUBSCRIPTION_ID: Your Azure Subscription ID that contains the Application Insights instance.
RESOURCE_GROUP: Your Resource Group where the Application Insights instance is deployed.
APPINSIGHTS_INSTANCE_NAME: Your Application Insights instance Name.
ENCODED_BASE64_KQL_QUERY: Your query text, zipped, Base64-encoded, and then URL-encoded.
TIMESPAN: time filter for the query (optional).
If your query has fewer than 1600 characters, you can also replace the q parameter in the above URL with a query parameter, in which case the encoded string is simply your plain query text, URL-escaped (without zipping and Base64 encoding).
To build the URL dynamically, it's important to:
Take the text of your KQL query
Zip (gzip-compress) it
Encode it in Base64
URL-encode the result
C# code that does the encoding of the KQL query is shown below. Generate whatever query you want and pass it to this function to get the encoded Base64 string, which you can then append to the Application Insights base URL.
using System;
using System.IO;
using System.IO.Compression;
using System.Web;

static string Encodedbase64KQLQuery(string query)
{
    var bytes = System.Text.Encoding.UTF8.GetBytes(query);
    using (MemoryStream memoryStream = new MemoryStream())
    {
        // gzip-compress the query text
        using (GZipStream compressedStream = new GZipStream(memoryStream, CompressionMode.Compress, leaveOpen: true))
        {
            compressedStream.Write(bytes, 0, bytes.Length);
        }
        memoryStream.Seek(0, SeekOrigin.Begin);
        byte[] bytedata = memoryStream.ToArray();
        // Base64-encode the compressed bytes, then URL-encode the result
        string encodedBase64Query = Convert.ToBase64String(bytedata);
        return HttpUtility.UrlEncode(encodedBase64Query);
    }
}
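A hypothetical usage sketch combining the encoder with the URL template above (all IDs are placeholders, and the P1D timespan is just an example value):

// Placeholder values; substitute your own IDs
string tenantId = "<TENANT_ID>";
string subscriptionId = "<SUBSCRIPTION_ID>";
string resourceGroup = "<RESOURCE_GROUP>";
string appInsightsName = "<APPINSIGHTS_INSTANCE_NAME>";

string kql = "requests | where timestamp > ago(1h) | take 100";
string encodedQuery = Encodedbase64KQLQuery(kql);

string url =
    $"https://portal.azure.com/##{tenantId}/blade/Microsoft_Azure_Monitoring_Logs/LogsBlade" +
    $"/resourceId/%2Fsubscriptions%2F{subscriptionId}%2FresourceGroups%2F{resourceGroup}" +
    $"%2Fproviders%2Fmicrosoft.insights%2Fcomponents%2F{appInsightsName}" +
    $"/source/LogsBlade.AnalyticsShareLinkToQuery/q/{encodedQuery}/timespan/P1D";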
Please visit this blog which helped me a lot.
Thanks Delliganesh and Stefano from the blog link. Here is a simple JavaScript example. Be sure to replace all four constant values at the top, as well as the sessionId when calling the function. You can also tweak the query, but keep in mind the 1600-character limit described above and in the blog.
const APP_INSIGHTS_INSTANCE_NAME = "APP_INSIGHTS_INSTANCE_NAME";
const APP_INSIGHTS_RESOURCE_GROUP = "APP_INSIGHTS_RESOURCE_GROUP";
const APP_INSIGHTS_SUBSCRIPTION_ID = "APP_INSIGHTS_SUBSCRIPTION_ID";
const APP_INSIGHTS_TENANT_ID = "APP_INSIGHTS_TENANT_ID";

const getAppInsightsQueryUrl = ({ sessionId }) => {
  const query = `requests | where session_Id == "${sessionId}"`;
  const url = `https://portal.azure.com/##${APP_INSIGHTS_TENANT_ID}/blade/Microsoft_Azure_Monitoring_Logs/LogsBlade/resourceId/%2Fsubscriptions%2F${APP_INSIGHTS_SUBSCRIPTION_ID}%2FresourceGroups%2F${APP_INSIGHTS_RESOURCE_GROUP}%2Fproviders%2Fmicrosoft.insights%2Fcomponents%2F${APP_INSIGHTS_INSTANCE_NAME}/source/LogsBlade.AnalyticsShareLinkToQuery/query/${encodeURI(query)}/timespan/TIMESPAN`;
  return url;
};

getAppInsightsQueryUrl({
  sessionId: 'my-session-id',
});

How to make an HttpClient GetAsync wait for a webpage that loads data asynchronously?

I'm making a snippet that sends data to a website that analyses it and then sends back the results.
Is there any way to make my GetAsync wait until the website finishes its calculation before getting a "full response"?
P.S.: await will not know whether the requested page contains any asynchronous processing (e.g. XHR calls). I already use await and ReadAsByteArrayAsync()/ReadAsStringAsync().
Thank you!
You will need something like Selenium to not only fetch the HTML of the website but also fully render the page and execute any dynamic scripts.
You can then hook into some events, wait for certain DOM elements to appear or just wait some time until the page is fully initialized.
Afterwards you can use the API of Selenium to access the DOM and extract the information you need.
Example code:
using (var driver = new ChromeDriver(Path.GetDirectoryName(Assembly.GetExecutingAssembly().Location)))
{
    driver.Navigate().GoToUrl("https://automatetheplanet.com/multiple-files-page-objects-item-templates/");
    var link = driver.FindElement(By.PartialLinkText("TFS Test API"));
    var jsToBeExecuted = $"window.scroll(0, {link.Location.Y});";
    ((IJavaScriptExecutor)driver).ExecuteScript(jsToBeExecuted);
    var wait = new WebDriverWait(driver, TimeSpan.FromMinutes(1));
    var clickableElement = wait.Until(ExpectedConditions.ElementToBeClickable(By.PartialLinkText("TFS Test API")));
    clickableElement.Click();
}
Source: https://www.automatetheplanet.com/webdriver-dotnetcore2/
What you're looking for here is the await operator. According to the docs:
The await operator suspends evaluation of the enclosing async method until the asynchronous operation represented by its operand completes.
Sample use within the context of an HttpClient object:
public static async Task Main()
{
    // send the HTTP GET request
    var response = await httpClient.GetAsync("my-url");

    // get the response string
    // there are other `ReadAs...()` methods if the return type is not a string
    var getResult = await response.Content.ReadAsStringAsync();
}
Note that the method that encloses the await-ed code is marked as async and has a return type of Task (Task<T> would also work, depending on your needs).

IngestFromStreamAsync method does not work

I managed to ingest data successfully using the code below:
var kcsbDM = new KustoConnectionStringBuilder(
        "https://test123.southeastasia.kusto.windows.net",
        "testdb")
    .WithAadApplicationTokenAuthentication(acquireTokenTask.AccessToken);

using (var ingestClient = KustoIngestFactory.CreateDirectIngestClient(kcsbDM))
{
    var ingestProps = new KustoQueuedIngestionProperties("testdb", "TraceLog");
    ingestProps.ReportLevel = IngestionReportLevel.FailuresOnly;
    ingestProps.ReportMethod = IngestionReportMethod.Queue;
    ingestProps.Format = DataSourceFormat.json;

    // generate data stream and column mapping
    ingestProps.IngestionMapping = new IngestionMapping()
    {
        IngestionMappings = columnMappings
    };
    var ingestionResult = ingestClient.IngestFromStream(memStream, ingestProps);
}
When I try to use the queued client and IngestFromStreamAsync, the code executes successfully, but no data is ingested into the database, even after 30 minutes:
var kcsbDM = new KustoConnectionStringBuilder(
        "https://ingest-test123.southeastasia.kusto.windows.net",
        "testdb")
    .WithAadApplicationTokenAuthentication(acquireTokenTask.AccessToken);

using (var ingestClient = KustoIngestFactory.CreateQueuedIngestClient(kcsbDM))
{
    var ingestProps = new KustoQueuedIngestionProperties("testdb", "TraceLog");
    ingestProps.ReportLevel = IngestionReportLevel.FailuresOnly;
    ingestProps.ReportMethod = IngestionReportMethod.Queue;
    ingestProps.Format = DataSourceFormat.json;

    // generate data stream and column mapping
    ingestProps.IngestionMapping = new IngestionMapping()
    {
        IngestionMappings = columnMappings
    };
    var ingestionResult = ingestClient.IngestFromStreamAsync(memStream, ingestProps);
}
Try running .show ingestion failures on the "https://test123.southeastasia.kusto.windows.net" endpoint to see whether there are any ingestion errors.
Also, since you set the Queue reporting method, you can get the detailed result by reading from the status queue:
ingestProps.ReportLevel = IngestionReportLevel.FailuresOnly;
ingestProps.ReportMethod = IngestionReportMethod.Queue;
(In the first example you used KustoQueuedIngestionProperties; you should use KustoIngestionProperties there. KustoQueuedIngestionProperties has additional properties, such as ReportLevel and ReportMethod, that will be ignored by the direct ingest client.)
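To read those detailed results back from the status queue, a hedged sketch might look like the following; the method name PeekTopIngestionFailuresAsync is an assumption, so verify it against the version of Microsoft.Azure.Kusto.Ingest you use, and kcsbDM is the ingest-endpoint connection string builder from the question:

using (IKustoQueuedIngestClient ingestClient = KustoIngestFactory.CreateQueuedIngestClient(kcsbDM))
{
    // Assumed API: peek at recent ingestion failures reported to the status queue
    var failures = await ingestClient.PeekTopIngestionFailuresAsync(10);
    foreach (var failure in failures)
    {
        Console.WriteLine(failure); // inspect the details of each failed ingestion
    }
}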
Could you please change the line to:
var ingestionResult = await ingestClient.IngestFromStreamAsync(memStream, ingestProps);
Also please note that queued ingestion has a batching stage of up to 5 minutes before the data is actually ingested:
IngestionBatching policy
.show table ingestion batching policy
I finally found the reason: stream ingestion needs to be enabled on the table:
.alter table TraceLog policy streamingingestion enable
See the Azure documentation for details.
Enabling the streamingingestion policy is actually only needed if:
stream ingestion is turned on in the cluster (Azure portal), and
the code is using CreateManagedStreamingIngestClient.
The ManagedStreamingIngestClient will first try stream-ingesting the data; if that fails a few times, it will fall back to the queued client.
If the data being ingested is small (under 4MB), it's recommended to use this client; a sketch of creating one follows below.
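A hedged sketch of creating such a client; the factory method name CreateManagedStreamingIngestClient and the use of two connection string builders (engine endpoint plus ingest endpoint) are assumptions based on the SDK's pattern, and memStream/ingestProps come from the question:

// Engine ("query") endpoint and DM ("ingest-") endpoint of the same cluster
var engineKcsb = new KustoConnectionStringBuilder("https://test123.southeastasia.kusto.windows.net", "testdb")
    .WithAadApplicationTokenAuthentication(acquireTokenTask.AccessToken);
var dmKcsb = new KustoConnectionStringBuilder("https://ingest-test123.southeastasia.kusto.windows.net", "testdb")
    .WithAadApplicationTokenAuthentication(acquireTokenTask.AccessToken);

// Tries streaming ingestion first and falls back to queued ingestion after repeated failures
using (var managedClient = KustoIngestFactory.CreateManagedStreamingIngestClient(engineKcsb, dmKcsb))
{
    var ingestionResult = await managedClient.IngestFromStreamAsync(memStream, ingestProps);
}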
If you are using the queued client, you can try:
.show commands-and-queries | where StartedOn > ago(20m) and Text contains "{YourTableName}" and CommandType == "DataIngestPull"
This gives you the command that was executed; however, it can have latency of more than 5 minutes.
Finally, you can check the status with any client you use. Create a stream description with a source ID:
StreamDescription description = new StreamDescription
{
    SourceId = Guid.NewGuid(),
    Stream = dataStream
};
Then ingest using that description:
var checker = await client.IngestFromStreamAsync(description, ingestProps);
After that, call:
var statusCheck = checker.GetIngestionStatusBySourceId(description.SourceId.Value);
From this you can figure out the status of the ingestion job. It's best wrapped in a separate task, so you can keep checking every few seconds, for example.
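A minimal sketch of that polling loop, reusing checker and description from the snippets above:

var status = checker.GetIngestionStatusBySourceId(description.SourceId.Value);
while (status.Status == Status.Pending)
{
    await Task.Delay(TimeSpan.FromSeconds(5)); // poll every few seconds
    status = checker.GetIngestionStatusBySourceId(description.SourceId.Value);
}
Console.WriteLine($"Ingestion finished with status: {status.Status}");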

Ingest from storage with persistDetails = true does not save the ingest status result

I'm implementing a program to migrate a large amount of data to ADX based on the Ingest from Storage feature of ADX, and I need to check the status of each ingestion request when the request finishes, but I'm facing an issue.
Based on the MS documentation here:
If I set persistDetails = true, for example with the command below, it should save the ingestion status, but currently this setting does not seem to work (with or without it):
.ingest async into table MigrateTable
(
h'correct blob url link'
)
with (
jsonMappingReference = 'table_mapping',
format = 'json',
persistDetails = true
)
The above command returns an OperationId, and when I use it to check the ingestion status after the ingest task finishes, I always get this error message:
Error An admin command cannot be executed due to an invalid state: State='Operation 'DataIngestPull' does not persist its operation results' clientRequestId: KustoWebV2;
Can someone clarify the root cause of this? To me it seems like a bug in ADX.
1. Ingesting data directly against the Data Engine, by running .ingest commands, is usually not recommended compared to using queued ingestion (motivation is included in the link). Using Kusto's ingestion client library allows you to track the ingestion status.
2. Some tools/services already do that for you, and you can consider using them directly, e.g. LightIngest or Azure Data Factory.
3. If you don't follow option 1, you can still look for the state/status of your command using the operation ID you get when using the async keyword, by using .show operations.
4. You can also use the client request ID to filter the result set of .show commands to view the state/status of your command.
5. If you're interested in looking specifically at failures, .show ingestion failures is also available to you.
The persistDetails option you specified in your .ingest command actually has no effect - as mentioned in the docs:
Not all control commands persist their results, and those that do usually do so by default on asynchronous executions only (using the async keyword). Please search the documentation for the specific command and check if it does (see, for example data export).
Update: sample code following the suggestion from Yoni
It turns out another member of my team had messed up the access rights on ADX; after fixing that, everything works fine.
I just have one remaining concern related to PartiallySucceeded that needs clarification from Yoni or someone with better knowledge of it.
try
{
    var ingestProps = new KustoQueuedIngestionProperties(model.DatabaseName, model.IngestTableName)
    {
        ReportLevel = IngestionReportLevel.FailuresAndSuccesses,
        ReportMethod = IngestionReportMethod.Table,
        FlushImmediately = true,
        JSONMappingReference = model.IngestMappingName,
        AdditionalProperties = new Dictionary<string, string>
        {
            { "jsonMappingReference", $"{model.IngestMappingName}" },
            { "format", "json" }
        }
    };

    var sourceId = Guid.NewGuid();
    var clientResult = await IngestClient.IngestFromStorageAsync(model.FileBlobUrl, ingestProps, new StorageSourceOptions
    {
        DeleteSourceOnSuccess = true,
        SourceId = sourceId
    });

    var ingestionStatus = clientResult.GetIngestionStatusBySourceId(sourceId);
    while (ingestionStatus.Status == Status.Pending)
    {
        await Task.Delay(WaitingInterval);
        ingestionStatus = clientResult.GetIngestionStatusBySourceId(sourceId);
    }
    if (ingestionStatus.Status == Status.Succeeded)
    {
        return true;
    }
    LogUtils.TraceError(_logger, $"Error when ingest blob file events, error: {ingestionStatus.ErrorCode.FastGetDescription()}");
    return false;
}
catch (Exception e)
{
    return false;
}
