There is plenty of ADX documentation available about streaming ingestion, but I can't find anything about streaming data out of Kusto. Even continuous export has a minimum limit on export frequency (I think 5 minutes), and that is far from being a streaming method of exporting. Is there a way to stream high-volume data out of ADX to Blob storage or ADLS Gen2?
Continuous export is the recommended approach for continuously exporting high volumes of data from Kusto, since the export is distributed. The minimum frequency is one minute; see the "frequency" section in the doc:
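For reference, setting up a continuous export at that one-minute minimum is done with a control command. A minimal sketch, issued here through the .NET admin client, is below; the export name, external table, and source table are placeholders, and the external table is assumed to be backed by your Blob/ADLS Gen2 storage (connectionString and databaseName come from your own configuration):
var kcsb = new KustoConnectionStringBuilder(connectionString);
using (var adminClient = KustoClientFactory.CreateCslAdminProvider(kcsb)) // Kusto.Data.Net.Client
{
    var command = @".create-or-alter continuous-export MyExport
        to table MyExternalTable
        with (intervalBetweenRuns=1m)
        <| MySourceTable";
    adminClient.ExecuteControlCommand(databaseName, command);
}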
For other exports of large amounts of data, use the applicable method in the SDKs. For example, here is the Java SDK, and here is the .NET sample:
// Enable streaming on the connection so query results are read as a stream
// rather than fully buffered in memory.
KustoConnectionStringBuilder kcsb = new KustoConnectionStringBuilder(connectionString);
kcsb.Streaming = true;
using (KustoDataContext context = new KustoDataContext(kcsb))
using (var reader = context.ExecuteQuery(query, requestProperties: requestProperties))
using (var csvStream = new CsvFromDataReaderJitStream(reader, leaveOpen: false, writeHeader: true))
{
    // consume stream here
}
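If the goal is to land that stream in Blob storage, one option is to upload it inside the using block above, in place of the "consume stream here" comment. A rough sketch, assuming an existing CloudBlobContainer named container from the Azure Storage client and a placeholder blob name:
// Stream the CSV into a block blob as it is read from Kusto.
var blob = container.GetBlockBlobReference("kusto-export/results.csv"); // placeholder blob name
await blob.UploadFromStreamAsync(csvStream);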
I've always used the Cloud Firestore database (and the old Realtime Database) to store text, and then used Cloud Storage for images.
While using SurveyJS and AngularFirestore, I discovered I can push binary files into and out of the Firestore database with the attached code. My question is: is this OK? I mean, it works great, but I don't want to incur a cost or a network slowdown... Thanks
var resultAsString = JSON.stringify(this.survey.data);
this.qs.saveSupplierQuestionnaire(this.companyid, this.id, this.survey.data);
...
saveSupplierQuestionnaire(userid: string, questionnaireid: string, questionnaireData: any) {
  var resultAsString = JSON.stringify(questionnaireData);
  var numCompleted = 0; /////test grading
  const dbRef = this.afs.collection<questionnaire>('companies/' + userid + '/questionnaires/').doc(questionnaireid).update({ results: resultAsString });
}
If it meets the needs of your application, then it's OK.
You should be aware that any time a document is read, the entire document is transferred to the client. So, even if you don't use the field with the binary data, you are going to make the user wait for the entire contents to be downloaded. This is true for all fields of a document, regardless of their types. There is really nothing special about binary fields, other than how the data is typed.
I am only able to send the timestamp, but not the actual time, to Firebase using the Arduino code below:
StaticJsonBuffer<200> jsonBuffer;
JsonObject& PIR1Object = jsonBuffer.createObject();
JsonObject& PIR1ONTime = PIR1Object.createNestedObject("timestamp");
PIR1Object["PIR_1_ON"] = 1;
PIR1ONTime[".sv"] = "timestamp";
Firebase.push("/sensor/PIR_1", PIR1Object);
What do you mean by "not actual time"? Can you share a screenshot of the results?
The method you have chosen will save the time in UNIX (epoch) format. For testing purposes you can convert the UNIX value to human-readable form using this site. In your app you may have to convert it, depending on your language or needs.
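Since the stored value is just milliseconds since the Unix epoch, the conversion on the app side is a one-liner. A small sketch in C#, assuming a .NET client has read the value back (the epoch value shown is made up):
// Firebase's {".sv": "timestamp"} resolves to milliseconds since the Unix epoch (UTC). Requires using System;
long epochMillis = 1560412800000; // example value read back from /sensor/PIR_1/timestamp
DateTimeOffset utc = DateTimeOffset.FromUnixTimeMilliseconds(epochMillis);
Console.WriteLine(utc);               // 2019-06-13 08:00:00 +00:00
Console.WriteLine(utc.ToLocalTime()); // the same instant in the machine's local time zone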
I have created a document with a PDF attachment using the code below, and it's working (I'm able to retrieve the attached file).
var myDoc = new { id = "42", Name = "Max", City = "Aberdeen" }; // this is the document you are trying to save
var attachmentStream = File.OpenRead("c:/Path/To/File.pdf"); // this is the document stream you are attaching
var client = await GetClientAsync();
var createUrl = UriFactory.CreateDocumentCollectionUri(DatabaseName, CollectionName);
Document document = await client.CreateDocumentAsync(createUrl, myDoc);
await client.CreateAttachmentAsync(document.SelfLink, attachmentStream, new MediaOptions()
{
    ContentType = "application/pdf", // your application type
    Slug = "78", // this is actually the attachment ID
});
I can upload a file directly to Blob storage and put that blob URL in the document.
Can anyone help me understand the value of the built-in attachment feature? How is it better than Blob storage and the other options? Where does Cosmos DB keep attachments?
I want to understand in which scenarios we should consider this option (I know about the 2 GB per account limitation).
Can anyone help me understand the value of the built-in attachment feature? How is it better than Blob storage and the other options?
Based on this official doc, you can get the answer to your question.
You can store two types of data:
1. binary blobs/media
2. metadata (for example, location, author, etc.) of media stored in remote media storage
In addition, attachments have a garbage collection mechanism, which I think is different from Azure Blob Storage.
Azure Cosmos DB will ensure to garbage collect the media when all of the outstanding references are dropped. Azure Cosmos DB automatically generates the attachment when you upload the new media and populates the _media to point to the newly added media. If you choose to store the media in a remote blob store managed by you (for example, OneDrive, Azure Storage, DropBox, etc.), you can still use attachments to reference the media. In this case, you will create the attachment yourself and populate its _media property.
So, per my understanding, if your resource data will be frequently added or deleted, you could consider using attachments. You just need to store the remote URL in the _media property, as in the sketch below.
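For that remote-media case, a minimal sketch with the same .NET SDK as in the question might look like this; the attachment ID and blob URL are made-up placeholders, and client/document are the objects from the snippet above:
// Attachment metadata that points at media you manage yourself (e.g. a blob you uploaded);
// the MediaLink value is what ends up in the attachment's _media property.
var attachment = new Attachment
{
    Id = "contract-pdf",             // hypothetical attachment ID
    ContentType = "application/pdf",
    MediaLink = "https://mystorageaccount.blob.core.windows.net/docs/contract.pdf" // hypothetical blob URL
};
await client.CreateAttachmentAsync(document.SelfLink, attachment);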
Where does Cosmos DB keep attachments?
An attachment is stored in the collection as a JSON document; it can be created, replaced, deleted, read, or enumerated easily using either the REST APIs or any of the client SDKs. As far as I know, it can't be displayed in the portal so far.
BTW, Azure Cosmos DB is usually more expensive than Blob storage, and I think cost is an important factor to consider. For more details, you could refer to the pricing doc.
Hope I'm clear on this.
I'm using Graphite/Grafana to record the sizes of all collections in a MongoDB instance. I wrote a simple (WIP) Python script to do so:
#!/usr/bin/python
from pymongo import MongoClient
import socket
import time
statsd_ip = '127.0.0.1'
statsd_port = 8125
# create a udp socket
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
client = MongoClient(host='12.34.56.78', port=12345)
db = client.my_DB
# get collection list each runtime
collections = db.collection_names()
sizes = {}
# main
while (1):
    # get collection size per name
    for collection in collections:
        sizes[collection] = db.command('collstats', collection)['size']
    # write to statsd
    for size in sizes:
        MESSAGE = "collection_%s:%d|c" % (size, sizes[size])
        sock.sendto(MESSAGE, (statsd_ip, statsd_port))
    time.sleep(60)
This properly shows all of my collection sizes in Grafana. However, I want to get a rate of change on these sizes, so I built the following Graphite query in Grafana:
derivative(statsd.myHost.collection_myCollection)
And the graph shows up totally blank. Any ideas?
FOLLOW-UP: When selecting a time range greater than 24h, all data similarly disappears from the graph. Can't for the life of me figure out that one.
Update: This was due to the fact that my collectd was configured to send samples every second. The statsd plugin for collectd, however, was receiving data every 60 seconds, so I ended up with None for most data points.
I discovered this by checking the raw data in Graphite by appending &format=raw to the end of a graphite-api query in a browser, which gives you the value of each data point as a comma-separated list.
The temporary fix for this was to wrap the Graphite query in keepLastValue(60). This, however, creates a stair-step graph, as each run of None values (60 of them) takes on the last valid value within 60 steps. Graphing a derivative of this then becomes a widely spaced sawtooth graph.
In order to fix this, I will probably go on to fix the flush interval on collectd or switch to a standalone statsd instance and configure as necessary from there.
I am developing a .NET web application that creates and manages EC2 instances programmatically. As of now, when I create new instances, the size of the disk volume is fixed: defined by the image (AMI), I believe.
I would like to predefine the size of the disk volume when creating a new instance so that I don't need to run a resize operation afterwards. Is that possible? Which would be the best approach?
I have a few ideas:
Define the volume size on the RunInstancesRequest object. But I think there is no such option.
Create a copy of the AMI image with a different disk size and use that one to request a new EC2 instance. Can this be done?
Any other/better ways?
In case that helps, I attach the code I currently use to request new instances:
var launchRequest = new RunInstancesRequest()
{
ImageId = amiID,
InstanceType = type,
MinCount = 1,
MaxCount = 1,
SecurityGroupIds = groups
};
var launchResponse = ec2Client.RunInstances(launchRequest);
var instances = launchResponse.Reservation.Instances;
var myInstance = instances.First();
You need to set the VolumeSize (an integer, in GiB) of the EbsBlockDevice in launchRequest.BlockDeviceMappings before launch; see the sketch after these notes.
Remember that if you specify a snapshot, the volume size must be equal to or larger than the snapshot size. Also, if you're creating the volume from a snapshot and don't specify a volume size, the default is the snapshot size.
TIP: Always check the Boolean value of DeleteOnTermination as well, and do not assume it defaults to true for root volumes as it does in the AWS console.
You can find out more about the EbsBlockDevice properties here.
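Putting that together with the launch code from the question, a rough sketch might look like the following; the device name, volume size, and volume type are assumptions to adapt to your AMI (requires Amazon.EC2.Model and System.Collections.Generic):
var launchRequest = new RunInstancesRequest()
{
    ImageId = amiID,
    InstanceType = type,
    MinCount = 1,
    MaxCount = 1,
    SecurityGroupIds = groups,
    BlockDeviceMappings = new List<BlockDeviceMapping>
    {
        new BlockDeviceMapping
        {
            DeviceName = "/dev/xvda",        // assumed root device name; check your AMI's actual root device
            Ebs = new EbsBlockDevice
            {
                VolumeSize = 100,            // desired size in GiB (must be >= the AMI snapshot size)
                VolumeType = VolumeType.Gp2, // optional
                DeleteOnTermination = true   // set explicitly rather than relying on a default
            }
        }
    }
};
var launchResponse = ec2Client.RunInstances(launchRequest);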