I have an API that receives JSON calls, which I push to files (800KB-1MB, one per call), and I would like an hourly task that takes all of the JSON files from the last hour and combines them into a single file, to make daily/monthly analytics easier.
Each file consists of a collection of objects, i.e. in the format [ { property: value, ... }, ... ]. Because of this, I cannot simply concatenate the files, since the result would no longer be valid JSON (and if I add a comma between them, the file becomes a collection of collections). I would like to keep the memory footprint as low as possible, so I was looking at the example below and just pushing each file to the stream (deserializing the file with JsonConvert.DeserializeObject(fileContent)); however, doing this I end up with a collection of collections as well. I have also tried using a JArray instead of JsonConvert and merging into a list outside of the foreach, but that gives the same result. If I move the Serialize call outside the foreach it does work, but then I am worried about holding 4-6GB worth of items in memory.
In summary, I'm ending up with [ [ { property: value, ... }, ... ], ... [ { property: value, ... }, ... ] ], where my desired output would be [ { property: value, ... } (file1), ... { property: value, ... } (fileN) ].
using (FileStream fs = File.Open(@"C:\Users\Public\Documents\combined.json", FileMode.CreateNew))
{
    using (StreamWriter sw = new StreamWriter(fs))
    {
        using (JsonWriter jw = new JsonTextWriter(sw))
        {
            jw.Formatting = Formatting.None;
            JArray list = new JArray();
            JsonSerializer serializer = new JsonSerializer();
            foreach (IListBlobItem blob in blobContainer.ListBlobs(prefix: "SharePointBlobs/"))
            {
                if (blob.GetType() == typeof(CloudBlockBlob))
                {
                    var blockBlob = (CloudBlockBlob)blob;
                    var content = blockBlob.DownloadText();
                    var deserialized = JArray.Parse(content);
                    //deserialized = JsonConvert.DeserializeObject(content);
                    list.Merge(deserialized);
                    serializer.Serialize(jw, list);
                }
                else
                {
                    Console.WriteLine("Non-Block-Blob: " + blob.StorageUri);
                }
            }
        }
    }
}
In this situation, to keep your processing and memory footprints low, I think I would just concatenate the files one after the other even though it results in technically invalid JSON. To deserialize the combined file later, you can take advantage of the SupportMultipleContent setting on the JsonTextReader class and process the object collections through a stream as if they were one whole collection. See this answer for an example of how to do this.
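As a rough sketch of the reading side (assuming Json.NET and a hypothetical "combined.json" path, not your exact pipeline): SupportMultipleContent lets a single JsonTextReader walk several root-level arrays in the same stream, so memory stays bounded by one source file's array at a time.
using System;
using System.IO;
using Newtonsoft.Json;
using Newtonsoft.Json.Linq;

class CombinedReaderSketch
{
    static void Main()
    {
        using (var sr = new StreamReader("combined.json"))
        using (var reader = new JsonTextReader(sr))
        {
            // Allow multiple root-level JSON tokens (one array per original file).
            reader.SupportMultipleContent = true;

            var serializer = new JsonSerializer();
            while (reader.Read())
            {
                if (reader.TokenType == JsonToken.StartArray)
                {
                    // Each concatenated segment is one file's array; process its items.
                    var items = serializer.Deserialize<JArray>(reader);
                    foreach (var item in items)
                    {
                        Console.WriteLine(item.ToString(Formatting.None));
                    }
                }
            }
        }
    }
}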
I have a JSON model that contains strings instead of dates (the model is generated via T4TS, so I cannot change that).
The code currently uses an expanded model that extends the original JSON, where the dates are recalculated into new fields.
I was wondering whether it is possible to apply the filters on the string fields directly, without the additional step of extending the model.
private makeNumeric(label: string, property: string) {
    return {
        label: label,
        key: property,
        prepareDimension: (crossfilter) => (CrossfilterUtils.makeNumeric(crossfilter, property)),
        prepareGroup: (dimension) => {
            if (!this.values[property]) {
                var group = CrossfilterUtils.makeNumericGroup(dimension);
                this.values[property] = group;
            }
            return this.values[property];
        },
        valuesAreOrdinal: false
    };
}
I haven't used the crossfilter library much before, and looking at the documentation I can't seem to reconcile it with the code (legacy code, to put it that way).
The incoming date format looks like this: "2020-10-22T07:26:00Z"
The typescript model I'm working with is like this:
interface MyModel {
...
CreatedDate?: string;
}
Any idea?
The usual pattern in JavaScript is to loop through the data and do any conversions you need:
data.forEach(function(d) {
    d.date = new Date(d.date);
    d.number = +d.number;
});
const cf = crossfilter(data);
However, if this is not allowed due to TS, you can also make the conversions when creating your dimensions and groups:
const cf = crossfilter(data);
const dateDim = cf.dimension(d => new Date(d.date));
const monthGroup = dateDim.group(date => d3.timeMonth(date))
.reduceSum(d => +d.number);
I find this a little less robust because you have to remember to do this everywhere. It's a little harder to reason about the efficiency since you have to trust that crossfilter uses the accessors sparingly, but I don't recall seeing this be a problem in practice.
I am building a GUI in Qt Creator and I want to read a CSV file.
So far I can fetch the CSV via an HTTP request and display it as text with the following function:
// read CSV file
function readTextFile(filename) {
    var xhr = new XMLHttpRequest;
    xhr.open("GET", filename); // set method and file
    xhr.onreadystatechange = function() {
        if (xhr.readyState === XMLHttpRequest.DONE) { // if request_status == DONE
            var response = xhr.responseText;
            screen.liste = response;
            console.log(response);
        }
    };
    xhr.send(); // begin the request
}
Next, I am trying to access the individual entries of this "array".
Is there a way to split this list into individual strings?
The list has 50 rows and 18 columns, and the entries in one row are separated by ';'.
For example here are the first two rows:
P22;P64;P99;P20;P88;P18;50;90;80;90;40;0;10;0;40;80;60;20
P51;P44;P57;P46;P96;P10;20;40;50;80;20;60;50;80;0;30;10;50
...
Welcome to SO!
The String QML type extends the JS String object (see at https://doc.qt.io/qt-5/qml-qtqml-string.html#details), so you can just use the split() method to get the needed tokens in an array, which can be indexed with [].
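For example, a minimal sketch in plain JavaScript (usable from QML), assuming the response text from the readTextFile function above; the helper name and the logged indices are only illustrative:
function parseCsv(response) {
    var rows = response.trim().split("\n");   // one string per CSV row
    var table = [];
    for (var i = 0; i < rows.length; i++) {
        table.push(rows[i].split(";"));       // 18 fields per row, split on ';'
    }
    console.log(table[0][0]);  // "P22" for the sample data above
    console.log(table[1][6]);  // "20"
    return table;
}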
I’m relatively new to Azure Cosmos DB, and I am struggling with how to approach this problem due to some conflicting documentation.
I have a single container, with JSON data.
Each JSON document has a root-level array called opcos, which can contain N GUIDs (typically fewer than 5).
These opcos GUIDs are the IDs of child items, which are separate documents.
If a parent document links to a child, then I need to check the child for more children in its opcos node.
What's the best way to get all the related items? There could be approximately 100 related documents.
I need to keep each document separate, so I can't store them as sub-documents, as the link between parent and child is fluid across multiple parents.
I am looking for a recursive solution, and I am trying to do this from within Cosmos DB, as I am assuming that running potentially 100 calls from outside of Cosmos DB carries a performance overhead with all the connection handling etc.
Advice is welcomed. I took a snippet from another article and tried editing it, but it immediately errors on var context = getContext();
Also, any tips on debugging functions and stored procedures are welcome. I have 15 years of T-SQL behind me, but this is very different.
When I tried using a function in Cosmos DB it says
ReferenceError:
'getContext' is not defined
If I try the following code
var context = getContext();
var collection = context.getCollection();
function userDefinedFunction(id) {
    var context = getContext();
    var collection = context.getCollection();
    var metadataQuery = 'SELECT company.opcos FROM company where company.id in (' + id + ')';
    var metadata = collection.queryDocuments(collection.getSelfLink(), metadataQuery, {}, function (err, documents, options) {
        if (err) throw new Error('Error: ' + err.message);
        if (!documents || !documents.length) {
            throw new Error('Unable to find any documents');
        } else {
            var response = getContext().getResponse();
            /*for (var i = 0; i < documents.length; i++) {
                var children = documents[i]['$1'].Children;
                if (children.length) {
                    for (var j = 0; j < children.length; j++) {
                        var child = children[j];
                        children[j] = GetWikiChildren(child);
                    }
                }
            }*/
            response.setBody(documents);
        }
    });
}
The answer really comes down to your partitioning strategy.
First and foremost, your UDF doesn't run because UDFs don't have the execution context as part of their API. Your function will run, but you need to create it as a stored procedure, not a user-defined function.
Now keep in mind that stored procedures can be executed only against a single logical partition, and this is their transaction scope. Your technique will work as long as you pass an array of ids to the stored procedure and the documents you're manipulating are in the same partition. If they are not, then it's impossible to use a stored proc (well, except if you have one per document, which probably isn't worth it at this point).
On a side note, you want to parameterize the way you add the ids to the query to prevent potential SQL injection.
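As a rough sketch (the procedure name, the query shape, and the property names are illustrative rather than your exact schema), the lookup as a stored procedure with a parameterized query could look like this:
function getOpcos(ids) {
    var context = getContext();
    var collection = context.getCollection();
    var response = context.getResponse();

    // The ids array is passed as a query parameter instead of being
    // concatenated into the SQL string.
    var querySpec = {
        query: "SELECT c.id, c.opcos FROM c WHERE ARRAY_CONTAINS(@ids, c.id)",
        parameters: [{ name: "@ids", value: ids }]
    };

    var accepted = collection.queryDocuments(
        collection.getSelfLink(),
        querySpec,
        {},
        function (err, documents) {
            if (err) throw new Error("Error: " + err.message);
            if (!documents || !documents.length) {
                throw new Error("Unable to find any documents");
            }
            response.setBody(documents);
        }
    );

    if (!accepted) throw new Error("The query was not accepted by the server.");
}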
So I am using the sql.js library, i.e. the port of SQLite to JavaScript, which can be found here: https://github.com/kripken/sql.js.
This is my code to open and read the database, which comes from a flat file stored locally.
First, a local file is selected via this HTML:
<input type="file" id="input" onchange="handleFiles(this.files)">
The JS code behind the scenes is as follows:
function handleFiles(files) {
    var file = files[0];
    var reader = new FileReader();
    reader.readAsBinaryString(file);
    openDbOnFileLoad(reader);
    function openDbOnFileLoad(reader) {
        setTimeout(function () {
            if (reader.readyState == reader.DONE) {
                //console.log(reader.result);
                db = SQL.open(bin2Array(reader.result));
                execute("SELECT * FROM table");
            } else {
                //console.log("Waiting for loading...");
                openDbOnFileLoad(reader);
            }
        }, 500);
    }
}
function execute(commands) {
    commands = commands.replace(/\n/g, '; ');
    try {
        var data = db.exec(commands);
        console.log(data);
    } catch (e) {
        console.log(e);
    }
}
function bin2Array(bin) {
    'use strict';
    var i, size = bin.length, ary = [];
    for (i = 0; i < size; i++) {
        ary.push(bin.charCodeAt(i) & 0xFF);
    }
    return ary;
}
Now this works and I can access all the columns and values in the database; however, there is one column of type blob, and it just shows up as empty. Any ideas how I can access the contents of this blob?
The correct answer!
So what I was trying to ask in this question is simply how to read the contents of a column of type blob using sql.js. The correct answer is to specify the column names in the query and, for the column that contains blob data, get its contents using the hex function, i.e. select column1, hex(column2) from table. It was by no means a question about the most efficient way of doing this. I have also written a blog post about this.
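For illustration, a small sketch using the same SQL.open / db.exec API as in the question above; the table and column names are made up:
// Open the database from the bytes read off the local file (as in the question).
var db = SQL.open(bin2Array(reader.result));

// hex() makes the blob column come back as a hexadecimal string instead of empty.
var result = db.exec("SELECT column1, hex(column2) FROM mytable");
console.log(result);

// If the raw bytes are needed, convert the hex string back to a byte array.
function hexToBytes(hex) {
    var bytes = [];
    for (var i = 0; i < hex.length; i += 2) {
        bytes.push(parseInt(hex.substr(i, 2), 16));
    }
    return bytes;
}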
Here is a slightly modified copy of the function responsible for initializing my sqlite database:
sqlite.prototype._initQueryDb = function(file, callback) {
    var self = this;
    var reader = new FileReader();
    // Fires when the file blob is done loading to memory.
    reader.onload = function(event) {
        var arrayBuffer = event.target.result,
            eightBitArray = new Uint8Array(arrayBuffer),
            database = SQL.open(eightBitArray);
        self._queryDb = database;
        // Trigger the callback to the calling function
        callback();
    };
    // Start reading the file blob.
    reader.readAsArrayBuffer(file);
};
In this case, file is a local sqlite database handle that I get from an HTML input element. I specify a function to call when a change event happens to that input and get the blob from the resulting event.target.files[0] object.
For the sake of brevity on my part I left some things out but I can throw together a smaller and more simplified example if you are still struggling.
The answer is: with kripken's sql.js, which you mentioned above, you can't. At least as of today (May 2014). The original author doesn't maintain sql.js anymore.
However, I'm the author of a fork of sql.js, which is available here: https://github.com/lovasoa/sql.js .
This fork brings several improvements, including support for prepared statements, in which, contrary to the original version, values are handled in their natural JavaScript types, and not only as strings.
With this version you can handle BLOBs (both for reading and writing); they appear as Uint8Arrays (which you can, for instance, convert to object URLs to display the contents to your users).
Here is an example of how to read blob data from a database:
var db = new SQL.Database(eightBitArray); // eightBitArray can be an Uint8Array
var stmt = db.prepare("SELECT blob_column FROM your_table");
while (stmt.step()) { // Executed once for every row of result
    var my_blob = stmt.get()[0]; // Get the first column of result
    // my_blob is now an Uint8Array, do whatever you want with it
}
db.close(); // Free the memory used by the database
You can see the full documentation here: http://lovasoa.github.io/sql.js/documentation/
How do you set the userdata in the controller action? The way I'm doing it is breaking my grid. I'm trying a simple test with no luck. Here's my code, which does not work. Thanks.
var dataJson = new
{
    total =
    page = 1,
    records = 10000,
    userdata = "{test1:thefield}",
    rows = (from e in equipment
            select new
            {
                id = e.equip_id,
                cell = new string[] {
                    e.type_desc,
                    e.make_descr,
                    e.model_descr,
                    e.equip_year,
                    e.work_loc,
                    e.insp_due_dt,
                    e.registered_by,
                    e.managed_by
                }
            }).ToArray()
};
return Json(dataJson);
I don't think you have to convert it to an array. I've used jqGrid and I just let the Json function serialize the object. I'm not certain that would cause a problem, but it's unnecessary at the very least.
Also, your userdata would evaluate to a string (because you are sending it as a string). Try sending it as an anonymous object instead, i.e.:
userdata = new { test1 = "thefield" },
You need a value for total and a comma between that and page. (I'm guessing that's a typo. I don't think that would compile as is.)
EDIT:
Also, I would recommend adding the option "jsonReader: { repeatitems: false }" to your JavaScript. This will allow you to send your collection in the "rows" field without converting it to the "{ id: ID, cell: [ data_row_as_array ] }" syntax. You can set the property "key = true" in your colModel to indicate which field is the ID. It makes it a lot simpler to pass data to the grid.
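For example, a rough sketch of the client-side setup this describes (the grid element id, URL, and labels are illustrative; the column names follow the controller code above):
$("#equipmentGrid").jqGrid({
    url: "/Equipment/GridData",
    datatype: "json",
    // Read each row as a plain object keyed by column name instead of the
    // { id: ..., cell: [...] } array format.
    jsonReader: { repeatitems: false },
    colModel: [
        { name: "equip_id", key: true, hidden: true }, // key: true marks the row ID
        { name: "type_desc", label: "Type" },
        { name: "make_descr", label: "Make" },
        { name: "model_descr", label: "Model" },
        { name: "equip_year", label: "Year" }
    ],
    pager: "#pager"
});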