Bulk updating data in DocumentDB


I want to add a property with a default value to a set of documents that I retrieve via a SELECT query, for those documents that have no value for it yet.
I was thinking of this in two parts:
SELECT * FROM c article WHERE article.details.locale = 'en-us'
Find all articles where article.details.x does not exist.
Add the property: article.details.x = true
I was hoping this kind of EXEC command would be supported in the Azure Portal, so that I don't have to create a migration tool just to run it once, but I couldn't find such an option in the portal. Is this possible?

You can use Azure DocumentDB Studio as a front end for creating and executing a stored procedure. It can be found here. It's pretty easy to set up and use.
I've mocked up a stored procedure based on your example:
function updateArticlesDetailsX() {
    var collection = getContext().getCollection();
    var collectionLink = collection.getSelfLink();
    var response = getContext().getResponse();
    var docCount = 0; // documents returned by the query
    var counter = 0;  // documents replaced so far
    tryQueryAndUpdate();
    function tryQueryAndUpdate(continuation) {
        var query = {
            query: "select * from root r where IS_DEFINED(r.details.x) != true"
        };
        var requestOptions = {
            continuation: continuation
        };
        var isAccepted = collection.queryDocuments(
            collectionLink,
            query,
            requestOptions,
            function queryCallback(err, documents, responseOptions) {
                if (err) throw err;
                if (documents.length > 0) {
                    // If at least one document is found, update every document on this page.
                    docCount = documents.length;
                    for (var i = 0; i < docCount; i++) {
                        tryUpdate(documents[i]);
                    }
                } else if (responseOptions.continuation) {
                    // Else if the query came back empty, but with a continuation token,
                    // repeat the query with the token.
                    tryQueryAndUpdate(responseOptions.continuation);
                } else {
                    throw new Error("Document not found.");
                }
            });
        if (!isAccepted) {
            throw new Error("The stored procedure timed out");
        }
    }
    function tryUpdate(document) {
        // Optimistic concurrency control via HTTP ETag.
        var requestOptions = { etag: document._etag };
        // Update statement goes here:
        document.details.x = "some new value";
        var isAccepted = collection.replaceDocument(
            document._self,
            document,
            requestOptions,
            function replaceCallback(err, updatedDocument, responseOptions) {
                if (err) throw err;
                counter++;
                // Report the number of replacements that have actually completed.
                response.setBody("Updated " + counter + " documents");
            });
        // If we hit execution bounds, throw an exception.
        if (!isAccepted) {
            throw new Error("The stored procedure timed out");
        }
    }
}
I got the rough outline for this code from Andrew Liu on GitHub.
This outline should be close to what you need to do.

DocumentDB has no way to update a bunch of documents in a single query. However, the portal does have a Script Explorer that allows you to write and execute a stored procedure against a single collection. Here is an example sproc that combines a query with a replaceDocument command to update some documents; you could use it as a starting point for writing your own. The one gotcha to keep in mind is that DocumentDB will not allow sprocs to run longer than 5 seconds (with some buffer). So if your sproc can't complete in one 5-second run, you may have to run it multiple times and keep track of what you've already done. Using IS_DEFINED(collection.field.subfield) != true (thanks @cnaegle) in your query, followed by a document replacement that defines that field (or removes that document), should allow you to run the sproc as many times as necessary.
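The "run it as many times as necessary" idea can be sketched as a small client-side loop. This is only an illustration: runSproc is a hypothetical function standing in for whatever client call executes the stored procedure, and it is assumed to resolve with the number of documents updated in that run. Neither the name nor that return convention comes from DocumentDB itself.

```javascript
// Re-run an idempotent stored procedure until it reports nothing left to do.
// `runSproc` is a hypothetical async function (an assumption of this sketch)
// that executes the sproc once and resolves with the number of documents
// it updated during that run.
async function runUntilDone(runSproc, maxRuns = 100) {
  let total = 0;
  for (let run = 1; run <= maxRuns; run++) {
    const updated = await runSproc();
    if (updated === 0) {
      // No document matched the IS_DEFINED(...) != true filter any more.
      return { runs: run, total: total };
    }
    total += updated;
  }
  // Guard against looping forever if the sproc never reaches zero.
  throw new Error("Gave up after " + maxRuns + " runs");
}
```

Because the query only selects documents that still lack the field, each run picks up where the previous one stopped, so no separate bookkeeping is needed in this sketch.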
If you don't want to write a sproc, the easiest thing to do would be to export the database using the DocumentDB Data Migration tool. Import the export into Excel to manipulate it, or write a script to do the manipulation. Then upload it again using the Data Migration tool.
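If you go the export route, the "script to do the manipulation" could look roughly like the sketch below. It assumes the export is a JSON array of documents shaped like the ones in the question (a details object that may or may not have x); the function name addDefaultX is made up for this example.

```javascript
// Return a copy of the exported documents in which every document that has
// no details.x gets details.x = defaultValue. Documents that already define
// details.x (even as false or null) are left untouched.
// The function name and the document shape are assumptions of this sketch.
function addDefaultX(docs, defaultValue) {
  return docs.map(function (doc) {
    var details = doc.details || {};
    if (Object.prototype.hasOwnProperty.call(details, "x")) {
      return doc;
    }
    var newDetails = Object.assign({}, details, { x: defaultValue });
    return Object.assign({}, doc, { details: newDetails });
  });
}

// Example use on an exported array:
// var updated = addDefaultX(JSON.parse(exportedJsonText), true);
```

The result can then be serialized with JSON.stringify and fed back to the Data Migration tool.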

Related

qml LocalStorage query threaded

Is it possible to run a LocalStorage database query in QML in a separate thread?
My query on a big database takes 500 ms, which blocks the responsiveness of the UI.
My relevant code:
property var db: null
function openDB() {
    if (db !== null) return;
    db = LocalStorage.openDatabaseSync("dbname", "0.1", "dbname", 3000000000);
}
function runQuery(query) {
    var results;
    db.readTransaction(function(tx) {
        results = tx.executeSql(query);
    });
    return results;
}
I want to run a query every second without blocking the ui.
e.g.
var osresult = runQuery('SELECT * FROM os_data')
where later I loop over the results to show them in a graph.
I looked at WorkerScript but with WorkerScript I have to use js files where I can not use LocalStorage.openDatabaseSync.
How should one run queries in qml without blocking the UI?

Change feed not trigger with multiple insert document in Cosmos DB

I replicated the architecture below.
I insert the GPS positions (documents) into Cosmos DB, and in the JavaScript client (Google Maps) the pin moves.
All the steps work: inserting the document, triggering the Azure function, and SignalR, which links the client to the documents in Azure Cosmos DB.
The code to upload a document to Cosmos DB:
Microsoft.Azure.Documents.Document doc = client.CreateDocumentAsync(UriFactory.CreateDocumentCollectionUri(databaseName, collectionName), estimatedPathDocument).Result.Resource;
ret[0] = doc.Id;
Azure function:
public static async Task Run(IReadOnlyList<Document> input, IAsyncCollector<SignalRMessage> signalRMessages, ILogger log)
{
    if (input != null && input.Count > 0)
    {
        var val = input.Select((d) => new
        {
            genKey = d.GetPropertyValue<string>("genKey"),
            dataType = d.GetPropertyValue<string>("dataType")
        });
        await signalRMessages.AddAsync(new SignalRMessage
        {
            UserId = val.First().genKey,
            Target = "tripUpdated",
            Arguments = new[] { input }
        });
    }
}
When I insert only one position, the Azure function records the event and fires, moving the pin.
The problem is that when I insert a series of positions sequentially and almost instantaneously, the function is not triggered for the documents that follow the first one.
Only if I insert a delay do some of the documents fire the trigger:
Microsoft.Azure.Documents.Document doc = client.CreateDocumentAsync(UriFactory.CreateDocumentCollectionUri(databaseName, collectionName), estimatedPathDocument).Result.Resource;
Thread.Sleep(3000);
ret[0] = doc.Id;
I don't know if I am loading the documents correctly, but even when I handle them asynchronously (see below), it almost seems that the trigger fires only once the document in Cosmos DB is "really / physically" created.
Task.Run(async () => await AzureCosmosDB_class.MyDocumentAzureCosmosDB.CreateRealCoordDocumentIfNotExists_v1("axylog-cdb-01", "axylog-collection-01", realCoord, uri, key));
Could the solution be to put the documents in a queue and load them into Azure Cosmos DB sequentially, with a delay of about ten seconds between one and the next?

How to send List of objects to google execution api and handle it in google apps script

In general, I want to export data from an ASP.NET MVC application to Google Sheets, for example a list of people. I've already set up the connection and authenticated the app with my Google account (through OAuth2), but now I'm trying to send my list of objects to the API and then handle it in a script (by putting all the data in a new file), and I can't get my head around this.
Here is some sample code from my app that sends the request.
public async Task<ActionResult> SendTestData()
{
    // Await the authorization instead of blocking on .Result inside an async action.
    var result = await new AuthorizationCodeMvcApp(this, new AppFlowMetadata()).
        AuthorizeAsync(CancellationToken.None);
    if (result.Credential != null)
    {
        string scriptId = "MY_SCRIPT_ID";
        var service = new ScriptService(new BaseClientService.Initializer
        {
            HttpClientInitializer = result.Credential,
            ApplicationName = "Test"
        });
        IList<object> parameters = new List<object>();
        var people = new List<Person>(); // next I'm selecting data from db.Person into this variable
        parameters.Add(people);
        ExecutionRequest req = new ExecutionRequest();
        req.Function = "testFunction";
        req.Parameters = parameters;
        ScriptsResource.RunRequest runReq = service.Scripts.Run(req, scriptId);
        try
        {
            Operation op = runReq.Execute();
            if (op.Error != null)
            {
                // The API executed, but the script returned an error.
                // Extract the first (and only) set of error details
                // as an IDictionary. The values of this dictionary are
                // the script's 'errorMessage' and 'errorType', and an
                // array of stack trace elements. Casting the array as
                // a JSON JArray allows the trace elements to be accessed
                // directly.
                IDictionary<string, object> error = op.Error.Details[0];
                if (error["scriptStackTraceElements"] != null)
                {
                    // There may not be a stack trace if the script didn't
                    // start executing.
                    Newtonsoft.Json.Linq.JArray st =
                        (Newtonsoft.Json.Linq.JArray)error["scriptStackTraceElements"];
                }
            }
            else
            {
                // The result provided by the API needs to be cast into
                // the correct type, based upon what types the Apps
                // Script function returns. Here, the function returns
                // an Apps Script Object with String keys and values.
                // It is most convenient to cast the return value as a JSON
                // JObject (folderSet).
                Newtonsoft.Json.Linq.JObject folderSet =
                    (Newtonsoft.Json.Linq.JObject)op.Response["result"];
            }
        }
        catch (Google.GoogleApiException e)
        {
            // The API encountered a problem before the script
            // started executing.
            AddAlert(Severity.error, e.Message);
        }
        return RedirectToAction("Index", "Controller");
    }
    else
    {
        return new RedirectResult(result.RedirectUri);
    }
}
The next question is how to handle this data in the script: is it serialized to JSON there?

Execution API calls are essentially REST calls, so the payload should be serialized accordingly. Stringified JSON is typically fine. Your GAS function should then parse that payload to consume the encoded list:
var data = JSON.parse(payload);
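To make that concrete, here is a rough sketch of what the script side might look like. It assumes the C# side sends the list as a single JSON string (for example by adding JsonConvert.SerializeObject(people) to parameters instead of the list itself), and that each person has Name and Email properties. Those property names and the helper peopleToRows are invented for the example; only testFunction matches the name used in the question.

```javascript
// Entry point named in req.Function. `payload` is assumed to be a JSON
// string holding an array of person objects.
function testFunction(payload) {
  var people = JSON.parse(payload);
  return peopleToRows(people);
}

// Turn the parsed list into a 2D array (header row plus one row per person),
// which is the shape a sheet range expects. Name and Email are assumed fields.
function peopleToRows(people) {
  var header = ["Name", "Email"];
  var rows = people.map(function (person) {
    return [person.Name, person.Email];
  });
  return [header].concat(rows);
}
```

A 2D array like this is the kind of value that can be written to a sheet in one call, for instance with a range's setValues method inside the script.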

Rethinkdb pattern for dynamic pubsubs through socketio

I'm coming from the Meteor world and I'm curious how to replicate the cached pubsub/observers functionality. For a basic example, let's say I have a todo list where each todo has a userId, and I want to keep todos private to each userId (but a userId could exist on multiple connected devices, e.g. phone + desktop). I imagine I have to create some publish function that verifies the userId by the socketId from the sent request, then create a socket namespace specific to that query (since the query could include more than a userId constraint). Then, register an emitter that only sends the changes to those socketIds that are verified to listen to the given namespace. Am I close? All my research just returns basic things like publishing to all connected users based on keywords. Any links to reading material would be great! Here's a first attempt with the missing logic in comments...
export function sendTodosByUserId(io, userId) {
    //How to auth? By linking a client socketId to a user in a lookup table?
    connect()
        .then(conn => {
            r
                .table('todos')
                .filter(todos => todos("userId").eq(userId))
                .changes().run(conn, (err, cursor) => {
                    cursor.each((err, change) => {
                        //Do I emit a unique message? namespace? How do I handle 2 clients using the same userId?
                        io.emit('TODO_CHANGE', change);
                    });
                });
        });
}

You could implement some way of mapping new sockets to the correct user. For example, you could put the userId in an Express session and store the user's socket ids in a simple object.
var userSockets = {};
io.sockets.on('connection', function(socket) {
    var userId = socket.handshake.session.userId;
    if (userSockets[userId]) {
        userSockets[userId].push(socket.id);
    } else {
        userSockets[userId] = [socket.id];
        sendTodosByUserId(io, userId);
    }
    socket.on('disconnect', function() {
        var i = userSockets[userId].indexOf(socket.id);
        // Remove only this socket's id; splice(i) alone would drop every id from i onward.
        if (i !== -1) userSockets[userId].splice(i, 1);
        if (userSockets[userId].length === 0) {
            delete userSockets[userId];
        }
    });
});
In your sendTodosByUserId, you would just loop over the socket array belonging to the user, and emit to every socket.
var sockets = userSockets[userId];
for (var i = 0; i < sockets.length; ++i) {
    io.to(sockets[i]).emit('TODO_CHANGE', change);
}
Note that this will not work if you have multiple nodes the user can be connected to. Then you might have to store the userSockets-object in e.g. Redis.
Alternatively, you could just have your users join a room named e.g. user:<userId>, and emit to this on every todo-change.
io.sockets.on('connection', function(socket) {
    var userId = socket.handshake.session.userId;
    socket.join('user:' + userId);
    socket.on('disconnect', function() {
        console.log("Rooms are left automatically on disconnect");
    });
});
On todo change:
io.to('user:' + userId).emit('TODO_CHANGE', change);
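With rooms, one possible variation is to derive the room from the change itself, so that a single changefeed could serve every user. The sketch below assumes each change object carries new_val and old_val (the shape RethinkDB changefeeds produce) and that todos have a userId field as in the question; the helper names roomForChange and emitTodoChange are made up for illustration.

```javascript
// Work out which user's room a changefeed entry belongs to.
// On deletes new_val is null, so fall back to old_val.
function roomForChange(change) {
  var doc = change.new_val || change.old_val;
  return doc ? "user:" + doc.userId : null;
}

// Emit the change only to the sockets that joined that user's room.
// `io` is the socket.io server instance; returns the room that was used.
function emitTodoChange(io, change) {
  var room = roomForChange(change);
  if (room) {
    io.to(room).emit("TODO_CHANGE", change);
  }
  return room;
}
```

Since every device of the same user joins the same room, two clients sharing a userId both receive the event without any per-socket bookkeeping.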

On Meteor, how can I validate with Collection2 on the client side?

I always use methods to insert, update and remove. This is how my code looks right now:
Client side
Template.createClient.events({
    'submit form': function(event, tmpl) {
        event.preventDefault();
        var client = {
            name: event.target.name.value,
            // .... more fields
        };
        var validatedData = Clients.validate(client);
        if (validatedData.errors) {
            // Display validation errors
            return;
        }
        Meteor.call('createClient', validatedData.client, function(error) {
            if (error) {
                // Display error
            }
        });
    }
});
Client and server side:
Clients = new Mongo.Collection("clients");
Clients.validate = function(client) {
    // ---- Clean data ----
    client.name = _.str.trim(client.name);
    // .... more fields clean
    // ---- Validate data ---
    var errors = [];
    if (!client.name)
        errors.push("The name is required.");
    // .... more fields validation
    // Return an object with errors and cleaned data
    return { errors: _.isEmpty(errors) ? undefined : errors, client: client };
};
Meteor.methods({
    'createClient': function (client) {
        // --- Validate user permissions ---
        // If server, validate data again
        if (Meteor.isServer) {
            var validatedData = Clients.validate(client);
            if (validatedData.errors)
                // There is no need to send a detailed error, because data was validated on client before
                throw new Meteor.Error(500, "Invalid client.");
            client = validatedData.client;
        }
        check(client, {
            name: String,
            // .... more fields
        });
        return Clients.insert(client);
    }
});
Meteor.call is executed on both the client and the server, but Meteor doesn't have a way to stop the server-side run if validation on the client side fails (or at least, I don't know how). With this pattern, I avoid sending data to the server with Meteor.call if validation fails.
I want to start using Collection2, but I can't figure out how to get the same pattern. All the examples I found involve direct insert and update on the client side and allow/deny to manage security, but I want to stick with Meteor.call.
I found in the documentation that I can validate before an insert or update, but I don't know how to get this to work:
Books.simpleSchema().namedContext().validate({title: "Ulysses", author: "James Joyce"}, {modifier: false});
I know the autoform package, but I want to avoid that package for now.
How can I validate with Collection2 on the client side before sending data to the server side with Meteor.call? Is my pattern wrong or incompatible with Collection2 and I need to do it in another way?

In under 30 lines you can write your very own, full-featured validation package for Collection2. Let's walk through an example:
"use strict"; //keep it clean
var simplyValid = window.simplyValid = {}; //OK, not that clean (global object)
simplyValid.RD = new ReactiveDict(); //store error messages here
/**
 *
 * @param data is an object with the collection name, index (if storing an array), and field name, as stored in the schema (e.g. 'foo.$.bar')
 * @param value is the user-inputted value
 * @returns {boolean} true if it's valid
 */
simplyValid.validateField = function (data, value) {
    var schema = R.C[data.collection]._c2._simpleSchema; //access the schema from the local collection, 'R.C' is where I store all my collections
    var field = data.field;
    var fieldVal = field.replace('$', data.idx); //make a separate key for each array val
    var objToValidate = {};
    var dbValue = schema._schema[field].dbValue; //custom conversion (standard to metric, dollars to cents, etc.) IGNORE
    if (dbValue && value) value = dbValue.call({value: value}); //IGNORE
    objToValidate[field] = value; //create a doc to clean
    schema.clean(objToValidate, {removeEmptyStrings: false}); //clean the data (trim, etc.)
    var isValid = schema.namedContext().validateOne(objToValidate, field, {extendedCustomContext: true}); //FINALLY, we validate
    if (isValid) {
        simplyValid.RD.set(fieldVal, undefined); //The RD stores error messages, if it's valid, it won't have one
        return true;
    }
    var errorType = schema.namedContext()._getInvalidKeyObject(field).type; //get the error type
    var errorMessage = schema.messageForError(errorType, field); //get the message for the given error type
    simplyValid.RD.set(fieldVal, errorMessage); //set the error message. it's important to validate on error message because changing an input could get rid of an error message & produce another one
    return false;
};
simplyValid.isFieldValid = function (field) {
    return simplyValid.RD.equals(field, undefined); //a very cheap function to get the valid state
};
Feel free to hack out the pieces you need and shoot me any questions you might have.

You can send the schema to the client and validate before sending to the server. If you want to use Collection2, you need to attach the schema to the collection and use insert, which is something you don't want. So the best option for your scenario is to send the schema to the client and use it to validate.
Also, reconsider using minimongo instead of Methods for everything; it will save you lots of time. And don't think your app is secure just because you're using Methods.
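As a rough sketch of the "validate with the schema on the client, then call the method" idea, the helper below returns the same { errors, client } shape the question's Clients.validate uses. It assumes a SimpleSchema 1.x style object whose clean(doc) mutates the document and whose validation context offers validate(doc), invalidKeys() and keyErrorMessage(name); the helper name validateForMethod and the context name are invented here, so check them against the version you actually use.

```javascript
// Clean and validate `doc` against `schema` before calling a Meteor method.
// `schema` is assumed to behave like a SimpleSchema 1.x instance.
function validateForMethod(schema, doc) {
  // Work on a shallow copy so the caller's object is not modified.
  var cleaned = Object.assign({}, doc);
  schema.clean(cleaned); // trims strings, removes unknown keys, etc.

  var ctx = schema.namedContext("createClientForm"); // hypothetical context name
  if (ctx.validate(cleaned)) {
    return { errors: undefined, client: cleaned };
  }
  // Collect one readable message per invalid key.
  var errors = ctx.invalidKeys().map(function (key) {
    return ctx.keyErrorMessage(key.name);
  });
  return { errors: errors, client: cleaned };
}
```

On the client, the submit handler would call validateForMethod first and only invoke Meteor.call('createClient', result.client, ...) when result.errors is undefined; the method can run the same helper again on the server.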
