Cosmos DB stored procedure returns fewer documents than actually exist - azure-cosmosdb

I am writing a stored procedure for Cosmos DB in JavaScript, but it returns fewer documents than the collection actually contains.
The sproc is called from C#, which passes a parameter "transmitterMMSI" that is also the partition key of this collection.
First, the following query is executed in the sproc:
var query = 'SELECT COUNT(1) AS Num FROM AISData a WHERE a.TransmitterMMSI="' + transmitterMMSI + '"';
The result is written to the response, and the value is 5761, which matches the actual number of documents in the collection.
However, when I change the query to the following:
var query = 'SELECT * FROM AISData a WHERE a.TransmitterMMSI="' + transmitterMMSI + '"';
documents.length is reported as 5574, which is smaller than the actual number.
I have already set pageSize: -1, which should mean unlimited.
I searched Google and Stack Overflow, and it seems that a continuation token could help. However, the examples I tried did not work.
Can anyone familiar with this help?
The scripts are listed below.
Here is the sproc JavaScript, which is also the file "DownSampling.js" used in the C# code:
function DownSampling(transmitterMMSI, interval) {
var context = getContext();
var collection = context.getCollection();
var response = context.getResponse();
var receiverTime;
var tempTime;
var groupKey;
var aggGroup = new Object();
var query = 'SELECT * FROM AISData a WHERE a.TransmitterMMSI="' + transmitterMMSI + '"';
var accept = collection.queryDocuments(collection.getSelfLink(), query, { pageSize: -1},
function (err, documents, responseOptions) {
if (err) throw new Error("Error" + err.message);
// Find the smallest deviation comparting to IntervalTime in each group
for (i = 0; i < documents.length; i++) {
receiverTime = Date.parse(documents[i].ReceiverTime);
tempTime = receiverTime / 1000 + interval / 2;
documents[i].IntervalTime = (tempTime - tempTime % interval) * 1000;
documents[i].Deviation = Math.abs(receiverTime - documents[i].IntervalTime);
// Generate a group key for each group, combinated of TransmitterMMSI and IntervalTime
groupKey = documents[i].IntervalTime.toString();
if (typeof aggGroup[groupKey] === 'undefined' || aggGroup[groupKey] > documents[i].Deviation) {
aggGroup[groupKey] = documents[i].Deviation;
}
}
// Tag the downsampling
for (i = 0; i < documents.length; i++) {
groupKey = documents[i].IntervalTime;
if (aggGroup[groupKey] == documents[i].Deviation) {
documents[i].DownSamplingTag = 1;
} else {
documents[i].DownSamplingTag = 0;
}
// Remove the items that are not used
delete documents[i].IntervalTime;
delete documents[i].Deviation;
// Replace the document
var acceptDoc = collection.replaceDocument(documents[i]._self, documents[i], {},
function (errDoc, docReplaced) {
if (errDoc) {
throw new Error("Update documents error:" + errDoc.message);
}
});
if (!acceptDoc) {
throw "Update documents not accepted, abort ";
}
}
response.setBody(documents.length);
});
if (!accept) {
throw new Error("The stored procedure timed out.");
}
}
And the C# code is here:
private async Task DownSampling()
{
Database database = this.client.CreateDatabaseQuery().Where(db => db.Id == DatabaseId).ToArray().FirstOrDefault();
DocumentCollection collection = this.client.CreateDocumentCollectionQuery(database.SelfLink).Where(c => c.Id == AISTestCollectionId).ToArray().FirstOrDefault();
string scriptFileName = @"..\..\StoredProcedures\DownSampling.js";
string scriptId = Path.GetFileNameWithoutExtension(scriptFileName);
var sproc = new StoredProcedure
{
Id = scriptId,
Body = File.ReadAllText(scriptFileName)
};
await TryDeleteStoredProcedure(collection.SelfLink, sproc.Id);
sproc = await this.client.CreateStoredProcedureAsync(collection.SelfLink, sproc);
IQueryable<dynamic> query = this.client.CreateDocumentQuery(
UriFactory.CreateDocumentCollectionUri(DatabaseId, AISTestCollectionId),
new SqlQuerySpec()
{
//QueryText = "SELECT a.TransmitterMMSI FROM " + AISTestCollectionId + " a",
QueryText = "SELECT a.TransmitterMMSI FROM " + AISTestCollectionId + " a WHERE a.TransmitterMMSI=\"219633000\"",
}, new FeedOptions { MaxItemCount = -1, EnableCrossPartitionQuery = true, MaxDegreeOfParallelism = -1, MaxBufferedItemCount = -1 });
List<dynamic> transmitterMMSIList = query.ToList(); //TODO: Remove duplicates
Console.WriteLine("TransmitterMMSI count: {0}", transmitterMMSIList.Count());
HashSet<string> exist = new HashSet<string>();
foreach (var item in transmitterMMSIList)
{
//int transmitterMMSI = Int32.Parse(item.TransmitterMMSI.ToString());
string transmitterMMSI = item.TransmitterMMSI.ToString();
if (exist.Contains(transmitterMMSI))
{
continue;
}
exist.Add(transmitterMMSI);
Console.WriteLine("TransmitterMMSI: {0} is being processed.", transmitterMMSI);
var response = await this.client.ExecuteStoredProcedureAsync<string>(sproc.SelfLink,
new RequestOptions { PartitionKey = new PartitionKey(transmitterMMSI) }, transmitterMMSI, 30);
string s = response.Response;
Console.WriteLine("TransmitterMMSI: {0} is processed completely.", transmitterMMSI);
}
}
private async Task TryDeleteStoredProcedure(string collectionSelfLink, string sprocId)
{
StoredProcedure sproc = this.client.CreateStoredProcedureQuery(collectionSelfLink).Where(s => s.Id == sprocId).AsEnumerable().FirstOrDefault();
if (sproc != null)
{
await client.DeleteStoredProcedureAsync(sproc.SelfLink);
}
}
I tried commenting out the two loops in the JS code so that only documents.length is output, but the returned number is still too low. However, when I changed the query to SELECT a.id, documents.length was correct. It looks like a continuation issue.

The sproc is probably timing out. To use a continuation token in these circumstances, you will need to return it to your C# calling code then make another call to the sproc passing in your token. If you show us your sproc code we can help more.
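As a rough illustration of that round trip, here is a minimal sketch in JavaScript. It assumes the sproc returns an object with a count and the continuation token in its body; the function names (countOnePage, countAllPages, makeFakeCollection), the page size of 100, and the in-memory fake collection are all hypothetical and exist only so the sketch can run outside Cosmos DB. In the real setup the calling loop would live in the C# client.

```javascript
// Sketch of the sproc side: read one page and report the continuation token.
// `collection` and `response` are passed in so the sketch can run outside Cosmos DB;
// in a real sproc they would come from getContext().
function countOnePage(collection, response, query, continuationToken) {
    var requestOptions = { pageSize: 100 }; // hypothetical page size
    if (continuationToken) {
        requestOptions.continuation = continuationToken;
    }
    var isAccepted = collection.queryDocuments(collection.getSelfLink(), query, requestOptions,
        function (err, documents, responseOptions) {
            if (err) throw err;
            response.setBody({
                count: documents.length,
                continuation: responseOptions.continuation || null
            });
        });
    if (!isAccepted) {
        // Nothing was read; hand the same token back so the caller can retry.
        response.setBody({ count: 0, continuation: continuationToken || null });
    }
}

// Hypothetical in-memory stand-in for the collection (not a Cosmos DB API).
// It ignores the query text and uses the next start index as the token.
function makeFakeCollection(docs) {
    return {
        getSelfLink: function () { return "fake/self/link"; },
        queryDocuments: function (link, query, options, callback) {
            var start = options.continuation ? parseInt(options.continuation, 10) : 0;
            var end = Math.min(start + options.pageSize, docs.length);
            var next = end < docs.length ? String(end) : undefined;
            callback(null, docs.slice(start, end), { continuation: next });
            return true;
        }
    };
}

// Sketch of the caller side: keep calling until no token comes back.
function countAllPages(collection, query) {
    var total = 0, calls = 0, token = null, body;
    var response = { setBody: function (b) { body = b; } };
    do {
        countOnePage(collection, response, query, token);
        total += body.count;
        token = body.continuation;
        calls++;
    } while (token);
    return { total: total, calls: calls };
}

var docs = [];
for (var i = 0; i < 5761; i++) docs.push({ id: String(i) });
var summary = countAllPages(makeFakeCollection(docs), "SELECT * FROM c");
console.log(summary.total, summary.calls);
```

Each call does a bounded amount of work, so no single execution has to read every document before the sproc runs out of time.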

You can use a continuation token to make repeated calls to queryDocuments() from within the sproc without additional roundtrips to the client. Keep in mind that if you do this too many times your sproc will eventually time out, though. In your case, it sounds like you're already very close to getting all of the documents you're seeking, so maybe you will be OK.
Here is an example of using a continuation token within a sproc to query multiple pages of data:
function getManyThings() {
var collection = getContext().getCollection();
var query = {
query: 'SELECT r.id, r.FieldOne, r.FieldTwo FROM root r WHERE r.FieldThree="sought"'
};
var continuationToken;
getThings(continuationToken);
function getThings(continuationToken) {
var requestOptions = {
continuation: continuationToken,
pageSize: 1000 // Adjust this to suit your needs
};
var isAccepted = collection.queryDocuments(collection.getSelfLink(), query, requestOptions, function (err, feed, responseOptions) {
if (err) {
throw err;
}
for (var i = 0, len = feed.length; i < len; i++) {
var thing = feed[i];
// Do your logic on thing...
}
if (responseOptions.continuation) {
getThings(responseOptions.continuation);
}
else {
var response = getContext().getResponse();
response.setBody("RESULTS OF YOUR LOGIC");
}
});
if (!isAccepted) {
var response = getContext().getResponse();
response.setBody("Server rejected query - please narrow search criteria");
}
}
}

Related

CosmosDB Stored Procedure returns error "Encountered exception while executing Javascript. Exception = Error: Invalid arguments for query"

I have the stored procedure below, which is written to use a continuation-token-based mechanism for fetching documents from the DocumentDB collection.
I am getting an exception from the query.
I am trying to get all the documents using a continuation token.
Is this the correct way?
function getOrdersByBranchNumber(branchNumber){
var context = getContext();
var collection = context.getCollection();
var link = collection.getSelfLink();
var response = context.getResponse();
var nodesBatch = [];
var continuationToken = true;
var responseSize = 0;
//validate inputs
if(!branchNumber || (typeof branchNumber != "string")){
return errorResponse(400, (!branchNumber) ? "branchNumber is Undefined":"String type is expected for branchNumber.");
}
var querySelect = "SELECT * from orders o WHERE o.branchNbr = '"+branchNumber+"' ";
var query = { query: querySelect};
getNodes(continuationToken);
function getNodes(continuationToken) {
var requestOptions = {
continuation: continuationToken,
pageSize: 90
};
var isAccepted = collection.queryDocuments(
collection.getSelfLink(),
query,
requestOptions,
function (err, feed, options) {
var queryPageSize = JSON.stringify(feed).length;
// DocumentDB has a response size limit of 1 MB.
if (responseSize + queryPageSize < 1024 * 1024) {
// Append query results to nodesBatch.
nodesBatch = nodesBatch.concat(feed);
// Keep track of the response size.
responseSize += queryPageSize;
if (responseOptions.continuation) {
// If there is a continuation token... Run the query again to get the next page of results
lastContinuationToken = responseOptions.continuation;
getNodes(responseOptions.continuation);
} else {
// If there is no continutation token, we are done. Return the response.
var feedData=JSON.stringify(nodesBatch);
getContext().getResponse().setBody(feedData);
}
} else {
// If the response size limit reached; run the script again with the lastContinuationToken as a script parameter.
var feedData=JSON.stringify(nodesBatch);
getContext().getResponse().setBody(feedData);
}
});
if (!isAccepted){
return errorResponse(400, "The query was not accepted by the server.");
}
}
function errorResponse(code,message){
var errorObj = {};
errorObj.code = code;
errorObj.message = message;
errorObj.date = getDateTime();
return response.setBody(errorObj);
}
function getDateTime(){
var currentdate = new Date();
var dateTime = currentdate.getFullYear() + "-" +(currentdate.getMonth()+1)+ "-" + currentdate.getDate()+ " " +currentdate.getHours()+":"+currentdate.getMinutes() +":"+currentdate.getSeconds();
return dateTime;
}
}
However, I see the error below after saving and executing the stored procedure.
Any idea what is wrong with the query?
Encountered exception while executing Javascript. Exception = Error: Invalid
arguments for query.
Stack trace: Error: Invalid arguments for query.
at queryDocuments (getOrdersByBranchNumber.js:614:17)
at getNodes (getOrdersByBranchNumber.js:27:5)
at getOrdersByBranchNumber (getOrdersByBranchNumber.js:19:1)
at __docDbMain (getOrdersByBranchNumber.js:78:5)
at Global code (getOrdersByBranchNumber.js:1:2)
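Try removing the continuation item from the query options. I got this error when I had an invalid one specified. In your code the token starts out as true (var continuationToken = true), which is not a token string; leave it undefined for the first call.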
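A minimal sketch of that idea follows, assuming the error is triggered by the non-string initial token. The names collectAllOrders and makeStubCollection are hypothetical, and the stub collection only imitates the reported error so the sketch can run outside DocumentDB; the server's actual validation rules are an assumption. Note that the callback also reads the token from its own third parameter (responseOptions), whereas the code in the question names that parameter options but then reads responseOptions.

```javascript
// Hypothetical in-memory stand-in for the collection (not a DocumentDB API).
// It rejects any continuation that is not a string to imitate the reported error;
// the real server-side validation rules are an assumption here.
function makeStubCollection(docs) {
    return {
        getSelfLink: function () { return "stub/self/link"; },
        queryDocuments: function (link, query, options, callback) {
            if ("continuation" in options && typeof options.continuation !== "string") {
                throw new Error("Invalid arguments for query.");
            }
            var start = options.continuation ? parseInt(options.continuation, 10) : 0;
            var end = Math.min(start + options.pageSize, docs.length);
            callback(null, docs.slice(start, end),
                { continuation: end < docs.length ? String(end) : undefined });
            return true;
        }
    };
}

// Collects every page, starting without any continuation token.
function collectAllOrders(collection, query, onDone) {
    var nodesBatch = [];
    getNodes(); // first call: no token at all, rather than `true`

    function getNodes(continuationToken) {
        var requestOptions = { pageSize: 90 };
        if (continuationToken) {
            // Only pass along a token that a previous page handed back.
            requestOptions.continuation = continuationToken;
        }
        var isAccepted = collection.queryDocuments(collection.getSelfLink(), query, requestOptions,
            function (err, feed, responseOptions) {
                if (err) throw err;
                nodesBatch = nodesBatch.concat(feed);
                if (responseOptions.continuation) {
                    getNodes(responseOptions.continuation);
                } else {
                    onDone(nodesBatch);
                }
            });
        if (!isAccepted) {
            onDone(nodesBatch); // query not accepted: return what was gathered so far
        }
    }
}

var orders = [];
for (var n = 0; n < 250; n++) orders.push({ id: "order-" + n, branchNbr: "42" });
var collected = null;
collectAllOrders(makeStubCollection(orders), { query: "SELECT * FROM orders o" }, function (all) {
    collected = all;
});
console.log(collected.length);
```

The response-size bookkeeping from the question is left out here to keep the sketch short; it would go inside the callback before the concat.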

How to scrape a table from the provided website using CasperJS?

The final goal is to retrieve stock data in table form from the provided broker website and save it to a text file. Here is the code that I have managed to put together so far by reading a few tutorials:
var casper = require("casper").create();
var url = 'https://iqoption.com/en/historical-financial-quotes?active_id=1&tz_offset=60&date=2016-12-19-21-59';
var terminate = function() {
this.echo("Exiting ...").exit();
};
var processPage = function() {
var rows = document.querySelectorAll('#mCSB_3_container > table'); //get table from broker site (copy/paste via copy selector in chrome tools)
//var nodes = document.getElementsByClassName('mCSB_container');
this.echo(rows);
this.echo(rows.length);
for (var i = 0; i < rows.length; i++)
{
var cell = rows[i].querySelector('.quotes-table-result__date');
this.echo(cell); //print each cell
}
};
casper.start(url);
casper.waitForSelector('#mCSB_3_container', processPage, terminate);
casper.run();
This code should retrieve the stock price table and print out each cell. However, all I get is 'undefined', which likely means that no objects were returned by the querySelector call. Please assume that I don't know any web programming (HTML, CSS).
First of all, one problem is that the waitFor wasn't set up well: you have to wait for the rows/cells.
The nodes you get on this page are a bit weird. If anybody has a more abstract solution where childNodes are handled better than in my solution, I would be really interested:
var casper = require('casper').create();
var url = 'https://eu.iqoption.com/en/historical-financial-quotes?active_id=1&tz_offset=60&date=2016-12-19-21-59';
var length;
casper.start(url);
casper.then(function() {
this.waitForSelector('#mCSB_3_container table tbody tr');
});
function getCellContent(row, cell) {
cellText = casper.evaluate(function(row, cell) {
return document.querySelectorAll('table tbody tr')[row].childNodes[cell].innerText.trim();
}, row, cell);
return cellText;
}
casper.then(function() {
var rows = casper.evaluate(function() {
return document.querySelectorAll('table tbody tr');
});
length = rows.length;
this.echo("table length: " + length);
});
// This part can be done nicer, but it's the way it should work ...
casper.then(function() {
for (var i = 0; i < length; i++) {
this.echo("Date: " + getCellContent(i, 0));
this.echo("Bid: " + getCellContent(i, 1));
this.echo("Ask: " + getCellContent(i, 2));
this.echo("Quotes: " + getCellContent(i, 3));
}
});
casper.run();
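One possible way to handle the child nodes more generally is to read the whole table in a single page-side function and return plain arrays of strings, using each row's children (element nodes only) instead of childNodes. This is only a sketch: extractRows and the fake document are hypothetical, the cell values are made up, and the CasperJS call in the closing comment has not been run against the site.

```javascript
// Reads every table row into an array of trimmed cell strings.
// `doc` is optional so the function can be tried against a fake document;
// when called without arguments in the page context it uses the global document.
function extractRows(doc) {
    doc = doc || document;
    var rows = doc.querySelectorAll('#mCSB_3_container table tbody tr');
    return Array.prototype.map.call(rows, function (row) {
        // `children` holds element nodes only, so whitespace text nodes are skipped.
        return Array.prototype.map.call(row.children, function (cell) {
            return cell.textContent.trim();
        });
    });
}

// Hypothetical fake document with made-up values, for trying the function locally.
var fakeDocument = {
    querySelectorAll: function (selector) {
        return [
            { children: [{ textContent: " 2016-12-19 21:59:00 " }, { textContent: "1.0401" },
                         { textContent: "1.0403" }, { textContent: " 1.0402 " }] },
            { children: [{ textContent: "2016-12-19 21:59:01" }, { textContent: "1.0400" },
                         { textContent: "1.0402" }, { textContent: "1.0401" }] }
        ];
    }
};

var table = extractRows(fakeDocument);
table.forEach(function (cells) {
    console.log("Date: " + cells[0] + " Bid: " + cells[1] + " Ask: " + cells[2] + " Quotes: " + cells[3]);
});

// In a CasperJS step this would be used roughly as:
//   casper.then(function () { var table = this.evaluate(extractRows); /* echo or save */ });
```

Because the function returns only strings, the result can cross the boundary between the page and the CasperJS script without depending on how DOM nodes are serialized.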

DocumentDB resource with specified id already exists when running a pre-trigger on create

In DocumentDB I've created a pre-trigger on the Create operation. The trigger code is the following:
function createBlock() {
var collection = getContext().getCollection();
var request = getContext().getRequest();
var docToCreate = request.getBody();
if (docToCreate.DocumentType)
{
var query = "SELECT TOP 1 a.BlockSequence FROM a ORDER BY a.BlockSequence DESC";
var isAccepted = collection.queryDocuments(collection.getSelfLink(), query, function (err, feed, options) {
if (err)
throw err;
if (!feed)
throw new Error("Failed to find the document.");
if (feed.length)
{
docToCreate.BlockCode += (feed[0].BlockSequence + 1);
docToCreate.BlockSequence = feed[0].BlockSequence + 1;
}
else
{
docToCreate.BlockCode += "1";
docToCreate.BlockSequence = 1;
}
var isAccepted = collection.createDocument(collection.getSelfLink(), docToCreate);
if (!isAccepted)
throw new Error("The call createDocument returned false.");
});
}
else
throw new Error("DocumentType property is required.");
if (!isAccepted)
throw new Error("The call queryDocuments returned false.");
}
The trigger executes up to the line immediately above var isAccepted = collection.createDocument(collection.getSelfLink(), docToCreate);.
When var isAccepted = collection.createDocument(collection.getSelfLink(), docToCreate); is executed, this error is thrown: Message: {"Errors":["Resource with specified id or name already exists"]}
I've checked, and no document with the same id as the new document is stored in this collection.
You shouldn't try to do the write in your trigger. You should simply modify the body or throw an error. In modifying the body, you change the document that is created. In throwing an error you abort the operation.
So instead of:
var isAccepted = collection.createDocument(collection.getSelfLink(), docToCreate);
Do:
return request.setBody(docToCreate);
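Put together, the trigger could look roughly like the sketch below. It assumes BlockCode holds a string prefix; the function name assignBlockSequence and the stub collection and request are hypothetical and are there only so the logic can be run outside DocumentDB. In the real trigger, collection and request come from getContext() as in the question.

```javascript
// Pre-trigger logic: compute the next sequence and rewrite the pending document.
// No createDocument call is made; the create operation that fired the trigger
// writes whatever body the request holds when the trigger finishes.
function assignBlockSequence(collection, request) {
    var docToCreate = request.getBody();
    if (!docToCreate.DocumentType) {
        throw new Error("DocumentType property is required.");
    }
    var query = "SELECT TOP 1 a.BlockSequence FROM a ORDER BY a.BlockSequence DESC";
    var isAccepted = collection.queryDocuments(collection.getSelfLink(), query, {},
        function (err, feed) {
            if (err) throw err;
            if (!feed) throw new Error("Failed to find the document.");
            var next = feed.length ? feed[0].BlockSequence + 1 : 1;
            docToCreate.BlockCode += next; // assumes BlockCode is a string prefix
            docToCreate.BlockSequence = next;
            request.setBody(docToCreate);
        });
    if (!isAccepted) {
        throw new Error("The call queryDocuments returned false.");
    }
}

// Hypothetical stubs standing in for getContext().getCollection() and getRequest().
var storedBlocks = [{ BlockSequence: 41 }];
var stubCollection = {
    getSelfLink: function () { return "stub/self/link"; },
    queryDocuments: function (link, query, options, callback) {
        callback(null, storedBlocks, {});
        return true;
    }
};
var pendingDoc = { DocumentType: "Block", BlockCode: "BLK-" };
var stubRequest = {
    getBody: function () { return pendingDoc; },
    setBody: function (body) { pendingDoc = body; }
};

assignBlockSequence(stubCollection, stubRequest);
console.log(pendingDoc.BlockCode, pendingDoc.BlockSequence);
```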

Need help using documentdb-lumenize with the .NET DocumentDB client SDK

I have a problem using the lumenize aggregation stored procedure (https://github.com/lmaccherone/documentdb-lumenize) with the .NET client. I get an error when I try to pass the parameter and query into the stored procedure. Below is my code:
public async static void QuerySP() {
using (client = new DocumentClient(new Uri(endpointUrl), authorizationKey))
{
//Get the Database
var database = client.CreateDatabaseQuery().Where(db => db.Id == databaseId).ToArray().FirstOrDefault();
//Get the Document Collection
var collection = client.CreateDocumentCollectionQuery(database.SelfLink).Where(c => c.Id == collectionId).ToArray().FirstOrDefault();
StoredProcedure storedProc = client.CreateStoredProcedureQuery(collection.StoredProceduresLink).Where(sp => sp.Id == "cube").ToArray().FirstOrDefault();
dynamic result = await client.ExecuteStoredProcedureAsync<dynamic>(storedProc.SelfLink, "{cubeConfig: {groupBy: 'publication', field: 'pid', f: 'count'}, filterQuery: 'SELECT pid, publication FROM c'}");
Console.WriteLine("Result from script: {0}\r\n", result.Response);
}
}
I get the following error when I execute the code:
Message: {"Errors":["Encountered exception while executing Javascript. Exception = Error: cubeConfig or savedCube required\r\nStack trace: Error: cubeConfig or savedCube required\n at fn (cube.js:1803:7)\n at __docDbMain (cube.js:1844:5)\n at Unknown script code (cube.js:1:2)"]}
I'm not sure what I did wrong. I would really appreciate the help. Thanks.
You almost have it. The problem is that you are sending in the cubeConfig as a string. It needs to be an object. Here is code that does that:
string cubeConfigString = @"{
cubeConfig: {
groupBy: 'publication',
field: 'pid',
f: 'count'
},
filterQuery: 'SELECT * FROM c'
}";
Object cubeConfig = JsonConvert.DeserializeObject<Object>(cubeConfigString);
Console.WriteLine(cubeConfig);
dynamic result = await client.ExecuteStoredProcedureAsync<dynamic>("dbs/dev-test-database/colls/dev-test-collection/sprocs/cube", cubeConfig);
Console.WriteLine(result.Response);
My working code:
public async static Task QuerySP2()
{
using (client = new DocumentClient(new Uri(endpointUrl), authorizationKey))
{
//Get the Database
var database = client.CreateDatabaseQuery().Where(db => db.Id == databaseId).ToArray().FirstOrDefault();
//Get the Document Collection
var collection = client.CreateDocumentCollectionQuery(database.SelfLink).Where(c => c.Id == collectionId).ToArray().FirstOrDefault();
StoredProcedure storedProc = client.CreateStoredProcedureQuery(collection.StoredProceduresLink).Where(sp => sp.Id == "cube").ToArray().FirstOrDefault();
string filterQuery = string.Format(@"SELECT * from c");
string cubeConfigString = @"{
cubeConfig: {
groupBy: 'publication',
field: 'id',
f: 'count'
},
filterQuery: '" + filterQuery + "'}";
dynamic cubeConfig = JsonConvert.DeserializeObject<dynamic>(cubeConfigString);
Console.WriteLine(cubeConfig);
string continuationToken = null;
dynamic result=null;
do
{
var queryDone = false;
while (!queryDone)
{
try
{
result = await client.ExecuteStoredProcedureAsync<dynamic>(storedProc.SelfLink, cubeConfig);
cubeConfig = result.Response;
continuationToken = cubeConfig.continuation;
queryDone = true;
}
catch (DocumentClientException documentClientException)
{
var statusCode = (int)documentClientException.StatusCode;
if (statusCode == 429 || statusCode == 503)
System.Threading.Thread.Sleep(documentClientException.RetryAfter);
else
throw;
}
catch (AggregateException aggregateException)
{
if (aggregateException.InnerException.GetType() == typeof(DocumentClientException))
{
var docExcep = aggregateException.InnerException as DocumentClientException;
var statusCode = (int)docExcep.StatusCode;
if (statusCode == 429 || statusCode == 503)
System.Threading.Thread.Sleep(docExcep.RetryAfter);
else
throw;
}
}
}
} while (continuationToken != null);
Console.WriteLine("Result from script: {0}\r\n", result.Response);
}
}

Asp.Net MVC3, Update query in Linq

I'd like to know how to run this query the LINQ way.
UPDATE orders SET shipDate = '6/15/2012' WHERE orderId IN ('123123','4986948','23947439')
My code:
[HttpGet]
public void test()
{
EFOrdersRepository ordersRepository = new EFOrdersRepository();
var query = ordersRepository.Orders;
// How to run this query in LINQ
// Query : UPDATE orders SET shipDate = '6/15/2012' WHERE orderId IN ('123123','4986948','23947439')
}
EFOrdersRepository.cs
public class EFOrdersRepository
{
private EFMysqlContext context = new EFMysqlContext();
public IQueryable<Order> Orders
{
get { return context.orders; }
}
}
EFMysqlContext.cs
class EFMysqlContext : DbContext
{
public DbSet<Order> orders { get; set; }
}
Actually it's pretty easy. Check the following code:
EFMysqlContext db = new EFMysqlContext();
string[] ids = new string[] { "123123", "4986948", "23947439" };
// this LINQ query returns the orders with the given ids
List<Order> orders = db.orders
.Where(x => ids.Contains(x.orderId))
.ToList();
foreach(var order in orders)
{
order.ShipDate = new DateTime(2012, 6, 15);
db.Entry(order).State = EntityState.Modified;
}
db.SaveChanges();
Something like this should work (warning: pseudo code ahead!)
EDIT: I like Jorge's method of retrieving the orders better (using Contains), but I'm leaving this here as another alternative. The statements below the code sample still hold true, however.
[HttpGet]
public void test()
{
EFOrdersRepository ordersRepository = new EFOrdersRepository();
var query = ordersRepository.Orders.Where(x => x.orderId == "123123" ||
x.orderId == "4986948" || x.orderId == "23947439").ToList();
foreach(var order in query){
var localOrder = order;
order.ShipDate = new DateTime(2012, 6, 15);
}
// Pseudo code: assumes the repository exposes a SaveChanges() that forwards to the DbContext
ordersRepository.SaveChanges();
}
Basically, LINQ does not do 'bulk updates' well. You either have to fetch and loop through your orders or write a stored procedure that can take an array of ids and bulk update them that way. If you are only doing a few at a time, the above will work OK. If you have tons of orders that need to be updated, the ORM probably will not be the best choice. I look forward to seeing whether anyone else has a better approach.
Disclaimer: the var localOrder = order line is to ensure that there are no modified closure issues. Also, ReSharper and other tools may have a less verbose way of writing the above.
Note: You need to call SaveChanges on your DbContext at the end.
Short answer:
var f = new[] { 123123, 4986948, 23947439 };
var matchingOrders = orders.Where(x => f.Contains(x.ID)).ToList();
matchingOrders.ForEach(x => x.ShipDate = newDate);
Complete test:
// new date value
var newDate = new DateTime(2012, 6, 15);
// ids
var f = new[] { 123123, 4986948, 23947439 };
// simulating the orders from the db
var orders = Builder<Order2>.CreateListOfSize(10).Build().ToList();
orders.Add(new Order2 { ID = 123123 });
orders.Add(new Order2 { ID = 4986948 });
orders.Add(new Order2 { ID = 23947439 });
// selecting only the matching orders
var matchingOrders = orders.Where(x => f.Contains(x.ID)).ToList();
matchingOrders.ForEach(x => Console.WriteLine("ID: " + x.ID + " Date: " + x.ShipDate.ToShortDateString()));
// setting the new value to all the results
matchingOrders.ForEach(x => x.ShipDate = newDate);
matchingOrders.ForEach(x => Console.WriteLine("ID: " + x.ID + " Date: " + x.ShipDate.ToShortDateString()));
Output:
ID: 123123 Date: 1/1/0001
ID: 4986948 Date: 1/1/0001
ID: 23947439 Date: 1/1/0001
ID: 123123 Date: 6/15/2012
ID: 4986948 Date: 6/15/2012
ID: 23947439 Date: 6/15/2012
In ORMs, you have to fetch the record first, make the change to the record, and then save it back. To do that, I would add an UpdateOrder method to my repository like this:
public bool UpdateOrder(Order order)
{
bool result=false;
int n=0;
context.orders.Attach(order);
context.Entry(order).State=EntityState.Modified;
try
{
n=context.SaveChanges();
result=true;
}
catch (DbUpdateConcurrencyException ex)
{
ex.Entries.Single().Reload();
n= context.SaveChanges();
result= true;
}
catch (Exception ex2)
{
//log error or propogate to parent
}
return result;
}
And I would call it from my action method like this:
// orderId is assumed to be a string, as in the question's SQL
string orderId="123232";
var orders=ordersRepository.Orders.Where(x=> x.orderId == orderId).ToList();
if(orders!=null)
{
foreach(var order in orders)
{
order.ShipDate=DateTime.Parse("12/12/2012");
var result= ordersRepository.UpdateOrder(order);
}
}
With this approach, if you have to update many records, you execute that many UPDATE statements against the database. In this particular case, I would rather execute a raw SQL statement as a single query using the Database.ExecuteSqlCommand method, which runs a non-query command and returns the number of rows affected:
string yourQry=@"UPDATE orders SET shipDate = '6/15/2012'
WHERE orderId IN ('123123','4986948','23947439')";
int rowsAffected=context.Database.ExecuteSqlCommand(yourQry);

Resources