Cosmos DB - Gremlin - Query performance slow when ordering - azure-cosmosdb

I hope that someone can help me. I have set up a simple test to measure query performance for a basic graph, and I have included the code below. Basically, I create 30000 vertices and then create 20000 edges on one of the vertices.
public async Task<bool> runTests(Models.Person person)
{
try
{
var gremlinServer = new GremlinServer(_hostname, _port, enableSsl: true,
username: "/dbs/" + _dataBaseId + "/colls/" + _collectionId,
password: _authKey);
using (var gremlinClient = new GremlinClient(gremlinServer, new GraphSON2Reader(), new GraphSON2Writer(), GremlinClient.GraphSON2MimeType))
{
await gremlinClient.SubmitAsync<dynamic>("g.E().drop()");
await gremlinClient.SubmitAsync<dynamic>("g.V().drop()");
var counter = 30000;
while (counter != 0)
{
await gremlinClient.SubmitAsync<dynamic>("g.addV('partition" + counter + "').property('id', 'partition" + counter + "').property('profilePicture', '" + person.ProfilePicture + "').property('name', 'Person " + counter + "').property('partitionKey', 'partition" + counter + "')");
counter--;
}
counter = 20000; // reuse the variable declared above; declaring it a second time does not compile
while (counter != 0)
{
int num = counter + 1;
var personToLink = "partition" + num;
await gremlinClient.SubmitAsync<dynamic>("g.V('partition1').addE('friendsWith').to(g.V('partition" + num + "'))");
counter--;
}
var searchResults = await gremlinClient.SubmitAsync<dynamic>("g.V().hasId('partition1').out('friendsWith').order().by('name', incr).valueMap('name', 'profilePicture').range(0,2)");
return true;
}
}
catch (Exception ex)
{
throw; // rethrow without resetting the stack trace
}
}
When I run the following query results are returned quickly:
g.V().hasId('partition1').out('friendsWith').valueMap('name', 'profilePicture').range(0,2)
However, as soon as I add an order clause the query takes much, much longer. Over a minute to complete:
g.V().hasId('partition1').out('friendsWith').order().by('name', incr).valueMap('name', 'profilePicture').range(0,2)
Is there a way to index the graph to speed this type of query up?
I have another question as well. I have set the throughput to 5000 RU, but when I run a query that completes very quickly I get the following:
[Image: Data Explorer Query Stats Result]
What is this value supposed to represent (RUs)? If so, why is it so high?
Also when I try to run a simple query:
g.V().hasId('partition1').out('friendsWith').hasId('partition20001')
I get "Request rate is large" even though this is such a simple query. Even more concerning, when I increase the throughput to 5000 RUs I do get a result back, but it's really slow: it takes about 5-6 seconds for what should be a really simple query.

Related

Firebase count group by

Does Firebase support grouped counting?
I would like to get counts for a specific key, grouped by its value.
Example of my data structure:
"playbackPosition" : {
"-JZ2c7v-imQXaEx2c4Cs" : {
"data" : "0"
},
"-JZ2cDta73UGAgwIOejL" : {
"data" : "25"
},
"-JZ2cJutDZu-7quLpLI2" : {
"data" : "50"
},
"-JZ2cO-Me0nGK5XLTwd-" : {
"data" : "75"
},
"-JZ2cSh-XnstYxbE_Zad" : {
"data" : "100"
},
"-JZ2cWxya0-kmTPCze4u" : {
"data" : "0"
},
"-JZ2c_wX-ODv43TKXkNG" : {
"data" : "25"
}
}
Required results based on the data key :
0 => 2
25 => 2
50 => 1
75 => 1
100 => 1
And of course I must consider that it will have thousands of children, not only 7...
Thanks ahead!
EDIT
Deeper explanation of the app and the problem we want to solve.
We have video scripts which run on different websites. Each video session (a user session) sends events and data; you can see an example here (check the Network tab): https://secure.bwebi.co/videostir/regular.html
Our goal is to collect this data and create an analytics real time dashboard with few charts and graphs.
You can see our current data structure here
Example for graphs we need:
Completion rate
General - Bar graph showing overall number of views per clip duration pre defined periods.
Filters - date (start/end), unique/all, All urls / specific urls (embed on), All configurations / specific configurations/ ignore silent.
X axis - groups 0-25, 25-50,50-75,75-99, 100
Y axis - number of views
Views per day (with completion rate)
General - Multi lines graph showing number of views per day per duration periods.
Filters - date (start/end), unique/all, All urls / specific urls (embed on), All configurations / specific configurations / ignore silent.
X axis - Time in days
Y axis - Number of views
Lines for:
Total daily views
Daily views with 100% duration
Daily views with 75-99% duration
Daily views with 50-75% duration
Daily views with 25-50% duration
Daily views with 0-25% duration
Hope it's more clear now!
Group by is a SQL function. The reason SQL can't do real-time data is because this sort of method does not scale. Mongo provides similar functionality, but once again, it doesn't scale. You may notice a pattern here of why Firebase does not provide this sort of query function.
It would be extremely helpful if you provided some context of what you're actually attempting to accomplish here, what the rules of the app are, and what approaches you've ruled out, rather than just your presupposed solution of group by. There are probably other, possibly better, alternatives. See the XY problem.
Here are a couple generic alternatives derived by making sweeping assumptions about your use case.
Store the totals
This is the most scalable solution. Store your data as follows:
/playbacks/$id/<playback data>
/group_totals/$group/<count>
When writing to playbacks, also update the count for the appropriate group:
var fb = new Firebase(URL);
function addPlayback(rec) {
var ref = fb.child('playbacks').push(rec, function(err) {
if( err ) throw err;
incrementCount(rec.data);
});
}
function incrementCount(count) {
fb.child('group_totals/' + count).transaction(function(currentVal) {
return (currentVal||0)+1;
});
}
Now when you want to get the total for a group, you can simply look up the value at group_totals/$group. Similarly, you can store ids for records that belong to each group and utilize that index to grab only the records for a given group.
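If records already exist without totals, group_totals could be seeded with a one-off pass over a snapshot of the data. The sketch below is plain JavaScript with no Firebase calls; countByData is a hypothetical helper name, and it assumes every child has a data field as in the question's example.

```javascript
// Hypothetical helper: tallies children by their `data` value.
// `children` is a plain object shaped like the playbackPosition node in the question.
function countByData(children) {
  var totals = Object.create(null); // no inherited keys to collide with
  Object.keys(children).forEach(function (id) {
    var group = children[id].data;
    totals[group] = (totals[group] || 0) + 1;
  });
  return totals;
}

var playbackPosition = {
  "-JZ2c7v-imQXaEx2c4Cs": { data: "0" },
  "-JZ2cDta73UGAgwIOejL": { data: "25" },
  "-JZ2cJutDZu-7quLpLI2": { data: "50" },
  "-JZ2cO-Me0nGK5XLTwd-": { data: "75" },
  "-JZ2cSh-XnstYxbE_Zad": { data: "100" },
  "-JZ2cWxya0-kmTPCze4u": { data: "0" },
  "-JZ2c_wX-ODv43TKXkNG": { data: "25" }
};

var seededTotals = countByData(playbackPosition);
Object.keys(seededTotals).forEach(function (group) {
  console.log(group + " => " + seededTotals[group]);
});
```

Each entry could then be written once to group_totals/$group, after which the transaction-based increment keeps it current.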
Use priorities to fetch and group
A simpler approach would be to give each record a priority based on the group/data value.
var fb = new Firebase(URL);
function addPlayback(rec) {
rec['.priority'] = rec.data;
var ref = fb.child('playbacks').push(rec, function(err) {
if( err ) throw err;
});
}
Now to grab a set of records for a given group:
var fb = new Firebase(URL);
function getGroup(groupValue, callback) {
fb.child('playbacks').startAt(groupValue).endAt(groupValue).once('value', callback); // query the path that addPlayback writes to
}
function logGroupCount(groupValue, callback) {
getGroup(groupValue, function(snap) {
console.log('there are ' + snap.numChildren() + ' items in group ' +groupValue);
});
}
I am not a professional programmer; I am just learning.
Here is the piece of code I came up with when I wanted to group my query results:
btn.setOnClickListener(new View.OnClickListener() {
@Override
public void onClick(View v) {
progressBar.setVisibility(View.VISIBLE);
tv_info.setText("Please wait ... ");
Query query = collectionRef.orderBy("timestamp", Query.Direction.DESCENDING);
query.get().addOnSuccessListener(new OnSuccessListener<QuerySnapshot>() {
@Override
public void onSuccess(QuerySnapshot queryDocumentSnapshots) {
String dataInfo = "";
int arrayLength = 1;
List <String> UsedNames = new ArrayList<String>();
String someName = "...";
String oneRow = "";
int orderNumber = -1;
int count = 0;
for (QueryDocumentSnapshot documentSnapshot : queryDocumentSnapshots) {
OneRecord oneRecord = documentSnapshot.toObject(OneRecord.class);
someName = oneRecord.getSomeName();
if (UsedNames.contains(someName)) {
// Log.i("test" , "Array Contains ");
} else {
orderNumber += 1;
UsedNames.add(someName);
}
}
List list = queryDocumentSnapshots.toObjects(OneRecord.class);
for (String someString : UsedNames) {
int counter = 0;
for (int i = 0; i < list.size(); i++) {
OneRecord oneRecord = (OneRecord) list.get(i);
String name = oneRecord.getName();
if (someString.equals(name)) {
counter += 1;
}
}
Log.i("test" , "Array: " + someString + " : " + counter);
count = count +1;
dataInfo = dataInfo + someString + " : " + counter + "\n";
}
Log.i("test" , "Used length: " + UsedNames.size());
progressBar.setVisibility(View.GONE);
tv_info.setText(dataInfo);
}
}).addOnFailureListener(new OnFailureListener() {
@Override
public void onFailure(@NonNull Exception e) {
progressBar.setVisibility(View.GONE);
tv_info.setText("Could not query last records: " + e.getMessage());
}
});
}
});
Unfortunately I did not figure out how to sort them in DESCENDING or ASCENDING order
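The sorting step left open above can be handled after counting, by sorting the (name, count) pairs. This is a minimal sketch, written in JavaScript rather than Java; countAndSort and its arguments are hypothetical names, and the names array stands in for the values read from the query snapshot.

```javascript
// Hypothetical helper: counts occurrences of each name in one pass,
// then returns [name, count] pairs sorted by count.
function countAndSort(names, descending) {
  var counts = Object.create(null); // no inherited keys to collide with
  names.forEach(function (name) {
    counts[name] = (counts[name] || 0) + 1;
  });
  var pairs = Object.keys(counts).map(function (name) {
    return [name, counts[name]];
  });
  pairs.sort(function (a, b) {
    return descending ? b[1] - a[1] : a[1] - b[1];
  });
  return pairs;
}

var sampleNames = ["ann", "bob", "ann", "cy", "ann", "bob"];
console.log(countAndSort(sampleNames, true));  // [ [ 'ann', 3 ], [ 'bob', 2 ], [ 'cy', 1 ] ]
console.log(countAndSort(sampleNames, false)); // [ [ 'cy', 1 ], [ 'bob', 2 ], [ 'ann', 3 ] ]
```

Counting in a single pass with a map also avoids the nested loop over the full result list for every distinct name.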

Can't find memory leak in Swift app

I'm trying to learn Swift, so I wrote a little test application to that end. It just gives the total size of items in a directory, recursing into subdirectories to accumulate the total size of their contents. The application works, but the memory usage just grows and grows while it runs. I had expected the memory usage to increase as the recursion got deeper and decrease when a recursive call returns. Instead, the memory usage just constantly climbs. Instruments doesn't identify any leaks. I've tried a few tips I've found in various Google results, including:
re-using the default NSFileManager
not re-using the default NSFileManager but creating a new one for each recursive call
avoiding String interpolation
Nothing seems to make any difference. I had thought that Swift would clean up objects as their reference count reached zero.
This is the code in its entirety in its current state:
import Foundation
func sizeOfContents(path: String) -> UInt64
{
let subManager = NSFileManager()
var totalSize: UInt64 = 0;
var isDir: ObjCBool = false
if subManager.fileExistsAtPath(path, isDirectory: &isDir)
{
if !isDir.boolValue
{
var error: NSError? = nil
let attributes: NSDictionary? = subManager.attributesOfItemAtPath(path, error: &error)
let size: UInt64? = attributes?.fileSize()
totalSize += size!
}
else
{
var error: NSError? = nil
if let subContents = subManager.contentsOfDirectoryAtPath(path, error: &error)
{
for subItem in subContents
{
var subName = subItem as String
subName = path + "/" + subName
totalSize += sizeOfContents(subName)
}
}
}
}
return totalSize
}
let manager = NSFileManager.defaultManager()
var rootPath = "/Applications/"
if let contents = manager.contentsOfDirectoryAtPath(rootPath, error: nil)
{
for item in contents
{
let itemName = item as String
var isDir: ObjCBool = false
print("item: " + (rootPath + itemName))
if manager.fileExistsAtPath(rootPath + itemName, isDirectory: &isDir)
{
if !isDir.boolValue
{
var error: NSError? = nil
let attributes: NSDictionary? = manager.attributesOfItemAtPath(rootPath + itemName, error: &error)
let size: UInt64? = attributes?.fileSize()
println("\t\(size!)")
}
else
{
if(itemName != "Volumes")
{
let size = sizeOfContents(rootPath + itemName)
println("\t\(size)")
}
}
}
}
}
You need to add an autoreleasepool in the loop, possibly around the recursive call. Because it is a tight loop, the idle loop never gets a chance to release the temporary memory allocations.
Example:
...
autoreleasepool {
totalSize += sizeOfContents(subName)
}
...

winjs sqlite database is locked

I use the SQLite3-WinRT library (https://github.com/doo/SQLite3-WinRT) in a Windows 8 project (js/html).
I created a function that is called in a for loop.
I get this error:
SQLiteError: 0x800700aa: eachAsync("INSERT INTO home (id, url, cksum)VALUES (16, 'main_page_2.jpg', 'e0d046ca3421a3c2df328b293ad5981a');", ) database is locked
I think the error occurs because I create a new connection on every iteration of the loop, but I don't know another way to do it. Can anyone help me?
This is the function:
function insertInDB(dbPath, tbName, arrayCol, arrayVal) {
SQLite3JS.openAsync(dbPath).then(function (db) {
var query = "INSERT INTO " + tbName;
var column = " (";
var values = "VALUES (";
for (var i = 0; i < arrayCol.length; i++) {
if (i == arrayCol.length - 1) {
column = column + arrayCol[i] + ")";
} else {
column = column + arrayCol[i] + ", ";
}
}
for (var i = 0; i < arrayVal.length; i++) {
if (i == arrayVal.length - 1) { // compare against the array being iterated
values = values + arrayVal[i] + ");";
} else {
values = values + arrayVal[i] + ", ";
}
}
query = query + column + values;
return db.eachAsync(query).done(function () {
console.log("Ok");
db.close();
},
function (error) { console.log(error); },
function (progress) { });
});
}
And this is the loop that calls the function above:
listHome.forEach(function(value, index, array){
var valconfig = new Array(value.id, "'" + value.url + "'", "'" + value.cksum + "'");
console.log("id=" + value.id + " url=" + value.url + " ck=" + value.cksum);
insertInDB(sqlPath, "home", colconfig, valconfig);
})
If I'm reading this correctly, your calling code is iterating over a list of values synchronously. listHome.forEach will call insertInDB for each item in listHome ... but it doesn't wait for insertInDB to return before making the next call to insertInDB.
Inside insertInDB you have calls to SQLite3JS.openAsync and db.eachAsync, both asynchronous methods. After perusing SQLite3JS a little bit (which looks pretty cool), I can see that both of those methods return promises, and internally they call into a WinRT component. Great design.
So this is what I suspect is happening: one of the asynchronous calls in insertInDB puts a lock on the database. However, insertInDB returns control back to the listHome.forEach loop as soon as it hits the first asynchronous method call. If the lock on the database remains once forEach gets to the next item in listHome, then the operation will attempt to write to a locked database. Hence the error.
I'll think about this a little bit and see if I can come up with a solution.
-- edit --
Okay, I have a solution that might work for you. You might want to create a "DataBaseHelper" class that will queue up the transactions that you need to make in the database.
Here's a rough prototype that I threw together:
[Replaces your foreach loop]
DBHelper.queueUpdates(listHome);
[DBHelper module definition]
(function () {
var _queue;
function queueUpdates(array) {
_queue = array;
scheduleUpdates();
}
function scheduleUpdates() {
if (_queue.length > 0) {
var transaction = _queue.pop();
insertInDB("path", "table", "column", transaction);
}
}
function insertInDB(dbPath, tbName, arrayCol, arrayVal) {
return SQLite3JS.openAsync(dbPath).then(function (db) {
// Construct your SQL query ...
return db.eachAsync(query).done(function () {
db.close();
scheduleUpdates();
},
function (error) { console.log(error); },
function (progress) { });
});
}
WinJS.Namespace.define("DBHelper", {
queueUpdates: queueUpdates
})
})();
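Another way to get the same one-at-a-time behavior is to chain the promises so that each insert starts only after the previous one has finished. The sketch below uses standard JavaScript promises and a stand-in insert function, not the SQLite3-WinRT API; runSequentially and fakeInsert are hypothetical names, and it assumes the real insertInDB returns its promise.

```javascript
// Hypothetical helper: runs task(item) for each item, one after another.
// `task` must return a promise; the next call starts only after it resolves.
function runSequentially(items, task) {
  return items.reduce(function (chain, item) {
    return chain.then(function () {
      return task(item);
    });
  }, Promise.resolve());
}

// Stand-in for insertInDB: records when each "insert" starts and ends.
var insertLog = [];
function fakeInsert(row) {
  insertLog.push("start " + row.id);
  return new Promise(function (resolve) {
    setTimeout(function () {
      insertLog.push("end " + row.id);
      resolve();
    }, 10);
  });
}

runSequentially([{ id: 1 }, { id: 2 }, { id: 3 }], fakeInsert).then(function () {
  // Each "end" appears before the next "start", so no two inserts overlap.
  console.log(insertLog.join(", "));
});
```

Unlike the queue above, this version leaves the input array untouched and processes the items in their original order.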

Make operation with aggregate function result

I confess that I do not get along very well with the Deferred object. I'm querying the database across several "Stores" and want to run a series of operations on each result. This troubles me because the results are returned asynchronously, and I have no way to run the corresponding operation on the "store" it belongs to. In short, the problem is that this piece of code always executes the same function on the same "Store":
for (var i = 0; i < schema['stores'].length; i++) {
storeName = schema['stores'][i].name;
var objeto = db.executeSql('SELECT MAX(date_upd) FROM ' + '"' + storeName + '"').done(
function(result, a){
//saveDataSynce(db, storeName, result);
console.log(result);
}
);
}
Whenever there is a loop over an async operation, be very careful about function scope. In your example code, by the time the callbacks run, storeName inside the function will always hold the value from the last iteration. Use function scope as follows:
var getMax = function(storeName) {
db.executeSql('SELECT MAX(date_upd) FROM ' + '"' + storeName + '"').done(
function(result){
//saveDataSynce(db, storeName, result);
console.log(storeName, result);
}
);
}
for (var i = 0; i < schema['stores'].length; i++) {
getMax(schema['stores'][i].name);
}
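The effect is easy to reproduce without a database. In this sketch the callbacks are stored and run after the loop has finished, which stands in for the asynchronous done handlers; the store names and function names are made up for illustration.

```javascript
var stores = ["customers", "orders", "items"];

// Shared variable: every callback sees the value left by the final iteration.
function withSharedVariable() {
  var callbacks = [];
  var storeName;
  for (var i = 0; i < stores.length; i++) {
    storeName = stores[i];
    callbacks.push(function () { return storeName; });
  }
  return callbacks.map(function (cb) { return cb(); });
}

// Function scope: each call gets its own storeName parameter.
function withFunctionScope() {
  var callbacks = [];
  var capture = function (storeName) {
    callbacks.push(function () { return storeName; });
  };
  for (var i = 0; i < stores.length; i++) {
    capture(stores[i]);
  }
  return callbacks.map(function (cb) { return cb(); });
}

console.log(withSharedVariable()); // [ 'items', 'items', 'items' ]
console.log(withFunctionScope());  // [ 'customers', 'orders', 'items' ]
```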
However, the preferred coding pattern for YDN-DB is NoSQL style, as follows:
var getMax = function(storeName) {
var indexName = 'date_upd';
var key_range = null; // whole store
var limit = 1;
var offset = 0;
var reverse = true;
db.values(storeName, indexName, key_range, limit, offset, reverse).done(
function(results) {
var max_key = results[0]; // may be undefined. OK.
//saveDataSynce(db, storeName, max_key);
console.log(storeName, max_key);
}
);
}
Note that keys (primary or index) are always sorted in ascending order, so the max key is the first key in reverse order.

Listing orders and passing ISO 8601 date to Amazon MWS

What would be the best way to retrieve orders from Amazon MWS?
My current code is as follows...
MarketplaceWebServiceOrdersConfig config = new MarketplaceWebServiceOrdersConfig();
config.ServiceURL = productsURL;
MarketplaceWebServiceOrders.MarketplaceWebServiceOrdersClient service = new MarketplaceWebServiceOrdersClient(appname, version, accesskeyID, secretkey, config);
ListOrdersRequest request = new ListOrdersRequest();
request.MarketplaceId = new MarketplaceIdList();
request.MarketplaceId.Id = new List<string>(new string[] { marketids[0] });
request.SellerId = merchantID;
request.OrderStatus = new OrderStatusList() { Status = new List<OrderStatusEnum>() { OrderStatusEnum.Unshipped, OrderStatusEnum.PartiallyShipped } };
request.CreatedAfter = Convert.ToDateTime(dc.Settings.SingleOrDefault().lastOrdersRetrieved);
ListOrdersResponse response = service.ListOrders(request);
I am having issues passing the ISO date across. Also, if you see any other issues with the code, please feel free to let me know.
If you're looking for something created after the immediate second you make the request, it won't find anything at all, because with Amazon you can only grab order data up to about 2 minutes before the current time.
I had an issue with trying to set time from Now - 5 minutes. After speaking to Amazon support they provided the following nugget:
"In Orders API, if you don't specify an end time (CreatedBefore or LastUpdatedBefore), it will assume now (actually, now minus 2 minutes). And in its response, it will tell you exactly what time it used as the cutoff time."
In your case you will want to remove the CreatedAfter request and let Amazon choose for you.
If you are then looking for the created after, you can grab the response time Amazon gave and pass that in to your created after param.
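To make the cutoff concrete, here is a small sketch of computing a CreatedAfter/CreatedBefore window as ISO 8601 strings, keeping the upper bound two minutes behind the current time as the support quote describes. It is plain JavaScript rather than the C# client library; orderWindow and lookbackMinutes are hypothetical names.

```javascript
// Hypothetical helper: returns an order-query window as ISO 8601 UTC strings.
// The upper bound stays 2 minutes behind `now`, per the support quote above.
function orderWindow(now, lookbackMinutes) {
  var MINUTE_MS = 60 * 1000;
  var createdBefore = new Date(now.getTime() - 2 * MINUTE_MS);
  var createdAfter = new Date(now.getTime() - lookbackMinutes * MINUTE_MS);
  return {
    createdAfter: createdAfter.toISOString(),
    createdBefore: createdBefore.toISOString()
  };
}

var win = orderWindow(new Date(Date.UTC(2020, 0, 1, 12, 0, 0)), 60);
console.log(win.createdAfter);  // 2020-01-01T11:00:00.000Z
console.log(win.createdBefore); // 2020-01-01T11:58:00.000Z
```

Working in UTC throughout avoids local time zone offsets leaking into the timestamps; in C# the same arithmetic could start from DateTime.UtcNow.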
The method I have right now to list orders is as follows, mind you this will just list the orders to console, but the data gets returned all the same:
public List<string> ListOrders(MarketplaceWebServiceOrders.MarketplaceWebServiceOrders service, string merchantId, List<OrderStatusEnum> orderStatus)
{
List<string> salesOrderIds = new List<string>();
ListOrdersRequest listOrdersRequest = new ListOrdersRequest();
DateTime createdAfter = DateTime.Now.Add(new TimeSpan(-1, 0, 0));
DateTime createdbefore = DateTime.Now.Add(new TimeSpan(0, -15, 0));
listOrdersRequest.CreatedAfter = createdAfter;
listOrdersRequest.CreatedBefore = createdbefore;
listOrdersRequest.SellerId = merchantId;
listOrdersRequest.OrderStatus = new OrderStatusList();
foreach (OrderStatusEnum status in orderStatus)
{
listOrdersRequest.OrderStatus.Status.Add(status);
}
listOrdersRequest.FulfillmentChannel = new FulfillmentChannelList();
listOrdersRequest.FulfillmentChannel.Channel = new List<FulfillmentChannelEnum>();
listOrdersRequest.FulfillmentChannel.Channel.Add(FulfillmentChannelEnum.MFN);
listOrdersRequest.MarketplaceId = new MarketplaceIdList();
listOrdersRequest.MarketplaceId.Id = new List<string>();
listOrdersRequest.MarketplaceId.Id.Add("yourID");
ListOrdersResponse listOrdersResponse = service.ListOrders(listOrdersRequest);
int i = 0;
foreach (Order order in listOrdersResponse.ListOrdersResult.Orders.Order)
{
i++;
Console.WriteLine("Amazon Order ID: \t" + order.AmazonOrderId);
Console.WriteLine("Buyer Name: \t" + order.BuyerName);
Console.WriteLine("Buyer Email: \t" + order.BuyerEmail);
Console.WriteLine("Fulfillment Channel: \t" + order.FulfillmentChannel);
Console.WriteLine("Order Status: \t" + order.OrderStatus);
Console.WriteLine("Order Total: \t" + order.OrderTotal);
Console.WriteLine("Number of Items Shipped: \t" + order.NumberOfItemsShipped);
Console.WriteLine("Number of Items Unshipped: \t" + order.NumberOfItemsUnshipped);
Console.WriteLine("Purchase Date: \t" + order.PurchaseDate);
Console.WriteLine("===========================================================");
salesOrderIds.Add(order.AmazonOrderId);
}
Console.WriteLine("We returned a total of {0} records. ", i);
return salesOrderIds;
}
