How do MaxItemCount from FeedOptions and RetrievedDocumentCount from QueryMetrics work in Cosmos DB, and why do they never match? - azure-cosmosdb

I am currently facing a query performance issue with Cosmos DB. I am fairly sure I have followed most of the performance tips from the Microsoft page, but the query still takes > 1 second.
Connection policy
private static readonly ConnectionPolicy ConnectionPolicy = new ConnectionPolicy
{
ConnectionMode = ConnectionMode.Direct,
ConnectionProtocol = Protocol.Tcp,
RequestTimeout = new TimeSpan(1, 0, 0),
MaxConnectionLimit = 1000,
RetryOptions = new RetryOptions
{
MaxRetryAttemptsOnThrottledRequests = 10,
MaxRetryWaitTimeInSeconds = 60
}
};
Document Client
this.Client = new DocumentClient(new Uri(config.DocumentDBURI), config.DocumentDBKey, ConnectionPolicy);
Document Query
FeedOptions options = new FeedOptions
{
MaxItemCount = config.getSearchLimit, // which is 100
PartitionKey = new PartitionKey(partitionKey),
RequestContinuation = responseContinuation
};
var documentQuery = Client.CreateDocumentQuery<SearchByAttributesResult>(
this.TenantCollectionUri,
querySpec,
options).AsDocumentQuery();
Query 1
SELECT p.Doc.id, p.Doc.Name, p.Doc.isOrganization, p.Doc.organizationLegalName, p.Doc.isFactoryAutoUpdate, p.Doc.StartDate, p.Doc.EndDate, p.Doc.InactiveReasonCode, p.Doc.Specialty.specialty AllSpecialty, Address from p JOIN Address IN p.Doc.Address.address WHERE (p.Doc.EndDate = null or (p.Doc.StartDate <= @STARTDATE and p.Doc.EndDate >= @ENDDATE)) and CONTAINS(p.Doc.Name, @PROVIDERNAME) and Address.alpha2Code = @ALPHA2CODE
Query 2
SELECT p.Doc.id, p.Doc.Name, p.Doc.isOrganization, p.Doc.organizationLegalName, p.Doc.isFactoryAutoUpdate, p.Doc.StartDate, p.Doc.EndDate, p.Doc.InactiveReasonCode, p.Doc.Specialty.specialty AllSpecialty, Address from p JOIN Address IN p.Doc.Address.address WHERE (p.Doc.EndDate = null or (p.Doc.StartDate <= @STARTDATE and p.Doc.EndDate >= @ENDDATE)) and STARTSWITH(Address.postalCode, @POSTALCODE) and Address.alpha2Code = @ALPHA2CODE
The query above changes based on the user's search conditions.
I have only 900 documents in my collection, but the query still always takes > 1 second.
I am trying to understand a few points here:
Although I set MaxItemCount to 100, why am I seeing RetrievedDocumentCount from QueryMetrics as 900?
Is the use of CONTAINS/STARTSWITH causing this performance issue?
What am I doing wrong here, and how can I improve this query's performance to sub-second (< 0.5 s)?

First things first, MaxItemCount doesn't mean that you will get the top 100 documents.
It means that every call to ExecuteNextAsync returns at most 100 documents, but iterating to the end still returns everything that matches the query.
If you want to limit your results to the top 100 then, in LINQ use the .Take(100) method before you use AsDocumentQuery or in SQL use the TOP keyword.
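The difference between a page size and a cap on the total can be sketched with a small in-memory model. This is only an illustration under stated assumptions: `PageSizeDemo`, `sampleMatches`, `nextPage`, `drainAll`, and `top` are hypothetical names, the integer offset stands in for a continuation token, and none of it calls the Cosmos DB SDK.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory stand-in for a paged query; this is not the Cosmos DB SDK.
class PageSizeDemo {

    // Builds a fake list of matching document ids: 0, 1, ..., count - 1.
    static List<Integer> sampleMatches(int count) {
        List<Integer> matches = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            matches.add(i);
        }
        return matches;
    }

    // Returns one page: at most pageSize matches, starting at the continuation offset.
    static List<Integer> nextPage(List<Integer> matches, int continuation, int pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        int end = Math.min(continuation + pageSize, matches.size());
        return new ArrayList<>(matches.subList(continuation, end));
    }

    // Drains every page, the way a loop over successive page requests would.
    static List<Integer> drainAll(List<Integer> matches, int pageSize) {
        List<Integer> all = new ArrayList<>();
        int continuation = 0;
        while (continuation < matches.size()) {
            List<Integer> page = nextPage(matches, continuation, pageSize);
            all.addAll(page);
            continuation += page.size();
        }
        return all;
    }

    // Caps the total number of results, the way TOP or Take would.
    static List<Integer> top(List<Integer> matches, int n) {
        return new ArrayList<>(matches.subList(0, Math.min(n, matches.size())));
    }

    public static void main(String[] args) {
        List<Integer> matches = sampleMatches(250);
        System.out.println("paged, drained: " + drainAll(matches, 100).size());
        System.out.println("capped:         " + top(matches, 100).size());
    }
}
```

With 250 matches and a page size of 100, draining the pages still yields all 250 results in three round trips, while the capped variant stops at 100.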
In terms of performance, the query is slow for three reasons:
You are checking for records within a range of dates.
You are using the CONTAINS/STARTSWITH functions.
You are using a JOIN.
At this point, if changing the schema isn't an option, I would recommend reading more about Indexing and optimising it based on the querying requirements of your application.

Related

AWS Scan ignores withLimit()

I am trying to fetch the items from a DynamoDB table to put them in a csv file. Following is the code:
ArrayList<String> ids = new ArrayList<String>();
ScanResult result = null;
do{
ScanRequest req = new ScanRequest();
req.setTableName("table");
req.withLimit(10);
if(result != null){
req.setExclusiveStartKey(result.getLastEvaluatedKey());
}
AmazonDynamoDBClient client = new AmazonDynamoDBClient(awsCreds);
result = client.scan(req);
List<Map<String, AttributeValue>> rows = result.getItems();
for(Map<String, AttributeValue> map : rows){
try{
AttributeValue v = map.get("prod_number");
String id = v.getS();
ids.add(id);
} catch (NumberFormatException e){
System.out.println(e.getMessage());
}
}
} while(result.getLastEvaluatedKey() != null);
System.out.println("Result size: " + ids.size());
I want to know why 'req.withLimit(10)' has no impact on the number of results. The query still tries to fetch all the records.
The limit property of ScanRequest means:
The maximum number of items to evaluate (not necessarily the number of matching items). If DynamoDB processes the number of items up to the limit while processing the results, it stops the operation and returns the matching values up to that point, and a key in LastEvaluatedKey to apply in a subsequent operation, so that you can pick up where you left off. Also, if the processed dataset size exceeds 1 MB before DynamoDB reaches this limit, it stops the operation and returns the matching values up to the limit, and a key in LastEvaluatedKey to apply in a subsequent operation to continue the operation. For more information, see Working with Queries in the Amazon DynamoDB Developer Guide.
So the limit caps only how many items a single request evaluates, not the whole scan operation. Your loop keeps issuing requests until LastEvaluatedKey is null, so you still end up with every item.
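One way to actually stop after a fixed number of items is to end the loop once enough have been collected, instead of looping until no key remains. The sketch below models this in memory and does not use the AWS SDK: `ScanLimitDemo`, `Page`, `scan`, `scanEverything`, and `scanFirst` are hypothetical names, an integer index stands in for LastEvaluatedKey (with -1 meaning "no more data"), and filtering and the 1 MB page cap are left out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory model of a paged scan; this is not the AWS SDK.
class ScanLimitDemo {

    // Result of one scan call: the items read and where to resume (-1 when done).
    static final class Page {
        final List<String> items;
        final int lastEvaluatedKey;

        Page(List<String> items, int lastEvaluatedKey) {
            this.items = items;
            this.lastEvaluatedKey = lastEvaluatedKey;
        }
    }

    // Builds a fake table of ids: "id-0", "id-1", ...
    static List<String> sampleTable(int size) {
        List<String> table = new ArrayList<>();
        for (int i = 0; i < size; i++) {
            table.add("id-" + i);
        }
        return table;
    }

    // One request: reads at most `limit` items starting at startKey.
    static Page scan(List<String> table, int startKey, int limit) {
        if (limit <= 0) {
            throw new IllegalArgumentException("limit must be positive");
        }
        int end = Math.min(startKey + limit, table.size());
        List<String> items = new ArrayList<>(table.subList(startKey, end));
        return new Page(items, end < table.size() ? end : -1);
    }

    // Same shape as the loop in the question: runs until no key remains.
    static List<String> scanEverything(List<String> table, int limit) {
        List<String> ids = new ArrayList<>();
        int key = 0;
        do {
            Page page = scan(table, key, limit);
            ids.addAll(page.items);
            key = page.lastEvaluatedKey;
        } while (key != -1);
        return ids;
    }

    // Stops issuing requests as soon as `wanted` items have been collected.
    static List<String> scanFirst(List<String> table, int limit, int wanted) {
        List<String> ids = new ArrayList<>();
        int key = 0;
        do {
            Page page = scan(table, key, limit);
            for (String id : page.items) {
                if (ids.size() == wanted) {
                    break;
                }
                ids.add(id);
            }
            key = page.lastEvaluatedKey;
        } while (key != -1 && ids.size() < wanted);
        return ids;
    }

    public static void main(String[] args) {
        List<String> table = sampleTable(35);
        System.out.println("loop until no key: " + scanEverything(table, 10).size());
        System.out.println("stop when enough:  " + scanFirst(table, 10, 10).size());
    }
}
```

In this model the first loop returns all 35 ids even though each request is limited to 10, and the second returns exactly 10 after a single request.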

Why is my query so slow?

I am trying to tune my query, but I have no idea what I can change:
A screenshot of both tables: http://abload.de/image.php?img=1plkyg.jpg
The relation is: 1 UserPM (a Private Message) has 1 Sender (User, SenderID -> User.SenderID) and 1 Recipient (User, RecipientID -> User.UserID) and 1 User has X UserPMs as Recipient and X UserPMs as Sender.
The initial load takes around 200 ms; it only takes the first 20 rows and displays them. After these are displayed, a JavaScript PageMethod calls the GetAllPMsAsReciepient method and loads the rest of the data.
This GetAllPMsAsReciepient method takes around 4.5 to 5.0 seconds each time it runs, on around 250 rows.
My code:
public static List<UserPM> GetAllPMsAsReciepient(Guid userID)
{
using (RPGDataContext dc = new RPGDataContext())
{
DateTime dt = DateTime.Now;
DataLoadOptions options = new DataLoadOptions();
//options.LoadWith<UserPM>(a => a.User);
options.LoadWith<UserPM>(a => a.User1);
dc.LoadOptions = options;
List<UserPM> pm = (
from a in dc.UserPMs
where a.RecieverID == userID
&& !a.IsDeletedRec
orderby a.Timestamp descending select a
).ToList();
TimeSpan ts = DateTime.Now - dt;
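// TotalSeconds avoids the misleading output of Seconds + "." + Milliseconds (e.g. 4 s 50 ms printed as "4.50")
System.Diagnostics.Debug.WriteLine(ts.TotalSeconds.ToString("F3"));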
return pm;
}
}
I have no idea how to tune this query. 250 PMs are nothing at all; inboxes on other websites hold around 5000 or so and don't need even a single second to load...
I tried setting an index on Timestamp to reduce the ORDER BY time, but nothing has changed so far.
Any ideas here?
EDIT
I tried to reproduce it in LINQPad:
Without the DataLoadOptions, the query needs 300 ms in LINQPad; with DataLoadOptions, around 1 second.
So, that means:
I could save around 60% of the time if I could avoid loading the User table within this query, but how?
Why does LINQPad need only 1 second on the same connection, from the same computer, where my code needs 4.5-5.0 seconds?
Here is the execution plan: http://abload.de/image.php?img=54rjwq.jpg
Here is the SQL Linqpad gives me:
SELECT [t0].[PMID], [t0].[Text], [t0].[RecieverID], [t0].[SenderID], [t0].[Title], [t0].[Timestamp], [t0].[IsDeletedRec], [t0].[IsRead], [t0].[IsDeletedSender], [t0].[IsAnswered], [t1].[UserID], [t1].[Username], [t1].[Password], [t1].[Email], [t1].[RegisterDate], [t1].[LastLogin], [t1].[RegisterIP], [t1].[RefreshPing], [t1].[Admin], [t1].[IsDeleted], [t1].[DeletedFrom], [t1].[IsBanned], [t1].[BannedReason], [t1].[BannedFrom], [t1].[BannedAt], [t1].[NowPlay], [t1].[AcceptAGB], [t1].[AcceptRules], [t1].[MainProfile], [t1].[SetShowHTMLEditorInRPGPosts], [t1].[Age], [t1].[SetIsAgePublic], [t1].[City], [t1].[SetIsCityShown], [t1].[Verified], [t1].[Design], [t1].[SetRPGCountPublic], [t1].[SetLastLoginPublic], [t1].[SetRegisterDatePublic], [t1].[SetGBActive], [t1].[Gender], [t1].[IsGenderVisible], [t1].[OnlinelistHidden], [t1].[Birthday], [t1].[SetIsMenuHideable], [t1].[SetColorButtons], [t1].[SetIsAboutMePublic], [t1].[Name], [t1].[SetIsNamePublic], [t1].[ContactAnimexx], [t1].[ContactRPGLand], [t1].[ContactSkype], [t1].[ContactICQ], [t1].[ContactDeviantArt], [t1].[ContactFacebook], [t1].[ContactTwitter], [t1].[ContactTumblr], [t1].[IsContactAnimexxPublic], [t1].[IsContactRPGLandPublic], [t1].[IsContactSkypePublic], [t1].[IsContactICQPublic], [t1].[IsContactDeviantArtPublic], [t1].[IsContactFacebookPublic], [t1].[IsContactTwitterPublic], [t1].[IsContactTumblrPublic], [t1].[IsAdult], [t1].[IsShoutboxVisible], [t1].[Notification], [t1].[ShowTutorial], [t1].[MainProfilePreview], [t1].[SetSound], [t1].[EmailNotification], [t1].[UsernameOld], [t1].[UsernameChangeDate]
FROM [UserPM] AS [t0]
INNER JOIN [User] AS [t1] ON [t1].[UserID] = [t0].[RecieverID]
WHERE ([t0].[RecieverID] = @p0) AND (NOT ([t0].[IsDeletedRec] = 1))
ORDER BY [t0].[Timestamp] DESC
If you want to get rid of the LoadWith, you can select your fields explicitly:
public static List<Tuple<UserPM, User> > GetAllPMsAsReciepient(Guid userID)
{
using (var dataContext = new RPGDataContext())
{
return (
from a in dataContext.UserPMs
where a.RecieverID == userID
&& !a.IsDeletedRec
orderby a.Timestamp descending
select Tuple.Create(a, a.User1)
).ToList();
}
}
I found a solution:
First, it seems that something is not right with the DataLoadOptions; second, it's not clever to load a table with 30 columns when you only need 1.
To solve this, I made a view that covers all the necessary fields and, of course, the join.
It reduces the time from 5.0 seconds to 230 ms!

OrCriteria taking forever to execute using Tridion content delivery api

I am converting a SQL query into broker API functionality. The query basically retrieves custom metadata based on key and value filters. The issue is that when I join two criteria using an OrCriteria, query.ExecuteQuery takes forever and control never returns. The code I am using is below:
PublicationCriteria pubCriteria = new PublicationCriteria(80);
//1st query
CustomMetaKeyCriteria keyCriteria1 = new CustomMetaKeyCriteria("PublicationType");
CustomMetaValueCriteria valueCriteria11 = new CustomMetaValueCriteria("Report", Criteria.Like);
CustomMetaValueCriteria valueCriteria12 = new CustomMetaValueCriteria("Video", Criteria.Like);
Criteria valueCriteria1 = CriteriaFactory.Or(valueCriteria11, valueCriteria12);
Criteria criteria1 =CriteriaFactory.And(keyCriteria1, valueCriteria1);
//2nd query
CustomMetaKeyCriteria keyCriteria2 = new CustomMetaKeyCriteria("Tags");
CustomMetaValueCriteria valueCriteria21 = new CustomMetaValueCriteria("tcm:80-20641", Criteria.Equal);
CustomMetaValueCriteria valueCriteria22 = new CustomMetaValueCriteria("tcm:80-20645", Criteria.Equal);
Criteria valueCriteria2 = CriteriaFactory.Or(valueCriteria21, valueCriteria22);
Criteria criteria2 = CriteriaFactory.And(keyCriteria2, valueCriteria2);
Criteria querycriteria = CriteriaFactory.Or(criteria1, criteria2);
Criteria finalCriteria = CriteriaFactory.And(pubCriteria, querycriteria);
Query query = new Query(criteria2);
query.SetResultFilter(new LimitFilter(10));
var n = query.ExecuteQuery();
I have tried using new OrCriteria and passing the criteria as an array, but this also didn't work.
A couple of weeks back I tried the same thing, and it worked for me. I have put my findings here: http://vadalis.com/custom-meta-query-from-tridionbroker-database/
Note: my broker database is very small.

Retrieve Cellset Value in SSAS\MDX

I'm writing SSAS MDX queries involving more than 2 axes to retrieve a value. Using ADOMD.NET, I can get the returned cellset and determine the value by using
lblTotalGrossSales.Text = CellSet.Cells(0).Value
Is there a way I can get the CellSet's Cell(0) Value in my MDX query, instead of relying on the data returning to ADOMD.NET?
thanks!
Edit 1: Based on Daryl's comment, here's some elaboration on what I'm doing. My current query uses several axes:
SELECT {[Term Date].[Date Calcs].[MTD]} ON 0,
{[Sale Date].[YQMD].[DAY].&[20121115]} ON 1,
{[Customer].[ID].[All].[A612Q4-35]} ON 2,
{[Measures].[Loss]} ON 3
FROM OUR_CUBE
If I run that query in Management Studio, I am told "Results cannot be displayed for cellsets with more than two axes", which makes sense since.. you know.. there's more than 2 axes. However, if I use ADOMD.NET to run this query in-line and read the returned value into an ADOMD.NET cellset, I can check the value at cell "0", giving me my value... which, as I understand it (I'm a total noob at cubes), is the value sitting where all these values intersect.
So to answer your question, Daryl, what I'd love to have is the ability to have the value here returned to me, not have to read a cellset into the calling application. Why, you may ask? Well.. ultimately I'd love to have one query that performs several multi-axis queries to return the values. Again.. I'm VERY new to cubes and MDX, so it's possible I'm going at this all wrong (I'm a .NET developer by trade).
Simplify your query to return two axes:
SELECT {[Measures].[Loss]} ON 0, {[Term Date].[Date Calcs].[MTD] * [Sale Date].[YQMD].[DAY].&[20121115] * [Customer].[ID].[All].[A612Q4-35]} ON 1 FROM OUR_CUBE
and then try the following to access the cellset:
string connectionString = "Data Source=localhost;Catalog=AdventureWorksDW2012";
//Create a new string builder to store the results
System.Text.StringBuilder result = new System.Text.StringBuilder();
//Connect to the server named in connectionString
using (AdomdConnection conn = new AdomdConnection(connectionString))
{
conn.Open();
//Create a command, using this connection
AdomdCommand cmd = conn.CreateCommand();
cmd.CommandText = @"SELECT { [Measures].[Unit Price] } ON COLUMNS , {[Product].[Color].[Color].MEMBERS-[Product].[Color].[]} * [Product].[Model Name].[Model Name] ON ROWS FROM [Adventure Works] ;";
//Execute the query, returning a cellset
CellSet cs = cmd.ExecuteCellSet();
//Output the column captions from the first axis
//Note that this procedure assumes a single member exists per column.
result.Append("\t\t\t");
TupleCollection tuplesOnColumns = cs.Axes[0].Set.Tuples;
foreach (Microsoft.AnalysisServices.AdomdClient.Tuple column in tuplesOnColumns)
{
result.Append(column.Members[0].Caption + "\t");
}
result.AppendLine();
//Output the row captions from the second axis and cell data
//Note that this procedure assumes a two-dimensional cellset
TupleCollection tuplesOnRows = cs.Axes[1].Set.Tuples;
for (int row = 0; row < tuplesOnRows.Count; row++)
{
for (int members = 0; members < tuplesOnRows[row].Members.Count; members++ )
{
result.Append(tuplesOnRows[row].Members[members].Caption + "\t");
}
for (int col = 0; col < tuplesOnColumns.Count; col++)
{
result.Append(cs.Cells[col, row].FormattedValue + "\t");
}
result.AppendLine();
}
conn.Close();
TextBox1.Text = result.ToString();
} // using connection
Source : Retrieving Data Using the CellSet
This works for a SELECT on columns and on rows. It would be helpful to analyze how to traverse subselect queries from the main query.

Is the Expression.* namespace the only way to create expression trees for EF 4.3 + ODAC?

I am having an issue with ODAC (Oracle Data Access Components), Entity Framework 4.3.1, and expression trees. We have a legacy database (don't we all?) that we are mapping in Entity Framework. The table has millions of records and over one hundred columns (sad face).
Here is an example query on an indexed column:
int myId = 2;
var matchingRecord = context.MyLargeTable.Where(v=>v.Id == myId).ToList(); //Super slow (5+ minutes, sometimes Out of Memory exception)
int myId = 2;
Expression<Func<LargeTable, bool>> myLambda = v => v.Id == myId; //Shouldn't this work now?
var matchingRecord = context.MyLargeTable.Where(myLambda).ToList(); //Still super slow (5+ minutes, sometimes Out of Memory exception)
var elementName = Expression.Parameter(typeof(LargeTable), "v");
var propertyName = Expression.Property(elementName, "Id");
var constantValue = Expression.Constant(myId);
var comparisonMethod = Expression.Call(
propertyName,
typeof(int).GetMethod("Equals", new[] { typeof(int) }),
constantValue
);
var finalTree = Expression.Lambda<Func<LargeTable, bool>>(comparisonMethod, elementName);
var matchingRecord = context.MyLargeTable.Where(finalTree).ToList(); //Super fast
I've read things like this that explain the difference between Func<> and Expression<Func<>> and how Expression<Func<>> actually gets passed to the database for the query, and that's why it is faster.
http://www.fascinatedwithsoftware.com/blog/post/2011/12/02/Falling-in-Love-with-LINQ-Part-7-Expressions-and-Funcs.aspx - Whole thing is good, but if in a rush, just read the section titled “Unintended Consequences” for the main takeaway
http://fascinatedwithsoftware.com/blog/post/2012/01/10/More-on-Expression-vs-Func-with-Entity-Framework.aspx
Why would you use Expression<Func<T>> rather than Func<T>? - No set of links is complete without a corresponding SO question
My question is this: Are people really sitting there constructing expression trees using Expression.* classes? Any query beyond simple comparisons gets really complicated and is almost impossible to read. What am I missing about passing the Expression<Func<>> to the database? Who do I go punch in the face for this manually constructed expression tree solution? Oracle? EF? What am I missing?

Resources