Get top records by latest date in Azure CosmosDB - azure-cosmosdb

I have a table like this:
Assume there are lots of Names (i.e., E,F,G,H,I etc.,) and their respective Date and Produced Items in this table. It's a massive table, so I'd want to write an optimised query.
In this, I want to query the latest A,B,C,D records.
I was using the following query:
SELECT * FROM c WHERE c.Name IN ('A','B','C','D') ORDER BY c.Date DESC OFFSET 0 LIMIT 4
But the problem with this query is, since I'm ordering by Date, the latest 4 records I'm getting are:
I want to get this result:
Please help me in modifying the query. Thanks.

Related

How to improve Teradata query for pagination?

Hi Guys i'm trying to use pagination for my teradata query as below:
SELECT RANK() OVER (ORDER BY id,firstname,lastname,grade,gender,age,profession) as row_num, ORDER BY id,firstname,lastname,grade,gender,age,profession FROM table-1 QUALIFY row_num BETWEEN 0 and 1000 ;
the query works fine. However, i'm trying to see if there is an other way to query the table based on * instead of stating all table columns in query 2 times.
Apprecite your input on this request.

PeopleSoft Query Manager - 'count' function

I'm using the current version of PeopleSoft and I'm using their Query manager. I've built a query that looks at the job table and a customized version of the job table (so I can see future hires). In order to do this I've created a union. Everything works fine, except now I want to do a count of the job codes.
When I put in a count, I get an error. I don't know how to get it to work properly. I also don't really know how to using the 'having' tab.
I've attached some screenshots, including the SQL code.
SQL:
Having tab
You have a criteria in your query:
AND COUNT(*) = A.JOBCODE
Your job codes are string values that uniquely identify a job. It will never be equal to a count.
If you remove that criteria, your query will work:
The bigger issue is, what do you want to count? If your query was simply:
SELECT DEPTID, JOBCODE, COUNT(*)
This will give the count of employees in this department and job code. In your description, you said that you wanted the count of job codes. But each row has JOBCODE on it. The count of job codes on the row is one. What do you really want? The count of job codes in the database? The count of job codes in the result set?
If you want to get anything other than the count of rows within the group, you are not able to put that logic in PeopleSoft Query. You will need to create a view in AppDesigner and then you can add that to the query.

Kusto: Query execution has exceeded the allowed limits (80DA0003):

There are three tables mentioned below, I eventually want to bring in a field from Table3 to Table1 (but the only way to join these two tables is via a common field present in Table2)
Table 1: Application Insights-30 days data (datasize ~4,000,000)
Table 2: Kusto based table (datasize: 1,080,153)
Table 3: Kusto based table (datasize: 38,815,878)
I was not able to join the tables directly, So, I used various filter conditions, distinct operators and split the month data to 4 weeks and then used union to join all 3 tables and got the resultant table.
However, now I am unable to perform any operations on the resultant table (even |count doesn't work)
I get the following error
Query execution has exceeded the allowed limits (80DA0003):
Any help in handling such cases would be helpful
Please check article how to control the query limits:
https://learn.microsoft.com/en-us/azure/data-explorer/kusto/concepts/querylimits#limit-on-result-set-size-result-truncation
When using the join operator, make sure that the table with fewer rows is the first one (left-most in query).
See more Best Practices here.

how can I get faster FTS4 query results ordered by a field in another table?

Background
I'm implementing full-text search over a body of email messages stored in SQLite, making use of its fantastic built-in FTS4 engine. I'm getting some rather poor query performance, although not exactly where I would expect. Let's take a look.
Representative schema
I'll give some simplified examples of the code in question, with links to the full code where applicable.
We've got a MessageTable that stores the data about an email message (full version spread out over several files here, here, and here):
CREATE TABLE MessageTable (
id INTEGER PRIMARY KEY,
internaldate_time_t INTEGER
);
CREATE INDEX MessageTableInternalDateTimeTIndex
ON MessageTable(internaldate_time_t);
The searchable text is added to an FTS4 table named MessageSearchTable (full version here):
CREATE VIRTUAL TABLE MessageSearchTable USING fts4(
id INTEGER PRIMARY KEY,
body
);
The id in the search table acts as a foreign key to the message table.
I'll leave it as an exercise for the reader to insert data into these tables (I certainly can't give out my private email). I have just under 26k records in each table.
Problem query
When we retrieve search results, we need them to be ordered descending by internaldate_time_t so we can pluck out only the most recent few results. Here's an example search query (full version here):
SELECT id
FROM MessageSearchTable
JOIN MessageTable USING (id)
WHERE MessageSearchTable MATCH 'a'
ORDER BY internaldate_time_t DESC
LIMIT 10 OFFSET 0
On my machine, with my email, that runs in about 150 milliseconds, as measured via:
time sqlite3 test.db <<<"..." > /dev/null
150 milliseconds is no beast of a query, but for a simple FTS lookup and indexed order, it's sluggish. If I omit the ORDER BY, it completes in 10 milliseconds, for example. Also keep in mind that the actual query has one more sub-select, so there's a little more work going on in general: the full version of the query runs in about 600 milliseconds, which is into beast territory, and omitting the ORDER BY in that case shaves 500 milliseconds off the time.
If I turn on stats inside sqlite3 and run the query, I notice the line:
Sort Operations: 1
If my interpretation of the docs about those stats is correct, it looks like the query is completely skipping using the MessageTableInternalDateTimeTIndex. The full version of the query also has the line:
Fullscan Steps: 25824
Sounds like it's walking the table somewhere, but let's ignore that for now.
What I've discovered
So let's work on optimizing that a little bit. I can rearrange the query into a sub-select and force SQLite to use our index with the INDEXED BY extension:
SELECT id
FROM MessageTable
INDEXED BY MessageTableInternalDateTimeTIndex
WHERE id IN (
SELECT id
FROM MessageSearchTable
WHERE MessageSearchTable MATCH 'a'
)
ORDER BY internaldate_time_t DESC
LIMIT 10 OFFSET 0
Lo and behold, the running time has dropped to around 100 milliseconds (300 milliseconds in the full version of the query, a 50% reduction in running time), and there are no sort operations reported. Note that with just reorganizing the query like this but not forcing the index with INDEXED BY, there's still a sort operation (though we've still shaved off a few milliseconds oddly enough), so it appears that SQLite is indeed ignoring our index unless we force it.
I've also tried some other things to see if they'd make a difference, but they didn't:
Explicitly making the index DESC as described here, with and without INDEXED BY
Explicitly adding the id column in the index, with and without internaldate_time_t ordered DESC, with and without INDEXED BY
Probably several other things I can't remember at this moment
Questions
100 milliseconds here still seems awfully slow for what seems like it should be a simple FTS lookup and indexed order.
What's going on here? Why is it ignoring the obvious index unless you force its hand?
Am I hitting some limitation with combining data from virtual and regular tables?
Why is it still so relatively slow, and is there anything else I can do to get FTS matches ordered by a field in another table?
Thanks!
An index is useful for looking up a table row based on the value of the indexed column.
Once a table row is found, indexes are no longer useful because it is not efficient to look up a table row in an index by any other criterium.
An implication of this is that it is not possible to use more than one index for each table accessed in a query.
Also see the documentation: Query Planning, Query Optimizer.
Your first query has the following EXPLAIN QUERY PLAN output:
0 0 0 SCAN TABLE MessageSearchTable VIRTUAL TABLE INDEX 4: (~0 rows)
0 1 1 SEARCH TABLE MessageTable USING INTEGER PRIMARY KEY (rowid=?) (~1 rows)
0 0 0 USE TEMP B-TREE FOR ORDER BY
What happens is that
the FTS index is used to find all matching MessageSearchTable rows;
for each row found in 1., the MessageTable primary key index is used to find the matching row;
all rows found in 2. are sorted with a temporary table;
the first 10 rows are returned.
Your second query has the following EXPLAIN QUERY PLAN output:
0 0 0 SCAN TABLE MessageTable USING COVERING INDEX MessageTableInternalDateTimeTIndex (~100000 rows)
0 0 0 EXECUTE LIST SUBQUERY 1
1 0 0 SCAN TABLE MessageSearchTable VIRTUAL TABLE INDEX 4: (~0 rows)
What happens is that
the FTS index is used to find all matching MessageSearchTable rows;
SQLite goes through all entries in the MessageTableInternalDateTimeTIndex in the index order, and returns a row when the id value is one of the values found in step 1.
SQLite stops after the tenth such row.
In this query, it is possible to use the index for (implied) sorting, but only because no other index is used for looking up rows in this table.
Using an index in this way implies that SQLite has to go through all entries, instead of lookup up the few rows that match some other condition.
When you omit the INDEXED BY clause from your second query, you get the following EXPLAIN QUERY PLAN output:
0 0 0 SEARCH TABLE MessageTable USING INTEGER PRIMARY KEY (rowid=?) (~25 rows)
0 0 0 EXECUTE LIST SUBQUERY 1
1 0 0 SCAN TABLE MessageSearchTable VIRTUAL TABLE INDEX 4: (~0 rows)
0 0 0 USE TEMP B-TREE FOR ORDER BY
which is essentially the same as your first query, except that joins and subqueries are handled slightly differently.
With your table structure, it is not really possible to get faster.
You are doing three operations:
looking up rows in MessageSearchTable;
looking up corresponding rows in MessageTable;
sorting rows by a MessageTable value.
As far as indexes are concerned, steps 2 and 3 conflict with each other.
The database has to choose whether to use an index for step 2 (in which case sorting must be done explicitly) or for step 3 (in which case it has to go through all MessageTable entries).
You could try to return fewer records from the FTS search by making the message time a part of the FTS table and searching only for the last few days (and increasing or dropping the time if you don't get enough results).

Does a multi-column index work for single column selects too?

I've got (for example) an index:
CREATE INDEX someIndex ON orders (customer, date);
Does this index only accelerate queries where customer and date are used or does it accelerate queries for a single-column like this too?
SELECT * FROM orders WHERE customer > 33;
I'm using SQLite.
If the answer is yes, why is it possible to create more than one index per table?
Yet another question: How much faster is a combined index compared with two separat indexes when you use both columns in a query?
marc_s has the correct answer to your first question. The first key in a multi key index can work just like a single key index but any subsequent keys will not.
As for how much faster the composite index is depends on your data and how you structure your index and query, but it is usually significant. The indexes essentially allow Sqlite to do a binary search on the fields.
Using the example you gave if you ran the query:
SELECT * from orders where customer > 33 && date > 99
Sqlite would first get all results using a binary search on the entire table where customer > 33. Then it would do a binary search on only those results looking for date > 99.
If you did the same query with two separate indexes on customer and date, Sqlite would have to binary search the whole table twice, first for the customer and again for the date.
So how much of a speed increase you will see depends on how you structure your index with regard to your query. Ideally, the first field in your index and your query should be the one that eliminates the most possible matches as that will give the greatest speed increase by greatly reducing the amount of work the second search has to do.
For more information see this:
http://www.sqlite.org/optoverview.html
I'm pretty sure this will work, yes - it does in MS SQL Server anyway.
However, this index doesn't help you if you need to select on just the date, e.g. a date range. In that case, you might need to create a second index on just the date to make those queries more efficient.
Marc
I commonly use combined indexes to sort through data I wish to paginate or request "streamily".
Assuming a customer can make more than one order.. and customers 0 through 11 exist and there are several orders per customer all inserted in random order. I want to sort a query based on customer number followed by the date. You should sort the id field as well last to split sets where a customer has several identical dates (even if that may never happen).
sqlite> CREATE INDEX customer_asc_date_asc_index_asc ON orders
(customer ASC, date ASC, id ASC);
Get page 1 of a sorted query (limited to 10 items):
sqlite> SELECT id, customer, date FROM orders
ORDER BY customer ASC, date ASC, id ASC LIMIT 10;
2653|1|1303828585
2520|1|1303828713
2583|1|1303829785
1828|1|1303830446
1756|1|1303830540
1761|1|1303831506
2442|1|1303831705
2523|1|1303833761
2160|1|1303835195
2645|1|1303837524
Get the next page:
sqlite> SELECT id, customer, date FROM orders WHERE
(customer = 1 AND date = 1303837524 and id > 2645) OR
(customer = 1 AND date > 1303837524) OR
(customer > 1)
ORDER BY customer ASC, date ASC, id ASC LIMIT 10;
2515|1|1303837914
2370|1|1303839573
1898|1|1303840317
1546|1|1303842312
1889|1|1303843243
2439|1|1303843699
2167|1|1303849376
1544|1|1303850494
2247|1|1303850869
2108|1|1303853285
And so on...
Having the indexes in place reduces server side index scanning when you would otherwise use a query OFFSET coupled with a LIMIT. The query time gets longer and the drives seek harder the higher the offset goes. Using this method eliminates that.
Using this method is advised if you plan on joining data later but only need a limited set of data per request. Join against a SUBSELECT as described above to reduce memory overhead for large tables.

Resources