How to get distinct values as a single document in CosmosDB

How to get distinct values as a single document in CosmosDB - azure-cosmosdb

When I do a SELECT DISTINCT c.Damage.Cause_of_Damage FROM c, where Cause_of_Damage is an array, I get a lot of documents with individual values back. Is there a way to concat all of them to a single array in a single result document?

It works using a self-join:
SELECT DISTINCT VALUE i
FROM c
JOIN i IN c.Damage.Cause_of_Damage

Related

Deleting duplicate rows

I am learning SQLite and constructed a line which I thought would delete dups but it deletes all rows instead.
DELETE from tablename WHERE rowid not in (SELECT distinct(timestamp) from tablename);
I expected this to delete rows with a duplicate (leaving one). I know I can simply create a new table with the distinct rows, but why does what I have done not work? Thanks

If timestamp is a column in the table and this is what you want to compare so to delete duplicates then do this:
delete from tablename
where exists (
select 1 from tablename t
where t.rowid < tablename.rowid and t.timestamp = tablename.timestamp
)

With recent versions of sqlite, the following is an alternative:
DELETE FROM tablename
WHERE rowid IN (SELECT rowid
FROM (SELECT rowid, row_number() OVER (PARTITION BY timestamp) AS rownum
FROM tablename)
WHERE rownum >= 2);

why does what I have done not work?
Consider the WHERE condition:
rowid not in (SELECT distinct(timestamp) from tablename)
The simple answer is that you are not comparing data in the same columns, nor are they columns with the same type of data. rowid is an automatically-incremented integer column and I assume that timestamp column is either a numeric or string column containing time values, or perhaps custom-generated sequential numeric values. Because rowid likely never matches a value in timestamp, then the NOT IN operation will always return true. Thus each row of the table will be deleted.
SQL is rather explicit and so there are no hidden/mysterious column comparisons. It will not automatically compare the rowid's from one query with another. Notice that the various alternative statements do something to distinguish rows with duplicate key values (timestamp in your case), either by direct comparison between main query and subquery, or using windowing functions to uniquely label rows with duplicate values, etc.
Just for kicks, here's another alternative that uses NOT IN like your original code.
DELETE FROM tablename
WHERE rowid NOT IN (
SELECT max(t.rowid) FROM tablename t
GROUP BY t.timestamp )
First notice that this is comparing rowid with max(t.rowid), values which derive from the same column.
Because the subquery groups on t.timestamp, the aggregate function max() will return the greatest/last t.rowid separately for each set of rows with the same t.timestamp value. The resultant list will exclude t.rowid values that are less than the maximum. Thus, the NOT IN operation will not find those lesser values and will return true so they will be deleted.
It also uses basic SQL (no window functions... the OVER keyword). It will likely be more efficient than the alternative that references the outer query from the subquery, because this statement can execute the subquery just once and then use an efficient index to match individual records... it doesn't need to rerun the query for each row. For that matter, it should also be more efficient than the windowing function, because the window partition essentially "groups" on the partitioned columns, but must then execute the windowing function for each row, an extra step not present in the basic aggregate query. Efficiency is not always critical, but something important to consider.
By the way, the distinct keyword is not a function and does not need/accept parenthesis. It is a directive that applies to the entire select statement. The subquery is being interpreted as
SELECT DISTINCT (timestamp) FROM tablename
where DISTINCT is interpreted in isolation and the parenthesis are interpreted as a separate expression.
Update
These two queries will return the same data:
SELECT DISTINCT timestamp FROM tablename;
SELECT timestamp FROM tablename GROUP BY timestamp;
Both results eliminate duplicate rows from the output by showing only unique/distinct values, but neither has a "handle" (other data column) which indicates which rows to keep and which rows to eliminate. In other words, these queries return distinct values, but the results loose all relationship to the source rows and so have no use in specifying which source rows to delete (or keep). To understand better, you should run subqueries separately to inspect what they return so that you can understand and verify what data you're working with.
To make those queries useful, we need to do something to distinguish rows with duplicate key values. The rows need a "handle"--some other key value to select for either deleting or keeping those rows. Try this...
SELECT DISTINCT rowid, timestamp FROM tablename;
But that won't work, because it applies the DISTINCT keyword to ALL returned columns, but since rowid is already unique it will necessarily output each row separately and so there is no use to the query.
SELECT max(rowid), timestamp FROM tablename GROUP BY timestamp;
That query preserves the unique grouping, but provides just one rowid per timestamp as the "handle" to include/exclude for deletion.

try this
DELETE liens from liens where
id in
( SELECT * FROM (SELECT min(id) FROM liens group by lkey having count(*) > 1 ) AS c)
you can do this many times

sql query for extracting one column from many tables

I need your support for a query in SQLite Studio.
I am dealing with a database made by 1,000 different tables.
Half of them (all named "news" + an identification number, like 04AD86) contain the column "category" which I am interested in. This column can have from 100 to 200 records for each table.
Could you suggest me a query that extracts "category" from every table and returns a list of all possible categories (without duplicates records)?
Thanks a lot

You will probably need dynamic SQL to handle this in a single query. If you don't mind doing this over several queries, then here is one option. First do a query to obtain all the tables which contain the category column:
SELECT name
FROM sqlite_master
WHERE type = 'table' AND name LIKE 'news%'
Next, for the actual queries to obtain the unique categories, you can perform a series of unions to get your list. Here is what it would look like:
SELECT DISTINCT category
FROM news04AD86
UNION
SELECT DISTINCT category
FROM news 05BG34
UNION
...
The DISTINCT keyword will remove duplicates within any given name table, and UNION will remove duplicates which might occur between one table and another.

What is the fastest way of selecting by a list of strings in sqlite database?

I have database with with roughly following structure:
table1 (name) -< table2 -< table3 (score)
where -< means 1 to many relationship. What i need to do is for every string in a given list find the linked entry from table3 with a maximum score value. The way i do it now is quite slow, and i wonder of it could be sped up.
How i am doing this:
SELECT k.score,k.yaw,k.pitch,k.roll,k.kp_number,k.ke_number,k.points,k.elems --various fields of third table
FROM File
JOIN FaceDetection AS d ON d.f_id=File.file_id --joining second table
JOIN FaceKey AS k ON k.face_det=d.fd_id --joining third table
WHERE name=:fld
ORDER BY k.score DESC
I open transaction, prepare query with the above text, and in cycle retrieve the entries i am interested in from the database, then commit transaction. What are better, faster ways?

Indexes can be used for all the columns that are used for lookups or sorting, but a query cannot use more than one index per table.
Check the EXPLAIN QUERY PLAN output to see whether this query does table scans or uses indexes.
You are not returning values from any table but FaceKey, so you do not actually need to do a join.
However, rewriting the query as below might or might not help:
SELECT score,
yaw,
pitch,
roll,
kp_number,
ke_number,
points,
elems
FROM FaceKey
WHERE face_det IN (SELECT fd_id
FROM FaceDetection
WHERE f_id IN (SELECT file_id
FROM File
WHERE name = :fld))
ORDER BY score DESC

SQLite - Selecting not indexed column in GROUP BY

I have similar situation like question below.
Mysql speed up max() group by
SELECT MAX(id) id, cid FROM table GROUP BY cid
To optimize above query (shown in the question), creating index(cid, id) does the trick.
However, when I add a column that is not indexed to SELECT, query speed drastically slows down.
For example,
SELECT MAX(id) id, cid, newcolumn FROM table GROUP BY cid
If I create index(cid, id, newcolumn), query time comes back to minimal. It seems I should index all the columns I select while using GROUP BY.
Is there any way other than indexing all the columns to be select?

When all the columns used in the query are part of the index (which is then called a covering index), SQLite can get all values from the index and does not need to access the table itself.
When adding a column that is not indexed, each record must be looked up in both the index and the table.
Furthermore, the order of the records in the table is unlikely to be the same as the order in the index, so the table's pages are not read in order, and are read multiple times, which means that caching will not work as well.
The newcolumn values must be read from either the table or an index; there is no other mechanism to store data.
tl;dr: no

sqlite, slow joined query throught 4 tables

I have query through 4 tables: times, tags, users and categories.
Each table has no more than 400 records, but this query takes 70ms.
I need it many times (400x), so all procedure takes a total of about 30 seconds.
SELECT COUNT(*) FROM times
INNER JOIN tags ON times.user_id = tags.tag_id
INNER JOIN users ON tags.user_nr = users.nr
INNER JOIN categories ON users.category_id = categories.id
WHERE (times.time_raw < "000560")
AND (times.time_raw != 0 )
AND (times.cell != 1 )
AND (categories.name="kategory_A")
AND (times.run_id="08")
How can I make it faster?

Indexes is the solution!!
The following list gives guidelines in choosing columns to index:
•You should create indexes on columns that are used frequently in
WHERE clauses.
•You should create indexes on columns that are used
frequently to join tables.
•You should create indexes on columns
that are used frequently in ORDER BY clauses.
•You should create
indexes on columns that have few of the same values or unique values
in the table.
•You should not create indexes on small tables (tables
that use only a few blocks) because a full table scan may be faster
than an indexed query.
•If possible, choose a primary key that
orders the rows in the most appropriate order.
•If only one column
of the concatenated index is used frequently in WHERE clauses, place
that column first in the CREATE INDEX statement.
•If more than one
column in a concatenated index is used frequently in WHERE clauses,
place the most selective column first in the CREATE INDEX statement.

Develop Reference

r css asp.net wordpress firebase qt symfony nginx http apache-flex

How to get distinct values as a single document in CosmosDB - azure-cosmosdb

When I do a SELECT DISTINCT c.Damage.Cause_of_Damage FROM c, where Cause_of_Damage is an array, I get a lot of documents with individual values back. Is there a way to concat all of them to a single array in a single result document?

It works using a self-join: SELECT DISTINCT VALUE i FROM c JOIN i IN c.Damage.Cause_of_Damage

Related

Deleting duplicate rows

sql query for extracting one column from many tables

What is the fastest way of selecting by a list of strings in sqlite database?

SQLite - Selecting not indexed column in GROUP BY

sqlite, slow joined query throught 4 tables

Categories

Resources