How are you supposed to think of joins in cosmosdb? - azure-cosmosdb

I am very confused by the Cosmos DB documentation on joins. When I think of a join conventionally, I think of two tables with one shared id on which I perform the join. The two tables have different schemas, but the result of the join is a combined table that merges the columns from both. The join in Cosmos DB does not seem intuitively congruent with that.
I have a collection with heterogeneous data. Each document can have a different structure from the next. I want to count the number of documents that have a value that is present in the result set of a subquery. Intuitively, I want to do something like this:
SELECT COUNT(1) as c
FROM CollectionName as outer
where outer.type = "table"
JOIN ((SELECT c.id from c where c.type = "database") as inner) on outer.databaseId == t.id
// count the number of tables that are in deleted databases
It would seem like I would need to do a join on the result of the subquery with the result of the outer query, and then process that resulting table. But I am not understanding right now how to do that:
Select COUNT(1)
from Collection outer
where outer.type = 'table'
JOIN (select c.id from c IN outer.databaseId where c.type = "database" and c.state = "deleted")
I am constantly getting a 400 with the above query. So how am I supposed to think about joins in cosmosdb?

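Cosmos DB is a document database. It stores and operates on JSON data, which can be hierarchical. A JOIN in Cosmos DB is a self-join within a single document: it pairs the document with the elements of its own nested arrays so that those elements can be projected alongside the other data in the document. It does not combine separate documents the way a relational join combines tables.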
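To make that concrete, here is a minimal sketch in plain Python (not the Cosmos DB SDK) of what a query such as SELECT f.id AS family, c.givenName AS child FROM Families f JOIN c IN f.children computes. The items and the helper name self_join are made up for illustration; the point is only that each item is paired with the elements of its own array.

```python
# Emulates a Cosmos DB intra-document JOIN in plain Python.
# The sample items and the helper name are hypothetical, for illustration only.
families = [
    {"id": "AndersenFamily", "children": [{"givenName": "Jesse"}, {"givenName": "Lisa"}]},
    {"id": "WakefieldFamily", "children": [{"givenName": "Henriette"}]},
    {"id": "SmithFamily", "children": []},  # empty array: contributes no rows
]


def self_join(items, array_field):
    """Pair each item with every element of one of its own arrays."""
    for item in items:
        for element in item.get(array_field, []):
            yield item, element


rows = [
    {"family": f["id"], "child": c["givenName"]}
    for f, c in self_join(families, "children")
]
for row in rows:
    print(row)
```

Note that no row ever combines data from two different items, which is why a relational-style join between "table" documents and "database" documents cannot be written this way.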
There is a really good article that talks through this at a pretty deep level and also has lots of examples: Joins in Cosmos DB.
Writing queries like this takes some getting used to, but once you get the hang of it you'll be OK. You can easily practice using the Query Playground, which has a bunch of sample queries for a nutrition dataset with food and ingredients. Or follow along with the families data in the docs: you can create additional items and then write some queries to see how joins work.
Hope that is helpful.
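For the count in the question, one possible approach (an assumption on my part, not something the docs article prescribes) is to do it in two steps: first collect the ids of the deleted databases, then count the tables whose databaseId is in that set. The sketch below emulates both steps over made-up items in plain Python; against a real container each step would be its own query, with the ids from the first passed to the second.

```python
# Made-up items standing in for one heterogeneous container.
items = [
    {"id": "db1", "type": "database", "state": "deleted"},
    {"id": "db2", "type": "database", "state": "active"},
    {"id": "t1", "type": "table", "databaseId": "db1"},
    {"id": "t2", "type": "table", "databaseId": "db1"},
    {"id": "t3", "type": "table", "databaseId": "db2"},
]

# Step 1: the ids of all deleted databases (the "subquery").
deleted_db_ids = {
    item["id"]
    for item in items
    if item["type"] == "database" and item.get("state") == "deleted"
}

# Step 2: count the tables that point at one of those ids.
table_count = sum(
    1
    for item in items
    if item["type"] == "table" and item.get("databaseId") in deleted_db_ids
)
print(table_count)
```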

Related

Using queries to get information from different tables

I am learning SQLite and I'm using this database to learn how to write queries correctly, but I'm struggling, especially when I have to use data from multiple tables to get some information.
For example, with the given database, is there a way to get the first name, last name and the name of songs that every customer has bought?
All you have to do is a simple SQL SELECT query. When you say you're having trouble getting it from multiple tables, I'm not sure if you're trying to get the data from all of the tables in one single query, as that is not necessary. You just need to have multiple instances of the SELECT query, just for different tables (and different column names).
SELECT firstName, lastName, songName FROM table_name
You need to learn about JOINs:
select
c.FirstName,
c.LastName,
t.Name
from invoice_items ii
inner join tracks t on t.trackid = ii.trackid
inner join invoices i on i.invoiceid = ii.invoiceid
inner join customers c on c.customerid = i.customerid
In this query there are 4 tables involved and the diagram in the link you posted, shows exactly their relationships.
So you start from the table invoice_items where you find the bought songs and join the other 3 tables by providing the columns on which the join will be set.
One more useful thing to remember: use aliases for tables (like c for customers) and, if needed, for columns too.
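If you want to try the four-table join without the full sample database, here is a small self-contained sketch using Python's built-in sqlite3 module. The table and column names follow the query above, but the schema is trimmed down and the rows are invented for illustration.

```python
import sqlite3

# In-memory database with a trimmed-down, made-up version of the four tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (CustomerId INTEGER PRIMARY KEY, FirstName TEXT, LastName TEXT);
CREATE TABLE invoices (InvoiceId INTEGER PRIMARY KEY, CustomerId INTEGER);
CREATE TABLE tracks (TrackId INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE invoice_items (InvoiceLineId INTEGER PRIMARY KEY, InvoiceId INTEGER, TrackId INTEGER);
INSERT INTO customers VALUES (1, 'Ana', 'Lopez'), (2, 'Ben', 'Young');
INSERT INTO invoices VALUES (10, 1), (11, 2);
INSERT INTO tracks VALUES (100, 'Song A'), (101, 'Song B');
INSERT INTO invoice_items VALUES (1000, 10, 100), (1001, 10, 101), (1002, 11, 100);
""")

# Start from invoice_items and join outward to the other three tables.
purchases = conn.execute("""
    SELECT c.FirstName, c.LastName, t.Name
    FROM invoice_items ii
    INNER JOIN tracks t ON t.TrackId = ii.TrackId
    INNER JOIN invoices i ON i.InvoiceId = ii.InvoiceId
    INNER JOIN customers c ON c.CustomerId = i.CustomerId
    ORDER BY c.LastName, t.Name
""").fetchall()

for row in purchases:
    print(row)
conn.close()
```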
You need to use joins to get data from multiple tables. In this case I'd recommend you using inner joins.
In case you are not familiar with joins, this is a very good article that explains the different types of joins supported in SQLite.
SQLite INNER JOINS return all rows from multiple tables where the join condition is met.
This query will return the first and last name of customers, and the tracks they purchased.
select customers.FirstName,
customers.LastName,
tracks.name as PurchasedTracks from invoice_items
inner join invoices on invoices.InvoiceId = invoice_items.InvoiceId
inner join customers on invoices.CustomerId = customers.CustomerId
inner join tracks on invoice_items.TrackId = tracks.TrackId
order by customers.LastName

Merge existing records in neo4j, remove duplicates, keep relationships

I've imported millions of records using CREATE for performance reasons; now I want to MERGE the duplicate records together and keep all the relationships intact.
Any ideas?
EDIT:
MATCH (c1:company), (c2:company)
WITH c1, c2
WHERE c1.name = c2.name
SET c1=c2
Is the type of thing I'm looking for.
If you want to merge nodes in cypher you can do something like this:
MATCH (c:Company)
WITH c.name as name, collect(c) as companies, count(*) as cnt
WHERE cnt > 1
WITH head(companies) as first, tail(companies) as rest
LIMIT 1000
UNWIND rest AS to_delete
MATCH (to_delete)<-[r:WORKS_AT]-(e:Employee)
MERGE (first)<-[:WORKS_AT]-(e)
DELETE r
DELETE to_delete
RETURN count(*);
see: http://www.neo4j.org/graphgist?dropbox-14493611%2Fmerge_nodes.adoc
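The idea behind that query (group the nodes by name, keep the first of each group, re-point the relationships of the rest, then drop the rest) can be sketched in plain Python. This is only an illustration of the logic over a made-up in-memory graph, not Neo4j driver code; the data and the function name merge_duplicates are hypothetical.

```python
from collections import defaultdict

# Hypothetical in-memory graph: company node id -> name,
# plus (employee, company id) pairs standing in for WORKS_AT relationships.
companies = {1: "Acme", 2: "Acme", 3: "Globex", 4: "Acme"}
works_at = {("alice", 1), ("bob", 2), ("carol", 3), ("dave", 4), ("alice", 4)}


def merge_duplicates(companies, works_at):
    # Group node ids by name, lowest id first.
    by_name = defaultdict(list)
    for node_id in sorted(companies):
        by_name[companies[node_id]].append(node_id)
    # Map every node to the first node that shares its name.
    survivor = {dup: ids[0] for ids in by_name.values() for dup in ids}
    merged_nodes = {ids[0]: name for name, ids in by_name.items()}
    # A set collapses relationships that become identical after re-pointing,
    # which mirrors what MERGE does for the relationship in the query above.
    merged_edges = {(employee, survivor[company]) for employee, company in works_at}
    return merged_nodes, merged_edges


nodes, edges = merge_duplicates(companies, works_at)
print(nodes)
print(sorted(edges))
```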
It doesn't work that way. There is no way to move relationships around, and no way to coalesce existing nodes. You should use MERGE from the beginning, along with constraints and indexes to aid performance.

Combining data from two SQL queries

I'm porting my app from Django to ASP.NET Webforms (against my will, but what can we do with the corporate world..), and I'm used to Django generating all my SQL queries so now I need help.
I have 3 tables: proceso, marcador, marcador_progreso
Every proceso has many marcador_progreso rows, and each marcador_progreso row has a foreign key to marcador.
So basically the tables look like:
proceso: id
marcador: id, text
marcador_progreso: id, marcador_id, proceso_id, state
For all the marcador_progreso rows whose proceso_id is the current proceso (from a QueryField in the URL), I need to list the state and its respective marcador.text.
I've been working with EntityFramework but this is like a double query so I'm not sure how to do it.
I guess it is something that combines the following two statements, but I'm not sure how to do it.
SELECT [state] FROM [marcador_progreso]
SELECT [text] FROM [marcador] WHERE ([id] = marcador_id)
You want to do a JOIN:
SELECT mp.state, m.text
FROM marcador_progreso as mp
INNER JOIN marcador as m
ON mp.marcador_id = m.id
This is an excellent post that goes over the various join types.
You'll want to know about JOINs to call more than one table in your FROM clause. JOIN combines records from two or more tables in a database by using values common to each. There are different types: the SQL example below is an INNER join, which gets only records where both of the tables have a match on the common value. You may want to consider a LEFT join, which would get all records that exist in the LEFT table (in this case marcador), even if there are no matching records in the RIGHT (marcador_progreso) table.
Pop the SQL below into Management Studio and play with different joins. Replace the INNER with LEFT, run it without the WHERE.
Read about JOINs.
In general, for your new venture of writing your own queries, they all start with the same basic structure:
SELECT (UPDATE,WHATEVER DML statement, etc) (COLUMNS) what you want to display (update,etc)
FROM (TABLE) where those records live
WHERE (FILTER/LIMIT) conditions that must be met by the data
Happy fetching!
SQL:
DECLARE @ProcessoId int
SET @ProcessoId = 1 -- example value; use the current proceso id
SELECT mp.[state], m.[text]
FROM marcador m
INNER JOIN marcador_progreso mp ON mp.marcador_id = m.id
WHERE mp.proceso_id = @ProcessoId
EF INNER example
var marc = from m in yourcontext.marcador
join mp in yourcontext.marcador_progreso on m.id equals mp.marcador_id
where mp.proceso_id == processoIdvariable
select new { mp.state, m.text };
EF LEFT example
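var marc = from m in yourcontext.marcador
join mp in yourcontext.marcador_progreso on m.id equals mp.marcador_id into details
from d in details.DefaultIfEmpty()
// note: in the generated SQL, this filter on d discards marcador rows that have no match
where d.proceso_id == processoIdvariable
select new { d.state, m.text };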
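To see the INNER versus LEFT difference on actual rows, here is a small sketch using Python's built-in sqlite3 module rather than SQL Server. The table and column names come from the question; the rows and the proceso id are made up. In the LEFT JOIN version the proceso_id condition is placed in the ON clause, since putting it in the WHERE clause would discard the unmatched marcador rows again.

```python
import sqlite3

# In-memory stand-in for the two tables from the question, with invented rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE marcador (id INTEGER PRIMARY KEY, text TEXT);
CREATE TABLE marcador_progreso (
    id INTEGER PRIMARY KEY, marcador_id INTEGER, proceso_id INTEGER, state TEXT);
INSERT INTO marcador VALUES (1, 'first'), (2, 'second'), (3, 'unused');
INSERT INTO marcador_progreso VALUES
    (10, 1, 7, 'done'), (11, 2, 7, 'pending'), (12, 1, 8, 'done');
""")
proceso_id = 7  # hypothetical current proceso

# INNER JOIN: only marcador rows that have progress for this proceso.
inner_rows = conn.execute("""
    SELECT mp.state, m.text
    FROM marcador m
    INNER JOIN marcador_progreso mp ON mp.marcador_id = m.id
    WHERE mp.proceso_id = ?
    ORDER BY m.id
""", (proceso_id,)).fetchall()

# LEFT JOIN: every marcador row, with NULL state where nothing matches.
left_rows = conn.execute("""
    SELECT mp.state, m.text
    FROM marcador m
    LEFT JOIN marcador_progreso mp
        ON mp.marcador_id = m.id AND mp.proceso_id = ?
    ORDER BY m.id
""", (proceso_id,)).fetchall()

print(inner_rows)
print(left_rows)
conn.close()
```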

How to increase performance on a sqlite3 database that uses Qt?

I am using QSqlQuery on a sqlite3 database. To fetch a particular item, I was populating the result from 4 different tables. I thought joining the tables would increase the performance/speed and get the result faster. So I joined 2 tables initially, but it takes longer to fetch the data after joining the tables (?)
Any suggestion on how to improve the performance would be really appreciated. Also, I was looking at the http://qt-project.org/doc/qt-4.8/qsqlquery.html and it is mentioned that using setForwardOnly would increase the performance on some databases. Any idea if it would work for SQLite3?
Thanks!
According to this link,
http://sqlite.org/cvstrac/wiki?p=PerformanceTuning
SQLite implements JOIN USING by translating the USING clause into some extra WHERE clause terms. It does the same with NATURAL JOIN and JOIN ON. So while those constructs might be helpful to the human reader, they don't really make any difference to SQLite's query optimizer.
I was wrong to join two tables and expect the fetch to be faster. It does not work that way with an SQLite database. Instead, using a "where" clause and joining the two tables directly definitely has some positive impact on the performance.
(example:
select * from A, B where A.id = B.id and A.id = 1; instead of
select * from A join B on A.id = B.id where A.id = 1)
SQLite translates the second statement into the first before compiling, so you can save a small amount of CPU time by using the first statement directly.
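A quick way to check this on your own data is to run both forms and look at the plan SQLite picks for each. The sketch below uses Python's built-in sqlite3 module with two made-up tables A and B; it only shows that the comma-plus-WHERE form and the INNER JOIN form return the same rows and prints their query plans for comparison. It does not measure timing.

```python
import sqlite3

# Two tiny made-up tables, just enough to compare the two query forms.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE A (id INTEGER PRIMARY KEY, a_val TEXT);
CREATE TABLE B (id INTEGER PRIMARY KEY, b_val TEXT);
INSERT INTO A VALUES (1, 'a1'), (2, 'a2');
INSERT INTO B VALUES (1, 'b1'), (3, 'b3');
""")

where_form = "SELECT * FROM A, B WHERE A.id = B.id AND A.id = 1"
join_form = "SELECT * FROM A INNER JOIN B ON A.id = B.id WHERE A.id = 1"

where_rows = conn.execute(where_form).fetchall()
join_rows = conn.execute(join_form).fetchall()

# Print the plan detail (last column of each EXPLAIN QUERY PLAN row) for both forms.
for sql in (where_form, join_form):
    plan = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]
    print(sql)
    print(plan)
conn.close()
```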

SQL Server Creating Indexes Catalogs with joined tables

So at the moment I am building an Advanced Search in .NET, and getting the results is proving a bit slow, so I was looking at creating indexes on the tables.
I.e. I went to the tables and defined a full-text index.
So now I have my catalog with the 5 tables and selected columns.
But I can't see how this catalog actually joins these tables?
I.e. in my "slow" stored procedure I could have
select *
from table1
inner join table2 ON table1.id = table2.linkedID
etc for other tables ?
and now I guess I can go
select * from catalogName
but how does catalogName know what columns to join for the inner join etc
You don't query the full-text catalog directly; you use the full-text functions in your query, like CONTAINS, CONTAINSTABLE, FREETEXT and FREETEXTTABLE:
SELECT field, field, field
FROM table
WHERE CONTAINS(field, 'some text');
Full text has nothing to do with joining tables and if your query is slow because you join 5 tables then FT is unlikely to help at all.

Resources