I have created a simple ADF pipeline that has two sources (S1, S2) and stores data from these sources into an Azure Cosmos DB sink using a left outer join (condition: S1.abc = S2.abc). After running this pipeline, I can see all columns from S1 and none of the columns from S2. Why is that? Please help me understand.
I can see all columns from S1 and none of the columns from S2
Since you mentioned a left outer join in your question, I think you are using a Data Flow activity to transfer the data. I tested this on my side and it works for me.
First, please check the description of the left outer join in the official documentation:
Then please refer to my sample test:
I have two CSV files, as below:
My data flow activity is as below; B is the join key:
Output in Cosmos DB: where a row from the left stream has no match, the output from the right stream is NULL:
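To spell out the semantics the screenshots illustrate, here is a minimal SQL sketch (treating the two source streams as tables s1 and s2 joined on abc):

SELECT s1.*, s2.*
FROM s1
LEFT OUTER JOIN s2 ON s1.abc = s2.abc;
-- every row of S1 is kept; where no S2 row has a matching abc,
-- the S2 columns come back as NULL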
I ran an EXPLAIN on a slow (2 minutes to return 2 sorted results) query in MariaDB, and some of the returned columns contain multiple values separated by a "|" symbol.
When using a better index (same query running in 20ms), EXPLAIN returns similar values but separated by a comma.
I spent the last hour looking for any kind of reference online, both in the MariaDB and MySQL documentation (since I'm not sure it's MariaDB-specific), but nothing relevant came up - not even a SO question.
Do you know what the "|" symbol means in this context? Considering the time difference vs. the comma-separated result, it feels like a combinatory operator, but adding "combinatory" or "exponential" as a Google search keyword didn't provide any additional insight.
EXPLAIN EXTENDED followed by SHOW WARNINGS didn't provide any additional insight either.
Examples of the returned fields:
TYPE: ref|filter
KEY: key1|key2
KEY_LEN: 9|9
Rows: 2 (0%)
Extra: Using where; Using rowid filter
Thank you for any input!
EDIT: for additional context, here's the Hibernate-generated query that produces the result above:
select * from things this_ left outer join rel_tab rt_ on this_.id=rt_.thing_id left outer join tab2 t2_ on this_.id=t2_.thing_id where this_.filter1=123 and this_.filter2=456 and this_.filter3=1 order by this_.id desc limit 20;
I also updated the explain plan result above with the filter selectivity.
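For reference, the inspection described above amounts to something like this (using only the Hibernate-generated query already shown):

EXPLAIN EXTENDED
select * from things this_
    left outer join rel_tab rt_ on this_.id = rt_.thing_id
    left outer join tab2 t2_ on this_.id = t2_.thing_id
where this_.filter1 = 123 and this_.filter2 = 456 and this_.filter3 = 1
order by this_.id desc limit 20;

SHOW WARNINGS;  -- prints the rewritten query, which did not explain the "|" notation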
I have been searching and searching and have resolved to post! I'm still pretty new to R.
I have 2 data frames. The large one is HEAT and the small one is EE.
I have managed to do a left join to get EE matched up with HEAT.
df(HEAT)
DateTime   EVENT   Person   PersonID
DTgroup1   X       Code     Code
DTgroup2   X       Code     Code
DTgroup3   Y       Code     Code
....
Then there is:
df(EE)
PersonID   Type   var3   var4   var5
here is the merge that I used:
merge <- left_join(HEAT, EE)
I have managed to merge the two data frames, but I lose all the data in df(EE) except for the PersonID column that it shares with df(HEAT).
Does anyone have any advice about what I am doing wrong?
Thanks a bunch!
A left join will keep all rows on the left side, in your case HEAT, and include data from the right-hand side where there is a match.
An inner join would only return records where there is a valid join on both sides; in your case, one record would be returned.
See What is the difference between “INNER JOIN” and “OUTER JOIN”? for more info.
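In SQL terms (the dplyr verbs mirror these joins), here is a minimal sketch, treating HEAT and EE as tables joined on PersonID:

-- left join: keeps every HEAT row; EE columns are NULL where no PersonID matches
SELECT *
FROM HEAT
LEFT JOIN EE ON HEAT.PersonID = EE.PersonID;

-- inner join: keeps only rows whose PersonID exists in both tables
SELECT *
FROM HEAT
INNER JOIN EE ON HEAT.PersonID = EE.PersonID;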
Obviously, you want a
merge <- full_join(HEAT, EE)
Here is a nice cheat sheet: http://stat545.com/bit001_dplyr-cheatsheet.html
And here are some very nice graphics: http://r4ds.had.co.nz/relational-data.html
I have some tables in Hive that I need to join together. Since I need to do some work on each of them (normalize the key, remove outliers, ...), and I keep adding more and more tables, this chaining process has turned into a big mess.
It is easy to lose track of where you are, and the query is getting out of control.
However, I have a pretty clear idea of how the final table should look, and each column is fairly independent of the other tables.
For example:
table_class1
name id score
Alex 1 90
Chad 3 50
...
table_class2
name id score
Alexandar 1 50
Benjamin 2 100
...
In the end I really want something that looks like:
name   id   class1   class2   ...
alex   1    90       50
ben    2    NA       100
chad   3    50       NA
I know it could be a left outer join, but I am really having a hard time creating a separate table for each of them after the normalization and then using a left outer join, with the union of the keys, to join each of them.
I am thinking about using NoSQL (HBase) to dump the processed data into a NoSQL format, like:
(source, key, variable, value)
(table_class1, (alex, 1), class1, 90)
(table_class1, (chad, 3), class1, 50)
(table_class2, (alex, 1), class2, 50)
(table_class2, (benjamin, 2), class2, 100)
...
In the end, I want to use something like melt and cast from the R reshape package to bring that data back into a table.
This is a big data project, and there will be hundreds of millions of key-value pairs in HBase.
(1) I don't know if this is a legitimate approach.
(2) If so, is there any big data tool to pivot a long HBase table into a Hive table?
Honestly, I would love to help more, but I am not clear about what you're trying to achieve (maybe because I've never used R). Please elaborate and I'll try to improve my answer if necessary.
What do you need HBase for? You can store your processed data in new tables and work with them; you can even CREATE VIEW to simplify the query if it's too large. Maybe that's what you're looking for (see the Hive manual). Unless you have a good reason for using HBase, I would stick to Hive to avoid additional complexity. Don't get me wrong, there are a lot of valid reasons for using HBase.
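For instance, a minimal HiveQL sketch of a view producing the wide table described in the question (the view name class_scores is made up; a FULL OUTER JOIN is assumed so that ids present in only one class are kept):

CREATE VIEW class_scores AS
SELECT
    COALESCE(c1.id, c2.id)     AS id,
    COALESCE(c1.name, c2.name) AS name,   -- name normalization would still be needed
    c1.score                   AS class1,
    c2.score                   AS class2
FROM table_class1 c1
FULL OUTER JOIN table_class2 c2
    ON c1.id = c2.id;

More class tables can be chained on in the same way, or materialized with CREATE TABLE ... AS SELECT if the view becomes too slow.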
About your second question: you can define and use HBase tables as Hive tables, and you can even CREATE them and INSERT ... SELECT into them, all inside Hive. Is that what you're looking for? See the HBase/Hive integration doc.
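As a sketch of that integration (all names here are made up; the storage handler and the hbase.columns.mapping property come from the linked doc):

CREATE TABLE hbase_scores (id int, name string, class1 int, class2 int)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf:name,cf:class1,cf:class2")
TBLPROPERTIES ("hbase.table.name" = "scores");

-- it can then be read and written like any other Hive table, e.g.:
INSERT OVERWRITE TABLE hbase_scores
SELECT id, name, class1, class2 FROM class_scores;   -- the view sketched above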
One last thing, in case you don't know: you can create custom functions (UDFs) in Hive very easily to help you with the tedious normalization process; take a look at this.
I have 3 tables in a SQLite database for an Android app. This picture below shows the relevant tables that I'm working with.
Tables
I'm trying to get two fields, value and name, from measurement_lines and competences respectively, tied to a specific person_id in measurements. I'm trying to make a query that returns these fields but I'm having little luck. The best I've got so far is the following query:
SELECT name, value
FROM measurements, measurement_lines, competences
WHERE measurements.id = measurement_lines.measurements_id
AND measurement_lines.competences_id = competences.id
AND measurements.persons_id = 1
This, however, has one issue. This query won't return any records when a person has no entries in measurements (and subsequently, nothing in measurement_lines). What I want is to always get a list of competence names, even if the value column is empty. I'm guessing I need a Left Outer Join for this but I can't seem to make it work. The following query just returns no records:
SELECT name, value
FROM measurements AS m, competences AS c
LEFT OUTER JOIN measurement_lines AS ml ON c._id = ml.competence_id
WHERE ml.measurement_id = m._id AND m.persons_id = 1
For inner joins, you can be sloppy with the distinction between join conditions and selection predicates, but when outer joins are involved that makes a difference. Any criterion appearing in the WHERE clause filters your result rows after all joins are performed (logically, at least), which can remove result rows associated with outer tables.
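To illustrate that point, here is a minimal sketch with two hypothetical tables a and b (a_id and flag are made-up columns):

-- predicate in the ON clause: every row of a survives;
-- b's columns are NULL where the condition fails
SELECT a.id, b.val
FROM a
LEFT OUTER JOIN b ON b.a_id = a.id AND b.flag = 1;

-- same predicate in WHERE: the unmatched rows (where b.flag is NULL)
-- are filtered out again, effectively turning the outer join back into an inner join
SELECT a.id, b.val
FROM a
LEFT OUTER JOIN b ON b.a_id = a.id
WHERE b.flag = 1;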
In addition, if you're ever uncertain about join order, you can use parentheses to make your intent clear, at least in many DBMSs. It looks like SQLite doesn't support them.
It looks like you may want this: (edited to avoid use of parentheses)
SELECT c.name, pm.value
FROM competences c
LEFT OUTER JOIN (
    SELECT ml.competences_id AS cid,
           ml.value AS value
    FROM measurement_lines ml
    INNER JOIN measurements m
        ON m.id = ml.measurements_id
    WHERE m.persons_id = 1
) pm
    ON pm.cid = c.id
I have two tables, ta and tb:
ta:
key col1
--------
k1 a
k2 c
tb:
key col2
-------
k2 cc
k3 ee
They are connected by "key". I want to know how I can get a table, tc, like:
key col1 col2
-------------
k1 a
k2 c cc
k3 ee
Is there an easy method instead of inserting every record? The tables have about a million records, so I need an efficient way.
Make a VIEW of the two tables. Write a SELECT ... JOIN statement that gives you the result you want, and then use that as the base for a VIEW.
Example:
CREATE VIEW database.viewname AS
SELECT
    ta.key,
    ta.col1,
    tb.col2
FROM ta
LEFT JOIN tb USING (key);
Using a VIEW is the right way to go if you're looking for the data to reflect changes in the original tables.
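The view then behaves like a read-only table that always reflects the current contents of ta and tb, for example:

SELECT *
FROM database.viewname;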
If you do actually want the data to be copied into a new table, you'll need to do something like:
CREATE TABLE tc (key, col1, col2);

INSERT INTO tc (key, col1, col2)
SELECT ta.key, ta.col1, tb.col2
FROM ta
FULL OUTER JOIN tb USING (key);
That will populate the new table with data from the old tables, but they'll be able to vary independently.
For what you are looking for, you will need a FULL OUTER JOIN to make sure you don't miss any keys. Once you have the query working, you can think about just using it directly or creating a view.
You may need to work around limitations of your DB: if FULL OUTER JOIN isn't implemented, you can normally just UNION a left and a right outer join to build the full join.
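A sketch of that workaround (using the ta/tb tables from the question; quote the key column if your DB treats KEY as a reserved word):

SELECT ta.key, ta.col1, tb.col2
FROM ta
LEFT OUTER JOIN tb ON tb.key = ta.key

UNION

SELECT tb.key, ta.col1, tb.col2
FROM ta
RIGHT OUTER JOIN tb ON tb.key = ta.key;

The UNION removes the duplicated rows for keys that matched on both sides, which is what makes the combination behave like a full outer join.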